Re: [DISCUSS][C++/Python] Bazel example
> I don't get how this is a cycle. It only means Bazel is too limited to
> distinguish between a header dependency and a C++ module?

Agreed, this isn't a true cycle, but Bazel is opinionated about this (i.e. it forces workarounds). In the example I highlighted, it might have been cleaner to combine the two ".cc" files and ".h" files into a single Bazel target. Within Google, there is a fairly strong convention of one ".h" and one ".cc" per build target.

> Do you mean that long compile times are ok because we can ask
> contributors to buy 16-core monsters?

No, this was my poor attempt at humor. I apologize if it offended you or anyone else. The hardware I use for my Arrow development is old enough that I've just started accepting slow build times.

Getting back to potentially merging this, we discussed Bazel on the sync call. One option is to not add this to the Arrow CI builds and let Google projects that depend on the binding be responsible for keeping it working. This has the potential for bit-rot, but might be a good compromise and let other developers try it out to see if they like it.

Cheers,
Micah

On Wed, Nov 27, 2019 at 6:52 AM Antoine Pitrou wrote:

> On 27/11/2019 at 06:16, Micah Kornfield wrote:
> >> Can you give an example of circular dependency? Can this be solved by
> >> having more "type_fwd.h" headers for forward declarations of opaque
> >> types?
> >
> > I think the type_fwd.h might contribute to the problem. The solution
> > would be more granular header/compilation units when possible (or
> > combining targets appropriately). An example of the problem is
> > expression.h/.cc and operation.h/.cc in the compute library. Because
> > operation.cc depends on expression.h and expression.cc relies on
> > expression.h there is a cycle between the two targets.
>
> I don't get how this is a cycle. It only means Bazel is too limited to
> distinguish between a header dependency and a C++ module?
>
> For me, a cycle would be something like "expression.h includes
> operation.h which includes expression.h" (I've actually already seen
> things like this, though not in Arrow AFAIR).
>
> > I thought computer upgrades were something to look forward to ;)
>
> Do you mean that long compile times are ok because we can ask
> contributors to buy 16-core monsters?
>
> Regards
>
> Antoine.
Re: [DISCUSS][C++/Python] Bazel example
On 27/11/2019 at 06:16, Micah Kornfield wrote:
>> Can you give an example of circular dependency? Can this be solved by
>> having more "type_fwd.h" headers for forward declarations of opaque
>> types?
>
> I think the type_fwd.h might contribute to the problem. The solution
> would be more granular header/compilation units when possible (or
> combining targets appropriately). An example of the problem is
> expression.h/.cc and operation.h/.cc in the compute library. Because
> operation.cc depends on expression.h and expression.cc relies on
> expression.h there is a cycle between the two targets.

I don't get how this is a cycle. It only means Bazel is too limited to distinguish between a header dependency and a C++ module?

For me, a cycle would be something like "expression.h includes operation.h which includes expression.h" (I've actually already seen things like this, though not in Arrow AFAIR).

> I thought computer upgrades were something to look forward to ;)

Do you mean that long compile times are ok because we can ask contributors to buy 16-core monsters?

Regards

Antoine.
Re: [DISCUSS][C++/Python] Bazel example
Hi Antoine,

> My question would be: what happens after the PR is merged? Are
> developers supposed to keep the Bazel setup working in addition to
> CMake? Or is there a dedicated maintainer (you? :-)) to fix regressions
> when they happen?

In the short term, I would be willing to be a dedicated maintainer for Mac (and for Linux as well, once I get Linux support working). I'd like to classify the support as very experimental (not advertised in the documentation yet). If other devs find Bazel useful, I would expect others to help with maintenance naturally. If it gets too much for me to maintain, I'm willing to drop support completely, since it won't be a critical part of the build infrastructure. Once the setup is more complete, I would plan on adding a CI target for it as well.

> Can you give an example of circular dependency? Can this be solved by
> having more "type_fwd.h" headers for forward declarations of opaque
> types?

I think the type_fwd.h might contribute to the problem. The solution would be more granular header/compilation units when possible (or combining targets appropriately). An example of the problem is expression.h/.cc and operation.h/.cc in the compute library. Because operation.cc depends on expression.h and expression.cc relies on expression.h there is a cycle between the two targets. I fixed this by making a new header-only target for expression.h, which the operation target depends on. Then the expression target depends on the operation target. An alternative approach would be to combine "expression.*" and "operation.*" into a single target.

> (also, generally, it would be desirable to use more of these, since our
> compile times have become egregious as of late - I'm currently
> considering replacing my 8-core desktop CPU with a beefier one :-/)

I'm not a huge fan of this approach in general, but since I haven't been able to contribute on a day-to-day basis to the C++ code base, I'll let the active contributors decide the best course here. I thought computer upgrades were something to look forward to ;)

> This sounds really like a bummer. Do you have to spell those out by
> hand? Or is there some tool that infers dependencies and generates the
> declarations for you?

Yes, I had to spell them out by hand. There is an internal tool at Google that helps with it (I didn't use it for this PR). There has been some discussion of open-sourcing the tool [1], but I wouldn't expect it any time soon. Luckily things are fairly well modularized at the moment, so while tedious, it was not tremendously painful. Another solution would be to have larger targets (e.g. one per directory) that use globs, which would make it less painful, but this loses some of the benefits mentioned above.

[1] https://github.com/bazelbuild/bazel/issues/6871

On Tue, Nov 26, 2019 at 1:27 AM Antoine Pitrou wrote:

> Hi Micah,
>
> On 26/11/2019 at 05:52, Micah Kornfield wrote:
> > After going through this exercise I put together a list of pros and
> > cons below.
> >
> > I would like to hear from other devs:
> > 1. Their opinions on setting this up as an alternative system (I'm
> > willing to invest some more time in it).
> > 2. What people think the minimum bar for merging a PR like this should
> > be?
>
> My question would be: what happens after the PR is merged? Are
> developers supposed to keep the Bazel setup working in addition to
> CMake? Or is there a dedicated maintainer (you? :-)) to fix regressions
> when they happen?
>
> > Pros:
> > 1. Being able to run "bazel test python/..." and having compilation of
> > all Python dependencies just work is a nice experience.
> > 2. Because of the granular compilation units, it can improve developer
> > velocity. Unit tests can depend only on the sub-components they are
> > meant to test. They don't need to compile and relink arrow.so.
> > 3. The built-in documentation it provides about visibility and
> > relationships between components is nice (it has uncovered some
> > "interesting dependencies"). I didn't make heavy use of it, but its
> > concept of "visibility" makes things more explicit about what external
> > consumers should be depending on, and what inter-project components
> > should depend on (e.g. explicitly limit the scope of vendored code).
> > 4. Extensions are essentially Python, which might be easier to work
> > with than CMake
>
> Those sound nice.
>
> > Cons:
> > 1. Bazel is opinionated on C++ layout. In particular it requires some
> > workarounds to deal with circular .h/.cc dependencies. The two main
> > ways of doing this are either increasing the size of compilable units
> > [4] to span all dependencies in the cycle, or creating separate
> > header/implementation targets. I've used both strategies in the PR.
> > One could argue that it would be nice to reduce circular dependencies
> > in general.
>
> Can you give an example of circular dependency? Can this be solved by
> having more "type_fwd.h" header
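
The expression/operation situation discussed in this message can be sketched as a small target graph. This is only an illustration in Python, not the BUILD files from the PR: the target names are hypothetical, and the edge from "expression" to "operation" assumes that expression.cc includes operation.h, which the thread implies but does not state outright. The sketch shows why one target per .h/.cc pair is rejected as cyclic, and why both workarounds mentioned (a header-only target, or a single combined target) are acyclic.

```python
# Hypothetical model of Bazel-style targets. Target names are illustrative
# and are not the labels used in the PR.

def find_cycle(deps):
    """Return one dependency cycle as a list of targets, or None if acyclic."""
    visiting, done = set(), set()
    path = []

    def visit(target):
        visiting.add(target)
        path.append(target)
        for dep in deps.get(target, ()):
            if dep in visiting:
                # Back edge: dep is already on the current path.
                return path[path.index(dep):] + [dep]
            if dep not in done:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        visiting.discard(target)
        done.add(target)
        return None

    for target in deps:
        if target not in done:
            cycle = visit(target)
            if cycle:
                return cycle
    return None


# One target per .h/.cc pair: each target needs the other's header.
one_target_per_pair = {
    "expression": ["operation"],  # assumption: expression.cc includes operation.h
    "operation": ["expression"],  # operation.cc includes expression.h
}

# Workaround 1: split expression.h into its own header-only target.
with_header_target = {
    "expression_hdr": [],
    "operation": ["expression_hdr"],
    "expression": ["expression_hdr", "operation"],
}

# Workaround 2: combine both pairs into a single target.
combined = {"expression_and_operation": []}

print(find_cycle(one_target_per_pair))  # ['expression', 'operation', 'expression']
print(find_cycle(with_header_target))   # None
print(find_cycle(combined))             # None
```

The header-only split keeps the targets small at the cost of an extra target, while the combined target is simpler to declare but coarser to rebuild.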
Re: [DISCUSS][C++/Python] Bazel example
Hi Micah,

On 26/11/2019 at 05:52, Micah Kornfield wrote:
> After going through this exercise I put together a list of pros and cons
> below.
>
> I would like to hear from other devs:
> 1. Their opinions on setting this up as an alternative system (I'm
> willing to invest some more time in it).
> 2. What people think the minimum bar for merging a PR like this should
> be?

My question would be: what happens after the PR is merged? Are developers supposed to keep the Bazel setup working in addition to CMake? Or is there a dedicated maintainer (you? :-)) to fix regressions when they happen?

> Pros:
> 1. Being able to run "bazel test python/..." and having compilation of
> all Python dependencies just work is a nice experience.
> 2. Because of the granular compilation units, it can improve developer
> velocity. Unit tests can depend only on the sub-components they are
> meant to test. They don't need to compile and relink arrow.so.
> 3. The built-in documentation it provides about visibility and
> relationships between components is nice (it has uncovered some
> "interesting dependencies"). I didn't make heavy use of it, but its
> concept of "visibility" makes things more explicit about what external
> consumers should be depending on, and what inter-project components
> should depend on (e.g. explicitly limit the scope of vendored code).
> 4. Extensions are essentially Python, which might be easier to work with
> than CMake

Those sound nice.

> Cons:
> 1. Bazel is opinionated on C++ layout. In particular it requires some
> workarounds to deal with circular .h/.cc dependencies. The two main
> ways of doing this are either increasing the size of compilable units
> [4] to span all dependencies in the cycle, or creating separate
> header/implementation targets. I've used both strategies in the PR. One
> could argue that it would be nice to reduce circular dependencies in
> general.

Can you give an example of circular dependency? Can this be solved by having more "type_fwd.h" headers for forward declarations of opaque types?

(also, generally, it would be desirable to use more of these, since our compile times have become egregious as of late - I'm currently considering replacing my 8-core desktop CPU with a beefier one :-/)

> 4. It is more verbose to configure than CMake (each compilation unit
> needs to be spelled out with dependencies).

This sounds really like a bummer. Do you have to spell those out by hand? Or is there some tool that infers dependencies and generates the declarations for you?

Regards

Antoine.
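
To make the question about inferring dependencies concrete, here is a rough Python sketch of how such a tool could work in principle: scan each target's sources for quoted `#include` lines and map every header to the target that owns it. Everything here is hypothetical (the function names, the file contents, the header-to-target mapping); it is not the internal Google tool mentioned in the thread, and a real tool would also need to handle macros, conditional includes, and generated headers.

```python
import re

# Matches quoted includes such as: #include "expression.h"
# Angle-bracket (system) includes are deliberately ignored.
INCLUDE_RE = re.compile(r'^[ \t]*#[ \t]*include[ \t]+"([^"]+)"', re.MULTILINE)


def local_includes(source_text):
    """Return the quoted headers included by a C++ source, in order."""
    return INCLUDE_RE.findall(source_text)


def infer_deps(sources, header_owner):
    """Infer target-level dependencies from include lines.

    sources: {target: {filename: file contents}}
    header_owner: {header path: target that declares that header}
    """
    deps = {}
    for target, files in sources.items():
        found = set()
        for text in files.values():
            for header in local_includes(text):
                owner = header_owner.get(header)
                # Skip unknown headers and a target's own headers.
                if owner is not None and owner != target:
                    found.add(owner)
        deps[target] = sorted(found)
    return deps


# Hypothetical file contents, loosely modeled on the compute example.
sources = {
    "expression": {
        "expression.cc": '#include "expression.h"\n'
                         '#include "operation.h"\n'
                         '#include <memory>\n',
    },
    "operation": {
        "operation.cc": '#include "operation.h"\n'
                        '#include "expression.h"\n',
    },
}
header_owner = {"expression.h": "expression", "operation.h": "operation"}

print(infer_deps(sources, header_owner))
```

With these made-up inputs, each target ends up depending on the other, which is the kind of mutual dependency that has to be resolved before the declarations can be generated.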
[DISCUSS][C++/Python] Bazel example
As previously discussed [1], I took on the effort of trying to come up with a demo for using Bazel as a build system for C++/Python. The results [2] are a little bit of a mixed bag. I was able to construct an example that runs on my Mac that can compile and run most of the tests in "src/arrow" as well as the IPC read/write test, and a Python test (test_array.py). I also have C++ Flight compiling. A demonstration for how different library locations can be selected is also available [3]. This would need a lot more work to reach the functionality that CMake currently has.

After going through this exercise I put together a list of pros and cons below.

I would like to hear from other devs:
1. Their opinions on setting this up as an alternative system (I'm willing to invest some more time in it).
2. What people think the minimum bar for merging a PR like this should be?

Pros:
1. Being able to run "bazel test python/..." and having compilation of all Python dependencies just work is a nice experience.
2. Because of the granular compilation units, it can improve developer velocity. Unit tests can depend only on the sub-components they are meant to test. They don't need to compile and relink arrow.so.
3. The built-in documentation it provides about visibility and relationships between components is nice (it has uncovered some "interesting dependencies"). I didn't make heavy use of it, but its concept of "visibility" makes things more explicit about what external consumers should be depending on, and what inter-project components should depend on (e.g. explicitly limit the scope of vendored code).
4. Extensions are essentially Python, which might be easier to work with than CMake.

Cons:
1. Bazel is opinionated on C++ layout. In particular it requires some workarounds to deal with circular .h/.cc dependencies. The two main ways of doing this are either increasing the size of compilable units [4] to span all dependencies in the cycle, or creating separate header/implementation targets. I've used both strategies in the PR. One could argue that it would be nice to reduce circular dependencies in general.
2. Bazel Python support still seems lacking. To make the test work, I needed to explicitly include all transitive dependencies of the "pip"-installed packages by hand.
3. Bazel in general doesn't seem to have wide adoption, so any customization probably won't have a whole lot of support (I've been told there are some adapters with CMake that can leverage some of the existing code).
4. It is more verbose to configure than CMake (each compilation unit needs to be spelled out with dependencies).
5. The "packaging" story of different build artifacts still needs to be explored.

Thanks,
Micah

[1] https://lists.apache.org/thread.html/26c2a9e7e35ffc6f6ff68fbbfb38a0a33002b8e7210e8d323566f447@%3Cdev.arrow.apache.org%3E
[2] https://github.com/apache/arrow/pull/5897/files
[3] https://github.com/apache/arrow/pull/5897/files#diff-85ecc9fdaae4c714198a1c31c7748f2a
[4] https://github.com/apache/arrow/pull/5897/files#diff-c23198ffa8af9adf6825cb9c6f6e135b
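
The second item in the pros list (tests depending only on the sub-components they exercise) can be illustrated with a small Python sketch that computes what a test target transitively needs. The target names and edges below are made up for illustration and do not reflect the actual layout in the PR; the point is only the difference in how much must be built before the test can run.

```python
def transitive_deps(target, deps):
    """Return every target reachable from `target` through `deps`."""
    seen = set()
    todo = [target]
    while todo:
        current = todo.pop()
        for dep in deps.get(current, ()):
            if dep not in seen:
                seen.add(dep)
                todo.append(dep)
    return seen


# Hypothetical granular layout: the test depends only on what it tests.
granular = {
    "status": [],
    "memory_pool": ["status"],
    "buffer": ["memory_pool", "status"],
    "array": ["buffer"],
    "ipc": ["array", "buffer"],
    "buffer_test": ["buffer"],
}

# Hypothetical monolithic layout: the test links one big library.
monolithic = dict(
    granular,
    arrow_so=["status", "memory_pool", "buffer", "array", "ipc"],
    buffer_test=["arrow_so"],
)

print(sorted(transitive_deps("buffer_test", granular)))
print(sorted(transitive_deps("buffer_test", monolithic)))
```

In the granular layout a change to the hypothetical "ipc" target does not touch anything "buffer_test" depends on, whereas in the monolithic layout it forces the shared library, and therefore the test, to be relinked.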