westonpace commented on issue #36090:
URL: https://github.com/apache/arrow/issues/36090#issuecomment-1595586144

   I investigated this today and I think I figured out the problem.  When we 
split out acero into it's own module we ended up introducing 
`test_util_internal.cc`  I believe this had originally been a part of the arrow 
testing library but I didn't dig too much into where it came from.
   
   To make it available to all the tests we created an `add_arrow_acero_test` 
function which had...
   
   ```
     add_arrow_test(${REL_TEST_NAME}
                    EXTRA_LINK_LIBS
                    ${ARROW_ACERO_TEST_LINK_LIBS}
                    PREFIX
                    ${PREFIX}
                    SOURCES
                    test_util_internal.cc
                    LABELS
                    ${LABELS}
                    ${ARG_UNPARSED_ARGUMENTS})
   ```
   
   We also added `test_util_internal.cc` to all of the dataset tests in a 
similar fashion.
   
   Unfortunately, adding this common utility in this way actually means that 
the file gets compiled again for every single test.  Since this file is fairly 
complex it ended up adding quite a bit to the overall compilation time.
   
   The fix is straightforward but I may need some cmake help to figure out the 
best way to do it in our existing infrastructure.  I'd like to create a small 
test library for acero that contains `src/arrow/acero/dummy_nodes.cc` and 
`src/arrow/acero/test_util_internal.cc`.  Also, I'd like to create a small test 
library for datasets that contains `src/arrow/acero/test_util_internal.cc` and 
`src/arrow/dataset/test_util_internal.cc`.
   
   I don't believe we want to export these libraries (like we do with 
arrow_testing).  I think they can just be small static archives that get added 
as dependencies to all of the tests.
   
   As a hack to test this out I added `test_util_internal.cc` to the list of 
acero sources.  That appears to restore the build time to what I got in 11.0.0.
   
   I did attempt to use ClangBuildAnalyzer to further reduce compile times.  I 
didn't find any clear wins.  However, I was able to tune precompiled header 
files to drop build times by an additional 35%.  In other words, on my system:
   
   Current Build Time: 2100
   After fixing duplicate test files: 1900
   With tuned precompiled headers: 1300
   
   The files I needed to add to the precompiled headers were:
   
    * datum.h - Since Datum is generally passed around by value this is an 
inevitable include for anything remotely involved with compute.
    * scalar.h - This is another header that was fairly ubiquitous.  However, 
we may be able to reduce this one with better forward declarations since 
scalars are typically passed around by reference/pointer.
    * compute/exec.h - This is needed for ExecBatch.  Similar to Datum I think 
this one is inevitable because ExecBatch is usually passed by value
    * compute/expression.h - Same as the Datum and ExecBatch as Expression is 
typically passed by value
    * compute/function.h
    * compute/kernel.h
   
   Adding datum/exec/expression to the pch for acero and datasets may be 
sufficient.  I don't think we need it in the top-level pch.
   
   These last two are not actually included excessively.  However, they are 
needed by all of the compute kernel files and they are very expensive includes 
(kernel.h in particular).  We could probably leave them out or, if we really 
want them, find a way to make a dedicated pch file for just compute kernels.
   
   For the testing pch I needed to add the gmock headers.  gtest/gtest.h isn't 
actually too bad but the gmock includes are very expensive.
   
   Action items:
   
    - [ ] Figure out how to make internal "common testing" libraries for acero 
& dataset (maybe @kou can help?)
    - [ ] See if we can reduce the number of times we include scalar.h 
(@westonpace)
    - [ ] Get `kernel.h` out of the compute api_xyz.h files (these only need 
FunctionOptions which we can put in its own file) @westonpace 
    - [ ] Consider enabling precompiled headers on builds that don't seem to be 
getting much benefit from ccache (I can look into this but I don't know much 
about how well ccache is or isn't working on these CI jobs so I'd appreciate 
any input) 
   
   I've prototyped some of these tasks in 
https://github.com/westonpace/arrow/tree/experiment/build-times 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to