paleolimbot commented on issue #33094:
URL: https://github.com/apache/arrow/issues/33094#issuecomment-1387452294

   I've done some bisecting of the tests in the pursuit of a minimal reproducer 
here. Since it appears that the docker image used in the nightly test is the 
only way to reproduce this, I looked up some image details and what gets run.
   
   - The image is defined in arrow/docker-compose.yml ("ubuntu-r-valgrind")
   - It's based on Winston Chang's r-debug image
   - The image might have an Ubuntu version mismatch: some of the options in 
the definition below seem to suggest it's an 18.04 image, but I'm pretty sure 
it's 22.04 that's running.
   - The script that runs is in ci/scripts/r_valgrind.sh. It basically runs 
r/tests/testthat.R with R -d valgrind.
   
   ```yaml
   ubuntu-r-valgrind:
       # Only 18.04 and amd64 supported
       # Usage:
       #   docker-compose build ubuntu-r-valgrind
       #   docker-compose run ubuntu-r-valgrind
       image: ${REPO}:amd64-ubuntu-18.04-r-valgrind
       build:
         context: .
         dockerfile: ci/docker/linux-r.dockerfile
         cache_from:
           - ${REPO}:amd64-ubuntu-18.04-r-valgrind
         args:
           base: wch1/r-debug:latest
           r_bin: RDvalgrind
           tz: ${TZ}
       environment:
         <<: [*ccache, *sccache]
         ARROW_R_DEV: ${ARROW_R_DEV}
         # AVX512 not supported by Valgrind (similar to ARROW-9851) some runners support AVX512 and some do not
         # so some build might pass without this setting, but we want to ensure that we stay to AVX2 regardless of runner.
         EXTRA_CMAKE_FLAGS: "-DARROW_RUNTIME_SIMD_LEVEL=AVX2"
         ARROW_SOURCE_HOME: "/arrow"
       volumes: *ubuntu-volumes
       command: >
         /bin/bash -c "
           /arrow/ci/scripts/r_valgrind.sh /arrow"
   ```
   
   To find a test file with a leak, I modified `r/tests/testthat.R` with a 
filter to run only specific tests:
   
   ```r
   # Tried:
   # filter = "^Array" (no leaks)
   # filter = "^dataset" (no leaks)
   # filter = "^dplyr" (leaks!)
   # filter = "^dplyr-[g-u]" (leaks!)
   # filter = "^dplyr-[s-u]" (leaks!)
   # filter = "^dplyr-summarize" (leaks!)
   test_check("arrow", reporter = arrow_reporter, filter = "^dplyr-summarize")
   ```
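
   A rough way to preview which test files a given filter would pick up before 
spending a full valgrind run on it. This is only a sketch, not part of the CI 
scripts: `preview_filter` is a made-up helper name, and it assumes testthat 
matches `filter` against test file names with the `test-` prefix and `.R` 
suffix stripped, which is my reading of the testthat docs. `grep -E` is also 
only an approximation of R's regex engine, so treat the output as a hint.

   ```shell
   # Hypothetical helper (not in the arrow repo): print the test names that a
   # testthat `filter` regex would likely select.
   #   $1 = directory containing the test-*.R files
   #   $2 = the regex you would pass as `filter`
   preview_filter() {
     # Keep only test files and strip the "test-"/"test_" prefix and the
     # ".R"/".r" suffix, then apply the filter regex to what is left.
     # LC_ALL=C keeps bracket ranges such as [s-u] byte-ordered.
     ls "$1" \
       | sed -n -E 's/^test[-_](.*)\.[rR]$/\1/p' \
       | LC_ALL=C grep -E "$2"
   }

   # Example call from the root of an arrow checkout:
   #   preview_filter r/tests/testthat '^dplyr-[s-u]'
   ```

   If a filter prints more than a handful of names, it's probably worth 
narrowing it further before the next run.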
   
   Next, I'll see if I can isolate one test in the summarize tests that leaks 
consistently.

