Joris Van den Bossche created ARROW-18293:
---------------------------------------------

             Summary: [C++] Proxy memory pool crashes with Dataset scanning
                 Key: ARROW-18293
                 URL: https://issues.apache.org/jira/browse/ARROW-18293
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++
            Reporter: Joris Van den Bossche

Discovered while trying to use the proxy memory pool for testing ARROW-18164.
See https://github.com/apache/arrow/pull/14516#discussion_r1005433867

This test segfaults (using the fixture in {{test_dataset.py}}):

{code:python}
@pytest.mark.parquet
def test_scanner_proxy_memory_pool(dataset):
    proxy_pool = pa.proxy_memory_pool(pa.default_memory_pool())
    _ = dataset.to_table(memory_pool=proxy_pool)
{code}

Response of [~westonpace]:

{quote}My guess is that the problem is that the scanner erroneously returns before all work is completely finished. Changing the thread pool or the memory pool too quickly after a scan can lead to this kind of error. The new scanner was created specifically to avoid this problem but it isn't the default yet (still working through some follow-up PRs to make sure we have the same functionality).{quote}

So once the new scanner becomes the default, we can check whether this is fixed.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)