[ https://issues.apache.org/jira/browse/ARROW-14653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nicola Crane reassigned ARROW-14653:
------------------------------------

    Assignee: Nicola Crane

> [R] head() hangs on CSV datasets > 600MB
> ----------------------------------------
>
>                 Key: ARROW-14653
>                 URL: https://issues.apache.org/jira/browse/ARROW-14653
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Nicola Crane
>            Assignee: Nicola Crane
>            Priority: Major
>             Fix For: 7.0.0
>
>
> I'm calling {{head()}} on a dataset made up of CSV files. I'm doing this
> because I want to preview my dataset before I try to do anything with it
> that's going to be more computationally expensive.
> {code:r}
> open_dataset("../../data/nyc-raw/", format = "csv") %>%
>   head(1) %>%
>   collect()
> {code}
> I have experimented with different combinations of files in the dataset
> folder, and it seems to work fine when the total file size is below roughly
> 600MB but hangs if it's above that. This might not even be the actual issue,
> but I'm struggling to narrow it down beyond adding extra files to the
> equation.
> I've tried running with the C++ debugger attached, but again, it just hangs.
> The files I'm using are the 2020-2021 Yellow Taxi trip records available
> from: https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page
> A bit of investigation has shown me that I can load different subsets of the
> files fine, but when using all of them, the session hangs.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)