[ https://issues.apache.org/jira/browse/ARROW-15730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joris Van den Bossche updated ARROW-15730:
------------------------------------------
    Fix Version/s:     (was: 6.0.1)

> [R] Memory usage in R blows up
> ------------------------------
>
>                 Key: ARROW-15730
>                 URL: https://issues.apache.org/jira/browse/ARROW-15730
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>    Affects Versions: 6.0.1
>            Reporter: Christian
>            Assignee: Will Jones
>            Priority: Major
>         Attachments: image-2022-02-19-09-05-32-278.png
>
>
> Hi,
> I'm trying to load a ~10 GB Arrow file into R (under Windows).
> _(The file was generated with Arrow 6.0.1 under Linux.)_
> For whatever reason, memory usage blows up to ~110-120 GB (in a fresh, empty R instance).
> The weird thing is that after deleting the object and running gc(), memory usage only drops to ~90 GB. The delta of ~20-30 GB is what I would have expected the data frame to occupy in memory. (That is also roughly what was used, in total, during the load with the old Arrow version 0.15.1, and it matches what R reports when printing the object size.)
> The commands I'm running are simply:
> options(arrow.use_threads=FALSE);
> arrow::set_cpu_count(1); # need this - otherwise it freezes under Windows
> arrow::read_arrow('file.arrow5')
> Is Arrow reserving some resources in the background and not giving them up again? Are there settings I need to change for this?
> Is this something that is known and fixed in a newer version?
> *Note* that this doesn't happen on Linux: there, all resources are freed when calling gc(). Not sure if it matters, but on Linux I also don't need to set the CPU count to 1.
> Any help would be appreciated.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
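[Editor's note: a minimal diagnostic sketch, not part of the original report. It distinguishes whether the unreleased memory is held by a live Arrow object or retained by Arrow's C++ allocator rather than by R. Only the file name 'file.arrow5' and the options come from the report; the rest is illustrative and assumes the arrow R package >= 6.0.]

    library(arrow)

    # Reproduce the reporter's load, then drop the R object and force
    # R's garbage collector to run.
    options(arrow.use_threads = FALSE)
    set_cpu_count(1)
    df <- read_arrow("file.arrow5")
    rm(df)
    gc()

    # Bytes still allocated through Arrow's C++ memory pool, and which
    # allocator backs it (mimalloc is the Windows default).
    default_memory_pool()$bytes_allocated
    default_memory_pool()$backend_name

If bytes_allocated is near zero while the OS still reports ~90 GB in use, the buffers were freed back to the allocator but the allocator has not returned the pages to the operating system; if it is still large, some Arrow object remains alive. In the former case, one documented workaround is to set the environment variable ARROW_DEFAULT_MEMORY_POOL=system before the arrow package is loaded, which switches to the system allocator.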