[ https://issues.apache.org/jira/browse/ARROW-15730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christian updated ARROW-15730:
------------------------------
    Description: 
Hi,

I'm trying to load a ~10 GB Arrow file into R.

For whatever reason, memory usage blows up to ~110-120 GB.

The weird thing is that after deleting the object again and running gc(), memory 
usage only drops to ~90 GB. The delta of ~20-30 GB is what I would have expected 
the data frame to take up in memory (that is also roughly the total that was used 
during the load under the old Arrow version 0.15.1, and it is what R reports when 
I just print the object size).

The commands I'm running are simply:

options(arrow.use_threads = FALSE)
arrow::set_cpu_count(1)  # needed, otherwise it freezes under Windows
arrow::read_arrow('file.arrow5')
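
To show the numbers in context, this is roughly the full sequence I run and what I 
observe (a minimal sketch; the object size comes from object.size() and the process 
footprint is read off Task Manager, so the exact values are approximate):

{code:r}
library(arrow)

options(arrow.use_threads = FALSE)
set_cpu_count(1)                        # otherwise it freezes under Windows

df <- read_arrow("file.arrow5")         # process memory climbs to ~110-120 GB here

print(object.size(df), units = "auto")  # reports roughly 20-30 GB, as expected

rm(df)
gc()                                    # process memory only drops to ~90 GB on Windows
{code}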

Is Arrow reserving some resources in the background and not releasing them? Are 
there settings I need to change for this?
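
If it helps with diagnosing this, I can also report how much memory Arrow's own 
allocator still holds after the gc(). Something like the sketch below is what I had 
in mind; I'm going by the default_memory_pool() helper in the R package, so the 
exact fields may differ between versions:

{code:r}
library(arrow)

pool <- default_memory_pool()
pool$backend_name      # which allocator is in use (e.g. mimalloc, jemalloc, or system)
pool$bytes_allocated   # bytes Arrow's pool currently holds
pool$max_memory        # peak allocation seen by this pool
{code}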

Is this a known issue that has been fixed in a newer version?

*Note* that this doesn't happen on Linux: there, all of the memory is freed when 
gc() is called. Not sure if it matters, but on Linux I also don't need to set the 
CPU count to 1.

Any help would be appreciated.



> Memory usage in R
> -----------------
>
>                 Key: ARROW-15730
>                 URL: https://issues.apache.org/jira/browse/ARROW-15730
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>            Reporter: Christian
>            Priority: Major
>             Fix For: 6.0.1
>
>


