[ https://issues.apache.org/jira/browse/ARROW-15729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Christian updated ARROW-15729:
--
Description:
Hi -
I recently upgraded to Arrow 6.0.1 and am using it in R.
When reading a large file (~10 GB) on Windows, R intermittently freezes.
I can see memory being allocated during the first 10-20 seconds, but then
nothing happens and R stops responding (the R process also goes idle).
I'm setting options(arrow.use_threads=FALSE).
I didn't have this issue with the version I was using previously (0.15.1), and
the file reads fine under Linux.
I would post a reproducible example, but the freeze happens randomly. I even
tried reading large files in pieces by first getting all the distinct
sections of a specific column (with compute>collect), but that hangs too.
Any ideas would be appreciated.
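A rough sketch of the piecewise read described above. The column name `section`, the helper name `read_by_section`, and the use of arrow::open_dataset() with dplyr verbs are assumptions made for illustration; this is not the exact code that was run, and it presumes the file is in the Arrow IPC (Feather) file format.

```r
library(dplyr)

# Hypothetical helper: collect the distinct values of a column named
# `section` (an assumed name), then read the file one section at a time.
read_by_section <- function(path) {
  ds <- arrow::open_dataset(path, format = "arrow")

  # Distinct values of the assumed `section` column, pulled into R.
  keys <- ds %>%
    select(section) %>%
    distinct() %>%
    collect() %>%
    pull(section)

  # One data frame per section; `key` is looked up in the calling
  # environment when the filter expression is evaluated.
  lapply(keys, function(key) {
    ds %>%
      filter(section == key) %>%
      collect()
  })
}
```

Each element of the returned list holds the rows for one section, so the full file never has to be materialized in a single collect() call.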
*Edit*
Not sure if this makes sense to anyone, but after a few tries it seems that the
issue only happens in RStudio. In the R console the file loads fine. All I'm
executing is the code below.
options(arrow.use_threads=FALSE)
aa <- arrow::read_arrow('.../file.arrow5')
One thing I want to point out is that the underlying Rscript process under
RStudio definitely seems to use more than one core when executing the above.
*Edit2*
Using arrow::set_cpu_count(1) seems to solve the issue.
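A minimal sketch of that workaround, assuming the file is in the Arrow IPC (Feather) file format. The helper name `read_single_threaded` is hypothetical, and arrow::read_feather() stands in for the read_arrow() call shown earlier. The helper restores the previous thread count afterwards so the rest of the session is unaffected.

```r
# Hypothetical helper: read an Arrow IPC/Feather file while Arrow's CPU
# thread pool is temporarily limited to a single thread.
read_single_threaded <- function(path) {
  old_count <- arrow::cpu_count()

  # Restore the original thread count when the function exits,
  # even if the read fails.
  on.exit(arrow::set_cpu_count(old_count), add = TRUE)

  arrow::set_cpu_count(1)
  arrow::read_feather(path)
}
```

It would be called the same way as the original read, for example aa <- read_single_threaded('.../file.arrow5').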
was:
Hi -
I recently upgraded to Arrow 6.0.1 and am using it in R.
When reading a large file (~10 GB) on Windows, R intermittently freezes.
I can see memory being allocated during the first 10-20 seconds, but then
nothing happens and R stops responding (the R process also goes idle).
I'm setting options(arrow.use_threads=FALSE).
I didn't have this issue with the version I was using previously (0.15.1), and
the file reads fine under Linux.
I would post a reproducible example, but the freeze happens randomly. I even
tried reading large files in pieces by first getting all the distinct
sections of a specific column (with compute>collect), but that hangs too.
Any ideas would be appreciated.
*Edit*
Not sure if this makes sense to anyone, but after a few tries it seems that the
issue only happens in RStudio. In the R console the file loads fine. All I'm
executing is the code below.
options(arrow.use_threads=FALSE)
aa <- arrow::read_arrow('.../file.arrow5')
One thing I want to point out is that the underlying Rscript process under
RStudio definitely seems to use more than one core when executing the above.
> [R] Reading large files randomly freezes
>
>
> Key: ARROW-15729
> URL: https://issues.apache.org/jira/browse/ARROW-15729
> Project: Apache Arrow
> Issue Type: Bug
> Components: R
>Reporter: Christian
>Priority: Critical
> Fix For: 6.0.1
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)