Re: [Rd] [External] Re: .Internal(quit(...)): system call failed: Cannot allocate memory

2020-12-01 Thread Jan Gorecki
Yes, I do set outside of R, in shell:
R_MAX_VSIZE=100Gb SRC_DATANAME=G1_1e9_2e0_0_0 /usr/bin/time -v Rscript
datatable/groupby-datatable.R

I think it might be related to allocations made with malloc rather than R_alloc.
Probably malloc allocation is not capped by setting this env var.
If so, then I have to limit memory on OS/shell level. As you mentioned before.

Best

On Tue, Dec 1, 2020 at 6:54 PM  wrote:
>
> The fact that your max resident size isn't affected looks odd.  Are
> you setting the environment variable outside R? When I run
>
>  env R_MAX_VSIZE=16Gb /usr/bin/time bin/Rscript jg.R 1e9 2e0 0 0
>
> (your code in jg.R). I get a quick failure with 11785524maxresident)k
>
> Best,
>
> luke
>
> On Tue, 1 Dec 2020, Jan Gorecki wrote:
>
> > Thank you Luke,
> >
> > I tried your suggestion about R_MAX_VSIZE but I am not able to get the
> > error you are getting.
> > I tried recent R devel as I have seen you made a change to GC there.
> > My machine is 128GB, free -h reports 125GB available. I tried to set
> > 128, 125 and 100. In all cases the result is "Command terminated by
> > signal 9". Each took around 6-6.5h.
> > Details below, if it tells you anything how could I optimize it (or
> > raise an exception early) please do let me know.
> >
> > R 4.0.3
> >
> > unset R_MAX_VSIZE
> >User time (seconds): 40447.92
> >System time (seconds): 4034.37
> >Percent of CPU this job got: 201%
> >Elapsed (wall clock) time (h:mm:ss or m:ss): 6:07:59
> >Maximum resident set size (kbytes): 127261184
> >Major (requiring I/O) page faults: 72441
> >Minor (reclaiming a frame) page faults: 3315491751
> >Voluntary context switches: 381446
> >Involuntary context switches: 529554
> >File system inputs: 108339200
> >File system outputs: 120
> >
> > R-devel 2020-11-27 r79522
> >
> > unset R_MAX_VSIZE
> >User time (seconds): 40713.52
> >System time (seconds): 4039.52
> >Percent of CPU this job got: 198%
> >Elapsed (wall clock) time (h:mm:ss or m:ss): 6:15:52
> >Maximum resident set size (kbytes): 127254796
> >Major (requiring I/O) page faults: 72810
> >Minor (reclaiming a frame) page faults: 3433589848
> >Voluntary context switches: 384363
> >Involuntary context switches: 609024
> >File system inputs: 108467064
> >File system outputs: 112
> >
> > R_MAX_VSIZE=128Gb
> >User time (seconds): 40411.13
> >System time (seconds): 4227.99
> >Percent of CPU this job got: 198%
> >Elapsed (wall clock) time (h:mm:ss or m:ss): 6:14:01
> >Maximum resident set size (kbytes): 127249316
> >Major (requiring I/O) page faults: 88500
> >Minor (reclaiming a frame) page faults: 3544520527
> >Voluntary context switches: 384117
> >Involuntary context switches: 545397
> >File system inputs: 111675896
> >File system outputs: 120
> >
> > R_MAX_VSIZE=125Gb
> >User time (seconds): 40246.83
> >System time (seconds): 4042.76
> >Percent of CPU this job got: 201%
> >Elapsed (wall clock) time (h:mm:ss or m:ss): 6:06:56
> >Maximum resident set size (kbytes): 127254200
> >Major (requiring I/O) page faults: 63867
> >Minor (reclaiming a frame) page faults: 3449493803
> >Voluntary context switches: 370753
> >Involuntary context switches: 614607
> >File system inputs: 106322880
> >File system outputs: 112
> >
> > R_MAX_VSIZE=100Gb
> >User time (seconds): 41837.10
> >System time (seconds): 3979.57
> >Percent of CPU this job got: 192%
> >Elapsed (wall clock) time (h:mm:ss or m:ss): 6:36:34
> >Maximum resident set size (kbytes): 127256940
> >Major (requiring I/O) page faults: 66829
> >Minor (reclaiming a frame) page faults: 3357778594
> >Voluntary context switches: 391149
> >Involuntary context switches: 646410
> >File system inputs: 106605648
> >File system outputs: 120
> >
> > On Fri, Nov 27, 2020 at 10:18 PM  wrote:
> >>
> >> On Thu, 26 Nov 2020, Jan Gorecki wrote:
> >>
> >>> Thank you Luke for looking into it. Your knowledge of gc is definitely
> >>> helpful here. I put comments inline below.
> >>>
> >>> Best,
> >>> Jan
> >>>
> >>> On Wed, Nov 25, 2020 at 10:38 PM  wrote:
> 
>  On Tue, 24 Nov 2020, Jan Gorecki wrote:
> 
> > As for other calls to system. I avoid calling system. In the past I
> > had some (to get memory stats from OS), but they were failing with
> > exactly the same issue. So yes, if I would add call to system before
> > calling quit, I believe it would fail with the same error.
> > At the same time I think (although I am not sure) that new allocations
> > made in R are working fine. So R seems to reserve some memory and can
> > continue to operate, while external call like system

Re: [Rd] [External] Re: .Internal(quit(...)): system call failed: Cannot allocate memory

2020-12-01 Thread luke-tierney

The fact that your max resident size isn't affected looks odd.  Are
you setting the environment variable outside R? When I run

env R_MAX_VSIZE=16Gb /usr/bin/time bin/Rscript jg.R 1e9 2e0 0 0

(your code in jg.R). I get a quick failure with 11785524maxresident)k

Best,

luke

On Tue, 1 Dec 2020, Jan Gorecki wrote:


Thank you Luke,

I tried your suggestion about R_MAX_VSIZE but I am not able to get the
error you are getting.
I tried recent R devel as I have seen you made a change to GC there.
My machine is 128GB, free -h reports 125GB available. I tried to set
128, 125 and 100. In all cases the result is "Command terminated by
signal 9". Each took around 6-6.5h.
Details below, if it tells you anything how could I optimize it (or
raise an exception early) please do let me know.

R 4.0.3

unset R_MAX_VSIZE
   User time (seconds): 40447.92
   System time (seconds): 4034.37
   Percent of CPU this job got: 201%
   Elapsed (wall clock) time (h:mm:ss or m:ss): 6:07:59
   Maximum resident set size (kbytes): 127261184
   Major (requiring I/O) page faults: 72441
   Minor (reclaiming a frame) page faults: 3315491751
   Voluntary context switches: 381446
   Involuntary context switches: 529554
   File system inputs: 108339200
   File system outputs: 120

R-devel 2020-11-27 r79522

unset R_MAX_VSIZE
   User time (seconds): 40713.52
   System time (seconds): 4039.52
   Percent of CPU this job got: 198%
   Elapsed (wall clock) time (h:mm:ss or m:ss): 6:15:52
   Maximum resident set size (kbytes): 127254796
   Major (requiring I/O) page faults: 72810
   Minor (reclaiming a frame) page faults: 3433589848
   Voluntary context switches: 384363
   Involuntary context switches: 609024
   File system inputs: 108467064
   File system outputs: 112

R_MAX_VSIZE=128Gb
   User time (seconds): 40411.13
   System time (seconds): 4227.99
   Percent of CPU this job got: 198%
   Elapsed (wall clock) time (h:mm:ss or m:ss): 6:14:01
   Maximum resident set size (kbytes): 127249316
   Major (requiring I/O) page faults: 88500
   Minor (reclaiming a frame) page faults: 3544520527
   Voluntary context switches: 384117
   Involuntary context switches: 545397
   File system inputs: 111675896
   File system outputs: 120

R_MAX_VSIZE=125Gb
   User time (seconds): 40246.83
   System time (seconds): 4042.76
   Percent of CPU this job got: 201%
   Elapsed (wall clock) time (h:mm:ss or m:ss): 6:06:56
   Maximum resident set size (kbytes): 127254200
   Major (requiring I/O) page faults: 63867
   Minor (reclaiming a frame) page faults: 3449493803
   Voluntary context switches: 370753
   Involuntary context switches: 614607
   File system inputs: 106322880
   File system outputs: 112

R_MAX_VSIZE=100Gb
   User time (seconds): 41837.10
   System time (seconds): 3979.57
   Percent of CPU this job got: 192%
   Elapsed (wall clock) time (h:mm:ss or m:ss): 6:36:34
   Maximum resident set size (kbytes): 127256940
   Major (requiring I/O) page faults: 66829
   Minor (reclaiming a frame) page faults: 3357778594
   Voluntary context switches: 391149
   Involuntary context switches: 646410
   File system inputs: 106605648
   File system outputs: 120

On Fri, Nov 27, 2020 at 10:18 PM  wrote:


On Thu, 26 Nov 2020, Jan Gorecki wrote:


Thank you Luke for looking into it. Your knowledge of gc is definitely
helpful here. I put comments inline below.

Best,
Jan

On Wed, Nov 25, 2020 at 10:38 PM  wrote:


On Tue, 24 Nov 2020, Jan Gorecki wrote:


As for other calls to system. I avoid calling system. In the past I
had some (to get memory stats from OS), but they were failing with
exactly the same issue. So yes, if I would add call to system before
calling quit, I believe it would fail with the same error.
At the same time I think (although I am not sure) that new allocations
made in R are working fine. So R seems to reserve some memory and can
continue to operate, while external call like system will fail. Maybe
it is like this by design, don't know.


Thanks for the report on quit(). We're exploring how to make the
cleanup on exit more robust to low memory situations like these.



Aside from this problem that is easy to report due to the warning
message, I think that gc() is choking at the same time. I tried to
make reproducible example for that, multiple times but couldn't, let
me try one more time.
It happens to manifest when there is 4e8+ unique characters/factors in
an R session. I am able to reproduce it using data.table and dplyr
(0.8.4 because 1.0.0+ fails even sooner), but using base R is not easy
because of the size. I described briefly problem in:
https://github.com/h2oai/db-benchmark/issues/110


Because of the design of R's character vectors, with each element
allocated separately, R is never going to be great at handling huge
numbers o

Re: [Rd] [External] Re: .Internal(quit(...)): system call failed: Cannot allocate memory

2020-12-01 Thread Jan Gorecki
Thank you Luke,

I tried your suggestion about R_MAX_VSIZE but I am not able to get the
error you are getting.
I tried recent R devel as I have seen you made a change to GC there.
My machine is 128GB, free -h reports 125GB available. I tried to set
128, 125 and 100. In all cases the result is "Command terminated by
signal 9". Each took around 6-6.5h.
Details below, if it tells you anything how could I optimize it (or
raise an exception early) please do let me know.

R 4.0.3

unset R_MAX_VSIZE
User time (seconds): 40447.92
System time (seconds): 4034.37
Percent of CPU this job got: 201%
Elapsed (wall clock) time (h:mm:ss or m:ss): 6:07:59
Maximum resident set size (kbytes): 127261184
Major (requiring I/O) page faults: 72441
Minor (reclaiming a frame) page faults: 3315491751
Voluntary context switches: 381446
Involuntary context switches: 529554
File system inputs: 108339200
File system outputs: 120

R-devel 2020-11-27 r79522

unset R_MAX_VSIZE
User time (seconds): 40713.52
System time (seconds): 4039.52
Percent of CPU this job got: 198%
Elapsed (wall clock) time (h:mm:ss or m:ss): 6:15:52
Maximum resident set size (kbytes): 127254796
Major (requiring I/O) page faults: 72810
Minor (reclaiming a frame) page faults: 3433589848
Voluntary context switches: 384363
Involuntary context switches: 609024
File system inputs: 108467064
File system outputs: 112

R_MAX_VSIZE=128Gb
User time (seconds): 40411.13
System time (seconds): 4227.99
Percent of CPU this job got: 198%
Elapsed (wall clock) time (h:mm:ss or m:ss): 6:14:01
Maximum resident set size (kbytes): 127249316
Major (requiring I/O) page faults: 88500
Minor (reclaiming a frame) page faults: 3544520527
Voluntary context switches: 384117
Involuntary context switches: 545397
File system inputs: 111675896
File system outputs: 120

R_MAX_VSIZE=125Gb
User time (seconds): 40246.83
System time (seconds): 4042.76
Percent of CPU this job got: 201%
Elapsed (wall clock) time (h:mm:ss or m:ss): 6:06:56
Maximum resident set size (kbytes): 127254200
Major (requiring I/O) page faults: 63867
Minor (reclaiming a frame) page faults: 3449493803
Voluntary context switches: 370753
Involuntary context switches: 614607
File system inputs: 106322880
File system outputs: 112

R_MAX_VSIZE=100Gb
User time (seconds): 41837.10
System time (seconds): 3979.57
Percent of CPU this job got: 192%
Elapsed (wall clock) time (h:mm:ss or m:ss): 6:36:34
Maximum resident set size (kbytes): 127256940
Major (requiring I/O) page faults: 66829
Minor (reclaiming a frame) page faults: 3357778594
Voluntary context switches: 391149
Involuntary context switches: 646410
File system inputs: 106605648
File system outputs: 120

On Fri, Nov 27, 2020 at 10:18 PM  wrote:
>
> On Thu, 26 Nov 2020, Jan Gorecki wrote:
>
> > Thank you Luke for looking into it. Your knowledge of gc is definitely
> > helpful here. I put comments inline below.
> >
> > Best,
> > Jan
> >
> > On Wed, Nov 25, 2020 at 10:38 PM  wrote:
> >>
> >> On Tue, 24 Nov 2020, Jan Gorecki wrote:
> >>
> >>> As for other calls to system. I avoid calling system. In the past I
> >>> had some (to get memory stats from OS), but they were failing with
> >>> exactly the same issue. So yes, if I would add call to system before
> >>> calling quit, I believe it would fail with the same error.
> >>> At the same time I think (although I am not sure) that new allocations
> >>> made in R are working fine. So R seems to reserve some memory and can
> >>> continue to operate, while external call like system will fail. Maybe
> >>> it is like this by design, don't know.
> >>
> >> Thanks for the report on quit(). We're exploring how to make the
> >> cleanup on exit more robust to low memory situations like these.
> >>
> >>>
> >>> Aside from this problem that is easy to report due to the warning
> >>> message, I think that gc() is choking at the same time. I tried to
> >>> make reproducible example for that, multiple times but couldn't, let
> >>> me try one more time.
> >>> It happens to manifest when there is 4e8+ unique characters/factors in
> >>> an R session. I am able to reproduce it using data.table and dplyr
> >>> (0.8.4 because 1.0.0+ fails even sooner), but using base R is not easy
> >>> because of the size. I described briefly problem in:
> >>> https://github.com/h2oai/db-benchmark/issues/110
> >>
> >> Because of the design of R's character vectors, with each element
> >> allocated separately, R is never going to be great at handling huge
> >> numbers of distinct strings. But it can do an adequate job given
> >> enough memory to work with.

Re: [Rd] [External] Re: .Internal(quit(...)): system call failed: Cannot allocate memory

2020-11-27 Thread luke-tierney

On Thu, 26 Nov 2020, Jan Gorecki wrote:


Thank you Luke for looking into it. Your knowledge of gc is definitely
helpful here. I put comments inline below.

Best,
Jan

On Wed, Nov 25, 2020 at 10:38 PM  wrote:


On Tue, 24 Nov 2020, Jan Gorecki wrote:


As for other calls to system. I avoid calling system. In the past I
had some (to get memory stats from OS), but they were failing with
exactly the same issue. So yes, if I would add call to system before
calling quit, I believe it would fail with the same error.
At the same time I think (although I am not sure) that new allocations
made in R are working fine. So R seems to reserve some memory and can
continue to operate, while external call like system will fail. Maybe
it is like this by design, don't know.


Thanks for the report on quit(). We're exploring how to make the
cleanup on exit more robust to low memory situations like these.



Aside from this problem that is easy to report due to the warning
message, I think that gc() is choking at the same time. I tried to
make reproducible example for that, multiple times but couldn't, let
me try one more time.
It happens to manifest when there is 4e8+ unique characters/factors in
an R session. I am able to reproduce it using data.table and dplyr
(0.8.4 because 1.0.0+ fails even sooner), but using base R is not easy
because of the size. I described briefly problem in:
https://github.com/h2oai/db-benchmark/issues/110


Because of the design of R's character vectors, with each element
allocated separately, R is never going to be great at handling huge
numbers of distinct strings. But it can do an adequate job given
enough memory to work with.

When I run your GitHub issue example on a machine with around 500 Gb
of RAM it seems to run OK; /usr/bin/time reports

2706.89user 161.89system 37:10.65elapsed 128%CPU (0avgtext+0avgdata 
92180796maxresident)k
0inputs+103450552outputs (0major+38716351minor)pagefaults 0swaps

So the memory footprint is quite large. Using gc.time() it looks like
about 1/3 of the time is in GC. Not ideal, and maybe could be improved
on a bit, but probably not by much. The GC is basically doing an
adequate job, given enough RAM.


Agree, 1/3 is a lot but still acceptable. So this strictly is not
something that requires intervention.
PS. I wasn't aware of gc.time(), it may be worth linking it from
SeeAlso in gc() manual.



If you run this example on a system without enough RAM, or with other
programs competing for RAM, you are likely to end up fighting with
your OS/hardware's virtual memory system. When I try to run it on a
16Gb system it churns for an hour or so before getting killed, and
/usr/bin/time reports a huge number of page faults:

312523816inputs+0outputs (24761285major+25762068minor)pagefaults 0swaps

You are probably experiencing something similar.


Yes, this is exactly what I am experiencing.
The machine is a bare metal machine of 128GB mem, csv size 50GB,
data.frame size 74GB.
In my case it churns for ~3h before it gets killed with SIGINT from
the parent R process which uses 3h as a timeout for this script.
This is something I would like to be addressed because gc time is far
bigger than actual computation time. This is not really acceptable, I
would prefer to raise an exception instead.



There may be opportunities for more tuning of the GC to better handle
running this close to memory limits, but I doubt the payoff would be
worth the effort.


If you don't have plans/time to work on that anytime soon, then I can
fill bugzilla for this problem so it won't get lost in the mailing
list.


I'm not convinced anything useful can be done that would work well for
your application without working badly for others.

If you want to drive this close to your memory limits you are probably
going to have to take responsibility for some tuning at your end. One
option in ?Memory you might try is the R_MAX_VSIZE environment
variable. On my 16Gb machine with R_MAX_VSIZE=16Gb your example fails
very quickly with

Error: vector memory exhausted (limit reached?)

rather than churning for an hour trying to make things work. Setting
memory and/or virtual memory limits in your shell is another option.

Best,

luke






Best,

luke


It would help if gcinfo() could take FALSE/TRUE/2L where 2L will print
even more information about gc, like how much time the each gc()
process took, how many objects it has to check on each level.

Best regards,
Jan



On Tue, Nov 24, 2020 at 1:05 PM Tomas Kalibera  wrote:


On 11/24/20 11:27 AM, Jan Gorecki wrote:

Thanks Bill for checking that.
It was my impression that warnings are raised from some internal
system calls made when quitting R. At that point I don't have much
control over checking the return status of those.
Your suggestion looks good to me.

Tomas, do you think this could help? could this be implemented?


I think this is a good suggestion. Deleting files on Unix was changed
from system("rm") to doing that in C, and deleting the session directory
sho

Re: [Rd] [External] Re: .Internal(quit(...)): system call failed: Cannot allocate memory

2020-11-26 Thread Jan Gorecki
Thank you Luke for looking into it. Your knowledge of gc is definitely
helpful here. I put comments inline below.

Best,
Jan

On Wed, Nov 25, 2020 at 10:38 PM  wrote:
>
> On Tue, 24 Nov 2020, Jan Gorecki wrote:
>
> > As for other calls to system. I avoid calling system. In the past I
> > had some (to get memory stats from OS), but they were failing with
> > exactly the same issue. So yes, if I would add call to system before
> > calling quit, I believe it would fail with the same error.
> > At the same time I think (although I am not sure) that new allocations
> > made in R are working fine. So R seems to reserve some memory and can
> > continue to operate, while external call like system will fail. Maybe
> > it is like this by design, don't know.
>
> Thanks for the report on quit(). We're exploring how to make the
> cleanup on exit more robust to low memory situations like these.
>
> >
> > Aside from this problem that is easy to report due to the warning
> > message, I think that gc() is choking at the same time. I tried to
> > make reproducible example for that, multiple times but couldn't, let
> > me try one more time.
> > It happens to manifest when there is 4e8+ unique characters/factors in
> > an R session. I am able to reproduce it using data.table and dplyr
> > (0.8.4 because 1.0.0+ fails even sooner), but using base R is not easy
> > because of the size. I described briefly problem in:
> > https://github.com/h2oai/db-benchmark/issues/110
>
> Because of the design of R's character vectors, with each element
> allocated separately, R is never going to be great at handling huge
> numbers of distinct strings. But it can do an adequate job given
> enough memory to work with.
>
> When I run your GitHub issue example on a machine with around 500 Gb
> of RAM it seems to run OK; /usr/bin/time reports
>
> 2706.89user 161.89system 37:10.65elapsed 128%CPU (0avgtext+0avgdata 
> 92180796maxresident)k
> 0inputs+103450552outputs (0major+38716351minor)pagefaults 0swaps
>
> So the memory footprint is quite large. Using gc.time() it looks like
> about 1/3 of the time is in GC. Not ideal, and maybe could be improved
> on a bit, but probably not by much. The GC is basically doing an
> adequate job, given enough RAM.

Agree, 1/3 is a lot but still acceptable. So this strictly is not
something that requires intervention.
PS. I wasn't aware of gc.time(), it may be worth linking it from
SeeAlso in gc() manual.

>
> If you run this example on a system without enough RAM, or with other
> programs competing for RAM, you are likely to end up fighting with
> your OS/hardware's virtual memory system. When I try to run it on a
> 16Gb system it churns for an hour or so before getting killed, and
> /usr/bin/time reports a huge number of page faults:
>
> 312523816inputs+0outputs (24761285major+25762068minor)pagefaults 0swaps
>
> You are probably experiencing something similar.

Yes, this is exactly what I am experiencing.
The machine is a bare metal machine of 128GB mem, csv size 50GB,
data.frame size 74GB.
In my case it churns for ~3h before it gets killed with SIGINT from
the parent R process which uses 3h as a timeout for this script.
This is something I would like to be addressed because gc time is far
bigger than actual computation time. This is not really acceptable, I
would prefer to raise an exception instead.

>
> There may be opportunities for more tuning of the GC to better handle
> running this close to memory limits, but I doubt the payoff would be
> worth the effort.

If you don't have plans/time to work on that anytime soon, then I can
fill bugzilla for this problem so it won't get lost in the mailing
list.


>
> Best,
>
> luke
>
> > It would help if gcinfo() could take FALSE/TRUE/2L where 2L will print
> > even more information about gc, like how much time the each gc()
> > process took, how many objects it has to check on each level.
> >
> > Best regards,
> > Jan
> >
> >
> >
> > On Tue, Nov 24, 2020 at 1:05 PM Tomas Kalibera  
> > wrote:
> >>
> >> On 11/24/20 11:27 AM, Jan Gorecki wrote:
> >>> Thanks Bill for checking that.
> >>> It was my impression that warnings are raised from some internal
> >>> system calls made when quitting R. At that point I don't have much
> >>> control over checking the return status of those.
> >>> Your suggestion looks good to me.
> >>>
> >>> Tomas, do you think this could help? could this be implemented?
> >>
> >> I think this is a good suggestion. Deleting files on Unix was changed
> >> from system("rm") to doing that in C, and deleting the session directory
> >> should follow.
> >>
> >> It might also help diagnosing your problem, but I don't think it would
> >> solve it. If the diagnostics in R works fine and the OS was so
> >> hopelessly out of memory that it couldn't run any more external
> >> processes, then really this is not a problem of R, but of having
> >> exhausted the resources. And it would be a coincidence that just this
> >> particular call to "system" at the end of the 

Re: [Rd] [External] Re: .Internal(quit(...)): system call failed: Cannot allocate memory

2020-11-25 Thread luke-tierney

On Tue, 24 Nov 2020, Jan Gorecki wrote:


As for other calls to system. I avoid calling system. In the past I
had some (to get memory stats from OS), but they were failing with
exactly the same issue. So yes, if I would add call to system before
calling quit, I believe it would fail with the same error.
At the same time I think (although I am not sure) that new allocations
made in R are working fine. So R seems to reserve some memory and can
continue to operate, while external call like system will fail. Maybe
it is like this by design, don't know.


Thanks for the report on quit(). We're exploring how to make the
cleanup on exit more robust to low memory situations like these.



Aside from this problem that is easy to report due to the warning
message, I think that gc() is choking at the same time. I tried to
make reproducible example for that, multiple times but couldn't, let
me try one more time.
It happens to manifest when there is 4e8+ unique characters/factors in
an R session. I am able to reproduce it using data.table and dplyr
(0.8.4 because 1.0.0+ fails even sooner), but using base R is not easy
because of the size. I described briefly problem in:
https://github.com/h2oai/db-benchmark/issues/110


Because of the design of R's character vectors, with each element
allocated separately, R is never going to be great at handling huge
numbers of distinct strings. But it can do an adequate job given
enough memory to work with.

When I run your GitHub issue example on a machine with around 500 Gb
of RAM it seems to run OK; /usr/bin/time reports

2706.89user 161.89system 37:10.65elapsed 128%CPU (0avgtext+0avgdata 
92180796maxresident)k
0inputs+103450552outputs (0major+38716351minor)pagefaults 0swaps

So the memory footprint is quite large. Using gc.time() it looks like
about 1/3 of the time is in GC. Not ideal, and maybe could be improved
on a bit, but probably not by much. The GC is basically doing an
adequate job, given enough RAM.

If you run this example on a system without enough RAM, or with other
programs competing for RAM, you are likely to end up fighting with
your OS/hardware's virtual memory system. When I try to run it on a
16Gb system it churns for an hour or so before getting killed, and
/usr/bin/time reports a huge number of page faults:

312523816inputs+0outputs (24761285major+25762068minor)pagefaults 0swaps

You are probably experiencing something similar.

There may be opportunities for more tuning of the GC to better handle
running this close to memory limits, but I doubt the payoff would be
worth the effort.

Best,

luke


It would help if gcinfo() could take FALSE/TRUE/2L where 2L will print
even more information about gc, like how much time the each gc()
process took, how many objects it has to check on each level.

Best regards,
Jan



On Tue, Nov 24, 2020 at 1:05 PM Tomas Kalibera  wrote:


On 11/24/20 11:27 AM, Jan Gorecki wrote:

Thanks Bill for checking that.
It was my impression that warnings are raised from some internal
system calls made when quitting R. At that point I don't have much
control over checking the return status of those.
Your suggestion looks good to me.

Tomas, do you think this could help? could this be implemented?


I think this is a good suggestion. Deleting files on Unix was changed
from system("rm") to doing that in C, and deleting the session directory
should follow.

It might also help diagnosing your problem, but I don't think it would
solve it. If the diagnostics in R works fine and the OS was so
hopelessly out of memory that it couldn't run any more external
processes, then really this is not a problem of R, but of having
exhausted the resources. And it would be a coincidence that just this
particular call to "system" at the end of the session did not work.
Anything else could break as well close to the end of the script. This
seems the most likely explanation to me.

Do you get this warning repeatedly, reproducibly at least in slightly
different scripts at the very end, with this warning always from quit()?
So that the "call" part of the warning message has .Internal(quit) like
in the case you posted? Would adding another call to "system" before the
call to "q()" work - with checking the return value? If it is always
only the last call to "system" in "q()", then it is suspicious, perhaps
an indication that some diagnostics in R is not correct. In that case, a
reproducible example would be the key - so either if you could diagnose
on your end what is the problem, or create a reproducible example that
someone else can use to reproduce and debug.

Best
Tomas



On Mon, Nov 23, 2020 at 7:10 PM Bill Dunlap  wrote:

The call to system() probably is an internal call used to delete the session's 
tempdir().  This sort of failure means that a potentially large amount of disk 
space is not being recovered when R is done.  Perhaps R_CleanTempDir() could 
call R_unlink() instead of having a subprocess call 'rm -rf ...'.  Then it 
could also issue a specific wa