Re: [Rd] Memory leak with tons of closed connections
> Gábor Csárdi on Sun, 13 Nov 2016 20:49:57 +0000 writes:

> Using dup() before fdopen() (and calling fclose() on the connection
> when it is closed) indeed fixes the memory leak.

Thank you, Gábor! Yes, I can confirm that this fixes the memory leak.
I'm testing ('make check-all') currently, and then will (probably)
commit the patch to R-devel only for the time being.

Martin

> FYI,
> Gabor

> Index: src/main/connections.c
> ===================================================================
> --- src/main/connections.c    (revision 71653)
> +++ src/main/connections.c    (working copy)
> @@ -576,7 +576,7 @@
>          fp = R_fopen(name, con->mode);
>      } else {  /* use file("stdin") to refer to the file and not the console */
>  #ifdef HAVE_FDOPEN
> -        fp = fdopen(0, con->mode);
> +        fp = fdopen(dup(0), con->mode);
>  #else
>          warning(_("cannot open file '%s': %s"), name,
>                  "fdopen is not supported on this platform");
> @@ -633,8 -633,7 @@
>  static void file_close(Rconnection con)
>  {
>      Rfileconn this = con->private;
> -    if(con->isopen && strcmp(con->description, "stdin"))
> -        con->status = fclose(this->fp);
> +    con->status = fclose(this->fp);
>      con->isopen = FALSE;
>  #ifdef Win32
>      if(this->anon_file) unlink(this->name);

> On Fri, Nov 11, 2016 at 1:12 PM, Gábor Csárdi wrote:
>> On Fri, Nov 11, 2016 at 12:46 PM, Gergely Daróczi wrote:
>> [...]
>>>> I've changed the above to *print* the gc() result every 1000th
>>>> iteration, and after 100'000 iterations, there is still no
>>>> memory increase from the point of view of R itself.
>>
>> Yes, R does not know about it; it does not manage this memory (any
>> more), but the R process requested this memory from the OS and never
>> gave it back, which is basically the definition of a memory leak. No?
>>
>> I think the leak is because 'stdin' is special and R opens it with fdopen():
>> https://github.com/wch/r-source/blob/f8cdadb769561970cc42776f563043ea5e12fe05/src/main/connections.c#L561-L579
>>
>> and then it does not close it:
>> https://github.com/wch/r-source/blob/f8cdadb769561970cc42776f563043ea5e12fe05/src/main/connections.c#L636
>>
>> I understand that R cannot fclose() the FILE*, because that would also
>> close the file descriptor, but this causes a memory leak anyway. I think.
>>
>> It seems that you cannot close the FILE* without closing the
>> descriptor, so one workaround would be to keep a single FILE* open
>> instead of calling fdopen() to create a new one every time. Another
>> possible workaround is to use dup(), but I don't know enough about the
>> details to be sure.
>>
>> Gabor
>>
>> [...]
Re: [Rd] Memory leak with tons of closed connections
Using dup() before fdopen() (and calling fclose() on the connection
when it is closed) indeed fixes the memory leak.

FYI,
Gabor

Index: src/main/connections.c
===================================================================
--- src/main/connections.c    (revision 71653)
+++ src/main/connections.c    (working copy)
@@ -576,7 +576,7 @@
         fp = R_fopen(name, con->mode);
     } else {  /* use file("stdin") to refer to the file and not the console */
 #ifdef HAVE_FDOPEN
-        fp = fdopen(0, con->mode);
+        fp = fdopen(dup(0), con->mode);
 #else
         warning(_("cannot open file '%s': %s"), name,
                 "fdopen is not supported on this platform");
@@ -633,8 -633,7 @@
 static void file_close(Rconnection con)
 {
     Rfileconn this = con->private;
-    if(con->isopen && strcmp(con->description, "stdin"))
-        con->status = fclose(this->fp);
+    con->status = fclose(this->fp);
     con->isopen = FALSE;
 #ifdef Win32
     if(this->anon_file) unlink(this->name);

On Fri, Nov 11, 2016 at 1:12 PM, Gábor Csárdi wrote:
> On Fri, Nov 11, 2016 at 12:46 PM, Gergely Daróczi wrote:
> [...]
>>> I've changed the above to *print* the gc() result every 1000th
>>> iteration, and after 100'000 iterations, there is still no
>>> memory increase from the point of view of R itself.
>
> Yes, R does not know about it; it does not manage this memory (any
> more), but the R process requested this memory from the OS and never
> gave it back, which is basically the definition of a memory leak. No?
>
> I think the leak is because 'stdin' is special and R opens it with fdopen():
> https://github.com/wch/r-source/blob/f8cdadb769561970cc42776f563043ea5e12fe05/src/main/connections.c#L561-L579
>
> and then it does not close it:
> https://github.com/wch/r-source/blob/f8cdadb769561970cc42776f563043ea5e12fe05/src/main/connections.c#L636
>
> I understand that R cannot fclose() the FILE*, because that would also
> close the file descriptor, but this causes a memory leak anyway. I think.
>
> It seems that you cannot close the FILE* without closing the
> descriptor, so one workaround would be to keep a single FILE* open
> instead of calling fdopen() to create a new one every time. Another
> possible workaround is to use dup(), but I don't know enough about the
> details to be sure.
>
> Gabor
>
> [...]
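For illustration, here is a minimal, self-contained C sketch (hypothetical code, not taken from the R sources; the line-reading loop and buffer size are invented for the example) of the pattern the patch adopts: dup() the descriptor first, hand the duplicate to fdopen(), and then fclose() can release the FILE without disturbing the original descriptor 0:

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char buf[256];
    for (;;) {
        int fd = dup(0);                 /* private duplicate of stdin's fd */
        if (fd < 0)
            break;
        FILE *fp = fdopen(fd, "r");      /* the FILE now owns the duplicate */
        if (!fp) {
            close(fd);
            break;
        }
        /* Unbuffered, so fclose() below cannot silently discard data that
           was read ahead into a stdio buffer the next iteration needs. */
        setvbuf(fp, NULL, _IONBF, 0);
        if (!fgets(buf, sizeof buf, fp)) {  /* EOF or error */
            fclose(fp);
            break;
        }
        fclose(fp);  /* frees the FILE and closes fd; fd 0 itself stays open */
    }
    return 0;
}

Each iteration allocates one FILE and frees it again, so the footprint stays flat, which is the behaviour Martin confirms above.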
Re: [Rd] Memory leak with tons of closed connections
On Fri, Nov 11, 2016 at 12:46 PM, Gergely Daróczi wrote:
[...]
>> I've changed the above to *print* the gc() result every 1000th
>> iteration, and after 100'000 iterations, there is still no
>> memory increase from the point of view of R itself.

Yes, R does not know about it; it does not manage this memory (any
more), but the R process requested this memory from the OS and never
gave it back, which is basically the definition of a memory leak. No?

I think the leak is because 'stdin' is special and R opens it with fdopen():
https://github.com/wch/r-source/blob/f8cdadb769561970cc42776f563043ea5e12fe05/src/main/connections.c#L561-L579

and then it does not close it:
https://github.com/wch/r-source/blob/f8cdadb769561970cc42776f563043ea5e12fe05/src/main/connections.c#L636

I understand that R cannot fclose() the FILE*, because that would also
close the file descriptor, but this causes a memory leak anyway. I think.

It seems that you cannot close the FILE* without closing the
descriptor, so one workaround would be to keep a single FILE* open
instead of calling fdopen() to create a new one every time. Another
possible workaround is to use dup(), but I don't know enough about the
details to be sure.

Gabor

[...]
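To illustrate the mechanism, here is a hypothetical standalone C model (Linux-only, since it reads /proc/self/status; it is not the actual R code): it opens a fresh FILE* on descriptor 0 for every line, never fclose()s it, and prints its resident set size as it runs:

#include <stdio.h>
#include <string.h>

/* Print this process's VmRSS line from /proc/self/status (Linux). */
static void print_rss(void)
{
    char line[256];
    FILE *st = fopen("/proc/self/status", "r");
    if (!st)
        return;
    while (fgets(line, sizeof line, st))
        if (strncmp(line, "VmRSS:", 6) == 0) {
            fputs(line, stderr);
            break;
        }
    fclose(st);
}

int main(void)
{
    char buf[256];
    for (long i = 0; ; i++) {
        /* A fresh FILE* for fd 0 on every iteration, as the pre-patch
           R code does for file("stdin") ... */
        FILE *fp = fdopen(0, "r");
        if (!fp || !fgets(buf, sizeof buf, fp))
            break;
        /* ... but no fclose(fp), so the FILE object and the stdio buffer
           it allocated on first read are never freed.  (Input read ahead
           into each abandoned buffer is lost too, which is incidental to
           the demonstration.) */
        if (i % 10000 == 0)
            print_rss();
    }
    return 0;
}

Fed with something like 'yes | ./leakdemo', VmRSS should climb steadily even though the program keeps no reference to the abandoned FILEs: they are unreachable to the program but never returned to the allocator, matching the htop observations in this thread.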
Re: [Rd] Memory leak with tons of closed connections
On Fri, Nov 11, 2016 at 12:08 PM, Martin Maechler wrote:
>> Gergely Daróczi on Thu, 10 Nov 2016 16:48:12 +0100 writes:

>> Dear All,

>> I'm developing an R application that runs inside a Java daemon on
>> multiple threads and interacts with the parent daemon via stdin and
>> stdout.

>> Everything works perfectly fine, except that there is a memory leak
>> somewhere. Simplified version of the R app:

>> while (TRUE) {
>>     con <- file('stdin', open = 'r', blocking = TRUE)
>>     line <- scan(con, what = character(0), nlines = 1, quiet = TRUE)
>>     close(con)
>> }

>> This loop uses more and more RAM as time passes (see more on this
>> below); I am not sure why, and I currently have no idea how to debug
>> this further. Can someone please try to reproduce it and give me some
>> hints on what the problem is?

>> Sample bash script to trigger an R process with such a memory leak:

>> Rscript --vanilla -e "while(TRUE)cat(runif(1),'\n')" | Rscript
>> --vanilla -e "cat(Sys.getpid(),'\n');while(TRUE){con<-file('stdin',open='r',blocking=TRUE);line<-scan(con,what=character(0),nlines=1,quiet=TRUE);close(con);rm(con);gc()}"

>> Maybe you have to escape '\n' depending on your shell.

>> Thanks for reading this, and any hints would be highly appreciated!

> I have no hints, sorry... but I can give some more "data":

> I've changed the above to *print* the gc() result every 1000th
> iteration, and after 100'000 iterations, there is still no
> memory increase from the point of view of R itself.

> However, monitoring the process (via 'htop', e.g.) shows about
> 1 MB per second increase in the memory footprint of the process.

> One could argue that the error is with the OS / pipe / bash
> rather than with R itself... but I'm not expert enough to
> argue here at all.

> Here's my version of your sample bash script and its output:

> $ Rscript --vanilla -e "while(TRUE)cat(runif(1),'\n')" | Rscript --vanilla -e "cat(Sys.getpid(),'\n');i <- 0; while(TRUE){con<-file('stdin',open='r',blocking=TRUE);line<-scan(con,what=character(0),nlines=1,quiet=TRUE);close(con);rm(con);a <- gc(); i <- i+1; if(i %% 1000 == 1) {cat('i=',i,'\\n'); print(a)} }"

> 11059
> i= 1
>          used (Mb) gc trigger  (Mb) max used (Mb)
> Ncells  83216  4.5   10000000 534.1   213529 11.5
> Vcells 172923  1.4   16777216 128.0   562476  4.3
> i= 1001
>          used (Mb) gc trigger  (Mb) max used (Mb)
> Ncells  83255  4.5   10000000 534.1   213529 11.5
> Vcells 172958  1.4   16777216 128.0   562476  4.3
> ...
> ...
> ...
> ...
> i= 80001
>          used (Mb) gc trigger  (Mb) max used (Mb)
> Ncells  83255  4.5   10000000 534.1   213529 11.5
> Vcells 172958  1.4   16777216 128.0   562476  4.3
> i= 81001
>          used (Mb) gc trigger  (Mb) max used (Mb)
> Ncells  83255  4.5   10000000 534.1   213529 11.5
> Vcells 172959  1.4   16777216 128.0   562476  4.3
> i= 82001
>          used (Mb) gc trigger  (Mb) max used (Mb)
> Ncells  83255  4.5   10000000 534.1   213529 11.5
> Vcells 172959  1.4   16777216 128.0   562476  4.3
> i= 83001
>          used (Mb) gc trigger  (Mb) max used (Mb)
> Ncells  83255  4.5   10000000 534.1   213529 11.5
> Vcells 172958  1.4   16777216 128.0   562476  4.3
> i= 84001
>          used (Mb) gc trigger  (Mb) max used (Mb)
> Ncells  83255  4.5   10000000 534.1   213529 11.5
> Vcells 172958  1.4   16777216 128.0   562476  4.3

Thank you very much, this was very useful!

I tried to do some more research on this, as Gábor Csárdi also
suspected that the memory growth might be due to the writer being
faster than the reader, so that data is simply accumulating in the
input buffer of the reader.
I double-checked this via:

Rscript --vanilla -e "i<-1;while(TRUE){cat(runif(1),'\n');i<-i+1;if(i==1e6){Sys.sleep(15);i<-1}}" | Rscript --vanilla -e "cat(Sys.getpid(),'\n');i<-0;while(TRUE){con<-file('stdin',open='r',blocking=TRUE);line<-scan(con,what=character(0),nlines=1,quiet=TRUE);close(con);rm(con);a<-gc();i<-i+1;if(i%%1e3==1){cat('i=',i,'\\n');print(a)}}"

So the writer generates a million lines, then sleeps for 15 seconds so
that the reader can catch up. Monitoring the memory footprint of the
process (by the way, gc reported no memory increase in the reader, just
like in Martin's output) shows that the memory grows while the writer
sends data and stays constant while the writer is sleeping, but it
never decreases: http://imgur.com/r7T02pK

Maybe it's more of an OS-specific question based on this, you are
absolutely right, but I was not able to reproduce the same memory issue
in plain bash via:

while :;do echo '1';done | bash -c "while :;do read;done"

But I'm not sure if this does exactly the same thing.
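A direct way to test the "data piling up in the pipe" hypothesis, instead of inferring it from the process footprint, would be the FIONREAD ioctl, which reports how many unread bytes are currently queued on a descriptor. A hypothetical C sketch (not from the thread; the program name and message text are invented):

#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
    int backlog;
    /* FIONREAD: number of bytes currently readable from stdin,
       i.e. sitting unread in the pipe. */
    if (ioctl(STDIN_FILENO, FIONREAD, &backlog) == -1) {
        perror("ioctl(FIONREAD)");
        return 1;
    }
    fprintf(stderr, "bytes queued in stdin pipe: %d\n", backlog);
    return 0;
}

Note also that a Linux pipe holds at most 64 KiB by default, so the kernel-side backlog is bounded; an ever-growing footprint has to live in the reader process itself, which is consistent with the plot above.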
Re: [Rd] Memory leak with tons of closed connections
> Gergely Daróczi on Thu, 10 Nov 2016 16:48:12 +0100 writes:

> Dear All,

> I'm developing an R application that runs inside a Java daemon on
> multiple threads and interacts with the parent daemon via stdin and
> stdout.

> Everything works perfectly fine, except that there is a memory leak
> somewhere. Simplified version of the R app:

> while (TRUE) {
>     con <- file('stdin', open = 'r', blocking = TRUE)
>     line <- scan(con, what = character(0), nlines = 1, quiet = TRUE)
>     close(con)
> }

> This loop uses more and more RAM as time passes (see more on this
> below); I am not sure why, and I currently have no idea how to debug
> this further. Can someone please try to reproduce it and give me some
> hints on what the problem is?

> Sample bash script to trigger an R process with such a memory leak:

> Rscript --vanilla -e "while(TRUE)cat(runif(1),'\n')" | Rscript
> --vanilla -e "cat(Sys.getpid(),'\n');while(TRUE){con<-file('stdin',open='r',blocking=TRUE);line<-scan(con,what=character(0),nlines=1,quiet=TRUE);close(con);rm(con);gc()}"

> Maybe you have to escape '\n' depending on your shell.

> Thanks for reading this, and any hints would be highly appreciated!

I have no hints, sorry... but I can give some more "data":

I've changed the above to *print* the gc() result every 1000th
iteration, and after 100'000 iterations, there is still no
memory increase from the point of view of R itself.

However, monitoring the process (via 'htop', e.g.) shows about
1 MB per second increase in the memory footprint of the process.

One could argue that the error is with the OS / pipe / bash rather
than with R itself... but I'm not expert enough to argue here at all.

Here's my version of your sample bash script and its output:

$ Rscript --vanilla -e "while(TRUE)cat(runif(1),'\n')" | Rscript --vanilla -e "cat(Sys.getpid(),'\n');i <- 0; while(TRUE){con<-file('stdin',open='r',blocking=TRUE);line<-scan(con,what=character(0),nlines=1,quiet=TRUE);close(con);rm(con);a <- gc(); i <- i+1; if(i %% 1000 == 1) {cat('i=',i,'\\n'); print(a)} }"

11059
i= 1
         used (Mb) gc trigger  (Mb) max used (Mb)
Ncells  83216  4.5   10000000 534.1   213529 11.5
Vcells 172923  1.4   16777216 128.0   562476  4.3
i= 1001
         used (Mb) gc trigger  (Mb) max used (Mb)
Ncells  83255  4.5   10000000 534.1   213529 11.5
Vcells 172958  1.4   16777216 128.0   562476  4.3
...
...
...
...
i= 80001
         used (Mb) gc trigger  (Mb) max used (Mb)
Ncells  83255  4.5   10000000 534.1   213529 11.5
Vcells 172958  1.4   16777216 128.0   562476  4.3
i= 81001
         used (Mb) gc trigger  (Mb) max used (Mb)
Ncells  83255  4.5   10000000 534.1   213529 11.5
Vcells 172959  1.4   16777216 128.0   562476  4.3
i= 82001
         used (Mb) gc trigger  (Mb) max used (Mb)
Ncells  83255  4.5   10000000 534.1   213529 11.5
Vcells 172959  1.4   16777216 128.0   562476  4.3
i= 83001
         used (Mb) gc trigger  (Mb) max used (Mb)
Ncells  83255  4.5   10000000 534.1   213529 11.5
Vcells 172958  1.4   16777216 128.0   562476  4.3
i= 84001
         used (Mb) gc trigger  (Mb) max used (Mb)
Ncells  83255  4.5   10000000 534.1   213529 11.5
Vcells 172958  1.4   16777216 128.0   562476  4.3

> Best,
> Gergely

> PS1: see the image posted at
> http://stackoverflow.com/questions/40522584/memory-leak-with-closed-connections
> for memory usage over time

> PS2: the issue doesn't seem to be caused by the first R app writing
> more data than the second R app can handle; I tried the same with a
> Sys.sleep(0.01) added to the first app, and that's not an issue at all
> in the real application

> PS3: I also tried using stdin() instead of file('stdin'), but that did
> not work well for the stream running on multiple threads started by
> the same parent Java daemon

> PS4: I've tried this on Linux using R 3.2.3 and 3.3.2

For me, it's Linux, too (Fedora 24), using 'R 3.3.2 patched'.

Martin
[Rd] Memory leak with tons of closed connections
Dear All,

I'm developing an R application that runs inside a Java daemon on
multiple threads and interacts with the parent daemon via stdin and
stdout.

Everything works perfectly fine, except that there is a memory leak
somewhere. Simplified version of the R app:

while (TRUE) {
    con <- file('stdin', open = 'r', blocking = TRUE)
    line <- scan(con, what = character(0), nlines = 1, quiet = TRUE)
    close(con)
}

This loop uses more and more RAM as time passes (see more on this
below); I am not sure why, and I currently have no idea how to debug
this further. Can someone please try to reproduce it and give me some
hints on what the problem is?

Sample bash script to trigger an R process with such a memory leak:

Rscript --vanilla -e "while(TRUE)cat(runif(1),'\n')" | Rscript --vanilla -e "cat(Sys.getpid(),'\n');while(TRUE){con<-file('stdin',open='r',blocking=TRUE);line<-scan(con,what=character(0),nlines=1,quiet=TRUE);close(con);rm(con);gc()}"

Maybe you have to escape '\n' depending on your shell.

Thanks for reading this, and any hints would be highly appreciated!

Best,
Gergely

PS1: see the image posted at
http://stackoverflow.com/questions/40522584/memory-leak-with-closed-connections
for memory usage over time

PS2: the issue doesn't seem to be caused by the first R app writing
more data than the second R app can handle; I tried the same with a
Sys.sleep(0.01) added to the first app, and that's not an issue at all
in the real application

PS3: I also tried using stdin() instead of file('stdin'), but that did
not work well for the stream running on multiple threads started by
the same parent Java daemon

PS4: I've tried this on Linux using R 3.2.3 and 3.3.2