Re: [R] Memory management in R
> Date: Sun, 10 Oct 2010 15:27:11 +0200
> From: lorenzo.ise...@gmail.com
> To: dwinsem...@comcast.net
> CC: r-help@r-project.org
> Subject: Re: [R] Memory management in R
>
> > I already offered the Biostrings package. It provides more robust
> > methods for string matching than does grepl. Is there a reason that you
> > choose not to?
>
> Indeed that is the way I should go, and I have installed the package
> after some struggling. Since Biostrings is a fairly complex package and I
> need only a way to check if a certain string A is a substring of string B,
> do you know the Biostrings functions to achieve this?
> I see a lot of methods for biological (DNA, RNA) sequences, and they may
> not apply to my series (which are definitely not from biology).

Generally the differences relate to the alphabet and "things you may want
to know about them." Unless you are looking for reverse-complement text
strings, there will be a lot of stuff you don't need. Offhand, I'd be
looking at things like computational linguistics packages, since you are
trying to find patterns or predictability in human-readable character
sequences. Now, humans can probably write hairpin text (look at what RNA
can do, LOL), but this is probably not what you care about.

However, as I mentioned earlier, I had to write my own regex compiler
(coincidentally, for bio apps) to get the required performance. Your
application and understanding may benefit from things like building
dictionaries, which aren't really part of regex and can easily be done in
a few lines of C++ code using STL containers. To get statistically
meaningful samples, you will almost certainly need faster code.

> Cheers
>
> Lorenzo
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Memory management in R
> I already offered the Biostrings package. It provides more robust
> methods for string matching than does grepl. Is there a reason that you
> choose not to?

Indeed that is the way I should go, and I have installed the package
after some struggling. Since Biostrings is a fairly complex package and I
need only a way to check if a certain string A is a substring of string B,
do you know the Biostrings functions to achieve this?
I see a lot of methods for biological (DNA, RNA) sequences, and they may
not apply to my series (which are definitely not from biology).

Cheers

Lorenzo
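
For the narrow question of whether string A occurs inside string B, base R may already be enough: a minimal sketch, assuming the pattern contains no regex syntax that should be interpreted. Passing `fixed = TRUE` to `grepl()` requests literal matching, so no regular expression is compiled. The vectors below are made up for illustration and are not the data from this thread.

```r
# Literal (non-regex) containment test with base R.
# The token vectors are invented for illustration only.
tokens_a <- rep("aa", 240)
a <- paste(tokens_a, collapse = "#")                  # 240 tokens joined by "#"
b <- paste(c("zz", tokens_a, "yy"), collapse = "#")   # a, with extra tokens around it

# fixed = TRUE treats the pattern as a plain string, not a regex.
a_in_b <- grepl(a, b, fixed = TRUE)   # is a contained in b?
b_in_a <- grepl(b, a, fixed = TRUE)   # is b contained in a?

print(c(a_in_b = a_in_b, b_in_a = b_in_a))
```

Note that matching on "#"-joined strings compares characters, not tokens, so "a#b" is also found inside "aa#bb"; whether that matters depends on the data.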
Re: [R] Memory management in R
On Oct 9, 2010, at 4:23 PM, Lorenzo Isella wrote:

>> My suggestion is to explore other alternatives. (I will admit that I
>> don't yet fully understand the test that you are applying.)
>
> Hi,
> I am trying to partially implement the Lempel-Ziv compression algorithm.
> The point is that compressibility and entropy of a time series are
> related, hence my final goal is to evaluate the entropy of a time series.
> You can find more at
>
> http://bit.ly/93zX4T
> http://en.wikipedia.org/wiki/LZ77_and_LZ78
> http://bit.ly/9NgIFt
>
>> The two that have occurred to me are Biostrings, which I have already
>> mentioned, and rle(), which I have illustrated the use of but not
>> referenced as an avenue. The Biostrings package is part of Bioconductor
>> (part of the R universe), although you should be prepared for a coffee
>> break when you install it if you haven't gotten at least bioClite
>> already installed. When I installed it last night it had 54 other
>> package dependents also downloaded and installed. It seems to me that
>> taking advantage of the coding resources in the molecular biology
>> domain that are currently directed at decoding the information storage
>> mechanism of life might be a smart strategy. You have not described
>> the domain you are working in, but I would guess that the "digest"
>> package might be biological in primary application? So forgive me if I
>> am preaching to the choir.
>>
>> The rle option also occurred to me, but it might take a smarter coder
>> than I to fully implement it. (But maybe Holtman would be up to it.
>> He's a _lot_ smarter than I.) In your example the long "x" string is
>> faithfully represented by two aligned vectors, each of length 197.
>> The long repeat sequence that broke the grepl mechanism is just one
>> pair of values.
>>
>>     > rle(x)
>>     Run Length Encoding
>>       lengths: int [1:197] 1 1 2 1 1 4 1 9 1 1 ...
>>       values : chr [1:197] "5d64d58a" "ac76183b" "202fbcc4" "78087f5e" ...
>>
>> So maybe as soon as you got to a bundle that was greater than 1/2 the
>> overall length (as happened in the "x" case) you could stop, since it
>> could not have "occurred before".
>
> I doubt that rle() can be deployed to replace the Lempel-Ziv (LZ)
> algorithm in a trivial way.
> As a less convoluted example, consider the series
>
>     x <- c("d","a","b","d","a","b","e","z")
>
> If i=4, so that the i-th element is the second 'd' in the series, the
> shortest series starting from i=4 that I do not see in the past of 'd'
> is "d","a","b","e", whose length is equal to 4, and that is the value
> returned by the function below.
> The frustrating thing is that I already have the tools I need; they just
> crash for reasons beyond my control on relatively short series.
> If anyone can make the function below more robust, that would really be
> a big help for me.

I already offered the Biostrings package. It provides more robust methods
for string matching than does grepl. Is there a reason that you choose
not to?

-- 
David.

> Cheers
>
> Lorenzo
>
> ###
>
>     entropy_lz <- function(x,i){
>       past <- x[1:i-1]
>       n <- length(x)
>       lp <- length(past)
>       future <- x[i:n]
>       go_on <- 1
>       count_len <- 0
>       past_string <- paste(past, collapse="#")
>       while (go_on>0){
>         new_seq <- x[i:(i+count_len)]
>         fut_string <- paste(new_seq, collapse="#")
>         count_len <- count_len+1
>         if (grepl(fut_string,past_string)!=1){
>           go_on <- -1
>         }
>       }
>       return(count_len)
>     }
>
>     x <- c("c","a","b","c","a","b","e","z")
>     S <- entropy_lz(x,4)

David Winsemius, MD
West Hartford, CT
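
One way the function above might be made more robust is sketched below. It is not the fix anyone in the thread settled on. The name `entropy_lz_fixed` and the `sep` argument are hypothetical; the sketch assumes tokens never contain the separator and contain no `NA`. It differs from `entropy_lz` in three ways: matching is literal (`fixed = TRUE`), both strings are wrapped in separators so only whole tokens can match, and the loop stops at the end of the series. When the entire remaining future already occurred in the past it returns the remaining length, which is a choice made here rather than something stated in the thread.

```r
# Hypothetical variant of entropy_lz(): literal matching, whole tokens only.
# Assumes tokens never contain `sep` and there are no NA values.
entropy_lz_fixed <- function(x, i, sep = "#") {
  n <- length(x)
  stopifnot(i >= 1, i <= n)
  # Wrap in separators so "a" cannot match inside "aa".
  wrap <- function(v) paste0(sep, paste(v, collapse = sep), sep)
  past_string <- wrap(x[seq_len(i - 1)])   # empty past gives "##"
  count_len <- 0
  while (i + count_len <= n) {             # never index past the series
    fut_string <- wrap(x[i:(i + count_len)])
    count_len <- count_len + 1
    # fixed = TRUE: plain substring search, no regex compilation.
    if (!grepl(fut_string, past_string, fixed = TRUE)) break
  }
  count_len
}

x <- c("c", "a", "b", "c", "a", "b", "e", "z")
S <- entropy_lz_fixed(x, 4)
print(S)
```

On the toy series this reproduces the value 4 described in the message, and a long run of one repeated token no longer involves the regex engine at all.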
Re: [R] Memory management in R
> My suggestion is to explore other alternatives. (I will admit that I
> don't yet fully understand the test that you are applying.)

Hi,
I am trying to partially implement the Lempel-Ziv compression algorithm.
The point is that compressibility and entropy of a time series are
related, hence my final goal is to evaluate the entropy of a time series.
You can find more at

http://bit.ly/93zX4T
http://en.wikipedia.org/wiki/LZ77_and_LZ78
http://bit.ly/9NgIFt

> The two that have occurred to me are Biostrings, which I have already
> mentioned, and rle(), which I have illustrated the use of but not
> referenced as an avenue. The Biostrings package is part of Bioconductor
> (part of the R universe), although you should be prepared for a coffee
> break when you install it if you haven't gotten at least bioClite
> already installed. When I installed it last night it had 54 other
> package dependents also downloaded and installed. It seems to me that
> taking advantage of the coding resources in the molecular biology
> domain that are currently directed at decoding the information storage
> mechanism of life might be a smart strategy. You have not described the
> domain you are working in, but I would guess that the "digest" package
> might be biological in primary application? So forgive me if I am
> preaching to the choir.
>
> The rle option also occurred to me, but it might take a smarter coder
> than I to fully implement it. (But maybe Holtman would be up to it.
> He's a _lot_ smarter than I.) In your example the long "x" string is
> faithfully represented by two aligned vectors, each of length 197.
> The long repeat sequence that broke the grepl mechanism is just one
> pair of values.
>
>     > rle(x)
>     Run Length Encoding
>       lengths: int [1:197] 1 1 2 1 1 4 1 9 1 1 ...
>       values : chr [1:197] "5d64d58a" "ac76183b" "202fbcc4" "78087f5e" ...
>
> So maybe as soon as you got to a bundle that was greater than 1/2 the
> overall length (as happened in the "x" case) you could stop, since it
> could not have "occurred before".

I doubt that rle() can be deployed to replace the Lempel-Ziv (LZ)
algorithm in a trivial way.
As a less convoluted example, consider the series

    x <- c("d","a","b","d","a","b","e","z")

If i=4, so that the i-th element is the second 'd' in the series, the
shortest series starting from i=4 that I do not see in the past of 'd' is
"d","a","b","e", whose length is equal to 4, and that is the value
returned by the function below.
The frustrating thing is that I already have the tools I need; they just
crash for reasons beyond my control on relatively short series.
If anyone can make the function below more robust, that would really be a
big help for me.

Cheers

Lorenzo

###

    entropy_lz <- function(x,i){
      past <- x[1:i-1]
      n <- length(x)
      lp <- length(past)
      future <- x[i:n]
      go_on <- 1
      count_len <- 0
      past_string <- paste(past, collapse="#")
      while (go_on>0){
        new_seq <- x[i:(i+count_len)]
        fut_string <- paste(new_seq, collapse="#")
        count_len <- count_len+1
        if (grepl(fut_string,past_string)!=1){
          go_on <- -1
        }
      }
      return(count_len)
    }

    x <- c("c","a","b","c","a","b","e","z")
    S <- entropy_lz(x,4)
Re: [R] Memory management in R
On Oct 9, 2010, at 9:45 AM, Lorenzo Isella wrote:

> Hi David,
> I am replying to you and to the other people who provided some insight
> into my problems with grepl.
> Well, at least we now know that the bug is reproducible.
> The sequence I am postprocessing is indeed strange, probably
> pathological to some extent; nevertheless, it has to be acknowledged
> that grepl crashes when a long (but not huge) chunk of repeated data
> is loaded.
> Now, my problem is the following: given a potentially long string (or,
> before that, a sequence where every element has been generated via the
> hash function, algo='crc32', of the digest package), how can I, starting
> from an arbitrary position i along the list, calculate the shortest
> substring in the future of i (i.e. the interval i:end of the series)
> that has not occurred in the past of i (i.e. [1:i-1])?

Maybe you should work on a less convoluted explanation of the test? Or
perhaps a couple of compact examples, preferably in R-copy-paste format?

> Efficiency is not the main point here; I need to run this code only
> once to get what I need, but it cannot crash on a 2000-entry string.

My suggestion is to explore other alternatives. (I will admit that I
don't yet fully understand the test that you are applying.)

The two that have occurred to me are Biostrings, which I have already
mentioned, and rle(), which I have illustrated the use of but not
referenced as an avenue. The Biostrings package is part of Bioconductor
(part of the R universe), although you should be prepared for a coffee
break when you install it if you haven't gotten at least bioClite already
installed. When I installed it last night it had 54 other package
dependents also downloaded and installed. It seems to me that taking
advantage of the coding resources in the molecular biology domain that
are currently directed at decoding the information storage mechanism of
life might be a smart strategy. You have not described the domain you are
working in, but I would guess that the "digest" package might be
biological in primary application? So forgive me if I am preaching to the
choir.

The rle option also occurred to me, but it might take a smarter coder than
I to fully implement it. (But maybe Holtman would be up to it. He's a
_lot_ smarter than I.) In your example the long "x" string is faithfully
represented by two aligned vectors, each of length 197. The long repeat
sequence that broke the grepl mechanism is just one pair of values.

    > rle(x)
    Run Length Encoding
      lengths: int [1:197] 1 1 2 1 1 4 1 9 1 1 ...
      values : chr [1:197] "5d64d58a" "ac76183b" "202fbcc4" "78087f5e" ...

So maybe as soon as you got to a bundle that was greater than 1/2 the
overall length (as happened in the "x" case) you could stop, since it
could not have "occurred before".

-- 
David.

> Cheers
>
> Lorenzo
>
> On 10/09/2010 01:30 AM, David Winsemius wrote:
>>> What puzzles me is that the list is not really long (less than 2000
>>> entries) and I have not experienced the same problem even with
>>> longer lists.
>>
>> But maybe your loop terminated in them earlier? Someplace between
>> 11*225 and 11*240 the grepping machine gives up:
>>
>>     > eprs <- paste(rep("aa", 225), collapse="#")
>>     > grepl(eprs, eprs)
>>     [1] TRUE
>>     > eprs <- paste(rep("aa", 240), collapse="#")
>>     > grepl(eprs, eprs)
>>     Error in grepl(eprs, eprs) :
>>       invalid regular expression 'aa#aa#aa#aa#aa#aa#aa#aa#[...]#aa#aa#aa#a
>>     In addition: Warning message:
>>     In grepl(eprs, eprs) : regcomp error:  'Out of memory'
>>
>> The complexity of the problem may depend on the distribution of
>> values. You have a very skewed distribution, with the vast majority
>> being the same value as appeared in your error message:
>>
>>     > table(x)
>>     x
>>      12653a6 202fbcc4 48bef8c3 4e084ddc 51f342a4 5d64d58a 78087f5e abddf3d1
>>         1419      299        1        1        1        3        1        1
>>     ac76183b b955be36 c600173a e96f6bbd e9c56275
>>            1       30        5        1        9
>>
>> And you have 1159 of them in one clump (which would seem to be
>> somewhat improbable under a random null hypothesis):
>>
>>     > max(rle(x)$lengths)
>>     [1
Re: [R] Memory management in R
Hi David,
I am replying to you and to the other people who provided some insight
into my problems with grepl.
Well, at least we now know that the bug is reproducible.
The sequence I am postprocessing is indeed strange, probably pathological
to some extent; nevertheless, it has to be acknowledged that grepl crashes
when a long (but not huge) chunk of repeated data is loaded.
Now, my problem is the following: given a potentially long string (or,
before that, a sequence where every element has been generated via the
hash function, algo='crc32', of the digest package), how can I, starting
from an arbitrary position i along the list, calculate the shortest
substring in the future of i (i.e. the interval i:end of the series) that
has not occurred in the past of i (i.e. [1:i-1])?
Efficiency is not the main point here; I need to run this code only once
to get what I need, but it cannot crash on a 2000-entry string.

Cheers

Lorenzo

On 10/09/2010 01:30 AM, David Winsemius wrote:
>> What puzzles me is that the list is not really long (less than 2000
>> entries) and I have not experienced the same problem even with longer
>> lists.
>
> But maybe your loop terminated in them earlier? Someplace between
> 11*225 and 11*240 the grepping machine gives up:
>
>     > eprs <- paste(rep("aa", 225), collapse="#")
>     > grepl(eprs, eprs)
>     [1] TRUE
>     > eprs <- paste(rep("aa", 240), collapse="#")
>     > grepl(eprs, eprs)
>     Error in grepl(eprs, eprs) :
>       invalid regular expression 'aa#aa#aa#aa#aa#aa#aa#aa#[...]#aa#aa#aa#a
>     In addition: Warning message:
>     In grepl(eprs, eprs) : regcomp error:  'Out of memory'
>
> The complexity of the problem may depend on the distribution of values.
> You have a very skewed distribution, with the vast majority being the
> same value as appeared in your error message:
>
>     > table(x)
>     x
>      12653a6 202fbcc4 48bef8c3 4e084ddc 51f342a4 5d64d58a 78087f5e abddf3d1
>         1419      299        1        1        1        3        1        1
>     ac76183b b955be36 c600173a e96f6bbd e9c56275
>            1       30        5        1        9
>
> And you have 1159 of them in one clump (which would seem to be somewhat
> improbable under a random null hypothesis):
>
>     > max(rle(x)$lengths)
>     [1] 1159
>     > which(rle(x)$lengths == 1159)
>     [1] 123
>     > rle(x)$values[123]
>     [1] "12653a6"
>
> HTH (although I think it means you need to construct a different
> implementation strategy);
>
> David.
>
>> Many thanks
>>
>> Lorenzo
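
The rle() diagnostics quoted above can be tried on a small vector. The sketch below uses invented data (the hash-like strings are borrowed from the thread only for flavour) and shows how to find the longest run, where it starts, and that `inverse.rle()` recovers the original series.

```r
# Made-up series with one long run; not the data from the thread.
x <- c("5d64d58a", "ac76183b", "202fbcc4", "202fbcc4",
       rep("12653a6", 6), "e9c56275")

runs <- rle(x)                       # run lengths and the value of each run
longest <- which.max(runs$lengths)   # index of the longest run
longest_value  <- runs$values[longest]
longest_length <- runs$lengths[longest]

# Position in x where the longest run begins.
start <- sum(runs$lengths[seq_len(longest - 1)]) + 1

# rle() is lossless: inverse.rle() rebuilds the original vector.
round_trip_ok <- identical(inverse.rle(runs), x)

print(list(value = longest_value, length = longest_length,
           start = start, round_trip_ok = round_trip_ok))
```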
Re: [R] Memory management in R
On Oct 8, 2010, at 9:19 PM, Mike Marchywka wrote:

>> From: dwinsem...@comcast.net
>> To: lorenzo.ise...@gmail.com
>> Date: Fri, 8 Oct 2010 19:30:45 -0400
>> CC: r-help@r-project.org
>> Subject: Re: [R] Memory management in R
>>
>> On Oct 8, 2010, at 6:42 PM, Lorenzo Isella wrote:
>>
>>> Please find below the R snippet which requires an input file (a
>>> simple text file) you can download from
>>>
>>> http://dl.dropbox.com/u/5685598/time_series25_.dat
>>>
>>> What puzzles me is that the list is not really long (less than 2000
>>> entries) and I have not experienced the same problem even with
>>> longer lists.
>>
>> But maybe your loop terminated in them earlier? Someplace between
>> 11*225 and 11*240 the grepping machine gives up:
>>
>>     > eprs <- paste(rep("aa", 225), collapse="#")
>>     > grepl(eprs, eprs)
>>     [1] TRUE
>>     > eprs <- paste(rep("aa", 240), collapse="#")
>>     > grepl(eprs, eprs)
>>     Error in grepl(eprs, eprs) :
>>       invalid regular expression 'aa#aa#aa#aa#aa#aa#aa#aa#[...]#aa#aa#aa#a
>>     In addition: Warning message:
>>     In grepl(eprs, eprs) : regcomp error:  'Out of memory'
>>
>> The complexity of the problem may depend on the distribution of
>> values. You have a very skewed distribution, with the vast majority
>> being the same value as appeared in your error message:
>>
>> HTH (although I think it means you need to construct a different
>> implementation strategy);
>
> You really need to look at the question posed by your regex and
> consider the complexity of what you are asking and what likely
> implementations would do with your regex.

The R regex machine (at least on a Mac with R 2.11.1) breaks when the
length of the pattern argument exceeds 2559 characters. There is no
complexity for the regex parser here. No metacharacters were in the
string.

> Something like this probably needs to be implemented in dedicated code
> to handle the more general case, or you need to determine if the input
> data is pathological given your regex.

There is a Biostrings package in BioC that may provide more robust
treatment of long strings.

-- 
David.

> Being able to write something concisely doesn't mean the execution of
> that something is simple. Even if it does manage to return a result, it
> likely will get very slow. In the past I have had to write my own simple
> regex compilers to handle a limited class of expressions to make the
> speed reasonable. In this case, depending on your objectives, dedicated
> code may even be helpful to you in understanding the algorithm.
>
>> David.
>>
>>> Many thanks
>>>
>>> Lorenzo

David Winsemius, MD
West Hartford, CT
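
As an illustration of what "dedicated code" could look like without leaving R, here is a rough sketch that compares tokens directly instead of pasting them into strings, so pattern length and metacharacters never come into play. The helper name `contains_run` is hypothetical, and the sketch assumes character vectors without `NA` values; it is a naive scan, not an optimized matcher.

```r
# Hypothetical helper: does `needle` occur as a contiguous run of tokens
# inside `haystack`? Tokens are compared element by element; no regex.
# Assumes no NA values in either vector.
contains_run <- function(needle, haystack) {
  k <- length(needle)
  m <- length(haystack)
  if (k == 0) return(TRUE)    # the empty sequence is contained in anything
  if (k > m) return(FALSE)
  # Candidate starts: positions where the first token matches and
  # enough tokens remain for a full comparison.
  starts <- which(haystack[seq_len(m - k + 1)] == needle[1])
  for (s in starts) {
    if (all(haystack[s:(s + k - 1)] == needle)) return(TRUE)
  }
  FALSE
}

found_long <- contains_run(rep("aa", 240), c("zz", rep("aa", 300)))
found_toy  <- contains_run(c("a", "b"), c("c", "a", "b"))
print(c(found_long = found_long, found_toy = found_toy))
```

Because whole tokens are compared, `c("a", "b")` is not reported as present in `c("aa", "bb")`, which a substring search on "#"-joined strings would get wrong.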
Re: [R] Memory management in R
> From: dwinsem...@comcast.net
> To: lorenzo.ise...@gmail.com
> Date: Fri, 8 Oct 2010 19:30:45 -0400
> CC: r-help@r-project.org
> Subject: Re: [R] Memory management in R
>
> On Oct 8, 2010, at 6:42 PM, Lorenzo Isella wrote:
>
> > Please find below the R snippet which requires an input file (a
> > simple text file) you can download from
> >
> > http://dl.dropbox.com/u/5685598/time_series25_.dat
> >
> > What puzzles me is that the list is not really long (less than 2000
> > entries) and I have not experienced the same problem even with
> > longer lists.
>
> But maybe your loop terminated in them earlier? Someplace between
> 11*225 and 11*240 the grepping machine gives up:
>
>     > eprs <- paste(rep("aa", 225), collapse="#")
>     > grepl(eprs, eprs)
>     [1] TRUE
>     > eprs <- paste(rep("aa", 240), collapse="#")
>     > grepl(eprs, eprs)
>     Error in grepl(eprs, eprs) :
>       invalid regular expression 'aa#aa#aa#aa#aa#aa#aa#aa#[...]#aa#aa#aa#a
>     In addition: Warning message:
>     In grepl(eprs, eprs) : regcomp error:  'Out of memory'
>
> The complexity of the problem may depend on the distribution of
> values. You have a very skewed distribution, with the vast majority
> being the same value as appeared in your error message:
>
> HTH (although I think it means you need to construct a different
> implementation strategy);

You really need to look at the question posed by your regex and consider
the complexity of what you are asking and what likely implementations
would do with your regex. Something like this probably needs to be
implemented in dedicated code to handle the more general case, or you
need to determine if the input data is pathological given your regex.

Being able to write something concisely doesn't mean the execution of
that something is simple. Even if it does manage to return a result, it
likely will get very slow. In the past I have had to write my own simple
regex compilers to handle a limited class of expressions to make the
speed reasonable. In this case, depending on your objectives, dedicated
code may even be helpful to you in understanding the algorithm.

> David.
>
> > Many thanks
> >
> > Lorenzo
Re: [R] Memory management in R
On Oct 8, 2010, at 6:42 PM, Lorenzo Isella wrote:

> Thanks for lending a helping hand.
> I put together a self-contained example. Basically, it all relies on a
> couple of functions, where one function simply iterates the application
> of the other function.
> I am trying to implement the so-called Lempel-Ziv entropy estimator.
> The idea is to choose a position i along a string x (standing for a
> time series) and find the length of the shortest string starting from i
> which has never occurred before i.
> Please find below the R snippet which requires an input file (a simple
> text file) you can download from
>
> http://dl.dropbox.com/u/5685598/time_series25_.dat
>
> What puzzles me is that the list is not really long (less than 2000
> entries) and I have not experienced the same problem even with longer
> lists.

But maybe your loop terminated in them earlier? Someplace between 11*225
and 11*240 the grepping machine gives up:

    > eprs <- paste(rep("aa", 225), collapse="#")
    > grepl(eprs, eprs)
    [1] TRUE
    > eprs <- paste(rep("aa", 240), collapse="#")
    > grepl(eprs, eprs)
    Error in grepl(eprs, eprs) :
      invalid regular expression 'aa#aa#aa#aa#aa#aa#aa#aa#[...]#aa#aa#aa#a
    In addition: Warning message:
    In grepl(eprs, eprs) : regcomp error:  'Out of memory'

The complexity of the problem may depend on the distribution of values.
You have a very skewed distribution, with the vast majority being the
same value as appeared in your error message:

    > table(x)
    x
     12653a6 202fbcc4 48bef8c3 4e084ddc 51f342a4 5d64d58a 78087f5e abddf3d1
        1419      299        1        1        1        3        1        1
    ac76183b b955be36 c600173a e96f6bbd e9c56275
           1       30        5        1        9

And you have 1159 of them in one clump (which would seem to be somewhat
improbable under a random null hypothesis):

    > max(rle(x)$lengths)
    [1] 1159
    > which(rle(x)$lengths == 1159)
    [1] 123
    > rle(x)$values[123]
    [1] "12653a6"

HTH (although I think it means you need to construct a different
implementation strategy);

David.

> Many thanks
>
> Lorenzo
>
> ##
>
>     total_entropy_lz <- function(x){
>       if (length(x)==1){
>         print("sequence too short")
>         return("error")
>       } else{
>         n <- length(x)
>         prefactor <- 1/(n*log(n)/log(2))
>         n_seq <- seq(n)
>         entropy_list <- n_seq
>         for (i in n_seq){
>           entropy_list[i] <- entropy_lz(x,i)
>         }
>       }
>       total_entropy <- 1/(prefactor*sum(entropy_list))
>       return(total_entropy)
>     }
>
>     entropy_lz <- function(x,i){
>       past <- x[1:i-1]
>       n <- length(x)
>       lp <- length(past)
>       future <- x[i:n]
>       go_on <- 1
>       count_len <- 0
>       past_string <- paste(past, collapse="#")
>       while (go_on>0){
>         new_seq <- x[i:(i+count_len)]
>         fut_string <- paste(new_seq, collapse="#")
>         count_len <- count_len+1
>         if (grepl(fut_string,past_string)!=1){
>           go_on <- -1
>         }
>       }
>       return(count_len)
>     }
>
>     x <- scan("time_series25_.dat", what="")
>     S <- total_entropy_lz(x)
>
> On 10/08/2010 07:30 PM, jim holtman wrote:
>> More specificity: how long is the string, and what is the pattern you
>> are matching against? It sounds like you might have a complex pattern
>> that, in trying to match the string, might be doing a lot of
>> backtracking and such. There is an O'Reilly book, Mastering Regular
>> Expressions, that might help you understand what might be happening.
>> So if you can provide a better example than just the error message, it
>> would be helpful.
>>
>> On Fri, Oct 8, 2010 at 1:11 PM, Lorenzo Isella wrote:
>>> Dear All,
>>> I am experiencing some problems with a script of mine.
>>> It crashes with this message
>>>
>>> Error in grepl(fut_string, past_string) :
>>>   invalid regular expression
>>> '12653a6#12653a6#12653a6#12653a6#[...]
Re: [R] Memory management in R
Thanks for lending a helping hand.
I put together a self-contained example. Basically, it all relies on a
couple of functions, where one function simply iterates the application
of the other function.
I am trying to implement the so-called Lempel-Ziv entropy estimator.
The idea is to choose a position i along a string x (standing for a time
series) and find the length of the shortest string starting from i which
has never occurred before i.
Please find below the R snippet which requires an input file (a simple
text file) you can download from

http://dl.dropbox.com/u/5685598/time_series25_.dat

What puzzles me is that the list is not really long (less than 2000
entries) and I have not experienced the same problem even with longer
lists.

Many thanks

Lorenzo

##

    total_entropy_lz <- function(x){
      if (length(x)==1){
        print("sequence too short")
        return("error")
      } else{
        n <- length(x)
        prefactor <- 1/(n*log(n)/log(2))
        n_seq <- seq(n)
        entropy_list <- n_seq
        for (i in n_seq){
          entropy_list[i] <- entropy_lz(x,i)
        }
      }
      total_entropy <- 1/(prefactor*sum(entropy_list))
      return(total_entropy)
    }

    entropy_lz <- function(x,i){
      past <- x[1:i-1]
      n <- length(x)
      lp <- length(past)
      future <- x[i:n]
      go_on <- 1
      count_len <- 0
      past_string <- paste(past, collapse="#")
      while (go_on>0){
        new_seq <- x[i:(i+count_len)]
        fut_string <- paste(new_seq, collapse="#")
        count_len <- count_len+1
        if (grepl(fut_string,past_string)!=1){
          go_on <- -1
        }
      }
      return(count_len)
    }

    x <- scan("time_series25_.dat", what="")
    S <- total_entropy_lz(x)

On 10/08/2010 07:30 PM, jim holtman wrote:
> More specificity: how long is the string, and what is the pattern you
> are matching against? It sounds like you might have a complex pattern
> that, in trying to match the string, might be doing a lot of
> backtracking and such. There is an O'Reilly book, Mastering Regular
> Expressions, that might help you understand what might be happening.
> So if you can provide a better example than just the error message, it
> would be helpful.
>
> On Fri, Oct 8, 2010 at 1:11 PM, Lorenzo Isella wrote:
>> Dear All,
>> I am experiencing some problems with a script of mine.
>> It crashes with this message
>>
>> Error in grepl(fut_string, past_string) :
>>   invalid regular expression
>> '12653a6#12653a6#12653a6#12653a6#[...]#12653a6#12653a6#12
>> Calls: entropy_estimate_hash -> total_entropy_lz -> entropy_lz -> grepl
>> In addition: Warning message:
>> In grepl(fut_string, past_string) : regcomp error:  'Out of memory'
>> Execution halted
>>
>> To make a long story short, I use some functions which eventually call
>> grepl on very long strings to check whether a certain substring is
>> part of a longer string.
>> Now, the script technically works (it never crashes when I run it on a
>> smaller dataset) and the problem does not seem to be RAM (I have
>> several GB of RAM on my machine and its consumption never shoots up,
>> so my machine never resorts to swap memory).
>> So (though I am not an expert) it looks like the problem is some
>> limitation of grepl or of R memory management.
>> Any idea about how I could tackle this problem, or how I can profile
>> my code to fix it (though it really seems to me that I have to find a
>> way to allow R to process longer strings)?
>> Any suggestion is appreciated.
>> Cheers
>>
>> Lorenzo
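
Reading the arithmetic in `total_entropy_lz` above: with n the series length and L_i the value returned by `entropy_lz(x, i)`, the prefactor is 1/(n*log2(n)), so the result simplifies to n*log2(n) / sum(L_i). The sketch below isolates just that last step; the function name `lz_entropy_from_lengths` and the input values are hypothetical, and it takes already computed match lengths so that it does not depend on grepl at all.

```r
# Hypothetical helper: the final step of total_entropy_lz(), given the
# per-position match lengths L_i (one value per position of the series).
lz_entropy_from_lengths <- function(lambda) {
  n <- length(lambda)
  if (n < 2) stop("sequence too short")   # log2(1) = 0 gives a useless estimate
  n * log2(n) / sum(lambda)               # same as 1 / (prefactor * sum)
}

# Invented match lengths for a series of 4 elements: 4 * log2(4) / 4
est <- lz_entropy_from_lengths(c(1, 1, 1, 1))
print(est)
```

Longer match lengths mean a more repetitive series and therefore a lower estimate, which is why one very long clump of a single value dominates the result for a series like the one discussed here.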
Re: [R] Memory management in R
> Date: Fri, 8 Oct 2010 13:30:59 -0400
> From: jholt...@gmail.com
> To: lorenzo.ise...@gmail.com
> CC: r-help@r-project.org
> Subject: Re: [R] Memory management in R
>
> More specificity: how long is the string, what is the pattern you are
> matching against? It sounds like you might have a complex pattern
> that in trying to match the string might be doing a lot of back
> tracking and such. There is an O'Reilly book on Mastering Regular
> Expression that might help you understand what might be happening. So
> if you can provide a better example than just the error message, it
> would be helpful.

This is possibly a stack issue. Error messages are often not literal; I have seen "out of memory" reported for graphics device objects :) The regex suggests a stack issue, but that is only a guess at the mechanism of death. What you probably really want is a simpler regex :)

> [snip]
Re: [R] Memory management in R
More specificity: how long is the string, and what is the pattern you are matching against? It sounds like you might have a complex pattern that, in trying to match the string, is doing a lot of backtracking. There is an O'Reilly book, "Mastering Regular Expressions", that might help you understand what is happening. If you can provide a better example than just the error message, it would be helpful.

On Fri, Oct 8, 2010 at 1:11 PM, Lorenzo Isella wrote:
> Dear All,
> I am experiencing some problems with a script of mine.
> It crashes with this message
> [snip]

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
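A reproducible example of the kind requested above might look like the following sketch. The real inputs were never posted, so everything here is an assumption: the fragment "12653a6#" is taken from the error message, the repeat counts are invented, and `fut_string` and `past_string` simply reuse the argument names visible in the failing call. Whether the error reproduces will depend on the R version and platform.

```r
# Hypothetical reconstruction of the inputs; the real data were not posted.
unit <- "12653a6#"                                   # fragment seen in the error message
fut_string  <- paste(rep(unit, 125), collapse = "")  # assumed repeat count
past_string <- paste(rep(unit, 500), collapse = "")  # assumed longer history

# Report the sizes that were asked about.
cat("pattern length:", nchar(fut_string), "\n")
cat("target length: ", nchar(past_string), "\n")

# Capture either the logical result or the error text instead of halting.
result <- tryCatch(
  grepl(fut_string, past_string),
  error = function(e) conditionMessage(e)
)
print(result)
```

Posting the two lengths together with the captured result or error text gives readers something concrete to run on their own machines.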
Re: [R] Memory management in R
These questions are OS-specific. Please provide sessionInfo() or other details as needed.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Lorenzo Isella
Sent: Friday, October 08, 2010 1:12 PM
To: r-help
Subject: [R] Memory management in R

Dear All,
I am experiencing some problems with a script of mine. It crashes with this message
[snip]
Re: [R] Memory management in R
On 10/08/2010 07:25 PM, Doran, Harold wrote:
> These questions are OS-specific. Please provide sessionInfo() or other details as needed

I see. I am running R on a 64-bit machine running Ubuntu 10.04.

> sessionInfo()
R version 2.11.1 (2010-05-31)
x86_64-pc-linux-gnu

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

and in case it matters, this is the output of my top command:

$ top
top - 19:28:21 up 8:04, 8 users, load average: 0.60, 0.72, 1.33
Tasks: 220 total, 1 running, 219 sleeping, 0 stopped, 0 zombie
Cpu(s): 10.3%us, 0.6%sy, 0.0%ni, 87.2%id, 1.9%wa, 0.0%hi, 0.0%si, 0.0%st
Mem:  6110484k total, 3847008k used, 2263476k free,   72748k buffers
Swap: 2929656k total,       0k used, 2929656k free, 2621420k cached

Cheers

Lorenzo
[R] Memory management in R
Dear All, I am experiencing some problems with a script of mine. It crashes with this message Error in grepl(fut_string, past_string) : invalid regular expression '12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12 Calls: entropy_estimate_hash -> total_entropy_lz -> entropy_lz -> grepl In addition: Warning message: In grepl(fut_string, past_string) : regcomp error: 'Out of memory' Execution halted To make a long story short, I use some functions which eventually call grepl on very long strings to check whether a certain substring is part of a longer string. Now, the script technically works (it never crashes when I run it on a smaller dataset) and the problem does not seem to be RAM memory (I have several GB of RAM on my machine and its consumption never shoots up so my machine never resorts to swap memory). So (though I am not an expert) it looks like the problem is some limitation of grepl or R memory management. 
Any idea how I could tackle this problem, or how I can profile my code to fix it? (It really seems to me that I have to find a way to allow R to process longer strings.) Any suggestion is appreciated.

Cheers

Lorenzo
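Since the stated goal is a plain substring test rather than a pattern match, one option is to ask grepl for a literal comparison with fixed = TRUE, which skips regular-expression compilation and so should avoid the regcomp step that fails above. The following is only a sketch: `is_substring` is a hypothetical helper name, and the test strings are invented from the fragment shown in the error message, not the poster's real data.

```r
# Literal substring test. With fixed = TRUE the pattern is matched as-is and
# is not compiled as a regular expression.
is_substring <- function(needle, haystack) {
  grepl(needle, haystack, fixed = TRUE)
}

unit <- "12653a6#"
past_string <- paste(rep(unit, 2000), collapse = "")  # hypothetical long history
fut_string  <- paste(rep(unit, 1000), collapse = "")  # hypothetical long pattern

is_substring(fut_string, past_string)   # TRUE
is_substring("12653a7#", past_string)   # FALSE
is_substring("a.c", "abc")              # FALSE: "." is taken literally
```

A side benefit of the literal match is that characters such as "." or "*" in the data can no longer be misread as regex operators, so the answer is about the text itself.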
Re: [R] memory management in R
You might want to mention packages that enhance R's ability to work with less RAM or more data, such as SOAR (which transparently moves objects between RAM and disk) and ff (which allows vectors and data frames larger than RAM and supports dense data types like true boolean, short integers etc.).

Jens Oehlschlägel

-----Original Message-----
From: john
Sent: Jun 16, 2010 12:20:17 PM
To: r-help@r-project.org
Subject: [R] memory management in R

>I have volunteered to give a short talk on "memory management in R"
>to my local R user group, mainly to motivate myself to learn about it.
>[snip]
[R] memory management in R
I have volunteered to give a short talk on "memory management in R" to my local R user group, mainly to motivate myself to learn about it.

The focus will be on what a typical R coder might want to know (e.g. how objects are created, call by value, basics of garbage collection), but I want to go a little deeper just in case there are some advanced users in the crowd.

Here are the resources I am using right now:
 - Chambers' book "Software for Data Analysis"
 - Manuals such as "R Internals" and "Writing R Extensions"

Any suggestions on other sources of information?

There are still some things that are not clear to me, such as
 - how to make sense of the output from various memory diagnostics such as memory.profile ... are these counts?
 - how to get the amount of memory used: gc() and memory.size() seem to differ
 - what gets allocated on the heap versus the stack
 - why the name "cons cells" for the stack allocation

Any help with these would be greatly appreciated.

Thanks greatly,

John Muller
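For the diagnostics asked about above, a small session along the following lines may help when preparing the talk. It is a sketch using only base R functions; the vector size is arbitrary. As far as the help pages go, memory.profile() does report counts (one per internal object type), and the "Ncells" row of gc() refers to cons cells, a term borrowed from Lisp for the fixed-size nodes R keeps on its heap. memory.size() is specific to Windows builds and reports memory obtained from the operating system, which is one reason it need not agree with gc().

```r
# gc() returns a matrix with one row for cons cells ("Ncells") and one for
# vector storage ("Vcells").
usage <- gc()
print(rownames(usage))

# memory.profile() returns a named vector of counts, one per internal type.
prof <- memory.profile()
print(head(prof))

# object.size() estimates the memory taken by a single object.
x <- numeric(1e6)
print(format(object.size(x), units = "Mb"))

# Copy-on-modify: y refers to the same data as x until one of them changes.
y <- x
y[1] <- 1          # the assignment forces a copy; x is left untouched
print(c(x[1], y[1]))
```

The last step is a compact way to show an audience what "call by value" means in practice: the copy happens only when a modification makes it necessary.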