On Oct 8, 2010, at 9:19 PM, Mike Marchywka wrote:
----------------------------------------
From: dwinsem...@comcast.net
To: lorenzo.ise...@gmail.com
Date: Fri, 8 Oct 2010 19:30:45 -0400
CC: r-help@r-project.org
Subject: Re: [R] Memory management in R


On Oct 8, 2010, at 6:42 PM, Lorenzo Isella wrote:


Please find below the R snippet which requires an input file (a
simple text file) you can download from

http://dl.dropbox.com/u/5685598/time_series25_.dat

What puzzles me is that the list is not really long (less than 2000
entries) and I have not experienced the same problem even with
longer lists.

But maybe your loop terminated earlier in those cases. Somewhere between
11*225 and 11*240 characters the grepping machine gives up:

eprs <- paste(rep("aaaaaaaaaa", 225), collapse="#")
grepl(eprs, eprs)
[1] TRUE

eprs <- paste(rep("aaaaaaaaaa", 240), collapse="#")
grepl(eprs, eprs)
Error in grepl(eprs, eprs) :
invalid regular expression
'aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaa
In addition: Warning message:
In grepl(eprs, eprs) : regcomp error: 'Out of memory'
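
To narrow down where the failure starts on a given machine, the two calls above can be wrapped in a small probe. This is only a sketch: pattern_compiles() and make_pattern() are hypothetical helper names, not part of R, and the threshold (if there is one at all) is likely to depend on the R version and platform.

```r
# Hypothetical helper: TRUE if `pattern` compiles and runs as a regex,
# FALSE if grepl() signals an error such as "invalid regular expression".
pattern_compiles <- function(pattern) {
  tryCatch({
    suppressWarnings(grepl(pattern, ""))
    TRUE
  }, error = function(e) FALSE)
}

# Build the same kind of test string as above: n copies of "aaaaaaaaaa"
# joined by "#", which gives 11 * n - 1 characters.
make_pattern <- function(n) paste(rep("aaaaaaaaaa", n), collapse = "#")

# Try a range of sizes and record which ones the regex engine accepts.
sizes <- seq(200, 260, by = 5)
ok <- vapply(sizes, function(n) pattern_compiles(make_pattern(n)), logical(1))
data.frame(repeats = sizes, nchar = 11 * sizes - 1, compiles = ok)
```

If every row reports compiles = TRUE, the installed regex engine simply does not hit the limit in this range.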

The complexity of the problem may depend on the distribution of
values. You have a very skewed distribution, with the vast majority
of entries taking the same value that appeared in your error message:



HTH (although I think it means you need to construct a different
implementation strategy);

You really need to look at the question posed by your regex and consider
the complexity of what you are asking and what likely implementations
would do with your regex.

The R regex machine (at least on a Mac with R 2.11.1) breaks when the length of the pattern argument exceeds 2559 characters. There is no complexity for the regex parser here. No metacharacters were in the string.
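
Since the pattern here contains no metacharacters, one thing worth trying is the fixed = TRUE argument of grepl(), which treats the pattern as a literal string rather than a regular expression. A minimal sketch follows; I would expect it to avoid the regex compilation step, but I have not checked it against the R 2.11.1 build that produced the error.

```r
eprs <- paste(rep("aaaaaaaaaa", 240), collapse = "#")

# fixed = TRUE asks for a literal substring search, not a regex match.
grepl(eprs, eprs, fixed = TRUE)

# The two modes can disagree once metacharacters are involved:
grepl("a.c", "abc")                # regex: "." matches any character
grepl("a.c", "abc", fixed = TRUE)  # literal: "abc" has no "a.c" substring
```

This only helps if a literal match is really what is wanted; any pattern that relies on regex syntax would change meaning under fixed = TRUE.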

Something like this probably needs to be implemented
in dedicated code to handle the more general case or you need to determine
if input data is pathological given your regex.

There is a Biostrings package in BioC that may provide more robust treatment of long strings.

--
David.


Being able to write something
concisely doesn't mean the execution of that something is simple. Even if it does manage to return a result, it likely will get very slow. In the past I have had to write my own simple regex compilers to handle a limited class of expressions to make the speed reasonable. In this case, depending on your objectives, dedicated code may even be helpful to you in understanding
the algorithm.
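
As an illustration of what "dedicated code" could look like, here is a rough sketch that compares sequences token by token instead of handing one very long pattern to the regex engine. It assumes the data are "#"-separated strings like the test pattern earlier in the thread; contains_run() is a hypothetical name, and this is not the original poster's algorithm.

```r
# Hypothetical helper: does the token sequence in `needle` occur as a
# contiguous run inside `haystack`? Both are assumed "#"-separated strings.
contains_run <- function(needle, haystack, sep = "#") {
  n <- strsplit(needle, sep, fixed = TRUE)[[1]]
  h <- strsplit(haystack, sep, fixed = TRUE)[[1]]
  if (length(n) == 0L || length(n) > length(h)) return(FALSE)
  # Candidate start positions: where the first token of the needle appears.
  starts <- which(h[seq_len(length(h) - length(n) + 1)] == n[1])
  for (s in starts) {
    if (all(h[s:(s + length(n) - 1)] == n)) return(TRUE)
  }
  FALSE
}

contains_run("b#c", "a#b#c#d")   # run of tokens b, c is present
contains_run("b#d", "a#b#c#d")   # b and d are not adjacent
```

Note that this matches whole tokens only, so it is stricter than a substring search: "aa" would not be found inside "aaa#b".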


David.


Many thanks

Lorenzo


                                        

David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
