I was able to confirm the error on RH8.0 Linux and the segfault on Windows.
Note that PCRE is not being used, and if you add perl=TRUE to your [g]sub
calls you get correct results extremely fast.

The segfault is occurring in regexec, that is in the GNU regex code
included in R. I am not clear it is worth spending any time on trying to
find the problem in that code as

- you can use perl=TRUE as an alternative
- we will be replacing the GNU regex code in due course to cope with
  internationalization issues.

On Fri, 27 Feb 2004 [EMAIL PROTECTED] wrote:

> A possible regex bug when working with large strings. The
> following code snippet
>
>   t5 <- paste( c( "# === TEST", rep(' ', 2452294) ), collapse='')
>   str( sub("^.*TEST", "xyz", t5) )
>   str( sub("^.*TEST", "xyz", substr(t5,0,200)) )
>
> doesn't behave right; on one machine, the second and third
> lines print different results [the second line, on the long
> string, doesn't do the substitution], while on another, the
> second line causes a segfault. Both are running R 1.8.1
> with PCRE, under NetBSD (1.6.1 and 1.6 respectively).
>
> Possible related (although perhaps not a bug):
>
>   function(n) {
>     line <- paste(as.character(trunc(runif(n)*100)),collapse=" ")
>     system.time( rep <- gsub("[[:space:]]", "-", line) )
>   }
>
> gives rather long times rising v sharply for big strings (eg
> 2.2s at n=2e4, 360s at n=2e5 on AMD 1.2GHz). Other languages
> aren't so slow on this task (eg n=2e5: 0.4s ruby 1.8.1, and
> 5.2s python 2). Doubtless my extremely-quick-hack benchmarks
> aren't fair, but the difference still seems rather big.
>
> Mark <><
>
> ______________________________________________
> [EMAIL PROTECTED] mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-devel
>
>

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
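
For reference, a minimal sketch of the perl=TRUE workaround applied to the two snippets from the report. The names n, long, short and time_gsub are introduced here for illustration and are not in the original post, and substr() is called with start 1 rather than 0 (R treats the two the same). No timings are shown, since they depend on the machine and the R build.

```r
# Test string from the report: a short marker followed by a long run of spaces.
n  <- 2452294
t5 <- paste(c("# === TEST", rep(" ", n)), collapse = "")

# perl = TRUE sends the match through PCRE rather than the bundled GNU regex code.
long  <- sub("^.*TEST", "xyz", t5, perl = TRUE)
short <- sub("^.*TEST", "xyz", substr(t5, 1, 200), perl = TRUE)

# Both results should now begin with the replacement text.
str(long)
str(short)

# Timing function from the report, with perl = TRUE added to gsub().
time_gsub <- function(n) {
  line <- paste(as.character(trunc(runif(n) * 100)), collapse = " ")
  system.time(res <- gsub("[[:space:]]", "-", line, perl = TRUE))
}
time_gsub(2e4)
```

If the long and short results agree in their first few characters, the substitution has been applied in both cases, which is the behaviour the report expected.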