A possible regex bug when working with large strings. The following code snippet

    t5 <- paste( c( "# === TEST", rep(' ', 2452294) ), collapse='')
    str( sub("^.*TEST", "xyz", t5) )
    str( sub("^.*TEST", "xyz", substr(t5,0,200)) )

doesn't behave right: on one machine, the second and third lines print different results (the second line, on the long string, doesn't do the substitution), while on another, the second line causes a segfault. Both are running R 1.8.1 with PCRE, under NetBSD (1.6.1 and 1.6 respectively).

Possibly related (although perhaps not a bug):

    function(n) {
      line <- paste(as.character(trunc(runif(n)*100)), collapse=" ")
      system.time( rep <- gsub("[[:space:]]", "-", line) )
    }

gives rather long times, rising very sharply for big strings (e.g. 2.2s at n=2e4, 360s at n=2e5 on an AMD 1.2GHz). Other languages aren't so slow on this task (e.g. at n=2e5: 0.4s in ruby 1.8.1, and 5.2s in python 2). Doubtless my extremely-quick-hack benchmarks aren't fair, but the difference still seems rather big.

Mark <><