Hi Hilmar,

Hmm, it looks like I spoke too soon: the previous run was doing nothing because all of the cases were commented out. I can now see that the results of my runs are not massively different from yours. It would help if you could encourage your student to write a few unit tests, so that we know what you are trying to achieve and so that testing becomes simpler.
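
To illustrate the kind of unit test meant here, a minimal sketch follows. The pattern, the class name RegexPortCheck, and the sample inputs are all hypothetical and are not taken from the gists; the idea is only that each expression ported from Perl gets a few fixed inputs with known expected captures. It uses plain checks rather than a test framework so that it runs standalone.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexPortCheck {
    // Hypothetical expression standing in for one ported from the Perl script.
    // Compiled once and reused, so compilation cost is not paid per input line.
    private static final Pattern KEY_VALUE =
            Pattern.compile("^(\\w+)\\s*=\\s*(.*)$");

    // Returns {key, value} if the whole line matches, or null if it does not.
    public static String[] parse(String line) {
        Matcher m = KEY_VALUE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    // Tiny assertion helper; works whether or not the JVM runs with -ea.
    private static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        String[] kv = parse("name = value");
        check(kv != null, "expected a match");
        check(kv[0].equals("name"), "unexpected key: " + kv[0]);
        check(kv[1].equals("value"), "unexpected value: " + kv[1]);

        check(parse("no separator here") == null, "expected no match");

        System.out.println("all checks passed");
    }
}
```

Running the same inputs through the Perl original and comparing the captures would then show whether the two versions agree before any timing is done.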
Just a thought

Thanks,
Peter

On 24 October 2012 17:47, Hilmar Lapp <[email protected]> wrote:
> Hi everyone,
>
> Thanks for all your responses. Indeed I know that the Java regex API isn't an
> enjoyable one to program with, and if the underlying task were about writing
> something from scratch, I'd be all for avoiding regex's too if the same thing
> could be achieved by string comparison.
>
> However, and of course I failed to say that initially, the task from which
> this query is originating is about converting a Perl script to Java (not
> because Perl is somehow bad, but because those Perl scripts have shown to be
> an obstacle to easy cross-platform installation of the - mostly Java -
> software they are a part of). That doesn't mean one couldn't in the course
> also rewrite the code that uses regular expressions to one that doesn't, but
> I also think it wise not to introduce multiple variables as a source of error
> at once.
>
> Some of the responses would be best answered by looking at the expressions
> and the code that uses them, so here are the two "benchmark" scripts.
>
> Java: https://gist.github.com/3940931
> Perl: https://gist.github.com/3940780
>
> I'm also copying Dongye Meng here, who is a CS student at UNC working with us
> on the project - if anyone has further wisdom to share about how to reduce
> the performance gap between the two versions, he'd surely appreciate.
>
> -hilmar
>
> On Oct 23, 2012, at 6:42 AM, Phillip Lord wrote:
>
>> Hilmar Lapp <[email protected]> writes:
>>> They (at least as in java.util.regex) have been reported to me as
>>> performing much slower (by several orders of magnitude) than the regex
>>> implementation in Perl, and some simple benchmarking tests seem to
>>> bear that out. Even after scrutinizing the benchmark and finding
>>> nothing obvious, I'm still skeptical as to why this would be the case
>>> - naively I would have assumed that the underlying runtime library is
>>> implemented in C in both cases. But perhaps this is not true?
>>
>>
>> Well, the difference is that Perl is perl, while Java is not; it all
>> depends on the JVM, and libraries also. A quick shuftie at
>> the source for the open-jdk libraries suggests that the regexp searching
>> is done in Java -- it's not just a drop through to C. Always the problem
>> with performance optimisation on Java -- you are only optimising for one
>> situation. It might be interesting to see how much variation there is
>> between JVMs.
>>
>> Like others, I would only use regexp as a last resort in Java anyway;
>> compared to Perl, writing the code is painful. Still, I guess that you
>> know this!
>>
>> Phil
>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
> ===========================================================
>
> _______________________________________________
> Biojava-l mailing list - [email protected]
> http://lists.open-bio.org/mailman/listinfo/biojava-l
