On Mon, Aug 25, 2014 at 12:47:06PM +0200, Daniel Naber wrote:
> On 2014-08-25 12:27, Silvan Jegen wrote:
>
> > I agree that it would be about equally confusing (and inelegant) but at
> > least it would save some unnecessary work for LT.
>
> I don't think we should argue with performance unless there's a
> real-world use case that's actually too slow and we can show that the
> new solution is actually significantly faster.

I don't have a real-world use case, but I tested both implementations using languagetool-standalone.jar on a 114MB text file. I ran each version ten times, and on average the suggested one was about 15% faster. (Note that the testing was not very rigorous, and the difference between runs was surprisingly high at times.)

This simple test also highlighted an oversight of mine. If the tokenized List<String> result is ignored, the replaceSoftHyphens function won't have anything to work with. That means at least some of the speed gain comes from this function not being used.

Not handling soft hyphens does make sense for Japanese, since they are only very rarely used there. They do seem to be allowed according to 3.1.10f in http://www.w3.org/TR/2009/NOTE-jlreq-20090604/ though.

Cheers,

Silvan
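
To make the dependency on the token list concrete, here is a minimal sketch of a soft-hyphen pass over tokenizer output. It is not LT's actual replaceSoftHyphens code: the class name SoftHyphenSketch and the method stripSoftHyphens are made up for illustration, and I am assuming the pass simply drops U+00AD from every token. The point is only that such a pass needs the List<String> as input, so skipping the list also skips this work.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a soft-hyphen pass; not LanguageTool's implementation.
public class SoftHyphenSketch {

    // U+00AD SOFT HYPHEN, an invisible hint for optional line breaks.
    private static final String SOFT_HYPHEN = "\u00AD";

    // Returns a new list in which every token has its soft hyphens removed.
    // The input list is left unmodified.
    static List<String> stripSoftHyphens(List<String> tokens) {
        List<String> cleaned = new ArrayList<>(tokens.size());
        for (String token : tokens) {
            cleaned.add(token.replace(SOFT_HYPHEN, ""));
        }
        return cleaned;
    }

    public static void main(String[] args) {
        // Assumed tokenizer output: words and whitespace as separate tokens.
        List<String> tokens = Arrays.asList("hy\u00ADphen", " ", "test");
        System.out.println(stripSoftHyphens(tokens));
    }
}
```

If the tokenizer result is thrown away, there is nothing to pass into a method like this, which is why part of the measured speedup may come from the skipped pass rather than from the tokenizer change itself.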