Re: [jira] Commented: (LUCENE-966) A faster JFlex-based replacement for StandardAnalyzer

Mark Miller Thu, 02 Aug 2007 14:19:55 -0700



Mark -- have you tried the jflex-analyzer-r560135-patch.txt patch with your wikipedia diff test? 
That's the early one whose grammar was "dot for dot" translated from the original JavaCC 
spec -- for further patches I did some "optimizations", which seem to have broken the 
compatibility...

The test is Mike's, and I think it is based on your latest patch. Looks like the optimizations might have to go, then?

Incidentally, what was the motivation for requiring the <NUM> token to have 
numbers only in every second segment and not in any segment?

I don't think the rule is "every second segment" but rather "at least every other segment". I am not sure why this rule was made. I am guessing it was just a good rule of thumb to catch a lot of serial numbers, model numbers, etc., without going overboard in the matching.
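
To make the "at least every other segment" reading concrete, here is a rough Java sketch of how I read the rule. It is only an illustration, not the actual JavaCC or JFlex grammar: the class name, the method names, and the separator set are all made up for this example, and the real punctuation class may differ.

```java
import java.util.regex.Pattern;

public class NumRuleSketch {
    // Hypothetical separator set, assumed for illustration only.
    private static final Pattern SEPARATOR = Pattern.compile("[-_/.,]");

    // True if the segment contains at least one digit.
    static boolean hasDigit(String segment) {
        for (int i = 0; i < segment.length(); i++) {
            if (Character.isDigit(segment.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    // True if the segment is non-empty and made up of letters and digits only.
    static boolean isAlphanumeric(String segment) {
        if (segment.isEmpty()) {
            return false;
        }
        for (int i = 0; i < segment.length(); i++) {
            if (!Character.isLetterOrDigit(segment.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // True if every segment at index start, start + 2, start + 4, ... has a digit.
    static boolean everyOtherHasDigit(String[] segments, int start) {
        for (int i = start; i < segments.length; i += 2) {
            if (!hasDigit(segments[i])) {
                return false;
            }
        }
        return true;
    }

    // Accepts two or more alphanumeric segments joined by separators, where
    // either the even-indexed or the odd-indexed segments all contain a digit.
    // The remaining segments may contain digits too, hence "at least".
    static boolean looksLikeNum(String token) {
        String[] segments = SEPARATOR.split(token, -1);
        if (segments.length < 2) {
            return false;
        }
        for (String segment : segments) {
            if (!isAlphanumeric(segment)) {
                return false;
            }
        }
        return everyOtherHasDigit(segments, 0) || everyOtherHasDigit(segments, 1);
    }
}
```

Under this reading, something like "ab-12-cd" or "a1-bc-d2" would qualify, while "ab-cd-12" would not, because two digit-free segments sit next to each other.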


