[ https://issues.apache.org/jira/browse/LUCENE-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284087#comment-13284087 ]

Dawid Weiss commented on LUCENE-4078:
-------------------------------------

bq. I'm not really following you there ... '|' is the OR operator, so the regex 
"|" is a redundant way of saying "" which is "the empty pattern" or a way of 
saying "match the empty string".

Yeah, I am a bit surprised at what "" matches. It doesn't just match an empty
input string; it matches the empty string at every position in between
characters (and at both ends)... or in other words, it matches what's not
there. Makes sense when you think of it.
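A minimal sketch of that behaviour, using only the JDK's java.util.regex (it
does not exercise PatternReplaceCharFilter itself). The class and method names
below are hypothetical, chosen for illustration. It collects every offset at
which "" is found by Matcher.find(), shows the effect of replacing those
zero-length matches, and checks that whole-input matching of "" against the
empty string also succeeds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: shows where a pattern that can only match
// the empty string is found by Matcher.find().
class EmptyPatternDemo {

    // Returns the start offset of every match of `regex` in `input`.
    static List<Integer> matchPositions(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<Integer> starts = new ArrayList<>();
        while (m.find()) {          // find() steps past zero-length matches
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        // "" is found before, between and after the characters: [0, 1, 2, 3]
        System.out.println(matchPositions("", "abc"));
        // Every one of those zero-length matches is replaced: -a-b-c-
        System.out.println("abc".replaceAll("", "-"));
        // Matching the whole input: "" does match the empty string itself.
        System.out.println(Pattern.matches("", ""));   // true
    }
}
```

So "" matches the empty input, and it also produces a zero-length match at
each of the four positions of "abc".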

As for '|', I looked at it from an automata theory point of view -- '|' doesn't
need any arguments or post-arguments (or states), unlike '+', '*' or the like,
which need a state to reference. I'd be convinced '|' is a consistent way of
saying 'match empty string or empty string' if the "+" pattern worked ("match
empty string one or more times"), but it doesn't -- it fails with a syntax
error. So '|' is kind of special here.
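The asymmetry can be checked with a small sketch against the JDK's
java.util.regex (again, not Lucene code; the class and helper names are
hypothetical). It reports whether a pattern compiles, and confirms that "|"
behaves like "" under replacement while a bare "+" is rejected.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class: compares a bare alternation operator
// with a bare quantifier in java.util.regex.
class BareOperatorDemo {

    // Returns true if `regex` compiles, false if Pattern rejects it.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "|" is two empty alternatives, so it compiles...
        System.out.println(compiles("|"));                 // true
        // ...and replaces the same zero-length matches as "": -a-b-c-
        System.out.println("abc".replaceAll("|", "-"));
        // "+" has no operand to repeat ("dangling meta character").
        System.out.println(compiles("+"));                 // false
    }
}
```

The engine accepts empty alternatives but insists that a quantifier have
something to its left, which is why '|' ends up looking special.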

I don't know enough regexp theory to argue whether I'm right or wrong, though.
I don't even think there is one "right" way to do things, if this quote is
genuine:

I define UNIX as “30 definitions of regular expressions living under one roof.” 
—Don Knuth

Dawid
                
> PatternReplaceCharFilter assertion error
> ----------------------------------------
>
>                 Key: LUCENE-4078
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4078
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>            Priority: Minor
>             Fix For: 4.0
>
>
> Build: https://builds.apache.org/job/Lucene-trunk/1942/
> 1 tests failed.
> REGRESSION:  org.apache.lucene.analysis.pattern.TestPatternReplaceCharFilter.testRandomStrings
> Error Message:
> Stack Trace:
> java.lang.AssertionError
>        at __randomizedtesting.SeedInfo.seed([8E91A6AC395FEED9:618A6129A5BB9EC]:0)
>        at org.apache.lucene.analysis.MockTokenizer.readCodePoint(MockTokenizer.java:153)
>        at org.apache.lucene.analysis.MockTokenizer.incrementToken(MockTokenizer.java:123)
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:558)
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:488)
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:430)
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:424)
>        at org.apache.lucene.analysis.pattern.TestPatternReplaceCharFilter.testRandomStrings(TestPatternReplaceCharFilter.java:323)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>        at java.lang.reflect.Method.invoke(Method.java:616)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889)
>        at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
>        at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
>        at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>        at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
>        at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
>        at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
>        at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(Randomized

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
