[ https://issues.apache.org/jira/browse/LUCENE-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284087#comment-13284087 ]
Dawid Weiss commented on LUCENE-4078:
-------------------------------------
bq. I'm not really following you there ... '|' is the OR operator, so the regex
"|" is a redundant way of saying "" which is "the empty pattern" or a way of
saying "match the empty string".
Yeah, I am a bit surprised at what "" matches. It doesn't match just an empty
input string. It also matches the empty string in between characters... or in
other words, it matches what's not there. Makes sense when you think about it.
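
To make the "in between characters" behaviour concrete, here is a minimal sketch, assuming java.util.regex semantics (the engine PatternReplaceCharFilter takes its Pattern from). The class name and the matchStarts helper are made up for illustration and are not part of Lucene.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of Lucene.
class EmptyPatternPositions {
    // Collects the start offset of every match of regex in input.
    static List<Integer> matchStarts(String regex, String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        // The empty pattern matches before each character and once at the end.
        System.out.println(matchStarts("", "abc"));     // [0, 1, 2, 3]
        // A replacement therefore lands in every gap.
        System.out.println("abc".replaceAll("", "-"));  // -a-b-c-
    }
}
```

So a three-character input yields four zero-length matches, one per gap, which is the "matches what's not there" effect described above.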
As for '|', I looked at it from an automata theory point of view -- '|' doesn't
need any arguments or post-arguments (or states), unlike '+', '*' or the like,
which need a state to reference. I'd be convinced '|' is a consistent way of
saying 'match empty string or empty string' if the "+" pattern worked ("match
empty string one or more times"), but it doesn't -- it fails with an error. So
'|' is kind of special here.
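
The asymmetry between "|" and "+" can be checked directly. This is a small sketch, again assuming java.util.regex; the class name and the compiles helper are hypothetical and exist only for this illustration.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class, not part of Lucene.
class BarVersusPlus {
    // Returns true if java.util.regex accepts the pattern, false on a syntax error.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A bare alternation is accepted: both branches are the empty pattern.
        System.out.println(compiles("|"));            // true
        // A bare quantifier has nothing to repeat and is rejected.
        System.out.println(compiles("+"));            // false
        // "|" behaves like "": it matches the empty input.
        System.out.println(Pattern.matches("|", "")); // true
    }
}
```

In this engine the quantifier needs a preceding element to apply to, while alternation is happy with empty branches on both sides.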
I don't know enough about regexp theory to argue whether I'm right or wrong,
though. I don't even think there is one "right" way to do things, if this is a
true quote:
I define UNIX as “30 definitions of regular expressions living under one roof.”
—Don Knuth
Dawid
> PatternReplaceCharFilter assertion error
> ----------------------------------------
>
> Key: LUCENE-4078
> URL: https://issues.apache.org/jira/browse/LUCENE-4078
> Project: Lucene - Java
> Issue Type: Bug
> Reporter: Dawid Weiss
> Assignee: Dawid Weiss
> Priority: Minor
> Fix For: 4.0
>
>
> Build: https://builds.apache.org/job/Lucene-trunk/1942/
> 1 tests failed.
> REGRESSION: org.apache.lucene.analysis.pattern.TestPatternReplaceCharFilter.testRandomStrings
> Error Message:
> Stack Trace:
> java.lang.AssertionError
> at __randomizedtesting.SeedInfo.seed([8E91A6AC395FEED9:618A6129A5BB9EC]:0)
> at org.apache.lucene.analysis.MockTokenizer.readCodePoint(MockTokenizer.java:153)
> at org.apache.lucene.analysis.MockTokenizer.incrementToken(MockTokenizer.java:123)
> at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:558)
> at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:488)
> at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:430)
> at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:424)
> at org.apache.lucene.analysis.pattern.TestPatternReplaceCharFilter.testRandomStrings(TestPatternReplaceCharFilter.java:323)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:616)
> at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969)
> at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
> at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814)
> at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875)
> at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889)
> at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
> at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
> at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
> at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
> at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
> at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
> at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
> at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(Randomized