[ https://issues.apache.org/jira/browse/LUCENE-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284113#comment-13284113 ]
Hoss Man commented on LUCENE-4078:
----------------------------------

bq. It doesn't match an empty string. It matches an empty string in between characters...

Well, it's more complicated than that. It *does* match the empty string (in the sense of "does this regex match this entire string, which happens to be empty?"), but in the context of "find" or "replace" on a larger string you are correct that it matches nothing, which means it matches the emptiness between characters.

bq. I'd be convinced '|' is a consistent way of saying 'match empty string or empty string' if "+" pattern worked ("match empty string one or more times"), but it doesn't -- this will fail with an error. So '|' is kind of special here.

I think that's just a fluke of syntax/precedence ... if you use parens (capturing or otherwise) you can say "match the empty pattern 1 or more times"...

{code}
$ perl -MData::Dumper -le 'print Dumper split /(?:)+/, "ABCD";'
$VAR1 = 'A';
$VAR2 = 'B';
$VAR3 = 'C';
$VAR4 = 'D';
{code}

Bottom Line: these patterns are all valid and meaningful, and everything we've discussed is tangential to the problem -- which seems to be that the JVM lets the empty pattern split in between chars instead of codepoints, which seems like a bug.

> PatternReplaceCharFilter assertion error
> ----------------------------------------
>
>                 Key: LUCENE-4078
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4078
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>            Priority: Minor
>             Fix For: 4.0
>
>
> Build: https://builds.apache.org/job/Lucene-trunk/1942/
> 1 tests failed.
> REGRESSION:  org.apache.lucene.analysis.pattern.TestPatternReplaceCharFilter.testRandomStrings
> Error Message:
> Stack Trace:
> java.lang.AssertionError
>       at __randomizedtesting.SeedInfo.seed([8E91A6AC395FEED9:618A6129A5BB9EC]:0)
>       at org.apache.lucene.analysis.MockTokenizer.readCodePoint(MockTokenizer.java:153)
>       at org.apache.lucene.analysis.MockTokenizer.incrementToken(MockTokenizer.java:123)
>       at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:558)
>       at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:488)
>       at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:430)
>       at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:424)
>       at org.apache.lucene.analysis.pattern.TestPatternReplaceCharFilter.testRandomStrings(TestPatternReplaceCharFilter.java:323)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:616)
>       at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969)
>       at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
>       at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814)
>       at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875)
>       at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889)
>       at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
>       at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
>       at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>       at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
>       at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
>       at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
>       at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
>       at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(Randomized
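
For anyone who wants to poke at the "bottom line" from the comment, here is a minimal sketch in Java of what "the empty pattern matches between chars, not codepoints" looks like. The class and method names (`EmptyPatternDemo`, `emptyMatchOffsets`, `splitsSurrogatePair`) are made up for illustration and are not part of Lucene or the test that failed. It only uses `java.util.regex` and assumes the JVM behaves as described in the comment: after an empty match, the matcher advances by one UTF-16 char.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not taken from the Lucene code base.
class EmptyPatternDemo {

    // Collects the start offset (in UTF-16 chars) of every match
    // of the empty pattern in s.
    static List<Integer> emptyMatchOffsets(String s) {
        List<Integer> offsets = new ArrayList<>();
        Matcher m = Pattern.compile("").matcher(s);
        while (m.find()) {
            offsets.add(m.start());
        }
        return offsets;
    }

    // True if the offset falls between the high and low half
    // of a surrogate pair, i.e. in the middle of one code point.
    static boolean splitsSurrogatePair(String s, int offset) {
        return offset > 0 && offset < s.length()
                && Character.isHighSurrogate(s.charAt(offset - 1))
                && Character.isLowSurrogate(s.charAt(offset));
    }

    public static void main(String[] args) {
        // 'A', then U+1D538 (one code point, two chars), then 'B'
        String s = "A\uD835\uDD38B";
        for (int offset : emptyMatchOffsets(s)) {
            String note = splitsSurrogatePair(s, offset)
                    ? "  <- inside a surrogate pair" : "";
            System.out.println(offset + note);
        }
        System.out.println("chars: " + s.length()
                + ", code points: " + s.codePointCount(0, s.length()));
    }
}
```

If the matcher respected code point boundaries, the offset between the two surrogate halves would never be reported. A tokenizer that reads whole code points, as `MockTokenizer.readCodePoint` in the trace above appears to do, would presumably trip over input that was cut at such an offset.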