We could add a check at the random pattern generation level that applies the pattern to a complex Unicode string and then verifies the result is still valid UTF-16. If it's not, a new pattern would be picked?
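
A minimal sketch of such a check, under stated assumptions: the class and method names (Utf16PatternCheck, isValidUtf16, keepsUtf16Valid, pickPattern) are hypothetical and not part of Lucene, the probe string and replacement are supplied by the caller, and the retry limit is an arbitrary safeguard against looping forever.

```java
import java.util.function.Supplier;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene's test framework.
final class Utf16PatternCheck {

    // True if every surrogate in s belongs to a well-formed high/low pair.
    static boolean isValidUtf16(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                // A high surrogate must be followed directly by a low surrogate.
                if (i + 1 >= s.length() || !Character.isLowSurrogate(s.charAt(i + 1))) {
                    return false;
                }
                i++; // skip the low half of the pair
            } else if (Character.isLowSurrogate(c)) {
                return false; // low surrogate without a preceding high surrogate
            }
        }
        return true;
    }

    // Applies the pattern to the probe string and checks the output is still valid UTF-16.
    static boolean keepsUtf16Valid(Pattern pattern, String replacement, String probe) {
        String replaced = pattern.matcher(probe).replaceAll(replacement);
        return isValidUtf16(replaced);
    }

    // Draws candidate patterns until one passes the check, giving up after maxAttempts.
    static Pattern pickPattern(Supplier<Pattern> candidates, String replacement,
                               String probe, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            Pattern candidate = candidates.get();
            if (keepsUtf16Valid(candidate, replacement, probe)) {
                return candidate;
            }
        }
        throw new IllegalStateException("no pattern kept the probe valid after "
            + maxAttempts + " attempts");
    }
}
```

The probe string should contain supplementary characters (surrogate pairs), since those are what a pattern matching between chars rather than code points would break.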

Dawid

On Sun, May 27, 2012 at 2:49 PM, Robert Muir <[email protected]> wrote:
> there is another situation, with the html stripper where we turn this assert
> into an assume. I don't have the code in front of me, but I think it would
> be good to just add this as a toggle to mocktokenizer (throw
> assumptionviolated in this case), if that's not how it works already.
>
> On May 27, 2012 4:51 AM, "Dawid Weiss (JIRA)" <[email protected]> wrote:
>>
>> [
>> https://issues.apache.org/jira/browse/LUCENE-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284124#comment-13284124
>> ]
>>
>> Dawid Weiss commented on LUCENE-4078:
>> -------------------------------------
>>
>> Thanks for the insight -- interesting.
>>
>> bq. is tangential to the problem – which seems to be that the JVM lets the
>> empty pattern split in between chars instead of codepoints, which seems like
>> a bug.
>>
>> Absolutely. This seems like a bug to me too.
>>
>> > PatternReplaceCharFilter assertion error
>> > ----------------------------------------
>> >
>> > Key: LUCENE-4078
>> > URL: https://issues.apache.org/jira/browse/LUCENE-4078
>> > Project: Lucene - Java
>> > Issue Type: Bug
>> > Reporter: Dawid Weiss
>> > Assignee: Dawid Weiss
>> > Priority: Minor
>> > Fix For: 4.0
>> >
>> >
>> > Build: https://builds.apache.org/job/Lucene-trunk/1942/
>> > 1 tests failed.
>> > REGRESSION:
>> > org.apache.lucene.analysis.pattern.TestPatternReplaceCharFilter.testRandomStrings
>> > Error Message:
>> > Stack Trace:
>> > java.lang.AssertionError
>> >   at __randomizedtesting.SeedInfo.seed([8E91A6AC395FEED9:618A6129A5BB9EC]:0)
>> >   at org.apache.lucene.analysis.MockTokenizer.readCodePoint(MockTokenizer.java:153)
>> >   at org.apache.lucene.analysis.MockTokenizer.incrementToken(MockTokenizer.java:123)
>> >   at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:558)
>> >   at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:488)
>> >   at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:430)
>> >   at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:424)
>> >   at org.apache.lucene.analysis.pattern.TestPatternReplaceCharFilter.testRandomStrings(TestPatternReplaceCharFilter.java:323)
>> >   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> >   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> >   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> >   at java.lang.reflect.Method.invoke(Method.java:616)
>> >   at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969)
>> >   at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
>> >   at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814)
>> >   at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875)
>> >   at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889)
>> >   at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
>> >   at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
>> >   at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>> >   at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
>> >   at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
>> >   at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
>> >   at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
>> >   at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(Randomized
>>
>> --
>> This message is automatically generated by JIRA.
>> If you think it was sent incorrectly, please contact your JIRA
>> administrators:
>> https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
>> For more information on JIRA, see: http://www.atlassian.com/software/jira
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>>
