[ https://issues.apache.org/jira/browse/LUCENE-8676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758201#comment-16758201 ]
ASF subversion and git services commented on LUCENE-8676:
---------------------------------------------------------

Commit e3ac4c9180a0eb6f1c7a3e49d1a8cda8669ae3fa in lucene-solr's branch refs/heads/branch_8x from Jim Ferenczi
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=e3ac4c9 ]

LUCENE-8676: The Korean tokenizer does not update the last position if the backtrace is caused by a big buffer (1024 chars).


> TestKoreanTokenizer#testRandomHugeStrings failure
> -------------------------------------------------
>
>                 Key: LUCENE-8676
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8676
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Jim Ferenczi
>            Priority: Major
>         Attachments: LUCENE-8676.patch
>
>
> TestKoreanTokenizer#testRandomHugeStrings failed in CI with the following
> exception:
> {noformat}
>    [junit4]    > Throwable #1: java.lang.AssertionError
>    [junit4]    >    at __randomizedtesting.SeedInfo.seed([8C5E2BE10F581CB:90E6857D4E833D83]:0)
>    [junit4]    >    at org.apache.lucene.analysis.ko.KoreanTokenizer.add(KoreanTokenizer.java:334)
>    [junit4]    >    at org.apache.lucene.analysis.ko.KoreanTokenizer.parse(KoreanTokenizer.java:707)
>    [junit4]    >    at org.apache.lucene.analysis.ko.KoreanTokenizer.incrementToken(KoreanTokenizer.java:377)
>    [junit4]    >    at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:748)
>    [junit4]    >    at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:659)
>    [junit4]    >    at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:561)
>    [junit4]    >    at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:474)
>    [junit4]    >    at org.apache.lucene.analysis.ko.TestKoreanTokenizer.testRandomHugeStrings(TestKoreanTokenizer.java:313)
>    [junit4]    >    at java.lang.Thread.run(Thread.java:748)
>    [junit4]   2> NOTE: leaving temporary files
> {noformat}
> I am able to reproduce locally with:
> {noformat}
> ant test -Dtestcase=TestKoreanTokenizer -Dtests.method=testRandomHugeStrings
>   -Dtests.seed=8C5E2BE10F581CB -Dtests.multiplier=2 -Dtests.nightly=true
>   -Dtests.slow=true
>   -Dtests.linedocsfile=/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-7.7/test-data/enwiki.random.lines.txt
>   -Dtests.locale=uk-UA -Dtests.timezone=Europe/Istanbul -Dtests.asserts=true
>   -Dtests.file.encoding=ISO-8859-1
> {noformat}
> After some investigation I found out that the position of the buffer is not
> updated when the maximum backtrace size is reached (1024).
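To make the failure mode concrete, here is a minimal toy model in Java. It is not the KoreanTokenizer code: the class `BacktraceGapSketch`, its fields `pos`, `lastBackTracePos` and `emitted`, and the methods `consume`/`backtrace` are hypothetical names chosen for illustration. Only the 1024-char limit comes from the issue. The sketch assumes the tokenizer forces a backtrace once the not-yet-emitted gap reaches that limit. If that forced path emits output but never advances the last backtrace position, the very next character trips the limit again, and in this model the same region is emitted over and over.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical toy model of a tokenizer that must force a backtrace
 * when too many characters are pending. Not the real KoreanTokenizer.
 */
class BacktraceGapSketch {
    // Mirrors the 1024-char maximum backtrace size mentioned in the issue.
    static final int MAX_BACKTRACE_GAP = 1024;

    int pos = 0;              // number of chars consumed so far
    int lastBackTracePos = 0; // position up to which output was emitted
    final List<int[]> emitted = new ArrayList<>(); // emitted [start, end) spans

    /** Emits the pending region [lastBackTracePos, pos) as one span. */
    void backtrace(boolean updateLastPos) {
        emitted.add(new int[] {lastBackTracePos, pos});
        if (updateLastPos) {
            // The step the bug report says was missing: advance the position.
            lastBackTracePos = pos;
        }
    }

    /** Consumes n chars, forcing a backtrace whenever the pending gap hits the limit. */
    void consume(int n, boolean fixed) {
        for (int i = 0; i < n; i++) {
            pos++;
            if (pos - lastBackTracePos >= MAX_BACKTRACE_GAP) {
                backtrace(fixed);
            }
        }
    }
}
```

With `fixed=false`, `consume(3000, false)` records 1977 overlapping spans that all start at 0; with `fixed=true`, `consume(3000, true)` records exactly two contiguous spans, [0, 1024) and [1024, 2048).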