[
https://issues.apache.org/jira/browse/LUCENE-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16689623#comment-16689623
]
Steve Rowe commented on LUCENE-8517:
------------------------------------
Another reproducing seed, though it only fails for me when I run the whole suite,
i.e. with {{-Dtests.method=testRandomChainsWithLargeStrings}} removed from the
command line. Maybe this test method is somehow affected by other methods in the
suite? From [https://builds.apache.org/job/Lucene-Solr-NightlyTests-7.x/377]:
{noformat}
[junit4] Suite: org.apache.lucene.analysis.core.TestRandomChains
[junit4] 2> Exception from random analyzer:
[junit4] 2> charfilters=
[junit4] 2> tokenizer=
[junit4] 2>
org.apache.lucene.analysis.MockTokenizer(org.apache.lucene.util.AttributeFactory$1@9c912349,
initial state: 0
[junit4] 2> state 0 [reject]:
[junit4] 2> a -> 1
[junit4] 2> b -> 2
[junit4] 2> f -> 3
[junit4] 2> i -> 4
[junit4] 2> n -> 5
[junit4] 2> o -> 6
[junit4] 2> s -> 7
[junit4] 2> t -> 8
[junit4] 2> w -> 9
[junit4] 2> state 1 [accept]:
[junit4] 2> n -> 10
[junit4] 2> r -> 11
[junit4] 2> s -> 12
[junit4] 2> t -> 13
[junit4] 2> state 2 [reject]:
[junit4] 2> e -> 14
[junit4] 2> u -> 15
[junit4] 2> y -> 16
[junit4] 2> state 3 [reject]:
[junit4] 2> o -> 17
[junit4] 2> state 4 [reject]:
[junit4] 2> f -> 18
[junit4] 2> n -> 19
[junit4] 2> s -> 20
[junit4] 2> t -> 21
[junit4] 2> state 5 [reject]:
[junit4] 2> o -> 22
[junit4] 2> state 6 [reject]:
[junit4] 2> f -> 23
[junit4] 2> n -> 24
[junit4] 2> r -> 25
[junit4] 2> state 7 [reject]:
[junit4] 2> u -> 26
[junit4] 2> state 8 [reject]:
[junit4] 2> h -> 27
[junit4] 2> o -> 28
[junit4] 2> state 9 [reject]:
[junit4] 2> a -> 29
[junit4] 2> i -> 30
[junit4] 2> state 10 [accept]:
[junit4] 2> d -> 31
[junit4] 2> state 11 [reject]:
[junit4] 2> e -> 32
[junit4] 2> state 12 [accept]:
[junit4] 2> state 13 [accept]:
[junit4] 2> state 14 [accept]:
[junit4] 2> state 15 [reject]:
[junit4] 2> t -> 33
[junit4] 2> state 16 [accept]:
[junit4] 2> state 17 [reject]:
[junit4] 2> r -> 34
[junit4] 2> state 18 [accept]:
[junit4] 2> state 19 [accept]:
[junit4] 2> t -> 35
[junit4] 2> state 20 [accept]:
[junit4] 2> state 21 [accept]:
[junit4] 2> state 22 [accept]:
[junit4] 2> t -> 36
[junit4] 2> state 23 [accept]:
[junit4] 2> state 24 [accept]:
[junit4] 2> state 25 [accept]:
[junit4] 2> state 26 [reject]:
[junit4] 2> c -> 37
[junit4] 2> state 27 [reject]:
[junit4] 2> a -> 38
[junit4] 2> e -> 39
[junit4] 2> i -> 40
[junit4] 2> state 28 [accept]:
[junit4] 2> state 29 [reject]:
[junit4] 2> s -> 41
[junit4] 2> state 30 [reject]:
[junit4] 2> l -> 42
[junit4] 2> t -> 43
[junit4] 2> state 31 [accept]:
[junit4] 2> state 32 [accept]:
[junit4] 2> state 33 [accept]:
[junit4] 2> state 34 [accept]:
[junit4] 2> state 35 [reject]:
[junit4] 2> o -> 44
[junit4] 2> state 36 [accept]:
[junit4] 2> state 37 [reject]:
[junit4] 2> h -> 45
[junit4] 2> state 38 [reject]:
[junit4] 2> t -> 46
[junit4] 2> state 39 [accept]:
[junit4] 2> i -> 47
[junit4] 2> n -> 48
[junit4] 2> r -> 49
[junit4] 2> s -> 50
[junit4] 2> y -> 51
[junit4] 2> state 40 [reject]:
[junit4] 2> s -> 52
[junit4] 2> state 41 [accept]:
[junit4] 2> state 42 [reject]:
[junit4] 2> l -> 53
[junit4] 2> state 43 [reject]:
[junit4] 2> h -> 54
[junit4] 2> state 44 [accept]:
[junit4] 2> state 45 [accept]:
[junit4] 2> state 46 [accept]:
[junit4] 2> state 47 [reject]:
[junit4] 2> r -> 55
[junit4] 2> state 48 [accept]:
[junit4] 2> state 49 [reject]:
[junit4] 2> e -> 56
[junit4] 2> state 50 [reject]:
[junit4] 2> e -> 57
[junit4] 2> state 51 [accept]:
[junit4] 2> state 52 [accept]:
[junit4] 2> state 53 [accept]:
[junit4] 2> state 54 [accept]:
[junit4] 2> state 55 [accept]:
[junit4] 2> state 56 [accept]:
[junit4] 2> state 57 [accept]:
[junit4] 2> , true)
[junit4] 2> filters=
[junit4] 2>
org.apache.lucene.analysis.shingle.ShingleFilter(ValidatingTokenFilter@13de14e
term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1)
[junit4] 2>
Conditional:org.apache.lucene.analysis.shingle.FixedShingleFilter(OneTimeWrapper@2c0047b9
term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,
2)
[junit4] 2>
org.apache.lucene.analysis.miscellaneous.DateRecognizerFilter(ValidatingTokenFilter@7846ee89
term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1)
[junit4] 2> NOTE: download the large Jenkins line-docs file by running
'ant get-jenkins-line-docs' in the lucene directory.
[junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestRandomChains
-Dtests.method=testRandomChainsWithLargeStrings -Dtests.seed=8021DE70475B4140
-Dtests.multiplier=2 -Dtests.nightly=true -Dtests.slow=true
-Dtests.linedocsfile=/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-7.x/test-data/enwiki.random.lines.txt
-Dtests.locale=en-PH -Dtests.timezone=America/Belize -Dtests.asserts=true
-Dtests.file.encoding=UTF-8
[junit4] ERROR 14.4s J1 |
TestRandomChains.testRandomChainsWithLargeStrings <<<
[junit4] > Throwable #1: java.lang.IllegalStateException: stage 2:
inconsistent endOffset at pos=1: 2 vs 3; token=be s s
[junit4] > at
__randomizedtesting.SeedInfo.seed([8021DE70475B4140:EA7A61611E1561B3]:0)
[junit4] > at
org.apache.lucene.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:125)
[junit4] > at
org.apache.lucene.analysis.FilteringTokenFilter.incrementToken(FilteringTokenFilter.java:49)
[junit4] > at
org.apache.lucene.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:68)
[junit4] > at
org.apache.lucene.analysis.BaseTokenStreamTestCase.checkResetException(BaseTokenStreamTestCase.java:441)
[junit4] > at
org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:546)
[junit4] > at
org.apache.lucene.analysis.core.TestRandomChains.testRandomChainsWithLargeStrings(TestRandomChains.java:897)
[junit4] > at java.lang.Thread.run(Thread.java:748)
[junit4] 2> NOTE: leaving temporary files on disk at:
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-7.x/checkout/lucene/build/analysis/common/test/J1/temp/lucene.analysis.core.TestRandomChains_8021DE70475B4140-001
[junit4] 2> NOTE: test params are: codec=Asserting(Lucene70):
{dummy=PostingsFormat(name=LuceneVarGapDocFreqInterval)}, docValues:{},
maxPointsInLeafNode=1764, maxMBSortInHeap=6.560614608283627,
sim=RandomSimilarity(queryNorm=true): {dummy=DFR GLZ(0.3)}, locale=en-PH,
timezone=America/Belize
[junit4] 2> NOTE: Linux 4.4.0-137-generic amd64/Oracle Corporation
1.8.0_191 (64-bit)/cpus=4,threads=1,free=210477784,total=288358400
{noformat}
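For anyone retrying this seed, the whole-suite run described above should amount to the "reproduce with" line from the log with only the {{-Dtests.method}} option dropped. A minimal sketch follows, assuming a POSIX shell with sed. The variable names are illustrative. The {{-Dtests.linedocsfile}} path is the one from the Jenkins host and would probably need to point at a local copy of the line-docs file.

```shell
# Reproduce line copied from the log above. The -Dtests.linedocsfile path is
# specific to the Jenkins host; adjust it for a local run (assumption).
repro_cmd='ant test -Dtestcase=TestRandomChains -Dtests.method=testRandomChainsWithLargeStrings -Dtests.seed=8021DE70475B4140 -Dtests.multiplier=2 -Dtests.nightly=true -Dtests.slow=true -Dtests.linedocsfile=/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-7.x/test-data/enwiki.random.lines.txt -Dtests.locale=en-PH -Dtests.timezone=America/Belize -Dtests.asserts=true -Dtests.file.encoding=UTF-8'

# Drop only the -Dtests.method option so that every method in the suite runs;
# the seed and all other options stay exactly as logged.
suite_cmd=$(printf '%s\n' "$repro_cmd" | sed 's/ -Dtests\.method=[^ ]*//')

# Print the resulting command instead of executing it.
printf '%s\n' "$suite_cmd"
```

The sketch only prints the command, so it can be inspected before being run by hand in the same place the original reproduce line would be run.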
> TestRandomChains.testRandomChainsWithLargeStrings failure
> ---------------------------------------------------------
>
> Key: LUCENE-8517
> URL: https://issues.apache.org/jira/browse/LUCENE-8517
> Project: Lucene - Core
> Issue Type: Bug
> Components: modules/analysis
> Reporter: Steve Rowe
> Priority: Major
>
> From
> [https://jenkins.thetaphi.de/job/Lucene-Solr-7.x-Linux/2828/consoleText],
> reproduces for me on Java8:
> {noformat}
> Checking out Revision 216f10026b86627750e133fe24ce6a750c470695
> (refs/remotes/origin/branch_7x)
> [...]
> [java-info] java version "10.0.1"
> [java-info] OpenJDK Runtime Environment (10.0.1+10, Oracle Corporation)
> [java-info] OpenJDK 64-Bit Server VM (10.0.1+10, Oracle Corporation)
> [java-info] Test args: [-XX:-UseCompressedOops -XX:+UseConcMarkSweepGC]
> [...]
> [junit4] Suite: org.apache.lucene.analysis.core.TestRandomChains
> [junit4] 2> Exception from random analyzer:
> [junit4] 2> charfilters=
> [junit4] 2>
> org.apache.lucene.analysis.charfilter.MappingCharFilter(org.apache.lucene.analysis.charfilter.NormalizeCharMap@3ef95503,
> java.io.StringReader@70dde633)
> [junit4] 2>
> org.apache.lucene.analysis.fa.PersianCharFilter(org.apache.lucene.analysis.charfilter.MappingCharFilter@12423b20)
> [junit4] 2> tokenizer=
> [junit4] 2> org.apache.lucene.analysis.th.ThaiTokenizer()
> [junit4] 2> filters=
> [junit4] 2>
> org.apache.lucene.analysis.compound.HyphenationCompoundWordTokenFilter(ValidatingTokenFilter@7914bba7
>
> term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,
> org.apache.lucene.analysis.compound.hyphenation.HyphenationTree@abd7bca)
> [junit4] 2>
> Conditional:org.apache.lucene.analysis.MockGraphTokenFilter(java.util.Random@56348091,
> OneTimeWrapper@aa1c073
> term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1)
> [junit4] 2>
> Conditional:org.apache.lucene.analysis.shingle.FixedShingleFilter(OneTimeWrapper@4cf58fce
>
> term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,
> 4, <NUM>, <SOUTHEAST_ASIAN>)
> [junit4] 2>
> org.apache.lucene.analysis.pt.PortugueseLightStemFilter(ValidatingTokenFilter@3a915324
>
> term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,keyword=false)
> [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestRandomChains
> -Dtests.method=testRandomChainsWithLargeStrings -Dtests.seed=92344C536D4E00F4
> -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=en-ZW
> -Dtests.timezone=Atlantic/Faroe -Dtests.asserts=true
> -Dtests.file.encoding=US-ASCII
> [junit4] ERROR 0.46s J2 |
> TestRandomChains.testRandomChainsWithLargeStrings <<<
> [junit4] > Throwable #1: java.lang.IllegalStateException: stage 3:
> inconsistent startOffset at pos=0: 0 vs 5; token=effort
> [junit4] > at
> __randomizedtesting.SeedInfo.seed([92344C536D4E00F4:F86FF34234002007]:0)
> [junit4] > at
> org.apache.lucene.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:109)
> [junit4] > at
> org.apache.lucene.analysis.pt.PortugueseLightStemFilter.incrementToken(PortugueseLightStemFilter.java:48)
> [junit4] > at
> org.apache.lucene.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:68)
> [junit4] > at
> org.apache.lucene.analysis.BaseTokenStreamTestCase.checkResetException(BaseTokenStreamTestCase.java:441)
> [junit4] > at
> org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:546)
> [junit4] > at
> org.apache.lucene.analysis.core.TestRandomChains.testRandomChainsWithLargeStrings(TestRandomChains.java:897)
> [junit4] > at
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> [junit4] > at
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> [junit4] > at
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> [junit4] > at
> java.base/java.lang.reflect.Method.invoke(Method.java:564)
> [junit4] > at java.base/java.lang.Thread.run(Thread.java:844)
> [junit4] 2> NOTE: test params are: codec=Asserting(Lucene70):
> {dummy=TestBloomFilteredLucenePostings(BloomFilteringPostingsFormat(Lucene50(blocksize=128)))},
> docValues:{}, maxPointsInLeafNode=214, maxMBSortInHeap=5.729405811878087,
> sim=RandomSimilarity(queryNorm=true): {}, locale=en-ZW,
> timezone=Atlantic/Faroe
> [junit4] 2> NOTE: Linux 4.15.0-32-generic amd64/Oracle Corporation
> 10.0.1 (64-bit)/cpus=8,threads=1,free=266844648,total=518979584
> [junit4] 2> NOTE: All tests run in this JVM: [TestOptionalCondition,
> TestSerbianNormalizationRegularFilter, TestCommonGramsFilterFactory,
> TestDoubleEscape, TestDictionaryCompoundWordTokenFilterFactory,
> TestNorwegianMinimalStemFilter, TestCzechStemmer,
> TestTurkishLowerCaseFilterFactory, TestAnalyzers,
> TestScandinavianFoldingFilterFactory, TestReversePathHierarchyTokenizer,
> TestSimplePatternTokenizer, TestGalicianMinimalStemFilter, MinHashFilterTest,
> TestPortugueseStemFilterFactory, TestPersianCharFilter,
> TestPerFieldAnalyzerWrapper, TestRandomChains]
> [junit4] Completed [73/291 (1!)] on J2 in 3.12s, 2 tests, 1 error <<<
> FAILURES!
> {noformat}