[jira] [Commented] (LUCENE-5269) TestRandomChains failure

2013-10-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792375#comment-13792375
 ] 

ASF subversion and git services commented on LUCENE-5269:
-

Commit 1531202 from [~rcmuir] in branch 'dev/branches/lucene_solr_4_5'
[ https://svn.apache.org/r1531202 ]

LUCENE-5269: Fix NGramTokenFilter length filtering

 TestRandomChains failure
 

 Key: LUCENE-5269
 URL: https://issues.apache.org/jira/browse/LUCENE-5269
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Fix For: 4.5.1, 4.6, 5.0

 Attachments: LUCENE-5269.patch, LUCENE-5269.patch, LUCENE-5269.patch, 
 LUCENE-5269_test.patch, LUCENE-5269_test.patch, LUCENE-5269_test.patch


 One of EdgeNGramTokenizer, ShingleFilter, NGramTokenFilter is buggy, or 
 possibly only the combination of them conspiring together.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5269) TestRandomChains failure

2013-10-11 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792402#comment-13792402
 ] 

Uwe Schindler commented on LUCENE-5269:
---

This is so crazy! Why did we never hit this combination before?

Thanks for fixing, although I see the CodePointLengthFilter not really as a bug 
fix, it is more a new feature! Maybe explicitly add this as a new feature to 
CHANGES.txt?




[jira] [Commented] (LUCENE-5269) TestRandomChains failure

2013-10-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792424#comment-13792424
 ] 

Robert Muir commented on LUCENE-5269:
-

I didn't really want new features mixed with bugfixes :(

But in my opinion this was the simplest way to solve the problem: just add a 
filter like this and have NGramTokenFilter use that instead of LengthFilter.

I think it would be weird to see new features in a 4.5.1?
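
For illustration only, the difference between a length check that counts UTF-16 chars and one that counts code points can be sketched in plain Java. This is not the committed filter: the class and method names below are hypothetical, the bounds 3 and 10 are arbitrary, and the sample term is built from U+1090B and U+1091F (the code points behind the bytes f0 90 a4 8b and f0 90 a4 9f that show up in the debug output elsewhere in this thread).

```java
// A minimal sketch, not Lucene code. All names here are hypothetical.
final class LengthCheckSketch {

    // Length measured in UTF-16 code units, which is what String.length() reports.
    static boolean acceptByChars(String term, int min, int max) {
        int len = term.length();
        return len >= min && len <= max;
    }

    // Length measured in Unicode code points.
    static boolean acceptByCodePoints(String term, int min, int max) {
        int len = term.codePointCount(0, term.length());
        return len >= min && len <= max;
    }

    public static void main(String[] args) {
        // Two supplementary characters: 4 chars, but only 2 code points.
        String term = new String(Character.toChars(0x1090B))
                    + new String(Character.toChars(0x1091F));

        System.out.println(acceptByChars(term, 3, 10));      // true  (4 chars)
        System.out.println(acceptByCodePoints(term, 3, 10)); // false (2 code points)
    }
}
```

The two checks only disagree when a term contains characters outside the Basic Multilingual Plane, which is presumably why ordinary test data rarely exposed the mismatch.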




[jira] [Commented] (LUCENE-5269) TestRandomChains failure

2013-10-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792429#comment-13792429
 ] 

Robert Muir commented on LUCENE-5269:
-

{quote}
This is so crazy! Why did we never hit this combination before?
{quote}

This combination is especially good at finding the bug. Here's why:
{code}
Tokenizer tokenizer = new EdgeNGramTokenizer(TEST_VERSION_CURRENT, reader, 2, 94);
TokenStream stream = new ShingleFilter(tokenizer, 5);
stream = new NGramTokenFilter(TEST_VERSION_CURRENT, stream, 55, 83);
{code}

The edge-ngram has min=2 max=94, so it's basically brute-forcing every token 
size. Then the shingles make tons of tokens with positionIncrement=0. 
That makes it easy for the previously buggy NGramTokenFilter (with its wrong 
length filter) to misclassify tokens with its logic expecting codepoints, and 
to emit an initial token with posinc=0:

{code}
if ((curPos + curGramSize) <= curCodePointCount) {
...
  posIncAtt.setPositionIncrement(curPosInc);
{code}
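
As a rough way to see why brute-forcing every prefix size helps, the sketch below simulates the idea in plain Java without Lucene. It is only an approximation under stated assumptions: `edgeGrams` and `misclassified` are hypothetical helpers, the sample text is made up from the kind of characters in the debug output, and the threshold of 5 stands in for the real minGram of 55.

```java
import java.util.ArrayList;
import java.util.List;

// A minimal sketch, not the Lucene implementation. All names are hypothetical.
final class EdgeGramMismatchSketch {

    // Prefixes of `text` holding minGram..maxGram code points, a stand-in
    // for the tokens an edge n-gram tokenizer would emit.
    static List<String> edgeGrams(String text, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int total = text.codePointCount(0, text.length());
        for (int n = minGram; n <= Math.min(maxGram, total); n++) {
            int end = text.offsetByCodePoints(0, n);
            grams.add(text.substring(0, end));
        }
        return grams;
    }

    // Tokens that pass a char-based minimum-length check although they
    // contain fewer than `min` code points.
    static List<String> misclassified(List<String> tokens, int min) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            boolean passesCharCheck = t.length() >= min;
            boolean hasEnoughCodePoints = t.codePointCount(0, t.length()) >= min;
            if (passesCharCheck && !hasEnoughCodePoints) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // U+295E, a space, two supplementary characters, then ASCII.
        String text = "\u295E "
                + new String(Character.toChars(0x1090B))
                + new String(Character.toChars(0x1091F))
                + " xqxp";

        List<String> grams = edgeGrams(text, 2, 94);
        System.out.println(grams.size() + " prefixes");
        System.out.println(misclassified(grams, 5).size() + " misclassified at min=5");
    }
}
```

Because every prefix length is generated, at least one prefix tends to land exactly in the gap between the char count and the code point count once the text contains supplementary characters.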





[jira] [Commented] (LUCENE-5269) TestRandomChains failure

2013-10-11 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792461#comment-13792461
 ] 

Uwe Schindler commented on LUCENE-5269:
---

bq. I didn't really want new features mixed with bugfixes 

I agree! But now we have the new feature, so I just asked to add this as a 
separate entry in CHANGES.txt under New features: just the new filter, nothing 
more.




[jira] [Commented] (LUCENE-5269) TestRandomChains failure

2013-10-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792845#comment-13792845
 ] 

ASF subversion and git services commented on LUCENE-5269:
-

Commit 1531368 from [~rcmuir] in branch 'dev/trunk'
[ https://svn.apache.org/r1531368 ]

LUCENE-5269: satisfy the policeman




[jira] [Commented] (LUCENE-5269) TestRandomChains failure

2013-10-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792846#comment-13792846
 ] 

ASF subversion and git services commented on LUCENE-5269:
-

Commit 1531369 from [~rcmuir] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1531369 ]

LUCENE-5269: satisfy the policeman




[jira] [Commented] (LUCENE-5269) TestRandomChains failure

2013-10-10 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791661#comment-13791661
 ] 

Adrien Grand commented on LUCENE-5269:
--

Good catch. This was definitely not intentional, thanks for fixing those tests!

Patch looks good to me!




[jira] [Commented] (LUCENE-5269) TestRandomChains failure

2013-10-10 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792311#comment-13792311
 ] 

ASF subversion and git services commented on LUCENE-5269:
-

Commit 1531186 from [~rcmuir] in branch 'dev/trunk'
[ https://svn.apache.org/r1531186 ]

LUCENE-5269: Fix NGramTokenFilter length filtering




[jira] [Commented] (LUCENE-5269) TestRandomChains failure

2013-10-10 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792325#comment-13792325
 ] 

Robert Muir commented on LUCENE-5269:
-

The test needs some improvement... after backporting I ran the tests about 30 
times, and I hit this one:

ant test  -Dtestcase=TestBugInSomething 
-Dtests.method=testUnicodeShinglesAndNgrams -Dtests.seed=1BFA8BADE39EDF70 
-Dtests.slow=true -Dtests.locale=th_TH_TH_#u-nu-thai 
-Dtests.timezone=Europe/Copenhagen -Dtests.file.encoding=US-ASCII

{noformat}
   [junit4] Suite: org.apache.lucene.analysis.core.TestBugInSomething
   [junit4]   2> TEST FAIL: useCharFilter=true text='ike to thank the rap'
   [junit4]   2> ?.?. ??,  ?:??:?? ?? com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
   [junit4]   2> WARNING: Uncaught exception in thread: Thread[Thread-2,5,TGRP-TestBugInSomething]
   [junit4]   2> java.lang.OutOfMemoryError: GC overhead limit exceeded
   [junit4]   2>    at __randomizedtesting.SeedInfo.seed([1BFA8BADE39EDF70]:0)
   [junit4]   2>    at org.apache.lucene.analysis.tokenattributes.CharTermAttributeImpl.toString(CharTermAttributeImpl.java:269)
   [junit4]   2>    at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:696)
   [junit4]   2>    at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:605)
   [junit4]   2>    at org.apache.lucene.analysis.BaseTokenStreamTestCase.access$000(BaseTokenStreamTestCase.java:57)
   [junit4]   2>    at org.apache.lucene.analysis.BaseTokenStreamTestCase$AnalysisThread.run(BaseTokenStreamTestCase.java:476)
   [junit4]   2> 
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestBugInSomething -Dtests.method=testUnicodeShinglesAndNgrams -Dtests.seed=1BFA8BADE39EDF70 -Dtests.slow=true -Dtests.locale=th_TH_TH_#u-nu-thai -Dtests.timezone=Europe/Copenhagen -Dtests.file.encoding=US-ASCII
   [junit4] ERROR   30.6s | TestBugInSomething.testUnicodeShinglesAndNgrams <<<
   [junit4]    > Throwable #1: java.lang.RuntimeException: some thread(s) failed
   [junit4]    >    at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:526)
   [junit4]    >    at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:428)
   [junit4]    >    at org.apache.lucene.analysis.core.TestBugInSomething.testUnicodeShinglesAndNgrams(TestBugInSomething.java:255)
   [junit4]    >    at java.lang.Thread.run(Thread.java:724)Throwable #2: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=12, name=Thread-2, state=RUNNABLE, group=TGRP-TestBugInSomething]
   [junit4]    > Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
   [junit4]    >    at __randomizedtesting.SeedInfo.seed([1BFA8BADE39EDF70]:0)
   [junit4]    >    at org.apache.lucene.analysis.tokenattributes.CharTermAttributeImpl.toString(CharTermAttributeImpl.java:269)
   [junit4]    >    at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:696)
   [junit4]    >    at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:605)
   [junit4]    >    at org.apache.lucene.analysis.BaseTokenStreamTestCase.access$000(BaseTokenStreamTestCase.java:57)
   [junit4]    >    at org.apache.lucene.analysis.BaseTokenStreamTestCase$AnalysisThread.run(BaseTokenStreamTestCase.java:476)
   [junit4]   2> NOTE: test params are: codec=DummyCompressingStoredFields(storedFieldsFormat=CompressingStoredFieldsFormat(compressionMode=DUMMY, chunkSize=313), termVectorsFormat=CompressingTermVectorsFormat(compressionMode=DUMMY, chunkSize=313)), sim=RandomSimilarityProvider(queryNorm=true,coord=crazy): {}, locale=th_TH_TH_#u-nu-thai, timezone=Europe/Copenhagen
   [junit4]   2> NOTE: Linux 3.5.0-27-generic amd64/Oracle Corporation 1.7.0_25 (64-bit)/cpus=8,threads=1,free=155107808,total=477233152
   [junit4]   2> NOTE: All tests run in this JVM: [TestBugInSomething]
   [junit4] Completed in 30.92s, 1 test, 1 error <<< FAILURES!
   [junit4] 
   [junit4] 
   [junit4] Tests with failures:
   [junit4]   - org.apache.lucene.analysis.core.TestBugInSomething.testUnicodeShinglesAndNgrams
{noformat}

I will see if I can make a less-ridiculous version of the test that still fails 
with the bug.


[jira] [Commented] (LUCENE-5269) TestRandomChains failure

2013-10-10 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792330#comment-13792330
 ] 

ASF subversion and git services commented on LUCENE-5269:
-

Commit 1531193 from [~rcmuir] in branch 'dev/trunk'
[ https://svn.apache.org/r1531193 ]

LUCENE-5269: make test use less RAM




[jira] [Commented] (LUCENE-5269) TestRandomChains failure

2013-10-10 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792334#comment-13792334
 ] 

ASF subversion and git services commented on LUCENE-5269:
-

Commit 1531195 from [~rcmuir] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1531195 ]

LUCENE-5269: Fix NGramTokenFilter length filtering




[jira] [Commented] (LUCENE-5269) TestRandomChains failure

2013-10-09 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13790940#comment-13790940
 ] 

Robert Muir commented on LUCENE-5269:
-

Now I see stuff like this:
{noformat}
EdgeNGramTokenizer.reset()
ShingleFilter.reset()
NGramTokenFilter.reset()
EdgeNGramTokenizer-term=⥞ ,bytes=[e2 a5 9e 
20],positionIncrement=1,positionLength=1,startOffset=0,endOffset=2,type=word,clearCalled=true
EdgeNGramTokenizer-term=⥞ ऋ,bytes=[e2 a5 9e 20 f0 90 a4 
8b],positionIncrement=1,positionLength=1,startOffset=0,endOffset=4,type=word,clearCalled=true
EdgeNGramTokenizer-term=⥞ ऋट,bytes=[e2 a5 9e 20 f0 90 a4 8b f0 90 a4 
9f],positionIncrement=1,positionLength=1,startOffset=0,endOffset=6,type=word,clearCalled=true
EdgeNGramTokenizer-term=⥞ ऋट ,bytes=[e2 a5 9e 20 f0 90 a4 8b f0 90 a4 9f 
20],positionIncrement=1,positionLength=1,startOffset=0,endOffset=7,type=word,clearCalled=true
EdgeNGramTokenizer-term=⥞ ऋट x,bytes=[e2 a5 9e 20 f0 90 a4 8b f0 90 a4 9f 20 
78],positionIncrement=1,positionLength=1,startOffset=0,endOffset=8,type=word,clearCalled=true
EdgeNGramTokenizer-term=⥞ ऋट xq,bytes=[e2 a5 9e 20 f0 90 a4 8b f0 90 a4 9f 20 
78 
71],positionIncrement=1,positionLength=1,startOffset=0,endOffset=9,type=word,clearCalled=true
EdgeNGramTokenizer-term=⥞ ऋट xqx,bytes=[e2 a5 9e 20 f0 90 a4 8b f0 90 a4 9f 20 
78 71 
78],positionIncrement=1,positionLength=1,startOffset=0,endOffset=10,type=word,clearCalled=true
EdgeNGramTokenizer-term=⥞ ऋट xqxp,bytes=[e2 a5 9e 20 f0 90 a4 8b f0 90 a4 9f 
20 78 71 78 
70],positionIncrement=1,positionLength=1,startOffset=0,endOffset=11,type=word,clearCalled=true
EdgeNGramTokenizer-term=⥞ ऋट xqxp ,bytes=[e2 a5 9e 20 f0 90 a4 8b f0 90 a4 9f 
20 78 71 78 70 
20],positionIncrement=1,positionLength=1,startOffset=0,endOffset=12,type=word,clearCalled=true
EdgeNGramTokenizer-term=⥞ ऋट xqxp ,bytes=[e2 a5 9e 20 f0 90 a4 8b f0 90 a4 9f 
20 78 71 78 70 20 
16],positionIncrement=1,positionLength=1,startOffset=0,endOffset=13,type=word,clearCalled=true
EdgeNGramTokenizer.end()
ShingleFilter-term=⥞ ,bytes=[e2 a5 9e 
20],positionIncrement=1,positionLength=1,startOffset=0,endOffset=2,type=word,clearCalled=true
ShingleFilter-term=⥞  ⥞ ऋ,bytes=[e2 a5 9e 20 20 e2 a5 9e 20 f0 90 a4 
8b],positionIncrement=0,positionLength=2,startOffset=0,endOffset=4,type=shingle,clearCalled=true
ShingleFilter-term=⥞  ⥞ ऋ ⥞ ऋट,bytes=[e2 a5 9e 20 20 e2 a5 9e 20 f0 90 a4 8b 
20 e2 a5 9e 20 f0 90 a4 8b f0 90 a4 
9f],positionIncrement=0,positionLength=3,startOffset=0,endOffset=6,type=shingle,clearCalled=true
ShingleFilter-term=⥞  ⥞ ऋ ⥞ ऋट ⥞ ऋट ,bytes=[e2 a5 9e 20 20 e2 a5 9e 20 f0 90 
a4 8b 20 e2 a5 9e 20 f0 90 a4 8b f0 90 a4 9f 20 e2 a5 9e 20 f0 90 a4 8b f0 90 
a4 9f 
20],positionIncrement=0,positionLength=4,startOffset=0,endOffset=7,type=shingle,clearCalled=true
ShingleFilter-term=⥞  ⥞ ऋ ⥞ ऋट ⥞ ऋट  ⥞ ऋट x,bytes=[e2 a5 9e 20 20 e2 a5 9e 20 
f0 90 a4 8b 20 e2 a5 9e 20 f0 90 a4 8b f0 90 a4 9f 20 e2 a5 9e 20 f0 90 a4 8b 
f0 90 a4 9f 20 20 e2 a5 9e 20 f0 90 a4 8b f0 90 a4 9f 20 
78],positionIncrement=0,positionLength=5,startOffset=0,endOffset=8,type=shingle,clearCalled=true
ShingleFilter-term=⥞  ⥞ ऋ ⥞ ऋट ⥞ ऋट  ⥞ ऋट x ⥞ ऋट xq,bytes=[e2 a5 9e 20 20 e2 
a5 9e 20 f0 90 a4 8b 20 e2 a5 9e 20 f0 90 a4 8b f0 90 a4 9f 20 e2 a5 9e 20 f0 
90 a4 8b f0 90 a4 9f 20 20 e2 a5 9e 20 f0 90 a4 8b f0 90 a4 9f 20 78 20 e2 a5 
9e 20 f0 90 a4 8b f0 90 a4 9f 20 78 
71],positionIncrement=0,positionLength=6,startOffset=0,endOffset=9,type=shingle,clearCalled=true
ShingleFilter-term=⥞  ⥞ ऋ ⥞ ऋट ⥞ ऋट  ⥞ ऋट x ⥞ ऋट xq ⥞ ऋट xqx,bytes=[e2 a5 9e 
20 20 e2 a5 9e 20 f0 90 a4 8b 20 e2 a5 9e 20 f0 90 a4 8b f0 90 a4 9f 20 e2 a5 
9e 20 f0 90 a4 8b f0 90 a4 9f 20 20 e2 a5 9e 20 f0 90 a4 8b f0 90 a4 9f 20 78 
20 e2 a5 9e 20 f0 90 a4 8b f0 90 a4 9f 20 78 71 20 e2 a5 9e 20 f0 90 a4 8b f0 
90 a4 9f 20 78 71 
78],positionIncrement=0,positionLength=7,startOffset=0,endOffset=10,type=shingle,clearCalled=true
ShingleFilter-term=⥞  ⥞ ऋ ⥞ ऋट ⥞ ऋट  ⥞ ऋट x ⥞ ऋट xq ⥞ ऋट xqx ⥞ ऋट 
xqxp,bytes=[e2 a5 9e 20 20 e2 a5 9e 20 f0 90 a4 8b 20 e2 a5 9e 20 f0 90 a4 8b 
f0 90 a4 9f 20 e2 a5 9e 20 f0 90 a4 8b f0 90 a4 9f 20 20 e2 a5 9e 20 f0 90 a4 
8b f0 90 a4 9f 20 78 20 e2 a5 9e 20 f0 90 a4 8b f0 90 a4 9f 20 78 71 20 e2 a5 
9e 20 f0 90 a4 8b f0 90 a4 9f 20 78 71 78 20 e2 a5 9e 20 f0 90 a4 8b f0 90 a4 
9f 20 78 71 78 
70],positionIncrement=0,positionLength=8,startOffset=0,endOffset=11,type=shingle,clearCalled=true
ShingleFilter-term=⥞  ⥞ ऋ ⥞ ऋट ⥞ ऋट  ⥞ ऋट x ⥞ ऋट xq ⥞ ऋट xqx ⥞ ऋट xqxp ⥞ ऋट 
xqxp ,bytes=[e2 a5 9e 20 20 e2 a5 9e 20 f0 90 a4 8b 20 e2 a5 9e 20 f0 90 a4 8b 
f0 90 a4 9f 20 e2 a5 9e 20 f0 90 a4 8b f0 90 a4 9f 20 20 e2 a5 9e 20 f0 90 a4 
8b f0 90 a4 9f 20 78 20 e2 a5 9e 20 f0 90 a4 8b f0 90 a4 9f 20 78 71 20 e2 a5 
9e 20 f0 90 a4 8b f0 90 a4 9f 20 78 71 78 20 e2 a5 9e 20 f0 90 a4 8b f0 90 a4 
9f 20 78 71 78 70 20 e2 a5 9e 20 f0 90 a4 8b f0 90 a4 9f 20 78 71 78 70