[jira] [Commented] (SOLR-9894) Tokenizer work randomly
[ https://issues.apache.org/jira/browse/SOLR-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904419#comment-16904419 ]

Chongchen Chen commented on SOLR-9894:
--

As far as I know, IKTokenizer only tokenizes Chinese characters; it doesn't tokenize pinyin. So the bug is in org.wltea.pinyin.solr5.PinyinTokenFilterFactory, not in IKTokenizer or Solr. Also, I cannot find org.wltea.pinyin.solr5.PinyinTokenFilterFactory anywhere on the internet. I think this issue can be closed.

> Tokenizer work randomly
> ---
>
> Key: SOLR-9894
> URL: https://issues.apache.org/jira/browse/SOLR-9894
> Project: Solr
> Issue Type: Bug
> Components: query parsers
> Affects Versions: 6.2.1
> Environment: solrcloud 6.2.1 (3 solr nodes)
> OS: linux
> RAM: 8G
> Reporter: 王海涛
> Priority: Critical
> Labels: patch
> Attachments: step1.png, step2.png, step3.png, step4.png
>
>
> My schema.xml has a fieldType as follows:
> <fieldType ...>
>   <analyzer type="index">
>     <tokenizer class="org.wltea.analyzer.lucene.IKTokenizerFactory" useSmart="false"/>
>     <filter class="org.wltea.pinyin.solr5.PinyinTokenFilterFactory" pinyinAll="true" minTermLength="2"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="org.wltea.analyzer.lucene.IKTokenizerFactory" useSmart="true"/>
>   </analyzer>
> </fieldType>
>
> Note:
> the index tokenizer has useSmart="false";
> the query tokenizer has useSmart="true".
> But when I send a query request with parameter q,
> the query tokenizer sometimes uses useSmart=true
> and sometimes useSmart=false.
> That is so terrible!
> I guess the problem may be caused by the tokenizer cache.
> When I query, the tokenizer should use true as the value of useSmart,
> but it has cached the wrong tokenizer result, which was created by the IndexWriter
> using false as the value of useSmart.

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9894) Tokenizer work randomly
[ https://issues.apache.org/jira/browse/SOLR-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15786758#comment-15786758 ]

Erick Erickson commented on SOLR-9894:
--

We've mentioned several times that this involves a tokenizer that is _not_ supported by Apache Solr, specifically: org.wltea.pinyin.solr5.PinyinTokenFilterFactory. You have yet to show that the problem isn't in this custom class. Plus, the class mentions Solr 5, yet you're logging this against Solr 6.

Unless and until you can show that this issue is a problem with Solr and not with this non-Solr tokenizer, there is little that we can do. If you would like to retain consulting services to debug this custom code, please contact one of the many consulting services.
[jira] [Commented] (SOLR-9894) Tokenizer work randomly
[ https://issues.apache.org/jira/browse/SOLR-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15786574#comment-15786574 ]

王海涛 commented on SOLR-9894:
---

Can anyone resolve this bug? I would really appreciate it, because this bug makes my company's search results so bad, bad, bad...
[jira] [Commented] (SOLR-9894) Tokenizer work randomly
[ https://issues.apache.org/jira/browse/SOLR-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15781805#comment-15781805 ]

王海涛 commented on SOLR-9894:
---

I performed these 4 steps one by one: step1 --> step2 --> step3 --> step4.
I guess that step1 made Solr cache the tokenizer's index-time result rather than its query-time result, so step2 uses the index-time result, even though the query should use the query-time result.
step1 then step2: 98% probability
step3 then step4: 98% probability
[jira] [Commented] (SOLR-9894) Tokenizer work randomly
[ https://issues.apache.org/jira/browse/SOLR-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15781748#comment-15781748 ]

王海涛 commented on SOLR-9894:
---

The schema definition is as follows:
[jira] [Commented] (SOLR-9894) Tokenizer work randomly
[ https://issues.apache.org/jira/browse/SOLR-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15781229#comment-15781229 ]

Alexandre Rafalovitch commented on SOLR-9894:
-

The search does **not** happen against *xf_name*; it happens against *default_search_field*, as the debug output shows. So the question is: what is the type of *default_search_field*?

*xf_name* (the parameter value for *fl*) is the name of the field to be returned in the document list, not the field to search against.
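To make Alexandre's distinction concrete: `fl` only controls which stored fields come back in each document, while the field a bare `q` is searched against comes from `df` when the standard (lucene) query parser is used, or from a field-qualified query such as `q=xf_name:...`. With edismax, the searched fields come from `qf` instead. The sketch below only assembles a `/select` query string with the Java standard library. The collection name `mycollection` and the query text are placeholders, and whether `xf_name` is the right field to search depends on the schema, which is not preserved in this thread.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class SelectParams {
    // Joins parameters into a URL query string, URL-encoding each value as UTF-8.
    static String buildQuery(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("q", "example");        // placeholder query text
        params.put("df", "xf_name");       // default field a bare q is searched against
        params.put("fl", "xf_name");       // fields returned per document; no effect on matching
        params.put("debugQuery", "true");  // include parsed-query debug info in the response
        // "mycollection" is a placeholder collection name.
        System.out.println("/solr/mycollection/select?" + buildQuery(params));
        // prints /solr/mycollection/select?q=example&df=xf_name&fl=xf_name&debugQuery=true
    }
}
```

With `debugQuery=true`, the parsed query in the response should name `xf_name` rather than *default_search_field* if `df` took effect.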
[jira] [Commented] (SOLR-9894) Tokenizer work randomly
[ https://issues.apache.org/jira/browse/SOLR-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15780735#comment-15780735 ]

Steve Rowe commented on SOLR-9894:
--

Hi [~wanghaitao], please provide the schema definition for your {{xf_name}} field.

It looks to me like step2 and step4 should produce the same result, since they are the same query against the same collection with the same schema; however, the analysis results are different. Have you figured out how to make this happen? For example, if you see step2, then index some docs, do you then see step4?
[jira] [Commented] (SOLR-9894) Tokenizer work randomly
[ https://issues.apache.org/jira/browse/SOLR-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15780676#comment-15780676 ]

Erick Erickson commented on SOLR-9894:
--

If you think this should really be opened, you need to provide reasons why you think this is a Solr bug. The Solr devs are not responsible for someone else's code. In this case your complaint is about org.wltea.analyzer.lucene.IKTokenizerFactory, which is _not_ part of Solr, so why do you think this should be recorded against Solr?
[jira] [Commented] (SOLR-9894) Tokenizer work randomly
[ https://issues.apache.org/jira/browse/SOLR-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15780213#comment-15780213 ]

王海涛 commented on SOLR-9894:
---

I have added 4 attachments which show the case clearly. I have done a lot of testing on this problem and am sure it is caused by Solr, not by IKTokenizer. Please check it again. Thank you very much!
[jira] [Commented] (SOLR-9894) Tokenizer work randomly
[ https://issues.apache.org/jira/browse/SOLR-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15779952#comment-15779952 ]

Alexandre Rafalovitch commented on SOLR-9894:
-

The tokenizers used are not part of the Lucene/Solr code base. They seem to come from https://github.com/EugenePig/ik-analyzer-solr5 . A bug report should be opened against that repository with a specific example.

I would recommend being very clear on what example showcases the issue and perhaps even annotate and recompile the code to confirm this. It is unlikely to be something random, but might be a strange combination of factors that triggers whatever you are observing.
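One "strange combination of factors" that would look random from the outside, offered here only as a hypothesis: if a tokenizer library keeps `useSmart` in a JVM-wide singleton instead of per instance, then creating the index-time tokenizer (for example, while documents are being indexed) silently flips the setting for query-time tokenizers that already exist, so the observed behavior depends on request order. That would also fit Steve's suggested experiment of indexing some docs between step2 and step4. The classes below (`SharedConfig`, `Segmenter`) are made-up stand-ins, not the IK Analyzer source; whether ik-analyzer-solr5 actually behaves this way would have to be checked in that repository.

```java
// Hypothetical stand-in for a JVM-wide configuration singleton (not the IK source).
final class SharedConfig {
    private static final SharedConfig INSTANCE = new SharedConfig();
    private boolean useSmart; // a single flag shared by every Segmenter in the JVM

    static SharedConfig getInstance() { return INSTANCE; }
    boolean useSmart() { return useSmart; }
    void setUseSmart(boolean value) { useSmart = value; }
}

// Hypothetical segmenter that stores its setting in the shared singleton
// and reads it back every time it is asked for its mode.
final class Segmenter {
    private final SharedConfig cfg = SharedConfig.getInstance();

    Segmenter(boolean useSmart) {
        cfg.setUseSmart(useSmart); // overwrites whatever any other Segmenter set
    }

    String mode() { return cfg.useSmart() ? "smart" : "non-smart"; }
}

public class SharedStateDemo {
    public static void main(String[] args) {
        // Query-time instance, configured like useSmart="true".
        Segmenter query = new Segmenter(true);
        System.out.println("query before indexing: " + query.mode()); // smart

        // Index-time instance, configured like useSmart="false",
        // e.g. created when documents are indexed.
        Segmenter index = new Segmenter(false);
        System.out.println("index:                 " + index.mode()); // non-smart
        System.out.println("query after indexing:  " + query.mode()); // non-smart
    }
}
```

Keeping `useSmart` as a per-instance field would let each analyzer keep its own setting; whether that is the relevant fix here depends on the actual library code, which is exactly what annotating and recompiling it would reveal.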