[jira] [Commented] (SOLR-9894) Tokenizer work randomly

2019-08-10 Thread Chongchen Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904419#comment-16904419
 ] 

Chongchen Chen commented on SOLR-9894:
--

As far as I know, IKTokenizer only tokenizes Chinese characters, it doesn't 
tokenize pinyin. So the bug is in 
org.wltea.pinyin.solr5.PinyinTokenFilterFactory, not IKTokenizer or Solr. Also 
I cannot find org.wltea.pinyin.solr5.PinyinTokenFilterFactory on internet. I 
think this issue can be closed.

> Tokenizer work randomly
> ---
>
> Key: SOLR-9894
> URL: https://issues.apache.org/jira/browse/SOLR-9894
> Project: Solr
>  Issue Type: Bug
>  Components: query parsers
>Affects Versions: 6.2.1
> Environment: solrcloud 6.2.1(3 solr nodes)
> OS:linux
> RAM:8G
>Reporter: 王海涛
>Priority: Critical
>  Labels: patch
> Attachments: step1.png, step2.png, step3.png, step4.png
>
>
> my schema.xml has a fieldType as folow:
> 
>   
>class="org.wltea.analyzer.lucene.IKTokenizerFactory" useSmart="false"/>
>class="org.wltea.pinyin.solr5.PinyinTokenFilterFactory" pinyinAll="true" 
> minTermLength="2"/> 
>   
>   
>   
>class="org.wltea.analyzer.lucene.IKTokenizerFactory" useSmart="true"/>
>  
>   
>   
> Attention:
>   index tokenzier useSmart is false
>   query tokenzier useSmart is true
> But when I send query request with parameter q ,
> the query tokenziner sometimes useSmart equals true
> sometimes useSmart equal false.
> That is so terrible!
> I guess the problem may be caught by tokenizer cache.
> when I query ,the tokenizer should use true as the useSmart's value,
> but it had cache the wrong tokenizer result which created by indexWriter who 
> use false as useSmart's value.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9894) Tokenizer work randomly

2016-12-29 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15786758#comment-15786758
 ] 

Erick Erickson commented on SOLR-9894:
--

We've mentioned several times that this involves a tokenizer that is _not_ 
supported by Apache Solr, specifically: 
org.wltea.pinyin.solr5.PinyinTokenFilterFactory. You have yet to show that the 
problem isn't in this custom class.

Plus, the class mentions Solr 5, yet you're logging this against Solr 6.

Unless and until you can show that this issue is a problem with Solr and not 
this non-solr tokenizer there is little that we can do. If you would like to 
retain consulting services to debug this custom code, please contact one of the 
many consulting services.

> Tokenizer work randomly
> ---
>
> Key: SOLR-9894
> URL: https://issues.apache.org/jira/browse/SOLR-9894
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: 6.2.1
> Environment: solrcloud 6.2.1(3 solr nodes)
> OS:linux
> RAM:8G
>Reporter: 王海涛
>Priority: Critical
>  Labels: patch
> Attachments: step1.png, step2.png, step3.png, step4.png
>
>
> my schema.xml has a fieldType as folow:
> 
>   
>class="org.wltea.analyzer.lucene.IKTokenizerFactory" useSmart="false"/>
>class="org.wltea.pinyin.solr5.PinyinTokenFilterFactory" pinyinAll="true" 
> minTermLength="2"/> 
>   
>   
>   
>class="org.wltea.analyzer.lucene.IKTokenizerFactory" useSmart="true"/>
>  
>   
>   
> Attention:
>   index tokenzier useSmart is false
>   query tokenzier useSmart is true
> But when I send query request with parameter q ,
> the query tokenziner sometimes useSmart equals true
> sometimes useSmart equal false.
> That is so terrible!
> I guess the problem may be caught by tokenizer cache.
> when I query ,the tokenizer should use true as the useSmart's value,
> but it had cache the wrong tokenizer result which created by indexWriter who 
> use false as useSmart's value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9894) Tokenizer work randomly

2016-12-29 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15786574#comment-15786574
 ] 

王海涛 commented on SOLR-9894:
---

Does anyone can resolve this bug? I will appreciate you, because this bug make 
my company search result so bad bad bad...

> Tokenizer work randomly
> ---
>
> Key: SOLR-9894
> URL: https://issues.apache.org/jira/browse/SOLR-9894
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: 6.2.1
> Environment: solrcloud 6.2.1(3 solr nodes)
> OS:linux
> RAM:8G
>Reporter: 王海涛
>Priority: Critical
>  Labels: patch
> Attachments: step1.png, step2.png, step3.png, step4.png
>
>
> my schema.xml has a fieldType as folow:
> 
>   
>class="org.wltea.analyzer.lucene.IKTokenizerFactory" useSmart="false"/>
>class="org.wltea.pinyin.solr5.PinyinTokenFilterFactory" pinyinAll="true" 
> minTermLength="2"/> 
>   
>   
>   
>class="org.wltea.analyzer.lucene.IKTokenizerFactory" useSmart="true"/>
>  
>   
>   
> Attention:
>   index tokenzier useSmart is false
>   query tokenzier useSmart is true
> But when I send query request with parameter q ,
> the query tokenziner sometimes useSmart equals true
> sometimes useSmart equal false.
> That is so terrible!
> I guess the problem may be caught by tokenizer cache.
> when I query ,the tokenizer should use true as the useSmart's value,
> but it had cache the wrong tokenizer result which created by indexWriter who 
> use false as useSmart's value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9894) Tokenizer work randomly

2016-12-27 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15781805#comment-15781805
 ] 

王海涛 commented on SOLR-9894:
---

I operate this 4 steps one by one. setp1-->step2-->step3-->step4.
It guess that the step1 made solr cache the tokenizer's index result not 
tokenizer's query result, 
so that step2 use tokenizer's index result but the query should use tokenzier's 
query result.

when step1 then step2;   98%  possibility
when step3 then step4;   98%  possibility

> Tokenizer work randomly
> ---
>
> Key: SOLR-9894
> URL: https://issues.apache.org/jira/browse/SOLR-9894
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: 6.2.1
> Environment: solrcloud 6.2.1(3 solr nodes)
> OS:linux
> RAM:8G
>Reporter: 王海涛
>Priority: Critical
>  Labels: patch
> Attachments: step1.png, step2.png, step3.png, step4.png
>
>
> my schema.xml has a fieldType as folow:
> 
>   
>class="org.wltea.analyzer.lucene.IKTokenizerFactory" useSmart="false"/>
>class="org.wltea.pinyin.solr5.PinyinTokenFilterFactory" pinyinAll="true" 
> minTermLength="2"/> 
>   
>   
>   
>class="org.wltea.analyzer.lucene.IKTokenizerFactory" useSmart="true"/>
>  
>   
>   
> Attention:
>   index tokenzier useSmart is false
>   query tokenzier useSmart is true
> But when I send query request with parameter q ,
> the query tokenziner sometimes useSmart equals true
> sometimes useSmart equal false.
> That is so terrible!
> I guess the problem may be caught by tokenizer cache.
> when I query ,the tokenizer should use true as the useSmart's value,
> but it had cache the wrong tokenizer result which created by indexWriter who 
> use false as useSmart's value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9894) Tokenizer work randomly

2016-12-27 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15781748#comment-15781748
 ] 

王海涛 commented on SOLR-9894:
---

The schema definition as fowlow:
  
  
  {color:red}{color}
  


 




   



> Tokenizer work randomly
> ---
>
> Key: SOLR-9894
> URL: https://issues.apache.org/jira/browse/SOLR-9894
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: 6.2.1
> Environment: solrcloud 6.2.1(3 solr nodes)
> OS:linux
> RAM:8G
>Reporter: 王海涛
>Priority: Critical
>  Labels: patch
> Attachments: step1.png, step2.png, step3.png, step4.png
>
>
> my schema.xml has a fieldType as folow:
> 
>   
>class="org.wltea.analyzer.lucene.IKTokenizerFactory" useSmart="false"/>
>class="org.wltea.pinyin.solr5.PinyinTokenFilterFactory" pinyinAll="true" 
> minTermLength="2"/> 
>   
>   
>   
>class="org.wltea.analyzer.lucene.IKTokenizerFactory" useSmart="true"/>
>  
>   
>   
> Attention:
>   index tokenzier useSmart is false
>   query tokenzier useSmart is true
> But when I send query request with parameter q ,
> the query tokenziner sometimes useSmart equals true
> sometimes useSmart equal false.
> That is so terrible!
> I guess the problem may be caught by tokenizer cache.
> when I query ,the tokenizer should use true as the useSmart's value,
> but it had cache the wrong tokenizer result which created by indexWriter who 
> use false as useSmart's value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9894) Tokenizer work randomly

2016-12-27 Thread Alexandre Rafalovitch (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15781229#comment-15781229
 ] 

Alexandre Rafalovitch commented on SOLR-9894:
-

The search does **not** happen against *xf_name*, it happens against 
*default_search_field* as the debug shows. So, the question is what is the type 
of the *default_search_field*. *xf_name* (parameter value for *fl*) is the name 
of the field to be return in the document list, not the field to search against.

> Tokenizer work randomly
> ---
>
> Key: SOLR-9894
> URL: https://issues.apache.org/jira/browse/SOLR-9894
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: 6.2.1
> Environment: solrcloud 6.2.1(3 solr nodes)
> OS:linux
> RAM:8G
>Reporter: 王海涛
>Priority: Critical
>  Labels: patch
> Attachments: step1.png, step2.png, step3.png, step4.png
>
>
> my schema.xml has a fieldType as folow:
> 
>   
>class="org.wltea.analyzer.lucene.IKTokenizerFactory" useSmart="false"/>
>class="org.wltea.pinyin.solr5.PinyinTokenFilterFactory" pinyinAll="true" 
> minTermLength="2"/> 
>   
>   
>   
>class="org.wltea.analyzer.lucene.IKTokenizerFactory" useSmart="true"/>
>  
>   
>   
> Attention:
>   index tokenzier useSmart is false
>   query tokenzier useSmart is true
> But when I send query request with parameter q ,
> the query tokenziner sometimes useSmart equals true
> sometimes useSmart equal false.
> That is so terrible!
> I guess the problem may be caught by tokenizer cache.
> when I query ,the tokenizer should use true as the useSmart's value,
> but it had cache the wrong tokenizer result which created by indexWriter who 
> use false as useSmart's value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9894) Tokenizer work randomly

2016-12-27 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15780735#comment-15780735
 ] 

Steve Rowe commented on SOLR-9894:
--

Hi [~wanghaitao], please provide the schema definition for your {{xf_name}} 
field.

It looks to me like step2 and step4 should produce the same result, since they 
are the same query against the same collection with the same schema; however, 
the analysis results are different.

Have you figured out how to make this happen?  For example, if you see step2, 
then index some docs, do you then see step4?

> Tokenizer work randomly
> ---
>
> Key: SOLR-9894
> URL: https://issues.apache.org/jira/browse/SOLR-9894
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: 6.2.1
> Environment: solrcloud 6.2.1(3 solr nodes)
> OS:linux
> RAM:8G
>Reporter: 王海涛
>Priority: Critical
>  Labels: patch
> Attachments: step1.png, step2.png, step3.png, step4.png
>
>
> my schema.xml has a fieldType as folow:
> 
>   
>class="org.wltea.analyzer.lucene.IKTokenizerFactory" useSmart="false"/>
>class="org.wltea.pinyin.solr5.PinyinTokenFilterFactory" pinyinAll="true" 
> minTermLength="2"/> 
>   
>   
>   
>class="org.wltea.analyzer.lucene.IKTokenizerFactory" useSmart="true"/>
>  
>   
>   
> Attention:
>   index tokenzier useSmart is false
>   query tokenzier useSmart is true
> But when I send query request with parameter q ,
> the query tokenziner sometimes useSmart equals true
> sometimes useSmart equal false.
> That is so terrible!
> I guess the problem may be caught by tokenizer cache.
> when I query ,the tokenizer should use true as the useSmart's value,
> but it had cache the wrong tokenizer result which created by indexWriter who 
> use false as useSmart's value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9894) Tokenizer work randomly

2016-12-27 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15780676#comment-15780676
 ] 

Erick Erickson commented on SOLR-9894:
--

If you think this should really be opened, you need to provide reasons why you 
think this is a Solr bug. The Solr devs are not responsible for someone else's 
code. In this case your complaint is about 
org.wltea.analyzer.lucene.IKTokenizerFactory. Which is _not_ part of Solr so 
why do you think this should be recorded against Solr?

> Tokenizer work randomly
> ---
>
> Key: SOLR-9894
> URL: https://issues.apache.org/jira/browse/SOLR-9894
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: 6.2.1
> Environment: solrcloud 6.2.1(3 solr nodes)
> OS:linux
> RAM:8G
>Reporter: 王海涛
>Priority: Critical
>  Labels: patch
> Attachments: step1.png, step2.png, step3.png, step4.png
>
>
> my schema.xml has a fieldType as folow:
> 
>   
>class="org.wltea.analyzer.lucene.IKTokenizerFactory" useSmart="false"/>
>class="org.wltea.pinyin.solr5.PinyinTokenFilterFactory" pinyinAll="true" 
> minTermLength="2"/> 
>   
>   
>   
>class="org.wltea.analyzer.lucene.IKTokenizerFactory" useSmart="true"/>
>  
>   
>   
> Attention:
>   index tokenzier useSmart is false
>   query tokenzier useSmart is true
> But when I send query request with parameter q ,
> the query tokenziner sometimes useSmart equals true
> sometimes useSmart equal false.
> That is so terrible!
> I guess the problem may be caught by tokenizer cache.
> when I query ,the tokenizer should use true as the useSmart's value,
> but it had cache the wrong tokenizer result which created by indexWriter who 
> use false as useSmart's value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9894) Tokenizer work randomly

2016-12-27 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15780213#comment-15780213
 ] 

王海涛 commented on SOLR-9894:
---

I add 4 attachments which show the case clearly.
I made a lot of test about this problem and sure it caught by solr not by 
IKTokenizer.
Please check it again.
Very Thankyou!

> Tokenizer work randomly
> ---
>
> Key: SOLR-9894
> URL: https://issues.apache.org/jira/browse/SOLR-9894
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: 6.2.1
> Environment: solrcloud 6.2.1(3 solr nodes)
> OS:linux
> RAM:8G
>Reporter: 王海涛
>Priority: Critical
>  Labels: patch
> Attachments: step1.png, step2.png, step3.png, step4.png
>
>
> my schema.xml has a fieldType as folow:
> 
>   
>class="org.wltea.analyzer.lucene.IKTokenizerFactory" useSmart="false"/>
>class="org.wltea.pinyin.solr5.PinyinTokenFilterFactory" pinyinAll="true" 
> minTermLength="2"/> 
>   
>   
>   
>class="org.wltea.analyzer.lucene.IKTokenizerFactory" useSmart="true"/>
>  
>   
>   
> Attention:
>   index tokenzier useSmart is false
>   query tokenzier useSmart is true
> But when I send query request with parameter q ,
> the query tokenziner sometimes useSmart equals true
> sometimes useSmart equal false.
> That is so terrible!
> I guess the problem may be caught by tokenizer cache.
> when I query ,the tokenizer should use true as the useSmart's value,
> but it had cache the wrong tokenizer result which created by indexWriter who 
> use false as useSmart's value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9894) Tokenizer work randomly

2016-12-27 Thread Alexandre Rafalovitch (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15779952#comment-15779952
 ] 

Alexandre Rafalovitch commented on SOLR-9894:
-

The tokenizers used are not part of the Lucene/Solr code base. They seem to 
come from https://github.com/EugenePig/ik-analyzer-solr5 . A bug report should 
be opened against that repository with a specific example.

I would recommend being very clear on what example showcases the issue and 
perhaps even annotate and recompile the code to confirm this. It is unlikely to 
be something random, but might be a strange combination of factors that 
triggers whatever you are observing.

> Tokenizer work randomly
> ---
>
> Key: SOLR-9894
> URL: https://issues.apache.org/jira/browse/SOLR-9894
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: 6.2.1
> Environment: solrcloud 6.2.1(3 solr nodes)
> OS:linux
> RAM:8G
>Reporter: 王海涛
>Priority: Critical
>  Labels: patch
>
> my schema.xml has a fieldType as folow:
> 
>   
>class="org.wltea.analyzer.lucene.IKTokenizerFactory" useSmart="false"/>
>class="org.wltea.pinyin.solr5.PinyinTokenFilterFactory" pinyinAll="true" 
> minTermLength="2"/> 
>   
>   
>   
>class="org.wltea.analyzer.lucene.IKTokenizerFactory" useSmart="true"/>
>  
>   
>   
> Attention:
>   index tokenzier useSmart is false
>   query tokenzier useSmart is true
> But when I send query request with parameter q ,
> the query tokenziner sometimes useSmart equals true
> sometimes useSmart equal false.
> That is so terrible!
> I guess the problem may be caught by tokenizer cache.
> when I query ,the tokenizer should use true as the useSmart's value,
> but it had cache the wrong tokenizer result which created by indexWriter who 
> use false as useSmart's value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org