Hi,

Thanks to this article (
https://opensourceconnections.com/blog/2011/12/23/indexing-chinese-in-solr/ ),
I am beginning to understand what is happening.

Has anyone already tried the Paoding algorithm with a recent Solr?


Thanks,
Bruno

-----Original Message-----
From: Bruno Mannina [mailto:bmann...@free.fr]
Sent: Sunday, January 10, 2021 17:57
To: solr-user@lucene.apache.org
Subject: [solr8.7] not relevant results for chinese query

Hello,

I am trying to use Chinese with my index.

My definition is:

<field name="tizh" type="text_zh" multiValued="true" indexed="true"
       stored="true" termVectors="true" termPositions="true" termOffsets="true"/>

    <!-- Simplified Chinese -->
    <!-- BRUNO -->
    <fieldType name="text_zh" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.HMMChineseTokenizerFactory"/>
        <filter class="solr.CJKWidthFilterFactory"/>
        <filter class="solr.StopFilterFactory"
                words="org/apache/lucene/analysis/cn/smart/stopwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

But I get too many irrelevant results.

For example, with the query (phone case):

tizh:(手機殼)

my query is translated to:

tizh:(手 OR 機 OR 殼)

But:

tizh:(手 AND 機 AND 殼)

returns 0 results.

And:

tizh:"手機殼"

also returns 0 results.

Is it possible to improve my fieldType, or must I add something else?
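
One direction that may be worth testing, offered as an untested sketch rather than a confirmed fix: the sample query 手機殼 is written in Traditional characters, while solr.HMMChineseTokenizerFactory is documented as a Simplified Chinese tokenizer. A bigram-based analyzer does not depend on a word dictionary, so it could be tried side by side with text_zh. The field type name text_zh_bigram below is hypothetical, and any field using it would have to be reindexed before queries are compared.

```xml
<!-- Hypothetical alternative field type: index overlapping CJK bigrams
     instead of relying on dictionary/HMM word segmentation. -->
<fieldType name="text_zh_bigram" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- StandardTokenizer emits one token per Han ideograph -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Normalize full-width and half-width character forms -->
    <filter class="solr.CJKWidthFilterFactory"/>
    <!-- Lowercase any Latin text mixed into the titles -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Join adjacent ideographs into overlapping two-character tokens -->
    <filter class="solr.CJKBigramFilterFactory"/>
  </analyzer>
</fieldType>
```

With this chain, 手機殼 should be indexed and queried as the bigrams 手機 and 機殼 rather than as three single characters. Whether that improves relevance on this index is an assumption to verify, for example on the Analysis screen of the Solr admin UI.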

Thanks,
Bruno

--
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus

