Re: Problem with unicode query
Thanks!! This is a Tomcat issue, not a Solr one: URIEncoding="UTF-8" was missing from the Connector in Tomcat's server.xml.

Frederic

2012/2/23 Em <mailformailingli...@yahoo.de>:
> Hi Frederic,
>
> I saw similar issues when sending such a request without proper URL-encoding. Note that the string must already be UTF-8 before it is URL-encoded. What happens if you send that query via Solr's admin panel?
>
> Have a look at this page for troubleshooting: http://wiki.apache.org/solr/SolrTomcat
>
> Kind regards,
> Em
>
> On 23.02.2012 18:15, Frederic Bouchery wrote:
>> Hello, I'm using Solr 3.5 on Tomcat 6 and I'm having some problems with unicode queries.
>> [...]

--
*Frédéric BOUCHERY*
OuestFranceMultimédi@
*BU - Emploi* : 0.22.33.55.88.9
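The symptom fits a charset mismatch when the servlet container decodes the query string. Tomcat 6 percent-decodes request URIs as ISO-8859-1 unless the Connector's URIEncoding attribute says otherwise. A minimal Python sketch, assuming the client encoded the query as UTF-8 (the encoded string below is what that produces for `hygiene sécurité`), decodes the same bytes both ways; `decode_query` and `ENCODED_QUERY` are illustrative names, not part of Solr or Tomcat.

```python
from urllib.parse import unquote_plus

# Hypothetical example: q=hygiene sécurité as sent by a client that
# percent-encodes UTF-8 bytes ('é' is 0xC3 0xA9 in UTF-8).
ENCODED_QUERY = "hygiene+s%C3%A9curit%C3%A9"


def decode_query(encoded: str, charset: str) -> str:
    """Percent-decode a query-string value, reading the bytes as `charset`.

    This imitates the container's job; in Tomcat the charset used here is
    what the Connector's URIEncoding attribute controls.
    """
    return unquote_plus(encoded, encoding=charset)


if __name__ == "__main__":
    # No URIEncoding on Tomcat 6: bytes read as ISO-8859-1.
    print(decode_query(ENCODED_QUERY, "iso-8859-1"))  # hygiene sÃ©curitÃ©
    # URIEncoding="UTF-8": bytes read back as intended.
    print(decode_query(ENCODED_QUERY, "utf-8"))       # hygiene sécurité
```

Nothing in the Solr analyzer chain can undo the first result, which is why changing filters made no difference.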
Problem with unicode query
Hello,

I'm using Solr 3.5 on Tomcat 6 and I'm having some problems with unicode queries.

Here is my text field configuration:

<analyzer type="index">
  <charFilter class="solr.HTMLStripCharFilterFactory"/>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.StandardFilterFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.ElisionFilterFactory" articles="elisions.txt"/>
  <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
  <filter class="solr.ASCIIFoldingFilterFactory"/>
  <filter class="solr.SnowballPorterFilterFactory" language="French"/>
</analyzer>
<analyzer type="query">
  <charFilter class="solr.HTMLStripCharFilterFactory"/>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.StandardFilterFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.ElisionFilterFactory" articles="elisions.txt"/>
  <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true"/>
  <filter class="solr.ASCIIFoldingFilterFactory"/>
  <filter class="solr.SnowballPorterFilterFactory" language="French"/>
</analyzer>

When I perform this request:

select/?q=hygiene sécurité&debugQuery=true

here is the debug info:

<str name="rawquerystring">hygiene sécurité</str>
<str name="querystring">hygiene sécurité</str>
<str name="parsedquery">searchText:hygien (searchText:sa searchText:curit)</str>
<str name="parsedquery_toString">searchText:hygien (searchText:sa searchText:curit)</str>

As you can see, the unicode query fails: I get searchText:sa searchText:curit instead of searchText:securite.

I've tried ISOLatin1AccentFilterFactory and changing the filter order, but it makes no difference :(

Any ideas?

Thanks,
Frederic
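Why `sa` and `curit` in particular? One plausible reading: the UTF-8 bytes of `é` read as ISO-8859-1 become `Ã©`, and `©` is not a letter, so the tokenizer breaks the word there. The sketch below is a very rough stand-in for the tokenize, lowercase and ASCII-fold steps (it is not Lucene's StandardTokenizer, and no stemming is modeled); `as_misdecoded` and `rough_tokens` are hypothetical helpers. It yields `sa` and `curita`; the French Snowball stemmer removing the trailing `a` would be consistent with the `curit` seen in the debug output, though that step is not reproduced here.

```python
import re
import unicodedata


def as_misdecoded(text: str) -> str:
    """Encode as UTF-8, then read the bytes back as ISO-8859-1 (the mismatch)."""
    return unicodedata.normalize("NFC", text).encode("utf-8").decode("iso-8859-1")


def rough_tokens(text: str) -> list[str]:
    """Rough stand-in for tokenizing, lowercasing and ASCII-folding.

    Splits on anything that is not a letter, lowercases each piece, then
    strips accents by decomposing (NFD) and dropping combining marks.
    This is NOT Lucene's StandardTokenizer, and no stemming is applied.
    """
    tokens = []
    for piece in re.findall(r"[^\W\d_]+", text):
        decomposed = unicodedata.normalize("NFD", piece.lower())
        tokens.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return tokens


if __name__ == "__main__":
    print(as_misdecoded("sécurité"))                # sÃ©curitÃ©
    print(rough_tokens("sécurité"))                 # ['securite']
    print(rough_tokens(as_misdecoded("sécurité")))  # ['sa', 'curita']
```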
Re: Problem with unicode query
Hi Frederic,

I saw similar issues when sending such a request without proper URL-encoding. Note that the string must already be UTF-8 before it is URL-encoded. What happens if you send that query via Solr's admin panel?

Have a look at this page for troubleshooting: http://wiki.apache.org/solr/SolrTomcat

Kind regards,
Em

On 23.02.2012 18:15, Frederic Bouchery wrote:
> Hello, I'm using Solr 3.5 on Tomcat 6 and I'm having some problems with unicode queries.
> [...]
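The client-side half of this advice can be sketched as follows, assuming a Python client; `build_select_url` and the base URL are hypothetical. `urllib.parse.urlencode` percent-encodes `str` values as UTF-8 by default, so `é` is sent as `%C3%A9`. This only covers the sending side; the container still has to decode the query string as UTF-8.

```python
from urllib.parse import urlencode


def build_select_url(base_url: str, query: str) -> str:
    """Build a Solr select URL whose parameters are percent-encoded as UTF-8.

    base_url is a hypothetical Solr location, e.g. http://localhost:8080/solr.
    """
    params = {"q": query, "debugQuery": "true"}
    # urlencode() uses quote_plus(): spaces become '+', and str values are
    # encoded to UTF-8 bytes before percent-encoding.
    return f"{base_url}/select/?{urlencode(params)}"


if __name__ == "__main__":
    print(build_select_url("http://localhost:8080/solr", "hygiene sécurité"))
    # http://localhost:8080/solr/select/?q=hygiene+s%C3%A9curit%C3%A9&debugQuery=true
```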