Re: Highlighting words with non-ascii chars
Thanks for the suggestion, Peter; the problem was elsewhere, though: somewhere in the highlighting module.

I've fixed it by adding a custom Czech charFilter (with mappings such as "í" => "i") to the field definition in schema.xml. After that, highlighting started to work as expected.

Cheers,
Pavel

Peter Wolanin wrote on Mon, 02 May 2011 at 17:38 +0200:
> Does your servlet container have the URI encoding set correctly, e.g.
> URIEncoding="UTF-8" for Tomcat 6?
>
> http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config
>
> Older versions of Jetty use ISO-8859-1 as the default URI encoding,
> but Jetty 6 should use UTF-8 as default:
>
> http://docs.codehaus.org/display/JETTY/International+Characters+and+Character+Encodings
>
> -Peter
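For anyone landing on this thread later, the fix described above might look roughly like the following in schema.xml. This is only a sketch: the actual field definition was not posted, so the field type name `text_cz`, the mapping file name `mapping-czech.txt`, and the tokenizer and lowercase filter are all assumptions. `solr.MappingCharFilterFactory` is the stock Solr char filter that applies such character mappings before tokenization. Only the "í" => "i" mapping comes from the message; the second mapping line is an illustrative guess.

```xml
<!-- Sketch only: "text_cz" and "mapping-czech.txt" are hypothetical names. -->
<fieldType name="text_cz" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Maps accented characters to plain ones before tokenizing.
         The mapping file sits next to schema.xml in conf/ and holds lines such as:
           "í" => "i"
           "č" => "c"   (assumed example, not taken from the thread)
    -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-czech.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Because a single analyzer like this runs at index time as well as query time, existing documents would need to be reindexed for the change to take effect.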
Re: Highlighting words with non-ascii chars
Does your servlet container have the URI encoding set correctly, e.g. URIEncoding="UTF-8" for Tomcat 6?

http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config

Older versions of Jetty use ISO-8859-1 as the default URI encoding, but Jetty 6 should use UTF-8 as default:

http://docs.codehaus.org/display/JETTY/International+Characters+and+Character+Encodings

-Peter

On Sat, Apr 30, 2011 at 6:31 AM, Pavel Kukačka wrote:
> Hello,
>
> I've hit a (probably trivial) roadblock I don't know how to overcome
> with Solr 3.1:
> I have a document with common fields (title, keywords, content) and I'm
> trying to use highlighting.
> With queries using ASCII characters there is no problem; it works
> smoothly. However, when I search using a Czech word including non-ASCII
> chars (like "slovíčko", for example -
> http://localhost:8983/solr/select/?q=slov%C3%AD%C4%8Dko&version=2.2&start=0&rows=10&indent=on&hl=on&hl.fl=*),
> the document is found, but the response doesn't contain the highlighted
> snippet in the highlighting node - there is only an empty node - like this:
> **
> .
> .
> .
>
> When searching for the other keyword (
> http://localhost:8983/solr/select/?q=slovo&version=2.2&start=0&rows=10&indent=on&hl=on&hl.fl=*),
> the resulting response is fine - like this:
>
> slovíčko <em id="highlighting">slovo</em>
>
> Did anyone come across this problem?
> Cheers,
> Pavel

--
Peter M. Wolanin, Ph.D. : Momentum Specialist, Acquia, Inc.
peter.wola...@acquia.com : 978-296-5247

"Get a free, hosted Drupal 7 site: http://www.drupalgardens.com"
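As a rough illustration of the Tomcat setting mentioned above, the `URIEncoding` attribute goes on the HTTP `<Connector>` element in Tomcat's conf/server.xml. The port and timeout values below are just the common defaults and are assumptions here, not values taken from this thread.

```xml
<!-- Tomcat conf/server.xml: HTTP connector that decodes request URIs as UTF-8.
     Port, timeout and redirect port are illustrative defaults. -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443"
           URIEncoding="UTF-8"/>
```

With this in place, a percent-encoded query such as q=slov%C3%AD%C4%8Dko should be decoded as UTF-8 rather than ISO-8859-1. Tomcat needs a restart to pick up the change.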
Re: Highlighting words with non-ascii chars
Hi,

thanks for pointing me to the encoder config. This change alone didn't solve it, though; it just outputs the characters as they are, without HTML entities, like this in the non-problematic case:

**
slovíčko <em id="highlighting">slovo</em>
**

Searching for non-ASCII words is unchanged. I've gone through the wiki and the guides but haven't found anything related to this.

Thanks though,
Pavel

Ahmet Arslan wrote on Sat, 30 Apr 2011 at 14:10 +0200:
> Hi,
>
> What happens when you set the default encoder to
> solr.highlight.DefaultEncoder in solrconfig.xml?
Re: Highlighting words with non-ascii chars
Hi,

What happens when you set the default encoder to solr.highlight.DefaultEncoder in solrconfig.xml?

----- Original Message -----
From: Pavel Kukačka
To: solr-user@lucene.apache.org
Cc:
Sent: Saturday, April 30, 2011 1:31 PM
Subject: Re: Highlighting words with non-ascii chars

Hello,

I've hit a (probably trivial) roadblock I don't know how to overcome with Solr 3.1:
I have a document with common fields (title, keywords, content) and I'm trying to use highlighting.
With queries using ASCII characters there is no problem; it works smoothly. However, when I search using a Czech word including non-ASCII chars (like "slovíčko", for example -
http://localhost:8983/solr/select/?q=slov%C3%AD%C4%8Dko&version=2.2&start=0&rows=10&indent=on&hl=on&hl.fl=*),
the document is found, but the response doesn't contain the highlighted snippet in the highlighting node - there is only an empty node - like this:
**
.
.
.

When searching for the other keyword (
http://localhost:8983/solr/select/?q=slovo&version=2.2&start=0&rows=10&indent=on&hl=on&hl.fl=*),
the resulting response is fine - like this:

slovíčko <em id="highlighting">slovo</em>

Did anyone come across this problem?
Cheers,
Pavel
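A minimal sketch of what such a setting could look like, assuming the layout of the example solrconfig.xml shipped with Solr 3.x, where an `<encoder>` element sits inside the `<highlighting>` section of the highlight search component. The encoder name "default" is an assumption; only the class name comes from the message above.

```xml
<!-- solrconfig.xml, inside <highlighting> of the HighlightComponent.
     Sketch only: the name "default" is hypothetical. This would replace an
     existing <encoder ... default="true"/> entry such as the HTML encoder. -->
<encoder name="default"
         default="true"
         class="solr.highlight.DefaultEncoder"/>
```

With the default encoder, the highlighted text is returned without HTML entity escaping.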
Highlighting words with non-ascii chars
Hello,

I've hit a (probably trivial) roadblock with Solr 3.1 that I don't know how to overcome.

I have a document with common fields (title, keywords, content) and I'm trying to use highlighting. With queries using ASCII characters there is no problem; it works smoothly. However, when I search using a Czech word that includes non-ASCII characters (like "slovíčko", for example -
http://localhost:8983/solr/select/?q=slov%C3%AD%C4%8Dko&version=2.2&start=0&rows=10&indent=on&hl=on&hl.fl=*),
the document is found, but the response doesn't contain the highlighted snippet in the highlighting node - there is only an empty node - like this:

**
.
.
.

When searching for the other keyword (
http://localhost:8983/solr/select/?q=slovo&version=2.2&start=0&rows=10&indent=on&hl=on&hl.fl=*),
the resulting response is fine - like this:

slovíčko <em id="highlighting">slovo</em>

Did anyone come across this problem?

Cheers,
Pavel