Re: Highlighting words with non-ascii chars

2011-05-05 Thread Pavel Kukačka
Thanks for the suggestion, Peter;

the problem was elsewhere, though - somewhere in the highlighting
module.
I fixed it by adding a custom Czech charFilter (with mappings such as
"í" => "i") to the field definition in schema.xml - after that,
highlighting started to work as expected.
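
A rough Python sketch of what such a character mapping does conceptually. This is not Solr code: `CZECH_MAP` and `fold` are hypothetical names, only the "í" => "i" pair is taken from the mapping mentioned above, and the other pairs are assumed for illustration.

```python
# Illustrative only: mimics the idea of a mapping charFilter, which rewrites
# characters before the text reaches the tokenizer.
# Only "í" -> "i" comes from the thread; the remaining pairs are assumptions.
CZECH_MAP = str.maketrans({
    "í": "i",
    "č": "c",
    "ř": "r",
    "ž": "z",
})


def fold(text: str) -> str:
    """Replace each mapped character with its ASCII counterpart."""
    return text.translate(CZECH_MAP)


if __name__ == "__main__":
    print(fold("slovíčko"))  # slovicko
```

If the same mapping is applied at both index and query time, the indexed text and the query term end up in the same folded form.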

Cheers,
Pavel


Peter Wolanin wrote on Mon, 02 May 2011 at 17:38 +0200:
> Does your servlet container have the URI encoding set correctly, e.g.
> URIEncoding="UTF-8" for tomcat6?
> 
> http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config
> 
> Older versions of Jetty use ISO-8859-1 as the default URI encoding,
> but jetty 6 should use UTF-8 as default:
> 
> http://docs.codehaus.org/display/JETTY/International+Characters+and+Character+Encodings
> 
> -Peter




Re: Highlighting words with non-ascii chars

2011-04-30 Thread Pavel Kukačka
Hi,

thanks for pointing me to the encoder config. That change alone didn't
solve it, though - it just writes the characters out as-is instead of as
HTML entities - like this in the non-problematic case:
**

  

  slovíčko <em id="highlighting">slovo</em>

  

**
Searching for non-ASCII terms is unchanged. I've gone through the wiki &
guides but haven't found anything related to this.
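
To make the difference between the two outputs concrete, here is a small Python sketch. It is not Solr's encoder code, and `html_entities` is a hypothetical helper; it only reproduces the escaped form that appears in the response quoted below, while the unescaped form is simply the original string.

```python
def html_entities(text: str) -> str:
    """Escape every non-ASCII character as a decimal numeric entity."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    print(html_entities("slovíčko"))  # slov&#237;&#269;ko
```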

Thanks though,
Pavel


Ahmet Arslan wrote on Sat, 30 Apr 2011 at 14:10 +0200:
> Hi,
> 
> What happens when you set the default encoder to 
> solr.highlight.DefaultEncoder in solrconfig.xml?
> 
> - Original Message -
> From: Pavel Kukačka 
> To: solr-user@lucene.apache.org
> Cc: 
> Sent: Saturday, April 30, 2011 1:31 PM
> Subject: Re: Highlighting words with non-ascii chars
> 
> Hello,
> 
> I've hit a (probably trivial) roadblock I don't know how to overcome with
> Solr 3.1:
> I have a document with common fields (title, keywords, content) and I'm
> trying to use highlighting.
> With queries using ASCII characters there is no problem; it works
> smoothly. However, when I search using a Czech word that includes
> non-ASCII characters (like "slovíčko", for example -
> http://localhost:8983/solr/select/?q=slov%C3%AD%C4%8Dko&version=2.2&start=0&rows=10&indent=on&hl=on&hl.fl=*),
> the document is found, but the response doesn't contain the highlighted
> snippet in the highlighting node - there is only an empty node - like this:
> **
> .
> .
> .
> 
>   
> 
> 
> 
> 
> When searching for the other keyword
> (http://localhost:8983/solr/select/?q=slovo&version=2.2&start=0&rows=10&indent=on&hl=on&hl.fl=*),
> the resulting response is fine - like this:
> 
> 
>   
> 
>   slov&#237;&#269;ko <em id="highlighting">slovo</em>
> 
>   
> 
> 
> 
> 
> Has anyone come across this problem?
> Cheers,
> Pavel




Highlighting words with non-ascii chars

2011-04-30 Thread Pavel Kukačka
Hello,

I've hit a (probably trivial) roadblock I don't know how to overcome
with Solr 3.1:
I have a document with common fields (title, keywords, content) and I'm
trying to use highlighting.
With queries using ASCII characters there is no problem; it works
smoothly. However, when I search using a Czech word that includes
non-ASCII characters (like "slovíčko", for example -
http://localhost:8983/solr/select/?q=slov%C3%AD%C4%8Dko&version=2.2&start=0&rows=10&indent=on&hl=on&hl.fl=*),
the document is found, but the response doesn't contain the highlighted
snippet in the highlighting node - there is only an empty node - like this:
**
.
.
.

  




When searching for the other keyword
(http://localhost:8983/solr/select/?q=slovo&version=2.2&start=0&rows=10&indent=on&hl=on&hl.fl=*),
the resulting response is fine - like this:


  

  slov&#237;&#269;ko <em id="highlighting">slovo</em>

  




Has anyone come across this problem?
Cheers,
Pavel
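
As a sanity check on the query URL above, a short Python sketch (standard library only; the variable names are illustrative) suggests that `slov%C3%AD%C4%8Dko` is the UTF-8 percent-encoding of "slovíčko". It also shows that decoding the same bytes as ISO-8859-1 yields a different string, which is what a server configured with that URI charset would see.

```python
from urllib.parse import quote, unquote

term = "slovíčko"

# Percent-encode the term as UTF-8, as in the query URL above.
encoded = quote(term, encoding="utf-8")

# Decode it back with the intended charset and with a mismatched one.
as_utf8 = unquote(encoded, encoding="utf-8")
as_latin1 = unquote(encoded, encoding="iso-8859-1")

if __name__ == "__main__":
    print(encoded)            # slov%C3%AD%C4%8Dko
    print(as_utf8 == term)    # True
    print(as_latin1 == term)  # False
```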