Re: Highlighting words with non-ascii chars

2011-05-05 Thread Pavel Kukačka
Thanks for the suggestion, Peter.

The problem was elsewhere, though: somewhere in the highlighting
module.
I've fixed it by adding a custom Czech charFilter (with mappings such
as "í" => "i") to the field definition in schema.xml. After that,
highlighting started to work as expected.
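
The idea behind such a mapping can be illustrated outside Solr. The
Python sketch below folds accented characters to plain ASCII on both the
stored text and the query term, so the two line up again. It is not
Solr's charFilter implementation: the table CZECH_FOLD and the function
fold are hypothetical names, and only the "í" => "i" pair comes from the
message above; the other pairs are assumptions added for illustration.

```python
# Hypothetical folding table; only "í" => "i" appears in the thread,
# the other pairs are assumptions added for illustration.
CZECH_FOLD = str.maketrans({"í": "i", "č": "c", "á": "a", "é": "e"})


def fold(text: str) -> str:
    """Replace each mapped character one-for-one; leave the rest as-is."""
    return text.translate(CZECH_FOLD)


stored = "slovíčko slovo"
query = "slovíčko"

# After folding, both sides use the same plain-ASCII form. Because the
# mapping is one character to one character, offsets found in the folded
# text are also valid in the original text.
start = fold(stored).find(fold(query))
matched = stored[start:start + len(query)]

print(fold(query))
print(matched)
```

Because every replacement here keeps the string length unchanged, the
match position can be reused directly to mark the span in the original,
unfolded text.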

Cheers,
Pavel


Peter Wolanin wrote on Mon, 02 May 2011 at 17:38 +0200:
> Does your servlet container have the URI encoding set correctly, e.g.
> URIEncoding="UTF-8" for tomcat6?
> 
> http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config
> 
> Older versions of Jetty use ISO-8859-1 as the default URI encoding,
> but jetty 6 should use UTF-8 as default:
> 
> http://docs.codehaus.org/display/JETTY/International+Characters+and+Character+Encodings
> 
> -Peter
> 




Re: Highlighting words with non-ascii chars

2011-05-02 Thread Peter Wolanin
Does your servlet container have the URI encoding set correctly, e.g.
URIEncoding="UTF-8" for tomcat6?

http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config

Older versions of Jetty use ISO-8859-1 as the default URI encoding,
but jetty 6 should use UTF-8 as default:

http://docs.codehaus.org/display/JETTY/International+Characters+and+Character+Encodings
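
What a wrong URI encoding does to the query term can be reproduced
outside the servlet container. The Python sketch below is only an
illustration, not container code: it percent-encodes the word from the
original request and then decodes it once as UTF-8 and once as
ISO-8859-1. The variable names are made up for the example.

```python
from urllib.parse import quote, unquote

term = "slovíčko"

# Percent-encoding the UTF-8 bytes gives the form used in the request URL.
encoded = quote(term, encoding="utf-8")
print(encoded)

# Decoding the URI as UTF-8 recovers the original word.
as_utf8 = unquote(encoded, encoding="utf-8")

# Decoding the same bytes as ISO-8859-1 turns each two-byte UTF-8
# sequence into two unrelated characters, so the word no longer matches.
as_latin1 = unquote(encoded, encoding="iso-8859-1")

print(as_utf8 == term)
print(as_latin1 == term)
print(len(term), len(as_latin1))
```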

-Peter




-- 
Peter M. Wolanin, Ph.D.      : Momentum Specialist,  Acquia. Inc.
peter.wola...@acquia.com : 978-296-5247


Re: Highlighting words with non-ascii chars

2011-04-30 Thread Pavel Kukačka
Hi,

thanks for pointing me to the encoder config. That change alone didn't
solve it, though: it only makes the highlighter output characters as-is
instead of as HTML entities, like this in the non-problematic case:
**

  

  slovíčko <em id="highlighting">slovo</em>

  

**
Searching for non-ASCII words is unchanged. I've gone through the wiki and
guides but haven't found anything related to this.
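
The difference between the two kinds of output can be mimicked in
Python. This is only an analogy and not Solr's encoder code: the sketch
replaces every non-ASCII character with a numeric character reference
(&#NNN;), roughly what an HTML-escaping encoder does, and compares that
with passing the text through unchanged.

```python
text = "slovíčko"

# Replace every character outside ASCII with a numeric character
# reference (&#NNN;), similar in spirit to an HTML-escaping encoder.
escaped = text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")
print(escaped)

# A pass-through encoder would simply return the text unchanged.
passthrough = text
print(passthrough)
```

The numbers in the references are the Unicode code points of the
replaced characters (237 for "í", 269 for "č").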

Thanks though,
Pavel


Ahmet Arslan wrote on Sat, 30 Apr 2011 at 14:10 +0200:
> Hi,
> 
> What happens when you set the default encoder to 
> solr.highlight.DefaultEncoder in solrconfig.xml?
> 
> 
> 
> 
> 
> 
> 
> 




Re: Highlighting words with non-ascii chars

2011-04-30 Thread Ahmet Arslan
Hi,

What happens when you set the default encoder to solr.highlight.DefaultEncoder 
in solrconfig.xml?










Highlighting words with non-ascii chars

2011-04-30 Thread Pavel Kukačka
Hello,

I've hit a (probably trivial) roadblock with Solr 3.1 that I don't know
how to overcome.
I have a document with common fields (title, keywords, content) and I'm
trying to use highlighting.
With queries using only ASCII characters there is no problem; it works
smoothly. However, when I search for a Czech word containing non-ASCII
characters (like "slovíčko", for example:
http://localhost:8983/solr/select/?q=slov%C3%AD%C4%8Dko&version=2.2&start=0&rows=10&indent=on&hl=on&hl.fl=*),
the document is found, but the response doesn't contain the highlighted
snippet in the highlighting node. There is only an empty node, like this:
**
.
.
.

  




When searching for the other keyword, "slovo"
(http://localhost:8983/solr/select/?q=slovo&version=2.2&start=0&rows=10&indent=on&hl=on&hl.fl=*),
the resulting response is fine, like this:


  

  slov&#237;&#269;ko <em id="highlighting">slovo</em>

  




Has anyone come across this problem?
Cheers,
Pavel