Re: Solved! Solr interprets UTF-8 as ISO-8859-1

2008-04-01 Thread Daniel Löfquist
That did the trick. I actually figured it out on my own 10 minutes after
I posted to the mailing list. Typical ;-)

Thanks for the help anyway everybody!

//Daniel

Uwe Klosa wrote:

You should set URIEncoding="UTF-8" in your application server. For Tomcat
you can do that on the Connector element in server.xml. For Glassfish you
have to create a sun-web.xml containing the corresponding parameters. Your
application server should provide a similar mechanism.

Uwe

On Mon, Mar 31, 2008 at 4:32 PM, Daniel Löfquist 
[EMAIL PROTECTED] wrote:


Hello,

We're building a web application that uses Solr for searching and I've
come upon a problem that I can't seem to get my head around.

We have a servlet that accepts input via XML-RPC and based on that input
constructs the correct URL to perform a search with the Solr-servlet.

I know that the call to Solr (the URL) from our servlet looks like this
(which is what it should look like):

http://myserver:8080/solrproducts/select/?q=all_SV:ljusblå+status:online&fl=id%2Cartno%2Ctitle_SV%2CtitleSort_SV%2Cdescription_SV%2C&sort=titleSort_SV+asc,id+asc&start=0&q.op=AND&rows=25

But Solr reports the input-fields (the GET-variables in the URL) as:

INFO: /select/

fl=id,artno,title_SV,titleSort_SV,description_SV,&sort=titleSort_SV+asc,id+asc&start=0&q=all_SV:ljusblÃ¥+status:online&q.op=AND&rows=25

which is all fine except where it says ljusblÃ¥. Apparently Solr is
interpreting the UTF-8 string ljusblå as ISO-8859-1 and thus creates
this garbage that makes the search return 0 hits when it should in
reality return 3.
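
The garbling described above can be reproduced outside Solr. The following is a minimal sketch, not code from the thread: the class name MojibakeDemo and the method misdecode are made up for illustration. It encodes the search term as UTF-8 and then decodes the same bytes as ISO-8859-1, which is presumably what the servlet container does with the query string when no URI encoding is configured.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Hypothetical helper: encode text as UTF-8, then decode those bytes
    // as ISO-8859-1, mimicking a container that assumes the wrong charset.
    public static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Unicode escapes keep the example independent of the source file encoding.
        String original = "ljusbl\u00e5";          // "ljusblå"
        String garbled = misdecode(original);

        // "å" is two bytes in UTF-8 (0xC3 0xA5); read as ISO-8859-1 they
        // become the two characters U+00C3 and U+00A5.
        System.out.println(original.length());      // 7
        System.out.println(garbled.length());       // 8
        System.out.println(garbled.equals("ljusbl\u00c3\u00a5")); // true
    }
}
```

The indexed term has seven characters and the garbled query has eight, so the two cannot match, which is consistent with the search returning no hits.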

All other searches that don't use special characters work 100% fine.

I'm new to Solr so I'm not sure what I'm doing wrong here. Can anybody
help me out and point me in the direction of a solution?

Sincerely,

Daniel Löfquist








Re: Solr interprets UTF-8 as ISO-8859-1

2008-03-31 Thread Sean Timm
Send the URL with the å character URL encoded as %C3%A5.  That is the 
UTF-8 URL encoding.


http://myserver:8080/solrproducts/select/?q=all_SV:ljusbl%C3%A5+status:online&fl=id%2Cartno%2Ctitle_SV%2CtitleSort_SV%2Cdescription_SV%2C&sort=titleSort_SV+asc,id+asc&start=0&q.op=AND&rows=25

-Sean
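
For a servlet that builds the Solr URL in Java, the percent-encoding suggested above could be produced with the standard java.net.URLEncoder. This is only a sketch: the class name QueryEncodingDemo and the method encodeParam are illustrative and do not come from the thread.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class QueryEncodingDemo {
    // Hypothetical helper: percent-encode one query parameter value as UTF-8.
    public static String encodeParam(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "\u00e5" is "å"; the escape avoids depending on the source file encoding.
        String q = "all_SV:ljusbl\u00e5 status:online";
        // prints all_SV%3Aljusbl%C3%A5+status%3Aonline
        System.out.println("q=" + encodeParam(q));
    }
}
```

URLEncoder also escapes the colon as %3A and turns the space into +, so the result differs slightly from the hand-written URL above but decodes to the same query. Note that the container still has to decode the query string as UTF-8 for the term to reach Solr intact.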





Re: Solr interprets UTF-8 as ISO-8859-1

2008-03-31 Thread Uwe Klosa
You should set URIEncoding="UTF-8" in your application server. For Tomcat
you can do that on the Connector element in server.xml. For Glassfish you
have to create a sun-web.xml containing the corresponding parameters. Your
application server should provide a similar mechanism.

Uwe
