RE: Tomcat special character problem
The problem was, first, the wrong URIEncoding of Tomcat itself. The second problem came from the application's side: the params were wrongly encoded, so it was not possible to show the desired results.

If you need to convert from different encodings to UTF-8, I can give you the following piece of pseudocode:

    string = urlencode(encodeForUtf8(myString));

And if you need to decode for some reason, keep in mind that you must reverse the order of the steps:

    value = decodeFromUtf8(urldecode(string));

Hope that helps. Thank you!
--
View this message in context: http://lucene.472066.n3.nabble.com/Tomcat-special-character-problem-tp1857648p1868024.html
Sent from the Solr - User mailing list archive at Nabble.com.
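A minimal Python sketch of the pseudocode above. It assumes the standard `urllib.parse` module stands in for the hypothetical `encodeForUtf8`/`urlencode`/`urldecode`/`decodeFromUtf8` helpers. Encoding turns the text into UTF-8 bytes first and percent-encodes those bytes second. Decoding undoes the same two steps in reverse order.

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes


def encode_param(my_string: str) -> str:
    """string = urlencode(encodeForUtf8(myString))"""
    utf8_bytes = my_string.encode("utf-8")        # step 1: text -> UTF-8 bytes
    return quote_from_bytes(utf8_bytes, safe="")  # step 2: bytes -> %XX escapes


def decode_param(string: str) -> str:
    """value = decodeFromUtf8(urldecode(string)), i.e. the steps reversed."""
    raw_bytes = unquote_to_bytes(string)          # step 1: %XX escapes -> bytes
    return raw_bytes.decode("utf-8")              # step 2: UTF-8 bytes -> text


if __name__ == "__main__":
    encoded = encode_param("testcat:acôme")
    print(encoded)                # testcat%3Aac%C3%B4me
    print(decode_param(encoded))  # testcat:acôme
```

Keeping the byte step explicit makes it obvious which charset is used. A generic "urlencode" call that silently picks a platform default charset is exactly the kind of application-side mistake described here.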
RE: Tomcat special character problem
Tomcat is notorious for not having the right defaults for UTF-8. Em, I suggest you go over the suggestions in: http://wiki.apache.org/tomcat/FAQ/CharacterEncoding

Also, maybe you can use wget/curl to issue your HTTP requests from a shell, where you have better control over the encoding.

--
Yuval

-----Original Message-----
From: Dennis Gearon [mailto:gear...@sbcglobal.net]
Sent: Sunday, November 07, 2010 10:55 PM
To: solr-user@lucene.apache.org
Subject: Re: Tomcat special character problem
[...]
Re: Tomcat special character problem
In a POST request, or a GET request with URL-encoded variables in the BODY of the request, it's possible to use a different encoding, which is then declared in the headers. For SURE in POST, and I'm pretty sure in GET also.

Dennis Gearon

----- Original Message ----
From: Michael Sokolov
To: solr-user@lucene.apache.org
Cc: Em
Sent: Sun, November 7, 2010 12:40:45 PM
Subject: Re: Tomcat special character problem
[...]
Re: Tomcat special character problem
A few hours ago I also thought that this might be the case. However, I have to verify that tomorrow.

From a debugging point of view: how can I set the encoding of my browser's address bar? When I pressed Enter, the query switched from clear text to a URL-encoded version, and the URL-encoded version did not work.

Thank you, Mike. I will give you feedback on whether it worked or not!
Re: Tomcat special character problem
Is it possible that your original search is being posted (HTTP POST), and the character encoding of the page with the form is not UTF-8? In that case, I believe a header gets sent with the request specifying a different character set (different from parameters in the URL, for which it's not possible to specify an encoding explicitly).

-Mike

On 11/7/2010 10:26 AM, Em wrote:
> [...]
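To illustrate the difference Mike describes, here is a Python sketch that only builds the requests and never sends them. The URL `http://localhost:8080/solr/select` is a hypothetical Solr endpoint. A POST body can carry its charset in the `Content-Type` header. A GET URL has no slot for a charset label, so the server has to assume one (in Tomcat, the connector's URIEncoding).

```python
from urllib.parse import urlencode
from urllib.request import Request

SOLR_SELECT = "http://localhost:8080/solr/select"  # hypothetical endpoint


def build_post(params: dict, charset: str) -> Request:
    """Form-encode params in `charset` and declare that charset in the header."""
    body = urlencode(params, encoding=charset).encode("ascii")
    req = Request(SOLR_SELECT, data=body, method="POST")
    req.add_header(
        "Content-Type",
        f"application/x-www-form-urlencoded; charset={charset}",
    )
    return req


def build_get(params: dict) -> Request:
    """Query-string parameters: the URL itself carries no charset label."""
    return Request(f"{SOLR_SELECT}?{urlencode(params, encoding='utf-8')}")


if __name__ == "__main__":
    post = build_post({"fq": "testcat:acôme"}, "ISO-8859-1")
    print(post.get_header("Content-type"))
    print(post.data)  # b'fq=testcat%3Aac%F4me'
    print(build_get({"fq": "testcat:acôme"}).full_url)
```

`urllib.request.Request` normalizes header names with `str.capitalize()`, which is why the lookup uses `"Content-type"`.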
Re: Tomcat special character problem
This helped a lot, since it solved the "göteburg" problem. Thank you, Ken! Great help :-).

Unfortunately there are some other encoding problems: "fq=testcat%3Aacôme" works, but the fully URL-encoded version "fq=testcat%3Aac%F4me" does not. The first version is the result of submitting form.jsp; the second is what you get when you click into the address bar and press Enter.

This is a real problem for me, since applications that send a query send a URL-encoded query like the second one.

Any suggestions?
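One plausible reading of the two URLs above is sketched below in Python. This is an interpretation, not something confirmed in the thread. `%F4` is how ô is percent-encoded in ISO-8859-1, while UTF-8 percent-encodes it as `%C3%B4`. A server decoding the query as UTF-8 cannot turn the lone byte 0xF4 into ô.

```python
from urllib.parse import quote, unquote

# ô is U+00F4: one byte (F4) in ISO-8859-1, two bytes (C3 B4) in UTF-8.
latin1_form = quote("acôme", encoding="iso-8859-1")  # 'ac%F4me'
utf8_form = quote("acôme", encoding="utf-8")         # 'ac%C3%B4me'

# Decoding the Latin-1 form as UTF-8 fails; errors="replace" yields U+FFFD.
broken = unquote(latin1_form, encoding="utf-8", errors="replace")
# Decoding it with the charset it was actually encoded in recovers the text.
fixed = unquote(latin1_form, encoding="iso-8859-1")

print(latin1_form, utf8_form)
print(broken == "ac\ufffdme")  # True
print(fixed)                   # acôme
```

Under this reading, the fix is on the sending side: clients should percent-encode UTF-8 bytes (`%C3%B4`) when the server's URIEncoding is UTF-8.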
Re: Tomcat special character problem
On Sun, Nov 7, 2010 at 9:34 AM, Em wrote:
> [...]

I am not familiar with your type of setup, but a quick Google search suggested using a second connector on a different port. If you're using mod_jk, you can try setting "JkOptions +ForwardURICompatUnparsed" to see if that helps (http://markstechstuff.blogspot.com/2008/02/utf-8-problem-between-apache-and-tomcat.html).

Sorry I couldn't have been more help. :)

- Ken
Re: Tomcat special character problem
Hi Ken,

thank you for your quick answer!

To make sure no mistakes occur on my application's side, I send my requests with the form available at solr/admin/form.jsp. I changed almost nothing in the example configuration from the example package except some auto-commit params.

All the special characters in the results were displayed correctly, and as far as I can tell they were also indexed correctly. The only problem is querying with special characters.

I can confirm that the page is encoded in UTF-8 in my browser.

Is it possible that Tomcat did not use the UTF-8 URIEncoding? I should mention that Tomcat sits behind an Apache httpd server and is mounted via jk_mount.

Thank you!

Ken Stanley wrote:
> [...]
Re: Tomcat special character problem
On Sun, Nov 7, 2010 at 9:11 AM, Em wrote:
> Hi List,
>
> I have an issue with my Solr environment in Tomcat.
> First: I am not very familiar with Tomcat, so it might be my fault and not Solr's.
>
> It cannot be a Solr-side configuration problem, since everything worked fine with my local Jetty servlet container.
>
> However, when I deploy into Tomcat, several special characters are shown in their UTF-8 representation.
>
> Example: when it comes to search, the ö in göteburg is displayed in its UTF-8 representation instead of as ö.
>
> I tried the following in my server.xml file:
>
> <Connector ... connectionTimeout="2"
>            redirectPort="8443"
>            URIEncoding="UTF-8" />
>
> and restarted Tomcat afterwards.
>
> The problem only occurs when I try to search for something. Indexing that data is no problem.
>
> Thank you for any help!
>
> Regards,
> Em

That is definitely odd. When I copied "göteburg" and ran a manual query in my web browser, everything worked. How are you making the request to Solr?

When I viewed the properties/info of the results, my returned charset was UTF-8. Can you confirm the same on your side?

When I grepped for "UTF-8" in both my Solr and Tomcat configs, nothing stood out as a special configuration option.
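Below is a rough Python model of the symptom in the original question. It illustrates the mechanism only and is not Tomcat's actual code. A UTF-8 browser sends göteburg with ö as the two percent-encoded bytes `%C3%B6`. If the server decodes the query string as ISO-8859-1 (Tomcat connectors of that era fall back to ISO-8859-1 when URIEncoding is not set on them), each of those bytes becomes its own character. Here `parse_qs` with an explicit `encoding` stands in for the connector's URIEncoding.

```python
from urllib.parse import parse_qs, quote

# What a UTF-8 browser sends for q=göteburg: ö becomes the two bytes C3 B6.
query_string = "q=" + quote("göteburg", encoding="utf-8")  # 'q=g%C3%B6teburg'


def decode_query(qs: str, uri_encoding: str) -> str:
    """Decode the q parameter the way a connector with this URIEncoding would."""
    return parse_qs(qs, encoding=uri_encoding)["q"][0]


# UTF-8 bytes misread as Latin-1: C3 -> 'Ã', B6 -> '¶'.
print(decode_query(query_string, "ISO-8859-1"))  # gÃ¶teburg
print(decode_query(query_string, "UTF-8"))       # göteburg
```

In a setup like Em's, with Tomcat behind Apache httpd, it is worth checking that the URIEncoding setting is on the connector that actually receives the requests.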