Re: charset encoding
Thank you for this. The workaround using ie works great. However, the parameter is read fairly early by Solr, before the request handlers are called, so it cannot be set in solrconfig. Does anybody have an idea how we can force ie all the time by simply changing some Solr settings (without changing the query)?

Thank you.

On Thu, Sep 12, 2013 at 7:38 PM, Shawn Heisey s...@elyograg.org wrote:

On 9/12/2013 11:17 AM, Andreas Owen wrote:
It was the HTTP header. As soon as I forced an ISO-8859-1 header, it worked.

Glad you found a workaround! If you are in a situation where you cannot control the header of the request or modify the content itself to include charset information, or there's some reason you would rather not take that route, there will be another way with the next Solr release.

https://issues.apache.org/jira/browse/SOLR-5082

Solr 4.5 will support an ie (input encoding) parameter for the update request so you can inform Solr what charset encoding to expect. The release process for Solr 4.5 has been started; it usually takes 2-3 weeks to complete.

Thanks,
Shawn
Re: charset encoding
Can you add a ServletFilter and modify things before they hit Solr? I haven't tried this particular scenario myself, but it's something to look at.

Regards,
Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Wed, Mar 26, 2014 at 6:39 PM, Antoine LE FLOC'H lefl...@gmail.com wrote:
Thank you for this. The workaround using ie works great. However, the parameter is read fairly early by Solr, before the request handlers are called, so it cannot be set in solrconfig. Does anybody have an idea how we can force ie all the time by simply changing some Solr settings (without changing the query)? Thank you.
Re: charset encoding
No, I'm using Jetty. And yes, I've seen a couple of answers for Tomcat.

On 12. Sep 2013, at 3:12 AM, Otis Gospodnetic wrote:
Using Tomcat by any chance? The ML archive has the solution. It may be on the Wiki, too.

Otis
Solr ElasticSearch Support http://sematext.com/
Re: charset encoding
Could it have something to do with the fact that the meta encoding tag says ISO-8859-1 but the HTTP header says UTF-8, and Firefox interprets the page as UTF-8?

On 12. Sep 2013, at 8:36 AM, Andreas Owen wrote:
No, I'm using Jetty. And yes, I've seen a couple of answers for Tomcat.
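The mismatch described here can be checked mechanically. The following is a minimal sketch, not anything from the thread: the header value and page bytes are made-up samples, and the two helper functions are hypothetical names. It only compares the charset announced in a Content-Type header with the one declared in a meta tag. Browsers give the HTTP header precedence over the meta tag, which would be consistent with Firefox reporting UTF-8.

```python
import re
from email.message import Message


def charset_from_header(content_type):
    """Return the charset parameter of a Content-Type header value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def charset_from_meta(html_bytes):
    """Return the charset declared in a <meta> tag near the start of the page, or None."""
    # Only ASCII matters for finding the declaration, so ignore other bytes.
    head = html_bytes[:2048].decode("ascii", errors="ignore")
    match = re.search(r"""<meta[^>]+charset\s*=\s*["']?([\w-]+)""", head, re.IGNORECASE)
    return match.group(1).lower() if match else None


# Made-up sample values mirroring the situation in the thread.
header = "text/html; charset=UTF-8"
page = (b'<html><head><meta http-equiv="Content-Type" '
        b'content="text/html; charset=iso-8859-1"></head><body></body></html>')

declared_by_server = charset_from_header(header)
declared_by_page = charset_from_meta(page)
mismatch = declared_by_server != declared_by_page

print(declared_by_server, declared_by_page, mismatch)  # utf-8 iso-8859-1 True
```

If the two values differ, whichever component trusts the header will decode the page with the wrong charset.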
Re: charset encoding
It was the HTTP header. As soon as I forced an ISO-8859-1 header, it worked.

On 12. Sep 2013, at 9:44 AM, Andreas Owen wrote:
Could it have something to do with the fact that the meta encoding tag says ISO-8859-1 but the HTTP header says UTF-8, and Firefox interprets the page as UTF-8?
Re: charset encoding
On 9/12/2013 11:17 AM, Andreas Owen wrote:
It was the HTTP header. As soon as I forced an ISO-8859-1 header, it worked.

Glad you found a workaround! If you are in a situation where you cannot control the header of the request or modify the content itself to include charset information, or there's some reason you would rather not take that route, there will be another way with the next Solr release.

https://issues.apache.org/jira/browse/SOLR-5082

Solr 4.5 will support an ie (input encoding) parameter for the update request so you can inform Solr what charset encoding to expect. The release process for Solr 4.5 has been started; it usually takes 2-3 weeks to complete.

Thanks,
Shawn
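As a rough illustration of what passing ie could look like on the client side, here is a sketch under stated assumptions. The host, core name, and delete query are hypothetical, and the exact scope of ie is defined by SOLR-5082 rather than by this example. The sketch only shows that the percent-encoding of the request parameters has to use the same charset that ie announces.

```python
from urllib.parse import urlencode

# Hypothetical Solr URL and request body, for illustration only.
base_url = "http://localhost:8983/solr/collection1/update"
params = {
    "stream.body": "<delete><query>city:Zürich</query></delete>",
    "ie": "ISO-8859-1",
}

# Percent-encode the parameters in the charset that ie announces.
query_string = urlencode(params, encoding="iso-8859-1")
url = base_url + "?" + query_string

# For comparison: the same parameters percent-encoded as UTF-8.
utf8_query_string = urlencode(params, encoding="utf-8")

print(url)
print("%FC" in query_string, "%C3%BC" in utf8_query_string)
```

In ISO-8859-1 the "ü" becomes the single escape %FC, while in UTF-8 it becomes %C3%BC, so a server that assumes the wrong charset decodes a different character.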
charset encoding
I'm using Solr 4.3.1 with Tika to index HTML pages. The HTML files are ISO-8859-1 (ANSI) encoded, and the content-encoding meta tag says the same. The server's HTTP header says it's UTF-8, and the Firefox web developer tools agree. When I index a page with special characters like ä, ö, ü, Solr outputs completely foreign characters, not the usual wrong characters with 1/4 or the flag in them. So it seems that it's not simply the normal UTF-8/ISO-8859-1 discrepancy. Has anyone got an idea what's wrong?
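For reference, the "normal" discrepancy can be reproduced in a few lines. This is a minimal sketch using only the three characters from the message. It shows what each direction of the mix-up looks like, which may help tell the usual case apart from something else.

```python
# Sketch: reproduce the usual UTF-8 / ISO-8859-1 mix-up for ä, ö, ü.
text = "äöü"

utf8_bytes = text.encode("utf-8")          # two bytes per character
latin1_bytes = text.encode("iso-8859-1")   # one byte per character

# UTF-8 bytes read as ISO-8859-1: each umlaut becomes "Ã" plus a second symbol.
utf8_read_as_latin1 = utf8_bytes.decode("iso-8859-1")

# ISO-8859-1 bytes read as UTF-8: the bytes are not valid UTF-8, so a lenient
# decoder substitutes U+FFFD replacement characters.
latin1_read_as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

print(utf8_read_as_latin1)   # Ã¤Ã¶Ã¼
print(latin1_read_as_utf8)
```

The first result contains ¼ (from ü) and ¶ (from ö), which is presumably what the "1/4" and "flag" characters refer to. The second direction yields only replacement characters.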
Re: charset encoding
Using Tomcat by any chance? The ML archive has the solution. It may be on the Wiki, too.

Otis
Solr ElasticSearch Support http://sematext.com/

On Sep 11, 2013 8:56 AM, Andreas Owen a...@conx.ch wrote:
I'm using Solr 4.3.1 with Tika to index HTML pages. The HTML files are ISO-8859-1 (ANSI) encoded, and the content-encoding meta tag says the same. The server's HTTP header says it's UTF-8, and the Firefox web developer tools agree. When I index a page with special characters like ä, ö, ü, Solr outputs completely foreign characters, not the usual wrong characters with 1/4 or the flag in them. So it seems that it's not simply the normal UTF-8/ISO-8859-1 discrepancy. Has anyone got an idea what's wrong?