Re: charset encoding

2014-03-26 Thread Antoine LE FLOC'H
Thank you for this. The workaround using ie works great.

However, ie is read fairly early by Solr, before the request handlers
are called, so it cannot be set in solrconfig.

Does anybody have an idea how we can force ie all the time by simply changing
some Solr settings (not by changing the query)?

Thank you.







Re: charset encoding

2014-03-26 Thread Alexandre Rafalovitch
Can you do a ServletFilter and modify things before they hit Solr?
Haven't tried this particular scenario myself, but it's something to
look at.

Regards,
  Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)
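
A real ServletFilter would be written in Java against the servlet API. The rewriting step such a filter might perform can still be sketched on its own. The Python below is only an illustration of that logic; the function name force_ie and the default charset are assumptions, not part of Solr:

```python
def force_ie(query_string, charset="ISO-8859-1"):
    """Append ie=<charset> to a raw query string unless ie is already set.

    The query string is handled as raw text on purpose: decoding the
    percent-escapes here would require already knowing the charset.
    """
    pairs = [p for p in query_string.split("&") if p]
    # Leave the request alone if the client already sent an ie parameter.
    if any(p.split("=", 1)[0] == "ie" for p in pairs):
        return query_string
    pairs.append("ie=" + charset)
    return "&".join(pairs)


print(force_ie("q=%FCber"))
```

A filter built this way would run before Solr parses the request, which is the point where ie has to be present.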




Re: charset encoding

2013-09-12 Thread Andreas Owen
No, I'm using Jetty, and yes, I've seen a couple of answers for Tomcat.




Re: charset encoding

2013-09-12 Thread Andreas Owen
Could it have something to do with the meta encoding tag being iso-8859-1 while
the HTTP header says utf8, so that Firefox interprets it as utf8?




Re: charset encoding

2013-09-12 Thread Andreas Owen
It was the HTTP header: as soon as I forced an iso-8859-1 header, it worked.
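
The thread does not say where the header was forced. As a minimal sketch, assuming it is the Content-Type header of the request that carries the HTML to Solr, such a request could be built in Python like this (the URL is a placeholder, and build_request is a made-up helper name):

```python
from urllib.request import Request

# Hypothetical endpoint; the thread does not say which URL was used.
EXTRACT_URL = "http://localhost:8983/solr/update/extract"


def build_request(html_bytes, charset="ISO-8859-1"):
    """Build (but do not send) a POST whose Content-Type header
    declares the charset the bytes are really encoded in."""
    headers = {"Content-Type": "text/html; charset=" + charset}
    return Request(EXTRACT_URL, data=html_bytes, headers=headers, method="POST")


req = build_request("<p>ä,ö,ü</p>".encode("iso-8859-1"))
print(req.get_header("Content-type"))
```

The request is only constructed here, not sent, so the sketch runs without a Solr instance.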




Re: charset encoding

2013-09-12 Thread Shawn Heisey
On 9/12/2013 11:17 AM, Andreas Owen wrote:
 it was the http-header, as soon as i force a iso-8859-1 header it worked

Glad you found a workaround!

If you are in a situation where you cannot control the header of the
request or modify the content itself to include charset information, or
there's some reason you would rather not take that route, there will be
another way with the next Solr release.

https://issues.apache.org/jira/browse/SOLR-5082

Solr 4.5 will support an ie (input encoding) parameter for the update
request so you can inform Solr what charset encoding to expect.  The
release process for Solr 4.5 has started; it usually takes 2-3
weeks to complete.

Thanks,
Shawn
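
For illustration, a request URL carrying the ie parameter could be built as follows. This is a minimal Python sketch, not a definitive recipe: the host, port, path, and the "title" parameter are placeholders that do not come from this thread, and it assumes the parameter values are percent-encoded in the same charset that ie announces.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; adjust to the actual installation.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_url_with_ie(params, charset="ISO-8859-1"):
    """Return a URL whose query string is percent-encoded in `charset`
    and that announces this charset through the ie parameter."""
    query = dict(params)
    query["ie"] = charset
    return SOLR_UPDATE_URL + "?" + urlencode(query, encoding=charset)


# "title" is a made-up parameter name used only to show the encoding.
url = build_url_with_ie({"title": "über"})
print(url)
```

With ISO-8859-1, the ü is sent as the single escape %FC instead of the two-byte UTF-8 escape %C3%BC.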



charset encoding

2013-09-11 Thread Andreas Owen
I'm using Solr 4.3.1 with Tika to index HTML pages. The HTML files are
iso-8859-1 (ANSI) encoded, and the content-encoding meta tag says so as well. The
server's HTTP header says it's utf8, and Firefox Web Developer agrees.

When I index a page with special chars like ä,ö,ü, Solr outputs completely
foreign signs, not the normal wrong chars with 1/4 or the flag in them. So it
seems that it's not simply the normal utf8/iso-8859-1 discrepancy. Has anyone
got an idea what's wrong?
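
The two directions of a utf8/iso-8859-1 mix-up can be reproduced outside Solr. This is a minimal Python sketch with a made-up sample string; it only shows how the bytes behave and makes no claim about what Solr or Tika did internally in this case:

```python
# Sample text with the special characters mentioned above.
text = "ä,ö,ü"

# Case 1: the bytes are UTF-8 but get read as ISO-8859-1.
# Each special character turns into a two-character sequence.
utf8_read_as_latin1 = text.encode("utf-8").decode("iso-8859-1")

# Case 2: the bytes are ISO-8859-1 but get read as UTF-8, which is
# what a utf8 HTTP header asks for. The bytes are not valid UTF-8,
# so a lenient decoder substitutes U+FFFD replacement characters.
latin1_bytes = text.encode("iso-8859-1")
latin1_read_as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

print(repr(utf8_read_as_latin1))
print(repr(latin1_read_as_utf8))
```

Case 1 is where sequences containing a quarter sign come from: ü becomes Ã followed by ¼. Case 2 loses the original characters entirely.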



Re: charset encoding

2013-09-11 Thread Otis Gospodnetic
Using Tomcat by any chance? The ML archive has the solution. It may be on
the Wiki, too.

Otis
Solr & ElasticSearch Support
http://sematext.com/