RE: Facet only support english?
If my memory is correct, UTF-8 has been the default encoding per the XML specification from a very early stage. If the XML parser does not default to UTF-8 in the absence of the encoding attribute, the parser has a bug, and the code should be corrected. (I don't object to adding the encoding attribute for clarity, however.)

-kuro

> -----Original Message-----
> From: Walter Underwood [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, May 09, 2007 4:33 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Facet only support english?
>
> I didn't remember that requirement, so I looked it up. It was added
> in XML 1.0 2nd edition. Originally, unspecified encodings were open
> for auto-detection.
>
> Content type trumps encoding declarations, of course, per RFC 3023
> and allowed by the XML spec.
>
> wunder
>
> On 5/9/07 4:19 PM, "Mike Klaas" <[EMAIL PROTECTED]> wrote:
>
> > I thought that conformant parsers use UTF-8 as the default anyway:
> >
> > http://www.w3.org/TR/REC-xml/#charencoding
> >
> > -Mike
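To check kuro's point against a particular JDK, here is a minimal Java sketch (the class and method names are hypothetical) that hands raw UTF-8 bytes with an XML declaration but no encoding attribute and no byte order mark to DocumentBuilder.parse(InputStream). A parser following XML 1.0 (2nd edition or later) should read the CJK text back intact. The term is the thread's example, written as Java Unicode escapes so the source compiles regardless of the platform encoding.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical check: does this JDK's parser fall back to UTF-8
// when there is no BOM and no encoding attribute?
class DefaultEncodingCheck {

    /** Parses raw bytes as a byte stream and returns the root element's text. */
    static String parseTitle(byte[] xmlBytes) {
        try {
            // parse(InputStream) leaves encoding detection to the parser:
            // BOM or first bytes, then the declaration, then the UTF-8 default.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        String term = "\u8bfa\u57fa\u4e9a"; // the thread's example term, as Unicode escapes
        // Declaration present, encoding attribute absent (like the old example config).
        String xml = "<?xml version=\"1.0\"?><title>" + term + "</title>";
        String parsed = parseTitle(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(parsed.equals(term) ? "read as UTF-8" : "mis-decoded: " + parsed);
    }
}
```

If this prints "mis-decoded", the parser is not applying the spec's default, which is the bug kuro describes.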
Re: Facet only support english?
I didn't remember that requirement, so I looked it up. It was added in XML 1.0 2nd edition. Originally, unspecified encodings were open for auto-detection.

Content type trumps encoding declarations, of course, per RFC 3023 and allowed by the XML spec.

wunder

On 5/9/07 4:19 PM, "Mike Klaas" <[EMAIL PROTECTED]> wrote:

> I thought that conformant parsers use UTF-8 as the default anyway:
>
> http://www.w3.org/TR/REC-xml/#charencoding
>
> -Mike
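The precedence wunder describes can be sketched as a small Java function. This is a simplified illustration, not a full RFC 3023 implementation, and the class and method names are hypothetical. Its assumptions: it only sniffs UTF-8 and UTF-16 byte order marks, it reads the declaration as ASCII, and it treats any text/* media type without a charset parameter as us-ascii, which is RFC 3023's rule for text/xml.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified sketch of how an HTTP/MIME receiver picks the charset for an XML body.
class XmlCharsetPrecedence {

    private static final Pattern CHARSET_PARAM = Pattern.compile(
            ";\\s*charset\\s*=\\s*\"?([^\";\\s]+)\"?", Pattern.CASE_INSENSITIVE);
    private static final Pattern DECLARED_ENCODING = Pattern.compile(
            "\\A<\\?xml[^>]*?\\sencoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    static String effectiveCharset(String contentType, byte[] body) {
        if (contentType != null) {
            // 1. An explicit charset parameter on the content type wins (RFC 3023).
            Matcher param = CHARSET_PARAM.matcher(contentType);
            if (param.find()) {
                return param.group(1).toUpperCase(Locale.ROOT);
            }
            // RFC 3023: text/* XML types without a charset default to us-ascii.
            String mediaType = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
            if (mediaType.startsWith("text/")) {
                return "US-ASCII";
            }
        }
        // 2. A byte order mark (only UTF-8 and UTF-16 handled here).
        if (startsWith(body, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(body, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(body, 0xFF, 0xFE)) return "UTF-16LE";
        // 3. The encoding declaration, read as ASCII (assumes an ASCII-compatible encoding).
        String head = new String(body, 0, Math.min(body.length, 256), StandardCharsets.US_ASCII);
        Matcher decl = DECLARED_ENCODING.matcher(head);
        if (decl.find()) {
            return decl.group(1).toUpperCase(Locale.ROOT);
        }
        // 4. Nothing external or internal: the XML 1.0 default.
        return "UTF-8";
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```

For example, an application/xml body that declares ISO-8859-1 but arrives with "charset=utf-8" is decoded as UTF-8, because the content type trumps the declaration.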
Re: Facet only support english?
On 5/9/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> > +1 on explicit encoding declarations.
>
> Done (even though it really wasn't needed since there were no int'l
> chars in the example). As Mike points out, it only marginally helps...
> if the user adds international chars to the config and saves it as
> something other than UTF-8 they are still hosed. At least UTF-8 is a
> better default than something like latin-1 though.

I thought that conformant parsers use UTF-8 as the default anyway:

http://www.w3.org/TR/REC-xml/#charencoding

-Mike
Re: Facet only support english?
> +1 on explicit encoding declarations.

Done (even though it really wasn't needed since there were no int'l chars in the example). As Mike points out, it only marginally helps... if the user adds international chars to the config and saves it as something other than UTF-8, they are still hosed. At least UTF-8 is a better default than something like latin-1 though.

-Yonik
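Yonik's last point can be illustrated with plain java.nio decoding. ISO-8859-1 maps every byte to some character, so a wrong guess never fails and quietly produces mojibake. A strict UTF-8 decoder, by contrast, rejects most byte sequences that were not actually written as UTF-8. This is a minimal sketch with a hypothetical helper name. It shows the decoders only, not what any particular XML parser reports on error.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper contrasting Latin-1 and UTF-8 as "default" charsets.
class DefaultCharsetComparison {

    /** Decodes strictly; returns null when the bytes are not valid in the given charset. */
    static String strictDecode(byte[] bytes, Charset charset) {
        try {
            return charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            return null; // the bytes could not have been written in this charset
        }
    }

    public static void main(String[] args) {
        String term = "\u8bfa\u57fa\u4e9a";                                // three CJK characters
        byte[] utf8 = term.getBytes(StandardCharsets.UTF_8);               // 9 bytes
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1); // 4 bytes, last is 0xE9

        // Latin-1 accepts any byte sequence, so the wrong guess goes unnoticed.
        String mojibake = strictDecode(utf8, StandardCharsets.ISO_8859_1);
        System.out.println("UTF-8 bytes read as Latin-1: " + mojibake.length() + " chars");

        // UTF-8 rejects the Latin-1 bytes instead of guessing.
        System.out.println("Latin-1 bytes read as UTF-8: " + strictDecode(latin1, StandardCharsets.UTF_8));
    }
}
```

Neither default rescues a file saved in the wrong encoding, but the UTF-8 failure is loud, while the Latin-1 one is silent.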
Re: Facet only support english?
+1 on explicit encoding declarations.

Yonik Seeley wrote:
> On 5/9/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:
>> If you are saving the file in UTF-8 format, then try changing the
>> first line to be this:
>> <?xml version="1.0" encoding="UTF-8" ?>
>
> We should probably change the example solrconfig.xml and schema.xml to
> be UTF-8 by default. Any objections?
>
> -Yonik
Re: Facet only support english?
On 5/9/07, Ryan McKinley <[EMAIL PROTECTED]> wrote:
> Yonik Seeley wrote:
> > We should probably change the example solrconfig.xml and schema.xml to
> > be UTF-8 by default. Any objections?
>
> I'm for it... but if the xml parser uses getReader() does it make any difference?

For Solr's XML config files, DocumentBuilder.parse(InputStream) is called, so we don't construct a reader first.

-Yonik
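The difference Ryan asks about can be seen with the JDK's DocumentBuilder. Given a byte stream, the parser applies the encoding declaration itself. Given a character stream (an InputSource wrapping a Reader), it uses the characters as already decoded and disregards the declaration, as the SAX InputSource documentation states. In this sketch (hypothetical class and method names), a Reader built with the wrong charset turns the three CJK characters into nine Latin-1 characters, even though the document declares UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical comparison of byte-stream and character-stream parsing.
class StreamVersusReader {

    private static DocumentBuilder newBuilder() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder();
    }

    /** Byte stream: the parser reads the encoding declaration itself. */
    static String viaInputStream(byte[] xml) {
        try {
            return newBuilder().parse(new ByteArrayInputStream(xml))
                    .getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Character stream: the Reader already chose a charset; the declaration is ignored. */
    static String viaReader(byte[] xml, Charset readerCharset) {
        try {
            InputSource source = new InputSource(
                    new InputStreamReader(new ByteArrayInputStream(xml), readerCharset));
            return newBuilder().parse(source).getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String term = "\u8bfa\u57fa\u4e9a";
        byte[] xml = ("<?xml version=\"1.0\" encoding=\"UTF-8\"?><title>" + term + "</title>")
                .getBytes(StandardCharsets.UTF_8);
        System.out.println("stream ok: " + viaInputStream(xml).equals(term));
        System.out.println("latin-1 reader length: " + viaReader(xml, StandardCharsets.ISO_8859_1).length());
    }
}
```

So with parse(InputStream) the declaration does matter. Once someone else constructs the Reader, only the charset that Reader was given matters.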
Re: Facet only support english?
Yonik Seeley wrote:
> On 5/9/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:
>> If you are saving the file in UTF-8 format, then try changing the
>> first line to be this:
>> <?xml version="1.0" encoding="UTF-8" ?>
>
> We should probably change the example solrconfig.xml and schema.xml to
> be UTF-8 by default. Any objections?

I'm for it... but if the xml parser uses getReader(), does it make any difference?
Re: Facet only support english?
I was about to suggest the same thing. +1 on explicit encoding declarations.

wunder

On 5/9/07 3:26 PM, "Yonik Seeley" <[EMAIL PROTECTED]> wrote:

> On 5/9/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:
>> If you are saving the file in UTF-8 format, then try changing the
>> first line to be this:
>> <?xml version="1.0" encoding="UTF-8" ?>
>
> We should probably change the example solrconfig.xml and schema.xml to
> be UTF-8 by default. Any objections?
>
> -Yonik
Re: Facet only support english?
On 5/9/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> On 5/9/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> > If you are saving the file in UTF-8 format, then try changing the
> > first line to be this:
> > <?xml version="1.0" encoding="UTF-8" ?>
>
> We should probably change the example solrconfig.xml and schema.xml to
> be UTF-8 by default. Any objections?

No. I'm not sure that it'll bring clarity for anyone who isn't aware of XML encoding issues, but I can't see it hurting.

-Mike
Re: Facet only support english?
On 5/9/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> If you are saving the file in UTF-8 format, then try changing the
> first line to be this:
> <?xml version="1.0" encoding="UTF-8" ?>

We should probably change the example solrconfig.xml and schema.xml to be UTF-8 by default. Any objections?

-Yonik
Re: Facet only support english?
On 5/5/07, James liu <[EMAIL PROTECTED]> wrote:
> Expect it to support other language like chinese.
>
> maybe solr facet can config like this when it support other language.
>
> title:"诺基亚"

solrconfig.xml is a normal XML document. It currently starts off with

<?xml version="1.0" ?>

which has no char encoding specified, and the XML parser may default to something you don't want. If you are saving the file in UTF-8 format, then try changing the first line to be this:

<?xml version="1.0" encoding="UTF-8" ?>

-Yonik
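If several config files need the same first-line change, it could be scripted. Here is a rough Java sketch with hypothetical helper names. It adds encoding="UTF-8" right after the version pseudo-attribute, which is where the XML grammar requires it. If there is no declaration, it prepends a full one, and it leaves files that already declare an encoding alone. The declaration alone is not enough: as noted above, the file must actually be saved as UTF-8. saveAsUtf8 does that with Files.writeString (Java 11+), assuming the caller has already read the file's text with its correct charset.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for scripting the first-line change across config files.
class Utf8Declaration {

    // The version pseudo-attribute, e.g. version="1.0" or version='1.0'.
    private static final Pattern VERSION = Pattern.compile("version\\s*=\\s*([\"'])[^\"']*\\1");

    static String withUtf8Declaration(String xml) {
        boolean hasDecl = xml.startsWith("<?xml") && xml.length() > 5
                && Character.isWhitespace(xml.charAt(5));
        if (!hasDecl) {
            // No declaration at all: prepend a complete one.
            return "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n" + xml;
        }
        int end = xml.indexOf("?>");
        if (end < 0) {
            throw new IllegalArgumentException("unterminated XML declaration");
        }
        String decl = xml.substring(0, end + 2);
        if (decl.contains("encoding")) {
            return xml; // already explicit; leave it alone
        }
        Matcher version = VERSION.matcher(decl);
        if (!version.find()) {
            throw new IllegalArgumentException("XML declaration without version");
        }
        // The grammar puts encoding directly after version (and before standalone).
        String fixed = decl.substring(0, version.end()) + " encoding=\"UTF-8\""
                + decl.substring(version.end());
        return fixed + xml.substring(end + 2);
    }

    /** Writes the result as UTF-8 bytes, so the declaration is actually true. */
    static void saveAsUtf8(Path file, String xml) throws IOException {
        Files.writeString(file, withUtf8Declaration(xml), StandardCharsets.UTF_8);
    }
}
```

Applied to the old first line, <?xml version="1.0" ?> becomes <?xml version="1.0" encoding="UTF-8" ?>.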
Re: Facet only support english?
: Subject: Facet only support english?

There isn't anything in the faceting support that is specific to English, but by the looks of it, the problem you are having is when you try to put default facet.query params in your solrconfig.xml, right?

It's very possible that Solr isn't doing "the right thing" when reading solrconfig.xml and is making bad assumptions about the file. Regrettably, I don't have the time/energy to try and track down this issue at the moment... perhaps you or someone else can, using the example solrconfig.xml and exampledocs/utf8-example.xml files (i.e., set up a default facet.query that should match the UTF8TEST doc, and then create a patch for Solr to make it work).

-Hoss
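This is a standalone approximation of the check Hoss suggests. It assumes a trimmed-down, hypothetical solrconfig.xml fragment that puts the thread's title:"诺基亚" query in a defaults list, parses the UTF-8 bytes as a stream, and reads the value back with XPath. It does not go through Solr's own config loading or query the UTF8TEST document, so a real patch would still need a test against Solr itself. The class name and the XPath expression are assumptions made for this sketch.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

// Hypothetical round-trip check for a non-English default facet.query.
class FacetQueryDefaultCheck {

    // Hypothetical, trimmed-down config fragment; only the defaults list matters here.
    static final String CONFIG =
            "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n"
          + "<config>\n"
          + "  <requestHandler name=\"standard\" class=\"solr.StandardRequestHandler\">\n"
          + "    <lst name=\"defaults\">\n"
          + "      <str name=\"facet.query\">title:\"\u8bfa\u57fa\u4e9a\"</str>\n"
          + "    </lst>\n"
          + "  </requestHandler>\n"
          + "</config>\n";

    /** Parses the config bytes as a byte stream and returns the default facet.query. */
    static String defaultFacetQuery(byte[] configBytes) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(configBytes));
            return XPathFactory.newInstance().newXPath().evaluate(
                    "/config/requestHandler/lst[@name='defaults']/str[@name='facet.query']", doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String query = defaultFacetQuery(CONFIG.getBytes(StandardCharsets.UTF_8));
        boolean ok = query.equals("title:\"\u8bfa\u57fa\u4e9a\"");
        System.out.println(ok ? "facet.query survived the round trip" : "mangled facet.query");
    }
}
```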
Facet only support english?
I expect it to support other languages, like Chinese. Maybe Solr faceting could be configured like this once it supports other languages:

title:"诺基亚" or title:'诺基亚' or title:诺基亚

--
regards
jl