RE: Facet only support english?

2007-05-10 Thread Teruhiko Kurosaka
If my memory is correct, UTF-8 has been the default encoding per the
XML specification from a very early stage. If the XML parser is not
defaulting to UTF-8 in the absence of the encoding attribute, that means
the XML parser has a bug, and the code should be corrected.

(I don't have an objection to adding the encoding attribute for clarity,
however.)
-kuro

> -Original Message-
> From: Walter Underwood [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, May 09, 2007 4:33 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Facet only support english?
> 
> I didn't remember that requirement, so I looked it up. It was added
> in XML 1.0 2nd edition. Originally, unspecified encodings were open
> for auto-detection.
> 
> Content type trumps encoding declarations, of course, per RFC 3023
> and allowed by the XML spec.
> 
> wunder
> 
> On 5/9/07 4:19 PM, "Mike Klaas" <[EMAIL PROTECTED]> wrote:
> 
> > I thought that conformant parsers use UTF-8 as the default anyway:
> > 
> > http://www.w3.org/TR/REC-xml/#charencoding
> > 
> > -Mike
> 
> 


Re: Facet only support english?

2007-05-09 Thread Walter Underwood
I didn't remember that requirement, so I looked it up. It was added
in XML 1.0 2nd edition. Originally, unspecified encodings were open
for auto-detection.

Content type trumps encoding declarations, of course, per RFC 3023
and allowed by the XML spec.

wunder

On 5/9/07 4:19 PM, "Mike Klaas" <[EMAIL PROTECTED]> wrote:

> I thought that conformant parsers use UTF-8 as the default anyway:
> 
> http://www.w3.org/TR/REC-xml/#charencoding
> 
> -Mike



Re: Facet only support english?

2007-05-09 Thread Mike Klaas

On 5/9/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:

>> +1 on explicit encoding declarations.
>
> Done (even though it really wasn't needed since there were no int'l
> chars in the example).
>
> As Mike points out, it only marginally helps... if the user adds
> international chars to the config and saves it as something other than
> UTF-8 they are still hosed.  At least UTF-8 is a better default than
> something like latin-1 though.


I thought that conformant parsers use UTF-8 as the default anyway:

http://www.w3.org/TR/REC-xml/#charencoding

-Mike


Re: Facet only support english?

2007-05-09 Thread Yonik Seeley

> +1 on explicit encoding declarations.


Done  (even though it really wasn't needed since there were no int'l
chars in the example).

As Mike points out, it only marginally helps... if the user adds
international chars to the config and saves it as something other than
UTF-8 they are still hosed.  At least UTF-8 is a better default than
something like latin-1 though.

-Yonik
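
A minimal sketch of the "still hosed" case above, assuming the JDK's
bundled JAXP parser. The same document is saved once as UTF-8 and once as
ISO-8859-1, and both copies carry a UTF-8 declaration. The class name
MismatchedEncoding and the sample element are made up for illustration;
this is not Solr code.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

public class MismatchedEncoding {
    // Hypothetical sample document: declares UTF-8 and contains one
    // non-ASCII character (e-acute, written as an escape).
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?><str>caf\u00E9</str>";

    // Returns true if the bytes parse as well-formed XML, false on any error.
    static boolean parses(byte[] bytes) {
        try {
            DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Same text, two different on-disk encodings.
        byte[] savedAsUtf8 = XML.getBytes(StandardCharsets.UTF_8);
        byte[] savedAsLatin1 = XML.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("UTF-8 bytes parse:   " + parses(savedAsUtf8));
        System.out.println("Latin-1 bytes parse: " + parses(savedAsLatin1));
    }
}
```

With the bundled parser I would expect the UTF-8 copy to parse and the
Latin-1 copy to be rejected as malformed UTF-8. Other parsers may report
the mismatch differently.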


Re: Facet only support english?

2007-05-09 Thread Koji Sekiguchi

+1 on explicit encoding declarations.

Yonik Seeley wrote:

> On 5/9/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:
>> If you are saving the file in UTF-8 format, then try changing the
>> first line to be this:
>>
>> <?xml version="1.0" encoding="UTF-8" ?>
>
> We should probably change the example solrconfig.xml and schema.xml to
> be UTF-8 by default.  Any objections?
>
> -Yonik





Re: Facet only support english?

2007-05-09 Thread Yonik Seeley

On 5/9/07, Ryan McKinley <[EMAIL PROTECTED]> wrote:

> Yonik Seeley wrote:
>> We should probably change the example solrconfig.xml and schema.xml to
>> be UTF-8 by default.  Any objections?
>
> I'm for it...
>
> but if the xml parser uses getReader() does it make any difference?


For Solr's XML config files, DocumentBuilder.parse(InputStream) is
called, so we don't construct a reader first.

-Yonik
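
A rough sketch of why the stream-versus-reader distinction matters,
assuming the JDK's bundled JAXP parser. When the parser is handed a byte
stream it works out the encoding itself; when it is handed a Reader, the
caller has already decoded the bytes. The class StreamVsReader and its
helper methods are hypothetical names for illustration, not Solr's
actual code.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class StreamVsReader {
    // The three Chinese characters from the original question, written as
    // escapes so the encoding of this source file does not matter.
    static final String TEXT = "\u8BFA\u57FA\u4E9A";

    // UTF-8 bytes of a tiny document with no XML declaration at all.
    static byte[] utf8Doc() {
        return ("<str>" + TEXT + "</str>").getBytes(StandardCharsets.UTF_8);
    }

    static DocumentBuilder newBuilder() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Byte stream: the parser determines the encoding on its own.
    static String parseFromStream(byte[] bytes) {
        try {
            Document doc = newBuilder().parse(new ByteArrayInputStream(bytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Reader: the bytes are decoded with the caller's charset before the
    // parser ever sees them.
    static String parseFromReader(byte[] bytes, Charset charset) {
        try {
            InputSource source = new InputSource(
                new InputStreamReader(new ByteArrayInputStream(bytes), charset));
            Document doc = newBuilder().parse(source);
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] doc = utf8Doc();
        System.out.println("stream:          "
            + TEXT.equals(parseFromStream(doc)));
        System.out.println("reader, UTF-8:   "
            + TEXT.equals(parseFromReader(doc, StandardCharsets.UTF_8)));
        System.out.println("reader, Latin-1: "
            + TEXT.equals(parseFromReader(doc, StandardCharsets.ISO_8859_1)));
    }
}
```

The point of the comparison: with a Reader, whoever constructs the reader
decides the charset, so a wrong choice there cannot be repaired by the
parser.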


Re: Facet only support english?

2007-05-09 Thread Ryan McKinley

Yonik Seeley wrote:

> On 5/9/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:
>> If you are saving the file in UTF-8 format, then try changing the
>> first line to be this:
>>
>> <?xml version="1.0" encoding="UTF-8" ?>
>
> We should probably change the example solrconfig.xml and schema.xml to
> be UTF-8 by default.  Any objections?



I'm for it...

but if the xml parser uses getReader() does it make any difference?


Re: Facet only support english?

2007-05-09 Thread Walter Underwood
I was about to suggest the same thing.
+1 on explicit encoding declarations.

wunder

On 5/9/07 3:26 PM, "Yonik Seeley" <[EMAIL PROTECTED]> wrote:

> On 5/9/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:
>> If you are saving the file in UTF-8 format, then try changing the
>> first line to be this:
>>
>> <?xml version="1.0" encoding="UTF-8" ?>
> 
> We should probably change the example solrconfig.xml and schema.xml to
> be UTF-8 by default.  Any objections?
> 
> -Yonik



Re: Facet only support english?

2007-05-09 Thread Mike Klaas

On 5/9/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:

> On 5/9/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:
>> If you are saving the file in UTF-8 format, then try changing the
>> first line to be this:
>>
>> <?xml version="1.0" encoding="UTF-8" ?>
>
> We should probably change the example solrconfig.xml and schema.xml to
> be UTF-8 by default.  Any objections?


No--I'm not sure that it'll bring clarity for anyone who isn't aware
of xml encoding issues, but I can't see it hurting.

-Mike


Re: Facet only support english?

2007-05-09 Thread Yonik Seeley

On 5/9/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:

> If you are saving the file in UTF-8 format, then try changing the
> first line to be this:
>
> <?xml version="1.0" encoding="UTF-8" ?>

We should probably change the example solrconfig.xml and schema.xml to
be UTF-8 by default.  Any objections?

-Yonik


Re: Facet only support english?

2007-05-09 Thread Yonik Seeley

On 5/5/07, James liu <[EMAIL PROTECTED]> wrote:

> I'd expect it to support other languages, like Chinese.
>
> Maybe Solr faceting could be configured like this once it supports
> other languages:
>
> title:"诺基亚"


solrconfig.xml is a normal XML document.  It currently starts off with

<?xml version="1.0" ?>

which has no char encoding specified and the XML parser may default to
something you don't want.

If you are saving the file in UTF-8 format, then try changing the
first line to be this:

<?xml version="1.0" encoding="UTF-8" ?>

-Yonik
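
As a hedged illustration of the advice above, assuming the JDK's bundled
JAXP parser: when the encoding named in the first line matches how the
file was actually saved, the non-ASCII text comes through intact. The
class DeclaredEncoding and the sample element are hypothetical, and
UTF-16 is used only as a convenient second encoding for comparison.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class DeclaredEncoding {
    // The three Chinese characters from the original question, as escapes.
    static final String TEXT = "\u8BFA\u57FA\u4E9A";

    // Builds a document whose first line declares `declared`, then saves it
    // using the charset `actual`.
    static byte[] save(String declared, Charset actual) {
        String xml = "<?xml version=\"1.0\" encoding=\"" + declared + "\" ?>"
            + "<str>" + TEXT + "</str>";
        return xml.getBytes(actual);
    }

    // Parses from a byte stream and returns the root element's text.
    static String parse(byte[] bytes) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Declaration and saved bytes agree in both cases.
        String fromUtf8 = parse(save("UTF-8", StandardCharsets.UTF_8));
        String fromUtf16 = parse(save("UTF-16", StandardCharsets.UTF_16));
        System.out.println("UTF-8 round trip:  " + TEXT.equals(fromUtf8));
        System.out.println("UTF-16 round trip: " + TEXT.equals(fromUtf16));
    }
}
```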


Re: Facet only support english?

2007-05-06 Thread Chris Hostetter

: Subject: Facet only support english?

there isn't anything in the faceting support that is specific to english,
but by the looks of it the problem you are having is when you try to put
default facet.query params in your solrconfig.xml, right?

it's very possible that Solr isn't doing "the right thing" when
reading the solrconfig.xml and is making bad assumptions about the
solrconfig.xml.

regrettably, i don't have the time/energy to try and track down this issue
at the moment ... perhaps you or someone else can, using the example
solrconfig.xml and exampledocs/utf8-example.xml files (ie: set up a
default facet.query that should match on the UTF8TEST doc and then create
a patch for Solr to make it work).

-Hoss



Facet only support english?

2007-05-04 Thread James liu

I'd expect it to support other languages, like Chinese.

Maybe Solr faceting could be configured like this once it supports
other languages:

title:"诺基亚"


or


title:'诺基亚'



or


title:诺基亚





--
regards
jl