Re: Which one is it "cs" or "cz" for Czech language?

2015-03-18 Thread Chris Hostetter

: Probably a historical artifact.

Yeah, probably. Fixing the Solr example configs would be fairly trivial
(the names are just symbolic strings), but currently they are all
consistent with the Lucene package names, which would be a more complex
change from a back-compat standpoint. I've opened some linked issues;
hopefully someone who is more of an expert on the naming conventions of
these packages can chime in and we can clean this up...

https://issues.apache.org/jira/browse/SOLR-7267
https://issues.apache.org/jira/browse/LUCENE-6366
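Until the names are reconciled, a client such as XWiki could tolerate either naming. The sketch below is hypothetical: `resolve_field` and the `FIELD_SUFFIX_CANDIDATES` table are not part of Solr or Lucene. It prefers the language-code suffix and falls back to the legacy country-code suffix, assuming the client already knows which field names exist.

```python
# Hypothetical table: candidate suffixes per ISO 639-1 code, preferred first.
# Only the "cs" -> "cz" fallback comes from this thread; other languages
# default to their own code.
FIELD_SUFFIX_CANDIDATES = {"cs": ["cs", "cz"]}


def resolve_field(base, language_code, available_fields):
    """Return the first existing field name for language_code, or None.

    Prefers the language-code suffix ("title_cs") and falls back to the
    legacy country-code suffix ("title_cz") for back compat.
    """
    code = language_code.lower()
    for suffix in FIELD_SUFFIX_CANDIDATES.get(code, [code]):
        candidate = f"{base}_{suffix}"
        if candidate in available_fields:
            return candidate
    return None


print(resolve_field("title", "cs", {"title_cz"}))              # title_cz
print(resolve_field("title", "cs", {"title_cs", "title_cz"}))  # title_cs
print(resolve_field("title", "de", {"title_cz"}))              # None
```

Checking the language-code name first means the fallback simply stops being used once a renamed field exists, so no client change is needed at that point.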


-Hoss
http://www.lucidworks.com/


Re: Which one is it "cs" or "cz" for Czech language?

2015-03-18 Thread Jack Krupansky
It does indeed appear that the use of the "_cz" suffix is a mistake: those
suffixes are supposed to be language codes. Sure, there generally tends to
be a one-to-one relationship between language and country, but clearly that
relationship is not as absolute as a casual observer might think.

I think it's worth a Jira: text types should use language codes, not
country codes.
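A check along the lines Jack suggests could flag field type names whose two-letter suffix is not a language code. This is a sketch: `ISO_639_1_EXCERPT` is a small hand-picked subset of ISO 639-1 (not the full list, so real codes missing from it would also be flagged), and `flag_non_language_suffixes` is a hypothetical helper, not a Solr tool.

```python
import re

# Small excerpt of ISO 639-1 language codes, for illustration only.
# "cz" is not an ISO 639-1 code; Czech is "cs".
ISO_639_1_EXCERPT = {"cs", "da", "de", "en", "es", "fr", "sk"}


def flag_non_language_suffixes(type_names, known_codes=ISO_639_1_EXCERPT):
    """Return names like "text_xx" whose suffix is not a known language code.

    Names without a two-letter suffix (e.g. "text_general") are skipped.
    """
    flagged = []
    for name in type_names:
        match = re.fullmatch(r"text_([a-z]{2})", name)
        if match and match.group(1) not in known_codes:
            flagged.append(name)
    return flagged


print(flag_non_language_suffixes(["text_cz", "text_de", "text_general"]))
# ['text_cz']
```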

-- Jack Krupansky



Re: Which one is it "cs" or "cz" for Czech language?

2015-03-18 Thread Eduard Moraru
Hi,

On Wed, Mar 18, 2015 at 9:28 AM, steve wrote:

> FYI: http://www.w3schools.com/tags/ref_country_codes.asp
> CZECH REPUBLIC: CZ
> No entry for CS
>

Exactly, steve. "CZ" is the country code; however, we are talking about
language codes (where Czech is "CS"), since those Solr types deal with
languages, not countries.

Or were you trying to point out something else?

Thanks,
Eduard

P.S.: Here's the ISO 639-1 list of 2-letter language codes for reference:
http://en.wikipedia.org/wiki/List_of_ISO_639-1_codes
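The same distinction shows up in locale tags such as "cs-CZ" (Czech as used in the Czech Republic): the language subtag is "cs" and the region subtag is "CZ". The hypothetical helper below splits a simple tag so that a per-language field suffix can be taken from the language part. It is a simplified sketch, not a full BCP 47 parser, and it only recognizes two-letter region subtags.

```python
def split_locale_tag(tag):
    """Split a simple locale tag such as "cs-CZ" or "cs_CZ" into (language, region).

    Simplified: only two-letter alphabetic region subtags are recognized;
    other subtags (e.g. the script subtag "Latn") are skipped.
    """
    subtags = tag.replace("_", "-").split("-")
    language = subtags[0].lower()
    region = next(
        (s.upper() for s in subtags[1:] if len(s) == 2 and s.isalpha()),
        None,
    )
    return language, region


language, region = split_locale_tag("cs-CZ")
print(language, region)                # cs CZ
print("text_" + language)              # text_cs, derived from the language code
print(split_locale_tag("sr_Latn_RS"))  # ('sr', 'RS')
```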



RE: Which one is it "cs" or "cz" for Czech language?

2015-03-18 Thread steve
FYI: http://www.w3schools.com/tags/ref_country_codes.asp
CZECH REPUBLIC: CZ
No entry for CS

Re: Which one is it "cs" or "cz" for Czech language?

2015-03-17 Thread Mike Drob
Probably a historical artifact.

cz is the country code for the Czech Republic; cs is the language code for
Czech. At one time, cs was also the country code for Czechoslovakia, which
led some folks to accidentally conflate the two.
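The conflation is easier to see with the two code lists side by side. The dictionaries below are tiny hand-picked excerpts of ISO 639-1 (languages) and ISO 3166-1 alpha-2 (countries), not complete tables, and `lookup_code` is a hypothetical helper.

```python
# Tiny excerpts for illustration; not complete code tables.
ISO_639_1_LANGUAGES = {"cs": "Czech", "sk": "Slovak"}
ISO_3166_COUNTRIES = {"CZ": "Czech Republic", "SK": "Slovakia"}
# Former ISO 3166 assignment of "CS" mentioned above; "CS" is no longer
# assigned to any current country.
ISO_3166_FORMER = {"CS": "Czechoslovakia"}


def lookup_code(code):
    """Report what a two-letter code means in each excerpted list."""
    return {
        "language": ISO_639_1_LANGUAGES.get(code.lower()),
        "country": ISO_3166_COUNTRIES.get(code.upper()),
        "former_country": ISO_3166_FORMER.get(code.upper()),
    }


print(lookup_code("cs"))
# {'language': 'Czech', 'country': None, 'former_country': 'Czechoslovakia'}
print(lookup_code("cz"))
# {'language': None, 'country': 'Czech Republic', 'former_country': None}
```

For Slovak, "sk" and "SK" happen to coincide, which is exactly the kind of match that makes it tempting to treat the two lists as interchangeable.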



Which one is it "cs" or "cz" for Czech language?

2015-03-17 Thread Eduard Moraru
Hi,

First of all, a bit of a disclaimer: I am not a Czech language speaker, at
all.

We are using Solr's dynamic fields in our project (XWiki), and we have
recently noticed a problem [1] with the Czech language.

Basically, our mapping says something like this:

<dynamicField name="*_cz" ... multiValued="true" />

...but at runtime, we ask for the language code "cs" (which is the ISO
language code for Czech [2]) and it obviously fails (due to the mapping).
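To make the failure concrete, the sketch below matches field names against dynamic-field patterns using Python's `fnmatch.fnmatchcase`. It assumes the current mapping uses the pattern "*_cz" (implied by the proposed fix to name="*_cs") and a hypothetical field name built from "title" plus the language code. It also ignores Solr's rules for choosing among several matching patterns and simply returns the first match.

```python
from fnmatch import fnmatchcase

# Hypothetical dynamic-field patterns; "*_cz" mirrors the current mapping.
# Solr dynamic-field patterns use a single leading or trailing "*".
PATTERNS = ["*_cz", "*_en"]


def matching_pattern(field_name, patterns=PATTERNS):
    """Return the first pattern matching field_name, or None if none does."""
    for pattern in patterns:
        if fnmatchcase(field_name, pattern):
            return pattern
    return None


# Building the name from the ISO 639-1 code "cs" yields "title_cs",
# which no pattern covers, so the lookup fails.
print(matching_pattern("title_" + "cs"))  # None
print(matching_pattern("title_" + "cz"))  # *_cz
```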

Now, we can easily fix this on our end by changing the mapping to name="*_cs",
but what we are really wondering is why Lucene/Solr uses "cz" (the country
code) instead of "cs" (the language code) in both its "text_cz" field type
and its "stopwords_cz.txt" file.

Is that a mistake on the Solr/Lucene side? Is it some kind of convention?
Is it going to be fixed?

Thanks,
Eduard

--
[1] http://jira.xwiki.org/browse/XWIKI-11897
[2] http://en.wikipedia.org/wiki/Czech_language