Re: Which one is it "cs" or "cz" for Czech language?
: Probably a historical artifact.

Yeah, probably.

fixing the solr example configs would be fairly trivial -- the names are just symbolic strings -- but currently they are all consistent with the lucene package names, which would be a more complex change from a back compat standpoint -- i've opened some linked issues, hopefully someone who is more of an expert on the naming conventions of these packages can chime in and we can clean this up...

https://issues.apache.org/jira/browse/SOLR-7267
https://issues.apache.org/jira/browse/LUCENE-6366

-Hoss
http://www.lucidworks.com/
Re: Which one is it "cs" or "cz" for Czech language?
It does indeed appear that use of the "_cz" suffix is a mistake - those suffixes are supposed to be language codes. Sure, generally, there tends to be a one-to-one relationship between language and country, but clearly that is not as absolute as a casual observer might misguidedly think.

I think it's worth a Jira - text types should use language codes, not country codes.

-- Jack Krupansky

On Tue, Mar 17, 2015 at 1:35 PM, Eduard Moraru wrote:
> Is that a mistake on the Solr/Lucene side? Is it some kind of convention?
> Is it going to be fixed?
Re: Which one is it "cs" or "cz" for Czech language?
Hi,

On Wed, Mar 18, 2015 at 9:28 AM, steve wrote:
> FYI: http://www.w3schools.com/tags/ref_country_codes.asp
> CZECH REPUBLIC CZ
> No entry for CS

Exactly, steve. "CZ" is the country code; however, we are talking about language codes (for Czech, that is "CS"), since those Solr types deal with languages, not with countries. Or were you trying to point out something else?

Thanks,
Eduard

P.S.: Here are the ISO 2-letter language codes for reference: http://en.wikipedia.org/wiki/List_of_ISO_639-1_codes
RE: Which one is it "cs" or "cz" for Czech language?
FYI: http://www.w3schools.com/tags/ref_country_codes.asp
CZECH REPUBLIC CZ
No entry for CS

> From: md...@apache.org
> Date: Tue, 17 Mar 2015 12:45:57 -0500
> Subject: Re: Which one is it "cs" or "cz" for Czech language?
> To: solr-user@lucene.apache.org
>
> Probably a historical artifact.
>
> cz is the country code for the Czech Republic, cs is the language code for
> Czech. Once, cs was also the country code for Czechoslovakia, leading some
> folks to accidentally conflate the two.
Re: Which one is it "cs" or "cz" for Czech language?
Probably a historical artifact.

cz is the country code for the Czech Republic, cs is the language code for Czech. Once, cs was also the country code for Czechoslovakia, leading some folks to accidentally conflate the two.

On Tue, Mar 17, 2015 at 12:35 PM, Eduard Moraru wrote:
> but what we are really wondering now is why does Lucene/Solr use "cz"
> (country code) instead of "cs" (language code) in both its "text_cz" field
> and its "stopwords_cz.txt" file?
Which one is it "cs" or "cz" for Czech language?
Hi,

First of all, a bit of a disclaimer: I am not a Czech language speaker, at all.

We are using Solr's dynamic fields in our project (XWiki), and we have recently noticed a problem [1] with the Czech language.

Basically, our mapping says something like this:

    ... multiValued="true" />

...but at runtime, we ask for the language code "cs" (which is the ISO language code for Czech [2]) and it obviously fails (due to the mapping).

Now, we can easily fix this on our end by changing the mapping to name="*_cs", but what we are really wondering now is why Lucene/Solr uses "cz" (country code) instead of "cs" (language code) in both its "text_cz" field and its "stopwords_cz.txt" file.

Is that a mistake on the Solr/Lucene side? Is it some kind of convention? Is it going to be fixed?

Thanks,
Eduard

--
[1] http://jira.xwiki.org/browse/XWIKI-11897
[2] http://en.wikipedia.org/wiki/Czech_language
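
For anyone landing on this thread later, here is a minimal sketch of what a schema.xml mapping keyed on the language code might look like. Only the "*_cs" pattern, the "text_cz"/"_cz" naming and the "stopwords_cz.txt" file name come from the messages above. The type name "text_cs", the indexed/stored values, the analyzer chain and the lang/ directory are assumptions modelled on typical Solr example configs, not the actual XWiki or Solr files.

```xml
<!-- Hypothetical field type named after the ISO 639-1 language code "cs".
     The name is just a symbolic string; the analyzer chain below is an assumed, typical setup. -->
<fieldType name="text_cs" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- stopwords_cz.txt is the file name mentioned in the thread; the lang/ path is an assumption -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_cz.txt"/>
    <filter class="solr.CzechStemFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Dynamic field matching names that end in the language code, e.g. "title_cs" -->
<dynamicField name="*_cs" type="text_cs" indexed="true" stored="true" multiValued="true"/>
```

With a mapping like this, a field requested at runtime with the "cs" suffix should match the dynamic field pattern, while the stopwords file can keep its existing name because the schema only references it by path.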