Re: dynamic analyzer based on condition

2012-04-15 Thread Erick Erickson
You'll have to create a field per language...

The Solr 3.6 example code has fieldType
definitions for a lot of languages; that might
be a good place to start.
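Here is a minimal sketch of the field-per-language approach in schema.xml. It assumes text_en is the fieldType from the original question and that text_de and text_fr have been defined along the same lines (for instance, adapted from the 3.6 example). The description_xx field names are hypothetical:

```xml
<!-- Hypothetical field-per-language setup: one field per language,
     each bound to that language's analyzer chain. -->
<field name="description_en" type="text_en" indexed="true" stored="true"/>
<field name="description_de" type="text_de" indexed="true" stored="true"/>
<field name="description_fr" type="text_fr" indexed="true" stored="true"/>
```

schema.xml has no way to switch a field's type based on the value of another field, so the choice moves to indexing time. The client, which already knows each document's language, sends the text as description_de for a German document, description_en for an English one, and so on. Queries then target the matching field.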

Best
Erick


Re: dynamic analyzer based on condition

2012-04-15 Thread srinir
Hi Erick,

Thanks a lot for your reply. I have around 10-15 searchable text fields (and
5-6 languages). If I create one field per language, will that increase the
memory occupied by my index? Even though only one of those fields will have a
value at a time, could the empty fields in the index still occupy some
memory? Would that happen if I enable field caching?


Thanks
Srini

--
View this message in context: 
http://lucene.472066.n3.nabble.com/dynamic-analyzer-based-on-condition-tp3909345p3912605.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: dynamic analyzer based on condition

2012-04-15 Thread Erick Erickson
Before you start worrying about memory, do you have any proof at all
that memory is a problem? Are you expecting to have a lot of documents
in your index (as in multiple tens of millions)?

If you try to put multiple languages in a single field, the results will be
problematic for some set of documents/queries, especially if you're mixing
widely disparate languages (think English and Chinese, for instance).

I'd try the field-per-language option first, just to see whether you need to
go to a more complex solution.

There is no penalty for empty fields in documents, so don't worry about
that.
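With 10-15 text fields in 5-6 languages, declaring each combination by hand means dozens of field entries. One way to shorten that is a dynamicField rule per language suffix. The sketch below makes the same assumption as before, that each language has its own text_xx fieldType, and the _en/_de/_fr suffixes are a hypothetical naming convention:

```xml
<!-- Any field whose name ends in _en is analyzed with text_en, and so on.
     The suffixes and the text_de/text_fr types are assumptions. -->
<dynamicField name="*_en" type="text_en" indexed="true" stored="true"/>
<dynamicField name="*_de" type="text_de" indexed="true" stored="true"/>
<dynamicField name="*_fr" type="text_fr" indexed="true" stored="true"/>
```

A German document could then send hypothetical fields such as description_de and title_de with no further schema changes, leaving the other languages' fields empty.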


Best
Erick


dynamic analyzer based on condition

2012-04-13 Thread srinir
Hi,

I want to pick different analyzers for the same field for different
languages. I can determine the language from a different field. I would have
different fieldTypes defined in my schema.xml, such as text_en, text_de,
text_fr, etc., where I specify which analyzer and filters to use at
indexing and query time.

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
  </analyzer>
</fieldType>

But I would like to define the field dynamically, e.g.:

if lang==en
  <field name="description" type="text_en" indexed="true" stored="true"/>
else if lang==de
  <field name="description" type="text_de" indexed="true" stored="true"/>
...


Can I achieve this somehow? If it can't be done, I can just
create one field for every language.

Thanks
Srini
