SolrJ: getBeans with multiple document types in response

2015-06-19 Thread Catala, Francois
Hello,

I'm trying to parse Solr responses with SolrJ, but the responses contain mixed 
document types: for example, 'song' documents and 'movie' documents with 
different fields.
The getBeans method takes a single class as its input parameter, so it does 
not support responses with mixed document types.
What would be the best approach to parse the response and get a list of 
'entity' objects (the superclass)?

I'm about to write another implementation of the DocumentObjectBinder class but 
I'd like to avoid that.
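
One way to avoid a custom DocumentObjectBinder might be to bind each document individually, choosing the target class from a discriminator field. The sketch below is only an illustration under stated assumptions: it models each document as a plain Map<String, Object> so that it runs without SolrJ, and the "type" field, the Entity/Song/Movie classes, and their fields are hypothetical. With SolrJ, each converter could probably delegate to the per-document DocumentObjectBinder.getBean(Class, SolrDocument) call instead of copying fields by hand.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class MixedTypeBinder {
    // Hypothetical bean hierarchy standing in for the 'entity' superclass.
    static abstract class Entity { String id; }
    static class Song extends Entity { String artist; }
    static class Movie extends Entity { String director; }

    // One converter per value of the assumed "type" discriminator field.
    static final Map<String, Function<Map<String, Object>, Entity>> CONVERTERS = new HashMap<>();
    static {
        CONVERTERS.put("song", doc -> {
            Song s = new Song();
            s.id = (String) doc.get("id");
            s.artist = (String) doc.get("artist");
            return s;
        });
        CONVERTERS.put("movie", doc -> {
            Movie m = new Movie();
            m.id = (String) doc.get("id");
            m.director = (String) doc.get("director");
            return m;
        });
    }

    // Binds every document to the subclass selected by its "type" field.
    static List<Entity> bindAll(List<Map<String, Object>> docs) {
        List<Entity> out = new ArrayList<>();
        for (Map<String, Object> doc : docs) {
            Object type = doc.get("type");
            Function<Map<String, Object>, Entity> converter = CONVERTERS.get(String.valueOf(type));
            if (converter == null) {
                throw new IllegalArgumentException("Unknown document type: " + type);
            }
            out.add(converter.apply(doc));
        }
        return out;
    }
}
```

This only works if every document carries a field that identifies its type; without one, the target class would have to be inferred from which fields are present.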

Thanks!!

François Catala
Software Developer
NUANCE COMMUNICATIONS, INC.
1500 University, Suite 557
Montréal QC  H3A 3S7
514 904 7800   Office: just say my name or ext. 2345



Chinese to Pinyin transliteration : homophone matching

2013-06-10 Thread Catala, Francois
Hi,

I've been looking for ways to do homophone matching in Solr for CJK languages. 
I am digging into Chinese for a start.
My inputs are words made of simplified characters, and I need to match words 
that use different characters, but are pronounced the same way.

My conclusion is that I need to index all the possible pinyin representations 
for a given word. Then at query time, generate all pinyin representations for 
the searched word, and match all documents containing any one of them.

My question is: which components can do that in Solr? I've been looking at 
ICUTransformFilterFactory, but with id=Han-Latin it seems to do a 1-to-1 
mapping between characters and pinyin, while in reality it should be a 
1-to-many mapping.

Do you know of any Analyzer that could do something like this:

-   input: 长
-   output: cháng | zhǎng | zháng
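
The 1-to-many expansion described above can be sketched as a Cartesian product over per-character readings. This is only an illustration, not an existing Solr or Lucene component: the PinyinExpander class and its tiny READINGS table are hypothetical, the readings for 长 are the ones listed above, and the entries for 银 and 行 are added just to show a two-character word. A real table would come from a pronunciation dictionary. The source file is assumed to be compiled as UTF-8.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PinyinExpander {
    // Illustrative character -> readings table (hypothetical, far from complete).
    static final Map<Integer, List<String>> READINGS = new HashMap<>();
    static {
        READINGS.put("长".codePointAt(0), List.of("cháng", "zhǎng", "zháng"));
        READINGS.put("银".codePointAt(0), List.of("yín"));
        READINGS.put("行".codePointAt(0), List.of("xíng", "háng"));
    }

    // Returns every pinyin string obtainable by picking one reading per character.
    // Characters missing from the table are passed through unchanged.
    static List<String> expand(String word) {
        List<String> results = new ArrayList<>();
        results.add("");
        for (int cp : word.codePoints().toArray()) {
            List<String> readings =
                READINGS.getOrDefault(cp, List.of(new String(Character.toChars(cp))));
            List<String> next = new ArrayList<>();
            for (String prefix : results) {
                for (String reading : readings) {
                    next.add(prefix + reading);
                }
            }
            results = next;
        }
        return results;
    }
}
```

With this table, expand("银行") yields two candidates, one per reading of 行. The number of candidates is the product of the reading counts, so long words made of polyphonic characters can expand quickly.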


Thanks so much for your help!



Shingles Filter Query time behaviour

2013-03-18 Thread Catala, Francois
Hello,

I am trying to have the input "darkknight" match documents containing either 
"dark knight" or "darkknight".
The reverse should also work ("dark knight" matching both "dark knight" and 
"darkknight"), but it doesn't. Does anyone know why?


When I run the following query, I get the expected response with both 
documents matched:

<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">1</int>
  <lst name="params">
    <str name="fl">name</str>
    <str name="indent">true</str>
    <str name="q">name:darkknight</str>
    <str name="wt">xml</str>
  </lst>
</lst>
<result name="response" numFound="2" start="0">
  <doc>
    <str name="name">Batman, the darkknight Rises</str></doc>
  <doc>
    <str name="name">Batman, the dark knight Rises</str></doc>
</result>
</response>


HOWEVER, when I run the same query looking for "dark knight" as two words, I 
get only 1 document matched, as the response shows:

<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">0</int>
  <lst name="params">
    <str name="fl">name</str>
    <str name="indent">true</str>
    <str name="q">name:dark knight</str>
    <str name="wt">xml</str>
  </lst>
</lst>
<result name="response" numFound="1" start="0">
  <doc>
    <str name="name">Batman, the dark knight Rises</str></doc>
</result>
</response>

I have these documents as input :

<doc>
  <field name="id">bat1</field>
  <field name="name">Batman, the dark knight Rises</field>
</doc>
<doc>
  <field name="id">bat2</field>
  <field name="name">Batman, the darkknight Rises</field>
</doc>

And I defined this analyser :

  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory"
            tokenSeparator=""
            outputUnigrams="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory"
            tokenSeparator=""
            outputUnigrams="true"
            outputUnigramIfNoNgrams="true"/>
  </analyzer>
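
To make the expected tokens concrete, here is a simplified model of the chain above (whitespace split, lowercase, then two-word shingles joined with an empty separator, plus unigrams). It is a sketch written for illustration, not Lucene's ShingleFilter: the ShingleSketch class is hypothetical, it assumes the default maximum shingle size of 2, and it ignores token positions and outputUnigramIfNoNgrams.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ShingleSketch {
    // Emits, for each token, the unigram (if requested) followed by the
    // two-token shingle that starts at that token.
    static List<String> analyze(String text, String separator, boolean outputUnigrams) {
        String trimmed = text.trim();
        String[] tokens = trimmed.isEmpty()
            ? new String[0]
            : trimmed.toLowerCase(Locale.ROOT).split("\\s+");
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (outputUnigrams) {
                out.add(tokens[i]);
            }
            if (i + 1 < tokens.length) {
                out.add(tokens[i] + separator + tokens[i + 1]);
            }
        }
        return out;
    }
}
```

Under this model, analyze("Batman, the dark knight Rises", "", true) contains "darkknight", which is consistent with the query name:darkknight matching both documents. analyze("dark knight", "", true) also contains "darkknight", but analyze("dark", "", true) and analyze("knight", "", true) do not. One possible explanation for the single match, not confirmed in this thread, is that the query parser splits name:dark knight on whitespace before the analyzer runs, so the shingle filter only ever sees one word at a time.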