RE: problems with search in solr

2012-03-22 Thread Juan Pablo Mora
Remove the stemmer filter from the analyzer of your text field type. The
stemmer reduces each word to its root, and in your case "caso" and "casa"
share the same root, "cas", so a search for one also matches the other.

Regards.
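
To make the collision concrete, here is a toy sketch. It is NOT the Spanish
stemmer that Solr ships; `toy_stem` is a hypothetical stand-in that only
strips one trailing vowel, which is enough to show why both words end up as
the same indexed term.

```python
# Toy illustration only: this is not Solr's Spanish stemmer.
# It strips a single trailing vowel to mimic the effect described above.
def toy_stem(word: str) -> str:
    word = word.lower()
    if len(word) > 3 and word[-1] in "aeo":
        return word[:-1]
    return word


for term in ("caso", "casa"):
    # Both terms reduce to the same root, so they match the same documents.
    print(term, "->", toy_stem(term))
```

Once both words are indexed as "cas", the index can no longer tell them
apart, which is why removing the filter (and reindexing) changes the results.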


From: PINA CORONADO, RAFAEL [rafael.p...@carm.es]
Sent: Thursday, 22 March 2012 13:38
To: solr-user@lucene.apache.org
Subject: problems with search in solr

Good morning,
I have a problem with the results Solr returns for a search string (e.g.
caso). It returns records with similar terms (in this example, the same
results as a search for casa).
The Solr version is 1.4.1.
The definition of the text type in schema.xml is:


  






  


Could you tell me whether there is an error in the configuration and how to fix it?

thanks

=
Rafael Pina Coronado
Servicio de Informática.
Archivo General de la Región de Murcia
Email: rafael.p...@carm.es
==



RE: SOLR 3.3 DIH and Java 1.6

2012-03-20 Thread Juan Pablo Mora
Some versions of OpenJDK don't include the Rhino engine, which the data import
handler needs to run JavaScript transformers. You have to use the Oracle JDK.

Juampa.

From: randolf.julian [randolf.jul...@dominionenterprises.com]
Sent: Tuesday, 20 March 2012 5:41
To: solr-user@lucene.apache.org
Subject: SOLR 3.3 DIH and Java 1.6

I am trying to use the data import handler to update SOLR index with Oracle
data. In the SOLR schema, a dynamic field called PHOTO_* has been defined. I
created a script transformer:

  

RE: Solr Optimization Fail

2011-12-16 Thread Juan Pablo Mora
Maybe a snapshot of your index is being generated as part of the optimize?
Look for postCommit or postOptimize event listeners in your solrconfig.xml.
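
A quick way to check is to scan the config for such listeners. This is a
minimal sketch: `SAMPLE_CONFIG` is a hypothetical fragment written for
illustration, and `find_update_listeners` is a helper name made up here, not
part of Solr.

```python
import xml.etree.ElementTree as ET

# Hypothetical solrconfig.xml fragment, for illustration only.
SAMPLE_CONFIG = """
<config>
  <updateHandler class="solr.DirectUpdateHandler2">
    <listener event="postOptimize" class="solr.RunExecutableListener">
      <str name="exe">snapshooter</str>
      <str name="dir">solr/bin</str>
    </listener>
  </updateHandler>
</config>
"""


def find_update_listeners(config_xml: str):
    """Return (event, class) pairs for postCommit/postOptimize listeners."""
    root = ET.fromstring(config_xml)
    return [
        (el.get("event"), el.get("class"))
        for el in root.iter("listener")
        if el.get("event") in ("postCommit", "postOptimize")
    ]


print(find_update_listeners(SAMPLE_CONFIG))
```

If the list is not empty, the listener's executable is worth checking as a
possible source of the extra disk usage.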


From: Rajani Maski [rajinima...@gmail.com]
Sent: Friday, 16 December 2011 11:11
To: solr-user@lucene.apache.org
Subject: Solr Optimization Fail

Hi,

When we optimize, it should actually reduce the data size, right?

I have an index of size 6 GB (5 million documents). The index was created
with commits for every 1 documents.

I then tried to optimize it with the HTTP optimize command. When I
did that, the data size became 12 GB. Why might this have happened?

Can anyone please suggest a fix?

Thanks
Rajani


Re: Grouping or Facet ?

2011-12-09 Thread Juan Pablo Mora
Sorry if I didn't explain my problem clearly.

I need to build a name suggester based on a prefix. My data comes from two
categories of people, admins and developers for example. So when the client
types "SAN", my results should be:

Prefix: San
Developers: Sanchez Garcia, Juan (5)
            Sanchez Roman, Ivan (2)
            San...

Admins: Sanchez, Pedro (7)
        Sanchez Garcia, Javier (2)


The more common a name is, the higher it should rank. I don't think that is
possible with grouping, so my schema will finally be:

id
nameDeveloper or nameAdmin: both String fields, but only one of them will have
a value in any given doc.

And my query with faceting will be:

/?q=*:*&facet=true&facet.field=nameDeveloper&facet.field=nameAdmin&facet.prefix=SAN&facet.mincount=1


To do that with grouping I would need something like
group.pivot=category,name, which is not possible in Solr yet.
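
The facet request above can be assembled from the client as in the sketch
below. The base URL and the `rows=0` parameter are assumptions added here
(the thread only gives the query string); `build_suggest_url` is a
hypothetical helper name.

```python
from urllib.parse import urlencode


def build_suggest_url(prefix, base="http://localhost:8983/solr/select"):
    """Build the prefix-facet request for both name fields.

    The base URL is an assumption; adjust it to the actual handler path.
    """
    params = [
        ("q", "*:*"),
        ("rows", "0"),  # addition: only the facet counts are needed here
        ("facet", "true"),
        ("facet.field", "nameDeveloper"),
        ("facet.field", "nameAdmin"),
        ("facet.prefix", prefix),
        ("facet.mincount", "1"),
    ]
    # A list of pairs keeps the repeated facet.field parameter.
    return base + "?" + urlencode(params)


print(build_suggest_url("San"))
```

Note that facet.prefix is compared against the indexed terms, so on a plain
String field the prefix has to match the case of the stored names.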


Best,
Juampa.



On 08/12/2011, at 02:23, Darren Govoni wrote:

> Yes. That's what I would expect. I guess I didn't understand when you said
> 
> "The facet counts are the counts of the *values* in that field"
> 
> Because it seems its the count of the number of matching documents 
> irrespective
> if one document has 20 values for that field and another 10, the facet count 
> will be 2,
> one for each document in the results.
> 
> On 12/07/2011 09:04 AM, Erick Erickson wrote:
>> In your example you'll have 10 facets returned each with a value of 1.
>> 
>> Best
>> Erick
>> 
>> On Tue, Dec 6, 2011 at 9:54 AM,  wrote:
>>> Sorry to jump into this thread, but are you saying that the facet count is
>>> not # of result hits?
>>> 
>>> So if I have 1 document with field CAT that has 10 values and I do a query
>>> that returns this 1 document with faceting, that the CAT facet count will
>>> be 10 not 1? I don't seem to be seeing that behavior in my app (Solr 3.5).
>>> 
>>> Thanks.
>>> 
>>>> OK, I'm not understanding here. You get the counts and the results if you
>>>> facet
>>>> on a single category field. The facet counts are the counts of the
>>>> *values* in that
>>>> field. So it would help me if you showed the output of faceting on a
>>>> single
>>>> category field and why that didn't work for you
>>>> 
>>>> But either way, faceting will probably outperform grouping.
>>>> 
>>>> Best
>>>> Erick
>>>> 
>>>> On Mon, Dec 5, 2011 at 9:05 AM, Juan Pablo Mora  wrote:
>>>>> Because I need the count and the result to return back to the client
>>>>> side. Both the grouping and the facet offers me a solution to do that,
>>>>> but my doubt is about performance ...
>>>>> 
>>>>> With Grouping my results are:
>>>>> 
>>>>> "grouped":{
>>>>>"category":{
>>>>>  "matches": ...,
>>>>>  "groups":[{
>>>>>  "groupValue":"categoryXX",
>>>>>  "doclist":{"numFound":Important_number,"start":0,"docs":[
>>>>>  {
>>>>>   doc:id
>>>>>   category:XX
>>>>>  }
>>>>>   "groupValue":"categoryYY",
>>>>>  "doclist":{"numFound":Important_number,"start":0,"docs":[
>>>>>  {
>>>>>   doc: id
>>>>>   category:YY
>>>>>  }
>>>>> 
>>>>> And with faceting my results are :
>>>>> "facet.prefix=whatever"
>>>>> "facet_counts":{
>>>>>"facet_queries":{},
>>>>>"facet_fields":{
>>>>>  "namesXX":[
>>>>>"whatever_name_in_category",76,
>>>>>...
>>>>>  "namesYY":[
>>>>>"whatever_name_in_category",76,
>>>>>...
>>>>> 
>>>>> Both results are OK to me.
>>>>> 
>>>>> 
>>>>> 
>>>>> De: Erick Erickson [erickerick...@gmail.com]
>>>>> Enviado el: lunes, 05 de diciembre de 2011 14:48
>>>>> Para: solr-user@lucene.apache.org
>>>>> Asunto: Re: Grouping or Facet ?
>>>>> 
>>>>> Why not just use the first form of the document
>>>>> and just facet.field=category? You'll get
>>>>> two different facet counts for XX and YY
>>>>> that way.
>>>>> 
>>>>> I don't think grouping is the way to go here.
>>>>> 
>>>>> Best
>>>>> Erick
>>>>> 
>>>>> On Sat, Dec 3, 2011 at 6:43 AM, Juan Pablo Mora
>>>>> wrote:
>>>>>> I need to do some counts on a StrField field to suggest options from
>>>>>> two different categories, and I don´t know what option is the best:
>>>>>> 
>>>>>> My schema looks:
>>>>>> 
>>>>>> - id
>>>>>> - name
>>>>>> - category: XX or YY
>>>>>> 
>>>>>> with Grouping I do:
>>>>>> 
>>>>>> http://localhost:8983/?q=name:prefix*&group=true&group.field=category
>>>>>> 
>>>>>> But I can change my schema to to:
>>>>>> 
>>>>>> - id
>>>>>> - nameXX
>>>>>> - nameYY
>>>>>> - category: XX or YY (only 1 value in nameXX or nameYY)
>>>>>> 
>>>>>> With facet:
>>>>>> http://localhost:8983/?q=*:*&facet=true&facet.field=nameXX&facet.field=nameYY&facet.prefix=prefix
>>>>>> 
>>>>>> 
>>>>>> What option have the best performance ?
>>>>>> 
>>>>>> Best,
>>>>>> Juampa.
> 



RE: Grouping or Facet ?

2011-12-05 Thread Juan Pablo Mora
Because I need both the counts and the results to return to the client side.
Both grouping and faceting offer me a way to do that, but my doubt is about
performance ...

With Grouping my results are:

"grouped":{
  "category":{
    "matches": ...,
    "groups":[{
        "groupValue":"categoryXX",
        "doclist":{"numFound":Important_number,"start":0,"docs":[
            {
              doc: id
              category: XX
            }
        "groupValue":"categoryYY",
        "doclist":{"numFound":Important_number,"start":0,"docs":[
            {
              doc: id
              category: YY
            }

And with faceting my results are:
"facet.prefix=whatever"
"facet_counts":{
  "facet_queries":{},
  "facet_fields":{
    "namesXX":[
      "whatever_name_in_category",76,
      ...
    "namesYY":[
      "whatever_name_in_category",76,
      ...

Both results are OK to me.
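
In the facet output above, each field's list alternates a value and its
count. A minimal sketch of turning that flat list into (name, count) pairs
on the client follows; the `response` dict is a hypothetical fragment, with
sample names borrowed from elsewhere in this thread.

```python
def pair_up(flat):
    """Turn ["name", 76, "other", 3] into [("name", 76), ("other", 3)]."""
    return list(zip(flat[0::2], flat[1::2]))


# Hypothetical response fragment in the shape shown above.
response = {
    "facet_counts": {
        "facet_queries": {},
        "facet_fields": {
            "namesXX": ["Sanchez Garcia, Juan", 5, "Sanchez Roman, Ivan", 2],
            "namesYY": ["Sanchez, Pedro", 7, "Sanchez Garcia, Javier", 2],
        },
    }
}

suggestions = {
    field: pair_up(flat)
    for field, flat in response["facet_counts"]["facet_fields"].items()
}
print(suggestions)
```

Each category then has its own list of suggestions, in the order Solr
returned them.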



From: Erick Erickson [erickerick...@gmail.com]
Sent: Monday, 05 December 2011 14:48
To: solr-user@lucene.apache.org
Subject: Re: Grouping or Facet ?

Why not just use the first form of the document
and just facet.field=category? You'll get
two different facet counts for XX and YY
that way.

I don't think grouping is the way to go here.

Best
Erick

On Sat, Dec 3, 2011 at 6:43 AM, Juan Pablo Mora  wrote:
> I need to do some counts on a StrField field to suggest options from two 
> different categories, and I don´t know what option is the best:
>
> My schema looks:
>
> - id
> - name
> - category: XX or YY
>
> with Grouping I do:
>
> http://localhost:8983/?q=name:prefix*&group=true&group.field=category
>
> But I can change my schema to to:
>
> - id
> - nameXX
> - nameYY
> - category: XX or YY (only 1 value in nameXX or nameYY)
>
> With facet:
> http://localhost:8983/?q=*:*&facet=true&facet.field=nameXX&facet.field=nameYY&facet.prefix=prefix
>
>
> What option have the best performance ?
>
> Best,
> Juampa.


Grouping or Facet ?

2011-12-03 Thread Juan Pablo Mora
I need to do some counts on a StrField field to suggest options from two
different categories, and I don't know which option is best:

My schema looks like this:

- id
- name
- category: XX or YY

With grouping I do:

http://localhost:8983/?q=name:prefix*&group=true&group.field=category

But I can change my schema to:

- id
- nameXX
- nameYY
- category: XX or YY (only 1 value in nameXX or nameYY)

With facet:
http://localhost:8983/?q=*:*&facet=true&facet.field=nameXX&facet.field=nameYY&facet.prefix=prefix


Which option has the best performance?

Best,
Juampa.

Highlight, Dismax and local params

2011-04-18 Thread Juan Pablo Mora
Hello,

I think I have found something strange with local params and edismax. If I run
queries like:


"params":{
  "hl.requireFieldMatch":"true",
  "hl.fragsize":"200",
  "json.wrf":"callback0",
  "indent":"on",
  "hl.fl":"domicilio,deno",
  "wt":"json",
  "hl":"true",
  "rows":"5",
  "fl":"oidEmpresa,codNif,codTpoEmp,codVidaEmp,denoDef",
  "debugQuery":"on",
  "q":"{!edismax qf=$tipoDeno^5 pf=$tipoDeno^30 ps=5 qs=1}construcciones 
garcía",
  "tipoDeno":"deno",
  "f.domicilio.hl.alternateField":"domicilioDef",
  "fq":"-codTpoNif:F"}},

The highlighting section of the response looks like:


"highlighting":{
"75663":{
  "domicilio":["P45 FOO BAR"],
  "deno":["V00T06 FOO BAR"]},
"76021":{
  "domicilio":["P45 BLAH BLAH"],
  "deno":["V00T00 BLAH BLAH"]},

But if I repeat the query with:

 "q":"{!edismax qf='$tipoDeno^5 ANOTHER_FIELD' pf=$tipoDeno^30 ps=5 qs=1} construcciones garcía"
 tipoDeno = deno


The debug output shows:

"parsedquery":"+((DisjunctionMaxQuery((deno:construcciones)) 
DisjunctionMaxQuery((deno:garcia)))~2)",
"parsedquery_toString":"+(((deno:construcciones) (deno:garcia))~2)",

There is no reference to the ANOTHER_FIELD field, and the highlighting for the
deno field disappears from the response.


"highlighting":{
"75663":{
  "domicilio":["P45 FOO BAR"],
"76021":{
  "domicilio":["P45 BLAH BLAH"],

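While narrowing this down, one option is to write the field names out
literally on the client instead of relying on the $tipoDeno reference. This
is a sketch under the assumption, not confirmed in this thread, that the
parameter reference inside the quoted qf value is what goes wrong;
`edismax_query` is a hypothetical helper.

```python
def edismax_query(text, qf, pf, ps=5, qs=1):
    """Build a {!edismax ...} query with field names written out literally."""

    def quote(value):
        # Local-param values that contain spaces must be quoted.
        return "'" + value + "'" if " " in value else value

    local = "qf={} pf={} ps={} qs={}".format(quote(qf), quote(pf), ps, qs)
    return "{!edismax " + local + "}" + text


tipo_deno = "deno"
q = edismax_query(
    "construcciones garcía",
    qf=f"{tipo_deno}^5 ANOTHER_FIELD",
    pf=f"{tipo_deno}^30",
)
print(q)
```

Comparing the parsedquery of this request with the one above would show
whether the reference inside the quotes is being resolved at all.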




Re: Matching on a multi valued field

2011-04-04 Thread Juan Pablo Mora
I have not found any solution to this. The only option is to denormalize your
multivalued field into several docs with a single-valued field.

Try ComplexPhraseQueryParser (https://issues.apache.org/jira/browse/SOLR-1604)
if you are using Solr 1.4.


On 04/04/2011, at 21:21, Brian Lamb wrote:

I just noticed Juan's response, and I find that I am encountering that very
issue in a few cases. Boosting is a good way to put the more relevant results
at the top, but is it possible to have only the correct results returned?

On Wed, Mar 30, 2011 at 11:51 AM, Brian Lamb [brian.l...@journalexperts.com] wrote:
Thank you all for your responses. The field had already been set up with 
positionIncrementGap=100 so I just needed to add in the slop.
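
The effect of the gap plus a slop can be sketched with a simplified model of
token positions. This is an illustration only: it assumes whitespace
tokenization and in-order matching, whereas real sloppy phrase matching in
Lucene is more involved. Both function names are hypothetical.

```python
def index_positions(values, gap=100):
    """Assign token positions across the values of a multiValued field.

    Simplified model: whitespace tokens, lowercased, and each later value
    starts `gap` positions after the end of the previous one.
    """
    positions = {}
    pos = -1
    for i, value in enumerate(values):
        if i > 0:
            pos += gap
        for token in value.lower().split():
            pos += 1
            positions.setdefault(token, []).append(pos)
    return positions


def sloppy_in_order(positions, first, second, slop):
    """True if `second` follows `first` with at most `slop` tokens between."""
    return any(
        0 <= b - a - 1 <= slop
        for a in positions.get(first, [])
        for b in positions.get(second, [])
    )


record1 = index_positions(["man's best friend", "pooch"])
record2 = index_positions(["man's worst enemy", "friend to no one"])
print(sloppy_in_order(record1, "man's", "friend", slop=10))  # True
print(sloppy_in_order(record2, "man's", "friend", slop=10))  # False
```

With a gap of 100, any slop below the gap keeps the match inside a single
value, so only RECORD1 qualifies. With a gap of 0 the same slop would also
match RECORD2.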


On Tue, Mar 29, 2011 at 6:32 PM, Juan Pablo Mora [jua...@informa.es] wrote:
>> A multiValued field
>> is actually a single field with all data separated with positionIncrement.
>> Try setting that value high enough and use a PhraseQuery.


That is true but you cannot do things like:

q="bar* foo*"~10 with default query search.

and if you use dismax you will have the same problems with multivalued fields. 
Imagine the situation:

Doc1:
   field A: ["foo bar","dooh"] 2 values

Doc2:
   field A: ["bar dooh", "whatever"] Another 2 values

the query:
   qt=dismax & qf= fieldA & q = ( bar dooh )

will return both Doc1 and Doc2. The only thing you can do in this situation is
boost the phrase match with the pf parameter so that Doc2 ranks first in the
results:

pf = fieldA^1


Thanks,
JP.


On 29/03/2011, at 23:14, Markus Jelsma wrote:

> orly, all replies came in while sending =)
>
>> Hi,
>>
>> Your filter query is looking for a match of "man's friend" in a single
>> field. Regardless of analysis of the common_names field, all terms are
>> present in the common_names field of both documents. A multiValued field
>> is actually a single field with all data separated with positionIncrement.
>> Try setting that value high enough and use a PhraseQuery.
>>
>> That should work
>>
>> Cheers,
>>
>>> Hi all,
>>>
>>> I have a field set up like this:
>>>
>>> >> stored="true" required="false" />
>>>
>>> And I have some records:
>>>
>>> RECORD1
>>> 
>>>
>>>  man's best friend
>>>  pooch
>>>
>>> 
>>>
>>> RECORD2
>>> 
>>>
>>>  man's worst enemy
>>>  friend to no one
>>>
>>> 
>>>
>>> Now if I do a search such as:
>>> http://localhost:8983/solr/search/?q=*:*&fq={!q.op=AND<http://localhost:8983/solr/search/?q=*:*&fq=%7B!q.op=AND>
>>> df=common_names}man's friend
>>>
>>> Both records are returned. However, I only want RECORD1 returned. I
>>> understand why RECORD2 is returned but how can I structure my query so
>>> that only RECORD1 is returned?
>>>
>>> Thanks,
>>>
>>> Brian Lamb






Re: Matching on a multi valued field

2011-03-29 Thread Juan Pablo Mora
>> A multiValued field
>> is actually a single field with all data separated with positionIncrement.
>> Try setting that value high enough and use a PhraseQuery.


That is true but you cannot do things like:

q="bar* foo*"~10 with default query search.

and if you use dismax you will have the same problems with multivalued fields. 
Imagine the situation:

Doc1:
field A: ["foo bar","dooh"] 2 values

Doc2:
field A: ["bar dooh", "whatever"] Another 2 values

the query:
qt=dismax & qf= fieldA & q = ( bar dooh )

will return both Doc1 and Doc2. The only thing you can do in this situation is
boost the phrase match with the pf parameter so that Doc2 ranks first in the
results:

pf = fieldA^1
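
The Doc1/Doc2 situation can be mimicked with plain lists. This is a toy
model, not how dismax scores: it only shows that term matching ignores value
boundaries while the phrase sits inside a single value in Doc2 alone. The
function names are hypothetical.

```python
docs = {
    "Doc1": ["foo bar", "dooh"],
    "Doc2": ["bar dooh", "whatever"],
}


def matches_all_terms(values, terms):
    """Term matching ignores value boundaries: any value may supply a term."""
    tokens = {tok for value in values for tok in value.split()}
    return all(term in tokens for term in terms)


def phrase_in_one_value(values, terms):
    """True if the terms appear adjacent and in order inside a single value."""
    n = len(terms)
    for value in values:
        toks = value.split()
        if any(toks[i:i + n] == terms for i in range(len(toks) - n + 1)):
            return True
    return False


query = ["bar", "dooh"]
for name, values in docs.items():
    print(name, matches_all_terms(values, query), phrase_in_one_value(values, query))
```

Both documents satisfy the term query, which is why both come back. Only
Doc2 also satisfies the phrase, which is what the pf boost rewards.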


Thanks,
JP.


On 29/03/2011, at 23:14, Markus Jelsma wrote:

> orly, all replies came in while sending =)
> 
>> Hi,
>> 
>> Your filter query is looking for a match of "man's friend" in a single
>> field. Regardless of analysis of the common_names field, all terms are
>> present in the common_names field of both documents. A multiValued field
>> is actually a single field with all data separated with positionIncrement.
>> Try setting that value high enough and use a PhraseQuery.
>> 
>> That should work
>> 
>> Cheers,
>> 
>>> Hi all,
>>> 
>>> I have a field set up like this:
>>> 
>>> >> stored="true" required="false" />
>>> 
>>> And I have some records:
>>> 
>>> RECORD1
>>> 
>>> 
>>>  man's best friend
>>>  pooch
>>> 
>>> 
>>> 
>>> RECORD2
>>> 
>>> 
>>>  man's worst enemy
>>>  friend to no one
>>> 
>>> 
>>> 
>>> Now if I do a search such as:
>>> http://localhost:8983/solr/search/?q=*:*&fq={!q.op=AND
>>> df=common_names}man's friend
>>> 
>>> Both records are returned. However, I only want RECORD1 returned. I
>>> understand why RECORD2 is returned but how can I structure my query so
>>> that only RECORD1 is returned?
>>> 
>>> Thanks,
>>> 
>>> Brian Lamb



RE: Transform a SolrDocument into a SolrInputDocument

2011-03-21 Thread Juan Pablo Mora
I answered a similar question myself here:

http://stackoverflow.com/questions/4037625/change-schema-in-solr-without-reindex

I hope it helps.
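
The copy-and-update approach discussed further down in this thread can be
sketched with plain dicts standing in for SolrJ's SolrDocument and
SolrInputDocument. This is an illustration of the idea only, not SolrJ code;
`to_input_document` and the sample fields are hypothetical.

```python
def to_input_document(stored_doc, updates):
    """Copy every retrieved field, then overwrite the ones being changed."""
    input_doc = dict(stored_doc)  # shallow copy of the retrieved fields
    input_doc.pop("score", None)  # query pseudo-field, not a schema field
    input_doc.update(updates)
    return input_doc


# Hypothetical document as returned by a query.
retrieved = {"id": "42", "title": "old title", "score": 1.0}
new_doc = to_input_document(retrieved, {"title": "new title"})
print(new_doc)
```

Only stored fields come back from a query, so any field that is indexed but
not stored is missing from the copy and would be lost when the document is
re-added.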


From: Marc SCHNEIDER [marc.schneide...@gmail.com]
Sent: Monday, 21 March 2011 15:20
To: solr-user@lucene.apache.org
Subject: Re: Transform a SolrDocument into a SolrInputDocument

Hi Péter,

I'm not sure I understand your answer. A SolrInputDocument always contains
only stored fields, so I don't see the problem.
I'd just like to update an existing stored field...

Thanks,
Marc.

2011/3/21 Péter Király 

> Hi Marc,
>
> as far as I know the best way to do it is working from the original
> source, because it is possible, that not all fields are stores, and
> the original content of the not stored fields is not inside the Solr
> document.
>
> Péter
>
> 2011/3/21 Marc SCHNEIDER :
> > Hello,
> >
> > I'd like to know the fastest way (code lines) to update a field of a
> > document.
> > So my idea was:
> > 1) Get a SolrDocument
> > 2) Add all fields of the SolrDocument to a new SolrInputDocument
> > 3) Update the field in SolrInputDocument
> > 4) Add SolrInputDocument to the server and commit it
> >
> > Is there a fastest way to do that? I mean transforming a SolrDocument
> into a
> > SolrInputDocument?
> >
> > Thanks in advance,
> > Regards,
> > Marc.
> >
>