Re: Java - setting multi-valued fields

2020-07-06 Thread kumar gaurav
If this approach did not work either, that suggests something is wrong in the
Solr schema.

Can you share the field's schema definition?


Regards
Kumar Gaurav




RE: Java - setting multi-valued fields

2020-06-24 Thread Eivind Hodneland
Hi,

Thanks for your input.
However, this approach did not work either; it gave the same result as
before.

Is there perhaps a different approach that could be used, such as other methods?


Uptime Consulting | Eivind Hodneland | Senior Consultant | Munchs gate 7, 
NO-0165 Oslo, Norway
Tel: +47 22 33 71 00 | Mob: +47 971 76 083 | 
eivind.hodnel...@uptimeconsulting.no  | www.uptimeconsulting.no
--
Search and Big Data solutions
Software Development
IT outsourcing services and consultancy
 






Re: Java - setting multi-valued fields

2020-06-17 Thread kumar gaurav
HI

Example:

String[] values = new String[] {"value 1", "value 2"};
inputDoc.setField(multiFieldName, values);


Can you try changing the array to a list?

List<String> values = new ArrayList<>();
values.add("value 1");
values.add("value 2");
inputDoc.setField(multiFieldName, values);
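
If neither the array nor the list form gets both values through, one more thing to try is adding the values one at a time. This is a minimal sketch, assuming the values are lost inside the update processor rather than because of the schema or a later processor in the chain. The helper `MultiValues.flatten` is hypothetical (it is not part of SolrJ); it turns an array, a collection, or a single value into a flat list, so that each element can be passed to `SolrInputDocument.addField(name, value)` separately.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Hypothetical helper (not part of SolrJ): normalizes a value into a flat list. */
final class MultiValues {

    private MultiValues() {}

    /**
     * Flattens an array, a collection, or a single value into a list of
     * individual values. A null input yields an empty list.
     */
    static List<Object> flatten(Object value) {
        List<Object> out = new ArrayList<>();
        if (value == null) {
            return out;
        }
        if (value instanceof Object[]) {
            // String[] is also an Object[], so arrays of strings land here.
            for (Object v : (Object[]) value) {
                out.add(v);
            }
        } else if (value instanceof Collection) {
            out.addAll((Collection<?>) value);
        } else {
            out.add(value);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] values = {"value 1", "value 2"};
        // In the update processor, each element would be added on its own:
        //   inputDoc.removeField(multiFieldName);   // clear any old value first
        //   inputDoc.addField(multiFieldName, v);   // once per element
        for (Object v : flatten(values)) {
            System.out.println(v);
        }
    }
}
```

If both values still do not reach the index this way, the Java side is probably not the cause, and the field definition is the next thing to check.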



regards

Kumar Gaurav









Java - setting multi-valued fields

2020-06-17 Thread Eivind Hodneland
Hi,

My customer has a Solr index with a large number of fields, many of which are
multivalued (type="string", multiValued="true").

I am having problems setting the values for these fields in my Java update
processors.

Example:
String[] values = new String[] {"value 1", "value 2"};
inputDoc.setField(multiFieldName, values);

However, only "value 1" is present in the index after updating.
What is the best / correct way to make this work?






Re: Solr 7.5 multi-valued fields will not update with multiple values

2019-03-29 Thread Erick Erickson
Separate out the author bits. Instead of

"author_fullname":["Author 1","Author 2","Author 3"]

use

"author_fullname":"Author 1",
"author_fullname":"Author 2",
"author_fullname":"Author 3"

> On Mar 29, 2019, at 6:16 AM, Eivind Hodneland 
>  wrote:
> 
> curl -X POST -H 'Content-Type: application/json' 
> 'http://localhost:18080/solr/customer_core/update/' --data-binary 
> '[{"id":"MyId","author_fullname":["Author 1","Author 2","Author 3"]}]'
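
The repeated-key form suggested above can also be generated rather than typed by hand. This is a minimal sketch in Java, assuming a hypothetical helper class `RepeatedKeyJson` (the class and method names are made up for illustration). The escaping only covers backslashes and double quotes, so this is not a full JSON encoder.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: builds an update document with one key per value. */
final class RepeatedKeyJson {

    private RepeatedKeyJson() {}

    /** Quotes a string, escaping backslashes and double quotes only. */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Builds one JSON object in which the field key is repeated once per value. */
    static String document(String id, String field, List<String> values) {
        StringBuilder sb = new StringBuilder("{");
        sb.append(quote("id")).append(':').append(quote(id));
        for (String v : values) {
            sb.append(',').append(quote(field)).append(':').append(quote(v));
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        // Wrapped in [...] this becomes the body for the /update request.
        String doc = document("MyId", "author_fullname",
                Arrays.asList("Author 1", "Author 2", "Author 3"));
        System.out.println("[" + doc + "]");
    }
}
```

The string is assembled by hand here because most JSON libraries will not emit the same key twice in one object.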



Solr 7.5 multi-valued fields will not update with multiple values

2019-03-29 Thread Eivind Hodneland
Hi,

I am running a Solr 7.5 index for a customer.
I have recently discovered that none of the multivalued string/text fields are 
filled with more than one value each.

Example of indexing (edited and abbreviated):
curl -X POST -H 'Content-Type: application/json' 
'http://localhost:18080/solr/customer_core/update/' --data-binary 
'[{"id":"MyId","author_fullname":["Author 1","Author 2","Author 3"]}]'

The multivalued field author_fullname only gets one value, namely "Author 1".
This is also the case for the other multivalued fields in the schema.

Definition of author_fullname and its corresponding type from managed-schema:

  




  
  
  
  


  
  
  
  

  




Re: Sorting multi-valued fields

2018-09-14 Thread Shawn Heisey

On 9/14/2018 4:50 AM, richard.clarke wrote:

> What does it mean to sort documents by a multivalued field?  If a field has
> multiple values, how can this be used to sort documents?
>
> e.g. if document 1 has a numeric field containing values 1,2,3,4,5 and
> document 2 has values in the same field of 1,2,3 - which come first in the
> sort order and why?


It's my understanding that Solr will refuse to sort on a multivalued 
field, returning an error.  If Solr were to make a decision to use the 
first value, or the minimum value, or the maximum value ... some users 
would think that was the wrong choice.


You can use a function query to have it sort on the min or max value of 
a field.


https://lucidworks.com/2015/09/10/minmax-on-multivalued-field/

Thanks,
Shawn



Re: Sorting multi-valued fields

2018-09-14 Thread Mikhail Khludnev
http://people.apache.org/~mkhl/searchable-solr-guide-7-3/common-query-parameters.html#sort-parameter
In the case of primitive fields, or SortableTextFields, that are
multiValued="true" the representative value used for each doc when sorting
depends on the sort direction: The minimum value in each document is used
for ascending (asc) sorting, while the maximal value in each document is
used for descending (desc) sorting. This default behavior is equivalent to
explicitly sorting using the 2 argument field() function:
sort=field(name,min) asc and sort=field(name,max) desc
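
To make the rule quoted above concrete, here is a small sketch in plain Java that models it outside Solr. The class `MultiValuedSort` and the document ids are hypothetical. It only mimics the "min for asc, max for desc" choice of representative value and is not how Solr implements sorting internally; ties are left in input order here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of sorting documents on a multi-valued numeric field. */
final class MultiValuedSort {

    private MultiValuedSort() {}

    /** Representative value: the minimum for ascending, the maximum for descending. */
    static int representative(List<Integer> values, boolean ascending) {
        if (ascending) {
            return Collections.min(values);
        }
        return Collections.max(values);
    }

    /** Returns the document ids ordered by their representative value. */
    static List<String> sortIds(Map<String, List<Integer>> docs, boolean ascending) {
        List<String> ids = new ArrayList<>(docs.keySet());
        Comparator<String> byRep =
                Comparator.comparingInt(id -> representative(docs.get(id), ascending));
        // List.sort is stable, so documents with equal keys keep their input order.
        ids.sort(ascending ? byRep : byRep.reversed());
        return ids;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> docs = new LinkedHashMap<>();
        docs.put("doc1", Arrays.asList(1, 2, 3, 4, 5));
        docs.put("doc2", Arrays.asList(1, 2, 3));
        docs.put("doc3", Arrays.asList(2, 9));
        System.out.println("asc:  " + sortIds(docs, true));
        System.out.println("desc: " + sortIds(docs, false));
    }
}
```

For the two documents in the question, ascending order is a tie (both have a minimum of 1), while descending order puts document 1 first because its maximum is 5 against 3.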



-- 
Sincerely yours
Mikhail Khludnev


Sorting multi-valued fields

2018-09-14 Thread richard.clarke
Hi
What does it mean to sort documents by a multivalued field?  If a field has
multiple values, how can this be used to sort documents?

e.g. if document 1 has a numeric field containing values 1,2,3,4,5 and
document 2 has values in the same field of 1,2,3 - which come first in the
sort order and why?

Thanks in advance.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Reading multi-valued fields in a Function Query

2018-04-16 Thread Ugo Matrangolo
Hi,

I'm trying to write a function query that needs to stick a score [0..1] to
each doc in the search results based on some logic applied to a
multi-valued field in the document.

This is an excerpt of the schema:



And this is how it looks in a generic document on the index:

"sku_store": [ "women",
"women-apparel" ]

I need to access this multi-valued field at query time in a function query
that then I use to boost the search results.

I'm using the following code that is the best I have found to access a
multi-valued field inside a *ValueSource*:

  override def getValues(context: JMap[_, _], readerContext: LeafReaderContext): FunctionValues =
    new DoubleDocValues(this) {
      val userGuidValue: FunctionValues = userGuidSource.getValues(context, readerContext)
      val skuStores: FunctionValues = new SortedSetFieldSource("sku_store").getValues(context, readerContext)

      override def doubleVal(docId: Int): Double = {
        // Fail fast and loud if the client is sending invalid GUIDs
        val userGuid: UUID = UUID.fromString(userGuidValue.strVal(docId))
        val store: String = skuStores.strVal(docId) // How to get an Iterable[String] ???

        // Logic to compute the score based on userGuid and store
      }
    }

I have searched the documentation but using a *SortedSetFieldSource* and
invoking the *strVal(docId)* on it is the best I have found to access the
content of a multi-valued field.

The problem with this approach is that I can read only the *first *value
(in the above case would be 'women') skipping on the other one (that, of
course, is the one I'm mostly interested in).

What I'm looking for is something that will give me a way to *iterate* on
the values in the multi-valued `sku_store` field.

Any suggestion ?

Best
Ugo


JSON Facet on Multi-Valued fields

2016-12-23 Thread Zheng Lin Edwin Yeo
Hi,

Would like to check, is it possible to do JSON Facet on Multi-Valued fields?

I tried to do it, and I get the following error:

  "error":{
    "metadata":[
      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","org.apache.solr.common.SolrException"],
    "msg":"can not use FieldCache on multivalued field: number_ds",
    "code":400}}


I'm using Solr 6.2.1.

Regards,
Edwin


Re: facet on two multi-valued fields

2016-03-03 Thread Jan Høydahl
Hi,

BlockJoin with Parent/Child is your solution.
See http://yonik.com/solr-nested-objects/ and 
https://cwiki.apache.org/confluence/display/solr/BlockJoin+Faceting

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com




Re: facet on two multi-valued fields

2016-03-03 Thread Andreas Hubold

Hi,

sorry, the subject may have been misleading. I want to get facet results 
for only one field (tagIds) but restrict the returned values to those 
with a matching tagDescription. Both multi-valued fields have the same 
order.


Example docs

id:"1"
tagIds:["10","12","13"]
tagDescriptions:["News", "Sport News", "Economy"]
text:"... foo ..."

id:"2"
tagIds:["14", "10"]
tagDescriptions:["IT", "News"]
text:"... foo ..."

Query
q=text:foo
=tagDescriptions:news
=tagIds

IIRC, this would give me a facet result with values 10, 12, 13, 14 but I 
want to restrict the result to 10, 12 (the ones with "News" in their 
tagDescription)
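
The wanted result can be spelled out in plain Java to make the pairing explicit. This is only a sketch of the intended semantics, not a way to get the result out of Solr: it uses no Solr API, and the class `CorrelatedTagFacet` and its nested `Doc` type are hypothetical. It assumes that both lists are available side by side and that "matching" means a case-insensitive substring match on the description.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of faceting on tagIds restricted by the paired description. */
final class CorrelatedTagFacet {

    private CorrelatedTagFacet() {}

    /** One article: tagIds.get(i) belongs to tagDescriptions.get(i). */
    static final class Doc {
        final List<String> tagIds;
        final List<String> tagDescriptions;

        Doc(List<String> tagIds, List<String> tagDescriptions) {
            this.tagIds = tagIds;
            this.tagDescriptions = tagDescriptions;
        }
    }

    /** Counts tag ids whose description at the same index contains the term. */
    static Map<String, Integer> facet(List<Doc> docs, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        Map<String, Integer> counts = new TreeMap<>();
        for (Doc doc : docs) {
            int n = Math.min(doc.tagIds.size(), doc.tagDescriptions.size());
            for (int i = 0; i < n; i++) {
                String description = doc.tagDescriptions.get(i).toLowerCase(Locale.ROOT);
                if (description.contains(needle)) {
                    counts.merge(doc.tagIds.get(i), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(
                new Doc(Arrays.asList("10", "12", "13"),
                        Arrays.asList("News", "Sport News", "Economy")),
                new Doc(Arrays.asList("14", "10"),
                        Arrays.asList("IT", "News")));
        System.out.println(facet(docs, "news"));
    }
}
```

With the two example docs, only ids 10 and 12 survive, with 10 counted twice because both docs carry it.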


I thought about using query-time join but am unsure about performance 
implications (if there are many tags) and concrete usage.


Or is it possible to somehow put both tagIds and tagDescriptions into a 
single multi-valued field?


Thank you,
Andreas










Re: facet on two multi-valued fields

2016-03-02 Thread Jan Høydahl
It makes no sense to facet on a “text_general” analyzed field. Can you give a 
concrete example with a few dummy docs, show some queries (do you query the 
tagDescription field?), and show the facet output you want?

There may be several ways to solve the task, depending on the exact use case. 
One solution could be to use child documents.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com




facet on two multi-valued fields

2016-03-02 Thread Andreas Hubold

Hi,

my schema looks like this


multiValued="true"/>
stored="false" multiValued="true"/>



I'd like to get the tagIds of documents with a certain tagDescription 
(and text). However tagIds contains multiple ids in the same order as 
tagDescription and simple faceting would return all. Is there a way to 
just get the IDs of the tags with a matching description?


Or would you recommend some other schema?

Thanks,
Andreas




faceting on correlated multi-valued fields?

2016-02-25 Thread Andreas Hubold

Hi,

I'm thinking about indexing articles with tags in a denormalized way as 
follows





multiValued="true"/>
stored="false" multiValued="true"/>


An article can have multiple tags. Each tag has a description and an ID. 
The multi-valued fields tagIds and tagDescriptions have the same length 
and order (tagDescriptions[5] is the description of the tag with ID 
tagIds[5]).


Is there a good way to get the IDs of tags with some description (query 
on tagDescriptions) that are used for articles matching some query on 
field text?
I thought about faceting on tagIds but I don't know how to restrict the 
result to the ids with a given description.


Or would you use a different index schema for this use-case?

I'm still using Solr 4.10.4. Is this something that can be done more 
easily with newer versions?


Thanks for any hints!

Cheers,
Andreas





Re: AND operator in multi valued fields

2014-09-23 Thread Mikhail Khludnev
On Fri, Sep 19, 2014 at 12:45 PM, lboutros boutr...@gmail.com wrote:

> What do you think about developing a new SpanQuery class that allows cross
> field queries ?


indeed
http://lucene.apache.org/core/3_0_3/api/all/org/apache/lucene/search/spans/FieldMaskingSpanQuery.html


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
mkhlud...@griddynamics.com


Re: AND operator in multi valued fields

2014-09-23 Thread lboutros
That's excellent Mikhail !

Thanks so much.

I have to use it in my custom query parser now.

Ludovic.



-
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/AND-operator-in-multi-valued-fields-tp4159715p4160668.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: AND operator in multi valued fields

2014-09-19 Thread lboutros
Thx Alex for your answer.

1) This could be tricky, because the application users write very complex
combined queries with main document fields and event fields too. A custom
parser does the abstraction. I think that could be very tricky to extract
event part of a complex query in order to filter on children events.

2) I did not think of this solution! Thanks, I will check the
feasibility.

3) You are right. I think that denormalizing will have too much impact on
performance during document updates and on the index size too.

4) You are right. Not an option here.

I have a 5th proposal :

What do you think about developing a new SpanQuery class that allows cross
field queries ? 

Ludovic




-
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/AND-operator-in-multi-valued-fields-tp4159715p4159883.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: AND operator in multi valued fields

2014-09-19 Thread Alexandre Rafalovitch
I do not think the queries have access to multiple fields at once. Did you
check the API?

But I am not sure why 1 would be so hard. You know what event field names
are, so you just need to copy their conditions into subquery. You could
probably even do that in custom search component and not even touch the
client code.

Regards,
 Alex



Re: AND operator in multi valued fields

2014-09-19 Thread lboutros
I've just finished a first implementation of a CrossFieldSpanNearQuery and
it just works perfectly :D

I can now play with position increments and slops to get exact results
within two multi valued fields.

And for the 1st proposal, my user queries can be bigger than 10k with lots
of different blocks and operators ;)

Now I have to use this query in my custom query parser.

Thx a lot Alex,

Ludovic.



-
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/AND-operator-in-multi-valued-fields-tp4159715p4159942.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: AND operator in multi valued fields

2014-09-19 Thread Alexandre Rafalovitch
Well, if it works, open source it. Could even become an official
contribution. You are not the only one asking for this kind of
features. Though your use case does seem to be a bit further out than
most.

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853




AND operator in multi valued fields

2014-09-18 Thread lboutros
Dear all,

let's say you have two multivalued fields with two different complex
analyzers in a quite complex schema.

I would like to match specific combinations of values in these fields.

For instance :

Field1 : Value1, Value2
Field2 : Value3, Value4

I would like to match this document with a query like this one :

+Field1:Value1 +Field2:Value3

But not with this one :

+Field1:Value1 +Field2:Value4

I tried to check the PayloadNearQuery class but this class cannot use two
different fields (due to the SpanNearQuery inheritance).

Is there an easy way to do that ?
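
The difference between the two queries can be sketched in plain Java. This is only a model of the wanted semantics, with hypothetical names (`ParallelFieldMatch`, `plainAnd`, `matchesAtSameIndex`). It ignores analyzers entirely and compares raw values, and it is not a Lucene query.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical model: plain AND versus AND restricted to the same value index. */
final class ParallelFieldMatch {

    private ParallelFieldMatch() {}

    /** What +Field1:v1 +Field2:v2 does today: each value may sit at any index. */
    static boolean plainAnd(List<String> field1, List<String> field2, String v1, String v2) {
        return field1.contains(v1) && field2.contains(v2);
    }

    /** The wanted behaviour: both values must sit at the same index. */
    static boolean matchesAtSameIndex(List<String> field1, List<String> field2,
                                      String v1, String v2) {
        int n = Math.min(field1.size(), field2.size());
        for (int i = 0; i < n; i++) {
            if (field1.get(i).equals(v1) && field2.get(i).equals(v2)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> field1 = Arrays.asList("Value1", "Value2");
        List<String> field2 = Arrays.asList("Value3", "Value4");
        System.out.println(matchesAtSameIndex(field1, field2, "Value1", "Value3"));
        System.out.println(matchesAtSameIndex(field1, field2, "Value1", "Value4"));
        System.out.println(plainAnd(field1, field2, "Value1", "Value4"));
    }
}
```

The last call shows the false positive: a plain AND accepts Value1 with Value4 even though they belong to different positions.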

Ludovic.



-
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/AND-operator-in-multi-valued-fields-tp4159715.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: AND operator in multi valued fields

2014-09-18 Thread Alexandre Rafalovitch
Both queries seem valid. The values are there and you are asking to match
them. They both should match.

Can you explain how query 2 is actually different from query 1? Are
you saying you want to match 1st value with 1st value (like positional
constraints?).

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853




Re: AND operator in multi valued fields

2014-09-18 Thread lboutros
Alexandre Rafalovitch wrote
> Are you saying you want to match 1st value with 1st value (like positional
> constraints?).

That's exactly what I would like to do. :)




-
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/AND-operator-in-multi-valued-fields-tp4159715p4159728.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: AND operator in multi valued fields

2014-09-18 Thread Alexandre Rafalovitch
Do you know the position when you are doing the search? Or just that
they need to be parallel within their tokenized groups?

Regards,
   Alex.
P.s. It may help if you explain a business level issue here. There
might be a completely different approach to that as well.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853




Re: AND operator in multi valued fields

2014-09-18 Thread lboutros
Thx Alex.

We have main documents in the index. (more than 100 complex fields).

Each document can have events attached.

An event contains 4 fields with 3 different analyzers.

We need more than just filtering on them (highlighting on documents and
events at the same time for instance).
That means that nested documents cannot be used.

These events are indexed as additional multi valued fields in each
document.
They are searched like any other field.

The issue here is that the operator 'AND' between event fields can match
false positives.

We do not know the position during search. We just want to respect the event
integrity in the search. So you are right, we just want them to be parallel
within their tokenized groups.

The first idea was to index the event in only one field and use
proximity/phrase search in order to prevent false positives.

But that means that we need to index dates, ids and text in one unique
field.

Do you think this could be a better/easier approach ?

Ludovic



-
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/AND-operator-in-multi-valued-fields-tp4159715p4159797.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: AND operator in multi valued fields

2014-09-18 Thread Alexandre Rafalovitch
Well, I can think of four ways, increasingly complicated.

1) You could have both parent record with unzipped events and also
child events as individual documents. Then, you do filtering based on
children and highlighting based on parent documents.

2) The other way is to have a custom post filter that looks at the
matches and discards the ones that have different offset (by using
very large positionIncrementGap to create clear group boundaries). But
I don't know whether you can access the match token offsets in the
post filter, so this is more of a thought experiment.

3) You could also duplicate the main field contents and index one document
per event. If most of the fields are only indexed, that's OK and there is no
real duplication. But you may need to store fields for the highlighter, and
those are not de-duplicated internally, as far as I know.

4) You could create zipped pairs of values in a dedicated field and
search that with near-queries. But then you do have to have the same
analyzer for all members. Sounds like this may not be an option for
you.
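
As a rough illustration of option 4, the zipping itself could be done at index time along these lines. This is a sketch with a hypothetical helper `ZippedPairs.zip`. It assumes the two parallel fields always have the same number of values, and that a space is an acceptable separator for whatever single analyzer the dedicated field would use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: zips two parallel multi-valued fields into one. */
final class ZippedPairs {

    private ZippedPairs() {}

    /** Joins the i-th value of each list into one string, separated by a space. */
    static List<String> zip(List<String> first, List<String> second) {
        if (first.size() != second.size()) {
            throw new IllegalArgumentException(
                    "parallel fields must have the same number of values");
        }
        List<String> pairs = new ArrayList<>(first.size());
        for (int i = 0; i < first.size(); i++) {
            pairs.add(first.get(i) + " " + second.get(i));
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Each resulting string would become one value of the dedicated field.
        System.out.println(zip(Arrays.asList("Value1", "Value2"),
                               Arrays.asList("Value3", "Value4")));
    }
}
```

A near or phrase query on the dedicated field should then only match values that were zipped together, provided the field's positionIncrementGap is larger than the slop used.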

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853




RE: Field Collapsing - Anything in the works for multi-valued fields?

2013-01-18 Thread David Parks
If I understand the reading, you've suggested that I index the vendor names
as their own document (currently this is a multi-valued field of each
document).

Each such vendor document would just have a single valued 'name' field.

Each normal product document would contain a multi-valued field that is a
list of vendor document IDs and that we use to join the query results with
the vendor documents.

I presume this means that I would have some kind of dynamic field created
from the join which I could use as the 'group.field' value? 

I didn't quite follow the last point.
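
For what it's worth, the reshaping described above might be sketched like this in Python. The field names (vendor_ids, type, name), the vendor names and the ID scheme are assumptions for illustration only; the sketch shows how the documents would be split, not how a Solr join or grouping would consume them.

```python
def split_vendors(products):
    """Turn a multi-valued 'vendor' field into separate vendor documents
    plus a multi-valued list of vendor document IDs on each product."""
    vendor_ids = {}                      # vendor name -> vendor document ID
    vendor_docs, product_docs = [], []
    for product in products:
        ids = []
        for name in product["vendor"]:
            if name not in vendor_ids:
                vendor_ids[name] = f"vendor-{len(vendor_ids) + 1}"
                vendor_docs.append(
                    {"id": vendor_ids[name], "type": "vendor", "name": name})
            ids.append(vendor_ids[name])
        # Copy every field except the original multi-valued vendor names.
        reshaped = {k: v for k, v in product.items() if k != "vendor"}
        reshaped["vendor_ids"] = ids
        product_docs.append(reshaped)
    return vendor_docs, product_docs

# Hypothetical sample data.
products = [
    {"id": "p1", "product_name": ["toy spiderman doll"], "vendor": ["Acme", "ToyCo"]},
    {"id": "p2", "product_name": ["some book"], "vendor": ["Acme"]},
]
vendor_docs, product_docs = split_vendors(products)
```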





Field Collapsing - Anything in the works for multi-valued fields?

2013-01-17 Thread David Parks
I want to configure Field Collapsing, but my target field is multi-valued
(e.g. the field I want to group on has a variable # of entries per document,
1-N entries).

I read on the wiki (http://wiki.apache.org/solr/FieldCollapsing) that
grouping doesn't support multi-valued fields yet.

Anything in the works on that front by chance?  Any common work-arounds?




Re: Field Collapsing - Anything in the works for multi-valued fields?

2013-01-17 Thread Mikhail Khludnev
David,

What are the documents and the field? Knowing that would help in suggesting a workaround.



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


RE: Field Collapsing - Anything in the works for multi-valued fields?

2013-01-17 Thread David Parks
The documents are individual products which come from 1 or more vendors.
Example: a 'toy spiderman doll' is sold by 2 vendors, that is 1 document.
Most fields are multi valued (short_description from each of the 2 vendors,
long_description, product_name, vendor, etc. the same).

I'd like to collapse on the vendor in an attempt to ensure that vast
collections of books, music, and movies, by just a few vendors, don't
overwhelm the results simply due to the fact that they have every search
term imaginable due to the sheer volume of books, CDs, and DVDs, in relation
to other product items.

But in this case there are clearly 1...N vendors per document, solidly a
multi-valued field. And it's hard to put a maximum on the number of
possible vendors.

Thanks,
Dave




Re: Field Collapsing - Anything in the works for multi-valued fields?

2013-01-17 Thread Otis Gospodnetic
Hi,

Instead of the multi-valued fields, would a parent-child setup work for you here?

See http://search-lucene.com/?q=solr+join&fc_type=wiki

Otis
--
Solr & ElasticSearch Support
http://sematext.com/







Re: Solr 3.6 issue - DataImportHandler with CachedSqlEntityProcessor not importing all multi-valued fields

2012-07-04 Thread Mikhail Khludnev
It's hard to troubleshoot without debug logs. Please note that the
regular configuration for CachedSqlEntityProcessor is slightly different:

http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor

See the where attribute there:

  where="xid=x.id"
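
Conceptually, a cached sub-entity reads its rows once and then looks them up by key for every parent row, which is why it needs a where-style mapping between a child column and a parent column. The Python sketch below only illustrates that idea; it is not DIH code, and the table contents and column names (foo_id, name) are invented.

```python
from collections import defaultdict

def build_cache(child_rows, key_column):
    """Read all child rows once and index them by the join column."""
    cache = defaultdict(list)
    for row in child_rows:
        cache[row[key_column]].append(row)
    return cache

def join_children(parent_rows, cache, parent_key, child_value_column, target_field):
    """For each parent, collect matching child values into a multi-valued field."""
    docs = []
    for parent in parent_rows:
        doc = dict(parent)
        matches = cache.get(parent[parent_key], [])
        doc[target_field] = [child[child_value_column] for child in matches]
        docs.append(doc)
    return docs

# Hypothetical rows standing in for the foo (parent) and bar (child) tables.
foo_rows = [{"id": 1, "name": "first"}, {"id": 2, "name": "second"}]
bar_rows = [{"foo_id": 1, "name": "a"},
            {"foo_id": 1, "name": "b"},
            {"foo_id": 2, "name": "c"}]

# Keying the cache on foo_id plays the role of a mapping like where="foo_id=foo.id".
cache = build_cache(bar_rows, "foo_id")
docs = join_children(foo_rows, cache, "id", "name", "bar_name")
```

Without a key to look rows up by, a cache like this has no way to tell which child rows belong to which parent.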




-- 
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Solr 3.6 issue - DataImportHandler with CachedSqlEntityProcessor not importing all multi-valued fields

2012-06-26 Thread ps_sra
Not sure if this is the right forum to post this question.  If not, please
excuse.

I'm trying to use the DataImportHandler with
processor="CachedSqlEntityProcessor" to speed up import from an RDBMS. While
processor="CachedSqlEntityProcessor" is much faster than
processor="SqlEntityProcessor", the resulting Solr index does not contain
multi-valued fields on sub-entities.

So, for example, my db-data-config.xml has the following structure:

<document>
  ..
  <entity name="foo" pk="id"
          processor="SqlEntityProcessor"
          query="SELECT f.id AS foo_id,
                        f.name AS foo_name
                 FROM foo f">
    <field column="foo_id" name="foo_id" />
    <field column="foo_name" name="foo_name" />

    <entity name="bar" processor="CachedSqlEntityProcessor"
            query="SELECT b.name as bar_name
                   FROM bar b
                   WHERE b.id = '${foo.id}'">
      <field column="bar_name" name="bar_name" />
    </entity>
  </entity>
  ..
</document>

where the database relationship foo:bar is 1:m.

The issue is that when I import with processor="SqlEntityProcessor",
everything works fine and the multi-valued field bar_name has multiple
values, while importing with processor="CachedSqlEntityProcessor" does not
even create the bar_name field in the index.

I've deployed Solr 3.6 on Weblogic 11g, with the patch
https://issues.apache.org/jira/browse/SOLR-3360 applied. 

Any help on this issue is appreciated.


Thanks,
ps

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-3-6-issue-DataImportHandler-with-CachedSqlEntityProcessor-not-importing-all-multi-valued-fields-tp3991449.html
Sent from the Solr - User mailing list archive at Nabble.com.


How to expand list into multi-valued fields?

2012-05-01 Thread invisbl
I am indexing content from a RDBMS. I have a column in a table with pipe
separated values, and upon indexing I would like to transform these values
into multi-valued fields in SOLR's index. For example,

ColumnA (From RDBMS)
-
apple|orange|banana

I want to expand this to,

SOLR Index

FruitField=apple
FruitField=orange
FruitField=banana

or, with numbered fields, expand to,

SOLR Index

FruitField1=apple
FruitField2=orange
FruitField3=banana

Please help, thank you!


--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-expand-list-into-multi-valued-fields-tp3953378.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to expand list into multi-valued fields?

2012-05-01 Thread Jeevanandam

Here you go.

Specify the regex transformer in the entity tag of the DIH config XML, like below:

<entity transformer="RegexTransformer" ... />

and then

<field column="ColumnA" name="FruitField" splitBy="\|" />

That's it!
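
As a rough illustration of what the splitBy regex does to a column value, here is the same split in Python. This only mimics the effect; it is not how the transformer is implemented, and the helper name split_by is made up.

```python
import re

def split_by(value, pattern):
    """Split one column value into a list of values using a regex."""
    return re.split(pattern, value)

# The pipe must be escaped because "|" means alternation in a regex.
fruit_values = split_by("apple|orange|banana", r"\|")
print(fruit_values)
```

Each element of the resulting list becomes one value of the multi-valued FruitField.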

- Jeevanandam




Re: frange with multi-valued fields

2012-02-03 Thread Chris Hostetter

: Has anyone had experience using frange with multi-valued fields?  In 
: solr 3.5 doing so results in the error: can not use FieldCache on 
: multivalued field

correct.

: Here's the use case.  We have multiple years attached to each document 
: and want to be able to refine by a year range.  We're currently using 
: the standard range query syntax [ 1900 TO 1910 ] which works, but those 
: queries are slower than we would like.  I've seen reports that using 
: frange can greatly improve performance.  
: http://solr.pl/en/2011/05/30/quick-look-frange/

note that there is a mistake in the "Faster implementation" 
column of the performance table in that article .. the actual data (and the 
paragraph after the table) indicate that...

   "standard range query is faster only for queries that cover a
   small number of terms from the given field."

Yonik got similar results when he did testing on range queries over 
strings, but the specifics on where the cut-off point was were slightly 
different...

https://yonik.wordpress.com/2009/07/06/ranges-over-functions-in-solr-1-4/

In general you'd have to test it, but for things like years, 
unless you are dealing with really big spans of time (ie: 
[1901 TO 20]) and will have ranges that are generally large relative to 
the total span of data you are dealing with, i seriously doubt frange 
would be much faster for you even if you had a single-valued field -- and the 
bottom line is frange won't work with multivalued fields.
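
a very rough way to picture the trade-off, as a toy Python model (this is not how Solr implements either query type; the index contents and function names are invented for illustration): a standard range query does work proportional to the number of indexed terms inside the range, while an frange-style check tests one value per document, which is also why it needs exactly one value per document.

```python
from bisect import bisect_left, bisect_right

# term -> sorted doc IDs: a toy inverted index for a multi-valued "year" field.
# Doc 2 has two years (1900 and 1905).
postings = {1899: [0], 1900: [1, 2], 1905: [2], 1910: [3], 1950: [4]}
sorted_terms = sorted(postings)

def term_range_query(low, high):
    """Standard range query model: visit every indexed term in [low, high]
    and union its postings; work grows with the number of terms in range."""
    lo = bisect_left(sorted_terms, low)
    hi = bisect_right(sorted_terms, high)
    terms_visited = sorted_terms[lo:hi]
    hits = set()
    for term in terms_visited:
        hits.update(postings[term])
    return sorted(hits), len(terms_visited)

# doc ID -> single value: the one-value-per-document array that a
# FieldCache-style lookup relies on. Doc 2 can only keep one of its years.
single_valued = [1899, 1900, 1905, 1910, 1950]

def function_range_query(low, high):
    """frange model: test each document's single value; work grows with
    the number of documents, regardless of how many terms are in range."""
    return [doc for doc, value in enumerate(single_valued) if low <= value <= high]

range_hits, terms_visited = term_range_query(1900, 1910)
frange_hits = function_range_query(1900, 1910)
```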

forget about frange for a moment, and tell us more about your specific 
situation. to start with: what field configuration are you using right now 
for your year field? specifically, are you using TrieIntField? have you 
tried tuning the options on it? how many unique year values are in your 
corpus? how big do your ranges usually get?



https://people.apache.org/~hossman/#xyproblem
Your question appears to be an XY Problem ... that is: you are dealing
with X, you are assuming Y will help you, and you are asking about Y
without giving more details about the X so that we can understand the
full issue.  Perhaps the best solution doesn't involve Y at all?
See Also: http://www.perlmonks.org/index.pl?node_id=542341




-Hoss


frange with multi-valued fields

2012-01-20 Thread Russell Black
Has anyone had experience using frange with multi-valued fields?  In solr 3.5 
doing so results in the error: can not use FieldCache on multivalued field  

Here's the use case.  We have multiple years attached to each document and want 
to be able to refine by a year range.  We're currently using the standard range 
query syntax [ 1900 TO 1910 ] which works, but those queries are slower than we 
would like.  I've seen reports that using frange can greatly improve 
performance.  http://solr.pl/en/2011/05/30/quick-look-frange/  

From what I can tell, there seem to be efforts in 4.0 to allow functions to 
work on multivalued fields.  Does anyone know for sure?

Thanks,

Russ

Multi-word searches in multi-valued fields

2011-09-22 Thread Olson, Ron
Hi all-

I'm not clear on how to allow a user to search a multi-valued field with 
multiple words and return only those documents where all the words are together 
in one value, and not spread over multiple values.

If I do a literal search on the company name field for "smith trucking" (with 
the quotes), then it works because it's looking for only "smith trucking", and 
it finds it, great. However, if I put in "trucking smith", then I get no 
results. If I try using something like (+trucking +smith), then I get documents 
where one document might have "joe's trucking" and "bob smith" in the resulting 
array of names.

So I guess what I need is an exact match, regardless of word positioning (i.e. 
"smith trucking" and "trucking smith" should find only those documents that 
have those two words in one value of the resulting array).

I've been going through the wiki and it seems like this is probably a 
super-simple thing, but I'm clearly just not getting it; I just can't figure 
out the right syntax to make this work.

Thanks for any info.

Ron

DISCLAIMER: This electronic message, including any attachments, files or 
documents, is intended only for the addressee and may contain CONFIDENTIAL, 
PROPRIETARY or LEGALLY PRIVILEGED information.  If you are not the intended 
recipient, you are hereby notified that any use, disclosure, copying or 
distribution of this message or any of the information included in or with it 
is  unauthorized and strictly prohibited.  If you have received this message in 
error, please notify the sender immediately by reply e-mail and permanently 
delete and destroy this message and its attachments, along with any copies 
thereof. This message does not create any contractual obligation on behalf of 
the sender or Law Bulletin Publishing Company.
Thank you.


Re: Multi-word searches in multi-valued fields

2011-09-22 Thread Otis Gospodnetic
Ron,

Try "smith trucking"~N where N is a number like 1 or 2 or 3 ... it's called 
phrase slop: http://search-lucene.com/?q=phrase+slop&fc_project=Lucene&fc_project=Solr

Otis


Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/




Re: unique terms and multi-valued fields

2011-08-13 Thread Erick Erickson
Here's a very useful page for looking at what index size means.
http://lucene.apache.org/java/3_0_2/fileformats.html#file-names
Note that the files having to do with stored data (e.g. *.fdt) have very
little impact on searching, they don't consume very many valuable
resources.

The stored="true"-related files *do* have an impact on replication, and
perhaps on assembling the results pages, though.

One bit of clarification about the indexed portion of the files. The
terms are stored once, but each term has the doc IDs associated
with it, so even though the term is only there once, having it appear
in multiple documents will increase the size because of having to
store the document associations.

Best
Erick


Re: unique terms and multi-valued fields

2011-08-11 Thread Kevin Osborn
That makes sense. There are actually stored fields. I was mostly just trying 
to figure out how much my index size might grow. These fields I am dealing with 
are large and repetitive (but mixed).




Re: unique terms and multi-valued fields

2011-08-10 Thread Erick Erickson
Well, it depends (tm).

If you're talking about *indexed* terms, then the value is stored only
once in both the cases you mentioned below. There's really very little
difference between a non-multi-valued field and a multi-valued field
in terms of how it's stored in the searchable portion of the index,
except for some position information.

So, having an XML doc with a single-valued field

<field name="category">computers laptops</field>

is almost identical (except for position info such as positionIncrementGap) to

<field name="category">computers</field>
<field name="category">laptops</field>

multiValued refers to the *input*, not whether more than one word is
allowed in that field.
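
A small Python sketch of the idea, using the numbers from the question. This is a toy inverted index with whitespace tokenization, not Lucene's actual on-disk format, and the function name is made up.

```python
from collections import defaultdict

def build_inverted_index(docs, field):
    """Map each unique term of `field` to the sorted list of doc IDs containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        values = doc[field]
        if isinstance(values, str):      # single-valued input
            values = [values]
        for value in values:             # multi-valued input: several values
            for token in value.split():  # whitespace tokenization
                index[token].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

# 100 docs with computers + laptops, 100 other docs with computers + tablets.
docs = {i: {"category": ["computers", "laptops"]} for i in range(100)}
docs.update({i: {"category": ["computers", "tablets"]} for i in range(100, 200)})
index = build_inverted_index(docs, "category")

# The same terms come out of a single value holding both words.
single = build_inverted_index({0: {"category": "computers laptops"}}, "category")
```

The term dictionary ends up with three entries; what grows with the number of documents is the list of doc IDs attached to each term.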


Now, about *stored* fields. If you store the data, verbatim copies are
kept in the storage-specific files in each segment, and the values will
be on disk for each document.

But you probably don't care much because this data is only referenced when you
assemble a document for return to the client, it's irrelevant for searching.

Best
Erick


Re: PositionIncrement gap and multi-valued fields.

2011-08-09 Thread Marco Martinez
Hi Luis,

As far as I know, the position increment gap only affects some queries,
such as phrase queries that use slop. The position increment gap does
not affect Lucene's similarity scoring formula (the Lucene Practical
Scoring Function, see
http://lucene.apache.org/java/2_9_0/api/core/org/apache/lucene/search/Similarity.html):

score(q,d) = coord(q,d) \cdot queryNorm(q) \cdot \sum_{t \in q} ( tf(t \in d) \cdot idf(t)^2 \cdot t.getBoost() \cdot norm(t,d) )

The first two factors concern coordination and query normalization. In the
summation, the first two factors relate to the frequency of the term
in the document and in the index, the third is the boost of the term in
the query, and the last one encapsulates a few index-time boost and
length factors. The length factor is calculated from the number of
terms, and the position increment gap does not create more tokens, so this
factor does not affect the score either.

But if you use, for example, a multi-valued field with a position increment
gap of 100 and run a query with a slop of less than 100, you prevent
matches that span two separate values of this field, e.g.:

q=test:"A B"~99

doc1 (field test, position increment gap=100):
<str>A</str>
<str>B</str>

You don't get a match for this doc, but with the query q=test:"A
B"~101 you will get doc1 as a match.
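
The example above can be modelled in a few lines of Python. This is a simplified sketch under my own assumptions: positions are assigned by adding the gap between values, and the matcher only handles two terms in order, whereas real sloppy phrase matching in Lucene is more general. The function names are made up.

```python
def token_positions(values, gap):
    """Assign positions to whitespace tokens; add `gap` between values."""
    positions, position = [], -1
    for i, value in enumerate(values):
        if i > 0:
            position += gap          # models positionIncrementGap
        for token in value.split():
            position += 1
            positions.append((token, position))
    return positions

def sloppy_phrase_match(positions, first, second, slop):
    """True if `second` occurs after `first` with at most `slop`
    extra positions between them (in-order only)."""
    firsts = [p for t, p in positions if t == first]
    seconds = [p for t, p in positions if t == second]
    return any(0 <= s - f - 1 <= slop for f in firsts for s in seconds)

doc1 = token_positions(["A", "B"], gap=100)        # two separate values
same_value = token_positions(["A B"], gap=100)     # both tokens in one value

print(sloppy_phrase_match(doc1, "A", "B", slop=99))    # values too far apart
print(sloppy_phrase_match(doc1, "A", "B", slop=101))   # slop bridges the gap
```

So a gap larger than any slop you expect users to send keeps phrase matches inside a single value.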


Bye!


Marco Martínez Bautista
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42


2011/8/8 Luis Cappa Banda luisca...@gmail.com

 Hello!

 I have a question about the behaviour of searching over field types that have
 positionIncrementGap defined. For example, suppose that:


   1. We have a field called test defined as multi-valued and whitespace
   tokenized.
   2. The index has a single document with these test values:

 <str>TEST1</str>
 <str>AAA BBB</str>
 <str>CCC DDD</str>
 <str>EEE FFF</str>
 <str>TEST2</str>


 I read that positionIncrementGap defines the virtual space between the last
 token of one field instance and the first token of the next instance
 (source:
 http://lucene.472066.n3.nabble.com/positionIncrementGap-in-schema-xml-td488338.html
 ).
 When it says "last token of one field instance", does that mean the last
 token of the first entry of the multi-valued content? In our example above it
 would be TEST1.

 Anyway, I've been doing some tests, setting positionIncrementGap to
 high and low values. Can anybody explain in detail what implications
 a higher or lower value has for the Solr scoring algorithm? I
 would like to understand how this value affects matching results in fields
 and also the calculation of the final score (maybe a bigger gap implies more
 spaces and a worse score when the value matches, etc.).

 Thank you for reading so far!



unique terms and multi-valued fields

2011-08-09 Thread Kevin Osborn
Please verify my understanding. I have a field called category and it has a 
value computers. If I use this same field and value for all of my documents, 
it is really only stored on disk once because category:computers is a unique 
term. Is this correct?

But what about multi-valued fields? Say I have a field called category. For 
100 documents, it has the values computers and laptops. For 100 other 
documents, it has the values computers and tablets. Is this stored as 
category:computers, category:laptops, category:tablets, meaning 3 unique 
terms? Or is it stored as category:computers,laptops and 
category:computers,tablets? I believe it is the first case (hopefully), but I 
am not sure.
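
One way to picture the first case is a toy inverted index in Python. This is a sketch under assumptions: it treats category as an untokenized string field, and build_term_index is a made-up helper, not a model of Lucene's on-disk format. Each value of a multi-valued field becomes its own term, and the term is kept once however many documents use it.

```python
from collections import defaultdict


def build_term_index(docs, field):
    """Toy inverted index: (field, value) -> set of document ids."""
    postings = defaultdict(set)
    for doc_id, doc in enumerate(docs):
        for value in doc.get(field, []):
            postings[(field, value)].add(doc_id)
    return postings


docs = (
    [{"category": ["computers", "laptops"]}] * 100
    + [{"category": ["computers", "tablets"]}] * 100
)
index = build_term_index(docs, "category")

print(len(index))                               # 3
print(len(index[("category", "computers")]))    # 200
```

Under this picture the 200 documents share three unique terms, and only the posting lists grow with the number of documents.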

Thanks.

PositionIncrement gap and multi-valued fields.

2011-08-08 Thread Luis Cappa Banda
Hello!

I have a question about how searching behaves on field types that have
positionIncrementGap defined. For example, suppose that:


   1. We have a field called test, defined as multi-valued and whitespace
   tokenized.
   2. The index has a single document with these test values:

<str>TEST1</str>
<str>AAA BBB</str>
<str>CCC DDD</str>
<str>EEE FFF</str>
<str>TEST2</str>


I read that positionIncrementGap defines the virtual space between the last
token of one field instance and the first token of the next instance
(source:
http://lucene.472066.n3.nabble.com/positionIncrementGap-in-schema-xml-td488338.html).
When it says "last token of one field instance", does that mean the last
token of the first entry of the multi-valued content? In the example above
that would be TEST1.

Anyway, I've been running some tests, setting positionIncrementGap to high
and low values. Can anybody explain in detail what effect a higher or lower
value has on the Solr scoring algorithm? I would like to understand how this
value affects matching in fields and also the calculation of the final score
(maybe a larger gap implies more spaces and a worse score when the value
matches, etc.).

Thank you for reading so far!


RE: Joining on multi valued fields

2011-08-04 Thread matthew . fowler
Hi Yonik

So I tested the join using the sample data below and the latest trunk. I still 
got the same behaviour.

HOWEVER! In this case it was nothing to do with the patch or solr version. It 
was the tokeniser splitting G1 into G and 1.
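
A rough Python sketch of why a splitting tokeniser can pull in the extra document. The analyzer here (split_alnum) is a hypothetical stand-in that breaks values on letter/digit boundaries; the real analysis chain is not shown in this thread, so treat this as one plausible reading rather than what the schema actually did.

```python
import re


def split_alnum(value):
    """Hypothetical analyzer: 'G1' -> ['G', '1'], 'G41' -> ['G', '41']."""
    return re.findall(r"[A-Za-z]+|\d+", value)


def keyword(value):
    """Untokenized (string-like) analysis: the whole value is one term."""
    return [value]


def join(from_values, to_docs, analyze):
    """Return names of docs whose 'to' terms overlap the 'from' terms."""
    from_terms = {t for v in from_values for t in analyze(v)}
    return [
        name
        for name, codes in to_docs.items()
        if any(t in from_terms for v in codes for t in analyze(v))
    ]


to_docs = {
    "doc1": ["G1", "G7U", "GK", "ME7", "ME8", "MN", "MR"],
    "doc2": ["G1", "G7U", "GK", "ME7", "ME8", "MN", "MR"],
    "doc3": ["A2", "A5", "A9", "AN", "B125", "B126", "B130",
             "BL63", "G41", "GK", "MZ"],
}

print(join(["G1"], to_docs, split_alnum))  # ['doc1', 'doc2', 'doc3']
print(join(["G1"], to_docs, keyword))      # ['doc1', 'doc2']
```

With the splitting analyzer the term G from G1 also matches the G from G41, so the third document joins; with whole-value terms only the two G1 documents do.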

So thank you for a nice patch and your suggestions.

I do have a couple of questions for you: At what level does the join happen and 
what do you expect the performance penalty to be. We might use this extensively 
if the performance penalty isn't great.

Thanks again,

Matt


Re: Joining on multi valued fields

2011-08-04 Thread Yonik Seeley
On Thu, Aug 4, 2011 at 11:21 AM,  matthew.fow...@thomsonreuters.com wrote:
 Hi Yonik

 So I tested the join using the sample data below and the latest trunk. I 
 still got the same behaviour.

 HOWEVER! In this case it was nothing to do with the patch or solr version. It 
 was the tokeniser splitting G1 into G and 1.

Ah, glad you figured it out!

 So thank you for a nice patch and your suggestions.

 I do have a couple of questions for you: At what level does the join happen 
 and what do you expect the performance penalty to be. We might use this 
 extensively if the performance penalty isn't great.

With the current implementation, the performance is proportional to
the number of unique terms in the fields being joined.

-Yonik
http://www.lucidimagination.com


RE: Joining on multi valued fields

2011-08-03 Thread matthew . fowler
Hi Yonik

Sorry for my late reply. I have been trying to get to the bottom of this
but I'm getting inconsistent behaviour. Here's an example:

Query = pi:rcs100 -   Here going to use pid_rcs as join
value

<result name="response" numFound="1" start="0">
 <doc>
  <str name="pi">rcs100</str>
  <str name="ct">rcs</str>
  <str name="pid_rcs">G1</str>
  <str name="name_rcs">Emerging Market Countries</str>
  <str name="definition_rcs">All business events relating to companies
and other issuers of securities.</str>
  </doc>
  </result>
  </response>

Query = code:G1   -   See how many docs have G1 in their
code field. Notice that code is multi valued 

<result name="response" numFound="2" start="0">
 <doc>
  <str name="ct">cat</str>
  <date name="maindocdate">2011-04-22T05:48:57Z</date>
  <str name="pi">nCIF3wGpXk+1029782</str>
  <arr name="code">
   <str>G1</str>
   <str>G7U</str>
   <str>GK</str>
   <str>ME7</str>
   <str>ME8</str>
   <str>MN</str>
   <str>MR</str>
  </arr>
 </doc>
 <doc>
  <str name="ct">cat</str>
  <date name="maindocdate">2011-04-22T05:48:57Z</date>
  <str name="pi">nCIF7YcLP+1029782</str>
  <arr name="code">
   <str>G1</str>
   <str>G7U</str>
   <str>GK</str>
   <str>ME7</str>
   <str>ME8</str>
   <str>MN</str>
   <str>MR</str>
  </arr>
 </doc>
</result>
</response>

Now for the join: http://10.15.39.137:8983/solr/file/select?q={!join
from=pid_rcs to=code}pi:rcs100

<result name="response" numFound="3" start="0">
 <doc>
  <str name="ct">cat</str>
  <date name="maindocdate">2011-04-22T05:48:57Z</date>
  <str name="pi">nCIF3wGpXk+1029782</str>
  <arr name="code">
   <str>G1</str>
   <str>G7U</str>
   <str>GK</str>
   <str>ME7</str>
   <str>ME8</str>
   <str>MN</str>
   <str>MR</str>
  </arr>
 </doc>
 <doc>
  <str name="ct">cat</str>
  <date name="maindocdate">2011-04-22T05:48:57Z</date>
  <str name="pi">nCIF7YcLP+1029782</str>
  <arr name="code">
   <str>G1</str>
   <str>G7U</str>
   <str>GK</str>
   <str>ME7</str>
   <str>ME8</str>
   <str>MN</str>
   <str>MR</str>
  </arr>
 </doc>
 <doc>
  <str name="ct">cat</str>
  <date name="maindocdate">2011-04-22T05:48:58Z</date>
  <str name="pi">nCN1763203+1029782</str>
  <arr name="code">
   <str>A2</str>
   <str>A5</str>
   <str>A9</str>
   <str>AN</str>
   <str>B125</str>
   <str>B126</str>
   <str>B130</str>
   <str>BL63</str>
   <str>G41</str>
   <str>GK</str>
   <str>MZ</str>
  </arr>
 </doc>
</result>
</response>

So as you can see I get back 3 results when only 2 match the criteria.
i.e. docs where G1 is present in multi valued code field. Why should
the last document be included in the result of the join?

Thank you,

Matt


-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik
Seeley
Sent: 01 August 2011 18:28
To: solr-user@lucene.apache.org
Subject: Re: Joining on multi valued fields

On Mon, Aug 1, 2011 at 12:58 PM,  matthew.fow...@thomsonreuters.com
wrote:
 I have been using the JOIN patch
 https://issues.apache.org/jira/browse/SOLR-2272 with great success.

 However I have hit a case where it doesn't seem to be working. It
 doesn't seem to work when joining to a multi-valued field.

That should work (and the unit tests do test with multi-valued fields).
Can you come up with a simple example where you are not getting the
expected results?

-Yonik
http://www.lucidimagination.com

This email was sent to you by Thomson Reuters, the global news and information 
company. Any views expressed in this message are those of the individual 
sender, except where the sender specifically states them to be the views of 
Thomson Reuters.


Re: Joining on multi valued fields

2011-08-03 Thread Yonik Seeley
Hmmm, if these are real responses from a solr server at rest (i.e.
documents not being changed between queries) then what you show
definitely looks like a bug.
That's interesting, since TestJoin implements a random test that
should cover cases like this pretty well.

I assume you are using a version of trunk (4.0-dev) and not just the
actual attached to the JIRA issue (which IIRC had at least one bug...
SOLR-2521).
Have you tried a more recent version of trunk?

-Yonik
http://www.lucidimagination.com






RE: Joining on multi valued fields

2011-08-03 Thread matthew . fowler
No I haven't. I will get the latest out of the trunk and report back.

Cheers again,

Matt



Joining on multi valued fields

2011-08-01 Thread matthew . fowler
Hi List

 

I have been using the JOIN patch
https://issues.apache.org/jira/browse/SOLR-2272 with great success.

 

However I have hit a case where it doesn't seem to be working. It
doesn't seem to work when joining to a multi-valued field.

 

Has anyone any experience using the patch to join on multi-valued
fields?

 

Thanks,

 

Matt



Re: Joining on multi valued fields

2011-08-01 Thread Yonik Seeley
On Mon, Aug 1, 2011 at 12:58 PM,  matthew.fow...@thomsonreuters.com wrote:
 I have been using the JOIN patch
 https://issues.apache.org/jira/browse/SOLR-2272 with great success.

 However I have hit a case where it doesn't seem to be working. It
 doesn't seem to work when joining to a multi-valued field.

That should work (and the unit tests do test with multi-valued fields).
Can you come up with a simple example where you are not getting the
expected results?

-Yonik
http://www.lucidimagination.com


Re: StatsComponent and multi-valued fields

2010-10-12 Thread Peter Karich
I'm not sure ... just reading it yesterday night ...
but isn't the unapplied patch from Harish
https://issues.apache.org/jira/secure/attachment/12400054/SOLR-680.patch
what you want?

Regards,
Peter.

 Running 1.4.1.

 I'm able to execute stats queries against multi-valued fields, but when
 given a facet, the statscomponent only considers documents that have a facet
 value as the last value in the field.

 As an example, imagine you are running stats on fooCount, and you want to
 facet on bar, which is multi-valued.  Two documents...

 1)
 fooCount = 100
 bar = A, B, C

 2) 
 fooCount = 5
 bar = C, B, A

 stats.field=fooCount&stats=true&stats.facet=bar

 I would expect to see stats for A, B, and C all with sums of 105.  But what
 I'm seeing is stats for C and A with sums of 100 and 5 respectively.

 Is this expected behavior?  Something I'm possibly doing wrong?  Is this
 just not advisable?

 Thanks!

   


-- 
http://jetwick.com twitter search prototype



Re: StatsComponent and multi-valued fields

2010-10-11 Thread Chris Hostetter

: I'm able to execute stats queries against multi-valued fields, but when
: given a facet, the statscomponent only considers documents that have a facet
: value as the last value in the field.
: 
: As an example, imagine you are running stats on fooCount, and you want to
: facet on bar, which is multi-valued.  Two documents...

It's a known bug ... StatsComponent's Faceted Stats make some really 
gross assumptions about the Field...

https://issues.apache.org/jira/browse/SOLR-1782
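
For illustration only, here is a small Python sketch contrasting the sums the original poster expected with the sums he reported. The faceted_sums helper and its last_value_only switch are made up; they mimic the symptom in the report and are not StatsComponent's actual code.

```python
from collections import defaultdict


def faceted_sums(docs, stat_field, facet_field, last_value_only=False):
    """Sum stat_field per facet value; optionally use only the last value."""
    sums = defaultdict(int)
    for doc in docs:
        values = doc[facet_field]
        if last_value_only:
            values = values[-1:]  # mimics the reported behaviour
        for value in values:
            sums[value] += doc[stat_field]
    return dict(sums)


docs = [
    {"fooCount": 100, "bar": ["A", "B", "C"]},
    {"fooCount": 5, "bar": ["C", "B", "A"]},
]

print(faceted_sums(docs, "fooCount", "bar"))
# {'A': 105, 'B': 105, 'C': 105}
print(faceted_sums(docs, "fooCount", "bar", last_value_only=True))
# {'C': 100, 'A': 5}
```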

-Hoss


StatsComponent and multi-valued fields

2010-10-06 Thread dbashford

Running 1.4.1.

I'm able to execute stats queries against multi-valued fields, but when
given a facet, the statscomponent only considers documents that have a facet
value as the last value in the field.

As an example, imagine you are running stats on fooCount, and you want to
facet on bar, which is multi-valued.  Two documents...

1)
fooCount = 100
bar = A, B, C

2) 
fooCount = 5
bar = C, B, A

stats.field=fooCount&stats=true&stats.facet=bar

I would expect to see stats for A, B, and C all with sums of 105.  But what
I'm seeing is stats for C and A with sums of 100 and 5 respectively.

Is this expected behavior?  Something I'm possibly doing wrong?  Is this
just not advisable?

Thanks!

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/StatsComponent-and-multi-valued-fields-tp1644918p1644918.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Scoring on multi-valued fields

2010-08-06 Thread Chris Hostetter

: The other would be to somehow control the scores of each id. So a document
: with 2 ids matching should be worth more than the document with only 1 id
: matching (This is how it works now) but a document with 7 ids matching
: shouldn't be worth more, or at least not a lot more, than a document that
: matches only 3 ids (this is not how it works). 

this is all driven by the coord factor of the outermost BooleanQuery ... 
you can provide a custom Similarity class that generates different values 
based on the field/number of clauses, or if you are already generating the 
BooleanQuery via custom code (ie: your own QParser or what not) you can 
override the Similarity there.

: The reason this would be ideal for us is that we don't have any control over
: how many ids will be in the query and we don't want documents that have lots
: of ids to have an unnatural advantage over those with just a few.

If you put 'omitNorms=false' on the field in question, then the length 
normalization (which rewards shorter documents) should help offset this -- 
no custom code required.
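
As a rough sketch of the coord idea in Python: the default coord factor is, as far as I know, just overlap / maxOverlap, and a custom Similarity could flatten it. The capped_coord variant and its cap parameter are hypothetical, one possible shape for such an override and not an existing Solr option. Note that it only changes the coord multiplier; the per-clause scores are still summed.

```python
def default_coord(overlap, max_overlap):
    """Fraction of the query's optional clauses that matched."""
    return overlap / max_overlap


def capped_coord(overlap, max_overlap, cap=3):
    """Hypothetical variant: matches beyond `cap` clauses earn nothing extra."""
    return min(overlap, cap) / min(max_overlap, cap)


for matched in (1, 3, 7):
    print(matched, default_coord(matched, 7), capped_coord(matched, 7))
```

With the cap at 3, a document matching 3 ids and one matching 7 ids get the same coord multiplier, while a document matching a single id still gets less.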

-Hoss



Re: Scoring on multi-valued fields

2010-08-03 Thread oleg.gnatovskiy

I checked the explain query.

What happens is that the sums of all the hits on ID are added up. Is there a
way to only grab the first score?

Thanks.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Scoring-on-multi-valued-fields-tp1017624p1020150.html


Re: Scoring on multi-valued fields

2010-08-03 Thread oleg.gnatovskiy

Oh sorry guys, I didn't correctly submit my original post to the mailing
list. The original message was this:

Hello all. We are having some trouble with queries similar to the type shown
below:

name: pizza OR (id:10 OR id:20 OR id:30) (id is a multi-valued field)

With the above query, we will always get documents with pizza in the name,
and any document with id values of 10, 20, and 30 will always come up first.
What we would like is to have a document with only id 10 to be weighted the
same as a document with ids 10, 20, and 30.

Is this possible with Lucene/Solr?

Thanks in advance for any assistance you might be able to offer. 

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Scoring-on-multi-valued-fields-tp1017624p1020181.html


Re: Scoring on multi-valued fields

2010-08-03 Thread Yonik Seeley
On Tue, Aug 3, 2010 at 2:42 PM, oleg.gnatovskiy crooke...@gmail.com wrote:

 Oh sorry guys, I didn't correctly submit my original post to the mailing
 list. The original message was this:
 
 Hello all. We are having some trouble with queries similar to the type shown
 below:

 name: pizza OR (id:10 OR id:20 OR id:30) (id is a multi-valued field)

 With the above query, we will always get documents with pizza in the name,
 and any document with id values of 10, 20, and 30 will always come up first.
 What we would like is to have a document with only id 10 to be weighted the
 same as a document with ids 10, 20, and 30.

How do you want pizza weighted against 10, 20, or 30?
If pizza can always come first, you can boost the second clause to zero:
pizza OR (id:10 OR id:20 OR id:30)^0

 What happens is that the sums of all the hits on ID are added up. Is there a
 way to only grab the first score?

There is a way to grab only the highest score from a set of options
(DisjunctionMaxQuery) but unfortunately there is no general query
parser syntax to support that yet.
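
To make the difference concrete, here is a minimal Python sketch of the two ways of combining clause scores. It ignores coord and query normalisation, and the clause scores are invented numbers; only the max-plus-tie-breaker shape is meant to reflect how DisjunctionMaxQuery combines its sub-scores.

```python
def boolean_or_score(clause_scores):
    """Plain OR: the scores of all matching clauses are added up."""
    return sum(clause_scores)


def dismax_score(clause_scores, tie_breaker=0.0):
    """Disjunction-max: best clause wins, others count only via tie_breaker."""
    best = max(clause_scores)
    return best + tie_breaker * (sum(clause_scores) - best)


one_id = [0.5]               # document matching only id:10
three_ids = [0.5, 0.5, 0.5]  # document matching id:10, id:20 and id:30

print(boolean_or_score(one_id), boolean_or_score(three_ids))  # 0.5 1.5
print(dismax_score(one_id), dismax_score(three_ids))          # 0.5 0.5
```

With a tie_breaker of 0 the document with one matching id and the document with three score the same, which is the "only grab the first score" behaviour asked about.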

-Yonik
http://www.lucidimagination.com


Re: Scoring on multi-valued fields

2010-08-03 Thread oleg.gnatovskiy

Sorry guess I messed up my example query. The query should look like this:

name:pizza AND id:(10 OR 20 OR 30) 

Thus if I do name:pizza^10 AND id:(10 OR 20 OR 30)^0 wouldn't a document
that has all the ids (10, 20, and 30) still come up higher than a document
that has just one?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Scoring-on-multi-valued-fields-tp1017624p1020234.html


Re: Scoring on multi-valued fields

2010-08-03 Thread Yonik Seeley
On Tue, Aug 3, 2010 at 3:16 PM, oleg.gnatovskiy crooke...@gmail.com wrote:

 Sorry guess I messed up my example query. The query should look like this:

 name:pizza AND id:(10 OR 20 OR 30)

 Thus if I do name:pizza^10 AND id:(10 OR 20 OR 30)^0 wouldn't a document
 that has all the ids (10, 20, and 30) still come up higher than a document
 that has just one?

No, because the whole id:(10 OR 20 OR 30)^0 clause will contribute 0
to the final score.
Another way to get the same effect would be to pull it out as a filter:
q=name:pizza&fq=id:(10 OR 20 OR 30)
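
A toy Python model of the zero-boost case, under loud assumptions: every matching clause is given a flat score of 1.0 (real scores depend on tf-idf, norms and so on), and toy_score is a made-up function, not anything in Solr.

```python
WANTED_IDS = {10, 20, 30}


def toy_score(doc, name_boost=10.0, id_boost=0.0):
    """Both clauses are required; each id hit contributes id_boost * 1.0."""
    if "pizza" not in doc["name"]:
        return None  # name clause failed, document is not a hit
    hits = [i for i in doc["ids"] if i in WANTED_IDS]
    if not hits:
        return None  # id clause failed, document is not a hit
    return name_boost * 1.0 + id_boost * len(hits)


one_id = {"name": "pizza place", "ids": [10]}
all_ids = {"name": "pizza place", "ids": [10, 20, 30]}

print(toy_score(one_id), toy_score(all_ids))                # 10.0 10.0
print(toy_score(one_id, id_boost=1.0),
      toy_score(all_ids, id_boost=1.0))                     # 11.0 13.0
```

With the boost at zero the id clause still restricts which documents match, but it no longer separates them in the ranking, which is the same effect the filter query gives.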

-Yonik
http://www.lucidimagination.com


Re: Scoring on multi-valued fields

2010-08-03 Thread oleg.gnatovskiy

Well that does take care of some cases.

How about if we still want a hit on a tag to contribute to the weight
though? 

There would be 2 options. One is the one I described in the original post,
which is to grab the highest score of a set of ids.

The other would be to somehow control the scores of each id. So a document
with 2 ids matching should be worth more than the document with only 1 id
matching (This is how it works now) but a document with 7 ids matching
shouldn't be worth more, or at least not a lot more, than a document that
matches only 3 ids (this is not how it works). 

The reason this would be ideal for us is that we don't have any control over
how many ids will be in the query and we don't want documents that have lots
of ids to have an unnatural advantage over those with just a few.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Scoring-on-multi-valued-fields-tp1017624p1020504.html


Re: Multi valued fields

2010-03-18 Thread Chris Hostetter

: Can I build a query such as : 
: 
:   -field: A
: 
: which will return all documents that do not have exclusive A in 
: their field's values. By exclusive I mean that I don't want documents 
: that only have A in their list of values. In my sample case, the query 
: would return doc A and B, because they both have other values in field1.

the most straight forward way i know of to deal with requirements like 
this is to also have a field_count field where you record the number of 
values indexed into field ... an UpdateProcessor can automate creating 
this field for you, and then you can query for something like...

-(+field:A +field_count:1)
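
Here is a small Python sketch of that idea using the three sample documents from the question. The add_field_count helper stands in for whatever an UpdateProcessor would do at index time; its name and the dict layout are assumptions made for the example.

```python
docs = {
    "A": ["A", "B", "C", "D"],
    "B": ["A", "B"],
    "C": ["A"],
}


def add_field_count(values):
    """Index-time step: record how many values the field holds."""
    return {"field": values, "field_count": len(values)}


def excluded(doc):
    """Equivalent of the inner clause (+field:A +field_count:1)."""
    return "A" in doc["field"] and doc["field_count"] == 1


indexed = {name: add_field_count(values) for name, values in docs.items()}
result = [name for name, doc in indexed.items() if not excluded(doc)]
print(result)  # ['A', 'B']
```

Only document C is dropped, since it is the only one whose single value is A.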


-Hoss



Re: Multi valued fields

2010-03-14 Thread Lance Norskog
This could be done with a function query, except that the function I
would use does not exist.  There is no function that returns the
number of values that exist for a field. If there were, you could say:

-field:A OR (field:A and function() > 1)

I don't know the Lucene data structures well, but I suspect this would
be incredibly expensive to calculate.

On 3/11/10, Jean-Sebastien Vachon js.vac...@videotron.ca wrote:
 Hi All,

 I'd like to know if it is possible to do the following on a multi-value
 field:

 Given the following data:

 document A:  field1   = [ A B C D]
 document B:  field 1  = [A B]
 document C:  field 1  = [A]

 Can I build a query such as :

   -field: A

 which will return all documents that do not have exclusive A in their
 field's values. By exclusive I mean that I don't want documents that only
 have A in their list of values. In my sample case, the query would return
 doc A and B, because they both have other values in field1.

 Is this kind of query possible with Solr/Lucene?

 Thanks






-- 
Lance Norskog
goks...@gmail.com


Multi valued fields

2010-03-11 Thread Jean-Sebastien Vachon
Hi All,

I'd like to know if it is possible to do the following on a multi-value field:

Given the following data:

document A:  field1   = [ A B C D]
document B:  field 1  = [A B]
document C:  field 1  = [A]

Can I build a query such as : 

-field: A

which will return all documents that do not have exclusive A in their 
field's values. By exclusive I mean that I don't want documents that only have 
A in their list of values. In my sample case, the query would return doc A and 
B, because they both have other values in field1.

Is this kind of query possible with Solr/Lucene?

Thanks





Re: How can I get Solr-Cell to extract to multi-valued fields?

2010-03-02 Thread Lance Norskog
It is a bug. I just filed this. It is just a unit test that displays
the behavior.

http://issues.apache.org/jira/browse/SOLR-1803
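
For reference, the behaviour the report quoted below expects (every meta tag with the same name contributing one value to the multi-valued field) can be sketched with Python's standard `html.parser`. This only illustrates the expected mapping; it is not how Solr Cell or Tika is implemented, and the `MetaCollector` class is a made-up name.

```python
from collections import defaultdict
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collects the content of every <meta name=...> tag, keeping repeats."""

    def __init__(self):
        super().__init__()
        self.meta = defaultdict(list)

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if "name" in attributes and "content" in attributes:
            # Append instead of overwrite, so earlier values are not lost.
            self.meta[attributes["name"]].append(attributes["content"])


html = """<html><head>
<meta name="product" content="firstproduct" />
<meta name="product" content="anotherproduct" />
<meta name="product" content="andanotherproduct" />
</head><body></body></html>"""

collector = MetaCollector()
collector.feed(html)
products = collector.meta["product"]
print(products)
```

The symptom in the report (only the last value surviving) is what an overwrite instead of an append would produce.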

On Tue, Mar 2, 2010 at 9:07 AM, Mark Roberts mark.robe...@red-gate.com wrote:
> Hi,
>
> I have a schema with a multivalued field like so:
>
> <field name="product" type="string" indexed="true" stored="true"
> multiValued="true"/>
>
> I am uploading html documents to the Solr extraction handler which contain
> meta in the head, like so:
>
> <meta name="product" content="firstproduct" />
> <meta name="product" content="anotherproduct" />
> <meta name="product" content="andanotherproduct" />
>
> I want the extraction handler to map each of these pieces of meta onto the
> product field, however, there seems to be a problem - only the last item
> andanotherproduct is mapped, the first seem to be ignored.
>
> It does work, however, if I pass the values as literals in the query string
> (e.g.
> literal.product=firstproduct&literal.product=anotherproduct&literal.product=andanotherproduct)
>
> I've tried the release version 1.4 of solr and a recent nightly build of 1.5
> and neither work.
>
> Is this a bug in Solr-cell or am I doing something wrong?
>
> Many thanks,
> Mark.




-- 
Lance Norskog
goks...@gmail.com


Re: complex multi valued fields

2010-01-18 Thread Shalin Shekhar Mangar
On Tue, Jan 12, 2010 at 7:55 PM, Adamsky, Robert radam...@techtarget.com wrote:

> I have a document that has a multi-valued field where each value in
> the field itself is comprised of two values itself.  Think of an invoice
> doc with multi value line items - each line item having quantity and
> product name.
>
> One option I see is to have a line item multi value field and when
> producing the document to pass to Solr, concat the quantity and desc and
> put it in the multi value field.
>
> My preference would be the ability to define such complex multi valued
> fields out of the box.  Is that supported in a Solr 1.4 environment?
> Basically a field type that allows you to define the other fields that
> make up a field.
>
> This could look like something like this in schema.xml if supported:
>
> <field type="complex" name="lineitem">
>   <field type="integer" name="quantity"/>
>   <field type="text" name="description"/>
> </field>


Well, no, at least not with Solr 1.4. This is a new feature being added to
Solr and it is already in trunk. The existing poly field does not let you
have different types for the individual items but you should be able to
write your own poly field for this task.

-- 
Regards,
Shalin Shekhar Mangar.


complex multi valued fields

2010-01-12 Thread Adamsky, Robert

I have a document that has a multi-valued field where each value in
the field is itself comprised of two values.  Think of an invoice doc
with multi-value line items - each line item having quantity and product name.

One option I see is to have a line item multi-value field and, when producing
the document to pass to Solr, concat the quantity and desc and put it in the
multi-value field.
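
A minimal sketch of that concatenation option, in Python for brevity. The `|` delimiter, the helper names and the sample line items are all assumptions made for illustration; any separator that cannot occur in the quantity would do.

```python
SEP = "|"  # hypothetical delimiter between quantity and description


def encode_line_item(quantity, description):
    """Pack one (quantity, description) pair into a single field value."""
    return f"{quantity}{SEP}{description}"


def decode_line_item(value):
    """Split a stored value back into its two parts."""
    quantity, description = value.split(SEP, 1)
    return int(quantity), description


line_items = [(2, "widget"), (10, "left-handed screwdriver")]

# The document as it might be handed to the indexer: one multi-valued field.
doc = {
    "id": "invoice-1",
    "lineitem": [encode_line_item(q, d) for q, d in line_items],
}

decoded = [decode_line_item(v) for v in doc["lineitem"]]
print(doc["lineitem"])
```

The trade-off is that the pair is only recoverable on the client side after retrieval; the index itself sees one opaque value per line item.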

My preference would be the ability to define such complex multi valued fields
out of the box.  Is that supported in a Solr 1.4 environment?  Basically a field
type that allows you to define the other fields that make up a field.

This could look like something like this in schema.xml if supported:

<field type="complex" name="lineitem">
  <field type="integer" name="quantity"/>
  <field type="text" name="description"/>
</field>



faceting/searching on multi-valued fields

2009-08-11 Thread AHMET ARSLAN
I have two parallel multivalued fields for holding key-value pairs for each 
document.

<doc>
 <arr name="value">
   <str>red</str>
   <str>other</str>
   <str>VS</str>
   <str>10 cm.</str>
   <str>50 GB</str>
   ...
 </arr>
 <arr name="key">
   <str>Color</str>
   <str>Type</str>
   <str>Brand</str>
   <str>Size</str>
   <str>RAM</str>

 </arr>
</doc>

There are about 300 different keys. New key values can be created dynamically, 
and some key values are meaningless for some documents. I want to facet over 
these multivalued fields. Something like:

RAM:
50 GB (5)
40 GB (2)

Brand:
XX (10)
VS (1)

color:
red (9)
blue (2)

Is there a way to create this kind of faceting over multi-valued fields?

And about querying multi-valued fields:
Let's say I want to get blue coloured documents.
The query '+key:color +value:blue' would return these two docs:

<doc1>
 <arr name="value">
   <str>blue</str>
   ...
 </arr>
 <arr name="key">
   <str>color</str>
   ...
 </arr>
</doc1>

<doc2>
 <arr name="value">
   <str>red</str>
   <str>blue</str>
   ...
 </arr>
 <arr name="key">
   <str>color</str>
   <str>backcolor</str>
   ...
 </arr>
</doc2>

Is there a way to get only doc1 for that kind of query, by
specifying/preserving index/position info for the multivalued fields?
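
The reason both documents match can be shown with a small Python model (not Solr code; the function names are made up for this sketch). Matching each field independently ignores which key belongs to which value, while a position-aware check only accepts a key and value that sit at the same index.

```python
doc1 = {"key": ["color"], "value": ["blue"]}
doc2 = {"key": ["color", "backcolor"], "value": ["red", "blue"]}
docs = {"doc1": doc1, "doc2": doc2}


def matches_unpaired(doc, key, value):
    """What '+key:color +value:blue' checks: each field on its own."""
    return key in doc["key"] and value in doc["value"]


def matches_paired(doc, key, value):
    """What the question asks for: key and value at the same position."""
    return (key, value) in zip(doc["key"], doc["value"])


unpaired = [name for name, d in docs.items() if matches_unpaired(d, "color", "blue")]
paired = [name for name, d in docs.items() if matches_paired(d, "color", "blue")]
print(unpaired, paired)
```

In doc2 the value blue belongs to backcolor, so only the paired check rejects it.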

Let's say I have a dynamic field defined as 
<dynamicField name="*" type="string" />
and I add docs whose fields are the elements of those multivalued fields:

<add>
<doc>
  <field name="color">blue</field>
  <field name="brand">vs</field>
</doc>
</add>

Can I use those fields at query time, although they are not defined in 
schema.xml?

'q=color:blue' and in faceting 'facet.field=color'

I'd appreciate any help and pointers.


  


Re: faceting/searching on multi-valued fields

2009-08-11 Thread Avlesh Singh

> Let's say I have a dynamic field defined as <dynamicField name="*"
> type="string" />
> Can I use those fields at query time, although they are not defined in
> schema.xml?


Yes. Though I am not sure whether you can create a dynamic field without a
prefix or suffix with the wild-card. I would rather suggest naming this
field attribute_* and using fields like attribute_color, attribute_ram
etc. during indexing and searching.
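
As a rough illustration of that suggestion, the Python sketch below turns parallel key/value lists into one attribute_* field per key and then counts values per field, which is the shape of result that faceting on facet.field=attribute_color would give. The helper names and sample documents are assumptions for this example, not part of Solr.

```python
from collections import Counter


def to_attribute_fields(keys, values, prefix="attribute_"):
    """Map parallel key/value lists to one field per key, e.g. attribute_color."""
    return {prefix + key.lower(): value for key, value in zip(keys, values)}


raw_docs = [
    (["Color", "Brand", "RAM"], ["red", "VS", "50 GB"]),
    (["Color", "RAM"], ["blue", "50 GB"]),
    (["Color", "Brand"], ["red", "XX"]),
]
docs = [to_attribute_fields(keys, values) for keys, values in raw_docs]


def facet(documents, field):
    """Count how many documents carry each value of the given field."""
    return Counter(d[field] for d in documents if field in d)


color_facet = facet(docs, "attribute_color")
ram_facet = facet(docs, "attribute_ram")
print(color_facet, ram_facet)
```

Because each key gets its own field, a query such as attribute_color:blue can no longer match a document where blue belongs to a different key.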

This thread might be useful -
http://www.lucidimagination.com/search/document/45897f3dfa3ca260/storing_key_value_pair_in_solr_document

Cheers
Avlesh






RE: Boosting ('bq') on multi-valued fields

2009-07-30 Thread Ensdorf Ken

> Hey Ken,
> Thanks for your reply.
> When I wrote '5|6' I meant that this is a multiValued field with two
> values '5' and '6', rather than the literal string '5|6' (and any
> Tokenizer). Does your reply still hold? That is, are multiValued fields
> dependent on the notion of tokenization to such a degree that I can't
> use str type with them meaningfully? If so, it seems weird to me that I
> should be able to define a str multiValued field to begin with.

I'm pretty sure you can use multiValued string fields in the way you are 
describing.  If you just do a query without the boost, do documents with 
multiple values come back?  That would at least tell you whether the problem 
is matching on the term itself or something to do with your use of boosts.

-Ken


Boosting ('bq') on multi-valued fields

2009-07-29 Thread KaktuChakarabati

Hey,
I have a field defined as such:

 <field name="site_id" type="string" indexed="true" stored="false"
multiValued="true" />

with the string type defined as:

<fieldtype name="string" class="solr.StrField" sortMissingLast="true"
omitNorms="true"/>

When I try using some query-time boost parameters using the bq on values of
this field, it seems to behave strangely in the case of documents actually
having multiple values: if I do a boost for a particular value
( site_id:5^1.1 ), it seems like all the cases where this field is actually
populated with multiple ones ( i.e. a document with field value 5|6 ) do
not get boosted at all. I verified this using
debugQuery & explainOther=doc_id:document_with_multiple_values.
Is this a known issue/bug? Any workarounds? (I'm using a nightly Solr build
from a few months back.)

Thanks,
-Chak
-- 
View this message in context: 
http://www.nabble.com/Boosting-%28%27bq%27%29-on-multi-valued-fields-tp24713905p24713905.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Boosting ('bq') on multi-valued fields

2009-07-29 Thread Ensdorf Ken

There is no tokenization on 'string' fields, so a query for 5 does not match 
a doc with a value of 5|6 for this field.  You could try using field type 
'text' for this and see what you get.  You may need to customize it to use the 
StandardAnalyzer or WordDelimiterFilterFactory to get the right behavior.  
Using the analysis tool in the Solr admin UI to experiment will probably be 
helpful.
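
The distinction that matters here can be sketched in a few lines of Python (a toy model of exact-match string fields, not Solr code): a single untokenized value "5|6" is one term, whereas a multi-valued field holding "5" and "6" contributes two separate terms.

```python
def string_field_matches(values, term):
    """Untokenized (string) field: a term matches only a whole stored value."""
    return term in values


single_value = ["5|6"]      # one literal string containing a pipe
multi_valued = ["5", "6"]   # a multiValued field with two values

single_hit = string_field_matches(single_value, "5")
multi_hit = string_field_matches(multi_valued, "5")
print(single_hit, multi_hit)
```

Which of the two cases applies depends on how the document was actually indexed, which is what the follow-up in this thread clarifies.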

-Ken




RE: Boosting ('bq') on multi-valued fields

2009-07-29 Thread KaktuChakarabati

Hey Ken,
Thanks for your reply.
When I wrote '5|6' I meant that this is a multiValued field with two values
'5' and '6', rather than the literal string '5|6' (and any Tokenizer). Does
your reply still hold? That is, are multiValued fields dependent on the
notion of tokenization to such a degree that I can't use str type with
them meaningfully? If so, it seems weird to me that I should be able to
define a str multiValued field to begin with.

-Chak





Re: Multi-valued fields with DIH

2009-04-04 Thread ashokc

That worked. Thanks again.




Multi-valued fields with DIH

2009-04-03 Thread ashokc

Hi,
I need to assign multiple values to a field, with each value coming from a
different column of the sql query.

My data config snippet has lines like

<field column="project_area" name="projects" />
<field column="project_version" name="projects" />

where 'project_area' & 'project_version' are output by the sql query to the
datasource. The 'verbose-output' from dataimport.jsp does show that these
columns have values returned by the query

===

<lst name="verbose-output">
<lst name="entity:log">
<lst name="document#1">
<str name="query">
x
</str>
<str name="time-taken">0:0:0.142</str>
<str>--- row #1-</str>
<str name="PROJECT_AREA">MySource/Area/Admin</str>
<str name="PROJECT_VERSION">MySource/Version/06.02</str>
<date name="LAST_MODIFIED_DATE">2008-10-21T07:00:00Z</date>
.

==

But the resulting index has no data in the field 'projects'. Is it NOT
possible to create multi-valued fields with DIH?

Thanks



Re: Multi-valued fields with DIH

2009-04-03 Thread Noble Paul നോബിള്‍ नोब्ळ्
The column names are case sensitive. Try this:

<field column="PROJECT_AREA" name="projects" />
<field column="PROJECT_VERSION" name="projects" />
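
A small Python model of why the case matters (this is an illustration, not DIH code; `map_row` is a made-up helper). The row returned by the query is keyed by the column names exactly as the database reports them, so a lower-case lookup finds nothing, while the upper-case mapping collects both columns into one multi-valued field.

```python
# Row as shown in the verbose output: upper-case column names.
row = {
    "PROJECT_AREA": "MySource/Area/Admin",
    "PROJECT_VERSION": "MySource/Version/06.02",
}


def map_row(row, mappings):
    """Copy each mapped column into its target field, appending repeats."""
    doc = {}
    for column, field_name in mappings:
        if column in row:  # exact, case-sensitive lookup
            doc.setdefault(field_name, []).append(row[column])
    return doc


lower_case = map_row(row, [("project_area", "projects"),
                           ("project_version", "projects")])
upper_case = map_row(row, [("PROJECT_AREA", "projects"),
                           ("PROJECT_VERSION", "projects")])
print(lower_case, upper_case)
```
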




-- 
--Noble Paul


Re: Facet search on Multi-Valued Fields

2009-02-19 Thread Chris Hostetter

: I am trying to do facet search on 3 fields (all multivalued fields) in one
: query. field1 has 2 million distinct values, field2 has 1.5 million distinct
: values, field3 has 50,000 distinct values.
: 
: I already set the filterCache to 3,000,000, But the searching speed is still

making your filterCache bigger by some arbitrary amount won't magically make 
facet searching on multivalued fields faster.   Making your filterCache 
big enough that all of the filters generated by multi-valued field 
faceting can be cached will help.

   3,000,000 < 2,000,000 + 1,500,000 + 50,000

...so all you've really done is use up a lot of ram, because you've 
essentially guaranteed solr will never get a cache hit as it iterates 
through all the facet terms because of the LRU caching.  (in fact: you'll 
probably have even noticed your queries slow down with a cache size that 
big because of the added garbage collection)
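
The "never get a cache hit" effect is a general property of LRU caches that are scanned cyclically with more keys than they can hold. The Python sketch below is a toy LRU, not Solr's filterCache, and uses tiny numbers in place of the millions of facet terms; the first line just repeats the arithmetic for the three fields in question.

```python
from collections import OrderedDict

# Distinct values across the three facet fields described in the question.
needed = 2_000_000 + 1_500_000 + 50_000


def lru_hits(cache_size, keys, passes):
    """Count cache hits when the same keys are scanned in order, repeatedly."""
    cache = OrderedDict()
    hits = 0
    for _ in range(passes):
        for key in keys:
            if key in cache:
                hits += 1
                cache.move_to_end(key)          # mark as most recently used
            else:
                cache[key] = True
                if len(cache) > cache_size:
                    cache.popitem(last=False)   # evict least recently used
    return hits


too_small = lru_hits(cache_size=3, keys=range(5), passes=2)
big_enough = lru_hits(cache_size=5, keys=range(5), passes=2)
print(needed, too_small, big_enough)
```

With the undersized cache every entry is evicted before the scan comes back to it, so the second pass misses on every key.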

: Is there anyway to optimize the faceted search?  Every help is appreciated.
: Thanks in advanced.

as mentioned, a lot of optimizations have already been made in the trunk.  
you can also improve things by reducing the number of unique values in 
those facet fields (stop words perhaps?) or by making your filterCache 
big enough that you don't see any evictions on the stats page when issuing 
multiple queries.



-Hoss



Facet search on Multi-Valued Fields

2009-02-17 Thread Wang Guangchen
Hi all,
I have been experimenting with Solr faceted search for 2 weeks, but I have hit
a performance limitation on faceted search.
My Solr index contains 4,000,000 documents. Normal searching is fairly fast, but
faceted search is extremely slow.

I am trying to do facet search on 3 fields (all multivalued fields) in one
query. field1 has 2 million distinct values, field2 has 1.5 million distinct
values, field3 has 50,000 distinct values.

I already set the filterCache to 3,000,000, but the searching speed is still
very slow. Normally each query takes 5 mins or more.  As I narrow down
the search, the speed increases dramatically.

Is there any way to optimize the faceted search?  Any help is appreciated.
Thanks in advance.



Regards

GC


Re: Facet search on Multi-Valued Fields

2009-02-17 Thread Marc Sturlese

Have you tried a nightly build with the new facet algorithm (it is
activated by default)?
http://www.nabble.com/new-faceting-algorithm-td20674902.html





Re: Facet search on Multi-Valued Fields

2009-02-17 Thread Wang Guangchen
Nope, I am using the latest stable version, Solr 1.3.0.

Thanks for your tips.

Besides this, is there anything else I should do?  I am reading some
previous threads about index optimization
(http://www.mail-archive.com/solr-user@lucene.apache.org/msg05290.html). Will
it improve the facet search speed?

GC







Re: Facet search on Multi-Valued Fields

2009-02-17 Thread Marc Sturlese

Well, doing an optimization after you do indexing will always improve your
search speed a little bit. But with the new facet algorithm you will notice a
huge improvement ...
Other things to consider: index and store only the necessary fields, and use
omitNorms whenever possible... there are many tips around... keep
reading ;) 





Re: Facet search on Multi-Valued Fields

2009-02-17 Thread Wang Guangchen
Thank you very much.





Re: Different XML format for multi-valued fields?

2008-10-17 Thread Chris Hostetter
: the multi-value field has only one value for a document, the XML returned
: looks like this: 
: <arr name="someIds">
: <long name="someIds">5693</long>
: </arr>

I think you are mistaken.  it will either look like this...

   <long name="someIds">5693</long>

...or it will look like this...

   <arr name="someIds">
     <long>5693</long>
   </arr>

...depending on what the value of the version param is in your request, 
but it won't redundantly output "someIds"

  version=2.0 ... no <arr> value is used if there is only one value
  version=2.1 ... <arr> for all multivalue fields, even if only one value
  version=2.2 ... responseHeader format changed to standard <lst> tag
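
A client that has to cope with either shape can normalise both to a list. The Python sketch below uses the standard `xml.etree.ElementTree` module on two hand-written `<doc>` fragments modelled on the examples above; `field_values` is a hypothetical helper, not part of any Solr client library.

```python
import xml.etree.ElementTree as ET


def field_values(doc_xml, name):
    """Return the values of a field as a list, whichever form the XML uses."""
    doc = ET.fromstring(doc_xml)
    for child in doc:
        if child.get("name") != name:
            continue
        if child.tag == "arr":
            # Multi-valued form: one child element per value.
            return [element.text for element in child]
        # Single-valued form: the element itself carries the value.
        return [child.text]
    return []


bare_form = '<doc><long name="someIds">5693</long></doc>'
array_form = '<doc><arr name="someIds"><long>5693</long></arr></doc>'

from_bare = field_values(bare_form, "someIds")
from_array = field_values(array_form, "someIds")
print(from_bare, from_array)
```
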

: Is there a reason for this difference? Also, how does faceting work with
: multi-valued fields? It seems that I sometimes get facet results from
: multi-valued fields, and sometimes I don't.

i'm not sure i understand what exactly your question is ... you need to 
give us more info to go on (ie: what the field and fieldType looks 
like, what request params you are using, what you are getting back in 
the response, a description of what you've indexed, etc...)


-Hoss



Re: Different XML format for multi-valued fields?

2008-10-16 Thread Noble Paul നോബിള്‍ नोब्ळ्
The component that writes out the values does not know whether the field
is multivalued or not. So if it finds only a single value, it writes it
out as such.


On Thu, Oct 16, 2008 at 10:52 PM, oleg_gnatovskiy
[EMAIL PROTECTED] wrote:

> Hello. I have an index built in Solr with several multi-value fields. When
> the multi-value field has only one value for a document, the XML returned
> looks like this:
> <arr name="someIds">
> <long name="someIds">5693</long>
> </arr>
> However, when there are multiple values for the field, the XML looks like
> this:
> <arr name="someIds">
> <long>11199</long>
> <long>1722</long>
> </arr>
> Is there a reason for this difference? Also, how does faceting work with
> multi-valued fields? It seems that I sometimes get facet results from
> multi-valued fields, and sometimes I don't.
>
> Thanks.
> --
> View this message in context:
> http://www.nabble.com/Different-XML-format-for-multi-valued-fields--tp20015951p20015951.html
> Sent from the Solr - User mailing list archive at Nabble.com.





-- 
--Noble Paul


Bizarre DisMax behavior: q parameter not working but q.alt is, and multi-valued fields not matching at all

2008-05-01 Thread Ezra Epstein
Config:

1.  The relevant part of the solrconfig.xml:

  <requestHandler name="/genre" class="solr.StandardRequestHandler"
defType="dismax">
    <lst name="defaults">
     <str name="echoParams">explicit</str>
     <float name="tie">0.01</float>
     <str name="qf">
        primaryCategory^2 cat^0.5
     </str>
     <str name="pf">
        primaryCategory^2 cat^0.5
     </str>
     <str name="fl">
        id,contentID
     </str>
     <int name="ps">100</int>
     <str name="q.alt">*:*</str>
    </lst>
  </requestHandler>

2.  The relevant part of the schema.xml 

   <field name="primaryCategory" type="string" indexed="true"
stored="true" required="true" />
   <field name="cat" type="string" indexed="true" stored="true"
required="false" multiValued="true"/>

3.  Some queries with curious results:

a.  http://test02:8080/sfx/genre?fl=score
Fine - all items returned, as expected.  E.g.,:

<result name="response" numFound="2" start="0" maxScore="1.0">
<doc>
<float name="score">1.0</float>
<arr name="cat">
<str>Drama</str>
<str>Featured Titles</str>
</arr>
<str name="id">726032414</str>
<str name="primaryCategory">Drama</str>
...
</doc>
<doc>
<float name="score">1.0</float>
<arr name="cat">
<str>Animation</str>
<str>Featured Titles</str>
</arr>
<str name="id">726030178</str>
<str name="primaryCategory">Animation</str>
...
</doc>
</result>

b. http://test02:8080/sfx/genre?fl=score&q=drama
Works: Returns the single, expected result.  
"Drama" shows up in both a single-valued field (primaryCategory) and a
multi-valued field (cat), both of which are listed in the /genre
request handler's qf parameter.

c. http://test02:8080/sfx/genre?fl=score&q=Featured%20Titles
No results. "Featured Titles" appears only in the multi-valued cat
field.  

What am I doing wrong?




Length norm on multi-valued fields

2007-06-04 Thread Walter Underwood
With a multi-valued field, is the length norm based on the individual
matched value (string) or on all the tokens in the field? I'm guessing
that it is the latter, and I expect I could find that in the source
or explain if I looked hard enough, but maybe someone already knows.

wunder
-- 
Walter Underwood
Search Guru, Netflix




Re: Length norm on multi-valued fields

2007-06-04 Thread Walter Underwood
On 6/4/07 11:24 AM, Chris Hostetter [EMAIL PROTECTED] wrote:

> : With a multi-valued field, is the length norm based the individual
> : matched value (string) or on all the tokens in the field? I'm guessing
> : that it is the latter, and I expect I could find that in the source
> : or explain if I looked hard enough, but maybe someone already knows.
>
> it's all tokens in the field.

Thanks, and dang. Looks like I'll need an extra field for the base
name of the series, so that the query "friends" can match well against
"Friends: Season 1: Disk 1".
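
A back-of-the-envelope illustration in Python, assuming the classic Lucene length norm of 1/sqrt(number of tokens) and plain whitespace tokenization (the real analyzer, the second title and the stored norm precision are not modelled here). Because the norm covers all tokens of all values, a multi-valued title field scores a match on "friends" lower than a dedicated single-token base-name field would.

```python
import math


def length_norm(num_tokens):
    """Classic length norm: shorter fields get a larger factor."""
    return 1.0 / math.sqrt(num_tokens)


# Hypothetical multi-valued title field for one series.
titles = ["Friends: Season 1: Disk 1", "Friends: Season 1: Disk 2"]
total_tokens = sum(len(title.split()) for title in titles)

# Hypothetical extra field holding only the base name of the series.
base_name_tokens = len("Friends".split())

multi_norm = length_norm(total_tokens)
base_norm = length_norm(base_name_tokens)
print(total_tokens, multi_norm, base_norm)
```
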

wunder