Re: Query on multivalue field
Tested it out and it seems to work well as long as I set the gap to a value much longer than the text - 1 appear to work fine for our current data.

Thanks heaps for all the help guys!

Scott.

On 2/03/11 11:13 AM, Jonathan Rochkind wrote:
> Each token has a position set on it. So if you index the value "alpha beta gamma", it winds up stored in Solr as (sort of, for the way we want to look at it):
>
>     document1:
>         alpha: position 1
>         beta: position 2
>         gamma: position 3
>
> If you set the position increment gap large, then after one value in a multi-valued field ends, the position increment gap will be added to the positions for the next value. Solr doesn't really have much of an internal notion of a multi-valued field; ALL a multi-valued indexed field is, is a position increment gap separating tokens from different 'values'.
>
> So if you index in a multi-valued field, with position increment gap 10000, the values ["alpha beta gamma", "aleph bet"], you get something like:
>
>     document1:
>         alpha: 1
>         beta: 2
>         gamma: 3
>         aleph: 10004
>         bet: 10005
>
> A large position increment gap, as far as I know and can tell (please someone correct me if I'm wrong, I am not a Solr developer), has no effect on the size or efficiency of your index on disk.
>
> I am not sure why positionIncrementGap doesn't just default to a very large number, to provide behavior that better matches what people expect from the idea of a "multi-valued field". So maybe there is some flaw in my understanding, and there is a good reason for it not to be this way? But I set my positionIncrementGap very large, and haven't seen any issues.
Re: Query on multivalue field
Each token has a position set on it. So if you index the value "alpha beta gamma", it winds up stored in Solr as (sort of, for the way we want to look at it):

    document1:
        alpha: position 1
        beta: position 2
        gamma: position 3

If you set the position increment gap large, then after one value in a multi-valued field ends, the position increment gap will be added to the positions for the next value. Solr doesn't really have much of an internal notion of a multi-valued field; ALL a multi-valued indexed field is, is a position increment gap separating tokens from different 'values'.

So if you index in a multi-valued field, with position increment gap 10000, the values ["alpha beta gamma", "aleph bet"], you get something like:

    document1:
        alpha: 1
        beta: 2
        gamma: 3
        aleph: 10004
        bet: 10005

A large position increment gap, as far as I know and can tell (please someone correct me if I'm wrong, I am not a Solr developer), has no effect on the size or efficiency of your index on disk.

I am not sure why positionIncrementGap doesn't just default to a very large number, to provide behavior that better matches what people expect from the idea of a "multi-valued field". So maybe there is some flaw in my understanding, and there is a good reason for it not to be this way? But I set my positionIncrementGap very large, and haven't seen any issues.

On 3/1/2011 5:46 PM, Scott Yeadon wrote:
> The only trick with this is ensuring the searches return the right results and don't go across value boundaries. If I set the gap to the largest text size we expect (approx 5000 chars), what impact does such a large value have (i.e. does Solr physically separate these fragments in the index or just apply the figure as part of any query)?
>
> Scott.
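As a rough illustration of the position model Jonathan describes, here is a minimal Python sketch. It is not Solr or Lucene code: the `assign_positions` helper, the whitespace tokenization, and the 1-based numbering are assumptions chosen to mirror the example in his message, and the gap of 10000 is the value implied by the positions shown there (gamma at 3, aleph at 10004).

```python
def assign_positions(values, gap):
    """Assign a position to every token of a multi-valued field.

    Simplified model (an assumption, not Solr's actual implementation):
    tokens are split on whitespace, positions start at 1, and `gap` is
    added once between consecutive values.
    """
    positions = []
    pos = 0
    for index, value in enumerate(values):
        if index > 0:
            # Moving on to the next value: skip ahead by the gap.
            pos += gap
        for token in value.split():
            pos += 1
            positions.append((token, pos))
    return positions


if __name__ == "__main__":
    values = ["alpha beta gamma", "aleph bet"]
    for token, pos in assign_positions(values, gap=10000):
        print(f"{token}: {pos}")
```

In this model the gap only changes the position numbers attached to tokens; no padding tokens are created between values.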
Re: Query on multivalue field
The only trick with this is ensuring the searches return the right results and don't go across value boundaries. If I set the gap to the largest text size we expect (approx 5000 chars), what impact does such a large value have (i.e. does Solr physically separate these fragments in the index or just apply the figure as part of any query)?

Scott.

On 2/03/11 9:01 AM, Ahmet Arslan wrote:
> Your best bet may be to use positionIncrementGap and issue a phrase query (implicit AND) with an appropriate slop value.
>
> If you have positionIncrementGap="100", you can simulate this using
> &q=field1:"termA termB"~100
>
> http://search-lucene.com/m/Hbdvz1og7D71/
Re: Query on multivalue field
> In a multiValued field, call it field1, if I have two values indexed to
> this field, say value 1 = "some text...termA...more text" and value 2 =
> "some text...termB...more text" and do a search such as
> field1:(termA termB)
> (where ) I'm getting a hit returned even though both terms don't occur
> within a single value in the multiValued field.
>
> What I'm wondering is if there is a way of applying the query against
> each value of the field rather than against the field in its entirety.
> The reason being is the number of values I want to store is variable and
> I'd like to avoid the use of dynamic fields or restructuring the index
> if possible.

Your best bet may be to use positionIncrementGap and issue a phrase query (implicit AND) with an appropriate slop value.

If you have positionIncrementGap="100", you can simulate this using
&q=field1:"termA termB"~100

http://search-lucene.com/m/Hbdvz1og7D71/
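To see why a phrase query with slop can keep a match inside a single value, here is a minimal Python sketch of the idea. Everything in it is an assumption made for illustration: the helper names (`assign_positions`, `within_slop`, `phrase_query`) are hypothetical, tokenization is plain whitespace splitting, and the distance check is a deliberate simplification of sloppy phrase matching rather than Lucene's actual algorithm.

```python
def assign_positions(values, gap):
    """Simplified position model: 1-based positions, `gap` added between values."""
    positions = []
    pos = 0
    for index, value in enumerate(values):
        if index > 0:
            pos += gap
        for token in value.split():
            pos += 1
            positions.append((token, pos))
    return positions


def within_slop(positions, term_a, term_b, slop):
    """Return True if some pair of occurrences has at most `slop` positions between them.

    Simplification: real sloppy phrase matching treats reordered terms
    differently; here only the distance between the two terms is checked.
    """
    pos_a = [p for token, p in positions if token == term_a]
    pos_b = [p for token, p in positions if token == term_b]
    return any(abs(a - b) - 1 <= slop for a in pos_a for b in pos_b)


def phrase_query(field, terms, slop):
    """Build the query string in the form used in the message above."""
    return f'{field}:"{" ".join(terms)}"~{slop}'


if __name__ == "__main__":
    gap = 100
    two_values = ["some text termA more text", "some text termB more text"]
    one_value = ["some text termA more termB text"]

    print(phrase_query("field1", ["termA", "termB"], 100))
    # Terms in different values: the gap pushes them further apart than the slop.
    print(within_slop(assign_positions(two_values, gap), "termA", "termB", 100))
    # Terms in the same value: close enough to match.
    print(within_slop(assign_positions(one_value, gap), "termA", "termB", 100))
```

In this model, a slop equal to the gap can still match across the boundary when the two terms sit directly on either side of it, and terms inside one value only match if they are no further apart than the slop. That suggests a slop at least as large as the longest value and a gap larger than the slop, which fits Scott's later observation that the gap needs to be much longer than the text. Whether Lucene's real matching behaves identically at these edges is not confirmed anywhere in the thread.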
Re: Query on multivalue field
Thanks, but just to confirm the way multiValued fields work:

In a multiValued field, call it field1, if I have two values indexed to this field, say value 1 = "some text...termA...more text" and value 2 = "some text...termB...more text", and do a search such as field1:(termA termB) (where ) I'm getting a hit returned even though both terms don't occur within a single value in the multiValued field.

What I'm wondering is if there is a way of applying the query against each value of the field rather than against the field in its entirety. The reason is that the number of values I want to store is variable and I'd like to avoid the use of dynamic fields or restructuring the index if possible.

Scott.

On 2/03/11 12:35 AM, Steven A Rowe wrote:
> Hi Scott,
>
> Querying against a multi-valued field just works - no special incantation required.
>
> Steve
RE: Query on multivalue field
Hi Scott,

Querying against a multi-valued field just works - no special incantation required.

Steve

> -----Original Message-----
> From: Scott Yeadon [mailto:scott.yea...@anu.edu.au]
> Sent: Monday, February 28, 2011 11:50 PM
> To: solr-user@lucene.apache.org
> Subject: Query on multivalue field
>
> Hi,
>
> I have a variable number of text-based fields associated with each
> primary record which I wanted to apply a search across. I wanted to
> avoid the use of dynamic fields if possible or having to create a
> different document type in the index (as the app is based around the
> primary record and different views mean a lot of work to revamp
> pagination etc).
>
> So, is there a way to apply a query to each value of a multivalued field
> or is it always treated as a "single" field from a query perspective?
>
> Thanks.
>
> Scott.