Re: Parent-child options

2011-11-08 Thread Michael McCandless
Lucene itself has BlockJoinQuery/Collector (in contrib/join), which is
what ElasticSearch is using under the hood for its nested documents (I
think?).

But I don't think this has been exposed in Solr yet. Patches welcome!
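
For anyone unfamiliar with the block-join idea, here is a rough conceptual sketch in plain Python (it is not Lucene code and does not use any Lucene API). It assumes the usual block layout, where a parent's children are indexed contiguously and the parent is the last document of its block, so a matching child can be mapped to the next parent. The sample documents and the helper name `block_join` are made up for illustration.

```python
# Conceptual sketch only; this is NOT the Lucene BlockJoinQuery API.
# Assumption: each block is [child, child, ..., parent], stored contiguously.
index = [
    {"type": "child", "color": "red", "size": 10},
    {"type": "child", "color": "red", "size": 11},
    {"type": "child", "color": "orange", "size": 12},
    {"type": "parent", "category": "shoes"},
    {"type": "child", "color": "red", "size": 12},
    {"type": "parent", "category": "boots"},
]


def block_join(index, child_matches):
    """Return the parents whose block contains at least one matching child."""
    parents = []
    block_has_match = False
    for doc in index:
        if doc["type"] == "parent":
            # The parent closes the block: emit it if any child in it matched.
            if block_has_match:
                parents.append(doc)
            block_has_match = False
        elif child_matches(doc):
            block_has_match = True
    return parents


hits = block_join(index, lambda d: d["color"] == "red" and d["size"] == 12)
print([p["category"] for p in hits])
```

Because color and size are tested on the same child document, "shoes" is not returned for red AND 12 even though it has a red child and a size 12 child.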

Mike McCandless

http://blog.mikemccandless.com

On Tue, Nov 8, 2011 at 12:59 PM, Jean Maynier  wrote:
> Hello,
>
> Has anyone found a way to solve the parent-child problem? The Join option
> is too complex because you have to create multiple document types and do the
> join in the query.
>
> ElasticSearch did a better job at solving this problem:
> http://www.elasticsearch.org/guide/reference/mapping/nested-type.html
> http://www.elasticsearch.org/guide/reference/query-dsl/nested-query.html
>
> Does Solr have a similar feature (at least on the roadmap)? I don't want to
> switch to ES (too much would change), but for the moment it seems better for
> structured content.
>
> --
> Jean Maynier
>


Parent-child options

2011-11-08 Thread Jean Maynier
Hello,

Has anyone found a way to solve the parent-child problem? The Join option
is too complex because you have to create multiple document types and do the
join in the query.

ElasticSearch did a better job at solving this problem:
http://www.elasticsearch.org/guide/reference/mapping/nested-type.html
http://www.elasticsearch.org/guide/reference/query-dsl/nested-query.html

Does Solr have a similar feature (at least on the roadmap)? I don't want to
switch to ES (too much would change), but for the moment it seems better for
structured content.

--
Jean Maynier


Re: Parent-child options

2011-03-25 Thread Jan Høydahl
Otis,

Impressive list of possible solutions you've come up with :)

I've used Jonathan's "pattern" in several projects, but it quickly becomes
unmanageable. My plan was to try to come up with a new FieldType inspired by
FAST's Scope-field, which would take JSON in and be able to match hierarchical
relationships with a syntax such as q=itemType:shoes AND
items_json:"item:and(color:red,size:10)". The FieldType would make sure that
the sub-tags within the and() actually exist within the scope of the same
item. It's not trivial, as you implement a mini matching engine inside a field
type and a new query syntax, but it should be possible for simple string type
metadata. The FieldType would need to convert the JSON structure into some
internal tree structure which is easily matched against the query.
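
To make the intended semantics concrete, here is a minimal sketch in plain Python of what "within the scope of the same item" would mean. It is not a Solr FieldType; the function name `scope_match` and the JSON layout (a list of flat item objects) are assumptions for illustration only.

```python
import json


def scope_match(items_json, **wanted):
    """True if one single item carries every wanted attribute value.

    Values are compared as strings, matching the "simple string type
    metadata" case only.
    """
    items = json.loads(items_json)
    return any(
        all(str(item.get(key)) == str(value) for key, value in wanted.items())
        for item in items
    )


# Hypothetical document: the items_json field holds the raw JSON string.
doc = {
    "itemType": "shoes",
    "items_json": json.dumps([
        {"color": "red", "size": 10},
        {"color": "black", "size": 11},
        {"color": "orange", "size": 12},
    ]),
}

# Rough equivalent of item:and(color:red,size:10): both values in one item.
print(scope_match(doc["items_json"], color="red", size=10))
# red and 12 both occur in the document, but never in the same item.
print(scope_match(doc["items_json"], color="red", size=12))
```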

I also thought about a JSON PolyField, where inserting one JSON string into the 
poly field would generate a bunch of sub fields _items_json_item1_color, 
_items_json_item1_size... to be able to re-use Lucene's matching capabilities, 
but I did not get it to support all use cases in my head.
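
A rough sketch of that flattening step, in plain Python rather than as a real PolyField implementation. The sub field naming follows the `_items_json_item1_color` pattern above; the function name `flatten` and the input layout are assumptions.

```python
import json


def flatten(field_name, json_string):
    """Expand a JSON list of items into one sub field per item attribute."""
    fields = {}
    for n, item in enumerate(json.loads(json_string), start=1):
        for key, value in item.items():
            fields[f"_{field_name}_item{n}_{key}"] = value
    return fields


sub_fields = flatten(
    "items_json",
    '[{"color": "red", "size": 10}, {"color": "black", "size": 11}]',
)
for name, value in sub_fields.items():
    print(name, "=", value)
```

With this layout, a query for red in size 10 would have to test item1, item2, ... in turn, since the item number is part of the field name. That may be one of the cases that is hard to cover.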

Did anyone try SIREn?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 17. mars 2011, at 16.48, Jonathan Rochkind wrote:

> The standard answer, which is a kind of de-normalizing, is to index tokens 
> like this:
> 
> red_10 red_11 orange_12
> 
> in another field, you could do these things with size first:
> 
> 10_red 11_red 12_orange
> 
> Now if you want to see what sizes of red you have, you can do a facet query
> with facet.prefix=red_ . You'll need to do a bit of parsing/interpreting
> client side to translate from the results you get ("red_10", "red_11") to
> telling the users "sizes 10 and 11 are available". The second field with
> size first lets you do the same thing to answer "what colors do we have in
> size X?".
> 
> That gets unmanageable with more than 2-3 facet combinations, but with just 2
> (or, pushing it, 3), can work out okay. You'd probably ALSO want to keep the
> facets you have with plain values "red red orange" etc, to support that first
> level of user-implementing. There is a bit more work to do on the client side
> with this approach: Solr isn't just giving you exactly what you want in its
> response, so you've got to have logic for when to use the top-level facets and
> when to go to that second-level combo facet ("red_12"), but it's do-able.
> 
> On 3/17/2011 11:21 AM, Otis Gospodnetic wrote:
>> Hi,
>> 
>> 
>> 
>> - Original Message 
>>> From: Yonik Seeley
>>> Subject: Re: Parent-child options
>>> 
>>> On Thu, Mar 17, 2011 at 1:49 AM, Otis Gospodnetic
>>>wrote:
>>>> The dreaded parent-child without denormalization question.  What  are one's
>>>> options for the following example:
>>>> 
>>>> parent:  shoes
>>>> 3 children. each with 2 attributes/fields: color and size
>>>>   * color: red black orange
>>>>  * size: 10 11 12
>>>> 
>>>> The goal is  to be able to search for:
>>>> 1) color:red AND size:10 and get 1 hit for the  above
>>>> 2) color:red AND size:12 and get *no* matches because there are no red
>>>> shoes of size 12, only size 10.
>>> What if you had this  instead:
>>> 
>>>   color: red red orange
>>>   size: 10 11 12
>>> 
>>> Do  you need for color:red to return 1 or 2 (i.e. is the final answer
>>> in units of  child hits or parent hits)?
>> The final answer is the parent, which is "shoes" in this example.
>> So:
>> if the query is color:red AND size:10 the answer is: Yes, we got red shoes size 10
>> if the query is color:red AND size:11 the answer is: Yes, we got red shoes size 11
>> if the query is color:red AND size:12 the answer is: No, we don't have red shoes size 12
>> 
>> Thanks,
>> Otis
>> 



Re: Parent-child options

2011-03-17 Thread Jonathan Rochkind
The standard answer, which is a kind of de-normalizing, is to index 
tokens like this:


red_10 red_11 orange_12

in another field, you could do these things with size first:

10_red 11_red 12_orange

Now if you want to see what sizes of red you have, you can do a facet
query with facet.prefix=red_ . You'll need to do a bit of
parsing/interpreting client side to translate from the results you get
("red_10", "red_11") to telling the users "sizes 10 and 11 are
available". The second field with size first lets you do the same thing
to answer "what colors do we have in size X?".


That gets unmanageable with more than 2-3 facet combinations, but with
just 2 (or, pushing it, 3), can work out okay. You'd probably ALSO want
to keep the facets you have with plain values "red red orange" etc, to
support that first level of user-implementing. There is a bit more work
to do on the client side with this approach: Solr isn't just giving you
exactly what you want in its response, so you've got to have logic for
when to use the top-level facets and when to go to that second-level
combo facet ("red_12"), but it's do-able.
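
As a rough sketch of the token scheme and the client-side parsing, in plain Python with no Solr involved: the helper `facet_prefix` only mimics the effect of facet.prefix on an in-memory token list, and all names are made up for illustration.

```python
from collections import Counter

# (color, size) pairs for the children of one parent.
children = [("red", 10), ("red", 11), ("orange", 12)]

# Index-time: build the two combined fields described above.
color_size = [f"{color}_{size}" for color, size in children]  # red_10 red_11 orange_12
size_color = [f"{size}_{color}" for color, size in children]  # 10_red 11_red 12_orange


def facet_prefix(tokens, prefix):
    """Mimic facet.prefix: count only the tokens that start with the prefix."""
    return Counter(token for token in tokens if token.startswith(prefix))


def sizes_for(color):
    """Client-side parsing: turn 'red_10' style facet values back into sizes."""
    facets = facet_prefix(color_size, f"{color}_")
    return sorted(int(token.split("_", 1)[1]) for token in facets)


def colors_for(size):
    """Same idea on the size-first field: '12_orange' becomes 'orange'."""
    facets = facet_prefix(size_color, f"{size}_")
    return sorted(token.split("_", 1)[1] for token in facets)


print(sizes_for("red"))
print(colors_for(12))
```

Keeping the trailing underscore in the prefix matters: without it, a prefix of "1" would also pick up sizes 10, 11 and 12.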


On 3/17/2011 11:21 AM, Otis Gospodnetic wrote:
> Hi,
>
> - Original Message 
>> From: Yonik Seeley
>> Subject: Re: Parent-child options
>>
>> On Thu, Mar 17, 2011 at 1:49 AM, Otis Gospodnetic wrote:
>>> The dreaded parent-child without denormalization question. What are one's
>>> options for the following example:
>>>
>>> parent: shoes
>>> 3 children. each with 2 attributes/fields: color and size
>>>  * color: red black orange
>>>  * size: 10 11 12
>>>
>>> The goal is to be able to search for:
>>> 1) color:red AND size:10 and get 1 hit for the above
>>> 2) color:red AND size:12 and get *no* matches because there are no red shoes
>>> of size 12, only size 10.
>>
>> What if you had this instead:
>>
>>   color: red red orange
>>   size: 10 11 12
>>
>> Do you need for color:red to return 1 or 2 (i.e. is the final answer
>> in units of child hits or parent hits)?
>
> The final answer is the parent, which is "shoes" in this example.
> So:
> if the query is color:red AND size:10 the answer is: Yes, we got red shoes size 10
> if the query is color:red AND size:11 the answer is: Yes, we got red shoes size 11
> if the query is color:red AND size:12 the answer is: No, we don't have red shoes size 12
>
> Thanks,
> Otis



Re: Parent-child options

2011-03-17 Thread Yonik Seeley
On Thu, Mar 17, 2011 at 11:21 AM, Otis Gospodnetic
 wrote:
> Hi,
>
>
>
> - Original Message 
>> From: Yonik Seeley 
>> Subject: Re: Parent-child options
>>
>> On Thu, Mar 17, 2011 at 1:49 AM, Otis Gospodnetic
>>   wrote:
>> > The dreaded parent-child without denormalization question.  What  are one's
>> > options for the following example:
>> >
>> > parent:  shoes
>> > 3 children. each with 2 attributes/fields: color and size
>> >   * color: red black orange
>> >  * size: 10 11 12
>> >
>> > The goal is  to be able to search for:
>> > 1) color:red AND size:10 and get 1 hit for the  above
>> > 2) color:red AND size:12 and get *no* matches because there are no red
>> > shoes of size 12, only size 10.
>>
>> What if you had this  instead:
>>
>>   color: red red orange
>>   size: 10 11 12
>>
>> Do  you need for color:red to return 1 or 2 (i.e. is the final answer
>> in units of  child hits or parent hits)?
>
> The final answer is the parent, which is "shoes" in this example.
> So:
> if the query is color:red AND size:10 the answer is: Yes, we got red shoes size 10
> if the query is color:red AND size:11 the answer is: Yes, we got red shoes size 11
> if the query is color:red AND size:12 the answer is: No, we don't have red shoes size 12

Then yes, the join patch would work (as long as it's just filtering
and you don't need relevancy of child hits to propagate to the
parent).

parent {category:"shoes"}
child {parent:"shoes", color:"red", size:10}

q={!join from=parent to=category}color:red AND size:10

If you had a query on the parent type docs, the join could also be
used as an "fq".
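
To spell out what the join does with those documents, here is a small in-memory sketch in Python. It only imitates the from/to semantics of the query above and does not talk to Solr; the helper names `join` and `is_child` are invented for illustration.

```python
from urllib.parse import urlencode

docs = [
    {"category": "shoes"},                            # parent
    {"parent": "shoes", "color": "red", "size": 10},  # children
    {"parent": "shoes", "color": "red", "size": 11},
    {"parent": "shoes", "color": "orange", "size": 12},
]


def join(docs, child_matches, from_field="parent", to_field="category"):
    """Collect from_field values of matching docs, then return the docs
    whose to_field holds one of those values (pure filtering, no scoring)."""
    keys = {d[from_field] for d in docs if from_field in d and child_matches(d)}
    return [d for d in docs if d.get(to_field) in keys]


def is_child(color, size):
    """Predicate standing in for the child query color:X AND size:Y."""
    return lambda d: d.get("color") == color and d.get("size") == size


print(join(docs, is_child("red", 10)))  # the shoes parent
print(join(docs, is_child("red", 12)))  # nothing: no single child is red AND 12

# The same query as a URL-encoded request parameter.
params = urlencode({"q": "{!join from=parent to=category}color:red AND size:10"})
print(params)
```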

-Yonik
http://lucidimagination.com


Re: Parent-child options

2011-03-17 Thread Otis Gospodnetic
Hi,



- Original Message 
> From: Yonik Seeley 
> Subject: Re: Parent-child options
> 
> On Thu, Mar 17, 2011 at 1:49 AM, Otis Gospodnetic
>   wrote:
> > The dreaded parent-child without denormalization question.  What  are one's
> > options for the following example:
> >
> > parent:  shoes
> > 3 children. each with 2 attributes/fields: color and size
> >   * color: red black orange
> >  * size: 10 11 12
> >
> > The goal is  to be able to search for:
> > 1) color:red AND size:10 and get 1 hit for the  above
> > 2) color:red AND size:12 and get *no* matches because there are no red
> > shoes of size 12, only size 10.
> 
> What if you had this  instead:
> 
>   color: red red orange
>   size: 10 11 12
> 
> Do  you need for color:red to return 1 or 2 (i.e. is the final answer
> in units of  child hits or parent hits)?

The final answer is the parent, which is "shoes" in this example.
So:
if the query is color:red AND size:10 the answer is: Yes, we got red shoes size 10
if the query is color:red AND size:11 the answer is: Yes, we got red shoes size 11
if the query is color:red AND size:12 the answer is: No, we don't have red shoes size 12

Thanks,
Otis


Re: Parent-child options

2011-03-17 Thread Yonik Seeley
On Thu, Mar 17, 2011 at 1:49 AM, Otis Gospodnetic
 wrote:
> The dreaded parent-child without denormalization question.  What are one's
> options for the following example:
>
> parent: shoes
> 3 children. each with 2 attributes/fields: color and size
>  * color: red black orange
>  * size: 10 11 12
>
> The goal is to be able to search for:
> 1) color:red AND size:10 and get 1 hit for the above
> 2) color:red AND size:12 and get *no* matches because there are no red shoes
> of size 12, only size 10.

What if you had this instead:

  color: red red orange
  size: 10 11 12

Do you need for color:red to return 1 or 2 (i.e. is the final answer
in units of child hits or parent hits)?

-Yonik
http://lucidimagination.com


Parent-child options

2011-03-16 Thread Otis Gospodnetic
Hi,

The dreaded parent-child without denormalization question.  What are one's 
options for the following example:

parent: shoes
3 children, each with 2 attributes/fields: color and size
 * color: red black orange
 * size: 10 11 12

The goal is to be able to search for:
1) color:red AND size:10 and get 1 hit for the above
2) color:red AND size:12 and get *no* matches because there are no red shoes of 
size 12, only size 10.
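
To illustrate the problem, here is a tiny sketch in plain Python (no Solr). It shows why two independent multi-valued fields give a false hit for case 2. It assumes the values are pairwise aligned, i.e. the children are (red, 10), (black, 11), (orange, 12); the function names are made up.

```python
# One flattened parent document: the children's values end up in two
# independent multi-valued fields, so the pairing is lost.
flat = {"name": "shoes", "color": ["red", "black", "orange"], "size": [10, 11, 12]}


def matches_flat(doc, color, size):
    """color:X AND size:Y against the flattened document."""
    return color in doc["color"] and size in doc["size"]


# Assumption: the i-th color belongs with the i-th size.
children = list(zip(flat["color"], flat["size"]))


def matches_children(children, color, size):
    """The desired behaviour: both values must come from the same child."""
    return (color, size) in children


print(matches_flat(flat, "red", 10), matches_children(children, "red", 10))
print(matches_flat(flat, "red", 12), matches_children(children, "red", 12))
```

The second line shows the mismatch: the flattened document matches red AND 12, while no individual child does.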

What's the best thing to do without denormalizing?
* Are Poly fields designed for this?
* Should one use JSONKeyValueTokenizerFactory from SOLR-1690 as suggested by 
Ryan in http://search-lucene.com/m/I8VaDeusnJ1 ?
* Should one use SIREn as suggested by Renaud in 
http://search-lucene.com/m/qoQWMVk3w91 ?
* Should one use SpanMaskingQuery and SpanNearQuery as suggested by Hoss in 
http://search-lucene.com/m/AEvbbeusnJ1 ?
* Should one use JOIN from https://issues.apache.org/jira/browse/SOLR-2272 ?
* Should one use Nested Document query support from LUCENE-2454 (not in trunk, 
not in Solr) ?

Thanks,
Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/