What about a nested or parent/child query? How could this be achieved?

On Thursday, May 8, 2014 4:45:36 PM UTC-7, Yao Li wrote:
>
> I have a collection of products which belong to a few users, like
>
> [
> { id: 1, user_id: 1, description: "blabla...", ... },
> { id: 2, user_id: 2, description: "blabla...", ... },
> { id: 3, user_id: 2, description: "blabla...", ... },
> { id: 4, user_id: 3, description: "blabla...", ... },
> { id: 5, user_id: 4, description: "blabla...", ... },
> { id: 6, user_id: 2, description: "blabla...", ... },
> { id: 7, user_id: 3, description: "blabla...", ... },
> { id: 8, user_id: 4, description: "blabla...", ... },
> { id: 9, user_id: 2, description: "blabla...", ... },
> { id: 10, user_id: 3, description: "blabla...", ... },
> { id: 11, user_id: 4, description: "blabla...", ... },
> ...
> ]
>
> (The real data has more fields, but the most important ones are the
> first three: product id, user id, and product description.)
>
> I'd like to retrieve 2 products for each of the top 3 users whose
> products have the highest matching score (the matching condition is that
> the description includes "fashion" and some other keywords; in this case
> just use "fashion" as an example):
>
> [
> { id: 2, user_id: '2', description: "blabla...", ..., _score: 100},
> { id: 3, user_id: '2', description: "blabla...", ..., _score: 95},
> { id: 4, user_id: '3', description: "blabla...", ..., _score: 90},
> { id: 5, user_id: '4', description: "blabla...", ..., _score: 80},
> { id: 7, user_id: '3', description: "blabla...", ..., _score: 70},
> { id: 8, user_id: '4', description: "blabla...", ..., _score: 65},
> ...
> ]
>
> I have 3 possible ways to try:
>
> 1. Use a terms facet to get the unique user_ids in a nested query, then
> use them as the user id range of the outer query, which focuses on
> matching the description against keywords like "fashion".
>
> I don't know how to implement it in ES (I am stuck on iterating over the
> facet terms and constructing the user_id range from a subquery with a
> facet), so I tried it in SQL:
>
> select id, user_id, description
> from product
> where user_id in (
>   select distinct user_id
>   from product
>   limit 3)
> order by _score
> limit 6
> /* 6 = 2 * 3 */
>
> But this cannot guarantee that the top 6 products come from 3 different
> users.
>
> Also, according to the following two links, it seems that iterating over
> facet-terms-specific information has not been implemented in ES so far.
>
> http://elasticsearch-users.115913.n3.nabble.com/Terms-stats-facet-Additional-information-td4035199.html
>
> https://github.com/elasticsearch/elasticsearch/issues/256
>
> 2. Query with a term on the description field matched against keywords
> like "fashion", at the same time compute statistics for each user_id with
> an aggregation and limit the count to 2, then pick the top 6 products
> with the highest matching score.
>
> I still don't know how to implement this in ES.
>
> 3. Use brute force with multiple queries until the top 3 users are found,
> each with their 2 products with the highest matching scores.
>
> I mean use a hash map where the key is user_id and the value is how many
> times it has appeared. Query with the matching keywords first, then
> iterate over the results and check the hash map: if the value is less
> than 2, add the product to the final result list, otherwise skip it.
>
> Please let me know if you can figure out how to do it the 1st or 2nd way.
>
> Thanks in advance.
> Yao
>
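
For reference, the third (brute-force) approach quoted above can be sketched on the client side in Python. This is only a minimal sketch under stated assumptions: the hits are plain dicts that already arrive sorted by descending _score (as a search response normally returns them), and the function name pick_top_products and its parameters max_users and per_user are hypothetical, not part of Elasticsearch or of the original post.

```python
def pick_top_products(hits, max_users=3, per_user=2):
    """Keep at most `per_user` hits for each of the first `max_users` users.

    Assumption: `hits` is ordered by descending _score, so the first users
    encountered are the ones with the highest-scoring products.
    """
    counts = {}   # hash map: user_id -> number of hits kept so far
    picked = []
    for hit in hits:
        user = hit["user_id"]
        # A new user is only admitted while fewer than max_users are tracked.
        if user not in counts and len(counts) >= max_users:
            continue
        # Skip the hit if this user already has enough products.
        if counts.get(user, 0) >= per_user:
            continue
        counts[user] = counts.get(user, 0) + 1
        picked.append(hit)
        # Stop early once every slot is filled.
        if len(picked) == max_users * per_user:
            break
    return picked


# Hypothetical sample data modeled on the example in the quoted message.
sample_hits = [
    {"id": 2, "user_id": 2, "_score": 100},
    {"id": 3, "user_id": 2, "_score": 95},
    {"id": 6, "user_id": 2, "_score": 93},
    {"id": 4, "user_id": 3, "_score": 90},
    {"id": 5, "user_id": 4, "_score": 80},
    {"id": 7, "user_id": 3, "_score": 70},
    {"id": 8, "user_id": 4, "_score": 65},
    {"id": 1, "user_id": 1, "_score": 60},
]
result = pick_top_products(sample_hits)
print([hit["id"] for hit in result])  # prints [2, 3, 4, 5, 7, 8]
```

In practice the hits would have to be fetched page by page until all slots are filled or the results run out, which is the "multiple queries" part of the approach; the loop above only covers the filtering step.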