I have a collection of products that belong to a few users, like:

[ 
  { id: 1, user_id: 1, description: "blabla...", ... }, 
  { id: 2, user_id: 2, description: "blabla...", ... }, 
  { id: 3, user_id: 2, description: "blabla...", ... }, 
  { id: 4, user_id: 3, description: "blabla...", ... }, 
  { id: 5, user_id: 4, description: "blabla...", ... }, 
  { id: 6, user_id: 2, description: "blabla...", ... }, 
  { id: 7, user_id: 3, description: "blabla...", ... }, 
  { id: 8, user_id: 4, description: "blabla...", ... }, 
  { id: 9, user_id: 2, description: "blabla...", ... }, 
  { id: 10, user_id: 3, description: "blabla...", ... }, 
  { id: 11, user_id: 4, description: "blabla...", ... }, 
  ... 
] 

(The real data has more fields, but the most important ones are the first
three: product id, user id, and product description.)

I'd like to retrieve 2 products each for the top 3 users whose products have
the highest matching scores (the matching condition is that the description
includes "fashion" and some other keywords; here I just use "fashion" as an
example):

[ 
  { id: 2, user_id: 2, description: "blabla...", ..., _score: 100}, 
  { id: 3, user_id: 2, description: "blabla...", ..., _score: 95}, 
  { id: 4, user_id: 3, description: "blabla...", ..., _score: 90}, 
  { id: 5, user_id: 4, description: "blabla...", ..., _score: 80}, 
  { id: 7, user_id: 3, description: "blabla...", ..., _score: 70}, 
  { id: 8, user_id: 4, description: "blabla...", ..., _score: 65}, 
  ... 
] 

I have thought of 3 possible approaches:

1. Use a terms facet in a nested query to get the unique user_ids, then use
them as the user_id range of an outer query that matches the description
against keywords like "fashion".

I don't know how to implement this in ES (I'm stuck on iterating over the
facet terms and on constructing the user_id range from a subquery with a
facet). In SQL it would look something like:

select id, user_id, description
from product
where description like '%fashion%'
  and user_id in (
    select distinct user_id
    from product
    limit 3)
order by _score desc  /* _score stands in for the ES relevance score */
limit 6
/* 6 = 2 * 3 */

But this cannot guarantee that the top 6 products come from 3 different
users, 2 from each.

Also, according to the following two links, it seems that iterating over
additional per-term information in a terms facet has not been implemented
in ES so far:
http://elasticsearch-users.115913.n3.nabble.com/Terms-stats-facet-Additional-information-td4035199.html

https://github.com/elasticsearch/elasticsearch/issues/256

2. Query the description field for terms matching keywords like "fashion",
and at the same time compute statistics for each user_id with an
aggregation, limiting the count to 2 per user; then pick the top 6 products
with the highest matching scores.

I still don't know how to implement this in ES.

3. Use brute force with multiple queries until I find the top 3 users, each
with the 2 products that have the highest matching scores.

That is, use a hash map whose key is user_id and whose value is how many
times that user_id has appeared so far. Query with the matching keywords
first, then iterate over the intermediate results and check the hash map:
if the value is less than 2, add the product to the final result list;
otherwise skip it.
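Approach 3 might be sketched in Python roughly as follows. This is a minimal sketch that rests on several assumptions. Each hit is assumed to be a dict with at least a `user_id` key, and hits are assumed to arrive in descending `_score` order. `fetch_page(offset, size)` is a hypothetical callable standing in for rerunning the keyword query with from/size paging, and `paged_hits` and `select_top_products` are illustrative names. The sketch also assumes "top 3 users" means the first 3 distinct user_ids met in score order, so a new user_id is only admitted to the map while fewer than 3 are in it.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Hit = Dict[str, Any]


def paged_hits(fetch_page: Callable[[int, int], List[Hit]],
               page_size: int = 50) -> Iterator[Hit]:
    """Yield hits one page at a time.

    `fetch_page(offset, size)` is a hypothetical callable (an assumption,
    not a real client API) that reruns the keyword query with from/size.
    """
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:
            return  # no more matching products
        yield from page
        offset += len(page)


def select_top_products(hits: Iterable[Hit], max_users: int = 3,
                        per_user: int = 2) -> List[Hit]:
    """Scan hits in descending _score order, keeping at most `per_user`
    products for each of the first `max_users` distinct user_ids."""
    counts: Dict[Any, int] = {}  # the hash map: user_id -> products kept
    result: List[Hit] = []
    for hit in hits:
        uid = hit["user_id"]
        if uid not in counts:
            if len(counts) >= max_users:
                continue  # assumption: top users already chosen, skip newcomers
            counts[uid] = 0
        if counts[uid] < per_user:
            counts[uid] += 1
            result.append(hit)
            if len(result) == max_users * per_user:
                break  # every chosen user is full, so stop fetching pages
    return result
```

Suppose the six example results above are fed in score order. The function keeps all six. It would skip a third "fashion" product from user 2, and it would also skip a product from a fourth user that appears after the first three users. Because the loop stops once 6 products are collected, later pages are never requested.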

Please let me know if you can figure out how to do this with the 1st or 2nd
approach above.
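One possibility for the 2nd approach is a terms aggregation on user_id with a top_hits sub-aggregation, which caps the hits kept per user. This has not been verified against this data. The sketch below only builds the search body as a Python dict and flattens a response. It assumes Elasticsearch 1.3 or later, where top_hits was added, and dynamic scripting enabled for the `_score` script. It also assumes a `user_id` field that aggregates as whole values. `build_top_users_request`, `flatten_top_products`, and the aggregation names `top_users`, `top_products`, and `best_score` are hypothetical.

```python
import json
from typing import Any, Dict, List


def build_top_users_request(keyword: str, users: int = 3,
                            per_user: int = 2) -> Dict[str, Any]:
    """Hypothetical helper: bucket matching products by user_id, order the
    buckets by each user's best _score, keep `per_user` hits per bucket."""
    return {
        "size": 0,  # only the aggregation is needed, not the flat hit list
        "query": {"match": {"description": keyword}},
        "aggs": {
            "top_users": {
                "terms": {
                    "field": "user_id",
                    "size": users,
                    "order": {"best_score": "desc"},
                },
                "aggs": {
                    "top_products": {"top_hits": {"size": per_user}},
                    # ES 1.x inline-script form (assumption); newer versions
                    # take an object such as {"script": {"source": "_score"}}
                    "best_score": {"max": {"script": "_score"}},
                },
            }
        },
    }


def flatten_top_products(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Collect the per-user top hits into one list, highest _score first."""
    products = []
    for bucket in response["aggregations"]["top_users"]["buckets"]:
        for hit in bucket["top_products"]["hits"]["hits"]:
            doc = dict(hit["_source"])
            doc["_score"] = hit["_score"]
            products.append(doc)
    products.sort(key=lambda p: p["_score"], reverse=True)
    return products


if __name__ == "__main__":
    print(json.dumps(build_top_users_request("fashion"), indent=2))
```

Ordering the buckets by the best `_score` follows the field-collapsing example in the top_hits documentation. Ordering terms buckets by a sub-aggregation can be approximate when the index has several shards.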

Thanks in advance.
Yao
