Hi all,

We are trying to move some of our offline data analytics from the Hadoop/Hive stack to Elasticsearch, but we have run into an issue.
We have daily events. In Hive we use partitions (HDFS directories) to store them, one per day. For instance, the HDFS directory layout of the event table looks like this:

    event/dt=20141112
    event/dt=20141113

User retention tracks whether a user who produced an event (activity) on one day also produced an event on another day. The SQL is something like:

    SELECT count(*)
    FROM event-log-20141112 AS l
    JOIN event-log-20141113 AS r
      ON l.user_id = r.user_id

According to the Elasticsearch documentation, we can build one index per day, such as log-20141112/event and log-20141113/event. But it seems that a join across different indices cannot be as fast as one over data that is co-located through routing.

If we store all the events in one index, with each type representing one day's events, there still seems to be no way to run the user retention query.

Actually, we could collapse all the events by user id: maintain a parent table that stores users' information, including the user id, and have each day's event table declare the user information table as its parent. The layout would look like this:

    event/user
    event/log-20141112
    event/log-20141113

All of these tables can be routed by user_id so that they are co-located, and a join between them needs no data shuffling. However, it seems that Elasticsearch currently cannot run a query that joins multiple child tables; it only supports parent-child joins. Is that right?

Can anyone help me with this, or is there another way to do it in Elasticsearch?

Min
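To make the co-location idea concrete, here is a minimal sketch in plain Python. It is not Elasticsearch code: the shard count, the `shard_for` routing function, and the sample user ids are all hypothetical and only illustrate why routing both days by user_id lets each shard count its retained users locally, with no data shuffling.

```python
from collections import defaultdict
from zlib import crc32

NUM_SHARDS = 4  # hypothetical shard count, for illustration only


def shard_for(user_id: str) -> int:
    """Toy routing function: the same user_id always maps to the same shard."""
    return crc32(user_id.encode("utf-8")) % NUM_SHARDS


def route(user_ids):
    """Group the distinct user ids of one day's events by shard."""
    shards = defaultdict(set)
    for uid in user_ids:
        shards[shard_for(uid)].add(uid)
    return shards


def retained_users(day1_user_ids, day2_user_ids) -> int:
    """Count users seen on both days by intersecting shard by shard."""
    day1 = route(day1_user_ids)
    day2 = route(day2_user_ids)
    total = 0
    for shard in range(NUM_SHARDS):
        # Both days use the same routing, so a user active on both days
        # is guaranteed to be on the same shard in both sets.
        total += len(day1[shard] & day2[shard])
    return total


# Made-up sample data: one user_id per event.
events_20141112 = ["u1", "u2", "u3", "u1"]
events_20141113 = ["u2", "u3", "u4"]
print(retained_users(events_20141112, events_20141113))  # 2
```

The sum of the per-shard counts equals the size of the global intersection of the two days' user sets, which is the number the retention query is after.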