Hi,

I am sure Hadoop can help you calculate this, but you may also be able to do it more efficiently in Elasticsearch. If, as you mentioned, you created a user-centric index in addition to the event-centric one you already have, you could store a list of all the events belonging to each user there. A simple query could then efficiently identify the users that have all the required events, and you would only need to process those users to verify that the order is correct. This is likely to scale and perform much better than the current approach, and is usually referred to as entity-centric indexing [1].
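To make the idea concrete, here is a rough Python sketch of the two steps (find users that have all the required events, then check the order). It runs purely in memory, so it only illustrates the logic and is not tied to any client library. The event names, the `events.type` field and the helper functions are all made up for illustration, and the query body assumes `events.type` is indexed as an exact (not analyzed) value.

```python
from collections import defaultdict

# Hypothetical event-centric records: (user_id, event_type, timestamp).
events = [
    ("u1", "signup", 1), ("u1", "search", 2), ("u1", "purchase", 3),
    ("u2", "purchase", 1), ("u2", "signup", 2), ("u2", "search", 3),
    ("u3", "signup", 1), ("u3", "purchase", 2),
]


def build_user_docs(events):
    """Group events per user, roughly what a user-centric document would hold."""
    docs = defaultdict(list)
    for user_id, event_type, ts in events:
        docs[user_id].append({"type": event_type, "ts": ts})
    for user_events in docs.values():
        user_events.sort(key=lambda e: e["ts"])  # keep events in time order
    return dict(docs)


def required_events_query(required):
    """Query body matching user documents that contain every required type.

    Assumes a hypothetical field `events.type` holding exact values.
    """
    return {
        "query": {
            "bool": {"must": [{"term": {"events.type": t}} for t in required]}
        }
    }


def has_all_events(user_events, required):
    """In-memory stand-in for the query above."""
    return set(required) <= {e["type"] for e in user_events}


def occurs_in_order(user_events, required):
    """True if the required types appear as a subsequence of the sorted events."""
    remaining = iter(e["type"] for e in user_events)
    # `t in remaining` consumes the iterator up to the first match,
    # so each required type must occur after the previous one.
    return all(t in remaining for t in required)


required = ["signup", "search", "purchase"]
docs = build_user_docs(events)

# Step 1: cheap filter, users that have all required events (u1 and u2).
candidates = sorted(u for u, ev in docs.items() if has_all_events(ev, required))

# Step 2: verify the order only for those candidates (only u1 passes).
matches = [u for u in candidates if occurs_in_order(docs[u], required)]
print(candidates, matches)
```

The point of the split is that the expensive order check only runs on the users that survive the simple filter.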

As updating the user-centric index for every inserted event can be expensive, a common approach is to create a batch job that periodically retrieves all new events, aggregates them per user and updates the user index. This means the user index will not be completely up to date at all times, but because the processing work is spread out, it can make queries much more efficient.

[1] https://www.elastic.co/videos/entity-centric-indexing-london-meetup-sep-2014

Best regards,

Christian

On Sunday, 3 May 2015 10:21:35 UTC+1, Lior Goldemberg wrote:
>
> hi,
>
> i have few basic questions about es-hadoop,
> and i would really appreciate your kind help
>
> 1. if i have currently ES cluster, do i have motivation to add hadoop
> layer?
>
> 2. is the idea of ES-hadoop, that hadoop will be the data store, and ES
> the search engine above it?
>
> 3. can logstash write to hadoop?
>
> 4. when i run queries to ES, does it go to HDFS in real time?
>
> thanks a lot!
>