Hi, I have about a billion records across 20 nodes and would like to run a custom map/reduce or "aggregation" (word count, sentiment analysis, etc.) immediately after the ES result set is determined.
I first considered using the plugin system to build a custom aggregation, along the lines of https://github.com/algolia/elasticsearch-cardinality-plugin/tree/1.0.X/src/main/java/org/alg/elasticsearch/search/aggregations/cardinality, but I need to update the jar quite often, which would eventually require ES to be reloaded. I then looked at the scripted metric aggregation (http://www.elasticsearch.org/guide/en/elasticsearch/reference/1.4/search-aggregations-metrics-scripted-metric-aggregation.html), but I was not sure about its memory usage or how far it can be customized.

So I decided to run Hazelcast or Spark on the same nodes (or the same JVM) and use their map/reduce frameworks instead. I use the filter phase to push the ES data out, like this: https://github.com/medcl/elasticsearch-filter-redis/blob/master/src/main/java/org/elasticsearch/index/query/RedisFilterParser.java#L121, but it takes quite a long time to load the data into those in-memory middlewares.

Is there a best practice for copying ES data into in-memory middleware, so the same data can be reused efficiently by subsequent programs? I don't think I can use the ES query result set (which seems to live in memory on each shard) directly from my program. Am I right?

Thanks,
Haji
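For context, here is a rough sketch of what I had in mind for the word-count case using the scripted metric aggregation, assuming Groovy scripting is enabled and an analyzed `body` field (the field name and script details are just illustrative, not tested at scale; my memory concern is about what `_agg.counts` grows to on each shard):

```json
{
  "size": 0,
  "aggs": {
    "word_count": {
      "scripted_metric": {
        "init_script": "_agg.counts = [:]",
        "map_script": "for (t in doc['body'].values) { _agg.counts[t] = (_agg.counts[t] ?: 0) + 1 }",
        "combine_script": "return _agg.counts",
        "reduce_script": "merged = [:]; for (c in _aggs) { c.each { k, v -> merged[k] = (merged[k] ?: 0) + v } }; return merged"
      }
    }
  }
}
```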