Hi,

I work on a complex workflow using Spark (parsing, cleaning, machine
learning, ...).
At the end of the workflow I want to send aggregated results to
Elasticsearch so my portal can query the data.
There will be two kinds of processing: streaming, and the possibility
to relaunch the workflow on all available data.

Right now I use elasticsearch-hadoop, and in particular its Spark
support, to send documents to Elasticsearch with the
saveJsonToEs("myindex/mytype") method.
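
For reference, the write currently looks roughly like this (the index
and type names and the es.nodes value are placeholders):

import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._

// Sketch of the current write; "myindex"/"mytype" are placeholders
// and es.nodes points at our cluster.
val sparkConf = new SparkConf()
  .setAppName("workflow-to-es")
  .set("es.nodes", "localhost:9200")
val sc = new SparkContext(sparkConf)

// Each element is an already-serialized JSON document.
val docs = sc.parallelize(Seq(
  """{"date":"2015-03-12","value":42}""",
  """{"date":"2015-03-13","value":7}"""
))

// elasticsearch-hadoop takes a single "index/type" resource string.
docs.saveJsonToEs("myindex/mytype")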
The target is to have one index per day, created from the index
template that we built.
AFAIK elasticsearch-hadoop does not let you use a field of a document
to decide which index it should be sent to.
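
Ideally the resource string could reference a field of each document,
something along these lines (just a sketch of the behaviour I am
after, not an API I know to exist):

// Hypothetical: {date} would be replaced, per document, by the value
// of its "date" field, giving daily indices like "myindex-2015-03-12".
docs.saveJsonToEs("myindex-{date}/mytype")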

What is the proper way to implement this feature?
Should I add a dedicated step that uses Spark and the bulk API, so
that each executor sends its documents to the proper index based on
the relevant field of each line?
Or is there something I missed in elasticsearch-hadoop?
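
To make the bulk idea concrete, here is a rough sketch of what I
mean, reusing the docs RDD from above. It assumes the plain REST
_bulk endpoint on localhost:9200 and a "date" field in every
document; the regex is a stand-in for a real JSON parser:

import java.net.{HttpURLConnection, URL}
import java.nio.charset.StandardCharsets

docs.foreachPartition { partition =>
  // Naive extraction of the routing field; a real implementation
  // would use a JSON library rather than a regex.
  val datePattern = "\"date\"\\s*:\\s*\"([^\"]+)\"".r

  val bulkBody = new StringBuilder
  partition.foreach { doc =>
    val date =
      datePattern.findFirstMatchIn(doc).map(_.group(1)).getOrElse("unknown")
    // One action line per document, routing it to its daily index,
    // followed by the document itself.
    bulkBody.append(s"""{"index":{"_index":"myindex-$date","_type":"mytype"}}""")
    bulkBody.append('\n').append(doc).append('\n')
  }

  if (bulkBody.length > 0) {
    val conn = new URL("http://localhost:9200/_bulk")
      .openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("POST")
    conn.setDoOutput(true)
    val out = conn.getOutputStream
    out.write(bulkBody.toString.getBytes(StandardCharsets.UTF_8))
    out.close()
    // Drain the response so the request completes; real code would
    // inspect it for per-document failures.
    scala.io.Source.fromInputStream(conn.getInputStream).mkString
    conn.disconnect()
  }
}

The downside is that this reimplements the batching and error
handling that elasticsearch-hadoop already provides, which is why I
would prefer a built-in option if one exists.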

Julien
