I think I have a solution: build the JSON documents directly in the _bulk format, so they can be sent straight to the _bulk endpoint, e.g. saveJsonToEs("_bulk").
Not sure whether it will be efficient or even work; I'll try.

On Thursday, January 15, 2015 at 4:17:57 PM UTC+1, Julien Naour wrote:
> Hi,
>
> I work on a complex workflow using Spark (parsing, cleaning, machine
> learning...).
> At the end of the workflow I want to send aggregated results to
> Elasticsearch so my portal can query the data.
> There will be two types of processing: streaming, and the possibility to
> relaunch the workflow on all available data.
>
> Right now I use elasticsearch-hadoop, and in particular its Spark support,
> to send documents to Elasticsearch with the saveJsonToEs(myindex, mytype)
> method.
> The goal is to have one index per day, using the proper template that we
> built.
> AFAIK, elasticsearch-hadoop cannot take a field of a document into account
> in order to send it to the proper index.
>
> What is the proper way to implement this feature?
> Have a dedicated step using Spark and the bulk API, so that each executor
> sends documents to the proper index based on the field of each line?
> Or is there something that I missed in elasticsearch-hadoop?
>
> Julien
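For reference, the idea above relies on the _bulk request format: newline-delimited JSON where each document source line is preceded by an action line naming its target index, so a single request can fan documents out across daily indices. A minimal sketch of building such a payload (the "events-" index prefix, the "logs" type, and the document fields are hypothetical; in the real workflow each Spark executor would build and send its own payload):

```python
import json

def to_bulk_lines(docs, doc_type="logs"):
    """Build an Elasticsearch _bulk body: for each document, an action
    line naming its target index, followed by the document source."""
    lines = []
    for doc in docs:
        # Route each document to a daily index derived from its own date field.
        index = "events-" + doc["date"]
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    # The _bulk body must end with a trailing newline.
    return "\n".join(lines) + "\n"

docs = [
    {"date": "2015-01-15", "user": "julien"},
    {"date": "2015-01-16", "user": "spark"},
]
payload = to_bulk_lines(docs)
```

The resulting string could be POSTed to the _bulk endpoint per partition; whether routing it through saveJsonToEs instead behaves as hoped is exactly what remains to be tested.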