I implemented a solution for my problem.
I use foreachPartition and instantiate a bulk processor backed by a
transport client (i.e. one per partition) to send the documents.
It's not fast, but it works.
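For reference, a minimal sketch of that step against the Elasticsearch 1.x
Java API. The host, port, index prefix, type, and the (day, json) shape of
the RDD are all assumptions on my side:

import org.apache.spark.rdd.RDD
import org.elasticsearch.action.bulk.{BulkProcessor, BulkRequest, BulkResponse}
import org.elasticsearch.action.index.IndexRequest
import org.elasticsearch.client.transport.TransportClient
import org.elasticsearch.common.transport.InetSocketTransportAddress

// Assumed shape: one (day, jsonDocument) pair per result line.
def saveToDailyIndices(rdd: RDD[(String, String)]): Unit = {
  rdd.foreachPartition { docs =>
    // One client and one bulk processor per partition, created on the
    // executor: neither TransportClient nor BulkProcessor is serializable.
    val client = new TransportClient()
      .addTransportAddress(new InetSocketTransportAddress("es-host", 9300))
    val bulk = BulkProcessor.builder(client, new BulkProcessor.Listener {
      override def beforeBulk(id: Long, req: BulkRequest): Unit = ()
      override def afterBulk(id: Long, req: BulkRequest, resp: BulkResponse): Unit = ()
      override def afterBulk(id: Long, req: BulkRequest, failure: Throwable): Unit =
        failure.printStackTrace()
    }).setBulkActions(1000) // send a bulk request every 1000 documents
      .build()

    docs.foreach { case (day, json) =>
      // Route each document to its daily index, e.g. "myindex-2015.01.15".
      bulk.add(new IndexRequest("myindex-" + day, "mytype").source(json))
    }

    bulk.close()  // flushes the last partial bulk
    client.close()
  }
}

Creating the client inside foreachPartition matters: it cannot be created on
the driver and shipped to the executors.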
Does anyone have an idea for making this more efficient?

Julien

On Thursday, January 15, 2015 at 4:40:22 PM UTC+1, Julien Naour wrote:
>
> My previous idea doesn't seem to work: you cannot send documents directly 
> to _bulk, only to an "index/type" pattern.
>
> On Thursday, January 15, 2015 at 4:17:57 PM UTC+1, Julien Naour wrote:
>>
>> Hi,
>>
>> I work on a complex workflow using Spark (parsing, cleaning, machine 
>> learning...).
>> At the end of the workflow I want to send the aggregated results to 
>> Elasticsearch so that my portal can query the data.
>> There will be two kinds of processing: streaming, and the possibility to 
>> relaunch the workflow on all available data.
>>
>> Right now I use elasticsearch-hadoop, and in particular its Spark support, 
>> to send documents to Elasticsearch with the saveJsonToEs(myindex, mytype) 
>> method (a minimal usage sketch follows the quote).
>> The target is to have one index per day, using the proper template that 
>> we built.
>> AFAIK you cannot take a field of a document into account to route it to 
>> the proper index in elasticsearch-hadoop.
>>
>> What is the proper way to implement this feature?
>> Should there be a dedicated step using Spark and the bulk API, so that 
>> each executor sends documents to the proper index according to the 
>> relevant field of each line?
>> Is there something I missed in elasticsearch-hadoop?
>>
>> Julien
>>
>
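For completeness, a minimal sketch of the saveJsonToEs step described in the
quote above. The application name, host, input path, index, and type are
placeholders, and it assumes the aggregated results are already serialized
as one JSON document per line:

import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._ // adds saveJsonToEs to RDDs of JSON strings

val conf = new SparkConf()
  .setAppName("my-workflow")
  .set("es.nodes", "es-host:9200") // assumption: the cluster's HTTP endpoint
val sc = new SparkContext(conf)

// Assumption: one already-serialized JSON document per line.
val jsonDocs = sc.textFile("hdfs:///path/to/aggregated-results")
jsonDocs.saveJsonToEs("myindex/mytype") // one fixed index/type for the whole RDD

The resource string is fixed for the whole RDD, which is the limitation
discussed above: the index cannot be derived per document from one of its
fields.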
