I implemented a solution for my problem. I use foreachPartition and instantiate a BulkProcessor with a TransportClient (i.e., one per partition) to send the documents. It's not fast, but it works. Does anybody have an idea for making it more efficient?
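For context, here is a minimal sketch of that per-partition batching logic in plain Scala. The `sendBulk` stub and the batch size are assumptions standing in for the actual BulkProcessor/TransportClient calls, not the real client API:

```scala
// Sketch of the per-partition bulk pattern: each Spark partition
// accumulates documents and flushes them in fixed-size batches,
// mimicking what a BulkProcessor does over a TransportClient.
object BulkSketch {
  // Hypothetical stand-in for the transport client's bulk request.
  def sendBulk(docs: Seq[String]): Unit = ()

  // Split an iterator of JSON documents into bulk-sized batches and
  // flush each one, as foreachPartition would do on each executor.
  // Returns the number of bulk requests issued.
  def processPartition(docs: Iterator[String], bulkSize: Int): Int = {
    var batches = 0
    docs.grouped(bulkSize).foreach { batch =>
      sendBulk(batch)
      batches += 1
    }
    batches
  }
}
```

Inside `rdd.foreachPartition`, the iterator Spark hands you would be passed straight to `processPartition`, with the client created once per partition and closed afterwards.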
Julien

On Thursday, January 15, 2015 at 4:40:22 PM UTC+1, Julien Naour wrote:
>
> My previous idea doesn't seem to work. Documents cannot be sent directly to _bulk, only to an "index/type" pattern.
>
> On Thursday, January 15, 2015 at 4:17:57 PM UTC+1, Julien Naour wrote:
>>
>> Hi,
>>
>> I work on a complex workflow using Spark (parsing, cleaning, machine learning...).
>> At the end of the workflow I want to send aggregated results to Elasticsearch so my portal can query the data.
>> There will be two types of processing: streaming, and the possibility to relaunch the workflow on all available data.
>>
>> Right now I use elasticsearch-hadoop, and in particular its Spark support, to send documents to Elasticsearch with the saveJsonToEs(myindex, mytype) method.
>> The target is to have an index per day, using the proper template that we build.
>> AFAIK elasticsearch-hadoop does not let you take a field of a document into account to route it to the proper index.
>>
>> What is the proper way to implement this feature?
>> A special step using Spark and the bulk API, so that each executor sends documents to the proper index according to the relevant field of each line?
>> Or is there something that I missed in elasticsearch-hadoop?
>>
>> Julien
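The routing step described above (each executor picking the target index from a field of each line) can be sketched in plain Scala. The index-name scheme and the use of a day string as the routing field are assumptions for illustration:

```scala
// Sketch of per-document index routing: derive the daily index name
// from a field of the document, then group by target index so that
// each bulk request goes to a single "index/type" resource.
object IndexRouting {
  // Hypothetical naming scheme: one index per day, following the
  // template mentioned in the thread (the exact name is assumed).
  def indexFor(day: String): String = s"myindex-$day/mytype"

  // Group (day, jsonDoc) pairs by their target index, as an executor
  // would do before issuing one bulk request per daily index.
  def routeByIndex(docs: Seq[(String, String)]): Map[String, Seq[String]] =
    docs.groupBy { case (day, _) => indexFor(day) }
        .map { case (idx, pairs) => idx -> pairs.map(_._2) }
}
```

As a side note, elasticsearch-hadoop also supports dynamic resource patterns such as `saveJsonToEs("myindex-{date}/mytype")`, which route each document by its `date` field; if that feature is available in your version, it may avoid the manual bulk step entirely.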