I think I have a solution: build the JSON documents directly in the _bulk format, so they can be sent straight to the _bulk endpoint, e.g. saveJsonToEs("_bulk").
Not sure whether it will be efficient or even work; I'll try.

On Thursday, January 15, 2015 at 4:17:57 PM UTC+1, Julien Naour wrote:
> Hi,
>
> I work on a complex workflow using Spark (parsing, cleaning, machine
> learning...).
> At the end of the workflow I want to send aggregated results to
> Elasticsearch so my portal can query the data.
> There will be two types of processing: streaming, and the possibility to
> relaunch the workflow on all available data.
>
> Right now I use elasticsearch-hadoop, and in particular its Spark support,
> to send documents to Elasticsearch with the saveJsonToEs(myindex, mytype)
> method.
> The goal is to have one index per day, using the proper template that we
> built.
> AFAIK, elasticsearch-hadoop cannot take a field of a document into account
> in order to send it to the proper index.
>
> What is the proper way to implement this feature?
> Have a dedicated step using Spark and the bulk API, so that each executor
> sends documents to the proper index based on the field of each line?
> Or is there something that I missed in elasticsearch-hadoop?
>
> Julien
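For reference, the idea above relies on the _bulk request format: newline-delimited JSON where each document source line is preceded by an action line naming its target index, so a single request can fan documents out across daily indices. A minimal sketch of building such a payload (the "events-" index prefix, the "logs" type, and the document fields are hypothetical; in the real workflow each Spark executor would build and send its own payload):

```python
import json

def to_bulk_lines(docs, doc_type="logs"):
    """Build an Elasticsearch _bulk body: for each document, an action
    line naming its target index, followed by the document source."""
    lines = []
    for doc in docs:
        # Route each document to a daily index derived from its own date field.
        index = "events-" + doc["date"]
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    # The _bulk body must end with a trailing newline.
    return "\n".join(lines) + "\n"

docs = [
    {"date": "2015-01-15", "user": "julien"},
    {"date": "2015-01-16", "user": "spark"},
]
payload = to_bulk_lines(docs)
```

The resulting string could be POSTed to the _bulk endpoint per partition; whether routing it through saveJsonToEs instead behaves as hoped is exactly what remains to be tested.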