Well, you can use any async HTTP library to do it, but I wouldn't recommend it, since putting it all back together (to see which documents failed to index, etc.) might be difficult. You can always just have a couple of threads, each running streaming_bulk, reading from a queue and writing the results to another queue; that should be fairly easy to do in your code.
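The threads-plus-queues approach above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: `run_workers` and the sentinel handling are our own; the `consume` callable stands in for something like elasticsearch-py's `helpers.streaming_bulk` bound to a client.

```python
# Sketch: N worker threads drain an input queue of bulk actions and push
# per-document results to an output queue, as suggested above.
import queue
import threading

_SENTINEL = object()  # tells a worker there is no more input


def run_workers(consume, actions, num_workers=2):
    """Feed `actions` to `num_workers` threads; each thread calls
    `consume(iterable_of_actions)`, which must yield one result per
    action (as helpers.streaming_bulk does). Yields results as they
    arrive, hiding the chunking entirely."""
    in_q, out_q = queue.Queue(), queue.Queue()

    def iter_queue():
        # Generator view over the input queue; stops at the sentinel.
        while True:
            item = in_q.get()
            if item is _SENTINEL:
                return
            yield item

    def worker():
        for result in consume(iter_queue()):
            out_q.put(result)
        out_q.put(_SENTINEL)  # signal this worker is done

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for action in actions:
        in_q.put(action)
    for _ in threads:
        in_q.put(_SENTINEL)  # one stop signal per worker

    done = 0
    while done < num_workers:
        result = out_q.get()
        if result is _SENTINEL:
            done += 1
        else:
            yield result
    for t in threads:
        t.join()
```

With elasticsearch-py you would pass something like `functools.partial(helpers.streaming_bulk, es)` as `consume`, and each yielded result would be the usual `(ok, info)` pair per document.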
On Sun, Mar 2, 2014 at 7:17 PM, <euneve...@gmail.com> wrote:
> Hey, thanks.
>
> So is there a convenient way to call bulk (helpers.bulk or
> helpers.streaming_bulk) asynchronously, in a way that means the client
> isn't waiting for the request to complete?
>
>
> On Sunday, March 2, 2014 5:51:15 PM UTC, Honza Král wrote:
>>
>> Hi,
>>
>> the streaming_bulk function in elasticsearch-py is a helper that will
>> actually split the stream of documents into chunks and send them to
>> Elasticsearch - it does not stream all documents to ES as a single
>> request. It is impossible (due to the nature of bulk requests) for
>> Elasticsearch to consume an arbitrary number of documents in a single
>> request, so this helper was created to give you that abstraction.
>>
>> The difference between bulk and streaming_bulk is in the way the
>> result is returned - bulk will just return statistics/errors, while
>> streaming_bulk is a generator that keeps yielding results per
>> document, thus completely hiding the fact that the stream is being
>> sent to ES in chunks.
>>
>> Hope this helps,
>> Honza
>>
>> On Sun, Mar 2, 2014 at 6:46 PM, <eune...@gmail.com> wrote:
>> > Hi,
>> >
>> > I have a similar question as the OP: what is the best way to get 1M
>> > or 30M records indexed?
>> > I mean, I can send client.bulk batches of records, but while the
>> > request is being indexed the client is waiting: valuable seconds.
>> >
>> > I have also tried python elasticsearch-py, which has helpers.bulk
>> > and helpers.streaming_bulk. Looking at the source code I can see
>> > that helpers.bulk calls helpers.streaming_bulk - so is it the same
>> > thing, i.e. should I continue to call helpers.bulk? Or what is the
>> > difference?
>> >
>> > Thanks,
>> >
>> >
>> > On Wednesday, January 8, 2014 6:02:25 PM UTC, Jörg Prante wrote:
>> >>
>> >> You are correct, ES nodes consume data request by request, before
>> >> it is passed on through the cluster.
>> >> The same applies to bulk indexing requests: such requests are
>> >> temporarily pushed to buffers, but they are split by lines and
>> >> executed as single actions.
>> >>
>> >> So to reduce network round trips, the best thing is to use the
>> >> bulk API. What is left is a few percent to optimize, which is not
>> >> worth much effort. With gzip, ES HTTP provides transparent
>> >> compression. The main challenge is HTTP overhead (headers can't be
>> >> compressed), and base64, if you use binary data with ES.
>> >>
>> >> Please note that you must evaluate the bulk responses too, in
>> >> order to validate the per-document success notifications.
>> >>
>> >> It would also be possible to extend the whole ES API to WebSocket,
>> >> so besides JSON text frames, it could transfer SMILE/binary frames
>> >> on a single bi-directional channel. HTTP must use two channels for
>> >> this, so with WebSocket you can reduce connection resources by
>> >> half. In this sense, the Netty channel / REST / Java API could be
>> >> extended for special realtime WS streaming applications, such as
>> >> pub/sub. I experimented with that some time ago on ES 0.20:
>> >> https://github.com/jprante/elasticsearch-transport-websocket
>> >> (needs updating)
>> >>
>> >> From what I understand, the thrift transport plugin compiles the
>> >> ES API, operates in a streaming-like fashion, and provides a
>> >> solution that reduces HTTP overhead:
>> >> https://github.com/elasticsearch/elasticsearch-transport-thrift
>> >>
>> >> Jörg
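Jörg's point about evaluating the bulk responses can be sketched as a small helper that walks the `items` array of a bulk API response and collects the actions that failed. The response shape follows the documented bulk API format; the helper name `failed_actions` is our own illustration, not part of any library.

```python
# Sketch: collect per-document failures from a bulk API response body.
def failed_actions(bulk_response):
    """Return (op_type, item) pairs for every action that did not succeed."""
    failures = []
    for item in bulk_response.get("items", []):
        # Each item is a single-key dict keyed by the op type,
        # e.g. {"index": {...}} or {"create": {...}}.
        op_type, result = next(iter(item.items()))
        status = result.get("status", 500)
        if not 200 <= status < 300:
            failures.append((op_type, result))
    return failures
```

Even when the HTTP request as a whole returns 200, individual actions inside it can fail (the top-level `errors` flag only tells you that at least one did), so checking each item like this is the only way to know which documents need to be retried.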