Well, you can use any async HTTP library to do it, but I wouldn't
recommend it, since putting it all back together might be difficult (to
see which documents failed to index etc.). You can always just have a
couple of threads, each running streaming_bulk, reading from a Queue
and writing the results to another Queue; that should be fairly easy to
do in your code.
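
For illustration, here is a rough, untested sketch of that approach - the
index name, document shape, and worker count are just placeholders:

from threading import Thread
try:
    import queue           # Python 3
except ImportError:
    import Queue as queue  # Python 2

from elasticsearch import Elasticsearch
from elasticsearch.helpers import streaming_bulk

NUM_WORKERS = 4
doc_queue = queue.Queue(maxsize=10000)  # actions waiting to be indexed
result_queue = queue.Queue()            # per-document (ok, result) tuples

def iter_queue(q):
    # Expose the queue as an iterator; a None sentinel stops the worker.
    while True:
        item = q.get()
        if item is None:
            return
        yield item

def worker():
    es = Elasticsearch()
    # streaming_bulk chunks the actions itself and yields one result per doc;
    # raise_on_error=False so failures end up in result_queue instead of raising.
    for ok, result in streaming_bulk(es, iter_queue(doc_queue),
                                     chunk_size=500, raise_on_error=False):
        result_queue.put((ok, result))

threads = [Thread(target=worker) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()

# Producer side: feed actions, then one sentinel per worker thread.
for i in range(100000):
    doc_queue.put({"_index": "my-index", "_type": "doc", "_id": i, "value": i})
for _ in threads:
    doc_queue.put(None)

for t in threads:
    t.join()

Afterwards you can drain result_queue to see which documents (if any)
failed to index.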

On Sun, Mar 2, 2014 at 7:17 PM,  <euneve...@gmail.com> wrote:
> Hey thanks,
>
> So is there a convenient way to call bulk (helpers.bulk or
> helpers.streaming_bulk) asynchronously, so that the client isn't waiting
> for the request to complete?
>
>
> On Sunday, March 2, 2014 5:51:15 PM UTC, Honza Král wrote:
>>
>> Hi,
>>
>> The streaming_bulk function in elasticsearch-py is a helper that will
>> actually split the stream of documents into chunks and send them to
>> elasticsearch - it does not stream all documents to ES as a single
>> request. It is impossible (due to the nature of bulk requests) for
>> elasticsearch to consume an arbitrary number of documents in a single
>> request, so this helper was created to give you that abstraction.
>>
>> The difference between bulk and streaming_bulk is in how the results
>> are returned - bulk just returns aggregate statistics/errors, while
>> streaming_bulk is a generator that keeps yielding a result per
>> document, thus completely hiding the fact that the stream is being
>> sent to ES in chunks.
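>>
>> As a rough illustration (just a sketch - the index name and documents
>> are made up):
>>
>> from elasticsearch import Elasticsearch
>> from elasticsearch.helpers import streaming_bulk
>>
>> es = Elasticsearch()
>> actions = ({"_index": "my-index", "_type": "doc", "value": i}
>>            for i in range(100000))
>>
>> # sent to ES in chunks of 500, but yields one (ok, result) per document
>> for ok, result in streaming_bulk(es, actions, chunk_size=500,
>>                                  raise_on_error=False):
>>     if not ok:
>>         print("failed: %r" % (result,))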
>>
>> Hope this helps,
>> Honza
>>
>> On Sun, Mar 2, 2014 at 6:46 PM,  <eune...@gmail.com> wrote:
>> > Hi,
>> >
>> > I have a similar question to the OP's: what is the best way to get 1m
>> > or 30m records indexed?
>> > I can send batches of records with client.bulk, but while the request
>> > is being indexed the client is waiting: valuable seconds.
>> >
>> > Also, I have tried the Python elasticsearch-py client, and there are
>> > helpers.bulk and helpers.streaming_bulk.
>> > Looking at the source code I can see that helpers.bulk calls
>> > helpers.streaming_bulk, so are they the same thing? I.e. should I
>> > continue to call helpers.bulk? Or what is the difference?
>> >
>> > Thanks,
>> >
>> >
>> > On Wednesday, January 8, 2014 6:02:25 PM UTC, Jörg Prante wrote:
>> >>
>> >> You are correct, ES nodes consume data request by request, before it
>> >> is passed on through the cluster. Bulk indexing requests are also
>> >> temporarily pushed to buffers, but they are split by lines and
>> >> executed as single actions.
>> >>
>> >> So, to reduce network roundtrips, the best thing is to use the bulk
>> >> API. What is left is a few percent to optimize, which is not worth
>> >> much. With gzip, ES HTTP provides transparent compression. The main
>> >> challenge is HTTP overhead (headers can't be compressed), and base64,
>> >> if you use binary data with ES.
>> >>
>> >> Please note that you must also evaluate the bulk responses, in order
>> >> to verify success on the individual document level.
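>> >>
>> >> For example, with the Python client the raw bulk response can be
>> >> checked item by item (just a sketch, not tested; bulk_body stands
>> >> for whatever bulk payload you built):
>> >>
>> >> response = es.bulk(body=bulk_body)
>> >> for item in response["items"]:
>> >>     # each item is keyed by its action type, e.g. "index" or "create"
>> >>     for action, details in item.items():
>> >>         if "error" in details:
>> >>             print("%s of %s failed: %s"
>> >>                   % (action, details.get("_id"), details["error"]))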
>> >>
>> >> It is also possible to extend the whole ES API to WebSocket, so
>> >> besides JSON over HTTP, it would be possible to transfer JSON text
>> >> frames or SMILE/binary frames over a single bi-directional channel.
>> >> HTTP must use two channels for this, so with WebSocket you can cut
>> >> the connection resources in half. In this sense, the Netty channel /
>> >> REST / Java API could be extended for special realtime WS streaming
>> >> applications, like pub/sub. I experimented with that some time ago on
>> >> ES 0.20: https://github.com/jprante/elasticsearch-transport-websocket
>> >> (needs updating).
>> >>
>> >> From what I understand, the thrift transport plugin compiles the ES
>> >> API, operates in a streaming-like fashion, and provides a solution
>> >> that reduces the HTTP overhead:
>> >> https://github.com/elasticsearch/elasticsearch-transport-thrift
>> >>
>> >> Jörg
>> >>
>
