Not sure if I understand.

If I had to index a pile of documents, say 15M, I would build bulk requests
of 1000 documents each. With an average doc size of ~1K, each request ends
up at ~1MB. I would not care about varying doc sizes, as they even out over
the total amount. Then I send each bulk request over the wire. With a
threaded bulk feeder, I can limit the number of concurrent bulk requests to,
for example, the number of CPU cores, say 32. Then repeat. In total, I send
15K bulk requests.
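
To make the arithmetic concrete, here is a rough sketch in plain Java. It
does not touch the Elasticsearch client at all; the class and method names
(BulkMath, bulkRequests, bulkBytes) are made up for illustration, and 1K is
taken as 1024 bytes.

```java
class BulkMath {
    // Number of bulk requests needed for totalDocs documents (ceiling division).
    static long bulkRequests(long totalDocs, int docsPerBulk) {
        return (totalDocs + docsPerBulk - 1) / docsPerBulk;
    }

    // Approximate payload of one bulk request in bytes, given an average doc size.
    static long bulkBytes(int docsPerBulk, int avgDocBytes) {
        return (long) docsPerBulk * avgDocBytes;
    }

    public static void main(String[] args) {
        // 15M docs in bulks of 1000 docs
        System.out.println(bulkRequests(15_000_000L, 1000)); // prints 15000
        // 1000 docs of ~1K each, roughly 1MB per request
        System.out.println(bulkBytes(1000, 1024));           // prints 1024000
    }
}
```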

The effect is that on the ES cluster, each bulk request of ~1MB allocates
only a few resources on the heap, so it can be processed fast. If the
cluster is slow, the client sees the ongoing bulk requests piling up before
bulk responses are returned, and can throttle itself by holding back new
bulk requests once a maximum concurrency limit is reached. If the cluster is
fast, the client receives responses almost instantly, and can decide whether
it is more appropriate to increase the bulk request size or the concurrency.
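
The concurrency limit can be sketched like this. It is only an illustration
of the idea, not the code of any actual bulk feeder: BoundedBulkFeeder is a
made-up name, documents are plain strings, and the "sender" is a
hypothetical stand-in for whatever really executes the bulk request and
waits for its response. A semaphore blocks the producer as soon as
maxConcurrent requests are outstanding, so a slow cluster slows the client
down instead of piling up requests.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class BoundedBulkFeeder {
    private final int maxConcurrent;
    private final Semaphore permits;
    private final ExecutorService pool;
    private final Consumer<List<String>> sender; // hypothetical: sends one bulk, returns on response

    BoundedBulkFeeder(int maxConcurrent, Consumer<List<String>> sender) {
        this.maxConcurrent = maxConcurrent;
        this.permits = new Semaphore(maxConcurrent);
        this.pool = Executors.newFixedThreadPool(maxConcurrent);
        this.sender = sender;
    }

    // Blocks the caller while maxConcurrent bulk requests are still outstanding.
    void submit(List<String> bulk) throws InterruptedException {
        permits.acquire();
        pool.execute(() -> {
            try {
                sender.accept(bulk);
            } finally {
                permits.release(); // a response (or a failure) frees one slot
            }
        });
    }

    // Waits until every outstanding request has finished, then stops the pool.
    void close() throws InterruptedException {
        permits.acquire(maxConcurrent);
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }

    // Splits docs into bulks of bulkSize and feeds them; returns the number of bulks sent.
    static int feed(List<String> docs, int bulkSize, int maxConcurrent,
                    Consumer<List<String>> sender) {
        BoundedBulkFeeder feeder = new BoundedBulkFeeder(maxConcurrent, sender);
        int sent = 0;
        try {
            for (int from = 0; from < docs.size(); from += bulkSize) {
                int to = Math.min(from + bulkSize, docs.size());
                feeder.submit(new ArrayList<>(docs.subList(from, to)));
                sent++;
            }
            feeder.close();
        } catch (InterruptedException e) {
            feeder.pool.shutdownNow();
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while feeding bulks", e);
        }
        return sent;
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            docs.add("{\"id\":" + i + "}");
        }
        // The sender here only prints the bulk size; a real one would call the cluster.
        int sent = feed(docs, 1000, 4, bulk -> System.out.println("bulk of " + bulk.size()));
        System.out.println("bulk requests sent: " + sent);
    }
}
```

Bulk size and maxConcurrent are the two knobs mentioned above: if responses
come back almost instantly, either one can be raised.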

Does it make sense?

Jörg




On Mon, Feb 3, 2014 at 5:06 PM, ZenMaster80 <sabdall...@gmail.com> wrote:

> Jörg,
>
> Just so I understand this: if I were to index 100 MB worth of data in total
> with chunk volumes of 5 MB each, I would have to index 20 times. If I
> were to set the bulk size to 20 MB, I would have to index 5 times.
> This is a small data size; picture that I have millions of documents. Are
> you saying the first method is better because GC operations would be faster?
>
> Thanks again
>
>
> On Monday, February 3, 2014 9:47:46 AM UTC-5, Jörg Prante wrote:
>>
>> Note, bulk operates just on network transport level, not on index level
>> (there are no transactions or chunks). Bulk saves network roundtrips, while
>> the execution of index operations is essentially the same as if you
>> transferred the operations one by one.
>>
>> To change refresh interval to -1, use an update settings request like
>> this:
>>
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-update-settings.html
>>
>>         ImmutableSettings.Builder settingsBuilder =
>>                 ImmutableSettings.settingsBuilder();
>>         // -1 disables the periodic refresh while bulk indexing
>>         settingsBuilder.put("refresh_interval", "-1");
>>         UpdateSettingsRequest updateSettingsRequest =
>>                 new UpdateSettingsRequest(myIndexName)
>>                         .settings(settingsBuilder);
>>         client.admin().indices()
>>                 .updateSettings(updateSettingsRequest)
>>                 .actionGet();
>>
>> Jörg
>>
>>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/531710e5-e42a-4ed1-a1e0-ad5d48e14146%40googlegroups.com
> .
>
> For more options, visit https://groups.google.com/groups/opt_out.
>

