The absolute time taken depends on the cluster resources, of course. On my
laptop, for 1000 docs of ~1k size on average, the 'took' field of a scroll
response usually shows ~200-500ms. Processing the response hits takes
additional time.
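
To get a feeling for what these numbers mean for a complete export, here is a
rough back-of-the-envelope sketch. It only multiplies the number of scroll
responses by the 'took' value, so it is a lower bound that ignores hit
processing and network transfer. The class name, the method name, and the
index size of one million docs are made up for illustration; only the 1000
docs per response and the 200-500ms range come from the measurement above.

```java
class ScrollEstimate {

    // Lower-bound estimate in milliseconds: number of scroll responses
    // times the 'took' value of a single response.
    static long estimateMillis(long totalDocs, long docsPerResponse, long tookMillis) {
        if (totalDocs < 0 || docsPerResponse <= 0 || tookMillis < 0) {
            throw new IllegalArgumentException("arguments must be non-negative, docsPerResponse > 0");
        }
        // Ceiling division: a partially filled last response still costs one request.
        long responses = (totalDocs + docsPerResponse - 1) / docsPerResponse;
        return responses * tookMillis;
    }

    public static void main(String[] args) {
        long totalDocs = 1_000_000L; // hypothetical index size, not from the measurement
        System.out.println(estimateMillis(totalDocs, 1000, 200) / 1000 + " s"); // prints "200 s"
        System.out.println(estimateMillis(totalDocs, 1000, 500) / 1000 + " s"); // prints "500 s"
    }
}
```

So on comparable hardware, a million docs would need somewhere above 3 to 8
minutes of pure scroll time, plus whatever the client spends on the hits.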

I am not sure the number of shards is relevant. There are more important
factors: number of shards per node, shard size, buffers and heap memory,
network compression, network speed, node workload...

If you are interested in a Java scan/scroll example, you can peek into the
knapsack plugin source:

https://github.com/jprante/elasticsearch-knapsack/blob/master/src/main/java/org/xbib/elasticsearch/action/RestExportAction.java#L310

A reasonable timeout is critical for a scalable scan/scroll. In the
knapsack plugin, I use a default of 30 seconds.

In the ES docs, a timeout of 10 minutes is used

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-search-type.html

which does not seem very helpful, as it will put pressure on your heap in
almost all cases of long-lasting scan/scroll...
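
The reason a short timeout is enough: the scroll timeout is a keep-alive that
is renewed with every scroll request, so it only has to cover the processing
of one batch, not the whole export. The loop shape can be sketched as below.
Note that this is a self-contained illustration, not the knapsack plugin code
and not the Elasticsearch Java API: the ScrollClient interface, the drain()
method, and the in-memory fake() are hypothetical stand-ins, so the sketch
runs without a cluster.

```java
import java.util.ArrayList;
import java.util.List;

class ScrollLoopSketch {

    // Hypothetical stand-in for a scroll client (NOT an Elasticsearch API).
    interface ScrollClient {
        // Returns the next batch of hits and renews the scroll context
        // for another keepAliveMillis.
        List<String> nextBatch(long keepAliveMillis);
    }

    // Drains the scroll and returns the number of hits seen.
    static int drain(ScrollClient client, long keepAliveMillis) {
        int processed = 0;
        while (true) {
            // Every request passes the keep-alive again, so the timeout
            // only needs to outlast the handling of a single batch.
            List<String> hits = client.nextBatch(keepAliveMillis);
            if (hits.isEmpty()) {
                break; // an empty batch means the scroll is exhausted
            }
            processed += hits.size(); // real code would process each hit here
        }
        return processed;
    }

    // In-memory fake that serves totalDocs documents in batches of batchSize.
    static ScrollClient fake(final int totalDocs, final int batchSize) {
        return new ScrollClient() {
            private int served = 0;

            @Override
            public List<String> nextBatch(long keepAliveMillis) {
                int n = Math.min(batchSize, totalDocs - served);
                List<String> batch = new ArrayList<>();
                for (int i = 0; i < n; i++) {
                    batch.add("doc-" + (served + i));
                }
                served += n;
                return batch;
            }
        };
    }

    public static void main(String[] args) {
        // 30 seconds keep-alive, the knapsack default mentioned above.
        System.out.println(drain(fake(2500, 1000), 30_000L)); // prints 2500
    }
}
```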

Jörg
