1. Mostly, indexes are the result of a partition design outside ES, for
example by time, user, or data origin. The beauty of ES is that it can host
as many indexes as you wish.
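As a quick sketch of that partitioning idea (the index prefix and the monthly granularity here are just made-up examples, not anything ES prescribes), you could derive the index name from an attribute of the document, e.g. its date:

```python
from datetime import date

def index_name_for(day: date, prefix: str = "logs") -> str:
    """Derive a monthly index name from a document's date.

    Documents from different months land in different indexes,
    which is one common time-based partition design.
    """
    return f"{prefix}-{day.strftime('%Y.%m')}"

print(index_name_for(date(2013, 5, 17)))  # logs-2013.05
print(index_name_for(date(2013, 6, 2)))   # logs-2013.06
```

The same pattern works for partitioning by user or data origin: compute the index name from the document, and let ES create indexes as they appear.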

2. If the maximum number of nodes (hosts) you want to dedicate to ES is
known, use that number as the number of shards. That way you make sure your
cluster can scale. If the number is not known, try to estimate the total
number of documents to be indexed, the total volume of those documents, and
the index volume per shard. Rule of thumb: a shard should be sized so that
it fits into the Java heap and can be moved between nodes in a reasonable
time (~1-10 GB).
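The estimate in the unknown-node-count case is simple arithmetic; here is a small sketch (the 5 GB target is an arbitrary value from the ~1-10 GB range above, not a recommendation):

```python
import math

def estimate_shard_count(total_volume_gb: float, target_shard_gb: float = 5.0) -> int:
    """Estimate how many shards are needed so that each shard
    stays within the target size (rule of thumb: ~1-10 GB)."""
    return max(1, math.ceil(total_volume_gb / target_shard_gb))

# 120 GB of expected index volume at ~5 GB per shard -> 24 shards
print(estimate_shard_count(120))  # 24
```

Remember that the shard count of an index is fixed at creation time, so it pays to do this estimate up front.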

3. You can scale up by adding nodes: just start ES on another host. Scaling
down is also easy: stop ES on a node.

4. You have to write a program that traverses your folders, picks up each
document, and extracts fields from the document so they can be indexed.
With scrutmydocs.org you can experiment with how this works: it comes with
a file traverser that is already prepared to handle quite a lot of file
types automatically.
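The traversal part of such a program can be sketched in a few lines (this is a bare-bones illustration, not how scrutmydocs does it; real extraction of fields from PDFs, Office files, etc. needs a content-extraction library):

```python
import os

def traverse(root: str, extensions=(".txt", ".md")):
    """Walk a folder tree and yield (path, content) pairs.

    Each pair would become one index request to ES; the path can
    double as the document id, the content as an indexed field.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(extensions):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="replace") as f:
                    yield path, f.read()
```

From there, each yielded pair is fed to whatever client you use for indexing.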

5. You should consider using one of the standard clients. Since ES speaks
HTTP REST, and the standard clients are designed to support a comparable
set of features, it does not matter what language you use; just pick your
favorite. (My personal favorite is Java, where there is no need to use
HTTP REST; instead, the native transport protocol can be used.)
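To illustrate why the language barely matters: under the hood the HTTP clients just POST bodies like the bulk format below, which any language can produce (index and type names here are made up; any HTTP library could then send this to the _bulk endpoint):

```python
import json

def bulk_body(index: str, doc_type: str, docs: list) -> str:
    """Build a newline-delimited JSON body for the ES bulk API:
    one action line followed by one source line per document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"

body = bulk_body("docs", "file", [{"title": "hello"}, {"title": "world"}])
print(body)
```

A standard client hides this plumbing and adds things like connection handling and retries, which is why it is usually the better choice than hand-rolled HTTP.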

Jörg

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoGvSgLthdp8Nk%3DTMVQYymzRYWOnEvAC4HYo14bMH1Ks8g%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.
