1. Mostly, indexes are the result of a partitioning design made outside ES, for example by time, user, or data origin. The beauty of ES is that it can host as many indexes as you wish.
2. If the maximum number of nodes (hosts) you want to spend on ES is known, use that number as the number of shards, so you can be sure the cluster can scale out to that many nodes. If the number is not known, try to estimate the total number of documents to be indexed, the total volume of those documents, and an estimated index volume per shard. Rule of thumb: a shard should be sized so that it can fit into the Java heap and can be moved between nodes in reasonable time (~1-10 GB).
3. You can scale up by adding nodes: just start ES on another host. Scaling down is also easy: stop ES on a node.
4. You have to write a program that traverses your folders, picks up each document, and extracts fields from it to be indexed. With scrutmydocs.org you can experiment with how this works, using a file traverser that is already prepared to handle quite a lot of file types automatically.
5. You should consider using one of the standard clients. As ES supports HTTP REST, and the standard clients are designed to support a comparable set of features, it does not matter what language you use. Just pick your favorite language. (My personal favorite is Java, where there is no need to use HTTP REST; the native transport protocol can be used instead.)

Jörg
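To make the sizing advice in point 2 concrete, here is a rough sketch of the arithmetic in Python. The function name, its parameters, and the 5 GB target are my own assumptions for illustration (the target is just a value inside the ~1-10 GB rule of thumb); nothing here is an ES API.

```python
import math

GB = 1024 ** 3  # bytes per gibibyte


def estimate_shard_count(total_index_bytes, target_shard_bytes, max_nodes=None):
    """Suggest a shard count following the rule of thumb in point 2.

    Hypothetical helper, not part of any ES client.
    """
    # If the maximum number of nodes is known, use it directly as the shard count.
    if max_nodes is not None:
        if max_nodes < 1:
            raise ValueError("max_nodes must be at least 1")
        return max_nodes

    # Otherwise divide the estimated index volume by the target shard size.
    if total_index_bytes <= 0 or target_shard_bytes <= 0:
        raise ValueError("sizes must be positive")
    return max(1, math.ceil(total_index_bytes / target_shard_bytes))
```

For example, an estimated 200 GB of index volume with a 5 GB target per shard gives `estimate_shard_count(200 * GB, 5 * GB)`, which is 40 shards, while `estimate_shard_count(200 * GB, 5 * GB, max_nodes=8)` simply returns 8.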
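The file traverser described in point 4 could look roughly like the following sketch. It only handles plain-text files and only builds the newline-delimited body that the ES bulk endpoint expects (one action line followed by one source line per document); it does not send anything. The field names, the index name, the extension filter, and the choice of the file path as document ID are all assumptions for illustration, and older ES versions may also expect a type in the action line.

```python
import json
import os
from datetime import datetime, timezone


def iter_documents(root, extensions=(".txt", ".md")):
    """Walk `root` and yield one dict of extracted fields per matching file."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            # Skip files whose extension is not in the (assumed) whitelist.
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            stat = os.stat(path)
            with open(path, encoding="utf-8", errors="replace") as handle:
                content = handle.read()
            yield {
                "path": path,
                "filename": name,
                "size_bytes": stat.st_size,
                "modified": datetime.fromtimestamp(
                    stat.st_mtime, tz=timezone.utc
                ).isoformat(),
                "content": content,
            }


def to_bulk_body(docs, index_name):
    """Build a bulk request body: an action line, then the document source."""
    lines = []
    for doc in docs:
        # Using the path as the ID is an assumption; re-indexing then overwrites.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc["path"]}}))
        lines.append(json.dumps(doc))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n" if lines else ""
```

A body built with `to_bulk_body(iter_documents("/data/docs"), "documents")` could then be posted to the `_bulk` endpoint with whichever client you prefer; the path and index name here are placeholders. For binary formats such as PDF or Office files you would need a real text extractor in place of the plain `read()`.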