On Fri, Feb 27, 2015 at 1:29 AM, Ravikumar Govindarajan <[email protected]> wrote:
> Hi,
>
> I need a general guidance on number of machines/shards required for our
> first blur set-up
>
> Some data as follows
>
> 1. Shard-Server Config : 128GB RAM, 16-core dual socket with
>    hyper-threading. 32 procs
> 2. Total dataset size: 10TB. With rep-factor=3, total cluster-size=30TB.
>    Pre-populated via MR or Thrift...
> 3. We receive very less queries per minute [600-900 queries]. But the
>    response times for every query must be <=150 ms
>
> Initially we thought we can create 500 shards each of 20GB size with around
> 20 machines.
>

OK, that sounds about right. This will greatly depend on your data: how many
fields, how many terms, etc.

>
> Can each shard-server machine with above specs handle 25 shards? Is such a
> configuration over-utilized/under-utilized?
>

Again, it likely depends. I would recommend using the latest Java 7 (perhaps
Java 8, but I haven't tested with it), and using the G1 garbage collector if
you plan on running larger heaps. Whatever memory is left over can be
allocated to the block cache. I would also recommend that you increase the
block cache buffer and file buffer sizes from 8K to 64K. This will also
decrease heap pressure by reducing the number of entries in the block cache
LRU map.

>
> How do folks run in production. Any numbers/pointers will be really helpful
>

Generally, large shard servers (like the ones you are suggesting) with a few
controllers (6-12) are typical setups. If I can provide more details, I will
follow up.

Aaron

>
> --
> Ravi
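
For anyone wanting to check the figures discussed in this thread, here is a rough back-of-the-envelope sketch in Python. It is not Blur code and uses no Blur API. The constants come from the numbers quoted above, except ASSUMED_BLOCK_CACHE_GB, which is a hypothetical value picked only for illustration. The sketch also assumes decimal units (1 TB = 1000 GB), that each shard is served by a single shard server, and that the block cache LRU map holds one entry per cached buffer.

```python
# Back-of-the-envelope sizing for the figures quoted in this thread.
# Decimal units (1 TB = 1000 GB) are assumed so the numbers match the email.

DATASET_GB = 10_000            # 10TB total dataset
REPLICATION = 3                # rep-factor=3
SHARDS = 500                   # proposed shard count
MACHINES = 20                  # proposed shard-server count
RAM_PER_MACHINE_GB = 128       # RAM per shard server
QUERIES_PER_MINUTE = (600, 900)

# Hypothetical value, NOT from the thread: the share of RAM given to the
# block cache depends on the heap size you choose.
ASSUMED_BLOCK_CACHE_GB = 64


def sizing():
    """Derive per-shard and per-machine figures from the constants above."""
    shard_gb = DATASET_GB / SHARDS
    shards_per_machine = SHARDS / MACHINES
    index_per_machine_gb = shard_gb * shards_per_machine
    return {
        "raw_storage_gb": DATASET_GB * REPLICATION,
        "shard_gb": shard_gb,
        "shards_per_machine": shards_per_machine,
        "index_per_machine_gb": index_per_machine_gb,
        "index_to_ram_ratio": index_per_machine_gb / RAM_PER_MACHINE_GB,
        "qps_range": tuple(q / 60 for q in QUERIES_PER_MINUTE),
    }


def cache_entries(cache_gb, buffer_kb):
    """Number of fixed-size buffers needed to fill a cache of cache_gb.

    Assumes one LRU map entry per buffer and binary units (1 GB = 1024^2 KB).
    """
    return cache_gb * 1024 * 1024 // buffer_kb


if __name__ == "__main__":
    for name, value in sizing().items():
        print(f"{name}: {value}")
    for buffer_kb in (8, 64):
        entries = cache_entries(ASSUMED_BLOCK_CACHE_GB, buffer_kb)
        print(f"{buffer_kb}K buffers in {ASSUMED_BLOCK_CACHE_GB}GB: {entries}")
```

Under these assumptions each machine would hold about 500GB of index against 128GB of RAM, so only part of the index can sit in the block cache at any time. Going from 8K to 64K buffers cuts the number of cache entries by a factor of 8 for the same cache size, which is the heap-pressure effect mentioned in the reply.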
