Many thanks Aaron…

> Ok that sounds about right. This will greatly depend on your data, how
> many fields, how many terms, etc.
Fields will be around 20-25, but the majority of searches happen on only 5-6 fields. Term count ~= 2.5 million.

> I would recommend using the latest Java7 (perhaps Java8 but I haven't
> tested with it), and use the G1 garbage collector if you plan on running
> larger heaps

We are using the latest version of 1.7. We actually imported around 2TB of data as a dry run with just a 16GB heap, without major GC issues, using good old CMS. There is no sorting/faceting during searches either. My reluctance stems from the fact that I am not quite familiar with G1 :)

I am leaning more towards a write-through cache [at least around 50GB] rather than a read cache, because we have a lot of free RAM available and I feel a read cache is going to use very little of it. But I am not sure about this. Any pointers will greatly help.

> I would also recommend that you increase the block cache buffer and file
> buffer sizes from 8K to 64K

This is one issue we faced during the dry run. Bumping up from 8K to 16K solved it. I thought 32K would be a good fit for us. Will surely explore this…

Thanks again for helping out.

On Sat, Feb 28, 2015 at 3:04 AM, Aaron McCurry <[email protected]> wrote:

> On Fri, Feb 27, 2015 at 1:29 AM, Ravikumar Govindarajan <
> [email protected]> wrote:
>
> > Hi,
> >
> > I need general guidance on the number of machines/shards required for
> > our first Blur set-up.
> >
> > Some data as follows
> >
> > 1. Shard-server config: 128GB RAM, 16-core dual socket with
> > hyper-threading, 32 procs
> > 2. Total dataset size: 10TB. With rep-factor=3, total cluster size =
> > 30TB. Pre-populated via MR or Thrift...
> > 3. We receive very few queries per minute [600-900 queries], but the
> > response time for every query must be <= 150 ms
> >
> > Initially we thought we could create 500 shards, each of 20GB size,
> > with around 20 machines.
>
> Ok that sounds about right. This will greatly depend on your data, how
> many fields, how many terms, etc.
>
> > Can each shard-server machine with the above specs handle 25 shards?
> > Is such a configuration over-utilized/under-utilized?
>
> Again, it likely depends. I would recommend using the latest Java7
> (perhaps Java8 but I haven't tested with it), and use the G1 garbage
> collector if you plan on running larger heaps. Whatever is leftover can be
> allocated to the block cache. I would also recommend that you increase the
> block cache buffer and file buffer sizes from 8K to 64K. This will also
> decrease heap pressure for the number of entries in the block cache lru
> map.
>
> > How do folks run in production? Any numbers/pointers will be really
> > helpful
>
> Generally large shard servers (like the ones you are suggesting) with a
> few controllers (6-12) are typical setups.
>
> If I can provide more details I will follow up.
>
> Aaron
>
> > --
> > Ravi
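For anyone skimming the thread, the capacity math being discussed can be sketched in a few lines. This is purely back-of-the-envelope, using the figures from the messages above (10TB dataset, 20GB target shard size, 20 machines, rep-factor 3); it is not a Blur sizing formula.

```python
# Back-of-the-envelope shard sizing, using the numbers from this thread.
dataset_gb = 10_000          # ~10 TB of index data
shard_gb = 20                # target size per shard
machines = 20
replication = 3              # HDFS replication factor

shards = dataset_gb // shard_gb           # logical shards in the table
shards_per_machine = shards // machines   # shards served per shard server
cluster_gb = dataset_gb * replication     # raw storage across the cluster

print(f"{shards} shards, {shards_per_machine} per machine, "
      f"{cluster_gb / 1000:.0f} TB raw storage")
```

This reproduces the numbers in the thread: 500 shards, 25 per machine, 30TB of raw storage.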

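On the buffer-size point: Aaron's note that larger block-cache buffers "decrease heap pressure for the number of entries in the block cache lru map" follows directly from the arithmetic, since the LRU map tracks one entry per cached block. A quick illustration, assuming a 50GB cache (the figure floated above; the exact cache size is whatever you configure):

```python
# Entry count in the block cache LRU map for various buffer sizes,
# assuming a 50 GB cache. Fewer, larger blocks mean fewer map entries
# and therefore less heap overhead for the bookkeeping structures.
cache_bytes = 50 * 1024**3

for buf_kb in (8, 16, 32, 64):
    entries = cache_bytes // (buf_kb * 1024)
    print(f"{buf_kb}K buffers -> {entries:,} LRU entries")
```

Going from 8K to 64K buffers cuts the entry count by 8x (about 6.5 million entries down to about 820 thousand), at the cost of coarser-grained caching.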