On Tue, Nov 15, 2016 at 6:16 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> You can make no assumptions about locality in terms of where separate
> documents land on disk. I suppose if you have the whole corpus at
> index time you could index these "similar" documents contiguously.
>

Wow... that's frightening. There are a ton of optimizations available if
you can get the underlying content store to preserve locality.

Not trying to be overly negative, so another way to phrase it: at least
there's room for improvement!
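
For what it's worth, here's roughly what I'd hoped to do, as a minimal
SolrJ sketch. The Post record, the "posts" collection name, and the
locality_key_s / body_txt field names are all placeholders of mine, and it
assumes the whole batch is in hand at index time, per Erick's caveat:

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;

    public class LocalityIndexer {

        // Hypothetical shape of our source records; "localityKey" stands in
        // for whatever we'd cluster on (source feed, domain, topic hash, ...).
        record Post(String id, String localityKey, String body) {}

        static void indexSorted(List<Post> corpus) throws Exception {
            // Sort the whole batch by the locality key so "similar" documents
            // are handed to Solr consecutively and get adjacent internal ids.
            List<Post> ordered = corpus.stream()
                    .sorted(Comparator.comparing(Post::localityKey))
                    .toList();

            List<SolrInputDocument> docs = new ArrayList<>(ordered.size());
            for (Post p : ordered) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", p.id());
                doc.addField("locality_key_s", p.localityKey());
                doc.addField("body_txt", p.body());
                docs.add(doc);
            }

            try (SolrClient client = new HttpSolrClient.Builder(
                    "http://localhost:8983/solr/posts").build()) {
                client.add(docs);   // one batch, in locality order
                client.commit();
            }
        }
    }

Even then, this only controls the order documents are handed to the
indexer; it gives no guarantees about where anything lands at the file
level, which I take to be Erick's point.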


> My base question is why you'd care about compressing 500G. Disk space
> is so cheap that the expense of trying to control this dwarfs any
> imaginable $avings, unless you're talking about a lot of 500G indexes.
> In other words, this seems like an XY problem: you're asking about
> compressing when you are really concerned with something else.
>

500GB per day... and while disk is cheap, IOPS are not. The more we can
keep in RAM and on SSD, the better.

We're trying to keep as much as possible in RAM, then SSD... plus we have
about 2 years of content. It adds up ;)
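(Back of the envelope: 500GB/day for roughly two years is about
730 × 500GB ≈ 365TB.)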

Kevin

-- 

We’re hiring if you know of any awesome Java Devops or Linux Operations
Engineers!

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>
