Re: minimize disc space requirement.

2019-05-18 Thread Erick Erickson
It Depends (tm).

No, limiting the background threads won’t help much. Here’s the issue:
At time T, the segments file contains the current “snapshot” of the index, i.e. 
the names of all the segments that have been committed.

At time T+N, another commit happens. Or consider an optimize, which for 6.x 
defaults to merging into a single segment. During any merge, _all_ the new 
segments are written before _any_ old segment is deleted. The very last 
operation is to rewrite the segments file, but only after all the new segments 
are flushed.

After this point, the next time a searcher is opened, all the old, 
no-longer-used segments will be deleted; the trigger is opening a new 
searcher.
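
For reference, that trigger lives in solrconfig.xml: whether a hard commit 
opens a new searcher is controlled by the autoCommit block. A minimal sketch 
(the 5-minute interval matches the setup in the quoted message below; the 
values are illustrative, not a recommendation):

  <autoCommit>
    <maxTime>300000</maxTime>           <!-- hard commit every 5 minutes -->
    <openSearcher>true</openSearcher>   <!-- opening the new searcher is what
                                             releases old segments for deletion -->
  </autoCommit>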

To make matters more interesting, say new documents are indexed during the 
merge process. Those go into new segments that aren’t in the totals above. Plus 
you have transaction logs being written, which are usually pretty small but can 
grow between commits.

I’ve used optimize as the example, but it’s at least theoretically possible 
that all the current segments are rewritten into a larger segment as part of a 
normal merge. This is frankly not very likely with large indexes (say > 20G), 
but it’s still possible.
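
To put rough numbers on that worst case, using the ~70GB-per-shard figure from 
the quoted message below (purely for illustration):

    70GB   old segments, still on disk until the very end
  + 70GB   new segments, all written before any old one is deleted
  ------
  ~140GB   transient peak for that one shard, plus freshly flushed
           segments and tlogs on top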

Now all that said, on a disk that’s hosting multiple replicas from multiple 
shards and/or multiple collections, the likelihood of all this happening at 
once (barring someone issuing an optimize for all the collections hosted on the 
machine) is very low. But what you’re risking is an unknown. Lucene/Solr try 
very hard to prevent bad things from happening in a “disk full” situation, but 
given the number of code paths that could be affected, benign outcomes can’t be 
guaranteed.

So perhaps you can run forever with, say, 25% of the aggregate index size free. 
Or perhaps you’ll blow up unexpectedly; there’s really no way to say ahead of 
time.
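
As a back-of-the-envelope illustration of why that 25% can bite (again 
borrowing the ~70GB shard size from the quoted message, and hypothetically 
assuming four such replicas on one disk):

  4 replicas x 70GB = 280GB aggregate index
  25% headroom      = 70GB free
  one full rewrite of a single replica needs ~70GB transiently,
  so one unlucky merge eats the entire margin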

Best,
Erick

> On May 18, 2019, at 8:36 AM, tom_s wrote:
> 
> Hey,
> I'm aware that the best practice is to have the disk space on your Solr
> servers be 2 times the size of the index, but my goal is to minimize this
> overhead and have my index occupy more than 50% of disk space. In our index,
> documents have a TTL, so documents are deleted every day, which causes
> background merging of segments. Can I change the merge policy to make the
> overhead of background merging lower?
> Will limiting the number of concurrent merges help (with the maxMergeCount
> parameter)? Do you know of other methods that will help?
> 
> Info about my server:
> I use Solr 6.5.1. I index 200 docs per hour for each shard. I hard commit
> every 5 minutes. The size of the index in each shard is around 70GB (with
> around 15% deletions).
> I use the following merge policy:
> 
>  2
>  4
> 
> (The rest of the params are default.)
> 
> Thanks
> 



Re: minimize disc space requirement.

2019-05-18 Thread Erick Erickson
Oh, and none of that includes people adding more and more documents to the 
existing replicas….

> On May 18, 2019, at 10:22 AM, Shawn Heisey wrote:
> 
> On 5/18/2019 9:36 AM, tom_s wrote:
>> I'm aware that the best practice is to have the disk space on your Solr
>> servers be 2 times the size of the index, but my goal is to minimize this
>> overhead and have my index occupy more than 50% of disk space. In our index,
>> documents have a TTL, so documents are deleted every day, which causes
>> background merging of segments. Can I change the merge policy to make the
>> overhead of background merging lower?
>> Will limiting the number of concurrent merges help (with the maxMergeCount
>> parameter)? Do you know of other methods that will help?
> 
> Actually the recommendation is to have enough space for the index to triple, 
> not just double.  This can happen in the wild.
> 
> There are no merge settings that can prevent situations where the index 
> doubles in size temporarily due to merging.  Chances are that it's going to 
> happen eventually to any index.
> 
> Thanks,
> Shawn



Re: minimize disc space requirement.

2019-05-18 Thread Shawn Heisey

On 5/18/2019 9:36 AM, tom_s wrote:

I'm aware that the best practice is to have the disk space on your Solr servers
be 2 times the size of the index, but my goal is to minimize this overhead
and have my index occupy more than 50% of disk space. In our index, documents
have a TTL, so documents are deleted every day, which causes background
merging of segments. Can I change the merge policy to make the overhead of
background merging lower?
Will limiting the number of concurrent merges help (with the maxMergeCount
parameter)? Do you know of other methods that will help?


Actually the recommendation is to have enough space for the index to 
triple, not just double.  This can happen in the wild.


There are no merge settings that can prevent situations where the index 
doubles in size temporarily due to merging.  Chances are that it's going 
to happen eventually to any index.
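
For completeness, those merge settings live on the merge scheduler in 
solrconfig.xml; a sketch of the relevant block is below (values illustrative). 
They throttle how many merges are queued and running at once, which smooths 
I/O, but they put no ceiling on how much disk a single large merge can consume:

  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxMergeCount">6</int>    <!-- merges allowed outstanding before
                                              indexing throttles -->
    <int name="maxThreadCount">1</int>   <!-- merge threads actually running
                                              concurrently -->
  </mergeScheduler>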


Thanks,
Shawn


minimize disc space requirement.

2019-05-18 Thread tom_s
Hey,
I'm aware that the best practice is to have the disk space on your Solr servers
be 2 times the size of the index, but my goal is to minimize this overhead
and have my index occupy more than 50% of disk space. In our index, documents
have a TTL, so documents are deleted every day, which causes background
merging of segments. Can I change the merge policy to make the overhead of
background merging lower?
Will limiting the number of concurrent merges help (with the maxMergeCount
parameter)? Do you know of other methods that will help?

Info about my server:
I use Solr 6.5.1. I index 200 docs per hour for each shard. I hard commit
every 5 minutes. The size of the index in each shard is around 70GB (with
around 15% deletions).
I use the following merge policy:

  2
  4

(The rest of the params are default.)

Thanks


