In a word... no. There are simply too many variables here to give any decent estimate.
The spreadsheet is, at best, an estimate. It hasn't been put through any
rigorous QA, so the fact that it's off in your situation is not surprising.
I wish we had a better answer.

And the disk size isn't particularly interesting anyway. The *.fdt and *.fdx
files contain compressed copies of the raw data in _stored_ fields. If I index
the same data once with all fields set stored="true" and once with
stored="false", the disk size may vary by a large factor, yet the stored data
has very little memory cost, and memory is usually the limiting factor in a
Solr installation.

Are you storing position information? Term vectors? Are you ngramming your
fields? And on and on. Each and every one of these changes the memory
requirements (a quick sketch of these per-field options follows the quoted
message below).

Sorry we can't be more help,
Erick

On Mon, Mar 9, 2015 at 12:20 PM, Gaurav gupta <gupta.gaurav0...@gmail.com> wrote:
> Could you please guide me on how to reasonably estimate the disk size for
> Lucene 4.x (precisely the 4.8.1 version), including the worst-case scenario.
>
> I have referred to the formula and Excel sheet shared at
> https://lucidworks.com/blog/estimating-memory-and-storage-for-lucenesolr/
>
> It seems to have been devised for Lucene 2.9, and I am not sure whether it
> holds true for the 4.x versions.
> In my case, the actual index size either comes close to the worst-case
> estimate or exceeds it. One of our enterprise customers has even observed
> an index size 3 times higher than the estimate from the Excel sheet.
>
> Alternatively, can I find out the average document size in a Lucene index
> (for a reasonable amount of data) so that I can extrapolate to the full
> 250 million documents?
>
> Thanks
> Gaurav
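For illustration only, not part of the original reply: at the Lucene 4.x level,
the knobs mentioned above are mostly per-field options on FieldType. A minimal
sketch (the "body" field name and sample text are made up) might look like this:

  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.Field;
  import org.apache.lucene.document.FieldType;
  import org.apache.lucene.index.FieldInfo.IndexOptions;

  // Per-field options that drive index size (Lucene 4.8.x API).
  FieldType ft = new FieldType();
  ft.setIndexed(true);
  ft.setTokenized(true);
  ft.setStored(true);      // stored copy of the raw value goes into .fdt/.fdx
  ft.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS); // positions grow the postings
  ft.setStoreTermVectors(true);            // term vectors add their own files
  ft.setStoreTermVectorPositions(true);
  ft.freeze();

  Document doc = new Document();
  doc.add(new Field("body", "some example text", ft)); // "body" is hypothetical
  // N-gramming happens in the analyzer chain, not here, but it multiplies the
  // number of terms and therefore the index size as well.

Flipping any of these choices (stored=false, DOCS_ONLY, no term vectors) can
change the on-disk size by a large factor for the same input data, which is
one reason a single spreadsheet formula can be so far off.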