We are considering indexing our 11 million books at the page level, which comes to about 3 billion Solr documents.
Our subject field is necessarily multi-valued, so UnInvertedField is used for faceting. When testing an index of about 200 million documents, the first facet query on one field (query appended below) raises memory use from about 2.5 GB to 13 GB. If I run GC after the query, memory use drops to about 3 GB, and subsequent queries don't significantly increase it.

After the query runs, various statistics from UnInvertedField are written to the log (see below), but they seem to represent the final data structure rather than the peak. For example, memSize is listed as 1.8 GB, while the temporary data structures were probably closer to 10 GB (13 GB total).

Is there a formula for estimating the peak memory size? Can the statistics logged at INFO be used to estimate the peak memory size?

Tom

-----
Nov 08, 2013 1:39:26 PM org.apache.solr.request.UnInvertedField <init>
INFO: UnInverted multi-valued field {field=topicStr, memSize=1,768,101,824, tindexSize=86,028, time=45,854, phase1=41,039, nTerms=271,987, bigTerms=0, termInstances=569,429,716, uses=0}
Nov 08, 2013 1:39:28 PM org.apache.solr.core.SolrCore execute
INFO: [core] webapp=/dev-3 path=/select params={facet=true&facet.mincount=100&indent=true&q=ocr:the&facet.limit=30&facet.field=topicStr&wt=xml} hits=138,605,690 status=0 QTime=49,797
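
To make the comparison concrete, here is a rough Python sketch that pulls the numbers out of the UnInvertedField log line above and sets them against the heap figures I observed. It is only arithmetic on this one test run, not a formula from Solr: the helper name `parse_uninverted_stats` is made up for illustration, the heap figures are the approximate values quoted in this message, and I am assuming 1 GB = 10^9 bytes.

```python
# Sketch only: compares the logged memSize with the heap growth seen in one run.
LOG_LINE = (
    "INFO: UnInverted multi-valued field {field=topicStr, memSize=1,768,101,824, "
    "tindexSize=86,028, time=45,854, phase1=41,039, nTerms=271,987, bigTerms=0, "
    "termInstances=569,429,716, uses=0}"
)

def parse_uninverted_stats(line):
    """Hypothetical helper: return the key=value pairs inside {...} as a dict."""
    body = line[line.index("{") + 1 : line.rindex("}")]
    stats = {}
    # Pairs are separated by ", "; commas inside numbers have no trailing space.
    for item in body.split(", "):
        key, value = item.split("=", 1)
        digits = value.replace(",", "")
        stats[key] = int(digits) if digits.isdigit() else value
    return stats

stats = parse_uninverted_stats(LOG_LINE)

GB = 10**9                 # assumption: decimal gigabytes
heap_peak = 13 * GB        # approximate peak during the first facet query
heap_after_gc = 3 * GB     # approximate heap after a manual GC

# Heap that was only needed while the structure was being built.
transient_bytes = heap_peak - heap_after_gc

# Final structure size per term instance, and transient heap relative to memSize.
bytes_per_instance = stats["memSize"] / stats["termInstances"]
transient_ratio = transient_bytes / stats["memSize"]

print(f"memSize per term instance: {bytes_per_instance:.2f} bytes")
print(f"transient heap / memSize:  {transient_ratio:.1f}x")
```

The ratio this prints is a single observation for one field on one index; whether it holds for other fields or scales to the full 3 billion documents is exactly what I am asking about.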