We are considering indexing our 11 million books at the page level, which
comes to about 3 billion Solr documents.

Our subject field is necessarily multi-valued, so UnInvertedField is
used for faceting.

When testing an index of about 200 million documents, the first facet
query on one field (query appended below) raises memory use from about
2.5 GB to 13 GB.  If I run GC after the query, memory use drops to about
3 GB, and subsequent queries don't significantly increase it.

After the query is run, various statistics from UnInvertedField are sent to
the log (see below), but they seem to represent the final data structure
rather than the peak.  For example, memSize is listed as 1.8 GB, while the
temporary data structures were probably closer to 10 GB (13 GB total).
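
To put the figures above side by side, here is a rough Python sketch of
the arithmetic.  It assumes "GB" means 10^9 bytes and treats the "about"
values as exact, so the result is only an order-of-magnitude check from a
single observation, not a formula.  The variable names are mine.

```python
# Back-of-the-envelope comparison of the observed heap figures with the
# memSize reported by UnInvertedField. All inputs are the approximate
# values quoted in this message.
GB = 10**9  # assumption: "GB" here means 10^9 bytes, not 2^30

baseline_gb = 2.5               # heap use before the first facet query
peak_gb = 13.0                  # heap use observed during/after the query
after_gc_gb = 3.0               # heap use after a manual GC
mem_size_bytes = 1_768_101_824  # memSize from the INFO log line

mem_size_gb = mem_size_bytes / GB

# Growth of the heap while the field was being uninverted.
peak_rise_gb = peak_gb - baseline_gb

# Memory that the GC was able to reclaim afterwards (temporary structures).
transient_gb = peak_gb - after_gc_gb

# How many times larger the observed rise was than the reported memSize.
rise_to_memsize_ratio = peak_rise_gb / mem_size_gb

print(f"memSize:          {mem_size_gb:.2f} GB")
print(f"peak rise:        {peak_rise_gb:.1f} GB")
print(f"reclaimed by GC:  {transient_gb:.1f} GB")
print(f"rise / memSize:   {rise_to_memsize_ratio:.1f}x")
```

Whether that ratio holds for other fields or index sizes is exactly what
I don't know.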

Is there a formula for estimating the peak memory size?
Can the statistics spit out to INFO be used to somehow estimate the peak
memory size?

Tom
-----

Nov 08, 2013 1:39:26 PM org.apache.solr.request.UnInvertedField <init>
INFO: UnInverted multi-valued field {field=topicStr,
memSize=1,768,101,824,
tindexSize=86,028,
time=45,854,
phase1=41,039,
nTerms=271,987,
bigTerms=0,
termInstances=569,429,716,
uses=0}
Nov 08, 2013 1:39:28 PM org.apache.solr.core.SolrCore execute
INFO: [core] webapp=/dev-3 path=/select
params={facet=true&facet.mincount=100&indent=true&q=ocr:the&facet.limit=30&facet.field=topicStr&wt=xml}
hits=138,605,690 status=0 QTime=49,797
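
For anyone who wants to work with these statistics, the sketch below
parses the UnInvertedField line above into a dictionary and derives the
reported bytes per term instance.  The function name and regular
expression are my own and only assume the key=value layout shown in this
log, with comma-grouped numbers as they appear here; this is not a Solr
API.

```python
import re

# The UnInvertedField INFO line from the log above, joined onto one line.
LOG_LINE = (
    "INFO: UnInverted multi-valued field {field=topicStr, "
    "memSize=1,768,101,824, tindexSize=86,028, time=45,854, "
    "phase1=41,039, nTerms=271,987, bigTerms=0, "
    "termInstances=569,429,716, uses=0}"
)


def parse_uninverted_stats(line):
    """Parse the {key=value, ...} part of the log line into a dict.

    Hypothetical helper: numeric values with thousands separators are
    converted to int, anything else is kept as a string.
    """
    body = line[line.index("{") + 1 : line.rindex("}")]
    stats = {}
    # A value is either a comma-grouped number or a plain word.
    for key, value in re.findall(r"(\w+)=(\d[\d,]*\d|\d|\w+)", body):
        if value[0].isdigit():
            stats[key] = int(value.replace(",", ""))
        else:
            stats[key] = value
    return stats


stats = parse_uninverted_stats(LOG_LINE)

# Reported size of the final structure per term instance, in bytes.
bytes_per_instance = stats["memSize"] / stats["termInstances"]

print(stats)
print(f"memSize per term instance: {bytes_per_instance:.2f} bytes")
```

This only describes the final structure that memSize reports; it says
nothing about the temporary memory used while building it.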
