[ https://issues.apache.org/jira/browse/SOLR-13512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16858919#comment-16858919 ]
Erick Erickson commented on SOLR-13512:
---------------------------------------

What am I actually seeing here? This is for the content of a Wikipedia page (i.e. a text field):

{code}
"field 'text' [BlockTreeTerms(seg=_p5 terms=3060769,postings=58308889,positions=157811023,docs=900727)]":{
  "total":248,
  "term index [FST(input=BYTE1,output=ByteSequenceOutputs]":88},
{code}

I have 3,060,769 terms, 58,308,889 postings, 157,811,023 positions and 900,727 docs. What is the "total" of 248? I find it hard to believe that this field only occupies 248 bytes, unless that's just a pointer to stuff out in MMap space.

So if I'm trying to estimate how much of my RAM this segment needs, what clues do I have? And is there any way to tell Java heap usage apart from MMap space? I know it's "tricky"; what I'm after here is something a user who hasn't a clue about postings can get their arms around.

Running more experiments...

> Raw index data analysis tool
> ----------------------------
>
>                 Key: SOLR-13512
>                 URL: https://issues.apache.org/jira/browse/SOLR-13512
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public (Default Security Level. Issues are Public)
>            Reporter: Andrzej Bialecki
>            Assignee: Andrzej Bialecki
>            Priority: Major
>             Fix For: master (9.0), 8.2
>
>         Attachments: SOLR-13512.patch, SOLR-13512.patch, SOLR-13512.patch,
> SOLR-13512.patch, rawSizeDetails.json, rawSizeSummary.json
>
>
> A common question from Solr users is how to determine how a given schema
> field and all its related index data contributes to the total index size.
> It's possible to estimate this information by doing a single full pass
> through all index data, aggregating estimated sizes of terms, postings, doc
> values and stored fields. The totals represent of course the worst case
> scenario when there's no index compression at all, but still they should be
> useful for answering the questions above.
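
The single-pass aggregation described in the issue can be sketched roughly as follows. This is a minimal illustration, not the SOLR-13512 patch: the (field, category, bytes) record format is a hypothetical stand-in for whatever per-field statistics the real tool extracts from the index, and the sample sizes in the usage block are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record: (field name, data category, estimated size in bytes).
# Categories loosely mirror those named in the issue: terms, postings,
# doc values and stored fields. This is NOT the tool's actual data model.
Record = Tuple[str, str, int]


def aggregate_raw_sizes(records: Iterable[Record]) -> Dict[str, Dict[str, int]]:
    """Make one pass over all records, summing estimated sizes per field.

    Returns {field: {category: bytes, ..., "total": bytes}}. Because no
    compression is modelled, the totals are a worst-case estimate.
    """
    summary: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for field, category, size in records:
        summary[field][category] += size
        summary[field]["total"] += size
    # Convert the nested defaultdicts to plain dicts for the caller.
    return {field: dict(cats) for field, cats in summary.items()}


if __name__ == "__main__":
    # Illustrative, made-up sizes; real values would come from the index.
    sample = [
        ("text", "terms", 1200),
        ("text", "postings", 5400),
        ("id", "stored", 300),
        ("text", "stored", 800),
    ]
    result = aggregate_raw_sizes(sample)
    # Print fields largest-first to show which one dominates the index.
    for field, cats in sorted(result.items(), key=lambda kv: -kv[1]["total"]):
        print(field, cats)
```

Keeping a per-category breakdown alongside the per-field "total" is what lets a summary answer both "which field is biggest?" and "is it the postings or the stored data?" in the same pass.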
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)