[jira] [Commented] (SOLR-13512) Raw index data analysis tool

Erick Erickson (JIRA) Fri, 07 Jun 2019 12:11:22 -0700


    [ 
https://issues.apache.org/jira/browse/SOLR-13512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16858919#comment-16858919
 ]


Erick Erickson commented on SOLR-13512:
---------------------------------------

What am I actually seeing here? This is for the content of a Wikipedia page 
(i.e. a text field):
{code}
    "field 'text' [BlockTreeTerms(seg=_p5 terms=3060769,postings=58308889,positions=157811023,docs=900727)]":{
                        "total":248,
                        "term index [FST(input=BYTE1,output=ByteSequenceOutputs]":88},
{code}

I have
    3,060,769 terms
   58,308,889 postings
  157,811,023 positions
      900,727 docs
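
For anyone who would rather not read these counts off by eye, here is a minimal sketch that pulls them out of the key shown above. It assumes the key keeps the `name=number` layout seen in this one sample, which nothing in the issue guarantees, and `parse_counts` is a hypothetical helper, not part of the patch.

```python
import re

# Key copied from the output above. Assumption: the name=number layout
# of this string is stable; that is inferred from this one sample only.
KEY = ("field 'text' [BlockTreeTerms(seg=_p5 "
       "terms=3060769,postings=58308889,positions=157811023,docs=900727)]")


def parse_counts(key):
    """Return every name=<integer> pair found in a stats key as a dict.

    Non-numeric values such as seg=_p5 are skipped by the pattern.
    """
    return {name: int(value) for name, value in re.findall(r"(\w+)=(\d+)", key)}


counts = parse_counts(KEY)
for name, value in counts.items():
    # Right-align with thousands separators for easy comparison.
    print(f"{name:>10}: {value:>13,}")
```

These are item counts, not byte sizes, so they do not by themselves answer the RAM question.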

What is the "total" of 248? I find it hard to believe that this field only 
occupies 248 bytes, unless that's just a pointer to stuff out in MMap space.

So if I'm trying to estimate how much of my RAM this segment needs, what 
clues do I have? And is there any way to distinguish Java heap from MMap 
space? I know it's "tricky"; what I'm after here is something a user who 
hasn't a clue about postings can get their arms around.

Running more experiments....

> Raw index data analysis tool
> ----------------------------
>
>                 Key: SOLR-13512
>                 URL: https://issues.apache.org/jira/browse/SOLR-13512
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Andrzej Bialecki 
>            Assignee: Andrzej Bialecki 
>            Priority: Major
>             Fix For: master (9.0), 8.2
>
>         Attachments: SOLR-13512.patch, SOLR-13512.patch, SOLR-13512.patch, 
> SOLR-13512.patch, rawSizeDetails.json, rawSizeSummary.json
>
>
> A common question from Solr users is how to determine how a given schema 
> field and all its related index data contribute to the total index size.
> It's possible to estimate this information by doing a single full pass 
> through all index data, aggregating estimated sizes of terms, postings, doc 
> values and stored fields. The totals represent, of course, the worst-case 
> scenario when there's no index compression at all, but they should still be 
> useful for answering the questions above.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
