[jira] [Commented] (LUCENE-5949) Add Accountable.getChildResources()

Dawid Weiss (JIRA) Mon, 15 Sep 2014 10:58:21 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-5949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134204#comment-14134204
 ]


Dawid Weiss commented on LUCENE-5949:
-------------------------------------

Very cool. I just needed it very recently and had to inspect stuff manually.

> Add Accountable.getChildResources()
> -----------------------------------
>
>                 Key: LUCENE-5949
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5949
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Robert Muir
>         Attachments: LUCENE-5949.patch
>
>
> Since Lucene 4.5, you can see how much memory Lucene is using at a basic 
> level by looking at SegmentReader.ramBytesUsed().
> In 4.11 it's already improved: you can pull the codec producers and get RAM 
> usage split out by postings, norms, docvalues, stored fields, term vectors, 
> etc.
> Unfortunately most toString() implementations are fairly useless, so you 
> don't have any insight further than that, even though behind the scenes 
> it's mostly just adding up other Accountables.
> So instead, if we can improve the toString() output, and if an Accountable 
> can return its children, we can connect all the dots and you can easily 
> diagnose/debug issues and see what is going on. I know I've been frustrated 
> with having to hack up tons of System.out.println calls during development 
> to see this stuff.
> So I think we should add this method to Accountable:
> {code}
>   /**
>    * Returns nested resources of this class. 
>    * The result should be a point-in-time snapshot (to avoid race conditions).
>    * @see Accountables
>    */
>   // TODO: on java8 make this a default method returning emptyList
>   Iterable<? extends Accountable> getChildResources();
> {code}
> We can also add a simple helper method for quick debugging 
> {{Accountables.toString(Accountable)}} to print the "tree", example output 
> for a lucene segment:
> {noformat}
> _5f(5.0.0):C8330469: 36.4 MB
> |-- postings [PerFieldPostings(formats=1)]: 8 MB
>     |-- format 'Lucene41_0' [BlockTreeTermsReader(fields=6,delegate=Lucene41PostingsReader(positions=true,payloads=false))]: 8 MB
>         |-- field 'alternatenames' [BlockTreeTerms(terms=3360242,postings=13779349,positions=17102250,docs=2876726)]: 945.2 KB
>             |-- term index [FST(input=BYTE1,output=ByteSequenceOutputs,packed=false,nodes=23318,arcs=66497)]: 945.1 KB
>         |-- field 'asciiname' [BlockTreeTerms(terms=2451266,postings=16849659,positions=16891234,docs=8329981)]: 686.1 KB
>             |-- term index [FST(input=BYTE1,output=ByteSequenceOutputs,packed=false,nodes=12976,arcs=44103)]: 686 KB
>         |-- field 'geonameid' [BlockTreeTerms(terms=8363399,postings=33321876,positions=-1,docs=8330469)]: 1.3 MB
>             |-- term index [FST(input=BYTE1,output=ByteSequenceOutputs,packed=false,nodes=528,arcs=66225)]: 1.3 MB
>         |-- field 'latitude' [BlockTreeTerms(terms=8714542,postings=33321876,positions=-1,docs=8330469)]: 1.7 MB
>             |-- term index [FST(input=BYTE1,output=ByteSequenceOutputs,packed=false,nodes=854,arcs=77300)]: 1.7 MB
>         |-- field 'longitude' [BlockTreeTerms(terms=11557222,postings=33321876,positions=-1,docs=8330469)]: 2.6 MB
>             |-- term index [FST(input=BYTE1,output=ByteSequenceOutputs,packed=false,nodes=1577,arcs=114570)]: 2.6 MB
>         |-- field 'name' [BlockTreeTerms(terms=2598879,postings=16833071,positions=16874267,docs=8330325)]: 771.5 KB
>             |-- term index [FST(input=BYTE1,output=ByteSequenceOutputs,packed=false,nodes=13790,arcs=46514)]: 771.3 KB
>         |-- delegate [Lucene41PostingsReader(positions=true,payloads=false)]: 32 bytes
> |-- norms [Lucene49NormsProducer(fields=3,active=3)]: 15.9 MB
>     |-- field 'alternatenames' [byte array]: 7.9 MB
>     |-- field 'asciiname' [table compressed [Packed64SingleBlock4(bitsPerValue=4,size=8330469,blocks=520655)]]: 4 MB
>     |-- field 'name' [table compressed [Packed64SingleBlock4(bitsPerValue=4,size=8330469,blocks=520655)]]: 4 MB
> |-- docvalues [PerFieldDocValues(formats=1)]: 12.1 MB
>     |-- format 'Lucene410_0' [Lucene410DocValuesProducer(fields=5)]: 12.1 MB
>         |-- addresses field 'alternatenames' [MonotonicBlockPackedReader(blocksize=16384,size=407026,avgBPV=16)]: 808.5 KB
>         |-- addresses field 'asciiname' [MonotonicBlockPackedReader(blocksize=16384,size=330528,avgBPV=17)]: 698.6 KB
>         |-- addresses field 'name' [MonotonicBlockPackedReader(blocksize=16384,size=335020,avgBPV=17)]: 703.7 KB
>         |-- ord index field 'alternatenames' [MonotonicBlockPackedReader(blocksize=16384,size=8330470,avgBPV=9)]: 9.8 MB
>         |-- reverse index field 'alternatenames' [ReverseTermsIndex(size=6360)]: 77.9 KB
>             |-- term bytes [PagedBytes(blocksize=32768)]: 67.7 KB
>             |-- term addresses [MonotonicBlockPackedReader(blocksize=16384,size=6360,avgBPV=13)]: 10.2 KB
>         |-- reverse index field 'asciiname' [ReverseTermsIndex(size=5165)]: 60.1 KB
>             |-- term bytes [PagedBytes(blocksize=32768)]: 53 KB
>             |-- term addresses [MonotonicBlockPackedReader(blocksize=16384,size=5165,avgBPV=11)]: 7 KB
>         |-- reverse index field 'name' [ReverseTermsIndex(size=5235)]: 61.2 KB
>             |-- term bytes [PagedBytes(blocksize=32768)]: 54.1 KB
>             |-- term addresses [MonotonicBlockPackedReader(blocksize=16384,size=5235,avgBPV=11)]: 7.1 KB
> |-- stored fields [CompressingStoredFieldsReader(mode=FAST,chunksize=16384)]: 216.3 KB
>     |-- stored field index [CompressingStoredFieldsIndexReader(blocks=65)]: 216.3 KB
>         |-- doc base deltas: 55.8 KB
>         |-- start pointer deltas: 158.9 KB
> |-- term vectors [CompressingTermVectorsReader(mode=FAST,chunksize=4096)]: 224 KB
>     |-- term vector index [CompressingStoredFieldsIndexReader(blocks=67)]: 224 KB
>         |-- doc base deltas: 65.6 KB
>         |-- start pointer deltas: 156.8 KB
> {noformat}
> Note this works for any Accountable, so also e.g. NRTCachingDirectory, 
> OrdinalMap, Suggesters, FSTs, and so on. You can also traverse the graph 
> yourself and output whatever you want.
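
To illustrate the "traverse the graph yourself" remark quoted above, here is a minimal sketch of a recursive walk that prints a tree in roughly the style of the sample output. It does not use Lucene: the `Accountable` interface below is a local stand-in that only mirrors `ramBytesUsed()` and the proposed `getChildResources()` signature, and `Node`, `AccountableTreeSketch` and `render` are hypothetical names invented for this example. Sizes are printed as raw byte counts rather than the humanized units shown in the issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Local stand-in mirroring the proposed signature; NOT Lucene's own interface.
interface Accountable {
  long ramBytesUsed();
  Iterable<? extends Accountable> getChildResources();
}

// Hypothetical immutable node, used only to build a sample tree.
final class Node implements Accountable {
  private final String label;
  private final long bytes;
  private final List<Node> children;

  Node(String label, long bytes, Node... children) {
    this.label = label;
    this.bytes = bytes;
    // Copy, then wrap: the child list is a fixed snapshot that callers cannot modify.
    this.children = Collections.unmodifiableList(new ArrayList<>(Arrays.asList(children)));
  }

  @Override public long ramBytesUsed() { return bytes; }
  @Override public Iterable<? extends Accountable> getChildResources() { return children; }
  @Override public String toString() { return label; }
}

final class AccountableTreeSketch {
  // Renders the root on its own line, then each descendant indented by depth.
  static String render(Accountable root) {
    StringBuilder sb = new StringBuilder();
    sb.append(root).append(": ").append(root.ramBytesUsed()).append(" bytes\n");
    for (Accountable child : root.getChildResources()) {
      render(child, 0, sb);
    }
    return sb.toString();
  }

  private static void render(Accountable node, int depth, StringBuilder sb) {
    for (int i = 0; i < depth; i++) {
      sb.append("    ");
    }
    sb.append("|-- ").append(node).append(": ").append(node.ramBytesUsed()).append(" bytes\n");
    for (Accountable child : node.getChildResources()) {
      render(child, depth + 1, sb);
    }
  }

  public static void main(String[] args) {
    Node termIndex = new Node("term index", 700);
    Node field = new Node("field 'name'", 800, termIndex);
    Node postings = new Node("postings", 900, field);
    Node norms = new Node("norms", 100);
    System.out.print(render(new Node("segment", 1000, postings, norms)));
  }
}
```

The same walk could just as well sum sizes per level or filter by label; the point is only that a child iterator is enough to build such reports without reaching into the underlying data structures.
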
> To be safe, I define that the graph returned is a "point in time snapshot", 
> free of race conditions. The Accountable helper methods provide this and 
> also prevent access (even via cast) to data structures you shouldn't be able 
> to get to; they just provide information.
> Since we aren't on Java 8 yet (and cannot provide a simple default method), 
> I think we should instead just add the method to Accountable, but add 
> default emptyList() implementations to impacted data structures such as 
> DocIdSet and Suggester. The codec APIs are lower level, and there I think 
> it's best to leave the method abstract, since they should really be 
> providing useful information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

