[ https://issues.apache.org/jira/browse/LUCENE-5949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir resolved LUCENE-5949.
---------------------------------
Resolution: Fixed
Fix Version/s: 5.0, 4.11
> Add Accountable.getChildResources()
> -----------------------------------
>
> Key: LUCENE-5949
> URL: https://issues.apache.org/jira/browse/LUCENE-5949
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Robert Muir
> Fix For: 4.11, 5.0
>
> Attachments: LUCENE-5949.patch
>
>
> Since Lucene 4.5, you can see how much memory Lucene is using at a basic
> level by looking at SegmentReader.ramBytesUsed().
> In 4.11 this is already improved: you can pull the codec producers and get RAM
> usage split out by postings, norms, docvalues, stored fields, term vectors,
> etc.
> Unfortunately most toString() implementations are fairly useless, so you
> don't have any insight further than that, even though behind the scenes it's
> mostly just adding up other Accountables.
> So instead, if we improve the toString() implementations and let an
> Accountable return its children, we can connect all the dots and you can
> easily diagnose/debug issues and see what is going on. I know I've been
> frustrated with having to hack up tons of System.out.println calls during
> development to see this stuff.
> So I think we should add this method to Accountable:
> {code}
> /**
>  * Returns nested resources of this class.
>  * The result should be a point-in-time snapshot (to avoid race conditions).
>  * @see Accountables
>  */
> // TODO: on java8 make this a default method returning emptyList
> Iterable<? extends Accountable> getChildResources();
> {code}
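For illustration, here is a minimal sketch of how a class might implement the proposed method. It is not taken from the patch: the `Accountable` interface is a local stand-in reduced to the two methods discussed here, `FieldStats` and `SegmentStats` are hypothetical names, and the byte counts are whatever the caller passes in.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Local stand-in for the Lucene interface (assumption: only these two methods matter here).
interface Accountable {
  long ramBytesUsed();
  Iterable<? extends Accountable> getChildResources();
}

// Hypothetical leaf: reports its own size and has no children.
final class FieldStats implements Accountable {
  private final long bytes;
  FieldStats(long bytes) { this.bytes = bytes; }
  @Override public long ramBytesUsed() { return bytes; }
  @Override public Iterable<? extends Accountable> getChildResources() {
    return Collections.emptyList();
  }
}

// Hypothetical parent: its size is the sum of its children.
final class SegmentStats implements Accountable {
  private final List<FieldStats> fields = new ArrayList<>();

  synchronized void add(FieldStats f) { fields.add(f); }

  @Override public synchronized long ramBytesUsed() {
    long sum = 0;
    for (FieldStats f : fields) sum += f.ramBytesUsed();
    return sum;
  }

  @Override public synchronized Iterable<? extends Accountable> getChildResources() {
    // Copy under the lock so callers see a point-in-time snapshot,
    // unaffected by later calls to add().
    return Collections.unmodifiableList(new ArrayList<>(fields));
  }
}
```

Copying the list is one simple way to get snapshot semantics; an immutable structure built once at construction time would avoid the copy.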
> We can also add a simple helper method for quick debugging,
> {{Accountables.toString(Accountable)}}, to print the "tree". Example output
> for a Lucene segment:
> {noformat}
> _5f(5.0.0):C8330469: 36.4 MB
> |-- postings [PerFieldPostings(formats=1)]: 8 MB
>     |-- format 'Lucene41_0' [BlockTreeTermsReader(fields=6,delegate=Lucene41PostingsReader(positions=true,payloads=false))]: 8 MB
>         |-- field 'alternatenames' [BlockTreeTerms(terms=3360242,postings=13779349,positions=17102250,docs=2876726)]: 945.2 KB
>             |-- term index [FST(input=BYTE1,output=ByteSequenceOutputs,packed=false,nodes=23318,arcs=66497)]: 945.1 KB
>         |-- field 'asciiname' [BlockTreeTerms(terms=2451266,postings=16849659,positions=16891234,docs=8329981)]: 686.1 KB
>             |-- term index [FST(input=BYTE1,output=ByteSequenceOutputs,packed=false,nodes=12976,arcs=44103)]: 686 KB
>         |-- field 'geonameid' [BlockTreeTerms(terms=8363399,postings=33321876,positions=-1,docs=8330469)]: 1.3 MB
>             |-- term index [FST(input=BYTE1,output=ByteSequenceOutputs,packed=false,nodes=528,arcs=66225)]: 1.3 MB
>         |-- field 'latitude' [BlockTreeTerms(terms=8714542,postings=33321876,positions=-1,docs=8330469)]: 1.7 MB
>             |-- term index [FST(input=BYTE1,output=ByteSequenceOutputs,packed=false,nodes=854,arcs=77300)]: 1.7 MB
>         |-- field 'longitude' [BlockTreeTerms(terms=11557222,postings=33321876,positions=-1,docs=8330469)]: 2.6 MB
>             |-- term index [FST(input=BYTE1,output=ByteSequenceOutputs,packed=false,nodes=1577,arcs=114570)]: 2.6 MB
>         |-- field 'name' [BlockTreeTerms(terms=2598879,postings=16833071,positions=16874267,docs=8330325)]: 771.5 KB
>             |-- term index [FST(input=BYTE1,output=ByteSequenceOutputs,packed=false,nodes=13790,arcs=46514)]: 771.3 KB
>         |-- delegate [Lucene41PostingsReader(positions=true,payloads=false)]: 32 bytes
> |-- norms [Lucene49NormsProducer(fields=3,active=3)]: 15.9 MB
>     |-- field 'alternatenames' [byte array]: 7.9 MB
>     |-- field 'asciiname' [table compressed [Packed64SingleBlock4(bitsPerValue=4,size=8330469,blocks=520655)]]: 4 MB
>     |-- field 'name' [table compressed [Packed64SingleBlock4(bitsPerValue=4,size=8330469,blocks=520655)]]: 4 MB
> |-- docvalues [PerFieldDocValues(formats=1)]: 12.1 MB
>     |-- format 'Lucene410_0' [Lucene410DocValuesProducer(fields=5)]: 12.1 MB
>         |-- addresses field 'alternatenames' [MonotonicBlockPackedReader(blocksize=16384,size=407026,avgBPV=16)]: 808.5 KB
>         |-- addresses field 'asciiname' [MonotonicBlockPackedReader(blocksize=16384,size=330528,avgBPV=17)]: 698.6 KB
>         |-- addresses field 'name' [MonotonicBlockPackedReader(blocksize=16384,size=335020,avgBPV=17)]: 703.7 KB
>         |-- ord index field 'alternatenames' [MonotonicBlockPackedReader(blocksize=16384,size=8330470,avgBPV=9)]: 9.8 MB
>         |-- reverse index field 'alternatenames' [ReverseTermsIndex(size=6360)]: 77.9 KB
>             |-- term bytes [PagedBytes(blocksize=32768)]: 67.7 KB
>             |-- term addresses [MonotonicBlockPackedReader(blocksize=16384,size=6360,avgBPV=13)]: 10.2 KB
>         |-- reverse index field 'asciiname' [ReverseTermsIndex(size=5165)]: 60.1 KB
>             |-- term bytes [PagedBytes(blocksize=32768)]: 53 KB
>             |-- term addresses [MonotonicBlockPackedReader(blocksize=16384,size=5165,avgBPV=11)]: 7 KB
>         |-- reverse index field 'name' [ReverseTermsIndex(size=5235)]: 61.2 KB
>             |-- term bytes [PagedBytes(blocksize=32768)]: 54.1 KB
>             |-- term addresses [MonotonicBlockPackedReader(blocksize=16384,size=5235,avgBPV=11)]: 7.1 KB
> |-- stored fields [CompressingStoredFieldsReader(mode=FAST,chunksize=16384)]: 216.3 KB
>     |-- stored field index [CompressingStoredFieldsIndexReader(blocks=65)]: 216.3 KB
>         |-- doc base deltas: 55.8 KB
>         |-- start pointer deltas: 158.9 KB
> |-- term vectors [CompressingTermVectorsReader(mode=FAST,chunksize=4096)]: 224 KB
>     |-- term vector index [CompressingStoredFieldsIndexReader(blocks=67)]: 224 KB
>         |-- doc base deltas: 65.6 KB
>         |-- start pointer deltas: 156.8 KB
> {noformat}
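A helper of this kind could be sketched roughly as follows. This is not the code from the patch: `Accountable` is a local stand-in, `Node` and `TreePrinter` are hypothetical names, the four-space indent per level is an assumption, and sizes are printed as raw byte counts rather than human-readable units.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Local stand-in for the Lucene interface (assumption).
interface Accountable {
  long ramBytesUsed();
  Iterable<? extends Accountable> getChildResources();
}

// Hypothetical node with a fixed description, size, and children.
final class Node implements Accountable {
  private final String description;
  private final long bytes;
  private final List<Node> children;

  Node(String description, long bytes, Node... children) {
    this.description = description;
    this.bytes = bytes;
    this.children = new ArrayList<>(Arrays.asList(children));
  }

  @Override public long ramBytesUsed() { return bytes; }
  @Override public Iterable<? extends Accountable> getChildResources() { return children; }
  @Override public String toString() { return description; }
}

final class TreePrinter {
  // Renders one line per resource: "<description>: <bytes> bytes".
  static String toString(Accountable root) {
    StringBuilder sb = new StringBuilder();
    append(sb, root, 0);
    return sb.toString();
  }

  private static void append(StringBuilder sb, Accountable a, int depth) {
    // Nested entries are indented one step per level below the first and prefixed with "|-- ".
    for (int i = 1; i < depth; i++) sb.append("    ");
    if (depth > 0) sb.append("|-- ");
    sb.append(a).append(": ").append(a.ramBytesUsed()).append(" bytes\n");
    for (Accountable child : a.getChildResources()) {
      append(sb, child, depth + 1);
    }
  }
}
```

The printer relies only on `toString()`, `ramBytesUsed()`, and `getChildResources()`, which is why useful `toString()` implementations matter for the output to be readable.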
> Note this works for any Accountable, so also e.g. NRTCachingDirectory,
> OrdinalMap, Suggesters, FSTs, and so on. You can also traverse the graph
> yourself and output whatever you want.
> To be safe, I define that the graph returned is a "point-in-time snapshot" and
> free of race conditions. The Accountable helper methods provide this and
> also prevent access (even via cast) to data structures you shouldn't be able
> to get to; they just provide information.
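One way to get both properties (a frozen view, and no route back to the live object) is to copy the numbers into a fresh anonymous object. The sketch below is an assumption about how such a helper could look, not the patch itself; `Accountable` is a local stand-in and `Snapshots.snapshot` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Local stand-in for the Lucene interface (assumption).
interface Accountable {
  long ramBytesUsed();
  Iterable<? extends Accountable> getChildResources();
}

final class Snapshots {
  private Snapshots() {}

  // Captures description, size, and (recursively) children at call time.
  static Accountable snapshot(final String description, Accountable in) {
    final long bytes = in.ramBytesUsed();
    List<Accountable> copies = new ArrayList<>();
    for (Accountable child : in.getChildResources()) {
      copies.add(snapshot(child.toString(), child));
    }
    final List<Accountable> frozen = Collections.unmodifiableList(copies);
    // The anonymous object holds only the captured values, never 'in',
    // so casting it cannot expose the underlying data structure.
    return new Accountable() {
      @Override public long ramBytesUsed() { return bytes; }
      @Override public Iterable<? extends Accountable> getChildResources() { return frozen; }
      @Override public String toString() { return description; }
    };
  }
}
```

The cost is one small object per node of the graph, which seems acceptable for a debugging aid.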
> Since we aren't on Java 8 yet (and cannot provide a simple default method),
> I think we should instead just add the method to Accountable, but add default
> emptyList() implementations to impacted data structures such as DocIdSet and
> Suggester. The codec APIs are lower level, and there I think it's best
> to leave the method abstract, since they should really be providing useful
> information.
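The split between the two kinds of base class might look like the sketch below. All class names here (`SimpleSet`, `LowLevelProducer`, `BitSetLike`) are hypothetical and `Accountable` is a local stand-in; the point is only where the empty default lives.

```java
import java.util.Collections;

// Local stand-in for the Lucene interface (assumption).
interface Accountable {
  long ramBytesUsed();
  Iterable<? extends Accountable> getChildResources();
}

// Hypothetical high-level base class: supplies the empty default,
// so existing subclasses keep compiling without changes.
abstract class SimpleSet implements Accountable {
  @Override public Iterable<? extends Accountable> getChildResources() {
    return Collections.emptyList();
  }
}

// Hypothetical codec-level base class: deliberately does not implement
// getChildResources(), so every concrete producer must describe its children.
abstract class LowLevelProducer implements Accountable {
}

// Inherits the empty default from SimpleSet.
final class BitSetLike extends SimpleSet {
  private final long[] words;
  BitSetLike(int numWords) { this.words = new long[numWords]; }

  // Payload only (8 bytes per long); object and array headers are ignored in this sketch.
  @Override public long ramBytesUsed() { return 8L * words.length; }
}
```

A concrete subclass of `LowLevelProducer` that omitted `getChildResources()` would fail to compile, which is the intended nudge for codec implementations.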