[ https://issues.apache.org/jira/browse/OAK-11478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17926377#comment-17926377 ]

Thomas Mueller commented on OAK-11478:
--------------------------------------

Usage:
{noformat}
java -cp oak-run-1.77-SNAPSHOT.jar org.apache.jackrabbit.oak.index.indexer.document.flatfile.analysis.StatsBuilder --fileName <treeStoreFileName> --treeStore
{noformat}

Here <treeStoreFileName> is the tree store file, for example 
"r194f8dfc943-0-1.merged-tree.lz4".

Output: progress is listed first, in millions of nodes (with the path). The 
following sections are then printed:
* NodeCount: number of nodes, if more than one million.
* PropertyStats: for each property, the count, the approximate number of 
distinct values, and the average and maximum size.
* NodeTypeCount: number of nodes with the various primary types and mixins.
* BinarySize: size of references (in the blob store) and embedded binaries 
(small binaries), in GB, per path.
* BinarySizeHistogram: approximate histogram of binary sizes, for references 
and embedded binaries.
* TopLargestBinaries: the 10 largest binaries.
* DistinctBinarySizeHistogram: histogram of approximate counts of binaries, 
and approximate distinct counts.
* DistinctBinarySize: approximate counts of binaries, and approximate distinct 
counts.
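
The per-path figures in the BinarySize section can be read as subtree totals. 
The following is a minimal sketch of that reading, assuming each binary's size 
is added to the path of its node and to every ancestor path up to the root. 
The class PathSizeSketch and its methods are hypothetical names used for 
illustration only; this is not the StatsBuilder implementation.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration of per-path size totals; not part of oak-run. */
final class PathSizeSketch {

    // path -> total size in bytes of all binaries at or below that path
    private final Map<String, Long> sizeByPath = new TreeMap<>();

    /** Adds the size to the given absolute path and to all of its ancestors. */
    void add(String path, long sizeBytes) {
        if (!path.startsWith("/")) {
            throw new IllegalArgumentException("absolute path expected: " + path);
        }
        String p = path;
        while (true) {
            sizeByPath.merge(p, sizeBytes, Long::sum);
            if (p.equals("/")) {
                break;
            }
            int last = p.lastIndexOf('/');
            // the parent of a top-level node such as "/content" is the root
            p = last == 0 ? "/" : p.substring(0, last);
        }
    }

    /** Returns the total for the path, or 0 if nothing was recorded there. */
    long size(String path) {
        return sizeByPath.getOrDefault(path, 0L);
    }

    public static void main(String[] args) {
        PathSizeSketch s = new PathSizeSketch();
        s.add("/content/dam/a.png", 10);
        s.add("/jcr:system/jcr:versionStorage/x", 5);
        // sorted by path because a TreeMap is used
        s.sizeByPath.forEach((path, size) -> System.out.println(path + ": " + size));
    }
}
```

Under this assumption, the value for "/" is the sum over the whole repository, 
and each child path accounts for its share of that sum.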
 
The following output means there are around 4 GB of binary references in 
total: 2 GB under /content/dam and 2 GB in the version store 
(/jcr:system/jcr:versionStorage). The DistinctBinarySize section shows that 
only about 2 GiB are distinct binaries, because multiple references can point 
to the same binary:

{noformat}
BinarySize references in GB (resolution: 100000000)
/: 4
/content: 2
/content/dam: 2
/content/dam/projects: 1
/content/dam/projects/translation: 1
/jcr:system: 2
/jcr:system/jcr:versionStorage: 2

DistinctBinarySize
total distinct count: 33866
total distinct size GiB: 2
total reference count: 117717
total reference size GiB: 4
{noformat}
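
The difference between the reference totals and the distinct totals can be 
sketched as follows. This is an illustration only: it keeps exact counts in a 
map keyed by a blob identifier, whereas the tool reports approximate counts. 
The class DistinctBinarySketch, its methods, and the blob ids are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of reference vs. distinct totals; not part of oak-run. */
final class DistinctBinarySketch {

    private long referenceCount;
    private long referenceSize;
    // blob id -> size in bytes; exact bookkeeping (the tool uses approximations)
    private final Map<String, Long> distinct = new HashMap<>();

    /** Records one reference to a binary; the same blob id may be added many times. */
    void addReference(String blobId, long sizeBytes) {
        referenceCount++;
        referenceSize += sizeBytes;
        // only the first reference to a blob contributes to the distinct totals
        distinct.putIfAbsent(blobId, sizeBytes);
    }

    long referenceCount() {
        return referenceCount;
    }

    long referenceSize() {
        return referenceSize;
    }

    long distinctCount() {
        return distinct.size();
    }

    long distinctSize() {
        long sum = 0;
        for (long size : distinct.values()) {
            sum += size;
        }
        return sum;
    }

    public static void main(String[] args) {
        DistinctBinarySketch s = new DistinctBinarySketch();
        s.addReference("blob-a", 100); // e.g. referenced from /content/dam
        s.addReference("blob-a", 100); // same binary, e.g. from a version
        s.addReference("blob-b", 50);
        System.out.println("total distinct count: " + s.distinctCount());
        System.out.println("total distinct size: " + s.distinctSize());
        System.out.println("total reference count: " + s.referenceCount());
        System.out.println("total reference size: " + s.referenceSize());
    }
}
```

A binary referenced from both /content/dam and the version store counts twice 
in the reference totals but only once in the distinct totals, which is 
consistent with the 4 versus 2 in the output above.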

> Node store statistics: support the tree store
> ---------------------------------------------
>
>                 Key: OAK-11478
>                 URL: https://issues.apache.org/jira/browse/OAK-11478
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: indexing
>            Reporter: Thomas Mueller
>            Assignee: Thomas Mueller
>            Priority: Major
>
> There is a statistics collector in oak-run-common that I use sometimes.
> It is currently missing support for tree stores.
> This issue is about adding support for this.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
