[ https://issues.apache.org/jira/browse/ZOOKEEPER-4566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mate Szalay-Beko resolved ZOOKEEPER-4566.
-----------------------------------------
Fix Version/s: 3.9.0
Resolution: Fixed

Issue resolved by pull request 1902
[https://github.com/apache/zookeeper/pull/1902]

> Create tool for recursive snapshot analysis
> -------------------------------------------
>
> Key: ZOOKEEPER-4566
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4566
> Project: ZooKeeper
> Issue Type: Improvement
> Reporter: Szabolcs Bukros
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.9.0
>
> Time Spent: 3h 20m
> Remaining Estimate: 0h
>
> I needed to analyze snapshots to determine which application caused a
> massive increase in snapshot size, by recursively checking the child node
> count and data size of each node, but I could not find a tool for the job.
> Loading the snapshots one by one and querying them with a ZooKeeper client
> proved too slow. SnapshotFormatter was very fast, but post-processing its
> output to extract the data relevant to my use case proved more work than
> writing a tool that produces the output I need. So I wrote
> SnapshotSumFormatter, based on SnapshotFormatter:
> {code:java}
> USAGE: SnapshotSumFormatter snapshot_file starting_node max_depth
> {code}
> The tool recursively traverses the child nodes under "starting_node" and,
> for each node, counts the nodes under it and sums up the size of the data
> stored in all of them. This helps to identify problematic jobs/applications
> that either store too much data or do not properly clean up after
> themselves. "max_depth" defines the deepest level that is still written to
> the output: 0 means there is no depth limit and every node's stats are
> displayed; 1 means the output contains only the stats of the starting node
> and its children; 2 adds another level, and so on. This ONLY affects the
> level of detail displayed, NOT the calculation.
>
> An example output looks like this (with "SnapshotSumFormatter <snapshot_file> / 2"):
> {code:java}
> /
> children: 1250511
> data: 1952186580
> -- /zookeeper
> -- children: 1
> -- data: 0
> -- /solr
> -- children: 1773
> -- data: 8419162
> ---- /solr/configs
> ---- children: 1640
> ---- data: 8407643
> ---- /solr/overseer
> ---- children: 6
> ---- data: 0
> ---- /solr/live_nodes
> ---- children: 3
> ---- data: 0
> {code}
>
> I think this might prove useful for others too and would like to share it.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
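The recursive summation and the depth-limited output described in the issue can be illustrated with a small, self-contained Java sketch. This is not the SnapshotSumFormatter code from pull request 1902: it does not read a snapshot file but builds a hypothetical in-memory `Node` tree, and the names `SnapshotSumSketch`, `Node`, `summarize`, `render` and `report` are made up for illustration. It also assumes that `children` counts every node below a given node (excluding the node itself) and that `data` is the summed data length of the node and everything below it; the issue text does not pin down either detail. The "-- " prefix per level copies the sample output above.

```java
import java.util.ArrayList;
import java.util.List;

public class SnapshotSumSketch {

    // Hypothetical stand-in for a znode; the real tool works on a
    // deserialized snapshot, not on this class.
    static final class Node {
        final String path;
        final long dataLength;
        final List<Node> children = new ArrayList<>();
        long descendantCount; // assumption: all nodes below this one, not itself
        long totalData;       // assumption: own data length plus all descendants'

        Node(String path, long dataLength) {
            this.path = path;
            this.dataLength = dataLength;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    // Pass 1: post-order walk over the WHOLE subtree, independent of maxDepth,
    // so the totals never depend on how much of the tree is printed.
    static void summarize(Node node) {
        node.descendantCount = 0;
        node.totalData = node.dataLength;
        for (Node child : node.children) {
            summarize(child);
            node.descendantCount += 1 + child.descendantCount;
            node.totalData += child.totalData;
        }
    }

    // Pass 2: print nodes down to maxDepth (0 = no limit, 1 = start node and
    // its children, and so on), using "--" per level as in the sample output.
    static void render(Node node, int depth, int maxDepth, StringBuilder out) {
        if (maxDepth != 0 && depth > maxDepth) {
            return; // hidden, but already included in the totals of its ancestors
        }
        String prefix = depth == 0 ? "" : "--".repeat(depth) + " ";
        out.append(prefix).append(node.path).append('\n');
        out.append(prefix).append("children: ").append(node.descendantCount).append('\n');
        out.append(prefix).append("data: ").append(node.totalData).append('\n');
        for (Node child : node.children) {
            render(child, depth + 1, maxDepth, out);
        }
    }

    static String report(Node start, int maxDepth) {
        summarize(start);
        StringBuilder out = new StringBuilder();
        render(start, 0, maxDepth, out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Toy tree with made-up data sizes, only to exercise the code.
        Node root = new Node("/", 0)
            .add(new Node("/app", 10)
                .add(new Node("/app/a", 100)
                    .add(new Node("/app/a/x", 1000))));
        System.out.print(report(root, 2));
    }
}
```

With `maxDepth` set to 2, `main` prints `/`, `-- /app` and `---- /app/a` with children/data totals of 3/1110, 2/1110 and 1/1100. `/app/a/x` is not printed, yet its 1000 bytes are still part of every total above it, which mirrors the point that "max_depth" changes only the displayed detail. Splitting the work into a full summing pass and a separate printing pass is one simple way to print a parent's totals before its children; a single traversal would have to buffer output instead.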