[ https://issues.apache.org/jira/browse/ACCUMULO-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15133060#comment-15133060 ]
Eric Newton commented on ACCUMULO-4120:
---------------------------------------

The proposed recovery procedure above worked. This ticket has already been addressed in 1.7 and beyond.

> large root tablet causes system failure
> ---------------------------------------
>
>                 Key: ACCUMULO-4120
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4120
>             Project: Accumulo
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.6.4, 1.7.0
>         Environment: 300 node test cluster
>            Reporter: Eric Newton
>             Fix For: 1.7.0
>
>
> On a large test cluster, a table was storing very large row keys that were
> similar for the first 1-10M of each row id (which is a bad schema, but not
> the problem yet).
> Large row keys made the tablet large, so it split, and the first 1-10M of the
> row keys were stored in the metadata table.
> The metadata table has a small split size, so it split as well.
> This ended up recording several very large keys in the root tablet.
> For example, a single metadata table file was 700M (compressed) and contained
> 34 keys.
> The problem is that *everyone* wants to read the root tablet to find the
> metadata tablets, and that was causing the tablet server hosting the root
> tablet to run out of heap.
> Possible solution: bring down the cluster and put it in "safe mode", where
> only the metadata table is brought online. Raise the split size of the
> metadata table to something large (1G?). Merge the metadata table, which
> should remove the large records from the root tablet.
> There's a utility (SplitLarge) that can be used to remove large keys from the
> RFiles of the offending table. Once the ridiculous keys are stripped out, the
> table can be brought online and merged, which will remove the large keys from
> the metadata table.
> As long as this is done on a small number of nodes, the servers should have
> enough memory to satisfy the requests to perform the metadata table queries
> and updates.
> We may want to consider adding key size to the metadata table constraint to
> prevent these things in the future.
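
The key-size check suggested at the end of the ticket could look roughly like the sketch below. It is only an illustration of the idea in plain Python, not Accumulo's Constraint API. The function names, the (row, family, qualifier) tuple layout, and the 1 MiB limit are all assumptions; the ticket does not specify a limit.

```python
# Hypothetical sketch of a key-size check; not Accumulo's Constraint API.
# The 1 MiB limit is an assumed value; the ticket does not name one.
MAX_KEY_BYTES = 1024 * 1024


def oversized_keys(mutations, max_key_bytes=MAX_KEY_BYTES):
    """Return the (row, family, qualifier) keys whose combined size exceeds the limit."""
    violations = []
    for row, family, qualifier in mutations:
        size = len(row) + len(family) + len(qualifier)
        if size > max_key_bytes:
            violations.append((row, family, qualifier))
    return violations


def check_metadata_update(mutations, max_key_bytes=MAX_KEY_BYTES):
    """Raise ValueError if any key in the update is larger than the limit."""
    bad = oversized_keys(mutations, max_key_bytes)
    if bad:
        raise ValueError(f"{len(bad)} key(s) exceed {max_key_bytes} bytes")
    return True
```

Rejecting the update at write time, rather than cleaning up afterwards, would keep multi-megabyte split points from ever reaching the metadata table or the root tablet.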