[ https://issues.apache.org/jira/browse/ACCUMULO-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15133060#comment-15133060 ]

Eric Newton commented on ACCUMULO-4120:
---------------------------------------

The proposed recovery procedure above worked.  This ticket has already been 
addressed in 1.7 and beyond.


> large root tablet causes system failure
> ---------------------------------------
>
>                 Key: ACCUMULO-4120
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4120
>             Project: Accumulo
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.6.4, 1.7.0
>         Environment: 300 node test cluster
>            Reporter: Eric Newton
>             Fix For: 1.7.0
>
>
> On a large test cluster, a table was storing very large row keys that were 
> similar to one another and ran 1 - 10 MB per row id (which is a bad schema... 
> but not the problem, yet).
> Large row keys made the tablet large, so it split, and the first 1 - 10 MB of 
> the row keys (the split points) were stored in the metadata table.
> The metadata table has a small split size, so it split as well.
> This ended up recording several keys in the root tablet that were very large. 
> For example, a single metadata table file was 700M (compressed) and contained 
> 34 keys.
> The problem is that *everyone* needs to read the root tablet to find the 
> metadata tablets, and that was causing the tablet server hosting the root 
> tablet to run out of heap.
> Possible solution: bring down the cluster and put it in "safe mode", where 
> only the metadata table is brought online. Raise the split size of the 
> metadata table to something large (1G?). Merge the metadata table, which 
> should remove the large records from the root tablet.
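> A rough sketch of that step through the Java client API, assuming an existing 
> Connector named conn (the equivalent shell config/merge commands work just as 
> well; the 1G threshold is only an example):
> {code:java}
> import org.apache.accumulo.core.client.AccumuloException;
> import org.apache.accumulo.core.client.AccumuloSecurityException;
> import org.apache.accumulo.core.client.Connector;
> import org.apache.accumulo.core.client.TableNotFoundException;
> import org.apache.accumulo.core.conf.Property;
>
> public class ShrinkMetadata {
>   // Raise the metadata split threshold, then merge the whole table so the
>   // oversized split points are removed from the root tablet.
>   static void shrink(Connector conn)
>       throws AccumuloException, AccumuloSecurityException, TableNotFoundException {
>     conn.tableOperations().setProperty("accumulo.metadata",
>         Property.TABLE_SPLIT_THRESHOLD.getKey(), "1G");
>     // Null start/end rows merge every metadata tablet into one.
>     conn.tableOperations().merge("accumulo.metadata", null, null);
>   }
> }
> {code}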
> There's a utility (SplitLarge) that can be used to remove large keys from the 
> RFiles of the offending table. Once the ridiculous keys are stripped out, the 
> table can be brought online and merged, which will remove the large keys from 
> the metadata table.
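> Stripping the keys happens at the RFile level with SplitLarge, outside the 
> client API; the bring-online-and-merge step afterwards might look roughly like 
> this (sketch only, table name supplied by the caller):
> {code:java}
> import org.apache.accumulo.core.client.AccumuloException;
> import org.apache.accumulo.core.client.AccumuloSecurityException;
> import org.apache.accumulo.core.client.Connector;
> import org.apache.accumulo.core.client.TableNotFoundException;
>
> public class ReonlineAndMerge {
>   // Bring the repaired table back online, then merge it so the oversized
>   // split points drop out of the metadata table.
>   static void repair(Connector conn, String table)
>       throws AccumuloException, AccumuloSecurityException, TableNotFoundException {
>     conn.tableOperations().online(table);
>     conn.tableOperations().merge(table, null, null);
>   }
> }
> {code}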
> As long as this is done on a small number of nodes, the servers should have 
> enough memory to satisfy the requests to perform the metadata table queries 
> and updates.
> We may want to consider adding a key-size constraint to the metadata table to 
> prevent this kind of thing in the future.
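> For reference, such a constraint might look something like the sketch below 
> (the class name and the 1 MB limit are made up for illustration); it could be 
> attached with tableOperations().addConstraint() or the shell's constraint 
> command.
> {code:java}
> import java.util.Collections;
> import java.util.List;
>
> import org.apache.accumulo.core.constraints.Constraint;
> import org.apache.accumulo.core.data.ColumnUpdate;
> import org.apache.accumulo.core.data.Mutation;
>
> // Illustrative only: rejects mutations whose key (row + column parts) is too big.
> public class MaxKeySizeConstraint implements Constraint {
>
>   private static final long MAX_KEY_SIZE = 1 << 20; // 1 MB, arbitrary for the sketch
>   private static final short KEY_TOO_LARGE = 1;
>
>   @Override
>   public String getViolationDescription(short violationCode) {
>     return violationCode == KEY_TOO_LARGE
>         ? "Key exceeds " + MAX_KEY_SIZE + " bytes" : null;
>   }
>
>   @Override
>   public List<Short> check(Constraint.Environment env, Mutation mutation) {
>     byte[] row = mutation.getRow();
>     for (ColumnUpdate update : mutation.getUpdates()) {
>       long keySize = (long) row.length + update.getColumnFamily().length
>           + update.getColumnQualifier().length + update.getColumnVisibility().length;
>       if (keySize > MAX_KEY_SIZE)
>         return Collections.singletonList(KEY_TOO_LARGE);
>     }
>     return Collections.<Short>emptyList(); // no violations
>   }
> }
> {code}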


