tablet server runs out of memory performing a major compaction
--------------------------------------------------------------

                 Key: ACCUMULO-201
                 URL: https://issues.apache.org/jira/browse/ACCUMULO-201
             Project: Accumulo
          Issue Type: Bug
          Components: tserver
            Reporter: Eric Newton
            Assignee: Eric Newton


An Accumulo user watched their cluster slowly shrink: one tablet server would
fail every 8-10 minutes.

We determined that a major compaction of a single tablet would cause the tablet
server to run out of memory.  That tablet would then be reassigned to another
server, which would schedule the same major compaction and die as well.

 # it was harder than it should have been to identify the tablet causing the
problem
 # the tablet had several large existing files plus a few bulk-loaded files
containing a few very large key/values
 # the large key/values were between *10 and 100 megabytes each*, and the tablet
server had a 1G memory limit
 # the next key from every file being merged sits in memory during the
merge-sort
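
The last point can be illustrated with a small k-way merge. This is only a
sketch, not the tablet server's code: Entry and MergeSketch are hypothetical
names, and plain in-memory lists stand in for sorted files. It shows that the
merge buffers one head entry per input, so the memory needed is roughly the sum
of the largest entries across all files being merged.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in for one key/value in a sorted file; not an Accumulo class.
final class Entry {
    final String key;
    final byte[] value;

    Entry(String key, byte[] value) {
        this.key = key;
        this.value = value;
    }

    // Rough size estimate: one byte per key character plus the value bytes.
    long sizeInBytes() {
        return key.length() + (long) value.length;
    }
}

final class MergeSketch {
    // One buffered head entry plus the remainder of that input's sorted stream.
    private static final class Source {
        Entry head;
        final Iterator<Entry> rest;

        Source(Iterator<Entry> rest) {
            this.rest = rest;
            this.head = rest.next();
        }
    }

    // Merges the sorted inputs, returns the keys in merged order, and
    // records the peak number of buffered bytes in peak[0].
    static List<String> merge(List<List<Entry>> files, long[] peak) {
        PriorityQueue<Source> heap =
            new PriorityQueue<>((a, b) -> a.head.key.compareTo(b.head.key));
        long buffered = 0;
        for (List<Entry> file : files) {
            if (!file.isEmpty()) {
                Source s = new Source(file.iterator());
                buffered += s.head.sizeInBytes();
                heap.add(s);
            }
        }
        peak[0] = buffered;

        List<String> mergedKeys = new ArrayList<>();
        while (!heap.isEmpty()) {
            Source s = heap.poll();
            mergedKeys.add(s.head.key);
            buffered -= s.head.sizeInBytes();
            if (s.rest.hasNext()) {
                // The next entry of this input now sits in memory.
                s.head = s.rest.next();
                buffered += s.head.sizeInBytes();
                heap.add(s);
            }
            peak[0] = Math.max(peak[0], buffered);
        }
        return mergedKeys;
    }
}
```

If, for instance, ten inputs each had a 100 megabyte entry at their head at the
same time, the buffered total alone would be about 1G.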

There exists a Constraint which can limit the size of mutations during normal 
ingest.  However, there is no constraint or check on the size of mutations that 
may be bulk loaded.
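
A check of this kind for bulk-loaded data might look roughly like the sketch
below. BulkSizeCheck and KeyValue are hypothetical names, the entries are
assumed to have already been read from the file being loaded, and the size
limit is an assumed setting; none of this is existing Accumulo API.

```java
import java.util.ArrayList;
import java.util.List;

final class BulkSizeCheck {
    // Hypothetical key/value pair read from a file staged for bulk load.
    static final class KeyValue {
        final byte[] key;
        final byte[] value;

        KeyValue(byte[] key, byte[] value) {
            this.key = key;
            this.value = value;
        }

        long size() {
            return (long) key.length + value.length;
        }
    }

    // Returns the positions of entries whose key plus value exceeds maxBytes.
    static List<Integer> oversized(List<KeyValue> entries, long maxBytes) {
        List<Integer> violations = new ArrayList<>();
        for (int i = 0; i < entries.size(); i++) {
            if (entries.get(i).size() > maxBytes) {
                violations.add(i);
            }
        }
        return violations;
    }
}
```

A bulk load could then be rejected, or at least logged, when the returned list
is not empty.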

The tablet server should log the key extent (range) of a tablet prior to 
attempting a major compaction.

Large key/values (those that take up a significant portion of the working
memory of the JVM) might need to go into a separate merge file, or might
require multi-stage merges, just to defend against an out-of-memory failure.
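
As a rough sketch of how a multi-stage merge could be sized, the code below
derives the number of files to merge per pass from a memory budget.
MergePlanner, the heap fraction and the method names are assumptions made for
illustration, not part of Accumulo; only Runtime.maxMemory() is a real JDK
call.

```java
final class MergePlanner {
    // Bytes the merge may spend on buffered entries: an assumed fraction of the heap.
    static long budgetBytes(long maxHeapBytes, double heapFraction) {
        if (heapFraction <= 0.0 || heapFraction > 1.0) {
            throw new IllegalArgumentException("heapFraction must be in (0, 1]");
        }
        return (long) (maxHeapBytes * heapFraction);
    }

    // Number of files to merge at once so that one largest-size entry per file
    // fits in the budget. A merge needs at least two inputs, so the result is
    // never below 2, even if that exceeds the budget.
    static int fanIn(long budgetBytes, long largestEntryBytes) {
        if (largestEntryBytes <= 0) {
            throw new IllegalArgumentException("largestEntryBytes must be positive");
        }
        long fit = budgetBytes / largestEntryBytes;
        return (int) Math.max(2L, Math.min(fit, Integer.MAX_VALUE));
    }

    // Number of merge passes needed to reduce fileCount files to one.
    static int passes(int fileCount, int fanIn) {
        if (fanIn < 2) {
            throw new IllegalArgumentException("fanIn must be at least 2");
        }
        int passes = 0;
        int remaining = fileCount;
        while (remaining > 1) {
            remaining = (int) (((long) remaining + fanIn - 1) / fanIn);
            passes++;
        }
        return passes;
    }

    public static void main(String[] args) {
        long budget = budgetBytes(Runtime.getRuntime().maxMemory(), 0.5);
        int filesPerPass = fanIn(budget, 100L * 1024 * 1024);
        System.out.println("files per pass: " + filesPerPass
            + ", passes for 12 files: " + passes(12, filesPerPass));
    }
}
```

With a 1 GiB heap, half of it budgeted for the merge, and 100 MiB entries, this
gives 5 files per pass, so 12 files would take 2 passes.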

Tablet servers could mark tablets during a major compaction attempt.  Tablets 
with multiple markers could use a multi-pass merge to attempt to survive the 
merge.  Alternatively, the master could refuse to assign tablets with too many 
markers.
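
The marker idea could be sketched as follows. CompactionMarkers, its thresholds
and the in-memory map are assumptions for illustration only. Because the
tablet server dies during the failed attempt, real markers would have to be
kept somewhere that outlives the server.

```java
import java.util.HashMap;
import java.util.Map;

final class CompactionMarkers {
    enum Action { NORMAL_MERGE, MULTI_PASS_MERGE, REFUSE_ASSIGNMENT }

    // Tablet extent -> number of compaction attempts that never completed.
    private final Map<String, Integer> attempts = new HashMap<>();
    private final int multiPassAfter;
    private final int refuseAfter;

    // Both thresholds are hypothetical tuning values.
    CompactionMarkers(int multiPassAfter, int refuseAfter) {
        if (multiPassAfter < 1 || refuseAfter < multiPassAfter) {
            throw new IllegalArgumentException("need 1 <= multiPassAfter <= refuseAfter");
        }
        this.multiPassAfter = multiPassAfter;
        this.refuseAfter = refuseAfter;
    }

    // Called just before a major compaction of the tablet starts.
    void markAttempt(String extent) {
        attempts.merge(extent, 1, Integer::sum);
    }

    // Called once the major compaction has finished successfully.
    void clear(String extent) {
        attempts.remove(extent);
    }

    // Decides how to treat the tablet based on its unfinished attempts.
    Action decide(String extent) {
        int count = attempts.getOrDefault(extent, 0);
        if (count >= refuseAfter) {
            return Action.REFUSE_ASSIGNMENT;
        }
        if (count >= multiPassAfter) {
            return Action.MULTI_PASS_MERGE;
        }
        return Action.NORMAL_MERGE;
    }
}
```

A marker that is still present when the tablet is next assigned means the
previous attempt never reached clear(), which is the signal used here.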



