[ https://issues.apache.org/jira/browse/ACCUMULO-665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13405057#comment-13405057 ]

Josh Elser commented on ACCUMULO-665:
-------------------------------------

The big area where this can present itself is via the seekColumnFamilies 
argument to the SortedKeyValueIterator#seek() method. With the 
IntersectingIterator in the core module, you could run into "excessive" memory 
usage.

Say you're intersecting over the two terms "foo" and "bar", and that you have 
some column "documents" which sorts directly before the term "foo". Assume the 
keys in the "documents" column are in their own locality group and have very 
large Values associated with them. The IntersectingIterator only forwards 
whatever seek-column-families are passed in to it; it does not set any itself. 
Meaning, even though the "documents" column is in its own section of the RFile, 
by not specifically setting the seek column families for each term to just the 
term itself, the underlying Accumulo code will still open up all locality 
groups (the one for "documents" and the default locality group).
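The fix described above can be sketched roughly as follows. This is a hypothetical, simplified stand-in for illustration only, not Accumulo's real API: the real SortedKeyValueIterator#seek() takes a Range and a Collection&lt;ByteSequence&gt;, and the class and method names below are invented.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

public class SeekFamiliesSketch {
    // Hypothetical stand-in for the seek() contract described in the comment.
    interface SimpleSeekable {
        void seek(String range, Collection<String> seekColumnFamilies);
    }

    // Stand-in for an RFile reader: it would open only the locality groups
    // whose column families intersect the requested set.
    static class RecordingSource implements SimpleSeekable {
        List<String> lastFamilies = new ArrayList<>();
        public void seek(String range, Collection<String> seekColumnFamilies) {
            lastFamilies = new ArrayList<>(seekColumnFamilies);
        }
    }

    // The proposed behavior: each per-term source seeks with its own term as
    // the seek column family, instead of forwarding the caller's (possibly
    // empty) set, so unrelated locality groups like "documents" stay closed.
    static class TermSource {
        final String term;
        final SimpleSeekable source;
        TermSource(String term, SimpleSeekable source) {
            this.term = term;
            this.source = source;
        }
        void seek(String range) {
            source.seek(range, Collections.singleton(term));
        }
    }

    public static void main(String[] args) {
        RecordingSource reader = new RecordingSource();
        new TermSource("foo", reader).seek("[a, z)");
        // Only the "foo" locality group needs to be opened.
        System.out.println(reader.lastFamilies);
    }
}
```

With the seek column family pinned to the term, a reader that honors locality groups never touches the "documents" section of the RFile, regardless of what the caller passed in.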

The javadoc for SortedKeyValueIterator should also be updated to inform users 
of the implications of (not) setting the seekColumnFamilies.
                
> large values, complex iterator stacks, and RFile readers can consume a 
> surprising amount of memory
> --------------------------------------------------------------------------------------------------
>
>                 Key: ACCUMULO-665
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-665
>             Project: Accumulo
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.5.0, 1.4.0
>         Environment: large cluster
>            Reporter: Eric Newton
>            Assignee: Eric Newton
>            Priority: Minor
>
> On a production cluster, with a complex iterator tree, a large value (~350M) 
> was causing a 4G tserver to fail with out-of-memory.
> There were several factors contributing to the problem:
> # a bug: the query should not have been looking at the big data
> # complex iterator tree, causing many copies of the data to be held at the 
> same time
> # RFile doubles the buffer it uses to load values, and continues to use that 
> large buffer for future values
> This ticket is for the last point.  If we know we're not even going to look 
> at the value, we can read past it without storing it in memory.  It is 
> surprising that skipping past a large value would cause the server to run out 
> of memory, especially since it should fit into memory enough times to be 
> returned to the caller.
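The "read past it without storing it in memory" idea from the last point can be sketched in plain Java. This is not RFile's actual code; it assumes a hypothetical length-prefixed value layout and shows skipping the value bytes instead of buffering them:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class SkipValueDemo {
    // Advance past a length-prefixed value without allocating a buffer for it.
    public static int skipValue(DataInputStream in) throws IOException {
        int len = in.readInt();          // value length prefix
        long remaining = len;
        while (remaining > 0) {
            long skipped = in.skip(remaining);   // no per-value allocation
            if (skipped <= 0) {                  // skip() may make no progress
                in.readByte();                   // fall back to one byte at a time
                remaining--;
            } else {
                remaining -= skipped;
            }
        }
        return len;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeInt(5);                         // pretend this is a huge value
        out.write(new byte[] {1, 2, 3, 4, 5});
        out.writeUTF("next");                    // the data we actually want
        DataInputStream in =
            new DataInputStream(new ByteArrayInputStream(bos.toByteArray()));
        int skipped = skipValue(in);
        System.out.println("skipped=" + skipped + " next=" + in.readUTF());
    }
}
```

The same pattern applied to a ~350M value would cost no heap at all, instead of a doubled buffer that sticks around for future values.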

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
