[ https://issues.apache.org/jira/browse/ACCUMULO-4391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15478013#comment-15478013 ]
Josh Elser commented on ACCUMULO-4391:
--------------------------------------

Thanks for these last two posts, [~ivan.bella]. They are extremely helpful to understand the big picture.

> Source deepcopies cannot be used safely in separate threads in tserver
> ----------------------------------------------------------------------
>
>                  Key: ACCUMULO-4391
>                  URL: https://issues.apache.org/jira/browse/ACCUMULO-4391
>              Project: Accumulo
>           Issue Type: Bug
>           Components: core
>     Affects Versions: 1.6.5
>             Reporter: Ivan Bella
>             Assignee: Ivan Bella
>              Fix For: 1.6.6, 1.7.3, 1.8.1, 2.0.0
>
>    Original Estimate: 24h
>           Time Spent: 12h 50m
>   Remaining Estimate: 11h 10m
>
> We have iterators that create deep copies of the source and use them in
> separate threads. As it turns out, this is not safe, and we end up with many
> exceptions, mostly down in the ZlibDecompressor library. Curiously, if you
> turn on the data cache for the table being scanned, the errors disappear.
> After much hunting, it turns out that the real bug is in
> BoundedRangeFileInputStream. The read() method therein appropriately
> synchronizes on the underlying FSDataInputStream; however, the available()
> method does not. Adding the same synchronization on that stream fixes the
> issue. On a side note, the available() call is only invoked within the
> Hadoop CompressionInputStream, for use in its getPos() call, which does not
> appear to actually be used, at least in this context.
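The race described above is easiest to see in code. Below is a minimal sketch of the class, assuming the general shape of Accumulo's BoundedRangeFileInputStream (which was adapted from Hadoop's TFile code); the package, constructor, and field names here are approximations of the real source, not an exact copy.

{code:java}
import java.io.IOException;
import java.io.InputStream;

import org.apache.hadoop.fs.FSDataInputStream;

// Hedged sketch only: the real Accumulo class has more methods and slightly
// different bookkeeping; the names below are approximations.
public class BoundedRangeFileInputStream extends InputStream {

  private final FSDataInputStream in; // shared by all deep copies of a source
  private long pos;                   // current position within this range
  private final long end;             // exclusive end of this range

  public BoundedRangeFileInputStream(FSDataInputStream in, long offset, long length) {
    this.in = in;
    this.pos = offset;
    this.end = offset + length;
  }

  @Override
  public int read(byte[] b, int off, int len) throws IOException {
    int n = (int) Math.min(len, end - pos);
    if (n <= 0) {
      return -1;
    }
    // Already synchronized in the existing code: the seek+read pair must be
    // atomic because several threads share the one underlying FSDataInputStream.
    synchronized (in) {
      in.seek(pos);
      int ret = in.read(b, off, n);
      if (ret > 0) {
        pos += ret;
      }
      return ret;
    }
  }

  @Override
  public int read() throws IOException {
    byte[] one = new byte[1];
    return read(one, 0, 1) == 1 ? (one[0] & 0xff) : -1;
  }

  @Override
  public int available() throws IOException {
    // The fix: take the same monitor as read(). available() also touches the
    // shared stream, so an unsynchronized call (reached via Hadoop's
    // CompressionInputStream.getPos()) races with the seek+read above.
    synchronized (in) {
      int avail = in.available();
      if (pos + avail > end) {
        avail = (int) (end - pos);
      }
      return avail;
    }
  }
}
{code}

The key point is that every reader deep-copied from one source shares a single FSDataInputStream, so any method that touches that shared stream must hold the same lock; read() already did, and available() did not.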