[ https://issues.apache.org/jira/browse/ACCUMULO-4391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15476871#comment-15476871 ]
marco polo commented on ACCUMULO-4391:
--------------------------------------

Why is that decompressor being shared? Why isn't each thread given its own decompressor for its own block read?

> Source deepcopies cannot be used safely in separate threads in tserver
> ----------------------------------------------------------------------
>
>                 Key: ACCUMULO-4391
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4391
>             Project: Accumulo
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.6.5
>            Reporter: Ivan Bella
>            Assignee: Ivan Bella
>             Fix For: 1.6.6, 1.7.3, 1.8.1, 2.0.0
>
>   Original Estimate: 24h
>          Time Spent: 12.5h
>  Remaining Estimate: 11.5h
>
> We have iterators that create deep copies of the source and use them in
> separate threads. As it turns out, this is not safe, and we end up with many
> exceptions, mostly down in the ZlibDecompressor library. Curiously, if you
> turn on the data cache for the table being scanned, the errors disappear.
> After much hunting, it turns out that the real bug is in the
> BoundedRangeFileInputStream. The read() method therein appropriately
> synchronizes on the underlying FSDataInputStream; however, the available()
> method does not. Adding similar synchronization on that stream fixes the
> issue. On a side note, the available() call is only invoked within the
> Hadoop CompressionInputStream for use in the getPos() call, which does
> not appear to actually be used, at least in this context.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
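The fix described in the issue can be sketched as follows. This is a minimal, hypothetical illustration of the pattern, not the actual Accumulo BoundedRangeFileInputStream: a bounded-range wrapper around a shared stream, where available() synchronizes on the underlying stream the same way read() already does, so that concurrent deep copies cannot observe an inconsistent position.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Illustrative sketch only: class and field names are assumptions,
// modeled loosely on the behavior described in ACCUMULO-4391.
class BoundedRangeStream extends InputStream {
    private final InputStream underlying; // shared across deep copies
    private long pos;                     // current position in the range
    private final long end;               // exclusive end of the range

    BoundedRangeStream(InputStream underlying, long start, long end) {
        this.underlying = underlying;
        this.pos = start;
        this.end = end;
    }

    @Override
    public int read() throws IOException {
        // read() already guarded access to the shared stream like this
        synchronized (underlying) {
            if (pos >= end) {
                return -1;
            }
            int b = underlying.read();
            if (b >= 0) {
                pos++;
            }
            return b;
        }
    }

    @Override
    public int available() throws IOException {
        // The fix: take the same lock here, so available() sees a
        // consistent position even when other threads are reading.
        synchronized (underlying) {
            return (int) (end - pos);
        }
    }
}
```

Without the lock in available(), a thread calling it while another thread is inside read() could see a half-updated position, which is consistent with the errors surfacing only under multi-threaded use of deep copies.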