[ https://issues.apache.org/jira/browse/ACCUMULO-4708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16170326#comment-16170326 ]
Keith Turner commented on ACCUMULO-4708:
----------------------------------------

I think it would be good to open another issue to investigate how Mutation handles really large puts and to add sanity checks as needed. I took a quick look at the code, and I don't think it would handle them well in some cases.

> Limit RFile block size to 2GB
> -----------------------------
>
>                 Key: ACCUMULO-4708
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4708
>             Project: Accumulo
>          Issue Type: Bug
>          Components: core
>            Reporter: Nick Felts
>            Assignee: Nick Felts
>              Labels: pull-request-available
>             Fix For: 1.7.4, 1.8.2, 2.0.0
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> In core/file/rfile/bcfile/BCFile.java, the block size is determined by the size of a DataOutputStream, whose size() method returns an int. The javadoc for size() states "If the counter overflows, it will be wrapped to Integer.MAX_VALUE." This is a problem when RFiles record block regions, because Accumulo has no way of knowing how large a block actually is once the reported size is Integer.MAX_VALUE.
> To fix this, a check can be added that throws an exception once Integer.MAX_VALUE or more bytes have been written. A check can also be made when appending to the block, to make sure the key/value won't grow the block beyond that limit.
> A check should also be made when reading TABLE_FILE_COMPRESSED_BLOCK_SIZE and TABLE_FILE_COMPRESSED_BLOCK_SIZE_INDEX, to make sure nobody mistakenly configures a value larger than Integer.MAX_VALUE.
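For illustration, a minimal sketch of the kind of checks the description calls for, not the code from the linked pull request: the class BoundedBlockWriter, its append method, and validateConfiguredBlockSize are invented names for this example and are not Accumulo APIs; only DataOutputStream.size()'s saturating behavior, the Integer.MAX_VALUE limit, and the two property names come from the issue itself.

{code:java}
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical writer illustrating the checks described in the issue.
// Not part of Accumulo's BCFile code.
public class BoundedBlockWriter {

  private final DataOutputStream out;
  private long bytesWritten; // tracked as a long so the counter cannot wrap

  public BoundedBlockWriter(OutputStream raw) {
    this.out = new DataOutputStream(raw);
  }

  /** Refuses an append that would push the block to Integer.MAX_VALUE bytes or more. */
  public void append(byte[] key, byte[] value) throws IOException {
    long projected = bytesWritten + (long) key.length + (long) value.length;
    if (projected >= Integer.MAX_VALUE) {
      throw new IOException("Appending " + ((long) key.length + value.length)
          + " bytes would grow the block to " + projected
          + " bytes; blocks must stay under Integer.MAX_VALUE bytes");
    }
    out.write(key);
    out.write(value);
    bytesWritten = projected;

    // DataOutputStream.size() saturates at Integer.MAX_VALUE, so if it ever
    // reports that value the true block size can no longer be determined.
    if (out.size() == Integer.MAX_VALUE) {
      throw new IOException("DataOutputStream byte counter overflowed");
    }
  }

  /**
   * Rejects a configured block size (e.g. the values behind
   * TABLE_FILE_COMPRESSED_BLOCK_SIZE / TABLE_FILE_COMPRESSED_BLOCK_SIZE_INDEX)
   * that an int-sized counter could not represent.
   */
  public static void validateConfiguredBlockSize(String property, long configured) {
    if (configured >= Integer.MAX_VALUE) {
      throw new IllegalArgumentException(property + " is set to " + configured
          + ", which is at or above the Integer.MAX_VALUE limit on block size");
    }
  }
}
{code}

A validation call like validateConfiguredBlockSize would run wherever the table configuration is read, before the configured size is handed to the writer, so a misconfigured value fails loudly instead of producing an unreadable block.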