[jira] [Created] (HBASE-25859) Reference class incorrectly parses the protobuf magic marker
Constantin-Catalin Luca created HBASE-25859:
---

             Summary: Reference class incorrectly parses the protobuf magic marker
                 Key: HBASE-25859
                 URL: https://issues.apache.org/jira/browse/HBASE-25859
             Project: HBase
          Issue Type: Bug
          Components: regionserver
    Affects Versions: 2.4.1
            Reporter: Constantin-Catalin Luca
            Assignee: Constantin-Catalin Luca

The Reference class incorrectly parses the protobuf magic marker. It uses `DataInputStream.read(byte[lengthOfPBMagic])`, but this call is not guaranteed to read all the bytes of the marker. A single `read` may return after filling only part of the buffer.

The fix is the same as the one for https://issues.apache.org/jira/browse/HBASE-25674

--
This message was sent by Atlassian Jira (v8.3.4#803005)
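The short-read behaviour described in the report can be reproduced outside HBase. The following is a minimal sketch, not the actual patch: `MagicReadSketch`, `TrickleStream`, `readOnce` and `readAll` are hypothetical names, and the marker bytes are an assumed value used only for illustration. It contrasts a single `DataInputStream.read(byte[])` call with `DataInputStream.readFully(byte[])`, which loops until the buffer is full or throws `EOFException`.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

class MagicReadSketch {
    // Assumed marker value for illustration only; the ticket does not state it.
    static final byte[] MAGIC = {'P', 'B', 'U', 'F'};

    /** Hypothetical stream that hands out at most one byte per bulk read,
     *  mimicking a slow or chunked source. */
    static final class TrickleStream extends InputStream {
        private final byte[] data;
        private int pos = 0;

        TrickleStream(byte[] data) { this.data = data; }

        @Override
        public int read() {
            return pos < data.length ? (data[pos++] & 0xff) : -1;
        }

        @Override
        public int read(byte[] b, int off, int len) {
            if (len == 0) return 0;
            if (pos >= data.length) return -1;
            b[off] = data[pos++];
            return 1; // a short read, which the InputStream contract allows
        }
    }

    /** Fragile: one read call; the return value is the number of bytes actually read. */
    static int readOnce(InputStream in, byte[] buf) throws IOException {
        return new DataInputStream(in).read(buf);
    }

    /** Robust: readFully keeps reading until buf is full or throws EOFException. */
    static byte[] readAll(InputStream in, int len) throws IOException {
        byte[] buf = new byte[len];
        new DataInputStream(in).readFully(buf);
        return buf;
    }

    public static void main(String[] args) throws IOException {
        byte[] buf = new byte[MAGIC.length];
        int n = readOnce(new TrickleStream(MAGIC), buf);
        System.out.println("single read() filled " + n + " of " + buf.length + " bytes");

        byte[] full = readAll(new TrickleStream(MAGIC), MAGIC.length);
        System.out.println("readFully() filled " + full.length + " bytes");
    }
}
```

Code that ignores the return value of `read` and then compares the buffer against the marker can misclassify valid data whenever the underlying stream delivers fewer bytes than requested.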
[jira] [Created] (HBASE-25674) RegionInfo.parseFrom(DataInputStream) does not read correc
Constantin-Catalin Luca created HBASE-25674:
---

             Summary: RegionInfo.parseFrom(DataInputStream) does not read correc
                 Key: HBASE-25674
                 URL: https://issues.apache.org/jira/browse/HBASE-25674
             Project: HBase
          Issue Type: Bug
            Reporter: Constantin-Catalin Luca
[jira] [Created] (HBASE-24541) Add support to run LoadIncrementalHFiles in a distributed manner
Constantin-Catalin Luca created HBASE-24541:
---

             Summary: Add support to run LoadIncrementalHFiles in a distributed manner
                 Key: HBASE-24541
                 URL: https://issues.apache.org/jira/browse/HBASE-24541
             Project: HBase
          Issue Type: Improvement
          Components: mapreduce, Performance
    Affects Versions: 1.4.0
            Reporter: Constantin-Catalin Luca

LoadIncrementalHFiles takes a very long time to complete when running HBase on top of S3 and attempting to bulk load 500K-700K files.

The root cause is a combination of the higher latency of S3 (compared to HDFS) and the calls that LoadIncrementalHFiles makes to the underlying filesystem: each file is opened, the stream seeks to the trailer offset at the end of the file, and the trailer is read.

Increasing the parallelism does not yield any significant improvement. This seems to stem from the fact that, once the trailer has been read, the stream is not consumed to the end. As a result, the underlying HTTP connection is aborted and cannot be reused.

The proposed solution is to also add support for running LoadIncrementalHFiles on multiple machines as a MapReduce job.
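The per-file access pattern described above (open, seek to the trailer, read the trailer) can be sketched as follows. This is only an illustration against a local file using `java.io.RandomAccessFile`; it does not use the Hadoop `FileSystem` API or the real HFile trailer layout, and `TrailerReadSketch`, `readTrailer` and the `trailerSize` parameter are hypothetical.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

class TrailerReadSketch {
    /**
     * Opens the file, seeks to (length - trailerSize) and reads exactly
     * trailerSize bytes. trailerSize is a hypothetical fixed size supplied
     * by the caller, not the real HFile trailer size.
     */
    static byte[] readTrailer(String path, int trailerSize) throws IOException {
        try (RandomAccessFile f = new RandomAccessFile(path, "r")) {   // step 1: open
            long len = f.length();
            if (len < trailerSize) {
                throw new IOException("file shorter than trailer: " + path);
            }
            f.seek(len - trailerSize);                                 // step 2: seek to the trailer offset
            byte[] trailer = new byte[trailerSize];
            f.readFully(trailer);                                      // step 3: read the trailer
            return trailer;
        }
    }
}
```

On a local disk these three steps are cheap. Against a remote store they are repeated once per file, which is the cost the report attributes to bulk loading several hundred thousand files.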