[ https://issues.apache.org/jira/browse/HBASE-6040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13289334#comment-13289334 ]
Gopinathan A commented on HBASE-6040:
-------------------------------------

Some more things need to be taken care of for block encoding in the bulk load case. I get the following exception while scanning the table:

{noformat}
2012-06-05 15:39:24,771 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Failed openScanner
java.lang.AssertionError: Expected on-disk data block encoding NONE, got PREFIX
        at org.apache.hadoop.hbase.io.hfile.HFileDataBlockEncoderImpl.diskToCacheFormat(HFileDataBlockEncoderImpl.java:151)
        at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:329)
        at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$EncodedScannerV2.seekTo(HFileReaderV2.java:951)
        at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:229)
        at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:145)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:130)
        at org.apache.hadoop.hbase.regionserver.Store.getScanner(Store.java:2044)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:3307)
        at org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:1630)
        at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1622)
        at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1598)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:2317)
        at sun.reflect.GeneratedMethodAccessor34.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
{noformat}

It would also be better to support Bloom filters in bulk load.
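A minimal Java model of the check that fails above may help pin down the mismatch. It assumes the "expected" value comes from the encoding the reader is configured with, while the "got" value comes from the block as it was written to the bulk-loaded HFile. `BlockEncoding` and `EncodingCheck` are hypothetical names for illustration only; they are not HBase classes, though the enum constants reuse some of the names of HBase's data block encodings.

```java
// Hypothetical model of the on-disk encoding check that fails in the log above.
// Not HBase code: BlockEncoding reuses some DataBlockEncoding names for readability.
enum BlockEncoding { NONE, PREFIX, DIFF, FAST_DIFF }

final class EncodingCheck {
    private EncodingCheck() {}

    /**
     * expectedOnDisk: the encoding the reader is configured with (assumption).
     * actualOnDisk:   the encoding the block read from the HFile was written with.
     * Throws when the two disagree, producing the same message as the log.
     */
    static void check(BlockEncoding expectedOnDisk, BlockEncoding actualOnDisk) {
        if (expectedOnDisk != actualOnDisk) {
            throw new IllegalStateException("Expected on-disk data block encoding "
                + expectedOnDisk + ", got " + actualOnDisk);
        }
    }
}
```

With this model, `check(BlockEncoding.PREFIX, BlockEncoding.PREFIX)` passes, while `check(BlockEncoding.NONE, BlockEncoding.PREFIX)` reproduces the message from the scan failure: a reader expecting NONE meets a block that the bulk load wrote with PREFIX.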
> Use block encoding and HBase handled checksum verification in bulk loading using HFileOutputFormat
> --------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-6040
>                 URL: https://issues.apache.org/jira/browse/HBASE-6040
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>    Affects Versions: 0.94.0, 0.96.0
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.1
>
>         Attachments: HBASE-6040_94.patch, HBASE-6040_Trunk.patch
>
>
> When data is bulk loaded using HFileOutputFormat, we are not using the
> block encoding and the HBase-handled checksum features. When the writer is
> created for making the HFile, I don't see any such info being passed to the
> WriterBuilder.
> In HFileOutputFormat.getNewWriter(byte[] family, Configuration conf), we don't
> have this info and don't pass it to the writer either, so those HFiles will
> not get these optimizations.
> Later, in LoadIncrementalHFiles.copyHFileHalf(), where we physically split
> an HFile (created by the MR job) if it cannot belong to just one region, I
> can see that we pass the data block encoding and checksum details to the new
> HFile writer. But I think this step won't normally happen.
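The gap described in the issue is that the writer builder only receives settings that `getNewWriter` actually has on hand, so anything missing falls back to defaults. The sketch below models that under stated assumptions: `WriterOptions`, `BulkLoadWriters`, the string labels, and the map keys (`<family>.encoding`, `checksum.type`, `bytes.per.checksum`) are all hypothetical and are not HBase APIs or configuration keys. It only shows the shape of a fix in which per-family settings are made available to the code that builds each writer.

```java
import java.util.Map;

// Hypothetical writer options; models what an HFile writer builder receives.
final class WriterOptions {
    final String encoding;      // hypothetical label, e.g. "NONE" or "PREFIX"
    final String checksumType;  // hypothetical label, e.g. "NULL" or "CRC32"
    final int bytesPerChecksum;

    private WriterOptions(Builder b) {
        this.encoding = b.encoding;
        this.checksumType = b.checksumType;
        this.bytesPerChecksum = b.bytesPerChecksum;
    }

    static final class Builder {
        // Defaults model a writer that was given no encoding or checksum info.
        private String encoding = "NONE";
        private String checksumType = "NULL";
        private int bytesPerChecksum = 0;

        Builder withEncoding(String e) { this.encoding = e; return this; }
        Builder withChecksumType(String c) { this.checksumType = c; return this; }
        Builder withBytesPerChecksum(int n) { this.bytesPerChecksum = n; return this; }
        WriterOptions build() { return new WriterOptions(this); }
    }
}

final class BulkLoadWriters {
    private BulkLoadWriters() {}

    // Models the reported behaviour: only the family name is known, so defaults win.
    static WriterOptions withoutFamilySettings() {
        return new WriterOptions.Builder().build();
    }

    // Hypothetical fix: settings recorded at job setup (keys are illustrative)
    // are looked up per family and handed to the builder; missing keys keep defaults.
    static WriterOptions forFamily(Map<String, String> jobSettings, String family) {
        WriterOptions.Builder b = new WriterOptions.Builder();
        String enc = jobSettings.get(family + ".encoding");
        if (enc != null) b.withEncoding(enc);
        String sum = jobSettings.get("checksum.type");
        if (sum != null) b.withChecksumType(sum);
        String bpc = jobSettings.get("bytes.per.checksum");
        if (bpc != null) b.withBytesPerChecksum(Integer.parseInt(bpc));
        return b.build();
    }
}
```

Looking settings up per family, rather than using one job-wide value, matters because encoding is a column-family attribute; the writer for each family needs that family's value to produce files the region server reads back consistently.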