[
https://issues.apache.org/jira/browse/HBASE-27733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Duo Zhang resolved HBASE-27733.
-------------------------------
Hadoop Flags: Reviewed
Resolution: Fixed
Resolving, as there has been no activity for a long time.
Please open a new issue for backporting if you are still around, [~alanlemma].
Thanks.
> When an HFile split occurs during bulkload, the new HFile does not specify
> favored nodes
> -------------------------------------------------------------------------------------
>
> Key: HBASE-27733
> URL: https://issues.apache.org/jira/browse/HBASE-27733
> Project: HBase
> Issue Type: Improvement
> Components: tooling
> Reporter: alan.zhao
> Assignee: alan.zhao
> Priority: Major
> Fix For: 3.0.0-alpha-4
>
>
> ## BulkLoadHFilesTool.class
> /**
>  * Copy half of an HFile into a new HFile.
>  */
> private static void copyHFileHalf(Configuration conf, Path inFile, Path outFile,
>     Reference reference, ColumnFamilyDescriptor familyDescriptor) throws IOException {
>   FileSystem fs = inFile.getFileSystem(conf);
>   CacheConfig cacheConf = CacheConfig.DISABLED;
>   HalfStoreFileReader halfReader = null;
>   StoreFileWriter halfWriter = null;
>   try {
>     ReaderContext context =
>       new ReaderContextBuilder().withFileSystemAndPath(fs, inFile).build();
>     StoreFileInfo storeFileInfo =
>       new StoreFileInfo(conf, fs, fs.getFileStatus(inFile), reference);
>     storeFileInfo.initHFileInfo(context);
>     halfReader = (HalfStoreFileReader) storeFileInfo.createReader(context, cacheConf);
>     storeFileInfo.getHFileInfo().initMetaAndIndex(halfReader.getHFileReader());
>     Map<byte[], byte[]> fileInfo = halfReader.loadFileInfo();
>     int blocksize = familyDescriptor.getBlocksize();
>     Algorithm compression = familyDescriptor.getCompressionType();
>     BloomType bloomFilterType = familyDescriptor.getBloomFilterType();
>     HFileContext hFileContext = new HFileContextBuilder().withCompression(compression)
>       .withChecksumType(StoreUtils.getChecksumType(conf))
>       .withBytesPerCheckSum(StoreUtils.getBytesPerChecksum(conf)).withBlockSize(blocksize)
>       .withDataBlockEncoding(familyDescriptor.getDataBlockEncoding()).withIncludesTags(true)
>       .withCreateTime(EnvironmentEdgeManager.currentTime()).build();
>     // No favored nodes are specified when the new half-file writer is built:
>     halfWriter = new StoreFileWriter.Builder(conf, cacheConf, fs).withFilePath(outFile)
>       .withBloomType(bloomFilterType).withFileContext(hFileContext).build();
>     HFileScanner scanner = halfReader.getScanner(false, false, false);
>     scanner.seekTo();
>     ...
>
> When an HFile split occurs during bulkload, the new HFile does not specify
> favored nodes, which degrades data locality. Internally, we implemented a
> version of copyHFileHalf() that specifies the favored nodes for the split
> HFile, so that locality is not compromised.
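For context, the fix amounts to threading favored nodes through to the writer builder shown above. The following self-contained sketch illustrates that builder pattern with hypothetical stand-in classes (the names FavoredNodeWriter and its Builder are illustrative, not HBase API; in HBase the analogous option lives on the store-file writer's builder):

```java
import java.net.InetSocketAddress;

// Hypothetical stand-ins for a store-file writer and its builder, showing how
// favored nodes can be carried through the builder to writer creation so that
// block placement can prefer the given DataNodes.
class FavoredNodeWriter {
  final String path;
  final InetSocketAddress[] favoredNodes; // preferred block locations

  FavoredNodeWriter(String path, InetSocketAddress[] favoredNodes) {
    this.path = path;
    this.favoredNodes = favoredNodes;
  }

  static class Builder {
    private String path;
    private InetSocketAddress[] favoredNodes = new InetSocketAddress[0];

    Builder withFilePath(String path) {
      this.path = path;
      return this;
    }

    // Optional hint: blocks of the new file should be placed on these nodes.
    Builder withFavoredNodes(InetSocketAddress[] nodes) {
      if (nodes != null) {
        this.favoredNodes = nodes;
      }
      return this;
    }

    FavoredNodeWriter build() {
      return new FavoredNodeWriter(path, favoredNodes);
    }
  }
}

public class FavoredNodesSketch {
  public static void main(String[] args) {
    // Example favored nodes; hostnames and port are illustrative only.
    InetSocketAddress[] favored = {
      new InetSocketAddress("datanode1.example.com", 50010),
      new InetSocketAddress("datanode2.example.com", 50010),
    };
    FavoredNodeWriter writer = new FavoredNodeWriter.Builder()
      .withFilePath("/tmp/half-file")
      .withFavoredNodes(favored)
      .build();
    System.out.println(writer.favoredNodes.length);
  }
}
```

Because the favored-nodes hint is optional, callers that do not care about locality are unaffected, while the bulkload split path can pass the region's favored nodes when building each half-file writer.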
--
This message was sent by Atlassian Jira
(v8.20.10#820010)