[jira] [Created] (HBASE-28100) The size that is checked against the maxfilesize threshold is the uncompressed size of the HFile

2023-09-19 Thread alan.zhao (Jira)
alan.zhao created HBASE-28100:
-

 Summary:  The size that is checked against the maxfilesize 
threshold is the uncompressed size of the HFile
 Key: HBASE-28100
 URL: https://issues.apache.org/jira/browse/HBASE-28100
 Project: HBase
  Issue Type: Bug
 Environment: HBase 2.x
Reporter: alan.zhao
Assignee: alan.zhao
 Attachments: image-2023-09-20-12-09-49-959.png

The HBase server is configured to use Snappy compression. During a bulk load, 
the size that is checked against the maxfilesize threshold is the 
uncompressed size of the HFile, not the compressed size.

HFileOutputFormat2.class

 
{code:java}
// code placeholder
new RecordWriter<ImmutableBytesWritable, V>() {

  @Override
  public void write(ImmutableBytesWritable row, V cell) throws IOException {
    ...
  }

}
 {code}
!image-2023-09-20-12-09-49-959.png!
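
To make the reported mismatch concrete, here is a minimal, self-contained sketch. It is not HBase code: the class and method names are hypothetical, the threshold is made up, and the JDK's Deflater stands in for Snappy (which is not part of the standard library). It only shows that a counter fed with cell lengths can cross a threshold long before the compressed bytes do.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;

/** Toy model (hypothetical names): bytes handed to a writer versus bytes that reach storage. */
class SizeCheckSketch {

  /** Writes `rows` similar cells and returns {uncompressedBytes, compressedBytes}. */
  static long[] writeCells(int rows) {
    ByteArrayOutputStream onDisk = new ByteArrayOutputStream(); // stands in for the file on HDFS
    long uncompressed = 0;
    // Deflater is an assumption of this sketch; the issue itself is about Snappy.
    try (DeflaterOutputStream out = new DeflaterOutputStream(onDisk)) {
      for (int i = 0; i < rows; i++) {
        byte[] cell = ("row-" + i + ":cf:q:some-repetitive-value").getBytes(StandardCharsets.UTF_8);
        out.write(cell);
        uncompressed += cell.length; // what a length-based counter accumulates
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return new long[] { uncompressed, onDisk.size() };
  }

  /** True when a writer that has tracked `size` bytes would roll to a new file. */
  static boolean shouldRoll(long size, long maxFileSize) {
    return size >= maxFileSize;
  }

  public static void main(String[] args) {
    long maxFileSize = 200_000L; // hypothetical threshold for the example
    long[] sizes = writeCells(10_000);
    System.out.println("uncompressed=" + sizes[0] + " compressed=" + sizes[1]);
    System.out.println("roll when checking uncompressed size: " + shouldRoll(sizes[0], maxFileSize));
    System.out.println("roll when checking compressed size:   " + shouldRoll(sizes[1], maxFileSize));
  }
}
```

With highly compressible data, the two checks disagree, which is the behavior the report describes for bulk-loaded HFiles.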

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-27733) hfile split occurs during bulkload, the new HFile file does not specify favored nodes

2023-03-19 Thread alan.zhao (Jira)
alan.zhao created HBASE-27733:
-

 Summary: hfile split occurs during bulkload, the new HFile file 
does not specify favored nodes
 Key: HBASE-27733
 URL: https://issues.apache.org/jira/browse/HBASE-27733
 Project: HBase
  Issue Type: Improvement
Reporter: alan.zhao
Assignee: alan.zhao


## BulkLoadHFilesTool.class

/**
 * Copy half of an HFile into a new HFile.
 */
private static void copyHFileHalf(Configuration conf, Path inFile, Path outFile,
  Reference reference, ColumnFamilyDescriptor familyDescriptor) throws IOException {
  FileSystem fs = inFile.getFileSystem(conf);
  CacheConfig cacheConf = CacheConfig.DISABLED;
  HalfStoreFileReader halfReader = null;
  StoreFileWriter halfWriter = null;
  try {
    ReaderContext context = new ReaderContextBuilder().withFileSystemAndPath(fs, inFile).build();
    StoreFileInfo storeFileInfo =
      new StoreFileInfo(conf, fs, fs.getFileStatus(inFile), reference);
    storeFileInfo.initHFileInfo(context);
    halfReader = (HalfStoreFileReader) storeFileInfo.createReader(context, cacheConf);
    storeFileInfo.getHFileInfo().initMetaAndIndex(halfReader.getHFileReader());
    Map<byte[], byte[]> fileInfo = halfReader.loadFileInfo();

    int blocksize = familyDescriptor.getBlocksize();
    Algorithm compression = familyDescriptor.getCompressionType();
    BloomType bloomFilterType = familyDescriptor.getBloomFilterType();
    HFileContext hFileContext = new HFileContextBuilder().withCompression(compression)
      .withChecksumType(StoreUtils.getChecksumType(conf))
      .withBytesPerCheckSum(StoreUtils.getBytesPerChecksum(conf)).withBlockSize(blocksize)
      .withDataBlockEncoding(familyDescriptor.getDataBlockEncoding()).withIncludesTags(true)
      .withCreateTime(EnvironmentEdgeManager.currentTime()).build();
    *halfWriter = new StoreFileWriter.Builder(conf, cacheConf, fs).withFilePath(outFile)*
      *.withBloomType(bloomFilterType).withFileContext(hFileContext).build();*
    HFileScanner scanner = halfReader.getScanner(false, false, false);
    scanner.seekTo();

    ...

 

When an HFile is split during bulkload, the new HFiles are written without 
favored nodes, which hurts data locality. Internally, we implemented a version 
of the code that lets us specify the favored nodes of the split HFiles in 
copyHFileHalf() to avoid compromising locality.





[jira] [Created] (HBASE-27688) HFile splitting occurs during bulkload, the CREATE_TIME_TS of hfileinfo is 0

2023-03-05 Thread alan.zhao (Jira)
alan.zhao created HBASE-27688:
-

 Summary: HFile splitting occurs during bulkload, the 
CREATE_TIME_TS of hfileinfo is 0
 Key: HBASE-27688
 URL: https://issues.apache.org/jira/browse/HBASE-27688
 Project: HBase
  Issue Type: Bug
Reporter: alan.zhao


If HFile splitting occurs during bulkload, the CREATE_TIME_TS of the hfileinfo 
is 0. When the HFile is copied after splitting, the CREATE_TIME_TS of the 
original file is not copied.
{code:java}
// BulkLoadHFilesTool.class
/**
 * Copy half of an HFile into a new HFile.
 */
private static void copyHFileHalf(Configuration conf, Path inFile, Path outFile,
  Reference reference, ColumnFamilyDescriptor familyDescriptor) throws IOException {
  FileSystem fs = inFile.getFileSystem(conf);
  CacheConfig cacheConf = CacheConfig.DISABLED;
  HalfStoreFileReader halfReader = null;
  StoreFileWriter halfWriter = null;
  try {
    ...
    HFileContext hFileContext = new HFileContextBuilder().withCompression(compression)
      .withChecksumType(StoreUtils.getChecksumType(conf))
      .withBytesPerCheckSum(StoreUtils.getBytesPerChecksum(conf)).withBlockSize(blocksize)
      .withDataBlockEncoding(familyDescriptor.getDataBlockEncoding()).withIncludesTags(true)
      .build();
    // TODO .withCreateTime(EnvironmentEdgeManager.currentTime())

    halfWriter = new StoreFileWriter.Builder(conf, cacheConf, fs).withFilePath(outFile)
      .withBloomType(bloomFilterType).withFileContext(hFileContext).build();
    HFileScanner scanner = halfReader.getScanner(false, false, false);
    scanner.seekTo();
    do {
      halfWriter.append(scanner.getCell());
    } while (scanner.next());

    for (Map.Entry<byte[], byte[]> entry : fileInfo.entrySet()) {
      if (shouldCopyHFileMetaKey(entry.getKey())) {
        halfWriter.appendFileInfo(entry.getKey(), entry.getValue());
      }
    }
  } finally {
    ...
  }
}


// get lastMajorCompactionTs metric

lastMajorCompactionTs = this.region.getOldestHfileTs(true);
...
long now = EnvironmentEdgeManager.currentTime();
return now - lastMajorCompactionTs;
...

public long getOldestHfileTs(boolean majorCompactionOnly) throws IOException {
  long result = Long.MAX_VALUE;
  for (HStore store : stores.values()) {
    Collection<HStoreFile> storeFiles = store.getStorefiles();
    ...
    for (HStoreFile file : storeFiles) {
      StoreFileReader sfReader = file.getReader();
      ...
      result = Math.min(result, reader.getFileContext().getFileCreateTime());
    }
  }
  return result == Long.MAX_VALUE ? 0 : result;
}{code}
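
The effect on the metric can be sketched with plain longs. This is a toy model, not HBase code: the class name, the method signatures, and the timestamps are assumptions made for illustration. It reproduces only the min-over-create-times step and the `now - lastMajorCompactionTs` subtraction quoted above.

```java
/** Toy model (hypothetical names) of the oldest-HFile-timestamp metric. */
class OldestHfileTsSketch {

  /** Same shape as the quoted loop: minimum create time, or 0 when there are no files. */
  static long oldestHfileTs(long... createTimes) {
    long result = Long.MAX_VALUE;
    for (long ts : createTimes) {
      result = Math.min(result, ts);
    }
    return result == Long.MAX_VALUE ? 0 : result;
  }

  /** Age reported by the metric: current time minus the oldest create time. */
  static long lastMajorCompactionAge(long now, long... createTimes) {
    return now - oldestHfileTs(createTimes);
  }

  public static void main(String[] args) {
    long now = 1_700_000_000_000L; // fixed "current time" in ms, chosen for the example

    // Two files with real create times: the age is that of the older file.
    System.out.println(lastMajorCompactionAge(now, now - 60_000L, now - 5_000L));

    // One split half written without a create time (0): the minimum collapses to 0,
    // so the reported age becomes the full epoch timestamp.
    System.out.println(lastMajorCompactionAge(now, now - 60_000L, 0L));
  }
}
```

A single file with a create time of 0 is enough to dominate the minimum, so the metric no longer reflects any real file age.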
 





[jira] [Created] (HBASE-27670) The FSDataOutputStream is obtained without reflection mode

2023-02-27 Thread alan.zhao (Jira)
alan.zhao created HBASE-27670:
-

 Summary: The FSDataOutputStream is obtained without reflection mode
 Key: HBASE-27670
 URL: https://issues.apache.org/jira/browse/HBASE-27670
 Project: HBase
  Issue Type: Improvement
 Environment: HBase version: 2.2.3
Reporter: alan.zhao


HBase interacts with HDFS and obtains an FSDataOutputStream to generate HFiles. 
To support favoredNodes, reflection is used. DistributedFileSystem has a more 
direct way to get the FSDataOutputStream, for example: 
dfs.createFile(path).permission(perm).create()...; this API allows you to set 
various parameters, including favoredNodes. I think avoiding reflection can 
improve performance, and if you agree, I can optimize this part of the code.

Module: hbase-server

Class: FSUtils

 
{code:java}
public static FSDataOutputStream create(Configuration conf, FileSystem fs, Path path,
  FsPermission perm, InetSocketAddress[] favoredNodes) throws IOException {
  if (fs instanceof HFileSystem) {
    FileSystem backingFs = ((HFileSystem) fs).getBackingFs();
    if (backingFs instanceof DistributedFileSystem) {
      // Try to use the favoredNodes version via reflection to allow backwards-
      // compatibility.
      short replication = Short.parseShort(conf.get(ColumnFamilyDescriptorBuilder.DFS_REPLICATION,
        String.valueOf(ColumnFamilyDescriptorBuilder.DEFAULT_DFS_REPLICATION)));
      try {
        return (FSDataOutputStream) (DistributedFileSystem.class
          .getDeclaredMethod("create", Path.class, FsPermission.class, boolean.class, int.class,
            short.class, long.class, Progressable.class, InetSocketAddress[].class)
          .invoke(backingFs, path, perm, true, CommonFSUtils.getDefaultBufferSize(backingFs),
            replication > 0 ? replication : CommonFSUtils.getDefaultReplication(backingFs, path),
            CommonFSUtils.getDefaultBlockSize(backingFs, path), null, favoredNodes));{code}
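
The two call styles being compared can be illustrated without Hadoop on the classpath. In the sketch below, `ToyFs` and its `create` method are hypothetical stand-ins, not the DistributedFileSystem API. The reflective path mirrors the `getDeclaredMethod(...).invoke(...)` pattern in the excerpt, and the direct path is an ordinary method call. The sketch does not measure performance; it only shows that both paths produce the same result, and that the reflective lookup can fail only at run time.

```java
import java.lang.reflect.Method;

/** Toy illustration (hypothetical names) of reflective versus direct invocation. */
class ReflectionVsDirectSketch {

  /** Stand-in for a file system whose create method accepts favored nodes. */
  static class ToyFs {
    public String create(String path, String[] favoredNodes) {
      return path + "@" + String.join(",", favoredNodes);
    }
  }

  /** Reflective call: the method is resolved by name and parameter types at run time. */
  static String createViaReflection(ToyFs fs, String path, String[] nodes) {
    try {
      Method m = ToyFs.class.getDeclaredMethod("create", String.class, String[].class);
      return (String) m.invoke(fs, path, nodes);
    } catch (ReflectiveOperationException e) {
      // A missing or inaccessible method surfaces here, not at compile time.
      throw new IllegalStateException("create(String, String[]) not available", e);
    }
  }

  /** Direct call: the compiler checks that the method exists with this signature. */
  static String createDirect(ToyFs fs, String path, String[] nodes) {
    return fs.create(path, nodes);
  }

  public static void main(String[] args) {
    ToyFs fs = new ToyFs();
    String[] nodes = { "dn1:9866", "dn2:9866" }; // made-up datanode addresses
    System.out.println(createViaReflection(fs, "/tmp/f", nodes));
    System.out.println(createDirect(fs, "/tmp/f", nodes));
  }
}
```

The trade-off is the one the code comment in the excerpt hints at: reflection tolerates a method that may be absent in older library versions, while a direct call is simpler and checked by the compiler.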


