[ https://issues.apache.org/jira/browse/HBASE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13461559#comment-13461559 ]
Cheng Hao commented on HBASE-6852:
----------------------------------

{quote}
Cheng Hao: you said that your dataset size was 600GB, and the total amount of block cache was presumably much smaller than that, which makes me think the workload should have been I/O-bound. What was the CPU utilization on your test? What was the disk throughput?
{quote}

Actually, it is CPU-bound: CPU utilization is above 80%. I have 4 machines, each with 12 disks and 24 CPU cores. To make the test more effective, I split the regions twice and then ran a major compaction to ensure data locality. After that, I ran the scan tests using a Hive query like "select count() from xxx".

I am also curious whether there is any overhead from thread/syscall switching (for example, during IPC).

PS: I did set "hbase.client.scanner.caching" to 1000.

> SchemaMetrics.updateOnCacheHit costs too much while full scanning a table with all of its fields
> ------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-6852
>                 URL: https://issues.apache.org/jira/browse/HBASE-6852
>             Project: HBase
>          Issue Type: Improvement
>          Components: metrics
>    Affects Versions: 0.94.0
>            Reporter: Cheng Hao
>            Priority: Minor
>              Labels: performance
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: onhitcache-trunk.patch
>
>
> SchemaMetrics.updateOnCacheHit costs too much while I am doing a full table scan.
> Here are the top 5 hotspots within the regionserver during a full table scan:
> (Sorry for the poor formatting)
>
> CPU: Intel Westmere microarchitecture, speed 2.262e+06 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 5000000
>
> samples  %        image name  symbol name
> -------------------------------------------------------------------------------
> 98447    13.4324  14033.jo    void org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics.updateOnCacheHit(org.apache.hadoop.hbase.io.hfile.BlockType$BlockCategory, boolean)
>   98447  100.000  14033.jo    void org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics.updateOnCacheHit(org.apache.hadoop.hbase.io.hfile.BlockType$BlockCategory, boolean) [self]
> -------------------------------------------------------------------------------
> 45814    6.2510   14033.jo    int org.apache.hadoop.hbase.KeyValue$KeyComparator.compareRows(byte[], int, int, byte[], int, int)
>   45814  100.000  14033.jo    int org.apache.hadoop.hbase.KeyValue$KeyComparator.compareRows(byte[], int, int, byte[], int, int) [self]
> -------------------------------------------------------------------------------
> 43523    5.9384   14033.jo    boolean org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(org.apache.hadoop.hbase.KeyValue)
>   43523  100.000  14033.jo    boolean org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(org.apache.hadoop.hbase.KeyValue) [self]
> -------------------------------------------------------------------------------
> 42548    5.8054   14033.jo    int org.apache.hadoop.hbase.KeyValue$KeyComparator.compare(byte[], int, int, byte[], int, int)
>   42548  100.000  14033.jo    int org.apache.hadoop.hbase.KeyValue$KeyComparator.compare(byte[], int, int, byte[], int, int) [self]
> -------------------------------------------------------------------------------
> 40572    5.5358   14033.jo    int org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.binarySearchNonRootIndex(byte[], int, int, java.nio.ByteBuffer, org.apache.hadoop.io.RawComparator)~1
>   40572  100.000  14033.jo    int org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.binarySearchNonRootIndex(byte[], int, int, java.nio.ByteBuffer, org.apache.hadoop.io.RawComparator)~1 [self]
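
In the profile above, the per-hit metrics update is the single largest hotspot during the scan. One generic way to reduce that kind of cost is to count hits in a cheap per-thread buffer and fold them into the shared counter only every N hits. The sketch below illustrates that idea only; the class name BatchedHitCounter, its methods, and the flush threshold are all hypothetical, and it is not necessarily what onhitcache-trunk.patch does.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch: batch cache-hit counts per thread and publish them
 * to a shared counter only once every `flushEvery` hits.
 * None of these names come from HBase.
 */
public class BatchedHitCounter {
    // Shared total, visible to whatever reports the metric.
    private final AtomicLong shared = new AtomicLong();
    // Number of locally buffered hits that triggers a publish.
    private final int flushEvery;
    // Per-thread buffer; a one-element array avoids boxing on each hit.
    private final ThreadLocal<long[]> pending =
        ThreadLocal.withInitial(() -> new long[1]);

    public BatchedHitCounter(int flushEvery) {
        if (flushEvery < 1) {
            throw new IllegalArgumentException("flushEvery must be >= 1");
        }
        this.flushEvery = flushEvery;
    }

    /** Hot path: increments a thread-local count, rarely touches shared state. */
    public void onCacheHit() {
        long[] p = pending.get();
        if (++p[0] >= flushEvery) {
            shared.addAndGet(p[0]);
            p[0] = 0;
        }
    }

    /** Publishes whatever the calling thread still has buffered. */
    public void flush() {
        long[] p = pending.get();
        if (p[0] != 0) {
            shared.addAndGet(p[0]);
            p[0] = 0;
        }
    }

    /** Published total; may lag by up to flushEvery - 1 hits per thread. */
    public long total() {
        return shared.get();
    }
}
```

The trade-off is that the reported value can lag the true count by a bounded amount per thread, which is usually acceptable for monitoring counters but not for anything that needs exact values.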