[
https://issues.apache.org/jira/browse/HBASE-30225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wellington Chevreuil updated HBASE-30225:
-----------------------------------------
Description:
In HBASE-29727 we replaced a single Path attribute with three String fields for
the region, column family and file names, respectively. Since these values tend
to be highly redundant in large caches (the same region, family and file names
across many different blocks), we introduced a string pool to avoid repeating
string values and to save heap.
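
To illustrate the string pool idea, here is a minimal sketch in Java. It is not the HBASE-29727 code: the class name StringPoolSketch and the intern() helper are hypothetical, and a plain ConcurrentHashMap stands in for whatever pool the real change uses.

```java
import java.util.concurrent.ConcurrentHashMap;

class StringPoolSketch {
    // Hypothetical pool: maps each distinct value to one canonical instance.
    private static final ConcurrentHashMap<String, String> POOL = new ConcurrentHashMap<>();

    // Returns the canonical instance for s, registering s if it is new.
    static String intern(String s) {
        String previous = POOL.putIfAbsent(s, s);
        return previous != null ? previous : s;
    }

    public static void main(String[] args) {
        // Two distinct String objects with equal contents...
        String a = intern(new String("region-abc"));
        String b = intern(new String("region-abc"));
        // ...collapse to a single shared instance after pooling.
        System.out.println(a == b);
    }
}
```

With many blocks per file, each cache key then holds references to shared strings instead of its own copies.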
When executing ycsb read workloads, we observed a ~30% latency degradation.
The problem was that we added logic for parsing the file Path into region name
and family name, as well as checks for archiving, all in the BlockCacheKey
constructor used by HFileReaderImpl at the beginning of each block read. As seen
in the attached flame graphs, which cover a five-minute window on one of the
RegionServers, around 30% of the CPU time was spent in the BlockCacheKey
constructor, calling either Path.getParent() or HFileUtils.isHFileArchived().
!flamegraph-high-level.png!
!flame-graph-zoomed.png!
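
One general way to avoid this kind of per-block cost is to parse the path once per file and reuse the result for every block key. The sketch below only illustrates that idea and is not necessarily the fix applied for this issue. FileNames and BlockKey are hypothetical classes, java.nio.file.Path stands in for the Hadoop Path, and the .../<region>/<family>/<file> layout is an assumption.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class ParseOnceSketch {
    // Hypothetical per-file holder: path components are parsed once, when the file is opened.
    static final class FileNames {
        final String region;
        final String family;
        final String file;

        FileNames(Path hfilePath) {
            // Assumed layout: .../<region>/<family>/<file>
            this.file = hfilePath.getFileName().toString();
            Path familyDir = hfilePath.getParent();
            this.family = familyDir.getFileName().toString();
            this.region = familyDir.getParent().getFileName().toString();
        }
    }

    // Hypothetical cache key: the constructor only stores references, with no path parsing.
    static final class BlockKey {
        final FileNames names;
        final long offset;

        BlockKey(FileNames names, long offset) {
            this.names = names;
            this.offset = offset;
        }
    }

    public static void main(String[] args) {
        // Parsed once per file (made-up example path)...
        FileNames names = new FileNames(Paths.get("/hbase/data/default/t1/region1/cf/hfile1"));
        // ...then reused for every block read from that file.
        for (long offset = 0; offset < 3 * 65536L; offset += 65536L) {
            BlockKey key = new BlockKey(names, offset);
            System.out.println(key.names.region + "/" + key.names.family + "/"
                + key.names.file + "@" + key.offset);
        }
    }
}
```

The per-read constructor then does constant work regardless of path depth, while the parsing cost is paid once per opened file.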
was:
In HBASE-29727 we replaced a single Path attribute with three String fields for
the region, column family and file names, respectively. Since these values tend
to be highly redundant in large caches (the same region, family and file names
across many different blocks), we introduced a string pool to avoid repeating
string values and to save heap.
When executing ycsb read workloads, we observed a ~30% latency degradation.
The problem was that we added logic for parsing the file Path into region name
and family name, as well as checks for archiving, all in the BlockCacheKey
constructor used by HFileReaderImpl at the beginning of each block read. As seen
in the attached flame graphs, which cover a five-minute window on one of the
RegionServers, around 30% of the CPU time was spent in the BlockCacheKey
constructor, calling either Path.getParent() or HFileUtils.isHFileArchived().
!flame-graph-zoomed.png!
> Performance degradation observed on ycsb reads benchmark after HBASE-29727
> --------------------------------------------------------------------------
>
> Key: HBASE-30225
> URL: https://issues.apache.org/jira/browse/HBASE-30225
> Project: HBase
> Issue Type: Bug
> Reporter: Wellington Chevreuil
> Assignee: Wellington Chevreuil
> Priority: Major
> Attachments: flame-graph-zoomed.png, flamegraph-high-level.png
>
>
> In HBASE-29727 we replaced a single Path attribute with three String fields
> for the region, column family and file names, respectively. Since these values
> tend to be highly redundant in large caches (the same region, family and file
> names across many different blocks), we introduced a string pool to avoid
> repeating string values and to save heap.
> When executing ycsb read workloads, we observed a ~30% latency degradation.
> The problem was that we added logic for parsing the file Path into region
> name and family name, as well as checks for archiving, all in the
> BlockCacheKey constructor used by HFileReaderImpl at the beginning of each
> block read. As seen in the attached flame graphs, which cover a five-minute
> window on one of the RegionServers, around 30% of the CPU time was spent in
> the BlockCacheKey constructor, calling either Path.getParent() or
> HFileUtils.isHFileArchived().
> !flamegraph-high-level.png!
> !flame-graph-zoomed.png!
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)