[ https://issues.apache.org/jira/browse/HBASE-15569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-15569:
--------------------------
       Resolution: Fixed
     Hadoop Flags: Reviewed
    Fix Version/s: 1.2.2
                   1.4.0
                   1.3.0
                   2.0.0
           Status: Resolved  (was: Patch Available)

Pushed to branch-1.2+

Thanks [~junegunn] for the nice patch

> Make Bytes.toStringBinary faster
> --------------------------------
>
>                 Key: HBASE-15569
>                 URL: https://issues.apache.org/jira/browse/HBASE-15569
>             Project: HBase
>          Issue Type: Improvement
>          Components: Performance
>            Reporter: Junegunn Choi
>            Assignee: Junegunn Choi
>            Priority: Minor
>             Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.2
>
>         Attachments: HBASE-15569.patch
>
>
> {{Bytes.toStringBinary}} is quite expensive due to its use of {{String.format}}.
> It seems to me that {{String.format}} is overkill for the purpose, and I could
> make the function up to 45 times faster by replacing that part with simpler
> hand-crafted code.
> This is probably a non-issue for the HBase server, as the function is not used in
> performance-sensitive contexts, but I figured it wouldn't hurt to make it
> faster, since it is widely used in the built-in tools (Shell, {{HFilePrettyPrinter}}
> with the {{-p}} option, etc.) and it can be used in clients.
> h4. Background:
> We have [an HBase monitoring
> tool|https://github.com/kakao/hbase-region-inspector] that periodically
> collects information about the regions, and it calls {{Bytes.toStringBinary}}
> during the process to make some of that information suitable for display. Profiling
> revealed that a large portion of the processing time was spent in
> {{String.format}}.
> h4. Micro-benchmark:
> {code}
> // Requires org.apache.hadoop.hbase.util.Bytes and java.util.concurrent.TimeUnit.
> byte[] bytes = new byte[256];
> for (int i = 0; i < bytes.length; ++i) {
>   // Mixture of printable and non-printable characters.
>   // Maximal performance gain (45x) is observed when the array is solely
>   // composed of non-printable characters.
>   bytes[i] = (byte) i;
> }
> long started = System.nanoTime();
> for (int i = 0; i < 1000000; ++i) {
>   Bytes.toStringBinary(bytes);
> }
> System.out.println(TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - started));
> {code}
> - Without the patch: 134176 ms
> - With the patch: 3890 ms
> I made sure that the new version returns the same value as before, and
> simplified the check for non-printable characters.
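
For illustration, the kind of change described above (replacing {{String.format}} with a hand-written hex lookup) might look roughly like the sketch below. This is not the attached patch: the class name {{BinaryEscapeSketch}}, both method names, and the exact printable-character rule (ASCII space through tilde, excluding the backslash) are assumptions made for the example.

```java
// Hypothetical sketch, not the HBASE-15569 patch itself.
final class BinaryEscapeSketch {
    // Lookup table mapping a 4-bit value to its upper-case hex digit.
    private static final char[] HEX_UPPER = "0123456789ABCDEF".toCharArray();

    // Assumed rule: printable ASCII except the backslash is emitted as-is.
    private static boolean isPrintable(int ch) {
        return ch >= ' ' && ch <= '~' && ch != '\\';
    }

    // Slow variant: formats each escaped byte with String.format.
    static String escapeWithFormat(byte[] b) {
        StringBuilder sb = new StringBuilder();
        for (byte value : b) {
            int ch = value & 0xFF; // treat the byte as unsigned
            if (isPrintable(ch)) {
                sb.append((char) ch);
            } else {
                sb.append(String.format("\\x%02X", ch));
            }
        }
        return sb.toString();
    }

    // Fast variant: appends the two hex digits from the lookup table.
    static String escapeWithTable(byte[] b) {
        StringBuilder sb = new StringBuilder(b.length * 4);
        for (byte value : b) {
            int ch = value & 0xFF; // treat the byte as unsigned
            if (isPrintable(ch)) {
                sb.append((char) ch);
            } else {
                sb.append("\\x")
                  .append(HEX_UPPER[ch >>> 4])    // high nibble
                  .append(HEX_UPPER[ch & 0x0F]);  // low nibble
            }
        }
        return sb.toString();
    }
}
```

Both variants are meant to produce identical strings; the second simply avoids parsing a format string for every escaped byte, which is where the cost reported in the issue comes from.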