[jira] [Commented] (HDFS-11115) Remove bytes2Array and string2Bytes
    [ https://issues.apache.org/jira/browse/HDFS-5?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15724288#comment-15724288 ]

Akira Ajisaka commented on HDFS-5:
----------------------------------

I doubt that using the String "UTF-8" is an optimization. I ran a micro benchmark, and the result is that {{new String(bytes, StandardCharsets.UTF_8)}} is faster than {{DFSUtilClient.bytes2String(bytes)}}, and {{str.getBytes(StandardCharsets.UTF_8)}} is almost as fast as {{DFSUtilClient.string2Bytes(str)}}.
* https://github.com/aajisaka/hadoop-tools/commit/62c5ea6f459084d5042fe83e9c465e14683f4d18

> Remove bytes2Array and string2Bytes
> -----------------------------------
>
>                 Key: HDFS-5
>                 URL: https://issues.apache.org/jira/browse/HDFS-5
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs, hdfs-client
>            Reporter: Sahil Kang
>            Priority: Minor
>
> In DFSUtilClient.java we have something like:
> {code: language=java}
> public static byte[] string2Bytes(String str) {
>   try {
>     return str.getBytes("UTF-8");
>   } catch (UnsupportedEncodingException e) {
>     throw new IllegalArgumentException("UTF8 decoding is not supported", e);
>   }
> }
> static String bytes2String(byte[] bytes, int offset, int length) {
>   try {
>     return new String(bytes, offset, length, "UTF-8");
>   } catch (UnsupportedEncodingException e) {
>     throw new IllegalArgumentException("UTF8 encoding is not supported", e);
>   }
> }
> {code}
> Using StandardCharsets, these methods become trivial:
> {code: language=java}
> public static byte[] string2Bytes(String str) {
>   return str.getBytes(StandardCharsets.UTF_8);
> }
> static String bytes2String(byte[] bytes, int offset, int length) {
>   return new String(bytes, offset, length, StandardCharsets.UTF_8);
> }
> {code}
> I think we should remove these methods and use StandardCharsets whenever we
> need to convert between bytes and strings.

-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
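
For readers who want to try the comparison described in the comment above, here is a minimal sketch. It is not the linked benchmark; the class name, method names, sample path, and iteration count are all assumptions made for illustration. It first checks that the charset-name and Charset overloads agree, then times them with a crude loop. A harness such as JMH would give more trustworthy numbers, and results are likely to vary by JDK version.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical class for illustration; not part of Hadoop or the linked benchmark.
class Utf8ConversionSketch {

    // Charset-name overload: declares a checked exception that must be handled.
    static byte[] encodeByName(String s) {
        try {
            return s.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("UTF-8 is not supported", e);
        }
    }

    // Charset overload: no checked exception to handle.
    static byte[] encodeByCharset(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String decodeByName(byte[] b) {
        try {
            return new String(b, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("UTF-8 is not supported", e);
        }
    }

    static String decodeByCharset(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    // Crude wall-clock timing; returns elapsed nanoseconds for the whole loop.
    static long timeNanos(Runnable task, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            task.run();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Assumed sample input: a path with some non-ASCII characters.
        String sample = "/user/hadoop/\u30c7\u30fc\u30bf/part-00000";
        byte[] encoded = encodeByCharset(sample);

        // The overloads must agree before timing them means anything.
        if (!Arrays.equals(encodeByName(sample), encoded)
                || !decodeByName(encoded).equals(decodeByCharset(encoded))) {
            throw new AssertionError("overloads disagree");
        }

        int n = 1_000_000;          // assumed iteration count
        long[] sink = new long[1];  // keeps results live so the loops are not optimized away

        // Warm-up pass, then the measured pass.
        timeNanos(() -> sink[0] += encodeByName(sample).length, n);
        timeNanos(() -> sink[0] += encodeByCharset(sample).length, n);
        long byName = timeNanos(() -> sink[0] += encodeByName(sample).length, n);
        long byCharset = timeNanos(() -> sink[0] += encodeByCharset(sample).length, n);

        System.out.println("getBytes(\"UTF-8\"): " + byName + " ns");
        System.out.println("getBytes(UTF_8):   " + byCharset + " ns");
        System.out.println("(sink=" + sink[0] + ")");
    }
}
```

Only the equality check is deterministic; the timings are indicative at best and should not be read as confirming or refuting either comment in this thread.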
[jira] [Commented] (HDFS-11115) Remove bytes2Array and string2Bytes
    [ https://issues.apache.org/jira/browse/HDFS-5?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15648021#comment-15648021 ]

Kihwal Lee commented on HDFS-5:
-------------------------------

You are undoing this optimization.
{code}
  // Using the charset canonical name for String/byte[] conversions is much
  // more efficient due to use of cached encoders/decoders.
  private static final String UTF8_CSN = StandardCharsets.UTF_8.name();
{code}

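
The snippet quoted in the comment above only shows the constant. A rough sketch of how such a constant would be used follows; the class name is hypothetical and the method bodies are an assumption modeled on the issue description, not a copy of DFSUtilClient. The point is that the charset-name overloads are still called, just with the canonical name taken from {{StandardCharsets.UTF_8.name()}} instead of a string literal.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical class for illustration; the real DFSUtilClient may differ.
class CanonicalNameSketch {

    // Canonical charset name, as in the snippet quoted in the comment.
    private static final String UTF8_CSN = StandardCharsets.UTF_8.name();

    // Encodes through the charset-name overload, which the quoted code comment
    // says benefits from cached encoders.
    static byte[] string2Bytes(String str) {
        try {
            return str.getBytes(UTF8_CSN);
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalArgumentException("UTF-8 encoding is not supported", e);
        }
    }

    // Decodes a slice of the array through the charset-name overload.
    static String bytes2String(byte[] bytes, int offset, int length) {
        try {
            return new String(bytes, offset, length, UTF8_CSN);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("UTF-8 decoding is not supported", e);
        }
    }
}
```

Whether this path is actually faster than the Charset overloads depends on the JDK in use, which is the question the benchmark elsewhere in this thread tries to answer.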
[jira] [Commented] (HDFS-11115) Remove bytes2Array and string2Bytes
    [ https://issues.apache.org/jira/browse/HDFS-5?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15642941#comment-15642941 ]

ASF GitHub Bot commented on HDFS-5:
-----------------------------------

GitHub user SahilKang opened a pull request:

    https://github.com/apache/hadoop/pull/152

    HDFS-5 Remove bytes2String and string2Bytes

    Since StandardCharsets makes converting between (utf-8) bytes and
    strings trivial, let's remove the methods:
        - org.apache.hadoop.hdfs.DFSUtilClient.bytes2String
        - org.apache.hadoop.hdfs.DFSUtilClient.string2Bytes

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/SahilKang/hadoop trunk

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/hadoop/pull/152.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #152

----
commit f234dd88bce1189de9a5286eb7326453f9997d0a
Author: Sahil Kang
Date:   2016-07-11T04:32:57Z

    MAPREDUCE-6730 Use StandardCharsets

    Use String.getBytes(StandardCharsets.UTF_8) instead of
    String.getBytes(String).

commit 672229b7d4744200124d0c3f85e1dca9ad58109d
Author: Sahil Kang
Date:   2016-08-03T06:06:21Z

    Fix checkstyle warnings for TextOuputFormat

    Uppercased `newline' since it's a constant, and made line
    lengths no more than 80 chars.

commit 815409d0aec43d6dba8b8a4bd67927c81ed8df11
Author: Sahil Kang
Date:   2016-08-03T06:29:55Z

    Use StandardCharsets in mapred.TextOutputFormat

commit c6f3756d966100de8c6796fa487c28b103858b9c
Author: Sahil Kang
Date:   2016-11-05T22:09:15Z

    Merge branch 'trunk' of git://git.apache.org/hadoop into trunk

commit 08f716890467e0f3bc502cf054c9a243263bf666
Author: Sahil Kang
Date:   2016-11-07T02:53:06Z

    HDFS-5 Remove bytes2String and string2Bytes

    Since StandardCharsets makes converting between (utf-8) bytes and
    strings trivial, let's remove the methods:
        - org.apache.hadoop.hdfs.DFSUtilClient.bytes2String
        - org.apache.hadoop.hdfs.DFSUtilClient.string2Bytes

commit 2fe192a2be6d1078e28518fba021159c3951
Author: Sahil Kang
Date:   2016-11-07T03:00:07Z

    Merge branch 'trunk' of git://git.apache.org/hadoop into trunk

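
As an illustration of what the pull request above amounts to at a call site, the sketch below inlines the conversions using StandardCharsets. The class and method names are hypothetical and do not come from the Hadoop source tree; they stand in for any caller that previously went through DFSUtilClient.bytes2String or DFSUtilClient.string2Bytes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical caller for illustration; not taken from the Hadoop source tree.
class CallSiteSketch {

    // Before (per the issue description): DFSUtilClient.bytes2String(buf, off, len)
    // After: call the Charset overload directly.
    static String decodePathComponent(byte[] buf, int off, int len) {
        return new String(buf, off, len, StandardCharsets.UTF_8);
    }

    // Before (per the issue description): DFSUtilClient.string2Bytes(component)
    // After: call the Charset overload directly.
    static byte[] encodePathComponent(String component) {
        return component.getBytes(StandardCharsets.UTF_8);
    }
}
```

One property that should carry over unchanged: per the java.lang.String Javadoc, both the charset-name and the Charset constructors replace malformed input with the charset's default replacement string, so a stray 0xFF byte decodes to U+FFFD in either style.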