[jira] [Commented] (HDFS-11115) Remove bytes2Array and string2Bytes

2016-12-05 Thread Akira Ajisaka (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-11115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15724288#comment-15724288 ]

Akira Ajisaka commented on HDFS-11115:
--

I doubt that using the String "UTF-8" is an optimization. I ran a micro-benchmark, 
and the result is that {{new String(bytes, StandardCharsets.UTF_8)}} is faster 
than {{DFSUtilClient.bytes2String(bytes)}}, and 
{{str.getBytes(StandardCharsets.UTF_8)}} is almost as fast as 
{{DFSUtilClient.string2Bytes(str)}}.
* https://github.com/aajisaka/hadoop-tools/commit/62c5ea6f459084d5042fe83e9c465e14683f4d18
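
For readers who want to try the comparison themselves, here is a rough timing sketch. It is not the benchmark from the commit linked above; the class name, method names, and sample input are made up for illustration, and a plain System.nanoTime loop like this is only indicative (a harness such as JMH would give more trustworthy numbers). decodeByName stands in for the charset-name path that the helper methods take.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical class for illustration; not part of Hadoop.
final class Utf8ConversionTiming {
  // Keeps the loop result observable so the JIT cannot drop the work.
  static volatile long sink;

  // Charset-name path, as in the existing helper methods.
  static String decodeByName(byte[] bytes) {
    try {
      return new String(bytes, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      throw new IllegalArgumentException("UTF-8 is not supported", e);
    }
  }

  // Charset-object path, as proposed in this issue.
  static String decodeByCharset(byte[] bytes) {
    return new String(bytes, StandardCharsets.UTF_8);
  }

  // Returns the elapsed nanoseconds for `iterations` decode calls.
  static long timeDecode(byte[] input, int iterations, boolean byName) {
    long total = 0;
    long start = System.nanoTime();
    for (int i = 0; i < iterations; i++) {
      String s = byName ? decodeByName(input) : decodeByCharset(input);
      total += s.length();
    }
    long elapsed = System.nanoTime() - start;
    sink = total;
    return elapsed;
  }

  public static void main(String[] args) {
    byte[] input =
        "/user/hadoop/some/path/part-00000".getBytes(StandardCharsets.UTF_8);
    int iterations = 1_000_000;
    // Warm-up runs so both paths are compiled before measuring.
    timeDecode(input, iterations, true);
    timeDecode(input, iterations, false);
    System.out.println("by name:    "
        + timeDecode(input, iterations, true) + " ns");
    System.out.println("by Charset: "
        + timeDecode(input, iterations, false) + " ns");
  }
}
```

The absolute numbers will vary with the JDK version, the input length, and the machine, so treat any single run as a hint rather than a result.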

> Remove bytes2Array and string2Bytes
> ---
>
> Key: HDFS-11115
> URL: https://issues.apache.org/jira/browse/HDFS-11115
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs, hdfs-client
> Reporter: Sahil Kang
> Priority: Minor
>
> In DFSUtilClient.java we have something like:
> {code: language=java}
> public static byte[] string2Bytes(String str) {
>   try {
>     return str.getBytes("UTF-8");
>   } catch (UnsupportedEncodingException e) {
>     throw new IllegalArgumentException("UTF8 decoding is not supported", e);
>   }
> }
> static String bytes2String(byte[] bytes, int offset, int length) {
>   try {
>     return new String(bytes, offset, length, "UTF-8");
>   } catch (UnsupportedEncodingException e) {
>     throw new IllegalArgumentException("UTF8 encoding is not supported", e);
>   }
> }
> {code}
> Using StandardCharsets, these methods become trivial:
> {code: language=java}
> public static byte[] string2Bytes(String str) {
>   return str.getBytes(StandardCharsets.UTF_8);
> }
> static String bytes2String(byte[] bytes, int offset, int length) {
>   return new String(bytes, offset, length, StandardCharsets.UTF_8);
> }
> {code}
> I think we should remove these methods and use StandardCharsets whenever we 
> need to convert between bytes and strings.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11115) Remove bytes2Array and string2Bytes

2016-11-08 Thread Kihwal Lee (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-11115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15648021#comment-15648021 ]

Kihwal Lee commented on HDFS-11115:
---

You are undoing this optimization.
{code}
  // Using the charset canonical name for String/byte[] conversions is much
  // more efficient due to use of cached encoders/decoders.
  private static final String UTF8_CSN = StandardCharsets.UTF_8.name();
{code}
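
To make the two call paths concrete, here is a small sketch. The class and method names are hypothetical; only UTF8_CSN mirrors the constant quoted above. It shows that both overloads produce the same bytes and that the charset-name overload is the one that forces the checked exception. It does not measure speed, and which path is faster may well depend on the JDK version in use.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical class for illustration; not part of Hadoop.
final class CharsetNameVsCharset {
  // Same definition as the constant quoted above.
  static final String UTF8_CSN = StandardCharsets.UTF_8.name();

  // Charset-name overload: must handle UnsupportedEncodingException.
  static byte[] encodeByName(String str) {
    try {
      return str.getBytes(UTF8_CSN);
    } catch (UnsupportedEncodingException e) {
      throw new IllegalArgumentException("UTF-8 is not supported", e);
    }
  }

  // Charset overload: no checked exception to handle.
  static byte[] encodeByCharset(String str) {
    return str.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    String path = "/user/hadoop/d\u00e9mo";
    System.out.println(UTF8_CSN); // prints "UTF-8"
    System.out.println(
        Arrays.equals(encodeByName(path), encodeByCharset(path))); // prints "true"
  }
}
```

Since the results are identical, the choice between the two comes down to readability and measured performance on the JDKs that matter for the project.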







[jira] [Commented] (HDFS-11115) Remove bytes2Array and string2Bytes

2016-11-06 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-11115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15642941#comment-15642941 ]

ASF GitHub Bot commented on HDFS-11115:
---

GitHub user SahilKang opened a pull request:

https://github.com/apache/hadoop/pull/152

HDFS-11115 Remove bytes2String and string2Bytes

Since StandardCharsets makes converting between (utf-8) bytes and strings
trivial, let's remove the methods:
- org.apache.hadoop.hdfs.DFSUtilClient.bytes2String
- org.apache.hadoop.hdfs.DFSUtilClient.string2Bytes

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/SahilKang/hadoop trunk

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hadoop/pull/152.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #152


commit f234dd88bce1189de9a5286eb7326453f9997d0a
Author: Sahil Kang 
Date:   2016-07-11T04:32:57Z

MAPREDUCE-6730 Use StandardCharsets

Use String.getBytes(StandardCharsets.UTF_8) instead of 
String.getBytes(String).

commit 672229b7d4744200124d0c3f85e1dca9ad58109d
Author: Sahil Kang 
Date:   2016-08-03T06:06:21Z

Fix checkstyle warnings for TextOuputFormat

Uppercased `newline' since it's a constant, and
made line lengths no more than 80 chars.

commit 815409d0aec43d6dba8b8a4bd67927c81ed8df11
Author: Sahil Kang 
Date:   2016-08-03T06:29:55Z

Use StandardCharsets in mapred.TextOutputFormat

commit c6f3756d966100de8c6796fa487c28b103858b9c
Author: Sahil Kang 
Date:   2016-11-05T22:09:15Z

Merge branch 'trunk' of git://git.apache.org/hadoop into trunk

commit 08f716890467e0f3bc502cf054c9a243263bf666
Author: Sahil Kang 
Date:   2016-11-07T02:53:06Z

HDFS-11115 Remove bytes2String and string2Bytes

Since StandardCharsets makes converting between (utf-8) bytes and strings
trivial, let's remove the methods:
- org.apache.hadoop.hdfs.DFSUtilClient.bytes2String
- org.apache.hadoop.hdfs.DFSUtilClient.string2Bytes

commit 2fe192a2be6d1078e28518fba021159c3951
Author: Sahil Kang 
Date:   2016-11-07T03:00:07Z

Merge branch 'trunk' of git://git.apache.org/hadoop into trunk






