[ 
https://issues.apache.org/jira/browse/TEZ-373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998932#comment-13998932
 ] 

Bikas Saha commented on TEZ-373:
--------------------------------

We should not create new byte array if the payload is empty.{code}+  public 
static TezUserPayload convertToTezUserPayload(@Nullable byte[] payload) {
+    if (payload == null) {
+      payload = new byte[0];
+    }{code}

Why do convertFromTezUserPayload(), serializeTezUserPayload() have nullable 
arguments?

I did not notice this earlier but it looks like serde using the UserPayload 
proto is causing a new buffer copy beyond what already exists, right?
{code}+  public static byte[] serializeTezUserPayload(@Nullable TezUserPayload 
payload) {
+    if (payload == null || payload.getPayload().length == 0) {
+      return null;
+    }
+    return UserPayload.newBuilder().setPayload(
+        ByteString.copyFrom(payload.getPayload())).build().toByteArray();
+  }
+  public static TezUserPayload deserializeTezUserPayload(@Nullable byte[] 
payload) {
+    if (payload == null) {
+      payload = new byte[0];
+    }
+    return new TezUserPayload(UserPayload.newBuilder().setPayload(
+        ByteString.copyFrom(payload)).build().getPayload().toByteArray());
+  }
{code}
Unfortunately, this may be a perf hit for large payloads in Hive :( Earlier in 
this jira, we decided to defer the 0 copy change to a different jira so that we 
can do it for all cases. Until we get change it, I think we should avoid the 
extra copy now by having the serde methods directly use the byte[] instead of 
wrapping it in the UserPayload proto object. I am sorry I might have misguided 
you in this direction. If you agree that the extra copy is not a good choice 
then I can change the patch to remove the new extra copy. This means that the 
patch does not add a new UserPayload proto object but simply uses the byte[] 
directly for serde.
In parallel, I think you could pick up TEZ-305 or create a new jira to remove 
the extra copy overhead in the RPC serde.

> Create UserPayload class
> ------------------------
>
>                 Key: TEZ-373
>                 URL: https://issues.apache.org/jira/browse/TEZ-373
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Tsuyoshi OZAWA
>              Labels: newbie
>         Attachments: TEZ-373.1.patch, TEZ-373.2.patch, TEZ-373.3.patch, 
> TEZ-373.4.patch, TEZ-373.5.patch, TEZ-373.6.patch, TEZ-373.7.patch, 
> TEZ-373.8.patch, TEZ-373.9.patch
>
>
> Currently Tez allows user payload to be passed as byte[]. Within Tez code is 
> hard to understand where byte[] is userPayload and where is not user payload. 
> If we create a TezUserPayload class that contains byte[] as a member then its 
> much easier to search and identify in code.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to