[
https://issues.apache.org/jira/browse/TEZ-373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998932#comment-13998932
]
Bikas Saha commented on TEZ-373:
--------------------------------
We should not create new byte array if the payload is empty.{code}+ public
static TezUserPayload convertToTezUserPayload(@Nullable byte[] payload) {
+ if (payload == null) {
+ payload = new byte[0];
+ }{code}
Why do convertFromTezUserPayload(), serializeTezUserPayload() have nullable
arguments?
I did not notice this earlier but it looks like serde using the UserPayload
proto is causing a new buffer copy beyond what already exists, right?
{code}+ public static byte[] serializeTezUserPayload(@Nullable TezUserPayload
payload) {
+ if (payload == null || payload.getPayload().length == 0) {
+ return null;
+ }
+ return UserPayload.newBuilder().setPayload(
+ ByteString.copyFrom(payload.getPayload())).build().toByteArray();
+ }
+ public static TezUserPayload deserializeTezUserPayload(@Nullable byte[]
payload) {
+ if (payload == null) {
+ payload = new byte[0];
+ }
+ return new TezUserPayload(UserPayload.newBuilder().setPayload(
+ ByteString.copyFrom(payload)).build().getPayload().toByteArray());
+ }
{code}
Unfortunately, this may be a perf hit for large payloads in Hive :( Earlier in
this jira, we decided to defer the 0 copy change to a different jira so that we
can do it for all cases. Until we get change it, I think we should avoid the
extra copy now by having the serde methods directly use the byte[] instead of
wrapping it in the UserPayload proto object. I am sorry I might have misguided
you in this direction. If you agree that the extra copy is not a good choice
then I can change the patch to remove the new extra copy. This means that the
patch does not add a new UserPayload proto object but simply uses the byte[]
directly for serde.
In parallel, I think you could pick up TEZ-305 or create a new jira to remove
the extra copy overhead in the RPC serde.
> Create UserPayload class
> ------------------------
>
> Key: TEZ-373
> URL: https://issues.apache.org/jira/browse/TEZ-373
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Bikas Saha
> Assignee: Tsuyoshi OZAWA
> Labels: newbie
> Attachments: TEZ-373.1.patch, TEZ-373.2.patch, TEZ-373.3.patch,
> TEZ-373.4.patch, TEZ-373.5.patch, TEZ-373.6.patch, TEZ-373.7.patch,
> TEZ-373.8.patch, TEZ-373.9.patch
>
>
> Currently Tez allows user payload to be passed as byte[]. Within Tez code is
> hard to understand where byte[] is userPayload and where is not user payload.
> If we create a TezUserPayload class that contains byte[] as a member then its
> much easier to search and identify in code.
--
This message was sent by Atlassian JIRA
(v6.2#6252)