[ 
https://issues.apache.org/jira/browse/PIG-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-3255:
------------------------------------

    Status: Open  (was: Patch Available)

Had a chat with Koji. He pointed out HADOOP-6109, which doubles the size of the 
byte[] in Text whenever an append needs more capacity. 

Text.java

Hadoop 1.x:
{code}
private void setCapacity(int len, boolean keepData) {
  if (bytes == null || bytes.length < len) {
    byte[] newBytes = new byte[len];
    if (bytes != null && keepData) {
      System.arraycopy(bytes, 0, newBytes, 0, length);
    }
    bytes = newBytes;
  }
}
{code}

Hadoop 0.23/2.x:
{code}
private void setCapacity(int len, boolean keepData) {
  if (bytes == null || bytes.length < len) {
    if (bytes != null && keepData) {
      bytes = Arrays.copyOf(bytes, Math.max(len, length << 1));
    } else {
      bytes = new byte[len];
    }
  }
}
{code}

So value.getBytes().length == value.getLength() will be true only when the size 
of the line is < io.file.buffer.size. Since a copy of the byte[] has to be 
created with the right size in any case, we can go with reusing the Text() for 
every getNext() in OutputHandler. Reuse will be more beneficial when the record 
sizes are greater than io.file.buffer.size, because the doubling means 
value.getBytes().length is almost never equal to value.getLength().

I will modify the patch to reuse the Text object.
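
For illustration only, here is a rough sketch of the capacity behaviour described above. It is not Hadoop code: TextBufferSketch, ensureCapacity, exactCopy and capacityAfterReadingLine are hypothetical names, the class models nothing but how the backing array grows on append, and the 4096-byte buffer is an assumed io.file.buffer.size. It also assumes a line is appended to the Text one buffer-sized chunk at a time.

```java
import java.util.Arrays;

// Hypothetical stand-in for org.apache.hadoop.io.Text. It models only how the
// backing array grows on append; it is not the Hadoop class.
class TextBufferSketch {
    private byte[] bytes = new byte[0];
    private int length = 0;
    private final boolean doubling; // true: 0.23/2.x policy, false: 1.x policy

    TextBufferSketch(boolean doubling) {
        this.doubling = doubling;
    }

    // Mirrors the keepData == true path of the two setCapacity variants
    // quoted above, which is the path an append takes.
    private void ensureCapacity(int len) {
        if (bytes.length < len) {
            int newSize = doubling ? Math.max(len, length << 1) : len;
            bytes = Arrays.copyOf(bytes, newSize);
        }
    }

    void append(byte[] src, int start, int len) {
        ensureCapacity(length + len);
        System.arraycopy(src, start, bytes, length, len);
        length += len;
    }

    // Reuse keeps the backing array and only resets the logical length.
    void clear() {
        length = 0;
    }

    int getLength() {
        return length;
    }

    int capacity() {
        return bytes.length;
    }

    // The right-sized copy that is needed whenever capacity != length.
    byte[] exactCopy() {
        return Arrays.copyOf(bytes, length);
    }

    // Simulates reading one line in chunks of at most bufferSize bytes
    // (an assumption about how the reader fills the Text).
    static int capacityAfterReadingLine(boolean doubling, int lineSize, int bufferSize) {
        TextBufferSketch text = new TextBufferSketch(doubling);
        byte[] chunk = new byte[bufferSize];
        int remaining = lineSize;
        while (remaining > 0) {
            int n = Math.min(remaining, bufferSize);
            text.append(chunk, 0, n);
            remaining -= n;
        }
        return text.capacity();
    }

    public static void main(String[] args) {
        int bufferSize = 4096; // assumed io.file.buffer.size
        for (int lineSize : new int[] {3000, 10000}) {
            System.out.println("line=" + lineSize
                + " 1.x capacity=" + capacityAfterReadingLine(false, lineSize, bufferSize)
                + " 2.x capacity=" + capacityAfterReadingLine(true, lineSize, bufferSize));
        }
    }
}
```

Under these assumptions a 3000-byte line fits in one append and ends with capacity equal to length under either policy, while a 10000-byte line ends with a 16384-byte array under the doubling policy. A reused instance keeps that larger array after clear(), so a right-sized copy is needed either way.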
                
> Avoid extra byte array copy in streaming deserialize
> ----------------------------------------------------
>
>                 Key: PIG-3255
>                 URL: https://issues.apache.org/jira/browse/PIG-3255
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.11
>            Reporter: Rohini Palaniswamy
>            Assignee: Rohini Palaniswamy
>             Fix For: 0.12
>
>         Attachments: PIG-3255-1.patch
>
>
> PigStreaming.java:
>  public Tuple deserialize(byte[] bytes) throws IOException {
>         Text val = new Text(bytes);  
>         return StorageUtil.textToTuple(val, fieldDel);
>     }
> Should remove new Text(bytes) copy and construct the tuple directly from the 
> bytes
