[jira] [Commented] (PIG-3255) Avoid extra byte array copy in streaming deserialize

2013-09-17 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769786#comment-13769786
 ] 

Alan Gates commented on PIG-3255:
-

I gave my +1 above, so we're good from my viewpoint.

> Avoid extra byte array copy in streaming deserialize
> 
>
> Key: PIG-3255
> URL: https://issues.apache.org/jira/browse/PIG-3255
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.11
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.12
>
> Attachments: PIG-3255-1.patch, PIG-3255-2.patch, PIG-3255-3.patch, 
> PIG-3255-4.patch, PIG-3255-5.patch
>
>
> PigStreaming.java:
>  public Tuple deserialize(byte[] bytes) throws IOException {
> Text val = new Text(bytes);  
> return StorageUtil.textToTuple(val, fieldDel);
> }
> Should remove new Text(bytes) copy and construct the tuple directly from the 
> bytes

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3255) Avoid extra byte array copy in streaming deserialize

2013-09-12 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13765732#comment-13765732
 ] 

Alan Gates commented on PIG-3255:
-

+1

> Avoid extra byte array copy in streaming deserialize
> 
>
> Key: PIG-3255
> URL: https://issues.apache.org/jira/browse/PIG-3255
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.11
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.12
>
> Attachments: PIG-3255-1.patch, PIG-3255-2.patch, PIG-3255-3.patch
>
>
> PigStreaming.java:
>  public Tuple deserialize(byte[] bytes) throws IOException {
> Text val = new Text(bytes);  
> return StorageUtil.textToTuple(val, fieldDel);
> }
> Should remove new Text(bytes) copy and construct the tuple directly from the 
> bytes

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3255) Avoid extra byte array copy in streaming deserialize

2013-09-12 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13765457#comment-13765457
 ] 

Rohini Palaniswamy commented on PIG-3255:
-

Thanks Jeremy and Alan. I will go with a new v2 abstract class approach and 
deprecate the old one.

> Avoid extra byte array copy in streaming deserialize
> 
>
> Key: PIG-3255
> URL: https://issues.apache.org/jira/browse/PIG-3255
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.11
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.12
>
> Attachments: PIG-3255-1.patch, PIG-3255-2.patch, PIG-3255-3.patch
>
>
> PigStreaming.java:
>  public Tuple deserialize(byte[] bytes) throws IOException {
> Text val = new Text(bytes);  
> return StorageUtil.textToTuple(val, fieldDel);
> }
> Should remove new Text(bytes) copy and construct the tuple directly from the 
> bytes

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3255) Avoid extra byte array copy in streaming deserialize

2013-09-12 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13765473#comment-13765473
 ] 

Rohini Palaniswamy commented on PIG-3255:
-

Came across a good article - 
http://haacked.com/archive/2008/02/20/versioning-issues-with-abstract-base-classes-and-interfaces.aspx.
 Similar approach but cleaner to code. Idea is to create StreamToPigBase and 
PigToStreamBase abstract classes that implement StreamToPig and PigToStream 
interfaces respectively and add the new method there. In the InputHandler and 
OutputHandler check if it is an instanceof StreamToPigBase, then call new 
method else call the old interface one. With this don't have to check if v1 or 
v2 interface is implemented during reflection and change Input/OutputHandler 
and its implementations to set two different serializer/de-serializers. Will 
still go ahead and deprecate the interface so that it can be removed in the 
next release. 

> Avoid extra byte array copy in streaming deserialize
> 
>
> Key: PIG-3255
> URL: https://issues.apache.org/jira/browse/PIG-3255
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.11
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.12
>
> Attachments: PIG-3255-1.patch, PIG-3255-2.patch, PIG-3255-3.patch
>
>
> PigStreaming.java:
>  public Tuple deserialize(byte[] bytes) throws IOException {
> Text val = new Text(bytes);  
> return StorageUtil.textToTuple(val, fieldDel);
> }
> Should remove new Text(bytes) copy and construct the tuple directly from the 
> bytes

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3255) Avoid extra byte array copy in streaming deserialize

2013-09-11 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13765123#comment-13765123
 ] 

Alan Gates commented on PIG-3255:
-

At compile time, but not at runtime.  At runtime Pig would need to reflect the 
class implementing StreamToPig and see if it contained a deserialize method 
that matches your new signature.  You could then pick which method to call 
based on that.  As Jeremy suggests, you could instead do that with a new 
interface (PigToStreamV2) and then at compile time determine which interface is 
being implemented and act accordingly.  This is actually better than what I 
initially suggested as the determination can be made at compile time.  If you 
choose this route you should also change PIgToStreamV2 to an abstract class so 
that in the future we can add methods without going through this dance.

> Avoid extra byte array copy in streaming deserialize
> 
>
> Key: PIG-3255
> URL: https://issues.apache.org/jira/browse/PIG-3255
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.11
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.12
>
> Attachments: PIG-3255-1.patch, PIG-3255-2.patch, PIG-3255-3.patch
>
>
> PigStreaming.java:
>  public Tuple deserialize(byte[] bytes) throws IOException {
> Text val = new Text(bytes);  
> return StorageUtil.textToTuple(val, fieldDel);
> }
> Should remove new Text(bytes) copy and construct the tuple directly from the 
> bytes

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3255) Avoid extra byte array copy in streaming deserialize

2013-09-11 Thread Jeremy Karn (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13765061#comment-13765061
 ] 

Jeremy Karn commented on PIG-3255:
--

If the performance gain is significant enough you could deprecate the existing 
interface and create a new interface that extends it adding just the new 
deserialize method.  It's ugly though.

> Avoid extra byte array copy in streaming deserialize
> 
>
> Key: PIG-3255
> URL: https://issues.apache.org/jira/browse/PIG-3255
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.11
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.12
>
> Attachments: PIG-3255-1.patch, PIG-3255-2.patch, PIG-3255-3.patch
>
>
> PigStreaming.java:
>  public Tuple deserialize(byte[] bytes) throws IOException {
> Text val = new Text(bytes);  
> return StorageUtil.textToTuple(val, fieldDel);
> }
> Should remove new Text(bytes) copy and construct the tuple directly from the 
> bytes

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3255) Avoid extra byte array copy in streaming deserialize

2013-09-11 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13765044#comment-13765044
 ] 

Rohini Palaniswamy commented on PIG-3255:
-

Alan,
   If you add a method to the interface, won't it break existing compiled code 
at runtime anyways ?

> Avoid extra byte array copy in streaming deserialize
> 
>
> Key: PIG-3255
> URL: https://issues.apache.org/jira/browse/PIG-3255
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.11
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.12
>
> Attachments: PIG-3255-1.patch, PIG-3255-2.patch, PIG-3255-3.patch
>
>
> PigStreaming.java:
>  public Tuple deserialize(byte[] bytes) throws IOException {
> Text val = new Text(bytes);  
> return StorageUtil.textToTuple(val, fieldDel);
> }
> Should remove new Text(bytes) copy and construct the tuple directly from the 
> bytes

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3255) Avoid extra byte array copy in streaming deserialize

2013-09-11 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13764980#comment-13764980
 ] 

Alan Gates commented on PIG-3255:
-

I don't know if anyone is using StreamToPig either, but marking an interface as 
stable and then changing it without deprecation or anything isn't cool.  So no, 
I don't think this change is ok.

We could add the proposed function "public Tuple deserialize(byte[] bytes, int 
offset, int length) throws IOException;" to the interface and change Pig to 
call it if it's present or use the old one if not.  

> Avoid extra byte array copy in streaming deserialize
> 
>
> Key: PIG-3255
> URL: https://issues.apache.org/jira/browse/PIG-3255
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.11
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.12
>
> Attachments: PIG-3255-1.patch, PIG-3255-2.patch, PIG-3255-3.patch
>
>
> PigStreaming.java:
>  public Tuple deserialize(byte[] bytes) throws IOException {
> Text val = new Text(bytes);  
> return StorageUtil.textToTuple(val, fieldDel);
> }
> Should remove new Text(bytes) copy and construct the tuple directly from the 
> bytes

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3255) Avoid extra byte array copy in streaming deserialize

2013-09-08 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761566#comment-13761566
 ] 

Rohini Palaniswamy commented on PIG-3255:
-

If the interface change is ok, then thinking of changing even the 
PigToStream.java interface 

public byte[] serialize(Tuple t) throws IOException;

to 

public DataBuffer serialize(Tuple t) throws IOException;
 
 where DataBuffer will be same as 
http://svn.apache.org/viewvc/hadoop/common/branches/branch-2/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/DataOutputBuffer.java?revision=1306187&view=markup

Don't want to use DataOutputBuffer itself as it is marked 
@InterfaceAudience.LimitedPrivate({"HDFS", "MapReduce"})
@InterfaceStability.Unstable

This will get rid of one more byte array copy. Thoughts ? 


> Avoid extra byte array copy in streaming deserialize
> 
>
> Key: PIG-3255
> URL: https://issues.apache.org/jira/browse/PIG-3255
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.11
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.12
>
> Attachments: PIG-3255-1.patch, PIG-3255-2.patch
>
>
> PigStreaming.java:
>  public Tuple deserialize(byte[] bytes) throws IOException {
> Text val = new Text(bytes);  
> return StorageUtil.textToTuple(val, fieldDel);
> }
> Should remove new Text(bytes) copy and construct the tuple directly from the 
> bytes

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3255) Avoid extra byte array copy in streaming deserialize

2013-09-07 Thread Jeremy Karn (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761052#comment-13761052
 ] 

Jeremy Karn commented on PIG-3255:
--

I think this change makes sense and it should be easy for me to update the 
patch in PIG-2417 once this is merged in.

> Avoid extra byte array copy in streaming deserialize
> 
>
> Key: PIG-3255
> URL: https://issues.apache.org/jira/browse/PIG-3255
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.11
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.12
>
> Attachments: PIG-3255-1.patch, PIG-3255-2.patch
>
>
> PigStreaming.java:
>  public Tuple deserialize(byte[] bytes) throws IOException {
> Text val = new Text(bytes);  
> return StorageUtil.textToTuple(val, fieldDel);
> }
> Should remove new Text(bytes) copy and construct the tuple directly from the 
> bytes

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3255) Avoid extra byte array copy in streaming deserialize

2013-09-06 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760814#comment-13760814
 ] 

Daniel Dai commented on PIG-3255:
-

PIG-2417 stream udf use this interface. Need to get it solved before check in.

> Avoid extra byte array copy in streaming deserialize
> 
>
> Key: PIG-3255
> URL: https://issues.apache.org/jira/browse/PIG-3255
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.11
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.12
>
> Attachments: PIG-3255-1.patch, PIG-3255-2.patch
>
>
> PigStreaming.java:
>  public Tuple deserialize(byte[] bytes) throws IOException {
> Text val = new Text(bytes);  
> return StorageUtil.textToTuple(val, fieldDel);
> }
> Should remove new Text(bytes) copy and construct the tuple directly from the 
> bytes

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3255) Avoid extra byte array copy in streaming deserialize

2013-09-05 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13759608#comment-13759608
 ] 

Rohini Palaniswamy commented on PIG-3255:
-

[~alangates],
   Comments ?

> Avoid extra byte array copy in streaming deserialize
> 
>
> Key: PIG-3255
> URL: https://issues.apache.org/jira/browse/PIG-3255
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.11
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.12
>
> Attachments: PIG-3255-1.patch, PIG-3255-2.patch
>
>
> PigStreaming.java:
>  public Tuple deserialize(byte[] bytes) throws IOException {
> Text val = new Text(bytes);  
> return StorageUtil.textToTuple(val, fieldDel);
> }
> Should remove new Text(bytes) copy and construct the tuple directly from the 
> bytes

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3255) Avoid extra byte array copy in streaming deserialize

2013-08-29 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753979#comment-13753979
 ] 

Daniel Dai commented on PIG-3255:
-

I personally does not realize anyone using StreamToPig, but need to check with 
[~alangates], since he marked it as public stable. Other part of the patch 
looks good. Avoiding 2 byte array copy and reuse Text object would save memory 
and enhance performance.

> Avoid extra byte array copy in streaming deserialize
> 
>
> Key: PIG-3255
> URL: https://issues.apache.org/jira/browse/PIG-3255
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.11
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.12
>
> Attachments: PIG-3255-1.patch, PIG-3255-2.patch
>
>
> PigStreaming.java:
>  public Tuple deserialize(byte[] bytes) throws IOException {
> Text val = new Text(bytes);  
> return StorageUtil.textToTuple(val, fieldDel);
> }
> Should remove new Text(bytes) copy and construct the tuple directly from the 
> bytes

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3255) Avoid extra byte array copy in streaming deserialize

2013-03-20 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608227#comment-13608227
 ] 

Koji Noguchi commented on PIG-3255:
---

+1 Looks good to me. 
Probably another Jira, but I wonder if we really need to create new Text for 
every streaming outputs.  Can we reuse it with value.clear() ?
(But if we do this, then in most cases value.getBytes().length <> 
value.getLength().)

> Avoid extra byte array copy in streaming deserialize
> 
>
> Key: PIG-3255
> URL: https://issues.apache.org/jira/browse/PIG-3255
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.11
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.12
>
> Attachments: PIG-3255-1.patch
>
>
> PigStreaming.java:
>  public Tuple deserialize(byte[] bytes) throws IOException {
> Text val = new Text(bytes);  
> return StorageUtil.textToTuple(val, fieldDel);
> }
> Should remove new Text(bytes) copy and construct the tuple directly from the 
> bytes

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira