[jira] [Commented] (PIG-3255) Avoid extra byte array copy in streaming deserialize
[ https://issues.apache.org/jira/browse/PIG-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769786#comment-13769786 ] Alan Gates commented on PIG-3255: - I gave my +1 above, so we're good from my viewpoint. > Avoid extra byte array copy in streaming deserialize > > > Key: PIG-3255 > URL: https://issues.apache.org/jira/browse/PIG-3255 > Project: Pig > Issue Type: Bug >Affects Versions: 0.11 >Reporter: Rohini Palaniswamy >Assignee: Rohini Palaniswamy > Fix For: 0.12 > > Attachments: PIG-3255-1.patch, PIG-3255-2.patch, PIG-3255-3.patch, > PIG-3255-4.patch, PIG-3255-5.patch > > > PigStreaming.java: > public Tuple deserialize(byte[] bytes) throws IOException { > Text val = new Text(bytes); > return StorageUtil.textToTuple(val, fieldDel); > } > Should remove new Text(bytes) copy and construct the tuple directly from the > bytes -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3255) Avoid extra byte array copy in streaming deserialize
[ https://issues.apache.org/jira/browse/PIG-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13765732#comment-13765732 ] Alan Gates commented on PIG-3255: - +1 > Avoid extra byte array copy in streaming deserialize > > > Key: PIG-3255 > URL: https://issues.apache.org/jira/browse/PIG-3255 > Project: Pig > Issue Type: Bug >Affects Versions: 0.11 >Reporter: Rohini Palaniswamy >Assignee: Rohini Palaniswamy > Fix For: 0.12 > > Attachments: PIG-3255-1.patch, PIG-3255-2.patch, PIG-3255-3.patch > > > PigStreaming.java: > public Tuple deserialize(byte[] bytes) throws IOException { > Text val = new Text(bytes); > return StorageUtil.textToTuple(val, fieldDel); > } > Should remove new Text(bytes) copy and construct the tuple directly from the > bytes -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3255) Avoid extra byte array copy in streaming deserialize
[ https://issues.apache.org/jira/browse/PIG-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13765457#comment-13765457 ] Rohini Palaniswamy commented on PIG-3255: - Thanks Jeremy and Alan. I will go with a new v2 abstract class approach and deprecate the old one. > Avoid extra byte array copy in streaming deserialize > > > Key: PIG-3255 > URL: https://issues.apache.org/jira/browse/PIG-3255 > Project: Pig > Issue Type: Bug >Affects Versions: 0.11 >Reporter: Rohini Palaniswamy >Assignee: Rohini Palaniswamy > Fix For: 0.12 > > Attachments: PIG-3255-1.patch, PIG-3255-2.patch, PIG-3255-3.patch > > > PigStreaming.java: > public Tuple deserialize(byte[] bytes) throws IOException { > Text val = new Text(bytes); > return StorageUtil.textToTuple(val, fieldDel); > } > Should remove new Text(bytes) copy and construct the tuple directly from the > bytes -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3255) Avoid extra byte array copy in streaming deserialize
[ https://issues.apache.org/jira/browse/PIG-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13765473#comment-13765473 ] Rohini Palaniswamy commented on PIG-3255: - Came across a good article - http://haacked.com/archive/2008/02/20/versioning-issues-with-abstract-base-classes-and-interfaces.aspx. Similar approach but cleaner to code. Idea is to create StreamToPigBase and PigToStreamBase abstract classes that implement StreamToPig and PigToStream interfaces respectively and add the new method there. In the InputHandler and OutputHandler check if it is an instanceof StreamToPigBase, then call new method else call the old interface one. With this don't have to check if v1 or v2 interface is implemented during reflection and change Input/OutputHandler and its implementations to set two different serializer/de-serializers. Will still go ahead and deprecate the interface so that it can be removed in the next release. > Avoid extra byte array copy in streaming deserialize > > > Key: PIG-3255 > URL: https://issues.apache.org/jira/browse/PIG-3255 > Project: Pig > Issue Type: Bug >Affects Versions: 0.11 >Reporter: Rohini Palaniswamy >Assignee: Rohini Palaniswamy > Fix For: 0.12 > > Attachments: PIG-3255-1.patch, PIG-3255-2.patch, PIG-3255-3.patch > > > PigStreaming.java: > public Tuple deserialize(byte[] bytes) throws IOException { > Text val = new Text(bytes); > return StorageUtil.textToTuple(val, fieldDel); > } > Should remove new Text(bytes) copy and construct the tuple directly from the > bytes -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3255) Avoid extra byte array copy in streaming deserialize
[ https://issues.apache.org/jira/browse/PIG-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13765123#comment-13765123 ] Alan Gates commented on PIG-3255: - At compile time, but not at runtime. At runtime Pig would need to reflect the class implementing StreamToPig and see if it contained a deserialize method that matches your new signature. You could then pick which method to call based on that. As Jeremy suggests, you could instead do that with a new interface (PigToStreamV2) and then at compile time determine which interface is being implemented and act accordingly. This is actually better than what I initially suggested as the determination can be made at compile time. If you choose this route you should also change PIgToStreamV2 to an abstract class so that in the future we can add methods without going through this dance. > Avoid extra byte array copy in streaming deserialize > > > Key: PIG-3255 > URL: https://issues.apache.org/jira/browse/PIG-3255 > Project: Pig > Issue Type: Bug >Affects Versions: 0.11 >Reporter: Rohini Palaniswamy >Assignee: Rohini Palaniswamy > Fix For: 0.12 > > Attachments: PIG-3255-1.patch, PIG-3255-2.patch, PIG-3255-3.patch > > > PigStreaming.java: > public Tuple deserialize(byte[] bytes) throws IOException { > Text val = new Text(bytes); > return StorageUtil.textToTuple(val, fieldDel); > } > Should remove new Text(bytes) copy and construct the tuple directly from the > bytes -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3255) Avoid extra byte array copy in streaming deserialize
[ https://issues.apache.org/jira/browse/PIG-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13765061#comment-13765061 ] Jeremy Karn commented on PIG-3255: -- If the performance gain is significant enough you could deprecate the existing interface and create a new interface that extends it adding just the new deserialize method. It's ugly though. > Avoid extra byte array copy in streaming deserialize > > > Key: PIG-3255 > URL: https://issues.apache.org/jira/browse/PIG-3255 > Project: Pig > Issue Type: Bug >Affects Versions: 0.11 >Reporter: Rohini Palaniswamy >Assignee: Rohini Palaniswamy > Fix For: 0.12 > > Attachments: PIG-3255-1.patch, PIG-3255-2.patch, PIG-3255-3.patch > > > PigStreaming.java: > public Tuple deserialize(byte[] bytes) throws IOException { > Text val = new Text(bytes); > return StorageUtil.textToTuple(val, fieldDel); > } > Should remove new Text(bytes) copy and construct the tuple directly from the > bytes -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3255) Avoid extra byte array copy in streaming deserialize
[ https://issues.apache.org/jira/browse/PIG-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13765044#comment-13765044 ] Rohini Palaniswamy commented on PIG-3255: - Alan, If you add a method to the interface, won't it break existing compiled code at runtime anyways ? > Avoid extra byte array copy in streaming deserialize > > > Key: PIG-3255 > URL: https://issues.apache.org/jira/browse/PIG-3255 > Project: Pig > Issue Type: Bug >Affects Versions: 0.11 >Reporter: Rohini Palaniswamy >Assignee: Rohini Palaniswamy > Fix For: 0.12 > > Attachments: PIG-3255-1.patch, PIG-3255-2.patch, PIG-3255-3.patch > > > PigStreaming.java: > public Tuple deserialize(byte[] bytes) throws IOException { > Text val = new Text(bytes); > return StorageUtil.textToTuple(val, fieldDel); > } > Should remove new Text(bytes) copy and construct the tuple directly from the > bytes -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3255) Avoid extra byte array copy in streaming deserialize
[ https://issues.apache.org/jira/browse/PIG-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13764980#comment-13764980 ] Alan Gates commented on PIG-3255: - I don't know if anyone is using StreamToPig either, but marking an interface as stable and then changing it without deprecation or anything isn't cool. So no, I don't think this change is ok. We could add the proposed function "public Tuple deserialize(byte[] bytes, int offset, int length) throws IOException;" to the interface and change Pig to call it if it's present or use the old one if not. > Avoid extra byte array copy in streaming deserialize > > > Key: PIG-3255 > URL: https://issues.apache.org/jira/browse/PIG-3255 > Project: Pig > Issue Type: Bug >Affects Versions: 0.11 >Reporter: Rohini Palaniswamy >Assignee: Rohini Palaniswamy > Fix For: 0.12 > > Attachments: PIG-3255-1.patch, PIG-3255-2.patch, PIG-3255-3.patch > > > PigStreaming.java: > public Tuple deserialize(byte[] bytes) throws IOException { > Text val = new Text(bytes); > return StorageUtil.textToTuple(val, fieldDel); > } > Should remove new Text(bytes) copy and construct the tuple directly from the > bytes -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3255) Avoid extra byte array copy in streaming deserialize
[ https://issues.apache.org/jira/browse/PIG-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761566#comment-13761566 ] Rohini Palaniswamy commented on PIG-3255: - If the interface change is ok, then thinking of changing even the PigToStream.java interface public byte[] serialize(Tuple t) throws IOException; to public DataBuffer serialize(Tuple t) throws IOException; where DataBuffer will be same as http://svn.apache.org/viewvc/hadoop/common/branches/branch-2/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/DataOutputBuffer.java?revision=1306187&view=markup Don't want to use DataOutputBuffer itself as it is marked @InterfaceAudience.LimitedPrivate({"HDFS", "MapReduce"}) @InterfaceStability.Unstable This will get rid of one more byte array copy. Thoughts ? > Avoid extra byte array copy in streaming deserialize > > > Key: PIG-3255 > URL: https://issues.apache.org/jira/browse/PIG-3255 > Project: Pig > Issue Type: Bug >Affects Versions: 0.11 >Reporter: Rohini Palaniswamy >Assignee: Rohini Palaniswamy > Fix For: 0.12 > > Attachments: PIG-3255-1.patch, PIG-3255-2.patch > > > PigStreaming.java: > public Tuple deserialize(byte[] bytes) throws IOException { > Text val = new Text(bytes); > return StorageUtil.textToTuple(val, fieldDel); > } > Should remove new Text(bytes) copy and construct the tuple directly from the > bytes -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3255) Avoid extra byte array copy in streaming deserialize
[ https://issues.apache.org/jira/browse/PIG-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761052#comment-13761052 ] Jeremy Karn commented on PIG-3255: -- I think this change makes sense and it should be easy for me to update the patch in PIG-2417 once this is merged in. > Avoid extra byte array copy in streaming deserialize > > > Key: PIG-3255 > URL: https://issues.apache.org/jira/browse/PIG-3255 > Project: Pig > Issue Type: Bug >Affects Versions: 0.11 >Reporter: Rohini Palaniswamy >Assignee: Rohini Palaniswamy > Fix For: 0.12 > > Attachments: PIG-3255-1.patch, PIG-3255-2.patch > > > PigStreaming.java: > public Tuple deserialize(byte[] bytes) throws IOException { > Text val = new Text(bytes); > return StorageUtil.textToTuple(val, fieldDel); > } > Should remove new Text(bytes) copy and construct the tuple directly from the > bytes -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3255) Avoid extra byte array copy in streaming deserialize
[ https://issues.apache.org/jira/browse/PIG-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760814#comment-13760814 ] Daniel Dai commented on PIG-3255: - PIG-2417 stream udf use this interface. Need to get it solved before check in. > Avoid extra byte array copy in streaming deserialize > > > Key: PIG-3255 > URL: https://issues.apache.org/jira/browse/PIG-3255 > Project: Pig > Issue Type: Bug >Affects Versions: 0.11 >Reporter: Rohini Palaniswamy >Assignee: Rohini Palaniswamy > Fix For: 0.12 > > Attachments: PIG-3255-1.patch, PIG-3255-2.patch > > > PigStreaming.java: > public Tuple deserialize(byte[] bytes) throws IOException { > Text val = new Text(bytes); > return StorageUtil.textToTuple(val, fieldDel); > } > Should remove new Text(bytes) copy and construct the tuple directly from the > bytes -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3255) Avoid extra byte array copy in streaming deserialize
[ https://issues.apache.org/jira/browse/PIG-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13759608#comment-13759608 ] Rohini Palaniswamy commented on PIG-3255: - [~alangates], Comments ? > Avoid extra byte array copy in streaming deserialize > > > Key: PIG-3255 > URL: https://issues.apache.org/jira/browse/PIG-3255 > Project: Pig > Issue Type: Bug >Affects Versions: 0.11 >Reporter: Rohini Palaniswamy >Assignee: Rohini Palaniswamy > Fix For: 0.12 > > Attachments: PIG-3255-1.patch, PIG-3255-2.patch > > > PigStreaming.java: > public Tuple deserialize(byte[] bytes) throws IOException { > Text val = new Text(bytes); > return StorageUtil.textToTuple(val, fieldDel); > } > Should remove new Text(bytes) copy and construct the tuple directly from the > bytes -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3255) Avoid extra byte array copy in streaming deserialize
[ https://issues.apache.org/jira/browse/PIG-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753979#comment-13753979 ] Daniel Dai commented on PIG-3255: - I personally does not realize anyone using StreamToPig, but need to check with [~alangates], since he marked it as public stable. Other part of the patch looks good. Avoiding 2 byte array copy and reuse Text object would save memory and enhance performance. > Avoid extra byte array copy in streaming deserialize > > > Key: PIG-3255 > URL: https://issues.apache.org/jira/browse/PIG-3255 > Project: Pig > Issue Type: Bug >Affects Versions: 0.11 >Reporter: Rohini Palaniswamy >Assignee: Rohini Palaniswamy > Fix For: 0.12 > > Attachments: PIG-3255-1.patch, PIG-3255-2.patch > > > PigStreaming.java: > public Tuple deserialize(byte[] bytes) throws IOException { > Text val = new Text(bytes); > return StorageUtil.textToTuple(val, fieldDel); > } > Should remove new Text(bytes) copy and construct the tuple directly from the > bytes -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3255) Avoid extra byte array copy in streaming deserialize
[ https://issues.apache.org/jira/browse/PIG-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608227#comment-13608227 ] Koji Noguchi commented on PIG-3255: --- +1 Looks good to me. Probably another Jira, but I wonder if we really need to create new Text for every streaming outputs. Can we reuse it with value.clear() ? (But if we do this, then in most cases value.getBytes().length <> value.getLength().) > Avoid extra byte array copy in streaming deserialize > > > Key: PIG-3255 > URL: https://issues.apache.org/jira/browse/PIG-3255 > Project: Pig > Issue Type: Bug >Affects Versions: 0.11 >Reporter: Rohini Palaniswamy >Assignee: Rohini Palaniswamy > Fix For: 0.12 > > Attachments: PIG-3255-1.patch > > > PigStreaming.java: > public Tuple deserialize(byte[] bytes) throws IOException { > Text val = new Text(bytes); > return StorageUtil.textToTuple(val, fieldDel); > } > Should remove new Text(bytes) copy and construct the tuple directly from the > bytes -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira