[ https://issues.apache.org/jira/browse/OOZIE-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16473992#comment-16473992 ]
Peter Cseh commented on OOZIE-3250:
-----------------------------------

Thanks for the improvement [~andras.piros]!

Shouldn't we use == in the test instead of Arrays.equals? Interning is about handing back the same instance, so the test should check reference identity. As it stands, the test still passes even if I remove the interning from the code, because Arrays.equals only compares the contents of the arrays.

Any ideas on the performance impact of reading through all the byte arrays? I don't imagine it's huge, as we'll usually parse these arrays as Strings later anyway.

Would it make sense to keep only one copy of the data in StringBlob and BinaryBlob? E.g. when getString is called in StringBlob, the byte array could be nulled out, as it can be recalculated from the String value if needed. This would again add some calculation overhead in exchange for some memory. I'm not sure it's worth it, but it might be worth a try.

Also, are we 100% sure that the contents of the arrays never change? Interned arrays are shared, so a change made through one reference would be visible through all of them. I haven't checked all the places in all the Beans, but it looks like we're in the safe zone here.

> Reduce heap waste by reducing duplicate byte[] count
> ----------------------------------------------------
>
>                 Key: OOZIE-3250
>                 URL: https://issues.apache.org/jira/browse/OOZIE-3250
>             Project: Oozie
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 5.0.0
>            Reporter: Andras Piros
>            Assignee: Andras Piros
>            Priority: Major
>         Attachments: OOZIE-3250.001.patch, OOZIE-3250.002.patch
>
>
> Similar to OOZIE-3232, we also need to intern the {{byte[]}} field values within [*{{BinaryBlob}}*|https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/BinaryBlob.java#L32-L33] and [*{{StringBlob}}*|https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/StringBlob.java#L34] to reduce heap waste caused by duplicate {{byte[]}} entries.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
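To illustrate the test question: interning {{byte[]}} means mapping every array with equal contents to one canonical instance, so only an identity check (== or JUnit's assertSame) proves that interning happened; Arrays.equals would also pass for two distinct copies. The sketch below is a minimal, hypothetical interner ({{ByteArrayInterner}} is not the class used in the patch). It wraps each array in a key that compares contents, and it assumes callers never mutate the arrays they pass in or get back. The pool holds strong references and never shrinks, so a real implementation would likely want weak references instead.

```java
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical helper (not Oozie's actual API): canonicalizes byte[] instances by content.
final class ByteArrayInterner {

    // Wrapper so that map lookups compare array *contents*, not references.
    private static final class Key {
        private final byte[] bytes;
        private final int hash;

        Key(byte[] bytes) {
            this.bytes = bytes;
            this.hash = Arrays.hashCode(bytes);
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Key && Arrays.equals(bytes, ((Key) o).bytes);
        }

        @Override
        public int hashCode() {
            return hash;
        }
    }

    // Strong references: entries are never evicted in this sketch.
    private final ConcurrentMap<Key, byte[]> pool = new ConcurrentHashMap<>();

    // Returns the canonical instance whose contents equal 'bytes'.
    // Only safe if nobody mutates the arrays passed in or handed back.
    byte[] intern(byte[] bytes) {
        if (bytes == null) {
            return null;
        }
        byte[] existing = pool.putIfAbsent(new Key(bytes), bytes);
        return existing != null ? existing : bytes;
    }
}
```

In a unit test, the meaningful assertion is assertSame(interner.intern(a), interner.intern(b)) for two separately allocated arrays with equal contents; assertTrue(Arrays.equals(...)) on the same values would succeed with or without interning.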
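The "keep only one copy" idea can be sketched as a blob that holds either the byte form or the String form, decodes on demand, and drops the byte array once the String exists. This is a hypothetical class ({{SingleCopyStringBlob}} and its field names are made up for illustration); it assumes the byte form is plain UTF-8, whereas the real StringBlob may also compress its bytes, which would make the recalculation more expensive.

```java
import java.nio.charset.StandardCharsets;
import java.util.Objects;

// Hypothetical class (not Oozie's StringBlob): keeps only one representation at a time.
// Assumes the byte[] form is plain UTF-8; no compression is modelled here.
final class SingleCopyStringBlob {
    private byte[] bytes;   // non-null until getString() has decoded it
    private String string;  // non-null once decoded, or when built from a String

    SingleCopyStringBlob(byte[] bytes) {
        this.bytes = Objects.requireNonNull(bytes, "bytes");
    }

    SingleCopyStringBlob(String value) {
        this.string = Objects.requireNonNull(value, "value");
    }

    synchronized String getString() {
        if (string == null) {
            string = new String(bytes, StandardCharsets.UTF_8);
            bytes = null; // drop the byte[] copy; it can be recomputed from the String
        }
        return string;
    }

    synchronized byte[] getBytes() {
        if (bytes != null) {
            return bytes;
        }
        // Trades CPU for heap: re-encodes on every call once the byte[] was dropped.
        return string.getBytes(StandardCharsets.UTF_8);
    }
}
```

The trade-off is the one raised in the comment: every getBytes() call after the String has been materialized pays for a fresh encoding, in exchange for not keeping both copies on the heap.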