[ https://issues.apache.org/jira/browse/PIG-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854654#action_12854654 ]
Dmitriy V. Ryaboy commented on PIG-1348:
----------------------------------------

In the spirit of better Java and micro-optimizations:

StorageUtil does things like this to convert to bytes:

{code}
out.write(((Integer)field).toString().getBytes());
{code}

Integer's toString() method creates a new String every time, even if the same integer (value-wise) is being converted to a String.

This is better:

{code}
out.write(String.valueOf(field).getBytes());
{code}

(This reuses the values, and also collapses the case statement a fair bit, cleaning up the code -- we can batch Integer, Double, etc., together and fall through to just one line of code.)

This discussion should probably go into a separate ticket.

> PigStorage making unnecessary byte array copy when storing data
> ---------------------------------------------------------------
>
>                 Key: PIG-1348
>                 URL: https://issues.apache.org/jira/browse/PIG-1348
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.7.0
>            Reporter: Ashutosh Chauhan
>            Assignee: Richard Ding
>             Fix For: 0.7.0
>
>         Attachments: PIG-1348.patch, PIG-1348_2.patch
>
>
> InternalCachedBag estimates the memory available to the VM by using
> Runtime.getRuntime().maxMemory(). It then uses 10% (by default, though
> configurable) of this memory and divides it among the bags. It keeps
> track of the memory used by the bags and proactively spills if their
> memory usage gets close to these limits. Given all this, in theory, when
> presented with more data than it can handle, InternalCachedBag should not
> run out of memory. But in practice we find OOM happening.
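
For reference, the "batch Integer, Double, etc. together" idea from the comment above could look roughly like the sketch below. This is only an illustration under stated assumptions: FieldWriterSketch, toBytes and writeField are hypothetical names and are not Pig's StorageUtil; the instanceof checks stand in for whatever type dispatch the real code uses; the UTF-8 charset and the empty output for null are choices made here, not taken from the original code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not Pig's StorageUtil.
class FieldWriterSketch {

    // Converts a field to its text bytes. All boxed numeric types share
    // one branch instead of one cast-and-toString() branch per type.
    static byte[] toBytes(Object field) {
        if (field == null) {
            return new byte[0]; // assumption: null is written as empty
        }
        if (field instanceof Number || field instanceof String) {
            // Single line covering Integer, Long, Float, Double and String.
            return String.valueOf(field).getBytes(StandardCharsets.UTF_8);
        }
        throw new IllegalArgumentException(
                "Unsupported field type: " + field.getClass().getName());
    }

    // Writes the converted bytes to the given stream.
    static void writeField(OutputStream out, Object field) throws IOException {
        out.write(toBytes(field));
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        writeField(buf, 42);
        writeField(buf, "\t");
        writeField(buf, 3.5);
        System.out.println(new String(buf.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

One caveat worth checking before relying on the allocation argument: for a non-null argument, String.valueOf(Object) simply returns the object's toString() result, so the visible gain in this sketch is fewer casts and branches. Any difference in allocations would need to be measured.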