[ https://issues.apache.org/jira/browse/PIG-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854654#action_12854654 ]
Dmitriy V. Ryaboy commented on PIG-1348:
----------------------------------------
In the spirit of better Java and micro-optimizations:
StorageUtil does things like this to convert a field to bytes:
{code}
out.write(((Integer)field).toString().getBytes());
{code}
Integer's toString() method creates a new string every time, even if the same
integer (value-wise) is being converted to a String. This is better:
{code}
out.write(String.valueOf(field).getBytes());
{code}
(Note that String.valueOf(Object) delegates to toString(), so the allocation
is the same; the gain is that it collapses the case statement a fair bit,
cleaning up the code: we can batch Integer, Double, etc., together and fall
through to just one line of code.)
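As a rough illustration of the collapsed case statement, here is a minimal sketch. The class name, the putField signature, and the type tags are hypothetical stand-ins, not the actual StorageUtil code or Pig's DataType constants, and the explicit UTF-8 charset is an assumption added to keep the output platform-independent.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class FieldWriter {
    // Hypothetical type tags; the real code would use Pig's own type constants.
    static final byte INTEGER = 1;
    static final byte LONG = 2;
    static final byte FLOAT = 3;
    static final byte DOUBLE = 4;

    // Writes the textual form of a field to the stream.
    static void putField(OutputStream out, Object field, byte type) throws IOException {
        switch (type) {
        // All simple numeric types fall through to one shared line.
        case INTEGER:
        case LONG:
        case FLOAT:
        case DOUBLE:
            out.write(String.valueOf(field).getBytes(StandardCharsets.UTF_8));
            break;
        default:
            throw new IOException("Unsupported type tag: " + type);
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        putField(out, Integer.valueOf(42), INTEGER);
        putField(out, Double.valueOf(1.5), DOUBLE);
        System.out.println(new String(out.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

No per-type cast is needed because String.valueOf accepts any Object, which is what lets the cases share a single branch.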
This discussion should probably go into a separate ticket.
> PigStorage making unnecessary byte array copy when storing data
> ---------------------------------------------------------------
>
> Key: PIG-1348
> URL: https://issues.apache.org/jira/browse/PIG-1348
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.7.0
> Reporter: Ashutosh Chauhan
> Assignee: Richard Ding
> Fix For: 0.7.0
>
> Attachments: PIG-1348.patch, PIG-1348_2.patch
>
>
> InternalCachedBag estimates the memory available to the VM by calling
> Runtime.getRuntime().maxMemory(). It then takes 10% of this memory (the
> default, which is configurable) and divides it among the bags. It keeps
> track of the memory used by the bags and proactively spills when their
> memory usage comes close to these limits. Given all this, in theory
> InternalCachedBag should not run out of memory when presented with more data
> than it can handle. But in practice we find OOMs happening.
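
The per-bag budget described in the quoted issue can be sketched roughly as follows. This is only an illustration of the arithmetic: the class and method names are hypothetical, and the actual InternalCachedBag logic may differ.

```java
class BagMemoryBudget {
    // Returns the memory budget for one bag: a fraction of the maximum heap,
    // split evenly across the bags. The 10% default comes from the issue text.
    static long perBagLimit(long maxMemory, double fraction, int numBags) {
        if (numBags <= 0) {
            throw new IllegalArgumentException("numBags must be positive");
        }
        return (long) (maxMemory * fraction) / numBags;
    }

    public static void main(String[] args) {
        long maxMemory = Runtime.getRuntime().maxMemory();
        // Example: 10% of the heap shared by 4 bags.
        System.out.println(perBagLimit(maxMemory, 0.10, 4));
    }
}
```

A bag would spill to disk once its tracked usage approaches this limit. The estimate depends on how accurately that usage is tracked, which is one place where theory and practice can diverge.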