[ 
https://issues.apache.org/jira/browse/PIG-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854643#action_12854643
 ] 

Ashutosh Chauhan commented on PIG-1348:
---------------------------------------

1) As far as I can see, TextOutputFormat has a synchronized write() because it is 
meant to work even with mappers implementing MultithreadedMapRunner. But since 
that's not the case for Pig, we can get rid of it, especially now that we are 
putting in our own PigTextOutputFormat instead of using TextOutputFormat. 
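
To illustrate point 1, here is a minimal sketch of a line writer whose write() 
is not synchronized. UnsyncLineWriter is a hypothetical stand-in, not the actual 
PigTextOutputFormat code, and it is only safe under the assumption that a single 
thread calls write() on a given writer.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a text record writer; not Pig's or Hadoop's class.
class UnsyncLineWriter {
    private final OutputStream out;

    UnsyncLineWriter(OutputStream out) {
        this.out = out;
    }

    // Deliberately NOT synchronized: assumes one writer is used by one thread,
    // so no monitor has to be acquired for every record.
    void write(byte[] record) {
        try {
            out.write(record);
            out.write('\n');
        } catch (IOException e) {
            // Rethrow unchecked to keep the sketch short.
            throw new UncheckedIOException(e);
        }
    }
}
```

If a multithreaded mapper were ever used, the caller would have to add its own 
locking around write().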

3) That's what I meant: if a Schema is available, we should use it to find 
types instead of reflecting on every call. I suggested the workaround of 
caching for the case where we know the user provided a Schema but we don't have 
a handle on it. Clearly, if there is no schema, we need to find the type every 
time. I can see that dealing with complex types, even when there is a schema, is 
not straightforward. In any case, the casts that are currently there for simple 
types are unnecessary.
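
A rough sketch of the three cases in point 3, for simple types only. 
FieldTypeResolver, its factory methods, and the type codes are all hypothetical 
names made up for illustration; they are not Pig's actual classes or constants. 
The inspections counter exists only to show how often a value is examined.

```java
// Hypothetical helper; type codes are illustrative, not Pig's real constants.
class FieldTypeResolver {
    static final byte UNKNOWN = 0, INTEGER = 1, CHARARRAY = 2;

    // Per-column types. null means nothing may be cached (no schema at all).
    private final byte[] types;

    // Number of times a value had to be inspected to find its type.
    int inspections = 0;

    private FieldTypeResolver(byte[] types) {
        this.types = types;
    }

    // Case 1: the schema is in hand, so types are known up front.
    static FieldTypeResolver fromSchema(byte[] schemaTypes) {
        return new FieldTypeResolver(schemaTypes.clone());
    }

    // Case 2: a schema exists but is not accessible here; assume column types
    // are stable and cache whatever the first non-null value reveals.
    static FieldTypeResolver cachingFirstSeen(int columns) {
        return new FieldTypeResolver(new byte[columns]);
    }

    // Case 3: no schema, so the type must be found on every call.
    static FieldTypeResolver noSchema() {
        return new FieldTypeResolver(null);
    }

    byte typeOf(int column, Object value) {
        if (types != null && types[column] != UNKNOWN) {
            return types[column]; // known or cached: no inspection needed
        }
        byte found = inspect(value);
        if (types != null) {
            types[column] = found; // stays UNKNOWN for nulls, so we retry later
        }
        return found;
    }

    private byte inspect(Object value) {
        inspections++;
        if (value instanceof Integer) return INTEGER;
        if (value instanceof String) return CHARARRAY;
        return UNKNOWN;
    }
}
```

Complex types (bags, tuples, maps) are left out on purpose, since handling them 
is the part that is not straightforward.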

As for performance numbers, both of these will save CPU time. If we are convinced 
that we are always I/O bound, we can leave these things as they are. 

> PigStorage making unnecessary byte array copy when storing data
> ---------------------------------------------------------------
>
>                 Key: PIG-1348
>                 URL: https://issues.apache.org/jira/browse/PIG-1348
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.7.0
>            Reporter: Ashutosh Chauhan
>            Assignee: Richard Ding
>             Fix For: 0.7.0
>
>         Attachments: PIG-1348.patch, PIG-1348_2.patch
>
>
> InternalCachedBag estimates the memory available to the VM by using 
> Runtime.getRuntime().maxMemory(). It then takes 10% of this memory (by 
> default, though configurable) and divides it among the bags. It keeps track 
> of the memory used by the bags and proactively spills if their memory usage 
> gets close to these limits. Given all this, in theory, when presented with 
> more data than it can handle, InternalCachedBag should not run out of memory. 
> But in practice we find OOMs happening. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
