[ https://issues.apache.org/jira/browse/PIG-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854643#action_12854643 ]
Ashutosh Chauhan commented on PIG-1348:
---------------------------------------

1) As far as I can see, TextOutputFormat has a synchronized write() because it is meant to work even with mappers that use MultithreadedMapRunner. Since that's not the case for Pig, we can get rid of it, especially now that we are putting in our own PigTextOutputFormat instead of using TextOutputFormat.

3) That's what I meant: if a Schema is available, we should use it to find types instead of reflecting on every call. I suggested the workaround of caching for the case where we know the user did provide a Schema but we don't have a handle on it. Clearly, if there is no schema, we need to find the type every time. I can see that dealing with complex types, even when there is a schema, is not straightforward. In any case, the casts that are currently there for simple types are unnecessary.

As for performance numbers, both of these will save CPU time. If we are convinced that we are always I/O bound, we can leave these things as they are.

> PigStorage making unnecessary byte array copy when storing data
> ---------------------------------------------------------------
>
>                 Key: PIG-1348
>                 URL: https://issues.apache.org/jira/browse/PIG-1348
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.7.0
>            Reporter: Ashutosh Chauhan
>            Assignee: Richard Ding
>             Fix For: 0.7.0
>
>         Attachments: PIG-1348.patch, PIG-1348_2.patch
>
>
> InternalCachedBag makes an estimate of the memory available to the VM by using
> Runtime.getRuntime().maxMemory(). It then uses 10% (by default, though
> configurable) of this memory and divides it among the number of bags. It
> keeps track of the memory used by the bags and proactively spills if the bags'
> memory usage gets close to these limits. Given all this, in theory, when
> presented with more data than it can handle, InternalCachedBag should not run
> out of memory. But in practice we find OOM happening.
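To make the caching workaround from point 3 of the comment concrete, here is a minimal sketch in plain Java. It is not Pig code: the class CachedTypeSketch, the FieldType enum, and the detect() and write() methods are hypothetical names made up for illustration, and the sketch assumes every row has the same column types as the first one (which is what a user-provided schema would guarantee).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these names are Pig APIs.
final class CachedTypeSketch {

    // Simplified stand-in for a per-field type code.
    enum FieldType { INTEGER, LONG, DOUBLE, STRING, OTHER }

    // Counts how often a field is inspected, so the saving is visible.
    static int detectCalls = 0;

    // Inspects the runtime class of a field: the per-call work to avoid.
    static FieldType detect(Object field) {
        detectCalls++;
        if (field instanceof Integer) return FieldType.INTEGER;
        if (field instanceof Long) return FieldType.LONG;
        if (field instanceof Double) return FieldType.DOUBLE;
        if (field instanceof String) return FieldType.STRING;
        return FieldType.OTHER;
    }

    // Column types remembered from the first row; null until then.
    private FieldType[] cached;

    // Writes one row as tab-delimited text, detecting types only when
    // nothing is cached yet (or the number of columns changes).
    String write(List<Object> row) {
        if (cached == null || cached.length != row.size()) {
            cached = new FieldType[row.size()];
            for (int i = 0; i < row.size(); i++) {
                cached[i] = detect(row.get(i));
            }
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < row.size(); i++) {
            if (i > 0) sb.append('\t');
            Object f = row.get(i);
            if (f == null) continue; // a null field is written as empty
            switch (cached[i]) {
                case INTEGER: sb.append(((Integer) f).intValue()); break;
                case LONG:    sb.append(((Long) f).longValue()); break;
                case DOUBLE:  sb.append(((Double) f).doubleValue()); break;
                case STRING:  sb.append((String) f); break;
                default:      sb.append(String.valueOf(f)); break;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        CachedTypeSketch writer = new CachedTypeSketch();
        System.out.println(writer.write(Arrays.<Object>asList(1, "x", 2.5)));
        System.out.println(writer.write(Arrays.<Object>asList(2, "y", 3.5)));
        System.out.println("detect() calls: " + detectCalls);
    }
}
```

The trade-off is the one the comment already points out: the cache is only safe when the column types really are fixed. Without a schema, a later row with a different type in the same column would fail the cast, so the type has to be found for every row in that case.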