[ https://issues.apache.org/jira/browse/PIG-5338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16454432#comment-16454432 ]
Greg Phillips commented on PIG-5338:
------------------------------------

Thanks [~knoguchi]! I was able to run e2e successfully on a small cluster in a reasonable amount of time (220 minutes). In addition to resolving the e2e error noted before, I've added tests, documentation, and the ability to return a native Java DataBag from the Jython UDF. I'm not certain returning a DataBag is the correct way to go; I may add more functionality to the JythonBag to make it writable if that seems like a better way to proceed.

> Prevent deep copy of DataBag into Jython List
> ---------------------------------------------
>
>                 Key: PIG-5338
>                 URL: https://issues.apache.org/jira/browse/PIG-5338
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Greg Phillips
>            Assignee: Greg Phillips
>            Priority: Major
>         Attachments: PIG-5338.001.patch, PIG-5338.patch
>
>
> Pig Python UDFs currently perform deep copies on Bags converting them into
> Jython PyLists. This can cause Jython UDFs to run out of memory and fail. A
> Jython DataBag which extends PyList could allow for iterative access to
> DataBag elements, while only performing a deep copy when necessary.
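The issue description proposes a bag wrapper that lets a UDF iterate over elements without copying the whole bag up front. As a rough illustration only, here is a pure-Python sketch of that lazy-copy idea. `LazyBag`, its `copies` counter, and the use of a plain Python list as a stand-in for a Pig DataBag are all hypothetical; this is not the JythonBag class from the attached patch, and it does not extend PyList or touch any Pig API.

```python
class LazyBag(object):
    """Hypothetical list-like view over a bag-like iterable.

    Iteration streams elements straight from the wrapped source; a full
    copy into a Python list is made only the first time indexed access
    is requested (or len() when the source has no size).
    """

    def __init__(self, source):
        # `source` stands in for a Pig DataBag: anything that can be
        # iterated more than once (assumption for this sketch).
        self._source = source
        self._copy = None  # materialized list, created on demand
        self.copies = 0    # counts full copies, for illustration only

    def __iter__(self):
        # Streaming access: no copy is made here.
        if self._copy is not None:
            return iter(self._copy)
        return iter(self._source)

    def _materialize(self):
        # Copy the source once; later calls reuse the same list.
        if self._copy is None:
            self._copy = list(self._source)
            self.copies += 1
        return self._copy

    def __getitem__(self, index):
        # Random access needs a real list, so copy on first use.
        return self._materialize()[index]

    def __len__(self):
        # A real DataBag knows its size; only copy if the source cannot
        # report its own length.
        if self._copy is None and hasattr(self._source, "__len__"):
            return len(self._source)
        return len(self._materialize())


if __name__ == "__main__":
    bag = LazyBag([(1, "a"), (2, "b"), (3, "c")])
    print(sum(t[0] for t in bag))  # 6, streamed without copying
    print(bag.copies)              # 0
    print(bag[1])                  # (2, 'b'), triggers the one copy
    print(bag.copies)              # 1
```

The design choice mirrors the trade-off in the issue: UDFs that only loop over a bag never pay for a copy, while code that indexes into the bag still works, at the cost of a single copy made on first use.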