[ https://issues.apache.org/jira/browse/HIVE-457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paul Yang updated HIVE-457:
---------------------------

    Attachment: err.sh
                HIVE-457.1.patch

* Modified TextRecordReader to return a line once its size reaches a threshold, even if a newline hasn't been encountered.

Test Plan:
err.sh is a bash script that continuously writes a string to stderr. Using err.sh, the error can be reproduced by running the following in Hive:
{code}
set mapred.child.java.opts=-Xmx8m;
add file err.sh;
SELECT TRANSFORM(*) USING 'err.sh' FROM src;
{code}

Without the patch, the job fails quickly with an out-of-memory error (setting mapred.child.java.opts=-Xmx8m reduces the available memory so that the failure occurs sooner). With the patch, the job should run continuously.

> ScriptOperator should NOT cache all data in stderr
> --------------------------------------------------
>
>                 Key: HIVE-457
>                 URL: https://issues.apache.org/jira/browse/HIVE-457
>             Project: Hadoop Hive
>          Issue Type: Bug
>            Reporter: Zheng Shao
>            Assignee: Paul Yang
>            Priority: Blocker
>             Fix For: 0.5.0
>
>         Attachments: err.sh, HIVE-457.1.patch
>
>
> Sometimes user scripts output a lot of data to stderr without a new line, and this causes Hive to go out-of-memory.
> We should directly output the data from stderr without caching it.
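
For readers without the patch at hand, the idea behind the TextRecordReader change (return a line once it reaches a size threshold, even with no newline seen) can be illustrated roughly as follows. This is only a minimal sketch and is not the code in HIVE-457.1.patch: the class name BoundedLineReader, the maxLineBytes parameter, and the byte-at-a-time loop are all assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch only; NOT the actual TextRecordReader from the patch. */
final class BoundedLineReader {
    private final InputStream in;
    private final int maxLineBytes; // assumed threshold, in bytes

    BoundedLineReader(InputStream in, int maxLineBytes) {
        if (maxLineBytes <= 0) {
            throw new IllegalArgumentException("maxLineBytes must be positive");
        }
        this.in = in;
        this.maxLineBytes = maxLineBytes;
    }

    /**
     * Returns the next line without its trailing newline, or a chunk of
     * exactly maxLineBytes bytes if the threshold is reached before a newline.
     * Returns null once the stream is exhausted.
     */
    String readLine() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') {
                // Normal case: a complete line was read.
                return new String(buf.toByteArray(), StandardCharsets.UTF_8);
            }
            buf.write(b);
            if (buf.size() >= maxLineBytes) {
                // Threshold reached with no newline: return what we have
                // so the buffer can never grow without bound.
                return new String(buf.toByteArray(), StandardCharsets.UTF_8);
            }
        }
        // End of stream: return any trailing partial line, otherwise null.
        return buf.size() > 0
                ? new String(buf.toByteArray(), StandardCharsets.UTF_8)
                : null;
    }
}
```

With this sketch, memory use per call is capped at roughly maxLineBytes no matter how much the script writes to stderr without a newline. It has known simplifications: a line that is exactly maxLineBytes long is followed by an empty string on the next call, a multi-byte UTF-8 character may be split at a chunk boundary, and the input should be wrapped in a BufferedInputStream to avoid slow single-byte reads.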