[ 
https://issues.apache.org/jira/browse/HIVE-457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Yang updated HIVE-457:
---------------------------

    Attachment: err.sh
                HIVE-457.1.patch

* Modified TextRecordReader to return a line once its size reaches a 
threshold, even if a newline hasn't been encountered.
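
As a rough illustration of the idea (not the code in HIVE-457.1.patch), a reader 
can cap how much it buffers by returning a chunk as soon as a byte threshold is 
reached. The class name BoundedLineReader, the method next(), and the 
maxLineBytes parameter are hypothetical names chosen for this sketch only.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; the names here are assumptions and are not taken
// from Hive's TextRecordReader or from the attached patch.
final class BoundedLineReader {
    private final InputStream in;
    private final int maxLineBytes; // assumed threshold, must be positive

    BoundedLineReader(InputStream in, int maxLineBytes) {
        if (maxLineBytes <= 0) {
            throw new IllegalArgumentException("maxLineBytes must be > 0");
        }
        this.in = in;
        this.maxLineBytes = maxLineBytes;
    }

    /**
     * Returns the next line (without its newline), or a partial line of
     * exactly maxLineBytes bytes if no newline was seen before the
     * threshold. Returns null at end of stream.
     */
    String next() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') {
                return decode(buf); // normal case: a full line
            }
            buf.write(b);
            if (buf.size() >= maxLineBytes) {
                return decode(buf); // threshold hit: stop buffering
            }
        }
        // End of stream: flush any trailing bytes that had no newline.
        return buf.size() > 0 ? decode(buf) : null;
    }

    // Note: cutting at a byte boundary can split a multi-byte UTF-8
    // character; this sketch does not handle that case.
    private static String decode(ByteArrayOutputStream buf) {
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

With this approach the buffer never grows beyond maxLineBytes, regardless of 
how much data the script writes without a newline. One side effect of this 
simple version: a line whose length is an exact multiple of the threshold 
produces an extra empty string when the newline is finally read.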

Test Plan:

err.sh is a bash script that continuously writes a string to stderr. Using 
err.sh, the error can be reproduced by running the following in Hive:

{code}
set mapred.child.java.opts=-Xmx8m;
add file err.sh;
SELECT TRANSFORM(*) USING 'err.sh' FROM src;
{code}

Without the patch, the job fails quickly with an out-of-memory error. Setting 
mapred.child.java.opts=-Xmx8m reduces the memory available to the child JVM so 
that the failure occurs sooner.

With the patch, the job should run continuously.


> ScriptOperator should NOT cache all data in stderr
> --------------------------------------------------
>
>                 Key: HIVE-457
>                 URL: https://issues.apache.org/jira/browse/HIVE-457
>             Project: Hadoop Hive
>          Issue Type: Bug
>            Reporter: Zheng Shao
>            Assignee: Paul Yang
>            Priority: Blocker
>             Fix For: 0.5.0
>
>         Attachments: err.sh, HIVE-457.1.patch
>
>
> Sometimes user scripts output a lot of data to stderr without a new line, and 
> this causes Hive to go out-of-memory.
> We should directly output the data from stderr without caching it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
