[ https://issues.apache.org/jira/browse/PIG-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161358#comment-13161358 ]
Daniel Dai commented on PIG-2374:
---------------------------------

This is caused by HADOOP-6109 (0.21 and beyond). After 6109, Text.getBytes() can return a byte array larger than the Text's length. In Pig code, OutputHandler:92, we use the byte array and ignore the length. We need to either:
1. Ask Hadoop to roll back HADOOP-6109
2. Hunt down every place in Pig where we use getBytes() but ignore the length

> streaming regression with dotNext
> ---------------------------------
>
>                 Key: PIG-2374
>                 URL: https://issues.apache.org/jira/browse/PIG-2374
>             Project: Pig
>          Issue Type: Bug
>         Environment: hadoopApache Pig version 0.9.2.1111101150 (r1200499)
> compiled Nov 10 2011, 19:50:15
> -bash-3.1$ hadoop version
> Hadoop 0.23.0.1111080202
> Subversion http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.23.0/hadoop-common-project/hadoop-common -r 1196973
> Compiled by hadoopqa on Tue Nov 8 02:12:04 PST 2011
> From source with checksum 4e42b2d96c899a98a8ab8c7cc23f27ae
>            Reporter: Araceli Henley
>             Fix For: 0.9.2
>
>
> Streaming seems to be broken in dotNext. There are several tests that are
> failing.
> The results from C below produce clean results.
> The results from D which are streamed through CMD produce control characters
> on some of the output.
> define CMD `perl GroupBy.pl '\t' 0`
> ship('/homes/monster/pigtest/pigtest_next/pigharness/dist/pig_harness/libexec/PigTest/GroupBy.pl');
> A = load '/user/user1/pig/tests/data/singlefile/studenttab10k';
> B = group A by $0;
> C = foreach B generate flatten(A);
> D = stream C through CMD;
> store C into '/user/user1/pig/out/user1.1321117428/ComputeSpec_7_C.out';
> store D into '/user/user1/pig/out/user1.1321117428/ComputeSpec_7_D.out';
> Other streaming tests that fail with control characters:
> TEST FAILED <ComputeSpec_7>
> TEST FAILED <ComputeSpec_8>
> TEST FAILED <ComputeSpec_10>
> TEST FAILED <ComputeSpec_11>
> TEST FAILED <ComputeSpec_12>
> TEST FAILED <JobManagement_2>
> TEST FAILED <JobManagement_3>
> TEST FAILED <StreamingIO_4>
> TEST FAILED <NonStreaming_1>
> TEST FAILED <MultiQuery_21>
> ...
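
For illustration only, here is a rough sketch of the failure mode described in the comment: a buffer whose backing array is larger than its valid length, and what happens when a reader ignores that length. GrowableText is a hypothetical stand-in written for this example, not Hadoop's Text class, and its 1.5x over-allocation is an assumed growth policy rather than what HADOOP-6109 actually does. The method names decodeIgnoringLength and decodeWithLength are likewise made up here and are not Pig code.

```java
import java.nio.charset.StandardCharsets;

public class TextLengthSketch {

    /** Hypothetical stand-in for a Text-like buffer; NOT the Hadoop class. */
    static final class GrowableText {
        private byte[] bytes = new byte[0];
        private int length = 0;

        void set(String s) {
            byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
            if (bytes.length < utf8.length) {
                // Assumed policy: over-allocate by 50% so later growth is cheaper.
                bytes = new byte[utf8.length + (utf8.length >> 1)];
            }
            System.arraycopy(utf8, 0, bytes, 0, utf8.length);
            length = utf8.length;
        }

        /** Returns the raw backing array; only the first getLength() bytes are valid. */
        byte[] getBytes() { return bytes; }

        int getLength() { return length; }
    }

    /** Buggy pattern: decodes the whole backing array, including padding or stale bytes. */
    static String decodeIgnoringLength(GrowableText t) {
        return new String(t.getBytes(), StandardCharsets.UTF_8);
    }

    /** Safe pattern: decodes only the valid prefix of the backing array. */
    static String decodeWithLength(GrowableText t) {
        return new String(t.getBytes(), 0, t.getLength(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        GrowableText t = new GrowableText();
        t.set("alice\t18");                                      // 8 valid bytes, 12-byte buffer
        System.out.println(decodeIgnoringLength(t).length());    // 12: 4 trailing NUL characters
        System.out.println(decodeWithLength(t).length());        // 8

        t.set("ab");                                             // buffer is reused, not shrunk
        System.out.println(decodeWithLength(t));                 // ab
        System.out.println(decodeIgnoringLength(t).length());    // still 12: stale bytes follow "ab"
    }
}
```

The trailing NUL and stale bytes in the first pattern are the kind of control characters the streaming tests report. Option 2 above amounts to switching each such call site to the second pattern.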