[jira] Commented: (MAPREDUCE-1277) Streaming job should support other characterset in user's stderr log, not only utf8

2010-01-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804343#action_12804343
 ] 

Hadoop QA commented on MAPREDUCE-1277:
--

-1 overall.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12428009/streaming-1277-new.patch
  against trunk revision 902272.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/284/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/284/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/284/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/284/console

This message is automatically generated.

> Streaming job should support other characterset in user's stderr log, not 
> only utf8
> ---
>
> Key: MAPREDUCE-1277
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1277
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/streaming
>Affects Versions: 0.21.0
>Reporter: ZhuGuanyin
>Assignee: ZhuGuanyin
> Fix For: 0.21.0
>
> Attachments: streaming-1277-new.patch, streaming-1277.patch
>
>
> The current streaming implementation only supports UTF-8 encoded user stderr
> logs; it should be encoding-agnostic so that other character sets are supported.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1277) Streaming job should support other characterset in user's stderr log, not only utf8

2009-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791951#action_12791951
 ] 

Hadoop QA commented on MAPREDUCE-1277:
--

-1 overall.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12428009/streaming-1277-new.patch
  against trunk revision 891524.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/210/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/210/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/210/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/210/console

This message is automatically generated.




[jira] Commented: (MAPREDUCE-1277) Streaming job should support other characterset in user's stderr log, not only utf8

2009-12-11 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789453#action_12789453
 ] 

Zheng Shao commented on MAPREDUCE-1277:
---

Hadoop does need to understand the data format on stdout in order to split the
output into records, and each record into a key and a value.
By default, Hadoop Streaming assumes UTF-8, with "\n" as the record separator
and "\t" as the key/value separator.

For stderr, Hadoop needs to know the line boundary ("\n") as well, because it
already supports reporting (counter updates, etc.) through stderr.

As a result, I think it's a better idea to specify the encoding of the streams.
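To make the stderr point concrete, here is a minimal Java sketch of how line boundaries and streaming's reporter lines (lines beginning with "reporter:", such as "reporter:counter:<group>,<counter>,<amount>" or "reporter:status:<message>") could be recognized on raw bytes without decoding the rest of each line. It assumes the child's stderr is available as a byte buffer; StderrLines, splitLines and isReporterLine are illustrative names, not Hadoop's actual code. Splitting on the 0x0A byte is only safe for ASCII-compatible encodings such as UTF-8, GBK, Big5 or ISO-8859-1; it would break for UTF-16.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch, not Hadoop's actual PipeMapRed code.
final class StderrLines {
    // ASCII prefix that marks a streaming reporter line, e.g. "reporter:counter:grp,cnt,1".
    static final byte[] REPORTER_PREFIX = "reporter:".getBytes(StandardCharsets.US_ASCII);

    // Split raw stderr bytes on the '\n' byte without decoding them. Safe only for
    // ASCII-compatible encodings (UTF-8, GBK, Big5, ISO-8859-1), where 0x0A never
    // appears inside a multi-byte character.
    static List<byte[]> splitLines(byte[] buf) {
        List<byte[]> lines = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < buf.length; i++) {
            if (buf[i] == '\n') {
                lines.add(Arrays.copyOfRange(buf, start, i));
                start = i + 1;
            }
        }
        if (start < buf.length) {
            lines.add(Arrays.copyOfRange(buf, start, buf.length)); // last line had no '\n'
        }
        return lines;
    }

    // Compare only the ASCII prefix; the rest of the line may be in any encoding.
    static boolean isReporterLine(byte[] line) {
        if (line.length < REPORTER_PREFIX.length) {
            return false;
        }
        for (int i = 0; i < REPORTER_PREFIX.length; i++) {
            if (line[i] != REPORTER_PREFIX[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // "café" encoded as ISO-8859-1: byte 0xE9 is not valid UTF-8.
        byte[] stderr = "reporter:status:ok\ncaf\u00e9\n".getBytes(StandardCharsets.ISO_8859_1);
        for (byte[] line : splitLines(stderr)) {
            System.out.println((isReporterLine(line) ? "report: " : "log:    ") + line.length + " bytes");
        }
    }
}
```

Only the line boundary and the ASCII prefix need to be understood here, which is why an encoding declaration matters mainly when the framework has to interpret more than that.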





[jira] Commented: (MAPREDUCE-1277) Streaming job should support other characterset in user's stderr log, not only utf8

2009-12-10 Thread ZhuGuanyin (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789081#action_12789081
 ] 

ZhuGuanyin commented on MAPREDUCE-1277:
---

I think the framework should not care about the character set of the input and
the user log; the input or output may even contain more than one character set.

What Hadoop needs to do is read the raw data for the user's mapper or reducer,
collect the raw stdout and stderr data, and save them on HDFS or the
TaskTracker's local disk.

Raw in, raw out, no matter what the character set is.
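A minimal Java sketch of the "raw in, raw out" idea, assuming the child's stderr is available as an InputStream and the task log as an OutputStream. RawStderrPump is a hypothetical name; the point is that bytes are copied unchanged, so no character set is ever assumed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: forwards a child process's stderr to the task log byte for byte.
final class RawStderrPump {
    static void pump(InputStream childStderr, OutputStream taskLog) throws IOException {
        byte[] buf = new byte[8192];
        int n;
        while ((n = childStderr.read(buf)) != -1) {
            taskLog.write(buf, 0, n); // no decoding: bytes pass through whatever their charset
        }
        taskLog.flush();
    }

    public static void main(String[] args) throws IOException {
        // ISO-8859-1 bytes that are not valid UTF-8 survive the copy untouched.
        byte[] input = "caf\u00e9\n".getBytes(StandardCharsets.ISO_8859_1);
        ByteArrayOutputStream log = new ByteArrayOutputStream();
        pump(new ByteArrayInputStream(input), log);
        System.out.println(Arrays.equals(input, log.toByteArray()));
    }
}
```

A pure byte copy like this does not look for line boundaries, so it cannot by itself pick out reporter lines; those still need at least the "\n" boundary to be recognized.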




[jira] Commented: (MAPREDUCE-1277) Streaming job should support other characterset in user's stderr log, not only utf8

2009-12-10 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1270#action_1270
 ] 

Zheng Shao commented on MAPREDUCE-1277:
---

I think a better way to do this is to add an "encoding" property to JobConf so 
that we can encode and decode the data correctly.
That also allows us to do codec changes if needed.

Does that make sense?
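A Java sketch of the "encoding" property idea, under stated assumptions: the key stream.stderr.encoding is hypothetical (not an existing Hadoop setting), and a plain Map stands in for JobConf so the example runs without Hadoop on the classpath. UTF-8 is kept as the default, matching current behavior.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Sketch only: a Map stands in for JobConf, and the property key is hypothetical.
final class StderrEncoding {
    static final String ENCODING_KEY = "stream.stderr.encoding"; // HYPOTHETICAL key

    // Resolve the declared charset, falling back to UTF-8 when none is set.
    static Charset stderrCharset(Map<String, String> conf) {
        String name = conf.get(ENCODING_KEY);
        return name == null ? StandardCharsets.UTF_8 : Charset.forName(name);
    }

    // Decode one raw stderr line using the job's declared charset.
    static String decodeLine(byte[] line, Map<String, String> conf) {
        return new String(line, stderrCharset(conf));
    }

    public static void main(String[] args) {
        byte[] line = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        Map<String, String> conf = new HashMap<>();
        // Default UTF-8: byte 0xE9 is malformed and becomes U+FFFD.
        System.out.println(decodeLine(line, conf).equals("caf\u00e9"));
        conf.put(ENCODING_KEY, "ISO-8859-1");
        // Declared charset matches the bytes, so the text decodes correctly.
        System.out.println(decodeLine(line, conf).equals("caf\u00e9"));
    }
}
```

On a full JDK, names such as "GBK" or "Big5" typically resolve through Charset.forName as well; an unknown name throws UnsupportedCharsetException, so a real implementation would probably want to validate the setting when the job is submitted.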




[jira] Commented: (MAPREDUCE-1277) Streaming job should support other characterset in user's stderr log, not only utf8

2009-12-09 Thread ZhuGuanyin (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788489#action_12788489
 ] 

ZhuGuanyin commented on MAPREDUCE-1277:
---

This patch changes

System.err.println(lineStr);

to
 
System.err.write(line.getBytes(),0,line.getLength());
System.err.println();

I think it can be verified by review; it is not easy to write a test case for
this JIRA.
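The effect of the change can be reproduced outside Hadoop with the following Java sketch. It assumes lineStr in the old code was obtained by decoding line as UTF-8 (as Hadoop's Text.toString() does), and it uses PrintStreams over byte buffers in place of System.err, fixed to UTF-8 so the result is deterministic (a real System.err encodes with the platform default charset). StderrPatchDemo, oldPath and newPath are illustrative names. The PrintStream(OutputStream, boolean, Charset) constructor requires Java 10 or later.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative comparison of the old and patched stderr paths.
final class StderrPatchDemo {
    // Old path: decode as UTF-8, then re-encode through println(String).
    static byte[] oldPath(byte[] bytes, int length) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        PrintStream err = new PrintStream(buf, true, StandardCharsets.UTF_8); // stands in for System.err
        String lineStr = new String(bytes, 0, length, StandardCharsets.UTF_8); // malformed bytes -> U+FFFD
        err.println(lineStr);
        err.flush();
        return buf.toByteArray();
    }

    // New path (the patch): write the raw bytes, then only a line separator.
    static byte[] newPath(byte[] bytes, int length) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        PrintStream err = new PrintStream(buf, true, StandardCharsets.UTF_8);
        err.write(bytes, 0, length); // only the first `length` bytes are valid, as with Text.getLength()
        err.println();
        err.flush();
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        byte[] backing = Arrays.copyOf(latin1, 16); // backing array longer than the valid length
        System.out.println("old: " + Arrays.toString(oldPath(backing, latin1.length)));
        System.out.println("new: " + Arrays.toString(newPath(backing, latin1.length)));
    }
}
```

For the ISO-8859-1 bytes of "café", the old path replaces 0xE9 with EF BF BD (the UTF-8 encoding of U+FFFD), while the new path writes 0xE9 unchanged. Passing an explicit length matters because Text.getBytes() may return a backing array longer than getLength().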

Manual steps to verify this:

1) Copy a small file to HDFS.

2) Run a streaming job using the following mapper:

#!/bin/sh
cat >/dev/null

echo "㊣ ?※" >&2
echo "礙骯襖壩闆辦" >&2

3) Check the task's stderr output; the log will be corrupted.

4) Apply the patch and run the streaming job again; the task's stderr will be
intact.

This patch is useful when users need to write debug messages, for example input
records that may be encoded in Big5, GBK, and so on.
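For steps 3 and 4, "corrupted" versus "intact" can be checked at the byte level rather than by eye, which helps because a terminal may itself mis-render GBK or Big5 text. A minimal Java sketch, assuming you have saved the bytes the mapper writes (for example by running the script locally with stderr redirected to a file) and copied the task's stderr log locally; StderrLogCheck is a hypothetical helper, not part of Hadoop.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: does the task's stderr log contain exactly the bytes the mapper wrote?
final class StderrLogCheck {
    // Naive byte search; adequate for small debug logs.
    static boolean containsBytes(byte[] haystack, byte[] needle) {
        outer:
        for (int i = 0; i + needle.length <= haystack.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return true;
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: StderrLogCheck <task-stderr-log> <expected-bytes-file>");
            System.exit(2);
        }
        byte[] log = Files.readAllBytes(Paths.get(args[0]));
        byte[] expected = Files.readAllBytes(Paths.get(args[1]));
        System.out.println(containsBytes(log, expected) ? "intact" : "corrupted or missing");
    }
}
```

Before the patch the expected bytes should not be found in the log; after the patch they should be.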




[jira] Commented: (MAPREDUCE-1277) Streaming job should support other characterset in user's stderr log, not only utf8

2009-12-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788140#action_12788140
 ] 

Hadoop QA commented on MAPREDUCE-1277:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12427460/streaming-1277.patch
  against trunk revision 888761.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

-1 patch.  The patch command could not apply the patch.

Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/309/console

This message is automatically generated.




[jira] Commented: (MAPREDUCE-1277) Streaming job should support other characterset in user's stderr log, not only utf8

2009-12-09 Thread ZhuGuanyin (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788021#action_12788021
 ] 

ZhuGuanyin commented on MAPREDUCE-1277:
---

Test case:
Using the following mapper, you will see that the stderr log is corrupted.

#!/bin/sh
cat >/dev/null

echo "㊣ ?※" >&2
echo "礙骯襖壩闆辦" >&2






[jira] Commented: (MAPREDUCE-1277) Streaming job should support other characterset in user's stderr log, not only utf8

2009-12-09 Thread ZhuGuanyin (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788022#action_12788022
 ] 

ZhuGuanyin commented on MAPREDUCE-1277:
---

A simple solution:

Change line 492 in PipeMapRed.java from

System.err.println(lineStr);

to:
System.err.write(line.getBytes(),0,line.getLength());
System.err.println();

I will attach the patch soon. 
