Hadoop exception: DFSInputStream.java: - Error making BlockReader. Closing stale NioInetPeer

2014-06-06 Thread S. Zhou
The exception happens on Hadoop 2.2. The full error message is shown
below. Note that the log level is DEBUG; I am not sure whether such an exception is serious.

2014-06-05 14:39:31,135 DEBUG [pool-1-thread-1] (DFSInputStream.java:1095) - Error making BlockReader. Closing stale NioInetPeer(Socket[addr=/XX.XX.XX.XX,port=50010,localport=45112])
java.io.EOFException: Premature EOF: no length prefix available
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1492)
    at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:392)
    at org.apache.hadoop.hdfs.BlockReaderFactory.newBlockReader(BlockReaderFactory.java:131)
    at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:1088)
    at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:533)
    at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:749)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:793)
    at java.io.DataInputStream.read(DataInputStream.java:149)
    at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
    at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
    at java.io.InputStreamReader.read(InputStreamReader.java:184)
    at java.io.BufferedReader.fill(BufferedReader.java:154)
    at java.io.BufferedReader.readLine(BufferedReader.java:317)
    at java.io.BufferedReader.readLine(BufferedReader.java:382)

set file permission on mapreduce outputs

2013-10-28 Thread S. Zhou
I have an MR job (which only has a mapper), and the file permission of the output 
files is rwx------. I want it to be rwxr-xr-x. How can I set that up in the job 
config?  Thanks

Senqiang
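
A minimal sketch of two common approaches, assuming Hadoop 2.x: relax the
client-side umask via the fs.permissions.umask-mode property before submitting
the job, or chmod the outputs with FileSystem.setPermission after the job
finishes. The driver class name and output path below are hypothetical; note
that a 022 umask yields 644 (rw-r--r--) on plain files, so the explicit chmod
is what gets you rwxr-xr-x exactly.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.permission.FsPermission;
    import org.apache.hadoop.mapreduce.Job;

    public class OutputPermissionDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            // Option 1: relax the umask before submitting the job, so task
            // outputs are created group/other readable (022 -> dirs 755, files 644).
            conf.set("fs.permissions.umask-mode", "022");

            Job job = Job.getInstance(conf, "map-only-job");
            // ... configure mapper, input/output formats and paths here ...
            job.waitForCompletion(true);

            // Option 2: chmod the finished outputs to rwxr-xr-x explicitly.
            Path outputDir = new Path("/user/senqiang/output"); // hypothetical path
            FileSystem fs = FileSystem.get(conf);
            FsPermission perm = new FsPermission((short) 0755); // rwxr-xr-x
            fs.setPermission(outputDir, perm);
            for (FileStatus status : fs.listStatus(outputDir)) {
                fs.setPermission(status.getPath(), perm);
            }
        }
    }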


stop generating these part-XXXX empty files when using MultipleOutputs in mapreduce job

2013-10-28 Thread S. Zhou
I use MultipleOutputs, so the output data are no longer stored in the default 
part-XXXX files, but those files are still generated (though empty). Is it 
possible to stop generating these files when running the MR job? (BTW, my MR 
job only has a mapper.) Thanks

Senqiang
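
A minimal sketch of the usual fix, assuming the new mapreduce API: wrap the
job's output format with LazyOutputFormat, which creates the default part-XXXX
files only when a record is actually written through the standard collector.
The driver class name, named output, and output path below are hypothetical.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

    public class LazyOutputDriver {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "multiple-outputs-job");
            // ... configure mapper, input format, key/value types here ...
            job.setNumReduceTasks(0); // map-only, as in the original post

            // Register the named outputs written via MultipleOutputs as usual.
            MultipleOutputs.addNamedOutput(job, "data",
                TextOutputFormat.class, Text.class, Text.class);

            // Instead of job.setOutputFormatClass(...), wrap the format lazily:
            // the default part files are then only created when something is
            // actually written to them.
            LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);

            FileOutputFormat.setOutputPath(job, new Path("/user/senqiang/out")); // hypothetical
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Since a map-only job that writes everything through MultipleOutputs never
touches the standard collector, no empty part files should be produced.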


Mapreduce outputs to a different cluster?

2013-10-24 Thread S. Zhou
The scenario is: I run a mapreduce job on cluster A (all the source data is in 
cluster A), but I want the output of the job to go to cluster B. Is it possible? 
If yes, please let me know how to do it.

Here are some notes about my mapreduce job:
1. The data source is an HBase table.
2. It only has a mapper, no reducer.

Thanks
Senqiang


Re: Mapreduce outputs to a different cluster?

2013-10-24 Thread S. Zhou
Thanks Shahab & Yong. If cluster B (into which I want to dump the output) has 
URL hdfs://machine.domain:8080 and data folder /tmp/myfolder, what should I 
specify as the output path for the MR job?

Thanks

On Thursday, October 24, 2013 5:31 PM, java8964 (java8...@hotmail.com) wrote:
 
Just specify the output location using a fully qualified URI to the other 
cluster. As long as the network is accessible, you should be fine.

Yong
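
A minimal sketch for the follow-up question, assuming the new mapreduce API
and that hdfs://machine.domain:8080 is cluster B's NameNode URI (8020 is the
more usual RPC port, so it is worth double-checking): pass the fully qualified
path as the job's output path. The driver class name is hypothetical.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class CrossClusterOutputDriver {
        public static void main(String[] args) throws Exception {
            // The job still runs on cluster A; the Configuration points at A.
            Job job = Job.getInstance(new Configuration(), "output-to-cluster-b");
            // ... configure the HBase input format and the mapper here ...
            job.setNumReduceTasks(0); // map-only, as in the original post

            // A fully qualified URI overrides fs.defaultFS, so the output goes
            // to cluster B's HDFS rather than the local cluster's.
            FileOutputFormat.setOutputPath(job,
                new Path("hdfs://machine.domain:8080/tmp/myfolder"));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }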