Are you sure that you prepared your MR code to work with multiple
files?
This example (WordCount) works with a single input.
You should take a look at the MultipleInputs API for this.
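A minimal sketch of how that could look (the job variable, the paths, and the WordCountMapper name here are assumptions, not taken from your code):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

// Each input path can be bound to its own InputFormat and Mapper;
// here both paths use TextInputFormat and the same (hypothetical) mapper.
MultipleInputs.addInputPath(job, new Path("s3n://bucket/wordcount/input/a"),
    TextInputFormat.class, WordCountMapper.class);
MultipleInputs.addInputPath(job, new Path("s3n://bucket/wordcount/input/b"),
    TextInputFormat.class, WordCountMapper.class);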
Best wishes
On 02/10/2012 6:05, Ben Kim wrote:
I'm having a similar issue.
I'm running a WordCount MR job as follows:
hadoop jar WordCount.jar wordcount.WordCountDriver s3n://bucket/wordcount/input s3n://bucket/wordcount/output
s3n://bucket/wordcount/input is an S3 object that contains
other input files.
However, I get the following NPE error:
12/10/02 18:56:23 INFO mapred.JobClient: map 0% reduce 0%
12/10/02 18:56:54 INFO mapred.JobClient: map 50% reduce 0%
12/10/02 18:56:56 INFO mapred.JobClient: Task Id : attempt_201210021853_0001_m_000001_0, Status : FAILED
java.lang.NullPointerException
        at org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream.close(NativeS3FileSystem.java:106)
        at java.io.BufferedInputStream.close(BufferedInputStream.java:451)
        at java.io.FilterInputStream.close(FilterInputStream.java:155)
        at org.apache.hadoop.util.LineReader.close(LineReader.java:83)
        at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.close(LineRecordReader.java:144)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.close(MapTask.java:497)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:765)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)
The MR job runs fine if I specify a more specific input path, such as
s3n://bucket/wordcount/input/file.txt.
What I want is to be able to pass S3 folders as parameters.
Does anyone know how to do this?
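For what it's worth, a workaround sketch (untested; the variable names are placeholders) might be to list the folder contents and add each file individually:

import java.net.URI;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

// Expand the S3 "folder" by hand and feed each file to the job.
FileSystem fs = FileSystem.get(new URI("s3n://bucket"), job.getConfiguration());
for (FileStatus stat : fs.listStatus(new Path("s3n://bucket/wordcount/input"))) {
    if (!stat.isDir()) {                       // skip sub-folders
        FileInputFormat.addInputPath(job, stat.getPath());
    }
}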
Best regards,
Ben Kim
On Fri, Jul 20, 2012 at 10:33 AM, Harsh J <ha...@cloudera.com> wrote:
Dan,
Can you share your error? Plain .gz files (not .tar.gz) are
natively supported by Hadoop via its GzipCodec, and if you are
facing an error, I believe its cause is something other than
compression.
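For example, you can see which codec Hadoop resolves for a given path (a small sketch; the file name is a placeholder):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

// TextInputFormat's record reader performs this lookup internally.
CompressionCodecFactory factory = new CompressionCodecFactory(new Configuration());
CompressionCodec codec = factory.getCodec(new Path("input/data.txt.gz"));
// For a .gz suffix this resolves to GzipCodec, and the input stream is
// decompressed on the fly, so no separate unzip step is needed.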
On Fri, Jul 20, 2012 at 6:14 AM, Dan Yi <d...@mediosystems.com> wrote:
I have an MR job that reads files on Amazon S3 and
processes the data on local HDFS. The files are
gzipped text files (.gz). I tried to set up the job
as below, but it won't work. Does anyone know what
might be wrong? Do I need to add an extra step to
unzip the files first?
Thanks.
// S3 credentials embedded in the URI (access_key/private_key are placeholders)
String S3_LOCATION = "s3n://access_key:private_key@bucket_name";

protected void prepareHadoopJob() throws Exception {
    this.getHadoopJob().setMapperClass(Mapper1.class);
    // TextInputFormat decompresses .gz inputs transparently via GzipCodec
    this.getHadoopJob().setInputFormatClass(TextInputFormat.class);
    FileInputFormat.addInputPath(this.getHadoopJob(), new Path(S3_LOCATION));
    // Map-only job writing Puts directly to an HBase table
    this.getHadoopJob().setNumReduceTasks(0);
    this.getHadoopJob().setOutputFormatClass(TableOutputFormat.class);
    this.getHadoopJob().getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, myTable.getTableName());
    this.getHadoopJob().setOutputKeyClass(ImmutableBytesWritable.class);
    this.getHadoopJob().setOutputValueClass(Put.class);
}
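As a side note, a sketch assuming you'd rather not embed the keys in the URI: the s3n credentials can instead be set on the job configuration ("ACCESS_KEY"/"PRIVATE_KEY" are placeholders for your own values):

this.getHadoopJob().getConfiguration().set("fs.s3n.awsAccessKeyId", "ACCESS_KEY");
this.getHadoopJob().getConfiguration().set("fs.s3n.awsSecretAccessKey", "PRIVATE_KEY");
// The input path then carries no credentials at all.
FileInputFormat.addInputPath(this.getHadoopJob(), new Path("s3n://bucket_name"));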

Dan Yi | Software Engineer, Analytics Engineering
Medio Systems Inc | 701 Pike St. #1500, Seattle, WA 98101
Predictive Analytics for a Connected World
--
Harsh J
--
Benjamin Kim
benkimkimben at gmail
--
Marcos Ortiz Valmaseda,
Data Engineer && Senior System Administrator at UCI
Blog: http://marcosluis2186.posterous.com
Linkedin: http://www.linkedin.com/in/marcosluis2186
Twitter: @marcosluis2186