Re: Can we use System.exit() inside custom job class main function?

2011-04-09 Thread Harsh J
Hello,

On Sun, Apr 10, 2011 at 9:37 AM, Xiaobo Gu  wrote:
> Hi,
>
> Will it terminate the whole Hadoop JVM ?

It would be alright to use System.exit() in the job driver (frontend).
It would only terminate the launcher program JVM, not the submitted
jobs.
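
For illustration, a minimal driver sketch (class names here are placeholders,
not from your code); the System.exit() only ends the launcher JVM after the
cluster-side job has finished:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.mapreduce.Job;

  public class MyDriver {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      Job job = new Job(conf, "my job");
      job.setJarByClass(MyDriver.class);
      // ... set mapper/reducer classes, input and output paths ...
      boolean ok = job.waitForCompletion(true); // blocks until the job ends on the cluster
      System.exit(ok ? 0 : 1);                  // terminates only this client JVM
    }
  }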

-- 
Harsh J


job.

2011-04-09 Thread real great..
Hi,
Although this is absolutely out of context, is there a similar mailing list
for Hadoop jobs? Especially for freshers from engineering who want to work in
this field and have some small experience working in this context?

-- 
Regards,
R.V.


Can we use System.exit() inside custom job class main function?

2011-04-09 Thread Xiaobo Gu
Hi,

Will it terminate the whole Hadoop JVM ?

Regards,

Xiaobo Gu


Re: Reg HDFS checksum

2011-04-09 Thread Thamizh
Hi Harsh ,
Thanks a lot for your reference.
I am looking forward to knowing how Hadoop computes the CRC for any file.
If you have some reference, please share it with me. It would be a great help.

Regards,

  Thamizhannal P

--- On Sat, 9/4/11, Harsh J  wrote:

From: Harsh J 
Subject: Re: Reg HDFS checksum
To: common-user@hadoop.apache.org
Date: Saturday, 9 April, 2011, 3:20 PM

Hello Thamizh,

Perhaps the discussion in the following link can shed some light on
this: http://getsatisfaction.com/cloudera/topics/hadoop_fs_crc

On Fri, Apr 8, 2011 at 5:47 PM, Thamizh  wrote:
> Hi All,
>
> This is question regarding "HDFS checksum" computation.

-- 
Harsh J


Re: Writing to Mapper Context from RecordReader

2011-04-09 Thread Harsh J
Hello Adi,

On Thu, Apr 7, 2011 at 8:12 PM, Adi  wrote:
> using 0.21.0. I have implemented a custom InputFormat. The RecordReader
> extends org.apache.hadoop.mapreduce.RecordReader
>
> The sample I looked at threw an IOException when there was an incompatible
> input line, but I am not sure who is supposed to catch and handle this
> exception. The task just failed when this exception was thrown.
> I changed the implementation to log an error instead of throwing an
> IOException but the best thing would be to write to the output via context
> and report this error.
> But the RecordReader does not have a handle to the Mapper context.
> Is there a way to get a handle to the current Mapper context and write a
> message via the Mapper context from the RecordReader?
> Any other suggestions on handling bad input data when implementing Custom
> InputFormat?

I'd say logging is better, unless you also want to preserve
information on the bad records.

Anyways, to solve this, you can open a DFS file stream and write your
bad records to it. Have a look in the FAQ at [1] - That should be
doable from the RecordReader layer also.
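
A rough, untested sketch of that from within a RecordReader (the side-file
path and field names below are made up for illustration):

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapreduce.InputSplit;
  import org.apache.hadoop.mapreduce.TaskAttemptContext;

  // Inside your custom RecordReader:
  private FSDataOutputStream badRecords;

  @Override
  public void initialize(InputSplit split, TaskAttemptContext context)
      throws IOException, InterruptedException {
    Configuration conf = context.getConfiguration();
    FileSystem fs = FileSystem.get(conf);
    // One side file per task attempt avoids name clashes between tasks.
    Path side = new Path("/user/adi/bad-records/" + context.getTaskAttemptID());
    badRecords = fs.create(side);
    // ... usual initialization of the underlying reader ...
  }

  // In nextKeyValue(), when a malformed line is seen, write it to the side
  // file and move on instead of throwing:
  //   badRecords.write((badLine + "\n").getBytes("UTF-8"));
  // Remember to close badRecords in close().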

If you can push this functionality (validation) down into your mapper,
you can leverage the MultipleOutputs feature to do this easily too.
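
For the mapper route, an untested sketch with the new-API MultipleOutputs (the
output name "bad" and the isValid() check are made up for illustration):

  import java.io.IOException;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.NullWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Mapper;
  import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

  // Driver: MultipleOutputs.addNamedOutput(job, "bad",
  //     TextOutputFormat.class, NullWritable.class, Text.class);

  public class ValidatingMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    private MultipleOutputs<Text, IntWritable> mos;

    protected void setup(Context context) {
      mos = new MultipleOutputs<Text, IntWritable>(context);
    }

    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      if (!isValid(value)) {                          // your own validation
        mos.write("bad", NullWritable.get(), value);  // divert bad records
        return;
      }
      // ... normal processing, context.write(...) ...
    }

    private boolean isValid(Text t) {
      return t.getLength() > 0;  // placeholder check
    }

    protected void cleanup(Context context)
        throws IOException, InterruptedException {
      mos.close();
    }
  }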

Finally, if you can use the old API, this is possible via the
framework itself using the 'Skip Bad Records' feature [2]; a rough
configuration sketch follows the links below.

[1] - 
http://wiki.apache.org/hadoop/FAQ#Can_I_write_create.2BAC8-write-to_hdfs_files_directly_from_map.2BAC8-reduce_tasks.3F
[2] - 
http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html#Skipping+Bad+Records
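
A rough old-API configuration sketch for [2] (MyJob and the values are
arbitrary placeholders):

  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.SkipBadRecords;

  JobConf conf = new JobConf(MyJob.class);
  // Begin skipping mode after this many failed attempts of a task.
  SkipBadRecords.setAttemptsToStartSkipping(conf, 2);
  // Let the framework skip up to this many records around a failing one.
  SkipBadRecords.setMapperMaxSkipRecords(conf, 1000L);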

-- 
Harsh J


Re: Reg HDFS checksum

2011-04-09 Thread Harsh J
Hello Thamizh,

Perhaps the discussion in the following link can shed some light on
this: http://getsatisfaction.com/cloudera/topics/hadoop_fs_crc
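
If what you are after is the checksum HDFS has already computed for a file,
here is an untested sketch using the plain FileSystem API (roughly speaking,
HDFS keeps a CRC32 per 512-byte chunk and exposes a combined
MD5-of-MD5s-of-CRCs value per file):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileChecksum;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class ShowChecksum {
    public static void main(String[] args) throws Exception {
      FileSystem fs = FileSystem.get(new Configuration());
      FileChecksum sum = fs.getFileChecksum(new Path(args[0]));
      System.out.println(sum);  // null on the local FS; a value on HDFS
    }
  }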

On Fri, Apr 8, 2011 at 5:47 PM, Thamizh  wrote:
> Hi All,
>
> This is question regarding "HDFS checksum" computation.

-- 
Harsh J


Re: cannot compile the source code

2011-04-09 Thread Harsh J
Hello,

2011/4/6 Shiraz :
> Hi folks,
> I have checked out the hadoop-common source code from the tag 0.21. I have
> executed the command on root project folder with "ant eclipse", I can
> successfully view the project in eclipse, but there are some compilation
> errors which I would like to resolve
> within 
> org.apache.hadoop.classification.tools.ExcludePrivateAnnotationsJDiffDoclet.

Have a look at this thread (towards the end):
http://search-hadoop.com/m/VtONnKrE8j/pointers+to+Hadoop+eclipse&subj=pointers+to+Hadoop+eclipse

-- 
Harsh J


Re: How do I create per-reducer temporary files?

2011-04-09 Thread Harsh J
Hello,

On Tue, Apr 5, 2011 at 2:53 AM, W.P. McNeill  wrote:
> If I try:
>
>      storePath = FileOutputFormat.getPathForWorkFile(context, "my-file",
> ".seq");
>      writer = SequenceFile.createWriter(FileSystem.getLocal(configuration),
>            configuration, storePath, IntWritable.class, itemClass);
>      ...
>      reader = new SequenceFile.Reader(FileSystem.getLocal(configuration),
> storePath, configuration);
>
> I get an exception about a mismatch in file systems when trying to read from
> the file.
>
> Alternately if I try:
>
>      storePath = new Path(SequenceFileOutputFormat.getUniqueFile(context,
> "my-file", ".seq"));
>      writer = SequenceFile.createWriter(FileSystem.get(configuration),
>            configuration, storePath, IntWritable.class, itemClass);
>      ...
>      reader = new SequenceFile.Reader(FileSystem.getLocal(configuration),
> storePath, configuration);

FileOutputFormat.getPathForWorkFile will give back HDFS paths. And
since you are looking to create local temporary files to be used only
by the task within itself, you shouldn't really worry about unique
filenames (stuff can go wrong).

You're looking for the tmp/ directory locally created in the FS where
the Task is running (at ${mapred.child.tmp}, which defaults to ./tmp).
You can create a regular file there using vanilla Java APIs for files,
or using RawLocalFS + your own created Path (not derived via
OutputFormat/etc.).

>      storePath = new Path(context.getConfiguration().get("mapred.child.tmp"),
>            "my-file.seq");
>      writer = SequenceFile.createWriter(FileSystem.getLocal(configuration),
>            configuration, storePath, IntWritable.class, itemClass);
>      ...
>      reader = new SequenceFile.Reader(FileSystem.getLocal(configuration),
> storePath, configuration);

The above should work, I think (haven't tried, but the idea is to use
the mapred.child.tmp).
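
For completeness, the plain java.io route could look roughly like this
(untested; itemClass as in your snippets):

  import java.io.File;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.SequenceFile;

  Configuration configuration = context.getConfiguration();
  // mapred.child.tmp defaults to ./tmp, relative to the task's working dir.
  File tmpDir = new File(configuration.get("mapred.child.tmp", "./tmp"));
  Path storePath = new Path(new File(tmpDir, "my-file.seq").getAbsolutePath());
  FileSystem localFs = FileSystem.getLocal(configuration);

  SequenceFile.Writer writer = SequenceFile.createWriter(localFs,
      configuration, storePath, IntWritable.class, itemClass);
  // ... write records ...
  writer.close();

  SequenceFile.Reader reader = new SequenceFile.Reader(localFs,
      storePath, configuration);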

Also see: 
http://hadoop.apache.org/common/docs/r0.20.0/mapred_tutorial.html#Directory+Structure

-- 
Harsh J