Re: Issue with running hadoop program using eclipse

2013-01-31 Thread Mohammad Tariq
Hello Vikas,

 Sorry for the late response. You don't have to create the jar separately.
If you have added "job.setJarByClass" as specified by Hemanth sir, it
should work.
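
For reference, a minimal sketch of a driver wired up that way - this is an
illustrative WordCount, not the poster's exact code:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static class MapClass extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    public static class ReduceClass extends Reducer<Text, IntWritable, Text, IntWritable> {
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "wordcount");
        // Point Hadoop at the jar containing the job classes; without this the
        // tasks on the cluster cannot load WordCount$MapClass and fail with the
        // ClassNotFoundException shown in the log quoted below.
        job.setJarByClass(WordCount.class);
        job.setMapperClass(MapClass.class);
        job.setReducerClass(ReduceClass.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Note that setJarByClass only helps if the classes are actually packaged into a
jar on the submitting classpath; when the job is launched straight from
Eclipse's classes directory there is no jar for it to find, which is why the
"No job jar file set" warning appears in the log below.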

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com


On Fri, Feb 1, 2013 at 9:42 AM, Hemanth Yamijala
wrote:

> Previously, I have resolved this error by building a jar and then using
> the API job.setJarByClass(.class). Can you please try that
> once?
>
>
> On Thu, Jan 31, 2013 at 6:40 PM, Vikas Jadhav wrote:
>
>> Hi, I know it is a class-not-found error,
>> but I have the Map and Reduce classes as part of the Driver class.
>> So what is the problem?
>>
>> I want to ask whether it is compulsory to have a jar for any Hadoop program,
>> because it says "No job jar file set. See JobConf(Class) or
>> JobConf#setJar(String)."
>>
>> I am running my program using Eclipse (Windows machine) and Hadoop (Linux
>> machine) remotely.
>>
>>
>>
>> On Thu, Jan 31, 2013 at 12:40 PM, Mohammad Tariq wrote:
>>
>>> Hello Vikas,
>>>
>>>  It clearly shows that the class cannot be found. For
>>> debugging, you can write your MR job as a standalone java program and debug
>>> it. It works. And if you want to just debug your mapper / reducer logic,
>>> you should look into using MRUnit. There is a good write-up at
>>> Cloudera's blog which talks about it in detail.
>>>
>>> HTH
>>>
>>> Warm Regards,
>>> Tariq
>>> https://mtariq.jux.com/
>>> cloudfront.blogspot.com
>>>
>>>
>>> On Thu, Jan 31, 2013 at 11:56 AM, Vikas Jadhav wrote:
>>>
 Hi,
 I have one Windows machine and one Linux machine.
 My Eclipse is on the Windows machine
 and Hadoop is running on a single Linux machine.
 I am trying to run a wordcount program from Eclipse (on the Windows machine)
 to Hadoop (on the Linux machine)
 and I am getting the following error:

  13/01/31 11:48:14 WARN mapred.JobClient: Use GenericOptionsParser for
 parsing the arguments. Applications should implement Tool for the same.
 13/01/31 11:48:15 WARN mapred.JobClient: No job jar file set.  User
 classes may not be found. See JobConf(Class) or JobConf#setJar(String).
 13/01/31 11:48:16 INFO input.FileInputFormat: Total input paths to
 process : 1
 13/01/31 11:48:16 WARN util.NativeCodeLoader: Unable to load
 native-hadoop library for your platform... using builtin-java classes where
 applicable
 13/01/31 11:48:16 WARN snappy.LoadSnappy: Snappy native library not
 loaded
 13/01/31 11:48:26 INFO mapred.JobClient: Running job:
 job_201301300613_0029
 13/01/31 11:48:27 INFO mapred.JobClient:  map 0% reduce 0%
 13/01/31 11:48:40 INFO mapred.JobClient: Task Id :
 attempt_201301300613_0029_m_00_0, Status : FAILED
 java.lang.RuntimeException: java.lang.ClassNotFoundException:
 WordCount$MapClass
  at
 org.apache.hadoop.conf.Configuration.getClass(Configuration.java:867)
  at
 org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:199)
  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:719)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
  at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:396)
  at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
  at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.lang.ClassNotFoundException: WordCount$MapClass
  at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
  at java.lang.Class.forName0(Native Method)
  at java.lang.Class.forName(Class.java:247)
  at
 org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820)
  at
 org.apache.hadoop.conf.Configuration.getClass(Configuration.java:865)
  ... 8 more




 Also, I want to know how to debug a Hadoop program using Eclipse.




 Thank You.


>>>
>>>
>>
>>
>> --
>> Thanx and Regards
>> Vikas Jadhav
>>
>
>


Re: the different size of file through hadoop streaming

2013-01-31 Thread 周梦想
I have figured out why.

When an executable is specified for mappers, each mapper task will launch
the executable as a separate process when the mapper is initialized. As the
mapper task runs, it converts its inputs into lines and feeds the lines to
the stdin of the process. In the meantime, the mapper collects the line
oriented outputs from the stdout of the process and converts each line into
a key/value pair, which is collected as the output of the mapper. By
default, the *prefix of a line up to the first tab character* is the *key* and
the rest of the line (excluding the tab character) is the *value*.
However, this can be customized.

When an executable is specified for reducers, each reducer task will launch
the executable as a separate process when the reducer is initialized. As
the reducer task runs, it converts its input key/value pairs into lines
and feeds the lines to the stdin of the process. In the meantime, the
reducer collects the line oriented outputs from the stdout of the process,
converts each line into a key/value pair, which is collected as the output
of the reducer. By default, the prefix of a line up to the first tab
character is the key and the rest of the line (excluding the tab
character) is the value. However, this can be customized.
So a tab (0x09) was appended to the end of each line of the output, just before the 0x0a.
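
To make that concrete, a rough sketch (not the actual Hadoop source) of what
the text record writer effectively does per record; with -mapper /bin/cat or a
reducer like /bin/sort, no line contains a tab, so streaming treats the whole
line as the key with an empty value, and the separator byte is still emitted:

import java.io.DataOutputStream;
import java.io.IOException;

import org.apache.hadoop.io.Text;

// Simplified illustration of how a key/value pair becomes bytes in the output.
class LineRecordWriterSketch {
    void writeRecord(DataOutputStream out, Text key, Text value) throws IOException {
        out.write(key.getBytes(), 0, key.getLength());
        out.writeByte('\t');   // 0x09 -- the one extra byte seen per line
        out.write(value.getBytes(), 0, value.getLength()); // empty here
        out.writeByte('\n');   // 0x0a
    }
}

The written separator defaults to a tab; TextOutputFormat reads it from the
mapred.textoutputformat.separator property.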

Andy
2013/2/1 周梦想 

> hello,
> I process a file using hadoop streaming, but I found streaming will add
> byte 0x09 before 0x0a, so the file is changed after the streaming process.
> Can someone tell me why this byte is added to the output?
>
> [zhouhh@Hadoop48 ~]$ ls -l README.txt
> -rw-r--r-- 1 zhouhh zhouhh 1399 Feb  1 10:53 README.txt
>
> [zhouhh@Hadoop48 ~]$ wc README.txt
>   34  182 1399 README.txt
> [zhouhh@Hadoop48 ~]$ hadoop fs -ls
> Found 3 items
> -rw-r--r--   2 zhouhh supergroup   9358 2013-01-10 17:52
> /user/zhouhh/fsimage
> drwxr-xr-x   - zhouhh supergroup  0 2013-02-01 10:30
> /user/zhouhh/gz
> -rw-r--r--   2 zhouhh supergroup 65 2012-09-26 14:10
> /user/zhouhh/test中文.txt
> [zhouhh@Hadoop48 ~]$ hadoop fs -put README.txt .
> [zhouhh@Hadoop48 ~]$ hadoop fs -ls
> Found 4 items
> -rw-r--r--   2 zhouhh supergroup   1399 2013-02-01 10:56
> /user/zhouhh/README.txt
> -rw-r--r--   2 zhouhh supergroup   9358 2013-01-10 17:52
> /user/zhouhh/fsimage
> drwxr-xr-x   - zhouhh supergroup  0 2013-02-01 10:30
> /user/zhouhh/gz
> -rw-r--r--   2 zhouhh supergroup 65 2012-09-26 14:10
> /user/zhouhh/test中文.txt
>
>
> [zhouhh@Hadoop48 ~]$ hadoop fs -ls README.txt
> Found 1 items
> -rw-r--r--   2 zhouhh supergroup   1399 2013-02-01 10:56
> /user/zhouhh/README.txt
>
> [zhouhh@Hadoop48 ~]$ hadoop jar
>  $HADOOP_INSTALL/contrib/streaming/hadoop-streaming-*.jar -input README.txt
> -output wordcount1 -mapper /bin/cat -reducer /bin/sort
>
>
> [zhouhh@Hadoop48 ~]$ hadoop fs -ls wordcount/part*
> Found 1 items
> -rw-r--r--   2 zhouhh supergroup   1433 2013-02-01 11:20
> /user/zhouhh/wordcount/part-0
>
>
> [zhouhh@Hadoop48 ~]$ hadoop jar
>  $HADOOP_INSTALL/contrib/streaming/hadoop-streaming-*.jar -input README.txt
> -output wordcount1 -mapper /bin/cat -reducer /usr/bin/wc
>
>  [zhouhh@Hadoop48 ~]$ hadoop fs -cat wordcount1/p*
>  34 182 1433
>
> part of the two files in hex code:
>
> sort README.txt:
>   000: 0a0a 0a0a 0a0a 0a61 6c67 6f72 6974 686d  ...algorithm
>   010: 732e 2020 5468 6520 666f 726d 2061 6e64  s.  The form and
>   020: 206d 616e 6e65 7220 6f66 2074 6869 7320   manner of this
>   030: 4170 6163 6865 2053 6f66 7477 6172 6520  Apache Software
>   040: 466f 756e 6461 7469 6f6e 0a61 6e64 206f  Foundation.and o
>
> streaming README.txt and reduce sort:
>   000: 090a 090a 090a 090a 090a 090a 090a 616c  ..al
>   010: 676f 7269 7468 6d73 2e20 2054 6865 2066  gorithms.  The f
>   020: 6f72 6d20 616e 6420 6d61 6e6e 6572 206f  orm and manner o
>   030: 6620 7468 6973 2041 7061 6368 6520 536f  f this Apache So
>   040: 6674 7761 7265 2046 6f75 6e64 6174 696f  ftware Foundatio
>
> Because there are 34 lines, the file size grows by 34 bytes of 0x09:
> 1399+34=1433. Why?
>
> Best regards,
> Andy
>


Re: Issue with running hadoop program using eclipse

2013-01-31 Thread Hemanth Yamijala
Previously, I have resolved this error by building a jar and then using the
API job.setJarByClass(.class). Can you please try that once?
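
If the job is submitted straight from Eclipse (classes on disk, no jar on the
classpath), setJarByClass may find nothing to point at; a hedged sketch of the
explicit alternative the warning message mentions - the jar path here is a
placeholder you would build and fill in yourself:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class RemoteSubmitSketch {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(RemoteSubmitSketch.class);
        // Export the project as a jar first, then name it explicitly so the
        // TaskTrackers can load the mapper/reducer classes; this is the
        // JobConf#setJar(String) that the "No job jar file set" warning cites.
        conf.setJar("C:/work/wordcount-job.jar");   // placeholder path
        conf.setJobName("wordcount-from-eclipse");
        // conf.setMapperClass(...) / conf.setReducerClass(...) as usual.
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}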


On Thu, Jan 31, 2013 at 6:40 PM, Vikas Jadhav wrote:

> Hi, I know it is a class-not-found error,
> but I have the Map and Reduce classes as part of the Driver class.
> So what is the problem?
>
> I want to ask whether it is compulsory to have a jar for any Hadoop program,
> because it says "No job jar file set. See JobConf(Class) or
> JobConf#setJar(String)."
>
> I am running my program using Eclipse (Windows machine) and Hadoop (Linux
> machine) remotely.
>
>
>
> On Thu, Jan 31, 2013 at 12:40 PM, Mohammad Tariq wrote:
>
>> Hello Vikas,
>>
>>  It clearly shows that the class cannot be found. For
>> debugging, you can write your MR job as a standalone java program and debug
>> it. It works. And if you want to just debug your mapper / reducer logic,
>> you should look into using MRUnit. There is a good write-up at
>> Cloudera's blog which talks about it in detail.
>>
>> HTH
>>
>> Warm Regards,
>> Tariq
>> https://mtariq.jux.com/
>> cloudfront.blogspot.com
>>
>>
>> On Thu, Jan 31, 2013 at 11:56 AM, Vikas Jadhav 
>> wrote:
>>
>>> Hi,
>>> I have one Windows machine and one Linux machine.
>>> My Eclipse is on the Windows machine
>>> and Hadoop is running on a single Linux machine.
>>> I am trying to run a wordcount program from Eclipse (on the Windows machine) to
>>> Hadoop (on the Linux machine)
>>> and I am getting the following error:
>>>
>>>  13/01/31 11:48:14 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 13/01/31 11:48:15 WARN mapred.JobClient: No job jar file set.  User
>>> classes may not be found. See JobConf(Class) or JobConf#setJar(String).
>>> 13/01/31 11:48:16 INFO input.FileInputFormat: Total input paths to
>>> process : 1
>>> 13/01/31 11:48:16 WARN util.NativeCodeLoader: Unable to load
>>> native-hadoop library for your platform... using builtin-java classes where
>>> applicable
>>> 13/01/31 11:48:16 WARN snappy.LoadSnappy: Snappy native library not
>>> loaded
>>> 13/01/31 11:48:26 INFO mapred.JobClient: Running job:
>>> job_201301300613_0029
>>> 13/01/31 11:48:27 INFO mapred.JobClient:  map 0% reduce 0%
>>> 13/01/31 11:48:40 INFO mapred.JobClient: Task Id :
>>> attempt_201301300613_0029_m_00_0, Status : FAILED
>>> java.lang.RuntimeException: java.lang.ClassNotFoundException:
>>> WordCount$MapClass
>>>  at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:867)
>>>  at
>>> org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:199)
>>>  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:719)
>>>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>>>  at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>  at javax.security.auth.Subject.doAs(Subject.java:396)
>>>  at
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>>>  at org.apache.hadoop.mapred.Child.main(Child.java:249)
>>> Caused by: java.lang.ClassNotFoundException: WordCount$MapClass
>>>  at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>  at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>>  at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>>  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>>  at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>>>  at java.lang.Class.forName0(Native Method)
>>>  at java.lang.Class.forName(Class.java:247)
>>>  at
>>> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820)
>>>  at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:865)
>>>  ... 8 more
>>>
>>>
>>>
>>>
>>> Also, I want to know how to debug a Hadoop program using Eclipse.
>>>
>>>
>>>
>>>
>>> Thank You.
>>>
>>>
>>
>>
>
>
> --
> Thanx and Regards
> Vikas Jadhav
>


the different size of file through hadoop streaming

2013-01-31 Thread 周梦想
Hello,
I process a file using hadoop streaming, but I found streaming will add
byte 0x09 before 0x0a, so the file is changed after the streaming process.
Can someone tell me why this byte is added to the output?

[zhouhh@Hadoop48 ~]$ ls -l README.txt
-rw-r--r-- 1 zhouhh zhouhh 1399 Feb  1 10:53 README.txt

[zhouhh@Hadoop48 ~]$ wc README.txt
  34  182 1399 README.txt
[zhouhh@Hadoop48 ~]$ hadoop fs -ls
Found 3 items
-rw-r--r--   2 zhouhh supergroup   9358 2013-01-10 17:52
/user/zhouhh/fsimage
drwxr-xr-x   - zhouhh supergroup  0 2013-02-01 10:30 /user/zhouhh/gz
-rw-r--r--   2 zhouhh supergroup 65 2012-09-26 14:10
/user/zhouhh/test中文.txt
[zhouhh@Hadoop48 ~]$ hadoop fs -put README.txt .
[zhouhh@Hadoop48 ~]$ hadoop fs -ls
Found 4 items
-rw-r--r--   2 zhouhh supergroup   1399 2013-02-01 10:56
/user/zhouhh/README.txt
-rw-r--r--   2 zhouhh supergroup   9358 2013-01-10 17:52
/user/zhouhh/fsimage
drwxr-xr-x   - zhouhh supergroup  0 2013-02-01 10:30 /user/zhouhh/gz
-rw-r--r--   2 zhouhh supergroup 65 2012-09-26 14:10
/user/zhouhh/test中文.txt


[zhouhh@Hadoop48 ~]$ hadoop fs -ls README.txt
Found 1 items
-rw-r--r--   2 zhouhh supergroup   1399 2013-02-01 10:56
/user/zhouhh/README.txt

[zhouhh@Hadoop48 ~]$ hadoop jar
 $HADOOP_INSTALL/contrib/streaming/hadoop-streaming-*.jar -input README.txt
-output wordcount1 -mapper /bin/cat -reducer /bin/sort


[zhouhh@Hadoop48 ~]$ hadoop fs -ls wordcount/part*
Found 1 items
-rw-r--r--   2 zhouhh supergroup   1433 2013-02-01 11:20
/user/zhouhh/wordcount/part-0


[zhouhh@Hadoop48 ~]$ hadoop jar
 $HADOOP_INSTALL/contrib/streaming/hadoop-streaming-*.jar -input README.txt
-output wordcount1 -mapper /bin/cat -reducer /usr/bin/wc

[zhouhh@Hadoop48 ~]$ hadoop fs -cat wordcount1/p*
 34 182 1433

part of the two files in hex code:

sort README.txt:
  000: 0a0a 0a0a 0a0a 0a61 6c67 6f72 6974 686d  ...algorithm
  010: 732e 2020 5468 6520 666f 726d 2061 6e64  s.  The form and
  020: 206d 616e 6e65 7220 6f66 2074 6869 7320   manner of this
  030: 4170 6163 6865 2053 6f66 7477 6172 6520  Apache Software
  040: 466f 756e 6461 7469 6f6e 0a61 6e64 206f  Foundation.and o

streaming README.txt and reduce sort:
  000: 090a 090a 090a 090a 090a 090a 090a 616c  ..al
  010: 676f 7269 7468 6d73 2e20 2054 6865 2066  gorithms.  The f
  020: 6f72 6d20 616e 6420 6d61 6e6e 6572 206f  orm and manner o
  030: 6620 7468 6973 2041 7061 6368 6520 536f  f this Apache So
  040: 6674 7761 7265 2046 6f75 6e64 6174 696f  ftware Foundatio

Because there are 34 lines, the file size grows by 34 bytes of 0x09:
1399+34=1433. Why?

Best regards,
Andy


What happens when a block is being invalidated/deleted on the DataNode when it is being read?

2013-01-31 Thread Xiao Yu
Hi, all

I am wondering if this situation could happen: a block is being
invalidated/deleted on a DataNode (by the NameNode, for example, to delete an
over-replicated block) while it is also concurrently being read (by some
clients). If this can happen, how does HDFS handle this issue?

Thank you very much.

Xiao Yu


o.a.h.m.MapTask: data buffer - Takes long time to come out

2013-01-31 Thread Rajesh Balamohan
Hi Experts,

At times, we have seen map tasks taking a long time, and looking
at the task logs, they are spending a lot of time at "o.a.h.m.MapTask: data
buffer = xxx / yyy".

After some time it prints the next line "MapTask: record buffer = xxx /
yyy". I initially thought it was allocating the data buffer in the JVM and causing
memory pressure, but that doesn't seem to be the case, and no swapping is
happening on the system either.
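
For anyone tuning this: those two log lines report the sizes of the map-side
sort buffer, which MR1 derives from the settings below; a minimal sketch of the
relevant knobs (property names as in stock MR1 - whether they shorten the pause
here is an open question):

import org.apache.hadoop.mapred.JobConf;

public class SortBufferTuningSketch {
    public static JobConf withSortSettings(JobConf conf) {
        // Total map-side sort buffer in MB; "data buffer = x / y" reports the
        // serialized-bytes portion of this allocation.
        conf.setInt("io.sort.mb", 100);
        // Fraction of io.sort.mb reserved for record accounting; "record
        // buffer = x / y" reports the resulting record capacity.
        conf.setFloat("io.sort.record.percent", 0.05f);
        // Spill threshold as a fraction of the buffer.
        conf.setFloat("io.sort.spill.percent", 0.80f);
        return conf;
    }
}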

Has anyone else faced similar issue with MR-1.0? Any pointers would be
appreciated.

-- 
~Rajesh.B


Re: Dell Hardware

2013-01-31 Thread Ted Dunning
We have tested both machines in our labs at MapR and both work well.  Both
run pretty hot so you need to keep a good eye on that.

The R720 will have higher wattage per unit of storage due to the smaller
number of drives per chassis.  That may be a good match for ordinary Hadoop
due to the lower I/O efficiency, but with higher performance distributions
like MapR, you will probably do better on combined acquisition, ops and
power costs per terabyte with the C2100 series.

As you do your selection, you need to determine what your primary
constraint will be.  Will it be storage size?  I/O rates?  Compute load?

Once you have that figured out, the hardware selection should fall out
directly from a TCO analysis that looks at the different scenarios.

On Thu, Jan 31, 2013 at 11:13 AM, Andy Isaacson  wrote:

> On Thu, Jan 31, 2013 at 8:53 AM, Artem Ervits  wrote:
> > Does anyone run Hadoop on Dell R720 model of servers? Dell site lists
> C2100
> > model of servers as best fit for Hadoop workloads. What does community
> > recommend?
>
> The R720 supports up to 2 xeon CPUs and 8 drives in 2U. If configured
> appropriately it will work fine as a Hadoop node. For example, 2x
> Xeon, 64GB RAM, 8x 2TB SATA configured as JBOD, would be a reasonable
> worker node. Buy 20 of those and now you're cooking with gas. :)
>
> The C2100 is probably somewhat cheaper for a given storage amount
> since it fits 12 disks in a 2U chassis (so you don't need to buy as
> many machines to achieve a given amount of capacity). Dell doesn't
> provide a configurator for that model so I can't compare prices.
> Apparently it's using the previous generation of Xeon -- 5500/5600
> versus e5 on the R720 -- which probably means poorer throughput per
> watt.
>
> The right answer will depend on your application and workload.
>
> -andy
>


Re: Dell Hardware

2013-01-31 Thread Andy Isaacson
On Thu, Jan 31, 2013 at 8:53 AM, Artem Ervits  wrote:
> Does anyone run Hadoop on Dell R720 model of servers? Dell site lists C2100
> model of servers as best fit for Hadoop workloads. What does community
> recommend?

The R720 supports up to 2 xeon CPUs and 8 drives in 2U. If configured
appropriately it will work fine as a Hadoop node. For example, 2x
Xeon, 64GB RAM, 8x 2TB SATA configured as JBOD, would be a reasonable
worker node. Buy 20 of those and now you're cooking with gas. :)

The C2100 is probably somewhat cheaper for a given storage amount
since it fits 12 disks in a 2U chassis (so you don't need to buy as
many machines to achieve a given amount of capacity). Dell doesn't
provide a configurator for that model so I can't compare prices.
Apparently it's using the previous generation of Xeon -- 5500/5600
versus e5 on the R720 -- which probably means poorer throughput per
watt.

The right answer will depend on your application and workload.

-andy


Re: hadoop 1.0.3 equivalent of MultipleTextOutputFormat

2013-01-31 Thread Alejandro Abdelnur
Hi Tony, from what I understand your problem is not with MTOF but with
wanting to run 2 jobs using the same output directory; the second job will
fail because the output dir already exists. My take would be tweaking your
jobs to use a temp output dir, and moving the output to the required (final)
location upon completion.
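
A minimal sketch of that suggestion - the bucket and path names are placeholders
taken from the thread, and rename on S3 is a copy under the hood, so it is worth
testing against your filesystem:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TempThenMoveSketch {
    public static void runAndPublish(Job job, Configuration conf, String runDate)
            throws Exception {
        // Write to a path that never pre-exists, so checkOutputSpecs() passes.
        Path tmpOut = new Path("s3://outputbucket/tmp/" + job.getJobName() + "-" + runDate);
        Path finalOut = new Path("s3://outputbucket/path/" + runDate);
        FileOutputFormat.setOutputPath(job, tmpOut);

        if (job.waitForCompletion(true)) {
            FileSystem fs = tmpOut.getFileSystem(conf);
            fs.mkdirs(finalOut.getParent());
            // Publish by moving the whole temp dir into its final place.
            fs.rename(tmpOut, finalOut);
        }
    }
}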

thx



On Thu, Jan 31, 2013 at 8:22 AM, Tony Burton wrote:

> Hi everyone,
>
> Some of you might recall this topic, which I worked on with the list's
> help back in August last year - see email trail below. Despite initial
> success of the discovery, I had to shelve the approach as I ended up using
> a different solution (for reasons I forget!) with the implementation that
> was ultimately used for that particular project.
>
> I'm now in a position to be working on a similar new task, where I've
> successfully implemented the combination of LazyOutputFormat and
> MultipleOutputs using hadoop 1.0.3 to write out to multiple custom output
> locations. However, I've hit another snag which I'm hoping you might help
> me work through.
>
> I'm going to be running daily tasks to extract data from XML files
> (specifically, the data stored in certain nodes of the XML), stored on AWS
> S3 using object names with the following format:
>
> s3://inputbucket/data/2013/1/13/
>
> I want to extract items from the XML and write out as follows:
>
> s3://outputbucket/path//20130113/
>
> For one day of data, this works fine. I pass in s3://inputbucket/data and
> s3://outputbucket/path as input and output arguments, along with my run
> date (20130113) which gets manipulated and appended where appropriate to
> form the precise read and write locations, for example
>
> FileInputFormat.setInputPaths(job, "s3://inputbucket/data");
> FileOutputFormat.setOutputPath(job, "s3://outputbucket/path");
>
> Then MultipleOutputs adds on my XML node names underneath
> s3://outputbucket/path automatically.
>
> However, for the next day's run, the job gets to
> FileOutputFormat.setOutputPath and sees that the output path
> (s3://outputbucket/path) already exists, and throws a
> FileAlreadyExistsException from FileOutputFormat.checkOutputSpecs() - even
> though my ultimate subdirectory, to be constructed by MultipleOutputs does
> not already exist.
>
> Is there any way around this? I'm given hope by this, from
> http://hadoop.apache.org/docs/r1.0.3/api/org/apache/hadoop/fs/FileAlreadyExistsException.html:
> "public class FileAlreadyExistsException extends IOException - Used when
> target file already exists for any operation *and is not configured to be
> overwritten*" (my emphasis). Is it possible to deconfigure the overwrite
> protection?
>
> If not, I suppose one other way ahead is to create my own FileOutputFormat
> where the checkOutputSpecs() is a bit less fussy; another might be to write
> to a "temp" directory and programmatically move it to the desired output
> when the job completes successfully, although this is getting to feel a bit
> "hacky" to me.
>
> Thanks for any feedback!
>
> Tony
>
>
>
>
>
>
>
> 
> From: Harsh J [ha...@cloudera.com]
> Sent: 31 August 2012 10:47
> To: user@hadoop.apache.org
> Subject: Re: hadoop 1.0.3 equivalent of MultipleTextOutputFormat
>
> Good finding, that OF slipped my mind. We can mention on the
> MultipleOutputs javadocs for the new API to use the LazyOutputFormat for
> the job-level config. Please file a JIRA for this under MAPREDUCE project
> on the Apache JIRA?
>
> On Fri, Aug 31, 2012 at 2:32 PM, Tony Burton 
> wrote:
> > Hi Harsh,
> >
> > I tried using NullOutputFormat as you suggested, however simply using
> >
> > job.setOutputFormatClass(NullOutputFormat.class);
> >
> > resulted in no output at all. Although I've not tried overriding
> getOutputCommitter in NullOutputFormat as you suggested, I discovered
> LazyOutputFormat which only writes when it has to, "the output file is
> created only when the first record is emitted for a given partition" (from
> "Hadoop: The Definitive Guide").
> >
> > Instead of
> >
> > job.setOutputFormatClass(TextOutputFormat.class);
> >
> > use LazyOutputFormat like this:
> >
> > LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
> >
> > So now my unnamed MultipleOutputs are handling to segmented results, and
> LazyOutputFormat is suppressing the default output. Good job!
> >
> > Tony
> >
> >
> >
> >
> >
> > 
> > From: Harsh J [ha...@cloudera.com]
> > Sent: 29 August 2012 17:05
> > To: user@hadoop.apache.org
> > Subject: Re: hadoop 1.0.3 equivalent of MultipleTextOutputFormat
> >
> > Hi Tony,
> >
> > On Wed, Aug 29, 2012 at 9:30 PM, Tony Burton 
> wrote:
> >> Success so far!
> >>
> >> I followed the example given by Tom on the link to the
> MultipleOutputs.html API you suggested.
> >>
> >> I implemented a WordCount MR job using hadoop 1.0.3 and segmented the
> output depending on word length: output to directory "sml" for less than 10
> characters, "med" 

RE: Oozie workflow error - renewing token issue

2013-01-31 Thread Corbett Martin
Thanks Daryn for the reply.

Here's some answers that Alejandro asked for:


1)   Versions reported by Cloudera Manager:
Flume NG                  CDH4 1.2.0+121
Hadoop                    CDH4 2.0.0+545
HBase                     CDH4 0.92.1+156
HDFS (CDH4 only)          CDH4 2.0.0+545
Hive                      CDH4 0.9.0+150
HttpFS (CDH4 only)        CDH4 2.0.0+545
Hue                       CDH4 2.1.0+210
Hue Plugins               CDH4 2.1.0+210
MapReduce 1 (CDH4 only)   CDH4 0.20.2+1261
MapReduce 2 (CDH4 only)   CDH4 2.0.0+545
Oozie                     CDH4 3.2.0+123
Yarn (CDH4 only)          CDH4 2.0.0+545
Zookeeper                 CDH4 3.4.3+27


2)   We didn't specifically enable Kerberos, so my guess is it's off.

3)   Bummer here - seems the logs have been cleaned up.  I can repost some new 
ones when we try to run again.

Yes we're running the job as hdfs.

Let me know if you have some other advice.  Thanks for your feedback.

~Corbett Martin
Software Architect
AbsoluteAR Accounts Receivable Services - An NHIN Solution

From: Daryn Sharp [mailto:da...@yahoo-inc.com]
Sent: Wednesday, January 30, 2013 3:07 PM
To: user@hadoop.apache.org
Cc: u...@oozie.apache.org
Subject: Re: Oozie workflow error - renewing token issue

The token renewer needs to be the job tracker principal.  I think oozie had "mr 
token" hardcoded at one point, but later changed it to use a conf setting.

The rest of the log looks very odd - i.e. it looks like security is off, but it
can't be.  It's trying to renew hdfs tokens issued for the "hdfs" superuser.  
Are you running your job as hdfs?  With security enabled, the container 
executor won't run jobs for hdfs...  It's also trying to renew as "mapred 
(auth:SIMPLE)".  SIMPLE means that security is disabled, which doesn't make 
sense since tokens require kerberos.

The "cannot find renewer class for MAPREDUCE_DELEGATION_TOKEN" message is also odd
because it did subsequently try to renew the token.

Daryn



On Jan 30, 2013, at 2:14 PM, Alejandro Abdelnur wrote:


Corbett,

[Moving thread to user@oozie.a.o, BCCing 
common-user@hadoop.a.o]

* What version of Oozie are you using?
* Is the cluster a secure setup (Kerberos enabled)?
* Would you mind posting the complete launcher logs?

Thx


On Wed, Jan 30, 2013 at 6:14 AM, Corbett Martin wrote:

Thanks for the tip.

The sqoop command listed in the stdout log file is:
sqoop
import
--driver
org.apache.derby.jdbc.ClientDriver
--connect
jdbc:derby://test-server:1527/mondb
--username
monuser
--password
x
--table
MONITOR
--split-by
request_id
--target-dir
/mon/import
--append
--incremental
append
--check-column
request_id
--last-value
200

The following information is from the sterr and stdout log files.

/var/log/hadoop-0.20-mapreduce/userlogs/job_201301231648_0029/attempt_201301231648_0029_m_00_0
 # more stderr
No such sqoop tool: sqoop. See 'sqoop help'.
Intercepting System.exit(1)
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], 
exit code [1]


/var/log/hadoop-0.20-mapreduce/userlogs/job_201301231648_0029/attempt_201301231648_0029_m_00_0
 # more stdout
...
...
...
>>> Invoking Sqoop command line now >>>

1598 [main] WARN  org.apache.sqoop.tool.SqoopTool  - $SQOOP_CONF_DIR has not 
been set in the environment. Cannot check for additional configuration.
Intercepting System.exit(1)

<<< Invocation of Main class completed <<<

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], 
exit code [1]

Oozie Launcher failed, finishing Hadoop job gracefully

Oozie Launcher ends


~Corbett Martin
Software Architect
AbsoluteAR Accounts Receivable Services - An NHIN Solution
-Original Message-
From: Harsh J [mailto:ha...@cloudera.com]
Sent: Tuesday, January 29, 2013 10:11 PM
To: user@hadoop.apache.org
Subject: Re: Oozie workflow error - renewing token issue

The job that Oozie launches for your action, which you are observing is 
failing, does its own logs (task logs) show any errors?

On Wed, Jan 30, 2013 at 4:59 AM, Corbett Martin wrote:
> Oozie question
>
>
>
> I'm trying to run an Oozie workflow (sqoop action) from the Hue
> console and it fails every time.  No exception in the oozie log but I
> see this in the Job Tracker log file.
>
>
>
> Two primary issues seem to be
>
> 1.   Client mapred tries to renew a token with renewer specified as mr token
>
>
>
> And
>
>
>
> 2.   Cannot find class for token kind MAPREDUCE_DELEGATION_TOKEN
>
>
>
> Any ideas how to get past this?
>
>
>
> Full Stacktrace:

Re: Issue with Reduce Side join using datajoin package

2013-01-31 Thread Vikas Jadhav
***** source *****



import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.contrib.utils.join.DataJoinMapperBase;
import org.apache.hadoop.contrib.utils.join.DataJoinReducerBase;
import org.apache.hadoop.contrib.utils.join.TaggedMapOutput;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.util.ReflectionUtils;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyJoin extends Configured implements Tool {



public static class MapClass extends DataJoinMapperBase {

protected Text generateInputTag(String inputFile) {
System.out.println("Starting generateInputTag() : "+inputFile);
String datasource = inputFile.split("-")[0];
return new Text(datasource);
}

protected Text generateGroupKey(TaggedMapOutput aRecord) {
System.out.println(" Statring generateGroupKey() : "+aRecord);
String line = ((Text) aRecord.getData()).toString();
String[] tokens = line.split(",");
String groupKey = tokens[0];
return new Text(groupKey);
}

protected TaggedMapOutput generateTaggedMapOutput(Object value) {
System.out.println("starting  generateTaggedMapOutput() value
: "+value);
TaggedWritable retv = new TaggedWritable((Text) value);
retv.setTag(this.inputTag);
return retv;
}
}

public static class Reduce extends DataJoinReducerBase {

protected TaggedMapOutput combine(Object[] tags, Object[] values) {
 System.out.println("combine :");
if (tags.length < 2) return null;
String joinedStr = "";
            for (int i = 0; i < values.length; i++) {
                if (i > 0) joinedStr += ",";
TaggedWritable tw = (TaggedWritable) values[i];
String line = ((Text) tw.getData()).toString();
String[] tokens = line.split(",", 2);
joinedStr += tokens[1];
}
TaggedWritable retv = new TaggedWritable(new Text(joinedStr));
retv.setTag((Text) tags[0]);
return retv;
}
}

public static class TaggedWritable extends TaggedMapOutput {

private Writable data;


public TaggedWritable()
{
this.tag = new Text();

}//end empty( taking no parameters) constructor TaggedWritable

public TaggedWritable(Writable data) {

this.tag = new Text("");
this.data = data;
}

public Writable getData() {
return data;
}

public void write(DataOutput out) throws IOException {
//System.out.println(");
this.tag.write(out);
this.data.write(out);
System.out.println("Tag  :"+tag+" Data  :"+ data);
}

/*
public void readFields(DataInput in) throws IOException {
System.out.println(" Starting short readFields(): "+ in);
this.tag.readFields(in);
this.data.readFields(in);
} */


public void readFields(DataInput in) throws IOException {
System.out.println(" Starting short readFields(): "+ in);
this.tag.readFields(in);
String w = in.toString();
if(this.data == null)
try {
this.data =(Writable)
ReflectionUtils.newInstance(Class.forName(w), null);
} catch (ClassNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
this.data.readFields(in);
}


}
public int run(String[] args) throws Exception {
System.out.println("Starting run() Method:");
Configuration conf = getConf();
conf.addResource(new
Path("/home/vikas/project/hadoop-1.0.3/conf/core-site.xml"));
conf.addResource(new
Path("/home/vikas/project/hadoop-1.0.3/conf/mapred-site.xml"));
conf.addResource(new
Path("/home/vikas/project/hadoop-1.0.3/conf/hdfs-site.xml"));
JobConf job = new JobConf(conf, MyJoin.class);
Path in = new Path(args[0]);
Path out = new Path(args[1]);

FileInputFormat.setInputPaths(job, in);

FileOutputFormat.setOutputPath(job, out);

job.setJobName("DataJoin_cust X order");

job.setMapperClass(MapClass.class);
job.setReducerClass(Reduce.class);

job.setInputFormat(TextInputFormat.class);
job.setOutputFormat(TextOutputFormat.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(TaggedWritable.class);
job.set("mapred.textoutputformat.separator", ",");
JobClient.runJob(job);
return 0;
}

public static void main(String[] args) throws Exception {
System.out.println("Starting main() function:");
int res = ToolRunner.run(new Configuration(),
 new MyJoin(),
 args);

System.exit(res);
}
}



***** error *****


13/01/31 23:04:26 INFO util.NativeCodeLoader: Loaded the native-hadoop
library
13/01/31 23:04:26 WARN snappy.LoadSnappy: Snappy native library not loaded
13/01/31 23:04:26 INFO mapred.FileInputFormat: Total input paths to proces

Dell Hardware

2013-01-31 Thread Artem Ervits
Hello all,

Does anyone run Hadoop on Dell R720 model of servers? Dell site lists C2100 
model of servers as best fit for Hadoop workloads. What does community 
recommend?

Artem Ervits
New York Presbyterian Hospital





This electronic message is intended to be for the use only of the named 
recipient, and may contain information that is confidential or privileged.  If 
you are not the intended recipient, you are hereby notified that any 
disclosure, copying, distribution or use of the contents of this message is 
strictly prohibited.  If you have received this message in error or are not the 
named recipient, please notify us immediately by contacting the sender at the 
electronic mail address noted above, and delete and destroy all copies of this 
message.  Thank you.






All MAP Jobs(Java Custom Map Reduce Program) are assigned to one Node why?

2013-01-31 Thread samir das mohapatra
Hi All,

  I am using CDH4 with MRv1. When I run any Hadoop MapReduce
program from Java, all the map tasks are assigned to one node. It is supposed
to distribute the map tasks among the cluster's nodes.

 Note: 1) My JobTracker web UI is showing 500 nodes
2) When it comes to the reducer, it is spawned on
another node (other than the map node)

Can anyone guide me on why it is like this?

Regards,
samir.


RE: hadoop 1.0.3 equivalent of MultipleTextOutputFormat

2013-01-31 Thread Tony Burton
Hi everyone,

Some of you might recall this topic, which I worked on with the list's help 
back in August last year - see email trail below. Despite initial success of 
the discovery, I had to shelve the approach as I ended up using a different
solution (for reasons I forget!) with the implementation that was ultimately 
used for that particular project.

I'm now in a position to be working on a similar new task, where I've 
successfully implemented the combination of LazyOutputFormat and 
MultipleOutputs using hadoop 1.0.3 to write out to multiple custom output 
locations. However, I've hit another snag which I'm hoping you might help me 
work through.

I'm going to be running daily tasks to extract data from XML files 
(specifically, the data stored in certain nodes of the XML), stored on AWS S3 
using object names with the following format:

s3://inputbucket/data/2013/1/13/

I want to extract items from the XML and write out as follows:

s3://outputbucket/path//20130113/

For one day of data, this works fine. I pass in s3://inputbucket/data and 
s3://outputbucket/path as input and output arguments, along with my run date 
(20130113) which gets manipulated and appended where appropriate to form the 
precise read and write locations, for example

FileInputFormat.setInputPaths(job, "s3://inputbucket/data");
FileOutputFormat.setOutputPath(job, "s3://outputbucket/path");

Then MultipleOutputs adds on my XML node names underneath 
s3://outputbucket/path automatically.
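
For anyone reading this in the archive, a hedged sketch of that wiring as
described in the quoted exchange further down this message - the reducer types,
the "20130113" run date and the per-key directory layout are placeholders, not
the actual project code:

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class MultiOutputSketch {

    // Driver side: suppress the default part-* files and let MultipleOutputs
    // create the per-node subdirectories under the job output path.
    public static void configure(Job job) {
        LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
    }

    public static class NodeReducer extends Reducer<Text, Text, Text, Text> {
        private MultipleOutputs<Text, Text> mos;

        protected void setup(Context context) {
            mos = new MultipleOutputs<Text, Text>(context);
        }

        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            for (Text value : values) {
                // baseOutputPath is relative to the job output dir, so this lands
                // under s3://outputbucket/path/<key>/20130113/part-r-*
                mos.write(key, value, key.toString() + "/20130113/part");
            }
        }

        protected void cleanup(Context context)
                throws IOException, InterruptedException {
            mos.close();
        }
    }
}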

However, for the next day's run, the job gets to FileOutputFormat.setOutputPath 
and sees that the output path (s3://outputbucket/path) already exists, and 
throws a FileAlreadyExistsException from FileOutputFormat.checkOutputSpecs() - 
even though my ultimate subdirectory, to be constructed by MultipleOutputs does 
not already exist.

Is there any way around this? I'm given hope by this, from 
http://hadoop.apache.org/docs/r1.0.3/api/org/apache/hadoop/fs/FileAlreadyExistsException.html:
 "public class FileAlreadyExistsException extends IOException - Used when 
target file already exists for any operation *and is not configured to be 
overwritten*" (my emphasis). Is it possible to deconfigure the overwrite 
protection?

If not, I suppose one other way ahead is to create my own FileOutputFormat 
where the checkOutputSpecs() is a bit less fussy; another might be to write to 
a "temp" directory and programmatically move it to the desired output when the 
job completes successfully, although this is getting to feel a bit "hacky" to 
me.

Thanks for any feedback!

Tony








From: Harsh J [ha...@cloudera.com]
Sent: 31 August 2012 10:47
To: user@hadoop.apache.org
Subject: Re: hadoop 1.0.3 equivalent of MultipleTextOutputFormat

Good finding, that OF slipped my mind. We can mention on the MultipleOutputs 
javadocs for the new API to use the LazyOutputFormat for the job-level config. 
Please file a JIRA for this under MAPREDUCE project on the Apache JIRA?

On Fri, Aug 31, 2012 at 2:32 PM, Tony Burton  wrote:
> Hi Harsh,
>
> I tried using NullOutputFormat as you suggested, however simply using
>
> job.setOutputFormatClass(NullOutputFormat.class);
>
> resulted in no output at all. Although I've not tried overriding 
> getOutputCommitter in NullOutputFormat as you suggested, I discovered 
> LazyOutputFormat which only writes when it has to, "the output file is 
> created only when the first record is emitted for a given partition" (from 
> "Hadoop: The Definitive Guide").
>
> Instead of
>
> job.setOutputFormatClass(TextOutputFormat.class);
>
> use LazyOutputFormat like this:
>
> LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
>
> So now my unnamed MultipleOutputs are handling to segmented results, and 
> LazyOutputFormat is suppressing the default output. Good job!
>
> Tony
>
>
>
>
>
> 
> From: Harsh J [ha...@cloudera.com]
> Sent: 29 August 2012 17:05
> To: user@hadoop.apache.org
> Subject: Re: hadoop 1.0.3 equivalent of MultipleTextOutputFormat
>
> Hi Tony,
>
> On Wed, Aug 29, 2012 at 9:30 PM, Tony Burton  
> wrote:
>> Success so far!
>>
>> I followed the example given by Tom on the link to the MultipleOutputs.html 
>> API you suggested.
>>
>> I implemented a WordCount MR job using hadoop 1.0.3 and segmented the output 
>> depending on word length: output to directory "sml" for less than 10 
>> characters, "med" for between 10 and 20 characters, "lrg" otherwise.
>>
>> I used out.write(key, new IntWritable(sum), generateFilename(key,
>> sum)); to write the output, and generateFileName to create the custom
>> directory name/filename. You need to provide the start of the
>> filename as well otherwise your output files will be -r-0,
>> -r-1 etc. (so, for example, return "sml/part"; etc)
>
> Thanks for these notes, should come helpful for those who search!
>
>> Also required: as Tom states, override Reducer.setup() to create the 
>> MultipleOutputs

Re: How to find Blacklisted Nodes via cli.

2013-01-31 Thread Dhanasekaran Anbalagan
Hi Harsh,

Thanks, I understand the HDFS Dead Nodes concept.
In MapReduce, how do I clear blacklisted nodes via the command line? We are using
CM, and nodes frequently get blacklisted; it's very hard to select the
nodes and restart the MapReduce services via CM.

Please guide me to a command line option to clear the blacklisted nodes and restart
the TaskTracker on a particular node.

-Dhanasekaran.

Did I learn something today? If not, I wasted it.


On Thu, Jan 31, 2013 at 9:02 AM, Harsh J  wrote:

> There isn't a concept of 'blacklist' in HDFS, it exists only in
> MapReduce (MR1) and YARN.
>
> Dead Nodes (and those that have been excluded) are reported via "hdfs
> dfsadmin -report" as indicated earlier.
>
> On Thu, Jan 31, 2013 at 7:28 PM, Dhanasekaran Anbalagan
>  wrote:
> > Hi Hemanth,
> >
> > Thanks for spending your valuable time for me.
> >
> > This mapred job -list-blacklisted-trackers works at the JobTracker level,
> > but I want the HDFS level and I didn't find any command there. [HDFS
> > blacklisted nodes] Please guide me, and also how to clear blacklisted
> > nodes via the command line.
> >
> > -Dhanasekaran.
> >
> > Did I learn something today? If not, I wasted it.
> >
> >
> > On Wed, Jan 30, 2013 at 11:25 PM, Hemanth Yamijala
> >  wrote:
> >>
> >> Hi,
> >>
> >> Part answer: you can get the blacklisted tasktrackers using the command
> >> line:
> >>
> >> mapred job -list-blacklisted-trackers.
> >>
> >> Also, I think that a blacklisted tasktracker becomes 'unblacklisted' if
> it
> >> works fine after some time. Though I am not very sure about this.
> >>
> >> Thanks
> >> hemanth
> >>
> >>
> >> On Wed, Jan 30, 2013 at 9:35 PM, Dhanasekaran Anbalagan
> >>  wrote:
> >>>
> >>> Hi Guys,
> >>>
> >>> How do I find blacklisted nodes via the command line? I want to see
> >>> JobTracker blacklisted nodes and HDFS blacklisted nodes.
> >>>
> >>> And also, how do I clear blacklisted nodes for a clean start? Is the only
> >>> option to restart the service, or is there some other way to clear the
> >>> blacklisted nodes?
> >>>
> >>> please guide me.
> >>>
> >>> -Dhanasekaran.
> >>>
> >>> Did I learn something today? If not, I wasted it.
> >>
> >>
> >
>
>
>
> --
> Harsh J
>


Re: How to find Blacklisted Nodes via cli.

2013-01-31 Thread Harsh J
There isn't a concept of 'blacklist' in HDFS, it exists only in
MapReduce (MR1) and YARN.

Dead Nodes (and those that have been excluded) are reported via "hdfs
dfsadmin -report" as indicated earlier.

On Thu, Jan 31, 2013 at 7:28 PM, Dhanasekaran Anbalagan
 wrote:
> Hi Hemanth,
>
> Thanks for spending your valuable time for me.
>
> This mapred job -list-blacklisted-trackers works at the JobTracker level, but I
> want the HDFS level and I didn't find any command there. [HDFS blacklisted nodes] Please
> guide me, and also how to clear blacklisted nodes via the command line.
>
> -Dhanasekaran.
>
> Did I learn something today? If not, I wasted it.
>
>
> On Wed, Jan 30, 2013 at 11:25 PM, Hemanth Yamijala
>  wrote:
>>
>> Hi,
>>
>> Part answer: you can get the blacklisted tasktrackers using the command
>> line:
>>
>> mapred job -list-blacklisted-trackers.
>>
>> Also, I think that a blacklisted tasktracker becomes 'unblacklisted' if it
>> works fine after some time. Though I am not very sure about this.
>>
>> Thanks
>> hemanth
>>
>>
>> On Wed, Jan 30, 2013 at 9:35 PM, Dhanasekaran Anbalagan
>>  wrote:
>>>
>>> Hi Guys,
>>>
>>> How do I find blacklisted nodes via the command line? I want to see
>>> JobTracker blacklisted nodes and HDFS blacklisted nodes.
>>>
>>> And also, how do I clear blacklisted nodes for a clean start? Is the only
>>> option to restart the service, or is there some other way to clear the
>>> blacklisted nodes?
>>>
>>> please guide me.
>>>
>>> -Dhanasekaran.
>>>
>>> Did I learn something today? If not, I wasted it.
>>
>>
>



-- 
Harsh J


Re: How to find Blacklisted Nodes via cli.

2013-01-31 Thread Dhanasekaran Anbalagan
Hi Hemanth,

Thanks for spending your valuable time for me.

This mapred job -list-blacklisted-trackers works at the JobTracker level, but I
want the HDFS level and I didn't find any command there. [HDFS blacklisted nodes] Please
guide me, and also how to clear blacklisted nodes via the command line.

-Dhanasekaran.

Did I learn something today? If not, I wasted it.


On Wed, Jan 30, 2013 at 11:25 PM, Hemanth Yamijala <
yhema...@thoughtworks.com> wrote:

> Hi,
>
> Part answer: you can get the blacklisted tasktrackers using the command
> line:
>
> mapred job -list-blacklisted-trackers.
>
> Also, I think that a blacklisted tasktracker becomes 'unblacklisted' if it
> works fine after some time. Though I am not very sure about this.
>
> Thanks
> hemanth
>
>
> On Wed, Jan 30, 2013 at 9:35 PM, Dhanasekaran Anbalagan <
> bugcy...@gmail.com> wrote:
>
>> Hi Guys,
>>
>> How do I find blacklisted nodes via the command line? I want to see
>> JobTracker blacklisted nodes and HDFS blacklisted nodes.
>>
>> And also, how do I clear blacklisted nodes for a clean start? Is the
>> only option to restart the service, or is there some other way to clear the
>> blacklisted nodes?
>>
>> please guide me.
>>
>> -Dhanasekaran.
>>
>> Did I learn something today? If not, I wasted it.
>>
>
>


Re: Issue with running hadoop program using eclipse

2013-01-31 Thread Vikas Jadhav
Hi, I know it is a class-not-found error,
but I have the Map and Reduce classes as part of the Driver class.
So what is the problem?

I want to ask whether it is compulsory to have a jar for any Hadoop program,
because it says "No job jar file set. See JobConf(Class) or
JobConf#setJar(String)."

I am running my program using Eclipse (Windows machine) and Hadoop (Linux
machine) remotely.



On Thu, Jan 31, 2013 at 12:40 PM, Mohammad Tariq  wrote:

> Hello Vikas,
>
>  It clearly shows that the class cannot be found. For
> debugging, you can write your MR job as a standalone java program and debug
> it. It works. And if you want to just debug your mapper / reducer logic,
> you should look into using MRUnit. There is a good write-up at
> Cloudera's blog which talks about it in detail.
>
> HTH
>
> Warm Regards,
> Tariq
> https://mtariq.jux.com/
> cloudfront.blogspot.com
>
>
> On Thu, Jan 31, 2013 at 11:56 AM, Vikas Jadhav 
> wrote:
>
>> Hi,
>> I have one Windows machine and one Linux machine.
>> My Eclipse is on the Windows machine
>> and Hadoop is running on a single Linux machine.
>> I am trying to run a wordcount program from Eclipse (on the Windows machine) to
>> Hadoop (on the Linux machine)
>> and I am getting the following error:
>>
>>  13/01/31 11:48:14 WARN mapred.JobClient: Use GenericOptionsParser for
>> parsing the arguments. Applications should implement Tool for the same.
>> 13/01/31 11:48:15 WARN mapred.JobClient: No job jar file set.  User
>> classes may not be found. See JobConf(Class) or JobConf#setJar(String).
>> 13/01/31 11:48:16 INFO input.FileInputFormat: Total input paths to
>> process : 1
>> 13/01/31 11:48:16 WARN util.NativeCodeLoader: Unable to load
>> native-hadoop library for your platform... using builtin-java classes where
>> applicable
>> 13/01/31 11:48:16 WARN snappy.LoadSnappy: Snappy native library not loaded
>> 13/01/31 11:48:26 INFO mapred.JobClient: Running job:
>> job_201301300613_0029
>> 13/01/31 11:48:27 INFO mapred.JobClient:  map 0% reduce 0%
>> 13/01/31 11:48:40 INFO mapred.JobClient: Task Id :
>> attempt_201301300613_0029_m_00_0, Status : FAILED
>> java.lang.RuntimeException: java.lang.ClassNotFoundException:
>> WordCount$MapClass
>>  at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:867)
>>  at
>> org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:199)
>>  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:719)
>>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>>  at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>>  at java.security.AccessController.doPrivileged(Native Method)
>>  at javax.security.auth.Subject.doAs(Subject.java:396)
>>  at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>>  at org.apache.hadoop.mapred.Child.main(Child.java:249)
>> Caused by: java.lang.ClassNotFoundException: WordCount$MapClass
>>  at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>  at java.security.AccessController.doPrivileged(Native Method)
>>  at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>  at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>  at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>>  at java.lang.Class.forName0(Native Method)
>>  at java.lang.Class.forName(Class.java:247)
>>  at
>> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820)
>>  at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:865)
>>  ... 8 more
>>
>>
>>
>>
>> Also, I want to know how to debug a Hadoop program using Eclipse.
>>
>>
>>
>>
>> Thank You.
>>
>>
>
>


-- 
Thanx and Regards
Vikas Jadhav
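
A minimal MRUnit sketch of the testing approach suggested in the reply quoted
above, for exercising mapper logic in isolation (MRUnit's new-API MapDriver is
assumed, and WordCount.MapClass refers to the illustrative sketch near the top
of this digest, not the poster's actual code):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Before;
import org.junit.Test;

public class WordCountMapperTest {
    private MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;

    @Before
    public void setUp() {
        // Drive the mapper directly, no cluster or job jar required.
        mapDriver = MapDriver.newMapDriver(new WordCount.MapClass());
    }

    @Test
    public void emitsOnePerWord() throws Exception {
        mapDriver.withInput(new LongWritable(0), new Text("hadoop hadoop"))
                 .withOutput(new Text("hadoop"), new IntWritable(1))
                 .withOutput(new Text("hadoop"), new IntWritable(1))
                 .runTest();
    }
}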