Relationship between HDFS_BYTES_READ and Map input bytes

2013-04-29 Thread Pralabh Kumar
Hi

What's the relationship between the HDFS_BYTES_READ and Map input bytes counters?
Why can they be different for a particular MR job?

Thanks and Regards

Pralabh Kumar


Re: Multiple ways to write Hadoop program driver - Which one to choose?

2013-04-29 Thread Jens Scheidtmann
Dear Chandrash3khar K0tekar,

Using the run() method implies implementing Tool and using ToolRunner. This
gives the additional benefit that some "standard" Hadoop command line
options are available. See here:
http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/com.cloudera.hadoop/hadoop-core/0.20.2-737/org/apache/hadoop/util/ToolRunner.java
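
A minimal driver sketch along those lines (class and job names here are just placeholders):

// Implementing Tool lets ToolRunner parse the "standard" generic options
// (-D, -files, -libjars, ...) before run() is invoked with the remaining args.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyDriver extends Configured implements Tool {

  @Override
  public int run(String[] args) throws Exception {
    // getConf() already carries anything passed via -D on the command line.
    Job job = new Job(getConf(), "my-job");
    job.setJarByClass(MyDriver.class);
    // ... set mapper, reducer, input and output paths here ...
    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new Configuration(), new MyDriver(), args));
  }
}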

Best regards,

Jens





VersionInfoAnnotation Unknown for Hadoop/HBase

2013-04-29 Thread Shahab Yunus
Hello,

This might be something very obvious that I am missing, but this has been
bugging me and I am unable to find what I am missing.

I have hadoop and hbase installed on a Linux machine, versions 2.0.0-cdh4.1.2
and 0.92.1-cdh4.1.2 respectively. They are working and I can invoke the hbase
shell and hadoop commands.

When I give the following command:

'hbase version'

I get the following output which is correct and expected:
---
13/04/29 07:47:42 INFO util.VersionInfo: HBase 0.92.1-cdh4.1.2
13/04/29 07:47:42 INFO util.VersionInfo: Subversion
file:///data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hbase-0.92.1-cdh4.1.2
-r Unknown
13/04/29 07:47:42 INFO util.VersionInfo: Compiled by jenkins on Thu Nov  1
18:01:09 PDT 2012

But when I kick off the VersionInfo class manually (I do see that there is
a main method in there), I get an Unknown result. Why is that?
Command:
'java  -cp
/usr/lib/hbase/hbase-0.92.1-cdh4.1.2-security.jar:/usr/lib/hbase/lib/commons-logging-1.1.1.jar
org.apache.hadoop.hbase.util.VersionInfo'

Output:
---
Apr 29, 2013 7:48:41 a.m. org.apache.hadoop.hbase.util.VersionInfo
logVersion
INFO: HBase Unknown
Apr 29, 2013 7:48:41 a.m. org.apache.hadoop.hbase.util.VersionInfo
logVersion
INFO: Subversion Unknown -r Unknown
Apr 29, 2013 7:48:41 a.m. org.apache.hadoop.hbase.util.VersionInfo
logVersion
INFO: Compiled by Unknown on Unknown

Now this is causing problems when I am trying to run my HBase client on
this machine, as it aborts with the following error:
---
java.lang.RuntimeException: hbase-default.xml file seems to be for and old
version of HBase (0.92.1-cdh4.1.2), this version is Unknown
   at
org.apache.hadoop.hbase.HBaseConfiguration.checkDefaultsVersion(HBaseConfiguration.java:68)

This means that the hbase-default.xml in the hbase jar is being picked up,
but the version info captured/compiled through annotations is not? How is
it possible if 'hbase shell' (or 'hadoop version') works fine!

Please advise. Thanks a lot. I will be very grateful.

Regards,
Shahab


Re: VersionInfoAnnotation Unknown for Hadoop/HBase

2013-04-29 Thread Ted Yu
bq. 'java  -cp /usr/lib/hbase/hbase...

Instead of hard coding class path, can you try specifying `hbase classpath`
?

Cheers

On Mon, Apr 29, 2013 at 5:52 AM, Shahab Yunus wrote:

> Hello,
>
> This might be something very obvious that I am missing but this has been
> bugging me and I am unable to find what am I missing?
>
> I have hadoop and hbase installed on Linux machine. Version 2.0.0-cdh4.1.2
> and 0.92.1-cdh4.1.2 respectively. They are working and I can invoke hbase
> shell and hadoop commands.
>
> When I give the following command:
>
> 'hbase version'
>
> I get the following output which is correct and expected:
> ---
> 13/04/29 07:47:42 INFO util.VersionInfo: HBase 0.92.1-cdh4.1.2
> 13/04/29 07:47:42 INFO util.VersionInfo: Subversion
> file:///data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hbase-0.92.1-cdh4.1.2
> -r Unknown
> 13/04/29 07:47:42 INFO util.VersionInfo: Compiled by jenkins on Thu Nov  1
> 18:01:09 PDT 2012
>
> But when I I kick of the VersionInfo class manually (I do see that there
> is a main method in there), I get an Unknown result? Why is that?
> Command:
> 'java  -cp
> /usr/lib/hbase/hbase-0.92.1-cdh4.1.2-security.jar:/usr/lib/hbase/lib/commons-logging-1.1.1.jar
> org.apache.hadoop.hbase.util.VersionInfo'
>
> Output:
> ---
> Apr 29, 2013 7:48:41 a.m. org.apache.hadoop.hbase.util.VersionInfo
> logVersion
> INFO: HBase Unknown
> Apr 29, 2013 7:48:41 a.m. org.apache.hadoop.hbase.util.VersionInfo
> logVersion
> INFO: Subversion Unknown -r Unknown
> Apr 29, 2013 7:48:41 a.m. org.apache.hadoop.hbase.util.VersionInfo
> logVersion
> INFO: Compiled by Unknown on Unknown
>
> Now this is causing problems when I am trying to run my HBase client on
> this machine as the it aborts with the following error:
> ---
> java.lang.RuntimeException: hbase-default.xml file seems to be for and old
> version of HBase (0.92.1-cdh4.1.2), this version is Unknown
>at
> org.apache.hadoop.hbase.HBaseConfiguration.checkDefaultsVersion(HBaseConfiguration.java:68)
>
> This means that the hbase-default.xml in the hbase jar is being picked up
> but the version info captured/compiled through annotations is not? How is
> it possible if 'hbase shell' (or hadoop version') works fine!
>
> Please advise. Thanks a lot. I will be very grateful.
>
> Regards,
> Shahab
>


Re: VersionInfoAnnotation Unknown for Hadoop/HBase

2013-04-29 Thread Shahab Yunus
Ted, sorry I didn't understand. What do you mean exactly by "specifying
`hbase classpath`"? Do you mean declaring an environment variable
'HBASE_CLASSPATH'?

Regards,
Shahab


On Mon, Apr 29, 2013 at 10:31 AM, Ted Yu  wrote:

> bq. 'java  -cp /usr/lib/hbase/hbase...
>
> Instead of hard coding class path, can you try specifying `hbase
> classpath` ?
>
> Cheers
>
>
> On Mon, Apr 29, 2013 at 5:52 AM, Shahab Yunus wrote:
>
>> Hello,
>>
>> This might be something very obvious that I am missing but this has been
>> bugging me and I am unable to find what am I missing?
>>
>> I have hadoop and hbase installed on Linux machine.
>> Version 2.0.0-cdh4.1.2 and 0.92.1-cdh4.1.2 respectively. They are working
>> and I can invoke hbase shell and hadoop commands.
>>
>> When I give the following command:
>>
>> 'hbase version'
>>
>> I get the following output which is correct and expected:
>> ---
>> 13/04/29 07:47:42 INFO util.VersionInfo: HBase 0.92.1-cdh4.1.2
>> 13/04/29 07:47:42 INFO util.VersionInfo: Subversion
>> file:///data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hbase-0.92.1-cdh4.1.2
>> -r Unknown
>> 13/04/29 07:47:42 INFO util.VersionInfo: Compiled by jenkins on Thu Nov
>>  1 18:01:09 PDT 2012
>>
>> But when I I kick of the VersionInfo class manually (I do see that there
>> is a main method in there), I get an Unknown result? Why is that?
>> Command:
>> 'java  -cp
>> /usr/lib/hbase/hbase-0.92.1-cdh4.1.2-security.jar:/usr/lib/hbase/lib/commons-logging-1.1.1.jar
>> org.apache.hadoop.hbase.util.VersionInfo'
>>
>> Output:
>> ---
>> Apr 29, 2013 7:48:41 a.m. org.apache.hadoop.hbase.util.VersionInfo
>> logVersion
>> INFO: HBase Unknown
>> Apr 29, 2013 7:48:41 a.m. org.apache.hadoop.hbase.util.VersionInfo
>> logVersion
>> INFO: Subversion Unknown -r Unknown
>> Apr 29, 2013 7:48:41 a.m. org.apache.hadoop.hbase.util.VersionInfo
>> logVersion
>> INFO: Compiled by Unknown on Unknown
>>
>> Now this is causing problems when I am trying to run my HBase client on
>> this machine as the it aborts with the following error:
>> ---
>> java.lang.RuntimeException: hbase-default.xml file seems to be for and
>> old version of HBase (0.92.1-cdh4.1.2), this version is Unknown
>>at
>> org.apache.hadoop.hbase.HBaseConfiguration.checkDefaultsVersion(HBaseConfiguration.java:68)
>>
>> This means that the hbase-default.xml in the hbase jar is being picked up
>> but the version info captured/compiled through annotations is not? How is
>> it possible if 'hbase shell' (or hadoop version') works fine!
>>
>> Please advise. Thanks a lot. I will be very grateful.
>>
>> Regards,
>> Shahab
>>
>
>


Re: VersionInfoAnnotation Unknown for Hadoop/HBase

2013-04-29 Thread Shahab Yunus
Okay, I think I know what you mean. Those were back ticks!

So I tried the following:

java  -cp `hbase classpath` org.apache.hadoop.hbase.util.VersionInfo

and I still get:

13/04/29 09:40:31 INFO util.VersionInfo: HBase Unknown
13/04/29 09:40:31 INFO util.VersionInfo: Subversion Unknown -r Unknown
13/04/29 09:40:31 INFO util.VersionInfo: Compiled by Unknown on Unknown

I did print `hbase classpath` on the console itself and it does print paths
to various libs and jars.

Regards,
Shahab


On Mon, Apr 29, 2013 at 10:39 AM, Shahab Yunus wrote:

> Ted, Sorry I didn't understand. What do you mean exactly by "specifying
> `hbase classpath` "? You mean declare a environment variable
> 'HBASE_CLASSPATH'?
>
> Regards,
> Shaahb
>
>
> On Mon, Apr 29, 2013 at 10:31 AM, Ted Yu  wrote:
>
>> bq. 'java  -cp /usr/lib/hbase/hbase...
>>
>> Instead of hard coding class path, can you try specifying `hbase
>> classpath` ?
>>
>> Cheers
>>
>>
>> On Mon, Apr 29, 2013 at 5:52 AM, Shahab Yunus wrote:
>>
>>> Hello,
>>>
>>> This might be something very obvious that I am missing but this has been
>>> bugging me and I am unable to find what am I missing?
>>>
>>> I have hadoop and hbase installed on Linux machine.
>>> Version 2.0.0-cdh4.1.2 and 0.92.1-cdh4.1.2 respectively. They are working
>>> and I can invoke hbase shell and hadoop commands.
>>>
>>> When I give the following command:
>>>
>>> 'hbase version'
>>>
>>> I get the following output which is correct and expected:
>>> ---
>>> 13/04/29 07:47:42 INFO util.VersionInfo: HBase 0.92.1-cdh4.1.2
>>> 13/04/29 07:47:42 INFO util.VersionInfo: Subversion
>>> file:///data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hbase-0.92.1-cdh4.1.2
>>> -r Unknown
>>> 13/04/29 07:47:42 INFO util.VersionInfo: Compiled by jenkins on Thu Nov
>>>  1 18:01:09 PDT 2012
>>>
>>> But when I I kick of the VersionInfo class manually (I do see that there
>>> is a main method in there), I get an Unknown result? Why is that?
>>> Command:
>>> 'java  -cp
>>> /usr/lib/hbase/hbase-0.92.1-cdh4.1.2-security.jar:/usr/lib/hbase/lib/commons-logging-1.1.1.jar
>>> org.apache.hadoop.hbase.util.VersionInfo'
>>>
>>> Output:
>>> ---
>>> Apr 29, 2013 7:48:41 a.m. org.apache.hadoop.hbase.util.VersionInfo
>>> logVersion
>>> INFO: HBase Unknown
>>> Apr 29, 2013 7:48:41 a.m. org.apache.hadoop.hbase.util.VersionInfo
>>> logVersion
>>> INFO: Subversion Unknown -r Unknown
>>> Apr 29, 2013 7:48:41 a.m. org.apache.hadoop.hbase.util.VersionInfo
>>> logVersion
>>> INFO: Compiled by Unknown on Unknown
>>>
>>> Now this is causing problems when I am trying to run my HBase client on
>>> this machine as the it aborts with the following error:
>>> ---
>>> java.lang.RuntimeException: hbase-default.xml file seems to be for and
>>> old version of HBase (0.92.1-cdh4.1.2), this version is Unknown
>>>at
>>> org.apache.hadoop.hbase.HBaseConfiguration.checkDefaultsVersion(HBaseConfiguration.java:68)
>>>
>>> This means that the hbase-default.xml in the hbase jar is being picked
>>> up but the version info captured/compiled through annotations is not? How
>>> is it possible if 'hbase shell' (or hadoop version') works fine!
>>>
>>> Please advise. Thanks a lot. I will be very grateful.
>>>
>>> Regards,
>>> Shahab
>>>
>>
>>
>


Re: VersionInfoAnnotation Unknown for Hadoop/HBase

2013-04-29 Thread Harsh J
This is rather odd and I am unable to reproduce it across several
versions. It may even be something to do with all the static loading
done in the VersionInfo class, but I am unsure at the moment.

What does "java -version" print for you?

On Mon, Apr 29, 2013 at 8:12 PM, Shahab Yunus  wrote:
> Okay, I think I know what you mean. Those were back ticks!
>
> So I tried the following:
>
> java  -cp `hbase classpath` org.apache.hadoop.hbase.util.VersionInfo
>
> and I still get:
>
> 13/04/29 09:40:31 INFO util.VersionInfo: HBase Unknown
> 13/04/29 09:40:31 INFO util.VersionInfo: Subversion Unknown -r Unknown
> 13/04/29 09:40:31 INFO util.VersionInfo: Compiled by Unknown on Unknown
>
> I did print `hbase classpath` on the console itself and it does print paths
> to various libs and jars.
>
> Regards,
> Shahab
>
>
> On Mon, Apr 29, 2013 at 10:39 AM, Shahab Yunus 
> wrote:
>>
>> Ted, Sorry I didn't understand. What do you mean exactly by "specifying
>> `hbase classpath` "? You mean declare a environment variable
>> 'HBASE_CLASSPATH'?
>>
>> Regards,
>> Shaahb
>>
>>
>> On Mon, Apr 29, 2013 at 10:31 AM, Ted Yu  wrote:
>>>
>>> bq. 'java  -cp /usr/lib/hbase/hbase...
>>>
>>> Instead of hard coding class path, can you try specifying `hbase
>>> classpath` ?
>>>
>>> Cheers
>>>
>>>
>>> On Mon, Apr 29, 2013 at 5:52 AM, Shahab Yunus 
>>> wrote:

 Hello,

 This might be something very obvious that I am missing but this has been
 bugging me and I am unable to find what am I missing?

 I have hadoop and hbase installed on Linux machine. Version
 2.0.0-cdh4.1.2 and 0.92.1-cdh4.1.2 respectively. They are working and I can
 invoke hbase shell and hadoop commands.

 When I give the following command:

 'hbase version'

 I get the following output which is correct and expected:
 ---
 13/04/29 07:47:42 INFO util.VersionInfo: HBase 0.92.1-cdh4.1.2
 13/04/29 07:47:42 INFO util.VersionInfo: Subversion
 file:///data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hbase-0.92.1-cdh4.1.2
 -r Unknown
 13/04/29 07:47:42 INFO util.VersionInfo: Compiled by jenkins on Thu Nov
 1 18:01:09 PDT 2012

 But when I I kick of the VersionInfo class manually (I do see that there
 is a main method in there), I get an Unknown result? Why is that?
 Command:
 'java  -cp
 /usr/lib/hbase/hbase-0.92.1-cdh4.1.2-security.jar:/usr/lib/hbase/lib/commons-logging-1.1.1.jar
 org.apache.hadoop.hbase.util.VersionInfo'

 Output:
 ---
 Apr 29, 2013 7:48:41 a.m. org.apache.hadoop.hbase.util.VersionInfo
 logVersion
 INFO: HBase Unknown
 Apr 29, 2013 7:48:41 a.m. org.apache.hadoop.hbase.util.VersionInfo
 logVersion
 INFO: Subversion Unknown -r Unknown
 Apr 29, 2013 7:48:41 a.m. org.apache.hadoop.hbase.util.VersionInfo
 logVersion
 INFO: Compiled by Unknown on Unknown

 Now this is causing problems when I am trying to run my HBase client on
 this machine as the it aborts with the following error:
 ---
 java.lang.RuntimeException: hbase-default.xml file seems to be for and
 old version of HBase (0.92.1-cdh4.1.2), this version is Unknown
at
 org.apache.hadoop.hbase.HBaseConfiguration.checkDefaultsVersion(HBaseConfiguration.java:68)

 This means that the hbase-default.xml in the hbase jar is being picked
 up but the version info captured/compiled through annotations is not? How 
 is
 it possible if 'hbase shell' (or hadoop version') works fine!

 Please advise. Thanks a lot. I will be very grateful.

 Regards,
 Shahab
>>>
>>>
>>
>



-- 
Harsh J


Re: VersionInfoAnnotation Unknown for Hadoop/HBase

2013-04-29 Thread Shahab Yunus
The output of "java -version" is:

java -version
java version "1.5.0"
gij (GNU libgcj) version 4.4.6 20120305 (Red Hat 4.4.6-4)

Copyright (C) 2007 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
--

Also, when I run:

"hbase org.apache.hadoop.hbase.util.VersionInfo"

I do get the correct output:
3/04/29 09:50:26 INFO util.VersionInfo: HBase 0.92.1-cdh4.1.2
13/04/29 09:50:26 INFO util.VersionInfo: Subversion
file:///data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hbase-0.92.1-cdh4.1.2
-r Unknown
13/04/29 09:50:26 INFO util.VersionInfo: Compiled by jenkins on Thu Nov  1
18:01:09 PDT 2012

This is strange, and because of this I am unable to run my Java client,
which errors out as mentioned, with the following:
java.lang.RuntimeException: hbase-default.xml file seems to be for and old
version of HBase (0.92.1-cdh4.1.2), this version is Unknown
   at
org.apache.hadoop.hbase.HBaseConfiguration.checkDefaultsVersion(HBaseConfiguration.java:68)

Regards,
Shahab


On Mon, Apr 29, 2013 at 10:50 AM, Harsh J  wrote:

> This is rather odd and am unable to reproduce this across several
> versions. It may even be something to do with all that static loading
> done in the VersionInfo class but am unsure at the moment.
>
> What does "java -version" print for you?
>
> On Mon, Apr 29, 2013 at 8:12 PM, Shahab Yunus 
> wrote:
> > Okay, I think I know what you mean. Those were back ticks!
> >
> > So I tried the following:
> >
> > java  -cp `hbase classpath` org.apache.hadoop.hbase.util.VersionInfo
> >
> > and I still get:
> >
> > 13/04/29 09:40:31 INFO util.VersionInfo: HBase Unknown
> > 13/04/29 09:40:31 INFO util.VersionInfo: Subversion Unknown -r Unknown
> > 13/04/29 09:40:31 INFO util.VersionInfo: Compiled by Unknown on Unknown
> >
> > I did print `hbase classpath` on the console itself and it does print
> paths
> > to various libs and jars.
> >
> > Regards,
> > Shahab
> >
> >
> > On Mon, Apr 29, 2013 at 10:39 AM, Shahab Yunus 
> > wrote:
> >>
> >> Ted, Sorry I didn't understand. What do you mean exactly by "specifying
> >> `hbase classpath` "? You mean declare a environment variable
> >> 'HBASE_CLASSPATH'?
> >>
> >> Regards,
> >> Shaahb
> >>
> >>
> >> On Mon, Apr 29, 2013 at 10:31 AM, Ted Yu  wrote:
> >>>
> >>> bq. 'java  -cp /usr/lib/hbase/hbase...
> >>>
> >>> Instead of hard coding class path, can you try specifying `hbase
> >>> classpath` ?
> >>>
> >>> Cheers
> >>>
> >>>
> >>> On Mon, Apr 29, 2013 at 5:52 AM, Shahab Yunus 
> >>> wrote:
> 
>  Hello,
> 
>  This might be something very obvious that I am missing but this has
> been
>  bugging me and I am unable to find what am I missing?
> 
>  I have hadoop and hbase installed on Linux machine. Version
>  2.0.0-cdh4.1.2 and 0.92.1-cdh4.1.2 respectively. They are working and
> I can
>  invoke hbase shell and hadoop commands.
> 
>  When I give the following command:
> 
>  'hbase version'
> 
>  I get the following output which is correct and expected:
>  ---
>  13/04/29 07:47:42 INFO util.VersionInfo: HBase 0.92.1-cdh4.1.2
>  13/04/29 07:47:42 INFO util.VersionInfo: Subversion
> 
> file:///data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hbase-0.92.1-cdh4.1.2
>  -r Unknown
>  13/04/29 07:47:42 INFO util.VersionInfo: Compiled by jenkins on Thu
> Nov
>  1 18:01:09 PDT 2012
> 
>  But when I I kick of the VersionInfo class manually (I do see that
> there
>  is a main method in there), I get an Unknown result? Why is that?
>  Command:
>  'java  -cp
> 
> /usr/lib/hbase/hbase-0.92.1-cdh4.1.2-security.jar:/usr/lib/hbase/lib/commons-logging-1.1.1.jar
>  org.apache.hadoop.hbase.util.VersionInfo'
> 
>  Output:
>  ---
>  Apr 29, 2013 7:48:41 a.m. org.apache.hadoop.hbase.util.VersionInfo
>  logVersion
>  INFO: HBase Unknown
>  Apr 29, 2013 7:48:41 a.m. org.apache.hadoop.hbase.util.VersionInfo
>  logVersion
>  INFO: Subversion Unknown -r Unknown
>  Apr 29, 2013 7:48:41 a.m. org.apache.hadoop.hbase.util.VersionInfo
>  logVersion
>  INFO: Compiled by Unknown on Unknown
> 
>  Now this is causing problems when I am trying to run my HBase client
> on
>  this machine as the it aborts with the following error:
>  ---
>  java.lang.RuntimeException: hbase-default.xml file seems to be for and
>  old version of HBase (0.92.1-cdh4.1.2), this version is Unknown
> at
> 
> org.apache.hadoop.hbase.HBaseConfiguration.checkDefaultsVersion(HBaseConfiguration.java:68)
> 
>  This means that the hbase-default.xml in the hbase jar is being picked
>  up but the version info captured/compiled through annotations is not?
> How is
>  it 

Re: VersionInfoAnnotation Unknown for Hadoop/HBase

2013-04-29 Thread Harsh J
Well… Bingo! :)

We don't write our projects for 1.5 JVMs, and especially not the GCJ
(1.5 didn't have annotations either IIRC? We depend on that here). Try
with a Sun/Oracle/OpenJDK 1.6 or higher and your problem is solved.
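
For context, a rough sketch of the mechanism such VersionInfo-style classes rely on: the build generates a package-info.java carrying a runtime-retained annotation, and the class reads it back through reflection. The annotation and names below are illustrative, not the exact HBase ones.

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// The build step would emit something like:
//   @BuildVersion(version = "0.92.1-cdh4.1.2", user = "jenkins")
//   package some.pkg;
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.PACKAGE)
@interface BuildVersion {
  String version();
  String user();
}

class VersionLookup {
  static String version() {
    // On a JVM without proper runtime annotation support (e.g. the GCJ 1.5
    // above), this lookup yields null and the caller falls back to "Unknown".
    Package p = VersionLookup.class.getPackage();
    BuildVersion info = (p == null) ? null : p.getAnnotation(BuildVersion.class);
    return info != null ? info.version() : "Unknown";
  }
}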

On Mon, Apr 29, 2013 at 8:24 PM, Shahab Yunus  wrote:
> The output of "java -version" is:
>
> java -version
> java version "1.5.0"
> gij (GNU libgcj) version 4.4.6 20120305 (Red Hat 4.4.6-4)
>
> Copyright (C) 2007 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions.  There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> --
>
> Also, when I run:
>
> "hbase org.apache.hadoop.hbase.util.VersionInfo"
>
> I do get the correct output:
> 3/04/29 09:50:26 INFO util.VersionInfo: HBase 0.92.1-cdh4.1.2
> 13/04/29 09:50:26 INFO util.VersionInfo: Subversion
> file:///data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hbase-0.92.1-cdh4.1.2
> -r Unknown
> 13/04/29 09:50:26 INFO util.VersionInfo: Compiled by jenkins on Thu Nov  1
> 18:01:09 PDT 2012
>
> This is strange and because of this I am unable to run my java client which
> errores out as mentioned with the following:
> java.lang.RuntimeException: hbase-default.xml file seems to be for and old
> version of HBase (0.92.1-cdh4.1.2), this version is Unknown
>at
> org.apache.hadoop.hbase.HBaseConfiguration.checkDefaultsVersion(HBaseConfiguration.java:68)
>
> Regards,
> Shahab
>
>
> On Mon, Apr 29, 2013 at 10:50 AM, Harsh J  wrote:
>>
>> This is rather odd and am unable to reproduce this across several
>> versions. It may even be something to do with all that static loading
>> done in the VersionInfo class but am unsure at the moment.
>>
>> What does "java -version" print for you?
>>
>> On Mon, Apr 29, 2013 at 8:12 PM, Shahab Yunus 
>> wrote:
>> > Okay, I think I know what you mean. Those were back ticks!
>> >
>> > So I tried the following:
>> >
>> > java  -cp `hbase classpath` org.apache.hadoop.hbase.util.VersionInfo
>> >
>> > and I still get:
>> >
>> > 13/04/29 09:40:31 INFO util.VersionInfo: HBase Unknown
>> > 13/04/29 09:40:31 INFO util.VersionInfo: Subversion Unknown -r Unknown
>> > 13/04/29 09:40:31 INFO util.VersionInfo: Compiled by Unknown on Unknown
>> >
>> > I did print `hbase classpath` on the console itself and it does print
>> > paths
>> > to various libs and jars.
>> >
>> > Regards,
>> > Shahab
>> >
>> >
>> > On Mon, Apr 29, 2013 at 10:39 AM, Shahab Yunus 
>> > wrote:
>> >>
>> >> Ted, Sorry I didn't understand. What do you mean exactly by "specifying
>> >> `hbase classpath` "? You mean declare a environment variable
>> >> 'HBASE_CLASSPATH'?
>> >>
>> >> Regards,
>> >> Shaahb
>> >>
>> >>
>> >> On Mon, Apr 29, 2013 at 10:31 AM, Ted Yu  wrote:
>> >>>
>> >>> bq. 'java  -cp /usr/lib/hbase/hbase...
>> >>>
>> >>> Instead of hard coding class path, can you try specifying `hbase
>> >>> classpath` ?
>> >>>
>> >>> Cheers
>> >>>
>> >>>
>> >>> On Mon, Apr 29, 2013 at 5:52 AM, Shahab Yunus 
>> >>> wrote:
>> 
>>  Hello,
>> 
>>  This might be something very obvious that I am missing but this has
>>  been
>>  bugging me and I am unable to find what am I missing?
>> 
>>  I have hadoop and hbase installed on Linux machine. Version
>>  2.0.0-cdh4.1.2 and 0.92.1-cdh4.1.2 respectively. They are working and
>>  I can
>>  invoke hbase shell and hadoop commands.
>> 
>>  When I give the following command:
>> 
>>  'hbase version'
>> 
>>  I get the following output which is correct and expected:
>>  ---
>>  13/04/29 07:47:42 INFO util.VersionInfo: HBase 0.92.1-cdh4.1.2
>>  13/04/29 07:47:42 INFO util.VersionInfo: Subversion
>> 
>>  file:///data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hbase-0.92.1-cdh4.1.2
>>  -r Unknown
>>  13/04/29 07:47:42 INFO util.VersionInfo: Compiled by jenkins on Thu
>>  Nov
>>  1 18:01:09 PDT 2012
>> 
>>  But when I I kick of the VersionInfo class manually (I do see that
>>  there
>>  is a main method in there), I get an Unknown result? Why is that?
>>  Command:
>>  'java  -cp
>> 
>>  /usr/lib/hbase/hbase-0.92.1-cdh4.1.2-security.jar:/usr/lib/hbase/lib/commons-logging-1.1.1.jar
>>  org.apache.hadoop.hbase.util.VersionInfo'
>> 
>>  Output:
>>  ---
>>  Apr 29, 2013 7:48:41 a.m. org.apache.hadoop.hbase.util.VersionInfo
>>  logVersion
>>  INFO: HBase Unknown
>>  Apr 29, 2013 7:48:41 a.m. org.apache.hadoop.hbase.util.VersionInfo
>>  logVersion
>>  INFO: Subversion Unknown -r Unknown
>>  Apr 29, 2013 7:48:41 a.m. org.apache.hadoop.hbase.util.VersionInfo
>>  logVersion
>>  INFO: Compiled by Unknown on Unknown
>> 
>>  Now this is causing problems when I am trying to run my HBase client
>>  on
>>  this machine as the it aborts with the following 

Re: M/R job optimization

2013-04-29 Thread Han JU
Thanks Ted and .. Ted ..
I've been looking at the progress when the job is executing.
In fact, I think it's not a skewed partition problem. I've looked at the
mapper output files, all are of the same size, and each reducer takes a
single group.
What I want to know is how the Hadoop M/R framework calculates the progress
percentage.
For example, my reducer:

reducer(...) {
  call_of_another_func() // lots of complicated calculations
}

Will the percentage reflect the calculation inside the function call?
I observed that in the job, all reducers reached 100% fairly
quickly, then got stuck there. During this time, the datanodes seem to be
working.

Thanks.


2013/4/26 Ted Dunning 

> Have you checked the logs?
>
> Is there a task that is taking a long time?  What is that task doing?
>
> There are two basic possibilities:
>
> a) you have a skewed join like the other Ted mentioned.  In this case, the
> straggler will be seen to be working on data.
>
> b) you have a hung process.  This can be more difficult to diagnose, but
> indicates that there is a problem with your cluster.
>
>
>
> On Fri, Apr 26, 2013 at 2:21 AM, Han JU  wrote:
>
>> Hi,
>>
>> I've implemented an algorithm with Hadoop, it's a series of 4 jobs. My
>> questionis that in one of the jobs, map and reduce tasks show 100% finished
>> in about 1m 30s, but I have to wait another 5m for this job to finish.
>> This job writes about 720mb compressed data to HDFS with replication
>> factor 1, in sequence file format. I've tried copying these data to hdfs,
>> it takes only < 20 seconds. What happened during this 5 more minutes?
>>
>> Any idea on how to optimize this part?
>>
>> Thanks.
>>
>> --
>> *JU Han*
>>
>> UTC   -  Université de Technologie de Compiègne
>> * **GI06 - Fouille de Données et Décisionnel*
>>
>> +33 061960
>>
>
>


-- 
*JU Han*

Software Engineer Intern @ KXEN Inc.
UTC   -  Université de Technologie de Compiègne
* **GI06 - Fouille de Données et Décisionnel*

+33 061960


Re: VersionInfoAnnotation Unknown for Hadoop/HBase

2013-04-29 Thread Shahab Yunus
Yes, this indeed seems to be the case. After running java -version and
seeing 1.5, it rang a bell, because all our servers (as far as I knew) were
1.6 or above. So I never thought that this would be an issue!! But boy was I
wrong, and it indeed turned out to be something so obvious. Thanks guys
for your prompt responses and help. I feel embarrassed to bother you all for
such an issue :/

I ran all of these commands on machines which actually had Java 1.6 or 1.7,
and they work.

Regards,
Shahab


On Mon, Apr 29, 2013 at 11:05 AM, Harsh J  wrote:

> Well… Bingo! :)
>
> We don't write our projects for 1.5 JVMs, and especially not the GCJ
> (1.5 didn't have annotations either IIRC? We depend on that here). Try
> with a Sun/Oracle/OpenJDK 1.6 or higher and your problem is solved.
>
> On Mon, Apr 29, 2013 at 8:24 PM, Shahab Yunus 
> wrote:
> > The output of "java -version" is:
> >
> > java -version
> > java version "1.5.0"
> > gij (GNU libgcj) version 4.4.6 20120305 (Red Hat 4.4.6-4)
> >
> > Copyright (C) 2007 Free Software Foundation, Inc.
> > This is free software; see the source for copying conditions.  There is
> NO
> > warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
> PURPOSE.
> > --
> >
> > Also, when I run:
> >
> > "hbase org.apache.hadoop.hbase.util.VersionInfo"
> >
> > I do get the correct output:
> > 3/04/29 09:50:26 INFO util.VersionInfo: HBase 0.92.1-cdh4.1.2
> > 13/04/29 09:50:26 INFO util.VersionInfo: Subversion
> >
> file:///data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hbase-0.92.1-cdh4.1.2
> > -r Unknown
> > 13/04/29 09:50:26 INFO util.VersionInfo: Compiled by jenkins on Thu Nov
>  1
> > 18:01:09 PDT 2012
> >
> > This is strange and because of this I am unable to run my java client
> which
> > errores out as mentioned with the following:
> > java.lang.RuntimeException: hbase-default.xml file seems to be for and
> old
> > version of HBase (0.92.1-cdh4.1.2), this version is Unknown
> >at
> >
> org.apache.hadoop.hbase.HBaseConfiguration.checkDefaultsVersion(HBaseConfiguration.java:68)
> >
> > Regards,
> > Shahab
> >
> >
> > On Mon, Apr 29, 2013 at 10:50 AM, Harsh J  wrote:
> >>
> >> This is rather odd and am unable to reproduce this across several
> >> versions. It may even be something to do with all that static loading
> >> done in the VersionInfo class but am unsure at the moment.
> >>
> >> What does "java -version" print for you?
> >>
> >> On Mon, Apr 29, 2013 at 8:12 PM, Shahab Yunus 
> >> wrote:
> >> > Okay, I think I know what you mean. Those were back ticks!
> >> >
> >> > So I tried the following:
> >> >
> >> > java  -cp `hbase classpath` org.apache.hadoop.hbase.util.VersionInfo
> >> >
> >> > and I still get:
> >> >
> >> > 13/04/29 09:40:31 INFO util.VersionInfo: HBase Unknown
> >> > 13/04/29 09:40:31 INFO util.VersionInfo: Subversion Unknown -r Unknown
> >> > 13/04/29 09:40:31 INFO util.VersionInfo: Compiled by Unknown on
> Unknown
> >> >
> >> > I did print `hbase classpath` on the console itself and it does print
> >> > paths
> >> > to various libs and jars.
> >> >
> >> > Regards,
> >> > Shahab
> >> >
> >> >
> >> > On Mon, Apr 29, 2013 at 10:39 AM, Shahab Yunus <
> shahab.yu...@gmail.com>
> >> > wrote:
> >> >>
> >> >> Ted, Sorry I didn't understand. What do you mean exactly by
> "specifying
> >> >> `hbase classpath` "? You mean declare a environment variable
> >> >> 'HBASE_CLASSPATH'?
> >> >>
> >> >> Regards,
> >> >> Shaahb
> >> >>
> >> >>
> >> >> On Mon, Apr 29, 2013 at 10:31 AM, Ted Yu 
> wrote:
> >> >>>
> >> >>> bq. 'java  -cp /usr/lib/hbase/hbase...
> >> >>>
> >> >>> Instead of hard coding class path, can you try specifying `hbase
> >> >>> classpath` ?
> >> >>>
> >> >>> Cheers
> >> >>>
> >> >>>
> >> >>> On Mon, Apr 29, 2013 at 5:52 AM, Shahab Yunus <
> shahab.yu...@gmail.com>
> >> >>> wrote:
> >> 
> >>  Hello,
> >> 
> >>  This might be something very obvious that I am missing but this has
> >>  been
> >>  bugging me and I am unable to find what am I missing?
> >> 
> >>  I have hadoop and hbase installed on Linux machine. Version
> >>  2.0.0-cdh4.1.2 and 0.92.1-cdh4.1.2 respectively. They are working
> and
> >>  I can
> >>  invoke hbase shell and hadoop commands.
> >> 
> >>  When I give the following command:
> >> 
> >>  'hbase version'
> >> 
> >>  I get the following output which is correct and expected:
> >>  ---
> >>  13/04/29 07:47:42 INFO util.VersionInfo: HBase 0.92.1-cdh4.1.2
> >>  13/04/29 07:47:42 INFO util.VersionInfo: Subversion
> >> 
> >> 
> file:///data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hbase-0.92.1-cdh4.1.2
> >>  -r Unknown
> >>  13/04/29 07:47:42 INFO util.VersionInfo: Compiled by jenkins on Thu
> >>  Nov
> >>  1 18:01:09 PDT 2012
> >> 
> >>  But when I I kick of the VersionInfo class manually (I do see that
> >>  there
> >>  i

Re: Warnings?

2013-04-29 Thread Ted Xu
Hi Kevin,

Native libraries are those implemented in C/C++, which only provide code-level
portability (instead of binary-level portability, as Java does). That
is to say, the binaries provided by the CDH4 distribution will in most cases be
broken in your environment.

To check whether your native libraries are working or not, you can follow the
instructions I sent previously, quoted as follows.



During runtime, check the hadoop log files for your MapReduce tasks.

   - If everything is all right, then: DEBUG util.NativeCodeLoader - Trying
   to load the custom-built native-hadoop library... INFO
   util.NativeCodeLoader - Loaded the native-hadoop library
   - If something goes wrong, then: INFO util.NativeCodeLoader - Unable to
   load native-hadoop library for your platform... using builtin-java classes
   where applicable




On Mon, Apr 29, 2013 at 10:21 AM, Kevin Burton wrote:

> I looked at the link you provided and found the Ubuntu is one of the
> “supported platforms” but it doesn’t give any information on how to obtain
> it or build it. Any idea why it is not includde as part of the Cloudera
> CDH4 distribution? I followed the installation instructions (mostly apt-get
> install . . . .) but I fail to see the libhadoop.so.  In order to avoid
> this warning do I need to download the Apache distribution? Which one?
>
> ** **
>
> For the warnings about the configuration I looked in my configuration and
> for this specific example I don’t see ‘session.id’ used anywhere. It must
> be used by default. If so why is the deprecated default being used? 
>
> ** **
>
> As for the two warnings about counters. I know I have not implemented any
> code for counters so again this must be something internal. Is there
> something I am doing to trigger this?
>
> ** **
>
> So I can avoid them what are “hadoop generic options”?
>
> ** **
>
> Thanks again.
>
> ** **
>
> Kevin
>
> ** **
>
> *From:* Ted Xu [mailto:t...@gopivotal.com]
> *Sent:* Friday, April 26, 2013 10:49 PM
> *To:* user@hadoop.apache.org
> *Subject:* Re: Warnings?
>
> ** **
>
> Hi Kevin,
>
> ** **
>
> Please see my comments inline,
>
> ** **
>
> On Sat, Apr 27, 2013 at 11:24 AM, Kevin Burton 
> wrote:
>
> Is the native library not available for Ubuntu? If so how do I load it?***
> *
>
> Native libraries usually requires recompile, for more information please
> refer Native 
> Libraries
> . 
>
> ** **
>
> ** **
>
> Can I tell which key is off? Since I am just starting I would want to be
> as up to date as possible. It is out of date probably because I copied my
> examples from books and tutorials.
>
> ** **
>
> I think the warning messages are telling it already, "xxx is deprecated,
> use xxx instead...". In fact, most of the configure keys are changed from
> hadoop 1.x to 2.x. The compatibility change may later documented on
> http://wiki.apache.org/hadoop/Compatibility.
>
>  
>
> The main class does derive from Tool. Should I ignore this warning as it
> seems to be in error?
>
> Of course you can ignore this warning as long as you don't use hadoop
> generic options.
>
>  
>
> ** **
>
> Thank you.
>
>
> On Apr 26, 2013, at 7:49 PM, Ted Xu  wrote:
>
> Hi,
>
> ** **
>
> First warning is saying hadoop cannot load native library, usually a
> compression codec. In that case, hadoop will use java implementation
> instead, which is slower.
>
> ** **
>
> Second is caused by hadoop 1.x/2.x configuration key change. You're using
> a 1.x style key under 2.x, yet hadoop still guarantees backward
> compatibility.
>
> ** **
>
> Third is saying that the main class of a hadoop application is recommanded
> to implement 
> org.apache.hadoop.util.Tool,
> or else generic command line options (e.g., -D options) will not supported.
>   
>
> ** **
>
> On Sat, Apr 27, 2013 at 5:51 AM,  wrote:
>
> I am running a simple WordCount m/r job and I get output but I get five
> warnings that I am not sure if I should pay attention to: 
>
> ** **
>
> *13/04/26 16:24:50 WARN util.NativeCodeLoader: Unable to load
> native-hadoop library for your platform... using builtin-java classes where
> applicable *
>
> ** **
>
> *13/04/26 16:24:50 WARN conf.Configuration: session.id is deprecated.
> Instead, use dfs.metrics.session-id *
>
> ** **
>
> *13/04/26 16:24:50 WARN mapred.JobClient: Use GenericOptionsParser for
> parsing the arguments. Applications should implement Tool for the same. **
> ***
>
> ** **
>
> *13/04/26 16:24:51 WARN mapreduce.Counters: Group
> org.apache.hadoop.mapred.Task$Counter is deprecated. Use
> org.apache.hadoop.mapreduce.TaskCounter instead *
>
> ** **
>
> *13/04/26 16:24:51 WARN mapreduce.Counters: Counter name MAP_INPUT_BYTES
> is deprecated. Use FileInputFormatCounters as group name and  BYTE

Re: M/R job optimization

2013-04-29 Thread Ted Xu
Hi Han,

I think your point is valid. In fact you can change the progress-report
logic by manually calling the Reporter API, but by default it is quite
straightforward: reducer progress is divided into 3 phases, namely the copy
phase, the merge/sort phase and the reduce phase, each worth ~33%. In your case
it looks like your program is stuck in the reduce phase. To better track the
cause, you can check the task log, as Ted Dunning suggested before.
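
A minimal sketch (new MapReduce API, with placeholder key/value types and a placeholder for your heavy computation) of calling progress()/setStatus() from inside a long-running reduce, so the framework at least knows the task is alive:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class LongComputationReducer
    extends Reducer<Text, LongWritable, Text, LongWritable> {

  @Override
  protected void reduce(Text key, Iterable<LongWritable> values, Context context)
      throws IOException, InterruptedException {
    long processed = 0;
    for (LongWritable v : values) {
      // ... the expensive per-record work goes here ...
      if (++processed % 10000 == 0) {
        context.setStatus("processed " + processed + " records for " + key);
        context.progress();  // heartbeat so the task isn't considered hung
      }
    }
    context.write(key, new LongWritable(processed));
  }
}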


On Mon, Apr 29, 2013 at 11:17 PM, Han JU  wrote:

> Thanks Ted and .. Ted ..
> I've been looking at the progress when the job is executing.
> In fact, I think it's not a skewed partition problem. I've looked at the
> mapper output files, all are of the same size and the reducer each takes a
> single group.
> What I want to know is that how hadoop M/R framework calculate the
> progress percentage.
> For example, my reducer:
>
> reducer(...) {
>   call_of_another_func() // lots of complicated calculations
> }
>
> Will the percentage reflect the calculation inside the function call?
> Because I observed that in the job, all reducer reached 100% fairly
> quickly, then they stucked there. In this time, the datanodes seem to be
> working.
>
> Thanks.
>
>
> 2013/4/26 Ted Dunning 
>
>> Have you checked the logs?
>>
>> Is there a task that is taking a long time?  What is that task doing?
>>
>> There are two basic possibilities:
>>
>> a) you have a skewed join like the other Ted mentioned.  In this case,
>> the straggler will be seen to be working on data.
>>
>> b) you have a hung process.  This can be more difficult to diagnose, but
>> indicates that there is a problem with your cluster.
>>
>>
>>
>> On Fri, Apr 26, 2013 at 2:21 AM, Han JU  wrote:
>>
>>> Hi,
>>>
>>> I've implemented an algorithm with Hadoop, it's a series of 4 jobs. My
>>> questionis that in one of the jobs, map and reduce tasks show 100% finished
>>> in about 1m 30s, but I have to wait another 5m for this job to finish.
>>> This job writes about 720mb compressed data to HDFS with replication
>>> factor 1, in sequence file format. I've tried copying these data to hdfs,
>>> it takes only < 20 seconds. What happened during this 5 more minutes?
>>>
>>> Any idea on how to optimize this part?
>>>
>>> Thanks.
>>>
>>> --
>>> *JU Han*
>>>
>>> UTC   -  Université de Technologie de Compiègne
>>> * **GI06 - Fouille de Données et Décisionnel*
>>>
>>> +33 061960
>>>
>>
>>
>
>
> --
> *JU Han*
>
> Software Engineer Intern @ KXEN Inc.
> UTC   -  Université de Technologie de Compiègne
> * **GI06 - Fouille de Données et Décisionnel*
>
> +33 061960
>



-- 
Regards,
Ted Xu


Add jars to worker classpaths

2013-04-29 Thread Mark
What's the best way to add a number of jars to the workers' classpath? Preferably
by only adding something to one of the main configuration files (core-site.xml,
mapred-site.xml), since we don't really want to mess with any of the startup
scripts.

Thanks



Hardware Selection for Hadoop

2013-04-29 Thread Raj Hadoop
Hi,

I have to propose some hardware requirements in my company for a Proof of
Concept with Hadoop. I was reading Hadoop Operations and also saw the Cloudera
website. But I just wanted to know from the group - what are the requirements if
I have to plan for a 5-node cluster? I don't know at this time the data that
needs to be processed for the Proof of Concept. So - can you suggest
something to me?

Regards,
Raj

Re: Hardware Selection for Hadoop

2013-04-29 Thread Marcos Luis Ortiz Valmaseda
Regards, Raj. Knowing the data that you want to process with Hadoop is
critical for this, at least an approximation of the data. I think that
Hadoop Operations is an invaluable resource for this:

- Hadoop uses RAM heavily, so the first resource to consider is giving the
nodes all the RAM that you can, with a marked
focus on the NameNode/JobTracker node.

- For the DataNode/TaskTracker nodes, it is very good to have fast disks;
SSDs are great but expensive, so you can consider this too. For me WD
Barracuda drives are awesome.

- A good network connection between the nodes. Hadoop is an RPC-based
platform, so a good network is critical for a healthy cluster.

A good start for me is for a small cluster:

- NN/JT: 8 to 16 GB RAM
- DN/TT: 4 to 8 GB RAM

Always consider using compression, to optimize the communication between
all services in your Hadoop cluster (Snappy is my favorite).

All this advice is in the Hadoop Operations book from Eric, so it's a
must-read for every Hadoop systems engineer.
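
As an illustration of the compression point, a minimal sketch of turning on Snappy for intermediate map output in a job driver (the property names below are the MR1-era ones; on MR2/YARN the equivalents are mapreduce.map.output.compress and mapreduce.map.output.compress.codec):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapreduce.Job;

public class SnappyMapOutputExample {
  public static Job newCompressedJob() throws Exception {
    Configuration conf = new Configuration();
    // Compress the intermediate map output that is shuffled to the reducers.
    conf.setBoolean("mapred.compress.map.output", true);
    conf.setClass("mapred.map.output.compression.codec",
        SnappyCodec.class, CompressionCodec.class);
    return new Job(conf, "snappy-map-output-job");
  }
}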



2013/4/29 Raj Hadoop 

>Hi,
>
> I have to propose some hardware requirements in my company for a Proof of
> Concept with Hadoop. I was reading Hadoop Operations and also saw Cloudera
> Website. But just wanted to know from the group - what is the requirements
> if I have to plan for a 5 node cluster. I dont know at this time, the data
> that need to be processed at this time for the Proof of Concept. So - can
> you suggest something to me?
>
> Regards,
> Raj
>



-- 
Marcos Ortiz Valmaseda,
*Data-Driven Product Manager* at PDVSA
*Blog*: http://dataddict.wordpress.com/
*LinkedIn: *http://www.linkedin.com/in/marcosluis2186
*Twitter*: @marcosluis2186 


Re: Relationship between HDFS_BYTES_READ and Map input bytes

2013-04-29 Thread Vinod Kumar Vavilapalli

They can be different if the maps read HDFS files directly, instead of or on top of
getting key-value pairs via the map interface.

HDFS_BYTES_READ will always be greater than or equal to map-input-bytes.
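
A minimal sketch of a mapper where the two counters diverge because, besides its regular input, it opens a side file on HDFS directly through the FileSystem API (the side-file path and key/value types are placeholders):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SideFileMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
  private long sideLines = 0;

  @Override
  protected void setup(Context context) throws IOException, InterruptedException {
    // Bytes read here still count toward HDFS_BYTES_READ, but never show up
    // in Map input bytes, so HDFS_BYTES_READ ends up larger.
    FileSystem fs = FileSystem.get(context.getConfiguration());
    BufferedReader in = new BufferedReader(
        new InputStreamReader(fs.open(new Path("/lookup/side-data.txt"))));
    try {
      while (in.readLine() != null) {
        sideLines++;
      }
    } finally {
      in.close();
    }
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    context.write(value, new LongWritable(sideLines));
  }
}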

Thanks,
+Vinod

On Apr 29, 2013, at 1:50 AM, Pralabh Kumar wrote:

> Hi
> 
> What's the relationship between HDFS_BYTE_READ and Map input bytes counter . 
> Why can they be different for particular MR job.
> 
> Thanks and Regards
> Pralabh Kumar
> 



Re: M/R job to a cluster?

2013-04-29 Thread Michel Segel
This is one of the reasons we set up edge nodes in the cluster. This is a node
where Hadoop is loaded yet none of the Hadoop services are running. This
allows jobs to automatically pick up the right Hadoop configuration from the
node and point to the right cluster.

The edge nodes are used for staging jobs and data import into the cluster.
Maybe run the MySQL data store there, and also Hive and Pig jobs.

HTH


Sent from a remote device. Please excuse any typos...

Mike Segel

On Apr 28, 2013, at 4:26 PM, Kevin Burton  wrote:

> Part of the problem is nothing comes up on port 50030. 50070 yes but 50030 
> no. 
> 
> On Apr 28, 2013, at 12:04 PM, shashwat shriparv  
> wrote:
> 
>> check in namenode:50030 if it appears there its not running in localmode 
>> else it is
>> 
>> Thanks & Regards   
>> ∞
>> Shashwat Shriparv
>> 
>> 
>> 
>> On Sun, Apr 28, 2013 at 1:18 AM, sudhakara st  wrote:
>>> Hello Kevin,
>>> 
>>> In the case:
>>> 
>>> JobClient client = new JobClient();
>>> JobConf conf = new JobConf(WordCount.class);
>>> 
>>> Job client(default in local system) picks  configuration information  by 
>>> referring HADOOP_HOME in local system.
>>> 
>>> if your job configuration like this:
>>> Configuration conf = new Configuration();
>>> conf.set("fs.default.name", "hdfs://name_node:9000");
>>> conf.set("mapred.job.tracker", "job_tracker_node:9001");
>>> 
>>> It pickups configuration information  by referring HADOOP_HOME in specified 
>>> namenode and job tracker.
>>> 
>>> Regards,
>>> Sudhakara.st
>>> 
>>> 
>>> On Sat, Apr 27, 2013 at 2:52 AM, Kevin Burton  
>>> wrote:
 It is hdfs://devubuntu05:9000. Is this wrong? Devubuntu05 is the name of 
 the host where the NameNode and JobTracker should be running. It is also 
 the host where I am running the M/R client code.
 
 On Apr 26, 2013, at 4:06 PM, Rishi Yadav  wrote:
 
> check core-site.xml and see value of fs.default.name. if it has localhost 
> you are running locally.
> 
> 
> 
> 
> On Fri, Apr 26, 2013 at 1:59 PM,  wrote:
>> I suspect that my MapReduce job is being run locally. I don't have any 
>> evidence but I am not sure how the specifics of my configuration are 
>> communicated to the Java code that I write. Based on the text that I 
>> have read online basically I start with code like:
>> 
>> JobClient client = new JobClient();
>> JobConf conf = new JobConf(WordCount.class);
>> . . . . .
>> 
>> Where do I communicate the configuration information so that the M/R job 
>> runs on the cluster and not locally? Or is the configuration location 
>> "magically determined"?
>> 
>> Thank you.
>>> 
>>> 
>>> 
>>> -- 
>>>
>>> Regards,
>>> .  Sudhakara.st
>> 


Re: M/R job to a cluster?

2013-04-29 Thread Harsh J
To validate if your jobs are running locally, look for the classname
"LocalJobRunner" in the runtime output.

Configs are sourced either from the classpath (if a dir or jar on the
classpath has the XMLs at their root, they're read), or via the code
(conf.set("mapred.job.tracker", "foo:349");) or also via -D parameters
if you use Tool.

The tool + classpath way is usually the best thing to do, for flexibility.
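
As a small illustrative check (names here are placeholders), the driver can print the keys it actually resolved; if no cluster configs are found, mapred.job.tracker falls back to its default of "local" and the job runs in LocalJobRunner:

import org.apache.hadoop.conf.Configuration;

public class WhereWillItRun {
  public static void main(String[] args) {
    // Loads *-site.xml files found on the classpath, if any.
    Configuration conf = new Configuration();
    System.out.println("fs.default.name    = " + conf.get("fs.default.name", "file:///"));
    System.out.println("mapred.job.tracker = " + conf.get("mapred.job.tracker", "local"));
  }
}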

On Sat, Apr 27, 2013 at 2:29 AM,   wrote:
> I suspect that my MapReduce job is being run locally. I don't have any
> evidence but I am not sure how the specifics of my configuration are
> communicated to the Java code that I write. Based on the text that I have
> read online basically I start with code like:
>
> JobClient client = new JobClient();
> JobConf conf = new JobConf(WordCount.class);
> . . . . .
>
> Where do I communicate the configuration information so that the M/R job
> runs on the cluster and not locally? Or is the configuration location
> "magically determined"?
>
> Thank you.



-- 
Harsh J


Re: Warnings?

2013-04-29 Thread Kevin Burton
If it doesn't work what are my options? Is there source that I can download and 
compile?

On Apr 29, 2013, at 10:31 AM, Ted Xu  wrote:

> Hi Kevin,
> 
> Native libraries are those implemented using C/C++, which only provide code 
> level portability (instead of binary level portability, as Java do). That is 
> to say, the binaries provided by CDH4 distribution will in most cases be 
> broken in your environment. 
> 
> To check if your native libraries are working or not, you can follow the 
> instructions I sent previously. Quote as following.
> 
> 
> During runtime, check the hadoop log files for your MapReduce tasks.
> 
> If everything is all right, then: DEBUG util.NativeCodeLoader - Trying to 
> load the custom-built native-hadoop library... INFO util.NativeCodeLoader - 
> Loaded the native-hadoop library
> If something goes wrong, then: INFO util.NativeCodeLoader - Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> 
> 
> 
> On Mon, Apr 29, 2013 at 10:21 AM, Kevin Burton  
> wrote:
>> I looked at the link you provided and found the Ubuntu is one of the 
>> “supported platforms” but it doesn’t give any information on how to obtain 
>> it or build it. Any idea why it is not includde as part of the Cloudera CDH4 
>> distribution? I followed the installation instructions (mostly apt-get 
>> install . . . .) but I fail to see the libhadoop.so.  In order to avoid this 
>> warning do I need to download the Apache distribution? Which one?
>> 
>>  
>> 
>> For the warnings about the configuration I looked in my configuration and 
>> for this specific example I don’t see ‘session.id’ used anywhere. It must be 
>> used by default. If so why is the deprecated default being used?
>> 
>>  
>> 
>> As for the two warnings about counters. I know I have not implemented any 
>> code for counters so again this must be something internal. Is there 
>> something I am doing to trigger this?
>> 
>>  
>> 
>> So I can avoid them what are “hadoop generic options”?
>> 
>>  
>> 
>> Thanks again.
>> 
>>  
>> 
>> Kevin
>> 
>>  
>> 
>> From: Ted Xu [mailto:t...@gopivotal.com] 
>> Sent: Friday, April 26, 2013 10:49 PM
>> To: user@hadoop.apache.org
>> Subject: Re: Warnings?
>> 
>>  
>> 
>> Hi Kevin,
>> 
>>  
>> 
>> Please see my comments inline,
>> 
>>  
>> 
>> On Sat, Apr 27, 2013 at 11:24 AM, Kevin Burton  
>> wrote:
>> 
>> Is the native library not available for Ubuntu? If so how do I load it?
>> 
>> Native libraries usually requires recompile, for more information please 
>> refer Native Libraries. 
>> 
>>  
>> 
>>  
>> 
>> Can I tell which key is off? Since I am just starting I would want to be as 
>> up to date as possible. It is out of date probably because I copied my 
>> examples from books and tutorials.
>> 
>>  
>> 
>> I think the warning messages are telling it already, "xxx is deprecated, use 
>> xxx instead...". In fact, most of the configure keys are changed from hadoop 
>> 1.x to 2.x. The compatibility change may later documented on 
>> http://wiki.apache.org/hadoop/Compatibility.
>> 
>>  
>> 
>> The main class does derive from Tool. Should I ignore this warning as it 
>> seems to be in error?
>> 
>> Of course you can ignore this warning as long as you don't use hadoop 
>> generic options.
>> 
>>  
>> 
>>  
>> 
>> Thank you.
>> 
>> 
>> On Apr 26, 2013, at 7:49 PM, Ted Xu  wrote:
>> 
>> Hi,
>> 
>>  
>> 
>> First warning is saying hadoop cannot load native library, usually a 
>> compression codec. In that case, hadoop will use java implementation 
>> instead, which is slower.
>> 
>>  
>> 
>> Second is caused by hadoop 1.x/2.x configuration key change. You're using a 
>> 1.x style key under 2.x, yet hadoop still guarantees backward compatibility.
>> 
>>  
>> 
>> Third is saying that the main class of a hadoop application is recommanded 
>> to implement org.apache.hadoop.util.Tool, or else generic command line 
>> options (e.g., -D options) will not supported.   
>> 
>>  
>> 
>> On Sat, Apr 27, 2013 at 5:51 AM,  wrote:
>> 
>> I am running a simple WordCount m/r job and I get output but I get five 
>> warnings that I am not sure if I should pay attention to:
>> 
>>  
>> 
>> 13/04/26 16:24:50 WARN util.NativeCodeLoader: Unable to load native-hadoop 
>> library for your platform... using builtin-java classes where applicable
>> 
>>  
>> 
>> 13/04/26 16:24:50 WARN conf.Configuration: session.id is deprecated. 
>> Instead, use dfs.metrics.session-id
>> 
>>  
>> 
>> 13/04/26 16:24:50 WARN mapred.JobClient: Use GenericOptionsParser for 
>> parsing the arguments. Applications should implement Tool for the same.
>> 
>>  
>> 
>> 13/04/26 16:24:51 WARN mapreduce.Counters: Group 
>> org.apache.hadoop.mapred.Task$Counter is deprecated. Use 
>> org.apache.hadoop.mapreduce.TaskCounter instead
>> 
>>  
>> 
>> 13/04/26 16:24:51 WARN mapreduce.Counters: Counter name MAP_INPUT_BYTES is 
>> deprecated. Use FileInputFormatCounters as group name and  BYTES_READ as 
>> counter name instea

Re: Hardware Selection for Hadoop

2013-04-29 Thread Patai Sangbutsarakum
2 x Quad cores Intel
2-3 TB x 6 SATA
64GB mem
2 NICs teaming

my 2 cents


On Apr 29, 2013, at 9:24 AM, Raj Hadoop 
mailto:hadoop...@yahoo.com>>
 wrote:

Hi,

I have to propose some hardware requirements in my company for a Proof of 
Concept with Hadoop. I was reading Hadoop Operations and also saw Cloudera 
Website. But just wanted to know from the group - what is the requirements if I 
have to plan for a 5 node cluster. I dont know at this time, the data that need 
to be processed at this time for the Proof of Concept. So - can you suggest 
something to me?

Regards,
Raj



Re: Hardware Selection for Hadoop

2013-04-29 Thread Ted Dunning
I think that having more than 6 drives is better.

More memory never hurts.  If you have too little, you may have to run with
fewer slots than optimal.

10GbE networking is good.  If not, having more than two 1GbE ports is good, at
least on distributions that can deal with them properly.


On Mon, Apr 29, 2013 at 11:49 AM, Patai Sangbutsarakum <
patai.sangbutsara...@turn.com> wrote:

>  2 x Quad cores Intel
> 2-3 TB x 6 SATA
> 64GB mem
> 2 NICs teaming
>
>  my 2 cents
>
>
>  On Apr 29, 2013, at 9:24 AM, Raj Hadoop 
>  wrote:
>
>  Hi,
>
> I have to propose some hardware requirements in my company for a Proof of
> Concept with Hadoop. I was reading Hadoop Operations and also saw Cloudera
> Website. But just wanted to know from the group - what is the requirements
> if I have to plan for a 5 node cluster. I dont know at this time, the data
> that need to be processed at this time for the Proof of Concept. So - can
> you suggest something to me?
>
> Regards,
> Raj
>
>
>


Re: Warnings?

2013-04-29 Thread Omkar Joshi
Hi,

Did you check your Ubuntu installation for the "libhadoop" binary? It is
present in my Ubuntu installation at a relative path of (I used the Apache
installation):

"hadoop-common-project/hadoop-common/target/native/target/usr/local/lib"

If present, add it to your LD_LIBRARY_PATH.

If not present, then you can try rebuilding your Hadoop installation:

"mvn clean install -Pnative -Pdist -Dtar -DskipTests"


Thanks,
Omkar Joshi
*Hortonworks Inc.* 


On Mon, Apr 29, 2013 at 11:19 AM, Kevin Burton wrote:

> If it doesn't work what are my options? Is there source that I can
> download and compile?
>
> On Apr 29, 2013, at 10:31 AM, Ted Xu  wrote:
>
> Hi Kevin,
>
> Native libraries are those implemented using C/C++, which only provide
> code level portability (instead of binary level portability, as Java do).
> That is to say, the binaries provided by CDH4 distribution will in most
> cases be broken in your environment.
>
> To check if your native libraries are working or not, you can follow the
> instructions I sent previously. Quote as following.
>
> 
>
> During runtime, check the hadoop log files for your MapReduce tasks.
>
>- If everything is all right, then: DEBUG util.NativeCodeLoader -
>Trying to load the custom-built native-hadoop library... INFO
>util.NativeCodeLoader - Loaded the native-hadoop library
>- If something goes wrong, then: INFO util.NativeCodeLoader - Unable
>to load native-hadoop library for your platform... using builtin-java
>classes where applicable
>
> 
>
>
> On Mon, Apr 29, 2013 at 10:21 AM, Kevin Burton 
> wrote:
>
>> I looked at the link you provided and found the Ubuntu is one of the
>> “supported platforms” but it doesn’t give any information on how to obtain
>> it or build it. Any idea why it is not includde as part of the Cloudera
>> CDH4 distribution? I followed the installation instructions (mostly apt-get
>> install . . . .) but I fail to see the libhadoop.so.  In order to avoid
>> this warning do I need to download the Apache distribution? Which one?***
>> *
>>
>> ** **
>>
>> For the warnings about the configuration I looked in my configuration and
>> for this specific example I don’t see ‘session.id’ used anywhere. It
>> must be used by default. If so why is the deprecated default being used?
>> 
>>
>> ** **
>>
>> As for the two warnings about counters. I know I have not implemented any
>> code for counters so again this must be something internal. Is there
>> something I am doing to trigger this?
>>
>> ** **
>>
>> So I can avoid them what are “hadoop generic options”?
>>
>> ** **
>>
>> Thanks again.
>>
>> ** **
>>
>> Kevin
>>
>> ** **
>>
>> *From:* Ted Xu [mailto:t...@gopivotal.com]
>> *Sent:* Friday, April 26, 2013 10:49 PM
>> *To:* user@hadoop.apache.org
>> *Subject:* Re: Warnings?
>>
>> ** **
>>
>> Hi Kevin,
>>
>> ** **
>>
>> Please see my comments inline,
>>
>> ** **
>>
>> On Sat, Apr 27, 2013 at 11:24 AM, Kevin Burton 
>> wrote:
>>
>> Is the native library not available for Ubuntu? If so how do I load it?**
>> **
>>
>> Native libraries usually requires recompile, for more information please
>> refer Native 
>> Libraries
>> . 
>>
>> ** **
>>
>> ** **
>>
>> Can I tell which key is off? Since I am just starting I would want to be
>> as up to date as possible. It is out of date probably because I copied my
>> examples from books and tutorials.
>>
>> ** **
>>
>> I think the warning messages are telling it already, "xxx is deprecated,
>> use xxx instead...". In fact, most of the configure keys are changed from
>> hadoop 1.x to 2.x. The compatibility change may later documented on
>> http://wiki.apache.org/hadoop/Compatibility.
>>
>>  
>>
>> The main class does derive from Tool. Should I ignore this warning as it
>> seems to be in error?
>>
>> Of course you can ignore this warning as long as you don't use hadoop
>> generic options.
>>
>>  
>>
>> ** **
>>
>> Thank you.
>>
>>
>> On Apr 26, 2013, at 7:49 PM, Ted Xu  wrote:
>>
>> Hi,
>>
>> ** **
>>
>> First warning is saying hadoop cannot load native library, usually a
>> compression codec. In that case, hadoop will use java implementation
>> instead, which is slower.
>>
>> ** **
>>
>> Second is caused by hadoop 1.x/2.x configuration key change. You're using
>> a 1.x style key under 2.x, yet hadoop still guarantees backward
>> compatibility.
>>
>> ** **
>>
>> Third is saying that the main class of a hadoop application is
>> recommended to implement org.apache.hadoop.util.Tool,
>> or else generic command line options (e.g., -D options) will not be supported.
>>
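(For reference, a bare-bones driver using the Tool/ToolRunner pattern looks
roughly like the sketch below; the class name, job name and the commented-out
settings are placeholders for illustration, not code taken from this thread.)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCountDriver extends Configured implements Tool {

  @Override
  public int run(String[] args) throws Exception {
    // getConf() already carries any -D key=value options parsed by ToolRunner
    Configuration conf = getConf();
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCountDriver.class);
    // job.setMapperClass(...); job.setReducerClass(...);
    // FileInputFormat.addInputPath(job, new Path(args[0])); etc.
    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    // ToolRunner strips the generic options (-D, -conf, -libjars, -files, ...)
    // before handing the remaining arguments to run()
    System.exit(ToolRunner.run(new Configuration(), new WordCountDriver(), args));
  }
}

With a driver shaped like this, the "Applications should implement Tool"
warning goes away and generic options such as -D mapred.reduce.tasks=2 are
honoured on the command line.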
>> On Sat, Apr 27, 2013 at 5:51 AM,  wrote:
>>
>> I am running a simple WordCount m/r job and I get output but I get five
>> warnings that I am n

Gap in logs?

2013-04-29 Thread rkevinburton


I see a startup error in the 
/var/log/hadoop-hdfs/hadoop-hdfs-namenode-.log


2013-04-29 14:12:36,095 FATAL 
org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode 
join
java.io.IOException: There appears to be a gap in the edit log.  We 
expected txid 1, but got txid 2103.
at 
org.apache.hadoop.hdfs.server.namenode.MetaRecoveryContext.editLogLoaderPrompt(MetaRecoveryContext.java:94)


. . . .

Apparently this is severe enough that it causes the name node not to 
start up. Any ideas on how to recover? I am tempted to just remove all 
of the log files and start from scratch. Is that too drastic?


Re: Warnings?

2013-04-29 Thread Harsh J
The env-var is auto-created by the "hadoop" script for you when you
invoke "hadoop jar". You do not necessarily have to manually set it,
nor do you have to compile the native libs if what you're using is
pre-built for your OS.
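(As a quick sanity check, the loader status can also be queried from code.
The tiny class below is only an illustration -- the class name is made up --
and it assumes it is launched through the "hadoop" script so that the same
library path the script builds up is in effect:)

import org.apache.hadoop.util.NativeCodeLoader;

public class NativeLibCheck {
  public static void main(String[] args) {
    // true if libhadoop.so was found and loaded for this JVM,
    // false if Hadoop fell back to the builtin-java classes
    System.out.println("native hadoop loaded: " + NativeCodeLoader.isNativeCodeLoaded());
  }
}

Running it through the "hadoop" script (for example with the class on
HADOOP_CLASSPATH) reports the same thing the NativeCodeLoader log lines show
at task startup.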

On Tue, Apr 30, 2013 at 12:52 AM,   wrote:
> I don't have this environment  variable. Should I create it in .bashrc AND
> /etc/profile?
>
>
> On Mon, Apr 29, 2013 at 1:55 PM, Omkar Joshi wrote:
>
>  Hi,
>
> did you check your ubuntu installation for the "libhadoop" binary? It is
> present in my ubuntu installation at a relative path of (I used the apache
> installation)
>
> "hadoop-common-project/hadoop-common/target/native/target/usr/local/lib"
>
>
> if present, add it to your LD_LIBRARY_PATH.
>
>
> if not present then you can try rebuilding your hadoop installation
>
>
> "mvn clean install -Pnative -Pdist -Dtar -DskipTests"
>
>
>
> Thanks, Omkar Joshi
> Hortonworks Inc.
>
>
> On Mon, Apr 29, 2013 at 11:19 AM, Kevin Burton < rkevinbur...@charter.net>
> wrote:
> If it doesn't work what are my options? Is there source that I can download
> and compile?
>
> On Apr 29, 2013, at 10:31 AM, Ted Xu < t...@gopivotal.com> wrote:
>
> Hi Kevin,
> Native libraries are those implemented using C/C++, which only provide code
> level portability (instead of binary level portability, as Java does). That is
> to say, the binaries provided by the CDH4 distribution will in most cases be
> broken in your environment.
>
> To check if your native libraries are working or not, you can follow the
> instructions I sent previously. Quote as following.
>
> 
> During runtime, check the hadoop log files for your MapReduce tasks.
>
>• If everything is all right, then:  DEBUG util.NativeCodeLoader - Trying
> to load the custom-built native-hadoop library...   INFO
> util.NativeCodeLoader - Loaded the native-hadoop library
>• If something goes wrong, then:  INFO util.NativeCodeLoader - Unable to
> load native-hadoop library for your platform... using builtin-java classes
> where applicable
>
> 
>
>
> On Mon, Apr 29, 2013 at 10:21 AM, Kevin Burton < rkevinbur...@charter.net>
> wrote:
> I looked at the link you provided and found that Ubuntu is one of the
> “supported platforms” but it doesn’t give any information on how to obtain
> it or build it. Any idea why it is not included as part of the Cloudera CDH4
> distribution? I followed the installation instructions (mostly apt-get
> install . . . .) but I fail to see the libhadoop.so.  In order to avoid this
> warning do I need to download the Apache distribution? Which one?
>
> For the warnings about the configuration I looked in my configuration and
> for this specific example I don’t see ‘session.id’ used anywhere. It must
> be used by default. If so, why is the deprecated default being used?
>
> As for the two warnings about counters. I know I have not implemented any
> code for counters so again this must be something internal. Is there
> something I am doing to trigger this?
>
> So I can avoid them what are “hadoop generic options”?
>
> Thanks again.
>
> Kevin
>
> From: Ted Xu [mailto: t...@gopivotal.com]
> Sent: Friday, April 26, 2013 10:49 PM
> To: user@hadoop.apache.org
> Subject: Re: Warnings?
>
> Hi Kevin,
>
> Please see my comments inline,
>
> On Sat, Apr 27, 2013 at 11:24 AM, Kevin Burton < rkevinbur...@charter.net>
> wrote:
> Is the native library not available for Ubuntu? If so how do I load it?
> Native libraries usually require recompilation; for more information please
> refer to Native Libraries.
>
>
> Can I tell which key is off? Since I am just starting I would want to be as
> up to date as possible. It is out of date probably because I copied my
> examples from books and tutorials.
>
> I think the warning messages are telling you already, "xxx is deprecated, use
> xxx instead...". In fact, most of the configuration keys were changed from hadoop
> 1.x to 2.x. The compatibility changes may later be documented at
> http://wiki.apache.org/hadoop/Compatibility.
>
> The main class does derive from Tool. Should I ignore this warning as it
> seems to be in error?
> Of course you can ignore this warning as long as you don't use hadoop
> generic options.
>
>
> Thank you.
>
> On Apr 26, 2013, at 7:49 PM, Ted Xu < t...@gopivotal.com> wrote:
> Hi,
>
> First warning is saying hadoop cannot load native library, usually a
> compression codec. In that case, hadoop will use java implementation
> instead, which is slower.
>
> Second is caused by hadoop 1.x/2.x configuration key change. You're using a
> 1.x style key under 2.x, yet hadoop still guarantees backward compatibility.
>
> Third is saying that the main class of a hadoop application is recommended
> to implement org.apache.hadoop.util.Tool, or else generic command line
> options (e.g., -D options) will not be supported.
>
> On Sat, Apr 27, 2013 at 5:51 AM, < rkevinbur...@charter.net> wrote:
> I am running a simple WordCount m/r job and I get output but I get five
> warnings that I am not sure if I should pay attention to:
>
> 13/04

Re: Hardware Selection for Hadoop

2013-04-29 Thread Raj Hadoop
Hi,
 
In 5 node cluster - you mean
 
Name Node , Job Tracker , Secondary Name Node all on 1 machine with
64 GB RAM ( Processor - 2 x Quad core Intel , Storage - ? )
 
Data Nodes and Task Trackers - on 4 machines - each with
32 GB RAM ( Processor - 2 x Quad core Intel , Storage - ? )
 
NIC ?
 
Also - what other details should I provide to my hardware engineer?
 
The idea is to start with a Web Log Processing proof of concept.
 
Please advise.
 



From: Patai Sangbutsarakum 
To: "user@hadoop.apache.org"  
Sent: Monday, April 29, 2013 2:49 PM
Subject: Re: Hardware Selection for Hadoop



2 x Quad cores Intel 
2-3 TB x 6 SATA
64GB mem
2 NICs teaming

my 2 cents



On Apr 29, 2013, at 9:24 AM, Raj Hadoop 
 wrote:

Hi,
>
>I have to propose some hardware requirements in my company for a Proof of 
>Concept with Hadoop. I was reading Hadoop Operations and also saw Cloudera 
>Website. But just wanted to know from the group - what is the requirements if 
>I have to plan for a 5 node cluster. I dont know at this time, the data that 
>need to be processed at this time for the Proof of Concept. So - can you 
>suggest something to me?
>
>Regards,
>Raj

Permissions

2013-04-29 Thread rkevinburton


I look in the name node log and I get the following errors:

2013-04-29 15:25:11,646 ERROR 
org.apache.hadoop.security.UserGroupInformation: 
PriviledgedActionException as:mapred (auth:SIMPLE) 
cause:org.apache.hadoop.security.AccessControlException: Permission 
denied: user=mapred, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x


2013-04-29 15:25:11,646 INFO org.apache.hadoop.ipc.Server: IPC Server 
handler 6 on 9000, call 
org.apache.hadoop.hdfs.protocol.ClientProtocol.mkdirs from 
172.16.26.68:45044: error: 
org.apache.hadoop.security.AccessControlException: Permission denied: 
user=mapred, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x
org.apache.hadoop.security.AccessControlException: Permission denied: 
user=mapred, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:205)
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:186)


When I create the file system, the user hdfs owns the root folder
(/). I am not sure how to give both the users mapred and hdfs access
to the root (which it seems these errors are indicating).


I get a page from port 50070, but when I try to browse the filesystem from the
web UI I get an error that there are no nodes listening (I have 3 data
nodes and 1 namenode). The browser indicates that there is nothing
listening on port 50030, so it seems that the JobTracker is not up.


reducer gets values with empty attributes

2013-04-29 Thread alxsss


Hello,

I am trying to write a mapreduce program in hadoop-1.0.4 using the mapred libs. I have a
map function which gets
keys and creates a different object with a few attributes (like id, etc.) and
passes it to the reducer function using

 output.collect(key, value);

The reducer gets the keys, but the values have empty fields (like id, etc.), although
map correctly assigns these fields to each value.

Any ideas why this happens?

Thanks.
Alex.


Incompatible clusterIDs

2013-04-29 Thread rkevinburton


I am trying to start up a cluster and in the datanode log on the 
NameNode server I get the error:


2013-04-29 15:50:20,988 INFO 
org.apache.hadoop.hdfs.server.common.Storage: Lock on 
/data/hadoop/dfs/data/in_use.lock acquired by nodename 1406@devUbuntu05
2013-04-29 15:50:20,990 FATAL 
org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed 
for block pool Block pool BP-1306349046-172.16.26.68-1367256199559 
(storage id DS-403514403-172.16.26.68-50010-1366406077018) service to 
devUbuntu05/172.16.26.68:9000
java.io.IOException: Incompatible clusterIDs in /data/hadoop/dfs/data: 
namenode clusterID = CID-23b9f9c7-2c25-411f-8bd2-4d5c9d7c25a1; datanode 
clusterID = CID-e3f6b811-c1b4-4778-a31e-14dea8b2cca8
at 
org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:391)
at 
org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:191)
at 
org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:219)


How do I get around this error? What does the error mean?

Thank you.

Kevin


Re: Hardware Selection for Hadoop

2013-04-29 Thread Mohammad Tariq
If I were to start with a 5 node cluster, I would do this :

*Machine 1 : *NN+JT
32GB RAM, 2xQuad Core Proc, 500GB SATA HDD along with a NAS(To make sure
metadata is safe)

*Machine 2 : *SNN
32GB RAM, 2xQuad Core Proc, 500GB SATA HDD

*Machine 3,4,5 : *DN+TT
16GB RAM, 2xQuad Core Proc, 5 x 200GB SATA HDD (JBOD configuration)

I don't think you'll require 64GB RAM and so much storage just for a
POC (but it actually depends). You can really kick ass with 32GB.

NIC (network interface card) is a computer hardware component that connects
a computer to a computer network. It must be reliable to make sure that all
your machines are always connected to the cluster and there is no problem
in the data transfer.

Apart from this, ask them to provide you with cabinets that have good ventilation
and a cooling mechanism.

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com


On Tue, Apr 30, 2013 at 2:17 AM, Raj Hadoop  wrote:

> Hi,
>
> In 5 node cluster - you mean
>
> Name Node , Job Tracker , Secondary Name Node all on 1
> 64 GB Ram ( Processor - 2 x Quad cores Intel  , Storage - ? )
>
> Data Trackers and Job Trackers - on 4 machies - each of
> 32 GB Ram ( Processor - 2 x Quad cores Intel  , Storage - ? )
>
> NIC ?
>
> Also - what other details should I provide to my hardware engineer.
>
> The idea is to start with a Web Log Processing proof of concept.
>
> Please advise.
>
>
>   *From:* Patai Sangbutsarakum 
> *To:* "user@hadoop.apache.org" 
> *Sent:* Monday, April 29, 2013 2:49 PM
> *Subject:* Re: Hardware Selection for Hadoop
>
>  2 x Quad cores Intel
> 2-3 TB x 6 SATA
> 64GB mem
> 2 NICs teaming
>
> my 2 cents
>
>
>  On Apr 29, 2013, at 9:24 AM, Raj Hadoop 
>  wrote:
>
>  Hi,
>
> I have to propose some hardware requirements in my company for a Proof of
> Concept with Hadoop. I was reading Hadoop Operations and also saw Cloudera
> Website. But just wanted to know from the group - what is the requirements
> if I have to plan for a 5 node cluster. I dont know at this time, the data
> that need to be processed at this time for the Proof of Concept. So - can
> you suggest something to me?
>
> Regards,
> Raj
>
>
>
>
>


Re: Incompatible clusterIDs

2013-04-29 Thread Mohammad Tariq
Hello Kevin,

  Have you reformatted the NN (unsuccessfully)? Was your NN serving
some other cluster earlier, or were your DNs part of some other
cluster? Datanodes bind themselves to the namenode through the namespaceID, and in
your case the IDs of the DNs and NN seem to be different. As a workaround you
could do this :

1- Stop all the daemons.
2- Go to the directory which you have specified as the value of
"dfs.name.dir" property in your hdfs-site.xml file.
3- You'll find a directory called "current" inside this directory where a
file named "VERSION" will be present. Open this file and copy the value of
"namespaceID" form here.
4- Now go to the directory which you have specified as the value of
"dfs.data.dir" property in your hdfs-site.xml file.
5- Move inside the "current" directory and open the "VERSION" file here as
well. Now replace the value of "namespaceID" present here with the one you
had copied earlier.
6- Restart all the daemons.

Note : If you have not created dfs.name.dir and dfs.data.dir separately,
you could find all this inside your temp directory.

HTH

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com


On Tue, Apr 30, 2013 at 2:45 AM,  wrote:

> I am trying to start up a cluster and in the datanode log on the NameNode
> server I get the error:
>
> 2013-04-29 15:50:20,988 INFO org.apache.hadoop.hdfs.server.common.Storage:
> Lock on /data/hadoop/dfs/data/in_use.lock acquired by nodename
> 1406@devUbuntu05
> 2013-04-29 15:50:20,990 FATAL
> org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for
> block pool Block pool BP-1306349046-172.16.26.68-1367256199559 (storage id
> DS-403514403-172.16.26.68-50010-1366406077018) service to devUbuntu05/
> 172.16.26.68:9000
> java.io.IOException: *Incompatible clusterIDs* in /data/hadoop/dfs/data:
> namenode clusterID = CID-23b9f9c7-2c25-411f-8bd2-4d5c9d7c25a1; datanode
> clusterID = CID-e3f6b811-c1b4-4778-a31e-14dea8b2cca8
> at
> org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:391)
> at
> org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:191)
> at
> org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:219)
>
> How do I get around this error? What does the error mean?
>
> Thank you.
>
> Kevin
>


Re: Incompatible clusterIDs

2013-04-29 Thread Kevin Burton
Thank you, the HDFS system seems to be up. Now I am having a problem with
getting the JobTracker and TaskTracker up. According to the logs on the
JobTracker, mapred doesn't have write permission to /. I am not clear on what
the permissions should be.

Anyway, thank you.

On Apr 29, 2013, at 4:30 PM, Mohammad Tariq  wrote:

> Hello Kevin,
> 
>   Have you reformatted the NN(unsuccessfully)?Was your NN serving 
> some other cluster earlier or your DNs were part of some other 
> cluster?Datanodes bind themselves to namenode through namespaceID and in your 
> case the IDs of DNs and NN seem to be different. As a workaround you could do 
> this :
> 
> 1- Stop all the daemons.
> 2- Go to the directory which you have specified as the value of 
> "dfs.name.dir" property in your hdfs-site.xml file.
> 3- You'll find a directory called "current" inside this directory where a 
> file named "VERSION" will be present. Open this file and copy the value of 
> "namespaceID" form here.
> 4- Now go to the directory which you have specified as the value of 
> "dfs.data.dir" property in your hdfs-site.xml file.
> 5- Move inside the "current" directory and open the "VERSION" file here as 
> well. Now replace the value of "namespaceID" present here with the one you 
> had copied earlier.
> 6- Restart all the daemons.
> 
> Note : If you have not created dfs.name.dir and dfs.data.dir separately, you 
> could find all this inside your temp directory.
> 
> HTH
> 
> Warm Regards,
> Tariq
> https://mtariq.jux.com/
> cloudfront.blogspot.com
> 
> 
> On Tue, Apr 30, 2013 at 2:45 AM,  wrote:
>> I am trying to start up a cluster and in the datanode log on the NameNode 
>> server I get the error:
>> 
>> 2013-04-29 15:50:20,988 INFO org.apache.hadoop.hdfs.server.common.Storage: 
>> Lock on /data/hadoop/dfs/data/in_use.lock acquired by nodename 
>> 1406@devUbuntu05
>> 2013-04-29 15:50:20,990 FATAL 
>> org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for 
>> block pool Block pool BP-1306349046-172.16.26.68-1367256199559 (storage id 
>> DS-403514403-172.16.26.68-50010-1366406077018) service to 
>> devUbuntu05/172.16.26.68:9000
>> java.io.IOException: Incompatible clusterIDs in /data/hadoop/dfs/data: 
>> namenode clusterID = CID-23b9f9c7-2c25-411f-8bd2-4d5c9d7c25a1; datanode 
>> clusterID = CID-e3f6b811-c1b4-4778-a31e-14dea8b2cca8
>> at 
>> org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:391)
>> at 
>> org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:191)
>> at 
>> org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:219)
>> 
>> How do I get around this error? What does the error mean?
>> 
>> Thank you.
>> 
>> Kevin
> 


Re: Incompatible clusterIDs

2013-04-29 Thread Mohammad Tariq
make it 755.

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com


On Tue, Apr 30, 2013 at 3:30 AM, Kevin Burton wrote:

> Thank you the HDFS system seems to be up. Now I am having a problem with
> getting the JobTracker and TaskTracker up. According to the logs on the
> JobTracker mapred doesn't have  write permission to /. I am not clear on
> what the permissions should be.
>
> Anyway, thank you.
>
> On Apr 29, 2013, at 4:30 PM, Mohammad Tariq  wrote:
>
> Hello Kevin,
>
>   Have you reformatted the NN(unsuccessfully)?Was your NN serving
> some other cluster earlier or your DNs were part of some other
> cluster?Datanodes bind themselves to namenode through namespaceID and in
> your case the IDs of DNs and NN seem to be different. As a workaround you
> could do this :
>
> 1- Stop all the daemons.
> 2- Go to the directory which you have specified as the value of
> "dfs.name.dir" property in your hdfs-site.xml file.
> 3- You'll find a directory called "current" inside this directory where a
> file named "VERSION" will be present. Open this file and copy the value of
> "namespaceID" form here.
> 4- Now go to the directory which you have specified as the value of
> "dfs.data.dir" property in your hdfs-site.xml file.
> 5- Move inside the "current" directory and open the "VERSION" file here as
> well. Now replace the value of "namespaceID" present here with the one you
> had copied earlier.
> 6- Restart all the daemons.
>
> Note : If you have not created dfs.name.dir and dfs.data.dir separately,
> you could find all this inside your temp directory.
>
> HTH
>
> Warm Regards,
> Tariq
> https://mtariq.jux.com/
> cloudfront.blogspot.com
>
>
> On Tue, Apr 30, 2013 at 2:45 AM,  wrote:
>
>> I am trying to start up a cluster and in the datanode log on the NameNode
>> server I get the error:
>>
>> 2013-04-29 15:50:20,988 INFO
>> org.apache.hadoop.hdfs.server.common.Storage: Lock on
>> /data/hadoop/dfs/data/in_use.lock acquired by nodename 1406@devUbuntu05
>> 2013-04-29 15:50:20,990 FATAL
>> org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for
>> block pool Block pool BP-1306349046-172.16.26.68-1367256199559 (storage id
>> DS-403514403-172.16.26.68-50010-1366406077018) service to devUbuntu05/
>> 172.16.26.68:9000
>> java.io.IOException: *Incompatible clusterIDs* in /data/hadoop/dfs/data:
>> namenode clusterID = CID-23b9f9c7-2c25-411f-8bd2-4d5c9d7c25a1; datanode
>> clusterID = CID-e3f6b811-c1b4-4778-a31e-14dea8b2cca8
>> at
>> org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:391)
>> at
>> org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:191)
>> at
>> org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:219)
>>
>> How do I get around this error? What does the error mean?
>>
>> Thank you.
>>
>> Kevin
>>
>
>


Re: Incompatible clusterIDs

2013-04-29 Thread Kevin Burton
"It" is '/'?

On Apr 29, 2013, at 5:09 PM, Mohammad Tariq  wrote:

> make it 755.
> 
> Warm Regards,
> Tariq
> https://mtariq.jux.com/
> cloudfront.blogspot.com
> 
> 
> On Tue, Apr 30, 2013 at 3:30 AM, Kevin Burton  
> wrote:
>> Thank you the HDFS system seems to be up. Now I am having a problem with 
>> getting the JobTracker and TaskTracker up. According to the logs on the 
>> JobTracker mapred doesn't have  write permission to /. I am not clear on 
>> what the permissions should be.
>> 
>> Anyway, thank you.
>> 
>> On Apr 29, 2013, at 4:30 PM, Mohammad Tariq  wrote:
>> 
>>> Hello Kevin,
>>> 
>>>   Have you reformatted the NN(unsuccessfully)?Was your NN serving 
>>> some other cluster earlier or your DNs were part of some other 
>>> cluster?Datanodes bind themselves to namenode through namespaceID and in 
>>> your case the IDs of DNs and NN seem to be different. As a workaround you 
>>> could do this :
>>> 
>>> 1- Stop all the daemons.
>>> 2- Go to the directory which you have specified as the value of 
>>> "dfs.name.dir" property in your hdfs-site.xml file.
>>> 3- You'll find a directory called "current" inside this directory where a 
>>> file named "VERSION" will be present. Open this file and copy the value of 
>>> "namespaceID" form here.
>>> 4- Now go to the directory which you have specified as the value of 
>>> "dfs.data.dir" property in your hdfs-site.xml file.
>>> 5- Move inside the "current" directory and open the "VERSION" file here as 
>>> well. Now replace the value of "namespaceID" present here with the one you 
>>> had copied earlier.
>>> 6- Restart all the daemons.
>>> 
>>> Note : If you have not created dfs.name.dir and dfs.data.dir separately, 
>>> you could find all this inside your temp directory.
>>> 
>>> HTH
>>> 
>>> Warm Regards,
>>> Tariq
>>> https://mtariq.jux.com/
>>> cloudfront.blogspot.com
>>> 
>>> 
>>> On Tue, Apr 30, 2013 at 2:45 AM,   wrote:
 I am trying to start up a cluster and in the datanode log on the NameNode 
 server I get the error:
 
 2013-04-29 15:50:20,988 INFO org.apache.hadoop.hdfs.server.common.Storage: 
 Lock on /data/hadoop/dfs/data/in_use.lock acquired by nodename 
 1406@devUbuntu05
 2013-04-29 15:50:20,990 FATAL 
 org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for 
 block pool Block pool BP-1306349046-172.16.26.68-1367256199559 (storage id 
 DS-403514403-172.16.26.68-50010-1366406077018) service to 
 devUbuntu05/172.16.26.68:9000
 java.io.IOException: Incompatible clusterIDs in /data/hadoop/dfs/data: 
 namenode clusterID = CID-23b9f9c7-2c25-411f-8bd2-4d5c9d7c25a1; datanode 
 clusterID = CID-e3f6b811-c1b4-4778-a31e-14dea8b2cca8
 at 
 org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:391)
 at 
 org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:191)
 at 
 org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:219)
 
 How do I get around this error? What does the error mean?
 
 Thank you.
 
 Kevin
> 


Re: reducer gets values with empty attributes

2013-04-29 Thread Mahesh Balija
Hi Alex,

 Can you please attach your code and the sample input data?

Best,
Mahesh Balija,
Calsoft Labs.


On Tue, Apr 30, 2013 at 2:29 AM,  wrote:

>
> Hello,
>
> I try to write mapreduce program in hadoop -1.0.4. using mapred libs. I have 
> a map function which gets
>
> keys and creates a different object with a few attributes like id and etc and 
> passes it to reducer function using
>
>
>  output.collect(key, value);
>
> Reducer gets keys, but values  has empty  fields (like id and etc) , although 
> map correctly assigns these fields to each value.
>
> Any ideas why this happens?
>
> Thanks.
> Alex.
>
>