ClassNotFoundException

2010-12-28 Thread Cavus,M.,Fa. Post Direkt
Hi,

I run this command: ./hadoop jar /home/userme/hd.jar
org.postdirekt.hadoop.WordCount gutenberg gutenberberg-output

and get the error below. Why? The jar file does contain
org.postdirekt.hadoop.Map.

 

10/12/28 15:28:30 INFO mapreduce.Job: Task Id : attempt_201012281524_0002_m_00_0, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException: org.postdirekt.hadoop.Map
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1128)
        at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:167)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:612)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:328)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:742)
        at org.apache.hadoop.mapred.Child.main(Child.java:211)
Caused by: java.lang.ClassNotFoundException: org.postdirekt.hadoop.Map
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
        at sun.m

10/12/28 15:28:41 INFO mapreduce.Job: Task Id : attempt_201012281524_0002_m_00_1, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException: org.postdirekt.hadoop.Map
        [same stack trace as the first failed attempt]

10/12/28 15:28:53 INFO mapreduce.Job: Task Id : attempt_201012281524_0002_m_00_2, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException: org.postdirekt.hadoop.Map
        [same stack trace as the first failed attempt]

10/12/28 15:29:09 INFO mapreduce.Job: Job complete: job_201012281524_0002
10/12/28 15:29:09 INFO mapreduce.Job: Counters: 7
        Job Counters
                Data-local map tasks=4
                Total time spent by all maps waiting after reserving slots (ms)=0
                Total time spent by all reduces waiting after reserving slots (ms)=0
                Failed map tasks=1
                SLOTS_MILLIS_MAPS=45636
                SLOTS_MILLIS_REDUCES=0
                Launched map tasks=4

 



Re: ClassNotFoundException

2010-12-28 Thread James Seigel
Run jar -tvf on the jar file and double-check that the class is listed. It
can't just be inside an included (nested) jar file.
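
A concrete invocation against the jar from the original command would look
roughly like this (the grep pattern is just one way to filter the listing):

    jar -tvf /home/userme/hd.jar | grep postdirekt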

Sent from my mobile. Please excuse the typos.

On 2010-12-28, at 7:58 AM, Cavus,M.,Fa. Post Direkt
m.ca...@postdirekt.de wrote:

 [original message and stack traces quoted in full; trimmed]





RE: ClassNotFoundException

2010-12-28 Thread Cavus,M.,Fa. Post Direkt
What must I do James?

-Original Message-
From: James Seigel [mailto:ja...@tynt.com] 
Sent: Tuesday, December 28, 2010 4:03 PM
To: common-user@hadoop.apache.org
Subject: Re: ClassNotFoundException

jar -tvf the jar file and double check that it is a class that is
listed. Can't be in an included jar file.

Sent from my mobile. Please excuse the typos.

On 2010-12-28, at 7:58 AM, Cavus,M.,Fa. Post Direkt
m.ca...@postdirekt.de wrote:

 [original message and stack traces quoted in full; trimmed]


Re: ClassNotFoundException

2010-12-28 Thread Praveen Bathala
Just run this and make sure you really have the class file in the jar:

jar -tvf <jar file> | grep org.postdirekt.hadoop.Map

If you don't get any output, then you don't have the class file in your jar.

+ Praveen

On Dec 28, 2010, at 9:12 AM, Cavus,M.,Fa. Post Direkt wrote:

 [earlier messages and stack traces trimmed]

RE: ClassNotFoundException

2010-12-28 Thread Cavus,M.,Fa. Post Direkt
Hi Praveen, I get this

2398 Mon Dec 27 16:19:16 CET org/postdirekt/hadoop/Map.class


-Original Message-
From: Praveen Bathala [mailto:pbatha...@gmail.com] 
Sent: Tuesday, December 28, 2010 4:17 PM
To: common-user@hadoop.apache.org
Subject: Re: ClassNotFoundException

[Praveen's instructions and the earlier messages trimmed]

RE: ClassNotFoundException

2010-12-28 Thread Black, Michael (IS)
I'm using hadoop-0.20.2 and I see this for my map/reduce classes:

com/ngc/asoc/recommend/Predict$Counter.class
com/ngc/asoc/recommend/Predict$R.class
com/ngc/asoc/recommend/Predict$M.class
com/ngc/asoc/recommend/Predict.class

I'm a Java idiot, so I don't know why they appear, but perhaps you have
something similar?
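
For reference, a sketch of why those names appear (the type names are taken
from the listing above, but the bodies are assumptions, not Michael's actual
code): javac writes one .class file per nested type, named Outer$Inner.class.

    // Compiling this single source file yields Predict.class plus one
    // Predict$X.class file for every nested type declared inside it.
    public class Predict {
        enum Counter { ROWS, ERRORS }          // -> Predict$Counter.class
        static class M { /* map logic */ }     // -> Predict$M.class
        static class R { /* reduce logic */ }  // -> Predict$R.class
    }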
 
Michael D. Black
Senior Scientist
Advanced Analytics Directorate
Northrop Grumman Information Systems
 



From: Cavus,M.,Fa. Post Direkt [mailto:m.ca...@postdirekt.de]
Sent: Tue 12/28/2010 9:21 AM
To: common-user@hadoop.apache.org
Subject: EXTERNAL:RE: ClassNotFoundException



Hi Praveen, I get this

2398 Mon Dec 27 16:19:16 CET org/postdirekt/hadoop/Map.class






Re: ClassNotFoundException

2010-12-28 Thread Harsh J
In your job driver class (WordCount, as per that command), have you
specified the jar by calling Job.setJarByClass() [or, on the stable API,
JobConf.setJarByClass()]?

I'm not sure if hadoop.util.RunJar automatically sends the jar across
for distribution to TaskTrackers.
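
For reference, a minimal new-API WordCount driver sketch that sets the jar;
this is not the poster's actual code, and the class names are only
illustrative. The line that matters for this thread is job.setJarByClass(...),
which tells Hadoop which jar to ship to the task nodes.

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();
            protected void map(LongWritable key, Text value, Context ctx)
                    throws IOException, InterruptedException {
                // emit (word, 1) for every token in the line
                StringTokenizer it = new StringTokenizer(value.toString());
                while (it.hasMoreTokens()) {
                    word.set(it.nextToken());
                    ctx.write(word, ONE);
                }
            }
        }

        public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
            protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                    throws IOException, InterruptedException {
                // sum the counts for each word
                int sum = 0;
                for (IntWritable v : values) sum += v.get();
                ctx.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "wordcount");
            job.setJarByClass(WordCount.class); // without this, task JVMs may not find Map/Reduce
            job.setMapperClass(Map.class);
            job.setReducerClass(Reduce.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }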

On Tue, Dec 28, 2010 at 8:27 PM, Cavus,M.,Fa. Post Direkt
m.ca...@postdirekt.de wrote:
 Hi,

 I process this command: ./hadoop jar /home/userme/hd.jar
 org.postdirekt.hadoop.WordCount gutenberg gutenberberg-output



-- 
Harsh J
www.harshj.com


Re: where are the cloudera hbase rpm's?

2010-12-28 Thread Mark Kerzner
Great, thank you.
Mark

On Tue, Dec 28, 2010 at 12:18 PM, Eric Sammer esam...@cloudera.com wrote:

 Mark:

 For Cloudera / CDH specific questions, please use the cdh-user list at
 https://groups.google.com/a/cloudera.org/group/cdh-user/topics

 Thanks.

 On Tue, Dec 28, 2010 at 1:13 PM, Mark Kerzner markkerz...@gmail.com
 wrote:

  1) On each server, install the core HBase RPMs: hbase, hbase-native,
  hbase-master, hbase-regionserver, hbase-zookeeper, hbase-conf-pseudo,
  hbase-docs.
 
  I do this: yum list | grep cloudera | grep hbase

  and nothing happens. But I do have other packages from Cloudera:

   yum list | grep cloudera
  cloudera-desktop.i386                   0.3.0-1            cloudera-cdh2
  cloudera-desktop.x86_64                 0.3.0-1            cloudera-cdh2
  cloudera-desktop-plugins.noarch         0.3.0-1            cloudera-cdh2
  hadoop-0.18.noarch                      0.18.3+76.2-1      cloudera-cdh2
  hadoop-0.18-conf-pseudo.noarch          0.18.3+76.2-1      cloudera-cdh2
  hadoop-0.18-datanode.noarch             0.18.3+76.2-1      cloudera-cdh2
  hadoop-0.18-debuginfo.i386              0.18.3+71-1        cloudera-cdh2
  hadoop-0.18-docs.noarch                 0.18.3+76.2-1      cloudera-cdh2
  hadoop-0.18-jobtracker.noarch           0.18.3+76.2-1      cloudera-cdh2
  hadoop-0.18-libhdfs.i386                0.18.3+76.2-1      cloudera-cdh2
  hadoop-0.18-libhdfs.x86_64              0.18.3+76.2-1      cloudera-cdh2
  hadoop-0.18-namenode.noarch             0.18.3+76.2-1      cloudera-cdh2
  hadoop-0.18-native.i386                 0.18.3+76.2-1      cloudera-cdh2
  hadoop-0.18-native.x86_64               0.18.3+76.2-1      cloudera-cdh2
  hadoop-0.18-pipes.i386                  0.18.3+76.2-1      cloudera-cdh2
  hadoop-0.18-pipes.x86_64                0.18.3+76.2-1      cloudera-cdh2
  hadoop-0.18-secondarynamenode.noarch    0.18.3+76.2-1      cloudera-cdh2
  hadoop-0.18-source.noarch               0.18.3+76.2-1      cloudera-cdh2
  hadoop-0.18-tasktracker.noarch          0.18.3+76.2-1      cloudera-cdh2
  hadoop-0.20-conf-pseudo.noarch          0.20.1+169.113-1   cloudera-cdh2
  hadoop-0.20-conf-pseudo-desktop.noarch  0.3.0-1            cloudera-cdh2
  hadoop-0.20-datanode.noarch             0.20.1+169.113-1   cloudera-cdh2
  hadoop-0.20-debuginfo.i386              0.20.1+169.113-1   cloudera-cdh2
  hadoop-0.20-debuginfo.x86_64            0.20.1+169.113-1   cloudera-cdh2
 
 
  Thank you
 



 --
 Eric Sammer
 twitter: esammer
 data: www.cloudera.com



Re: Hadoop RPC call response post processing

2010-12-28 Thread Stefan Groschupf
Hi Ted, 
I don't think the problem is allocation but garbage collection. 
When the gc kicks in everything freezes. Of course changing the gc algorithm 
helps a little.
Stefan 



On Dec 27, 2010, at 11:21 PM, Ted Dunning wrote:

 I would be very surprised if allocation itself is the problem as opposed to
 good old fashioned excess copying.
 
 It is very hard to write an allocator faster than the java generational gc,
 especially if you are talking about objects that are ephemeral.
 
 Have you looked at the tenuring distribution?
 
 On Mon, Dec 27, 2010 at 8:07 PM, Stefan Groschupf s...@101tec.com wrote:
 
 Hi All,
 I've been browsing the RPC code for quite a while now, trying to find an entry
 point / interceptor slot that would allow me to handle an RPC call response
 writable after it has been sent over the wire.
 Does anybody have an idea how to hook into the RPC code from outside? All the
 interesting methods are private. :(

 Background:
 Heavy use of the RPC allocates huge amounts of Writable objects. We saw in
 multiple systems that the garbage collector can get so busy that the JVM
 almost freezes for seconds. Things like ZooKeeper sessions time out in those
 cases.
 My idea is to create an object pool for writables. Borrowing an object from
 the pool is simple, since that happens in our custom code; the tricky part is
 knowing when the writable has been sent over the wire so that it can be
 returned to the pool.
 A dirty hack would be to override the write(out) method in the writable,
 assuming that is the last thing done with the writable, but it turns out that
 this method is called in other cases too, e.g. to measure throughput.

 Any ideas?

 Thanks,
 Stefan



RE: UI doesn't work

2010-12-28 Thread maha
James said:

Is the job tracker running on that machine?  YES.
Is there a firewall in the way?  I don't think so, because it used to work for
me. How can I check that?


Harsh said:

Did you do any ant operation on your release copy of Hadoop prior to
starting it, by the way?

   NO, I get the following error:

BUILD FAILED
/cs/sandbox/student/maha/hadoop-0.20.2/build.xml:316: Unable to find a javac 
compiler;
com.sun.tools.javac.Main is not on the classpath.
Perhaps JAVA_HOME does not point to the JDK.
It is currently set to /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0/jre

  I had to change JAVA_HOME to point to -- /usr/lib/jvm/jre-1.6.0-openjdk   
because I used to get an error when trying to run a jar file. The error was:

 bin/hadoop: line 258: /etc/alternatives/java/bin/java: Not a directory
 bin/hadoop: line 289: /etc/alternatives/java/bin/java: Not a directory
 bin/hadoop: line 289: exec: /etc/alternatives/java/bin/java: cannot
 execute: Not a directory
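
For reference, the ant error above usually means JAVA_HOME points at a JRE
instead of a full JDK. A sketch of the usual fix in conf/hadoop-env.sh, where
the exact path is an assumption that varies by machine; the point is to use
the JDK root, not its jre/ subdirectory and not the java binary itself:

    # hadoop-env.sh -- example only; use the machine's actual JDK directory
    export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0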



Adarsh said:
   
  logs of namenode + jobtracker

 namenode log 

[m...@speed logs]$ cat hadoop-maha-namenode-speed.cs.ucsb.edu.log
2010-12-28 12:23:25,006 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
STARTUP_MSG: 
/
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = speed.cs.ucsb.edu/128.111.43.50
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = 
https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; 
compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
/
2010-12-28 12:23:25,126 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: 
Initializing RPC Metrics with hostName=NameNode, port=9000
2010-12-28 12:23:25,130 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
Namenode up at: speed.cs.ucsb.edu/128.111.43.50:9000
2010-12-28 12:23:25,133 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: 
Initializing JVM Metrics with processName=NameNode, sessionId=null
2010-12-28 12:23:25,134 INFO 
org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing 
NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
2010-12-28 12:23:25,258 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=maha,grad
2010-12-28 12:23:25,258 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2010-12-28 12:23:25,258 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=true
2010-12-28 12:23:25,269 INFO 
org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: 
Initializing FSNamesystemMetrics using context 
object:org.apache.hadoop.metrics.spi.NullContext
2010-12-28 12:23:25,270 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered 
FSNamesystemStatusMBean
2010-12-28 12:23:25,316 INFO org.apache.hadoop.hdfs.server.common.Storage: 
Number of files = 6
2010-12-28 12:23:25,323 INFO org.apache.hadoop.hdfs.server.common.Storage: 
Number of files under construction = 0
2010-12-28 12:23:25,323 INFO org.apache.hadoop.hdfs.server.common.Storage: 
Image file of size 551 loaded in 0 seconds.
2010-12-28 12:23:25,323 INFO org.apache.hadoop.hdfs.server.common.Storage: 
Edits file /tmp/hadoop-maha/dfs/name/current/edits of size 4 edits # 0 loaded 
in 0 seconds.
2010-12-28 12:23:25,358 INFO org.apache.hadoop.hdfs.server.common.Storage: 
Image file of size 551 saved in 0 seconds.
2010-12-28 12:23:25,711 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Finished loading FSImage 
in 542 msecs
2010-12-28 12:23:25,715 INFO org.apache.hadoop.hdfs.StateChange: STATE* Safe 
mode ON. 
The ratio of reported blocks 0. has not reached the threshold 0.9990. Safe 
mode will be turned off automatically.
2010-12-28 12:23:25,834 INFO org.mortbay.log: Logging to 
org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2010-12-28 12:23:25,901 INFO org.apache.hadoop.http.HttpServer: Port returned 
by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the 
listener on 50070
2010-12-28 12:23:25,902 INFO org.apache.hadoop.http.HttpServer: 
listener.getLocalPort() returned 50070 
webServer.getConnectors()[0].getLocalPort() returned 50070
2010-12-28 12:23:25,902 INFO org.apache.hadoop.http.HttpServer: Jetty bound to 
port 50070
2010-12-28 12:23:25,902 INFO org.mortbay.log: jetty-6.1.14
2010-12-28 12:23:26,360 INFO org.mortbay.log: Started 
selectchannelconnec...@0.0.0.0:50070
2010-12-28 12:23:26,360 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
Web-server up at: 0.0.0.0:50070
2010-12-28 12:23:26,360 INFO org.apache.hadoop.ipc.Server: IPC Server 
Responder: starting

Re: Hadoop RPC call response post processing

2010-12-28 Thread Stefan Groschupf
Hi Todd,
Right, that is the code I'm looking into. However, Responder is a private inner
class and is created with "responder = new Responder();".
It would be great if the Responder implementation could be configured.
Do you have any idea how to override the Responder?
Thanks,
Stefan


On Dec 27, 2010, at 8:21 PM, Todd Lipcon wrote:

 Hi Stefan,
 
 Sounds interesting.
 
 Maybe you're looking for o.a.h.ipc.Server$Responder?
 
 -Todd
 
 On Mon, Dec 27, 2010 at 8:07 PM, Stefan Groschupf s...@101tec.com wrote:
 
 [Stefan's original message, quoted in full earlier in this thread; trimmed]
 
 
 
 
 -- 
 Todd Lipcon
 Software Engineer, Cloudera



Re: UI doesn't work

2010-12-28 Thread James Seigel
For job tracker go to port 50030 see if that helps

James

Sent from my mobile. Please excuse the typos.

On 2010-12-28, at 1:36 PM, maha m...@umail.ucsb.edu wrote:

 [maha's message and namenode log trimmed]

Re: UI doesn't work

2010-12-28 Thread maha
Hi James,

   I'm accessing http://speed.cs.ucsb.edu:50030/ for the job tracker
and port 50070 for the name node, just like the Hadoop quick start.

Did you mean that I should change the port in my mapred-site.xml file?

  <property>
    <name>mapred.job.tracker</name>
    <value>speed.cs.ucsb.edu:9001</value>
  </property>


Maha


On Dec 28, 2010, at 1:01 PM, James Seigel wrote:

 For job tracker go to port 50030 see if that helps
 
 James
 
 Sent from my mobile. Please excuse the typos.
 
 [earlier messages from maha and the namenode log trimmed]

Re: UI doesn't work

2010-12-28 Thread James Seigel
Nope, just on my iPhone I thought you'd tried a different port ( bad memory :) )

Try accessing it with the IP address you get from running ifconfig on
the machine.

Then look at the logs and see if there are any errors or indications
that it is being hit properly.

Does your browser follow redirects properly?  As well try clearing the
cache on your browser.

Sorry for checking out the obvious stuff but sometimes it is :).

Cheers
James

Sent from my mobile. Please excuse the typos.

On 2010-12-28, at 2:30 PM, maha m...@umail.ucsb.edu wrote:

 [maha's message, earlier replies, and the namenode log trimmed]
 



Re: UI doesn't work

2010-12-28 Thread maha
Thanks James. You think those are obvious stuff, but they are not to me!  Here
is the update:

  1- I cleared the browser cache.
  2- I used the IP address in masters/slaves/mapred-core.xml/core-site.xml, which
still identifies it as (( speed.cs.ucsb.edu/128.111.43.50 )) in the logs.
  3- The NameNode page (( http://128.111.43.50:50030/ )) redirected to ((
http://128.111.43.50:50070/dfshealth.jsp )), which shows the 404 error.  Is
that a correct redirection?
  4- The log for the JobTracker shows something new:

   2010-12-28 14:15:11,870 INFO org.apache.hadoop.mapred.JobTracker: 
STARTUP_MSG: 
/
STARTUP_MSG: Starting JobTracker
STARTUP_MSG:   host = speed.cs.ucsb.edu/128.111.43.50
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = 
https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; 
compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
/
2010-12-28 14:15:11,983 INFO org.apache.hadoop.mapred.JobTracker: Scheduler 
configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT, 
limitMaxMemForMapTasks, limitMaxMemForReduceTasks) (-1, -1, -1, -1)
2010-12-28 14:15:12,033 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: 
Initializing RPC Metrics with hostName=JobTracker, port=9001
2010-12-28 14:15:12,096 INFO org.mortbay.log: Logging to 
org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2010-12-28 14:15:12,290 INFO org.apache.hadoop.http.HttpServer: Port returned 
by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the 
listener on 50030
2010-12-28 14:15:12,291 INFO org.apache.hadoop.http.HttpServer: 
listener.getLocalPort() returned 50030 
webServer.getConnectors()[0].getLocalPort() returned 50030
2010-12-28 14:15:12,291 INFO org.apache.hadoop.http.HttpServer: Jetty bound to 
port 50030
2010-12-28 14:15:12,291 INFO org.mortbay.log: jetty-6.1.14
2010-12-28 14:18:28,261 INFO org.mortbay.log: Started 
selectchannelconnec...@0.0.0.0:50030
2010-12-28 14:18:28,265 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: 
Initializing JVM Metrics with processName=JobTracker, sessionId=
2010-12-28 14:18:28,266 INFO org.apache.hadoop.mapred.JobTracker: JobTracker up 
at: 9001
2010-12-28 14:18:28,266 INFO org.apache.hadoop.mapred.JobTracker: JobTracker 
webserver: 50030
2010-12-28 14:18:28,513 INFO org.apache.hadoop.mapred.JobTracker: Cleaning up 
the system directory
2010-12-28 14:18:28,577 INFO org.apache.hadoop.mapred.CompletedJobStatusStore: 
Completed job store is inactive
2010-12-28 14:18:28,667 INFO org.apache.hadoop.ipc.Server: IPC Server 
Responder: starting
2010-12-28 14:18:28,668 INFO org.apache.hadoop.ipc.Server: IPC Server listener 
on 9001: starting
2010-12-28 14:18:28,668 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 
on 9001: starting
2010-12-28 14:18:28,668 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 
on 9001: starting
2010-12-28 14:18:28,672 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 
on 9001: starting
2010-12-28 14:18:28,672 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 
on 9001: starting
2010-12-28 14:18:28,672 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 
on 9001: starting
2010-12-28 14:18:28,672 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 
on 9001: starting
2010-12-28 14:18:28,672 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 
on 9001: starting
2010-12-28 14:18:28,672 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 
on 9001: starting
2010-12-28 14:18:28,673 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 
on 9001: starting
2010-12-28 14:18:28,673 INFO org.apache.hadoop.mapred.JobTracker: Starting 
RUNNING
2010-12-28 14:18:28,673 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 
on 9001: starting
2010-12-28 14:18:28,684 WARN org.apache.hadoop.mapred.JobTracker: Serious 
problem, cannot find record of 'previous' heartbeat for 
'tracker_pinky.cs.ucsb.edu:localhost/127.0.0.1:56875'; reinitializing the 
tasktracker
2010-12-28 14:18:28,684 WARN org.apache.hadoop.ipc.Server: IPC Server 
Responder, call 
getProtocolVersion(org.apache.hadoop.mapred.JobSubmissionProtocol, 20) from 
128.111.43.50:59775: output error

   (This might be because I forced it to leave SAFEMODE?)

2010-12-28 14:18:28,696 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 
on 9001 caught: java.nio.channels.ClosedChannelException
at 
sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:144)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:342)
at org.apache.hadoop.ipc.Server.channelWrite(Server.java:1195)
at org.apache.hadoop.ipc.Server.access$1900(Server.java:77)
at 
org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:613)
at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:677)
at 

help for using mapreduce to run different code?

2010-12-28 Thread Jander g
Hi, all

Does Hadoop support the map function running different code? If yes, how
do I realize this?

Thanks in advance!

-- 
Regards,
Jander


Re: how to run jobs every 30 minutes?

2010-12-28 Thread Jimmy Wan
I've been using Cascading to act as make for my Hadoop processes for quite
some time. Unfortunately, even the most recent distribution of Cascading was
written against the deprecated Hadoop APIs (JobConf) that I'm looking to
replace. Does anyone have an alternative?

On Tue, Dec 14, 2010 at 18:02, Chris K Wensel ch...@wensel.net wrote:


 Cascading also has the ability to only run 'stale' processes. Think 'make'
 file. When re-running a job where only one file of many has changed, this is
 a big win.



Re: help for using mapreduce to run different code?

2010-12-28 Thread James Seigel
Not sure what you mean.

Can you write custom code for your map functions?: yes

Cheers
James

Sent from my mobile. Please excuse the typos.

On 2010-12-28, at 3:54 PM, Jander g jande...@gmail.com wrote:

 Hi, all

 Whether Hadoop supports the map function running different code? If yes, how
 to realize this?

 Thanks in advance!

 --
 Regards,
 Jander


Re: Hadoop RPC call response post processing

2010-12-28 Thread Ted Dunning
Knowing the tenuring distribution will tell a lot about that exact issue.
 Ephemeral collections take on average less than one instruction per
allocation and the allocation itself is generally only a single instruction.
 For ephemeral garbage, it is extremely unlikely that you can beat that.

So the real question is whether you are actually creating so much garbage
that you are over-whelming the collector or whether the data is much longer
lived than it should be. *That* can cause lots of collection costs.

To tell how long data lives, you need to get the tenuring distribution:

-XX:+PrintTenuringDistribution
    Prints details about the tenuring distribution to standard out. It can be
    used to show this threshold and the ages of objects in the new generation.
    It is also useful for observing the lifetime distribution of an application.
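
On a 0.20-era cluster, one way to surface this for the task JVMs is through
mapred.child.java.opts in mapred-site.xml; the heap size below is only a
placeholder, and the GC flags can be adjusted to taste:

    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx512m -verbose:gc -XX:+PrintTenuringDistribution</value>
    </property>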
On Tue, Dec 28, 2010 at 11:59 AM, Stefan Groschupf s...@101tec.com wrote:

 I don't think the problem is allocation but garbage collection.



Re: help for using mapreduce to run different code?

2010-12-28 Thread Ted Dunning
if you mean running different code in different mappers, I recommend using
an if statement.
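
A sketch of that suggestion with the new mapreduce API (class and file names
here are assumptions, not Jander's code): a single Mapper can branch on the
name of the input file it is reading, so different inputs get different map
logic.

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;

    public class BranchingMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // decide which logic to run based on the current input file's name
            String file = ((FileSplit) context.getInputSplit()).getPath().getName();
            if (file.startsWith("sort")) {
                // ... sorting-style map logic for the "sort" input ...
            } else {
                // ... wordcount-style map logic for everything else ...
            }
        }
    }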

On Tue, Dec 28, 2010 at 2:53 PM, Jander g jande...@gmail.com wrote:

 Whether Hadoop supports the map function running different code? If yes,
 how
 to realize this?



Re: help for using mapreduce to run different code?

2010-12-28 Thread Jander g
Hi James,

Thanks for your attention.

Suppose there are only 2 map tasks running in the Hadoop cluster. I want to use
one map to sort and another to do wordcount at the same time in the same Hadoop
cluster.

On Wed, Dec 29, 2010 at 6:58 AM, James Seigel ja...@tynt.com wrote:

 Not sure what you mean.

 Can you write custom code for your map functions?: yes

 Cheers
 James

 Sent from my mobile. Please excuse the typos.

 On 2010-12-28, at 3:54 PM, Jander g jande...@gmail.com wrote:

  Hi, all
 
  Whether Hadoop supports the map function running different code? If yes,
 how
  to realize this?
 
  Thanks in advance!
 
  --
  Regards,
  Jander




-- 
Thanks,
Jander


Re: help for using mapreduce to run different code?

2010-12-28 Thread Jander g
Yes, that is what I mean.

But what would the condition be if I want to use one map to sort and another to
do wordcount at the same time in the same Hadoop cluster? I have no idea.

Thanks,
Jander

On Wed, Dec 29, 2010 at 7:08 AM, Ted Dunning tdunn...@maprtech.com wrote:

 if you mean running different code in different mappers, I recommend using
 an if statement.

 On Tue, Dec 28, 2010 at 2:53 PM, Jander g jande...@gmail.com wrote:

   Does Hadoop support running different code in the map function? If yes, how
   can this be done?
 




-- 
Thanks,
Jander


Re: how to run jobs every 30 minutes?

2010-12-28 Thread Ted Dunning
Good quote.

On Tue, Dec 28, 2010 at 3:46 PM, Chris K Wensel ch...@wensel.net wrote:


 deprecated is the new stable.

 https://issues.apache.org/jira/browse/MAPREDUCE-1734

 ckw

 On Dec 28, 2010, at 2:56 PM, Jimmy Wan wrote:

  I've been using Cascading to act as make for my Hadoop processes for
 quite
  some time. Unfortunately, even the most recent distribution of Cascading
 was
  written against the deprecated Hadoop APIs (JobConf) that I'm looking to
  replace. Does anyone have an alternative?
 
  On Tue, Dec 14, 2010 at 18:02, Chris K Wensel ch...@wensel.net wrote:
 
 
  Cascading also has the ability to only run 'stale' processes. Think
 'make'
  file. When re-running a job where only one file of many has changed,
 this is
  a big win.
 

 --
 Chris K Wensel
 ch...@concurrentinc.com
 http://www.concurrentinc.com

 -- Concurrent, Inc. offers mentoring, support, and licensing for Cascading




Re: help for using mapreduce to run different code?

2010-12-28 Thread maha
Hi Jander,

   Do you mean writing the Map in another language, like Python or C? Then yes. Check 
this http://hadoop.apache.org/common/docs/r0.18.0/streaming.html for Hadoop 
Streaming.

Maha

On Dec 28, 2010, at 2:53 PM, Jander g wrote:

 Hi, all
 
  Does Hadoop support running different code in the map function? If yes, how
  can this be done?
 
 Thanks in advance!
 
 -- 
 Regards,
 Jander



HDFS disk consumption.

2010-12-28 Thread Jane Chen
Is setting dfs.replication to 1 sufficient to stop replication? How do I
verify that? I have a pseudo cluster running 0.21.0. It seems that the HDFS
disk consumption triples the amount of data stored.

Thanks,
Jane
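For context, a minimal sketch of how this is usually set and checked; the path
/user/jane and the values are placeholders. Note that dfs.replication is a
client-side default, so blocks already written keep their old factor until it
is changed explicitly:

    <!-- hdfs-site.xml -->
    <property>
      <name>dfs.replication</name>
      <value>1</value>
    </property>

    # look for "repl=1" next to each file in the fsck output
    hadoop fsck /user/jane -files -blocks
    # retroactively drop existing files to a replication factor of 1
    hadoop fs -setrep -R -w 1 /user/jane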


  


Re: Hadoop/Elastic MR on AWS

2010-12-28 Thread Sudhir Vallamkondu
Unfortunately I can't publish the exact numbers; however, here are the various
things we considered.

First off, our data trends. We gathered our current data size and plotted a
future growth trend for the next few years. We then finalized an archival
strategy to understand how much data needs to be on the cluster on a rotating
basis. We crunch our data often (meaning as we get it), so computing power is
not an issue; the cluster size was mainly driven by the amount of data that
needs to be readily available and by the replication strategy. We factored in
compression use on older rotating data.

Once we had the above numbers we could decide on our cluster infrastructure
size and type of hardware needed.

For the local cluster we factored in hardware, warranty, the usual networking
gear for a cluster that size, data center costs, and support manpower. We also
factored in a NAS and the bandwidth costs to replicate cluster data to another
data center for active replication.

For EMR costs we compared a reserved-instance cluster (nodes reserved for
3 years with a hardware config and cluster size similar to the above) against
nodes provisioned on the fly. We factored in S3 costs to store the calculated
rotating data, plus bandwidth costs for data coming in and going out. One
thing to note is that Amazon EMR costs are charged on top of the normal EC2
instance costs. For example, if you run a job in EMR with 4 nodes and the job
overall takes 1 hour, then the total cost (excluding any data transfer costs)
= 4 nodes * 1 hour * (EMR price per node-hour) + 4 nodes * 1 hour * (EC2 price
per node-hour). Hopefully that makes sense.

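To make that concrete with purely hypothetical rates, say $0.10/hour for the
EC2 instance and a $0.015/hour EMR surcharge per node:

    total = 4 nodes * 1 hour * ($0.10 + $0.015) = $0.46 for that job run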
I am sure I am missing a few things above, but that's the gist of it.

- Sudhir

  




On 12/27/10 9:22 PM, common-user-digest-h...@hadoop.apache.org
common-user-digest-h...@hadoop.apache.org wrote:

 From: Dave Viner davevi...@gmail.com
 Date: Mon, 27 Dec 2010 10:23:37 -0800
 To: common-user@hadoop.apache.org
 Subject: Re: Hadoop/Elastic MR on AWS
 
 Hi Sudhir,
 
 Can you publish your findings around pricing, and how you calculated the
 various aspects?
 
 This is great information.
 
 Thanks
 Dave Viner
 
 
 On Mon, Dec 27, 2010 at 10:17 AM, Sudhir Vallamkondu 
 sudhir.vallamko...@icrossing.com wrote:
 
 We recently crossed this bridge and here are some insights. We did an
 extensive study comparing costs and benchmarking local vs EMR for our
 current needs and future trend.
 
  - The scalability you get with EMR is unmatched, although you need to look at
  your requirements and decide whether this is something you need.
 
  - When using EMR it's cheaper to use reserved instances vs nodes on the fly.
 You can always add more nodes when required. I suggest looking at your
 current computing needs and reserve instances for a year or two and use
 these to run EMR and add nodes at peak needs. In your cost estimation you
 will need to factor in the data transfer time/costs unless you are dealing
 with public datasets on S3
 
  - EMR fared similarly to the local cluster on CPU benchmarks (we used MRBench
  to benchmark map/reduce); however, IO benchmarks were slower on EMR (we used
  the DFSIO benchmark). For IO-intensive jobs you will need to add more nodes to
  compensate for this.
 
  - When compared to a local cluster, you will need to factor in the time it
  takes for the EMR cluster to set up when starting a job: things like data
  transfer time, cluster replication time, etc.
 
  - The EMR API is very flexible; however, you will need to build a custom
  interface on top of it to suit your job management and monitoring needs.
 
 - EMR bootstrap actions can satisfy most of your native lib needs so no
 drawbacks there.
 
 
 -- Sudhir
 
 
 On 12/26/10 5:26 AM, common-user-digest-h...@hadoop.apache.org
 common-user-digest-h...@hadoop.apache.org wrote:
 
 From: Otis Gospodnetic otis_gospodne...@yahoo.com
 Date: Fri, 24 Dec 2010 04:41:46 -0800 (PST)
 To: common-user@hadoop.apache.org
 Subject: Re: Hadoop/Elastic MR on AWS
 
 Hello Amandeep,
 
 
 
 - Original Message 
 From: Amandeep Khurana ama...@gmail.com
 To: common-user@hadoop.apache.org
 Sent: Fri, December 10, 2010 1:14:45 AM
 Subject: Re: Hadoop/Elastic MR on AWS
 
 Mark,
 
 Using EMR makes it very easy to start a cluster and add/reduce  capacity
 as
 and when required. There are certain optimizations that make EMR  an
 attractive choice as compared to building your own cluster out. Using
  EMR
 
 
 Could you please point out what optimizations you are referring to?
 
 Thanks,
 Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop -
 HBase
 Hadoop ecosystem search :: http://search-hadoop.com/
 
 also ensures you are using a production quality, stable system backed by
  the
 EMR engineers. You can always use bootstrap actions to put your own
  tweaked
 version of Hadoop in there if you want to do that.
 
 Also, you  don't have to tear down your cluster after every job. You can
 set
 the alive  option when you start your cluster and it will stay there
 even
 after your  Hadoop job completes.
 
 If you face any issues with EMR, send me a mail  offline and I'll be
 happy to
 help.
 
 

Re: Hadoop RPC call response post processing

2010-12-28 Thread Todd Lipcon
On Tue, Dec 28, 2010 at 1:00 PM, Stefan Groschupf s...@101tec.com wrote:

 Hi Todd,
 Right, that is the code I'm looking into. Though Responder is an inner private
 class and is created via responder = new Responder();
 It would be great if the Responder implementation could be configured.
 Do you have any idea how to overwrite the Responder?


Nope, it's not currently pluggable, nor do I think there's any compelling
reason to make it pluggable. It's coupled quite tightly to the
implementation right now.

Perhaps you can hack something in a git branch, and if it has good results
on something like NNBench it could be a general contribution?

-Todd


 On Dec 27, 2010, at 8:21 PM, Todd Lipcon wrote:

  Hi Stefan,
 
  Sounds interesting.
 
  Maybe you're looking for o.a.h.ipc.Server$Responder?
 
  -Todd
 
  On Mon, Dec 27, 2010 at 8:07 PM, Stefan Groschupf s...@101tec.com wrote:
 
  Hi All,
   I've been browsing the RPC code for quite a while now, trying to find an
   entry point / interceptor slot that allows me to handle an RPC call response
   writable after it was sent over the wire.
   Does anybody have an idea how to break into the RPC code from outside? All
   the interesting methods are private. :(
 
   Background:
   Heavy use of the RPC allocates a huge amount of Writable objects. We saw in
   multiple systems that the garbage collector can get so busy that the JVM
   almost freezes for seconds. Things like ZooKeeper sessions time out in those
   cases.
   My idea is to create an object pool for writables. Borrowing an object from
   the pool is simple since that happens in our custom code, though we need to
   know when the writable was sent over the wire so it can be returned into the
   pool.
   A dirty hack would be to overwrite the write(out) method in the writable,
   assuming that is the last thing done with the writable, though it turns out
   that this method is called in other cases too, e.g. to measure throughput.
 
  Any ideas?
 
  Thanks,
  Stefan
 
 
 
 
  --
  Todd Lipcon
  Software Engineer, Cloudera




-- 
Todd Lipcon
Software Engineer, Cloudera
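For readers following the object-pool idea quoted above, a very rough sketch of
such a pool; the class and method names are invented, and the hard part Stefan
raises (knowing when the RPC layer is finished with an object so it can be
returned) is exactly what this sketch does not solve:

    import java.util.concurrent.ConcurrentLinkedQueue;
    import org.apache.hadoop.io.Writable;

    // Illustrative pool: borrow() reuses a free instance when one is available.
    public class WritablePool<T extends Writable> {
      private final ConcurrentLinkedQueue<T> free = new ConcurrentLinkedQueue<T>();
      private final Class<T> clazz;

      public WritablePool(Class<T> clazz) {
        this.clazz = clazz;
      }

      public T borrow() {
        T obj = free.poll();
        if (obj != null) {
          return obj;
        }
        try {
          return clazz.newInstance();
        } catch (Exception e) {
          throw new RuntimeException("cannot instantiate " + clazz, e);
        }
      }

      // Safe to call only once the response has actually been serialized and sent.
      public void giveBack(T obj) {
        free.offer(obj);
      }
    }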


Re: UI doesn't work

2010-12-28 Thread Sudhir Vallamkondu
I recently had this issue. UI links were working for some nodes, meaning when
I went to the dfsHealth.jsp page and followed some cluster data node links,
some would work and some would show a 404 error.

I started tracing them all the way from the listening ports. The data node port
is 50010, so do a netstat on that port to find which process is listening on it
(see the example below). Then check that process to see if it's the data node.
The issue I had was that somehow, when I did the Hadoop upgrade, I had an older
instance and a new instance of the data node running and it was all messed up,
so I had to kill all Hadoop processes and do a clean start.


On 12/27/10 9:22 PM, common-user-digest-h...@hadoop.apache.org
common-user-digest-h...@hadoop.apache.org wrote:

 From: Harsh J qwertyman...@gmail.com
 Date: Tue, 28 Dec 2010 09:51:11 +0530
 To: common-user@hadoop.apache.org
 Subject: Re: UI doesn't work
 
 I remember facing such an issue with the JT (50030) once. None of the
 jsp pages would load, 'cept the index. It was some odd issue with the
 webapps not getting loaded right while startup. Don't quite remember
 how it got solved.
 
 Did you do any ant operation on your release copy of Hadoop prior to
 starting it, by the way?
 
 On Tue, Dec 28, 2010 at 5:15 AM, maha m...@umail.ucsb.edu wrote:
 Hi,
 
  I get Error 404 when I try to use hadoop UI to monitor my job execution. I'm
 using Hadoop-0.20.2 and the following are parts of my configuration files.
 
  in core-site.xml:
    <name>fs.default.name</name>
    <value>hdfs://speed.cs.ucsb.edu:9000</value>

  in mapred-site.xml:
    <name>mapred.job.tracker</name>
    <value>speed.cs.ucsb.edu:9001</value>
 
 
 when I try to open:  http://speed.cs.ucsb.edu:50070/   I get the 404 Error.
 
 
 Any ideas?
 
  Thank you,
     Maha
 
 
 
 
 
 -- 
 Harsh J
 www.harshj.com
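As a concrete example of the tracing described above (illustrative Linux
commands; the -p flag may need root, and the port assumes default settings):

    netstat -tlnp | grep 50010    # which PID is bound to the DataNode port?
    jps                           # is that PID actually a DataNode process?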






Re: Hadoop RPC call response post processing

2010-12-28 Thread Lance Norskog
Are you connecting to this JVM with RMI? RMI does a very nasty thing
with garbage collection: it forces a blocking collection every 60
seconds. Really. You have to change this with a system property (see the
example below).

On Tue, Dec 28, 2010 at 5:56 PM, Todd Lipcon t...@cloudera.com wrote:
 On Tue, Dec 28, 2010 at 1:00 PM, Stefan Groschupf s...@101tec.com wrote:

 Hi Todd,
  Right, that is the code I'm looking into. Though Responder is an inner private
  class and is created via responder = new Responder();
 It would be great if the Responder implementation could be configured.
 Do you have any idea how to overwrite the Responder?


 Nope, it's not currently pluggable, nor do I think there's any compelling
 reason to make it pluggable. It's coupled quite tightly to the
 implementation right now.

 Perhaps you can hack something in a git branch, and if it has good results
 on something like NNBench it could be a general contribution?

 -Todd


 On Dec 27, 2010, at 8:21 PM, Todd Lipcon wrote:

  Hi Stefan,
 
  Sounds interesting.
 
  Maybe you're looking for o.a.h.ipc.Server$Responder?
 
  -Todd
 
  On Mon, Dec 27, 2010 at 8:07 PM, Stefan Groschupf s...@101tec.com wrote:
 
  Hi All,
   I've been browsing the RPC code for quite a while now, trying to find an
   entry point / interceptor slot that allows me to handle an RPC call response
   writable after it was sent over the wire.
   Does anybody have an idea how to break into the RPC code from outside? All
   the interesting methods are private. :(
 
   Background:
   Heavy use of the RPC allocates a huge amount of Writable objects. We saw in
   multiple systems that the garbage collector can get so busy that the JVM
   almost freezes for seconds. Things like ZooKeeper sessions time out in those
   cases.
   My idea is to create an object pool for writables. Borrowing an object from
   the pool is simple since that happens in our custom code, though we need to
   know when the writable was sent over the wire so it can be returned into the
   pool.
   A dirty hack would be to overwrite the write(out) method in the writable,
   assuming that is the last thing done with the writable, though it turns out
   that this method is called in other cases too, e.g. to measure throughput.
 
  Any ideas?
 
  Thanks,
  Stefan
 
 
 
 
  --
  Todd Lipcon
  Software Engineer, Cloudera




 --
 Todd Lipcon
 Software Engineer, Cloudera




-- 
Lance Norskog
goks...@gmail.com
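The system property Lance refers to is presumably the RMI distributed-GC
interval; a common way to relax it (values are in milliseconds, one hour here)
is to add the following to the affected JVM's options:

    -Dsun.rmi.dgc.client.gcInterval=3600000
    -Dsun.rmi.dgc.server.gcInterval=3600000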


Re: help for using mapreduce to run different code?

2010-12-28 Thread Matthew John
Hi Jander,

If I understand what you want, you would like to run the map instances of two
different MapReduce jobs (so obviously different mapper code) simultaneously on
the same machine. If I am correct, it has more to do with the setting for the
number of simultaneous mapper instances (I guess its default is 2 or 4; see the
sketch below). And there should be a way to divide the map instances among the
two MR jobs you want to run together (to fill up the slots). Please correct me
if I am wrong. Wanted to try clearing the air regarding the query :).

Matthew

On Wed, Dec 29, 2010 at 5:47 AM, maha m...@umail.ucsb.edu wrote:

 Hi Jander,

    Do you mean writing the Map in another language, like Python or C? Then yes.
  Check this http://hadoop.apache.org/common/docs/r0.18.0/streaming.html for
  Hadoop Streaming.

 Maha

 On Dec 28, 2010, at 2:53 PM, Jander g wrote:

  Hi, all
 
   Does Hadoop support running different code in the map function? If yes, how
   can this be done?
 
  Thanks in advance!
 
  --
  Regards,
  Jander
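For reference, the per-TaskTracker slot setting Matthew mentions is, in the
0.20-era configuration, mapred.tasktracker.map.tasks.maximum (default 2); a
value of 4, for example, would be set in mapred-site.xml as:

    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>4</value>
    </property>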




Re: Hadoop RPC call response post processing

2010-12-28 Thread Stefan Groschupf
Hi Todd, 
Thanks for the feedback. 
 Nope, it's not currently pluggable, nor do I think there's any compelling
 reason to make it pluggable.

Well, one could argue that with an interceptor / filter it would be very easy to
add compression or encryption to RPC.
But since the Nutch days the code base was never architected in an extensible or
modular way.

 Perhaps you can hack something in a git branch, and if it has good results
 on something like NNBench it could be a general contribution?

Thanks - I'll pass on that offer. The days of waiting half a year to get a patch
into the codebase are behind me. :)
I think I will just replace Hadoop RPC with Netty.

Cheers, 
Stefan 

Re: HDFS Structure

2010-12-28 Thread Harsh J
FileInputFormat takes care of line boundaries in splits; you don't
need to worry about that.

Each mapper works on a FileSplit, which contains a starting offset and
the length from there. Records are then read with line boundaries in
mind (any extra bytes needed to finish the last line are pulled from the
DataNode that has them).

Similarly, in SequenceFiles, this is done using a special sync marker
embedded between logical blocks of data.

On Wed, Dec 29, 2010 at 10:27 AM, shanmukhan battinapati
shanmukha...@gmail.com wrote:
 Hi,

 I have a small doubt about how HDFS manages files internally.

 Assume I have a NameNode and 2 DataNodes. I have inserted a CSV file of
 size 80 MB into HDFS using the 'hadoop copyFromLocal' command.

 Then how will this file be stored in HDFS?

 Will it be split into two parts, one of 64 MB (the default block size) and the
 remaining 16 MB, and copied to the 2 DataNodes?

 If that is the case, and I am running some map-reduce over the two DataNodes,
 then since the split is not line oriented I may get unexpected results.

 How do I solve this type of issue? Please help me.



 Thanks & Regards
 Shanmukhan.B




-- 
Harsh J
www.harshj.com
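A toy illustration of the rule Harsh describes (this is not the actual
LineRecordReader source, just the idea): every reader skips the partial line at
the start of its split, except for the very first split, and reads past the end
of its split to finish the last line it started.

    import java.util.ArrayList;
    import java.util.List;

    public class SplitLineDemo {
      // 'start' and 'end' play the role of a FileSplit's byte offsets.
      static List<String> readSplit(byte[] file, int start, int end) {
        int pos = start;
        if (start != 0) {
          // Not the first split: the previous reader owns the straddling line.
          while (pos < file.length && file[pos - 1] != '\n') pos++;
        }
        List<String> lines = new ArrayList<String>();
        while (pos < end && pos < file.length) {
          int eol = pos;
          while (eol < file.length && file[eol] != '\n') eol++; // may run past 'end'
          lines.add(new String(file, pos, eol - pos));
          pos = eol + 1;
        }
        return lines;
      }

      public static void main(String[] args) {
        byte[] data = "alpha\nbravo\ncharlie\n".getBytes();
        // Split the file at byte 8, in the middle of "bravo":
        System.out.println(readSplit(data, 0, 8));   // [alpha, bravo]
        System.out.println(readSplit(data, 8, 20));  // [charlie]
      }
    }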


Re: help for using mapreduce to run different code?

2010-12-28 Thread Harsh J
Have a look at MultipleInputs

On Wed, Dec 29, 2010 at 4:39 AM, Jander g jande...@gmail.com wrote:
 Hi James,

 Thanks for your attention.

 Suppose there are only 2 map running in Hadoop cluster, I want to using one
 map to sort and another to wordcount in the same time in the same Hadoop
 cluster.

 On Wed, Dec 29, 2010 at 6:58 AM, James Seigel ja...@tynt.com wrote:

 Not sure what you mean.

 Can you write custom code for your map functions?: yes

 Cheers
 James

 Sent from my mobile. Please excuse the typos.

 On 2010-12-28, at 3:54 PM, Jander g jande...@gmail.com wrote:

  Hi, all
 
   Does Hadoop support running different code in the map function? If yes, how
   can this be done?
 
  Thanks in advance!
 
  --
  Regards,
  Jander




 --
 Thanks,
 Jander




-- 
Harsh J
www.harshj.com
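A short sketch of what Harsh is pointing at, using the new-API MultipleInputs
(org.apache.hadoop.mapreduce.lib.input in 0.21; the 0.20 equivalent lives under
org.apache.hadoop.mapred.lib and takes a JobConf). The two mapper classes and
the input/output paths are placeholders:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class TwoMappersJob {

      public static class WordCountMapper extends Mapper<Object, Text, Text, IntWritable> {
        protected void map(Object k, Text v, Context c)
            throws IOException, InterruptedException {
          for (String w : v.toString().split("\\s+")) {
            c.write(new Text(w), new IntWritable(1));
          }
        }
      }

      public static class SortMapper extends Mapper<Object, Text, Text, IntWritable> {
        protected void map(Object k, Text v, Context c)
            throws IOException, InterruptedException {
          c.write(new Text(v), new IntWritable(1)); // key on the line so it gets sorted
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "two-mappers");
        job.setJarByClass(TwoMappersJob.class);

        // A different mapper per input path, feeding one shared shuffle/reduce.
        MultipleInputs.addInputPath(job, new Path("in/wordcount"),
            TextInputFormat.class, WordCountMapper.class);
        MultipleInputs.addInputPath(job, new Path("in/sort"),
            TextInputFormat.class, SortMapper.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileOutputFormat.setOutputPath(job, new Path("out"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }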


Re: UI doesn't work

2010-12-28 Thread maha
Thanks for the tip, I'll try it right away.

But a quick clarification: what I did is remotely connect to one node and mark
it as both master and slave. Then, before starting Hadoop, 'jps' shows only
'jps', but after starting Hadoop, 'jps' shows all the Hadoop daemons. Isn't
this a clean start?

Maha

On Dec 28, 2010, at 6:02 PM, Sudhir Vallamkondu wrote:

 I recently had this issue. UI links were working for some nodes, meaning when
 I went to the dfsHealth.jsp page and followed some cluster data node links,
 some would work and some would show a 404 error.
 
 I started tracing them all the way from the listening ports. The data node port
 is 50010, so do a netstat on that port to find which process is listening on it.
 Then check that process to see if it's the data node. The issue I had was that
 somehow, when I did the Hadoop upgrade, I had an older instance and a new
 instance of the data node running and it was all messed up, so I had to kill
 all Hadoop processes and do a clean start.
 
 
 On 12/27/10 9:22 PM, common-user-digest-h...@hadoop.apache.org
 common-user-digest-h...@hadoop.apache.org wrote:
 
 From: Harsh J qwertyman...@gmail.com
 Date: Tue, 28 Dec 2010 09:51:11 +0530
 To: common-user@hadoop.apache.org
 Subject: Re: UI doesn't work
 
 I remember facing such an issue with the JT (50030) once. None of the
 jsp pages would load, 'cept the index. It was some odd issue with the
 webapps not getting loaded right while startup. Don't quite remember
 how it got solved.
 
 Did you do any ant operation on your release copy of Hadoop prior to
 starting it, by the way?
 
 On Tue, Dec 28, 2010 at 5:15 AM, maha m...@umail.ucsb.edu wrote:
 Hi,
 
  I get Error 404 when I try to use hadoop UI to monitor my job execution. 
 I'm
 using Hadoop-0.20.2 and the following are parts of my configuration files.
 
   in core-site.xml:
     <name>fs.default.name</name>
     <value>hdfs://speed.cs.ucsb.edu:9000</value>

  in mapred-site.xml:
     <name>mapred.job.tracker</name>
     <value>speed.cs.ucsb.edu:9001</value>
 
 
 when I try to open:  http://speed.cs.ucsb.edu:50070/   I get the 404 Error.
 
 
 Any ideas?
 
  Thank you,
 Maha
 
 
 
 
 
 -- 
 Harsh J
 www.harshj.com
 
 