Re: Hadoop eclipse plugin stopped working after replacing hadoop-0.20.2 jar files with hadoop-0.20-append jar files

2011-06-22 Thread praveenesh kumar
I followed michael noll's tutorial for making hadoop-0-20-append jars..

http://www.michael-noll.com/blog/2011/04/14/building-an-hadoop-0-20-x-version-for-hbase-0-90-2/

After following the article, we get 5 jar files that we need to use to replace
the corresponding jar files in hadoop-0.20.2.
There is no jar file for the hadoop-eclipse plugin that I can see in my
repository if I follow that tutorial.

Also, the hadoop plugin I am using has no info on JIRA MAPREDUCE-1280
regarding whether it is compatible with hadoop-0.20-append.

Has anyone else faced this kind of issue?

Thanks,
Praveenesh


On Wed, Jun 22, 2011 at 11:48 AM, Devaraj K devara...@huawei.com wrote:

 The Hadoop eclipse plugin also uses the hadoop-core.jar file to communicate
 with the hadoop cluster. For this it needs to have the same version of
 hadoop-core.jar for the client as well as the server (hadoop cluster).

 Update the hadoop eclipse plugin in your Eclipse to the one provided with the
 hadoop-0.20-append release, and it should work fine.
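
 A rough way to sanity-check this (the lib/ layout inside the plugin jar is my
 assumption about the MAPREDUCE-1280 build, not something confirmed in this
 thread) is to compare the core jar bundled in the plugin with the one the
 cluster runs:

   # list the hadoop-core jar the eclipse plugin bundles
   unzip -l hadoop-eclipse-plugin-0.20.3-SNAPSHOT.jar | grep -i hadoop-core

   # compare against the core jar installed on the cluster
   ls $HADOOP_HOME/hadoop*core*.jar

 If the two differ, rebuild or repack the plugin against the cluster's jar.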


 Devaraj K

 -Original Message-
 From: praveenesh kumar [mailto:praveen...@gmail.com]
 Sent: Wednesday, June 22, 2011 11:25 AM
 To: common-user@hadoop.apache.org
 Subject: Hadoop eclipse plugin stopped working after replacing
 hadoop-0.20.2
 jar files with hadoop-0.20-append jar files

 Guys,
 I was using the hadoop eclipse plugin on a hadoop 0.20.2 cluster.
 It was working fine for me.
 I was using Eclipse SDK Helios 3.6.2 with the plugin
 hadoop-eclipse-plugin-0.20.3-SNAPSHOT.jar downloaded from JIRA
 MAPREDUCE-1280.

 Now, for the HBase installation, I had to use hadoop-0.20-append compiled
 jars, and I had to replace the old jar files with the new 0.20-append
 compiled jar files.
 But now, after replacing them, my hadoop eclipse plugin is not working well
 for me.
 Whenever I try to connect to my hadoop master node from the plugin and try
 to see DFS locations,
 it gives me the following error:
 Error: Protocol org.apache.hadoop.hdfs.protocol.ClientProtocol version
 mismatch (client 41, server 43)

 However, the hadoop cluster is working fine if I go directly to the hadoop
 namenode and use hadoop commands.
 I can add files to HDFS and run jobs from there; the HDFS web console and
 Map-Reduce web console are also working fine, but I am not able to use my
 previous hadoop eclipse plugin.

 Any suggestions or help for this issue ?

 Thanks,
 Praveenesh




Backup and upgrade practices?

2011-06-22 Thread Mark Kerzner
Hi,

I am planning a small Hadoop cluster, but looking ahead, are there cheap
options to have a backup of the data? If I later want to upgrade the
hardware, do I make a complete copy, or do I upgrade one node at a time?

Thank you,
Mark


Re: Automatic Configuration of Hadoop Clusters

2011-06-22 Thread Nathan Milford
http://www.opscode.com/chef/
http://trac.mcs.anl.gov/projects/bcfg2
http://cfengine.com/
http://www.puppetlabs.com/

I use chef personally, but the others are just as good and all are tuned
towards different philosophies in configuration management.
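
For the bash-script route gokul mentions below, a minimal sketch (assuming
passwordless SSH and that conf/slaves lists one hostname per line; the paths
are illustrative) would be something like:

  #!/bin/bash
  # push the local Hadoop conf directory to every node listed in conf/slaves
  HADOOP_HOME=${HADOOP_HOME:-/usr/local/hadoop}
  for host in $(cat "$HADOOP_HOME/conf/slaves"); do
    rsync -az "$HADOOP_HOME/conf/" "$host:$HADOOP_HOME/conf/"
  done

A config-management tool buys you the same thing plus versioning and
convergence, which is why I'd still reach for one of the above on anything
beyond a throwaway benchmark cluster.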

-n

On Wed, Jun 22, 2011 at 11:38 AM, gokul gokraz...@gmail.com wrote:

 Dear all,
 for benchmarking purposes we would like to adjust configurations as well as
 flexibly add and remove machines from our Hadoop clusters. Is there any
 framework around that allows this in an easy manner, without having to
 manually distribute the changed configuration files? We are considering
 writing a bash script for that purpose, but hope that there is a tool out
 there that saves us the work.
 Thanks in advance,
 Gokul

 --
 View this message in context:
 http://hadoop-common.472056.n3.nabble.com/Automatic-Configuration-of-Hadoop-Clusters-tp3096077p3096077.html
 Sent from the Users mailing list archive at Nabble.com.



RE: ClassNotFoundException while running quick start guide on Windows.

2011-06-22 Thread Sandy Pratt
Hi Drew,

I don't know if this is actually the issue or not, but the output below makes
me think you might be passing Cygwin paths into the java.exe launcher.  If
that's the case, it won't work.  java.exe is pure Windows and doesn't know
about '/cygdrive/c', for example (it also expects the path separator to be a
semicolon rather than a colon).  Every once in a while, when I try to use
java.exe from the Cygwin CLI on my Windows box, I get bitten by this.
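
A quick thing to try, assuming Cygwin's cygpath utility is available, is to
convert anything path-like to Windows form (and to semicolon separators)
before it reaches java.exe, e.g. in bin/hadoop:

  # convert POSIX-style paths to the Windows form java.exe expects
  CLASSPATH=$(cygpath -w -p "$CLASSPATH")
  HADOOP_LOG_DIR=$(cygpath -w "$HADOOP_LOG_DIR")

That's only a sketch of the idea; depending on the Hadoop version, the stock
script may already do part of this for Cygwin.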

Sandy

 -Original Message-
 From: Drew Gross [mailto:drew.a.gr...@gmail.com]
 Sent: Tuesday, June 21, 2011 21:26
 To: common-user@hadoop.apache.org
 Subject: Re: ClassNotFoundException while running quick start guide on
 Windows.
 
 Thanks Jeff, it was a problem with JAVA_HOME. I have another problem now,
 though. I have this:
 
 $JAVA:  /cygdrive/c/Program Files/Java/jdk1.6.0_26/bin/java
 $JAVA_HEAP_MAX:  -Xmx1000m
 $HADOOP_OPTS:  -Dhadoop.log.dir=C:\Users\Drew
 Gross\Documents\Projects\discom\hadoop-0.21.0\logs
 -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=C:\Users\Drew
 Gross\Documents\Projects\discom\hadoop-0.21.0\ -Dhadoop.id.str= -
 Dhadoop.root.logger=INFO,console
  -Djava.library.path=/cygdrive/c/Users/Drew
 Gross/Documents/Projects/discom/hadoop-0.21.0/lib/native/
 -Dhadoop.policy.file=hadoop-policy.xml
 $CLASS:  org.apache.hadoop.util.RunJar
 Exception in thread main java.lang.NoClassDefFoundError:
 Gross\Documents\Projects\discom\hadoop-0/21/0\logs
 Caused by: java.lang.ClassNotFoundException:
 Gross\Documents\Projects\discom\hadoop-0.21.0\logs
         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
         at java.security.AccessController.doPrivileged(Native Method)
         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
 Could not find the main class:
 Gross\Documents\Projects\discom\hadoop-0.21.0\logs.  Program will exit.
 
 (This is with some extra debugging info added by me in bin/hadoop)
 
 It looks like the Windows-style file names are causing problems, especially
 the spaces. Has anyone encountered this before, and do you know how to fix
 it? I tried escaping the spaces and surrounding the file paths with quotes
 (not at the same time), but that didn't help.
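
 One more thing I notice: the space in C:\Users\Drew Gross\... seems to split
 the -Dhadoop.log.dir=... option into two arguments, so java.exe treats the
 second half (Gross\Documents\...) as the main class name, which matches the
 error. If that's right, the simplest workaround may be to keep Hadoop and its
 log directory on a space-free path (the paths below are illustrative, not
 something I have tried yet):

   # unpack Hadoop somewhere without spaces and point the log dir there too
   export HADOOP_HOME=/cygdrive/c/hadoop/hadoop-0.21.0
   export HADOOP_LOG_DIR=/cygdrive/c/hadoop/logs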
 
 Drew
 
 
 On Tue, Jun 21, 2011 at 6:24 AM, madhu phatak phatak@gmail.com
 wrote:
 
  I think the jar has some issues where it's not able to read the Main
  class from the manifest. Try unjarring it, check the Main-Class entry in
  META-INF/MANIFEST.MF, and then run as follows:

   bin/hadoop jar hadoop-*-examples.jar <fully qualified main class> grep
  input output 'dfs[a-z.]+'
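
  A quick way to inspect that manifest without fully unpacking the jar
  (assuming the unzip tool is available):

    # print the examples jar's manifest and look for the Main-Class entry
    unzip -p hadoop-*-examples.jar META-INF/MANIFEST.MF
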
  On Thu, Jun 16, 2011 at 10:23 AM, Drew Gross drew.a.gr...@gmail.com
 wrote:
 
   Hello,
  
   I'm trying to run the example from the quick start guide on Windows
   and I get this error:
  
   $ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
   Exception in thread main java.lang.NoClassDefFoundError:
   Caused by: java.lang.ClassNotFoundException:
          at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
          at java.security.AccessController.doPrivileged(Native Method)
          at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
          at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
          at
   sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
          at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
   Could not find the main class: .  Program will exit.
   Exception in thread main java.lang.NoClassDefFoundError:
   Gross\Documents\Projects\discom\hadoop-0/21/0\logs
   Caused by: java.lang.ClassNotFoundException:
   Gross\Documents\Projects\discom\hadoop-0.21.0\logs
          at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
          at java.security.AccessController.doPrivileged(Native Method)
          at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
          at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
          at
   sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
          at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
   Could not find the main class:
   Gross\Documents\Projects\discom\hadoop-0.21.0\logs.  Program will
 exit.
  
   Does anyone know what I need to change?
  
   Thank you.
  
   From, Drew
  
   --
   Forget the environment. Print this e-mail immediately. Then burn it.
  
 
 
 
 --
 Forget the environment. Print this e-mail immediately. Then burn it.


Re: Automatic Configuration of Hadoop Clusters

2011-06-22 Thread jagaran das
Puppetize




From: gokul gokraz...@gmail.com
To: common-user@hadoop.apache.org
Sent: Wed, 22 June, 2011 8:38:13 AM
Subject: Automatic Configuration of Hadoop Clusters

Dear all,
for benchmarking purposes we would like to adjust configurations as well as
flexibly add and remove machines from our Hadoop clusters. Is there any
framework around that allows this in an easy manner, without having to
manually distribute the changed configuration files? We are considering
writing a bash script for that purpose, but hope that there is a tool out
there that saves us the work.
Thanks in advance,
Gokul

--
View this message in context: 
http://hadoop-common.472056.n3.nabble.com/Automatic-Configuration-of-Hadoop-Clusters-tp3096077p3096077.html

Sent from the Users mailing list archive at Nabble.com.


Re: Any reason Hadoop logs can't be directed to a separate filesystem?

2011-06-22 Thread Madhu Ramanna
It looks like you missed the '#' at the beginning of that line; it is
commented out by default.

Feel free to set HADOOP_LOG_DIR in that script or elsewhere.
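
For example (the path is illustrative), uncommenting and editing that line in
conf/hadoop-env.sh is usually all it takes:

  # Where log files are stored.  $HADOOP_HOME/logs by default.
  export HADOOP_LOG_DIR=/var/log/hadoop

The daemons pick it up the next time they are restarted through the regular
start scripts.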

On 6/22/11 1:02 PM, Jack Craig jcr...@carrieriq.com wrote:

Hi Folks,

In the hadoop-env.sh, we find, ...

# Where log files are stored.  $HADOOP_HOME/logs by default.
# export HADOOP_LOG_DIR=${HADOOP_HOME}/logs

is there any reason this location could not be a separate filesystem on
the name node?

Thx, jackc...

Jack Craig, Operations
CarrierIQ.com http://CarrierIQ.com
1200 Villa Ct, Suite 200
Mountain View, CA. 94041
650-625-5456




Re: Any reason Hadoop logs can't be directed to a separate filesystem?

2011-06-22 Thread Harsh J
Jack,

I believe the location can definitely be set to any desired path.
Could you tell us the issues you face when you change it?

P.s. The env var is used to set the config property hadoop.log.dir
internally. So as long as you use the regular scripts (bin/ or init.d/
ones) to start daemons, it would apply fine.

On Thu, Jun 23, 2011 at 1:32 AM, Jack Craig jcr...@carrieriq.com wrote:
 Hi Folks,

 In the hadoop-env.sh, we find, ...

 # Where log files are stored.  $HADOOP_HOME/logs by default.
 # export HADOOP_LOG_DIR=${HADOOP_HOME}/logs

 is there any reason this location could not be a separate filesystem on the 
 name node?

 Thx, jackc...
 
 Jack Craig, Operations
 CarrierIQ.com http://CarrierIQ.com
 1200 Villa Ct, Suite 200
 Mountain View, CA. 94041
 650-625-5456





-- 
Harsh J


Re: Any reason Hadoop logs can't be directed to a separate filesystem?

2011-06-22 Thread Jack Craig
Thx to both respondents.

Note I've not tried this redirection, as I have only production grids
available.

Our grids are growing and, with them, log volume.

Until now the log location has been on the same filesystem as the grid data,
so running out of space due to log bloat is a growing problem.

From your replies, it sounds like I can relocate my logs. Cool!

But now the tough question: if I set up too small a partition and it runs out
of space,
will my grid become unstable if hadoop can no longer write to its logs?

Thx again, jackc...


Jack Craig, Operations
CarrierIQ.com http://CarrierIQ.com
1200 Villa Ct, Suite 200
Mountain View, CA. 94041
650-625-5456

On Jun 22, 2011, at 1:09 PM, Harsh J wrote:

Jack,

I believe the location can definitely be set to any desired path.
Could you tell us the issues you face when you change it?

P.s. The env var is used to set the config property hadoop.log.dir
internally. So as long as you use the regular scripts (bin/ or init.d/
ones) to start daemons, it would apply fine.

On Thu, Jun 23, 2011 at 1:32 AM, Jack Craig 
jcr...@carrieriq.com wrote:
Hi Folks,

In the hadoop-env.sh, we find, ...

# Where log files are stored.  $HADOOP_HOME/logs by default.
# export HADOOP_LOG_DIR=${HADOOP_HOME}/logs

is there any reason this location could not be a separate filesystem on the 
name node?

Thx, jackc...

Jack Craig, Operations
CarrierIQ.com http://CarrierIQ.com
1200 Villa Ct, Suite 200
Mountain View, CA. 94041
650-625-5456





--
Harsh J



Re: Any reason Hadoop logs can't be directed to a separate filesystem?

2011-06-22 Thread jagaran das
Hi,

Can I limit how long log files are kept?
I want to keep the files for the last 15 days only.
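
If nothing in Hadoop itself handles this, I may just fall back to a cron
cleanup along these lines (the log path is illustrative):

  # remove Hadoop log files older than 15 days
  find /var/log/hadoop -type f -mtime +15 -delete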

Regards,
Jagaran 




From: Jack Craig jcr...@carrieriq.com
To: common-user@hadoop.apache.org common-user@hadoop.apache.org
Sent: Wed, 22 June, 2011 2:00:23 PM
Subject: Re: Any reason Hadoop logs can't be directed to a separate filesystem?

Thx to both respondents.

Note i've not tried this redirection as I have only production grids available.

Our grids are growing and with them, log volume.

As until now that log location has been in the same fs as the grid data,
so running out of space due log bloat is a growing problem.

From your replies, sounds like I can relocate my logs, Cool!

But now the tough question, if i set up a too small partition and it runs out 
of 
space,
will my grid become unstable if hadoop can no longer write to its logs?

Thx again, jackc...


Jack Craig, Operations
CarrierIQ.com http://CarrierIQ.com
1200 Villa Ct, Suite 200
Mountain View, CA. 94041
650-625-5456

On Jun 22, 2011, at 1:09 PM, Harsh J wrote:

Jack,

I believe the location can definitely be set to any desired path.
Could you tell us the issues you face when you change it?

P.s. The env var is used to set the config property hadoop.log.dir
internally. So as long as you use the regular scripts (bin/ or init.d/
ones) to start daemons, it would apply fine.

On Thu, Jun 23, 2011 at 1:32 AM, Jack Craig 
jcr...@carrieriq.com wrote:
Hi Folks,

In the hadoop-env.sh, we find, ...

# Where log files are stored.  $HADOOP_HOME/logs by default.
# export HADOOP_LOG_DIR=${HADOOP_HOME}/logs

is there any reason this location could not be a separate filesystem on the 
name 
node?

Thx, jackc...

Jack Craig, Operations
CarrierIQ.com http://CarrierIQ.com
1200 Villa Ct, Suite 200
Mountain View, CA. 94041
650-625-5456





--
Harsh J

Re: Problem debugging MapReduce job under Windows

2011-06-22 Thread Sal
I had the same issue.  I installed the previous stable version of Hadoop
(0.20.2), and it worked fine.  I hope this helps.

-Sal




Re: Hadoop Eclipse plugin 0.20.203.0 doesn't work

2011-06-22 Thread Jack Ye
Can anyone help me?

叶达峰 kobe082...@qq.com wrote:

Hi,
  
 I am new to Hadoop. Today I spent the whole night trying to set up a
 development environment for Hadoop. I encountered several problems. The
 first was that Eclipse couldn't load the plugin; I changed to another
 version and that problem was resolved.

 But now I have a more difficult problem. I tried to set up a Map/Reduce
 Location; if everything were well, it should connect to the server and the
 DFS Location should list the whole file system. Sadly, it doesn't work.
 I have checked the configuration several times; it should be correct.
  
 Here is the message I get:
 Error: failure to login
 An internal error occurred during: Map/reduce location status updater.
 org/codehaus/jackson/map/JsonMappingException

Re: OutOfMemoryError: GC overhead limit exceeded

2011-06-22 Thread hadoopman
I've run into similar problems in my hive jobs and will look at the
'mapred.child.ulimit' option.  One thing we've found is that when loading
data with INSERT OVERWRITE into our Hive tables we've needed to include a
'CLUSTER BY' or 'DISTRIBUTE BY' clause.  Generally that has fixed our memory
issues during the reduce phase, but not 100% of the time (close, though).


I understand the basics of what those options do, but I'm unclear as to why
they are necessary (coming from an Oracle and Postgres DBA background).  I'm
guessing it has something to do with the underlying code.
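
For reference, this is how I'd pass both knobs to a single job (the values
are illustrative, and it assumes the job goes through GenericOptionsParser /
Tool so that -D options are honored):

  # 3 GB child heap, GC overhead limit disabled, and a virtual-memory ulimit
  # (mapred.child.ulimit is in kilobytes and should comfortably exceed -Xmx)
  hadoop jar my-job.jar com.example.MyJob \
    -Dmapred.child.java.opts="-Xmx3072m -XX:-UseGCOverheadLimit" \
    -Dmapred.child.ulimit=4194304 \
    input output

Whether that actually cures a GC-overhead error depends on the job itself, of
course; it mostly rules the limits out as the cause.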




On 06/18/2011 12:28 PM, Mapred Learn wrote:

Did you try playing with mapred.child.ulimit along with java.opts?

Sent from my iPhone

On Jun 18, 2011, at 9:55 AM, Ken Williams zoo9...@hotmail.com wrote:


Hi All,

I'm having a problem running a job on Hadoop. Using Mahout, I've been able to
run several Bayesian classifiers and train and test them successfully on
increasingly large datasets. Now I'm working on a dataset of 100,000 documents
(size 100MB). I've trained the classifier on 80,000 docs and am using the
remaining 20,000 as the test set. I've been able to train the classifier, but
when I try to 'testclassifier' all the map tasks fail with a 'Caused by:
java.lang.OutOfMemoryError: GC overhead limit exceeded' exception before the
job itself is 'Killed'. I have a small cluster of 3 machines but plenty of
memory and CPU power (3 x 16GB, 2.5GHz quad-core machines).

I've tried setting 'mapred.child.java.opts' flags up to 3GB in size (-Xms3G
-Xmx3G) but still get the same error. I've also tried setting HADOOP_HEAPSIZE
to values like 2000, 2500 and 3000, but this made no difference. When the
program is running I can use 'top' to see that although the CPUs are busy,
memory usage rarely goes above 12GB and absolutely no swapping is taking
place. (See program console output: http://pastebin.com/0m2Uduxa, job config
file: http://pastebin.com/4GEFSnUM).

I found a similar report of a 'GC overhead limit exceeded' error where the
program was spending so much time garbage-collecting (more than 90% of its
time!) that it was unable to progress, and so threw the 'GC overhead limit
exceeded' exception. If I set -XX:-UseGCOverheadLimit in the
'mapred.child.java.opts' property to avoid this exception, then I see the
same behaviour as before, only a slightly different exception is thrown:
Caused by: java.lang.OutOfMemoryError: Java heap space at
java.nio.HeapCharBuffer.<init>(HeapCharBuffer.java:39)

So I'm guessing that maybe my program is spending too much time
garbage-collecting for it to progress? But how do I fix this? There's no
further info in the log files other than the exceptions being thrown. I tried
reducing the 'dfs.block.size' parameter to reduce the amount of data going
into each 'map' process (and therefore reduce its memory requirements), but
this made no difference. I tried various settings for JVM reuse
(mapred.job.reuse.jvm.num.tasks), using values for no re-use (0), limited
re-use (10), and unlimited re-use (-1), but saw no difference. I think the
problem is in the job configuration parameters, but how do I find it? I'm
using Hadoop 0.20.2 and the latest Mahout snapshot version. All machines are
running 64-bit Ubuntu and Java 6. Any help would be very much appreciated,

   Ken Williams








 
   




Re: Poor scalability with map reduce application

2011-06-22 Thread Alberto Andreotti
Hi guys,

I suspected that the problem was due to overhead introduced by the
filesystem, so I tried to set the dfs.replication.max property to
different values.
First, I tried with 2, and I got a message saying that I was requesting a
value of 3, which was bigger than the limit, so I couldn't do the run (it
seems this 3 is hardcoded somewhere; I read that in Jira).
Then I tried with 3; I could generate the input files for the map reduce
app, but when trying to run I got this one:

Exception in thread main java.io.IOException: file
/tmp/hadoop-aandre/mapred/staging/aandre/.staging/job_201106230004_0003/job.jar.
Requested replication 10 exceeds maximum 3
at
org.apache.hadoop.hdfs.server.namenode.BlockManager.verifyReplication(BlockManager.java:468)


which makes it seem like the framework is trying to replicate the job files
on as many nodes as possible. Could this be the source of the degradation?
Also, I attached the log for the run with 7 nodes.

Alberto.


On 21 June 2011 14:40, Harsh J ha...@cloudera.com wrote:

 Matt,

 You're right that it (slowstart) does not / would not affect much. I
 was merely explaining the reason behind his observance of reducers
 getting scheduled early, not really recommending a tweak for
 performance changes there.

 On Tue, Jun 21, 2011 at 10:46 PM, GOEKE, MATTHEW (AG/1000)
 matthew.go...@monsanto.com wrote:
  Harsh,
 
  Is it possible for mapred.reduce.slowstart.completed.maps to even play a
 significant role in this? The only benefit he would find in tweaking that
 for his problem would be to spread network traffic from the shuffle over a
 longer period of time at a cost of having the reducer using resources
 earlier. Either way he would see this effect across both sets of runs if he
 is using the default parameters. I guess it would all depend on what kind of
 network layout the cluster is on.
 
  Matt
 
  -Original Message-
  From: Harsh J [mailto:ha...@cloudera.com]
  Sent: Tuesday, June 21, 2011 12:09 PM
  To: common-user@hadoop.apache.org
  Subject: Re: Poor scalability with map reduce application
 
  Alberto,
 
  On Tue, Jun 21, 2011 at 10:27 PM, Alberto Andreotti
  albertoandreo...@gmail.com wrote:
  I don't know if speculatives maps are on, I'll check it. One thing I
  observed is that reduces begin before all maps have finished. Let me
 check
  also if the difference is on the map side or in the reduce. I believe
 it's
  balanced, both are slower when adding more nodes, but i'll confirm that.
 
  Maps and reduces are speculative by default, so must've been ON. Could
  you also post a general input vs. output record counts and statistics
  like that between your job runs, to correlate?
 
  The reducers get scheduled early but do not exactly reduce() until
  all maps are done. They just keep fetching outputs. Their scheduling
  can be controlled with some configurations (say, to start only after
  X% of maps are done -- by default it starts up when 5% of maps are
  done).
 
  --
  Harsh J
  This e-mail message may contain privileged and/or confidential
 information, and is intended to be received only by persons entitled
  to receive such information. If you have received this e-mail in error,
 please notify the sender immediately. Please delete it and
  all attachments from any servers, hard drives or any other media. Other
 use of this e-mail by you is strictly prohibited.
 
  All e-mails and attachments sent and received are subject to monitoring,
 reading and archival by Monsanto, including its
  subsidiaries. The recipient of this e-mail is solely responsible for
 checking for the presence of Viruses or other Malware.
  Monsanto, along with its subsidiaries, accepts no liability for any
 damage caused by any such code transmitted by or accompanying
  this e-mail or any attachment.
 
 
  The information contained in this email may be subject to the export
 control laws and regulations of the United States, potentially
  including but not limited to the Export Administration Regulations (EAR)
 and sanctions regulations issued by the U.S. Department of
  Treasury, Office of Foreign Asset Controls (OFAC).  As a recipient of
 this information you are obligated to comply with all
  applicable U.S. export laws and regulations.
 
 



 --
 Harsh J




-- 
José Pablo Alberto Andreotti.
Tel: 54 351 4730292
Móvil: 54351156526363.
MSN: albertoandreo...@gmail.com
Skype: andreottialberto
11/06/23 01:09:38 INFO security.Groups: Group mapping 
impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=30
11/06/23 01:09:38 WARN conf.Configuration: mapred.task.id is deprecated. 
Instead, use mapreduce.task.attempt.id
11/06/23 01:09:38 WARN mapreduce.JobSubmitter: Use GenericOptionsParser for 
parsing the arguments. Applications should implement Tool for the same.
11/06/23 01:09:38 INFO input.FileInputFormat: Total input paths to process : 1
11/06/23 01:09:40 WARN conf.Configuration: mapred.map.tasks is deprecated. 
Instead, use mapreduce.job.maps
11/06/23 

Re: Hadoop eclipse plugin stopped working after replacing hadoop-0.20.2 jar files with hadoop-0.20-append jar files

2011-06-22 Thread Yaozhen Pan
Hi,

I am using Eclipse Helios Service Release 2.
I encountered a similar problem (the map/reduce perspective failed to load)
when upgrading the eclipse plugin from 0.20.2 to the 0.20.3-append version.

I compared the source code of the eclipse plugin and found only a few
differences. I tried reverting the differences one by one to see if it would
work.
What surprised me was that when I only reverted the jar name from
hadoop-0.20.3-eclipse-plugin.jar to hadoop-0.20.2-eclipse-plugin.jar, it
worked in eclipse.

Yaozhen


On Thu, Jun 23, 2011 at 1:22 AM, praveenesh kumar praveen...@gmail.comwrote:

 I am doing that; it's not working. If I replace the hadoop-core jar inside
 hadoop-plugin.jar, I am not able to see the map-reduce perspective at all.
 Guys, any help?!

 Thanks,
 Praveenesh

 On Wed, Jun 22, 2011 at 12:34 PM, Devaraj K devara...@huawei.com wrote:

  Every time hadoop is built, it also builds the hadoop eclipse plug-in
  using the latest hadoop core jar. In your case the eclipse plug-in contains
  one version of the jar and the cluster is running another version. That's
  why it is giving the version mismatch error.
 
 
 
  Just replace the hadoop-core jar in your eclipse plug-in with whatever jar
  the hadoop cluster is using, and check.
 
 
 
  Devaraj K
 
   _
 
  From: praveenesh kumar [mailto:praveen...@gmail.com]
  Sent: Wednesday, June 22, 2011 12:07 PM
  To: common-user@hadoop.apache.org; devara...@huawei.com
  Subject: Re: Hadoop eclipse plugin stopped working after replacing
  hadoop-0.20.2 jar files with hadoop-0.20-append jar files
 
 
 
   I followed michael noll's tutorial for making hadoop-0-20-append jars..
 
 
 
 http://www.michael-noll.com/blog/2011/04/14/building-an-hadoop-0-20-x-versio
  n-for-hbase-0-90-2/
 
  After following the article.. we get 5 jar files which we need to replace
  it
  from hadoop.0.20.2 jar file.
  There is no jar file for hadoop-eclipse plugin..that I can see in my
  repository if I follow that tutorial.
 
  Also the hadoop-plugin I am using..has no info on JIRA MAPREDUCE-1280
  regarding whether it is compatible with hadoop-0.20-append.
 
  Does anyone else. faced this kind of issue ???
 
  Thanks,
  Praveenesh
 
 
 
  On Wed, Jun 22, 2011 at 11:48 AM, Devaraj K devara...@huawei.com
 wrote:
 
  Hadoop eclipse plugin also uses hadoop-core.jar file communicate to the
  hadoop cluster. For this it needs to have same version of hadoop-core.jar
  for client as well as server(hadoop cluster).
 
  Update the hadoop eclipse plugin for your eclipse which is provided with
  hadoop-0.20-append release, it will work fine.
 
 
  Devaraj K
 
  -Original Message-
  From: praveenesh kumar [mailto:praveen...@gmail.com]
  Sent: Wednesday, June 22, 2011 11:25 AM
  To: common-user@hadoop.apache.org
  Subject: Hadoop eclipse plugin stopped working after replacing
  hadoop-0.20.2
  jar files with hadoop-0.20-append jar files
 
 
  Guys,
  I was using hadoop eclipse plugin on hadoop 0.20.2 cluster..
  It was working fine for me.
  I was using Eclipse SDK Helios 3.6.2 with the plugin
  hadoop-eclipse-plugin-0.20.3-SNAPSHOT.jar downloaded from JIRA
  MAPREDUCE-1280
 
  Now for Hbase installation.. I had to use hadoop-0.20-append compiled
  jars..and I had to replace the old jar files with new 0.20-append
 compiled
  jar files..
  But now after replacing .. my hadoop eclipse plugin is not working well
 for
  me.
  Whenever I am trying to connect to my hadoop master node from that and
 try
  to see DFS locations..
  it is giving me the following error:
  *
  Error : Protocol org.apache.hadoop.hdfs.protocol.clientprotocol version
  mismatch (client 41 server 43)*
 
  However the hadoop cluster is working fine if I go directly on hadoop
  namenode use hadoop commands..
  I can add files to HDFS.. run jobs from there.. HDFS web console and
  Map-Reduce web console are also working fine. but not able to use my
  previous hadoop eclipse plugin.
 
  Any suggestions or help for this issue ?
 
  Thanks,
  Praveenesh
 
 
 
 



Re: Hadoop eclipse plugin stopped working after replacing hadoop-0.20.2 jar files with hadoop-0.20-append jar files

2011-06-22 Thread Jack Ye
Do you use hadoop 0.20.203.0?
I also have a problem with this plugin.

Yaozhen Pan itzhak@gmail.com wrote:

Hi,

I am using Eclipse Helios Service Release 2.
I encountered a similar problem (map/reduce perspective failed to load) when
upgrading eclipse plugin from 0.20.2 to 0.20.3-append version.

I compared the source code of eclipse plugin and found only a few
difference. I tried to revert the differences one by one to see if it can
work.
What surprised me was that when I only reverted the jar name from
hadoop-0.20.3-eclipse-plugin.jar to hadoop-0.20.2-eclipse-plugin.jar, it
worked in eclipse.

Yaozhen


On Thu, Jun 23, 2011 at 1:22 AM, praveenesh kumar praveen...@gmail.comwrote:

 I am doing that.. its not working.. If I am replacing the hadoop-core from
 hadoop-plugin.jar.. I am not able to see map-reduce perspective at all.
 Guys.. any help.. !!!

 Thanks,
 Praveenesh

 On Wed, Jun 22, 2011 at 12:34 PM, Devaraj K devara...@huawei.com wrote:

  Every time when hadoop builds, it also builds the hadoop eclipse plug-in
  using the latest hadoop core jar. In your case eclipse plug-in contains
 the
  other version jar and cluster is running with other version. That's why
 it
  is giving the version mismatch error.
 
 
 
  Just replace the hadoop-core jar in your eclipse plug-in with the jar
  whatever the hadoop cluster is using  and check.
 
 
 
  Devaraj K
 
   _
 
  From: praveenesh kumar [mailto:praveen...@gmail.com]
  Sent: Wednesday, June 22, 2011 12:07 PM
  To: common-user@hadoop.apache.org; devara...@huawei.com
  Subject: Re: Hadoop eclipse plugin stopped working after replacing
  hadoop-0.20.2 jar files with hadoop-0.20-append jar files
 
 
 
   I followed michael noll's tutorial for making hadoop-0-20-append jars..
 
 
 
 http://www.michael-noll.com/blog/2011/04/14/building-an-hadoop-0-20-x-versio
  n-for-hbase-0-90-2/
 
  After following the article.. we get 5 jar files which we need to replace
  it
  from hadoop.0.20.2 jar file.
  There is no jar file for hadoop-eclipse plugin..that I can see in my
  repository if I follow that tutorial.
 
  Also the hadoop-plugin I am using..has no info on JIRA MAPREDUCE-1280
  regarding whether it is compatible with hadoop-0.20-append.
 
  Does anyone else. faced this kind of issue ???
 
  Thanks,
  Praveenesh
 
 
 
  On Wed, Jun 22, 2011 at 11:48 AM, Devaraj K devara...@huawei.com
 wrote:
 
  Hadoop eclipse plugin also uses hadoop-core.jar file communicate to the
  hadoop cluster. For this it needs to have same version of hadoop-core.jar
  for client as well as server(hadoop cluster).
 
  Update the hadoop eclipse plugin for your eclipse which is provided with
  hadoop-0.20-append release, it will work fine.
 
 
  Devaraj K
 
  -Original Message-
  From: praveenesh kumar [mailto:praveen...@gmail.com]
  Sent: Wednesday, June 22, 2011 11:25 AM
  To: common-user@hadoop.apache.org
  Subject: Hadoop eclipse plugin stopped working after replacing
  hadoop-0.20.2
  jar files with hadoop-0.20-append jar files
 
 
  Guys,
  I was using hadoop eclipse plugin on hadoop 0.20.2 cluster..
  It was working fine for me.
  I was using Eclipse SDK Helios 3.6.2 with the plugin
  hadoop-eclipse-plugin-0.20.3-SNAPSHOT.jar downloaded from JIRA
  MAPREDUCE-1280
 
  Now for Hbase installation.. I had to use hadoop-0.20-append compiled
  jars..and I had to replace the old jar files with new 0.20-append
 compiled
  jar files..
  But now after replacing .. my hadoop eclipse plugin is not working well
 for
  me.
  Whenever I am trying to connect to my hadoop master node from that and
 try
  to see DFS locations..
  it is giving me the following error:
  *
  Error : Protocol org.apache.hadoop.hdfs.protocol.clientprotocol version
  mismatch (client 41 server 43)*
 
  However the hadoop cluster is working fine if I go directly on hadoop
  namenode use hadoop commands..
  I can add files to HDFS.. run jobs from there.. HDFS web console and
  Map-Reduce web console are also working fine. but not able to use my
  previous hadoop eclipse plugin.
 
  Any suggestions or help for this issue ?
 
  Thanks,
  Praveenesh
 
 
 
 



Re: Hadoop eclipse plugin stopped working after replacing hadoop-0.20.2 jar files with hadoop-0.20-append jar files

2011-06-22 Thread Yaozhen Pan
Hi,

Our hadoop version was built on 0.20-append with a few patches.
However, I didn't see big differences in the eclipse plugin.

Yaozhen

On Thu, Jun 23, 2011 at 11:29 AM, 叶达峰 (Jack Ye) kobe082...@qq.com wrote:

 do you use hadoop 0.20.203.0?
 I also have problem about this plugin.

 Yaozhen Pan itzhak@gmail.com wrote:

 Hi,
 
 I am using Eclipse Helios Service Release 2.
 I encountered a similar problem (map/reduce perspective failed to load)
 when
 upgrading eclipse plugin from 0.20.2 to 0.20.3-append version.
 
 I compared the source code of eclipse plugin and found only a few
 difference. I tried to revert the differences one by one to see if it can
 work.
 What surprised me was that when I only reverted the jar name from
 hadoop-0.20.3-eclipse-plugin.jar to hadoop-0.20.2-eclipse-plugin.jar,
 it
 worked in eclipse.
 
 Yaozhen
 
 
 On Thu, Jun 23, 2011 at 1:22 AM, praveenesh kumar praveen...@gmail.com
 wrote:
 
  I am doing that.. its not working.. If I am replacing the hadoop-core
 from
  hadoop-plugin.jar.. I am not able to see map-reduce perspective at all.
  Guys.. any help.. !!!
 
  Thanks,
  Praveenesh
 
  On Wed, Jun 22, 2011 at 12:34 PM, Devaraj K devara...@huawei.com
 wrote:
 
   Every time when hadoop builds, it also builds the hadoop eclipse
 plug-in
   using the latest hadoop core jar. In your case eclipse plug-in
 contains
  the
   other version jar and cluster is running with other version. That's
 why
  it
   is giving the version mismatch error.
  
  
  
   Just replace the hadoop-core jar in your eclipse plug-in with the jar
   whatever the hadoop cluster is using  and check.
  
  
  
   Devaraj K
  
_
  
   From: praveenesh kumar [mailto:praveen...@gmail.com]
   Sent: Wednesday, June 22, 2011 12:07 PM
   To: common-user@hadoop.apache.org; devara...@huawei.com
   Subject: Re: Hadoop eclipse plugin stopped working after replacing
   hadoop-0.20.2 jar files with hadoop-0.20-append jar files
  
  
  
I followed michael noll's tutorial for making hadoop-0-20-append
 jars..
  
  
  
 
 http://www.michael-noll.com/blog/2011/04/14/building-an-hadoop-0-20-x-versio
   n-for-hbase-0-90-2/
  
   After following the article.. we get 5 jar files which we need to
 replace
   it
   from hadoop.0.20.2 jar file.
   There is no jar file for hadoop-eclipse plugin..that I can see in my
   repository if I follow that tutorial.
  
   Also the hadoop-plugin I am using..has no info on JIRA MAPREDUCE-1280
   regarding whether it is compatible with hadoop-0.20-append.
  
   Does anyone else. faced this kind of issue ???
  
   Thanks,
   Praveenesh
  
  
  
   On Wed, Jun 22, 2011 at 11:48 AM, Devaraj K devara...@huawei.com
  wrote:
  
   Hadoop eclipse plugin also uses hadoop-core.jar file communicate to
 the
   hadoop cluster. For this it needs to have same version of
 hadoop-core.jar
   for client as well as server(hadoop cluster).
  
   Update the hadoop eclipse plugin for your eclipse which is provided
 with
   hadoop-0.20-append release, it will work fine.
  
  
   Devaraj K
  
   -Original Message-
   From: praveenesh kumar [mailto:praveen...@gmail.com]
   Sent: Wednesday, June 22, 2011 11:25 AM
   To: common-user@hadoop.apache.org
   Subject: Hadoop eclipse plugin stopped working after replacing
   hadoop-0.20.2
   jar files with hadoop-0.20-append jar files
  
  
   Guys,
   I was using hadoop eclipse plugin on hadoop 0.20.2 cluster..
   It was working fine for me.
   I was using Eclipse SDK Helios 3.6.2 with the plugin
   hadoop-eclipse-plugin-0.20.3-SNAPSHOT.jar downloaded from JIRA
   MAPREDUCE-1280
  
   Now for Hbase installation.. I had to use hadoop-0.20-append compiled
   jars..and I had to replace the old jar files with new 0.20-append
  compiled
   jar files..
   But now after replacing .. my hadoop eclipse plugin is not working
 well
  for
   me.
   Whenever I am trying to connect to my hadoop master node from that and
  try
   to see DFS locations..
   it is giving me the following error:
   *
   Error : Protocol org.apache.hadoop.hdfs.protocol.clientprotocol
 version
   mismatch (client 41 server 43)*
  
   However the hadoop cluster is working fine if I go directly on hadoop
   namenode use hadoop commands..
   I can add files to HDFS.. run jobs from there.. HDFS web console and
   Map-Reduce web console are also working fine. but not able to use my
   previous hadoop eclipse plugin.
  
   Any suggestions or help for this issue ?
  
   Thanks,
   Praveenesh
  
  
  
  
 



Re: Hadoop eclipse plugin stopped working after replacing hadoop-0.20.2 jar files with hadoop-0.20-append jar files

2011-06-22 Thread Jack Ye
I used 0.20.203.0 and can't access the DFS locations.
The following is the error:
failure to login
 internal error: map/reduce location status updater
org/codehaus/jackson/map/JsonMappingException

Yaozhen Pan itzhak@gmail.com wrote:

Hi,

Our hadoop version was built on 0.20-append with a few patches.
However, I didn't see big differences in eclipse-plugin.

Yaozhen

On Thu, Jun 23, 2011 at 11:29 AM, 叶达峰 (Jack Ye) kobe082...@qq.com wrote:

 do you use hadoop 0.20.203.0?
 I also have problem about this plugin.

 Yaozhen Pan itzhak@gmail.com wrote:

 Hi,
 
 I am using Eclipse Helios Service Release 2.
 I encountered a similar problem (map/reduce perspective failed to load)
 when
 upgrading eclipse plugin from 0.20.2 to 0.20.3-append version.
 
 I compared the source code of eclipse plugin and found only a few
 difference. I tried to revert the differences one by one to see if it can
 work.
 What surprised me was that when I only reverted the jar name from
 hadoop-0.20.3-eclipse-plugin.jar to hadoop-0.20.2-eclipse-plugin.jar,
 it
 worked in eclipse.
 
 Yaozhen
 
 
 On Thu, Jun 23, 2011 at 1:22 AM, praveenesh kumar praveen...@gmail.com
 wrote:
 
  I am doing that.. its not working.. If I am replacing the hadoop-core
 from
  hadoop-plugin.jar.. I am not able to see map-reduce perspective at all.
  Guys.. any help.. !!!
 
  Thanks,
  Praveenesh
 
  On Wed, Jun 22, 2011 at 12:34 PM, Devaraj K devara...@huawei.com
 wrote:
 
   Every time when hadoop builds, it also builds the hadoop eclipse
 plug-in
   using the latest hadoop core jar. In your case eclipse plug-in
 contains
  the
   other version jar and cluster is running with other version. That's
 why
  it
   is giving the version mismatch error.
  
  
  
   Just replace the hadoop-core jar in your eclipse plug-in with the jar
   whatever the hadoop cluster is using  and check.
  
  
  
   Devaraj K
  
_
  
   From: praveenesh kumar [mailto:praveen...@gmail.com]
   Sent: Wednesday, June 22, 2011 12:07 PM
   To: common-user@hadoop.apache.org; devara...@huawei.com
   Subject: Re: Hadoop eclipse plugin stopped working after replacing
   hadoop-0.20.2 jar files with hadoop-0.20-append jar files
  
  
  
I followed michael noll's tutorial for making hadoop-0-20-append
 jars..
  
  
  
 
 http://www.michael-noll.com/blog/2011/04/14/building-an-hadoop-0-20-x-versio
   n-for-hbase-0-90-2/
  
   After following the article.. we get 5 jar files which we need to
 replace
   it
   from hadoop.0.20.2 jar file.
   There is no jar file for hadoop-eclipse plugin..that I can see in my
   repository if I follow that tutorial.
  
   Also the hadoop-plugin I am using..has no info on JIRA MAPREDUCE-1280
   regarding whether it is compatible with hadoop-0.20-append.
  
   Does anyone else. faced this kind of issue ???
  
   Thanks,
   Praveenesh
  
  
  
   On Wed, Jun 22, 2011 at 11:48 AM, Devaraj K devara...@huawei.com
  wrote:
  
   Hadoop eclipse plugin also uses hadoop-core.jar file communicate to
 the
   hadoop cluster. For this it needs to have same version of
 hadoop-core.jar
   for client as well as server(hadoop cluster).
  
   Update the hadoop eclipse plugin for your eclipse which is provided
 with
   hadoop-0.20-append release, it will work fine.
  
  
   Devaraj K
  
   -Original Message-
   From: praveenesh kumar [mailto:praveen...@gmail.com]
   Sent: Wednesday, June 22, 2011 11:25 AM
   To: common-user@hadoop.apache.org
   Subject: Hadoop eclipse plugin stopped working after replacing
   hadoop-0.20.2
   jar files with hadoop-0.20-append jar files
  
  
   Guys,
   I was using hadoop eclipse plugin on hadoop 0.20.2 cluster..
   It was working fine for me.
   I was using Eclipse SDK Helios 3.6.2 with the plugin
   hadoop-eclipse-plugin-0.20.3-SNAPSHOT.jar downloaded from JIRA
   MAPREDUCE-1280
  
   Now for Hbase installation.. I had to use hadoop-0.20-append compiled
   jars..and I had to replace the old jar files with new 0.20-append
  compiled
   jar files..
   But now after replacing .. my hadoop eclipse plugin is not working
 well
  for
   me.
   Whenever I am trying to connect to my hadoop master node from that and
  try
   to see DFS locations..
   it is giving me the following error:
   *
   Error : Protocol org.apache.hadoop.hdfs.protocol.clientprotocol
 version
   mismatch (client 41 server 43)*
  
   However the hadoop cluster is working fine if I go directly on hadoop
   namenode use hadoop commands..
   I can add files to HDFS.. run jobs from there.. HDFS web console and
   Map-Reduce web console are also working fine. but not able to use my
   previous hadoop eclipse plugin.
  
   Any suggestions or help for this issue ?
  
   Thanks,
   Praveenesh
  
  
  
  
 



Re: Poor scalability with map reduce application

2011-06-22 Thread Harsh J
Alberto,

I can assure you that fiddling with the default replication factor can't
be the solution here. Most of us running 3+ node clusters still use a
replication factor of 3, and it hardly introduces any performance lag. As
long as your Hadoop cluster's network is not shared with other network
applications, you shouldn't be seeing any network slowdowns.

Anyhow, dfs.replication.max is not what you were looking to change. It was
dfs.replication instead (to affect the replication value of all new files).
AFAIK, there is no replication factor hardcoded anywhere in the code; it's
all configurable, so it's just a matter of setting the right
configuration :)

Regarding the 10 thing: The MR components try to load their jars and
other submitted code/files with a 10 replication factor by default, so
that it propagates to all racks/etc and leads to a fast startup of
tasks. I do not think that's a problem either in your case (if it gets
4, it will use 4, if it gets 7, it will use 7 -- but won't take too
long).
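
If you do want to pin those down explicitly for a run, something like this
should work (the values are illustrative, and it assumes the job is submitted
through GenericOptionsParser so that -D options take effect):

  # per-file HDFS replication, plus the replication used for submitted job
  # files (mapred.submit.replication, which defaults to 10)
  hadoop jar my-job.jar com.example.MyJob \
    -Ddfs.replication=3 \
    -Dmapred.submit.replication=3 \
    input output

That said, I would not expect either setting to explain the poor scaling you
are seeing; I'd look at the per-job counters and task timings first.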

On Thu, Jun 23, 2011 at 6:14 AM, Alberto Andreotti
albertoandreo...@gmail.com wrote:
 Hi guys,

 I suspected that the problem was due to overhead introduced by the
 filesystem, so I tried to set the dfs.replication.max property to
 different values.
 First, I tried with 2, and I got a message saying that I was requesting a
 value of 3, which was bigger than the limit. So I couldn't do the run(it
 seems this 3 is hardcoded somewhere, I read that in Jira).
 Then I tried with 3, I could generate the input files for the map reduce
 app, but when trying to run I got this one,

 Exception in thread main java.io.IOException: file
 /tmp/hadoop-aandre/mapred/staging/aandre/.staging/job_201106230004_0003/job.jar.
 Requested replication 10 exceeds maximum 3
     at
 org.apache.hadoop.hdfs.server.namenode.BlockManager.verifyReplication(BlockManager.java:468)


 which seems like the framework were trying to replicate the output in as
 many nodes as possible. Could this be the degradation source?.
 Also I attached the log for the run with 7 nodes,.

 Alberto.


 On 21 June 2011 14:40, Harsh J ha...@cloudera.com wrote:

 Matt,

 You're right that it (slowstart) does not / would not affect much. I
 was merely explaining the reason behind his observance of reducers
 getting scheduled early, not really recommending a tweak for
 performance changes there.

 On Tue, Jun 21, 2011 at 10:46 PM, GOEKE, MATTHEW (AG/1000)
 matthew.go...@monsanto.com wrote:
  Harsh,
 
  Is it possible for mapred.reduce.slowstart.completed.maps to even play a
  significant role in this? The only benefit he would find in tweaking that
  for his problem would be to spread network traffic from the shuffle over a
  longer period of time at a cost of having the reducer using resources
  earlier. Either way he would see this effect across both sets of runs if he
  is using the default parameters. I guess it would all depend on what kind 
  of
  network layout the cluster is on.
 
  Matt
 
  -Original Message-
  From: Harsh J [mailto:ha...@cloudera.com]
  Sent: Tuesday, June 21, 2011 12:09 PM
  To: common-user@hadoop.apache.org
  Subject: Re: Poor scalability with map reduce application
 
  Alberto,
 
  On Tue, Jun 21, 2011 at 10:27 PM, Alberto Andreotti
  albertoandreo...@gmail.com wrote:
  I don't know if speculatives maps are on, I'll check it. One thing I
  observed is that reduces begin before all maps have finished. Let me
  check
  also if the difference is on the map side or in the reduce. I believe
  it's
  balanced, both are slower when adding more nodes, but i'll confirm
  that.
 
  Maps and reduces are speculative by default, so must've been ON. Could
  you also post a general input vs. output record counts and statistics
  like that between your job runs, to correlate?
 
  The reducers get scheduled early but do not exactly reduce() until
  all maps are done. They just keep fetching outputs. Their scheduling
  can be controlled with some configurations (say, to start only after
  X% of maps are done -- by default it starts up when 5% of maps are
  done).
 
  --
  Harsh J
  This e-mail message may contain privileged and/or confidential
  information, and is intended to be received only by persons entitled
  to receive such information. If you have received this e-mail in error,
  please notify the sender immediately. Please delete it and
  all attachments from any servers, hard drives or any other media. Other
  use of this e-mail by you is strictly prohibited.
 
  All e-mails and attachments sent and received are subject to monitoring,
  reading and archival by Monsanto, including its
  subsidiaries. The recipient of this e-mail is solely responsible for
  checking for the presence of Viruses or other Malware.
  Monsanto, along with its subsidiaries, accepts no liability for any
  damage caused by any such code transmitted by or accompanying
  this e-mail or any attachment.
 
 
  The information contained in this email may be subject to the export
  control