Re: Hadoop eclipse plugin stopped working after replacing hadoop-0.20.2 jar files with hadoop-0.20-append jar files
I followed Michael Noll's tutorial for building the hadoop-0.20-append jars: http://www.michael-noll.com/blog/2011/04/14/building-an-hadoop-0-20-x-version-for-hbase-0-90-2/ After following the article we get 5 jar files with which to replace the hadoop-0.20.2 jar files. There is no jar file for the hadoop-eclipse plugin that I can see in my repository if I follow that tutorial. Also, JIRA MAPREDUCE-1280 has no info on whether the plugin I am using is compatible with hadoop-0.20-append. Has anyone else faced this kind of issue?

Thanks, Praveenesh

On Wed, Jun 22, 2011 at 11:48 AM, Devaraj K devara...@huawei.com wrote: The Hadoop eclipse plugin also uses the hadoop-core.jar file to communicate with the hadoop cluster. For this it needs the same version of hadoop-core.jar on the client as on the server (the hadoop cluster). Update the hadoop eclipse plugin in your Eclipse with the one provided with the hadoop-0.20-append release, and it will work fine. Devaraj K

-----Original Message----- From: praveenesh kumar [mailto:praveen...@gmail.com] Sent: Wednesday, June 22, 2011 11:25 AM To: common-user@hadoop.apache.org Subject: Hadoop eclipse plugin stopped working after replacing hadoop-0.20.2 jar files with hadoop-0.20-append jar files

Guys, I was using the hadoop eclipse plugin on a hadoop 0.20.2 cluster and it was working fine for me. I was using Eclipse SDK Helios 3.6.2 with the plugin hadoop-eclipse-plugin-0.20.3-SNAPSHOT.jar downloaded from JIRA MAPREDUCE-1280. For an HBase installation I had to use the hadoop-0.20-append compiled jars, so I replaced the old jar files with the new 0.20-append compiled jar files. But now, after replacing them, my hadoop eclipse plugin is not working for me. Whenever I try to connect to my hadoop master node from it and look at DFS locations, it gives me the following error:

Error: Protocol org.apache.hadoop.hdfs.protocol.ClientProtocol version mismatch (client 41 server 43)

However, the hadoop cluster is working fine if I go directly onto the hadoop namenode and use hadoop commands: I can add files to HDFS and run jobs from there, and the HDFS and Map-Reduce web consoles are also working fine. I am just not able to use my previous hadoop eclipse plugin. Any suggestions or help on this issue?

Thanks, Praveenesh
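Devaraj's suggestion amounts to making the plugin carry the same hadoop-core that the cluster runs. Since the 0.20-append build does not produce a plugin jar, one thing people try is repacking the existing MAPREDUCE-1280 plugin with the newly built hadoop-core. The sketch below is only an outline under that assumption: the file and directory names are placeholders, and it only applies if your plugin jar actually bundles a hadoop-core jar (check with jar -tf first).

# Hedged sketch: rebuild the eclipse plugin so it ships the cluster's hadoop-core.
# All file names below are placeholders; adjust to what your build produced.
mkdir /tmp/plugin && cd /tmp/plugin
jar -tf ~/hadoop-eclipse-plugin-0.20.3-SNAPSHOT.jar | grep hadoop-core   # confirm one is bundled
jar -xf ~/hadoop-eclipse-plugin-0.20.3-SNAPSHOT.jar
rm -f lib/hadoop-core-*.jar
cp ~/hadoop-0.20-append/build/hadoop-core-*.jar lib/
jar -cmf META-INF/MANIFEST.MF ~/hadoop-eclipse-plugin-append.jar .
# drop the rebuilt jar into eclipse/plugins/ and restart Eclipse with -clean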
Backup and upgrade practices?
Hi, I am planning a small Hadoop cluster, but looking ahead, are there cheap options for keeping a backup of the data? If I later want to upgrade the hardware, do I make a complete copy, or do I upgrade one node at a time? Thank you, Mark
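Not from the thread, but two commands usually come up for questions like Mark's: distcp for copying HDFS data to a second cluster (or any other filesystem Hadoop can write to), and datanode decommissioning for swapping hardware one node at a time. The hostnames and paths below are placeholders, a sketch rather than a recipe.

# Back up a directory tree to another cluster (hostnames/paths are placeholders):
hadoop distcp hdfs://namenode:9000/user/mark/data hdfs://backup-nn:9000/backups/data

# For small volumes, a plain copy to local/NFS storage also works:
hadoop fs -copyToLocal /user/mark/data /mnt/backup/data-$(date +%Y%m%d)

# To replace hardware one node at a time, decommission the datanode first:
# add its hostname to the file pointed to by dfs.hosts.exclude, then run
hadoop dfsadmin -refreshNodes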
Re: Automatic Configuration of Hadoop Clusters
http://www.opscode.com/chef/ http://trac.mcs.anl.gov/projects/bcfg2 http://cfengine.com/ http://www.puppetlabs.com/

I use chef personally, but the others are just as good and all are tuned towards different philosophies in configuration management. - n

On Wed, Jun 22, 2011 at 11:38 AM, gokul gokraz...@gmail.com wrote: Dear all, for benchmarking purposes we would like to adjust configurations as well as flexibly add and remove machines from our Hadoop clusters. Is there any framework around that allows this in an easy manner, without having to manually distribute the changed configuration files? We are considering writing a bash script for that purpose, but hope that there is a tool out there that saves us the work. Thanks in advance, Gokul
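If a full configuration-management tool is overkill, the bash script Gokul mentions can be very small. A rough sketch, assuming passwordless SSH and the usual conf/slaves file (paths are placeholders):

#!/bin/bash
# Push a changed conf/ directory to every node in the slaves file, then restart.
HADOOP_HOME=${HADOOP_HOME:-/opt/hadoop}
for host in $(cat "$HADOOP_HOME/conf/slaves"); do
  rsync -az "$HADOOP_HOME/conf/" "$host:$HADOOP_HOME/conf/"
done
"$HADOOP_HOME/bin/stop-all.sh"
"$HADOOP_HOME/bin/start-all.sh"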
RE: ClassNotFoundException while running quick start guide on Windows.
Hi Drew, I don't know if this is actually the issue or not, but the output below makes me think you might be passing Cygwin paths into the java.exe launcher. If that's the case, it won't work. java.exe is pure Windows and doesn't know about '/cygdrive/c', for example (it also expects the path separator to be a semicolon rather than a colon). Every once in a while, when I try to use java.exe from the Cygwin CLI on my Windows box, I get bitten by this. Sandy

-----Original Message----- From: Drew Gross [mailto:drew.a.gr...@gmail.com] Sent: Tuesday, June 21, 2011 21:26 To: common-user@hadoop.apache.org Subject: Re: ClassNotFoundException while running quick start guide on Windows.

Thanks Jeff, it was a problem with JAVA_HOME. I have another problem now though. I have this:

$JAVA: /cygdrive/c/Program Files/Java/jdk1.6.0_26/bin/java
$JAVA_HEAP_MAX: -Xmx1000m
$HADOOP_OPTS: -Dhadoop.log.dir=C:\Users\Drew Gross\Documents\Projects\discom\hadoop-0.21.0\logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=C:\Users\Drew Gross\Documents\Projects\discom\hadoop-0.21.0\ -Dhadoop.id.str= -Dhadoop.root.logger=INFO,console -Djava.library.path=/cygdrive/c/Users/Drew Gross/Documents/Projects/discom/hadoop-0.21.0/lib/native/ -Dhadoop.policy.file=hadoop-policy.xml
$CLASS: org.apache.hadoop.util.RunJar

Exception in thread "main" java.lang.NoClassDefFoundError: Gross\Documents\Projects\discom\hadoop-0/21/0\logs
Caused by: java.lang.ClassNotFoundException: Gross\Documents\Projects\discom\hadoop-0.21.0\logs
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Could not find the main class: Gross\Documents\Projects\discom\hadoop-0.21.0\logs. Program will exit.

(This is with some extra debugging info added by me in bin/hadoop.) It looks like the Windows-style file names are causing problems, especially the spaces. Has anyone encountered this before and know how to fix it? I tried escaping the spaces and surrounding the file paths with quotes (not at the same time), but that didn't help. Drew

On Tue, Jun 21, 2011 at 6:24 AM, madhu phatak phatak@gmail.com wrote: I think the jar has some issues where it's not able to read the Main class from the manifest. Try unjarring the jar, check in the manifest what the main class is, and then run it as follows: bin/hadoop jar hadoop-*-examples.jar <fully qualified main class> grep input output 'dfs[a-z.]+'

On Thu, Jun 16, 2011 at 10:23 AM, Drew Gross drew.a.gr...@gmail.com wrote: Hello, I'm trying to run the example from the quick start guide on Windows and I get this error:

$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
Exception in thread "main" java.lang.NoClassDefFoundError:
Caused by: java.lang.ClassNotFoundException:
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Could not find the main class: . Program will exit.

Does anyone know what I need to change? Thank you. From, Drew

--
Forget the environment. Print this e-mail immediately. Then burn it.
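Sandy's diagnosis fits the trace: the space in "Drew Gross" splits -Dhadoop.log.dir into two arguments, and java.exe cannot read Cygwin-style paths. The stock bin/hadoop script already does some of this when it detects Cygwin, but as a rough illustration (using the variable names from Drew's debug output), converting with cygpath and quoting the results looks roughly like this:

# Illustration only: convert Cygwin paths to Windows form before calling java.exe,
# and quote them so the space in "Drew Gross" survives as part of one argument.
JAVA="$JAVA_HOME/bin/java"
HADOOP_LOG_DIR=$(cygpath -w "$HADOOP_HOME/logs")      # Windows-style path
CLASSPATH=$(cygpath -wp "$CLASSPATH")                 # semicolon-separated Windows paths
"$JAVA" $JAVA_HEAP_MAX -Dhadoop.log.dir="$HADOOP_LOG_DIR" -classpath "$CLASSPATH" "$CLASS" "$@"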
Re: Automatic Configuration of Hadoop Clusters
Puppetize.

From: gokul gokraz...@gmail.com To: common-user@hadoop.apache.org Sent: Wed, 22 June, 2011 8:38:13 AM Subject: Automatic Configuration of Hadoop Clusters

Dear all, for benchmarking purposes we would like to adjust configurations as well as flexibly add and remove machines from our Hadoop clusters. Is there any framework around that allows this in an easy manner, without having to manually distribute the changed configuration files? We are considering writing a bash script for that purpose, but hope that there is a tool out there that saves us the work. Thanks in advance, Gokul
Re: Any reason Hadoop logs cant be directed to a separate filesystem?
Looks like you missed the '#' at the start of the line beginning "# export" -- it is commented out by default. Feel free to set HADOOP_LOG_DIR in that script or elsewhere.

On 6/22/11 1:02 PM, Jack Craig jcr...@carrieriq.com wrote: Hi Folks, In hadoop-env.sh we find:

# Where log files are stored. $HADOOP_HOME/logs by default.
# export HADOOP_LOG_DIR=${HADOOP_HOME}/logs

Is there any reason this location could not be a separate filesystem on the name node? Thx, jackc... Jack Craig, Operations CarrierIQ.com 1200 Villa Ct, Suite 200 Mountain View, CA. 94041 650-625-5456
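For reference, pointing the logs at a dedicated partition is just the uncommented export with your own path; /var/hadoop-logs and the hadoop user/group below are placeholders for whatever you actually mount and run as:

# conf/hadoop-env.sh -- note there is no leading '#' on the export line
export HADOOP_LOG_DIR=/var/hadoop-logs

# one-time setup on each node:
mkdir -p /var/hadoop-logs && chown hadoop:hadoop /var/hadoop-logs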
Re: Any reason Hadoop logs cant be directed to a separate filesystem?
Jack, I believe the location can definitely be set to any desired path. Could you tell us the issues you face when you change it?

P.s. The env var is used to set the config property hadoop.log.dir internally. So as long as you use the regular scripts (bin/ or init.d/ ones) to start daemons, it would apply fine.

On Thu, Jun 23, 2011 at 1:32 AM, Jack Craig jcr...@carrieriq.com wrote: Hi Folks, In hadoop-env.sh we find:

# Where log files are stored. $HADOOP_HOME/logs by default.
# export HADOOP_LOG_DIR=${HADOOP_HOME}/logs

Is there any reason this location could not be a separate filesystem on the name node? Thx, jackc... Jack Craig, Operations CarrierIQ.com 1200 Villa Ct, Suite 200 Mountain View, CA. 94041 650-625-5456

-- Harsh J
Re: Any reason Hadoop logs cant be directed to a separate filesystem?
Thanks to both respondents. Note that I've not tried this redirection yet, as I have only production grids available.

Our grids are growing and, with them, log volume. Until now the log location has been in the same filesystem as the grid data, so running out of space due to log bloat is a growing problem. From your replies, it sounds like I can relocate my logs -- cool! But now the tough question: if I set up a too-small partition and it runs out of space, will my grid become unstable if hadoop can no longer write to its logs?

Thx again, jackc... Jack Craig, Operations CarrierIQ.com 1200 Villa Ct, Suite 200 Mountain View, CA. 94041 650-625-5456

On Jun 22, 2011, at 1:09 PM, Harsh J wrote: Jack, I believe the location can definitely be set to any desired path. Could you tell us the issues you face when you change it? P.s. The env var is used to set the config property hadoop.log.dir internally. So as long as you use the regular scripts (bin/ or init.d/ ones) to start daemons, it would apply fine. -- Harsh J
Re: Any reason Hadoop logs cant be directed to a separate filesystem?
Hi, Can I limit the log file retention? I want to keep files for the last 15 days only. Regards, Jagaran

From: Jack Craig jcr...@carrieriq.com To: common-user@hadoop.apache.org Sent: Wed, 22 June, 2011 2:00:23 PM Subject: Re: Any reason Hadoop logs cant be directed to a separate filesystem?
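Not an answer from the thread, but two knobs are commonly involved here: task userlog retention is controlled by mapred.userlog.retain.hours, while the daemon .log files are typically rolled daily by the DailyRollingFileAppender in conf/log4j.properties and never deleted, so a cron job is the simplest cap. A sketch, with the log path as a placeholder:

# Daily cron entry: delete rolled daemon logs older than 15 days.
# Adjust the path to wherever HADOOP_LOG_DIR points.
0 3 * * *  find /var/hadoop-logs -type f -name '*.log.*' -mtime +15 -delete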
Re: Problem debugging MapReduce job under Windows
I had the same issue. I installed the previous stable version of Hadoop (0.20.2), and it worked fine. I hope this helps. -Sal
Re: Hadoop Eclipse plugin 0.20.203.0 doesn't work
Can anyone help me?

叶达峰 kobe082...@qq.com wrote: Hi, I am new to Hadoop. Today I spent the whole night trying to set up a development environment for Hadoop. I encountered several problems. The first was that Eclipse couldn't load the plugin; I changed to another version and that problem was resolved. But now I have a more difficult problem. I try to set up a Map/Reduce Location; if everything were working, it should connect to the server and the DFS Location should list the whole file system. Sadly, it doesn't work. I have checked the configuration several times and it should be correct. Here is the message I get:

Error: failure to login. An internal error occurred during: "Map/Reduce location status updater". org/codehaus/jackson/map/JsonMappingException
Re: OutOfMemoryError: GC overhead limit exceeded
I've run into similar problems in my Hive jobs and will look at the 'mapred.child.ulimit' option. One thing we've found is that when loading data with insert overwrite into our Hive tables we've needed to include a 'CLUSTER BY' or 'DISTRIBUTE BY' option. Generally that's fixed our memory issues during the reduce phase, but not 100% of the time (though close). I understand the basics of what those options do, but I'm unclear as to why they are necessary (coming from an Oracle and Postgres DBA background). I'm guessing it has something to do with the underlying code.

On 06/18/2011 12:28 PM, Mapred Learn wrote: Did you try playing with mapred.child.ulimit along with java.opts? Sent from my iPhone

On Jun 18, 2011, at 9:55 AM, Ken Williams zoo9...@hotmail.com wrote: Hi All, I'm having a problem running a job on Hadoop. Using Mahout, I've been able to run several Bayesian classifiers and train and test them successfully on increasingly large datasets. Now I'm working on a dataset of 100,000 documents (size 100MB). I've trained the classifier on 80,000 docs and am using the remaining 20,000 as the test set. I've been able to train the classifier, but when I try to 'testclassifier' all the map tasks fail with a 'Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded' exception, before the job itself is 'Killed'.

I have a small cluster of 3 machines but plenty of memory and CPU power (3 x 16GB, 2.5GHz quad-core machines). I've tried setting 'mapred.child.java.opts' flags up to 3GB in size (-Xms3G -Xmx3G) but still get the same error. I've also tried setting HADOOP_HEAPSIZE to values like 2000, 2500 and 3000, but this made no difference. When the program is running I can use 'top' to see that although the CPUs are busy, memory usage rarely goes above 12GB and absolutely no swapping is taking place. (See program console output: http://pastebin.com/0m2Uduxa, job config file: http://pastebin.com/4GEFSnUM.)

I found a similar problem with a 'GC overhead limit exceeded' where the program was spending so much time garbage-collecting (more than 90% of its time!) that it was unable to progress, and so threw the 'GC overhead limit exceeded' exception. If I set -XX:-UseGCOverheadLimit in the 'mapred.child.java.opts' property to avoid this exception, then I see the same behaviour as before, only a slightly different exception is thrown: Caused by: java.lang.OutOfMemoryError: Java heap space at java.nio.HeapCharBuffer.<init>(HeapCharBuffer.java:39)

So I'm guessing that maybe my program is spending too much time garbage-collecting for it to progress? But how do I fix this? There's no further info in the log files other than the exceptions being thrown. I tried reducing the 'dfs.block.size' parameter to reduce the amount of data going into each 'map' process (and therefore reduce its memory requirements), but this made no difference. I tried various settings for JVM reuse (mapred.job.reuse.jvm.num.tasks) using values for no re-use (0), limited re-use (10), and unlimited re-use (-1), but no difference. I think the problem is in the job configuration parameters, but how do I find it? I'm using Hadoop 0.20.2 and the latest Mahout snapshot version. All machines are running 64-bit Ubuntu and Java 6. Any help would be very much appreciated, Ken Williams
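For what it's worth, both knobs discussed above can also be passed per job on the command line (when the driver parses generic options via ToolRunner/GenericOptionsParser) instead of editing mapred-site.xml. The jar, driver class and values below are examples only; note that mapred.child.ulimit is expressed in kilobytes and must be larger than the task heap:

# Example only -- jar, driver class and values are placeholders.
hadoop jar my-job.jar org.example.Driver \
  -Dmapred.child.java.opts="-Xmx3072m" \
  -Dmapred.child.ulimit=4194304 \
  input output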
Re: Poor scalability with map reduce application
Hi guys, I suspected that the problem was due to overhead introduced by the filesystem, so I tried to set the dfs.replication.max property to different values. First I tried 2, and I got a message saying that I was requesting a value of 3, which was bigger than the limit, so I couldn't do the run (it seems this 3 is hardcoded somewhere; I read that in Jira). Then I tried 3; I could generate the input files for the map reduce app, but when trying to run I got this one:

Exception in thread main java.io.IOException: file /tmp/hadoop-aandre/mapred/staging/aandre/.staging/job_201106230004_0003/job.jar. Requested replication 10 exceeds maximum 3 at org.apache.hadoop.hdfs.server.namenode.BlockManager.verifyReplication(BlockManager.java:468)

which makes it look like the framework is trying to replicate the output on as many nodes as possible. Could this be the source of the degradation? I also attached the log for the run with 7 nodes. Alberto.

On 21 June 2011 14:40, Harsh J ha...@cloudera.com wrote: Matt, You're right that it (slowstart) does not / would not affect much. I was merely explaining the reason behind his observance of reducers getting scheduled early, not really recommending a tweak for performance changes there.

On Tue, Jun 21, 2011 at 10:46 PM, GOEKE, MATTHEW (AG/1000) matthew.go...@monsanto.com wrote: Harsh, Is it possible for mapred.reduce.slowstart.completed.maps to even play a significant role in this? The only benefit he would find in tweaking that for his problem would be to spread network traffic from the shuffle over a longer period of time, at the cost of having the reducer using resources earlier. Either way he would see this effect across both sets of runs if he is using the default parameters. I guess it would all depend on what kind of network layout the cluster is on. Matt

-----Original Message----- From: Harsh J [mailto:ha...@cloudera.com] Sent: Tuesday, June 21, 2011 12:09 PM To: common-user@hadoop.apache.org Subject: Re: Poor scalability with map reduce application

Alberto, On Tue, Jun 21, 2011 at 10:27 PM, Alberto Andreotti albertoandreo...@gmail.com wrote: I don't know if speculative maps are on, I'll check it. One thing I observed is that reduces begin before all maps have finished. Let me check also if the difference is on the map side or in the reduce. I believe it's balanced, both are slower when adding more nodes, but I'll confirm that.

Maps and reduces are speculative by default, so they must have been ON. Could you also post general input vs. output record counts and statistics like that between your job runs, to correlate? The reducers get scheduled early but do not actually reduce() until all maps are done. They just keep fetching outputs. Their scheduling can be controlled with some configurations (say, to start only after X% of maps are done -- by default it starts when 5% of maps are done). -- Harsh J
-- José Pablo Alberto Andreotti. Tel: 54 351 4730292 Móvil: 54351156526363. MSN: albertoandreo...@gmail.com Skype: andreottialberto

Attached log excerpt:

11/06/23 01:09:38 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=30
11/06/23 01:09:38 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
11/06/23 01:09:38 WARN mapreduce.JobSubmitter: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
11/06/23 01:09:38 INFO input.FileInputFormat: Total input paths to process : 1
11/06/23 01:09:40 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
11/06/23
Re: Hadoop eclipse plugin stopped working after replacing hadoop-0.20.2 jar files with hadoop-0.20-append jar files
Hi, I am using Eclipse Helios Service Release 2. I encountered a similar problem (the map/reduce perspective failed to load) when upgrading the eclipse plugin from 0.20.2 to the 0.20.3-append version. I compared the source code of the eclipse plugin and found only a few differences. I tried reverting the differences one by one to see if it would work. What surprised me was that when I only reverted the jar name from hadoop-0.20.3-eclipse-plugin.jar to hadoop-0.20.2-eclipse-plugin.jar, it worked in eclipse. Yaozhen

On Thu, Jun 23, 2011 at 1:22 AM, praveenesh kumar praveen...@gmail.com wrote: I am doing that, and it's not working. If I replace the hadoop-core inside hadoop-plugin.jar, I am not able to see the map-reduce perspective at all. Guys, any help? Thanks, Praveenesh

On Wed, Jun 22, 2011 at 12:34 PM, Devaraj K devara...@huawei.com wrote: Every time hadoop builds, it also builds the hadoop eclipse plug-in using the latest hadoop core jar. In your case the eclipse plug-in contains one version of the jar and the cluster is running with another version; that's why it is giving the version mismatch error. Just replace the hadoop-core jar in your eclipse plug-in with whatever jar the hadoop cluster is using and check. Devaraj K
Re: Hadoop eclipse plugin stopped working after replacing hadoop-0.20.2 jar files with hadoop-0.20-append jar files
Do you use hadoop 0.20.203.0? I also have a problem with this plugin.

Yaozhen Pan itzhak@gmail.com wrote: Hi, I am using Eclipse Helios Service Release 2. I encountered a similar problem (the map/reduce perspective failed to load) when upgrading the eclipse plugin from 0.20.2 to the 0.20.3-append version. I compared the source code of the eclipse plugin and found only a few differences. I tried reverting the differences one by one to see if it would work. What surprised me was that when I only reverted the jar name from hadoop-0.20.3-eclipse-plugin.jar to hadoop-0.20.2-eclipse-plugin.jar, it worked in eclipse. Yaozhen
Re: Hadoop eclipse plugin stopped working after replacing hadoop-0.20.2 jar files with hadoop-0.20-append jar files
Hi, Our hadoop version was built on 0.20-append with a few patches. However, I didn't see big differences in the eclipse-plugin. Yaozhen

On Thu, Jun 23, 2011 at 11:29 AM, 叶达峰 (Jack Ye) kobe082...@qq.com wrote: Do you use hadoop 0.20.203.0? I also have a problem with this plugin.
Re: Hadoop eclipse plugin stopped working after replacing hadoop-0.20.2 jar files with hadoop-0.20-append jar files
I used 0.20.203.0 and can't access the DFS locations. The following is the error:

Failure to login. An internal error occurred during: "Map/Reduce location status updater". org/codehaus/jackson/map/JsonMappingException

Yaozhen Pan itzhak@gmail.com wrote: Hi, Our hadoop version was built on 0.20-append with a few patches. However, I didn't see big differences in the eclipse-plugin. Yaozhen
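The JsonMappingException NoClassDefFoundError reported above usually means the plugin cannot see the Jackson classes at runtime. A workaround often reported for the 0.20.203.0 plugin -- sketched here under the assumption that your hadoop lib/ directory ships the jackson jars, with paths and version patterns as placeholders -- is to copy those jars into the plugin's lib/ directory and add them to the Bundle-ClassPath line in META-INF/MANIFEST.MF before repacking:

# Rough sketch only; paths and jar versions are placeholders.
mkdir /tmp/plugin && cd /tmp/plugin
jar -xf $ECLIPSE_HOME/plugins/hadoop-eclipse-plugin-0.20.203.0.jar
cp $HADOOP_HOME/lib/jackson-core-asl-*.jar  lib/
cp $HADOOP_HOME/lib/jackson-mapper-asl-*.jar lib/
# edit META-INF/MANIFEST.MF: append the two jars to the Bundle-ClassPath entry, then repack:
jar -cmf META-INF/MANIFEST.MF hadoop-eclipse-plugin-0.20.203.0.jar .
# replace the jar under eclipse/plugins/ and restart Eclipse with -clean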
Re: Poor scalability with map reduce application
Alberto, I can assure you that fiddling with default replication factors isn't the solution here. Most of us running a 3+ node cluster still use the 3-replica factor and it hardly introduces a performance lag. As long as your Hadoop cluster network is not shared with other network applications, you shouldn't be seeing any network slowdowns.

Anyhow, dfs.replication.max is not what you were looking to change; it is dfs.replication instead (to affect all new file replication values). AFAIK, there is no replication factor hardcoded anywhere in the code; it's all configurable, so it's just a matter of setting the right configuration :)

Regarding the 10 thing: the MR components try to load their jars and other submitted code/files with a replication factor of 10 by default, so that they propagate to all racks and lead to a fast startup of tasks. I do not think that's a problem in your case either (if it gets 4, it will use 4; if it gets 7, it will use 7 -- either way it won't take too long).

On Thu, Jun 23, 2011 at 6:14 AM, Alberto Andreotti albertoandreo...@gmail.com wrote: Hi guys, I suspected that the problem was due to overhead introduced by the filesystem, so I tried to set the dfs.replication.max property to different values.

-- Harsh J
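To make Harsh's distinction concrete: dfs.replication (settable cluster-wide in hdfs-site.xml, per command with -D, or after the fact with -setrep) controls file replication, while the job.jar factor of 10 comes from mapred.submit.replication. The commands below are illustrations only; paths, jar and class names are placeholders:

# Write a file with replication 2 instead of the configured default:
hadoop fs -D dfs.replication=2 -put localfile /user/aandre/input/

# Change the replication of data that already exists:
hadoop fs -setrep -R 2 /user/aandre/input

# Lower the job-jar staging replication on a small cluster
# (takes effect if the driver parses generic options via ToolRunner):
hadoop jar my-job.jar org.example.Driver -Dmapred.submit.replication=3 input output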