RE: libhdfs install dep
Rodrigo,

Assuming you are asking about Hadoop 1.x: you are missing the hadoop-*libhdfs* rpm. Build it or get it from the vendor you got your Hadoop from.

-Original Message-
From: Pastrana, Rodrigo (RIS-BCT) [mailto:rodrigo.pastr...@lexisnexis.com]
Sent: Monday, September 24, 2012 8:20 PM
To: 'core-u...@hadoop.apache.org'
Subject: libhdfs install dep

Anybody know why libhdfs.so is not found by package managers on CentOS 64 and OpenSuse 64? I have an rpm which declares Hadoop as a dependency, but the package managers (KPackageKit, zypper, etc.) report libhdfs.so as a missing dependency even though Hadoop has been installed via rpm package, and libhdfs.so is installed as well.

Thanks, Rodrigo.
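A quick way to see whether any installed package actually advertises libhdfs.so (a diagnostic sketch; the library path and rpm file name below are assumptions, adjust to your system):

  # which installed package, if any, claims to provide libhdfs.so
  $ rpm -q --whatprovides libhdfs.so

  # if the file is on disk, which package owns it (path is an assumption)
  $ rpm -qf /usr/lib64/libhdfs.so

  # what your own rpm declares it needs (file name is hypothetical)
  $ rpm -qpR your-package.rpm | grep -i hdfs

If nothing provides libhdfs.so, that matches the advice above: the separate hadoop-libhdfs rpm is missing even though the core Hadoop rpm is installed.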
RE: hadoop dfs -ls
Hi Nitin,

Normally your conf should reside in /etc/hadoop/conf (if you don't have one, copy it from the namenode and keep it in sync).

The hadoop script by default depends on hadoop-setup.sh, which depends on hadoop-env.sh in /etc/hadoop/conf.

Or specify the config dir at runtime, i.e.:

[hdfs]$ hadoop [--config path to your config dir] commands

P.S. Some useful links:
http://wiki.apache.org/hadoop/FAQ
http://wiki.apache.org/hadoop/FrontPage
http://wiki.apache.org/hadoop/
http://hadoop.apache.org/common/docs/r1.0.3/

-Original Message-
From: d...@paraliatech.com [mailto:d...@paraliatech.com] On Behalf Of Dave Beech
Sent: Friday, July 13, 2012 6:18 AM
To: common-user@hadoop.apache.org
Subject: Re: hadoop dfs -ls

Hi Nitin

It's likely that your hadoop command isn't finding the right configuration. In particular, it doesn't know where your namenode is (the fs.default.name setting in core-site.xml). Maybe you need to set the HADOOP_CONF_DIR environment variable to point to your conf directory.

Dave

On 13 July 2012 14:11, Nitin Pawar nitinpawar...@gmail.com wrote:

Hi, I have done this setup numerous times, but this time I did it after some break. I managed to get the cluster up and running fine, but when I do "hadoop dfs -ls /" it actually shows me the contents of the Linux file system.

I am using hadoop-1.0.3 on RHEL 5.6. Can anyone suggest what I must have done wrong?

--
Nitin Pawar
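To make the symptom above concrete: when the client cannot load core-site.xml, fs.default.name falls back to its default (file:///), which is exactly why "hadoop dfs -ls /" lists the local Linux filesystem. A minimal sketch of the two fixes mentioned above (the namenode host/port shown are assumptions):

  # point the client at the cluster config, then retry
  $ export HADOOP_CONF_DIR=/etc/hadoop/conf
  $ hadoop dfs -ls /

  # or pass the config dir explicitly for a single command
  $ hadoop --config /etc/hadoop/conf dfs -ls /

  # as a sanity check, core-site.xml in that dir should set something like
  #   fs.default.name = hdfs://namenode-host:8020   (host/port are assumptions)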
RE: JAVA_HOME is not set
I don't think OpenJDK is supported; there were a lot of problems with it. Feel free to give it a try, but if you run into JVM crashes, use the Oracle (Sun) JDK 6 (NOT 7). Harsh had a good post before regarding the JVM(s).

-Original Message-
From: Simon [mailto:gsmst...@gmail.com]
Sent: Thursday, July 05, 2012 9:53 AM
To: common-user@hadoop.apache.org
Cc: huangyi...@gmail.com
Subject: Re: JAVA_HOME is not set

I think you should set JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386/jre

JAVA_HOME is the base location of Java, where it can find the java executable at $JAVA_HOME/bin/java.

Regards,
Simon

On Thu, Jul 5, 2012 at 12:42 PM, Ying Huang huangyi...@gmail.com wrote:

Hello, I am installing Hadoop according to this page:
https://cwiki.apache.org/BIGTOP/how-to-install-hadoop-distribution-from-bigtop.html

I think I have successfully installed Hadoop on my Ubuntu 12.04 x64. Then I go to the "Running Hadoop" step; below are my steps. Why does it prompt that my JAVA_HOME is not set?

------------------------------------------------------------
root@ubuntu32:/usr/lib/hadoop# export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386/jre/bin/java
root@ubuntu32:/usr/lib/hadoop# sudo -u hdfs hadoop namenode -format
Error: JAVA_HOME is not set.
root@ubuntu32:/usr/lib/hadoop# ls $JAVA_HOME -al
-rwxr-xr-x 1 root root 5588 May 2 20:14 /usr/lib/jvm/java-7-openjdk-i386/jre/bin/java
root@ubuntu32:/usr/lib/hadoop#
------------------------------------------------------------

--
Best Regards
Ying Huang
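Two separate issues are visible in the transcript above: JAVA_HOME was set to the full path of the java binary rather than the JDK/JRE base directory, and "sudo -u hdfs" typically starts a cleaned environment that does not carry over the exported variable. A minimal sketch of the fix (the JVM path is the one from the original post; the hadoop-env.sh location is the conventional one and may differ on your install):

  # JAVA_HOME should be the base dir, not the path to the java binary
  $ export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386/jre
  $ $JAVA_HOME/bin/java -version

  # because sudo generally strips exported variables, the more reliable fix is
  # to set JAVA_HOME in the file the hadoop script itself reads, e.g.:
  #   /etc/hadoop/conf/hadoop-env.sh:  export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386/jre
  $ sudo -u hdfs hadoop namenode -format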
RE: 8021 failed on connection exception: java.net.ConnectException: Connection refused at org.apache.hadoop.ipc.Client.wrapException(Client.java:1095)
Hmm... Waqas, I think you are using a pre-1.6.0_20 build, and likely OpenJDK. Please try the Sun/Oracle JDK 1.6.0_26+ (and, as Harsh said, stay away from 1.7).

And if I'm reading the logs right, you are using the older release of HDP, v1.0.0? If this is the case, you may also want to check with the HortonWorks team. Their distro's datanode uses a 32-bit JDK and the NN uses a 64-bit one, so you'll have to be careful to run the right JVM for each node type on the same host.

BTW, I also think you are starting the JT as waqas; try running it under mapred. Something like:

$ su - mapred -c path to your hadoop-daemon.sh start jobtracker

(once you fix the JVM versions)

Cheers

-Original Message-
From: Harsh J [mailto:ha...@cloudera.com]
Sent: Wednesday, May 16, 2012 8:59 AM
To: common-user@hadoop.apache.org
Subject: Re: 8021 failed on connection exception: java.net.ConnectException: Connection refused at org.apache.hadoop.ipc.Client.wrapException(Client.java:1095)

Hi,

On Wed, May 16, 2012 at 9:17 PM, waqas latif waqas...@gmail.com wrote:

Hi, I tried it with Java 6 but with no success. Here are the links for the log and out file of the jobtracker with Java 6.
logfile link: http://pastebin.com/bvWZRt0A
outfile link (a bit different from the Java 7 one): http://pastebin.com/4YCZhQGh

Now this does look odd (and is again the same thing). What exactly is your Java 6 version? i.e. the "java -version" output? Do you also get the same error if you run "hadoop jobtracker" directly on the CLI?

Also please keep in mind that I can run hadoop 0.20 with the Java home path set to Java 7.

You may be able to, but none of us presently test it with that configuration. So if you run into bugs or odd behavior with that, you'll pretty much be alone :)

--
Harsh J
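A concrete version of the suggestions above (the hadoop-daemon.sh path below is a hypothetical install location; yours may differ):

  # confirm which JVM the shell actually picks up
  $ java -version

  # start the jobtracker as the mapred user rather than your own account
  $ su - mapred -c "/usr/lib/hadoop/bin/hadoop-daemon.sh start jobtracker"

  # then check that something is listening on the RPC port from the error (8021)
  $ netstat -lnpt | grep 8021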
RE: Question on MapReduce
Nope, you must tune the config on that specific super node to have more M/R slots (this is for 1.0.x).

This does not mean the JobTracker will be eager to stuff that super node with all the M/R jobs at hand. It still goes through the scheduler; Capacity Scheduler is most likely what you have (check your config).

IMO, if the data locality is not going to be there, your cluster is going to suffer from network I/O.

-Original Message-
From: Satheesh Kumar [mailto:nks...@gmail.com]
Sent: Friday, May 11, 2012 9:51 AM
To: common-user@hadoop.apache.org
Subject: Question on MapReduce

Hi, I am a newbie on Hadoop and have a quick question on optimal compute vs. storage resources for MapReduce.

If I have a multiprocessor node with 4 processors, will Hadoop schedule a higher number of Map or Reduce tasks on that system than on a uni-processor system? In other words, does Hadoop detect denser systems and schedule denser tasks on multiprocessor systems?

If yes, will that imply that it makes sense to attach higher-capacity storage to store a larger number of blocks on systems with dense compute?

Any insights will be very useful.

Thanks,
Satheesh
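To make "tune the config on that specific node" concrete for 1.0.x, a minimal sketch (the slot values are illustrative assumptions, not recommendations, and the daemon path is an assumed install location):

  # in mapred-site.xml on the super node, raise the per-tasktracker slot counts:
  #   mapred.tasktracker.map.tasks.maximum    = 8
  #   mapred.tasktracker.reduce.tasks.maximum = 4
  # then restart that node's tasktracker so it re-registers with the new slot counts
  $ /usr/lib/hadoop/bin/hadoop-daemon.sh stop tasktracker
  $ /usr/lib/hadoop/bin/hadoop-daemon.sh start tasktracker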
RE: Question on MapReduce
These may be dated materials; Cloudera and HDP folks, please correct with updates :)

http://www.cloudera.com/blog/2010/03/clouderas-support-team-shares-some-basic-hardware-recommendations/
http://www.cloudera.com/blog/2010/08/hadoophbase-capacity-planning/
http://hortonworks.com/blog/best-practices-for-selecting-apache-hadoop-hardware/

Hope this helps.

-Original Message-
From: Satheesh Kumar [mailto:nks...@gmail.com]
Sent: Friday, May 11, 2012 12:48 PM
To: common-user@hadoop.apache.org
Subject: Re: Question on MapReduce

Thanks, Leo. What is the config of a typical data node in a Hadoop cluster - cores, storage capacity, and connectivity (SATA?)? How many tasktrackers are scheduled per core in general? Is there a best-practices guide somewhere?

Thanks,
Satheesh

On Fri, May 11, 2012 at 10:48 AM, Leo Leung lle...@ddn.com wrote:
(previous reply and original question quoted above)
RE: Why is hadoop build I generated from a release branch different from release build?
Hi Pawan,

"ant -p" (not for 0.23+) will tell you the available build targets. Use mvn (Maven) for 0.23 or newer.

-Original Message-
From: Matt Foley [mailto:mfo...@hortonworks.com]
Sent: Thursday, March 08, 2012 3:52 PM
To: common-user@hadoop.apache.org
Subject: Re: Why is hadoop build I generated from a release branch different from release build?

Hi Pawan,

The complete way releases are built (for v0.20/v1.0) is documented at http://wiki.apache.org/hadoop/HowToRelease#Building

However, that does a bunch of stuff you don't need, like generating the documentation and doing a ton of cross-checks. The full set of ant build targets is defined in build.xml at the top level of the source tree. "binary" may be the target you want.

--Matt

On Thu, Mar 8, 2012 at 3:35 PM, Pawan Agarwal pawan.agar...@gmail.com wrote:

Hi,

I am trying to generate hadoop binaries from source and execute hadoop from the build I generate. I am able to build; however, the *bin* folder which comes with the hadoop installation is not generated as part of my build. Can someone tell me how to do a build that is equivalent to the hadoop release build and can be used directly to run hadoop? Here are the details.

Desktop: Ubuntu Server 11.10
Hadoop version for installation: 0.20.203.0 (link: http://mirrors.gigenet.com/apache//hadoop/common/hadoop-0.20.203.0/)
Hadoop branch used for the build: branch-0.20-security-203
Build command used: ant maven-install

Here are the directory structures from the build I generated vs. the official hadoop release build.

*Hadoop directory which I generated:*
pawan@ubuntu01:/hadoop0.20.203.0/hadoop-common/build$ ls -1
ant
c++
classes
contrib
examples
hadoop-0.20-security-203-pawan
hadoop-ant-0.20-security-203-pawan.jar
hadoop-core-0.20-security-203-pawan.jar
hadoop-examples-0.20-security-203-pawan.jar
hadoop-test-0.20-security-203-pawan.jar
hadoop-tools-0.20-security-203-pawan.jar
ivy
jsvc
src
test
tools
webapps

*Official Hadoop build installation*
pawan@ubuntu01:/hadoop0.20.203.0/hadoop-common/build$ ls /hadoop -1
bin
build.xml
c++
CHANGES.txt
conf
contrib
docs
hadoop-ant-0.20.203.0.jar
hadoop-core-0.20.203.0.jar
hadoop-examples-0.20.203.0.jar
hadoop-test-0.20.203.0.jar
hadoop-tools-0.20.203.0.jar
input
ivy
ivy.xml
lib
librecordio
LICENSE.txt
logs
NOTICE.txt
README.txt
src
webapps

Any pointers are greatly appreciated. Also, if there are any other resources for understanding the hadoop build system, pointers to those would be helpful too.

Thanks
Pawan
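A short sketch of both build paths mentioned above (the maven flags are my assumption from the newer build docs, so double-check them against the build instructions on your branch):

  # on the 0.20/1.0 ant-based branches: list targets, then build a release-style layout
  $ ant -p
  $ ant binary

  # on 0.23 and newer the build moved to maven; something like:
  $ mvn package -Pdist -DskipTests -Dtar

The "binary" target is the one Matt points at for getting a runnable layout (including bin/) rather than just the jars that "ant maven-install" produces.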
RE: Hadoop and Hibernate
Geoffry,

Hadoop DistributedCache (as of now) is used to cache M/R application-specific files. These files are used by the M/R app only, not by the framework (normally as a side lookup).

You can certainly try to use Hibernate to query your SQL-based back-end from within the M/R code. But think of what happens when a few hundred or a few thousand M/R tasks do that concurrently. Your back-end is going to cry (if it can, before it dies).

So IMO, prepping your M/R job with DistributedCache files (pull them down first) is the better approach.

Also, MPI is pretty much out of the question (it is not baked into the framework). You'll likely have to roll your own (and try to trick the JobTracker into not starting the same task).

Does anyone have a better solution for Geoffry?

-Original Message-
From: Geoffry Roberts [mailto:geoffry.robe...@gmail.com]
Sent: Friday, March 02, 2012 9:42 AM
To: common-user@hadoop.apache.org
Subject: Re: Hadoop and Hibernate

This is a tardy response. I'm spread pretty thinly right now.

DistributedCache (http://hadoop.apache.org/common/docs/r1.0.0/mapred_tutorial.html#DistributedCache) is apparently deprecated. Is there a replacement? I didn't see anything about this in the documentation, but then I am still using 0.21.0. I have to for performance reasons; 1.0.1 is too slow and the client won't have it.

Also, the DistributedCache approach seems only to work from within a hadoop job, i.e. from within a Mapper or a Reducer, but not from within a Driver. I have libraries that I must access from both places. I take it that I am stuck keeping two copies of these libraries in sync - correct? It's either that, or copy them into HDFS, replacing them all at the beginning of each job run.

Looking for best practices. Thanks

On 28 February 2012 10:17, Owen O'Malley omal...@apache.org wrote:

On Tue, Feb 28, 2012 at 5:15 PM, Geoffry Roberts geoffry.robe...@gmail.com wrote:

If I create an executable jar file that contains all dependencies required by the MR job, do all said dependencies get distributed to all nodes?

You can make a single jar and that will be distributed to all of the machines that run the task, but it is better in most cases to use the distributed cache. See http://hadoop.apache.org/common/docs/r1.0.0/mapred_tutorial.html#DistributedCache

If I specify but one reducer, which node in the cluster will the reducer run on?

The scheduling is done by the JobTracker and it isn't possible to control the location of the reducers.

-- Owen

--
Geoffry Roberts
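For what it's worth, one common way to ship jars and side-lookup files through the distributed cache without calling the DistributedCache API yourself is the generic command-line options. A minimal sketch (the jar names, driver class, and paths are hypothetical):

  # -libjars puts extra jars on each task's classpath via the distributed cache;
  # -files ships side files into each task's working directory
  $ hadoop jar myjob.jar com.example.MyDriver \
      -libjars hibernate-lib.jar,other-dep.jar \
      -files lookup-table.dat \
      /input /output

Note this only works if the driver runs through ToolRunner/GenericOptionsParser; otherwise the generic options are not parsed. It also only distributes the libraries to the tasks, so the driver-side copy of the libraries still lives on the client machine, which is part of the "two copies" situation Geoffry describes.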