Re:
If this is not a Bigtop-packaged install, please see src/BUILDING.txt to build the native libraries for your platform. The tarball doesn't ship with globally usable native libraries, given the OS/arch variants out there.

On Wed, Jul 3, 2013 at 3:54 AM, Chui-Hui Chiu cch...@tigers.lsu.edu wrote:
Hello, I have a Hadoop 2.0.5-alpha cluster. When I execute any Hadoop command, I see the following message:

WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Is it at the lib/native folder? How do I configure the system to load it? Thanks, Chui-hui

-- Harsh J
Re: How bad is this? :)
This is what I remember: if you disable journalling, running fsck after a crash will (be required and) take longer. Certainly not a good idea to have an extra wait after the cluster loses power and is being restarted, etc.

On Tue, Jul 9, 2013 at 7:42 AM, Chris Embree cemb...@gmail.com wrote:
Hey Hadoop smart folks, I have a tendency to seek optimum performance given my understanding, and that led me to a brilliant decision. We settled on ext4 as our underlying FS for HDFS. Greedy for speed, I thought: let's turn the journal off and gain the speed benefits. After all, I have 3 copies of the data. How much does this bother you, given we have a 21-node prod and only a 10-node dev cluster? I'm embarrassed to say I did not capture good pre- and post-change I/O numbers. In my simple brain, not writing to a journal just screams improved I/O. Don't be shy, tell me how badly I have done bad things. (I originally said "screwed the pooch" but I reconsidered our USA audience. ;) If I'm not incredibly wrong, should we consider higher-speed (less safe) file systems? Correct/support my thinking. Chris

-- Harsh J
Re: whitelist feature of YARN
Hi Sandy, yes, I have been using the AMRMClient APIs. I am planning to shift to whatever mechanism the whitelist feature is supported through, but I am not sure what is meant by submitting ResourceRequests directly to the RM. Can you please elaborate on this or give me a pointer to some example code on how to do it? Thanks for the reply, -Kishore

On Mon, Jul 8, 2013 at 10:53 PM, Sandy Ryza sandy.r...@cloudera.com wrote:
Hi Krishna, from your previous email, it looks like you are using the AMRMClient APIs. Whitelisting is not yet supported through them. I am working on this in YARN-521, which should be included in the next release after 2.1.0-beta. If you are submitting ResourceRequests directly to the RM, you can whitelist a node by:
* setting the relaxLocality flag on the node-level ResourceRequest to true
* setting the relaxLocality flag on the corresponding rack-level ResourceRequest to false
* setting the relaxLocality flag on the corresponding any-level ResourceRequest to false
-Sandy

On Mon, Jul 8, 2013 at 6:48 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote:
Hi, can someone please point me to some example code for the whitelist feature of YARN? I have recently got RC1 of hadoop-2.1.0-beta and want to use this feature. It would also be great if you could point me to a description of what the whitelisting feature is; I have gone through some JIRA logs related to it, but a more concrete explanation would be helpful. Thanks, Kishore
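As an illustration of the three bullets above, a sketch against the 2.1.0-beta ResourceRequest records API might look like the following. This is a sketch only, not tested against a real RM; the host name, rack name, priority and capability values are placeholders, not taken from the thread.

```java
import java.util.List;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;
import org.apache.hadoop.yarn.util.Records;

public class WhitelistSketch {

  // Build one ResourceRequest; "name" is a host, a rack, or ResourceRequest.ANY.
  static ResourceRequest newRequest(String name, Priority pri, Resource cap,
                                    int numContainers, boolean relaxLocality) {
    ResourceRequest req = Records.newRecord(ResourceRequest.class);
    req.setResourceName(name);
    req.setPriority(pri);
    req.setCapability(cap);
    req.setNumContainers(numContainers);
    req.setRelaxLocality(relaxLocality);
    return req;
  }

  // Whitelist a single node: ask for it at node level, and forbid
  // falling back to the rack level or the any ("*") level.
  static void addWhitelistedRequests(List<ResourceRequest> asks,
                                     Priority pri, Resource cap) {
    // node-level request: relaxLocality = true (the node we actually want)
    asks.add(newRequest("host1.example.com", pri, cap, 1, true));
    // rack-level request: relaxLocality = false (no fallback to the rack)
    asks.add(newRequest("/rack1", pri, cap, 1, false));
    // any-level request: relaxLocality = false (no fallback anywhere else)
    asks.add(newRequest(ResourceRequest.ANY, pri, cap, 1, false));
  }
}
```

The resulting list would then be sent to the RM in the AM's allocate call; until YARN-521 lands, the AMRMClient wrapper does not expose these flags.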
Package Missing When Building Hadoop Plugin For Eclipse
Hey guys,

I'm trying to build my own Hadoop (1.1.2) plugin for Eclipse (3.7.2), and the build keeps saying that some Eclipse packages do not exist. The Eclipse path is explicitly written in both build.xml and build-contrib.xml, and I double-checked that the path is correct and that all the "missing" packages are actually there.

Here's the error message I got:

compile:
     [echo] contrib: eclipse-plugin
    [javac] Compiling 45 source files to /home/tony/Downloads/hadoop-1.1.2/build/contrib/eclipse-plugin/classes
    [javac] /home/tony/Downloads/hadoop-1.1.2/src/contrib/eclipse-plugin/src/java/org/apache/hadoop/eclipse/Activator.java:22: package org.eclipse.ui.plugin does not exist
    [javac] import org.eclipse.ui.plugin.AbstractUIPlugin;
    [javac]                             ^
    [javac] /home/tony/Downloads/hadoop-1.1.2/src/contrib/eclipse-plugin/src/java/org/apache/hadoop/eclipse/Activator.java:28: cannot find symbol
    [javac] symbol: class AbstractUIPlugin
    [javac] public class Activator extends AbstractUIPlugin {
    [javac]                                ^
    [javac] /home/tony/Downloads/hadoop-1.1.2/src/contrib/eclipse-plugin/src/java/org/apache/hadoop/eclipse/ErrorMessageDialog.java:21: package org.eclipse.jface.dialogs does not exist
    ..
    [javac] 100 errors

BUILD FAILED
/home/tony/Downloads/hadoop-1.1.2/src/contrib/eclipse-plugin/build.xml:68: Compile failed; see the compiler error output for details.

Here's what my build.xml under hadoop/src/contrib/eclipse-plugin looks like:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<project default="jar" name="eclipse-plugin">
  <import file="../build-contrib.xml"/>
  <property name="eclipse.home" location="/usr/share/eclipse"/>
  <property name="version" value="1.1.2"/>

  <path id="eclipse-sdk-jars">
    <fileset dir="${eclipse.home}/plugins/">
      <include name="org.eclipse.ui*.jar"/>
      <include name="org.eclipse.jdt*.jar"/>
      <include name="org.eclipse.core*.jar"/>
      <include name="org.eclipse.equinox*.jar"/>
      <include name="org.eclipse.debug*.jar"/>
      <include name="org.eclipse.osgi*.jar"/>
      <include name="org.eclipse.swt*.jar"/>
      <include name="org.eclipse.jface*.jar"/>
      <include name="org.eclipse.team.cvs.ssh2*.jar"/>
      <include name="com.jcraft.jsch*.jar"/>
    </fileset>
  </path>

  <!-- Override classpath to include Eclipse SDK jars -->
  <path id="classpath">
    <pathelement location="${build.classes}"/>
    <pathelement location="${hadoop.root}/build/classes"/>
    <fileset dir="${hadoop.root}">
      <include name="**/*.jar"/>
    </fileset>
    <path refid="eclipse-sdk-jars"/>
  </path>

  <!-- Skip building if eclipse.home is unset. -->
  <target name="check-contrib" unless="eclipse.home">
    <property name="skip.contrib" value="yes"/>
    <echo message="eclipse.home unset: skipping eclipse plugin"/>
  </target>

  <target name="compile" depends="init, ivy-retrieve-common" unless="skip.contrib">
    <echo message="contrib: ${name}"/>
    <javac encoding="${build.encoding}" srcdir="${src.dir}" includes="**/*.java"
           destdir="${build.classes}" debug="${javac.debug}"
           deprecation="${javac.deprecation}" includeantruntime="on">
      <classpath refid="classpath"/>
    </javac>
  </target>

  <!-- Override jar target to specify manifest -->
  <target name="jar" depends="compile" unless="skip.contrib">
    <mkdir dir="${build.dir}/lib"/>
    <copy file="${hadoop.root}/hadoop-core-${version}.jar" tofile="${build.dir}/lib/hadoop-core.jar" verbose="true"/>
    <copy file="${hadoop.root}/lib/commons-cli-${commons-cli.version}.jar" todir="${build.dir}/lib" verbose="true"/>
    <copy file="${hadoop.root}/lib/commons-configuration-1.6.jar" tofile="${build.dir}/lib/commons-configuration.jar" verbose="true"/>
    <copy file="${hadoop.root}/lib/commons-httpclient-3.0.1.jar" tofile="${build.dir}/lib/commons-httpclient.jar" verbose="true"/>
    <copy file="${hadoop.root}/lib/commons-lang-2.4.jar" tofile="${build.dir}/lib/commons-lang.jar" verbose="true"/>
    <copy file="${hadoop.root}/lib/jackson-core-asl-1.8.8.jar" tofile="${build.dir}/lib/jackson-core-asl.jar" verbose="true"/>
    <copy file="${hadoop.root}/lib/jackson-mapper-asl-1.8.8.jar" tofile="${build.dir}/lib/jackson-mapper-asl.jar" verbose="true"/>
    <echo message="${build.dir}"/>
    <echo message="${root}"/>
    <jar jarfile="${build.dir}/hadoop-${name}-${version}.jar" manifest="${root}/META-INF/MANIFEST.MF">
      <fileset dir="${build.dir}" includes="classes/ lib/"/>
      <fileset dir="${root}" includes="resources/ plugin.xml"/>
    </jar>
  </target>
</project>

I added these properties to hadoop/src/contrib/build-contrib.xml:

<property name="eclipse.home" location="/usr/share/eclipse"/>
<property name="version" value="1.1.2"/>

And I also added these lines to hadoop/src/contrib/eclipse-plugin/META-INF/MANIFEST.MF:

Bundle-ClassPath: classes/, lib/hadoop-core.jar, lib/commons-cli-1.2.jar,
 lib/commons-configuration-1.6.jar, lib/commons-httpclient-3.0.1.jar,
 lib/commons-lang-2.4.jar, lib/jackson-core-asl-1.8.8.jar,
 lib/jackson-mapper-asl-1.8.8.jar

I'm totally freaking out by getting this message and just wondering
Re: How bad is this? :)
Hi Chris, you should use a utility like iozone (http://www.iozone.org/) for benchmarking drives while tuning your filesystem. You may be surprised at what measured values can show you. :)

We use ext4 for storing HDFS blocks on our compute nodes, and journaling has been left on. We also have 'writeback' enabled, and commits are delayed by 30 seconds. Slide 21 here has suggestions for tuning ext4: http://www.slideshare.net/allenwittenauer/2012-lihadoopperf

Be warned that with these settings and 3 copies of each block, it's still possible to lose data in the event of a power loss. ~2.5 years ago we had a datacenter power failure, and I think we lost 6-10 files due to block corruption. Those files were actively being written when the power failure happened, so we ended up rerunning those jobs. Balancing performance vs. exposure is something to keep in mind when making these kinds of changes. -- Adam

On Jul 9, 2013, at 12:25 AM, Harsh J ha...@cloudera.com wrote:
This is what I remember: if you disable journalling, running fsck after a crash will (be required and) take longer. Certainly not a good idea to have an extra wait after the cluster loses power and is being restarted, etc.

On Tue, Jul 9, 2013 at 7:42 AM, Chris Embree cemb...@gmail.com wrote:
Hey Hadoop smart folks, I have a tendency to seek optimum performance given my understanding, and that led me to a brilliant decision. We settled on ext4 as our underlying FS for HDFS. Greedy for speed, I thought: let's turn the journal off and gain the speed benefits. After all, I have 3 copies of the data. How much does this bother you, given we have a 21-node prod and only a 10-node dev cluster? I'm embarrassed to say I did not capture good pre- and post-change I/O numbers. In my simple brain, not writing to a journal just screams improved I/O. Don't be shy, tell me how badly I have done bad things. (I originally said "screwed the pooch" but I reconsidered our USA audience. ;) If I'm not incredibly wrong, should we consider higher-speed (less safe) file systems? Correct/support my thinking. Chris

-- Harsh J
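As a sketch of the setup described above (journal left on, writeback ordering, 30-second delayed commits), the admin commands and fstab entry might look roughly like the following. This is illustrative only: the device name and mount point are placeholders, and as noted above these options still trade durability for speed.

```shell
# Illustrative sketch only -- /dev/sdb1 and /data/1 are placeholders.
#
# Re-enable the ext4 journal on an (unmounted) data disk that had it
# turned off:
#   tune2fs -O has_journal /dev/sdb1
#
# /etc/fstab entry: journaled ext4 with writeback ordering and 30 s
# delayed commits, as described above (still lossy on power failure):
#   /dev/sdb1  /data/1  ext4  noatime,data=writeback,commit=30  0  0
```

With data=writeback, only metadata is journaled, so recently written file contents can be stale or lost after a crash even though fsck stays fast; that matches the data-loss caveat in the message above.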
Distributed Cache
Hi, I was wondering if I can still use the DistributedCache class in the latest release of Hadoop (version 2.0.5). In my driver class, I use this code to try to add a file to the distributed cache:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

Configuration conf = new Configuration();
DistributedCache.addCacheFile(new URI("file path in HDFS"), conf);
Job job = Job.getInstance();
...

However, I keep getting warnings that the method addCacheFile() is deprecated. Is there a more current way to add files to the distributed cache? Thanks in advance, Andrew
Re: Distributed Cache
You should use Job#addCacheFile(). Cheers

On Tue, Jul 9, 2013 at 3:02 PM, Botelho, Andrew andrew.bote...@emc.com wrote:
Hi, I was wondering if I can still use the DistributedCache class in the latest release of Hadoop (version 2.0.5). In my driver class, I use this code to try to add a file to the distributed cache:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

Configuration conf = new Configuration();
DistributedCache.addCacheFile(new URI("file path in HDFS"), conf);
Job job = Job.getInstance();
...

However, I keep getting warnings that the method addCacheFile() is deprecated. Is there a more current way to add files to the distributed cache? Thanks in advance, Andrew
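A sketch of the suggested replacement, using the non-deprecated Job API from Hadoop 2.x (the HDFS path below is a placeholder, not taken from the thread):

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class CacheSketch {
  public static Job buildJob() throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf);
    // Replaces the deprecated DistributedCache.addCacheFile(uri, conf):
    job.addCacheFile(new URI("/user/andrew/lookup.txt"));  // placeholder path
    return job;
  }
}
```

Inside a task, the cached files can then be read back via context.getCacheFiles() on the mapper/reducer context instead of the deprecated DistributedCache getters.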
Re: Issues Running Hadoop 1.1.2 on multi-node cluster
Siddharth, the error messages point to file system issues. Make sure that the file system locations you specified in the config files are accurate and accessible. -Sreedhar

From: siddharth mathur sidh1...@gmail.com
To: user@hadoop.apache.org
Sent: Tuesday, July 9, 2013 9:56 AM
Subject: Issues Running Hadoop 1.1.2 on multi-node cluster

Hi, I have installed Hadoop 1.1.2 on a 5-node cluster. I installed it following this tutorial: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/

When I start up Hadoop, I get the following error in all the tasktrackers:

2013-07-09 12:15:22,301 INFO org.apache.hadoop.mapred.UserLogCleaner: Adding job_201307051203_0001 for user-log deletion with retainTimeStamp:1373472921775
2013-07-09 12:15:22,301 INFO org.apache.hadoop.mapred.UserLogCleaner: Adding job_201307051611_0001 for user-log deletion with retainTimeStamp:1373472921775
2013-07-09 12:15:22,601 INFO org.apache.hadoop.mapred.TaskTracker: Failed to get system directory...
2013-07-09 12:15:25,164 INFO org.apache.hadoop.mapred.TaskTracker: Failed to get system directory...
2013-07-09 12:15:27,901 INFO org.apache.hadoop.mapred.TaskTracker: Failed to get system directory...
2013-07-09 12:15:30,144 INFO org.apache.hadoop.mapred.TaskTracker: Failed to get system directory...

But everything looks fine in the web UI. When I run a job, I get the following error, but the job completes anyway. I have attached the screenshots of the failed map task's error log in the UI.

13/07/09 12:29:37 INFO input.FileInputFormat: Total input paths to process : 2
13/07/09 12:29:37 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/07/09 12:29:37 WARN snappy.LoadSnappy: Snappy native library not loaded
13/07/09 12:29:37 INFO mapred.JobClient: Running job: job_201307091215_0001
13/07/09 12:29:38 INFO mapred.JobClient:  map 0% reduce 0%
13/07/09 12:29:41 INFO mapred.JobClient: Task Id : attempt_201307091215_0001_m_01_0, Status : FAILED
Error initializing attempt_201307091215_0001_m_01_0:
ENOENT: No such file or directory
	at org.apache.hadoop.io.nativeio.NativeIO.chmod(Native Method)
	at org.apache.hadoop.fs.FileUtil.execSetPermission(FileUtil.java:699)
	at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:654)
	at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509)
	at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:344)
	at org.apache.hadoop.mapred.JobLocalizer.initializeJobLogDir(JobLocalizer.java:240)
	at org.apache.hadoop.mapred.DefaultTaskController.initializeJob(DefaultTaskController.java:205)
	at org.apache.hadoop.mapred.TaskTracker$4.run(TaskTracker.java:1331)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
	at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1306)
	at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1221)
	at org.apache.hadoop.mapred.TaskTracker$5.run(TaskTracker.java:2581)
	at java.lang.Thread.run(Thread.java:724)
13/07/09 12:29:41 WARN mapred.JobClient: Error reading task output http://dmkd-1:50060/tasklog?plaintext=true&attemptid=attempt_201307091215_0001_m_01_0&filter=stdout
13/07/09 12:29:41 WARN mapred.JobClient: Error reading task output http://dmkd-1:50060/tasklog?plaintext=true&attemptid=attempt_201307091215_0001_m_01_0&filter=stderr
13/07/09 12:29:45 INFO mapred.JobClient:  map 50% reduce 0%
13/07/09 12:29:53 INFO mapred.JobClient:  map 50% reduce 16%
13/07/09 12:30:38 INFO mapred.JobClient: Task Id : attempt_201307091215_0001_m_00_1, Status : FAILED
Error initializing attempt_201307091215_0001_m_00_1:
ENOENT: No such file or directory
	at org.apache.hadoop.io.nativeio.NativeIO.chmod(Native Method)
	at org.apache.hadoop.fs.FileUtil.execSetPermission(FileUtil.java:699)
	at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:654)
	at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509)
	at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:344)
	at org.apache.hadoop.mapred.JobLocalizer.initializeJobLogDir(JobLocalizer.java:240)
	at org.apache.hadoop.mapred.DefaultTaskController.initializeJob(DefaultTaskController.java:205)
	at org.apache.hadoop.mapred.TaskTracker$4.run(TaskTracker.java:1331)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
	at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1306)
	at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1221)
	at
HiBench tool not running
Hi, I am running HiBench on my Hadoop setup, and it is not able to initialize the History viewer:

Caused by: java.io.IOException: Not a valid history directory output/log/_history

I did not find much on the internet. Any idea what is going wrong? My Hadoop cluster runs the terasort benchmark properly. -Rahul
Re: Issues Running Hadoop 1.1.2 on multi-node cluster
Hi Siddharth, when running a multi-node cluster we need to take care of the localhost/hostname configuration on the slave machines; from the error messages, the tasktracker is not able to get its system/root directory from the master. Please check and rerun it. Thanks, Kiran

On Tue, Jul 9, 2013 at 10:26 PM, siddharth mathur sidh1...@gmail.com wrote:
Hi, I have installed Hadoop 1.1.2 on a 5-node cluster. I installed it following this tutorial: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/

When I start up Hadoop, I get the following error in all the tasktrackers:

2013-07-09 12:15:22,301 INFO org.apache.hadoop.mapred.UserLogCleaner: Adding job_201307051203_0001 for user-log deletion with retainTimeStamp:1373472921775
2013-07-09 12:15:22,301 INFO org.apache.hadoop.mapred.UserLogCleaner: Adding job_201307051611_0001 for user-log deletion with retainTimeStamp:1373472921775
2013-07-09 12:15:22,601 INFO org.apache.hadoop.mapred.TaskTracker: Failed to get system directory...
2013-07-09 12:15:25,164 INFO org.apache.hadoop.mapred.TaskTracker: Failed to get system directory...
2013-07-09 12:15:27,901 INFO org.apache.hadoop.mapred.TaskTracker: Failed to get system directory...
2013-07-09 12:15:30,144 INFO org.apache.hadoop.mapred.TaskTracker: Failed to get system directory...

But everything looks fine in the web UI. When I run a job, I get the following error, but the job completes anyway. I have attached the screenshots of the failed map task's error log in the UI.
Re: Distributed Cache
It should be like this:

Configuration conf = new Configuration();
Job job = new Job(conf, "test");
job.setJarByClass(Test.class);
DistributedCache.addCacheFile(new Path("your hdfs path").toUri(), job.getConfiguration());

but the best example is the test cases:
http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/filecache/TestClientDistributedCacheManager.java?view=markup

On Wed, Jul 10, 2013 at 6:07 AM, Ted Yu yuzhih...@gmail.com wrote:
You should use Job#addCacheFile(). Cheers

On Tue, Jul 9, 2013 at 3:02 PM, Botelho, Andrew andrew.bote...@emc.com wrote:
Hi, I was wondering if I can still use the DistributedCache class in the latest release of Hadoop (version 2.0.5). In my driver class, I use this code to try to add a file to the distributed cache:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

Configuration conf = new Configuration();
DistributedCache.addCacheFile(new URI("file path in HDFS"), conf);
Job job = Job.getInstance();
...

However, I keep getting warnings that the method addCacheFile() is deprecated. Is there a more current way to add files to the distributed cache? Thanks in advance, Andrew
RE: can not start yarn
Hi, here the NM is failing to connect to the ResourceManager. Have you started the ResourceManager successfully? Do you see any problem while starting the ResourceManager in the RM log? If you have started the ResourceManager on a different machine than the NM, you need to set the configuration yarn.resourcemanager.resource-tracker.address for the NM to the RM's resource-tracker address.

Thanks
Devaraj K

From: ch huang [mailto:justlo...@gmail.com]
Sent: 10 July 2013 08:36
To: user@hadoop.apache.org
Subject: can not start yarn

I am testing MapReduce v2, and I find that starting the NM fails. Here is the NM log content:

2013-07-10 11:02:35,909 INFO org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer is started.
2013-07-10 11:02:35,909 INFO org.apache.hadoop.yarn.service.AbstractService: Service:Dispatcher is started.
2013-07-10 11:02:35,930 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Connecting to ResourceManager at /0.0.0.0:8031
2013-07-10 11:02:37,209 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-10 11:02:38,210 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-10 11:02:39,211 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-10 11:02:40,212 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-10 11:02:41,213 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-10 11:02:42,215 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-10 11:02:43,216 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-10 11:02:44,217 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-10 11:02:45,218 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-10 11:02:46,219 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-10 11:02:46,226 ERROR org.apache.hadoop.yarn.service.CompositeService: Error starting services org.apache.hadoop.yarn.server.nodemanager.NodeManager
org.apache.avro.AvroRuntimeException: java.lang.reflect.UndeclaredThrowableException
	at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:141)
	at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:196)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:329)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:351)
Caused by: java.lang.reflect.UndeclaredThrowableException
	at org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:135)
	at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:61)
	at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:190)
	at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:137)
	... 4 more
Caused by: com.google.protobuf.ServiceException: java.net.ConnectException: Call From
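For the yarn.resourcemanager.resource-tracker.address setting mentioned in the reply, the yarn-site.xml entry on the NodeManager host might look like this; "rmhost" is a placeholder for the machine actually running the ResourceManager, and 8031 is the default resource-tracker port the NM was already trying:

```xml
<!-- yarn-site.xml on the NodeManager host; "rmhost" is a placeholder. -->
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>rmhost:8031</value>
</property>
```

Without this, the NM falls back to 0.0.0.0:8031 (its own machine), which matches the "Connecting to ResourceManager at /0.0.0.0:8031" line in the log above.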
How to configure Hive metastore (Mysql) for beeswax(Hive UI) in Clouera Manager
Hi, I am using Cloudera Manager 4.1.2 without Hive as a managed service, so I installed Hive myself and configured MySQL as the metastore. Using Cloudera Manager I installed Hue. In Hue, Beeswax (the Hive UI) uses a Derby database by default; I want to configure its metastore to be the same one Hive is using (MySQL), so that both Hive and Beeswax refer to the same database and metastore. I changed the hive-site.xml file in /var/run/cloudera-scm-agent/process/662-hue-HUE_SERVER/hive-conf and /var/run/cloudera-scm-agent/process/663-hue-BEESWAX_SERVER/hive-conf, but Beeswax is not pointing to the MySQL metastore, and every time the Hue service restarts, Cloudera Manager creates a new configuration file. Any suggestions on where to make the configuration changes? Thanks in advance. From, Ramesh Babu
stop-dfs.sh does not work
Hi users. I start my HDFS using start-dfs.sh, and the nodes start successfully. However, stop-dfs.sh does not work when I want to stop HDFS. It shows:

no namenode to stop
no datanode to stop

I have to stop it with: kill -9 <pid>. So I wonder why stop-dfs.sh no longer works. Best regards
Re: stop-dfs.sh does not work
You can try the following:

sudo netstat -plten | grep java

This will give you all the Java processes that have a socket connection open. You can easily figure out the right ones based on the port numbers you have mentioned in config files like core-site.xml, and kill those processes.

Thanks and regards,
Deepak Rosario Pancras Tharigopla.
Achiever/Responsibility/Arranger/Maximizer/Harmony
Sent from my iPhone

On Jul 10, 2013, at 12:30 AM, YouPeng Yang yypvsxf19870...@gmail.com wrote:
Hi users. I start my HDFS using start-dfs.sh, and the nodes start successfully. However, stop-dfs.sh does not work when I want to stop HDFS. It shows: no namenode to stop / no datanode to stop. I have to stop it with kill -9 <pid>. So I wonder why stop-dfs.sh no longer works. Best regards
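Another thing worth checking: the stop scripts locate running daemons through pid files, which live under /tmp by default and can be cleaned up by the OS while the daemons keep running, producing exactly the "no namenode to stop" message. A sketch of pinning the pid directory (the path below is only an example; set this in conf/hadoop-env.sh):

```shell
# stop-dfs.sh looks for pid files (hadoop-<user>-namenode.pid etc.) under
# HADOOP_PID_DIR, which defaults to /tmp. If /tmp gets cleaned, the stop
# scripts can no longer find the daemons. Point the variable at a stable
# directory instead (example path; put the export in conf/hadoop-env.sh):
export HADOOP_PID_DIR=/var/run/hadoop
echo "pid files expected under: $HADOOP_PID_DIR"
```

After setting this, restart the daemons once with kill (as above) and start-dfs.sh, so that fresh pid files are written under the new directory; subsequent stop-dfs.sh runs will then find them.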