Re:

2013-07-09 Thread Harsh J
If this is not a Bigtop-packaged install, please see src/BUILDING.txt
to build the proper native libraries for your platform. The tarball doesn't
ship with globally usable native libraries, given the OS/arch variants out
there.
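
A minimal sketch of how one might verify the load from Java, assuming only
the public NativeCodeLoader utility (the class name below is just an
illustration):

import org.apache.hadoop.util.NativeCodeLoader;

public class NativeCheck {
  public static void main(String[] args) {
    // true only if a libhadoop build matching this OS/arch was found on
    // java.library.path and loaded successfully
    System.out.println("native-hadoop loaded: "
        + NativeCodeLoader.isNativeCodeLoaded());
  }
}

Running this with the Hadoop classpath on the machine that prints the WARN
shows whether the rebuilt library is actually being picked up.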

On Wed, Jul 3, 2013 at 3:54 AM, Chui-Hui Chiu cch...@tigers.lsu.edu wrote:
 Hello,

 I have a Hadoop 2.0.5 Alpha cluster.  When I execute any Hadoop command, I
 see the following message.

 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your
 platform... using builtin-java classes where applicable

 Is it at the lib/native folder?  How do I configure the system to load it?

 Thanks,
 Chui-hui



-- 
Harsh J


Re: How bad is this? :)

2013-07-09 Thread Harsh J
This is what I remember: If you disable journalling, running fsck
after a crash will (be required and) take longer. Certainly not a good
idea to have an extra wait after the cluster loses power and is being
restarted, etc.

On Tue, Jul 9, 2013 at 7:42 AM, Chris Embree cemb...@gmail.com wrote:
 Hey Hadoop smart folks

 I have a tendency to seek optimum performance given my understanding, so
 that led me to a brilliant decision.  We settled on EXT4 for our underlying
 FS for HDFS.  Greedy for speed, I thought, let's turn the journal off and
 gain the speed benefits.  After all, I have 3 copies of the data.

 How much does this bother you, given we have a 21-node prod and only a
 10-node dev cluster?

 I'm embarrassed to say I did not capture good pre and post change I/O.  In
 my simple brain, not writing to journal just screams improved I/O.

 Don't be shy, tell me how badly I have done bad things. (I originally said
 screwed the pooch but I reconsidered our  USA audience. ;)

 If I'm not incredibly wrong, should we consider higher speed (less safe)
 file systems?

 Correct/support my thinking.
 Chris



--
Harsh J


Re: whitelist feature of YARN

2013-07-09 Thread Krishna Kishore Bonagiri
Hi Sandy,

  Yes, I have been using the AMRMClient APIs. I am planning to shift to
whatever way this whitelist feature is supported. But I am not sure
what is meant by submitting ResourceRequests directly to the RM. Can you
please elaborate on this or give me a pointer to some example code on how
to do it...

   Thanks for the reply,

-Kishore


On Mon, Jul 8, 2013 at 10:53 PM, Sandy Ryza sandy.r...@cloudera.com wrote:

 Hi Krishna,

 From your previous email, it looks like you are using the AMRMClient APIs.
 Whitelisting is not yet supported through them.  I am working
 on this in YARN-521, which should be included in the next release after
 2.1.0-beta.  If you are submitting ResourceRequests directly to the RM, you
 can whitelist a node by
 * setting the relaxLocality flag on the node-level ResourceRequest to true
 * setting the relaxLocality flag on the corresponding rack-level
 ResourceRequest to false
 * setting the relaxLocality flag on the corresponding any-level
 ResourceRequest to false

 -Sandy
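
A rough sketch of the three ResourceRequests described above, assuming the
2.1.0-beta org.apache.hadoop.yarn.api.records API; the host name, rack name,
priority and container size are placeholders:

import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

public class WhitelistSketch {
  public static List<ResourceRequest> whitelistNode(String host, String rack) {
    Priority priority = Priority.newInstance(1);
    Resource capability = Resource.newInstance(1024, 1); // 1 GB, 1 vcore

    // Node-level request: relaxLocality = true, this node is acceptable.
    ResourceRequest nodeReq =
        ResourceRequest.newInstance(priority, host, capability, 1, true);
    // Rack-level request: relaxLocality = false, do not fall back to the rack.
    ResourceRequest rackReq =
        ResourceRequest.newInstance(priority, rack, capability, 1, false);
    // ANY-level request: relaxLocality = false, do not fall back to any node.
    ResourceRequest anyReq =
        ResourceRequest.newInstance(priority, ResourceRequest.ANY, capability, 1, false);

    return Arrays.asList(nodeReq, rackReq, anyReq);
  }
}

The list would then go into the ask of the AllocateRequest sent to the RM.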


 On Mon, Jul 8, 2013 at 6:48 AM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

 Hi,

   Can someone please point me to some example code showing how to use the
 whitelist feature of YARN? I have recently got RC1 of hadoop-2.1.0-beta
 and want to use this feature.

   It would be great if you could point me to some description of what this
 whitelisting feature is; I have gone through some JIRA logs related to
 this, but a more concrete explanation would be helpful.

 Thanks,
 Kishore





Package Missing When Building Hadoop Plugin For Eclipse

2013-07-09 Thread TonY Xu
Hey guys,

I'm trying to build my own Hadoop (1.1.2) plugin for Eclipse (3.7.2), and it
always says that some Eclipse packages do not exist. The Eclipse path is
explicitly written in both build.xml and build-contrib.xml, and I
double-checked that the path is correct and all the missing packages are
actually there.

Here's the error message I got:

compile:
[echo] contrib: eclipse-plugin
[javac] Compiling 45 source files to
/home/tony/Downloads/hadoop-1.1.2/build/contrib/eclipse-plugin/classes
[javac]
/home/tony/Downloads/hadoop-1.1.2/src/contrib/eclipse-plugin/src/java/org/apache/hadoop/eclipse/Activator.java:22:
package org.eclipse.ui.plugin does not exist
[javac] import org.eclipse.ui.plugin.AbstractUIPlugin;
[javac] ^
[javac]
/home/tony/Downloads/hadoop-1.1.2/src/contrib/eclipse-plugin/src/java/org/apache/hadoop/eclipse/Activator.java:28:
cannot find symbol
[javac] symbol: class AbstractUIPlugin
[javac] public class Activator extends AbstractUIPlugin {
[javac]^
[javac]
/home/tony/Downloads/hadoop-1.1.2/src/contrib/eclipse-plugin/src/java/org/apache/hadoop/eclipse/ErrorMessageDialog.java:21:
package org.eclipse.jface.dialogs does not exist
..
[javac] 100 errors

BUILD FAILED
/home/tony/Downloads/hadoop-1.1.2/src/contrib/eclipse-plugin/build.xml:68:
Compile failed; see the compiler error output for details.


Here's what my build.xml under hadoop/src/contrib/eclipse-plugin looks
like:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>

<project default="jar" name="eclipse-plugin">

  <import file="../build-contrib.xml"/>

  <property name="eclipse.home" location="/usr/share/eclipse"/>
  <property name="version" value="1.1.2"/>

  <path id="eclipse-sdk-jars">
    <fileset dir="${eclipse.home}/plugins/">
      <include name="org.eclipse.ui*.jar"/>
      <include name="org.eclipse.jdt*.jar"/>
      <include name="org.eclipse.core*.jar"/>
      <include name="org.eclipse.equinox*.jar"/>
      <include name="org.eclipse.debug*.jar"/>
      <include name="org.eclipse.osgi*.jar"/>
      <include name="org.eclipse.swt*.jar"/>
      <include name="org.eclipse.jface*.jar"/>

      <include name="org.eclipse.team.cvs.ssh2*.jar"/>
      <include name="com.jcraft.jsch*.jar"/>
    </fileset>
  </path>

  <!-- Override classpath to include Eclipse SDK jars -->
  <path id="classpath">
    <pathelement location="${build.classes}"/>
    <pathelement location="${hadoop.root}/build/classes"/>
    <fileset dir="${hadoop.root}">
      <include name="**/*.jar"/>
    </fileset>
    <path refid="eclipse-sdk-jars"/>
  </path>

  <!-- Skip building if eclipse.home is unset. -->
  <target name="check-contrib" unless="eclipse.home">
    <property name="skip.contrib" value="yes"/>
    <echo message="eclipse.home unset: skipping eclipse plugin"/>
  </target>

  <target name="compile" depends="init, ivy-retrieve-common" unless="skip.contrib">
    <echo message="contrib: ${name}"/>
    <javac
      encoding="${build.encoding}"
      srcdir="${src.dir}"
      includes="**/*.java"
      destdir="${build.classes}"
      debug="${javac.debug}"
      deprecation="${javac.deprecation}"
      includeantruntime="on">
      <classpath refid="classpath"/>
    </javac>
  </target>

  <!-- Override jar target to specify manifest -->
  <target name="jar" depends="compile" unless="skip.contrib">
    <mkdir dir="${build.dir}/lib"/>
    <copy file="${hadoop.root}/hadoop-core-${version}.jar"
          tofile="${build.dir}/lib/hadoop-core.jar" verbose="true"/>
    <copy file="${hadoop.root}/lib/commons-cli-${commons-cli.version}.jar"
          todir="${build.dir}/lib" verbose="true"/>
    <copy file="${hadoop.root}/lib/commons-configuration-1.6.jar"
          tofile="${build.dir}/lib/commons-configuration.jar" verbose="true"/>
    <copy file="${hadoop.root}/lib/commons-httpclient-3.0.1.jar"
          tofile="${build.dir}/lib/commons-httpclient.jar" verbose="true"/>
    <copy file="${hadoop.root}/lib/commons-lang-2.4.jar"
          tofile="${build.dir}/lib/commons-lang.jar" verbose="true"/>
    <copy file="${hadoop.root}/lib/jackson-core-asl-1.8.8.jar"
          tofile="${build.dir}/lib/jackson-core-asl.jar" verbose="true"/>
    <copy file="${hadoop.root}/lib/jackson-mapper-asl-1.8.8.jar"
          tofile="${build.dir}/lib/jackson-mapper-asl.jar" verbose="true"/>
    <echo message="${build.dir}"/>
    <echo message="${root}"/>
    <jar
      jarfile="${build.dir}/hadoop-${name}-${version}.jar"
      manifest="${root}/META-INF/MANIFEST.MF">
      <fileset dir="${build.dir}" includes="classes/ lib/"/>
      <fileset dir="${root}" includes="resources/ plugin.xml"/>
    </jar>
  </target>

</project>

I added these properties to hadoop/src/contrib/build-contrib.xml:

  <property name="eclipse.home" location="/usr/share/eclipse"/>
  <property name="version" value="1.1.2"/>

And I also added these entries to
hadoop/src/contrib/eclipse-plugin/META-INF/MANIFEST.MF:

Bundle-ClassPath: classes/,
 lib/hadoop-core.jar,
 lib/commons-cli-1.2.jar,
 lib/commons-configuration-1.6.jar,
 lib/commons-httpclient-3.0.1.jar,
 lib/commons-lang-2.4.jar,
 lib/jackson-core-asl-1.8.8.jar,
 lib/jackson-mapper-asl-1.8.8.jar

I'm totally freaking out about this message and just wondering ...

Re: How bad is this? :)

2013-07-09 Thread Adam Faris
Hi Chris,

You should use a utility like iozone (http://www.iozone.org/) for benchmarking
drives while tuning your filesystem.  You may be surprised at what measured
values can show you. :)

We use ext4 for storing HDFS blocks on our compute nodes and journaling has
been left on.  We also have 'writeback' enabled and commits are delayed by 30
seconds.  Slide 21 of http://www.slideshare.net/allenwittenauer/2012-lihadoopperf
has suggestions for tuning ext4.  Be warned that with these settings and 3
copies of each block, it's still possible to lose data in the event of a power
loss.  About 2.5 years ago we had a datacenter power failure and I think we
lost 6-10 files due to block corruption.  Those files were actively being
written when the power failure happened, so we ended up rerunning those jobs.
Balancing performance vs. exposure is something to keep in mind when making
these kinds of changes.

-- Adam

On Jul 9, 2013, at 12:25 AM, Harsh J ha...@cloudera.com wrote:

 This is what I remember: If you disable journalling, running fsck
 after a crash will (be required and) take longer. Certainly not a good
 idea to have an extra wait after the cluster loses power and is being
 restarted, etc.
 
 On Tue, Jul 9, 2013 at 7:42 AM, Chris Embree cemb...@gmail.com wrote:
 Hey Hadoop smart folks
 
 I have a tendency to seek optimum performance given my understanding, so
 that led me to a brilliant decision.  We settled on EXT4 for our underlying
 FS for HDFS.  Greedy for speed, I thought, let's turn the journal off and
 gain the speed benefits.  After all, I have 3 copies of the data.

 How much does this bother you, given we have a 21-node prod and only a
 10-node dev cluster?
 
 I'm embarrassed to say I did not capture good pre and post change I/O.  In
 my simple brain, not writing to journal just screams improved I/O.
 
 Don't be shy, tell me how badly I have done bad things. (I originally said
 screwed the pooch but I reconsidered our  USA audience. ;)
 
 If I'm not incredibly wrong, should we consider higher speed (less safe)
 file systems?
 
 Correct/support my thinking.
 Chris
 
 
 
 --
 Harsh J



Distributed Cache

2013-07-09 Thread Botelho, Andrew
Hi,

I was wondering if I can still use the DistributedCache class in the latest 
release of Hadoop (Version 2.0.5).
In my driver class, I use this code to try and add a file to the distributed 
cache:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

Configuration conf = new Configuration();
DistributedCache.addCacheFile(new URI("file path in HDFS"), conf);
Job job = Job.getInstance();
...

However, I keep getting warnings that the method addCacheFile() is deprecated.
Is there a more current way to add files to the distributed cache?

Thanks in advance,

Andrew


Re: Distributed Cache

2013-07-09 Thread Ted Yu
You should use Job#addCacheFile()


Cheers
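
A short sketch of the driver side using Job#addCacheFile, assuming Hadoop 2.x;
the HDFS path, the "#lookup" alias, and the class name are placeholders:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class CacheDriverSketch {
  public static Job buildJob() throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "cache-example");
    // Register the HDFS file with the job; the optional "#lookup" fragment
    // names the symlink created in each task's working directory.
    job.addCacheFile(new URI("/user/andrew/lookup.txt#lookup"));
    // ... set mapper/reducer classes, input/output paths, then submit
    return job;
  }
}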

On Tue, Jul 9, 2013 at 3:02 PM, Botelho, Andrew andrew.bote...@emc.com wrote:

 Hi,


 I was wondering if I can still use the DistributedCache class in the
 latest release of Hadoop (Version 2.0.5).

 In my driver class, I use this code to try and add a file to the
 distributed cache:


 import java.net.URI;

 import org.apache.hadoop.conf.Configuration;

 import org.apache.hadoop.filecache.DistributedCache;

 import org.apache.hadoop.fs.*;

 import org.apache.hadoop.io.*;

 import org.apache.hadoop.mapreduce.*;

 import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

 import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;


 Configuration conf = new Configuration();

 DistributedCache.addCacheFile(new URI("file path in HDFS"), conf);

 Job job = Job.getInstance(); 

 …


 However, I keep getting warnings that the method addCacheFile() is
 deprecated.

 Is there a more current way to add files to the distributed cache?


 Thanks in advance,


 Andrew



Re: Issues Running Hadoop 1.1.2 on multi-node cluster

2013-07-09 Thread Sree K
Siddharth,

The error messages point to file system issues.  Make sure that the file system
locations you specified in the config files are accurate and accessible.

-Sreedhar






 From: siddharth mathur sidh1...@gmail.com
To: user@hadoop.apache.org 
Sent: Tuesday, July 9, 2013 9:56 AM
Subject: Issues Running Hadoop 1.1.2 on multi-node cluster
 


Hi, 

I have installed Hadoop 1.1.2 on a 5-node cluster. I installed it following
this tutorial:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/


When I start up Hadoop, I get the following error in all the tasktrackers.



2013-07-09 12:15:22,301 INFO org.apache.hadoop.mapred.UserLogCleaner: Adding 
job_201307051203_0001 for user-log deletion with retainTimeStamp:1373472921775
2013-07-09 12:15:22,301 INFO org.apache.hadoop.mapred.UserLogCleaner: Adding 
job_201307051611_0001 for user-log deletion with retainTimeStamp:1373472921775
2013-07-09 12:15:22,601 INFO org.apache.hadoop.mapred.TaskTracker: Failed to 
get system directory...
2013-07-09 12:15:25,164 INFO org.apache.hadoop.mapred.TaskTracker: Failed to 
get system directory...
2013-07-09 12:15:27,901 INFO org.apache.hadoop.mapred.TaskTracker: Failed to 
get system directory...
2013-07-09 12:15:30,144 INFO org.apache.hadoop.mapred.TaskTracker: Failed to 
get system directory...


But everything looks fine in the webUI. 


When I run a job, I get the following error, but the job completes anyway. I
have attached screenshots of the failed map task's error log in the UI.



13/07/09 12:29:37 INFO input.FileInputFormat: Total input paths to process : 2
13/07/09 12:29:37 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/07/09 12:29:37 WARN snappy.LoadSnappy: Snappy native library not loaded
13/07/09 12:29:37 INFO mapred.JobClient: Running job: job_201307091215_0001
13/07/09 12:29:38 INFO mapred.JobClient:  map 0% reduce 0%
13/07/09 12:29:41 INFO mapred.JobClient: Task Id : 
attempt_201307091215_0001_m_01_0, Status : FAILED
Error initializing attempt_201307091215_0001_m_01_0:
ENOENT: No such file or directory
    at org.apache.hadoop.io.nativeio.NativeIO.chmod(Native Method)
    at org.apache.hadoop.fs.FileUtil.execSetPermission(FileUtil.java:699)
    at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:654)
    at 
org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509)
    at 
org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:344)
    at 
org.apache.hadoop.mapred.JobLocalizer.initializeJobLogDir(JobLocalizer.java:240)
    at 
org.apache.hadoop.mapred.DefaultTaskController.initializeJob(DefaultTaskController.java:205)
    at org.apache.hadoop.mapred.TaskTracker$4.run(TaskTracker.java:1331)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
    at 
org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1306)
    at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1221)
    at org.apache.hadoop.mapred.TaskTracker$5.run(TaskTracker.java:2581)
    at java.lang.Thread.run(Thread.java:724)

13/07/09 12:29:41 WARN mapred.JobClient: Error reading task 
outputhttp://dmkd-1:50060/tasklog?plaintext=trueattemptid=attempt_201307091215_0001_m_01_0filter=stdout
13/07/09 12:29:41 WARN mapred.JobClient: Error reading task 
outputhttp://dmkd-1:50060/tasklog?plaintext=trueattemptid=attempt_201307091215_0001_m_01_0filter=stderr
13/07/09 12:29:45 INFO mapred.JobClient:  map 50% reduce 0%
13/07/09 12:29:53 INFO mapred.JobClient:  map 50% reduce 16%
13/07/09 12:30:38 INFO mapred.JobClient: Task Id : 
attempt_201307091215_0001_m_00_1, Status : FAILED
Error initializing attempt_201307091215_0001_m_00_1:
ENOENT: No such file or directory
    at org.apache.hadoop.io.nativeio.NativeIO.chmod(Native Method)
    at org.apache.hadoop.fs.FileUtil.execSetPermission(FileUtil.java:699)
    at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:654)
    at 
org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509)
    at 
org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:344)
    at 
org.apache.hadoop.mapred.JobLocalizer.initializeJobLogDir(JobLocalizer.java:240)
    at 
org.apache.hadoop.mapred.DefaultTaskController.initializeJob(DefaultTaskController.java:205)
    at org.apache.hadoop.mapred.TaskTracker$4.run(TaskTracker.java:1331)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
    at 
org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1306)
    at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1221)
    at 

HiBench tool not running

2013-07-09 Thread Shah, Rahul1
Hi,

I am running HiBench on my Hadoop setup and it is not able to initialize the
History viewer:

Caused by java.io.Exception: Not a valid history directory output/log/_history

I did not find much on the internet. Any idea what is going wrong?  My Hadoop
cluster runs the terasort benchmark properly.

-Rahul



Re: Issues Running Hadoop 1.1.2 on multi-node cluster

2013-07-09 Thread Kiran Dangeti
Hi Siddharth,

While running a multi-node cluster, take care with the localhost entries on
the slave machines: from the error messages, the TaskTracker is not able to
reach the system directory on the master. Please check and rerun it.

Thanks,
Kiran


On Tue, Jul 9, 2013 at 10:26 PM, siddharth mathur sidh1...@gmail.com wrote:

 Hi,

 I have installed Hadoop 1.1.2 on a 5-node cluster. I installed it
 following this tutorial:
 http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/

 When I start up Hadoop, I get the following error in all the
 tasktrackers.

 
 2013-07-09 12:15:22,301 INFO org.apache.hadoop.mapred.UserLogCleaner:
 Adding job_201307051203_0001 for user-log deletion with
 retainTimeStamp:1373472921775
 2013-07-09 12:15:22,301 INFO org.apache.hadoop.mapred.UserLogCleaner:
 Adding job_201307051611_0001 for user-log deletion with
 retainTimeStamp:1373472921775
 2013-07-09 12:15:22,601 INFO org.apache.hadoop.mapred.TaskTracker: Failed to
 get system directory...
 2013-07-09 12:15:25,164 INFO org.apache.hadoop.mapred.TaskTracker: Failed
 to get system directory...
 2013-07-09 12:15:27,901 INFO org.apache.hadoop.mapred.TaskTracker: Failed
 to get system directory...
 2013-07-09 12:15:30,144 INFO org.apache.hadoop.mapred.TaskTracker: Failed
 to get system directory...
 

 But everything looks fine in the webUI.

 When I run a job, I get the following error, but the job completes anyway.
 I have attached screenshots of the failed map task's error log in the UI.

 13/07/09 12:29:37 INFO input.FileInputFormat: Total input paths to process
 : 2
 13/07/09 12:29:37 INFO util.NativeCodeLoader: Loaded the native-hadoop
 library
 13/07/09 12:29:37 WARN snappy.LoadSnappy: Snappy native library not loaded
 13/07/09 12:29:37 INFO mapred.JobClient: Running job: job_201307091215_0001
 13/07/09 12:29:38 INFO mapred.JobClient:  map 0% reduce 0%
 13/07/09 12:29:41 INFO mapred.JobClient: Task Id :
 attempt_201307091215_0001_m_01_0, Status : FAILED
 Error initializing attempt_201307091215_0001_m_01_0:
 ENOENT: No such file or directory
 at org.apache.hadoop.io.nativeio.NativeIO.chmod(Native Method)
 at org.apache.hadoop.fs.FileUtil.execSetPermission(FileUtil.java:699)
 at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:654)
 at
 org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509)
 at
 org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:344)
 at
 org.apache.hadoop.mapred.JobLocalizer.initializeJobLogDir(JobLocalizer.java:240)
 at
 org.apache.hadoop.mapred.DefaultTaskController.initializeJob(DefaultTaskController.java:205)
 at org.apache.hadoop.mapred.TaskTracker$4.run(TaskTracker.java:1331)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
 at
 org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1306)
 at
 org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1221)
 at org.apache.hadoop.mapred.TaskTracker$5.run(TaskTracker.java:2581)
 at java.lang.Thread.run(Thread.java:724)

 13/07/09 12:29:41 WARN mapred.JobClient: Error reading task
 outputhttp://dmkd-1:50060/tasklog?plaintext=trueattemptid=attempt_201307091215_0001_m_01_0filter=stdout
 13/07/09 12:29:41 WARN mapred.JobClient: Error reading task
 outputhttp://dmkd-1:50060/tasklog?plaintext=trueattemptid=attempt_201307091215_0001_m_01_0filter=stderr
 13/07/09 12:29:45 INFO mapred.JobClient:  map 50% reduce 0%
 13/07/09 12:29:53 INFO mapred.JobClient:  map 50% reduce 16%
 13/07/09 12:30:38 INFO mapred.JobClient: Task Id :
 attempt_201307091215_0001_m_00_1, Status : FAILED
 Error initializing attempt_201307091215_0001_m_00_1:
 ENOENT: No such file or directory
 at org.apache.hadoop.io.nativeio.NativeIO.chmod(Native Method)
 at org.apache.hadoop.fs.FileUtil.execSetPermission(FileUtil.java:699)
 at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:654)
 at
 org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509)
 at
 org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:344)
 at
 org.apache.hadoop.mapred.JobLocalizer.initializeJobLogDir(JobLocalizer.java:240)
 at
 org.apache.hadoop.mapred.DefaultTaskController.initializeJob(DefaultTaskController.java:205)
 at org.apache.hadoop.mapred.TaskTracker$4.run(TaskTracker.java:1331)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
 at
 org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1306)
 at
 org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1221)
 at 

Re: Distributed Cache

2013-07-09 Thread Azuryy Yu
It should be like this:
 Configuration conf = new Configuration();
 Job job = new Job(conf, "test");
 job.setJarByClass(Test.class);

 DistributedCache.addCacheFile(new Path("your hdfs path").toUri(),
     job.getConfiguration());


but the best example is this test case:
http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/filecache/TestClientDistributedCacheManager.java?view=markup
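
For the task side, a hedged sketch assuming the file was added with a
"#lookup" fragment so the framework symlinks it into the task's working
directory under that name; the mapper types and the alias are placeholders:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CacheReaderMapper extends Mapper<LongWritable, Text, Text, Text> {
  @Override
  protected void setup(Context context) throws IOException, InterruptedException {
    // context.getCacheFiles() also returns the registered URIs if you prefer
    // to resolve them yourself instead of relying on the symlinked name.
    BufferedReader reader = new BufferedReader(new FileReader("lookup"));
    try {
      String line;
      while ((line = reader.readLine()) != null) {
        // load the side data into memory here
      }
    } finally {
      reader.close();
    }
  }
}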





On Wed, Jul 10, 2013 at 6:07 AM, Ted Yu yuzhih...@gmail.com wrote:

 You should use Job#addCacheFile()


 Cheers


 On Tue, Jul 9, 2013 at 3:02 PM, Botelho, Andrew andrew.bote...@emc.com wrote:

 Hi,


 I was wondering if I can still use the DistributedCache class in the
 latest release of Hadoop (Version 2.0.5).

 In my driver class, I use this code to try and add a file to the
 distributed cache:


 import java.net.URI;

 import org.apache.hadoop.conf.Configuration;

 import org.apache.hadoop.filecache.DistributedCache;

 import org.apache.hadoop.fs.*;

 import org.apache.hadoop.io.*;

 import org.apache.hadoop.mapreduce.*;

 import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

 import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;


 Configuration conf = new Configuration();

 DistributedCache.addCacheFile(new URI("file path in HDFS"), conf);

 Job job = Job.getInstance(); 

 …


 However, I keep getting warnings that the method addCacheFile() is
 deprecated.

 Is there a more current way to add files to the distributed cache?


 Thanks in advance,


 Andrew





RE: can not start yarn

2013-07-09 Thread Devaraj k
Hi,

   Here the NM is failing to connect to the Resource Manager.

Have you started the Resource Manager successfully? Do you see any problem
while starting the Resource Manager in the RM log?

If you have started the Resource Manager on a different machine than the NM,
you need to set the configuration property
yarn.resourcemanager.resource-tracker.address for the NM to the RM's resource
tracker address.


Thanks
Devaraj k

From: ch huang [mailto:justlo...@gmail.com]
Sent: 10 July 2013 08:36
To: user@hadoop.apache.org
Subject: can not start yarn

I am testing MapReduce v2 and I get an error when starting the NM.

Here is the NM log content:

2013-07-10 11:02:35,909 INFO org.apache.hadoop.yarn.service.AbstractService: 
Service:org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer is started.
2013-07-10 11:02:35,909 INFO org.apache.hadoop.yarn.service.AbstractService: 
Service:Dispatcher is started.
2013-07-10 11:02:35,930 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Connecting to ResourceManager at /0.0.0.0:8031
2013-07-10 11:02:37,209 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-10 11:02:38,210 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-10 11:02:39,211 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-10 11:02:40,212 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-10 11:02:41,213 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-10 11:02:42,215 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-10 11:02:43,216 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-10 11:02:44,217 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-10 11:02:45,218 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-10 11:02:46,219 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-10 11:02:46,226 ERROR org.apache.hadoop.yarn.service.CompositeService: 
Error starting services org.apache.hadoop.yarn.server.nodemanager.NodeManager
org.apache.avro.AvroRuntimeException: 
java.lang.reflect.UndeclaredThrowableException
at 
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:141)
at 
org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:196)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:329)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:351)
Caused by: java.lang.reflect.UndeclaredThrowableException
at 
org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:135)
at 
org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:61)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:190)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:137)
... 4 more
Caused by: com.google.protobuf.ServiceException: java.net.ConnectException: 
Call From 

How to configure Hive metastore (MySQL) for Beeswax (Hive UI) in Cloudera Manager

2013-07-09 Thread Ram
Hi,
I am using Cloudera Manager 4.1.2, which does not have Hive as a service, so I
installed Hive myself and configured MySQL as the metastore.  Using Cloudera
Manager I installed Hue.  In Hue, Beeswax (the Hive UI) uses a Derby database
by default; I want to configure its metastore to be the same MySQL one Hive
uses, so that both Hive and Beeswax refer to the same database and metastore.

I changed the hive-site.xml file
in /var/run/cloudera-scm-agent/process/662-hue-HUE_SERVER/hive-conf
and /var/run/cloudera-scm-agent/process/663-hue-BEESWAX_SERVER/hive-conf,
but Beeswax is not pointing to the MySQL metastore, and every restart of the
Hue service makes Cloudera Manager create a new configuration file.

Any suggestions on where to make the configuration changes?  Thanks in advance.

From,
Ramesh Babu,


stop-dfs.sh does not work

2013-07-09 Thread YouPeng Yang
Hi users.

I start my HDFS using start-dfs.sh, and the nodes start successfully.
However, stop-dfs.sh does not work when I want to stop HDFS.
It shows: no namenode to stop
          no datanode to stop.

I have to stop it with the command: kill -9 pid.


So I wonder why stop-dfs.sh no longer works.


Best regards


Re: stop-dfs.sh does not work

2013-07-09 Thread rozartharigopla
You can try the following:

sudo netstat -plten | grep java

This will give you all the Java processes which have a socket connection open.

You can easily figure out which process is which based on the port numbers you
mentioned in config files like core-site.xml, and kill the process.

Thanks & Regards,
Deepak Rosario Pancras Tharigopla.
Achiever/Responsibility/Arranger/Maximizer/Harmony

Sent from my iPhone

On Jul 10, 2013, at 12:30 AM, YouPeng Yang yypvsxf19870...@gmail.com wrote:

 Hi users.
 
 I start my HDFS using start-dfs.sh, and the nodes start successfully.
 However, stop-dfs.sh does not work when I want to stop HDFS.
 It shows: no namenode to stop
           no datanode to stop.

 I have to stop it with the command: kill -9 pid.


 So I wonder why stop-dfs.sh no longer works.
 
 
 Best regards