Re: MapReduce job temp input files

2014-10-29 Thread Tang

Hi,
I also see this on the web UI:
Number of Blocks Pending Deletion: 1

How can I delete the invalidated blocks immediately, without restarting the cluster?

Thanks
Tang

On 2014/10/29 13:11:28, Tang shawndow...@gmail.com wrote:
hi,

We are running MapReduce jobs on Hadoop clusters. The job inputs come from logs
which are not in HDFS, so we first need to copy them into HDFS and, after a job
finishes, delete them.
Recently the cluster has become very unstable: the HDFS disks tend to fill up,
even though the valid files total only a few gigabytes, because many invalid
blocks remain on disk. After rebooting the cluster, they are deleted
automatically. It seems that restarting only the datanodes won't work; the
namenode won't send the delete-block commands to the datanodes.

Any ideas for this case?

Regards
Tang
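
For reference, a minimal sketch of this copy-run-delete staging workflow using
the Hadoop FileSystem Java API; the paths are illustrative placeholders and the
job submission itself is elided:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class StageRunDelete {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // 1. Stage the local log files into HDFS as job input
        //    (both paths are placeholders).
        Path localLogs = new Path("/var/log/myapp");
        Path jobInput  = new Path("/tmp/job-input");
        fs.copyFromLocalFile(localLogs, jobInput);

        // 2. Run the MapReduce job against jobInput here
        //    (submission omitted; it depends on the actual job).

        // 3. Remove the staged input once the job has finished;
        //    the second argument makes the delete recursive.
        fs.delete(jobInput, true);

        fs.close();
    }
}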

[HDFS] result order of getFileBlockLocations() and listFiles()?

2014-10-29 Thread Demai Ni
Hi guys,

I am trying to implement a simple program (experimental, not for production)
that invokes FileSystem.listFiles() to get a list of files under an HDFS
folder and then uses FileSystem.getFileBlockLocations() to get the replica
locations of each file's blocks.

Since it is a controlled environment, I can make sure the files are static,
and I don't need to worry about datanode crashes, failover, etc.

Assuming that within a small time window (say, one minute), hundreds to
thousands of clients invoke the same program to look up the same folder,
will the above two APIs guarantee the *same result in the same order* for
all clients?

To elaborate a bit more, say there is a folder called /dfs/dn/user/data
containing three files: file1, file2, and file3. If client1 gets:
listFiles(): file1, file2, file3
getFileBlockLocations(file1) -> datanode1, datanode3, datanode6

will all other clients get the same information (I think so), and in the
same order? Or do I have to sort on each client to guarantee the order?

Many thanks for your input.

Demai
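
A minimal Java sketch of the lookup described above, with an explicit
client-side sort so the result order does not depend on any server-side
ordering guarantee (the folder path comes from the example; the class name
is illustrative):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class ListBlockLocations {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // Collect the files under the folder (non-recursive).
        List<LocatedFileStatus> files = new ArrayList<>();
        RemoteIterator<LocatedFileStatus> it =
                fs.listFiles(new Path("/dfs/dn/user/data"), false);
        while (it.hasNext()) {
            files.add(it.next());
        }

        // Sort by path so every client sees the files in the same order,
        // whatever order the namenode returned them in.
        files.sort(Comparator.comparing(s -> s.getPath().toString()));

        for (LocatedFileStatus file : files) {
            BlockLocation[] blocks =
                    fs.getFileBlockLocations(file, 0, file.getLen());
            for (BlockLocation block : blocks) {
                // Replica hosts can likewise be sorted client-side if a
                // stable replica order is needed.
                String[] hosts = block.getHosts();
                Arrays.sort(hosts);
                System.out.println(file.getPath() + " -> "
                        + String.join(",", hosts));
            }
        }
    }
}

Sorting on the client is cheap and removes any dependence on whatever
iteration order the namenode happens to return.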


Re: run arbitrary job (non-MR) on YARN ?

2014-10-29 Thread Kevin
You can accomplish this by using the DistributedShell application that
comes with YARN.

If you copy all your archives to HDFS, then inside your shell script you
can copy those archives into your YARN container and execute whatever you
want, provided all the other system dependencies exist in the container
(correct Java version, Python, C++ libraries, etc.).

For example,

In myscript.sh I wrote the following:

#!/usr/bin/env bash
echo "This is my script running!"
echo "Present working directory:"
pwd
echo "Current directory listing (nothing exciting yet):"
ls
echo "Copying file from HDFS to container"
hadoop fs -get /path/to/some/data/on/hdfs .
echo "Current directory listing (the file should now be here):"
ls
echo "Cat ExecScript.sh (this is the script created by the DistributedShell application):"
cat ExecScript.sh

Run the DistributedShell application with the hadoop (or yarn) command:

hadoop org.apache.hadoop.yarn.applications.distributedshell.Client -jar
/usr/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell-2.3.0-cdh5.1.3.jar
-num_containers 1 -shell_script myscript.sh

If you have the YARN log aggregation property set, then you can pipe the
container's logs to your client console using the yarn command:

yarn logs -applicationId application_1414160538995_0035

(replace the application id with yours)

Here is a quick reference that should help get you going:
http://books.google.com/books?id=heoXAwAAQBAJ&pg=PA227&lpg=PA227&dq=hadoop+yarn+distributed+shell+application&source=bl&ots=psGuJYlY1Y&sig=khp3b3hgzsZLZWFfz7GOe2yhgyY&hl=en&sa=X&ei=0U5RVKzDLeTK8gGgoYGoDQ&ved=0CFcQ6AEwCA#v=onepage&q&f=false

Hopefully this helps,
Kevin

On Mon Oct 27 2014 at 2:21:18 AM Yang tedd...@gmail.com wrote:

 I happened to run into this interesting scenario:

 I had some Mahout seq2sparse jobs. Originally I ran them in parallel in
 distributed mode, but because the input files are so small, running them
 locally is actually much faster, so I turned them to local mode.

 But when I run 10 of these jobs in parallel, every one of them becomes
 very slow.

 Is there existing code that takes a desired shell script, and possibly
 some archive files (which could contain the jar file, or a C++-generated
 executable)? I understand that I could use the YARN API to code such a
 thing, but it would be nice if I could just take it and run it from the
 shell.

 Thanks
 Yang



Fwd: problems with Hadoop installation

2014-10-29 Thread David Novogrodsky
All,

I am new to Hadoop so any help would be appreciated.

I have a question for the mailing list regarding Hadoop.  I have installed
the most recent stable version (2.4.1) on a virtual machine running CentOS
7.  I have tried to run this command:
%Hadoop -fs ls
but without success.

The question is, what does Hadoop consider a valid JAVA_HOME directory?
And where should the JAVA_HOME variable be defined?  I installed Java
using the package manager yum; I installed the most recent version,
detailed below.

This is in my .bashrc file:
# The java implementation to use.
export JAVA_HOME=/usr/lib/jvm/jre-1.7.0-openjdk.x86_64


[david@localhost ~]$ hadoop fs -ls
/usr/local/hadoop/bin/hadoop: line 133:
/usr/lib/jvm/jre-1.7.0-openjdk.x86_64/bin/java: No such file or directory


Then I tried this value for JAVA_HOME in my .bashrc file:
/usr/bin/java
[david@localhost ~]$ which java
/usr/bin/java
[david@localhost ~]$ java -version
java version "1.7.0_71"
OpenJDK Runtime Environment (rhel-2.5.3.1.el7_0-x86_64 u71-b14)
OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)

Here is the result:
[david@localhost ~]$ hadoop fs -ls
/usr/local/hadoop/bin/hadoop: line 133: /usr/bin/java/bin/java: Not a
directory
/usr/local/hadoop/bin/hadoop: line 133: exec: /usr/bin/java/bin/java:
cannot execute: Not a directory

David Novogrodsky


Re: problems with Hadoop installation

2014-10-29 Thread Alexander Pivovarov
Are RHEL7-based OSes supported?


On Wed, Oct 29, 2014 at 3:59 PM, David Novogrodsky 
david.novogrod...@gmail.com wrote:

 [quoted text trimmed]





Re: problems with Hadoop installation

2014-10-29 Thread Bhooshan Mogal
Hi David,

JAVA_HOME should point to the Java installation directory. Typically, this
directory will contain a subdirectory called 'bin'; Hadoop tries to find
the java command at $JAVA_HOME/bin/java.

It is likely that /usr/bin/java is a symlink to some other file. If you do
an ls -l /usr/bin/java, you should be able to see where that symlink
points. If it points to a path of the form base_dir/bin/java, then
base_dir should be the value of JAVA_HOME.
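
One way to see which installation the java on your PATH actually belongs
to is to ask the JVM itself: the java.home system property points at the
directory of the running JRE, so JAVA_HOME would typically be that path
(or its parent, for a full JDK). A trivial sketch:

public class ShowJavaHome {
    public static void main(String[] args) {
        // Prints the home directory of the JRE running this program,
        // e.g. /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.71.x86_64/jre
        // (the example path is illustrative).
        System.out.println(System.getProperty("java.home"));
    }
}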

HTH,
Bhooshan

On Wed, Oct 29, 2014 at 3:59 PM, David Novogrodsky 
david.novogrod...@gmail.com wrote:

 [quoted text trimmed]





RE: problems with Hadoop installation

2014-10-29 Thread Henry Hung
Try adding "/" at the end of hadoop fs -ls, so it becomes:
hadoop fs -ls /

From: David Novogrodsky [mailto:david.novogrod...@gmail.com]
Sent: Thursday, October 30, 2014 7:00 AM
To: user@hadoop.apache.org
Subject: Fwd: problems with Hadoop installation

[quoted text trimmed]


