Re: New bee quick questions :-)

2008-04-21 Thread Luca

vikas wrote:

Hi,

I'm new to Hadoop and aiming to develop a good amount of code with it. I have
some quick questions; it would be highly appreciated if someone could answer
them.

I was able to run Hadoop in a Cygwin environment and run the examples both in
standalone mode and in a 2-node cluster.

1) How can I overcome the difficulty of giving a password for SSH logins
whenever DataNodes are started?



Create SSH keys (either user or host based) and pair the hosts with 
these keys. This is the first URL I found for SSH public key 
authentication: http://sial.org/howto/openssh/publickey-auth/
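
For instance, a minimal user-based sketch with OpenSSH (datanode1 is a 
placeholder hostname; adjust paths to your setup):

$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa           # key pair with empty passphrase
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys    # authorize logins to this node too
$ scp ~/.ssh/id_rsa.pub datanode1:~/
$ ssh datanode1 'mkdir -p ~/.ssh && cat ~/id_rsa.pub >> ~/.ssh/authorized_keys'
$ ssh datanode1 hostname                             # should no longer prompt for a password

Repeat the copy step for each DataNode, and keep ~/.ssh and authorized_keys 
readable only by their owner, otherwise sshd will ignore them.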



2) I've put a file of some 1.5 GB on my master node, where a DataNode is also
running. I want to see how load balancing can be done so that disk space
on the other DataNodes is utilized as well.



Not sure how to answer this one. HDFS has knowledge of three 
entities: the node, the rack, and the rest. In the default 
configuration, each block is replicated 3 times, one copy for each entity. If 
you don't have racks, you might want to fine-tune the replication of 
files through the HDFS shell.
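
For example (a sketch; the exact shell options may vary slightly between 
Hadoop releases, and /user/vikas/bigfile is just a placeholder path):

$ bin/hadoop dfsadmin -report                    # capacity and used space per DataNode
$ bin/hadoop dfs -setrep 3 /user/vikas/bigfile   # ask for 3 replicas of that file's blocks

With more than one replica, copies of the blocks necessarily land on other 
DataNodes, which spreads the disk usage across the cluster.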



3) How can I add a new DataNode without stopping Hadoop?


Just add it to the slaves file and run start-dfs.sh. Nodes that are already 
running won't be touched.
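
As a sketch (new-node is a placeholder hostname, and this assumes the same 
Hadoop installation and configuration is already present on the new machine):

$ echo new-node >> conf/slaves    # on the master, so the start/stop scripts know about it
$ bin/start-dfs.sh                # starts a DataNode on new-node, leaves running daemons alone

Alternatively, log into the new machine and run bin/hadoop-daemon.sh start 
datanode there directly; if you also want it to run tasks, start a 
tasktracker the same way.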




4) Let us suppose I want to shut down one DataNode for maintenance purposes.
Is there any way to inform Hadoop that this particular DataNode is
going down -- please make sure the data on it is replicated elsewhere?



Replication of blocks with a factor of at least 2 should do the job. In the 
general case, the default replication is 3. You can check the replication 
factor through the HDFS shell.
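
For example (a sketch, with placeholder paths):

$ bin/hadoop dfs -ls /user/vikas                  # the listing should include each file's replication factor
$ bin/hadoop fsck /user/vikas -files -blocks      # reports under-replicated or missing blocks
$ bin/hadoop dfs -setrep 3 /user/vikas/somefile   # raise the factor if a file has only one copy

Once fsck reports no under-replicated blocks, losing a single DataNode 
should not make any data unavailable.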



5) I was going through some videos on MapReduce and a few Yahoo tech talks.
In those, they say that a Hadoop cluster has multiple cores -- what
does this mean?



Are you talking about multi-core processors?


  5.1) Can I have multiple instances of the NameNode running in a cluster, apart
from the secondary NameNode?



Not sure about this, but as far as I know only one NameNode 
should be running.



6) If I go on to create huge files, will they be balanced among all the
DataNodes? Or do I need to change the file creation location in the
application?



Files are divided into blocks, and blocks are replicated. Huge files are 
simply composed of a larger set of blocks. In principle, you don't know 
where your blocks will end up, apart from the entities I mentioned 
before. And in principle, you shouldn't care where they end up, 
because Hadoop applications will take care of sending tasks where the 
data reside.
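
If you are curious anyway, a hedged example of checking where the blocks of 
a file ended up (the path is a placeholder):

$ bin/hadoop fsck /user/vikas/bigfile -files -blocks -locations   # lists each block and the DataNodes holding it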


Ciao,
Luca



Re: Lease expired on open file

2008-04-18 Thread Luca

dhruba Borthakur wrote:

The DFSClient has a thread that renews leases periodically for all files
that are being written to. I suspect that this thread is not getting a
chance to run because the gunzip program is eating all the CPU. You
might want to put in a Sleep() every few seconds while unzipping.

Thanks,
dhruba



Thanks Dhruba,
	with your suggestion and a small Sleep() every block (more or less), it 
worked perfectly. Good hint!


Ciao,
Luca


-Original Message-
From: Luca Telloli [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, April 16, 2008 9:43 AM

To: core-user@hadoop.apache.org
Subject: Lease expired on open file

Hello everyone,
I wrote a small application that directly gunzips files from a local
filesystem to an HDFS installation, writing to an FSDataOutputStream. 
However, while expanding a very big file, I got this exception:


org.apache.hadoop.ipc.RemoteException: 
org.apache.hadoop.dfs.LeaseExpiredException: No lease on 
/user/luca/testfile File is not open for writing. [Lease.  Holder: 44 46 
53 43 6c 69 65 6e 74 5f 2d 31 39 31 34 34 39 36 31 34 30, heldlocks: 0, 
pendingcreates: 1]


I wonder what the cause of this exception might be, and whether there's a way
to know the default lease duration for a file and possibly to prolong it.

Ciao,
Luca






Lease expired on open file

2008-04-16 Thread Luca Telloli

Hello everyone,
	I wrote a small application that directly gunzips files from a local 
filesystem to an HDFS installation, writing to an FSDataOutputStream. 
However, while expanding a very big file, I got this exception:


org.apache.hadoop.ipc.RemoteException: 
org.apache.hadoop.dfs.LeaseExpiredException: No lease on 
/user/luca/testfile File is not open for writing. [Lease.  Holder: 44 46 
53 43 6c 69 65 6e 74 5f 2d 31 39 31 34 34 39 36 31 34 30, heldlocks: 0, 
pendingcreates: 1]


I wonder what the cause of this exception might be, and whether there's a way 
to know the default lease duration for a file and possibly to prolong it.


Ciao,
Luca


Re: Using NFS without HDFS

2008-04-11 Thread Luca

slitz wrote:

I've read in the archive that it should be possible to use any distributed
filesystem since the data is available to all nodes, so it should be
possible to use NFS, right?
I've also read somewhere in the archive that this should be possible...



As far as I know, you can refer to any file on a mounted file system 
(visible from all compute nodes) using the prefix file:// before the 
full path, unless another prefix has been specified.
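
For example (a sketch reusing the paths from your message below; the exact 
name of the examples jar depends on your Hadoop release):

$ bin/hadoop jar hadoop-*-examples.jar grep file:///home/slitz/warehouse/input \
    file:///home/slitz/warehouse/output 'dfs[a-z.]+'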


Cheers,
Luca



slitz


On Fri, Apr 11, 2008 at 1:43 PM, Peeyush Bishnoi [EMAIL PROTECTED]
wrote:


Hello,

To execute a Hadoop Map-Reduce job, the input data should be on HDFS, not
on NFS.

Thanks

---
Peeyush



On Fri, 2008-04-11 at 12:40 +0100, slitz wrote:


Hello,
I'm trying to assemble a simple setup of 3 nodes using NFS as the
distributed filesystem.

Box A: 192.168.2.3, this box is both the NFS server and working as a
slave node
Box B: 192.168.2.30, this box is only the JobTracker
Box C: 192.168.2.31, this box is only a slave

Obviously all three nodes can access the NFS share, and the path to the
share is /home/slitz/warehouse on all three.

My hadoop-site.xml file was copied to all nodes and looks like this:

<configuration>

  <property>
    <name>fs.default.name</name>
    <value>local</value>
    <description>
      The name of the default file system. Either the literal string
      local or a host:port for NDFS.
    </description>
  </property>

  <property>
    <name>mapred.job.tracker</name>
    <value>192.168.2.30:9001</value>
    <description>
      The host and port that the MapReduce job tracker runs at. If local,
      then jobs are run in-process as a single map and reduce task.
    </description>
  </property>

  <property>
    <name>mapred.system.dir</name>
    <value>/home/slitz/warehouse/hadoop_service/system</value>
    <description>omgrotfcopterlol.</description>
  </property>

</configuration>


As one can see, I'm not using HDFS at all.
(Because all the free space I have is located on only one node, using
HDFS would be unnecessary overhead.)

I've copied the input folder from Hadoop to /home/slitz/warehouse/input.
When I try to run the example line

bin/hadoop jar hadoop-*-examples.jar grep /home/slitz/warehouse/input/
/home/slitz/warehouse/output 'dfs[a-z.]+'

the job starts and finishes okay, but at the end I get this error:

org.apache.hadoop.mapred.InvalidInputException: Input path doesn't exist :
/home/slitz/hadoop-0.15.3/grep-temp-141595661
at org.apache.hadoop.mapred.FileInputFormat.validateInput(FileInputFormat.java:154)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:508)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:753)
(...the error stack continues...)

I don't know why the input path being looked up is the local path
/home/slitz/hadoop(...) instead of /home/slitz/warehouse/(...)

Maybe something is missing in my hadoop-site.xml?



slitz







[HOD] Collecting MapReduce logs

2008-03-07 Thread Luca

Hello everyone,
	I wonder what the meaning of hodring.log-destination-uri is versus 
hodring.log-dir. I'd like to collect the MapReduce UI logs after a job has 
been run, and the only attribute seems to be hod.hadoop-ui-log-dir, in 
the hod section.


With that attribute specified, logs are all grabbed into that directory, 
producing a large number of HTML files. Is there a way to collect them, 
maybe as a .tar.gz, in a place somehow related to the user?


Additionally, how do administrators specify variables in these values? 
Which interpreter interprets them? For instance, variables specified in 
a bash fashion like $USER in the hodring or ringmaster sections work well (I 
guess they are interpreted by bash itself), but if specified in the hod 
section they are not: I tried with

[hod]
hadoop-ui-log-dir=/somedir/$USER
but any hod command fails displaying an error on that line.

Cheers,
Luca



[HOD] Example script

2008-03-05 Thread Luca

Hello everyone,
in http://hadoop.apache.org/core/docs/r0.16.0/hod.html there is no
mention of the scripting abilities of HOD, but with version 0.3 I used
to do something like:

hod -m N -a run jar foo.jar jar-parameters

now I see that in 0.4.0, -a refers to log collection and the option -z
refers to a script.

I wrote a two-line bash script with this code in it:

#!/bin/bash
hadoop --config /home/luca/hod-test jar
/mnt/scratch/hadoop/hadoop-0.16.0-examples.jar wordcount
file:///mnt/scratch/hadoop/test/part-0 test.hodscript.out

which is launched right after the allocation, but gets stuck on messages like:

08/03/05 12:43:14 INFO ipc.Client: Retrying connect to server:
svr3.com/10.78.36.204:56895. Already tried 10 time(s).
java.net.ConnectException: Connection refused

The script was launched with command:

hod -m 3 -z test-script.sh

Notably, if from the shell I run manually:
$ hod -o allocate . 4
$ hadoop --config . jar /mnt/scratch/hadoop/hadoop-0.16.0-examples.jar
wordcount file:///mnt/scratch/hadoop/test/part-0 test.hodscript.out
$ hod -o deallocate .

everything works well, so I'm pretty sure I'm making a mistake with the
script, but I didn't find any example to work from. I'm not sure what
changes with regard to ports between the two invocations.

Cheers,
Luca




[HOD] Port for MapReduce UI interface

2008-02-26 Thread Luca

Hello everyone,
although in my configuration I specified:

[hod]
xrs-port-range  = 1-11000
http-port-range = 1-11000


the MapReduce UI port is chosen outside this range. This will be a problem with 
my current firewall settings. Any hint on how to solve this? Did I 
forget some additional parameter?


Cheers,
Luca



[HOD] hdfs:///mapredsystem directory

2008-02-26 Thread Luca Telloli

Hello everyone,
	I have a problem with directory hdfs:///mapredsystem and I'm not sure 
if it's a bug or my fault.


Not sure if this influences what follows, but I have two users: one is 
hadoop, who has sudo privileges on all the nodes; the other one is 
luca, who has normal privileges.


I see that each submitted job creates a hdfs:///mapredsystem directory, 
created by (I guess) the hodring process. The problem is that it's not 
cleaned up at the end of the process; for instance, a use case would be:


- user hadoop allocates a cluster, the ringmaster is svr3, so a 
/mapredsystem/svr3 directory is created


- user hadoop deallocates the cluster, but that directory is not cleaned up

- user luca allocates a cluster, and the first node chosen as ringmaster 
is svr3, so hodring tries to write hdfs:///mapredsystem but it fails


- allocation succeeds, but there's no hodring running; looking at
0-jobtracker/logdir/hadoop.log under the temporary directory I can read:

2008-02-26 17:28:42,567 WARN org.apache.hadoop.mapred.JobTracker: Error 
starting tracker: org.apache.hadoop.ipc.RemoteException: 
org.apache.hadoop.fs.permission.AccessControlException: Permission 
denied: user=luca, access=WRITE, 
inode=mapredsystem:hadoop:supergroup:rwxr-xr-x


Am I doing anything wrong?
Cheers,
Luca


Re: [HOD] hdfs:///mapredsystem directory

2008-02-26 Thread Luca Telloli

Hi Mahadev,
	I'm not sure the workaround can solve the problem, because it appears 
that a subdirectory is created under that directory with the name of the 
hodring host. So if the next allocation made by a different user chooses 
the same host, the permission problem might show up again, unless of course 
each time the user who allocated the resources deletes that directory during 
deallocation, which would be a solution to this problem.
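
For reference, a rough sketch of the workaround Mahadev describes below, to 
be run as the HDFS superuser (hadoop in my case) on a release that already 
has the permission commands:

$ bin/hadoop dfs -mkdir /mapredsystem
$ bin/hadoop dfs -chmod 777 /mapredsystem    # world-writable, so any user's hodring can create its subdirectory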


I filed a bug: HADOOP-2899

Cheers,
Luca

Mahadev Konar wrote:

Hi Luca,
  This seems like a bug. The JobTracker process tries to create this
directory if it does not exist. And if you have two different users
running HOD clusters, they will both try to create this directory, and
since only one succeeds, with permissions enabled the directory is owned by the
user who created the cluster first. 

A workaround for this issue is to create hdfs:///mapredsystem manually and 
make it world-writable. 


Please open a bug regarding this issue.

Regards
Mahadev


-Original Message-
From: Luca Telloli [mailto:[EMAIL PROTECTED]
Sent: Tuesday, February 26, 2008 9:06 AM
To: core-user@hadoop.apache.org
Subject: [HOD] hdfs:///mapredsystem directory

Hello everyone,
I have a problem with directory hdfs:///mapredsystem and I'm not sure
if it's a bug or my fault.

Not sure if this influences what follows, but I have two users: one is
hadoop, who has sudo privileges on all the nodes; the other one is
luca, who has normal privileges.

I see that each submitted job creates a hdfs:///mapredsystem directory,
created by (I guess) the hodring process. The problem is that it's not
cleaned up at the end of the process; for instance, a use case would be:

- user hadoop allocates a cluster, the ringmaster is svr3, so a
/mapredsystem/svr3 directory is created

- user hadoop deallocates the cluster, but that directory is not cleaned up

- user luca allocates a cluster, and the first node chosen as ringmaster
is svr3, so hodring tries to write hdfs:///mapredsystem but it fails

- allocation succeeds, but there's no hodring running; looking at
0-jobtracker/logdir/hadoop.log under the temporary directory I can read:

2008-02-26 17:28:42,567 WARN org.apache.hadoop.mapred.JobTracker: Error
starting tracker: org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.fs.permission.AccessControlException: Permission
denied: user=luca, access=WRITE,
inode=mapredsystem:hadoop:supergroup:rwxr-xr-x

Am I doing anything wrong?
Cheers,
Luca






Re: Problems running a HOD test cluster

2008-02-25 Thread Luca

Allen Wittenauer wrote:


[2008-02-21 19:46:11,014] ERROR/40 torque:96 - qstat error: exit code:
153 | signal: False | core False
[2008-02-21 19:46:11,017] INFO/20 hadoop:451 - Ringmaster at : None.


I bet your ringmaster didn't come up.  Check which nodes were allocated
to your job via qstat -f.  Chances are good the first one is the ringmaster
node.  Check the torque logs, syslogs, and the hod log dir for hints as to
what happened.


So it was a ringmaster-related problem, and it's now solved. Now another 
problem: I try to run a Hadoop job but I get a timeout error from 
Hadoop while trying to get the input path. I guess this might be related to 
the fact that I'm using an external HDFS that is already running, and I'm not 
sure how to hook HOD up with it.


I configured

[gridservice-hdfs]
external= True
host= mane-of-my-server.com
pkgs= /mnt/scratch/grid/hadoop/current
fs_port = 10010
info_port   = 10007

but I'm not sure whether the actual fs_port is 10010. Does anybody 
know which attribute in the Hadoop configuration specifies this value? 
If it's not specified, is there a default/random value? Finally, 
is this a global attribute or a per-node attribute (like node A having 
some fs_port and node B having a different one)?
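
For what it's worth, a sketch of how I would check which port the running 
HDFS actually uses, assuming fs.default.name is the relevant attribute (the 
paths below are just examples):

$ grep -A 1 'fs.default.name' /mnt/scratch/grid/hadoop/current/conf/hadoop-site.xml
$ bin/hadoop dfsadmin -report    # only answers if the client configuration points at the right host:port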


Thanks in advance,
Luca


Problems running a HOD test cluster

2008-02-21 Thread Luca
.server.com
[2008-02-21 19:45:58,856] INFO/20 hadoop:447 - Hod Job successfully 
submitted. JobId : 207.server.com.
[2008-02-21 19:46:08,967] DEBUG/10 torque:87 - /usr/bin/qstat -f -1 
207.server.com
[2008-02-21 19:46:11,014] ERROR/40 torque:96 - qstat error: exit code: 
153 | signal: False | core False

[2008-02-21 19:46:11,017] INFO/20 hadoop:451 - Ringmaster at : None.
[2008-02-21 19:46:11,021] INFO/20 hadoop:530 - Cleaning up job id 
207.server.com, as cluster could not be allocated.

[2008-02-21 19:46:11,025] DEBUG/10 torque:131 - /usr/bin/qdel 207.server.com
[2008-02-21 19:46:13,079] CRITICAL/50 hod:253 - Cannot allocate cluster 
/mnt/scratch/grid/test

[2008-02-21 19:46:13,940] DEBUG/10 hod:391 - return code: 6


$ cat hod/conf/hodrc
[hod]
stream  = True
java-home   = /usr/java/jdk1.6.0_04
cluster = HOD
cluster-factor  = 1.8
xrs-port-range  = 1-11000
debug   = 4
allocate-wait-time  = 3600
temp-dir= /tmp/hod
log-dir = /mnt/scratch/grid/hod/logs

[ringmaster]
register= True
stream  = False
temp-dir= /tmp/hod
http-port-range = 1-11000
work-dirs   = /tmp/hod/1,/tmp/hod/2
xrs-port-range  = 1-11000
debug   = 4

[hodring]
stream  = False
temp-dir= /tmp/hod
register= True
java-home   = /usr/java/jdk1.6.0_04
http-port-range = 1-11000
xrs-port-range  = 1-11000
debug   = 4

[resource_manager]
queue   = hadoop
batch-home  = /usr
id  = torque
env-vars= HOD_PYTHON_HOME=/usr/bin/python2.5

[gridservice-mapred]
external= False
pkgs= /mnt/scratch/grid/hadoop/current
tracker_port= 10003
info_port   = 10008

[gridservice-hdfs]
external= True
pkgs= /mnt/scratch/grid/hadoop/current
fs_port = 10007
info_port   = 10009

Cheers,
Luca