failed or killed...

2010-04-23 Thread Pierre ANCELOT
Just wondering about the exact difference between a task declared "failed"
and a task declared "killed".
Because of a typo, we had a node assigned WAY more maps than it could ever
handle; it ran out of memory and became almost totally unresponsive.
Some tasks failed, some were killed, and I've been wondering about data
integrity - whether all tasks would end up getting done on different nodes
anyway.
Also, is there a way to manually blacklist a node in the middle of a job?

Thank you.

Pierre ANCELOT.

-- 
http://www.neko-consulting.com
Ego sum quis ego servo
"Je suis ce que je protège"
"I am what I protect"


Re: Try to mount HDFS

2010-04-23 Thread Brian Bockelman
Hm, ok, now you have me stumped.

One last hunch - can you include the port information, but also switch to port 
9000?

Additionally, can you do the following:

1) Look in /var/log/messages, copy out the hdfs/fuse-related messages, and post 
them
2) Using the hadoop clients, run:
hadoop fs -ls /

Brian

On Apr 23, 2010, at 12:33 AM, Christian Baun wrote:

> Hi,
> 
> When adding the port information inside core-site.xml, the problem remains:
> 
>   <property>
>     <name>fs.default.name</name>
>     <value>hdfs://ec2-75-101-210-65.compute-1.amazonaws.com:8020</value>
>     <final>true</final>
>   </property>
> 
> # ./fuse_dfs_wrapper.sh dfs://ec2-75-101-210-65.compute-1.amazonaws.com:8020 
> /mnt/hdfs/ 
> port=8020,server=ec2-75-101-210-65.compute-1.amazonaws.com
> fuse-dfs didn't recognize /mnt/hdfs/,-2
> 
> # ls /mnt/hdfs
> ls: cannot access /mnt/hdfs/®1: No such file or directory
> 
> Best Regards,
>   Christian
> 
> 
> Am Freitag, 23. April 2010 schrieb Christian Baun:
>> Hi Brian,
>> 
>> this is inside my core-site.xml 
>> 
>> <configuration>
>>   <property>
>>     <name>fs.default.name</name>
>>     <value>hdfs://ec2-75-101-210-65.compute-1.amazonaws.com/</value>
>>     <final>true</final>
>>   </property>
>>   <property>
>>     <name>hadoop.tmp.dir</name>
>>     <value>/mnt</value>
>>     <description>A base for other temporary directories.</description>
>>   </property>
>> </configuration>
>> 
>> Do I need to give the port here? 
>> 
>> this is inside my hdfs-site.xml
>> 
>> <configuration>
>>   <property>
>>     <name>dfs.name.dir</name>
>>     <value>${hadoop.tmp.dir}/dfs/name</value>
>>     <final>true</final>
>>   </property>
>>   <property>
>>     <name>dfs.data.dir</name>
>>     <value>${hadoop.tmp.dir}/dfs/data</value>
>>   </property>
>>   <property>
>>     <name>fs.checkpoint.dir</name>
>>     <value>${hadoop.tmp.dir}/dfs/namesecondary</value>
>>     <final>true</final>
>>     <final>true</final>
>>   </property>
>> </configuration>
>> 
>> These directories all exist:
>> 
>> # ls -l /mnt/dfs/
>> total 12
>> drwxr-xr-x 2 hadoop hadoop 4096 2010-04-23 05:08 data
>> drwxr-xr-x 4 hadoop hadoop 4096 2010-04-23 05:17 name
>> drwxr-xr-x 2 hadoop hadoop 4096 2010-04-23 05:08 namesecondary
>> 
>> I don't have the config file hadoop-site.xml in /etc/...
>> In the source directory of hadoop I have a hadoop-site.xml but with this 
>> information
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> Best Regards,
>>   Christian 
>> 
>> 
>> 
>> Am Freitag, 23. April 2010 schrieb Brian Bockelman:
>>> Hey Christian,
>>> 
>>> I've run into this before.
>>> 
>>> Make sure that the hostname/port you give to fuse is EXACTLY the same as 
>>> listed in hadoop-site.xml.
>>> 
>>> If these aren't the same text string (including the ":8020"), then you get 
>>> those sort of issues.
>>> 
>>> Brian
>>> 
>>> On Apr 22, 2010, at 5:00 AM, Christian Baun wrote:
>>> 
 Dear All,
 
 I want to test HDFS inside Amazon EC2.
 
 Two Ubuntu instances are running inside EC2. 
 One server is namenode and jobtracker. The other server is the datanode.
 Cloudera (hadoop-0.20) is installed and running.
 
 Now, I want to mount HDFS.
 I tried to install contrib/fuse-dfs as described here:
 http://wiki.apache.org/hadoop/MountableHDFS
 
 The compilation worked via:
 
 # ant compile-c++-libhdfs -Dlibhdfs=1
 # ant package -Djava5.home=/usr/lib/jvm/java-1.5.0-sun-1.5.0.06/ 
 -Dforrest.home=/home/ubuntu/apache-forrest-0.8/
 # ant compile-contrib -Dlibhdfs=1 -Dfusedfs=1
 
 But now, when I try to mount the filesystem:
 
 # ./fuse_dfs_wrapper.sh 
 dfs://ec2-75-101-210-65.compute-1.amazonaws.com:8020 /mnt/hdfs/ -d
 port=8020,server=ec2-75-101-210-65.compute-1.amazonaws.com
 fuse-dfs didn't recognize /mnt/hdfs/,-2
 fuse-dfs ignoring option -d
 FUSE library version: 2.8.1
 nullpath_ok: 0
 unique: 1, opcode: INIT (26), nodeid: 0, insize: 56
 INIT: 7.13
 flags=0x007b
 max_readahead=0x0002
  INIT: 7.12
  flags=0x0011
  max_readahead=0x0002
  max_write=0x0002
  unique: 1, success, outsize: 40
 
 
 # ./fuse_dfs_wrapper.sh 
 dfs://ec2-75-101-210-65.compute-1.amazonaws.com:8020 /mnt/hdfs/
 port=8020,server=ec2-75-101-210-65.compute-1.amazonaws.com
 fuse-dfs didn't recognize /mnt/hdfs/,-2
 
 # ls /mnt/hdfs/
 ls: reading directory /mnt/hdfs/: Input/output error
 # ls /mnt/hdfs/
 ls: cannot access /mnt/hdfs/o¢: No such file or directory
 o???
 # ls /mnt/hdfs/
 ls: reading directory /mnt/hdfs/: Input/output error
 # ls /mnt/hdfs/
 ls: cannot access /mnt/hdfs/`á›Óÿ: No such file or directory
 `?
 # ls /mnt/hdfs/
 ls: reading directory /mnt/hdfs/: Input/output error
 ...
 
 
 What can I do at this point?
 
 Thanks in advance
Christian
>>> 
>>> 
>> 
>> 
> 





data node stops on slave

2010-04-23 Thread Muhammad Mudassar
Hi

I am following the tutorial "Running Hadoop on Ubuntu Linux (Multi-Node Cluster)"
http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Multi-Node_Cluster)
to configure a 2-node cluster, but I am facing a problem: the data node on the
slave machine goes down after some time. I am sending the log file of the
datanode on the slave machine and the log file of the namenode on the master
machine. Kindly help me solve the issue.

Log file of data node on slave machine:

2010-04-23 17:37:17,690 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = hadoop-desktop/127.0.1.1
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build =
https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r
911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
2010-04-23 17:37:19,115 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: master/10.3.31.221:54310. Already tried 0 time(s).
2010-04-23 17:37:25,303 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Registered
FSDatasetStatusMBean
2010-04-23 17:37:25,305 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Opened info server at 50010
2010-04-23 17:37:25,307 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Balancing bandwith is
1048576 bytes/s
2010-04-23 17:37:30,777 INFO org.mortbay.log: Logging to
org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
org.mortbay.log.Slf4jLog
2010-04-23 17:37:30,833 INFO org.apache.hadoop.http.HttpServer: Port
returned by webServer.getConnectors()[0].getLocalPort() before open() is -1.
Opening the listener on 50075
2010-04-23 17:37:30,833 INFO org.apache.hadoop.http.HttpServer:
listener.getLocalPort() returned 50075
webServer.getConnectors()[0].getLocalPort() returned 50075
2010-04-23 17:37:30,833 INFO org.apache.hadoop.http.HttpServer: Jetty bound
to port 50075
2010-04-23 17:37:30,833 INFO org.mortbay.log: jetty-6.1.14
2010-04-23 17:37:31,242 INFO org.mortbay.log: Started
SelectChannelConnector@0.0.0.0:50075
2010-04-23 17:37:31,279 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=DataNode, sessionId=null
2010-04-23 17:37:36,608 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
Initializing RPC Metrics with hostName=DataNode, port=50020
2010-04-23 17:37:36,610 INFO org.apache.hadoop.ipc.Server: IPC Server
Responder: starting
2010-04-23 17:37:36,610 INFO org.apache.hadoop.ipc.Server: IPC Server
listener on 50020: starting
2010-04-23 17:37:36,610 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 0 on 50020: starting
2010-04-23 17:37:36,610 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 1 on 50020: starting
2010-04-23 17:37:36,611 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 2 on 50020: starting
2010-04-23 17:37:36,611 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: dnRegistration =
DatanodeRegistration(hadoop-desktop:50010,
storageID=DS-463609775-127.0.1.1-50010-1271833984369, infoPort=50075,
ipcPort=50020)
2010-04-23 17:37:36,639 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
10.3.31.220:50010, storageID=DS-463609775-127.0.1.1-50010-1271833984369,
infoPort=50075, ipcPort=50020)In DataNode.run, data =
FSDataset{dirpath='/home/hadoop/Desktop/dfs/datahadoop/dfs/data/current'}
2010-04-23 17:37:36,639 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: using BLOCKREPORT_INTERVAL
of 360msec Initial delay: 0msec
2010-04-23 17:37:36,653 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 17 blocks
got processed in 6 msecs
2010-04-23 17:37:36,665 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Starting Periodic block
scanner.
2010-04-23 17:37:39,641 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeCommand action:
DNA_REGISTER
2010-04-23 17:37:42,645 WARN
org.apache.hadoop.hdfs.server.datanode.DataNode: DataNode is shutting down:
org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.hdfs.protocol.UnregisteredDatanodeException: Data node
10.3.31.220:50010 is attempting to report storage ID
DS-463609775-127.0.1.1-50010-1271833984369. Node 10.3.31.221:50010 is
expected to serve this storage.
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDatanode(FSNamesystem.java:3920)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.processReport(FSNamesystem.java:2891)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.blockReport(NameNode.java:715)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.

Hadoop Log Collection

2010-04-23 Thread Patrick Datko
Hey everyone,

I have been working with Hadoop for a few weeks to build up a cluster with
HDFS. I looked at several monitoring tools to observe my cluster and found a
good solution with ganglia+nagios. To complete the monitoring part of the
cluster, I am looking for a log collection tool which stores the log files of
the nodes centrally. I have tested Chukwa and Facebook's Scribe, but neither
is really meant for simply storing log files; in my opinion they are too
heavyweight for such a job.

So I have been thinking about writing my own log collector. I don't want
anything special. My idea is to build a daemon which can be installed on
every node in the cluster, plus an xml file which describes which log files
have to be collected. The daemon would collect all needed log files at a
configured time interval and store them in HDFS using the Java API.
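A minimal sketch of such a daemon, assuming the stock HDFS Java API, with
purely hypothetical paths, namenode URI and interval (error handling and the
xml parsing are omitted):

import java.net.InetAddress;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch of a periodic log collector. The namenode URI, log paths and
// interval are placeholders; a real daemon would read them from the
// per-node xml file described above.
public class SimpleLogCollector {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem hdfs = FileSystem.get(
        URI.create("hdfs://namenode.example.com:9000/"), conf);
    String host = InetAddress.getLocalHost().getHostName();
    String[] logs = { "/var/log/syslog", "/var/log/hadoop/hadoop-datanode.log" };
    while (true) {
      for (String log : logs) {
        Path src = new Path(log);
        // One target file per node and collection run.
        Path dst = new Path("/logs/" + host + "/"
            + System.currentTimeMillis() + "-" + src.getName());
        // Copy the local file into HDFS, keeping the local copy and
        // overwriting an existing target.
        hdfs.copyFromLocalFile(false, true, src, dst);
      }
      Thread.sleep(60L * 60L * 1000L);   // collection interval: one hour
    }
  }
}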

This was just an idea for a simple log collector, and it would be cool if you
could give me some opinions on it, or tell me whether such a log collector
already exists.

Kind regards,
Patrick 



Re: Hadoop performance - xfs and ext4

2010-04-23 Thread stephen mulcahy

Andrew Klochkov wrote:

Hi,

Just curious - did you try ext3? Can it be faster than ext4? The Hadoop wiki
suggests ext3 as it's the one mostly used for hadoop clusters:

http://wiki.apache.org/hadoop/DiskSetup


For completeness, I rebuilt one more time with ext3

mkfs.ext3 -T largefile4 DEV
(mounted with noatime)
gives me a cluster which runs TeraSort in about 22.5 minutes

So ext4 looks like the winner, from a performance perspective, at least 
for running the TeraSort on my cluster with its specific configuration.


-stephen

--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.iehttp://webstar.deri.iehttp://sindice.com


Re: Hadoop performance - xfs and ext4

2010-04-23 Thread stephen mulcahy

Steve Loughran wrote:
That's really interesting. Do you want to update the bits of the Hadoop 
wiki that talk about filesystems?


I can if people think that would be useful.

I'm not sure if my results are necessarily going to reflect what will 
happen on other people's systems and configs though - what's the best way 
of addressing that?


Do my apache credentials work for the wiki or do I need to explicitly 
have a new account for the hadoop wiki?


-stephen

--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.iehttp://webstar.deri.iehttp://sindice.com


Re: Try to mount HDFS

2010-04-23 Thread Christian Baun
Brian,

You got it!!! :-)
It works (partly)!

I switched to port 9000. core-site.xml now includes:


<property>
  <name>fs.default.name</name>
  <value>hdfs://ec2-75-101-210-65.compute-1.amazonaws.com:9000</value>
  <final>true</final>
</property>


$ hadoop fs -ls /
Found 1 items
drwxr-xr-x   - hadoop supergroup  0 2010-04-23 05:18 /mnt

$ hadoop fs -ls /mnt/
Found 1 items
drwxr-xr-x   - hadoop supergroup  0 2010-04-23 13:00 /mnt/mapred

# ./fuse_dfs_wrapper.sh dfs://ec2-75-101-210-65.compute-1.amazonaws.com:9000 
/mnt/hdfs/
port=9000,server=ec2-75-101-210-65.compute-1.amazonaws.com
fuse-dfs didn't recognize /mnt/hdfs/,-2

This tiny error message remains.

# mount | grep fuse
fuse_dfs on /hdfs type fuse.fuse_dfs 
(rw,nosuid,nodev,allow_other,default_permissions)

# ls /mnt/hdfs/
mnt
# mkdir /mnt/hdfs/testverzeichnis
# touch /mnt/hdfs/testdatei
# ls -l /mnt/hdfs/
total 8
drwxr-xr-x 3 hadoop 99 4096 2010-04-23 05:18 mnt
-rw-r--r-- 1 root   990 2010-04-23 13:07 testdatei
drwxr-xr-x 2 root   99 4096 2010-04-23 13:05 testverzeichnis

In /var/log/messages there was no information about hdfs/fuse.

Only in /var/log/user.log were these lines:
Apr 23 13:04:34 ip-10-242-231-63 fuse_dfs: mounting 
dfs://ec2-75-101-210-65.compute-1.amazonaws.com:9000/

mkdir and touch work, but I cannot write data into files(?!). They are all 
read only.
When I try to copy files "from outside" into HDFS, only an empty file is 
created and these error messages appear in user.log:

Apr 23 13:18:46 ip-10-242-231-63 fuse_dfs: ERROR: fuse problem - could not 
write all the bytes for /testordner/testfile -1!=4096fuse_impls_write.c:60
Apr 23 13:18:46 ip-10-242-231-63 fuse_dfs: WARN: fuse problem - could not write 
all the bytes for /testordner/testfile -1!=4096fuse_impls_write.c:64
Apr 23 13:18:46 ip-10-242-231-63 fuse_dfs: ERROR: fuse problem - could not 
write all the bytes for /testordner/testfile -1!=4096fuse_impls_write.c:60
Apr 23 13:18:46 ip-10-242-231-63 fuse_dfs: WARN: fuse problem - could not write 
all the bytes for /testordner/testfile -1!=4096fuse_impls_write.c:64
Apr 23 13:18:46 ip-10-242-231-63 fuse_dfs: ERROR: dfs problem - could not close 
file_handle(23486496) for /testordner/testfile fuse_impls_release.c:58

Weird...

But this is a big step forward.

Thanks a lot!!!

Best Regards
Christian 


Am Freitag, 23. April 2010 schrieb Brian Bockelman:
> Hm, ok, now you have me stumped.
> 
> One last hunch - can you include the port information, but also switch to 
> port 9000?
> 
> Additionally, can you do the following:
> 
> 1) In /var/log/messages and copy out the hdfs/fuse-related messages and post 
> them
> 2) Using the hadoop clients do,
> hadoop fs -ls /
> 
> Brian
> 
> On Apr 23, 2010, at 12:33 AM, Christian Baun wrote:
> 
> > Hi,
> > 
> > When adding the port information inside core-site.xml, the problem remains:
> > 
> > 
> > fs.default.name
> > 
> > hdfs://ec2-75-101-210-65.compute-1.amazonaws.com:8020
> > true
> > 
> > 
> > # ./fuse_dfs_wrapper.sh 
> > dfs://ec2-75-101-210-65.compute-1.amazonaws.com:8020 /mnt/hdfs/ 
> > port=8020,server=ec2-75-101-210-65.compute-1.amazonaws.com
> > fuse-dfs didn't recognize /mnt/hdfs/,-2
> > 
> > # ls /mnt/hdfs
> > ls: cannot access /mnt/hdfs/®1: No such file or directory
> > 
> > Best Regards,
> >   Christian
> > 
> > 
> > Am Freitag, 23. April 2010 schrieb Christian Baun:
> >> Hi Brian,
> >> 
> >> this is inside my core-site.xml 
> >> 
> >> 
> >>
> >>fs.default.name
> >>hdfs://ec2-75-101-210-65.compute-1.amazonaws.com/
> >>true
> >>
> >>
> >>hadoop.tmp.dir
> >>/mnt
> >>A base for other temporary 
> >> directories.
> >>
> >> 
> >> 
> >> Do I need to give the port here? 
> >> 
> >> this is inside my hdfs-site.xml
> >> 
> >> 
> >>
> >>dfs.name.dir
> >>${hadoop.tmp.dir}/dfs/name
> >>true
> >>
> >>
> >>dfs.data.dir
> >>${hadoop.tmp.dir}/dfs/data
> >> 
> >> 
> >>
> >>fs.checkpoint.dir
> >>${hadoop.tmp.dir}/dfs/namesecondary
> >>true
> >>true
> >>
> >> 
> >> 
> >> These directories do all exist
> >> 
> >> # ls -l /mnt/dfs/
> >> total 12
> >> drwxr-xr-x 2 hadoop hadoop 4096 2010-04-23 05:08 data
> >> drwxr-xr-x 4 hadoop hadoop 4096 2010-04-23 05:17 name
> >> drwxr-xr-x 2 hadoop hadoop 4096 2010-04-23 05:08 namesecondary
> >> 
> >> I don't have the config file hadoop-site.xml in /etc/...
> >> In the source directory of hadoop I have a hadoop-site.xml but with this 
> >> information
> >> 
> >> 
> >> 
> >> 
> >> 
> >> 
> >> 
> >> Best Regards,
> >>   Christian 
> >> 
> >> 
> >> 
> >> Am Freitag, 23. April 2010 schrieb Brian Bockelman:
> >>> Hey Christian,
> >>> 
> >>> I've run into this before.
> >>> 
> >>> Make sure that the hostname/port you give to fuse is EXACTLY the same as 
> >>> listed in hadoop-site.xml.
> 

Re: Try to mount HDFS

2010-04-23 Thread Brian Bockelman
Hey Christian,

Glad to hear things are beginning to click.  Can you upload the things you 
learned into the wiki?  In our internal user docs, we have big bold letters 
saying to watch out for this issue.

As far as your writing issues - can you write using "hadoop fs -put"?  The nice 
thing about the built-in utilities is that they will give you better terminal 
feedback.

Alternately, I find myself mounting things in debug mode to see the Hadoop 
issues printed out to the terminal.

Brian

On Apr 23, 2010, at 8:30 AM, Christian Baun wrote:

> Brian,
> 
> You got it!!! :-)
> It works (partly)!
> 
> i switched to Port 9000. core-site.xml includes now:
> 
>   
>   fs.default.name
>   
> hdfs://ec2-75-101-210-65.compute-1.amazonaws.com:9000
>   true
>   
> 
> 
> $ hadoop fs -ls /
> Found 1 items
> drwxr-xr-x   - hadoop supergroup  0 2010-04-23 05:18 /mnt
> 
> $ hadoop fs -ls /mnt/
> Found 1 items
> drwxr-xr-x   - hadoop supergroup  0 2010-04-23 13:00 /mnt/mapred
> 
> # ./fuse_dfs_wrapper.sh dfs://ec2-75-101-210-65.compute-1.amazonaws.com:9000 
> /mnt/hdfs/
> port=9000,server=ec2-75-101-210-65.compute-1.amazonaws.com
> fuse-dfs didn't recognize /mnt/hdfs/,-2
> 
> This tiny error message remains.
> 
> # mount | grep fuse
> fuse_dfs on /hdfs type fuse.fuse_dfs 
> (rw,nosuid,nodev,allow_other,default_permissions)
> 
> # ls /mnt/hdfs/
> mnt
> # mkdir /mnt/hdfs/testverzeichnis
> # touch /mnt/hdfs/testdatei
> # ls -l /mnt/hdfs/
> total 8
> drwxr-xr-x 3 hadoop 99 4096 2010-04-23 05:18 mnt
> -rw-r--r-- 1 root   990 2010-04-23 13:07 testdatei
> drwxr-xr-x 2 root   99 4096 2010-04-23 13:05 testverzeichnis
> 
> In /var/log/messages there was no information about hdfs/fuse.
> 
> Only in /var/log/user.log were these lines:
> Apr 23 13:04:34 ip-10-242-231-63 fuse_dfs: mounting 
> dfs://ec2-75-101-210-65.compute-1.amazonaws.com:9000/
> 
> mkdir and touch works. But I cannot write data into files(?!). They are all 
> read only.
> When I try to copy files "from outside" into the HDFS, only an empty file is 
> created and in user.log appear these error messages:
> 
> Apr 23 13:18:46 ip-10-242-231-63 fuse_dfs: ERROR: fuse problem - could not 
> write all the bytes for /testordner/testfile -1!=4096fuse_impls_write.c:60
> Apr 23 13:18:46 ip-10-242-231-63 fuse_dfs: WARN: fuse problem - could not 
> write all the bytes for /testordner/testfile -1!=4096fuse_impls_write.c:64
> Apr 23 13:18:46 ip-10-242-231-63 fuse_dfs: ERROR: fuse problem - could not 
> write all the bytes for /testordner/testfile -1!=4096fuse_impls_write.c:60
> Apr 23 13:18:46 ip-10-242-231-63 fuse_dfs: WARN: fuse problem - could not 
> write all the bytes for /testordner/testfile -1!=4096fuse_impls_write.c:64
> Apr 23 13:18:46 ip-10-242-231-63 fuse_dfs: ERROR: dfs problem - could not 
> close file_handle(23486496) for /testordner/testfile fuse_impls_release.c:58
> 
> Weird...
> 
> But this is a big step forward.
> 
> Thanks a lot!!!
> 
> Best Regards
>Christian 
> 
> 
> Am Freitag, 23. April 2010 schrieb Brian Bockelman:
>> Hm, ok, now you have me stumped.
>> 
>> One last hunch - can you include the port information, but also switch to 
>> port 9000?
>> 
>> Additionally, can you do the following:
>> 
>> 1) In /var/log/messages and copy out the hdfs/fuse-related messages and post 
>> them
>> 2) Using the hadoop clients do,
>> hadoop fs -ls /
>> 
>> Brian
>> 
>> On Apr 23, 2010, at 12:33 AM, Christian Baun wrote:
>> 
>>> Hi,
>>> 
>>> When adding the port information inside core-site.xml, the problem remains:
>>> 
>>> 
>>> fs.default.name
>>> 
>>> hdfs://ec2-75-101-210-65.compute-1.amazonaws.com:8020
>>> true
>>> 
>>> 
>>> # ./fuse_dfs_wrapper.sh 
>>> dfs://ec2-75-101-210-65.compute-1.amazonaws.com:8020 /mnt/hdfs/ 
>>> port=8020,server=ec2-75-101-210-65.compute-1.amazonaws.com
>>> fuse-dfs didn't recognize /mnt/hdfs/,-2
>>> 
>>> # ls /mnt/hdfs
>>> ls: cannot access /mnt/hdfs/®1: No such file or directory
>>> 
>>> Best Regards,
>>>  Christian
>>> 
>>> 
>>> Am Freitag, 23. April 2010 schrieb Christian Baun:
 Hi Brian,
 
 this is inside my core-site.xml 
 
 

fs.default.name
hdfs://ec2-75-101-210-65.compute-1.amazonaws.com/
true


hadoop.tmp.dir
/mnt
A base for other temporary 
 directories.

 
 
 Do I need to give the port here? 
 
 this is inside my hdfs-site.xml
 
 

dfs.name.dir
${hadoop.tmp.dir}/dfs/name
true


dfs.data.dir
${hadoop.tmp.dir}/dfs/data
 
 

fs.checkpoint.dir
${hadoop.tmp.dir}/dfs/namesecondary
true
true

 
 
 These directories do all exist
 
 # ls -l /mnt/d

Host name problem in Hadoop GUI

2010-04-23 Thread David Rosenstrauch

Having an issue with host names on my new Hadoop cluster.

The cluster is currently 1 name node and 2 data nodes, running in a 
cloud vendor data center.  All is well with general operations of the 
cluster - i.e., name node and data nodes can talk just fine, I can 
read/write to/from the HDFS, yada yada.


The problem is when I try to view the DFS through the web GUI.  The 
http://<namenode>:50070/dfsnodelist.jsp page lists the data nodes, but 
the links don't work properly.


I think the reason is because I don't have dns entries set up for the 
slave machines.  And their /etc/hosts file is somewhat sketchy/sparse, i.e.:


[r...@hddata01 conf]# cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1   004.admin.lax1 004 localhost.localdomain 
localhost hddata01

::1 localhost6.localdomain6 localhost6

(Given the above hosts file, we would internally think of the node as 
being named "hddata01".  But again, there's no DNS entry for that.)


So the data nodes all appear (incorrectly) in the HDFS node list page as 
"004", with an erroneous link to 
http://004.admin.lax1:50075/browseDirectory.jsp?namenodeInfoPort=50070&dir=%2F 
- which is obviously a broken link.


Is there any way to fix this issue without setting up DNS entries for 
the data nodes?  e.g., is there any way to tell Hadoop to only use IP 
addresses in the GUI?


I also did some googling on this issue today, and saw mention of a 
"slave.host.name" configuration setting that sounded like it might solve 
the problem.  But it doesn't appear to be well documented, and it wasn't 
clear that this was the solution.
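A hedged sketch of that approach, assuming slave.host.name behaves as its name
suggests: set it per slave (e.g. in that node's hdfs-site.xml) so the daemon
reports an address of your choosing instead of whatever it resolves locally;
the IP below is purely a placeholder:

<property>
  <name>slave.host.name</name>
  <!-- placeholder: this data node's own IP address -->
  <value>10.0.0.11</value>
</property>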


Any suggestions much appreciated!

TIA,

DR


Re: Hadoop performance - xfs and ext4

2010-04-23 Thread Todd Lipcon
Hi Stephen,

Can you try mounting ext4 with the nodelalloc option? I've seen the same
improvement due to delayed allocation, but I've been a little nervous about that
option (especially in the NN where we currently follow what the kernel
people call an antipattern for image rotation).

-Todd

On Fri, Apr 23, 2010 at 6:12 AM, stephen mulcahy
wrote:

> Andrew Klochkov wrote:
>
>> Hi,
>>
>> Just curious - did you try ext3? Can it be faster then ext4? Hadoop wiki
>> suggests ext3 as it's used mostly for hadoop clusters:
>>
>> http://wiki.apache.org/hadoop/DiskSetup
>>
>
> For completeness, I rebuilt one more time with ext3
>
> mkfs.ext3 -T largefile4 DEV
> (mounted with noatime)
> gives me a cluster which runs TeraSort in about 22.5 minutes
>
> So ext4 looks like the winner, from a performance perspective, at least for
> running the TeraSort on my cluster with it's specific configuration.
>
> -stephen
>
> --
> Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
> NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
> http://di2.deri.iehttp://webstar.deri.iehttp://sindice.com
>



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Try to mount HDFS

2010-04-23 Thread Christian Baun
Hi Brian, 

I found the error, and the filesystem works now!

The error logs of "hadoop fs -put" helped a lot.

I tried to copy a small file:

#  ls -l /tmp/neue_datei.txt 
-rw-r--r-- 1 root root 5 2010-04-23 14:08 /tmp/neue_datei.txt

# hadoop fs -put /tmp/neue_datei.txt /hdfs/
10/04/23 14:09:02 WARN hdfs.DFSClient: DataStreamer Exception: 
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /hdfs could 
only be replicated to 0 nodes, instead of 1
...

This told me to look into 
http://ec2-75-101-210-65.compute-1.amazonaws.com:50070
and
http://ec2-75-101-210-65.compute-1.amazonaws.com:50030
and the logs there. 
The result: 0 Nodes.
The namenode and jobtracker logs are full of error messages.

=> reboot...

Now I have again 1 Live Node.

# ./fuse_dfs_wrapper.sh dfs://ec2-75-101-210-65.compute-1.amazonaws.com:9000 
/hdfs/
port=9000,server=ec2-75-101-210-65.compute-1.amazonaws.com
fuse-dfs didn't recognize /hdfs/,-2
port=9000,server=ec2-75-101-210-65.compute-1.amazonaws.com

# df | grep hdfs
fuse_dfs 433455104 0 433455104   0% /hdfs

But writing inside /hdfs works only for the user "hadoop"

$ echo test > /hdfs/testfile
$ cat /hdfs/testfile 
test

When I try as root, it doesn't work (bash: testfile: Input/output error) and I 
get these error messages in the namenode's logfile:

2010-04-23 14:58:13,306 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 
on 9000, call create(/testfile.txt, rwxr-xr-x, DFSClient_-1937513051, true, 3, 
67108864) from 10.242.231.63:46719: error: 
org.apache.hadoop.security.AccessControlException: Permission denied: 
user=root, access=WRITE, inode="":hadoop:supergroup:rwxr-xr-x
org.apache.hadoop.security.AccessControlException: Permission denied: 
user=root, access=WRITE, inode="":hadoop:supergroup:rwxr-xr-x
...

But it works now. 

Thanks a lot for your help!

Best Regards
    Christian



Am Freitag, 23. April 2010 schrieb Brian Bockelman:
> Hey Christian,
> 
> Glad to hear things are beginning to click.  Can you upload the things you 
> learned into the wiki?  In our internal user docs, we have big bold letters 
> saying to watch out for this issue.
> 
> As far as your writing issues - can you write using "hadoop fs -put"?  The 
> nice thing about the built-in utilities is that it will give you better 
> terminal feedback.
> 
> Alternately, I find myself mounting things in debug mode to see the Hadoop 
> issues printed out to the terminal.
> 
> Brian
> 
> On Apr 23, 2010, at 8:30 AM, Christian Baun wrote:
> 
> > Brian,
> > 
> > You got it!!! :-)
> > It works (partly)!
> > 
> > i switched to Port 9000. core-site.xml includes now:
> > 
> > 
> > fs.default.name
> > 
> > hdfs://ec2-75-101-210-65.compute-1.amazonaws.com:9000
> > true
> > 
> > 
> > 
> > $ hadoop fs -ls /
> > Found 1 items
> > drwxr-xr-x   - hadoop supergroup  0 2010-04-23 05:18 /mnt
> > 
> > $ hadoop fs -ls /mnt/
> > Found 1 items
> > drwxr-xr-x   - hadoop supergroup  0 2010-04-23 13:00 /mnt/mapred
> > 
> > # ./fuse_dfs_wrapper.sh 
> > dfs://ec2-75-101-210-65.compute-1.amazonaws.com:9000 /mnt/hdfs/
> > port=9000,server=ec2-75-101-210-65.compute-1.amazonaws.com
> > fuse-dfs didn't recognize /mnt/hdfs/,-2
> > 
> > This tiny error message remains.
> > 
> > # mount | grep fuse
> > fuse_dfs on /hdfs type fuse.fuse_dfs 
> > (rw,nosuid,nodev,allow_other,default_permissions)
> > 
> > # ls /mnt/hdfs/
> > mnt
> > # mkdir /mnt/hdfs/testverzeichnis
> > # touch /mnt/hdfs/testdatei
> > # ls -l /mnt/hdfs/
> > total 8
> > drwxr-xr-x 3 hadoop 99 4096 2010-04-23 05:18 mnt
> > -rw-r--r-- 1 root   990 2010-04-23 13:07 testdatei
> > drwxr-xr-x 2 root   99 4096 2010-04-23 13:05 testverzeichnis
> > 
> > In /var/log/messages there was no information about hdfs/fuse.
> > 
> > Only in /var/log/user.log were these lines:
> > Apr 23 13:04:34 ip-10-242-231-63 fuse_dfs: mounting 
> > dfs://ec2-75-101-210-65.compute-1.amazonaws.com:9000/
> > 
> > mkdir and touch works. But I cannot write data into files(?!). They are all 
> > read only.
> > When I try to copy files "from outside" into the HDFS, only an empty file 
> > is created and in user.log appear these error messages:
> > 
> > Apr 23 13:18:46 ip-10-242-231-63 fuse_dfs: ERROR: fuse problem - could not 
> > write all the bytes for /testordner/testfile -1!=4096fuse_impls_write.c:60
> > Apr 23 13:18:46 ip-10-242-231-63 fuse_dfs: WARN: fuse problem - could not 
> > write all the bytes for /testordner/testfile -1!=4096fuse_impls_write.c:64
> > Apr 23 13:18:46 ip-10-242-231-63 fuse_dfs: ERROR: fuse problem - could not 
> > write all the bytes for /testordner/testfile -1!=4096fuse_impls_write.c:60
> > Apr 23 13:18:46 ip-10-242-231-63 fuse_dfs: WARN: fuse problem - could not 
> > write all the bytes for /testordner/testfile -1!=4096fuse_impls_write.c:64
> > Apr 23 13:18:46 ip-10-242-231-63 fuse_dfs: ERROR: dfs problem - could not 
> > close file_handle(23486496) for /testordner/testfile fuse_i

Re: Hadoop performance - xfs and ext4

2010-04-23 Thread Carfield Yim
I've done some research and the following mount options sound optimal -
would you be interested in giving them a try?

noatime,data=writeback,barrier=0,nobh

On Fri, Apr 23, 2010 at 10:43 PM, Todd Lipcon  wrote:
> Hi Stephen,
>
> Can you try mounting ext4 with the nodelalloc option? I've seen the same
> improvement due to delayed allocation butbeen a little nervous about that
> option (especially in the NN where we currently follow what the kernel
> people call an antipattern for image rotation).
>
> -Todd
>
> On Fri, Apr 23, 2010 at 6:12 AM, stephen mulcahy
> wrote:
>
>> Andrew Klochkov wrote:
>>
>>> Hi,
>>>
>>> Just curious - did you try ext3? Can it be faster then ext4? Hadoop wiki
>>> suggests ext3 as it's used mostly for hadoop clusters:
>>>
>>> http://wiki.apache.org/hadoop/DiskSetup
>>>
>>
>> For completeness, I rebuilt one more time with ext3
>>
>> mkfs.ext3 -T largefile4 DEV
>> (mounted with noatime)
>> gives me a cluster which runs TeraSort in about 22.5 minutes
>>
>> So ext4 looks like the winner, from a performance perspective, at least for
>> running the TeraSort on my cluster with it's specific configuration.
>>
>> -stephen
>>
>> --
>> Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
>> NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
>> http://di2.deri.ie    http://webstar.deri.ie    http://sindice.com
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>


Re: Using external library in MapReduce jobs

2010-04-23 Thread Farhan Husain
Hello Mike,

I completely agree with you. I think bundling the libraries in the job jar
file is the correct way to go.

Thanks,
Farhan

On Thu, Apr 22, 2010 at 9:12 PM, Michael Segel wrote:

>
>
>
> > Date: Thu, 22 Apr 2010 17:30:13 -0700
> > Subject: Re: Using external library in MapReduce jobs
> > From: ale...@cloudera.com
> > To: common-user@hadoop.apache.org
> >
> > Sure, you need to place them into $HADOOP_HOME/lib directory on each
> server
> > in the cluster and they will be picked up on the next restart.
> >
> > -- Alex K
> >
>
> While this works, I wouldn't recommend it.
>
> You have to look at it this way... Your external m/r java libs are job
> centric. So every time you want to add jobs that require new external
> libraries you have to 'bounce' your cloud after pushing the jars. Then
> you also have the issue of java class collisions if the cloud has one
> version of the same jar you're using. (We've had this happen to us already.)
>
> If you're just testing for a proof of concept, its one thing, but after the
> proof, you'll need to determine how to correctly push the jars out to each
> node.
>
> In a production environment, constantly bouncing clouds for each new job
> isn't really a good idea.
>
> HTH
>
> -Mike
>
>


Dynamically determining number of reducers

2010-04-23 Thread Farhan Husain
Hello,

Is there any way to determine the number of reducers present in the cluster
dynamically? I need to determine it when the job parameters are set up.

Thanks,
Farhan


Re: Dynamically determining number of reducers

2010-04-23 Thread Farhan Husain
I actually wanted to mean number of tasktrackers. I want to set the number
of reducers equal to the number of tasktrackers present in the cluster and I
want to determine the number of tasktrackers dynamically.

Thanks,
Farhan

On Fri, Apr 23, 2010 at 12:03 PM, Farhan Husain
wrote:

> Hello,
>
> Is there any way to determine the number of reducers present in the cluster
> dynamically? I need to determine it when the job parameters are set up.
>
> Thanks,
> Farhan
>


Decommissioning a node

2010-04-23 Thread Raymond Jennings III
I've got a dead machine on my cluster.  I want to safely update HDFS so that 
nothing references this machine; then I want to rebuild it and put it back in 
service in the cluster.

Does anyone have any pointers on how to do this (the first part - updating HDFS 
so that it's no longer referenced)?  Thank you.


  


Re: Dynamically determining number of reducers

2010-04-23 Thread Hong Tang

JobClient.getClusterStatus().getMaxReduceTasks().
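
A short, hedged sketch of wiring that into job setup with the old
JobClient/JobConf API (the class and method names here are only illustrative):

import java.io.IOException;
import org.apache.hadoop.mapred.ClusterStatus;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

// Sketch: size the reduce phase from the live cluster before submitting.
public class ReducerSizing {
  public static void setReducersFromCluster(JobConf conf) throws IOException {
    JobClient client = new JobClient(conf);
    ClusterStatus status = client.getClusterStatus();
    // One reducer per task tracker; getMaxReduceTasks() would instead
    // return the cluster's total reduce-slot capacity.
    conf.setNumReduceTasks(status.getTaskTrackers());
  }
}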

On Apr 23, 2010, at 10:34 AM, Farhan Husain wrote:

I actually wanted to mean number of tasktrackers. I want to set the  
number
of reducers equal to the number of tasktrackers present in the  
cluster and I

want to determine the number of tasktrackers dynamically.

Thanks,
Farhan

On Fri, Apr 23, 2010 at 12:03 PM, Farhan Husain
wrote:


Hello,

Is there any way to determine the number of reducers present in the  
cluster
dynamically? I need to determine it when the job parameters are set  
up.


Thanks,
Farhan





Re: Decommissioning a node

2010-04-23 Thread Allen Wittenauer

On Apr 23, 2010, at 10:48 AM, Raymond Jennings III wrote:

> I've got a dead machine on my cluster.  I want to safely update HDFS so that 
> nothing references this machine then I want to rebuild it and put it back in 
> service in the cluster.
> 
> Does anyone have any pointers how to do this (the first part - updating HDFS 
> so that it's no longer referenced.) 

1. Add node to dfs.exclude
2. hadoop dfsadmin -refreshNodes

That will start the decommissioning process.

When you want to add it back in, remove it from dfs.exclude and re-run 
refreshNodes.
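
For reference, a hedged sketch of the wiring those steps assume: dfs.hosts.exclude 
in the name node's configuration points at a plain text file listing the excluded 
hosts, one per line, and hadoop dfsadmin -refreshNodes makes the name node re-read 
it (the path below is a placeholder):

<property>
  <name>dfs.hosts.exclude</name>
  <!-- placeholder path; the file lists excluded hosts, one per line -->
  <value>/etc/hadoop/conf/dfs.exclude</value>
</property>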

Error with distcp: hdfs to S3 bulk transfer

2010-04-23 Thread ilayaraja
The following error is thrown when distcp'ing data from hdfs (hadoop 0.15.5) 
to S3 storage.
This problem crept in after back-porting a couple of bug fixes into hadoop 
0.15.5 that had been resolved in later versions.

Any thoughts would be greatly helpful.

With failures, global counters are inaccurate; consider running with -i
Copy failed: org.apache.hadoop.fs.s3.S3Exception: 
org.jets3t.service.S3ServiceException: S3 GET failed. XML Error Message: 
encoding="UTF-8"?>NoSuchKeyThe specified key 
does not 
exist./user/root/ImplicitFeedback/linkdb-test1249D2146A4A104E
   at 
org.apache.hadoop.fs.s3.Jets3tFileSystemStore.get(Jets3tFileSystemStore.java:199)
   at 
org.apache.hadoop.fs.s3.Jets3tFileSystemStore.inodeExists(Jets3tFileSystemStore.java:169)

   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
   at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)

   at $Proxy1.inodeExists(Unknown Source)
   at 
org.apache.hadoop.fs.s3.S3FileSystem.exists(S3FileSystem.java:127)

   at org.apache.hadoop.util.CopyFiles.setup(CopyFiles.java:675)
   at org.apache.hadoop.util.CopyFiles.copy(CopyFiles.java:475)
   at org.apache.hadoop.util.CopyFiles.run(CopyFiles.java:550)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
   at org.apache.hadoop.util.CopyFiles.main(CopyFiles.java:563)

Regards & Thanks,
Ilayaraja 



Re: Decommissioning a node

2010-04-23 Thread Alex Kozlov
I think Raymond says that the machine is already dead...

At this point, you can just remove it from the dfs.hosts list and let
HDFS restore the lost blocks...

But before that, if you have the disks intact, you can stop HDFS and
manually copy the blocks together with their CRCs from the dead machine's
dfs.data.dir to existing machines.  The copied blocks will be found
and recognized by HDFS on restart.

Alex K

On Fri, Apr 23, 2010 at 11:31 AM, Allen Wittenauer  wrote:

>
> On Apr 23, 2010, at 10:48 AM, Raymond Jennings III wrote:
>
> > I've got a dead machine on my cluster.  I want to safely update HDFS so
> that nothing references this machine then I want to rebuild it and put it
> back in service in the cluster.
> >
> > Does anyone have any pointers how to do this (the first part - updating
> HDFS so that it's no longer referenced.)
>
> 1. Add node to dfs.exclude
> 2. hadoop dfsadmin -refreshNodes
>
> That will start the decommissioning process.
>
> When you want to add it back in, remove it from dfs.excluce and re-run
> refreshnodes.


Re: Decommissioning a node

2010-04-23 Thread Allen Wittenauer

On Apr 23, 2010, at 1:56 PM, Alex Kozlov wrote:

> I think Raymond says that the machine is already dead...

Right.  But he wants to re-add it later.  So dfs.exclude is still a better way 
to go.  dfs.hosts, iirc, doesn't get re-read so it would require a nn bounce to 
clear.



Re: Dynamically determining number of reducers

2010-04-23 Thread Farhan Husain
Thanks!

On Fri, Apr 23, 2010 at 1:22 PM, Hong Tang  wrote:

> JobClient.getClusterStatus().getMaxReduceTasks().
>
>
> On Apr 23, 2010, at 10:34 AM, Farhan Husain wrote:
>
>  I actually wanted to mean number of tasktrackers. I want to set the number
>> of reducers equal to the number of tasktrackers present in the cluster and
>> I
>> want to determine the number of tasktrackers dynamically.
>>
>> Thanks,
>> Farhan
>>
>> On Fri, Apr 23, 2010 at 12:03 PM, Farhan Husain
>> wrote:
>>
>>  Hello,
>>>
>>> Is there any way to determine the number of reducers present in the
>>> cluster
>>> dynamically? I need to determine it when the job parameters are set up.
>>>
>>> Thanks,
>>> Farhan
>>>
>>>
>


Re: Decommissioning a node

2010-04-23 Thread Alex Kozlov
The best way to resolve an argument is to look at the code:

  /**
   * Rereads the config to get hosts and exclude list file names.
   * Rereads the files to update the hosts and exclude lists.  It
   * checks if any of the hosts have changed states:
   * 1. Added to hosts  --> no further work needed here.
   * 2. Removed from hosts --> mark AdminState as decommissioned.
   * 3. Added to exclude --> start decommission.
   * 4. Removed from exclude --> stop decommission.
   */
  public void refreshNodes(Configuration conf) throws IOException {
    checkSuperuserPrivilege();
    // Reread the config to get dfs.hosts and dfs.hosts.exclude filenames.
    // Update the file names and refresh internal includes and excludes list
    if (conf == null)
      conf = new Configuration();
    hostsReader.updateFileNames(conf.get("dfs.hosts", ""),
                                conf.get("dfs.hosts.exclude", ""));
    hostsReader.refresh();
    synchronized (this) {
      for (Iterator<DatanodeDescriptor> it = datanodeMap.values().iterator();
           it.hasNext();) {
        DatanodeDescriptor node = it.next();
        // Check if not include.
        if (!inHostsList(node, null)) {
          node.setDecommissioned();  // case 2.
        } else {
          if (inExcludedHostsList(node, null)) {
            if (!node.isDecommissionInProgress() &&
                !node.isDecommissioned()) {
              startDecommission(node);   // case 3.
            }
          } else {
            if (node.isDecommissionInProgress() ||
                node.isDecommissioned()) {
              stopDecommission(node);   // case 4.
            }
          }
        }
      }
    }
  }

The machine is already dead, so there is no point in decommissioning.  HDFS
will still re-replicate the blocks, as it is risky to run with a reduced
replication factor.

There may still be an argument whether it makes sense to physically move the
blocks...

Alex K

On Fri, Apr 23, 2010 at 2:20 PM, Allen Wittenauer
wrote:

>
> On Apr 23, 2010, at 1:56 PM, Alex Kozlov wrote:
>
> > I think Raymond says that the machine is already dead...
>
> Right.  But he wants to re-add it later.  So dfs.exclude is still a better
> way to go.  dfs.hosts, iirc, doesn't get re-read so it would require a nn
> bounce to clear.
>
>


Re: Decommissioning a node

2010-04-23 Thread Allen Wittenauer

On Apr 23, 2010, at 2:50 PM, Alex Kozlov wrote:

> The best way to resolve an argument is to look at the code:

I didn't realize we were having an argument.

But I will say this:

I've never had a node removed from both dfs.hosts and dfs.hosts.exclude 
actually disappear from the dead list in the web ui, at least under 0.20.2, 
without bouncing the nn.




HADOOP_SSH_OPTS

2010-04-23 Thread Hazem Mahmoud
I have a test setup where (due to the environment I'm testing on) every system 
is listening on a different SSH port. From what I can tell, I can use 
HADOOP_SSH_OPTS in hadoop-env.sh to specify different SSH options (ie: specify 
a different port to connect to). However, in my case, the grid nodes are each 
listening on different SSH ports. Is there a way to deal with that? Thanks!

-Hazem

Re: HADOOP_SSH_OPTS

2010-04-23 Thread Allen Wittenauer

On Apr 23, 2010, at 4:01 PM, Hazem Mahmoud wrote:

> I have a test setup where (due to the environment I'm testing on) every 
> system is listening on a different SSH port. From what I can tell, I can use 
> HADOOP_SSH_OPTS in hadoop-env.sh to specify different SSH options (ie: 
> specify a different port to connect to). However, in my case, the grid nodes 
> are each listening on different SSH ports. Is there a way to deal with that? 
> Thanks!

Use a custom .ssh/config that has an entry per host.
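
A hedged example of such a per-host setup (host names and ports below are
placeholders); the start/stop scripts go through slaves.sh, which invokes plain
ssh and so should honour ~/.ssh/config on the machine you launch from:

# ~/.ssh/config on the machine that runs start-dfs.sh / start-mapred.sh
Host node01.example.com
    Port 2201
Host node02.example.com
    Port 2202
Host node03.example.com
    Port 2203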