Re: How to change join output separator
Hi,

Found the solution. It's happening in the toString() method of org.apache.hadoop.mapred.join.TupleWritable.

Thanks,
Dhana

Carbon Rock wrote:
> Hi,
>
> I am running a map-side join. My input looks like this:
>
> file1.txt
> ---
> a|deer
> b|dog
>
> file2.txt
> ---
> a|veg
> b|nveg
>
> I am getting output like:
>
> a|[deer,veg]
> b|[dog,nveg]
>
> I don't want those square brackets, and the field separator should be | (pipe) instead of a comma. Please guide me on how to achieve this.
>
> Thanks,
> Dhana

--
View this message in context: http://old.nabble.com/How-to-change-join-output-separator-tp28547855p28555738.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
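Short of patching TupleWritable.toString(), one workaround is to format the joined values yourself in the job's map/reduce code instead of relying on toString(). A minimal sketch of just the formatting step (plain strings stand in for the tuple's Writable values; in a real job you would iterate the TupleWritable and stringify each value):

```java
import java.util.Arrays;
import java.util.List;

public class PipeJoinFormatter {
    // Joins the tuple's values with '|' instead of TupleWritable's
    // default "[a,b]" rendering.
    static String format(List<String> values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) sb.append('|');
            sb.append(values.get(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(format(Arrays.asList("deer", "veg"))); // deer|veg
    }
}
```

Emitting the formatted string as the map/reduce output value (with the key still "a", "b", ...) produces lines like a|deer|veg without touching Hadoop's source.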
Problem starting datanode inside Solaris zone
Hi all,

I'm inside an OpenSolaris zone, or more precisely a Joyent Accelerator. I can't seem to get a datanode started. I can start a namenode fine, and I can run 'bin/hadoop datanode -format' fine. JAVA_HOME is set to /usr/jdk/latest, which is a symlink to whatever the latest version is. I'm running it as user 'jill', and I don't even know where that "24 /tmp/hadoop-jill/dfs/data" is coming from. What am I missing? I'm very baffled :(

In the log file all I'm getting is this:

2010-05-14 05:30:28,059 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = somehost/10.181.x.x
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
2010-05-14 05:30:28,255 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory /tmp/hadoop-jill/dfs/data is not formatted.
2010-05-14 05:30:28,255 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting ...
2010-05-14 05:30:28,275 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.lang.NumberFormatException: For input string: "24 /tmp/hadoop-jill/dfs/data"
	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
	at java.lang.Long.parseLong(Long.java:419)
	at java.lang.Long.parseLong(Long.java:468)
	at org.apache.hadoop.fs.DU.parseExecResult(DU.java:187)
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:179)
	at org.apache.hadoop.util.Shell.run(Shell.java:134)
	at org.apache.hadoop.fs.DU.<init>(DU.java:53)
	at org.apache.hadoop.fs.DU.<init>(DU.java:63)
	at org.apache.hadoop.hdfs.server.datanode.FSDataset$FSVolume.<init>(FSDataset.java:333)
	at org.apache.hadoop.hdfs.server.datanode.FSDataset.<init>(FSDataset.java:689)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:302)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:216)

Thanks!
-alex
Re: Problem starting datanode inside Solaris zone
On May 14, 2010, at 3:27 AM, Alex Li wrote:
> I'm running it as user 'jill' and I don't even know where that "24 /tmp/hadoop-jill/dfs/data" is coming from. What am I missing? I'm very baffled :(

It is likely coming from the output of du, which the datanode uses to determine space. We run Hadoop on Solaris, but not in a zone, so there shouldn't be any issues there unless Joyent is doing odd things. What version of Solaris is this, and what does the output of du /tmp/hadoop-jill/dfs/data give? [tabs vs. spaces, etc., count!]
Re: Problem starting datanode inside Solaris zone
Hi Allen,

Thanks for the pointer! You are dead on! This is what I got:

[j...@alextest ~]$ du /storage/hadoop-jill/dfs/data/
3	/storage/hadoop-jill/dfs/data/detach
6	/storage/hadoop-jill/dfs/data/current
3	/storage/hadoop-jill/dfs/data/tmp
18	/storage/hadoop-jill/dfs/data
[j...@alextest ~]$ which du
/usr/xpg4/bin/du
[j...@alextest ~]$ which gdu
/opt/local/bin/gdu

It turns out what I had isn't the GNU du. I got it to run by doing this under /opt/local/bin, so that the GNU du is found first:

ln -s gdu du

An alias didn't work; hadoop must be looking up 'du' and using whichever is found first in the PATH.

Thanks so much! My data node is now up!

-alex

On Sat, May 15, 2010 at 12:14 AM, Allen Wittenauer awittena...@linkedin.com wrote:
> It is likely coming from the output of du, which the datanode uses to determine space. [...] What does the output of du /tmp/hadoop-jill/dfs/data give? [tabs vs. spaces, etc., count!]
Re: Problem starting datanode inside Solaris zone
On May 14, 2010, at 9:57 AM, Alex Li wrote:
> [j...@alextest ~]$ which du
> /usr/xpg4/bin/du

POSIX du

> [j...@alextest ~]$ which gdu
> /opt/local/bin/gdu

GNU du

> It turns out what I got isn't the GNU du.

This is actually very concerning. Hadoop should work with POSIX du. If it doesn't, it is a bug.

> An alias didn't work. hadoop must be looking up 'du' and using whichever is found first in the PATH.

Essentially, yes. It is surprising that POSIX du fails but SysV du (which is what we use here) and GNU du work. I'll have to play with this and see what is going on.
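The "tabs vs. spaces" hint fits the stack trace: DU.parseExecResult evidently takes the first token of the du output line as the size, and the whole line "24 /tmp/hadoop-jill/dfs/data" reached Long.parseLong. A minimal sketch of that failure mode (the split-on-tab behavior is an assumption inferred from the exception message, not quoted Hadoop source):

```java
public class DuParse {
    // Fragile parse: assumes the size and path are tab-separated,
    // as GNU du prints them. Solaris POSIX du may separate with spaces,
    // so split("\t") leaves the entire line in the first token.
    static long parseTab(String line) {
        return Long.parseLong(line.split("\t")[0]);
    }

    // Robust parse: split on any run of whitespace.
    static long parseAnyWhitespace(String line) {
        return Long.parseLong(line.trim().split("\\s+")[0]);
    }

    public static void main(String[] args) {
        String gnu = "24\t/tmp/hadoop-jill/dfs/data";
        String posix = "24 /tmp/hadoop-jill/dfs/data";
        System.out.println(parseTab(gnu));             // 24
        System.out.println(parseAnyWhitespace(posix)); // 24
        // parseTab(posix) throws java.lang.NumberFormatException:
        //   For input string: "24 /tmp/hadoop-jill/dfs/data"
    }
}
```

That would explain why swapping in GNU du (tab-separated output) made the datanode start while the xpg4 du (space-separated) crashed it.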
Re: Problem starting datanode inside Solaris zone
Forgot the OS version: Nevada build 121.

[r...@alextest ~]# uname -a
SunOS alextest 5.11 snv_121 i86pc i386 i86pc

Cheers!

On Sat, May 15, 2010 at 1:20 AM, Alex Li a...@joyent.com wrote:
> Thanks again!
Re: Build a indexing and search service with Hadoop
Thanks for the replies. I'm already investigating how Katta works and how I can extend it. What do you mean by distributed search capability? Does Lucene provide any way to merge hits from different indexes?

2010/5/14 Ian Soboroff ian.sobor...@nist.gov:
> Aécio aecio.sola...@gmail.com writes:
>> 2. Search - The query received is used as input of the map function. This function would search the document on the local shard using our custom library and emit the hits. The reduce function would group the hits from all shards.
>
> There is no way you can do interactive searches via MapReduce in Hadoop, because the JVM start time will kill you. If your shard backend is Lucene, just use the distributed search capability already there, or look at Katta.
>
> Ian

--
Best regards,
Aécio Santos.
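On merging hits from different indexes: the Lucene releases of that era shipped MultiSearcher (and ParallelMultiSearcher) for exactly this, and Katta does the same across shards. The core of the merge is just combining per-shard hit lists by score. A self-contained sketch (the Hit class here is a simplified stand-in for Lucene's ScoreDoc, not real Lucene API):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class HitMerger {
    // Stand-in for a per-shard search hit: a document id and a score.
    static class Hit {
        final String docId;
        final float score;
        Hit(String docId, float score) { this.docId = docId; this.score = score; }
    }

    // Merge several per-shard hit lists into one global top-k list,
    // ordered by descending score, the way a distributed searcher
    // combines shard results.
    static List<Hit> merge(List<List<Hit>> shards, int k) {
        PriorityQueue<Hit> heap = new PriorityQueue<Hit>(11, new Comparator<Hit>() {
            public int compare(Hit a, Hit b) {
                return Float.compare(b.score, a.score); // highest score first
            }
        });
        for (List<Hit> shard : shards) heap.addAll(shard);
        List<Hit> out = new ArrayList<Hit>();
        while (!heap.isEmpty() && out.size() < k) out.add(heap.poll());
        return out;
    }
}
```

A production merger would stream the per-shard lists instead of heaping every hit, but the ordering logic is the same: scores from different shards are directly comparable only if the shards use comparable ranking statistics, which is one of the things a real distributed searcher has to manage.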
Re: Setting up a second cluster and getting a weird issue
I'm pretty sure I just set my dfs.data.dir to be /srv/hadoop/dfs/1:

<property>
  <name>dfs.data.dir</name>
  <value>/srv/hadoop/dfs/1</value>
</property>

I don't have hadoop.tmp.dir set to anything, so it's whatever the default is. I don't have access to the cluster right now but will update with the exact settings when I get a chance.

I have 4 slaves with identical hardware. Each has a separate SCSI drive mounted at /srv/hadoop/dfs/1. The same config file is used across all the slaves. I know the NFS approach isn't ideal for larger deployments, but right now I'm still in the tweaking stage and figured NFS was the fastest way to propagate changes.

Thanks!

On May 14, 2010, at 9:17 AM, Allen Wittenauer wrote:
> On May 14, 2010, at 8:53 AM, Andrew Nguyen wrote:
>> Just to be clear, I'm only sharing the Hadoop binaries and config files via NFS. I don't see how this would cause a conflict - do you have any additional information? The referenced path in the error below (/srv/hadoop/dfs/1) is not being shared via NFS...
>>
>> 2010-05-12 17:17:15,094 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.FileNotFoundException: /srv/hadoop/dfs/1/current/VERSION (No such file or directory)
>>
>> There are 4 slaves and sometimes 1 or 2 have the error, but the specific nodes change. Sometimes it's slave1, sometimes it's slave4, etc. Any thoughts?
>
> Something is deleting the contents of /srv/hadoop/dfs/1. How did you set your dfs.data.dir in the config file? Or did you just change hadoop.tmp.dir?
Re: TestDFSIO
Hi Lavanya,

On 5/14/2010 10:51 AM, Lavanya Ramakrishnan wrote:
> Hello,
>
> I am running org.apache.hadoop.fs.TestDFSIO to benchmark our HDFS installation and had a couple of questions regarding the same.
>
> a) If I run the benchmark back to back in the same directory, I start seeing strange errors such as NotReplicatedYetException or AlreadyBeingCreatedException (failed to create file on client 5, because this file is already being created by DFSClient_ on ...). It seems like there might be some kind of race condition between the replication from a previous run and subsequent runs. Is there any way to avoid this?

Yes, this looks like a race with the previous run. You can just wait, or run TestDFSIO -clean before the second run.

> b) I have been testing with concurrent writers and see a significant drop in throughput. I get about 60 MB/s for 1 writer and about 8 MB/s for 50 concurrent writers. Is this a known scalability limit of HDFS? Is there any way to configure this to perform better?

It depends on the size and the configuration of your cluster. In general, for consistent results with DFSIO it is better to set up 1 or 2 tasks per node, and to specify as many files for DFSIO as you have map slots. The idea is that all maps finish in one wave; then you should get optimal performance.

Thanks,
--Konstantin
Chaining jobs that have dependencies on jars...
We have a series of Hadoop map/reduce jobs that need to be run. In between each job we have to do some logic, and depending on the results the next job gets called. So in the chain of events, job A runs; at the end of the job, some value is evaluated, and depending on its result we want to run either job B or job C. We can use the Tool interface to load the class and run it.

The catch is that some of the jobs have dependencies. When launched from the hadoop command line as a standalone jar, if there are any dependencies and the jar file has a /lib directory, those jars will be loaded. When you use the Tool interface, those jars in the /lib directory will not be loaded. Outside of using the distributed cache, is there a way to launch a job so that it will pick up the jar files?

Thanks,
-Mike
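The decision-chain part can be sketched without Hadoop on the classpath. Here SimpleTool is a stand-in for org.apache.hadoop.util.Tool, and JobA/JobB/JobC are hypothetical names; the stubbed "counter" stands in for whatever value job A produces:

```java
public class JobChain {
    // Simplified stand-in for org.apache.hadoop.util.Tool so the sketch
    // compiles without Hadoop jars.
    interface SimpleTool { int run(String[] args); }

    // Stand-in for a value read back after job A (e.g. a job counter).
    static long resultOfJobA;

    static final SimpleTool jobA = args -> { resultOfJobA = 42; return 0; };
    static final SimpleTool jobB = args -> 0;
    static final SimpleTool jobC = args -> 0;

    // Run job A, inspect its result, then run either B or C.
    static String chain(String[] args) {
        if (jobA.run(args) != 0) return "A failed";
        SimpleTool next = (resultOfJobA > 10) ? jobB : jobC;
        if (next.run(args) != 0) return "next job failed";
        return (next == jobB) ? "ran B" : "ran C";
    }

    public static void main(String[] args) {
        System.out.println(chain(args)); // ran B (the stubbed job A yields 42)
    }
}
```

On the dependency question: launching each real Tool through ToolRunner lets GenericOptionsParser handle -libjars, which ships the dependency jars with every submitted job. Note, though, that -libjars works via the distributed cache under the hood, so it may not satisfy the "outside of the distributed cache" constraint; bundling the dependencies into the driver jar's classpath directly is the other common route.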
Re: Setting up a second cluster and getting a weird issue
My hdfs-site.xml file:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/srv/hadoop/dfs.name.dir</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/srv/hadoop/dfs/1</value>
  </property>
</configuration>

Here is my /srv/hadoop/hadoop directory listing:

total 5068
drwxr-xr-x  2 hadoop hadoop    4096 2010-05-12 16:10 bin
-rw-rw-r--  1 hadoop hadoop   73847 2010-03-21 23:17 build.xml
drwxr-xr-x  5 hadoop hadoop    4096 2010-03-21 23:17 c++
-rw-rw-r--  1 hadoop hadoop  348624 2010-03-21 23:17 CHANGES.txt
drwxr-xr-x  4 hadoop hadoop    4096 2010-05-12 09:29 cloudera
lrwxrwxrwx  1 hadoop hadoop      15 2010-05-12 15:54 conf -> ../hadoop-conf/
drwxr-xr-x 15 hadoop hadoop    4096 2010-03-21 23:17 contrib
drwxr-xr-x  9 hadoop hadoop    4096 2010-05-12 09:29 docs
drwxr-xr-x  3 hadoop hadoop    4096 2010-03-21 23:17 example-confs
-rw-rw-r--  1 hadoop hadoop    6839 2010-03-21 23:17 hadoop-0.20.2+228-ant.jar
-rw-rw-r--  1 hadoop hadoop 2806445 2010-03-21 23:17 hadoop-0.20.2+228-core.jar
-rw-rw-r--  1 hadoop hadoop  142466 2010-03-21 23:17 hadoop-0.20.2+228-examples.jar
-rw-rw-r--  1 hadoop hadoop 1637240 2010-03-21 23:17 hadoop-0.20.2+228-test.jar
-rw-rw-r--  1 hadoop hadoop   70090 2010-03-21 23:17 hadoop-0.20.2+228-tools.jar
drwxr-xr-x  2 hadoop hadoop    4096 2010-05-12 09:29 ivy
-rw-rw-r--  1 hadoop hadoop    9103 2010-03-21 23:17 ivy.xml
drwxr-xr-x  5 hadoop hadoop    4096 2010-05-12 09:29 lib
-rw-rw-r--  1 hadoop hadoop   13366 2010-03-21 23:17 LICENSE.txt
lrwxrwxrwx  1 hadoop hadoop       8 2010-05-12 16:28 logs -> ../logs/
drwxr-xr-x  3 hadoop hadoop    4096 2010-05-12 16:16 logs-old
-rw-rw-r--  1 hadoop hadoop     101 2010-03-21 23:17 NOTICE.txt
lrwxrwxrwx  1 hadoop hadoop       7 2010-05-12 16:28 pids -> ../pids
drwxr-xr-x  2 hadoop hadoop    4096 2010-05-12 16:10 pids-old
-rw-rw-r--  1 hadoop hadoop    1366 2010-03-21 23:17 README.txt
drwxr-xr-x 15 hadoop hadoop    4096 2010-05-12 09:29 src
drwxr-xr-x  8 hadoop hadoop    4096 2010-03-21 23:17 webapps

The only NFS shared directories are /srv/hadoop/hadoop and /srv/hadoop/hadoop-conf.

On May 14, 2010, at 1:06 PM, Andrew Nguyen wrote:
> I'm pretty sure I just set my dfs.data.dir to be /srv/hadoop/dfs/1. [...] I know the NFS approach isn't ideal for larger deployments but right now, I'm still in the tweaking stage and figured NFS was the fastest way to propagate changes.
NameNode deadlocked (help?)
Hey guys,

I know it's 5 PM on a Friday, but we just saw one of our big clusters' namenodes deadlock. This is 0.19.1; does this ring a bell for anyone? I haven't had any time to start going through source code, but I figured I'd send out an SOS in case this looks familiar.

We had restarted this cluster a few hours ago and made the following changes:
1) Increased the number of datanode handlers from 10 to 40.
2) Increased ipc.server.listen.queue.size from 128 to 256.

If nothing else, I figure a deadlocked NN might be interesting to devs...

Brian

2010-05-14 17:11:30
Full thread dump Java HotSpot(TM) 64-Bit Server VM (11.2-b01 mixed mode):

"IPC Server handler 39 on 9000" daemon prio=10 tid=0x2aaafc181400 nid=0x4cd waiting for monitor entry [0x45962000..0x45962d90]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.handleHeartbeat(FSNamesystem.java:2231)
	- waiting to lock <0x2aaab3653848> (a java.util.ArrayList)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.sendHeartbeat(NameNode.java:625)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)

"IPC Server handler 38 on 9000" daemon prio=10 tid=0x2aaafc17f800 nid=0x4cc waiting for monitor entry [0x45861000..0x45861d10]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getStats(FSNamesystem.java:3326)
	- waiting to lock <0x2aaab3653848> (a java.util.ArrayList)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.getStats(NameNode.java:505)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)

"IPC Server handler 37 on 9000" daemon prio=10 tid=0x2aaafc17e000 nid=0x4cb waiting for monitor entry [0x4575f000..0x45760a90]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInternal(FSNamesystem.java:801)
	- waiting to lock <0x2aaab3843e40> (a org.apache.hadoop.hdfs.server.namenode.FSNamesystem)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:784)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:751)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.getBlockLocations(NameNode.java:272)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)

"IPC Server handler 36 on 9000" daemon prio=10 tid=0x2aaafc17c400 nid=0x4ca waiting for monitor entry [0x4565e000..0x4565fa10]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.blockReceived(FSNamesystem.java:3281)
	- waiting to lock <0x2aaab3843e40> (a org.apache.hadoop.hdfs.server.namenode.FSNamesystem)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.blockReceived(NameNode.java:649)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)

"IPC Server handler 35 on 9000" daemon prio=10 tid=0x2aaafc17a800 nid=0x4c9 waiting for monitor entry [0x4555e000..0x4555eb90]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1294)
	- waiting to lock <0x2aaab3843e40> (a
Re: NameNode deadlocked (help?)
Hey Brian,

Looks like it's not deadlocked, but rather just busy doing a lot of work:

"org.apache.hadoop.hdfs.server.namenode.FSNamesystem$HeartbeatMonitor@1c778255" daemon prio=10 tid=0x2aaafc012c00 nid=0x493 runnable [0x413da000..0x413daa10]
   java.lang.Thread.State: RUNNABLE
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeStoredBlock(FSNamesystem.java:3236)
	- locked <0x2aaab3843e40> (a org.apache.hadoop.hdfs.server.namenode.FSNamesystem)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeDatanode(FSNamesystem.java:2695)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.heartbeatCheck(FSNamesystem.java:2785)
	- locked <0x2aaab3659dd8> (a java.util.TreeMap)
	- locked <0x2aaab3653848> (a java.util.ArrayList)
	- locked <0x2aaab3843e40> (a org.apache.hadoop.hdfs.server.namenode.FSNamesystem)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem$HeartbeatMonitor.run(FSNamesystem.java:2312)
	at java.lang.Thread.run(Thread.java:619)

Let me dig up the 0.19.1 source code and see if this looks like an infinite loop or just one that's tying things up for some number of seconds. Is anything being written into the logs?

-Todd

On Fri, May 14, 2010 at 5:22 PM, Brian Bockelman bbock...@cse.unl.edu wrote:
> Hey guys, I know it's 5PM on a Friday, but we just saw one of our big cluster's namenodes deadlock. This is 0.19.1; does this ring a bell for anyone? [...]
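Todd's diagnosis, that threads shown "waiting to lock" a monitor held by a RUNNABLE thread indicate contention rather than deadlock, can be reproduced in miniature. This sketch (timings are illustrative, not from the NN) observes a thread in the same BLOCKED state jstack reported for the IPC handlers:

```java
public class MonitorContention {
    static final Object lock = new Object();

    // Start a "holder" that works (sleeps) inside a synchronized block, then
    // a "waiter" that tries to enter the same block; return the waiter's
    // state observed while the holder still owns the monitor. A thread dump
    // taken at that moment would show the waiter as BLOCKED, "waiting to
    // lock" the monitor the RUNNABLE holder has locked.
    static Thread.State contendOnce() {
        Thread holder = new Thread(() -> {
            synchronized (lock) {
                try { Thread.sleep(300); } catch (InterruptedException e) { }
            }
        });
        Thread waiter = new Thread(() -> {
            synchronized (lock) { /* proceeds once the holder exits */ }
        });
        try {
            holder.start();
            Thread.sleep(100);   // let the holder acquire the monitor
            waiter.start();
            Thread.sleep(100);   // let the waiter block on it
            Thread.State s = waiter.getState();
            holder.join();
            waiter.join();
            return s;
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(contendOnce()); // BLOCKED
    }
}
```

Unlike a true deadlock, the blocked threads here make progress as soon as the holder releases the monitor, which is why "is it an infinite loop or just a long one?" is the right follow-up question.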
Re: TestDFSIO
On second thought, there should not be any race. You probably restarted the HDFS cluster between the runs. When you shut down the cluster after the first run, some files may still remain unclosed. Then, after restarting the cluster, all their leases are renewed, and if somebody tries to recreate an unclosed file it will fail with AlreadyBeingCreatedException.

If my guess is correct, then you should keep the cluster running between consecutive DFSIO runs. Cleaning up will still help keep the benchmark data consistent: if a bunch of files is recreated, HDFS will start removing the old file blocks, which increases the internal load and skews the performance results.

--Konstantin

On 5/14/2010 2:26 PM, Konstantin Shvachko wrote:
> Yes, this looks like a race with the previous run. You can just wait, or run TestDFSIO -clean before the second run. [...]