RE: Get Line Number from InputFormat
Ok, so getting your position into the file based on offset and a known fixed-length format (which is what you meant by structured) will give you a line number. But let's look at the question from a more practical and wider angle. In most applications where you have a single record per line, you will not have a fixed-length record format, so you really don't have a good way to calculate your line number from your position in the file. Let's also look at how important a line number is in practical terms. Rather like row_id in a partitioned table, a line number loses meaning. If the line number had specific meaning and the application ended its records with a '\n' (or cr nl), then an alternative would be to add a field that contained the line number.

HTH -Mike

PS. Wouldn't you call a record in XML structured? Yet of an unknown length? ;-) (Sorry, I haven't had my first cup of coffee yet. :-) )

From: am...@yahoo-inc.com
To: common-user@hadoop.apache.org
Date: Tue, 6 Apr 2010 12:14:56 +0530
Subject: Re: Get Line Number from InputFormat

Hi, If your records are structured / of equal size, then getting the line number is straightforward. If not, you'll need to construct your own sequence of numbers; someone has been kind enough to publish an approach on his blog: http://www.data-miners.com/blog/2009/11/hadoop-and-mapreduce-parallel-program.html

Amogh

On 4/5/10 7:59 PM, Michael Segel michael_se...@hotmail.com wrote:

Date: Mon, 5 Apr 2010 14:57:09 +0100
From: lamfeeli...@gmail.com
To: common-user@hadoop.apache.org
Subject: Get Line Number from InputFormat

Dear all, TextInputFormat sends the offset and line into the Mapper; however, the offset is sometimes meaningless and confusing. Is it possible to have an InputFormat which outputs (line number, line) into the mapper? Thanks a lot. Song

Song, I'm not sure what you want is realistic or even worthwhile. You have a file and it's split into chunks of 64 MB (default) or something larger based on your cloud settings. You have a map job that starts from a specific point in the file, but that does not mean that it's starting at a specific line, or that Hadoop will know which line of the file that is. (Your records are not always going to be based on the end of a line, or one line per record.) Does that make sense? Offset has more meaning than an arbitrary line number. -Mike
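For the non-fixed-length case, one common way to construct such a sequence is to number lines within each input split first, then add per-split offsets in a second pass. A minimal sketch of the first pass, assuming the 0.20 mapreduce API; the class and field names here are illustrative, not from the thread:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// Illustrative names; first pass of a two-pass global line numbering.
public class SplitLineNumberMapper extends Mapper<LongWritable, Text, Text, Text> {

  private long splitStart;   // byte offset at which this split begins
  private long lineInSplit;  // 0-based line counter local to this split

  @Override
  protected void setup(Context context) {
    splitStart = ((FileSplit) context.getInputSplit()).getStart();
    lineInSplit = 0;
  }

  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    // Emit (split start, local line number) alongside the line; a second pass
    // can add the line counts of all preceding splits to make this global.
    context.write(new Text(splitStart + ":" + lineInSplit), line);
    lineInSplit++;
  }
}

As Mike notes, this is only meaningful if every record really is one line and you are willing to pay for the second pass; for most jobs the byte offset is the cheaper and more robust handle.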
What does the log message DIR* NameSystem.completeFile: failed to complete... mean?
Hi all, this warning is written by completeFileInternal() in FSNamesystem.java. It causes the calling code in NameNode.java to throw an IOException.

FSNamesystem.java:

if (fileBlocks == null) {
    NameNode.stateChangeLog.warn("DIR* NameSystem.completeFile: "
        + "failed to complete " + src
        + " because dir.getFileBlocks() is null"
        + " and pendingFile is "
        + ((pendingFile == null) ? "null" :
            ("from " + pendingFile.getClientMachine())));
...

What is the meaning of this warning? Any idea what could have gone wrong in such a case? (This popped up through HBase, but as this code is in HDFS, I am asking this list.)

Thx Al
Re: losing network interfaces during long running map-reduce jobs
David Howell wrote: But I haven't seen anything in the dmesg log. I'll have to try looking at the tcpdump output on Monday, once I can get console access again. My apologies that I'm so sketchy on details right now... so far, I haven't been able to find any evidence of something going wrong except for the Hadoop log entries when the IOExceptions start. Thanks, -David

I just lost my networking again. This time, I had switched my cluster back to the build I was using before I switched to CDH2. It's Hadoop 0.20.1 with these patches applied (for Dumbo): HADOOP-1722-v0.20.1, HADOOP-5450, MAPREDUCE-764, HADOOP-5528. Now I'm wondering if something about my job is the culprit. I have 2 nodes, both 8-core machines. mapred.tasktracker.map|reduce.tasks.maximum are both set to 7. The job I'm running is combining lots of gzipped Apache log files into sequence files for later analysis... I'm going from one file per virtual host per server per day to one file per virtual host per day. The last attempt had ~1400 maps/10 reduces.

Could be just file handles you are losing; have you upped the OS defaults?
Re: Logging info in Hadoop
Hi Nachiket, I think if you output something to stderr, you should be able to find it in the .out log. Just make sure you are checking the right .out log file; you can do that by checking which tasktrackers are running your job from the web UI.

On Tue, Apr 6, 2010 at 6:56 PM, Nachiket Vaidya nachik...@gmail.com wrote:

I have the following doubts: 1. How do I print log information in Hadoop? In the documentation, I have read that hadoop-username-processname-machinename.log contains logs. I have used Log log = LogFactory.getLog(FBEMMapper.class); and log.info(...); for printing to the log, but I do not see any log information in the log file. I have also used System.out.println(), but these are also not getting printed in the .log or .out file. Do we need to change some log level in Hadoop? Do we need to enable logging for some class? Which log4j.properties file do we need to change? Firstly, am I doing the right things for logging?

Actually the problem is that I have written a custom FileInputFormat and WritableComparable for my purpose. My program runs fine, but I do not see any output. That is why I need to print some log statements to debug the problem. Thank you. - Nachiket
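For the task-side case Nachiket describes, a minimal sketch (hypothetical mapper name, 0.20 API) of where such output usually ends up; note that logging from a map or reduce task typically lands under the tasktracker's userlogs directory for that task attempt (also visible from the task details page in the web UI) rather than in the daemon .log file on the submitting machine:

import java.io.IOException;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper; shows the usual destinations of task-side output.
public class LoggingMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

  private static final Log LOG = LogFactory.getLog(LoggingMapper.class);

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // log4j output from a task usually appears in the attempt's "syslog" file
    // under ${hadoop.log.dir}/userlogs/<attempt-id>/ on the node that ran it.
    LOG.info("processing offset " + key + ", line length " + value.getLength());

    // System.out / System.err from a task go to the attempt's stdout / stderr
    // files in the same directory.
    System.err.println("stderr example: " + key);

    context.write(value, key);
  }
}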
Hadoop, C API, and fork
Hi, I have a distributed file server front end to Hadoop that uses the libhdfs C API to talk to Hadoop. Normally the file server will fork on a new client connection, but this does not work with the libhdfs shared library (it is loaded using dlopen). If the server is in single process mode (no forking, and it can handle only one client at a time) then everything works fine.

I have tried changing it so the server disconnects the Hadoop connection before forking and having both processes re-connect post fork. Essentially, in the server:

hdfsDisconnect(...);
pid = fork();
hdfsConnect(...);
if (pid == 0)
  ...
else
  ...

This causes a hang in the child process on Connect with the following backtrace:

(gdb) bt
#0 0x0034d160ad09 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x2ace492559f7 in os::PlatformEvent::park () from /afs/nd.edu/user37/ccl/software/external/java/jdk/jre/lib/amd64/server/libjvm.so
#2 0x2ace4930a5da in ObjectMonitor::wait () from /afs/nd.edu/user37/ccl/software/external/java/jdk/jre/lib/amd64/server/libjvm.so
#3 0x2ace49307b13 in ObjectSynchronizer::wait () from /afs/nd.edu/user37/ccl/software/external/java/jdk/jre/lib/amd64/server/libjvm.so
#4 0x2ace490cf5fb in JVM_MonitorWait () from /afs/nd.edu/user37/ccl/software/external/java/jdk/jre/lib/amd64/server/libjvm.so
#5 0x2ace49c87f50 in ?? ()
#6 0x0001 in ?? ()
#7 0x2ace4cd84d10 in ?? ()
#8 0x3f80 in ?? ()
#9 0x2ace49c8841d in ?? ()
#10 0x7fff0b4d04c0 in ?? ()
#11 0x in ?? ()

Leaving the connection open in the server:

pid = fork();
if (pid == 0)
  ...
else
  ...

also produces a hang in the child:

(gdb) bt
#0 0x0034d160ad09 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x2b3d7193d9f7 in os::PlatformEvent::park () from /afs/nd.edu/user37/ccl/software/external/java/jdk/jre/lib/amd64/server/libjvm.so
#2 0x2b3d719f25da in ObjectMonitor::wait () from /afs/nd.edu/user37/ccl/software/external/java/jdk/jre/lib/amd64/server/libjvm.so
#3 0x2b3d719efb13 in ObjectSynchronizer::wait () from /afs/nd.edu/user37/ccl/software/external/java/jdk/jre/lib/amd64/server/libjvm.so
#4 0x2b3d717b75fb in JVM_MonitorWait () from /afs/nd.edu/user37/ccl/software/external/java/jdk/jre/lib/amd64/server/libjvm.so
#5 0x2b3d7236ff50 in ?? ()
#6 0x in ?? ()

Does anyone have a suggestion on debugging/fixing this? Thanks for any help,

--
- Patrick Donnelly
Re: Reducer-side join example
Thanks, I appreciate the example - what happens if File A and B have many more columns (all different data types)? The logic doesn't seem to work in that case - unless we set up the values in the Map function to include the file name (maybe the output value is a HashMap or something, which might work). Also, I was asking to see a reduce-side join as we have other things going on in the Mapper and I'm not sure if we can tweak its output (we send output to multiple places). Does anyone have an example using the contrib/DataJoin package or something similar? thanks

On Mon, Apr 5, 2010 at 7:03 PM, He Chen airb...@gmail.com wrote:

For the Map function:
Input key: default
Input value: File A and File B lines
Output key: A, B, C, ... (first column of the final result)
Output value: 12, 24, Car, 13, Van, SUV, ...

Reduce function: take the Map output and do:

for each key {
  if the value of a key is an integer then save it to array1;
  else save it to array2
}
for ith element in array1
  for jth element in array2
    output(key, array1[i] + "\t" + array2[j]);
done

Hope this helps.

On Mon, Apr 5, 2010 at 4:10 PM, M B machac...@gmail.com wrote:

Hi, I need a good java example to get me started with some joining we need to do, any examples would be appreciated.

File A:
Field1 Field2
A 12
B 13
C 22
A 24

File B:
Field1 Field2 Field3
A Car ...
B Truck ...
B SUV ...
B Van ...

So, we need to first join File A and B on Field1 (say both are string fields). The result would just be:

A 12 Car ...
A 24 Car ...
B 13 Truck ...
B 13 SUV ...
B 13 Van ...

and so on - with all the fields from both files returning. Once we have that, we sometimes need to then transform it so we have a single record per key (Field1):

A (12,Car) (24,Car)
B (13,Truck) (13,SUV) (13,Van)

--however it looks, basically tuples for each key (we'll modify this later to return a concatenated set of fields from B, etc.)

At other times, instead of transforming to a single row, we just need to modify rows based on values. So if B.Field2 equals Van, we need to set Output.Field2 = whatever, then output to file ...

Are there any good examples of this in native java (we can't use pig/hive/etc)? thanks.

--
Best Wishes!
--
Chen He
PhD. student of CSE Dept.
Holland Computing Center
University of Nebraska-Lincoln
Lincoln NE 68588
Re: Hadoop, C API, and fork
Hey Patrick, Using fork() for a multi-threaded process (which anything that uses libhdfs is) is pretty shaky. You might want to start off by reading the multi-threaded notes from the POSIX standard: http://www.opengroup.org/onlinepubs/95399/functions/fork.html You might have better luck playing around with pthread_atfork, or thinking about other possible designs :) If you really, really want to do this, you can also try playing around with the internals of libhdfs. Basically, use native JNI calls to shut down the JVM after you disconnect, then fork, then re-initialize everything. No idea if this would work. Brian
Re: What does the log message DIR* NameSystem.completeFile: failed to complete... mean?
Hi Al, Usually this indicates that the file was renamed or deleted while it was still being created by the client. Unfortunately it's not the most descriptive :) -Todd

--
Todd Lipcon
Software Engineer, Cloudera
Re: losing network interfaces during long running map-reduce jobs
Could be just file handles you are losing; have you upped the OS defaults?

I have not, and that does seem like a likely culprit. Although it's a bit alarming that asking for one socket too many could take down the networking stack...
Re: Errors reading lzo-compressed files from Hadoop
Todd Lipcon t...@... writes:

Hey Dmitriy, This is very interesting (and worrisome in a way!) I'll try to take a look this afternoon. -Todd

Hi Todd, I wanted to see if you made any progress on this front. I'm seeing a very similar error trying to run an MR job (Hadoop 0.20.1) over a bunch of LZOP compressed / indexed files (using Kevin Weil's package), and I have one map task that always fails in what looks like the same place as described in the previous post. I haven't yet done the experimentation mentioned above (isolating the input file corresponding to the failed map task, decompressing it / recompressing it, testing it directly on local disk instead of HDFS, etc). However, since I am crashing in exactly the same place, it seems likely this is related, and I thought I'd check on your work in the meantime. FYI, my stack trace is below:

2010-04-05 18:15:16,895 FATAL org.apache.hadoop.mapred.TaskTracker: Error running child : java.lang.InternalError: lzo1x_decompress_safe returned:
at com.hadoop.compression.lzo.LzoDecompressor.decompressBytesDirect(Native Method)
at com.hadoop.compression.lzo.LzoDecompressor.decompress(LzoDecompressor.java:303)
at com.hadoop.compression.lzo.LzopDecompressor.decompress(LzopDecompressor.java:104)
at com.hadoop.compression.lzo.LzopInputStream.decompress(LzopInputStream.java:223)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:74)
at java.io.InputStream.read(InputStream.java:85)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:134)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:187)
at com.hadoop.mapreduce.LzoLineRecordReader.nextKeyValue(LzoLineRecordReader.java:126)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.Child.main(Child.java:170)

Any update much appreciated, Alex
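One way to do the isolation Alex mentions is to decompress the suspect file end-to-end with the same codec outside MapReduce. A rough sketch, assuming the hadoop-lzo native library is on java.library.path; the class name and this standalone-check approach are mine, not from the thread:

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.hadoop.conf.Configuration;
import com.hadoop.compression.lzo.LzopCodec;

// Hypothetical standalone check: decompress one .lzo file from local disk
// to see whether the file itself, rather than the MR job, is at fault.
public class LzoFileCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    LzopCodec codec = new LzopCodec();
    codec.setConf(conf);  // the codec needs a Configuration for its buffers

    InputStream raw = new BufferedInputStream(new FileInputStream(args[0]));
    InputStream in = codec.createInputStream(raw);

    byte[] buf = new byte[64 * 1024];
    long total = 0;
    int n;
    while ((n = in.read(buf)) != -1) {
      total += n;  // a corrupt block should throw here, isolating the bad file
    }
    in.close();
    System.out.println("decompressed " + total + " bytes OK");
  }
}

If the file decompresses cleanly on local disk but not through HDFS, that points back at the read path rather than the data.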
Jetty can't start the SelectChannelConnector
Hi all, I configured Hadoop on a cluster and the NameNode and JobTracker are running OK, but the DataNode and TaskTracker don't start; they hang when they get to starting Jetty. I observed that Jetty can't start the _SelectChannelConnector_. Is there any Jetty configuration that should be changed? There is no log message in the NN and JT when I try to start the DN and TT.

The kernel I'm using is: Linux bl05 2.6.32.10 #2 SMP Tue Apr 6 12:33:42 BRT 2010 x86_64 GNU/Linux

This is the output when I start the DN. It happens with the TT too.

ram...@bl05:~/hadoop-0.20.1+169.56$ ./bin/hadoop datanode
10/04/06 16:24:14 INFO datanode.DataNode: STARTUP_MSG:
/
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = bl05.ctinfra.ufpr.br/192.168.1.115
STARTUP_MSG: args = []
STARTUP_MSG: version = 0.20.1+169.56
STARTUP_MSG: build = -r 8e662cb065be1c4bc61c55e6bff161e09c1d36f3; compiled by 'chad' on Tue Feb 2 13:27:17 PST 2010
/
10/04/06 16:24:14 INFO datanode.DataNode: Registered FSDatasetStatusMBean
10/04/06 16:24:14 INFO datanode.DataNode: Opened info server at 50010
10/04/06 16:24:14 INFO datanode.DataNode: Balancing bandwith is 1048576 bytes/s
10/04/06 16:24:14 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
10/04/06 16:24:14 INFO http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50075
10/04/06 16:24:14 INFO http.HttpServer: listener.getLocalPort() returned 50075 webServer.getConnectors()[0].getLocalPort() returned 50075
10/04/06 16:24:14 INFO http.HttpServer: Jetty bound to port 50075
10/04/06 16:24:14 INFO mortbay.log: jetty-6.1.14

Thanks in advance, Edson Ramiro
Re: Jetty can't start the SelectChannelConnector
Hi Todd, I'm getting this behavior in another cluster too; the same thing happens there. As I don't have jstack installed on the first cluster and I'm not the admin, I'm sending the results from the second cluster. These are the results:

[erl...@cohiba ~ ]$ jstack -l 22510
22510: well-known file is not secure
[erl...@cohiba ~ ]$ jstack -l 3836
3836: well-known file is not secure

The jstack -F result is in the thread_dump files and the jstack -m result is in the java_native_frames files. The files ending with nn are the namenode results and the files ending with dn are the datanode results. Thanks, Edson Ramiro

On 6 April 2010 18:19, Todd Lipcon t...@cloudera.com wrote:

Hi Edson, Can you please run jstack on the daemons in question and paste the output here? -Todd
Re: Jetty can't start the SelectChannelConnector
Hi Edson, Your attachments did not come through - can you put them on pastebin? -Todd

--
Todd Lipcon
Software Engineer, Cloudera
Re: Jetty can't start the SelectChannelConnector
Thanks Todd, but could you please explain why entropy is important? Edson Ramiro

On 6 April 2010 20:09, Todd Lipcon t...@cloudera.com wrote:

Not enough entropy on your system - you need to generate entropy or fake some using this technique: http://www.chrissearle.org/blog/technical/increase_entropy_26_kernel_linux_box -Todd

On Tue, Apr 6, 2010 at 4:05 PM, Edson Ramiro erlfi...@gmail.com wrote:

ok,

[erl...@cohiba ~ ]$ cat java_native_frames_dn
Attaching to process ID 3836, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 11.2-b01
Deadlock Detection:
No deadlocks found.
- 3864 -
0xf775a4f1  __libc_read + 0x41
0xf702298c  readBytes + 0xdc
0xf701e717  Java_java_io_FileInputStream_readBytes + 0x47
0xf400b4aa  * java.io.FileInputStream.readBytes(byte[], int, int) bci:0 (Interpreted frame)
0xf4003f69  * java.io.FileInputStream.read(byte[], int, int) bci:4 line:199 (Interpreted frame)
0xf4003f69  * java.io.BufferedInputStream.read1(byte[], int, int) bci:39 line:256 (Interpreted frame)
0xf4003f69  * java.io.BufferedInputStream.read(byte[], int, int) bci:49 line:317 (Interpreted frame)
0xf4003f69  * java.io.BufferedInputStream.fill() bci:175 line:218 (Interpreted frame)
0xf400408d  * java.io.BufferedInputStream.read1(byte[], int, int) bci:44 line:258 (Interpreted frame)
0xf4003f69  * java.io.BufferedInputStream.read(byte[], int, int) bci:49 line:317 (Interpreted frame)
0xf4003f69  * sun.security.provider.SeedGenerator$URLSeedGenerator.getSeedByte() bci:12 line:453 (Interpreted frame)
0xf4003e61  * sun.security.provider.SeedGenerator.getSeedBytes(byte[]) bci:11 line:123 (Interpreted frame)
0xf400408d  * sun.security.provider.SeedGenerator.generateSeed(byte[]) bci:4 line:118 (Interpreted frame)
0xf400408d  * sun.security.provider.SecureRandom.engineGenerateSeed(int) bci:5 line:114 (Interpreted frame)
0xf4003f27  * sun.security.provider.SecureRandom.engineNextBytes(byte[]) bci:40 line:171 (Interpreted frame)
0xf400408d  * java.security.SecureRandom.nextBytes(byte[]) bci:5 line:433 (Interpreted frame)
0xf400408d  * java.security.SecureRandom.next(int) bci:17 line:455 (Interpreted frame)
0xf4003f69  * java.util.Random.nextLong() bci:3 line:284 (Interpreted frame)
0xf4003fab  * org.mortbay.jetty.servlet.HashSessionIdManager.doStart() bci:73 line:139 (Interpreted frame)
0xf400408d  * org.mortbay.component.AbstractLifeCycle.start() bci:31 line:50 (Interpreted frame)
0xf4004569  * org.mortbay.jetty.servlet.AbstractSessionManager.doStart() bci:96 line:168 (Interpreted frame)
0xf400408d  * org.mortbay.jetty.servlet.HashSessionManager.doStart() bci:12 line:67 (Interpreted frame)
0xf400408d  * org.mortbay.component.AbstractLifeCycle.start() bci:31 line:50 (Interpreted frame)
0xf4004569  * org.mortbay.jetty.servlet.SessionHandler.doStart() bci:4 line:115 (Interpreted frame)
0xf400408d  * org.mortbay.component.AbstractLifeCycle.start() bci:31 line:50 (Interpreted frame)
0xf4004569  * org.mortbay.jetty.handler.HandlerWrapper.doStart() bci:11 line:130 (Interpreted frame)
0xf400408d  * org.mortbay.jetty.handler.ContextHandler.startContext() bci:1 line:537 (Interpreted frame)
0xf400408d  * org.mortbay.jetty.servlet.Context.startContext() bci:1 line:136 (Interpreted frame)
0xf400408d  * org.mortbay.jetty.webapp.WebAppContext.startContext() bci:123 line:1234 (Interpreted frame)
0xf400408d  * org.mortbay.jetty.handler.ContextHandler.doStart() bci:140 line:517 (Interpreted frame)
0xf400408d  * org.mortbay.jetty.webapp.WebAppContext.doStart() bci:170 line:460 (Interpreted frame)
0xf400408d  * org.mortbay.component.AbstractLifeCycle.start() bci:31 line:50 (Interpreted frame)
0xf4004569  * org.mortbay.jetty.handler.HandlerCollection.doStart() bci:32 line:152 (Interpreted frame)
0xf400408d  * org.mortbay.jetty.handler.ContextHandlerCollection.doStart() bci:5 line:156 (Interpreted frame)
0xf400408d  * org.mortbay.component.AbstractLifeCycle.start() bci:31 line:50 (Interpreted frame)
0xf4004569  * org.mortbay.jetty.handler.HandlerWrapper.doStart() bci:11 line:130 (Interpreted frame)
0xf400408d  * org.mortbay.jetty.Server.doStart() bci:201 line:222 (Interpreted frame)
0xf400408d  * org.mortbay.component.AbstractLifeCycle.start() bci:31 line:50 (Interpreted frame)
0xf400408d  * org.apache.hadoop.http.HttpServer.start() bci:383 line:461 (Interpreted frame)
0xf400408d  * org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(org.apache.hadoop.conf.Configuration, java.util.AbstractList) bci:916 line:375 (Interpreted frame)
0xf400408d  * org.apache.hadoop.hdfs.server.datanode.DataNode.init(org.apache.hadoop.conf.Configuration, java.util.AbstractList) bci:158 line:216
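On the entropy question: the dump above shows Jetty's HashSessionIdManager seeding a SecureRandom, whose seed generator reads from the kernel's blocking entropy source (typically /dev/random on Linux). On a headless node with little keyboard, mouse, or interrupt activity the entropy pool can stay empty, so that read never returns, which is why the DataNode appears to hang at Jetty startup. A quick way to confirm this independently of Hadoop (hypothetical class name):

import java.security.SecureRandom;

// If this takes many seconds, or never returns, the JVM's default seed
// source is starved of entropy - the same wait the jstack dump shows.
public class EntropyCheck {
  public static void main(String[] args) {
    long start = System.currentTimeMillis();
    byte[] seed = new SecureRandom().generateSeed(16);  // blocks until enough entropy
    System.out.println("got " + seed.length + " seed bytes in "
        + (System.currentTimeMillis() - start) + " ms");
  }
}

If it stalls, the usual remedies are the technique Todd links (feeding the entropy pool) or pointing the JVM's securerandom.source / java.security.egd setting at /dev/urandom.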
Re: Reducer-side join example
Hi, Your question has an academic sound, so I'll give it an academic answer ;). Unfortunately, there are not really any good generalized (i.e., cross-joining a large matrix with a large matrix) methods for doing joins in map-reduce. The fundamental reason for this is that in the general case you're comparing everything to everything, and so for each pair of possible rows, you must actually generate each pair of rows. This means every node ships all its data to every other node, no matter what (in the general case). I bring this up not because you're looking to optimize cross joining, but because it demonstrates the point that you will exploit the characteristics of your data no matter what strategy you choose, and each will have domain-specific flaws and advantages.

The typical strategy for a reduce-side join is to use Hadoop's sorting functionality to group rows by their keys, such that the entire data set for a particular key will be resident on a single reducer. The key insight is that you're thinking about the join as a sorting problem. Yes, this means you risk producing data sets that fill your reducers, but that's a trade-off you accept to reduce the complexity of the original problem. If the existing join framework in Hadoop (whose javadocs are quite thorough) is inadequate, you shouldn't be afraid to invent, implement, and test join strategies that are specific to your domain.
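To make the reduce-side strategy described above concrete, a minimal sketch in plain Java (0.20 API): the mapper tags each record with its source file and the reducer buffers one side before emitting the cross product per key. The class names, the fileA/fileB naming convention, and the assumption of tab-separated input are illustrative; this is not the contrib/DataJoin package.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// Map side: key on the join field and tag each record with the file it came from.
public class JoinMapper extends Mapper<LongWritable, Text, Text, Text> {
  private String tag;  // "A" or "B", derived from the input file name

  @Override
  protected void setup(Context context) {
    String file = ((FileSplit) context.getInputSplit()).getPath().getName();
    tag = file.startsWith("fileA") ? "A" : "B";  // assumed naming convention
  }

  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    String[] fields = line.toString().split("\t", 2);  // Field1 <tab> rest-of-record
    if (fields.length == 2) {
      context.write(new Text(fields[0]), new Text(tag + "\t" + fields[1]));
    }
  }
}

// Reduce side: all rows for one join key arrive at the same reducer;
// buffer the A rows, then emit the cross product with the B rows.
class JoinReducer extends Reducer<Text, Text, Text, Text> {
  @Override
  protected void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    List<String> aRows = new ArrayList<String>();
    List<String> bRows = new ArrayList<String>();
    for (Text v : values) {
      String s = v.toString();
      if (s.startsWith("A\t")) aRows.add(s.substring(2));
      else bRows.add(s.substring(2));
    }
    for (String a : aRows) {
      for (String b : bRows) {
        context.write(key, new Text(a + "\t" + b));
      }
    }
  }
}

With many columns you would typically keep the whole remainder of the record as the tagged value rather than enumerate fields, and which side you buffer (or whether you stream one side with a secondary sort) depends on your data, as the reply above notes.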
Cluster in Safe Mode
Hey all, I've a 2 node cluster which is now running in Safe Mode. It's been 15-16 hrs now and it has yet to come out of Safe Mode. Does it normally take that long? The DataNode log on the node running the NameNode shows the following output; the slave node (running only a DataNode) shows much the same.

2010-04-07 10:03:10,687 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_-310922324774702076_996024
2010-04-07 10:03:10,705 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_3302288729849061244_813694
2010-04-07 10:03:10,730 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_-7252548330326272479_1259723
2010-04-07 10:03:10,745 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_-5909954202848831867_1075933
2010-04-07 10:03:10,886 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_-3213723859645738103_1075939
2010-04-07 10:03:10,910 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_-2209269106581706132_676390
2010-04-07 10:03:10,923 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_-6007998488187910667_676379
2010-04-07 10:03:11,086 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_-1024215056075897357_676383
2010-04-07 10:03:11,127 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_3780597313184168671_1270304
2010-04-07 10:03:11,160 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_8891623760013835158_676336

One thing I wanted to point out is that some time back I had to do a setrep on the entire cluster; are these verification messages related to that?

Also, while going through the NameNode logs I encountered the following:

2010-04-05 21:01:31,383 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.heartbeatCheck: lost heartbeat from 192.168.100.21:50010
2010-04-05 21:01:49,240 INFO org.apache.hadoop.net.NetworkTopology: Removing a node: /default-rack/192.168.100.21:50010
2010-04-05 21:01:49,243 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.heartbeatCheck: lost heartbeat from 192.168.100.2:50010
2010-04-05 21:02:01,791 INFO org.apache.hadoop.net.NetworkTopology: Removing a node: /default-rack/192.168.100.2:50010

then again at:

2010-04-06 06:41:56,290 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.heartbeatCheck: lost heartbeat from 192.168.100.21:50010
2010-04-06 06:41:56,290 INFO org.apache.hadoop.net.NetworkTopology: Removing a node: /default-rack/192.168.100.21:50010
2010-04-06 06:41:56,290 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.heartbeatCheck: lost heartbeat from 192.168.100.2:50010
2010-04-06 06:41:56,290 INFO org.apache.hadoop.net.NetworkTopology: Removing a node: /default-rack/192.168.100.2:50010

I had to restart the cluster, after which I got both the nodes back:

2010-04-06 10:11:24,325 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.registerDatanode: node registration from 192.168.100.21:50010 storage DS-455083797-192.168.100.21-50010-1268220157729
2010-04-06 10:11:24,328 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/192.168.100.21:50010
2010-04-06 10:11:25,245 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.allocateBlock: /data/listing/image/5/84025/35924c87e664a43893904effbd2be601_list.jpg. blk_-1845977707636580795_1665561
2010-04-06 10:11:25,342 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 192.168.100.21:50010 is added to blk_-1845977707636580795_1665561 size 72753
2010-04-06 10:11:44,257 INFO org.apache.hadoop.fs.FSNamesystem: Number of transactions: 64 Total time for transactions(ms): 4 Number of syncs: 45 SyncTimes(ms): 387
2010-04-06 10:11:51,485 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.registerDatanode: node registration from 192.168.100.2:50010 storage DS-1237294752-192.168.100.2-50010-1252010614375
2010-04-06 10:11:51,488 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/192.168.100.2:50010

Then again subsequently they were removed. No clue why this happened. Ever since, I'm seeing the following in the logs:

2010-04-06 10:00:49,052 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 54310, call create(/data/listing/image/4/43734/5af88437f6c6a88d62c5f900b06ab8dd_high.jpg, rwxr-xr-x, DFSClient_1226879860, true, 2, 67108864) from 192.168.100.5:40437: error: org.apache.hadoop.dfs.SafeModeException: Cannot create file/data/listing/image/4/43734/5af88437f6c6a88d62c5f900b06ab8dd_high.jpg. Name node is in safe mode. The ratio of reported blocks 0. has not reached the threshold 0.9990. Safe mode will be turned off automatically.
org.apache.hadoop.dfs.SafeModeException: Cannot create file/data/listing/image/4/43734/5af88437f6c6a88d62c5f900b06ab8dd_high.jpg. Name node is in safe mode. The ratio of reported blocks 0. has not reached the threshold 0.9990. Safe
Re: Cluster in Safe Mode
Looks like all your data nodes are down. Please make sure your data nodes are up and running (check from the NameNode web UI and by running jps on the data nodes). fsck is showing that there are 0 minimally replicated files and the average block replication is 0. Also, please verify whether your data nodes' data dir has any blocks. - Ravi

On 4/6/10 10:16 PM, Manish N m1n...@gmail.com wrote:

CORRUPT FILES: 1601525
MISSING BLOCKS: 1601927
MISSING SIZE: 540525108291 B
CORRUPT BLOCKS: 1601927
Minimally replicated blocks: 0 (0.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 2
Average block replication: 0.0
Corrupt blocks: 1601927
RE: Cluster in Safe Mode
Hi Manish, Do you see any errors in the DataNode log files? It is quite likely that after the NameNode starts, the DataNode processes are failing to start, causing the NameNode to wait in safe mode for the DataNode services to come up. Thanks, Sagar

-Original Message-
From: Manish N [mailto:m1n...@gmail.com]
Sent: Wednesday, April 07, 2010 10:47 AM
To: common-user@hadoop.apache.org
Subject: Cluster in Safe Mode