Re: Logging from the job
Where are you looking for the logs? They will be available in the task logs. You can view them from the web UI, on the taskdetails.jsp page.

-Amareshwari

On 4/27/10 2:22 PM, Alexander Semenov bohtva...@gmail.com wrote:

Hi all. I'm not sure if I'm posting to the correct mailing list; please suggest the correct one if so. I need to log statements from the running job, e.g. use Apache commons logging to print debug messages in map and reduce operations. I've tuned the conf/log4j.properties file for my logging domain, but log statements are still missing in the log files and on the console. I start the job like this:

hadoop jar jar_file.jar input_dir output_dir

The job finishes gracefully but I see no logging. Any suggestions? Thanks.
Re: Logging from the job
You should use log4j rather than Apache commons logging.

On Tue, Apr 27, 2010 at 4:52 PM, Alexander Semenov bohtva...@gmail.com wrote:

Hi all. I'm not sure if I'm posting to the correct mailing list; please suggest the correct one if so. I need to log statements from the running job, e.g. use Apache commons logging to print debug messages in map and reduce operations. [...]

--
Best Regards
Jeff Zhang
Re: Logging from the job
I'm expecting to see the logs on the console, since the root logger is configured to do so.

On Tue, 2010-04-27 at 14:28 +0530, Amareshwari Sri Ramadasu wrote:

Where are you looking for the logs? They will be available in the task logs. You can view them from the web UI, on the taskdetails.jsp page. -Amareshwari [...]
Re: Logging from the job
Alexander Semenov wrote:

OK, thanks. Unfortunately ant is currently not installed on the machine running hadoop. What if I use slf4j + logback just in the job's jar?

Depends on the classpath. You can check it in your own code by getting whatever classloader you use, then calling getResource("log4j.properties") to get the URL, which you can print to the console, something like:

System.err.println("Log4J properties at " + this.getClass().getClassLoader().getResource("log4j.properties"));

BTW, is hadoop planning to migrate to this stack from the deprecated Apache commons and log4j?

Deprecated? That's very much a point of view.

1. commons-logging is a thin front end to anything; I have my own that can be switched in when I desire to get the logs from >1 machine in one place. Where it is great is in libraries which can be embedded in other things -Hadoop does get used this way- as it stops the library saying "here is the logging tool you must use": you get to choose.

2. Log4J is a fantastic logging API with some really good back-end implementations. In Hadoop, I'd recommend the rolling logs. There is good support in Hadoop for changing logging levels as you go along.

3. slf4j was meant to be a log API that avoided all the problems of layering, but it actually introduces a new one: the risk of >1 SLF4J on the classpath. And, as it's an extra JAR that Jetty requires, you have to make sure your job's SLF4J doesn't clash with the one used for Jetty. And its back ends aren't on a par with Log4J's.

I'm happy with commons-logging and log4j; I just wish that Jetty had stayed with commons-logging. The alternative would be the java.util.logging APIs, which are themselves a dog to configure. You need to point to the right config file using JVM system properties, which really need to be set on the command line to get picked up early enough, and as the docs say: "By default, the LogManager reads its initial configuration from a properties file lib/logging.properties in the JRE directory." This isn't as bad as having a commons-logging or log4j.properties file in the JAR of a third-party library, but it means that, by default, you get one logging setup per JVM unless you deliberately configure each app differently. Which is a PITA, and probably one of the reasons the java logging API never took off and doesn't have as good back-end loggers as Log4J.

Summary: try to find the properties file (it may just be classpath/JVM quirks), learn to use the rolling/daily log4j logs, and don't keep the logs on your root drive either.

-Steve
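A minimal sketch of logging from inside a task with commons-logging (the front end Hadoop itself uses); the class and package names here are hypothetical, and the log4j.properties line assumes the mapper lives in com.example:

import java.io.IOException;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class LoggingMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    // Named logger; enable it in conf/log4j.properties with:
    //   log4j.logger.com.example=DEBUG
    private static final Log LOG = LogFactory.getLog(LoggingMapper.class);

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // This ends up in the task's syslog under userlogs, not on the client console
        LOG.debug("map called for offset " + key);
        context.write(value, new LongWritable(1));
    }
}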
multiple reduce tasks in contrib/index
Hi all, I'm using contrib/index for text indexing. Can I have multiple reducers for index writing? For example, documents of the same type would fall onto the same reduce node.

2010-04-27
ni_jiangfeng
DataNode not able to spawn a Task
Hi guys, I see the exception below when I launch a job:

10/04/27 10:54:16 INFO mapred.JobClient: map 0% reduce 0%
10/04/27 10:54:22 INFO mapred.JobClient: Task Id : attempt_201004271050_0001_m_005760_0, Status : FAILED
Error initializing attempt_201004271050_0001_m_005760_0:
java.lang.NumberFormatException: For input string: "-"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
at java.lang.Integer.parseInt(Integer.java:476)
at java.lang.Integer.parseInt(Integer.java:499)
at org.apache.hadoop.fs.DF.parseExecResult(DF.java:125)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:179)
at org.apache.hadoop.util.Shell.run(Shell.java:134)
at org.apache.hadoop.fs.DF.getAvailable(DF.java:73)
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:329)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:751)
at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1665)
at org.apache.hadoop.mapred.TaskTracker.access$1200(TaskTracker.java:97)
at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1630)

A few things:
* I ran fsck on the namenode and no corrupted blocks were reported.
* The -report from dfsadmin says the datanode is up.
Re: DataNode not able to spawn a Task
Hi Vishal,

What operating system are you on? The TT is having issues parsing the output of df.

-Todd

On Tue, Apr 27, 2010 at 9:03 AM, vishalsant vishal.santo...@gmail.com wrote:

Hi guys, I see the exception below when I launch a job:

Error initializing attempt_201004271050_0001_m_005760_0:
java.lang.NumberFormatException: For input string: "-"
at org.apache.hadoop.fs.DF.parseExecResult(DF.java:125)
[...]

--
Todd Lipcon
Software Engineer, Cloudera
Re: DataNode not able to spawn a Task
It seems that this piece of code does a df to get the amount of free space (got this info from the IRC channel), and it is trying to do a number conversion on information returned by df:

Filesystem    1K-blocks       Used  Available Use% Mounted on
/dev/sda2    1891213200  -45291780 1838887216    - /
.
.

Of course, in my case the Use% is "-", and that is an issue :)

BTW, this datanode had stopped responding. It is always a good idea to run df like this, to make sure that this does not happen during job execution, and probably even as part of ./hadoop dfsadmin -report. Will close the thread when this is resolved with the disk issue (which it seems to be).

vishalsant wrote:

Hi guys, I see the exception below when I launch a job:

Error initializing attempt_201004271050_0001_m_005760_0:
java.lang.NumberFormatException: For input string: "-"
at org.apache.hadoop.fs.DF.parseExecResult(DF.java:125)
[...]
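For reference, a standalone sketch of the failure mode (a hypothetical demo, not Hadoop's actual DF code): DF.parseExecResult parses the numeric columns of the df output, so a "-" placeholder blows up exactly like the trace above:

public class DfParseDemo {
    public static void main(String[] args) {
        // df printed "-" in a numeric column on the broken filesystem;
        // parsing it as an integer fails the same way the TaskTracker did:
        int value = Integer.parseInt("-");
        // -> java.lang.NumberFormatException: For input string: "-"
    }
}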
Re: Reducer ID
Thanks!

2010/4/26 Amareshwari Sri Ramadasu amar...@yahoo-inc.com

context.getTaskAttemptID() gives the task attempt id, and context.getTaskAttemptID().getTaskID() gives the task id of the reducer. context.getTaskAttemptID().getTaskID().getId() gives the reducer number.

Thanks
Amareshwari

On 4/27/10 5:34 AM, Gang Luo lgpub...@yahoo.com.cn wrote:

JobConf.get("mapred.task.id") gives you everything (including the attempt id).

-Gang

----- Original Message -----
From: Farhan Husain farhan.hus...@csebuet.org
To: common-user@hadoop.apache.org
Sent: 2010/4/26 (Mon) 7:13:03 PM
Subject: Reducer ID

Hello,

Is it possible to know the unique id of a reducer inside the reduce or setup method of a reducer class? I tried to find any method of the context class which might help in this regard but could not get any.

Thanks,
Farhan
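Put together, a minimal sketch with the new (org.apache.hadoop.mapreduce) API; the key/value types are placeholders:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.TaskAttemptID;
import org.apache.hadoop.mapreduce.TaskID;

public class IdAwareReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        TaskAttemptID attemptId = context.getTaskAttemptID();  // e.g. attempt_201004271050_0001_r_000003_0
        TaskID taskId = attemptId.getTaskID();                 // e.g. task_201004271050_0001_r_000003
        int reducerNumber = taskId.getId();                    // e.g. 3
        System.err.println("I am reducer #" + reducerNumber);
    }
}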
Re: Logging from the job
Hi Alexander,

Where are you looking for the logs? The output of the tasks should be in $HADOOP_LOG_DIR/userlogs/attempt*/{stdout,stderr,syslog}. Could you provide the exact java command line your tasks are running with? (Do 'ps -ef | grep Child' on one of the nodes while the job is running.)

Alex K

On Tue, Apr 27, 2010 at 1:52 AM, Alexander Semenov bohtva...@gmail.com wrote:

Hi all. I'm not sure if I'm posting to the correct mailing list; please suggest the correct one if so. [...]
Hadoop Developer opening San Diego or Los Angeles
My name is Larry Mills and I am conducting a search for two Hadoop/Java developers in San Diego or Los Angeles. Please see the job description below.

Hadoop/Java Developers Needed
San Diego or Los Angeles, California

Position Summary
We are seeking two Java/Hadoop developers who will be as passionate about our product as we are. If you enjoy pushing the envelope of internet technology to deliver next-generation eCommerce solutions and you meet these qualifications, we want to talk to you.

Requirements/Qualifications:
* 3+ years Java development
* Hadoop (HDFS and MapReduce) development training or experience
* Passion for cutting-edge technologies
* Excellent communication and verbal skills
* Must thrive in a fast-paced, small, team-oriented environment
* Bachelor's degree in Computer Science or a related field preferred
* Minimum of 3 years professional development experience

What We Can Offer You!
* Competitive Wages
* Medical Benefits
* Dental Benefits
* 401(k) Plan
* Vision

Please contact Larry Mills at 720.339.1361 or email lmi...@knowledgerecruiters.com to further explore this opportunity.

Sincerely,

Larry Mills
Managing Partner
Knowledge Recruiters
8547 East Arapahoe Road, J 254
Greenwood Village, Colorado 80112
Phone: 720-339-1361
lmi...@knowledgerecruiters.com
www.knowledgerecruiters.com
http://www.linkedin.com/pub/larry-mills/2/35b/279
User defined class as Map/Reduce output value
Hello,

I want to output a class which I have written as the value of the map phase. The obvious way is to implement the Writable interface, but the problem is the class has other classes as its member properties. The DataInput and DataOutput interfaces used by the read and write methods of the Writable class do not support object serialization. Is there any other way I can achieve this?

Thanks,
Farhan
Re: User defined class as Map/Reduce output value
Take a look at the sample given in the Javadoc of Writable.java. You need to serialize your data yourself:

@Override
public void readFields(DataInput in) throws IOException {
    h = Text.readString(in);
    sc = in.readFloat();
    ran = in.readInt();
}

On Tue, Apr 27, 2010 at 10:53 AM, Farhan Husain farhan.hus...@csebuet.org wrote:

Hello, I want to output a class which I have written as the value of the map phase. The obvious way is to implement the Writable interface, but the problem is the class has other classes as its member properties. [...]
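A minimal sketch of that pattern for a value class whose members are themselves classes (all names here are hypothetical): make each member class Writable too, and delegate to its write/readFields, reading fields back in exactly the order they were written:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

// Hypothetical member class: serializes its own fields.
class Inner implements Writable {
    private String name;
    private int count;

    public void write(DataOutput out) throws IOException {
        Text.writeString(out, name);
        out.writeInt(count);
    }

    public void readFields(DataInput in) throws IOException {
        name = Text.readString(in);
        count = in.readInt();
    }
}

// Outer value type: delegates to the nested Writable member.
public class Outer implements Writable {
    private Inner inner = new Inner();
    private float score;

    public void write(DataOutput out) throws IOException {
        inner.write(out);
        out.writeFloat(score);
    }

    public void readFields(DataInput in) throws IOException {
        inner.readFields(in);  // must mirror the order used in write()
        score = in.readFloat();
    }
}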
Jetty Exceptions in the DataNode log related to task failure ?
Not sure what is happening here, in the sense: is this critical?

2010-04-27 14:51:47,334 WARN org.mortbay.log: Committed before 410 getMapOutput(attempt_201004271342_0001_m_001281_0,49) failed :
org.mortbay.jetty.EofException
at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:787)
at org.mortbay.jetty.AbstractGenerator$Output.flush(AbstractGenerator.java:566)
at org.mortbay.jetty.HttpConnection$Output.flush(HttpConnection.java:946)
at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:646)
at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:577)
at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2943)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:324)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
Caused by: java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcher.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:104)
at sun.nio.ch.IOUtil.write(IOUtil.java:75)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
at org.mortbay.io.nio.ChannelEndPoint.flush(ChannelEndPoint.java:169)
at org.mortbay.io.nio.SelectChannelEndPoint.flush(SelectChannelEndPoint.java:221)
at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:721)
... 23 more

I had read that the status of a task is passed on to the jobtracker over http. Is that true? I see tasks killed because of expiry, even though the DataNode seems to be alive and kicking (except for the above exception). Is there any relation?
Re: User defined class as Map/Reduce output value
Can I use the Serializable interface? Alternatively, is there any way to specify an OutputFormat for mappers like we can do for reducers?

Thanks,
Farhan

On Tue, Apr 27, 2010 at 1:19 PM, Ted Yu yuzhih...@gmail.com wrote:

Take a look at the sample given in the Javadoc of Writable.java. You need to serialize your data yourself: [...]
Questions on MultithreadedMapper
Hi,

I've decided to refactor some of my Hadoop jobs and implement them using MultithreadedMapper.class, but I got puzzled by some unexpected error messages at run time. Here are some relevant settings for my Hadoop cluster:

mapred.tasktracker.map.tasks.maximum = 1
mapred.tasktracker.reduce.tasks.maximum = 1
mapred.job.reuse.jvm.num.tasks = -1
mapred.map.multithreadedrunner.threads = 4

I'd like to know how threads are used to run the map task in a single JVM (correct me if this is wrong). Suppose I've got a sample Mapper class like this:

class Mapper ... {
    MyObject A;
    static MyObject B;

    setup() {
        Configuration conf = context.getConfiguration();
        A.initialize(conf);
        B.initialize(conf);
    }

    map() { ... }

    cleanup() { ... }
}

Does each thread run all three of the setup(), map(), and cleanup() methods? Or are setup() and cleanup() run once per task (and thus per JVM, according to my settings), so that map() is the only multithreaded function?

Also, are the objects A and B shared among different threads, or does each thread have its own copy of them? My initial guess was that each thread would have a separate copy of A, and B would be shared among the 4 threads running on the same box since it is defined as static, but it appears to me that this assumption is not correct and A seems to be shared.

Thanks,
Jim
Re: User defined class as Map/Reduce output value
I tried to use a class which implements the Serializable interface and got the following error:

java.lang.NullPointerException
at org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:759)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:487)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:575)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.Child.main(Child.java:170)

On Tue, Apr 27, 2010 at 12:53 PM, Farhan Husain farhan.hus...@csebuet.org wrote:

Hello, I want to output a class which I have written as the value of the map phase. The obvious way is to implement the Writable interface, but the problem is the class has other classes as its member properties. [...]
Output pair in Mapper.cleanup method
Hello,

Is it possible to output a key/value pair in the Mapper.cleanup method, since the Mapper.Context object is still available there?

Thanks,
Farhan
Re: Output pair in Mapper.cleanup method
Yes. It's a common pattern to buffer some amount of data in the map() method, flushing every N records, and then to flush any remaining records in the cleanup() method.

On Tue, Apr 27, 2010 at 6:57 PM, Farhan Husain farhan.hus...@csebuet.org wrote:

Hello, Is it possible to output a key/value pair in the Mapper.cleanup method, since the Mapper.Context object is still available there? Thanks, Farhan

--
Eric Sammer
phone: +1-917-287-2675
twitter: esammer
data: www.cloudera.com
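A minimal sketch of that buffer-and-flush pattern (the value types and the flush threshold are illustrative):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class BufferingMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    private static final int FLUSH_EVERY = 1000;  // illustrative threshold
    private final List<String> buffer = new ArrayList<String>();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        buffer.add(value.toString());
        if (buffer.size() >= FLUSH_EVERY) {
            flush(context);
        }
    }

    @Override
    protected void cleanup(Context context)
            throws IOException, InterruptedException {
        flush(context);  // emit whatever is left; the context is still valid here
    }

    private void flush(Context context) throws IOException, InterruptedException {
        for (String s : buffer) {
            context.write(new Text(s), new LongWritable(1));
        }
        buffer.clear();
    }
}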
Re: Output pair in Mapper.cleanup method
Thanks Eric.

On Tue, Apr 27, 2010 at 6:19 PM, Eric Sammer esam...@cloudera.com wrote:

Yes. It's a common pattern to buffer some amount of data in the map() method, flushing every N records, and then to flush any remaining records in the cleanup() method. [...]
Re: User defined class as Map/Reduce output value
Can you try adding 'org.apache.hadoop.io.serializer.JavaSerialization,' to the following config?

C:\hadoop-0.20.2\src\core\core-default.xml(87,9): <name>io.serializations</name>

By default, only org.apache.hadoop.io.serializer.WritableSerialization is included.

On Tue, Apr 27, 2010 at 3:55 PM, Farhan Husain farhan.hus...@csebuet.org wrote:

I tried to use a class which implements the Serializable interface and got the following error:

java.lang.NullPointerException
at org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73)
[...]
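A minimal sketch of what that override could look like in core-site.xml (untested against your cluster; WritableSerialization is kept so normal Writables keep working):

<property>
  <name>io.serializations</name>
  <value>org.apache.hadoop.io.serializer.WritableSerialization,org.apache.hadoop.io.serializer.JavaSerialization</value>
</property>

With JavaSerialization registered, the SerializationFactory should be able to find a serializer for a java.io.Serializable map output value, which should avoid the NullPointerException above.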