In Cygwin, Hadoop throws an exception when running wordcount.
Env: Windows 7 + Cygwin 1.7.11-1 + JDK 1.6.0_31 + Hadoop 1.0.0. So far, standalone mode is OK. But in pseudo-distributed or cluster mode, the wordcount example always throws errors. HDFS works fine, but the tasktracker cannot spawn child JVMs for new jobs, and the directory under /logs/userlogs/job-/attempt-/ stays empty. The tasktracker error log looks like:

12/03/28 14:35:13 INFO mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_201203280212_0005_m_-1386636958
12/03/28 14:35:13 INFO mapred.JvmManager: JVM Runner jvm_201203280212_0005_m_-1386636958 spawned.
12/03/28 14:35:17 INFO mapred.JvmManager: JVM Not killed jvm_201203280212_0005_m_-1386636958 but just removed
12/03/28 14:35:17 INFO mapred.JvmManager: JVM : jvm_201203280212_0005_m_-1386636958 exited with exit code -1. Number of tasks it ran: 0
12/03/28 14:35:17 WARN mapred.TaskRunner: attempt_201203280212_0005_m_02_0 : Child Error
java.io.IOException: Task process exit with nonzero status of -1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)
12/03/28 14:35:21 INFO mapred.TaskTracker: addFreeSlot : current free slots : 2
12/03/28 14:35:24 INFO mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201203280212_0005_m_02_1 task's state:UNASSIGNED
12/03/28 14:35:24 INFO mapred.TaskTracker: Trying to launch : attempt_201203280212_0005_m_02_1 which needs 1 slots
12/03/28 14:35:24 INFO mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_201203280212_0005_m_02_1 which needs 1 slots
12/03/28 14:35:24 WARN mapred.TaskLog: Failed to retrieve stdout log for task: attempt_201203280212_0005_m_02_0
java.io.FileNotFoundException: D:\cygwin\home\timwu\hadoop-1.0.0\logs\userlogs\job_201203280212_0005\attempt_201203280212_0005_m_02_0\log.index (The system cannot find the path specified)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:120)
    at org.apache.hadoop.io.SecureIOUtils.openForRead(SecureIOUtils.java:102)
    at org.apache.hadoop.mapred.TaskLog.getAllLogsFileDetails(TaskLog.java:188)
    at org.apache.hadoop.mapred.TaskLog$Reader.<init>(TaskLog.java:423)
    at org.apache.hadoop.mapred.TaskLogServlet.printTaskLog(TaskLogServlet.java:81)
    at org.apache.hadoop.mapred.TaskLogServlet.doGet(TaskLogServlet.java:296)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
    at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:835)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
12/03/28 14:35:24 WARN mapred.TaskLog: Failed to retrieve stderr log for task: attempt_201203280212_0005_m_02_0
java.io.FileNotFoundException: D:\cygwin\home\timwu\hadoop-1.0.0\logs\userlogs\job_201203280212_0005\attempt_201203280212_0005_m_02_0\log.index (The system cannot find the path specified)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:120)
    at org.apache.hadoop.io.SecureIOUtils.openForRead(SecureIOUtils.java:102)
    at org.apache.hadoop.mapred.TaskLog.getAllLogsFileDetails(TaskLog.java:188)
    at org.apache.hadoop.mapred.TaskLog$Reader.<init>(TaskLog.java:423)
    at org.apache.hadoop.mapred.TaskLogServlet.printTaskLog(TaskLogServlet.java:81) at
Re: Cannot renew lease for DFSClient_977492582. Name node is in safe mode in AWS
Hi Mohit, HDFS is in safe mode, which is a read-only mode. Run the following command to get out of safe mode:

bin/hadoop dfsadmin -safemode leave

On Thu, Mar 15, 2012 at 5:54 AM, Mohit Anchlia mohitanch...@gmail.com wrote:

When I run a client to create files in Amazon HDFS I get this error. Does anyone know what it really means and how to resolve this?

---
2012-03-14 23:16:21,414 INFO org.apache.hadoop.ipc.Server (IPC Server handler 46 on 9000): IPC Server handler 46 on 9000, call renewLease(DFSClient_977492582) from 10.70.150.119:47240: error: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot renew lease for DFSClient_977492582. Name node is in safe mode.
The ratio of reported blocks 1. has reached the threshold 0.9990. Safe mode will be turned off automatically in 0 seconds.
org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot renew lease for DFSClient_977492582. Name node is in safe mode.
The ratio of reported blocks 1. has reached the threshold 0.9990. Safe mode will be turned off automatically in 0 seconds.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewLease(FSNamesystem.java:2296)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.renewLease(NameNode.java:814)
    at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)

--
https://github.com/zinnia-phatak-dev/Nectar
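As an aside, dfsadmin has a few safemode subcommands worth knowing; these are run from the Hadoop install root and obviously need a running cluster:

```shell
# Report whether the NameNode is currently in safe mode
bin/hadoop dfsadmin -safemode get

# Block until the NameNode leaves safe mode on its own
# (useful in scripts instead of forcing it out)
bin/hadoop dfsadmin -safemode wait

# Force the NameNode out of safe mode immediately
bin/hadoop dfsadmin -safemode leave
```

Note that in the log above the reported-block ratio had already reached the threshold and safe mode was about to turn off automatically, so `-safemode wait` would also have worked here.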
Re: dynamic mapper?
Hi, You can use the Java compiler APIs (javax.tools) to compile custom Java code and create jars. For example, look at this code from Sqoop:

/**
 * Licensed to Cloudera, Inc. under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. Cloudera, Inc. licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package com.cloudera.sqoop.orm;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarOutputStream;
import java.util.zip.ZipEntry;

import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.ToolProvider;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.mapred.JobConf;

import com.cloudera.sqoop.SqoopOptions;
import com.cloudera.sqoop.util.FileListing;
import com.cloudera.sqoop.util.Jars;

/**
 * Manages the compilation of a bunch of .java files into .class files
 * and eventually a jar.
 *
 * Also embeds this program's jar into the lib/ directory inside the compiled
 * jar to ensure that the job runs correctly.
 */
public class CompilationManager {

  /** If we cannot infer a jar name from a table name, etc., use this. */
  public static final String DEFAULT_CODEGEN_JAR_NAME =
      "sqoop-codegen-created.jar";

  public static final Log LOG = LogFactory.getLog(
      CompilationManager.class.getName());

  private SqoopOptions options;
  private List<String> sources;

  public CompilationManager(final SqoopOptions opts) {
    options = opts;
    sources = new ArrayList<String>();
  }

  public void addSourceFile(String sourceName) {
    sources.add(sourceName);
  }

  /**
   * Locate the hadoop-*-core.jar in $HADOOP_HOME or --hadoop-home.
   * If that doesn't work, check our classpath.
   * @return the filename of the hadoop-*-core.jar file.
   */
  private String findHadoopCoreJar() {
    String hadoopHome = options.getHadoopHome();

    if (null == hadoopHome) {
      LOG.info("$HADOOP_HOME is not set");
      return Jars.getJarPathForClass(JobConf.class);
    }

    if (!hadoopHome.endsWith(File.separator)) {
      hadoopHome = hadoopHome + File.separator;
    }

    File hadoopHomeFile = new File(hadoopHome);
    LOG.info("HADOOP_HOME is " + hadoopHomeFile.getAbsolutePath());
    File [] entries = hadoopHomeFile.listFiles();

    if (null == entries) {
      LOG.warn("HADOOP_HOME appears empty or missing");
      return Jars.getJarPathForClass(JobConf.class);
    }

    for (File f : entries) {
      if (f.getName().startsWith("hadoop-")
          && f.getName().endsWith("-core.jar")) {
        LOG.info("Found hadoop core jar at: " + f.getAbsolutePath());
        return f.getAbsolutePath();
      }
    }

    return Jars.getJarPathForClass(JobConf.class);
  }

  /**
   * Compile the .java files into .class files via embedded javac call.
   * On success, move .java files to the code output dir.
   */
  public void compile() throws IOException {
    List<String> args = new ArrayList<String>();

    // ensure that the jar output dir exists.
    String jarOutDir = options.getJarOutputDir();
    File jarOutDirObj = new File(jarOutDir);
    if (!jarOutDirObj.exists()) {
      boolean mkdirSuccess = jarOutDirObj.mkdirs();
      if (!mkdirSuccess) {
        LOG.debug("Warning: Could not make directories for " + jarOutDir);
      }
    } else if (LOG.isDebugEnabled()) {
      LOG.debug("Found existing " + jarOutDir);
    }

    // Make sure jarOutDir ends with a '/'.
    if (!jarOutDir.endsWith(File.separator)) {
      jarOutDir = jarOutDir + File.separator;
    }

    // find hadoop-*-core.jar for classpath.
    String coreJar = findHadoopCoreJar();
    if (null == coreJar) {
      // Couldn't find a core jar to insert into the CP for compilation. If,
      // however, we're running this from a unit test, then the path to the
      // .class files might be set via the hadoop.alt.classpath property
      // instead. Check there first.
      String coreClassesPath = System.getProperty("hadoop.alt.classpath");
      if (null == coreClassesPath) {
        // no -- we're out of options. Fail.
        throw new IOException("Could not find hadoop core jar!");
      } else {
        coreJar = coreClassesPath;
      }
    }

    // find sqoop jar for compilation
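If you only need the embedded-compiler piece rather than all of Sqoop's jar and classpath management, a minimal, self-contained sketch looks like the following. The class and file names (DynamicCompileDemo, HelloMapper) are made up for illustration; only the javax.tools calls are the point.

```java
import java.io.File;
import java.io.FileWriter;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class DynamicCompileDemo {
    public static void main(String[] args) throws Exception {
        // Write a trivial source file to the temp directory.
        File dir = new File(System.getProperty("java.io.tmpdir"));
        File src = new File(dir, "HelloMapper.java");
        FileWriter w = new FileWriter(src);
        w.write("public class HelloMapper {"
              + "  public String map(String s) { return s.toUpperCase(); }"
              + "}");
        w.close();

        // Invoke the compiler embedded in the JDK -- the same mechanism
        // Sqoop's CompilationManager uses. Returns 0 on success and writes
        // HelloMapper.class next to the source (-d controls the output dir).
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        int result = compiler.run(null, null, null,
            "-d", dir.getAbsolutePath(), src.getAbsolutePath());
        System.out.println(result == 0 ? "compiled OK" : "compile failed");
    }
}
```

Note that ToolProvider.getSystemJavaCompiler() returns null on a JRE without the compiler, so this must run on a full JDK; from there you can package the resulting .class files with JarOutputStream, as the Sqoop code above goes on to do.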
Why does only the tasktracker run under the cyg_server account in Cygwin?
Hi all, I noticed that when I run bin/start-all.sh in Cygwin, the namenode, datanode and jobtracker are all running under my login account, e.g. timwu in my case. Only the tasktracker runs under cyg_server, where cyg_server is the account created by ssh-host-config when I set up sshd in Cygwin. As I installed Cygwin in d:\cygwin, the namenode, datanode and jobtracker all use d:\\tmp\\hadoop-timwu as their tmp folder, and only the tasktracker uses d:\\tmp\\hadoop-cyg_server. Have I configured Hadoop correctly? Best, Tim
MapReduce on autocomplete
So I have a lot of small files on S3 that I need to consolidate, so I headed to Google to see the best way to do it in a MapReduce job. Looks like someone's got a different idea, according to Google's autocomplete: [cid:image001.jpg@01CD0D09.CDEB9E90]

**
This email and any attachments are confidential, protected by copyright and may be legally privileged. If you are not the intended recipient, then the dissemination or copying of this email is prohibited. If you have received this in error, please notify the sender by replying by email and then delete the email completely from your system. Neither Sporting Index nor the sender accepts responsibility for any virus, or any other defect which might affect any computer or IT system into which the email is received and/or opened. It is the responsibility of the recipient to scan the email and no responsibility is accepted for any loss or damage arising in any way from receipt or use of this email. Sporting Index Ltd is a company registered in England and Wales with company number 2636842, whose registered office is at Brookfield House, Green Lane, Ivinghoe, Leighton Buzzard, LU7 9ES. Sporting Index Ltd is authorised and regulated by the UK Financial Services Authority (reg. no. 150404). Any financial promotion contained herein has been issued and approved by Sporting Index Ltd. Outbound email has been scanned for viruses and SPAM
Re: where are my logging output files going to?
You don't want users actually running anything directly on the cluster. You would set up some machine to launch jobs. Essentially any sort of Linux machine where you can install Hadoop, but you don't run any jobs on it...

Sent from my iPhone

On Mar 28, 2012, at 3:30 AM, Jane Wayne jane.wayne2...@gmail.com wrote:

what do you mean by an edge node? do you mean any node that is not the master node (or NameNode or JobTracker node)?

On Wed, Mar 28, 2012 at 3:51 AM, Michel Segel michael_se...@hotmail.com wrote:

First, you really don't want to launch the job from the cluster but from an edge node. To answer your question, in a word, yes: you should keep the set of configuration files as consistent as possible, noting that over time this may not be possible as hardware configs may change.

Sent from a remote device. Please excuse any typos... Mike Segel

On Mar 27, 2012, at 8:42 PM, Jane Wayne jane.wayne2...@gmail.com wrote:

if i have a hadoop cluster of 10 nodes, do i have to modify the /hadoop/conf/log4j.properties files on ALL 10 nodes to be the same? currently, i ssh into the master node to execute a job. this node is the only place where i have modified the log4j.properties file. i notice that although my log files are being created, nothing is being written to them. when i test on cygwin, the logging works; however, when i go to a live cluster (i.e. amazon elastic mapreduce), the logging output on the master node no longer works. i wonder if logging is happening at each slave/task node? could someone explain logging or point me to the documentation discussing this issue?
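To the original question: each daemon reads its own local copy of conf/log4j.properties, so a change made only on the node you ssh into will not affect task logging on the other nodes. For reference, a minimal root-logger stanza in the style of the stock Hadoop 1.x file looks like the sketch below; check your distribution's copy for the exact appender names before relying on these.

```properties
# hadoop.root.logger is typically overridden per daemon by the startup
# scripts (e.g. via HADOOP_ROOT_LOGGER); this default sends INFO+ to console.
hadoop.root.logger=INFO,console
log4j.rootLogger=${hadoop.root.logger}

# Console appender definition
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
```

Whatever you put here has to be synced to every node (or baked into the AMI/bootstrap on Elastic MapReduce) for task-side logging to behave consistently.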
Re: MapReduce on autocomplete
Looks like your mail client (or the list) stripped away your image attachment. Could you post the image as a link from imageshack/etc. instead?

On Wed, Mar 28, 2012 at 10:10 PM, Tony Burton tbur...@sportingindex.com wrote:

So I have a lot of small files on S3 that I need to consolidate, so headed to Google to see the best way to do it in a MapReduce job. Looks like someone's got a different idea, according to Google's autocomplete:

--
Harsh J
Re: Hbase RegionServer stalls on initialization
It must be waiting for the master. Have you launched the master?

On Wed, Mar 28, 2012 at 7:40 PM, Nabib El-Rahman nabib.elrah...@tubemogul.com wrote:

Hi guys, I'm starting up a region server and it stalls on initialization. I took a thread dump and found it hanging at this spot:

regionserver60020 prio=10 tid=0x7fa90c5c4000 nid=0x4b50 in Object.wait() [0x7fa9101b4000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0xbc63b2b8> (a org.apache.hadoop.hbase.MasterAddressTracker)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:122)
        - locked <0xbc63b2b8> (a org.apache.hadoop.hbase.MasterAddressTracker)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:516)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:493)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.initialize(HRegionServer.java:461)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:560)
        at java.lang.Thread.run(Thread.java:662)

Any idea on who or what it's being blocked on?

Nabib El-Rahman | Senior Software Engineer
M: 734.846.2529
www.tubemogul.com | twitter: @nabiber
Re: Hbase RegionServer stalls on initialization
Then you should have an error in the master logs. If not, it's worth checking that the master and the region servers speak to the same ZK... As it's HBase-related, I'm redirecting the question to the hbase user mailing list (hadoop common is in bcc).

On Wed, Mar 28, 2012 at 8:03 PM, Nabib El-Rahman nabib.elrah...@tubemogul.com wrote:

The master is up. Is it possible that zookeeper might not know about it?

Nabib El-Rahman | Senior Software Engineer
M: 734.846.2529
www.tubemogul.com | twitter: @nabiber

On Mar 28, 2012, at 10:42 AM, N Keywal wrote:

It must be waiting for the master. Have you launched the master?

On Wed, Mar 28, 2012 at 7:40 PM, Nabib El-Rahman nabib.elrah...@tubemogul.com wrote:

Hi guys, I'm starting up a region server and it stalls on initialization. I took a thread dump and found it hanging at this spot:

regionserver60020 prio=10 tid=0x7fa90c5c4000 nid=0x4b50 in Object.wait() [0x7fa9101b4000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0xbc63b2b8> (a org.apache.hadoop.hbase.MasterAddressTracker)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:122)
        - locked <0xbc63b2b8> (a org.apache.hadoop.hbase.MasterAddressTracker)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:516)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:493)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.initialize(HRegionServer.java:461)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:560)
        at java.lang.Thread.run(Thread.java:662)

Any idea on who or what it's being blocked on?

Nabib El-Rahman | Senior Software Engineer
M: 734.846.2529
www.tubemogul.com | twitter: @nabiber
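One concrete thing to verify: the hbase.zookeeper.quorum setting in hbase-site.xml must name the same ZooKeeper ensemble on the master and on every region server, or the region server will sit in blockUntilAvailable forever waiting for a master address that is published in a different ZK. A typical entry looks like this (the hostnames are invented for illustration):

```xml
<configuration>
  <!-- Must list the same ensemble on the master and all region servers -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk1.example.com,zk2.example.com,zk3.example.com</value>
  </property>
  <!-- Client port must also match what the ensemble actually listens on -->
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
</configuration>
```

If the quorum strings differ between hosts, the master's address znode is simply never visible to the stalled region server, which matches the thread dump above.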
Re: activity on IRC .
Hey Jay, That's the only one I know of. Not a lot of idle chatter, but when people have questions, discussions do start up. Much more active during PST working hours, of course :) -Todd

On Wed, Mar 28, 2012 at 8:05 AM, Jay Vyas jayunit...@gmail.com wrote:

Hi guys: I notice the IRC activity is a little low. Just wondering if there's a better chat channel for Hadoop other than the official one (#hadoop on freenode)? In any case... I'm on there :) come say hi. -- Jay Vyas MMSB/UCHC

-- Todd Lipcon Software Engineer, Cloudera
Re: activity on IRC .
I get good answers on Twitter. Russell Jurney twitter.com/rjurney russell.jur...@gmail.com datasyndrome.com

On Mar 28, 2012, at 12:27 PM, Todd Lipcon t...@cloudera.com wrote:

Hey Jay, That's the only one I know of. Not a lot of idle chatter, but when people have questions, discussions do start up. Much more active during PST working hours, of course :) -Todd

On Wed, Mar 28, 2012 at 8:05 AM, Jay Vyas jayunit...@gmail.com wrote:

Hi guys: I notice the IRC activity is a little low. Just wondering if there's a better chat channel for Hadoop other than the official one (#hadoop on freenode)? In any case... I'm on there :) come say hi. -- Jay Vyas MMSB/UCHC

-- Todd Lipcon Software Engineer, Cloudera
Hadoop Roadmap
I have seen several articles, including a recent one in SD Times, on changes that are about to happen in Hadoop and significant architectural upgrades. Does anyone have a good, detailed resource to recommend where these changes are outlined, including the long-term roadmap? I am interested in this because I would like to see how other data-parallel algorithms besides M/R, and other parallel file system approaches besides HDFS, could be fitted into this vision. My team might have resources and interest to contribute. Regards, Edmon
Re: Hadoop Roadmap
Edmon, Here is a brief overview: http://hadoop.apache.org/common/docs/r0.23.1/ Ping me if you want more collateral. Thanks, Arun

On Mar 28, 2012, at 3:35 PM, Edmon Begoli wrote:

I have seen several articles, including a recent one in SD Times, on changes that are about to happen in Hadoop and significant architectural upgrades. Does anyone have a good, detailed resource to recommend where these changes are outlined, including the long-term roadmap? I am interested in this because I would like to see how other data-parallel algorithms besides M/R, and other parallel file system approaches besides HDFS, could be fitted into this vision. My team might have resources and interest to contribute. Regards, Edmon

-- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/
Possible to poll JobTracker for information from any language?
Hello, I'm interested in writing a library, to be used with Node.js, that can ask the JobTracker for information about jobs. I see that this is possible using the Java API, with the JobClient interface [1]. I also saw that the wiki mentions clients can poll the JobTracker for information, but it does not go into detail [2]. Is it possible to get information about jobs from the JobTracker using C, C++, Thrift, or something else? Thanks, Ryan

1. http://hadoop.apache.org/common/docs/r1.0.1/mapred_tutorial.html#Job+Submission+and+Monitoring
2. http://wiki.apache.org/hadoop/JobTracker