Hello all,

When submitting an Hbase export job to YARN, I see it appearing on the web UI but for some reason, the job never starts; it constantly stays at 0% complete. I am using hadoop 0.23 and hbase 0.92 ( CDH4 beta 1 )

I see the NodeManagers connecting to the ResourceManager:

2012-03-15 19:36:10,585 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: torb1pnb001.dataraker.net:46696 Node Transitioned from NEW to RUNNING 2012-03-15 19:36:16,633 INFO org.apache.hadoop.yarn.util.RackResolver: Resolved torb1pnb002.dataraker.net to /default-rack 2012-03-15 19:36:16,633 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: NodeManager from node torb1pnb002.dataraker.net(cmPort: 35665 httpPort: 9999) registered with capability: 1000, assigned nodeId torb1pnb002.dataraker.net:35665 2012-03-15 19:36:16,634 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: torb1pnb002.dataraker.net:35665 Node Transitioned from NEW to RUNNING
[ etc... ]

and the job being submitted to the ResourceManager:

2012-03-15 19:40:29,248 INFO org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated new applicationId: 1 2012-03-15 19:40:31,323 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1331840162147_0001 State change from NEW to SUBMITTED 2012-03-15 19:40:31,323 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Registering appattempt_1331840162147_0001_000001 2012-03-15 19:40:31,323 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1331840162147_0001_000001 State change from NEW to SUBMITTED 2012-03-15 19:40:31,327 INFO org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Application with id 1 submitted by user hdfs with application_id [..snip..] 2012-03-15 19:40:31,329 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hdfs IP=10.192.16.64 OPERATION=Submit Application Request TARGET=ClientRMService RESULT=SUCCESS APPID=application_1331840162147_0001 2012-03-15 19:40:31,333 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: Application Submission: application_1331840162147_0001 from hdfs, currently active: 1 2012-03-15 19:40:31,336 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1331840162147_0001_000001 State change from SUBMITTED to SCHEDULED 2012-03-15 19:40:31,336 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1331840162147_0001 State change from SUBMITTED to ACCEPTED

but after the NodeManager starts, the log never indicates any requests from the ResourceManager

2012-03-15 19:36:16,604 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Connected to ResourceManager at torb1pna001:8025 2012-03-15 19:36:16,645 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered with ResourceManager as torb1pnb002.dataraker.net:35665 with total resource of memory: 1000 2012-03-15 19:36:16,645 INFO org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl is started. 2012-03-15 19:36:16,646 INFO org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.server.nodemanager.NodeManager is started.
[ end of log ]

I am seeing strange errors in Zookeeper when the job is submitted:

2012-03-15 16:58:00,216 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@213] - Accepted socket connection from /127.0.0.1:33262 2012-03-15 16:58:00,219 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@838] - Client attempting to establish new session at /127.0.0.1:33262 2012-03-15 16:58:00,229 - INFO [CommitProcessor:0:ZooKeeperServer@604] - Established session 0x35d53d539f0071 with negotiated timeout 40000 for client /127.0.0.1:33262 2012-03-15 16:58:48,884 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end of stream exception EndOfStreamException: Unable to read additional data from client sessionid 0x35d53d539f0071, likely client has closed socket at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:224)
        at java.lang.Thread.run(Thread.java:662)
2012-03-15 16:58:48,885 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1000] - Closed socket connection for client /127.0.0.1:33262 which had sessionid 0x35d53d539f0071 2012-03-15 17:02:59,968 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@213] - Accepted socket connection from /127.0.0.1:59652 2012-03-15 17:02:59,971 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@838] - Client attempting to establish new session at /127.0.0.1:59652 2012-03-15 17:02:59,978 - INFO [CommitProcessor:0:ZooKeeperServer@604] - Established session 0x35d53d539f0072 with negotiated timeout 40000 for client /127.0.0.1:59652 2012-03-15 17:18:39,480 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end of stream exception EndOfStreamException: Unable to read additional data from client sessionid 0x35d53d539f0072, likely client has closed socket at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:224)
        at java.lang.Thread.run(Thread.java:662)
2012-03-15 17:18:39,481 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1000] - Closed socket connection for client /127.0.0.1:59652 which had sessionid 0x35d53d539f0072 2012-03-15 17:48:32,406 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@213] - Accepted socket connection from /127.0.0.1:40908 2012-03-15 17:48:32,409 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@838] - Client attempting to establish new session at /127.0.0.1:40908 2012-03-15 17:48:32,420 - INFO [CommitProcessor:0:ZooKeeperServer@604] - Established session 0x35d53d539f0073 with negotiated timeout 40000 for client /127.0.0.1:40908 2012-03-15 18:40:19,836 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end of stream exception EndOfStreamException: Unable to read additional data from client sessionid 0x35d53d539f0073, likely client has closed socket at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:224)
        at java.lang.Thread.run(Thread.java:662)
2012-03-15 18:40:19,837 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1000] - Closed socket connection for client /127.0.0.1:40908 which had sessionid 0x35d53d539f0073 2012-03-15 19:40:30,503 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@213] - Accepted socket connection from /127.0.0.1:45743 2012-03-15 19:40:30,506 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@838] - Client attempting to establish new session at /127.0.0.1:45743 2012-03-15 19:40:30,518 - INFO [CommitProcessor:0:ZooKeeperServer@604] - Established session 0x35d53d539f0074 with negotiated timeout 40000 for client /127.0.0.1:45743

Any ideas how to debug this?

[--- yarn-site.xml ---]

<?xml version="1.0"?>
<configuration>

    <property>
        <name>yarn.nodemanager.local-dirs</name>
        <value>/tmp/nm-local-dir</value>
<description>List of directories to store localized files in.</description>
    </property>

    <property>
        <name>yarn.nodemanager.log-dirs</name>
        <value>/tmp/logs</value>
        <description>Where to store container logs.</description>
    </property>

    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>1000</value>
<description>Amount of physical memory, in MB, that can be allocated for containers.</description>
    </property>

    <property>
        <name>yarn.resourcemanager.address</name>
        <value>torb1pna001:8040</value>
<description>The address of the applications manager interface in the RM.</description>
    </property>

    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>torb1pna001:8030</value>
        <description>The address of the scheduler interface.</description>
    </property>

    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>torb1pna001:8088</value>
        <description>The address of the RM web application.</description>
    </property>

    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>torb1pna001:8025</value>
        <description></description>
    </property>

    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>torb1pna001:8141</value>
        <description>The address of the RM admin interface.</description>
    </property>

    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce.shuffle</value>
    </property>

    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>

</configuration>

[--- mapred-site.xml ---]

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

  <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
  </property>


  <property>
        <name>mapreduce.jobhistory.address</name>
        <value>torb1pna001:10020</value>
  </property>

  <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>torb1pna001:19888</value>
  </property>


  <property>
        <name>mapreduce.jobhistory.intermediate-done-dir</name>
        <value>/tmp/mr-history/tmp</value>
  </property>

  <property>
        <name>mapreduce.jobhistory.done-dir</name>
        <value>/tmp/mr-history/done</value>
  </property>

</configuration>

[--- output from HBase export ---]

bash-3.2$ hadoop jar hbase-0.92.0-cdh4b1.jar export dev /backups/dev
12/03/15 17:48:30 INFO mapreduce.Export: verisons=1, starttime=0, endtime=9223372036854775807 12/03/15 17:48:30 WARN conf.Configuration: mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used 12/03/15 17:48:30 WARN conf.Configuration: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum 12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.2-1221870, built on 12/21/2011 20:46 GMT 12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client environment:host.name=torb1pna001.dataraker.net 12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client environment:java.version=1.6.0_27 12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc. 12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/java/jdk1.6.0_27/jre 12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/etc/hadoop/conf:/usr/<...snip...> 12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/usr/lib/hadoop/lib/native 12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp 12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA> 12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux 12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64 12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client environment:os.version=2.6.18-274.el5 12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client environment:user.name=hdfs 12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/hadoop 12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client environment:user.dir=/usr/lib/hbase 12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=180000 watcher=hconnection 12/03/15 17:48:32 INFO zookeeper.ClientCnxn: Opening socket connection to server /127.0.0.1:2181 12/03/15 17:48:32 INFO zookeeper.ClientCnxn: Socket connection established to localhost.localdomain/127.0.0.1:2181, initiating session 12/03/15 17:48:32 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is 18...@torb1pna001.dataraker.net 12/03/15 17:48:32 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost.localdomain/127.0.0.1:2181, sessionid = 0x35d53d539f0073, negotiated timeout = 40000 12/03/15 17:48:32 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available 12/03/15 17:48:32 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available 12/03/15 17:48:32 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available 12/03/15 17:48:32 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
12/03/15 17:48:32 INFO mapreduce.JobSubmitter: number of splits:73
12/03/15 17:48:32 WARN conf.Configuration: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum 12/03/15 17:48:33 INFO mapred.ResourceMgrDelegate: Submitted application application_1331830579833_0004 to ResourceManager at torb1pna001/10.192.16.64:8040 12/03/15 17:48:33 INFO mapreduce.Job: The url to track the job: http://torb1pna001:8088/proxy/application_1331830579833_0004/
12/03/15 17:48:33 INFO mapreduce.Job: Running job: job_1331830579833_0004
bash-3.2$ hadoop jar hbase-0.92.0-cdh4b1.jar export bch_dev /backups/bch_dev WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files. 12/03/15 19:40:28 INFO mapreduce.Export: verisons=1, starttime=0, endtime=9223372036854775807 12/03/15 19:40:28 WARN conf.Configuration: mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used 12/03/15 19:40:28 WARN conf.Configuration: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum 12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.2-1221870, built on 12/21/2011 20:46 GMT 12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client environment:host.name=torb1pna001.dataraker.net 12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client environment:java.version=1.6.0_27 12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc. 12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/java/jdk1.6.0_27/jre 12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/etc/hadoop/conf:<...snip...> 12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/usr/lib/hadoop/lib/native 12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp 12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA> 12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux 12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64 12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client environment:os.version=2.6.18-274.el5 12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client environment:user.name=hdfs 12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/hadoop 12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client environment:user.dir=/usr/lib/hbase 12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=180000 watcher=hconnection 12/03/15 19:40:30 INFO zookeeper.ClientCnxn: Opening socket connection to server /127.0.0.1:2181 12/03/15 19:40:30 INFO zookeeper.ClientCnxn: Socket connection established to localhost.localdomain/127.0.0.1:2181, initiating session 12/03/15 19:40:30 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is 19...@torb1pna001.dataraker.net 12/03/15 19:40:30 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost.localdomain/127.0.0.1:2181, sessionid = 0x35d53d539f0074, negotiated timeout = 40000 12/03/15 19:40:30 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available 12/03/15 19:40:30 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available 12/03/15 19:40:30 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available 12/03/15 19:40:30 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
12/03/15 19:40:30 INFO mapreduce.JobSubmitter: number of splits:73
12/03/15 19:40:30 WARN conf.Configuration: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum 12/03/15 19:40:31 INFO mapred.ResourceMgrDelegate: Submitted application application_1331840162147_0001 to ResourceManager at torb1pna001/10.192.16.64:8040 12/03/15 19:40:31 INFO mapreduce.Job: The url to track the job: http://torb1pna001:8088/proxy/application_1331840162147_0001/
12/03/15 19:40:31 INFO mapreduce.Job: Running job: job_1331840162147_0001

Reply via email to