Hello all,
When submitting an HBase export job to YARN, I see it appear on the web
UI, but for some reason the job never starts; it stays at 0% complete
indefinitely. I am using Hadoop 0.23 and HBase 0.92 (CDH4 Beta 1).
I see the NodeManagers connecting to the ResourceManager:
2012-03-15 19:36:10,585 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl:
torb1pnb001.dataraker.net:46696 Node Transitioned from NEW to RUNNING
2012-03-15 19:36:16,633 INFO org.apache.hadoop.yarn.util.RackResolver:
Resolved torb1pnb002.dataraker.net to /default-rack
2012-03-15 19:36:16,633 INFO
org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService:
NodeManager from node torb1pnb002.dataraker.net(cmPort: 35665 httpPort:
9999) registered with capability: 1000, assigned nodeId
torb1pnb002.dataraker.net:35665
2012-03-15 19:36:16,634 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl:
torb1pnb002.dataraker.net:35665 Node Transitioned from NEW to RUNNING
[ etc... ]
and the job being submitted to the ResourceManager:
2012-03-15 19:40:29,248 INFO
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated
new applicationId: 1
2012-03-15 19:40:31,323 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
application_1331840162147_0001 State change from NEW to SUBMITTED
2012-03-15 19:40:31,323 INFO
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService:
Registering appattempt_1331840162147_0001_000001
2012-03-15 19:40:31,323 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
appattempt_1331840162147_0001_000001 State change from NEW to SUBMITTED
2012-03-15 19:40:31,327 INFO
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Application
with id 1 submitted by user hdfs with application_id [..snip..]
2012-03-15 19:40:31,329 INFO
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hdfs
IP=10.192.16.64 OPERATION=Submit Application Request
TARGET=ClientRMService RESULT=SUCCESS
APPID=application_1331840162147_0001
2012-03-15 19:40:31,333 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler:
Application Submission: application_1331840162147_0001 from hdfs,
currently active: 1
2012-03-15 19:40:31,336 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
appattempt_1331840162147_0001_000001 State change from SUBMITTED to
SCHEDULED
2012-03-15 19:40:31,336 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
application_1331840162147_0001 State change from SUBMITTED to ACCEPTED
but after the NodeManager starts, its log never shows any requests
arriving from the ResourceManager:
2012-03-15 19:36:16,604 INFO
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Connected
to ResourceManager at torb1pna001:8025
2012-03-15 19:36:16,645 INFO
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl:
Registered with ResourceManager as torb1pnb002.dataraker.net:35665 with
total resource of memory: 1000
2012-03-15 19:36:16,645 INFO
org.apache.hadoop.yarn.service.AbstractService:
Service:org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl is
started.
2012-03-15 19:36:16,646 INFO
org.apache.hadoop.yarn.service.AbstractService:
Service:org.apache.hadoop.yarn.server.nodemanager.NodeManager is started.
[ end of log ]
I am also seeing strange warnings in the ZooKeeper log when the job is submitted:
2012-03-15 16:58:00,216 - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@213] -
Accepted socket connection from /127.0.0.1:33262
2012-03-15 16:58:00,219 - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@838] - Client
attempting to establish new session at /127.0.0.1:33262
2012-03-15 16:58:00,229 - INFO [CommitProcessor:0:ZooKeeperServer@604] -
Established session 0x35d53d539f0071 with negotiated timeout 40000 for
client /127.0.0.1:33262
2012-03-15 16:58:48,884 - WARN
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end
of stream exception
EndOfStreamException: Unable to read additional data from client sessionid
0x35d53d539f0071, likely client has closed socket
at
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
at
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:224)
at java.lang.Thread.run(Thread.java:662)
2012-03-15 16:58:48,885 - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1000] - Closed
socket connection for client /127.0.0.1:33262 which had sessionid
0x35d53d539f0071
2012-03-15 17:02:59,968 - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@213] -
Accepted socket connection from /127.0.0.1:59652
2012-03-15 17:02:59,971 - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@838] - Client
attempting to establish new session at /127.0.0.1:59652
2012-03-15 17:02:59,978 - INFO [CommitProcessor:0:ZooKeeperServer@604] -
Established session 0x35d53d539f0072 with negotiated timeout 40000 for
client /127.0.0.1:59652
2012-03-15 17:18:39,480 - WARN
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end
of stream exception
EndOfStreamException: Unable to read additional data from client sessionid
0x35d53d539f0072, likely client has closed socket
at
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
at
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:224)
at java.lang.Thread.run(Thread.java:662)
2012-03-15 17:18:39,481 - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1000] - Closed
socket connection for client /127.0.0.1:59652 which had sessionid
0x35d53d539f0072
2012-03-15 17:48:32,406 - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@213] -
Accepted socket connection from /127.0.0.1:40908
2012-03-15 17:48:32,409 - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@838] - Client
attempting to establish new session at /127.0.0.1:40908
2012-03-15 17:48:32,420 - INFO [CommitProcessor:0:ZooKeeperServer@604] -
Established session 0x35d53d539f0073 with negotiated timeout 40000 for
client /127.0.0.1:40908
2012-03-15 18:40:19,836 - WARN
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end
of stream exception
EndOfStreamException: Unable to read additional data from client sessionid
0x35d53d539f0073, likely client has closed socket
at
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
at
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:224)
at java.lang.Thread.run(Thread.java:662)
2012-03-15 18:40:19,837 - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1000] - Closed
socket connection for client /127.0.0.1:40908 which had sessionid
0x35d53d539f0073
2012-03-15 19:40:30,503 - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@213] -
Accepted socket connection from /127.0.0.1:45743
2012-03-15 19:40:30,506 - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@838] - Client
attempting to establish new session at /127.0.0.1:45743
2012-03-15 19:40:30,518 - INFO [CommitProcessor:0:ZooKeeperServer@604] -
Established session 0x35d53d539f0074 with negotiated timeout 40000 for
client /127.0.0.1:45743
Any ideas on how to debug this?
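For what it's worth, here is the arithmetic I am starting to suspect. The 1536 MB figure is my assumption about the default MR ApplicationMaster container size (yarn.app.mapreduce.am.resource.mb) in this release, not something I have confirmed on this cluster:

```shell
# Each NodeManager registers with "capability: 1000" (MB), per the RM log
# above. If the MR ApplicationMaster's container request (assumed default
# yarn.app.mapreduce.am.resource.mb = 1536 MB) exceeds every node's
# capacity, the FifoScheduler can never place the AM and the application
# sits in SCHEDULED/ACCEPTED forever.
NM_CAPACITY_MB=1000   # from yarn.nodemanager.resource.memory-mb below
AM_REQUEST_MB=1536    # assumed default AM container size
if [ "$AM_REQUEST_MB" -gt "$NM_CAPACITY_MB" ]; then
  echo "AM container unschedulable: requests ${AM_REQUEST_MB} MB, nodes offer ${NM_CAPACITY_MB} MB"
fi
```

If that assumption holds, no node in the cluster can ever host the AM, which would match the symptoms exactly: the app transitions to SCHEDULED/ACCEPTED and then nothing happens.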
[--- yarn-site.xml ---]
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/tmp/nm-local-dir</value>
    <description>List of directories to store localized files in.</description>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/tmp/logs</value>
    <description>Where to store container logs.</description>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>1000</value>
    <description>Amount of physical memory, in MB, that can be allocated for containers.</description>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>torb1pna001:8040</value>
    <description>The address of the applications manager interface in the RM.</description>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>torb1pna001:8030</value>
    <description>The address of the scheduler interface.</description>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>torb1pna001:8088</value>
    <description>The address of the RM web application.</description>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>torb1pna001:8025</value>
    <description></description>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>torb1pna001:8141</value>
    <description>The address of the RM admin interface.</description>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
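In case someone spots the same thing: my first guess is that the 1000 MB per-node capacity above is smaller than the default MR ApplicationMaster container request (1536 MB in this release, if I have that right), so the scheduler can never place the AM. A sketch of the change I intend to try; the 2048 value is a guess, not a tested setting:

```xml
<!-- yarn-site.xml: raise per-node container memory above the AM's request.
     2048 is illustrative; anything >= the AM container size should do. -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>2048</value>
</property>
```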
[--- mapred-site.xml ---]
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>torb1pna001:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>torb1pna001:19888</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.intermediate-done-dir</name>
    <value>/tmp/mr-history/tmp</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.done-dir</name>
    <value>/tmp/mr-history/done</value>
  </property>
</configuration>
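Another knob I may try, on the theory that the AM container cannot fit on the 1000 MB nodes: shrinking the AM request itself, assuming yarn.app.mapreduce.am.resource.mb is the right property in this release (I have not verified the name here):

```xml
<!-- mapred-site.xml: request a smaller ApplicationMaster container so it
     fits on nodes that only offer 1000 MB. Value is illustrative. -->
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>1000</value>
</property>
```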
[--- output from HBase export ---]
bash-3.2$ hadoop jar hbase-0.92.0-cdh4b1.jar export dev /backups/dev
12/03/15 17:48:30 INFO mapreduce.Export: verisons=1, starttime=0,
endtime=9223372036854775807
12/03/15 17:48:30 WARN conf.Configuration:
mapred.used.genericoptionsparser is deprecated. Instead, use
mapreduce.client.genericoptionsparser.used
12/03/15 17:48:30 WARN conf.Configuration: io.bytes.per.checksum is
deprecated. Instead, use dfs.bytes-per-checksum
12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client
environment:zookeeper.version=3.4.2-1221870, built on 12/21/2011 20:46 GMT
12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client
environment:host.name=torb1pna001.dataraker.net
12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client
environment:java.version=1.6.0_27
12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client
environment:java.vendor=Sun Microsystems Inc.
12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client
environment:java.home=/usr/java/jdk1.6.0_27/jre
12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client
environment:java.class.path=/etc/hadoop/conf:/usr/<...snip...>
12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client
environment:java.library.path=/usr/lib/hadoop/lib/native
12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client
environment:java.io.tmpdir=/tmp
12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client
environment:java.compiler=<NA>
12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client
environment:os.name=Linux
12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client
environment:os.arch=amd64
12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client
environment:os.version=2.6.18-274.el5
12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client
environment:user.name=hdfs
12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client
environment:user.home=/home/hadoop
12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client
environment:user.dir=/usr/lib/hbase
12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Initiating client connection,
connectString=localhost:2181 sessionTimeout=180000 watcher=hconnection
12/03/15 17:48:32 INFO zookeeper.ClientCnxn: Opening socket connection to
server /127.0.0.1:2181
12/03/15 17:48:32 INFO zookeeper.ClientCnxn: Socket connection established
to localhost.localdomain/127.0.0.1:2181, initiating session
12/03/15 17:48:32 INFO zookeeper.RecoverableZooKeeper: The identifier of
this process is 18...@torb1pna001.dataraker.net
12/03/15 17:48:32 INFO zookeeper.ClientCnxn: Session establishment
complete on server localhost.localdomain/127.0.0.1:2181, sessionid =
0x35d53d539f0073, negotiated timeout = 40000
12/03/15 17:48:32 WARN conf.Configuration: hadoop.native.lib is
deprecated. Instead, use io.native.lib.available
12/03/15 17:48:32 WARN conf.Configuration: hadoop.native.lib is
deprecated. Instead, use io.native.lib.available
12/03/15 17:48:32 WARN conf.Configuration: hadoop.native.lib is
deprecated. Instead, use io.native.lib.available
12/03/15 17:48:32 WARN conf.Configuration: hadoop.native.lib is
deprecated. Instead, use io.native.lib.available
12/03/15 17:48:32 INFO mapreduce.JobSubmitter: number of splits:73
12/03/15 17:48:32 WARN conf.Configuration: io.bytes.per.checksum is
deprecated. Instead, use dfs.bytes-per-checksum
12/03/15 17:48:33 INFO mapred.ResourceMgrDelegate: Submitted application
application_1331830579833_0004 to ResourceManager at
torb1pna001/10.192.16.64:8040
12/03/15 17:48:33 INFO mapreduce.Job: The url to track the job:
http://torb1pna001:8088/proxy/application_1331830579833_0004/
12/03/15 17:48:33 INFO mapreduce.Job: Running job: job_1331830579833_0004
bash-3.2$ hadoop jar hbase-0.92.0-cdh4b1.jar export bch_dev
/backups/bch_dev
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please
use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties
files.
12/03/15 19:40:28 INFO mapreduce.Export: verisons=1, starttime=0,
endtime=9223372036854775807
12/03/15 19:40:28 WARN conf.Configuration:
mapred.used.genericoptionsparser is deprecated. Instead, use
mapreduce.client.genericoptionsparser.used
12/03/15 19:40:28 WARN conf.Configuration: io.bytes.per.checksum is
deprecated. Instead, use dfs.bytes-per-checksum
12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client
environment:zookeeper.version=3.4.2-1221870, built on 12/21/2011 20:46 GMT
12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client
environment:host.name=torb1pna001.dataraker.net
12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client
environment:java.version=1.6.0_27
12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client
environment:java.vendor=Sun Microsystems Inc.
12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client
environment:java.home=/usr/java/jdk1.6.0_27/jre
12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client
environment:java.class.path=/etc/hadoop/conf:<...snip...>
12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client
environment:java.library.path=/usr/lib/hadoop/lib/native
12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client
environment:java.io.tmpdir=/tmp
12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client
environment:java.compiler=<NA>
12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client
environment:os.name=Linux
12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client
environment:os.arch=amd64
12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client
environment:os.version=2.6.18-274.el5
12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client
environment:user.name=hdfs
12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client
environment:user.home=/home/hadoop
12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client
environment:user.dir=/usr/lib/hbase
12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Initiating client connection,
connectString=localhost:2181 sessionTimeout=180000 watcher=hconnection
12/03/15 19:40:30 INFO zookeeper.ClientCnxn: Opening socket connection to
server /127.0.0.1:2181
12/03/15 19:40:30 INFO zookeeper.ClientCnxn: Socket connection established
to localhost.localdomain/127.0.0.1:2181, initiating session
12/03/15 19:40:30 INFO zookeeper.RecoverableZooKeeper: The identifier of
this process is 19...@torb1pna001.dataraker.net
12/03/15 19:40:30 INFO zookeeper.ClientCnxn: Session establishment
complete on server localhost.localdomain/127.0.0.1:2181, sessionid =
0x35d53d539f0074, negotiated timeout = 40000
12/03/15 19:40:30 WARN conf.Configuration: hadoop.native.lib is
deprecated. Instead, use io.native.lib.available
12/03/15 19:40:30 WARN conf.Configuration: hadoop.native.lib is
deprecated. Instead, use io.native.lib.available
12/03/15 19:40:30 WARN conf.Configuration: hadoop.native.lib is
deprecated. Instead, use io.native.lib.available
12/03/15 19:40:30 WARN conf.Configuration: hadoop.native.lib is
deprecated. Instead, use io.native.lib.available
12/03/15 19:40:30 INFO mapreduce.JobSubmitter: number of splits:73
12/03/15 19:40:30 WARN conf.Configuration: io.bytes.per.checksum is
deprecated. Instead, use dfs.bytes-per-checksum
12/03/15 19:40:31 INFO mapred.ResourceMgrDelegate: Submitted application
application_1331840162147_0001 to ResourceManager at
torb1pna001/10.192.16.64:8040
12/03/15 19:40:31 INFO mapreduce.Job: The url to track the job:
http://torb1pna001:8088/proxy/application_1331840162147_0001/
12/03/15 19:40:31 INFO mapreduce.Job: Running job: job_1331840162147_0001