[
https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443515#comment-13443515
]
Ted Yu edited comment on HBASE-6649 at 8/29/12 8:43 AM:
--------------------------------------------------------
>From
>https://builds.apache.org/view/G-L/view/HBase/job/HBase-0.92-security/118/testReport/org.apache.hadoop.hbase.replication/TestReplication/queueFailover/:
{code}
2012-08-28 17:29:54,404 DEBUG [main-EventThread]
master.AssignmentManager(2911): based on AM, current
region=.META.,,1.1028785192 is on server=juno.apache.org,43891,1346174923071
server being checked: juno.apache.org,55977,1346174923023
2012-08-28 17:29:54,405 DEBUG
[RegionServer:1;juno.apache.org,43891,1346174923071-EventThread]
zookeeper.ZooKeeperWatcher(266): regionserver:43891-0x1396e4723930005 Received
ZooKeeper Event, type=NodeChildrenChanged, state=SyncConnected, path=/1/rs
2012-08-28 17:29:54,406 DEBUG [main-EventThread] master.ServerManager(394):
Added=juno.apache.org,55977,1346174923023 to dead servers, submitted shutdown
handler to be executed, root=false, meta=false
2012-08-28 17:29:54,406 DEBUG [main-EventThread]
zookeeper.ZooKeeperWatcher(266): master:55418-0x1396e4723930003 Received
ZooKeeper Event, type=NodeChildrenChanged, state=SyncConnected, path=/1/rs
2012-08-28 17:29:54,406 DEBUG
[RegionServer:1;juno.apache.org,43891,1346174923071-EventThread]
zookeeper.ZKUtil(229): regionserver:43891-0x1396e4723930005 Set watcher on
existing znode /1/rs/juno.apache.org,43891,1346174923071
2012-08-28 17:29:54,407 INFO
[MASTER_SERVER_OPERATIONS-juno.apache.org,55418,1346174922926-0]
handler.ServerShutdownHandler(175): Splitting logs for
juno.apache.org,55977,1346174923023
...
2012-08-28 17:29:54,407 DEBUG [main-EventThread] zookeeper.ZKUtil(229):
master:55418-0x1396e4723930003 Set watcher on existing znode
/1/rs/juno.apache.org,43891,1346174923071
2012-08-28 17:29:54,410 DEBUG
[MASTER_SERVER_OPERATIONS-juno.apache.org,55418,1346174922926-0]
master.MasterFileSystem(267): Renamed region directory:
hdfs://localhost:59869/user/jenkins/hbase/.logs/juno.apache.org,55977,1346174923023-splitting
2012-08-28 17:29:54,410 INFO
[MASTER_SERVER_OPERATIONS-juno.apache.org,55418,1346174922926-0]
master.SplitLogManager(894): dead splitlog worker
juno.apache.org,55977,1346174923023
2012-08-28 17:29:54,413 DEBUG
[MASTER_SERVER_OPERATIONS-juno.apache.org,55418,1346174922926-0]
master.SplitLogManager(246): Scheduling batch of logs to split
2012-08-28 17:29:54,414 INFO
[MASTER_SERVER_OPERATIONS-juno.apache.org,55418,1346174922926-0]
master.SplitLogManager(248): started splitting logs in
[hdfs://localhost:59869/user/jenkins/hbase/.logs/juno.apache.org,55977,1346174923023-splitting]
...
2012-08-28 17:29:55,000 ERROR [IPC Server handler 7 on 59869]
security.UserGroupInformation(1124): PriviledgedActionException
as:jenkins.hfs.0 cause:java.io.FileNotFoundException: Parent directory doesn't
exist: /user/jenkins/hbase/.logs/juno.apache.org,55977,1346174923023
2012-08-28 17:29:55,004 FATAL
[RegionServer:0;juno.apache.org,55977,1346174923023.logRoller]
regionserver.HRegionServer(1537): ABORTING region server
juno.apache.org,55977,1346174923023: IOE in log roller
java.io.IOException: cannot get log writer
at
org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:715)
at
org.apache.hadoop.hbase.regionserver.wal.HLog.createWriterInstance(HLog.java:662)
at
org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:594)
at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:94)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: java.io.FileNotFoundException:
java.io.FileNotFoundException: Parent directory doesn't exist:
/user/jenkins/hbase/.logs/juno.apache.org,55977,1346174923023
at
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.init(SequenceFileLogWriter.java:106)
at
org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:712)
... 4 more
Caused by: java.io.FileNotFoundException: java.io.FileNotFoundException: Parent
directory doesn't exist:
/user/jenkins/hbase/.logs/juno.apache.org,55977,1346174923023
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95)
at
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:3251)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:713)
at
org.apache.hadoop.hdfs.DistributedFileSystem.createNonRecursive(DistributedFileSystem.java:198)
at
org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:601)
at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:442)
at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.init(SequenceFileLogWriter.java:87)
... 5 more
Caused by: org.apache.hadoop.ipc.RemoteException:
java.io.FileNotFoundException: Parent directory doesn't exist:
/user/jenkins/hbase/.logs/juno.apache.org,55977,1346174923023
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.verifyParentDir(FSNamesystem.java:1167)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1241)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1188)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:628)
at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
at org.apache.hadoop.ipc.Client.call(Client.java:1070)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at $Proxy8.create(Unknown Source)
at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy8.create(Unknown Source)
at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:3248)
... 13 more
{code}
It is clear that log splitting (splitLog() call on master) raced with log
roller (on region server).
In run() of log roller:
{code}
} catch (IOException ex) {
{code}
One option is to distinguish FileNotFoundException from other IOE's and exit.
was (Author: [email protected]):
From
https://builds.apache.org/view/G-L/view/HBase/job/HBase-0.92-security/lastCompletedBuild/testReport/org.apache.hadoop.hbase.replication/TestReplication/queueFailover/:
{code}
2012-08-28 17:29:54,404 DEBUG [main-EventThread]
master.AssignmentManager(2911): based on AM, current
region=.META.,,1.1028785192 is on server=juno.apache.org,43891,1346174923071
server being checked: juno.apache.org,55977,1346174923023
2012-08-28 17:29:54,405 DEBUG
[RegionServer:1;juno.apache.org,43891,1346174923071-EventThread]
zookeeper.ZooKeeperWatcher(266): regionserver:43891-0x1396e4723930005 Received
ZooKeeper Event, type=NodeChildrenChanged, state=SyncConnected, path=/1/rs
2012-08-28 17:29:54,406 DEBUG [main-EventThread] master.ServerManager(394):
Added=juno.apache.org,55977,1346174923023 to dead servers, submitted shutdown
handler to be executed, root=false, meta=false
2012-08-28 17:29:54,406 DEBUG [main-EventThread]
zookeeper.ZooKeeperWatcher(266): master:55418-0x1396e4723930003 Received
ZooKeeper Event, type=NodeChildrenChanged, state=SyncConnected, path=/1/rs
2012-08-28 17:29:54,406 DEBUG
[RegionServer:1;juno.apache.org,43891,1346174923071-EventThread]
zookeeper.ZKUtil(229): regionserver:43891-0x1396e4723930005 Set watcher on
existing znode /1/rs/juno.apache.org,43891,1346174923071
2012-08-28 17:29:54,407 INFO
[MASTER_SERVER_OPERATIONS-juno.apache.org,55418,1346174922926-0]
handler.ServerShutdownHandler(175): Splitting logs for
juno.apache.org,55977,1346174923023
...
2012-08-28 17:29:54,407 DEBUG [main-EventThread] zookeeper.ZKUtil(229):
master:55418-0x1396e4723930003 Set watcher on existing znode
/1/rs/juno.apache.org,43891,1346174923071
2012-08-28 17:29:54,410 DEBUG
[MASTER_SERVER_OPERATIONS-juno.apache.org,55418,1346174922926-0]
master.MasterFileSystem(267): Renamed region directory:
hdfs://localhost:59869/user/jenkins/hbase/.logs/juno.apache.org,55977,1346174923023-splitting
2012-08-28 17:29:54,410 INFO
[MASTER_SERVER_OPERATIONS-juno.apache.org,55418,1346174922926-0]
master.SplitLogManager(894): dead splitlog worker
juno.apache.org,55977,1346174923023
2012-08-28 17:29:54,413 DEBUG
[MASTER_SERVER_OPERATIONS-juno.apache.org,55418,1346174922926-0]
master.SplitLogManager(246): Scheduling batch of logs to split
2012-08-28 17:29:54,414 INFO
[MASTER_SERVER_OPERATIONS-juno.apache.org,55418,1346174922926-0]
master.SplitLogManager(248): started splitting logs in
[hdfs://localhost:59869/user/jenkins/hbase/.logs/juno.apache.org,55977,1346174923023-splitting]
...
2012-08-28 17:29:55,000 ERROR [IPC Server handler 7 on 59869]
security.UserGroupInformation(1124): PriviledgedActionException
as:jenkins.hfs.0 cause:java.io.FileNotFoundException: Parent directory doesn't
exist: /user/jenkins/hbase/.logs/juno.apache.org,55977,1346174923023
2012-08-28 17:29:55,004 FATAL
[RegionServer:0;juno.apache.org,55977,1346174923023.logRoller]
regionserver.HRegionServer(1537): ABORTING region server
juno.apache.org,55977,1346174923023: IOE in log roller
java.io.IOException: cannot get log writer
at
org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:715)
at
org.apache.hadoop.hbase.regionserver.wal.HLog.createWriterInstance(HLog.java:662)
at
org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:594)
at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:94)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: java.io.FileNotFoundException:
java.io.FileNotFoundException: Parent directory doesn't exist:
/user/jenkins/hbase/.logs/juno.apache.org,55977,1346174923023
at
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.init(SequenceFileLogWriter.java:106)
at
org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:712)
... 4 more
Caused by: java.io.FileNotFoundException: java.io.FileNotFoundException: Parent
directory doesn't exist:
/user/jenkins/hbase/.logs/juno.apache.org,55977,1346174923023
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95)
at
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:3251)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:713)
at
org.apache.hadoop.hdfs.DistributedFileSystem.createNonRecursive(DistributedFileSystem.java:198)
at
org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:601)
at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:442)
at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.init(SequenceFileLogWriter.java:87)
... 5 more
Caused by: org.apache.hadoop.ipc.RemoteException:
java.io.FileNotFoundException: Parent directory doesn't exist:
/user/jenkins/hbase/.logs/juno.apache.org,55977,1346174923023
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.verifyParentDir(FSNamesystem.java:1167)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1241)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1188)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:628)
at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
at org.apache.hadoop.ipc.Client.call(Client.java:1070)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at $Proxy8.create(Unknown Source)
at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy8.create(Unknown Source)
at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:3248)
... 13 more
{code}
It is clear that log splitting (splitLog() call on master) raced with log
roller (on region server).
In run() of log roller:
{code}
} catch (IOException ex) {
{code}
One option is to distinguish FileNotFoundException from other IOE's and exit.
> [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails
> ------------------------------------------------------------------
>
> Key: HBASE-6649
> URL: https://issues.apache.org/jira/browse/HBASE-6649
> Project: HBase
> Issue Type: Bug
> Reporter: Devaraj Das
>
> Have seen it twice in the recent past: http://bit.ly/MPCykB &
> http://bit.ly/O79Dq7 ..
> Looking briefly at the logs hints at a pattern - in both the failed test
> instances, there was an RS crash while the test was running.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira