[ 
https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443515#comment-13443515
 ] 

Ted Yu edited comment on HBASE-6649 at 8/29/12 8:43 AM:
--------------------------------------------------------

>From 
>https://builds.apache.org/view/G-L/view/HBase/job/HBase-0.92-security/118/testReport/org.apache.hadoop.hbase.replication/TestReplication/queueFailover/:
{code}
2012-08-28 17:29:54,404 DEBUG [main-EventThread] 
master.AssignmentManager(2911): based on AM, current 
region=.META.,,1.1028785192 is on server=juno.apache.org,43891,1346174923071 
server being checked: juno.apache.org,55977,1346174923023
2012-08-28 17:29:54,405 DEBUG 
[RegionServer:1;juno.apache.org,43891,1346174923071-EventThread] 
zookeeper.ZooKeeperWatcher(266): regionserver:43891-0x1396e4723930005 Received 
ZooKeeper Event, type=NodeChildrenChanged, state=SyncConnected, path=/1/rs
2012-08-28 17:29:54,406 DEBUG [main-EventThread] master.ServerManager(394): 
Added=juno.apache.org,55977,1346174923023 to dead servers, submitted shutdown 
handler to be executed, root=false, meta=false
2012-08-28 17:29:54,406 DEBUG [main-EventThread] 
zookeeper.ZooKeeperWatcher(266): master:55418-0x1396e4723930003 Received 
ZooKeeper Event, type=NodeChildrenChanged, state=SyncConnected, path=/1/rs
2012-08-28 17:29:54,406 DEBUG 
[RegionServer:1;juno.apache.org,43891,1346174923071-EventThread] 
zookeeper.ZKUtil(229): regionserver:43891-0x1396e4723930005 Set watcher on 
existing znode /1/rs/juno.apache.org,43891,1346174923071
2012-08-28 17:29:54,407 INFO  
[MASTER_SERVER_OPERATIONS-juno.apache.org,55418,1346174922926-0] 
handler.ServerShutdownHandler(175): Splitting logs for 
juno.apache.org,55977,1346174923023
...
2012-08-28 17:29:54,407 DEBUG [main-EventThread] zookeeper.ZKUtil(229): 
master:55418-0x1396e4723930003 Set watcher on existing znode 
/1/rs/juno.apache.org,43891,1346174923071
2012-08-28 17:29:54,410 DEBUG 
[MASTER_SERVER_OPERATIONS-juno.apache.org,55418,1346174922926-0] 
master.MasterFileSystem(267): Renamed region directory: 
hdfs://localhost:59869/user/jenkins/hbase/.logs/juno.apache.org,55977,1346174923023-splitting
2012-08-28 17:29:54,410 INFO  
[MASTER_SERVER_OPERATIONS-juno.apache.org,55418,1346174922926-0] 
master.SplitLogManager(894): dead splitlog worker 
juno.apache.org,55977,1346174923023
2012-08-28 17:29:54,413 DEBUG 
[MASTER_SERVER_OPERATIONS-juno.apache.org,55418,1346174922926-0] 
master.SplitLogManager(246): Scheduling batch of logs to split
2012-08-28 17:29:54,414 INFO  
[MASTER_SERVER_OPERATIONS-juno.apache.org,55418,1346174922926-0] 
master.SplitLogManager(248): started splitting logs in 
[hdfs://localhost:59869/user/jenkins/hbase/.logs/juno.apache.org,55977,1346174923023-splitting]
...
2012-08-28 17:29:55,000 ERROR [IPC Server handler 7 on 59869] 
security.UserGroupInformation(1124): PriviledgedActionException 
as:jenkins.hfs.0 cause:java.io.FileNotFoundException: Parent directory doesn't 
exist: /user/jenkins/hbase/.logs/juno.apache.org,55977,1346174923023
2012-08-28 17:29:55,004 FATAL 
[RegionServer:0;juno.apache.org,55977,1346174923023.logRoller] 
regionserver.HRegionServer(1537): ABORTING region server 
juno.apache.org,55977,1346174923023: IOE in log roller
java.io.IOException: cannot get log writer
        at 
org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:715)
        at 
org.apache.hadoop.hbase.regionserver.wal.HLog.createWriterInstance(HLog.java:662)
        at 
org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:594)
        at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:94)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: java.io.FileNotFoundException: 
java.io.FileNotFoundException: Parent directory doesn't exist: 
/user/jenkins/hbase/.logs/juno.apache.org,55977,1346174923023
        at 
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.init(SequenceFileLogWriter.java:106)
        at 
org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:712)
        ... 4 more
Caused by: java.io.FileNotFoundException: java.io.FileNotFoundException: Parent 
directory doesn't exist: 
/user/jenkins/hbase/.logs/juno.apache.org,55977,1346174923023
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at 
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95)
        at 
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
        at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:3251)
        at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:713)
        at 
org.apache.hadoop.hdfs.DistributedFileSystem.createNonRecursive(DistributedFileSystem.java:198)
        at 
org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:601)
        at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:442)
        at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at 
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.init(SequenceFileLogWriter.java:87)
        ... 5 more
Caused by: org.apache.hadoop.ipc.RemoteException: 
java.io.FileNotFoundException: Parent directory doesn't exist: 
/user/jenkins/hbase/.logs/juno.apache.org,55977,1346174923023
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.verifyParentDir(FSNamesystem.java:1167)
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1241)
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1188)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:628)
        at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)

        at org.apache.hadoop.ipc.Client.call(Client.java:1070)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
        at $Proxy8.create(Unknown Source)
        at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at $Proxy8.create(Unknown Source)
        at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:3248)
        ... 13 more
{code}
It is clear that log splitting (splitLog() call on master) raced with log 
roller (on region server).
In run() of log roller:
{code}
      } catch (IOException ex) {
{code}
One option is to distinguish FileNotFoundException from other IOE's and exit.
                
      was (Author: [email protected]):
    From 
https://builds.apache.org/view/G-L/view/HBase/job/HBase-0.92-security/lastCompletedBuild/testReport/org.apache.hadoop.hbase.replication/TestReplication/queueFailover/:
{code}
2012-08-28 17:29:54,404 DEBUG [main-EventThread] 
master.AssignmentManager(2911): based on AM, current 
region=.META.,,1.1028785192 is on server=juno.apache.org,43891,1346174923071 
server being checked: juno.apache.org,55977,1346174923023
2012-08-28 17:29:54,405 DEBUG 
[RegionServer:1;juno.apache.org,43891,1346174923071-EventThread] 
zookeeper.ZooKeeperWatcher(266): regionserver:43891-0x1396e4723930005 Received 
ZooKeeper Event, type=NodeChildrenChanged, state=SyncConnected, path=/1/rs
2012-08-28 17:29:54,406 DEBUG [main-EventThread] master.ServerManager(394): 
Added=juno.apache.org,55977,1346174923023 to dead servers, submitted shutdown 
handler to be executed, root=false, meta=false
2012-08-28 17:29:54,406 DEBUG [main-EventThread] 
zookeeper.ZooKeeperWatcher(266): master:55418-0x1396e4723930003 Received 
ZooKeeper Event, type=NodeChildrenChanged, state=SyncConnected, path=/1/rs
2012-08-28 17:29:54,406 DEBUG 
[RegionServer:1;juno.apache.org,43891,1346174923071-EventThread] 
zookeeper.ZKUtil(229): regionserver:43891-0x1396e4723930005 Set watcher on 
existing znode /1/rs/juno.apache.org,43891,1346174923071
2012-08-28 17:29:54,407 INFO  
[MASTER_SERVER_OPERATIONS-juno.apache.org,55418,1346174922926-0] 
handler.ServerShutdownHandler(175): Splitting logs for 
juno.apache.org,55977,1346174923023
...
2012-08-28 17:29:54,407 DEBUG [main-EventThread] zookeeper.ZKUtil(229): 
master:55418-0x1396e4723930003 Set watcher on existing znode 
/1/rs/juno.apache.org,43891,1346174923071
2012-08-28 17:29:54,410 DEBUG 
[MASTER_SERVER_OPERATIONS-juno.apache.org,55418,1346174922926-0] 
master.MasterFileSystem(267): Renamed region directory: 
hdfs://localhost:59869/user/jenkins/hbase/.logs/juno.apache.org,55977,1346174923023-splitting
2012-08-28 17:29:54,410 INFO  
[MASTER_SERVER_OPERATIONS-juno.apache.org,55418,1346174922926-0] 
master.SplitLogManager(894): dead splitlog worker 
juno.apache.org,55977,1346174923023
2012-08-28 17:29:54,413 DEBUG 
[MASTER_SERVER_OPERATIONS-juno.apache.org,55418,1346174922926-0] 
master.SplitLogManager(246): Scheduling batch of logs to split
2012-08-28 17:29:54,414 INFO  
[MASTER_SERVER_OPERATIONS-juno.apache.org,55418,1346174922926-0] 
master.SplitLogManager(248): started splitting logs in 
[hdfs://localhost:59869/user/jenkins/hbase/.logs/juno.apache.org,55977,1346174923023-splitting]
...
2012-08-28 17:29:55,000 ERROR [IPC Server handler 7 on 59869] 
security.UserGroupInformation(1124): PriviledgedActionException 
as:jenkins.hfs.0 cause:java.io.FileNotFoundException: Parent directory doesn't 
exist: /user/jenkins/hbase/.logs/juno.apache.org,55977,1346174923023
2012-08-28 17:29:55,004 FATAL 
[RegionServer:0;juno.apache.org,55977,1346174923023.logRoller] 
regionserver.HRegionServer(1537): ABORTING region server 
juno.apache.org,55977,1346174923023: IOE in log roller
java.io.IOException: cannot get log writer
        at 
org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:715)
        at 
org.apache.hadoop.hbase.regionserver.wal.HLog.createWriterInstance(HLog.java:662)
        at 
org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:594)
        at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:94)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: java.io.FileNotFoundException: 
java.io.FileNotFoundException: Parent directory doesn't exist: 
/user/jenkins/hbase/.logs/juno.apache.org,55977,1346174923023
        at 
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.init(SequenceFileLogWriter.java:106)
        at 
org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:712)
        ... 4 more
Caused by: java.io.FileNotFoundException: java.io.FileNotFoundException: Parent 
directory doesn't exist: 
/user/jenkins/hbase/.logs/juno.apache.org,55977,1346174923023
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at 
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95)
        at 
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
        at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:3251)
        at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:713)
        at 
org.apache.hadoop.hdfs.DistributedFileSystem.createNonRecursive(DistributedFileSystem.java:198)
        at 
org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:601)
        at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:442)
        at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at 
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.init(SequenceFileLogWriter.java:87)
        ... 5 more
Caused by: org.apache.hadoop.ipc.RemoteException: 
java.io.FileNotFoundException: Parent directory doesn't exist: 
/user/jenkins/hbase/.logs/juno.apache.org,55977,1346174923023
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.verifyParentDir(FSNamesystem.java:1167)
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1241)
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1188)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:628)
        at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)

        at org.apache.hadoop.ipc.Client.call(Client.java:1070)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
        at $Proxy8.create(Unknown Source)
        at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at $Proxy8.create(Unknown Source)
        at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:3248)
        ... 13 more
{code}
It is clear that log splitting (splitLog() call on master) raced with log 
roller (on region server).
In run() of log roller:
{code}
      } catch (IOException ex) {
{code}
One option is to distinguish FileNotFoundException from other IOE's and exit.
                  
> [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails
> ------------------------------------------------------------------
>
>                 Key: HBASE-6649
>                 URL: https://issues.apache.org/jira/browse/HBASE-6649
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Devaraj Das
>
> Have seen it twice in the recent past: http://bit.ly/MPCykB & 
> http://bit.ly/O79Dq7 .. 
> Looking briefly at the logs hints at a pattern - in both the failed test 
> instances, there was an RS crash while the test was running.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to