[ 
https://issues.apache.org/jira/browse/HBASE-9779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-9779:
-------------------------

    Attachment: 9779part.txt

Fix so we don't retry if the table doesn't exist.

If Elliott is reading, this failed builds #36 and #40, and half if not all of build 
#38. Let me make it critical.
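
For context on why the client grinds through all 35 attempts before giving up: the client-side retrying caller only short-circuits on exception types it knows are non-retryable (DoNotRetryIOException and friends); a plain IOException coming back from the server looks transient and gets retried. Below is a minimal, self-contained sketch of that pattern -- hypothetical class and method names, not the actual RpcRetryingCaller code -- just to illustrate why raising a base IOE for a missing descriptor has the wrong shape:

{code}
import java.io.IOException;

/** Marker for failures that will not resolve by retrying
 *  (analogous to HBase's DoNotRetryIOException). Hypothetical, for illustration only. */
class DoNotRetryException extends IOException {
  DoNotRetryException(String msg) { super(msg); }
}

/** Simplified retrying caller: retries generic IOExceptions, fails fast on DoNotRetryException. */
class RetryingCaller {
  interface Call<T> { T call() throws IOException; }

  <T> T callWithRetries(Call<T> call, int maxAttempts) throws IOException {
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return call.call();
      } catch (DoNotRetryException e) {
        throw e;            // e.g. "table does not exist" -- pointless to retry
      } catch (IOException e) {
        last = e;           // looks transient; pause and retry
        try {
          Thread.sleep(100L * attempt);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw new IOException("interrupted while retrying", ie);
        }
      }
    }
    throw new IOException("Failed after attempts=" + maxAttempts, last);
  }
}
{code}

The attached 9779part.txt aims for the fail-fast branch of that sketch: when the table is already gone, surface a non-retryable error instead of letting the client retry until RetriesExhaustedException.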

> IntegrationTestLoadAndVerify fails deleting IntegrationTestLoadAndVerify 
> table 
> -------------------------------------------------------------------------------
>
>                 Key: HBASE-9779
>                 URL: https://issues.apache.org/jira/browse/HBASE-9779
>             Project: HBase
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 0.96.0
>            Reporter: stack
>            Assignee: stack
>         Attachments: 9779part.txt
>
>
> As part of the test, we want to delete the created table to restore cluster 
> state.  Interestingly, we can disable the table successfully, but then 
> immediately afterwards the delete fails because we cannot get the table 
> descriptor -- fetching the table descriptor is how we test whether the table is present.
> The check in getDescriptor is kinda broken because it throws a base IOE, which 
> causes clients to retry over and over again as though the descriptor were 
> going to come back.
> This bug is kinda ugly because in at least one case it caused our 
> long-running hbase-it suite run to fail, so it would be good to fix.
> Here is a sample from a test run:
> {code}
> Disabling table IntegrationTestLoadAndVerify 2013-10-11 18:27:53,485 INFO  
> [main] client.HBaseAdmin: Started disable of IntegrationTestLoadAndVerify
> 2013-10-11 18:27:53,526 INFO  [main] zookeeper.ZooKeeper: Initiating client 
> connection, connectString=a1805.halxg.cloudera.com:2181 sessionTimeout=90000 
> watcher=catalogtracker-on-hconnection-0x5a7e666f
> 2013-10-11 18:27:53,527 INFO  [main] zookeeper.RecoverableZooKeeper: Process 
> identifier=catalogtracker-on-hconnection-0x5a7e666f connecting to ZooKeeper 
> ensemble=a1805.halxg.cloudera.com:2181
> 2013-10-11 18:27:53,527 INFO  
> [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn: 
> Opening socket connection to server 
> a1805.halxg.cloudera.com/10.20.200.105:2181. Will not attempt to authenticate 
> using SASL (unknown error)
> 2013-10-11 18:27:53,527 DEBUG [main] catalog.CatalogTracker: Starting catalog 
> tracker org.apache.hadoop.hbase.catalog.CatalogTracker@4ace08a5
> 2013-10-11 18:27:53,529 INFO  
> [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn: Socket 
> connection established to a1805.halxg.cloudera.com/10.20.200.105:2181, 
> initiating session
> 2013-10-11 18:27:53,539 INFO  
> [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn: 
> Session establishment complete on server 
> a1805.halxg.cloudera.com/10.20.200.105:2181, sessionid = 0x1412d47f53a5c70, 
> negotiated timeout = 40000
> 2013-10-11 18:27:53,602 DEBUG [main] catalog.CatalogTracker: Stopping catalog 
> tracker org.apache.hadoop.hbase.catalog.CatalogTracker@4ace08a5
> 2013-10-11 18:27:53,662 INFO  [main] zookeeper.ZooKeeper: Session: 
> 0x1412d47f53a5c70 closed
> 2013-10-11 18:27:53,662 INFO  [main-EventThread] zookeeper.ClientCnxn: 
> EventThread shut down
> .2013-10-11 18:27:54,666 INFO  [main] zookeeper.ZooKeeper: Initiating client 
> connection, connectString=a1805.halxg.cloudera.com:2181 sessionTimeout=90000 
> watcher=catalogtracker-on-hconnection-0x5a7e666f
> 2013-10-11 18:27:54,667 INFO  [main] zookeeper.RecoverableZooKeeper: Process 
> identifier=catalogtracker-on-hconnection-0x5a7e666f connecting to ZooKeeper 
> ensemble=a1805.halxg.cloudera.com:2181
> 2013-10-11 18:27:54,667 INFO  
> [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn: 
> Opening socket connection to server 
> a1805.halxg.cloudera.com/10.20.200.105:2181. Will not attempt to authenticate 
> using SASL (unknown error)
> 2013-10-11 18:27:54,667 DEBUG [main] catalog.CatalogTracker: Starting catalog 
> tracker org.apache.hadoop.hbase.catalog.CatalogTracker@692c0c5d
> 2013-10-11 18:27:54,667 INFO  
> [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn: Socket 
> connection established to a1805.halxg.cloudera.com/10.20.200.105:2181, 
> initiating session
> 2013-10-11 18:27:54,696 INFO  
> [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn: 
> Session establishment complete on server 
> a1805.halxg.cloudera.com/10.20.200.105:2181, sessionid = 0x1412d47f53a5c71, 
> negotiated timeout = 40000
> 2013-10-11 18:27:54,821 DEBUG [main] catalog.CatalogTracker: Stopping catalog 
> tracker org.apache.hadoop.hbase.catalog.CatalogTracker@692c0c5d
> 2013-10-11 18:27:54,871 INFO  [main] zookeeper.ZooKeeper: Session: 
> 0x1412d47f53a5c71 closed
> 2013-10-11 18:27:54,871 INFO  [main-EventThread] zookeeper.ClientCnxn: 
> EventThread shut down
> .2013-10-11 18:27:55,890 INFO  [main] zookeeper.ZooKeeper: Initiating client 
> connection, connectString=a1805.halxg.cloudera.com:2181 sessionTimeout=90000 
> watcher=catalogtracker-on-hconnection-0x5a7e666f
> 2013-10-11 18:27:55,891 INFO  [main] zookeeper.RecoverableZooKeeper: Process 
> identifier=catalogtracker-on-hconnection-0x5a7e666f connecting to ZooKeeper 
> ensemble=a1805.halxg.cloudera.com:2181
> 2013-10-11 18:27:55,891 INFO  
> [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn: 
> Opening socket connection to server 
> a1805.halxg.cloudera.com/10.20.200.105:2181. Will not attempt to authenticate 
> using SASL (unknown error)
> 2013-10-11 18:27:55,891 DEBUG [main] catalog.CatalogTracker: Starting catalog 
> tracker org.apache.hadoop.hbase.catalog.CatalogTracker@74234057
> 2013-10-11 18:27:55,892 INFO  
> [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn: Socket 
> connection established to a1805.halxg.cloudera.com/10.20.200.105:2181, 
> initiating session
> 2013-10-11 18:27:55,896 INFO  
> [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn: 
> Session establishment complete on server 
> a1805.halxg.cloudera.com/10.20.200.105:2181, sessionid = 0x1412d47f53a5c72, 
> negotiated timeout = 40000
> 2013-10-11 18:27:55,908 DEBUG [main] catalog.CatalogTracker: Stopping catalog 
> tracker org.apache.hadoop.hbase.catalog.CatalogTracker@74234057
> 2013-10-11 18:27:55,912 INFO  [main] zookeeper.ZooKeeper: Session: 
> 0x1412d47f53a5c72 closed
> 2013-10-11 18:27:55,912 INFO  [main-EventThread] zookeeper.ClientCnxn: 
> EventThread shut down
>  2389 ms
> Deleting table IntegrationTestLoadAndVerify 
> 2013-10-11 18:33:02,029 INFO  [main] hbase.HBaseCluster: Restoring cluster - 
> started
> 2013-10-11 18:33:02,032 INFO  [main] hbase.HBaseCluster: Added new HBaseAdmin
> 2013-10-11 18:33:02,032 INFO  [main] hbase.HBaseCluster: Restoring cluster - 
> done
> 2013-10-11 18:33:02,033 ERROR [main] util.AbstractHBaseTool: Error running 
> command-line tool
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after 
> attempts=35, exceptions:
> Fri Oct 11 18:27:55 PDT 2013, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@48c6f1e8, 
> org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException): 
> java.io.IOException: HTableDescriptor missing for IntegrationTestLoadAndVerify
>       at 
> org.apache.hadoop.hbase.master.handler.TableEventHandler.getTableDescriptor(TableEventHandler.java:231)
>       at 
> org.apache.hadoop.hbase.master.handler.DeleteTableHandler.prepareWithTableLock(DeleteTableHandler.java:58)
>       at 
> org.apache.hadoop.hbase.master.handler.TableEventHandler.prepare(TableEventHandler.java:93)
>       at org.apache.hadoop.hbase.master.HMaster.deleteTable(HMaster.java:1816)
>       at org.apache.hadoop.hbase.master.HMaster.deleteTable(HMaster.java:1826)
>       at 
> org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:38213)
>       at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2146)
>       at 
> org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1851)
> ....
> {code}
> Looking at the master logs, everything looks fine:
> {code}
>   1 2013-10-11 18:27:55,541 DEBUG [MASTER_TABLE_OPERATIONS-a1805:60000-0] 
> handler.DisableTableHandler: Disable waiting until done; 298996 ms remaining; 
> []
>   0 2013-10-11 18:27:55,546 INFO  [MASTER_TABLE_OPERATIONS-a1805:60000-0] 
> handler.DisableTableHandler: Disabled table is done=true
>   1 2013-10-11 18:27:55,555 DEBUG [MASTER_TABLE_OPERATIONS-a1805:60000-0] 
> lock.ZKInterProcessLockBase: Released 
> /hbase/table-lock/IntegrationTestLoadAndVerify/write-master:600000000000001
>   2 2013-10-11 18:27:55,923 INFO  [RpcServer.handler=14,port=60000] 
> master.HMaster: Client=hbase//10.20.200.112 delete 
> IntegrationTestLoadAndVerify
>   3 2013-10-11 18:27:55,931 DEBUG [RpcServer.handler=14,port=60000] 
> lock.ZKInterProcessLockBase: Acquired a lock for 
> /hbase/table-lock/IntegrationTestLoadAndVerify/write-master:600000000000002
>   4 2013-10-11 18:27:55,954 DEBUG [RpcServer.handler=14,port=60000] 
> lock.ZKInterProcessLockBase: Released 
> /hbase/table-lock/IntegrationTestLoadAndVerify/write-master:600000000000002
>   5 2013-10-11 18:27:56,167 INFO  [RpcServer.handler=15,port=60000] 
> master.HMaster: Client=hbase//10.20.200.112 delete 
> IntegrationTestLoadAndVerify
> ...
> {code}
> Oddly, there are no exceptions other than a corrupted snapshot due to a missing hfile.  All 
> seems to have been operating fine.  Will look more.


