[jira] [Commented] (HBASE-9779) IntegrationTestLoadAndVerify fails deleting IntegrationTestLoadAndVerify table

2014-09-19 Thread chendihao (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140287#comment-14140287
 ] 

chendihao commented on HBASE-9779:
--

Why does the client log "Stopping catalog tracker" and establish new ZooKeeper 
sessions so frequently? Is that related to this issue? [~ndimiduk] [~stack]
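
The cadence in the log quoted in the issue description can be measured with a small sketch. It pairs each "Starting catalog tracker" line with the matching "Stopping catalog tracker" line and reports how long the tracker lived. The class name `TrackerLifetimes` and the `SAMPLE` array are made up for illustration, and the sample lines are the quoted log lines joined back onto one line each; the timestamps and tracker ids are copied from the quoted log. Each tracker lives for well under a second and a new one shows up about a second later, which would at least be consistent with a client that builds a fresh tracker (and ZooKeeper session) on every retry.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper, not part of HBase.
class TrackerLifetimes {
    private static final DateTimeFormatter TS =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // Captures: (1) timestamp, (2) Starting or Stopping, (3) tracker id after '@'.
    private static final Pattern EVENT = Pattern.compile(
        "^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}) .*"
        + "(Starting|Stopping) catalog tracker \\S*@([0-9a-f]+)$");

    // Unwrapped copies of four lines from the log quoted in the issue.
    static final String[] SAMPLE = {
        "2013-10-11 18:27:53,527 DEBUG [main] catalog.CatalogTracker: Starting catalog tracker org.apache.hadoop.hbase.catalog.CatalogTracker@4ace08a5",
        "2013-10-11 18:27:53,602 DEBUG [main] catalog.CatalogTracker: Stopping catalog tracker org.apache.hadoop.hbase.catalog.CatalogTracker@4ace08a5",
        "2013-10-11 18:27:54,667 DEBUG [main] catalog.CatalogTracker: Starting catalog tracker org.apache.hadoop.hbase.catalog.CatalogTracker@692c0c5d",
        "2013-10-11 18:27:54,821 DEBUG [main] catalog.CatalogTracker: Stopping catalog tracker org.apache.hadoop.hbase.catalog.CatalogTracker@692c0c5d",
    };

    /** Maps each tracker id to its lifetime in milliseconds (start to stop). */
    static Map<String, Long> lifetimesMillis(String[] lines) {
        Map<String, LocalDateTime> started = new LinkedHashMap<>();
        Map<String, Long> lifetimes = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = EVENT.matcher(line.trim());
            if (!m.matches()) {
                continue; // not a tracker start/stop line
            }
            LocalDateTime when = LocalDateTime.parse(m.group(1), TS);
            String id = m.group(3);
            if (m.group(2).equals("Starting")) {
                started.put(id, when);
            } else if (started.containsKey(id)) {
                lifetimes.put(id, Duration.between(started.remove(id), when).toMillis());
            }
        }
        return lifetimes;
    }

    public static void main(String[] args) {
        System.out.println(lifetimesMillis(SAMPLE)); // {4ace08a5=75, 692c0c5d=154}
    }
}
```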

> IntegrationTestLoadAndVerify fails deleting IntegrationTestLoadAndVerify 
> table 
> ---
>
> Key: HBASE-9779
> URL: https://issues.apache.org/jira/browse/HBASE-9779
> Project: HBase
> Issue Type: Bug
> Components: test
> Affects Versions: 0.96.0
> Reporter: stack
> Assignee: stack
> Priority: Critical
> Attachments: 9779part.txt
>
>
> As part of the test, we want to delete the created table to restore cluster 
> state.  Interestingly we can disable the table successfully but then 
> immediately after we fail the delete because we cannot get the table 
> descriptor -- getting the file descriptor is used to test if table is present.
> The test for getDescriptor is kinda broken because it throws a base 
> IOException (IOE), which causes clients to retry over and over again as 
> though the descriptor was going to come back.
> This bug is kinda ugly because in at least one case it caused our 
> long-running hbase-it suite run to fail, so it would be good to fix.
> Here is sample from a test run:
> {code}
> Disabling table IntegrationTestLoadAndVerify 2013-10-11 18:27:53,485 INFO  
> [main] client.HBaseAdmin: Started disable of IntegrationTestLoadAndVerify
> 2013-10-11 18:27:53,526 INFO  [main] zookeeper.ZooKeeper: Initiating client 
> connection, connectString=a1805.halxg.cloudera.com:2181 sessionTimeout=9 
> watcher=catalogtracker-on-hconnection-0x5a7e666f
> 2013-10-11 18:27:53,527 INFO  [main] zookeeper.RecoverableZooKeeper: Process 
> identifier=catalogtracker-on-hconnection-0x5a7e666f connecting to ZooKeeper 
> ensemble=a1805.halxg.cloudera.com:2181
> 2013-10-11 18:27:53,527 INFO  
> [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn: 
> Opening socket connection to server 
> a1805.halxg.cloudera.com/10.20.200.105:2181. Will not attempt to authenticate 
> using SASL (unknown error)
> 2013-10-11 18:27:53,527 DEBUG [main] catalog.CatalogTracker: Starting catalog 
> tracker org.apache.hadoop.hbase.catalog.CatalogTracker@4ace08a5
> 2013-10-11 18:27:53,529 INFO  
> [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn: Socket 
> connection established to a1805.halxg.cloudera.com/10.20.200.105:2181, 
> initiating session
> 2013-10-11 18:27:53,539 INFO  
> [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn: 
> Session establishment complete on server 
> a1805.halxg.cloudera.com/10.20.200.105:2181, sessionid = 0x1412d47f53a5c70, 
> negotiated timeout = 4
> 2013-10-11 18:27:53,602 DEBUG [main] catalog.CatalogTracker: Stopping catalog 
> tracker org.apache.hadoop.hbase.catalog.CatalogTracker@4ace08a5
> 2013-10-11 18:27:53,662 INFO  [main] zookeeper.ZooKeeper: Session: 
> 0x1412d47f53a5c70 closed
> 2013-10-11 18:27:53,662 INFO  [main-EventThread] zookeeper.ClientCnxn: 
> EventThread shut down
> .2013-10-11 18:27:54,666 INFO  [main] zookeeper.ZooKeeper: Initiating client 
> connection, connectString=a1805.halxg.cloudera.com:2181 sessionTimeout=9 
> watcher=catalogtracker-on-hconnection-0x5a7e666f
> 2013-10-11 18:27:54,667 INFO  [main] zookeeper.RecoverableZooKeeper: Process 
> identifier=catalogtracker-on-hconnection-0x5a7e666f connecting to ZooKeeper 
> ensemble=a1805.halxg.cloudera.com:2181
> 2013-10-11 18:27:54,667 INFO  
> [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn: 
> Opening socket connection to server 
> a1805.halxg.cloudera.com/10.20.200.105:2181. Will not attempt to authenticate 
> using SASL (unknown error)
> 2013-10-11 18:27:54,667 DEBUG [main] catalog.CatalogTracker: Starting catalog 
> tracker org.apache.hadoop.hbase.catalog.CatalogTracker@692c0c5d
> 2013-10-11 18:27:54,667 INFO  
> [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn: Socket 
> connection established to a1805.halxg.cloudera.com/10.20.200.105:2181, 
> initiating session
> 2013-10-11 18:27:54,696 INFO  
> [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn: 
> Session establishment complete on server 
> a1805.halxg.cloudera.com/10.20.200.105:2181, sessionid = 0x1412d47f53a5c71, 
> negotiated timeout = 4
> 2013-10-11 18:27:54,821 DEBUG [main] catalog.CatalogTracker: Stopping catalog 
> tracker org.apache.hadoop.hbase.catalog.CatalogTracker@692c0c5d
> 2013-10-11 18:27:54,871 INFO  [main] zookeeper.ZooKeeper: Session: 
> 0x1412d47f53a5c71 closed
> 2013-10-11 18:27:54,871 INFO  [main-EventThread] zookeeper.ClientCnxn: 
> EventThread shut down
> .2013-10-11 18:27:55,890 INFO  [main] zookeeper.Zoo
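
The retry behaviour described in the issue can be illustrated with a minimal sketch. It is not the attached patch and does not use the HBase client; `RetrySketch`, `NonRetriableIOException`, and `countAttempts` are hypothetical names chosen for the example. The point it tries to show: when a lookup fails with a plain `IOException`, a generic retry loop cannot tell a permanent failure from a transient one and uses up every attempt, whereas a dedicated exception type lets the loop give up at once.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical marker for failures that retrying cannot fix; not an HBase class.
class NonRetriableIOException extends IOException {
    NonRetriableIOException(String message) {
        super(message);
    }
}

class RetrySketch {
    /** Runs op up to maxAttempts times, retrying only on plain IOExceptions. */
    static <T> T callWithRetries(Callable<T> op, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (NonRetriableIOException e) {
                throw e; // permanent failure: stop immediately
            } catch (IOException e) {
                last = e; // possibly transient: try again
            }
        }
        throw last;
    }

    /** Counts how often a failing lookup is invoked before the loop gives up. */
    static int countAttempts(boolean permanent, int maxAttempts) {
        AtomicInteger calls = new AtomicInteger();
        try {
            callWithRetries(() -> {
                calls.incrementAndGet();
                if (permanent) {
                    throw new NonRetriableIOException("table not found");
                }
                throw new IOException("could not read descriptor");
            }, maxAttempts);
        } catch (Exception expected) {
            // Both variants fail; only the number of attempts differs.
        }
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println(countAttempts(false, 5)); // 5: base IOException is retried
        System.out.println(countAttempts(true, 5));  // 1: non-retriable stops at once
    }
}
```

In this sketch the only difference between the two runs is the exception type thrown by the lookup, which is why throwing a base IOE for a missing descriptor leads to repeated, pointless attempts.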

[jira] [Commented] (HBASE-9779) IntegrationTestLoadAndVerify fails deleting IntegrationTestLoadAndVerify table

2013-10-18 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799299#comment-13799299
 ] 

stack commented on HBASE-9779:
--

It failed again on internal #43.  Will dig in on this one now.

[~ndimiduk] From a user's perspective, agreed.  I marked it Critical because it 
is failing long-running tests, and without completed long-running tests it is 
hard to have confidence in the bits.  That's why I think it is Critical.


[jira] [Commented] (HBASE-9779) IntegrationTestLoadAndVerify fails deleting IntegrationTestLoadAndVerify table

2013-10-16 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796953#comment-13796953
 ] 

Nick Dimiduk commented on HBASE-9779:
-

Patch looks good to me.

I think Critical is strong considering the impact is primarily in test. It is 
probably only Major or Minor from a user's perspective.
