[jira] [Commented] (HBASE-9779) IntegrationTestLoadAndVerify fails deleting IntegrationTestLoadAndVerify table
[ https://issues.apache.org/jira/browse/HBASE-9779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140287#comment-14140287 ]

chendihao commented on HBASE-9779:
----------------------------------

Why is it "Stopping catalog tracker" and establishing sessions so frequently? Is it related to this issue? [~ndimiduk] [~stack]

> IntegrationTestLoadAndVerify fails deleting IntegrationTestLoadAndVerify table
> ------------------------------------------------------------------------------
>
> Key: HBASE-9779
> URL: https://issues.apache.org/jira/browse/HBASE-9779
> Project: HBase
> Issue Type: Bug
> Components: test
> Affects Versions: 0.96.0
> Reporter: stack
> Assignee: stack
> Priority: Critical
> Attachments: 9779part.txt
>
>
> As part of the test, we want to delete the created table to restore cluster
> state. Interestingly we can disable the table successfully but then
> immediately after we fail the delete because we cannot get the table
> descriptor -- getting the file descriptor is used to test if table is present.
> The test for getDescriptor is kinda broke because it throws base IOE which
> causes clients to retry over and over again as though the descriptor was
> going to come back.
> This bug is kinda ugly because in at least one case it caused our
> long-running hbase-it suite run to fail so would be good to fix.
> Here is sample from a test run:
> {code}
> Disabling table IntegrationTestLoadAndVerify 2013-10-11 18:27:53,485 INFO [main] client.HBaseAdmin: Started disable of IntegrationTestLoadAndVerify
> 2013-10-11 18:27:53,526 INFO [main] zookeeper.ZooKeeper: Initiating client connection, connectString=a1805.halxg.cloudera.com:2181 sessionTimeout=9 watcher=catalogtracker-on-hconnection-0x5a7e666f
> 2013-10-11 18:27:53,527 INFO [main] zookeeper.RecoverableZooKeeper: Process identifier=catalogtracker-on-hconnection-0x5a7e666f connecting to ZooKeeper ensemble=a1805.halxg.cloudera.com:2181
> 2013-10-11 18:27:53,527 INFO [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn: Opening socket connection to server a1805.halxg.cloudera.com/10.20.200.105:2181. Will not attempt to authenticate using SASL (unknown error)
> 2013-10-11 18:27:53,527 DEBUG [main] catalog.CatalogTracker: Starting catalog tracker org.apache.hadoop.hbase.catalog.CatalogTracker@4ace08a5
> 2013-10-11 18:27:53,529 INFO [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn: Socket connection established to a1805.halxg.cloudera.com/10.20.200.105:2181, initiating session
> 2013-10-11 18:27:53,539 INFO [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn: Session establishment complete on server a1805.halxg.cloudera.com/10.20.200.105:2181, sessionid = 0x1412d47f53a5c70, negotiated timeout = 4
> 2013-10-11 18:27:53,602 DEBUG [main] catalog.CatalogTracker: Stopping catalog tracker org.apache.hadoop.hbase.catalog.CatalogTracker@4ace08a5
> 2013-10-11 18:27:53,662 INFO [main] zookeeper.ZooKeeper: Session: 0x1412d47f53a5c70 closed
> 2013-10-11 18:27:53,662 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
> .2013-10-11 18:27:54,666 INFO [main] zookeeper.ZooKeeper: Initiating client connection, connectString=a1805.halxg.cloudera.com:2181 sessionTimeout=9 watcher=catalogtracker-on-hconnection-0x5a7e666f
> 2013-10-11 18:27:54,667 INFO [main] zookeeper.RecoverableZooKeeper: Process identifier=catalogtracker-on-hconnection-0x5a7e666f connecting to ZooKeeper ensemble=a1805.halxg.cloudera.com:2181
> 2013-10-11 18:27:54,667 INFO [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn: Opening socket connection to server a1805.halxg.cloudera.com/10.20.200.105:2181. Will not attempt to authenticate using SASL (unknown error)
> 2013-10-11 18:27:54,667 DEBUG [main] catalog.CatalogTracker: Starting catalog tracker org.apache.hadoop.hbase.catalog.CatalogTracker@692c0c5d
> 2013-10-11 18:27:54,667 INFO [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn: Socket connection established to a1805.halxg.cloudera.com/10.20.200.105:2181, initiating session
> 2013-10-11 18:27:54,696 INFO [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn: Session establishment complete on server a1805.halxg.cloudera.com/10.20.200.105:2181, sessionid = 0x1412d47f53a5c71, negotiated timeout = 4
> 2013-10-11 18:27:54,821 DEBUG [main] catalog.CatalogTracker: Stopping catalog tracker org.apache.hadoop.hbase.catalog.CatalogTracker@692c0c5d
> 2013-10-11 18:27:54,871 INFO [main] zookeeper.ZooKeeper: Session: 0x1412d47f53a5c71 closed
> 2013-10-11 18:27:54,871 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
> .2013-10-11 18:27:55,890 INFO [main] zookeeper.Zoo
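
The issue description says the descriptor lookup throws a base IOException, which makes clients retry as though the descriptor might come back. The sketch below illustrates that effect in isolation. It is not HBase code and not the attached patch: `RetrySketch`, `DoNotRetrySketchException`, `TableMissingException` and `attemptsBeforeGivingUp` are hypothetical names made up for illustration, loosely modeled on the idea behind HBase's `DoNotRetryIOException`. It assumes a retry loop that treats any plain `IOException` as possibly transient and stops only on the do-not-retry subtype.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical, self-contained sketch; none of these types are HBase classes.
class RetrySketch {

    /** Stand-in marker for failures that retrying cannot fix. */
    static class DoNotRetrySketchException extends IOException {
        DoNotRetrySketchException(String message) {
            super(message);
        }
    }

    /** Stand-in for a "table is gone" signal; a subtype of the marker above. */
    static class TableMissingException extends DoNotRetrySketchException {
        TableMissingException(String message) {
            super(message);
        }
    }

    /**
     * Runs the call up to maxAttempts times and returns how many attempts
     * were made before it succeeded or the loop gave up.
     */
    static int attemptsBeforeGivingUp(Callable<Void> call, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                call.call();
                return attempt; // success
            } catch (DoNotRetrySketchException e) {
                return attempt; // permanent failure: stop immediately
            } catch (IOException e) {
                // Plain IOException looks transient to the caller, so keep trying.
            }
        }
        return maxAttempts; // retries exhausted
    }

    public static void main(String[] args) throws Exception {
        int generic = attemptsBeforeGivingUp(
            () -> { throw new IOException("descriptor not found"); }, 5);
        int specific = attemptsBeforeGivingUp(
            () -> { throw new TableMissingException("descriptor not found"); }, 5);
        System.out.println("base IOException: " + generic + " attempts");
        System.out.println("do-not-retry subtype: " + specific + " attempts");
    }
}
```

With a limit of 5, the base `IOException` case uses all 5 attempts while the do-not-retry subtype stops after 1, which matches the repeated once-per-second connect/disconnect pattern one would expect from a retrying client.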
[jira] [Commented] (HBASE-9779) IntegrationTestLoadAndVerify fails deleting IntegrationTestLoadAndVerify table
[ https://issues.apache.org/jira/browse/HBASE-9779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799299#comment-13799299 ]

stack commented on HBASE-9779:
------------------------------

It failed again on internal #43. Will dig in on this one now.

[~ndimiduk] From the user's perspective, agree. I marked it critical because it is failing long-running tests, and w/o completed long-running tests it is hard to have confidence in the bits. That's why I think it is critical.
[jira] [Commented] (HBASE-9779) IntegrationTestLoadAndVerify fails deleting IntegrationTestLoadAndVerify table
[ https://issues.apache.org/jira/browse/HBASE-9779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796953#comment-13796953 ]

Nick Dimiduk commented on HBASE-9779:
-------------------------------------

Patch looks good to me.

I think Critical is strong considering the impact is primarily in test. It is probably only Major or Minor from a user's perspective.