[jira] [Commented] (HBASE-9779) IntegrationTestLoadAndVerify fails deleting IntegrationTestLoadAndVerify table
[ https://issues.apache.org/jira/browse/HBASE-9779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140287#comment-14140287 ] chendihao commented on HBASE-9779: -- Why is the catalog tracker stopped ("Stopping catalog tracker") and why are sessions established so frequently? Is it related to this issue? [~ndimiduk] [~stack] > IntegrationTestLoadAndVerify fails deleting IntegrationTestLoadAndVerify > table > --- > > Key: HBASE-9779 > URL: https://issues.apache.org/jira/browse/HBASE-9779 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 0.96.0 >Reporter: stack >Assignee: stack >Priority: Critical > Attachments: 9779part.txt > > > As part of the test, we want to delete the created table to restore cluster > state. Interestingly we can disable the table successfully but then > immediately after we fail the delete because we cannot get the table > descriptor -- getting the table descriptor is used to test if the table is present. > The test for getDescriptor is kind of broken because it throws the base IOE, which > causes clients to retry over and over again as though the descriptor were > going to come back. > This bug is kind of ugly because in at least one case it caused our > long-running hbase-it suite run to fail, so it would be good to fix.
> Here is sample from a test run: > {code} > Disabling table IntegrationTestLoadAndVerify 2013-10-11 18:27:53,485 INFO > [main] client.HBaseAdmin: Started disable of IntegrationTestLoadAndVerify > 2013-10-11 18:27:53,526 INFO [main] zookeeper.ZooKeeper: Initiating client > connection, connectString=a1805.halxg.cloudera.com:2181 sessionTimeout=9 > watcher=catalogtracker-on-hconnection-0x5a7e666f > 2013-10-11 18:27:53,527 INFO [main] zookeeper.RecoverableZooKeeper: Process > identifier=catalogtracker-on-hconnection-0x5a7e666f connecting to ZooKeeper > ensemble=a1805.halxg.cloudera.com:2181 > 2013-10-11 18:27:53,527 INFO > [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn: > Opening socket connection to server > a1805.halxg.cloudera.com/10.20.200.105:2181. Will not attempt to authenticate > using SASL (unknown error) > 2013-10-11 18:27:53,527 DEBUG [main] catalog.CatalogTracker: Starting catalog > tracker org.apache.hadoop.hbase.catalog.CatalogTracker@4ace08a5 > 2013-10-11 18:27:53,529 INFO > [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn: Socket > connection established to a1805.halxg.cloudera.com/10.20.200.105:2181, > initiating session > 2013-10-11 18:27:53,539 INFO > [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn: > Session establishment complete on server > a1805.halxg.cloudera.com/10.20.200.105:2181, sessionid = 0x1412d47f53a5c70, > negotiated timeout = 4 > 2013-10-11 18:27:53,602 DEBUG [main] catalog.CatalogTracker: Stopping catalog > tracker org.apache.hadoop.hbase.catalog.CatalogTracker@4ace08a5 > 2013-10-11 18:27:53,662 INFO [main] zookeeper.ZooKeeper: Session: > 0x1412d47f53a5c70 closed > 2013-10-11 18:27:53,662 INFO [main-EventThread] zookeeper.ClientCnxn: > EventThread shut down > .2013-10-11 18:27:54,666 INFO [main] zookeeper.ZooKeeper: Initiating client > connection, connectString=a1805.halxg.cloudera.com:2181 sessionTimeout=9 > watcher=catalogtracker-on-hconnection-0x5a7e666f > 
2013-10-11 18:27:54,667 INFO [main] zookeeper.RecoverableZooKeeper: Process > identifier=catalogtracker-on-hconnection-0x5a7e666f connecting to ZooKeeper > ensemble=a1805.halxg.cloudera.com:2181 > 2013-10-11 18:27:54,667 INFO > [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn: > Opening socket connection to server > a1805.halxg.cloudera.com/10.20.200.105:2181. Will not attempt to authenticate > using SASL (unknown error) > 2013-10-11 18:27:54,667 DEBUG [main] catalog.CatalogTracker: Starting catalog > tracker org.apache.hadoop.hbase.catalog.CatalogTracker@692c0c5d > 2013-10-11 18:27:54,667 INFO > [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn: Socket > connection established to a1805.halxg.cloudera.com/10.20.200.105:2181, > initiating session > 2013-10-11 18:27:54,696 INFO > [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn: > Session establishment complete on server > a1805.halxg.cloudera.com/10.20.200.105:2181, sessionid = 0x1412d47f53a5c71, > negotiated timeout = 4 > 2013-10-11 18:27:54,821 DEBUG [main] catalog.CatalogTracker: Stopping catalog > tracker org.apache.hadoop.hbase.catalog.CatalogTracker@692c0c5d > 2013-10-11 18:27:54,871 INFO [main] zookeeper.ZooKeeper: Session: > 0x1412d47f53a5c71 closed > 2013-10-11 18:27:54,871 INFO [main-EventThread] zookeeper.ClientCnxn: > EventThread shut down > .2013-10-11 18:27:55,890 INFO [main] zookeeper.Zoo
[jira] [Commented] (HBASE-11885) Provide a Dockerfile to easily build and run HBase from source
[ https://issues.apache.org/jira/browse/HBASE-11885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14120811#comment-14120811 ] chendihao commented on HBASE-11885: --- That's nice, and I hadn't thought about the license problem. As users, they may not care about the versions of Java and Maven, so I think the ideal way for them is to pull a prebuilt HBase image to run instead of building it themselves. As developers, we may want to test combinations of software, but it's hard to cater to every kind of environment: different versions of Java, Maven, Linux distributions, and kernels. > Provide a Dockerfile to easily build and run HBase from source > -- > > Key: HBASE-11885 > URL: https://issues.apache.org/jira/browse/HBASE-11885 > Project: HBase > Issue Type: New Feature >Reporter: Dima Spivak >Assignee: Dima Spivak > Attachments: HBASE-11885.patch > > > [A recent email to > dev@|http://mail-archives.apache.org/mod_mbox/hbase-dev/201408.mbox/%3CCAAef%2BM4q%3Da8Dqxe_EHSFTueY%2BXxz%2BtTe%2BJKsWWbXjhB_Pz7oSA%40mail.gmail.com%3E] > highlighted the difficulty that new users can face in getting HBase compiled > from source and running locally. I'd like to provide a Dockerfile that would > allow anyone with Docker running on a machine with a reasonably current Linux > kernel to do so with ease. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11885) Provide a Dockerfile to easily build and run HBase from source
[ https://issues.apache.org/jira/browse/HBASE-11885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14120783#comment-14120783 ] chendihao commented on HBASE-11885: --- I'm also interested in building HBase on Docker, and here's my Dockerfile: https://github.com/tobegit3hub/standalone-hbase-0.94. There are lots of HBase images on Docker Hub. Docker lets us escape from dependency hell, so I don't think we should download the JDK and Maven tarballs ourselves to build the image :-) > Provide a Dockerfile to easily build and run HBase from source > -- > > Key: HBASE-11885 > URL: https://issues.apache.org/jira/browse/HBASE-11885 > Project: HBase > Issue Type: New Feature >Reporter: Dima Spivak >Assignee: Dima Spivak > Attachments: HBASE-11885.patch > > > [A recent email to > dev@|http://mail-archives.apache.org/mod_mbox/hbase-dev/201408.mbox/%3CCAAef%2BM4q%3Da8Dqxe_EHSFTueY%2BXxz%2BtTe%2BJKsWWbXjhB_Pz7oSA%40mail.gmail.com%3E] > highlighted the difficulty that new users can face in getting HBase compiled > from source and running locally. I'd like to provide a Dockerfile that would > allow anyone with Docker running on a machine with a reasonably current Linux > kernel to do so with ease. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
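A minimal Dockerfile along the lines discussed in HBASE-11885 might look like the sketch below. Everything here is an assumption for illustration: the base image, package names, and build steps are not taken from the attached patch or from the linked repository.

```dockerfile
# Hypothetical sketch only -- base image, versions, and paths are
# assumptions, not taken from the attached patch or the linked repo.
FROM ubuntu:14.04

# Build-time dependencies: JDK and Maven from the distribution's packages,
# so we avoid downloading the tarballs ourselves.
RUN apt-get update && apt-get install -y openjdk-7-jdk maven git

# Build HBase from source
RUN git clone https://github.com/apache/hbase.git /opt/hbase
WORKDIR /opt/hbase
RUN mvn -DskipTests package

# Run a standalone master in the foreground
CMD ["bin/hbase", "master", "start"]
```

Building from source in the image serves developers, while pushing the resulting image to a registry would serve users who just want to pull and run, which matches the split described in the comment above.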
[jira] [Commented] (HBASE-11778) Scale timestamps by 1000
[ https://issues.apache.org/jira/browse/HBASE-11778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14103300#comment-14103300 ] chendihao commented on HBASE-11778: --- Shifting bits seems more efficient. If we had nano time, there would be no need to scale timestamps. But is there a problem with using nano time in Java? > Scale timestamps by 1000 > > > Key: HBASE-11778 > URL: https://issues.apache.org/jira/browse/HBASE-11778 > Project: HBase > Issue Type: Brainstorming >Reporter: Lars Hofhansl > > The KV timestamps are used for various reasons: > # ordering of KVs > # resolving conflicts > # enforce TTL > Currently we assume that the timestamps have a resolution of 1ms, and because > of that we made the resolution at which we can determine time identical to > the resolution at which we can store time. > I think it is time to disentangle the two... At least allow a higher > resolution of time to be stored. That way we could have a centralized > transaction oracle that produces ids that relate to wall clock time, and at > the same time allow producing more than 1000/s. > The simplest way is to just store time in us (microseconds). I.e. we'd still > collect time in ms by default and just multiply that with 1000 before we > store it. With 8 bytes that still gives us a range of 292471 years. > We'd have to grandfather in old data. We could write a metadata entry into each > HFile declaring what the TS resolution is if it is different from ms. > Not sure, yet, how this would relate to using the TS for things like seqIds. > Let's do some brainstorming. -- This message was sent by Atlassian JIRA (v6.2#6252)
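The arithmetic behind the HBASE-11778 proposal (store microseconds, i.e. the collected milliseconds multiplied by 1000, in the 8-byte timestamp) can be sketched as a small model. This is illustrative Python, not HBase code, and the helper names are made up.

```python
# Illustrative model of the proposal: collect time in milliseconds,
# multiply by 1000 before storing so the stored resolution is microseconds.
# Not HBase code; helper names are hypothetical.

MAX_TS = 2**63 - 1  # timestamps are stored in a signed 8-byte long

def to_stored_ts(millis):
    """Scale a wall-clock time in ms to the stored microsecond resolution."""
    return millis * 1000

def max_representable_years():
    """How many 365-day years fit in 8 bytes at microsecond resolution."""
    micros_per_year = 1_000_000 * 60 * 60 * 24 * 365
    return MAX_TS // micros_per_year  # matches the 292471 years in the issue

# A centralized oracle could hand out up to 1000 distinct ids per wall-clock
# millisecond by filling in the low three decimal digits:
def oracle_id(millis, seq):
    assert 0 <= seq < 1000
    return millis * 1000 + seq
```

The last helper shows why the scaling allows "producing more than 1000/s": the three extra decimal digits act as a per-millisecond sequence number while the id still relates to wall-clock time.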
[jira] [Commented] (HBASE-11769) Truncate table shouldn't revoke user privileges
[ https://issues.apache.org/jira/browse/HBASE-11769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14100559#comment-14100559 ] chendihao commented on HBASE-11769: --- Agree with [~jmspaggi]. Truncate_preserve works well without removing the privileges. Won't fix, right? > Truncate table shouldn't revoke user privileges > --- > > Key: HBASE-11769 > URL: https://issues.apache.org/jira/browse/HBASE-11769 > Project: HBase > Issue Type: Bug > Components: security >Affects Versions: 0.94.15 >Reporter: hongyu bi > > hbase(main):002:0> create 'a','cf' > 0 row(s) in 0.2500 seconds > => Hbase::Table - a > hbase(main):003:0> grant 'usera','R','a' > 0 row(s) in 0.2080 seconds > hbase(main):007:0> user_permission 'a' > User > Table,Family,Qualifier:Permission > > > usera a,,: > [Permission: actions=READ] > > > hbase(main):004:0> truncate 'a' > Truncating 'a' table (it may take a while): > - Disabling table... > - Dropping table... > - Creating table... > 0 row(s) in 1.5320 seconds > hbase(main):005:0> user_permission 'a' > User > Table,Family,Qualifier:Permission > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11754) [Shell] Record table property SPLITS_FILE in descriptor
[ https://issues.apache.org/jira/browse/HBASE-11754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14100206#comment-14100206 ] chendihao commented on HBASE-11754: --- Thanks [~apurtell], the new title is much better :-) [~jmspaggi] We will not see this property in newer tables. I can't find any code that sets it, either. > [Shell] Record table property SPLITS_FILE in descriptor > --- > > Key: HBASE-11754 > URL: https://issues.apache.org/jira/browse/HBASE-11754 > Project: HBase > Issue Type: Improvement >Reporter: chendihao >Assignee: chendihao >Priority: Trivial > Fix For: 0.99.0, 2.0.0, 0.94.23, 0.98.6 > > Attachments: HBASE-11754-0.94-1.patch, HBASE-11754-trunk-1.patch > > > When I check the properties of HBase table on Web UI, some tables have > SPLITS_FILE property but some don't. In fact, those tables pre-split > correctly but this property is not stored in .tableinfo in HDFS. But some > table do, that's a little weird. > Knowing SPLITS_FILE helps us to compare the backup table in different cluster > with the original one. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11754) Display property SPLITS_FILE on Web UI
[ https://issues.apache.org/jira/browse/HBASE-11754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chendihao updated HBASE-11754: -- Attachment: HBASE-11754-0.94-1.patch Patch for 0.94. > Display property SPLITS_FILE on Web UI > -- > > Key: HBASE-11754 > URL: https://issues.apache.org/jira/browse/HBASE-11754 > Project: HBase > Issue Type: Improvement >Reporter: chendihao >Priority: Minor > Attachments: HBASE-11754-0.94-1.patch, HBASE-11754-trunk-1.patch > > > When I check the properties of HBase table on Web UI, some tables have > SPLITS_FILE property but some don't. In fact, those tables pre-split > correctly but this property is not stored in .tableinfo in HDFS. But some > table do, that's a little weird. > Knowing SPLITS_FILE helps us to compare the backup table in different cluster > with the original one. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11754) Display property SPLITS_FILE on Web UI
[ https://issues.apache.org/jira/browse/HBASE-11754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chendihao updated HBASE-11754: -- Attachment: HBASE-11754-trunk-1.patch Patch for trunk. > Display property SPLITS_FILE on Web UI > -- > > Key: HBASE-11754 > URL: https://issues.apache.org/jira/browse/HBASE-11754 > Project: HBase > Issue Type: Improvement >Reporter: chendihao >Priority: Minor > Attachments: HBASE-11754-trunk-1.patch > > > When I check the properties of HBase table on Web UI, some tables have > SPLITS_FILE property but some don't. In fact, those tables pre-split > correctly but this property is not stored in .tableinfo in HDFS. But some > table do, that's a little weird. > Knowing SPLITS_FILE helps us to compare the backup table in different cluster > with the original one. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11754) Display property SPLITS_FILE on Web UI
[ https://issues.apache.org/jira/browse/HBASE-11754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098130#comment-14098130 ] chendihao commented on HBASE-11754: --- Please have a look. [~stack] [~lhofhansl] > Display property SPLITS_FILE on Web UI > -- > > Key: HBASE-11754 > URL: https://issues.apache.org/jira/browse/HBASE-11754 > Project: HBase > Issue Type: Improvement >Reporter: chendihao >Priority: Minor > Attachments: HBASE-11754-trunk-1.patch > > > When I check the properties of HBase table on Web UI, some tables have > SPLITS_FILE property but some don't. In fact, those tables pre-split > correctly but this property is not stored in .tableinfo in HDFS. But some > table do, that's a little weird. > Knowing SPLITS_FILE helps us to compare the backup table in different cluster > with the original one. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11754) Display property SPLITS_FILE on Web UI
[ https://issues.apache.org/jira/browse/HBASE-11754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098124#comment-14098124 ] chendihao commented on HBASE-11754: --- We usually use a Ruby script to create tables. I think we can add this property to the HTableDescriptor object when creating tables. Then .tableinfo stores this value and HBase can display it correctly on the Web UI. > Display property SPLITS_FILE on Web UI > -- > > Key: HBASE-11754 > URL: https://issues.apache.org/jira/browse/HBASE-11754 > Project: HBase > Issue Type: Improvement >Reporter: chendihao >Priority: Minor > > When I check the properties of HBase table on Web UI, some tables have > SPLITS_FILE property but some don't. In fact, those tables pre-split > correctly but this property is not stored in .tableinfo in HDFS. But some > table do, that's a little weird. > Knowing SPLITS_FILE helps us to compare the backup table in different cluster > with the original one. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-11754) Display property SPLITS_FILE on Web UI
chendihao created HBASE-11754: - Summary: Display property SPLITS_FILE on Web UI Key: HBASE-11754 URL: https://issues.apache.org/jira/browse/HBASE-11754 Project: HBase Issue Type: Improvement Reporter: chendihao Priority: Minor When I check the properties of HBase table on Web UI, some tables have SPLITS_FILE property but some don't. In fact, those tables pre-split correctly but this property is not stored in .tableinfo in HDFS. But some table do, that's a little weird. Knowing SPLITS_FILE helps us to compare the backup table in different cluster with the original one. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10153) improve VerifyReplication to compute BADROWS more accurately
[ https://issues.apache.org/jira/browse/HBASE-10153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098048#comment-14098048 ] chendihao commented on HBASE-10153: --- Without this improvement, there are many BADROWS reported even though the two tables are consistent. It's a little misleading. Please have a look. [~stack] [~lhofhansl] > improve VerifyReplication to compute BADROWS more accurately > > > Key: HBASE-10153 > URL: https://issues.apache.org/jira/browse/HBASE-10153 > Project: HBase > Issue Type: Improvement > Components: Replication >Affects Versions: 0.94.14 >Reporter: cuijianwei > Attachments: HBASE-10153-0.94-v1.patch > > > VerifyReplication could compare the source table with its peer table and > compute BADROWS. However, the current BADROWS computing method might not be > accurate enough. For example, if source table contains rows as {r1, r2, r3, > r4} and peer table contains rows as {r1, r3, r4}, BADROWS will be 3 because > 'r2' in source table will make all the later row comparisons fail. Would it be > better if the BADROWS were computed as 1 in this situation? Maybe we can > compute the BADROWS more accurately with a merge comparison? -- This message was sent by Atlassian JIRA (v6.2#6252)
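The merge comparison suggested in HBASE-10153 can be sketched as follows. This is an illustrative model over plain sorted row-key lists, not the actual VerifyReplication mapper.

```python
# Sketch of a merge-style comparison between two sorted row-key lists, so a
# single missing row counts as one BADROWS instead of cascading failures.
# Illustrative only; not the actual VerifyReplication code.

def count_bad_rows(source_rows, peer_rows):
    """Merge-compare two sorted lists of row keys; count rows that differ."""
    bad = 0
    i = j = 0
    while i < len(source_rows) and j < len(peer_rows):
        if source_rows[i] == peer_rows[j]:
            i += 1
            j += 1
        elif source_rows[i] < peer_rows[j]:
            bad += 1          # row only in the source table
            i += 1
        else:
            bad += 1          # row only in the peer table
            j += 1
    # rows left over on either side are also bad
    return bad + (len(source_rows) - i) + (len(peer_rows) - j)
```

With the example from the issue, `count_bad_rows(["r1", "r2", "r3", "r4"], ["r1", "r3", "r4"])` reports only the one genuinely missing row, whereas naive pairwise comparison would report three.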
[jira] [Commented] (HBASE-11675) Major compactions change query results
[ https://issues.apache.org/jira/browse/HBASE-11675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14086170#comment-14086170 ] chendihao commented on HBASE-11675: --- [~jmspaggi] We're working on it. At present there's no perfect solution, but I don't think this is normal behavior for our users. We will make a patch when we work it out. > Major compactions change query results > -- > > Key: HBASE-11675 > URL: https://issues.apache.org/jira/browse/HBASE-11675 > Project: HBase > Issue Type: Bug >Reporter: chendihao > > The bug mentioned in http://hbase.apache.org/book.html > 5.9.2.2. Major compactions change query results > “...create three cell versions at t1, t2 and t3, with a maximum-versions > setting of 2. So when getting all versions, only the values at t2 and t3 will > be returned. But if you delete the version at t2 or t3, the one at t1 will > appear again. Obviously, once a major compaction has run, such behavior will > not be the case anymore...” -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-11675) Major compactions change query results
chendihao created HBASE-11675: - Summary: Major compactions change query results Key: HBASE-11675 URL: https://issues.apache.org/jira/browse/HBASE-11675 Project: HBase Issue Type: Bug Reporter: chendihao The bug mentioned in http://hbase.apache.org/book.html 5.9.2.2. Major compactions change query results “...create three cell versions at t1, t2 and t3, with a maximum-versions setting of 2. So when getting all versions, only the values at t2 and t3 will be returned. But if you delete the version at t2 or t3, the one at t1 will appear again. Obviously, once a major compaction has run, such behavior will not be the case anymore...” -- This message was sent by Atlassian JIRA (v6.2#6252)
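The version behavior quoted from the book can be modeled with a toy store: all written versions are kept until a major compaction physically purges the excess ones, while reads always apply the max-versions limit. This is a hypothetical illustration, not HBase code.

```python
# Toy model of the versioning behavior quoted above (not HBase code).
# The store keeps every written version until a "major compaction" purges
# versions beyond max_versions; reads always apply the max_versions limit.

class ToyCell:
    def __init__(self, max_versions=2):
        self.max_versions = max_versions
        self.versions = {}  # timestamp -> value

    def put(self, ts, value):
        self.versions[ts] = value

    def delete_version(self, ts):
        self.versions.pop(ts, None)

    def get_all(self):
        """Return the newest max_versions timestamps, newest first."""
        return sorted(self.versions, reverse=True)[: self.max_versions]

    def major_compact(self):
        """Physically drop versions beyond max_versions."""
        for ts in sorted(self.versions, reverse=True)[self.max_versions:]:
            del self.versions[ts]
```

Putting versions at t1, t2, t3 and then deleting t3 makes t1 reappear in `get_all()`; if a major compaction ran first, t1 was already purged and stays gone, which is exactly the inconsistency the issue describes.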
[jira] [Updated] (HBASE-9800) Impl CoprocessorRowcounter to run in command-line
[ https://issues.apache.org/jira/browse/HBASE-9800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chendihao updated HBASE-9800: - Attachment: HBASE-9800-0.94-v2.patch patch for 0.94 v2 > Impl CoprocessorRowcounter to run in command-line > - > > Key: HBASE-9800 > URL: https://issues.apache.org/jira/browse/HBASE-9800 > Project: HBase > Issue Type: New Feature > Components: Coprocessors >Affects Versions: 0.94.3 >Reporter: chendihao >Assignee: chendihao >Priority: Minor > Attachments: HBASE-9800-0.94-v1.patch, HBASE-9800-0.94-v2.patch > > > We want to count the rows of table daily but the default rowcounter using > mapreduce is inefficient. Impl rowcounter using coprocessor makes it better. > Furthermore, It must provide the mechanism to choose the way to output result. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-9800) Impl CoprocessorRowcounter to run in command-line
[ https://issues.apache.org/jira/browse/HBASE-9800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13933036#comment-13933036 ] chendihao commented on HBASE-9800: -- Thanks [~yuzhih...@gmail.com] for reviewing. 1. I will update the license header. 2. LocalFileSink can be used when we set "coprocessor.rowcounter.sink" to "org.apache.hadoop.hbase.coprocessor.example.CoprocessorRowcounter$LocalFileSink" in the configuration file. 3. We periodically count the rows, so we need to indicate the date of the data. It's not so general but it meets our requirements. 4. The patch for trunk will be uploaded later if we need it. > Impl CoprocessorRowcounter to run in command-line > - > > Key: HBASE-9800 > URL: https://issues.apache.org/jira/browse/HBASE-9800 > Project: HBase > Issue Type: New Feature > Components: Coprocessors >Affects Versions: 0.94.3 >Reporter: chendihao >Assignee: chendihao >Priority: Minor > Attachments: HBASE-9800-0.94-v1.patch > > > We want to count the rows of table daily but the default rowcounter using > mapreduce is inefficient. Impl rowcounter using coprocessor makes it better. > Furthermore, It must provide the mechanism to choose the way to output result. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-9800) Impl CoprocessorRowcounter to run in command-line
[ https://issues.apache.org/jira/browse/HBASE-9800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13931576#comment-13931576 ] chendihao commented on HBASE-9800: -- [~te...@apache.org] [~stack] It's a nice feature for us to count the rows. But the issue has been open for a long time; could you help review the code or just close it? > Impl CoprocessorRowcounter to run in command-line > - > > Key: HBASE-9800 > URL: https://issues.apache.org/jira/browse/HBASE-9800 > Project: HBase > Issue Type: New Feature > Components: Coprocessors >Affects Versions: 0.94.3 >Reporter: chendihao >Assignee: chendihao >Priority: Minor > Attachments: HBASE-9800-0.94-v1.patch > > > We want to count the rows of table daily but the default rowcounter using > mapreduce is inefficient. Impl rowcounter using coprocessor makes it better. > Furthermore, It must provide the mechanism to choose the way to output result. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10365) HBaseFsck should clean up connection properly when repair is completed
[ https://issues.apache.org/jira/browse/HBASE-10365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13879810#comment-13879810 ] chendihao commented on HBASE-10365: --- Will this be backported to 0.94? [~lhofhansl] > HBaseFsck should clean up connection properly when repair is completed > -- > > Key: HBASE-10365 > URL: https://issues.apache.org/jira/browse/HBASE-10365 > Project: HBase > Issue Type: Bug >Reporter: Ted Yu >Assignee: Ted Yu > Fix For: 0.98.0, 0.99.0 > > Attachments: 10365-v1.txt > > > At the end of exec() method, connections to the cluster are not properly > released. > Connections should be released upon completion of repair. > This was mentioned by Jean-Marc in the thread '[VOTE] The 1st hbase 0.94.16 > release candidate is available for download' -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10282) We can't assure that the first ZK server is active server in MiniZooKeeperCluster
[ https://issues.apache.org/jira/browse/HBASE-10282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chendihao updated HBASE-10282: -- Attachment: HBASE-10282-0.94-v1.patch patch for 0.94 > We can't assure that the first ZK server is active server in > MiniZooKeeperCluster > - > > Key: HBASE-10282 > URL: https://issues.apache.org/jira/browse/HBASE-10282 > Project: HBase > Issue Type: Bug >Affects Versions: 0.94.3 >Reporter: chendihao >Assignee: chendihao >Priority: Minor > Attachments: HBASE-10282-0.94-v1.patch > > > Thanks to HBASE-3052, we're able to run multiple zk servers in minicluster. > However, It's confusing to keep the variable activeZKServerIndex as zero and > assure the first zk server is always the active one. I think returning the > first sever's client port is for testing and it seems that we can directly > return the first item of the list. Anyway, the concept of "active" here is > not the same as zk's. > It's confusing when I read the code so I think we should fix it. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-9932) Remove Master Recovery handling when ZK session expired
[ https://issues.apache.org/jira/browse/HBASE-9932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13871871#comment-13871871 ] chendihao commented on HBASE-9932: -- If it works for now, I don't think we have to remove the functional code. [~jeffreyz] Can you explain what problem it will cause during master recovery? I'm working on this and share the feeling that there are potential issues when the HMaster is disconnected or its session expires (refer to HBASE-10345). Removing the recovery handling is the simplest solution, but we should understand the underlying problem first. > Remove Master Recovery handling when ZK session expired > --- > > Key: HBASE-9932 > URL: https://issues.apache.org/jira/browse/HBASE-9932 > Project: HBase > Issue Type: Brainstorming >Reporter: Jeffrey Zhong > > Currently we use HMaster#tryRecoveringExpiredZKSession to allow master > recovery from a ZK session expired error. While this triggers a partial > reinitialization of the HMaster, it is error prone because it's hard to guarantee the half-initialized > master is in a correct state. I found several times already that > the registered ZK listeners are different before & after a failover. > Since we already have HA support, I'm proposing to remove this part handling. > Though we have a configuration setting "fail.fast.expired.active.master" to > skip the logic, why not go one step further to clean up the master code. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HBASE-10345) HMaster should not serve when disconnected with ZooKeeper
chendihao created HBASE-10345: - Summary: HMaster should not serve when disconnected with ZooKeeper Key: HBASE-10345 URL: https://issues.apache.org/jira/browse/HBASE-10345 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.94.3 Reporter: chendihao Referring to HBASE-9468 (the previous active master can still serve RPC requests while it is trying to recover an expired zk session), we can fail fast to avoid having two active masters at the same time. But this problem may occur before the session expires. When it receives a Disconnected event, the active master can't be sure that it will be able to communicate with zk later. And it doesn't know whether a backup master has become the new active master until it receives an Expired event (which may never arrive). During this unsure-who-is-active-master period, the current active master should not serve (maybe turn off the RpcServer). Here is the statement from "ZooKeeper Distributed Process Coordination" P101 {quote} If the developer is not careful, the old leader will continue to act as a leader and may take actions that conflict with those of the new leader. For this reason, when a process receives a Disconnected event, the process should suspend actions taken as a leader until it reconnects. Normally this reconnect happens very quickly. {quote} So it's equally necessary to handle the Disconnected event and the Expired event. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
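The rule quoted from the ZooKeeper book maps to a small state machine: suspend on Disconnected, resume on reconnect, step down only on Expired. The sketch below is a hypothetical illustration, not HMaster code; all names are made up.

```python
# Hypothetical sketch of the rule quoted above: on Disconnected, a leader
# suspends serving until it reconnects; only Expired forces it to step down.
# Not HMaster code; names are made up for illustration.

ACTIVE, SUSPENDED, STOPPED = "active", "suspended", "stopped"

class LeaderState:
    def __init__(self):
        self.state = ACTIVE

    def on_zk_event(self, event):
        if event == "Disconnected":
            # We may or may not still be the leader; stop serving until
            # the session is confirmed alive again.
            self.state = SUSPENDED
        elif event == "SyncConnected" and self.state == SUSPENDED:
            self.state = ACTIVE        # session survived; resume serving
        elif event == "Expired":
            self.state = STOPPED       # definitely not the leader anymore

    def can_serve(self):
        return self.state == ACTIVE
```

The point of the issue is the SUSPENDED state: between Disconnected and either SyncConnected or Expired, the master must not answer requests, because a backup may already have taken over.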
[jira] [Assigned] (HBASE-10282) We can't assure that the first ZK server is active server in MiniZooKeeperCluster
[ https://issues.apache.org/jira/browse/HBASE-10282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chendihao reassigned HBASE-10282: - Assignee: chendihao > We can't assure that the first ZK server is active server in > MiniZooKeeperCluster > - > > Key: HBASE-10282 > URL: https://issues.apache.org/jira/browse/HBASE-10282 > Project: HBase > Issue Type: Bug >Affects Versions: 0.94.3 >Reporter: chendihao >Assignee: chendihao >Priority: Minor > > Thanks to HBASE-3052, we're able to run multiple zk servers in minicluster. > However, It's confusing to keep the variable activeZKServerIndex as zero and > assure the first zk server is always the active one. I think returning the > first sever's client port is for testing and it seems that we can directly > return the first item of the list. Anyway, the concept of "active" here is > not the same as zk's. > It's confusing when I read the code so I think we should fix it. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10282) We can't assure that the first ZK server is active server in MiniZooKeeperCluster
[ https://issues.apache.org/jira/browse/HBASE-10282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13871513#comment-13871513 ] chendihao commented on HBASE-10282: --- Thanks [~stack], we (Xiaomi) will make a patch to eliminate the confusion. Can we reduce those two functions to a single killRandomZooKeeperServer(), since they seem to have the same effect? Before doing that, we have to fix HBASE-10283, otherwise killing the first one will cause other problems. > We can't assure that the first ZK server is active server in > MiniZooKeeperCluster > - > > Key: HBASE-10282 > URL: https://issues.apache.org/jira/browse/HBASE-10282 > Project: HBase > Issue Type: Bug >Affects Versions: 0.94.3 >Reporter: chendihao >Priority: Minor > > Thanks to HBASE-3052, we're able to run multiple zk servers in minicluster. > However, It's confusing to keep the variable activeZKServerIndex as zero and > assure the first zk server is always the active one. I think returning the > first sever's client port is for testing and it seems that we can directly > return the first item of the list. Anyway, the concept of "active" here is > not the same as zk's. > It's confusing when I read the code so I think we should fix it. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Assigned] (HBASE-9830) Backport HBASE-9605 to 0.94
[ https://issues.apache.org/jira/browse/HBASE-9830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chendihao reassigned HBASE-9830: Assignee: chendihao > Backport HBASE-9605 to 0.94 > --- > > Key: HBASE-9830 > URL: https://issues.apache.org/jira/browse/HBASE-9830 > Project: HBase > Issue Type: Improvement >Affects Versions: 0.94.3 >Reporter: chendihao >Assignee: chendihao >Priority: Minor > Fix For: 0.94.17 > > Attachments: HBASE-9830-0.94-v1.patch > > > Backport HBASE-9605 which is about "Allow AggregationClient to skip > specifying column family for row count aggregate" -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-9392) Add RestartBackupMastersAction for ChaosMonkey
[ https://issues.apache.org/jira/browse/HBASE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13870655#comment-13870655 ] chendihao commented on HBASE-9392: -- It's a trivial improvement for ChaosMonkey. How about a quick fix? [~stack] [~lhofhansl] > Add RestartBackupMastersAction for ChaosMonkey > -- > > Key: HBASE-9392 > URL: https://issues.apache.org/jira/browse/HBASE-9392 > Project: HBase > Issue Type: Improvement > Components: test >Affects Versions: 0.94.3 >Reporter: chendihao >Assignee: chendihao >Priority: Minor > Attachments: RestartBackupMastersAction.patch > > > Just implement RestartBackupMastersAction for more failures. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Assigned] (HBASE-9392) Add RestartBackupMastersAction for ChaosMonkey
[ https://issues.apache.org/jira/browse/HBASE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chendihao reassigned HBASE-9392: Assignee: chendihao > Add RestartBackupMastersAction for ChaosMonkey > -- > > Key: HBASE-9392 > URL: https://issues.apache.org/jira/browse/HBASE-9392 > Project: HBase > Issue Type: Improvement > Components: test >Affects Versions: 0.94.3 >Reporter: chendihao >Assignee: chendihao >Priority: Minor > Attachments: RestartBackupMastersAction.patch > > > Just implement RestartBackupMastersAction for more failures. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10274) MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers
[ https://issues.apache.org/jira/browse/HBASE-10274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13870508#comment-13870508 ] chendihao commented on HBASE-10274: --- Thanks all :-) There are two more minor issues about the minicluster, HBASE-10282 and HBASE-10283; do you mind taking a look? [~enis] [~lhofhansl] [~apurtell] > MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers > --- > > Key: HBASE-10274 > URL: https://issues.apache.org/jira/browse/HBASE-10274 > Project: HBase > Issue Type: Bug >Affects Versions: 0.94.3 >Reporter: chendihao >Assignee: chendihao >Priority: Minor > Fix For: 0.98.0, 0.96.2, 0.99.0, 0.94.17 > > Attachments: HBASE-10274-0.94-v1.patch, HBASE-10274-0.94-v2.patch, > HBASE-10274-truck-v1.patch, HBASE-10274-truck-v2.patch, > HBASE-10274-truck-v2.patch > > > HBASE-6820 points out the problem but does not fix it completely. > killCurrentActiveZooKeeperServer() and killOneBackupZooKeeperServer() > shut down the ZooKeeperServer and need to close its ZKDatabase as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
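The intent of the HBASE-10274 fix can be sketched with stand-in classes. FakeZkDatabase and FakeZkServer below are illustrative substitutes for the real ZooKeeper classes, not the actual patch:

```java
// Illustrates the HBASE-10274 fix: shutting a server down must also close
// its ZKDatabase, otherwise handles to the transaction log and snapshots
// leak. The nested classes are stand-ins for the real ZooKeeper types.
public class ZkShutdownSketch {

    static class FakeZkDatabase {
        boolean closed = false;
        void close() { closed = true; }  // real close() releases log/snapshot files
    }

    static class FakeZkServer {
        final FakeZkDatabase zkDb = new FakeZkDatabase();
        boolean running = true;
        void shutdown() { running = false; }  // note: does NOT close zkDb itself
    }

    // Before the fix, only shutdown() was called, leaking the database.
    // After the fix, we close the ZKDatabase of the server being killed --
    // and it must be *that* server's database, not the active server's
    // (the mix-up the v1 patch review caught).
    static void killServer(FakeZkServer server) {
        server.shutdown();
        server.zkDb.close();
    }
}
```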
[jira] [Commented] (HBASE-10283) Client can't connect with all the running zk servers in MiniZooKeeperCluster
[ https://issues.apache.org/jira/browse/HBASE-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13870504#comment-13870504 ] chendihao commented on HBASE-10283: --- Thanks all :-) There are two more minor issues about the minicluster, HBASE-10282 and HBASE-10283; do you mind taking a look? > Client can't connect with all the running zk servers in MiniZooKeeperCluster > > > Key: HBASE-10283 > URL: https://issues.apache.org/jira/browse/HBASE-10283 > Project: HBase > Issue Type: Bug >Affects Versions: 0.94.3 >Reporter: chendihao >Assignee: chendihao > Attachments: HBASE-10283-0.94-v1.patch > > > Refer to HBASE-3052: multiple zk servers can run together in the minicluster. The > problem is that the client can only connect with the first zk server, and if you > kill the first one, it fails to access the cluster even though other zk > servers are serving. > It's easy to repro. First, call `TEST_UTIL.startMiniZKCluster(3)`. Then call > `killCurrentActiveZooKeeperServer` on MiniZooKeeperCluster. After that, any > zk client you construct cannot connect with the zk cluster at all. > Here is a sample log for reference. > {noformat} > 2014-01-03 12:06:58,625 INFO [main] zookeeper.MiniZooKeeperCluster(194): > Started MiniZK Cluster and connect 1 ZK server on client port: 55227 > .. > 2014-01-03 12:06:59,134 INFO [main] zookeeper.MiniZooKeeperCluster(264): > Kill the current active ZK servers in the cluster on client port: 55227 > 2014-01-03 12:06:59,134 INFO [main] zookeeper.MiniZooKeeperCluster(272): > Activate a backup zk server in the cluster on client port: 55228 > 2014-01-03 12:06:59,366 INFO [main-EventThread] zookeeper.ZooKeeper(434): > Initiating client connection, connectString=localhost:55227 > sessionTimeout=3000 > watcher=com.xiaomi.infra.timestamp.TimestampWatcher@a383118 > (then it throws exceptions..) > {noformat} > The log is somewhat misleading because it always shows "Started MiniZK Cluster > and connect 1 ZK server" even though there are three zk servers. > Looking deeper, we find that the client is still trying to connect to the > dead zk server's port. When I print out the zkQuorum it used, only the first > zk server's host:port is there, and it does not change whether you kill the > server or not. The reason is in ZKConfig, which converts HBase settings into > zk's. MiniZooKeeperCluster creates three servers with the same > host name, "localhost", and different ports. But HBase itself forces the same > client port for every zk server, and ZKConfig ignores the other two servers > that share the same host name. > MiniZooKeeperCluster works improperly until we fix this. The bug was not > found because we never test whether HBase still works if we kill the zk > active or backup servers in unit tests. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
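The ZKConfig behavior described in the report can be mimicked in plain Java. This is a deliberately simplified model of the dedup-by-hostname flaw, not the real ZKConfig code:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class QuorumCollapseSketch {
    // Mimics the flaw: quorum servers are keyed by host name alone (a single
    // cluster-wide client port is assumed), so the minicluster's three
    // localhost servers degenerate to just the first one.
    static String brokenConnectString(List<String> hostPorts) {
        Map<String, String> byHost = new LinkedHashMap<>();
        for (String hp : hostPorts) {
            String host = hp.split(":")[0];
            byHost.putIfAbsent(host, hp);  // later same-host entries are dropped
        }
        return String.join(",", byHost.values());
    }

    // Keeping each distinct host:port pair preserves the full quorum,
    // so the client can fail over when the first server is killed.
    static String fixedConnectString(List<String> hostPorts) {
        return String.join(",", hostPorts);
    }
}
```

With the broken form, a client holding only `localhost:55227` has nowhere to go once that server dies, which is exactly the symptom in the log above.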
[jira] [Updated] (HBASE-10283) Client can't connect with all the running zk servers in MiniZooKeeperCluster
[ https://issues.apache.org/jira/browse/HBASE-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chendihao updated HBASE-10283: -- Attachment: HBASE-10283-0.94-v1.patch patch for 0.94 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Assigned] (HBASE-10283) Client can't connect with all the running zk servers in MiniZooKeeperCluster
[ https://issues.apache.org/jira/browse/HBASE-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chendihao reassigned HBASE-10283: - Assignee: chendihao -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10283) Client can't connect with all the running zk servers in MiniZooKeeperCluster
[ https://issues.apache.org/jira/browse/HBASE-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869396#comment-13869396 ] chendihao commented on HBASE-10283: --- There are two possible solutions. The first is allowing a different zk client port to be set per server in HBase (generic, but contrary to the original design). The second is adding extra code in ZKConfig to support multiple ports for MiniZooKeeperCluster. I prefer the latter, to keep the code change small. MiniZooKeeperCluster can't be used for zk failover tests until this is fixed. Can [~enis] help to review this?
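The second solution could be sketched like this. It is illustrative only; `connectString()` and its handling of per-server ports are assumptions about the shape of the fix, not the actual patch:

```java
public class ZkQuorumSketch {
    // Sketch of solution two for HBASE-10283: when building zk's connect
    // string from hbase.zookeeper.quorum, honor a per-server port if the
    // entry already contains one (as MiniZooKeeperCluster needs), and fall
    // back to the shared client port only for bare host names.
    static String connectString(String quorum, int defaultClientPort) {
        StringBuilder sb = new StringBuilder();
        for (String server : quorum.split(",")) {
            if (sb.length() > 0) sb.append(',');
            sb.append(server.contains(":") ? server
                                           : server + ":" + defaultClientPort);
        }
        return sb.toString();
    }
}
```

Production clusters keep the one-port-per-cluster convention; only entries that explicitly carry a port (the minicluster case) behave differently.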
[jira] [Commented] (HBASE-10282) We can't assure that the first ZK server is active server in MiniZooKeeperCluster
[ https://issues.apache.org/jira/browse/HBASE-10282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869370#comment-13869370 ] chendihao commented on HBASE-10282: --- The functions {{killCurrentActiveZooKeeperServer()}} and {{killOneBackupZooKeeperServer()}} are misleading because of this. I think [~liyin] treated the first zk server as the leader, but we can't be sure of that. So should we rename {{activeZKServerIndex}} to {{firstZKServerIndex}} and combine these two functions into {{killFirstZooKeeperServer()}} (it's hard to know which one is the actual leader)? This needs more discussion. [~enis] [~stack]
[jira] [Commented] (HBASE-10274) MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers
[ https://issues.apache.org/jira/browse/HBASE-10274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869360#comment-13869360 ] chendihao commented on HBASE-10274: --- Thanks [~lhofhansl] for resolving HBASE-10306; please commit this one as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10306) Backport HBASE-6820 to 0.94, MiniZookeeperCluster should ensure that ZKDatabase is closed upon shutdown()
[ https://issues.apache.org/jira/browse/HBASE-10306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868719#comment-13868719 ] chendihao commented on HBASE-10306: --- Thanks for the review. [~lhofhansl] [~enis] > Backport HBASE-6820 to 0.94, MiniZookeeperCluster should ensure that > ZKDatabase is closed upon shutdown() > - > > Key: HBASE-10306 > URL: https://issues.apache.org/jira/browse/HBASE-10306 > Project: HBase > Issue Type: Bug >Affects Versions: 0.94.3 >Reporter: chendihao >Assignee: chendihao >Priority: Minor > Fix For: 0.94.16 > > Attachments: HBASE-10306-0.94-v1.patch > > > Backport HBASE-6820: [WINDOWS] MiniZookeeperCluster should ensure that > ZKDatabase is closed upon shutdown() -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10306) Backport HBASE-6820 to 0.94
[ https://issues.apache.org/jira/browse/HBASE-10306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chendihao updated HBASE-10306: -- Fix Version/s: 0.94.3 0.94.16 Status: Patch Available (was: Open) > Backport HBASE-6820 to 0.94 > --- > > Key: HBASE-10306 > URL: https://issues.apache.org/jira/browse/HBASE-10306 > Project: HBase > Issue Type: Bug >Affects Versions: 0.94.3 >Reporter: chendihao >Assignee: chendihao >Priority: Minor > Fix For: 0.94.16, 0.94.3 > > Attachments: HBASE-10306-0.94-v1.patch > > > Backport HBASE-6820: [WINDOWS] MiniZookeeperCluster should ensure that > ZKDatabase is closed upon shutdown() -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10306) Backport HBASE-6820 to 0.94
[ https://issues.apache.org/jira/browse/HBASE-10306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866402#comment-13866402 ] chendihao commented on HBASE-10306: --- The patch has been recreated against the latest 0.94 code. [~enis], please review. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10306) Backport HBASE-6820 to 0.94
[ https://issues.apache.org/jira/browse/HBASE-10306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chendihao updated HBASE-10306: -- Attachment: HBASE-10306-0.94-v1.patch patch for 0.94 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10274) MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers
[ https://issues.apache.org/jira/browse/HBASE-10274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866390#comment-13866390 ] chendihao commented on HBASE-10274: --- Thanks [~enis], that's very considerate. It's all good as long as the problem is solved. HBASE-10306 has been opened and needs a quick fix. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HBASE-10306) Backport HBASE-6820 to 0.94
chendihao created HBASE-10306: - Summary: Backport HBASE-6820 to 0.94 Key: HBASE-10306 URL: https://issues.apache.org/jira/browse/HBASE-10306 Project: HBase Issue Type: Bug Affects Versions: 0.94.3 Reporter: chendihao Assignee: chendihao Priority: Minor Backport HBASE-6820: [WINDOWS] MiniZookeeperCluster should ensure that ZKDatabase is closed upon shutdown() -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10274) MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers
[ https://issues.apache.org/jira/browse/HBASE-10274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866100#comment-13866100 ] chendihao commented on HBASE-10274: --- Backporting HBASE-6820 sounds good to us; thanks for considering it. [~enis] Let's open another issue to do that. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10274) MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers
[ https://issues.apache.org/jira/browse/HBASE-10274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chendihao updated HBASE-10274: -- Attachment: HBASE-10274-0.94-v2.patch HBASE-10274-truck-v2.patch > MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers > --- > > Key: HBASE-10274 > URL: https://issues.apache.org/jira/browse/HBASE-10274 > Project: HBase > Issue Type: Bug >Affects Versions: 0.94.3 >Reporter: chendihao >Assignee: chendihao >Priority: Minor > Attachments: HBASE-10274-0.94-v1.patch, HBASE-10274-0.94-v2.patch, > HBASE-10274-truck-v1.patch, HBASE-10274-truck-v2.patch > > > HBASE-6820 points out the problem but not fix completely. > killCurrentActiveZooKeeperServer() and killOneBackupZooKeeperServer() will > shutdown the ZooKeeperServer and need to close ZKDatabase as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10274) MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers
[ https://issues.apache.org/jira/browse/HBASE-10274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13863764#comment-13863764 ] chendihao commented on HBASE-10274: --- bq. For the killOneBackupZooKeeperServer(), I think you are closing the ZKDatabase for the active server instead of the backupZkServer My mistake; fixed by uploading the v2 patch, and thanks for reviewing. bq. Do you need to have this patch for 0.94? I think it's better to fix it there too, because our codebase is 0.94. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10283) Client can't connect with all the running zk servers in MiniZooKeeperCluster
[ https://issues.apache.org/jira/browse/HBASE-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chendihao updated HBASE-10283: -- Description: Refer to HBASE-3052: multiple zk servers can run together in the minicluster. The problem is that the client can only connect with the first zk server, and if you kill the first one, it fails to access the cluster even though other zk servers are serving. It's easy to repro. First, call `TEST_UTIL.startMiniZKCluster(3)`. Then call `killCurrentActiveZooKeeperServer` on MiniZooKeeperCluster. After that, any zk client you construct cannot connect with the zk cluster at all. Here is a sample log for reference. {noformat} 2014-01-03 12:06:58,625 INFO [main] zookeeper.MiniZooKeeperCluster(194): Started MiniZK Cluster and connect 1 ZK server on client port: 55227 .. 2014-01-03 12:06:59,134 INFO [main] zookeeper.MiniZooKeeperCluster(264): Kill the current active ZK servers in the cluster on client port: 55227 2014-01-03 12:06:59,134 INFO [main] zookeeper.MiniZooKeeperCluster(272): Activate a backup zk server in the cluster on client port: 55228 2014-01-03 12:06:59,366 INFO [main-EventThread] zookeeper.ZooKeeper(434): Initiating client connection, connectString=localhost:55227 sessionTimeout=3000 watcher=com.xiaomi.infra.timestamp.TimestampWatcher@a383118 (then it throws exceptions..) {noformat} The log is somewhat misleading because it always shows "Started MiniZK Cluster and connect 1 ZK server" even though there are three zk servers. Looking deeper, we find that the client is still trying to connect to the dead zk server's port. When I print out the zkQuorum it used, only the first zk server's host:port is there, and it does not change whether you kill the server or not. The reason is in ZKConfig, which converts HBase settings into zk's. MiniZooKeeperCluster creates three servers with the same host name, "localhost", and different ports. But HBase itself forces the same client port for every zk server, and ZKConfig ignores the other two servers that share the same host name. MiniZooKeeperCluster works improperly until we fix this. The bug was not found because we never test whether HBase still works if we kill the zk active or backup servers in unit tests.
[jira] [Updated] (HBASE-10283) Client can't connect with all the running zk servers in MiniZooKeeperCluster
[ https://issues.apache.org/jira/browse/HBASE-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chendihao updated HBASE-10283: -- Description: Refer to HBASE-3052: multiple zk servers can run together in the minicluster. The problem is that the client can only connect with the first zk server, and if you kill the first one, it fails to access the cluster even though other zk servers are serving. It's easy to repro. First, call `TEST_UTIL.startMiniZKCluster(3)`. Then call `killCurrentActiveZooKeeperServer` on MiniZooKeeperCluster. After that, any zk client you construct cannot connect with the zk cluster at all. Here is a sample log for reference. {noformat} 2014-01-03 12:06:58,625 INFO [main] zookeeper.MiniZooKeeperCluster(194): Started MiniZK Cluster and connect 1 ZK server on client port: 55227 .. 2014-01-03 12:06:59,134 INFO [main] zookeeper.MiniZooKeeperCluster(264): Kill the current active ZK servers in the cluster on client port: 55227 2014-01-03 12:06:59,134 INFO [main] zookeeper.MiniZooKeeperCluster(272): Activate a backup zk server in the cluster on client port: 55228 2014-01-03 12:06:59,366 INFO [main-EventThread] zookeeper.ZooKeeper(434): Initiating client connection, connectString=localhost:55227 sessionTimeout=3000 watcher=com.xiaomi.infra.timestamp.TimestampWatcher@a383118 (then it throws exceptions..) {noformat} The log is somewhat misleading because it always shows "Started MiniZK Cluster and connect 1 ZK server" even though there are three zk servers. Looking deeper, we find that the client is still trying to connect to the dead zk server's port. When I print out the zkQuorum it used, only the first zk server's host:port is there, and it does not change whether you kill the server or not. The reason is in ZKConfig, which converts HBase settings into zk's. MiniZooKeeperCluster creates three servers with the same host name, "localhost", and different ports. But HBase itself forces the same client port for every zk server, and ZKConfig ignores the other two servers that share the same host name. MiniZooKeeperCluster works improperly until we fix this. The bug was not found because we never test whether HBase still works if we kill the zk active or backup servers in unit tests. But apparently we should.
[jira] [Updated] (HBASE-10283) Client can't connect with all the running zk servers in MiniZooKeeperCluster
[ https://issues.apache.org/jira/browse/HBASE-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chendihao updated HBASE-10283: -- Description: Per HBASE-3052, multiple ZK servers can run together in the minicluster. The problem is that the client can only connect to the first ZK server; if you kill that server, the client fails to access the cluster even though the other ZK servers are still serving. It's easy to reproduce: first call `TEST_UTIL.startMiniZKCluster(3)`, then call `killCurrentActiveZooKeeperServer` in MiniZooKeeperCluster. Any ZK client constructed after that cannot connect to the cluster at all. Here is a sample log: {noformat} 2014-01-03 12:06:58,625 INFO [main] zookeeper.MiniZooKeeperCluster(194): Started MiniZK Cluster and connect 1 ZK server on client port: 55227 .. 2014-01-03 12:06:59,134 INFO [main] zookeeper.MiniZooKeeperCluster(264): Kill the current active ZK servers in the cluster on client port: 55227 2014-01-03 12:06:59,134 INFO [main] zookeeper.MiniZooKeeperCluster(272): Activate a backup zk server in the cluster on client port: 55228 2014-01-03 12:06:59,366 INFO [main-EventThread] zookeeper.ZooKeeper(434): Initiating client connection, connectString=localhost:55227 sessionTimeout=3000 watcher=com.xiaomi.infra.timestamp.TimestampWatcher@a383118 (then it throws exceptions..) {noformat} The log is misleading because it always shows "Started MiniZK Cluster and connect 1 ZK server" even though three ZK servers are actually running. Looking deeper, we found that the client keeps trying to connect to the dead ZK server's port. When I print out the zkQuorum it uses, only the first ZK server's host:port is there, and it does not change whether or not you kill the server. The root cause is in ZKConfig, which converts HBase settings into ZooKeeper settings. MiniZooKeeperCluster creates three servers with the same host name, "localhost", and different ports. 
But HBase itself uses a single port, so ZKConfig ignores the other two servers that share the host name. MiniZooKeeperCluster works improperly until this is fixed. The bug was never caught because no unit test kills the active or backup ZK servers and then checks whether HBase still works. But apparently we should. was: Refer to HBASE-3052, multiple zk servers can run together in minicluster. The problem is that client can only connect with the first zk server and if you kill the first one, it fails to access the cluster even though other zk servers are serving. It's easy to repro. Firstly `TEST_UTIL.startMiniZKCluster(3)`. Secondly call `killCurrentActiveZooKeeperServer` in MiniZooKeeperCluster. Then when you construct the zk client, it can't connect with the zk cluster for any way. Here is the simple log you can refer. {noformat} 2014-01-03 12:06:58,625 INFO [main] zookeeper.MiniZooKeeperCluster(194): Started MiniZK Cluster and connect 1 ZK server on client port: 55227 .. 2014-01-03 12:06:59,134 INFO [main] zookeeper.MiniZooKeeperCluster(264): Kill the current active ZK servers in the cluster on client port: 55227 2014-01-03 12:06:59,134 INFO [main] zookeeper.MiniZooKeeperCluster(272): Activate a backup zk server in the cluster on client port: 55228 2014-01-03 12:06:59,366 INFO [main-EventThread] zookeeper.ZooKeeper(434): Initiating client connection, connectString=localhost:55227 sessionTimeout=3000 watcher=com.xiaomi.infra.timestamp.TimestampWatcher@a383118 {noformat} The log is kind of problematic because it always show "Started MiniZK Cluster and connect 1 ZK server" but actually there're three zk servers. Looking deeply we find that the client is still trying to connect with the dead zk server's port. When I print out the zkQuorum it used, only the first zk server's hostport is there and it will not change no matter you kill the server or not. The reason for this is in ZKConfig which will convert HBase settings into zk's. 
MiniZooKeeperCluster create three servers with the same host name, "localhost", and different ports. But HBase self use the port and ZKConfig will ignore the other two servers which have the same host name. MiniZooKeeperCluster works improperly before we fix this. The bug is not found because we never test whether HBase works or not if we kill the zk active or backup servers in ut. But apparently we should. > Client can't connect with all the running zk servers in MiniZooKeeperCluster > > > Key: HBASE-10283 > URL: https://issues.apache.org/jira/browse/HBASE-10283 > Project: HBase > Issue Type: Bug >Affects Versions: 0.94.3 >Reporter: chendihao > > Refer to HBASE-3052, multiple zk servers can run together in minicluster. The > problem is that client can only connect with the first zk server and if you > kill the first one, it
[jira] [Created] (HBASE-10283) Client can't connect with all the running zk servers in MiniZooKeeperCluster
chendihao created HBASE-10283: - Summary: Client can't connect with all the running zk servers in MiniZooKeeperCluster Key: HBASE-10283 URL: https://issues.apache.org/jira/browse/HBASE-10283 Project: HBase Issue Type: Bug Affects Versions: 0.94.3 Reporter: chendihao Per HBASE-3052, multiple ZK servers can run together in the minicluster. The problem is that the client can only connect to the first ZK server; if you kill that server, the client fails to access the cluster even though the other ZK servers are still serving. It's easy to reproduce: first call `TEST_UTIL.startMiniZKCluster(3)`, then call `killCurrentActiveZooKeeperServer` in MiniZooKeeperCluster. Any ZK client constructed after that cannot connect to the cluster at all. Here is a sample log: {noformat} 2014-01-03 12:06:58,625 INFO [main] zookeeper.MiniZooKeeperCluster(194): Started MiniZK Cluster and connect 1 ZK server on client port: 55227 .. 2014-01-03 12:06:59,134 INFO [main] zookeeper.MiniZooKeeperCluster(264): Kill the current active ZK servers in the cluster on client port: 55227 2014-01-03 12:06:59,134 INFO [main] zookeeper.MiniZooKeeperCluster(272): Activate a backup zk server in the cluster on client port: 55228 2014-01-03 12:06:59,366 INFO [main-EventThread] zookeeper.ZooKeeper(434): Initiating client connection, connectString=localhost:55227 sessionTimeout=3000 watcher=com.xiaomi.infra.timestamp.TimestampWatcher@a383118 {noformat} The log is misleading because it always shows "Started MiniZK Cluster and connect 1 ZK server" even though three ZK servers are actually running. Looking deeper, we found that the client keeps trying to connect to the dead ZK server's port. When I print out the zkQuorum it uses, only the first ZK server's host:port is there, and it does not change whether or not you kill the server. The root cause is in ZKConfig, which converts HBase settings into ZooKeeper settings. 
MiniZooKeeperCluster creates three servers with the same host name, "localhost", and different ports. But HBase itself uses a single port, so ZKConfig ignores the other two servers that share the host name. MiniZooKeeperCluster works improperly until this is fixed. The bug was never caught because no unit test kills the active or backup ZK servers and then checks whether HBase still works. But apparently we should. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
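The host-name collision described above can be modeled in a few lines. The following is a hypothetical Python simplification, not the actual ZKConfig code: when the quorum map is keyed by host name alone, three localhost servers on different ports collapse into a single entry, so the connect string only ever lists the first port.

```python
def build_connect_string(servers):
    """servers: list of (host, client_port) tuples, one per ZK server.

    Keyed by host name only -- mirroring the behavior described in this
    issue -- so later servers that share a host name are silently dropped.
    """
    quorum = {}
    for host, port in servers:
        if host not in quorum:  # a second "localhost" entry is ignored
            quorum[host] = port
    return ",".join(f"{host}:{port}" for host, port in quorum.items())

# Three minicluster servers, all on "localhost" with different ports:
servers = [("localhost", 55227), ("localhost", 55228), ("localhost", 55229)]
print(build_connect_string(servers))  # only the first port survives
```

Kill the server on 55227 and the connect string still points at 55227, which matches the `connectString=localhost:55227` line in the log above.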
[jira] [Updated] (HBASE-10282) We can't assure that the first ZK server is active server in MiniZooKeeperCluster
[ https://issues.apache.org/jira/browse/HBASE-10282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chendihao updated HBASE-10282: -- Description: Thanks to HBASE-3052, we're able to run multiple ZK servers in the minicluster. However, it's confusing to keep the variable activeZKServerIndex at zero and guarantee that the first ZK server is always the active one. I think returning the first server's client port is only for testing, and it seems we could directly return the first item of the list. In any case, the concept of "active" here is not the same as ZooKeeper's. It confused me when I read the code, so I think we should fix it. was: Thanks to HBASE-10274, we're able to run multiple zk servers in minicluster. However, It's confusing to keep the variable activeZKServerIndex as zero and assure the first zk server is always the active one. I think returning the first sever's client port is for testing and it seems that we can directly return the first item of the list. Anyway, the concept of "active" here is not the same as zk's. It's confusing when I read the code so I think we should fix it. > We can't assure that the first ZK server is active server in > MiniZooKeeperCluster > - > > Key: HBASE-10282 > URL: https://issues.apache.org/jira/browse/HBASE-10282 > Project: HBase > Issue Type: Bug >Affects Versions: 0.94.3 >Reporter: chendihao >Priority: Minor > > Thanks to HBASE-3052, we're able to run multiple zk servers in minicluster. > However, It's confusing to keep the variable activeZKServerIndex as zero and > assure the first zk server is always the active one. I think returning the > first sever's client port is for testing and it seems that we can directly > return the first item of the list. Anyway, the concept of "active" here is > not the same as zk's. > It's confusing when I read the code so I think we should fix it. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10282) We can't assure that the first ZK server is active server in MiniZooKeeperCluster
[ https://issues.apache.org/jira/browse/HBASE-10282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chendihao updated HBASE-10282: -- Description: Thanks to HBASE-10274, we're able to run multiple ZK servers in the minicluster. However, it's confusing to keep the variable activeZKServerIndex at zero and guarantee that the first ZK server is always the active one. I think returning the first server's client port is only for testing, and it seems we could directly return the first item of the list. In any case, the concept of "active" here is not the same as ZooKeeper's. It confused me when I read the code, so I think we should fix it. was: Thanks to https://issues.apache.org/jira/browse/HBASE-10274, we're able to run multiple zk servers in minicluster. However, It's confusing to keep the variable activeZKServerIndex as zero and assure the first zk server is always the active one. I think returning the first sever's client port is for testing and it seems that we can directly return the first item of the list. Anyway, the concept of "active" here is not the same as zk's. It's confusing when I read the code so I think we should fix it. > We can't assure that the first ZK server is active server in > MiniZooKeeperCluster > - > > Key: HBASE-10282 > URL: https://issues.apache.org/jira/browse/HBASE-10282 > Project: HBase > Issue Type: Bug >Affects Versions: 0.94.3 >Reporter: chendihao >Priority: Minor > > Thanks to HBASE-10274, we're able to run multiple zk servers in minicluster. > However, It's confusing to keep the variable activeZKServerIndex as zero and > assure the first zk server is always the active one. I think returning the > first sever's client port is for testing and it seems that we can directly > return the first item of the list. Anyway, the concept of "active" here is > not the same as zk's. > It's confusing when I read the code so I think we should fix it. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10282) We can't assure that the first ZK server is active server in MiniZooKeeperCluster
[ https://issues.apache.org/jira/browse/HBASE-10282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862720#comment-13862720 ] chendihao commented on HBASE-10282: --- Looking forward to your reply. [~stack] [~liyin] [~streamy] > We can't assure that the first ZK server is active server in > MiniZooKeeperCluster > - > > Key: HBASE-10282 > URL: https://issues.apache.org/jira/browse/HBASE-10282 > Project: HBase > Issue Type: Bug >Affects Versions: 0.94.3 >Reporter: chendihao >Priority: Minor > > Thanks to https://issues.apache.org/jira/browse/HBASE-10274, we're able to > run multiple zk servers in minicluster. However, It's confusing to keep the > variable activeZKServerIndex as zero and assure the first zk server is always > the active one. I think returning the first sever's client port is for > testing and it seems that we can directly return the first item of the list. > Anyway, the concept of "active" here is not the same as zk's. > It's confusing when I read the code so I think we should fix it. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HBASE-10282) We can't assure that the first ZK server is active server in MiniZooKeeperCluster
chendihao created HBASE-10282: - Summary: We can't assure that the first ZK server is active server in MiniZooKeeperCluster Key: HBASE-10282 URL: https://issues.apache.org/jira/browse/HBASE-10282 Project: HBase Issue Type: Bug Affects Versions: 0.94.3 Reporter: chendihao Priority: Minor Thanks to https://issues.apache.org/jira/browse/HBASE-10274, we're able to run multiple ZK servers in the minicluster. However, it's confusing to keep the variable activeZKServerIndex at zero and guarantee that the first ZK server is always the active one. I think returning the first server's client port is only for testing, and it seems we could directly return the first item of the list. In any case, the concept of "active" here is not the same as ZooKeeper's. It confused me when I read the code, so I think we should fix it. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10274) MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers
[ https://issues.apache.org/jira/browse/HBASE-10274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862501#comment-13862501 ] chendihao commented on HBASE-10274: --- BTW, the patch of HBASE-6820 is not committed in 0.94. Can you confirm this? [~enis] > MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers > --- > > Key: HBASE-10274 > URL: https://issues.apache.org/jira/browse/HBASE-10274 > Project: HBase > Issue Type: Bug >Affects Versions: 0.94.3 >Reporter: chendihao >Assignee: chendihao >Priority: Minor > Attachments: HBASE-10274-0.94-v1.patch, HBASE-10274-truck-v1.patch > > > HBASE-6820 points out the problem but not fix completely. > killCurrentActiveZooKeeperServer() and killOneBackupZooKeeperServer() will > shutdown the ZooKeeperServer and need to close ZKDatabase as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10274) MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers
[ https://issues.apache.org/jira/browse/HBASE-10274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chendihao updated HBASE-10274: -- Attachment: HBASE-10274-0.94-v1.patch patch for 0.94 > MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers > --- > > Key: HBASE-10274 > URL: https://issues.apache.org/jira/browse/HBASE-10274 > Project: HBase > Issue Type: Bug >Affects Versions: 0.94.3 >Reporter: chendihao >Assignee: chendihao >Priority: Minor > Attachments: HBASE-10274-0.94-v1.patch, HBASE-10274-truck-v1.patch > > > HBASE-6820 points out the problem but not fix completely. > killCurrentActiveZooKeeperServer() and killOneBackupZooKeeperServer() will > shutdown the ZooKeeperServer and need to close ZKDatabase as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10274) MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers
[ https://issues.apache.org/jira/browse/HBASE-10274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chendihao updated HBASE-10274: -- Attachment: HBASE-10274-truck-v1.patch patch for trunk > MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers > --- > > Key: HBASE-10274 > URL: https://issues.apache.org/jira/browse/HBASE-10274 > Project: HBase > Issue Type: Bug >Affects Versions: 0.94.3 >Reporter: chendihao >Assignee: chendihao >Priority: Minor > Attachments: HBASE-10274-0.94-v1.patch, HBASE-10274-truck-v1.patch > > > HBASE-6820 points out the problem but not fix completely. > killCurrentActiveZooKeeperServer() and killOneBackupZooKeeperServer() will > shutdown the ZooKeeperServer and need to close ZKDatabase as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10274) MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers
[ https://issues.apache.org/jira/browse/HBASE-10274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chendihao updated HBASE-10274: -- Status: Patch Available (was: Open) > MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers > --- > > Key: HBASE-10274 > URL: https://issues.apache.org/jira/browse/HBASE-10274 > Project: HBase > Issue Type: Bug >Affects Versions: 0.94.3 >Reporter: chendihao >Assignee: chendihao >Priority: Minor > > HBASE-6820 points out the problem but not fix completely. > killCurrentActiveZooKeeperServer() and killOneBackupZooKeeperServer() will > shutdown the ZooKeeperServer and need to close ZKDatabase as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Assigned] (HBASE-10274) MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers
[ https://issues.apache.org/jira/browse/HBASE-10274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chendihao reassigned HBASE-10274: - Assignee: chendihao > MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers > --- > > Key: HBASE-10274 > URL: https://issues.apache.org/jira/browse/HBASE-10274 > Project: HBase > Issue Type: Bug >Affects Versions: 0.94.3 >Reporter: chendihao >Assignee: chendihao >Priority: Minor > > HBASE-6820 points out the problem but not fix completely. > killCurrentActiveZooKeeperServer() and killOneBackupZooKeeperServer() will > shutdown the ZooKeeperServer and need to close ZKDatabase as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Assigned] (HBASE-9800) Impl CoprocessorRowcounter to run in command-line
[ https://issues.apache.org/jira/browse/HBASE-9800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chendihao reassigned HBASE-9800: Assignee: chendihao > Impl CoprocessorRowcounter to run in command-line > - > > Key: HBASE-9800 > URL: https://issues.apache.org/jira/browse/HBASE-9800 > Project: HBase > Issue Type: New Feature > Components: Coprocessors >Affects Versions: 0.94.3 >Reporter: chendihao >Assignee: chendihao >Priority: Minor > Attachments: HBASE-9800-0.94-v1.patch > > > We want to count the rows of table daily but the default rowcounter using > mapreduce is inefficient. Impl rowcounter using coprocessor makes it better. > Furthermore, It must provide the mechanism to choose the way to output result. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-2218) MiniZooKeeperCluster - to be refactored and moved upstream to zk
[ https://issues.apache.org/jira/browse/HBASE-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861326#comment-13861326 ] chendihao commented on HBASE-2218: -- That would be nice. Any plan for this? [~yuzhih...@gmail.com] [~phunt] > MiniZooKeeperCluster - to be refactored and moved upstream to zk > - > > Key: HBASE-2218 > URL: https://issues.apache.org/jira/browse/HBASE-2218 > Project: HBase > Issue Type: Improvement >Reporter: Karthik K > > As rightly mentioned in the comments - MiniZooKeeperCluster should be > refactored and moved up to the ZK tree as appropriate and reused as > necessary. > Marked as an improvement to remember the task. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-6820) [WINDOWS] MiniZookeeperCluster should ensure that ZKDatabase is closed upon shutdown()
[ https://issues.apache.org/jira/browse/HBASE-6820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861317#comment-13861317 ] chendihao commented on HBASE-6820: -- [~enis] [~stack] I'm working with MiniZooKeeperCluster and glad to see the patch, but I don't think it's complete. killCurrentActiveZooKeeperServer() and killOneBackupZooKeeperServer() shut down the ZooKeeperServer and need to close the ZKDatabase as well. I've opened a new JIRA, https://issues.apache.org/jira/browse/HBASE-10274, and it would be nice to fix it there. > [WINDOWS] MiniZookeeperCluster should ensure that ZKDatabase is closed upon > shutdown() > -- > > Key: HBASE-6820 > URL: https://issues.apache.org/jira/browse/HBASE-6820 > Project: HBase > Issue Type: Bug >Affects Versions: 0.94.3, 0.95.2 >Reporter: Enis Soztutar >Assignee: Enis Soztutar > Labels: windows > Fix For: 0.95.0 > > Attachments: hbase-6820_v1-0.94.patch, hbase-6820_v1-trunk.patch > > > MiniZookeeperCluster.shutdown() shuts down the ZookeeperServer and > NIOServerCnxnFactory. However, MiniZookeeperCluster uses a deprecated > ZookeeperServer constructor, which in turn constructs its own FileTxnSnapLog, > and ZKDatabase. Since ZookeeperServer.shutdown() does not close() the > ZKDatabase, we have to explicitly close it in MiniZookeeperCluster.shutdown(). > Tests effected by this are > {code} > TestSplitLogManager > TestSplitLogWorker > TestOfflineMetaRebuildBase > TestOfflineMetaRebuildHole > TestOfflineMetaRebuildOverlap > {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HBASE-10274) MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers
chendihao created HBASE-10274: - Summary: MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers Key: HBASE-10274 URL: https://issues.apache.org/jira/browse/HBASE-10274 Project: HBase Issue Type: Bug Affects Versions: 0.94.3 Reporter: chendihao Priority: Minor HBASE-6820 points out the problem but does not fix it completely. killCurrentActiveZooKeeperServer() and killOneBackupZooKeeperServer() shut down the ZooKeeperServer and need to close the ZKDatabase as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
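The fix requested here amounts to closing the ZKDatabase as an extra step after server shutdown. The real code is Java in MiniZooKeeperCluster; below is a hypothetical Python model (all class names are stand-ins) showing the cleanup ordering the issue asks for.

```python
class FakeZKDatabase:
    """Stand-in for ZooKeeper's ZKDatabase; tracks whether it was closed."""
    def __init__(self):
        self.closed = False
    def close(self):
        self.closed = True

class FakeZooKeeperServer:
    """Stand-in for ZooKeeperServer: shutdown() does NOT close its database,
    which is exactly the leak this issue describes."""
    def __init__(self):
        self.zk_db = FakeZKDatabase()
        self.running = True
    def shutdown(self):
        self.running = False  # database intentionally left open

def kill_server(server):
    """Model of the patched kill methods: shut down, then close the DB."""
    server.shutdown()
    server.zk_db.close()  # the additional step HBASE-10274 asks for

srv = FakeZooKeeperServer()
kill_server(srv)
print(srv.running, srv.zk_db.closed)  # False True
```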
[jira] [Commented] (HBASE-9830) Backport HBASE-9605 to 0.94
[ https://issues.apache.org/jira/browse/HBASE-9830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861022#comment-13861022 ] chendihao commented on HBASE-9830: -- [~lhofhansl] Yes, we need this. We're using a coprocessor to count rows, but with this restriction we have to specify all the column families. Fixing this would let us process any cluster through the same interface. > Backport HBASE-9605 to 0.94 > --- > > Key: HBASE-9830 > URL: https://issues.apache.org/jira/browse/HBASE-9830 > Project: HBase > Issue Type: Improvement >Affects Versions: 0.94.3 >Reporter: chendihao >Priority: Minor > Fix For: 0.94.16 > > Attachments: HBASE-9830-0.94-v1.patch > > > Backport HBASE-9605 which is about "Allow AggregationClient to skip > specifying column family for row count aggregate" -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Assigned] (HBASE-9802) A new failover test framework for HBase
[ https://issues.apache.org/jira/browse/HBASE-9802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chendihao reassigned HBASE-9802: Assignee: chendihao > A new failover test framework for HBase > --- > > Key: HBASE-9802 > URL: https://issues.apache.org/jira/browse/HBASE-9802 > Project: HBase > Issue Type: Improvement > Components: test >Affects Versions: 0.94.3 >Reporter: chendihao >Assignee: chendihao >Priority: Minor > > Currently HBase uses ChaosMonkey for IT test and fault injection. It will > restart regionserver, force balancer and perform other actions randomly and > periodically. However, we need a more extensible and full-featured framework > for our failover test and we find ChaosMonkey cant' suit our needs since it > has the following drawbacks. > 1) Only process-level actions can be simulated, not support > machine-level/hardware-level/network-level actions. > 2) No data validation before and after the test, the fatal bugs such as that > can cause data inconsistent may be overlook. > 3) When failure occurs, we can't repro the problem and hard to figure out the > reason. > Therefore, we have developed a new framework to satisfy the need of failover > test. We extended ChaosMonkey and implement the function to validate data and > to replay failed actions. Here are the features we add. > 1) Policy/Task/Action abstraction, seperating Task from Policy and Action > makes it easier to manage and replay a set of actions. > 2) Make action configurable. We have implemented some actions to cause > machine failure and defined the same interface as original actions. > 3) We should validate the date consistent before and after failover test to > ensure the availability and data correctness. > 4) After performing a set of actions, we also check the consistency of table > as well. > 5) The set of actions that caused test failure can be replayed, and the > reproducibility of actions can help fixing the exposed bugs. 
> Our team has developed this framework and run for a while. Some bugs were > exposed and fixed by running this test framework. Moreover, we have a monitor > program which shows the progress of failover test and make sure our cluster > is as stable as we want. Now we are trying to make it more general and will > opensource it later. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-9830) Backport HBASE-9605 to 0.94
[ https://issues.apache.org/jira/browse/HBASE-9830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13823196#comment-13823196 ] chendihao commented on HBASE-9830: -- Agreed with [~lhofhansl]. We use 0.94 and need the coprocessor to count rows; it may be the same for 0.96 users who need it. > Backport HBASE-9605 to 0.94 > --- > > Key: HBASE-9830 > URL: https://issues.apache.org/jira/browse/HBASE-9830 > Project: HBase > Issue Type: Improvement >Affects Versions: 0.94.3 >Reporter: chendihao >Priority: Minor > Attachments: HBASE-9830-0.94-v1.patch > > > Backport HBASE-9605 which is about "Allow AggregationClient to skip > specifying column family for row count aggregate" -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9802) A new failover test framework for HBase
[ https://issues.apache.org/jira/browse/HBASE-9802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13810948#comment-13810948 ] chendihao commented on HBASE-9802: -- [~tlipcon] That's really really helpful! Nice to see your project and we will look deep at it. Thanks again :-) > A new failover test framework for HBase > --- > > Key: HBASE-9802 > URL: https://issues.apache.org/jira/browse/HBASE-9802 > Project: HBase > Issue Type: Improvement > Components: test >Affects Versions: 0.94.3 >Reporter: chendihao >Priority: Minor > > Currently HBase uses ChaosMonkey for IT test and fault injection. It will > restart regionserver, force balancer and perform other actions randomly and > periodically. However, we need a more extensible and full-featured framework > for our failover test and we find ChaosMonkey cant' suit our needs since it > has the following drawbacks. > 1) Only process-level actions can be simulated, not support > machine-level/hardware-level/network-level actions. > 2) No data validation before and after the test, the fatal bugs such as that > can cause data inconsistent may be overlook. > 3) When failure occurs, we can't repro the problem and hard to figure out the > reason. > Therefore, we have developed a new framework to satisfy the need of failover > test. We extended ChaosMonkey and implement the function to validate data and > to replay failed actions. Here are the features we add. > 1) Policy/Task/Action abstraction, seperating Task from Policy and Action > makes it easier to manage and replay a set of actions. > 2) Make action configurable. We have implemented some actions to cause > machine failure and defined the same interface as original actions. > 3) We should validate the date consistent before and after failover test to > ensure the availability and data correctness. > 4) After performing a set of actions, we also check the consistency of table > as well. 
> 5) The set of actions that caused test failure can be replayed, and the > reproducibility of actions can help fixing the exposed bugs. > Our team has developed this framework and run for a while. Some bugs were > exposed and fixed by running this test framework. Moreover, we have a monitor > program which shows the progress of failover test and make sure our cluster > is as stable as we want. Now we are trying to make it more general and will > opensource it later. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9802) A new failover test framework for HBase
[ https://issues.apache.org/jira/browse/HBASE-9802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13810009#comment-13810009 ] chendihao commented on HBASE-9802: -- So far, we have found some tools to simulate hardware-level actions. The servers under test should provide an interface for us to invoke them (perhaps ssh, or an HTTP server that accepts our requests). Here is the list; any suggestions are welcome. 
Network Delay: use tc, e.g. `tc qdisc add dev eth0 root netem delay 1000ms`, to add network delay (and a corresponding command to recover). 
Network Unavailable: use iptables to block a specific port, e.g. `iptables -A OUTPUT -p tcp --dport 3306 -j DROP`. 
Network Bandwidth Limit: use tc, e.g. `tc qdisc add dev eth0 root tbf rate 5800kbit latency 50ms burst 1540`, to limit bandwidth. 
Disk Full: use dd to create a very large file that fills up the disk, e.g. `dd if=/dev/zero of=/$path/tst.img bs=1M count=20K`. 
Disk Failure: maybe use fiu-ctrl or `echo offline > /sys/block/sda/device/state` (not tested yet). 
Disk Slow: use fio to generate heavy reads and writes so the disk is under stress. 
Memory Limit: implement a program based on http://minuteware.net/simulating-high-memory-usage-in-linux/. 
CPU Limit: use cpulimit from https://github.com/opsengine/cpulimit. 
Although these tools can be run individually, we plan to integrate them into a failure-injection system for Linux. Once that's done, the failover test framework can trigger failures periodically. In our environment, however, the client running the failover framework can't ssh to the servers directly (for security reasons), so we are considering an HTTP server on the test machines that accepts requests and triggers the failures. 
> A new failover test framework for HBase > --- > > Key: HBASE-9802 > URL: https://issues.apache.org/jira/browse/HBASE-9802 > Project: HBase > Issue Type: Improvement > Components: test >Affects Versions: 0.94.3 >Reporter: chendihao >Priority: Minor > > Currently HBase uses ChaosMonkey for IT test and fault injection. It will > restart regionserver, force balancer and perform other actions randomly and > periodically. However, we need a more extensible and full-featured framework > for our failover test and we find ChaosMonkey cant' suit our needs since it > has the following drawbacks. > 1) Only process-level actions can be simulated, not support > machine-level/hardware-level/network-level actions. > 2) No data validation before and after the test, the fatal bugs such as that > can cause data inconsistent may be overlook. > 3) When failure occurs, we can't repro the problem and hard to figure out the > reason. > Therefore, we have developed a new framework to satisfy the need of failover > test. We extended ChaosMonkey and implement the function to validate data and > to replay failed actions. Here are the features we add. > 1) Policy/Task/Action abstraction, seperating Task from Policy and Action > makes it easier to manage and replay a set of actions. > 2) Make action configurable. We have implemented some actions to cause > machine failure and defined the same interface as original actions. > 3) We should validate the date consistent before and after failover test to > ensure the availability and data correctness. > 4) After performing a set of actions, we also check the consistency of table > as well. > 5) The set of actions that caused test failure can be replayed, and the > reproducibility of actions can help fixing the exposed bugs. > Our team has developed this framework and run for a while. Some bugs were > exposed and fixed by running this test framework. 
Moreover, we have a monitor program which shows the progress of the failover test and makes sure our cluster is as stable as we want. Now we are trying to make it more general and will open-source it later. -- This message was sent by Atlassian JIRA (v6.1#6144)
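The comment above maps each failure type to a concrete Linux command (tc, iptables, dd). A minimal sketch of how a dispatcher might assemble those commands before sending them to a test machine over ssh or http (Python; the `FAILURES` table and `build_command` helper are illustrative assumptions, not the actual framework, which has not been open-sourced):

```python
# Command table built from the tools listed in the comment above.
# Each entry is a template; parameters are filled in per injection.
FAILURES = {
    # add `ms` milliseconds of delay on interface `dev` via tc/netem
    "net_delay": "tc qdisc add dev {dev} root netem delay {ms}ms",
    # drop outbound TCP traffic to `port` via iptables
    "net_block": "iptables -A OUTPUT -p tcp --dport {port} -j DROP",
    # cap bandwidth with a tc token bucket filter
    "net_limit": "tc qdisc add dev {dev} root tbf rate {rate}kbit latency 50ms burst 1540",
    # fill the disk with a large file written by dd
    "disk_full": "dd if=/dev/zero of={path}/tst.img bs=1M count={count}K",
}

def build_command(failure, **params):
    """Return the shell command string for a named failure type."""
    return FAILURES[failure].format(**params)
```

A framework could then log each built command before executing it, which also gives the replay record the later comments describe.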
[jira] [Updated] (HBASE-9800) Impl CoprocessorRowcounter to run in command-line
[ https://issues.apache.org/jira/browse/HBASE-9800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chendihao updated HBASE-9800: - Status: Patch Available (was: Open) > Impl CoprocessorRowcounter to run in command-line > - > > Key: HBASE-9800 > URL: https://issues.apache.org/jira/browse/HBASE-9800 > Project: HBase > Issue Type: New Feature > Components: Coprocessors >Affects Versions: 0.94.3 >Reporter: chendihao >Priority: Minor > Attachments: HBASE-9800-0.94-v1.patch > > > We want to count the rows of our tables daily, but the default rowcounter using mapreduce is inefficient. Implementing a rowcounter with a coprocessor makes it better. Furthermore, it must provide a mechanism to choose how to output the result. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9830) Backport HBASE-9605 to 0.94
[ https://issues.apache.org/jira/browse/HBASE-9830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13804110#comment-13804110 ] chendihao commented on HBASE-9830: -- [~yuzhih...@gmail.com] Any suggestions? > Backport HBASE-9605 to 0.94 > --- > > Key: HBASE-9830 > URL: https://issues.apache.org/jira/browse/HBASE-9830 > Project: HBase > Issue Type: Improvement >Affects Versions: 0.94.3 >Reporter: chendihao >Priority: Minor > Attachments: HBASE-9830-0.94-v1.patch > > > Backport HBASE-9605 which is about "Allow AggregationClient to skip > specifying column family for row count aggregate" -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9830) Backport HBASE-9605 to 0.94
[ https://issues.apache.org/jira/browse/HBASE-9830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13804108#comment-13804108 ] chendihao commented on HBASE-9830: -- The patch is for 0.94, so HadoopQA may not handle it. I hope someone can review and commit it. > Backport HBASE-9605 to 0.94 > --- > > Key: HBASE-9830 > URL: https://issues.apache.org/jira/browse/HBASE-9830 > Project: HBase > Issue Type: Improvement >Affects Versions: 0.94.3 >Reporter: chendihao >Priority: Minor > Attachments: HBASE-9830-0.94-v1.patch > > > Backport HBASE-9605, which is about "Allow AggregationClient to skip specifying column family for row count aggregate". -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9830) Backport HBASE-9605 to 0.94
[ https://issues.apache.org/jira/browse/HBASE-9830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chendihao updated HBASE-9830: - Status: Patch Available (was: Open) > Backport HBASE-9605 to 0.94 > --- > > Key: HBASE-9830 > URL: https://issues.apache.org/jira/browse/HBASE-9830 > Project: HBase > Issue Type: Improvement >Affects Versions: 0.94.3 >Reporter: chendihao >Priority: Minor > Attachments: HBASE-9830-0.94-v1.patch > > > Backport HBASE-9605 which is about "Allow AggregationClient to skip > specifying column family for row count aggregate" -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9830) Backport HBASE-9605 to 0.94
[ https://issues.apache.org/jira/browse/HBASE-9830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chendihao updated HBASE-9830: - Attachment: HBASE-9830-0.94-v1.patch patch for 0.94 > Backport HBASE-9605 to 0.94 > --- > > Key: HBASE-9830 > URL: https://issues.apache.org/jira/browse/HBASE-9830 > Project: HBase > Issue Type: Improvement >Affects Versions: 0.94.3 >Reporter: chendihao >Priority: Minor > Attachments: HBASE-9830-0.94-v1.patch > > > Backport HBASE-9605 which is about "Allow AggregationClient to skip > specifying column family for row count aggregate" -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HBASE-9830) Backport HBASE-9605 to 0.94
chendihao created HBASE-9830: Summary: Backport HBASE-9605 to 0.94 Key: HBASE-9830 URL: https://issues.apache.org/jira/browse/HBASE-9830 Project: HBase Issue Type: Improvement Affects Versions: 0.94.3 Reporter: chendihao Priority: Minor Backport HBASE-9605 which is about "Allow AggregationClient to skip specifying column family for row count aggregate" -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9802) A new failover test framework for HBase
[ https://issues.apache.org/jira/browse/HBASE-9802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13800475#comment-13800475 ] chendihao commented on HBASE-9802: -- We don't use the IT tests a lot and think they are less aggressive. As [~eclark] said, the class IntegrationTestBigLinkedListWithChaosMonkey may have verified data, but we treat this framework as an external tool. We implemented a DataValidateTool to randomly read/put/delete data (simulating a real client), then read the values back from HBase and compare them with the expected values, which are stored in memory and are reliable. It's an easy way for us to validate data whenever we want (before/during/after the failover test) and to ensure availability and data correctness. > A new failover test framework for HBase > --- > > Key: HBASE-9802 > URL: https://issues.apache.org/jira/browse/HBASE-9802 > Project: HBase > Issue Type: Improvement > Components: test >Affects Versions: 0.94.3 >Reporter: chendihao >Priority: Minor > > Currently HBase uses ChaosMonkey for IT tests and fault injection. It will restart regionservers, force the balancer and perform other actions randomly and periodically. However, we need a more extensible and full-featured framework for our failover tests, and we find ChaosMonkey can't suit our needs since it has the following drawbacks. > 1) Only process-level actions can be simulated; machine-level/hardware-level/network-level actions are not supported. > 2) There is no data validation before and after the test, so fatal bugs such as those causing data inconsistency may be overlooked. > 3) When a failure occurs, we can't reproduce the problem and it is hard to figure out the reason. > Therefore, we have developed a new framework to satisfy the needs of failover testing. We extended ChaosMonkey and implemented functions to validate data and to replay failed actions. Here are the features we added. 
> 1) Policy/Task/Action abstraction; separating Task from Policy and Action makes it easier to manage and replay a set of actions. > 2) Actions are configurable. We have implemented some actions that cause machine failures and defined the same interface as the original actions. > 3) We validate data consistency before and after the failover test to ensure availability and data correctness. > 4) After performing a set of actions, we also check the consistency of the table. > 5) The set of actions that caused a test failure can be replayed, and the reproducibility of actions helps in fixing the exposed bugs. > Our team has developed this framework and run it for a while. Some bugs were exposed and fixed by running this test framework. Moreover, we have a monitor program which shows the progress of the failover test and makes sure our cluster is as stable as we want. Now we are trying to make it more general and will open-source it later. -- This message was sent by Atlassian JIRA (v6.1#6144)
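The DataValidateTool idea in the comment above — mirror every put/delete in a reliable in-memory copy, then compare reads against it — can be sketched as follows (Python; `DataValidateTool` here is a hypothetical simplification that uses a dict-like store in place of a real HBase table):

```python
class DataValidateTool:
    """Track expected values for every write and flag divergence.

    `table` stands in for the HBase table under failover test; the real
    tool issues client reads/puts/deletes against the cluster instead.
    """

    def __init__(self, table):
        self.table = table      # the store under test (dict-like here)
        self.expected = {}      # reliable in-memory copy of every write

    def put(self, row, value):
        self.table[row] = value
        self.expected[row] = value

    def delete(self, row):
        self.table.pop(row, None)
        self.expected.pop(row, None)

    def validate(self):
        """Return rows whose stored value differs from the expected one."""
        return [r for r in self.expected
                if self.table.get(r) != self.expected[r]]
```

Because the expected map lives in the validator's memory, validation can run before, during, or after the failover actions, as the comment describes.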
[jira] [Commented] (HBASE-9802) A new failover test framework for HBase
[ https://issues.apache.org/jira/browse/HBASE-9802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799133#comment-13799133 ] chendihao commented on HBASE-9802: -- Thanks for paying attention to our work. Now we're trying to separate the HBase-specific parts from this framework and reuse it for HDFS, ZooKeeper and other HA services. Just like [~ste...@apache.org] has said, we want to make it more generic and just provide an extensible framework; then everyone can implement their own actions to inject failures into their systems. Thanks [~elserj], and we will learn more about Accumulo. Currently we use tc (traffic control) to simulate network delay, dd to fill the disk, and other tools to simulate network/disk/cpu/memory failures. It would be helpful if our test servers provided these interfaces. I think we can do it generally and share it with the community. > A new failover test framework for HBase > --- > > Key: HBASE-9802 > URL: https://issues.apache.org/jira/browse/HBASE-9802 > Project: HBase > Issue Type: Improvement > Components: test >Affects Versions: 0.94.3 >Reporter: chendihao >Priority: Minor > > Currently HBase uses ChaosMonkey for IT tests and fault injection. It will restart regionservers, force the balancer and perform other actions randomly and periodically. However, we need a more extensible and full-featured framework for our failover tests, and we find ChaosMonkey can't suit our needs since it has the following drawbacks. > 1) Only process-level actions can be simulated; machine-level/hardware-level/network-level actions are not supported. > 2) There is no data validation before and after the test, so fatal bugs such as those causing data inconsistency may be overlooked. > 3) When a failure occurs, we can't reproduce the problem and it is hard to figure out the reason. > Therefore, we have developed a new framework to satisfy the needs of failover testing. 
We extended ChaosMonkey and implemented functions to validate data and to replay failed actions. Here are the features we added. > 1) Policy/Task/Action abstraction; separating Task from Policy and Action makes it easier to manage and replay a set of actions. > 2) Actions are configurable. We have implemented some actions that cause machine failures and defined the same interface as the original actions. > 3) We validate data consistency before and after the failover test to ensure availability and data correctness. > 4) After performing a set of actions, we also check the consistency of the table. > 5) The set of actions that caused a test failure can be replayed, and the reproducibility of actions helps in fixing the exposed bugs. > Our team has developed this framework and run it for a while. Some bugs were exposed and fixed by running this test framework. Moreover, we have a monitor program which shows the progress of the failover test and makes sure our cluster is as stable as we want. Now we are trying to make it more general and will open-source it later. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HBASE-9802) A new failover test framework for HBase
chendihao created HBASE-9802: Summary: A new failover test framework for HBase Key: HBASE-9802 URL: https://issues.apache.org/jira/browse/HBASE-9802 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.3 Reporter: chendihao Priority: Minor Currently HBase uses ChaosMonkey for IT tests and fault injection. It will restart regionservers, force the balancer and perform other actions randomly and periodically. However, we need a more extensible and full-featured framework for our failover tests, and we find ChaosMonkey can't suit our needs since it has the following drawbacks. 1) Only process-level actions can be simulated; machine-level/hardware-level/network-level actions are not supported. 2) There is no data validation before and after the test, so fatal bugs such as those causing data inconsistency may be overlooked. 3) When a failure occurs, we can't reproduce the problem and it is hard to figure out the reason. Therefore, we have developed a new framework to satisfy the needs of failover testing. We extended ChaosMonkey and implemented functions to validate data and to replay failed actions. Here are the features we added. 1) Policy/Task/Action abstraction; separating Task from Policy and Action makes it easier to manage and replay a set of actions. 2) Actions are configurable. We have implemented some actions that cause machine failures and defined the same interface as the original actions. 3) We validate data consistency before and after the failover test to ensure availability and data correctness. 4) After performing a set of actions, we also check the consistency of the table. 5) The set of actions that caused a test failure can be replayed, and the reproducibility of actions helps in fixing the exposed bugs. Our team has developed this framework and run it for a while. Some bugs were exposed and fixed by running this test framework. Moreover, we have a monitor program which shows the progress of the failover test and makes sure our cluster is as stable as we want. 
Now we are trying to make it more general and will open-source it later. -- This message was sent by Atlassian JIRA (v6.1#6144)
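The Policy/Task/Action separation and the action replay described above (features 1 and 5) can be illustrated with a minimal sketch (Python; the class and method names are assumptions for illustration, not the framework's real API):

```python
class Action:
    """One failure action, e.g. restarting a regionserver."""
    def __init__(self, name, fn):
        self.name = name
        self.fn = fn

    def perform(self):
        self.fn()

class Task:
    """An ordered set of actions. It records what ran, so the exact
    sequence that caused a test failure can be replayed later."""
    def __init__(self, actions):
        self.actions = actions
        self.history = []       # names of actions already performed

    def run(self):
        for action in self.actions:
            self.history.append(action.name)
            action.perform()

    def replay(self, registry):
        """Re-run the recorded sequence from a name->Action registry."""
        for name in self.history:
            registry[name].perform()
```

Keeping the replay log as plain action names is what makes a failed sequence reproducible independently of the random policy that generated it.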
[jira] [Commented] (HBASE-9800) Impl CoprocessorRowcounter to run in command-line
[ https://issues.apache.org/jira/browse/HBASE-9800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798833#comment-13798833 ] chendihao commented on HBASE-9800: -- I will put it in coprocessor/example package. Any suggestions? [~yuzhih...@gmail.com] > Impl CoprocessorRowcounter to run in command-line > - > > Key: HBASE-9800 > URL: https://issues.apache.org/jira/browse/HBASE-9800 > Project: HBase > Issue Type: New Feature > Components: Coprocessors >Affects Versions: 0.94.3 >Reporter: chendihao >Priority: Minor > Attachments: HBASE-9800-0.94-v1.patch > > > We want to count the rows of table daily but the default rowcounter using > mapreduce is inefficient. Impl rowcounter using coprocessor makes it better. > Furthermore, It must provide the mechanism to choose the way to output result. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9800) Impl CoprocessorRowcounter to run in command-line
[ https://issues.apache.org/jira/browse/HBASE-9800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chendihao updated HBASE-9800: - Attachment: HBASE-9800-0.94-v1.patch patch for 0.94 > Impl CoprocessorRowcounter to run in command-line > - > > Key: HBASE-9800 > URL: https://issues.apache.org/jira/browse/HBASE-9800 > Project: HBase > Issue Type: New Feature > Components: Coprocessors >Affects Versions: 0.94.3 >Reporter: chendihao >Priority: Minor > Attachments: HBASE-9800-0.94-v1.patch > > > We want to count the rows of table daily but the default rowcounter using > mapreduce is inefficient. Impl rowcounter using coprocessor makes it better. > Furthermore, It must provide the mechanism to choose the way to output result. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9800) Impl CoprocessorRowcounter to run in command-line
[ https://issues.apache.org/jira/browse/HBASE-9800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798760#comment-13798760 ] chendihao commented on HBASE-9800: -- Yes, I know AggregationClient.rowCount(). We wrote a main function to call it from time to time. Because it's not quite real-time, we plan to count the rows daily at midnight. It might be helpful for us to know rough statistics about the tables without affecting the online services. > Impl CoprocessorRowcounter to run in command-line > - > > Key: HBASE-9800 > URL: https://issues.apache.org/jira/browse/HBASE-9800 > Project: HBase > Issue Type: New Feature > Components: Coprocessors >Affects Versions: 0.94.3 >Reporter: chendihao >Priority: Minor > > We want to count the rows of our tables daily, but the default rowcounter using mapreduce is inefficient. Implementing a rowcounter with a coprocessor makes it better. Furthermore, it must provide a mechanism to choose how to output the result. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HBASE-9800) Impl CoprocessorRowcounter to run in command-line
chendihao created HBASE-9800: Summary: Impl CoprocessorRowcounter to run in command-line Key: HBASE-9800 URL: https://issues.apache.org/jira/browse/HBASE-9800 Project: HBase Issue Type: New Feature Components: Coprocessors Affects Versions: 0.94.3 Reporter: chendihao Priority: Minor We want to count the rows of our tables daily, but the default rowcounter using mapreduce is inefficient. Implementing a rowcounter with a coprocessor makes it better. Furthermore, it must provide a mechanism to choose how to output the result. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9605) Allow AggregationClient to skip specifying column family for row count aggregate
[ https://issues.apache.org/jira/browse/HBASE-9605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chendihao updated HBASE-9605: - Attachment: 0605-0.94.patch patch for 0.94 > Allow AggregationClient to skip specifying column family for row count > aggregate > > > Key: HBASE-9605 > URL: https://issues.apache.org/jira/browse/HBASE-9605 > Project: HBase > Issue Type: Improvement >Reporter: Ted Yu >Assignee: Ted Yu > Fix For: 0.98.0 > > Attachments: 0605-0.94.patch, 9605-v1.txt > > > For rowcounter job, column family is not required as input parameter. > AggregationClient requires the specification of one column family: > {code} > } else if (scan.getFamilyMap().size() != 1) { > throw new IOException("There must be only one family."); > } > {code} > We should relax the above requirement for row count aggregate where > FirstKeyOnlyFilter would be automatically applied. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9392) Add RestartBackupMastersAction for ChaosMonkey
[ https://issues.apache.org/jira/browse/HBASE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chendihao updated HBASE-9392: - Status: Patch Available (was: Open) > Add RestartBackupMastersAction for ChaosMonkey > -- > > Key: HBASE-9392 > URL: https://issues.apache.org/jira/browse/HBASE-9392 > Project: HBase > Issue Type: Improvement > Components: test >Affects Versions: 0.94.3 >Reporter: chendihao >Priority: Minor > Attachments: RestartBackupMastersAction.patch > > > Just implement RestartBackupMastersAction for more failures. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-9392) Add RestartBackupMastersAction for ChaosMonkey
[ https://issues.apache.org/jira/browse/HBASE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chendihao updated HBASE-9392: - Tags: it ChaosMonkey (was: ch) > Add RestartBackupMastersAction for ChaosMonkey > -- > > Key: HBASE-9392 > URL: https://issues.apache.org/jira/browse/HBASE-9392 > Project: HBase > Issue Type: Improvement > Components: test >Affects Versions: 0.94.3 >Reporter: chendihao >Priority: Minor > Attachments: RestartBackupMastersAction.patch > > > Just implement RestartBackupMastersAction for more failures. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-9392) Add RestartBackupMastersAction for ChaosMonkey
[ https://issues.apache.org/jira/browse/HBASE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chendihao updated HBASE-9392: - Attachment: RestartBackupMastersAction.patch patch for trunk > Add RestartBackupMastersAction for ChaosMonkey > -- > > Key: HBASE-9392 > URL: https://issues.apache.org/jira/browse/HBASE-9392 > Project: HBase > Issue Type: Improvement > Components: test >Affects Versions: 0.94.3 >Reporter: chendihao >Priority: Minor > Attachments: RestartBackupMastersAction.patch > > > Just implement RestartBackupMastersAction for more failures. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-9392) Add RestartBackupMastersAction for ChaosMonkey
chendihao created HBASE-9392: Summary: Add RestartBackupMastersAction for ChaosMonkey Key: HBASE-9392 URL: https://issues.apache.org/jira/browse/HBASE-9392 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.3 Reporter: chendihao Priority: Minor Attachments: RestartBackupMastersAction.patch Just implement RestartBackupMastersAction for more failures. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-9350) In ChaosMonkey, MoveRegionsOfTableAction throws UnknownRegionException
[ https://issues.apache.org/jira/browse/HBASE-9350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752187#comment-13752187 ] chendihao commented on HBASE-9350: -- Thanks for reviewing [~stack] :-) > In ChaosMonkey, MoveRegionsOfTableAction throws UnknownRegionException > -- > > Key: HBASE-9350 > URL: https://issues.apache.org/jira/browse/HBASE-9350 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 0.94.0 >Reporter: chendihao > Labels: test > Fix For: 0.98.0, 0.95.0 > > Attachments: MoveRegionsOfTableAction.java.patch, > MoveRegionsOfTableAction-v2.patch > > > The first parameter in HBaseAdmin.move(final byte [] encodedRegionName, final byte [] destServerName) should be encoded. Otherwise, it could throw UnknownRegionException and result in failure of this action. > {code} > encodedRegionName The encoded region name; i.e. the hash that makes up the region name suffix: > e.g. if regionname is TestTable,0094429456,1289497600452.527db22f95c8a9e0116f0cc13c680396., then > the encoded region name is: 527db22f95c8a9e0116f0cc13c680396. > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
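Per the issue description above, the encoded region name is the hash suffix of the full region name. The extraction the fix relies on can be sketched like this (Python; `encoded_region_name` is a hypothetical helper for illustration, HBase's Java client has its own parsing):

```python
def encoded_region_name(region_name):
    """Return the encoded region name (the trailing hash) from a full
    region name of the form 'table,startkey,timestamp.encodedhash.'."""
    # Drop the trailing dot, then take everything after the last '.'
    return region_name.rstrip(".").rsplit(".", 1)[-1]
```

Passing this hash, rather than the full region name, to HBaseAdmin.move() is what avoids the UnknownRegionException the action was hitting.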
[jira] [Updated] (HBASE-9138) the name of function getHaseIntegrationTestingUtility() is a misspelling
[ https://issues.apache.org/jira/browse/HBASE-9138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chendihao updated HBASE-9138: - Attachment: ChaosMonkey-v3.patch update patch for trunk > the name of function getHaseIntegrationTestingUtility() is a misspelling > > > Key: HBASE-9138 > URL: https://issues.apache.org/jira/browse/HBASE-9138 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 0.94.0 >Reporter: chendihao >Priority: Trivial > Fix For: 0.94.0 > > Attachments: ChaosMonkey.java.patch, ChaosMonkey-v2.patch, > ChaosMonkey-v3.patch > > > The function getHaseIntegrationTestingUtility() in ChaosMonkey.java should be > getHBaseIntegrationTestingUtility(), just a spelling mistake. > {code} > /** > >* Context for Action's > >*/ > public static class ActionContext { > private IntegrationTestingUtility util; > public ActionContext(IntegrationTestingUtility util) { > this.util = util; > } > public IntegrationTestingUtility getHaseIntegrationTestingUtility() { > return util; > } > public HBaseCluster getHBaseCluster() { > return util.getHBaseClusterInterface(); > } > } > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-9138) the name of function getHaseIntegrationTestingUtility() is a misspelling
[ https://issues.apache.org/jira/browse/HBASE-9138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13751231#comment-13751231 ] chendihao commented on HBASE-9138: -- [~hadoopqa]Please try again^^ > the name of function getHaseIntegrationTestingUtility() is a misspelling > > > Key: HBASE-9138 > URL: https://issues.apache.org/jira/browse/HBASE-9138 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 0.94.0 >Reporter: chendihao >Priority: Trivial > Fix For: 0.94.0 > > Attachments: ChaosMonkey.java.patch, ChaosMonkey-v2.patch, > ChaosMonkey-v3.patch > > > The function getHaseIntegrationTestingUtility() in ChaosMonkey.java should be > getHBaseIntegrationTestingUtility(), just a spelling mistake. > {code} > /** > >* Context for Action's > >*/ > public static class ActionContext { > private IntegrationTestingUtility util; > public ActionContext(IntegrationTestingUtility util) { > this.util = util; > } > public IntegrationTestingUtility getHaseIntegrationTestingUtility() { > return util; > } > public HBaseCluster getHBaseCluster() { > return util.getHBaseClusterInterface(); > } > } > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-9138) the name of function getHaseIntegrationTestingUtility() is a misspelling
[ https://issues.apache.org/jira/browse/HBASE-9138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chendihao updated HBASE-9138: - Fix Version/s: 0.94.0 Affects Version/s: (was: 0.94.4) 0.94.0 Release Note: patch for 0.94.x Hadoop Flags: Reviewed Status: Patch Available (was: Open) > the name of function getHaseIntegrationTestingUtility() is a misspelling > > > Key: HBASE-9138 > URL: https://issues.apache.org/jira/browse/HBASE-9138 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 0.94.0 >Reporter: chendihao >Priority: Trivial > Fix For: 0.94.0 > > Attachments: ChaosMonkey.java.patch, ChaosMonkey-v2.patch > > > The function getHaseIntegrationTestingUtility() in ChaosMonkey.java should be > getHBaseIntegrationTestingUtility(), just a spelling mistake. > {code} > /** > >* Context for Action's > >*/ > public static class ActionContext { > private IntegrationTestingUtility util; > public ActionContext(IntegrationTestingUtility util) { > this.util = util; > } > public IntegrationTestingUtility getHaseIntegrationTestingUtility() { > return util; > } > public HBaseCluster getHBaseCluster() { > return util.getHBaseClusterInterface(); > } > } > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-9350) In ChaosMonkey, MoveRegionsOfTableAction throws UnknownRegionException
[ https://issues.apache.org/jira/browse/HBASE-9350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chendihao updated HBASE-9350: - Attachment: MoveRegionsOfTableAction-v2.patch take the diff at HBASE_HOME_DIR > In ChaosMonkey, MoveRegionsOfTableAction throws UnknownRegionException > -- > > Key: HBASE-9350 > URL: https://issues.apache.org/jira/browse/HBASE-9350 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 0.94.0 >Reporter: chendihao > Labels: test > Fix For: 0.94.0 > > Attachments: MoveRegionsOfTableAction.java.patch, > MoveRegionsOfTableAction-v2.patch > > > The first parameter in HBaseAdmin.move(final byte [] encodedRegionName, final > byte [] destServerName) should be encoded. Otherwise, it could throw > UnknowRegionException and result in failure of this action. > {code} > encodedRegionName The encoded region name; i.e. the hash that makes up the > region name suffix: >e.g. if regionname is > TestTable,0094429456,1289497600452.527db22f95c8a9e0116f0cc13c680396., then >the encoded region name is: 527db22f95c8a9e0116f0cc13c680396. > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-9350) In ChaosMonkey, MoveRegionsOfTableAction throws UnknownRegionException
[ https://issues.apache.org/jira/browse/HBASE-9350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chendihao updated HBASE-9350: - Attachment: MoveRegionsOfTableAction.java.patch patch for 0.94.x > In ChaosMonkey, MoveRegionsOfTableAction throws UnknownRegionException > -- > > Key: HBASE-9350 > URL: https://issues.apache.org/jira/browse/HBASE-9350 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 0.94.0 >Reporter: chendihao > Labels: test > Fix For: 0.94.0 > > Attachments: MoveRegionsOfTableAction.java.patch > > > The first parameter in HBaseAdmin.move(final byte [] encodedRegionName, final > byte [] destServerName) should be encoded. Otherwise, it could throw > UnknowRegionException and result in failure of this action. > {code} > encodedRegionName The encoded region name; i.e. the hash that makes up the > region name suffix: >e.g. if regionname is > TestTable,0094429456,1289497600452.527db22f95c8a9e0116f0cc13c680396., then >the encoded region name is: 527db22f95c8a9e0116f0cc13c680396. > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-9350) In ChaosMonkey, MoveRegionsOfTableAction throws UnknownRegionException
[ https://issues.apache.org/jira/browse/HBASE-9350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chendihao updated HBASE-9350: - Status: Patch Available (was: Open) > In ChaosMonkey, MoveRegionsOfTableAction throws UnknownRegionException > -- > > Key: HBASE-9350 > URL: https://issues.apache.org/jira/browse/HBASE-9350 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 0.94.0 >Reporter: chendihao > Labels: test > Fix For: 0.94.0 > > Attachments: MoveRegionsOfTableAction.java.patch > > > The first parameter in HBaseAdmin.move(final byte [] encodedRegionName, final > byte [] destServerName) should be encoded. Otherwise, it could throw > UnknowRegionException and result in failure of this action. > {code} > encodedRegionName The encoded region name; i.e. the hash that makes up the > region name suffix: >e.g. if regionname is > TestTable,0094429456,1289497600452.527db22f95c8a9e0116f0cc13c680396., then >the encoded region name is: 527db22f95c8a9e0116f0cc13c680396. > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-9350) In ChaosMonkey, MoveRegionsOfTableAction throws UnknownRegionException
[ https://issues.apache.org/jira/browse/HBASE-9350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chendihao updated HBASE-9350: - Labels: test (was: ) Release Note: patch for 0.94.x Hadoop Flags: Reviewed Status: Patch Available (was: Open) > In ChaosMonkey, MoveRegionsOfTableAction throws UnknownRegionException > -- > > Key: HBASE-9350 > URL: https://issues.apache.org/jira/browse/HBASE-9350 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 0.94.0 >Reporter: chendihao > Labels: test > Fix For: 0.94.0 > > > The first parameter in HBaseAdmin.move(final byte [] encodedRegionName, final > byte [] destServerName) should be encoded. Otherwise, it could throw > UnknowRegionException and result in failure of this action. > {code} > encodedRegionName The encoded region name; i.e. the hash that makes up the > region name suffix: >e.g. if regionname is > TestTable,0094429456,1289497600452.527db22f95c8a9e0116f0cc13c680396., then >the encoded region name is: 527db22f95c8a9e0116f0cc13c680396. > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-9350) In ChaosMonkey, MoveRegionsOfTableAction throws UnknownRegionException
[ https://issues.apache.org/jira/browse/HBASE-9350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chendihao updated HBASE-9350:
-----------------------------
    Status: Open  (was: Patch Available)
[jira] [Updated] (HBASE-9350) In ChaosMonkey, MoveRegionsOfTableAction throws UnknownRegionException
[ https://issues.apache.org/jira/browse/HBASE-9350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chendihao updated HBASE-9350:
-----------------------------
    Description:
The first parameter of HBaseAdmin.move(final byte[] encodedRegionName, final byte[] destServerName) must be the encoded region name. Otherwise the call can throw UnknownRegionException and cause this action to fail.
{code}
encodedRegionName The encoded region name; i.e. the hash that makes up the
region name suffix: e.g. if regionname is
TestTable,0094429456,1289497600452.527db22f95c8a9e0116f0cc13c680396., then
the encoded region name is: 527db22f95c8a9e0116f0cc13c680396.
{code}

    (was: the same description without the {code} formatting)
[jira] [Created] (HBASE-9350) In ChaosMonkey, MoveRegionsOfTableAction throws UnknownRegionException
chendihao created HBASE-9350:
--------------------------------

             Summary: In ChaosMonkey, MoveRegionsOfTableAction throws UnknownRegionException
                 Key: HBASE-9350
                 URL: https://issues.apache.org/jira/browse/HBASE-9350
             Project: HBase
          Issue Type: Bug
          Components: test
    Affects Versions: 0.94.0
            Reporter: chendihao
             Fix For: 0.94.0

The first parameter of HBaseAdmin.move(final byte[] encodedRegionName, final byte[] destServerName) must be the encoded region name. Otherwise the call can throw UnknownRegionException and cause this action to fail.

encodedRegionName The encoded region name; i.e. the hash that makes up the region name suffix: e.g. if regionname is TestTable,0094429456,1289497600452.527db22f95c8a9e0116f0cc13c680396., then the encoded region name is: 527db22f95c8a9e0116f0cc13c680396.
[jira] [Commented] (HBASE-9138) the name of function getHaseIntegrationTestingUtility() is a misspelling
[ https://issues.apache.org/jira/browse/HBASE-9138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13734322#comment-13734322 ]

chendihao commented on HBASE-9138:
----------------------------------

It's a trivial bug; please fix it quickly [~stack]

> the name of function getHaseIntegrationTestingUtility() is a misspelling
> ------------------------------------------------------------------------
>
>                 Key: HBASE-9138
>                 URL: https://issues.apache.org/jira/browse/HBASE-9138
>             Project: HBase
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 0.94.4
>            Reporter: chendihao
>            Priority: Trivial
>         Attachments: ChaosMonkey.java.patch, ChaosMonkey-v2.patch
>
> The function getHaseIntegrationTestingUtility() in ChaosMonkey.java should be
> getHBaseIntegrationTestingUtility(), just a spelling mistake.
> {code}
> /**
>  * Context for Action's
>  */
> public static class ActionContext {
>   private IntegrationTestingUtility util;
>
>   public ActionContext(IntegrationTestingUtility util) {
>     this.util = util;
>   }
>
>   public IntegrationTestingUtility getHaseIntegrationTestingUtility() {
>     return util;
>   }
>
>   public HBaseCluster getHBaseCluster() {
>     return util.getHBaseClusterInterface();
>   }
> }
> {code}
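A minimal sketch of the proposed fix, with the getter renamed to getHBaseIntegrationTestingUtility(). The HBase test classes are stubbed here as placeholders so the snippet compiles standalone; the real types live in the hbase-it module:

```java
// Sketch of the renamed getter. HBaseCluster and IntegrationTestingUtility
// below are minimal stand-ins for the real HBase test classes, used only so
// this example is self-contained.
public class ActionContextDemo {

    static class HBaseCluster {}

    static class IntegrationTestingUtility {
        private final HBaseCluster cluster = new HBaseCluster();
        HBaseCluster getHBaseClusterInterface() { return cluster; }
    }

    /** Context for Action's */
    static class ActionContext {
        private final IntegrationTestingUtility util;

        ActionContext(IntegrationTestingUtility util) { this.util = util; }

        // Corrected spelling: "HBase", not "Hase".
        IntegrationTestingUtility getHBaseIntegrationTestingUtility() { return util; }

        HBaseCluster getHBaseCluster() { return util.getHBaseClusterInterface(); }
    }

    public static void main(String[] args) {
        IntegrationTestingUtility util = new IntegrationTestingUtility();
        ActionContext ctx = new ActionContext(util);
        // Both accessors still return what the context was constructed with.
        System.out.println(ctx.getHBaseIntegrationTestingUtility() == util);
        System.out.println(ctx.getHBaseCluster() == util.getHBaseClusterInterface());
    }
}
```

Since the method is only a getter, the rename is behavior-preserving; callers of the old name just need the one-line update.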
[jira] [Updated] (HBASE-9138) the name of function getHaseIntegrationTestingUtility() is a misspelling
[ https://issues.apache.org/jira/browse/HBASE-9138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chendihao updated HBASE-9138:
-----------------------------
    Attachment: ChaosMonkey-v2.patch

Thanks for the reminder :-) [~stack]
[jira] [Updated] (HBASE-9138) the name of function getHaseIntegrationTestingUtility() is a misspelling
[ https://issues.apache.org/jira/browse/HBASE-9138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chendihao updated HBASE-9138:
-----------------------------
    Affects Version/s:     (was: 0.94.8)
                           0.94.4
[jira] [Updated] (HBASE-9138) the name of function getHaseIntegrationTestingUtility() is a misspelling
[ https://issues.apache.org/jira/browse/HBASE-9138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chendihao updated HBASE-9138:
-----------------------------
    Description:
The function getHaseIntegrationTestingUtility() in ChaosMonkey.java should be getHBaseIntegrationTestingUtility(), just a spelling mistake.
{code}
/**
 * Context for Action's
 */
public static class ActionContext {
  private IntegrationTestingUtility util;

  public ActionContext(IntegrationTestingUtility util) {
    this.util = util;
  }

  public IntegrationTestingUtility getHaseIntegrationTestingUtility() {
    return util;
  }

  public HBaseCluster getHBaseCluster() {
    return util.getHBaseClusterInterface();
  }
}
{code}

    (was: the same description with "just a misspelling" instead of "just a spelling mistake")
[jira] [Updated] (HBASE-9138) the name of function getHaseIntegrationTestingUtility() is a misspelling
[ https://issues.apache.org/jira/browse/HBASE-9138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chendihao updated HBASE-9138:
-----------------------------
    Description:
The function getHaseIntegrationTestingUtility() in ChaosMonkey.java should be getHBaseIntegrationTestingUtility(), just a misspelling.
{code}
/**
 * Context for Action's
 */
public static class ActionContext {
  private IntegrationTestingUtility util;

  public ActionContext(IntegrationTestingUtility util) {
    this.util = util;
  }

  public IntegrationTestingUtility getHaseIntegrationTestingUtility() {
    return util;
  }

  public HBaseCluster getHBaseCluster() {
    return util.getHBaseClusterInterface();
  }
}
{code}

    (was: the function getHaseIntegrationTestingUtility() in ChaosMonkey.java should be getHBaseIntegrationTestingUtility())
[jira] [Updated] (HBASE-9138) the name of function getHaseIntegrationTestingUtility() is a misspelling
[ https://issues.apache.org/jira/browse/HBASE-9138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chendihao updated HBASE-9138:
-----------------------------
    Summary: the name of function getHaseIntegrationTestingUtility() is a misspelling  (was: the function getHaseIntegrationTestingUtility() is a misspelling)
[jira] [Commented] (HBASE-9138) the function getHaseIntegrationTestingUtility() is a misspelling
[ https://issues.apache.org/jira/browse/HBASE-9138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13730646#comment-13730646 ]

chendihao commented on HBASE-9138:
----------------------------------

patch for 0.94.x