[jira] [Commented] (HBASE-11721) jdiff script no longer works as usage instructions indicate
[ https://issues.apache.org/jira/browse/HBASE-11721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093739#comment-14093739 ]

Aleksandr Shulman commented on HBASE-11721:
---
Thanks [~misty] for trying out the tool. I am open to suggestions on how to make the usage instructions clearer.

Regarding the error you are seeing: it looks like the artifact that this script depends on may have been moved. curl is returning an HTML redirect (301) instead of the zip file. Using wget, it looks like there may also be some OpenSSL or certificate issues. Doing a little homework on the matter, it looks like some people have hit this issue with . Will look into this further.

{code}
wget http://cloud.github.com/downloads/tomwhite/jdiff/jdiff-1.1.1-with-incompatible-option.zip
--2014-08-11 21:42:11-- http://cloud.github.com/downloads/tomwhite/jdiff/jdiff-1.1.1-with-incompatible-option.zip
Resolving cloud.github.com (cloud.github.com)... 54.230.141.84, 54.230.143.7, 54.230.140.148, ...
Connecting to cloud.github.com (cloud.github.com)|54.230.141.84|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://cloud.github.com/downloads/tomwhite/jdiff/jdiff-1.1.1-with-incompatible-option.zip [following]
--2014-08-11 21:42:11-- https://cloud.github.com/downloads/tomwhite/jdiff/jdiff-1.1.1-with-incompatible-option.zip
Connecting to cloud.github.com (cloud.github.com)|54.230.141.84|:443... connected.
OpenSSL: error:14077410:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure
Unable to establish SSL connection.
{code}

jdiff script no longer works as usage instructions indicate
---
Key: HBASE-11721
URL: https://issues.apache.org/jira/browse/HBASE-11721
Project: HBase
Issue Type: Bug
Components: scripts
Reporter: Misty Stanley-Jones

I pasted the command from the usage instructions embedded in the script, but it fails as follows:

[misty@cheezel dev-support](master)$ bash ./jdiffHBasePublicAPI.sh https://github.com/apache/hbase.git 0.94 https://github.com/MY_REPO/hbase.git 0.94
JDiff evaluation beginning:
Determining if this is a local directory or a git repo.
Looks like https://github.com/apache/hbase.git is a git repo
Determining if this is a local directory or a git repo.
Looks like https://github.com/MY_REPO/hbase.git is a git repo
We are going to compare source 1 which is a git_repo and source 2, which is a git_repo
0.94 0.94
JDIFF_WORKING_DIRECTORY not set. That's not an issue. We will default it to /tmp/jdiff.
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   183  100   183    0     0    447      0 --:--:-- --:--:-- --:--:--   448
Archive: jdiff-1.1.1-with-incompatible-option.zip
End-of-central-directory signature not found. Either this file is not a zipfile, or it constitutes one disk of a multi-part archive. In the latter case the central directory and zipfile comment will be found on the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of jdiff-1.1.1-with-incompatible-option.zip or jdiff-1.1.1-with-incompatible-option.zip.zip, and cannot find jdiff-1.1.1-with-incompatible-option.zip.ZIP, period.

--
This message was sent by Atlassian JIRA (v6.2#6252)
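
The failure above comes from handing a 183-byte redirect body to unzip. As a rough illustration only (this is not part of jdiffHBasePublicAPI.sh), a download step could check that the file really is a zip archive before unpacking it. The sketch below is in Python; the helper names `looks_like_zip` and `make_demo_files` and the demo file names are hypothetical.

```python
import os
import tempfile
import zipfile


def looks_like_zip(path):
    """Return True only if `path` is an existing file that zipfile recognizes as a zip archive."""
    return os.path.isfile(path) and zipfile.is_zipfile(path)


def make_demo_files(directory):
    """Create an HTML redirect body and a real zip archive (both hypothetical demo inputs)."""
    html_path = os.path.join(directory, "redirect.zip")
    with open(html_path, "w") as f:
        # Stand-in for the small HTML body returned with a 301 response.
        f.write("<html><head><title>301 Moved Permanently</title></head></html>\n")
    zip_path = os.path.join(directory, "real.zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("README.txt", "placeholder archive member\n")
    return html_path, zip_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        html_path, zip_path = make_demo_files(tmp)
        for path in (html_path, zip_path):
            status = "zip archive" if looks_like_zip(path) else "NOT a zip archive"
            print(os.path.basename(path), "->", status)
```

A check like this would let the script stop with a clear "download failed" message instead of surfacing the unzip error.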
[jira] [Updated] (HBASE-11400) Edit, consolidate, and update Compression and data encoding docs
[ https://issues.apache.org/jira/browse/HBASE-11400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksandr Shulman updated HBASE-11400:
--
Priority: Minor (was: Major)
Issue Type: Improvement (was: Bug)
Summary: Edit, consolidate, and update Compression and data encoding docs (was: Edit, colsolidate, and update Compression and data encoding docs)

Edit, consolidate, and update Compression and data encoding docs

Key: HBASE-11400
URL: https://issues.apache.org/jira/browse/HBASE-11400
Project: HBase
Issue Type: Improvement
Components: documentation
Reporter: Misty Stanley-Jones
Assignee: Misty Stanley-Jones
Priority: Minor
Attachments: HBASE-11400.patch

Current docs are here: http://hbase.apache.org/book.html#compression.test
It could use some editing and expansion.
[jira] [Commented] (HBASE-11400) Edit, consolidate, and update Compression and data encoding docs
[ https://issues.apache.org/jira/browse/HBASE-11400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041671#comment-14041671 ]

Aleksandr Shulman commented on HBASE-11400:
---
Thanks for taking this up, [~misty]. I'll have a look.
[jira] [Commented] (HBASE-10924) [region_mover]: Adjust region_mover script to retry unloading a server a configurable number of times in case of region splits/merges
[ https://issues.apache.org/jira/browse/HBASE-10924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040236#comment-14040236 ]

Aleksandr Shulman commented on HBASE-10924:
---
Hmm - I think it's still the intention of the patch to have region_mover do a best-effort move of all the regions, as the script had done before. The main addition is that it will retry that process a configurable number of times, in case of strange transient conditions we've seen, like the master being down when the move request is sent.

Overall, I've seen the region_mover work pretty well and I see this patch as just being a minor stability improvement. If you believe there is a better way to do this region movement, such as failing fast on a split region, I'd be happy to test such a patch in our frameworks.

If we're happy with the logic of this patch, then I can post a version for 0.96, 0.98, trunk, etc.

[region_mover]: Adjust region_mover script to retry unloading a server a configurable number of times in case of region splits/merges
-
Key: HBASE-10924
URL: https://issues.apache.org/jira/browse/HBASE-10924
Project: HBase
Issue Type: Bug
Components: Region Assignment
Affects Versions: 0.94.15
Reporter: Aleksandr Shulman
Assignee: Aleksandr Shulman
Labels: region_mover, rolling_upgrade
Fix For: 0.94.22
Attachments: HBASE-10924-0.94-v2.patch, HBASE-10924-0.94-v3.patch

Observed behavior:
In about 5% of cases, my rolling upgrade tests fail because of stuck regions during a region server unload. My theory is that this occurs when region assignment information changes between the time the region list is generated and the time when the region is to be moved. An example of such a region information change is a split or merge.

Example:
Regionserver A has 100 regions (#0-#99). The balancer is turned off and the regionmover script is called to unload this regionserver. The regionmover script will generate the list of 100 regions to be moved and then proceed down that list, moving the regions off in series. However, there is a region, #84, that has split into two daughter regions while regions 0-83 were moved. The script will be stuck trying to move #84, time out, and then the failure will bubble up (attempt 1 failed).

Proposed solution:
This specific failure mode should be caught and the region_mover script should now attempt to move off all the regions. Now, it will have 16+1 (due to split) regions to move. There is a good chance that it will be able to move all 17 off without issues. However, should it encounter this same issue (attempt 2 failed), it will retry again. This process will continue until the maximum number of unload retry attempts has been reached.

This is not foolproof, but let's say for the sake of argument that 5% of unload attempts hit this issue. Then, with a retry count of 3, it will reduce the unload failure probability from 0.05 to 0.000125 (0.05^3).

Next steps:
I am looking for feedback on this approach. If it seems like a sensible approach, I will create a strawman patch and test it.
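
The retry idea and the probability estimate in the description above can be sketched as follows. This is a minimal illustration in Python, not the patched region_mover script (which is not shown in this thread). `StuckRegionError`, `FakeServer`, `unload_with_retries` and `failure_probability` are hypothetical names, the fake server only simulates the #84 split from the example, and the probability figure assumes attempts fail independently.

```python
import time


class StuckRegionError(Exception):
    """Hypothetical error: a region could not be moved before the timeout."""


class FakeServer:
    """Simulated regionserver with regions 0-99, where region 84 splits once."""

    def __init__(self):
        self.regions = [str(i) for i in range(100)]
        self.split_done = False

    def list_regions(self):
        return list(self.regions)

    def move_region(self, region):
        if region == "84" and not self.split_done:
            # The parent region is gone; two daughter regions replace it.
            self.regions.remove("84")
            self.regions += ["84a", "84b"]
            self.split_done = True
            raise StuckRegionError(region)
        self.regions.remove(region)


def unload_with_retries(list_regions, move_region, max_attempts=3, sleep_seconds=0.0):
    """Move every region off; on a stuck region, re-list and start over.

    Returns the number of the attempt that succeeded, or re-raises the
    last StuckRegionError once max_attempts is exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        regions = list_regions()  # fresh list, so splits and merges are picked up
        try:
            for region in regions:
                move_region(region)
            return attempt
        except StuckRegionError:
            if attempt == max_attempts:
                raise
            time.sleep(sleep_seconds)  # pause length is an assumption


def failure_probability(per_attempt_failure, attempts):
    """Chance that all attempts fail, assuming independent attempts."""
    return per_attempt_failure ** attempts


if __name__ == "__main__":
    server = FakeServer()
    used = unload_with_retries(server.list_regions, server.move_region)
    print("succeeded on attempt", used, "with", len(server.regions), "regions left")
    print("failure probability with 3 attempts:", failure_probability(0.05, 3))
```

In the simulation, the first attempt moves regions 0-83 and then fails on #84, leaving 15 untouched regions plus the two daughters, which matches the 16+1 count in the description.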
[jira] [Updated] (HBASE-11122) Annotate coprocessor APIs
[ https://issues.apache.org/jira/browse/HBASE-11122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksandr Shulman updated HBASE-11122:
--
Labels: compatibility coprocessors (was: )

Annotate coprocessor APIs
-
Key: HBASE-11122
URL: https://issues.apache.org/jira/browse/HBASE-11122
Project: HBase
Issue Type: Task
Affects Versions: 0.99.0, 0.98.3
Reporter: Andrew Purtell
Labels: compatibility, coprocessors

Add annotations to coprocessor APIs for:
- Interface stability
- Whether or not bypassable
- Whether or not executed under row lock
[jira] [Commented] (HBASE-4920) We need a mascot, a totem
[ https://issues.apache.org/jira/browse/HBASE-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998017#comment-13998017 ]

Aleksandr Shulman commented on HBASE-4920:
--
+1 for the Orca.

We need a mascot, a totem
-
Key: HBASE-4920
URL: https://issues.apache.org/jira/browse/HBASE-4920
Project: HBase
Issue Type: Task
Reporter: stack
Attachments: Apache_HBase_Orca_Logo_1.jpg, Apache_HBase_Orca_Logo_Mean_version-3.pdf, Apache_HBase_Orca_Logo_Mean_version-4.pdf, HBase Orca Logo.jpg, Orca_479990801.jpg, Screen shot 2011-11-30 at 4.06.17 PM.png, apache hbase orca logo_Proof 3.pdf, apache logo_Proof 8.pdf, krake.zip, more_orcas.png, more_orcas2.png, photo (2).JPG, plus_orca.png

We need a totem for our t-shirt that is yet to be printed. O'Reilly owns the Clydesdale. We need something else. We could have a fluffy little duck that quacks 'hbase!' when you squeeze it and we could order boxes of them from some off-shore sweatshop that subcontracts to a contractor who employs child labor only. Or we could have an Orca (Big!, Fast!, Killer!, and in a poem that Marcy from Salesforce showed me, that was a bit too spiritual for me to be seen quoting here, it had the Orca as the 'Guardian of the Cosmic Memory': i.e. in translation, bigdata).
[jira] [Commented] (HBASE-10924) [region_mover]: Adjust region_mover script to retry unloading a server a configurable number of times in case of region splits/merges
[ https://issues.apache.org/jira/browse/HBASE-10924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996537#comment-13996537 ]

Aleksandr Shulman commented on HBASE-10924:
---
This issue affects all branches. I will upload patches for the other branches as well.

[region_mover]: Adjust region_mover script to retry unloading a server a configurable number of times in case of region splits/merges
-
Key: HBASE-10924
URL: https://issues.apache.org/jira/browse/HBASE-10924
Project: HBase
Issue Type: Bug
Components: Region Assignment
Affects Versions: 0.94.15
Reporter: Aleksandr Shulman
Assignee: Aleksandr Shulman
Labels: region_mover, rolling_upgrade
Fix For: 0.94.20
Attachments: HBASE-10924-0.94-v2.patch, HBASE-10924-0.94-v3.patch
[jira] [Commented] (HBASE-10251) Restore API Compat for PerformanceEvaluation.generateValue()
[ https://issues.apache.org/jira/browse/HBASE-10251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984841#comment-13984841 ]

Aleksandr Shulman commented on HBASE-10251:
---
[~ndimiduk], I like the idea of having utility classes that have a compatibility story, from which the tests (which do not have compatibility considerations) can pull.

Restore API Compat for PerformanceEvaluation.generateValue()

Key: HBASE-10251
URL: https://issues.apache.org/jira/browse/HBASE-10251
Project: HBase
Issue Type: Bug
Components: Client
Affects Versions: 0.99.0
Reporter: Aleksandr Shulman
Labels: api_compatibility

Observed:
A couple of my client tests fail to compile against trunk because the method PerformanceEvaluation.generateValue was removed as part of HBASE-8496. This is an issue because it was used in a number of places, including unit tests. Since we did not explicitly label this API as private, it's ambiguous as to whether this could/should have been used by people writing apps against 0.96. If they used it, then they would be broken upon upgrade to 0.98 and trunk.

Potential Solution:
The method was renamed to generateData, but the logic is still the same. We can reintroduce it as deprecated in 0.98, as a compat shim over generateData. The patch should be a few lines. We may also consider doing so in trunk, but I'd be just as fine with leaving it out.

More generally, this raises the question of what other code is in this grey area, where it is public, is used outside of the package, but is not explicitly labeled with an InterfaceAudience annotation.
[jira] [Updated] (HBASE-10924) [region_mover]: Adjust region_mover script to retry unloading a server a configurable number of times in case of region splits/merges
[ https://issues.apache.org/jira/browse/HBASE-10924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksandr Shulman updated HBASE-10924:
--
Status: Patch Available (was: Open)

[region_mover]: Adjust region_mover script to retry unloading a server a configurable number of times in case of region splits/merges
-
Key: HBASE-10924
URL: https://issues.apache.org/jira/browse/HBASE-10924
Project: HBase
Issue Type: Bug
Components: Region Assignment
Affects Versions: 0.94.15
Reporter: Aleksandr Shulman
Assignee: Aleksandr Shulman
Labels: region_mover, rolling_upgrade
Fix For: 0.94.20
Attachments: HBASE-10924-0.94-v2.patch, HBASE-10924-0.94-v3.patch
[jira] [Updated] (HBASE-10924) [region_mover]: Adjust region_mover script to retry unloading a server a configurable number of times in case of region splits/merges
[ https://issues.apache.org/jira/browse/HBASE-10924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksandr Shulman updated HBASE-10924:
--
Attachment: HBASE-10924-0.94-v3.patch

Adding v3 which includes a sleep and some comments as to the rationale of the fix.

[region_mover]: Adjust region_mover script to retry unloading a server a configurable number of times in case of region splits/merges
-
Key: HBASE-10924
URL: https://issues.apache.org/jira/browse/HBASE-10924
Project: HBase
Issue Type: Bug
Components: Region Assignment
Affects Versions: 0.94.15
Reporter: Aleksandr Shulman
Assignee: Aleksandr Shulman
Labels: region_mover, rolling_upgrade
Fix For: 0.94.20
Attachments: HBASE-10924-0.94-v2.patch, HBASE-10924-0.94-v3.patch
[jira] [Updated] (HBASE-10924) [region_mover]: Adjust region_mover script to retry unloading a server a configurable number of times in case of region splits/merges
[ https://issues.apache.org/jira/browse/HBASE-10924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksandr Shulman updated HBASE-10924:
--
Attachment: HBASE-10924-0.94-v1.patch

Attaching v1 of the patch. For 0.94 only.

[region_mover]: Adjust region_mover script to retry unloading a server a configurable number of times in case of region splits/merges
-
Key: HBASE-10924
URL: https://issues.apache.org/jira/browse/HBASE-10924
Project: HBase
Issue Type: Bug
Components: Region Assignment
Affects Versions: 0.94.15
Reporter: Aleksandr Shulman
Assignee: Aleksandr Shulman
Labels: region_mover, rolling_upgrade
Fix For: 0.94.20
Attachments: HBASE-10924-0.94-v1.patch
[jira] [Updated] (HBASE-10924) [region_mover]: Adjust region_mover script to retry unloading a server a configurable number of times in case of region splits/merges
[ https://issues.apache.org/jira/browse/HBASE-10924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksandr Shulman updated HBASE-10924:
--
Attachment: (was: HBASE-10924-0.94-v1.patch)

[region_mover]: Adjust region_mover script to retry unloading a server a configurable number of times in case of region splits/merges
-
Key: HBASE-10924
URL: https://issues.apache.org/jira/browse/HBASE-10924
Project: HBase
Issue Type: Bug
Components: Region Assignment
Affects Versions: 0.94.15
Reporter: Aleksandr Shulman
Assignee: Aleksandr Shulman
Labels: region_mover, rolling_upgrade
Fix For: 0.94.20
[jira] [Updated] (HBASE-10924) [region_mover]: Adjust region_mover script to retry unloading a server a configurable number of times in case of region splits/merges
[ https://issues.apache.org/jira/browse/HBASE-10924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksandr Shulman updated HBASE-10924:
--
Attachment: HBASE-10924-0.94-v2.patch

Attaching a better version of the patch here. It's relatively straightforward, but if there is interest in a formal review, I can put it up on RB.

Testing:
I ran this patch through an in-house rolling upgrade test framework. It performs MR jobs, splits, compactions, and DML while regions are moving. I also did some explicit testing by installing this on a cluster and moving regions back and forth while doing splits. The results were fine for all the testing.

[region_mover]: Adjust region_mover script to retry unloading a server a configurable number of times in case of region splits/merges
-
Key: HBASE-10924
URL: https://issues.apache.org/jira/browse/HBASE-10924
Project: HBase
Issue Type: Bug
Components: Region Assignment
Affects Versions: 0.94.15
Reporter: Aleksandr Shulman
Assignee: Aleksandr Shulman
Labels: region_mover, rolling_upgrade
Fix For: 0.94.20
Attachments: HBASE-10924-0.94-v2.patch
[jira] [Commented] (HBASE-11008) Align bulk load, flush, and compact to require Action.CREATE
[ https://issues.apache.org/jira/browse/HBASE-11008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972123#comment-13972123 ]

Aleksandr Shulman commented on HBASE-11008:
---
We should be careful to consider what workflows we might disrupt with this change. Specifically, we should consider both the period while the user is upgrading (rolling upgrade) and the period after the upgrade is complete.

Bulk loading is something that users can expect to do while a rolling upgrade is going on. If some regionservers begin enforcing a more restrictive requirement, then it will cause issues.

If we choose to make it more restrictive, we should document any changes that need to be made to the ACL table in order to allow the upgrade to go smoothly.

If we choose to make it less restrictive (e.g. allow admin permissions to users with create), then we have to acknowledge that the ACL semantics have changed and document that appropriately.

Align bulk load, flush, and compact to require Action.CREATE

Key: HBASE-11008
URL: https://issues.apache.org/jira/browse/HBASE-11008
Project: HBase
Issue Type: Improvement
Components: security
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Fix For: 0.99.0, 0.98.2, 0.96.3, 0.94.20
Attachments: HBASE-11008.patch

Over in HBASE-10958 we noticed that it might make sense to require Action.CREATE for bulk load, flush, and compact since it is also required for things like enable and disable. This means the following changes:
- preBulkLoadHFile goes from WRITE to CREATE
- compact/flush go from ADMIN to ADMIN or CREATE
[jira] [Created] (HBASE-10924) [region_mover]: Adjust region_mover script to retry unloading a server a configurable number of times in case of region splits/merges
Aleksandr Shulman created HBASE-10924:

Summary: [region_mover]: Adjust region_mover script to retry unloading a server a configurable number of times in case of region splits/merges
Key: HBASE-10924
URL: https://issues.apache.org/jira/browse/HBASE-10924
Project: HBase
Issue Type: Bug
Components: Region Assignment
Affects Versions: 0.94.15
Reporter: Aleksandr Shulman
Assignee: Aleksandr Shulman
Fix For: 0.94.19

Observed behavior: In about 5% of cases, my rolling upgrade tests fail because of stuck regions during a region server unload. My theory is that this occurs when region assignment information changes between the time the region list is generated and the time when the region is to be moved. An example of such a region information change is a split or merge.

Example: Regionserver A has 100 regions (#0-#99). The balancer is turned off and the region_mover script is called to unload this regionserver. The script will generate the list of 100 regions to be moved and then proceed down that list, moving the regions off in series. However, there is a region, #84, that has split into two daughter regions while regions 0-83 were moved. The script will be stuck trying to move #84, time out, and then the failure will bubble up (attempt 1 failed).

Proposed solution: This specific failure mode should be caught, and the region_mover script should then make a fresh attempt to move off all the remaining regions. On that attempt, it will have 16+1 (due to the split) regions to move. There is a good chance that it will be able to move all 17 off without issues. However, should it encounter this same issue (attempt 2 failed), it will retry again. This process will continue until the maximum number of unload retry attempts has been reached. This is not foolproof, but let's say for the sake of argument that 5% of unload attempts hit this issue and that attempts fail independently; then with a retry count of 3, it will reduce the unload failure probability from 0.05 to 0.000125 (0.05^3).

Next steps: I am looking for feedback on this approach. If it seems like a sensible approach, I will create a strawman patch and test it.

-- This message was sent by Atlassian JIRA (v6.2#6252)
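The retry proposal above can be sketched roughly as follows. This is only an illustration in Python, not the region_mover script itself (its code is not shown in this thread): `list_regions`, `move_region`, and `RegionMoveError` are hypothetical names, and the probability helper assumes that attempts fail independently.

```python
class RegionMoveError(Exception):
    """Hypothetical error raised when a region cannot be moved (e.g. it has split)."""


def unload_with_retries(list_regions, move_region, max_attempts=3):
    """Unload a server, regenerating the region list after each failed attempt.

    list_regions: callable returning the regions currently on the server.
    move_region:  callable that moves one region, raising RegionMoveError on failure.
    Returns the number of attempts used; re-raises once max_attempts is exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            # Re-read the region list on every attempt so splits/merges are picked up.
            for region in list_regions():
                move_region(region)
            return attempt
        except RegionMoveError:
            if attempt == max_attempts:
                raise


def failure_probability(per_attempt_failure, max_attempts):
    """Chance that every attempt fails, assuming attempts fail independently."""
    return per_attempt_failure ** max_attempts


print(failure_probability(0.05, 3))
```

The key design point is that the region list is rebuilt inside the loop rather than once up front, so a region that split during attempt 1 is replaced by its daughters on attempt 2.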
[jira] [Commented] (HBASE-10924) [region_mover]: Adjust region_mover script to retry unloading a server a configurable number of times in case of region splits/merges
[ https://issues.apache.org/jira/browse/HBASE-10924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13962112#comment-13962112 ]

Aleksandr Shulman commented on HBASE-10924:
---

That seems like a good place to put that logic since it'll be easier to maintain. As a bonus, we'll have implicit compatibility checks at compile time :) Only concern is that we don't break the shell api, but that shouldn't be difficult to maintain.

[region_mover]: Adjust region_mover script to retry unloading a server a configurable number of times in case of region splits/merges
-
Key: HBASE-10924
URL: https://issues.apache.org/jira/browse/HBASE-10924
Project: HBase
Issue Type: Bug
Components: Region Assignment
Affects Versions: 0.94.15
Reporter: Aleksandr Shulman
Assignee: Aleksandr Shulman
Labels: region_mover, rolling_upgrade
Fix For: 0.94.19

Observed behavior: In about 5% of cases, my rolling upgrade tests fail because of stuck regions during a region server unload. My theory is that this occurs when region assignment information changes between the time the region list is generated and the time when the region is to be moved. An example of such a region information change is a split or merge.

Example: Regionserver A has 100 regions (#0-#99). The balancer is turned off and the region_mover script is called to unload this regionserver. The script will generate the list of 100 regions to be moved and then proceed down that list, moving the regions off in series. However, there is a region, #84, that has split into two daughter regions while regions 0-83 were moved. The script will be stuck trying to move #84, time out, and then the failure will bubble up (attempt 1 failed).

Proposed solution: This specific failure mode should be caught, and the region_mover script should then make a fresh attempt to move off all the remaining regions. On that attempt, it will have 16+1 (due to the split) regions to move. There is a good chance that it will be able to move all 17 off without issues. However, should it encounter this same issue (attempt 2 failed), it will retry again. This process will continue until the maximum number of unload retry attempts has been reached. This is not foolproof, but let's say for the sake of argument that 5% of unload attempts hit this issue and that attempts fail independently; then with a retry count of 3, it will reduce the unload failure probability from 0.05 to 0.000125 (0.05^3).

Next steps: I am looking for feedback on this approach. If it seems like a sensible approach, I will create a strawman patch and test it.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10184) [Online Schema Change]: Add additional tests for online schema change
[ https://issues.apache.org/jira/browse/HBASE-10184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934697#comment-13934697 ] Aleksandr Shulman commented on HBASE-10184: --- I will take a look. Should be interesting. [Online Schema Change]: Add additional tests for online schema change - Key: HBASE-10184 URL: https://issues.apache.org/jira/browse/HBASE-10184 Project: HBase Issue Type: Task Components: test Affects Versions: 0.96.1, 0.99.0 Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Labels: online_schema_change Fix For: 0.98.1, 0.99.0 Attachments: 10184-4.patch, 10184.addendum, HBASE-10184-trunk.diff There are some gaps in testing for Online Schema Change: Examples of some tests that should be added: 1. Splits with online schema change 2. Merge during online schema change 3. MR over HBase during online schema change 4. Bulk Load during online schema change 5. Online change table owner 6. Online Replication scope change 7. Online Bloom Filter change 8. Snapshots during online schema change (HBASE-10136) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10184) [Online Schema Change]: Add additional tests for online schema change
[ https://issues.apache.org/jira/browse/HBASE-10184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934706#comment-13934706 ] Aleksandr Shulman commented on HBASE-10184: --- From the above runs (3), I only see one TF related to my tests: https://builds.apache.org/job/HBase-0.98/228/testReport/junit/org.apache.hadoop.hbase.regionserver/TestEndToEndSplitTransaction/testFromClientSideOnlineSchemaChangeWhileSplitting/ Can you point me to the additional failures? [Online Schema Change]: Add additional tests for online schema change - Key: HBASE-10184 URL: https://issues.apache.org/jira/browse/HBASE-10184 Project: HBase Issue Type: Task Components: test Affects Versions: 0.96.1, 0.99.0 Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Labels: online_schema_change Fix For: 0.99.0, 0.98.2 Attachments: 10184-4.patch, 10184.addendum, HBASE-10184-trunk.diff There are some gaps in testing for Online Schema Change: Examples of some tests that should be added: 1. Splits with online schema change 2. Merge during online schema change 3. MR over HBase during online schema change 4. Bulk Load during online schema change 5. Online change table owner 6. Online Replication scope change 7. Online Bloom Filter change 8. Snapshots during online schema change (HBASE-10136) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10184) [Online Schema Change]: Add additional tests for online schema change
[ https://issues.apache.org/jira/browse/HBASE-10184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935630#comment-13935630 ] Aleksandr Shulman commented on HBASE-10184: --- I ran the tests locally a while ago before submitting the patch and they all passed for me. It's possible something has changed between now and then. Let me look into these test failures and whether they reveal actual product bugs. Otherwise, I'll adjust the tests to be more stable on our Jenkins runs. [Online Schema Change]: Add additional tests for online schema change - Key: HBASE-10184 URL: https://issues.apache.org/jira/browse/HBASE-10184 Project: HBase Issue Type: Task Components: test Affects Versions: 0.96.1, 0.99.0 Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Labels: online_schema_change Fix For: 0.99.0, 0.98.2 Attachments: 10184-4.patch, 10184.addendum, HBASE-10184-trunk.diff There are some gaps in testing for Online Schema Change: Examples of some tests that should be added: 1. Splits with online schema change 2. Merge during online schema change 3. MR over HBase during online schema change 4. Bulk Load during online schema change 5. Online change table owner 6. Online Replication scope change 7. Online Bloom Filter change 8. Snapshots during online schema change (HBASE-10136) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10184) [Online Schema Change]: Add additional tests for online schema change
[ https://issues.apache.org/jira/browse/HBASE-10184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935743#comment-13935743 ] Aleksandr Shulman commented on HBASE-10184: --- I don't want to relax the requirements of the test unless they are testing something that is not always guaranteed to be true. If that's the case, then I can remove it. I'd rather get to the bottom of the issue. [Online Schema Change]: Add additional tests for online schema change - Key: HBASE-10184 URL: https://issues.apache.org/jira/browse/HBASE-10184 Project: HBase Issue Type: Task Components: test Affects Versions: 0.96.1, 0.99.0 Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Labels: online_schema_change Fix For: 0.99.0, 0.98.2 Attachments: 10184-4.patch, 10184.addendum, HBASE-10184-trunk.diff There are some gaps in testing for Online Schema Change: Examples of some tests that should be added: 1. Splits with online schema change 2. Merge during online schema change 3. MR over HBase during online schema change 4. Bulk Load during online schema change 5. Online change table owner 6. Online Replication scope change 7. Online Bloom Filter change 8. Snapshots during online schema change (HBASE-10136) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10184) [Online Schema Change]: Add additional tests for online schema change
[ https://issues.apache.org/jira/browse/HBASE-10184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934303#comment-13934303 ] Aleksandr Shulman commented on HBASE-10184: --- Thanks Andrew! [Online Schema Change]: Add additional tests for online schema change - Key: HBASE-10184 URL: https://issues.apache.org/jira/browse/HBASE-10184 Project: HBase Issue Type: Task Components: test Affects Versions: 0.96.1, 0.99.0 Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Labels: online_schema_change Fix For: 0.98.1, 0.99.0 Attachments: 10184-4.patch, HBASE-10184-trunk.diff There are some gaps in testing for Online Schema Change: Examples of some tests that should be added: 1. Splits with online schema change 2. Merge during online schema change 3. MR over HBase during online schema change 4. Bulk Load during online schema change 5. Online change table owner 6. Online Replication scope change 7. Online Bloom Filter change 8. Snapshots during online schema change (HBASE-10136) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10184) [Online Schema Change]: Add additional tests for online schema change
[ https://issues.apache.org/jira/browse/HBASE-10184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934367#comment-13934367 ] Aleksandr Shulman commented on HBASE-10184: --- +1 on the addendum as well. [Online Schema Change]: Add additional tests for online schema change - Key: HBASE-10184 URL: https://issues.apache.org/jira/browse/HBASE-10184 Project: HBase Issue Type: Task Components: test Affects Versions: 0.96.1, 0.99.0 Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Labels: online_schema_change Fix For: 0.98.1, 0.99.0 Attachments: 10184-4.patch, 10184.addendum, HBASE-10184-trunk.diff There are some gaps in testing for Online Schema Change: Examples of some tests that should be added: 1. Splits with online schema change 2. Merge during online schema change 3. MR over HBase during online schema change 4. Bulk Load during online schema change 5. Online change table owner 6. Online Replication scope change 7. Online Bloom Filter change 8. Snapshots during online schema change (HBASE-10136) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10184) [Online Schema Change]: Add additional tests for online schema change
[ https://issues.apache.org/jira/browse/HBASE-10184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13932733#comment-13932733 ] Aleksandr Shulman commented on HBASE-10184: --- Thanks for following up on this. Was the patch that you applied the one from the JIRA, or was it the latest from the reviewboard review? The latter is here: https://reviews.apache.org/r/16457/diff/raw/ [Online Schema Change]: Add additional tests for online schema change - Key: HBASE-10184 URL: https://issues.apache.org/jira/browse/HBASE-10184 Project: HBase Issue Type: Task Components: test Affects Versions: 0.96.1, 0.99.0, 0.98.2 Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Labels: online_schema_change Attachments: HBASE-10184-trunk.diff There are some gaps in testing for Online Schema Change: Examples of some tests that should be added: 1. Splits with online schema change 2. Merge during online schema change 3. MR over HBase during online schema change 4. Bulk Load during online schema change 5. Online change table owner 6. Online Replication scope change 7. Online Bloom Filter change 8. Snapshots during online schema change (HBASE-10136) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10653) Incorrect table status in HBase shell Describe
[ https://issues.apache.org/jira/browse/HBASE-10653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918913#comment-13918913 ]

Aleksandr Shulman commented on HBASE-10653:
---

The formatting is a bit confusing, but it does appear that the table is disabled: 'ENABLED' is just the name of the attribute. Below, it says 'false', right under that heading.

DESCRIPTION                                                         *ENABLED*
'TestTable', {NAME => 'info', DATA_BLOCK_ENCODING => 'NONE', BLOOMF *false*
ILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESS
ION => 'SNAPPY', MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DEL
ETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false',
BLOCKCACHE => 'true'}
1 row(s) in 1.4220 seconds

This might be a usability concern though, and maybe it's worth exploring a clearer formatting.

-- Best Regards, Aleks Shulman 847.814.5804 Cloudera

Incorrect table status in HBase shell Describe
--
Key: HBASE-10653
URL: https://issues.apache.org/jira/browse/HBASE-10653
Project: HBase
Issue Type: Bug
Components: shell
Reporter: Biju Nair
Labels: HbaseShell, describe

Describe output of table which is disabled shows as enabled.

-- This message was sent by Atlassian JIRA (v6.2#6252)
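To see how 'false' can land in the middle of BLOOMFILTER, here is a small Python sketch of a two-column layout that hard-wraps the left column at a fixed width. It is an assumption about the shell formatter's behaviour inferred only from the output quoted above, not its actual code; the function name `two_column_rows` is made up, the width of 67 is read off that output, and the attribute separators are written as the shell's usual `=>`.

```python
def two_column_rows(description, status, width=67):
    """Hard-wrap `description` at `width` characters and put `status` beside the first chunk."""
    chunks = [description[i:i + width] for i in range(0, len(description), width)]
    rows = []
    for index, chunk in enumerate(chunks):
        # Only the first physical row carries the value of the second column.
        right = status if index == 0 else ""
        rows.append((chunk.ljust(width) + " " + right).rstrip())
    return rows


description = (
    "'TestTable', {NAME => 'info', DATA_BLOCK_ENCODING => 'NONE', "
    "BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', "
    "COMPRESSION => 'SNAPPY', MIN_VERSIONS => '0', TTL => '2147483647', "
    "KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', "
    "IN_MEMORY => 'false', BLOCKCACHE => 'true'}"
)

for row in two_column_rows(description, "false"):
    print(row)
```

Because the wrap ignores word boundaries, the first row ends in `BLOOMF false` and the next begins with `ILTER`, which reads as `BLOOMF false ILTER` once the rows are flattened into one line.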
[jira] [Commented] (HBASE-10653) Incorrect table status in HBase shell Describe
[ https://issues.apache.org/jira/browse/HBASE-10653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918922#comment-13918922 ] Aleksandr Shulman commented on HBASE-10653: --- To clarify, the 'false' ended up here: {code}BLOOMF *false* ILTER {code} Incorrect table status in HBase shell Describe -- Key: HBASE-10653 URL: https://issues.apache.org/jira/browse/HBASE-10653 Project: HBase Issue Type: Bug Components: shell Reporter: Biju Nair Labels: HbaseShell, describe Describe output of table which is disabled shows as enabled. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-10579) [Documentation]: ExportSnapshot tool package incorrectly documented
[ https://issues.apache.org/jira/browse/HBASE-10579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Shulman updated HBASE-10579: -- Attachment: HBASE-10579-v0.patch Trivial fix. There is only one reference to this path in the book, so I just had to fix it in one spot. [Documentation]: ExportSnapshot tool package incorrectly documented --- Key: HBASE-10579 URL: https://issues.apache.org/jira/browse/HBASE-10579 Project: HBase Issue Type: Bug Components: documentation, snapshots Affects Versions: 0.98.0 Reporter: Aleksandr Shulman Priority: Minor Fix For: 0.96.2, 0.98.1 Attachments: HBASE-10579-v0.patch Documentation Page: http://hbase.apache.org/book/ops.snapshots.html Expected documentation: The class should be specified as org.apache.hadoop.hbase.snapshot.ExportSnapshot Current documentation: Specified as: org.apache.hadoop.hbase.snapshot.tool.ExportSnapshot This makes sense because the class is located in the org.apache.hadoop.hbase.snapshot package: https://github.com/apache/hbase/blob/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/ExportSnapshot.java#19 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10567) Add overwrite manifest option to ExportSnapshot
[ https://issues.apache.org/jira/browse/HBASE-10567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13909089#comment-13909089 ] Aleksandr Shulman commented on HBASE-10567: --- Took a first read of the patch. Looks good to me. I'd maybe like to see a few more tests, but this is probably okay for now. Add overwrite manifest option to ExportSnapshot --- Key: HBASE-10567 URL: https://issues.apache.org/jira/browse/HBASE-10567 Project: HBase Issue Type: Bug Components: snapshots Affects Versions: 0.98.0, 0.94.16, 0.99.0, 0.96.1.1 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Minor Fix For: 0.96.2, 0.98.1, 0.99.0 Attachments: HBASE-10567-v0.patch, HBASE-10567-v1.patch If you want to export a snapshot twice (e.g. in case you accidentally removed a file and now your snapshot is corrupted) you have to manually remove the .hbase-snapshot/SNAPSHOT_NAME directory and then run the ExportSnapshot tool. Add an -overwrite option to do this operation automatically. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HBASE-10579) [Documentation]: ExportSnapshot tool package incorrectly documented
Aleksandr Shulman created HBASE-10579: - Summary: [Documentation]: ExportSnapshot tool package incorrectly documented Key: HBASE-10579 URL: https://issues.apache.org/jira/browse/HBASE-10579 Project: HBase Issue Type: Bug Components: documentation, snapshots Affects Versions: 0.98.0 Reporter: Aleksandr Shulman Priority: Minor Fix For: 0.96.2, 0.98.1 Documentation Page: http://hbase.apache.org/book/ops.snapshots.html Expected documentation: The class should be specified as org.apache.hadoop.hbase.snapshot.ExportSnapshot Current documentation: Specified as: org.apache.hadoop.hbase.snapshot.tool.ExportSnapshot This makes sense because the class is located in the org.apache.hadoop.hbase.snapshot package: https://github.com/apache/hbase/blob/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/ExportSnapshot.java#19 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10481) API Compatibility JDiff script does not properly handle arguments in reverse order
[ https://issues.apache.org/jira/browse/HBASE-10481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13895104#comment-13895104 ] Aleksandr Shulman commented on HBASE-10481: --- Semantically, it does not make sense to have the previous version be greater than the current version. The script would just generate a report that is the mirror image (adds reported as removes). I don't think this is a meaningful use case to support. The solution would be to add a meaningful error message and also to document the logic. API Compatibility JDiff script does not properly handle arguments in reverse order -- Key: HBASE-10481 URL: https://issues.apache.org/jira/browse/HBASE-10481 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.98.0, 0.94.16, 0.99.0, 0.96.1.1 Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Priority: Minor Fix For: 0.94.16, 0.98.1, 0.99.0, 0.96.1.1 [~jmhsieh] found an issue when doing a diff between a pre-0.96 branch and a post-0.96 branch. Typically, if the pre-0.96 branch is specified first, and the post-0.96 branch second, the existing logic handles it. When they are given in the reverse order, that case is not handled properly. The fix should address this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
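The "meaningful error message" idea in the comment above could look something like the following. This is a hedged Python sketch rather than the JDiff shell script's actual logic: `parse_version` and `check_argument_order` are hypothetical helpers, and treating the arguments as purely numeric dotted versions (such as 0.94 or 0.96.1.1) is an assumption, so a branch name like "trunk" would need separate handling.

```python
def parse_version(text):
    """Turn a dotted version such as '0.96.1.1' into a tuple of ints for comparison."""
    return tuple(int(part) for part in text.split("."))


def check_argument_order(previous, current):
    """Raise ValueError with a descriptive message if `previous` is newer than `current`."""
    if parse_version(previous) > parse_version(current):
        raise ValueError(
            f"Previous version {previous} is newer than current version {current}; "
            "pass the older version first."
        )


# Accepted: the older branch is given first.
check_argument_order("0.94", "0.96.1.1")
```

Failing fast with an explicit message avoids silently producing the mirror-image report, in which additions would be listed as removals.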
[jira] [Updated] (HBASE-10481) API Compatibility JDiff script does not properly handle arguments in reverse order
[ https://issues.apache.org/jira/browse/HBASE-10481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Shulman updated HBASE-10481: -- Attachment: HBASE-10481-v1.patch Adding v1 of the patch. Fixes the case identified in the jira and also corrects some of the output about where the working directory is. API Compatibility JDiff script does not properly handle arguments in reverse order -- Key: HBASE-10481 URL: https://issues.apache.org/jira/browse/HBASE-10481 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.98.0, 0.94.16, 0.99.0, 0.96.1.1 Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Priority: Minor Fix For: 0.94.16, 0.98.1, 0.99.0, 0.96.1.1 Attachments: HBASE-10481-v1.patch [~jmhsieh] found an issue when doing a diff between a pre-0.96 branch and a post-0.96 branch. Typically, if the pre-0.96 branch is specified first, and the post-0.96 branch second, the existing logic handles it. When they are given in the reverse order, that case is not handled properly. The fix should address this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Work started] (HBASE-10481) API Compatibility JDiff script does not properly handle arguments in reverse order
[ https://issues.apache.org/jira/browse/HBASE-10481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-10481 started by Aleksandr Shulman. API Compatibility JDiff script does not properly handle arguments in reverse order -- Key: HBASE-10481 URL: https://issues.apache.org/jira/browse/HBASE-10481 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.98.0, 0.94.16, 0.99.0, 0.96.1.1 Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Priority: Minor Fix For: 0.94.16, 0.98.1, 0.99.0, 0.96.1.1 Attachments: HBASE-10481-v1.patch [~jmhsieh] found an issue when doing a diff between a pre-0.96 branch and a post-0.96 branch. Typically, if the pre-0.96 branch is specified first, and the post-0.96 branch second, the existing logic handles it. When they are given in the reverse order, that case is not handled properly. The fix should address this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10481) API Compatibility JDiff script does not properly handle arguments in reverse order
[ https://issues.apache.org/jira/browse/HBASE-10481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Shulman updated HBASE-10481: -- Status: Patch Available (was: In Progress) API Compatibility JDiff script does not properly handle arguments in reverse order -- Key: HBASE-10481 URL: https://issues.apache.org/jira/browse/HBASE-10481 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.96.1.1, 0.94.16, 0.98.0, 0.99.0 Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Priority: Minor Fix For: 0.98.1, 0.99.0, 0.96.1.1, 0.94.16 Attachments: HBASE-10481-v1.patch [~jmhsieh] found an issue when doing a diff between a pre-0.96 branch and a post-0.96 branch. Typically, if the pre-0.96 branch is specified first, and the post-0.96 branch second, the existing logic handles it. When they are given in the reverse order, that case is not handled properly. The fix should address this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HBASE-10481) API Compatibility JDiff script does not properly handle arguments in reverse order
Aleksandr Shulman created HBASE-10481: - Summary: API Compatibility JDiff script does not properly handle arguments in reverse order Key: HBASE-10481 URL: https://issues.apache.org/jira/browse/HBASE-10481 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.96.1.1, 0.94.16, 0.98.0, 0.99.0 Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Priority: Minor Fix For: 0.98.0, 0.99.0, 0.96.1.1, 0.94.16 [~jmhsieh] found an issue when doing a diff between a pre-0.96 branch and a post-0.96 branch. Typically, if the pre-0.96 branch is specified first, and the post-0.96 branch second, the existing logic handles it. When they are given in the reverse order, that case is not handled properly. The fix should address this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10264) [MapReduce]: CompactionTool in mapred mode is missing classes in its classpath
[ https://issues.apache.org/jira/browse/HBASE-10264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860747#comment-13860747 ]

Aleksandr Shulman commented on HBASE-10264:
---

+1 - looks good to me as well. Smoke tested it against MRv1 as well.

[MapReduce]: CompactionTool in mapred mode is missing classes in its classpath
--
Key: HBASE-10264
URL: https://issues.apache.org/jira/browse/HBASE-10264
Project: HBase
Issue Type: Bug
Components: Compaction, mapreduce
Affects Versions: 0.98.0, 0.99.0
Reporter: Aleksandr Shulman
Assignee: Himanshu Vashishtha
Attachments: HBase-10264.patch

Calling o.a.h.h.regionserver.CompactionTool fails due to classpath-related issues in both MRv1 and MRv2.

{code}hbase org.apache.hadoop.hbase.regionserver.CompactionTool -mapred -major hdfs://`hostname`:8020/hbase/data/default/orig_1388179858868{code}

Results:

{code}
2013-12-27 13:31:49,478 INFO [main] mapreduce.Job: Task Id : attempt_1388179525649_0011_m_00_2, Status : FAILED
Error: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.TableInfoMissingException
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at org.apache.hadoop.hbase.regionserver.CompactionTool$CompactionWorker.compact(CompactionTool.java:115)
    at org.apache.hadoop.hbase.regionserver.CompactionTool$CompactionMapper.map(CompactionTool.java:231)
    at org.apache.hadoop.hbase.regionserver.CompactionTool$CompactionMapper.map(CompactionTool.java:207)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:165)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1499)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:160)
{code}

-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HBASE-10269) [Nit]: Spelling issue in HFileContext.setCompresssion
Aleksandr Shulman created HBASE-10269: - Summary: [Nit]: Spelling issue in HFileContext.setCompresssion Key: HBASE-10269 URL: https://issues.apache.org/jira/browse/HBASE-10269 Project: HBase Issue Type: Bug Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Priority: Minor HBASE-7544 introduced a misspelling into HFileContext.java: https://github.com/apache/hbase/blob/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileContext.java#L103 The fix is trivial. Will attach. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10269) [Nit]: Spelling issue in HFileContext.setCompresssion
[ https://issues.apache.org/jira/browse/HBASE-10269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Shulman updated HBASE-10269: -- Attachment: HBASE-10269-1.patch It looks like this call was not used anywhere, so the change is a one-liner. [Nit]: Spelling issue in HFileContext.setCompresssion - Key: HBASE-10269 URL: https://issues.apache.org/jira/browse/HBASE-10269 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.98.1, 0.99.0 Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Priority: Minor Attachments: HBASE-10269-1.patch As part of HBase-7544, there was introduced a misspelling into HFileContext.java: https://github.com/apache/hbase/blob/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileContext.java#L103 The fix is trivial. Will attach. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10269) [Nit]: Spelling issue in HFileContext.setCompresssion
[ https://issues.apache.org/jira/browse/HBASE-10269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Shulman updated HBASE-10269: -- Affects Version/s: 0.99.0 0.98.1 0.98.0 [Nit]: Spelling issue in HFileContext.setCompresssion - Key: HBASE-10269 URL: https://issues.apache.org/jira/browse/HBASE-10269 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.98.1, 0.99.0 Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Priority: Minor Attachments: HBASE-10269-1.patch As part of HBase-7544, there was introduced a misspelling into HFileContext.java: https://github.com/apache/hbase/blob/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileContext.java#L103 The fix is trivial. Will attach. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10269) [Nit]: Spelling issue in HFileContext.setCompresssion
[ https://issues.apache.org/jira/browse/HBASE-10269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Shulman updated HBASE-10269: -- Status: Patch Available (was: Open) [Nit]: Spelling issue in HFileContext.setCompresssion - Key: HBASE-10269 URL: https://issues.apache.org/jira/browse/HBASE-10269 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.98.1, 0.99.0 Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Priority: Minor Attachments: HBASE-10269-1.patch As part of HBase-7544, there was introduced a misspelling into HFileContext.java: https://github.com/apache/hbase/blob/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileContext.java#L103 The fix is trivial. Will attach. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10269) [Nit]: Spelling issue in HFileContext.setCompresssion
[ https://issues.apache.org/jira/browse/HBASE-10269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860802#comment-13860802 ] Aleksandr Shulman commented on HBASE-10269: --- Should apply cleanly to both 0.98 and trunk. Not necessary for 0.96. [Nit]: Spelling issue in HFileContext.setCompresssion - Key: HBASE-10269 URL: https://issues.apache.org/jira/browse/HBASE-10269 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.98.1, 0.99.0 Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Priority: Minor Attachments: HBASE-10269-1.patch As part of HBase-7544, there was introduced a misspelling into HFileContext.java: https://github.com/apache/hbase/blob/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileContext.java#L103 The fix is trivial. Will attach. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HBASE-10264) [MapReduce]: CompactionTool in mapred mode is missing classes in its classpath
Aleksandr Shulman created HBASE-10264:

Summary: [MapReduce]: CompactionTool in mapred mode is missing classes in its classpath
Key: HBASE-10264
URL: https://issues.apache.org/jira/browse/HBASE-10264
Project: HBase
Issue Type: Bug
Components: Compaction, mapreduce
Affects Versions: 0.98.0, 0.99.0
Reporter: Aleksandr Shulman
Assignee: Himanshu Vashishtha

Calling o.a.h.h.regionserver.CompactionTool fails due to classpath-related issues in both MRv1 and MRv2.
{code}
hbase org.apache.hadoop.hbase.regionserver.CompactionTool -mapred -major hdfs://`hostname`:8020/hbase/data/default/orig_1388179858868
{code}
Results:
{code}
2013-12-27 13:31:49,478 INFO [main] mapreduce.Job: Task Id : attempt_1388179525649_0011_m_00_2, Status : FAILED
Error: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.TableInfoMissingException
 at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
 at org.apache.hadoop.hbase.regionserver.CompactionTool$CompactionWorker.compact(CompactionTool.java:115)
 at org.apache.hadoop.hbase.regionserver.CompactionTool$CompactionMapper.map(CompactionTool.java:231)
 at org.apache.hadoop.hbase.regionserver.CompactionTool$CompactionMapper.map(CompactionTool.java:207)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:165)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1499)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:160)
{code}
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
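The ClassNotFoundException above suggests that the class is not visible to the task JVM's class loader. As a rough diagnostic only (this is not part of HBase or of the report), a small JDK-only probe can show whether a class is loadable and where it was loaded from. The class name `ClasspathProbe` and its method names are hypothetical.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper; uses only JDK APIs.
class ClasspathProbe {

    /** Returns true if the named class can be loaded (without initializing it). */
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Describes where the named class was loaded from, or reports "not found". */
    static String describe(String className) {
        try {
            Class<?> c = Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // Classes loaded by the bootstrap loader have no code source.
            String where = (src == null) ? "bootstrap/JDK" : String.valueOf(src.getLocation());
            return className + " -> " + where;
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " -> not found";
        }
    }

    public static void main(String[] args) {
        // Defaults to the class named in the stack trace above.
        String[] names = args.length > 0
            ? args
            : new String[] {"org.apache.hadoop.hbase.TableInfoMissingException"};
        for (String name : names) {
            System.out.println(describe(name));
        }
    }
}
```

Running the probe with the same classpath as the failing task would indicate whether the jar that contains the class is visible there.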
[jira] [Created] (HBASE-10251) Restore API Compat for PerformanceEvaluation.generateValue()
Aleksandr Shulman created HBASE-10251:

Summary: Restore API Compat for PerformanceEvaluation.generateValue()
Key: HBASE-10251
URL: https://issues.apache.org/jira/browse/HBASE-10251
Project: HBase
Issue Type: Bug
Components: Client
Affects Versions: 0.98.1
Reporter: Aleksandr Shulman
Assignee: Aleksandr Shulman

Observed: A couple of my client tests fail to compile against trunk because the method PerformanceEvaluation.generateValue was removed as part of HBASE-8496. This is an issue because it was used in a number of places, including unit tests. Since we did not explicitly label this API as private, it's ambiguous as to whether this could/should have been used by people writing apps against 0.96. If they used it, then they would be broken upon upgrade to 0.98 and trunk.

Potential Solution: The method was renamed to generateData, but the logic is still the same. We can reintroduce it as deprecated in 0.98, as a compat shim over generateData. The patch should be a few lines. We may also consider doing so in trunk, but I'd be just as fine with leaving it out.

More generally, this raises the question of what other code is in this grey area, where it is public, is used outside of the package, but is not explicitly labeled with an InterfaceAudience annotation.

-- This message was sent by Atlassian JIRA (v6.1.5#6160)
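The compat shim proposed above could look roughly like the following. This is a minimal sketch, not the actual patch: the class name, both method signatures, and the default length are assumptions made for illustration, and `generateData` here is only a stand-in for the renamed method.

```java
import java.util.Random;

// Hypothetical stand-in for PerformanceEvaluation; signatures are assumed.
class PerfEvalCompatSketch {

    // Assumed default value length, for illustration only.
    static final int DEFAULT_VALUE_LENGTH = 1000;

    /** Stand-in for the renamed method: returns `length` random bytes. */
    static byte[] generateData(Random r, int length) {
        byte[] b = new byte[length];
        r.nextBytes(b);
        return b;
    }

    /**
     * Old entry point kept so that existing callers still compile.
     *
     * @deprecated use {@link #generateData(Random, int)} instead.
     */
    @Deprecated
    static byte[] generateValue(Random r) {
        // Delegate so that the logic lives in one place only.
        return generateData(r, DEFAULT_VALUE_LENGTH);
    }
}
```

Because the old method only delegates, callers compiled against the old name keep working while the deprecation warning points them at the new one.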
[jira] [Updated] (HBASE-10251) Restore API Compat for PerformanceEvaluation.generateValue()
[ https://issues.apache.org/jira/browse/HBASE-10251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksandr Shulman updated HBASE-10251:
Labels: api_compatibility (was: )

Restore API Compat for PerformanceEvaluation.generateValue()
Key: HBASE-10251
URL: https://issues.apache.org/jira/browse/HBASE-10251
Project: HBase
Issue Type: Bug
Components: Client
Affects Versions: 0.98.1
Reporter: Aleksandr Shulman
Assignee: Aleksandr Shulman
Labels: api_compatibility

Observed: A couple of my client tests fail to compile against trunk because the method PerformanceEvaluation.generateValue was removed as part of HBASE-8496. This is an issue because it was used in a number of places, including unit tests. Since we did not explicitly label this API as private, it's ambiguous as to whether this could/should have been used by people writing apps against 0.96. If they used it, then they would be broken upon upgrade to 0.98 and trunk.

Potential Solution: The method was renamed to generateData, but the logic is still the same. We can reintroduce it as deprecated in 0.98, as a compat shim over generateData. The patch should be a few lines. We may also consider doing so in trunk, but I'd be just as fine with leaving it out.

More generally, this raises the question of what other code is in this grey area, where it is public, is used outside of the package, but is not explicitly labeled with an InterfaceAudience annotation.

-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10251) Restore API Compat for PerformanceEvaluation.generateValue()
[ https://issues.apache.org/jira/browse/HBASE-10251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Shulman updated HBASE-10251: -- Description: Observed: A couple of my client tests fail to compile against trunk because the method PerformanceEvaluation.generateValue was removed as part of HBASE-8496. This is an issue because it was used in a number of places, including unit tests. Since we did not explicitly label this API as private, it's ambiguous as to whether this could/should have been used by people writing apps against 0.96. If they used it, then they would be broken upon upgrade to 0.98 and trunk. Potential Solution: The method was renamed to generateData, but the logic is still the same. We can reintroduce it as deprecated in 0.98, as compat shim over generateData. The patch should be a few lines. We may also consider doing so in trunk, but I'd be just as fine with leaving it out. More generally, this raises the question about what other code is in this grey-area, where it is public, is used outside of the package, but is not explicitly labeled with an AudienceInterface. was: Observed: A couple of my client tests fail to compile against trunk because the method PerformanceEvaluation.generateValue was removed as part of HBASE-8496. This is an issue because is was used in a number of places, including unit tests. Since we did not explicitly label this API as private, it's ambiguous as to whether this could/should have been used by people writing apps against 0.96. If they used it, then they would be broken upon upgrade to 0.98 and trunk. Potential Solution: The method was renamed to generateData, but the logic is still the same. We can reintroduce it as deprecated in 0.98, as compat shim over generateData. The patch should be a few lines. We may also consider doing so in trunk, but I'd be just as fine with leaving it out. 
More generally, this raises the question about what other code is in this grey-area, where it is public, is used outside of the package, but is not explicitly labeled with an AudienceInterface. Restore API Compat for PerformanceEvaluation.generateValue() Key: HBASE-10251 URL: https://issues.apache.org/jira/browse/HBASE-10251 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.98.1 Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Labels: api_compatibility Observed: A couple of my client tests fail to compile against trunk because the method PerformanceEvaluation.generateValue was removed as part of HBASE-8496. This is an issue because it was used in a number of places, including unit tests. Since we did not explicitly label this API as private, it's ambiguous as to whether this could/should have been used by people writing apps against 0.96. If they used it, then they would be broken upon upgrade to 0.98 and trunk. Potential Solution: The method was renamed to generateData, but the logic is still the same. We can reintroduce it as deprecated in 0.98, as compat shim over generateData. The patch should be a few lines. We may also consider doing so in trunk, but I'd be just as fine with leaving it out. More generally, this raises the question about what other code is in this grey-area, where it is public, is used outside of the package, but is not explicitly labeled with an AudienceInterface. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10184) [Online Schema Change]: Add additional tests for online schema change
[ https://issues.apache.org/jira/browse/HBASE-10184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13856465#comment-13856465 ]

Aleksandr Shulman commented on HBASE-10184:

Review available here: https://reviews.apache.org/r/16457/

[Online Schema Change]: Add additional tests for online schema change
Key: HBASE-10184
URL: https://issues.apache.org/jira/browse/HBASE-10184
Project: HBase
Issue Type: Task
Components: test
Affects Versions: 0.96.1, 0.98.1, 0.99.0
Reporter: Aleksandr Shulman
Assignee: Aleksandr Shulman
Labels: online_schema_change
Attachments: HBASE-10184-trunk.diff

There are some gaps in testing for Online Schema Change:
Examples of some tests that should be added:
1. Splits with online schema change
2. Merge during online schema change
3. MR over HBase during online schema change
4. Bulk Load during online schema change
5. Online change table owner
6. Online Replication scope change
7. Online Bloom Filter change
8. Snapshots during online schema change (HBASE-10136)

-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10194) [Usability]: Instructions in CompactionTool no longer accurate because of namespaces
[ https://issues.apache.org/jira/browse/HBASE-10194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksandr Shulman updated HBASE-10194:
Attachment: HBASE-10194-trunk.patch

Attaching patch. Should apply cleanly for everything after 0.94.

[Usability]: Instructions in CompactionTool no longer accurate because of namespaces
Key: HBASE-10194
URL: https://issues.apache.org/jira/browse/HBASE-10194
Project: HBase
Issue Type: Bug
Components: Compaction, util
Affects Versions: 0.96.2, 0.98.1, 0.99.0
Reporter: Aleksandr Shulman
Assignee: Aleksandr Shulman
Priority: Minor
Attachments: HBASE-10194-trunk.patch

Observed Behavior: The instructions for org.apache.hadoop.hbase.regionserver.CompactionTool suggest using the pre-0.95 HBase path format:
{code}
Examples:
 To compact the full 'TestTable' using MapReduce:
 $ bin/hbase org.apache.hadoop.hbase.regionserver.CompactionTool -mapred hdfs:///hbase/TestTable
{code}
Expected behavior: The instructions should now take namespaces into account, for example:
{code}
hdfs:///hbase/data/default/TestTable
{code}
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
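The path change described above, from `<root>/<table>` to `<root>/data/<namespace>/<table>`, can be sketched as a small helper. The class and method names are hypothetical; only the two example paths come from the report, and the sketch does plain string handling rather than using any HBase API.

```java
// Hypothetical helper; illustrates the namespace-aware table path only.
class NamespacePathSketch {

    /** Builds <root>/data/<namespace>/<table>, tolerating one trailing slash on the root. */
    static String tableDir(String rootDir, String namespace, String table) {
        String root = rootDir.endsWith("/")
            ? rootDir.substring(0, rootDir.length() - 1)
            : rootDir;
        return root + "/data/" + namespace + "/" + table;
    }

    /** Tables created without an explicit namespace live under "default". */
    static String tableDir(String rootDir, String table) {
        return tableDir(rootDir, "default", table);
    }

    public static void main(String[] args) {
        System.out.println(tableDir("hdfs:///hbase", "TestTable"));
    }
}
```

With the root `hdfs:///hbase` and table `TestTable`, the helper yields the namespace-aware path shown in the expected behavior above.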
[jira] [Updated] (HBASE-10184) [Online Schema Change]: Add additional tests for online schema change
[ https://issues.apache.org/jira/browse/HBASE-10184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksandr Shulman updated HBASE-10184:
Attachment: HBASE-10184-trunk.diff

First draft of patch against trunk. All tests pass.

[Online Schema Change]: Add additional tests for online schema change
Key: HBASE-10184
URL: https://issues.apache.org/jira/browse/HBASE-10184
Project: HBase
Issue Type: Task
Components: test
Affects Versions: 0.96.1, 0.98.1, 0.99.0
Reporter: Aleksandr Shulman
Assignee: Aleksandr Shulman
Labels: online_schema_change
Attachments: HBASE-10184-trunk.diff

There are some gaps in testing for Online Schema Change:
Examples of some tests that should be added:
1. Splits with online schema change
2. Merge during online schema change
3. MR over HBase during online schema change
4. Bulk Load during online schema change
5. Online change table owner
6. Online Replication scope change
7. Online Bloom Filter change
8. Snapshots during online schema change (HBASE-10136)

-- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Created] (HBASE-10194) [Usability]: Instructions in CompactionTool no longer accurate because of namespaces
Aleksandr Shulman created HBASE-10194:

Summary: [Usability]: Instructions in CompactionTool no longer accurate because of namespaces
Key: HBASE-10194
URL: https://issues.apache.org/jira/browse/HBASE-10194
Project: HBase
Issue Type: Bug
Components: Compaction, util
Affects Versions: 0.96.2, 0.98.1, 0.99.0
Reporter: Aleksandr Shulman
Assignee: Aleksandr Shulman
Priority: Minor

Observed Behavior: The instructions for org.apache.hadoop.hbase.regionserver.CompactionTool suggest using the pre-0.95 HBase path format:
{code}
Examples:
 To compact the full 'TestTable' using MapReduce:
 $ bin/hbase org.apache.hadoop.hbase.regionserver.CompactionTool -mapred hdfs:///hbase/TestTable
{code}
Expected behavior: The instructions should now take namespaces into account, for example:
{code}
hdfs:///hbase/data/default/TestTable
{code}
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Created] (HBASE-10184) [Online Schema Change]: Add additional tests for online schema change
Aleksandr Shulman created HBASE-10184:

Summary: [Online Schema Change]: Add additional tests for online schema change
Key: HBASE-10184
URL: https://issues.apache.org/jira/browse/HBASE-10184
Project: HBase
Issue Type: Task
Components: test
Affects Versions: 0.96.1, 0.98.1, 0.99.0
Reporter: Aleksandr Shulman
Assignee: Aleksandr Shulman

There are some gaps in testing for Online Schema Change:
Examples of some tests that should be added:
1. Splits with online schema change
2. Merge during online schema change
3. MR over HBase during online schema change
4. Bulk Load during online schema change
5. Online change table owner
6. Online Replication scope change
7. Online Bloom Filter change
8. Snapshots during online schema change (HBASE-10136)

-- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HBASE-10136) Alter table conflicts with concurrent snapshot attempt on that table
[ https://issues.apache.org/jira/browse/HBASE-10136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Shulman updated HBASE-10136: -- Attachment: HBASE-10136-trunk.patch Adding a patch for a test that exposes this issue. Test should pass once this issue is resolved. Alter table conflicts with concurrent snapshot attempt on that table Key: HBASE-10136 URL: https://issues.apache.org/jira/browse/HBASE-10136 Project: HBase Issue Type: Bug Components: snapshots Affects Versions: 0.96.0, 0.98.1, 0.99.0 Reporter: Aleksandr Shulman Assignee: Matteo Bertozzi Labels: online_schema_change Attachments: HBASE-10136-trunk.patch Expected behavior: A user can issue a request for a snapshot of a table while that table is undergoing an online schema change and expect that snapshot request to complete correctly. Also, the same is true if a user issues an online schema change request while a snapshot attempt is ongoing. Observed behavior: Snapshot attempts time out when there is an ongoing online schema change because the region is closed and opened during the snapshot. As a side-note, I would expect that the attempt should fail quickly as opposed to timing out. Further, what I have seen is that subsequent attempts to snapshot the table fail because of some state/cleanup issues. This is also concerning. Immediate error: {code}type=FLUSH }' is still in progress! 2013-12-11 15:58:32,883 DEBUG [Thread-385] client.HBaseAdmin(2696): (#11) Sleeping: 1ms while waiting for snapshot completion. 2013-12-11 15:58:42,884 DEBUG [Thread-385] client.HBaseAdmin(2704): Getting current status of snapshot from master...
2013-12-11 15:58:42,887 DEBUG [FifoRpcScheduler.handler1-thread-3] master.HMaster(2891): Checking to see if snapshot from request:{ ss=snapshot0 table=changeSchemaDuringSnapshot1386806258640 type=FLUSH } is done 2013-12-11 15:58:42,887 DEBUG [FifoRpcScheduler.handler1-thread-3] snapshot.SnapshotManager(374): Snapshoting '{ ss=snapshot0 table=changeSchemaDuringSnapshot1386806258640 type=FLUSH }' is still in progress! Snapshot failure occurred org.apache.hadoop.hbase.snapshot.SnapshotCreationException: Snapshot 'snapshot0' wasn't completed in expectedTime:6 ms at org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2713) at org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2638) at org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2602) at org.apache.hadoop.hbase.client.TestAdmin$BackgroundSnapshotThread.run(TestAdmin.java:1974){code} Likely root cause of error: {code}Exception in SnapshotSubprocedurePool java.util.concurrent.ExecutionException: org.apache.hadoop.hbase.NotServingRegionException: changeSchemaDuringSnapshot1386806258640,,1386806258720.ea776db51749e39c956d771a7d17a0f3. 
is closing at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222) at java.util.concurrent.FutureTask.get(FutureTask.java:83) at org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool.waitForOutstandingTasks(RegionServerSnapshotManager.java:314) at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.flushSnapshot(FlushSnapshotSubprocedure.java:118) at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.insideBarrier(FlushSnapshotSubprocedure.java:137) at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:181) at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:1) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: org.apache.hadoop.hbase.NotServingRegionException: changeSchemaDuringSnapshot1386806258640,,1386806258720.ea776db51749e39c956d771a7d17a0f3. is closing at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:5327) at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:5289) at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:79) at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:1) at
[jira] [Commented] (HBASE-10184) [Online Schema Change]: Add additional tests for online schema change
[ https://issues.apache.org/jira/browse/HBASE-10184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850213#comment-13850213 ]

Aleksandr Shulman commented on HBASE-10184:

Thanks [~xieliang007]. I thought through some of the things that could happen simultaneously and be compromised by this operation. I have some test cases locally for some of these already that seem to pass. If you have any others you'd like to suggest, let me know :)

[Online Schema Change]: Add additional tests for online schema change
Key: HBASE-10184
URL: https://issues.apache.org/jira/browse/HBASE-10184
Project: HBase
Issue Type: Task
Components: test
Affects Versions: 0.96.1, 0.98.1, 0.99.0
Reporter: Aleksandr Shulman
Assignee: Aleksandr Shulman
Labels: online_schema_change

There are some gaps in testing for Online Schema Change:
Examples of some tests that should be added:
1. Splits with online schema change
2. Merge during online schema change
3. MR over HBase during online schema change
4. Bulk Load during online schema change
5. Online change table owner
6. Online Replication scope change
7. Online Bloom Filter change
8. Snapshots during online schema change (HBASE-10136)

-- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Created] (HBASE-10136) [Online Schema Change]: Online Schema Change on a table conflicts with snapshot attempt on the table
Aleksandr Shulman created HBASE-10136: - Summary: [Online Schema Change]: Online Schema Change on a table conflicts with snapshot attempt on the table Key: HBASE-10136 URL: https://issues.apache.org/jira/browse/HBASE-10136 Project: HBase Issue Type: Bug Components: snapshots Affects Versions: 0.96.0, 0.98.1, 0.99.0 Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Expected behavior: A user can take a snapshot of a table while that table is undergoing an online schema change. Observed behavior: Snapshot attempts time out when there is an ongoing online schema change because the region is closed and opened during the snapshot. As a side-note, I would expect that the attempt should fail quickly as opposed to timing out. Further, what I have seen is that subsequent attempts to snapshot the table fail because of some state/cleanup issues. This is also concerning. Immediate error: {code}type=FLUSH }' is still in progress! 2013-12-11 15:58:32,883 DEBUG [Thread-385] client.HBaseAdmin(2696): (#11) Sleeping: 1ms while waiting for snapshot completion. 2013-12-11 15:58:42,884 DEBUG [Thread-385] client.HBaseAdmin(2704): Getting current status of snapshot from master... 2013-12-11 15:58:42,887 DEBUG [FifoRpcScheduler.handler1-thread-3] master.HMaster(2891): Checking to see if snapshot from request:{ ss=snapshot0 table=changeSchemaDuringSnapshot1386806258640 type=FLUSH } is done 2013-12-11 15:58:42,887 DEBUG [FifoRpcScheduler.handler1-thread-3] snapshot.SnapshotManager(374): Snapshoting '{ ss=snapshot0 table=changeSchemaDuringSnapshot1386806258640 type=FLUSH }' is still in progress! 
Snapshot failure occurred org.apache.hadoop.hbase.snapshot.SnapshotCreationException: Snapshot 'snapshot0' wasn't completed in expectedTime:6 ms at org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2713) at org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2638) at org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2602) at org.apache.hadoop.hbase.client.TestAdmin$BackgroundSnapshotThread.run(TestAdmin.java:1974){code} Likely root cause of error: {code}Exception in SnapshotSubprocedurePool java.util.concurrent.ExecutionException: org.apache.hadoop.hbase.NotServingRegionException: changeSchemaDuringSnapshot1386806258640,,1386806258720.ea776db51749e39c956d771a7d17a0f3. is closing at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222) at java.util.concurrent.FutureTask.get(FutureTask.java:83) at org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool.waitForOutstandingTasks(RegionServerSnapshotManager.java:314) at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.flushSnapshot(FlushSnapshotSubprocedure.java:118) at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.insideBarrier(FlushSnapshotSubprocedure.java:137) at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:181) at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:1) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: org.apache.hadoop.hbase.NotServingRegionException: changeSchemaDuringSnapshot1386806258640,,1386806258720.ea776db51749e39c956d771a7d17a0f3. 
is closing at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:5327) at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:5289) at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:79) at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:1) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) ... 5 more{code} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
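The side-note above asks for the snapshot attempt to fail quickly rather than time out. A minimal sketch of that expectation, assuming a hypothetical status source that can report a terminal failure: the wait loop stops as soon as the status is FAILED and only falls back to the deadline while the status stays IN_PROGRESS. `SnapshotWaitSketch`, its `Status` enum, and the method signature are illustrative and do not reflect how HBaseAdmin is implemented.

```java
import java.util.function.Supplier;

// Hypothetical polling loop; not HBaseAdmin's implementation.
class SnapshotWaitSketch {

    enum Status { IN_PROGRESS, DONE, FAILED }

    /**
     * Polls until the status is DONE and returns the number of polls made.
     * Throws immediately on FAILED, and throws once maxWaitMs has elapsed
     * if the status is still IN_PROGRESS.
     */
    static int waitForCompletion(Supplier<Status> poll, long maxWaitMs, long pauseMs) {
        long deadline = System.currentTimeMillis() + maxWaitMs;
        int polls = 0;
        while (true) {
            Status status = poll.get();
            polls++;
            if (status == Status.DONE) {
                return polls;
            }
            if (status == Status.FAILED) {
                // Fail fast: a terminal error should not wait for the deadline.
                throw new IllegalStateException("snapshot failed after " + polls + " poll(s)");
            }
            if (System.currentTimeMillis() >= deadline) {
                throw new IllegalStateException("snapshot timed out after " + maxWaitMs + " ms");
            }
            try {
                Thread.sleep(pauseMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for snapshot", e);
            }
        }
    }
}
```

For this to help in the reported scenario, the server side would have to surface the region-closing error as a terminal status instead of leaving the snapshot "in progress"; whether it can do so is not established by the report.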
[jira] [Updated] (HBASE-10136) [Online Schema Change]: Online Schema Change on a table conflicts with snapshot attempt on the table
[ https://issues.apache.org/jira/browse/HBASE-10136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Shulman updated HBASE-10136: -- Labels: online_schema_change (was: ) [Online Schema Change]: Online Schema Change on a table conflicts with snapshot attempt on the table Key: HBASE-10136 URL: https://issues.apache.org/jira/browse/HBASE-10136 Project: HBase Issue Type: Bug Components: snapshots Affects Versions: 0.96.0, 0.98.1, 0.99.0 Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Labels: online_schema_change Expected behavior: A user can take a snapshot of a table while that table is undergoing an online schema change. Observed behavior: Snapshot attempts time out when there is an ongoing online schema change because the region is closed and opened during the snapshot. As a side-note, I would expect that the attempt should fail quickly as opposed to timing out. Further, what I have seen is that subsequent attempts to snapshot the table fail because of some state/cleanup issues. This is also concerning. Immediate error: {code}type=FLUSH }' is still in progress! 2013-12-11 15:58:32,883 DEBUG [Thread-385] client.HBaseAdmin(2696): (#11) Sleeping: 1ms while waiting for snapshot completion. 2013-12-11 15:58:42,884 DEBUG [Thread-385] client.HBaseAdmin(2704): Getting current status of snapshot from master... 2013-12-11 15:58:42,887 DEBUG [FifoRpcScheduler.handler1-thread-3] master.HMaster(2891): Checking to see if snapshot from request:{ ss=snapshot0 table=changeSchemaDuringSnapshot1386806258640 type=FLUSH } is done 2013-12-11 15:58:42,887 DEBUG [FifoRpcScheduler.handler1-thread-3] snapshot.SnapshotManager(374): Snapshoting '{ ss=snapshot0 table=changeSchemaDuringSnapshot1386806258640 type=FLUSH }' is still in progress! 
Snapshot failure occurred org.apache.hadoop.hbase.snapshot.SnapshotCreationException: Snapshot 'snapshot0' wasn't completed in expectedTime:6 ms at org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2713) at org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2638) at org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2602) at org.apache.hadoop.hbase.client.TestAdmin$BackgroundSnapshotThread.run(TestAdmin.java:1974){code} Likely root cause of error: {code}Exception in SnapshotSubprocedurePool java.util.concurrent.ExecutionException: org.apache.hadoop.hbase.NotServingRegionException: changeSchemaDuringSnapshot1386806258640,,1386806258720.ea776db51749e39c956d771a7d17a0f3. is closing at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222) at java.util.concurrent.FutureTask.get(FutureTask.java:83) at org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool.waitForOutstandingTasks(RegionServerSnapshotManager.java:314) at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.flushSnapshot(FlushSnapshotSubprocedure.java:118) at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.insideBarrier(FlushSnapshotSubprocedure.java:137) at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:181) at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:1) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: org.apache.hadoop.hbase.NotServingRegionException: changeSchemaDuringSnapshot1386806258640,,1386806258720.ea776db51749e39c956d771a7d17a0f3. 
is closing at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:5327) at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:5289) at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:79) at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:1) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) ... 5 more{code} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Assigned] (HBASE-10136) [Online Schema Change]: Online Schema Change on a table conflicts with snapshot attempt on the table
[ https://issues.apache.org/jira/browse/HBASE-10136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Shulman reassigned HBASE-10136: - Assignee: Matteo Bertozzi (was: Aleksandr Shulman) [Online Schema Change]: Online Schema Change on a table conflicts with snapshot attempt on the table Key: HBASE-10136 URL: https://issues.apache.org/jira/browse/HBASE-10136 Project: HBase Issue Type: Bug Components: snapshots Affects Versions: 0.96.0, 0.98.1, 0.99.0 Reporter: Aleksandr Shulman Assignee: Matteo Bertozzi Labels: online_schema_change Expected behavior: A user can take a snapshot of a table while that table is undergoing an online schema change. Observed behavior: Snapshot attempts time out when there is an ongoing online schema change because the region is closed and opened during the snapshot. As a side-note, I would expect that the attempt should fail quickly as opposed to timing out. Further, what I have seen is that subsequent attempts to snapshot the table fail because of some state/cleanup issues. This is also concerning. Immediate error: {code}type=FLUSH }' is still in progress! 2013-12-11 15:58:32,883 DEBUG [Thread-385] client.HBaseAdmin(2696): (#11) Sleeping: 1ms while waiting for snapshot completion. 2013-12-11 15:58:42,884 DEBUG [Thread-385] client.HBaseAdmin(2704): Getting current status of snapshot from master... 2013-12-11 15:58:42,887 DEBUG [FifoRpcScheduler.handler1-thread-3] master.HMaster(2891): Checking to see if snapshot from request:{ ss=snapshot0 table=changeSchemaDuringSnapshot1386806258640 type=FLUSH } is done 2013-12-11 15:58:42,887 DEBUG [FifoRpcScheduler.handler1-thread-3] snapshot.SnapshotManager(374): Snapshoting '{ ss=snapshot0 table=changeSchemaDuringSnapshot1386806258640 type=FLUSH }' is still in progress! 
Snapshot failure occurred org.apache.hadoop.hbase.snapshot.SnapshotCreationException: Snapshot 'snapshot0' wasn't completed in expectedTime:6 ms at org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2713) at org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2638) at org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2602) at org.apache.hadoop.hbase.client.TestAdmin$BackgroundSnapshotThread.run(TestAdmin.java:1974){code} Likely root cause of error: {code}Exception in SnapshotSubprocedurePool java.util.concurrent.ExecutionException: org.apache.hadoop.hbase.NotServingRegionException: changeSchemaDuringSnapshot1386806258640,,1386806258720.ea776db51749e39c956d771a7d17a0f3. is closing at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222) at java.util.concurrent.FutureTask.get(FutureTask.java:83) at org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool.waitForOutstandingTasks(RegionServerSnapshotManager.java:314) at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.flushSnapshot(FlushSnapshotSubprocedure.java:118) at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.insideBarrier(FlushSnapshotSubprocedure.java:137) at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:181) at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:1) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: org.apache.hadoop.hbase.NotServingRegionException: changeSchemaDuringSnapshot1386806258640,,1386806258720.ea776db51749e39c956d771a7d17a0f3. 
is closing at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:5327) at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:5289) at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:79) at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:1) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) ... 5 more{code} -- This message was sent by Atlassian JIRA
[jira] [Updated] (HBASE-10136) [Online Schema Change]: Online Schema Change on a table conflicts with concurrent snapshot attempt on the table
[ https://issues.apache.org/jira/browse/HBASE-10136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Shulman updated HBASE-10136: -- Summary: [Online Schema Change]: Online Schema Change on a table conflicts with concurrent snapshot attempt on the table (was: [Online Schema Change]: Online Schema Change on a table conflicts with snapshot attempt on the table) [Online Schema Change]: Online Schema Change on a table conflicts with concurrent snapshot attempt on the table --- Key: HBASE-10136 URL: https://issues.apache.org/jira/browse/HBASE-10136 Project: HBase Issue Type: Bug Components: snapshots Affects Versions: 0.96.0, 0.98.1, 0.99.0 Reporter: Aleksandr Shulman Assignee: Matteo Bertozzi Labels: online_schema_change Expected behavior: A user can take a snapshot of a table while that table is undergoing an online schema change. Observed behavior: Snapshot attempts time out when there is an ongoing online schema change because the region is closed and opened during the snapshot. As a side-note, I would expect that the attempt should fail quickly as opposed to timing out. Further, what I have seen is that subsequent attempts to snapshot the table fail because of some state/cleanup issues. This is also concerning. Immediate error: {code}type=FLUSH }' is still in progress! 2013-12-11 15:58:32,883 DEBUG [Thread-385] client.HBaseAdmin(2696): (#11) Sleeping: 1ms while waiting for snapshot completion. 2013-12-11 15:58:42,884 DEBUG [Thread-385] client.HBaseAdmin(2704): Getting current status of snapshot from master... 2013-12-11 15:58:42,887 DEBUG [FifoRpcScheduler.handler1-thread-3] master.HMaster(2891): Checking to see if snapshot from request:{ ss=snapshot0 table=changeSchemaDuringSnapshot1386806258640 type=FLUSH } is done 2013-12-11 15:58:42,887 DEBUG [FifoRpcScheduler.handler1-thread-3] snapshot.SnapshotManager(374): Snapshoting '{ ss=snapshot0 table=changeSchemaDuringSnapshot1386806258640 type=FLUSH }' is still in progress! 
Snapshot failure occurred org.apache.hadoop.hbase.snapshot.SnapshotCreationException: Snapshot 'snapshot0' wasn't completed in expectedTime:6 ms at org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2713) at org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2638) at org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2602) at org.apache.hadoop.hbase.client.TestAdmin$BackgroundSnapshotThread.run(TestAdmin.java:1974){code} Likely root cause of error: {code}Exception in SnapshotSubprocedurePool java.util.concurrent.ExecutionException: org.apache.hadoop.hbase.NotServingRegionException: changeSchemaDuringSnapshot1386806258640,,1386806258720.ea776db51749e39c956d771a7d17a0f3. is closing at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222) at java.util.concurrent.FutureTask.get(FutureTask.java:83) at org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool.waitForOutstandingTasks(RegionServerSnapshotManager.java:314) at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.flushSnapshot(FlushSnapshotSubprocedure.java:118) at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.insideBarrier(FlushSnapshotSubprocedure.java:137) at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:181) at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:1) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: org.apache.hadoop.hbase.NotServingRegionException: changeSchemaDuringSnapshot1386806258640,,1386806258720.ea776db51749e39c956d771a7d17a0f3. 
is closing at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:5327) at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:5289) at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:79) at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:1) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at
[jira] [Commented] (HBASE-10136) [Online Schema Change]: Online Schema Change on a table conflicts with concurrent snapshot attempt on the table
[ https://issues.apache.org/jira/browse/HBASE-10136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845934#comment-13845934 ] Aleksandr Shulman commented on HBASE-10136: --- A potential solution might be table locking. With a table lock, we would expect the modifyTable to wait for the snapshot to complete, or the snapshot to wait for the modifyTable to complete.
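The table-locking idea from the comment can be illustrated with a minimal sketch. It assumes one exclusive lock per table name held in process memory; the class `TableLockSketch` and its methods are hypothetical names for illustration and are not the HBase table-lock API. A real cluster would need a distributed lock, which this sketch does not attempt.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: one exclusive, in-process lock per table name.
// This is NOT the HBase table-lock API.
final class TableLockSketch {
    private final ConcurrentMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    // Runs op while holding the table's lock; a concurrent caller for the
    // same table blocks until the first one finishes.
    void withTableLock(String table, Runnable op) {
        ReentrantLock lock = locks.computeIfAbsent(table, t -> new ReentrantLock(true));
        lock.lock();
        try {
            op.run();
        } finally {
            lock.unlock();
        }
    }

    // Simulates a snapshot and a modifyTable racing on the same table and
    // returns the largest number of operations ever inside the lock at once.
    static int maxConcurrentHolders() {
        TableLockSketch sketch = new TableLockSketch();
        AtomicInteger inside = new AtomicInteger();
        AtomicInteger max = new AtomicInteger();
        Runnable op = () -> {
            max.accumulateAndGet(inside.incrementAndGet(), Math::max);
            try {
                Thread.sleep(20); // stand-in for the real work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            inside.decrementAndGet();
        };
        Thread snapshot = new Thread(() -> sketch.withTableLock("t1", op), "snapshot");
        Thread modifyTable = new Thread(() -> sketch.withTableLock("t1", op), "modifyTable");
        snapshot.start();
        modifyTable.start();
        try {
            snapshot.join();
            modifyTable.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return max.get();
    }
}
```

Because both simulated operations take the lock for the same table, `TableLockSketch.maxConcurrentHolders()` should report that they never overlap: whichever starts second waits for the first to finish instead of hitting a region that is closing.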
[jira] [Commented] (HBASE-10136) [Online Schema Change]: Online Schema Change on a table conflicts with concurrent snapshot attempt on the table
[ https://issues.apache.org/jira/browse/HBASE-10136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845945#comment-13845945 ] Aleksandr Shulman commented on HBASE-10136: --- Sorry, yes, I should have phrased it as: {quote} A user can issue a request for a snapshot of a table while that table is undergoing an online schema change and expect that snapshot request to complete correctly. Also, the same is true if a user issues an online schema change request while a snapshot attempt is ongoing.{quote}
[jira] [Updated] (HBASE-10136) [Online Schema Change]: Online Schema Change on a table conflicts with concurrent snapshot attempt on the table
[ https://issues.apache.org/jira/browse/HBASE-10136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Shulman updated HBASE-10136: -- Description: Expected behavior: A user can issue a request for a snapshot of a table while that table is undergoing an online schema change and expect that snapshot request to complete correctly. Also, the same is true if a user issues an online schema change request while a snapshot attempt is ongoing. Observed behavior: Snapshot attempts time out when there is an ongoing online schema change because the region is closed and opened during the snapshot. As a side-note, I would expect that the attempt should fail quickly as opposed to timing out. Further, what I have seen is that subsequent attempts to snapshot the table fail because of some state/cleanup issues. This is also concerning. Immediate error: {code}type=FLUSH }' is still in progress! 2013-12-11 15:58:32,883 DEBUG [Thread-385] client.HBaseAdmin(2696): (#11) Sleeping: 1ms while waiting for snapshot completion. 2013-12-11 15:58:42,884 DEBUG [Thread-385] client.HBaseAdmin(2704): Getting current status of snapshot from master... 2013-12-11 15:58:42,887 DEBUG [FifoRpcScheduler.handler1-thread-3] master.HMaster(2891): Checking to see if snapshot from request:{ ss=snapshot0 table=changeSchemaDuringSnapshot1386806258640 type=FLUSH } is done 2013-12-11 15:58:42,887 DEBUG [FifoRpcScheduler.handler1-thread-3] snapshot.SnapshotManager(374): Snapshoting '{ ss=snapshot0 table=changeSchemaDuringSnapshot1386806258640 type=FLUSH }' is still in progress! 
Snapshot failure occurred org.apache.hadoop.hbase.snapshot.SnapshotCreationException: Snapshot 'snapshot0' wasn't completed in expectedTime:6 ms at org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2713) at org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2638) at org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2602) at org.apache.hadoop.hbase.client.TestAdmin$BackgroundSnapshotThread.run(TestAdmin.java:1974){code} Likely root cause of error: {code}Exception in SnapshotSubprocedurePool java.util.concurrent.ExecutionException: org.apache.hadoop.hbase.NotServingRegionException: changeSchemaDuringSnapshot1386806258640,,1386806258720.ea776db51749e39c956d771a7d17a0f3. is closing at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222) at java.util.concurrent.FutureTask.get(FutureTask.java:83) at org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool.waitForOutstandingTasks(RegionServerSnapshotManager.java:314) at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.flushSnapshot(FlushSnapshotSubprocedure.java:118) at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.insideBarrier(FlushSnapshotSubprocedure.java:137) at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:181) at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:1) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: org.apache.hadoop.hbase.NotServingRegionException: changeSchemaDuringSnapshot1386806258640,,1386806258720.ea776db51749e39c956d771a7d17a0f3. 
is closing at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:5327) at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:5289) at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:79) at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:1) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) ... 5 more{code} was: Expected behavior: A user can take a snapshot of a table while that table is undergoing an online schema change. Observed behavior: Snapshot attempts time out when there is an ongoing online schema change because the region is closed and opened during the snapshot. As a side-note, I would expect that the attempt should fail quickly as opposed to timing out. Further, what I have seen is that subsequent attempts to snapshot the table fail because of some state/cleanup issues. This is
[jira] [Updated] (HBASE-10136) Alter table conflicts with concurrent snapshot attempt on that table
[ https://issues.apache.org/jira/browse/HBASE-10136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Shulman updated HBASE-10136: -- Summary: Alter table conflicts with concurrent snapshot attempt on that table (was: [Online Schema Change]: Online Schema Change on a table conflicts with concurrent snapshot attempt on the table)
[jira] [Commented] (HBASE-9966) Create IntegrationTest for Online Bloom Filter Change
[ https://issues.apache.org/jira/browse/HBASE-9966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844770#comment-13844770 ] Aleksandr Shulman commented on HBASE-9966: -- Thanks Andrew! Create IntegrationTest for Online Bloom Filter Change - Key: HBASE-9966 URL: https://issues.apache.org/jira/browse/HBASE-9966 Project: HBase Issue Type: Sub-task Components: HFile, test Affects Versions: 0.98.0, 0.96.1 Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Labels: online_schema_change Fix For: 0.96.1, 0.98.1, 0.99.0 Attachments: HBASE-9966-96.patch, HBASE-9966-98.patch, HBASE-9966-trunk.patch For online schema change, a user is perfectly within her rights to modify the compression algorithm used, or the bloom filter. Therefore, we should add these actions to our ChaosMonkey tests to ensure that they do not introduce instability. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-9966) Create IntegrationTest for Online Bloom Filter and Compression Algorithm Change
[ https://issues.apache.org/jira/browse/HBASE-9966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843699#comment-13843699 ] Aleksandr Shulman commented on HBASE-9966: -- Thanks for taking a look. Would you like me to create a formal code review or is this enough? Also, I may be adding some more online schema change monkeys, but I'll file a separate jira for that.
[jira] [Updated] (HBASE-9966) Create IntegrationTest for Online Bloom Filter and Compression Algorithm Change
[ https://issues.apache.org/jira/browse/HBASE-9966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Shulman updated HBASE-9966: - Fix Version/s: (was: 0.95.2) 0.99.0 0.98.1 0.96.1
[jira] [Updated] (HBASE-9966) Create IntegrationTest for Online Bloom Filter and Compression Algorithm Change
[ https://issues.apache.org/jira/browse/HBASE-9966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Shulman updated HBASE-9966: - Attachment: HBASE-9966-96.patch HBASE-9966-98.patch HBASE-9966-trunk.patch The patch ends up being the same for trunk, 0.98, and 0.96.
[jira] [Updated] (HBASE-9966) Create IntegrationTest for Online Bloom Filter and Compression Algorithm Change
[ https://issues.apache.org/jira/browse/HBASE-9966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Shulman updated HBASE-9966: - Labels: online_schema_change (was: )
[jira] [Updated] (HBASE-10073) [Hadoop1]: hbase zkcli broken due to slf4j incompatibility
[ https://issues.apache.org/jira/browse/HBASE-10073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Shulman updated HBASE-10073: -- Assignee: Andrew Purtell (was: Matteo Bertozzi) [Hadoop1]: hbase zkcli broken due to slf4j incompatibility -- Key: HBASE-10073 URL: https://issues.apache.org/jira/browse/HBASE-10073 Project: HBase Issue Type: Bug Components: Zookeeper Affects Versions: 0.96.1 Environment: Centos6, sun-jdk-64bit-1.7.0.25 Reporter: Aleksandr Shulman Assignee: Andrew Purtell
Observed behavior: In my automation, I have a call to hbase zkcli. That call recently broke with this checkin: https://github.com/apache/hbase/commit/5af0a60efed91ac2084f25f13edb21db0f510e7c The error that is reported is:
{code}
++ ./hbase zkcli
11:19:58 Warning: $HADOOP_HOME is deprecated.
11:19:58
11:20:00 Exception in thread "main" java.lang.IllegalAccessError: tried to access field org.slf4j.impl.StaticLoggerBinder.SINGLETON from class org.slf4j.LoggerFactory
11:20:00 at org.slf4j.LoggerFactory.<clinit>(LoggerFactory.java:60)
11:20:00 at org.apache.zookeeper.ZooKeeperMain.<clinit>(ZooKeeperMain.java:50)
11:20:00 at org.apache.hadoop.hbase.zookeeper.ZooKeeperMainServer.main(ZooKeeperMainServer.java:78)
11:20:00 Build step 'Execute shell' marked build as failure
{code}
That said, this checkin is perfectly valid as each component should be allowed to specify its own dependencies. The issue is a deeper one of dependency mismatches. Note: This issue only affects hadoop1, not hadoop2. It also appears in trunk, where there is a similar checkin, but since trunk is not required to work against hadoop1, this is not an issue for trunk. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10073) [Hadoop1]: hbase zkcli broken due to slf4j incompatibility
[ https://issues.apache.org/jira/browse/HBASE-10073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13840904#comment-13840904 ] Aleksandr Shulman commented on HBASE-10073: --- Assigned to Andrew Purtell to have a look, since he is the author of the patch. The patch itself is not incorrect, but reveals a larger issue of maintaining compatibility and harmony among the dependencies.
[jira] [Commented] (HBASE-10073) [Hadoop1]: hbase zkcli broken due to slf4j incompatibility
[ https://issues.apache.org/jira/browse/HBASE-10073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841014#comment-13841014 ] Aleksandr Shulman commented on HBASE-10073: --- I think the problem runs a little deeper. If there are two versions of slf4j (or any dependency for that matter) on the classpath, then something will be affected by a resulting incompatibility. Because of changes in the ordering of the classpath in 0.96, I had to make some changes to my own setup scripts. Both configurations could be considered reasonable or representative of what a common user would do. Before my change, I reported this issue. After the change, hbase zkcli works fine, but there is a similar error now when starting master (much worse!). The revert you are suggesting may fix it though. I'll test it. Here's the master startup error:
{code}
21:53:54 2013-12-05 21:53:40,596 INFO [main] impl.MetricsSourceAdapter: MBean for source jvm registered.
21:53:54 2013-12-05 21:53:40,604 INFO [main] impl.MetricsSourceAdapter: MBean for source IPC,sub=IPC registered.
21:53:54 2013-12-05 21:53:41,250 INFO [main] impl.MetricsSourceAdapter: MBean for source ugi registered.
21:53:54 2013-12-05 21:53:41,628 INFO [main] master.HMaster: hbase.rootdir=hdfs://snapshot-tarball-vm-6.ent.cloudera.com:8020/hbase, hbase.cluster.distributed=true
21:53:54 2013-12-05 21:53:41,758 ERROR [main] master.HMasterCommandLine: Master exiting
21:53:54 java.lang.RuntimeException: Failed construction of Master: class org.apache.hadoop.hbase.master.HMaster
21:53:54 at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:2779)
21:53:54 at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:184)
21:53:54 at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:134)
21:53:54 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
21:53:54 at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
21:53:54 at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2793)
21:53:54 Caused by: java.lang.IllegalAccessError: tried to access field org.slf4j.impl.StaticLoggerBinder.SINGLETON from class org.slf4j.LoggerFactory
21:53:54 at org.slf4j.LoggerFactory.<clinit>(LoggerFactory.java:60)
21:53:54 at org.apache.zookeeper.ZooKeeper.<clinit>(ZooKeeper.java:94)
21:53:54 at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.<init>(RecoverableZooKeeper.java:112)
21:53:54 at org.apache.hadoop.hbase.zookeeper.ZKUtil.connect(ZKUtil.java:132)
21:53:54 at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:165)
21:53:54 at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:472)
21:53:54 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
21:53:54 at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
21:53:54 at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
21:53:54 at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
21:53:54 at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:2774)
21:53:54 ... 5 more
{code}
[Hadoop1]: hbase zkcli broken due to slf4j incompatibility -- Key: HBASE-10073 URL: https://issues.apache.org/jira/browse/HBASE-10073 Project: HBase Issue Type: Bug Components: Zookeeper Affects Versions: 0.96.1 Environment: Centos6, sun-jdk-64bit-1.7.0.25 Reporter: Aleksandr Shulman Assignee: Andrew Purtell Fix For: 0.98.0, 0.96.1, 0.99.0 Attachments: 10073-0.96.patch
[jira] [Commented] (HBASE-10073) [Hadoop1]: hbase zkcli broken due to slf4j incompatibility
[ https://issues.apache.org/jira/browse/HBASE-10073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13841032#comment-13841032 ] Aleksandr Shulman commented on HBASE-10073: --- {quote}The revert you are suggesting may fix it though. I'll test it.{quote} As expected, even with the different classpath, when applying your revert, everything builds and runs correctly. [Hadoop1]: hbase zkcli broken due to slf4j incompatibility -- Key: HBASE-10073 URL: https://issues.apache.org/jira/browse/HBASE-10073 Project: HBase Issue Type: Bug Components: Zookeeper Affects Versions: 0.96.1 Environment: Centos6, sun-jdk-64bit-1.7.0.25 Reporter: Aleksandr Shulman Assignee: Andrew Purtell Fix For: 0.98.0, 0.96.1, 0.99.0 Attachments: 10073-0.96.patch Observed behavior: In my automation, I have a call to hbase zkcli. That call recently broke with this checkin: https://github.com/apache/hbase/commit/5af0a60efed91ac2084f25f13edb21db0f510e7c The error that is reported is: {code}++ ./hbase zkcli 11:19:58 Warning: $HADOOP_HOME is deprecated. 11:19:58 11:20:00 Exception in thread main java.lang.IllegalAccessError: tried to access field org.slf4j.impl.StaticLoggerBinder.SINGLETON from class org.slf4j.LoggerFactory 11:20:00 at org.slf4j.LoggerFactory.clinit(LoggerFactory.java:60) 11:20:00 at org.apache.zookeeper.ZooKeeperMain.clinit(ZooKeeperMain.java:50) 11:20:00 at org.apache.hadoop.hbase.zookeeper.ZooKeeperMainServer.main(ZooKeeperMainServer.java:78) 11:20:00 Build step 'Execute shell' marked build as failure{code} That said, this checkin is perfectly valid as each component should be allowed to specify its own dependencies. The issue is a deeper one of dependency mismatches. Note: This issue only affects hadoop1, not hadoop2. It also appears in trunk, where there is a similar checkin, but since trunk is not required to work against hadoop1, this is not an issue for trunk. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10073) Revert HBASE-9718 (Add a test scope dependency on org.slf4j:slf4j-api to hbase-client)
[ https://issues.apache.org/jira/browse/HBASE-10073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13841067#comment-13841067 ] Aleksandr Shulman commented on HBASE-10073: --- Thanks Andrew! Revert HBASE-9718 (Add a test scope dependency on org.slf4j:slf4j-api to hbase-client) -- Key: HBASE-10073 URL: https://issues.apache.org/jira/browse/HBASE-10073 Project: HBase Issue Type: Bug Components: Zookeeper Affects Versions: 0.96.1 Environment: Centos6, sun-jdk-64bit-1.7.0.25 Reporter: Aleksandr Shulman Assignee: Andrew Purtell Fix For: 0.98.0, 0.96.1, 0.99.0 Attachments: 10073.patch Observed behavior: In my automation, I have a call to hbase zkcli. That call recently broke with this checkin: https://github.com/apache/hbase/commit/5af0a60efed91ac2084f25f13edb21db0f510e7c The error that is reported is: {code}++ ./hbase zkcli 11:19:58 Warning: $HADOOP_HOME is deprecated. 11:19:58 11:20:00 Exception in thread main java.lang.IllegalAccessError: tried to access field org.slf4j.impl.StaticLoggerBinder.SINGLETON from class org.slf4j.LoggerFactory 11:20:00 at org.slf4j.LoggerFactory.clinit(LoggerFactory.java:60) 11:20:00 at org.apache.zookeeper.ZooKeeperMain.clinit(ZooKeeperMain.java:50) 11:20:00 at org.apache.hadoop.hbase.zookeeper.ZooKeeperMainServer.main(ZooKeeperMainServer.java:78) 11:20:00 Build step 'Execute shell' marked build as failure{code} That said, this checkin is perfectly valid as each component should be allowed to specify its own dependencies. The issue is a deeper one of dependency mismatches. Note: This issue only affects hadoop1, not hadoop2. It also appears in trunk, where there is a similar checkin, but since trunk is not required to work against hadoop1, this is not an issue for trunk. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HBASE-10073) [Hadoop1]: hbase zkcli broken due to slf4j incompatibility
Aleksandr Shulman created HBASE-10073: - Summary: [Hadoop1]: hbase zkcli broken due to slf4j incompatibility Key: HBASE-10073 URL: https://issues.apache.org/jira/browse/HBASE-10073 Project: HBase Issue Type: Bug Components: Zookeeper Affects Versions: 0.96.1 Environment: Centos6, sun-jdk-64bit-1.7.0.25 Reporter: Aleksandr Shulman Observed behavior: In my automation, I have a call to hbase zkcli. That call recently broke with this checkin: https://github.com/apache/hbase/commit/5af0a60efed91ac2084f25f13edb21db0f510e7c The error that is reported is: {code}++ ./hbase zkcli 11:19:58 Warning: $HADOOP_HOME is deprecated. 11:19:58 11:20:00 Exception in thread "main" java.lang.IllegalAccessError: tried to access field org.slf4j.impl.StaticLoggerBinder.SINGLETON from class org.slf4j.LoggerFactory 11:20:00 at org.slf4j.LoggerFactory.<clinit>(LoggerFactory.java:60) 11:20:00 at org.apache.zookeeper.ZooKeeperMain.<clinit>(ZooKeeperMain.java:50) 11:20:00 at org.apache.hadoop.hbase.zookeeper.ZooKeeperMainServer.main(ZooKeeperMainServer.java:78) 11:20:00 Build step 'Execute shell' marked build as failure{code} That said, this checkin is perfectly valid as each component should be allowed to specify its own dependencies. The issue is a deeper one of dependency mismatches. -- This message was sent by Atlassian JIRA (v6.1#6144)
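The dependency mismatch described in this report (more than one version of the same jar ending up on a single classpath) can be checked for mechanically. What follows is a minimal sketch in Python and is not part of HBase or its scripts. The function name `find_duplicate_artifacts`, the assumed `<artifact>-<version>.jar` naming convention, and any paths or version numbers used with it are illustrative assumptions.

```python
import re
from collections import defaultdict

# Assumed naming convention: <artifact>-<version>.jar, where the version
# starts with a digit. Jars that do not follow it are simply ignored.
JAR_PATTERN = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d[\w.\-]*)\.jar$")


def find_duplicate_artifacts(classpath, sep=":"):
    """Return {artifact: [versions in classpath order]} for every artifact
    that appears on the classpath with more than one distinct version."""
    versions = defaultdict(list)
    for entry in classpath.split(sep):
        file_name = entry.rsplit("/", 1)[-1]
        match = JAR_PATTERN.match(file_name)
        if not match:
            continue
        seen = versions[match.group("artifact")]
        if match.group("version") not in seen:
            seen.append(match.group("version"))
    return {name: found for name, found in versions.items() if len(found) > 1}
```

Running a check like this against the classpath that the launcher script assembles would flag an artifact such as slf4j-api if two different versions were present, before the JVM fails with an IllegalAccessError at class-initialization time.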
[jira] [Updated] (HBASE-10073) [Hadoop1]: hbase zkcli broken due to slf4j incompatibility
[ https://issues.apache.org/jira/browse/HBASE-10073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Shulman updated HBASE-10073: -- Assignee: Matteo Bertozzi [Hadoop1]: hbase zkcli broken due to slf4j incompatibility -- Key: HBASE-10073 URL: https://issues.apache.org/jira/browse/HBASE-10073 Project: HBase Issue Type: Bug Components: Zookeeper Affects Versions: 0.96.1 Environment: Centos6, sun-jdk-64bit-1.7.0.25 Reporter: Aleksandr Shulman Assignee: Matteo Bertozzi Observed behavior: In my automation, I have a call to hbase zkcli. That call recently broke with this checkin: https://github.com/apache/hbase/commit/5af0a60efed91ac2084f25f13edb21db0f510e7c The error that is reported is: {code}++ ./hbase zkcli 11:19:58 Warning: $HADOOP_HOME is deprecated. 11:19:58 11:20:00 Exception in thread main java.lang.IllegalAccessError: tried to access field org.slf4j.impl.StaticLoggerBinder.SINGLETON from class org.slf4j.LoggerFactory 11:20:00 at org.slf4j.LoggerFactory.clinit(LoggerFactory.java:60) 11:20:00 at org.apache.zookeeper.ZooKeeperMain.clinit(ZooKeeperMain.java:50) 11:20:00 at org.apache.hadoop.hbase.zookeeper.ZooKeeperMainServer.main(ZooKeeperMainServer.java:78) 11:20:00 Build step 'Execute shell' marked build as failure{code} That said, this checkin is perfectly valid as each component should be allowed to specify its own dependencies. The issue is a deeper one of dependency mismatches. Note: This issue only affects hadoop1, not hadoop2. It also appears in trunk, where there is a similar checkin, but since trunk is not required to work against hadoop1, this is not an issue for trunk. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-10073) [Hadoop1]: hbase zkcli broken due to slf4j incompatibility
[ https://issues.apache.org/jira/browse/HBASE-10073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Shulman updated HBASE-10073: -- Description: Observed behavior: In my automation, I have a call to hbase zkcli. That call recently broke with this checkin: https://github.com/apache/hbase/commit/5af0a60efed91ac2084f25f13edb21db0f510e7c The error that is reported is: {code}++ ./hbase zkcli 11:19:58 Warning: $HADOOP_HOME is deprecated. 11:19:58 11:20:00 Exception in thread main java.lang.IllegalAccessError: tried to access field org.slf4j.impl.StaticLoggerBinder.SINGLETON from class org.slf4j.LoggerFactory 11:20:00at org.slf4j.LoggerFactory.clinit(LoggerFactory.java:60) 11:20:00at org.apache.zookeeper.ZooKeeperMain.clinit(ZooKeeperMain.java:50) 11:20:00at org.apache.hadoop.hbase.zookeeper.ZooKeeperMainServer.main(ZooKeeperMainServer.java:78) 11:20:00 Build step 'Execute shell' marked build as failure{code} That said, this checkin is perfectly valid as each component should be allowed to specify its own dependencies. The issue is a deeper one of dependency mismatches. Note: This issue only affects hadoop1, not hadoop2. It also appears in trunk, where there is a similar checkin, but since trunk is not required to work against hadoop1, this is not an issue for trunk. was: Observed behavior: In my automation, I have a call to hbase zkcli. That call recently broke with this checkin: https://github.com/apache/hbase/commit/5af0a60efed91ac2084f25f13edb21db0f510e7c The error that is reported is: {code}++ ./hbase zkcli 11:19:58 Warning: $HADOOP_HOME is deprecated. 
11:19:58 11:20:00 Exception in thread main java.lang.IllegalAccessError: tried to access field org.slf4j.impl.StaticLoggerBinder.SINGLETON from class org.slf4j.LoggerFactory 11:20:00at org.slf4j.LoggerFactory.clinit(LoggerFactory.java:60) 11:20:00at org.apache.zookeeper.ZooKeeperMain.clinit(ZooKeeperMain.java:50) 11:20:00at org.apache.hadoop.hbase.zookeeper.ZooKeeperMainServer.main(ZooKeeperMainServer.java:78) 11:20:00 Build step 'Execute shell' marked build as failure{code} That said, this checkin is perfectly valid as each component should be allowed to specify its own dependencies. The issue is a deeper one of dependency mismatches. [Hadoop1]: hbase zkcli broken due to slf4j incompatibility -- Key: HBASE-10073 URL: https://issues.apache.org/jira/browse/HBASE-10073 Project: HBase Issue Type: Bug Components: Zookeeper Affects Versions: 0.96.1 Environment: Centos6, sun-jdk-64bit-1.7.0.25 Reporter: Aleksandr Shulman Observed behavior: In my automation, I have a call to hbase zkcli. That call recently broke with this checkin: https://github.com/apache/hbase/commit/5af0a60efed91ac2084f25f13edb21db0f510e7c The error that is reported is: {code}++ ./hbase zkcli 11:19:58 Warning: $HADOOP_HOME is deprecated. 11:19:58 11:20:00 Exception in thread main java.lang.IllegalAccessError: tried to access field org.slf4j.impl.StaticLoggerBinder.SINGLETON from class org.slf4j.LoggerFactory 11:20:00 at org.slf4j.LoggerFactory.clinit(LoggerFactory.java:60) 11:20:00 at org.apache.zookeeper.ZooKeeperMain.clinit(ZooKeeperMain.java:50) 11:20:00 at org.apache.hadoop.hbase.zookeeper.ZooKeeperMainServer.main(ZooKeeperMainServer.java:78) 11:20:00 Build step 'Execute shell' marked build as failure{code} That said, this checkin is perfectly valid as each component should be allowed to specify its own dependencies. The issue is a deeper one of dependency mismatches. Note: This issue only affects hadoop1, not hadoop2. 
It also appears in trunk, where there is a similar checkin, but since trunk is not required to work against hadoop1, this is not an issue for trunk. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9980) [0.94] Wire compatibility test for 0.94
[ https://issues.apache.org/jira/browse/HBASE-9980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829353#comment-13829353 ] Aleksandr Shulman commented on HBASE-9980: -- Thanks [~andrew.purt...@gmail.com] for the explanation. Makes sense. [0.94] Wire compatibility test for 0.94 --- Key: HBASE-9980 URL: https://issues.apache.org/jira/browse/HBASE-9980 Project: HBase Issue Type: Bug Affects Versions: 0.94.13 Reporter: Lars Hofhansl Assignee: Andrew Purtell See HBASE-9834. We should have a test that: # generates a file with all kinds of objects serialized into it. Save that file as part of the HBase tests # a test can then read the objects back from that file # a test can regenerate that file If both tests pass we can be reasonably sure that neither readFields nor write was changed in an incompatible way. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9980) [0.94] Wire compatibility test for 0.94
[ https://issues.apache.org/jira/browse/HBASE-9980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13828494#comment-13828494 ] Aleksandr Shulman commented on HBASE-9980: -- Great this is getting attention :) [~lhofhansl] {quote} those that are assignable from Writable.{quote} Can you elaborate on what exactly you mean by this? Can you also give a couple of examples of objects that are and are not? [0.94] Wire compatibility test for 0.94 --- Key: HBASE-9980 URL: https://issues.apache.org/jira/browse/HBASE-9980 Project: HBase Issue Type: Bug Affects Versions: 0.94.13 Reporter: Lars Hofhansl Assignee: Andrew Purtell See HBASE-9834. We should have a test that: # generates a file with all kinds of objects serialized into it. Save that file as part of the HBase tests # a test can then read the objects back from that file # a test can regenerate that file If both tests pass we can be reasonably sure that neither readFields nor write was changed in an incompatible way. -- This message was sent by Atlassian JIRA (v6.1#6144)
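The three-step test proposed in HBASE-9980 (generate a file of serialized objects, read it back, regenerate it) can be sketched as follows. This is a toy model in Python rather than HBase code: `KeyRange` is a hypothetical stand-in for a Writable-style class, and its length-prefixed wire format is an assumption chosen only to keep the example small.

```python
import io
import struct


class KeyRange:
    """Toy stand-in for a Writable-style object (hypothetical, not an HBase class)."""

    def __init__(self, start=b"", stop=b""):
        self.start, self.stop = start, stop

    def write(self, out):
        # Assumed wire format: 4-byte big-endian length, then the raw bytes.
        for field in (self.start, self.stop):
            out.write(struct.pack(">i", len(field)))
            out.write(field)

    def read_fields(self, inp):
        fields = []
        for _ in range(2):
            (length,) = struct.unpack(">i", inp.read(4))
            fields.append(inp.read(length))
        self.start, self.stop = fields


def generate_golden(objects):
    """Step 1 and step 3: serialize the objects into one byte string."""
    buf = io.BytesIO()
    for obj in objects:
        obj.write(buf)
    return buf.getvalue()


def read_golden(data, count):
    """Step 2: read `count` objects back from previously saved bytes."""
    inp = io.BytesIO(data)
    result = []
    for _ in range(count):
        obj = KeyRange()
        obj.read_fields(inp)
        result.append(obj)
    return result
```

In a real test the golden bytes would be checked in next to the tests. Reading them back exercises the read side, and regenerating them and comparing byte-for-byte against the saved copy exercises the write side, so an incompatible change to either method makes one of the two comparisons fail.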
[jira] [Updated] (HBASE-9973) [ACL]: Users with 'Admin' ACL permission will lose permissions after upgrade to 0.96.x from 0.94.x or 0.92.x
[ https://issues.apache.org/jira/browse/HBASE-9973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Shulman updated HBASE-9973: - Labels: acl (was: ) [ACL]: Users with 'Admin' ACL permission will lose permissions after upgrade to 0.96.x from 0.94.x or 0.92.x Key: HBASE-9973 URL: https://issues.apache.org/jira/browse/HBASE-9973 Project: HBase Issue Type: Bug Components: security Affects Versions: 0.96.0, 0.96.1 Reporter: Aleksandr Shulman Labels: acl Fix For: 0.96.1 In our testing, we have uncovered that the ACL permissions for users with the 'A' credential do not hold after the upgrade to 0.96.x. This is because in the ACL table, the entry for the admin user is a permission on the '_acl_' table with permission 'A'. However, because of the namespace transition, there is no longer an '_acl_' table. Therefore, that entry in the hbase:acl table is no longer valid. Example: {code}
hbase(main):002:0> scan 'hbase:acl'
ROW        COLUMN+CELL
 TestTable column=l:hdfs, timestamp=1384454830701, value=RW
 TestTable column=l:root, timestamp=1384455875586, value=RWCA
 _acl_     column=l:root, timestamp=1384454767568, value=C
 _acl_     column=l:tableAdmin, timestamp=1384454788035, value=A
 hbase:acl column=l:root, timestamp=1384455875786, value=C
{code} In this case, the following entry becomes meaningless: {code} _acl_ column=l:tableAdmin, timestamp=1384454788035, value=A {code} As a result, Proposed fix: I see the fix being relatively straightforward. As part of the migration, change any entries in the '_acl_' table with key '_acl_' into a new row with key 'hbase:acl', all else being the same. And the old entry would be deleted. This can go into the standard migration script that we expect users to run. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HBASE-9973) [ACL]: Users with 'Admin' ACL permission will lose permissions after upgrade to 0.96.x from 0.94.x or 0.92.x
Aleksandr Shulman created HBASE-9973: Summary: [ACL]: Users with 'Admin' ACL permission will lose permissions after upgrade to 0.96.x from 0.94.x or 0.92.x Key: HBASE-9973 URL: https://issues.apache.org/jira/browse/HBASE-9973 Project: HBase Issue Type: Bug Components: security Affects Versions: 0.96.0, 0.96.1 Reporter: Aleksandr Shulman Fix For: 0.96.1 In our testing, we have uncovered that the ACL permissions for users with the 'A' credential do not hold after the upgrade to 0.96.x. This is because in the ACL table, the entry for the admin user is a permission on the '_acl_' table with permission 'A'. However, because of the namespace transition, there is no longer an '_acl_' table. Therefore, that entry in the hbase:acl table is no longer valid. Example: {code}
hbase(main):002:0> scan 'hbase:acl'
ROW        COLUMN+CELL
 TestTable column=l:hdfs, timestamp=1384454830701, value=RW
 TestTable column=l:root, timestamp=1384455875586, value=RWCA
 _acl_     column=l:root, timestamp=1384454767568, value=C
 _acl_     column=l:tableAdmin, timestamp=1384454788035, value=A
 hbase:acl column=l:root, timestamp=1384455875786, value=C
{code} In this case, the following entry becomes meaningless: {code} _acl_ column=l:tableAdmin, timestamp=1384454788035, value=A {code} As a result, Proposed fix: I see the fix being relatively straightforward. As part of the migration, change any entries in the '_acl_' table with key '_acl_' into a new row with key 'hbase:acl', all else being the same. And the old entry would be deleted. This can go into the standard migration script that we expect users to run. -- This message was sent by Atlassian JIRA (v6.1#6144)
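The proposed fix can be illustrated with a small in-memory model. This is only a sketch in Python, not the actual migration script: the table is modelled as a dict of row key to {column: permission string}, timestamps are left out, and the choice to union permission letters when both the old and the new row already hold the same column is an assumption that the report does not address.

```python
OLD_ACL_ROW = "_acl_"
NEW_ACL_ROW = "hbase:acl"


def migrate_acl_rows(acl_table):
    """Return a copy of acl_table in which the '_acl_' row has been
    re-keyed as 'hbase:acl' and the old row removed."""
    migrated = {row: dict(cells) for row, cells in acl_table.items()}
    old_cells = migrated.pop(OLD_ACL_ROW, None)
    if old_cells:
        target = migrated.setdefault(NEW_ACL_ROW, {})
        for column, value in old_cells.items():
            # Assumption: if the column exists in both rows, keep the
            # union of the permission letters instead of overwriting.
            existing = target.get(column, "")
            extra = "".join(c for c in value if c not in existing)
            target[column] = existing + extra
    return migrated
```

Applied to the rows from the scan in the report, the `l:tableAdmin` entry with value `A` ends up under the `hbase:acl` row, which is where the post-namespace code would be expected to look for it.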
[jira] [Updated] (HBASE-9973) [ACL]: Users with 'Admin' ACL permission will lose permissions after upgrade to 0.96.x from 0.94.x or 0.92.x
[ https://issues.apache.org/jira/browse/HBASE-9973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Shulman updated HBASE-9973: - Assignee: Himanshu Vashishtha [ACL]: Users with 'Admin' ACL permission will lose permissions after upgrade to 0.96.x from 0.94.x or 0.92.x Key: HBASE-9973 URL: https://issues.apache.org/jira/browse/HBASE-9973 Project: HBase Issue Type: Bug Components: security Affects Versions: 0.96.0, 0.96.1 Reporter: Aleksandr Shulman Assignee: Himanshu Vashishtha Labels: acl Fix For: 0.96.1 In our testing, we have uncovered that the ACL permissions for users with the 'A' credential do not hold after the upgrade to 0.96.x. This is because in the ACL table, the entry for the admin user is a permission on the '_acl_' table with permission 'A'. However, because of the namespace transition, there is no longer an '_acl_' table. Therefore, that entry in the hbase:acl table is no longer valid. Example: {code}
hbase(main):002:0> scan 'hbase:acl'
ROW        COLUMN+CELL
 TestTable column=l:hdfs, timestamp=1384454830701, value=RW
 TestTable column=l:root, timestamp=1384455875586, value=RWCA
 _acl_     column=l:root, timestamp=1384454767568, value=C
 _acl_     column=l:tableAdmin, timestamp=1384454788035, value=A
 hbase:acl column=l:root, timestamp=1384455875786, value=C
{code} In this case, the following entry becomes meaningless: {code} _acl_ column=l:tableAdmin, timestamp=1384454788035, value=A {code} As a result, Proposed fix: I see the fix being relatively straightforward. As part of the migration, change any entries in the '_acl_' table with key '_acl_' into a new row with key 'hbase:acl', all else being the same. And the old entry would be deleted. This can go into the standard migration script that we expect users to run. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Work started] (HBASE-9966) Create IntegrationTest for Online Bloom Filter and Compression Algorithm Change
[ https://issues.apache.org/jira/browse/HBASE-9966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-9966 started by Aleksandr Shulman. Create IntegrationTest for Online Bloom Filter and Compression Algorithm Change --- Key: HBASE-9966 URL: https://issues.apache.org/jira/browse/HBASE-9966 Project: HBase Issue Type: Sub-task Components: HFile, test Affects Versions: 0.98.0, 0.96.1 Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Fix For: 0.95.2 For online merge, a user is perfectly within her rights to modify the compression algorithm used, or the bloom filter. Therefore, we should add these actions to our ChaosMonkey tests to ensure that they do not introduce instability. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HBASE-9966) Create IntegrationTest for Online Bloom Filter and Compression Algorithm Change
Aleksandr Shulman created HBASE-9966: Summary: Create IntegrationTest for Online Bloom Filter and Compression Algorithm Change Key: HBASE-9966 URL: https://issues.apache.org/jira/browse/HBASE-9966 Project: HBase Issue Type: Sub-task Components: HFile, test Affects Versions: 0.98.0, 0.96.1 Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman For online merge, a user is perfectly within her rights to modify the compression algorithm used, or the bloom filter. Therefore, we should add these actions to our ChaosMonkey tests to ensure that they do not introduce instability. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9966) Create IntegrationTest for Online Bloom Filter and Compression Algorithm Change
[ https://issues.apache.org/jira/browse/HBASE-9966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Shulman updated HBASE-9966: - Description: For online schema change, a user is perfectly within her rights to modify the compression algorithm used, or the bloom filter. Therefore, we should add these actions to our ChaosMonkey tests to ensure that they do not introduce instability. was: For online merge, a user is perfectly within her rights to modify the compression algorithm used, or the bloom filter. Therefore, we should add these actions to our ChaosMonkey tests to ensure that they do not introduce instability. Create IntegrationTest for Online Bloom Filter and Compression Algorithm Change --- Key: HBASE-9966 URL: https://issues.apache.org/jira/browse/HBASE-9966 Project: HBase Issue Type: Sub-task Components: HFile, test Affects Versions: 0.98.0, 0.96.1 Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Fix For: 0.95.2 For online schema change, a user is perfectly within her rights to modify the compression algorithm used, or the bloom filter. Therefore, we should add these actions to our ChaosMonkey tests to ensure that they do not introduce instability. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-7639) Enable online schema update by default
[ https://issues.apache.org/jira/browse/HBASE-7639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Shulman updated HBASE-7639: - Labels: online_schema_change (was: ) Enable online schema update by default --- Key: HBASE-7639 URL: https://issues.apache.org/jira/browse/HBASE-7639 Project: HBase Issue Type: Bug Affects Versions: 0.95.2 Reporter: Enis Soztutar Assignee: Elliott Clark Labels: online_schema_change Fix For: 0.98.0, 0.95.2 Attachments: HBASE-7639-0.patch After we get HBASE-7305 and HBASE-7546, things will become stable enough for online schema update to be enabled by default. {code}
<property>
  <name>hbase.online.schema.update.enable</name>
  <value>false</value>
  <description>
  Set true to enable online schema changes. This is an experimental feature.
  There are known issues modifying table schemas at the same time a region split is happening so your table needs to be quiescent or else you have to be running with splits disabled.
  </description>
</property>
{code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-8726) Create an Integration Test for online schema change
[ https://issues.apache.org/jira/browse/HBASE-8726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Shulman updated HBASE-8726: - Labels: online_schema_change (was: ) Create an Integration Test for online schema change --- Key: HBASE-8726 URL: https://issues.apache.org/jira/browse/HBASE-8726 Project: HBase Issue Type: Bug Components: Admin Affects Versions: 0.98.0, 0.95.1 Reporter: Elliott Clark Assignee: Elliott Clark Labels: online_schema_change Fix For: 0.95.2 Attachments: HBASE-8726-0.patch, HBASE-8726-1.patch, HBASE-8726-2.patch, HBASE-8726-3.patch, HBASE-8726-4.patch With table locks in place it should be time to start really testing online table schema changes. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-5678) Dynamic configuration capability for Hbase.
[ https://issues.apache.org/jira/browse/HBASE-5678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Shulman updated HBASE-5678: - Description: I think, some preperties can be dynamically configured without restart of the nodes. This is an umberilla JIRA for this Feature. In Hadoop we already had such feature but not yet implemented by nodes. I think we can have the similar base framework here and can implemented by nodes. So, that whatever properies are allowed to reconfigurable, should be able to reconfigure with new values with out restarting the node. I will come up with some design doc with noeds implementation and will raise subtasks for each. was: I think, some preperties can be danamically configured without restart of the nodes. This is an umberilla JIRA for this Feature. In Hadoop we already had such feature but not yet implemented by nodes. I think we can have the similar base framework here and can implemented by nodes. So, that whatever properies are allowed to reconfigurable, should be able to reconfigure with new values with out restarting the node. I will come up with some design doc with noeds implementation and will raise subtasks for each. Dynamic configuration capability for Hbase. --- Key: HBASE-5678 URL: https://issues.apache.org/jira/browse/HBASE-5678 Project: HBase Issue Type: New Feature Components: master, regionserver, util Affects Versions: 0.95.2 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G I think, some preperties can be dynamically configured without restart of the nodes. This is an umberilla JIRA for this Feature. In Hadoop we already had such feature but not yet implemented by nodes. I think we can have the similar base framework here and can implemented by nodes. So, that whatever properies are allowed to reconfigurable, should be able to reconfigure with new values with out restarting the node. I will come up with some design doc with noeds implementation and will raise subtasks for each. 
-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9395) Disable Schema Change on 0.96
[ https://issues.apache.org/jira/browse/HBASE-9395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Shulman updated HBASE-9395: - Labels: online_schema_change (was: ) Disable Schema Change on 0.96 - Key: HBASE-9395 URL: https://issues.apache.org/jira/browse/HBASE-9395 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.98.0 Reporter: Elliott Clark Assignee: Elliott Clark Priority: Blocker Labels: online_schema_change Fix For: 0.96.0 Attachments: HBASE-9395-95-0.patch Running LoadTestAndVerify fails when the chaos monkey is slowDeterministic. When commenting out all of the schema change actions everything passes. We should disable the schema change until we can be 100% sure of data integrity. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-8775) Throttle online schema changes.
[ https://issues.apache.org/jira/browse/HBASE-8775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Shulman updated HBASE-8775: - Labels: online_schema_change (was: ) Throttle online schema changes. --- Key: HBASE-8775 URL: https://issues.apache.org/jira/browse/HBASE-8775 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.89-fb Reporter: Shane Hogan Priority: Minor Labels: online_schema_change Fix For: 0.89-fb Throttle the open and close of the regions after an online schema change -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9407) Online Schema Change causes Test Load and Verify to fail.
[ https://issues.apache.org/jira/browse/HBASE-9407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Shulman updated HBASE-9407: - Labels: online_schema_change (was: ) Online Schema Change causes Test Load and Verify to fail. - Key: HBASE-9407 URL: https://issues.apache.org/jira/browse/HBASE-9407 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: Elliott Clark Labels: online_schema_change -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-4741) Online schema change doesn't return errors
[ https://issues.apache.org/jira/browse/HBASE-4741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Shulman updated HBASE-4741: - Labels: online_schema_change (was: ) Online schema change doesn't return errors -- Key: HBASE-4741 URL: https://issues.apache.org/jira/browse/HBASE-4741 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Assignee: stack Priority: Critical Labels: online_schema_change Fix For: 0.92.0 Attachments: 4741-v2.txt, 4741-v3.txt, 4741-v4.txt, 4741-v5.txt, 4741-v6.txt, 4741-v7.txt, 4741.txt Still after the fun I had over in HBASE-4729, I tried to finish altering my table (remove a family) since only half of it was changed so I did this: {quote} hbase(main):002:0> alter 'TestTable', NAME => 'allo', METHOD => 'delete' Updating all regions with the new schema... 244/244 regions updated. Done. 0 row(s) in 1.2480 seconds {quote} Nice it all looks good, but over in the master log: {quote} org.apache.hadoop.hbase.InvalidFamilyOperationException: Family 'allo' does not exist so cannot be deleted at org.apache.hadoop.hbase.master.handler.TableDeleteFamilyHandler.handleTableOperation(TableDeleteFamilyHandler.java:56) at org.apache.hadoop.hbase.master.handler.TableEventHandler.process(TableEventHandler.java:86) at org.apache.hadoop.hbase.master.HMaster.deleteColumn(HMaster.java:1011) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:348) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1242) {quote} Maybe we should do checks before launching the async task. Marking critical as this is a regression. -- This message was sent by Atlassian JIRA (v6.1#6144)
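The suggestion at the end of this report, doing the checks before launching the async task, amounts to validating in the caller's thread and only then handing the work off. A minimal sketch of that pattern in Python follows. It is not HBase's handler code: `delete_family`, `InvalidFamilyOperation`, and the dict-of-lists schema are hypothetical names and structures used only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


class InvalidFamilyOperation(Exception):
    """Hypothetical error type, raised for a family that cannot be altered."""


def delete_family(schema, table, family, executor):
    """Validate synchronously, then schedule the schema change.

    `schema` maps a table name to a list of column family names.
    Returns the Future for the scheduled removal.
    """
    families = schema[table]
    if family not in families:
        # Raised in the caller's thread, so the client sees the failure
        # instead of it being recorded only in a server-side log.
        raise InvalidFamilyOperation(
            f"Family '{family}' does not exist so cannot be deleted")
    return executor.submit(families.remove, family)
```

With this ordering, a second attempt to delete the same family fails immediately with the error message instead of reporting success to the shell while the exception appears only on the master.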
[jira] [Updated] (HBASE-5335) Dynamic Schema Configurations
[ https://issues.apache.org/jira/browse/HBASE-5335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Shulman updated HBASE-5335: - Labels: configuration online_schema_change schema (was: configuration schema) Dynamic Schema Configurations - Key: HBASE-5335 URL: https://issues.apache.org/jira/browse/HBASE-5335 Project: HBase Issue Type: New Feature Reporter: Nicolas Spiegelberg Assignee: Nicolas Spiegelberg Labels: configuration, online_schema_change, schema Fix For: 0.94.7, 0.95.0 Attachments: ASF.LICENSE.NOT.GRANTED--D2247.1.patch, ASF.LICENSE.NOT.GRANTED--D2247.2.patch, ASF.LICENSE.NOT.GRANTED--D2247.3.patch, ASF.LICENSE.NOT.GRANTED--D2247.4.patch, ASF.LICENSE.NOT.GRANTED--D2247.5.patch, ASF.LICENSE.NOT.GRANTED--D2247.6.patch, ASF.LICENSE.NOT.GRANTED--D2247.7.patch, ASF.LICENSE.NOT.GRANTED--D2247.8.patch, HBASE-5335-trunk-2.patch, HBASE-5335-trunk-3.patch, HBASE-5335-trunk-3.patch, HBASE-5335-trunk-4.patch, HBASE-5335-trunk.patch Currently, the ability for a core developer to add per-table and per-CF configuration settings is very heavyweight. You need to add a reserved keyword all the way up the stack, and you have to support this variable long-term if you're going to expose it explicitly to the user. This has ended up with using Configuration.get() a lot because it is lightweight and you can tweak settings while you're trying to understand system behavior [since there are many config params that may never need to be tuned]. We need to add the ability to put and read arbitrary KV settings in the HBase schema. Combined with online schema change, this will allow us to safely iterate on configuration settings. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-7236) add per-table/per-cf configuration via metadata
[ https://issues.apache.org/jira/browse/HBASE-7236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Shulman updated HBASE-7236: - Labels: online_schema_change (was: ) add per-table/per-cf configuration via metadata --- Key: HBASE-7236 URL: https://issues.apache.org/jira/browse/HBASE-7236 Project: HBase Issue Type: Umbrella Affects Versions: 0.95.2 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Labels: online_schema_change Fix For: 0.95.0 Attachments: HBASE-7236-PROTOTYPE-v1.patch, HBASE-7236-PROTOTYPE.patch, HBASE-7236-PROTOTYPE.patch, HBASE-7236-v0.patch, HBASE-7236-v1.patch, HBASE-7236-v2.patch, HBASE-7236-v3.patch, HBASE-7236-v4.patch, HBASE-7236-v5.patch, HBASE-7236-v6.patch, HBASE-7236-v6.patch Regardless of the compaction policy, it makes sense to have separate configuration for compactions for different tables and column families, as their access patterns and workloads can be different. In particular, for tiered compactions that are being ported from 0.89-fb branch it is necessary to have, to use it properly. We might want to add support for compaction configuration via metadata on table/cf. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9407) Online Schema Change causes Test Load and Verify to fail.
[ https://issues.apache.org/jira/browse/HBASE-9407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817674#comment-13817674 ] Aleksandr Shulman commented on HBASE-9407: -- Hi Elliott, is this issue still occurring? If so, can you add more specifics about the failure mode, how often it occurs, potential root causes, etc.? Online Schema Change causes Test Load and Verify to fail. - Key: HBASE-9407 URL: https://issues.apache.org/jira/browse/HBASE-9407 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: Elliott Clark Labels: online_schema_change
[jira] [Created] (HBASE-9786) [hbck]: hbck -metaonly incorrectly reports inconsistent regions after HBASE-9698 fix
Aleksandr Shulman created HBASE-9786: Summary: [hbck]: hbck -metaonly incorrectly reports inconsistent regions after HBASE-9698 fix Key: HBASE-9786 URL: https://issues.apache.org/jira/browse/HBASE-9786 Project: HBase Issue Type: Bug Components: hbck Affects Versions: 0.96.0 Reporter: Aleksandr Shulman Assignee: Matteo Bertozzi In my testing, I found that this call began to fail: {code}sudo -u hbase hbase hbck -metaonly {code} It began to fail after this check-in: https://github.com/apache/hbase/commit/818749ff9f261aac4206054d331189e92290b408 The full output is below. The issue seems to be that the patch does not account for -metaOnly. Testing done: I built 0.96 up to commit a6f208d91efff207860b049eb8466a069f0c71a9 and the test passes. The output: {code}Summary: clonedtestSnapshotSource-1381959945438 is okay. Number of regions: 0 Deployed on: hbase:meta is okay. Number of regions: 1 Deployed on: tarball-target-2.ent.cloudera.com,60020,1381952904985 hbase:namespace is okay. Number of regions: 0 Deployed on: sampleTable_tarball-target-2.ent.cloudera.com is okay. Number of regions: 0 Deployed on: testMRIncrementalLoadWithSplit_1381959500784 is okay. Number of regions: 0 Deployed on: testMRIncrementalLoad_1381959434211 is okay. Number of regions: 0 Deployed on: testSnapshotSource-1381959945438 is okay. Number of regions: 0 Deployed on: testSnapshotSource-1381959995995 is okay. Number of regions: 0 Deployed on: 7 inconsistencies detected. Status: INCONSISTENT {code}
[jira] [Commented] (HBASE-9786) [hbck]: hbck -metaonly incorrectly reports inconsistent regions after HBASE-9698 fix
[ https://issues.apache.org/jira/browse/HBASE-9786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797440#comment-13797440 ] Aleksandr Shulman commented on HBASE-9786: -- Tested the fix -- looks good. [hbck]: hbck -metaonly incorrectly reports inconsistent regions after HBASE-9698 fix Key: HBASE-9786 URL: https://issues.apache.org/jira/browse/HBASE-9786 Project: HBase Issue Type: Bug Components: hbck Affects Versions: 0.98.0, 0.96.0 Reporter: Aleksandr Shulman Assignee: Matteo Bertozzi Fix For: 0.98.0, 0.96.0 Attachments: HBASE-9786-v0.patch In my testing, I found that this call began to fail: {code}sudo -u hbase hbase hbck -metaonly {code} The checking after which it began to fail is: after this checkin: https://github.com/apache/hbase/commit/818749ff9f261aac4206054d331189e92290b408 The full output is below. The issue seems the patch does not include -metaOnly Testing done: I build 0.96 up to commit a6f208d91efff207860b049eb8466a069f0c71a9 and the test passes. The output: {code} $ hbase org.apache.hadoop.hbase.PerformanceEvaluation --rows=1 sequentialWrite 1 $ hbase hbck -metaonly ... 2013-10-16 23:52:24,075 DEBUG [main] util.HBaseFsck: There are 1 region info entries ERROR: There is a hole in the region chain between and . You need to create a new .regioninfo and region dir in hdfs to plug the hole. ERROR: Found inconsistency in table TestTable ERROR: There is a hole in the region chain between and . You need to create a new .regioninfo and region dir in hdfs to plug the hole. 
ERROR: Found inconsistency in table hbase:namespace 2013-10-16 23:52:24,182 INFO [main] zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=9 watcher=hbase Fsck 2013-10-16 23:52:24,183 INFO [main] zookeeper.RecoverableZooKeeper: Process identifier=hbase Fsck connecting to ZooKeeper ensemble=localhost:2181 2013-10-16 23:52:24,183 INFO [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) 2013-10-16 23:52:24,184 INFO [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session 2013-10-16 23:52:24,188 INFO [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x141c377e423000d, negotiated timeout = 4 Summary: TestTable is okay. Number of regions: 0 Deployed on: hbase:meta is okay. Number of regions: 1 Deployed on: localhost,49217,1381963918103 hbase:namespace is okay. Number of regions: 0 Deployed on: 2 inconsistencies detected. Status: INCONSISTENT {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9735) region_mover.rb uses the removed HConnection.getZooKeeperWatcher() method
[ https://issues.apache.org/jira/browse/HBASE-9735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790774#comment-13790774 ] Aleksandr Shulman commented on HBASE-9735: -- I tested the patch. It looks good! Thanks Matteo for the fix! region_mover.rb uses the removed HConnection.getZooKeeperWatcher() method - Key: HBASE-9735 URL: https://issues.apache.org/jira/browse/HBASE-9735 Project: HBase Issue Type: Bug Components: scripts Affects Versions: 0.96.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Fix For: 0.98.0, 0.96.0 Attachments: HBASE-9735-v0.patch [~aleksshulman] found that region_mover.rb is using HConnection.getZooKeeperWatcher(), which is deprecated in 94 and removed in 96. {code} 14:02:34 2013-09-16 14:02:34,945 INFO [main] region_mover: Moving 7 region(s) from c5-rolling2-4.ent.cloudera.com,60020,1379364656888 during this cycle 14:02:34 [c5-rolling2-2.ent.cloudera.com] out: 2013-09-16 14:02:34,951 INFO [main] region_mover: Moving region 1588230740 (0 of 7) to server=c5-rolling2-2.ent.cloudera.com,60020,1379365188814 14:02:35 [c5-rolling2-2.ent.cloudera.com] out: NoMethodError: undefined method `getZooKeeperWatcher' for ##Class:0x1fe91485:0x465098f9 14:02:35 [c5-rolling2-2.ent.cloudera.com] out: getServerNameForRegion at /usr/lib/hbase/bin/region_mover.rb:91 14:02:35 [c5-rolling2-2.ent.cloudera.com] out: isSameServer at /usr/lib/hbase/bin/region_mover.rb:73 14:02:35 [c5-rolling2-2.ent.cloudera.com] out: move at /usr/lib/hbase/bin/region_mover.rb:157 14:02:35 [c5-rolling2-2.ent.cloudera.com] out: __for__ at /usr/lib/hbase/bin/region_mover.rb:327 14:02:35 [c5-rolling2-2.ent.cloudera.com] out: each at file:/usr/lib/hbase/lib/jruby-complete-1.6.8.jar!/builtin/java/java.util.rb:7 14:02:35 [c5-rolling2-2.ent.cloudera.com] out:unloadRegions at /usr/lib/hbase/bin/region_mover.rb:318 14:02:35 [c5-rolling2-2.ent.cloudera.com] out: (root) at /usr/lib/hbase/bin/region_mover.rb:456 {code} -- This message was sent by 
Atlassian JIRA (v6.1#6144)
[jira] [Created] (HBASE-9663) PerformanceEvaluation does not properly honor specified table name parameter
Aleksandr Shulman created HBASE-9663: Summary: PerformanceEvaluation does not properly honor specified table name parameter Key: HBASE-9663 URL: https://issues.apache.org/jira/browse/HBASE-9663 Project: HBase Issue Type: Bug Components: Client, test Reporter: Aleksandr Shulman Fix For: 0.94.13, 0.96.1, 0.95.2 Expected behavior: A user should be able to specify a given table for PerformanceEvaluation and have that table be used. That table does not need to exist. If it doesn't exist, PE will create it. Observed behavior: In creating the job, PE will use the new table name. However, the map tasks will fail because they are still looking for TestTable, which is not there. Potential causes: In the PE code, we see that the table name is not passed as an argument to MR: https://github.com/apache/hbase/blob/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/PerformanceEvaluation.java#L723 Command: {code} hbase org.apache.hadoop.hbase.PerformanceEvaluation --table=t2 sequentialWrite 2{code} Output: {code}initiating session 13/09/26 00:36:02 INFO zookeeper.ClientCnxn: Session establishment complete on server REDACTED/10.20.187.137:2181, sessionid = 0x14159256f9b0031, negotiated timeout = 18 13/09/26 00:36:02 DEBUG client.HConnectionManager$HConnectionImplementation: Looked up root region location, connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@c8d427c; serverName=REDACTED,60020,1380180157520 13/09/26 00:36:02 DEBUG client.HConnectionManager$HConnectionImplementation: Cached location for .META.,,1.1028785192 is REDACTED:60020 13/09/26 00:36:02 DEBUG client.ClientScanner: Creating scanner over .META. 
starting at key 't2,,' 13/09/26 00:36:02 DEBUG client.ClientScanner: Advancing internal scanner to startKey at 't2,,' 13/09/26 00:36:02 DEBUG catalog.CatalogTracker: Stopping catalog tracker org.apache.hadoop.hbase.catalog.CatalogTracker@466e06d7 13/09/26 00:36:02 INFO zookeeper.ZooKeeper: Session: 0x14159256f9b0031 closed 13/09/26 00:36:02 INFO zookeeper.ClientCnxn: EventThread shut down 13/09/26 00:36:02 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=REDACTED:2181 sessionTimeout=18 watcher=catalogtracker-on-org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@c8d427c 13/09/26 00:36:02 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is 10390@REDACTED 13/09/26 00:36:02 DEBUG catalog.CatalogTracker: Starting catalog tracker org.apache.hadoop.hbase.catalog.CatalogTracker@5152cfbb 13/09/26 00:36:02 INFO zookeeper.ClientCnxn: Opening socket connection to server REDACTED/10.20.187.137:2181. Will not attempt to authenticate using SASL (unknown error) 13/09/26 00:36:02 INFO zookeeper.ClientCnxn: Socket connection established to REDACTED/10.20.187.137:2181, initiating session 13/09/26 00:36:02 INFO zookeeper.ClientCnxn: Session establishment complete on server REDACTED/10.20.187.137:2181, sessionid = 0x14159256f9b0032, negotiated timeout = 18 13/09/26 00:36:02 DEBUG client.ClientScanner: Creating scanner over .META. starting at key 't2,,' 13/09/26 00:36:02 DEBUG client.ClientScanner: Advancing internal scanner to startKey at 't2,,' 13/09/26 00:36:02 DEBUG catalog.CatalogTracker: Stopping catalog tracker org.apache.hadoop.hbase.catalog.CatalogTracker@5152cfbb 13/09/26 00:36:03 INFO zookeeper.ZooKeeper: Session: 0x14159256f9b0032 closed 13/09/26 00:36:03 INFO zookeeper.ClientCnxn: EventThread shut down 13/09/26 00:36:04 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same. 
13/09/26 00:36:06 INFO input.FileInputFormat: Total input paths to process : 1 13/09/26 00:36:06 DEBUG hbase.PerformanceEvaluation: split[0] startRow=1363147 rows=104857 totalRows=2097152 clients=2 flushCommits=true writeToWAL=true 13/09/26 00:36:06 DEBUG hbase.PerformanceEvaluation: split[1] startRow=1468004 rows=104857 totalRows=2097152 clients=2 flushCommits=true writeToWAL=true 13/09/26 00:36:06 DEBUG hbase.PerformanceEvaluation: split[2] startRow=1887432 rows=104857 totalRows=2097152 clients=2 flushCommits=true writeToWAL=true 13/09/26 00:36:06 DEBUG hbase.PerformanceEvaluation: split[3] startRow=209714 rows=104857 totalRows=2097152 clients=2 flushCommits=true writeToWAL=true 13/09/26 00:36:06 DEBUG hbase.PerformanceEvaluation: split[4] startRow=524285 rows=104857 totalRows=2097152 clients=2 flushCommits=true writeToWAL=true 13/09/26 00:36:06 DEBUG hbase.PerformanceEvaluation: split[5] startRow=1048576 rows=104857 totalRows=2097152 clients=2 flushCommits=true writeToWAL=true 13/09/26 00:36:06 DEBUG hbase.PerformanceEvaluation: split[6] startRow=1572861 rows=104857 totalRows=2097152 clients=2 flushCommits=true writeToWAL=true 13/09/26 00:36:06 DEBUG hbase.PerformanceEvaluation: split[7]
[jira] [Commented] (HBASE-9603) IsRestoreSnapshotDoneResponse has wrong default causing restoreSnapshot() to be async
[ https://issues.apache.org/jira/browse/HBASE-9603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773552#comment-13773552 ] Aleksandr Shulman commented on HBASE-9603: -- Patch looks good, Matteo. +1 IsRestoreSnapshotDoneResponse has wrong default causing restoreSnapshot() to be async - Key: HBASE-9603 URL: https://issues.apache.org/jira/browse/HBASE-9603 Project: HBase Issue Type: Bug Components: snapshots Affects Versions: 0.96.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Fix For: 0.96.0 Attachments: HBASE-9603-v0.patch The done field of IsRestoreSnapshotDoneRequest is set to true, which causes restoreSnapshot() to not wait until the restore is done, resulting in async behavior. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
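Why the default of a "done" flag matters can be shown with a small polling sketch. It is not the HBase client code: `RestorePollSketch`, `DoneResponse`, and `waitUntilDone` are hypothetical names, and the "master" is simulated by a lambda. If the field defaulted to true, the very first poll would return and the caller would proceed before the restore had finished.

```java
import java.util.function.Supplier;

// Hypothetical polling client; not an HBase class.
public class RestorePollSketch {
    // Hypothetical response: 'done' defaults to false, so an unset field means "keep waiting".
    public static final class DoneResponse {
        private final boolean done;
        public DoneResponse() { this(false); }
        public DoneResponse(boolean done) { this.done = done; }
        public boolean isDone() { return done; }
    }

    // Polls the status call until it reports done; returns how many polls were needed.
    // A real client would sleep between polls; that is omitted to keep the sketch short.
    public static int waitUntilDone(Supplier<DoneResponse> statusCall, int maxPolls) {
        for (int polls = 1; polls <= maxPolls; polls++) {
            if (statusCall.get().isDone()) {
                return polls;
            }
        }
        throw new IllegalStateException("restore did not finish within " + maxPolls + " polls");
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated master: reports done only on the third status call.
        Supplier<DoneResponse> master =
            () -> ++calls[0] >= 3 ? new DoneResponse(true) : new DoneResponse();
        System.out.println("polls needed: " + waitUntilDone(master, 10));
    }
}
```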
[jira] [Commented] (HBASE-9153) Introduce/update a script to generate jdiff reports
[ https://issues.apache.org/jira/browse/HBASE-9153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772094#comment-13772094 ] Aleksandr Shulman commented on HBASE-9153: -- Looks like it's not going in smoothly in a few places. What can I do to help? Introduce/update a script to generate jdiff reports --- Key: HBASE-9153 URL: https://issues.apache.org/jira/browse/HBASE-9153 Project: HBase Issue Type: Task Reporter: Jonathan Hsieh Assignee: Aleksandr Shulman Fix For: 0.98.0, 0.96.0, 0.94.13 Attachments: HBASE-9153-v1.patch, HBASE-9153-v3.patch, HBASE-9153-v4-0.94.patch, HBASE-9153-v4-0.95.patch, HBASE-9153-v4-trunk.patch, HBASE-9153-v5-0.94.patch, HBASE-9153-v5-0.95.patch, HBASE-9153-v5-trunk.patch, HBASE-9153-v6-0.94.patch, HBASE-9153-v6-0.95.patch, HBASE-9153-v6-trunk.patch We've had a few issues now where we've removed API's without deprecating or deprecating in the late release. (HBASE-9142, HBASE-9093) We should just have a tool that enforces our api deprecation policy as a release time check or as a precommit check. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-9533) List of dependency jars for MR jobs is hard-coded and does not include netty, breaking MRv1 jobs
[ https://issues.apache.org/jira/browse/HBASE-9533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772239#comment-13772239 ] Aleksandr Shulman commented on HBASE-9533: -- [~saint@gmail.com] and I took a look at the issue and discovered that the dependency is being excluded by zookeeper in 0.96 and trunk. Removing the exclusion fixes the problem. Testing: We have automation that runs MRv1 over HBase that failed because of this issue. When we removed the exclusion and ran it from a custom branch, it passed. The branch is the latest 0.96 + the patch. https://github.com/AleksandrShulman/hbase/commit/5f7df8e7b08eebe2d28337e2eb0750acea21d51d After the patch is applied, MRv1 and MRv2 both work for a regular pi job (MR only) and a rowcounter job (MR over HBase) The patch is straightforward. I will attach it shortly. List of dependency jars for MR jobs is hard-coded and does not include netty, breaking MRv1 jobs Key: HBASE-9533 URL: https://issues.apache.org/jira/browse/HBASE-9533 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.95.2, 0.96.1 Reporter: Aleksandr Shulman Assignee: Matteo Bertozzi Fix For: 0.95.2, 0.96.1 Attachments: failed_mrv1_rowcounter_tt_taskoutput.out Observed behavior: Against trunk, using MRv1 with hadoop 1.0.4, r1393290, I am able to run MRv1 jobs (e.g. pi 2 4). However, when I use it to run MR over HBase jobs, they fail with the stack trace below. From the trace, the issue seems to be that it cannot find a class that the netty jar contains. This would make sense, given that the dependency jars that we use for the MapReduce job are hard-coded, and that the netty jar is not one of them. https://github.com/apache/hbase/blob/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableMapReduceUtil.java#L519 Strangely, this is only an issue in trunk, not 0.95, even though the code hasn't changed. 
Command: {code}/bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter sampletable{code} TT logs (attached) Output from console running job: {code}13/09/13 16:02:58 INFO mapred.JobClient: Task Id : attempt_201309131601_0002_m_00_2, Status : FAILED java.io.IOException: Cannot create a record reader because of a previous error. Please look at the previous logs lines from the task's full log for more details. at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.createRecordReader(TableInputFormatBase.java:119) at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.init(MapTask.java:489) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:731) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149) at org.apache.hadoop.mapred.Child.main(Child.java:249) 13/09/13 16:03:09 INFO mapred.JobClient: Job complete: job_201309131601_0002 13/09/13 16:03:09 INFO mapred.JobClient: Counters: 7 13/09/13 16:03:09 INFO mapred.JobClient: Job Counters 13/09/13 16:03:09 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=29913 13/09/13 16:03:09 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0 13/09/13 16:03:09 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0 13/09/13 16:03:09 INFO mapred.JobClient: Launched map tasks=4 13/09/13 16:03:09 INFO mapred.JobClient: Data-local map tasks=4 13/09/13 16:03:09 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0 13/09/13 16:03:09 INFO mapred.JobClient: Failed map tasks=1{code} Expected behavior: As a stopgap, the netty jar should be included in that list. More generally, there should be a more elegant way to include the jars that are needed. -- This message is automatically generated by JIRA. 
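The "more elegant way to include the jars that are needed" asked for above could start from the classes a job actually uses instead of a hard-coded list. The sketch below shows only the standard-library step of finding where a class was loaded from; `JarLocatorSketch` and `locate` are hypothetical names, and how the resulting paths would be shipped with the MapReduce job is deliberately left out.

```java
import java.security.CodeSource;

// Hypothetical helper; not part of HBase or Hadoop.
public class JarLocatorSketch {
    // Returns the location (jar or class directory) a class was loaded from,
    // or null when the JVM does not expose one (typical for core JDK classes).
    public static String locate(Class<?> clazz) {
        CodeSource source = clazz.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return null;
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) {
        System.out.println("this class: " + locate(JarLocatorSketch.class));
        System.out.println("java.lang.String: " + locate(String.class));
    }
}
```

Applied to a class from the netty jar, a lookup like this would yield the jar that the failing map tasks were missing, assuming that jar is on the submitting client's classpath.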
[jira] [Commented] (HBASE-9533) List of dependency jars for MR jobs is hard-coded and does not include netty, breaking MRv1 jobs
[ https://issues.apache.org/jira/browse/HBASE-9533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772585#comment-13772585 ] Aleksandr Shulman commented on HBASE-9533: -- I'm +1 on the patch for 0.96 and trunk. List of dependency jars for MR jobs is hard-coded and does not include netty, breaking MRv1 jobs Key: HBASE-9533 URL: https://issues.apache.org/jira/browse/HBASE-9533 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.95.2, 0.96.1 Reporter: Aleksandr Shulman Assignee: Matteo Bertozzi Priority: Blocker Fix For: 0.96.0 Attachments: 9533.txt, 9533v3.txt, failed_mrv1_rowcounter_tt_taskoutput.out Observed behavior: Against trunk, using MRv1 with hadoop 1.0.4, r1393290, I am able to run MRv1 jobs (e.g. pi 2 4). However, when I use it to run MR over HBase jobs, they fail with the stack trace below. From the trace, the issue seems to be that it cannot find a class that the netty jar contains. This would make sense, given that the dependency jars that we use for the MapReduce job are hard-coded, and that the netty jar is not one of them. https://github.com/apache/hbase/blob/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableMapReduceUtil.java#L519 Strangely, this is only an issue in trunk, not 0.95, even though the code hasn't changed. Command: {code}/bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter sampletable{code} TT logs (attached) Output from console running job: {code}13/09/13 16:02:58 INFO mapred.JobClient: Task Id : attempt_201309131601_0002_m_00_2, Status : FAILED java.io.IOException: Cannot create a record reader because of a previous error. Please look at the previous logs lines from the task's full log for more details. 
at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.createRecordReader(TableInputFormatBase.java:119) at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.init(MapTask.java:489) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:731) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149) at org.apache.hadoop.mapred.Child.main(Child.java:249) 13/09/13 16:03:09 INFO mapred.JobClient: Job complete: job_201309131601_0002 13/09/13 16:03:09 INFO mapred.JobClient: Counters: 7 13/09/13 16:03:09 INFO mapred.JobClient: Job Counters 13/09/13 16:03:09 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=29913 13/09/13 16:03:09 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0 13/09/13 16:03:09 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0 13/09/13 16:03:09 INFO mapred.JobClient: Launched map tasks=4 13/09/13 16:03:09 INFO mapred.JobClient: Data-local map tasks=4 13/09/13 16:03:09 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0 13/09/13 16:03:09 INFO mapred.JobClient: Failed map tasks=1{code} Expected behavior: As a stopgap, the netty jar should be included in that list. More generally, there should be a more elegant way to include the jars that are needed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-9153) Introduce/update a script to generate jdiff reports
[ https://issues.apache.org/jira/browse/HBASE-9153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771283#comment-13771283 ] Aleksandr Shulman commented on HBASE-9153: -- Hey Nick, thanks for the commit. In terms of release note, here is what I propose: Added a tool to generate a report of API compatibility between different versions of hbase. It is found in dev-support and uses JDiff under the covers. Usage info at the top of the script. I'm not sure if this is exactly what you're looking for, but we can adjust as necessary. Introduce/update a script to generate jdiff reports --- Key: HBASE-9153 URL: https://issues.apache.org/jira/browse/HBASE-9153 Project: HBase Issue Type: Task Reporter: Jonathan Hsieh Assignee: Aleksandr Shulman Fix For: 0.98.0, 0.96.0 Attachments: HBASE-9153-v1.patch, HBASE-9153-v3.patch, HBASE-9153-v4-0.94.patch, HBASE-9153-v4-0.95.patch, HBASE-9153-v4-trunk.patch, HBASE-9153-v5-0.94.patch, HBASE-9153-v5-0.95.patch, HBASE-9153-v5-trunk.patch, HBASE-9153-v6-0.94.patch, HBASE-9153-v6-0.95.patch, HBASE-9153-v6-trunk.patch We've had a few issues now where we've removed API's without deprecating or deprecating in the late release. (HBASE-9142, HBASE-9093) We should just have a tool that enforces our api deprecation policy as a release time check or as a precommit check. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-9153) Create a deprecation policy enforcement check
[ https://issues.apache.org/jira/browse/HBASE-9153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Shulman updated HBASE-9153: - Attachment: HBASE-9153-v5-0.95.patch HBASE-9153-v5-0.94.patch HBASE-9153-v5-trunk.patch File name fixed in the comments. Create a deprecation policy enforcement check - Key: HBASE-9153 URL: https://issues.apache.org/jira/browse/HBASE-9153 Project: HBase Issue Type: Task Reporter: Jonathan Hsieh Attachments: HBASE-9153-v1.patch, HBASE-9153-v3.patch, HBASE-9153-v4-0.94.patch, HBASE-9153-v4-0.95.patch, HBASE-9153-v4-trunk.patch, HBASE-9153-v5-0.94.patch, HBASE-9153-v5-0.95.patch, HBASE-9153-v5-trunk.patch We've had a few issues now where we've removed API's without deprecating or deprecating in the late release. (HBASE-9142, HBASE-9093) We should just have a tool that enforces our api deprecation policy as a release time check or as a precommit check. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-9153) Create a deprecation policy enforcement check
[ https://issues.apache.org/jira/browse/HBASE-9153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Shulman updated HBASE-9153: - Attachment: HBASE-9153-v6-trunk.patch HBASE-9153-v6-0.95.patch HBASE-9153-v6-0.94.patch Had to fix one last thing. Create a deprecation policy enforcement check - Key: HBASE-9153 URL: https://issues.apache.org/jira/browse/HBASE-9153 Project: HBase Issue Type: Task Reporter: Jonathan Hsieh Attachments: HBASE-9153-v1.patch, HBASE-9153-v3.patch, HBASE-9153-v4-0.94.patch, HBASE-9153-v4-0.95.patch, HBASE-9153-v4-trunk.patch, HBASE-9153-v5-0.94.patch, HBASE-9153-v5-0.95.patch, HBASE-9153-v5-trunk.patch, HBASE-9153-v6-0.94.patch, HBASE-9153-v6-0.95.patch, HBASE-9153-v6-trunk.patch We've had a few issues now where we've removed API's without deprecating or deprecating in the late release. (HBASE-9142, HBASE-9093) We should just have a tool that enforces our api deprecation policy as a release time check or as a precommit check. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira