[jira] [Commented] (HBASE-20952) Re-visit the WAL API

2018-10-23 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661751#comment-16661751
 ] 

Hudson commented on HBASE-20952:


Results for branch HBASE-20952
[build #27 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20952/27/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20952/27//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20952/27//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20952/27//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Re-visit the WAL API
> 
>
> Key: HBASE-20952
> URL: https://issues.apache.org/jira/browse/HBASE-20952
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: Josh Elser
>Priority: Major
> Attachments: 20952.v1.txt
>
>
> Take a step back from the current WAL implementations and think about what an 
> HBase WAL API should look like. What are the primitive calls that we require 
> to guarantee durability of writes with a high degree of performance?
> The API needs to take the current implementations into consideration. We 
> should also have a mind for what is happening in the Ratis LogService (but 
> the LogService should not dictate what HBase's WAL API looks like RATIS-272).
> Other "systems" inside of HBase that use WALs are replication and 
> backup. Replication has the use-case for "tail"'ing the WAL, which we 
> should provide via our new API. Backup doesn't do anything fancy (IIRC). We 
> should make sure all consumers are generally going to be OK with the API we 
> create.
> The API may be "OK" (or OK in part). We also need to consider other methods 
> which were "bolted" on, such as {{AbstractFSWAL}} and 
> {{WALFileLengthProvider}}. Other corners of "WAL use" (like the 
> {{WALSplitter}}) should also be looked at to use WAL-APIs only.
> We also need to make sure that adequate interface audience and stability 
> annotations are chosen.
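
To make the question about primitive calls concrete, here is a minimal Java sketch of what such an API might contain. Everything in it is hypothetical: `SimpleWal`, `append`, `sync`, and `tail` are illustrative names, not HBase's existing WAL interface and not a proposal taken from this issue. The in-memory implementation only demonstrates one possible contract: an entry counts as durable once `sync` has covered its sequence id, and a consumer such as replication can tail durable entries from a given id.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal WAL contract; names are illustrative, not HBase's API. */
interface SimpleWal {
  /** Buffers an entry and returns its sequence id. The entry is not yet durable. */
  long append(byte[] entry);

  /** Makes every entry with sequence id <= upTo durable. */
  void sync(long upTo);

  /** Returns durable entries with sequence id >= from, for consumers that tail the log. */
  List<byte[]> tail(long from);
}

/** Toy in-memory implementation used only to show the contract. */
class InMemoryWal implements SimpleWal {
  private final List<byte[]> entries = new ArrayList<>();
  private long syncedUpTo = -1; // highest durable sequence id

  @Override
  public synchronized long append(byte[] entry) {
    entries.add(entry.clone());
    return entries.size() - 1;
  }

  @Override
  public synchronized void sync(long upTo) {
    // A real implementation would flush to stable storage here.
    syncedUpTo = Math.max(syncedUpTo, Math.min(upTo, entries.size() - 1));
  }

  @Override
  public synchronized List<byte[]> tail(long from) {
    List<byte[]> out = new ArrayList<>();
    for (long i = Math.max(from, 0); i <= syncedUpTo; i++) {
      out.add(entries.get((int) i));
    }
    return out;
  }
}
```

In this sketch, entries appended but not yet synced are invisible to `tail`, which is one way to keep readers from observing writes that could still be lost.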



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21342) FileSystem in use may get closed by other bulk load call in secure bulkLoad

2018-10-23 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661744#comment-16661744
 ] 

Hudson commented on HBASE-21342:


Results for branch branch-2.0
[build #1004 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1004/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1004//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1004//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1004//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


> FileSystem in use may get closed by other bulk load call  in secure bulkLoad
> 
>
> Key: HBASE-21342
> URL: https://issues.apache.org/jira/browse/HBASE-21342
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.1.0, 1.5.0, 1.3.3, 1.4.4, 2.0.1, 1.2.7
>Reporter: mazhenlin
>Assignee: mazhenlin
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: 21342.v1.txt, HBASE-21342.002.patch, 
> HBASE-21342.003.patch, HBASE-21342.004.patch, HBASE-21342.005.patch, 
> HBASE-21342.006.patch, HBASE-21342.007.patch, race.patch
>
>
> As mentioned in [HBASE-15291|#HBASE-15291], there is a race condition. If 
> two secure bulkload calls from the same UGI go into two different regions and 
> one region finishes earlier, it will close the bulk load fs, and the other 
> region will fail.
>  
> Another case would be more serious. The FileSystem.close() function needs two 
> synchronized variables: CACHE and deleteOnExit. If one region calls 
> FileSystem.closeAllForUGI (in SecureBulkLoadManager.cleanupBulkLoad) while 
> another region is trying to close srcFS (in 
> SecureBulkLoadListener.closeSrcFs), a deadlock can occur.
>  
> I have written a UT for this and fixed it using a reference counter.
>  
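
The reference-counting idea can be sketched as follows. This is a simplified illustration, not the code from the attached patches: `RefCountedResources` and its method names are hypothetical, a plain `AutoCloseable` stands in for the Hadoop `FileSystem`, and a string key stands in for the UGI. The shared resource is closed only when the last in-flight bulk load for that key finishes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical helper: shares one resource per key and closes it on the last release. */
class RefCountedResources<R extends AutoCloseable> {
  private static final class Entry<R> {
    final R resource;
    int refs;

    Entry(R resource) {
      this.resource = resource;
    }
  }

  private final Map<String, Entry<R>> entries = new HashMap<>();

  /** Called when a bulk load starts; creates the resource on first use. */
  synchronized R acquire(String key, Supplier<R> factory) {
    Entry<R> e = entries.computeIfAbsent(key, k -> new Entry<>(factory.get()));
    e.refs++;
    return e.resource;
  }

  /** Called when a bulk load finishes; closes the resource only at zero references. */
  synchronized void release(String key) {
    Entry<R> e = entries.get(key);
    if (e == null) {
      return; // nothing was acquired for this key
    }
    if (--e.refs == 0) {
      entries.remove(key);
      try {
        e.resource.close();
      } catch (Exception ex) {
        throw new IllegalStateException("failed to close resource for " + key, ex);
      }
    }
  }
}
```

With two concurrent bulk loads for the same key, the first `release` leaves the resource open and the second one closes it, so the load that finishes early can no longer close a resource the other is still using.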





[jira] [Commented] (HBASE-21338) [balancer] If balancer is an ill-fit for cluster size, it gives little indication

2018-10-23 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661742#comment-16661742
 ] 

Hudson commented on HBASE-21338:


Results for branch branch-2.0
[build #1004 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1004/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1004//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1004//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1004//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


> [balancer] If balancer is an ill-fit for cluster size, it gives little 
> indication
> -
>
> Key: HBASE-21338
> URL: https://issues.apache.org/jira/browse/HBASE-21338
> Project: HBase
>  Issue Type: Sub-task
>  Components: Balancer, Operability
>Reporter: stack
>Assignee: Xu Cang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21338.master.001.patch
>
>
> See parent issue. When running the balancer on a cluster where the max steps 
> setting was far too low, the balancer gave little to no indication that it was 
> ill-configured. In fact, it only logged that it was starting and then that 
> there was nothing to do, even though the cluster was obviously out of whack.
> Ideally the balancer would complain when, say, the maxSteps limit is a small 
> fraction of what the cluster's calculated max steps are, or it would notice 
> that the balancer is making little progress on an imbalanced cluster and 
> shout. Can we set balancer configs w/o having to restart Master?
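
The first suggestion, warning when the configured limit is a small fraction of the calculated one, could look roughly like the check below. This is an illustrative sketch only: `BalancerStepCheck`, `isLimitTooSmall`, `warningFor`, and the 10% threshold are assumptions made for the example, not names or values from the HBase balancer.

```java
/** Hypothetical check; names and the threshold are illustrative, not HBase's. */
final class BalancerStepCheck {
  /** Assumed threshold: warn when the limit covers under 10% of the calculated steps. */
  static final double MIN_FRACTION = 0.10;

  private BalancerStepCheck() {}

  /** Returns true when maxSteps is too small relative to calculatedMaxSteps. */
  static boolean isLimitTooSmall(long maxSteps, long calculatedMaxSteps) {
    if (calculatedMaxSteps <= 0) {
      return false; // nothing to balance, so no limit can be too small
    }
    return (double) maxSteps / calculatedMaxSteps < MIN_FRACTION;
  }

  /** Builds the warning an operator would see, or null when the limit looks adequate. */
  static String warningFor(long maxSteps, long calculatedMaxSteps) {
    if (!isLimitTooSmall(maxSteps, calculatedMaxSteps)) {
      return null;
    }
    return String.format(
        "maxSteps=%d is below %.0f%% of calculated max steps %d; balancer may make little progress",
        maxSteps, MIN_FRACTION * 100, calculatedMaxSteps);
  }
}
```

A caller would log the returned message at WARN level before starting a balancer run, so that an ill-fitting limit is visible instead of silently producing "nothing to do".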





[jira] [Commented] (HBASE-21349) Cluster is going down but CatalogJanitor and Normalizer try to run and fail noisely

2018-10-23 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661743#comment-16661743
 ] 

Hudson commented on HBASE-21349:


Results for branch branch-2.0
[build #1004 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1004/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1004//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1004//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1004//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


> Cluster is going down but CatalogJanitor and Normalizer try to run and fail 
> noisely
> ---
>
> Key: HBASE-21349
> URL: https://issues.apache.org/jira/browse/HBASE-21349
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: stack
>Assignee: Xu Cang
>Priority: Minor
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21349.master.002.patch, 
> HBASE-21349.master.002.patch, HBASE-21349.master.002.patch, 
> HBASE-22349.master.001.patch
>
>
> Shutting down can take a while. Meanwhile the catalog janitor and/or normalizer 
> (etc?) try to run, and when they can't, they fail noisily. Looks bad:
> {code}
> 2018-10-19 21:23:24,962 INFO org.apache.hadoop.hbase.master.ServerManager: Cluster shutdown set; vc1205.halxg.cloudera.com,22101,1539991730711 expired; onlineServers=51
> 2018-10-19 21:25:54,502 WARN org.apache.hadoop.hbase.master.CatalogJanitor: Failed scan of catalog table
> java.io.IOException: connection is closed
> at org.apache.hadoop.hbase.MetaTableAccessor.getMetaHTable(MetaTableAccessor.java:267)
> at org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:763)
> at org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:734)
> at org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:684)
> at org.apache.hadoop.hbase.MetaTableAccessor.scanMetaForTableRegions(MetaTableAccessor.java:679)
> at org.apache.hadoop.hbase.master.CatalogJanitor.getMergedRegionsAndSplitParents(CatalogJanitor.java:185)
> at org.apache.hadoop.hbase.master.CatalogJanitor.getMergedRegionsAndSplitParents(CatalogJanitor.java:137)
> at org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:243)
> at org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:116)
> at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:186)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:111)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2018-10-19 21:25:54,507 ERROR org.apache.hadoop.hbase.master.normalizer.RegionNormalizerChore: Failed to normalize regions.
> java.io.IOException: connection is closed
> at org.apache.hadoop.hbase.MetaTableAccessor.getMetaHTable(MetaTableAccessor.java:267)
> at org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:763)
> at org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:734)
> at org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:690)
> at org.apache.hadoop.hbase.MetaTableAccessor.fullScanTables(MetaTableAccessor.java:240)
> at org.apache.hadoop.hbase.master.TableStateManager.getTablesInStates(TableStateManager.java:189)
> at org.apache.hadoop.hbase.master.HMaster.normalizeRegions(HMaster.java:1718)
> at 
> 

[jira] [Commented] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore

2018-10-23 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661729#comment-16661729
 ] 

Hadoop QA commented on HBASE-21363:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
9s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
 6s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
27s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
18s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
33s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
36s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
32s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
11m 13s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
36s{color} | {color:green} hbase-procedure in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
15s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 41m 33s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21363 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12945335/HBASE-21363-v4.patch |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux b207abd0da91 4.4.0-134-generic #160~14.04.1-Ubuntu SMP Fri Aug 
17 11:07:07 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh
 |
| git revision | master / 1f437ac221 |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14840/testReport/ |
| Max. process+thread count | 273 (vs. ulimit of 1) |
| modules | C: hbase-procedure U: hbase-procedure |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14840/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Rewrite the 

[jira] [Commented] (HBASE-21325) Force to terminate regionserver when abort hang in somewhere

2018-10-23 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661720#comment-16661720
 ] 

Hadoop QA commented on HBASE-21325:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
11s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
16s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
50s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
15s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
26s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
57s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
 5s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 
32s{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 32s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m 
12s{color} | {color:red} hbase-server: The patch generated 3 new + 76 unchanged 
- 0 fixed = 79 total (was 76) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
15s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
10m 26s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}133m 55s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}174m 13s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hbase.replication.TestSyncReplicationRemoveRemoteWAL |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21325 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12945315/HBASE-21325.master.003.patch
 |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 3106f256c7e3 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
10:45:36 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 1f437ac221 |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| compile | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14836/artifact/patchprocess/patch-compile-hbase-server.txt
 |
| javac | 

[jira] [Commented] (HBASE-21338) [balancer] If balancer is an ill-fit for cluster size, it gives little indication

2018-10-23 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661716#comment-16661716
 ] 

Hudson commented on HBASE-21338:


Results for branch branch-2.1
[build #522 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/522/]: 
(/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/522//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/522//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/522//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> [balancer] If balancer is an ill-fit for cluster size, it gives little 
> indication
> -
>
> Key: HBASE-21338
> URL: https://issues.apache.org/jira/browse/HBASE-21338
> Project: HBase
>  Issue Type: Sub-task
>  Components: Balancer, Operability
>Reporter: stack
>Assignee: Xu Cang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21338.master.001.patch
>
>
> See parent issue. When running the balancer on a cluster where the max steps 
> setting was far too low, the balancer gave little to no indication that it was 
> ill-configured. In fact, it only logged that it was starting and then that 
> there was nothing to do, even though the cluster was obviously out of whack.
> Ideally the balancer would complain when, say, the maxSteps limit is a small 
> fraction of what the cluster's calculated max steps are, or it would notice 
> that the balancer is making little progress on an imbalanced cluster and 
> shout. Can we set balancer configs w/o having to restart Master?





[jira] [Commented] (HBASE-21342) FileSystem in use may get closed by other bulk load call in secure bulkLoad

2018-10-23 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661718#comment-16661718
 ] 

Hudson commented on HBASE-21342:


Results for branch branch-2.1
[build #522 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/522/]: 
(/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/522//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/522//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/522//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> FileSystem in use may get closed by other bulk load call  in secure bulkLoad
> 
>
> Key: HBASE-21342
> URL: https://issues.apache.org/jira/browse/HBASE-21342
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.1.0, 1.5.0, 1.3.3, 1.4.4, 2.0.1, 1.2.7
>Reporter: mazhenlin
>Assignee: mazhenlin
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: 21342.v1.txt, HBASE-21342.002.patch, 
> HBASE-21342.003.patch, HBASE-21342.004.patch, HBASE-21342.005.patch, 
> HBASE-21342.006.patch, HBASE-21342.007.patch, race.patch
>
>
> As mentioned in [HBASE-15291|#HBASE-15291], there is a race condition. If 
> two secure bulkload calls from the same UGI go into two different regions and 
> one region finishes earlier, it will close the bulk load fs, and the other 
> region will fail.
>  
> Another case would be more serious. The FileSystem.close() function needs two 
> synchronized variables: CACHE and deleteOnExit. If one region calls 
> FileSystem.closeAllForUGI (in SecureBulkLoadManager.cleanupBulkLoad) while 
> another region is trying to close srcFS (in 
> SecureBulkLoadListener.closeSrcFs), a deadlock can occur.
>  
> I have written a UT for this and fixed it using a reference counter.
>  





[jira] [Commented] (HBASE-21349) Cluster is going down but CatalogJanitor and Normalizer try to run and fail noisely

2018-10-23 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661717#comment-16661717
 ] 

Hudson commented on HBASE-21349:


Results for branch branch-2.1
[build #522 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/522/]: 
(/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/522//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/522//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/522//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Cluster is going down but CatalogJanitor and Normalizer try to run and fail 
> noisely
> ---
>
> Key: HBASE-21349
> URL: https://issues.apache.org/jira/browse/HBASE-21349
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: stack
>Assignee: Xu Cang
>Priority: Minor
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21349.master.002.patch, 
> HBASE-21349.master.002.patch, HBASE-21349.master.002.patch, 
> HBASE-22349.master.001.patch
>
>
> Shutting down can take a while. Meanwhile the catalog janitor and/or normalizer 
> (etc?) try to run, and when they can't, they fail noisily. Looks bad:
> {code}
> 2018-10-19 21:23:24,962 INFO org.apache.hadoop.hbase.master.ServerManager: Cluster shutdown set; vc1205.halxg.cloudera.com,22101,1539991730711 expired; onlineServers=51
> 2018-10-19 21:25:54,502 WARN org.apache.hadoop.hbase.master.CatalogJanitor: Failed scan of catalog table
> java.io.IOException: connection is closed
> at org.apache.hadoop.hbase.MetaTableAccessor.getMetaHTable(MetaTableAccessor.java:267)
> at org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:763)
> at org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:734)
> at org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:684)
> at org.apache.hadoop.hbase.MetaTableAccessor.scanMetaForTableRegions(MetaTableAccessor.java:679)
> at org.apache.hadoop.hbase.master.CatalogJanitor.getMergedRegionsAndSplitParents(CatalogJanitor.java:185)
> at org.apache.hadoop.hbase.master.CatalogJanitor.getMergedRegionsAndSplitParents(CatalogJanitor.java:137)
> at org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:243)
> at org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:116)
> at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:186)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:111)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2018-10-19 21:25:54,507 ERROR org.apache.hadoop.hbase.master.normalizer.RegionNormalizerChore: Failed to normalize regions.
> java.io.IOException: connection is closed
> at org.apache.hadoop.hbase.MetaTableAccessor.getMetaHTable(MetaTableAccessor.java:267)
> at org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:763)
> at org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:734)
> at org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:690)
> at org.apache.hadoop.hbase.MetaTableAccessor.fullScanTables(MetaTableAccessor.java:240)
> at org.apache.hadoop.hbase.master.TableStateManager.getTablesInStates(TableStateManager.java:189)
> at 
> 

[jira] [Commented] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore

2018-10-23 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661708#comment-16661708
 ] 

Allan Yang commented on HBASE-21363:


{code}
+this.partial = resetDelete ? false : other.partial;
{code}
Please add a comment for this one.
The v4 patch looks great, +1 for it. You can add the comment while committing.

> Rewrite the buildingHoldCleanupTracker method in WALProcedureStore
> --
>
> Key: HBASE-21363
> URL: https://issues.apache.org/jira/browse/HBASE-21363
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21363-v1.patch, HBASE-21363-v2.patch, 
> HBASE-21363-v3.patch, HBASE-21363-v4.patch, HBASE-21363.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21372) Set hbase.assignment.maximum.attempts to Long.MAX

2018-10-23 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661707#comment-16661707
 ] 

Hadoop QA commented on HBASE-21372:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
18s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange}  
0m  0s{color} | {color:orange} The patch doesn't appear to include any new or 
modified tests. Please justify why no new tests are needed for this patch. Also 
please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} branch-2.1 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
38s{color} | {color:green} branch-2.1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
51s{color} | {color:green} branch-2.1 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
15s{color} | {color:green} branch-2.1 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
52s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
15s{color} | {color:green} branch-2.1 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
40s{color} | {color:green} branch-2.1 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
 7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
35s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  
9m 12s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 
or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}176m 
23s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}216m 59s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:42ca976 |
| JIRA Issue | HBASE-21372 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12945308/HBASE-21372.branch-2.1.001.patch
 |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux d021c3937513 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | branch-2.1 / d35f65f396 |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14833/testReport/ |
| Max. process+thread count | 4249 (vs. ulimit of 1) |
| modules | C: hbase-server U: hbase-server |
| Console output | 

[jira] [Commented] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore

2018-10-23 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661695#comment-16661695
 ] 

Duo Zhang commented on HBASE-21363:
---

Add a UT to confirm that we will reset all the deleted flags when building 
holdingCleanupTracker even if the original tracker is partial.

> Rewrite the buildingHoldCleanupTracker method in WALProcedureStore
> --
>
> Key: HBASE-21363
> URL: https://issues.apache.org/jira/browse/HBASE-21363
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21363-v1.patch, HBASE-21363-v2.patch, 
> HBASE-21363-v3.patch, HBASE-21363-v4.patch, HBASE-21363.patch
>
>






[jira] [Updated] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore

2018-10-23 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21363:
--
Attachment: HBASE-21363-v4.patch

> Rewrite the buildingHoldCleanupTracker method in WALProcedureStore
> --
>
> Key: HBASE-21363
> URL: https://issues.apache.org/jira/browse/HBASE-21363
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21363-v1.patch, HBASE-21363-v2.patch, 
> HBASE-21363-v3.patch, HBASE-21363-v4.patch, HBASE-21363.patch
>
>






[jira] [Updated] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever

2018-10-23 Thread Ankit Singhal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Singhal updated HBASE-21344:
--
Attachment: HBASE-21344.branch-2.0.003.patch

> hbase:meta location in ZooKeeper set to OPENING by the procedure which 
> eventually failed but precludes Master from assigning it forever
> ---
>
> Key: HBASE-21344
> URL: https://issues.apache.org/jira/browse/HBASE-21344
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>Priority: Major
> Fix For: 2.0.3
>
> Attachments: HBASE-21344-branch-2.0.patch, 
> HBASE-21344-branch-2.0_v2.patch, HBASE-21344-branch-2.0_v3.patch, 
> HBASE-21344.branch-2.0.003.patch
>
>
> [~elserj] has already summarized it well.
> 1. hbase:meta was on RS8
> 2. RS8 crashed, SCP was queued for it, meta first
> 3. meta was marked OFFLINE
> 4. meta marked as OPENING on RS3
> 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue
> 6. We attempt the openRegion/assignment 10 times, failing each time
> 7. We start rolling back the procedure:
> {code:java}
> 2018-10-08 06:51:24,440 WARN  [PEWorker-9] procedure2.ProcedureExecutor: 
> Usually this should not happen, we will release the lock before if the 
> procedure is finished, even if the holdLock is true, arrive here means we 
> have some holes where we do not release the lock. And the releaseLock below 
> may fail since the procedure may have already been deleted from the procedure 
> store.
> 2018-10-08 06:51:24,543 INFO  [PEWorker-9] 
> procedure.MasterProcedureScheduler: pid=48, ppid=47, 
> state=FAILED:REGION_TRANSITION_QUEUE, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 
> checking lock on 1588230740
> {code}
> {code:java}
> 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: 
> CODE-BUG: Uncaught runtime exception for pid=47, 
> state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; ServerCrashProcedure 
> server=,16020,1538974612843, splitWal=true, meta=true
> java.lang.UnsupportedOperationException: unhandled 
> state=SERVER_CRASH_GET_REGIONS
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254)
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58)
>   at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
>   at 
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981)
> {code}
> {code:java}
> { DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, 
> retries=7, started=8168 ms ago, cancelled=false, msg=Meta region is in state 
> OPENING, details=row 'backup:system' on table 'hbase:meta' at 
> region=hbase:meta,,1.1588230740, hostname=, seqNum=-1, 
> exception=java.io.IOException: Meta region is in state OPENING
> at 
> org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$null$1(ZKAsyncRegistry.java:154)
> at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
> at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
> at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
> at 
> org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$getAndConvert$0(ZKAsyncRegistry.java:77)
> at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
> at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
> at 
> 

[jira] [Commented] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore

2018-10-23 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661668#comment-16661668
 ] 

Duo Zhang commented on HBASE-21363:
---

Talked with [~allan163] offline, there are still problems. I'm writing a UT 
now. Will be back soon.

> Rewrite the buildingHoldCleanupTracker method in WALProcedureStore
> --
>
> Key: HBASE-21363
> URL: https://issues.apache.org/jira/browse/HBASE-21363
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21363-v1.patch, HBASE-21363-v2.patch, 
> HBASE-21363-v3.patch, HBASE-21363.patch
>
>






[jira] [Commented] (HBASE-21364) Procedure holds the lock should put to front of the queue after restart

2018-10-23 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661667#comment-16661667
 ] 

stack commented on HBASE-21364:
---

Thanks boys.

> Procedure holds the lock should put to front of the queue after restart
> ---
>
> Key: HBASE-21364
> URL: https://issues.apache.org/jira/browse/HBASE-21364
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.1.0, 2.0.2
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Blocker
> Fix For: 2.1.1, 2.0.3
>
> Attachments: HBASE-21364.branch-2.0.001.patch, 
> HBASE-21364.branch-2.0.002.patch
>
>
> After restoring the procedures from the Procedure WALs, we put the runnable 
> procedures back into the queue to execute. The order was not a problem before 
> HBASE-20846, since the first one to execute would acquire the lock itself. But 
> the locks are restored after HBASE-20846, so if we execute a procedure 
> without the lock before a procedure with the lock in the same queue, 
> there is a race condition in which we may not be able to execute any of the 
> procedures in that queue at all.
> The race condition is:
> 1. A procedure that needs the table's exclusive lock is put into the 
> table's queue, but the table's shared lock is held by a Region Procedure. 
> Since no one holds the exclusive lock, the queue is put into the run queue to 
> execute. But soon the worker thread sees that the procedure can't execute 
> because it doesn't hold the lock, so it stops executing and removes the queue 
> from the run queue.
> 2. At the same time, the Region procedure which holds the table's shared lock 
> and the region's exclusive lock is put into the table's queue. But since the 
> queue was already added to the run queue, it won't be added again.
> 3. Because of step 1, the table's queue is removed from the run queue.
> 4. Then no one puts the table's queue back, so no worker will execute 
> the procedures inside.
> A test case in the patch shows how.
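
The interleaving in steps 1-4 can be replayed deterministically. The following is a minimal single-threaded sketch under simplifying assumptions, not the MasterProcedureScheduler code; `RunQueueRaceSketch`, `TableQueue`, `push` and `workerGivesUp` are hypothetical names made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Single-threaded replay of the race described above. All names are hypothetical. */
class RunQueueRaceSketch {
    /** A per-table queue of procedure names, plus a flag recording run-queue membership. */
    static final class TableQueue {
        final Deque<String> procedures = new ArrayDeque<>();
        boolean inRunQueue;
    }

    /** Adds a procedure; the table queue is scheduled only if it is not scheduled already. */
    static void push(TableQueue q, Deque<TableQueue> runQueue, String procedure) {
        q.procedures.addLast(procedure);
        if (!q.inRunQueue) {
            runQueue.addLast(q);
            q.inRunQueue = true;
        }
    }

    /** The worker finds that the head procedure cannot run and drops the table queue. */
    static void workerGivesUp(TableQueue q, Deque<TableQueue> runQueue) {
        runQueue.remove(q);
        q.inRunQueue = false;
    }

    /** Returns true if the replay ends with pending procedures in an unscheduled queue. */
    static boolean replayStrandsQueue() {
        Deque<TableQueue> runQueue = new ArrayDeque<>();
        TableQueue table = new TableQueue();
        // Step 1: the procedure needing the exclusive lock arrives; the queue is scheduled.
        push(table, runQueue, "needs-exclusive-lock");
        // Step 2: the lock-holding region procedure arrives; the queue is already
        // scheduled, so it is not added to the run queue again.
        push(table, runQueue, "holds-shared-lock");
        // Step 3: the worker cannot run the head procedure and removes the queue.
        workerGivesUp(table, runQueue);
        // Step 4: nothing re-adds the queue although it still holds two procedures.
        return runQueue.isEmpty() && table.procedures.size() == 2;
    }
}
```

The replay ends with two procedures waiting in a table queue that is no longer in the run queue, which is the stuck state of step 4.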





[jira] [Commented] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever

2018-10-23 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661666#comment-16661666
 ] 

stack commented on HBASE-21344:
---

You need to add .branch-2.0. into the name of your patch [~an...@apache.org]. 
Also, this is an important patch because the left-over state will prevent us 
from getting to the wait-on-meta holding pattern.

> hbase:meta location in ZooKeeper set to OPENING by the procedure which 
> eventually failed but precludes Master from assigning it forever
> ---
>
> Key: HBASE-21344
> URL: https://issues.apache.org/jira/browse/HBASE-21344
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>Priority: Major
> Fix For: 2.0.3
>
> Attachments: HBASE-21344-branch-2.0.patch, 
> HBASE-21344-branch-2.0_v2.patch, HBASE-21344-branch-2.0_v3.patch
>
>
> [~elserj] has already summarized it well.
> 1. hbase:meta was on RS8
> 2. RS8 crashed, SCP was queued for it, meta first
> 3. meta was marked OFFLINE
> 4. meta marked as OPENING on RS3
> 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue
> 6. We attempt the openRegion/assignment 10 times, failing each time
> 7. We start rolling back the procedure:
> {code:java}
> 2018-10-08 06:51:24,440 WARN  [PEWorker-9] procedure2.ProcedureExecutor: 
> Usually this should not happen, we will release the lock before if the 
> procedure is finished, even if the holdLock is true, arrive here means we 
> have some holes where we do not release the lock. And the releaseLock below 
> may fail since the procedure may have already been deleted from the procedure 
> store.
> 2018-10-08 06:51:24,543 INFO  [PEWorker-9] 
> procedure.MasterProcedureScheduler: pid=48, ppid=47, 
> state=FAILED:REGION_TRANSITION_QUEUE, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 
> checking lock on 1588230740
> {code}
> {code:java}
> 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: 
> CODE-BUG: Uncaught runtime exception for pid=47, 
> state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; ServerCrashProcedure 
> server=,16020,1538974612843, splitWal=true, meta=true
> java.lang.UnsupportedOperationException: unhandled 
> state=SERVER_CRASH_GET_REGIONS
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254)
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58)
>   at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
>   at 
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981)
> {code}
> {code:java}
> { DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, 
> retries=7, started=8168 ms ago, cancelled=false, msg=Meta region is in state 
> OPENING, details=row 'backup:system' on table 'hbase:meta' at 
> region=hbase:meta,,1.1588230740, hostname=, seqNum=-1, 
> exception=java.io.IOException: Meta region is in state OPENING
> at 
> org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$null$1(ZKAsyncRegistry.java:154)
> at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
> at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
> at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
> at 
> org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$getAndConvert$0(ZKAsyncRegistry.java:77)
> at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
> at 
> 

[jira] [Commented] (HBASE-21254) Need to find a way to limit the number of proc wal files

2018-10-23 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661663#comment-16661663
 ] 

Duo Zhang commented on HBASE-21254:
---

Can do this later.

> Need to find a way to limit the number of proc wal files
> 
>
> Key: HBASE-21254
> URL: https://issues.apache.org/jira/browse/HBASE-21254
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21254-v1.patch, HBASE-21254-v2.patch, 
> HBASE-21254-v3.patch, HBASE-21254.patch
>
>
> For the regionserver, we have a limit on the number of WAL files; when we 
> reach the limit, we trigger a flush on specific regions so that we can delete 
> old WAL files. But for proc WALs we do not have this mechanism, and it will 
> be worse after HBASE-21233: if there is an old procedure which cannot 
> make progress and does not persist its state, we need to keep the old proc WAL 
> file forever...
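
The regionserver-side idea mentioned above (too many WAL files triggers a flush of the regions that pin the oldest files) can be sketched roughly as follows. This is a simplified illustration and not HBase's implementation; `WalLimitSketch`, `regionsToFlush` and the map layout (WAL file name to regions with unflushed edits, oldest file first) are assumptions.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: choose which regions to flush so old WAL files can be deleted. */
class WalLimitSketch {
    /**
     * @param walToRegions WAL file name mapped to the regions that still have unflushed
     *                     edits in it; iteration order must be oldest file first
     *                     (for example a LinkedHashMap filled in creation order)
     * @param maxFiles     the maximum number of WAL files we want to keep
     * @return the regions whose flush would release the excess oldest files
     */
    static Set<String> regionsToFlush(Map<String, Set<String>> walToRegions, int maxFiles) {
        Set<String> result = new LinkedHashSet<>();
        int excess = walToRegions.size() - maxFiles;
        for (Set<String> regions : walToRegions.values()) {
            if (excess <= 0) {
                break;
            }
            result.addAll(regions);
            excess--;
        }
        return result;
    }
}
```

For proc WALs the analogous step would be to make the procedures that pin the oldest files persist their state again, which is the mechanism this issue says is missing.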





[jira] [Updated] (HBASE-21357) RS should abort if OOM in Reader thread

2018-10-23 Thread Allan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Yang updated HBASE-21357:
---
   Resolution: Fixed
Fix Version/s: 1.4.9
   Status: Resolved  (was: Patch Available)

> RS should abort if OOM in Reader thread
> ---
>
> Key: HBASE-21357
> URL: https://issues.apache.org/jira/browse/HBASE-21357
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.4.8
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Fix For: 1.4.9
>
> Attachments: HBASE-21357.branch-1.001.patch, 
> HBASE-21357.branch-1.001.patch
>
>
> It is a bit strange: we abort the RS on OOM in the Listener thread, the 
> Responder thread and the CallRunner thread, but not in the Reader thread. 
> We should abort the RS if OOM happens in the Reader thread, too. If not, the 
> reader thread exits because of the OOM and its selector closes. Later 
> connections handed to this reader will be ignored:
> {code}
> try {
>   if (key.isValid()) {
>     if (key.isAcceptable())
>       doAccept(key);
>   }
> } catch (IOException ignored) {
>   if (LOG.isTraceEnabled()) LOG.trace("ignored", ignored);
> }
> {code}
> This leaves the calls of the client (or the Master and other RSs) waiting until SocketTimeout.
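
The proposed behaviour, escalating an OOM in the reader loop instead of letting the thread die silently, might look roughly like this. It is a minimal sketch and not the code from the patch; `ReaderAbortSketch`, `Abortable` and `runOnce` are hypothetical names.

```java
/** Hypothetical sketch of a reader loop step that aborts the server on OutOfMemoryError. */
class ReaderAbortSketch {
    /** Stand-in for whatever component can abort the region server. */
    interface Abortable {
        void abort(String why, Throwable cause);
    }

    /**
     * Runs one unit of reader work.
     *
     * @return true if the loop may continue, false if the server was asked to abort
     */
    static boolean runOnce(Runnable work, Abortable server) {
        try {
            work.run();
            return true;
        } catch (OutOfMemoryError e) {
            // Do not swallow the error: a dead reader would leave callers hanging
            // until their socket timeout, so ask the whole server to abort instead.
            server.abort("OOM in reader thread", e);
            return false;
        }
    }
}
```

A caller would loop on `runOnce` and leave the loop as soon as it returns false.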





[jira] [Commented] (HBASE-21357) RS should abort if OOM in Reader thread

2018-10-23 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661659#comment-16661659
 ] 

Allan Yang commented on HBASE-21357:


Pushed to branch-1, thanks [~stack] for reviewing.

> RS should abort if OOM in Reader thread
> ---
>
> Key: HBASE-21357
> URL: https://issues.apache.org/jira/browse/HBASE-21357
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.4.8
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Attachments: HBASE-21357.branch-1.001.patch, 
> HBASE-21357.branch-1.001.patch
>
>
> It is a bit strange: we abort the RS on OOM in the Listener thread, the 
> Responder thread and the CallRunner thread, but not in the Reader thread. 
> We should abort the RS if OOM happens in the Reader thread, too. If not, the 
> reader thread exits because of the OOM and its selector closes. Later 
> connections handed to this reader will be ignored:
> {code}
> try {
>   if (key.isValid()) {
>     if (key.isAcceptable())
>       doAccept(key);
>   }
> } catch (IOException ignored) {
>   if (LOG.isTraceEnabled()) LOG.trace("ignored", ignored);
> }
> {code}
> This leaves the calls of the client (or the Master and other RSs) waiting until SocketTimeout.





[jira] [Commented] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever

2018-10-23 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661658#comment-16661658
 ] 

Hadoop QA commented on HBASE-21344:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  6s{color} 
| {color:red} HBASE-21344 does not apply to branch-2. Rebase required? Wrong 
Branch? See https://yetus.apache.org/documentation/0.8.0/precommit-patchnames 
for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HBASE-21344 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12945321/HBASE-21344-branch-2.0_v3.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14837/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> hbase:meta location in ZooKeeper set to OPENING by the procedure which 
> eventually failed but precludes Master from assigning it forever
> ---
>
> Key: HBASE-21344
> URL: https://issues.apache.org/jira/browse/HBASE-21344
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>Priority: Major
> Fix For: 2.0.3
>
> Attachments: HBASE-21344-branch-2.0.patch, 
> HBASE-21344-branch-2.0_v2.patch, HBASE-21344-branch-2.0_v3.patch
>
>
> [~elserj] has already summarized it well.
> 1. hbase:meta was on RS8
> 2. RS8 crashed, SCP was queued for it, meta first
> 3. meta was marked OFFLINE
> 4. meta marked as OPENING on RS3
> 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue
> 6. We attempt the openRegion/assignment 10 times, failing each time
> 7. We start rolling back the procedure:
> {code:java}
> 2018-10-08 06:51:24,440 WARN  [PEWorker-9] procedure2.ProcedureExecutor: 
> Usually this should not happen, we will release the lock before if the 
> procedure is finished, even if the holdLock is true, arrive here means we 
> have some holes where we do not release the lock. And the releaseLock below 
> may fail since the procedure may have already been deleted from the procedure 
> store.
> 2018-10-08 06:51:24,543 INFO  [PEWorker-9] 
> procedure.MasterProcedureScheduler: pid=48, ppid=47, 
> state=FAILED:REGION_TRANSITION_QUEUE, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 
> checking lock on 1588230740
> {code}
> {code:java}
> 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: 
> CODE-BUG: Uncaught runtime exception for pid=47, 
> state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; ServerCrashProcedure 
> server=,16020,1538974612843, splitWal=true, meta=true
> java.lang.UnsupportedOperationException: unhandled 
> state=SERVER_CRASH_GET_REGIONS
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254)
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58)
>   at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
>   at 
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981)
> {code}
> {code:java}
> { DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, 
> retries=7, started=8168 ms ago, cancelled=false, msg=Meta region is in state 
> OPENING, details=row 'backup:system' on table 'hbase:meta' at 
> region=hbase:meta,,1.1588230740, hostname=, seqNum=-1, 
> exception=java.io.IOException: Meta region is in state OPENING
> at 
> 

[jira] [Updated] (HBASE-21376) Add some verbose log to MasterProcedureScheduler

2018-10-23 Thread Allan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Yang updated HBASE-21376:
---
Status: Patch Available  (was: Open)

> Add some verbose log to MasterProcedureScheduler
> 
>
> Key: HBASE-21376
> URL: https://issues.apache.org/jira/browse/HBASE-21376
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Attachments: HBASE-21376.branch-2.0.001.patch
>
>
> As discussed in HBASE-21364, we divided the patch in HBASE-21364 into two. The 
> critical one is already submitted in HBASE-21364 to branch-2.0 and 
> branch-2.1, but I also added some useful logs which need to be committed to 
> all branches.





[jira] [Updated] (HBASE-21376) Add some verbose log to MasterProcedureScheduler

2018-10-23 Thread Allan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Yang updated HBASE-21376:
---
Attachment: HBASE-21376.branch-2.0.001.patch

> Add some verbose log to MasterProcedureScheduler
> 
>
> Key: HBASE-21376
> URL: https://issues.apache.org/jira/browse/HBASE-21376
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Attachments: HBASE-21376.branch-2.0.001.patch
>
>
> As discussed in HBASE-21364, we divided the patch in HBASE-21364 into two. The 
> critical one is already submitted in HBASE-21364 to branch-2.0 and 
> branch-2.1, but I also added some useful logs which need to be committed to 
> all branches.





[jira] [Updated] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever

2018-10-23 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21344:
--
Fix Version/s: 2.0.3

> hbase:meta location in ZooKeeper set to OPENING by the procedure which 
> eventually failed but precludes Master from assigning it forever
> ---
>
> Key: HBASE-21344
> URL: https://issues.apache.org/jira/browse/HBASE-21344
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>Priority: Major
> Fix For: 2.0.3
>
> Attachments: HBASE-21344-branch-2.0.patch, 
> HBASE-21344-branch-2.0_v2.patch, HBASE-21344-branch-2.0_v3.patch
>
>
> [~elserj] has already summarized it well.
> 1. hbase:meta was on RS8
> 2. RS8 crashed, SCP was queued for it, meta first
> 3. meta was marked OFFLINE
> 4. meta marked as OPENING on RS3
> 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue
> 6. We attempt the openRegion/assignment 10 times, failing each time
> 7. We start rolling back the procedure:
> {code:java}
> 2018-10-08 06:51:24,440 WARN  [PEWorker-9] procedure2.ProcedureExecutor: 
> Usually this should not happen, we will release the lock before if the 
> procedure is finished, even if the holdLock is true, arrive here means we 
> have some holes where we do not release the lock. And the releaseLock below 
> may fail since the procedure may have already been deleted from the procedure 
> store.
> 2018-10-08 06:51:24,543 INFO  [PEWorker-9] 
> procedure.MasterProcedureScheduler: pid=48, ppid=47, 
> state=FAILED:REGION_TRANSITION_QUEUE, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 
> checking lock on 1588230740
> {code}
> {code:java}
> 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: 
> CODE-BUG: Uncaught runtime exception for pid=47, 
> state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; ServerCrashProcedure 
> server=,16020,1538974612843, splitWal=true, meta=true
> java.lang.UnsupportedOperationException: unhandled 
> state=SERVER_CRASH_GET_REGIONS
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254)
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58)
>   at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
>   at 
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981)
> {code}
> {code:java}
> { DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, 
> retries=7, started=8168 ms ago, cancelled=false, msg=Meta region is in state 
> OPENING, details=row 'backup:system' on table 'hbase:meta' at 
> region=hbase:meta,,1.1588230740, hostname=, seqNum=-1, 
> exception=java.io.IOException: Meta region is in state OPENING
> at 
> org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$null$1(ZKAsyncRegistry.java:154)
> at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
> at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
> at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
> at 
> org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$getAndConvert$0(ZKAsyncRegistry.java:77)
> at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
> at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
> at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at 
> 

[jira] [Updated] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever

2018-10-23 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21344:
--
Status: Patch Available  (was: Open)

> hbase:meta location in ZooKeeper set to OPENING by the procedure which 
> eventually failed but precludes Master from assigning it forever
> ---
>
> Key: HBASE-21344
> URL: https://issues.apache.org/jira/browse/HBASE-21344
> Project: HBase
> Issue Type: Bug
> Components: proc-v2
> Reporter: Ankit Singhal
> Assignee: Ankit Singhal
> Priority: Major
> Attachments: HBASE-21344-branch-2.0.patch, 
> HBASE-21344-branch-2.0_v2.patch, HBASE-21344-branch-2.0_v3.patch
>
>

[jira] [Commented] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever

2018-10-23 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661655#comment-16661655
 ] 

stack commented on HBASE-21344:
---

The change in HBaseTestingUtility is just formatting?

Otherwise, patch seems good.

> hbase:meta location in ZooKeeper set to OPENING by the procedure which 
> eventually failed but precludes Master from assigning it forever
> ---
>
> Key: HBASE-21344
> URL: https://issues.apache.org/jira/browse/HBASE-21344
> Project: HBase
> Issue Type: Bug
> Components: proc-v2
> Reporter: Ankit Singhal
> Assignee: Ankit Singhal
> Priority: Major
> Fix For: 2.0.3
>
> Attachments: HBASE-21344-branch-2.0.patch, 
> HBASE-21344-branch-2.0_v2.patch, HBASE-21344-branch-2.0_v3.patch
>
>

[jira] [Commented] (HBASE-21254) Need to find a way to limit the number of proc wal files

2018-10-23 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661653#comment-16661653
 ] 

Allan Yang commented on HBASE-21254:


But we could use public Entry tryLockEntry(long id, long time) instead; as it 
stands, a stuck procedure can still block the forceUpdateExecutor thread.
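
A rough sketch of the difference under discussion: a timed try-lock lets the caller give up on a stuck procedure instead of blocking indefinitely. This is not HBase code. `TimedForceUpdate` is a hypothetical class, and a plain `java.util.concurrent.locks.ReentrantLock` stands in for the per-procedure entry that `tryLockEntry(long id, long time)` would lock.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical illustration only; HBase guards procedures with its own IdLock. */
final class TimedForceUpdate {
  // Stand-in for the execution lock of a single procedure (an assumption).
  private final ReentrantLock procLock = new ReentrantLock();

  ReentrantLock procLock() {
    return procLock;
  }

  /**
   * Runs the update only if the lock can be taken within timeoutMs.
   * Returns false, without running it, when the lock stays held for too long.
   */
  boolean tryForceUpdate(long timeoutMs, Runnable update) {
    boolean locked;
    try {
      locked = procLock.tryLock(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for the caller
      return false;
    }
    if (!locked) {
      return false; // the procedure looks stuck; skip it rather than block this thread
    }
    try {
      update.run();
      return true;
    } finally {
      procLock.unlock();
    }
  }
}
```

With a blocking acquire, the second call in a scenario like "lock held by a stuck procedure" would never return; here it returns false after the timeout and the executor thread stays available for other procedures.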

> Need to find a way to limit the number of proc wal files
> 
>
> Key: HBASE-21254
> URL: https://issues.apache.org/jira/browse/HBASE-21254
> Project: HBase
> Issue Type: Sub-task
> Components: proc-v2
> Reporter: Duo Zhang
> Assignee: Duo Zhang
> Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21254-v1.patch, HBASE-21254-v2.patch, 
> HBASE-21254-v3.patch, HBASE-21254.patch
>
>
> For the regionserver, we have a maximum WAL file limit: if we reach the 
> limit, we trigger a flush on specific regions so that we can delete 
> old WAL files. But for proc WALs we do not have this mechanism, and it will 
> be worse after HBASE-21233: if there is an old procedure which cannot 
> make progress and does not persist its state, we need to keep the old proc 
> WAL file forever...
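
One way to picture such a limit for proc WALs, sketched under stated assumptions: when a roll pushes the file count over the maximum, every procedure whose latest state lives only in the oldest file has its state rewritten into the newest file, so the oldest file can be dropped. `ProcWalLimiter`, `persist`, and `roll` are hypothetical names for illustration and do not reflect the attached patches.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of a bounded set of procedure WAL files. */
final class ProcWalLimiter {
  private final int maxFiles;
  private int oldestFile = 0; // id of the oldest file still retained
  private int newestFile = 0; // id of the file currently being written
  // procId -> id of the file that holds the procedure's latest persisted state
  private final Map<Long, Integer> latestStateFile = new HashMap<>();

  ProcWalLimiter(int maxFiles) {
    if (maxFiles < 1) {
      throw new IllegalArgumentException("maxFiles must be >= 1");
    }
    this.maxFiles = maxFiles;
  }

  /** Records that the procedure persisted its state into the current file. */
  void persist(long procId) {
    latestStateFile.put(procId, newestFile);
  }

  /**
   * Rolls to a new file. While the limit is exceeded, rewrites the state of
   * procedures pinned to the oldest file and drops that file.
   * Returns the ids of the procedures that were force-updated.
   */
  List<Long> roll() {
    newestFile++;
    List<Long> forced = new ArrayList<>();
    while (fileCount() > maxFiles) {
      for (Map.Entry<Long, Integer> e : latestStateFile.entrySet()) {
        if (e.getValue() == oldestFile) {
          e.setValue(newestFile); // state now also lives in the newest file
          forced.add(e.getKey());
        }
      }
      oldestFile++; // nothing depends on the oldest file any more
    }
    return forced;
  }

  int fileCount() {
    return newestFile - oldestFile + 1;
  }
}
```

In this model a procedure that never persists again is the only thing keeping an old file alive, which is the situation the description worries about; forcing its state forward bounds the number of files.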





[jira] [Comment Edited] (HBASE-21364) Procedure holds the lock should put to front of the queue after restart

2018-10-23 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661646#comment-16661646
 ] 

Allan Yang edited comment on HBASE-21364 at 10/24/18 2:58 AM:
--

The patch without the verbose logging has already been committed to branch-2.0 
and branch-2.1, thanks! Opened HBASE-21376 to commit the verbose logging.


was (Author: allan163):
The patch without verbose log has already committed to branch-2.0 and 
branch-2.1, thanks!

> Procedure holds the lock should put to front of the queue after restart
> ---
>
> Key: HBASE-21364
> URL: https://issues.apache.org/jira/browse/HBASE-21364
> Project: HBase
> Issue Type: Sub-task
> Affects Versions: 2.1.0, 2.0.2
> Reporter: Allan Yang
> Assignee: Allan Yang
> Priority: Blocker
> Fix For: 2.1.1, 2.0.3
>
> Attachments: HBASE-21364.branch-2.0.001.patch, 
> HBASE-21364.branch-2.0.002.patch
>
>
> After restoring the procedures from the Procedure WALs, we put the runnable 
> procedures back into the queue to execute. The order was not a problem before 
> HBASE-20846, since the first procedure to execute would acquire the lock 
> itself. But since HBASE-20846 the locks are restored as well. If a procedure 
> without the lock executes before a procedure that holds the lock in the same 
> queue, there is a race condition in which none of the procedures in that 
> queue may ever be executed.
> The race condition is:
> 1. A procedure that needs the table's exclusive lock is put into the 
> table's queue, but the table's shared lock is held by a Region Procedure. 
> Since no one holds the exclusive lock, the queue is put on the run queue to 
> execute. But the worker thread soon sees that the procedure can't execute 
> because it doesn't hold the lock, so it stops executing and removes the 
> queue from the run queue.
> 2. At the same time, the Region Procedure that holds the table's shared lock 
> and the region's exclusive lock is put into the table's queue. But since the 
> queue has already been added to the run queue, it is not added again.
> 3. Because of step 1, the table's queue is removed from the run queue.
> 4. After that, no one puts the table's queue back, so no worker will execute 
> the procedures inside it.
> A test case in the patch shows how.
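
The ordering named in the issue title (procedures that already hold a lock go to the front of the queue after a restart) can be sketched as below. This is a minimal illustration, not the HBASE-21364 patch: `RestoredProcedure` and `RestartOrdering` are hypothetical names, and the real MasterProcedureScheduler keeps per-table and per-server queues that this sketch ignores.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Hypothetical stand-in for a procedure restored from the Procedure WALs. */
final class RestoredProcedure {
  final long procId;
  final boolean holdsLock; // true if its lock was restored together with it

  RestoredProcedure(long procId, boolean holdsLock) {
    this.procId = procId;
    this.holdsLock = holdsLock;
  }
}

/** Sketch: re-queue restored procedures so that lock holders run first. */
final class RestartOrdering {
  static Deque<RestoredProcedure> requeue(List<RestoredProcedure> restored) {
    Deque<RestoredProcedure> queue = new ArrayDeque<>();
    // Pass 1: procedures that already hold a lock, in their original order.
    for (RestoredProcedure p : restored) {
      if (p.holdsLock) {
        queue.addLast(p);
      }
    }
    // Pass 2: everything else, which may have to wait for those locks.
    for (RestoredProcedure p : restored) {
      if (!p.holdsLock) {
        queue.addLast(p);
      }
    }
    return queue;
  }
}
```

With the lock holders at the head, a worker never picks up a procedure that is merely waiting on a lock while the holder of that lock sits behind it in the same queue.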





[jira] [Updated] (HBASE-21364) Procedure holds the lock should put to front of the queue after restart

2018-10-23 Thread Allan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Yang updated HBASE-21364:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Procedure holds the lock should put to front of the queue after restart
> ---
>
> Key: HBASE-21364
> URL: https://issues.apache.org/jira/browse/HBASE-21364
> Project: HBase
> Issue Type: Sub-task
> Affects Versions: 2.1.0, 2.0.2
> Reporter: Allan Yang
> Assignee: Allan Yang
> Priority: Blocker
> Fix For: 2.1.1, 2.0.3
>
> Attachments: HBASE-21364.branch-2.0.001.patch, 
> HBASE-21364.branch-2.0.002.patch
>
>





[jira] [Commented] (HBASE-21364) Procedure holds the lock should put to front of the queue after restart

2018-10-23 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661646#comment-16661646
 ] 

Allan Yang commented on HBASE-21364:


The patch without the verbose logging has already been committed to branch-2.0 
and branch-2.1, thanks!

> Procedure holds the lock should put to front of the queue after restart
> ---
>
> Key: HBASE-21364
> URL: https://issues.apache.org/jira/browse/HBASE-21364
> Project: HBase
> Issue Type: Sub-task
> Affects Versions: 2.1.0, 2.0.2
> Reporter: Allan Yang
> Assignee: Allan Yang
> Priority: Blocker
> Fix For: 2.1.1, 2.0.3
>
> Attachments: HBASE-21364.branch-2.0.001.patch, 
> HBASE-21364.branch-2.0.002.patch
>
>





[jira] [Created] (HBASE-21376) Add some verbose log to MasterProcedureScheduler

2018-10-23 Thread Allan Yang (JIRA)
Allan Yang created HBASE-21376:
--

Summary: Add some verbose log to MasterProcedureScheduler
Key: HBASE-21376
URL: https://issues.apache.org/jira/browse/HBASE-21376
Project: HBase
Issue Type: Sub-task
Reporter: Allan Yang
Assignee: Allan Yang


As discussed in HBASE-21364, the patch there was split in two. The critical 
part has already been submitted in HBASE-21364 to branch-2.0 and branch-2.1, 
but I also added some useful logging, which needs to be committed to all 
branches.





[jira] [Commented] (HBASE-21254) Need to find a way to limit the number of proc wal files

2018-10-23 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661644#comment-16661644
 ] 

Duo Zhang commented on HBASE-21254:
---

But we do need to review the implementation again. I found that the actual 
deletion of a root procedure is done later, in CompletedProcedureCleaner, so a 
successful procedure could also block us from removing a file...

> Need to find a way to limit the number of proc wal files
> 
>
> Key: HBASE-21254
> URL: https://issues.apache.org/jira/browse/HBASE-21254
> Project: HBase
> Issue Type: Sub-task
> Components: proc-v2
> Reporter: Duo Zhang
> Assignee: Duo Zhang
> Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21254-v1.patch, HBASE-21254-v2.patch, 
> HBASE-21254-v3.patch, HBASE-21254.patch
>
>





[jira] [Commented] (HBASE-21254) Need to find a way to limit the number of proc wal files

2018-10-23 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661643#comment-16661643
 ] 

Duo Zhang commented on HBASE-21254:
---

No, it will be executed in a separate thread. The log rolling thread just 
schedules a task on the forceUpdateExecutor.
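
The hand-off described here can be sketched with a plain `ExecutorService`. This is an illustration under assumptions, not the HBase implementation: `LogRollSketch` and `onLogRoll` are hypothetical names, and the single-thread executor is only a stand-in for the forceUpdateExecutor.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch of handing force-update work off the log rolling thread. */
final class LogRollSketch {
  // Stand-in for the forceUpdateExecutor (an assumption; the real one may differ).
  private final ExecutorService forceUpdateExecutor = Executors.newSingleThreadExecutor();

  /**
   * Called on the log rolling thread. It only enqueues the task and returns,
   * so a slow or blocked force update does not hold up log rolling itself.
   */
  Future<?> onLogRoll(Runnable forceUpdate) {
    return forceUpdateExecutor.submit(forceUpdate);
  }

  /** Stops accepting new tasks; tasks that were already submitted still run. */
  void shutdown() {
    forceUpdateExecutor.shutdown();
  }
}
```

Note that the executor thread itself can still be tied up by a task that blocks forever, so this hand-off protects the rolling thread but not the executor.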

> Need to find a way to limit the number of proc wal files
> 
>
> Key: HBASE-21254
> URL: https://issues.apache.org/jira/browse/HBASE-21254
> Project: HBase
> Issue Type: Sub-task
> Components: proc-v2
> Reporter: Duo Zhang
> Assignee: Duo Zhang
> Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21254-v1.patch, HBASE-21254-v2.patch, 
> HBASE-21254-v3.patch, HBASE-21254.patch
>
>





[jira] [Commented] (HBASE-21254) Need to find a way to limit the number of proc wal files

2018-10-23 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661642#comment-16661642
 ] 

Allan Yang commented on HBASE-21254:


In forceUpdateProcedure, we acquire the execution lock before the force update:
{code}
private void forceUpdateProcedure(long procId) throws IOException {
  IdLock.Entry lockEntry = procExecutionLock.getLockEntry(procId);
  try {
{code}
We will wait here forever if the procedure is stuck. And IIRC, 
forceUpdateProcedure is executed in the procedure log rolling thread, so that 
thread will get stuck too.

> Need to find a way to limit the number of proc wal files
> 
>
> Key: HBASE-21254
> URL: https://issues.apache.org/jira/browse/HBASE-21254
> Project: HBase
> Issue Type: Sub-task
> Components: proc-v2
> Reporter: Duo Zhang
> Assignee: Duo Zhang
> Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21254-v1.patch, HBASE-21254-v2.patch, 
> HBASE-21254-v3.patch, HBASE-21254.patch
>
>





[jira] [Commented] (HBASE-21254) Need to find a way to limit the number of proc wal files

2018-10-23 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661628#comment-16661628
 ] 

Duo Zhang commented on HBASE-21254:
---

Which lock? I do not get your point about the 'wait for the lock when rolling 
log'...

> Need to find a way to limit the number of proc wal files
> 
>
> Key: HBASE-21254
> URL: https://issues.apache.org/jira/browse/HBASE-21254
> Project: HBase
> Issue Type: Sub-task
> Components: proc-v2
> Reporter: Duo Zhang
> Assignee: Duo Zhang
> Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21254-v1.patch, HBASE-21254-v2.patch, 
> HBASE-21254-v3.patch, HBASE-21254.patch
>
>





[jira] [Commented] (HBASE-21254) Need to find a way to limit the number of proc wal files

2018-10-23 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661625#comment-16661625
 ] 

Allan Yang commented on HBASE-21254:


I have a question about this one: if a procedure is stuck and we can't get 
its lock, will log rolling get stuck waiting for that lock?

> Need to find a way to limit the number of proc wal files
> 
>
> Key: HBASE-21254
> URL: https://issues.apache.org/jira/browse/HBASE-21254
> Project: HBase
> Issue Type: Sub-task
> Components: proc-v2
> Reporter: Duo Zhang
> Assignee: Duo Zhang
> Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21254-v1.patch, HBASE-21254-v2.patch, 
> HBASE-21254-v3.patch, HBASE-21254.patch
>
>





[jira] [Updated] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever

2018-10-23 Thread Ankit Singhal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Singhal updated HBASE-21344:
--
Attachment: HBASE-21344-branch-2.0_v3.patch

> hbase:meta location in ZooKeeper set to OPENING by the procedure which 
> eventually failed but precludes Master from assigning it forever
> ---
>
> Key: HBASE-21344
> URL: https://issues.apache.org/jira/browse/HBASE-21344
> Project: HBase
> Issue Type: Bug
> Components: proc-v2
> Reporter: Ankit Singhal
> Assignee: Ankit Singhal
> Priority: Major
> Attachments: HBASE-21344-branch-2.0.patch, 
> HBASE-21344-branch-2.0_v2.patch, HBASE-21344-branch-2.0_v3.patch
>
>

[jira] [Commented] (HBASE-21372) Set hbase.assignment.maximum.attempts to Long.MAX

2018-10-23 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661612#comment-16661612
 ] 

Duo Zhang commented on HBASE-21372:
---

TRSP uses the same config to decide whether to give up retrying.

> Set hbase.assignment.maximum.attempts to Long.MAX
> -
>
> Key: HBASE-21372
> URL: https://issues.apache.org/jira/browse/HBASE-21372
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Reporter: stack
>Assignee: stack
>Priority: Major
> Attachments: HBASE-21372.branch-2.1.001.patch, 
> HBASE-21372.branch-2.1.001.patch
>
>
> From parent issue, [~allan163] suggests that we not give up on assign unless 
> there is a change -- an SCP triggers failure -- or, at the extreme, an operator 
> intervenes. This jibes w/ how we're thinking about assign (or to put it 
> another way, we have no handling for the case where we exhaust retries).
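
For illustration only, a minimal sketch of the retry policy discussed above. This is not HBase code: the class `AssignRetrySketch` and the helper `assignWithRetries` are hypothetical names, and the assign attempt is modeled as a plain `BooleanSupplier`. It only shows why a cap of `Long.MAX_VALUE` behaves like "never give up", while a small cap such as 10 ends in the exhausted-retries case that has no handling.

```java
import java.util.function.BooleanSupplier;

final class AssignRetrySketch {
    /**
     * Hypothetical helper: calls attempt until it succeeds or maxAttempts is used up.
     * Returns the number of attempts used, or -1 if we gave up.
     */
    static long assignWithRetries(BooleanSupplier attempt, long maxAttempts) {
        long tries = 0;
        while (tries < maxAttempts) {
            tries++;
            if (attempt.getAsBoolean()) {
                return tries;
            }
        }
        return -1; // retries exhausted: the case with no real handling
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Assumed for the demo: an assign that only succeeds on the 12th try.
        BooleanSupplier flaky = () -> ++calls[0] >= 12;
        System.out.println(assignWithRetries(flaky, 10));             // gives up
        calls[0] = 0;
        System.out.println(assignWithRetries(flaky, Long.MAX_VALUE)); // keeps retrying
    }
}
```

With the cap at 10 the sketch gives up before the assign would have succeeded; with `Long.MAX_VALUE` only success (or some outside change, which this sketch does not model) ends the loop.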



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore

2018-10-23 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661613#comment-16661613
 ] 

Duo Zhang commented on HBASE-21363:
---

Any other concerns, [~allan163]?

> Rewrite the buildingHoldCleanupTracker method in WALProcedureStore
> --
>
> Key: HBASE-21363
> URL: https://issues.apache.org/jira/browse/HBASE-21363
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21363-v1.patch, HBASE-21363-v2.patch, 
> HBASE-21363-v3.patch, HBASE-21363.patch
>
>






[jira] [Commented] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore

2018-10-23 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661611#comment-16661611
 ] 

Hadoop QA commented on HBASE-21363:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
22s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
11s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
23s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
17s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
27s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
13s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
 7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
15s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
10m 30s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
12s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m  
8s{color} | {color:green} hbase-procedure in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
10s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 37m  5s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21363 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12945314/HBASE-21363-v3.patch |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 1194f80ac5e4 3.13.0-144-generic #193-Ubuntu SMP Thu Mar 15 
17:03:53 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 1f437ac221 |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14835/testReport/ |
| Max. process+thread count | 278 (vs. ulimit of 1) |
| modules | C: hbase-procedure U: hbase-procedure |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14835/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Rewrite the 

[jira] [Comment Edited] (HBASE-21372) Set hbase.assignment.maximum.attempts to Long.MAX

2018-10-23 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661606#comment-16661606
 ] 

Allan Yang edited comment on HBASE-21372 at 10/24/18 2:06 AM:
--

+1 for it.
TransitRegionStateProcedure was introduced in branch-2+ by [~Apache9]. Can 
[~Apache9] confirm that in branch-2+ we already retry forever for 
assignment?


was (Author: allan163):
+1 for it, TransitRegionStateProcedure is introduced in branch-2+ by 
[~Apache9], can [~Apache9] confirm that in branch-2+, we already retry 
forever for assignment?

> Set hbase.assignment.maximum.attempts to Long.MAX
> -
>
> Key: HBASE-21372
> URL: https://issues.apache.org/jira/browse/HBASE-21372
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Reporter: stack
>Assignee: stack
>Priority: Major
> Attachments: HBASE-21372.branch-2.1.001.patch, 
> HBASE-21372.branch-2.1.001.patch
>
>
> From parent issue, [~allan163] suggests that we not give up on assign unless 
> there is a change -- an SCP triggers failure -- or, at the extreme, an operator 
> intervenes. This jibes w/ how we're thinking about assign (or to put it 
> another way, we have no handling for the case where we exhaust retries).





[jira] [Commented] (HBASE-21372) Set hbase.assignment.maximum.attempts to Long.MAX

2018-10-23 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661606#comment-16661606
 ] 

Allan Yang commented on HBASE-21372:


+1 for it. TransitRegionStateProcedure was introduced in branch-2+ by 
[~Apache9]. Can [~Apache9] confirm that in branch-2+ we already retry 
forever for assignment?

> Set hbase.assignment.maximum.attempts to Long.MAX
> -
>
> Key: HBASE-21372
> URL: https://issues.apache.org/jira/browse/HBASE-21372
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Reporter: stack
>Assignee: stack
>Priority: Major
> Attachments: HBASE-21372.branch-2.1.001.patch, 
> HBASE-21372.branch-2.1.001.patch
>
>
> From parent issue, [~allan163] suggests that we not give up on assign unless 
> there is a change -- an SCP triggers failure -- or, at the extreme, an operator 
> intervenes. This jibes w/ how we're thinking about assign (or to put it 
> another way, we have no handling for the case where we exhaust retries).





[jira] [Commented] (HBASE-21364) Procedure holds the lock should put to front of the queue after restart

2018-10-23 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661593#comment-16661593
 ] 

Allan Yang commented on HBASE-21364:


The checkstyle error is not a real problem: we added many comments to the 
method, so it now exceeds 150 lines. I will commit the actual fix to branch-2.1 
and branch-2.0, and open another issue to commit the verbose code to all 
branches, as [~Apache9] said.

> Procedure holds the lock should put to front of the queue after restart
> ---
>
> Key: HBASE-21364
> URL: https://issues.apache.org/jira/browse/HBASE-21364
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.1.0, 2.0.2
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Blocker
> Fix For: 2.1.1, 2.0.3
>
> Attachments: HBASE-21364.branch-2.0.001.patch, 
> HBASE-21364.branch-2.0.002.patch
>
>
> After restoring the procedures from the Procedure WALs, we put the runnable 
> procedures back into the queue to execute. The order was not a problem before 
> HBASE-20846, since the first procedure to execute would acquire the lock 
> itself. But the locks are restored after HBASE-20846, so if we execute a 
> procedure without the lock before a procedure with the lock in the same queue, 
> there is a race condition in which we may never be able to execute the 
> procedures in that queue at all.
> The race condition is:
> 1. A procedure that needs the table's exclusive lock is put into the table's 
> queue, but the table's shared lock is held by a Region Procedure. Since no 
> one holds the exclusive lock, the queue is put into the run queue to execute. 
> But soon the worker thread sees that the procedure can't execute because it 
> doesn't hold the lock, so it stops executing and removes the queue from the 
> run queue.
> 2. At the same time, the Region Procedure that holds the table's shared lock 
> and the region's exclusive lock is put into the table's queue. But since the 
> queue has already been added to the run queue, it is not added again.
> 3. Because of step 1, the table's queue is removed from the run queue.
> 4. After that, no one puts the table's queue back, so no worker will execute 
> the procedures inside it.
> A test case in the patch shows how.
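
For illustration only, a minimal sketch of the ordering idea in the issue title: when restored procedures are re-queued, the ones that already hold a lock go to the front. This is not the patch itself. `RestoreOrderSketch`, `Proc`, `requeue` and `pids` are hypothetical names, and a plain `ArrayDeque` stands in for the real scheduler queue.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

final class RestoreOrderSketch {
    /** Hypothetical stand-in for a procedure restored from the WAL. */
    static final class Proc {
        final long pid;
        final boolean holdsLock;

        Proc(long pid, boolean holdsLock) {
            this.pid = pid;
            this.holdsLock = holdsLock;
        }
    }

    /** Re-queues restored procedures so that lock holders come first. */
    static Deque<Proc> requeue(List<Proc> restored) {
        Deque<Proc> queue = new ArrayDeque<>();
        // First pass: procedures that already hold a lock, in their original order.
        for (Proc p : restored) {
            if (p.holdsLock) {
                queue.addLast(p);
            }
        }
        // Second pass: everything else, also in original order.
        for (Proc p : restored) {
            if (!p.holdsLock) {
                queue.addLast(p);
            }
        }
        return queue;
    }

    /** Returns the pids in queue order, for inspection. */
    static List<Long> pids(Deque<Proc> queue) {
        List<Long> out = new ArrayList<>();
        for (Proc p : queue) {
            out.add(p.pid);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Proc> restored = Arrays.asList(
            new Proc(1, false), new Proc(2, true), new Proc(3, false));
        System.out.println(pids(requeue(restored)));
    }
}
```

Two passes are used instead of `addFirst` so that several lock holders keep their relative order; whether the real fix preserves that order is an assumption here.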





[jira] [Commented] (HBASE-21373) Backport to branch-1, "HBASE-21338 [balancer] If balancer is an ill-fit for cluster size, it gives little indication"

2018-10-23 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661584#comment-16661584
 ] 

Hadoop QA commented on HBASE-21373:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
38s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
1s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange}  
0m  0s{color} | {color:orange} The patch doesn't appear to include any new or 
modified tests. Please justify why no new tests are needed for this patch. Also 
please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} branch-1 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
 0s{color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
41s{color} | {color:green} branch-1 passed with JDK v1.8.0_181 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
40s{color} | {color:green} branch-1 passed with JDK v1.7.0_191 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
24s{color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  2m 
49s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} branch-1 passed with JDK v1.8.0_181 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
37s{color} | {color:green} branch-1 passed with JDK v1.7.0_191 {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed with JDK v1.8.0_181 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed with JDK v1.7.0_191 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m 
22s{color} | {color:red} hbase-server: The patch generated 2 new + 37 unchanged 
- 0 fixed = 39 total (was 37) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  2m 
45s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  
1m 44s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.7.4. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed with JDK v1.8.0_181 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed with JDK v1.7.0_191 {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}130m 40s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}150m 47s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.procedure.TestFailedProcCleanup |
|   | hadoop.hbase.mapreduce.TestLoadIncrementalHFiles |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:61288f8 |
| JIRA Issue | HBASE-21373 |
| JIRA Patch URL | 

[jira] [Created] (HBASE-21375) Forward port "HBASE-21364 Procedure holds the lock should put to front of the queue after restart"

2018-10-23 Thread Duo Zhang (JIRA)
Duo Zhang created HBASE-21375:
-

 Summary: Forward port "HBASE-21364 Procedure holds the lock should 
put to front of the queue after restart"
 Key: HBASE-21375
 URL: https://issues.apache.org/jira/browse/HBASE-21375
 Project: HBase
  Issue Type: Sub-task
  Components: proc-v2
Reporter: Duo Zhang
Assignee: Duo Zhang
 Fix For: 3.0.0, 2.2.0








[jira] [Commented] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever

2018-10-23 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661580#comment-16661580
 ] 

stack commented on HBASE-21344:
---

Can you make a new patch, [~an...@apache.org]? Thanks.

> hbase:meta location in ZooKeeper set to OPENING by the procedure which 
> eventually failed but precludes Master from assigning it forever
> ---
>
> Key: HBASE-21344
> URL: https://issues.apache.org/jira/browse/HBASE-21344
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>Priority: Major
> Attachments: HBASE-21344-branch-2.0.patch, 
> HBASE-21344-branch-2.0_v2.patch
>
>
> [~elserj] has already summarized it well.
> 1. hbase:meta was on RS8
> 2. RS8 crashed, SCP was queued for it, meta first
> 3. meta was marked OFFLINE
> 4. meta marked as OPENING on RS3
> 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue
> 6. We attempt the openRegion/assignment 10 times, failing each time
> 7. We start rolling back the procedure:
> {code:java}
> 2018-10-08 06:51:24,440 WARN  [PEWorker-9] procedure2.ProcedureExecutor: 
> Usually this should not happen, we will release the lock before if the 
> procedure is finished, even if the holdLock is true, arrive here means we 
> have some holes where we do not release the lock. And the releaseLock below 
> may fail since the procedure may have already been deleted from the procedure 
> store.
> 2018-10-08 06:51:24,543 INFO  [PEWorker-9] 
> procedure.MasterProcedureScheduler: pid=48, ppid=47, 
> state=FAILED:REGION_TRANSITION_QUEUE, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 
> checking lock on 1588230740
> {code}
> {code:java}
> 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: 
> CODE-BUG: Uncaught runtime exception for pid=47, 
> state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; ServerCrashProcedure 
> server=,16020,1538974612843, splitWal=true, meta=true
> java.lang.UnsupportedOperationException: unhandled 
> state=SERVER_CRASH_GET_REGIONS
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254)
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58)
>   at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
>   at 
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981)
> {code}
> {code:java}
> { DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, 
> retries=7, started=8168 ms ago, cancelled=false, msg=Meta region is in state 
> OPENING, details=row 'backup:system' on table 'hbase:meta' at 
> region=hbase:meta,,1.1588230740, hostname=, seqNum=-1, 
> exception=java.io.IOException: Meta region is in state OPENING
> at 
> org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$null$1(ZKAsyncRegistry.java:154)
> at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
> at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
> at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
> at 
> org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$getAndConvert$0(ZKAsyncRegistry.java:77)
> at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
> at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
> at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at 
> 

[jira] [Commented] (HBASE-21371) Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license error

2018-10-23 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661576#comment-16661576
 ] 

Hadoop QA commented on HBASE-21371:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
11s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange}  
0m  0s{color} | {color:orange} The patch doesn't appear to include any new or 
modified tests. Please justify why no new tests are needed for this patch. Also 
please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
31s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
10s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m  
9s{color} | {color:green} hbase-resource-bundle in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
 9s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 12m  0s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21371 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12945311/HBASE-21371.master.001.patch
 |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  xml  |
| uname | Linux 92319fb5b492 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
10:45:36 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 1f437ac221 |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14834/testReport/ |
| Max. process+thread count | 87 (vs. ulimit of 1) |
| modules | C: hbase-resource-bundle U: hbase-resource-bundle |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14834/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license 
> error
> --
>
> Key: HBASE-21371
> URL: https://issues.apache.org/jira/browse/HBASE-21371
> Project: HBase
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
> Attachments: HBASE-21371.master.001.patch
>
>
> Hadoop 3.3.0 (trunk) updated various dependencies, which adds additional 
> licenses that break HBase's license check plugin.
> CDDL/GPLv2+CE license
> {quote}This product includes JavaBeans Activation Framework API jar licensed 
> under the CDDL/GPLv2+CE.
> CDDL or GPL version 2 plus the Classpath Exception
>  ERROR: Please check  this License for acceptability here:
> [https://www.apache.org/legal/resolved]
> If it is okay, then update the list named 'non_aggregate_fine' in the 
> LICENSE.vm file.
>  If it isn't okay, then revert the change that added the dependency.
> More info on the dependency:
> javax.activation
>  javax.activation-api
>  1.2.0
> maven central search
>  g:javax.activation AND a:javax.activation-api AND v:1.2.0
> project website
>  [http://java.net/all/javax.activation-api/]
>  project source
>  [https://github.com/javaee/activation/javax.activation-api]
> {quote}
> Bouncy 

[jira] [Comment Edited] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever

2018-10-23 Thread Ankit Singhal (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661554#comment-16661554
 ] 

Ankit Singhal edited comment on HBASE-21344 at 10/24/18 1:36 AM:
-

{quote}What you doing here from patch?{quote}
I'm just removing the duplicate tableStateManager.start() (and keeping the 
tableStateManager.start() that runs after checking that meta is actually online).
The test in the patch checks that SCP can still succeed even when meta is in 
the OPENING state, so we don't need to change the state of the meta znode to 
offline during FAILED_OPEN of the assign procedure (this meta znode state will 
in turn also help in avoiding IMP if meta is in transition due to a server 
crash). Anyway, a sub-task to increase the max number of assign attempts will 
not let the call go down the FAILED_OPEN path (I think).

{quote}
Optional optProc = this.procedureExecutor.getProcedures().stream()
    .filter(p -> p instanceof ServerCrashProcedure).map(o -> (ServerCrashProcedure) o)
    .filter(s -> s.hasMetaTableRegion()).findAny();
You are reporting SCPs only if they have meta on them? Isn't this method more 
generic than just meta searches?
{quote}
Yes, My bad, we don't need this particular change.
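
For illustration only, a self-contained sketch of what the quoted stream pipeline does and why the reviewer calls it narrower than a generic search. The nested `Procedure` and `ServerCrashProcedure` classes are hypothetical stand-ins, not the HBase classes, and `findMetaScp` and `countScps` are made-up names. Only the filter/map/filter/findAny shape comes from the quoted snippet.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

final class ScpFilterSketch {
    /** Hypothetical stand-in for a generic procedure. */
    static class Procedure { }

    /** Hypothetical stand-in for a server crash procedure. */
    static final class ServerCrashProcedure extends Procedure {
        private final boolean carryingMeta;

        ServerCrashProcedure(boolean carryingMeta) {
            this.carryingMeta = carryingMeta;
        }

        boolean hasMetaTableRegion() {
            return carryingMeta;
        }
    }

    /** Same shape as the quoted snippet: only SCPs that carry meta match. */
    static Optional<ServerCrashProcedure> findMetaScp(List<Procedure> procs) {
        return procs.stream()
            .filter(p -> p instanceof ServerCrashProcedure)
            .map(p -> (ServerCrashProcedure) p)
            .filter(ServerCrashProcedure::hasMetaTableRegion)
            .findAny();
    }

    /** The more generic question: how many SCPs are there at all? */
    static long countScps(List<Procedure> procs) {
        return procs.stream()
            .filter(p -> p instanceof ServerCrashProcedure)
            .count();
    }

    public static void main(String[] args) {
        List<Procedure> procs = Arrays.asList(
            new Procedure(), new ServerCrashProcedure(false));
        // An SCP exists, but it does not carry meta, so the meta-only search finds nothing.
        System.out.println(findMetaScp(procs).isPresent());
        System.out.println(countScps(procs));
    }
}
```

The last filter is what makes the search meta-specific: dropping it would report any SCP, which is the more generic behavior the review comment asks about.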


was (Author: an...@apache.org):
{quote}What you doing here from patch?{quote}
I'm just removing the duplicate tableStateManager.start() (and keeping the 
tableStateManager.start() that runs after checking that meta is actually online).
The test in the patch checks that SCP can still succeed even when meta is in 
the OPENING state, so we don't need to change the state of the meta znode to 
offline during FAILED_OPEN of the assign procedure (this meta znode state will 
in turn also help in avoiding IMP if meta is in transition due to a server 
crash). Anyway, a sub-task to increase the no. of Assign procedure will not let 
the call go down the FAILED_OPEN path (I think).

{quote}
Optional optProc = this.procedureExecutor.getProcedures().stream()
    .filter(p -> p instanceof ServerCrashProcedure).map(o -> (ServerCrashProcedure) o)
    .filter(s -> s.hasMetaTableRegion()).findAny();
You are reporting SCPs only if they have meta on them? Isn't this method more 
generic than just meta searches?
{quote}
Yes, My bad, we don't need this particular change.

> hbase:meta location in ZooKeeper set to OPENING by the procedure which 
> eventually failed but precludes Master from assigning it forever
> ---
>
> Key: HBASE-21344
> URL: https://issues.apache.org/jira/browse/HBASE-21344
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>Priority: Major
> Attachments: HBASE-21344-branch-2.0.patch, 
> HBASE-21344-branch-2.0_v2.patch
>
>
> [~elserj] has already summarized it well.
> 1. hbase:meta was on RS8
> 2. RS8 crashed, SCP was queued for it, meta first
> 3. meta was marked OFFLINE
> 4. meta marked as OPENING on RS3
> 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue
> 6. We attempt the openRegion/assignment 10 times, failing each time
> 7. We start rolling back the procedure:
> {code:java}
> 2018-10-08 06:51:24,440 WARN  [PEWorker-9] procedure2.ProcedureExecutor: 
> Usually this should not happen, we will release the lock before if the 
> procedure is finished, even if the holdLock is true, arrive here means we 
> have some holes where we do not release the lock. And the releaseLock below 
> may fail since the procedure may have already been deleted from the procedure 
> store.
> 2018-10-08 06:51:24,543 INFO  [PEWorker-9] 
> procedure.MasterProcedureScheduler: pid=48, ppid=47, 
> state=FAILED:REGION_TRANSITION_QUEUE, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 
> checking lock on 1588230740
> {code}
> {code:java}
> 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: 
> CODE-BUG: Uncaught runtime exception for pid=47, 
> state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; ServerCrashProcedure 
> server=,16020,1538974612843, splitWal=true, meta=true
> java.lang.UnsupportedOperationException: unhandled 
> state=SERVER_CRASH_GET_REGIONS
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254)
>   at 
> 

[jira] [Updated] (HBASE-21325) Force to terminate regionserver when abort hang in somewhere

2018-10-23 Thread Guanghao Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-21325:
---
Attachment: HBASE-21325.master.003.patch

> Force to terminate regionserver when abort hang in somewhere
> 
>
> Key: HBASE-21325
> URL: https://issues.apache.org/jira/browse/HBASE-21325
> Project: HBase
>  Issue Type: Improvement
>Reporter: Duo Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Attachments: HBASE-21325.master.001.patch, 
> HBASE-21325.master.001.patch, HBASE-21325.master.002.patch, 
> HBASE-21325.master.003.patch
>
>
> When testing sync replication, I found that if I transition the remote cluster 
> to DA while the local cluster is still in A, the region server hangs on 
> shutdown. As the fsOk flag only tests the local cluster (which is 
> reasonable), we enter waitOnAllRegionsToClose, and since the WAL is 
> broken (the remote WAL directory is gone), we will never succeed. This 
> leads to an infinite wait inside waitOnAllRegionsToClose.
> So I think we should have an upper bound on the wait time in the 
> waitOnAllRegionsToClose method.
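
The upper bound proposed above could look roughly like the sketch below. It is only an illustration: BoundedWait, waitUntil, and the timeoutMs/pollMs parameters are hypothetical names, not part of HRegionServer, and what happens once the bound is hit (for example, forcing the process to terminate) is left to the caller.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical helper, not HBase code: polls a condition until it holds or a deadline passes. */
final class BoundedWait {
  private BoundedWait() {}

  /**
   * Returns true if the condition became true before the timeout,
   * false if the timeout elapsed or the thread was interrupted.
   */
  static boolean waitUntil(BooleanSupplier done, long timeoutMs, long pollMs) {
    final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (!done.getAsBoolean()) {
      long remainingNanos = deadline - System.nanoTime();
      if (remainingNanos <= 0) {
        return false; // upper bound reached; the caller decides how to give up
      }
      // Never sleep past the deadline, but sleep at least 1 ms to avoid spinning.
      long sleepMs = Math.min(pollMs, Math.max(1L, TimeUnit.NANOSECONDS.toMillis(remainingNanos)));
      try {
        Thread.sleep(sleepMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        return false;
      }
    }
    return true;
  }
}
```

A caller in the shutdown path would check the return value and, on false, skip the remaining region closes instead of waiting forever.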



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore

2018-10-23 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661570#comment-16661570
 ] 

Duo Zhang commented on HBASE-21363:
---

Add simple comments for the ProcedureWALFormat.load method.

> Rewrite the buildingHoldCleanupTracker method in WALProcedureStore
> --
>
> Key: HBASE-21363
> URL: https://issues.apache.org/jira/browse/HBASE-21363
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21363-v1.patch, HBASE-21363-v2.patch, 
> HBASE-21363-v3.patch, HBASE-21363.patch
>
>






[jira] [Comment Edited] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever

2018-10-23 Thread Ankit Singhal (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661554#comment-16661554
 ] 

Ankit Singhal edited comment on HBASE-21344 at 10/24/18 1:30 AM:
-

{quote}What you doing here from patch?{quote}
I'm just removing the duplicate tableStateManager.start() (and keeping the 
tableStateManager.start() after checking that meta is actually online).
And the test in the patch checks that, even if we have the OPENING state for meta, 
SCP can still succeed, so we don't need to change the state of the meta znode to 
offline during FAILED_OPEN of the assign procedure (this meta znode state will 
in turn also help in avoiding IMP if meta is in transition due to a server crash). 
Anyway, a sub-task to increase the number of assign procedure attempts will keep 
the call from going down the FAILED_OPEN path (I think).

{quote}
Optional optProc = this.procedureExecutor.getProcedures().stream()
 .filter(p -> p instanceof ServerCrashProcedure).map(o -> (ServerCrashProcedure) o)
 .filter(s -> s.hasMetaTableRegion()).findAny();
 You are reporting SCPs only if they have meta on them? Isn't this method more 
generic than just meta searches?
{quote}
Yes, my bad; we don't need this particular change.


was (Author: an...@apache.org):
{quote}What you doing here from patch?{quote}
I'm just removing duplicate tableStateManager.start() (and keeping the 
tableStateManager.start() after checking meta is actually online).
And the test in the patch is to check if we have OPENING state for meta also, 
still SCP can succeed , so we don't need to change state of meta znode to 
offline during FAILED_OPEN of assign procedure(this meta znode state will 
intern also help in avoiding IMP if meta is in transition due to Server crash)

{quote}
1096 Optional optProc = 
this.procedureExecutor.getProcedures().stream()
 1097 .filter(p -> p instanceof ServerCrashProcedure).map(o -> 
(ServerCrashProcedure) o)
 1098 .filter(s -> s.hasMetaTableRegion()).findAny();
 You are reporting SCPs only if they have meta on them? Isn't this method more 
generic than just meta searches?
{quote}
Yes, My bad, we don't need this particular change.

> hbase:meta location in ZooKeeper set to OPENING by the procedure which 
> eventually failed but precludes Master from assigning it forever
> ---
>
> Key: HBASE-21344
> URL: https://issues.apache.org/jira/browse/HBASE-21344
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>Priority: Major
> Attachments: HBASE-21344-branch-2.0.patch, 
> HBASE-21344-branch-2.0_v2.patch
>
>
> [~elserj] has already summarized it well.
> 1. hbase:meta was on RS8
> 2. RS8 crashed, SCP was queued for it, meta first
> 3. meta was marked OFFLINE
> 4. meta marked as OPENING on RS3
> 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue
> 6. We attempt the openRegion/assignment 10 times, failing each time
> 7. We start rolling back the procedure:
> {code:java}
> 2018-10-08 06:51:24,440 WARN  [PEWorker-9] procedure2.ProcedureExecutor: 
> Usually this should not happen, we will release the lock before if the 
> procedure is finished, even if the holdLock is true, arrive here means we 
> have some holes where we do not release the lock. And the releaseLock below 
> may fail since the procedure may have already been deleted from the procedure 
> store.
> 2018-10-08 06:51:24,543 INFO  [PEWorker-9] 
> procedure.MasterProcedureScheduler: pid=48, ppid=47, 
> state=FAILED:REGION_TRANSITION_QUEUE, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 
> checking lock on 1588230740
> {code}
> {code:java}
> 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: 
> CODE-BUG: Uncaught runtime exception for pid=47, 
> state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; ServerCrashProcedure 
> server=,16020,1538974612843, splitWal=true, meta=true
> java.lang.UnsupportedOperationException: unhandled 
> state=SERVER_CRASH_GET_REGIONS
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254)
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58)
>   at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
>   

[jira] [Updated] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore

2018-10-23 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21363:
--
Attachment: HBASE-21363-v3.patch

> Rewrite the buildingHoldCleanupTracker method in WALProcedureStore
> --
>
> Key: HBASE-21363
> URL: https://issues.apache.org/jira/browse/HBASE-21363
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21363-v1.patch, HBASE-21363-v2.patch, 
> HBASE-21363-v3.patch, HBASE-21363.patch
>
>






[jira] [Comment Edited] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever

2018-10-23 Thread Ankit Singhal (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661554#comment-16661554
 ] 

Ankit Singhal edited comment on HBASE-21344 at 10/24/18 1:24 AM:
-

{quote}What you doing here from patch?{quote}
I'm just removing the duplicate tableStateManager.start() (and keeping the 
tableStateManager.start() after checking that meta is actually online).
And the test in the patch checks that, even if we have the OPENING state for meta, 
SCP can still succeed, so we don't need to change the state of the meta znode to 
offline during FAILED_OPEN of the assign procedure (this meta znode state will 
in turn also help in avoiding IMP if meta is in transition due to a server crash).

{quote}
Optional optProc = this.procedureExecutor.getProcedures().stream()
 .filter(p -> p instanceof ServerCrashProcedure).map(o -> (ServerCrashProcedure) o)
 .filter(s -> s.hasMetaTableRegion()).findAny();
 You are reporting SCPs only if they have meta on them? Isn't this method more 
generic than just meta searches?
{quote}
Yes, my bad; we don't need this particular change.


was (Author: an...@apache.org):
{quote}1096 Optional optProc = 
this.procedureExecutor.getProcedures().stream()
1097 .filter(p -> p instanceof ServerCrashProcedure).map(o -> 
(ServerCrashProcedure) o)
1098 .filter(s -> s.hasMetaTableRegion()).findAny();
You are reporting SCPs only if they have meta on them? Isn't this method more 
generic than just meta searches?
{quote}

Yes, My bad, we don't need this particular change.

> hbase:meta location in ZooKeeper set to OPENING by the procedure which 
> eventually failed but precludes Master from assigning it forever
> ---
>
> Key: HBASE-21344
> URL: https://issues.apache.org/jira/browse/HBASE-21344
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>Priority: Major
> Attachments: HBASE-21344-branch-2.0.patch, 
> HBASE-21344-branch-2.0_v2.patch
>
>
> [~elserj] has already summarized it well.
> 1. hbase:meta was on RS8
> 2. RS8 crashed, SCP was queued for it, meta first
> 3. meta was marked OFFLINE
> 4. meta marked as OPENING on RS3
> 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue
> 6. We attempt the openRegion/assignment 10 times, failing each time
> 7. We start rolling back the procedure:
> {code:java}
> 2018-10-08 06:51:24,440 WARN  [PEWorker-9] procedure2.ProcedureExecutor: 
> Usually this should not happen, we will release the lock before if the 
> procedure is finished, even if the holdLock is true, arrive here means we 
> have some holes where we do not release the lock. And the releaseLock below 
> may fail since the procedure may have already been deleted from the procedure 
> store.
> 2018-10-08 06:51:24,543 INFO  [PEWorker-9] 
> procedure.MasterProcedureScheduler: pid=48, ppid=47, 
> state=FAILED:REGION_TRANSITION_QUEUE, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 
> checking lock on 1588230740
> {code}
> {code:java}
> 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: 
> CODE-BUG: Uncaught runtime exception for pid=47, 
> state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; ServerCrashProcedure 
> server=,16020,1538974612843, splitWal=true, meta=true
> java.lang.UnsupportedOperationException: unhandled 
> state=SERVER_CRASH_GET_REGIONS
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254)
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58)
>   at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
>   at 
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75)
>   at 
> 

[jira] [Updated] (HBASE-21371) Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license error

2018-10-23 Thread Wei-Chiu Chuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HBASE-21371:

Attachment: (was: HBASE-21371.001.patch)

> Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license 
> error
> --
>
> Key: HBASE-21371
> URL: https://issues.apache.org/jira/browse/HBASE-21371
> Project: HBase
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
> Attachments: HBASE-21371.master.001.patch
>
>
> Hadoop 3.3.0 (trunk) updated various dependencies, which adds additional 
> licenses that break HBase's license check plugin.
> CDDL/GPLv2+CE license
> {quote}This product includes JavaBeans Activation Framework API jar licensed 
> under the CDDL/GPLv2+CE.
> CDDL or GPL version 2 plus the Classpath Exception
>  ERROR: Please check  this License for acceptability here:
> [https://www.apache.org/legal/resolved]
> If it is okay, then update the list named 'non_aggregate_fine' in the 
> LICENSE.vm file.
>  If it isn't okay, then revert the change that added the dependency.
> More info on the dependency:
> javax.activation
>  javax.activation-api
>  1.2.0
> maven central search
>  g:javax.activation AND a:javax.activation-api AND v:1.2.0
> project website
>  [http://java.net/all/javax.activation-api/]
>  project source
>  [https://github.com/javaee/activation/javax.activation-api]
> {quote}
> Bouncy Castle License 
> {quote}–
>  This product includes Bouncy Castle PKIX, CMS, EAC, TSP, PKCS, OCSP, CMP, 
> and CRMF APIs licensed under the Bouncy Castle Licence.
> ERROR: Please check  this License for acceptability here:
> [https://www.apache.org/legal/resolved]
> If it is okay, then update the list named 'non_aggregate_fine' in the 
> LICENSE.vm file.
>  If it isn't okay, then revert the change that added the dependency.
> More info on the dependency:
> org.bouncycastle
>  bcpkix-jdk15on
>  1.60
> maven central search
>  g:org.bouncycastle AND a:bcpkix-jdk15on AND v:1.60
> project website
>  [http://www.bouncycastle.org/java.html]
>  project source
>  [https://github.com/bcgit/bc-java]
>  –
> {quote}
>  
> And a long list of "Apache Software License - Version 2.0" licensed Jetty 
> dependencies like this:
> {quote}
> This product includes Jetty :: Servlet Annotations licensed under the Apache 
> Software License - Version 2.0.
> ERROR: Please check  this License for acceptability here:
> [https://www.apache.org/legal/resolved]
> If it is okay, then update the list named 'non_aggregate_fine' in the 
> LICENSE.vm file.
>  If it isn't okay, then revert the change that added the dependency.
> More info on the dependency:
> org.eclipse.jetty
>  jetty-annotations
>  9.3.19.v20170502
> maven central search
>  g:org.eclipse.jetty AND a:jetty-annotations AND v:9.3.19.v20170502
> project website
>  [http://www.eclipse.org/jetty]
>  project source
>  [https://github.com/eclipse/jetty.project/jetty-annotations]
> {quote}





[jira] [Updated] (HBASE-21371) Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license error

2018-10-23 Thread Wei-Chiu Chuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HBASE-21371:

Status: Patch Available  (was: Open)

> Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license 
> error
> --
>
> Key: HBASE-21371
> URL: https://issues.apache.org/jira/browse/HBASE-21371
> Project: HBase
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
> Attachments: HBASE-21371.master.001.patch
>
>





[jira] [Commented] (HBASE-21371) Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license error

2018-10-23 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661560#comment-16661560
 ] 

Wei-Chiu Chuang commented on HBASE-21371:
-

Yes, this will need to be done wherever we ship the relevant jar. If we already 
have bouncycastle as a dependency, shouldn't we have an entry for it though?

I've checked: the so-called "Bouncy Castle License" is literally the same as the 
MIT License, character by character.

Interestingly, I found that LICENSE.vm contains this hard-coded license text:
{quote}Bouncycastle is released under the MIT license (available above),
 and is Copyright (c) 2000 - 2006 The Legion Of The Bouncy Castle.
{quote}
 Maybe there's a bug in the license checker plugin and it didn't find 
Bouncycastle before, so you had to manually add this license text?

> Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license 
> error
> --
>
> Key: HBASE-21371
> URL: https://issues.apache.org/jira/browse/HBASE-21371
> Project: HBase
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
> Attachments: HBASE-21371.master.001.patch
>
>





[jira] [Updated] (HBASE-21371) Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license error

2018-10-23 Thread Wei-Chiu Chuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HBASE-21371:

Attachment: HBASE-21371.master.001.patch

> Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license 
> error
> --
>
> Key: HBASE-21371
> URL: https://issues.apache.org/jira/browse/HBASE-21371
> Project: HBase
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
> Attachments: HBASE-21371.001.patch, HBASE-21371.master.001.patch
>
>





[jira] [Commented] (HBASE-21371) Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license error

2018-10-23 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661558#comment-16661558
 ] 

Wei-Chiu Chuang commented on HBASE-21371:
-

 
{code:java}
$ mvn dependency:tree -Dhadoop.profile=3.0 -Dhadoop-three.version=3.3.0-SNAPSHOT
{code}
javax.activation is included indirectly from hadoop-common:
{quote}[INFO] +- org.apache.hadoop:hadoop-common:jar:3.3.0-SNAPSHOT:compile
[INFO] |  +- javax.activation:javax.activation-api:jar:1.2.0:runtime
{quote}
bouncycastle is included in test jars only:
{quote}[INFO] +- org.apache.hadoop:hadoop-minicluster:jar:3.3.0-SNAPSHOT:test
[INFO] |  \- org.apache.hadoop:hadoop-yarn-server-web-proxy:jar:3.3.0-SNAPSHOT:test
[INFO] |     +- org.bouncycastle:bcprov-jdk15on:jar:1.60:test
[INFO] |     \- org.bouncycastle:bcpkix-jdk15on:jar:1.60:test
{quote}
The new jetty dependencies are included in test jars only too:
{quote}[INFO] +- org.apache.hadoop:hadoop-minicluster:jar:3.3.0-SNAPSHOT:test
[INFO] |  +- org.apache.hadoop:hadoop-yarn-server-tests:test-jar:tests:3.3.0-SNAPSHOT:test
[INFO] |  |  +- org.apache.hadoop:hadoop-yarn-server-nodemanager:jar:3.3.0-SNAPSHOT:test
[INFO] |  |  |  +- org.eclipse.jetty.websocket:javax-websocket-server-impl:jar:9.3.19.v20170502:test
{quote}
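
For anyone who wants to repeat this check mechanically, here is a rough sketch of pulling the scope out of dependency:tree lines. It assumes the usual "groupId:artifactId:type[:classifier]:version:scope" coordinate layout with the scope as the last field; DepTreeScopes and scopesFor are made-up names for illustration, not part of Maven or HBase.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: extracts the scope of each artifact of one groupId from tree output. */
final class DepTreeScopes {
  private DepTreeScopes() {}

  /** Maps "groupId:artifactId" to its scope for every tree line whose groupId matches exactly. */
  static Map<String, String> scopesFor(List<String> treeLines, String groupId) {
    Map<String, String> scopes = new LinkedHashMap<>();
    for (String line : treeLines) {
      // Drop the "[INFO]" prefix and the tree-drawing characters (| + \ - and spaces).
      String coordinate = line.replaceFirst("^\\[INFO\\][\\s|+\\\\-]*", "").trim();
      String[] parts = coordinate.split(":");
      // Expect at least group:artifact:type:version:scope.
      if (parts.length < 5 || !parts[0].equals(groupId)) {
        continue;
      }
      scopes.put(parts[0] + ":" + parts[1], parts[parts.length - 1]);
    }
    return scopes;
  }
}
```

Fed the lines above, a lookup for org.bouncycastle should report only the test scope, which is what the tree shows.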

> Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license 
> error
> --
>
> Key: HBASE-21371
> URL: https://issues.apache.org/jira/browse/HBASE-21371
> Project: HBase
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
> Attachments: HBASE-21371.001.patch, HBASE-21371.master.001.patch
>
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever

2018-10-23 Thread Ankit Singhal (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661554#comment-16661554
 ] 

Ankit Singhal commented on HBASE-21344:
---

{quote}Optional<ServerCrashProcedure> optProc = this.procedureExecutor.getProcedures().stream()
    .filter(p -> p instanceof ServerCrashProcedure).map(o -> (ServerCrashProcedure) o)
    .filter(s -> s.hasMetaTableRegion()).findAny();
You are reporting SCPs only if they have meta on them? Isn't this method more 
generic than just meta searches?
{quote}

Yes, my bad; we don't need this particular change.

> hbase:meta location in ZooKeeper set to OPENING by the procedure which 
> eventually failed but precludes Master from assigning it forever
> ---
>
> Key: HBASE-21344
> URL: https://issues.apache.org/jira/browse/HBASE-21344
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>Priority: Major
> Attachments: HBASE-21344-branch-2.0.patch, 
> HBASE-21344-branch-2.0_v2.patch
>
>
> [~elserj] has already summarized it well.
> 1. hbase:meta was on RS8
> 2. RS8 crashed, SCP was queued for it, meta first
> 3. meta was marked OFFLINE
> 4. meta marked as OPENING on RS3
> 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue
> 6. We attempt the openRegion/assignment 10 times, failing each time
> 7. We start rolling back the procedure:
> {code:java}
> 2018-10-08 06:51:24,440 WARN  [PEWorker-9] procedure2.ProcedureExecutor: 
> Usually this should not happen, we will release the lock before if the 
> procedure is finished, even if the holdLock is true, arrive here means we 
> have some holes where we do not release the lock. And the releaseLock below 
> may fail since the procedure may have already been deleted from the procedure 
> store.
> 2018-10-08 06:51:24,543 INFO  [PEWorker-9] 
> procedure.MasterProcedureScheduler: pid=48, ppid=47, 
> state=FAILED:REGION_TRANSITION_QUEUE, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 
> checking lock on 1588230740
> {code}
> {code:java}
> 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: 
> CODE-BUG: Uncaught runtime exception for pid=47, 
> state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; ServerCrashProcedure 
> server=,16020,1538974612843, splitWal=true, meta=true
> java.lang.UnsupportedOperationException: unhandled 
> state=SERVER_CRASH_GET_REGIONS
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254)
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58)
>   at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
>   at 
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981)
> {code}
> {code:java}
> { DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, 
> retries=7, started=8168 ms ago, cancelled=false, msg=Meta region is in state 
> OPENING, details=row 'backup:system' on table 'hbase:meta' at 
> region=hbase:meta,,1.1588230740, hostname=, seqNum=-1, 
> exception=java.io.IOException: Meta region is in state OPENING
> at 
> org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$null$1(ZKAsyncRegistry.java:154)
> at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
> at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
> at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
> at 
> 

[jira] [Commented] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore

2018-10-23 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661555#comment-16661555
 ] 

Duo Zhang commented on HBASE-21363:
---

OK, it is in ProcedureWALFormat... Let me update the patch.

> Rewrite the buildingHoldCleanupTracker method in WALProcedureStore
> --
>
> Key: HBASE-21363
> URL: https://issues.apache.org/jira/browse/HBASE-21363
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21363-v1.patch, HBASE-21363-v2.patch, 
> HBASE-21363.patch
>
>






[jira] [Commented] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore

2018-10-23 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661553#comment-16661553
 ] 

Duo Zhang commented on HBASE-21363:
---

[~allan163] Where is the code where we call resetModified on the tracker in 
ProcedureWALFormatReader? I can't find it.

> Rewrite the buildingHoldCleanupTracker method in WALProcedureStore
> --
>
> Key: HBASE-21363
> URL: https://issues.apache.org/jira/browse/HBASE-21363
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21363-v1.patch, HBASE-21363-v2.patch, 
> HBASE-21363.patch
>
>






[jira] [Commented] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever

2018-10-23 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661548#comment-16661548
 ] 

stack commented on HBASE-21344:
---

bq. Actually, I'm working against branch-2.0 only, here you can see 
tableStateManager is started 2 times,

My bad. Indeed, 2.0 has this. 2.1 does not.

What are you doing here in the patch?

Optional<ServerCrashProcedure> optProc = this.procedureExecutor.getProcedures().stream()
    .filter(p -> p instanceof ServerCrashProcedure).map(o -> (ServerCrashProcedure) o)
    .filter(s -> s.hasMetaTableRegion()).findAny();

You are reporting SCPs only if they have meta on them? Isn't this method more 
generic than just meta searches?
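
The question above can be illustrated with a small, self-contained sketch. The classes `Proc` and `CrashProc` are hypothetical stand-ins, not HBase's `Procedure` or `ServerCrashProcedure`; only the shape of the stream pipeline mirrors the snippet quoted from the patch. It contrasts a meta-only lookup with a generic one that returns every crash procedure:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

// Hypothetical stand-ins for illustration only; not the HBase classes.
class Proc {}

class CrashProc extends Proc {
  private final String server;
  private final boolean carryingMeta;

  CrashProc(String server, boolean carryingMeta) {
    this.server = server;
    this.carryingMeta = carryingMeta;
  }

  String server() { return server; }
  boolean hasMetaTableRegion() { return carryingMeta; }
}

final class ScpLookupSketch {
  // Same shape as the patch snippet: only crash procedures that carry meta.
  static Optional<CrashProc> anyMetaCrashProc(List<Proc> procs) {
    return procs.stream()
        .filter(p -> p instanceof CrashProc).map(p -> (CrashProc) p)
        .filter(CrashProc::hasMetaTableRegion).findAny();
  }

  // Generic variant: every crash procedure, whether or not it carries meta.
  static List<CrashProc> allCrashProcs(List<Proc> procs) {
    return procs.stream()
        .filter(p -> p instanceof CrashProc).map(p -> (CrashProc) p)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Proc> procs = Arrays.asList(
        new Proc(), new CrashProc("rs3", false), new CrashProc("rs8", true));
    System.out.println(anyMetaCrashProc(procs).map(CrashProc::server).orElse("none"));
    System.out.println(allCrashProcs(procs).size());
  }
}
```

With the meta filter in place, a caller that wants to report all crash procedures would miss the one for `rs3` in this example, which is the concern raised in the comment.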

> hbase:meta location in ZooKeeper set to OPENING by the procedure which 
> eventually failed but precludes Master from assigning it forever
> ---
>
> Key: HBASE-21344
> URL: https://issues.apache.org/jira/browse/HBASE-21344
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>Priority: Major
> Attachments: HBASE-21344-branch-2.0.patch, 
> HBASE-21344-branch-2.0_v2.patch
>
>

[jira] [Commented] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever

2018-10-23 Thread Ankit Singhal (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661544#comment-16661544
 ] 

Ankit Singhal commented on HBASE-21344:
---

bq. You don't seem to be working against the tip of branch-2.0 or branch-2.1. 
You seem to be working in your own branch? Is that so? If so, startup has 
changed pretty radically since 2.0.0.
Actually, I'm working against branch-2.0 only. There you can see that 
tableStateManager is started twice.

At this point we only wait for the IMP, which is fine during the first start 
after a deploy but not when there are SCPs.
https://github.com/apache/hbase/blob/branch-2.0/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java#L929

TableStateManager is started after meta is actually online (which is correct).
https://github.com/apache/hbase/blob/branch-2.0/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java#L958


> hbase:meta location in ZooKeeper set to OPENING by the procedure which 
> eventually failed but precludes Master from assigning it forever
> ---
>
> Key: HBASE-21344
> URL: https://issues.apache.org/jira/browse/HBASE-21344
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>Priority: Major
> Attachments: HBASE-21344-branch-2.0.patch, 
> HBASE-21344-branch-2.0_v2.patch
>
>

[jira] [Commented] (HBASE-21224) Handle compaction queue duplication

2018-10-23 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661536#comment-16661536
 ] 

Hadoop QA commented on HBASE-21224:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 18s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 33s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 47s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m  5s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 57s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  1s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 27s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m  3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m  1s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 10m  3s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 30s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}285m 41s{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 26s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}325m  6s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.client.TestFromClientSide |
|   | hadoop.hbase.namespace.TestNamespaceAuditor |
|   | hadoop.hbase.client.TestSnapshotTemporaryDirectoryWithRegionReplicas |
|   | hadoop.hbase.replication.TestReplicationKillSlaveRS |
|   | hadoop.hbase.client.TestSnapshotDFSTemporaryDirectory |
|   | hadoop.hbase.quotas.TestSpaceQuotas |
|   | hadoop.hbase.regionserver.TestRegionReplicaFailover |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21224 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12945266/HBASE-21224-master.004.patch |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 9607297f8ebf 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 07:31:43 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 3b68e5393e |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| 

[jira] [Commented] (HBASE-21364) Procedure holds the lock should put to front of the queue after restart

2018-10-23 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661533#comment-16661533
 ] 

stack commented on HBASE-21364:
---

Ok. Thanks. Yeah, want to cut an RC0 if I can. Thanks for working on this.

> Procedure holds the lock should put to front of the queue after restart
> ---
>
> Key: HBASE-21364
> URL: https://issues.apache.org/jira/browse/HBASE-21364
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.1.0, 2.0.2
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Blocker
> Fix For: 2.1.1, 2.0.3
>
> Attachments: HBASE-21364.branch-2.0.001.patch, 
> HBASE-21364.branch-2.0.002.patch
>
>
> After restoring the procedures from the Procedure WALs, we put the runnable 
> procedures back into the queue to execute. The order was not a problem before 
> HBASE-20846, since the first procedure to execute would acquire the lock 
> itself. But since HBASE-20846 the locks are restored too. If we execute a 
> procedure without the lock before a procedure with the lock in the same 
> queue, there is a race condition in which we may never execute any of the 
> procedures in that queue.
> The race condition is:
> 1. A procedure that needs the table's exclusive lock is put into the table's 
> queue, but the table's shared lock is held by a Region Procedure. Since no 
> one holds the exclusive lock, the queue is put into the run queue to execute. 
> But soon the worker thread sees that the procedure can't execute because it 
> doesn't hold the lock, so it stops executing and removes the queue from the 
> run queue.
> 2. At the same time, the Region Procedure that holds the table's shared lock 
> and the region's exclusive lock is put into the table's queue. But since the 
> queue has already been added to the run queue, it is not added again.
> 3. Because of step 1, the table's queue is removed from the run queue.
> 4. Then no one puts the table's queue back, so no worker will execute the 
> procedures inside it.
> A test case in the patch shows how.
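
The ordering idea in the issue title can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the attached patch: `RestoredProc` and `RestartQueueSketch` are hypothetical names, and a single table queue is modelled as a plain `Deque`. Procedures whose lock was restored go to the front, so they run first and can release the lock before any waiting procedure is tried:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical model of a restored procedure; not the HBase Procedure class.
final class RestoredProc {
  final long pid;
  final boolean holdsLock; // true if the lock was restored for this procedure

  RestoredProc(long pid, boolean holdsLock) {
    this.pid = pid;
    this.holdsLock = holdsLock;
  }
}

final class RestartQueueSketch {
  // Rebuild one queue after restart: lock holders go to the front, the rest
  // keep their original relative order behind them.
  static Deque<RestoredProc> rebuild(List<RestoredProc> restored) {
    Deque<RestoredProc> queue = new ArrayDeque<>();
    List<RestoredProc> holders = new ArrayList<>();
    for (RestoredProc p : restored) {
      if (p.holdsLock) {
        holders.add(p);
      } else {
        queue.addLast(p);
      }
    }
    // Insert holders in reverse so their relative order is kept at the front.
    for (int i = holders.size() - 1; i >= 0; i--) {
      queue.addFirst(holders.get(i));
    }
    return queue;
  }

  public static void main(String[] args) {
    Deque<RestoredProc> q = rebuild(Arrays.asList(
        new RestoredProc(10, false), new RestoredProc(11, true), new RestoredProc(12, false)));
    for (RestoredProc p : q) {
      System.out.println("pid=" + p.pid + " holdsLock=" + p.holdsLock);
    }
  }
}
```

In this example the lock holder (pid 11) ends up ahead of pids 10 and 12, so the worker never picks a lock-less procedure first and drops the queue from the run queue.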





[jira] [Commented] (HBASE-21364) Procedure holds the lock should put to front of the queue after restart

2018-10-23 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661528#comment-16661528
 ] 

Duo Zhang commented on HBASE-21364:
---

This is a critical problem, so I have marked it as a blocker for 2.1.1.

As for the patch, I suggest that we split it into two pieces. The 
verbose-related code can be done in a separate issue and committed to all 
branches. The code that fixes the actual problem should be committed to 
branch-2.1 and branch-2.0, and that should be done ASAP as we want to push out 
2.1.1 now.

Ping [~stack].



> Procedure holds the lock should put to front of the queue after restart
> ---
>
> Key: HBASE-21364
> URL: https://issues.apache.org/jira/browse/HBASE-21364
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.1.0, 2.0.2
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Blocker
> Fix For: 2.1.1, 2.0.3
>
> Attachments: HBASE-21364.branch-2.0.001.patch, 
> HBASE-21364.branch-2.0.002.patch
>
>





[jira] [Updated] (HBASE-21364) Procedure holds the lock should put to front of the queue after restart

2018-10-23 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21364:
--
Priority: Blocker  (was: Major)

> Procedure holds the lock should put to front of the queue after restart
> ---
>
> Key: HBASE-21364
> URL: https://issues.apache.org/jira/browse/HBASE-21364
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.1.0, 2.0.2
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Blocker
> Fix For: 2.1.1, 2.0.3
>
> Attachments: HBASE-21364.branch-2.0.001.patch, 
> HBASE-21364.branch-2.0.002.patch
>
>





[jira] [Updated] (HBASE-21364) Procedure holds the lock should put to front of the queue after restart

2018-10-23 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21364:
--
Fix Version/s: 2.0.3
   2.1.1

> Procedure holds the lock should put to front of the queue after restart
> ---
>
> Key: HBASE-21364
> URL: https://issues.apache.org/jira/browse/HBASE-21364
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.1.0, 2.0.2
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Fix For: 2.1.1, 2.0.3
>
> Attachments: HBASE-21364.branch-2.0.001.patch, 
> HBASE-21364.branch-2.0.002.patch
>
>





[jira] [Commented] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore

2018-10-23 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661525#comment-16661525
 ] 

Duo Zhang commented on HBASE-21363:
---

Oh shit. Let me check the code. I think this should be in 2.1. The patch is 
almost there. Thanks.

> Rewrite the buildingHoldCleanupTracker method in WALProcedureStore
> --
>
> Key: HBASE-21363
> URL: https://issues.apache.org/jira/browse/HBASE-21363
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21363-v1.patch, HBASE-21363-v2.patch, 
> HBASE-21363.patch
>
>






[jira] [Commented] (HBASE-20828) Finish-up AMv2 Design/List of Tenets/Specification of operation

2018-10-23 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661524#comment-16661524
 ] 

stack commented on HBASE-20828:
---

I need to write up what is in here. The subtasks have changed AMv2 for the
better. Changes like HBASE-21278, where we no longer try to roll back
successful procedures and the parent instead has to schedule new, compensatory
Procedures, need evangelizing. Ditto the background task that tries to limit
our backlog of master proc WALs (TODO).
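
To make the "compensate instead of roll back" idea concrete, here is a generic
sketch. It assumes nothing about the HBASE-21278 change beyond the sentence
above: CompensationSketch, run and the "compensate:" prefix are hypothetical,
and steps are plain strings rather than Procedures.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of scheduling compensating work; not the AMv2 implementation. */
final class CompensationSketch {
  /**
   * Walks the steps in order. If failingStep is reached, the steps that already
   * succeeded are left untouched and a list of new compensating steps is returned,
   * most recent first. Returns an empty list when nothing fails.
   */
  static List<String> run(List<String> steps, String failingStep) {
    List<String> succeeded = new ArrayList<>();
    List<String> scheduled = new ArrayList<>();
    for (String step : steps) {
      if (step.equals(failingStep)) {
        // No undo of completed work: queue fresh steps that counteract it instead.
        for (int i = succeeded.size() - 1; i >= 0; i--) {
          scheduled.add("compensate:" + succeeded.get(i));
        }
        return scheduled;
      }
      succeeded.add(step);
    }
    return scheduled;
  }

  public static void main(String[] args) {
    System.out.println(run(List.of("closeRegion", "updateMeta", "openRegion"), "openRegion"));
  }
}
```

The point of the shape is that completed children stay completed; the parent
decides what new work restores a sane state.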

> Finish-up AMv2 Design/List of Tenets/Specification of operation
> ---
>
> Key: HBASE-20828
> URL: https://issues.apache.org/jira/browse/HBASE-20828
> Project: HBase
>  Issue Type: Umbrella
>  Components: amv2
>Reporter: stack
>Priority: Major
>
> AMv2 is missing a specification. There are still too many grey areas. Also
> missing is a concise listing of the tenets of AMv2 operation. Here are some
> examples:
>  * HBASE-19529 "Handle null states in AM": Asks how we should treat null
> state in hbase:meta. What does it 'mean'? We seem to treat it differently
> depending on context. Needs clarification. [~Apache9] recently asked a
> similar question about the meaning of OFFLINE.
>  * Logging needs to have a particular form to help trace Procedure progress;
> needs a write-up.
> Let's fill in items to address in this umbrella issue. We can address them in
> subissues and produce a specification doc too. We have the below, but these
> are mostly (incomplete) descriptions for devs of pv2 and amv2; the
> specification is missing:
> http://hbase.apache.org/book.html#pv2
> http://hbase.apache.org/book.html#amv2
> (Other areas include addressing what is up w/ rollback -- when, how much, and
> when it is not appropriate -- as well as recommendations on Procedure
> coarseness and locking -- is it ok to lock the table in an alter table
> procedure for the life of the procedure? -- and so on).





[jira] [Updated] (HBASE-21372) Set hbase.assignment.maximum.attempts to Long.MAX

2018-10-23 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21372:
--
Attachment: HBASE-21372.branch-2.1.001.patch

> Set hbase.assignment.maximum.attempts to Long.MAX
> -
>
> Key: HBASE-21372
> URL: https://issues.apache.org/jira/browse/HBASE-21372
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Reporter: stack
>Assignee: stack
>Priority: Major
> Attachments: HBASE-21372.branch-2.1.001.patch, 
> HBASE-21372.branch-2.1.001.patch
>
>
> From the parent issue, [~allan163] suggests that we not give up on assign
> unless there is a change -- an SCP triggers failure -- or, at the extreme, an
> operator intervenes. This jibes w/ how we're thinking about assign (or to put
> it another way, we have no handling for the case where we exhaust retries).
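
The effect of raising the attempt limit can be sketched with a plain retry
loop. This is an illustration under assumptions, not the assignment code:
AssignRetrySketch, assign, succeedsOnCall and the GAVE_UP value are
hypothetical, the stopRequested supplier stands in for "an SCP triggers failure
or an operator intervenes", and the bound of 10 is used purely for contrast.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BooleanSupplier;

/** Hypothetical retry loop; stands in for assignment retries, not taken from HBase. */
final class AssignRetrySketch {
  /** Returned when retries run out or an external stop is requested. */
  static final long GAVE_UP = -1L;

  /**
   * Keeps attempting until an attempt succeeds, the limit is reached, or
   * stopRequested reports true. Returns the number of attempts on success.
   */
  static long assign(long maxAttempts, BooleanSupplier attempt, BooleanSupplier stopRequested) {
    long attempts = 0;
    while (attempts < maxAttempts && !stopRequested.getAsBoolean()) {
      attempts++;
      if (attempt.getAsBoolean()) {
        return attempts;
      }
    }
    return GAVE_UP;
  }

  /** Helper for the demo: an attempt that fails until its k-th call. */
  static BooleanSupplier succeedsOnCall(long k) {
    AtomicLong calls = new AtomicLong();
    return () -> calls.incrementAndGet() >= k;
  }

  public static void main(String[] args) {
    // A small limit gives up before a slow-to-recover target comes back.
    System.out.println(assign(10, succeedsOnCall(15), () -> false));             // prints -1
    // With Long.MAX_VALUE the loop effectively ends only on success or an external stop.
    System.out.println(assign(Long.MAX_VALUE, succeedsOnCall(15), () -> false)); // prints 15
  }
}
```

With the larger limit, the "retries exhausted" branch that has no handling is
practically never reached; only success or an outside event ends the loop.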





[jira] [Commented] (HBASE-21372) Set hbase.assignment.maximum.attempts to Long.MAX

2018-10-23 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661515#comment-16661515
 ] 

Hadoop QA commented on HBASE-21372:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
31s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange}  
0m  0s{color} | {color:orange} The patch doesn't appear to include any new or 
modified tests. Please justify why no new tests are needed for this patch. Also 
please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} branch-2.1 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
15s{color} | {color:green} branch-2.1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
13s{color} | {color:green} branch-2.1 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
20s{color} | {color:green} branch-2.1 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
58s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
19s{color} | {color:green} branch-2.1 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
48s{color} | {color:green} branch-2.1 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 
16s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
13m 20s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}210m 22s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
43s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}261m 12s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:42ca976 |
| JIRA Issue | HBASE-21372 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12945268/HBASE-21372.branch-2.1.001.patch
 |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux cd3b360c9c27 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | branch-2.1 / e29ce9f937 |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| unit | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14830/artifact/patchprocess/patch-unit-hbase-server.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14830/testReport/ |
| Max. process+thread count | 4497 (vs. 

[jira] [Commented] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever

2018-10-23 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661513#comment-16661513
 ] 

stack commented on HBASE-21344:
---

[~an...@apache.org] You don't seem to be working against the tip of branch-2.0 
or branch-2.1. You seem to be working in your own branch? Is that so? If so, 
startup has changed pretty radically since 2.0.0.

> hbase:meta location in ZooKeeper set to OPENING by the procedure which 
> eventually failed but precludes Master from assigning it forever
> ---
>
> Key: HBASE-21344
> URL: https://issues.apache.org/jira/browse/HBASE-21344
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>Priority: Major
> Attachments: HBASE-21344-branch-2.0.patch, 
> HBASE-21344-branch-2.0_v2.patch
>
>
> [~elserj] has already summarized it well.
> 1. hbase:meta was on RS8
> 2. RS8 crashed, SCP was queued for it, meta first
> 3. meta was marked OFFLINE
> 4. meta marked as OPENING on RS3
> 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue
> 6. We attempt the openRegion/assignment 10 times, failing each time
> 7. We start rolling back the procedure:
> {code:java}
> 2018-10-08 06:51:24,440 WARN  [PEWorker-9] procedure2.ProcedureExecutor: 
> Usually this should not happen, we will release the lock before if the 
> procedure is finished, even if the holdLock is true, arrive here means we 
> have some holes where we do not release the lock. And the releaseLock below 
> may fail since the procedure may have already been deleted from the procedure 
> store.
> 2018-10-08 06:51:24,543 INFO  [PEWorker-9] 
> procedure.MasterProcedureScheduler: pid=48, ppid=47, 
> state=FAILED:REGION_TRANSITION_QUEUE, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 
> checking lock on 1588230740
> {code}
> {code:java}
> 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: 
> CODE-BUG: Uncaught runtime exception for pid=47, 
> state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; ServerCrashProcedure 
> server=,16020,1538974612843, splitWal=true, meta=true
> java.lang.UnsupportedOperationException: unhandled 
> state=SERVER_CRASH_GET_REGIONS
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254)
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58)
>   at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
>   at 
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981)
> {code}
> {code:java}
> { DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, 
> retries=7, started=8168 ms ago, cancelled=false, msg=Meta region is in state 
> OPENING, details=row 'backup:system' on table 'hbase:meta' at 
> region=hbase:meta,,1.1588230740, hostname=, seqNum=-1, 
> exception=java.io.IOException: Meta region is in state OPENING
> at 
> org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$null$1(ZKAsyncRegistry.java:154)
> at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
> at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
> at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
> at 
> org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$getAndConvert$0(ZKAsyncRegistry.java:77)
> at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
> at 
> 

[jira] [Commented] (HBASE-21349) Cluster is going down but CatalogJanitor and Normalizer try to run and fail noisely

2018-10-23 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661486#comment-16661486
 ] 

Hadoop QA commented on HBASE-21349:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
13s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange}  
0m  0s{color} | {color:orange} The patch doesn't appear to include any new or 
modified tests. Please justify why no new tests are needed for this patch. Also 
please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
 1s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
45s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
15s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
19s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
2s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
29s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
10m 29s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}130m 40s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}173m 43s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.client.TestBlockEvictionFromClient |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21349 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12945273/HBASE-21349.master.002.patch
 |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux fc32ee6e94ae 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
10:45:36 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 1e9d998727 |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| unit | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14831/artifact/patchprocess/patch-unit-hbase-server.txt
 |
|  Test Results | 

[jira] [Work stopped] (HBASE-21373) Backport to branch-1, "HBASE-21338 [balancer] If balancer is an ill-fit for cluster size, it gives little indication"

2018-10-23 Thread Xu Cang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-21373 stopped by Xu Cang.
---
> Backport to branch-1, "HBASE-21338 [balancer] If balancer is an ill-fit for 
> cluster size, it gives little indication"
> -
>
> Key: HBASE-21373
> URL: https://issues.apache.org/jira/browse/HBASE-21373
> Project: HBase
>  Issue Type: Bug
>  Components: Operability
>Reporter: stack
>Assignee: Xu Cang
>Priority: Major
> Attachments: HBASE-21373.branch-1.001.patch
>
>
> Issue to backport to branch-1. Hope you don't mind my assigning it to you Xu 
> Cang.





[jira] [Work started] (HBASE-21373) Backport to branch-1, "HBASE-21338 [balancer] If balancer is an ill-fit for cluster size, it gives little indication"

2018-10-23 Thread Xu Cang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-21373 started by Xu Cang.
---


[jira] [Updated] (HBASE-21373) Backport to branch-1, "HBASE-21338 [balancer] If balancer is an ill-fit for cluster size, it gives little indication"

2018-10-23 Thread Xu Cang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Cang updated HBASE-21373:

Attachment: HBASE-21373.branch-1.001.patch
Status: Patch Available  (was: Open)



[jira] [Commented] (HBASE-21073) "Maintenance mode" master

2018-10-23 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661421#comment-16661421
 ] 

Hudson commented on HBASE-21073:


Results for branch branch-2.0
[build #1002 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1002/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1002//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1002//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1002//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


> "Maintenance mode" master
> -
>
> Key: HBASE-21073
> URL: https://issues.apache.org/jira/browse/HBASE-21073
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2, hbck2, master
>Reporter: stack
>Assignee: Mike Drob
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21073.branch-2.001.patch, 
> HBASE-21073.branch-2.1.001.patch, HBASE-21073.branch-2.1.002.patch, 
> HBASE-21073.master.001.patch, HBASE-21073.master.002.patch, 
> HBASE-21073.master.003.patch, HBASE-21073.master.004.patch, 
> HBASE-21073.master.005.patch, HBASE-21073.master.006.patch, 
> HBASE-21073.master.007.patch, HBASE-21073.master.008.patch, 
> HBASE-21073.master.009.patch, HBASE-21073.master.010.patch, 
> HBASE-21073.master.011.patch
>
>
> Make it so we can bring up a Master in "maintenance mode". In this mode the
> Master parses the master proc WALs but does not take on regionservers. It
> would be in a state where "repair" Procedures could run; e.g. a Procedure
> that could recover meta by looking for meta WALs, splitting them, dropping
> recovered.edits, and even making it so meta is readable. See the parent issue
> for why this is needed (disaster recovery).





[jira] [Commented] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever

2018-10-23 Thread Ankit Singhal (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661395#comment-16661395
 ] 

Ankit Singhal commented on HBASE-21344:
---

bq. This should be happening already. We wait on meta assign. If SCPs, they'll 
run and recover meta if one of them was holding it. If no assign for meta in 
the procedure store, then something untoward and at least for now, operator 
needs to figure what happened until we fix the bug. Operator can schedule an 
assign with hbck2
bq. branch-2.0 will go into a holding pattern if hbase:meta is not assigned 
(ditto if hbase:namespace is not assigned) waiting on operator intervention to 
clear the lack-of-assign.
Thanks [~stack] for the pointer. I didn't go down as the problem started when
we were starting tableStateManager without waiting for meta assignment by
SCPs. I think we can just remove this call from here, as we already start it
after waiting for meta to come online (patch attached).
{code}
 if (initMetaProc != null) {
   initMetaProc.await();
 }
-tableStateManager.start();
{code}

bq. That said, I see some value in this patch. In particular the bit around 
resetting hbase:meta state if failure.
We shouldn't offline meta when the assignment fails, as that will start the
InitMetaProcedure (which we don't want, since the SCP needs to take care of
recovering meta).
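
The ordering constraint discussed in this comment (start the table state
manager only once meta is online) can be sketched with a future. This is a
hedged illustration, not the patch: StartupOrderSketch and the event strings
are hypothetical, and a CompletableFuture stands in for whatever signals that
meta has been assigned.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Hypothetical startup-ordering sketch; the names are placeholders, not HBase classes. */
final class StartupOrderSketch {
  /** Returns the order in which the two startup events were recorded. */
  static List<String> startup() {
    List<String> events = new ArrayList<>();
    CompletableFuture<Void> metaOnline = new CompletableFuture<>();

    // Register the dependent start first; it cannot run until metaOnline completes.
    CompletableFuture<Void> started =
        metaOnline.thenRun(() -> events.add("tableStateManager.start"));

    // Later, whoever recovers meta (for example an SCP) reports that it is online.
    events.add("meta online");
    metaOnline.complete(null);

    started.join();
    return events;
  }

  public static void main(String[] args) {
    System.out.println(startup()); // prints [meta online, tableStateManager.start]
  }
}
```

Tying the start to the completion signal, instead of calling it at a fixed
point in startup, is what keeps it from racing ahead of meta assignment.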




[jira] [Updated] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever

2018-10-23 Thread Ankit Singhal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Singhal updated HBASE-21344:
--
Attachment: HBASE-21344-branch-2.0_v2.patch

> hbase:meta location in ZooKeeper set to OPENING by the procedure which 
> eventually failed but precludes Master from assigning it forever
> ---
>
> Key: HBASE-21344
> URL: https://issues.apache.org/jira/browse/HBASE-21344
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>Priority: Major
> Attachments: HBASE-21344-branch-2.0.patch, 
> HBASE-21344-branch-2.0_v2.patch
>
>
> [~elserj] has already summarized it well.
> 1. hbase:meta was on RS8
> 2. RS8 crashed, SCP was queued for it, meta first
> 3. meta was marked OFFLINE
> 4. meta marked as OPENING on RS3
> 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue
> 6. We attempt the openRegion/assignment 10 times, failing each time
> 7. We start rolling back the procedure:
> {code:java}
> 2018-10-08 06:51:24,440 WARN  [PEWorker-9] procedure2.ProcedureExecutor: 
> Usually this should not happen, we will release the lock before if the 
> procedure is finished, even if the holdLock is true, arrive here means we 
> have some holes where we do not release the lock. And the releaseLock below 
> may fail since the procedure may have already been deleted from the procedure 
> store.
> 2018-10-08 06:51:24,543 INFO  [PEWorker-9] 
> procedure.MasterProcedureScheduler: pid=48, ppid=47, 
> state=FAILED:REGION_TRANSITION_QUEUE, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 
> checking lock on 1588230740
> {code}
> {code:java}
> 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: 
> CODE-BUG: Uncaught runtime exception for pid=47, 
> state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; ServerCrashProcedure 
> server=,16020,1538974612843, splitWal=true, meta=true
> java.lang.UnsupportedOperationException: unhandled 
> state=SERVER_CRASH_GET_REGIONS
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254)
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58)
>   at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
>   at 
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981)
> {code}
> {code:java}
> { DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, 
> retries=7, started=8168 ms ago, cancelled=false, msg=Meta region is in state 
> OPENING, details=row 'backup:system' on table 'hbase:meta' at 
> region=hbase:meta,,1.1588230740, hostname=, seqNum=-1, 
> exception=java.io.IOException: Meta region is in state OPENING
> at 
> org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$null$1(ZKAsyncRegistry.java:154)
> at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
> at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
> at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
> at 
> org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$getAndConvert$0(ZKAsyncRegistry.java:77)
> at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
> at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
> at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at 
> 

[jira] [Resolved] (HBASE-21353) TestHBCKCommandLineParsing#testCommandWithOptions hangs on call to HBCK2#checkHBCKSupport

2018-10-23 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-21353.
---
   Resolution: Fixed
 Assignee: stack
Fix Version/s: hbck2-1.0.0

Pushed fix over on hbase-operator-tools/hbase-hbck2.

> TestHBCKCommandLineParsing#testCommandWithOptions hangs on call to 
> HBCK2#checkHBCKSupport
> -
>
> Key: HBASE-21353
> URL: https://issues.apache.org/jira/browse/HBASE-21353
> Project: HBase
>  Issue Type: Test
>  Components: hbase-operator-tools, hbck2
>Reporter: Ted Yu
>Assignee: stack
>Priority: Major
> Fix For: hbck2-1.0.0
>
>
> I noticed the following when running 
> TestHBCKCommandLineParsing#testCommandWithOptions :
> {code}
> "main" #1 prio=5 os_prio=31 tid=0x7f851c80 nid=0x1703 waiting on 
> condition [0x70216000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x00076d3055d8> (a 
> java.util.concurrent.CompletableFuture$Signaller)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
>   at 
> java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>   at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
>   at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>   at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.retrieveClusterId(ConnectionImplementation.java:564)
>   at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.<init>(ConnectionImplementation.java:297)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.hadoop.hbase.client.ConnectionFactory.lambda$createConnection$0(ConnectionFactory.java:229)
>   at 
> org.apache.hadoop.hbase.client.ConnectionFactory$$Lambda$11/502838712.run(Unknown
>  Source)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1686)
>   at 
> org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:347)
>   at 
> org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:227)
>   at 
> org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:127)
>   at org.apache.hbase.HBCK2.checkHBCKSupport(HBCK2.java:93)
>   at org.apache.hbase.HBCK2.run(HBCK2.java:352)
>   at 
> org.apache.hbase.TestHBCKCommandLineParsing.testCommandWithOptions(TestHBCKCommandLineParsing.java:62)
> {code}
> The test doesn't spin up an HBase cluster.
> Hence the call to check hbck support hangs.
> In HBCK2#run, we can refactor the code such that argument parsing is done 
> prior to calling HBCK2#checkHBCKSupport.
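
The refactoring suggested above could look roughly like the following sketch.
It is not the actual hbase-operator-tools code: the class
ParseBeforeConnectSketch, the command names in KNOWN_COMMANDS, the exit codes,
and the BooleanSupplier standing in for HBCK2#checkHBCKSupport are all
assumptions made for illustration. The only point is the ordering: arguments
are validated first, so a malformed command line returns without ever
attempting a cluster connection.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BooleanSupplier;

/** Hypothetical stand-in for HBCK2#run; names and exit codes are assumptions. */
final class ParseBeforeConnectSketch {
  static final int EXIT_SUCCESS = 0;
  static final int EXIT_FAILURE = 1;

  // Illustrative command names only; not taken from the issue.
  static final List<String> KNOWN_COMMANDS = Arrays.asList("assigns", "unassigns");

  /**
   * Validates the arguments before doing anything that needs a cluster.
   *
   * @param supportCheck stand-in for HBCK2#checkHBCKSupport, which would
   *                     open a connection and can block without a cluster
   */
  static int run(String[] args, BooleanSupplier supportCheck) {
    // Step 1: pure argument parsing, no I/O.
    if (args.length == 0 || !KNOWN_COMMANDS.contains(args[0])) {
      return EXIT_FAILURE; // rejected before any connection attempt
    }
    // Step 2: only a well-formed command reaches the cluster-dependent check.
    if (!supportCheck.getAsBoolean()) {
      return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
  }

  public static void main(String[] args) {
    // The check throws here to show it is never reached for bad input.
    int rc = run(new String[] {"no-such-command"}, () -> {
      throw new IllegalStateException("should not try to connect");
    });
    System.out.println("exit code: " + rc);
  }
}
```

This ordering only keeps tests whose outcome is decided during parsing away
from the cluster; a valid command still reaches the support check.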



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21353) TestHBCKCommandLineParsing#testCommandWithOptions hangs on call to HBCK2#checkHBCKSupport

2018-10-23 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21353:
--
Component/s: hbck2
 hbase-operator-tools

> TestHBCKCommandLineParsing#testCommandWithOptions hangs on call to 
> HBCK2#checkHBCKSupport
> -
>
> Key: HBASE-21353
> URL: https://issues.apache.org/jira/browse/HBASE-21353
> Project: HBase
>  Issue Type: Test
>  Components: hbase-operator-tools, hbck2
>Reporter: Ted Yu
>Priority: Major
>





[jira] [Commented] (HBASE-21371) Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license error

2018-10-23 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661363#comment-16661363
 ] 

Wei-Chiu Chuang commented on HBASE-21371:
-

{quote}yes, this will need to be done wherever we ship the relevant jar. if we 
already have bouncycastle as a dependency shouldn't we have an entry for it 
though?
{quote}
It looks like bouncycastle was under the MIT License before (Hadoop used a very 
old version of bouncycastle in the past), and it's now under the Bouncy Castle 
Licence, although that is essentially the same thing.

> Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license 
> error
> --
>
> Key: HBASE-21371
> URL: https://issues.apache.org/jira/browse/HBASE-21371
> Project: HBase
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
> Attachments: HBASE-21371.001.patch
>
>
> Hadoop 3.3.0 (trunk) updated various dependencies, which adds additional 
> licenses that break HBase's license check plugin.
> CDDL/GPLv2+CE license
> {quote}This product includes JavaBeans Activation Framework API jar licensed 
> under the CDDL/GPLv2+CE.
> CDDL or GPL version 2 plus the Classpath Exception
>  ERROR: Please check  this License for acceptability here:
> [https://www.apache.org/legal/resolved]
> If it is okay, then update the list named 'non_aggregate_fine' in the 
> LICENSE.vm file.
>  If it isn't okay, then revert the change that added the dependency.
> More info on the dependency:
> javax.activation
>  javax.activation-api
>  1.2.0
> maven central search
>  g:javax.activation AND a:javax.activation-api AND v:1.2.0
> project website
>  [http://java.net/all/javax.activation-api/]
>  project source
>  [https://github.com/javaee/activation/javax.activation-api]
> {quote}
> Bouncy Castle License 
> {quote}–
>  This product includes Bouncy Castle PKIX, CMS, EAC, TSP, PKCS, OCSP, CMP, 
> and CRMF APIs licensed under the Bouncy Castle Licence.
> ERROR: Please check  this License for acceptability here:
> [https://www.apache.org/legal/resolved]
> If it is okay, then update the list named 'non_aggregate_fine' in the 
> LICENSE.vm file.
>  If it isn't okay, then revert the change that added the dependency.
> More info on the dependency:
> org.bouncycastle
>  bcpkix-jdk15on
>  1.60
> maven central search
>  g:org.bouncycastle AND a:bcpkix-jdk15on AND v:1.60
> project website
>  [http://www.bouncycastle.org/java.html]
>  project source
>  [https://github.com/bcgit/bc-java]
>  –
> {quote}
>  
> And a long list of "Apache Software License - Version 2.0" licensed Jetty 
> dependencies like this:
> {quote}
> This product includes Jetty :: Servlet Annotations licensed under the Apache 
> Software License - Version 2.0.
> ERROR: Please check  this License for acceptability here:
> [https://www.apache.org/legal/resolved]
> If it is okay, then update the list named 'non_aggregate_fine' in the 
> LICENSE.vm file.
>  If it isn't okay, then revert the change that added the dependency.
> More info on the dependency:
> org.eclipse.jetty
>  jetty-annotations
>  9.3.19.v20170502
> maven central search
>  g:org.eclipse.jetty AND a:jetty-annotations AND v:9.3.19.v20170502
> project website
>  [http://www.eclipse.org/jetty]
>  project source
>  [https://github.com/eclipse/jetty.project/jetty-annotations]
> {quote}





[jira] [Commented] (HBASE-21371) Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license error

2018-10-23 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661361#comment-16661361
 ] 

Wei-Chiu Chuang commented on HBASE-21371:
-

{quote}instead of adding yet another way of referring to ALv2 can we please 
update supplemental to correct the new phrasing?
{quote}
Yeah I was doing that until I realized there are like a dozen Jetty artifacts 
that require the update ... But sure I can do that if this is the preferred way.

> Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license 
> error
> --
>
> Key: HBASE-21371
> URL: https://issues.apache.org/jira/browse/HBASE-21371
> Project: HBase
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
> Attachments: HBASE-21371.001.patch
>
>





[jira] [Commented] (HBASE-21374) Backport HBASE-21342 to branch-1

2018-10-23 Thread Mike Drob (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661360#comment-16661360
 ] 

Mike Drob commented on HBASE-21374:
---

There was a conflict in the cherry-pick that I haven't looked into yet. I will 
take care of it tomorrow if nobody else gets to it before then.

FYI [~apurtell]

> Backport HBASE-21342 to branch-1
> 
>
> Key: HBASE-21374
> URL: https://issues.apache.org/jira/browse/HBASE-21374
> Project: HBase
>  Issue Type: Task
>Reporter: Mike Drob
>Priority: Major
>






[jira] [Created] (HBASE-21374) Backport HBASE-21342 to branch-1

2018-10-23 Thread Mike Drob (JIRA)
Mike Drob created HBASE-21374:
-

 Summary: Backport HBASE-21342 to branch-1
 Key: HBASE-21374
 URL: https://issues.apache.org/jira/browse/HBASE-21374
 Project: HBase
  Issue Type: Task
Reporter: Mike Drob








[jira] [Updated] (HBASE-21342) FileSystem in use may get closed by other bulk load call in secure bulkLoad

2018-10-23 Thread Mike Drob (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Drob updated HBASE-21342:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> FileSystem in use may get closed by other bulk load call  in secure bulkLoad
> 
>
> Key: HBASE-21342
> URL: https://issues.apache.org/jira/browse/HBASE-21342
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.1.0, 1.5.0, 1.3.3, 1.4.4, 2.0.1, 1.2.7
>Reporter: mazhenlin
>Assignee: mazhenlin
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: 21342.v1.txt, HBASE-21342.002.patch, 
> HBASE-21342.003.patch, HBASE-21342.004.patch, HBASE-21342.005.patch, 
> HBASE-21342.006.patch, HBASE-21342.007.patch, race.patch
>
>
> As mentioned in [HBASE-15291|#HBASE-15291], there is a race condition. If 
> two secure bulkload calls from the same UGI go into two different regions and 
> one region finishes earlier, it will close the bulk load fs, and the other 
> region will fail.
>  
> Another case would be more serious. The FileSystem.close() function needs two 
> synchronized variables: CACHE and deleteOnExit. If one region calls 
> FileSystem.closeAllForUGI (in SecureBulkLoadManager.cleanupBulkLoad) while 
> another region is trying to close srcFS (in 
> SecureBulkLoadListener.closeSrcFs), a deadlock can occur.
>  
> I have written a UT for this and fixed it using a reference counter.
>  
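
The reference-counter idea mentioned in the description can be sketched as
follows. This is a generic illustration, not the code from the attached
patches: the class RefCountedResourceSketch, its method names, and the Runnable
close action (standing in for closing the shared bulk load FileSystem) are
assumptions. Each user acquires the shared resource before use and releases it
afterwards, and only the last release actually closes it, so a region that
finishes early cannot close the resource under one that is still running.

```java
/** Hypothetical reference-counted wrapper; not the HBASE-21342 patch itself. */
final class RefCountedResourceSketch {
  private final Runnable closeAction; // stand-in for closing the shared fs
  private int refCount = 0;
  private boolean closed = false;

  RefCountedResourceSketch(Runnable closeAction) {
    this.closeAction = closeAction;
  }

  /** Registers one more user of the shared resource. */
  synchronized void acquire() {
    if (closed) {
      throw new IllegalStateException("resource already closed");
    }
    refCount++;
  }

  /**
   * Drops one user. The resource is closed only when the last user leaves.
   *
   * @return true if this call closed the resource
   */
  synchronized boolean release() {
    if (refCount == 0) {
      throw new IllegalStateException("release without matching acquire");
    }
    refCount--;
    if (refCount == 0) {
      closed = true;
      closeAction.run();
      return true;
    }
    return false;
  }

  synchronized boolean isClosed() {
    return closed;
  }

  public static void main(String[] args) {
    RefCountedResourceSketch fs =
        new RefCountedResourceSketch(() -> System.out.println("closing shared fs"));
    fs.acquire(); // bulk load into region A
    fs.acquire(); // bulk load into region B, same UGI
    System.out.println("A closed it: " + fs.release()); // B is still using it
    System.out.println("B closed it: " + fs.release()); // last user closes
  }
}
```

All bookkeeping goes through a single monitor here, which keeps the sketch
simple; how the real fix interacts with the locks inside FileSystem is not
shown.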





[jira] [Commented] (HBASE-21342) FileSystem in use may get closed by other bulk load call in secure bulkLoad

2018-10-23 Thread Mike Drob (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661356#comment-16661356
 ] 

Mike Drob commented on HBASE-21342:
---

I'm not trying to ignore any versions or leave anything unfixed. I'm not trying 
to make policy.

The fix versions are the set of versions that I had already done backports and 
pushed to. I hadn't pushed code to branch-1 or branch-2.0 yet, so they weren't 
included. I was using fix version in Jira as descriptive, not prescriptive. I 
wanted to be able to close the issue so that the RM could generate release 
notes if needed, and then expected to continue work in a separate backport 
issue.

I have pushed this to branch-2.0+

There is a conflict cherry-picking to branch-1 that I don't have time to 
resolve today. I will open a backport Jira to not conflict with the release 
notes for Stack if he cuts a 2.1.1 tonight.

> FileSystem in use may get closed by other bulk load call  in secure bulkLoad
> 
>
> Key: HBASE-21342
> URL: https://issues.apache.org/jira/browse/HBASE-21342
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.1.0, 1.5.0, 1.3.3, 1.4.4, 2.0.1, 1.2.7
>Reporter: mazhenlin
>Assignee: mazhenlin
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: 21342.v1.txt, HBASE-21342.002.patch, 
> HBASE-21342.003.patch, HBASE-21342.004.patch, HBASE-21342.005.patch, 
> HBASE-21342.006.patch, HBASE-21342.007.patch, race.patch
>
>





[jira] [Updated] (HBASE-21342) FileSystem in use may get closed by other bulk load call in secure bulkLoad

2018-10-23 Thread Mike Drob (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Drob updated HBASE-21342:
--
Fix Version/s: 2.0.3
   2.1.1

> FileSystem in use may get closed by other bulk load call  in secure bulkLoad
> 
>
> Key: HBASE-21342
> URL: https://issues.apache.org/jira/browse/HBASE-21342
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.1.0, 1.5.0, 1.3.3, 1.4.4, 2.0.1, 1.2.7
>Reporter: mazhenlin
>Assignee: mazhenlin
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: 21342.v1.txt, HBASE-21342.002.patch, 
> HBASE-21342.003.patch, HBASE-21342.004.patch, HBASE-21342.005.patch, 
> HBASE-21342.006.patch, HBASE-21342.007.patch, race.patch
>
>





[jira] [Commented] (HBASE-21342) FileSystem in use may get closed by other bulk load call in secure bulkLoad

2018-10-23 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661350#comment-16661350
 ] 

Andrew Purtell commented on HBASE-21342:


The affects versions are set. Shouldn’t the fix versions be the same if that 
code is similarly affected? Are we leaving known bugs in branch-1 unfixed by 
policy now? Separate back port JIRA is better than nothing but not much 

> FileSystem in use may get closed by other bulk load call  in secure bulkLoad
> 
>
> Key: HBASE-21342
> URL: https://issues.apache.org/jira/browse/HBASE-21342
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.1.0, 1.5.0, 1.3.3, 1.4.4, 2.0.1, 1.2.7
>Reporter: mazhenlin
>Assignee: mazhenlin
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: 21342.v1.txt, HBASE-21342.002.patch, 
> HBASE-21342.003.patch, HBASE-21342.004.patch, HBASE-21342.005.patch, 
> HBASE-21342.006.patch, HBASE-21342.007.patch, race.patch
>
>





[jira] [Updated] (HBASE-21349) Cluster is going down but CatalogJanitor and Normalizer try to run and fail noisely

2018-10-23 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21349:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: (was: 2.1.2)
   2.1.1
   Status: Resolved  (was: Patch Available)

Pushed to branch-2.0+. Thanks for the patch [~xucang]

> Cluster is going down but CatalogJanitor and Normalizer try to run and fail 
> noisely
> ---
>
> Key: HBASE-21349
> URL: https://issues.apache.org/jira/browse/HBASE-21349
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: stack
>Assignee: Xu Cang
>Priority: Minor
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21349.master.002.patch, 
> HBASE-21349.master.002.patch, HBASE-21349.master.002.patch, 
> HBASE-22349.master.001.patch
>
>
> Shutting down can take a while. Meanwhile the catalog janitor and/or 
> normalizer (etc.?) try to run, and when they can't, they fail noisily. Looks bad:
> {code}
> 2018-10-19 21:23:24,962 INFO org.apache.hadoop.hbase.master.ServerManager: 
> Cluster shutdown set; vc1205.halxg.cloudera.com,22101,1539991730711 expired; 
> onlineServers=51
> 2018-10-19 21:25:54,502 WARN org.apache.hadoop.hbase.master.CatalogJanitor: 
> Failed scan of catalog table
> java.io.IOException: connection is closed
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.getMetaHTable(MetaTableAccessor.java:267)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:763)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:734)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:684)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMetaForTableRegions(MetaTableAccessor.java:679)
> at 
> org.apache.hadoop.hbase.master.CatalogJanitor.getMergedRegionsAndSplitParents(CatalogJanitor.java:185)
> at 
> org.apache.hadoop.hbase.master.CatalogJanitor.getMergedRegionsAndSplitParents(CatalogJanitor.java:137)
> at 
> org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:243)
> at 
> org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:116)
> at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:186)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at 
> org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:111)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2018-10-19 21:25:54,507 ERROR 
> org.apache.hadoop.hbase.master.normalizer.RegionNormalizerChore: Failed to 
> normalize regions.
> java.io.IOException: connection is closed
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.getMetaHTable(MetaTableAccessor.java:267)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:763)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:734)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:690)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.fullScanTables(MetaTableAccessor.java:240)
> at 
> org.apache.hadoop.hbase.master.TableStateManager.getTablesInStates(TableStateManager.java:189)
> at 
> org.apache.hadoop.hbase.master.HMaster.normalizeRegions(HMaster.java:1718)
> at 
> org.apache.hadoop.hbase.master.normalizer.RegionNormalizerChore.chore(RegionNormalizerChore.java:48)
> at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:186)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at 
> 

[jira] [Commented] (HBASE-21342) FileSystem in use may get closed by other bulk load call in secure bulkLoad

2018-10-23 Thread Mike Drob (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661336#comment-16661336
 ] 

Mike Drob commented on HBASE-21342:
---

I didn't drop the branch-1 versions, they were never in the fix version list. 
They're in affects version still.

I paused my backports to consult with Stack offline about whether it's safe to 
commit to branch-2.1 right now, since I know he's prepping for a release. I will 
likely spin off separate backport issues.

> FileSystem in use may get closed by other bulk load call  in secure bulkLoad
> 
>
> Key: HBASE-21342
> URL: https://issues.apache.org/jira/browse/HBASE-21342
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.1.0, 1.5.0, 1.3.3, 1.4.4, 2.0.1, 1.2.7
>Reporter: mazhenlin
>Assignee: mazhenlin
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: 21342.v1.txt, HBASE-21342.002.patch, 
> HBASE-21342.003.patch, HBASE-21342.004.patch, HBASE-21342.005.patch, 
> HBASE-21342.006.patch, HBASE-21342.007.patch, race.patch
>
>





[jira] [Commented] (HBASE-20623) Introduce the helper method "getCellBuilder()" to Mutation

2018-10-23 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661315#comment-16661315
 ] 

Hadoop QA commented on HBASE-20623:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 59s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 8s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 12s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} refguide {color} | {color:blue} 5m 28s{color} | {color:blue} branch has no errors when building the reference guide. See footer for rendered docs, which you should manually inspect. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 18s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: . {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 2s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 34s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:blue}0{color} | {color:blue} refguide {color} | {color:blue} 5m 11s{color} | {color:blue} patch has no errors when building the reference guide. See footer for rendered docs, which you should manually inspect. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 33s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 11m 52s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: . {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 35s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}303m 24s{color} | {color:green} root in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 1m 42s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}378m 34s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-20623 |
| JIRA Patch URL | 

[jira] [Commented] (HBASE-21349) Cluster is going down but CatalogJanitor and Normalizer try to run and fail noisely

2018-10-23 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661277#comment-16661277
 ] 

Hadoop QA commented on HBASE-21349:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange} 0m 0s{color} | {color:orange} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 26s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 49s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 15s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 12s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 12s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 37s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 34s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 11m 22s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 134m 41s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 178m 49s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21349 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12945252/HBASE-21349.master.002.patch |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux c8b0eb4d0b34 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 3b68e5393e |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/14827/testReport/ |
| Max. process+thread count | 5051 (vs. ulimit of 1) |
| modules | C: hbase-server U: hbase-server |
| Console output | 

[jira] [Commented] (HBASE-21342) FileSystem in use may get closed by other bulk load call in secure bulkLoad

2018-10-23 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661275#comment-16661275
 ] 

Andrew Purtell commented on HBASE-21342:


Why did we drop all of the branch-1 fix versions? Is this not an issue there?

> FileSystem in use may get closed by other bulk load call  in secure bulkLoad
> 
>
> Key: HBASE-21342
> URL: https://issues.apache.org/jira/browse/HBASE-21342
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.1.0, 1.5.0, 1.3.3, 1.4.4, 2.0.1, 1.2.7
>Reporter: mazhenlin
>Assignee: mazhenlin
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: 21342.v1.txt, HBASE-21342.002.patch, 
> HBASE-21342.003.patch, HBASE-21342.004.patch, HBASE-21342.005.patch, 
> HBASE-21342.006.patch, HBASE-21342.007.patch, race.patch
>
>
> As mentioned in [HBASE-15291|#HBASE-15291], there is a race condition. If 
> two secure bulk load calls from the same UGI go into two different regions 
> and one region finishes earlier, it will close the bulk load fs, and the 
> other region will fail.
>  
> Another case would be more serious. The FileSystem.close() function needs two 
> synchronized variables: CACHE and deleteOnExit. If one region calls 
> FileSystem.closeAllForUGI (in SecureBulkLoadManager.cleanupBulkLoad) while 
> another region is trying to close srcFS (in 
> SecureBulkLoadListener.closeSrcFs), a deadlock can occur.
>  
> I have written a UT for this and fixed it using a reference counter.
>  
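
The reference-counter idea mentioned in the description can be illustrated with a minimal sketch. This is not the attached patch: the class SharedResourceTracker, its methods, and the use of a plain user-name string in place of a UGI are all hypothetical, and the closer callback merely stands in for whatever actually closes the shared FileSystem. The point is only that the resource is closed when the last in-flight bulk load for a user finishes, not when the first one does.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

/**
 * Hypothetical sketch (not the HBASE-21342 patch): counts in-flight
 * bulk loads per user and closes the shared resource only when the
 * last one finishes.
 */
final class SharedResourceTracker {
  // Number of in-flight bulk loads per user (stand-in for a UGI).
  private final Map<String, Integer> refCounts = new HashMap<>();
  // Assumed callback that closes the per-user resource.
  private final Consumer<String> closer;

  SharedResourceTracker(Consumer<String> closer) {
    this.closer = closer;
  }

  /** Record that a bulk load for this user has started. */
  synchronized void acquire(String user) {
    refCounts.merge(user, 1, Integer::sum);
  }

  /**
   * Record that a bulk load for this user has finished.
   * Returns true only if this call closed the resource.
   */
  synchronized boolean release(String user) {
    Integer count = refCounts.get(user);
    if (count == null) {
      throw new IllegalStateException("release without acquire: " + user);
    }
    if (count > 1) {
      // Other bulk loads still use the resource; do not close it.
      refCounts.put(user, count - 1);
      return false;
    }
    refCounts.remove(user);
    // The callback runs while the tracker's lock is held, which keeps
    // the sketch simple; a real fix would have to consider lock ordering.
    closer.accept(user);
    return true;
  }
}
```

With two concurrent bulk loads for the same user, the first release() leaves the resource open and only the second one invokes the closer, which is the behaviour the race in the description lacks.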



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21342) FileSystem in use may get closed by other bulk load call in secure bulkLoad

2018-10-23 Thread Mike Drob (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Drob updated HBASE-21342:
--
Fix Version/s: 2.2.0
   3.0.0

> FileSystem in use may get closed by other bulk load call  in secure bulkLoad
> 
>
> Key: HBASE-21342
> URL: https://issues.apache.org/jira/browse/HBASE-21342
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.1.0, 1.5.0, 1.3.3, 1.4.4, 2.0.1, 1.2.7
>Reporter: mazhenlin
>Assignee: mazhenlin
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: 21342.v1.txt, HBASE-21342.002.patch, 
> HBASE-21342.003.patch, HBASE-21342.004.patch, HBASE-21342.005.patch, 
> HBASE-21342.006.patch, HBASE-21342.007.patch, race.patch
>
>
> As mentioned in [HBASE-15291|#HBASE-15291], there is a race condition. If 
> two secure bulk load calls from the same UGI go into two different regions 
> and one region finishes earlier, it will close the bulk load fs, and the 
> other region will fail.
>  
> Another case would be more serious. The FileSystem.close() function needs two 
> synchronized variables: CACHE and deleteOnExit. If one region calls 
> FileSystem.closeAllForUGI (in SecureBulkLoadManager.cleanupBulkLoad) while 
> another region is trying to close srcFS (in 
> SecureBulkLoadListener.closeSrcFs), a deadlock can occur.
>  
> I have written a UT for this and fixed it using a reference counter.
>  





[jira] [Commented] (HBASE-21349) Cluster is going down but CatalogJanitor and Normalizer try to run and fail noisely

2018-10-23 Thread Xu Cang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661272#comment-16661272
 ] 

Xu Cang commented on HBASE-21349:
-

It compiles locally. I think the Hadoop QA failure was caused by this commit: 
86f23128b0d66deb70790785e63d2f7e01d5ab8d

and Duo Zhang has fixed it in a later commit.

Let me re-trigger Hadoop QA. Thanks [~stack]

 

> Cluster is going down but CatalogJanitor and Normalizer try to run and fail 
> noisely
> ---
>
> Key: HBASE-21349
> URL: https://issues.apache.org/jira/browse/HBASE-21349
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: stack
>Assignee: Xu Cang
>Priority: Minor
> Fix For: 3.0.0, 2.2.0, 2.0.3, 2.1.2
>
> Attachments: HBASE-21349.master.002.patch, 
> HBASE-21349.master.002.patch, HBASE-21349.master.002.patch, 
> HBASE-22349.master.001.patch
>
>
> Shutting down can take a while. In the meantime, the catalog janitor and/or 
> normalizer (etc.?) try to run, and when they can't, they fail noisily. Looks bad:
> {code}
> 2018-10-19 21:23:24,962 INFO org.apache.hadoop.hbase.master.ServerManager: 
> Cluster shutdown set; vc1205.halxg.cloudera.com,22101,1539991730711 expired; 
> onlineServers=51
> 2018-10-19 21:25:54,502 WARN org.apache.hadoop.hbase.master.CatalogJanitor: 
> Failed scan of catalog table
> java.io.IOException: connection is closed
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.getMetaHTable(MetaTableAccessor.java:267)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:763)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:734)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:684)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMetaForTableRegions(MetaTableAccessor.java:679)
> at 
> org.apache.hadoop.hbase.master.CatalogJanitor.getMergedRegionsAndSplitParents(CatalogJanitor.java:185)
> at 
> org.apache.hadoop.hbase.master.CatalogJanitor.getMergedRegionsAndSplitParents(CatalogJanitor.java:137)
> at 
> org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:243)
> at 
> org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:116)
> at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:186)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at 
> org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:111)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2018-10-19 21:25:54,507 ERROR 
> org.apache.hadoop.hbase.master.normalizer.RegionNormalizerChore: Failed to 
> normalize regions.
> java.io.IOException: connection is closed
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.getMetaHTable(MetaTableAccessor.java:267)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:763)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:734)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:690)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.fullScanTables(MetaTableAccessor.java:240)
> at 
> org.apache.hadoop.hbase.master.TableStateManager.getTablesInStates(TableStateManager.java:189)
> at 
> org.apache.hadoop.hbase.master.HMaster.normalizeRegions(HMaster.java:1718)
> at 
> org.apache.hadoop.hbase.master.normalizer.RegionNormalizerChore.chore(RegionNormalizerChore.java:48)
> at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:186)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at 
> 

[jira] [Updated] (HBASE-21349) Cluster is going down but CatalogJanitor and Normalizer try to run and fail noisely

2018-10-23 Thread Xu Cang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Cang updated HBASE-21349:

Attachment: HBASE-21349.master.002.patch

> Cluster is going down but CatalogJanitor and Normalizer try to run and fail 
> noisely
> ---
>
> Key: HBASE-21349
> URL: https://issues.apache.org/jira/browse/HBASE-21349
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: stack
>Assignee: Xu Cang
>Priority: Minor
> Fix For: 3.0.0, 2.2.0, 2.0.3, 2.1.2
>
> Attachments: HBASE-21349.master.002.patch, 
> HBASE-21349.master.002.patch, HBASE-21349.master.002.patch, 
> HBASE-22349.master.001.patch
>
>
> Shutting down can take a while. In the meantime, the catalog janitor and/or 
> normalizer (etc.?) try to run, and when they can't, they fail noisily. Looks bad:
> {code}
> 2018-10-19 21:23:24,962 INFO org.apache.hadoop.hbase.master.ServerManager: 
> Cluster shutdown set; vc1205.halxg.cloudera.com,22101,1539991730711 expired; 
> onlineServers=51
> 2018-10-19 21:25:54,502 WARN org.apache.hadoop.hbase.master.CatalogJanitor: 
> Failed scan of catalog table
> java.io.IOException: connection is closed
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.getMetaHTable(MetaTableAccessor.java:267)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:763)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:734)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:684)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMetaForTableRegions(MetaTableAccessor.java:679)
> at 
> org.apache.hadoop.hbase.master.CatalogJanitor.getMergedRegionsAndSplitParents(CatalogJanitor.java:185)
> at 
> org.apache.hadoop.hbase.master.CatalogJanitor.getMergedRegionsAndSplitParents(CatalogJanitor.java:137)
> at 
> org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:243)
> at 
> org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:116)
> at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:186)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at 
> org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:111)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2018-10-19 21:25:54,507 ERROR 
> org.apache.hadoop.hbase.master.normalizer.RegionNormalizerChore: Failed to 
> normalize regions.
> java.io.IOException: connection is closed
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.getMetaHTable(MetaTableAccessor.java:267)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:763)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:734)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:690)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.fullScanTables(MetaTableAccessor.java:240)
> at 
> org.apache.hadoop.hbase.master.TableStateManager.getTablesInStates(TableStateManager.java:189)
> at 
> org.apache.hadoop.hbase.master.HMaster.normalizeRegions(HMaster.java:1718)
> at 
> org.apache.hadoop.hbase.master.normalizer.RegionNormalizerChore.chore(RegionNormalizerChore.java:48)
> at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:186)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at 
> org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:111)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> 

[jira] [Commented] (HBASE-21342) FileSystem in use may get closed by other bulk load call in secure bulkLoad

2018-10-23 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661256#comment-16661256
 ] 

Ted Yu commented on HBASE-21342:


Mike:
Please go ahead.

Thanks

> FileSystem in use may get closed by other bulk load call  in secure bulkLoad
> 
>
> Key: HBASE-21342
> URL: https://issues.apache.org/jira/browse/HBASE-21342
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.1.0, 1.5.0, 1.3.3, 1.4.4, 2.0.1, 1.2.7
>Reporter: mazhenlin
>Assignee: mazhenlin
>Priority: Major
> Attachments: 21342.v1.txt, HBASE-21342.002.patch, 
> HBASE-21342.003.patch, HBASE-21342.004.patch, HBASE-21342.005.patch, 
> HBASE-21342.006.patch, HBASE-21342.007.patch, race.patch
>
>
> As mentioned in [HBASE-15291|#HBASE-15291], there is a race condition. If 
> two secure bulk load calls from the same UGI go into two different regions 
> and one region finishes earlier, it will close the bulk load fs, and the 
> other region will fail.
>  
> Another case would be more serious. The FileSystem.close() function needs two 
> synchronized variables: CACHE and deleteOnExit. If one region calls 
> FileSystem.closeAllForUGI (in SecureBulkLoadManager.cleanupBulkLoad) while 
> another region is trying to close srcFS (in 
> SecureBulkLoadListener.closeSrcFs), a deadlock can occur.
>  
> I have written a UT for this and fixed it using a reference counter.
>  





[jira] [Commented] (HBASE-21373) Backport to branch-1, "HBASE-21338 [balancer] If balancer is an ill-fit for cluster size, it gives little indication"

2018-10-23 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661250#comment-16661250
 ] 

Andrew Purtell commented on HBASE-21373:


Thanks [~xucang] [~stack]

> Backport to branch-1, "HBASE-21338 [balancer] If balancer is an ill-fit for 
> cluster size, it gives little indication"
> -
>
> Key: HBASE-21373
> URL: https://issues.apache.org/jira/browse/HBASE-21373
> Project: HBase
>  Issue Type: Bug
>  Components: Operability
>Reporter: stack
>Assignee: Xu Cang
>Priority: Major
>
> Issue to backport to branch-1. Hope you don't mind my assigning it to you Xu 
> Cang.




