[jira] [Updated] (HBASE-20893) Data loss if splitting region while ServerCrashProcedure executing

2018-07-26 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-20893:
--
Status: Patch Available  (was: Reopened)

> Data loss if splitting region while ServerCrashProcedure executing
> --
>
> Key: HBASE-20893
> URL: https://issues.apache.org/jira/browse/HBASE-20893
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.1, 2.1.0, 3.0.0
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Fix For: 3.0.0, 2.0.2, 2.2.0, 2.1.1
>
> Attachments: HBASE-20893-branch-2.0.addendum.patch, 
> HBASE-20893.branch-2.0.001.patch, HBASE-20893.branch-2.0.002.patch, 
> HBASE-20893.branch-2.0.003.patch, HBASE-20893.branch-2.0.004.patch, 
> HBASE-20893.branch-2.0.005.patch
>
>
> Similar case to HBASE-20878.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20893) Data loss if splitting region while ServerCrashProcedure executing

2018-07-26 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559293#comment-16559293
 ] 

stack commented on HBASE-20893:
---

Here is an addendum that changes split and merge to just reopen regions if 
recovered.edits are found, rather than try to do a rollback (rollback doesn't 
work).

I still need to check the merge case (the split test gives a nice, loud and 
clear signal that this addresses the CODE-BUG issue when splitting, but I'm 
having a harder time with the merge test). I also want to add some logging to 
make it clear what is going on in here. Will be back. Let me try this patch 
against hadoopqa in the meantime.
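If it helps, here is a hedged sketch of the idea as described above. The class, method names, and directory layout are my own illustration, not the actual HBase code (HBase does keep replayed WAL entries under a region-level recovered.edits directory): when recovered.edits are present, reopen instead of rolling back.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class RecoveredEditsCheck {
    // True if the region directory has a non-empty recovered.edits subdirectory.
    static boolean hasRecoveredEdits(Path regionDir) {
        Path editsDir = regionDir.resolve("recovered.edits");
        if (!Files.isDirectory(editsDir)) {
            return false;
        }
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(editsDir)) {
            return entries.iterator().hasNext();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Decide the recovery action: reopen rather than roll back when edits exist.
    static String actionFor(Path regionDir) {
        return hasRecoveredEdits(regionDir) ? "REOPEN_REGION" : "PROCEED";
    }

    // Exercise both branches against a scratch directory.
    static String[] demo() {
        try {
            Path region = Files.createTempDirectory("region");
            String before = actionFor(region); // no recovered.edits yet
            Path edits = Files.createDirectories(region.resolve("recovered.edits"));
            Files.createFile(edits.resolve("0000000000000000010"));
            String after = actionFor(region); // edits present: reopen, don't roll back
            return new String[] { before, after };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] out = demo();
        System.out.println(out[0] + " -> " + out[1]); // PROCEED -> REOPEN_REGION
    }
}
```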

> Data loss if splitting region while ServerCrashProcedure executing
> --
>
> Key: HBASE-20893
> URL: https://issues.apache.org/jira/browse/HBASE-20893
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 3.0.0, 2.1.0, 2.0.1
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Fix For: 3.0.0, 2.0.2, 2.2.0, 2.1.1
>
> Attachments: HBASE-20893-branch-2.0.addendum.patch, 
> HBASE-20893.branch-2.0.001.patch, HBASE-20893.branch-2.0.002.patch, 
> HBASE-20893.branch-2.0.003.patch, HBASE-20893.branch-2.0.004.patch, 
> HBASE-20893.branch-2.0.005.patch
>
>
> Similar case to HBASE-20878.





[jira] [Updated] (HBASE-20893) Data loss if splitting region while ServerCrashProcedure executing

2018-07-26 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-20893:
--
Attachment: HBASE-20893-branch-2.0.addendum.patch

> Data loss if splitting region while ServerCrashProcedure executing
> --
>
> Key: HBASE-20893
> URL: https://issues.apache.org/jira/browse/HBASE-20893
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 3.0.0, 2.1.0, 2.0.1
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Fix For: 3.0.0, 2.0.2, 2.2.0, 2.1.1
>
> Attachments: HBASE-20893-branch-2.0.addendum.patch, 
> HBASE-20893.branch-2.0.001.patch, HBASE-20893.branch-2.0.002.patch, 
> HBASE-20893.branch-2.0.003.patch, HBASE-20893.branch-2.0.004.patch, 
> HBASE-20893.branch-2.0.005.patch
>
>
> Similar case to HBASE-20878.





[jira] [Resolved] (HBASE-20964) test

2018-07-26 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey resolved HBASE-20964.
-
   Resolution: Invalid
Fix Version/s: (was: hbase-7290)

> test
> 
>
> Key: HBASE-20964
> URL: https://issues.apache.org/jira/browse/HBASE-20964
> Project: HBase
>  Issue Type: Test
>  Components: Client
>Affects Versions: 1.2.6.1
>Reporter: zhou pengbo
>Priority: Trivial
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>






[jira] [Commented] (HBASE-20930) MetaScanner.metaScan should use passed variable for meta table name rather than TableName.META_TABLE_NAME

2018-07-26 Thread Vishal Khandelwal (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559268#comment-16559268
 ] 

Vishal Khandelwal commented on HBASE-20930:
---

I have uploaded the new patch, [~elserj].

> MetaScanner.metaScan should use passed variable for meta table name rather 
> than TableName.META_TABLE_NAME
> -
>
> Key: HBASE-20930
> URL: https://issues.apache.org/jira/browse/HBASE-20930
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.3.3
>Reporter: Vishal Khandelwal
>Assignee: Vishal Khandelwal
>Priority: Minor
> Fix For: 1.3.3
>
> Attachments: HBASE-20930.branch-1.3.patch, 
> HBASE-20930.branch-1.3.v2.patch
>
>
> In MetaScanner.metaScan,
>  try (Table metaTable = new HTable(TableName.META_TABLE_NAME, connection, 
> null)) {
> should be changed to use the passed meta table name, e.g.
> metaScan(connection, visitor, userTableName, null, Integer.MAX_VALUE, 
> metaTableName)
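The fix pattern is easy to see in miniature. The following is an illustrative sketch only, not the real MetaScanner API: a helper that accepts a meta table name but ignores it (the bug), next to one that honors it (the fix).

```java
// Illustrative sketch (not the actual HBase API) of the bug described above:
// a parameter for the meta table name that is accepted but never used.
public class MetaScanSketch {
    static final String META_TABLE_NAME = "hbase:meta"; // stand-in for TableName.META_TABLE_NAME

    // Buggy: the metaTableName argument is silently ignored.
    static String buggyResolve(String metaTableName) {
        return META_TABLE_NAME;
    }

    // Fixed: honor the caller-supplied name, falling back to the default.
    static String fixedResolve(String metaTableName) {
        return metaTableName != null ? metaTableName : META_TABLE_NAME;
    }

    public static void main(String[] args) {
        System.out.println(buggyResolve("backup:meta")); // hbase:meta -- wrong table scanned
        System.out.println(fixedResolve("backup:meta")); // backup:meta
    }
}
```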





[jira] [Updated] (HBASE-20930) MetaScanner.metaScan should use passed variable for meta table name rather than TableName.META_TABLE_NAME

2018-07-26 Thread Vishal Khandelwal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vishal Khandelwal updated HBASE-20930:
--
Attachment: HBASE-20930.branch-1.3.v2.patch

> MetaScanner.metaScan should use passed variable for meta table name rather 
> than TableName.META_TABLE_NAME
> -
>
> Key: HBASE-20930
> URL: https://issues.apache.org/jira/browse/HBASE-20930
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.3.3
>Reporter: Vishal Khandelwal
>Assignee: Vishal Khandelwal
>Priority: Minor
> Fix For: 1.3.3
>
> Attachments: HBASE-20930.branch-1.3.patch, 
> HBASE-20930.branch-1.3.v2.patch
>
>
> In MetaScanner.metaScan,
>  try (Table metaTable = new HTable(TableName.META_TABLE_NAME, connection, 
> null)) {
> should be changed to use the passed meta table name, e.g.
> metaScan(connection, visitor, userTableName, null, Integer.MAX_VALUE, 
> metaTableName)





[jira] [Created] (HBASE-20964) test

2018-07-26 Thread zhou pengbo (JIRA)
zhou pengbo created HBASE-20964:
---

 Summary: test
 Key: HBASE-20964
 URL: https://issues.apache.org/jira/browse/HBASE-20964
 Project: HBase
  Issue Type: Test
  Components: Client
Affects Versions: 1.2.6.1
Reporter: zhou pengbo
 Fix For: hbase-7290








[jira] [Commented] (HBASE-19369) HBase Should use Builder Pattern to Create Log Files while using WAL on Erasure Coding

2018-07-26 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-19369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559237#comment-16559237
 ] 

Hadoop QA commented on HBASE-19369:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
13s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
23s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
57s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
39s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
50s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
48s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  
7s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
2s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
36s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m 
12s{color} | {color:red} hbase-server: The patch generated 1 new + 0 unchanged 
- 2 fixed = 1 total (was 2) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
39s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
10m 17s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
23s{color} | {color:green} hbase-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
36s{color} | {color:green} hbase-procedure in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}109m 
21s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  1m 
 2s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}164m 22s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-19369 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12933300/HBASE-19369.v9.patch |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux d999dc6a5f43 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
10:45:36 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 

[jira] [Commented] (HBASE-20939) There will be race when we call suspendIfNotReady and then throw ProcedureSuspendedException

2018-07-26 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559217#comment-16559217
 ] 

stack commented on HBASE-20939:
---

+1 on patch. Please put it in branch-2.0+. I'll test it.

> There will be race when we call suspendIfNotReady and then throw 
> ProcedureSuspendedException
> 
>
> Key: HBASE-20939
> URL: https://issues.apache.org/jira/browse/HBASE-20939
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.0.1, 2.2.0, 2.1.1
>
> Attachments: HBASE-20939.patch
>
>
> This is very typical usage in our procedure implementation; for example, in 
> AssignProcedure we call AM.queueAssign and then suspend ourselves to wait 
> until the AM finishes processing our assign request.
> But there could be races. Think of this:
> 1. We call suspendIfNotReady on an event, and it returns true, so we need to 
> wait.
> 2. The event is woken up, and the procedure is added back to the scheduler.
> 3. A worker picks up the procedure and finishes it.
> 4. We finally throw ProcedureSuspendedException, and the ProcedureExecutor 
> suspends us and stores the state in the procedure store.
> So we have a half-done procedure in the procedure store forever... This may 
> cause an assertion when loading procedures. And maybe the worker cannot 
> finish the procedure, as when suspending we need to restore some state (for 
> example, add something to RootProcedureState). But anyway, it will still lead 
> to an assertion or other unexpected errors.
> And this cannot be fixed by simply adding a lock in the procedure, as most of 
> the work is done in the ProcedureExecutor after we throw 
> ProcedureSuspendedException.
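The four numbered steps in the description can be replayed as a toy, single-threaded simulation. All names here are illustrative; this is not the real ProcedureExecutor API.

```java
import java.util.concurrent.atomic.AtomicReference;

// Minimal sequential simulation of the four-step race described above.
public class SuspendRaceSketch {
    enum State { RUNNABLE, SUSPENDED, FINISHED }

    // Replays the interleaving and returns the state left in the "store".
    static State raceOutcome() {
        AtomicReference<State> store = new AtomicReference<>(State.RUNNABLE);

        // 1. suspendIfNotReady returns true: this worker decides to suspend...
        boolean mustSuspend = true;

        // 2.-3. ...but before the suspension is persisted, the event wakes,
        // the procedure is rescheduled, and another worker finishes it.
        store.set(State.FINISHED);

        // 4. The first worker now throws ProcedureSuspendedException and the
        // executor persists SUSPENDED, clobbering the finished state: a
        // half-done procedure sits in the procedure store forever.
        if (mustSuspend) {
            store.set(State.SUSPENDED);
        }
        return store.get();
    }

    public static void main(String[] args) {
        System.out.println(raceOutcome()); // SUSPENDED, though the work completed
    }
}
```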





[jira] [Commented] (HBASE-20939) There will be race when we call suspendIfNotReady and then throw ProcedureSuspendedException

2018-07-26 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559209#comment-16559209
 ] 

stack commented on HBASE-20939:
---

bq. but it can not prevent the same procedure to be executed parallel with 
multiple workers...

Exactly.

I allowed that this could happen -- see the comment you removed in this patch -- 
but what's great here is your identifying that, around suspend, concurrent 
modification of Procedure state is possible. Thanks for the clarification. The 
framework needs to make execution single-threaded by default, if only to make 
the dev's life easier. Later, if a case arises for concurrent execution of 
steps, we can figure out how, but I can't think of a need just now. I added a 
comment on HBASE-20828 about this issue.

> There will be race when we call suspendIfNotReady and then throw 
> ProcedureSuspendedException
> 
>
> Key: HBASE-20939
> URL: https://issues.apache.org/jira/browse/HBASE-20939
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.0.1, 2.2.0, 2.1.1
>
> Attachments: HBASE-20939.patch
>
>
> This is very typical usage in our procedure implementation; for example, in 
> AssignProcedure we call AM.queueAssign and then suspend ourselves to wait 
> until the AM finishes processing our assign request.
> But there could be races. Think of this:
> 1. We call suspendIfNotReady on an event, and it returns true, so we need to 
> wait.
> 2. The event is woken up, and the procedure is added back to the scheduler.
> 3. A worker picks up the procedure and finishes it.
> 4. We finally throw ProcedureSuspendedException, and the ProcedureExecutor 
> suspends us and stores the state in the procedure store.
> So we have a half-done procedure in the procedure store forever... This may 
> cause an assertion when loading procedures. And maybe the worker cannot 
> finish the procedure, as when suspending we need to restore some state (for 
> example, add something to RootProcedureState). But anyway, it will still lead 
> to an assertion or other unexpected errors.
> And this cannot be fixed by simply adding a lock in the procedure, as most of 
> the work is done in the ProcedureExecutor after we throw 
> ProcedureSuspendedException.





[jira] [Commented] (HBASE-20828) Finish-up AMv2 Design/List of Tenets/Specification of operation

2018-07-26 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559208#comment-16559208
 ] 

stack commented on HBASE-20828:
---

Explain that there are "two dimensions" of exclusivity when we talk about 
Procedure execution ([~Apache9]'s words). There is exclusion around the entity 
that the Procedure is working on -- the Region, etc. -- and for this the 
framework has a locking facility (hasLock, etc.). But there is currently 
nothing in the framework to prevent multi-threaded execution of a procedure's 
steps, which is possible if one worker thread's execution runs into a 
"suspend". See HBASE-20939 for discussion. The case is rare. Ideally the 
framework would ensure single-threaded execution, but for now, until we get 
more experience, see the 'trick' in HBASE-20939.

> Finish-up AMv2 Design/List of Tenets/Specification of operation
> ---
>
> Key: HBASE-20828
> URL: https://issues.apache.org/jira/browse/HBASE-20828
> Project: HBase
>  Issue Type: Umbrella
>  Components: amv2
>Reporter: stack
>Priority: Major
>
> AMv2 is missing a specification. There are too many grey areas still. Also 
> missing is a concise listing of the tenets of AMv2 operation. Here are some 
> examples:
>  * HBASE-19529 "Handle null states in AM": asks how we should treat a null 
> state in hbase:meta. What does it 'mean'? We seem to treat it differently 
> depending on context. Needs clarification. [~Apache9] recently asked a 
> similar question about the meaning of OFFLINE.
>  * Logging needs to have a particular form to help trace Procedure progress; 
> needs a write-up.
> Let's fill in items to address in this umbrella issue. We can address them in 
> subissues and produce a specification doc too. We have the below, but these 
> are mostly (incomplete) descriptions for devs of pv2 and amv2; the 
> specification is missing:
> http://hbase.apache.org/book.html#pv2
> http://hbase.apache.org/book.html#amv2
> (Other areas include addressing what is up with rollback -- when, how much, 
> and when it is not appropriate -- as well as recommendations on Procedure 
> coarseness and locking -- is it OK to lock a table in an alter-table 
> procedure for the life of the procedure? -- and so on.)





[jira] [Commented] (HBASE-20893) Data loss if splitting region while ServerCrashProcedure executing

2018-07-26 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559201#comment-16559201
 ] 

stack commented on HBASE-20893:
---

So, Split does not support rollback at the 
SPLIT_TABLE_REGIONS_CHECK_CLOSED_REGIONS stage. Rollback looks for 
subprocedures, finds the unassign that just completed, and tries to roll it 
back (even though it successfully closed the parent). This fails: you can't 
roll back a successful unassign (the message is wrong -- it says assign), at 
least not currently. Let me try to fix it or figure out another solution here. 
The test that came in with this patch is really helpful: it manufactures both 
the problem case of unaccounted recovered.edits and (thankfully) the failed 
rollback/CODE-BUG. Will be back.

> Data loss if splitting region while ServerCrashProcedure executing
> --
>
> Key: HBASE-20893
> URL: https://issues.apache.org/jira/browse/HBASE-20893
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 3.0.0, 2.1.0, 2.0.1
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Fix For: 3.0.0, 2.0.2, 2.2.0, 2.1.1
>
> Attachments: HBASE-20893.branch-2.0.001.patch, 
> HBASE-20893.branch-2.0.002.patch, HBASE-20893.branch-2.0.003.patch, 
> HBASE-20893.branch-2.0.004.patch, HBASE-20893.branch-2.0.005.patch
>
>
> Similar case to HBASE-20878.





[jira] [Commented] (HBASE-20939) There will be race when we call suspendIfNotReady and then throw ProcedureSuspendedException

2018-07-26 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559200#comment-16559200
 ] 

Duo Zhang commented on HBASE-20939:
---

To be clearer: the procedure lock prevents other procedures from running at 
the same time, but it cannot prevent the same procedure from being executed in 
parallel by multiple workers...

> There will be race when we call suspendIfNotReady and then throw 
> ProcedureSuspendedException
> 
>
> Key: HBASE-20939
> URL: https://issues.apache.org/jira/browse/HBASE-20939
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.0.1, 2.2.0, 2.1.1
>
> Attachments: HBASE-20939.patch
>
>
> This is very typical usage in our procedure implementation; for example, in 
> AssignProcedure we call AM.queueAssign and then suspend ourselves to wait 
> until the AM finishes processing our assign request.
> But there could be races. Think of this:
> 1. We call suspendIfNotReady on an event, and it returns true, so we need to 
> wait.
> 2. The event is woken up, and the procedure is added back to the scheduler.
> 3. A worker picks up the procedure and finishes it.
> 4. We finally throw ProcedureSuspendedException, and the ProcedureExecutor 
> suspends us and stores the state in the procedure store.
> So we have a half-done procedure in the procedure store forever... This may 
> cause an assertion when loading procedures. And maybe the worker cannot 
> finish the procedure, as when suspending we need to restore some state (for 
> example, add something to RootProcedureState). But anyway, it will still lead 
> to an assertion or other unexpected errors.
> And this cannot be fixed by simply adding a lock in the procedure, as most of 
> the work is done in the ProcedureExecutor after we throw 
> ProcedureSuspendedException.





[jira] [Commented] (HBASE-20939) There will be race when we call suspendIfNotReady and then throw ProcedureSuspendedException

2018-07-26 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559199#comment-16559199
 ] 

Duo Zhang commented on HBASE-20939:
---

{quote}
So IdLock is about serializing thread access and region lock is about this 
procedure having exclusive lock on the region entity till done. Man, this 
complicated. We should make the framework ensure single-threaded execution. I 
can't think when we'd want different. Its the suspend action that makes life 
interesting
{quote}

This is not easy; I do not have an idea in mind to unify them yet. Actually, 
the region lock is for the procedure and the IdLock is for the thread; they 
are two dimensions.

And in the current implementation we do allow a procedure to pass the 
exclusive-lock check if it already holds the lock. This is necessary, and also 
easy to understand -- it is just the same as ReentrantLock in Java. So it is 
impossible to prevent concurrent execution of the same procedure through the 
procedure lock, for now...
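A minimal sketch of that point, with an invented EntityLock standing in for the framework's per-entity procedure lock (this is not the real implementation): the lock is reentrant per procedure ID, so it excludes other procedures but lets two workers driving the same procedure both pass the check.

```java
// Toy per-entity lock that is reentrant per *procedure*, not per thread.
public class EntityLockSketch {
    static final class EntityLock {
        private long ownerProcId = -1;

        // Grants the lock if free, or if the SAME procedure already holds it.
        synchronized boolean tryAcquire(long procId) {
            if (ownerProcId == -1 || ownerProcId == procId) {
                ownerProcId = procId;
                return true;
            }
            return false;
        }
    }

    public static void main(String[] args) {
        EntityLock regionLock = new EntityLock();
        System.out.println(regionLock.tryAcquire(100)); // worker A, proc 100: true
        System.out.println(regionLock.tryAcquire(200)); // worker B, proc 200: false (other proc excluded)
        System.out.println(regionLock.tryAcquire(100)); // worker B, proc 100: true -- same proc runs twice
    }
}
```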

> There will be race when we call suspendIfNotReady and then throw 
> ProcedureSuspendedException
> 
>
> Key: HBASE-20939
> URL: https://issues.apache.org/jira/browse/HBASE-20939
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.0.1, 2.2.0, 2.1.1
>
> Attachments: HBASE-20939.patch
>
>
> This is very typical usage in our procedure implementation; for example, in 
> AssignProcedure we call AM.queueAssign and then suspend ourselves to wait 
> until the AM finishes processing our assign request.
> But there could be races. Think of this:
> 1. We call suspendIfNotReady on an event, and it returns true, so we need to 
> wait.
> 2. The event is woken up, and the procedure is added back to the scheduler.
> 3. A worker picks up the procedure and finishes it.
> 4. We finally throw ProcedureSuspendedException, and the ProcedureExecutor 
> suspends us and stores the state in the procedure store.
> So we have a half-done procedure in the procedure store forever... This may 
> cause an assertion when loading procedures. And maybe the worker cannot 
> finish the procedure, as when suspending we need to restore some state (for 
> example, add something to RootProcedureState). But anyway, it will still lead 
> to an assertion or other unexpected errors.
> And this cannot be fixed by simply adding a lock in the procedure, as most of 
> the work is done in the ProcedureExecutor after we throw 
> ProcedureSuspendedException.





[jira] [Commented] (HBASE-19008) Add missing equals or hashCode method(s) to stock Filter implementations

2018-07-26 Thread liubangchen (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-19008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559198#comment-16559198
 ] 

liubangchen commented on HBASE-19008:
-

[~Jan Hentschel] If you have no time, I can take this issue. Thanks.

> Add missing equals or hashCode method(s) to stock Filter implementations
> 
>
> Key: HBASE-19008
> URL: https://issues.apache.org/jira/browse/HBASE-19008
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Jan Hentschel
>Priority: Major
>  Labels: filter
>
> In HBASE-15410, [~mdrob] reminded me that Filter implementations may not 
> write {{equals}} or {{hashCode}} method(s).
> This issue is to add missing {{equals}} or {{hashCode}} method(s) to stock 
> Filter implementations such as KeyOnlyFilter.





[jira] [Updated] (HBASE-20939) There will be race when we call suspendIfNotReady and then throw ProcedureSuspendedException

2018-07-26 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-20939:
--
Component/s: amv2

> There will be race when we call suspendIfNotReady and then throw 
> ProcedureSuspendedException
> 
>
> Key: HBASE-20939
> URL: https://issues.apache.org/jira/browse/HBASE-20939
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.0.1, 2.2.0, 2.1.1
>
> Attachments: HBASE-20939.patch
>
>
> This is very typical usage in our procedure implementation; for example, in 
> AssignProcedure we call AM.queueAssign and then suspend ourselves to wait 
> until the AM finishes processing our assign request.
> But there could be races. Think of this:
> 1. We call suspendIfNotReady on an event, and it returns true, so we need to 
> wait.
> 2. The event is woken up, and the procedure is added back to the scheduler.
> 3. A worker picks up the procedure and finishes it.
> 4. We finally throw ProcedureSuspendedException, and the ProcedureExecutor 
> suspends us and stores the state in the procedure store.
> So we have a half-done procedure in the procedure store forever... This may 
> cause an assertion when loading procedures. And maybe the worker cannot 
> finish the procedure, as when suspending we need to restore some state (for 
> example, add something to RootProcedureState). But anyway, it will still lead 
> to an assertion or other unexpected errors.
> And this cannot be fixed by simply adding a lock in the procedure, as most of 
> the work is done in the ProcedureExecutor after we throw 
> ProcedureSuspendedException.





[jira] [Commented] (HBASE-18152) [AMv2] Corrupt Procedure WAL file; procedure data stored out of order

2018-07-26 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559196#comment-16559196
 ] 

stack commented on HBASE-18152:
---

I think HBASE-20939 uncovers the root of the corruption I've been trying to 
figure out in here.

> [AMv2] Corrupt Procedure WAL file; procedure data stored out of order
> -
>
> Key: HBASE-18152
> URL: https://issues.apache.org/jira/browse/HBASE-18152
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 2.0.0
>Reporter: stack
>Assignee: stack
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: 
> 0001-TestWALProcedureExecutore-order-checking-test-that-d.patch, 
> HBASE-18152.master.001.patch, 
> hbase-hbase-master-ctr-e138-1518143905142-221855-01-02.hwx.site.log.gz, 
> pv2-0036.log, pv2-0047.log, 
> reading_bad_wal.patch
>
>
> I've seen corruption from time to time when testing. It's rare enough. Often 
> we can get over it, but sometimes we can't. It took me a while to capture an 
> instance of corruption. Turns out we write to the WAL out of order, which 
> undoes a basic tenet: that WAL content is ordered in line with execution.
> Below I'll post a corrupt WAL.
> Looking at the write side, there is a lot going on. I'm not clear on how we 
> could write out of order. Will try to get more insight. Meantime, parking 
> this issue here to fill data into.
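The tenet being violated can be stated as a tiny check. This is toy code, not HBase's WAL reader: a replayer that assumes entries appear in execution order flags any out-of-order sequence ID as corruption.

```java
import java.util.List;

// Toy illustration of the ordering tenet: WAL entries must carry
// monotonically non-decreasing sequence IDs, matching execution order.
public class WalOrderSketch {
    static boolean isOrdered(List<Long> seqIds) {
        long prev = Long.MIN_VALUE;
        for (long s : seqIds) {
            if (s < prev) {
                return false; // out-of-order write: a replayer would see corruption here
            }
            prev = s;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isOrdered(List.of(1L, 2L, 3L))); // true: in-order log
        System.out.println(isOrdered(List.of(1L, 3L, 2L))); // false: the out-of-order case
    }
}
```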





[jira] [Updated] (HBASE-20939) There will be race when we call suspendIfNotReady and then throw ProcedureSuspendedException

2018-07-26 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-20939:
--
Priority: Critical  (was: Major)

> There will be race when we call suspendIfNotReady and then throw 
> ProcedureSuspendedException
> 
>
> Key: HBASE-20939
> URL: https://issues.apache.org/jira/browse/HBASE-20939
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.0.1, 2.2.0, 2.1.1
>
> Attachments: HBASE-20939.patch
>
>
> This is very typical usage in our procedure implementation; for example, in 
> AssignProcedure we call AM.queueAssign and then suspend ourselves to wait 
> until the AM finishes processing our assign request.
> But there could be races. Think of this:
> 1. We call suspendIfNotReady on an event, and it returns true, so we need to 
> wait.
> 2. The event is woken up, and the procedure is added back to the scheduler.
> 3. A worker picks up the procedure and finishes it.
> 4. We finally throw ProcedureSuspendedException, and the ProcedureExecutor 
> suspends us and stores the state in the procedure store.
> So we have a half-done procedure in the procedure store forever... This may 
> cause an assertion when loading procedures. And maybe the worker cannot 
> finish the procedure, as when suspending we need to restore some state (for 
> example, add something to RootProcedureState). But anyway, it will still lead 
> to an assertion or other unexpected errors.
> And this cannot be fixed by simply adding a lock in the procedure, as most of 
> the work is done in the ProcedureExecutor after we throw 
> ProcedureSuspendedException.





[jira] [Updated] (HBASE-20939) There will be race when we call suspendIfNotReady and then throw ProcedureSuspendedException

2018-07-26 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-20939:
--
Fix Version/s: 2.0.1
   2.1.1
   2.2.0
   3.0.0

> There will be race when we call suspendIfNotReady and then throw 
> ProcedureSuspendedException
> 
>
> Key: HBASE-20939
> URL: https://issues.apache.org/jira/browse/HBASE-20939
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.0.1, 2.2.0, 2.1.1
>
> Attachments: HBASE-20939.patch
>
>
> This is very typical usage in our procedure implementation, for example, in 
> AssignProcedure, we will call AM.queueAssign and then suspend ourselves to 
> wait until the AM finish processing our assign request.
> But there could be races. Think of this:
> 1. We call suspendIfNotReady on a event, and it returns true so we need to 
> wait.
> 2. The event has been waked up, and the procedure will be added back to the 
> scheduler.
> 3. A worker picks up the procedure and finishes it.
> 4. We finally throw ProcedureSuspendException and the ProcedureExecutor 
> suspend us and store the state in procedure store.
> So we have a half done procedure in the procedure store for ever... This may 
> cause assertion when loading procedures. And maybe the worker can not finish 
> the procedure as when suspending we need to restore some state, for example, 
> add something to RootProcedureState. But anyway, it will still lead to 
> assertion or other unexpected errors.
> And this can not be done by simply adding a lock in the procedure, as most 
> works are done in the ProcedureExecutor after we throw 
> ProcedureSuspendException.





[jira] [Commented] (HBASE-20939) There will be race when we call suspendIfNotReady and then throw ProcedureSuspendedException

2018-07-26 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559194#comment-16559194
 ] 

stack commented on HBASE-20939:
---

So IdLock is about serializing thread access, and the region lock is about *this* 
procedure having an exclusive lock on the region entity till done. Man, this is 
complicated. We should make the framework ensure single-threaded execution; I 
can't think of a case where we'd want otherwise. It's the suspend action that 
makes life interesting.

Good one [~Apache9]... I think you've -- consciously or not -- figured out the 
corruption I've been struggling with.

> There will be race when we call suspendIfNotReady and then throw 
> ProcedureSuspendedException
> 
>
> Key: HBASE-20939
> URL: https://issues.apache.org/jira/browse/HBASE-20939
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Attachments: HBASE-20939.patch
>
>
> This is very typical usage in our procedure implementation, for example, in 
> AssignProcedure, we will call AM.queueAssign and then suspend ourselves to 
> wait until the AM finish processing our assign request.
> But there could be races. Think of this:
> 1. We call suspendIfNotReady on a event, and it returns true so we need to 
> wait.
> 2. The event has been waked up, and the procedure will be added back to the 
> scheduler.
> 3. A worker picks up the procedure and finishes it.
> 4. We finally throw ProcedureSuspendException and the ProcedureExecutor 
> suspend us and store the state in procedure store.
> So we have a half done procedure in the procedure store for ever... This may 
> cause assertion when loading procedures. And maybe the worker can not finish 
> the procedure as when suspending we need to restore some state, for example, 
> add something to RootProcedureState. But anyway, it will still lead to 
> assertion or other unexpected errors.
> And this can not be done by simply adding a lock in the procedure, as most 
> works are done in the ProcedureExecutor after we throw 
> ProcedureSuspendException.





[jira] [Commented] (HBASE-20939) There will be race when we call suspendIfNotReady and then throw ProcedureSuspendedException

2018-07-26 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559193#comment-16559193
 ] 

Duo Zhang commented on HBASE-20939:
---

It could lead to a reorder, but in most cases we would just update the 
procedure twice with the same state, as the second worker has already finished 
the procedure and set its state to finished.

But the reorder is still possible. Since we need to serialize the procedure 
first, it can happen that we serialize the procedure data, then the second 
worker changes the state and stores it first, and then corruption happens...
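A sketch of that serialize-then-store reorder (hypothetical names; only the ordering matters):

```java
// Two workers race on the same procedure: A serializes the old state before
// B finishes and persists, then A's stale write lands last and clobbers it.
public class SerializeReorderDemo {
    static String persisted;

    static String replay() {
        // Worker A serializes the procedure while it is still RUNNABLE...
        String staleSnapshot = "RUNNABLE";
        // ...worker B then finishes the procedure and stores SUCCESS first...
        persisted = "SUCCESS";
        // ...and worker A's store of the stale snapshot happens last.
        persisted = staleSnapshot;
        return persisted;
    }

    public static void main(String[] args) {
        System.out.println(replay()); // the store ends up with the stale state
    }
}
```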

> There will be race when we call suspendIfNotReady and then throw 
> ProcedureSuspendedException
> 
>
> Key: HBASE-20939
> URL: https://issues.apache.org/jira/browse/HBASE-20939
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Attachments: HBASE-20939.patch
>
>
> This is very typical usage in our procedure implementation, for example, in 
> AssignProcedure, we will call AM.queueAssign and then suspend ourselves to 
> wait until the AM finish processing our assign request.
> But there could be races. Think of this:
> 1. We call suspendIfNotReady on a event, and it returns true so we need to 
> wait.
> 2. The event has been waked up, and the procedure will be added back to the 
> scheduler.
> 3. A worker picks up the procedure and finishes it.
> 4. We finally throw ProcedureSuspendException and the ProcedureExecutor 
> suspend us and store the state in procedure store.
> So we have a half done procedure in the procedure store for ever... This may 
> cause assertion when loading procedures. And maybe the worker can not finish 
> the procedure as when suspending we need to restore some state, for example, 
> add something to RootProcedureState. But anyway, it will still lead to 
> assertion or other unexpected errors.
> And this can not be done by simply adding a lock in the procedure, as most 
> works are done in the ProcedureExecutor after we throw 
> ProcedureSuspendException.





[jira] [Commented] (HBASE-20939) There will be race when we call suspendIfNotReady and then throw ProcedureSuspendedException

2018-07-26 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559190#comment-16559190
 ] 

stack commented on HBASE-20939:
---

Oh, when I was staring at it, the above is all happening under the region lock, 
so I was thinking the lock would exclude the second *Dispatch* thread from 
stepping on the *Starting* thread.

> There will be race when we call suspendIfNotReady and then throw 
> ProcedureSuspendedException
> 
>
> Key: HBASE-20939
> URL: https://issues.apache.org/jira/browse/HBASE-20939
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Attachments: HBASE-20939.patch
>
>
> This is very typical usage in our procedure implementation, for example, in 
> AssignProcedure, we will call AM.queueAssign and then suspend ourselves to 
> wait until the AM finish processing our assign request.
> But there could be races. Think of this:
> 1. We call suspendIfNotReady on a event, and it returns true so we need to 
> wait.
> 2. The event has been waked up, and the procedure will be added back to the 
> scheduler.
> 3. A worker picks up the procedure and finishes it.
> 4. We finally throw ProcedureSuspendException and the ProcedureExecutor 
> suspend us and store the state in procedure store.
> So we have a half done procedure in the procedure store for ever... This may 
> cause assertion when loading procedures. And maybe the worker can not finish 
> the procedure as when suspending we need to restore some state, for example, 
> add something to RootProcedureState. But anyway, it will still lead to 
> assertion or other unexpected errors.
> And this can not be done by simply adding a lock in the procedure, as most 
> works are done in the ProcedureExecutor after we throw 
> ProcedureSuspendException.





[jira] [Commented] (HBASE-20939) There will be race when we call suspendIfNotReady and then throw ProcedureSuspendedException

2018-07-26 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559189#comment-16559189
 ] 

stack commented on HBASE-20939:
---

Your scenario would explain the WAL corruption that I'm seeing over in 
HBASE-18152?

On the below:

1. We call suspendIfNotReady on a event, and it returns true so we need to wait.
2. The event has been waked up, and the procedure will be added back to the 
scheduler.
3. A worker picks up the procedure and finishes it.
4. We finally throw ProcedureSuspendException and the ProcedureExecutor suspend 
us and store the state in procedure store.

I've been looking at this sequence presuming that after #1, we updated the 
store BEFORE the suspend. Over in HBASE-18152, there are 70 millis between the 
*Starting* and suspend and the subsequent *Dispatch*... on another thread. In 
the WAL, on a later read, the two events show as flipped => corruption.



> There will be race when we call suspendIfNotReady and then throw 
> ProcedureSuspendedException
> 
>
> Key: HBASE-20939
> URL: https://issues.apache.org/jira/browse/HBASE-20939
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Attachments: HBASE-20939.patch
>
>
> This is very typical usage in our procedure implementation, for example, in 
> AssignProcedure, we will call AM.queueAssign and then suspend ourselves to 
> wait until the AM finish processing our assign request.
> But there could be races. Think of this:
> 1. We call suspendIfNotReady on a event, and it returns true so we need to 
> wait.
> 2. The event has been waked up, and the procedure will be added back to the 
> scheduler.
> 3. A worker picks up the procedure and finishes it.
> 4. We finally throw ProcedureSuspendException and the ProcedureExecutor 
> suspend us and store the state in procedure store.
> So we have a half done procedure in the procedure store for ever... This may 
> cause assertion when loading procedures. And maybe the worker can not finish 
> the procedure as when suspending we need to restore some state, for example, 
> add something to RootProcedureState. But anyway, it will still lead to 
> assertion or other unexpected errors.
> And this can not be done by simply adding a lock in the procedure, as most 
> works are done in the ProcedureExecutor after we throw 
> ProcedureSuspendException.





[jira] [Updated] (HBASE-20939) There will be race when we call suspendIfNotReady and then throw ProcedureSuspendedException

2018-07-26 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-20939:
--
Attachment: HBASE-20939.patch

> There will be race when we call suspendIfNotReady and then throw 
> ProcedureSuspendedException
> 
>
> Key: HBASE-20939
> URL: https://issues.apache.org/jira/browse/HBASE-20939
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Attachments: HBASE-20939.patch
>
>
> This is very typical usage in our procedure implementation, for example, in 
> AssignProcedure, we will call AM.queueAssign and then suspend ourselves to 
> wait until the AM finish processing our assign request.
> But there could be races. Think of this:
> 1. We call suspendIfNotReady on a event, and it returns true so we need to 
> wait.
> 2. The event has been waked up, and the procedure will be added back to the 
> scheduler.
> 3. A worker picks up the procedure and finishes it.
> 4. We finally throw ProcedureSuspendException and the ProcedureExecutor 
> suspend us and store the state in procedure store.
> So we have a half done procedure in the procedure store for ever... This may 
> cause assertion when loading procedures. And maybe the worker can not finish 
> the procedure as when suspending we need to restore some state, for example, 
> add something to RootProcedureState. But anyway, it will still lead to 
> assertion or other unexpected errors.
> And this can not be done by simply adding a lock in the procedure, as most 
> works are done in the ProcedureExecutor after we throw 
> ProcedureSuspendException.





[jira] [Updated] (HBASE-20939) There will be race when we call suspendIfNotReady and then throw ProcedureSuspendedException

2018-07-26 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-20939:
--
Status: Patch Available  (was: Open)

> There will be race when we call suspendIfNotReady and then throw 
> ProcedureSuspendedException
> 
>
> Key: HBASE-20939
> URL: https://issues.apache.org/jira/browse/HBASE-20939
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Attachments: HBASE-20939.patch
>
>
> This is very typical usage in our procedure implementation, for example, in 
> AssignProcedure, we will call AM.queueAssign and then suspend ourselves to 
> wait until the AM finish processing our assign request.
> But there could be races. Think of this:
> 1. We call suspendIfNotReady on a event, and it returns true so we need to 
> wait.
> 2. The event has been waked up, and the procedure will be added back to the 
> scheduler.
> 3. A worker picks up the procedure and finishes it.
> 4. We finally throw ProcedureSuspendException and the ProcedureExecutor 
> suspend us and store the state in procedure store.
> So we have a half done procedure in the procedure store for ever... This may 
> cause assertion when loading procedures. And maybe the worker can not finish 
> the procedure as when suspending we need to restore some state, for example, 
> add something to RootProcedureState. But anyway, it will still lead to 
> assertion or other unexpected errors.
> And this can not be done by simply adding a lock in the procedure, as most 
> works are done in the ProcedureExecutor after we throw 
> ProcedureSuspendException.





[jira] [Updated] (HBASE-19369) HBase Should use Builder Pattern to Create Log Files while using WAL on Erasure Coding

2018-07-26 Thread Mike Drob (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-19369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Drob updated HBASE-19369:
--
 Assignee: Mike Drob  (was: Alex Leblang)
Fix Version/s: 2.2.0
   Status: Open  (was: Patch Available)

> HBase Should use Builder Pattern to Create Log Files while using WAL on 
> Erasure Coding
> --
>
> Key: HBASE-19369
> URL: https://issues.apache.org/jira/browse/HBASE-19369
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Alex Leblang
>Assignee: Mike Drob
>Priority: Major
> Fix For: 2.2.0
>
> Attachments: HBASE-19369.master.001.patch, 
> HBASE-19369.master.002.patch, HBASE-19369.master.003.patch, 
> HBASE-19369.master.004.patch, HBASE-19369.v5.patch, HBASE-19369.v6.patch, 
> HBASE-19369.v7.patch, HBASE-19369.v8.patch, HBASE-19369.v9.patch
>
>
> Right now an HBase instance using the WAL won't function properly in an 
> Erasure Coded environment. We should change the following line to use the 
> hdfs.DistributedFileSystem builder pattern 
> https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/ProtobufLogWriter.java#L92
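For reference, the builder-based replacement would look roughly like the sketch below (not compiled here; exact signatures vary by Hadoop version, and `replicate()` on `HdfsDataOutputStreamBuilder` is the hook that lets the WAL opt out of erasure coding -- treat the variable names as placeholders):

```java
// Sketch only -- assumes a Hadoop 3.x DistributedFileSystem underneath.
FSDataOutputStreamBuilder<?, ?> builder = fs.createFile(walPath)
    .overwrite(false)
    .bufferSize(bufferSize)
    .replication(replication)
    .blockSize(blockSize);
if (builder instanceof DistributedFileSystem.HdfsDataOutputStreamBuilder) {
  // Force plain replication for the WAL even inside an EC-enabled directory.
  builder = ((DistributedFileSystem.HdfsDataOutputStreamBuilder) builder).replicate();
}
FSDataOutputStream output = builder.build();
```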





[jira] [Updated] (HBASE-19369) HBase Should use Builder Pattern to Create Log Files while using WAL on Erasure Coding

2018-07-26 Thread Mike Drob (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-19369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Drob updated HBASE-19369:
--
Fix Version/s: 3.0.0
   Status: Patch Available  (was: Open)

> HBase Should use Builder Pattern to Create Log Files while using WAL on 
> Erasure Coding
> --
>
> Key: HBASE-19369
> URL: https://issues.apache.org/jira/browse/HBASE-19369
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Alex Leblang
>Assignee: Mike Drob
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-19369.master.001.patch, 
> HBASE-19369.master.002.patch, HBASE-19369.master.003.patch, 
> HBASE-19369.master.004.patch, HBASE-19369.v5.patch, HBASE-19369.v6.patch, 
> HBASE-19369.v7.patch, HBASE-19369.v8.patch, HBASE-19369.v9.patch
>
>
> Right now an HBase instance using the WAL won't function properly in an 
> Erasure Coded environment. We should change the following line to use the 
> hdfs.DistributedFileSystem builder pattern 
> https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/ProtobufLogWriter.java#L92





[jira] [Commented] (HBASE-19369) HBase Should use Builder Pattern to Create Log Files while using WAL on Erasure Coding

2018-07-26 Thread Mike Drob (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-19369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559150#comment-16559150
 ] 

Mike Drob commented on HBASE-19369:
---

v9: Finishes the EC test setup, verified that this test fails without the 
corresponding changes to the LogWriter.

> HBase Should use Builder Pattern to Create Log Files while using WAL on 
> Erasure Coding
> --
>
> Key: HBASE-19369
> URL: https://issues.apache.org/jira/browse/HBASE-19369
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Alex Leblang
>Assignee: Alex Leblang
>Priority: Major
> Attachments: HBASE-19369.master.001.patch, 
> HBASE-19369.master.002.patch, HBASE-19369.master.003.patch, 
> HBASE-19369.master.004.patch, HBASE-19369.v5.patch, HBASE-19369.v6.patch, 
> HBASE-19369.v7.patch, HBASE-19369.v8.patch, HBASE-19369.v9.patch
>
>
> Right now an HBase instance using the WAL won't function properly in an 
> Erasure Coded environment. We should change the following line to use the 
> hdfs.DistributedFileSystem builder pattern 
> https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/ProtobufLogWriter.java#L92





[jira] [Updated] (HBASE-19369) HBase Should use Builder Pattern to Create Log Files while using WAL on Erasure Coding

2018-07-26 Thread Mike Drob (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-19369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Drob updated HBASE-19369:
--
Attachment: HBASE-19369.v9.patch

> HBase Should use Builder Pattern to Create Log Files while using WAL on 
> Erasure Coding
> --
>
> Key: HBASE-19369
> URL: https://issues.apache.org/jira/browse/HBASE-19369
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Alex Leblang
>Assignee: Alex Leblang
>Priority: Major
> Attachments: HBASE-19369.master.001.patch, 
> HBASE-19369.master.002.patch, HBASE-19369.master.003.patch, 
> HBASE-19369.master.004.patch, HBASE-19369.v5.patch, HBASE-19369.v6.patch, 
> HBASE-19369.v7.patch, HBASE-19369.v8.patch, HBASE-19369.v9.patch
>
>
> Right now an HBase instance using the WAL won't function properly in an 
> Erasure Coded environment. We should change the following line to use the 
> hdfs.DistributedFileSystem builder pattern 
> https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/ProtobufLogWriter.java#L92





[jira] [Assigned] (HBASE-20939) There will be race when we call suspendIfNotReady and then throw ProcedureSuspendedException

2018-07-26 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang reassigned HBASE-20939:
-

Assignee: Duo Zhang

> There will be race when we call suspendIfNotReady and then throw 
> ProcedureSuspendedException
> 
>
> Key: HBASE-20939
> URL: https://issues.apache.org/jira/browse/HBASE-20939
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
>
> This is very typical usage in our procedure implementation, for example, in 
> AssignProcedure, we will call AM.queueAssign and then suspend ourselves to 
> wait until the AM finish processing our assign request.
> But there could be races. Think of this:
> 1. We call suspendIfNotReady on a event, and it returns true so we need to 
> wait.
> 2. The event has been waked up, and the procedure will be added back to the 
> scheduler.
> 3. A worker picks up the procedure and finishes it.
> 4. We finally throw ProcedureSuspendException and the ProcedureExecutor 
> suspend us and store the state in procedure store.
> So we have a half done procedure in the procedure store for ever... This may 
> cause assertion when loading procedures. And maybe the worker can not finish 
> the procedure as when suspending we need to restore some state, for example, 
> add something to RootProcedureState. But anyway, it will still lead to 
> assertion or other unexpected errors.
> And this can not be done by simply adding a lock in the procedure, as most 
> works are done in the ProcedureExecutor after we throw 
> ProcedureSuspendException.





[jira] [Updated] (HBASE-20921) Possible NPE in ReopenTableRegionsProcedure

2018-07-26 Thread Allan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Yang updated HBASE-20921:
---
   Resolution: Fixed
Fix Version/s: 2.1.1
   2.0.2
   3.0.0
   Status: Resolved  (was: Patch Available)

> Possible NPE in ReopenTableRegionsProcedure
> ---
>
> Key: HBASE-20921
> URL: https://issues.apache.org/jira/browse/HBASE-20921
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Affects Versions: 3.0.0, 2.1.0, 2.0.2
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Fix For: 3.0.0, 2.0.2, 2.1.1
>
> Attachments: HBASE-20921.branch-2.0.001.patch, 
> HBASE-20921.branch-2.0.002.patch
>
>
> After HBASE-20752, we issue a ReopenTableRegionsProcedure in 
> ModifyTableProcedure to ensure all regions are reopened.
> But, ModifyTableProcedure and ReopenTableRegionsProcedure do not hold the 
> lock (why?), so there is a chance that while ModifyTableProcedure  executing, 
> a merge/split procedure can be executed at the same time.
> So, when ReopenTableRegionsProcedure reaches the state of 
> "REOPEN_TABLE_REGIONS_CONFIRM_REOPENED", some of the persisted regions to 
> check is actually not exists, thus a NPE will throw.
> {code}
> 2018-07-18 01:38:57,528 INFO  [PEWorker-9] 
> procedure2.ProcedureExecutor(1246): Finished pid=6110, state=SUCCESS; 
> MergeTableRegionsProcedure table=IntegrationTestBigLinkedList, 
> regions=[845d286231eb01b7
> 1aeaa17b0e30058d, 4a46ab0918c99cada72d5336ad83a828], forcibly=false in 
> 10.8610sec
> 2018-07-18 01:38:57,530 ERROR [PEWorker-8] 
> procedure2.ProcedureExecutor(1478): CODE-BUG: Uncaught runtime exception: 
> pid=5974, ppid=5973, state=RUNNABLE:REOPEN_TABLE_REGIONS_CONFIRM_REOPENED; 
> ReopenTab
> leRegionsProcedure table=IntegrationTestBigLinkedList
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.master.assignment.RegionStates.checkReopened(RegionStates.java:651)
> at 
> java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
> at 
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
> at 
> java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
> at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
> at 
> java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
> at 
> java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> at 
> java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
> at 
> org.apache.hadoop.hbase.master.procedure.ReopenTableRegionsProcedure.executeFromState(ReopenTableRegionsProcedure.java:102)
> at 
> org.apache.hadoop.hbase.master.procedure.ReopenTableRegionsProcedure.executeFromState(ReopenTableRegionsProcedure.java:45)
> at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:184)
> at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:850)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1453)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1221)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
> {code}
> I think we need to renew the region list of the table at the 
> "REOPEN_TABLE_REGIONS_CONFIRM_REOPENED" state. For the regions which are 
> merged or split, we do not need to check it. Since we can be sure that they 
> are opened after we made change to table descriptor.





[jira] [Commented] (HBASE-20949) Split/Merge table can be executed concurrently with DisableTableProcedure

2018-07-26 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559101#comment-16559101
 ] 

Duo Zhang commented on HBASE-20949:
---

OK I think the problem here is something like HBASE-20939.

After we dispatch the open region request, we suspend ourselves and return. The 
open region call finishes immediately and wakes us up, another PEWorker takes 
charge of the procedure and sets the procedure state to SUCCESS, and then the 
original PEWorker comes back; it also finds that the procedure is in the 
SUCCESS state, so it too tries to finish the procedure, causing a double 
release.

{code}
  LockState lockState = acquireLock(proc);
  switch (lockState) {
    case LOCK_ACQUIRED:
      execProcedure(procStack, proc);
      break;
    case LOCK_YIELD_WAIT:
      LOG.info(lockState + " " + proc);
      scheduler.yield(proc);
      break;
    case LOCK_EVENT_WAIT:
      // Someone will wake us up when the lock is available
      LOG.debug(lockState + " " + proc);
      break;
    default:
      throw new UnsupportedOperationException();
  }
  procStack.release(proc);

  if (proc.isSuccess()) {
    // update metrics on finishing the procedure
    proc.updateMetricsOnFinish(getEnvironment(), proc.elapsedTime(), true);
    LOG.info("Finished " + proc + " in " +
        StringUtils.humanTimeDiff(proc.elapsedTime()));
    // Finalize the procedure state
    if (proc.getProcId() == rootProcId) {
      procedureFinished(proc);
    } else {
      execCompletionCleanup(proc);
    }
    break; // exits the enclosing execution loop
  }
{code}

This is the critical part: the 'if (proc.isSuccess())' block has been executed 
twice, so we are dead.
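The double release can be sketched as follows (toy counter, not the real LockAndQueue):

```java
// Both workers observe isSuccess() == true and run the completion cleanup,
// so a single shared-lock acquire is paired with two releases.
public class DoubleReleaseDemo {
    static int sharedLockCount;

    static void release() { sharedLockCount--; }

    static int replay() {
        sharedLockCount = 1;           // one legitimate holder
        boolean isSuccess = true;
        if (isSuccess) release();      // worker that finished the procedure
        if (isSuccess) release();      // original worker, back from the suspend
        return sharedLockCount;        // negative count: accounting corrupted
    }

    public static void main(String[] args) {
        System.out.println(replay());
    }
}
```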

Let me prepare a patch in HBASE-20939 to see if it helps.

> Split/Merge table can be executed concurrently with DisableTableProcedure
> -
>
> Key: HBASE-20949
> URL: https://issues.apache.org/jira/browse/HBASE-20949
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Duo Zhang
>Priority: Major
> Attachments: HBASE-20949-debug.patch
>
>
> The top flaky tests on the dashboard are all because of this.
> TestRestoreSnapshotFromClient
> TestSimpleRegionNormalizerOnCluster
> Theoretically this should not happen, need to dig more.





[jira] [Commented] (HBASE-20949) Split/Merge table can be executed concurrently with DisableTableProcedure

2018-07-26 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559091#comment-16559091
 ] 

Duo Zhang commented on HBASE-20949:
---

The problem here is that, we seem to finish the procedure twice.

{noformat}
2018-07-26 22:35:44,762 INFO  [PEWorker-4] procedure2.ProcedureExecutor(1551): 
Initialized subprocedures=[{pid=96, ppid=95, 
state=RUNNABLE:REGION_TRANSITION_QUEUE, hasLock=false; AssignProcedure 
table=testRestoreSnapshotAfterSplittingRegions-1532644538441, 
region=d0b0215a67002d7c19486ee75f610d94}, {pid=97, ppid=95, 
state=RUNNABLE:REGION_TRANSITION_QUEUE, hasLock=false; AssignProcedure 
table=testRestoreSnapshotAfterSplittingRegions-1532644538441, 
region=9ae1ea9ba045c13d80b255e0de39eafb}, {pid=98, ppid=95, 
state=RUNNABLE:REGION_TRANSITION_QUEUE, hasLock=false; AssignProcedure 
table=testRestoreSnapshotAfterSplittingRegions-1532644538441, 
region=46cd0014b889f5540cb5e356c5fb7d6e}, {pid=99, ppid=95, 
state=RUNNABLE:REGION_TRANSITION_QUEUE, hasLock=false; AssignProcedure 
table=testRestoreSnapshotAfterSplittingRegions-1532644538441, 
region=201c3520780fedb059b26f6d757c250f}, {pid=100, ppid=95, 
state=RUNNABLE:REGION_TRANSITION_QUEUE, hasLock=false; AssignProcedure 
table=testRestoreSnapshotAfterSplittingRegions-1532644538441, 
region=c5e9ddc19cbd286cf4a016f889e79732}, {pid=101, ppid=95, 
state=RUNNABLE:REGION_TRANSITION_QUEUE, hasLock=false; AssignProcedure 
table=testRestoreSnapshotAfterSplittingRegions-1532644538441, 
region=e1d6b5e83f6ed6175d95b0a628627251}]
2018-07-26 22:35:44,819 DEBUG [PEWorker-8] procedure2.LockAndQueue(123): 
pid=101, ppid=95, state=RUNNABLE:REGION_TRANSITION_QUEUE, hasLock=false; 
AssignProcedure table=testRestoreSnapshotAfterSplittingRegions-1532644538441, 
region=e1d6b5e83f6ed6175d95b0a628627251 acquire shared lock 3284bcdf: 
exclusiveLockOwner=NONE, sharedLockCount=1, waitingProcCount=0 succeeded
2018-07-26 22:35:44,819 DEBUG [PEWorker-8] procedure2.LockAndQueue(123): 
pid=101, ppid=95, state=RUNNABLE:REGION_TRANSITION_QUEUE, hasLock=false; 
AssignProcedure table=testRestoreSnapshotAfterSplittingRegions-1532644538441, 
region=e1d6b5e83f6ed6175d95b0a628627251 acquire shared lock 7f6eec32: 
exclusiveLockOwner=95, sharedLockCount=0, waitingProcCount=0 succeeded
2018-07-26 22:35:44,819 INFO  [PEWorker-8] 
procedure.MasterProcedureScheduler(689): pid=101, ppid=95, 
state=RUNNABLE:REGION_TRANSITION_QUEUE, hasLock=false; AssignProcedure 
table=testRestoreSnapshotAfterSplittingRegions-1532644538441, 
region=e1d6b5e83f6ed6175d95b0a628627251 checking lock on 
e1d6b5e83f6ed6175d95b0a628627251
2018-07-26 22:35:44,871 INFO  [PEWorker-8] assignment.AssignProcedure(218): 
Starting pid=101, ppid=95, state=RUNNABLE:REGION_TRANSITION_QUEUE, 
hasLock=true; AssignProcedure 
table=testRestoreSnapshotAfterSplittingRegions-1532644538441, 
region=e1d6b5e83f6ed6175d95b0a628627251; rit=OFFLINE, 
location=asf911.gq1.ygridcore.net,41392,1532644489849; forceNewPlan=false, 
retain=true
2018-07-26 22:35:45,022 INFO  [PEWorker-5] assignment.RegionStateStore(199): 
pid=101 updating hbase:meta row=e1d6b5e83f6ed6175d95b0a628627251, 
regionState=OPENING, regionLocation=asf911.gq1.ygridcore.net,41392,1532644489849
2018-07-26 22:35:45,026 INFO  [PEWorker-5] 
assignment.RegionTransitionProcedure(241): Dispatch pid=101, ppid=95, 
state=RUNNABLE:REGION_TRANSITION_DISPATCH, hasLock=true; AssignProcedure 
table=testRestoreSnapshotAfterSplittingRegions-1532644538441, 
region=e1d6b5e83f6ed6175d95b0a628627251; rit=OPENING, 
location=asf911.gq1.ygridcore.net,41392,1532644489849
2018-07-26 22:35:45,198 DEBUG 
[RpcServer.default.FPBQ.Fifo.handler=4,queue=0,port=57564] 
assignment.RegionTransitionProcedure(264): Received report OPENED seqId=5, 
pid=101, ppid=95, state=RUNNABLE:REGION_TRANSITION_DISPATCH, hasLock=true; 
AssignProcedure table=testRestoreSnapshotAfterSplittingRegions-1532644538441, 
region=e1d6b5e83f6ed6175d95b0a628627251; rit=OPENING, 
location=asf911.gq1.ygridcore.net,41392,1532644489849
2018-07-26 22:35:45,198 DEBUG [PEWorker-15] 
assignment.RegionTransitionProcedure(354): Finishing pid=101, ppid=95, 
state=RUNNABLE:REGION_TRANSITION_FINISH, hasLock=true; AssignProcedure 
table=testRestoreSnapshotAfterSplittingRegions-1532644538441, 
region=e1d6b5e83f6ed6175d95b0a628627251; rit=OPENING, 
location=asf911.gq1.ygridcore.net,41392,1532644489849
2018-07-26 22:35:45,199 INFO  [PEWorker-15] assignment.RegionStateStore(199): 
pid=101 updating hbase:meta row=e1d6b5e83f6ed6175d95b0a628627251, 
regionState=OPEN, openSeqNum=5, 
regionLocation=asf911.gq1.ygridcore.net,41392,1532644489849
2018-07-26 22:35:45,276 DEBUG [PEWorker-5] procedure2.LockAndQueue(132): 
pid=101, ppid=95, state=SUCCESS, hasLock=false; AssignProcedure 
table=testRestoreSnapshotAfterSplittingRegions-1532644538441, 
region=e1d6b5e83f6ed6175d95b0a628627251 release shared lock 7f6eec32: 
exclusiveLockOwner=NONE, sharedLockCount=6, 

[jira] [Commented] (HBASE-20949) Split/Merge table can be executed concurrently with DisableTableProcedure

2018-07-26 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559086#comment-16559086
 ] 

Duo Zhang commented on HBASE-20949:
---

Confirmed, there are duplicated release lock acquires. Let me dig.

> Split/Merge table can be executed concurrently with DisableTableProcedure
> -
>
> Key: HBASE-20949
> URL: https://issues.apache.org/jira/browse/HBASE-20949
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Duo Zhang
>Priority: Major
> Attachments: HBASE-20949-debug.patch
>
>
> The top flaky tests on the dashboard are all because of this.
> TestRestoreSnapshotFromClient
> TestSimpleRegionNormalizerOnCluster
> Theoretically this should not happen, need to dig more.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20927) RSGroupAdminEndpoint doesn't handle clearing dead servers if they are not processed yet.

2018-07-26 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559080#comment-16559080
 ] 

Hudson commented on HBASE-20927:


Results for branch branch-2
[build #1030 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1030/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1030//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1030//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1030//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> RSGroupAdminEndpoint doesn't handle clearing dead servers if they are not 
> processed yet.
> 
>
> Key: HBASE-20927
> URL: https://issues.apache.org/jira/browse/HBASE-20927
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-20927-master.patch, HBASE-20927.master.002.patch
>
>
> Admin.clearDeadServers is supposed to return the list of servers that were 
> not cleared. But if RSGroupAdminEndpoint is set, a ConstraintException is 
> thrown:
> {noformat}
> Caused by: 
> org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.constraint.ConstraintException):
>  org.apache.hadoop.hbase.constraint.ConstraintException: The set of servers 
> to remove cannot be null or empty.
>   at 
> org.apache.hadoop.hbase.rsgroup.RSGroupAdminServer.removeServers(RSGroupAdminServer.java:573)
>   at 
> org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint.postClearDeadServers(RSGroupAdminEndpoint.java:519)
>   at 
> org.apache.hadoop.hbase.master.MasterCoprocessorHost$133.call(MasterCoprocessorHost.java:1607)
>   at 
> org.apache.hadoop.hbase.master.MasterCoprocessorHost$133.call(MasterCoprocessorHost.java:1604)
>   at 
> org.apache.hadoop.hbase.coprocessor.CoprocessorHost$ObserverOperationWithoutResult.callObserver(CoprocessorHost.java:540)
>   at 
> org.apache.hadoop.hbase.coprocessor.CoprocessorHost.execOperation(CoprocessorHost.java:614)
>   at 
> org.apache.hadoop.hbase.master.MasterCoprocessorHost.postClearDeadServers(MasterCoprocessorHost.java:1604)
>   at 
> org.apache.hadoop.hbase.master.MasterRpcServices.clearDeadServers(MasterRpcServices.java:2231)
>   at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> {noformat}
> That happens because postClearDeadServers calls 
> groupAdminServer.removeServers(clearedServer) even when clearedServer is 
> empty.
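The fix direction the description implies can be sketched as a simple guard. This is an illustrative stand-in, not the real RSGroup code: the method names mirror the stack trace above, but the bodies are hypothetical.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical stand-ins mirroring the names in the stack trace above;
// a sketch of the guard, not the actual HBase implementation.
public class ClearDeadServersGuard {

    /** Simulates RSGroupAdminServer.removeServers, which rejects empty input. */
    static void removeServers(Set<String> servers) {
        if (servers == null || servers.isEmpty()) {
            // Mirrors the ConstraintException message from the stack trace.
            throw new IllegalArgumentException(
                "The set of servers to remove cannot be null or empty.");
        }
        // ... real code would update RSGroup bookkeeping here ...
    }

    /** The fix: only call removeServers when something was actually cleared. */
    static void postClearDeadServers(Set<String> clearedServers) {
        if (clearedServers == null || clearedServers.isEmpty()) {
            return; // nothing cleared, nothing to remove from groups
        }
        removeServers(clearedServers);
    }

    public static void main(String[] args) {
        // With the guard in place, an empty cleared list no longer throws.
        postClearDeadServers(Collections.emptySet());
        System.out.println("empty cleared-server list handled without exception");
    }
}
```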



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20885) Remove entry for RPC quota from hbase:quota when RPC quota is removed.

2018-07-26 Thread Sakthi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sakthi updated HBASE-20885:
---
Attachment: hbase-20885.master.003.patch

> Remove entry for RPC quota from hbase:quota when RPC quota is removed.
> --
>
> Key: HBASE-20885
> URL: https://issues.apache.org/jira/browse/HBASE-20885
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Sakthi
>Assignee: Sakthi
>Priority: Minor
> Attachments: hbase-20885.master.001.patch, 
> hbase-20885.master.002.patch, hbase-20885.master.003.patch
>
>
> When an RPC quota is removed (using LIMIT => 'NONE'), the entry in the 
> hbase:quota table is not completely removed. For example, see below:
> {noformat}
> hbase(main):005:0> create 't2','cf1'
> Created table t2
> Took 0.8000 seconds
> => Hbase::Table - t2
> hbase(main):006:0> set_quota TYPE => THROTTLE, TABLE => 't2', LIMIT => 
> '10M/sec'
> Took 0.1024 seconds
> hbase(main):007:0> list_quotas
> OWNER  QUOTAS
>  TABLE => t2   TYPE => THROTTLE, THROTTLE_TYPE => 
> REQUEST_SIZE, LIMIT => 10M/sec, SCOPE => MACHINE
> 1 row(s)
> Took 0.0622 seconds
> hbase(main):008:0> scan 'hbase:quota'
> ROWCOLUMN+CELL
>  t.t2  column=q:s, timestamp=1531513014463, 
> value=PBUF\x12\x0B\x12\x09\x08\x04\x10\x80\x80\x80
>\x05 \x02
> 1 row(s)
> Took 0.0453 seconds
> hbase(main):009:0> set_quota TYPE => THROTTLE, TABLE => 't2', LIMIT => 'NONE'
> Took 0.0097 seconds
> hbase(main):010:0> list_quotas
> OWNER  QUOTAS
> 0 row(s)
> Took 0.0338 seconds
> hbase(main):011:0> scan 'hbase:quota'
> ROWCOLUMN+CELL
>  t.t2  column=q:s, timestamp=1531513039505, 
> value=PBUF\x12\x00
> 1 row(s)
> Took 0.0066 seconds
> {noformat}
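The leftover `PBUF\x12\x00` row suggests the fix is to delete the row when the remaining serialized settings are empty, rather than writing back an empty blob. A minimal sketch of that idea, using a plain map as a stand-in for the hbase:quota table (all names here are hypothetical, not the real quota code):

```java
import java.util.HashMap;
import java.util.Map;

// Simplified in-memory model of the hbase:quota table (row key -> serialized
// settings). All names are illustrative stand-ins, not HBase's quota classes.
public class QuotaRowCleanup {
    final Map<String, byte[]> quotaTable = new HashMap<>();

    /** True when the serialized quota carries no settings at all -- the
     *  leftover "PBUF\x12\x00" case shown in the shell session above. */
    static boolean isEmptyQuota(byte[] serialized) {
        return serialized == null || serialized.length == 0;
    }

    /** Store updated settings, deleting the row instead of keeping an empty blob. */
    void applyQuota(String rowKey, byte[] updatedSettings) {
        if (isEmptyQuota(updatedSettings)) {
            quotaTable.remove(rowKey); // the fix: no stale empty entry remains
        } else {
            quotaTable.put(rowKey, updatedSettings);
        }
    }

    public static void main(String[] args) {
        QuotaRowCleanup q = new QuotaRowCleanup();
        q.applyQuota("t.t2", new byte[] {1, 2, 3}); // set a throttle
        q.applyQuota("t.t2", new byte[0]);          // remove it (LIMIT => 'NONE')
        System.out.println("rows left: " + q.quotaTable.size());
    }
}
```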



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20874) Sending compaction descriptions from all regionservers to master.

2018-07-26 Thread Mohit Goel (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559014#comment-16559014
 ] 

Mohit Goel commented on HBASE-20874:


Review Link : https://reviews.apache.org/r/68035/

> Sending compaction descriptions from all regionservers to master.
> -
>
> Key: HBASE-20874
> URL: https://issues.apache.org/jira/browse/HBASE-20874
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Mohit Goel
>Assignee: Mohit Goel
>Priority: Minor
>
> Need to send the compaction descriptions from region servers to the Master, 
> so that the Master knows the entire compaction state of the cluster. Further, 
> the implementation of client-side APIs like getCompactionState needs to 
> change so that they consult the Master for the result instead of sending 
> individual requests to region servers.
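One way the master-side aggregation could look is sketched below. This is an illustrative stand-in only: the class, enum, and method names are hypothetical, not HBase's API. Each region server reports its compaction state, and the Master merges the per-server reports so a client query can be answered without fanning out to region servers.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only: names and states are stand-ins, not HBase's API.
public class CompactionStateAggregator {
    enum State { NONE, MINOR, MAJOR, MAJOR_AND_MINOR }

    // Latest compaction state reported by each region server, keyed by server name.
    private final Map<String, State> reports = new ConcurrentHashMap<>();

    /** Called when a region server report carries its compaction description. */
    void onServerReport(String serverName, State state) {
        reports.put(serverName, state);
    }

    /** Cluster-wide view: MAJOR_AND_MINOR if both kinds run anywhere, etc. */
    State clusterState() {
        boolean minor = false, major = false;
        for (State s : reports.values()) {
            minor |= (s == State.MINOR || s == State.MAJOR_AND_MINOR);
            major |= (s == State.MAJOR || s == State.MAJOR_AND_MINOR);
        }
        if (minor && major) return State.MAJOR_AND_MINOR;
        if (major) return State.MAJOR;
        if (minor) return State.MINOR;
        return State.NONE;
    }

    public static void main(String[] args) {
        CompactionStateAggregator agg = new CompactionStateAggregator();
        agg.onServerReport("rs1", State.MINOR);
        agg.onServerReport("rs2", State.MAJOR);
        System.out.println(agg.clusterState()); // MAJOR_AND_MINOR
    }
}
```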



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20734) Colocate recovered edits directory with hbase.wal.dir

2018-07-26 Thread Zach York (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559009#comment-16559009
 ] 

Zach York commented on HBASE-20734:
---

PR [https://github.com/apache/hbase/pull/86] contains the master code. There 
are some test failures that I will be addressing.

> Colocate recovered edits directory with hbase.wal.dir
> -
>
> Key: HBASE-20734
> URL: https://issues.apache.org/jira/browse/HBASE-20734
> Project: HBase
>  Issue Type: Improvement
>  Components: MTTR, Recovery, wal
>Reporter: Ted Yu
>Assignee: Zach York
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-20734.branch-1.001.patch
>
>
> During investigation of HBASE-20723, I realized that we wouldn't get the best 
> performance when hbase.wal.dir is configured to be on different (fast) media 
> than the hbase rootdir w.r.t. recovered edits, since the recovered edits 
> directory currently lives under rootdir.
> Such a setup may not result in fast recovery when there is a region server 
> failover.
> This issue is to find a proper (hopefully backward-compatible) way of 
> colocating the recovered edits directory with hbase.wal.dir .
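The core of the change is path resolution: prefer the WAL directory (possibly on faster media) over the root directory when one is configured. A minimal sketch, with a hypothetical directory layout that is not the real HBase on-disk layout:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative path logic only; the real change touches HBase's FS utilities
// and the actual directory layout differs from this simplified one.
public class RecoveredEditsDirResolver {

    /** Resolve the recovered.edits dir for a region: prefer the WAL dir
     *  (possibly on faster media) over the root dir when it is configured. */
    static Path recoveredEditsDir(Path rootDir, Path walDir,
                                  String table, String encodedRegion) {
        Path base = (walDir != null) ? walDir : rootDir; // backward-compatible fallback
        return base.resolve("data").resolve("default")
                   .resolve(table).resolve(encodedRegion)
                   .resolve("recovered.edits");
    }

    public static void main(String[] args) {
        Path onWal = recoveredEditsDir(Paths.get("/hbase"), Paths.get("/fastwal"),
                                       "t1", "abc123");
        System.out.println(onWal); // lives under the WAL dir when one is set
    }
}
```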



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20657) Retrying RPC call for ModifyTableProcedure may get stuck

2018-07-26 Thread Sergey Soldatov (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558989#comment-16558989
 ] 

Sergey Soldatov commented on HBASE-20657:
-

Oops, I accidentally switched to branch-2 from master. Actually, the changes 
for MasterProcedureScheduler were added to master as part of HBASE-20569. Not 
sure what we should do for branch-2 in that case. WDYT [~elserj]?

> Retrying RPC call for ModifyTableProcedure may get stuck
> 
>
> Key: HBASE-20657
> URL: https://issues.apache.org/jira/browse/HBASE-20657
> Project: HBase
>  Issue Type: Bug
>  Components: Client, proc-v2
>Affects Versions: 2.0.0
>Reporter: Sergey Soldatov
>Assignee: stack
>Priority: Critical
> Fix For: 3.0.0, 2.0.2
>
> Attachments: HBASE-20657-1-branch-2.patch, 
> HBASE-20657-2-branch-2.patch, HBASE-20657-3-branch-2.patch, 
> HBASE-20657-4-master.patch, HBASE-20657-testcase-branch2.patch
>
>
> Env: 2 masters, 1 RS. 
> Steps to reproduce: the active master is killed while a ModifyTableProcedure 
> is executing. 
> If the table has enough regions, it may happen that by the time the secondary 
> master becomes active, some of the regions are closed, so once the client 
> retries the call against the new active master, a new ModifyTableProcedure is 
> created and gets stuck while handling the MODIFY_TABLE_REOPEN_ALL_REGIONS 
> state. That happens because:
> 1. When retrying from the client side, we call modifyTableAsync, which 
> creates a procedure with a new nonce key:
> {noformat}
>  ModifyTableRequest request = 
> RequestConverter.buildModifyTableRequest(
> td.getTableName(), td, ng.getNonceGroup(), ng.newNonce());
> {noformat}
>  So on the server side it is considered a new procedure and starts executing 
> immediately.
> 2. When we are processing MODIFY_TABLE_REOPEN_ALL_REGIONS, we create a 
> MoveRegionProcedure for each region, but it checks whether the region is 
> online (and it's not), so it fails immediately, forcing the procedure to 
> restart.
> [~an...@apache.org] saw a similar case where two concurrent ModifyTable 
> procedures were running and got stuck in a similar way. 
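The retry problem in step 1 comes down to nonce handling: because the client generates a fresh nonce on each retry, the server cannot recognize the retry as the same operation. A minimal sketch of nonce-based deduplication, with hypothetical names (this is not HBase's NonceKey/ProcedureExecutor API):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal nonce-dedup sketch; names are illustrative, not HBase's API.
public class NonceRegistry {
    private final Map<Long, Long> nonceToProcId = new ConcurrentHashMap<>();
    private long nextProcId = 1;

    /** Returns the existing procedure id for a previously seen nonce, or
     *  registers a new procedure. A retry that REUSES its nonce maps back to
     *  the original procedure; a retry that generates a NEW nonce (the bug
     *  described above) spawns a duplicate procedure instead. */
    synchronized long submit(long nonce) {
        return nonceToProcId.computeIfAbsent(nonce, n -> nextProcId++);
    }
}
```

With this scheme, `submit(sameNonce)` on retry returns the original procedure id, while a fresh nonce per retry allocates a new one, which is exactly the duplicated-procedure behavior the issue describes.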



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20657) Retrying RPC call for ModifyTableProcedure may get stuck

2018-07-26 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558974#comment-16558974
 ] 

Hadoop QA commented on HBASE-20657:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  3s{color} 
| {color:red} HBASE-20657 does not apply to master. Rebase required? Wrong 
Branch? See https://yetus.apache.org/documentation/0.7.0/precommit-patchnames 
for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HBASE-20657 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12933276/HBASE-20657-4-master.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/13819/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.



> Retrying RPC call for ModifyTableProcedure may get stuck
> 
>
> Key: HBASE-20657
> URL: https://issues.apache.org/jira/browse/HBASE-20657
> Project: HBase
>  Issue Type: Bug
>  Components: Client, proc-v2
>Affects Versions: 2.0.0
>Reporter: Sergey Soldatov
>Assignee: stack
>Priority: Critical
> Fix For: 3.0.0, 2.0.2
>
> Attachments: HBASE-20657-1-branch-2.patch, 
> HBASE-20657-2-branch-2.patch, HBASE-20657-3-branch-2.patch, 
> HBASE-20657-4-master.patch, HBASE-20657-testcase-branch2.patch
>
>
> Env: 2 masters, 1 RS. 
> Steps to reproduce: the active master is killed while a ModifyTableProcedure 
> is executing. 
> If the table has enough regions, it may happen that by the time the secondary 
> master becomes active, some of the regions are closed, so once the client 
> retries the call against the new active master, a new ModifyTableProcedure is 
> created and gets stuck while handling the MODIFY_TABLE_REOPEN_ALL_REGIONS 
> state. That happens because:
> 1. When retrying from the client side, we call modifyTableAsync, which 
> creates a procedure with a new nonce key:
> {noformat}
>  ModifyTableRequest request = 
> RequestConverter.buildModifyTableRequest(
> td.getTableName(), td, ng.getNonceGroup(), ng.newNonce());
> {noformat}
>  So on the server side it is considered a new procedure and starts executing 
> immediately.
> 2. When we are processing MODIFY_TABLE_REOPEN_ALL_REGIONS, we create a 
> MoveRegionProcedure for each region, but it checks whether the region is 
> online (and it's not), so it fails immediately, forcing the procedure to 
> restart.
> [~an...@apache.org] saw a similar case where two concurrent ModifyTable 
> procedures were running and got stuck in a similar way. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20657) Retrying RPC call for ModifyTableProcedure may get stuck

2018-07-26 Thread Sergey Soldatov (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558973#comment-16558973
 ] 

Sergey Soldatov commented on HBASE-20657:
-

Removed the internal lock implementation; thanks [~Apache9] for making those a 
part of the Procedure class. 
I still think the changes in MasterProcedureScheduler.java should be applied. 
To reproduce the problem, make MTP hold the lock (as the patch does) and run 
the provided test. Everything ends up with a number of regions stuck in RIT 
state forever. The problem is in the state machine optimization, not in MTP 
itself, so it may happen with any other complex procedure that holds the 
lock. 

> Retrying RPC call for ModifyTableProcedure may get stuck
> 
>
> Key: HBASE-20657
> URL: https://issues.apache.org/jira/browse/HBASE-20657
> Project: HBase
>  Issue Type: Bug
>  Components: Client, proc-v2
>Affects Versions: 2.0.0
>Reporter: Sergey Soldatov
>Assignee: stack
>Priority: Critical
> Fix For: 3.0.0, 2.0.2
>
> Attachments: HBASE-20657-1-branch-2.patch, 
> HBASE-20657-2-branch-2.patch, HBASE-20657-3-branch-2.patch, 
> HBASE-20657-4-master.patch, HBASE-20657-testcase-branch2.patch
>
>
> Env: 2 masters, 1 RS. 
> Steps to reproduce: the active master is killed while a ModifyTableProcedure 
> is executing. 
> If the table has enough regions, it may happen that by the time the secondary 
> master becomes active, some of the regions are closed, so once the client 
> retries the call against the new active master, a new ModifyTableProcedure is 
> created and gets stuck while handling the MODIFY_TABLE_REOPEN_ALL_REGIONS 
> state. That happens because:
> 1. When retrying from the client side, we call modifyTableAsync, which 
> creates a procedure with a new nonce key:
> {noformat}
>  ModifyTableRequest request = 
> RequestConverter.buildModifyTableRequest(
> td.getTableName(), td, ng.getNonceGroup(), ng.newNonce());
> {noformat}
>  So on the server side it is considered a new procedure and starts executing 
> immediately.
> 2. When we are processing MODIFY_TABLE_REOPEN_ALL_REGIONS, we create a 
> MoveRegionProcedure for each region, but it checks whether the region is 
> online (and it's not), so it fails immediately, forcing the procedure to 
> restart.
> [~an...@apache.org] saw a similar case where two concurrent ModifyTable 
> procedures were running and got stuck in a similar way. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HBASE-20657) Retrying RPC call for ModifyTableProcedure may get stuck

2018-07-26 Thread Sergey Soldatov (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Soldatov reassigned HBASE-20657:
---

Assignee: stack  (was: Sergey Soldatov)

> Retrying RPC call for ModifyTableProcedure may get stuck
> 
>
> Key: HBASE-20657
> URL: https://issues.apache.org/jira/browse/HBASE-20657
> Project: HBase
>  Issue Type: Bug
>  Components: Client, proc-v2
>Affects Versions: 2.0.0
>Reporter: Sergey Soldatov
>Assignee: stack
>Priority: Critical
> Fix For: 3.0.0, 2.0.2
>
> Attachments: HBASE-20657-1-branch-2.patch, 
> HBASE-20657-2-branch-2.patch, HBASE-20657-3-branch-2.patch, 
> HBASE-20657-4-master.patch, HBASE-20657-testcase-branch2.patch
>
>
> Env: 2 masters, 1 RS. 
> Steps to reproduce: the active master is killed while a ModifyTableProcedure 
> is executing. 
> If the table has enough regions, it may happen that by the time the secondary 
> master becomes active, some of the regions are closed, so once the client 
> retries the call against the new active master, a new ModifyTableProcedure is 
> created and gets stuck while handling the MODIFY_TABLE_REOPEN_ALL_REGIONS 
> state. That happens because:
> 1. When retrying from the client side, we call modifyTableAsync, which 
> creates a procedure with a new nonce key:
> {noformat}
>  ModifyTableRequest request = 
> RequestConverter.buildModifyTableRequest(
> td.getTableName(), td, ng.getNonceGroup(), ng.newNonce());
> {noformat}
>  So on the server side it is considered a new procedure and starts executing 
> immediately.
> 2. When we are processing MODIFY_TABLE_REOPEN_ALL_REGIONS, we create a 
> MoveRegionProcedure for each region, but it checks whether the region is 
> online (and it's not), so it fails immediately, forcing the procedure to 
> restart.
> [~an...@apache.org] saw a similar case where two concurrent ModifyTable 
> procedures were running and got stuck in a similar way. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HBASE-20657) Retrying RPC call for ModifyTableProcedure may get stuck

2018-07-26 Thread Sergey Soldatov (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Soldatov reassigned HBASE-20657:
---

Assignee: Sergey Soldatov  (was: stack)

> Retrying RPC call for ModifyTableProcedure may get stuck
> 
>
> Key: HBASE-20657
> URL: https://issues.apache.org/jira/browse/HBASE-20657
> Project: HBase
>  Issue Type: Bug
>  Components: Client, proc-v2
>Affects Versions: 2.0.0
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Critical
> Fix For: 3.0.0, 2.0.2
>
> Attachments: HBASE-20657-1-branch-2.patch, 
> HBASE-20657-2-branch-2.patch, HBASE-20657-3-branch-2.patch, 
> HBASE-20657-4-master.patch, HBASE-20657-testcase-branch2.patch
>
>
> Env: 2 masters, 1 RS. 
> Steps to reproduce: the active master is killed while a ModifyTableProcedure 
> is executing. 
> If the table has enough regions, it may happen that by the time the secondary 
> master becomes active, some of the regions are closed, so once the client 
> retries the call against the new active master, a new ModifyTableProcedure is 
> created and gets stuck while handling the MODIFY_TABLE_REOPEN_ALL_REGIONS 
> state. That happens because:
> 1. When retrying from the client side, we call modifyTableAsync, which 
> creates a procedure with a new nonce key:
> {noformat}
>  ModifyTableRequest request = 
> RequestConverter.buildModifyTableRequest(
> td.getTableName(), td, ng.getNonceGroup(), ng.newNonce());
> {noformat}
>  So on the server side it is considered a new procedure and starts executing 
> immediately.
> 2. When we are processing MODIFY_TABLE_REOPEN_ALL_REGIONS, we create a 
> MoveRegionProcedure for each region, but it checks whether the region is 
> online (and it's not), so it fails immediately, forcing the procedure to 
> restart.
> [~an...@apache.org] saw a similar case where two concurrent ModifyTable 
> procedures were running and got stuck in a similar way. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20657) Retrying RPC call for ModifyTableProcedure may get stuck

2018-07-26 Thread Sergey Soldatov (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Soldatov updated HBASE-20657:

Fix Version/s: 3.0.0

> Retrying RPC call for ModifyTableProcedure may get stuck
> 
>
> Key: HBASE-20657
> URL: https://issues.apache.org/jira/browse/HBASE-20657
> Project: HBase
>  Issue Type: Bug
>  Components: Client, proc-v2
>Affects Versions: 2.0.0
>Reporter: Sergey Soldatov
>Assignee: stack
>Priority: Critical
> Fix For: 3.0.0, 2.0.2
>
> Attachments: HBASE-20657-1-branch-2.patch, 
> HBASE-20657-2-branch-2.patch, HBASE-20657-3-branch-2.patch, 
> HBASE-20657-4-master.patch, HBASE-20657-testcase-branch2.patch
>
>
> Env: 2 masters, 1 RS. 
> Steps to reproduce: the active master is killed while a ModifyTableProcedure 
> is executing. 
> If the table has enough regions, it may happen that by the time the secondary 
> master becomes active, some of the regions are closed, so once the client 
> retries the call against the new active master, a new ModifyTableProcedure is 
> created and gets stuck while handling the MODIFY_TABLE_REOPEN_ALL_REGIONS 
> state. That happens because:
> 1. When retrying from the client side, we call modifyTableAsync, which 
> creates a procedure with a new nonce key:
> {noformat}
>  ModifyTableRequest request = 
> RequestConverter.buildModifyTableRequest(
> td.getTableName(), td, ng.getNonceGroup(), ng.newNonce());
> {noformat}
>  So on the server side it is considered a new procedure and starts executing 
> immediately.
> 2. When we are processing MODIFY_TABLE_REOPEN_ALL_REGIONS, we create a 
> MoveRegionProcedure for each region, but it checks whether the region is 
> online (and it's not), so it fails immediately, forcing the procedure to 
> restart.
> [~an...@apache.org] saw a similar case where two concurrent ModifyTable 
> procedures were running and got stuck in a similar way. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20657) Retrying RPC call for ModifyTableProcedure may get stuck

2018-07-26 Thread Sergey Soldatov (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Soldatov updated HBASE-20657:

Attachment: HBASE-20657-4-master.patch

> Retrying RPC call for ModifyTableProcedure may get stuck
> 
>
> Key: HBASE-20657
> URL: https://issues.apache.org/jira/browse/HBASE-20657
> Project: HBase
>  Issue Type: Bug
>  Components: Client, proc-v2
>Affects Versions: 2.0.0
>Reporter: Sergey Soldatov
>Assignee: stack
>Priority: Critical
> Fix For: 3.0.0, 2.0.2
>
> Attachments: HBASE-20657-1-branch-2.patch, 
> HBASE-20657-2-branch-2.patch, HBASE-20657-3-branch-2.patch, 
> HBASE-20657-4-master.patch, HBASE-20657-testcase-branch2.patch
>
>
> Env: 2 masters, 1 RS. 
> Steps to reproduce: the active master is killed while a ModifyTableProcedure 
> is executing. 
> If the table has enough regions, it may happen that by the time the secondary 
> master becomes active, some of the regions are closed, so once the client 
> retries the call against the new active master, a new ModifyTableProcedure is 
> created and gets stuck while handling the MODIFY_TABLE_REOPEN_ALL_REGIONS 
> state. That happens because:
> 1. When retrying from the client side, we call modifyTableAsync, which 
> creates a procedure with a new nonce key:
> {noformat}
>  ModifyTableRequest request = 
> RequestConverter.buildModifyTableRequest(
> td.getTableName(), td, ng.getNonceGroup(), ng.newNonce());
> {noformat}
>  So on the server side it is considered a new procedure and starts executing 
> immediately.
> 2. When we are processing MODIFY_TABLE_REOPEN_ALL_REGIONS, we create a 
> MoveRegionProcedure for each region, but it checks whether the region is 
> online (and it's not), so it fails immediately, forcing the procedure to 
> restart.
> [~an...@apache.org] saw a similar case where two concurrent ModifyTable 
> procedures were running and got stuck in a similar way. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20734) Colocate recovered edits directory with hbase.wal.dir

2018-07-26 Thread Zach York (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558921#comment-16558921
 ] 

Zach York commented on HBASE-20734:
---

Sorry it took so long to get the master patch up. I'll be uploading that today 
and then will make the proposed edits after.

P.S. it looks like there were a few code paths in Master procedures that were 
looking at the wrong FS. I've tried to fix any of those that I saw.

> Colocate recovered edits directory with hbase.wal.dir
> -
>
> Key: HBASE-20734
> URL: https://issues.apache.org/jira/browse/HBASE-20734
> Project: HBase
>  Issue Type: Improvement
>  Components: MTTR, Recovery, wal
>Reporter: Ted Yu
>Assignee: Zach York
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-20734.branch-1.001.patch
>
>
> During investigation of HBASE-20723, I realized that we wouldn't get the best 
> performance when hbase.wal.dir is configured to be on different (fast) media 
> than the hbase rootdir w.r.t. recovered edits, since the recovered edits 
> directory currently lives under rootdir.
> Such a setup may not result in fast recovery when there is a region server 
> failover.
> This issue is to find a proper (hopefully backward-compatible) way of 
> colocating the recovered edits directory with hbase.wal.dir .



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-19369) HBase Should use Builder Pattern to Create Log Files while using WAL on Erasure Coding

2018-07-26 Thread Alex Leblang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-19369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558885#comment-16558885
 ] 

Alex Leblang commented on HBASE-19369:
--

[~mdrob] I stalled on this patch and then unfortunately forgot about it. It is 
now all yours

> HBase Should use Builder Pattern to Create Log Files while using WAL on 
> Erasure Coding
> --
>
> Key: HBASE-19369
> URL: https://issues.apache.org/jira/browse/HBASE-19369
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Alex Leblang
>Assignee: Alex Leblang
>Priority: Major
> Attachments: HBASE-19369.master.001.patch, 
> HBASE-19369.master.002.patch, HBASE-19369.master.003.patch, 
> HBASE-19369.master.004.patch, HBASE-19369.v5.patch, HBASE-19369.v6.patch, 
> HBASE-19369.v7.patch, HBASE-19369.v8.patch
>
>
> Right now an HBase instance using the WAL won't function properly in an 
> Erasure Coded environment. We should change the following line to use the 
> hdfs.DistributedFileSystem builder pattern 
> https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/ProtobufLogWriter.java#L92



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20962) LogStream Metadata Tracking

2018-07-26 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558878#comment-16558878
 ] 

Josh Elser commented on HBASE-20962:


Posed this question to [~enis] in email. Let me try to paraphrase what he 
suggested:

[https://bookkeeper.apache.org/distributedlog/docs/0.4.0-incubating/user_guide/architecture/main.html#id3]

!https://bookkeeper.apache.org/distributedlog/docs/0.4.0-incubating/images/datamodel.png!

In the DistributedLog data model we have "log segments": a log stream is a 
sequence of log segments, and the log stream belongs to a namespace.

For Ratis, we'd be looking at a "log segment" being one raft ring/quorum. The 
Ratis LogService would give HBase the LogStream API (abstracting away the 
"physical" data on disk) – one region would have one LogStream. All of the 
operations that HBase would want to do would be at the LogStream level, never 
the log-segment level.

I believe Enis was suggesting that we use rocksdb to manage the log-segments on 
a given RS (e.g. knowing how to construct readers/writers, how to truncate 
data), and then a metadata-level raft ring/quorum for knowing what logstreams 
exist on other nodes.
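The model sketched above (namespace → log streams → log segments, one raft ring per segment) can be written down roughly as follows. All names are illustrative stand-ins, not the Ratis LogService API:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the DistributedLog-style model described above; all names are
// illustrative, not the actual Ratis LogService API.
public class LogStreamModel {

    /** One log segment corresponds to one raft ring/quorum. */
    static class LogSegment {
        final long firstEntryId;
        final String raftGroupId;
        LogSegment(long firstEntryId, String raftGroupId) {
            this.firstEntryId = firstEntryId;
            this.raftGroupId = raftGroupId;
        }
    }

    /** A log stream (one per region) is an ordered list of segments; HBase
     *  would only operate at this level, never on individual segments. */
    static class LogStream {
        final List<LogSegment> segments = new ArrayList<>();
        void roll(String raftGroupId, long firstEntryId) {
            segments.add(new LogSegment(firstEntryId, raftGroupId));
        }
    }

    /** A namespace maps stream names (e.g. region names) to log streams. */
    static class Namespace {
        final Map<String, LogStream> streams = new HashMap<>();
        LogStream getOrCreate(String name) {
            return streams.computeIfAbsent(name, n -> new LogStream());
        }
    }

    public static void main(String[] args) {
        Namespace ns = new Namespace();
        ns.getOrCreate("region-e1d6b5e8").roll("raft-ring-A", 0);
        ns.getOrCreate("region-e1d6b5e8").roll("raft-ring-B", 1000);
        System.out.println("segments: "
            + ns.getOrCreate("region-e1d6b5e8").segments.size());
    }
}
```

The per-RS segment bookkeeping (readers/writers, truncation) would sit behind `LogStream`, which matches the suggestion of a local store such as rocksdb for segment metadata plus a metadata-level raft ring for cross-node stream discovery.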

> LogStream Metadata Tracking
> ---
>
> Key: HBASE-20962
> URL: https://issues.apache.org/jira/browse/HBASE-20962
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Josh Elser
>Priority: Major
>
> An open question is about how HBase would track these LogService-backed WALs.
> Presently, HBase uses server-names and a well-known directory in HDFS to know 
> what WALs exist. Since we are not relying on HDFS (or a distributed 
> filesystem), we need to come up with something else.
> [~sergey soldatov] made a good suggestion today, which was that we could 
> implement another Ratis StateMachine specifically designed to manage the 
> state of LogStreams "in HBase". This information should be 
> relatively "small" (WRT the amount of data in each LogStream), so we can 
> avoid the kinds of problems described in HBASE-20961 around re-introducing a 
> failed peer to the quorum. This is the best idea I've heard so far on the 
> matter.
> The other obvious candidate would be ZooKeeper but this is probably a 
> non-starter as it would be persistent data (which is an HBase anti-pattern).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-20963) Benchmark LogService WALs

2018-07-26 Thread Josh Elser (JIRA)
Josh Elser created HBASE-20963:
--

 Summary: Benchmark LogService WALs
 Key: HBASE-20963
 URL: https://issues.apache.org/jira/browse/HBASE-20963
 Project: HBase
  Issue Type: Sub-task
Reporter: Josh Elser


We should have a general understanding of the impact on the write and recovery 
paths of using the Ratis LogService as opposed to FSHLog or AsyncFSWal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-18477) Umbrella JIRA for HBase Read Replica clusters

2018-07-26 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558845#comment-16558845
 ] 

Hudson commented on HBASE-18477:


Results for branch HBASE-18477
[build #276 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-18477/276/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-18477/276//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-18477/276//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-18477/276//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(x) {color:red}-1 client integration test{color}
--Failed when running client tests on top of Hadoop 2. [see log for 
details|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-18477/276//artifact/output-integration/hadoop-2.log].
 (note that this means we didn't run on Hadoop 3)


> Umbrella JIRA for HBase Read Replica clusters
> -
>
> Key: HBASE-18477
> URL: https://issues.apache.org/jira/browse/HBASE-18477
> Project: HBase
>  Issue Type: New Feature
>Reporter: Zach York
>Assignee: Zach York
>Priority: Major
> Attachments: HBase Read-Replica Clusters Scope doc.docx, HBase 
> Read-Replica Clusters Scope doc.pdf, HBase Read-Replica Clusters Scope 
> doc_v2.docx, HBase Read-Replica Clusters Scope doc_v2.pdf
>
>
> Recently, changes (such as HBASE-17437) have unblocked HBase to run with a 
> root directory external to the cluster (such as in Amazon S3). This means 
> that the data is stored outside of the cluster and can be accessible after 
> the cluster has been terminated. One use case that is often asked about is 
> pointing multiple clusters to one root directory (sharing the data) to have 
> read resiliency in the case of a cluster failure.
>  
> This JIRA is an umbrella JIRA to contain all the tasks necessary to create a 
> read-replica HBase cluster that is pointed at the same root directory.
>  
> This requires making the Read-Replica cluster Read-Only (no metadata 
> operation or data operations).
> Separating the hbase:meta table for each cluster (Otherwise HBase gets 
> confused with multiple clusters trying to update the meta table with their ip 
> addresses)
> Adding refresh functionality for the meta table to ensure new metadata is 
> picked up on the read replica cluster.
> Adding refresh functionality for HFiles for a given table to ensure new data 
> is picked up on the read replica cluster.
>  
> This can be used with any existing cluster that is backed by an external 
> filesystem.
>  
> Please note that this feature is still quite manual (with the potential for 
> automation later).
>  
> More information on this particular feature can be found here: 
> https://aws.amazon.com/blogs/big-data/setting-up-read-replica-clusters-with-hbase-on-amazon-s3/





[jira] [Updated] (HBASE-20927) RSGroupAdminEndpoint doesn't handle clearing dead servers if they are not processed yet.

2018-07-26 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-20927:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: (was: 2.1.1)
   2.2.0
   Status: Resolved  (was: Patch Available)

Thanks for the patch, Sergey

> RSGroupAdminEndpoint doesn't handle clearing dead servers if they are not 
> processed yet.
> 
>
> Key: HBASE-20927
> URL: https://issues.apache.org/jira/browse/HBASE-20927
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-20927-master.patch, HBASE-20927.master.002.patch
>
>
> Admin.clearDeadServers is supposed to return the list of servers that were 
> not cleared. But if RSGroupAdminEndpoint is set, the ConstraintException is 
> thrown:
> {noformat}
> Caused by: 
> org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.constraint.ConstraintException):
>  org.apache.hadoop.hbase.constraint.ConstraintException: The set of servers 
> to remove cannot be null or empty.
>   at 
> org.apache.hadoop.hbase.rsgroup.RSGroupAdminServer.removeServers(RSGroupAdminServer.java:573)
>   at 
> org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint.postClearDeadServers(RSGroupAdminEndpoint.java:519)
>   at 
> org.apache.hadoop.hbase.master.MasterCoprocessorHost$133.call(MasterCoprocessorHost.java:1607)
>   at 
> org.apache.hadoop.hbase.master.MasterCoprocessorHost$133.call(MasterCoprocessorHost.java:1604)
>   at 
> org.apache.hadoop.hbase.coprocessor.CoprocessorHost$ObserverOperationWithoutResult.callObserver(CoprocessorHost.java:540)
>   at 
> org.apache.hadoop.hbase.coprocessor.CoprocessorHost.execOperation(CoprocessorHost.java:614)
>   at 
> org.apache.hadoop.hbase.master.MasterCoprocessorHost.postClearDeadServers(MasterCoprocessorHost.java:1604)
>   at 
> org.apache.hadoop.hbase.master.MasterRpcServices.clearDeadServers(MasterRpcServices.java:2231)
>   at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> {noformat}
> That happens because in postClearDeadServers it calls 
> groupAdminServer.removeServers(clearedServer) even when the clearedServer 
> set is empty.
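The fix described above amounts to guarding the delegation. The following is a minimal, self-contained sketch of that guard; the names (postClearDeadServers, removeServers) mirror the stack trace, but the types are simplified stand-ins rather than the real HBase classes.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: skip removeServers() when nothing was cleared,
// instead of letting it throw on an empty set.
public class ClearDeadServersGuard {

    // Stand-in for RSGroupAdminServer.removeServers(), which rejects an
    // empty set (the ConstraintException seen in the stack trace above).
    static void removeServers(Set<String> servers) {
        if (servers == null || servers.isEmpty()) {
            throw new IllegalArgumentException(
                "The set of servers to remove cannot be null or empty.");
        }
        // ... remove the servers from their RSGroup ...
    }

    // Guarded hook: only delegate when there is something to clear.
    // Returns true when removeServers() was actually invoked.
    public static boolean postClearDeadServers(Set<String> clearedServers) {
        if (clearedServers == null || clearedServers.isEmpty()) {
            return false;  // nothing cleared yet; do not call removeServers()
        }
        removeServers(clearedServers);
        return true;
    }

    public static void main(String[] args) {
        // An empty set no longer triggers the exception.
        System.out.println(postClearDeadServers(new HashSet<>()));
        Set<String> one = new HashSet<>();
        one.add("rs1.example.com,16020,1532600000000");
        System.out.println(postClearDeadServers(one));
    }
}
```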





[jira] [Commented] (HBASE-20894) Move BucketCache from java serialization to protobuf

2018-07-26 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558834#comment-16558834
 ] 

Hadoop QA commented on HBASE-20894:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
31s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange}  
0m  0s{color} | {color:orange} The patch doesn't appear to include any new or 
modified tests. Please justify why no new tests are needed for this patch. Also 
please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
27s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
 2s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
42s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
19s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
31s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
14s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
40s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
15s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  2m 
40s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  1m 43s{color} 
| {color:red} hbase-server generated 1 new + 187 unchanged - 1 fixed = 188 
total (was 188) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m 
10s{color} | {color:red} hbase-server: The patch generated 66 new + 30 
unchanged - 8 fixed = 96 total (was 38) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
32s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
10m  1s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green}  
1m 10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
31s{color} | {color:green} hbase-protocol-shaded in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}115m 
15s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
41s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}168m 26s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-20894 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12933101/HBASE-20894.master.002.patch
 |
| Optional Tests |  asflicense  cc  unit  

[jira] [Created] (HBASE-20962) LogStream Metadata Tracking

2018-07-26 Thread Josh Elser (JIRA)
Josh Elser created HBASE-20962:
--

 Summary: LogStream Metadata Tracking
 Key: HBASE-20962
 URL: https://issues.apache.org/jira/browse/HBASE-20962
 Project: HBase
  Issue Type: Sub-task
Reporter: Josh Elser


An open question is about how HBase would track these LogService-backed WALs.

Presently, HBase uses server-names and a well-known directory in HDFS to know 
what WALs exist. Since we are not relying on HDFS (or a distributed 
filesystem), we need to come up with something else.

[~sergey soldatov] made a good suggestion today: we could implement another 
Ratis StateMachine specifically designed to manage the state of LogStreams 
"in HBase". This information should be 
relatively "small" (WRT the amount of data in each LogStream), so we can avoid 
the kinds of problems described in HBASE-20961 around re-introducing a failed 
peer to the quorum. This is the best idea I've heard so far on the matter.

The other obvious candidate would be ZooKeeper, but this is probably a 
non-starter as it would mean storing persistent data in ZooKeeper (an HBase 
anti-pattern).
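As a rough illustration of the suggestion above: the state machine only ever applies small metadata records (create/close/delete), which is what keeps re-seeding a failed peer cheap. The sketch below models the "apply" step directly with hypothetical names; real code would implement Ratis's StateMachine interface instead.

```java
import java.util.HashMap;
import java.util.Map;

// Toy sketch of a replicated state machine that tracks which LogStreams
// exist. Each applied log entry is a tiny metadata record such as
// "CREATE:wal-1", so the replicated state stays small regardless of how
// much data the LogStreams themselves hold.
public class LogStreamRegistry {

    public enum State { OPEN, CLOSED }

    private final Map<String, State> streams = new HashMap<>();

    // Apply one replicated metadata record to the registry.
    public void apply(String entry) {
        String[] parts = entry.split(":", 2);
        switch (parts[0]) {
            case "CREATE": streams.put(parts[1], State.OPEN); break;
            case "CLOSE":  streams.put(parts[1], State.CLOSED); break;
            case "DELETE": streams.remove(parts[1]); break;
            default: throw new IllegalArgumentException("unknown op: " + entry);
        }
    }

    public State lookup(String name) { return streams.get(name); }

    public static void main(String[] args) {
        LogStreamRegistry registry = new LogStreamRegistry();
        registry.apply("CREATE:wal-regionA");
        registry.apply("CLOSE:wal-regionA");
        if (registry.lookup("wal-regionA") != State.CLOSED) {
            throw new AssertionError("expected CLOSED");
        }
        registry.apply("DELETE:wal-regionA");
        if (registry.lookup("wal-regionA") != null) {
            throw new AssertionError("expected removed");
        }
    }
}
```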





[jira] [Created] (HBASE-20961) Recover RAFT quorum membership loss

2018-07-26 Thread Josh Elser (JIRA)
Josh Elser created HBASE-20961:
--

 Summary: Recover RAFT quorum membership loss
 Key: HBASE-20961
 URL: https://issues.apache.org/jira/browse/HBASE-20961
 Project: HBase
  Issue Type: Sub-task
Reporter: Josh Elser


Servers participating in the LogService's quorum may die unexpectedly.

While RAFT/Ratis would be capable of recovering from this scenario, we likely 
do not want to do this because of the associated cost of shipping the edits for 
a LogStream to a new peer.

Instead, the simple solution would be for a RegionServer to create a new 
LogStream. This is analogous to us rolling an (hdfs file-backed) WAL when we 
have errors writing/syncing it.





[jira] [Commented] (HBASE-19369) HBase Should use Builder Pattern to Create Log Files while using WAL on Erasure Coding

2018-07-26 Thread Mike Drob (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-19369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558815#comment-16558815
 ] 

Mike Drob commented on HBASE-19369:
---

bq. Anyway I see branch-2.9 has already been cut and EC is not included... 
Let's file an issue to revisit the hbase-hadoop-compact module?

[~Apache9] - I checked with some HDFS folks and they said that EC is not going 
back to 2.x, so the hadoop-three-compat module could be a little easier that 
way. Do you think it's a prerequisite for this, or something that can be done 
later if we see more use for it?

> HBase Should use Builder Pattern to Create Log Files while using WAL on 
> Erasure Coding
> --
>
> Key: HBASE-19369
> URL: https://issues.apache.org/jira/browse/HBASE-19369
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Alex Leblang
>Assignee: Alex Leblang
>Priority: Major
> Attachments: HBASE-19369.master.001.patch, 
> HBASE-19369.master.002.patch, HBASE-19369.master.003.patch, 
> HBASE-19369.master.004.patch, HBASE-19369.v5.patch, HBASE-19369.v6.patch, 
> HBASE-19369.v7.patch, HBASE-19369.v8.patch
>
>
> Right now an HBase instance using the WAL won't function properly in an 
> Erasure Coded environment. We should change the following line to use the 
> hdfs.DistributedFileSystem builder pattern 
> https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/ProtobufLogWriter.java#L92
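For context on what the builder-pattern change looks like: Hadoop's FileSystem#createFile(Path) returns a stream builder, and DistributedFileSystem's builder can request plain replication even under an erasure-coded directory (which is what a WAL needs). The sketch below demonstrates the pattern with a hypothetical WriterOptions type so it stands alone without Hadoop on the classpath; it is an illustration, not the actual patch.

```java
// Self-contained illustration of the builder pattern the issue asks for,
// using a made-up WriterOptions type in place of Hadoop's
// FSDataOutputStreamBuilder.
public class WalWriterBuilderDemo {

    public static final class WriterOptions {
        public final String path;
        public final boolean overwrite;
        public final boolean replicate;   // force replication instead of EC

        private WriterOptions(Builder b) {
            this.path = b.path;
            this.overwrite = b.overwrite;
            this.replicate = b.replicate;
        }

        public static final class Builder {
            private final String path;
            private boolean overwrite = false;
            private boolean replicate = false;

            public Builder(String path) { this.path = path; }
            public Builder overwrite() { this.overwrite = true; return this; }
            public Builder replicate() { this.replicate = true; return this; }
            public WriterOptions build() { return new WriterOptions(this); }
        }
    }

    public static void main(String[] args) {
        // cf. fs.createFile(path)...replicate().build() on DistributedFileSystem
        WriterOptions opts = new WriterOptions.Builder("/hbase/WALs/wal.0")
            .replicate()
            .build();
        System.out.println(opts.replicate);   // replication requested
        System.out.println(opts.overwrite);   // default left unset
    }
}
```

The advantage over a positional create() call is that new options (like requesting replication inside an EC directory) can be added without breaking every caller.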





[jira] [Created] (HBASE-20960) Expose Ratis LogService metrics

2018-07-26 Thread Josh Elser (JIRA)
Josh Elser created HBASE-20960:
--

 Summary: Expose Ratis LogService metrics
 Key: HBASE-20960
 URL: https://issues.apache.org/jira/browse/HBASE-20960
 Project: HBase
  Issue Type: Sub-task
Reporter: Josh Elser


RATIS-278 is filed to create metrics for the LogStream itself.

HBase will want to consume (and perhaps transform) the LogStream metrics and 
expose them at the HBase level for inspection. We need to figure out what the 
LogService will show us first, and then decide what makes sense to export as 
part of the rest of our HBase metrics.





[jira] [Commented] (HBASE-20952) Re-visit the WAL API

2018-07-26 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558809#comment-16558809
 ] 

Josh Elser commented on HBASE-20952:


{quote}and fall out for the replication system
{quote}
Thanks Sean. Forgot to mention this (and others) in the description.

> Re-visit the WAL API
> 
>
> Key: HBASE-20952
> URL: https://issues.apache.org/jira/browse/HBASE-20952
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Reporter: Josh Elser
>Priority: Major
>
> Take a step back from the current WAL implementations and think about what an 
> HBase WAL API should look like. What are the primitive calls that we require 
> to guarantee durability of writes with a high degree of performance?
> The API needs to take the current implementations into consideration. We 
> should also have a mind for what is happening in the Ratis LogService (but 
> the LogService should not dictate what HBase's WAL API looks like RATIS-272).
> Other "systems" inside of HBase that use WALs are replication and 
> backup. Replication has the use-case for "tail"ing the WAL, which we 
> should provide via our new API. B doesn't do anything fancy (IIRC). We 
> should make sure all consumers are generally going to be OK with the API we 
> create.
> The API may be "OK" (or OK in a part). We need to also consider other methods 
> which were "bolted" on such as {{AbstractFSWAL}} and 
> {{WALFileLengthProvider}}. Other corners of "WAL use" (like the 
> {{WALSplitter}}) should also be looked at to use WAL-APIs only.
> We also need to make sure that adequate interface audience and stability 
> annotations are chosen.
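The primitives asked about above (durable append, sync, and a tail-capable reader for replication) can be sketched as a minimal interface. All names here are illustrative only, with a toy in-memory implementation, not a proposal of the actual HBase API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical minimal WAL API: append an edit, sync for durability, and
// read/tail edits back (the replication use-case).
public class MinimalWal {

    public interface WalWriter {
        long append(byte[] edit);   // returns the sequence id of the edit
        void sync(long seqId);      // blocks until seqId is durable
    }

    public interface WalReader {
        List<byte[]> readFrom(long seqId);  // tail the log from seqId onward
    }

    // Toy in-memory implementation; a real one would be backed by HDFS or a
    // Ratis LogStream.
    public static class InMemoryWal implements WalWriter, WalReader {
        private final List<byte[]> edits = new ArrayList<>();

        @Override public synchronized long append(byte[] edit) {
            edits.add(edit);
            return edits.size() - 1;
        }

        @Override public synchronized void sync(long seqId) {
            // In-memory: already "durable". A real impl would hflush/hsync.
        }

        @Override public synchronized List<byte[]> readFrom(long seqId) {
            return Collections.unmodifiableList(
                new ArrayList<>(edits.subList((int) seqId, edits.size())));
        }
    }

    public static void main(String[] args) {
        InMemoryWal wal = new InMemoryWal();
        long id0 = wal.append("put:row1".getBytes());
        long id1 = wal.append("put:row2".getBytes());
        wal.sync(id1);
        System.out.println(wal.readFrom(id0).size());  // full replay
        System.out.println(wal.readFrom(id1).size());  // tail from last edit
    }
}
```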





[jira] [Updated] (HBASE-20952) Re-visit the WAL API

2018-07-26 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HBASE-20952:
---
Description: 
Take a step back from the current WAL implementations and think about what an 
HBase WAL API should look like. What are the primitive calls that we require to 
guarantee durability of writes with a high degree of performance?

The API needs to take the current implementations into consideration. We should 
also have a mind for what is happening in the Ratis LogService (but the 
LogService should not dictate what HBase's WAL API looks like RATIS-272).

Other "systems" inside of HBase that use WALs are replication and 
backup. Replication has the use-case for "tail"ing the WAL, which we 
should provide via our new API. B doesn't do anything fancy (IIRC). We should 
make sure all consumers are generally going to be OK with the API we create.

The API may be "OK" (or OK in a part). We need to also consider other methods 
which were "bolted" on such as {{AbstractFSWAL}} and {{WALFileLengthProvider}}. 
Other corners of "WAL use" (like the {{WALSplitter}}) should also be looked at 
to use WAL-APIs only.

We also need to make sure that adequate interface audience and stability 
annotations are chosen.

  was:
Take a step back from the current WAL implementations and think about what an 
HBase WAL API should look like. What are the primitive calls that we require to 
guarantee durability of writes with a high degree of performance?

The API needs to take the current implementations into consideration. We should 
also have a mind for what is happening in the Ratis LogService (but the 
LogService should not dictate what HBase's WAL API looks like RATIS-272).

The API may be "OK" (or OK in a part). We need to also consider other methods 
which were "bolted" on such as {{AbstractFSWAL}} and {{WALFileLengthProvider}}. 
Other corners of "WAL use" (like the {{WALSplitter}}) should also be looked at 
to use WAL-APIs only.

We also need to make sure that adequate interface audience and stability 
annotations are chosen.


> Re-visit the WAL API
> 
>
> Key: HBASE-20952
> URL: https://issues.apache.org/jira/browse/HBASE-20952
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Reporter: Josh Elser
>Priority: Major
>
> Take a step back from the current WAL implementations and think about what an 
> HBase WAL API should look like. What are the primitive calls that we require 
> to guarantee durability of writes with a high degree of performance?
> The API needs to take the current implementations into consideration. We 
> should also have a mind for what is happening in the Ratis LogService (but 
> the LogService should not dictate what HBase's WAL API looks like RATIS-272).
> Other "systems" inside of HBase that use WALs are replication and 
> backup. Replication has the use-case for "tail"ing the WAL, which we 
> should provide via our new API. B doesn't do anything fancy (IIRC). We 
> should make sure all consumers are generally going to be OK with the API we 
> create.
> The API may be "OK" (or OK in a part). We need to also consider other methods 
> which were "bolted" on such as {{AbstractFSWAL}} and 
> {{WALFileLengthProvider}}. Other corners of "WAL use" (like the 
> {{WALSplitter}}) should also be looked at to use WAL-APIs only.
> We also need to make sure that adequate interface audience and stability 
> annotations are chosen.





[jira] [Commented] (HBASE-20952) Re-visit the WAL API

2018-07-26 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558805#comment-16558805
 ] 

Josh Elser commented on HBASE-20952:


{quote}I definitely have some thoughts on this. I'll try to summarize and put 
it here, but in general making the interface as basic as possible would be the 
easiest to work with IMO.
{quote}
+1
{quote}Old request is reexamination of the WALEdit/WALKey entities because they 
are fat objects that duplicate attributes. Would be sweet if these got a review 
as part of this work (maybe it's out of scope).
{quote}
Can try to lob this in, too.
{quote}There is too much here as it is (needs digging).
{quote}
Yeah, this is my biggest worry. We have a very basic {{WAL.Reader}} interface 
now, but I worry about the implementation details pushed into AbstractFSWAL.
{quote}Will multiwal be supported?
{quote}
My hope would be that we can make WAL-per-Region work as that will simplify 
recovery code greatly (and improve MTTR, as you stated earlier). If that's the case, I 
wouldn't expect multiwal to give much benefit here.

We'll have to visit multiwal anyways (to update for API), but, right now, my 
gut is telling me that it wouldn't have much relevance for the Ratis LogService 
wal.

> Re-visit the WAL API
> 
>
> Key: HBASE-20952
> URL: https://issues.apache.org/jira/browse/HBASE-20952
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Reporter: Josh Elser
>Priority: Major
>
> Take a step back from the current WAL implementations and think about what an 
> HBase WAL API should look like. What are the primitive calls that we require 
> to guarantee durability of writes with a high degree of performance?
> The API needs to take the current implementations into consideration. We 
> should also have a mind for what is happening in the Ratis LogService (but 
> the LogService should not dictate what HBase's WAL API looks like RATIS-272).
> The API may be "OK" (or OK in a part). We need to also consider other methods 
> which were "bolted" on such as {{AbstractFSWAL}} and 
> {{WALFileLengthProvider}}. Other corners of "WAL use" (like the 
> {{WALSplitter}}) should also be looked at to use WAL-APIs only.
> We also need to make sure that adequate interface audience and stability 
> annotations are chosen.





[jira] [Commented] (HBASE-20782) Fix duplication of TestServletFilter.access

2018-07-26 Thread Jan Hentschel (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558804#comment-16558804
 ] 

Jan Hentschel commented on HBASE-20782:
---

Re-added patch .003 to kick off QA again.

> Fix duplication of TestServletFilter.access
> ---
>
> Key: HBASE-20782
> URL: https://issues.apache.org/jira/browse/HBASE-20782
> Project: HBase
>  Issue Type: Improvement
>Reporter: Jan Hentschel
>Assignee: Xu Cang
>Priority: Minor
> Attachments: HBASE-20782.master.001.patch, 
> HBASE-20782.master.002.patch, HBASE-20782.master.003.patch, 
> HBASE-20782.master.003.patch, HBASE-20782.master.003.patch
>
>
> The {{access}} method in {{TestServletFilter}} is duplicated in 
> {{TestPathFilter}}. The method should be moved into a common place.





[jira] [Updated] (HBASE-20782) Fix duplication of TestServletFilter.access

2018-07-26 Thread Jan Hentschel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Hentschel updated HBASE-20782:
--
Attachment: HBASE-20782.master.003.patch

> Fix duplication of TestServletFilter.access
> ---
>
> Key: HBASE-20782
> URL: https://issues.apache.org/jira/browse/HBASE-20782
> Project: HBase
>  Issue Type: Improvement
>Reporter: Jan Hentschel
>Assignee: Xu Cang
>Priority: Minor
> Attachments: HBASE-20782.master.001.patch, 
> HBASE-20782.master.002.patch, HBASE-20782.master.003.patch, 
> HBASE-20782.master.003.patch, HBASE-20782.master.003.patch
>
>
> The {{access}} method in {{TestServletFilter}} is duplicated in 
> {{TestPathFilter}}. The method should be moved into a common place.





[jira] [Created] (HBASE-20959) Port backup and restore to new WAL API

2018-07-26 Thread Josh Elser (JIRA)
Josh Elser created HBASE-20959:
--

 Summary: Port backup and restore to new WAL API
 Key: HBASE-20959
 URL: https://issues.apache.org/jira/browse/HBASE-20959
 Project: HBase
  Issue Type: Sub-task
Reporter: Josh Elser


B uses WALs for incremental backups. Need to switch it over to using the new 
API





[jira] [Updated] (HBASE-20954) Support reading from a LogService WAL

2018-07-26 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HBASE-20954:
---
Description: Recovery needs to read from a LogStream and show that it can 
replay all edits since the last flush to make sure that no data is lost. 
Replication also needs to be able to read the LogStream.  (was: Recovery needs 
to read from a LogStream and show that it can replay all edits since the last 
flush to make sure that no data is lost.)
Summary: Support reading from a LogService WAL  (was: Support recovery 
to read from a LogService WAL)

> Support reading from a LogService WAL
> -
>
> Key: HBASE-20954
> URL: https://issues.apache.org/jira/browse/HBASE-20954
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Josh Elser
>Priority: Major
>
> Recovery needs to read from a LogStream and show that it can replay all edits 
> since the last flush to make sure that no data is lost. Replication also 
> needs to be able to read the LogStream.





[jira] [Created] (HBASE-20958) Port WAL recovery code to use the new WAL API

2018-07-26 Thread Josh Elser (JIRA)
Josh Elser created HBASE-20958:
--

 Summary: Port WAL recovery code to use the new WAL API
 Key: HBASE-20958
 URL: https://issues.apache.org/jira/browse/HBASE-20958
 Project: HBase
  Issue Type: Sub-task
Reporter: Josh Elser


The recovery code needs to be reworked to use the new WAL API.

Might be tricky because we have the WALSplitter now which is not something that 
is "constant".

One thought was that we could support per-Region-WALs instead of per-RS-WALs 
with the Ratis-LogService WALs. We would need to figure out how to wrangle this 
inside of an implementation.
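The per-Region-WAL idea above can be sketched in a few lines: each region writes to its own log stream, so recovery replays one stream per region instead of splitting a shared per-RS file. All names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch: one log stream per region instead of one shared
// per-RegionServer WAL.
public class PerRegionWals {
    private final Map<String, List<String>> logs = new HashMap<>();

    // Append goes to the region's own stream (created on first use).
    public void append(String regionName, String edit) {
        logs.computeIfAbsent(regionName, r -> new ArrayList<>()).add(edit);
    }

    // Recovery for one region touches only that region's stream: the
    // WALSplitter-style demultiplexing step disappears entirely.
    public List<String> recover(String regionName) {
        return logs.getOrDefault(regionName, new ArrayList<>());
    }

    public static void main(String[] args) {
        PerRegionWals wals = new PerRegionWals();
        wals.append("regionA", "put:r1");
        wals.append("regionB", "put:r2");
        wals.append("regionA", "put:r3");
        System.out.println(wals.recover("regionA").size());  // regionA only
        System.out.println(wals.recover("regionB").size());
    }
}
```

The trade-off (and the reason FSHLog multiplexes today) is that many independent streams cost more sync calls on HDFS; a Ratis LogStream per region would not necessarily have that problem.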





[jira] [Created] (HBASE-20957) Switch replication over to use the new WAL API

2018-07-26 Thread Josh Elser (JIRA)
Josh Elser created HBASE-20957:
--

 Summary: Switch replication over to use the new WAL API
 Key: HBASE-20957
 URL: https://issues.apache.org/jira/browse/HBASE-20957
 Project: HBase
  Issue Type: Sub-task
Reporter: Josh Elser


Replication needs to use the new WAL APIs





[jira] [Created] (HBASE-20956) Port AsyncFSWAL over to the new WAL API

2018-07-26 Thread Josh Elser (JIRA)
Josh Elser created HBASE-20956:
--

 Summary: Port AsyncFSWAL over to the new WAL API
 Key: HBASE-20956
 URL: https://issues.apache.org/jira/browse/HBASE-20956
 Project: HBase
  Issue Type: Sub-task
Reporter: Josh Elser


Need to get AsyncFSWAL over to use the new API we're making





[jira] [Created] (HBASE-20955) Port FSHLog to new WAL API

2018-07-26 Thread Josh Elser (JIRA)
Josh Elser created HBASE-20955:
--

 Summary: Port FSHLog to new WAL API
 Key: HBASE-20955
 URL: https://issues.apache.org/jira/browse/HBASE-20955
 Project: HBase
  Issue Type: Sub-task
Reporter: Josh Elser


Convert the old FSHLog implementation over to using the new API





[jira] [Commented] (HBASE-19369) HBase Should use Builder Pattern to Create Log Files while using WAL on Erasure Coding

2018-07-26 Thread Mike Drob (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-19369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558736#comment-16558736
 ] 

Mike Drob commented on HBASE-19369:
---

[~awleblang] - do you mind if I take this over?

> HBase Should use Builder Pattern to Create Log Files while using WAL on 
> Erasure Coding
> --
>
> Key: HBASE-19369
> URL: https://issues.apache.org/jira/browse/HBASE-19369
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Alex Leblang
>Assignee: Alex Leblang
>Priority: Major
> Attachments: HBASE-19369.master.001.patch, 
> HBASE-19369.master.002.patch, HBASE-19369.master.003.patch, 
> HBASE-19369.master.004.patch, HBASE-19369.v5.patch, HBASE-19369.v6.patch, 
> HBASE-19369.v7.patch, HBASE-19369.v8.patch
>
>
> Right now an HBase instance using the WAL won't function properly in an 
> Erasure Coded environment. We should change the following line to use the 
> hdfs.DistributedFileSystem builder pattern 
> https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/ProtobufLogWriter.java#L92





[jira] [Commented] (HBASE-20598) Upgrade to JRuby 9.2

2018-07-26 Thread Jack Bearden (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558715#comment-16558715
 ] 

Jack Bearden commented on HBASE-20598:
--

Just a quick update on this. I have been investigating the errors in the test 
class: TestHBaseFsckCleanReplicationBarriers. These tests pass on my mac and my 
docker image. I will try some full test runs over the next few days to see if I 
can reproduce.

> Upgrade to JRuby 9.2
> 
>
> Key: HBASE-20598
> URL: https://issues.apache.org/jira/browse/HBASE-20598
> Project: HBase
>  Issue Type: Bug
>  Components: dependencies, shell
>Reporter: Josh Elser
>Assignee: Jack Bearden
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-20598.001.patch, HBASE-20598.002.patch
>
>
> [~mdrob] pointed out that there's a JRuby 9.2 release. We should see if we 
> can get ourselves onto that from our current 9.1 release line.





[jira] [Commented] (HBASE-20952) Re-visit the WAL API

2018-07-26 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558671#comment-16558671
 ] 

stack commented on HBASE-20952:
---

Old request is reexamination of the WALEdit/WALKey entities because they are 
fat objects that duplicate attributes. Would be sweet if these got a review as 
part of this work (maybe it's out of scope).

Also, let's aim for low friction (a 'soft' target, I know). There is too much 
here as it is (needs digging). Recently I tried multiwal with the new asyncfs, 
expecting two WALs to get close to 2x the throughput, or at least > 1.5x, but 
no, it's more like 1.1x the throughput.

Will multiwal be supported?

> Re-visit the WAL API
> 
>
> Key: HBASE-20952
> URL: https://issues.apache.org/jira/browse/HBASE-20952
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Reporter: Josh Elser
>Priority: Major
>
> Take a step back from the current WAL implementations and think about what an 
> HBase WAL API should look like. What are the primitive calls that we require 
> to guarantee durability of writes with a high degree of performance?
> The API needs to take the current implementations into consideration. We 
> should also have a mind for what is happening in the Ratis LogService (but 
> the LogService should not dictate what HBase's WAL API looks like RATIS-272).
> The API may be "OK" (or OK in a part). We need to also consider other methods 
> which were "bolted" on such as {{AbstractFSWAL}} and 
> {{WALFileLengthProvider}}. Other corners of "WAL use" (like the 
> {{WALSplitter}} should also be looked at to use WAL-APIs only).
> We also need to make sure that adequate interface audience and stability 
> annotations are chosen.





[jira] [Commented] (HBASE-20952) Re-visit the WAL API

2018-07-26 Thread Zach York (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558661#comment-16558661
 ] 

Zach York commented on HBASE-20952:
---

I definitely have some thoughts on this. I'll try to summarize and put it here, 
but in general making the interface as basic as possible would be the easiest 
to work with IMO.

> Re-visit the WAL API
> 
>
> Key: HBASE-20952
> URL: https://issues.apache.org/jira/browse/HBASE-20952
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Reporter: Josh Elser
>Priority: Major
>
> Take a step back from the current WAL implementations and think about what an 
> HBase WAL API should look like. What are the primitive calls that we require 
> to guarantee durability of writes with a high degree of performance?
> The API needs to take the current implementations into consideration. We 
> should also have a mind for what is happening in the Ratis LogService (but 
> the LogService should not dictate what HBase's WAL API looks like RATIS-272).
> The API may be "OK" (or OK in a part). We need to also consider other methods 
> which were "bolted" on such as {{AbstractFSWAL}} and 
> {{WALFileLengthProvider}}. Other corners of "WAL use" (like the 
> {{WALSplitter}} should also be looked at to use WAL-APIs only).
> We also need to make sure that adequate interface audience and stability 
> annotations are chosen.





[jira] [Updated] (HBASE-20952) Re-visit the WAL API

2018-07-26 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-20952:

Component/s: wal

> Re-visit the WAL API
> 
>
> Key: HBASE-20952
> URL: https://issues.apache.org/jira/browse/HBASE-20952
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Reporter: Josh Elser
>Priority: Major
>
> Take a step back from the current WAL implementations and think about what an 
> HBase WAL API should look like. What are the primitive calls that we require 
> to guarantee durability of writes with a high degree of performance?
> The API needs to take the current implementations into consideration. We 
> should also have a mind for what is happening in the Ratis LogService (but 
> the LogService should not dictate what HBase's WAL API looks like RATIS-272).
> The API may be "OK" (or OK in a part). We need to also consider other methods 
> which were "bolted" on such as {{AbstractFSWAL}} and 
> {{WALFileLengthProvider}}. Other corners of "WAL use" (like the 
> {{WALSplitter}} should also be looked at to use WAL-APIs only).
> We also need to make sure that adequate interface audience and stability 
> annotations are chosen.





[jira] [Commented] (HBASE-20952) Re-visit the WAL API

2018-07-26 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558650#comment-16558650
 ] 

Sean Busbey commented on HBASE-20952:
-

and fall out for the replication system

> Re-visit the WAL API
> 
>
> Key: HBASE-20952
> URL: https://issues.apache.org/jira/browse/HBASE-20952
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Josh Elser
>Priority: Major
>
> Take a step back from the current WAL implementations and think about what an 
> HBase WAL API should look like. What are the primitive calls that we require 
> to guarantee durability of writes with a high degree of performance?
> The API needs to take the current implementations into consideration. We 
> should also have a mind for what is happening in the Ratis LogService (but 
> the LogService should not dictate what HBase's WAL API looks like RATIS-272).
> The API may be "OK" (or OK in a part). We need to also consider other methods 
> which were "bolted" on such as {{AbstractFSWAL}} and 
> {{WALFileLengthProvider}}. Other corners of "WAL use" (like the 
> {{WALSplitter}} should also be looked at to use WAL-APIs only).
> We also need to make sure that adequate interface audience and stability 
> annotations are chosen.





[jira] [Created] (HBASE-20954) Support recovery to read from a LogService WAL

2018-07-26 Thread Josh Elser (JIRA)
Josh Elser created HBASE-20954:
--

 Summary: Support recovery to read from a LogService WAL
 Key: HBASE-20954
 URL: https://issues.apache.org/jira/browse/HBASE-20954
 Project: HBase
  Issue Type: Sub-task
Reporter: Josh Elser


Recovery needs to read from a LogStream and show that it can replay all edits 
since the last flush to make sure that no data is lost.





[jira] [Created] (HBASE-20953) Write to Ratis LogService as a WAL

2018-07-26 Thread Josh Elser (JIRA)
Josh Elser created HBASE-20953:
--

 Summary: Write to Ratis LogService as a WAL
 Key: HBASE-20953
 URL: https://issues.apache.org/jira/browse/HBASE-20953
 Project: HBase
  Issue Type: Sub-task
Reporter: Josh Elser


Create an implementation of the WAL that can handle writes into HBase.

e.g. we write data into HBase and should see data flowing into a logstream.





[jira] [Created] (HBASE-20952) Re-visit the WAL API

2018-07-26 Thread Josh Elser (JIRA)
Josh Elser created HBASE-20952:
--

 Summary: Re-visit the WAL API
 Key: HBASE-20952
 URL: https://issues.apache.org/jira/browse/HBASE-20952
 Project: HBase
  Issue Type: Sub-task
Reporter: Josh Elser


Take a step back from the current WAL implementations and think about what an 
HBase WAL API should look like. What are the primitive calls that we require to 
guarantee durability of writes with a high degree of performance?

The API needs to take the current implementations into consideration. We should 
also have a mind for what is happening in the Ratis LogService (but the 
LogService should not dictate what HBase's WAL API looks like RATIS-272).

The API may be "OK" (or OK in part). We need to also consider other methods 
which were "bolted" on, such as {{AbstractFSWAL}} and {{WALFileLengthProvider}}. 
Other corners of "WAL use" (like the {{WALSplitter}}) should also be looked at 
to use WAL APIs only.

We also need to make sure that adequate interface audience and stability 
annotations are chosen.
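As a discussion aid, here is a hypothetical sketch of the kind of minimal, storage-agnostic primitives the issue asks about — the names (`WriteAheadLog`, `append`, `sync`, `replay`) are illustrative, not from any HBase patch. The essential contract is: append an edit, sync to make all prior appends durable, and replay from a position for recovery.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative minimal WAL contract. A real API would also need lifecycle
// (roll/close), listeners for replication, and length/position accounting.
interface WriteAheadLog {
  long append(byte[] edit);            // returns the sequence id of the edit
  void sync(long upToSeqId);           // blocks until edits <= seqId are durable
  List<byte[]> replay(long fromSeqId); // read back edits for recovery
}

// Trivial in-memory implementation, just to exercise the contract.
class InMemoryWal implements WriteAheadLog {
  private final List<byte[]> edits = new ArrayList<>();
  private long syncedSeqId = -1;

  public synchronized long append(byte[] edit) {
    edits.add(edit);
    return edits.size() - 1;
  }

  public synchronized void sync(long upToSeqId) {
    // A real implementation would flush to durable storage here.
    syncedSeqId = Math.max(syncedSeqId, upToSeqId);
  }

  public synchronized List<byte[]> replay(long fromSeqId) {
    return new ArrayList<>(edits.subList((int) fromSeqId, edits.size()));
  }
}
```

An HDFS-backed implementation and a Ratis LogService implementation would both sit behind the same three calls, which is what decoupling the API from HDFS buys.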





[jira] [Commented] (HBASE-19008) Add missing equals or hashCode method(s) to stock Filter implementations

2018-07-26 Thread Jan Hentschel (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-19008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558634#comment-16558634
 ] 

Jan Hentschel commented on HBASE-19008:
---

[~yuzhih...@gmail.com] Yes, we should definitely do this. Almost forgot about 
this one.

[~liubangchen] Do you want to take this one? Sorry, I'm not sure if I 
understand your question correctly.

> Add missing equals or hashCode method(s) to stock Filter implementations
> 
>
> Key: HBASE-19008
> URL: https://issues.apache.org/jira/browse/HBASE-19008
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Jan Hentschel
>Priority: Major
>  Labels: filter
>
> In HBASE-15410, [~mdrob] reminded me that Filter implementations may not 
> write {{equals}} or {{hashCode}} method(s).
> This issue is to add missing {{equals}} or {{hashCode}} method(s) to stock 
> Filter implementations such as KeyOnlyFilter.
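An illustrative sketch of what is being asked for (this is not HBase code; the class name is made up): a filter that carries configuration state should define `equals()`/`hashCode()` over that state, so two logically identical filters compare equal in tests and collections.

```java
// Hypothetical prefix-based filter, reduced to the equality concern only.
class PrefixFilterExample {
  private final byte[] prefix;

  PrefixFilterExample(byte[] prefix) {
    this.prefix = prefix.clone();
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof PrefixFilterExample)) return false;
    // Compare the configuration state, not object identity.
    return java.util.Arrays.equals(prefix, ((PrefixFilterExample) o).prefix);
  }

  @Override
  public int hashCode() {
    // Must be consistent with equals(): derive from the same state.
    return java.util.Arrays.hashCode(prefix);
  }
}
```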





[jira] [Created] (HBASE-20951) Ratis LogService backed WALs

2018-07-26 Thread Josh Elser (JIRA)
Josh Elser created HBASE-20951:
--

 Summary: Ratis LogService backed WALs
 Key: HBASE-20951
 URL: https://issues.apache.org/jira/browse/HBASE-20951
 Project: HBase
  Issue Type: New Feature
  Components: wal
Reporter: Josh Elser
Assignee: Josh Elser


Umbrella issue for the Ratis+WAL work:

Design doc: 
[https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit#|https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit]

The (over-simplified) goal is to re-think the current WAL APIs we have now, 
ensure that they are de-coupled from the notion of being backed by HDFS, swap 
the current implementations over to the new API, and then wire up the Ratis 
LogService to the new WAL API.





[jira] [Commented] (HBASE-20927) RSGroupAdminEndpoint doesn't handle clearing dead servers if they are not processed yet.

2018-07-26 Thread Sergey Soldatov (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558600#comment-16558600
 ] 

Sergey Soldatov commented on HBASE-20927:
-

[~elserj] Almost. After some modifications (removing dependencies on internal 
API from TestRSGroupBase), I noticed this. 

> RSGroupAdminEndpoint doesn't handle clearing dead servers if they are not 
> processed yet.
> 
>
> Key: HBASE-20927
> URL: https://issues.apache.org/jira/browse/HBASE-20927
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
> Fix For: 3.0.0, 2.1.1
>
> Attachments: HBASE-20927-master.patch, HBASE-20927.master.002.patch
>
>
> Admin.clearDeadServers is supposed to return the list of servers that were 
> not cleared. But if RSGroupAdminEndpoint is set, the ConstraintException is 
> thrown:
> {noformat}
> Caused by: 
> org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.constraint.ConstraintException):
>  org.apache.hadoop.hbase.constraint.ConstraintException: The set of servers 
> to remove cannot be null or empty.
>   at 
> org.apache.hadoop.hbase.rsgroup.RSGroupAdminServer.removeServers(RSGroupAdminServer.java:573)
>   at 
> org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint.postClearDeadServers(RSGroupAdminEndpoint.java:519)
>   at 
> org.apache.hadoop.hbase.master.MasterCoprocessorHost$133.call(MasterCoprocessorHost.java:1607)
>   at 
> org.apache.hadoop.hbase.master.MasterCoprocessorHost$133.call(MasterCoprocessorHost.java:1604)
>   at 
> org.apache.hadoop.hbase.coprocessor.CoprocessorHost$ObserverOperationWithoutResult.callObserver(CoprocessorHost.java:540)
>   at 
> org.apache.hadoop.hbase.coprocessor.CoprocessorHost.execOperation(CoprocessorHost.java:614)
>   at 
> org.apache.hadoop.hbase.master.MasterCoprocessorHost.postClearDeadServers(MasterCoprocessorHost.java:1604)
>   at 
> org.apache.hadoop.hbase.master.MasterRpcServices.clearDeadServers(MasterRpcServices.java:2231)
>   at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> {noformat}
> That happens because in postClearDeadServers it calls 
> groupAdminServer.removeServers(clearedServer) even if the clearedServer is 
> empty.
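The shape of the fix described above can be sketched as follows (class and method names are stand-ins, not the actual patch): the post-hook should skip the rsgroup bookkeeping when the set of cleared servers is empty, instead of passing the empty set into a method that rejects it.

```java
import java.util.Set;

// Illustrative only: mirrors the guard the issue calls for.
class RsGroupCleanupExample {
  static int removed;

  // Stand-in for RSGroupAdminServer.removeServers, which throws on empty input.
  static void removeServers(Set<String> servers) {
    if (servers == null || servers.isEmpty()) {
      throw new IllegalArgumentException(
          "The set of servers to remove cannot be null or empty.");
    }
    removed += servers.size();
  }

  // Stand-in for RSGroupAdminEndpoint.postClearDeadServers with the guard added.
  static void postClearDeadServers(Set<String> clearedServers) {
    // Only touch the rsgroup bookkeeping if something was actually cleared.
    if (clearedServers != null && !clearedServers.isEmpty()) {
      removeServers(clearedServers);
    }
  }
}
```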





[jira] [Commented] (HBASE-20930) MetaScanner.metaScan should use passed variable for meta table name rather than TableName.META_TABLE_NAME

2018-07-26 Thread Vishal Khandelwal (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558541#comment-16558541
 ] 

Vishal Khandelwal commented on HBASE-20930:
---

Thanks [~elserj] for the review. I am fine with anything, but I can do these 
changes tomorrow when at work. 

> MetaScanner.metaScan should use passed variable for meta table name rather 
> than TableName.META_TABLE_NAME
> -
>
> Key: HBASE-20930
> URL: https://issues.apache.org/jira/browse/HBASE-20930
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.3.3
>Reporter: Vishal Khandelwal
>Assignee: Vishal Khandelwal
>Priority: Minor
> Fix For: 1.3.3
>
> Attachments: HBASE-20930.branch-1.3.patch
>
>
> MetaScanner.metaScan 
>  try (Table metaTable = new HTable(TableName.META_TABLE_NAME, connection, 
> null)) {
> should be changed to 
> metaScan(connection, visitor, userTableName, null, Integer.MAX_VALUE, 
> metaTableName)





[jira] [Commented] (HBASE-20919) meta region can't be re-onlined when restarting cluster if opening rsgroup

2018-07-26 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558512#comment-16558512
 ] 

Hadoop QA commented on HBASE-20919:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange}  
0m  0s{color} | {color:orange} The patch doesn't appear to include any new or 
modified tests. Please justify why no new tests are needed for this patch. Also 
please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} branch-2.0 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
48s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
32s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
11s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
37s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
30s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
15s{color} | {color:green} branch-2.0 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
44s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  
8m 13s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 
2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  5m 
11s{color} | {color:green} hbase-rsgroup in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
 6s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 30m  6s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HBASE-20919 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12933158/HBASE-20919-branch-2.0-02.patch
 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux asf927.gq1.ygridcore.net 4.4.0-130-generic #156-Ubuntu SMP Thu 
Jun 14 08:53:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | branch-2.0 / c11f0e4 |
| maven | version: Apache Maven 3.0.5 
(r01de14724cdef164cd33c7c8c2fe155faf9602da; 2013-02-19 13:51:28+) |
| Default Java | 1.8.0_172 |
| findbugs | v3.1.0-RC3 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/13815/testReport/ |
| modules | C: hbase-rsgroup U: hbase-rsgroup |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/13815/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.



> meta region can't be re-onlined when restarting cluster if opening rsgroup
> 

[jira] [Commented] (HBASE-20885) Remove entry for RPC quota from hbase:quota when RPC quota is removed.

2018-07-26 Thread Sakthi (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558482#comment-16558482
 ] 

Sakthi commented on HBASE-20885:


Thanks for the prompt replies [~elserj]. Will get back to you on explaining 
what I meant by this "affecting space quotas". 

> Remove entry for RPC quota from hbase:quota when RPC quota is removed.
> --
>
> Key: HBASE-20885
> URL: https://issues.apache.org/jira/browse/HBASE-20885
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Sakthi
>Assignee: Sakthi
>Priority: Minor
> Attachments: hbase-20885.master.001.patch, 
> hbase-20885.master.002.patch
>
>
> When a RPC quota is removed (using LIMIT => 'NONE'), the entry from 
> hbase:quota table is not completely removed. For e.g. see below:
> {noformat}
> hbase(main):005:0> create 't2','cf1'
> Created table t2
> Took 0.8000 seconds
> => Hbase::Table - t2
> hbase(main):006:0> set_quota TYPE => THROTTLE, TABLE => 't2', LIMIT => 
> '10M/sec'
> Took 0.1024 seconds
> hbase(main):007:0> list_quotas
> OWNER  QUOTAS
>  TABLE => t2   TYPE => THROTTLE, THROTTLE_TYPE => 
> REQUEST_SIZE, LIMIT => 10M/sec, SCOPE => MACHINE
> 1 row(s)
> Took 0.0622 seconds
> hbase(main):008:0> scan 'hbase:quota'
> ROWCOLUMN+CELL
>  t.t2  column=q:s, timestamp=1531513014463, 
> value=PBUF\x12\x0B\x12\x09\x08\x04\x10\x80\x80\x80
>\x05 \x02
> 1 row(s)
> Took 0.0453 seconds
> hbase(main):009:0> set_quota TYPE => THROTTLE, TABLE => 't2', LIMIT => 'NONE'
> Took 0.0097 seconds
> hbase(main):010:0> list_quotas
> OWNER  QUOTAS
> 0 row(s)
> Took 0.0338 seconds
> hbase(main):011:0> scan 'hbase:quota'
> ROWCOLUMN+CELL
>  t.t2  column=q:s, timestamp=1531513039505, 
> value=PBUF\x12\x00
> 1 row(s)
> Took 0.0066 seconds
> {noformat}
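The desired behavior can be sketched like this (the map is a stand-in for the hbase:quota table; this is not the actual patch): once the throttle is removed and no settings remain, delete the row instead of writing back an empty serialized quota (the leftover `PBUF\x12\x00` cell shown above).

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model of quota-row cleanup.
class QuotaTableExample {
  final Map<String, byte[]> quotaRows = new HashMap<>();

  void setThrottle(String row, byte[] serialized) {
    quotaRows.put(row, serialized);
  }

  void removeThrottle(String row) {
    // After LIMIT => 'NONE' the remaining quota proto carries no settings.
    byte[] remaining = new byte[0];
    if (remaining.length == 0) {
      quotaRows.remove(row); // drop the row rather than store an empty cell
    } else {
      quotaRows.put(row, remaining);
    }
  }
}
```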





[jira] [Updated] (HBASE-20945) HBase JMX - timestamp of last Major Compaction (started, completed successfully)

2018-07-26 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HBASE-20945:
---
Affects Version/s: (was: 1.1.2)

> HBase JMX - timestamp of last Major Compaction (started, completed 
> successfully)
> 
>
> Key: HBASE-20945
> URL: https://issues.apache.org/jira/browse/HBASE-20945
> Project: HBase
>  Issue Type: Improvement
>  Components: API, Compaction, master, metrics, monitoring, 
> regionserver, tooling
>Reporter: Hari Sekhon
>Priority: Major
>
> Request that the timestamp of the last major compaction be stored in JMX API 
> available at /jmx.
> Major Compactions may be disabled to better control scheduling to trigger off 
> peak (this is an old school recommendation), but there is a risk that the 
> major compaction doesn't happen in that case. Also people may trigger major 
> compactions manually and it's hard to see that (I've looked at graphs of 
> storefile counts where it's not obvious but I can infer it from spikes in 
> compaction queue length). Storing the last timestamps would allow all sorts 
> of scripting checks against the API much more simply than trying to infer it 
> from changes in graphs. Also with recent changes to allow compactions to be 
> cancelled in HBASE-6028, the queue length doesn't tell the whole story as the 
> compaction may not have happened if it got cancelled, so the compaction queue 
> spike will be there even though major compaction did not in fact 
> happen/complete.
> Since major compactions may take hours and can also now be cancelled in the 
> latest versions of HBase, we need a few different fields added to JMX:
>  * HBase Master JMX:
>  ** timestamp that last major compaction was triggered, either manually via 
> major_compact command or via schedule
>  ** timestamp that last major compaction completed successfully (since 
> timestamp above could have been started and then later cancelled manually if 
> load was too high)
>  * HBase Regionserver JMX:
>  ** timestamp per region that last major compaction was triggered (there are 
> already compactionsCompletedCount, numBytesCompactedCount and 
> numFilesCompactedCount so it makes sense to add this next to those for each 
> region)
>  ** timestamp per region that last major compaction completed successfully





[jira] [Commented] (HBASE-20945) HBase JMX - timestamp of last Major Compaction (started, completed successfully)

2018-07-26 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558463#comment-16558463
 ] 

Josh Elser commented on HBASE-20945:


Compactions are by region though, not by server. For example, the last time a 
MajC was run for regionA might have been 5minutes ago, but the last MajC for 
regionB could've been 10 days ago.

I feel like tracking MajC per region is a bit heavy-handed.

> HBase JMX - timestamp of last Major Compaction (started, completed 
> successfully)
> 
>
> Key: HBASE-20945
> URL: https://issues.apache.org/jira/browse/HBASE-20945
> Project: HBase
>  Issue Type: Improvement
>  Components: API, Compaction, master, metrics, monitoring, 
> regionserver, tooling
>Affects Versions: 1.1.2
>Reporter: Hari Sekhon
>Priority: Major
>
> Request that the timestamp of the last major compaction be stored in JMX API 
> available at /jmx.
> Major Compactions may be disabled to better control scheduling to trigger off 
> peak (this is an old school recommendation), but there is a risk that the 
> major compaction doesn't happen in that case. Also people may trigger major 
> compactions manually and it's hard to see that (I've looked at graphs of 
> storefile counts where it's not obvious but I can infer it from spikes in 
> compaction queue length). Storing the last timestamps would allow all sorts 
> of scripting checks against the API much more simply than trying to infer it 
> from changes in graphs. Also with recent changes to allow compactions to be 
> cancelled in HBASE-6028, the queue length doesn't tell the whole story as the 
> compaction may not have happened if it got cancelled, so the compaction queue 
> spike will be there even though major compaction did not in fact 
> happen/complete.
> Since major compactions may take hours and can also now be cancelled in the 
> latest versions of HBase, we need a few different fields added to JMX:
>  * HBase Master JMX:
>  ** timestamp that last major compaction was triggered, either manually via 
> major_compact command or via schedule
>  ** timestamp that last major compaction completed successfully (since 
> timestamp above could have been started and then later cancelled manually if 
> load was too high)
>  * HBase Regionserver JMX:
>  ** timestamp per region that last major compaction was triggered (there are 
> already compactionsCompletedCount, numBytesCompactedCount and 
> numFilesCompactedCount so it makes sense to add this next to those for each 
> region)
>  ** timestamp per region that last major compaction completed successfully





[jira] [Commented] (HBASE-20930) MetaScanner.metaScan should use passed variable for meta table name rather than TableName.META_TABLE_NAME

2018-07-26 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558456#comment-16558456
 ] 

Josh Elser commented on HBASE-20930:


{code:java}
+  Assert.fail("Passed invlaid meta table name but it is not honored");
{code}
nit: spelling :)

[~vishk], you can either attach a new patch which has the spelling and 
whitespace fix, or I can fix them on commit. Let me know which you'd prefer.

> MetaScanner.metaScan should use passed variable for meta table name rather 
> than TableName.META_TABLE_NAME
> -
>
> Key: HBASE-20930
> URL: https://issues.apache.org/jira/browse/HBASE-20930
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.3.3
>Reporter: Vishal Khandelwal
>Assignee: Vishal Khandelwal
>Priority: Minor
> Fix For: 1.3.3
>
> Attachments: HBASE-20930.branch-1.3.patch
>
>
> MetaScanner.metaScan 
>  try (Table metaTable = new HTable(TableName.META_TABLE_NAME, connection, 
> null)) {
> should be changed to 
> metaScan(connection, visitor, userTableName, null, Integer.MAX_VALUE, 
> metaTableName)





[jira] [Commented] (HBASE-20944) Building hbase 2.0.0 with Hadoop 3.1.0

2018-07-26 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558415#comment-16558415
 ] 

Sean Busbey commented on HBASE-20944:
-

Also please note that the position of the HBase community is that you should 
not run HBase on top of Hadoop 3.1.0. please see 
http://hbase.apache.org/book.html#hadoop for details

> Building hbase 2.0.0 with Hadoop 3.1.0
> --
>
> Key: HBASE-20944
> URL: https://issues.apache.org/jira/browse/HBASE-20944
> Project: HBase
>  Issue Type: Sub-task
>  Components: build
>Affects Versions: 2.0.0
> Environment: First env:
> {code:java}
> korvit@AKORABLEV:/mnt/c/Users/AKorablev/source/hbase$ mvn -version Apache 
> Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) 
> Maven home: /mnt/c/Users/AKorablev/tool/apache-maven-3.5.4 Java version: 
> 1.8.0_171, vendor: Oracle Corporation, runtime: 
> /usr/lib/jvm/java-8-oracle/jre Default locale: en, platform encoding: UTF-8 
> OS name: "linux", version: "4.4.0-17134-microsoft", arch: "amd64", family: 
> "unix" 
> korvit@AKORABLEV:/mnt/c/Users/AKorablev/source/hbase$ java -version java 
> version "1.8.0_171" Java(TM) SE Runtime Environment (build 1.8.0_171-b11) 
> Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode) 
> korvit@AKORABLEV:/mnt/c/Users/AKorablev/source/hbase$ uname -a Linux 
> AKORABLEV 4.4.0-17134-Microsoft #137-Microsoft Thu Jun 14 18:46:00 PST 2018 
> x86_64 x86_64 x86_64 GNU/Linux
> {code}
> Second env:
> {code:java}
> user@host:~$ mvn -version
> Apache Maven 3.3.9
> Maven home: /usr/share/maven
> Java version: 1.8.0_171, vendor: Oracle Corporation
> Java home: /usr/lib/jvm/java-8-oracle/jre
> Default locale: en_US, platform encoding: UTF-8
> OS name: "linux", version: "4.4.0-127-generic", arch: "amd64", family: "unix"
> user@host:~$ java -version
> java version "1.8.0_171"
> Java(TM) SE Runtime Environment (build 1.8.0_171-b11)
> Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode)
> user@host:~$ uname -a
> Linux host.doman.cloud 4.4.0-127-generic #153-Ubuntu SMP Sat May 19 10:58:46 
> UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
> {code}
>Reporter: Alexander Korablev
>Priority: Critical
>
> Hi! I'm trying to build Hbase 2.0.0 with Hadoop 3.1.0. According the 
> [documentation|http://hbase.apache.org/0.94/book/build.html]:
> {code:java}
> git clone https://github.com/apache/hbase.git
> cd hbase
> git checkout rel/2.0.0
> MAVEN_OPTS="-Xmx2g" mvn clean site install assembly:assembly -DskipTests 
> -Prelease -Dhadoop.profile=3.0 -Dhadoop.version=3.1.0 
> -Dhadoop-three.version=3.1.0
> {code}
> And I cannot find the HBase distribution anywhere.
> What am I doing wrong?
> I want this because I'm facing this 
> [issue|https://stackoverflow.com/questions/48709569/hbase-error-illegalstateexception-when-starting-master-hsync]





[jira] [Commented] (HBASE-20657) Retrying RPC call for ModifyTableProcedure may get stuck

2018-07-26 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558403#comment-16558403
 ] 

Josh Elser commented on HBASE-20657:


[~sergey.soldatov], can you rebase this when you have a moment to breathe?

> Retrying RPC call for ModifyTableProcedure may get stuck
> 
>
> Key: HBASE-20657
> URL: https://issues.apache.org/jira/browse/HBASE-20657
> Project: HBase
>  Issue Type: Bug
>  Components: Client, proc-v2
>Affects Versions: 2.0.0
>Reporter: Sergey Soldatov
>Assignee: stack
>Priority: Critical
> Fix For: 2.0.2
>
> Attachments: HBASE-20657-1-branch-2.patch, 
> HBASE-20657-2-branch-2.patch, HBASE-20657-3-branch-2.patch, 
> HBASE-20657-testcase-branch2.patch
>
>
> Env: 2 masters, 1 RS. 
> Steps to reproduce: Active master is killed while ModifyTableProcedure is 
> executed. 
> If the table has enough regions, it may happen that by the time the secondary master 
> becomes active, some of the regions are closed; so once the client retries the call 
> to the new active master, a new ModifyTableProcedure is created and gets stuck 
> during MODIFY_TABLE_REOPEN_ALL_REGIONS state handling. That happens because:
> 1. When retrying from the client side, we call modifyTableAsync, which 
> creates a procedure with a new nonce key:
> {noformat}
>  ModifyTableRequest request = 
> RequestConverter.buildModifyTableRequest(
> td.getTableName(), td, ng.getNonceGroup(), ng.newNonce());
> {noformat}
>  So on the server side, it's considered a new procedure and starts 
> executing immediately.
> 2. When we are processing MODIFY_TABLE_REOPEN_ALL_REGIONS, we create a 
> MoveRegionProcedure for each region, but it checks whether the region is 
> online (and it's not), so it fails immediately, forcing the procedure to 
> restart.
> [~an...@apache.org] saw a similar case when two concurrent ModifyTable 
> procedures were running and got stuck in a similar way. 
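The nonce behavior in step 1 can be sketched concretely. The following standalone model is illustrative only (the map and `submitProcedure` are assumed stand-ins, not the real HBase ProcedureExecutor API); it shows why generating a fresh nonce on every client retry defeats server-side deduplication:

```java
import java.util.HashMap;
import java.util.Map;

public class NonceDemo {
    // Hypothetical model of procedure-store nonce deduplication.
    static final Map<Long, Long> NONCE_TO_PROC_ID = new HashMap<>();
    static long nextProcId = 1;

    // Reusing a nonce returns the already-submitted procedure id; a fresh
    // nonce (as ng.newNonce() produces on every retry) registers a new one.
    static long submitProcedure(long nonce) {
        return NONCE_TO_PROC_ID.computeIfAbsent(nonce, n -> nextProcId++);
    }

    public static void main(String[] args) {
        long first = submitProcedure(42L);  // original ModifyTable call
        long retry = submitProcedure(42L);  // retry reusing the nonce: deduplicated
        long fresh = submitProcedure(99L);  // retry with a new nonce: a second procedure
        System.out.println(first == retry); // true
        System.out.println(first == fresh); // false
    }
}
```

Under this model, the client-side fix is to reuse the original nonce on retry so the master recognizes the call as a duplicate.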





[jira] [Commented] (HBASE-20885) Remove entry for RPC quota from hbase:quota when RPC quota is removed.

2018-07-26 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558398#comment-16558398
 ] 

Josh Elser commented on HBASE-20885:


Left some comments on RB. Thanks, [~jatsakthi]

> Remove entry for RPC quota from hbase:quota when RPC quota is removed.
> --
>
> Key: HBASE-20885
> URL: https://issues.apache.org/jira/browse/HBASE-20885
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Sakthi
>Assignee: Sakthi
>Priority: Minor
> Attachments: hbase-20885.master.001.patch, 
> hbase-20885.master.002.patch
>
>
> When an RPC quota is removed (using LIMIT => 'NONE'), the entry in the 
> hbase:quota table is not completely removed. For example, see below:
> {noformat}
> hbase(main):005:0> create 't2','cf1'
> Created table t2
> Took 0.8000 seconds
> => Hbase::Table - t2
> hbase(main):006:0> set_quota TYPE => THROTTLE, TABLE => 't2', LIMIT => 
> '10M/sec'
> Took 0.1024 seconds
> hbase(main):007:0> list_quotas
> OWNER  QUOTAS
>  TABLE => t2   TYPE => THROTTLE, THROTTLE_TYPE => 
> REQUEST_SIZE, LIMIT => 10M/sec, SCOPE => MACHINE
> 1 row(s)
> Took 0.0622 seconds
> hbase(main):008:0> scan 'hbase:quota'
> ROWCOLUMN+CELL
>  t.t2  column=q:s, timestamp=1531513014463, 
> value=PBUF\x12\x0B\x12\x09\x08\x04\x10\x80\x80\x80
>\x05 \x02
> 1 row(s)
> Took 0.0453 seconds
> hbase(main):009:0> set_quota TYPE => THROTTLE, TABLE => 't2', LIMIT => 'NONE'
> Took 0.0097 seconds
> hbase(main):010:0> list_quotas
> OWNER  QUOTAS
> 0 row(s)
> Took 0.0338 seconds
> hbase(main):011:0> scan 'hbase:quota'
> ROWCOLUMN+CELL
>  t.t2  column=q:s, timestamp=1531513039505, 
> value=PBUF\x12\x00
> 1 row(s)
> Took 0.0066 seconds
> {noformat}
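The leftover `PBUF\x12\x00` row above is a quota record with all settings cleared. A minimal sketch of the intended cleanup, assuming the fix is "delete the hbase:quota row once no quota settings remain serialized" (the method and the emptiness check here are hypothetical, not the actual GlobalQuotaSettingsImpl code):

```java
public class QuotaRowCleanupDemo {
    // Hypothetical check: after LIMIT => 'NONE', the remaining serialized
    // quota carries no settings, so the row should be deleted rather than
    // rewritten with an empty value.
    static boolean shouldDeleteQuotaRow(byte[] serializedQuota) {
        return serializedQuota == null || serializedQuota.length == 0;
    }

    public static void main(String[] args) {
        byte[] emptied = new byte[0];             // all throttle fields cleared
        byte[] active = new byte[] {0x12, 0x0B};  // some throttle still serialized
        System.out.println(shouldDeleteQuotaRow(emptied)); // true: remove the row
        System.out.println(shouldDeleteQuotaRow(active));  // false: keep the row
    }
}
```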





[jira] [Commented] (HBASE-20885) Remove entry for RPC quota from hbase:quota when RPC quota is removed.

2018-07-26 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558383#comment-16558383
 ] 

Josh Elser commented on HBASE-20885:


{quote}having this messes up when a Space quota is being set up on the same 
table
{quote}
Forgive me, how does this affect space quotas? It's not clear to me just by 
looking at the parent (multiple issues mentioned there).
{quote}I'm not sure if this was done on purpose (for some kind of an 
optimization)
{quote}
99% sure this was not on purpose. GlobalQuotaSettingsImpl was me trying to 
consolidate rpc and space quotas together. No reason to keep around a record in 
hbase:quota if we have no corresponding quota. Let me look more closely at your 
patch; seems OK at a glance.

> Remove entry for RPC quota from hbase:quota when RPC quota is removed.
> --
>
> Key: HBASE-20885
> URL: https://issues.apache.org/jira/browse/HBASE-20885
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Sakthi
>Assignee: Sakthi
>Priority: Minor
> Attachments: hbase-20885.master.001.patch, 
> hbase-20885.master.002.patch
>
>
> When an RPC quota is removed (using LIMIT => 'NONE'), the entry in the 
> hbase:quota table is not completely removed. For example, see below:
> {noformat}
> hbase(main):005:0> create 't2','cf1'
> Created table t2
> Took 0.8000 seconds
> => Hbase::Table - t2
> hbase(main):006:0> set_quota TYPE => THROTTLE, TABLE => 't2', LIMIT => 
> '10M/sec'
> Took 0.1024 seconds
> hbase(main):007:0> list_quotas
> OWNER  QUOTAS
>  TABLE => t2   TYPE => THROTTLE, THROTTLE_TYPE => 
> REQUEST_SIZE, LIMIT => 10M/sec, SCOPE => MACHINE
> 1 row(s)
> Took 0.0622 seconds
> hbase(main):008:0> scan 'hbase:quota'
> ROWCOLUMN+CELL
>  t.t2  column=q:s, timestamp=1531513014463, 
> value=PBUF\x12\x0B\x12\x09\x08\x04\x10\x80\x80\x80
>\x05 \x02
> 1 row(s)
> Took 0.0453 seconds
> hbase(main):009:0> set_quota TYPE => THROTTLE, TABLE => 't2', LIMIT => 'NONE'
> Took 0.0097 seconds
> hbase(main):010:0> list_quotas
> OWNER  QUOTAS
> 0 row(s)
> Took 0.0338 seconds
> hbase(main):011:0> scan 'hbase:quota'
> ROWCOLUMN+CELL
>  t.t2  column=q:s, timestamp=1531513039505, 
> value=PBUF\x12\x00
> 1 row(s)
> Took 0.0066 seconds
> {noformat}





[jira] [Commented] (HBASE-20949) Split/Merge table can be executed concurrently with DisableTableProcedure

2018-07-26 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558380#comment-16558380
 ] 

Duo Zhang commented on HBASE-20949:
---

What I pushed to master for debugging.

Will take a look tomorrow.

> Split/Merge table can be executed concurrently with DisableTableProcedure
> -
>
> Key: HBASE-20949
> URL: https://issues.apache.org/jira/browse/HBASE-20949
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Duo Zhang
>Priority: Major
> Attachments: HBASE-20949-debug.patch
>
>
> The top flaky tests on the dashboard are all because of this.
> TestRestoreSnapshotFromClient
> TestSimpleRegionNormalizerOnCluster
> Theoretically this should not happen, need to dig more.





[jira] [Updated] (HBASE-20949) Split/Merge table can be executed concurrently with DisableTableProcedure

2018-07-26 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-20949:
--
Attachment: HBASE-20949-debug.patch

> Split/Merge table can be executed concurrently with DisableTableProcedure
> -
>
> Key: HBASE-20949
> URL: https://issues.apache.org/jira/browse/HBASE-20949
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Duo Zhang
>Priority: Major
> Attachments: HBASE-20949-debug.patch
>
>
> The top flaky tests on the dashboard are all because of this.
> TestRestoreSnapshotFromClient
> TestSimpleRegionNormalizerOnCluster
> Theoretically this should not happen, need to dig more.





[jira] [Commented] (HBASE-20927) RSGroupAdminEndpoint doesn't handle clearing dead servers if they are not processed yet.

2018-07-26 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558377#comment-16558377
 ] 

Josh Elser commented on HBASE-20927:


Just to confirm, you found this via RSGroup tests, [~sergey.soldatov]? I am 
only seeing this method called there and by the HBase shell.

> RSGroupAdminEndpoint doesn't handle clearing dead servers if they are not 
> processed yet.
> 
>
> Key: HBASE-20927
> URL: https://issues.apache.org/jira/browse/HBASE-20927
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
> Fix For: 3.0.0, 2.1.1
>
> Attachments: HBASE-20927-master.patch, HBASE-20927.master.002.patch
>
>
> Admin.clearDeadServers is supposed to return the list of servers that were 
> not cleared. But if RSGroupAdminEndpoint is set, a ConstraintException is 
> thrown:
> {noformat}
> Caused by: 
> org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.constraint.ConstraintException):
>  org.apache.hadoop.hbase.constraint.ConstraintException: The set of servers 
> to remove cannot be null or empty.
>   at 
> org.apache.hadoop.hbase.rsgroup.RSGroupAdminServer.removeServers(RSGroupAdminServer.java:573)
>   at 
> org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint.postClearDeadServers(RSGroupAdminEndpoint.java:519)
>   at 
> org.apache.hadoop.hbase.master.MasterCoprocessorHost$133.call(MasterCoprocessorHost.java:1607)
>   at 
> org.apache.hadoop.hbase.master.MasterCoprocessorHost$133.call(MasterCoprocessorHost.java:1604)
>   at 
> org.apache.hadoop.hbase.coprocessor.CoprocessorHost$ObserverOperationWithoutResult.callObserver(CoprocessorHost.java:540)
>   at 
> org.apache.hadoop.hbase.coprocessor.CoprocessorHost.execOperation(CoprocessorHost.java:614)
>   at 
> org.apache.hadoop.hbase.master.MasterCoprocessorHost.postClearDeadServers(MasterCoprocessorHost.java:1604)
>   at 
> org.apache.hadoop.hbase.master.MasterRpcServices.clearDeadServers(MasterRpcServices.java:2231)
>   at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> {noformat}
> That happens because in postClearDeadServers it calls 
> groupAdminServer.removeServers(clearedServer) even if the clearedServer is 
> empty.
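The guard described in the last sentence can be sketched as follows. This is a standalone illustration, not the actual RSGroupAdminEndpoint code; the method shape and return strings are assumptions made for the example:

```java
import java.util.Collections;
import java.util.List;

public class ClearDeadServersGuardDemo {
    // Hypothetical sketch of the fix: only forward to removeServers when the
    // master actually cleared something; an empty set would otherwise trip
    // the "set of servers to remove cannot be null or empty"
    // ConstraintException shown in the stack trace above.
    static String postClearDeadServers(List<String> clearedServers) {
        if (clearedServers == null || clearedServers.isEmpty()) {
            return "skipped removeServers";
        }
        return "removed " + clearedServers.size() + " server(s) from rsgroup info";
    }

    public static void main(String[] args) {
        // No servers were cleared yet: skip the rsgroup bookkeeping call.
        System.out.println(postClearDeadServers(Collections.emptyList()));
        // One server cleared: forward it (server name format is illustrative).
        System.out.println(postClearDeadServers(List.of("rs1,16020,1533000000000")));
    }
}
```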





[jira] [Created] (HBASE-20950) Helper method to configure secure DFS cluster for tests

2018-07-26 Thread Wei-Chiu Chuang (JIRA)
Wei-Chiu Chuang created HBASE-20950:
---

 Summary: Helper method to configure secure DFS cluster for tests
 Key: HBASE-20950
 URL: https://issues.apache.org/jira/browse/HBASE-20950
 Project: HBase
  Issue Type: Sub-task
  Components: test
Reporter: Wei-Chiu Chuang
Assignee: Wei-Chiu Chuang


There is quite some boilerplate code for configuring a secure HDFS cluster for 
tests. The code is repeated in a number of test files within HBase code base. 
Convert the boilerplate code into a helper method to avoid duplication and 
lower maintenance effort.

SecureTestCluster#setHdfsSecuredConfiguration
TestSecureExport#setUpClusterKdc
TestThriftSpnegoHttpServer#addSecurityConfigurations
TestSaslFanOutOneBlockAsyncDFSOutput#setHdfsSecuredConfiguration
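The consolidation could look roughly like the sketch below. The helper name and return type are assumptions for illustration (the real patch would likely populate a Hadoop `Configuration` instead of a map); the property keys are standard Hadoop/HDFS secure-mode settings, and the principal/keytab values would be supplied by each test:

```java
import java.util.HashMap;
import java.util.Map;

public class SecureDfsConfigHelper {
    // Hypothetical helper gathering the Kerberos settings that each of the
    // four test classes currently duplicates.
    static Map<String, String> secureDfsConfiguration(String principal, String keytab) {
        Map<String, String> conf = new HashMap<>();
        conf.put("hadoop.security.authentication", "kerberos");
        conf.put("dfs.namenode.kerberos.principal", principal);
        conf.put("dfs.namenode.keytab.file", keytab);
        conf.put("dfs.datanode.kerberos.principal", principal);
        conf.put("dfs.datanode.keytab.file", keytab);
        conf.put("dfs.data.transfer.protection", "authentication");
        return conf;
    }

    public static void main(String[] args) {
        Map<String, String> conf =
            secureDfsConfiguration("hdfs/localhost@EXAMPLE.COM", "/tmp/hdfs.keytab");
        System.out.println(conf.get("hadoop.security.authentication")); // kerberos
    }
}
```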





[jira] [Issue Comment Deleted] (HBASE-18599) Add missing @Deprecated annotations

2018-07-26 Thread Lars Francke (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-18599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Francke updated HBASE-18599:
-
Comment: was deleted

(was: A comment with security level 'jira-users' was removed.)

> Add missing @Deprecated annotations
> ---
>
> Key: HBASE-18599
> URL: https://issues.apache.org/jira/browse/HBASE-18599
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Francke
>Assignee: Lars Francke
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-18599.patch
>
>
> There are a couple of places where deprecations have only been added in the 
> Javadoc but the annotation is missing.
> I'll also change the Javadoc to be consistent with what I've done in 
> HBASE-13462.
> This is for master/2.0.0 only.





[jira] [Commented] (HBASE-20649) Validate HFiles do not have PREFIX_TREE DataBlockEncoding

2018-07-26 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558356#comment-16558356
 ] 

Hudson commented on HBASE-20649:


Results for branch master
[build #409 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/409/]: (x) 
*{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/409//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/409//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/409//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Validate HFiles do not have PREFIX_TREE DataBlockEncoding
> -
>
> Key: HBASE-20649
> URL: https://issues.apache.org/jira/browse/HBASE-20649
> Project: HBase
>  Issue Type: New Feature
>  Components: Operability, tooling
>Reporter: Peter Somogyi
>Assignee: Balazs Meszaros
>Priority: Minor
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-20649.master.001.patch, 
> HBASE-20649.master.002.patch, HBASE-20649.master.003.patch, 
> HBASE-20649.master.004.patch, HBASE-20649.master.005.patch, 
> HBASE-20649.master.006.patch
>
>
> HBASE-20592 adds a tool to check that column families on the cluster do not have 
> PREFIX_TREE encoding.
> Since it is possible that the DataBlockEncoding was already changed but HFiles 
> have not been rewritten yet, we need a tool that can verify the content of 
> HFiles in the cluster.





[jira] [Commented] (HBASE-20928) Rewrite calculation of midpoint in binarySearch functions to prevent overflow

2018-07-26 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558357#comment-16558357
 ] 

Hudson commented on HBASE-20928:


Results for branch master
[build #409 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/409/]: (x) 
*{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/409//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/409//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/409//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Rewrite calculation of midpoint in binarySearch functions to prevent overflow
> -
>
> Key: HBASE-20928
> URL: https://issues.apache.org/jira/browse/HBASE-20928
> Project: HBase
>  Issue Type: Bug
>  Components: io
>Reporter: saurabh singh
>Assignee: saurabh singh
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HBASE-20928-addendum.patch, 
> HBASE-20928-fix-binarySearch-v5.patch, HBASE-20928-fix-binarySearch-v5.patch
>
>
> There are a couple of issues in the function:
>  * The {{>>>}} operator would mess up the values if {{low}} + {{high}} ends up being 
> negative. This shouldn't happen, but I don't see anything to prevent it from 
> happening.
>  * The code fails around boundary values of {{low}} and {{high}}. This is a 
> well-known binary search catch. 
> [https://ai.googleblog.com/2006/06/extra-extra-read-all-about-it-nearly.html]
>  
> Most of the code should already be covered by tests. I would have liked to 
> add a test that actually fails without the fix, but given that these are private 
> methods I am not sure of the best place to add the test. Suggestions?
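The overflow in question is easy to demonstrate outside of HBase. This standalone sketch (not the patched code) contrasts the overflow-prone midpoint with two safe variants; note that `(low + high) >>> 1` is only safe when both indices are non-negative, which is why the subtraction form is the usual recommendation:

```java
public class MidpointDemo {
    // Overflow-prone midpoint: low + high can exceed Integer.MAX_VALUE and
    // wrap to a negative int before the division.
    static int naiveMid(int low, int high) {
        return (low + high) / 2;
    }

    // Overflow-safe midpoint: high - low never overflows for valid ranges.
    static int safeMid(int low, int high) {
        return low + ((high - low) / 2);
    }

    // Unsigned-shift midpoint: also safe for non-negative low/high, because
    // >>> divides the wrapped sum as if it were unsigned.
    static int shiftMid(int low, int high) {
        return (low + high) >>> 1;
    }

    public static void main(String[] args) {
        int low = Integer.MAX_VALUE - 2, high = Integer.MAX_VALUE;
        System.out.println(naiveMid(low, high)); // -2: the sum overflowed
        System.out.println(safeMid(low, high));  // 2147483646: correct
        System.out.println(shiftMid(low, high)); // 2147483646: correct
    }
}
```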





[jira] [Commented] (HBASE-20867) RS may get killed while master restarts

2018-07-26 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558355#comment-16558355
 ] 

Hudson commented on HBASE-20867:


Results for branch master
[build #409 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/409/]: (x) 
*{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/409//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/409//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/409//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> RS may get killed while master restarts
> ---
>
> Key: HBASE-20867
> URL: https://issues.apache.org/jira/browse/HBASE-20867
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 3.0.0, 2.1.0, 2.0.1
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Fix For: 3.0.0, 2.0.2, 2.1.1
>
> Attachments: HBASE-20867.branch-2.0.001.patch, 
> HBASE-20867.branch-2.0.002.patch, HBASE-20867.branch-2.0.003.patch, 
> HBASE-20867.branch-2.0.004.patch, HBASE-20867.branch-2.0.005.patch, 
> HBASE-20867.branch-2.0.006.patch
>
>
> If the master is dispatching an RPC call to an RS when aborting, a connection 
> exception may be thrown by the RPC layer (an IOException with a "Connection 
> closed" message in this case). The RSProcedureDispatcher will regard it as an 
> un-retryable exception and pass it to UnassignProcedure.remoteCallFailed, 
> which will expire the RS.
> Actually, the RS is very healthy; only the master is restarting.
> I think we should deal with those kinds of connection exceptions in 
> RSProcedureDispatcher and retry the RPC call.





[jira] [Commented] (HBASE-20932) Effective MemStoreSize::hashCode()

2018-07-26 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558358#comment-16558358
 ] 

Hudson commented on HBASE-20932:


Results for branch master
[build #409 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/409/]: (x) 
*{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/409//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/409//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/409//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Effective MemStoreSize::hashCode() 
> ---
>
> Key: HBASE-20932
> URL: https://issues.apache.org/jira/browse/HBASE-20932
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.2
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
>Priority: Major
> Fix For: 2.2.0
>
> Attachments: HBASE-20932.001.patch, HBASE-20932.002.patch
>
>
> After HBASE-20411 we have
> {code:java|title=MemStoreSize::hashCode()}
>   @Override
>   public int hashCode() {
> long h = 31 * this.dataSize;
> h = h + 31 * this.heapSize;
> h = h + 31 * this.offHeapSize;
> return (int) h;
>   }
>  {code}
> This is not an effective {{hashCode()}} implementation. Instead we can use:
> {code:java|title=MemStoreSize::hashCode()}
>   @Override
>   public int hashCode() {
> long h = this.dataSize;
> h = h * 31 + this.heapSize;
> h = h * 31 + this.offHeapSize;
> return (int) h;
>   }
>  {code}
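The weakness is easy to show concretely: the old formula reduces to 31*(dataSize + heapSize + offHeapSize), which is symmetric in the three fields, so any permutation of the sizes collides; the replacement weights each field by a different power of 31. A standalone sketch mirroring the two snippets above (outside of HBase, with the sizes passed as parameters):

```java
public class HashDemo {
    // Old formula: 31*d + 31*h + 31*o. Symmetric in d, h, o, so swapping
    // fields produces the same hash.
    static int oldHash(long d, long h, long o) {
        long x = 31 * d;
        x = x + 31 * h;
        x = x + 31 * o;
        return (int) x;
    }

    // Fixed formula: ((d * 31) + h) * 31 + o = 961*d + 31*h + o.
    // Field order now matters.
    static int newHash(long d, long h, long o) {
        long x = d;
        x = x * 31 + h;
        x = x * 31 + o;
        return (int) x;
    }

    public static void main(String[] args) {
        // (1, 2, 3) and (3, 2, 1) collide under the old scheme...
        System.out.println(oldHash(1, 2, 3) == oldHash(3, 2, 1)); // true
        // ...but not under the fixed one.
        System.out.println(newHash(1, 2, 3) == newHash(3, 2, 1)); // false
    }
}
```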





[jira] [Commented] (HBASE-20927) RSGroupAdminEndpoint doesn't handle clearing dead servers if they are not processed yet.

2018-07-26 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558349#comment-16558349
 ] 

Hadoop QA commented on HBASE-20927:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
45s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
32s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
11s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
 6s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
29s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
15s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
 6s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  
7m 35s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 
or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m  
6s{color} | {color:green} hbase-rsgroup in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
 7s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 31m  0s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HBASE-20927 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12933164/HBASE-20927.master.002.patch
 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux asf902.gq1.ygridcore.net 3.13.0-143-generic #192-Ubuntu SMP Tue 
Feb 27 10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / a392c01 |
| maven | version: Apache Maven 3.0.5 
(r01de14724cdef164cd33c7c8c2fe155faf9602da; 2013-02-19 13:51:28+) |
| Default Java | 1.8.0_172 |
| findbugs | v3.1.0-RC3 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/13814/testReport/ |
| modules | C: hbase-rsgroup U: hbase-rsgroup |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/13814/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.



> RSGroupAdminEndpoint doesn't handle clearing dead servers if they are not 
> processed yet.
> 
>
> Key: HBASE-20927
> URL: 

[jira] [Commented] (HBASE-20749) Upgrade our use of checkstyle to 8.6+

2018-07-26 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558342#comment-16558342
 ] 

Hudson commented on HBASE-20749:


Results for branch HBASE-20749
[build #4 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20749/4/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20749/4//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- Something went wrong running this stage, please [check relevant console 
output|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20749/4//console].


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- Something went wrong running this stage, please [check relevant console 
output|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20749/4//console].


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Upgrade our use of checkstyle to 8.6+
> -
>
> Key: HBASE-20749
> URL: https://issues.apache.org/jira/browse/HBASE-20749
> Project: HBase
>  Issue Type: Improvement
>  Components: build, community
>Reporter: Sean Busbey
>Assignee: Mike Drob
>Priority: Minor
> Attachments: HBASE-20749.master.001.patch
>
>
> We should upgrade our checkstyle version to 8.6 or later so we can use the 
> "match violation message to this regex" feature for suppression. That will 
> allow us to make sure we don't regress on HTrace v3 vs v4 APIs (came up in 
> HBASE-20332).
> We're currently blocked on upgrading to 8.3+ by [checkstyle 
> #5279|https://github.com/checkstyle/checkstyle/issues/5279], a regression 
> that flags our use of both the "separate import groups" and "put static 
> imports over here" configs as an error.




