[jira] [Commented] (HBASE-20188) [TESTING] Performance

2018-07-29 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16561446#comment-16561446
 ] 

stack commented on HBASE-20188:
---

Profiling log link: 
https://docs.google.com/document/d/1vZ_k6_pNR1eQxID5u1xFihuPC7FkPaJQW8c4M5eA2AQ/edit

> [TESTING] Performance
> -
>
> Key: HBASE-20188
> URL: https://issues.apache.org/jira/browse/HBASE-20188
> Project: HBase
>  Issue Type: Umbrella
>  Components: Performance
>Reporter: stack
>Priority: Blocker
> Fix For: 3.0.0, 2.2.0
>
> Attachments: CAM-CONFIG-V01.patch, HBASE-20188-xac.sh, 
> HBASE-20188.sh, HBase 2.0 performance evaluation - 8GB(1).pdf, HBase 2.0 
> performance evaluation - 8GB.pdf, HBase 2.0 performance evaluation - Basic vs 
> None_ system settings.pdf, HBase 2.0 performance evaluation - throughput 
> SSD_HDD.pdf, ITBLL2.5B_1.2.7vs2.0.0_cpu.png, 
> ITBLL2.5B_1.2.7vs2.0.0_gctime.png, ITBLL2.5B_1.2.7vs2.0.0_iops.png, 
> ITBLL2.5B_1.2.7vs2.0.0_load.png, ITBLL2.5B_1.2.7vs2.0.0_memheap.png, 
> ITBLL2.5B_1.2.7vs2.0.0_memstore.png, ITBLL2.5B_1.2.7vs2.0.0_ops.png, 
> ITBLL2.5B_1.2.7vs2.0.0_ops_NOT_summing_regions.png, YCSB_CPU.png, 
> YCSB_GC_TIME.png, YCSB_IN_MEMORY_COMPACTION=NONE.ops.png, YCSB_MEMSTORE.png, 
> YCSB_OPs.png, YCSB_in-memory-compaction=NONE.ops.png, YCSB_load.png, 
> flamegraph-1072.1.svg, flamegraph-1072.2.svg, hbase-env.sh, hbase-site.xml, 
> hbase-site.xml, hits.png, hits_with_fp_scheduler.png, 
> lock.127.workloadc.20180402T200918Z.svg, 
> lock.2.memsize2.c.20180403T160257Z.svg, perregion.png, run_ycsb.sh, 
> total.png, tree.txt, workloadx, workloadx
>
>
> How does 2.0.0 compare to old versions? Is it faster, slower? There is rumor 
> that it is much slower, that the problem is the asyncwal writing. Does 
> in-memory compaction slow us down or speed us up? What happens when you 
> enable offheaping?
> Keep notes here in this umbrella issue. Need to be able to say something 
> about perf when 2.0.0 ships.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20974) Backport HBASE-20583 (SplitLogWorker should handle FileNotFoundException when split a wal) to branch-1

2018-07-29 Thread Pankaj Kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pankaj Kumar updated HBASE-20974:
-
Fix Version/s: 1.5.0

> Backport HBASE-20583 (SplitLogWorker should handle FileNotFoundException when 
> split a wal) to branch-1
> --
>
> Key: HBASE-20974
> URL: https://issues.apache.org/jira/browse/HBASE-20974
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Major
> Fix For: 1.5.0
>
> Attachments: HBASE-20974.branch-1.patch
>
>
> Backport HBASE-20583 to branch-1.x.





[jira] [Comment Edited] (HBASE-20974) Backport HBASE-20583 (SplitLogWorker should handle FileNotFoundException when split a wal) to branch-1

2018-07-29 Thread Pankaj Kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16561436#comment-16561436
 ] 

Pankaj Kumar edited comment on HBASE-20974 at 7/30/18 5:01 AM:
---

[~andrew.purt...@gmail.com] please review. This patch can be applied to 
branch-1.4/1.3/1.2 as well.


was (Author: pankaj2461):
[~andrew.purt...@gmail.com] please review.

> Backport HBASE-20583 (SplitLogWorker should handle FileNotFoundException when 
> split a wal) to branch-1
> --
>
> Key: HBASE-20974
> URL: https://issues.apache.org/jira/browse/HBASE-20974
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Major
> Fix For: 1.5.0
>
> Attachments: HBASE-20974.branch-1.patch
>
>
> Backport HBASE-20583 to branch-1.x.





[jira] [Updated] (HBASE-20974) Backport HBASE-20583 (SplitLogWorker should handle FileNotFoundException when split a wal) to branch-1

2018-07-29 Thread Pankaj Kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pankaj Kumar updated HBASE-20974:
-
Status: Patch Available  (was: Open)

[~andrew.purt...@gmail.com] please review.

> Backport HBASE-20583 (SplitLogWorker should handle FileNotFoundException when 
> split a wal) to branch-1
> --
>
> Key: HBASE-20974
> URL: https://issues.apache.org/jira/browse/HBASE-20974
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Major
> Attachments: HBASE-20974.branch-1.patch
>
>
> Backport HBASE-20583 to branch-1.x.





[jira] [Updated] (HBASE-20974) Backport HBASE-20583 (SplitLogWorker should handle FileNotFoundException when split a wal) to branch-1

2018-07-29 Thread Pankaj Kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pankaj Kumar updated HBASE-20974:
-
Attachment: HBASE-20974.branch-1.patch

> Backport HBASE-20583 (SplitLogWorker should handle FileNotFoundException when 
> split a wal) to branch-1
> --
>
> Key: HBASE-20974
> URL: https://issues.apache.org/jira/browse/HBASE-20974
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Major
> Attachments: HBASE-20974.branch-1.patch
>
>
> Backport HBASE-20583 to branch-1.x.





[jira] [Commented] (HBASE-20583) SplitLogWorker should handle FileNotFoundException when split a wal

2018-07-29 Thread Pankaj Kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16561435#comment-16561435
 ] 

Pankaj Kumar commented on HBASE-20583:
--

Created HBASE-20974 subtask to backport this issue to branch-1.x.

> SplitLogWorker should handle FileNotFoundException when split a wal
> ---
>
> Key: HBASE-20583
> URL: https://issues.apache.org/jira/browse/HBASE-20583
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 2.0.1
>
> Attachments: HBASE-20583.master.001.patch, 
> HBASE-20583.master.001.patch
>
>
> When a split task is finished, the master deletes the WAL first, then removes 
> the task's zk node. So if the master crashes after deleting the WAL, the zk 
> task node may be left on zk. When the master resubmits this task, the task 
> will fail with FileNotFoundException.
> We already handle FileNotFoundException in WALSplitter, but not in 
> SplitLogWorker.
>  
> {code:java}
>   try {
>     in = getReader(path, reporter);
>   } catch (EOFException e) {
>     if (length <= 0) {
>       // TODO should we ignore an empty, not-last log file if skip.errors
>       // is false? Either way, the caller should decide what to do. E.g.
>       // ignore if this is the last log in sequence.
>       // TODO is this scenario still possible if the log has been
>       // recovered (i.e. closed)
>       LOG.warn("Could not open {} for reading. File is empty", path, e);
>     }
>     // EOFException being ignored
>     return null;
>   }
> } catch (IOException e) {
>   if (e instanceof FileNotFoundException) {
>     // A wal file may not exist anymore. Nothing can be recovered so move on
>     LOG.warn("File {} does not exist anymore", path, e);
>     return null;
>   }
> }
> {code}
> {code:java}
> // Here fs.getFileStatus may throw FileNotFoundException, too. We should
> // handle this exception the same way WALSplitter.getReader does.
> try {
>   if (!WALSplitter.splitLogFile(walDir, fs.getFileStatus(new Path(walDir, filename)),
>       fs, conf, p, sequenceIdChecker,
>       server.getCoordinatedStateManager().getSplitLogWorkerCoordination(), factory)) {
>     return Status.PREEMPTED;
>   }
> }
> {code}
>  
>  
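
The handling described in the issue above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual HBase patch: the class SplitTaskSketch, the SplitAction interface and the Status values used here are hypothetical stand-ins, and treating a missing WAL as "done" is an assumption based on the description (the WAL was already deleted by an earlier, completed split).

```java
import java.io.FileNotFoundException;
import java.io.IOException;

// Hypothetical sketch; none of these names come from the HBase code base.
class SplitTaskSketch {
  enum Status { DONE, PREEMPTED, ERR }

  /** Stand-in for the real split call; returns false if the task was preempted. */
  interface SplitAction {
    boolean split() throws IOException;
  }

  static Status runSplit(SplitAction action) {
    try {
      if (!action.split()) {
        return Status.PREEMPTED;
      }
      return Status.DONE;
    } catch (FileNotFoundException e) {
      // Assumption: the WAL was removed by an earlier finished split,
      // so there is nothing left to recover and the task can be closed.
      return Status.DONE;
    } catch (IOException e) {
      // Any other I/O failure is still reported as an error.
      return Status.ERR;
    }
  }

  public static void main(String[] args) {
    Status s = runSplit(() -> { throw new FileNotFoundException("wal is gone"); });
    System.out.println("resubmitted task finished with status " + s);
  }
}
```

The point of the sketch is only the ordering of the catch clauses: FileNotFoundException is caught before the general IOException, so a resubmitted task for an already deleted WAL does not fail again.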





[jira] [Created] (HBASE-20974) Backport HBASE-20583 (SplitLogWorker should handle FileNotFoundException when split a wal) to branch-1

2018-07-29 Thread Pankaj Kumar (JIRA)
Pankaj Kumar created HBASE-20974:


 Summary: Backport HBASE-20583 (SplitLogWorker should handle 
FileNotFoundException when split a wal) to branch-1
 Key: HBASE-20974
 URL: https://issues.apache.org/jira/browse/HBASE-20974
 Project: HBase
  Issue Type: Sub-task
Reporter: Pankaj Kumar
Assignee: Pankaj Kumar


Backport HBASE-20583 to branch-1.x.





[jira] [Updated] (HBASE-20972) Fix call queue buffer size leaking bug

2018-07-29 Thread Xiaolin Ha (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaolin Ha updated HBASE-20972:
---
Attachment: HBASE-20972.branch-2.0.001.patch

> Fix call queue buffer size leaking bug
> --
>
> Key: HBASE-20972
> URL: https://issues.apache.org/jira/browse/HBASE-20972
> Project: HBase
>  Issue Type: Bug
>  Components: IPC/RPC
>Affects Versions: 2.1.0, 2.0.0, 2.2.0
>Reporter: Xiaolin Ha
>Assignee: Xiaolin Ha
>Priority: Major
> Attachments: HBASE-20972.branch-2.0.001.patch
>
>
> Call queue size is the total size in bytes of the currently queued and 
> running Calls. It gets incremented after we parse a call and before we add it 
> to the queue of calls for the scheduler to use. It gets decremented after we 
> have 'run' the Call. 
> When setting up a call, its total size is added. So when a new call cannot 
> be dispatched because the BlockingQueue is full, the call queue size should 
> be decremented. We shouldn't add the size of rejected calls to the call queue 
> size.





[jira] [Created] (HBASE-20973) ArrayIndexOutOfBoundsException when rolling back procedure

2018-07-29 Thread Allan Yang (JIRA)
Allan Yang created HBASE-20973:
--

 Summary: ArrayIndexOutOfBoundsException when rolling back procedure
 Key: HBASE-20973
 URL: https://issues.apache.org/jira/browse/HBASE-20973
 Project: HBase
  Issue Type: Sub-task
  Components: amv2
Affects Versions: 2.0.1, 2.1.0
Reporter: Allan Yang
Assignee: Allan Yang


Found this one while investigating HBASE-20921. After the root procedure 
(ModifyTableProcedure in this case) rolled back, an 
ArrayIndexOutOfBoundsException was thrown:
{code}
2018-07-18 01:39:10,241 ERROR [PEWorker-8] procedure2.ProcedureExecutor(159): CODE-BUG: Uncaught runtime exception for pid=5973, state=FAILED:MODIFY_TABLE_REOPEN_ALL_REGIONS, exception=java.lang.NullPointerException via CODE-BUG: Uncaught runtime exception: pid=5974, ppid=5973, state=RUNNABLE:REOPEN_TABLE_REGIONS_CONFIRM_REOPENED; ReopenTableRegionsProcedure table=IntegrationTestBigLinkedList:java.lang.NullPointerException; ModifyTableProcedure table=IntegrationTestBigLinkedList
java.lang.UnsupportedOperationException: unhandled state=MODIFY_TABLE_REOPEN_ALL_REGIONS
    at org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:147)
    at org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:50)
    at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
    at org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1353)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
2018-07-18 01:39:10,243 WARN  [PEWorker-8] procedure2.ProcedureExecutor(1756): Worker terminating UNNATURALLY null
java.lang.ArrayIndexOutOfBoundsException: 1
    at org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.updateState(ProcedureStoreTracker.java:405)
    at org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.delete(ProcedureStoreTracker.java:178)
    at org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:513)
    at org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:505)
    at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.updateStoreTracker(WALProcedureStore.java:741)
    at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.pushData(WALProcedureStore.java:691)
    at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.delete(WALProcedureStore.java:603)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1387)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
{code}

This is a very serious condition. After this exception was thrown, the 
exclusive lock held by ModifyTableProcedure was never released, and all 
procedures against this table were blocked until the master restarted. Since 
the lock info for the procedure won't be restored after a restart, the other 
procedures can proceed again. It is quite embarrassing that a bug saves us... 
(this bug will be fixed in HBASE-20846)

I tried to reproduce this one using the test case in HBASE-20921 but couldn't.
An easy way to resolve this is to add a try-catch, making sure that no matter 
what happens, the table's exclusive lock is always released.
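
The idea of always releasing the lock can be illustrated with a small, self-contained sketch. It is not HBase code: RollbackLockSketch and its methods are hypothetical names, a plain ReentrantLock stands in for the table's exclusive lock, and try/finally is used here as one way to get the "no matter what happens" guarantee.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration only; names are not taken from the HBase code base.
class RollbackLockSketch {
  // Stand-in for the table's exclusive lock.
  private final ReentrantLock tableLock = new ReentrantLock();

  /** Runs the rollback step under the lock and releases it even if the step throws. */
  void rollbackHoldingLock(Runnable rollbackStep) {
    tableLock.lock();
    try {
      rollbackStep.run();
    } finally {
      // Executed on normal return and on any exception, so the lock cannot leak.
      tableLock.unlock();
    }
  }

  boolean isLocked() {
    return tableLock.isLocked();
  }

  public static void main(String[] args) {
    RollbackLockSketch sketch = new RollbackLockSketch();
    try {
      sketch.rollbackHoldingLock(() -> { throw new ArrayIndexOutOfBoundsException(1); });
    } catch (ArrayIndexOutOfBoundsException e) {
      System.out.println("rollback step failed: " + e);
    }
    System.out.println("lock still held: " + sketch.isLocked());
  }
}
```

In this sketch the runtime exception still propagates to the caller, so the failure stays visible, but other work waiting on the same lock is no longer blocked by it.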





[jira] [Updated] (HBASE-20972) Fix call queue buffer size leaking bug

2018-07-29 Thread Xiaolin Ha (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaolin Ha updated HBASE-20972:
---
Affects Version/s: 2.2.0
   2.1.0

> Fix call queue buffer size leaking bug
> --
>
> Key: HBASE-20972
> URL: https://issues.apache.org/jira/browse/HBASE-20972
> Project: HBase
>  Issue Type: Bug
>  Components: IPC/RPC
>Affects Versions: 2.1.0, 2.0.0, 2.2.0
>Reporter: Xiaolin Ha
>Assignee: Xiaolin Ha
>Priority: Major
>
> Call queue size is the total size in bytes of the currently queued and 
> running Calls. It gets incremented after we parse a call and before we add it 
> to the queue of calls for the scheduler to use. It gets decremented after we 
> have 'run' the Call. 
> When setting up a call, its total size is added. So when a new call cannot 
> be dispatched because the BlockingQueue is full, the call queue size should 
> be decremented. We shouldn't add the size of rejected calls to the call queue 
> size.





[jira] [Created] (HBASE-20972) Fix call queue buffer size leaking bug

2018-07-29 Thread Xiaolin Ha (JIRA)
Xiaolin Ha created HBASE-20972:
--

 Summary: Fix call queue buffer size leaking bug
 Key: HBASE-20972
 URL: https://issues.apache.org/jira/browse/HBASE-20972
 Project: HBase
  Issue Type: Bug
  Components: IPC/RPC
Affects Versions: 2.0.0
Reporter: Xiaolin Ha
Assignee: Xiaolin Ha


Call queue size is the total size in bytes of the currently queued and running 
Calls. It gets incremented after we parse a call and before we add it to the 
queue of calls for the scheduler to use. It gets decremented after we have 
'run' the Call. 

When setting up a call, its total size is added. So when a new call cannot be 
dispatched because the BlockingQueue is full, the call queue size should be 
decremented. We shouldn't add the size of rejected calls to the call queue size.
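
The accounting described above can be sketched as follows. This is a simplified model and not the HBase RPC server code: CallQueueSizeSketch, dispatch and runOne are hypothetical names, and an ArrayBlockingQueue of call sizes stands in for the scheduler's queue of calls.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the server-side accounting; not HBase code.
class CallQueueSizeSketch {
  private final AtomicLong callQueueSizeInBytes = new AtomicLong();
  private final BlockingQueue<Long> queue;

  CallQueueSizeSketch(int capacity) {
    this.queue = new ArrayBlockingQueue<>(capacity);
  }

  /** Returns true if the call was queued, false if the queue was full. */
  boolean dispatch(long callSizeInBytes) {
    // The size is counted when the call is set up, before it is queued.
    callQueueSizeInBytes.addAndGet(callSizeInBytes);
    if (!queue.offer(callSizeInBytes)) {
      // Rejected call: undo the increment so the counter does not leak.
      callQueueSizeInBytes.addAndGet(-callSizeInBytes);
      return false;
    }
    return true;
  }

  /** Simulates running one queued call; the size is decremented after the run. */
  void runOne() {
    Long size = queue.poll();
    if (size != null) {
      callQueueSizeInBytes.addAndGet(-size);
    }
  }

  long size() {
    return callQueueSizeInBytes.get();
  }

  public static void main(String[] args) {
    CallQueueSizeSketch sketch = new CallQueueSizeSketch(1);
    sketch.dispatch(100);  // accepted
    sketch.dispatch(50);   // rejected, queue capacity is 1
    System.out.println("size with one queued call: " + sketch.size());
    sketch.runOne();
    System.out.println("size after running it: " + sketch.size());
  }
}
```

Without the decrement in the rejection branch, every rejected call would permanently inflate the counter, which is the leak the issue title refers to.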





[jira] [Commented] (HBASE-20886) [Auth] Support keytab login in hbase client

2018-07-29 Thread Reid Chan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16561367#comment-16561367
 ] 

Reid Chan commented on HBASE-20886:
---

ping [~elserj], if you have free cycles.

> [Auth] Support keytab login in hbase client
> ---
>
> Key: HBASE-20886
> URL: https://issues.apache.org/jira/browse/HBASE-20886
> Project: HBase
>  Issue Type: New Feature
>  Components: asyncclient, Client, security
>Reporter: Reid Chan
>Assignee: Reid Chan
>Priority: Critical
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-20886.master.001.patch, 
> HBASE-20886.master.002.patch, HBASE-20886.master.003.patch, 
> HBASE-20886.master.004.patch, HBASE-20886.master.005.patch, 
> HBASE-20886.master.006.patch, HBASE-20886.master.007.patch, 
> HBASE-20886.master.008.patch
>
>
> There are lots of questions on the user mailing list and the slack channel 
> about how to connect to a kerberized hbase cluster through the hbase-client 
> api.
> {{hbase.client.keytab.file}} and {{hbase.client.keytab.principal}} already 
> exist in the code base, but they are only used in {{Canary}}.
> This issue is to make use of these two configs to support client-side 
> keytab-based login. After this issue is resolved, hbase-client should connect 
> directly to a kerberized cluster without any code changes, as long as 
> {{hbase.client.keytab.file}} and {{hbase.client.keytab.principal}} are 
> specified.





[jira] [Commented] (HBASE-20893) Data loss if splitting region while ServerCrashProcedure executing

2018-07-29 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16561345#comment-16561345
 ] 

Allan Yang commented on HBASE-20893:


{quote}
 I filed an issue to address your finding on miscounts if an exception thrown 
as subtask here
{quote}
[~stack], have you filed an issue for this? I haven't found it. If you don't 
have time, I can do it for you, sir.

> Data loss if splitting region while ServerCrashProcedure executing
> --
>
> Key: HBASE-20893
> URL: https://issues.apache.org/jira/browse/HBASE-20893
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 3.0.0, 2.1.0, 2.0.1
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Fix For: 3.0.0, 2.0.2, 2.2.0, 2.1.1
>
> Attachments: HBASE-20893-branch-2.0.addendum.patch, 
> HBASE-20893.branch-2.0.001.patch, HBASE-20893.branch-2.0.002.patch, 
> HBASE-20893.branch-2.0.003.patch, HBASE-20893.branch-2.0.004.patch, 
> HBASE-20893.branch-2.0.005.patch
>
>
> Similar case as HBASE-20878.





[jira] [Commented] (HBASE-20967) TestFromClientSide3 fails with NPE

2018-07-29 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16561340#comment-16561340
 ] 

Duo Zhang commented on HBASE-20967:
---

No, it passes for me too... The flaky test finder runs the tests again and 
again, and eventually one run fails... See the link in the description. For a 
flaky test, we typically push the patch first and then wait several days to 
see whether the test falls off the flaky list.

> TestFromClientSide3 fails with NPE
> --
>
> Key: HBASE-20967
> URL: https://issues.apache.org/jira/browse/HBASE-20967
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Reporter: Duo Zhang
>Priority: Major
> Attachments: 
> org.apache.hadoop.hbase.client.TestFromClientSide3-output.txt
>
>
> https://builds.apache.org/job/HBASE-Flaky-Tests/35375/testReport/junit/org.apache.hadoop.hbase.client/TestFromClientSide3/testLockLeakWithDelta/
> {noformat}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hbase.client.TestFromClientSide3.find(TestFromClientSide3.java:995)
>   at 
> org.apache.hadoop.hbase.client.TestFromClientSide3.find(TestFromClientSide3.java:1002)
>   at 
> org.apache.hadoop.hbase.client.TestFromClientSide3.testLockLeakWithDelta(TestFromClientSide3.java:783)
> {noformat}





[jira] [Commented] (HBASE-20939) There will be race when we call suspendIfNotReady and then throw ProcedureSuspendedException

2018-07-29 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16561336#comment-16561336
 ] 

Allan Yang commented on HBASE-20939:


Great! Hope this one can solve the 'mysterious'  corrupt procedure

> There will be race when we call suspendIfNotReady and then throw 
> ProcedureSuspendedException
> 
>
> Key: HBASE-20939
> URL: https://issues.apache.org/jira/browse/HBASE-20939
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.0.1, 2.2.0, 2.1.1
>
> Attachments: HBASE-20939.patch, HBASE-20939.patch
>
>
> This is very typical usage in our procedure implementation. For example, in 
> AssignProcedure, we call AM.queueAssign and then suspend ourselves to 
> wait until the AM finishes processing our assign request.
> But there could be races. Think of this:
> 1. We call suspendIfNotReady on an event, and it returns true so we need to 
> wait.
> 2. The event is woken up, and the procedure is added back to the 
> scheduler.
> 3. A worker picks up the procedure and finishes it.
> 4. We finally throw ProcedureSuspendedException and the ProcedureExecutor 
> suspends us and stores the state in the procedure store.
> So we have a half-done procedure in the procedure store forever... This may 
> cause assertion failures when loading procedures. And maybe the worker cannot 
> finish the procedure, as when suspending we need to restore some state, for 
> example, add something to RootProcedureState. But either way, it will still 
> lead to assertion failures or other unexpected errors.
> And this cannot be fixed by simply adding a lock in the procedure, as most 
> of the work is done in the ProcedureExecutor after we throw 
> ProcedureSuspendedException.





[jira] [Created] (HBASE-20971) Please add OWASP Dependency Check to the core build (pom.xml) and all sub-component builds.

2018-07-29 Thread Albert Baker (JIRA)
Albert Baker created HBASE-20971:


 Summary: Please add OWASP Dependency Check to the core build 
(pom.xml) and all sub-component builds.
 Key: HBASE-20971
 URL: https://issues.apache.org/jira/browse/HBASE-20971
 Project: HBase
  Issue Type: New Feature
  Components: build
Affects Versions: 3.0.0, 2.2.0, 2.1.1
 Environment: All development, build, test, environments.
Reporter: Albert Baker


Please add OWASP Dependency Check to the build (pom.xml). OWASP DC makes an 
outbound REST call to MITRE Common Vulnerabilities & Exposures (CVE) to look up 
each dependent .jar and list any known vulnerabilities for it. This step is 
needed because a manual MITRE CVE lookup on the main component does not check 
for vulnerabilities in components or in dependent libraries.

OWASP Dependency Check (https://www.owasp.org/index.php/OWASP_Dependency_Check) 
has plug-ins for most Java build/make types (ant, maven, ivy, gradle).

Also, add the appropriate command to the nightly build to generate a report of 
all known vulnerabilities in any third-party libraries/dependencies that get 
pulled in. Example: mvn -Powasp -Dtest=false -DfailIfNoTests=false clean 
aggregate

Generating this report nightly/weekly will inform the project's development 
team if any dependent libraries have reported known vulnerabilities. Project 
teams that keep up with removing vulnerabilities on a weekly basis will help 
protect businesses that rely on these open source components.





[jira] [Commented] (HBASE-20578) Support region server group in target cluster

2018-07-29 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16561257#comment-16561257
 ] 

Ted Yu commented on HBASE-20578:


Apart from the following static method:
{code}
  protected static List fetchSlavesAddresses(ZooKeeperWatcher zkw)
{code}
which other static method are you referring to?

HBaseReplicationEndpoint is marked @InterfaceAudience.Private.
I think a small amount of refactoring of HBaseReplicationEndpoint may help you 
avoid code duplication.

> Support region server group in target cluster
> -
>
> Key: HBASE-20578
> URL: https://issues.apache.org/jira/browse/HBASE-20578
> Project: HBase
>  Issue Type: Improvement
>  Components: Replication, rsgroup
>Reporter: Ted Yu
>Assignee: Albert Lee
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-20578-001.patch
>
>
> When source tables belong to non-default region server group(s) and there are 
> counterpart region server groups in the target cluster, we should support 
> replicating to the target cluster using the region server group mapping.





[jira] [Commented] (HBASE-18477) Umbrella JIRA for HBase Read Replica clusters

2018-07-29 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16561247#comment-16561247
 ] 

Hudson commented on HBASE-18477:


Results for branch HBASE-18477
[build #279 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-18477/279/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-18477/279//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-18477/279//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-18477/279//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(x) {color:red}-1 client integration test{color}
--Failed when running client tests on top of Hadoop 2. [see log for 
details|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-18477/279//artifact/output-integration/hadoop-2.log].
 (note that this means we didn't run on Hadoop 3)


> Umbrella JIRA for HBase Read Replica clusters
> -
>
> Key: HBASE-18477
> URL: https://issues.apache.org/jira/browse/HBASE-18477
> Project: HBase
>  Issue Type: New Feature
>Reporter: Zach York
>Assignee: Zach York
>Priority: Major
> Attachments: HBase Read-Replica Clusters Scope doc.docx, HBase 
> Read-Replica Clusters Scope doc.pdf, HBase Read-Replica Clusters Scope 
> doc_v2.docx, HBase Read-Replica Clusters Scope doc_v2.pdf
>
>
> Recently, changes (such as HBASE-17437) have unblocked HBase to run with a 
> root directory external to the cluster (such as in Amazon S3). This means 
> that the data is stored outside of the cluster and can be accessible after 
> the cluster has been terminated. One use case that is often asked about is 
> pointing multiple clusters to one root directory (sharing the data) to have 
> read resiliency in the case of a cluster failure.
>  
> This JIRA is an umbrella JIRA to contain all the tasks necessary to create a 
> read-replica HBase cluster that is pointed at the same root directory.
>  
> This requires:
> * Making the Read-Replica cluster Read-Only (no metadata 
> operation or data operations).
> * Separating the hbase:meta table for each cluster (otherwise HBase gets 
> confused with multiple clusters trying to update the meta table with their ip 
> addresses).
> * Adding refresh functionality for the meta table to ensure new metadata is 
> picked up on the read replica cluster.
> * Adding refresh functionality for HFiles for a given table to ensure new data 
> is picked up on the read replica cluster.
>  
> This can be used with any existing cluster that is backed by an external 
> filesystem.
>  
> Please note that this feature is still quite manual (with the potential for 
> automation later).
>  
> More information on this particular feature can be found here: 
> https://aws.amazon.com/blogs/big-data/setting-up-read-replica-clusters-with-hbase-on-amazon-s3/





[jira] [Commented] (HBASE-20749) Upgrade our use of checkstyle to 8.6+

2018-07-29 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16561225#comment-16561225
 ] 

Hudson commented on HBASE-20749:


Results for branch HBASE-20749
[build #7 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20749/7/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20749/7//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20749/7//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20749/7//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Upgrade our use of checkstyle to 8.6+
> -
>
> Key: HBASE-20749
> URL: https://issues.apache.org/jira/browse/HBASE-20749
> Project: HBase
>  Issue Type: Improvement
>  Components: build, community
>Reporter: Sean Busbey
>Assignee: Mike Drob
>Priority: Minor
> Attachments: HBASE-20749.master.001.patch
>
>
> We should upgrade our checkstyle version to 8.6 or later so we can use the 
> "match violation message to this regex" feature for suppression. That will 
> allow us to make sure we don't regress on HTrace v3 vs v4 APIs (came up in 
> HBASE-20332).
> We're currently blocked on upgrading to 8.3+ by [checkstyle 
> #5279|https://github.com/checkstyle/checkstyle/issues/5279], a regression 
> that flags our use of both the "separate import groups" and "put static 
> imports over here" configs as an error.





[jira] [Commented] (HBASE-20538) Upgrade our hadoop versions to 2.7.7 and 3.0.3

2018-07-29 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16561181#comment-16561181
 ] 

Hudson commented on HBASE-20538:


Results for branch branch-2
[build #1041 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1041/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1041//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1041//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1041//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Upgrade our hadoop versions to 2.7.7 and 3.0.3
> --
>
> Key: HBASE-20538
> URL: https://issues.apache.org/jira/browse/HBASE-20538
> Project: HBase
>  Issue Type: Bug
>  Components: java, security
>Reporter: stack
>Assignee: Duo Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.0.2, 2.2.0, 2.1.1
>
> Attachments: 
> 0001-HBASE-20538-TestSaslFanOutOneBlockAsyncDFSOutput-DISABLE.patch, 
> HBASE-20538-v1.patch, HBASE-20538.patch
>
>
> Infra must have updated our JDK over the weekend. The test 
> TestSaslFanOutOneBlockAsyncDFSOutput fails solidly since.
> [~gabor.bota] ran into it already over in HDFS-13494 and provided nice 
> pointers as to cause.





[jira] [Commented] (HBASE-20538) Upgrade our hadoop versions to 2.7.7 and 3.0.3

2018-07-29 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16561179#comment-16561179
 ] 

Hudson commented on HBASE-20538:


Results for branch branch-2.0
[build #607 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/607/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/607//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/607//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/607//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.




[jira] [Commented] (HBASE-20749) Upgrade our use of checkstyle to 8.6+

2018-07-29 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16561169#comment-16561169
 ] 

Sean Busbey commented on HBASE-20749:
-

The patch on the branch looks good. +1 pending QABot, since a couple of things 
get checked on patch submission that the nightly build doesn't look at. (I 
believe they'll all pass on this patch, but we might as well have it give the +1.)



[jira] [Commented] (HBASE-20538) Upgrade our hadoop versions to 2.7.7 and 3.0.3

2018-07-29 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16561166#comment-16561166
 ] 

Hudson commented on HBASE-20538:


Results for branch branch-2.1
[build #119 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/119/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/119//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/119//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/119//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}




[jira] [Commented] (HBASE-19036) Add action in Chaos Monkey to restart Active Namenode

2018-07-29 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-19036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16561156#comment-16561156
 ] 

Hadoop QA commented on HBASE-19036:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 7 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 33s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 48s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 13s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 18s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 24s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 31s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 42s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 15s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 4s{color} | {color:red} hbase-server: The patch generated 2 new + 31 unchanged - 0 fixed = 33 total (was 31) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 13s{color} | {color:red} hbase-it: The patch generated 3 new + 13 unchanged - 0 fixed = 16 total (was 13) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 16s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 9m 51s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}171m 41s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 6s{color} | {color:green} hbase-it in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 47s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}216m 34s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-19036 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12933516/HBASE-19036.master.001.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 227c89997b0a 4.4.0-130-generic #156-Ubuntu SMP Thu Jun 14 08:53:28 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 

[jira] [Commented] (HBASE-19036) Add action in Chaos Monkey to restart Active Namenode

2018-07-29 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-19036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16561122#comment-16561122
 ] 

Ted Yu commented on HBASE-19036:


{code}
+clusterManager.start(ServiceType.HADOOP_NAMENODE, serverName.getHostname(),
+  serverName.getPort());
{code}
Have you tried the above in a secure cluster?

> Add action in Chaos Monkey to restart Active Namenode
> -
>
> Key: HBASE-19036
> URL: https://issues.apache.org/jira/browse/HBASE-19036
> Project: HBase
>  Issue Type: Improvement
>Reporter: Monani Mihir
>Assignee: Monani Mihir
>Priority: Minor
> Attachments: HBASE-19036.branch-1.001.patch, 
> HBASE-19036.branch-1.001.patch, HBASE-19036.branch-1.002.patch, 
> HBASE-19036.branch-1.002.patch, HBASE-19036.master.001.patch
>
>
> Under hbase-it we have many actions related to DataNode, ZooKeeper, and 
> HMaster that are used with Chaos Monkey and are useful in testing. Having an 
> action that restarts the active NameNode would be useful too.





[jira] [Updated] (HBASE-19036) Add action in Chaos Monkey to restart Active Namenode

2018-07-29 Thread Monani Mihir (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-19036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Monani Mihir updated HBASE-19036:
-
Status: In Progress  (was: Patch Available)

Added a new patch for the master branch.



[jira] [Updated] (HBASE-19036) Add action in Chaos Monkey to restart Active Namenode

2018-07-29 Thread Monani Mihir (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-19036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Monani Mihir updated HBASE-19036:
-
Status: Patch Available  (was: In Progress)



[jira] [Updated] (HBASE-19036) Add action in Chaos Monkey to restart Active Namenode

2018-07-29 Thread Monani Mihir (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-19036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Monani Mihir updated HBASE-19036:
-
Attachment: HBASE-19036.master.001.patch



[jira] [Commented] (HBASE-20881) Introduce a region transition procedure to handle all the state transition for a region

2018-07-29 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16561107#comment-16561107
 ] 

Duo Zhang commented on HBASE-20881:
---

There is some progress. I think this can only go into 2.2+: 
RegionTransitionProcedure is part of the AssignmentManager, so if we replace it 
with another procedure we need to modify the code in AssignmentManager, and 
that will conflict with the old code that uses RegionTransitionProcedure.

The solution is simple: disable the balancer before restarting the master with 
the new code, so that there is no RegionTransitionProcedure at restart, and 
everything will be OK. We could also add a check when loading procedures: if 
there are any AssignProcedure/UnassignProcedure instances, we abort and tell 
users to restart with the old code first to finish these procedures.

But I think this may be too much for a patch release, so let's do it in 2.2+.
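
A minimal sketch of the load-time check described above, assuming loaded 
procedures can be identified by their simple class name. The class 
LegacyProcedureCheck, its methods, and the use of plain strings are 
hypothetical and for illustration only; they are not HBase APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LegacyProcedureCheck {
  // Procedure types that the new code would no longer know how to resume.
  private static final Set<String> LEGACY_TYPES =
      new HashSet<>(Arrays.asList("AssignProcedure", "UnassignProcedure"));

  /** Returns the legacy procedure type names found among the loaded ones. */
  public static List<String> findLegacy(List<String> loadedTypes) {
    List<String> found = new ArrayList<>();
    for (String type : loadedTypes) {
      if (LEGACY_TYPES.contains(type)) {
        found.add(type);
      }
    }
    return found;
  }

  /** Aborts (by throwing) if any legacy procedure is still unfinished. */
  public static void checkLoadable(List<String> loadedTypes) {
    List<String> legacy = findLegacy(loadedTypes);
    if (!legacy.isEmpty()) {
      throw new IllegalStateException("Found unfinished " + legacy
          + "; restart with the old code first to finish these procedures");
    }
  }

  public static void main(String[] args) {
    // No legacy procedures pending: startup may continue.
    checkLoadable(Arrays.asList("ServerCrashProcedure"));
    System.out.println("ok: no legacy procedures");

    // A pending AssignProcedure makes the check abort.
    try {
      checkLoadable(Arrays.asList("ServerCrashProcedure", "AssignProcedure"));
    } catch (IllegalStateException e) {
      System.out.println("abort: " + e.getMessage());
    }
  }
}
```

Failing fast at load time, rather than trying to convert old procedures, 
keeps the upgrade path to the two steps named above: disable the balancer, 
then restart.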

> Introduce a region transition procedure to handle all the state transition 
> for a region
> ---
>
> Key: HBASE-20881
> URL: https://issues.apache.org/jira/browse/HBASE-20881
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
>
> We now have an AssignProcedure, an UnassignProcedure, and also a 
> MoveRegionProcedure which schedules an AssignProcedure and an 
> UnassignProcedure to move a region. This makes the logic a bit complicated, as 
> MRP is not a RIT, so when SCP cannot interrupt it directly...





[jira] [Updated] (HBASE-20538) Upgrade our hadoop versions to 2.7.7 and 3.0.3

2018-07-29 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-20538:
--
Release Note: Update hadoop-two.version to 2.7.7 and hadoop-three.version 
to 3.0.3 due to a JDK issue which is solved by HADOOP-15473.



[jira] [Updated] (HBASE-20538) Upgrade our hadoop versions to 2.7.7 and 3.0.3

2018-07-29 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-20538:
--
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Pushed to branch-2.0+. The commit message was a bit strange, so I reverted and 
reapplied it...

Thanks all for reviewing.



[jira] [Updated] (HBASE-20538) Upgrade our hadoop versions to 2.7.7 and 3.0.3

2018-07-29 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-20538:
--
Summary: Upgrade our hadoop versions to 2.7.7 and 3.0.3  (was: Upgrade our 
hadoop version to 2.7.7 and 3.0.3)



[jira] [Updated] (HBASE-20538) Upgrade our hadoop version to 2.7.7 and 3.0.3

2018-07-29 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-20538:
--
Summary: Upgrade our hadoop version to 2.7.7 and 3.0.3  (was: Upgrade our 
hadoop-two.version to 2.7.7 and 3.0.3)

> Upgrade our hadoop version to 2.7.7 and 3.0.3
> -
>
> Key: HBASE-20538
> URL: https://issues.apache.org/jira/browse/HBASE-20538
> Project: HBase
>  Issue Type: Bug
>  Components: java, security
>Reporter: stack
>Assignee: Duo Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.0.2, 2.2.0, 2.1.1
>
> Attachments: 
> 0001-HBASE-20538-TestSaslFanOutOneBlockAsyncDFSOutput-DISABLE.patch, 
> HBASE-20538-v1.patch, HBASE-20538.patch
>
>
> Infra must have updated our JDK over the weekend. The test 
> TestSaslFanOutOneBlockAsyncDFSOutput fails solidly since.
> [~gabor.bota] ran into it already over in HDFS-13494 and provided nice 
> pointers as to cause.





[jira] [Created] (HBASE-20970) Update hadoop check versions

2018-07-29 Thread Duo Zhang (JIRA)
Duo Zhang created HBASE-20970:
-

 Summary: Update hadoop check versions
 Key: HBASE-20970
 URL: https://issues.apache.org/jira/browse/HBASE-20970
 Project: HBase
  Issue Type: Bug
  Components: build
Reporter: Duo Zhang
 Fix For: 3.0.0, 2.0.2, 2.2.0, 2.1.1








[jira] [Updated] (HBASE-20538) Upgrade our hadoop-two.version to 2.7.7 and 3.0.3

2018-07-29 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-20538:
--
Fix Version/s: 2.2.0

> Upgrade our hadoop-two.version to 2.7.7 and 3.0.3
> -
>
> Key: HBASE-20538
> URL: https://issues.apache.org/jira/browse/HBASE-20538
> Project: HBase
>  Issue Type: Bug
>  Components: java, security
>Reporter: stack
>Assignee: Duo Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.0.2, 2.2.0, 2.1.1
>
> Attachments: 
> 0001-HBASE-20538-TestSaslFanOutOneBlockAsyncDFSOutput-DISABLE.patch, 
> HBASE-20538-v1.patch, HBASE-20538.patch
>
>
> Infra must have updated our JDK over the weekend. The test 
> TestSaslFanOutOneBlockAsyncDFSOutput fails solidly since.
> [~gabor.bota] ran into it already over in HDFS-13494 and provided nice 
> pointers as to cause.





[jira] [Commented] (HBASE-20538) Upgrade our hadoop-two.version to 2.7.7 and 3.0.3

2018-07-29 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16561090#comment-16561090
 ] 

Duo Zhang commented on HBASE-20538:
---

Let me commit...

> Upgrade our hadoop-two.version to 2.7.7 and 3.0.3
> -
>
> Key: HBASE-20538
> URL: https://issues.apache.org/jira/browse/HBASE-20538
> Project: HBase
>  Issue Type: Bug
>  Components: java, security
>Reporter: stack
>Assignee: Duo Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.0.2, 2.1.1
>
> Attachments: 
> 0001-HBASE-20538-TestSaslFanOutOneBlockAsyncDFSOutput-DISABLE.patch, 
> HBASE-20538-v1.patch, HBASE-20538.patch
>
>
> Infra must have updated our JDK over the weekend. The test 
> TestSaslFanOutOneBlockAsyncDFSOutput fails solidly since.
> [~gabor.bota] ran into it already over in HDFS-13494 and provided nice 
> pointers as to cause.





[jira] [Commented] (HBASE-20968) list_procedures_test fails due to no matching regex

2018-07-29 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16561040#comment-16561040
 ] 

Ted Yu commented on HBASE-20968:


bq. -Dhadoop-three.version=3.0.0

Have you tried without the above? The default hadoop3 version was changed 
recently.

> list_procedures_test fails due to no matching regex
> ---
>
> Key: HBASE-20968
> URL: https://issues.apache.org/jira/browse/HBASE-20968
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Jack Bearden
>Priority: Major
>
> From test output against hadoop3:
> {code}
> 2018-07-28 12:04:24,838 DEBUG [Time-limited test] procedure2.ProcedureExecutor(948): Stored pid=12, state=RUNNABLE, hasLock=false; org.apache.hadoop.hbase.client.procedure.ShellTestProcedure
> 2018-07-28 12:04:24,864 INFO  [RS-EventLoopGroup-1-3] ipc.ServerRpcConnection(556): Connection from 172.18.128.12:46918, version=3.0.0-SNAPSHOT, sasl=false, ugi=hbase (auth: SIMPLE), service=MasterService
> 2018-07-28 12:04:24,900 DEBUG [Thread-114] master.MasterRpcServices(1157): Checking to see if procedure is done pid=11
> F
> ===
> Failure: test_list_procedures(Hbase::ListProceduresTest)
> src/test/ruby/shell/list_procedures_test.rb:65:in `block in test_list_procedures'
>  62: end
>  63:   end
>  64:
>   => 65:   assert_equal(1, matching_lines)
>  66: end
>  67:   end
>  68: end
> <1> expected but was
> <0>
> ===
> ...
> 2018-07-28 12:04:25,374 INFO  [PEWorker-9] procedure2.ProcedureExecutor(1316): Finished pid=12, state=SUCCESS, hasLock=false; org.apache.hadoop.hbase.client.procedure.ShellTestProcedure in 336msec
> {code}
> The completion of the ShellTestProcedure was after the assertion was raised.
> {code}
> def create_procedure_regexp(table_name)
>   regexp_string = '[0-9]+ .*ShellTestProcedure SUCCESS.*' \
> {code}
> The regex used by the test isn't found in test output either.
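
To illustrate the last point, here is a rough Java sketch that applies only 
the visible part of the test's regexp to the two ShellTestProcedure log lines 
quoted above. The regexp in the Ruby test continues on a line that is not 
shown here, so this is not the full pattern; the class 
ProcedureRegexCheck, the method countMatchingLines, and the "hypothetical" 
sample line are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class ProcedureRegexCheck {
  // Visible prefix of the test's regexp only; the remainder is elided in the report.
  static final Pattern VISIBLE_PREFIX =
      Pattern.compile("[0-9]+ .*ShellTestProcedure SUCCESS.*");

  /** Counts the lines that contain a match for the visible prefix. */
  public static int countMatchingLines(List<String> lines) {
    int count = 0;
    for (String line : lines) {
      if (VISIBLE_PREFIX.matcher(line).find()) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    // The two ShellTestProcedure log lines from the report, rejoined.
    List<String> logLines = Arrays.asList(
        "procedure2.ProcedureExecutor(948): Stored pid=12, state=RUNNABLE, "
            + "hasLock=false; org.apache.hadoop.hbase.client.procedure.ShellTestProcedure",
        "procedure2.ProcedureExecutor(1316): Finished pid=12, state=SUCCESS, "
            + "hasLock=false; org.apache.hadoop.hbase.client.procedure.ShellTestProcedure in 336msec");
    System.out.println(countMatchingLines(logLines));

    // A made-up line in the shape the prefix expects: id, class name, then SUCCESS.
    String hypothetical =
        "12 org.apache.hadoop.hbase.client.procedure.ShellTestProcedure SUCCESS";
    System.out.println(countMatchingLines(Arrays.asList(hypothetical)));
  }
}
```

In the log lines the state appears as state=SUCCESS before the class name, 
so the literal "ShellTestProcedure SUCCESS" never occurs there.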


