[jira] [Commented] (HBASE-20226) Performance Improvement Taking Large Snapshots In Remote Filesystems

2020-07-31 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17168631#comment-17168631
 ] 

Hudson commented on HBASE-20226:


Results for branch branch-1
[build #1335 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/1335/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/1335//General_Nightly_Build_Report/]


(x) {color:red}-1 jdk7 checks{color}
-- For more information [see jdk7 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/1335//JDK7_Nightly_Build_Report/]


(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/1335//JDK8_Nightly_Build_Report_(Hadoop2)/]




(x) {color:red}-1 source release artifact{color}
-- See build output for details.


> Performance Improvement Taking Large Snapshots In Remote Filesystems
> 
>
> Key: HBASE-20226
> URL: https://issues.apache.org/jira/browse/HBASE-20226
> Project: HBase
>  Issue Type: Improvement
>  Components: snapshots
>Affects Versions: 3.0.0-alpha-1, 2.3.0, 1.7.0
> Environment: HBase 1.4.0 running on an AWS EMR cluster with the 
> hbase.rootdir set to point to a folder in S3 
>Reporter: Saad Mufti
>Assignee: Bharath Vissapragada
>Priority: Minor
>  Labels: perfomance
> Fix For: 3.0.0-alpha-1, 2.3.1, 1.7.0, 2.4.0, 2.2.7
>
> Attachments: HBASE-20226..01.patch
>
>
> When taking a snapshot of any table, one of the last steps is to delete the
> region manifests, which have already been rolled up into a larger overall
> manifest and thus contain redundant information.
> This proposal is to perform the deletion in a thread pool bounded by
> hbase.snapshot.thread.pool.max. For large tables with many regions, the
> current single-threaded deletion takes longer than all the rest of the
> snapshot tasks when the HBase data and the snapshot folder are both on a
> remote filesystem like S3.
> I have a patch for this proposal almost ready and will submit it tomorrow for
> feedback, although I haven't had a chance to write any tests yet.
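For anyone skimming the archive, here is a minimal sketch of what the proposal amounts to. It is not the attached patch; fs, conf, and regionManifestPaths are placeholder names assumed to come from the surrounding snapshot code.

{code}
// Sketch only: delete the per-region manifest files with a bounded pool instead
// of a sequential loop. fs, conf and regionManifestPaths are placeholders for
// whatever the real SnapshotManifest code has in scope.
int maxThreads = conf.getInt("hbase.snapshot.thread.pool.max", 8);
ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
try {
  List<Future<Boolean>> pending = new ArrayList<>();
  for (Path manifest : regionManifestPaths) {
    pending.add(pool.submit(() -> fs.delete(manifest, true)));
  }
  for (Future<Boolean> f : pending) {
    f.get();  // rethrows if any individual delete failed
  }
} catch (InterruptedException | ExecutionException e) {
  throw new IOException("Failed to delete region manifests", e);
} finally {
  pool.shutdown();
}
{code}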



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-20226) Performance Improvement Taking Large Snapshots In Remote Filesystems

2020-07-30 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17168283#comment-17168283
 ] 

Hudson commented on HBASE-20226:


Results for branch branch-2.2
[build #922 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/922/]: 
(/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/922//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/922//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/922//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Performance Improvement Taking Large Snapshots In Remote Filesystems
> 
>
> Key: HBASE-20226
> URL: https://issues.apache.org/jira/browse/HBASE-20226
> Project: HBase
>  Issue Type: Improvement
>  Components: snapshots
>Affects Versions: 3.0.0-alpha-1, 2.3.0, 1.7.0
> Environment: HBase 1.4.0 running on an AWS EMR cluster with the 
> hbase.rootdir set to point to a folder in S3 
>Reporter: Saad Mufti
>Assignee: Bharath Vissapragada
>Priority: Minor
>  Labels: perfomance
> Fix For: 3.0.0-alpha-1, 2.3.1, 1.7.0, 2.4.0, 2.2.7
>
> Attachments: HBASE-20226..01.patch
>
>
> When taking a snapshot of any table, one of the last steps is to delete the
> region manifests, which have already been rolled up into a larger overall
> manifest and thus contain redundant information.
> This proposal is to perform the deletion in a thread pool bounded by
> hbase.snapshot.thread.pool.max. For large tables with many regions, the
> current single-threaded deletion takes longer than all the rest of the
> snapshot tasks when the HBase data and the snapshot folder are both on a
> remote filesystem like S3.
> I have a patch for this proposal almost ready and will submit it tomorrow for
> feedback, although I haven't had a chance to write any tests yet.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-20226) Performance Improvement Taking Large Snapshots In Remote Filesystems

2020-07-30 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17167991#comment-17167991
 ] 

Hudson commented on HBASE-20226:


Results for branch master
[build #1798 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/1798/]: (x) 
*{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/1798/General_20Nightly_20Build_20Report/]






(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/1798/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/1798/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(x) {color:red}-1 client integration test{color}
-- Failed when running client tests on top of Hadoop 2. [see log for
details|https://builds.apache.org/job/HBase%20Nightly/job/master/1798//artifact/output-integration/hadoop-2.log].
(Note that this means we didn't run on Hadoop 3.)


> Performance Improvement Taking Large Snapshots In Remote Filesystems
> 
>
> Key: HBASE-20226
> URL: https://issues.apache.org/jira/browse/HBASE-20226
> Project: HBase
>  Issue Type: Improvement
>  Components: snapshots
>Affects Versions: 3.0.0-alpha-1, 2.3.0, 1.7.0
> Environment: HBase 1.4.0 running on an AWS EMR cluster with the 
> hbase.rootdir set to point to a folder in S3 
>Reporter: Saad Mufti
>Assignee: Bharath Vissapragada
>Priority: Minor
>  Labels: perfomance
> Fix For: 3.0.0-alpha-1, 2.3.1, 1.7.0, 2.2.7
>
> Attachments: HBASE-20226..01.patch
>
>
> When taking a snapshot of any table, one of the last steps is to delete the
> region manifests, which have already been rolled up into a larger overall
> manifest and thus contain redundant information.
> This proposal is to perform the deletion in a thread pool bounded by
> hbase.snapshot.thread.pool.max. For large tables with many regions, the
> current single-threaded deletion takes longer than all the rest of the
> snapshot tasks when the HBase data and the snapshot folder are both on a
> remote filesystem like S3.
> I have a patch for this proposal almost ready and will submit it tomorrow for
> feedback, although I haven't had a chance to write any tests yet.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-20226) Performance Improvement Taking Large Snapshots In Remote Filesystems

2020-07-30 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17167978#comment-17167978
 ] 

Hudson commented on HBASE-20226:


Results for branch branch-2.3
[build #195 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.3/195/]: 
(/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.3/195/General_20Nightly_20Build_20Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.3/195/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.3/195/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.3/195/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Performance Improvement Taking Large Snapshots In Remote Filesystems
> 
>
> Key: HBASE-20226
> URL: https://issues.apache.org/jira/browse/HBASE-20226
> Project: HBase
>  Issue Type: Improvement
>  Components: snapshots
>Affects Versions: 3.0.0-alpha-1, 2.3.0, 1.7.0
> Environment: HBase 1.4.0 running on an AWS EMR cluster with the 
> hbase.rootdir set to point to a folder in S3 
>Reporter: Saad Mufti
>Assignee: Bharath Vissapragada
>Priority: Minor
>  Labels: perfomance
> Fix For: 3.0.0-alpha-1, 2.3.1, 1.7.0, 2.2.7
>
> Attachments: HBASE-20226..01.patch
>
>
> When taking a snapshot of any table, one of the last steps is to delete the
> region manifests, which have already been rolled up into a larger overall
> manifest and thus contain redundant information.
> This proposal is to perform the deletion in a thread pool bounded by
> hbase.snapshot.thread.pool.max. For large tables with many regions, the
> current single-threaded deletion takes longer than all the rest of the
> snapshot tasks when the HBase data and the snapshot folder are both on a
> remote filesystem like S3.
> I have a patch for this proposal almost ready and will submit it tomorrow for
> feedback, although I haven't had a chance to write any tests yet.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-20226) Performance Improvement Taking Large Snapshots In Remote Filesystems

2020-07-29 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17167634#comment-17167634
 ] 

Hudson commented on HBASE-20226:


Results for branch branch-2
[build #2762 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/2762/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/2762/General_20Nightly_20Build_20Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/2756/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/2742/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/2757/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Performance Improvement Taking Large Snapshots In Remote Filesystems
> 
>
> Key: HBASE-20226
> URL: https://issues.apache.org/jira/browse/HBASE-20226
> Project: HBase
>  Issue Type: Improvement
>  Components: snapshots
>Affects Versions: 3.0.0-alpha-1, 2.3.0, 1.7.0
> Environment: HBase 1.4.0 running on an AWS EMR cluster with the 
> hbase.rootdir set to point to a folder in S3 
>Reporter: Saad Mufti
>Assignee: Bharath Vissapragada
>Priority: Minor
>  Labels: perfomance
> Fix For: 3.0.0-alpha-1, 2.3.1, 1.7.0, 2.2.7
>
> Attachments: HBASE-20226..01.patch
>
>
> When taking a snapshot of any table, one of the last steps is to delete the
> region manifests, which have already been rolled up into a larger overall
> manifest and thus contain redundant information.
> This proposal is to perform the deletion in a thread pool bounded by
> hbase.snapshot.thread.pool.max. For large tables with many regions, the
> current single-threaded deletion takes longer than all the rest of the
> snapshot tasks when the HBase data and the snapshot folder are both on a
> remote filesystem like S3.
> I have a patch for this proposal almost ready and will submit it tomorrow for
> feedback, although I haven't had a chance to write any tests yet.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-20226) Performance Improvement Taking Large Snapshots In Remote Filesystems

2020-07-28 Thread Bharath Vissapragada (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17166636#comment-17166636
 ] 

Bharath Vissapragada commented on HBASE-20226:
--

[~zyork] Thanks for the quick clarification. We use a "fast" FS (non-S3), but
the delete is still choked for some HDFS-side reason, and we clearly aren't
saturating the handlers and CPU on the NameNode side. That is very visible in
these sequential deletes, so the attempt here is to parallelize them. I put up
the patch [here|https://github.com/apache/hbase/pull/2159]; do you have a few
cycles to take a look?

> Performance Improvement Taking Large Snapshots In Remote Filesystems
> 
>
> Key: HBASE-20226
> URL: https://issues.apache.org/jira/browse/HBASE-20226
> Project: HBase
>  Issue Type: Improvement
>  Components: snapshots
>Affects Versions: 3.0.0-alpha-1, 2.3.0, 1.7.0
> Environment: HBase 1.4.0 running on an AWS EMR cluster with the 
> hbase.rootdir set to point to a folder in S3 
>Reporter: Saad Mufti
>Assignee: Bharath Vissapragada
>Priority: Minor
>  Labels: perfomance
> Attachments: HBASE-20226..01.patch
>
>
> When taking a snapshot of any table, one of the last steps is to delete the
> region manifests, which have already been rolled up into a larger overall
> manifest and thus contain redundant information.
> This proposal is to perform the deletion in a thread pool bounded by
> hbase.snapshot.thread.pool.max. For large tables with many regions, the
> current single-threaded deletion takes longer than all the rest of the
> snapshot tasks when the HBase data and the snapshot folder are both on a
> remote filesystem like S3.
> I have a patch for this proposal almost ready and will submit it tomorrow for
> feedback, although I haven't had a chance to write any tests yet.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-20226) Performance Improvement Taking Large Snapshots In Remote Filesystems

2020-07-28 Thread Zach York (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17166618#comment-17166618
 ] 

Zach York commented on HBASE-20226:
---

[~bharathv] This improvement can still be valid if you are using a slow
filesystem. The linked JIRA solves this issue by allowing the user to move to a
faster filesystem for temporary storage, but for people who don't want that or
can't do that, this improvement may still make sense.

> Performance Improvement Taking Large Snapshots In Remote Filesystems
> 
>
> Key: HBASE-20226
> URL: https://issues.apache.org/jira/browse/HBASE-20226
> Project: HBase
>  Issue Type: Improvement
>  Components: snapshots
>Affects Versions: 3.0.0-alpha-1, 2.3.0, 1.7.0
> Environment: HBase 1.4.0 running on an AWS EMR cluster with the 
> hbase.rootdir set to point to a folder in S3 
>Reporter: Saad Mufti
>Assignee: Bharath Vissapragada
>Priority: Minor
>  Labels: perfomance
> Attachments: HBASE-20226..01.patch
>
>
> When taking a snapshot of any table, one of the last steps is to delete the
> region manifests, which have already been rolled up into a larger overall
> manifest and thus contain redundant information.
> This proposal is to perform the deletion in a thread pool bounded by
> hbase.snapshot.thread.pool.max. For large tables with many regions, the
> current single-threaded deletion takes longer than all the rest of the
> snapshot tasks when the HBase data and the snapshot folder are both on a
> remote filesystem like S3.
> I have a patch for this proposal almost ready and will submit it tomorrow for
> feedback, although I haven't had a chance to write any tests yet.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-20226) Performance Improvement Taking Large Snapshots In Remote Filesystems

2020-07-28 Thread Bharath Vissapragada (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17166548#comment-17166548
 ] 

Bharath Vissapragada commented on HBASE-20226:
--

Thanks for taking a look, [~yuzhih...@gmail.com]. I've
[cleaned up|https://github.com/apache/hbase/pull/2159] the patch a bit.

> Performance Improvement Taking Large Snapshots In Remote Filesystems
> 
>
> Key: HBASE-20226
> URL: https://issues.apache.org/jira/browse/HBASE-20226
> Project: HBase
>  Issue Type: Improvement
>  Components: snapshots
>Affects Versions: 1.4.0
> Environment: HBase 1.4.0 running on an AWS EMR cluster with the 
> hbase.rootdir set to point to a folder in S3 
>Reporter: Saad Mufti
>Priority: Minor
> Attachments: HBASE-20226..01.patch
>
>
> When taking a snapshot of any table, one of the last steps is to delete the
> region manifests, which have already been rolled up into a larger overall
> manifest and thus contain redundant information.
> This proposal is to perform the deletion in a thread pool bounded by
> hbase.snapshot.thread.pool.max. For large tables with many regions, the
> current single-threaded deletion takes longer than all the rest of the
> snapshot tasks when the HBase data and the snapshot folder are both on a
> remote filesystem like S3.
> I have a patch for this proposal almost ready and will submit it tomorrow for
> feedback, although I haven't had a chance to write any tests yet.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-20226) Performance Improvement Taking Large Snapshots In Remote Filesystems

2020-07-27 Thread Ted Yu (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17166082#comment-17166082
 ] 

Ted Yu commented on HBASE-20226:


{code}
+if (v1Regions.size() > 0 || v2Regions.size() > 0) {
{code}
It seems the thread pool is needed only when v1Regions.size() + v2Regions.size() > 1.

There are also a few FindBugs warnings to be addressed.
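On the size check specifically, a hedged reading of the suggestion in code (variable names taken from the snippet above; the deletion logic itself is elided):

{code}
int manifestCount = v1Regions.size() + v2Regions.size();
if (manifestCount > 1) {
  // worth creating the bounded executor and submitting the deletes in parallel
} else if (manifestCount == 1) {
  // a single manifest can be deleted inline, no pool needed
}
{code}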

> Performance Improvement Taking Large Snapshots In Remote Filesystems
> 
>
> Key: HBASE-20226
> URL: https://issues.apache.org/jira/browse/HBASE-20226
> Project: HBase
>  Issue Type: Improvement
>  Components: snapshots
>Affects Versions: 1.4.0
> Environment: HBase 1.4.0 running on an AWS EMR cluster with the 
> hbase.rootdir set to point to a folder in S3 
>Reporter: Saad Mufti
>Priority: Minor
> Attachments: HBASE-20226..01.patch
>
>
> When taking a snapshot of any table, one of the last steps is to delete the
> region manifests, which have already been rolled up into a larger overall
> manifest and thus contain redundant information.
> This proposal is to perform the deletion in a thread pool bounded by
> hbase.snapshot.thread.pool.max. For large tables with many regions, the
> current single-threaded deletion takes longer than all the rest of the
> snapshot tasks when the HBase data and the snapshot folder are both on a
> remote filesystem like S3.
> I have a patch for this proposal almost ready and will submit it tomorrow for
> feedback, although I haven't had a chance to write any tests yet.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-20226) Performance Improvement Taking Large Snapshots In Remote Filesystems

2020-07-27 Thread Bharath Vissapragada (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17166068#comment-17166068
 ] 

Bharath Vissapragada commented on HBASE-20226:
--

bq. This is related to the work going on in HBASE-21098. It's likely that this 
Jira is no longer needed after that work is concluded.

[~zyork] Why is this no longer needed? I think the attached patch parallelizes
the deletes by adding them to a thread pool, while the parent Jira (HBASE-21098)
doesn't do that, right?

We are running into bottlenecks on this sequential delete, and I saw this Jira.
Did I miss something?


> Performance Improvement Taking Large Snapshots In Remote Filesystems
> 
>
> Key: HBASE-20226
> URL: https://issues.apache.org/jira/browse/HBASE-20226
> Project: HBase
>  Issue Type: Improvement
>  Components: snapshots
>Affects Versions: 1.4.0
> Environment: HBase 1.4.0 running on an AWS EMR cluster with the 
> hbase.rootdir set to point to a folder in S3 
>Reporter: Saad Mufti
>Priority: Minor
> Attachments: HBASE-20226..01.patch
>
>
> When taking a snapshot of any table, one of the last steps is to delete the
> region manifests, which have already been rolled up into a larger overall
> manifest and thus contain redundant information.
> This proposal is to perform the deletion in a thread pool bounded by
> hbase.snapshot.thread.pool.max. For large tables with many regions, the
> current single-threaded deletion takes longer than all the rest of the
> snapshot tasks when the HBase data and the snapshot folder are both on a
> remote filesystem like S3.
> I have a patch for this proposal almost ready and will submit it tomorrow for
> feedback, although I haven't had a chance to write any tests yet.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-20226) Performance Improvement Taking Large Snapshots In Remote Filesystems

2018-09-05 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605110#comment-16605110
 ] 

Hadoop QA commented on HBASE-20226:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange}  
0m  0s{color} | {color:orange} The patch doesn't appear to include any new or 
modified tests. Please justify why no new tests are needed for this patch. Also 
please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
31s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
45s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
12s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
22s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
14s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
34s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
54s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m 
15s{color} | {color:red} hbase-server: The patch generated 13 new + 5 unchanged 
- 0 fixed = 18 total (was 5) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
18s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
10m 41s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.7.4 or 3.0.0. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  2m 
12s{color} | {color:red} hbase-server generated 3 new + 0 unchanged - 0 fixed = 
3 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 21m 51s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
12s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 64m 24s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hbase-server |
|  |  v2Regions could be null and is guaranteed to be dereferenced in 
org.apache.hadoop.hbase.snapshot.SnapshotManifest.convertToV2SingleManifest()  
Dereferenced at SnapshotManifest.java:is guaranteed to be dereferenced in 
org.apache.hadoop.hbase.snapshot.SnapshotManifest.convertToV2SingleManifest()  
Dereferenced at SnapshotManifest.java:[line 516] |
|  |  Possible null pointer dereference of v1Regions in 
org.apache.hadoop.hbase.snapshot.SnapshotManifest.convertToV2SingleManifest()  
Dereferenced at SnapshotManifest.java:v1Regions in 
org.apache.hadoop.hbase.snapshot.SnapshotManifest.convertToV2SingleManifest()  
Dereferenced at SnapshotManifest.java:[line 516] |
|  |  Nullcheck of v1Regions at line 516 of value previously dereferenced in 
org.apache.hadoop.hbase.snapshot.SnapshotManifest.convertToV2SingleManifest()  
At SnapshotManifest.java:516 of value previously dereferenced in 

[jira] [Commented] (HBASE-20226) Performance Improvement Taking Large Snapshots In Remote Filesystems

2018-09-05 Thread Zach York (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605053#comment-16605053
 ] 

Zach York commented on HBASE-20226:
---

This is related to the work going on in HBASE-21098. It's likely that this Jira 
is no longer needed after that work is concluded.

> Performance Improvement Taking Large Snapshots In Remote Filesystems
> 
>
> Key: HBASE-20226
> URL: https://issues.apache.org/jira/browse/HBASE-20226
> Project: HBase
>  Issue Type: Improvement
>  Components: snapshots
>Affects Versions: 1.4.0
> Environment: HBase 1.4.0 running on an AWS EMR cluster with the 
> hbase.rootdir set to point to a folder in S3 
>Reporter: Saad Mufti
>Priority: Minor
> Attachments: HBASE-20226..01.patch
>
>
> When taking a snapshot of any table, one of the last steps is to delete the
> region manifests, which have already been rolled up into a larger overall
> manifest and thus contain redundant information.
> This proposal is to perform the deletion in a thread pool bounded by
> hbase.snapshot.thread.pool.max. For large tables with many regions, the
> current single-threaded deletion takes longer than all the rest of the
> snapshot tasks when the HBase data and the snapshot folder are both on a
> remote filesystem like S3.
> I have a patch for this proposal almost ready and will submit it tomorrow for
> feedback, although I haven't had a chance to write any tests yet.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20226) Performance Improvement Taking Large Snapshots In Remote Filesystems

2018-03-23 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16411165#comment-16411165
 ] 

Steve Loughran commented on HBASE-20226:


Amazon throttles DELETE to the same shard, so the speedup will be sublinear,
even though the cost of a delete/bulk delete is low in terms of network traffic.

If you are doing bulk deletes across more than one thread, it's probably best to
do a bit of shuffling of the list of directories to delete before queuing the
operations.
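As a rough illustration of the shuffling idea (the variable names are placeholders, not code from the patch):

{code}
// Randomising the order spreads the DELETE calls across S3 key ranges before
// they are queued on the pool, which softens per-partition throttling.
List<Path> toDelete = new ArrayList<>(regionManifestPaths);
Collections.shuffle(toDelete);
for (Path p : toDelete) {
  pool.submit(() -> fs.delete(p, true));
}
{code}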


> Performance Improvement Taking Large Snapshots In Remote Filesystems
> 
>
> Key: HBASE-20226
> URL: https://issues.apache.org/jira/browse/HBASE-20226
> Project: HBase
>  Issue Type: Improvement
>  Components: snapshots
>Affects Versions: 1.4.0
> Environment: HBase 1.4.0 running on an AWS EMR cluster with the 
> hbase.rootdir set to point to a folder in S3 
>Reporter: Saad Mufti
>Priority: Minor
> Attachments: HBASE-20226..01.patch
>
>
> When taking a snapshot of any table, one of the last steps is to delete the
> region manifests, which have already been rolled up into a larger overall
> manifest and thus contain redundant information.
> This proposal is to perform the deletion in a thread pool bounded by
> hbase.snapshot.thread.pool.max. For large tables with many regions, the
> current single-threaded deletion takes longer than all the rest of the
> snapshot tasks when the HBase data and the snapshot folder are both on a
> remote filesystem like S3.
> I have a patch for this proposal almost ready and will submit it tomorrow for
> feedback, although I haven't had a chance to write any tests yet.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20226) Performance Improvement Taking Large Snapshots In Remote Filesystems

2018-03-20 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16406488#comment-16406488
 ] 

Ted Yu commented on HBASE-20226:


{code}
+if (v1Regions.size() > 0 || v2Regions.size() > 0) {
{code}
I think you may tighten the above condition by checking the sum of the sizes.
{code}
+  ThreadPoolExecutor tpoolDelete = createExecutor("SnapshotRegionManifestDeletePool");
{code}
where:
{code}
  public static ThreadPoolExecutor createExecutor(final Configuration conf,
      final String name) {
    int maxThreads = conf.getInt("hbase.snapshot.thread.pool.max", 8);
{code}
You can add a new config instead of depending on the existing config above.
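For instance, a dedicated key could be read with a fallback to the existing pool size (the key name below is illustrative, not an existing HBase setting):

{code}
int deleteThreads = conf.getInt("hbase.snapshot.manifest.delete.threads",
    conf.getInt("hbase.snapshot.thread.pool.max", 8));
{code}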

> Performance Improvement Taking Large Snapshots In Remote Filesystems
> 
>
> Key: HBASE-20226
> URL: https://issues.apache.org/jira/browse/HBASE-20226
> Project: HBase
>  Issue Type: Improvement
>  Components: snapshots
>Affects Versions: 1.4.0
> Environment: HBase 1.4.0 running on an AWS EMR cluster with the 
> hbase.rootdir set to point to a folder in S3 
>Reporter: Saad Mufti
>Priority: Minor
> Attachments: HBASE-20226..01.patch
>
>
> When taking a snapshot of any table, one of the last steps is to delete the
> region manifests, which have already been rolled up into a larger overall
> manifest and thus contain redundant information.
> This proposal is to perform the deletion in a thread pool bounded by
> hbase.snapshot.thread.pool.max. For large tables with many regions, the
> current single-threaded deletion takes longer than all the rest of the
> snapshot tasks when the HBase data and the snapshot folder are both on a
> remote filesystem like S3.
> I have a patch for this proposal almost ready and will submit it tomorrow for
> feedback, although I haven't had a chance to write any tests yet.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20226) Performance Improvement Taking Large Snapshots In Remote Filesystems

2018-03-20 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16406469#comment-16406469
 ] 

Ted Yu commented on HBASE-20226:


The following is reproducible:
{code}
testReadSnapshotManifest(org.apache.hadoop.hbase.snapshot.TestSnapshotManifest) 
 Time elapsed: 0.064 sec  <<< ERROR!
java.lang.NullPointerException
at 
org.apache.hadoop.hbase.snapshot.TestSnapshotManifest.setup(TestSnapshotManifest.java:83)
{code}
See if fixing FindBugs warnings would avoid the above exception.
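The FindBugs items in the QA run point at v1Regions/v2Regions possibly being null in convertToV2SingleManifest(); a hedged sketch of the kind of guard that would satisfy them (the load calls are placeholders, not the real methods):

{code}
List<SnapshotRegionManifest> v1Regions = loadV1RegionManifests();  // placeholder
List<SnapshotRegionManifest> v2Regions = loadV2RegionManifests();  // placeholder
if (v1Regions == null) {
  v1Regions = Collections.emptyList();
}
if (v2Regions == null) {
  v2Regions = Collections.emptyList();
}
// from here on both lists are non-null, so size() checks and iteration are safe
{code}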

> Performance Improvement Taking Large Snapshots In Remote Filesystems
> 
>
> Key: HBASE-20226
> URL: https://issues.apache.org/jira/browse/HBASE-20226
> Project: HBase
>  Issue Type: Improvement
>  Components: snapshots
>Affects Versions: 1.4.0
> Environment: HBase 1.4.0 running on an AWS EMR cluster with the 
> hbase.rootdir set to point to a folder in S3 
>Reporter: Saad Mufti
>Priority: Minor
> Attachments: HBASE-20226..01.patch
>
>
> When taking a snapshot of any table, one of the last steps is to delete the
> region manifests, which have already been rolled up into a larger overall
> manifest and thus contain redundant information.
> This proposal is to perform the deletion in a thread pool bounded by
> hbase.snapshot.thread.pool.max. For large tables with many regions, the
> current single-threaded deletion takes longer than all the rest of the
> snapshot tasks when the HBase data and the snapshot folder are both on a
> remote filesystem like S3.
> I have a patch for this proposal almost ready and will submit it tomorrow for
> feedback, although I haven't had a chance to write any tests yet.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20226) Performance Improvement Taking Large Snapshots In Remote Filesystems

2018-03-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16406238#comment-16406238
 ] 

Hadoop QA commented on HBASE-20226:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
 7s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
28s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
58s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 
 4s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
43s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
 7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
30s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
58s{color} | {color:red} hbase-server: The patch generated 13 new + 5 unchanged 
- 0 fixed = 18 total (was 5) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
 5s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
16m 41s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.5 2.7.4 or 3.0.0. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  2m  
3s{color} | {color:red} hbase-server generated 3 new + 0 unchanged - 0 fixed = 
3 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 19m 58s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
 9s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 59m 16s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hbase-server |
|  |  v2Regions could be null and is guaranteed to be dereferenced in 
org.apache.hadoop.hbase.snapshot.SnapshotManifest.convertToV2SingleManifest()  
Dereferenced at SnapshotManifest.java:is guaranteed to be dereferenced in 
org.apache.hadoop.hbase.snapshot.SnapshotManifest.convertToV2SingleManifest()  
Dereferenced at SnapshotManifest.java:[line 549] |
|  |  Possible null pointer dereference of v1Regions in 
org.apache.hadoop.hbase.snapshot.SnapshotManifest.convertToV2SingleManifest()  
Dereferenced at SnapshotManifest.java:v1Regions in 
org.apache.hadoop.hbase.snapshot.SnapshotManifest.convertToV2SingleManifest()  
Dereferenced at SnapshotManifest.java:[line 516] |
|  |  Nullcheck of v1Regions at line 516 of value previously dereferenced in 
org.apache.hadoop.hbase.snapshot.SnapshotManifest.convertToV2SingleManifest()  
At SnapshotManifest.java:516 of value previously dereferenced in