[jira] [Updated] (HBASE-14067) bundle ruby files for hbase shell into a jar.

2020-10-08 Thread Sean Busbey (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-14067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-14067:

Status: Patch Available  (was: In Progress)

> bundle ruby files for hbase shell into a jar.
> -
>
> Key: HBASE-14067
> URL: https://issues.apache.org/jira/browse/HBASE-14067
> Project: HBase
>  Issue Type: Improvement
>  Components: shell
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Major
>
> We currently package all the ruby scripts for the hbase shell by placing them 
> in a directory within lib/. We should be able to put these in a jar file 
> since we rely on jruby.
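A minimal sketch of why this should work: JRuby's embedding API can resolve a script against the classpath, so the shell's .rb files could be served from a jar entry just as well as from a lib/ directory. The resource path below is hypothetical, not the shell's actual layout.

{code:java}
import org.jruby.embed.PathType;
import org.jruby.embed.ScriptingContainer;

public class ShellFromJar {
  public static void main(String[] args) {
    ScriptingContainer container = new ScriptingContainer();
    // PathType.CLASSPATH resolves the script via the classloader, so the
    // ruby source can live inside a jar on the classpath.
    container.runScriptlet(PathType.CLASSPATH, "hbase-shell/shell.rb"); // hypothetical path
  }
}
{code}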



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HBASE-14067) bundle ruby files for hbase shell into a jar.

2020-10-08 Thread Sean Busbey (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-14067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey reassigned HBASE-14067:
---

Assignee: Sean Busbey

> bundle ruby files for hbase shell into a jar.
> -
>
> Key: HBASE-14067
> URL: https://issues.apache.org/jira/browse/HBASE-14067
> Project: HBase
>  Issue Type: Improvement
>  Components: shell
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Major
>
> We currently package all the ruby scripts for the hbase shell by placing them 
> in a directory within lib/. We should be able to put these in a jar file 
> since we rely on jruby.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HBASE-14067) bundle ruby files for hbase shell into a jar.

2020-10-08 Thread Sean Busbey (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-14067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-14067 started by Sean Busbey.
---
> bundle ruby files for hbase shell into a jar.
> -
>
> Key: HBASE-14067
> URL: https://issues.apache.org/jira/browse/HBASE-14067
> Project: HBase
>  Issue Type: Improvement
>  Components: shell
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Major
>
> We currently package all the ruby scripts for the hbase shell by placing them 
> in a directory within lib/. We should be able to put these in a jar file 
> since we rely on jruby.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-25154) Set java.io.tmpdir to project build directory to avoid writing std*deferred files to /tmp

2020-10-05 Thread Sean Busbey (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-25154:

Status: Patch Available  (was: Open)

> Set java.io.tmpdir to project build directory to avoid writing std*deferred 
> files to /tmp
> -
>
> Key: HBASE-25154
> URL: https://issues.apache.org/jira/browse/HBASE-25154
> Project: HBase
>  Issue Type: Sub-task
>  Components: build, test
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
>
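For context, a minimal sketch (not the actual patch) of why pointing java.io.tmpdir at the build directory keeps the surefire std*deferred files out of /tmp; the target/tmp path is an assumption, and in the real build the property would be set through the surefire configuration rather than in code:

{code:java}
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Paths;

public class TmpDirDemo {
  public static void main(String[] args) throws Exception {
    // Hypothetical build directory; create it so createTempFile can use it.
    Files.createDirectories(Paths.get("target", "tmp"));
    System.setProperty("java.io.tmpdir", "target/tmp");
    // Anything that uses the default temp dir, like surefire's std*deferred
    // files, now lands under target/ and disappears with a clean build.
    File f = File.createTempFile("stdout", ".deferred");
    System.out.println(f.getAbsolutePath());
  }
}
{code}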




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-25140) HBase test mini cluster is working only with Hadoop 2.8.0 - 3.0.3

2020-10-01 Thread Sean Busbey (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-25140:

Component/s: test
 hadoop2
 documentation

> HBase test mini cluster is working only with Hadoop 2.8.0 - 3.0.3
> -
>
> Key: HBASE-25140
> URL: https://issues.apache.org/jira/browse/HBASE-25140
> Project: HBase
>  Issue Type: Bug
>  Components: documentation, hadoop2, test
>Affects Versions: 2.2.3
>Reporter: Miklos Gergely
>Priority: Major
>
> Running HBaseTestingUtility.startMiniCluster() on HBase 2.2.3 works only with 
> Hadoop versions in the range 2.8.0 - 3.0.3; for example, with 2.4.1 the 
> following exception occurs:
>  
> {code:java}
> 21:49:04,124 [RS:0;71af2d647bb3:35715] ERROR org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper [] - Couldn't properly initialize access to HDFS internals. Please update your WAL Provider to not make use of the 'asyncfs' provider. See HBASE-16110 for more information.
> 21:49:04,124 [RS:0;71af2d647bb3:35715] ERROR org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper [] - Couldn't properly initialize access to HDFS internals. Please update your WAL Provider to not make use of the 'asyncfs' provider. See HBASE-16110 for more information.
> java.lang.NoSuchMethodException: org.apache.hadoop.hdfs.DFSClient.beginFileLease(long, org.apache.hadoop.hdfs.DFSOutputStream)
>   at java.lang.Class.getDeclaredMethod(Class.java:2130) ~[?:1.8.0_242]
>   at org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper.createLeaseManager(FanOutOneBlockAsyncDFSOutputHelper.java:198) ~[hbase-server-2.2.3.jar:2.2.3]
>   at org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper.<clinit>(FanOutOneBlockAsyncDFSOutputHelper.java:274) [hbase-server-2.2.3.jar:2.2.3]
>   at java.lang.Class.forName0(Native Method) ~[?:1.8.0_242]
>   at java.lang.Class.forName(Class.java:264) [?:1.8.0_242]
>   at org.apache.hadoop.hbase.wal.AsyncFSWALProvider.load(AsyncFSWALProvider.java:136) [hbase-server-2.2.3.jar:2.2.3]
>   at org.apache.hadoop.hbase.wal.WALFactory.getProviderClass(WALFactory.java:136) [hbase-server-2.2.3.jar:2.2.3]
>   at org.apache.hadoop.hbase.wal.WALFactory.getProvider(WALFactory.java:175) [hbase-server-2.2.3.jar:2.2.3]
>   at org.apache.hadoop.hbase.wal.WALFactory.<init>(WALFactory.java:198) [hbase-server-2.2.3.jar:2.2.3]
>   at org.apache.hadoop.hbase.regionserver.HRegionServer.setupWALAndReplication(HRegionServer.java:1871) [hbase-server-2.2.3.jar:2.2.3]
>   at org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:1589) [hbase-server-2.2.3.jar:2.2.3]
>   at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.handleReportForDutyResponse(MiniHBaseCluster.java:157) [hbase-server-2.2.3-tests.jar:2.2.3]
>   at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1001) [hbase-server-2.2.3.jar:2.2.3]
>   at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:184) [hbase-server-2.2.3-tests.jar:2.2.3]
>   at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:130) [hbase-server-2.2.3-tests.jar:2.2.3]
>   at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:168) [hbase-server-2.2.3-tests.jar:2.2.3]
>   at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_242]
>   at javax.security.auth.Subject.doAs(Subject.java:360) [?:1.8.0_242]
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1536) [hadoop-common-2.4.1.jar:?]
>   at org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:341) [hbase-common-2.2.3.jar:2.2.3]
>   at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:165) [hbase-server-2.2.3-tests.jar:2.2.3]
>   at java.lang.Thread.run(Thread.java:748) [?:1.8.0_242]
> {code}
> Also, upon failure during a maven run it would be great if the actual 
> exception were displayed, not just "Master not initialized after 20ms".
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25140) HBase test mini cluster is working only with Hadoop 2.8.0 - 3.0.3

2020-10-01 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17205589#comment-17205589
 ] 

Sean Busbey commented on HBASE-25140:
-

{quote}
 for example with 2.4.1 the following exception occurs:
{quote}

Do you mean with Hadoop 2.4.1?

{quote}
HBaseTestingUtility.startMiniCluster() on HBase 2.2.3 works only with hadoop 
version range 2.8.0 - 3.0.3
{quote}

Please see our [reference guide for the expected Hadoop 
compatibility|http://hbase.apache.org/book.html#hadoop]. HBase 2.2.z releases 
should be used with Hadoop 2.8, 2.9, 3.1, and 3.2 (with specific maintenance 
versions mattering for some of those releases). Is the minicluster failing with 
Hadoop 3.1 or 3.2? How are you setting up dependencies?

{quote}
Also upon failure during maven run it would be great if the actual exception 
would be displayed, not just that "Master not initialized after 20ms".
{quote}

Given the problem with the fan-out WAL writer you posted, I am surprised the 
entire minicluster did not fail with a clear pointer to that message. Could you 
attach logs?
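For readers hitting the same trace: a small probe, written by analogy with the stack trace above (this is not HBase source), that checks whether the private DFSClient method the asyncfs provider reflects on exists in the Hadoop version on the classpath:

{code:java}
import java.lang.reflect.Method;

public class AsyncFsProbe {
  public static void main(String[] args) {
    try {
      Class<?> dfsClient = Class.forName("org.apache.hadoop.hdfs.DFSClient");
      Class<?> dfsOutput = Class.forName("org.apache.hadoop.hdfs.DFSOutputStream");
      // The same lookup the stack trace shows failing on Hadoop 2.4.1.
      Method m = dfsClient.getDeclaredMethod("beginFileLease", long.class, dfsOutput);
      System.out.println("asyncfs-compatible HDFS internals: " + m);
    } catch (ReflectiveOperationException e) {
      // On incompatible Hadoop versions this mirrors the NoSuchMethodException above.
      System.out.println("asyncfs provider would fail to load: " + e);
    }
  }
}
{code}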


> HBase test mini cluster is working only with Hadoop 2.8.0 - 3.0.3
> -
>
> Key: HBASE-25140
> URL: https://issues.apache.org/jira/browse/HBASE-25140
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.2.3
>Reporter: Miklos Gergely
>Priority: Major
>
> Running HBaseTestingUtility.startMiniCluster() on HBase 2.2.3 works only with 
> Hadoop versions in the range 2.8.0 - 3.0.3; for example, with 2.4.1 the 
> following exception occurs:
>  
> {code:java}
> 21:49:04,124 [RS:0;71af2d647bb3:35715] ERROR org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper [] - Couldn't properly initialize access to HDFS internals. Please update your WAL Provider to not make use of the 'asyncfs' provider. See HBASE-16110 for more information.
> 21:49:04,124 [RS:0;71af2d647bb3:35715] ERROR org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper [] - Couldn't properly initialize access to HDFS internals. Please update your WAL Provider to not make use of the 'asyncfs' provider. See HBASE-16110 for more information.
> java.lang.NoSuchMethodException: org.apache.hadoop.hdfs.DFSClient.beginFileLease(long, org.apache.hadoop.hdfs.DFSOutputStream)
>   at java.lang.Class.getDeclaredMethod(Class.java:2130) ~[?:1.8.0_242]
>   at org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper.createLeaseManager(FanOutOneBlockAsyncDFSOutputHelper.java:198) ~[hbase-server-2.2.3.jar:2.2.3]
>   at org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper.<clinit>(FanOutOneBlockAsyncDFSOutputHelper.java:274) [hbase-server-2.2.3.jar:2.2.3]
>   at java.lang.Class.forName0(Native Method) ~[?:1.8.0_242]
>   at java.lang.Class.forName(Class.java:264) [?:1.8.0_242]
>   at org.apache.hadoop.hbase.wal.AsyncFSWALProvider.load(AsyncFSWALProvider.java:136) [hbase-server-2.2.3.jar:2.2.3]
>   at org.apache.hadoop.hbase.wal.WALFactory.getProviderClass(WALFactory.java:136) [hbase-server-2.2.3.jar:2.2.3]
>   at org.apache.hadoop.hbase.wal.WALFactory.getProvider(WALFactory.java:175) [hbase-server-2.2.3.jar:2.2.3]
>   at org.apache.hadoop.hbase.wal.WALFactory.<init>(WALFactory.java:198) [hbase-server-2.2.3.jar:2.2.3]
>   at org.apache.hadoop.hbase.regionserver.HRegionServer.setupWALAndReplication(HRegionServer.java:1871) [hbase-server-2.2.3.jar:2.2.3]
>   at org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:1589) [hbase-server-2.2.3.jar:2.2.3]
>   at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.handleReportForDutyResponse(MiniHBaseCluster.java:157) [hbase-server-2.2.3-tests.jar:2.2.3]
>   at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1001) [hbase-server-2.2.3.jar:2.2.3]
>   at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:184) [hbase-server-2.2.3-tests.jar:2.2.3]
>   at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:130) [hbase-server-2.2.3-tests.jar:2.2.3]
>   at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:168) [hbase-server-2.2.3-tests.jar:2.2.3]
>   at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_242]
>   at javax.security.auth.Subject.doAs(Subject.java:360) [?:1.8.0_242]
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1536) [hadoop-common-2.4.1.jar:?]
>   at org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:341) [hbase-common-2.2.3.jar:2.2.3]
>   at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:165) [hbase-server-2.2.3-tests.jar:2.2.3]
>   at java.lang.Thread.run(Thread.java:748) [?:1.8.0_242]
> {code}

[jira] [Updated] (HBASE-25141) ref guide for 2.2.z builds needs to be updated to include 2.2 specific details

2020-10-01 Thread Sean Busbey (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-25141:

Priority: Minor  (was: Major)

> ref guide for 2.2.z builds needs to be updated to include 2.2 specific details
> --
>
> Key: HBASE-25141
> URL: https://issues.apache.org/jira/browse/HBASE-25141
> Project: HBase
>  Issue Type: Task
>  Components: documentation, website
>Reporter: Sean Busbey
>Priority: Minor
> Fix For: 2.2.7
>
>
> for example, it's currently missing a column for 2.2 in the hadoop compat 
> matrix.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-25141) ref guide for 2.2.z builds needs to be updated to include 2.2 specific details

2020-10-01 Thread Sean Busbey (Jira)
Sean Busbey created HBASE-25141:
---

 Summary: ref guide for 2.2.z builds needs to be updated to include 
2.2 specific details
 Key: HBASE-25141
 URL: https://issues.apache.org/jira/browse/HBASE-25141
 Project: HBase
  Issue Type: Task
  Components: documentation, website
Reporter: Sean Busbey
 Fix For: 2.2.7


for example, it's currently missing a column for 2.2 in the hadoop compat 
matrix.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HBASE-25123) Add possibility to set different types of L1 cache

2020-10-01 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17205575#comment-17205575
 ] 

Sean Busbey edited comment on HBASE-25123 at 10/1/20, 2:48 PM:
---

if we abstracted out a policy for deciding if a block gets cached and which 
blocks get evicted that would be enough to isolate the changes needed for 
HBASE-23887 I think?


was (Author: busbey):
if we abstracted out a policy for deciding if a block gets cacheing and which 
blocks get evicted that would be enough to isolate the changes needed for 
HBASE-23887 I think?

> Add possibility to set different types of L1 cache
> --
>
> Key: HBASE-25123
> URL: https://issues.apache.org/jira/browse/HBASE-25123
> Project: HBase
>  Issue Type: New Feature
>  Components: BlockCache
>Reporter: Danil Lipovoy
>Priority: Minor
>
> The feature HBASE-23887 allows speeding up read performance by up to 3 times, 
> but maybe it is too complicated. So the proposal is to give users the 
> possibility to choose the type of the L1 cache. It looks like this needs 
> changes to a few classes (CombinedBlockCache, InclusiveCombinedBlockCache, 
> CacheConfig); if somebody can code this it would be cool.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25123) Add possibility to set different types of L1 cache

2020-10-01 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17205575#comment-17205575
 ] 

Sean Busbey commented on HBASE-25123:
-

if we abstracted out a policy for deciding if a block gets cached and which 
blocks get evicted that would be enough to isolate the changes needed for 
HBASE-23887 I think?
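To make the suggestion concrete, a rough sketch (my assumption, not an existing HBase interface) of the kind of policy seam that could be abstracted out:

{code:java}
/** Hypothetical seam isolating admission/eviction decisions from cache plumbing. */
public interface BlockCachePolicy {
  /** Whether a freshly read block should be admitted to the cache. */
  boolean shouldCache(long blockOffset, boolean isDataBlock);

  /** Whether a resident block is a candidate for eviction under pressure. */
  boolean shouldEvict(long blockOffset);
}
{code}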

> Add possibility to set different types of L1 cache
> --
>
> Key: HBASE-25123
> URL: https://issues.apache.org/jira/browse/HBASE-25123
> Project: HBase
>  Issue Type: New Feature
>  Components: BlockCache
>Reporter: Danil Lipovoy
>Priority: Minor
>
> The feature HBASE-23887 allows speeding up read performance by up to 3 times, 
> but maybe it is too complicated. So the proposal is to give users the 
> possibility to choose the type of the L1 cache. It looks like this needs 
> changes to a few classes (CombinedBlockCache, InclusiveCombinedBlockCache, 
> CacheConfig); if somebody can code this it would be cool.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-25123) Add possibility to set different types of L1 cache

2020-10-01 Thread Sean Busbey (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-25123:

Component/s: BlockCache

> Add possibility to set different types of L1 cache
> --
>
> Key: HBASE-25123
> URL: https://issues.apache.org/jira/browse/HBASE-25123
> Project: HBase
>  Issue Type: New Feature
>  Components: BlockCache
>Reporter: Danil Lipovoy
>Priority: Minor
>
> The feature HBASE-23887 allows speeding up read performance by up to 3 times, 
> but maybe it is too complicated. So the proposal is to give users the 
> possibility to choose the type of the L1 cache. It looks like this needs 
> changes to a few classes (CombinedBlockCache, InclusiveCombinedBlockCache, 
> CacheConfig); if somebody can code this it would be cool.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HBASE-23887) Improve BlockCache performance by reducing eviction rate

2020-09-29 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17204279#comment-17204279
 ] 

Sean Busbey edited comment on HBASE-23887 at 9/29/20, 8:44 PM:
---

I don't want to keep pushing work on you, but if this change is adopted as an 
opt-in feature for a kind of cache then having that google doc as a markdown or 
pdf file in the {{dev-support/design-docs}} area will help a lot when folks 
need to reason about whether using this feature is worthwhile.

I think this is an interesting approach to skewed key reads. I would expect it to 
help with zipfian workloads (like YCSB is supposed to do) because the "should 
we bother to cache a new block" check is essentially trying to approximate the 
likelihood that a new read is from the tail of the distribution rather than the 
set of frequent items. If something is from the tail then it's not worth 
thrashing trying to chase a cache hit that is very unlikely to come later.


was (Author: busbey):
I don't want to keep pushing work on you, but if this change is adopted as an 
opt-in feature for a kind of cache then having that google doc as a markdown or 
pdf file in the {{dev-support/design-docs}} area will help a lot when folks 
need to reason about whether using this feature is worthwhile.

I think this is an interesting approach to skewed key reads. I would expect it to 
help with zipfian workloads (like YCSB is supposed to do) because the "should 
we bother to cache a new block" check is essentially trying to approximate the 
likelihood that a new read is from the tail of the distribution rather than the 
set of frequent items.

> Improve BlockCache performance by reducing eviction rate
> --
>
> Key: HBASE-23887
> URL: https://issues.apache.org/jira/browse/HBASE-23887
> Project: HBase
>  Issue Type: Improvement
>  Components: BlockCache, Performance
>Reporter: Danil Lipovoy
>Assignee: Danil Lipovoy
>Priority: Minor
> Attachments: 1582787018434_rs_metrics.jpg, 
> 1582801838065_rs_metrics_new.png, BC_LongRun.png, 
> BlockCacheEvictionProcess.gif, BlockCacheEvictionProcess.gif, cmp.png, 
> evict_BC100_vs_BC23.png, eviction_100p.png, eviction_100p.png, 
> eviction_100p.png, gc_100p.png, graph.png, image-2020-06-07-08-11-11-929.png, 
> image-2020-06-07-08-19-00-922.png, image-2020-06-07-12-07-24-903.png, 
> image-2020-06-07-12-07-30-307.png, image-2020-06-08-17-38-45-159.png, 
> image-2020-06-08-17-38-52-579.png, image-2020-06-08-18-35-48-366.png, 
> image-2020-06-14-20-51-11-905.png, image-2020-06-22-05-57-45-578.png, 
> image-2020-09-23-09-48-59-714.png, image-2020-09-23-10-06-11-189.png, 
> ratio.png, ratio2.png, read_requests_100pBC_vs_23pBC.png, requests_100p.png, 
> requests_100p.png, requests_new2_100p.png, requests_new_100p.png, scan.png, 
> scan_and_gets.png, scan_and_gets2.png, wave.png, ycsb_logs.zip
>
>
> Hi!
> I'm new here; please correct me if something is wrong.
> All latest information is here:
> [https://docs.google.com/document/d/1X8jVnK_3lp9ibpX6lnISf_He-6xrHZL0jQQ7hoTV0-g/edit?usp=sharing]
> I want to propose a way to improve performance when the data in HFiles is much 
> larger than the BlockCache (a usual story in Big Data). The idea: cache only 
> part of the DATA blocks. This helps because the LruBlockCache keeps working 
> and saves a huge amount of GC.
> Sometimes we have more data than can fit into the BlockCache, which causes a 
> high rate of evictions. In this case we can skip caching block N and instead 
> cache block N+1. We would evict block N quite soon anyway, which is why 
> skipping it is good for performance.
> ---
> Some of the information below is no longer current
> ---
>  
>  
> Example:
> Imagine we have a little cache that can fit only 1 block, and we are trying to 
> read 3 blocks with offsets:
>  124
>  198
>  223
> Current way: we put block 124, then put 198, evict 124, put 223, evict 
> 198. A lot of work (5 actions).
> With the feature: the last two digits of an offset are evenly distributed from 
> 0 to 99. Taking each offset modulo 100 we get:
>  124 -> 24
>  198 -> 98
>  223 -> 23
> It helps to sort them. Some of them, for example those below 50 (if we set 
> *hbase.lru.cache.data.block.percent* = 50), go into the cache, and we skip the 
> others. It means we will not try to handle block 198, saving CPU for 
> other work. As a result we put block 124, then put 223, evict 124 (3 
> actions).
> See the picture attached with the test below: requests per second are higher, 
> GC is lower.
>  
>  The key point of the code:
>  Added the parameter *hbase.lru.cache.data.block.percent*, which defaults to 
> 100.
>   
>  But if we set it to 1-99, the following logic kicks in:
>   
>   
> {code:java}
> public void cacheBlock(BlockCacheKey cacheKey, Cacheable buf, boolean inMemory) {
>   if (cacheDataBlockPercent != 100 && buf.getBlockType().isData())
>     if (cacheKey.getOffset() % 100 >= cacheDataBlockPercent)
>       return;
>   ... // the same code as usual
> }
> {code}

[jira] [Commented] (HBASE-23887) Improve BlockCache performance by reducing eviction rate

2020-09-29 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17204279#comment-17204279
 ] 

Sean Busbey commented on HBASE-23887:
-

I don't want to keep pushing work on you, but if this change is adopted as an 
opt-in feature for a kind of cache then having that google doc as a markdown or 
pdf file in the {{dev-support/design-docs}} area will help a lot when folks 
need to reason about whether using this feature is worthwhile.

I think this is an interesting approach to skewed key reads. I would expect it to 
help with zipfian workloads (like YCSB is supposed to do) because the "should 
we bother to cache a new block" check is essentially trying to approximate the 
likelihood that a new read is from the tail of the distribution rather than the 
set of frequent items.

> Improve BlockCache performance by reducing eviction rate
> --
>
> Key: HBASE-23887
> URL: https://issues.apache.org/jira/browse/HBASE-23887
> Project: HBase
>  Issue Type: Improvement
>  Components: BlockCache, Performance
>Reporter: Danil Lipovoy
>Assignee: Danil Lipovoy
>Priority: Minor
> Attachments: 1582787018434_rs_metrics.jpg, 
> 1582801838065_rs_metrics_new.png, BC_LongRun.png, 
> BlockCacheEvictionProcess.gif, BlockCacheEvictionProcess.gif, cmp.png, 
> evict_BC100_vs_BC23.png, eviction_100p.png, eviction_100p.png, 
> eviction_100p.png, gc_100p.png, graph.png, image-2020-06-07-08-11-11-929.png, 
> image-2020-06-07-08-19-00-922.png, image-2020-06-07-12-07-24-903.png, 
> image-2020-06-07-12-07-30-307.png, image-2020-06-08-17-38-45-159.png, 
> image-2020-06-08-17-38-52-579.png, image-2020-06-08-18-35-48-366.png, 
> image-2020-06-14-20-51-11-905.png, image-2020-06-22-05-57-45-578.png, 
> image-2020-09-23-09-48-59-714.png, image-2020-09-23-10-06-11-189.png, 
> ratio.png, ratio2.png, read_requests_100pBC_vs_23pBC.png, requests_100p.png, 
> requests_100p.png, requests_new2_100p.png, requests_new_100p.png, scan.png, 
> scan_and_gets.png, scan_and_gets2.png, wave.png, ycsb_logs.zip
>
>
> Hi!
> I'm new here; please correct me if something is wrong.
> All latest information is here:
> [https://docs.google.com/document/d/1X8jVnK_3lp9ibpX6lnISf_He-6xrHZL0jQQ7hoTV0-g/edit?usp=sharing]
> I want to propose a way to improve performance when the data in HFiles is much 
> larger than the BlockCache (a usual story in Big Data). The idea: cache only 
> part of the DATA blocks. This helps because the LruBlockCache keeps working 
> and saves a huge amount of GC.
> Sometimes we have more data than can fit into the BlockCache, which causes a 
> high rate of evictions. In this case we can skip caching block N and instead 
> cache block N+1. We would evict block N quite soon anyway, which is why 
> skipping it is good for performance.
> ---
> Some of the information below is no longer current
> ---
>  
>  
> Example:
> Imagine we have a little cache that can fit only 1 block, and we are trying to 
> read 3 blocks with offsets:
>  124
>  198
>  223
> Current way: we put block 124, then put 198, evict 124, put 223, evict 
> 198. A lot of work (5 actions).
> With the feature: the last two digits of an offset are evenly distributed from 
> 0 to 99. Taking each offset modulo 100 we get:
>  124 -> 24
>  198 -> 98
>  223 -> 23
> It helps to sort them. Some of them, for example those below 50 (if we set 
> *hbase.lru.cache.data.block.percent* = 50), go into the cache, and we skip the 
> others. It means we will not try to handle block 198, saving CPU for 
> other work. As a result we put block 124, then put 223, evict 124 (3 
> actions).
> See the picture attached with the test below: requests per second are higher, 
> GC is lower.
>  
>  The key point of the code:
>  Added the parameter *hbase.lru.cache.data.block.percent*, which defaults to 
> 100.
>   
>  But if we set it to 1-99, the following logic kicks in:
>   
>   
> {code:java}
> public void cacheBlock(BlockCacheKey cacheKey, Cacheable buf, boolean inMemory) {
>   if (cacheDataBlockPercent != 100 && buf.getBlockType().isData())
>     if (cacheKey.getOffset() % 100 >= cacheDataBlockPercent)
>       return;
>   ... // the same code as usual
> }
> {code}
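A standalone sketch of the filter quoted above, showing how the example offsets split at a 50 percent setting; the class name and printed output are illustrative only, not part of the patch:

{code:java}
public class CachePercentDemo {
  public static void main(String[] args) {
    int cacheDataBlockPercent = 50; // hbase.lru.cache.data.block.percent
    for (long offset : new long[] {124, 198, 223}) {
      // Offsets are roughly uniform mod 100, so this admits ~50% of data blocks.
      boolean cached = offset % 100 < cacheDataBlockPercent;
      System.out.println(offset + " -> " + offset % 100 + (cached ? " cached" : " skipped"));
    }
    // prints: 124 -> 24 cached, 198 -> 98 skipped, 223 -> 23 cached
  }
}
{code}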
>  
> Other parameters control when this logic is enabled, so it only kicks in 
> while heavy reading is going on.
> hbase.lru.cache.heavy.eviction.count.limit - how many times the eviction 
> process has to run before we start skipping data blocks
>  hbase.lru.cache.heavy.eviction.bytes.size.limit - how many bytes have to be 
> evicted each time before we start skipping data blocks
> By default: if eviction ran 10 times (100 seconds) and evicted more than 10 MB 
> each time, then we start to skip 50% of data blocks.
>  When the heavy eviction process ends, the new logic turns off and all blocks 
> are put into the BlockCache again.
>   
> Descriptions of 

[jira] [Created] (HBASE-25083) make sure the next hbase 1.y release has Hadoop 2.10 as a minimum version

2020-09-22 Thread Sean Busbey (Jira)
Sean Busbey created HBASE-25083:
---

 Summary: make sure the next hbase 1.y release has Hadoop 2.10 as a 
minimum version
 Key: HBASE-25083
 URL: https://issues.apache.org/jira/browse/HBASE-25083
 Project: HBase
  Issue Type: Task
  Components: documentation, hadoop2
Reporter: Sean Busbey
 Fix For: 1.7.0


Our reference guide list of prerequisites still has Hadoop 2.8 and 2.9 listed 
for HBase 1 releases.

* [hadoop 2.8 is 
EOM|https://lists.apache.org/thread.html/r348f7bc93a522f05b7cce78a911854d128a6b1b8bd8124bad4d06ce6%40%3Cuser.hadoop.apache.org%3E]
* [hadoop 2.9 is 
EOM|https://lists.apache.org/thread.html/r16b14cce9504f7a9d228612c6b808e72d8dd20863c78be51a7e04ed5%40%3Cuser.hadoop.apache.org%3E]

The current list in the reference guide for HBase 1.6 is just the 1.5 list 
copied. We should update it to remove 2.8 and 2.9 and make sure we're no longer 
doing builds or tests based on those versions for branch-1.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24802) Please fix CVEs by removing reference to htrace-core4

2020-09-21 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17199742#comment-17199742
 ] 

Sean Busbey commented on HBASE-24802:
-

[~Esalnikov] I believe the analogous Hadoop jira is HADOOP-17171 but it looks 
like that isn't going anywhere.

my WIP on PR 36 will eventually need to work at runtime with some set of Hadoop 
versions we want HBase to run on without hassle. Arbitrary Hadoop versions or 
non-HBase uses of Hadoop won't be first-class priorities, but my guess is 
whatever drop-in replacement HBase comes up with will be your best bet.

> Please fix CVEs by removing reference to htrace-core4
> -
>
> Key: HBASE-24802
> URL: https://issues.apache.org/jira/browse/HBASE-24802
> Project: HBase
>  Issue Type: Bug
>  Components: Client, dependencies, thirdparty
>Affects Versions: 1.4.0, 2.2.0, 2.3.0, 1.6.0
>Reporter: Rodney Aaron Stainback
>Assignee: Sean Busbey
>Priority: Critical
>
> htrace-core4 is a retired project, and even its latest version shades Jackson 
> Databind 2.4.0, which has the following CVEs:
> |cve|severity|cvss|
> |CVE-2017-15095|critical|9.8|
> |CVE-2018-1000873|medium|6.5|
> |CVE-2018-14718|critical|9.8|
> |CVE-2018-5968|high|8.1|
> |CVE-2018-7489|critical|9.8|
> |CVE-2019-14540|critical|9.8|
> |CVE-2019-14893|critical|9.8|
> |CVE-2019-16335|critical|9.8|
> |CVE-2019-16942|critical|9.8|
> |CVE-2019-16943|critical|9.8|
> |CVE-2019-17267|critical|9.8|
> |CVE-2019-17531|critical|9.8|
> |CVE-2019-20330|critical|9.8|
> |CVE-2020-10672|high|8.8|
> |CVE-2020-10673|high|8.8|
> |CVE-2020-10968|high|8.8|
> |CVE-2020-10969|high|8.8|
> |CVE-2020-1|high|8.8|
> |CVE-2020-2|high|8.8|
> |CVE-2020-3|high|8.8|
> |CVE-2020-11619|critical|9.8|
> |CVE-2020-11620|critical|9.8|
> |CVE-2020-14060|high|8.1|
> |CVE-2020-14061|high|8.1|
> |CVE-2020-14062|high|8.1|
> |CVE-2020-14195|high|8.1|
> |CVE-2020-8840|critical|9.8|
> |CVE-2020-9546|critical|9.8|
> |CVE-2020-9547|critical|9.8|
> |CVE-2020-9548|critical|9.8|
>  
> Our security team is trying to block us from using HBase because of this.
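As a quick way to confirm exposure, a hedged classpath probe; the shaded package name below is my recollection of the htrace-core4 jar layout and should be verified against the actual jar contents:

{code:java}
public class HtraceShadedJacksonProbe {
  public static void main(String[] args) {
    try {
      // htrace-core4 relocates jackson-databind under its own package
      // (assumed relocation prefix; check the jar to confirm).
      Class.forName("org.apache.htrace.fasterxml.jackson.databind.ObjectMapper");
      System.out.println("htrace-core4's shaded jackson-databind is on the classpath");
    } catch (ClassNotFoundException e) {
      System.out.println("htrace-core4's shaded jackson-databind not found");
    }
  }
}
{code}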



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-24875) Remove the force param for unassign since it does not take effect any more

2020-09-15 Thread Sean Busbey (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey resolved HBASE-24875.
-
Fix Version/s: 3.0.0-alpha-1
   Resolution: Fixed

> Remove the force param for unassign since it does not take effect any more
> --
>
> Key: HBASE-24875
> URL: https://issues.apache.org/jira/browse/HBASE-24875
> Project: HBase
>  Issue Type: Improvement
>  Components: Client
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
> Fix For: 3.0.0-alpha-1
>
>
> Currently unassigning a region in fact only closes it, so the force param is 
> no longer needed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25021) Nightly job should skip hadoop-2 integration test for master

2020-09-14 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17195411#comment-17195411
 ] 

Sean Busbey commented on HBASE-25021:
-

We can break apart the Hadoop 2 and Hadoop 3 tests and then use Jenkins DSL 
branch detection to decide on which to run. Or we can use the branch 
information that's in env variables to decide on running the Hadoop 2 test.
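A sketch of the second option, under the assumption that the job exposes the usual Jenkins BRANCH_NAME environment variable (the class name and output are illustrative):

{code:java}
public class Hadoop2StageGate {
  public static void main(String[] args) {
    // BRANCH_NAME is the conventional Jenkins multibranch variable (assumed
    // available here); master is hadoop-3 only, so skip hadoop-2 there.
    String branch = System.getenv().getOrDefault("BRANCH_NAME", "");
    boolean runHadoop2 = !branch.equals("master");
    System.out.println(runHadoop2 ? "run hadoop-2 integration test" : "skip hadoop-2 integration test");
  }
}
{code}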

> Nightly job should skip hadoop-2 integration test for master
> 
>
> Key: HBASE-25021
> URL: https://issues.apache.org/jira/browse/HBASE-25021
> Project: HBase
>  Issue Type: Bug
>  Components: build, scripts
>Reporter: Duo Zhang
>Priority: Major
>
> Since master does not support Hadoop 2.x any more.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-25018) EOM cleanup

2020-09-13 Thread Sean Busbey (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-25018:

Fix Version/s: 3.0.0-alpha-1
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

website now updated. that's all I noticed.

> EOM cleanup
> ---
>
> Key: HBASE-25018
> URL: https://issues.apache.org/jira/browse/HBASE-25018
> Project: HBase
>  Issue Type: Task
>  Components: community, website
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Major
> Fix For: 3.0.0-alpha-1
>
>
> the [foundation downloads area for the 
> project|https://downloads.apache.org/hbase/] has some versions that shouldn't 
> be present anymore. Also it's missing an EOM marker for 1.3.
> the ref guide also includes several EOM versions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25018) EOM cleanup

2020-09-13 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17195066#comment-17195066
 ] 

Sean Busbey commented on HBASE-25018:
-

ref guide changes have merged. waiting on a website build.

> EOM cleanup
> ---
>
> Key: HBASE-25018
> URL: https://issues.apache.org/jira/browse/HBASE-25018
> Project: HBase
>  Issue Type: Task
>  Components: community, website
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Major
>
> the [foundation downloads area for the 
> project|https://downloads.apache.org/hbase/] has some versions that shouldn't 
> be present anymore. Also it's missing an EOM marker for 1.3.
> the ref guide also includes several EOM versions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-25019) check if we're impacted by MASSEMBLY-941 and mitigate if needed

2020-09-12 Thread Sean Busbey (Jira)
Sean Busbey created HBASE-25019:
---

 Summary: check if we're impacted by MASSEMBLY-941 and mitigate if 
needed
 Key: HBASE-25019
 URL: https://issues.apache.org/jira/browse/HBASE-25019
 Project: HBase
  Issue Type: Task
  Components: scripts
Affects Versions: 1.6.0, 2.3.1, 2.3.0, 3.0.0-alpha-1, 1.7.0
Reporter: Sean Busbey
 Fix For: 3.0.0-alpha-1, 2.3.3, 1.7.0


MASSEMBLY-941 notes a bug starting in version 3.2.0 of the assembly plugin 
where scripts lose their executable bit.

We've had this version since updating to the apache parent pom version 22 in 
HBASE-23675. We should check our release artifacts to see if we are impacted 
and if so downgrade the assembly plugin in our poms to 3.1.1.
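One way to check the artifacts, sketched with commons-compress; the path handling and output are illustrative, not a committed script:

{code:java}
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.util.zip.GZIPInputStream;
import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;

public class CheckExecBits {
  public static void main(String[] args) throws Exception {
    // args[0]: path to a *-bin.tar.gz release artifact.
    try (TarArchiveInputStream tar = new TarArchiveInputStream(
        new GZIPInputStream(new BufferedInputStream(new FileInputStream(args[0]))))) {
      TarArchiveEntry entry;
      while ((entry = tar.getNextTarEntry()) != null) {
        // MASSEMBLY-941 symptom: scripts under bin/ missing the owner-execute bit.
        if (!entry.isDirectory() && entry.getName().contains("/bin/")
            && (entry.getMode() & 0100) == 0) {
          System.out.println("lost exec bit: " + entry.getName());
        }
      }
    }
  }
}
{code}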



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-25018) EOM cleanup

2020-09-12 Thread Sean Busbey (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-25018:

Status: Patch Available  (was: In Progress)

> EOM cleanup
> ---
>
> Key: HBASE-25018
> URL: https://issues.apache.org/jira/browse/HBASE-25018
> Project: HBase
>  Issue Type: Task
>  Components: community, website
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Major
>
> the [foundation downloads area for the 
> project|https://downloads.apache.org/hbase/] has some versions that shouldn't 
> be present anymore. Also it's missing an EOM marker for 1.3.
> the ref guide also includes several EOM versions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25018) EOM cleanup

2020-09-12 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17194921#comment-17194921
 ] 

Sean Busbey commented on HBASE-25018:
-

I pushed a commit to the dist.a.o svn repo that 

* deleted 2.2.4 (since 2.2.5 is there)
* updated the stable pointer from 2.2.4 to 2.2.5
* deleted hbase-1.3.6 (since 1.3 is EOM)
* updated the header message to note EOM for 1.3, fixed a reference to the 
stable-1 pointer we decided not to have, and cleaned up some text phrasing.

> EOM cleanup
> ---
>
> Key: HBASE-25018
> URL: https://issues.apache.org/jira/browse/HBASE-25018
> Project: HBase
>  Issue Type: Task
>  Components: community, website
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Major
>
> the [foundation downloads area for the 
> project|https://downloads.apache.org/hbase/] has some versions that shouldn't 
> be present anymore. Also it's missing an EOM marker for 1.3.
> the ref guide also includes several EOM versions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HBASE-25018) EOM cleanup

2020-09-12 Thread Sean Busbey (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-25018 started by Sean Busbey.
---
> EOM cleanup
> ---
>
> Key: HBASE-25018
> URL: https://issues.apache.org/jira/browse/HBASE-25018
> Project: HBase
>  Issue Type: Task
>  Components: community, website
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Major
>
> the [foundation downloads area for the 
> project|https://downloads.apache.org/hbase/] has some versions that shouldn't 
> be present anymore. Also it's missing an EOM marker for 1.3.
> the ref guide also includes several EOM versions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-25018) EOM cleanup

2020-09-12 Thread Sean Busbey (Jira)
Sean Busbey created HBASE-25018:
---

 Summary: EOM cleanup
 Key: HBASE-25018
 URL: https://issues.apache.org/jira/browse/HBASE-25018
 Project: HBase
  Issue Type: Task
  Components: community, website
Reporter: Sean Busbey
Assignee: Sean Busbey


the [foundation downloads area for the 
project|https://downloads.apache.org/hbase/] has some versions that shouldn't 
be present anymore. Also it's missing an EOM marker for 1.3.

the ref guide also includes several EOM versions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24932) Reduce the number of builds to keep for flaky find job

2020-08-23 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17182755#comment-17182755
 ] 

Sean Busbey commented on HBASE-24932:
-

I'm getting more detail from infra on what space is being used. Something is 
amiss because a given job should have < 40KB of artifacts. With a retention of 
30 builds and 6 branches that's less than 10MB. I think fixing this disconnect 
is a better use of time than eliminating the job history.

> Reduce the number of builds to keep for flaky find job
> --
>
> Key: HBASE-24932
> URL: https://issues.apache.org/jira/browse/HBASE-24932
> Project: HBase
>  Issue Type: Sub-task
>  Components: build, flakies, scripts
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24936) review Jenkins build artifacts

2020-08-23 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17182754#comment-17182754
 ] 

Sean Busbey commented on HBASE-24936:
-

Example job on ci-hadoop for copying stuff to the nightly host:

https://ci-hadoop.apache.org/job/infra-test-nightlies/configure

> review Jenkins build artifacts
> --
>
> Key: HBASE-24936
> URL: https://issues.apache.org/jira/browse/HBASE-24936
> Project: HBase
>  Issue Type: Task
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Critical
>
> Post move to the ci-hadoop build servers we are now the biggest user of 
> space. That is not a problem in and of itself, but the master node has run 
> out of disk space twice now. As of this snapshot we are using 125GB of 
> storage and the next largest project is only using 20GB.
> https://paste.apache.org/kyrds
> We should review our builds for any issues and come up with expectations for 
> what our steady-state disk usage should look like
> * we are supposed to compress any test logs (usually this gets us 90-99% 
> space savings)
> * we are supposed to clean up workspaces when jobs are done
> * we are supposed to keep a fixed window of prior builds (either by days or 
> number of runs)
> If all of our jobs are currently following these guidelines, another 
> possibility is to push the artifacts we need over to 
> [nightlies.a.o|https://nightlies.apache.org/authoring.html]. Barring that, we 
> should formally request ASF infra set up [a plugin for storing artifacts on 
> s3|https://plugins.jenkins.io/artifact-manager-s3/].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24932) Reduce the number of builds to keep for flaky find job

2020-08-23 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17182709#comment-17182709
 ] 

Sean Busbey commented on HBASE-24932:
-

At first I thought we would only need the latest build of the flaky job finder. 
But the more I think about it, if we don't keep any old flaky reports then 
there's no way to find out how a branch has been doing for flaky failures over 
time.

I'd rather see if there's a way we can bring down the per-build size, or at 
least keep the report HTML since that only appears to be ~30KB.

> Reduce the number of builds to keep for flaky find job
> --
>
> Key: HBASE-24932
> URL: https://issues.apache.org/jira/browse/HBASE-24932
> Project: HBase
>  Issue Type: Sub-task
>  Components: build, flakies, scripts
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24936) review Jenkins build artifacts

2020-08-23 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17182706#comment-17182706
 ] 

Sean Busbey commented on HBASE-24936:
-

wishlist: add a stage that can run on master and abort our builds that use a 
lot of space when free space is low. that would let the lighter-weight builds 
keep working.

> review Jenkins build artifacts
> --
>
> Key: HBASE-24936
> URL: https://issues.apache.org/jira/browse/HBASE-24936
> Project: HBase
>  Issue Type: Task
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Critical
>
> Post move to the ci-hadoop build servers we are now the biggest user of 
> space. That is not a problem in and of itself, but the master node has run 
> out of disk space twice now. As of this snapshot we are using 125GB of 
> storage and the next largest project is only using 20GB.
> https://paste.apache.org/kyrds
> We should review our builds for any issues and come up with expectations for 
> what our steady-state disk usage should look like
> * we are supposed to compress any test logs (usually this gets us 90-99% 
> space savings)
> * we are supposed to clean up workspaces when jobs are done
> * we are supposed to keep a fixed window of prior builds (either by days or 
> number of runs)
> If all of our jobs are currently following these guidelines, another 
> possibility is to push the artifacts we need over to 
> [nightlies.a.o|https://nightlies.apache.org/authoring.html]. Barring that, we 
> should formally request ASF infra set up [a plugin for storing artifacts on 
> s3|https://plugins.jenkins.io/artifact-manager-s3/].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24936) review Jenkins build artifacts

2020-08-23 Thread Sean Busbey (Jira)
Sean Busbey created HBASE-24936:
---

 Summary: review Jenkins build artifacts
 Key: HBASE-24936
 URL: https://issues.apache.org/jira/browse/HBASE-24936
 Project: HBase
  Issue Type: Task
Reporter: Sean Busbey
Assignee: Sean Busbey


Post move to the ci-hadoop build servers we are now the biggest user of space. 
That is not a problem in and of itself, but the master node has run out of disk 
space twice now. As of this snapshot we are using 125GB of storage and the next 
largest project is only using 20GB.

https://paste.apache.org/kyrds

We should review our builds for any issues and come up with expectations for 
what our steady-state disk usage should look like

* we are supposed to compress any test logs (usually this gets us 90-99% space 
savings)
* we are supposed to clean up workspaces when jobs are done
* we are supposed to keep a fixed window of prior builds (either by days or 
number of runs)

If all of our jobs are currently following these guidelines, another 
possibility is to push the artifacts we need over to 
[nightlies.a.o|https://nightlies.apache.org/authoring.html]. Barring that, we 
should formally request ASF infra set up [a plugin for storing artifacts on 
s3|https://plugins.jenkins.io/artifact-manager-s3/].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24802) Please fix CVEs by removing reference to htrace-core4

2020-08-16 Thread Sean Busbey (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-24802:

Affects Version/s: 1.4.0
   2.2.0
   1.6.0

> Please fix CVEs by removing reference to htrace-core4
> -
>
> Key: HBASE-24802
> URL: https://issues.apache.org/jira/browse/HBASE-24802
> Project: HBase
>  Issue Type: Bug
>  Components: Client, dependencies, thirdparty
>Affects Versions: 1.4.0, 2.2.0, 2.3.0, 1.6.0
>Reporter: Rodney Aaron Stainback
>Assignee: Sean Busbey
>Priority: Critical
>
> htrace-core4 is a retired project, and even its latest version shades Jackson 
> Databind 2.4.0, which has the following CVEs:
> |cve|severity|cvss|
> |CVE-2017-15095|critical|9.8|
> |CVE-2018-1000873|medium|6.5|
> |CVE-2018-14718|critical|9.8|
> |CVE-2018-5968|high|8.1|
> |CVE-2018-7489|critical|9.8|
> |CVE-2019-14540|critical|9.8|
> |CVE-2019-14893|critical|9.8|
> |CVE-2019-16335|critical|9.8|
> |CVE-2019-16942|critical|9.8|
> |CVE-2019-16943|critical|9.8|
> |CVE-2019-17267|critical|9.8|
> |CVE-2019-17531|critical|9.8|
> |CVE-2019-20330|critical|9.8|
> |CVE-2020-10672|high|8.8|
> |CVE-2020-10673|high|8.8|
> |CVE-2020-10968|high|8.8|
> |CVE-2020-10969|high|8.8|
> |CVE-2020-1|high|8.8|
> |CVE-2020-2|high|8.8|
> |CVE-2020-3|high|8.8|
> |CVE-2020-11619|critical|9.8|
> |CVE-2020-11620|critical|9.8|
> |CVE-2020-14060|high|8.1|
> |CVE-2020-14061|high|8.1|
> |CVE-2020-14062|high|8.1|
> |CVE-2020-14195|high|8.1|
> |CVE-2020-8840|critical|9.8|
> |CVE-2020-9546|critical|9.8|
> |CVE-2020-9547|critical|9.8|
> |CVE-2020-9548|critical|9.8|
>  
> Our security team is trying to block us from using HBase because of this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24802) Please fix CVEs by removing reference to htrace-core4

2020-08-16 Thread Sean Busbey (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-24802:

Priority: Critical  (was: Major)

> Please fix CVEs by removing reference to htrace-core4
> -
>
> Key: HBASE-24802
> URL: https://issues.apache.org/jira/browse/HBASE-24802
> Project: HBase
>  Issue Type: Bug
>  Components: Client, dependencies, thirdparty
>Affects Versions: 2.3.0
>Reporter: Rodney Aaron Stainback
>Assignee: Sean Busbey
>Priority: Critical
>
> htrace-core4 is a retired project, and even its latest version shades Jackson 
> Databind 2.4.0, which has the following CVEs:
> |cve|severity|cvss|
> |CVE-2017-15095|critical|9.8|
> |CVE-2018-1000873|medium|6.5|
> |CVE-2018-14718|critical|9.8|
> |CVE-2018-5968|high|8.1|
> |CVE-2018-7489|critical|9.8|
> |CVE-2019-14540|critical|9.8|
> |CVE-2019-14893|critical|9.8|
> |CVE-2019-16335|critical|9.8|
> |CVE-2019-16942|critical|9.8|
> |CVE-2019-16943|critical|9.8|
> |CVE-2019-17267|critical|9.8|
> |CVE-2019-17531|critical|9.8|
> |CVE-2019-20330|critical|9.8|
> |CVE-2020-10672|high|8.8|
> |CVE-2020-10673|high|8.8|
> |CVE-2020-10968|high|8.8|
> |CVE-2020-10969|high|8.8|
> |CVE-2020-1|high|8.8|
> |CVE-2020-2|high|8.8|
> |CVE-2020-3|high|8.8|
> |CVE-2020-11619|critical|9.8|
> |CVE-2020-11620|critical|9.8|
> |CVE-2020-14060|high|8.1|
> |CVE-2020-14061|high|8.1|
> |CVE-2020-14062|high|8.1|
> |CVE-2020-14195|high|8.1|
> |CVE-2020-8840|critical|9.8|
> |CVE-2020-9546|critical|9.8|
> |CVE-2020-9547|critical|9.8|
> |CVE-2020-9548|critical|9.8|
>  
> Our security team is trying to block us from using HBase because of this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24802) Please fix CVEs by removing reference to htrace-core4

2020-08-16 Thread Sean Busbey (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-24802:

Component/s: thirdparty
 dependencies

> Please fix CVEs by removing reference to htrace-core4
> -
>
> Key: HBASE-24802
> URL: https://issues.apache.org/jira/browse/HBASE-24802
> Project: HBase
>  Issue Type: Bug
>  Components: Client, dependencies, thirdparty
>Affects Versions: 2.3.0
>Reporter: Rodney Aaron Stainback
>Assignee: Sean Busbey
>Priority: Major
>
> htrace-core4 is a retired project, and even its latest version shades Jackson 
> Databind 2.4.0, which has the following CVEs:
> |cve|severity|cvss|
> |CVE-2017-15095|critical|9.8|
> |CVE-2018-1000873|medium|6.5|
> |CVE-2018-14718|critical|9.8|
> |CVE-2018-5968|high|8.1|
> |CVE-2018-7489|critical|9.8|
> |CVE-2019-14540|critical|9.8|
> |CVE-2019-14893|critical|9.8|
> |CVE-2019-16335|critical|9.8|
> |CVE-2019-16942|critical|9.8|
> |CVE-2019-16943|critical|9.8|
> |CVE-2019-17267|critical|9.8|
> |CVE-2019-17531|critical|9.8|
> |CVE-2019-20330|critical|9.8|
> |CVE-2020-10672|high|8.8|
> |CVE-2020-10673|high|8.8|
> |CVE-2020-10968|high|8.8|
> |CVE-2020-10969|high|8.8|
> |CVE-2020-1|high|8.8|
> |CVE-2020-2|high|8.8|
> |CVE-2020-3|high|8.8|
> |CVE-2020-11619|critical|9.8|
> |CVE-2020-11620|critical|9.8|
> |CVE-2020-14060|high|8.1|
> |CVE-2020-14061|high|8.1|
> |CVE-2020-14062|high|8.1|
> |CVE-2020-14195|high|8.1|
> |CVE-2020-8840|critical|9.8|
> |CVE-2020-9546|critical|9.8|
> |CVE-2020-9547|critical|9.8|
> |CVE-2020-9548|critical|9.8|
>  
> Our security team is trying to block us from using HBase because of this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HBASE-24802) Please fix CVEs by removing reference to htrace-core4

2020-08-16 Thread Sean Busbey (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey reassigned HBASE-24802:
---

Assignee: Sean Busbey

> Please fix CVEs by removing reference to htrace-core4
> -
>
> Key: HBASE-24802
> URL: https://issues.apache.org/jira/browse/HBASE-24802
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 2.3.0
>Reporter: Rodney Aaron Stainback
>Assignee: Sean Busbey
>Priority: Major
>
> htrace-core4 is a retired project, and even its latest version shades Jackson 
> Databind 2.4.0, which has the following CVEs:
> |cve|severity|cvss|
> |CVE-2017-15095|critical|9.8|
> |CVE-2018-1000873|medium|6.5|
> |CVE-2018-14718|critical|9.8|
> |CVE-2018-5968|high|8.1|
> |CVE-2018-7489|critical|9.8|
> |CVE-2019-14540|critical|9.8|
> |CVE-2019-14893|critical|9.8|
> |CVE-2019-16335|critical|9.8|
> |CVE-2019-16942|critical|9.8|
> |CVE-2019-16943|critical|9.8|
> |CVE-2019-17267|critical|9.8|
> |CVE-2019-17531|critical|9.8|
> |CVE-2019-20330|critical|9.8|
> |CVE-2020-10672|high|8.8|
> |CVE-2020-10673|high|8.8|
> |CVE-2020-10968|high|8.8|
> |CVE-2020-10969|high|8.8|
> |CVE-2020-1|high|8.8|
> |CVE-2020-2|high|8.8|
> |CVE-2020-3|high|8.8|
> |CVE-2020-11619|critical|9.8|
> |CVE-2020-11620|critical|9.8|
> |CVE-2020-14060|high|8.1|
> |CVE-2020-14061|high|8.1|
> |CVE-2020-14062|high|8.1|
> |CVE-2020-14195|high|8.1|
> |CVE-2020-8840|critical|9.8|
> |CVE-2020-9546|critical|9.8|
> |CVE-2020-9547|critical|9.8|
> |CVE-2020-9548|critical|9.8|
>  
> Our security team is trying to block us from using HBase because of this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HBASE-24802) Please fix CVEs by removing reference to htrace-core4

2020-08-16 Thread Sean Busbey (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-24802 started by Sean Busbey.
---
> Please fix CVEs by removing reference to htrace-core4
> -
>
> Key: HBASE-24802
> URL: https://issues.apache.org/jira/browse/HBASE-24802
> Project: HBase
>  Issue Type: Bug
>  Components: Client, dependencies, thirdparty
>Affects Versions: 2.3.0
>Reporter: Rodney Aaron Stainback
>Assignee: Sean Busbey
>Priority: Critical
>
> htrace-core4 is a retired project, and even its latest version shades Jackson 
> Databind 2.4.0, which has the following CVEs:
> |cve|severity|cvss|
> |CVE-2017-15095|critical|9.8|
> |CVE-2018-1000873|medium|6.5|
> |CVE-2018-14718|critical|9.8|
> |CVE-2018-5968|high|8.1|
> |CVE-2018-7489|critical|9.8|
> |CVE-2019-14540|critical|9.8|
> |CVE-2019-14893|critical|9.8|
> |CVE-2019-16335|critical|9.8|
> |CVE-2019-16942|critical|9.8|
> |CVE-2019-16943|critical|9.8|
> |CVE-2019-17267|critical|9.8|
> |CVE-2019-17531|critical|9.8|
> |CVE-2019-20330|critical|9.8|
> |CVE-2020-10672|high|8.8|
> |CVE-2020-10673|high|8.8|
> |CVE-2020-10968|high|8.8|
> |CVE-2020-10969|high|8.8|
> |CVE-2020-1|high|8.8|
> |CVE-2020-2|high|8.8|
> |CVE-2020-3|high|8.8|
> |CVE-2020-11619|critical|9.8|
> |CVE-2020-11620|critical|9.8|
> |CVE-2020-14060|high|8.1|
> |CVE-2020-14061|high|8.1|
> |CVE-2020-14062|high|8.1|
> |CVE-2020-14195|high|8.1|
> |CVE-2020-8840|critical|9.8|
> |CVE-2020-9546|critical|9.8|
> |CVE-2020-9547|critical|9.8|
> |CVE-2020-9548|critical|9.8|
>  
> Our security team is trying to block us from using HBase because of this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24869) migrate website generation to new asf jenkins

2020-08-13 Thread Sean Busbey (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-24869:

Fix Version/s: 3.0.0-alpha-1
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> migrate website generation to new asf jenkins
> -
>
> Key: HBASE-24869
> URL: https://issues.apache.org/jira/browse/HBASE-24869
> Project: HBase
>  Issue Type: Task
>  Components: build, website
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Major
> Fix For: 3.0.0-alpha-1
>
>
> Update our website generation so we can use it on the new jenkins ci server
> * needs a job name that has no spaces (or fix the script to handle paths with 
> spaces)
> * needs to only run on nodes labeled git-websites (so it will have the creds 
> to push updates)
> * needs to set editable email notification on failure (details in comment)
> Also we will need to convert to a pipeline DSL
> * define tools, namely maven (alternative get [Tool Environment 
> Plugin|https://plugins.jenkins.io/toolenv/])
> * set timeout for 4 hours (alternative get [build timeout 
> plugin|https://plugins.jenkins.io/build-timeout/])
> needs to clean the workspace when done (haven't found an alternative; maybe 
> it's a default for non-pipeline jobs now?)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24869) migrate website generation to new asf jenkins

2020-08-13 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17177454#comment-17177454
 ] 

Sean Busbey commented on HBASE-24869:
-

I checked on the job on ci-hadoop today and things appear to be working 
properly. closing this out.

> migrate website generation to new asf jenkins
> -
>
> Key: HBASE-24869
> URL: https://issues.apache.org/jira/browse/HBASE-24869
> Project: HBase
>  Issue Type: Task
>  Components: build, website
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Major
>
> Update our website generation so we can use it on the new jenkins ci server
> * needs a job name that has no spaces (or fix the script to handle paths with 
> spaces)
> * needs to only run on nodes labeled git-websites (so it will have the creds 
> to push updates)
> * needs to set editable email notification on failure (details in comment)
> Also we will need to convert to a pipeline DSL
> * define tools, namely maven (alternatively get the [Tool Environment 
> Plugin|https://plugins.jenkins.io/toolenv/])
> * set timeout for 4 hours (alternatively get the [build timeout 
> plugin|https://plugins.jenkins.io/build-timeout/])
> * needs to clean workspace when done (haven't found an alternative; maybe 
> it's a default for non-pipeline jobs now?)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24869) migrate website generation to new asf jenkins

2020-08-12 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176392#comment-17176392
 ] 

Sean Busbey commented on HBASE-24869:
-

merged the PR. disabled the job on builds.a.o. updated the job on ci-hadoop.a.o 
to use master instead of the feature branch. started a test build

> migrate website generation to new asf jenkins
> -
>
> Key: HBASE-24869
> URL: https://issues.apache.org/jira/browse/HBASE-24869
> Project: HBase
>  Issue Type: Task
>  Components: build, website
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Major
>
> Update our website generation so we can use it on the new jenkins ci server
> * needs a job name that has no spaces (or fix the script to handle paths with 
> spaces)
> * needs to only run on nodes labeled git-websites (so it will have the creds 
> to push updates)
> * needs to set editable email notification on failure (details in comment)
> Also we will need to convert to a pipeline DSL
> * define tools, namely maven (alternatively get the [Tool Environment 
> Plugin|https://plugins.jenkins.io/toolenv/])
> * set timeout for 4 hours (alternatively get the [build timeout 
> plugin|https://plugins.jenkins.io/build-timeout/])
> * needs to clean workspace when done (haven't found an alternative; maybe 
> it's a default for non-pipeline jobs now?)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24869) migrate website generation to new asf jenkins

2020-08-11 Thread Sean Busbey (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-24869:

Status: Patch Available  (was: In Progress)

> migrate website generation to new asf jenkins
> -
>
> Key: HBASE-24869
> URL: https://issues.apache.org/jira/browse/HBASE-24869
> Project: HBase
>  Issue Type: Task
>  Components: build, website
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Major
>
> Update our website generation so we can use it on the new jenkins ci server
> * needs a job name that has no spaces (or fix the script to handle paths with 
> spaces)
> * needs to only run on nodes labeled git-websites (so it will have the creds 
> to push updates)
> * needs to set editable email notification on failure (details in comment)
> Also we will need to convert to a pipeline DSL
> * define tools, namely maven (alternatively get the [Tool Environment 
> Plugin|https://plugins.jenkins.io/toolenv/])
> * set timeout for 4 hours (alternatively get the [build timeout 
> plugin|https://plugins.jenkins.io/build-timeout/])
> * needs to clean workspace when done (haven't found an alternative; maybe 
> it's a default for non-pipeline jobs now?)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24869) migrate website generation to new asf jenkins

2020-08-11 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176007#comment-17176007
 ] 

Sean Busbey commented on HBASE-24869:
-

build 6 failed as expected:
https://ci-hadoop.apache.org/job/HBase/job/hbase_generate_website/6/console

and the email notification looks reasonable now:

https://lists.apache.org/thread.html/r0ba69e2ec439c95f1311b70bd10d466b81b0e8bcda4fc859239e44cd%40%3Cdev.hbase.apache.org%3E

> migrate website generation to new asf jenkins
> -
>
> Key: HBASE-24869
> URL: https://issues.apache.org/jira/browse/HBASE-24869
> Project: HBase
>  Issue Type: Task
>  Components: build, website
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Major
>
> Update our website generation so we can use it on the new jenkins ci server
> * needs a job name that has no spaces (or fix the script to handle paths with 
> spaces)
> * needs to only run on nodes labeled git-websites (so it will have the creds 
> to push updates)
> * needs to set editable email notification on failure (details in comment)
> Also we will need to convert to a pipeline DSL
> * define tools, namely maven (alternatively get the [Tool Environment 
> Plugin|https://plugins.jenkins.io/toolenv/])
> * set timeout for 4 hours (alternatively get the [build timeout 
> plugin|https://plugins.jenkins.io/build-timeout/])
> * needs to clean workspace when done (haven't found an alternative; maybe 
> it's a default for non-pipeline jobs now?)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24869) migrate website generation to new asf jenkins

2020-08-11 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176008#comment-17176008
 ] 

Sean Busbey commented on HBASE-24869:
-

I think that has all of our bases covered for equivalent functionality. putting 
up a PR

> migrate website generation to new asf jenkins
> -
>
> Key: HBASE-24869
> URL: https://issues.apache.org/jira/browse/HBASE-24869
> Project: HBase
>  Issue Type: Task
>  Components: build, website
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Major
>
> Update our website generation so we can use it on the new jenkins ci server
> * needs a job name that has no spaces (or fix the script to handle paths with 
> spaces)
> * needs to only run on nodes labeled git-websites (so it will have the creds 
> to push updates)
> * needs to set editable email notification on failure (details in comment)
> Also we will need to convert to a pipeline DSL
> * define tools, namely maven (alternatively get the [Tool Environment 
> Plugin|https://plugins.jenkins.io/toolenv/])
> * set timeout for 4 hours (alternatively get the [build timeout 
> plugin|https://plugins.jenkins.io/build-timeout/])
> * needs to clean workspace when done (haven't found an alternative; maybe 
> it's a default for non-pipeline jobs now?)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24869) migrate website generation to new asf jenkins

2020-08-11 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175993#comment-17175993
 ] 

Sean Busbey commented on HBASE-24869:
-

verified build 3 has the expected artifacts and it cleaned up the workspace

> migrate website generation to new asf jenkins
> -
>
> Key: HBASE-24869
> URL: https://issues.apache.org/jira/browse/HBASE-24869
> Project: HBase
>  Issue Type: Task
>  Components: build, website
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Major
>
> Update our website generation so we can use it on the new jenkins ci server
> * needs a job name that has no spaces (or fix the script to handle paths with 
> spaces)
> * needs to only run on nodes labeled git-websites (so it will have the creds 
> to push updates)
> * needs to set editable email notification on failure (details in comment)
> Also we will need to convert to a pipeline DSL
> * define tools, namely maven (alternatively get the [Tool Environment 
> Plugin|https://plugins.jenkins.io/toolenv/])
> * set timeout for 4 hours (alternatively get the [build timeout 
> plugin|https://plugins.jenkins.io/build-timeout/])
> * needs to clean workspace when done (haven't found an alternative; maybe 
> it's a default for non-pipeline jobs now?)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24869) migrate website generation to new asf jenkins

2020-08-11 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175991#comment-17175991
 ] 

Sean Busbey commented on HBASE-24869:
-

build 2's failure sent an email to the dev list, and it has almost no details 
filled in:

https://lists.apache.org/thread.html/r24b5a5e0341cd2a658a8c93538702f6f3d8528c0ad4f42f234555d9e%40%3Cdev.hbase.apache.org%3E

> migrate website generation to new asf jenkins
> -
>
> Key: HBASE-24869
> URL: https://issues.apache.org/jira/browse/HBASE-24869
> Project: HBase
>  Issue Type: Task
>  Components: build, website
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Major
>
> Update our website generation so we can use it on the new jenkins ci server
> * needs a job name that has no spaces (or fix the script to handle paths with 
> spaces)
> * needs to only run on nodes labeled git-websites (so it will have the creds 
> to push updates)
> * needs to set editable email notification on failure (details in comment)
> Also we will need to convert to a pipeline DSL
> * define tools, namely maven (alternatively get the [Tool Environment 
> Plugin|https://plugins.jenkins.io/toolenv/])
> * set timeout for 4 hours (alternatively get the [build timeout 
> plugin|https://plugins.jenkins.io/build-timeout/])
> * needs to clean workspace when done (haven't found an alternative; maybe 
> it's a default for non-pipeline jobs now?)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24869) migrate website generation to new asf jenkins

2020-08-11 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175990#comment-17175990
 ] 

Sean Busbey commented on HBASE-24869:
-

build 3 worked:

https://ci-hadoop.apache.org/job/HBase/job/hbase_generate_website/3/console

this commit came from the job:

https://github.com/apache/hbase-site/commit/0b1ad5eab6f29b479dbf5223ffd372682d74ce86

> migrate website generation to new asf jenkins
> -
>
> Key: HBASE-24869
> URL: https://issues.apache.org/jira/browse/HBASE-24869
> Project: HBase
>  Issue Type: Task
>  Components: build, website
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Major
>
> Update our website generation so we can use it on the new jenkins ci server
> * needs a job name that has no spaces (or fix the script to handle paths with 
> spaces)
> * needs to only run on nodes labeled git-websites (so it will have the creds 
> to push updates)
> * needs to set editable email notification on failure (details in comment)
> Also we will need to convert to a pipeline DSL
> * define tools, namely maven (alternatively get the [Tool Environment 
> Plugin|https://plugins.jenkins.io/toolenv/])
> * set timeout for 4 hours (alternatively get the [build timeout 
> plugin|https://plugins.jenkins.io/build-timeout/])
> * needs to clean workspace when done (haven't found an alternative; maybe 
> it's a default for non-pipeline jobs now?)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24869) migrate website generation to new asf jenkins

2020-08-11 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175982#comment-17175982
 ] 

Sean Busbey commented on HBASE-24869:
-

* [WIP branch (will have force 
pushes)|https://github.com/apache/hbase/tree/HBASE-24869]
* [WIP job (currently points to 
branch)|https://ci-hadoop.apache.org/job/HBase/job/hbase_generate_website/]


> migrate website generation to new asf jenkins
> -
>
> Key: HBASE-24869
> URL: https://issues.apache.org/jira/browse/HBASE-24869
> Project: HBase
>  Issue Type: Task
>  Components: build, website
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Major
>
> Update our website generation so we can use it on the new jenkins ci server
> * needs a job name that has no spaces (or fix the script to handle paths with 
> spaces)
> * needs to only run on nodes labeled git-websites (so it will have the creds 
> to push updates)
> * needs to set editable email notification on failure (details in comment)
> Also we will need to convert to a pipeline DSL
> * define tools, namely maven (alternatively get the [Tool Environment 
> Plugin|https://plugins.jenkins.io/toolenv/])
> * set timeout for 4 hours (alternatively get the [build timeout 
> plugin|https://plugins.jenkins.io/build-timeout/])
> * needs to clean workspace when done (haven't found an alternative; maybe 
> it's a default for non-pipeline jobs now?)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24869) migrate website generation to new asf jenkins

2020-08-11 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175974#comment-17175974
 ] 

Sean Busbey commented on HBASE-24869:
-

bq. needs a job name that has no spaces (or fix the script to handle paths with 
spaces)

I am unable to reproduce a problem using the current script with paths that 
have spaces, just fyi.

the failure is on calling maven's version command. so maybe it's something 
specific to the version of maven?

> migrate website generation to new asf jenkins
> -
>
> Key: HBASE-24869
> URL: https://issues.apache.org/jira/browse/HBASE-24869
> Project: HBase
>  Issue Type: Task
>  Components: build, website
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Major
>
> Update our website generation so we can use it on the new jenkins ci server
> * needs a job name that has no spaces (or fix the script to handle paths with 
> spaces)
> * needs to only run on nodes labeled git-websites (so it will have the creds 
> to push updates)
> * needs to set editable email notification on failure (details in comment)
> Also we will need to convert to a pipeline DSL
> * define tools, namely maven (alternatively get the [Tool Environment 
> Plugin|https://plugins.jenkins.io/toolenv/])
> * set timeout for 4 hours (alternatively get the [build timeout 
> plugin|https://plugins.jenkins.io/build-timeout/])
> * needs to clean workspace when done (haven't found an alternative; maybe 
> it's a default for non-pipeline jobs now?)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24869) migrate website generation to new asf jenkins

2020-08-11 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175954#comment-17175954
 ] 

Sean Busbey commented on HBASE-24869:
-

* email subject
{code}
${BUILD_STATUS}: HBase Generate Website
{code}
* email body
{code}
Build status: ${BUILD_STATUS}

The HBase website has not been updated to incorporate HBase commit 
${CURRENT_HBASE_COMMIT}.

See ${BUILD_URL}console
{code}
* send to dev@hbase
* only send on failure

> migrate website generation to new asf jenkins
> -
>
> Key: HBASE-24869
> URL: https://issues.apache.org/jira/browse/HBASE-24869
> Project: HBase
>  Issue Type: Task
>  Components: build, website
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Major
>
> Update our website generation so we can use it on the new jenkins ci server
> * needs a job name that has no spaces (or fix the script to handle paths with 
> spaces)
> * needs to only run on nodes labeled git-websites (so it will have the creds 
> to push updates)
> * needs to set editable email notification on failure (details in comment)
> Also we will need to convert to a pipeline DSL
> * define tools, namely maven (alternatively get the [Tool Environment 
> Plugin|https://plugins.jenkins.io/toolenv/]
> * set timeout for 4 hours (alternatively get the [build timeout 
> plugin|https://plugins.jenkins.io/build-timeout/]
> * needs to clean workspace when done (haven't found an alternative; maybe 
> it's a default for non-pipeline jobs now?)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24869) migrate website generation to new asf jenkins

2020-08-11 Thread Sean Busbey (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-24869:

Description: 
Update our website generation so we can use it on the new jenkins ci server

* needs a job name that has no spaces (or fix the script to handle paths with 
spaces)
* needs to only run on nodes labeled git-websites (so it will have the creds to 
push updates)
* needs to set editable email notification on failure (details in comment)

Also we will need to convert to a pipeline DSL
* define tools, namely maven (alternatively get the [Tool Environment 
Plugin|https://plugins.jenkins.io/toolenv/])
* set timeout for 4 hours (alternatively get the [build timeout 
plugin|https://plugins.jenkins.io/build-timeout/])
* needs to clean workspace when done (haven't found an alternative; maybe it's 
a default for non-pipeline jobs now?)


  was:
Update our website generation so we can use it on the new jenkins ci server

* needs a job name that has no spaces (or fix the script to handle paths with 
spaces)
* needs to only run on nodes labeled git-websites (so it will have the creds to 
push updates)
* needs to set editable email notification on failure (details in comment)

Also we will need to convert to a pipeline DSL
* define tools, namely maven (alternatively get the [Tool Environment 
Plugin|https://plugins.jenkins.io/toolenv/]
* set timeout for 4 hours (alternatively get the [build timeout 
plugin|https://plugins.jenkins.io/build-timeout/]
* needs to clean workspace when done (haven't found an alternative; maybe it's 
a default for non-pipeline jobs now?)



> migrate website generation to new asf jenkins
> -
>
> Key: HBASE-24869
> URL: https://issues.apache.org/jira/browse/HBASE-24869
> Project: HBase
>  Issue Type: Task
>  Components: build, website
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Major
>
> Update our website generation so we can use it on the new jenkins ci server
> * needs a job name that has no spaces (or fix the script to handle paths with 
> spaces)
> * needs to only run on nodes labeled git-websites (so it will have the creds 
> to push updates)
> * needs to set editable email notification on failure (details in comment)
> Also we will need to convert to a pipeline DSL
> * define tools, namely maven (alternatively get the [Tool Environment 
> Plugin|https://plugins.jenkins.io/toolenv/])
> * set timeout for 4 hours (alternatively get the [build timeout 
> plugin|https://plugins.jenkins.io/build-timeout/])
> * needs to clean workspace when done (haven't found an alternative; maybe 
> it's a default for non-pipeline jobs now?)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HBASE-24869) migrate website generation to new asf jenkins

2020-08-11 Thread Sean Busbey (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-24869 started by Sean Busbey.
---
> migrate website generation to new asf jenkins
> -
>
> Key: HBASE-24869
> URL: https://issues.apache.org/jira/browse/HBASE-24869
> Project: HBase
>  Issue Type: Task
>  Components: build, website
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Major
>
> Update our website generation so we can use it on the new jenkins ci server
> * needs a job name that has no spaces (or fix the script to handle paths with 
> spaces)
> * needs to only run on nodes labeled git-websites (so it will have the creds 
> to push updates)
> * needs to set editable email notification on failure (details in comment)
> Also we will need to convert to a pipeline DSL
> * define tools, namely maven (alternatively get the [Tool Environment 
> Plugin|https://plugins.jenkins.io/toolenv/]
> * set timeout for 4 hours (alternatively get the [build timeout 
> plugin|https://plugins.jenkins.io/build-timeout/]
> * needs to clean workspace when done (haven't found an alternative; maybe 
> it's a default for non-pipeline jobs now?)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24869) migrate website generation to new asf jenkins

2020-08-11 Thread Sean Busbey (Jira)
Sean Busbey created HBASE-24869:
---

 Summary: migrate website generation to new asf jenkins
 Key: HBASE-24869
 URL: https://issues.apache.org/jira/browse/HBASE-24869
 Project: HBase
  Issue Type: Task
  Components: build, website
Reporter: Sean Busbey
Assignee: Sean Busbey


Update our website generation so we can use it on the new jenkins ci server

* needs a job name that has no spaces (or fix the script to handle paths with 
spaces)
* needs to only run on nodes labeled git-websites (so it will have the creds to 
push updates)
* needs to set editable email notification on failure (details in comment)

Also we will need to convert to a pipeline DSL
* define tools, namely maven (alternatively get the [Tool Environment 
Plugin|https://plugins.jenkins.io/toolenv/]
* set timeout for 4 hours (alternatively get the [build timeout 
plugin|https://plugins.jenkins.io/build-timeout/]
* needs to clean workspace when done (haven't found an alternative; maybe it's 
a default for non-pipeline jobs now?)




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24788) Fix the connection leaks on getting hbase admin from unclosed connection

2020-08-08 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17173652#comment-17173652
 ] 

Sean Busbey commented on HBASE-24788:
-

Thanks for the quick fix!

> Fix the connection leaks on getting hbase admin from unclosed connection
> 
>
> Key: HBASE-24788
> URL: https://issues.apache.org/jira/browse/HBASE-24788
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 3.0.0-alpha-1, 2.3.0, 1.6.0
>Reporter: Sandeep Pal
>Assignee: Sandeep Pal
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.1, 1.7.0, 2.4.0, 2.2.7
>
>
> Observed a significant increase in ZK connections during performance testing 
> of map reduce jobs. Turns out 
> [TableOutputFormat.checkOutputSpecs()|https://github.com/apache/hbase/blob/master/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.java#L182]
> is not closing the connection it uses to get the hbase admin. It closes the 
> hbase admin but never closes the connection used to get the admin.
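As an illustration of the fix pattern, here is a minimal sketch with stand-in 
interfaces (not the actual org.apache.hadoop.hbase.client API): the connection 
that produced the admin has to be closed too, e.g. via try-with-resources.

{code}
// Illustrative sketch only; Admin/Connection/ConnectionFactory are stand-in
// types, not the real HBase client API.
class ConnectionLeakSketch {
  interface Admin extends AutoCloseable { }
  interface Connection extends AutoCloseable { Admin getAdmin() throws Exception; }
  interface ConnectionFactory { Connection create() throws Exception; }

  // Leaky shape: the Admin is closed, but the Connection that created it
  // (and its ZooKeeper session) is never closed.
  static void leaky(ConnectionFactory factory) throws Exception {
    Connection conn = factory.create();
    try (Admin admin = conn.getAdmin()) {
      // ... validate the output table ...
    } // conn is never closed here
  }

  // Fixed shape: try-with-resources closes both, in reverse order of acquisition.
  static void fixed(ConnectionFactory factory) throws Exception {
    try (Connection conn = factory.create(); Admin admin = conn.getAdmin()) {
      // ... validate the output table ...
    }
  }
}
{code}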



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (HBASE-24788) Fix the connection leaks on getting hbase admin from unclosed connection

2020-08-07 Thread Sean Busbey (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey reopened HBASE-24788:
-

this change broke source compatibility by removing the throws for 
{{InterruptedException}} from the {{getRecordWriter}} and {{checkOutputSpecs}} 
methods of {{TableOutputFormat}}.

I don't see any discussion of doing so, so I am guessing it was an oversight. 
Normally we don't allow these kinds of changes in maintenance or minor releases.

[~te...@apache.org], [~vjasani], and [~bharathv]: please fix for at least 
branch-1, branch-2, branch-2.3, and branch-2.2. If you want to keep the removal 
for master, please flag the change as incompatible and release note it 
appropriately.
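
For context, a minimal sketch (hypothetical class names, not the real 
mapreduce API) of why narrowing a throws clause breaks source compatibility 
for subclasses:

{code}
import java.io.IOException;

// Hypothetical classes for illustration only.
class OldFormat {
  public void checkOutputSpecs() throws IOException, InterruptedException { }
}

class NewFormat {
  // InterruptedException removed from the signature.
  public void checkOutputSpecs() throws IOException { }
}

class DownstreamFormat extends OldFormat {
  // This compiles against OldFormat. If the superclass were switched to
  // NewFormat, javac would reject this override, because an override may not
  // declare checked exceptions that the overridden method does not throw.
  @Override
  public void checkOutputSpecs() throws IOException, InterruptedException { }
}
{code}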

> Fix the connection leaks on getting hbase admin from unclosed connection
> 
>
> Key: HBASE-24788
> URL: https://issues.apache.org/jira/browse/HBASE-24788
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 3.0.0-alpha-1, 2.3.0, 1.6.0
>Reporter: Sandeep Pal
>Assignee: Sandeep Pal
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.1, 1.7.0, 2.4.0, 2.2.7
>
>
> Observed a significant increase in ZK connections during performance testing 
> of map reduce jobs. Turns out 
> [TableOutputFormat.checkOutputSpecs()|https://github.com/apache/hbase/blob/master/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.java#L182]
> is not closing the connection it uses to get the hbase admin. It closes the 
> hbase admin but never closes the connection used to get the admin.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24713) RS startup with FSHLog throws NPE after HBASE-21751

2020-08-05 Thread Sean Busbey (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-24713:

Fix Version/s: 2.4.0

> RS startup with FSHLog throws NPE after HBASE-21751
> ---
>
> Key: HBASE-24713
> URL: https://issues.apache.org/jira/browse/HBASE-24713
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 2.1.6
>Reporter: ramkrishna.s.vasudevan
>Assignee: Gaurav Kanade
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.3.1, 2.4.0, 2.2.6
>
>
> Every RS startup creates this NPE
> {code}
> [sync.1] wal.FSHLog: UNEXPECTED
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.run(FSHLog.java:582)
> at java.lang.Thread.run(Thread.java:748)
> 2020-07-07 10:51:23,208 WARN  [regionserver/x:16020] wal.FSHLog: Failed 
> sync-before-close but no outstanding appends; closing 
> WALjava.lang.NullPointerException
> {code}
> The reason is that the Disruptor framework starts the SyncRunner thread but 
> the init of the writer happens after that. A simple null check in the 
> SyncRunner will help here.
> No major damage happens though, since we handle Throwable. It would be good 
> to solve this.
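
A minimal sketch of the suggested guard, with illustrative names rather than 
the actual FSHLog internals:

{code}
// Illustrative sketch of the null check; Writer is a stand-in interface.
class SyncRunnerSketch {
  interface Writer { void sync(); }

  private volatile Writer writer; // initialized after the sync thread starts

  void syncOnce() {
    Writer w = writer; // read the volatile field once
    if (w == null) {
      // WAL writer not initialized yet; skip this round instead of NPE-ing.
      return;
    }
    w.sync();
  }

  void setWriter(Writer w) {
    this.writer = w;
  }
}
{code}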



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24758) Avoid flooding replication source RSes logs when no sinks are available

2020-08-05 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17171749#comment-17171749
 ] 

Sean Busbey commented on HBASE-24758:
-

why didn't this go to branch-2.2?

> Avoid flooding replication source RSes logs when no sinks are available 
> 
>
> Key: HBASE-24758
> URL: https://issues.apache.org/jira/browse/HBASE-24758
> Project: HBase
>  Issue Type: Improvement
>  Components: Replication
>Affects Versions: 3.0.0-alpha-1, 2.3.1, 2.4.0, 2.2.5
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.1, 2.4.0
>
>
> In HBaseInterClusterReplicationEndpoint.replicate, if no sinks are returned 
> by ReplicationSinkManager, say because the remote peer is not available, we 
> log the message below and return false to the source shipper thread, which 
> then keeps retrying, flooding the source RS log with messages like:
> {noformat}
> WARN 
> org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint:
>  No replication sinks found, returning without replicating. The source should 
> retry with the same set of edits.
> {noformat}
> This condition could also cause ReplicationSinkManager.chooseSinks to throw 
> an NPE.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24805) HBaseTestingUtility.getConnection should be threadsafe

2020-08-04 Thread Sean Busbey (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-24805:

Fix Version/s: 2.4.0
   1.7.0
   3.0.0-alpha-1
 Hadoop Flags: Incompatible change
 Release Note: 

Users of `HBaseTestingUtility` can now safely call the `getConnection` method 
from multiple threads.

As a consequence of refactoring to improve the thread safety of the HBase 
testing classes, the protected `conf` member of the  
`HBaseCommonTestingUtility` class has been marked final. Downstream users who 
extend from the class hierarchy rooted at this class will need to pass the 
Configuration instance they want used to their super constructor rather than 
overwriting the instance variable.
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> HBaseTestingUtility.getConnection should be threadsafe
> --
>
> Key: HBASE-24805
> URL: https://issues.apache.org/jira/browse/HBASE-24805
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Major
> Fix For: 3.0.0-alpha-1, 1.7.0, 2.4.0
>
>
> the current javadoc for getConnection carries a thread safety warning:
> {code}
> /**
> * Get a Connection to the cluster. Not thread-safe (This class needs a 
> lot of work to make it
> * thread-safe).
> * @return A Connection that can be shared. Don't close. Will be closed on 
> shutdown of cluster.
> */
>public Connection getConnection() throws IOException {
> {code}
> We then ignore that warning across our test base. We should make the method 
> threadsafe since the intention is to share a single Connection across all 
> users of the HTU instance.
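
The release note above mentions passing the Configuration to the super 
constructor; a minimal sketch of that before/after pattern (hypothetical class 
names, not the actual patch) might look like:

{code}
import org.apache.hadoop.conf.Configuration;

// Hypothetical base class mirroring the release note's description.
class TestingUtilBase {
  protected final Configuration conf; // now final, so it cannot be overwritten

  TestingUtilBase(Configuration conf) {
    this.conf = conf;
  }
}

class MyTestingUtil extends TestingUtilBase {
  MyTestingUtil(Configuration conf) {
    // Before: subclasses could assign this.conf directly.
    // After: the desired Configuration must be passed up the constructor chain.
    super(conf);
  }
}
{code}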



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24813) ReplicationSource should clear buffer usage on ReplicationSourceManager upon termination

2020-08-03 Thread Sean Busbey (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-24813:

Component/s: Replication

> ReplicationSource should clear buffer usage on ReplicationSourceManager upon 
> termination
> 
>
> Key: HBASE-24813
> URL: https://issues.apache.org/jira/browse/HBASE-24813
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
>
> Following investigations on the issue described by [~elserj] on HBASE-24779, 
> we found out that once a peer is removed, thus killing the peer's related 
> *ReplicationSource* instance, it may leave 
> *ReplicationSourceManager.totalBufferUsed* inconsistent. This can happen if 
> *ReplicationSourceWALReader* had put some entries on its queue to be 
> processed by *ReplicationSourceShipper*, but the peer removal killed the 
> shipper before it could process the pending entries. When the 
> *ReplicationSourceWALReader* thread adds entries to the queue, it increments 
> *ReplicationSourceManager.totalBufferUsed* with the sum of the entry sizes. 
> When those entries are read by *ReplicationSourceShipper*, 
> *ReplicationSourceManager.totalBufferUsed* is then decreased. We should also 
> decrease *ReplicationSourceManager.totalBufferUsed* when *ReplicationSource* 
> is terminated; otherwise the size of those unprocessed entries would consume 
> *ReplicationSourceManager.totalBufferUsed* indefinitely, unless the RS gets 
> restarted. This may be a problem for deployments with multiple peers, or if 
> new peers are added.
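
A rough sketch of the accounting described above, with illustrative names 
rather than the actual ReplicationSourceManager code:

{code}
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch: a global quota counter that the reader increments,
// the shipper decrements, and termination must release.
class BufferQuotaSketch {
  static final AtomicLong totalBufferUsed = new AtomicLong();

  private long pendingBytes; // enqueued but not yet shipped

  synchronized void onEntriesQueued(long sizeBytes) {   // reader side
    pendingBytes += sizeBytes;
    totalBufferUsed.addAndGet(sizeBytes);
  }

  synchronized void onEntriesShipped(long sizeBytes) {  // shipper side
    pendingBytes -= sizeBytes;
    totalBufferUsed.addAndGet(-sizeBytes);
  }

  synchronized void terminate() {                       // peer removal path
    // Release whatever the shipper never processed, or the quota leaks
    // until the RS restarts.
    totalBufferUsed.addAndGet(-pendingBytes);
    pendingBytes = 0;
  }
}
{code}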



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24807) Backport HBASE-20417 to branch-1

2020-08-03 Thread Sean Busbey (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-24807:

Component/s: Replication

> Backport HBASE-20417 to branch-1
> 
>
> Key: HBASE-24807
> URL: https://issues.apache.org/jira/browse/HBASE-24807
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
>
> The wal reader shouldn't keep running with the peer disabled. We need to 
> backport HBASE-20417 to branch-1, or do something similar if a backport isn't 
> possible due to differences in the code base.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-24476) release scripts should provide timing information

2020-08-01 Thread Sean Busbey (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey resolved HBASE-24476.
-
Fix Version/s: 3.0.0-alpha-1
   Resolution: Fixed

> release scripts should provide timing information
> -
>
> Key: HBASE-24476
> URL: https://issues.apache.org/jira/browse/HBASE-24476
> Project: HBase
>  Issue Type: Improvement
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Minor
> Fix For: 3.0.0-alpha-1
>
>
> right now I can get timing from the individual maven commands, but it would 
> be nice to get higher-level times.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-24572) release scripts should try to use a keyid when referring to GPG keys.

2020-08-01 Thread Sean Busbey (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey resolved HBASE-24572.
-
Fix Version/s: 3.0.0-alpha-1
   Resolution: Fixed

> release scripts should try to use a keyid when referring to GPG keys.
> 
>
> Key: HBASE-24572
> URL: https://issues.apache.org/jira/browse/HBASE-24572
> Project: HBase
>  Issue Type: Task
>  Components: build, community
>Reporter: Nick Dimiduk
>Assignee: Sean Busbey
>Priority: Major
> Fix For: 3.0.0-alpha-1
>
>
> Right now the template is substituting the release manager's email address in 
> for the variable {{GPG_KEY}}. I think it doesn't hurt to make note of the 
> email address, but what we really want here is the key's fingerprint, or some 
> meaningfully identifiable portion of it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24805) HBaseTestingUtility.getConnection should be threadsafe

2020-07-31 Thread Sean Busbey (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-24805:

Status: Patch Available  (was: Open)

> HBaseTestingUtility.getConnection should be threadsafe
> --
>
> Key: HBASE-24805
> URL: https://issues.apache.org/jira/browse/HBASE-24805
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Major
>
> the current javadoc for getConnection carries a thread safety warning:
> {code}
> /**
> * Get a Connection to the cluster. Not thread-safe (This class needs a 
> lot of work to make it
> * thread-safe).
> * @return A Connection that can be shared. Don't close. Will be closed on 
> shutdown of cluster.
> */
>public Connection getConnection() throws IOException {
> {code}
> We then ignore that warning across our test base. We should make the method 
> threadsafe since the intention is to share a single Connection across all 
> users of the HTU instance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HBASE-24805) HBaseTestingUtility.getConnection should be threadsafe

2020-07-31 Thread Sean Busbey (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey reassigned HBASE-24805:
---

Assignee: Sean Busbey

> HBaseTestingUtility.getConnection should be threadsafe
> --
>
> Key: HBASE-24805
> URL: https://issues.apache.org/jira/browse/HBASE-24805
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Major
>
> the current javadoc for getConnection carries a thread safety warning:
> {code}
> /**
> * Get a Connection to the cluster. Not thread-safe (This class needs a 
> lot of work to make it
> * thread-safe).
> * @return A Connection that can be shared. Don't close. Will be closed on 
> shutdown of cluster.
> */
>public Connection getConnection() throws IOException {
> {code}
> We then ignore that warning across our test base. We should make the method 
> threadsafe since the intention is to share a single Connection across all 
> users of the HTU instance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24805) HBaseTestingUtility.getConnection should be threadsafe

2020-07-31 Thread Sean Busbey (Jira)
Sean Busbey created HBASE-24805:
---

 Summary: HBaseTestingUtility.getConnection should be threadsafe
 Key: HBASE-24805
 URL: https://issues.apache.org/jira/browse/HBASE-24805
 Project: HBase
  Issue Type: Bug
  Components: test
Reporter: Sean Busbey


the current javadoc for getConnection carries a thread safety warning:
{code}
/**
* Get a Connection to the cluster. Not thread-safe (This class needs a lot 
of work to make it
* thread-safe).
* @return A Connection that can be shared. Don't close. Will be closed on 
shutdown of cluster.
*/
   public Connection getConnection() throws IOException {
{code}

We then ignore that warning across our test base. We should make the method 
threadsafe since the intention is to share a single Connection across all users 
of the HTU instance.
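
One straightforward way to make such a lazy getter thread-safe is plain 
synchronization; a minimal sketch with a stand-in Connection type (not the 
actual patch):

{code}
import java.io.IOException;

// Illustrative sketch only; Connection is a stand-in for the HBase client type.
class TestingUtilSketch {
  interface Connection { }

  private Connection sharedConnection; // guarded by "this"

  // Synchronizing the getter guarantees only one Connection is created even
  // when several test threads race on the first call.
  public synchronized Connection getConnection() throws IOException {
    if (sharedConnection == null) {
      sharedConnection = new Connection() { }; // placeholder for the real factory call
    }
    return sharedConnection;
  }
}
{code}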



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24794) hbase.rowlock.wait.duration should not be <= 0

2020-07-30 Thread Sean Busbey (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-24794:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> hbase.rowlock.wait.duration should not be <= 0
> --
>
> Key: HBASE-24794
> URL: https://issues.apache.org/jira/browse/HBASE-24794
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.4.0, 1.5.0, 2.2.0, 2.3.0, 1.6.0
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.3.1, 1.7.0, 2.4.0, 1.4.14, 2.2.6
>
>
> had a cluster fail after upgrade from hbase 1 because all writes to meta 
> failed.
> master started in maintenance mode looks like (RS hosting meta in non-maint 
> would look similar starting with {{HRegion.doBatchMutate}}):
> {code}
> 2020-07-28 17:52:56,553 WARN org.apache.hadoop.hbase.regionserver.HRegion: 
> Failed getting lock, row=some_user_table
> java.io.IOException: Timed out waiting for lock for row: some_user_table in 
> region 1588230740
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.getRowLockInternal(HRegion.java:5863)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.lockRowsAndBuildMiniBatch(HRegion.java:3322)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(HRegion.java:4018)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3992)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3923)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3914)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3928)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.doBatchMutate(HRegion.java:4255)
> at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:3047)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.mutate(RSRpcServices.java:2827)
> at 
> org.apache.hadoop.hbase.client.ClientServiceCallable.doMutate(ClientServiceCallable.java:55)
> at org.apache.hadoop.hbase.client.HTable$3.rpcCall(HTable.java:538)
> at org.apache.hadoop.hbase.client.HTable$3.rpcCall(HTable.java:533)
> at 
> org.apache.hadoop.hbase.client.RegionServerCallable.call(RegionServerCallable.java:127)
> at 
> org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:107)
> at org.apache.hadoop.hbase.client.HTable.put(HTable.java:542)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.put(MetaTableAccessor.java:1339)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.putToMetaTable(MetaTableAccessor.java:1329)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.updateTableState(MetaTableAccessor.java:1672)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.updateTableState(MetaTableAccessor.java:1112)
> at 
> org.apache.hadoop.hbase.master.TableStateManager.fixTableStates(TableStateManager.java:296)
> at 
> org.apache.hadoop.hbase.master.TableStateManager.start(TableStateManager.java:269)
> at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1004)
> at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2274)
> at 
> org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:583)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> logging roughly 6k times/second.
> failure was caused by a change in behavior for 
> {{hbase.rowlock.wait.duration}} in HBASE-17210 (so 1.4.0+, 2.0.0+). Prior to 
> that change setting the config <= 0 meant that row locks would succeed only 
> if they were immediately available. After the change we fail the lock attempt 
> without checking the lock at all.
> workaround: set {{hbase.rowlock.wait.duration}} to a small positive number, 
> e.g. 1, if you want row locks to fail quickly.
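
Applying the workaround programmatically might look like the sketch below; the 
same property can equally be set in hbase-site.xml:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class RowLockWorkaround {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // A small positive value makes lock acquisition fail quickly, which is
    // what <= 0 used to mean before the HBASE-17210 behavior change.
    conf.setLong("hbase.rowlock.wait.duration", 1);
    System.out.println(conf.get("hbase.rowlock.wait.duration"));
  }
}
{code}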



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-11686) Shell code should create a binding / irb workspace instead of polluting the root namespace

2020-07-30 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-11686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17168088#comment-17168088
 ] 

Sean Busbey commented on HBASE-11686:
-

please update the release note to show an example of both "run this script for 
the shell" and "use an interactive shell"

> Shell code should create a binding / irb workspace instead of polluting the 
> root namespace
> --
>
> Key: HBASE-11686
> URL: https://issues.apache.org/jira/browse/HBASE-11686
> Project: HBase
>  Issue Type: Improvement
>  Components: shell
>Reporter: Sean Busbey
>Assignee: Elliot Miller
>Priority: Minor
> Fix For: 3.0.0-alpha-1
>
>
> Right now, the shell builds a list of commands and then injects them into the 
> root execution's context
> bin/hirb.rb
> {code}
> # Add commands to this namespace
> @shell.export_commands(self)
> {code}
> hbase-shell/src/main/ruby/shell.rb
> {code}
> def export_commands(where)
>   ::Shell.commands.keys.each do |cmd|
> # here where is the IRB namespace
> # this method just adds the call to the specified command
> # which just references back to 'this' shell object
> # a decently extensible way to add commands
> where.send :instance_eval, <<-EOF
>   def #{cmd}(*args)
> ret = @shell.command('#{cmd}', *args)
> puts
> return ret
>   end
> EOF
>   end
> end
> {code}
> This is an unclean abstraction. For one, it requires that there be an 
> instance variable in the main namespace called '@shell' without making that 
> clear in the docs. Additionally, it complicates maintenance by breaking 
> isolation.
> We should update things so that shell can provide a binding for eval or a 
> workspace for IRB execution and then use it directly when we construct our 
> IRB session.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24795) RegionMover should deal with unknown (split/merged) regions

2020-07-30 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17168083#comment-17168083
 ] 

Sean Busbey commented on HBASE-24795:
-

sure. I left some requests.

> RegionMover should deal with unknown (split/merged) regions
> ---
>
> Key: HBASE-24795
> URL: https://issues.apache.org/jira/browse/HBASE-24795
> Project: HBase
>  Issue Type: Improvement
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>
> For a cluster with very high load, it is quite common to see flush/compaction 
> happening every minute on each RegionServer, and there is a high chance of 
> multiple regions going through splitting/merging.
> RegionMover, while unloading all regions (graceful stop), writes all regions 
> to a local file, and while loading them back (graceful start) it ensures that 
> every single region is brought back from the other RSs. While loading regions 
> back, if even a single region can't be moved back, RegionMover considers the 
> load() a failure. This misses the possibility that some regions have gone 
> through the split/merge process, so not all regions written to the local file 
> may still exist. Hence, RegionMover should gracefully handle moving any 
> unknown region without marking load() failed.
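
A rough sketch of the proposed tolerance, using a hypothetical helper rather 
than the actual RegionMover code:

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Illustrative sketch: skip regions that no longer exist instead of failing.
class RegionReloadSketch {
  static int loadRegions(List<String> savedRegions, Set<String> liveRegions) {
    int moved = 0;
    List<String> skipped = new ArrayList<>();
    for (String region : savedRegions) {
      if (!liveRegions.contains(region)) {
        skipped.add(region); // split or merged away since unload; not an error
        continue;
      }
      moved++; // placeholder for the actual move call
    }
    System.out.println("moved=" + moved + ", skipped=" + skipped);
    return moved;
  }

  public static void main(String[] args) {
    loadRegions(List.of("r1", "r2", "r3"), Set.of("r1", "r3"));
  }
}
{code}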



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24794) hbase.rowlock.wait.duration should not be <= 0

2020-07-30 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17167656#comment-17167656
 ] 

Sean Busbey commented on HBASE-24794:
-

did PR linking break again?

https://github.com/apache/hbase/pull/2174

> hbase.rowlock.wait.duration should not be <= 0
> --
>
> Key: HBASE-24794
> URL: https://issues.apache.org/jira/browse/HBASE-24794
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.4.0, 1.5.0, 2.2.0, 2.3.0, 1.6.0
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.3.1, 1.7.0, 2.4.0, 1.4.14, 2.2.6
>
>
> had a cluster fail after upgrade from hbase 1 because all writes to meta 
> failed.
> master started in maintenance mode looks like (RS hosting meta in non-maint 
> would look similar starting with {{HRegion.doBatchMutate}}):
> {code}
> 2020-07-28 17:52:56,553 WARN org.apache.hadoop.hbase.regionserver.HRegion: 
> Failed getting lock, row=some_user_table
> java.io.IOException: Timed out waiting for lock for row: some_user_table in 
> region 1588230740
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.getRowLockInternal(HRegion.java:5863)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.lockRowsAndBuildMiniBatch(HRegion.java:3322)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(HRegion.java:4018)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3992)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3923)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3914)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3928)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.doBatchMutate(HRegion.java:4255)
> at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:3047)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.mutate(RSRpcServices.java:2827)
> at 
> org.apache.hadoop.hbase.client.ClientServiceCallable.doMutate(ClientServiceCallable.java:55)
> at org.apache.hadoop.hbase.client.HTable$3.rpcCall(HTable.java:538)
> at org.apache.hadoop.hbase.client.HTable$3.rpcCall(HTable.java:533)
> at 
> org.apache.hadoop.hbase.client.RegionServerCallable.call(RegionServerCallable.java:127)
> at 
> org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:107)
> at org.apache.hadoop.hbase.client.HTable.put(HTable.java:542)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.put(MetaTableAccessor.java:1339)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.putToMetaTable(MetaTableAccessor.java:1329)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.updateTableState(MetaTableAccessor.java:1672)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.updateTableState(MetaTableAccessor.java:1112)
> at 
> org.apache.hadoop.hbase.master.TableStateManager.fixTableStates(TableStateManager.java:296)
> at 
> org.apache.hadoop.hbase.master.TableStateManager.start(TableStateManager.java:269)
> at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1004)
> at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2274)
> at 
> org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:583)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> logging roughly 6k times/second.
> failure was caused by a change in behavior for 
> {{hbase.rowlock.wait.duration}} in HBASE-17210 (so 1.4.0+, 2.0.0+). Prior to 
> that change setting the config <= 0 meant that row locks would succeed only 
> if they were immediately available. After the change we fail the lock attempt 
> without checking the lock at all.
> workaround: set {{hbase.rowlock.wait.duration}} to a small positive number, 
> e.g. 1, if you want row locks to fail quickly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24794) hbase.rowlock.wait.duration should not be <= 0

2020-07-30 Thread Sean Busbey (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-24794:

Status: Patch Available  (was: In Progress)

> hbase.rowlock.wait.duration should not be <= 0
> --
>
> Key: HBASE-24794
> URL: https://issues.apache.org/jira/browse/HBASE-24794
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.6.0, 2.3.0, 2.2.0, 1.5.0, 1.4.0
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.3.1, 1.7.0, 2.4.0, 1.4.14, 2.2.6
>
>
> had a cluster fail after upgrade from hbase 1 because all writes to meta 
> failed.
> master started in maintenance mode looks like (RS hosting meta in non-maint 
> would look similar starting with {{HRegion.doBatchMutate}}):
> {code}
> 2020-07-28 17:52:56,553 WARN org.apache.hadoop.hbase.regionserver.HRegion: 
> Failed getting lock, row=some_user_table
> java.io.IOException: Timed out waiting for lock for row: some_user_table in 
> region 1588230740
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.getRowLockInternal(HRegion.java:5863)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.lockRowsAndBuildMiniBatch(HRegion.java:3322)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(HRegion.java:4018)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3992)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3923)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3914)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3928)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.doBatchMutate(HRegion.java:4255)
> at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:3047)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.mutate(RSRpcServices.java:2827)
> at 
> org.apache.hadoop.hbase.client.ClientServiceCallable.doMutate(ClientServiceCallable.java:55)
> at org.apache.hadoop.hbase.client.HTable$3.rpcCall(HTable.java:538)
> at org.apache.hadoop.hbase.client.HTable$3.rpcCall(HTable.java:533)
> at 
> org.apache.hadoop.hbase.client.RegionServerCallable.call(RegionServerCallable.java:127)
> at 
> org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:107)
> at org.apache.hadoop.hbase.client.HTable.put(HTable.java:542)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.put(MetaTableAccessor.java:1339)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.putToMetaTable(MetaTableAccessor.java:1329)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.updateTableState(MetaTableAccessor.java:1672)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.updateTableState(MetaTableAccessor.java:1112)
> at 
> org.apache.hadoop.hbase.master.TableStateManager.fixTableStates(TableStateManager.java:296)
> at 
> org.apache.hadoop.hbase.master.TableStateManager.start(TableStateManager.java:269)
> at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1004)
> at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2274)
> at 
> org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:583)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> logging roughly 6k times/second.
> failure was caused by a change in behavior for 
> {{hbase.rowlock.wait.duration}} in HBASE-17210 (so 1.4.0+, 2.0.0+). Prior to 
> that change setting the config <= 0 meant that row locks would succeed only 
> if they were immediately available. After the change we fail the lock attempt 
> without checking the lock at all.
> workaround: set {{hbase.rowlock.wait.duration}} to a small positive number, 
> e.g. 1, if you want row locks to fail quickly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HBASE-24795) RegionMover should deal with unknown (split/merged) regions

2020-07-29 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17167448#comment-17167448
 ] 

Sean Busbey edited comment on HBASE-24795 at 7/29/20, 6:51 PM:
---

there's already the {{--noack}} flag that allows moving just those regions that 
are still around. Can you describe what you're proposing beyond that behavior?


was (Author: busbey):
there's already the {{--noack}} flag that should be enough?

> RegionMover should deal with unknown (split/merged) regions
> ---
>
> Key: HBASE-24795
> URL: https://issues.apache.org/jira/browse/HBASE-24795
> Project: HBase
>  Issue Type: New Feature
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>
> For a cluster with very high load, it is quite common to see flush/compaction 
> happening every minute on each RegionServer, and there is a high chance of 
> multiple regions going through splitting/merging.
> RegionMover, while unloading all regions (graceful stop), writes all regions 
> to a local file, and while loading them back (graceful start) it ensures that 
> every single region is brought back from the other RSs. While loading regions 
> back, if even a single region can't be moved back, RegionMover considers the 
> load() a failure. This misses the possibility that some regions have gone 
> through the split/merge process, so not all regions written to the local file 
> may still exist. Hence, RegionMover should gracefully handle moving any 
> unknown region without marking load() failed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24795) RegionMover should deal with unknown (split/merged) regions

2020-07-29 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17167448#comment-17167448
 ] 

Sean Busbey commented on HBASE-24795:
-

there's already the {{--noack}} flag that should be enough?

> RegionMover should deal with unknown (split/merged) regions
> ---
>
> Key: HBASE-24795
> URL: https://issues.apache.org/jira/browse/HBASE-24795
> Project: HBase
>  Issue Type: New Feature
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>
> For a cluster with very high load, it is quite common to see flush/compaction 
> happening every minute on each RegionServer, and there is a high chance of 
> multiple regions going through splitting/merging.
> RegionMover, while unloading all regions (graceful stop), writes all regions 
> to a local file, and while loading them back (graceful start) it ensures that 
> every single region is brought back from the other RSs. While loading regions 
> back, if even a single region can't be moved back, RegionMover considers the 
> load() a failure. This misses the possibility that some regions have gone 
> through the split/merge process, so not all regions written to the local file 
> may still exist. Hence, RegionMover should gracefully handle moving any 
> unknown region without marking load() failed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24794) hbase.rowlock.wait.duration should not be <= 0

2020-07-29 Thread Sean Busbey (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-24794:

Description: 
had a cluster fail after upgrade from hbase 1 because all writes to meta failed.

master started in maintenance mode looks like (RS hosting meta in non-maint 
would look similar starting with {{HRegion.doBatchMutate}}):

{code}
2020-07-28 17:52:56,553 WARN org.apache.hadoop.hbase.regionserver.HRegion: 
Failed getting lock, row=some_user_table
java.io.IOException: Timed out waiting for lock for row: some_user_table in 
region 1588230740
at 
org.apache.hadoop.hbase.regionserver.HRegion.getRowLockInternal(HRegion.java:5863)
at 
org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.lockRowsAndBuildMiniBatch(HRegion.java:3322)
at 
org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(HRegion.java:4018)
at 
org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3992)
at 
org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3923)
at 
org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3914)
at 
org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3928)
at 
org.apache.hadoop.hbase.regionserver.HRegion.doBatchMutate(HRegion.java:4255)
at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:3047)
at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.mutate(RSRpcServices.java:2827)
at 
org.apache.hadoop.hbase.client.ClientServiceCallable.doMutate(ClientServiceCallable.java:55)
at org.apache.hadoop.hbase.client.HTable$3.rpcCall(HTable.java:538)
at org.apache.hadoop.hbase.client.HTable$3.rpcCall(HTable.java:533)
at 
org.apache.hadoop.hbase.client.RegionServerCallable.call(RegionServerCallable.java:127)
at 
org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:107)
at org.apache.hadoop.hbase.client.HTable.put(HTable.java:542)
at 
org.apache.hadoop.hbase.MetaTableAccessor.put(MetaTableAccessor.java:1339)
at 
org.apache.hadoop.hbase.MetaTableAccessor.putToMetaTable(MetaTableAccessor.java:1329)
at 
org.apache.hadoop.hbase.MetaTableAccessor.updateTableState(MetaTableAccessor.java:1672)
at 
org.apache.hadoop.hbase.MetaTableAccessor.updateTableState(MetaTableAccessor.java:1112)
at 
org.apache.hadoop.hbase.master.TableStateManager.fixTableStates(TableStateManager.java:296)
at 
org.apache.hadoop.hbase.master.TableStateManager.start(TableStateManager.java:269)
at 
org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1004)
at 
org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2274)
at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:583)
at java.lang.Thread.run(Thread.java:745)
{code}

logging roughly 6k times per second.

The failure was caused by a change in behavior for {{hbase.rowlock.wait.duration}} 
in HBASE-17210 (so 1.4.0+, 2.0.0+). Prior to that change, setting the config <= 
0 meant that row locks would succeed only if they were immediately available. 
After the change we fail the lock attempt without checking the lock at all.

Workaround: set {{hbase.rowlock.wait.duration}} to a small positive number, 
e.g. 1, if you want row locks to fail quickly.
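
For illustration, a minimal sketch of applying the workaround programmatically 
(e.g. for an embedded or test cluster); the property name is the real one 
discussed above, the class and method here are hypothetical scaffolding:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class RowLockWorkaround {
  // Returns a Configuration where row lock attempts still check the lock
  // once, instead of failing immediately as they would with a value <= 0.
  public static Configuration withQuickFailingRowLocks() {
    Configuration conf = HBaseConfiguration.create();
    conf.setLong("hbase.rowlock.wait.duration", 1L);
    return conf;
  }
}
{code}

For a normal deployment the same value would instead go into hbase-site.xml on 
the servers.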

  was:
had a cluster fail after upgrade from hbase 1 because all writes to meta failed.

RS with meta looks like:

{code}
2020-07-28 17:52:56,553 WARN org.apache.hadoop.hbase.regionserver.HRegion: 
Failed getting lock, row=some_user_table
java.io.IOException: Timed out waiting for lock for row: some_user_table in 
region 1588230740
at 
org.apache.hadoop.hbase.regionserver.HRegion.getRowLockInternal(HRegion.java:5863)
at 
org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.lockRowsAndBuildMiniBatch(HRegion.java:3322)
at 
org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(HRegion.java:4018)
at 
org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3992)
at 
org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3923)
at 
org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3914)
at 
org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3928)
at 
org.apache.hadoop.hbase.regionserver.HRegion.doBatchMutate(HRegion.java:4255)
at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:3047)
at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.mutate(RSRpcServices.java:2827)
at 
org.apache.hadoop.hbase.client.ClientServiceCallable.doMutate(ClientServiceCallable.java:55)
at org.apache.hadoop.hbase.client.HTable$3.rpcCall(HTable.java:538)
at 

[jira] [Work started] (HBASE-24794) hbase.rowlock.wait.duration should not be <= 0

2020-07-29 Thread Sean Busbey (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-24794 started by Sean Busbey.
---
> hbase.rowlock.wait.duration should not be <= 0
> --
>
> Key: HBASE-24794
> URL: https://issues.apache.org/jira/browse/HBASE-24794
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.4.0, 1.5.0, 2.2.0, 2.3.0, 1.6.0
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.3.1, 1.7.0, 2.4.0, 1.4.14, 2.2.6
>
>
> had a cluster fail after upgrade from hbase 1 because all writes to meta 
> failed.
> RS with meta looks like:
> {code}
> 2020-07-28 17:52:56,553 WARN org.apache.hadoop.hbase.regionserver.HRegion: 
> Failed getting lock, row=some_user_table
> java.io.IOException: Timed out waiting for lock for row: some_user_table in 
> region 1588230740
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.getRowLockInternal(HRegion.java:5863)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.lockRowsAndBuildMiniBatch(HRegion.java:3322)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(HRegion.java:4018)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3992)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3923)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3914)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3928)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.doBatchMutate(HRegion.java:4255)
> at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:3047)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.mutate(RSRpcServices.java:2827)
> at 
> org.apache.hadoop.hbase.client.ClientServiceCallable.doMutate(ClientServiceCallable.java:55)
> at org.apache.hadoop.hbase.client.HTable$3.rpcCall(HTable.java:538)
> at org.apache.hadoop.hbase.client.HTable$3.rpcCall(HTable.java:533)
> at 
> org.apache.hadoop.hbase.client.RegionServerCallable.call(RegionServerCallable.java:127)
> at 
> org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:107)
> at org.apache.hadoop.hbase.client.HTable.put(HTable.java:542)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.put(MetaTableAccessor.java:1339)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.putToMetaTable(MetaTableAccessor.java:1329)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.updateTableState(MetaTableAccessor.java:1672)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.updateTableState(MetaTableAccessor.java:1112)
> at 
> org.apache.hadoop.hbase.master.TableStateManager.fixTableStates(TableStateManager.java:296)
> at 
> org.apache.hadoop.hbase.master.TableStateManager.start(TableStateManager.java:269)
> at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1004)
> at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2274)
> at 
> org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:583)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> logging roughly 6k times per second.
> The failure was caused by a change in behavior for 
> {{hbase.rowlock.wait.duration}} in HBASE-17210 (so 1.4.0+, 2.0.0+). Prior to 
> that change, setting the config <= 0 meant that row locks would succeed only 
> if they were immediately available. After the change we fail the lock attempt 
> without checking the lock at all.
> Workaround: set {{hbase.rowlock.wait.duration}} to a small positive number, 
> e.g. 1, if you want row locks to fail quickly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24794) hbase.rowlock.wait.duration should not be <= 0

2020-07-29 Thread Sean Busbey (Jira)
Sean Busbey created HBASE-24794:
---

 Summary: hbase.rowlock.wait.duration should not be <= 0
 Key: HBASE-24794
 URL: https://issues.apache.org/jira/browse/HBASE-24794
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 1.6.0, 2.3.0, 2.2.0, 1.5.0, 1.4.0
Reporter: Sean Busbey
Assignee: Sean Busbey
 Fix For: 3.0.0-alpha-1, 2.3.1, 1.7.0, 2.4.0, 1.4.14, 2.2.6


had a cluster fail after upgrade from hbase 1 because all writes to meta failed.

RS with meta looks like:

{code}
2020-07-28 17:52:56,553 WARN org.apache.hadoop.hbase.regionserver.HRegion: 
Failed getting lock, row=some_user_table
java.io.IOException: Timed out waiting for lock for row: some_user_table in 
region 1588230740
at 
org.apache.hadoop.hbase.regionserver.HRegion.getRowLockInternal(HRegion.java:5863)
at 
org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.lockRowsAndBuildMiniBatch(HRegion.java:3322)
at 
org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(HRegion.java:4018)
at 
org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3992)
at 
org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3923)
at 
org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3914)
at 
org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3928)
at 
org.apache.hadoop.hbase.regionserver.HRegion.doBatchMutate(HRegion.java:4255)
at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:3047)
at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.mutate(RSRpcServices.java:2827)
at 
org.apache.hadoop.hbase.client.ClientServiceCallable.doMutate(ClientServiceCallable.java:55)
at org.apache.hadoop.hbase.client.HTable$3.rpcCall(HTable.java:538)
at org.apache.hadoop.hbase.client.HTable$3.rpcCall(HTable.java:533)
at 
org.apache.hadoop.hbase.client.RegionServerCallable.call(RegionServerCallable.java:127)
at 
org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:107)
at org.apache.hadoop.hbase.client.HTable.put(HTable.java:542)
at 
org.apache.hadoop.hbase.MetaTableAccessor.put(MetaTableAccessor.java:1339)
at 
org.apache.hadoop.hbase.MetaTableAccessor.putToMetaTable(MetaTableAccessor.java:1329)
at 
org.apache.hadoop.hbase.MetaTableAccessor.updateTableState(MetaTableAccessor.java:1672)
at 
org.apache.hadoop.hbase.MetaTableAccessor.updateTableState(MetaTableAccessor.java:1112)
at 
org.apache.hadoop.hbase.master.TableStateManager.fixTableStates(TableStateManager.java:296)
at 
org.apache.hadoop.hbase.master.TableStateManager.start(TableStateManager.java:269)
at 
org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1004)
at 
org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2274)
at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:583)
at java.lang.Thread.run(Thread.java:745)
{code}

logging roughly 6k times per second.

The failure was caused by a change in behavior for {{hbase.rowlock.wait.duration}} 
in HBASE-17210 (so 1.4.0+, 2.0.0+). Prior to that change, setting the config <= 
0 meant that row locks would succeed only if they were immediately available. 
After the change we fail the lock attempt without checking the lock at all.

Workaround: set {{hbase.rowlock.wait.duration}} to a small positive number, 
e.g. 1, if you want row locks to fail quickly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24779) Improve insight into replication WAL readers hung on checkQuota

2020-07-27 Thread Sean Busbey (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-24779:

Component/s: Replication

> Improve insight into replication WAL readers hung on checkQuota
> ---
>
> Key: HBASE-24779
> URL: https://issues.apache.org/jira/browse/HBASE-24779
> Project: HBase
>  Issue Type: Task
>  Components: Replication
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Minor
>
> Helped a customer this past weekend who, with a large number of 
> RegionServers, had some RegionServers that replicated data to a peer without 
> issues while other RegionServers did not.
> The number of queued logs varied over the past 24hrs in the same manner: some 
> spikes into the 100's of logs, but at other times only 1's-10's of logs were 
> queued.
> We were able to validate that there were "good" and "bad" RegionServers by 
> creating a test table, assigning it to a regionserver, enabling replication 
> on that table, and validating if the local puts were replicated to a peer. On 
> a good RS, data was replicated immediately. On a bad RS, data was never 
> replicated (at least, on the order of 10's of minutes which we waited).
> On the "bad RS", we were able to observe that the \{{wal-reader}} thread(s) 
> on that RS were spending time in a Thread.sleep() in a different location 
> than the other. Specifically it was sitting in the 
> {{ReplicationSourceWALReader#checkQuota()}}'s sleep call, _not_ the 
> {{handleEmptyWALBatch()}} method on the same class.
> My only assumption is that, somehow, these RegionServers got into a situation 
> where they "allocated" memory from the quota but never freed it. Then, 
> because the WAL reader thinks it has no free memory, it blocks indefinitely 
> and there are no pending edits to ship and (thus) free that memory. A cursory 
> glance at the code gives me a _lot_ of anxiety around places where we don't 
> properly clean it up (e.g. batches that fail to ship, dropping a peer). As a 
> first stab, let me add some more debugging so we can actually track this 
> state properly for the operators and their sanity.
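> To make the suspected leak concrete, here is a toy model of the quota
> bookkeeping (names are illustrative, not the actual
> ReplicationSourceWALReader fields): memory is acquired when a batch is read
> and only released when the batch ships successfully, so a batch that never
> ships pins its quota forever.
> {code:java}
> class BufferQuota {
>   private final long quota;
>   private long used; // shared across wal-reader threads
>
>   BufferQuota(long quota) { this.quota = quota; }
>
>   synchronized void acquire(long batchSize) throws InterruptedException {
>     while (used + batchSize > quota) {
>       wait(100); // where the "bad" RS threads appear to be sitting
>     }
>     used += batchSize;
>   }
>
>   synchronized void release(long batchSize) {
>     used -= batchSize; // if a batch never ships, this never runs: leak
>     notifyAll();
>   }
> }
> {code}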



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24779) Improve insight into replication WAL readers hung on checkQuota

2020-07-27 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17165885#comment-17165885
 ] 

Sean Busbey commented on HBASE-24779:
-

{quote}Helped a customer this past weekend who, with a large number of 
RegionServers, had some RegionServers that replicated data to a peer without 
issues while other RegionServers did not.{quote}

hbase 1-ish replication, hbase 2-ish replication, or hbase master-branch-ish 
replication?

> Improve insight into replication WAL readers hung on checkQuota
> ---
>
> Key: HBASE-24779
> URL: https://issues.apache.org/jira/browse/HBASE-24779
> Project: HBase
>  Issue Type: Task
>  Components: Replication
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Minor
>
> Helped a customer this past weekend who, with a large number of 
> RegionServers, had some RegionServers that replicated data to a peer without 
> issues while other RegionServers did not.
> The number of queued logs varied over the past 24hrs in the same manner: some 
> spikes into the 100's of logs, but at other times only 1's-10's of logs were 
> queued.
> We were able to validate that there were "good" and "bad" RegionServers by 
> creating a test table, assigning it to a regionserver, enabling replication 
> on that table, and validating if the local puts were replicated to a peer. On 
> a good RS, data was replicated immediately. On a bad RS, data was never 
> replicated (at least, on the order of 10's of minutes which we waited).
> On the "bad RS", we were able to observe that the \{{wal-reader}} thread(s) 
> on that RS were spending time in a Thread.sleep() in a different location 
> than the other. Specifically it was sitting in the 
> {{ReplicationSourceWALReader#checkQuota()}}'s sleep call, _not_ the 
> {{handleEmptyWALBatch()}} method on the same class.
> My only assumption is that, somehow, these RegionServers got into a situation 
> where they "allocated" memory from the quota but never freed it. Then, 
> because the WAL reader thinks it has no free memory, it blocks indefinitely 
> and there are no pending edits to ship and (thus) free that memory. A cursory 
> glance at the code gives me a _lot_ of anxiety around places where we don't 
> properly clean it up (e.g. batches that fail to ship, dropping a peer). As a 
> first stab, let me add some more debugging so we can actually track this 
> state properly for the operators and their sanity.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24779) Improve insight into replication WAL readers hung on checkQuota

2020-07-27 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17165883#comment-17165883
 ] 

Sean Busbey commented on HBASE-24779:
-

{quote}
We were able to validate that there were "good" and "bad" RegionServers by 
creating a test table, assigning it to a regionserver, enabling replication on 
that table, and validating if the local puts were replicated to a peer. On a 
good RS, data was replicated immediately. On a bad RS, data was never 
replicated (at least, on the order of 10's of minutes which we waited).
{quote}

did this use existing tooling? or something we could improve based off of e.g. 
the canary tool? at a minimum sounds like a good docs addition for the 
troubleshooting stuff in operator-tools or the ref guide

> Improve insight into replication WAL readers hung on checkQuota
> ---
>
> Key: HBASE-24779
> URL: https://issues.apache.org/jira/browse/HBASE-24779
> Project: HBase
>  Issue Type: Task
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Minor
>
> Helped a customer this past weekend who, with a large number of 
> RegionServers, had some RegionServers that replicated data to a peer without 
> issues while other RegionServers did not.
> The number of queued logs varied over the past 24hrs in the same manner: some 
> spikes into the 100's of logs, but at other times only 1's-10's of logs were 
> queued.
> We were able to validate that there were "good" and "bad" RegionServers by 
> creating a test table, assigning it to a regionserver, enabling replication 
> on that table, and validating if the local puts were replicated to a peer. On 
> a good RS, data was replicated immediately. On a bad RS, data was never 
> replicated (at least, on the order of 10's of minutes which we waited).
> On the "bad RS", we were able to observe that the \{{wal-reader}} thread(s) 
> on that RS were spending time in a Thread.sleep() in a different location 
> than the other. Specifically it was sitting in the 
> {{ReplicationSourceWALReader#checkQuota()}}'s sleep call, _not_ the 
> {{handleEmptyWALBatch()}} method on the same class.
> My only assumption is that, somehow, these RegionServers got into a situation 
> where they "allocated" memory from the quota but never freed it. Then, 
> because the WAL reader thinks it has no free memory, it blocks indefinitely 
> and there are no pending edits to ship and (thus) free that memory. A cursory 
> glance at the code gives me a _lot_ of anxiety around places where we don't 
> properly clean it up (e.g. batches that fail to ship, dropping a peer). As a 
> first stab, let me add some more debugging so we can actually track this 
> state properly for the operators and their sanity.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-22114) Port HBASE-15560 (TinyLFU-based BlockCache) to branch-1

2020-07-27 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-22114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17165495#comment-17165495
 ] 

Sean Busbey commented on HBASE-22114:
-

IIRC we need the ability for precommit to only run some modules when using 
jdk8. I think I had a WIP branch somewhere that had this working. If the 
precommit builds are working over on the new CI setup I could try digging it 
up and getting it to work?
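
As a sketch of the shape that could take (the profile id matches the 
'build-with-jdk8' profile mentioned below; the module name is hypothetical):

{code:xml}
<!-- only include jdk8-dependent modules when building with jdk8+ -->
<profile>
  <id>build-with-jdk8</id>
  <activation>
    <jdk>[1.8,)</jdk>
  </activation>
  <modules>
    <module>hbase-tinylfu-blockcache</module>
  </modules>
</profile>
{code}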

> Port HBASE-15560 (TinyLFU-based BlockCache) to branch-1
> ---
>
> Key: HBASE-22114
> URL: https://issues.apache.org/jira/browse/HBASE-22114
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Andrew Kyle Purtell
>Assignee: Andrew Kyle Purtell
>Priority: Major
> Fix For: 1.7.0
>
> Attachments: HBASE-22114-branch-1.patch, HBASE-22114-branch-1.patch, 
> HBASE-22114-branch-1.patch
>
>
> HBASE-15560 introduces the TinyLFU cache policy for the blockcache.
> W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the 
> frequency in a counting sketch, ages periodically by halving the counters, 
> and orders entries by SLRU. An entry is discarded by comparing the frequency 
> of the new arrival (candidate) to the SLRU's victim, and keeping the one with 
> the highest frequency. This allows the operations to be performed in O(1) 
> time and, through the use of a compact sketch, a much larger history is 
> retained beyond the current working set. In a variety of real world traces 
> the policy had [near optimal hit 
> rates|https://github.com/ben-manes/caffeine/wiki/Efficiency].
> The implementation of HBASE-15560 uses several Java 8 idioms, depends on JRE 
> 8+ type Optional, and has dependencies on libraries compiled with Java 8+ 
> bytecode. It could be backported to branch-1 but must be made optional both 
> at compile time and runtime, enabled by the 'build-with-jdk8' build profile.
> The TinyLFU policy must go into its own build module.
> The blockcache must be modified to load L1 implementation/policy dynamically 
> at startup by reflection if the policy is "TinyLFU"



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HBASE-24476) release scripts should provide timing information

2020-07-25 Thread Sean Busbey (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-24476 started by Sean Busbey.
---
> release scripts should provide timing information
> -
>
> Key: HBASE-24476
> URL: https://issues.apache.org/jira/browse/HBASE-24476
> Project: HBase
>  Issue Type: Improvement
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Minor
>
> right now I can get timing from the individual maven commands but it would be 
> nice to get higher level times.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24773) release scripts should check for tools it will use at the start of building

2020-07-25 Thread Sean Busbey (Jira)
Sean Busbey created HBASE-24773:
---

 Summary: release scripts should check for tools it will use at the 
start of building
 Key: HBASE-24773
 URL: https://issues.apache.org/jira/browse/HBASE-24773
 Project: HBase
  Issue Type: Bug
Reporter: Sean Busbey


I just had an RC build fail *after* going through all the work to build 
everything because a recent OS upgrade meant that my {{svn}} command had been 
replaced by a placeholder that provided an error message about how X-Code is no 
longer providing subversion binaries.

getting a new copy of svn installed was easy enough, but if we're going to use 
subversion we should have an init check similar to what we have for yetus / 
maven / etc.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24286) HMaster won't become healthy after cloning or creating a new cluster pointing at the same file system

2020-07-21 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17162298#comment-17162298
 ] 

Sean Busbey commented on HBASE-24286:
-

yes, please start with a PR to the master branch.

> HMaster won't become healthy after cloning or creating a new cluster 
> pointing at the same file system
> ---
>
> Key: HBASE-24286
> URL: https://issues.apache.org/jira/browse/HBASE-24286
> Project: HBase
>  Issue Type: Bug
>  Components: master, Region Assignment
>Affects Versions: 3.0.0-alpha-1, 2.2.3, 2.2.4, 2.2.5
>Reporter: Jack Ye
>Assignee: Tak-Lon (Stephen) Wu
>Priority: Major
>
> h1. How to reproduce:
>  # user starts an HBase cluster on top of a file system
>  # user performs some operations and shuts down the cluster, all the data are 
> still persisted in the file system
>  # user creates a new HBase cluster using a different set of servers on top 
> of the same file system with the same root directory
>  # HMaster cannot initialize
> h1. Root cause:
> During HMaster initialization phase, the following happens:
>  # HMaster waits for namespace table online
>  # AssignmentManager gets all namespace table regions info
>  # region servers of namespace table are already dead, online check fails
>  # HMaster waits for namespace regions online, retrying 1000 times, 
> which effectively means forever
> Code waiting for namespace table to be online: 
> https://github.com/apache/hbase/blob/rel/2.2.3/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java#L1102
> h1. Stack trace (running on S3):
> 2020-04-23 08:15:57,185 WARN [master/ip-10-12-13-14:16000:becomeActiveMaster] 
> master.HMaster: 
> hbase:namespace,,1587628169070.d34b65b91a52644ed3e77c5fbb065c2b. is NOT 
> online; state=\{d34b65b91a52644ed3e77c5fbb065c2b state=OPEN, 
> ts=1587629742129, server=ip-10-12-13-14.ec2.internal,16020,1587628031614}; 
> ServerCrashProcedures=false. Master startup cannot progress, in 
> holding-pattern until region onlined.
> where ip-10-12-13-14.ec2.internal is the old region server hosting the region 
> of hbase:namespace.
> h1. Discussion for the fix
> We see there is a fix for this at branch-3: 
> https://issues.apache.org/jira/browse/HBASE-21154. Before we provide a patch, 
> we would like to know from the community if we should backport this change to 
> branch-2, or if we should just perform a fix with minimum code change.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24749) Direct insert HFiles and Persist in-memory HFile tracking

2020-07-21 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17162022#comment-17162022
 ] 

Sean Busbey commented on HBASE-24749:
-

excellent! it's worth a heads up to dev@hbase pointing folks here, I think.

{quote}
For recovery from region server crashes or region reload, we persist the 
in-memory HFiles tracking in store file manager to a new HBase admin table, 
‘hbase:storefile’. To prevent loading HFiles from incomplete flushes and 
compactions, and reduce the number of expensive LIST files calls against the 
file system, we will read directly from the hbase:storefile table. A write to 
the storefile table is used as the commit mechanism for a HFile, removing the 
rename from .tmp to the data directory.
{quote}

Could this be a part of meta instead? we just recently got through having 
{{hbase:namespace}} move into meta to improve operational robustness, and this 
proposed storefile lookup seems very likely to be an even greater tripping 
point since all the RS need access.

{quote}
To avoid a circular dependency on the storefile table, the store file manager 
for the meta and storefile tables will be persisted in ZooKeeper.
{quote}

no persistent state in zookeeper please. we could do this via a local region 
controlled by whoever is handling meta. or at least I think that the feature 
would work for this, what do you think [~zhangduo]?

> Direct insert HFiles and Persist in-memory HFile tracking
> -
>
> Key: HBASE-24749
> URL: https://issues.apache.org/jira/browse/HBASE-24749
> Project: HBase
>  Issue Type: Umbrella
>  Components: Compaction, HFile
>Affects Versions: 3.0.0-alpha-1
>Reporter: Tak-Lon (Stephen) Wu
>Priority: Major
>  Labels: design, discussion, objectstore, storeFile, storeengine
> Attachments: 1B100m-25m25m-performance.pdf, Apache HBase - Direct 
> insert HFiles and Persist in-memory HFile tracking.pdf
>
>
> We propose a new feature (a new store engine) to remove the {{.tmp}} 
> directory used in the commit stage for common HFile operations such as flush 
> and compaction to improve the write throughput and latency on object stores. 
> Specifically for S3 filesystems, this will also mitigate read-after-write 
> inconsistencies caused by immediate HFiles validation after moving the 
> HFile(s) to data directory.
> Please see attached for this proposal and the initial result captured with 
> 25m (25m operations) and 1B (100m operations) YCSB workload A LOAD and RUN, 
> and workload C RUN result.
> The goal of this JIRA is to discuss with the community whether the proposed 
> improvement on the object stores use case makes sense and whether we missed 
> anything that should be included.
> Improvement Highlights
>  1. Lower write latency, especially the p99+
>  2. Higher write throughput on flush and compaction 
>  3. Lower MTTR on region (re)open or assignment 
>  4. Remove consistency check dependencies (e.g. DynamoDB) supported by the 
> file system implementation



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-22263) Master creates duplicate ServerCrashProcedure on initialization, leading to assignment hanging in region-dense clusters

2020-07-18 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-22263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17160577#comment-17160577
 ] 

Sean Busbey commented on HBASE-22263:
-

just to confirm, this does impact master and branch-2?

is the patch for branch-1 meaningfully different from the PR currently 
targeting branch-1.4?

> Master creates duplicate ServerCrashProcedure on initialization, leading to 
> assignment hanging in region-dense clusters
> ---
>
> Key: HBASE-22263
> URL: https://issues.apache.org/jira/browse/HBASE-22263
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2, Region Assignment
>Affects Versions: 1.2.0, 1.3.0, 1.4.0, 1.5.0
>Reporter: Sean Busbey
>Assignee: Bo Cui
>Priority: Critical
> Attachments: HBASE-22263-branch-1.v0.add.patch, 
> HBASE-22263-branch-1.v0.patch
>
>
> h3. Problem:
> During Master initialization we
>  # restore existing procedures that still need to run from prior active 
> Master instances
>  # look for signs that Region Servers have died and need to be recovered 
> while we were out and schedule a ServerCrashProcedure (SCP) for each of them
>  # turn on the assignment manager
> The normal turn of events for a ServerCrashProcedure will attempt to use a 
> bulk assignment to maintain the set of regions on a RS if possible. However, 
> we wait around and retry a bit later if the assignment manager isn’t ready 
> yet.
> Note that currently #2 has no notion of whether or not a previous active 
> Master instance has already done a check. This means we might schedule an 
> SCP for a ServerName (host, port, start code) that already has an SCP 
> scheduled. Ideally, such a duplicate should be a no-op.
> However, before step #2 schedules the SCP it first marks the region server as 
> dead and not yet processed, with the expectation that the SCP it just created 
> will check whether there is log splitting work and then mark the server as 
> ready for region assignment. At the same time, any restored SCPs that are past the step 
> of log splitting will be waiting for the AssignmentManager still. As a part 
> of restoring themselves, they do not update with the current master instance 
> to show that they are past the point of WAL processing.
> Once the AssignmentManager starts in #3 the restored SCP continues; it will 
> eventually get to the assignment phase and find that its server is marked as 
> dead and in need of wal processing. Such assignments are skipped with a log 
> message. Thus as we iterate over the regions to assign we’ll skip all of 
> them. This non-intuitively shifts the “no-op” status from the newer SCP we 
> scheduled at #2 to the older SCP that was restored in #1.
> Bulk assignment works by sending the assign calls via a pool to allow more 
> parallelism. Once we’ve set up the pool we just wait to see if the region 
> state updates to online. Unfortunately, since all of the assigns got skipped, 
> we’ll never change the state for any of these regions. That means the bulk 
> assign, and the older SCP that started it, will wait until it hits a timeout.
> By default the timeout for a bulk assignment is the smaller of {{(# Regions 
> in the plan * 10s)}} or {{(# Regions in the most loaded RS in the plan * 1s + 
> 60s + # of RegionServers in the cluster * 30s)}}. For even modest clusters 
> with several hundreds of regions per region server, this means the “no-op” 
> SCP will end up waiting ~tens-of-minutes (e.g. ~50 minutes for an average 
> region density of 300 regions per region server on a 100 node cluster. ~11 
> minutes for 300 regions per region server on a 10 node cluster). During this 
> time, the SCP will hold one of the available procedure execution slots for 
> both the overall pool and for the specific server queue.
> As previously mentioned, restored SCPs will retry their submission if the 
> assignment manager has not yet been activated (done in #3), this can cause 
> them to be scheduled after the newer SCPs (created in #2). Thus the order of 
> execution of no-op and usable SCPs can vary from run-to-run of master 
> initialization.
> This means that unless you get lucky with SCP ordering, impacted regions will 
> remain as RIT for an extended period of time. If you get particularly unlucky 
> and a critical system table is included in the regions that are being 
> recovered, then master initialization itself will end up blocked on this 
> sequence of SCP timeouts. If there are enough of them to exceed the master 
> initialization timeouts, then the situation can be self-sustaining as 
> additional master failovers cause even more duplicative SCPs to be scheduled.
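> A throwaway sketch of that arithmetic, reproducing the examples above:
> {code:java}
> public class BulkAssignTimeout {
>   // smaller of (regionsInPlan * 10s) and
>   // (regionsOnBusiestRS * 1s + 60s + servers * 30s)
>   static long timeoutSeconds(int regionsInPlan, int regionsOnBusiestRS, int servers) {
>     long a = regionsInPlan * 10L;
>     long b = regionsOnBusiestRS * 1L + 60L + servers * 30L;
>     return Math.min(a, b);
>   }
>
>   public static void main(String[] args) {
>     // 300 regions per RS, 100 node cluster: min(3000, 3360) = 3000s, ~50 min
>     System.out.println(timeoutSeconds(300, 300, 100) / 60 + " minutes");
>     // 300 regions per RS, 10 node cluster: min(3000, 660) = 660s, ~11 min
>     System.out.println(timeoutSeconds(300, 300, 10) / 60 + " minutes");
>   }
> }
> {code}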
> h3. Indicators:
>  * Master appears to hang; failing to assign regions to available region 
> servers

[jira] [Commented] (HBASE-24733) create-release should stage PR with documentation changes

2020-07-13 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17156933#comment-17156933
 ] 

Sean Busbey commented on HBASE-24733:
-

what if we flip this on its head and have the periodic website generation 
script check to see if there is a new release that it should include docs for?

> create-release should stage PR with documentation changes
> -
>
> Key: HBASE-24733
> URL: https://issues.apache.org/jira/browse/HBASE-24733
> Project: HBase
>  Issue Type: Task
>  Components: community
>Reporter: Nick Dimiduk
>Priority: Major
>
> {{create-release}} has all the bits it needs to put together commits (and 
> PRs) with documentation changes for both {{hbase.git}} and {{hbase-site.git}} 
> that accompany a release candidate. Let's automate this as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24687) MobFileCleanerChore uses a new Connection for each table each time it runs

2020-07-13 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17156782#comment-17156782
 ] 

Sean Busbey commented on HBASE-24687:
-

the larger problem is definitely present on master. More threads are probably 
a good idea; I have seen deployments where it takes over an hour to get through 
the list of mob files to move (e.g. after a period of having either the cleaner 
or compaction disabled for a while). But it will need to be something we can 
tune down in the event HDFS gets overwhelmed.

{quote}
I think the problem here is not about connections; we should look at the 
directories in the fs, not at the mob-enabled tables, because there are table 
modifications (e.g. disabling mob) which may prevent us from deleting some 
out-of-date files.
{quote}

That's an interesting edge case. I think currently we tell folks to manually 
sideline the mob files after turning off mob and compacting all the data. Doing 
this automatically would be an ease-of-operations improvement, maybe best to do 
it in a dedicated jira though?

> MobFileCleanerChore uses a new Connection for each table each time it runs
> --
>
> Key: HBASE-24687
> URL: https://issues.apache.org/jira/browse/HBASE-24687
> Project: HBase
>  Issue Type: Bug
>  Components: mob
>Affects Versions: 3.0.0-alpha-1
>Reporter: Manas
>Assignee: Junhong Xu
>Priority: Minor
> Attachments: Screen Shot 2020-07-06 at 6.06.43 PM.png
>
>
> Currently we create a new connection for every table in 
> MobFileCleanerChore.java, where we should theoretically just use the 
> connection from the HBase MasterServices.
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24687) MobFileCleanerChore uses a new Connection for each table each time it runs

2020-07-08 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17153295#comment-17153295
 ] 

Sean Busbey commented on HBASE-24687:
-

there's a less severe version of this problem in branch-2 based releases where 
we make a new connection per chore invocation.

> MobFileCleanerChore uses a new Connection for each table each time it runs
> --
>
> Key: HBASE-24687
> URL: https://issues.apache.org/jira/browse/HBASE-24687
> Project: HBase
>  Issue Type: Bug
>  Components: mob
>Affects Versions: 3.0.0-alpha-1
>Reporter: Manas
>Priority: Minor
> Attachments: Screen Shot 2020-07-06 at 6.06.43 PM.png
>
>
> Currently we create a new connection for every table in 
> MobFileCleanerChore.java, where we should theoretically just use the 
> connection from the HBase MasterServices.
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24687) MobFileCleanerChore uses a new Connection for each table each time it runs

2020-07-08 Thread Sean Busbey (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-24687:

Affects Version/s: (was: 2.2.3)
   3.0.0-alpha-1

> MobFileCleanerChore uses a new Connection for each table each time it runs
> --
>
> Key: HBASE-24687
> URL: https://issues.apache.org/jira/browse/HBASE-24687
> Project: HBase
>  Issue Type: Bug
>  Components: mob
>Affects Versions: 3.0.0-alpha-1
>Reporter: Manas
>Priority: Minor
> Attachments: Screen Shot 2020-07-06 at 6.06.43 PM.png
>
>
> Currently we create a new connection for every table in 
> MobFileCleanerChore.java, where we should theoretically just use the 
> connection from the HBase MasterServices.
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24687) MobFileCleanerChore uses a new Connection for each table each time it runs

2020-07-07 Thread Sean Busbey (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-24687:

Summary: MobFileCleanerChore uses a new Connection for each table each time 
it runs  (was: New Connection being created for each table)

> MobFileCleanerChore uses a new Connection for each table each time it runs
> --
>
> Key: HBASE-24687
> URL: https://issues.apache.org/jira/browse/HBASE-24687
> Project: HBase
>  Issue Type: Bug
>  Components: mob
>Affects Versions: 2.2.3
>Reporter: Manas
>Priority: Minor
> Attachments: Screen Shot 2020-07-06 at 6.06.43 PM.png
>
>
> Currently we create a new connection for every table in 
> MobFileCleanerChore.java, where we should theoretically just use the 
> connection from the HBase MasterServices.
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24686) [LOG] Log improvement in Connection#close

2020-07-07 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17153100#comment-17153100
 ] 

Sean Busbey commented on HBASE-24686:
-

Better logging here, especially including who called close, would help us chase 
down a master failure case [~mchakka] has seen intermittently in test lately for 
branch-2.2. Basically the master's primary connection gets closed but the 
master does not abort, and then a bunch of stuff fails.

e.g. the master log will show the normalizer failing

{code}

2020-07-03 09:57:42,005 ERROR 
org.apache.hadoop.hbase.master.normalizer.RegionNormalizerChore: Failed to 
normalize regions.
java.io.IOException: connection is closed
at 
org.apache.hadoop.hbase.MetaTableAccessor.getMetaHTable(MetaTableAccessor.java:241)
at 
org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:797)
at 
org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:768)
at 
org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:727)
at 
org.apache.hadoop.hbase.MetaTableAccessor.fullScanTables(MetaTableAccessor.java:215)
at 
org.apache.hadoop.hbase.master.TableStateManager.getTablesInStates(TableStateManager.java:189)
at 
org.apache.hadoop.hbase.master.HMaster.normalizeRegions(HMaster.java:1815)
at 
org.apache.hadoop.hbase.master.normalizer.RegionNormalizerChore.chore(RegionNormalizerChore.java:48)
at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:188)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at 
org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:111)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
{code}

also, if you try to use the shell or the java api to create tables, that will 
fail at a similar point once the master needs to call 
{{MetaTableAccessor.getMetaHTable}}.

Right now the only indication of a connection closing is a line like this:

{code}
2020-07-03 09:52:38,751 INFO 
org.apache.hadoop.hbase.client.ConnectionImplementation: Closing master 
protocol: MasterService
{code}

a good starting point would be to update ConnectionImplementation with an INFO 
message that says when we are done closing everything and a DEBUG message at 
the start that includes the call stack.
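
a rough sketch of that shape (illustrative only; the real change would live in 
ConnectionImplementation#close()):

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class CloseLoggingExample {
  private static final Logger LOG = LoggerFactory.getLogger(CloseLoggingExample.class);

  void close() {
    if (LOG.isDebugEnabled()) {
      // a throwaway exception captures the caller's stack without throwing
      LOG.debug("close() called", new Exception("connection close stack trace"));
    }
    // ... close master stubs, rpc client, thread pools ...
    LOG.info("Done closing connection {}", this);
  }
}
{code}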

> [LOG] Log improvement in Connection#close
> -
>
> Key: HBASE-24686
> URL: https://issues.apache.org/jira/browse/HBASE-24686
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, logging
>Affects Versions: 2.2.3
>Reporter: mokai
>Priority: Major
>
> We have seen some customers use the hbase connection improperly: calls from 
> some threads failed since the shared connection was closed by one of the 
> threads. It's better to print the details when the connection is closing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24625) AsyncFSWAL.getLogFileSizeIfBeingWritten does not return the expected synced file length.

2020-07-04 Thread Sean Busbey (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-24625:

Component/s: wal
 Replication

> AsyncFSWAL.getLogFileSizeIfBeingWritten does not return the expected synced 
> file length.
> 
>
> Key: HBASE-24625
> URL: https://issues.apache.org/jira/browse/HBASE-24625
> Project: HBase
>  Issue Type: Bug
>  Components: Replication, wal
>Affects Versions: 2.1.0, 2.0.0, 2.2.0, 2.3.0
>Reporter: chenglei
>Priority: Critical
> Fix For: 3.0.0-alpha-1, 2.3.1, 2.2.6
>
>
> By HBASE-14004, we introduced the {{WALFileLengthProvider}} interface to keep 
> track of the length of the wal file currently being written ourselves; the 
> {{WALEntryStream}} used by {{ReplicationSourceWALReader}} may only read WAL 
> file bytes up to {{WALFileLengthProvider.getLogFileSizeIfBeingWritten}} if the 
> WAL file is currently being written on the same RegionServer.
> {{AsyncFSWAL}} implements {{WALFileLengthProvider}} via 
> {{AbstractFSWAL.getLogFileSizeIfBeingWritten}}, as follows:
> {code:java}
>public OptionalLong getLogFileSizeIfBeingWritten(Path path) {
> rollWriterLock.lock();
> try {
>   Path currentPath = getOldPath();
>   if (path.equals(currentPath)) {
> W writer = this.writer;
> return writer != null ? OptionalLong.of(writer.getLength()) : 
> OptionalLong.empty();
>   } else {
> return OptionalLong.empty();
>   }
> } finally {
>   rollWriterLock.unlock();
> }
>   }
> {code}
> For {{AsyncFSWAL}}, the above {{AsyncFSWAL.writer}} is 
> {{AsyncProtobufLogWriter}}, and {{AsyncProtobufLogWriter.getLength}} is as 
> follows:
> {code:java}
> public long getLength() {
> return length.get();
> }
> {code}
> But for {{AsyncProtobufLogWriter}}, any append method may increase the above 
> {{AsyncProtobufLogWriter.length}}; in particular, the following 
> {{AsyncFSWAL.append}} method just appends the {{WALEntry}} to 
> {{FanOutOneBlockAsyncDFSOutput.buf}}:
> {code:java}
>  public void append(Entry entry) {
>   int buffered = output.buffered();
>   try {
>   entry.getKey().
>   
> getBuilder(compressor).setFollowingKvCount(entry.getEdit().size()).build()
>   .writeDelimitedTo(asyncOutputWrapper);
>   } catch (IOException e) {
>  throw new AssertionError("should not happen", e);
>   }
>
> try {
>for (Cell cell : entry.getEdit().getCells()) {
>  cellEncoder.write(cell);
>}
>   } catch (IOException e) {
>throw new AssertionError("should not happen", e);
>  }
>  length.addAndGet(output.buffered() - buffered);
>  }
> {code}
> That is to say, {{AsyncFSWAL.getLogFileSizeIfBeingWritten}} does not reflect 
> the file length that has successfully been synced to the underlying HDFS, 
> which is not as expected.
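> A toy sketch of the gap (illustrative only, not the actual fix): append()
> grows a buffered length immediately, while a separate synced length only
> advances after a successful sync(); a length provider for replication should
> report the latter, not the former.
> {code:java}
> import java.util.concurrent.atomic.AtomicLong;
>
> class LengthTracking {
>   private final AtomicLong buffered = new AtomicLong(); // grows on append
>   private final AtomicLong synced = new AtomicLong();   // grows on sync
>
>   void append(int bytes) { buffered.addAndGet(bytes); }
>
>   void sync() { synced.set(buffered.get()); } // after a successful HDFS sync
>
>   long getLogFileSizeIfBeingWritten() { return synced.get(); }
> }
> {code}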



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24625) AsyncFSWAL.getLogFileSizeIfBeingWritten does not return the expected synced file length.

2020-07-04 Thread Sean Busbey (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-24625:

Affects Version/s: 2.1.0
   2.0.0
   2.2.0

> AsyncFSWAL.getLogFileSizeIfBeingWritten does not return the expected synced 
> file length.
> 
>
> Key: HBASE-24625
> URL: https://issues.apache.org/jira/browse/HBASE-24625
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.1.0, 2.0.0, 2.2.0, 2.3.0
>Reporter: chenglei
>Priority: Critical
> Fix For: 3.0.0-alpha-1, 2.3.1, 2.2.6
>
>
> By HBASE-14004, we introduced the {{WALFileLengthProvider}} interface to keep 
> track of the length of the wal file currently being written ourselves; the 
> {{WALEntryStream}} used by {{ReplicationSourceWALReader}} may only read WAL 
> file bytes up to {{WALFileLengthProvider.getLogFileSizeIfBeingWritten}} if the 
> WAL file is currently being written on the same RegionServer.
> {{AsyncFSWAL}} implements {{WALFileLengthProvider}} via 
> {{AbstractFSWAL.getLogFileSizeIfBeingWritten}}, as follows:
> {code:java}
>public OptionalLong getLogFileSizeIfBeingWritten(Path path) {
> rollWriterLock.lock();
> try {
>   Path currentPath = getOldPath();
>   if (path.equals(currentPath)) {
> W writer = this.writer;
> return writer != null ? OptionalLong.of(writer.getLength()) : 
> OptionalLong.empty();
>   } else {
> return OptionalLong.empty();
>   }
> } finally {
>   rollWriterLock.unlock();
> }
>   }
> {code}
> For {{AsyncFSWAL}}, the above {{AsyncFSWAL.writer}} is 
> {{AsyncProtobufLogWriter}}, and {{AsyncProtobufLogWriter.getLength}} is as 
> follows:
> {code:java}
> public long getLength() {
> return length.get();
> }
> {code}
> But for {{AsyncProtobufLogWriter}}, any append method may increase the above 
> {{AsyncProtobufLogWriter.length}}; in particular, the following 
> {{AsyncFSWAL.append}} method just appends the {{WALEntry}} to 
> {{FanOutOneBlockAsyncDFSOutput.buf}}:
> {code:java}
>  public void append(Entry entry) {
>   int buffered = output.buffered();
>   try {
>   entry.getKey().
>   
> getBuilder(compressor).setFollowingKvCount(entry.getEdit().size()).build()
>   .writeDelimitedTo(asyncOutputWrapper);
>   } catch (IOException e) {
>  throw new AssertionError("should not happen", e);
>   }
>
> try {
>for (Cell cell : entry.getEdit().getCells()) {
>  cellEncoder.write(cell);
>}
>   } catch (IOException e) {
>throw new AssertionError("should not happen", e);
>  }
>  length.addAndGet(output.buffered() - buffered);
>  }
> {code}
> That is to say, {{AsyncFSWAL.getLogFileSizeIfBeingWritten}} does not reflect 
> the file length that has successfully been synced to the underlying HDFS, 
> which is not as expected.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24625) AsyncFSWAL.getLogFileSizeIfBeingWritten does not return the expected synced file length.

2020-07-03 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17151098#comment-17151098
 ] 

Sean Busbey commented on HBASE-24625:
-

We'll need a release note that includes a downstream-facing description of 
what the failure looks like and a workaround if possible. (I think the 
workaround is to switch from the fan out wal to the filesystem wal)

> AsyncFSWAL.getLogFileSizeIfBeingWritten does not return the expected synced 
> file length.
> 
>
> Key: HBASE-24625
> URL: https://issues.apache.org/jira/browse/HBASE-24625
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.3.0
>Reporter: chenglei
>Priority: Critical
>
> By HBASE-14004, we introduced the {{WALFileLengthProvider}} interface to keep 
> track of the length of the wal file currently being written ourselves; the 
> {{WALEntryStream}} used by {{ReplicationSourceWALReader}} may only read WAL 
> file bytes up to {{WALFileLengthProvider.getLogFileSizeIfBeingWritten}} if the 
> WAL file is currently being written on the same RegionServer.
> {{AsyncFSWAL}} implements {{WALFileLengthProvider}} via 
> {{AbstractFSWAL.getLogFileSizeIfBeingWritten}}, as follows:
> {code:java}
>public OptionalLong getLogFileSizeIfBeingWritten(Path path) {
> rollWriterLock.lock();
> try {
>   Path currentPath = getOldPath();
>   if (path.equals(currentPath)) {
> W writer = this.writer;
> return writer != null ? OptionalLong.of(writer.getLength()) : 
> OptionalLong.empty();
>   } else {
> return OptionalLong.empty();
>   }
> } finally {
>   rollWriterLock.unlock();
> }
>   }
> {code}
> For {{AsyncFSWAL}}, the above {{AsyncFSWAL.writer}} is 
> {{AsyncProtobufLogWriter}}, and {{AsyncProtobufLogWriter.getLength}} is as 
> follows:
> {code:java}
> public long getLength() {
> return length.get();
> }
> {code}
> But for {{AsyncProtobufLogWriter}}, any append method may increase the above 
> {{AsyncProtobufLogWriter.length}}; in particular, the following 
> {{AsyncFSWAL.append}} method just appends the {{WALEntry}} to 
> {{FanOutOneBlockAsyncDFSOutput.buf}}:
> {code:java}
>  public void append(Entry entry) {
>   int buffered = output.buffered();
>   try {
>   entry.getKey().
>   
> getBuilder(compressor).setFollowingKvCount(entry.getEdit().size()).build()
>   .writeDelimitedTo(asyncOutputWrapper);
>   } catch (IOException e) {
>  throw new AssertionError("should not happen", e);
>   }
>
> try {
>for (Cell cell : entry.getEdit().getCells()) {
>  cellEncoder.write(cell);
>}
>   } catch (IOException e) {
>throw new AssertionError("should not happen", e);
>  }
>  length.addAndGet(output.buffered() - buffered);
>  }
> {code}
> That is to say, {{AsyncFSWAL.getLogFileSizeIfBeingWritten}} does not reflect 
> the file length that has successfully been synced to the underlying HDFS, 
> which is not as expected.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24625) AsyncFSWAL.getLogFileSizeIfBeingWritten does not return the expected synced file length.

2020-07-03 Thread Sean Busbey (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-24625:

Fix Version/s: 2.2.6
   2.3.1
   3.0.0-alpha-1

> AsyncFSWAL.getLogFileSizeIfBeingWritten does not return the expected synced 
> file length.
> 
>
> Key: HBASE-24625
> URL: https://issues.apache.org/jira/browse/HBASE-24625
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.3.0
>Reporter: chenglei
>Priority: Critical
> Fix For: 3.0.0-alpha-1, 2.3.1, 2.2.6
>
>
> By HBASE-14004, we introduced the {{WALFileLengthProvider}} interface to keep 
> track of the length of the wal file currently being written ourselves; the 
> {{WALEntryStream}} used by {{ReplicationSourceWALReader}} may only read WAL 
> file bytes up to {{WALFileLengthProvider.getLogFileSizeIfBeingWritten}} if the 
> WAL file is currently being written on the same RegionServer.
> {{AsyncFSWAL}} implements {{WALFileLengthProvider}} via 
> {{AbstractFSWAL.getLogFileSizeIfBeingWritten}}, as follows:
> {code:java}
>public OptionalLong getLogFileSizeIfBeingWritten(Path path) {
> rollWriterLock.lock();
> try {
>   Path currentPath = getOldPath();
>   if (path.equals(currentPath)) {
> W writer = this.writer;
> return writer != null ? OptionalLong.of(writer.getLength()) : 
> OptionalLong.empty();
>   } else {
> return OptionalLong.empty();
>   }
> } finally {
>   rollWriterLock.unlock();
> }
>   }
> {code}
> For {{AsyncFSWAL}}, the above {{AsyncFSWAL.writer}} is 
> {{AsyncProtobufLogWriter}}, and {{AsyncProtobufLogWriter.getLength}} is as 
> follows:
> {code:java}
> public long getLength() {
> return length.get();
> }
> {code}
> But for {{AsyncProtobufLogWriter}}, any append method may increase the above 
> {{AsyncProtobufLogWriter.length}}; in particular, the following 
> {{AsyncFSWAL.append}} method just appends the {{WALEntry}} to 
> {{FanOutOneBlockAsyncDFSOutput.buf}}:
> {code:java}
>  public void append(Entry entry) {
>   int buffered = output.buffered();
>   try {
>   entry.getKey().
>   
> getBuilder(compressor).setFollowingKvCount(entry.getEdit().size()).build()
>   .writeDelimitedTo(asyncOutputWrapper);
>   } catch (IOException e) {
>  throw new AssertionError("should not happen", e);
>   }
>
> try {
>for (Cell cell : entry.getEdit().getCells()) {
>  cellEncoder.write(cell);
>}
>   } catch (IOException e) {
>throw new AssertionError("should not happen", e);
>  }
>  length.addAndGet(output.buffered() - buffered);
>  }
> {code}
> That is to say, {{AsyncFSWAL.getLogFileSizeIfBeingWritten}} does not reflect 
> the file length that has successfully been synced to the underlying HDFS, 
> which is not as expected.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24625) AsyncFSWAL.getLogFileSizeIfBeingWritten does not return the expected synced file length.

2020-07-03 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17151096#comment-17151096
 ] 

Sean Busbey commented on HBASE-24625:
-

What versions are impacted? All 2.x.y?

> AsyncFSWAL.getLogFileSizeIfBeingWritten does not return the expected synced 
> file length.
> 
>
> Key: HBASE-24625
> URL: https://issues.apache.org/jira/browse/HBASE-24625
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.3.0
>Reporter: chenglei
>Priority: Critical
>
> By HBASE-14004, we introduced the {{WALFileLengthProvider}} interface to keep 
> track of the length of the wal file currently being written ourselves; the 
> {{WALEntryStream}} used by {{ReplicationSourceWALReader}} may only read WAL 
> file bytes up to {{WALFileLengthProvider.getLogFileSizeIfBeingWritten}} if the 
> WAL file is currently being written on the same RegionServer.
> {{AsyncFSWAL}} implements {{WALFileLengthProvider}} via 
> {{AbstractFSWAL.getLogFileSizeIfBeingWritten}}, as follows:
> {code:java}
>public OptionalLong getLogFileSizeIfBeingWritten(Path path) {
> rollWriterLock.lock();
> try {
>   Path currentPath = getOldPath();
>   if (path.equals(currentPath)) {
> W writer = this.writer;
> return writer != null ? OptionalLong.of(writer.getLength()) : 
> OptionalLong.empty();
>   } else {
> return OptionalLong.empty();
>   }
> } finally {
>   rollWriterLock.unlock();
> }
>   }
> {code}
> For {{AsyncFSWAL}}, the above {{AsyncFSWAL.writer}} is 
> {{AsyncProtobufLogWriter}}, and {{AsyncProtobufLogWriter.getLength}} is as 
> follows:
> {code:java}
> public long getLength() {
> return length.get();
> }
> {code}
> But for {{AsyncProtobufLogWriter}}, any append method may increase the above 
> {{AsyncProtobufLogWriter.length}}; in particular, the following 
> {{AsyncFSWAL.append}} method just appends the {{WALEntry}} to 
> {{FanOutOneBlockAsyncDFSOutput.buf}}:
> {code:java}
>  public void append(Entry entry) {
>    int buffered = output.buffered();
>    try {
>      entry.getKey().getBuilder(compressor)
>        .setFollowingKvCount(entry.getEdit().size()).build()
>        .writeDelimitedTo(asyncOutputWrapper);
>    } catch (IOException e) {
>      throw new AssertionError("should not happen", e);
>    }
>    try {
>      for (Cell cell : entry.getEdit().getCells()) {
>        cellEncoder.write(cell);
>      }
>    } catch (IOException e) {
>      throw new AssertionError("should not happen", e);
>    }
>    length.addAndGet(output.buffered() - buffered);
>  }
> {code}
> That is to say, {{AsyncFSWAL.getLogFileSizeIfBeingWritten}} may report a file
> length that has not yet been successfully synced to the underlying HDFS, which
> is not as expected.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24625) AsyncFSWAL.getLogFileSizeIfBeingWritten does not return the expected synced file length.

2020-07-03 Thread Sean Busbey (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-24625:

Priority: Critical  (was: Major)

> AsyncFSWAL.getLogFileSizeIfBeingWritten does not return the expected synced 
> file length.
> 
>
> Key: HBASE-24625
> URL: https://issues.apache.org/jira/browse/HBASE-24625
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.3.0
>Reporter: chenglei
>Priority: Critical
>
> In HBASE-14004 we introduced the {{WALFileLengthProvider}} interface so that we
> keep track of the length of the WAL file currently being written ourselves. The
> {{WALEntryStream}} used by {{ReplicationSourceWALReader}} may only read WAL
> bytes up to {{WALFileLengthProvider.getLogFileSizeIfBeingWritten}} when the WAL
> file is currently being written on the same RegionServer.
> {{AsyncFSWAL}} implements {{WALFileLengthProvider}} via
> {{AbstractFSWAL.getLogFileSizeIfBeingWritten}}, as follows:
> {code:java}
>   public OptionalLong getLogFileSizeIfBeingWritten(Path path) {
>     rollWriterLock.lock();
>     try {
>       Path currentPath = getOldPath();
>       if (path.equals(currentPath)) {
>         W writer = this.writer;
>         return writer != null ? OptionalLong.of(writer.getLength()) : OptionalLong.empty();
>       } else {
>         return OptionalLong.empty();
>       }
>     } finally {
>       rollWriterLock.unlock();
>     }
>   }
> {code}
> For {{AsyncFSWAL}}, the {{AsyncFSWAL.writer}} above is an
> {{AsyncProtobufLogWriter}}, and {{AsyncProtobufLogWriter.getLength}} is as
> follows:
> {code:java}
> public long getLength() {
>   return length.get();
> }
> {code}
> But for {{AsyncProtobufLogWriter}}, any append method may increase
> {{AsyncProtobufLogWriter.length}}. In particular, the following
> {{AsyncProtobufLogWriter.append}} method merely appends the {{WALEntry}} to
> {{FanOutOneBlockAsyncDFSOutput.buf}}, yet it still advances {{length}}:
> {code:java}
>  public void append(Entry entry) {
>    int buffered = output.buffered();
>    try {
>      entry.getKey().getBuilder(compressor)
>        .setFollowingKvCount(entry.getEdit().size()).build()
>        .writeDelimitedTo(asyncOutputWrapper);
>    } catch (IOException e) {
>      throw new AssertionError("should not happen", e);
>    }
>    try {
>      for (Cell cell : entry.getEdit().getCells()) {
>        cellEncoder.write(cell);
>      }
>    } catch (IOException e) {
>      throw new AssertionError("should not happen", e);
>    }
>    length.addAndGet(output.buffered() - buffered);
>  }
> {code}
> That is to say, {{AsyncFSWAL.getLogFileSizeIfBeingWritten}} may report a file
> length that has not yet been successfully synced to the underlying HDFS, which
> is not as expected.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24667) Rename configs that support atypical DNS set ups to put them in hbase.unsafe

2020-07-02 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17150666#comment-17150666
 ] 

Sean Busbey commented on HBASE-24667:
-

That's a good question. I should start a DISCUSS thread and get an explanation 
in the ref guide.

It's not really a namespace ATM. I think the only config that uses it is 
{{hbase.unsafe.stream.capability.enforce}}, where the name was picked in 
discussion on HBASE-19289 to make it clear that folks who set it are taking on 
more risk.


> Rename configs that support atypical DNS set ups to put them in hbase.unsafe
> 
>
> Key: HBASE-24667
> URL: https://issues.apache.org/jira/browse/HBASE-24667
> Project: HBase
>  Issue Type: Task
>  Components: conf, Operability
>Reporter: Sean Busbey
>Priority: Major
>  Labels: beginner
>
> HBASE-18226 added a config for disabling reverse DNS checks 
> {{hbase.regionserver.hostname.disable.master.reversedns}} and the release 
> note calls out that the config is dangerous:
> {quote}The following config is added by this JIRA:
> hbase.regionserver.hostname.disable.master.reversedns
> This config is for experts: don't set its value unless you really know what 
> you are doing.
>  When set to true, regionserver will use the current node hostname for the 
> servername and HMaster will skip reverse DNS lookup and use the hostname sent 
> by regionserver instead. Note that this config and 
> hbase.regionserver.hostname are mutually exclusive. See 
> https://issues.apache.org/jira/browse/HBASE-18226 for more details.
> Caution: please make sure rolling upgrade succeeds before turning on this 
> feature.
> {quote}
> We should make clear the risks of using this config by making sure the name 
> starts with {{hbase.unsafe}}.
> Rename {{hbase.regionserver.hostname.disable.master.reversedns}} to 
> {{hbase.unsafe.regionserver.hostname.disable.master.reversedns}} but make 
> sure the former is kept with a "deprecated config name" warning.
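
For the "deprecated config name" warning itself, Hadoop's {{Configuration}}
already ships a deprecation mechanism; below is a minimal sketch of how the
rename could keep the old key readable. The {{DnsConfigDeprecation}} holder is a
hypothetical illustration, not the committed patch.

{code:java}
import org.apache.hadoop.conf.Configuration;

// Hypothetical sketch; HBase would register the mapping wherever its
// configuration bootstrap runs, before any Configuration loads resources.
public final class DnsConfigDeprecation {

  private DnsConfigDeprecation() {
  }

  /**
   * After registration, reads of the old key resolve to the new key, and
   * Hadoop logs a deprecation warning the first time the old key is used.
   */
  public static void register() {
    Configuration.addDeprecation(
      "hbase.regionserver.hostname.disable.master.reversedns",
      "hbase.unsafe.regionserver.hostname.disable.master.reversedns");
  }
}
{code}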



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24568) do-release need not wait for tag

2020-07-02 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17150588#comment-17150588
 ] 

Sean Busbey commented on HBASE-24568:
-

We don't need it anymore because we started making the repo from the tag stage 
available for the build stage right?

So if we are just doing the build we would fail with a "can't find the tag" 
instead of waiting? That seems fine.

> do-release need not wait for tag
> 
>
> Key: HBASE-24568
> URL: https://issues.apache.org/jira/browse/HBASE-24568
> Project: HBase
>  Issue Type: Bug
>  Components: build, community
>Reporter: Nick Dimiduk
>Assignee: Nick Dimiduk
>Priority: Major
>
> Making a release failed while waiting for the tag to propagate to GitHub. On
> inspection, it seems the GitHub URL is missing its host information.
> {noformat}
> Waiting up to 30 seconds for tag to propagate to github mirror...
> + sleep 30
> + max_propagation_time=0
> + check_for_tag 2.3.0RC0
> + curl -s --head --fail /releases/tag/2.3.0RC0
> + ((  max_propagation_time <= 0  ))
> + echo 'ERROR: Taking more than 5 minutes to propagate Release Tag 2.3.0RC0 
> to github mirror.'
> ERROR: Taking more than 5 minutes to propagate Release Tag 2.3.0RC0 to github 
> mirror.
> {noformat}
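
The check amounts to polling the tag page with HEAD requests until it resolves.
A minimal sketch of the equivalent logic, assuming the apache/hbase GitHub
mirror URL (the {{TagPropagationCheck}} helper is hypothetical):

{code:java}
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class TagPropagationCheck {

  /** Polls the GitHub tag page until it exists or the deadline passes. */
  static boolean waitForTag(String tag, long maxMillis) throws InterruptedException {
    long deadline = System.currentTimeMillis() + maxMillis;
    while (System.currentTimeMillis() < deadline) {
      try {
        // The host must be part of this URL; the failing run above built the
        // URL without its host portion, leaving curl a bare path to fetch.
        URL url = new URL("https://github.com/apache/hbase/releases/tag/" + tag);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("HEAD");
        if (conn.getResponseCode() == HttpURLConnection.HTTP_OK) {
          return true;
        }
      } catch (IOException e) {
        // Tag not visible yet or a transient network error; retry after a pause.
      }
      Thread.sleep(30_000L);
    }
    return false;
  }
}
{code}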



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24667) Rename configs that support atypical DNS set ups to put them in hbase.unsafe

2020-07-01 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17149686#comment-17149686
 ] 

Sean Busbey commented on HBASE-24667:
-

for similar reasons we should also put {{hbase.regionserver.hostname}} defined 
in HBASE-12954 into {{hbase.unsafe}}

> Rename configs that support atypical DNS set ups to put them in hbase.unsafe
> 
>
> Key: HBASE-24667
> URL: https://issues.apache.org/jira/browse/HBASE-24667
> Project: HBase
>  Issue Type: Task
>  Components: conf, Operability
>Reporter: Sean Busbey
>Priority: Major
>  Labels: beginner
>
> HBASE-18226 added a config for disabling reverse DNS checks 
> {{hbase.regionserver.hostname.disable.master.reversedns}} and the release 
> note calls out that the config is dangerous:
> {quote}The following config is added by this JIRA:
> hbase.regionserver.hostname.disable.master.reversedns
> This config is for experts: don't set its value unless you really know what 
> you are doing.
>  When set to true, regionserver will use the current node hostname for the 
> servername and HMaster will skip reverse DNS lookup and use the hostname sent 
> by regionserver instead. Note that this config and 
> hbase.regionserver.hostname are mutually exclusive. See 
> https://issues.apache.org/jira/browse/HBASE-18226 for more details.
> Caution: please make sure rolling upgrade succeeds before turning on this 
> feature.
> {quote}
> We should make clear the risks of using this config by making sure the name 
> starts with {{hbase.unsafe}}.
> Rename {{hbase.regionserver.hostname.disable.master.reversedns}} to 
> {{hbase.unsafe.regionserver.hostname.disable.master.reversedns}} but make 
> sure the former is kept with a "deprecated config name" warning.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24667) Rename configs that support atypical DNS set ups to put them in hbase.unsafe

2020-07-01 Thread Sean Busbey (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-24667:

Summary: Rename configs that support atypical DNS set ups to put them in 
hbase.unsafe  (was: Rename config that disables reverse DNS to put it in 
hbase.unsafe)

> Rename configs that support atypical DNS set ups to put them in hbase.unsafe
> 
>
> Key: HBASE-24667
> URL: https://issues.apache.org/jira/browse/HBASE-24667
> Project: HBase
>  Issue Type: Task
>  Components: conf, Operability
>Reporter: Sean Busbey
>Priority: Major
>  Labels: beginner
>
> HBASE-18226 added a config for disabling reverse DNS checks 
> {{hbase.regionserver.hostname.disable.master.reversedns}} and the release 
> note calls out that the config is dangerous:
> {quote}The following config is added by this JIRA:
> hbase.regionserver.hostname.disable.master.reversedns
> This config is for experts: don't set its value unless you really know what 
> you are doing.
>  When set to true, regionserver will use the current node hostname for the 
> servername and HMaster will skip reverse DNS lookup and use the hostname sent 
> by regionserver instead. Note that this config and 
> hbase.regionserver.hostname are mutually exclusive. See 
> https://issues.apache.org/jira/browse/HBASE-18226 for more details.
> Caution: please make sure rolling upgrade succeeds before turning on this 
> feature.
> {quote}
> We should make clear the risks of using this config by making sure the name 
> starts with {{hbase.unsafe}}.
> Rename {{hbase.regionserver.hostname.disable.master.reversedns}} to 
> {{hbase.unsafe.regionserver.hostname.disable.master.reversedns}} but make 
> sure the former is kept with a "deprecated config name" warning.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24667) Rename config that disables reverse DNS to put it in hbase.unsafe

2020-07-01 Thread Sean Busbey (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-24667:

Labels: beginner  (was: )

> Rename config that disables reverse DNS to put it in hbase.unsafe
> -
>
> Key: HBASE-24667
> URL: https://issues.apache.org/jira/browse/HBASE-24667
> Project: HBase
>  Issue Type: Task
>  Components: conf, Operability
>Reporter: Sean Busbey
>Priority: Major
>  Labels: beginner
>
> HBASE-18226 added a config for disabling reverse DNS checks 
> {{hbase.regionserver.hostname.disable.master.reversedns}} and the release 
> note calls out that the config is dangerous:
> {quote}The following config is added by this JIRA:
> hbase.regionserver.hostname.disable.master.reversedns
> This config is for experts: don't set its value unless you really know what 
> you are doing.
>  When set to true, regionserver will use the current node hostname for the 
> servername and HMaster will skip reverse DNS lookup and use the hostname sent 
> by regionserver instead. Note that this config and 
> hbase.regionserver.hostname are mutually exclusive. See 
> https://issues.apache.org/jira/browse/HBASE-18226 for more details.
> Caution: please make sure rolling upgrade succeeds before turning on this 
> feature.
> {quote}
> We should make clear the risks of using this config by making sure the name 
> starts with {{hbase.unsafe}}.
> Rename {{hbase.regionserver.hostname.disable.master.reversedns}} to 
> {{hbase.unsafe.regionserver.hostname.disable.master.reversedns}} but make 
> sure the former is kept with a "deprecated config name" warning.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24667) Rename config that disables reverse DNS to put it in hbase.unsafe

2020-07-01 Thread Sean Busbey (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-24667:

Description: 
HBASE-18226 added a config for disabling reverse DNS checks 
{{hbase.regionserver.hostname.disable.master.reversedns}} and the release note 
calls out that the config is dangerous:
{quote}The following config is added by this JIRA:

hbase.regionserver.hostname.disable.master.reversedns

This config is for experts: don't set its value unless you really know what you 
are doing.
 When set to true, regionserver will use the current node hostname for the 
servername and HMaster will skip reverse DNS lookup and use the hostname sent 
by regionserver instead. Note that this config and hbase.regionserver.hostname 
are mutually exclusive. See https://issues.apache.org/jira/browse/HBASE-18226 
for more details.

Caution: please make sure rolling upgrade succeeds before turning on this 
feature.
{quote}
We should make clear the risks of using this config by making sure the name 
starts with {{hbase.unsafe}}.

Rename {{hbase.regionserver.hostname.disable.master.reversedns}} to 
{{hbase.unsafe.regionserver.hostname.disable.master.reversedns}} but make sure 
the former is kept with a "deprecated config name" warning.

  was:
HBASE-18226 added a config for disabling reverse DNS checks 
{{hbase.regionserver.hostname.disable.master.reversedns}} and the release note 
calls out that the config is dangerous:

{quote}

The following config is added by this JIRA:

hbase.regionserver.hostname.disable.master.reversedns

This config is for experts: don't set its value unless you really know what you 
are doing.
When set to true, regionserver will use the current node hostname for the 
servername and HMaster will skip reverse DNS lookup and use the hostname sent 
by regionserver instead. Note that this config and hbase.regionserver.hostname 
are mutually exclusive. See https://issues.apache.org/jira/browse/HBASE-18226 
for more details.

Caution: please make sure rolling upgrade succeeds before turning on this 
feature.

{quote}

We should make clear the risks of using this config by making sure the name 
starts with {{hbase.unsafe}}.

Rename {{hbase.regionserver.hostname.disable.master.reversedns}} to 
{{hbase.unsafe.regionserver.hostname.disable.master.reversedns}} but 
make sure the former is kept with a "deprecated config name" warning.


> Rename config that disables reverse DNS to put it in hbase.unsafe
> -
>
> Key: HBASE-24667
> URL: https://issues.apache.org/jira/browse/HBASE-24667
> Project: HBase
>  Issue Type: Task
>  Components: conf, Operability
>Reporter: Sean Busbey
>Priority: Major
>
> HBASE-18226 added a config for disabling reverse DNS checks 
> {{hbase.regionserver.hostname.disable.master.reversedns}} and the release 
> note calls out that the config is dangerous:
> {quote}The following config is added by this JIRA:
> hbase.regionserver.hostname.disable.master.reversedns
> This config is for experts: don't set its value unless you really know what 
> you are doing.
>  When set to true, regionserver will use the current node hostname for the 
> servername and HMaster will skip reverse DNS lookup and use the hostname sent 
> by regionserver instead. Note that this config and 
> hbase.regionserver.hostname are mutually exclusive. See 
> https://issues.apache.org/jira/browse/HBASE-18226 for more details.
> Caution: please make sure rolling upgrade succeeds before turning on this 
> feature.
> {quote}
> We should make clear the risks of using this config by making sure the name 
> starts with {{hbase.unsafe}}.
> Rename {{hbase.regionserver.hostname.disable.master.reversedns}} to 
> {{hbase.unsafe.regionserver.hostname.disable.master.reversedns}} but make 
> sure the former is kept with a "deprecated config name" warning.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24667) Rename config that disables reverse DNS to put it in hbase.unsafe

2020-07-01 Thread Sean Busbey (Jira)
Sean Busbey created HBASE-24667:
---

 Summary: Rename config that disables reverse DNS to put it in 
hbase.unsafe
 Key: HBASE-24667
 URL: https://issues.apache.org/jira/browse/HBASE-24667
 Project: HBase
  Issue Type: Task
  Components: conf, Operability
Reporter: Sean Busbey


HBASE-18226 added a config for disabling reverse DNS checks 
{{hbase.regionserver.hostname.disable.master.reversedns}} and the release note 
calls out that the config is dangerous:

{quote}

The following config is added by this JIRA:

hbase.regionserver.hostname.disable.master.reversedns

This config is for experts: don't set its value unless you really know what you 
are doing.
When set to true, regionserver will use the current node hostname for the 
servername and HMaster will skip reverse DNS lookup and use the hostname sent 
by regionserver instead. Note that this config and hbase.regionserver.hostname 
are mutually exclusive. See https://issues.apache.org/jira/browse/HBASE-18226 
for more details.

Caution: please make sure rolling upgrade succeeds before turning on this 
feature.

{quote}

We should make clear the risks of using this config by making sure the name 
starts with {{hbase.unsafe}}.

Rename {{hbase.regionserver.hostname.disable.master.reversedns}} to 
{{hbase.unsafe.regionserver.hostname.disable.master.reversedns}} but 
make sure the former is kept with a "deprecated config name" warning.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24639) RequestId Tracing feature for HBase

2020-06-30 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17149112#comment-17149112
 ] 

Sean Busbey commented on HBASE-24639:
-

please help out on HBASE-22120 or provide a scope document about how this fits 
in with that effort.

> RequestId Tracing feature for HBase 
> 
>
> Key: HBASE-24639
> URL: https://issues.apache.org/jira/browse/HBASE-24639
> Project: HBase
>  Issue Type: New Feature
>Reporter: Pranshu Khandelwal
>Assignee: Pranshu Khandelwal
>Priority: Major
>
> Associate a TraceId with each HBase Put request at the mutation level and
> propagate it onward to the RPC layer. This will enable cost-effective logging
> of the request flow and help identify network bottlenecks. We aim to send the
> TraceId to the RPC server by modifying the RequestProto.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

