[jira] [Created] (HBASE-24473) Disable webui access log by default

2020-05-29 Thread Nick Dimiduk (Jira)
Nick Dimiduk created HBASE-24473:


 Summary: Disable webui access log by default
 Key: HBASE-24473
 URL: https://issues.apache.org/jira/browse/HBASE-24473
 Project: HBase
  Issue Type: Sub-task
Reporter: Nick Dimiduk


HBASE-24310 introduced an access log for http requests, which logs through the 
standard logging facility. This adds a lot of noise to the master and region 
server logs. Let's disable it by default.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Separate web access logs

2020-05-29 Thread Nick Dimiduk
Sorry Duo, I missed your reply.

Yes, I think that's fine with me. We can change the default log4j to
silence this logger by default and include commented guidance for enabling
it.
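
A log4j fragment along the lines Nick describes might look like this (a sketch only; the `http.requests.regionserver` logger name comes from HBASE-24366, while the master-side logger name is an assumption by analogy):

```properties
# Silence the WebUI access log by default.
log4j.logger.http.requests.regionserver=OFF
log4j.logger.http.requests.master=OFF
# To enable access logging, comment out the lines above, or set the
# level to INFO instead:
# log4j.logger.http.requests.regionserver=INFO
```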

On Wed, May 13, 2020 at 7:08 PM 张铎(Duo Zhang)  wrote:

> So for now the default config just disables the access log?
>
> Not sure if it is useful to output these access logs; personally I do not
> care much about access to the status pages...
>
> Nick Dimiduk wrote on Thu, May 14, 2020 at 6:20 AM:
>
> > Having read through a region server log with this feature enabled,
> looking
> > to diagnose another issue, I'm going to change my tone. I think having
> > these messages in the standard server log is too noisy and makes them even
> > harder to read. In my case, it looks like automation is pinging the /jmx
> > endpoint on a regular basis, and we're seeing those interactions here.
> >
> > So, I propose we change the default configuration to ship these logs to a
> > dedicated access.log.
> >
> > On Wed, May 13, 2020 at 2:41 PM Nick Dimiduk 
> wrote:
> >
> > > Heya,
> > >
> > > Looks like since HBASE-24310 we have nicely formatted logs of access to
> > > our WebUI endpoints, following a standard web server log format. I
> think
> > > it'll be a common requirement for environments to process these logs
> > > following standard processes for web servers, so I think we should
> > document
> > > how to separate them out to their own dedicated file, thus HBASE-24366.
> > > However, I'm wondering if we should make the separate file as the
> default
> > > configuration. Are there different answers to this question for 2.x vs.
> > 3.0
> > > releases?
> > >
> > > Maybe you're a lurking operator who has opinions about this? Please
> speak
> > > up! :D
> > >
> > > Thanks,
> > > Nick
> > >
> >
>


[jira] [Resolved] (HBASE-24132) Upgrade to Apache ZooKeeper 3.5.7

2020-05-29 Thread Nick Dimiduk (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk resolved HBASE-24132.
--
Resolution: Fixed

> Upgrade to Apache ZooKeeper 3.5.7
> -
>
> Key: HBASE-24132
> URL: https://issues.apache.org/jira/browse/HBASE-24132
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: Jianfei Jiang
>Assignee: Jianfei Jiang
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.0
>
>
> Apache ZooKeeper 3.5.7 has been released, and HDFS and other projects have 
> updated their ZooKeeper dependency. Perhaps HBase should update as well. 
> Some of the interfaces have changed in this ZooKeeper version.





[DISCUSS] Doing away with the pluggable Normalizer implementation

2020-05-28 Thread Nick Dimiduk
Heya,

We're working through a simplifying cleanup of the master code around the
Region Normalizer feature, HBASE-24418, and one of the suggestions proposed
was to make the implementation no longer pluggable. If the implementation
we provide does what people need, maybe we can simplify our lives a bit and
get rid of the reflection code.

So I'm curious: is anyone using their own implementation, or planning to?

Thanks,
Nick


[jira] [Created] (HBASE-24465) Normalizer should consider region max file size when planning merges

2020-05-28 Thread Nick Dimiduk (Jira)
Nick Dimiduk created HBASE-24465:


 Summary: Normalizer should consider region max file size when 
planning merges
 Key: HBASE-24465
 URL: https://issues.apache.org/jira/browse/HBASE-24465
 Project: HBase
  Issue Type: Improvement
  Components: master
Reporter: Nick Dimiduk


When the normalizer plans its actions, it does not consider 
{{hbase.hregion.max.filesize}}. This means it could get into a merge/split loop 
in collaboration with the region server: the normalizer sees that two regions are 
smaller than the table average, so it merges them. The resulting region is larger 
than this max file size, so the region server splits it. Repeat.
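
The guard described here can be sketched as follows (illustrative pseudologic only, not HBase's actual normalizer internals; sizes are in MB and names are hypothetical):

```python
# Skip any merge whose combined size would exceed the max file size,
# since the region server would immediately split the merged region again.
def plan_merges(region_sizes_mb, avg_mb, max_filesize_mb):
    plans = []
    i = 0
    while i + 1 < len(region_sizes_mb):
        a, b = region_sizes_mb[i], region_sizes_mb[i + 1]
        # Candidate: two adjacent regions both below the table average.
        if a < avg_mb and b < avg_mb:
            if a + b <= max_filesize_mb:  # the missing check from this issue
                plans.append((i, i + 1))
                i += 2
                continue
        i += 1
    return plans

# Two 6 GB regions under a 10 GB max would merge to 12 GB and be re-split;
# with the check, no plan is emitted.
print(plan_merges([6144, 6144], avg_mb=10240, max_filesize_mb=10240))  # []
print(plan_merges([1024, 2048], avg_mb=10240, max_filesize_mb=10240))  # [(0, 1)]
```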





[jira] [Created] (HBASE-24464) Normalizer should have configuration for minimum region size

2020-05-28 Thread Nick Dimiduk (Jira)
Nick Dimiduk created HBASE-24464:


 Summary: Normalizer should have configuration for minimum region 
size
 Key: HBASE-24464
 URL: https://issues.apache.org/jira/browse/HBASE-24464
 Project: HBase
  Issue Type: Improvement
  Components: master
Reporter: Nick Dimiduk


Another issue raised in PR review on HBASE-24418. With the current 
implementation, the lower bound on how small the normalizer will make a region 
is its lowest granularity of size measurement: 1 MB. That means it will happily 
balance a table out to be 1000 * 1 MB regions (probably not likely, but seems 
plausible). The proposal was to add a configuration knob that specifies the lower 
bound on region size as a guard against excessive splitting, something of an 
analogue to {{hbase.hregion.max.filesize}}.
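
The proposed guard rail might behave like this (a sketch; the `min_region_mb` knob is hypothetical, and the "split when larger than twice the average" condition stands in for the normalizer's split heuristic):

```python
# Only emit a split plan when each resulting half would still be at least
# the configured minimum region size.
def should_split(region_mb, avg_mb, min_region_mb):
    too_big = region_mb > 2 * avg_mb          # stand-in split heuristic
    halves_ok = region_mb / 2 >= min_region_mb  # proposed lower-bound guard
    return too_big and halves_ok

# A 4 MB region over a 1 MB average is "too big" but would split into
# halves below a 256 MB floor, so the guard suppresses the split.
print(should_split(region_mb=4, avg_mb=1, min_region_mb=256))        # False
print(should_split(region_mb=4096, avg_mb=1024, min_region_mb=256))  # True
```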

I'm not convinced this is really needed; I think the table would need to have 
been crafted intentionally to trigger this behavior. However, it seems like a 
reasonable guard rail to install. Discuss.





[jira] [Created] (HBASE-24463) Allow operator to limit total normalization work per invocation

2020-05-28 Thread Nick Dimiduk (Jira)
Nick Dimiduk created HBASE-24463:


 Summary: Allow operator to limit total normalization work per 
invocation
 Key: HBASE-24463
 URL: https://issues.apache.org/jira/browse/HBASE-24463
 Project: HBase
  Issue Type: Improvement
  Components: master
Reporter: Nick Dimiduk


During review on HBASE-24418, we observed that there's no way for an operator 
to limit the total amount of work a normalizer invocation will do. One 
suggestion was allowing settings for maximum number of plan executions per 
table or maximum number of plan executions per run. I've seen other systems 
limit the amount of CPU time permitted for a given run.

At least we have a run lock that prevents multiple invocations from running 
concurrently, which should keep one run from starting before the previous 
one has finished.
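
The suggested limits could be applied as a simple filter over the generated plans (a sketch; the `max_total` and `max_per_table` knobs are illustrative names, not actual HBase configuration keys):

```python
# Cap the number of plan executions per normalizer run, overall and per table.
def limit_plans(plans, max_total=None, max_per_table=None):
    out, per_table = [], {}
    for table, plan in plans:
        if max_total is not None and len(out) >= max_total:
            break  # global budget for this invocation exhausted
        n = per_table.get(table, 0)
        if max_per_table is not None and n >= max_per_table:
            continue  # this table has used its budget; skip its extra plans
        per_table[table] = n + 1
        out.append((table, plan))
    return out

plans = [("t1", "m1"), ("t1", "m2"), ("t2", "s1"), ("t1", "m3")]
print(limit_plans(plans, max_total=3, max_per_table=2))
# [('t1', 'm1'), ('t1', 'm2'), ('t2', 's1')]
```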





[jira] [Created] (HBASE-24460) Expose HBCK Chore Reports via metrics

2020-05-28 Thread Nick Dimiduk (Jira)
Nick Dimiduk created HBASE-24460:


 Summary: Expose HBCK Chore Reports via metrics
 Key: HBASE-24460
 URL: https://issues.apache.org/jira/browse/HBASE-24460
 Project: HBase
  Issue Type: Improvement
  Components: master, metrics
Reporter: Nick Dimiduk


I was talking with an operator about their experience upgrading to HBase 2. They 
automate as much as they can, and like having access to so much system state 
via the {{/json}} endpoint. One of the things they miss on that endpoint is access 
to hbck reports. Now that these reports are managed by the master in HBase 2, we 
can represent them in JMX and make them available as metrics as well.





[jira] [Created] (HBASE-24419) Normalizer merge plans should account for more than 2 regions when possible

2020-05-22 Thread Nick Dimiduk (Jira)
Nick Dimiduk created HBASE-24419:


 Summary: Normalizer merge plans should account for more than 2 regions 
when possible
 Key: HBASE-24419
 URL: https://issues.apache.org/jira/browse/HBASE-24419
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 3.0.0-alpha-1, 2.3.0
Reporter: Nick Dimiduk


The merge plans produced by the normalizer operate over two regions. Our merge 
operation supports multiple regions in a single request. When there are 
multiple merge plans generated over contiguous region space, these should be 
collapsed into a single merge operation. This should automatically honor 
whatever configuration settings exist limiting the number of regions 
that can participate in a merge procedure.
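
The collapsing step might look like this (a sketch over abstract region indexes; `max_regions_per_merge` stands in for whatever limit the merge procedure already enforces):

```python
# Collapse adjacent two-region merge plans over contiguous region space into
# a single multi-region merge, capped at a maximum regions-per-merge.
def collapse(plans, max_regions_per_merge):
    """plans: list of (start_idx, end_idx) pairs over contiguous regions."""
    merged = []
    for start, end in sorted(plans):
        if merged and start <= merged[-1][1] + 1:
            last_start, last_end = merged[-1]
            # Extend the previous merge only if the span stays under the cap.
            if (max(end, last_end) - last_start + 1) <= max_regions_per_merge:
                merged[-1] = (last_start, max(end, last_end))
                continue
        merged.append((start, end))
    return merged

# Three adjacent 2-region plans collapse into one 6-region merge...
print(collapse([(0, 1), (2, 3), (4, 5)], max_regions_per_merge=6))  # [(0, 5)]
# ...but respect a smaller cap.
print(collapse([(0, 1), (2, 3), (4, 5)], max_regions_per_merge=4))  # [(0, 3), (4, 5)]
```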





[jira] [Created] (HBASE-24418) Consolidate Normalizer implementations

2020-05-22 Thread Nick Dimiduk (Jira)
Nick Dimiduk created HBASE-24418:


 Summary: Consolidate Normalizer implementations
 Key: HBASE-24418
 URL: https://issues.apache.org/jira/browse/HBASE-24418
 Project: HBase
  Issue Type: Task
  Components: master
Affects Versions: 3.0.0-alpha-1, 2.3.0
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk


After HBASE-22285, we have two implementations of {{RegionNormalizer}} that 
have different feature sets and different configurations. I think these can be 
combined into a single implementation, with clear, decoupled configuration 
parameters. At least on branch-2.3, there are too many subsequent changes for 
HBASE-22285 to revert cleanly, so I'll use this ticket to consolidate the 
implementations.

If you have issues with the current normalizer, speak up here and we can 
include them, or add them as sub-tasks.





Re: branch-2.2, branch-2.3 and branch-2's nightly job failed because not finish

2020-05-21 Thread Nick Dimiduk
Based on what I saw for branch-2.3 on builds 99, 100, I filed
https://issues.apache.org/jira/browse/INFRA-20296. I haven’t stopped to
look at the others.

On Thu, May 21, 2020 at 19:03 Guanghao Zhang  wrote:

> The console-report html is empty and the patch-unit-root.txt is incomplete.
> Seems the UTs did not finish.
>
>
> https://builds.apache.org/job/HBase%20Nightly/job/branch-2/2671/artifact/output-jdk8-hadoop2/
>
> https://builds.apache.org/job/HBase%20Nightly/job/branch-2.3/100/artifact/output-jdk8-hadoop2/
>
> https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/873/artifact/output-jdk8-hadoop2/
>


Re: [DISCUSS] Delete old branches

2020-05-20 Thread Nick Dimiduk
Why remove old/unused branches? To keep our garden tidy. They’re
distracting at best, confusing at worst. For old release line branches,
it’s not clear to a casual committer which branches need to receive a
backport. It’s clear if the EOL branches are gone.

On Wed, May 20, 2020 at 18:06 Guanghao Zhang  wrote:

> +1 for removing feature branches and starting a case-by-case discussion for
> "others".
>
> And for branches of old release lines, what's the harm in keeping them? I
> think we don't need to remove them.
>
> Thanks.
>
> 张铎 (Duo Zhang) wrote on Thu, May 21, 2020 at 8:14 AM:
>
> > What is the benefit?
> >
> > Nick Dimiduk wrote on Thu, May 21, 2020 at 7:31 AM:
> >
> > > Heya,
> > >
> > > We have lots of branches hanging around in git. These appear to be
> > > 1. branches for old release lines (e.g., 0.90),
> > > 2. feature branches (that are potentially stale, e.g., HBASE-11288),
> > > 3. "other" (e.g., 0.89-fb, former_0.20, revert-1633-HBASE-24221).
> > >
> > > Can we decide it's okay to delete some of these?
> > >
> > > For (1), all of our release tags, going back to 0.1, are preserved.
> > There's
> > > no benefit to keeping these.
> > >
> > > For (2), I think there's no discussion required, just someone to go
> check
> > > each Jira ID, and delete any that are closed, maybe with a comment on
> the
> > > Jira first. Maybe this could be automated?
> > >
> > > For (3), I suppose we need a case-by-case discussion? Maybe there are
> > > categories of these that can be resolved in blocks.
> > >
> > > Thanks,
> > > Nick
> > >
> >
>


[DISCUSS] Delete old branches

2020-05-20 Thread Nick Dimiduk
Heya,

We have lots of branches hanging around in git. These appear to be
1. branches for old release lines (e.g., 0.90),
2. feature branches (that are potentially stale, e.g., HBASE-11288),
3. "other" (e.g., 0.89-fb, former_0.20, revert-1633-HBASE-24221).

Can we decide it's okay to delete some of these?

For (1), all of our release tags, going back to 0.1, are preserved. There's
no benefit to keeping these.

For (2), I think there's no discussion required, just someone to go check
each Jira ID, and delete any that are closed, maybe with a comment on the
Jira first. Maybe this could be automated?

For (3), I suppose we need a case-by-case discussion? Maybe there are
categories of these that can be resolved in blocks.
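
The automation floated for (2) could be as simple as a dry run that maps branch names to Jira IDs and emits deletion commands for closed issues (a sketch; the branch names and the closed-issue set are illustrative, and a real version would query the Jira REST API):

```python
import re

# Emit `git push` deletion commands for feature branches whose Jira issue
# is closed. Dry run: commands are printed, not executed.
def deletion_commands(branches, closed_issues):
    cmds = []
    for branch in branches:
        m = re.match(r"HBASE-\d+", branch)
        if m and m.group(0) in closed_issues:
            cmds.append(f"git push origin --delete {branch}")
    return cmds

branches = ["HBASE-11288", "HBASE-99999", "branch-2.3"]
for cmd in deletion_commands(branches, closed_issues={"HBASE-11288"}):
    print(cmd)  # git push origin --delete HBASE-11288
```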

Thanks,
Nick


Re: Nightly builds are not running on branch-2.1

2020-05-20 Thread Nick Dimiduk
Can we delete the old branch as well?

On Wed, May 20, 2020 at 07:57 Sean Busbey  wrote:

> I updated the header. might take a bit for it to sync up.
>
> On Wed, May 20, 2020 at 9:55 AM Sean Busbey  wrote:
>
> > hurm. looks like we missed some steps on cleaning up branch-2.1. the EOL
> > isn't listed on our header for downloads.apache.org.
> >
> > we sent a notice to user@hbase in March when 2.1.10 was coming out:
> > https://s.apache.org/om9lb
> >
> > On Wed, May 20, 2020 at 3:28 AM ramkrishna vasudevan <
> > ramkrishna.s.vasude...@gmail.com> wrote:
> >
> >> Oh my bad. I did not realize that. Thanks for letting me know.
> >>
> >> On Wed, May 20, 2020, 1:54 PM Peter Somogyi 
> wrote:
> >>
> >> > The branch-2.1 reached End of Life so the builds got disabled. The
> last
> >> > release from branch-2.1 is 2.1.10.
> >> >
> >> > On Wed, May 20, 2020 at 9:48 AM ramkrishna vasudevan <
> >> > ramkrishna.s.vasude...@gmail.com> wrote:
> >> >
> >> > > Hi All
> >> > >
> >> > > After some recent check-ins for branch-2.1, while waiting for the
> >> > > nightly builds, I observed that the builds have not been running for
> >> > > the last 10 days.
> >> > >
> >> https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/lastBuild/
> >> > >
> >> > > Recently there was a compilation issue caused by HBASE-24186, which I
> >> > > reverted a couple of days ago.
> >> > >
> >> > > Is there a way we can trigger the nightly build once again? Or is it
> >> > > some other known issue, as in branch-2.2 (where a recent discussion
> >> > > happened)?
> >> > >
> >> > > Regards
> >> > > Ram
> >> > >
> >> >
> >>
> >
>


[jira] [Resolved] (HBASE-24361) Make `RESTApiClusterManager` more resilient

2020-05-19 Thread Nick Dimiduk (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk resolved HBASE-24361.
--
Fix Version/s: 2.3.0
   3.0.0-alpha-1
   Resolution: Fixed

> Make `RESTApiClusterManager` more resilient
> ---
>
> Key: HBASE-24361
> URL: https://issues.apache.org/jira/browse/HBASE-24361
> Project: HBase
>  Issue Type: Test
>  Components: integration tests
>Affects Versions: 2.3.0
>Reporter: Nick Dimiduk
>    Assignee: Nick Dimiduk
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.0
>
>
> The Cloudera Manager API client in {{RESTApiClusterManager}} appears to 
> assume that API calls sent to CM for process commands block on command 
> completion. However, these commands are "asynchronous," queuing work in the 
> background for execution. Update the client to track command submission and 
> block on completion of that commandId. This allows this {{ClusterManager}} to 
> conform to the expectations of the {{Actions}} that invoke it.
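
The submit-then-poll pattern the fix describes can be sketched generically (the API shape here is hypothetical, not Cloudera Manager's actual REST interface):

```python
import time

# Treat the management API as asynchronous: capture the command id returned
# by submission, then poll that command's status until it completes.
def run_command(submit, get_status, timeout_s=300, interval_s=0.01):
    command_id = submit()  # returns immediately with an id
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = get_status(command_id)
        if status in ("SUCCEEDED", "FAILED"):
            return status
        time.sleep(interval_s)  # command still queued or running
    raise TimeoutError(f"command {command_id} did not finish")

# Toy backend: the command "finishes" on the third status poll.
polls = {"n": 0}
def fake_status(cmd_id):
    polls["n"] += 1
    return "SUCCEEDED" if polls["n"] >= 3 else "RUNNING"

print(run_command(lambda: 42, fake_status))  # SUCCEEDED
```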





[jira] [Resolved] (HBASE-24360) RollingBatchRestartRsAction loses track of dead servers

2020-05-18 Thread Nick Dimiduk (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk resolved HBASE-24360.
--
Fix Version/s: 2.3.0
   3.0.0-alpha-1
   Resolution: Fixed

> RollingBatchRestartRsAction loses track of dead servers
> ---
>
> Key: HBASE-24360
> URL: https://issues.apache.org/jira/browse/HBASE-24360
> Project: HBase
>  Issue Type: Test
>  Components: integration tests
>Affects Versions: 2.3.0
>Reporter: Nick Dimiduk
>    Assignee: Nick Dimiduk
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.0
>
>
> {{RollingBatchRestartRsAction}} doesn't handle failure cases when tracking 
> its list of dead servers. The original author believed that a failure to 
> restart would result in a retry. However, by removing the dead server from 
> the failed list prematurely, that state is lost, and retry of that server 
> never occurs. Because this action doesn't ever look back to the current state 
> of the cluster, relying only on its local state for the current action 
> invocation, it never realizes the abandoned server is still dead. Instead, be 
> more careful to only remove the dead server from the list when the 
> {{startRs}} invocation claims to have been successful.
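
The fix described above boils down to this retry discipline (a sketch of the bookkeeping, not the actual chaos-monkey code; `start_rs` stands in for the action's restart call):

```python
# Drop a server from the dead list only after a start attempt reports
# success, so failed restarts remain tracked and are retried.
def restart_dead_servers(dead_servers, start_rs, max_attempts=5):
    for _ in range(max_attempts):
        if not dead_servers:
            break
        server = dead_servers[0]
        if start_rs(server):
            dead_servers.pop(0)  # remove only on confirmed success
        # on failure the server stays listed and is retried next pass
    return dead_servers

# A server whose restart fails twice is still retried and recovered,
# instead of being abandoned as in the original bug.
attempts = {"s1": 0}
def flaky_start(s):
    attempts[s] += 1
    return attempts[s] >= 3

print(restart_dead_servers(["s1"], flaky_start))  # []
```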





[jira] [Resolved] (HBASE-23969) Meta browser should show all `info` columns

2020-05-18 Thread Nick Dimiduk (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk resolved HBASE-23969.
--
Resolution: Fixed

Thanks for the fix [~liuml07]!

> Meta browser should show all `info` columns
> ---
>
> Key: HBASE-23969
> URL: https://issues.apache.org/jira/browse/HBASE-23969
> Project: HBase
>  Issue Type: Improvement
>  Components: master, UI
>Affects Versions: 3.0.0-alpha-1, 2.3.0
>Reporter: Nick Dimiduk
>Assignee: Mingliang Liu
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.3.0
>
> Attachments: Screen Shot 2020-04-10 at 4.02.50 AM.png, Screen Shot 
> 2020-04-11 at 3.27.57 AM.png, Screen Shot 2020-04-17 at 7.07.06 PM.png, 
> Screen Shot 2020-04-21 at 10.16.58 PM.png
>
>
> The Meta table browser lists region states. There are other {{info}} columns 
> in the table, which should be displayed. Looking through {{HConstants}}, it 
> seems we need to add the following:
>  * {{server}}
>  * {{sn}}
>  * {{splitA}}
>  * {{splitB}}
>  * {{merge}}
>  * {{mergeA}}
>  * {{mergeB}}
> Are there others?





Re: [DISCUSS] New User Experience and Data Durability Guarantees on LocalFileSystem (HBASE-24086)

2020-05-14 Thread Nick Dimiduk
HBASE-24086 and HBASE-24106 have been reverted, HBASE-24271 has been
applied. Thanks for the fruitful discussion.

-n

On Fri, Apr 17, 2020 at 4:52 PM Nick Dimiduk  wrote:

> On Fri, Apr 17, 2020 at 3:31 PM Stack  wrote:
>
>> On writing to the local 'tmp' dir, that's fine, but quickstart was always
>> supposed to be a transient install (one example of setting config is
>> the setting of the tmp location). The messaging that this is the case needs
>> an edit after a re-read (I volunteer to do this and to give the refguide a
>> once-over on lack of guarantees when an HBase deploy is unconfigured).
>>
>
> Sounds like you're reading my handiwork, pushed in HBASE-24106. I'm
> definitely open to editing help, yes please! Before that change, the Quick
> Start section required the user to set hbase.rootdir,
> hbase.zookeeper.property.dataDir, and
> hbase.unsafe.stream.capability.enforce, all before they could start the
> local process.
>
> Can we have the start-out-of-the-box back please? It's a PITA having to go
>> edit config when running a local build to test something, never mind the
>> poor noob whose first experience is a fail.
>>
>
> I agree.
>
> The conclusion I understand from this thread looks something like this:
>
> 1. revert HBASE-24086, make it so that running on `LocalFileSystem` is a
> fatal condition with default configs.
> 2. ship a conf/hbase-site.xml that contains
> hbase.unsafe.stream.capability.enforce=false, along with a big comment
> saying this is not safe for production.
> 3. ship a conf/hbase-site.xml that contains hbase.tmp.dir=./tmp, along
> with a comment saying herein you'll find temporary and persistent data,
> reconfigure for production with hbase.rootdir pointed to a durable
> filesystem that supports our required stream capabilities (see above).
> 4. update HBASE-24106 as appropriate.
>
> Neither 2 nor 3 are suitable for production deployments, thus the changes
> do not go into hbase-default.xml. Anyone standing up a production deploy
> must edit hbase-site.xml anyway, so this doesn't change anything. It also
> restores our "simple" first-time user experience of not needing to run
> anything besides `bin/start-hbase.sh` (or `bin/hbase master start`, or
> whatever it is we're telling people these days).
>
> We can reassess this once more when a durable equivalent to
> LocalFileSystem comes along.
>
> Thanks,
> Nick
>
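
The plan in points 2 and 3 would amount to shipping something like the following in `conf/hbase-site.xml` (a sketch; the comments paraphrase the thread):

```xml
<configuration>
  <!-- NOT safe for production: disables WAL stream capability checks so
       HBase can run on LocalFileSystem out of the box. -->
  <property>
    <name>hbase.unsafe.stream.capability.enforce</name>
    <value>false</value>
  </property>
  <!-- Temporary AND persistent data live here; for production, point
       hbase.rootdir at a durable filesystem instead. -->
  <property>
    <name>hbase.tmp.dir</name>
    <value>./tmp</value>
  </property>
</configuration>
```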


[jira] [Resolved] (HBASE-24271) Set values in `conf/hbase-site.xml` that enable running on `LocalFileSystem` out of the box

2020-05-14 Thread Nick Dimiduk (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk resolved HBASE-24271.
--
Release Note: 

HBASE-24271 changes the default `conf/hbase-site.xml` such that `bin/hbase` 
will run directly out of the binary tarball or a compiled source tree, against 
Hadoop 2.8+, without any configuration modifications. This changes our 
long-standing history of shipping no configured values in 
`conf/hbase-site.xml`, so existing processes that assume this file is empty of 
configuration properties may require attention.
  Resolution: Fixed

> Set values in `conf/hbase-site.xml` that enable running on `LocalFileSystem` 
> out of the box
> ---
>
> Key: HBASE-24271
> URL: https://issues.apache.org/jira/browse/HBASE-24271
> Project: HBase
>  Issue Type: Task
>Affects Versions: 3.0.0-alpha-1, 2.3.0, 1.7.0, 2.2.4
>Reporter: Nick Dimiduk
>    Assignee: Nick Dimiduk
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.0, 1.7.0, 2.2.5
>
>
> This ticket is to implement the changes as described on the [discussion on 
> dev|https://lists.apache.org/thread.html/r089de243a9bc9d923fa07c81e6bc825b82be68f567b892a342a0c61f%40%3Cdev.hbase.apache.org%3E].
>  It reverts and supersedes changes made on HBASE-24086 and HBASE-24106.
> {quote}
> The conclusion I understand from this thread looks something like this:
> 1. revert HBASE-24086, make it so that running on `LocalFileSystem` is a 
> fatal condition with default configs.
> 2. ship a conf/hbase-site.xml that contains 
> hbase.unsafe.stream.capability.enforce=false, along with a big comment saying 
> this is not safe for production.
> 3. ship a conf/hbase-site.xml that contains hbase.tmp.dir=./tmp, along with a 
> comment saying herein you'll find temporary and persistent data, reconfigure 
> for production with hbase.rootdir pointed to a durable filesystem that 
> supports our required stream capabilities (see above).
> 4. update HBASE-24106 as appropriate.
> Neither 2 nor 3 are suitable for production deployments, thus the changes do 
> not go into hbase-default.xml. Anyone standing up a production deploy must 
> edit hbase-site.xml anyway, so this doesn't change anything. It also restores 
> our "simple" first-time user experience of not needing to run anything 
> besides `bin/start-hbase.sh` (or `bin/hbase master start`, or whatever it is 
> we're telling people these days).
> We can reassess this once more when a durable equivalent to LocalFileSystem 
> comes along.
> {quote}





[jira] [Resolved] (HBASE-24086) Disable output stream capability enforcement when running in standalone mode

2020-05-14 Thread Nick Dimiduk (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk resolved HBASE-24086.
--
Fix Version/s: (was: 2.2.5)
   (was: 1.7.0)
   (was: 2.3.0)
   (was: 3.0.0-alpha-1)
 Release Note:   (was: In the presence of an instance of `LocalFileSystem` 
used for a WAL, HBase will degrade to NOT enforcing unsafe stream capabilities. 
A warning log message is generated each time this occurs.)
 Assignee: (was: Nick Dimiduk)
   Resolution: Won't Fix

Reverted from all branches. Superseded by HBASE-24271.

> Disable output stream capability enforcement when running in standalone mode
> 
>
> Key: HBASE-24086
> URL: https://issues.apache.org/jira/browse/HBASE-24086
> Project: HBase
>  Issue Type: Task
>  Components: master, Operability
>Affects Versions: 3.0.0-alpha-1, 2.3.0
>Reporter: Nick Dimiduk
>Priority: Critical
>
> {noformat}
> $ JAVA_HOME=/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home mvn clean install -DskipTests
> $ JAVA_HOME=/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home ./bin/hbase master start
> {noformat}
> gives
> {noformat}
> 2020-03-30 17:12:43,857 ERROR [master/192.168.111.13:16000:becomeActiveMaster] master.HMaster: Failed to become active master
> java.io.IOException: cannot get log writer
>   at org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createAsyncWriter(AsyncFSWALProvider.java:118)
>   at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createAsyncWriter(AsyncFSWAL.java:704)
>   at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createWriterInstance(AsyncFSWAL.java:710)
>   at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createWriterInstance(AsyncFSWAL.java:128)
>   at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:839)
>   at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:549)
>   at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.init(AbstractFSWAL.java:490)
>   at org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:156)
>   at org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:61)
>   at org.apache.hadoop.hbase.wal.WALFactory.getWAL(WALFactory.java:297)
>   at org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.createWAL(RegionProcedureStore.java:256)
>   at org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.bootstrap(RegionProcedureStore.java:273)
>   at org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.recoverLease(RegionProcedureStore.java:482)
>   at org.apache.hadoop.hbase.procedur

[jira] [Resolved] (HBASE-24106) Update getting started documentation after HBASE-24086

2020-05-14 Thread Nick Dimiduk (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk resolved HBASE-24106.
--
Resolution: Won't Fix

Original commit reverted, superseded by HBASE-24271.

> Update getting started documentation after HBASE-24086
> --
>
> Key: HBASE-24106
> URL: https://issues.apache.org/jira/browse/HBASE-24106
> Project: HBase
>  Issue Type: Sub-task
>  Components: documentation
>    Reporter: Nick Dimiduk
>Priority: Major
>
> HBASE-24086 allows HBase to degrade gracefully to running on a 
> {{LocalFileSystem}} without further user configuration. Update the docs 
> accordingly.





[jira] [Resolved] (HBASE-24366) Document how to move WebUI access log entries to a separate log file

2020-05-14 Thread Nick Dimiduk (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk resolved HBASE-24366.
--
Resolution: Duplicate

Indeed. Thanks [~zhangduo].

> Document how to move WebUI access log entries to a separate log file
> 
>
> Key: HBASE-24366
> URL: https://issues.apache.org/jira/browse/HBASE-24366
> Project: HBase
>  Issue Type: Task
>  Components: master, regionserver
>Affects Versions: 2.3.0
>Reporter: Nick Dimiduk
>Priority: Major
>
> I've noticed that after a recent commit, we now have webui access log lines 
> going into our service log file. The log entries are going to a logger called 
> {{http.requests.regionserver}}, and after the preamble of timestamp, log 
> level, logger, they appear to be conformant to the 
> [CLF|https://en.wikipedia.org/wiki/Common_Log_Format] specification. Tools 
> designed for parsing http logs usually expect to have just the CLF entries, 
> and not need preprocessing.
> We should document how to configure the service to log these entries into a 
> separate log file.





[jira] [Created] (HBASE-24367) ScheduledChore log elapsed timespan in a human-friendly format

2020-05-13 Thread Nick Dimiduk (Jira)
Nick Dimiduk created HBASE-24367:


 Summary: ScheduledChore log elapsed timespan in a human-friendly 
format
 Key: HBASE-24367
 URL: https://issues.apache.org/jira/browse/HBASE-24367
 Project: HBase
  Issue Type: Task
  Components: master, regionserver
Affects Versions: 2.3.0
Reporter: Nick Dimiduk


I noticed this in a log line,

{noformat}
2020-04-23 18:31:14,183 INFO org.apache.hadoop.hbase.ScheduledChore: host-a.example.com,16000,1587577999888-ClusterStatusChore average execution time: 68488258 ns.
{noformat}

I'm not sure if there's a case when elapsed time in nanoseconds is meaningful 
for these background chores, but we could do a little work before printing the 
number and time unit to truncate precision down to something a little more 
intuitive for operators. This number purports to be an average, so a high level 
of precision isn't necessarily meaningful.
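
The suggested truncation could be as simple as picking the largest unit that fits (a sketch of the formatting logic, not HBase's actual utility code):

```python
# Render an elapsed-nanoseconds value in the largest human-friendly unit.
UNITS = [("s", 1_000_000_000), ("ms", 1_000_000), ("us", 1_000), ("ns", 1)]

def human_time(ns):
    for suffix, scale in UNITS:
        if ns >= scale:
            return f"{ns / scale:.2f} {suffix}"
    return "0 ns"

# The 68488258 ns from the log line above becomes:
print(human_time(68488258))  # 68.49 ms
```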

Separately, or while we're here, if we think an operator really cares about the 
performance of this chore, we should print a histogram of elapsed times, rather 
than an opaque average.





Re: [DISCUSS] Separate web access logs

2020-05-13 Thread Nick Dimiduk
Having read through a region server log with this feature enabled, looking
to diagnose another issue, I'm going to change my tone. I think having
these messages in the standard server log is too noisy and makes them even
harder to read. In my case, it looks like automation is pinging the /jmx
endpoint on a regular basis, and we're seeing those interactions here.

So, I propose we change the default configuration to ship these logs to a
dedicated access.log.

On Wed, May 13, 2020 at 2:41 PM Nick Dimiduk  wrote:

> Heya,
>
> Looks like since HBASE-24310 we have nicely formatted logs of access to
> our WebUI endpoints, following a standard web server log format. I think
> it'll be a common requirement for environments to process these logs
> following standard processes for web servers, so I think we should document
> how to separate them out to their own dedicated file, thus HBASE-24366.
> However, I'm wondering if we should make the separate file as the default
> configuration. Are there different answers to this question for 2.x vs. 3.0
> releases?
>
> Maybe you're a lurking operator who has opinions about this? Please speak
> up! :D
>
> Thanks,
> Nick
>


[DISCUSS] Separate web access logs

2020-05-13 Thread Nick Dimiduk
Heya,

Looks like since HBASE-24310 we have nicely formatted logs of access to our
WebUI endpoints, following a standard web server log format. I think it'll
be a common requirement for environments to process these logs following
standard processes for web servers, so I think we should document how to
separate them out to their own dedicated file, thus HBASE-24366. However,
I'm wondering if we should make the separate file as the default
configuration. Are there different answers to this question for 2.x vs. 3.0
releases?

Maybe you're a lurking operator who has opinions about this? Please speak
up! :D

Thanks,
Nick


[jira] [Created] (HBASE-24366) Document how to move WebUI access log entries to a separate log file

2020-05-13 Thread Nick Dimiduk (Jira)
Nick Dimiduk created HBASE-24366:


 Summary: Document how to move WebUI access log entries to a 
separate log file
 Key: HBASE-24366
 URL: https://issues.apache.org/jira/browse/HBASE-24366
 Project: HBase
  Issue Type: Task
  Components: master, regionserver
Affects Versions: 2.3.0
Reporter: Nick Dimiduk


I've noticed that after a recent commit, we now have webui access log lines 
going into our service log file. The log entries are going to a logger called 
{{http.requests.regionserver}}, and after the preamble of timestamp, log level, 
logger, they appear to be conformant to the 
[CLF|https://en.wikipedia.org/wiki/Common_Log_Format] specification. Tools 
designed for parsing http logs usually expect to have just the CLF entries, and 
not need preprocessing.

We should document how to configure the service to log these entries into a 
separate log file.
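A configuration along these lines might do it (a sketch only, assuming Log4j 1.x properties syntax and the `http.requests.regionserver` logger name mentioned above; the appender name, file path, and rotation settings are illustrative):

```properties
# Send entries from the WebUI access logger to a dedicated file rather
# than the service log. Logger name taken from the description above;
# appender name, file path, and rotation settings are illustrative.
log4j.logger.http.requests.regionserver=INFO,ACCESS
log4j.additivity.http.requests.regionserver=false

log4j.appender.ACCESS=org.apache.log4j.RollingFileAppender
log4j.appender.ACCESS.File=${hbase.log.dir}/access.log
log4j.appender.ACCESS.MaxFileSize=256MB
log4j.appender.ACCESS.MaxBackupIndex=20
log4j.appender.ACCESS.layout=org.apache.log4j.PatternLayout
# Emit only the message, which is already in CLF, so downstream tools
# need no preprocessing.
log4j.appender.ACCESS.layout.ConversionPattern=%m%n
```

Setting additivity to false keeps the entries out of the service log entirely, and the bare `%m%n` pattern drops the timestamp/level/logger preamble so the file contains pure CLF lines.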



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [ANNOUNCE] New HBase committer Wei-Chiu Chuang

2020-05-13 Thread Nick Dimiduk
Thank you Wei-Chiu for your contributions! Looking forward to continuing to
work together :)

On Wed, May 13, 2020 at 2:01 PM Rushabh Shah
 wrote:

> Congratulations Wei-Chiu !!
>
>
> Rushabh Shah
>
>- Software Engineering SMTS | Salesforce
>-
>   - Mobile: 213 422 9052
>
>
>
> On Wed, May 13, 2020 at 1:35 PM Zach York 
> wrote:
>
> > Congratulations Wei-Chiu!
> >
> > On Wed, May 13, 2020 at 12:15 PM Bharath Vissapragada <
> bhara...@apache.org
> > >
> > wrote:
> >
> > > Congrats, Wei-Chiu.
> > >
> > > On Wed, May 13, 2020 at 12:13 PM Andrew Purtell 
> > > wrote:
> > >
> > > > Congratulations and welcome Wei-Chiu!
> > > >
> > > > On Wed, May 13, 2020 at 12:10 PM Sean Busbey 
> > wrote:
> > > >
> > > > > Folks,
> > > > >
> > > > > On behalf of the Apache HBase PMC I am pleased to announce that
> > > Wei-Chiu
> > > > > Chuang has accepted the PMC's invitation to become a committer on
> the
> > > > > project.
> > > > >
> > > > > We appreciate all of the great contributions Wei-Chiu has made to
> the
> > > > > community thus far and we look forward to his continued
> involvement.
> > > > >
> > > > > Allow me to be the first to congratulate Wei-Chiu on his new role!
> > > > >
> > > > > thanks,
> > > > > busbey
> > > > >
> > > >
> > > >
> > > > --
> > > > Best regards,
> > > > Andrew
> > > >
> > > > Words like orphans lost among the crosstalk, meaning torn from
> truth's
> > > > decrepit hands
> > > >- A23, Crosstalk
> > > >
> > >
> >
>


[jira] [Resolved] (HBASE-24093) Exclude H2 from the build workers pool

2020-05-13 Thread Nick Dimiduk (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk resolved HBASE-24093.
--
Resolution: Won't Fix

I think the issues we were trying to side-step were resolved by Infra 
increasing the jenkins worker heap allocation.

> Exclude H2 from the build workers pool
> --
>
> Key: HBASE-24093
> URL: https://issues.apache.org/jira/browse/HBASE-24093
> Project: HBase
>  Issue Type: Task
>  Components: build
>    Reporter: Nick Dimiduk
>Priority: Major
>
> Tracking INFRA-20025, H2 keeps coming up as impacted. Let's exclude it from 
> our build workers while Infra investigates the hardware.





[jira] [Reopened] (HBASE-24190) Case-sensitive use of configuration parameter hbase.security.authentication

2020-05-13 Thread Nick Dimiduk (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk reopened HBASE-24190:
--

The commits applied do not conform to the project requirements for including a 
Jira ticket and matching between the commit title and jira summary. Responsible 
committer, please revert and reapply everywhere. Thanks.

> Case-sensitive use of configuration parameter hbase.security.authentication
> ---
>
> Key: HBASE-24190
> URL: https://issues.apache.org/jira/browse/HBASE-24190
> Project: HBase
>  Issue Type: Bug
>Reporter: Yuanliang Zhang
>Assignee: Rushabh Shah
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.0, 1.7.0, 2.1.10, 1.4.14, 2.2.5
>
>
> In hbase-20586 (https://issues.apache.org/jira/browse/HBASE-20586)
> (commit_sha: [https://github.com/apache/hbase/commit/cd61bcc0] )
> The code added 
> ([SyncTable.java|https://github.com/apache/hbase/commit/cd61bcc0#diff-d1b79635f33483bf6226609e91fd1cc3])
>  for the use of *hbase.security.authentication* is case-sensitive. So users 
> setting it to “KERBEROS” won’t take effect. 
>  
> {code:java}
>  private void initCredentialsForHBase(String zookeeper, Job job) throws 
> IOException {
>    Configuration peerConf = 
> HBaseConfiguration.createClusterConf(job.getConfiguration(), zookeeper);
>    if(peerConf.get("hbase.security.authentication").equals("kerberos")){
>  TableMapReduceUtil.initCredentialsForCluster(job, peerConf);    }
>  }
> {code}
>  
> However, in current code base, other uses of *hbase.security.authentication* 
> are all case-insensitive. For example in *MasterFileSystem.java.* 
>  
> {code:java}
> public MasterFileSystem(Configuration conf) throws IOException{   
>   ...   
>   this.isSecurityEnabled = 
> "kerberos".equalsIgnoreCase(conf.get("hbase.security.authentication"));  
>   ... 
> }
> {code}
>  
> The doc in GitHub repo is also misleading (Giving upper-case value).
> {quote}As a distributed database, HBase must be able to authenticate users 
> and HBase services across an untrusted network. Clients and HBase services 
> are treated equivalently in terms of authentication (and this is the only 
> time we will draw such a distinction).
> There are currently three modes of authentication which are supported by 
> HBase today via the configuration property {{hbase.security.authentication}}
> {{1.SIMPLE}}
> {{2.KERBROS}}
> {{3.TOKEN}}
> {quote}
> Users may misconfigure the parameter because of the case-sensitive problem.
> *How To Fix*
> Use the *equalsIgnoreCase* API consistently in every place that reads 
> *hbase.security.authentication*, or make the expected casing clear in the docs.
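For illustration, the case-insensitive check proposed above might look like this (a hypothetical helper class; the actual fix would modify the check in SyncTable.java directly):

```java
// Sketch of the proposed case-insensitive check. Putting the "kerberos"
// literal first also avoids an NPE when the property is unset, since
// String.equalsIgnoreCase(null) simply returns false.
class AuthCheck {
    static boolean isKerberos(String authentication) {
        return "kerberos".equalsIgnoreCase(authentication);
    }
}
```

With this shape, both `kerberos` and `KERBEROS` enable the Kerberos path, matching the existing behavior of `MasterFileSystem`.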





Re: Recent experience with Chaos Monkey?

2020-05-13 Thread Nick Dimiduk
To follow up, I've needed to apply these two patches to get my local
environment running.

https://issues.apache.org/jira/browse/HBASE-24360
https://issues.apache.org/jira/browse/HBASE-24361

On Tue, May 12, 2020 at 11:52 AM Nick Dimiduk  wrote:

> Thanks Zach.
>
> > It actually performs even worse in this case in my experience since
> Chaos monkey can consider the failure mechanism to have failed (and
> eventually times out) because the process is too quick to recover (or the
> recovery fails because the process is already running). The only way I was
> able to get it to run was to disable the process that automatically
> restarts killed processes in my system.
>
> Interesting observation.
>
> > This brings up a discussion on whether the ITBLL (or whatever process)
> should even continue if either a killing or recovering action failed.
> I would argue that invalidates the entire test, but it might not be obvious
> it failed unless you were watching the logs as it went.
>
> I'm coming to a similar conclusion -- failure in the orchestration layer
> should invalidate the test.
>
> On Thu, May 7, 2020 at 5:27 PM Zach York 
> wrote:
>
>> I should note that I was using HBase 2.2.3 to test.
>>
>> On Thu, May 7, 2020 at 5:26 PM Zach York 
>> wrote:
>>
>> > I recently ran ITBLL with Chaos monkey[1] against a real HBase
>> > installation (EMR). I initially tried to run it locally, but couldn't
>> get
>> > it working and eventually gave up.
>> >
>> > > So I'm curious if this matches others' experience running the monkey.
>> For
>> > example, do you have an environment more resilient than mine, one where
>> an
>> > external actor is restarting downed processes without the monkey
>> action's
>> > involvement?
>> >
>> > It actually performs even worse in this case in my experience since
>> Chaos
>> > monkey can consider the failure mechanism to have failed (and eventually
>> > times out)
>> > because the process is too quick to recover (or the recovery fails
>> because
>> > the process is already running). The only way I was able to get it to
>> run
>> > was to disable
>> > the process that automatically restarts killed processes in my system.
>> >
>> > One other thing I hit was the validation for a suspended process was
>> > incorrect so if chaos monkey tried to suspend the process the run would
>> > fail. I'll put up a JIRA for that.
>> >
>> > This brings up a discussion on whether the ITBLL (or whatever process)
>> > should even continue if either a killing or recovering action failed. I
>> > would argue that invalidates the entire test,
>> > but it might not be obvious it failed unless you were watching the logs
>> as
>> > it went.
>> >
>> > Thanks,
>> > Zach
>> >
>> >
>> > [1] sudo -u hbase hbase
>> > org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList -m
>> serverKilling
>> > loop 4 2 100 ${RANDOM} 10
>> >
>> > On Thu, May 7, 2020 at 5:05 PM Nick Dimiduk 
>> wrote:
>> >
>> >> Hello,
>> >>
>> >> Does anyone have recent experience running Chaos Monkey? Are you
>> running
>> >> against an external cluster, or one of the other modes? What monkey
>> >> factory
>> >> are you using? Any property overrides? A non-default ClusterManager?
>> >>
>> >> I'm trying to run ITBLL with chaos against branch-2.3 and I'm not
>> having
>> >> much luck. My environment is an "external" cluster, 4 racks of 4 hosts
>> >> each, the relatively simple "serverKilling" factory with
>> >> `rolling.batch.suspend.rs.ratio = 0.0`. So, randomly kill various
>> hosts
>> >> on
>> >> various schedules, plus some balancer play mixed in; no process
>> >> suspension.
>> >>
>> >> Running for any length of time (~30 minutes) the chaos monkey
>> eventually
>> >> terminates between a majority and all of the hosts in the cluster. My
>> logs
>> >> are peppered with warnings such as the below. There are other
>> variants. As
>> >> far as I can tell, actions are intended to cause some harm and then
>> >> restore
>> >> state after themselves. In practice, the harm is successful but
>> >> restoration
>> >> rarely succeeds. Mostly these actions are "safeguarded" by this 60-sec
>> >

[jira] [Created] (HBASE-24361) Make `RESTApiClusterManager` more resilient

2020-05-12 Thread Nick Dimiduk (Jira)
Nick Dimiduk created HBASE-24361:


 Summary: Make `RESTApiClusterManager` more resilient
 Key: HBASE-24361
 URL: https://issues.apache.org/jira/browse/HBASE-24361
 Project: HBase
  Issue Type: Test
  Components: integration tests
Affects Versions: 2.3.0
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk


The Cloudera Manager API client in {{RESTApiClusterManager}} appears to assume 
that API calls sent to CM for process commands block on command completion. 
However, these commands are "asynchronous," queuing work in the background for 
execution. Update the client to track command submission and block on 
completion of that commandId. This allows this {{ClusterManager}} to conform to 
the expectations of the {{Actions}} that invoke it.
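The submit-then-poll behavior described above might be sketched like this (the `CommandApi` interface is a hypothetical stand-in; the real client issues REST calls to Cloudera Manager and tracks the returned command id):

```java
// Sketch of blocking on an asynchronous command id, as described above.
// CommandApi is a hypothetical stand-in for the Cloudera Manager REST client.
class CommandPoller {
    interface CommandApi {
        long submit(String command);     // enqueue work, return a command id
        boolean isDone(long commandId);  // poll the command's status
    }

    static void runBlocking(CommandApi api, String command,
                            long timeoutMs, long pollIntervalMs) {
        long id = api.submit(command);
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!api.isDone(id)) {
            if (System.currentTimeMillis() > deadline) {
                throw new IllegalStateException("command " + id + " timed out");
            }
            try {
                Thread.sleep(pollIntervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted waiting on " + id, e);
            }
        }
    }
}
```

Only once the polled command reports completion does control return to the calling `Action`, which is what the chaos actions expect of a `ClusterManager`.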





[jira] [Created] (HBASE-24360) RollingBatchRestartRsAction loses track of dead servers

2020-05-12 Thread Nick Dimiduk (Jira)
Nick Dimiduk created HBASE-24360:


 Summary: RollingBatchRestartRsAction loses track of dead servers
 Key: HBASE-24360
 URL: https://issues.apache.org/jira/browse/HBASE-24360
 Project: HBase
  Issue Type: Test
  Components: integration tests
Affects Versions: 2.3.0
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk


{{RollingBatchRestartRsAction}} doesn't handle failure cases when tracking its 
list of dead servers. The original author believed that a failure to restart 
would result in a retry. However, by removing the dead server from the failed 
list prematurely, that state is lost, and retry of that server never occurs. 
Because this action doesn't ever look back to the current state of the cluster, 
relying only on its local state for the current action invocation, it never 
realizes the abandoned server is still dead. Instead, be more careful to only 
remove the dead server from the list when the {{startRs}} invocation claims to 
have been successful.
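The corrected bookkeeping described above can be sketched roughly as follows (simplified, hypothetical types; the real action works with `ServerName`, an `HBaseCluster`, and sleeps between batches):

```java
import java.util.Queue;

// Simplified sketch of the corrected bookkeeping: a server leaves the dead
// queue only after startRs reports success, so failed restarts are retried.
class RollingRestartSketch {
    interface Cluster {
        // Hypothetical signature; returns true when the server is confirmed up.
        boolean startRs(String server);
    }

    static int restartAll(Queue<String> dead, Cluster cluster, int maxAttempts) {
        int restarted = 0;
        for (int attempt = 0; !dead.isEmpty() && attempt < maxAttempts; attempt++) {
            String server = dead.peek();   // inspect, but do NOT remove yet
            if (cluster.startRs(server)) {
                dead.remove();             // success confirmed: forget the server
                restarted++;
            }
            // on failure the server stays queued for another attempt
        }
        return restarted;
    }
}
```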





Re: Recent experience with Chaos Monkey?

2020-05-12 Thread Nick Dimiduk
Thanks Zach.

> It actually performs even worse in this case in my experience since Chaos
monkey can consider the failure mechanism to have failed (and eventually
times out) because the process is too quick to recover (or the recovery
fails because the process is already running). The only way I was able to
get it to run was to disable the process that automatically restarts killed
processes in my system.

Interesting observation.

> This brings up a discussion on whether the ITBLL (or whatever process)
should even continue if either a killing or recovering action failed.
I would argue that invalidates the entire test, but it might not be obvious
it failed unless you were watching the logs as it went.

I'm coming to a similar conclusion -- failure in the orchestration layer
should invalidate the test.

On Thu, May 7, 2020 at 5:27 PM Zach York 
wrote:

> I should note that I was using HBase 2.2.3 to test.
>
> On Thu, May 7, 2020 at 5:26 PM Zach York 
> wrote:
>
> > I recently ran ITBLL with Chaos monkey[1] against a real HBase
> > installation (EMR). I initially tried to run it locally, but couldn't get
> > it working and eventually gave up.
> >
> > > So I'm curious if this matches others' experience running the monkey.
> For
> > example, do you have an environment more resilient than mine, one where
> an
> > external actor is restarting downed processes without the monkey action's
> > involvement?
> >
> > It actually performs even worse in this case in my experience since Chaos
> > monkey can consider the failure mechanism to have failed (and eventually
> > times out)
> > because the process is too quick to recover (or the recovery fails
> because
> > the process is already running). The only way I was able to get it to run
> > was to disable
> > the process that automatically restarts killed processes in my system.
> >
> > One other thing I hit was the validation for a suspended process was
> > incorrect so if chaos monkey tried to suspend the process the run would
> > fail. I'll put up a JIRA for that.
> >
> > This brings up a discussion on whether the ITBLL (or whatever process)
> > should even continue if either a killing or recovering action failed. I
> > would argue that invalidates the entire test,
> > but it might not be obvious it failed unless you were watching the logs
> as
> > it went.
> >
> > Thanks,
> > Zach
> >
> >
> > [1] sudo -u hbase hbase
> > org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList -m
> serverKilling
> > loop 4 2 100 ${RANDOM} 10
> >
> > On Thu, May 7, 2020 at 5:05 PM Nick Dimiduk  wrote:
> >
> >> Hello,
> >>
> >> Does anyone have recent experience running Chaos Monkey? Are you running
> >> against an external cluster, or one of the other modes? What monkey
> >> factory
> >> are you using? Any property overrides? A non-default ClusterManager?
> >>
> >> I'm trying to run ITBLL with chaos against branch-2.3 and I'm not having
> >> much luck. My environment is an "external" cluster, 4 racks of 4 hosts
> >> each, the relatively simple "serverKilling" factory with
> >> `rolling.batch.suspend.rs.ratio = 0.0`. So, randomly kill various hosts
> >> on
> >> various schedules, plus some balancer play mixed in; no process
> >> suspension.
> >>
> >> Running for any length of time (~30 minutes) the chaos monkey eventually
> >> terminates between a majority and all of the hosts in the cluster. My
> logs
> >> are peppered with warnings such as the below. There are other variants.
> As
> >> far as I can tell, actions are intended to cause some harm and then
> >> restore
> >> state after themselves. In practice, the harm is successful but
> >> restoration
> >> rarely succeeds. Mostly these actions are "safeguarded" by this 60-sec
> >> timeout. The result is a methodical termination of the cluster.
> >>
> >> So I'm curious if this matches others' experience running the monkey.
> For
> >> example, do you have an environment more resilient than mine, one where
> an
> >> external actor is restarting downed processes without the monkey
> action's
> >> involvement? Is the monkey designed to run only in such an environment?
> >> These timeouts are configurable; are you cranking them way up?
> >>
> >> Any input you have would be greatly appreciated. This is my last major
> >> action item blocking initi

Recent experience with Chaos Monkey?

2020-05-07 Thread Nick Dimiduk
Hello,

Does anyone have recent experience running Chaos Monkey? Are you running
against an external cluster, or one of the other modes? What monkey factory
are you using? Any property overrides? A non-default ClusterManager?

I'm trying to run ITBLL with chaos against branch-2.3 and I'm not having
much luck. My environment is an "external" cluster, 4 racks of 4 hosts
each, the relatively simple "serverKilling" factory with
`rolling.batch.suspend.rs.ratio = 0.0`. So, randomly kill various hosts on
various schedules, plus some balancer play mixed in; no process suspension.
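For reference, such an override can be supplied as a monkey properties file (a sketch; the property name is taken from the line above, and `-monkeyProps` is the flag the integration test driver accepts for passing such a file, if I recall correctly):

```properties
# monkey.properties: keep the serverKilling factory from suspending
# region server processes (a ratio of 0.0 disables suspension).
rolling.batch.suspend.rs.ratio=0.0
```

Invoked roughly as `hbase org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList -m serverKilling -monkeyProps monkey.properties loop 4 2 100 ${RANDOM} 10`.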

Running for any length of time (~30 minutes) the chaos monkey eventually
terminates between a majority and all of the hosts in the cluster. My logs
are peppered with warnings such as the below. There are other variants. As
far as I can tell, actions are intended to cause some harm and then restore
state after themselves. In practice, the harm is successful but restoration
rarely succeeds. Mostly these actions are "safeguarded" by this 60-sec
timeout. The result is a methodical termination of the cluster.

So I'm curious if this matches others' experience running the monkey. For
example, do you have an environment more resilient than mine, one where an
external actor is restarting downed processes without the monkey action's
involvement? Is the monkey designed to run only in such an environment?
These timeouts are configurable; are you cranking them way up?

Any input you have would be greatly appreciated. This is my last major
action item blocking initial 2.3.0 release candidates.

Thanks,
Nick

20/05/05 21:19:29 WARN policies.Policy: Exception occurred during
performing action: java.io.IOException: did timeout 6ms waiting for
region server to start: host-a.example.com
at
org.apache.hadoop.hbase.HBaseCluster.waitForRegionServerToStart(HBaseCluster.java:163)
at
org.apache.hadoop.hbase.chaos.actions.Action.startRs(Action.java:228)
at
org.apache.hadoop.hbase.chaos.actions.RestartActionBaseAction.gracefulRestartRs(RestartActionBaseAction.java:70)
at
org.apache.hadoop.hbase.chaos.actions.GracefulRollingRestartRsAction.perform(GracefulRollingRestartRsAction.java:61)
at
org.apache.hadoop.hbase.chaos.policies.DoActionsOncePolicy.runOneIteration(DoActionsOncePolicy.java:50)
at
org.apache.hadoop.hbase.chaos.policies.PeriodicPolicy.run(PeriodicPolicy.java:41)
at
org.apache.hadoop.hbase.chaos.policies.CompositeSequentialPolicy.run(CompositeSequentialPolicy.java:42)
at java.base/java.lang.Thread.run(Thread.java:834)


[jira] [Resolved] (HBASE-24295) [Chaos Monkey] abstract logging through the class hierarchy

2020-05-07 Thread Nick Dimiduk (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk resolved HBASE-24295.
--
Resolution: Fixed

> [Chaos Monkey] abstract logging through the class hierarchy
> ---
>
> Key: HBASE-24295
> URL: https://issues.apache.org/jira/browse/HBASE-24295
> Project: HBase
>  Issue Type: Task
>  Components: integration tests
>Affects Versions: 2.3.0
>Reporter: Nick Dimiduk
>    Assignee: Nick Dimiduk
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.3.0
>
>
> Running chaos monkey and watching the logs, it's very difficult to tell what 
> actions are actually running. There's lots of shared methods through the 
> class hierarchy that extends from {{abstract class Action}}, and each class 
> comes with its own {{Logger}}. As a result, the logs have useless stuff like
> {noformat}
> INFO actions.Action: Started regionserver...
> {noformat}
> Add {{protected abstract Logger getLogger()}} to the class's internal 
> interface, and have the concrete implementations provide their logger.
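The pattern described above can be sketched as follows (a simplified illustration using `java.util.logging` so it is self-contained; HBase's chaos actions use slf4j loggers, and the real classes live in `org.apache.hadoop.hbase.chaos.actions`):

```java
import java.util.logging.Logger;

// Base class keeps the shared behavior but defers logger selection to
// subclasses, so log lines carry the concrete action's name instead of
// the shared base class name.
abstract class Action {
    protected abstract Logger getLogger();

    protected void startRs(String server) {
        getLogger().info("Started regionserver " + server);
    }
}

class RestartRsAction extends Action {
    private static final Logger LOG =
        Logger.getLogger(RestartRsAction.class.getName());

    @Override
    protected Logger getLogger() {
        return LOG;
    }
}
```

Log output then reads `RestartRsAction: Started regionserver ...` rather than the uninformative `actions.Action: Started regionserver ...`.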





[jira] [Reopened] (HBASE-24295) [Chaos Monkey] abstract logging through the class hierarchy

2020-05-07 Thread Nick Dimiduk (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk reopened HBASE-24295:
--

Reopening for addendum.

> [Chaos Monkey] abstract logging through the class hierarchy
> ---
>
> Key: HBASE-24295
> URL: https://issues.apache.org/jira/browse/HBASE-24295
> Project: HBase
>  Issue Type: Task
>  Components: integration tests
>Affects Versions: 2.3.0
>Reporter: Nick Dimiduk
>    Assignee: Nick Dimiduk
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.3.0
>
>
> Running chaos monkey and watching the logs, it's very difficult to tell what 
> actions are actually running. There's lots of shared methods through the 
> class hierarchy that extends from {{abstract class Action}}, and each class 
> comes with its own {{Logger}}. As a result, the logs have useless stuff like
> {noformat}
> INFO actions.Action: Started regionserver...
> {noformat}
> Add {{protected abstract Logger getLogger()}} to the class's internal 
> interface, and have the concrete implementations provide their logger.





Re: Nightly job for branch-2.2 is failing

2020-05-05 Thread Nick Dimiduk
The dockerfiles on branch-2.3+ were rewritten in order to support multiple
JDKs. These changes should be compatible with branch-2.2, other than they
no longer set JAVA_HOME in the Dockerfiles themselves. Backporting with a
little tweaking to the Dockerfiles or the build script should be fine.

I'm not certain the dockerfiles on branch-2.3+ are immune to these kinds of
breaks either, btw...

On Tue, May 5, 2020 at 7:36 AM 张铎(Duo Zhang)  wrote:

> For branch-2.3+ we pinned all the dependencies with a specific version,
> maybe we should backport the dockerfile to branch-2.2?
>
> Jan Hentschel  于2020年5月5日周二 下午9:58写道:
>
> > I thought we are already pinned the Rubocop version, but not sure about
> > branch-2.2.
> >
> > From: Duo Zhang 
> > Reply-To: "dev@hbase.apache.org" 
> > Date: Tuesday, May 5, 2020 at 2:38 PM
> > To: HBase Dev List 
> > Subject: Nightly job for branch-2.2 is failing
> >
> > Because of failing to build the dockerfile.
> >
> > The error message is
> >
> > *18:52:25*   [91mERROR:  Error installing rubocop:*18:52:25*rubocop
> > requires Ruby version >= 2.4.0.
> >
> >
> > IIRC we have discussed this before that we should keep the rubocop at a
> > specific version to prevent it being upgraded accidentally?
> >
> >
>


[jira] [Created] (HBASE-24330) [flakey test] TestInfoServersACL testJmxAvailableForAdmin

2020-05-05 Thread Nick Dimiduk (Jira)
Nick Dimiduk created HBASE-24330:


 Summary: [flakey test] TestInfoServersACL testJmxAvailableForAdmin
 Key: HBASE-24330
 URL: https://issues.apache.org/jira/browse/HBASE-24330
 Project: HBase
  Issue Type: Test
  Components: test
Affects Versions: 2.3.0
Reporter: Nick Dimiduk


There was a failure on branch-2.3 last night, 
https://builds.apache.org/job/HBase%20Nightly/job/branch-2.3/65/testReport/

This test failed for JDK11 but not JDK8.

Before the failed assertion, I do see a log from the loop on line 312:

{noformat}
2020-05-05 10:57:45,323 INFO  [Time-limited test] 
http.TestInfoServersACL$11(312): Hadoop:service=HBase,name=StartupProgress
{noformat}

This indicates to me that the HBase portion of the process hadn't finished 
coming online yet, and so the precondition was not yet satisfied. Just 
speculation though; I don't understand why the same bean wouldn't be available 
to the admin user as well.

Furthermore, looks like the test does not clean up after itself correctly. On 
retry, the subsequent attempt flat-out failed with

{noformat}
Failed to load or create keytab 
/home/jenkins/jenkins-slave/workspace/HBase_Nightly_branch-2.3/component/hbase-server/target/test-data/f2e84dcf-0d1f-fe85-3519-4fcb37058d39/keytab

org.apache.kerby.kerberos.kerb.KrbException: Failed to load or create keytab 
/home/jenkins/jenkins-slave/workspace/HBase_Nightly_branch-2.3/component/hbase-server/target/test-data/f2e84dcf-0d1f-fe85-3519-4fcb37058d39/keytab
at 
org.apache.hadoop.hbase.http.TestInfoServersACL.beforeClass(TestInfoServersACL.java:114)
Caused by: java.io.IOException: No such file or directory
at 
org.apache.hadoop.hbase.http.TestInfoServersACL.beforeClass(TestInfoServersACL.java:114)
{noformat}





[jira] [Created] (HBASE-24324) NPE from /procedures.jsp on backup master

2020-05-04 Thread Nick Dimiduk (Jira)
Nick Dimiduk created HBASE-24324:


 Summary: NPE from /procedures.jsp on backup master
 Key: HBASE-24324
 URL: https://issues.apache.org/jira/browse/HBASE-24324
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 2.3.0
Reporter: Nick Dimiduk


When going to {{/procedures.jsp}} on a backup master (i.e., a user hits refresh 
on a window they have open after the active master has flipped over), we 
throw an NPE back to the user. Instead, we should do practically anything else.

{noformat}
java.lang.NullPointerException
at 
org.apache.hadoop.hbase.generated.master.procedures_jsp._jspService(procedures_jsp.java:63)
at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:111)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at 
org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:840)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1780)
at 
org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:112)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1767)
at 
org.apache.hadoop.hbase.http.SecurityHeadersFilter.doFilter(SecurityHeadersFilter.java:66)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1767)
at 
org.apache.hadoop.hbase.http.ClickjackingPreventionFilter.doFilter(ClickjackingPreventionFilter.java:52)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1767)
at 
org.apache.hadoop.hbase.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:1491)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1767)
at 
org.apache.hadoop.hbase.http.NoCacheFilter.doFilter(NoCacheFilter.java:50)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1767)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:583)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:513)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:539)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:333)
at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
at 
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
at 
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
at java.base/java.lang.Thread.run(Thread.java:834)
{noformat}





Re: Heads up, infra integration changes

2020-05-04 Thread Nick Dimiduk
Thank you Bharath!!

On Mon, May 4, 2020 at 12:48 PM Bharath Vissapragada 
wrote:

> Hi,
>
> I just committed this change
> <
> https://github.com/apache/hbase/commit/da7c6cc059660478eaeb5695b2283a8d90b66a7b
> >
> that
> changes the way we integrate with the ASF infra. Refer to HBASE-24261
> . As mentioned in the
> jira, there _might be_ some unintended consequences, for example, issues
> like
>
> - Something new popped up in the project Github page that was not there
> before
> - Project emails being sent to wrong aliases messing up my labels
> - Github PR is spamming jiras with unnecessary comments etc.
>
> These are just some random examples I came up with and they may or may not
> happen, but in case you notice anything odd w.r.t. infrastructure, please
> report on the jira or respond to this email. Once we are sure that all the
> mailing lists and PR linking is working as expected, we need to do similar
> fixes in our other repos (and branches).
>
> - Bharath
>


[jira] [Resolved] (HBASE-24295) [Chaos Monkey] abstract logging through the class hierarchy

2020-05-04 Thread Nick Dimiduk (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk resolved HBASE-24295.
--
Fix Version/s: 2.3.0
   3.0.0-alpha-1
   Resolution: Fixed

> [Chaos Monkey] abstract logging through the class hierarchy
> ---
>
> Key: HBASE-24295
> URL: https://issues.apache.org/jira/browse/HBASE-24295
> Project: HBase
>  Issue Type: Task
>  Components: integration tests
>Affects Versions: 2.3.0
>Reporter: Nick Dimiduk
>    Assignee: Nick Dimiduk
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.3.0
>
>
> Running chaos monkey and watching the logs, it's very difficult to tell what 
> actions are actually running. There's lots of shared methods through the 
> class hierarchy that extends from {{abstract class Action}}, and each class 
> comes with its own {{Logger}}. As a result, the logs have useless stuff like
> {noformat}
> INFO actions.Action: Started regionserver...
> {noformat}
> Add {{protected abstract Logger getLogger()}} to the class's internal 
> interface, and have the concrete implementations provide their logger.





[jira] [Created] (HBASE-24320) check-aggregate-license on hbase-shaded-client-byo-hadoop is not compatible with `--threads`

2020-05-04 Thread Nick Dimiduk (Jira)
Nick Dimiduk created HBASE-24320:


 Summary: check-aggregate-license on hbase-shaded-client-byo-hadoop 
is not compatible with `--threads`
 Key: HBASE-24320
 URL: https://issues.apache.org/jira/browse/HBASE-24320
 Project: HBase
  Issue Type: Task
  Components: build
Affects Versions: 3.0.0-alpha-1, 2.3.0
Reporter: Nick Dimiduk


I see occasional failures in this enforcer rule when running maven with 
{{-T1.0C}}. Usually it succeeds when building again. Looks related to the 
BeanShell script that loops over license files.

I have been using {{mvn -U -T1.0C clean install -DskipTests}} for both JDK8 and 
JDK11.





[jira] [Resolved] (HBASE-24260) Add a ClusterManager that issues commands via coprocessor

2020-05-04 Thread Nick Dimiduk (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk resolved HBASE-24260.
--
Fix Version/s: 2.3.0
   3.0.0-alpha-1
   Resolution: Fixed

> Add a ClusterManager that issues commands via coprocessor
> -
>
> Key: HBASE-24260
> URL: https://issues.apache.org/jira/browse/HBASE-24260
> Project: HBase
>  Issue Type: New Feature
>  Components: integration tests
>    Reporter: Nick Dimiduk
>Assignee: Nick Dimiduk
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.0
>
>
> I have a need to run the chaos monkey in an environment where ssh access is 
> restricted. I can get most of what I need via {{RESTApiClusterManager}}, 
> however {{kill -9}} is a critical and unsupported command. I have a 
> {{ClusterManager}} that implements {{kill}} via an executor coprocessor.





[jira] [Created] (HBASE-24295) [Chaos Monkey] abstract logging through the class hierarchy

2020-04-30 Thread Nick Dimiduk (Jira)
Nick Dimiduk created HBASE-24295:


 Summary: [Chaos Monkey] abstract logging through the class 
hierarchy
 Key: HBASE-24295
 URL: https://issues.apache.org/jira/browse/HBASE-24295
 Project: HBase
  Issue Type: Task
  Components: integration tests
Affects Versions: 2.3.0
Reporter: Nick Dimiduk


Running chaos monkey and watching the logs, it's very difficult to tell what 
actions are actually running. There's lots of shared methods through the class 
hierarchy that extends from {{abstract class Action}}, and each class comes 
with its own {{Logger}}. As a result, the logs have useless stuff like

{noformat}
INFO actions.Action: Started regionserver...
{noformat}

Add {{protected abstract Logger getLogger()}} to the class's internal 
interface, and have the concrete implementations provide their logger.
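A minimal sketch of that shape, using `java.util.logging` as a stdlib stand-in for the logging facade HBase actually uses, with an illustrative concrete action name:

```java
import java.util.logging.Logger;

// Shared base class: every log call routes through the subclass-provided
// logger, so log lines are attributed to the concrete action, not Action.
abstract class Action {
  protected abstract Logger getLogger();

  void perform() {
    getLogger().info("Started regionserver...");
  }
}

class RestartRandomRsAction extends Action {
  private static final Logger LOG =
      Logger.getLogger(RestartRandomRsAction.class.getName());

  @Override
  protected Logger getLogger() {
    return LOG;
  }
}

public class ActionLoggingDemo {
  public static void main(String[] args) {
    // The emitted record now names RestartRandomRsAction rather than Action.
    new RestartRandomRsAction().perform();
  }
}
```

With this pattern, shared helper methods in the base class stay shared while their log output identifies which concrete action invoked them.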





[jira] [Created] (HBASE-24293) Assignment manager should never give up assigning meta

2020-04-30 Thread Nick Dimiduk (Jira)
Nick Dimiduk created HBASE-24293:


 Summary: Assignment manager should never give up assigning meta
 Key: HBASE-24293
 URL: https://issues.apache.org/jira/browse/HBASE-24293
 Project: HBase
  Issue Type: Bug
  Components: master, Region Assignment
Affects Versions: 2.3.0
Reporter: Nick Dimiduk


Not yet sure how we got here, but,

{noformat}
2020-04-29 22:39:16,140 INFO 
org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: pid=308, 
state=RUNNABLE:SERVER_CRASH_ASSIGN_META, locked=true; ServerCrashProcedure 
server= host-a.example.com,16020,1588033841562, splitWal=true, meta=true found 
a region state=OFFLINE, location=null, table=hbase:meta, region=1588230740 
which is no longer on us host-a.example.com,16020,1588033841562, give up 
assigning...
{noformat}

Assignment manager gives up on this procedure and nothing can progress. Manual 
intervention is necessary.

From this [conditional 
block|https://github.com/apache/hbase/blob/1415a82d41a1e125440014a4b23364371b30d065/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java#L475],
 it seems the {{regionNode}} location is {{null}}.

{noformat}
// This is possible, as when a server is dead, TRSP will fail to schedule a RemoteProcedure
// to us and then try to assign the region to a new RS. And before it has updated the region
// location to the new RS, we may have already called the am.getRegionsOnServer so we will
// consider the region is still on us. And then before we arrive here, the TRSP could have
// updated the region location, or even finished itself, so the region is no longer on us
// any more, we should not try to assign it again. Please see HBASE-23594 for more details.
if (!serverName.equals(regionNode.getRegionLocation())) {
  LOG.info("{} found a region {} which is no longer on us {}, give up assigning...", this,
    regionNode, serverName);
  continue;
}
{noformat}





[jira] [Created] (HBASE-24292) A "stuck" master should not idle as active without taking action

2020-04-30 Thread Nick Dimiduk (Jira)
Nick Dimiduk created HBASE-24292:


 Summary: A "stuck" master should not idle as active without taking 
action
 Key: HBASE-24292
 URL: https://issues.apache.org/jira/browse/HBASE-24292
 Project: HBase
  Issue Type: Bug
  Components: master, Region Assignment
Affects Versions: 2.3.0
Reporter: Nick Dimiduk


The master schedules a SCP for the region server hosting meta. However, due to 
a misconfiguration, the cluster cannot make progress. After fixing the 
configuration issue and restarting, the cluster still cannot make progress. 
After the configured period (15 minutes), the master enters a "holding pattern" 
where it retains Active master status, but isn't taking any action.

This "brown-out" state is toxic. It should either keep trying to make progress, 
or it should abort. Staying up and not doing anything is the wrong thing to do.





[jira] [Created] (HBASE-24291) HBCK2 should accept "meta" as an alias for the encoded region name

2020-04-30 Thread Nick Dimiduk (Jira)
Nick Dimiduk created HBASE-24291:


 Summary: HBCK2 should accept "meta" as an alias for the encoded 
region name
 Key: HBASE-24291
 URL: https://issues.apache.org/jira/browse/HBASE-24291
 Project: HBase
  Issue Type: Improvement
  Components: hbck2
Affects Versions: 2.3.0
Reporter: Nick Dimiduk


Simple one: hbck2 should accept the alias "meta" in place of the encoded region 
name. It's a little silly to have to say {{hbck2 assigns 1588230740}}.
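One way the alias could be handled, sketched as a hypothetical helper (`resolveRegionName` is not an existing hbck2 method; 1588230740 is the fixed encoded name of hbase:meta cited in the message):

```java
public class MetaAlias {
  // Hypothetical argument pre-processing: translate the "meta" alias into
  // the well-known encoded region name of hbase:meta before dispatching
  // the assigns command; any other argument passes through unchanged.
  static String resolveRegionName(String arg) {
    return "meta".equalsIgnoreCase(arg) ? "1588230740" : arg;
  }

  public static void main(String[] args) {
    System.out.println(resolveRegionName("meta"));      // -> 1588230740
    System.out.println(resolveRegionName("abc123def")); // -> abc123def
  }
}
```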





[jira] [Resolved] (HBASE-24274) `RESTApiClusterManager` attempts to deserialize response using serialization API

2020-04-29 Thread Nick Dimiduk (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk resolved HBASE-24274.
--
Fix Version/s: 2.2.5
   2.3.0
   3.0.0
   Resolution: Fixed

> `RESTApiClusterManager` attempts to deserialize response using serialization 
> API
> 
>
> Key: HBASE-24274
> URL: https://issues.apache.org/jira/browse/HBASE-24274
> Project: HBase
>  Issue Type: Bug
>  Components: integration tests
>Affects Versions: 3.0.0, 2.3.0, 2.2.4
>    Reporter: Nick Dimiduk
>Assignee: Nick Dimiduk
>Priority: Major
> Fix For: 3.0.0, 2.3.0, 2.2.5
>
>
> I'm not sure if this class ever worked, or if Gson changed their API behavior 
> and we never noticed. The fix is quite simple, to use the streaming 
> {{JsonParser}} API instead of the higher-level object API. However, testing 
> this means standing up a web service that mocks Cloudera Manager response 
> bodies.





[jira] [Resolved] (HBASE-24257) Exclude jsr311-api from classpath

2020-04-29 Thread Nick Dimiduk (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk resolved HBASE-24257.
--
Resolution: Won't Fix

> Exclude jsr311-api from classpath
> -
>
> Key: HBASE-24257
> URL: https://issues.apache.org/jira/browse/HBASE-24257
> Project: HBase
>  Issue Type: Task
>  Components: build
>Affects Versions: 2.3.0
>Reporter: Nick Dimiduk
>Priority: Major
>
> When building on Hadoop3, we get two incompatible versions of javax.ws.rs on 
> the class path, both 1.1.1 and 2.0.1. These cause conflicts when running 
> chaos monkey and integration test tools.
> {noformat}
> [INFO] |  |  +- 
> org.apache.hadoop:hadoop-yarn-server-timelineservice:jar:3.1.2:test
> [INFO] |  |  |  +- org.apache.commons:commons-csv:jar:1.0:test
> [INFO] |  |  |  \- javax.ws.rs:jsr311-api:jar:1.1.1:test
> {noformat}
> {noformat}
> [INFO] org.apache.hbase:hbase-http:jar:2.3.0-SNAPSHOT
> [INFO] +- javax.ws.rs:javax.ws.rs-api:jar:2.0.1:compile
> {noformat}
> Problem looks like
> {noformat}
> 20/04/23 15:36:04 INFO hbase.RESTApiClusterManager: Executing GET against ...
> Exception in thread "ChaosMonkey" java.lang.NoSuchMethodError: 'void 
> javax.ws.rs.core.MultivaluedMap.addAll(java.lang.Object, java.lang.Object[])'
> at 
> org.glassfish.jersey.client.ClientRequest.accept(ClientRequest.java:336)
> at 
> org.glassfish.jersey.client.JerseyWebTarget.request(JerseyWebTarget.java:221)
> at 
> org.glassfish.jersey.client.JerseyWebTarget.request(JerseyWebTarget.java:59)
> at 
> org.apache.hadoop.hbase.RESTApiClusterManager.getJsonNodeFromURIGet(RESTApiClusterManager.java:244)
> {noformat}





[jira] [Resolved] (HBASE-24287) TestExportSnapshotWithTemporaryDirectory fails with "No such file or directory"

2020-04-29 Thread Nick Dimiduk (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk resolved HBASE-24287.
--
Resolution: Duplicate

> TestExportSnapshotWithTemporaryDirectory fails with "No such file or 
> directory"
> ---
>
> Key: HBASE-24287
> URL: https://issues.apache.org/jira/browse/HBASE-24287
> Project: HBase
>  Issue Type: Test
>  Components: test
>Affects Versions: 2.3.0
>Reporter: Nick Dimiduk
>Priority: Major
> Attachments: 
> TEST-org.apache.hadoop.hbase.snapshot.TestExportSnapshotWithTemporaryDirectory.xml
>
>
> Running tests locally last night, I get this failure.
> {noformat}
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> org.apache.hadoop.service.ServiceStateException: ExitCodeException 
> exitCode=1: chmod: 
> /private/tmp/hadoop-yarn-ndimiduk/node-attribute/nodeattribute.mirror.writing:
>  No such file or directory
>   at 
> org.apache.hadoop.hbase.snapshot.TestExportSnapshotWithTemporaryDirectory.setUpBeforeClass(TestExportSnapshotWithTemporaryDirectory.java:43)
> Caused by: org.apache.hadoop.service.ServiceStateException: 
> ExitCodeException exitCode=1: chmod: 
> /private/tmp/hadoop-yarn-ndimiduk/node-attribute/nodeattribute.mirror.writing:
>  No such file or directory
>   at 
> org.apache.hadoop.hbase.snapshot.TestExportSnapshotWithTemporaryDirectory.setUpBeforeClass(TestExportSnapshotWithTemporaryDirectory.java:43)
> Caused by: org.apache.hadoop.util.Shell$ExitCodeException: 
> chmod: 
> /private/tmp/hadoop-yarn-ndimiduk/node-attribute/nodeattribute.mirror.writing:
>  No such file or directory
>   at 
> org.apache.hadoop.hbase.snapshot.TestExportSnapshotWithTemporaryDirectory.setUpBeforeClass(TestExportSnapshotWithTemporaryDirectory.java:43)
> {noformat}





Re: [VOTE] Please vote on the second hbase-thirdparty-3.3.0 release candidate

2020-04-29 Thread Nick Dimiduk
Testing via unit tests on branch-2.3 revealed nothing amiss.

Belated +1.

On Wed, Apr 29, 2020 at 5:43 AM Peter Somogyi  wrote:

> +1 from me as well.
>
> With five +1 votes (four binding), this vote passes. Let me push out the
> release.
> Thank you all for vote!
>
> Peter
>
>
> On Wed, Apr 29, 2020 at 1:52 PM Josh Elser  wrote:
>
> > +1 (binding)
> >
> > On Tue, Apr 28, 2020, 14:10 Peter Somogyi  wrote:
> >
> > > Please vote on this Apache hbase thirdparty release candidate,
> > > hbase-thirdparty-3.3.0RC1
> > >
> > > The second release candidate only differs in the RELEASENOTES.md update
> > > compared to RC0.
> > >
> > > The VOTE will remain open until there are at least the required voting
> > > majority
> > >
> > > [ ] +1 Release this package as Apache hbase thirdparty 3.3.0
> > > [ ] -1 Do not release this package because ...
> > >
> > > The tag to be voted on is 3.3.0RC1:
> > >
> > > https://github.com/apache/hbase-thirdparty/tree/3.3.0RC1
> > >
> > > The release files, including signatures, digests, as well as CHANGES.md
> > > and RELEASENOTES.md included in this RC can be found at:
> > >
> > >  https://dist.apache.org/repos/dist/dev/hbase/3.3.0RC1/
> > >
> > > Maven artifacts are available in a staging repository at:
> > >
> > >
> https://repository.apache.org/content/repositories/orgapachehbase-1389/
> > >
> > > Artifacts were signed with the psomo...@apache.org key which can be
> > found
> > > in:
> > >
> > >  https://dist.apache.org/repos/dist/release/hbase/KEYS
> > >
> > >  To learn more about apache hbase thirdparty, please see
> > > http://hbase.apache.org/
> > >
> > > Thanks,
> > > Your HBase Release Manager
> > >
> >
>


[DISCUSS] Migrating HBase to new CI Master (was Re: Migration of Hadoop labelled nodes to new dedicated Master)

2020-04-29 Thread Nick Dimiduk
Hi Gavin,

I'd like to get started by copying over our nightly job to run on the new
master. I've accessed ci-hadoop.apache.org and find that I'm able to log in
with my Apache ID. However, I don't seem to have permissions to create a
new job. Can you please grant me appropriate karma?

Next question is with migrating itself. Is it possible for you to restore
the current job XML into a job on new master? This preferable vs. us
attempting to manually define the job via the UI. The job in question is
https://builds.apache.org/job/HBase%20Nightly/

Thanks a lot,
Nick

On Wed, Apr 29, 2020 at 6:06 AM Gavin McDonald  wrote:

> Hi All,
>
> Following on from the below email I sent *11 DAYS ago now*, so far we have
> had *one reply *from mahout cc:d to me (thank you Trevor) , and had *ONE
> PERSON* sign up to the new hadoop-migrati...@infra.apache.org -
> that is out of a total of *OVER 7000* people signed up to the 13 mailing
> lists emailed.
>
> To recap what I asked for:-
>
> "...What I would like from each community, is to decide who is going to
> help with their project in performing these migrations - ideally 2 or 3
> folks who use the current builds.a.o regularly. Those folks should then
> subscribe to the new dedicated hadoop-migrati...@infra.apache.org mailing
> lists as soon as possible so we can get started..."
>
> This will be last email I sent to your dev list directly. I am now building
> a new Jenkins Master, and as soon as it is ready I will start to migrate
> the Jenkins Nodes/Agents over to the new system.
> And; when I am done, the existing builds.apache.org *WILL BE TURNED OFF*.
>
> I am now going to continue all conversations on the
> hadoop-migrati...@infra.apache.org list *only.*
>
> Thanks
>
> Gavin McDonald (ASF Infra)
>
>
> On Sat, Apr 18, 2020 at 4:21 PM Gavin McDonald 
> wrote:
>
> > Hi All,
> >
> > A couple of months ago, I wrote to a few project private lists mentioning
> > the need to migrate Hadoop labelled nodes (H0-H21) over to a new
> dedicated
> > Jenkins Master [1] (a Cloudbees Client Master.).
> >
> > I'd like to revisit this now that I have more time to dedicate to getting
> > this done. However, keeping track across multiple mailing lists,
> > separate conversations that spring up in various places is cumbersome and
> > not realistic. To that end, I have created a new specific mailing list
> > dedicated to the migrations of these nodes, and the projects that use
> them,
> > over to the new system.
> >
> > The mailing list 'hadoop-migrati...@infra.apache.org' is up and running
> > now (and this will be the first post to it). Previous discussions were on
> > the private PMC lists, (there was some debate about that but I wanted the
> > PMCs initially to be aware of the change,) this new list is public and
> > archived.
> >
> > This email is BCC'd to 13 projects dev lists [2] determined by the
> https://hadoop.apache.org list of Related projects, minus Cassandra who already
> > have their own dedicated client master [3] and I added Yetus as I think
> > they cross collaborate with many Hadoop based projects. If anyone thinks
> a
> > project is missing, or should not be on the list, let me know.
> >
> > What I would like from each community, is to decide who is going to help
> > with their project in performing these migrations - ideally 2 or 3 folks
> > who use the current builds.a.o regularly. Those folks should then
> subscribe
> > to the new dedicated hadoop-migrati...@infra.apache.org mailing lists as
> > soon as possible so we can get started.
> >
> > About the current setup - and I hope this answers previously asked
> > questions on private lists - the new dedicated master is a Cloudbees
> Client
> > Master 2.204.3.7-rolling. It is not the same setup as the current Jenkins
> > master on builds.a.o - it is not intended to be. It is more or less a
> > 'clean install' in that I have not installed over 500 plugins as is the
> > case on builds.a.o , I would rather we install plugins as we find we need
> > them. So yes, there may be some features missing - the point of having
> > people sign up to the new list is to find out what those are, get them
> > installed, and get your builds to at least the same state they are in
> > currently.
> >
> > We have 2 nodes on there currently for testing, as things progress we can
> > transfer over a couple more, projects can start to migrate their jobs
> over
> > at any time they are happy, until done. We also need to test auth - the
> > master and its nodes will be restricted to just Hadoop + Related
> projects
> > (which is important this list of related projects is correct). No longer
> > will other projects be able to hop on to Hadoop nodes, and no longer will
> > Hadoop related projects be able to hop onto other folks nodes. This is a
> > good thing, and may encourage some providers to donate a few more VMs for
> > dedicated use.
> >
> > For now then, decide who will help with this process, and sign up to the
> > new mailing list, 

[jira] [Created] (HBASE-24287) TestExportSnapshotWithTemporaryDirectory fails with "No such file or directory"

2020-04-29 Thread Nick Dimiduk (Jira)
Nick Dimiduk created HBASE-24287:


 Summary: TestExportSnapshotWithTemporaryDirectory fails with "No 
such file or directory"
 Key: HBASE-24287
 URL: https://issues.apache.org/jira/browse/HBASE-24287
 Project: HBase
  Issue Type: Test
  Components: test
Affects Versions: 2.3.0
Reporter: Nick Dimiduk
 Attachments: 
TEST-org.apache.hadoop.hbase.snapshot.TestExportSnapshotWithTemporaryDirectory.xml

Running tests locally last night, I get this failure.

{noformat}
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
org.apache.hadoop.service.ServiceStateException: ExitCodeException exitCode=1: 
chmod: 
/private/tmp/hadoop-yarn-ndimiduk/node-attribute/nodeattribute.mirror.writing: 
No such file or directory

at 
org.apache.hadoop.hbase.snapshot.TestExportSnapshotWithTemporaryDirectory.setUpBeforeClass(TestExportSnapshotWithTemporaryDirectory.java:43)
Caused by: org.apache.hadoop.service.ServiceStateException: 
ExitCodeException exitCode=1: chmod: 
/private/tmp/hadoop-yarn-ndimiduk/node-attribute/nodeattribute.mirror.writing: 
No such file or directory

at 
org.apache.hadoop.hbase.snapshot.TestExportSnapshotWithTemporaryDirectory.setUpBeforeClass(TestExportSnapshotWithTemporaryDirectory.java:43)
Caused by: org.apache.hadoop.util.Shell$ExitCodeException: 
chmod: 
/private/tmp/hadoop-yarn-ndimiduk/node-attribute/nodeattribute.mirror.writing: 
No such file or directory

at 
org.apache.hadoop.hbase.snapshot.TestExportSnapshotWithTemporaryDirectory.setUpBeforeClass(TestExportSnapshotWithTemporaryDirectory.java:43)
{noformat}





[jira] [Created] (HBASE-24274) `RESTApiClusterManager` attempts to deserialize response using serialization API

2020-04-27 Thread Nick Dimiduk (Jira)
Nick Dimiduk created HBASE-24274:


 Summary: `RESTApiClusterManager` attempts to deserialize response 
using serialization API
 Key: HBASE-24274
 URL: https://issues.apache.org/jira/browse/HBASE-24274
 Project: HBase
  Issue Type: Bug
  Components: integration tests
Affects Versions: 2.3.0
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk


I'm not sure if this class ever worked, or if Gson changed their API behavior 
and we never noticed. The fix is quite simple, to use the streaming 
{{JsonParser}} API instead of the higher-level object API. However, testing 
this means standing up a web service that mocks Cloudera Manager response 
bodies.
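A rough sketch of the tree-style fix, assuming Gson 2.8.6+ (where the static `JsonParser.parseString` is available) and a hypothetical Cloudera Manager-like response body; the actual change in HBASE-24274 may differ:

```java
import com.google.gson.JsonArray;
import com.google.gson.JsonElement;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;

public class GsonTreeDemo {
  public static void main(String[] args) {
    // Hypothetical response body shaped like a Cloudera Manager API reply.
    String body =
        "{\"items\":[{\"name\":\"regionserver\",\"serviceState\":\"STARTED\"}]}";

    // Parse into a JsonElement tree and walk it directly, instead of asking
    // Gson's reflection-based object API to bind the body to a POJO.
    JsonElement root = JsonParser.parseString(body);
    JsonArray items = root.getAsJsonObject().getAsJsonArray("items");
    for (JsonElement item : items) {
      JsonObject obj = item.getAsJsonObject();
      System.out.println(obj.get("name").getAsString() + " -> "
          + obj.get("serviceState").getAsString()); // prints: regionserver -> STARTED
    }
  }
}
```

Walking the tree avoids depending on Gson's object-mapping behavior entirely, which is the fragile part the issue describes.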





Re: DISCUSS: Move hbase-thrift and hbase-rest out of core to hbase-connectors project?

2020-04-27 Thread Nick Dimiduk
I suppose an alternative is to provide a shaded version of the YARN jars
needed by hbase-rest in hbase-thirdparty. We'd need these for both hadoop2
and hadoop3, to pull them in via profile, and to exclude the originals
wherever they appear as transitive deps (though they shouldn't, right?)

On Mon, Apr 27, 2020 at 11:24 AM Josh Elser  wrote:

>
>
> On 4/27/20 1:52 PM, Nick Dimiduk wrote:
> > On Mon, Apr 27, 2020 at 10:11 Stack  wrote:
> >
> >> On Mon, Apr 27, 2020 at 9:44 AM Josh Elser  wrote:
> >>
> >>> +1 to the idea, -0 to the implied execution
> >>>
> >>> I agree hbase-connectors is a better place for REST and thrift, long
> >> term.
> >>> My concern is that I read this thread as suggesting:
> >>>
> >>> 1. Remove rest/thrift from 2.3
> >>> 1a. Proceed with 2.3.0 rc's
> >>> 2. Add rest/thrift to hbase-connectors
> >>> ...
> >>> n. Release hbase-connectors
> >>>
> >>> I'm not a fan of removing anything which was previously there until
> >>> there is are new releases and documentation to tell me how to do it.
> I'm
> >>> still trying to help dig out another project who did the 'remove and
> >>> then migrate" and left a pile of busted.
> >>>
> >>> If that's not what you were suggesting, let me shirk back into the
> >>> shadows;)
> >>>
> >>
> >> Ha ha. Not what I was suggesting but that could for sure happen.
> >>
> >> S
> >>
> >> P.S. I'm having trouble w/ REST jersey1 vs jersey2 vs Hadoop3 transitive
> >> includes++. Thrift has sporadic test failures that seem inherent to
> thrift
> >> rather than of our manufacture. The discussion here was provoked by a
> daydream
> >> that bringing forward this inevitable migration of REST+thrift would
> >> 'solve' my immediate pain. Wasn't giving too much mind to the amount of
> >> work needed on the other side.
> >
> > I’m not clear on how moving the module out will resolve the class path
> > issues. Whether it’s built from the main repo or from the side repo,
> yarn’s
> > transitive hull is still present...
> >
>
> Yeah, understandable :)
>
> I think working with "something" for REST/Thrift in connectors (punt on
> H3 to start?) is reasonable. You shouldn't be stuck with the baggage of
> Jersey dependencies if that's not the problem you're trying to solve.
>


[jira] [Created] (HBASE-24271) Set values in `conf/hbase-site.xml` that enable running on `LocalFileSystem` out of the box

2020-04-27 Thread Nick Dimiduk (Jira)
Nick Dimiduk created HBASE-24271:


 Summary: Set values in `conf/hbase-site.xml` that enable running 
on `LocalFileSystem` out of the box
 Key: HBASE-24271
 URL: https://issues.apache.org/jira/browse/HBASE-24271
 Project: HBase
  Issue Type: Task
Affects Versions: 3.0.0, 2.3.0
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk


This ticket is to implement the changes as described on the [discussion on 
dev|https://lists.apache.org/thread.html/r089de243a9bc9d923fa07c81e6bc825b82be68f567b892a342a0c61f%40%3Cdev.hbase.apache.org%3E].
 It reverts and supersedes changes made on HBASE-24086 and HBASE-24106.

{quote}
The conclusion I understand from this thread looks something like this:

1. revert HBASE-24086, make it so that running on `LocalFileSystem` is a fatal 
condition with default configs.
2. ship a conf/hbase-site.xml that contains 
hbase.unsafe.stream.capability.enforce=false, along with a big comment saying 
this is not safe for production.
3. ship a conf/hbase-site.xml that contains hbase.tmp.dir=./tmp, along with a 
comment saying herein you'll find temporary and persistent data, reconfigure 
for production with hbase.rootdir pointed to a durable filesystem that supports 
our required stream capabilities (see above).
4. update HBASE-24106 as appropriate.

Neither 2 nor 3 are suitable for production deployments, thus the changes do 
not go into hbase-default.xml. Anyone standing up a production deploy must edit 
hbase-site.xml anyway, so this doesn't change anything. It also restores our 
"simple" first-time user experience of not needing to run anything besides 
`bin/start-hbase.sh` (or `bin/hbase master start`, or whatever it is we're 
telling people these days).

We can reassess this once more when a durable equivalent to LocalFileSystem 
comes along.
{quote}
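A sketch of what the shipped {{conf/hbase-site.xml}} described in items 2 and 3 could contain (the property names are taken from the quoted plan; the comment wording is illustrative):

```xml
<configuration>
  <!-- NOT SAFE FOR PRODUCTION: disables stream capability enforcement so
       HBase can start on LocalFileSystem out of the box. -->
  <property>
    <name>hbase.unsafe.stream.capability.enforce</name>
    <value>false</value>
  </property>
  <!-- Both temporary and persistent data land here by default. For
       production, point hbase.rootdir at a durable filesystem that
       supports the required stream capabilities. -->
  <property>
    <name>hbase.tmp.dir</name>
    <value>./tmp</value>
  </property>
</configuration>
```

Because these values live in hbase-site.xml rather than hbase-default.xml, any production deploy that edits its site file (as all must) is unaffected.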





Re: DISCUSS: Move hbase-thrift and hbase-rest out of core to hbase-connectors project?

2020-04-27 Thread Nick Dimiduk
On Fri, Apr 24, 2020 at 22:06 Sean Busbey  wrote:

> By "works with it" do you mean has documented steps to work with it or do
> you mean that the convenience binary that ships for 2.3.0 will have the
> same deployment model as prior 2.y releases where I can run those services
> directly from the download?


I would not suggest packaging hbase-connectors into the hbase(-core)
tarball. Instead, I would recommend some accommodation akin to (or perhaps
simpler than) what we do with hbck2. Perhaps we can add a env variable that
is the path to the unpacked connectors tarball. This can be sniffed in
`bin/hbase` to retain support for the existing `bin/hbase 
` commands.

I do think we’d need to have a release of connectors before we release
core, if this is indeed the desired path.

On Fri, Apr 24, 2020, 16:34 Stack  wrote:
>
> > Taking a sounding
> >
> > We've talked of moving the hbase-rest and hbase-thrift modules out of
> core
> > over to hbase-connectors project [2, 3]. The connectors project [1] was
> > meant for the likes of REST and thrift. I'm thinking of trying to do the
> > move in the next few days BEFORE 2.3.0RC0. Any objections? I'd make a
> > release from hbase-connectors as part of this effort and would make sure
> it
> > works w/ 2.3.0.
> >
> > Thank you,
> > S
> >
> >
> > 1. https://github.com/apache/hbase-connectors
> > 2. https://issues.apache.org/jira/browse/HBASE-20999
> > 3. https://issues.apache.org/jira/browse/HBASE-20998
> >
>


Re: The flakey find jobs are failing

2020-04-27 Thread Nick Dimiduk
Filed https://issues.apache.org/jira/browse/HBASE-24270

On Mon, Apr 27, 2020 at 9:44 AM Nick Dimiduk  wrote:

> On Fri, Apr 24, 2020 at 6:37 PM 张铎(Duo Zhang) 
> wrote:
>
>> On branch-2.2- we do not have these versions, I guess there must be a
>> reason why we introduced these versions?
>>
>
> Yes, the reason for pinning package versions is that it's a Dockerfile
> best practice, as described in the official guide. On later branches,
> hadolint is running against these files and flagged this error. However, it
> seems I made the version numbers too specific, and a more relaxed
> formulation should be applicable.
>
> Nick Dimiduk  wrote on Sat, Apr 25, 2020 at 12:53 AM:
>>
>> > Thanks for the quick fix.
>> >
>> > I wonder -- was I too specific with these version numbers? Is it
>> possible
>> > to loosen them? Maybe drop the debian-revision [0] component?
>> >
>> > [0]: http://manpages.ubuntu.com/manpages/trusty/man5/deb-version.5.html
>> >
>> > On Thu, Apr 23, 2020 at 11:51 PM 张铎(Duo Zhang) 
>> > wrote:
>> >
>> > > OK pre commit is also failing.
>> > >
>> > > Filed HBASE-24251...
>> > >
>> > > 张铎(Duo Zhang)  wrote on Fri, Apr 24, 2020 at 12:07 PM:
>> > >
>> > > > Fail to build docker image?
>> > > >
>> > > >
>> > > >
>> > >
>> >
>> https://builds.apache.org/job/HBASE-Find-Flaky-Tests/job/master/1587/console
>> > > >
>> > > >
>> > > > *23:22:27*  Step 3/4 : RUN DEBIAN_FRONTEND=noninteractive apt-get
>> -qq
>> > -y
>> > > update && DEBIAN_FRONTEND=noninteractive apt-get -qq -y install
>> > > --no-install-recommends   curl=7.58.0-2ubuntu3.8
>> > >  python2.7=2.7.17-1~18.04   python-pip=9.0.1-2.3~ubuntu1.18.04.1
>> > >  python-setuptools=39.0.1-2 && apt-get clean && rm -rf
>> > > /var/lib/apt/lists/**23:22:29*   ---> Running in
>> cd5ccda2f128*23:22:38*
>> > >  [91mE: Version '2.7.17-1~18.04' for 'python2.7' was not found
>> > > >
>> > > > *23:22:38* [0mThe command '/bin/sh -c DEBIAN_FRONTEND=noninteractive
>> > > > apt-get -qq -y update && DEBIAN_FRONTEND=noninteractive apt-get -qq
>> -y
>> > > > install --no-install-recommends curl=7.58.0-2ubuntu3.8
>> > > > python2.7=2.7.17-1~18.04 python-pip=9.0.1-2.3~ubuntu1.18.04.1
>> > > > python-setuptools=39.0.1-2 && apt-get clean && rm -rf
>> > > /var/lib/apt/lists/*'
>> > > > returned a non-zero code: 100
>> > > >
>> > > > Anyone has a chance to take a look?  Seems the hard coded python
>> > version
>> > > > in dev-support/Dockerfile is gone?
>> > > >
>> > >
>> >
>>
>


[jira] [Created] (HBASE-24270) Relax version numbers specified in Dockerfiles

2020-04-27 Thread Nick Dimiduk (Jira)
Nick Dimiduk created HBASE-24270:


 Summary: Relax version numbers specified in Dockerfiles
 Key: HBASE-24270
 URL: https://issues.apache.org/jira/browse/HBASE-24270
 Project: HBase
  Issue Type: Task
  Components: build
Affects Versions: 3.0.0, 2.3.0
Reporter: Nick Dimiduk


It seems the version numbers used in our dockerfiles are too strict. As debian 
maintainers bump internal revisions, the old ones roll off, and our docker 
builds break. We should be less strict on the version numbers, but exactly how 
strict will need a package-by-package evaluation.





Re: The flakey find jobs are failing

2020-04-27 Thread Nick Dimiduk
On Fri, Apr 24, 2020 at 6:37 PM 张铎(Duo Zhang)  wrote:

> On branch-2.2- we do not have these versions, I guess there must be a
> reason why we introduced these versions?
>

Yes, the reason for pinning package versions is that it's a Dockerfile best
practice, as described in the official guide. On later branches, hadolint
is running against these files and flagged this error. However, it seems I
made the version numbers too specific, and a more relaxed formulation
should be applicable.

Nick Dimiduk  wrote on Sat, Apr 25, 2020 at 12:53 AM:
>
> > Thanks for the quick fix.
> >
> > I wonder -- was I too specific with these version numbers? Is it possible
> > to loosen them? Maybe drop the debian-revision [0] component?
> >
> > [0]: http://manpages.ubuntu.com/manpages/trusty/man5/deb-version.5.html
> >
> > On Thu, Apr 23, 2020 at 11:51 PM 张铎(Duo Zhang) 
> > wrote:
> >
> > > OK pre commit is also failing.
> > >
> > > Filed HBASE-24251...
> > >
> > > 张铎(Duo Zhang)  wrote on Fri, Apr 24, 2020 at 12:07 PM:
> > >
> > > > Fail to build docker image?
> > > >
> > > >
> > > >
> > >
> >
> https://builds.apache.org/job/HBASE-Find-Flaky-Tests/job/master/1587/console
> > > >
> > > >
> > > > *23:22:27*  Step 3/4 : RUN DEBIAN_FRONTEND=noninteractive apt-get -qq -y
> > > > update && DEBIAN_FRONTEND=noninteractive apt-get -qq -y install
> > > > --no-install-recommends curl=7.58.0-2ubuntu3.8
> > > > python2.7=2.7.17-1~18.04 python-pip=9.0.1-2.3~ubuntu1.18.04.1
> > > > python-setuptools=39.0.1-2 && apt-get clean && rm -rf /var/lib/apt/lists/*
> > > > *23:22:29*   ---> Running in cd5ccda2f128
> > > > *23:22:38*  E: Version '2.7.17-1~18.04' for 'python2.7' was not found
> > > >
> > > > *23:22:38* The command '/bin/sh -c DEBIAN_FRONTEND=noninteractive
> > > > apt-get -qq -y update && DEBIAN_FRONTEND=noninteractive apt-get -qq -y
> > > > install --no-install-recommends curl=7.58.0-2ubuntu3.8
> > > > python2.7=2.7.17-1~18.04 python-pip=9.0.1-2.3~ubuntu1.18.04.1
> > > > python-setuptools=39.0.1-2 && apt-get clean && rm -rf /var/lib/apt/lists/*'
> > > > returned a non-zero code: 100
> > > >
> > > > Does anyone have a chance to take a look? It seems the hard-coded python
> > > > version in dev-support/Dockerfile is gone?
> > > >
> > >
> >
>


[jira] [Resolved] (HBASE-24143) [JDK11] Switch default garbage collector from CMS

2020-04-27 Thread Nick Dimiduk (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk resolved HBASE-24143.
--
Resolution: Fixed

Addendum applied to all three branches. Nice catch [~busbey].

> [JDK11] Switch default garbage collector from CMS
> -
>
> Key: HBASE-24143
> URL: https://issues.apache.org/jira/browse/HBASE-24143
> Project: HBase
>  Issue Type: Sub-task
>  Components: scripts
>Affects Versions: 3.0.0, 2.3.0
>Reporter: Nick Dimiduk
>    Assignee: Nick Dimiduk
>Priority: Major
> Fix For: 3.0.0, 2.3.0
>
>
> When running HBase tools on the cli, one of the warnings generated is
> {noformat}
> OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in 
> version 9.0 and will likely be removed in a future release.
> {noformat}
> Java9+ use G1GC as the default collector. Maybe we simply omit GC 
> configurations and use the default settings? Or someone has some suggested 
> settings we can ship out of the box?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24269) Use `set -e` in our bin scripts

2020-04-27 Thread Nick Dimiduk (Jira)
Nick Dimiduk created HBASE-24269:


 Summary: Use `set -e` in our bin scripts
 Key: HBASE-24269
 URL: https://issues.apache.org/jira/browse/HBASE-24269
 Project: HBase
  Issue Type: Task
  Components: scripts
Reporter: Nick Dimiduk


We don't {{set -e}} in our bin scripts anywhere, as far as I can tell. We can 
hit an error, whether of our own making or from a user's settings in {{conf}}, 
and our scripts proceed as if all is right in the world.
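The difference `set -e` makes is easy to demonstrate with a toy snippet (illustrative only, not taken from the HBase scripts):

```shell
# Without `set -e` a script carries on past a failing command; with it,
# the shell exits immediately at the first failure.
run() { bash -c "$1; false; echo carried on"; echo "exit: $?"; }

run ':'        # prints "carried on", then "exit: 0"
run 'set -e'   # stops at `false`; prints only "exit: 1"
```

In a bin script, `set -e` (often with `-u` and `-o pipefail`) turns a bad config value or a failed sub-command into a loud, immediate failure instead of silent continuation.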





[jira] [Created] (HBASE-24260) Add a ClusterManager that issues commands via coprocessor

2020-04-24 Thread Nick Dimiduk (Jira)
Nick Dimiduk created HBASE-24260:


 Summary: Add a ClusterManager that issues commands via coprocessor
 Key: HBASE-24260
 URL: https://issues.apache.org/jira/browse/HBASE-24260
 Project: HBase
  Issue Type: New Feature
  Components: integration tests
Reporter: Nick Dimiduk


I have a need to run the chaos monkey in an environment where ssh access is 
restricted. I can get most of what I need via {{RESTApiClusterManager}}, 
however {{kill -9}} is a critical and unsupported command. I have a 
{{ClusterManager}} that implements {{kill}} via an executor coprocessor.





[jira] [Created] (HBASE-24257) Exclude jsr311-api from classpath

2020-04-24 Thread Nick Dimiduk (Jira)
Nick Dimiduk created HBASE-24257:


 Summary: Exclude jsr311-api from classpath
 Key: HBASE-24257
 URL: https://issues.apache.org/jira/browse/HBASE-24257
 Project: HBase
  Issue Type: Task
  Components: build
Affects Versions: 2.3.0
Reporter: Nick Dimiduk


When building on Hadoop 3, we get two incompatible versions of the Java ws-rs 
API on the classpath, both 1.1.1 and 2.0.1. These cause conflicts when running 
the chaos monkey and integration test tools.

{noformat}
[INFO] |  |  +- 
org.apache.hadoop:hadoop-yarn-server-timelineservice:jar:3.0.0-cdh6.3.2:test
[INFO] |  |  |  +- org.apache.commons:commons-csv:jar:1.0:test
[INFO] |  |  |  \- javax.ws.rs:jsr311-api:jar:1.1.1:test
{noformat}

{noformat}
[INFO] org.apache.hbase:hbase-http:jar:2.3.0-nick-SNAPSHOT
[INFO] +- javax.ws.rs:javax.ws.rs-api:jar:2.0.1:compile
{noformat}

The problem looks like

{noformat}
20/04/23 15:36:04 INFO hbase.RESTApiClusterManager: Executing GET against ...
Exception in thread "ChaosMonkey" java.lang.NoSuchMethodError: 'void 
javax.ws.rs.core.MultivaluedMap.addAll(java.lang.Object, java.lang.Object[])'
at 
org.glassfish.jersey.client.ClientRequest.accept(ClientRequest.java:336)
at 
org.glassfish.jersey.client.JerseyWebTarget.request(JerseyWebTarget.java:221)
at 
org.glassfish.jersey.client.JerseyWebTarget.request(JerseyWebTarget.java:59)
at 
org.apache.hadoop.hbase.RESTApiClusterManager.getJsonNodeFromURIGet(RESTApiClusterManager.java:244)
{noformat}
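One way to confirm that both JAX-RS APIs landed on the classpath is to scan the `mvn dependency:tree` output for both artifacts. The check below is illustrative only, using the tree fragments quoted above as canned sample input rather than a live build:

```shell
# Illustrative conflict check (not an HBase build step): if both
# jsr311-api (1.1.1) and javax.ws.rs-api (2.0.1) appear in the dependency
# tree, the classpath carries two incompatible JAX-RS APIs.
deps="$(cat <<'EOF'
[INFO] |  |  |  \- javax.ws.rs:jsr311-api:jar:1.1.1:test
[INFO] +- javax.ws.rs:javax.ws.rs-api:jar:2.0.1:compile
EOF
)"  # in a real build: deps="$(mvn dependency:tree -Dhadoop.profile=3.0)"

if grep -q 'javax.ws.rs:jsr311-api' <<<"$deps" \
   && grep -q 'javax.ws.rs:javax.ws.rs-api' <<<"$deps"; then
  echo "conflict: both jsr311-api and javax.ws.rs-api on the classpath"
fi
```

The fix proposed in the issue title is then an `<exclusion>` on the transitive jsr311-api dependency in the affected poms.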





Re: The flakey find jobs are failing

2020-04-24 Thread Nick Dimiduk
Thanks for the quick fix.

I wonder -- was I too specific with these version numbers? Is it possible
to loosen them? Maybe drop the debian-revision [0] component?

[0]: http://manpages.ubuntu.com/manpages/trusty/man5/deb-version.5.html

On Thu, Apr 23, 2020 at 11:51 PM 张铎(Duo Zhang) 
wrote:

> OK pre commit is also failing.
>
> Filed HBASE-24251...
>
> 张铎(Duo Zhang)  于2020年4月24日周五 下午12:07写道:
>
> > Fail to build docker image?
> >
> >
> >
> https://builds.apache.org/job/HBASE-Find-Flaky-Tests/job/master/1587/console
> >
> >
> > *23:22:27*  Step 3/4 : RUN DEBIAN_FRONTEND=noninteractive apt-get -qq -y
> > update && DEBIAN_FRONTEND=noninteractive apt-get -qq -y install
> > --no-install-recommends curl=7.58.0-2ubuntu3.8
> > python2.7=2.7.17-1~18.04 python-pip=9.0.1-2.3~ubuntu1.18.04.1
> > python-setuptools=39.0.1-2 && apt-get clean && rm -rf /var/lib/apt/lists/*
> > *23:22:29*   ---> Running in cd5ccda2f128
> > *23:22:38*  E: Version '2.7.17-1~18.04' for 'python2.7' was not found
> >
> > *23:22:38* The command '/bin/sh -c DEBIAN_FRONTEND=noninteractive
> > apt-get -qq -y update && DEBIAN_FRONTEND=noninteractive apt-get -qq -y
> > install --no-install-recommends curl=7.58.0-2ubuntu3.8
> > python2.7=2.7.17-1~18.04 python-pip=9.0.1-2.3~ubuntu1.18.04.1
> > python-setuptools=39.0.1-2 && apt-get clean && rm -rf /var/lib/apt/lists/*'
> > returned a non-zero code: 100
> >
> > Does anyone have a chance to take a look? It seems the hard-coded python
> > version in dev-support/Dockerfile is gone?
> >
>


[jira] [Created] (HBASE-24248) AsyncRpcRetryingCaller should include master call details

2020-04-23 Thread Nick Dimiduk (Jira)
Nick Dimiduk created HBASE-24248:


 Summary: AsyncRpcRetryingCaller should include master call details
 Key: HBASE-24248
 URL: https://issues.apache.org/jira/browse/HBASE-24248
 Project: HBase
  Issue Type: Improvement
  Components: IPC/RPC
Affects Versions: 2.3.0
Reporter: Nick Dimiduk


I think the below is a retry loop from a coprocessor execution pointed at a 
master. The log message omits some important call details, such as:
 * hostname and port
 * call id
 * remote (rpc) method name

{noformat}
20/04/23 21:09:26 WARN client.AsyncRpcRetryingCaller: Call to master failed, 
tries = 6, maxAttempts = 10, timeout = 120 ms, time elapsed = 2902 ms
org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: 
org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is not running 
yet
at 
org.apache.hadoop.hbase.master.HMaster.checkServiceStarted(HMaster.java:2873)
at 
org.apache.hadoop.hbase.master.HMaster.checkInitialized(HMaster.java:2885)
at 
org.apache.hadoop.hbase.master.MasterRpcServices.rpcPreCheck(MasterRpcServices.java:438)
at 
org.apache.hadoop.hbase.master.MasterRpcServices.execMasterService(MasterRpcServices.java:882)
at 
org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:393)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)

at 
java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native
 Method)
at 
java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at 
java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
at 
org.apache.hadoop.hbase.ipc.RemoteWithExtrasException.instantiateException(RemoteWithExtrasException.java:99)
at 
org.apache.hadoop.hbase.ipc.RemoteWithExtrasException.unwrapRemoteException(RemoteWithExtrasException.java:89)
at 
org.apache.hadoop.hbase.client.ConnectionUtils.translateException(ConnectionUtils.java:321)
at 
org.apache.hadoop.hbase.client.AsyncRpcRetryingCaller.onError(AsyncRpcRetryingCaller.java:159)
at 
org.apache.hadoop.hbase.client.AsyncMasterRequestRpcRetryingCaller.lambda$null$4(AsyncMasterRequestRpcRetryingCaller.java:73)
at 
org.apache.hadoop.hbase.util.FutureUtils.lambda$addListener$0(FutureUtils.java:68)
at 
java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
at 
java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
at 
java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at 
java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
at 
org.apache.hadoop.hbase.client.MasterCoprocessorRpcChannelImpl$1.run(MasterCoprocessorRpcChannelImpl.java:63)
at 
org.apache.hadoop.hbase.client.MasterCoprocessorRpcChannelImpl$1.run(MasterCoprocessorRpcChannelImpl.java:58)
at 
org.apache.hbase.thirdparty.com.google.protobuf.RpcUtil$1.run(RpcUtil.java:79)
at 
org.apache.hbase.thirdparty.com.google.protobuf.RpcUtil$1.run(RpcUtil.java:70)
at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:378)
at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:87)
at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:407)
at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:403)
at org.apache.hadoop.hbase.ipc.Call.callComplete(Call.java:117)
at org.apache.hadoop.hbase.ipc.Call.setException(Call.java:132)
at 
org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.readResponse(NettyRpcDuplexHandler.java:162)
at 
org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.channelRead(NettyRpcDuplexHandler.java:192)
at 
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377)
at 
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
at 
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:355

[jira] [Resolved] (HBASE-23829) Get `-PrunSmallTests` passing on JDK11

2020-04-21 Thread Nick Dimiduk (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk resolved HBASE-23829.
--
Resolution: Fixed

> Get `-PrunSmallTests` passing on JDK11
> --
>
> Key: HBASE-23829
> URL: https://issues.apache.org/jira/browse/HBASE-23829
> Project: HBase
>  Issue Type: Sub-task
>  Components: test
>    Reporter: Nick Dimiduk
>Assignee: Nick Dimiduk
>Priority: Major
> Fix For: 3.0.0, 2.3.0
>
>
> Start with the small tests, shaking out issues identified by the harness. So 
> far it seems like {{-Dhadoop.profile=3.0}} and 
> {{-Dhadoop-three.version=3.3.0-SNAPSHOT}} may be required.





[jira] [Created] (HBASE-24227) [JDK11] shell fails to launch

2020-04-21 Thread Nick Dimiduk (Jira)
Nick Dimiduk created HBASE-24227:


 Summary: [JDK11] shell fails to launch
 Key: HBASE-24227
 URL: https://issues.apache.org/jira/browse/HBASE-24227
 Project: HBase
  Issue Type: Sub-task
  Components: shell
Affects Versions: 3.0.0
Reporter: Nick Dimiduk


{noformat}
$ JAVA_HOME=/Library/Java/JavaVirtualMachines/adoptopenjdk-11.jdk/Contents/Home 
./bin/hbase shell
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by 
org.apache.hadoop.hbase.util.UnsafeAvailChecker 
(file:/Users/ndimiduk/repos/apache/hbase/hbase-common/target/hbase-common-3.0.0-SNAPSHOT.jar)
 to method java.nio.Bits.unaligned()
WARNING: Please consider reporting this to the maintainers of 
org.apache.hadoop.hbase.util.UnsafeAvailChecker
WARNING: Use --illegal-access=warn to enable warnings of further illegal 
reflective access operations
WARNING: All illegal access operations will be denied in a future release
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
For Reference, please visit: http://hbase.apache.org/book.html#shell
Version 3.0.0-SNAPSHOT, rbcacc4ce939e60fd69891df6315a39aef852b567, Tue Apr 21 
15:47:27 PDT 2020
Took 0. seconds 


ArgumentError: wrong number of arguments (1 for 0)
   ` at uri:classloader:/jruby/kernel/jruby/process_manager.rb:32
   ` at uri:classloader:/jruby/kernel/jruby/process_manager.rb:54
  initialize at 
/Users/ndimiduk/repos/apache/hbase/hbase-shell/src/main/ruby/irb/hirb.rb:46
   start at /Users/ndimiduk/repos/apache/hbase/bin/../bin/hirb.rb:207
   at /Users/ndimiduk/repos/apache/hbase/bin/../bin/hirb.rb:219
{noformat}





[jira] [Created] (HBASE-24224) Update GC config examples in hbase-env.sh for G1GC

2020-04-21 Thread Nick Dimiduk (Jira)
Nick Dimiduk created HBASE-24224:


 Summary: Update GC config examples in hbase-env.sh for G1GC
 Key: HBASE-24224
 URL: https://issues.apache.org/jira/browse/HBASE-24224
 Project: HBase
  Issue Type: Sub-task
  Components: scripts
Affects Versions: 3.0.0, 2.3.0
Reporter: Nick Dimiduk


Anywhere we changed the default collector to G1GC, we should also update 
{{hbase-env.sh}} to include GC configuration examples appropriate for that 
collector and JVM. This means, where we support both JDK8 and JDK11, we'll have 
two sections of examples, one for CMS and the other for G1.





Re: Build failed in Jenkins: HBase-Nightly-ARM #81

2020-04-17 Thread Nick Dimiduk
Why are these going to dev@ ? Don't we have a builds@ list for this purpose?

Thanks,
Nick

On Thu, Apr 16, 2020 at 7:38 PM Apache Jenkins Server <
jenk...@builds.apache.org> wrote:

> See <
> https://builds.apache.org/job/HBase-Nightly-ARM/81/display/redirect?page=changes
> >
>
> Changes:
>
> [stack] HBASE-24175 [Flakey Tests] TestSecureExportSnapshot
>
> [github] HBASE-24197 TestHttpServer.testBindAddress failure with latest
> jetty
>
> [github] HBASE-24170 Remove hadoop-2.0 profile (#1495)
>
> [github] HBASE-24194 : Refactor anonymous inner classes of
> BufferedEncodedSeeker
>
> [github] HBASE-24186: RegionMover ignores replicationId (#1512)
>
> [github] HBASE-24195 : Admin.getRegionServers() should return live servers
> exc…
>
> [stack] HBASE-24158 [Flakey Tests] TestAsyncTableGetMultiThreaded
>
>
> --
> [...truncated 421.84 KB...]
> [INFO]
> [INFO] --- maven-install-plugin:2.5.2:install (default-install) @ hbase-client-project ---
> [INFO] Installing <
> https://builds.apache.org/job/HBase-Nightly-ARM/ws/hbase-archetypes/hbase-client-project/target/hbase-client-project-3.0.0-SNAPSHOT.jar>
> to
> /home/jenkins/.m2/repository/org/apache/hbase/hbase-client-project/3.0.0-SNAPSHOT/hbase-client-project-3.0.0-SNAPSHOT.jar
> [INFO] Installing <
> https://builds.apache.org/job/HBase-Nightly-ARM/ws/hbase-archetypes/hbase-client-project/pom.xml>
> to
> /home/jenkins/.m2/repository/org/apache/hbase/hbase-client-project/3.0.0-SNAPSHOT/hbase-client-project-3.0.0-SNAPSHOT.pom
> [INFO] Installing <
> https://builds.apache.org/job/HBase-Nightly-ARM/ws/hbase-archetypes/hbase-client-project/target/hbase-client-project-3.0.0-SNAPSHOT-tests.jar>
> to
> /home/jenkins/.m2/repository/org/apache/hbase/hbase-client-project/3.0.0-SNAPSHOT/hbase-client-project-3.0.0-SNAPSHOT-tests.jar
> [INFO]
> [INFO] --< org.apache.hbase:hbase-shaded-client-project >--
> [INFO] Building Apache HBase - Exemplar for
> hbase-shaded-client archetype 3.0.0-SNAPSHOT [40/41]
> [INFO] --------------------------------[ jar ]---------------------------------
> [INFO]
> [INFO] --- maven-clean-plugin:3.1.0:clean (default-clean) @ hbase-shaded-client-project ---
> [INFO] Deleting <
> https://builds.apache.org/job/HBase-Nightly-ARM/ws/hbase-archetypes/hbase-shaded-client-project/target
> >
> [INFO]
> [INFO] --- build-helper-maven-plugin:3.0.0:bsh-property (negate-license-bundles-property) @ hbase-shaded-client-project ---
> [INFO]
> [INFO] --- build-helper-maven-plugin:3.0.0:regex-property (create-license-file-path-property) @ hbase-shaded-client-project ---
> [INFO] No match to regex '\\' found in '<
> https://builds.apache.org/job/HBase-Nightly-ARM/ws/hbase-archetypes/hbase-shaded-client-project/target/maven-shared-archive-resources/META-INF/LICENSE'.>
> The initial value '<
> https://builds.apache.org/job/HBase-Nightly-ARM/ws/hbase-archetypes/hbase-shaded-client-project/target/maven-shared-archive-resources/META-INF/LICENSE'>
> is left as-is...
> [INFO]
> [INFO] --- maven-enforcer-plugin:3.0.0-M2:enforce (enforce-maven-version) @ hbase-shaded-client-project ---
> [INFO]
> [INFO] --- maven-enforcer-plugin:3.0.0-M2:enforce (hadoop-profile-min-maven-min-java-banned-xerces) @ hbase-shaded-client-project ---
> [INFO]
> [INFO] --- maven-enforcer-plugin:3.0.0-M2:enforce (banned-jsr305) @ hbase-shaded-client-project ---
> [INFO]
> [INFO] --- maven-enforcer-plugin:3.0.0-M2:enforce (banned-scala) @ hbase-shaded-client-project ---
> [INFO]
> [INFO] --- buildnumber-maven-plugin:1.4:create-timestamp (default) @ hbase-shaded-client-project ---
> [INFO]
> [INFO] --- maven-enforcer-plugin:3.0.0-M2:enforce (banned-illegal-imports) @ hbase-shaded-client-project ---
> [INFO]
> [INFO] --- maven-remote-resources-plugin:1.6.0:process (process-resource-bundles) @ hbase-shaded-client-project ---
> [INFO] Preparing remote bundle org.apache:apache-jar-resource-bundle:1.4
> [INFO] Copying 3 resources from 1 bundle.
> [INFO]
> [INFO] --- maven-resources-plugin:3.1.0:resources (default-resources) @ hbase-shaded-client-project ---
> [INFO] Using 'UTF-8' encoding to copy filtered resources.
> 

Re: [DISCUSS] New User Experience and Data Durability Guarantees on LocalFileSystem (HBASE-24086)

2020-04-17 Thread Nick Dimiduk
On Fri, Apr 17, 2020 at 3:31 PM Stack  wrote:

> On writing to the local 'tmp' dir, that's fine, but quickstart was always
> supposed to be a transient install (one example of setting config is
> the setting of the tmp location). The messaging that this is the case needs
> an edit after a re-read (I volunteer to do this and to give the refguide a
> once-over on the lack of guarantees when an hbase deploy is unconfigured).
>

Sounds like you're reading my handiwork, pushed in HBASE-24106. I'm
definitely open to editing help, yes please! Before that change, the Quick
Start section required the user to set hbase.rootdir,
hbase.zookeeper.property.dataDir, and
hbase.unsafe.stream.capability.enforce before they could start the
local process.

> Can we have the start-out-of-the-box back please? It's a PITA having to go
> edit config when running a local build to test something, never mind the
> poor noob whose first experience is a fail.
>

I agree.

The conclusion I understand from this thread looks something like this:

1. revert HBASE-24086, make it so that running on `LocalFileSystem` is a
fatal condition with default configs.
2. ship a conf/hbase-site.xml that contains
hbase.unsafe.stream.capability.enforce=false, along with a big comment
saying this is not safe for production.
3. ship a conf/hbase-site.xml that contains hbase.tmp.dir=./tmp, along with
a comment saying herein you'll find temporary and persistent data,
reconfigure for production with hbase.rootdir pointed to a durable
filesystem that supports our required stream capabilities (see above).
4. update HBASE-24106 as appropriate.
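Points 2 and 3 could be sketched as a generated `conf/hbase-site.xml` along these lines (property names and values come from the thread; the comment wording and the generation step are illustrative, not an agreed artifact):

```shell
# Sketch of the proposed shipped conf/hbase-site.xml, written to a temp
# file for illustration.
f="$(mktemp)"
cat > "$f" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- NOT SAFE FOR PRODUCTION: permits running on LocalFileSystem, which
       cannot guarantee durable WAL writes. -->
  <property>
    <name>hbase.unsafe.stream.capability.enforce</name>
    <value>false</value>
  </property>
  <!-- Both temporary and persistent data live here. For production, point
       hbase.rootdir at a durable filesystem that supports the required
       stream capabilities. -->
  <property>
    <name>hbase.tmp.dir</name>
    <value>./tmp</value>
  </property>
</configuration>
EOF
grep -c '<property>' "$f"   # prints 2
```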

Neither 2 nor 3 is suitable for production deployments, so the changes
do not go into hbase-default.xml. Anyone standing up a production deploy
must edit hbase-site.xml anyway, so this doesn't change anything. It also
restores our "simple" first-time user experience of not needing to run
anything besides `bin/start-hbase.sh` (or `bin/hbase master start`, or
whatever it is we're telling people these days).

We can reassess this once more when a durable equivalent to LocalFileSystem
comes along.

Thanks,
Nick


[jira] [Resolved] (HBASE-23875) Add JDK11 compilation and unit test support to Jira attached patch precommit

2020-04-17 Thread Nick Dimiduk (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk resolved HBASE-23875.
--
  Assignee: (was: Nick Dimiduk)
Resolution: Won't Fix

Resolving as won't fix. Code contribution via patches attached to Jira is no 
longer supported.

> Add JDK11 compilation and unit test support to Jira attached patch precommit
> 
>
> Key: HBASE-23875
> URL: https://issues.apache.org/jira/browse/HBASE-23875
> Project: HBase
>  Issue Type: Sub-task
>  Components: build
>    Reporter: Nick Dimiduk
>Priority: Major
>
> We already test against multiple JDK versions in a handful of places. Let's 
> get JDK11 added to the mix. Applies to Jira attached-patch pre-commit.





[jira] [Resolved] (HBASE-24143) [JDK11] Switch default garbage collector from CMS

2020-04-17 Thread Nick Dimiduk (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk resolved HBASE-24143.
--
Fix Version/s: 2.3.0
   3.0.0
 Release Note: 

`bin/hbase` will now dynamically select a garbage collector implementation 
based on the detected JVM version. JDKs 8, 9, and 10 use `-XX:+UseConcMarkSweepGC`, 
while JDK11+ uses `-XX:+UseG1GC`.

Note a slight compatibility change. Previously, the garbage collector choice 
was always appended to a user-provided value for `HBASE_OPTS`. As of this 
change, the setting is applied only when `HBASE_OPTS` is unset. Operators who 
provide a value for this variable now need to specify the collector themselves. 
This is especially important on JDK8, where the JVM's default GC is not the 
recommended ConcMarkSweep collector.
   Resolution: Fixed
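A minimal sketch of the version-based selection described in the release note (the `pick_gc` helper and the major-version argument are hypothetical; the actual `bin/hbase` logic may differ):

```shell
# Hypothetical sketch: choose a GC flag from the major Java version.
# JDKs 8-10 -> CMS; JDK 11+ -> G1. Per the release note, the result would
# only be applied when HBASE_OPTS is unset.
pick_gc() {
  local major="$1"
  if [ "$major" -ge 11 ]; then
    echo '-XX:+UseG1GC'
  else
    echo '-XX:+UseConcMarkSweepGC'
  fi
}

pick_gc 8    # -> -XX:+UseConcMarkSweepGC
pick_gc 11   # -> -XX:+UseG1GC
```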

> [JDK11] Switch default garbage collector from CMS
> -
>
> Key: HBASE-24143
> URL: https://issues.apache.org/jira/browse/HBASE-24143
> Project: HBase
>  Issue Type: Sub-task
>  Components: scripts
>Affects Versions: 3.0.0, 2.3.0
>Reporter: Nick Dimiduk
>    Assignee: Nick Dimiduk
>Priority: Major
> Fix For: 3.0.0, 2.3.0
>
>
> When running HBase tools on the cli, one of the warnings generated is
> {noformat}
> OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in 
> version 9.0 and will likely be removed in a future release.
> {noformat}
> Java9+ use G1GC as the default collector. Maybe we simply omit GC 
> configurations and use the default settings? Or someone has some suggested 
> settings we can ship out of the box?





[jira] [Created] (HBASE-24209) Add Hadoop-3.3.0-SNAPSHOT to hadoopcheck in our yetus personality

2020-04-17 Thread Nick Dimiduk (Jira)
Nick Dimiduk created HBASE-24209:


 Summary: Add Hadoop-3.3.0-SNAPSHOT to hadoopcheck in our yetus 
personality
 Key: HBASE-24209
 URL: https://issues.apache.org/jira/browse/HBASE-24209
 Project: HBase
  Issue Type: Task
  Components: build
Reporter: Nick Dimiduk


Since HBASE-23833 we're paying attention to our builds on Hadoop trunk, 
currently 3.3.0-SNAPSHOT. Let's add this version to the version lists in 
hadoopcheck so our CI will let us know when things break, at least at compile 
time.





[jira] [Created] (HBASE-24202) Nightly CI's USE_YETUS_PRERELEASE feature is broken

2020-04-16 Thread Nick Dimiduk (Jira)
Nick Dimiduk created HBASE-24202:


 Summary: Nightly CI's USE_YETUS_PRERELEASE feature is broken
 Key: HBASE-24202
 URL: https://issues.apache.org/jira/browse/HBASE-24202
 Project: HBase
  Issue Type: Task
  Components: build
Reporter: Nick Dimiduk


I tried kicking off a 
[build|https://builds.apache.org/blue/organizations/jenkins/HBase%20Nightly/detail/master/1701/pipeline/]
 with the latest Yetus today, only to find

{noformat}
[2020-04-16T20:54:09.742Z] bash: 
/home/jenkins/jenkins-slave/workspace/HBase_Nightly_master/yetus-git/precommit/test-patch.sh:
 No such file or directory
{noformat}

Looks like we don't adjust the path to test-patch.sh based on working from a 
source tarball vs. a release tarball.





[jira] [Created] (HBASE-24201) Fix PR builds on branch-2.2

2020-04-16 Thread Nick Dimiduk (Jira)
Nick Dimiduk created HBASE-24201:


 Summary: Fix PR builds on branch-2.2
 Key: HBASE-24201
 URL: https://issues.apache.org/jira/browse/HBASE-24201
 Project: HBase
  Issue Type: Task
  Components: build
Affects Versions: 2.2.5
Reporter: Nick Dimiduk


From a recent [PR 
build|https://builds.apache.org/blue/organizations/jenkins/HBase-PreCommit-GitHub-PR/detail/PR-1532/1/pipeline/]

{noformat}
[2020-04-16T18:43:21.548Z] Setting up ruby2.3 (2.3.3-1+deb9u7) ...
[2020-04-16T18:43:21.548Z] Setting up ruby2.3-dev:amd64 (2.3.3-1+deb9u7) ...
[2020-04-16T18:43:21.548Z] Setting up ruby-dev:amd64 (1:2.3.3) ...
[2020-04-16T18:43:21.548Z] Setting up ruby (1:2.3.3) ...
[2020-04-16T18:43:22.261Z] Processing triggers for libc-bin (2.24-11+deb9u3) ...
[2020-04-16T18:43:22.975Z] Successfully installed rake-13.0.1
[2020-04-16T18:43:22.975Z] Building native extensions.  This could take a 
while...
[2020-04-16T18:43:25.277Z] ERROR:  Error installing rubocop:
[2020-04-16T18:43:25.277Z]  rubocop requires Ruby version >= 2.4.0.
{noformat}

Looks like the Dockerfile on branch-2.2 has bit-rotted. I suspect package versions 
are partially pinned or not pinned at all: the rubocop version has incremented 
but the ruby version has not.





Re: Want to join slack channel for HBase

2020-04-16 Thread Nick Dimiduk
Invitation sent.

On Thu, Apr 16, 2020 at 10:12 AM shishir Rai 
wrote:

> Hi,
> I am a principal software developer working on HBase.I want to join slack
> channel
> to get quick resolutions regarding HBase queries .
> Please send me an invite to join the channel.
>
> --
> SHISHIR RAI
>


[jira] [Created] (HBASE-24200) Upgrade to Yetus 0.12.0

2020-04-16 Thread Nick Dimiduk (Jira)
Nick Dimiduk created HBASE-24200:


 Summary: Upgrade to Yetus 0.12.0
 Key: HBASE-24200
 URL: https://issues.apache.org/jira/browse/HBASE-24200
 Project: HBase
  Issue Type: Task
  Components: build
Reporter: Nick Dimiduk


A new Yetus release is imminent.





Re: [DISCUSS] New User Experience and Data Durability Guarantees on LocalFileSystem (HBASE-24086)

2020-04-15 Thread Nick Dimiduk
On Wed, Apr 15, 2020 at 10:28 AM Andrew Purtell  wrote:

> Nick's mail doesn't make a distinction between avoiding data loss via
> typical tmp cleaner configurations, unfortunately adjacent to mention of
> "durability", and real data durability, which implies more than what a
> single system configuration can offer, no matter how many tweaks we make to
> LocalFileSystem. Maybe I'm being pedantic but this is something to be
> really clear about IMHO.
>

I prefer to focus the attention of this thread to the question of data
durability via `FileSystem` characteristics. I agree that there are
concerns of durability (and others) around the use of the path under /tmp.
Let's keep that discussion in the other thread.

On Wed, Apr 15, 2020 at 10:05 AM Sean Busbey  wrote:
>
> > I think the first assumption no longer holds. Especially with the move
> > to flexible compute environments I regularly get asked by folks what
> > the smallest HBase they can start with for production. I can keep
> > saying 3/5/7 nodes or whatever but I guarantee there are folks who
> > want to and will run HBase with a single node. Probably those
> > deployments won't want to have the distributed flag set. None of them
> > really have a good option for where the WALs go, and failing loud when
> > they try to go to LocalFileSystem is the best option I've seen so far
> > to make sure folks realize they are getting into muddy waters.
> >
> > I agree with the second assumption. Our quickstart in general is too
> > complicated. Maybe if we include big warnings in the guide itself, we
> > could make a quickstart specific artifact to download that has the
> > unsafe disabling config in place?
> >
> > Last fall I toyed with the idea of adding an "hbase-local" module to
> > the hbase-filesystem repo that could start us out with some
> > optimizations for single node set ups. We could start with a fork of
> > RawLocalFileSystem (which will call OutputStream flush operations in
> > response to hflush/hsync) that properly advertises its
> > StreamCapabilities to say that it supports the operations we need.
> > Alternatively we could make our own implementation of FileSystem that
> > uses NIO stuff. Either of these approaches would solve both problems.
> >
> > On Wed, Apr 15, 2020 at 11:40 AM Nick Dimiduk 
> wrote:
> > >
> > > Hi folks,
> > >
> > > I'd like to bring up the topic of the experience of new users as it
> > > pertains to use of the `LocalFileSystem` and its associated (lack of)
> > data
> > > durability guarantees. By default, an unconfigured HBase runs with its
> > root
> > > directory on a `file:///` path. This path is picked up as an instance of
> > > `LocalFileSystem`. Hadoop has long offered this class, but it has never
> > > supported `hsync` or `hflush` stream characteristics. Thus, when HBase
> > runs
> > > on this configuration, it is unable to ensure that WAL writes are
> > durable,
> > > and thus will ACK a write without this assurance. This is the case,
> even
> > > when running in a fully durable WAL mode.
> > >
> > > This impacts a new user, someone kicking the tires on HBase following
> our
> > > Getting Started docs. On Hadoop 2.8 and before, an unconfigured HBase will
> > > WARN and carry on. On Hadoop 2.10+, HBase will refuse to start. The book
> > > describes a process of disabling enforcement of stream capability
> > > enforcement as a first step. This is a mandatory configuration for
> > running
> > > HBase directly out of our binary distribution.
> > >
> > > HBASE-24086 restores the behavior on Hadoop 2.10+ to that of running on
> > > 2.8: log a warning and carry on. The critique of this approach is that it's
> > > far too subtle, too quiet for a system operating in a state known to
> not
> > > provide data durability.
> > >
> > > I have two assumptions/concerns around the state of things, which
> > prompted
> > > my solution on HBASE-24086 and the associated doc update on
> HBASE-24106.
> > >
> > > 1. No one should be running a production system on `LocalFileSystem`.
> > >
> > > The initial implementation checked both for `LocalFileSystem` and
> > > `hbase.cluster.distributed`. When running on the former and the latter
> is
> > > false, we assume the user is running a non-production deployment and
> > carry
> > > on with the warning. When the latter is true, we assume the user
> > intended a
> > > production deployment and 

Re: [DISCUSS] New User Experience and Data Durability Guarantees on LocalFileSystem (HBASE-24086)

2020-04-15 Thread Nick Dimiduk
On Wed, Apr 15, 2020 at 10:05 AM Sean Busbey  wrote:

> I think the first assumption no longer holds. Especially with the move
> to flexible compute environments I regularly get asked by folks what
> the smallest HBase they can start with for production. I can keep
> saying 3/5/7 nodes or whatever but I guarantee there are folks who
> want to and will run HBase with a single node. Probably those
> deployments won't want to have the distributed flag set. None of them
> really have a good option for where the WALs go, and failing loud when
> they try to go to LocalFileSystem is the best option I've seen so far
> to make sure folks realize they are getting into muddy waters.
>

I think this is where we disagree. My answer to this same question is 12
nodes: 3 "coordinator" hosts for HA ZK, HDFS, and HBase master + 9 "worker"
hosts for replicated data serving and storage. Tweak the number of workers
and the replication factor if you like, but that's how you get a durable,
available deployment suitable for an online production solution. Anything
smaller than this and you're in the "muddy waters" of under-replicated
distributed system failure domains.

I agree with the second assumption. Our quickstart in general is too
> complicated. Maybe if we include big warnings in the guide itself, we
> could make a quickstart specific artifact to download that has the
> unsafe disabling config in place?
>

I'm not a fan of the dedicated artifact as a binary tarball. I think that
approach fractures the brand of our product and emphasizes the idea that
it's even more complicated. If we want a dedicated quick start experience,
I would advocate investing the resources into something more like a
learning laboratory that is accompanied with a runtime image in a VM or
container.

Last fall I toyed with the idea of adding an "hbase-local" module to
> the hbase-filesystem repo that could start us out with some
> optimizations for single node set ups. We could start with a fork of
> RawLocalFileSystem (which will call OutputStream flush operations in
> response to hflush/hsync) that properly advertises its
> StreamCapabilities to say that it supports the operations we need.
> Alternatively we could make our own implementation of FileSystem that
> uses NIO stuff. Either of these approaches would solve both problems.
>

I find this approach more palatable than a custom quick start binary
tarball.
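
The fork Sean describes could be sketched as follows. This is an illustrative model only: Hadoop's `StreamCapabilities` interface is stubbed locally so the example is self-contained, and the class names are hypothetical, not anything in hbase-filesystem.

```java
// Hedged sketch of the "hbase-local" idea: a RawLocalFileSystem-style stream
// that advertises hflush/hsync support. The real fork would implement
// org.apache.hadoop.fs.StreamCapabilities on the streams returned by create().
interface StreamCapabilities {
  boolean hasCapability(String capability);
}

/** Stand-in for the stock local stream: advertises no capabilities. */
class StockLocalOutputStream implements StreamCapabilities {
  @Override public boolean hasCapability(String capability) {
    return false;
  }
}

/** Hypothetical fork: since RawLocalFileSystem already calls flush() in
 *  response to hflush/hsync, the missing piece is merely advertising it. */
class DurableLocalOutputStream implements StreamCapabilities {
  @Override public boolean hasCapability(String capability) {
    return "hflush".equals(capability) || "hsync".equals(capability);
  }
}
```

With such a stream, HBase's existing capability enforcement would pass on a single-node setup without any configuration override.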

On Wed, Apr 15, 2020 at 11:40 AM Nick Dimiduk  wrote:
> >
> > Hi folks,
> >
> > I'd like to bring up the topic of the experience of new users as it
> > pertains to use of the `LocalFileSystem` and its associated (lack of)
> data
> > durability guarantees. By default, an unconfigured HBase runs with its
> root
> > directory on a `file:///` path. This path is picked up as an instance of
> > `LocalFileSystem`. Hadoop has long offered this class, but it has never
> > supported `hsync` or `hflush` stream characteristics. Thus, when HBase
> runs
> > on this configuration, it is unable to ensure that WAL writes are
> durable,
> > and thus will ACK a write without this assurance. This is the case, even
> > when running in a fully durable WAL mode.
> >
> > This impacts a new user, someone kicking the tires on HBase following our
> > Getting Started docs. On Hadoop 2.8 and before, an unconfigured HBase
> will
> > WARN and carry on. On Hadoop 2.10+, HBase refuses to start. The book
> > describes disabling stream capability enforcement as a first step.
> > This is a mandatory configuration for
> running
> > HBase directly out of our binary distribution.
> >
> > HBASE-24086 restores the behavior on Hadoop 2.10+ to that of running on
> > 2.8: log a warning and carry on. The critique of this approach is that
> it's
> > far too subtle, too quiet for a system operating in a state known to not
> > provide data durability.
> >
> > I have two assumptions/concerns around the state of things, which
> prompted
> > my solution on HBASE-24086 and the associated doc update on HBASE-24106.
> >
> > 1. No one should be running a production system on `LocalFileSystem`.
> >
> > The initial implementation checked both for `LocalFileSystem` and
> > `hbase.cluster.distributed`. When running on the former and the latter is
> > false, we assume the user is running a non-production deployment and
> carry
> > on with the warning. When the latter is true, we assume the user
> intended a
> > production deployment and the process terminates due to stream capability
> > enforcement. Subsequent code review resulted in skipping the
> > `hbase.cluster.distributed` check and simply warning, as was done on 2.8
> > and earlier.

[DISCUSS] Change the Location of hbase.rootdir to improve the Quick Start User Experience (was Re: [DISCUSS] New User Experience and Data Durability Guarantees on LocalFileSystem (HBASE-24086))

2020-04-15 Thread Nick Dimiduk
Branching off this subject from the original thread.

On Wed, Apr 15, 2020 at 9:56 AM Andrew Purtell 
wrote:

> Quick Start and Production are exclusive configurations.
>

Yes absolutely.

Quick Start, as you say, should have as few steps to get up and running as
> possible.
>
> Production requires a real distributed filesystem for persistence and that
> means HDFS and that means, whatever the provisioning and deployment and
> process management (Ambari or k8s or...) choices are going to be, they will
> not be a Quick Start.
>
> We’ve always had this problem. The Quick Start simply can’t produce a
> system capable of durability because prerequisites for durability are not
> quick to set up.
>

Is this exclusively due to the implementation of `LocalFileSystem` or are
there other issues at play? I've seen there's also `RawLocalFileSystem` but
I haven't investigated their relationship, its capabilities, or whether we
might profit from its use for the Quick Start experience.

Specifically about /tmp...  I agree that’s not a good default. Time and
> again I’ve heard people complain that the tmp cleaner has removed their
> test data. It shouldn’t be surprising but is and that is real feedback on
> mismatch of user expectation to what we are providing in that
> configuration. Addressing this aspect of the Quick Start experience would
> be a simple change: make the default a new directory in $HOME, perhaps
> “.hbase” .
>

I propose changing the default value of `hbase.tmp.dir` as shipped in the
default hbase-site.xml to be simply `tmp`, as I documented in my change on
HBASE-24106. That way it's not hidden somewhere, and it's self-contained
within the unpacked source/binary distribution. I.e., there's no need to
worry about upgrading the data stored there when a user experiments with a
new version.
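
A sketch of what that default could look like in the shipped hbase-site.xml. The property name `hbase.tmp.dir` is real; treat the exact value and wording as illustrative of the proposal, not a committed change.

```xml
<!-- Proposed default: keep temporary data (and, in the default configuration,
     hbase.rootdir, which derives from hbase.tmp.dir) inside the unpacked
     distribution rather than under /tmp, where the tmp cleaner can delete it. -->
<property>
  <name>hbase.tmp.dir</name>
  <value>tmp</value>
  <description>A path relative to the directory HBase is started from. It
    survives reboots and /tmp cleaners, and is discarded along with the
    unpacked tarball when the user is done experimenting.</description>
</property>
```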

> On Apr 15, 2020, at 9:40 AM, Nick Dimiduk  wrote:
> >
> > Hi folks,
> >
> > I'd like to bring up the topic of the experience of new users as it
> > pertains to use of the `LocalFileSystem` and its associated (lack of)
> data
> > durability guarantees. By default, an unconfigured HBase runs with its
> root
> > directory on a `file:///` path. This path is picked up as an instance of
> > `LocalFileSystem`. Hadoop has long offered this class, but it has never
> > supported `hsync` or `hflush` stream characteristics. Thus, when HBase
> runs
> > on this configuration, it is unable to ensure that WAL writes are
> durable,
> > and thus will ACK a write without this assurance. This is the case, even
> > when running in a fully durable WAL mode.
> >
> > This impacts a new user, someone kicking the tires on HBase following our
> > Getting Started docs. On Hadoop 2.8 and before, an unconfigured HBase
> will
> > WARN and carry on. On Hadoop 2.10+, HBase refuses to start. The book
> > describes disabling stream capability enforcement as a first step.
> > This is a mandatory configuration for
> running
> > HBase directly out of our binary distribution.
> >
> > HBASE-24086 restores the behavior on Hadoop 2.10+ to that of running on
> > 2.8: log a warning and carry on. The critique of this approach is that
> it's
> > far too subtle, too quiet for a system operating in a state known to not
> > provide data durability.
> >
> > I have two assumptions/concerns around the state of things, which
> prompted
> > my solution on HBASE-24086 and the associated doc update on HBASE-24106.
> >
> > 1. No one should be running a production system on `LocalFileSystem`.
> >
> > The initial implementation checked both for `LocalFileSystem` and
> > `hbase.cluster.distributed`. When running on the former and the latter is
> > false, we assume the user is running a non-production deployment and
> carry
> > on with the warning. When the latter is true, we assume the user
> intended a
> > production deployment and the process terminates due to stream capability
> > enforcement. Subsequent code review resulted in skipping the
> > `hbase.cluster.distributed` check and simply warning, as was done on 2.8
> > and earlier.
> >
> > (As I understand it, we've long used the `hbase.cluster.distributed`
> > configuration to decide if the user intends this runtime to be a
> production
> > deployment or not.)
> >
> > Is this a faulty assumption? Is there a use-case we support where we
> > condone running production deployment on the non-durable
> `LocalFileSystem`?
> >
> > 2. The Quick Start experience should require no configuration at all.
> >
> > Our stack is difficult enough to run in a fully durable production
> > environment. We should make it a pr

Re: [DISCUSS] Arrange Events for 10-year Anniversary

2020-04-15 Thread Nick Dimiduk
On Wed, Apr 15, 2020 at 9:25 AM Vladimir Rodionov 
wrote:

> 2020 - 10 = 2010. As far as I remember I joined HBase community in 2009 :)
> and I am pretty sure that Mr. Stack did it even earlier.
>

IIRC, 2010 is when HBase graduated from being a Hadoop sub-project and
became an Apache Top-Level Project.

On Wed, Apr 15, 2020 at 5:57 AM Yu Li  wrote:
>
> > Dear all,
> >
> > Since our project has reached its 10th birthday, and 10 years is
> definitely
> > a great milestone, I propose to arrange some special (virtual) events for
> > celebration. What comes into my mind include:
> >
> > * Open threads to collect voices from our dev/user mailing list, like
> "what
> > do you want to say to HBase for its 10th birthday" (as well as our
> twitter
> > accounts maybe, if any)
> >
> > * Arrange some online interviews to both PMC members and our customers.
> > Some of us have been in this project all the way and there must be some
> > good stories to tell, as well as expectations for the future.
> >
> > * Join the Apache Feathercast as suggested in another thread.
> >
> > * Form a blogpost to include all above events as an official celebration.
> >
> > What do you think? Any other good ideas? Looking forward to more voices
> > (smile).
> >
> > Best Regards,
> > Yu
> >
>


[DISCUSS] New User Experience and Data Durability Guarantees on LocalFileSystem (HBASE-24086)

2020-04-15 Thread Nick Dimiduk
Hi folks,

I'd like to bring up the topic of the experience of new users as it
pertains to use of the `LocalFileSystem` and its associated (lack of) data
durability guarantees. By default, an unconfigured HBase runs with its root
directory on a `file:///` path. This path is picked up as an instance of
`LocalFileSystem`. Hadoop has long offered this class, but it has never
supported `hsync` or `hflush` stream characteristics. Thus, when HBase runs
on this configuration, it is unable to ensure that WAL writes are durable,
and thus will ACK a write without this assurance. This is the case, even
when running in a fully durable WAL mode.
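
The capability check described above can be sketched like this. It is an illustrative model, not HBase's actual code: Hadoop's `StreamCapabilities` interface is stubbed locally so the example is self-contained, and `WalDurabilityCheck` is a hypothetical helper name.

```java
// Sketch: before trusting a WAL, ask the output stream whether it supports
// hflush (push to datanodes) and hsync (persist to disk). A stream that does
// not implement StreamCapabilities cannot be asked, so assume non-durable.
interface StreamCapabilities {
  boolean hasCapability(String capability);
}

/** Example stream that does advertise both capabilities. */
final class DurableStream implements StreamCapabilities {
  @Override public boolean hasCapability(String capability) {
    return "hflush".equals(capability) || "hsync".equals(capability);
  }
}

final class WalDurabilityCheck {
  /** A WAL is only durable if the stream can both hflush and hsync. */
  static boolean canGuaranteeDurability(Object stream) {
    if (!(stream instanceof StreamCapabilities)) {
      return false; // LocalFileSystem-style stream: no durability contract
    }
    StreamCapabilities caps = (StreamCapabilities) stream;
    return caps.hasCapability("hflush") && caps.hasCapability("hsync");
  }
}
```

Streams from `LocalFileSystem` fail this check, which is why an unconfigured HBase cannot ACK writes with a durability guarantee.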

This impacts a new user, someone kicking the tires on HBase following our
Getting Started docs. On Hadoop 2.8 and before, an unconfigured HBase will
WARN and carry on. On Hadoop 2.10+, HBase refuses to start. The book
describes disabling stream capability enforcement as a first step. This
is a mandatory configuration for running
HBase directly out of our binary distribution.

HBASE-24086 restores the behavior on Hadoop 2.10+ to that of running on
2.8: log a warning and carry on. The critique of this approach is that it's
far too subtle, too quiet for a system operating in a state known to not
provide data durability.

I have two assumptions/concerns around the state of things, which prompted
my solution on HBASE-24086 and the associated doc update on HBASE-24106.

1. No one should be running a production system on `LocalFileSystem`.

The initial implementation checked both for `LocalFileSystem` and
`hbase.cluster.distributed`. When running on the former and the latter is
false, we assume the user is running a non-production deployment and carry
on with the warning. When the latter is true, we assume the user intended a
production deployment and the process terminates due to stream capability
enforcement. Subsequent code review resulted in skipping the
`hbase.cluster.distributed` check and simply warning, as was done on 2.8
and earlier.

(As I understand it, we've long used the `hbase.cluster.distributed`
configuration to decide if the user intends this runtime to be a production
deployment or not.)
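
The decision just described can be summarized as a small table. The sketch below is a hypothetical helper showing the initial implementation's logic, not the code that was ultimately committed (which dropped the `hbase.cluster.distributed` check):

```java
// Warn-and-continue only when both signals say "non-production"; otherwise
// fail fast rather than silently run a non-durable WAL.
final class DurabilityPolicy {
  enum Action { PROCEED, WARN, FAIL }

  static Action decide(boolean onLocalFileSystem, boolean clusterDistributed) {
    if (!onLocalFileSystem) {
      return Action.PROCEED;  // durable filesystem: enforce as usual
    }
    return clusterDistributed
        ? Action.FAIL         // user intended production: refuse to start
        : Action.WARN;        // standalone tire-kicking: warn and carry on
  }
}
```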

Is this a faulty assumption? Is there a use-case we support where we
condone running production deployment on the non-durable `LocalFileSystem`?

2. The Quick Start experience should require no configuration at all.

Our stack is difficult enough to run in a fully durable production
environment. We should make it a priority to ensure it's as easy as
possible to try out HBase. Forcing a user to make decisions about data
durability before they even launch the web ui is a terrible experience, in
my opinion, and should be a non-starter for us as a project.

(In my opinion, the need to configure either `hbase.rootdir` or
`hbase.tmp.dir` away from `/tmp` is equally bad for a Getting Started
experience. It is a second, more subtle question of data durability that we
should avoid out of the box. But I'm happy to leave that for another
thread.)

Thank you for your time,
Nick


[jira] [Resolved] (HBASE-23994) Add WebUI to Canary

2020-04-14 Thread Nick Dimiduk (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk resolved HBASE-23994.
--
Fix Version/s: 2.3.0
   3.0.0
   Resolution: Fixed

Applied to branch-2.3+. [~GeorryHuang] if you'd like to prepare a patch for 
branch-1, we can backport it there as well. Thanks for the nice contribution!

>  Add WebUI to Canary
> 
>
> Key: HBASE-23994
> URL: https://issues.apache.org/jira/browse/HBASE-23994
> Project: HBase
>  Issue Type: Improvement
>  Components: canary, website
>Affects Versions: 3.0.0
>Reporter: Zhuoyue Huang
>Assignee: Zhuoyue Huang
>Priority: Trivial
> Fix For: 3.0.0, 2.3.0
>
> Attachments: image-2020-03-16-09-12-00-595.png
>
>
> During the running of Canary, the table sniff failure information will be 
> printed through the Log.
> {code:java}
> LOG.error("Read from {} on {}", table, server);
> {code}
>  
> I think we can use the WebUI to display these failures, to make it easier 
> to view this information.
>  
> !image-2020-03-16-09-12-00-595.png!
> As shown in the figure above, we can directly see the Table and Regionserver 
> where the error occurred



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (HBASE-24106) Update getting started documentation after HBASE-24086

2020-04-10 Thread Nick Dimiduk (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk reopened HBASE-24106:
--

Reopening, pending further discussion on the parent.

> Update getting started documentation after HBASE-24086
> --
>
> Key: HBASE-24106
> URL: https://issues.apache.org/jira/browse/HBASE-24106
> Project: HBase
>  Issue Type: Sub-task
>  Components: documentation
>    Reporter: Nick Dimiduk
>Assignee: Nick Dimiduk
>Priority: Major
> Fix For: 3.0.0
>
>
> HBASE-24086 allows HBase to degrade gracefully to running on a 
> {{LocalFileSystem}} without further user configuration. Update the docs 
> accordingly.





[jira] [Resolved] (HBASE-24160) create-release fails to process x.y.0 version info correctly

2020-04-10 Thread Nick Dimiduk (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk resolved HBASE-24160.
--
Fix Version/s: 3.0.0
   Resolution: Fixed

> create-release fails to process x.y.0 version info correctly
> 
>
> Key: HBASE-24160
> URL: https://issues.apache.org/jira/browse/HBASE-24160
> Project: HBase
>  Issue Type: Bug
>        Reporter: Nick Dimiduk
>    Assignee: Nick Dimiduk
>Priority: Minor
> Fix For: 3.0.0
>
>
> Trying out {{create-release/do-release-docker.sh}}, it has trouble parsing 
> the 2.3.0-SNAPSHOT version. It also builds an invalid tag name for the API 
> check.
> {noformat}
> Current branch VERSION is 2.3.0-SNAPSHOT.
> RELEASE_VERSION [2.3.-1]: 2.3.0
> NEXT_VERSION [2.3.0-SNAPSHOT]: 2.3.1-SNAPSHOT
> RC_COUNT [0]:
> GIT_REF [2.3.0RC0]:
> api_diff_tag, [rel/2.2.0)]:
> {noformat}





[jira] [Created] (HBASE-24160) create-release fails to process x.y.0 version info correctly

2020-04-09 Thread Nick Dimiduk (Jira)
Nick Dimiduk created HBASE-24160:


 Summary: create-release fails to process x.y.0 version info 
correctly
 Key: HBASE-24160
 URL: https://issues.apache.org/jira/browse/HBASE-24160
 Project: HBase
  Issue Type: Bug
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk


Trying out {{create-release/do-release-docker.sh}}, it has trouble parsing the 
2.3.0-SNAPSHOT version. It also builds an invalid tag name for the API check.

{noformat}
Current branch VERSION is 2.3.0-SNAPSHOT.
RELEASE_VERSION [2.3.-1]: 2.3.0
NEXT_VERSION [2.3.0-SNAPSHOT]: 2.3.1-SNAPSHOT
RC_COUNT [0]:
GIT_REF [2.3.0RC0]:
api_diff_tag, [rel/2.2.0)]:
{noformat}
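
The intended transformation is straightforward even though the script gets it wrong. The sketch below is a hedged reconstruction (the real create-release tooling is shell; `VersionBump` is a hypothetical name): strip `-SNAPSHOT` for the release version, bump the patch component for the next development version.

```java
// What the prompts above should have defaulted to for a 2.3.0-SNAPSHOT branch:
// RELEASE_VERSION 2.3.0 (not 2.3.-1) and NEXT_VERSION 2.3.1-SNAPSHOT.
final class VersionBump {
  static String releaseVersion(String branchVersion) {
    return branchVersion.replace("-SNAPSHOT", "");
  }

  static String nextSnapshot(String branchVersion) {
    String[] parts = releaseVersion(branchVersion).split("\\.");
    int patch = Integer.parseInt(parts[2]) + 1;  // bump the patch component
    return parts[0] + "." + parts[1] + "." + patch + "-SNAPSHOT";
  }
}
```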





[jira] [Resolved] (HBASE-24136) Add release branch report to git-jira-release-audit tool

2020-04-09 Thread Nick Dimiduk (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk resolved HBASE-24136.
--
Fix Version/s: 3.0.0
   Resolution: Fixed

> Add release branch report to git-jira-release-audit tool
> 
>
> Key: HBASE-24136
> URL: https://issues.apache.org/jira/browse/HBASE-24136
> Project: HBase
>  Issue Type: Improvement
>        Reporter: Nick Dimiduk
>    Assignee: Nick Dimiduk
>Priority: Minor
> Fix For: 3.0.0
>
>
> Update {{git-jira-release-audit}} to build a "what's new" report for a 
> release branch (i.e., {{branch-2.3}}). Currently it only supports building 
> such a report from a release line branch (i.e., {{branch-2}}).





[jira] [Resolved] (HBASE-24086) Disable output stream capability enforcement when running in standalone mode

2020-04-08 Thread Nick Dimiduk (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk resolved HBASE-24086.
--
Fix Version/s: 2.2.5
   Resolution: Fixed

> Disable output stream capability enforcement when running in standalone mode
> 
>
> Key: HBASE-24086
> URL: https://issues.apache.org/jira/browse/HBASE-24086
> Project: HBase
>  Issue Type: Task
>  Components: master
>Affects Versions: 3.0.0, 2.3.0
>Reporter: Nick Dimiduk
>    Assignee: Nick Dimiduk
>Priority: Major
> Fix For: 3.0.0, 2.3.0, 2.2.5
>
>
> {noformat}
> $ 
> JAVA_HOME=/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home 
> mvn clean install -DskipTests
> $ 
> JAVA_HOME=/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home 
> ./bin/hbase master start
> {noformat}
> gives
> {noformat}
> 2020-03-30 17:12:43,857 ERROR 
> [master/192.168.111.13:16000:becomeActiveMaster] master.HMaster: Failed to 
> become active master  
>  
> java.io.IOException: cannot get log writer
>   
> 
> at 
> org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createAsyncWriter(AsyncFSWALProvider.java:118)
>   
>   
> at 
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createAsyncWriter(AsyncFSWAL.java:704)
>   
>  
> at 
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createWriterInstance(AsyncFSWAL.java:710)
>   
>   
> at 
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createWriterInstance(AsyncFSWAL.java:128)
>   
>   
> at 
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:839)
>   
>   
> at 
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:549)
>   
>   
> at 
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.init(AbstractFSWAL.java:490)
>   
> 
> at 
> org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:156)
>   
>
> at 
> org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:61)
>   
> 
> at org.apache.hadoop.hbase.wal.WALFactory.getWAL(WALFactory.java:297) 
>   
> 
> at 
> org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.createWAL(RegionProcedureStore.java:256)
>   
>   
> at 
> org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.bootstrap(RegionProcedureStore.java:273)
>   
>   
> at 
> org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.recoverLease(RegionProcedureStore.java:482)
>   
>
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:587)
>   
>   
> at 
> org.apache.hadoop.hbase.master.HMaste

[jira] [Reopened] (HBASE-24086) Disable output stream capability enforcement when running in standalone mode

2020-04-08 Thread Nick Dimiduk (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk reopened HBASE-24086:
--

Reopening for 2.2 backport.

> Disable output stream capability enforcement when running in standalone mode
> 
>
> Key: HBASE-24086
> URL: https://issues.apache.org/jira/browse/HBASE-24086
> Project: HBase
>  Issue Type: Task
>  Components: master
>Affects Versions: 3.0.0, 2.3.0
>Reporter: Nick Dimiduk
>    Assignee: Nick Dimiduk
>Priority: Major
> Fix For: 3.0.0, 2.3.0
>
>
> {noformat}
> $ 
> JAVA_HOME=/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home 
> mvn clean install -DskipTests
> $ 
> JAVA_HOME=/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home 
> ./bin/hbase master start
> {noformat}
> gives
> {noformat}
> 2020-03-30 17:12:43,857 ERROR 
> [master/192.168.111.13:16000:becomeActiveMaster] master.HMaster: Failed to 
> become active master  
>  
> java.io.IOException: cannot get log writer
>   
> 
> at 
> org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createAsyncWriter(AsyncFSWALProvider.java:118)
>   
>   
> at 
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createAsyncWriter(AsyncFSWAL.java:704)
>   
>  
> at 
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createWriterInstance(AsyncFSWAL.java:710)
>   
>   
> at 
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createWriterInstance(AsyncFSWAL.java:128)
>   
>   
> at 
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:839)
>   
>   
> at 
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:549)
>   
>   
> at 
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.init(AbstractFSWAL.java:490)
>   
> 
> at 
> org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:156)
>   
>
> at 
> org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:61)
>   
> 
> at org.apache.hadoop.hbase.wal.WALFactory.getWAL(WALFactory.java:297) 
>   
> 
> at 
> org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.createWAL(RegionProcedureStore.java:256)
>   
>   
> at 
> org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.bootstrap(RegionProcedureStore.java:273)
>   
>   
> at 
> org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.recoverLease(RegionProcedureStore.java:482)
>   
>
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:587)
>   
>   
> at 
> org.apache.hadoop.hbase.master.HMaste

[jira] [Created] (HBASE-24147) Verify Jira and git agree on issue fixVersions

2020-04-08 Thread Nick Dimiduk (Jira)
Nick Dimiduk created HBASE-24147:


 Summary: Verify Jira and git agree on issue fixVersions
 Key: HBASE-24147
 URL: https://issues.apache.org/jira/browse/HBASE-24147
 Project: HBase
  Issue Type: Sub-task
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk


Compare what's on the branches with fixVersions tagged in Jira.





[jira] [Created] (HBASE-24146) Run a perf test with YCSB

2020-04-08 Thread Nick Dimiduk (Jira)
Nick Dimiduk created HBASE-24146:


 Summary: Run a perf test with YCSB
 Key: HBASE-24146
 URL: https://issues.apache.org/jira/browse/HBASE-24146
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 2.3.0
Reporter: Nick Dimiduk


Kick the tires on a recent build. Compare results with a run on the latest 
2.2.x.





[jira] [Created] (HBASE-24145) Run a correctness test with ITBLL

2020-04-08 Thread Nick Dimiduk (Jira)
Nick Dimiduk created HBASE-24145:


 Summary: Run a correctness test with ITBLL
 Key: HBASE-24145
 URL: https://issues.apache.org/jira/browse/HBASE-24145
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 2.3.0
Reporter: Nick Dimiduk


Kick the tires on a recent commit.





[jira] [Created] (HBASE-24144) Update docs from branch-2 and master

2020-04-08 Thread Nick Dimiduk (Jira)
Nick Dimiduk created HBASE-24144:


 Summary: Update docs from branch-2 and master
 Key: HBASE-24144
 URL: https://issues.apache.org/jira/browse/HBASE-24144
 Project: HBase
  Issue Type: Sub-task
  Components: documentation
Affects Versions: 2.3.0
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk


Take a pass updating the docs. Have a look at what's on branch-2.2 and add 
whatever updates we need from master. Consider refreshing branch-2 as well, 
since it's been a while.





[jira] [Created] (HBASE-24143) [JDK11] Switch default garbage collector from CMS

2020-04-08 Thread Nick Dimiduk (Jira)
Nick Dimiduk created HBASE-24143:


 Summary: [JDK11] Switch default garbage collector from CMS
 Key: HBASE-24143
 URL: https://issues.apache.org/jira/browse/HBASE-24143
 Project: HBase
  Issue Type: Sub-task
  Components: scripts
Affects Versions: 3.0.0, 2.3.0
Reporter: Nick Dimiduk


When running HBase tools on the cli, one of the warnings generated is

{noformat}
OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in 
version 9.0 and will likely be removed in a future release.
{noformat}

Java 9+ uses G1GC as the default collector. Maybe we should simply omit GC 
configuration and use the default settings? Or does someone have suggested 
settings we can ship out of the box?
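
If we go the omit-flags route, the hbase-env.sh change might look like the following. This is a sketch: `HBASE_OPTS` is the real variable, but the flags shown are illustrative, not a recommendation.

```sh
# Before (deprecated on JDK 9+, removed in JDK 14):
# export HBASE_OPTS="-XX:+UseConcMarkSweepGC"

# After: set no explicit collector and let the JVM pick its default
# (G1GC on Java 9+). Optionally tune a pause goal instead of a collector:
export HBASE_OPTS="-XX:MaxGCPauseMillis=100"
```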





[jira] [Created] (HBASE-24141) Build "what's new in 2.3.0" report

2020-04-08 Thread Nick Dimiduk (Jira)
Nick Dimiduk created HBASE-24141:


 Summary: Build "what's new in 2.3.0" report
 Key: HBASE-24141
 URL: https://issues.apache.org/jira/browse/HBASE-24141
 Project: HBase
  Issue Type: Sub-task
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk


Work out the set of features that are new in 2.3.0. Provide this report along 
with RC artifacts, and use it to highlight RC testing and release announcements.





[jira] [Created] (HBASE-24136) Add release branch report to git-jira-release-audit tool

2020-04-07 Thread Nick Dimiduk (Jira)
Nick Dimiduk created HBASE-24136:


 Summary: Add release branch report to git-jira-release-audit tool
 Key: HBASE-24136
 URL: https://issues.apache.org/jira/browse/HBASE-24136
 Project: HBase
  Issue Type: Improvement
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk


Update {{git-jira-release-audit}} to build a "what's new" report for a release 
branch (i.e., {{branch-2.3}}). Currently it only supports building such a 
report from a release line branch (i.e., {{branch-2}}).





[jira] [Resolved] (HBASE-24068) [Flakey Tests] TestServerSideScanMetricsFromClientSide and NullPointerException

2020-04-07 Thread Nick Dimiduk (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk resolved HBASE-24068.
--
Resolution: Fixed

> [Flakey Tests] TestServerSideScanMetricsFromClientSide and 
> NullPointerException
> ---
>
> Key: HBASE-24068
> URL: https://issues.apache.org/jira/browse/HBASE-24068
> Project: HBase
>  Issue Type: Test
>  Components: flakies
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 3.0.0, 2.3.0
>
>
> This fails for me locally sporadically. 
> {code}
>  
> ---
>  Test set: org.apache.hadoop.hbase.TestServerSideScanMetricsFromClientSide
>  
> ---
>  Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.893 s <<< 
> FAILURE! - in org.apache.hadoop.hbase.TestServerSideScanMetricsFromClientSide
>  
> org.apache.hadoop.hbase.TestServerSideScanMetricsFromClientSide.testRowsSeenMetricWithAsync
>   Time elapsed: 0.325 s  <<< ERROR!
>  java.lang.NullPointerException
> {code}
> Log pinpoints the problem a little better
> {code}
>  Exception in thread "testTable.asyncPrefetcher" 
> java.lang.NullPointerException
>at 
> org.apache.hadoop.hbase.client.ClientAsyncPrefetchScanner$PrefetchRunnable.run(ClientAsyncPrefetchScanner.java:183)
>at java.lang.Thread.run(Thread.java:748)
>  java.lang.NullPointerException
>at 
> org.apache.hadoop.hbase.client.ClientAsyncPrefetchScanner$PrefetchRunnable.run(ClientAsyncPrefetchScanner.java:174)
>at java.lang.Thread.run(Thread.java:748)
>  java.lang.NullPointerException
>at 
> org.apache.hadoop.hbase.client.ClientAsyncPrefetchScanner$PrefetchRunnable.run(ClientAsyncPrefetchScanner.java:174)
>at java.lang.Thread.run(Thread.java:748)
>  2020-03-26 22:18:15,160 ERROR [Listener at localhost/53268] 
> hbase.TestServerSideScanMetricsFromClientSide(202): FAIL
>  java.lang.NullPointerException
>at 
> org.apache.hadoop.hbase.client.ClientAsyncPrefetchScanner$PrefetchRunnable.run(ClientAsyncPrefetchScanner.java:174)
>at java.lang.Thread.run(Thread.java:748)
> {code}
> It didn't make sense to me since the complaint was about a 'lock' that is created 
> as a final data member. But then this issue happened for my compañero, 
> [~ndimiduk]... and he saw that the call is from a Thread inner class in this 
> class's grandparent that is set running on construction; i.e. the lock 
> *could* be null at the time of reference.
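
The failure mode can be demonstrated deterministically in a few lines. The sketch below mirrors the bug shape (class names are illustrative, not HBase's): a superclass constructor starts a worker thread, so the thread can observe subclass fields, even final ones, before they are assigned.

```java
// The base constructor starts the worker and, for a deterministic demo, joins
// it -- guaranteeing the thread runs before the subclass field is assigned.
class EagerThreadBase {
  EagerThreadBase() {
    Thread prefetcher = new Thread(this::work);
    prefetcher.start();
    try {
      prefetcher.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
  void work() {}
}

class EagerScanner extends EagerThreadBase {
  static volatile boolean sawNullLock;
  final Object lock = new Object(); // assigned only after super() returns

  @Override void work() {
    sawNullLock = (lock == null); // runs during super(): sees null
  }

  /** Runs the demo once; returns true if the thread saw the unassigned field. */
  static boolean demo() {
    new EagerScanner();
    return sawNullLock;
  }
}
```

The fix is the usual one for this pattern: don't start threads from a constructor; expose a separate start method called after construction completes.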





[jira] [Reopened] (HBASE-24068) [Flakey Tests] TestServerSideScanMetricsFromClientSide and NullPointerException

2020-04-07 Thread Nick Dimiduk (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk reopened HBASE-24068:
--

Looks like this one was never committed boss.

> [Flakey Tests] TestServerSideScanMetricsFromClientSide and 
> NullPointerException
> ---
>
> Key: HBASE-24068
> URL: https://issues.apache.org/jira/browse/HBASE-24068
> Project: HBase
>  Issue Type: Test
>  Components: flakies
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 3.0.0, 2.3.0
>
>
> This fails for me locally sporadically. 
> {code}
>  
> ---
>  Test set: org.apache.hadoop.hbase.TestServerSideScanMetricsFromClientSide
>  
> ---
>  Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.893 s <<< 
> FAILURE! - in org.apache.hadoop.hbase.TestServerSideScanMetricsFromClientSide
>  
> org.apache.hadoop.hbase.TestServerSideScanMetricsFromClientSide.testRowsSeenMetricWithAsync
>   Time elapsed: 0.325 s  <<< ERROR!
>  java.lang.NullPointerException
> {code}
> Log pinpoints the problem a little better
> {code}
>  Exception in thread "testTable.asyncPrefetcher" 
> java.lang.NullPointerException
>at 
> org.apache.hadoop.hbase.client.ClientAsyncPrefetchScanner$PrefetchRunnable.run(ClientAsyncPrefetchScanner.java:183)
>at java.lang.Thread.run(Thread.java:748)
>  java.lang.NullPointerException
>at 
> org.apache.hadoop.hbase.client.ClientAsyncPrefetchScanner$PrefetchRunnable.run(ClientAsyncPrefetchScanner.java:174)
>at java.lang.Thread.run(Thread.java:748)
>  java.lang.NullPointerException
>at 
> org.apache.hadoop.hbase.client.ClientAsyncPrefetchScanner$PrefetchRunnable.run(ClientAsyncPrefetchScanner.java:174)
>at java.lang.Thread.run(Thread.java:748)
>  2020-03-26 22:18:15,160 ERROR [Listener at localhost/53268] 
> hbase.TestServerSideScanMetricsFromClientSide(202): FAIL
>  java.lang.NullPointerException
>at 
> org.apache.hadoop.hbase.client.ClientAsyncPrefetchScanner$PrefetchRunnable.run(ClientAsyncPrefetchScanner.java:174)
>at java.lang.Thread.run(Thread.java:748)
> {code}
> It didn't make sense to me at first, since the complaint was about a 'lock' 
> that is declared as a final data member. But then this issue happened for my 
> compañero, [~ndimiduk]... and he saw that the call comes from a Thread inner 
> class in this class's grandparent, which is set running on construction; i.e. 
> the lock *could* be null at the time of reference.
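[Editorial note] The race described above — a thread started from an ancestor
class's constructor reading a subclass field before that field's initializer has
run — can be reproduced with a minimal sketch. The class names below are
hypothetical, not the actual HBase classes:

```java
// Minimal reproduction (hypothetical names) of the initialization race above:
// the base class starts a thread from its constructor, so the thread can read
// a subclass field -- even a final one -- before the subclass constructor has
// assigned it.
class BaseScanner {
    BaseScanner() {
        // 'this' escapes here: prefetch() runs before any subclass
        // constructor body has executed.
        Thread prefetcher = new Thread(this::prefetch, "asyncPrefetcher");
        prefetcher.start();
        try {
            // Joining here makes the race deterministic for the demo.
            prefetcher.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    void prefetch() {
    }
}

class PrefetchScanner extends BaseScanner {
    static volatile boolean sawNullLock;

    // 'final' only guarantees assignment by the end of *this* constructor;
    // code reached via the implicit super() call still observes the default
    // value, null.
    private final Object lock = new Object();

    @Override
    void prefetch() {
        sawNullLock = (lock == null);
        System.out.println("lock is " + (sawNullLock ? "null" : "initialized"));
    }
}

public class InitRaceDemo {
    public static void main(String[] args) {
        new PrefetchScanner(); // prints "lock is null"
    }
}
```

A common remedy is to avoid starting threads from a constructor at all: defer
the thread start to a separate start() method invoked after construction, or
have the worker defensively check such state before using it.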



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-24130) rat plugin complains about having an unlicensed file.

2020-04-07 Thread Nick Dimiduk (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk resolved HBASE-24130.
--
Fix Version/s: 2.4.0
   1.7.0
   3.0.0
   Resolution: Fixed

Thanks for the quick turnaround [~minwoo.kang]!

> rat plugin complains about having an unlicensed file.
> -
>
> Key: HBASE-24130
> URL: https://issues.apache.org/jira/browse/HBASE-24130
> Project: HBase
>  Issue Type: Bug
>Reporter: Minwoo Kang
>Assignee: Minwoo Kang
>Priority: Minor
> Fix For: 3.0.0, 1.7.0, 2.4.0
>
>
> Files with unapproved licenses:
>     dev-support/HOW_TO_YETUS_LOCAL.md



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

