[jira] [Resolved] (HBASE-15539) HBase Client region location is expensive

2020-01-17 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-15539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-15539.
---
Resolution: Later

No progress. Resolving as 'Later'.

> HBase Client region location is expensive 
> --
>
> Key: HBASE-15539
> URL: https://issues.apache.org/jira/browse/HBASE-15539
> Project: HBase
>  Issue Type: Sub-task
>  Components: Client
>Reporter: Vladimir Rodionov
>Assignee: Mikhail Antonov
>Priority: Major
> Fix For: 3.0.0, 2.3.0
>
>
> ConnectionImplementation.locateRegion and MetaCache.getTableLocations are hot 
> spots in a client.   
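
To illustrate why these lookups sit on the hot path: every client operation has to map a row key to the region that holds it. The following is only a minimal sketch, assuming a sorted map keyed by region start key; the class and method names are hypothetical and are not the HBase client's MetaCache API. It also ignores region end keys and cache invalidation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

/** Hypothetical sketch of a client-side region location cache; not HBase's MetaCache. */
final class RegionLocationCacheSketch {
    // Region start key -> server name, kept sorted so a floor lookup finds the candidate region.
    private final ConcurrentSkipListMap<String, String> startKeyToServer =
        new ConcurrentSkipListMap<>();

    /** Records that the region starting at startKey is hosted on server. */
    void cache(String startKey, String server) {
        startKeyToServer.put(startKey, server);
    }

    /** Returns the server cached for the greatest start key <= row, or null if nothing is cached. */
    String locate(String row) {
        Map.Entry<String, String> entry = startKeyToServer.floorEntry(row);
        return entry == null ? null : entry.getValue();
    }
}
```

Each lookup is a logarithmic search, which is cheap once but adds up when it runs for every single operation.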



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-23676) Address feedback on HBASE-23055 Alter hbase:meta.

2020-01-17 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-23676.
---
Resolution: Won't Fix

The parent HBASE-23055 was recast. This issue is no longer relevant (the feedback 
was addressed in the new patch attached to HBASE-23055).

> Address feedback on HBASE-23055 Alter hbase:meta.
> -
>
> Key: HBASE-23676
> URL: https://issues.apache.org/jira/browse/HBASE-23676
> Project: HBase
>  Issue Type: Bug
>  Components: meta
>Affects Versions: 2.3.0
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 3.0.0, 2.3.0
>
>
> Good feedback on HBASE-23055 came in after merge from [~zhangduo]. Opening 
> this issue to address it.





[jira] [Resolved] (HBASE-18326) Fix and reenable TestMasterProcedureWalLease

2020-01-17 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-18326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-18326.
---
  Assignee: Szabolcs Bukros
Resolution: Fixed

Resolving because test was removed. Thanks [~bszabolcs]. Assigned ticket to you 
as you did research.

> Fix and reenable TestMasterProcedureWalLease
> 
>
> Key: HBASE-18326
> URL: https://issues.apache.org/jira/browse/HBASE-18326
> Project: HBase
>  Issue Type: Sub-task
>  Components: test
>Reporter: Michael Stack
>Assignee: Szabolcs Bukros
>Priority: Blocker
> Fix For: 3.0.0, 2.3.0
>
>
> Fix and reenable a flaky but important test.





[jira] [Resolved] (HBASE-23612) Update pom.xml to use another 2.5.0 protoc as external protobuf

2020-01-17 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-23612.
---
Fix Version/s: 3.0.0
 Hadoop Flags: Reviewed
 Assignee: zhao bo
   Resolution: Fixed

Merged. Thanks for the patch [~bzhaoopenstack]

> Update pom.xml to use another 2.5.0 protoc as external protobuf
> ---
>
> Key: HBASE-23612
> URL: https://issues.apache.org/jira/browse/HBASE-23612
> Project: HBase
>  Issue Type: Sub-task
>  Components: build
>Reporter: zhao bo
>Assignee: zhao bo
>Priority: Major
> Fix For: 3.0.0
>
>
> Currently, there is no protoc 2.5.0 release for ARM [1], so we can build an 
> ARM-specific one to make sure the build works on ARM.
> We will introduce a new ARM artifact for protoc; its group_id is 
> org.openlabtesting.protobuf. It is used only by protobuf-maven-plugin to 
> compile .proto files. The 3.x versions of protoc already support ARM, so 
> this won't affect the internal protoc usage, which is 3.5.1-1 now.
>  
> [1][https://github.com/protocolbuffers/protobuf/issues/3844#issuecomment-343355946]





[jira] [Resolved] (HBASE-23690) Checkstyle plugin complains about our checkstyle.xml format; doc how to resolve mismatched version

2020-01-17 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-23690.
---
Fix Version/s: 3.0.0
 Assignee: Michael Stack
   Resolution: Fixed

Resolving. Made a subissue to address the update of the checkstyle plugin and to 
act on Nick's nice suggestion above.

Thanks for reviews.

> Checkstyle plugin complains about our checkstyle.xml format; doc how to 
> resolve mismatched version
> --
>
> Key: HBASE-23690
> URL: https://issues.apache.org/jira/browse/HBASE-23690
> Project: HBase
>  Issue Type: Bug
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Trivial
> Fix For: 3.0.0
>
>
> Trying to add the checkstyle.xml to the intellij checkstyle plugin after 
> reading HBASE-23688, it complains with the following when it reads in the 
> config file:
> {code}
> com.puppycrawl.tools.checkstyle.api.CheckstyleException: cannot initialize 
> module TreeWalker - TreeWalker is not allowed as a parent of LineLength 
> Please review 'Parent Module' section for this Check in web documentation if 
> Check is standard.
>   at com.puppycrawl.tools.checkstyle.Checker.setupChild(Checker.java:473)
>   at 
> com.puppycrawl.tools.checkstyle.api.AutomaticBean.configure(AutomaticBean.java:198)
>   at 
> org.infernus.idea.checkstyle.service.cmd.OpCreateChecker.execute(OpCreateChecker.java:61)
>   at 
> org.infernus.idea.checkstyle.service.cmd.OpCreateChecker.execute(OpCreateChecker.java:26)
>   at 
> org.infernus.idea.checkstyle.service.CheckstyleActionsImpl.executeCommand(CheckstyleActionsImpl.java:130)
>   at 
> org.infernus.idea.checkstyle.service.CheckstyleActionsImpl.createChecker(CheckstyleActionsImpl.java:60)
>   at 
> org.infernus.idea.checkstyle.service.CheckstyleActionsImpl.createChecker(CheckstyleActionsImpl.java:51)
>   at 
> org.infernus.idea.checkstyle.checker.CheckerFactoryWorker.run(CheckerFactoryWorker.java:46)
> Caused by: com.puppycrawl.tools.checkstyle.api.CheckstyleException: 
> TreeWalker is not allowed as a parent of LineLength Please review 'Parent 
> Module' section for this Check in web documentation if Check is standard.
>   at 
> com.puppycrawl.tools.checkstyle.TreeWalker.setupChild(TreeWalker.java:147)
>   at 
> com.puppycrawl.tools.checkstyle.api.AutomaticBean.configure(AutomaticBean.java:198)
>   at com.puppycrawl.tools.checkstyle.Checker.setupChild(Checker.java:468)
>   ... 7 more
> {code}





[jira] [Created] (HBASE-23706) Update checkstyle plugin (and update checkstyle.xml to match)

2020-01-17 Thread Michael Stack (Jira)
Michael Stack created HBASE-23706:
-

 Summary: Update checkstyle plugin (and update checkstyle.xml to 
match)
 Key: HBASE-23706
 URL: https://issues.apache.org/jira/browse/HBASE-23706
 Project: HBase
  Issue Type: Sub-task
Reporter: Michael Stack


In the parent issue, it's suggested we update our checkstyle plugin to match at 
least the intellij plugin default. This will need checkstyle.xml changes or else it 
fails to parse (see notes in parent by [~bharathv] on what needs doing).

For extra points, do the [~ndimiduk] suggestion: "...It would be nice also if 
we could commit the .idea/checkstyle-idea.xml file with the checkstyle version 
used by the plugin pinned to the same version as we're using in maven."





[jira] [Created] (HBASE-23705) Add CellComparator to HFileContext

2020-01-17 Thread Michael Stack (Jira)
Michael Stack created HBASE-23705:
-

 Summary: Add CellComparator to HFileContext
 Key: HBASE-23705
 URL: https://issues.apache.org/jira/browse/HBASE-23705
 Project: HBase
  Issue Type: Sub-task
  Components: io
Reporter: Michael Stack
Assignee: Michael Stack
 Fix For: 3.0.0, 2.3.0


The HFileContext is present when reading and writing files. It is populated at 
read time using HFile trailer content and file metadata. At write time, we 
create it up front.

Interestingly, though the CellComparator is written to the HFile trailer, and 
parsing the trailer creates an HFileInfo which builds the HFileContext at read 
time, the HFileContext does not expose which CellComparator to use when decoding 
and seeking. Around the codebase there are various compensations for this lack: 
decoders that actually have a decoding context (with a reference to the 
HFileContext) hard-code use of the default CellComparator, StoreFileInfo 
will use the default if not passed a comparator (even though we'd just read the 
trailer), and HFile itself is similar.

Let me fix this situation by removing the ambiguity. It will also fix bugs in the 
parent issue where UTs are failing because the wrong CellComparator is being used.
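
The idea can be sketched roughly as follows. This is not the actual HFileContext API: the class names are hypothetical and String keys stand in for cells. The point is only that the context carries the comparator recorded for the file, and consumers ask the context rather than falling back to a hard-coded default.

```java
import java.util.Comparator;
import java.util.Objects;

/** Hypothetical stand-in for a file context that records which comparator the file was written with. */
final class FileContextSketch {
    private final Comparator<String> cellComparator;

    FileContextSketch(Comparator<String> cellComparator) {
        // Refuse to build a context without a comparator, so no consumer has to guess one.
        this.cellComparator = Objects.requireNonNull(cellComparator, "cellComparator");
    }

    Comparator<String> getCellComparator() {
        return cellComparator;
    }
}

/** Hypothetical decoder/seeker that takes its ordering from the context. */
final class SeekerSketch {
    private final FileContextSketch context;

    SeekerSketch(FileContextSketch context) {
        this.context = context;
    }

    /** Compares two keys using the file's own comparator, never a hard-coded default. */
    int compareKeys(String left, String right) {
        return context.getCellComparator().compare(left, right);
    }
}
```

With this shape, a file written with a non-default ordering is read back with that same ordering, with no ambiguity about which comparator applies.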





[jira] [Created] (HBASE-23697) Document new RegionProcedureStore operation and migration

2020-01-15 Thread Michael Stack (Jira)
Michael Stack created HBASE-23697:
-

 Summary: Document new RegionProcedureStore operation and migration
 Key: HBASE-23697
 URL: https://issues.apache.org/jira/browse/HBASE-23697
 Project: HBase
  Issue Type: Sub-task
  Components: documentation
Affects Versions: 2.3.0
Reporter: Michael Stack


Add a few notes to the refguide on the new RegionProcedureStore: how it works, 
how it differs from the WALProcedureStore, and that it auto-migrates so there 
should be no issue moving on to the new store.

Mention the configuration. Mention that it is on WALFS even though it is a 
'Region', etc.





[jira] [Resolved] (HBASE-23696) Stop WALProcedureStore after migration finishes

2020-01-15 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-23696.
---
Resolution: Duplicate

Resolving as dupe of HBASE-23694 

> Stop WALProcedureStore after migration finishes
> ---
>
> Key: HBASE-23696
> URL: https://issues.apache.org/jira/browse/HBASE-23696
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.3.0
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 3.0.0, 2.3.0
>
>
> WALProcedureStore is left up with its sync thread running in the background 
> though we are done with it after starting it inside the migration method. Add 
> a stop when done.





[jira] [Created] (HBASE-23696) Stop WALProcedureStore after migration finishes

2020-01-15 Thread Michael Stack (Jira)
Michael Stack created HBASE-23696:
-

 Summary: Stop WALProcedureStore after migration finishes
 Key: HBASE-23696
 URL: https://issues.apache.org/jira/browse/HBASE-23696
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Michael Stack
Assignee: Michael Stack
 Fix For: 3.0.0, 2.3.0


WALProcedureStore is left up with its sync thread running in the background though 
we are done with it after starting it inside the migration method. Add a stop when 
done.
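
A minimal sketch of the intended shape, assuming a store that owns a background sync thread. OldStore and migrate are hypothetical stand-ins, not the WALProcedureStore API. The stop goes in a finally block so the thread is shut down whether or not the migration succeeds.

```java
import java.util.ArrayList;
import java.util.List;

final class MigrationSketch {
    /** Hypothetical stand-in for a store with a background sync thread. */
    static final class OldStore {
        private volatile boolean running;
        private Thread syncThread;

        void start() {
            running = true;
            syncThread = new Thread(() -> {
                while (running) {
                    try {
                        Thread.sleep(10); // pretend to sync periodically
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            }, "old-store-sync");
            syncThread.setDaemon(true);
            syncThread.start();
        }

        List<String> load() {
            return List.of("proc-1", "proc-2");
        }

        /** Signals the sync thread to exit and waits for it to finish. */
        void stop() {
            running = false;
            if (syncThread == null) {
                return;
            }
            syncThread.interrupt();
            try {
                syncThread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        boolean isSyncThreadAlive() {
            return syncThread != null && syncThread.isAlive();
        }
    }

    /** Starts the old store only for the duration of the migration, then always stops it. */
    static List<String> migrate(OldStore old) {
        old.start();
        try {
            return new ArrayList<>(old.load());
        } finally {
            old.stop();
        }
    }
}
```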





[jira] [Created] (HBASE-23690) Checkstyle plugin complains about our checkstyle.xml format

2020-01-14 Thread Michael Stack (Jira)
Michael Stack created HBASE-23690:
-

 Summary: Checkstyle plugin complains about our checkstyle.xml 
format
 Key: HBASE-23690
 URL: https://issues.apache.org/jira/browse/HBASE-23690
 Project: HBase
  Issue Type: Bug
Reporter: Michael Stack


Trying to add the checkstyle.xml to the intellij checkstyle plugin after 
reading HBASE-23688, it complains with the following when it reads in the 
config file:
{code}
com.puppycrawl.tools.checkstyle.api.CheckstyleException: cannot initialize 
module TreeWalker - TreeWalker is not allowed as a parent of LineLength Please 
review 'Parent Module' section for this Check in web documentation if Check is 
standard.
at com.puppycrawl.tools.checkstyle.Checker.setupChild(Checker.java:473)
at 
com.puppycrawl.tools.checkstyle.api.AutomaticBean.configure(AutomaticBean.java:198)
at 
org.infernus.idea.checkstyle.service.cmd.OpCreateChecker.execute(OpCreateChecker.java:61)
at 
org.infernus.idea.checkstyle.service.cmd.OpCreateChecker.execute(OpCreateChecker.java:26)
at 
org.infernus.idea.checkstyle.service.CheckstyleActionsImpl.executeCommand(CheckstyleActionsImpl.java:130)
at 
org.infernus.idea.checkstyle.service.CheckstyleActionsImpl.createChecker(CheckstyleActionsImpl.java:60)
at 
org.infernus.idea.checkstyle.service.CheckstyleActionsImpl.createChecker(CheckstyleActionsImpl.java:51)
at 
org.infernus.idea.checkstyle.checker.CheckerFactoryWorker.run(CheckerFactoryWorker.java:46)
Caused by: com.puppycrawl.tools.checkstyle.api.CheckstyleException: TreeWalker 
is not allowed as a parent of LineLength Please review 'Parent Module' section 
for this Check in web documentation if Check is standard.
at 
com.puppycrawl.tools.checkstyle.TreeWalker.setupChild(TreeWalker.java:147)
at 
com.puppycrawl.tools.checkstyle.api.AutomaticBean.configure(AutomaticBean.java:198)
at com.puppycrawl.tools.checkstyle.Checker.setupChild(Checker.java:468)
... 7 more
{code}







[jira] [Resolved] (HBASE-23689) Bookmark for github PR to jira redirection

2020-01-14 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-23689.
---
Fix Version/s: 3.0.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

Resolving on merge. Nice one [~bharathv].

>  Bookmark for github PR to jira redirection
> ---
>
> Key: HBASE-23689
> URL: https://issues.apache.org/jira/browse/HBASE-23689
> Project: HBase
>  Issue Type: Sub-task
>  Components: tooling
>Affects Versions: master
>Reporter: Bharath Vissapragada
>Assignee: Bharath Vissapragada
>Priority: Minor
> Fix For: 3.0.0
>
>
> Following is a simple js snippet that redirects from any HBase PR to its 
> corresponding jira. Without this, one has to copy the jira ID from the PR, 
> construct a jira URL manually and paste it in the browser URL bar. Saves a 
> bunch of clicks.
> {code:javascript}
> javascript:location.href='https://issues.apache.org/jira/browse/'+document.getElementsByClassName("js-issue-title")[0].innerHTML.match(/HBASE-\d+/)[0];{code}
> Particularly helpful for reviewers who'd like to read the jira contents often 
> when reviewing a PR.
> For chrome:
>  - Right Click on the bookmarks bar
>  - Click on Add page. Fill in the following details:
>  Name: HBase jira redirect (or any other that you prefer)
>  URL: {{snippet from above}}
>  - Click Save
> Now you should see "HBase jira redirect" (or any other name you gave) 
> bookmark on the bar.
>  Go to any Github PR, click on this button and it redirects to the 
> corresponding jira.
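
The same extraction can be sketched outside the browser, for example in a review script. This is an illustration only; the class and method names are hypothetical. Note the `+` quantifier: a bare `\d` would capture only the first digit of the issue number.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that maps a PR title to its Jira URL. */
final class JiraLinkSketch {
    private static final Pattern KEY = Pattern.compile("HBASE-\\d+");
    private static final String BASE = "https://issues.apache.org/jira/browse/";

    /** Returns the Jira URL for the first HBASE-NNNN key in the title, or empty if there is none. */
    static Optional<String> jiraUrl(String prTitle) {
        Matcher matcher = KEY.matcher(prTitle);
        return matcher.find() ? Optional.of(BASE + matcher.group()) : Optional.empty();
    }
}
```

Returning an Optional keeps the no-match case explicit, whereas the bookmarklet throws when the title carries no Jira key.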





[jira] [Resolved] (HBASE-23687) DEBUG logging cleanup

2020-01-14 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-23687.
---
Fix Version/s: 2.3.0
   3.0.0
 Hadoop Flags: Reviewed
 Assignee: Michael Stack
   Resolution: Fixed

Merged to branch-2 and master. Thanks for review [~janh]

> DEBUG logging cleanup
> -
>
> Key: HBASE-23687
> URL: https://issues.apache.org/jira/browse/HBASE-23687
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.3.0
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Trivial
> Fix For: 3.0.0, 2.3.0
>
>
> Minor cleanup of annoying log messages. For example, over an hour, we logged this 
> 200k times:
> {code}2020-01-14 11:06:00,287 DEBUG 
> org.apache.hadoop.hbase.master.cleaner.LogCleaner: Exiting
> {code}
> There is no corresponding 'Starting'.





[jira] [Created] (HBASE-23687) DEBUG logging cleanup

2020-01-14 Thread Michael Stack (Jira)
Michael Stack created HBASE-23687:
-

 Summary: DEBUG logging cleanup
 Key: HBASE-23687
 URL: https://issues.apache.org/jira/browse/HBASE-23687
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Michael Stack


Minor cleanup of annoying log messages. For example, over an hour, we logged this 
200k times:

{code}2020-01-14 11:06:00,287 DEBUG 
org.apache.hadoop.hbase.master.cleaner.LogCleaner: Exiting
{code}

There is no corresponding 'Starting'.





[jira] [Created] (HBASE-23685) indicates a last flushed sequence id ... that is less than the previous last flushed sequence id

2020-01-13 Thread Michael Stack (Jira)
Michael Stack created HBASE-23685:
-

 Summary: indicates a last flushed sequence id ... that is less 
than the previous last flushed sequence id
 Key: HBASE-23685
 URL: https://issues.apache.org/jira/browse/HBASE-23685
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Michael Stack


I'm getting loads of the below in the Master log running tests against branch-2. 
The cluster is heavily loaded but generally keeping up, though there is a backlog 
in WAL files > 32 ... around the cluster, so forced flushes are happening. I see 
the below for a Region even though it seems we've since flushed out a 
sequenceid on the RS-side that is larger than what the Master is seeing. The 
table has two column families.

{code}
 2020-01-13 23:33:18,455 WARN org.apache.hadoop.hbase.master.ServerManager: 
 RegionServer hbasedn030.sp07.siri.apple.com,16020,1578934813139 indicates a
 last flushed sequence id (1593644) that is less than the previous last flushed 
 sequence id (1593649) for region t1,f9371d5,1576227377175. 
 3e41deae849d25f0a2f1d654f482d73a. Ignoring.
{code}
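
The behavior behind the warning can be sketched as a tracker that only moves forward. This is an assumption-laden illustration, not ServerManager's code: the class name and the String region key are hypothetical. A report lower than the recorded value is ignored, which is what the "Ignoring." in the log line describes.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical tracker that keeps the highest flushed sequence id seen per region. */
final class FlushedSeqIdTrackerSketch {
    private final ConcurrentMap<String, Long> lastFlushed = new ConcurrentHashMap<>();

    /**
     * Records a reported flushed sequence id.
     * Returns true if the report was accepted, false if it was lower than the
     * recorded value and therefore ignored.
     */
    boolean report(String region, long seqId) {
        boolean[] accepted = {true};
        lastFlushed.merge(region, seqId, (previous, reported) -> {
            if (reported < previous) {
                accepted[0] = false;
                return previous; // keep the larger, previously recorded id
            }
            return reported;
        });
        return accepted[0];
    }

    /** Returns the recorded id for the region, or -1 if none has been reported. */
    long get(String region) {
        return lastFlushed.getOrDefault(region, -1L);
    }
}
```

In terms of the log line above, a report of 1593644 arriving after 1593649 returns false and leaves 1593649 in place.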






[jira] [Resolved] (HBASE-23286) Improve MTTR: Split WAL to HFile

2020-01-13 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-23286.
---
Resolution: Fixed

Resolving. No problem.

Will file issues elsewhere (e.g. HBASE-23684).


> Improve MTTR: Split WAL to HFile
> 
>
> Key: HBASE-23286
> URL: https://issues.apache.org/jira/browse/HBASE-23286
> Project: HBase
>  Issue Type: Improvement
>  Components: MTTR
>Affects Versions: 3.0.0, 2.3.0
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 3.0.0, 2.3.0
>
>
> After HBASE-20724, the compaction event marker is no longer used on 
> failover. So our new proposal is to split WAL to HFile to improve MTTR. It has 3 
> steps:
>  # Read WAL and write HFile to region’s column family’s recovered.hfiles 
> directory.
>  # Open region.
>  # Bulkload the recovered.hfiles for every column family.
> The design doc is attached as a google doc. Any suggestions are welcome.





[jira] [Created] (HBASE-23684) NPE HFilesOutputSink

2020-01-13 Thread Michael Stack (Jira)
Michael Stack created HBASE-23684:
-

 Summary: NPE HFilesOutputSink
 Key: HBASE-23684
 URL: https://issues.apache.org/jira/browse/HBASE-23684
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 2.3.0
Reporter: Michael Stack


Ran into this after enabling hfile splitter:

{code}
 2020-01-13 17:37:08,204 INFO org.apache.hadoop.hbase.wal.OutputSink: 3 split 
writer threads finished
 2020-01-13 17:37:08,233 INFO org.apache.hadoop.hbase.wal.WALSplitter: 
Processed 1007 edits across 0 regions cost 284 ms; edits skipped=76; 
WAL=hdfs://nameservice1/hbase/genie/WALs/hbasedn101.example.org,16020,1578934806382-splitting/hbasedn101.example.org%2C16020%2C1578934806382.1578937008832,
 size=128.5 M, length=134708720, corrupted=false, progress failed=true
 2020-01-13 17:37:08,234 WARN 
org.apache.hadoop.hbase.regionserver.SplitLogWorker: log splitting of 
WALs/hbasedn101.example.org,16020,1578934806382-splitting/hbasedn101.example.org%2C16020%2C1578934806382.1578937008832
 failed, returning error
 java.io.IOException: java.lang.NullPointerException
 at 
org.apache.hadoop.hbase.wal.BoundedRecoveredHFilesOutputSink.writeRemainingEntryBuffers(BoundedRecoveredHFilesOutputSink.java:173)
 at 
org.apache.hadoop.hbase.wal.BoundedRecoveredHFilesOutputSink.close(BoundedRecoveredHFilesOutputSink.java:140)
 at 
org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:339)
 at 
org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:181)
 at 
org.apache.hadoop.hbase.regionserver.SplitLogWorker.splitLog(SplitLogWorker.java:105)
 at 
org.apache.hadoop.hbase.regionserver.SplitLogWorker.lambda$new$0(SplitLogWorker.java:84)
 at 
org.apache.hadoop.hbase.regionserver.handler.WALSplitterHandler.process(WALSplitterHandler.java:70)
 at 
org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
 at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
 at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
 at java.base/java.lang.Thread.run(Thread.java:834)
 Caused by: java.lang.NullPointerException
 at 
org.apache.hadoop.hbase.wal.BoundedRecoveredHFilesOutputSink.configContextForNonMetaWriter(BoundedRecoveredHFilesOutputSink.java:225)
 at 
org.apache.hadoop.hbase.wal.BoundedRecoveredHFilesOutputSink.createRecoveredHFileWriter(BoundedRecoveredHFilesOutputSink.java:213)
 at 
org.apache.hadoop.hbase.wal.BoundedRecoveredHFilesOutputSink.append(BoundedRecoveredHFilesOutputSink.java:117)
 at 
org.apache.hadoop.hbase.wal.BoundedRecoveredHFilesOutputSink.lambda$writeRemainingEntryBuffers$3(BoundedRecoveredHFilesOutputSink.java:155)
 at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
 at 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
 at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
{code}

It is a bit odd because log says there were zero regions. Not sure what that 
was about.





[jira] [Reopened] (HBASE-23055) Alter hbase:meta

2020-01-11 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack reopened HBASE-23055:
---

Reopening (again). Reverted the commit. Good back and forth going on here, in the 
PR, and in the sub-issue. [~zhangduo] is concerned that being able to disable 
hbase:meta is a step too far and proposes alter w/o disabling as a means to 
achieve this issue's objective (he also suggests a guard against an operator 
accidentally deleting fundamental hbase:meta column families). Let me go his 
suggested route.

> Alter hbase:meta
> 
>
> Key: HBASE-23055
> URL: https://issues.apache.org/jira/browse/HBASE-23055
> Project: HBase
>  Issue Type: Task
>  Components: meta
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 3.0.0, 2.3.0
>
>
> hbase:meta is currently hardcoded. Its schema cannot be changed.
> This issue is about allowing edits to the hbase:meta schema. It will allow us 
> to set encodings such as the block-with-indexes, which will help 
> quell CPU usage on the host carrying hbase:meta. A dynamic hbase:meta is the 
> first step on the road to being able to split meta.





[jira] [Created] (HBASE-23680) RegionProcedureStore missing cleaning of hfile archive

2020-01-10 Thread Michael Stack (Jira)
Michael Stack created HBASE-23680:
-

 Summary: RegionProcedureStore missing cleaning of hfile archive
 Key: HBASE-23680
 URL: https://issues.apache.org/jira/browse/HBASE-23680
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 2.3.0
Reporter: Michael Stack
 Fix For: 2.3.0


See tail of parent issue. The new RegionProcedureStore accumulates deleted 
hfiles in its local archive dir. It needs a cleaner like the one that watches over 
/hbase/archive.

Is there also a problem clearing the new $masterproc$ files from the oldWALs? 
These seem to stick around too.





[jira] [Resolved] (HBASE-23668) Master log start filling with "Flush journal status" messages

2020-01-10 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-23668.
---
Hadoop Flags: Reviewed
Assignee: Michael Stack
  Resolution: Fixed

Merged to branch-2+. Thanks for review [~zhangduo] (reopen to add your UT?).

> Master log start filling with "Flush journal status" messages
> -
>
> Key: HBASE-23668
> URL: https://issues.apache.org/jira/browse/HBASE-23668
> Project: HBase
>  Issue Type: Improvement
>  Components: proc-v2, RegionProcedureStore
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 3.0.0, 2.3.0
>
>
> Takes a while to get into this condition. Not easy to tell how because all 
> logs have rolled off and I only have logs filled w/ the below:
> {code}
> 2020-01-09 07:01:01,723 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
> Flush status journal:
>   Acquiring readlock on region at 1578553261723
>   Flush successful flush result:CANNOT_FLUSH_MEMSTORE_EMPTY, 
> failureReason:Nothing to flush,flush seq id45226854 at 1578553261723
> 2020-01-09 07:01:01,723 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
> Flush status journal:
>   Acquiring readlock on region at 1578553261723
>   Flush successful flush result:CANNOT_FLUSH_MEMSTORE_EMPTY, 
> failureReason:Nothing to flush,flush seq id45226855 at 1578553261723
> 2020-01-09 07:01:01,723 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
> Flush status journal:
>   Acquiring readlock on region at 1578553261723
>   Flush successful flush result:CANNOT_FLUSH_MEMSTORE_EMPTY, 
> failureReason:Nothing to flush,flush seq id45226856 at 1578553261723
> 2020-01-09 07:01:01,723 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
> Flush status journal:
>   Acquiring readlock on region at 1578553261723
>   Flush successful flush result:CANNOT_FLUSH_MEMSTORE_EMPTY, 
> failureReason:Nothing to flush,flush seq id45226857 at 1578553261723
> {code}
> ... I added the printing of flushresult... i.e. cannot flush because store is 
> empty.
> Digging.





[jira] [Created] (HBASE-23676) Address feedback on HBASE-23055 Alter hbase:meta.

2020-01-10 Thread Michael Stack (Jira)
Michael Stack created HBASE-23676:
-

 Summary: Address feedback on HBASE-23055 Alter hbase:meta.
 Key: HBASE-23676
 URL: https://issues.apache.org/jira/browse/HBASE-23676
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Michael Stack
Assignee: Michael Stack
 Fix For: 2.3.0


Good feedback on HBASE-23055 came in after merge from [~zhangduo]. Opening this 
issue to address it.





[jira] [Resolved] (HBASE-23055) Alter hbase:meta

2020-01-09 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-23055.
---
Resolution: Fixed

Pushed on master branch too. Thanks for reviews [~bharathv]

> Alter hbase:meta
> 
>
> Key: HBASE-23055
> URL: https://issues.apache.org/jira/browse/HBASE-23055
> Project: HBase
>  Issue Type: Task
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 3.0.0, 2.3.0
>
>
> hbase:meta is currently hardcoded. Its schema cannot be changed.
> This issue is about allowing edits to the hbase:meta schema. It will allow us 
> to set encodings such as the block-with-indexes, which will help 
> quell CPU usage on the host carrying hbase:meta. A dynamic hbase:meta is the 
> first step on the road to being able to split meta.





[jira] [Created] (HBASE-23672) 350+ lossy-count threads running

2020-01-09 Thread Michael Stack (Jira)
Michael Stack created HBASE-23672:
-

 Summary: 350+ lossy-count threads running
 Key: HBASE-23672
 URL: https://issues.apache.org/jira/browse/HBASE-23672
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Michael Stack


Looking at a server under load (branch-2), I see 350 instances of lossy-count 
threads running. They look like this:

{code}
"lossy-count-0" #11672 daemon prio=5 os_prio=0 cpu=0.09ms elapsed=281.33s 
tid=0x7f1baee76800 nid=0x2411 waiting on condition  [0x7f1b78793000]
   java.lang.Thread.State: WAITING (parking)
 at jdk.internal.misc.Unsafe.park(java.base@11.0.4/Native Method)
 - parking to wait for  <0x910a91e0> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
 at 
java.util.concurrent.locks.LockSupport.park(java.base@11.0.4/LockSupport.java:194)
 at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@11.0.4/AbstractQueuedSynchronizer.java:2081)
 at 
java.util.concurrent.LinkedBlockingQueue.take(java.base@11.0.4/LinkedBlockingQueue.java:433)
 at 
java.util.concurrent.ThreadPoolExecutor.getTask(java.base@11.0.4/ThreadPoolExecutor.java:1054)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.4/ThreadPoolExecutor.java:1114)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.4/ThreadPoolExecutor.java:628)
 at java.lang.Thread.run(java.base@11.0.4/Thread.java:834)
{code}

Why do we need 350 threads?
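
For anyone wanting to reproduce the count without scanning a thread dump by hand, a small in-process census can help. This is a hedged sketch: the class name is hypothetical, and it only counts live threads by name prefix. It says nothing about why the threads exist.

```java
/** Hypothetical diagnostic that counts live threads whose name starts with a prefix. */
final class ThreadCensusSketch {
    static long countByPrefix(String prefix) {
        // getAllStackTraces returns a snapshot of all live threads in this JVM.
        return Thread.getAllStackTraces().keySet().stream()
            .filter(thread -> thread.getName().startsWith(prefix))
            .count();
    }
}
```

Calling `ThreadCensusSketch.countByPrefix("lossy-count")` inside the server JVM would give the number seen in the dump above.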





[jira] [Reopened] (HBASE-23286) Improve MTTR: Split WAL to HFile

2020-01-09 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack reopened HBASE-23286:
---

Reopening. Feature doesn't seem to be working on tip of branch-2.

> Improve MTTR: Split WAL to HFile
> 
>
> Key: HBASE-23286
> URL: https://issues.apache.org/jira/browse/HBASE-23286
> Project: HBase
>  Issue Type: Improvement
>  Components: MTTR
>Affects Versions: 3.0.0, 2.3.0
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 3.0.0, 2.3.0
>
>
> After HBASE-20724, the compaction event marker is no longer used on 
> failover. So our new proposal is to split WAL to HFile to improve MTTR. It has 3 
> steps:
>  # Read WAL and write HFile to region’s column family’s recovered.hfiles 
> directory.
>  # Open region.
>  # Bulkload the recovered.hfiles for every column family.
> The design doc is attached as a google doc. Any suggestions are welcome.





[jira] [Resolved] (HBASE-23378) Clean Up FSUtil setClusterId

2020-01-09 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-23378.
---
Fix Version/s: 3.0.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

Merged. Thanks for the patch [~belugabehr]

> Clean Up FSUtil setClusterId
> 
>
> Key: HBASE-23378
> URL: https://issues.apache.org/jira/browse/HBASE-23378
> Project: HBase
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Fix For: 3.0.0
>
>
> * Use try-with-resources
> * Remove bad practice of catching one's own Exceptions
> * Method signature 'wait' should be of type long to match JDK API
> * Add additional debugging
> * Do not swallow Interrupt status of thread
> * General cleanup
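
A few of the bullets above can be illustrated together. The sketch below is not the FSUtils code: the class, the StreamOpener interface and the method name are hypothetical. It shows try-with-resources for the stream, a long-typed wait to match Thread.sleep, and the interrupt flag being restored rather than swallowed.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

final class ClusterIdWriterSketch {
    /** Hypothetical hook that opens the destination stream and may fail. */
    interface StreamOpener {
        OutputStream open() throws IOException;
    }

    /**
     * Tries to write the id, retrying on IOException.
     * Returns false if all attempts fail or the thread is interrupted while waiting.
     */
    static boolean writeWithRetry(StreamOpener opener, String id, int attempts, long waitMillis) {
        for (int i = 0; i < attempts; i++) {
            // try-with-resources closes the stream on both success and failure.
            try (OutputStream out = opener.open()) {
                out.write(id.getBytes(StandardCharsets.UTF_8));
                return true;
            } catch (IOException e) {
                if (i + 1 < attempts) {
                    try {
                        Thread.sleep(waitMillis); // long, matching the JDK signature
                    } catch (InterruptedException ie) {
                        // Restore the interrupt status so callers can still observe it.
                        Thread.currentThread().interrupt();
                        return false;
                    }
                }
            }
        }
        return false;
    }
}
```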





[jira] [Resolved] (HBASE-23103) Survey incidence of table state queries

2020-01-09 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-23103.
---
  Assignee: Michael Stack
Resolution: Information Provided

Ok. The discussion here has good info but is moot in light of the form the 
parent issue took on commit. The parent no longer goes via master to find table 
state. That decision/development has been put off for now. Instead master issue 
adds meta table state as a method in Registry and adds an implementation to 
ZKAsyncRegistry that does lookup into zk, bypassing Master. For now resolving 
as 'Information Provided'. We'll be back to this topic after Master-based 
Registry lands. This latter project looks to change nature of Master from 
background janitorial player to active participant in data 

> Survey incidence of table state queries
> ---
>
> Key: HBASE-23103
> URL: https://issues.apache.org/jira/browse/HBASE-23103
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Blocker
> Fix For: 3.0.0
>
>
> Task that comes of parent issue. Parent makes it so we go via Master to 
> figure state of a table. It is the authority and since the parent issue adds 
> being able to enable/disable hbase:meta, table state is now in two places -- 
> in hbase:meta table... and elsewhere for the hbase:meta's state. Rather than 
> have client go to two locations dependent on which table is being asked 
> about, parent made it so we went to master. Parent allows that this puts more 
> load on the Master. [~zhangduo] brings up the valid concern that it might be 
> too much or that depending on the Master for state puts Master too much 
> in-line with read/writes.
> This issue is a survey to figure how much load and how much state-in-master 
> could mess up inline read/writes.





[jira] [Created] (HBASE-23668) Master log start filling with "Flush journal status" messages

2020-01-08 Thread Michael Stack (Jira)
Michael Stack created HBASE-23668:
-

 Summary: Master log start filling with "Flush journal status" 
messages
 Key: HBASE-23668
 URL: https://issues.apache.org/jira/browse/HBASE-23668
 Project: HBase
  Issue Type: Improvement
  Components: proc-v2
Reporter: Michael Stack
 Fix For: 2.3.0


Takes a while to get into this condition. Not easy to tell how because all logs 
have rolled off and I only have logs filled w/ below:

{code}
2020-01-09 07:01:01,723 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
Flush status journal:
Acquiring readlock on region at 1578553261723
Flush successful flush result:CANNOT_FLUSH_MEMSTORE_EMPTY, 
failureReason:Nothing to flush,flush seq id45226854 at 1578553261723
2020-01-09 07:01:01,723 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
Flush status journal:
Acquiring readlock on region at 1578553261723
Flush successful flush result:CANNOT_FLUSH_MEMSTORE_EMPTY, 
failureReason:Nothing to flush,flush seq id45226855 at 1578553261723
2020-01-09 07:01:01,723 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
Flush status journal:
Acquiring readlock on region at 1578553261723
Flush successful flush result:CANNOT_FLUSH_MEMSTORE_EMPTY, 
failureReason:Nothing to flush,flush seq id45226856 at 1578553261723
2020-01-09 07:01:01,723 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
Flush status journal:
Acquiring readlock on region at 1578553261723
Flush successful flush result:CANNOT_FLUSH_MEMSTORE_EMPTY, 
failureReason:Nothing to flush,flush seq id45226857 at 1578553261723
{code}

... I added the printing of flushresult... i.e. cannot flush because store is 
empty.

Digging.





[jira] [Resolved] (HBASE-23369) Auto-close 'unknown' Regions reported as OPEN on RegionServers

2020-01-03 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-23369.
---
Fix Version/s: 2.3.0
   3.0.0
 Hadoop Flags: Incompatible change
 Release Note: If a RegionServer reports a Region as OPEN in disagreement 
with Master's status on the Region, the Master now tells the RegionServer to 
silently close the Region.
 Assignee: Michael Stack
   Resolution: Fixed

Merged to branch-2 and master branch.

I think this belongs in branch-2.2 too. Shout and I'll pull it back.

> Auto-close 'unknown' Regions reported as OPEN on RegionServers
> --
>
> Key: HBASE-23369
> URL: https://issues.apache.org/jira/browse/HBASE-23369
> Project: HBase
>  Issue Type: Bug
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 3.0.0, 2.3.0
>
>
> In old days, if a RegionServer reported a variance that didn't agree w/ 
> Master view of the cluster, we'd kill the RegionServer.
> Lately, in tests that overrun a cluster, after a sustained high-load, Master 
> can start failing its updates against Meta (CallQueueTooBigException <= More 
> on this later). It then can lose proper accounting of all Region members. One 
> variant has a RegionServer reporting its list of open Regions to the Master 
> and the Master doesn't 'know' of a particular Region or the Master may know 
> the Region but expects it open on another RegionServer.
> Here is an example of how it looks each time RS reports:
> {code}
>  2019-12-03 07:07:00,757 WARN 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager: No 
> t1,08f5c285,1573094375485.ee78a0c951c1c902d8f3f3912394a0e5. RegionStateNode 
> but reported ONLINE at server.example.org,16020,1575354666245 
> (inServerRegionList=false).
>  2019-12-03 07:07:03,793 WARN 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager: No 
> t1,08f5c285,1573094375485.ee78a0c951c1c902d8f3f3912394a0e5. RegionStateNode 
> but reported ONLINE at server.example.org,16020,1575354666245 
> (inServerRegionList=false).
> {code}
> Will also show as an 'inconsistency' in the 'HBCK' tab on the Master UI.
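
The disagreement described above can be sketched as a small reconciliation check. This is a hedged illustration, not the AssignmentManager implementation: UnknownRegionSketch, regionsToClose, and the map-based "Master view" are hypothetical names. A reported Region is flagged when the Master has no record of it, or expects it to be open on a different RegionServer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of Master-side reconciliation; not the real AssignmentManager. */
public class UnknownRegionSketch {

    /**
     * masterView maps region name to the server the Master believes hosts it.
     * Returns the reported regions the Master would ask reportingServer to close.
     */
    public static List<String> regionsToClose(
            Map<String, String> masterView, String reportingServer, List<String> reportedOpen) {
        List<String> toClose = new ArrayList<>();
        for (String region : reportedOpen) {
            String expectedServer = masterView.get(region);
            // Unknown to the Master, or known but expected on another server.
            if (expectedServer == null || !expectedServer.equals(reportingServer)) {
                toClose.add(region);
            }
        }
        return toClose;
    }

    public static void main(String[] args) {
        Map<String, String> masterView = Map.of("regionA", "rs1", "regionB", "rs2");
        List<String> reported = List.of("regionA", "regionB", "regionC");
        // Regions reported by rs1 that disagree with the Master's view.
        System.out.println(regionsToClose(masterView, "rs1", reported));
    }
}
```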





[jira] [Resolved] (HBASE-23585) MetricsRegionServerWrapperImpl.getL1CacheHitCount always returns 200

2020-01-02 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-23585.
---
Fix Version/s: 1.6.1
 Hadoop Flags: Reviewed
   Resolution: Fixed

Pushed to branch-1. It seems to be a problem on this branch only (correct me if 
I am wrong [~DeanZ]). Thanks to reviewers [~binlijin] and [~janh]

> MetricsRegionServerWrapperImpl.getL1CacheHitCount always returns 200
> 
>
> Key: HBASE-23585
> URL: https://issues.apache.org/jira/browse/HBASE-23585
> Project: HBase
>  Issue Type: Bug
>  Components: metrics
>Affects Versions: 1.4.12
>Reporter: Baiqiang Zhao
>Assignee: Baiqiang Zhao
>Priority: Major
> Fix For: 1.6.1
>
>
> Looks like it was copied from a UT class and forgot to change it.





[jira] [Resolved] (HBASE-23632) DeadServer cleanup

2020-01-02 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-23632.
---
Fix Version/s: 2.3.0
   3.0.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

Pushed to branch-2+ (Applied it to branch-2.2 but then reverted because it is just 
an improvement and logging is changed mildly).

> DeadServer cleanup
> --
>
> Key: HBASE-23632
> URL: https://issues.apache.org/jira/browse/HBASE-23632
> Project: HBase
>  Issue Type: Improvement
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Minor
> Fix For: 3.0.0, 2.3.0
>
>
> Cleanup of DeadServer class shutting down access, undoing duplication, adding 
> doc., and removing unused code.
> One change is that we do not remove a server from 'processing' list when we 
> 'remove' deadservers; we let SCP do it since it owns processing list (Saw 
> issue where on fast restart of a server, the server was removed from 
> deadserver and from processing list though the SCP for the dead server was 
> still running -- no repercussions that I could see but a little confusing).





[jira] [Resolved] (HBASE-23596) HBCKServerCrashProcedure can double assign

2020-01-02 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-23596.
---
Fix Version/s: (was: 2.2.4)
   2.2.3
 Hadoop Flags: Reviewed
 Release Note: 
Makes it so the recently added HBCKServerCrashProcedure -- the SCP that gets 
invoked when an operator schedules an SCP via hbck2 scheduleRecoveries command 
-- now works the same as SCP EXCEPT if master knows nothing of the scheduled 
servername. In this latter case, HBCKSCP will do a full scan of hbase:meta 
looking for instances of the passed servername. If any found it will attempt 
cleanup of hbase:meta references by reassigning any found OPEN or OPENING and 
by closing any in CLOSING state.

Used to fix instances of what the 'HBCK Report' page shows as 'Unknown Servers'.
   Resolution: Fixed

Merged to branch-2.2+

> HBCKServerCrashProcedure can double assign
> --
>
> Key: HBASE-23596
> URL: https://issues.apache.org/jira/browse/HBASE-23596
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 3.0.0, 2.3.0, 2.2.3
>
>
> The new SCP that does SCP plus cleanup 'Unknown Servers' with mentions in 
> hbase:meta added by the below can make for double assignments.
> {code}
> commit c238891a26734e1e4276b6b1677a58cf83de5dc4
> Author: stack 
> Date:   Wed Nov 13 22:36:26 2019 -0800
> HBASE-23282 HBCKServerCrashProcedure for 'Unknown Servers'
> {code}





[jira] [Resolved] (HBASE-23628) Remove Apache Commons Digest Base64

2020-01-02 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-23628.
---
Fix Version/s: 3.0.0
 Release Note: 
From the PR:

"Yes. The two create the same output... I just wrote a small test suite to 
increase my confidence on that. I generated many tens of millions of random 
byte patterns and compared the output of the two algorithms. They came back 
identical every time.

"Just in case any inquiring minds would like to know, there is no longer an 
encoding required when generating the strings. The JDK implementation 
specifically specifies that strings returned are StandardCharsets.ISO_8859_1. 
This does not change anything because UTF8 and ISO_8859 overlap for the limited 
character set (64 characters) the encoding uses."
   Resolution: Fixed

Thanks for the patch [~belugabehr] and the work done to verify the change.

> Remove Apache Commons Digest Base64
> ---
>
> Key: HBASE-23628
> URL: https://issues.apache.org/jira/browse/HBASE-23628
> Project: HBase
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Fix For: 3.0.0
>
>
> Use the native JDK Base64 implementation instead.  Most places are using the 
> JDK version, but a couple of spots were missed.
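
A small sketch of the JDK replacement, using only java.util.Base64 and java.nio.charset.StandardCharsets. The class name JdkBase64Sketch and its helpers are hypothetical; the check in charsetsAgree illustrates the release note's point that the encoded bytes decode to the same string under ISO_8859_1 and UTF_8, since the Base64 alphabet is plain ASCII.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper showing the JDK Base64 encoder; not code from the patch. */
public class JdkBase64Sketch {

    /** Encodes raw bytes straight to a String; no charset argument is needed. */
    public static String encode(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    /** Decodes a Base64 String back to the original bytes. */
    public static byte[] decode(String encoded) {
        return Base64.getDecoder().decode(encoded);
    }

    /** True if the encoded bytes read identically as ISO_8859_1 and as UTF_8. */
    public static boolean charsetsAgree(byte[] raw) {
        byte[] encoded = Base64.getEncoder().encode(raw);
        String latin1 = new String(encoded, StandardCharsets.ISO_8859_1);
        String utf8 = new String(encoded, StandardCharsets.UTF_8);
        return latin1.equals(utf8) && latin1.equals(encode(raw));
    }

    public static void main(String[] args) {
        byte[] raw = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(encode(raw));
        System.out.println(charsetsAgree(raw));
    }
}
```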





[jira] [Created] (HBASE-23632) DeadServer cleanup

2020-01-02 Thread Michael Stack (Jira)
Michael Stack created HBASE-23632:
-

 Summary: DeadServer cleanup
 Key: HBASE-23632
 URL: https://issues.apache.org/jira/browse/HBASE-23632
 Project: HBase
  Issue Type: Improvement
Reporter: Michael Stack
Assignee: Michael Stack








[jira] [Resolved] (HBASE-23238) Additional test and checks for null references on ScannerCallableWithReplicas

2019-12-26 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-23238.
---
Resolution: Fixed

[~bharathv] I pushed the addendum on branch-2.1+. The patch wouldn't go back to 
branch-1. Should it? Thanks.

> Additional test and checks for null references on ScannerCallableWithReplicas
> -
>
> Key: HBASE-23238
> URL: https://issues.apache.org/jira/browse/HBASE-23238
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.2.12
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Minor
> Fix For: 3.0.0, 2.3.0, 1.6.0, 2.2.3, 2.1.8
>
> Attachments: HBASE-23238.branch-2.patch
>
>
> One of our customers running a 1.2 based version is facing NPE when scanning 
> data from a MR job. It happens when the map task is finalising:
> {noformat}
> ...
> 2019-09-10 14:17:22,238 INFO [main] org.apache.hadoop.mapred.MapTask: 
> Ignoring exception during close for 
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader@3a5b7d7e
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.setClose(ScannerCallableWithReplicas.java:99)
> at 
> org.apache.hadoop.hbase.client.ClientScanner.close(ClientScanner.java:730)
> at 
> org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.close(TableRecordReaderImpl.java:178)
> at 
> org.apache.hadoop.hbase.mapreduce.TableRecordReader.close(TableRecordReader.java:89)
> at 
> org.apache.hadoop.hbase.mapreduce.MultiTableInputFormatBase$1.close(MultiTableInputFormatBase.java:112)
> at 
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.close(MapTask.java:529)
> at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:2039)
> ...
> 2019-09-10 14:18:24,601 FATAL [IPC Server handler 5 on 35745] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: 
> attempt_1566832111959_6047_m_00_3 - exited : 
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.setClose(ScannerCallableWithReplicas.java:99)
> at 
> org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:264)
> at 
> org.apache.hadoop.hbase.client.ClientScanner.possiblyNextScanner(ClientScanner.java:248)
> at 
> org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:542)
> at 
> org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:371)
> at 
> org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:222)
> at 
> org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:147)
> at 
> org.apache.hadoop.hbase.mapreduce.MultiTableInputFormatBase$1.nextKeyValue(MultiTableInputFormatBase.java:139)
> ...
> {noformat}
> After some investigation, we found out that 1.2 based deployments will 
> consistently face this problem under the following conditions:
> 1) The sum of all the given row KVs size targeted to be returned in the scan 
> is larger than *max result size*;
> 2) At the same time, the scan filter is exhausted, so that all remaining KVs 
> should be filtered and not returned;
> We could simulate this with the UT being proposed in this PR. When checking 
> newer branches, though, I could verify this specific problem is not present 
> there; I believe it was indirectly sorted by changes from 
> HBASE-17489.
> Nevertheless, I think it would still be useful to have this extra test and 
> checks added as a safeguard measure.
>  





[jira] [Reopened] (HBASE-23238) Additional test and checks for null references on ScannerCallableWithReplicas

2019-12-26 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack reopened HBASE-23238:
---

Reopening to apply addendum.

> Additional test and checks for null references on ScannerCallableWithReplicas
> -
>
> Key: HBASE-23238
> URL: https://issues.apache.org/jira/browse/HBASE-23238
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.2.12
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Minor
> Fix For: 3.0.0, 2.3.0, 1.6.0, 2.1.8, 2.2.3
>
> Attachments: HBASE-23238.branch-2.patch
>
>
> One of our customers running a 1.2 based version is facing NPE when scanning 
> data from a MR job. It happens when the map task is finalising:
> {noformat}
> ...
> 2019-09-10 14:17:22,238 INFO [main] org.apache.hadoop.mapred.MapTask: 
> Ignoring exception during close for 
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader@3a5b7d7e
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.setClose(ScannerCallableWithReplicas.java:99)
> at 
> org.apache.hadoop.hbase.client.ClientScanner.close(ClientScanner.java:730)
> at 
> org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.close(TableRecordReaderImpl.java:178)
> at 
> org.apache.hadoop.hbase.mapreduce.TableRecordReader.close(TableRecordReader.java:89)
> at 
> org.apache.hadoop.hbase.mapreduce.MultiTableInputFormatBase$1.close(MultiTableInputFormatBase.java:112)
> at 
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.close(MapTask.java:529)
> at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:2039)
> ...
> 2019-09-10 14:18:24,601 FATAL [IPC Server handler 5 on 35745] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: 
> attempt_1566832111959_6047_m_00_3 - exited : 
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.setClose(ScannerCallableWithReplicas.java:99)
> at 
> org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:264)
> at 
> org.apache.hadoop.hbase.client.ClientScanner.possiblyNextScanner(ClientScanner.java:248)
> at 
> org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:542)
> at 
> org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:371)
> at 
> org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:222)
> at 
> org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:147)
> at 
> org.apache.hadoop.hbase.mapreduce.MultiTableInputFormatBase$1.nextKeyValue(MultiTableInputFormatBase.java:139)
> ...
> {noformat}
> After some investigation, we found out that 1.2 based deployments will 
> consistently face this problem under the following conditions:
> 1) The sum of all the given row KVs size targeted to be returned in the scan 
> is larger than *max result size*;
> 2) At the same time, the scan filter is exhausted, so that all remaining KVs 
> should be filtered and not returned;
> We could simulate this with the UT being proposed in this PR. When checking 
> newer branches, though, I could verify this specific problem is not present 
> there; I believe it was indirectly sorted by changes from 
> HBASE-17489.
> Nevertheless, I think it would still be useful to have this extra test and 
> checks added as a safeguard measure.
>  





[jira] [Resolved] (HBASE-23614) Revert miscommit "Add status when fixing hole" subsequently fixed by HBASE-23313

2019-12-23 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-23614.
---
Resolution: Not A Problem

Resolving as not a problem. Wellington overwrote my change with his commit.

> Revert miscommit "Add status when fixing hole" subsequently fixed by 
> HBASE-23313
> 
>
> Key: HBASE-23614
> URL: https://issues.apache.org/jira/browse/HBASE-23614
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.3.0
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
>
> The mis-commit was found by [~zhangduo].
> I started in on trying to fix setRegionState but then [~wchevreuil] fixed it 
> with
> commit 70bbc38aaefa7af336e274296766d4f3ece4646e
> Author: Wellington Ramos Chevreuil 
> Date:   Wed Nov 27 08:41:23 2019 +
> HBASE-23313 [hbck2] setRegionState should update Master in-memory sta… 
> (#864)
> Signed-off-by: Mingliang Liu 
> Signed-off-by: stack 
> My commit was mistakenly pushed to branch-2. Reverting.





[jira] [Created] (HBASE-23614) Revert miscommit "Add status when fixing hole" subsequently fixed by HBASE-23313

2019-12-23 Thread Michael Stack (Jira)
Michael Stack created HBASE-23614:
-

 Summary: Revert miscommit "Add status when fixing hole" 
subsequently fixed by HBASE-23313
 Key: HBASE-23614
 URL: https://issues.apache.org/jira/browse/HBASE-23614
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Michael Stack
Assignee: Michael Stack


The mis-commit was found by [~zhangduo].

I started in on trying to fix setRegionState but then [~wchevreuil] fixed it 
with

commit 70bbc38aaefa7af336e274296766d4f3ece4646e
Author: Wellington Ramos Chevreuil 
Date:   Wed Nov 27 08:41:23 2019 +

HBASE-23313 [hbck2] setRegionState should update Master in-memory sta… 
(#864)

Signed-off-by: Mingliang Liu 
Signed-off-by: stack 

My commit was mistakenly pushed to branch-2. Reverting.





[jira] [Resolved] (HBASE-20103) [pv2] AssignmentProcedure is too coarse grained

2019-12-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-20103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-20103.
---
Resolution: Won't Fix

Stale.

> [pv2] AssignmentProcedure is too coarse grained
> ---
>
> Key: HBASE-20103
> URL: https://issues.apache.org/jira/browse/HBASE-20103
> Project: HBase
>  Issue Type: Bug
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Critical
> Fix For: 3.0.0
>
>
> Comes of work on HBASE-20100 but in particular, in feedback from [~Apache9] 
> https://mail.google.com/mail/u/0/#inbox/161d8e41054be406
> The AP is too coarse-grained. There is precheck+start, then transform state 
> is edit meta setting state to OPENING and then dispatch (rpc)  Finish is 
> edit of meta and setting internal state. The edit of meta should be distinct 
> step at least.
> Would save on duplicated ops -- e.g. re-editing hbase:meta and dispatching 
> another RPC -- if we fail going into finishing. [~Apache9] brings up our 
> perhaps masking other state change hiccups when steps are so coarse-grained.
> Do same for unassignprocedure.





[jira] [Resolved] (HBASE-20266) Review current set of ignored tests

2019-12-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-20266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-20266.
---
Resolution: Later

Stale. Later.

> Review current set of ignored tests
> ---
>
> Key: HBASE-20266
> URL: https://issues.apache.org/jira/browse/HBASE-20266
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Critical
> Fix For: 3.0.0, 2.3.0
>
>
> [~Apache9] turned up a list of currently ignored tests. At first blush, it's 
> fine to ignore some such as TestHTraceHooks and TestRegionsOnMaster but 
> others could do with a review. This issue is about looking at the list to make 
> sure nothing important is missed for hbase2, that we marked why each 
> test was ignored with a comment, and that there is a follow-on JIRA to enable it.
> {code}
> TestRpcHandlerException
> TestRSKilledWhenInitializing
> TestHTraceHooks
> TestAsyncTableGetMultiThreadedWithEagerCompaction
> TestStochasticBalancerJmxMetrics
> TestReplicator
> TestQuotaThrottle
> TestFavoredStochasticLoadBalancer
> TestAsyncTableGetMultiThreadedWithBasicCompaction
> TestRegionPlacement
> TestMasterTransitions
> TestMemstoreLABWithoutPool
> TestRegionsOnMasterOptions
> TestRestoreSnapshotFromClientWithRegionReplicas
> TestMasterBalanceThrottling
> TestMasterProcedureWalLease
> TestRegionServerReadRequestMetrics
> TestHttpServerLifecycle
> TestHRegionServerBulkLoadWithOldSecureEndpoint
> {code}





[jira] [Resolved] (HBASE-20412) Update our compliance-checker from 2.1 to 2.4

2019-12-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-20412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-20412.
---
Resolution: Implemented

commit 0a6aec49813e794d7ed5d6608e0ee4fab5587ccd
Author: Mike Drob 
Date:   Tue Jun 12 13:23:13 2018 -0500

HBASE-19377 Update Java API CC version

Compatibility checker complaining about hash collisions, newer versions
use longer id strings.

> Update our compliance-checker from 2.1 to 2.4
> -
>
> Key: HBASE-20412
> URL: https://issues.apache.org/jira/browse/HBASE-20412
> Project: HBase
>  Issue Type: Bug
>Reporter: Michael Stack
>Priority: Major
> Attachments: update.txt
>
>
> I thought we had an issue to do this already but I can't find it.
> The newer compatibility-checker has the filtering on annotation added by one 
> of us (or at least asked-for by one-of-us).
> I tried it yesterday.  Seems to work nicely.





[jira] [Resolved] (HBASE-21167) Master killed after IOE in FanOutOneBlockAsyncDFSOutput on log roll

2019-12-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-21167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-21167.
---
Resolution: Later

Haven't seen this in a while.

> Master killed after IOE in FanOutOneBlockAsyncDFSOutput on log roll
> ---
>
> Key: HBASE-21167
> URL: https://issues.apache.org/jira/browse/HBASE-21167
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Reporter: Michael Stack
>Priority: Major
>
> Logging this in case we see it again. I had a Master working furiously. It 
> had assigned over 400k regions on startup. Then this happened which knocked 
> the hard-working server out:
> {code}
> 2018-09-06 07:50:18,983 ERROR org.apache.hadoop.hbase.master.HMaster: Master 
> server abort: loaded coprocessors are: 
> [org.apache.hadoop.hbase.security.access.AccessController, 
> com.cloudera.navigator.audit.hbase.MasterAuditCoProcessor]
> 2018-09-06 07:50:18,983 ERROR org.apache.hadoop.hbase.master.HMaster: * 
> ABORTING master vc0207.halxg.cloudera.com,22001,1536173228913: IOE in log 
> roller *
> java.io.IOException: Connection to 10.17.208.34/10.17.208.34:20002 closed
> at 
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutput$AckHandler.lambda$channelInactive$2(FanOutOneBlockAsyncDFSOutput.java:289)
> at 
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutput.failed(FanOutOneBlockAsyncDFSOutput.java:236)
> at 
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutput.access$300(FanOutOneBlockAsyncDFSOutput.java:99)
> at 
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutput$AckHandler.channelInactive(FanOutOneBlockAsyncDFSOutput.java:288)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
> at 
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:377)
> at 
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:342)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
> at 
> org.apache.hbase.thirdparty.io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:277)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHan

[jira] [Resolved] (HBASE-21350) Forward-port HBASE-21242 [amv2] Miscellaneous minor log and assign procedure create improvements

2019-12-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-21350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-21350.
---
Resolution: Won't Fix

Stale. Context is different now.

> Forward-port HBASE-21242 [amv2] Miscellaneous minor log and assign procedure 
> create improvements
> 
>
> Key: HBASE-21350
> URL: https://issues.apache.org/jira/browse/HBASE-21350
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 3.0.0, 2.3.0
>
>
> Sub-issue to forward port the parent. It's acting up and the parent has been 
> open long enough.





[jira] [Resolved] (HBASE-21308) Forward-port to branch-2 "HBASE-21259 [amv2] Revived deadservers; recreated serverstatenode"

2019-12-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-21308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-21308.
---
Resolution: Won't Fix

Stale. Context is different now.

> Forward-port to branch-2 "HBASE-21259 [amv2] Revived deadservers; recreated 
> serverstatenode"
> 
>
> Key: HBASE-21308
> URL: https://issues.apache.org/jira/browse/HBASE-21308
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Michael Stack
>Priority: Major
> Fix For: 3.0.0, 2.3.0
>
>
> TODO: Recast HBASE-21259 so it works for branch-2; stuff is different in 
> branch-2.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-21613) Up/Down arrows on home page listing regions can look like smudges; needs definition

2019-12-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-21613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-21613.
---
Resolution: Duplicate

Resolving as duplicate of HBASE-21403

> Up/Down arrows on home page listing regions can look like smudges; needs 
> definition
> ---
>
> Key: HBASE-21613
> URL: https://issues.apache.org/jira/browse/HBASE-21613
> Project: HBase
>  Issue Type: Bug
>  Components: UI
>Reporter: Michael Stack
>Priority: Major
>
> This has come up a few times now on branch-2.1 votes:  
> http://mail-archives.apache.org/mod_mbox/hbase-dev/201810.mbox/%3ccaaayanpjcjzsb+ynefeczp4pk9xgxxecw+pnpppvu6_ogoe...@mail.gmail.com%3E
>  and then later by our [~dbist13] voting on a 2.1.2RC (Artem added an image 
> here: https://photos.app.goo.gl/PziWBMAzXCwbZqmF8). The UI got a bit more 
> detail added around region transitioning and it cluttered the UI such that 
> the up/down sort arrows can look crowded.





[jira] [Resolved] (HBASE-9779) IntegrationTestLoadAndVerify fails deleting IntegrationTestLoadAndVerify table

2019-12-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-9779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-9779.
--
Resolution: Won't Fix

Stale. Context is different now.

> IntegrationTestLoadAndVerify fails deleting IntegrationTestLoadAndVerify 
> table 
> ---
>
> Key: HBASE-9779
> URL: https://issues.apache.org/jira/browse/HBASE-9779
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.96.0
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Critical
> Attachments: 9779part.txt
>
>
> As part of the test, we want to delete the created table to restore cluster 
> state.  Interestingly we can disable the table successfully but then 
> immediately after we fail the delete because we cannot get the table 
> descriptor -- getting the table descriptor is used to test if the table is present.
> The test for getDescriptor is kinda broke because it throws base IOE which 
> causes clients to retry over and over again as though the descriptor was 
> going to come back.
> This bug is kinda ugly because in at least one case it caused our 
> long-running hbase-it suite run to fail so would be good to fix.
> Here is sample from a test run:
> {code}
> Disabling table IntegrationTestLoadAndVerify 2013-10-11 18:27:53,485 INFO  
> [main] client.HBaseAdmin: Started disable of IntegrationTestLoadAndVerify
> 2013-10-11 18:27:53,526 INFO  [main] zookeeper.ZooKeeper: Initiating client 
> connection, connectString=a1805.halxg.cloudera.com:2181 sessionTimeout=9 
> watcher=catalogtracker-on-hconnection-0x5a7e666f
> 2013-10-11 18:27:53,527 INFO  [main] zookeeper.RecoverableZooKeeper: Process 
> identifier=catalogtracker-on-hconnection-0x5a7e666f connecting to ZooKeeper 
> ensemble=a1805.halxg.cloudera.com:2181
> 2013-10-11 18:27:53,527 INFO  
> [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn: 
> Opening socket connection to server 
> a1805.halxg.cloudera.com/10.20.200.105:2181. Will not attempt to authenticate 
> using SASL (unknown error)
> 2013-10-11 18:27:53,527 DEBUG [main] catalog.CatalogTracker: Starting catalog 
> tracker org.apache.hadoop.hbase.catalog.CatalogTracker@4ace08a5
> 2013-10-11 18:27:53,529 INFO  
> [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn: Socket 
> connection established to a1805.halxg.cloudera.com/10.20.200.105:2181, 
> initiating session
> 2013-10-11 18:27:53,539 INFO  
> [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn: 
> Session establishment complete on server 
> a1805.halxg.cloudera.com/10.20.200.105:2181, sessionid = 0x1412d47f53a5c70, 
> negotiated timeout = 4
> 2013-10-11 18:27:53,602 DEBUG [main] catalog.CatalogTracker: Stopping catalog 
> tracker org.apache.hadoop.hbase.catalog.CatalogTracker@4ace08a5
> 2013-10-11 18:27:53,662 INFO  [main] zookeeper.ZooKeeper: Session: 
> 0x1412d47f53a5c70 closed
> 2013-10-11 18:27:53,662 INFO  [main-EventThread] zookeeper.ClientCnxn: 
> EventThread shut down
> .2013-10-11 18:27:54,666 INFO  [main] zookeeper.ZooKeeper: Initiating client 
> connection, connectString=a1805.halxg.cloudera.com:2181 sessionTimeout=9 
> watcher=catalogtracker-on-hconnection-0x5a7e666f
> 2013-10-11 18:27:54,667 INFO  [main] zookeeper.RecoverableZooKeeper: Process 
> identifier=catalogtracker-on-hconnection-0x5a7e666f connecting to ZooKeeper 
> ensemble=a1805.halxg.cloudera.com:2181
> 2013-10-11 18:27:54,667 INFO  
> [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn: 
> Opening socket connection to server 
> a1805.halxg.cloudera.com/10.20.200.105:2181. Will not attempt to authenticate 
> using SASL (unknown error)
> 2013-10-11 18:27:54,667 DEBUG [main] catalog.CatalogTracker: Starting catalog 
> tracker org.apache.hadoop.hbase.catalog.CatalogTracker@692c0c5d
> 2013-10-11 18:27:54,667 INFO  
> [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn: Socket 
> connection established to a1805.halxg.cloudera.com/10.20.200.105:2181, 
> initiating session
> 2013-10-11 18:27:54,696 INFO  
> [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn: 
> Session establishment complete on server 
> a1805.halxg.cloudera.com/10.20.200.105:2181, sessionid = 0x1412d47f53a5c71, 
> negotiated timeout = 4
> 2013-10-11 18:27:54,821 DEBUG [main] catalog.CatalogTracker: Stopping catalog 
> tracker org.apache.hadoop.hbase.catalog.CatalogTracker@692c0c5d
> 2013-10-11 18:27:54,871 INFO  [main] zookeeper.ZooKeeper: Session: 
> 0x1412d47f53a5c71 closed
> 2013-10-11 18:27:54,871 INFO  [main-EventThread] zookeeper.ClientCnxn: 
> EventThread shut down
> .2013-10-11 18:27:55,890 INFO  [main] zookeeper.ZooKeeper: Initiating client 
> connection, connectString=a1805.halxg.cloudera.com:2181 sessio

[jira] [Resolved] (HBASE-9403) Build improvements

2019-12-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-9403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-9403.
--
Resolution: Won't Fix

Stale. Context is different now.

> Build improvements
> --
>
> Key: HBASE-9403
> URL: https://issues.apache.org/jira/browse/HBASE-9403
> Project: HBase
>  Issue Type: Task
>  Components: build
>Reporter: Michael Stack
>Priority: Major
>
> Here are some improvements we could make to the build.  Will list them as I 
> think of them.  Can do them individually as subtasks of this one.
> When I undo the hbase-X.Y.Z-hadoop1-bin.tar.gz tarball and look at the doc, 
> the version is hbase-X.Y.Z-hadoop1 in the javadoc.  Should be just 
> hbase-X.Y.Z.  This is so in the xref src and in javadoc.





[jira] [Resolved] (HBASE-9059) Address HBASE-8764 'Some MasterMonitorCallable should retry' review

2019-12-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-9059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-9059.
--
Resolution: Won't Fix

Stale. Context is different now.

> Address HBASE-8764 'Some MasterMonitorCallable should retry' review
> ---
>
> Key: HBASE-9059
> URL: https://issues.apache.org/jira/browse/HBASE-9059
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
>
> Jesse came in w/ some review post-commit.  Let me address in this followup.  
> Let me paste from our off-list correspondence:
> {quote}
> +++ 
> b/hbase-client/src/main/java/org/apache/hadoop/hbase/client/RegionOfflineException.java
> @@ -24,7 +24,7 @@ import org.apache.hadoop.hbase.exceptions.RegionException;
>  
>  /** Thrown when a table can not be located */
>  @InterfaceAudience.Public
> -@InterfaceStability.Stable
> +@InterfaceStability.Evolving
> Really? Same patch? Come on man - you are doing similar cleanup all over the 
> place (shakes head)... :)
> +@InterfaceStability.Stable
> +public class RpcRetryingCaller {
> Calling this stable the first time it's going in seems a bit presumptuous...
> +this.startTime = EnvironmentEdgeManager.currentTimeMillis();
> +int remaining = (int)(callTimeout - (this.startTime - 
> this.globalStartTime));
> +if (remaining < MIN_RPC_TIMEOUT) {
> +  // If there is no time left, we're trying anyway. It's too late.
> +  // 0 means no timeout, and it's not the intent here. So we secure both 
> cases by
> +  // resetting to the minimum.
> +  remaining = MIN_RPC_TIMEOUT;
> +}
> +RpcClient.setRpcTimeout(remaining);
> Looks like some new logic... seems reasonable to me, so I'll let it slide 
> this time :)
> {quote}
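
The timeout arithmetic in the last hunk quoted above can be tried in isolation. What follows is only a minimal sketch, not the code that went into RpcRetryingCaller: the class name RemainingTimeout is hypothetical, and the 2000 ms used for MIN_RPC_TIMEOUT is an illustrative assumption since the quoted diff does not show the constant's value.

```java
/** Hypothetical stand-alone version of the remaining-timeout computation. */
final class RemainingTimeout {
  // Illustrative value only; the real constant is not shown in the quoted diff.
  static final int MIN_RPC_TIMEOUT = 2000;

  /**
   * Returns how many milliseconds are left of callTimeout, measured from
   * globalStartTime, when an attempt begins at startTime.
   */
  static int remaining(int callTimeout, long globalStartTime, long startTime) {
    int remaining = (int) (callTimeout - (startTime - globalStartTime));
    if (remaining < MIN_RPC_TIMEOUT) {
      // Zero would mean "no timeout" and a negative value makes no sense,
      // so both cases are clamped to the minimum.
      remaining = MIN_RPC_TIMEOUT;
    }
    return remaining;
  }
}
```

With a 10000 ms call timeout, an attempt starting 3000 ms in gets 7000 ms; an attempt starting after the budget is spent still gets the 2000 ms floor rather than zero.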





[jira] [Resolved] (HBASE-9378) TestRegionFavoredNodes.testFavoredNodes

2019-12-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-9378.
--
Resolution: Won't Fix

Stale. Context is different now.

> TestRegionFavoredNodes.testFavoredNodes
> ---
>
> Key: HBASE-9378
> URL: https://issues.apache.org/jira/browse/HBASE-9378
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Reporter: Michael Stack
>Assignee: Devaraj Das
>Priority: Major
>
> https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/700/testReport/org.apache.hadoop.hbase.regionserver/TestRegionFavoredNodes/testFavoredNodes/
> {code}
> org.apache.hadoop.hbase.regionserver.TestRegionFavoredNodes.testFavoredNodes
> Failing for the past 1 build (Since Failed#700 )
> Took 61 ms.
> add description
> Error Message
> Block location 127.0.0.1:51233 not a favored node
> Stacktrace
> java.lang.AssertionError: Block location 127.0.0.1:51233 not a favored node
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.apache.hadoop.hbase.regionserver.TestRegionFavoredNodes.testFavoredNodes(TestRegionFavoredNodes.java:159)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>   at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>   at org.junit.runners.Suite.runChild(Suite.java:127)
>   at org.junit.runners.Suite.runChild(Suite.java:26)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>   at java.lang.Thread.run(Thread.java:662)
> {code}
> Any chance of your taking a looksee [~devaraj]?  What you reckon? Thanks.





[jira] [Resolved] (HBASE-9053) Reenable TestHFileOutputFormat.testExcludeAllFromMinorCompaction

2019-12-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-9053.
--
Resolution: Won't Fix

> Reenable TestHFileOutputFormat.testExcludeAllFromMinorCompaction
> 
>
> Key: HBASE-9053
> URL: https://issues.apache.org/jira/browse/HBASE-9053
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Reporter: Michael Stack
>Priority: Major
>
> Reenable TestHFileOutputFormat.testExcludeAllFromMinorCompaction after making 
> it so it is no longer flakey.  Was disabled over in HBASE-9051





[jira] [Resolved] (HBASE-8990) Reenable TestFromClientSideWithCoprocessor.testClientPoolThreadLocal

2019-12-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-8990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-8990.
--
Resolution: Won't Fix

Stale. Context is different now.

> Reenable TestFromClientSideWithCoprocessor.testClientPoolThreadLocal
> 
>
> Key: HBASE-8990
> URL: https://issues.apache.org/jira/browse/HBASE-8990
> Project: HBase
>  Issue Type: Task
>  Components: test
>Reporter: Michael Stack
>Priority: Major
>
> Look at HBASE-8989 and figure why it is flakey then reenable this test.  It 
> came in as part of HBASE-2939





[jira] [Resolved] (HBASE-8958) Sometimes we refer to the single .META. table region as ".META.,,1" and other times as ".META.,,1.1028785192"

2019-12-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-8958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-8958.
--
Resolution: Won't Fix

Stale. Context is different now.

> Sometimes we refer to the single .META. table region as ".META.,,1" and other 
> times as ".META.,,1.1028785192" 
> --
>
> Key: HBASE-8958
> URL: https://issues.apache.org/jira/browse/HBASE-8958
> Project: HBase
>  Issue Type: Bug
>Reporter: Michael Stack
>Priority: Major
>
> See here how we say in a log:
> {code}
> 2013-07-15 22:32:53,805 INFO  [main] regionserver.HRegion(4176): Open 
> {ENCODED => 1028785192, NAME => '.META.,,1', STARTKEY => '', ENDKEY => ''}
> {code}
> but when we open other regions we do:
> {code}
> 764 2013-07-15 22:40:10,867 INFO  [RS_OPEN_REGION-durruti:61987-0] 
> regionserver.HRegion: Open {ENCODED => 93dad2bbf6ff5ea0d7477f504b303346, NAME 
> => 'x,,1373953210791.93dad2bbf6ff5ea0d7477f504b303346.', ...
> {code}
> Note how in the second, the name includes the encoded name.
> We'll also do :
> {code}
> 2013-07-15 22:32:53,810 INFO  [main] regionserver.HRegion(629): Onlined 
> 1028785192/.META.; next sequenceid=1
> {code}
> vs
> {code}
> 785 2013-07-15 22:40:10,885 INFO  [AM.ZK.Worker-pool-2-thread-7] 
> master.RegionStates: Onlined 93dad2bbf6ff5ea0d7477f504b303346 on 
> durruti,61987,1373947581222
> {code}
> ... where we print the encoded name.
> Master web UI shows ".META.,,1.1028785192"
> Benoit originally noticed this.





[jira] [Resolved] (HBASE-8748) Be able to accommodate zookeeper going away for a minute or two -- or more

2019-12-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-8748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-8748.
--
Resolution: Won't Fix

Stale. Context is different now.

> Be able to accommodate zookeeper going away for a minute or two -- or more
> -
>
> Key: HBASE-8748
> URL: https://issues.apache.org/jira/browse/HBASE-8748
> Project: HBase
>  Issue Type: Brainstorming
>  Components: Zookeeper
>Reporter: Michael Stack
>Priority: Major
>
> I was talking w/ Christophe Taton yesterday and he asked what happens if 
> zookeeper goes away for a minute or two, say because of a network or 
> ensemble hiccup of some type.
> Unless the ensemble comes back inside the zk session timeout, the cluster 
> will go down.
> To my knowledge, zk has hiccuped a few times.  There was the bug where 
> sequence numbers rolled around the top causing the ensemble to blip (fixed in 
> a newer zk).  There was another event where some combination of 
> a leader election and accumulated log files (>100k) caused the 
> ensemble to blip at SU.  
> At FB apparently the zk session timeout is way up -- > 5 minutes -- in case a 
> top-of-the-rack switch reboots, partitioning the network and separating nodes 
> from the zk ensemble.  Rather than rely on the presence of ephemeral nodes, 
> they depend on heartbeats to determine whether a regionserver is present (w/ 
> some smarts so that if all members of a rack disappear at the same time, it 
> is treated as unlikely that they all crashed at the same time).
> I am stating the obvious, I know, but the base presumption that zk will just 
> always be there is lazy on our part and we should not be acting as though it 
> were.
> Marking this a brainstorming issue because it will need a bit of 
> discussion/design to undo our current presumption.





[jira] [Resolved] (HBASE-8717) ui is inconsistent in its use of server names

2019-12-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-8717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-8717.
--
Resolution: Implemented

Fixed elsewhere.

> ui is inconsistent in its use of server names
> -
>
> Key: HBASE-8717
> URL: https://issues.apache.org/jira/browse/HBASE-8717
> Project: HBase
>  Issue Type: Bug
>  Components: UI, Usability
>Reporter: Michael Stack
>Priority: Major
>
> In main master screen, the regionservers are listed showing their hostname 
> only though the heading is 'ServerName': sss-4 rather than 
> sss-4,60020,1369949440012.  Should we list ServerName here?  Would have port 
> on it.
> The dead servers tab shows full ServerName as in sss-4,60020,1369949440012.  
> The column is named ServerName.  This looks right.
> The name on the master main page is 'Master: sss-1'. Should it be 'Master: 
> sss-1,60020,1369949440012'; i.e. full ServerName?





[jira] [Resolved] (HBASE-8601) Make ROW bloom work w/ .META.

2019-12-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-8601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-8601.
--
Resolution: Won't Fix

Stale. Context is different now.

> Make ROW bloom work w/ .META.
> -
>
> Key: HBASE-8601
> URL: https://issues.apache.org/jira/browse/HBASE-8601
> Project: HBase
>  Issue Type: Bug
>  Components: Performance
>Reporter: Michael Stack
>Priority: Major
>
> I just tried enabling ROW blooms globally but tests against meta were failing 
> doing getClosestOrBefore.  If I waited, they worked.  Something odd is going 
> on here.  Would be good to have blooms enabled on .META.





[jira] [Resolved] (HBASE-8009) Fix and reenable the hbase-example unit tests.

2019-12-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-8009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-8009.
--
Resolution: Won't Fix

Stale. Context is different now.

> Fix and reenable the hbase-example unit tests.
> --
>
> Key: HBASE-8009
> URL: https://issues.apache.org/jira/browse/HBASE-8009
> Project: HBase
>  Issue Type: Task
>  Components: test
>Reporter: Michael Stack
>Priority: Critical
>
> The unit tests pass locally for me repeatedly but fail from time to time up 
> on jenkins.  HBASE-7994 disabled them.  This issue is about spending the time 
> to make sure they pass up on jenkins again.  They have been disabled because 
> unit tests have been failing way more often than they have been passing over 
> the last few months and we want to establish passing tests as the precedent 
> again.  Once that is in place, we can work on bringing back examples.





[jira] [Resolved] (HBASE-7909) How to figure if a Cell is deep or shallow.

2019-12-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-7909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-7909.
--
Resolution: Won't Fix

Stale. Context is different now.

> How to figure if a Cell is deep or shallow.
> ---
>
> Key: HBASE-7909
> URL: https://issues.apache.org/jira/browse/HBASE-7909
> Project: HBase
>  Issue Type: Task
>Reporter: Michael Stack
>Priority: Minor
>
> The CellScanner interface is how you iterate scanners.  It is more bare bones 
> than java Iterator, explicitly so, to minimize the need for retaining 
> references to the current Cell.
> The Interface currently has get/current to pull the Cell that is currently 
> loaded in the breech.  It also has (had) another method getDeepCopy.  This 
> latter was removed by hbase-7899 "Tools to build cell blocks" as suggested by 
> [~mcorgan] (and seconded by other reviewers in that they found it 
> problematic).
> So, how then to determine if the Cell you have is a deep or shallow copy?
> On the one hand, should we even be concerned?  The whole point of our Cell 
> retrofit, in part, is to force us to disconnect from how the Cell is 
> implemented, so maybe we should just do away w/ this notion of deepCopy 
> altogether and hope that in action we don't actually need it and that our 
> fixation is only because deepCopies are all we ever had when we were 
> exclusively KeyValue.
> Or, do we need to add a means of asking a Cell "Are you deep?" or having 
> deepCopies implement a subInterface -- StableCell or StandaloneCell?
> This issue raises the problem but I do not think it critical we deal with it 
> just now.  At least, I do not see imminent need, at least not currently where 
> we are still Cell backed by "deepCopy" KeyValues.  Maybe later when we have 
> different implementations this issue will come to the fore.  Until then, am 
> fine leaving it as minor.
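
The sub-interface option floated above could look roughly like the following. This is a sketch under assumptions, not anything in HBase: SketchCell is a cut-down stand-in for the real Cell interface, StandaloneCell is the marker name suggested in the description, and BufferBackedCell, CopiedCell and Cells.isDeep are hypothetical names made up for illustration.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the Cell interface; only what the sketch needs. */
interface SketchCell {
  byte[] valueArray();
  int valueOffset();
  int valueLength();
}

/** Marker sub-interface: implementations own their bytes (a "deep" cell). */
interface StandaloneCell extends SketchCell {}

/** Shallow cell: a view over a buffer that a scanner is free to reuse. */
final class BufferBackedCell implements SketchCell {
  private final byte[] buf;
  private final int offset;
  private final int length;

  BufferBackedCell(byte[] buf, int offset, int length) {
    this.buf = buf;
    this.offset = offset;
    this.length = length;
  }

  public byte[] valueArray() { return buf; }
  public int valueOffset() { return offset; }
  public int valueLength() { return length; }
}

/** Deep cell: copies the value bytes out of the source at construction time. */
final class CopiedCell implements StandaloneCell {
  private final byte[] value;

  CopiedCell(SketchCell src) {
    this.value = Arrays.copyOfRange(
        src.valueArray(), src.valueOffset(), src.valueOffset() + src.valueLength());
  }

  public byte[] valueArray() { return value; }
  public int valueOffset() { return 0; }
  public int valueLength() { return value.length; }
}

final class Cells {
  /** Answers "Are you deep?" by checking for the marker interface. */
  static boolean isDeep(SketchCell cell) {
    return cell instanceof StandaloneCell;
  }
}
```

The marker keeps the question out of the Cell interface itself: callers that need to retain a Cell past the next scanner advance check for StandaloneCell and copy only when the check fails.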





[jira] [Resolved] (HBASE-7025) Metric for how many WAL files a regionserver is carrying

2019-12-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-7025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-7025.
--
Resolution: Implemented

This is implemented I believe (at least I can see a graph that shows WAL counts 
per host... where I'm sitting)

> Metric for how many WAL files a regionserver is carrying
> 
>
> Key: HBASE-7025
> URL: https://issues.apache.org/jira/browse/HBASE-7025
> Project: HBase
>  Issue Type: Improvement
>  Components: metrics
>Reporter: Michael Stack
>Priority: Major
>
> A metric that shows how many WAL files a regionserver is carrying at any one 
> time would be useful for fingering those servers that are always over the 
> upper bounds and in need of attention





[jira] [Resolved] (HBASE-7023) Forward-port HBASE-6727 size-based HBaseServer callQueue throttle from 0.89fb branch

2019-12-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-7023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-7023.
--
Resolution: Won't Fix

Stale. Context is different now.

> Forward-port HBASE-6727 size-based HBaseServer callQueue throttle from 0.89fb 
> branch
> 
>
> Key: HBASE-7023
> URL: https://issues.apache.org/jira/browse/HBASE-7023
> Project: HBase
>  Issue Type: Improvement
>  Components: IPC/RPC
>Reporter: Michael Stack
>Assignee: Ted Yu
>Priority: Major
>  Labels: beginner
> Attachments: 6727-fb.txt
>
>
> Forward port the size-based throttle that is out in 0.89fb branch.  It's nicer 
> than what we have in trunk where we just count queue items.
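
To make the difference between counting queue items and bounding by size concrete, here is a rough sketch. It is not the 0.89fb implementation (that patch is not quoted here); SizeBoundedCallQueue and its methods are hypothetical names, and calls are modelled as plain byte arrays.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical call queue that admits work by total queued bytes, not item count. */
final class SizeBoundedCallQueue {
  private final Queue<byte[]> calls = new ArrayDeque<>();
  private final long maxQueuedBytes;
  private long queuedBytes;

  SizeBoundedCallQueue(long maxQueuedBytes) {
    this.maxQueuedBytes = maxQueuedBytes;
  }

  /** Rejects the call if admitting it would push the queue past its byte budget. */
  synchronized boolean offer(byte[] call) {
    if (queuedBytes + call.length > maxQueuedBytes) {
      return false;
    }
    calls.add(call);
    queuedBytes += call.length;
    return true;
  }

  /** Removes the oldest call, if any, and gives its bytes back to the budget. */
  synchronized byte[] poll() {
    byte[] call = calls.poll();
    if (call != null) {
      queuedBytes -= call.length;
    }
    return call;
  }

  synchronized long queuedBytes() {
    return queuedBytes;
  }
}
```

With a count-based limit, a handful of very large requests can occupy as many slots as the same number of tiny ones; bounding by bytes ties admission to the memory the queue actually holds.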





[jira] [Resolved] (HBASE-6902) Add doc and unit test of the various checksum settings

2019-12-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-6902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-6902.
--
Resolution: Won't Fix

Stale. Context is different now.

> Add doc and unit test of the various checksum settings
> --
>
> Key: HBASE-6902
> URL: https://issues.apache.org/jira/browse/HBASE-6902
> Project: HBase
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 0.95.2
>Reporter: Michael Stack
>Priority: Critical
>
> See HBASE-6868.  Doc the options, their pluses and negatives as well as the 
> bugs.





[jira] [Resolved] (HBASE-6154) Pom cleanups; move version properties above their use, add NO-MVN-MAN-VER, eclipse fixes

2019-12-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-6154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-6154.
--
Resolution: Won't Fix

Stale. Context is different now.

> Pom cleanups; move version properties above their use, add NO-MVN-MAN-VER, 
> eclipse fixes
> ---
>
> Key: HBASE-6154
> URL: https://issues.apache.org/jira/browse/HBASE-6154
> Project: HBase
>  Issue Type: Task
>  Components: pom
>Reporter: Michael Stack
>Priority: Minor
>  Labels: delete
>
> See Jesse comments over in hbase-6145.  Good stuff on changes to improve poms.





[jira] [Resolved] (HBASE-5868) jmx bean layout makes no sense

2019-12-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-5868.
--
Resolution: Won't Fix

Stale. Context is different now.

> jmx bean layout makes no sense
> --
>
> Key: HBASE-5868
> URL: https://issues.apache.org/jira/browse/HBASE-5868
> Project: HBase
>  Issue Type: Bug
>Reporter: Michael Stack
>Priority: Critical
>  Labels: delete
>
> Top level is 'hadoop'.  Under 'hadoop', there is 'HBase' MBean.  BESIDE this 
> MBean is one named Master and another named RegionServer.  It makes no sense.
> Top level should be org.apache.hbase.  Inside there should be an MBean per 
> running server.  It should be the server's ServerName, not 'Master' or 
> 'RegionServer'.
> Under RegionServer there is a RegionServer bean [sic], then beside it a 
> RegionServerStatistics and a RegionServerDynamicStatistics.
> I'd think that as they are, they are unusable.





[jira] [Resolved] (HBASE-6199) Change PENDING_OPEN scope from pre-rpc open to OPENING to just post-rpc open to OPENING

2019-12-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-6199.
--
Resolution: Won't Fix

Stale. Context is different now.

> Change PENDING_OPEN scope from pre-rpc open to OPENING to just post-rpc open 
> to OPENING
> ---
>
> Key: HBASE-6199
> URL: https://issues.apache.org/jira/browse/HBASE-6199
> Project: HBase
>  Issue Type: Improvement
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Attachments: 6199v4.txt, pending_open.txt, pending_open2.txt, 
> pending_open3.txt
>
>
> PENDING_OPEN currently is a murky state.  It's a master in-memory state with 
> no corresponding znode state that sits between OFFLINE and OPENING states.
> The OFFLINE state is set by the master when it goes to open a region.  
> OPENING is set by the regionserver after it's assumed control of a region and 
> is moving it through the OPENING process.  PENDING_OPEN currently spans the 
> open rpc invocation.  This state is in place pre-open-rpc-invocation, during 
> open-rpc-invocation, and post-rpc-invocation until we get the OPENING 
> callback. That PENDING_OPEN covers this many different conditions effectively 
> makes it unactionable.
> This issue proposes PENDING_OPEN only be in place post-rpc-invocation.  Now 
> its meaning is clear as the space between rpc-open-invocation and our 
> receiving the callback which sets RegionState to OPENING.  PENDING_OPEN 
> becomes actionable too in that if a regionserver dies post 
> rpc-open-invocation, we know that we can reassign the region.
> See 
> https://issues.apache.org/jira/browse/HBASE-6060?focusedCommentId=13292646&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13292646
>  for more discussion.
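
The narrowed scope proposed above can be pictured as a tiny state machine. This is a sketch under assumptions, not the AssignmentManager code: SketchRegionState, RegionOpenTracker and the method names are hypothetical, and only the three states named in the description are modelled.

```java
/** Hypothetical subset of region states, limited to the three discussed. */
enum SketchRegionState { OFFLINE, PENDING_OPEN, OPENING }

/** Hypothetical master-side tracker for one region being opened. */
final class RegionOpenTracker {
  private SketchRegionState state = SketchRegionState.OFFLINE;

  SketchRegionState state() {
    return state;
  }

  /** Under the proposal, PENDING_OPEN is set only once the open rpc has been invoked. */
  void onOpenRpcInvoked() {
    require(SketchRegionState.OFFLINE);
    state = SketchRegionState.PENDING_OPEN;
  }

  /** The regionserver's callback moves the region to OPENING. */
  void onOpeningCallback() {
    require(SketchRegionState.PENDING_OPEN);
    state = SketchRegionState.OPENING;
  }

  /**
   * True only in the window between the rpc invocation and the OPENING callback.
   * Per the proposal, a regionserver death in this window means the region can
   * be reassigned; other states are outside what this sketch models.
   */
  boolean inPostRpcWindow() {
    return state == SketchRegionState.PENDING_OPEN;
  }

  private void require(SketchRegionState expected) {
    if (state != expected) {
      throw new IllegalStateException("expected " + expected + " but was " + state);
    }
  }
}
```

Because the state is entered at exactly one point, seeing PENDING_OPEN tells the master one specific thing (the rpc went out and no callback has arrived) instead of three possible things.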





[jira] [Resolved] (HBASE-5275) Create migration for hbase-2600 meta table rejigger so regions denoted by end row

2019-12-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-5275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-5275.
--
Resolution: Won't Fix

Stale. Context is different now.

> Create migration for hbase-2600 meta table rejigger so regions denoted by end 
> row
> -
>
> Key: HBASE-5275
> URL: https://issues.apache.org/jira/browse/HBASE-5275
> Project: HBase
>  Issue Type: Task
>Reporter: Michael Stack
>Priority: Major
>
> Chatting with Alex, we'd do as was done previously: we'll can data from 
> 0.92 and then have a test that unbundles this canned data, migrates it and 
> then makes sure all still works.  Migration test would include verification 
> of idempotency; i.e. if migration fails midway, we should be able to rerun it.
> Canned data should include a meta with splits and WALs to split (migrations 
> usually require clean shutdown so no WALs should be in place but just in 
> case... And replication is reading logs)
> We were thinking that on startup, we'd check hbase.version file.  If not 
> updated, we'd rewrite .META. offline before starting up.
> In offline mode -- open of the .META. regions -- we'd do a rewrite per row 
> changing the HRegionInfo version from VERSION=1 to VERSION=2.
> VERSION=2 is the new format HRegionInfo.
> VERSION=2 will use endrow but it will keep its current encoded name (though 
> it was generated with startrow as input) so we don't have to move stuff 
> around in filesystem.
> New HRIs subsequent to the migration will be written out as VERSION=3.  A 
> VERSION=3 has endrow in its name but the encoded name will be made using 
> startrow+endrow+regionid+tablename rather than just 
> startrow+regionid+tablename as in VERSION=1.
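
The per-row rewrite and the idempotency requirement described above could be exercised with something like the sketch below. It is an illustration only: SketchRegionInfo is a hypothetical, simplified stand-in for HRegionInfo, MetaMigrationSketch.migrate is an invented name, and the in-memory list stands in for the offline .META. rows.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified stand-in for an HRegionInfo row in .META. */
final class SketchRegionInfo {
  final int version;
  final String encodedName;
  final String startRow;
  final String endRow;

  SketchRegionInfo(int version, String encodedName, String startRow, String endRow) {
    this.version = version;
    this.encodedName = encodedName;
    this.startRow = startRow;
    this.endRow = endRow;
  }
}

final class MetaMigrationSketch {
  /**
   * Rewrites VERSION=1 rows as VERSION=2. Rows that are already at a later
   * version pass through untouched, so rerunning after a partial failure is safe.
   */
  static List<SketchRegionInfo> migrate(List<SketchRegionInfo> rows) {
    List<SketchRegionInfo> out = new ArrayList<>(rows.size());
    for (SketchRegionInfo row : rows) {
      if (row.version == 1) {
        // Keep the encoded name so nothing has to move in the filesystem.
        out.add(new SketchRegionInfo(2, row.encodedName, row.startRow, row.endRow));
      } else {
        out.add(row);
      }
    }
    return out;
  }
}
```

A migration test along the lines described would run migrate over the canned rows twice and check that the second pass changes nothing.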





[jira] [Resolved] (HBASE-3628) Add upper bound on threads for TThreadPoolServer; too many have run into the OOME can't create native thread because thrift spawns w/o bound

2019-12-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-3628.
--
Resolution: Won't Fix

Stale. Context is different now.

> Add upper bound on threads for TThreadPoolServer; too many have run into the 
> OOME can't create native thread because thrift spawns w/o bound
> 
>
> Key: HBASE-3628
> URL: https://issues.apache.org/jira/browse/HBASE-3628
> Project: HBase
>  Issue Type: Bug
>  Components: Thrift
>Reporter: Michael Stack
>Priority: Major
>  Labels: thrift, thrift2
>
> See tail of this thread:
> http://search-hadoop.com/m/Ooyif0dZ89/major+hdfs+issues&subj=Re+major+hdfs+issues
> We need to hack in something like the below:
> {code}
> diff --git a/src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java b/src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java
> index 06621ab..74856af 100644
> --- a/src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java
> +++ b/src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java
> @@ -69,6 +69,7 @@ import org.apache.hadoop.hbase.thrift.generated.TRegionInfo;
>  import org.apache.hadoop.hbase.thrift.generated.TRowResult;
>  import org.apache.hadoop.hbase.util.Bytes;
>  import org.apache.thrift.TException;
> +import org.apache.thrift.TProcessorFactory;
>  import org.apache.thrift.protocol.TBinaryProtocol;
>  import org.apache.thrift.protocol.TCompactProtocol;
>  import org.apache.thrift.protocol.TProtocolFactory;
> @@ -911,9 +912,25 @@ public class ThriftServer {
>} else {
>  transportFactory = new TTransportFactory();
>}
> -
> -  LOG.info("starting HBase ThreadPool Thrift server on " + listenAddress + ":" + Integer.toString(listenPort));
> -  server = new TThreadPoolServer(processor, serverTransport, transportFactory, protocolFactory);
> +  TThreadPoolServer.Options poolServerOptions =
> +new TThreadPoolServer.Options();
> +  int maxWorkerThreads = Integer.MAX_VALUE;
> +  if (cmd.hasOption("maxWorkerThreads")) {
> +try {
> +  maxWorkerThreads =
> +Integer.parseInt(cmd.getOptionValue("maxWorkerThreads", "" + Integer.MAX_VALUE));
> +} catch (NumberFormatException e) {
> +  LOG.error("Could not parse maxWorkerThreads option", e);
> +  printUsageAndExit(options, -1);
> +}
> +  }
> +  poolServerOptions.maxWorkerThreads = maxWorkerThreads;
> +  LOG.info("starting HBase ThreadPool Thrift server on " + listenAddress +
> +":" + Integer.toString(listenPort) +
> +", maxWorkerThreads=" + maxWorkerThreads);
> +  server = new TThreadPoolServer(processor, serverTransport,
> +transportFactory, transportFactory, protocolFactory, protocolFactory,
> +poolServerOptions);
>  }
> {code}
> ...only with better factoring AND exposing other options in Options; they 
> look useful.
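
For illustration only, here is a minimal sketch of a bounded worker pool built on plain java.util.concurrent. It is not the Thrift TThreadPoolServer.Options API; the class name BoundedWorkerPool and its parameters are made up for this example. The point is only that the pool grows to a cap and then rejects work rather than spawning native threads without bound.

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration; not part of HBase or Thrift.
final class BoundedWorkerPool {
  // Builds a pool that grows on demand up to maxWorkerThreads and then
  // rejects new work instead of creating more threads.
  static ThreadPoolExecutor create(int minWorkerThreads, int maxWorkerThreads) {
    if (maxWorkerThreads < 1 || minWorkerThreads < 0
        || minWorkerThreads > maxWorkerThreads) {
      throw new IllegalArgumentException("bad thread bounds");
    }
    return new ThreadPoolExecutor(
        minWorkerThreads, maxWorkerThreads,
        60L, TimeUnit.SECONDS,                 // idle threads above the minimum time out
        new SynchronousQueue<Runnable>(),      // hand-off queue, so no hidden backlog
        new ThreadPoolExecutor.AbortPolicy()); // RejectedExecutionException when saturated
  }
}
```

With a SynchronousQueue, a submit either finds an idle worker, starts a new one below the cap, or is rejected, so the caller sees the overload instead of an OOME.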





[jira] [Resolved] (HBASE-3675) hbase.hlog.split.skip.errors is false by default but we don't act properly when its true; can make for inconsistent view

2019-12-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-3675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-3675.
--
Resolution: Won't Fix

Stale. Context is different now.

> hbase.hlog.split.skip.errors is false by default but we don't act properly 
> when its true; can make for inconsistent view
> 
>
> Key: HBASE-3675
> URL: https://issues.apache.org/jira/browse/HBASE-3675
> Project: HBase
>  Issue Type: Bug
>Reporter: Michael Stack
>Priority: Critical
>
> So, by default hbase.hlog.split.skip.errors is false, so we should not be 
> skipping errors (what should we do instead, abort?).
> Anyway, see https://issues.apache.org/jira/browse/HBASE-3674.  It has a 
> checksum error on the near-to-last log BUT it writes out the recovered.edits 
> gotten so far.  We then go and assign the regions anyway, applying the edits 
> gotten so far, though there are edits behind the checksum error still to be 
> recovered.  Not good.
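
A rough sketch of the two behaviours one might expect from the flag, under the assumption that false means "fail the whole split and do not assign" and true means "set the bad log aside and carry on". SkipErrorsSketch, Result and the reader function are hypothetical stand-ins, not the HBase splitter.

```java
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for WAL splitting; none of these names are HBase APIs.
final class SkipErrorsSketch {
  // Outcome of a split: edits recovered plus the logs set aside as corrupt.
  static final class Result {
    final List<String> edits = new ArrayList<>();
    final List<String> corruptLogs = new ArrayList<>();
  }

  // reader returns the edits in a log, or throws UncheckedIOException on a read error.
  static Result split(List<String> logs, Function<String, List<String>> reader,
      boolean skipErrors) {
    Result result = new Result();
    for (String log : logs) {
      try {
        result.edits.addAll(reader.apply(log));
      } catch (UncheckedIOException e) {
        if (!skipErrors) {
          // Fail the whole split; the caller should not go on to assign regions.
          throw e;
        }
        // Skipping: remember the bad log so the gap in edits stays visible.
        result.corruptLogs.add(log);
      }
    }
    return result;
  }
}
```

Either way the caller learns something: an exception that should block assignment, or a list of logs whose edits were not applied.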





[jira] [Resolved] (HBASE-3638) If a FS bootstrap, need to also ensure ZK is cleaned

2019-12-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-3638.
--
Resolution: Won't Fix

Stale. Context is different now.

> If a FS bootstrap, need to also ensure ZK is cleaned
> 
>
> Key: HBASE-3638
> URL: https://issues.apache.org/jira/browse/HBASE-3638
> Project: HBase
>  Issue Type: Bug
>Reporter: Michael Stack
>Priority: Minor
>  Labels: beginner
>
> In a test environment running a cycle of start, operation, kill hbase (repeat), 
> I noticed that we were doing a bootstrap on startup but then picking up 
> the previous cycle's zk state.  It made for a mess in the test.
> Last thing seen on previous cycle was:
> {code}
> 2011-03-11 06:33:36,708 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_OPENING, server=X.X.X.60020,1299853933073, 
> region=1028785192/.META.
> {code}
> Then, in the messed up cycle I saw:
> {code}
> 2011-03-11 06:42:48,530 INFO org.apache.hadoop.hbase.master.MasterFileSystem: 
> BOOTSTRAP: creating ROOT and first META regions
> .
> {code}
> Then after setting watcher on .META., we get a 
> {code}
> 2011-03-11 06:42:58,301 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Processing region 
> .META.,,1.1028785192 in state RS_ZK_REGION_OPENED
> 2011-03-11 06:42:58,302 WARN 
> org.apache.hadoop.hbase.master.AssignmentManager: Region in transition 
> 1028785192 references a server no longer up X.X.X; letting RIT timeout so 
> will be assigned elsewhere
> {code}
> We're all confused.
> Should at least clear our zk if a bootstrap happened.





[jira] [Resolved] (HBASE-2958) When hbase.hlog.split.skip.errors is set to false, we fail the split but thats it

2019-12-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-2958.
--
Resolution: Won't Fix

Stale. Context is different now.

> When hbase.hlog.split.skip.errors is set to false, we fail the split but 
> thats it
> -
>
> Key: HBASE-2958
> URL: https://issues.apache.org/jira/browse/HBASE-2958
> Project: HBase
>  Issue Type: Bug
>Reporter: Michael Stack
>Priority: Major
>  Labels: delete
>
> When hbase.hlog.split.skip.errors is set to false, if we encounter an error 
> splitting, splitting stops and the exception is allowed to propagate up the 
> stack.  I see that it's caught in the new MasterFileSystem class and logged, 
> but that's it.  It would seem processing continues BUT we've dropped the 
> edits in the split.  We need to do better (default is 
> hbase.hlog.split.skip.errors set to false -- i.e. skip errors but keep going).





[jira] [Resolved] (HBASE-2236) Upper bound of outstanding WALs can be overrun; take 2 (take 1 was hbase-2053)

2019-12-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-2236.
--
Resolution: Won't Fix

Still an issue but context is different now. Resolving this one.

> Upper bound of outstanding WALs can be overrun; take 2 (take 1 was hbase-2053)
> --
>
> Key: HBASE-2236
> URL: https://issues.apache.org/jira/browse/HBASE-2236
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, wal
>Reporter: Michael Stack
>Priority: Critical
>  Labels: moved_from_0_20_5
>
> So hbase-2053 is not aggressive enough.  WALs can still overwhelm the upper 
> limit on log count.  While the code added by HBASE-2053, when done, will 
> ensure we let go of the oldest WAL, to do it, we might have to flush many 
> regions.  E.g:
> {code}
> 2010-02-15 14:20:29,351 INFO org.apache.hadoop.hbase.regionserver.HLog: Too 
> many hlogs: logs=45, maxlogs=32; forcing flush of 5 regions(s): 
> test1,193717,1266095474624, test1,194375,1266108228663, 
> test1,195690,1266095539377, test1,196348,1266095539377, 
> test1,197939,1266069173999
> {code}
> This takes time.  If we are taking on edits at a furious rate, we might have 
> rolled the log again in the meantime, maybe more than once.
> Also log rolls happen inline with a put/delete as soon as it hits the 64MB 
> (default) boundary whereas the necessary flushing is done in background by a 
> single thread and the memstore can overrun the (default) 64MB size.  Flushes 
> needed to release logs will be mixed in with "natural" flushes as memstores 
> fill.  Flushes may take longer than the writing of an HLog because they can 
> be larger.
> So, on an RS that is struggling the tendency would seem to be for a slight 
> rise in WALs.  Only if the RS gets a breather will the flusher catch up.
> If HBASE-2087 happens, then the count of WALs gets a boost.
> Ideas to fix this for good would be :
> + Priority queue for queuing up flushes with those that are queued to free up 
> WALs having priority
> + Improve the HBASE-2053 code so that it will free more than just the last 
> WAL, maybe even queuing flushes so we clear all WALs such that we are back 
> under the maximum WALS threshold again.
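
The first idea above could look something like the sketch below: a priority queue in which flushes needed to release a WAL jump ahead of "natural" flushes, with arrival order as the tie-breaker. FlushQueueSketch and FlushRequest are hypothetical names for illustration, not the regionserver's flusher.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical types for illustration; not HBase classes.
final class FlushQueueSketch {
  static final class FlushRequest {
    final String regionName;
    final boolean freesWal;  // true if this flush is needed to release an old WAL
    final long enqueueSeq;   // arrival order, used as a tie-breaker

    FlushRequest(String regionName, boolean freesWal, long enqueueSeq) {
      this.regionName = regionName;
      this.freesWal = freesWal;
      this.enqueueSeq = enqueueSeq;
    }
  }

  // WAL-freeing flushes sort first; within a class, first come, first served.
  static final Comparator<FlushRequest> ORDER = (a, b) -> {
    if (a.freesWal != b.freesWal) {
      return a.freesWal ? -1 : 1;
    }
    return Long.compare(a.enqueueSeq, b.enqueueSeq);
  };

  static PriorityQueue<FlushRequest> newQueue() {
    return new PriorityQueue<>(ORDER);
  }
}
```

A real flusher would also need to guard against starving natural flushes when WAL-freeing requests keep arriving; that is left out here.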





[jira] [Resolved] (HBASE-1872) Dirty, fast, kill table script

2019-12-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-1872.
--
Resolution: Won't Fix

Stale

> Dirty, fast, kill table script
> --
>
> Key: HBASE-1872
> URL: https://issues.apache.org/jira/browse/HBASE-1872
> Project: HBase
>  Issue Type: Bug
>Reporter: Michael Stack
>Priority: Major
>  Labels: beginner
> Attachments: kill_table.rb
>
>
> Some fellas embedding hbase want to be able to kill tables quickly between 
> tests; they don't want to have to wait on enable/disable stuff.





[jira] [Resolved] (HBASE-1611) Have shell output binary hex-encoded rather than octal-encoded

2019-12-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-1611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-1611.
--
Resolution: Won't Fix

Stale

> Have shell output binary hex-encoded rather than octal-encoded
> --
>
> Key: HBASE-1611
> URL: https://issues.apache.org/jira/browse/HBASE-1611
> Project: HBase
>  Issue Type: Bug
>Reporter: Michael Stack
>Priority: Major
>  Labels: beginner
>
> Native Ruby String dump and inspect output unprintables in octal.  Don't seem 
> to be able to change that fact.  Figure way to do them as hex to match 
> binaries in UI.
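
The shell is JRuby, but the encoding itself is easy to show in Java. A minimal sketch, assuming the target form is printable ASCII as-is and everything else as \xNN; HexEscapeSketch is a made-up name and this is not claimed to match the UI's output byte for byte.

```java
// Hypothetical helper for illustration only.
final class HexEscapeSketch {
  // Printable ASCII passes through; every other byte (and the backslash
  // itself, to keep the output unambiguous) becomes \xNN.
  static String toHexEscaped(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      int ch = b & 0xFF;
      if (ch >= 0x20 && ch <= 0x7E && ch != '\\') {
        sb.append((char) ch);
      } else {
        sb.append(String.format("\\x%02X", ch));
      }
    }
    return sb.toString();
  }
}
```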





[jira] [Resolved] (HBASE-23572) In 'HBCK Report', distinguish between live, dead, and unknown servers

2019-12-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-23572.
---
Fix Version/s: 2.2.3
   2.3.0
   3.0.0
 Hadoop Flags: Reviewed
 Assignee: Michael Stack
   Resolution: Fixed

Merged manually to branch-2.2+. Thanks for review [~busbey]

> In 'HBCK Report', distinguish between live, dead, and unknown servers
> -
>
> Key: HBASE-23572
> URL: https://issues.apache.org/jira/browse/HBASE-23572
> Project: HBase
>  Issue Type: Bug
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Trivial
> Fix For: 3.0.0, 2.3.0, 2.2.3
>
>
> Debugging, when viewing 'HBCK Report' sections, it helps if we know if 
> referenced server is online, dead, or unknown.
> Add ornamentation so that when we mention a servername in 'HBCK Report', if 
> live, then show the server as link (to live server), if dead, show it in 
> italics, and if unknown, show it plain text. 





[jira] [Created] (HBASE-23600) Improve chances of edits landing into hbase:meta even when high load

2019-12-19 Thread Michael Stack (Jira)
Michael Stack created HBASE-23600:
-

 Summary: Improve chances of edits landing into hbase:meta even 
when high load
 Key: HBASE-23600
 URL: https://issues.apache.org/jira/browse/HBASE-23600
 Project: HBase
  Issue Type: Improvement
  Components: rpc
Reporter: Michael Stack


Of late I've been testing clusters under high load to study failures and to 
figure how to effect recovery if cluster is unable to recover on its own.

One interesting case is a RS that is struggling mostly because writes to HDFS 
are backed up and sync calls are running very slow taking a long time to 
complete. The RPC backs up with waiting requests, and eventually goes over one 
or more bounds. The RS then starts throwing CallQueueTooBigExceptions. This 
struggling state can last a good while. We throw CQTBEs whatever the priority 
of the incoming request.

We throw CQTBE in two places; on original parse of the request before we 
dispatch it on a handler -- here we check size of all queues and if over the 
threshold (default 1G), throw the exception -- and then later when we dispatch 
the request to internal queues, we'll count items in queue and if over default 
in any one queue (default is 10 * handler count), we'll fail dispatch and again 
throw CQTBE.

We shouldn't be running w/ big queues. We should be rejecting Requests we know 
we'll never process in time before client loses interest (See the CoDel thesis 
and the implementations added a good while back). TODO.

Meantime, I was looking to see what would happen if, having read a 
high-priority request, I let it through even when above thresholds rather than 
dropping it on the floor. My main concern is edits to hbase:meta. Under 
sustained, saturated load on the RS carrying hbase:meta, edits may not land. 
The result is incomplete Procedures and a disorientated Master. I was playing 
w/ trying to put off the corruption as long as possible, experimenting (CoDel 
doesn't do priority at first blush; we probably want to add this).
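
To make the experiment concrete, here is a sketch of the two CQTBE checks with a priority bypass bolted on. It collapses the two checkpoints (parse time and dispatch time) into one method for brevity. CallAdmissionSketch, the Decision enum and highPriorityThreshold are hypothetical; only the two bounds come from the description above.

```java
// Hypothetical admission check for illustration; not the HBase RPC scheduler.
final class CallAdmissionSketch {
  enum Decision { ACCEPT, REJECT_TOTAL_SIZE, REJECT_QUEUE_LENGTH }

  final long maxTotalQueueBytes;   // the description mentions a 1G default
  final int maxCallsPerQueue;      // the description mentions 10 * handler count
  final int highPriorityThreshold; // assumed cut-off, not an HBase constant

  CallAdmissionSketch(long maxTotalQueueBytes, int maxCallsPerQueue,
      int highPriorityThreshold) {
    this.maxTotalQueueBytes = maxTotalQueueBytes;
    this.maxCallsPerQueue = maxCallsPerQueue;
    this.highPriorityThreshold = highPriorityThreshold;
  }

  Decision admit(long queuedBytes, int callsInTargetQueue, long callBytes, int priority) {
    // The experiment: high-priority calls (e.g. edits to hbase:meta) skip both bounds.
    if (priority >= highPriorityThreshold) {
      return Decision.ACCEPT;
    }
    // Check 1: total bytes queued across all queues.
    if (queuedBytes + callBytes > maxTotalQueueBytes) {
      return Decision.REJECT_TOTAL_SIZE;
    }
    // Check 2: number of calls already in the queue this call would land in.
    if (callsInTargetQueue >= maxCallsPerQueue) {
      return Decision.REJECT_QUEUE_LENGTH;
    }
    return Decision.ACCEPT;
  }
}
```

An unconditional bypass only postpones trouble if high-priority traffic is itself what is saturating the server, so a real version would presumably want its own, looser bound.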







[jira] [Created] (HBASE-23596) HBCKServerCrashProcedure can double assign

2019-12-18 Thread Michael Stack (Jira)
Michael Stack created HBASE-23596:
-

 Summary: HBCKServerCrashProcedure can double assign
 Key: HBASE-23596
 URL: https://issues.apache.org/jira/browse/HBASE-23596
 Project: HBase
  Issue Type: Bug
  Components: proc-v2
Reporter: Michael Stack
 Fix For: 2.2.3


The new SCP variant added by the commit below, which does the usual SCP work 
plus cleanup of 'Unknown Servers' mentioned in hbase:meta, can make for double 
assignments.

{code}
commit c238891a26734e1e4276b6b1677a58cf83de5dc4
Author: stack 
Date:   Wed Nov 13 22:36:26 2019 -0800

HBASE-23282 HBCKServerCrashProcedure for 'Unknown Servers'
{code}





[jira] [Created] (HBASE-23593) Stalled SCP Assigns

2019-12-18 Thread Michael Stack (Jira)
Michael Stack created HBASE-23593:
-

 Summary: Stalled SCP Assigns
 Key: HBASE-23593
 URL: https://issues.apache.org/jira/browse/HBASE-23593
 Project: HBase
  Issue Type: Bug
  Components: proc-v2
Affects Versions: 2.2.3
Reporter: Michael Stack


I'm stuck on this one so doing a write up here in case anyone else has ideas.

Heavily loaded cluster. Server crashes. SCP cuts in and usually no problem but 
from time to time I'll see the SCP stuck waiting on an Assign to finish. The 
assign seems stuck at the queuing of the OpenRegionProcedure. We've stored the 
procedure but then not a peep thereafter. Later we'll see complaint that the 
region is STUCK. Doesn't recover. Doesn't run.

Basic story is as follows:

Server dies:
{code}
 2019-12-17 11:10:42,002 INFO 
org.apache.hadoop.hbase.master.RegionServerTracker: RegionServer ephemeral node 
deleted, processing expiration [s011.example.org,16020,1576561318119]
 2019-12-17 11:10:42,002 DEBUG org.apache.hadoop.hbase.master.DeadServer: Added 
s011.example.org,16020,1576561318119; numProcessing=1
...
 2019-12-17 11:10:42,110 DEBUG org.apache.hadoop.hbase.master.DeadServer: 
Started processing s011.example.org,16020,1576561318119; numProcessing=1
{code}

The dead server restarts which purges the old server from dead server and 
processing lists:

{code}
 2019-12-17 11:10:58,145 DEBUG org.apache.hadoop.hbase.master.DeadServer: 
Removed s011.example.org,16020,1576561318119, processing=true, numProcessing=0
 2019-12-17 11:10:58,145 DEBUG org.apache.hadoop.hbase.master.ServerManager: 
STARTUP: Server s011.example.org,16020,1576581054424 came back up, removed it 
from the dead servers list
{code}
 

even though we are still processing logs in the SCP of the old server...

{code}
 2019-12-17 11:10:58,392 INFO org.apache.hadoop.hbase.wal.WALSplitUtil: 
Archived processed log 
hdfs://nameservice1/hbase/WALs/s011.example.org,16020,1576561318119-splitting/s011.example.org%2C16020%2C1576561318119.s011.example.org%2C16020%2C1576561318119.regiongroup-0.1576580737491
 to hdfs://nameservice1/hbase/oldWALs/s011.example.org%2C16020%2C1576561318119.s011.example.org%2C16020%2C1576561318119.regiongroup-0.1576580737491
{code}

I thought early purge of deadserver was a problem but I don't think so after 
study.

WAL splitting took over two minutes, and the server had been removed from the 
dead servers list about two minutes before splitting finished...
{code}
 2019-12-17 11:13:05,356 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
Finished splitting (more than or equal to) 30.6G (32908464448 bytes) in 228 log 
files in 
[hdfs://nameservice1/hbase/WALs/s011.example.org,16020,1576561318119-splitting] 
in 143236ms
{code}

 Almost immediately we get this:

{code}
 2019-12-17 11:14:08,649 WARN 
org.apache.hadoop.hbase.master.assignment.AssignmentManager: STUCK 
Region-In-Transition state=OPEN, location=s011.example.org,16020,1576561318119, 
table=t1, region=9d6d6d5f261a0cbe7c9e85091f2c2bd4
{code}

For this region assign, I see the SCP proc making an assign for this region 
which then makes a subtask to OpenRegionProcedure. This is where it gets stuck. 
No progress after this. The procedure does not come alive to run.

Here are logs for the ORP pid=421761:

{code}
2019-12-17 11:38:34,761 INFO 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Initialized 
subprocedures=[{pid=421761, ppid=402475, state=RUNNABLE; 
org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure}]

2019-12-17 11:38:34,765 DEBUG 
org.apache.hadoop.hbase.master.procedure.MasterProcedureScheduler: Add 
TableQueue(t1, xlock=false sharedLock=3144 size=427) to run queue because: the 
exclusive lock is not held by anyone when adding pid=421761, ppid=402475, 
state=RUNNABLE; org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure
2019-12-17 11:38:34,770 DEBUG 
org.apache.hadoop.hbase.procedure2.RootProcedureState: Add procedure 
pid=421761, ppid=402475, state=RUNNABLE, locked=true; 
org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure as the 3193th 
rollback step
{code}






[jira] [Resolved] (HBASE-22920) github pr testing job should use dev-support script for gathering machine info

2019-12-13 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-22920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-22920.
---
Fix Version/s: 3.0.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

Beata Sudi... please sign into JIRA so we can credit you this issue. Meanwhile 
resolving w/o an assignee.

> github pr testing job should use dev-support script for gathering machine info
> --
>
> Key: HBASE-22920
> URL: https://issues.apache.org/jira/browse/HBASE-22920
> Project: HBase
>  Issue Type: Improvement
>  Components: community, test
>Reporter: Sean Busbey
>Priority: Major
>  Labels: beginner
> Fix For: 3.0.0
>
>
> the PR tester {{Jenkinsfile_GitHub}} has its own set of commands for 
> gathering information about the build environment it runs in. Instead it 
> should rely on the {{dev-support/gather_machine_environment.sh}} that gets 
> used by nightly 





[jira] [Created] (HBASE-23572) In 'HBCK Report', distinguish between live, dead, and unknown servers

2019-12-12 Thread Michael Stack (Jira)
Michael Stack created HBASE-23572:
-

 Summary: In 'HBCK Report', distinguish between live, dead, and 
unknown servers
 Key: HBASE-23572
 URL: https://issues.apache.org/jira/browse/HBASE-23572
 Project: HBase
  Issue Type: Bug
Reporter: Michael Stack


Debugging, when viewing 'HBCK Report' sections, it helps if we know if 
referenced server is online, dead, or unknown.

Add ornamentation so that when we mention a servername in 'HBCK Report', if 
live, then show the server as link (to live server), if dead, show it in 
italics, and if unknown, show it plain text. 
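
The ornamentation could be as simple as the sketch below. ServerNameRenderSketch and ServerState are hypothetical names, the link target is passed in by the caller, and HTML escaping is left out to keep it short.

```java
// Hypothetical renderer for illustration; not the Master UI code.
final class ServerNameRenderSketch {
  enum ServerState { LIVE, DEAD, UNKNOWN }

  // Live servers become a link, dead servers are italicised, unknown stay plain.
  static String render(String serverName, ServerState state, String liveUrl) {
    switch (state) {
      case LIVE:
        return "<a href=\"" + liveUrl + "\">" + serverName + "</a>";
      case DEAD:
        return "<i>" + serverName + "</i>";
      default:
        return serverName;
    }
  }
}
```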





[jira] [Resolved] (HBASE-23570) Point users to the async-profiler home page if diagrams are coming up blank

2019-12-12 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-23570.
---
Fix Version/s: 3.0.0
   Resolution: Fixed

Merged these one-liners to master.

> Point users to the async-profiler home page if diagrams are coming up blank
> ---
>
> Key: HBASE-23570
> URL: https://issues.apache.org/jira/browse/HBASE-23570
> Project: HBase
>  Issue Type: Bug
>  Components: profiler
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Trivial
> Fix For: 3.0.0
>
>
> Add minor note on servlet and to doc pointing folks to async-profiler home 
> page if diagrams are coming up blank 





[jira] [Created] (HBASE-23570) Point users to the async-profiler home page if diagrams are coming up blank

2019-12-12 Thread Michael Stack (Jira)
Michael Stack created HBASE-23570:
-

 Summary: Point users to the async-profiler home page if diagrams 
are coming up blank
 Key: HBASE-23570
 URL: https://issues.apache.org/jira/browse/HBASE-23570
 Project: HBase
  Issue Type: Bug
  Components: profiler
Reporter: Michael Stack
Assignee: Michael Stack


Add minor note on servlet and to doc pointing folks to async-profiler home page 
if diagrams are coming up blank 





[jira] [Resolved] (HBASE-23555) TestQuotaThrottle is broken

2019-12-11 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-23555.
---
Fix Version/s: 2.3.0
   3.0.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

Merged to master and backported to branch-2. Thanks for the fix [~meiyi]

> TestQuotaThrottle is broken
> ---
>
> Key: HBASE-23555
> URL: https://issues.apache.org/jira/browse/HBASE-23555
> Project: HBase
>  Issue Type: Bug
>Reporter: Yi Mei
>Assignee: Yi Mei
>Priority: Minor
> Fix For: 3.0.0, 2.3.0
>
>
> TestQuotaThrottle is broken now. And it is annotated as Ignore because it's 
> flaky, so the Jenkins test cannot report it.





[jira] [Resolved] (HBASE-23554) Encoded regionname to regionname utility

2019-12-11 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-23554.
---
Hadoop Flags: Reviewed
  Resolution: Fixed

Merged to branch-2.2+. Thanks for reviews [~busbey] and [~zhangduo]

> Encoded regionname to regionname utility
> 
>
> Key: HBASE-23554
> URL: https://issues.apache.org/jira/browse/HBASE-23554
> Project: HBase
>  Issue Type: Bug
>  Components: shell
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 3.0.0, 2.3.0, 2.2.3
>
>
> When debugging, I keep wanting to look at region state/transition in meta but 
> all I have is the encoded region name gleaned from a log or from some parts of 
> the UI. I find myself dumping the meta table to a text file just to search, 
> especially if region replicas are enabled (their encoded name is not even 
> mentioned in hbase:meta). A utility that lets me look up a regionname using 
> the encoded regionname would be handy.
> This actually exists already... almost. The Admin Service has a 
> getRegionInfo. Usually it just returns RegionInfo if passed a region name. It 
> can add a bit more info if it is a MOB Region and the query is against the 
> Master or, if the query is against the hosting RegionServer, it can tack on 
> some compaction state detail. It wouldn't take much to extend this existing 
> facility so one could query w/ the encoded name.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-23556) Minor ChoreService Cleanup

2019-12-10 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-23556.
---
Fix Version/s: master
 Hadoop Flags: Reviewed
   Resolution: Fixed

Pushed on Master. Thanks for patch [~belugabehr]

> Minor ChoreService Cleanup
> --
>
> Key: HBASE-23556
> URL: https://issues.apache.org/jira/browse/HBASE-23556
> Project: HBase
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Fix For: master
>
>






[jira] [Created] (HBASE-23561) Look up of Region in Master by encoded region name is O(n)

2019-12-10 Thread Michael Stack (Jira)
Michael Stack created HBASE-23561:
-

 Summary: Look up of Region in Master by encoded region name is O(n)
 Key: HBASE-23561
 URL: https://issues.apache.org/jira/browse/HBASE-23561
 Project: HBase
  Issue Type: Bug
Reporter: Michael Stack


{{  public RegionState getRegionState(final String encodedRegionName) {
    // TODO: Need a map  but it is just dispatch merge...
    for (RegionStateNode node: regionsMap.values()) {
      if (node.getRegionInfo().getEncodedName().equals(encodedRegionName)) {
        return node.toRegionState();
      }
    }
    return null;
  }}}

It is not used much, so marking this as trivial.
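
If it ever does matter, the usual fix is a second map keyed by encoded name, kept in step with the primary one. A minimal sketch with stand-in types; EncodedNameIndexSketch and Node are hypothetical, and the real regionsMap is not keyed by String.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-ins for illustration; not the Master's RegionStates.
final class EncodedNameIndexSketch {
  static final class Node {
    final String regionName;
    final String encodedName;

    Node(String regionName, String encodedName) {
      this.regionName = regionName;
      this.encodedName = encodedName;
    }
  }

  // Primary map keyed by full region name, plus a secondary index by encoded name.
  private final Map<String, Node> regionsMap = new ConcurrentHashMap<>();
  private final Map<String, Node> byEncodedName = new ConcurrentHashMap<>();

  void put(Node node) {
    regionsMap.put(node.regionName, node);
    byEncodedName.put(node.encodedName, node);
  }

  void remove(String regionName) {
    Node node = regionsMap.remove(regionName);
    if (node != null) {
      byEncodedName.remove(node.encodedName);
    }
  }

  // O(1) expected lookup instead of scanning regionsMap.values().
  Node getByEncodedName(String encodedName) {
    return byEncodedName.get(encodedName);
  }
}
```

The two maps are not updated atomically here, so a reader can briefly see one without the other; whether that is acceptable depends on the callers.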





[jira] [Created] (HBASE-23554) Encoded regionname to regionname utility

2019-12-09 Thread Michael Stack (Jira)
Michael Stack created HBASE-23554:
-

 Summary: Encoded regionname to regionname utility
 Key: HBASE-23554
 URL: https://issues.apache.org/jira/browse/HBASE-23554
 Project: HBase
  Issue Type: Bug
  Components: shell
Reporter: Michael Stack
Assignee: Michael Stack
 Fix For: 3.0.0, 2.3.0, 2.2.3


When debugging, I keep wanting to look at region state/transition in meta but 
all I have is the encoded region name gleaned from a log or from some parts of 
the UI. I find myself dumping the meta table to a text file just to search, 
especially if region replicas are enabled (their encoded name is not even 
mentioned in hbase:meta). A utility that lets me look up a regionname using 
the encoded regionname would be handy.

This actually exists already... almost. The Admin Service has a getRegionInfo. 
Usually it just returns RegionInfo if passed a region name. It can add a bit 
more info if it is a MOB Region and the query is against the Master or, if the 
query is against the hosting RegionServer, it can tack on some compaction state 
detail. It wouldn't take much to extend this existing facility so one could 
query w/ the encoded name.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-23369) Auto-close 'unknown' Regions reported as OPEN on RegionServers

2019-12-04 Thread Michael Stack (Jira)
Michael Stack created HBASE-23369:
-

 Summary: Auto-close 'unknown' Regions reported as OPEN on 
RegionServers
 Key: HBASE-23369
 URL: https://issues.apache.org/jira/browse/HBASE-23369
 Project: HBase
  Issue Type: Bug
Reporter: Michael Stack


In the old days, if a RegionServer reported a variance that didn't agree w/ the 
Master's view of the cluster, we'd kill the RegionServer.

Lately, in tests that overrun a cluster, after a sustained high-load, Master 
can start failing its updates against Meta (CallQueueTooBigException <= More on 
this later). It then can lose proper accounting of all Region members. One 
variant has a RegionServer reporting its list of open Regions to the Master and 
the Master doesn't 'know' of a particular Region or the Master may know the 
Region but expects it open on another RegionServer.

Here is an example of how it looks each time RS reports:

{code}
 2019-12-03 07:07:00,757 WARN 
org.apache.hadoop.hbase.master.assignment.AssignmentManager: No 
t1,08f5c285,1573094375485.ee78a0c951c1c902d8f3f3912394a0e5. RegionStateNode but 
reported ONLINE at server.example.org,16020,1575354666245 
(inServerRegionList=false).
 2019-12-03 07:07:03,793 WARN 
org.apache.hadoop.hbase.master.assignment.AssignmentManager: No 
t1,08f5c285,1573094375485.ee78a0c951c1c902d8f3f3912394a0e5. RegionStateNode but 
reported ONLINE at server.example.org,16020,1575354666245 
(inServerRegionList=false).
{code}

Will also show as an 'inconsistency' in the 'HBCK' tab on the Master UI.







[jira] [Resolved] (HBASE-23117) Bad enum in hbase:meta info:state column can fail loadMeta and stop startup

2019-11-26 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-23117.
---
Fix Version/s: 2.2.3
   2.3.0
   3.0.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

Pushed to branch-2.2+. Thanks for the fix [~sandeep.pal] (thanks too to the 
reviewers).

> Bad enum in hbase:meta info:state column can fail loadMeta and stop startup
> ---
>
> Key: HBASE-23117
> URL: https://issues.apache.org/jira/browse/HBASE-23117
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.2
>Reporter: Michael Stack
>Assignee: Sandeep Pal
>Priority: Minor
> Fix For: 3.0.0, 2.3.0, 2.2.3
>
>
> Had a bad value in the info:state field in meta and it meant we couldn't 
> start up the cluster; loadMeta would not succeed. If there is a bad state, we 
> should note it, compensate, and move on.
> The bad entry was an own goal that happened while trying to fix other issues 
> in a pre-hbck2 cluster.
> Here was the exception:
> {code}
> java.lang.IllegalArgumentException: No enum constant 
> org.apache.hadoop.hbase.master.RegionState.State.1
>   at java.lang.Enum.valueOf(Enum.java:238)
>   at 
> org.apache.hadoop.hbase.master.RegionState$State.valueOf(RegionState.java:37)
>   at 
> org.apache.hadoop.hbase.master.assignment.RegionStateStore.getRegionState(RegionStateStore.java:338)
>   at 
> org.apache.hadoop.hbase.master.assignment.RegionStateStore.visitMetaEntry(RegionStateStore.java:116)
>   at 
> org.apache.hadoop.hbase.master.assignment.RegionStateStore.access$100(RegionStateStore.java:59)
>   at 
> org.apache.hadoop.hbase.master.assignment.RegionStateStore$1.visit(RegionStateStore.java:87)
>   at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:769)
>   at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:734)
>   at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:690)
>   at 
> org.apache.hadoop.hbase.MetaTableAccessor.fullScanRegions(MetaTableAccessor.java:220)
>   at 
> org.apache.hadoop.hbase.master.assignment.RegionStateStore.visitMeta(RegionStateStore.java:77)
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.loadMeta(AssignmentManager.java:1248)
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.joinCluster(AssignmentManager.java:1209)
>   at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:998)
>   at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2260)
>   at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:583)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
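
"Note it, compensate, and move on" might look like the sketch below: catch the IllegalArgumentException that Enum.valueOf throws, log it, and fall back to a caller-chosen state. TolerantStateParserSketch and its cut-down State enum are stand-ins for illustration; this is not the fix that was merged.

```java
// Hypothetical parser for illustration; State is a stand-in subset of states.
final class TolerantStateParserSketch {
  enum State { OFFLINE, OPENING, OPEN, CLOSING, CLOSED }

  // Returns the parsed state, or the fallback when the stored value is
  // missing or is not a known enum constant (e.g. the "1" in the trace above).
  static State parseOrDefault(String raw, State fallback) {
    if (raw == null) {
      return fallback;
    }
    try {
      return State.valueOf(raw);
    } catch (IllegalArgumentException e) {
      // Note the bad value instead of letting it abort the meta scan.
      System.err.println("Bad state value '" + raw + "', using " + fallback);
      return fallback;
    }
  }
}
```

What the right fallback is (skip the row, treat as OFFLINE, flag for hbck2) is a policy question the sketch leaves to the caller.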



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-23332) [HBCKReport] Split Regions shown as Overlaps in 'Overlap' section

2019-11-22 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-23332.
---
Resolution: Cannot Reproduce

Resolving. Lost logs. Seems like root cause is corrupt procedure. Spent time 
verifying we don't drop 'split/offline' flags when serializing to hbase:meta 
and that seems fine. Resolving because unable to debug.

> [HBCKReport] Split Regions shown as Overlaps in 'Overlap' section
> -
>
> Key: HBASE-23332
> URL: https://issues.apache.org/jira/browse/HBASE-23332
> Project: HBase
>  Issue Type: Bug
>  Components: hbck2, UI
>Reporter: Michael Stack
>Priority: Major
>
> The new 'HBCK Report' page has to be exacting; otherwise it makes for a wild 
> goose chase or, worse, operator damage to a running cluster.
> I just came across instances where split parents are reported as overlapping 
> their daughters:
> {code}
>  {ENCODED => 22776817918e40d0ba93eb48314d65a1, NAME => 
> 't1,2ac082e1,1572669261019.22776817918e40d0ba93eb48314d65a1.', STARTKEY => 
> '2ac082e1', ENDKEY => '2b020c18'}  {ENCODED => 
> 8cbe15b2f59d69974357e8800a0bfbbc, NAME => 
> 't1,2ac082e1,1574362260851.8cbe15b2f59d69974357e8800a0bfbbc.', STARTKEY => 
> '2ac082e1', ENDKEY => '2ae3529d-1d72-4250-9bd8-4e9b9959284f'}
>  {ENCODED => 22776817918e40d0ba93eb48314d65a1, NAME => 
> 't1,2ac082e1,1572669261019.22776817918e40d0ba93eb48314d65a1.', STARTKEY => 
> '2ac082e1', ENDKEY => '2b020c18'}  {ENCODED => 
> bd062ce8e9c99a6988f0a8223168e028, NAME => 
> 't1,2ae3529d-1d72-4250-9bd8-4e9b9959284f,1574362260851.bd062ce8e9c99a6988f0a8223168e028.',
>  STARTKEY => '2ae3529d-1d72-4250-9bd8-4e9b9959284f', ENDKEY => 
> '2b020c18'}
> {code}
> Need to fix.





[jira] [Created] (HBASE-23332) [HBCKReport] Split Regions shown as Overlaps in 'Overlap' section

2019-11-22 Thread Michael Stack (Jira)
Michael Stack created HBASE-23332:
-

 Summary: [HBCKReport] Split Regions shown as Overlaps in 'Overlap' 
section
 Key: HBASE-23332
 URL: https://issues.apache.org/jira/browse/HBASE-23332
 Project: HBase
  Issue Type: Bug
  Components: hbck2, UI
Reporter: Michael Stack


The new 'HBCK Report' page has to be exacting; otherwise it makes for a wild 
goose chase or, worse, operator damage to a running cluster.

I just came across instances where split parents are reported as overlapping 
their daughters:
{code}
 {ENCODED => 22776817918e40d0ba93eb48314d65a1, NAME => 
't1,2ac082e1,1572669261019.22776817918e40d0ba93eb48314d65a1.', STARTKEY => 
'2ac082e1', ENDKEY => '2b020c18'}  {ENCODED => 
8cbe15b2f59d69974357e8800a0bfbbc, NAME => 
't1,2ac082e1,1574362260851.8cbe15b2f59d69974357e8800a0bfbbc.', STARTKEY => 
'2ac082e1', ENDKEY => '2ae3529d-1d72-4250-9bd8-4e9b9959284f'}
 {ENCODED => 22776817918e40d0ba93eb48314d65a1, NAME => 
't1,2ac082e1,1572669261019.22776817918e40d0ba93eb48314d65a1.', STARTKEY => 
'2ac082e1', ENDKEY => '2b020c18'}  {ENCODED => 
bd062ce8e9c99a6988f0a8223168e028, NAME => 
't1,2ae3529d-1d72-4250-9bd8-4e9b9959284f,1574362260851.bd062ce8e9c99a6988f0a8223168e028.',
 STARTKEY => '2ae3529d-1d72-4250-9bd8-4e9b9959284f', ENDKEY => 
'2b020c18'}
{code}

Need to fix.
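
To make the report above concrete: a split parent covers exactly the key range of its two daughters, so a plain range check will always flag parent and daughter as overlapping. Below is a minimal sketch of an overlap check that skips split parents. The Region class and its splitParent flag are hypothetical simplifications (not HBase's RegionInfo), and keys are compared as Strings here rather than as bytes.

```java
public class SplitAwareOverlap {
    // Hypothetical, simplified region descriptor; not HBase's RegionInfo.
    static final class Region {
        final String start;
        final String end;
        final boolean splitParent;

        Region(String start, String end, boolean splitParent) {
            this.start = start;
            this.end = end;
            this.splitParent = splitParent;
        }
    }

    // Half-open ranges [start, end); an empty end key means "end of table".
    // String.compareTo stands in for a byte-wise key comparison.
    static boolean rangesOverlap(Region a, Region b) {
        boolean aStartsBeforeBEnds = b.end.isEmpty() || a.start.compareTo(b.end) < 0;
        boolean bStartsBeforeAEnds = a.end.isEmpty() || b.start.compareTo(a.end) < 0;
        return aStartsBeforeBEnds && bStartsBeforeAEnds;
    }

    // Only report an overlap when neither side is a split parent.
    static boolean reportableOverlap(Region a, Region b) {
        return rangesOverlap(a, b) && !a.splitParent && !b.splitParent;
    }

    public static void main(String[] args) {
        String mid = "2ae3529d-1d72-4250-9bd8-4e9b9959284f";
        Region parent = new Region("2ac082e1", "2b020c18", true);
        Region daughterA = new Region("2ac082e1", mid, false);
        Region daughterB = new Region(mid, "2b020c18", false);

        System.out.println(rangesOverlap(parent, daughterA));     // raw ranges do overlap
        System.out.println(reportableOverlap(parent, daughterA)); // suppressed for a split parent
        System.out.println(reportableOverlap(daughterA, daughterB));
    }
}
```

This only helps if the split flag on the parent is intact when the report is built, which is an assumption of the sketch.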





[jira] [Resolved] (HBASE-23280) Purge rep_barrier:seqnumDuringOpen on delete of Region

2019-11-22 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-23280.
---
Resolution: Not A Problem

Resolving as 'Not a problem' any more after subtask which runs the 
ReplicationBarrierCleaner when hbck2 fixMeta is invoked and because of 
HBASE-23294 which fixed a bug in RBC.

> Purge rep_barrier:seqnumDuringOpen on delete of Region
> --
>
> Key: HBASE-23280
> URL: https://issues.apache.org/jira/browse/HBASE-23280
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Reporter: Michael Stack
>Priority: Major
>
> The Region GC Procedure only cleans the 'info' column family.  We also write 
> a rep_barrier column family as of HBASE-20115 . HBASE-20117 adds a chore to 
> clean them up after-the-fact.  I've not studied how rep_barrier works (There 
> is a comment in MetaTableAccessor to add explanation).
> This issue is about adding the deletion of the rep_barrier content on region 
> delete ([~zhangduo] will this mess up serial replication?).
> I want to clean out these rows. They can occasionally be misinterpreted, for 
> example in the hbck report as 'Orphan Regions'; or, in simple loading tools, 
> we'll find the rep_barrier row and then fail because there is no accompanying 
> info:regioninfo.
> Perhaps removing rep_barrier column family promptly is the wrong thing to 
> do... we need the lag for replication to catch up. Let me know [~zhangduo].
> Here is what they look like:
> {code}
> hbase(main):050:0> get 'hbase:meta', 
> ',22d0e538,1572669183985.6aa8710020b8a4f9ea290539fc254a76.'
> COLUMN
>   CELL
>  rep_barrier:seqnumDuringOpen 
>   timestamp=1573272944262, value=\x00\x00\x00\x00\x00\x00\x00\x02
> {code}
> They get updated on split and when location moves. I don't seem to be able to 
> disable this facility -- it is always on. It is also called 'unused' in the title of 
> HBASE-20117. 
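
For reading the shell output above: the rep_barrier:seqnumDuringOpen value is eight bytes. A minimal sketch of decoding it, assuming the value is a big-endian 8-byte long (which the printed bytes suggest). The class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;

public class SeqnumDecode {
    // Assumption: the cell value is an 8-byte big-endian long.
    static long decode(byte[] value) {
        if (value.length != Long.BYTES) {
            throw new IllegalArgumentException("expected 8 bytes, got " + value.length);
        }
        // ByteBuffer reads in big-endian order by default.
        return ByteBuffer.wrap(value).getLong();
    }

    public static void main(String[] args) {
        // \x00\x00\x00\x00\x00\x00\x00\x02 as printed by the shell
        byte[] value = {0, 0, 0, 0, 0, 0, 0, 2};
        System.out.println(decode(value)); // prints 2
    }
}
```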





[jira] [Resolved] (HBASE-23307) Add running of ReplicationBarrierCleaner to hbck2 fixMeta invocation

2019-11-22 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-23307.
---
Fix Version/s: 2.2.3
   2.3.0
   3.0.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

Merged to branch-2.2+. Thanks for review [~binlijin]. Confirmed this works out 
on loaded cluster.

> Add running of ReplicationBarrierCleaner to hbck2 fixMeta invocation
> 
>
> Key: HBASE-23307
> URL: https://issues.apache.org/jira/browse/HBASE-23307
> Project: HBase
>  Issue Type: Sub-task
>  Components: hbck2
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 3.0.0, 2.3.0, 2.2.3
>
>
> Run the ReplicationBarrierCleaner chore when hbck2 invokes fixMeta. It will 
> clean up stale rep_barrier entries in hbase:meta which can help if trying to 
> do a restore of hbase:meta to good state.





[jira] [Resolved] (HBASE-23328) info:regioninfo goes wrong when region replicas enabled

2019-11-21 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-23328.
---
Fix Version/s: 2.1.9
   2.2.3
   2.3.0
   3.0.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

Merged to branch-2.1+. Thanks for reviews [~gxcheng] and [~ramkrishna]

> info:regioninfo goes wrong when region replicas enabled
> ---
>
> Key: HBASE-23328
> URL: https://issues.apache.org/jira/browse/HBASE-23328
> Project: HBase
>  Issue Type: Bug
>  Components: read replicas
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 3.0.0, 2.3.0, 2.2.3, 2.1.9
>
>
> Noticed that the info:regioninfo content in hbase:meta can become that of a 
> serialized replica. I think it is mostly harmless, but accounting, and 
> especially debugging, is frustrated because the hbase:meta row name does not 
> match the info:regioninfo.
> Here is an example:
> {code}
> t1,c6e977ef,1572669121340.0b455b2d57f91c153d5088533205c268. 
> column=info:regioninfo, timestamp=1574367093772, value={ENCODED => 
> 5199f7826c340ba944517e97c6ebaf04, NAME => 
> 't1,c6e977ef,1572669121340_0001.5199f7826c340ba944517e97c6ebaf04.', STARTKEY 
> => 'c6e977ef', ENDKEY => 'c72b0126', REPLICA_ID => 1}
> {code}
> Notice how the hbase:meta row name resembles the name in the info:regioninfo 
> content, except that the content lists a REPLICA_ID and its encoded name is 
> different (as it factors in the replica id).
> The original Region Replica design describes how the info:regioninfo is 
> supposed to have the default HRI serialized only. See comment on HRI changes 
> in 
> https://issues.apache.org/jira/secure/attachment/12627276/hbase-10347_redo_v8.patch
> -Going back over history, this may have been a bug since Region Replicas came 
> in.- <= No. Looking at an old cluster w/ region replicas, it doesn't have 
> this issue.





[jira] [Created] (HBASE-23328) info:regioninfo goes wrong when region replicas enabled

2019-11-21 Thread Michael Stack (Jira)
Michael Stack created HBASE-23328:
-

 Summary: info:regioninfo goes wrong when region replicas enabled
 Key: HBASE-23328
 URL: https://issues.apache.org/jira/browse/HBASE-23328
 Project: HBase
  Issue Type: Bug
  Components: read replicas
Reporter: Michael Stack
Assignee: Michael Stack


Noticed that the info:regioninfo content in hbase:meta can become that of a 
serialized replica. I think it is mostly harmless, but accounting, and 
especially debugging, is frustrated because the hbase:meta row name does not 
match the info:regioninfo.

Here is an example:

{code}
t1,c6e977ef,1572669121340.0b455b2d57f91c153d5088533205c268. 
column=info:regioninfo, timestamp=1574367093772, value={ENCODED => 
5199f7826c340ba944517e97c6ebaf04, NAME => 
't1,c6e977ef,1572669121340_0001.5199f7826c340ba944517e97c6ebaf04.', STARTKEY => 
'c6e977ef', ENDKEY => 'c72b0126', REPLICA_ID => 1}
{code}

Notice how the hbase:meta row name resembles the name in the info:regioninfo content, 
except that the content lists a REPLICA_ID and its encoded name is different (as it 
factors replicaid).
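
A mismatch like the one in the example could be spotted mechanically by comparing the encoded-name suffix of the hbase:meta row name with that of the NAME inside info:regioninfo. The sketch below assumes only the region-name layout visible in the example ("<table>,<startkey>,<id>.<encoded>."); the class and helper names are hypothetical and not part of HBase.

```java
public class MetaRowCheck {
    // Pulls the trailing "<encoded>" out of "<table>,<startkey>,<id>.<encoded>.".
    static String encodedSuffix(String regionName) {
        String s = regionName.endsWith(".")
            ? regionName.substring(0, regionName.length() - 1)
            : regionName;
        int dot = s.lastIndexOf('.');
        if (dot < 0) {
            throw new IllegalArgumentException("no encoded suffix in: " + regionName);
        }
        return s.substring(dot + 1);
    }

    // True when the row name and the serialized region name agree on the encoded name.
    static boolean rowMatchesRegionInfo(String metaRowName, String regionInfoName) {
        return encodedSuffix(metaRowName).equals(encodedSuffix(regionInfoName));
    }

    public static void main(String[] args) {
        String row = "t1,c6e977ef,1572669121340.0b455b2d57f91c153d5088533205c268.";
        String info = "t1,c6e977ef,1572669121340_0001.5199f7826c340ba944517e97c6ebaf04.";
        System.out.println(encodedSuffix(row));
        System.out.println(rowMatchesRegionInfo(row, info)); // the example row is a mismatch
    }
}
```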

The original Region Replica design describes how the info:regioninfo is 
supposed to have the default HRI serialized only. See comment on HRI changes in 
https://issues.apache.org/jira/secure/attachment/12627276/hbase-10347_redo_v8.patch

Going back over history, this may have been a bug since Region Replicas came in.





[jira] [Resolved] (HBASE-23321) [hbck2] fixHoles of fixMeta doesn't update in-memory state

2019-11-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-23321.
---
Fix Version/s: 2.2.3
   2.3.0
   3.0.0
 Release Note: If there are holes in hbase:meta, hbck2 fixMeta will now update Master 
in-memory state so you do not need to restart the Master just to assign the 
new hole-bridging regions.
   Resolution: Fixed

Merged to branch-2.2+

> [hbck2] fixHoles of fixMeta doesn't update in-memory state
> --
>
> Key: HBASE-23321
> URL: https://issues.apache.org/jira/browse/HBASE-23321
> Project: HBase
>  Issue Type: Improvement
>  Components: hbck2
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Minor
> Fix For: 3.0.0, 2.3.0, 2.2.3
>
>
> If hbase:meta has holes, you can run fixMeta from hbck2. This will close the 
> holes but you have to restart the Master for it to notice the new region 
> additions. Also, we were plugging holes by adding regions but no state for 
> the region which makes it awkward to subsequently assign. Fix.





[jira] [Created] (HBASE-23322) [hbck2] Simplification on HBCKSCP scheduling

2019-11-19 Thread Michael Stack (Jira)
Michael Stack created HBASE-23322:
-

 Summary: [hbck2] Simplification on HBCKSCP scheduling
 Key: HBASE-23322
 URL: https://issues.apache.org/jira/browse/HBASE-23322
 Project: HBase
  Issue Type: Sub-task
  Components: hbck2
Reporter: Michael Stack
Assignee: Michael Stack


I can make the scheduling of HBCKSCP simpler.  I can also fix a bug in the parent 
issue that I noticed after exercising it a bunch on a cluster.

The bug is that 'Unknown Servers' seem to be retained in the Map of reporting 
servers. They are usually cleared just before an SCP is scheduled but 
scheduling HBCKSCP doesn't go the usual route.

The patch here sends HBCKSCP via the usual SCP route; only at scheduling 
time does context dictate whether we run a plain SCP or the scouring HBCKSCP.

Let me put up a patch; I will test in the meantime.





[jira] [Resolved] (HBASE-23308) Review of NullPointerExceptions

2019-11-19 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-23308.
---
Hadoop Flags: Reviewed
  Resolution: Fixed

Merged to branch-2 and master branch. Thanks for the patch [~belugabehr]

> Review of NullPointerExceptions
> ---
>
> Key: HBASE-23308
> URL: https://issues.apache.org/jira/browse/HBASE-23308
> Project: HBase
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Fix For: 3.0.0, 2.3.0
>
>






[jira] [Created] (HBASE-23321) [hbck2] fixHoles of fixMeta doesn't update in-memory state

2019-11-19 Thread Michael Stack (Jira)
Michael Stack created HBASE-23321:
-

 Summary: [hbck2] fixHoles of fixMeta doesn't update in-memory state
 Key: HBASE-23321
 URL: https://issues.apache.org/jira/browse/HBASE-23321
 Project: HBase
  Issue Type: Improvement
  Components: hbck2
Reporter: Michael Stack
Assignee: Michael Stack


If hbase:meta has holes, you can run fixMeta from hbck2. This will close the 
holes but you have to restart the Master for it to notice the new region 
additions. Also, we were plugging holes by adding regions but no state for the 
region which makes it awkward to subsequently assign. Fix.





[jira] [Resolved] (HBASE-23315) Miscellaneous HBCK Report page cleanup

2019-11-19 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-23315.
---
Fix Version/s: 2.2.3
   2.3.0
   3.0.0
 Assignee: Michael Stack
   Resolution: Fixed

Merged to branch-2.2+.

> Miscellaneous HBCK Report page cleanup
> --
>
> Key: HBASE-23315
> URL: https://issues.apache.org/jira/browse/HBASE-23315
> Project: HBase
>  Issue Type: Improvement
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Minor
> Fix For: 3.0.0, 2.3.0, 2.2.3
>
>
> A bunch of touch up on the hbck report page:
>  * Add a bit of javadoc around SerialReplicationChecker.
>  * Minuscule edit to the profiler jsp page and then a bit of doc on how to 
> make it work that might help.
>  * Add some detail if NPE getting BitSetNode to help w/ debug.
>  * Change HbckChore to log region names instead of encoded names; helps doing 
> diagnostics; can take region name and query in shell to find out all about 
> the region according to hbase:meta.
>  * Add some fix-it help inline in the HBCK Report page -- how to fix.
>  * Add counts in procedures page so can see if making progress; move listing 
> of WALs to end of the page.





[jira] [Created] (HBASE-23315) Miscellaneous HBCK Report page cleanup

2019-11-18 Thread Michael Stack (Jira)
Michael Stack created HBASE-23315:
-

 Summary: Miscellaneous HBCK Report page cleanup
 Key: HBASE-23315
 URL: https://issues.apache.org/jira/browse/HBASE-23315
 Project: HBase
  Issue Type: Improvement
Reporter: Michael Stack


A bunch of touch up on the hbck report page:

 * Add a bit of javadoc around SerialReplicationChecker.
 * Minuscule edit to the profiler jsp page and then a bit of doc on how to make 
it work that might help.
 * Add some detail if NPE getting BitSetNode to help w/ debug.
 * Change HbckChore to log region names instead of encoded names; helps doing 
diagnostics; can take region name and query in shell to find out all about the 
region according to hbase:meta.
 * Add some fix-it help inline in the HBCK Report page -- how to fix.
 * Add counts in procedures page so can see if making progress; move listing of 
WALs to end of the page.







[jira] [Resolved] (HBASE-23282) HBCKServerCrashProcedure for 'Unknown Servers'

2019-11-18 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-23282.
---
Fix Version/s: 2.2.3
   2.3.0
   3.0.0
 Release Note: hbck2 scheduleRecoveries will now run a SCP that also looks 
in hbase:meta for any references to the scheduled server -- not just consult 
Master in-memory state -- just in case vestiges of the server are leftover in 
hbase:meta 
 Assignee: Michael Stack
   Resolution: Fixed

> HBCKServerCrashProcedure for 'Unknown Servers'
> --
>
> Key: HBASE-23282
> URL: https://issues.apache.org/jira/browse/HBASE-23282
> Project: HBase
>  Issue Type: Bug
>  Components: hbck2, proc-v2
>Affects Versions: 2.2.2
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 3.0.0, 2.3.0, 2.2.3
>
>
> With an overdriving, sustained load, I can fairly easily manufacture an 
> hbase:meta table that references servers that are no longer in the live list 
> nor are members of deadservers; i.e. 'Unknown Servers'.  The new 'HBCK 
> Report' UI in Master has a section where it lists 'Unknown Servers' if any in 
> hbase:meta.
> Once in this state, the repair is awkward. Our assign/unassign Procedure is 
> particularly dogged about insisting that we confirm close/open of Regions 
> when it is going about its business, which is well and good if the server is in 
> the live/dead sets, but with an 'Unknown Server' we invariably end up trying to 
> confirm against a no-longer-present server (more on this in follow-on 
> issues).
> What is wanted is queuing of a ServerCrashProcedure for each 'Unknown 
> Server'. It would split any WALs (there shouldn't be any if server was 
> restarted) and ideally it would cancel out any assigns and reassign regions 
> off the 'Unknown Server'.  But the 'normal' SCP consults the in-memory 
> cluster state to figure out which Regions were on the crashed server, and 
> 'Unknown Servers' have no state in the Master's in-memory Maps of Servers to 
> Regions or in the DeadServers list. Consulting memory works fine for the usual case.
> Suggestion here is that hbck2 be able to drive in a special SCP, one which 
> would get list of Regions by scanning hbase:meta rather than asking Master 
> memory; an HBCKSCP.
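
The difference between the two lookups can be shown with a toy model. In this sketch plain Maps stand in for the Master's in-memory server-to-regions state and for hbase:meta rows; the names and data are invented for illustration and this is not how HBCKSCP is actually implemented.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class UnknownServerRegions {
    // What a 'normal' SCP would see: only servers the Master is tracking.
    static List<String> fromMemory(Map<String, List<String>> serverToRegions, String server) {
        return serverToRegions.getOrDefault(server, Collections.emptyList());
    }

    // What a meta-scanning SCP would see: every region whose row names the server.
    static List<String> fromMetaScan(Map<String, String> regionToServer, String server) {
        List<String> regions = new ArrayList<>();
        for (Map.Entry<String, String> row : regionToServer.entrySet()) {
            if (server.equals(row.getValue())) {
                regions.add(row.getKey());
            }
        }
        Collections.sort(regions);
        return regions;
    }

    public static void main(String[] args) {
        Map<String, List<String>> memory = new HashMap<>();
        memory.put("live-server", Collections.singletonList("region-a"));

        Map<String, String> meta = new HashMap<>();
        meta.put("region-a", "live-server");
        meta.put("region-b", "unknown-server");
        meta.put("region-c", "unknown-server");

        System.out.println(fromMemory(memory, "unknown-server")); // nothing tracked in memory
        System.out.println(fromMetaScan(meta, "unknown-server")); // regions still referenced in meta
    }
}
```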





[jira] [Created] (HBASE-23313) [hbck2] setRegionState should update Master in-memory state too

2019-11-18 Thread Michael Stack (Jira)
Michael Stack created HBASE-23313:
-

 Summary: [hbck2] setRegionState should update Master in-memory 
state too
 Key: HBASE-23313
 URL: https://issues.apache.org/jira/browse/HBASE-23313
 Project: HBase
  Issue Type: Bug
Reporter: Michael Stack


setRegionState changes the hbase:meta table info:state column. It does not 
alter the Master's in-memory state. This means you have to kill the Master and have 
another assume the Active Master role for a state change to be noticed. Better if 
setRegionState just went via the Master and updated both the Master and hbase:meta.






[jira] [Created] (HBASE-23307) Add running of ReplicationBarrierCleaner to hbck2 fixMeta invocation

2019-11-16 Thread Michael Stack (Jira)
Michael Stack created HBASE-23307:
-

 Summary: Add running of ReplicationBarrierCleaner to hbck2 fixMeta 
invocation
 Key: HBASE-23307
 URL: https://issues.apache.org/jira/browse/HBASE-23307
 Project: HBase
  Issue Type: Sub-task
  Components: hbck2
Reporter: Michael Stack
Assignee: Michael Stack


Run the ReplicationBarrierCleaner chore when hbck2 invokes fixMeta. It will 
clean up stale rep_barrier entries in hbase:meta which can help if trying to do 
a restore of hbase:meta to good state.




