[jira] [Created] (HBASE-24547) Thrift support for HBASE-23941

2020-06-12 Thread Viraj Jasani (Jira)
Viraj Jasani created HBASE-24547:


 Summary: Thrift support for HBASE-23941
 Key: HBASE-24547
 URL: https://issues.apache.org/jira/browse/HBASE-24547
 Project: HBase
  Issue Type: Task
Reporter: Viraj Jasani
Assignee: Viraj Jasani






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24548) improvement for HBase SCP

2020-06-12 Thread Junhong Xu (Jira)
Junhong Xu created HBASE-24548:
--

 Summary: improvement for HBase SCP
 Key: HBASE-24548
 URL: https://issues.apache.org/jira/browse/HBASE-24548
 Project: HBase
  Issue Type: Improvement
Reporter: Junhong Xu
Assignee: Junhong Xu


In our internal HBase, which is based on the community branch-2.1, we found that 
after a regionserver is stopped, it takes about 30 s before the master finally 
detects it as dead, when its ephemeral node is deleted in ZooKeeper. During this 
window, the regions on that server are unavailable and make no progress. The 
client log is as follows:
{code:java}
[2020-06-12 15:51:41.888 ActorThreadPool-consumer-processor-talos-set-alias-55-1 ERROR c.x.xmpush.hbase.utils.HBaseHelper] [get data hbase failed, tableName = mipush:app_alias_new]
com.xiaomi.infra.hbase.client.HException: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=10, exceptions:
Fri Jun 12 15:50:44 CST 2020, org.apache.hadoop.hbase.client.RpcRetryingCaller@2dc1865, org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: Server c3-hadoop-srv-st639.bj,13700,1591932264018 stopping
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.checkOpen(RSRpcServices.java:1551)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2565)
    at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41998)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:134)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)

Fri Jun 12 15:50:44 CST 2020, org.apache.hadoop.hbase.client.RpcRetryingCaller@2dc1865, org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: Server c3-hadoop-srv-st639.bj,13700,1591932264018 stopping
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.checkOpen(RSRpcServices.java:1551)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2565)
    at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41998)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:134)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
{code}
The master logs:
{code:java}
2020-06-12,15:51:12,003 INFO [RegionServerTracker-0] org.apache.hadoop.hbase.master.RegionServerTracker: RegionServer ephemeral node deleted, processing expiration [c3-hadoop-srv-st639.bj,13700,1591932264018]
2020-06-12,15:51:12,003 INFO [RegionServerTracker-0] org.apache.hadoop.hbase.master.ServerManager: Processing expiration of c3-hadoop-srv-st639.bj,13700,1591932264018 on c3-hadoop-miui-zk05.bj,13600,1591927126881
2020-06-12,15:51:12,109 INFO [RegionServerTracker-0] org.apache.hadoop.hbase.master.assignment.AssignmentManager: Added c3-hadoop-srv-st639.bj,13700,1591932264018 to dead servers which carryingMeta=false, submitted ServerCrashProcedure pid=97428
2020-06-12,15:51:12,109 INFO [org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$ServerEventsListenerThread-c3-hadoop-miui-zk05.bj,13600,1591927126881] org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$ServerEventsListenerThread: Updating default servers.
2020-06-12,15:51:12,111 INFO [PEWorker-11] org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Start pid=97428, state=RUNNABLE:SERVER_CRASH_START, locked=true; ServerCrashProcedure server=c3-hadoop-srv-st639.bj,13700,1591932264018, splitWal=true, meta=false
{code}
After an offline discussion with [~zghao], we think this process could be 
accelerated by having the regionserver either notify the master directly or 
delete its own ephemeral node before it stops.
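The second option, deleting the ephemeral node before stopping, can be sketched as a shutdown path that deregisters first. This is a minimal, self-contained illustration: `EphemeralRegistry` is a hypothetical in-memory stand-in for the ZooKeeper directory of live regionservers and `StoppingRegionServer` is a hypothetical class; neither is HBase or ZooKeeper API. It only shows the ordering, in which the master's watcher fires as soon as the node is removed instead of when the ZooKeeper session finally expires.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/** Hypothetical in-memory stand-in for the ZooKeeper directory of live regionservers. */
class EphemeralRegistry {
  private final Set<String> live = ConcurrentHashMap.newKeySet();
  private final List<Consumer<String>> watchers = new ArrayList<>();

  void register(String server) {
    live.add(server);
  }

  void watchDeletions(Consumer<String> watcher) {
    watchers.add(watcher);
  }

  /** Deleting the node notifies watchers at once; no session timeout is involved. */
  void delete(String server) {
    if (live.remove(server)) {
      for (Consumer<String> watcher : watchers) {
        watcher.accept(server);
      }
    }
  }

  boolean isLive(String server) {
    return live.contains(server);
  }
}

/** Hypothetical regionserver stop path that removes its own node first. */
class StoppingRegionServer {
  private final String name;
  private final EphemeralRegistry registry;
  final List<String> steps = new ArrayList<>();

  StoppingRegionServer(String name, EphemeralRegistry registry) {
    this.name = name;
    this.registry = registry;
    registry.register(name);
  }

  void stop() {
    // Step 1: delete our own ephemeral node so the master's watcher fires now,
    // letting it submit a ServerCrashProcedure without waiting for expiry.
    registry.delete(name);
    steps.add("deregistered");
    // Step 2: the rest of the shutdown (closing regions, RPC services) is elided.
    steps.add("stopped");
  }
}
```

In a real cluster the master would start recovery while the server may still be finishing its shutdown, so where exactly the delete happens relative to closing regions would need care; the sketch does not model that.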





[jira] [Resolved] (HBASE-24529) hbase.rs.evictblocksonclose is not honored when removing compacted files and closing the storefiles

2020-06-12 Thread Toshihiro Suzuki (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Toshihiro Suzuki resolved HBASE-24529.
--
Hadoop Flags: Reviewed
  Resolution: Fixed

> hbase.rs.evictblocksonclose is not honored when removing compacted files and 
> closing the storefiles
> ---
>
> Key: HBASE-24529
> URL: https://issues.apache.org/jira/browse/HBASE-24529
> Project: HBase
>  Issue Type: Bug
>Reporter: Toshihiro Suzuki
>Assignee: Toshihiro Suzuki
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.0, 2.4.0, 2.1.10, 2.2.6
>
>
> Currently, when removing compacted files and closing the store files, the RS 
> always evicts the blocks of those store files from the block cache. It should 
> honor hbase.rs.evictblocksonclose:
> https://github.com/apache/hbase/blob/7b396e9b8ca93361de6a6c4bc8a40442db77c4da/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java#L2744
> https://github.com/apache/hbase/blob/7b396e9b8ca93361de6a6c4bc8a40442db77c4da/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java#L625
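A minimal sketch of honoring the flag when closing compacted files, assuming a plain `Map` stands in for Hadoop's `Configuration` and a `Set` of file names stands in for the block cache. `CompactedFileCloser` and `closeCompactedFiles` are hypothetical names, not the `HStore` code linked above. An unset key is treated as `false`, which we believe matches HBase's default for `hbase.rs.evictblocksonclose`.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: honor hbase.rs.evictblocksonclose when closing compacted files. */
class CompactedFileCloser {
  static final String EVICT_ON_CLOSE_KEY = "hbase.rs.evictblocksonclose";

  private final Map<String, String> conf; // stand-in for Hadoop's Configuration
  private final Set<String> blockCache;   // stand-in for the block cache, keyed by file name

  CompactedFileCloser(Map<String, String> conf, Set<String> blockCache) {
    this.conf = conf;
    this.blockCache = blockCache;
  }

  /** Closes each compacted file, evicting its cached blocks only when configured to. */
  void closeCompactedFiles(List<String> files) {
    // Unset is treated as false (assumed to match HBase's default).
    boolean evictOnClose = Boolean.parseBoolean(conf.getOrDefault(EVICT_ON_CLOSE_KEY, "false"));
    for (String file : files) {
      if (evictOnClose) {
        blockCache.remove(file);
      }
      // Closing the underlying reader is elided.
    }
  }
}
```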





[jira] [Resolved] (HBASE-24446) Use EnvironmentEdgeManager to compute clock skew in Master

2020-06-12 Thread Viraj Jasani (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viraj Jasani resolved HBASE-24446.
--
Hadoop Flags: Reviewed
  Resolution: Fixed

Pushed to master, branch-2 and branch-1.

> Use EnvironmentEdgeManager to compute clock skew in Master
> --
>
> Key: HBASE-24446
> URL: https://issues.apache.org/jira/browse/HBASE-24446
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.6.0
>Reporter: Sandeep Guggilam
>Assignee: Sandeep Guggilam
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 1.7.0, 2.4.0
>
>
> There are a few cases where the Master cannot complete initialization because 
> it is waiting for a region server to report to it. The region server did 
> report to the master, but the master rejected the request because of a clock 
> skew check, even though both of them run in the same JVM.
> The region server uses EnvironmentEdgeManager.currentTime() to report the 
> current time, while HMaster uses System.currentTimeMillis() to get the current 
> time it compares against the time reported by the RS. We should use 
> EnvironmentEdgeManager in the Master as well, since we are expected not to call 
> System.currentTimeMillis() directly and to go through EnvironmentEdgeManager 
> instead.
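The point of the fix is that both sides read time from the same pluggable source. In this self-contained sketch, `Clock` and `ClockManager` are hypothetical stand-ins modeled loosely on HBase's `EnvironmentEdge`/`EnvironmentEdgeManager` (they are not those classes). When a test injects a fixed clock into the shared JVM, a skew computed through the shared source is zero, while one computed against `System.currentTimeMillis()` is large.

```java
/** Hypothetical pluggable time source, modeled loosely on HBase's EnvironmentEdge. */
interface Clock {
  long currentTime();
}

/** Hypothetical process-wide holder, analogous in spirit to EnvironmentEdgeManager. */
final class ClockManager {
  private static volatile Clock clock = System::currentTimeMillis;

  private ClockManager() {}

  static long currentTime() {
    return clock.currentTime();
  }

  static void inject(Clock c) {
    clock = c;
  }

  static void reset() {
    clock = System::currentTimeMillis;
  }
}

final class SkewCheck {
  private SkewCheck() {}

  /** Skew between a server's reported time and the master's "now". */
  static long skewMillis(long reportedTime) {
    // Read "now" through the same source the region server used,
    // not System.currentTimeMillis() directly.
    return Math.abs(ClockManager.currentTime() - reportedTime);
  }
}
```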





[jira] [Resolved] (HBASE-24545) Add backoff to SCP check on WAL split completion

2020-06-12 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-24545.
---
Fix Version/s: 2.3.0
   3.0.0-alpha-1
 Hadoop Flags: Reviewed
 Assignee: Michael Stack
   Resolution: Fixed

Merged to branch-2.3+. Thanks for the reviews, [~zhangduo].

> Add backoff to SCP check on WAL split completion
> 
>
> Key: HBASE-24545
> URL: https://issues.apache.org/jira/browse/HBASE-24545
> Project: HBase
>  Issue Type: Bug
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.0
>
>
> Crashed cluster with lots of backed-up WALs. On startup, we had to recover 
> hundreds of servers, each with a running SCP. Taking a thread dump during 
> recovery, I noticed 160 threads, each in an SCP waiting on split-WAL 
> completion. Each thread was scanning the zk splitWAL directory every 100 ms. 
> The directory had thousands of entries, so each check pulled megabytes down 
> from zk, multiplied by 160 threads (16 configured PE threads * 10, the 
> KeepAlive factor that sets the PE worker pool maximum to 10 * the configured 
> number of PEs).
> If there are many WALs remaining to split, have the SCP back off on its wait 
> so it checks less frequently.
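With 160 waiting threads (16 * 10) each polling every 100 ms, the directory is listed about 1,600 times per second. A capped exponential backoff is one way to make each SCP check less often. The sketch below uses a hypothetical `SplitWalWaiter` with a `BooleanSupplier` standing in for the "is the split done?" check; the doubling and the cap are illustrative choices, not necessarily what the merged patch does.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical capped exponential backoff for polling split-WAL completion. */
final class SplitWalWaiter {
  private final long initialMillis;
  private final long maxMillis;

  SplitWalWaiter(long initialMillis, long maxMillis) {
    this.initialMillis = initialMillis;
    this.maxMillis = maxMillis;
  }

  /** Delay after the given 0-based failed check: initial * 2^attempt, capped at max. */
  long delayForAttempt(int attempt) {
    long delay = initialMillis;
    for (int i = 0; i < attempt && delay < maxMillis; i++) {
      delay *= 2;
    }
    return Math.min(delay, maxMillis);
  }

  /** Polls until splitDone is true, sleeping longer after each miss; returns the number of checks. */
  int awaitCompletion(BooleanSupplier splitDone) {
    int checks = 0;
    for (int attempt = 0; ; attempt++) {
      checks++;
      if (splitDone.getAsBoolean()) {
        return checks;
      }
      try {
        Thread.sleep(delayForAttempt(attempt));
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // keep the interrupt flag and stop waiting
        return checks;
      }
    }
  }
}
```

With a 100 ms start and a 1 s cap, a thread that keeps missing settles at one directory listing per second instead of ten.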





[jira] [Created] (HBASE-24549) Cluster locked up due to REST gateway queries for non-existent tables

2020-06-12 Thread Nick Dimiduk (Jira)
Nick Dimiduk created HBASE-24549:


 Summary: Cluster locked up due to REST gateway queries for 
non-existent tables
 Key: HBASE-24549
 URL: https://issues.apache.org/jira/browse/HBASE-24549
 Project: HBase
  Issue Type: Bug
  Components: REST
Reporter: Nick Dimiduk


We had a case where a REST gateway instance was picked up by an automated 
security scanning tool. Many of the HTTP requests it generated were interpreted 
as requests against non-existent tables. The result was a flood of queries 
against meta, completely overrunning the priority queue of the region server 
hosting meta.

We need some kind of back pressure mechanism that can protect meta from a 
single bad actor.
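One possible form of back pressure, offered only as an illustration and not as the fix chosen for this issue, is for the gateway to remember tables that meta reported as missing, so repeated requests for the same bogus table are answered locally for a while. `MissingTableCache` is hypothetical, and the meta lookup is a stand-in `Predicate`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;
import java.util.function.Predicate;

/** Hypothetical gateway-side guard: remember tables that meta says do not exist. */
final class MissingTableCache {
  private final Map<String, Long> missingUntil = new ConcurrentHashMap<>();
  private final Predicate<String> metaLookup; // stand-in for a query against meta
  private final LongSupplier clock;
  private final long ttlMillis;

  MissingTableCache(Predicate<String> metaLookup, LongSupplier clock, long ttlMillis) {
    this.metaLookup = metaLookup;
    this.clock = clock;
    this.ttlMillis = ttlMillis;
  }

  /** Returns whether the table exists, asking meta at most once per TTL for a missing table. */
  boolean tableExists(String table) {
    long now = clock.getAsLong();
    Long expiry = missingUntil.get(table);
    if (expiry != null && now < expiry) {
      return false; // answered locally; meta is not touched
    }
    boolean exists = metaLookup.test(table);
    if (exists) {
      missingUntil.remove(table);
    } else {
      missingUntil.put(table, now + ttlMillis);
    }
    return exists;
  }
}
```

A scanner that invents a new table name per request would bypass this cache, so it would complement, not replace, a per-client rate limit; a real version would also need to bound the map's size.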





[jira] [Created] (HBASE-24550) Passing '-h' or '--help' to bin/hbase doesn't do as expected

2020-06-12 Thread Michael Stack (Jira)
Michael Stack created HBASE-24550:
-

 Summary: Passing '-h' or '--help' to bin/hbase doesn't do as 
expected
 Key: HBASE-24550
 URL: https://issues.apache.org/jira/browse/HBASE-24550
 Project: HBase
  Issue Type: Bug
  Components: Operability, shell
Reporter: Michael Stack


If I run 'bin/hbase -h' or './bin/hbase --help', it doesn't print usage as I'd 
expect. Instead, the parameter is passed directly to the JVM, which complains 
that the option is unrecognized.

It should print the usage instead.





[jira] [Resolved] (HBASE-9039) During recovery, perform assignment and distributed log replay in parallel

2020-06-12 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-9039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-9039.
--
Resolution: Later

> During recovery, perform assignment and distributed log replay in parallel
> --
>
> Key: HBASE-9039
> URL: https://issues.apache.org/jira/browse/HBASE-9039
> Project: HBase
>  Issue Type: Improvement
>Reporter: Devaraj Das
>Assignee: Devaraj Das
>Priority: Minor
>
> In the ServerShutdownHandler, log replay starts only after all the regions 
> have been assigned. It might make sense to replay the logs for each region as 
> soon as that region is assigned, rather than waiting for the whole batch of 
> assignments to complete. Region opening reads the store files, and if a store 
> file block is on a slow datanode, it slows down the assignment process. 
> Running the log replay in parallel might amortize this cost.
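The per-region pipelining described above can be sketched with `CompletableFuture`, where `assign` and `replay` are hypothetical callbacks standing in for region assignment and log replay. Each region's replay is chained onto its own assignment, so a region stuck on a slow datanode delays only its own replay. This illustrates the idea in the description, not HBase's recovery code.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.function.Consumer;
import java.util.stream.Collectors;

/** Hypothetical sketch: replay each region's logs as soon as that region is assigned. */
final class PipelinedRecovery {
  private PipelinedRecovery() {}

  static void recover(List<String> regions, Consumer<String> assign,
      Consumer<String> replay, ExecutorService pool) {
    List<CompletableFuture<Void>> perRegion = regions.stream()
        // Chain replay onto this region's own assignment, not onto the whole batch.
        .map(region -> CompletableFuture.runAsync(() -> assign.accept(region), pool)
            .thenRunAsync(() -> replay.accept(region), pool))
        .collect(Collectors.toList());
    // Wait for every region; a slow assignment delays only its own replay.
    CompletableFuture.allOf(perRegion.toArray(new CompletableFuture<?>[0])).join();
  }
}
```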





[jira] [Resolved] (HBASE-20991) MTTR

2020-06-12 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-20991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-20991.
---
Resolution: Later

Old. No progress. Resolving as 'later'.

> MTTR
> 
>
> Key: HBASE-20991
> URL: https://issues.apache.org/jira/browse/HBASE-20991
> Project: HBase
>  Issue Type: Brainstorming
>  Components: MTTR
>Reporter: Michael Stack
>Priority: Major
>
> Umbrella issue for addressing Mean Time To Recovery (MTTR). This used to be a 
> hot issue but has gone quiet, and a survey of the old items turns up a rag bag 
> of miscellany. Opening this to hang those items from.





[jira] [Resolved] (HBASE-23195) FSDataInputStreamWrapper unbuffer can NOT invoke the classes that NOT implements CanUnbuffer but its parents class implements CanUnbuffer

2020-06-12 Thread Josh Elser (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser resolved HBASE-23195.

Hadoop Flags: Reviewed
  Resolution: Fixed

Thanks for the work, [~zhaoyim]! Sorry it took so long to get it committed. 
Thanks for sticking with it.

> FSDataInputStreamWrapper unbuffer can NOT invoke the classes that NOT 
> implements CanUnbuffer but its parents class implements CanUnbuffer 
> --
>
> Key: HBASE-23195
> URL: https://issues.apache.org/jira/browse/HBASE-23195
> Project: HBase
>  Issue Type: Bug
>  Components: io
>Affects Versions: 2.0.2
>Reporter: Zhao Yi Ming
>Assignee: Zhao Yi Ming
>Priority: Critical
> Fix For: 3.0.0-alpha-1, 2.3.0, 2.4.0, 2.2.6
>
>
> FSDataInputStreamWrapper's unbuffer cannot invoke unbuffer() on classes that 
> do not implement CanUnbuffer themselves but whose parent class implements 
> CanUnbuffer.
> For example:
> There is an interface I1, a class PC1 that implements I1, and a class C1 that 
> extends PC1.
> If we want to invoke C1's unbuffer() method, FSDataInputStreamWrapper's 
> unbuffer cannot do that.
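The description is consistent with general Java reflection behavior: `Class.getInterfaces()` returns only the interfaces a class declares itself, and `Class.getDeclaredMethod` finds only methods declared in that exact class, so both miss what C1 inherits from PC1. This sketch reuses the I1/PC1/C1 names from the description; the `UnbufferLookup` helpers are illustrative and are not the `FSDataInputStreamWrapper` code.

```java
interface I1 {
  void unbuffer();
}

class PC1 implements I1 {
  @Override
  public void unbuffer() {
    // Release buffers (elided).
  }
}

class C1 extends PC1 {}

final class UnbufferLookup {
  private UnbufferLookup() {}

  /** Misses C1: getInterfaces() lists only interfaces the class itself declares. */
  static boolean declaresInterface(Class<?> cls, Class<?> iface) {
    for (Class<?> declared : cls.getInterfaces()) {
      if (declared == iface) {
        return true;
      }
    }
    return false;
  }

  /** Handles C1: isAssignableFrom also sees interfaces inherited from superclasses. */
  static boolean implementsInterface(Class<?> cls, Class<?> iface) {
    return iface.isAssignableFrom(cls);
  }

  /** Misses C1: getDeclaredMethod only finds methods declared in that exact class. */
  static boolean hasDeclaredUnbuffer(Class<?> cls) {
    try {
      cls.getDeclaredMethod("unbuffer");
      return true;
    } catch (NoSuchMethodException e) {
      return false;
    }
  }

  /** Handles C1: getMethod also returns public methods inherited from superclasses. */
  static boolean hasPublicUnbuffer(Class<?> cls) {
    try {
      cls.getMethod("unbuffer");
      return true;
    } catch (NoSuchMethodException e) {
      return false;
    }
  }
}
```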





[jira] [Resolved] (HBASE-24539) Fix the classpath for the local mini-cluster

2020-06-12 Thread Bharath Vissapragada (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Vissapragada resolved HBASE-24539.
--
Fix Version/s: master
   Resolution: Fixed

> Fix the classpath for the local mini-cluster
> 
>
> Key: HBASE-24539
> URL: https://issues.apache.org/jira/browse/HBASE-24539
> Project: HBase
>  Issue Type: Sub-task
>  Components: Client, native-client
>Affects Versions: 3.0.0-alpha-1
>Reporter: Bharath Vissapragada
>Assignee: Marc Parisi
>Priority: Major
> Fix For: master
>
>
> The classpath is created assuming the native-client is a module of the parent 
> hbase repo. Since that is no longer the case and we rely on a dynamically 
> pulled HBase version, we need to fix the classpath for the JVM that is spun up 
> using JNI. Otherwise the mini-cluster won't start.
> {noformat}
> using hbase::MiniCluster;
> JNIEnv *MiniCluster::CreateVM(JavaVM **jvm) {
>   JavaVMInitArgs args;
>   JavaVMOption jvm_options;
>   args.version = JNI_VERSION_1_6;
>   args.nOptions = 1;
>   char *classpath = getenv("CLASSPATH");
>   std::string clspath;
>   if (classpath == NULL || strstr(classpath, "-tests.jar") == NULL) {
>     std::string clsPathFilePath("../../../hbase-build-configuration/target/cached_classpath.txt");
>     std::ifstream fd(clsPathFilePath);
>     std::string prefix("");
>     if (fd.is_open()) {
>       if (classpath == NULL) {
>         LOG(INFO) << "got empty classpath";
>       } else {
>         // prefix bootstrapper.jar
>         prefix.assign(classpath);
>       }
>       std::string line;
> {noformat}





[jira] [Created] (HBASE-24551) Fix README with docs on compiling and running tests

2020-06-12 Thread Bharath Vissapragada (Jira)
Bharath Vissapragada created HBASE-24551:


 Summary: Fix README with docs on compiling and running tests
 Key: HBASE-24551
 URL: https://issues.apache.org/jira/browse/HBASE-24551
 Project: HBase
  Issue Type: Sub-task
  Components: Client, native-client
Affects Versions: master
Reporter: Bharath Vissapragada


Once all the other issues are fixed, let's update the documentation with the 
following information:

- Building shared and static libs for the client
- Running the full test suite
- Compiling with a custom HBase version





[jira] [Created] (HBASE-24552) Replica region needs to do check if primary region exists in hdfs during createRegionOnFileSystem().

2020-06-12 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-24552:


 Summary: Replica region needs to do check if primary region exists 
in hdfs during createRegionOnFileSystem().
 Key: HBASE-24552
 URL: https://issues.apache.org/jira/browse/HBASE-24552
 Project: HBase
  Issue Type: Bug
  Components: read replicas
Affects Versions: 2.3.0
Reporter: Huaxiang Sun
Assignee: Huaxiang Sun


When a replica region is opened, there is no check that the region directory 
exists or that the .regioninfo file exists in it, so the region server will 
bring the replica region online even if the primary region does not exist.

It should do more checks and fail the region open if they do not pass.

Maybe we can do this check in the master instead; will see.
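A minimal sketch of the proposed check, assuming `java.nio.file` on a local directory stands in for HDFS; `ReplicaOpenCheck`, `primaryLooksValid` and `checkBeforeOpen` are hypothetical names. The `.regioninfo` file name matches what HBase writes into each region directory.

```java
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical pre-open check for a replica; java.nio.file stands in for HDFS here. */
final class ReplicaOpenCheck {
  // Name of the file holding the serialized RegionInfo inside a region directory.
  static final String REGION_INFO_FILE = ".regioninfo";

  private ReplicaOpenCheck() {}

  /** True only if the primary's region directory and its .regioninfo file both exist. */
  static boolean primaryLooksValid(Path regionDir) {
    return Files.isDirectory(regionDir)
        && Files.isRegularFile(regionDir.resolve(REGION_INFO_FILE));
  }

  /** Fails the replica open instead of silently bringing it online. */
  static void checkBeforeOpen(Path regionDir) {
    if (!primaryLooksValid(regionDir)) {
      throw new IllegalStateException("Refusing to open replica: no primary region at " + regionDir);
    }
  }
}
```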

 

 





[jira] [Created] (HBASE-24554) Improve/stable read replica

2020-06-12 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-24554:


 Summary: Improve/stable read replica
 Key: HBASE-24554
 URL: https://issues.apache.org/jira/browse/HBASE-24554
 Project: HBase
  Issue Type: Task
  Components: read replicas
Affects Versions: 2.3.0
Reporter: Huaxiang Sun
Assignee: Huaxiang Sun


I have been tracing some read replica issues recently; this is the umbrella Jira 
to track that effort. A few observations so far:
 # The balancer balances replica regions too often; this needs investigation. 
A replica region does not serve writes and rarely serves reads (unless the client 
specifically selects the replica region), so data locality should be a very 
minor factor for replica regions.
 # Need to study split/merge for regions with replicas and make them more 
robust. With proc-v2, they are probably already robust.
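To make the first observation concrete, here is a hypothetical cost term that scales a region's locality penalty by a smaller weight when it is a replica (replica ID other than 0, which denotes the primary). The class name and weights are illustrative; this is not HBase's `StochasticLoadBalancer` cost function.

```java
/** Hypothetical balancer cost term that down-weights locality for replica regions. */
final class LocalityWeight {
  // Replica ID 0 is the primary region; any other ID is a read replica.
  static final int PRIMARY_REPLICA_ID = 0;

  private LocalityWeight() {}

  /** weight * (1 - locality), with a smaller weight for replicas. */
  static double localityCost(int replicaId, double locality,
      double primaryWeight, double replicaWeight) {
    if (locality < 0.0 || locality > 1.0) {
      throw new IllegalArgumentException("locality must be in [0, 1]");
    }
    double weight = replicaId == PRIMARY_REPLICA_ID ? primaryWeight : replicaWeight;
    return weight * (1.0 - locality);
  }
}
```

With a primary weight of 1.0 and a replica weight of 0.05, a region at 25% locality costs 0.75 as a primary but only 0.0375 as a replica, so poor replica locality barely motivates a move.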

 

 





[jira] [Created] (HBASE-24555) Correct the description of hbase.hregion.max.filesize in doc

2020-06-12 Thread Zheng Wang (Jira)
Zheng Wang created HBASE-24555:
--

 Summary: Correct the description of hbase.hregion.max.filesize in 
doc
 Key: HBASE-24555
 URL: https://issues.apache.org/jira/browse/HBASE-24555
 Project: HBase
  Issue Type: Improvement
  Components: documentation
Reporter: Zheng Wang
Assignee: Zheng Wang


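For now, a region is actually split based on the size of its individual stores 
(a split is triggered when any one store exceeds the limit), not on the total 
size of the region.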
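The difference between the two readings of `hbase.hregion.max.filesize` can be shown with a small sketch; `StoreSizeSplitCheck` is hypothetical and is not HBase's split policy code. With a 10 GB limit and two 6 GB stores, the per-store rule does not split, while a total-size rule would.

```java
import java.util.List;

/** Hypothetical illustration of per-store vs. whole-region split thresholds. */
final class StoreSizeSplitCheck {
  private StoreSizeSplitCheck() {}

  /** Per-store rule: split once any single store exceeds maxFileSize. */
  static boolean splitByStoreSize(List<Long> storeSizes, long maxFileSize) {
    return storeSizes.stream().anyMatch(size -> size > maxFileSize);
  }

  /** Contrast: a rule based on the region's total size across all stores. */
  static boolean splitByRegionSize(List<Long> storeSizes, long maxFileSize) {
    long total = storeSizes.stream().mapToLong(Long::longValue).sum();
    return total > maxFileSize;
  }
}
```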


