[jira] [Commented] (HBASE-17988) get-active-master.rb and draining_servers.rb no longer work

2017-06-08 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043970#comment-16043970
 ] 

Sean Busbey commented on HBASE-17988:
-

no worries, it happens. Could you switch the jira to Patch Available so the QA 
bot can take a look?

> get-active-master.rb and draining_servers.rb no longer work
> ---
>
>                 Key: HBASE-17988
>                 URL: https://issues.apache.org/jira/browse/HBASE-17988
>             Project: HBase
>          Issue Type: Bug
>          Components: scripts
>    Affects Versions: 2.0.0
>            Reporter: Mike Drob
>            Assignee: Chinmay Kulkarni
>            Priority: Critical
>             Fix For: 2.0.0, 1.4.0, 1.3.2, 1.1.11, 1.2.7
>
> Attachments: HBASE-17988.002.patch, HBASE-17988.patch
>
>
> The scripts {{bin/get-active-master.rb}} and {{bin/draining_servers.rb}} no 
> longer work on the current master branch. Here is an example error message:
> {noformat}
> $ bin/hbase-jruby bin/get-active-master.rb 
> NoMethodError: undefined method `masterAddressZNode' for 
> #
>at bin/get-active-master.rb:35
> {noformat}
> My initial probing suggests that this is likely due to code movement in 
> HBASE-16690. Perhaps, instead of reworking the Ruby, there is similar Java 
> functionality that already exists somewhere.
> Setting the priority to Critical since it's impossible to know whether users 
> rely on the scripts.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-18195) Remove redundant single quote from start message for HMaster and HRegionServer

2017-06-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043966#comment-16043966
 ] 

Hudson commented on HBASE-18195:


FAILURE: Integrated in Jenkins build HBase-2.0 #13 (See 
[https://builds.apache.org/job/HBase-2.0/13/])
HBASE-18195 Removed redundant single quote from start message for (stack: rev 
a81577d827f96dc64b8cbe95b15cfc3ba151d207)
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java


> Remove redundant single quote from start message for HMaster and HRegionServer
> --
>
>                 Key: HBASE-18195
>                 URL: https://issues.apache.org/jira/browse/HBASE-18195
>             Project: HBase
>          Issue Type: Bug
>          Components: master, regionserver
>            Reporter: Umesh Agashe
>            Assignee: Umesh Agashe
>            Priority: Minor
>              Labels: beginners
>             Fix For: 2.0.0, 3.0.0
>
> Attachments: HBASE-18195.master.001.patch
>
>
> Message in the log shows up as:
> {code}
> INFO  [main] master.HMaster: STARTING service 'HMaster
> {code}
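The stray quote is the kind of defect that appears when a log message is assembled by string concatenation. A minimal sketch, assuming the message is built from the service's simple class name (an assumption; the patch itself is not quoted here). `StartMessage`, `buggy` and `fixed` are hypothetical names, and the nested `HMaster` is only a stand-in type:

```java
// Hypothetical sketch of the HBASE-18195 symptom; not the code in HMaster.java.
final class StartMessage {
  private StartMessage() {}

  // Stand-in type so getSimpleName() returns "HMaster" in this sketch.
  static final class HMaster {}

  // Buggy shape: an opening quote is appended but never closed,
  // which yields "STARTING service 'HMaster".
  static String buggy(Class<?> service) {
    return "STARTING service '" + service.getSimpleName();
  }

  // Fixed shape: the redundant quote is dropped, yielding "STARTING service HMaster".
  static String fixed(Class<?> service) {
    return "STARTING service " + service.getSimpleName();
  }

  public static void main(String[] args) {
    System.out.println(buggy(HMaster.class)); // STARTING service 'HMaster
    System.out.println(fixed(HMaster.class)); // STARTING service HMaster
  }
}
```

Dropping the quote (rather than adding a closing one) matches the issue title, "Remove redundant single quote".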



[jira] [Updated] (HBASE-18195) Remove redundant single quote from start message for HMaster and HRegionServer

2017-06-08 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-18195:
--
       Resolution: Fixed
     Hadoop Flags: Reviewed
    Fix Version/s: 3.0.0
                   2.0.0
           Status: Resolved  (was: Patch Available)

Pushed to branch-2 and master. Thanks [~uagashe]



[jira] [Commented] (HBASE-18141) Regionserver fails to shutdown when abort triggered in RegionScannerImpl during RPC call

2017-06-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043945#comment-16043945
 ] 

Hudson commented on HBASE-18141:


SUCCESS: Integrated in Jenkins build HBase-1.3-JDK8 #193 (See 
[https://builds.apache.org/job/HBase-1.3-JDK8/193/])
HBASE-18141 Regionserver fails to shutdown when abort triggered during (garyh: 
rev 92f3db6cb99da542059560c04746c192930e8646)
* (add) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerAbort.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java


> Regionserver fails to shutdown when abort triggered in RegionScannerImpl 
> during RPC call
> 
>
>                 Key: HBASE-18141
>                 URL: https://issues.apache.org/jira/browse/HBASE-18141
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, security
>    Affects Versions: 1.3.1
>            Reporter: Gary Helmling
>            Assignee: Gary Helmling
>            Priority: Critical
>             Fix For: 2.0.0, 3.0.0, 1.4.0, 1.3.2, 1.2.7
>
> Attachments: HBASE-18141.001.patch, HBASE-18141.branch-1.3.001.patch, 
> HBASE-18141.branch-1.3.002.patch
>
>
> When an abort is triggered within the RPC call path by 
> HRegion.RegionScannerImpl, AccessController incorrectly applies the RPC 
> caller identity in the RegionServerObserver.preStopRegionServer() hook, and 
> rejects the stop because that user lacks global ADMIN permission. This 
> leaves the regionserver in a non-responsive state, where its regions are not 
> reassigned and it returns exceptions for all requests.
> When an abort is triggered on the server side, we should not allow a 
> coprocessor to reject the abort at all.
> Here is a sample stack trace:
> {noformat}
> 17/05/25 06:10:29 FATAL regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: [org.apache.hadoop.hbase.security.access.AccessController, org.apache.hadoop.hbase.security.token.TokenProvider]
> 17/05/25 06:10:29 WARN regionserver.HRegionServer: The region server did not stop
> org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient permissions for user 'rpcuser' (global, action=ADMIN)
> at org.apache.hadoop.hbase.security.access.AccessController.requireGlobalPermission(AccessController.java:548)
> at org.apache.hadoop.hbase.security.access.AccessController.requirePermission(AccessController.java:522)
> at org.apache.hadoop.hbase.security.access.AccessController.preStopRegionServer(AccessController.java:2501)
> at org.apache.hadoop.hbase.regionserver.RegionServerCoprocessorHost$1.call(RegionServerCoprocessorHost.java:86)
> at org.apache.hadoop.hbase.regionserver.RegionServerCoprocessorHost.execShutdown(RegionServerCoprocessorHost.java:300)
> at org.apache.hadoop.hbase.regionserver.RegionServerCoprocessorHost.preStop(RegionServerCoprocessorHost.java:82)
> at org.apache.hadoop.hbase.regionserver.HRegionServer.stop(HRegionServer.java:1905)
> at org.apache.hadoop.hbase.regionserver.HRegionServer.abort(HRegionServer.java:2118)
> at org.apache.hadoop.hbase.regionserver.HRegionServer.abort(HRegionServer.java:2125)
> at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.abortRegionServer(HRegion.java:6326)
> at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.handleFileNotFound(HRegion.java:6319)
> at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5941)
> at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6084)
> at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5858)
> at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2649)
> at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:34950)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2320)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:123)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:188)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)
> {noformat}
> I haven't yet evaluated which other release branches this might apply to.
> I have a patch currently in progress, which I will post as soon as I complete 
> a test case.
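The rule proposed above, that a server-initiated abort must not be vetoed by a coprocessor, can be modelled in a few lines. In this sketch the hook stands in for AccessController's pre-stop permission check. `AbortSketch`, `PreStopHook` and the `force` flag are hypothetical names, not HBase's API, and the actual patch may take a different route (for example, changing which identity the hook sees):

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the rule proposed in HBASE-18141. All names are hypothetical;
// this is not HRegionServer or RegionServerCoprocessorHost code.
final class AbortSketch {
  /** Stand-in for a RegionServerObserver-style pre-stop hook. */
  interface PreStopHook {
    void preStop(String reason) throws Exception;
  }

  private final List<PreStopHook> hooks = new ArrayList<>();
  private boolean stopped = false;

  void addHook(PreStopHook hook) {
    hooks.add(hook);
  }

  boolean isStopped() {
    return stopped;
  }

  /** Requested stop: a hook that throws vetoes the stop. */
  void stop(String reason) throws Exception {
    runHooksThenStop(reason, false);
  }

  /** Server-initiated abort: hooks still run, but cannot veto. */
  void abort(String reason) {
    try {
      runHooksThenStop(reason, true);
    } catch (Exception e) {
      // Not reached: with force == true every hook failure is swallowed.
      throw new IllegalStateException(e);
    }
  }

  private void runHooksThenStop(String reason, boolean force) throws Exception {
    for (PreStopHook hook : hooks) {
      try {
        hook.preStop(reason);
      } catch (Exception e) {
        if (!force) {
          throw e; // propagate the rejection; the server keeps running
        }
        System.err.println("Ignoring pre-stop hook failure during abort: " + e.getMessage());
      }
    }
    stopped = true;
  }

  public static void main(String[] args) throws Exception {
    AbortSketch rs = new AbortSketch();
    // Rejects every caller, like the AccessDeniedException in the trace above.
    rs.addHook(reason -> {
      throw new SecurityException("Insufficient permissions");
    });
    try {
      rs.stop("operator request");
    } catch (SecurityException expected) {
      System.out.println("stop vetoed, stopped=" + rs.isStopped());
    }
    rs.abort("FileNotFoundException while scanning");
    System.out.println("after abort, stopped=" + rs.isStopped());
  }
}
```

Running `main` prints `stop vetoed, stopped=false` and then `after abort, stopped=true`; the hook failure during the abort only reaches stderr.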



[jira] [Commented] (HBASE-18141) Regionserver fails to shutdown when abort triggered in RegionScannerImpl during RPC call

2017-06-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043943#comment-16043943
 ] 

Hudson commented on HBASE-18141:


SUCCESS: Integrated in Jenkins build HBase-1.3-JDK7 #179 (See 
[https://builds.apache.org/job/HBase-1.3-JDK7/179/])
HBASE-18141 Regionserver fails to shutdown when abort triggered during (garyh: 
rev 92f3db6cb99da542059560c04746c192930e8646)
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* (add) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerAbort.java



[jira] [Commented] (HBASE-18186) Frequent FileNotFoundExceptions in region server logs

2017-06-08 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043940#comment-16043940
 ] 

ramkrishna.s.vasudevan commented on HBASE-18186:


I verified the test case TestCompactionArchiveIOException. It seems to do the 
right thing, but if we specifically get a FileNotFoundException, I think we can 
be sure that the file has already been archived and remove it from the 
compactedFile list as well.
In fact, to confirm this, shall we do a file.exists() check on the 'archive' path?
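A minimal sketch of the proposal in the comment above, using java.nio.file on a local filesystem instead of HDFS: if moving a compacted file fails because the source is missing, an exists() check on the archive path decides whether to drop it from the compacted-file list. `ArchiveSketch` and `archiveCompactedFiles` are hypothetical names, not HBase's HFileArchiver API. Note that java.nio reports a missing source as NoSuchFileException, not the java.io.FileNotFoundException seen in the HDFS trace of this issue:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical local-filesystem sketch; not HFileArchiver or HStore code.
final class ArchiveSketch {
  private ArchiveSketch() {}

  /**
   * Moves each compacted file into archiveDir and stops tracking it. A file
   * missing from the store but present in the archive is treated as already
   * archived. A file missing from both places stays in the list.
   */
  static void archiveCompactedFiles(List<Path> compactedFiles, Path archiveDir)
      throws IOException {
    Files.createDirectories(archiveDir);
    Iterator<Path> it = compactedFiles.iterator();
    while (it.hasNext()) {
      Path file = it.next();
      Path target = archiveDir.resolve(file.getFileName());
      try {
        Files.move(file, target);
        it.remove(); // archived now
      } catch (NoSuchFileException e) {
        // Source is gone: confirm with an exists() check on the archive path.
        if (Files.exists(target)) {
          it.remove(); // archived by an earlier attempt
        }
        // Otherwise keep tracking it: it vanished without being archived.
      }
    }
  }

  public static void main(String[] args) throws IOException {
    Path store = Files.createTempDirectory("store");
    Path archive = store.resolve("archive");
    Files.createDirectories(archive);
    Path live = Files.createFile(store.resolve("a.hfile")); // still in the store
    Files.createFile(archive.resolve("b.hfile"));           // archived earlier
    List<Path> compacted = new ArrayList<>(Arrays.asList(
        live, store.resolve("b.hfile"), store.resolve("c.hfile")));
    archiveCompactedFiles(compacted, archive);
    System.out.println("still tracked: " + compacted); // only c.hfile remains
  }
}
```

Keeping a file that is missing from both locations, rather than silently dropping it, preserves the distinction the comment draws between "already archived" and a genuine accounting problem.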

> Frequent FileNotFoundExceptions in region server logs
> -
>
>                 Key: HBASE-18186
>                 URL: https://issues.apache.org/jira/browse/HBASE-18186
>             Project: HBase
>          Issue Type: Bug
>          Components: Compaction, Scanners
>    Affects Versions: 1.3.1
>            Reporter: Ashu Pachauri
>
> We see frequent FileNotFoundExceptions in regionserver logs on multiple code 
> paths that try to reference non-existent store files. I know that there have 
> been multiple bugs in the store file accounting of compacted store files. 
> Examples include: HBASE-16964, HBASE-16754 and HBASE-16788.
> Observations:  
> 1. The issue mentioned here also seems to have a similar flavor, because we 
> are not seeing rampant data loss given the frequency of these exceptions in 
> the logs. So, it's more likely an accounting issue, but I could be wrong. 
> 2. The frequency with which this happens on a scan-heavy workload is at least 
> one order of magnitude higher than on a mixed workload.
> Stack traces:
> {Code}
> WARN backup.HFileArchiver: Failed to archive class org.apache.hadoop.hbase.backup.HFileArchiver$FileableStoreFile, file:hdfs: because it does not exist! Skipping and continuing on.
> java.io.FileNotFoundException: File/Directory // does not exist.
>   at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setTimes(FSDirAttrOp.java:121)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setTimes(FSNamesystem.java:1910)
>   at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setTimes(NameNodeRpcServer.java:1223)
>   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setTimes(ClientNamenodeProtocolServerSideTranslatorPB.java:915)
>   at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2045)
>   at sun.reflect.GeneratedConstructorAccessor55.newInstance(Unknown Source)
>   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
>   at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
>   at org.apache.hadoop.hdfs.DFSClient.setTimes(DFSClient.java:3115)
>   at org.apache.hadoop.hdfs.DistributedFileSystem$30.doCall(DistributedFileSystem.java:1520)
>   at org.apache.hadoop.hdfs.DistributedFileSystem$30.doCall(DistributedFileSystem.java:1516)
>   at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at org.apache.hadoop.hdfs.DistributedFileSystem.setTimes(DistributedFileSystem.java:1530)
>   at org.apache.hadoop.fs.FilterFileSystem.setTimes(FilterFileSystem.java:496)
>   at org.apache.hadoop.hbase.util.FSUtils.renameAndSetModifyTime(FSUtils.java:1805)
>   at org.apache.hadoop.hbase.backup.HFileArchiver$File.moveAndClose(HFileArchiver.java:575)
>   at org.apache.hadoop.hbase.backup.HFileArchiver.resolveAndArchiveFile(HFileArchiver.java:410)
>   at org.apache.hadoop.hbase.backup.HFileArchiver.resolveAndArchive(HFileArchiver.java:320)
>   at org.apache.hadoop.hbase.backup.HFileArchiver.archiveStoreFiles(HFileArchiver.java:242)
>   at org.apache.hadoop.hbase.regionserver.HRegionFileSystem.removeStoreFiles(HRegionFileSystem.java:433)
>   at org.apache.hadoop.hbase.regionserver.HStore.removeCompactedfiles(HStore.java:2723)
>   at 
> 

[jira] [Commented] (HBASE-18141) Regionserver fails to shutdown when abort triggered in RegionScannerImpl during RPC call

2017-06-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043922#comment-16043922
 ] 

Hudson commented on HBASE-18141:


SUCCESS: Integrated in Jenkins build HBase-1.2-JDK7 #150 (See 
[https://builds.apache.org/job/HBase-1.2-JDK7/150/])
HBASE-18141 Regionserver fails to shutdown when abort triggered during (garyh: 
rev a2617b00c0fd330d5b779c4aedc65e9dcc08cb5c)
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* (add) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerAbort.java



[jira] [Commented] (HBASE-18037) Do not expose implementation classes to CP

2017-06-08 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043919#comment-16043919
 ] 

stack commented on HBASE-18037:
---

Thanks for knocking down a blocker [~Apache9]

> Do not expose implementation classes to CP
> --
>
>                 Key: HBASE-18037
>                 URL: https://issues.apache.org/jira/browse/HBASE-18037
>             Project: HBase
>          Issue Type: Umbrella
>          Components: Coprocessors
>    Affects Versions: 2.0.0
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>            Priority: Blocker
>             Fix For: 2.0.0, 3.0.0
>
>
> For example, StoreFile. Exposing the implementation classes to CPs makes it 
> harder to implement new features or improve the existing implementation.
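One common way to avoid handing coprocessors an implementation class is to expose only a narrow interface and keep the concrete class internal. The sketch below illustrates that general pattern with hypothetical names (`StoreFileView`, `DefaultStoreFile`, `CompactionHook`); it is not the API that HBase adopted for StoreFile:

```java
import java.util.Arrays;
import java.util.List;

/** Narrow, read-only view that a coprocessor would be allowed to see. */
interface StoreFileView {
  String name();

  long length();
}

/** Internal implementation; free to change without breaking coprocessors. */
final class DefaultStoreFile implements StoreFileView {
  private final String name;
  private final long length;
  private int readers; // internal bookkeeping that stays out of the view

  DefaultStoreFile(String name, long length) {
    this.name = name;
    this.length = length;
  }

  @Override
  public String name() {
    return name;
  }

  @Override
  public long length() {
    return length;
  }

  void openReader() { // server-only operation, not on the interface
    readers++;
  }

  int readerCount() {
    return readers;
  }
}

/** Coprocessor-style hook whose signature mentions only the interface. */
interface CompactionHook {
  void postCompact(List<? extends StoreFileView> inputs);
}

final class ExposureDemo {
  public static void main(String[] args) {
    DefaultStoreFile f1 = new DefaultStoreFile("hfile-1", 1024L);
    DefaultStoreFile f2 = new DefaultStoreFile("hfile-2", 10L);
    f1.openReader(); // the server itself can use the implementation directly
    CompactionHook hook = inputs -> {
      long total = 0;
      for (StoreFileView v : inputs) {
        total += v.length(); // only the view's methods are visible here
      }
      System.out.println("compacted " + inputs.size() + " files, " + total + " bytes");
    };
    hook.postCompact(Arrays.asList(f1, f2)); // prints "compacted 2 files, 1034 bytes"
  }
}
```

Because the hook's signature names only `StoreFileView`, the server can rename, split or rewrite `DefaultStoreFile` without breaking compiled coprocessors, as long as the interface stays stable.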



[jira] [Commented] (HBASE-17678) FilterList with MUST_PASS_ONE may lead to redundant cells returned

2017-06-08 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043901#comment-16043901
 ] 

Ted Yu commented on HBASE-17678:


Anoop:
You can commit the patches.

Thanks

> FilterList with MUST_PASS_ONE may lead to redundant cells returned
> --
>
>                 Key: HBASE-17678
>                 URL: https://issues.apache.org/jira/browse/HBASE-17678
>             Project: HBase
>          Issue Type: Bug
>          Components: Filters
>    Affects Versions: 2.0.0, 1.3.0, 1.2.1
>         Environment: RedHat 7.x
>            Reporter: Jason Tokayer
>            Assignee: Zheng Hu
> Attachments: HBASE-17678.addendum.patch, HBASE-17678.addendum.patch, 
> HBASE-17678.branch-1.1.v1.patch, HBASE-17678.branch-1.v1.patch, 
> HBASE-17678.branch-1.v1.patch, HBASE-17678.v1.patch, 
> HBASE-17678.v1.rough.patch, HBASE-17678.v2.patch, HBASE-17678.v3.patch, 
> HBASE-17678.v4.patch, HBASE-17678.v4.patch, HBASE-17678.v5.patch, 
> HBASE-17678.v6.patch, HBASE-17678.v7.patch, HBASE-17678.v7.patch, 
> TestColumnPaginationFilterDemo.java
>
>
> When combining ColumnPaginationFilter with a single-element FilterList, 
> MUST_PASS_ONE and MUST_PASS_ALL give different results when there are 
> multiple cells with the same timestamp. This is unexpected since there is 
> only a single filter in the list, and I would expect MUST_PASS_ALL and 
> MUST_PASS_ONE to affect only the behavior of the joined filter and not 
> the behavior of any one of the individual filters. If this is not a bug, then 
> it would be nice if the documentation were updated to explain this nuanced 
> behavior.
> I know that a decision was made in an earlier HBase version to keep 
> multiple cells with the same timestamp. This is generally fine but presents 
> an issue when using the aforementioned filter combination.
> Steps to reproduce:
> In the shell create a table and insert some data:
> {code:none}
> create 'ns:tbl',{NAME => 'family',VERSIONS => 100}
> put 'ns:tbl','row','family:name','John',1
> put 'ns:tbl','row','family:name','Jane',1
> put 'ns:tbl','row','family:name','Gil',1
> put 'ns:tbl','row','family:name','Jane',1
> {code}
> Then, use a Scala client as:
> {code:none}
> import org.apache.hadoop.hbase.filter._
> import org.apache.hadoop.hbase.util.Bytes
> import org.apache.hadoop.hbase.client._
> import org.apache.hadoop.hbase.{CellUtil, HBaseConfiguration, TableName}
> import scala.collection.mutable._
> val config = HBaseConfiguration.create()
> config.set("hbase.zookeeper.quorum", "localhost")
> config.set("hbase.zookeeper.property.clientPort", "2181")
> val connection = ConnectionFactory.createConnection(config)
> val logicalOp = FilterList.Operator.MUST_PASS_ONE
> val limit = 1
> val resultsList = ListBuffer[String]()
> for (offset <- 0 to 20 by limit) {
>   val table = connection.getTable(TableName.valueOf("ns:tbl"))
>   val paginationFilter = new ColumnPaginationFilter(limit, offset)
>   val filterList: FilterList = new FilterList(logicalOp, paginationFilter)
>   println("@ filterList = " + filterList)
>   val results = table.get(new Get(Bytes.toBytes("row")).setFilter(filterList))
>   val cells = results.rawCells()
>   if (cells != null) {
>     for (cell <- cells) {
>       val value = new String(CellUtil.cloneValue(cell))
>       val qualifier = new String(CellUtil.cloneQualifier(cell))
>       val family = new String(CellUtil.cloneFamily(cell))
>       // Trailing '+' keeps Scala from ending the statement at the line break.
>       val result = "OFFSET = " + offset + ":" + family + "," + qualifier +
>         "," + value + "," + cell.getTimestamp()
>       resultsList.append(result)
>     }
>   }
>   table.close()
> }
> resultsList.foreach(println)
> connection.close()
> {code}
> Here are the results for different limit and logicalOp settings:
> {code:none}
> Limit = 1 & logicalOp = MUST_PASS_ALL:
> scala> resultsList.foreach(println)
> OFFSET = 0:family,name,Jane,1
> Limit = 1 & logicalOp = MUST_PASS_ONE:
> scala> resultsList.foreach(println)
> OFFSET = 0:family,name,Jane,1
> OFFSET = 1:family,name,Gil,1
> OFFSET = 2:family,name,Jane,1
> OFFSET = 3:family,name,John,1
> Limit = 2 & logicalOp = MUST_PASS_ALL:
> scala> resultsList.foreach(println)
> OFFSET = 0:family,name,Jane,1
> Limit = 2 & logicalOp = MUST_PASS_ONE:
> scala> resultsList.foreach(println)
> OFFSET = 0:family,name,Jane,1
> OFFSET = 2:family,name,Jane,1
> {code}
> So, it seems that MUST_PASS_ALL gives the expected behavior, but 
> MUST_PASS_ONE does not. Furthermore, MUST_PASS_ONE seems to give only a 
> single (not-duplicated)  within a page, but not across pages.
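One way to see how these results can arise is the simplified model below. It rests on two assumptions, not on HBase source: (a) the paginator increments its counter once per cell it is shown, returning NEXT_COL before the offset and INCLUDE_AND_NEXT_COL inside the page, loosely following ColumnPaginationFilter's limit/offset counting; and (b) the MUST_PASS_ONE list downgrades a lone NEXT_COL to SKIP, so the scanner feeds the next version of the same column back to the filter. Under those assumptions the counter ends up counting versions instead of columns. `PaginationModel` and its methods are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model (not HBase code) of column pagination over the versions
// of a single column, with and without NEXT_COL reaching the scanner.
final class PaginationModel {
  enum ReturnCode { INCLUDE_AND_NEXT_COL, NEXT_COL, SKIP, NEXT_ROW }

  /** Limit/offset counter; count grows once per call, i.e. per cell seen. */
  static final class Paginator {
    private final int limit;
    private final int offset;
    private int count = 0;

    Paginator(int limit, int offset) {
      this.limit = limit;
      this.offset = offset;
    }

    ReturnCode filterCell() {
      if (count >= offset + limit) {
        return ReturnCode.NEXT_ROW;
      }
      ReturnCode code = count < offset ? ReturnCode.NEXT_COL : ReturnCode.INCLUDE_AND_NEXT_COL;
      count++;
      return code;
    }
  }

  /**
   * Feeds one column's versions through a fresh paginator. When
   * wrapperKeepsNextCol is false, NEXT_COL is downgraded to SKIP before the
   * scanner sees it (assumption (b) above).
   */
  static List<String> scanOneColumn(List<String> versions, int limit, int offset,
      boolean wrapperKeepsNextCol) {
    Paginator paginator = new Paginator(limit, offset);
    List<String> returned = new ArrayList<>();
    for (String value : versions) {
      ReturnCode rc = paginator.filterCell();
      if (!wrapperKeepsNextCol && rc == ReturnCode.NEXT_COL) {
        rc = ReturnCode.SKIP; // scanner offers the next version of this column
      }
      if (rc == ReturnCode.INCLUDE_AND_NEXT_COL) {
        returned.add(value);
      }
      if (rc != ReturnCode.SKIP) {
        break; // every other code moves past the (only) column
      }
    }
    return returned;
  }

  public static void main(String[] args) {
    // Versions of row 'row', family:name, in the order the MUST_PASS_ONE output lists them.
    List<String> versions = Arrays.asList("Jane", "Gil", "Jane", "John");
    for (boolean keepsNextCol : new boolean[] {true, false}) {
      String label = keepsNextCol ? "MUST_PASS_ALL-like" : "MUST_PASS_ONE-like";
      for (int offset = 0; offset <= 20; offset++) {
        for (String v : scanOneColumn(versions, 1, offset, keepsNextCol)) {
          System.out.println(label + " OFFSET = " + offset + ": " + v);
        }
      }
    }
  }
}
```

With limit = 1 the model prints a single cell (Jane at offset 0) for the MUST_PASS_ALL-like run and Jane, Gil, Jane, John at offsets 0 to 3 for the MUST_PASS_ONE-like run, matching the reported output; with limit = 2 the MUST_PASS_ONE-like run returns Jane at offsets 0 and 2.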



[jira] [Commented] (HBASE-18141) Regionserver fails to shutdown when abort triggered in RegionScannerImpl during RPC call

2017-06-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043899#comment-16043899
 ] 

Hudson commented on HBASE-18141:


SUCCESS: Integrated in Jenkins build HBase-Trunk_matrix #3161 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/3161/])
HBASE-18141 Regionserver fails to shutdown when abort triggered during (garyh: 
rev a558d6c57a02219cb429b311bad13199042ad720)
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* (add) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerAbort.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerCoprocessorHost.java



[jira] [Commented] (HBASE-18141) Regionserver fails to shutdown when abort triggered in RegionScannerImpl during RPC call

2017-06-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043898#comment-16043898
 ] 

Hudson commented on HBASE-18141:


SUCCESS: Integrated in Jenkins build HBase-1.2-JDK8 #146 (See 
[https://builds.apache.org/job/HBase-1.2-JDK8/146/])
HBASE-18141 Regionserver fails to shutdown when abort triggered during (garyh: 
rev a2617b00c0fd330d5b779c4aedc65e9dcc08cb5c)
* (add) 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerAbort.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java


> Regionserver fails to shutdown when abort triggered in RegionScannerImpl 
> during RPC call
> 
>
> Key: HBASE-18141
> URL: https://issues.apache.org/jira/browse/HBASE-18141
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, security
>Affects Versions: 1.3.1
>Reporter: Gary Helmling
>Assignee: Gary Helmling
>Priority: Critical
> Fix For: 2.0.0, 3.0.0, 1.4.0, 1.3.2, 1.2.7
>
> Attachments: HBASE-18141.001.patch, HBASE-18141.branch-1.3.001.patch, 
> HBASE-18141.branch-1.3.002.patch
>
>
> When an abort is triggered within the RPC call path by 
> HRegion.RegionScannerImpl, AccessController incorrectly applies the RPC 
> caller's identity in the RegionServerObserver.preStopRegionServer() hook. This 
> leaves the regionserver in a non-responsive state, where its regions are not 
> reassigned and it returns exceptions for all requests.
> When an abort is triggered on the server side, we should not allow a 
> coprocessor to reject the abort at all.
> Here is a sample stack trace:
> {noformat}
> 17/05/25 06:10:29 FATAL regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: [org.apache.hadoop.hbase.security.access.AccessController, org.apache.hadoop.hbase.security.token.TokenProvider]
> 17/05/25 06:10:29 WARN regionserver.HRegionServer: The region server did not stop
> org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient permissions for user 'rpcuser' (global, action=ADMIN)
> at org.apache.hadoop.hbase.security.access.AccessController.requireGlobalPermission(AccessController.java:548)
> at org.apache.hadoop.hbase.security.access.AccessController.requirePermission(AccessController.java:522)
> at org.apache.hadoop.hbase.security.access.AccessController.preStopRegionServer(AccessController.java:2501)
> at org.apache.hadoop.hbase.regionserver.RegionServerCoprocessorHost$1.call(RegionServerCoprocessorHost.java:86)
> at org.apache.hadoop.hbase.regionserver.RegionServerCoprocessorHost.execShutdown(RegionServerCoprocessorHost.java:300)
> at org.apache.hadoop.hbase.regionserver.RegionServerCoprocessorHost.preStop(RegionServerCoprocessorHost.java:82)
> at org.apache.hadoop.hbase.regionserver.HRegionServer.stop(HRegionServer.java:1905)
> at org.apache.hadoop.hbase.regionserver.HRegionServer.abort(HRegionServer.java:2118)
> at org.apache.hadoop.hbase.regionserver.HRegionServer.abort(HRegionServer.java:2125)
> at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.abortRegionServer(HRegion.java:6326)
> at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.handleFileNotFound(HRegion.java:6319)
> at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5941)
> at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6084)
> at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5858)
> at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2649)
> at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:34950)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2320)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:123)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:188)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)
> {noformat}
> I haven't yet evaluated which other release branches this might apply to.
> I have a patch currently in progress, which I will post as soon as I complete 
> a test case.
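The description asks that a server-initiated abort must not be vetoed by a coprocessor. Below is a minimal, self-contained Java sketch of that policy, not the actual HBASE-18141 patch: `AbortSketch`, `StopHook`, and `Server` are hypothetical stand-ins for the region server and its `preStopRegionServer()` observers. In this model an administrative `stop()` still honors a hook's rejection, while `abort()` runs the hooks but logs and ignores any rejection.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class AbortSketch {
    // Hypothetical stand-in for a coprocessor hook such as preStopRegionServer().
    interface StopHook {
        void preStop(String reason) throws IOException;
    }

    // Hypothetical stand-in for the region server's stop/abort logic.
    static class Server {
        private final List<StopHook> hooks = new ArrayList<>();
        private boolean stopped = false;

        void addHook(StopHook hook) {
            hooks.add(hook);
        }

        boolean isStopped() {
            return stopped;
        }

        // Administrative stop requested by a caller: any hook may reject it.
        void stop(String reason) {
            stop(reason, false);
        }

        // Abort triggered on the server side: hooks still run, but a rejection is ignored.
        void abort(String reason) {
            stop(reason, true);
        }

        private void stop(String reason, boolean force) {
            if (stopped) {
                return;
            }
            for (StopHook hook : hooks) {
                try {
                    hook.preStop(reason);
                } catch (IOException e) {
                    if (!force) {
                        System.out.println("Stop rejected: " + e.getMessage());
                        return;
                    }
                    System.out.println("Ignoring rejection during abort: " + e.getMessage());
                }
            }
            stopped = true;
        }
    }

    public static void main(String[] args) {
        Server server = new Server();
        // A hook that always refuses, like an access check failing for the RPC caller.
        server.addHook(reason -> {
            throw new IOException("Insufficient permissions");
        });

        server.stop("admin request");
        System.out.println("stopped after stop(): " + server.isStopped());

        server.abort("FileNotFoundException in scanner");
        System.out.println("stopped after abort(): " + server.isStopped());
    }
}
```

In this sketch the `force` flag only changes how a hook's rejection is treated; the hooks are still invoked, so an observer can react to the shutdown even when it cannot block it.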





[jira] [Commented] (HBASE-18137) Replication gets stuck for empty WALs

2017-06-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043896#comment-16043896
 ] 

Hadoop QA commented on HBASE-18137:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 33s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 16s {color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 8s {color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 34s {color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 23s {color} | {color:green} branch-1 passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 3m 48s {color} | {color:red} hbase-server in branch-1 has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s {color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 6s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 6s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 33s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 25s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 30m 31s {color} | {color:green} The patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 0m 23s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 9s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 48s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 191m 47s {color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 29s {color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 244m 42s {color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.master.balancer.TestStochasticLoadBalancer2 |
|   | hadoop.hbase.regionserver.TestPerColumnFamilyFlush |
|   | hadoop.hbase.master.TestMasterBalanceThrottling |
| Timed out junit tests | org.apache.hadoop.hbase.replication.regionserver.TestWALEntryStream |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.03.0-ce Server=17.03.0-ce Image:yetus/hbase:395d9a0 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12872172/HBASE-18137.branch-1.v1.patch |
| JIRA Issue | HBASE-18137 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 0c34c97b8321 4.8.3-std-1 #1 SMP Fri Oct 21 11:15:43 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/hbase.sh |
| git revision | branch-1 / ba3a816 |
| Default Java | 1.8.0_131 |
| findbugs | v3.0.0 |
| findbugs | https://builds.apache.org/job/PreCommit-HBASE-Build/7151/artifact/patchprocess/branch-findbugs-hbase-server-warnings.html |
| unit | https://builds.apache.org/job/PreCommit-HBASE-Build/7151/artifact/patchprocess/patch-unit-hbase-server.txt |
| unit test logs | https://builds.apache.org/job/PreCommit-HBASE-Build/7151/artifact/patchprocess/patch-unit-hbase-server.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/7151/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console 

[jira] [Commented] (HBASE-18141) Regionserver fails to shutdown when abort triggered in RegionScannerImpl during RPC call

2017-06-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043890#comment-16043890
 ] 

Hudson commented on HBASE-18141:


SUCCESS: Integrated in Jenkins build HBase-1.4 #768 (See 
[https://builds.apache.org/job/HBase-1.4/768/])
HBASE-18141 Regionserver fails to shutdown when abort triggered during (garyh: 
rev 6c4980161b736878d8c1f83bd9b65a591d5bae9c)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* (add) 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerAbort.java




[jira] [Commented] (HBASE-17678) FilterList with MUST_PASS_ONE may lead to redundant cells returned

2017-06-08 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043889#comment-16043889
 ] 

Anoop Sam John commented on HBASE-17678:


[~tedyu] Can you please also commit this to branch-1 and the related branches?

> FilterList with MUST_PASS_ONE may lead to redundant cells returned
> --
>
> Key: HBASE-17678
> URL: https://issues.apache.org/jira/browse/HBASE-17678
> Project: HBase
>  Issue Type: Bug
>  Components: Filters
>Affects Versions: 2.0.0, 1.3.0, 1.2.1
> Environment: RedHat 7.x
>Reporter: Jason Tokayer
>Assignee: Zheng Hu
> Attachments: HBASE-17678.addendum.patch, HBASE-17678.addendum.patch, 
> HBASE-17678.branch-1.1.v1.patch, HBASE-17678.branch-1.v1.patch, 
> HBASE-17678.branch-1.v1.patch, HBASE-17678.v1.patch, 
> HBASE-17678.v1.rough.patch, HBASE-17678.v2.patch, HBASE-17678.v3.patch, 
> HBASE-17678.v4.patch, HBASE-17678.v4.patch, HBASE-17678.v5.patch, 
> HBASE-17678.v6.patch, HBASE-17678.v7.patch, HBASE-17678.v7.patch, 
> TestColumnPaginationFilterDemo.java
>
>
> When combining ColumnPaginationFilter with a single-element filterList, 
> MUST_PASS_ONE and MUST_PASS_ALL give different results when there are 
> multiple cells with the same timestamp. This is unexpected since there is 
> only a single filter in the list, and I would expect MUST_PASS_ALL and 
> MUST_PASS_ONE to affect only the behavior of the joined filter, not 
> the behavior of any one of the individual filters. If this is not a bug, 
> it would be nice if the documentation were updated to explain this nuanced 
> behavior.
> I know that a decision was made in an earlier HBase version to keep 
> multiple cells with the same timestamp. This is generally fine but presents 
> an issue when using the aforementioned filter combination.
> Steps to reproduce:
> In the shell, create a table and insert some data:
> {code:none}
> create 'ns:tbl',{NAME => 'family',VERSIONS => 100}
> put 'ns:tbl','row','family:name','John',1
> put 'ns:tbl','row','family:name','Jane',1
> put 'ns:tbl','row','family:name','Gil',1
> put 'ns:tbl','row','family:name','Jane',1
> {code}
> Then, use a Scala client as follows:
> {code:none}
> import org.apache.hadoop.hbase.filter._
> import org.apache.hadoop.hbase.util.Bytes
> import org.apache.hadoop.hbase.client._
> import org.apache.hadoop.hbase.{CellUtil, HBaseConfiguration, TableName}
> import scala.collection.mutable.ListBuffer
> val config = HBaseConfiguration.create()
> config.set("hbase.zookeeper.quorum", "localhost")
> config.set("hbase.zookeeper.property.clientPort", "2181")
> val connection = ConnectionFactory.createConnection(config)
> // Obtain the table once and reuse it for every page
> val table = connection.getTable(TableName.valueOf("ns:tbl"))
> val logicalOp = FilterList.Operator.MUST_PASS_ONE
> val limit = 1
> val resultsList = ListBuffer[String]()
> try {
>   for (offset <- 0 to 20 by limit) {
>     // Wrap a single ColumnPaginationFilter in a FilterList
>     val paginationFilter = new ColumnPaginationFilter(limit, offset)
>     val filterList = new FilterList(logicalOp, paginationFilter)
>     println("@ filterList = " + filterList)
>     val results = table.get(new Get(Bytes.toBytes("row")).setFilter(filterList))
>     // rawCells() may return null for an empty result
>     val cells = results.rawCells()
>     if (cells != null) {
>       for (cell <- cells) {
>         val value = Bytes.toString(CellUtil.cloneValue(cell))
>         val qualifier = Bytes.toString(CellUtil.cloneQualifier(cell))
>         val family = Bytes.toString(CellUtil.cloneFamily(cell))
>         resultsList += s"OFFSET = $offset:$family,$qualifier,$value,${cell.getTimestamp}"
>       }
>     }
>   }
> } finally {
>   table.close()
>   connection.close()
> }
> resultsList.foreach(println)
> {code}
> Here are the results for different limit and logicalOp settings:
> {code:none}
> Limit = 1 & logicalOp = MUST_PASS_ALL:
> scala> resultsList.foreach(println)
> OFFSET = 0:family,name,Jane,1
> Limit = 1 & logicalOp = MUST_PASS_ONE:
> scala> resultsList.foreach(println)
> OFFSET = 0:family,name,Jane,1
> OFFSET = 1:family,name,Gil,1
> OFFSET = 2:family,name,Jane,1
> OFFSET = 3:family,name,John,1
> Limit = 2 & logicalOp = MUST_PASS_ALL:
> scala> resultsList.foreach(println)
> OFFSET = 0:family,name,Jane,1
> Limit = 2 & logicalOp = MUST_PASS_ONE:
> scala> resultsList.foreach(println)
> OFFSET = 0:family,name,Jane,1
> OFFSET = 2:family,name,Jane,1
> {code}
> So, it seems that MUST_PASS_ALL gives the expected behavior, but 
> MUST_PASS_ONE does not. Furthermore, MUST_PASS_ONE seems to avoid duplicate 
> cells only within a page, not across pages.
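One way to see how a single-filter list could change the result is to separate counting columns from counting cells. The sketch below is a simplified, hypothetical model, not HBase code: `PaginationModel`, `Cell`, `paginateByColumn`, and `paginateByCell` are illustrative names. It assumes that cells arrive with all versions of a qualifier adjacent, and that column pagination is meant to count each qualifier once. If the "skip the rest of this column" step is lost, every version reaches the counter and each offset yields another version of the same column, a pattern that resembles the MUST_PASS_ONE output above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PaginationModel {
    // A cell reduced to the two fields this model needs.
    static final class Cell {
        final String qualifier;
        final String value;

        Cell(String qualifier, String value) {
            this.qualifier = qualifier;
            this.value = value;
        }

        @Override
        public String toString() {
            return qualifier + "=" + value;
        }
    }

    // Column-aware pagination: offset and limit count distinct qualifiers.
    // Assumption: all versions of a qualifier are adjacent in the input.
    static List<Cell> paginateByColumn(List<Cell> cells, int limit, int offset) {
        List<Cell> page = new ArrayList<>();
        int columnIndex = -1;
        String previousQualifier = null;
        for (Cell cell : cells) {
            if (cell.qualifier.equals(previousQualifier)) {
                continue; // another version of a column that was already counted
            }
            previousQualifier = cell.qualifier;
            columnIndex++;
            if (columnIndex >= offset && columnIndex < offset + limit) {
                page.add(cell);
            }
        }
        return page;
    }

    // Cell-counting pagination: every version advances the counter,
    // as if the step that skips the rest of the column had been dropped.
    static List<Cell> paginateByCell(List<Cell> cells, int limit, int offset) {
        List<Cell> page = new ArrayList<>();
        for (int i = offset; i < Math.min(cells.size(), offset + limit); i++) {
            page.add(cells.get(i));
        }
        return page;
    }

    public static void main(String[] args) {
        // Four versions of family:name as in the shell example (the order here is illustrative).
        List<Cell> cells = Arrays.asList(
                new Cell("name", "Jane"), new Cell("name", "Gil"),
                new Cell("name", "Jane"), new Cell("name", "John"));
        for (int offset = 0; offset <= 4; offset++) {
            System.out.println("offset " + offset
                    + " by column: " + paginateByColumn(cells, 1, offset)
                    + ", by cell: " + paginateByCell(cells, 1, offset));
        }
    }
}
```

With a single qualifier, the column-aware version returns a cell only at offset 0, while the cell-counting version returns one cell per offset until the versions run out.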





[jira] [Commented] (HBASE-18160) Fix incorrect logic in FilterList.filterKeyValue

2017-06-08 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043880#comment-16043880
 ] 

Ted Yu commented on HBASE-18160:


lgtm

> Fix incorrect logic in FilterList.filterKeyValue
> -
>
> Key: HBASE-18160
> URL: https://issues.apache.org/jira/browse/HBASE-18160
> Project: HBase
>  Issue Type: Bug
>Reporter: Zheng Hu
>Assignee: Zheng Hu
> Attachments: HBASE-18160.v1.patch, HBASE-18160.v2.patch, 
> HBASE-18160.v2.patch
>
>
> As HBASE-17678 describes, there are two problems in the FilterList.filterKeyValue 
> implementation:
> 1. FilterList does not handle the INCLUDE_AND_SEEK_NEXT_ROW case (it seems 
> INCLUDE_AND_SEEK_NEXT_ROW is a newly added return code, and the developer 
> forgot to update FilterList). So if a user returns INCLUDE_AND_SEEK_NEXT_ROW 
> from their own Filter and that Filter is wrapped in a FilterList, it throws an 
> IllegalStateException("Received code is not valid.").
> 2. For a FilterList with MUST_PASS_ONE, if filter-A in the list returns 
> INCLUDE and filter-B returns INCLUDE_AND_NEXT_COL, the FilterList 
> ultimately returns INCLUDE_AND_NEXT_COL. According to the minimal step rule, 
> this is incorrect. (A filter list with MUST_PASS_ONE should choose the minimal 
> step among the filters in the list. Let's call this the Minimal Step Rule.)
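The Minimal Step Rule can be illustrated with a simplified model that splits each return code into two parts: whether the current cell is included, and how far the scanner may skip afterwards. This is a hypothetical sketch, not the FilterList code from the patch: `MinimalStepRule`, `Step`, `Code`, and `mergeMustPassOne` are illustrative names (the constant names mirror HBase's `Filter.ReturnCode` values), and the model ignores codes such as SEEK_NEXT_USING_HINT. Under MUST_PASS_ONE, the merged code includes the cell if any filter includes it and advances only by the smallest step any filter requested, so INCLUDE combined with INCLUDE_AND_NEXT_COL yields INCLUDE.

```java
import java.util.Arrays;
import java.util.List;

public class MinimalStepRule {
    // How far the scanner may advance after the current cell, from smallest to largest step.
    enum Step { CELL, COLUMN, ROW }

    // Simplified stand-in for filter return codes: inclusion flag plus skip distance.
    enum Code {
        INCLUDE(true, Step.CELL),
        INCLUDE_AND_NEXT_COL(true, Step.COLUMN),
        INCLUDE_AND_SEEK_NEXT_ROW(true, Step.ROW),
        SKIP(false, Step.CELL),
        NEXT_COL(false, Step.COLUMN),
        NEXT_ROW(false, Step.ROW);

        final boolean include;
        final Step step;

        Code(boolean include, Step step) {
            this.include = include;
            this.step = step;
        }

        // Map an (include, step) pair back to its code.
        static Code of(boolean include, Step step) {
            for (Code c : values()) {
                if (c.include == include && c.step == step) {
                    return c;
                }
            }
            throw new IllegalArgumentException("no code for " + include + "/" + step);
        }
    }

    // MUST_PASS_ONE merge: include if any filter includes, advance by the minimal step.
    static Code mergeMustPassOne(List<Code> codes) {
        if (codes.isEmpty()) {
            throw new IllegalArgumentException("empty filter list");
        }
        boolean include = false;
        Step step = Step.ROW;
        for (Code c : codes) {
            include |= c.include;
            if (c.step.compareTo(step) < 0) {
                step = c.step;
            }
        }
        return Code.of(include, step);
    }

    public static void main(String[] args) {
        System.out.println(mergeMustPassOne(Arrays.asList(Code.INCLUDE, Code.INCLUDE_AND_NEXT_COL)));
        System.out.println(mergeMustPassOne(Arrays.asList(Code.NEXT_COL, Code.INCLUDE_AND_SEEK_NEXT_ROW)));
    }
}
```

In this model, splitting each code into two fields turns the merge into two independent reductions (logical OR on inclusion, minimum on step), which is easier to check exhaustively than a hand-written case table.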





[jira] [Updated] (HBASE-18160) Fix incorrect logic in FilterList.filterKeyValue

2017-06-08 Thread Zheng Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu updated HBASE-18160:
-
Attachment: HBASE-18160.v2.patch



[jira] [Assigned] (HBASE-18196) Fix historical FindBugs/Javadoc issues in branch-1.1

2017-06-08 Thread Zheng Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu reassigned HBASE-18196:


Assignee: Zheng Hu

> Fix historical FindBugs/Javadoc issues in branch-1.1 
> -
>
> Key: HBASE-18196
> URL: https://issues.apache.org/jira/browse/HBASE-18196
> Project: HBase
>  Issue Type: Improvement
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Minor
>
> See  
> https://issues.apache.org/jira/browse/HBASE-17678?focusedCommentId=16042572=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16042572





[jira] [Commented] (HBASE-18160) Fix incorrect logic in FilterList.filterKeyValue

2017-06-08 Thread Zheng Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043867#comment-16043867
 ] 

Zheng Hu commented on HBASE-18160:
--

The failed and timed-out unit tests are unrelated to the patch.

> Fix incorrect logic in FilterList.filterKeyValue
> -
>
> Key: HBASE-18160
> URL: https://issues.apache.org/jira/browse/HBASE-18160
> Project: HBase
>  Issue Type: Bug
>Reporter: Zheng Hu
>Assignee: Zheng Hu
> Attachments: HBASE-18160.v1.patch, HBASE-18160.v2.patch
>
>
> As HBASE-17678 describes, there are two problems in the FilterList.filterKeyValue 
> implementation:
> 1. FilterList does not handle the INCLUDE_AND_SEEK_NEXT_ROW case (it seems 
> INCLUDE_AND_SEEK_NEXT_ROW is a newly added return code, and the developer 
> forgot to update FilterList). So if a user returns INCLUDE_AND_SEEK_NEXT_ROW 
> from their own Filter and that Filter is wrapped in a FilterList, it throws an 
> IllegalStateException("Received code is not valid.").
> 2. For a FilterList with MUST_PASS_ONE, if filter-A in the list returns 
> INCLUDE and filter-B returns INCLUDE_AND_NEXT_COL, the FilterList 
> ultimately returns INCLUDE_AND_NEXT_COL. According to the minimal step rule, 
> this is incorrect. (A filter list with MUST_PASS_ONE should choose the minimal 
> step among the filters in the list. Let's call this the Minimal Step Rule.)





[jira] [Commented] (HBASE-17678) FilterList with MUST_PASS_ONE may lead to redundant cells returned

2017-06-08 Thread Zheng Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043864#comment-16043864
 ] 

Zheng Hu commented on HBASE-17678:
--

I opened another issue, HBASE-18196, to fix the FindBugs and Javadoc issues in 
branch-1.1.



[jira] [Created] (HBASE-18196) Fix historical FindBugs/Javadoc issues in branch-1.1

2017-06-08 Thread Zheng Hu (JIRA)
Zheng Hu created HBASE-18196:


 Summary: Fix historical FindBugs/Javadoc issues in branch-1.1 
 Key: HBASE-18196
 URL: https://issues.apache.org/jira/browse/HBASE-18196
 Project: HBase
  Issue Type: Improvement
Reporter: Zheng Hu
Priority: Minor


See  
https://issues.apache.org/jira/browse/HBASE-17678?focusedCommentId=16042572=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16042572





[jira] [Commented] (HBASE-17678) FilterList with MUST_PASS_ONE may lead to redundant cells returned

2017-06-08 Thread Zheng Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043861#comment-16043861
 ] 

Zheng Hu commented on HBASE-17678:
--

The failed unit tests and the Javadoc and FindBugs warnings are unrelated to the 
branch-1.1 patch; they are historical issues in branch-1.1. Maybe I'll open another issue to fix them.

> FilterList with MUST_PASS_ONE may lead to redundant cells returned
> --
>
> Key: HBASE-17678
> URL: https://issues.apache.org/jira/browse/HBASE-17678
> Project: HBase
>  Issue Type: Bug
>  Components: Filters
>Affects Versions: 2.0.0, 1.3.0, 1.2.1
> Environment: RedHat 7.x
>Reporter: Jason Tokayer
>Assignee: Zheng Hu
> Attachments: HBASE-17678.addendum.patch, HBASE-17678.addendum.patch, 
> HBASE-17678.branch-1.1.v1.patch, HBASE-17678.branch-1.v1.patch, 
> HBASE-17678.branch-1.v1.patch, HBASE-17678.v1.patch, 
> HBASE-17678.v1.rough.patch, HBASE-17678.v2.patch, HBASE-17678.v3.patch, 
> HBASE-17678.v4.patch, HBASE-17678.v4.patch, HBASE-17678.v5.patch, 
> HBASE-17678.v6.patch, HBASE-17678.v7.patch, HBASE-17678.v7.patch, 
> TestColumnPaginationFilterDemo.java
>
>
> When combining ColumnPaginationFilter with a single-element filterList, 
> MUST_PASS_ONE and MUST_PASS_ALL give different results when there are 
> multiple cells with the same timestamp. This is unexpected since there is 
> only a single filter in the list, and I would believe that MUST_PASS_ALL and 
> MUST_PASS_ONE should only affect the behavior of the joined filter and not 
> the behavior of any one of the individual filters. If this is not a bug then 
> it would be nice if the documentation is updated to explain this nuanced 
> behavior.
> I know that there was a decision made in an earlier Hbase version to keep 
> multiple cells with the same timestamp. This is generally fine but presents 
> an issue when using the aforementioned filter combination.
> Steps to reproduce:
> In the shell create a table and insert some data:
> {code:none}
> create 'ns:tbl',{NAME => 'family',VERSIONS => 100}
> put 'ns:tbl','row','family:name','John',1
> put 'ns:tbl','row','family:name','Jane',1
> put 'ns:tbl','row','family:name','Gil',1
> put 'ns:tbl','row','family:name','Jane',1
> {code}
> Then run the following Scala client code:
> {code:none}
> import org.apache.hadoop.hbase.filter._
> import org.apache.hadoop.hbase.util.Bytes
> import org.apache.hadoop.hbase.client._
> import org.apache.hadoop.hbase.{CellUtil, HBaseConfiguration, TableName}
> import scala.collection.mutable.ListBuffer
> val config = HBaseConfiguration.create()
> config.set("hbase.zookeeper.quorum", "localhost")
> config.set("hbase.zookeeper.property.clientPort", "2181")
> val connection = ConnectionFactory.createConnection(config)
> val table = connection.getTable(TableName.valueOf("ns:tbl"))
> // Switch between MUST_PASS_ONE and MUST_PASS_ALL (and vary limit) to compare results
> val logicalOp = FilterList.Operator.MUST_PASS_ONE
> val limit = 1
> val resultsList = ListBuffer[String]()
> for (offset <- 0 to 20 by limit) {
>   val paginationFilter = new ColumnPaginationFilter(limit, offset)
>   val filterList = new FilterList(logicalOp, paginationFilter)
>   println("@ filterList = " + filterList)
>   val results = table.get(new Get(Bytes.toBytes("row")).setFilter(filterList))
>   val cells = results.rawCells()
>   if (cells != null) {
>     for (cell <- cells) {
>       val value = Bytes.toString(CellUtil.cloneValue(cell))
>       val qualifier = Bytes.toString(CellUtil.cloneQualifier(cell))
>       val family = Bytes.toString(CellUtil.cloneFamily(cell))
>       val result = "OFFSET = " + offset + ":" + family + "," + qualifier + "," + value + "," + cell.getTimestamp()
>       resultsList.append(result)
>     }
>   }
> }
> resultsList.foreach(println)
> table.close()
> connection.close()
> {code}
> Here are the results for different limit and logicalOp settings:
> {code:none}
> Limit = 1 & logicalOp = MUST_PASS_ALL:
> scala> resultsList.foreach(println)
> OFFSET = 0:family,name,Jane,1
> Limit = 1 & logicalOp = MUST_PASS_ONE:
> scala> resultsList.foreach(println)
> OFFSET = 0:family,name,Jane,1
> OFFSET = 1:family,name,Gil,1
> OFFSET = 2:family,name,Jane,1
> OFFSET = 3:family,name,John,1
> Limit = 2 & logicalOp = MUST_PASS_ALL:
> scala> resultsList.foreach(println)
> OFFSET = 0:family,name,Jane,1
> Limit = 2 & logicalOp = MUST_PASS_ONE:
> scala> resultsList.foreach(println)
> OFFSET = 0:family,name,Jane,1
> OFFSET = 2:family,name,Jane,1
> {code}
> So MUST_PASS_ALL seems to give the expected behavior, but 
> MUST_PASS_ONE does not. Furthermore, MUST_PASS_ONE seems to return only a 
> single (non-duplicated) cell within a page, but duplicates appear across pages.

[jira] [Updated] (HBASE-17678) FilterList with MUST_PASS_ONE may lead to redundant cells returned

2017-06-08 Thread Zheng Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu updated HBASE-17678:
-
Attachment: HBASE-17678.branch-1.v1.patch

Trigger Hadoop QA for branch-1 again.

> FilterList with MUST_PASS_ONE may lead to redundant cells returned
> --
>
> Key: HBASE-17678
> URL: https://issues.apache.org/jira/browse/HBASE-17678
> Project: HBase
>  Issue Type: Bug
>  Components: Filters
>Affects Versions: 2.0.0, 1.3.0, 1.2.1
> Environment: RedHat 7.x
>Reporter: Jason Tokayer
>Assignee: Zheng Hu
> Attachments: HBASE-17678.addendum.patch, HBASE-17678.addendum.patch, 
> HBASE-17678.branch-1.1.v1.patch, HBASE-17678.branch-1.v1.patch, 
> HBASE-17678.branch-1.v1.patch, HBASE-17678.v1.patch, 
> HBASE-17678.v1.rough.patch, HBASE-17678.v2.patch, HBASE-17678.v3.patch, 
> HBASE-17678.v4.patch, HBASE-17678.v4.patch, HBASE-17678.v5.patch, 
> HBASE-17678.v6.patch, HBASE-17678.v7.patch, HBASE-17678.v7.patch, 
> TestColumnPaginationFilterDemo.java
>
>





[jira] [Updated] (HBASE-18178) [C++] Retrying meta location lookup and zookeeper connection

2017-06-08 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-18178:
--
Attachment: hbase-18178-v1.patch

Here is a patch that sits on top of HBASE-18188.  
It does the following: 
 - Retries ZooKeeper connections, re-opening the connection in case of failure. 
 - Adds the ability to look up the location of meta itself. 
 - Adds meta cache invalidation. 
 - Adds a bunch of unit tests. 
 - Makes some small cleanups. 
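A rough sketch of the retry behavior in the first bullet, written in Java rather than C++ purely for illustration. `RetryingLookup`, `withRetries`, the attempt count, the backoff values, and the host name are hypothetical and are not taken from hbase-18178-v1.patch; the sketch only assumes that a failed lookup surfaces as an unchecked exception.

```java
import java.util.function.Supplier;

/** Hypothetical retry helper; not part of the HBase C++ (or Java) client. */
public class RetryingLookup {
  /** Runs op up to maxAttempts times, doubling the sleep between failed attempts. */
  static <T> T withRetries(Supplier<T> op, int maxAttempts, long initialBackoffMs) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    long backoffMs = initialBackoffMs;
    RuntimeException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return op.get();
      } catch (RuntimeException e) {
        // e.g. a lost ZooKeeper connection or a stale meta location; a real client
        // would also re-open the connection or invalidate its cache at this point
        last = e;
        if (attempt < maxAttempts) {
          sleepQuietly(backoffMs);
          backoffMs *= 2;
        }
      }
    }
    throw last;
  }

  private static void sleepQuietly(long ms) {
    try {
      Thread.sleep(ms);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while backing off", e);
    }
  }

  public static void main(String[] args) {
    int[] calls = {0};
    // Simulated lookup that fails twice before succeeding; the host name is made up.
    String location = withRetries(() -> {
      calls[0]++;
      if (calls[0] < 3) {
        throw new IllegalStateException("transient failure");
      }
      return "meta-host:16020";
    }, 5, 10);
    System.out.println(location + " after " + calls[0] + " attempts");
  }
}
```

Doubling the backoff keeps a flapping ZooKeeper ensemble from being hammered by immediate reconnects; a production client would also cap the backoff and the total wait.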

> [C++] Retrying meta location lookup and zookeeper connection 
> -
>
> Key: HBASE-18178
> URL: https://issues.apache.org/jira/browse/HBASE-18178
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Attachments: hbase-18178-v1.patch
>
>
> Currently the location cache can do only a single lookup of meta. If the meta 
> location changes or we have ZooKeeper connection problems, we never retry. 





[jira] [Commented] (HBASE-16138) Cannot open regions after non-graceful shutdown due to deadlock with Replication Table

2017-06-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043846#comment-16043846
 ] 

Hadoop QA commented on HBASE-16138:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 5 new or modified test files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 21s {color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 38s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 27s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 14s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 39s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 58s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 13s {color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s {color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 9s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 5s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 5s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 49s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 25s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 28m 10s {color} | {color:green} Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha2. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 56s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s {color} | {color:green} hbase-client generated 0 new + 1 unchanged - 1 fixed = 1 total (was 2) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s {color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 26s {color} | {color:green} hbase-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 100m 9s {color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 39s {color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 152m 32s {color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.replication.TestSerialReplication |
|   | hadoop.hbase.replication.regionserver.TestGlobalThrottler |
|   | hadoop.hbase.replication.regionserver.TestRegionReplicaReplicationEndpoint |
|   | hadoop.hbase.regionserver.TestRegionReplicaFailover |
| Timed out junit tests | org.apache.hadoop.hbase.coprocessor.TestCoprocessorMetrics |
|   | org.apache.hadoop.hbase.client.TestFromClientSideWithCoprocessor |
|   | org.apache.hadoop.hbase.mapred.TestMultiTableSnapshotInputFormat |
|   | org.apache.hadoop.hbase.TestPartialResultsFromClientSide |
|   | org.apache.hadoop.hbase.mapred.TestTableSnapshotInputFormat |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:757bf37 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12872176/HBASE-16138-v2.patch |
| JIRA Issue | 

[jira] [Commented] (HBASE-18160) Fix incorrect logic in FilterList.filterKeyValue

2017-06-08 Thread Zheng Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043844#comment-16043844
 ] 

Zheng Hu commented on HBASE-18160:
--

[~anoop.hbase], could you help review patch v2? Thanks. 

> Fix incorrect logic in FilterList.filterKeyValue
> -
>
> Key: HBASE-18160
> URL: https://issues.apache.org/jira/browse/HBASE-18160
> Project: HBase
>  Issue Type: Bug
>Reporter: Zheng Hu
>Assignee: Zheng Hu
> Attachments: HBASE-18160.v1.patch, HBASE-18160.v2.patch
>
>
> As HBASE-17678 said, there are two problems in the FilterList.filterKeyValue 
> implementation: 
> 1. FilterList does not consider the INCLUDE_AND_SEEK_NEXT_ROW case (it seems 
> INCLUDE_AND_SEEK_NEXT_ROW is a newly added return code, and the developer forgot 
> to handle it in FilterList). So if a user returns INCLUDE_AND_SEEK_NEXT_ROW from 
> their own Filter and that Filter is wrapped by a FilterList, it will throw an 
> IllegalStateException("Received code is not valid."). 
> 2. For a FilterList with MUST_PASS_ONE, if filter-A in the list returns 
> INCLUDE and filter-B returns INCLUDE_AND_NEXT_COL, the 
> FilterList finally returns INCLUDE_AND_NEXT_COL. According to the 
> minimal step rule, this is incorrect. (A filter list with MUST_PASS_ONE should 
> choose the minimal step among the filters in the list. Let's call this the 
> Minimal Step Rule.)
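One way to read the Minimal Step Rule is to treat each return code as a pair: whether the current cell is included, and how far the scanner advances (next cell, next column, or next row). Under MUST_PASS_ONE the merged code then includes the cell if any filter includes it, and advances by the smallest step any filter requests. The following Java sketch models that reading with a simplified, hypothetical `Code` enum; it is not HBase's `Filter.ReturnCode`, it omits codes such as SEEK_NEXT_USING_HINT, and it is not the code in the attached patches.

```java
/** Simplified, hypothetical model of the Minimal Step Rule; not HBase's Filter.ReturnCode. */
public class MinimalStepRule {
  /** How far the scanner advances after the current cell, from smallest to largest step. */
  enum Step { NEXT_CELL, NEXT_COL, NEXT_ROW }

  /** Each simplified code is (include the current cell?, how far to advance). */
  enum Code {
    INCLUDE(true, Step.NEXT_CELL),
    INCLUDE_AND_NEXT_COL(true, Step.NEXT_COL),
    INCLUDE_AND_SEEK_NEXT_ROW(true, Step.NEXT_ROW),
    SKIP(false, Step.NEXT_CELL),
    NEXT_COL(false, Step.NEXT_COL),
    NEXT_ROW(false, Step.NEXT_ROW);

    final boolean include;
    final Step step;

    Code(boolean include, Step step) {
      this.include = include;
      this.step = step;
    }

    /** Finds the code with the given include flag and step (every combination exists above). */
    static Code of(boolean include, Step step) {
      for (Code c : values()) {
        if (c.include == include && c.step == step) {
          return c;
        }
      }
      throw new IllegalArgumentException(include + "/" + step);
    }
  }

  /** MUST_PASS_ONE: include if any filter includes; advance by the smallest requested step. */
  static Code mustPassOne(Code... codes) {
    if (codes.length == 0) {
      throw new IllegalArgumentException("need at least one filter result");
    }
    boolean include = false;
    Step step = Step.NEXT_ROW;
    for (Code c : codes) {
      include |= c.include;
      if (c.step.compareTo(step) < 0) {
        step = c.step;
      }
    }
    return Code.of(include, step);
  }

  public static void main(String[] args) {
    // The case from this issue: INCLUDE merged with INCLUDE_AND_NEXT_COL should be INCLUDE.
    System.out.println(mustPassOne(Code.INCLUDE, Code.INCLUDE_AND_NEXT_COL)); // INCLUDE
  }
}
```

In this model a FilterList wrapping a single filter returns that filter's own code under MUST_PASS_ONE, which is the behavior the HBASE-17678 reporter expected.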





[jira] [Commented] (HBASE-18027) Replication should respect RPC size limits when batching edits

2017-06-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043832#comment-16043832
 ] 

Hudson commented on HBASE-18027:


SUCCESS: Integrated in Jenkins build HBase-1.3-JDK8 #192 (See 
[https://builds.apache.org/job/HBase-1.3-JDK8/192/])
HBASE-18027 HBaseInterClusterReplicationEndpoint should respect RPC (apurtell: 
rev 4227757335c3fe15ef1d7139140c795b414facf2)
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WALKey.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/HBaseInterClusterReplicationEndpoint.java
* (add) 
hbase-server/src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicator.java


> Replication should respect RPC size limits when batching edits
> --
>
> Key: HBASE-18027
> URL: https://issues.apache.org/jira/browse/HBASE-18027
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.0.0, 1.4.0, 1.3.1
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
> Fix For: 2.0.0, 1.4.0, 1.3.2
>
> Attachments: HBASE-18027-branch-1.patch, HBASE-18027-branch-1.patch, 
> HBASE-18027-branch-1.patch, HBASE-18027-branch-1.patch, HBASE-18027.patch, 
> HBASE-18027.patch, HBASE-18027.patch, HBASE-18027.patch, HBASE-18027.patch, 
> HBASE-18027.patch, HBASE-18027.patch, HBASE-18027.patch
>
>
> In HBaseInterClusterReplicationEndpoint#replicate we try to replicate in 
> batches. We create N lists. N is the minimum of configured replicator 
> threads, number of 100-waledit batches, or number of current sinks. Every 
> pending entry in the replication context is then placed in order by hash of 
> encoded region name into one of these N lists. Each of the N lists is then 
> sent all at once in one replication RPC. We do not test if the sum of data in 
> each N list will exceed RPC size limits. This code presumes each individual 
> edit is reasonably small. Not checking for aggregate size while assembling 
> the lists into RPCs is an oversight and can lead to replication failure when 
> that assumption is violated.
> We can fix this by generating as many replication RPC calls as we need to 
> drain a list, keeping each RPC under limit, instead of assuming the whole 
> list will fit in one.
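The fix proposed above, issuing as many RPCs as needed to drain a list, amounts to greedy batching by size. A minimal Java sketch, assuming each pending entry's serialized size is known up front; `SizeLimitedBatcher`, `batchBySize`, and the choice to send an oversized single entry in a batch of its own are assumptions, not details of the committed HBaseInterClusterReplicationEndpoint change.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: split pending edits into RPC-sized batches instead of one big call. */
public class SizeLimitedBatcher {
  /**
   * Greedily packs entry sizes (bytes), in order, into batches whose total stays at or
   * below limitBytes. An entry larger than the limit on its own gets a batch by itself.
   */
  static List<List<Long>> batchBySize(List<Long> entrySizes, long limitBytes) {
    List<List<Long>> batches = new ArrayList<>();
    List<Long> current = new ArrayList<>();
    long currentBytes = 0;
    for (long size : entrySizes) {
      // Start a new RPC if this entry would push the current one over the limit.
      if (!current.isEmpty() && currentBytes + size > limitBytes) {
        batches.add(current);
        current = new ArrayList<>();
        currentBytes = 0;
      }
      current.add(size);
      currentBytes += size;
    }
    if (!current.isEmpty()) {
      batches.add(current);
    }
    return batches;
  }

  public static void main(String[] args) {
    // Five edits and a 100-byte limit produce three RPCs instead of one oversized call.
    System.out.println(batchBySize(Arrays.asList(40L, 50L, 30L, 60L, 120L), 100));
    // [[40, 50], [30, 60], [120]]
  }
}
```

Packing entries in their original order keeps per-region edits in sequence within a list, which matters because each list is already grouped by the hash of the encoded region name.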





[jira] [Commented] (HBASE-18027) Replication should respect RPC size limits when batching edits

2017-06-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043830#comment-16043830
 ] 

Hudson commented on HBASE-18027:


SUCCESS: Integrated in Jenkins build HBase-1.3-JDK7 #178 (See 
[https://builds.apache.org/job/HBase-1.3-JDK7/178/])
HBASE-18027 HBaseInterClusterReplicationEndpoint should respect RPC (apurtell: 
rev 4227757335c3fe15ef1d7139140c795b414facf2)
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WALKey.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/HBaseInterClusterReplicationEndpoint.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java
* (add) 
hbase-server/src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicator.java


> Replication should respect RPC size limits when batching edits
> --
>
> Key: HBASE-18027
> URL: https://issues.apache.org/jira/browse/HBASE-18027
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.0.0, 1.4.0, 1.3.1
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
> Fix For: 2.0.0, 1.4.0, 1.3.2
>
> Attachments: HBASE-18027-branch-1.patch, HBASE-18027-branch-1.patch, 
> HBASE-18027-branch-1.patch, HBASE-18027-branch-1.patch, HBASE-18027.patch, 
> HBASE-18027.patch, HBASE-18027.patch, HBASE-18027.patch, HBASE-18027.patch, 
> HBASE-18027.patch, HBASE-18027.patch, HBASE-18027.patch
>
>





[jira] [Commented] (HBASE-18092) Removing a peer does not properly clean up the ReplicationSourceManager state and metrics

2017-06-08 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043823#comment-16043823
 ] 

Ted Yu commented on HBASE-18092:


lgtm

> Removing a peer does not properly clean up the ReplicationSourceManager state 
> and metrics
> -
>
> Key: HBASE-18092
> URL: https://issues.apache.org/jira/browse/HBASE-18092
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.0.0, 1.3.1
>Reporter: Ashu Pachauri
>Assignee: Ashu Pachauri
> Attachments: HBASE-18092.master.001.patch
>
>
> Removing a peer does not clean up the associated metrics and state from the 
> walsById map in the ReplicationSourceManager.





[jira] [Commented] (HBASE-18050) Add document about the IA.Private classes which appear in IA.LimitedPrivate interfaces

2017-06-08 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043822#comment-16043822
 ] 

Anoop Sam John commented on HBASE-18050:


+1 for adding the documentation.
But IMO this is an indication that we expose unwanted methods in the interfaces.
E.g., see the Region interface:
RegionServicesForStores getRegionServicesForStores();
RegionServicesForStores is IA.Private. We have this extra interface so that it can 
hold some region-level methods that should not be exposed to CPs via the 
IA.LimitedPrivate interface (like blockUpdates). But we have a getter for that 
interface in Region! Yes, we say in the documentation not to use the returned 
objects or call any methods on them, but then this API is of no use as such. 
So it is better that we clean these up. 


> Add document about the IA.Private classes which appear in IA.LimitedPrivate 
> interfaces
> --
>
> Key: HBASE-18050
> URL: https://issues.apache.org/jira/browse/HBASE-18050
> Project: HBase
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 3.0.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HBASE-18050.patch
>
>






[jira] [Commented] (HBASE-18141) Regionserver fails to shutdown when abort triggered in RegionScannerImpl during RPC call

2017-06-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043820#comment-16043820
 ] 

Hudson commented on HBASE-18141:


SUCCESS: Integrated in Jenkins build HBase-1.2-IT #883 (See 
[https://builds.apache.org/job/HBase-1.2-IT/883/])
HBASE-18141 Regionserver fails to shutdown when abort triggered during (garyh: 
rev a2617b00c0fd330d5b779c4aedc65e9dcc08cb5c)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* (add) 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerAbort.java


> Regionserver fails to shutdown when abort triggered in RegionScannerImpl 
> during RPC call
> 
>
> Key: HBASE-18141
> URL: https://issues.apache.org/jira/browse/HBASE-18141
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, security
>Affects Versions: 1.3.1
>Reporter: Gary Helmling
>Assignee: Gary Helmling
>Priority: Critical
> Fix For: 2.0.0, 3.0.0, 1.4.0, 1.3.2, 1.2.7
>
> Attachments: HBASE-18141.001.patch, HBASE-18141.branch-1.3.001.patch, 
> HBASE-18141.branch-1.3.002.patch
>
>
> When an abort is triggered within the RPC call path by 
> HRegion.RegionScannerImpl, AccessController incorrectly applies the RPC 
> caller identity in the RegionServerObserver.preStopRegionServer() hook.  This 
> leaves the regionserver in a non-responsive state, where its regions are not 
> reassigned and it returns exceptions for all requests.
> When an abort is triggered on the server side, we should not allow a 
> coprocessor to reject the abort at all.
> Here is a sample stack trace:
> {noformat}
> 17/05/25 06:10:29 FATAL regionserver.HRegionServer: RegionServer abort: 
> loaded coprocessors are: 
> [org.apache.hadoop.hbase.security.access.AccessController, 
> org.apache.hadoop.hbase.security.token.TokenProvider]
> 17/05/25 06:10:29 WARN regionserver.HRegionServer: The region server did not 
> stop
> org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient 
> permissions for user 'rpcuser' (global, action=ADMIN)
> at 
> org.apache.hadoop.hbase.security.access.AccessController.requireGlobalPermission(AccessController.java:548)
> at 
> org.apache.hadoop.hbase.security.access.AccessController.requirePermission(AccessController.java:522)
> at 
> org.apache.hadoop.hbase.security.access.AccessController.preStopRegionServer(AccessController.java:2501)
> at 
> org.apache.hadoop.hbase.regionserver.RegionServerCoprocessorHost$1.call(RegionServerCoprocessorHost.java:86)
> at 
> org.apache.hadoop.hbase.regionserver.RegionServerCoprocessorHost.execShutdown(RegionServerCoprocessorHost.java:300)
> at 
> org.apache.hadoop.hbase.regionserver.RegionServerCoprocessorHost.preStop(RegionServerCoprocessorHost.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.stop(HRegionServer.java:1905)
> at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.abort(HRegionServer.java:2118)
> at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.abort(HRegionServer.java:2125)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.abortRegionServer(HRegion.java:6326)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.handleFileNotFound(HRegion.java:6319)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5941)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6084)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5858)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2649)
> at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:34950)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2320)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:123)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:188)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)
> {noformat}
> I haven't yet evaluated which other release branches this might apply to.
> I have a patch currently in progress, which I will post as soon as I complete 
> a test case.
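The principle that a server-side abort should not be vetoable by a coprocessor can be sketched as follows. This is a hypothetical Java model, not HRegionServer's code: `PreStopHook` stands in for RegionServerObserver.preStopRegionServer(), and `SecurityException` stands in for AccessDeniedException. An explicit stop still lets the hook reject the request, while abort() logs the hook's failure and shuts down anyway.

```java
/** Hypothetical model: a server-initiated abort must not be vetoed by a pre-stop hook. */
public class AbortSketch {
  /** Stand-in for a coprocessor hook such as preStopRegionServer(). */
  interface PreStopHook {
    void preStop();
  }

  private final PreStopHook hook;
  private volatile boolean stopped = false;

  AbortSketch(PreStopHook hook) {
    this.hook = hook;
  }

  /** Explicit stop request: the hook may reject it, e.g. with a permission check. */
  void stop() {
    hook.preStop(); // an exception here leaves the server running
    stopped = true;
  }

  /** Server-side abort: a failing hook is logged but cannot block the shutdown. */
  void abort(String reason) {
    try {
      hook.preStop();
    } catch (RuntimeException e) {
      System.err.println("Ignoring pre-stop hook failure during abort (" + reason + "): "
          + e.getMessage());
    }
    stopped = true;
  }

  boolean isStopped() {
    return stopped;
  }

  public static void main(String[] args) {
    // A hook that always denies, like AccessController rejecting the RPC caller.
    AbortSketch rs = new AbortSketch(() -> {
      throw new SecurityException("Insufficient permissions");
    });
    rs.abort("FileNotFoundException during scan");
    System.out.println(rs.isStopped()); // true
  }
}
```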





[jira] [Commented] (HBASE-18192) Replication drops recovered queues on region server shutdown

2017-06-08 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043821#comment-16043821
 ] 

Ted Yu commented on HBASE-18192:


lgtm

> Replication drops recovered queues on region server shutdown
> 
>
> Key: HBASE-18192
> URL: https://issues.apache.org/jira/browse/HBASE-18192
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 1.3.1, 1.2.6
>Reporter: Ashu Pachauri
>Assignee: Ashu Pachauri
>Priority: Blocker
> Fix For: 2.0.0, 1.4.0, 1.3.2, 1.2.7
>
> Attachments: HBASE-18192.branch-1.3.001.patch, 
> HBASE-18192.branch-1.3.002.patch
>
>
> When a recovered queue has only one active ReplicationWorkerThread, the 
> recovered queue is completely dropped on a region server shutdown. This can 
> happen when:
> 1. DefaultWALProvider is used.
> 2. RegionGroupingProvider is used but replication is stuck on one 
> WAL group for some reason (for example HBASE-18137).
> 3. All other replication workers have died due to an unhandled exception, and 
> the only remaining one finishes. This will cause the recovered queue to get 
> deleted without a regionserver shutdown. This can happen on deployments 
> without the fix for HBASE-17381.
> The problematic piece of code is:
> {Code}
> while (isWorkerActive()) {
>   // The worker thread run loop...
> }
> if (replicationQueueInfo.isQueueRecovered()) {
>   // use synchronize to make sure one last thread will clean the queue
>   synchronized (workerThreads) {
>     Threads.sleep(100); // wait a short while for other worker thread to fully exit
>     boolean allOtherTaskDone = true;
>     for (ReplicationSourceWorkerThread worker : workerThreads.values()) {
>       if (!worker.equals(this) && worker.isAlive()) {
>         allOtherTaskDone = false;
>         break;
>       }
>     }
>     if (allOtherTaskDone) {
>       manager.closeRecoveredQueue(this.source);
>       LOG.info("Finished recovering queue " + peerClusterZnode
>           + " with the following stats: " + getStats());
>     }
>   }
> }
> {Code}
> The conceptual issue is that isWorkerActive() tells whether a worker is 
> currently running or not, and it's being used as a proxy for whether a worker 
> has finished its work. But, in fact, "Should a worker exit?" and "Has it 
> completed its work?" are two different questions.
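The distinction drawn above, between whether a worker should exit and whether it has finished its work, can be made explicit by tracking two separate flags. The following Java sketch is hypothetical (`Worker`, `running`, `workDone`, and `canCloseRecoveredQueue` are illustrative names, not ReplicationSource fields): the recovered queue is closed only when every worker has drained its work, so a shutdown that merely stops the workers leaves the queue in place.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical model separating "should this worker exit?" from "has it finished its work?". */
public class RecoveredQueueCleanup {
  static class Worker {
    /** Answers "should I keep looping?"; cleared on shutdown. Not used to decide cleanup. */
    volatile boolean running = true;
    /** Answers "have I shipped every WAL assigned to me?". */
    volatile boolean workDone = false;
  }

  /** The recovered queue may be closed only if every worker finished, whatever made it stop. */
  static boolean canCloseRecoveredQueue(List<Worker> workers) {
    for (Worker w : workers) {
      if (!w.workDone) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    Worker only = new Worker();
    List<Worker> workers = Arrays.asList(only);
    only.running = false; // region server shutdown: the worker exits its loop early
    System.out.println(canCloseRecoveredQueue(workers)); // false: keep the queue
    only.workDone = true; // the worker really drained its WALs
    System.out.println(canCloseRecoveredQueue(workers)); // true
  }
}
```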





[jira] [Commented] (HBASE-7404) Bucket Cache:A solution about CMS,Heap Fragment and Big Cache on HBASE

2017-06-08 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043811#comment-16043811
 ] 

Anoop Sam John commented on HBASE-7404:
---

Because the block size is not a hard limit. While writing HFiles, the current cell 
may take a block past the configured block size; only after writing it does the 
size check notice that the limit was crossed, and then we move on to the next 
block. To accommodate this possible overshoot, we have 1K extra.
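The overshoot described above can be shown with a toy writer that checks the block size only after appending a cell. This Java sketch is hypothetical and ignores block headers, encoding, and checksums; the 64 KB block size and the cell sizes are made-up values chosen so the overshoot stays within 1K.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy writer: the size check runs after each cell is appended. */
public class BlockOvershoot {
  /** Returns the size of each block produced when cells are written in order. */
  static List<Integer> blockSizes(int[] cellSizes, int blockSize) {
    List<Integer> blocks = new ArrayList<>();
    int current = 0;
    for (int cell : cellSizes) {
      current += cell; // the cell is appended first...
      if (current >= blockSize) { // ...and only then is the block size checked
        blocks.add(current);
        current = 0;
      }
    }
    if (current > 0) {
      blocks.add(current); // last, partially filled block
    }
    return blocks;
  }

  public static void main(String[] args) {
    int[] cells = {32_000, 33_000, 800, 5_000};
    // First block: 65,800 bytes, i.e. 264 bytes past the 65,536-byte (64 KB) target.
    System.out.println(blockSizes(cells, 64 * 1024)); // [65800, 5000]
  }
}
```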

> Bucket Cache:A solution about CMS,Heap Fragment and Big Cache on HBASE
> --
>
> Key: HBASE-7404
> URL: https://issues.apache.org/jira/browse/HBASE-7404
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 0.94.3
>Reporter: chunhui shen
>Assignee: chunhui shen
> Fix For: 0.95.0
>
> Attachments: 7404-0.94-fixed-lines.txt, 7404-trunk-v10.patch, 
> 7404-trunk-v11.patch, 7404-trunk-v12.patch, 7404-trunk-v13.patch, 
> 7404-trunk-v13.txt, 7404-trunk-v14.patch, BucketCache.pdf, 
> hbase-7404-94v2.patch, HBASE-7404-backport-0.94.patch, 
> hbase-7404-trunkv2.patch, hbase-7404-trunkv9.patch, Introduction of Bucket 
> Cache.pdf
>
>
> First, thanks to @neil from Fusion-IO for sharing the source code.
> Usage:
> 1. Use bucket cache as the main memory cache, configured as follows:
> –"hbase.bucketcache.ioengine" "heap" (or "offheap" if using off-heap memory to 
> cache blocks)
> –"hbase.bucketcache.size" 0.4 (size of the bucket cache; 0.4 is a fraction of 
> the max heap size)
> 2. Use bucket cache as a secondary cache, configured as follows:
> –"hbase.bucketcache.ioengine" "file:/disk1/hbase/cache.data" (the file path 
> where the block data is stored)
> –"hbase.bucketcache.size" 1024 (size of the bucket cache in MB, so 1024 
> means 1GB)
> –"hbase.bucketcache.combinedcache.enabled" false (default value is true)
> See more configurations in org.apache.hadoop.hbase.io.hfile.CacheConfig and 
> org.apache.hadoop.hbase.io.hfile.bucket.BucketCache
> What's Bucket Cache? 
> It could greatly decrease CMS and heap fragment by GC
> It support a large cache space for High Read Performance by using high speed 
> disk like Fusion-io
> 1.An implementation of block cache like LruBlockCache
> 2.Self manage blocks' storage position through Bucket Allocator
> 3.The cached blocks could be stored in the memory or file system
> 4.Bucket Cache could be used as a mainly block cache(see CombinedBlockCache), 
> combined with LruBlockCache to decrease CMS and fragment by GC.
> 5.BucketCache also could be used as a secondary cache(e.g. using Fusionio to 
> store block) to enlarge cache space
> How about SlabCache?
> We studied and tested SlabCache first, but the results were poor, because:
> 1. SlabCache uses SingleSizeCache, whose memory utilization is low because 
> block sizes vary, especially when using DataBlockEncoding
> 2. SlabCache is used in DoubleBlockCache, where a block is cached in both 
> SlabCache and LruBlockCache, and a block hit in SlabCache is put into 
> LruBlockCache again, so CMS activity and heap fragmentation don't get any better
> 3. Direct (off-heap) memory performance is not as good as heap and may cause 
> OOM, so we recommend using the "heap" engine
> See more in the attachment and in the patch
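
For illustration, the "hbase.bucketcache.size" rule from the Usage notes can be written as a small Java helper. This is a hypothetical sketch, not code from CacheConfig or BucketCache: it assumes, as the two usage examples suggest, that a value below 1.0 is a fraction of the maximum heap and that a value of 1.0 or more is a size in megabytes.

```java
// Hypothetical helper mirroring the "hbase.bucketcache.size" rule described in
// the Usage notes; it is not taken from CacheConfig or BucketCache.
final class BucketCacheSizing {
    private static final long BYTES_PER_MB = 1024L * 1024L;

    private BucketCacheSizing() {
    }

    /**
     * Returns the bucket cache capacity in bytes.
     * Assumption: values below 1.0 are a fraction of the max heap size
     * (0.4 means 40%), and values of 1.0 or more are a size in megabytes.
     */
    static long capacityBytes(float configured, long maxHeapBytes) {
        if (!(configured > 0f)) { // also rejects NaN
            throw new IllegalArgumentException("hbase.bucketcache.size must be positive: " + configured);
        }
        if (configured < 1.0f) {
            return (long) (maxHeapBytes * (double) configured);
        }
        return (long) configured * BYTES_PER_MB;
    }
}
```

With a 2 GB max heap, for example, 0.4 gives roughly 819 MB, while 1024 gives 1 GB whatever the heap size is.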

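The "Bucket Allocator" idea, and the SingleSizeCache utilization problem it addresses, can be modeled in a few lines. This is a toy sketch, not the BucketAllocator from the attached patch. The class name, the equal-sized buckets that are each dedicated on first use to one size class, and the sizes used below are all assumptions. A block goes into the smallest size class that fits it, and the allocator tracks how many handed-out bytes actually hold data.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy bucket allocator (hypothetical, for illustration only): the cache space is
// split into equal-sized buckets, each bucket is dedicated on first use to one
// item size class, and a block is stored in the smallest size class that fits.
final class ToyBucketAllocator {
    private final long bucketCapacity;
    private final int[] sizeClasses;                 // item sizes, ascending
    private final int[] bucketClass;                 // size-class index per bucket, -1 = unused
    private final List<ArrayDeque<Long>> freeSlots;  // free slot offsets per bucket
    private long usedBytes;                          // bytes of block data stored
    private long allocatedBytes;                     // bytes of slots handed out

    ToyBucketAllocator(long totalBytes, long bucketCapacity, int... sizeClasses) {
        this.sizeClasses = sizeClasses.clone();
        Arrays.sort(this.sizeClasses);
        int n = this.sizeClasses.length;
        if (n == 0 || this.sizeClasses[n - 1] > bucketCapacity) {
            throw new IllegalArgumentException("every size class must fit in a bucket");
        }
        this.bucketCapacity = bucketCapacity;
        int buckets = (int) (totalBytes / bucketCapacity);
        this.bucketClass = new int[buckets];
        Arrays.fill(bucketClass, -1);
        this.freeSlots = new ArrayList<>();
        for (int i = 0; i < buckets; i++) {
            freeSlots.add(new ArrayDeque<>());
        }
    }

    /** Returns the offset of the slot holding the block, or -1 if nothing fits. */
    long allocate(int blockSize) {
        int cls = sizeClassFor(blockSize);
        if (cls < 0) {
            return -1; // larger than the largest size class
        }
        int bucket = -1;
        // Prefer a bucket already dedicated to this size class with a free slot.
        for (int b = 0; b < bucketClass.length && bucket < 0; b++) {
            if (bucketClass[b] == cls && !freeSlots.get(b).isEmpty()) {
                bucket = b;
            }
        }
        // Otherwise claim an unused bucket and carve it into equal slots.
        for (int b = 0; b < bucketClass.length && bucket < 0; b++) {
            if (bucketClass[b] == -1) {
                bucketClass[b] = cls;
                long base = b * bucketCapacity;
                for (long off = base; off + sizeClasses[cls] <= base + bucketCapacity; off += sizeClasses[cls]) {
                    freeSlots.get(b).add(off);
                }
                bucket = b;
            }
        }
        if (bucket < 0) {
            return -1; // no space left for this size class
        }
        usedBytes += blockSize;
        allocatedBytes += sizeClasses[cls];
        return freeSlots.get(bucket).poll();
    }

    /** Returns a slot to its bucket; blockSize must match the size passed to allocate. */
    void free(long offset, int blockSize) {
        int b = (int) (offset / bucketCapacity);
        freeSlots.get(b).push(offset);
        usedBytes -= blockSize;
        allocatedBytes -= sizeClasses[bucketClass[b]];
    }

    /** Fraction of handed-out slot bytes that actually hold block data. */
    double utilization() {
        return allocatedBytes == 0 ? 0.0 : (double) usedBytes / allocatedBytes;
    }

    private int sizeClassFor(int blockSize) {
        for (int i = 0; i < sizeClasses.length; i++) {
            if (sizeClasses[i] >= blockSize) {
                return i;
            }
        }
        return -1;
    }
}
```

Storing the same 3000-byte and 5000-byte blocks in an allocator with 4 KB, 8 KB and 16 KB classes uses about 65% of the handed-out bytes, while a single 16 KB class uses about 24%. That gap is a simplified picture of why variable block sizes hurt a single-size cache.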


--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-18141) Regionserver fails to shutdown when abort triggered in RegionScannerImpl during RPC call

2017-06-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043808#comment-16043808
 ] 

Hudson commented on HBASE-18141:


SUCCESS: Integrated in Jenkins build HBase-1.3-IT #61 (See 
[https://builds.apache.org/job/HBase-1.3-IT/61/])
HBASE-18141 Regionserver fails to shutdown when abort triggered during (garyh: 
rev 92f3db6cb99da542059560c04746c192930e8646)
* (add) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerAbort.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java


> Regionserver fails to shutdown when abort triggered in RegionScannerImpl 
> during RPC call
> 
>
> Key: HBASE-18141
> URL: https://issues.apache.org/jira/browse/HBASE-18141
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, security
>Affects Versions: 1.3.1
>Reporter: Gary Helmling
>Assignee: Gary Helmling
>Priority: Critical
> Fix For: 2.0.0, 3.0.0, 1.4.0, 1.3.2, 1.2.7
>
> Attachments: HBASE-18141.001.patch, HBASE-18141.branch-1.3.001.patch, 
> HBASE-18141.branch-1.3.002.patch
>
>
> When an abort is triggered within the RPC call path by 
> HRegion.RegionScannerImpl, AccessController incorrectly applies the RPC 
> caller identity in the RegionServerObserver.preStopRegionServer() hook.  This 
> leaves the regionserver in a non-responsive state, where its regions are not 
> reassigned and it returns exceptions for all requests.
> When an abort is triggered on the server side, we should not allow a 
> coprocessor to reject the abort at all.
> Here is a sample stack trace:
> {noformat}
> 17/05/25 06:10:29 FATAL regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: [org.apache.hadoop.hbase.security.access.AccessController, org.apache.hadoop.hbase.security.token.TokenProvider]
> 17/05/25 06:10:29 WARN regionserver.HRegionServer: The region server did not stop
> org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient permissions for user 'rpcuser' (global, action=ADMIN)
> at org.apache.hadoop.hbase.security.access.AccessController.requireGlobalPermission(AccessController.java:548)
> at org.apache.hadoop.hbase.security.access.AccessController.requirePermission(AccessController.java:522)
> at org.apache.hadoop.hbase.security.access.AccessController.preStopRegionServer(AccessController.java:2501)
> at org.apache.hadoop.hbase.regionserver.RegionServerCoprocessorHost$1.call(RegionServerCoprocessorHost.java:86)
> at org.apache.hadoop.hbase.regionserver.RegionServerCoprocessorHost.execShutdown(RegionServerCoprocessorHost.java:300)
> at org.apache.hadoop.hbase.regionserver.RegionServerCoprocessorHost.preStop(RegionServerCoprocessorHost.java:82)
> at org.apache.hadoop.hbase.regionserver.HRegionServer.stop(HRegionServer.java:1905)
> at org.apache.hadoop.hbase.regionserver.HRegionServer.abort(HRegionServer.java:2118)
> at org.apache.hadoop.hbase.regionserver.HRegionServer.abort(HRegionServer.java:2125)
> at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.abortRegionServer(HRegion.java:6326)
> at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.handleFileNotFound(HRegion.java:6319)
> at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5941)
> at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6084)
> at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5858)
> at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2649)
> at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:34950)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2320)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:123)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:188)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)
> {noformat}
> I haven't yet evaluated which other release branches this might apply to.
> I have a patch currently in progress, which I will post as soon as I complete 
> a test case.
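
The principle stated in the description, that a server-side abort should not be rejectable by a coprocessor, can be sketched as follows. This is a simplified, hypothetical model and not the committed fix (the build notices in this thread show that fix touching HRegionServer and, on master, RegionServerCoprocessorHost). StopHook and ToyRegionServer are invented names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model (not HBase code): a stop hook, such as an access-control
// coprocessor, may veto an ordinary stop request, but a server-side abort
// must proceed even if a hook throws.
interface StopHook {
    /** Throws a RuntimeException (e.g. SecurityException) to veto the stop. */
    void preStop(String reason);
}

final class ToyRegionServer {
    private final List<StopHook> hooks = new ArrayList<>();
    private volatile boolean stopped;

    void addHook(StopHook hook) {
        hooks.add(hook);
    }

    boolean isStopped() {
        return stopped;
    }

    /** Ordinary stop request: any hook may reject it. */
    void stop(String reason) {
        doStop(reason, false);
    }

    /** Server-side abort: hook failures are logged but cannot block shutdown. */
    void abort(String reason) {
        doStop("Aborting: " + reason, true);
    }

    private void doStop(String reason, boolean force) {
        for (StopHook hook : hooks) {
            try {
                hook.preStop(reason);
            } catch (RuntimeException e) {
                if (!force) {
                    System.err.println("Stop rejected by hook: " + e.getMessage());
                    return;
                }
                System.err.println("Ignoring hook failure during abort: " + e.getMessage());
            }
        }
        stopped = true;
    }
}
```

The design choice here is simply a `force` flag on the shared stop path: ordinary stops keep the permission check, while aborts treat hook failures as log-only.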





[jira] [Commented] (HBASE-18192) Replication drops recovered queues on region server shutdown

2017-06-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043798#comment-16043798
 ] 

Hadoop QA commented on HBASE-18192:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 4s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 19s {color} | {color:green} branch-1.3 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s {color} | {color:green} branch-1.3 passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s {color} | {color:green} branch-1.3 passed with JDK v1.7.0_131 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 59s {color} | {color:green} branch-1.3 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 19s {color} | {color:green} branch-1.3 passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 5s {color} | {color:red} hbase-server in branch-1.3 has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s {color} | {color:green} branch-1.3 passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s {color} | {color:green} branch-1.3 passed with JDK v1.7.0_131 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 49s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s {color} | {color:green} the patch passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 35s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s {color} | {color:green} the patch passed with JDK v1.7.0_131 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 58s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 17s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 17m 51s {color} | {color:green} The patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 0m 15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s {color} | {color:green} the patch passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s {color} | {color:green} the patch passed with JDK v1.7.0_131 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 89m 20s {color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s {color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 124m 2s {color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:9ba21e3 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12872157/HBASE-18192.branch-1.3.002.patch |
| JIRA Issue | HBASE-18192 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 898c81d294f9 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/hbase.sh |
| git revision | branch-1.3 / 4227757 |
| Default Java | 1.7.0_131 |
| Multi-JDK versions |  

[jira] [Updated] (HBASE-18141) Regionserver fails to shutdown when abort triggered in RegionScannerImpl during RPC call

2017-06-08 Thread Gary Helmling (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Helmling updated HBASE-18141:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 1.2.7
   1.4.0
   3.0.0
   2.0.0
   Status: Resolved  (was: Patch Available)

Committed to branch-1.2+.

> Regionserver fails to shutdown when abort triggered in RegionScannerImpl 
> during RPC call
> 
>
> Key: HBASE-18141
> URL: https://issues.apache.org/jira/browse/HBASE-18141
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, security
>Affects Versions: 1.3.1
>Reporter: Gary Helmling
>Assignee: Gary Helmling
>Priority: Critical
> Fix For: 2.0.0, 3.0.0, 1.4.0, 1.3.2, 1.2.7
>
> Attachments: HBASE-18141.001.patch, HBASE-18141.branch-1.3.001.patch, 
> HBASE-18141.branch-1.3.002.patch
>
>





[jira] [Commented] (HBASE-18092) Removing a peer does not properly clean up the ReplicationSourceManager state and metrics

2017-06-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043790#comment-16043790
 ] 

Hadoop QA commented on HBASE-18092:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 31s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 36s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 59s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 18s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 23s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 50s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 53s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 29m 44s {color} | {color:green} Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha2. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 58s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 138m 57s {color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 28s {color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 184m 48s {color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.03.0-ce Server=17.03.0-ce Image:yetus/hbase:757bf37 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12871096/HBASE-18092.master.001.patch |
| JIRA Issue | HBASE-18092 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux b27675a488c7 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 112bff4 |
| Default Java | 1.8.0_131 |
| findbugs | v3.0.0 |
|  Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/7149/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/7149/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> Removing a peer does not properly clean up the ReplicationSourceManager state 
> and metrics
> -
>
> Key: HBASE-18092
> URL: https://issues.apache.org/jira/browse/HBASE-18092
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.0.0, 1.3.1
>Reporter: Ashu Pachauri
>  

[jira] [Commented] (HBASE-18141) Regionserver fails to shutdown when abort triggered in RegionScannerImpl during RPC call

2017-06-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043787#comment-16043787
 ] 

Hudson commented on HBASE-18141:


FAILURE: Integrated in Jenkins build HBase-2.0 #12 (See 
[https://builds.apache.org/job/HBase-2.0/12/])
HBASE-18141 Regionserver fails to shutdown when abort triggered during (garyh: 
rev 17966525e94524f7fc4a72aeeb6804b05020c97c)
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerCoprocessorHost.java
* (add) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerAbort.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java


> Regionserver fails to shutdown when abort triggered in RegionScannerImpl 
> during RPC call
> 
>
> Key: HBASE-18141
> URL: https://issues.apache.org/jira/browse/HBASE-18141
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, security
>Affects Versions: 1.3.1
>Reporter: Gary Helmling
>Assignee: Gary Helmling
>Priority: Critical
> Fix For: 1.3.2
>
> Attachments: HBASE-18141.001.patch, HBASE-18141.branch-1.3.001.patch, 
> HBASE-18141.branch-1.3.002.patch
>
>





[jira] [Commented] (HBASE-18050) Add document about the IA.Private classes which appear in IA.LimitedPrivate interfaces

2017-06-08 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043776#comment-16043776
 ] 

Duo Zhang commented on HBASE-18050:
---

I suppose we only generate documentation from master? So committing to master is 
enough, I think.

> Add document about the IA.Private classes which appear in IA.LimitedPrivate 
> interfaces
> --
>
> Key: HBASE-18050
> URL: https://issues.apache.org/jira/browse/HBASE-18050
> Project: HBase
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 3.0.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HBASE-18050.patch
>
>






[jira] [Resolved] (HBASE-18037) Do not expose implementation classes to CP

2017-06-08 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang resolved HBASE-18037.
---
   Resolution: Fixed
 Assignee: Duo Zhang
Fix Version/s: 3.0.0

Resolving, as all sub-tasks have been resolved.

> Do not expose implementation classes to CP
> --
>
> Key: HBASE-18037
> URL: https://issues.apache.org/jira/browse/HBASE-18037
> Project: HBase
>  Issue Type: Umbrella
>  Components: Coprocessors
>Affects Versions: 2.0.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Blocker
> Fix For: 2.0.0, 3.0.0
>
>
> Exposing implementation classes such as StoreFile to CPs makes it harder to 
> implement new features or improve the existing implementation.
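
As an illustration of the design goal, and not of the actual HBase 2.0 coprocessor API, the sketch below hands coprocessors a narrow read-only view instead of the implementation class, so the implementation can change without breaking them. StoreFileView, ToyStoreFileImpl and ToyFlushObserver are hypothetical names.

```java
// Hypothetical illustration (names are invented, not HBase's API): coprocessors
// receive a narrow read-only view, while the mutable implementation class
// stays internal to the server.
interface StoreFileView {
    String getPath();

    long getLength();
}

final class ToyStoreFileImpl {
    private final String path;
    private long length;

    ToyStoreFileImpl(String path, long length) {
        this.path = path;
        this.length = length;
    }

    // Server-internal mutator; not reachable through StoreFileView.
    void setLength(long length) {
        this.length = length;
    }

    /** Returns a delegating view, so a coprocessor cannot cast back to the implementation. */
    StoreFileView asView() {
        return new StoreFileView() {
            @Override
            public String getPath() {
                return path;
            }

            @Override
            public long getLength() {
                return length;
            }
        };
    }
}

// A coprocessor-style hook that only ever sees the interface.
interface ToyFlushObserver {
    void postFlush(StoreFileView flushed);
}
```

Returning a delegating view rather than the implementation object itself means a coprocessor cannot downcast its way back to internal methods.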





[jira] [Updated] (HBASE-18050) Add document about the IA.Private classes which appear in IA.LimitedPrivate interfaces

2017-06-08 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-18050:
--
Hadoop Flags: Reviewed

> Add document about the IA.Private classes which appear in IA.LimitedPrivate 
> interfaces
> --
>
> Key: HBASE-18050
> URL: https://issues.apache.org/jira/browse/HBASE-18050
> Project: HBase
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 3.0.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HBASE-18050.patch
>
>






[jira] [Commented] (HBASE-18195) Remove redundant single quote from start message for HMaster and HRegionServer

2017-06-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043742#comment-16043742
 ] 

Hadoop QA commented on HBASE-18195:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 54s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 50s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 57s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 40s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 48s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 28m 37s {color} | {color:green} Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha2. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 1s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 119m 1s {color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s {color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 161m 44s {color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.coprocessor.TestCoprocessorMetrics |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.11.2 Server=1.11.2 Image:yetus/hbase:757bf37 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12872151/HBASE-18195.master.001.patch |
| JIRA Issue | HBASE-18195 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 8b4587a36c36 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 112bff4 |
| Default Java | 1.8.0_131 |
| findbugs | v3.0.0 |
| unit | https://builds.apache.org/job/PreCommit-HBASE-Build/7148/artifact/patchprocess/patch-unit-hbase-server.txt |
| unit test logs |  https://builds.apache.org/job/PreCommit-HBASE-Build/7148/artifact/patchprocess/patch-unit-hbase-server.txt |
|  Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/7148/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/7148/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> Remove 

[jira] [Updated] (HBASE-18174) Implement Table#checkAndPut()

2017-06-08 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-18174:
---
Attachment: 18174.v9.txt

Patch v9 has basic functionality.
Still need to figure out how to incorporate the boolean return value.

> Implement Table#checkAndPut()
> -
>
> Key: HBASE-18174
> URL: https://issues.apache.org/jira/browse/HBASE-18174
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Ted Yu
> Attachments: 18174.v1.txt, 18174.v7.lambda.txt, 18174.v9.txt
>
>
> This task is to implement the Table#checkAndPut() method.
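
To make the boolean return value mentioned in the comment concrete, here is a toy model of check-and-put semantics. It is not the HBase Table or async client API. ToyTable and its method signatures are hypothetical, and treating a null expected value as "the cell must be absent" is an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Toy in-memory table (hypothetical, not the HBase client API) showing the
// contract of a check-and-put: the check and the put happen atomically, and
// the boolean result tells the caller whether the put was applied.
final class ToyTable {
    // Key is the (row, column) pair; Arrays.asList gives value-based equals/hashCode.
    private final Map<List<String>, String> cells = new HashMap<>();

    synchronized void put(String row, String column, String value) {
        cells.put(Arrays.asList(row, column), value);
    }

    synchronized String get(String row, String column) {
        return cells.get(Arrays.asList(row, column));
    }

    /**
     * Writes newValue to (row, column) only if the current value equals expected.
     * In this sketch, expected == null means "only if the cell does not exist yet".
     * Returns true if the put was applied, false if the check failed.
     */
    synchronized boolean checkAndPut(String row, String column, String expected, String newValue) {
        List<String> key = Arrays.asList(row, column);
        if (!Objects.equals(cells.get(key), expected)) {
            return false;
        }
        cells.put(key, newValue);
        return true;
    }
}
```

Because `checkAndPut` is `synchronized` on the same monitor as `put` and `get`, no other write can slip in between the comparison and the update.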





[jira] [Commented] (HBASE-16138) Cannot open regions after non-graceful shutdown due to deadlock with Replication Table

2017-06-08 Thread Maddineni Sukumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043717#comment-16043717
 ] 

Maddineni Sukumar commented on HBASE-16138:
---

Thanks [~te...@apache.org]. Created a new review board request: 
https://reviews.apache.org/r/59939/

> Cannot open regions after non-graceful shutdown due to deadlock with 
> Replication Table
> --
>
> Key: HBASE-16138
> URL: https://issues.apache.org/jira/browse/HBASE-16138
> Project: HBase
>  Issue Type: Sub-task
>  Components: Replication
>Reporter: Joseph
>Assignee: Ashu Pachauri
>Priority: Critical
> Attachments: HBASE-16138.patch, HBASE-16138-v1.patch, 
> HBASE-16138-v2.patch
>
>
> If we shut down an entire HBase cluster and attempt to start it back up, we 
> have to run the WAL pre-log roll that occurs before opening up a region. Yet 
> this pre-log roll must record the new WAL inside of ReplicationQueues. This 
> method call ends up blocking on 
> TableBasedReplicationQueues.getOrBlockOnReplicationTable(), because the 
> Replication Table is not up yet. And we cannot assign the Replication Table 
> because we cannot open any regions. This ends up deadlocking the entire 
> cluster whenever we lose Replication Table availability. 
> There are a few options we could pursue, but none of them seem very good:
> 1. Depend on ZooKeeper-based Replication until the Replication Table becomes 
> available
> 2. Have a separate WAL for System Tables that does not perform any 
> replication (see discussion at HBASE-14623)
>   Or just have a separate WAL for non-replicated vs replicated 
> regions
> 3. Record the WAL log in the ReplicationQueue asynchronously (don't block 
> opening a region on this event), which could lead to inconsistent Replication 
> state
> The stacktrace:
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.recordLog(ReplicationSourceManager.java:376)
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.preLogRoll(ReplicationSourceManager.java:348)
> org.apache.hadoop.hbase.replication.regionserver.Replication.preLogRoll(Replication.java:370)
> org.apache.hadoop.hbase.regionserver.wal.FSHLog.tellListenersAboutPreLogRoll(FSHLog.java:637)
> org.apache.hadoop.hbase.regionserver.wal.FSHLog.rollWriter(FSHLog.java:701)
> org.apache.hadoop.hbase.regionserver.wal.FSHLog.rollWriter(FSHLog.java:600)
> org.apache.hadoop.hbase.regionserver.wal.FSHLog.<init>(FSHLog.java:533)
> org.apache.hadoop.hbase.wal.DefaultWALProvider.getWAL(DefaultWALProvider.java:132)
> org.apache.hadoop.hbase.wal.RegionGroupingProvider.getWAL(RegionGroupingProvider.java:186)
> org.apache.hadoop.hbase.wal.RegionGroupingProvider.getWAL(RegionGroupingProvider.java:197)
> org.apache.hadoop.hbase.wal.WALFactory.getWAL(WALFactory.java:240)
> org.apache.hadoop.hbase.regionserver.HRegionServer.getWAL(HRegionServer.java:1883)
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:363)
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:129)
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:129)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> java.lang.Thread.run(Thread.java:745)
> Does anyone have any suggestions/ideas/feedback?
> Attached a review board at: https://reviews.apache.org/r/50546/
> It is still pretty rough, would just like some feedback on it.
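Option 3 can be sketched as follows. The pre-log-roll hook only enqueues the new WAL name and returns immediately. A background task writes the pending names once the queue storage reports itself available. This is a minimal sketch assuming a hypothetical `QueueStore` interface in place of the real ReplicationQueues classes. As the description warns, any names still buffered when the server dies are lost, and that is the inconsistency risk of this option.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for replication queue storage (e.g. a table that may not be online yet).
interface QueueStore {
  boolean isAvailable();
  void addLog(String walName);
}

// Records WAL names without blocking the caller; names are flushed once the store is available.
class AsyncLogRecorder {
  private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();
  private final QueueStore store;

  AsyncLogRecorder(QueueStore store) {
    this.store = store;
  }

  // Called from the pre-log-roll path: never waits on the store, so region opening proceeds.
  void recordLog(String walName) {
    pending.add(walName);
  }

  // Called periodically by a background task; returns how many names were written.
  int flush() {
    if (!store.isAvailable()) {
      return 0; // keep buffering until the store comes up
    }
    List<String> batch = new ArrayList<>();
    pending.drainTo(batch); // preserves enqueue order
    for (String wal : batch) {
      store.addLog(wal);
    }
    return batch.size();
  }

  int pendingCount() {
    return pending.size();
  }
}
```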





[jira] [Commented] (HBASE-16138) Cannot open regions after non-graceful shutdown due to deadlock with Replication Table

2017-06-08 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043702#comment-16043702
 ] 

Ted Yu commented on HBASE-16138:


[~sukuna...@gmail.com]:
Since you have a new patch, you can create a new review board request.



[jira] [Updated] (HBASE-16138) Cannot open regions after non-graceful shutdown due to deadlock with Replication Table

2017-06-08 Thread Maddineni Sukumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maddineni Sukumar updated HBASE-16138:
--
Attachment: HBASE-16138-v2.patch

Attaching a new patch against the latest code base, which also addresses some of the review 
comments. I don't have permission to update the existing review request. 
Can I create a new review request, or can someone give me permission on the existing 
one?

Tested this patch by simulating replication lag in the source cluster and bringing 
the source cluster down multiple times, and it catches up normally. Planning to 
run some perf tests unless I get review comments or objections.



[jira] [Updated] (HBASE-18137) Replication gets stuck for empty WALs

2017-06-08 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-18137:
---
Status: Patch Available  (was: Open)

Looks like the new unit test doesn't pass for me when testing the branch-1 
patch:
{noformat}
Failed tests: 
  TestReplicationSmallTests.testEmptyWAL:862->testSimplePutDelete:226 Waited too much time for put replication

Tests run: 14, Failures: 1, Errors: 0, Skipped: 0
{noformat}

Timeout too aggressive?
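The failure message suggests a bounded polling loop: the test checks the sink, sleeps, retries, and fails after a fixed number of attempts. The effective timeout is therefore roughly retries × sleep. The minimal sketch below makes that budget explicit so it can be raised if a slower environment is the only cause. It assumes a hypothetical helper with hypothetical `retries` and `sleepMs` parameters, not the test's actual constants.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper: polls a condition up to `retries` times, sleeping `sleepMs` between
// attempts. The total wait budget is roughly retries * sleepMs.
final class PollingWait {
  private PollingWait() {}

  static boolean waitFor(BooleanSupplier condition, int retries, long sleepMs) {
    for (int attempt = 0; attempt < retries; attempt++) {
      if (condition.getAsBoolean()) {
        return true;
      }
      try {
        Thread.sleep(sleepMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve interrupt status and give up
        return false;
      }
    }
    return condition.getAsBoolean(); // one final check after the last sleep
  }
}
```

If the loop keeps timing out even with a generous budget, the cause is more likely a replication stall than an aggressive timeout.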


> Replication gets stuck for empty WALs
> -
>
> Key: HBASE-18137
> URL: https://issues.apache.org/jira/browse/HBASE-18137
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 1.3.1
>Reporter: Ashu Pachauri
>Assignee: Vincent Poon
>Priority: Critical
> Fix For: 2.0.0, 1.4.0, 1.3.2, 1.1.11, 1.2.7
>
> Attachments: HBASE-18137.branch-1.3.v1.patch, 
> HBASE-18137.branch-1.3.v2.patch, HBASE-18137.branch-1.3.v3.patch, 
> HBASE-18137.branch-1.v1.patch
>
>
> Replication assumes that only the last WAL of a recovered queue can be empty. 
> However, intermittent DFS issues may cause empty WALs to be created (without the 
> PWAL magic) and a WAL roll to happen without a regionserver crash. This 
> leaves recovered queues with empty WALs in the middle, which causes 
> replication to get stuck:
> {code}
> TRACE regionserver.ReplicationSource: Opening log 
> WARN regionserver.ReplicationSource: - Got: 
> java.io.EOFException
>   at java.io.DataInputStream.readFully(DataInputStream.java:197)
>   at java.io.DataInputStream.readFully(DataInputStream.java:169)
>   at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1915)
>   at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1880)
>   at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1829)
>   at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1843)
>   at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.<init>(SequenceFileLogReader.java:70)
>   at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.reset(SequenceFileLogReader.java:168)
>   at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.initReader(SequenceFileLogReader.java:177)
>   at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.init(ReaderBase.java:66)
>   at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:312)
>   at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:276)
>   at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:264)
>   at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:423)
>   at org.apache.hadoop.hbase.replication.regionserver.ReplicationWALReaderManager.openReader(ReplicationWALReaderManager.java:70)
>   at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$ReplicationSourceWorkerThread.openReader(ReplicationSource.java:830)
>   at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$ReplicationSourceWorkerThread.run(ReplicationSource.java:572)
> {code}
> The WAL in question was completely empty but there were other WALs in the 
> recovered queue which were newer and non-empty.
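The description implies the following rule. An empty WAL is only expected at the tail of a recovered queue. If newer WALs follow an empty one, the reader should skip it rather than retry it forever. The minimal sketch below expresses that rule. It assumes a hypothetical queue of (name, length) entries rather than HBase's actual reader classes, and it uses zero length as the test for "empty". A real check might also need to handle files that contain only a header.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical WAL descriptor: a file name and its length in bytes.
final class WalFile {
  final String name;
  final long length;

  WalFile(String name, long length) {
    this.name = name;
    this.length = length;
  }
}

final class RecoveredQueueScanner {
  private RecoveredQueueScanner() {}

  // Consumes the queue (oldest first) and returns the WAL names worth opening.
  // A zero-length WAL followed by newer WALs is skipped instead of retried, since opening it
  // would only hit EOFException. The last WAL is always returned and left to the existing
  // end-of-queue handling.
  static List<String> walsToRead(Deque<WalFile> queue) {
    List<String> toRead = new ArrayList<>();
    while (!queue.isEmpty()) {
      WalFile wal = queue.pollFirst();
      boolean hasNewer = !queue.isEmpty();
      if (wal.length == 0 && hasNewer) {
        continue;
      }
      toRead.add(wal.name);
    }
    return toRead;
  }
}
```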





[jira] [Commented] (HBASE-18027) Replication should respect RPC size limits when batching edits

2017-06-08 Thread Ashu Pachauri (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043657#comment-16043657
 ] 

Ashu Pachauri commented on HBASE-18027:
---

Thanks [~apurtell]

> Replication should respect RPC size limits when batching edits
> --
>
> Key: HBASE-18027
> URL: https://issues.apache.org/jira/browse/HBASE-18027
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.0.0, 1.4.0, 1.3.1
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
> Fix For: 2.0.0, 1.4.0, 1.3.2
>
> Attachments: HBASE-18027-branch-1.patch, HBASE-18027-branch-1.patch, 
> HBASE-18027-branch-1.patch, HBASE-18027-branch-1.patch, HBASE-18027.patch, 
> HBASE-18027.patch, HBASE-18027.patch, HBASE-18027.patch, HBASE-18027.patch, 
> HBASE-18027.patch, HBASE-18027.patch, HBASE-18027.patch
>
>
> In HBaseInterClusterReplicationEndpoint#replicate we try to replicate in 
> batches. We create N lists, where N is the minimum of the configured number of 
> replicator threads, the number of 100-waledit batches, and the number of current 
> sinks. Every pending entry in the replication context is then placed, in order, by 
> hash of encoded region name into one of these N lists. Each of the N lists is then 
> sent all at once in one replication RPC. We do not check whether the sum of data in 
> each list will exceed RPC size limits. This code presumes each individual 
> edit is reasonably small. Not checking for aggregate size while assembling 
> the lists into RPCs is an oversight and can lead to replication failure when 
> that assumption is violated.
> We can fix this by generating as many replication RPC calls as we need to 
> drain a list, keeping each RPC under the limit, instead of assuming the whole 
> list will fit in one.
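The proposed fix amounts to splitting each of the N lists into consecutive sub-batches whose summed size stays under the RPC limit. The minimal sketch below assumes that each entry's serialized size is known up front. The `sizeOf` function and `maxBytes` limit are hypothetical parameters, not HBase configuration keys. An entry that exceeds the limit on its own is still sent in a batch by itself, so nothing is silently dropped. Whether the server then accepts that oversized RPC is a separate question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToLongFunction;

final class SizeBoundedBatcher {
  private SizeBoundedBatcher() {}

  // Splits entries, preserving order, into batches whose total size does not exceed maxBytes.
  // An entry larger than maxBytes by itself ends up in a singleton batch.
  static <T> List<List<T>> split(List<T> entries, ToLongFunction<T> sizeOf, long maxBytes) {
    List<List<T>> batches = new ArrayList<>();
    List<T> current = new ArrayList<>();
    long currentBytes = 0;
    for (T entry : entries) {
      long size = sizeOf.applyAsLong(entry);
      // Close the current batch if adding this entry would push it over the limit.
      if (!current.isEmpty() && currentBytes + size > maxBytes) {
        batches.add(current);
        current = new ArrayList<>();
        currentBytes = 0;
      }
      current.add(entry);
      currentBytes += size;
    }
    if (!current.isEmpty()) {
      batches.add(current);
    }
    return batches;
  }
}
```

Each resulting batch would then become one replication RPC, so a list drains through as many calls as it needs.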





[jira] [Commented] (HBASE-18027) Replication should respect RPC size limits when batching edits

2017-06-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043654#comment-16043654
 ] 

Hudson commented on HBASE-18027:


SUCCESS: Integrated in Jenkins build HBase-1.3-IT #60 (See 
[https://builds.apache.org/job/HBase-1.3-IT/60/])
HBASE-18027 HBaseInterClusterReplicationEndpoint should respect RPC (apurtell: 
rev 4227757335c3fe15ef1d7139140c795b414facf2)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WALKey.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/HBaseInterClusterReplicationEndpoint.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java
* (add) 
hbase-server/src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicator.java




[jira] [Updated] (HBASE-18192) Replication drops recovered queues on region server shutdown

2017-06-08 Thread Ashu Pachauri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashu Pachauri updated HBASE-18192:
--
Status: Patch Available  (was: Open)

Submitting for QA run.

> Replication drops recovered queues on region server shutdown
> 
>
> Key: HBASE-18192
> URL: https://issues.apache.org/jira/browse/HBASE-18192
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 1.2.6, 1.3.1
>Reporter: Ashu Pachauri
>Assignee: Ashu Pachauri
>Priority: Blocker
> Fix For: 2.0.0, 1.4.0, 1.3.2, 1.2.7
>
> Attachments: HBASE-18192.branch-1.3.001.patch, 
> HBASE-18192.branch-1.3.002.patch
>
>
> When a recovered queue has only one active ReplicationWorkerThread, the 
> recovered queue is completely dropped on a region server shutdown. This can 
> happen in the following situations:
> 1. DefaultWALProvider is used.
> 2. RegionGroupingProvider is used, but replication is stuck on one 
> WAL group for some reason (for example, HBASE-18137).
> 3. All other replication workers have died due to an unhandled exception, and 
> the only remaining one finishes. This will cause the recovered queue to get deleted 
> without a regionserver shutdown. This can happen on deployments without the fix 
> for HBASE-17381.
> The problematic piece of code is:
> {code}
> while (isWorkerActive()) {
>   // The worker thread run loop...
> }
> if (replicationQueueInfo.isQueueRecovered()) {
>   // use synchronize to make sure one last thread will clean the queue
>   synchronized (workerThreads) {
>     Threads.sleep(100); // wait a short while for other worker thread to fully exit
>     boolean allOtherTaskDone = true;
>     for (ReplicationSourceWorkerThread worker : workerThreads.values()) {
>       if (!worker.equals(this) && worker.isAlive()) {
>         allOtherTaskDone = false;
>         break;
>       }
>     }
>     if (allOtherTaskDone) {
>       manager.closeRecoveredQueue(this.source);
>       LOG.info("Finished recovering queue " + peerClusterZnode
>           + " with the following stats: " + getStats());
>     }
>   }
> }
> {code}
> The conceptual issue is that isWorkerActive() tells whether a worker is 
> currently running or not, and it's being used as a proxy for whether a worker 
> has finished its work. But, in fact, "Should a worker exit?" and "Has it 
> completed its work?" are two different questions.
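The distinction drawn above can be made explicit by giving each worker two separate flags. One says whether the worker should keep running, and the other says whether it has drained its WAL group. In the minimal sketch below, the recovered queue may be closed only when every worker has finished its work, not merely stopped. `WorkerState` and `RecoveredQueueCloser` are hypothetical names, not ReplicationSource's real fields or methods.

```java
import java.util.Collection;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical per-worker state; not the real ReplicationSourceWorkerThread.
final class WorkerState {
  private final AtomicBoolean running = new AtomicBoolean(true);
  private final AtomicBoolean workDone = new AtomicBoolean(false);

  // "Should the worker exit?": flipped on region server shutdown, for example.
  void stop() { running.set(false); }
  boolean isRunning() { return running.get(); }

  // "Has the worker completed its work?": set only after its WAL group is fully replicated.
  void markWorkDone() { workDone.set(true); }
  boolean isWorkDone() { return workDone.get(); }
}

final class RecoveredQueueCloser {
  private RecoveredQueueCloser() {}

  // Closing (deleting) the recovered queue is safe only if every worker finished its work.
  // A worker that merely stopped keeps the queue alive so another server can take it over.
  static boolean canCloseRecoveredQueue(Collection<WorkerState> workers) {
    for (WorkerState w : workers) {
      if (!w.isWorkDone()) {
        return false;
      }
    }
    return true;
  }
}
```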





[jira] [Commented] (HBASE-18027) Replication should respect RPC size limits when batching edits

2017-06-08 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043637#comment-16043637
 ] 

Andrew Purtell commented on HBASE-18027:


[~ashu210890]. No good reason. Done. 



[jira] [Updated] (HBASE-18027) Replication should respect RPC size limits when batching edits

2017-06-08 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-18027:
---
Fix Version/s: 1.3.2



[jira] [Updated] (HBASE-17988) get-active-master.rb and draining_servers.rb no longer work

2017-06-08 Thread Chinmay Kulkarni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinmay Kulkarni updated HBASE-17988:
-
Attachment: HBASE-17988.002.patch

> get-active-master.rb and draining_servers.rb no longer work
> ---
>
> Key: HBASE-17988
> URL: https://issues.apache.org/jira/browse/HBASE-17988
> Project: HBase
>  Issue Type: Bug
>  Components: scripts
>Affects Versions: 2.0.0
>Reporter: Mike Drob
>Assignee: Chinmay Kulkarni
>Priority: Critical
> Fix For: 2.0.0, 1.4.0, 1.3.2, 1.1.11, 1.2.7
>
> Attachments: HBASE-17988.002.patch, HBASE-17988.patch
>
>
> The scripts {{bin/get-active-master.rb}} and {{bin/draining_servers.rb}} no 
> longer work on current master branch. Here is an example error message:
> {noformat}
> $ bin/hbase-jruby bin/get-active-master.rb 
> NoMethodError: undefined method `masterAddressZNode' for 
> #
>at bin/get-active-master.rb:35
> {noformat}
> My initial probing suggests that this is likely due to movement that happened 
> in HBASE-16690. Perhaps instead of reworking the ruby, there is similar Java 
> functionality already existing somewhere.
> Putting priority at critical since it's impossible to know whether users rely 
> on the scripts.





[jira] [Commented] (HBASE-17988) get-active-master.rb and draining_servers.rb no longer work

2017-06-08 Thread Chinmay Kulkarni (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043631#comment-16043631
 ] 

Chinmay Kulkarni commented on HBASE-17988:
--

Oops! Not sure how I missed that. Thanks for pointing it out. Attaching an 
amended patch which uses _MasterAddressTracker_ instead of _ZKUtil_ to get the 
server name inside {{bin/get-active-master.rb}}.



[jira] [Updated] (HBASE-18137) Replication gets stuck for empty WALs

2017-06-08 Thread Vincent Poon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent Poon updated HBASE-18137:
-
Attachment: HBASE-18137.branch-1.v1.patch

> Replication gets stuck for empty WALs
> -
>
> Key: HBASE-18137
> URL: https://issues.apache.org/jira/browse/HBASE-18137
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 1.3.1
>Reporter: Ashu Pachauri
>Assignee: Vincent Poon
>Priority: Critical
> Fix For: 2.0.0, 1.4.0, 1.3.2, 1.1.11, 1.2.7
>
> Attachments: HBASE-18137.branch-1.3.v1.patch, 
> HBASE-18137.branch-1.3.v2.patch, HBASE-18137.branch-1.3.v3.patch, 
> HBASE-18137.branch-1.v1.patch
>
>
> Replication assumes that only the last WAL of a recovered queue can be empty. 
> But, intermittent DFS issues may cause empty WALs being created (without the 
> PWAL magic), and a roll of WAL to happen without a regionserver crash. This 
> will cause recovered queues to have empty WALs in the middle. This cause 
> replication to get stuck:
> {code}
> TRACE regionserver.ReplicationSource: Opening log 
> WARN regionserver.ReplicationSource: - Got: 
> java.io.EOFException
>   at java.io.DataInputStream.readFully(DataInputStream.java:197)
>   at java.io.DataInputStream.readFully(DataInputStream.java:169)
>   at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1915)
>   at 
> org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1880)
>   at 
> org.apache.hadoop.io.SequenceFile$Reader.(SequenceFile.java:1829)
>   at 
> org.apache.hadoop.io.SequenceFile$Reader.(SequenceFile.java:1843)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.(SequenceFileLogReader.java:70)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.reset(SequenceFileLogReader.java:168)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.initReader(SequenceFileLogReader.java:177)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.ReaderBase.init(ReaderBase.java:66)
>   at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:312)
>   at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:276)
>   at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:264)
>   at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:423)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationWALReaderManager.openReader(ReplicationWALReaderManager.java:70)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$ReplicationSourceWorkerThread.openReader(ReplicationSource.java:830)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$ReplicationSourceWorkerThread.run(ReplicationSource.java:572)
> {code}
> The WAL in question was completely empty but there were other WALs in the 
> recovered queue which were newer and non-empty.



[jira] [Commented] (HBASE-18137) Replication gets stuck for empty WALs

2017-06-08 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043618#comment-16043618
 ] 

Andrew Purtell commented on HBASE-18137:


We should follow up on HBASE-12125 so folks can fix this if they don't opt in. 

> Replication gets stuck for empty WALs
> -
>
> Key: HBASE-18137
> URL: https://issues.apache.org/jira/browse/HBASE-18137
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 1.3.1
>Reporter: Ashu Pachauri
>Assignee: Vincent Poon
>Priority: Critical
> Fix For: 2.0.0, 1.4.0, 1.3.2, 1.1.11, 1.2.7
>
> Attachments: HBASE-18137.branch-1.3.v1.patch, 
> HBASE-18137.branch-1.3.v2.patch, HBASE-18137.branch-1.3.v3.patch
>
>
> Replication assumes that only the last WAL of a recovered queue can be empty. 
> But intermittent DFS issues may cause empty WALs to be created (without the 
> PWAL magic) and a WAL roll to happen without a regionserver crash. This 
> leaves recovered queues with empty WALs in the middle, which causes 
> replication to get stuck:
> {code}
> TRACE regionserver.ReplicationSource: Opening log 
> WARN regionserver.ReplicationSource: - Got: 
> java.io.EOFException
>   at java.io.DataInputStream.readFully(DataInputStream.java:197)
>   at java.io.DataInputStream.readFully(DataInputStream.java:169)
>   at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1915)
>   at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1880)
>   at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1829)
>   at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1843)
>   at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.<init>(SequenceFileLogReader.java:70)
>   at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.reset(SequenceFileLogReader.java:168)
>   at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.initReader(SequenceFileLogReader.java:177)
>   at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.init(ReaderBase.java:66)
>   at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:312)
>   at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:276)
>   at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:264)
>   at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:423)
>   at org.apache.hadoop.hbase.replication.regionserver.ReplicationWALReaderManager.openReader(ReplicationWALReaderManager.java:70)
>   at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$ReplicationSourceWorkerThread.openReader(ReplicationSource.java:830)
>   at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$ReplicationSourceWorkerThread.run(ReplicationSource.java:572)
> {code}
> The WAL in question was completely empty but there were other WALs in the 
> recovered queue which were newer and non-empty.



[jira] [Commented] (HBASE-18137) Replication gets stuck for empty WALs

2017-06-08 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043616#comment-16043616
 ] 

Andrew Purtell commented on HBASE-18137:


lgtm. Please add a release note on the JIRA. 
I'll commit shortly unless there are objections.



[jira] [Updated] (HBASE-17960) IntegrationTestReplication fails in successive runs due to lack of appropriate cleanup

2017-06-08 Thread Ashu Pachauri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashu Pachauri updated HBASE-17960:
--
Fix Version/s: 1.3.2
   1.4.0
   2.0.0

> IntegrationTestReplication fails in successive runs due to lack of 
> appropriate cleanup
> --
>
> Key: HBASE-17960
> URL: https://issues.apache.org/jira/browse/HBASE-17960
> Project: HBase
>  Issue Type: Bug
>  Components: integration tests
>Reporter: Ashu Pachauri
>Assignee: Ashu Pachauri
> Fix For: 2.0.0, 1.4.0, 1.3.2
>
>
> The way ITR works right now is that it adds a peer named 'TestPeer' for the 
> replication destination cluster. The name of the peer is the same across runs.
> It also removes the peer at the beginning of each run. However, it does not 
> wait for the queues corresponding to the peer to be cleaned up (an 
> asynchronous operation that can take tens of seconds). This causes the next 
> run to fail, and so on.
> The test setup should wait a non-trivial amount of time for the queues 
> corresponding to the peer to be cleaned up.
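A minimal sketch of such a wait, assuming the setup has some check for "no queues remain for the peer": instead of a fixed sleep, poll that check with a bounded timeout. `WaitUtil.waitFor` is a hypothetical helper, not part of IntegrationTestReplication or the HBase test utilities.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical polling helper: waits until a condition holds or a timeout expires. */
final class WaitUtil {
  private WaitUtil() {}

  /**
   * Polls {@code condition} every {@code intervalMs} ms until it returns true or
   * {@code timeoutMs} elapses. Returns whether the condition was met in time;
   * returns false early if the calling thread is interrupted.
   */
  static boolean waitFor(BooleanSupplier condition, long timeoutMs, long intervalMs) {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false; // timed out; the caller decides whether to fail the run
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve interrupt status for the caller
        return false;
      }
    }
  }
}
```

For example, the setup could call `WaitUtil.waitFor(() -> noQueuesFor("TestPeer"), 60_000, 1_000)`, where `noQueuesFor` is a hypothetical check against the replication queue storage, and abort the run if it returns false.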



[jira] [Assigned] (HBASE-17960) IntegrationTestReplication fails in successive runs due to lack of appropriate cleanup

2017-06-08 Thread Ashu Pachauri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashu Pachauri reassigned HBASE-17960:
-

Assignee: Ashu Pachauri

> IntegrationTestReplication fails in successive runs due to lack of 
> appropriate cleanup
> --
>
> Key: HBASE-17960
> URL: https://issues.apache.org/jira/browse/HBASE-17960
> Project: HBase
>  Issue Type: Bug
>  Components: integration tests
>Reporter: Ashu Pachauri
>Assignee: Ashu Pachauri
>
> The way ITR works right now is that it adds a peer named 'TestPeer' for the 
> replication destination cluster. The name of the peer is the same across runs.
> It also removes the peer at the beginning of each run. However, it does not 
> wait for the queues corresponding to the peer to be cleaned up (an 
> asynchronous operation that can take tens of seconds). This causes the next 
> run to fail, and so on.
> The test setup should wait a non-trivial amount of time for the queues 
> corresponding to the peer to be cleaned up.



[jira] [Commented] (HBASE-18027) Replication should respect RPC size limits when batching edits

2017-06-08 Thread Ashu Pachauri (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043606#comment-16043606
 ] 

Ashu Pachauri commented on HBASE-18027:
---

[~apurtell] The underlying RPC size enforcement was added in 1.3, and combined 
with this issue it is definitely a bug. Do you see any reason not to push this 
to branch-1.3?

> Replication should respect RPC size limits when batching edits
> --
>
> Key: HBASE-18027
> URL: https://issues.apache.org/jira/browse/HBASE-18027
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.0.0, 1.4.0, 1.3.1
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-18027-branch-1.patch, HBASE-18027-branch-1.patch, 
> HBASE-18027-branch-1.patch, HBASE-18027-branch-1.patch, HBASE-18027.patch, 
> HBASE-18027.patch, HBASE-18027.patch, HBASE-18027.patch, HBASE-18027.patch, 
> HBASE-18027.patch, HBASE-18027.patch, HBASE-18027.patch
>
>
> In HBaseInterClusterReplicationEndpoint#replicate we try to replicate in 
> batches. We create N lists. N is the minimum of configured replicator 
> threads, number of 100-waledit batches, or number of current sinks. Every 
> pending entry in the replication context is then placed in order by hash of 
> encoded region name into one of these N lists. Each of the N lists is then 
> sent all at once in one replication RPC. We do not check whether the total 
> size of each of the N lists will exceed RPC size limits. This code presumes each individual 
> edit is reasonably small. Not checking for aggregate size while assembling 
> the lists into RPCs is an oversight and can lead to replication failure when 
> that assumption is violated.
> We can fix this by generating as many replication RPC calls as we need to 
> drain a list, keeping each RPC under limit, instead of assuming the whole 
> list will fit in one.
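The batching and the proposed fix can be sketched as follows, under simplifying assumptions: `WalEntry` and its `sizeBytes` field are hypothetical stand-ins for a WAL entry and its serialized size, and `maxRpcBytes` stands for whatever RPC size limit applies. This is not the HBaseInterClusterReplicationEndpoint code; it only models choosing N, hashing entries into N lists, and draining each list in size-bounded sub-batches.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a WAL entry: only the fields the batching logic needs. */
final class WalEntry {
  final String encodedRegionName;
  final long sizeBytes;

  WalEntry(String encodedRegionName, long sizeBytes) {
    this.encodedRegionName = encodedRegionName;
    this.sizeBytes = sizeBytes;
  }
}

final class ReplicationBatcher {
  private ReplicationBatcher() {}

  /** N = min(replicator threads, number of 100-edit batches, number of sinks), at least 1. */
  static int numLists(int threads, int entryCount, int sinks) {
    int hundredBatches = (entryCount + 99) / 100; // ceiling division
    return Math.max(1, Math.min(threads, Math.min(hundredBatches, sinks)));
  }

  /** Places entries, in order, into N lists chosen by hash of the encoded region name. */
  static List<List<WalEntry>> partition(List<WalEntry> entries, int n) {
    List<List<WalEntry>> lists = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      lists.add(new ArrayList<>());
    }
    for (WalEntry e : entries) {
      lists.get(Math.floorMod(e.encodedRegionName.hashCode(), n)).add(e);
    }
    return lists;
  }

  /**
   * Splits one list into consecutive sub-batches whose summed size stays within
   * maxRpcBytes. An entry that alone exceeds the limit still gets its own batch.
   */
  static List<List<WalEntry>> splitBySize(List<WalEntry> list, long maxRpcBytes) {
    List<List<WalEntry>> batches = new ArrayList<>();
    List<WalEntry> current = new ArrayList<>();
    long currentBytes = 0;
    for (WalEntry e : list) {
      if (!current.isEmpty() && currentBytes + e.sizeBytes > maxRpcBytes) {
        batches.add(current); // close the batch before it would exceed the limit
        current = new ArrayList<>();
        currentBytes = 0;
      }
      current.add(e);
      currentBytes += e.sizeBytes;
    }
    if (!current.isEmpty()) {
      batches.add(current);
    }
    return batches;
  }
}
```

Because entries for one region always hash to the same list and `splitBySize` keeps their order, sending a list's sub-batches one after another would preserve per-region ordering.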



[jira] [Updated] (HBASE-18137) Replication gets stuck for empty WALs

2017-06-08 Thread Vincent Poon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent Poon updated HBASE-18137:
-
Attachment: HBASE-18137.branch-1.3.v3.patch

Added a config setting "replication.source.eof.autorecovery" with a default of 
false.
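A hypothetical sketch of the decision such a setting could gate, assuming that with the flag on, a zero-length WAL that is not the last file of a recovered queue is dequeued instead of retried. Only the property name and its `false` default come from the comment above; `EofPolicy` and `shouldSkip` are illustrative names, not HBase APIs, and the patch may apply different conditions.

```java
import java.util.List;

/** Hypothetical model of how a replication source might react to an EOFException on open. */
final class EofPolicy {
  /** Property name from the patch comment; defaults to false. */
  static final String AUTORECOVERY_KEY = "replication.source.eof.autorecovery";

  private final boolean autoRecovery;

  EofPolicy(boolean autoRecovery) {
    this.autoRecovery = autoRecovery;
  }

  /**
   * Returns true if the WAL at {@code index} should be dequeued rather than retried.
   * Only zero-length files in a recovered queue qualify. Without autorecovery, only the
   * last WAL of the queue is skipped, matching the assumption the description says
   * replication made; with it, empty WALs in the middle are skipped too.
   */
  boolean shouldSkip(List<Long> walLengths, int index, boolean queueRecovered) {
    boolean empty = walLengths.get(index) == 0L;
    if (!empty || !queueRecovered) {
      return false;
    }
    boolean isLast = index == walLengths.size() - 1;
    return isLast || autoRecovery;
  }
}
```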



[jira] [Updated] (HBASE-18092) Removing a peer does not properly clean up the ReplicationSourceManager state and metrics

2017-06-08 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-18092:
---
Status: Patch Available  (was: Open)

> Removing a peer does not properly clean up the ReplicationSourceManager state 
> and metrics
> -
>
> Key: HBASE-18092
> URL: https://issues.apache.org/jira/browse/HBASE-18092
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 1.3.1, 2.0.0
>Reporter: Ashu Pachauri
>Assignee: Ashu Pachauri
> Attachments: HBASE-18092.master.001.patch
>
>
> Removing a peer does not clean up its associated metrics, or its state in 
> the walsById map of the ReplicationSourceManager.
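To illustrate what a complete cleanup has to cover, here is a much-simplified, hypothetical model: a `walsById`-style map from peer id to queued WAL names plus a per-peer metrics map. Apart from the `walsById` name, the fields and methods are assumptions; the real ReplicationSourceManager structure also groups WALs and holds more state.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical, simplified model of per-peer replication state. */
final class SourceManagerModel {
  /** peerId -> WAL names still queued for that peer (mirrors the walsById idea). */
  final Map<String, SortedSet<String>> walsById = new HashMap<>();
  /** peerId -> an illustrative per-peer metric, e.g. shipped edit count. */
  final Map<String, Long> metricsByPeer = new HashMap<>();

  void addPeer(String peerId) {
    walsById.put(peerId, new TreeSet<>());
    metricsByPeer.put(peerId, 0L);
  }

  void enqueueWal(String peerId, String walName) {
    walsById.get(peerId).add(walName);
  }

  /** Removing a peer drops every structure keyed by it, not just its source. */
  void removePeer(String peerId) {
    walsById.remove(peerId);
    metricsByPeer.remove(peerId);
  }
}
```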



[jira] [Updated] (HBASE-18192) Replication drops recovered queues on region server shutdown

2017-06-08 Thread Ashu Pachauri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashu Pachauri updated HBASE-18192:
--
Attachment: HBASE-18192.branch-1.3.002.patch

> Replication drops recovered queues on region server shutdown
> 
>
> Key: HBASE-18192
> URL: https://issues.apache.org/jira/browse/HBASE-18192
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 1.3.1, 1.2.6
>Reporter: Ashu Pachauri
>Assignee: Ashu Pachauri
>Priority: Blocker
> Fix For: 2.0.0, 1.4.0, 1.3.2, 1.2.7
>
> Attachments: HBASE-18192.branch-1.3.001.patch, 
> HBASE-18192.branch-1.3.002.patch
>
>
> When a recovered queue has only one active ReplicationWorkerThread, the 
> recovered queue is completely dropped on a region server shutdown. This 
> happens when:
> 1. DefaultWALProvider is used.
> 2. RegionGroupingProvider is used but replication is stuck on one 
> WAL group for some reason (for example, HBASE-18137).
> 3. All other replication workers have died due to an unhandled exception, 
> and the last remaining worker finishes. This causes the recovered queue to 
> be deleted without a regionserver shutdown. This can happen on deployments 
> without the fix for HBASE-17381.
> The problematic piece of code is:
> {code}
> while (isWorkerActive()) {
>   // The worker thread run loop...
> }
> if (replicationQueueInfo.isQueueRecovered()) {
>   // use synchronize to make sure one last thread will clean the queue
>   synchronized (workerThreads) {
>     Threads.sleep(100); // wait a short while for other worker thread to fully exit
>     boolean allOtherTaskDone = true;
>     for (ReplicationSourceWorkerThread worker : workerThreads.values()) {
>       if (!worker.equals(this) && worker.isAlive()) {
>         allOtherTaskDone = false;
>         break;
>       }
>     }
>     if (allOtherTaskDone) {
>       manager.closeRecoveredQueue(this.source);
>       LOG.info("Finished recovering queue " + peerClusterZnode
>           + " with the following stats: " + getStats());
>     }
>   }
> }
> {code}
> The conceptual issue is that isWorkerActive() tells whether a worker is 
> currently running, and it is being used as a proxy for whether a worker 
> has finished its work. But "Should a worker exit?" and "Has it 
> completed its work?" are two different questions.
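The distinction can be sketched by tracking "has finished its work" separately from "is still running". In this hypothetical model (not the ReplicationSource code), an exiting worker closes the recovered queue only when every worker has actually drained its WALs, so a worker that stops because of a shutdown leaves the queue in place.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical worker state: "keep running?" and "work complete?" are separate flags. */
final class QueueWorker {
  volatile boolean running = true;   // cleared on shutdown or fatal error
  volatile boolean workDone = false; // set only after all of this worker's WALs are replicated

  void finishWork() {
    workDone = true;
    running = false;
  }

  void stop() {
    running = false; // stopping says nothing about whether the work is complete
  }
}

final class RecoveredQueue {
  final List<QueueWorker> workers = new ArrayList<>();
  boolean closed = false;

  /**
   * Called by an exiting worker. Closes the queue only if every worker finished its
   * work; a liveness check alone cannot tell a shutdown apart from completion.
   */
  synchronized void onWorkerExit() {
    for (QueueWorker w : workers) {
      if (!w.workDone) {
        return; // unfinished work remains; keep the queue so it can be reclaimed
      }
    }
    closed = true;
  }
}
```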



[jira] [Updated] (HBASE-18195) Remove redundant single quote from start message for HMaster and HRegionServer

2017-06-08 Thread Umesh Agashe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Umesh Agashe updated HBASE-18195:
-
Status: Patch Available  (was: Open)

> Remove redundant single quote from start message for HMaster and HRegionServer
> --
>
> Key: HBASE-18195
> URL: https://issues.apache.org/jira/browse/HBASE-18195
> Project: HBase
>  Issue Type: Bug
>  Components: master, regionserver
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
>Priority: Minor
>  Labels: beginners
> Attachments: HBASE-18195.master.001.patch
>
>
> Message in the log shows up as:
> {code}
> INFO  [main] master.HMaster: STARTING service 'HMaster
> {code}



[jira] [Updated] (HBASE-18195) Remove redundant single quote from start message for HMaster and HRegionServer

2017-06-08 Thread Umesh Agashe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Umesh Agashe updated HBASE-18195:
-
Attachment: HBASE-18195.master.001.patch

Removed redundant single quote from start message for HMaster and HRegionServer



[jira] [Commented] (HBASE-18109) Assign system tables first (priority)

2017-06-08 Thread Yi Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043508#comment-16043508
 ] 

Yi Liang commented on HBASE-18109:
--

Thanks for committing and the comments. 

> Assign system tables first (priority)
> -
>
> Key: HBASE-18109
> URL: https://issues.apache.org/jira/browse/HBASE-18109
> Project: HBase
>  Issue Type: Sub-task
>  Components: Region Assignment
>Affects Versions: 2.0.0
>Reporter: stack
>Assignee: Yi Liang
>Priority: Critical
> Fix For: 2.0.0, 3.0.0
>
> Attachments: HBASE-18109-V1.patch, HBASE-18109-V2.patch
>
>
> Need this for stuff like the RSGroup table, etc. Assign these ahead of 
> user-space regions.
> From 'Handle sys table assignment first (e.g. acl, namespace, rsgroup); 
> currently only hbase:meta is first.' of 
> https://docs.google.com/document/d/1eVKa7FHdeoJ1-9o8yZcOTAQbv0u0bblBlCCzVSIn69g/edit#heading=h.oefcyphs0v0x
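One way to express "system tables first" is as a sort key over pending assignments: hbase:meta, then other tables in the hbase namespace, then user-space tables. The sketch below is hypothetical and works on plain table-name strings; it is not the AssignmentManager or MasterProcedureScheduler implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical assignment ordering: meta, then system-namespace tables, then user tables. */
final class AssignmentOrder {
  private AssignmentOrder() {}

  static int priority(String tableName) {
    if (tableName.equals("hbase:meta")) {
      return 0;
    }
    if (tableName.startsWith("hbase:")) {
      return 1; // e.g. hbase:namespace, hbase:acl, hbase:rsgroup
    }
    return 2; // user-space tables
  }

  /** Returns a sorted copy; List.sort is stable, so ties keep their input order. */
  static List<String> order(List<String> tables) {
    List<String> sorted = new ArrayList<>(tables);
    sorted.sort(Comparator.comparingInt(AssignmentOrder::priority));
    return sorted;
  }
}
```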



[jira] [Commented] (HBASE-18109) Assign system tables first (priority)

2017-06-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043506#comment-16043506
 ] 

Hudson commented on HBASE-18109:


FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #3160 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/3160/])
HBASE-18109: Assign system tables first This issue adds comments and a (stack: 
rev 112bff4ba038634b5a2c3fd35c5ee3ee8a097886)
* (edit) 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/TableDescriptor.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/MasterProcedureScheduler.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java




[jira] [Updated] (HBASE-18195) Remove redundant single quote from start message for HMaster and HRegionServer

2017-06-08 Thread Umesh Agashe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Umesh Agashe updated HBASE-18195:
-
Labels: beginners  (was: )

> Remove redundant single quote from start message for HMaster and HRegionServer
> --
>
> Key: HBASE-18195
> URL: https://issues.apache.org/jira/browse/HBASE-18195
> Project: HBase
>  Issue Type: Bug
>  Components: master, regionserver
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
>Priority: Minor
>  Labels: beginners
>
> Message in the log shows up as:
> {code}
> INFO  [main] master.HMaster: STARTING service 'HMaster
> {code}



[jira] [Commented] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-08 Thread Kahlil Oppenheimer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043504#comment-16043504
 ] 

Kahlil Oppenheimer commented on HBASE-18164:


Interesting, I hadn't realized that the HDFS blocks are cached in the 
RegionLocationFinder. I will benchmark the code tomorrow with/without the 
RegionLocationFinder to see if it was adding latency.

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch, 
> HBASE-18164-02.patch
>
>
> We noticed that the stochastic load balancer was not scaling well with 
> cluster size. On our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions), the balancer considers ~100,000 cluster 
> configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions).
> Because of this, our bigger clusters cannot converge on balance as 
> quickly for things like table skew and region load, because the balancer 
> does not have enough time to "think".
> We have rewritten the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements are large: our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to define locality cost as the difference 
> between the best locality that is possible given the current cluster state 
> and the currently measured locality. The old computation measured the 
> locality cost as the difference between the current locality and 
> 100% locality; the new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.
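The two ideas above, caching locality once per run and updating the cost per proposed move, can be sketched as follows. This is a hypothetical model rather than the patch: `locality[r][s]` is assumed to be the cached fraction of region r's HDFS blocks local to server s, and the cost is the unnormalized sum over regions of best achievable locality minus current locality, per the design note.

```java
/** Hypothetical incremental locality cost over a cached region x server locality matrix. */
final class IncrementalLocalityCost {
  private final double[][] locality; // locality[region][server], computed once per balancer run
  private final double[] best;       // best locality each region could get on any server
  private final int[] assignment;    // assignment[region] = current server index
  private double cost;               // sum over regions of best[r] - locality[r][assignment[r]]

  IncrementalLocalityCost(double[][] locality, int[] assignment) {
    this.locality = locality;
    this.assignment = assignment.clone();
    this.best = new double[locality.length];
    for (int r = 0; r < locality.length; r++) {
      double max = 0.0;
      for (double l : locality[r]) {
        max = Math.max(max, l);
      }
      best[r] = max;
      cost += best[r] - locality[r][this.assignment[r]];
    }
  }

  double cost() {
    return cost;
  }

  /** O(1) update for one proposed region move, instead of rescanning all regions. */
  void moveRegion(int region, int toServer) {
    int from = assignment[region];
    cost += locality[region][from] - locality[region][toServer];
    assignment[region] = toServer;
  }
}
```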



[jira] [Created] (HBASE-18195) Remove redundant single quote from start message for HMaster and HRegionServer

2017-06-08 Thread Umesh Agashe (JIRA)
Umesh Agashe created HBASE-18195:


 Summary: Remove redundant single quote from start message for 
HMaster and HRegionServer
 Key: HBASE-18195
 URL: https://issues.apache.org/jira/browse/HBASE-18195
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver
Reporter: Umesh Agashe
Assignee: Umesh Agashe
Priority: Minor


Message in the log shows up as:
{code}
INFO  [main] master.HMaster: STARTING service 'HMaster
{code}



[jira] [Commented] (HBASE-18109) Assign system tables first (priority)

2017-06-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043500#comment-16043500
 ] 

Hudson commented on HBASE-18109:


FAILURE: Integrated in Jenkins build HBase-2.0 #11 (See 
[https://builds.apache.org/job/HBase-2.0/11/])
HBASE-18109: Assign system tables first This issue adds comments and a (stack: 
rev ea7d51e1291bf3c6d5d6ef977f9abcea79c04a3e)
* (edit) 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/TableDescriptor.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/MasterProcedureScheduler.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java




[jira] [Commented] (HBASE-18174) Implement Table#checkAndPut()

2017-06-08 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043430#comment-16043430
 ] 

Ted Yu commented on HBASE-18174:


Experimented with the following modifications but still got a linker error:

* changed the return type from bool to std::shared_ptr
* inlined RequestConverter::CheckToMutateRequest() in the lambda

> Implement Table#checkAndPut()
> -
>
> Key: HBASE-18174
> URL: https://issues.apache.org/jira/browse/HBASE-18174
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Ted Yu
> Attachments: 18174.v1.txt, 18174.v7.lambda.txt
>
>
> This task is to implement the Table#checkAndPut() method.



[jira] [Commented] (HBASE-18141) Regionserver fails to shutdown when abort triggered in RegionScannerImpl during RPC call

2017-06-08 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043411#comment-16043411
 ] 

Ted Yu commented on HBASE-18141:


lgtm

> Regionserver fails to shutdown when abort triggered in RegionScannerImpl 
> during RPC call
> 
>
> Key: HBASE-18141
> URL: https://issues.apache.org/jira/browse/HBASE-18141
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, security
>Affects Versions: 1.3.1
>Reporter: Gary Helmling
>Assignee: Gary Helmling
>Priority: Critical
> Fix For: 1.3.2
>
> Attachments: HBASE-18141.001.patch, HBASE-18141.branch-1.3.001.patch, 
> HBASE-18141.branch-1.3.002.patch
>
>
> When an abort is triggered within the RPC call path by 
> HRegion.RegionScannerImpl, AccessController incorrectly applies the RPC 
> caller identity in the RegionServerObserver.preStopRegionServer() hook.  This 
> leaves the regionserver in a non-responsive state, where its regions are not 
> reassigned and it returns exceptions for all requests.
> When an abort is triggered on the server side, we should not allow a 
> coprocessor to reject the abort at all.
> Here is a sample stack trace:
> {noformat}
> 17/05/25 06:10:29 FATAL regionserver.HRegionServer: RegionServer abort: 
> loaded coprocessors are: 
> [org.apache.hadoop.hbase.security.access.AccessController, 
> org.apache.hadoop.hbase.security.token.TokenProvider]
> 17/05/25 06:10:29 WARN regionserver.HRegionServer: The region server did not 
> stop
> org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient 
> permissions for user 'rpcuser' (global, action=ADMIN)
> at org.apache.hadoop.hbase.security.access.AccessController.requireGlobalPermission(AccessController.java:548)
> at org.apache.hadoop.hbase.security.access.AccessController.requirePermission(AccessController.java:522)
> at org.apache.hadoop.hbase.security.access.AccessController.preStopRegionServer(AccessController.java:2501)
> at org.apache.hadoop.hbase.regionserver.RegionServerCoprocessorHost$1.call(RegionServerCoprocessorHost.java:86)
> at org.apache.hadoop.hbase.regionserver.RegionServerCoprocessorHost.execShutdown(RegionServerCoprocessorHost.java:300)
> at org.apache.hadoop.hbase.regionserver.RegionServerCoprocessorHost.preStop(RegionServerCoprocessorHost.java:82)
> at org.apache.hadoop.hbase.regionserver.HRegionServer.stop(HRegionServer.java:1905)
> at org.apache.hadoop.hbase.regionserver.HRegionServer.abort(HRegionServer.java:2118)
> at org.apache.hadoop.hbase.regionserver.HRegionServer.abort(HRegionServer.java:2125)
> at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.abortRegionServer(HRegion.java:6326)
> at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.handleFileNotFound(HRegion.java:6319)
> at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5941)
> at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6084)
> at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5858)
> at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2649)
> at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:34950)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2320)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:123)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:188)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)
> {noformat}
> I haven't yet evaluated which other release branches this might apply to.
> I have a patch currently in progress, which I will post as soon as I complete 
> a test case.
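The behavior argued for above, that a server-side abort must not be vetoed by a coprocessor, can be sketched with a small model. This is a hypothetical illustration, not HBase's `HRegionServer` or `RegionServerCoprocessorHost` code: `AbortSketch`, `StopHook`, `stop()` and `abort()` are names invented here. A caller-initiated stop honors a hook's rejection, while an abort runs the hooks under the server's own identity and logs and ignores any failure, so shutdown always proceeds.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of shutdown hooks that may veto a stop; not an HBase API.
class AbortSketch {

    /** Stand-in for a coprocessor's pre-stop callback. */
    interface StopHook {
        void preStop(String caller) throws Exception;
    }

    static class Server {
        private final List<StopHook> hooks = new ArrayList<>();
        private boolean stopped = false;

        void addHook(StopHook hook) { hooks.add(hook); }

        boolean isStopped() { return stopped; }

        /** Operator-requested stop: any hook may reject it by throwing. */
        void stop(String caller) throws Exception {
            for (StopHook hook : hooks) {
                hook.preStop(caller); // an exception here cancels the stop request
            }
            stopped = true;
        }

        /** Server-side abort: hooks still run, but they cannot prevent shutdown. */
        void abort(String reason) {
            for (StopHook hook : hooks) {
                try {
                    // Run the hook as the server itself, not as whichever RPC caller triggered the abort.
                    hook.preStop("server");
                } catch (Exception e) {
                    System.err.println("Ignoring hook failure during abort (" + reason + "): " + e);
                }
            }
            stopped = true;
        }
    }

    public static void main(String[] args) {
        Server server = new Server();
        // A hook that, like the AccessController in the trace, rejects non-privileged callers.
        server.addHook(caller -> {
            if (!"admin".equals(caller) && !"server".equals(caller)) {
                throw new SecurityException("Insufficient permissions for user '" + caller + "'");
            }
        });
        try {
            server.stop("rpcuser");
        } catch (Exception e) {
            System.out.println("stop rejected: " + e.getMessage());
        }
        System.out.println("stopped after stop(): " + server.isStopped());
        server.abort("FileNotFoundException in scanner");
        System.out.println("stopped after abort(): " + server.isStopped());
    }
}
```

Both safeguards are shown together (server identity and ignoring hook failures); either one alone would have avoided the hang in the trace above, but only the second also protects against a hook that fails for unrelated reasons.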





[jira] [Commented] (HBASE-18141) Regionserver fails to shutdown when abort triggered in RegionScannerImpl during RPC call

2017-06-08 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043380#comment-16043380
 ] 

Andrew Purtell commented on HBASE-18141:


+1

> Regionserver fails to shutdown when abort triggered in RegionScannerImpl 
> during RPC call
> 
>
> Key: HBASE-18141
> URL: https://issues.apache.org/jira/browse/HBASE-18141
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, security
>Affects Versions: 1.3.1
>Reporter: Gary Helmling
>Assignee: Gary Helmling
>Priority: Critical
> Fix For: 1.3.2
>
> Attachments: HBASE-18141.001.patch, HBASE-18141.branch-1.3.001.patch, 
> HBASE-18141.branch-1.3.002.patch
>
>





[jira] [Updated] (HBASE-18152) [AMv2] Corrupt Procedure WAL file; procedure data stored out of order

2017-06-08 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-18152:
--
Attachment: pv2-0036.log

Another corruption form. Trying to figure it out.

> [AMv2] Corrupt Procedure WAL file; procedure data stored out of order
> -
>
> Key: HBASE-18152
> URL: https://issues.apache.org/jira/browse/HBASE-18152
> Project: HBase
>  Issue Type: Sub-task
>  Components: Region Assignment
>Affects Versions: 2.0.0
>Reporter: stack
>Assignee: stack
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: HBASE-17537.master.002.patch, 
> HBASE-18152.master.001.patch, pv2-0036.log, 
> pv2-0047.log, reading_bad_wal.patch
>
>
> I've seen corruption from time to time while testing. It's rare enough. Often we 
> can get over it but sometimes we can't. It took me a while to capture an 
> instance of corruption. It turns out we are writing to the WAL out of order, which 
> undoes a basic tenet: that WAL content is ordered in line w/ execution.
> Below I'll post a corrupt WAL.
> Looking at the write-side, there is a lot going on. I'm not clear on how we 
> could write out of order. Will try and get more insight. Meantime, parking 
> this issue here to fill data into.
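The tenet described above (WAL content is ordered in line with execution) can be checked mechanically if every entry carries a sequence id assigned at execution time. The sketch below assumes such an id exists; `WalOrderCheck`, `WalEntry` and `firstOutOfOrder` are hypothetical names, and the real procedure WAL format is not modeled. It returns the index of the first entry whose sequence id does not exceed its predecessor's, or -1 if the log is ordered.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical checker for the "WAL order matches execution order" invariant.
// Assumes every entry carries a strictly increasing sequence id assigned at execution time.
class WalOrderCheck {

    static final class WalEntry {
        final long procId; // procedure the entry belongs to
        final long seqId;  // assumed execution-order sequence id

        WalEntry(long procId, long seqId) {
            this.procId = procId;
            this.seqId = seqId;
        }
    }

    /** Returns the index of the first out-of-order entry, or -1 if the log is ordered. */
    static int firstOutOfOrder(List<WalEntry> entries) {
        for (int i = 1; i < entries.size(); i++) {
            if (entries.get(i).seqId <= entries.get(i - 1).seqId) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<WalEntry> ordered = Arrays.asList(
            new WalEntry(1, 10), new WalEntry(2, 11), new WalEntry(1, 12));
        List<WalEntry> corrupt = Arrays.asList(
            new WalEntry(1, 10), new WalEntry(2, 12), new WalEntry(1, 11));
        System.out.println(firstOutOfOrder(ordered)); // -1
        System.out.println(firstOutOfOrder(corrupt)); // 2
    }
}
```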





[jira] [Updated] (HBASE-18109) Assign system tables first (priority)

2017-06-08 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-18109:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Pushed to branch-2 and master. Thanks for the digging and the patch 
[~easyliangjob]. What I committed was your patch plus some comments that add 
the findings from your research on this issue to the code.

Thanks.

Was going to file an issue for the [~yangzhe1991] request, but the man would 
probably do a better job than I at explaining what is needed.


> Assign system tables first (priority)
> -
>
> Key: HBASE-18109
> URL: https://issues.apache.org/jira/browse/HBASE-18109
> Project: HBase
>  Issue Type: Sub-task
>  Components: Region Assignment
>Affects Versions: 2.0.0
>Reporter: stack
>Assignee: Yi Liang
>Priority: Critical
> Fix For: 2.0.0, 3.0.0
>
> Attachments: HBASE-18109-V1.patch, HBASE-18109-V2.patch
>
>
> Need this for stuff like the RSGroup table, etc. Assign these ahead of 
> user-space regions.
> From 'Handle sys table assignment first (e.g. acl, namespace, rsgroup); 
> currently only hbase:meta is first.' of 
> https://docs.google.com/document/d/1eVKa7FHdeoJ1-9o8yZcOTAQbv0u0bblBlCCzVSIn69g/edit#heading=h.oefcyphs0v0x





[jira] [Commented] (HBASE-18137) Replication gets stuck for empty WALs

2017-06-08 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043363#comment-16043363
 ] 

Andrew Purtell commented on HBASE-18137:


bq. So we only dump the current file and move on if we get EOFException, the 
length is 0, and there are WALs in the queue behind this one (we assume that 
means the current WAL is closed and therefore there really is no data).

This works for autorecovery, although there's a chance a block has gone missing 
and will come back once a datanode is recovered. Do we still want to make this 
opt-in and handle it otherwise with hbck? 

> Replication gets stuck for empty WALs
> -
>
> Key: HBASE-18137
> URL: https://issues.apache.org/jira/browse/HBASE-18137
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 1.3.1
>Reporter: Ashu Pachauri
>Assignee: Vincent Poon
>Priority: Critical
> Fix For: 2.0.0, 1.4.0, 1.3.2, 1.1.11, 1.2.7
>
> Attachments: HBASE-18137.branch-1.3.v1.patch, 
> HBASE-18137.branch-1.3.v2.patch
>
>
> Replication assumes that only the last WAL of a recovered queue can be empty. 
> But intermittent DFS issues may cause empty WALs to be created (without the 
> PWAL magic), and a WAL roll to happen without a regionserver crash. This 
> leaves recovered queues with empty WALs in the middle, which causes 
> replication to get stuck:
> {code}
> TRACE regionserver.ReplicationSource: Opening log 
> WARN regionserver.ReplicationSource: - Got: 
> java.io.EOFException
>   at java.io.DataInputStream.readFully(DataInputStream.java:197)
>   at java.io.DataInputStream.readFully(DataInputStream.java:169)
>   at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1915)
>   at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1880)
>   at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1829)
>   at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1843)
>   at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.<init>(SequenceFileLogReader.java:70)
>   at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.reset(SequenceFileLogReader.java:168)
>   at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.initReader(SequenceFileLogReader.java:177)
>   at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.init(ReaderBase.java:66)
>   at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:312)
>   at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:276)
>   at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:264)
>   at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:423)
>   at org.apache.hadoop.hbase.replication.regionserver.ReplicationWALReaderManager.openReader(ReplicationWALReaderManager.java:70)
>   at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$ReplicationSourceWorkerThread.openReader(ReplicationSource.java:830)
>   at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$ReplicationSourceWorkerThread.run(ReplicationSource.java:572)
> {code}
> The WAL in question was completely empty but there were other WALs in the 
> recovered queue which were newer and non-empty.
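The failure above hinges on how a reader treats a file that lacks the PWAL header: in the trace, the empty file ends up in SequenceFileLogReader, which fails with EOFException. A minimal sketch of a header classification that separates "empty" from "not protobuf" follows. It assumes the protobuf WAL begins with the four ASCII bytes `PWAL` (the "PWAL magic" the description mentions); `WalHeaderCheck`, `WalHeaderKind` and `classify` are hypothetical names, not HBase reader classes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical classification of a WAL file by its leading bytes.
// Assumes the protobuf WAL format starts with the ASCII magic "PWAL".
class WalHeaderCheck {

    enum WalHeaderKind { EMPTY, PROTOBUF, LEGACY_OR_UNKNOWN }

    static final byte[] PWAL_MAGIC = "PWAL".getBytes(StandardCharsets.US_ASCII);

    /** Classifies a file from its first bytes (the whole file if it is shorter than the magic). */
    static WalHeaderKind classify(byte[] head) {
        if (head.length == 0) {
            return WalHeaderKind.EMPTY; // nothing was ever written
        }
        if (head.length >= PWAL_MAGIC.length
                && Arrays.equals(Arrays.copyOf(head, PWAL_MAGIC.length), PWAL_MAGIC)) {
            return WalHeaderKind.PROTOBUF;
        }
        // Anything else would be handed to a legacy reader in this sketch.
        return WalHeaderKind.LEGACY_OR_UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println(classify(new byte[0]));                                    // EMPTY
        System.out.println(classify("PWAL....".getBytes(StandardCharsets.US_ASCII))); // PROTOBUF
        System.out.println(classify("SEQ".getBytes(StandardCharsets.US_ASCII)));      // LEGACY_OR_UNKNOWN
    }
}
```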





[jira] [Updated] (HBASE-18109) Assign system tables first (priority)

2017-06-08 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-18109:
--
Release Note: Sorts procedures before submission so that system tables are 
queued first (which helps ensure they go out first). Along w/ existing 
scheduling mechanisms, this should be good enough to ensure system/meta are 
assigned first (see reasoning below). Open a new issue if insufficient.
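The ordering described in this release note can be sketched as a stable sort that moves system-table regions ahead of user-space regions. This is a hypothetical illustration, not the committed patch: `SystemFirstOrder` and `sortSystemFirst` are invented names, regions are represented only by their table name, and system tables are assumed to be those in the `hbase` namespace (such as `hbase:meta`, `hbase:namespace`, `hbase:acl`).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: order pending region assignments so system tables go first.
class SystemFirstOrder {

    /** System tables are assumed to live in the "hbase" namespace, e.g. "hbase:meta". */
    static boolean isSystemTable(String tableName) {
        return tableName.startsWith("hbase:");
    }

    /** Returns a copy with system-table regions first; List.sort is stable, so ties keep their order. */
    static List<String> sortSystemFirst(List<String> tableNamesOfRegions) {
        List<String> sorted = new ArrayList<>(tableNamesOfRegions);
        // Rank 0 for system tables, 1 for everything else.
        sorted.sort(Comparator.comparingInt((String t) -> isSystemTable(t) ? 0 : 1));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> pending = Arrays.asList("usertable", "hbase:acl", "orders", "hbase:namespace");
        System.out.println(sortSystemFirst(pending));
        // [hbase:acl, hbase:namespace, usertable, orders]
    }
}
```

Because the sort is stable, any ordering the existing scheduler already imposed within each group is preserved; only the system/user split changes.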

> Assign system tables first (priority)
> -
>
> Key: HBASE-18109
> URL: https://issues.apache.org/jira/browse/HBASE-18109
> Project: HBase
>  Issue Type: Sub-task
>  Components: Region Assignment
>Affects Versions: 2.0.0
>Reporter: stack
>Assignee: Yi Liang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: HBASE-18109-V1.patch, HBASE-18109-V2.patch
>
>





[jira] [Commented] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-08 Thread Chia-Ping Tsai (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043338#comment-16043338
 ] 

Chia-Ping Tsai commented on HBASE-18164:


bq. The big bottleneck by far was the second bit about collecting all the HDFS 
blocks of every region for every iteration of the balancer.

The HDFS blocks (HDFSBlocksDistribution) are also cached in RegionLocationFinder, 
yet we still spent a lot of time collecting the HDFS blocks on every 
iteration?

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch, 
> HBASE-18164-02.patch
>
>
> We noticed that the stochastic load balancer was not scaling well with 
> cluster size. That is to say that on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions), the balancer considers ~100,000 cluster 
> configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions).
> Because of this, our bigger clusters are not able to converge on balance as 
> quickly for things like table skew, region load, etc. because the balancer 
> does not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference between the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.
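The ideas in this description (cache each region's locality on every server once per balancer run, update the cost incrementally per proposed move, and measure cost against the best achievable locality rather than 100%) fit together roughly as follows. This is a hypothetical sketch, not the HBASE-18164 patch: `LocalityCostSketch`, the `locality[region][server]` matrix of fractions in [0, 1], and the unnormalized sum are assumptions for illustration. Each move is applied in O(1) instead of rescanning every region.

```java
// Hypothetical sketch of an incremental locality cost, in the spirit of the description above.
// locality[r][s] is the cached fraction (0..1) of region r's HDFS blocks local to server s.
class LocalityCostSketch {
    private final double[][] locality; // cached once at the start of a balancer run
    private final double[] best;       // best achievable locality per region
    private final int[] assignment;    // current server of each region
    private double cost;               // sum over regions of (best - current)

    LocalityCostSketch(double[][] locality, int[] assignment) {
        this.locality = locality;
        this.assignment = assignment.clone();
        this.best = new double[locality.length];
        for (int r = 0; r < locality.length; r++) {
            double max = 0.0;
            for (double l : locality[r]) {
                max = Math.max(max, l);
            }
            best[r] = max;
            cost += best[r] - locality[r][this.assignment[r]];
        }
    }

    double cost() { return cost; }

    /** Applies a proposed move and updates the cost using only the moved region. */
    void moveRegion(int region, int newServer) {
        int oldServer = assignment[region];
        // best[region] is unchanged, so only the current-locality term moves.
        cost += locality[region][oldServer] - locality[region][newServer];
        assignment[region] = newServer;
    }

    public static void main(String[] args) {
        double[][] locality = {
            {1.0, 0.0}, // region 0: fully local on server 0
            {0.2, 0.8}, // region 1: mostly local on server 1
        };
        LocalityCostSketch c = new LocalityCostSketch(locality, new int[] {0, 0});
        System.out.println(c.cost()); // region 1 sits below its best locality
        c.moveRegion(1, 1);
        System.out.println(c.cost()); // both regions now at their best
    }
}
```

Repeated incremental updates can accumulate floating-point drift, so a real implementation might recompute the full sum occasionally.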





[jira] [Commented] (HBASE-18109) Assign system tables first (priority)

2017-06-08 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043336#comment-16043336
 ] 

stack commented on HBASE-18109:
---

Good find on HTD Priority. Seems like we should expand its scope to cover 
not only RPC scheduling but also assignment order. Can do in a follow-up 
issue.

On your writeup, that is a nice summary. I'm going to commit your patch with 
some added commentary that comes of your findings. I think that we then resolve 
this issue as 'done'. If we find a case where the current scheduling fails to 
put out system tables generally ahead of user-space tables, let's revisit. Thank 
you for digging in here.

> Assign system tables first (priority)
> -
>
> Key: HBASE-18109
> URL: https://issues.apache.org/jira/browse/HBASE-18109
> Project: HBase
>  Issue Type: Sub-task
>  Components: Region Assignment
>Affects Versions: 2.0.0
>Reporter: stack
>Assignee: Yi Liang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: HBASE-18109-V1.patch, HBASE-18109-V2.patch
>
>





[jira] [Commented] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043328#comment-16043328
 ] 

Hadoop QA commented on HBASE-18164:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 30s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 13s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 1s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 50s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 37s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 45s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 56s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 37s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 23s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 23s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 32s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 28s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 58m 10s {color} | {color:green} Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha2. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 151m 43s {color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 1m 5s {color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 241m 44s {color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.security.access.TestCoprocessorWhitelistMasterObserver |
| Timed out junit tests | org.apache.hadoop.hbase.coprocessor.TestCoprocessorMetrics |
|   | org.apache.hadoop.hbase.replication.regionserver.TestWALEntryStream |
|   | org.apache.hadoop.hbase.client.TestSnapshotMetadata |
|   | org.apache.hadoop.hbase.client.TestSnapshotFromClientWithRegionReplicas |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.03.0-ce Server=17.03.0-ce Image:yetus/hbase:757bf37 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12872092/HBASE-18164-02.patch |
| JIRA Issue | HBASE-18164 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 63222803bf4a 4.8.3-std-1 #1 SMP Fri Oct 21 11:15:43 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 72cb7d9 |
| Default Java | 1.8.0_131 |
| findbugs | v3.0.0 |
| unit | https://builds.apache.org/job/PreCommit-HBASE-Build/7147/artifact/patchprocess/patch-unit-hbase-server.txt |
| unit test logs | https://builds.apache.org/job/PreCommit-HBASE-Build/7147/artifact/patchprocess/patch-unit-hbase-server.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/7147/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | 

[jira] [Commented] (HBASE-18050) Add document about the IA.Private classes which appear in IA.LimitedPrivate interfaces

2017-06-08 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043289#comment-16043289
 ] 

stack commented on HBASE-18050:
---

Oh, do we want this to apply to 2.0 too?

> Add document about the IA.Private classes which appear in IA.LimitedPrivate 
> interfaces
> --
>
> Key: HBASE-18050
> URL: https://issues.apache.org/jira/browse/HBASE-18050
> Project: HBase
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 3.0.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HBASE-18050.patch
>
>






[jira] [Updated] (HBASE-18050) Add document about the IA.Private classes which appear in IA.LimitedPrivate interfaces

2017-06-08 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-18050:
--
Release Note: Adds this qualification to the section on Audience annotation: 
"Notice that you may find classes declared as IA.Private used as parameters 
or return values of interfaces declared as IA.LimitedPrivate. This is 
possible. You should treat the IA.Private object as a monolithic object, which 
means you can use it as a parameter to call other methods, or return it, but 
you should never try to access its methods or fields."
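The rule in this release note, treating an IA.Private object received through an IA.LimitedPrivate interface as opaque, can be illustrated in plain Java. The types below are hypothetical stand-ins, not real HBase classes, and the audience annotations appear only in comments: `InternalState` plays the IA.Private class and `ObserverApi` the IA.LimitedPrivate interface. Downstream code receives the object and hands it back without touching its members.

```java
// Hypothetical illustration of treating an IA.Private type as an opaque token.
class OpaqueUsage {

    /** Stand-in for a class annotated IA.Private: its members are not for downstream use. */
    static final class InternalState {
        final int internalCounter; // downstream code must not read this
        InternalState(int internalCounter) { this.internalCounter = internalCounter; }
    }

    /** Stand-in for an IA.LimitedPrivate interface that exposes the private type. */
    interface ObserverApi {
        InternalState currentState();
        String describe(InternalState state);
    }

    /** Downstream code: fine to pass the object along, never to inspect it. */
    static String observe(ObserverApi api) {
        InternalState state = api.currentState(); // OK: receive it as a return value
        return api.describe(state);               // OK: pass it back as a parameter
        // Not OK here: reading state.internalCounter or calling any of its methods.
    }

    public static void main(String[] args) {
        // The implementing side (HBase itself, in the real case) may use the internals freely.
        ObserverApi api = new ObserverApi() {
            public InternalState currentState() { return new InternalState(42); }
            public String describe(InternalState s) { return "state#" + s.internalCounter; }
        };
        System.out.println(observe(api)); // state#42
    }
}
```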

> Add document about the IA.Private classes which appear in IA.LimitedPrivate 
> interfaces
> --
>
> Key: HBASE-18050
> URL: https://issues.apache.org/jira/browse/HBASE-18050
> Project: HBase
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 3.0.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HBASE-18050.patch
>
>






[jira] [Commented] (HBASE-18050) Add document about the IA.Private classes which appear in IA.LimitedPrivate interfaces

2017-06-08 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043288#comment-16043288
 ] 

stack commented on HBASE-18050:
---

+1 This is great.

> Add document about the IA.Private classes which appear in IA.LimitedPrivate 
> interfaces
> --
>
> Key: HBASE-18050
> URL: https://issues.apache.org/jira/browse/HBASE-18050
> Project: HBase
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 3.0.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HBASE-18050.patch
>
>






[jira] [Commented] (HBASE-18137) Replication gets stuck for empty WALs

2017-06-08 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043278#comment-16043278
 ] 

Sean Busbey commented on HBASE-18137:
-

bq. (we assume that means the current WAL is closed and therefore there really 
is no data).

I think this is incorrect, but we already assume it in many places. Just to 
confirm, HDFS doesn't have an API call for "does anyone have a lease on this 
file", right?

> Replication gets stuck for empty WALs
> -
>
> Key: HBASE-18137
> URL: https://issues.apache.org/jira/browse/HBASE-18137
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 1.3.1
>Reporter: Ashu Pachauri
>Assignee: Vincent Poon
>Priority: Critical
> Fix For: 2.0.0, 1.4.0, 1.3.2, 1.1.11, 1.2.7
>
> Attachments: HBASE-18137.branch-1.3.v1.patch, 
> HBASE-18137.branch-1.3.v2.patch
>
>





[jira] [Updated] (HBASE-18137) Replication gets stuck for empty WALs

2017-06-08 Thread Vincent Poon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent Poon updated HBASE-18137:
-
Attachment: HBASE-18137.branch-1.3.v2.patch

Added a check for 0 length

So we only dump the current file and move on if we get EOFException, the length 
is 0, and there are WALs in the queue behind this one (we assume that means the 
current WAL is closed and therefore there really is no data).
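The skip rule stated above can be written as a single predicate. This sketches only the stated conditions; `EmptyWalSkipRule`, `shouldSkipWal` and its parameters are hypothetical and do not correspond to methods in `ReplicationSource`. All three conditions must hold: the open failed with EOFException, the file length is 0, and at least one WAL is queued behind it (the assumption, questioned elsewhere in this thread, being that the current WAL is therefore closed).

```java
import java.io.EOFException;
import java.io.IOException;

// Hypothetical predicate encoding the empty-WAL skip rule; not a ReplicationSource method.
class EmptyWalSkipRule {

    /**
     * @param openFailure  exception thrown while opening the WAL, or null if it opened fine
     * @param fileLength   length of the WAL file in bytes
     * @param queuedBehind number of WALs queued after this one
     */
    static boolean shouldSkipWal(IOException openFailure, long fileLength, int queuedBehind) {
        return openFailure instanceof EOFException // reader hit end-of-file immediately
            && fileLength == 0                     // nothing was ever written
            && queuedBehind > 0;                   // a newer WAL exists, so this one is assumed closed
    }

    public static void main(String[] args) {
        System.out.println(shouldSkipWal(new EOFException(), 0, 2));   // true
        System.out.println(shouldSkipWal(new EOFException(), 0, 0));   // false: may still be written
        System.out.println(shouldSkipWal(new EOFException(), 512, 2)); // false: has data
        System.out.println(shouldSkipWal(new IOException(), 0, 2));    // false: some other failure
    }
}
```

Keeping the rule in one side-effect-free predicate also makes it easy to gate behind an opt-in setting, as raised later in the thread.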

> Replication gets stuck for empty WALs
> -
>
> Key: HBASE-18137
> URL: https://issues.apache.org/jira/browse/HBASE-18137
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 1.3.1
>Reporter: Ashu Pachauri
>Assignee: Vincent Poon
>Priority: Critical
> Fix For: 2.0.0, 1.4.0, 1.3.2, 1.1.11, 1.2.7
>
> Attachments: HBASE-18137.branch-1.3.v1.patch, 
> HBASE-18137.branch-1.3.v2.patch
>
>





[jira] [Assigned] (HBASE-18194) Feature to get IO characteristics at query level

2017-06-08 Thread NITIN VERMA (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

NITIN VERMA reassigned HBASE-18194:
---

Assignee: NITIN VERMA

> Feature to get IO characteristics at query level
> 
>
> Key: HBASE-18194
> URL: https://issues.apache.org/jira/browse/HBASE-18194
> Project: HBase
>  Issue Type: New Feature
>  Components: regionserver
>Reporter: NITIN VERMA
>Assignee: NITIN VERMA
>Priority: Minor
>
> Relational databases like SQL Server, Oracle and Sybase provide a way to get 
> IO characteristics at the query level. We need a similar feature in HBase: 
> when requested, we should be able to get metrics such as how many IOs were 
> satisfied from the Block Cache, how many from the Bucket Cache and how many 
> ended up in HDFS.





[jira] [Created] (HBASE-18194) Feature to get IO characteristics at query level

2017-06-08 Thread NITIN VERMA (JIRA)
NITIN VERMA created HBASE-18194:
---

 Summary: Feature to get IO characteristics at query level
 Key: HBASE-18194
 URL: https://issues.apache.org/jira/browse/HBASE-18194
 Project: HBase
  Issue Type: New Feature
  Components: regionserver
Reporter: NITIN VERMA
Priority: Minor


Relational databases like SQL Server, Oracle and Sybase provide a way to get 
IO characteristics at the query level. We need a similar feature in HBase: 
when requested, we should be able to get metrics such as how many IOs were 
satisfied from the Block Cache, how many from the Bucket Cache and how many 
ended up in HDFS.
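A minimal sketch of what per-query IO accounting could look like, assuming a hypothetical QueryIoStats object that the read path would increment each time it serves a block for a query. The class name, its methods and the three-way Source split are illustrations of the request above, not an existing HBase API, and the sketch does not show how such an object would be wired through scanners.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical per-query IO counters; not an existing HBase class. A real
// implementation would have to be threaded through the read path, which is
// not shown here.
final class QueryIoStats {
  // Where a block requested by the query was ultimately served from.
  enum Source { BLOCK_CACHE, BUCKET_CACHE, HDFS }

  // One counter per source; LongAdder tolerates concurrent increments.
  private final Map<Source, LongAdder> counts = new EnumMap<>(Source.class);

  QueryIoStats() {
    for (Source s : Source.values()) {
      counts.put(s, new LongAdder());
    }
  }

  // Called once per block read on behalf of this query.
  void record(Source source) {
    counts.get(source).increment();
  }

  long get(Source source) {
    return counts.get(source).sum();
  }

  long total() {
    long sum = 0;
    for (LongAdder a : counts.values()) {
      sum += a.sum();
    }
    return sum;
  }

  @Override
  public String toString() {
    return "blockCache=" + get(Source.BLOCK_CACHE)
        + ", bucketCache=" + get(Source.BUCKET_CACHE)
        + ", hdfs=" + get(Source.HDFS);
  }

  public static void main(String[] args) {
    QueryIoStats stats = new QueryIoStats();
    stats.record(Source.BLOCK_CACHE);
    stats.record(Source.BLOCK_CACHE);
    stats.record(Source.HDFS);
    System.out.println(stats); // blockCache=2, bucketCache=0, hdfs=1
  }
}
```

An EnumMap keyed by source keeps the per-query overhead to a few counters, so such an object could plausibly be allocated only when a client asks for the metrics.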





[jira] [Commented] (HBASE-18179) Add new hadoop releases to the pre commit hadoop check

2017-06-08 Thread Mike Drob (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043199#comment-16043199
 ] 

Mike Drob commented on HBASE-18179:
---

I think the license problem was already flagged in HBASE-18033, but resolving 
it here is fine too.

> Add new hadoop releases to the pre commit hadoop check
> --
>
> Key: HBASE-18179
> URL: https://issues.apache.org/jira/browse/HBASE-18179
> Project: HBase
>  Issue Type: Bug
>Reporter: Duo Zhang
>
> 3.0.0-alpha3 is out, so we should replace the old alpha2 release with alpha3. 
> We should also add the new 2.x releases.





[jira] [Commented] (HBASE-18186) Frequent FileNotFoundExceptions in region server logs

2017-06-08 Thread Ashu Pachauri (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043190#comment-16043190
 ] 

Ashu Pachauri commented on HBASE-18186:
---

bq. If you are not looking into this I can take this up
Not looking into this at the moment, but I think [~mantonov] is digging into 
HBASE-17406, which may overlap with this.

> Frequent FileNotFoundExceptions in region server logs
> -
>
> Key: HBASE-18186
> URL: https://issues.apache.org/jira/browse/HBASE-18186
> Project: HBase
>  Issue Type: Bug
>  Components: Compaction, Scanners
>Affects Versions: 1.3.1
>Reporter: Ashu Pachauri
>
> We see frequent FileNotFoundExceptions in region server logs on multiple code 
> paths that try to reference non-existent store files. I know that there have 
> been multiple bugs in the store file accounting of compacted store files. 
> Examples include HBASE-16964, HBASE-16754 and HBASE-16788.
> Observations:
> 1. The issue mentioned here also seems to bear a similar flavor, because we 
> are not seeing rampant data loss given the frequency of these exceptions in 
> the logs. So it's more likely an accounting issue, but I could be wrong.
> 2. The frequency with which this happens on a scan-heavy workload is at least 
> one order of magnitude higher than on a mixed workload.
> Stack traces:
> {Code}
> WARN backup.HFileArchiver: Failed to archive class 
> org.apache.hadoop.hbase.backup.HFileArchiver$FileableStoreFile, 
> file:hdfs: because it does not exist! 
> Skipping and continuing on.
> java.io.FileNotFoundException: File/Directory // 
> does not exist.
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setTimes(FSDirAttrOp.java:121)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setTimes(FSNamesystem.java:1910)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setTimes(NameNodeRpcServer.java:1223)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setTimes(ClientNamenodeProtocolServerSideTranslatorPB.java:915)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2045)
>   at sun.reflect.GeneratedConstructorAccessor55.newInstance(Unknown 
> Source)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
>   at 
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
>   at org.apache.hadoop.hdfs.DFSClient.setTimes(DFSClient.java:3115)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$30.doCall(DistributedFileSystem.java:1520)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$30.doCall(DistributedFileSystem.java:1516)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.setTimes(DistributedFileSystem.java:1530)
>   at 
> org.apache.hadoop.fs.FilterFileSystem.setTimes(FilterFileSystem.java:496)
>   at 
> org.apache.hadoop.hbase.util.FSUtils.renameAndSetModifyTime(FSUtils.java:1805)
>   at 
> org.apache.hadoop.hbase.backup.HFileArchiver$File.moveAndClose(HFileArchiver.java:575)
>   at 
> org.apache.hadoop.hbase.backup.HFileArchiver.resolveAndArchiveFile(HFileArchiver.java:410)
>   at 
> org.apache.hadoop.hbase.backup.HFileArchiver.resolveAndArchive(HFileArchiver.java:320)
>   at 
> org.apache.hadoop.hbase.backup.HFileArchiver.archiveStoreFiles(HFileArchiver.java:242)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionFileSystem.removeStoreFiles(HRegionFileSystem.java:433)
>   at 
> org.apache.hadoop.hbase.regionserver.HStore.removeCompactedfiles(HStore.java:2723)
>   at 
> org.apache.hadoop.hbase.regionserver.HStore.closeAndArchiveCompactedFiles(HStore.java:2672)
>   at 
> 

[jira] [Commented] (HBASE-18193) Master web UI presents the incorrect number of regions

2017-06-08 Thread Chia-Ping Tsai (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043181#comment-16043181
 ] 

Chia-Ping Tsai commented on HBASE-18193:


Thanks for the reviews, [~enis].
Unless someone raises an objection, I will commit it tomorrow.


> Master web UI presents the incorrect number of regions
> --
>
> Key: HBASE-18193
> URL: https://issues.apache.org/jira/browse/HBASE-18193
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 3.0.0
>Reporter: Chia-Ping Tsai
>Assignee: Chia-Ping Tsai
>Priority: Minor
> Fix For: 2.0.0, 3.0.0
>
> Attachments: HBASE-18193.v0.patch
>
>
> {code:title=RegionStates.java}
>   public Map<State, List<HRegionInfo>> getRegionByStateOfTable(TableName tableName) {
>     final State[] states = State.values();
>     final Map<State, List<HRegionInfo>> tableRegions =
>         new HashMap<>(states.length);
>     for (int i = 0; i < states.length; ++i) {
>       tableRegions.put(states[i], new ArrayList<>());
>     }
>     for (RegionStateNode node: regionsMap.values()) {
>       tableRegions.get(node.getState()).add(node.getRegionInfo());
>     }
>     return tableRegions;
>   }
> {code}
> It always returns all regions, not just those of {{tableName}}, because the 
> loop over {{regionsMap}} never filters by table.
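The method quoted above never uses its tableName argument, so every region in regionsMap ends up in the result. A minimal sketch of the missing filter follows. RegionEntry, its string fields and the local State enum are simplified, hypothetical stand-ins for RegionStateNode, HRegionInfo and RegionState.State. This is not the attached patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

final class RegionsByState {
  // Simplified stand-in for RegionState.State (hypothetical subset of values).
  enum State { OFFLINE, OPENING, OPEN, CLOSING, CLOSED, SPLIT }

  // Simplified stand-in for a RegionStateNode: table name, region name, state.
  static final class RegionEntry {
    final String table;
    final String regionName;
    final State state;

    RegionEntry(String table, String regionName, State state) {
      this.table = table;
      this.regionName = regionName;
      this.state = state;
    }
  }

  static Map<State, List<String>> getRegionByStateOfTable(
      Iterable<RegionEntry> regions, String tableName) {
    // Pre-populate every state so callers always see a possibly empty list.
    Map<State, List<String>> tableRegions = new EnumMap<>(State.class);
    for (State s : State.values()) {
      tableRegions.put(s, new ArrayList<>());
    }
    for (RegionEntry e : regions) {
      // The missing filter: skip regions that belong to other tables.
      if (!e.table.equals(tableName)) {
        continue;
      }
      tableRegions.get(e.state).add(e.regionName);
    }
    return tableRegions;
  }

  public static void main(String[] args) {
    List<RegionEntry> all = Arrays.asList(
        new RegionEntry("t1", "r1", State.OPEN),
        new RegionEntry("t1", "r2", State.CLOSED),
        new RegionEntry("t2", "r3", State.OPEN));
    System.out.println(getRegionByStateOfTable(all, "t1").get(State.OPEN)); // [r1]
  }
}
```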





[jira] [Resolved] (HBASE-17529) MergeTableRegionsProcedure failed due to ArrayIndexOutOfBoundsException

2017-06-08 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved HBASE-17529.

Resolution: Cannot Reproduce

> MergeTableRegionsProcedure failed due to ArrayIndexOutOfBoundsException
> ---
>
> Key: HBASE-17529
> URL: https://issues.apache.org/jira/browse/HBASE-17529
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>  Labels: rpc
> Attachments: 17529-master.log
>
>
> I built a tarball from the master branch at commit 
> 616f4801b06a8427a03ceca9fb8345700ce1ad71.
> I was running the following command:
> hbase org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList 
> -DinMemoryCompaction=BASIC Loop 4 6 100 /tmp/hbase-biglinkedlist-verify 6 
> --monkey slowDeterministic
> Here is the related snippet:
> {code}
> 2017-01-24 21:29:00,107 DEBUG 
> [RpcServer.deafult.FPBQ.Fifo.handler=0,queue=0,port=16000] 
> procedure2.ProcedureExecutor: Stored MergeTableRegionsProcedure 
> (table=IntegrationTestBigLinkedList 
> regions=[IntegrationTestBigLinkedList,,1485292220242.4c5ea240e86ef22ec7264b1153dd557d.,
>  
> IntegrationTestBigLinkedList,\x0E8\xE3\x8E8\xE3\x8E8,1485292220242.6cdb98dfed41ea689b3cd66478c2c580.
>  ] forcible=false), procId=12, owner=hbase, 
> state=RUNNABLE:MERGE_TABLE_REGIONS_PREPARE
> 2017-01-24 21:29:00,108 DEBUG [ProcedureExecutorWorker-14] 
> wal.WALProcedureStore: Set running procedure count=1, slots=24
> 2017-01-24 21:29:00,127 ERROR [ProcedureExecutorWorker-14] 
> procedure2.ProcedureExecutor: CODE-BUG: Uncatched runtime exception for 
> procedure: MergeTableRegionsProcedure (table=IntegrationTestBigLinkedList 
> regions=[IntegrationTestBigLinkedList,,1485292220242.4c5ea240e86ef22ec7264b1153dd557d.,
>  
> IntegrationTestBigLinkedList,\x0E8\xE3\x8E8\xE3\x8E8,1485292220242.6cdb98dfed41ea689b3cd66478c2c580.
>  ] forcible=false), procId=12, owner=hbase, 
> state=RUNNABLE:MERGE_TABLE_REGIONS_MOVE_REGION_TO_SAME_RS
> java.lang.ArrayIndexOutOfBoundsException
> at 
> org.apache.hadoop.hbase.util.ByteBufferUtils.copyFromBufferToArray(ByteBufferUtils.java:1024)
> at 
> org.apache.hadoop.hbase.nio.MultiByteBuff.get(MultiByteBuff.java:628)
> at 
> org.apache.hadoop.hbase.ipc.RpcServer$ByteBuffByteInput.read(RpcServer.java:1483)
> at 
> org.apache.hadoop.hbase.shaded.com.google.protobuf.ByteInputByteString.copyToInternal(ByteInputByteString.java:105)
> at 
> org.apache.hadoop.hbase.shaded.com.google.protobuf.ByteString.toByteArray(ByteString.java:651)
> at org.apache.hadoop.hbase.RegionLoad.getName(RegionLoad.java:50)
> at 
> org.apache.hadoop.hbase.ServerLoad.getRegionsLoad(ServerLoad.java:236)
> at 
> org.apache.hadoop.hbase.master.procedure.MergeTableRegionsProcedure.getRegionLoad(MergeTableRegionsProcedure.java:774)
> at 
> org.apache.hadoop.hbase.master.procedure.MergeTableRegionsProcedure.MoveRegionsToSameRS(MergeTableRegionsProcedure.java:461)
> at 
> org.apache.hadoop.hbase.master.procedure.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:142)
> at 
> org.apache.hadoop.hbase.master.procedure.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:72)
> at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:154)
> at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:708)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1332)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1133)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:76)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1588)
> {code}
> Master log to be attached.





[jira] [Commented] (HBASE-18008) Any HColumnDescriptor we give out should be immutable

2017-06-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043171#comment-16043171
 ] 

Hudson commented on HBASE-18008:


FAILURE: Integrated in Jenkins build HBase-2.0 #10 (See 
[https://builds.apache.org/job/HBase-2.0/10/])
HBASE-18008 Any HColumnDescriptor we give out should be immutable (chia7712: 
rev 1e7804634c043fca295f97ab70b7fc1da16f274c)
* (edit) 
hbase-client/src/main/java/org/apache/hadoop/hbase/shaded/protobuf/RequestConverter.java
* (add) 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ColumnFamilyDescriptorBuilder.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestAsyncTableAdminApi.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterRpcServices.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/client/replication/TestReplicationAdminWithClusters.java
* (edit) 
hbase-client/src/main/java/org/apache/hadoop/hbase/shaded/protobuf/ProtobufUtil.java
* (edit) 
hbase-client/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
* (edit) 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/TableDescriptorBuilder.java
* (edit) 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
* (edit) 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncAdmin.java
* (edit) 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ImmutableHTableDescriptor.java
* (edit) 
hbase-client/src/test/java/org/apache/hadoop/hbase/TestHColumnDescriptor.java
* (edit) 
hbase-client/src/test/java/org/apache/hadoop/hbase/client/TestTableDescriptorBuilder.java
* (edit) 
hbase-client/src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/util/FSTableDescriptors.java
* (add) 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ImmutableHColumnDescriptor.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java
* (add) 
hbase-client/src/test/java/org/apache/hadoop/hbase/client/TestImmutableHColumnDescriptor.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/SplitTableRegionProcedure.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/TestAcidGuarantees.java
* (add) 
hbase-client/src/test/java/org/apache/hadoop/hbase/client/TestColumnFamilyDescriptorBuilder.java
* (edit) 
hbase-client/src/test/java/org/apache/hadoop/hbase/TestHTableDescriptor.java
* (edit) 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/TableDescriptor.java
* (edit) 
hbase-client/src/test/java/org/apache/hadoop/hbase/client/TestImmutableHTableDescriptor.java
* (add) 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ColumnFamilyDescriptor.java
* (edit) 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncHBaseAdmin.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestAdmin1.java


> Any HColumnDescriptor we give out should be immutable
> -
>
> Key: HBASE-18008
> URL: https://issues.apache.org/jira/browse/HBASE-18008
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Chia-Ping Tsai
>Assignee: Chia-Ping Tsai
> Fix For: 2.0.0, 3.0.0
>
> Attachments: HBASE-18008.v0.patch, HBASE-18008.v1.patch, 
> HBASE-18008.v2.patch, HBASE-18008.v3.patch, HBASE-18008.v4.patch, 
> HBASE-18008.v5.patch, HBASE-18008.v6.patch, HBASE-18008.v6.patch, 
> HBASE-18008.v7.patch, HBASE-18008.v7.patch, HBASE-18008.v8.patch, 
> HBASE-18008.v8.patch
>
>
> This is similar to HBASE-15583, but we should move up on to the 
> ColumnFamilyDescriptor rather than ColumnDescriptor.
>  





[jira] [Commented] (HBASE-18193) Master web UI presents the incorrect number of regions

2017-06-08 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043166#comment-16043166
 ] 

Enis Soztutar commented on HBASE-18193:
---

+1. 

> Master web UI presents the incorrect number of regions
> --
>
> Key: HBASE-18193
> URL: https://issues.apache.org/jira/browse/HBASE-18193
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 3.0.0
>Reporter: Chia-Ping Tsai
>Assignee: Chia-Ping Tsai
>Priority: Minor
> Fix For: 2.0.0, 3.0.0
>
> Attachments: HBASE-18193.v0.patch
>
>
> {code:title=RegionStates.java}
>   public Map<State, List<HRegionInfo>> getRegionByStateOfTable(TableName tableName) {
>     final State[] states = State.values();
>     final Map<State, List<HRegionInfo>> tableRegions =
>         new HashMap<>(states.length);
>     for (int i = 0; i < states.length; ++i) {
>       tableRegions.put(states[i], new ArrayList<>());
>     }
>     for (RegionStateNode node: regionsMap.values()) {
>       tableRegions.get(node.getState()).add(node.getRegionInfo());
>     }
>     return tableRegions;
>   }
> {code}
> It always returns all regions, not just those of {{tableName}}, because the 
> loop over {{regionsMap}} never filters by table.





[jira] [Commented] (HBASE-18008) Any HColumnDescriptor we give out should be immutable

2017-06-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043134#comment-16043134
 ] 

Hudson commented on HBASE-18008:


FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #3159 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/3159/])
HBASE-18008 Any HColumnDescriptor we give out should be immutable (chia7712: 
rev 72cb7d97ccd5a959d30b2c84008d27e6c7597fc1)
* (edit) 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/TableDescriptor.java
* (add) 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ImmutableHColumnDescriptor.java
* (add) 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ColumnFamilyDescriptorBuilder.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/client/replication/TestReplicationAdminWithClusters.java
* (edit) 
hbase-client/src/test/java/org/apache/hadoop/hbase/TestHColumnDescriptor.java
* (edit) 
hbase-client/src/main/java/org/apache/hadoop/hbase/shaded/protobuf/ProtobufUtil.java
* (edit) 
hbase-client/src/test/java/org/apache/hadoop/hbase/client/TestTableDescriptorBuilder.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/TestAcidGuarantees.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterRpcServices.java
* (edit) 
hbase-client/src/main/java/org/apache/hadoop/hbase/shaded/protobuf/RequestConverter.java
* (edit) 
hbase-client/src/test/java/org/apache/hadoop/hbase/TestHTableDescriptor.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/util/FSTableDescriptors.java
* (edit) 
hbase-client/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
* (edit) 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncHBaseAdmin.java
* (edit) 
hbase-client/src/test/java/org/apache/hadoop/hbase/client/TestImmutableHTableDescriptor.java
* (edit) 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ImmutableHTableDescriptor.java
* (edit) 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestAdmin1.java
* (add) 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ColumnFamilyDescriptor.java
* (edit) 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncAdmin.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestAsyncTableAdminApi.java
* (edit) 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/TableDescriptorBuilder.java
* (add) 
hbase-client/src/test/java/org/apache/hadoop/hbase/client/TestColumnFamilyDescriptorBuilder.java
* (add) 
hbase-client/src/test/java/org/apache/hadoop/hbase/client/TestImmutableHColumnDescriptor.java
* (edit) 
hbase-client/src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/SplitTableRegionProcedure.java


> Any HColumnDescriptor we give out should be immutable
> -
>
> Key: HBASE-18008
> URL: https://issues.apache.org/jira/browse/HBASE-18008
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Chia-Ping Tsai
>Assignee: Chia-Ping Tsai
> Fix For: 2.0.0, 3.0.0
>
> Attachments: HBASE-18008.v0.patch, HBASE-18008.v1.patch, 
> HBASE-18008.v2.patch, HBASE-18008.v3.patch, HBASE-18008.v4.patch, 
> HBASE-18008.v5.patch, HBASE-18008.v6.patch, HBASE-18008.v6.patch, 
> HBASE-18008.v7.patch, HBASE-18008.v7.patch, HBASE-18008.v8.patch, 
> HBASE-18008.v8.patch
>
>
> This is similar to HBASE-15583, but we should move up on to the 
> ColumnFamilyDescriptor rather than ColumnDescriptor.
>  





[jira] [Commented] (HBASE-18185) IntegrationTestTimeBoundedRequestsWithRegionReplicas unbalanced tests fails with AssertionError

2017-06-08 Thread huaxiang sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043130#comment-16043130
 ] 

huaxiang sun commented on HBASE-18185:
--

Hi [~balazs.meszaros], I think that when victimServers.size() <= 2, the 
assert will fail. It is a setup issue. Can we make sure that 
victimServers.size() <= 2 does not happen in the test, i.e., that there are 
at least 3 region servers for this test? Thanks.

> IntegrationTestTimeBoundedRequestsWithRegionReplicas unbalanced tests fails 
> with AssertionError
> ---
>
> Key: HBASE-18185
> URL: https://issues.apache.org/jira/browse/HBASE-18185
> Project: HBase
>  Issue Type: Bug
>  Components: integration tests
>Affects Versions: 2.0.0
>Reporter: Balazs Meszaros
>Assignee: Balazs Meszaros
>Priority: Minor
> Fix For: 2.0.0, 1.4.0, 1.3.2, 1.2.7
>
> Attachments: HBASE-18185-BM-0001.patch, HBASE-18185-BM-0002.patch
>
>
> We got the following error:
> Exception in thread "main" java.lang.AssertionError: Verification failed with 
> error code 1
> at org.junit.Assert.fail(Assert.java:88)
> at 
> org.apache.hadoop.hbase.test.IntegrationTestTimeBoundedRequestsWithRegionReplicas.runIngestTest(IntegrationTestTimeBoundedRequestsWithRegionReplicas.java:217)
> at 
> org.apache.hadoop.hbase.IntegrationTestIngest.internalRunIngestTest(IntegrationTestIngest.java:123)
> at 
> org.apache.hadoop.hbase.IntegrationTestIngest.runTestFromCommandLine(IntegrationTestIngest.java:106)
> at 
> org.apache.hadoop.hbase.IntegrationTestBase.doWork(IntegrationTestBase.java:123)
> at 
> org.apache.hadoop.hbase.util.AbstractHBaseTool.run(AbstractHBaseTool.java:112)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at 
> org.apache.hadoop.hbase.test.IntegrationTestTimeBoundedRequestsWithRegionReplicas.main(IntegrationTestTimeBoundedRequestsWithRegionReplicas.java:362)
> We got it because another assertion fails in 
> UnbalanceKillAndRebalanceAction:
> Exception in thread "Thread-57" java.lang.AssertionError
> at org.junit.Assert.fail(Assert.java:86)
> at org.junit.Assert.assertTrue(Assert.java:41)
> at org.junit.Assert.assertTrue(Assert.java:52)
> at 
> org.apache.hadoop.hbase.chaos.actions.UnbalanceKillAndRebalanceAction.perform(UnbalanceKillAndRebalanceAction.java:60)





[jira] [Commented] (HBASE-18179) Add new hadoop releases to the pre commit hadoop check

2017-06-08 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043090#comment-16043090
 ] 

Sean Busbey commented on HBASE-18179:
-

Excellent, that's just an improperly named license. If you update 
supplemental-info.xml for that dependency to specify the correct name, it'll 
work. There should be lots of examples; in short, it's basically "Apache 
License, Version 2.0".

> Add new hadoop releases to the pre commit hadoop check
> --
>
> Key: HBASE-18179
> URL: https://issues.apache.org/jira/browse/HBASE-18179
> Project: HBase
>  Issue Type: Bug
>Reporter: Duo Zhang
>
> 3.0.0-alpha3 is out, so we should replace the old alpha2 release with alpha3. 
> We should also add the new 2.x releases.





[jira] [Commented] (HBASE-18137) Replication gets stuck for empty WALs

2017-06-08 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043086#comment-16043086
 ] 

Sean Busbey commented on HBASE-18137:
-

Yeah, in some circumstances DFS will report a length of 0 even though there's 
data somewhere, if the file is still open for writing.
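Because a zero length is only trustworthy once a file is closed, one possible policy is to skip a zero-length WAL in a recovered queue only when it is known to be closed and newer WALs follow it. A minimal sketch of that policy, assuming a hypothetical WalEntry holder with length and closed fields. This is not the ReplicationSource code or the attached patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RecoveredQueueFilter {
  // Hypothetical view of a WAL in a recovered replication queue.
  static final class WalEntry {
    final String path;
    final long length;    // length as reported by the file system
    final boolean closed; // whether the file is known to be closed for writing

    WalEntry(String path, long length, boolean closed) {
      this.path = path;
      this.length = length;
      this.closed = closed;
    }
  }

  // Returns the WALs that should still be opened, in queue order.
  // A zero-length WAL is dropped only if it is closed (so the length can be
  // trusted) and at least one newer WAL follows it in the queue.
  static List<String> walsToRead(List<WalEntry> queue) {
    List<String> result = new ArrayList<>();
    for (int i = 0; i < queue.size(); i++) {
      WalEntry e = queue.get(i);
      boolean isLast = i == queue.size() - 1;
      if (e.length == 0 && e.closed && !isLast) {
        continue; // skip instead of failing with EOFException and retrying forever
      }
      result.add(e.path);
    }
    return result;
  }

  public static void main(String[] args) {
    List<WalEntry> queue = Arrays.asList(
        new WalEntry("wal.1", 1024, true),
        new WalEntry("wal.2", 0, true),   // empty, closed, newer WALs follow
        new WalEntry("wal.3", 0, false),  // reported empty but still open
        new WalEntry("wal.4", 2048, true));
    System.out.println(walsToRead(queue)); // [wal.1, wal.3, wal.4]
  }
}
```

The last WAL is never skipped here, which keeps the existing assumption that the tail of a recovered queue may legitimately be empty or still being written.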

> Replication gets stuck for empty WALs
> -
>
> Key: HBASE-18137
> URL: https://issues.apache.org/jira/browse/HBASE-18137
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 1.3.1
>Reporter: Ashu Pachauri
>Assignee: Vincent Poon
>Priority: Critical
> Fix For: 2.0.0, 1.4.0, 1.3.2, 1.1.11, 1.2.7
>
> Attachments: HBASE-18137.branch-1.3.v1.patch
>
>
> Replication assumes that only the last WAL of a recovered queue can be empty. 
> However, intermittent DFS issues may cause empty WALs to be created (without 
> the PWAL magic) and a WAL roll to happen without a region server crash. This 
> leaves recovered queues with empty WALs in the middle, which causes 
> replication to get stuck:
> {code}
> TRACE regionserver.ReplicationSource: Opening log 
> WARN regionserver.ReplicationSource: - Got: 
> java.io.EOFException
>   at java.io.DataInputStream.readFully(DataInputStream.java:197)
>   at java.io.DataInputStream.readFully(DataInputStream.java:169)
>   at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1915)
>   at 
> org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1880)
>   at 
> org.apache.hadoop.io.SequenceFile$Reader.(SequenceFile.java:1829)
>   at 
> org.apache.hadoop.io.SequenceFile$Reader.(SequenceFile.java:1843)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.(SequenceFileLogReader.java:70)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.reset(SequenceFileLogReader.java:168)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.initReader(SequenceFileLogReader.java:177)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.ReaderBase.init(ReaderBase.java:66)
>   at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:312)
>   at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:276)
>   at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:264)
>   at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:423)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationWALReaderManager.openReader(ReplicationWALReaderManager.java:70)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$ReplicationSourceWorkerThread.openReader(ReplicationSource.java:830)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$ReplicationSourceWorkerThread.run(ReplicationSource.java:572)
> {code}
> The WAL in question was completely empty, but there were other WALs in the 
> recovered queue which were newer and non-empty.





[jira] [Commented] (HBASE-17893) Allow HBase to build against Hadoop 2.8.0

2017-06-08 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043062#comment-16043062
 ] 

Sean Busbey commented on HBASE-17893:
-

We expressly call out running on top of 2.8.0 as unsupported in the ref guide, 
so we ought not to add it to the personality. If 2.8.1 is out now, then I'm +1 
on adding that.

> Allow HBase to build against Hadoop 2.8.0
> -
>
> Key: HBASE-17893
> URL: https://issues.apache.org/jira/browse/HBASE-17893
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.5
>Reporter: Lars Hofhansl
> Attachments: 17883-1.2-BROKEN.txt, 17893-1.3-backport.txt
>
>
> {code}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-remote-resources-plugin:1.5:process (default) 
> on project hbase-assembly: Error rendering velocity resource. Error invoking 
> method 'get(java.lang.Integer)' in java.util.ArrayList at 
> META-INF/LICENSE.vm[line 1671, column 8]: InvocationTargetException: Index: 
> 0, Size: 0 -> [Help 1]
> {code}
> In the generated LICENSE:
> {code}
> This product includes Nimbus JOSE+JWT licensed under the The Apache Software 
> License, Version 2.0.
> ${dep.licenses[0].comments}
> Please check  this License for acceptability here:
> https://www.apache.org/legal/resolved
> If it is okay, then update the list named 'non_aggregate_fine' in the 
> LICENSE.vm file.
> If it isn't okay, then revert the change that added the dependency.
> More info on the dependency:
> com.nimbusds
> nimbus-jose-jwt
> 3.9
> maven central search
> g:com.nimbusds AND a:nimbus-jose-jwt AND v:3.9
> project website
> https://bitbucket.org/connect2id/nimbus-jose-jwt
> project source
> https://bitbucket.org/connect2id/nimbus-jose-jwt
> {code}
> Maybe the problem is just that it says: Apache _Software_ License





[jira] [Commented] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-08 Thread Kahlil Oppenheimer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16042999#comment-16042999
 ] 

Kahlil Oppenheimer commented on HBASE-18164:


[~tedyu] I just made your requested changes (shortening lines, renaming, and 
squashing into a single commit).

[~chia7712] The biggest bottleneck by far was the second part: collecting all 
the HDFS blocks of every region on every iteration of the balancer. Caching 
the localities at the beginning of the balancer run accounts for most of the 
speedup.

The first part, albeit less impactful, is still important. The old locality 
computation was O(# regions * # region servers), which does not scale well as 
the cluster gets larger. Now it's effectively O(1) per proposed move, which 
makes a substantial difference.
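The two ideas described in the issue can be sketched together: cache per-server locality once per balancer run, then update the cost incrementally for each proposed move, measured against the best achievable locality. This is a simplified illustration. It assumes a dense locality[region][server] array with values in [0, 1] and a cost normalized by the best possible total. The class and method names are hypothetical, and this is not the attached patch or the actual StochasticLoadBalancer code.

```java
// Hypothetical, simplified illustration of an incremental locality cost;
// not the StochasticLoadBalancer code or the attached patch.
final class IncrementalLocalityCost {
  private final double[][] locality; // cached once per run: locality[region][server] in [0, 1]
  private final int[] assignment;    // current server index of each region
  private final double bestTotal;    // sum over regions of the best locality any server offers
  private double currentTotal;       // sum over regions of locality on the assigned server

  IncrementalLocalityCost(double[][] locality, int[] assignment) {
    this.locality = locality;
    this.assignment = assignment.clone();
    double best = 0;
    double current = 0;
    // One full O(regions * servers) pass at the start of the run.
    for (int r = 0; r < locality.length; r++) {
      double regionBest = 0;
      for (double l : locality[r]) {
        regionBest = Math.max(regionBest, l);
      }
      best += regionBest;
      current += locality[r][assignment[r]];
    }
    this.bestTotal = best;
    this.currentTotal = current;
  }

  // O(1) per proposed move: only the moved region's contribution changes.
  void moveRegion(int region, int toServer) {
    currentTotal += locality[region][toServer] - locality[region][assignment[region]];
    assignment[region] = toServer;
  }

  // 0 when every region is on its best-locality server; 1 when the current
  // total locality is 0 while some better placement exists.
  double cost() {
    return bestTotal == 0 ? 0 : (bestTotal - currentTotal) / bestTotal;
  }

  public static void main(String[] args) {
    double[][] locality = { {1.0, 0.0}, {0.5, 0.5} }; // 2 regions x 2 servers
    IncrementalLocalityCost c = new IncrementalLocalityCost(locality, new int[] {1, 0});
    System.out.println(c.cost()); // 0.6666666666666666
    c.moveRegion(0, 0);           // move region 0 to its best server
    System.out.println(c.cost()); // 0.0
  }
}
```

Repeated += updates can accumulate floating-point drift over a long run. A production version might periodically recompute currentTotal from scratch.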

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch, 
> HBASE-18164-02.patch
>
>
> We noticed that the stochastic load balancer was not scaling well with 
> cluster size. That is, on our smaller clusters (~17 tables, ~12 region 
> servers, ~5k regions), the balancer considers ~100,000 cluster 
> configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions).
> As a result, our bigger clusters cannot converge on balance as quickly for 
> things like table skew and region load, because the balancer does not have 
> enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running on all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements are large: our big 
> clusters now consider 20x more cluster configurations.
> One design decision I made is to define locality cost as the difference 
> between the best locality that is possible given the current cluster state 
> and the currently measured locality. The old computation measured locality 
> cost as the difference between the current locality and 100% locality; the 
> new computation instead takes the difference between the current locality 
> of a given region and the best locality available for that region anywhere 
> in the cluster.
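The difference between the two cost definitions in the description can be sketched as follows. This is a simplified, hypothetical Java illustration, not HBase's `LocalityBasedCostFunction`: the class `BestLocalityCost`, its methods, and the unnormalized per-region summation are assumptions. The `locality[region][server]` matrix stands in for the localities cached once at the start of the balancer run, and each region's best achievable locality is derived from it.

```java
// Hypothetical sketch comparing two locality cost definitions; not HBase's actual code.
final class BestLocalityCost {
    // locality[r][s]: cached fraction (0.0-1.0) of region r's blocks local to server s.
    private final double[][] locality;
    // bestLocality[r]: highest locality region r could reach on any server in the cluster.
    private final double[] bestLocality;

    BestLocalityCost(double[][] locality) {
        this.locality = locality;
        this.bestLocality = new double[locality.length];
        for (int r = 0; r < locality.length; r++) {
            double best = 0.0;
            for (double l : locality[r]) {
                best = Math.max(best, l);
            }
            bestLocality[r] = best;
        }
    }

    // Old-style cost: each region's shortfall from 100% locality, summed.
    double costVsPerfect(int[] assignment) {
        double cost = 0.0;
        for (int r = 0; r < assignment.length; r++) {
            cost += 1.0 - locality[r][assignment[r]];
        }
        return cost;
    }

    // New-style cost: each region's shortfall from its best achievable locality, summed.
    double costVsBest(int[] assignment) {
        double cost = 0.0;
        for (int r = 0; r < assignment.length; r++) {
            cost += bestLocality[r] - locality[r][assignment[r]];
        }
        return cost;
    }

    public static void main(String[] args) {
        double[][] locality = {
            {0.5, 0.25}, // region 0: at most 50% local anywhere
            {0.0, 1.0},  // region 1: fully local on server 1
        };
        BestLocalityCost cost = new BestLocalityCost(locality);
        int[] assignment = {0, 1}; // each region already sits on its best server
        System.out.println(cost.costVsPerfect(assignment)); // 0.5
        System.out.println(cost.costVsBest(assignment));    // 0.0
    }
}
```

In this example the old-style cost stays at 0.5 even though no move can raise region 0's locality above 50%, while the best-locality cost is already 0.0, so the balancer is not rewarded for chasing moves that cannot help.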





[jira] [Updated] (HBASE-18175) Add hbase-spark integration test into hbase-it

2017-06-08 Thread Sean Busbey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-18175:

Status: In Progress  (was: Patch Available)

moving out of patch available pending feedback

> Add hbase-spark integration test into hbase-it
> --
>
> Key: HBASE-18175
> URL: https://issues.apache.org/jira/browse/HBASE-18175
> Project: HBase
>  Issue Type: Test
>  Components: spark
>Reporter: Yi Liang
>Assignee: Yi Liang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: hbase-18175-v1.patch
>
>
> After HBASE-17574, all tests under hbase-spark are regarded as unit tests, 
> and this JIRA adds an integration test for hbase-spark to hbase-it. The 
> patch runs the same tests as mapreduce.IntegrationTestBulkLoad, but on Spark 
> instead of MapReduce.
> Test in Maven:
> {noformat}
> mvn verify -Dit.test=IntegrationTestSparkBulkLoad
> {noformat}
> Test on cluster:
> {noformat}
> spark-submit --class org.apache.hadoop.hbase.spark.IntegrationTestSparkBulkLoad \
>   $HBASE_HOME/lib/hbase-it-2.0.0-SNAPSHOT-tests.jar \
>   -Dhbase.spark.bulkload.chainlength=50 -m slowDeterministic
> {noformat}


