[jira] [Commented] (HBASE-14614) Procedure v2: Core Assignment Manager

2017-02-07 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857588#comment-15857588
 ] 

stack commented on HBASE-14614:
---

.009 is rebase

> Procedure v2: Core Assignment Manager
> -
>
> Key: HBASE-14614
> URL: https://issues.apache.org/jira/browse/HBASE-14614
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Affects Versions: 2.0.0
>Reporter: Stephen Yuan Jiang
>Assignee: Matteo Bertozzi
> Fix For: 2.0.0
>
> Attachments: HBASE-14614.master.001.patch, 
> HBASE-14614.master.002.patch, HBASE-14614.master.003.patch, 
> HBASE-14614.master.004.patch, HBASE-14614.master.005.patch, 
> HBASE-14614.master.006.patch, HBASE-14614.master.007.patch, 
> HBASE-14614.master.008.patch, HBASE-14614.master.009.patch
>
>
> New AssignmentManager implemented using proc-v2.
>  - AssignProcedure handles the assign operation
>  - UnassignProcedure handles the unassign operation
>  - MoveRegionProcedure handles the move/balance operation
> Concurrent assign operations are batched together and sent to the balancer.
> Concurrent assign and unassign operations that are ready to be sent to the RS
> are batched together.
> This patch is an intermediate state where we add the new AM as 
> AssignmentManager2() to the master so it can be reached by tests, but the new 
> AM will not be integrated with the rest of the system. Only the new AM 
> unit tests will exercise the new assignment manager. The integration with the 
> master code is part of HBASE-14616
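The batching behavior described above (many concurrent assign requests drained into a single call toward the balancer) can be sketched roughly as below. This is a hypothetical illustration, not the actual patch code; the class and method names (`AssignBatcher`, `submit`, `drainBatch`) are invented for the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch: concurrent assign requests are queued, then drained
// in one batch so the balancer sees them together instead of one at a time.
public class AssignBatcher {
    private final ConcurrentLinkedQueue<String> pending = new ConcurrentLinkedQueue<>();

    // Called concurrently, e.g. by many AssignProcedure executions.
    public void submit(String regionName) {
        pending.add(regionName);
    }

    // Drains everything queued so far into a single batch for the balancer.
    public List<String> drainBatch() {
        List<String> batch = new ArrayList<>();
        String r;
        while ((r = pending.poll()) != null) {
            batch.add(r);
        }
        return batch;
    }

    public static void main(String[] args) {
        AssignBatcher b = new AssignBatcher();
        b.submit("region-a");
        b.submit("region-b");
        System.out.println(b.drainBatch().size()); // prints 2
    }
}
```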



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-14614) Procedure v2: Core Assignment Manager

2017-02-07 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-14614:
--
Attachment: HBASE-14614.master.009.patch



[jira] [Commented] (HBASE-17381) ReplicationSourceWorkerThread can die due to unhandled exceptions

2017-02-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857586#comment-15857586
 ] 

Hudson commented on HBASE-17381:


SUCCESS: Integrated in Jenkins build HBase-1.2-JDK7 #96 (See 
[https://builds.apache.org/job/HBase-1.2-JDK7/96/])
HBASE-17381 ReplicationSourceWorkerThread can die due to unhandled (garyh: rev 
0976b86930fdfb1df0af3adce3c82aa08da55956)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java


> ReplicationSourceWorkerThread can die due to unhandled exceptions
> -
>
> Key: HBASE-17381
> URL: https://issues.apache.org/jira/browse/HBASE-17381
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Reporter: Gary Helmling
>Assignee: huzheng
> Fix For: 2.0.0, 1.4.0, 1.3.1, 1.2.5
>
> Attachments: HBASE-17381.patch, HBASE-17381.v1.patch, 
> HBASE-17381.v2.patch, HBASE-17381.v3.patch
>
>
> If a ReplicationSourceWorkerThread encounters an unexpected exception in the 
> run() method (for example, failure to allocate direct memory for the DFS 
> client), the exception will be logged by the UncaughtExceptionHandler, but 
> the thread will also die and the replication queue will back up indefinitely 
> until the RegionServer is restarted.
> We should make sure the worker thread is resilient to all exceptions that it 
> can actually handle. For those that it really can't, it seems better to 
> abort the RegionServer rather than just allow replication to stop with 
> minimal signal.
> Here is a sample exception:
> {noformat}
> ERROR regionserver.ReplicationSource: Unexpected exception in 
> ReplicationSourceWorkerThread, 
> currentPath=hdfs://.../hbase/WALs/XXXwalfilenameXXX
> java.lang.OutOfMemoryError: Direct buffer memory
> at java.nio.Bits.reserveMemory(Bits.java:693)
> at java.nio.DirectByteBuffer.&lt;init&gt;(DirectByteBuffer.java:123)
> at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
> at 
> org.apache.hadoop.crypto.CryptoOutputStream.&lt;init&gt;(CryptoOutputStream.java:96)
> at 
> org.apache.hadoop.crypto.CryptoOutputStream.&lt;init&gt;(CryptoOutputStream.java:113)
> at 
> org.apache.hadoop.crypto.CryptoOutputStream.&lt;init&gt;(CryptoOutputStream.java:108)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil.createStreamPair(DataTransferSaslUtil.java:344)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:490)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getSaslStreams(SaslDataTransferClient.java:391)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:263)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:211)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.peerSend(SaslDataTransferClient.java:160)
> at 
> org.apache.hadoop.hdfs.net.TcpPeerServer.peerFromSocketAndKey(TcpPeerServer.java:92)
> at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3444)
> at 
> org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:778)
> at 
> org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:695)
> at 
> org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:356)
> at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:673)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:882)
> at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
> at java.io.DataInputStream.read(DataInputStream.java:100)
> at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:308)
> at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:276)
> at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:264)
> at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:423)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationWALReaderManager.openReader(ReplicationWALReaderManager.java:70)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$ReplicationSourceWorkerThread.openReader(ReplicationSource.java:830)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$ReplicationSourceWorkerThread.run(ReplicationSource.java:572)
> {noformat}
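The resilience pattern proposed above (retry on exceptions the worker can handle, surface unrecoverable errors loudly rather than letting the thread die silently) can be sketched as follows. This is a minimal hypothetical illustration, not the actual patch; `ResilientWorker` and `doOneUnitOfWork` are invented names standing in for the real worker loop.

```java
// Hypothetical sketch of the fix described above: the worker catches
// exceptions it can recover from and retries, instead of letting run()
// exit and the replication queue back up with minimal signal.
public class ResilientWorker implements Runnable {
    private volatile boolean stopped = false;
    int attempts = 0;

    // Stand-in for reading/shipping one batch of WAL entries; may throw.
    protected void doOneUnitOfWork() throws Exception {
        attempts++;
        if (attempts < 3) {
            throw new java.io.IOException("transient failure");
        }
        stopped = true; // work finished for this demo
    }

    @Override
    public void run() {
        while (!stopped) {
            try {
                doOneUnitOfWork();
            } catch (Throwable t) {
                if (t instanceof Error) {
                    // Unrecoverable (e.g. OutOfMemoryError): rethrow so it is
                    // surfaced loudly; a real server would abort here rather
                    // than stall replication indefinitely.
                    throw (Error) t;
                }
                // Recoverable: log and retry instead of letting the thread die.
                System.out.println("retrying after: " + t.getMessage());
            }
        }
    }

    public static void main(String[] args) {
        ResilientWorker w = new ResilientWorker();
        w.run();
        System.out.println("attempts=" + w.attempts); // prints attempts=3
    }
}
```

In the real fix the retry would also back off between attempts; that detail is omitted here for brevity.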





[jira] [Commented] (HBASE-17381) ReplicationSourceWorkerThread can die due to unhandled exceptions

2017-02-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857571#comment-15857571
 ] 

Hudson commented on HBASE-17381:


SUCCESS: Integrated in Jenkins build HBase-1.2-JDK8 #89 (See 
[https://builds.apache.org/job/HBase-1.2-JDK8/89/])
HBASE-17381 ReplicationSourceWorkerThread can die due to unhandled (garyh: rev 
0976b86930fdfb1df0af3adce3c82aa08da55956)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java




[jira] [Commented] (HBASE-17381) ReplicationSourceWorkerThread can die due to unhandled exceptions

2017-02-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857567#comment-15857567
 ] 

Hudson commented on HBASE-17381:


FAILURE: Integrated in Jenkins build HBase-1.4 #618 (See 
[https://builds.apache.org/job/HBase-1.4/618/])
HBASE-17381 ReplicationSourceWorkerThread can die due to unhandled (garyh: rev 
8574934f5912b09b785444036bfee9740c966bbb)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java




[jira] [Commented] (HBASE-17381) ReplicationSourceWorkerThread can die due to unhandled exceptions

2017-02-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857566#comment-15857566
 ] 

Hudson commented on HBASE-17381:


FAILURE: Integrated in Jenkins build HBase-1.3-JDK8 #109 (See 
[https://builds.apache.org/job/HBase-1.3-JDK8/109/])
HBASE-17381 ReplicationSourceWorkerThread can die due to unhandled (garyh: rev 
6431eab8e3823d7d450b76a3dfe28114cfbf3a09)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java




[jira] [Commented] (HBASE-17472) Correct the semantic of permission grant

2017-02-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857476#comment-15857476
 ] 

Hadoop QA commented on HBASE-17472:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 24s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
42s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 3s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 
42s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
28s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
16s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 48s 
{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
8s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 4s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 4s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 4s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 
44s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
24m 33s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha2. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 0m 
45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
41s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 47s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 16s 
{color} | {color:green} hbase-protocol in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 21s 
{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 32m 35s {color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
27s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 85m 32s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Timed out junit tests | 
org.apache.hadoop.hbase.master.locking.TestLockManager |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:8d52d23 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12851551/HBASE-17472.v3.patch |
| JIRA Issue | HBASE-17472 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  cc  hbaseprotoc  |
| uname | Linux a22a58a98838 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/Pr

[jira] [Commented] (HBASE-17381) ReplicationSourceWorkerThread can die due to unhandled exceptions

2017-02-07 Thread huzheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857431#comment-15857431
 ] 

huzheng commented on HBASE-17381:
-

The failed UT (testRegionServerCoprocessorsReported) in Jenkins works fine on my 
desktop machine, and there are no useful logs to debug with in Jenkins.  :(



[jira] [Commented] (HBASE-17381) ReplicationSourceWorkerThread can die due to unhandled exceptions

2017-02-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857429#comment-15857429
 ] 

Hudson commented on HBASE-17381:


FAILURE: Integrated in Jenkins build HBase-1.2-IT #592 (See 
[https://builds.apache.org/job/HBase-1.2-IT/592/])
HBASE-17381 ReplicationSourceWorkerThread can die due to unhandled (garyh: rev 
0976b86930fdfb1df0af3adce3c82aa08da55956)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java


> ReplicationSourceWorkerThread can die due to unhandled exceptions
> -
>
> Key: HBASE-17381
> URL: https://issues.apache.org/jira/browse/HBASE-17381
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Reporter: Gary Helmling
>Assignee: huzheng
> Fix For: 2.0.0, 1.4.0, 1.3.1, 1.2.5
>
> Attachments: HBASE-17381.patch, HBASE-17381.v1.patch, 
> HBASE-17381.v2.patch, HBASE-17381.v3.patch
>
>
> If a ReplicationSourceWorkerThread encounters an unexpected exception in the 
> run() method (for example failure to allocate direct memory for the DFS 
> client), the exception will be logged by the UncaughtExceptionHandler, but 
> the thread will also die and the replication queue will back up indefinitely 
> until the Regionserver is restarted.
> We should make sure the worker thread is resilient to all exceptions that it 
> can actually handle.  For those that it really can't, it seems better to 
> abort the regionserver rather than just allow replication to stop with 
> minimal signal.
> Here is a sample exception:
> {noformat}
> ERROR regionserver.ReplicationSource: Unexpected exception in 
> ReplicationSourceWorkerThread, 
> currentPath=hdfs://.../hbase/WALs/XXXwalfilenameXXX
> java.lang.OutOfMemoryError: Direct buffer memory
> at java.nio.Bits.reserveMemory(Bits.java:693)
> at java.nio.DirectByteBuffer.(DirectByteBuffer.java:123)
> at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
> at 
> org.apache.hadoop.crypto.CryptoOutputStream.(CryptoOutputStream.java:96)
> at 
> org.apache.hadoop.crypto.CryptoOutputStream.(CryptoOutputStream.java:113)
> at 
> org.apache.hadoop.crypto.CryptoOutputStream.(CryptoOutputStream.java:108)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil.createStreamPair(DataTransferSaslUtil.java:344)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:490)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getSaslStreams(SaslDataTransferClient.java:391)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:263)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:211)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.peerSend(SaslDataTransferClient.java:160)
> at 
> org.apache.hadoop.hdfs.net.TcpPeerServer.peerFromSocketAndKey(TcpPeerServer.java:92)
> at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3444)
> at 
> org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:778)
> at 
> org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:695)
> at 
> org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:356)
> at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:673)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:882)
> at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
> at java.io.DataInputStream.read(DataInputStream.java:100)
> at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:308)
> at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:276)
> at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:264)
> at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:423)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationWALReaderManager.openReader(ReplicationWALReaderManager.java:70)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$ReplicationSourceWorkerThread.openReader(ReplicationSource.java:830)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$ReplicationSourceWorkerThread.run(ReplicationSource.java:572)
> {noformat}
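The resilience called for above can be sketched as a guarded worker loop: exceptions the worker can survive trigger a retry, while fatal errors such as the OutOfMemoryError in the sample trace escalate to an abort hook rather than letting the thread die silently. This is only an illustrative sketch with hypothetical names, not the actual HBASE-17381 patch:

```java
import java.util.function.Consumer;

// Sketch of a resilient worker loop (illustrative; names are hypothetical,
// not the actual HBase ReplicationSource implementation).
class ResilientWorker implements Runnable {
    private final Runnable task;                 // one unit of replication work
    private final Consumer<Throwable> aborter;   // e.g. a regionserver abort hook
    private volatile boolean running = true;

    ResilientWorker(Runnable task, Consumer<Throwable> aborter) {
        this.task = task;
        this.aborter = aborter;
    }

    void stop() { running = false; }

    @Override
    public void run() {
        while (running) {
            try {
                task.run();
            } catch (Exception e) {
                // Recoverable: log and retry rather than letting the thread die.
                System.err.println("Retrying after: " + e);
            } catch (Error e) {
                // Unrecoverable (e.g. OutOfMemoryError): abort loudly instead of
                // letting the replication queue stall silently.
                aborter.accept(e);
                return;
            }
        }
    }
}
```

In the real fix the abort hook would map to the regionserver's abort path, so a backed-up replication queue becomes a loud failure instead of a silent one.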



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17381) ReplicationSourceWorkerThread can die due to unhandled exceptions

2017-02-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857398#comment-15857398
 ] 

Hudson commented on HBASE-17381:


FAILURE: Integrated in Jenkins build HBase-1.3-IT #826 (See 
[https://builds.apache.org/job/HBase-1.3-IT/826/])
HBASE-17381 ReplicationSourceWorkerThread can die due to unhandled (garyh: rev 
0976b86930fdfb1df0af3adce3c82aa08da55956)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java


> ReplicationSourceWorkerThread can die due to unhandled exceptions
> -
>
> Key: HBASE-17381
> URL: https://issues.apache.org/jira/browse/HBASE-17381
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Reporter: Gary Helmling
>Assignee: huzheng
> Fix For: 2.0.0, 1.4.0, 1.3.1, 1.2.5
>
> Attachments: HBASE-17381.patch, HBASE-17381.v1.patch, 
> HBASE-17381.v2.patch, HBASE-17381.v3.patch
>
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17521) Avoid stopping the load balancer in graceful stop

2017-02-07 Thread Sandeep Guggilam (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857393#comment-15857393
 ] 

Sandeep Guggilam commented on HBASE-17521:
--

Yes [~stack], we are doing some testing on this. We will upload the patch as 
soon as it is done.

[~mnpoonia]

> Avoid stopping the load balancer in graceful stop
> -
>
> Key: HBASE-17521
> URL: https://issues.apache.org/jira/browse/HBASE-17521
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>
> ... instead setting the regionserver in question to draining.
> [~sandeep.guggilam], FYI



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17381) ReplicationSourceWorkerThread can die due to unhandled exceptions

2017-02-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857369#comment-15857369
 ] 

Hudson commented on HBASE-17381:


FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #2464 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/2464/])
HBASE-17381 ReplicationSourceWorkerThread can die due to unhandled (garyh: rev 
d8f3c6cff93c62d68ac3f68703bad86deaa03f14)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java


> ReplicationSourceWorkerThread can die due to unhandled exceptions
> -
>
> Key: HBASE-17381
> URL: https://issues.apache.org/jira/browse/HBASE-17381
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Reporter: Gary Helmling
>Assignee: huzheng
> Fix For: 2.0.0, 1.4.0, 1.3.1, 1.2.5
>
> Attachments: HBASE-17381.patch, HBASE-17381.v1.patch, 
> HBASE-17381.v2.patch, HBASE-17381.v3.patch
>
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17472) Correct the semantic of permission grant

2017-02-07 Thread huzheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huzheng updated HBASE-17472:

Affects Version/s: 2.0.0

> Correct the semantic of  permission grant
> -
>
> Key: HBASE-17472
> URL: https://issues.apache.org/jira/browse/HBASE-17472
> Project: HBase
>  Issue Type: Improvement
>  Components: Admin
>Affects Versions: 2.0.0
>Reporter: huzheng
>Assignee: huzheng
> Fix For: 2.0.0
>
> Attachments: HBASE-17472.v1.patch, HBASE-17472.v2.patch, 
> HBASE-17472.v3.patch
>
>
> Currently, the HBase grant operation has the following semantics:
> {code}
> hbase(main):019:0> grant 'hbase_tst', 'RW', 'ycsb'
> 0 row(s) in 0.0960 seconds
> hbase(main):020:0> user_permission 'ycsb'
> User        Namespace,Table,Family,Qualifier:Permission
>  hbase_tst  default,ycsb,,: [Permission:actions=READ,WRITE]
> 1 row(s) in 0.0550 seconds
> hbase(main):021:0> grant 'hbase_tst', 'CA', 'ycsb'
> 0 row(s) in 0.0820 seconds
> hbase(main):022:0> user_permission 'ycsb'
> User        Namespace,Table,Family,Qualifier:Permission
>  hbase_tst  default,ycsb,,: [Permission: actions=CREATE,ADMIN]
> 1 row(s) in 0.0490 seconds
> {code}
> A later grant replaces previously granted permissions, which confuses 
> most HBase administrators.
> It seems more reasonable for HBase to merge multiple granted permissions.
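The proposed merge semantics can be illustrated with a small sketch. The Action enum and helper methods below are hypothetical stand-ins, not the real HBase AccessControl classes; they only contrast replace-on-grant (current behavior) with union-on-grant (proposed):

```java
import java.util.EnumSet;

// Illustrative sketch of merge-on-grant semantics; the Action enum and the
// replace()/merge() helpers are hypothetical, not the real HBase API.
class GrantMerge {
    enum Action { READ, WRITE, EXEC, CREATE, ADMIN }

    // Replacement semantics (current behavior): the newly granted set wins.
    static EnumSet<Action> replace(EnumSet<Action> existing, EnumSet<Action> granted) {
        return EnumSet.copyOf(granted);
    }

    // Merge semantics (proposed): union of existing and newly granted actions.
    static EnumSet<Action> merge(EnumSet<Action> existing, EnumSet<Action> granted) {
        EnumSet<Action> result = EnumSet.copyOf(existing);
        result.addAll(granted);
        return result;
    }
}
```

Under merge semantics, granting 'CA' after 'RW' would leave the user with READ, WRITE, CREATE and ADMIN instead of only CREATE and ADMIN.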



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17275) Assign timeout may cause region to be unassigned forever

2017-02-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857361#comment-15857361
 ] 

Hudson commented on HBASE-17275:


SUCCESS: Integrated in Jenkins build HBase-1.2-JDK7 #95 (See 
[https://builds.apache.org/job/HBase-1.2-JDK7/95/])
HBASE-17275 Assign timeout may cause region to be unassigned forever (tedyu: 
rev 2aaf7851a4de28e40b1a0d641d8fc98e54f5342d)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java


> Assign timeout may cause region to be unassigned forever
> 
>
> Key: HBASE-17275
> URL: https://issues.apache.org/jira/browse/HBASE-17275
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 1.2.3, 1.1.7
>Reporter: Allan Yang
>Assignee: Allan Yang
> Fix For: 1.4.0, 1.3.1, 1.2.5, 1.1.9
>
> Attachments: HBASE-17275-branch-1.patch, 
> HBASE-17275-branch-1.v2.patch, HBASE-17275-branch-1.v3.patch
>
>
> This is a real case that happened in my test cluster.
> I had more than 8000 regions to assign when I restarted the cluster, but I only 
> started one regionserver. That means the master needed to assign these 8000 regions 
> to a single server (I know it is not right, but it was just for testing).
> The RS received the open-region RPC and began to open regions. But due to the 
> huge number of regions, the master timed out the RPC call (though some 
> regions had actually already opened) after 1 minute, as you can see from log 1.
> {noformat}
> 1. 2016-11-22 10:17:32,285 INFO  [example.org:30001.activeMasterManager] 
> master.AssignmentManager: Unable to communicate with 
> example.org,30003,1479780976834 in order to assign regions,
> java.io.IOException: Call to /example.org:30003 failed on local exception: 
> org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=1, waitTime=60001, 
> operationTimeout=6 expired.
> at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl.wrapException(RpcClientImpl.java:1338)
> at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1272)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:216)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:290)
> at 
> org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$BlockingStub.openRegion(AdminProtos.java:30177)
> at 
> org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:1000)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1719)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:2828)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:2775)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assignAllUserRegions(AssignmentManager.java:2876)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.processDeadServersAndRegionsInTransition(AssignmentManager.java:646)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.joinCluster(AssignmentManager.java:493)
> at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:796)
> at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:188)
> at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1711)
> at java.lang.Thread.run(Thread.java:756)
> Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=1, 
> waitTime=60001, operationTimeout=6 expired.
> at org.apache.hadoop.hbase.ipc.Call.checkAndSetTimeout(Call.java:81)
> at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1246)
> ... 14 more  
> {noformat}
> For the region 7e9aee32eb98a6fc9d503b99fc5f9615 (like many others), after 
> the timeout, the master used a pool to re-assign them, as shown in log 2.
> {noformat}
> 2. 2016-11-22 10:17:32,303 DEBUG [AM.-pool1-t26] master.AssignmentManager: 
> Force region state offline {7e9aee32eb98a6fc9d503b99fc5f9615 
> state=PENDING_OPEN, ts=1479780992078, server=example.org,30003,1479780976834} 
>  
> {noformat}
> But this region was actually opened on the RS; (maybe) due to the huge 
> pressure, the OPENED ZK event was received late by the master, as you can tell from log 3: 
> "which is more than 15 seconds late".
> {noformat}
> 3. 2016-11-22 10:17:32,304 DEBUG [AM.ZK.Worker-pool2-t3] 
> master.AssignmentManager: Handling RS_ZK_REGION_OPENED, 
> server=example.org,30003,1479780976834, 
> region=7e9aee32eb98a6fc9d503b99fc5f9615, which is more than 15 seconds late, 
> current_state={7e9aee32eb98a6fc9d503b99fc5f9615 state=PENDING_OPEN, 
> ts=1479780992078, server=example.org,3000

[jira] [Commented] (HBASE-17472) Correct the semantic of permission grant

2017-02-07 Thread huzheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857355#comment-15857355
 ] 

huzheng commented on HBASE-17472:
-

Uploaded patch v3:

1. Added Javadocs for the grant/grant2 APIs.
2. Fixed the failing UT TestTablePermissions, which was caused by using a TreeSet to 
serialize multiple enum values into a byte array. I reordered the expected 
array for assertEquals, and the UT passes.
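The reordering in point 2 follows from how TreeSet sorts enums: by declaration (ordinal) order, regardless of insertion order. A minimal sketch (with a hypothetical Action enum, not the real HBase one) shows why an expected array built from such a set must follow the enum declaration order:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Enums compare by declaration (ordinal) order, so a TreeSet iterates them in
// that order no matter how they were inserted. Any byte array produced by
// iterating the set therefore follows declaration order, and so must the
// expected value in the test.
class EnumOrderDemo {
    enum Action { READ, WRITE, EXEC, CREATE, ADMIN }

    static List<Action> sortedIteration(Action... inserted) {
        TreeSet<Action> set = new TreeSet<>();
        for (Action a : inserted) set.add(a);
        return new ArrayList<>(set);
    }
}
```

A test that inserted ADMIN before READ still sees READ first when the set is serialized, which is exactly why the expected array had to be reordered.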

> Correct the semantic of  permission grant
> -
>
> Key: HBASE-17472
> URL: https://issues.apache.org/jira/browse/HBASE-17472
> Project: HBase
>  Issue Type: Improvement
>  Components: Admin
>Reporter: huzheng
>Assignee: huzheng
> Fix For: 2.0.0
>
> Attachments: HBASE-17472.v1.patch, HBASE-17472.v2.patch, 
> HBASE-17472.v3.patch
>
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17472) Correct the semantic of permission grant

2017-02-07 Thread huzheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huzheng updated HBASE-17472:

Attachment: HBASE-17472.v3.patch

> Correct the semantic of  permission grant
> -
>
> Key: HBASE-17472
> URL: https://issues.apache.org/jira/browse/HBASE-17472
> Project: HBase
>  Issue Type: Improvement
>  Components: Admin
>Reporter: huzheng
>Assignee: huzheng
> Fix For: 2.0.0
>
> Attachments: HBASE-17472.v1.patch, HBASE-17472.v2.patch, 
> HBASE-17472.v3.patch
>
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17275) Assign timeout may cause region to be unassigned forever

2017-02-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857318#comment-15857318
 ] 

Hudson commented on HBASE-17275:


FAILURE: Integrated in Jenkins build HBase-1.4 #617 (See 
[https://builds.apache.org/job/HBase-1.4/617/])
HBASE-17275 Assign timeout may cause region to be unassigned forever (tedyu: 
rev a75e5a543531c7656ccb5108ec8fe613d584a89e)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java


> Assign timeout may cause region to be unassigned forever
> 
>
> Key: HBASE-17275
> URL: https://issues.apache.org/jira/browse/HBASE-17275
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 1.2.3, 1.1.7
>Reporter: Allan Yang
>Assignee: Allan Yang
> Fix For: 1.4.0, 1.3.1, 1.2.5, 1.1.9
>
> Attachments: HBASE-17275-branch-1.patch, 
> HBASE-17275-branch-1.v2.patch, HBASE-17275-branch-1.v3.patch
>
>

[jira] [Commented] (HBASE-17565) StochasticLoadBalancer may incorrectly skip balancing due to skewed multiplier sum

2017-02-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857319#comment-15857319
 ] 

Hudson commented on HBASE-17565:


FAILURE: Integrated in Jenkins build HBase-1.4 #617 (See 
[https://builds.apache.org/job/HBase-1.4/617/])
HBASE-17565 StochasticLoadBalancer may incorrectly skip balancing due to 
(tedyu: rev 0553290c6a893a74ea926535829dc2237fc48b04)
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestStochasticLoadBalancer.java


> StochasticLoadBalancer may incorrectly skip balancing due to skewed 
> multiplier sum
> --
>
> Key: HBASE-17565
> URL: https://issues.apache.org/jira/browse/HBASE-17565
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Critical
> Fix For: 2.0.0, 1.4.0
>
> Attachments: 17565.addendum, 17565.v1.txt, 17565.v2.txt, 
> 17565.v3.txt, 17565.v4.txt, 17565.v5.txt, 17565.v6.txt
>
>
> I was investigating why a 6 node cluster kept skipping balancing requests.
> Here were the region counts on the servers:
> 449, 448, 447, 449, 453, 0
> {code}
> 2017-01-26 22:04:47,145 INFO  
> [RpcServer.deafult.FPBQ.Fifo.handler=1,queue=0,port=16000] 
> balancer.StochasticLoadBalancer: Skipping load balancing because balanced 
> cluster; total cost is 127.0171157050385, sum multiplier is 111087.0 min cost 
> which need balance is 0.05
> {code}
> The big multiplier sum caught my eye. Here is what additional debug logging 
> showed:
> {code}
> 2017-01-27 23:25:31,749 DEBUG 
> [RpcServer.deafult.FPBQ.Fifo.handler=9,queue=0,port=16000] 
> balancer.StochasticLoadBalancer: class 
> org.apache.hadoop.hbase.master.balancer.  
> StochasticLoadBalancer$RegionReplicaHostCostFunction with multiplier 10.0
> 2017-01-27 23:25:31,749 DEBUG 
> [RpcServer.deafult.FPBQ.Fifo.handler=9,queue=0,port=16000] 
> balancer.StochasticLoadBalancer: class 
> org.apache.hadoop.hbase.master.balancer.  
> StochasticLoadBalancer$RegionReplicaRackCostFunction with multiplier 1.0
> {code}
> Note, however, that no table in the cluster used read replicas.
> I can think of two ways of fixing this situation:
> 1. If there is no read replica in the cluster, ignore the multipliers for the 
> above two functions.
> 2. When the cost() returned by the CostFunction is 0 (or very close to 0.0), 
> ignore the multiplier.
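The skip decision in the log can be reproduced arithmetically: the balancer compares the total weighted cost divided by the multiplier sum against the minimum cost that warrants balancing. The sketch below is simplified from the actual StochasticLoadBalancer, with illustrative names taken from the log message:

```java
// Simplified sketch of the balancer's skip check; the method and parameter
// names are illustrative, matching the log message rather than the exact source.
class BalancerSkipCheck {
    static boolean needsBalance(double totalCost, double sumMultiplier,
                                double minCostNeedBalance) {
        // Balancing runs only when the weighted-average cost reaches the threshold;
        // otherwise the cluster is reported as already balanced and skipped.
        return (totalCost / sumMultiplier) >= minCostNeedBalance;
    }
}
```

With the logged values, 127.0171157050385 / 111087.0 is roughly 0.00114, far below the 0.05 threshold, so balancing is skipped even though one server holds zero regions; multipliers contributed by unused cost functions inflate the denominator.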



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-16731) Inconsistent results from the Get/Scan if we use the empty FilterList

2017-02-07 Thread ChiaPing Tsai (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857314#comment-15857314
 ] 

ChiaPing Tsai commented on HBASE-16731:
---

[~pankaj2461]

Would you please create a separate JIRA to enhance that? Thanks.

> Inconsistent results from the Get/Scan if we use the empty FilterList
> -
>
> Key: HBASE-16731
> URL: https://issues.apache.org/jira/browse/HBASE-16731
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: ChiaPing Tsai
>Assignee: ChiaPing Tsai
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-16731.v0.patch, HBASE-16731.v1.patch, 
> HBASE-16731.v2.patch, HBASE-16731.v3.patch
>
>
> RSRpcServices#get() converts the Get to a Scan without calling 
> scan#setLoadColumnFamiliesOnDemand. As a result, Get and Scan return 
> different results when an empty filter is used: the Scan returns no data, 
> but the Get does.
> see [HBASE-16729 |https://issues.apache.org/jira/browse/HBASE-16729]
> Any comments? Thanks.
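The conversion gap can be sketched with stand-in classes. Get and Scan below are minimal mocks of the HBase client classes, not the real API; the point is that the tri-state on-demand flag must be copied over when building the server-side Scan.

```java
// Hedged sketch of the fix: carry the loadColumnFamiliesOnDemand setting over
// when converting a Get to a Scan. Get/Scan here are illustrative stand-ins
// for the real HBase classes.
public class GetToScanSketch {
  static class Get {
    Boolean loadColumnFamiliesOnDemand; // tri-state as in HBase: null = unset
  }

  static class Scan {
    Boolean loadColumnFamiliesOnDemand;
  }

  static Scan toScan(Get get) {
    Scan scan = new Scan();
    // The missing step: without this copy, the server-side Scan silently
    // drops the setting the client put on the Get, so the two paths diverge.
    if (get.loadColumnFamiliesOnDemand != null) {
      scan.loadColumnFamiliesOnDemand = get.loadColumnFamiliesOnDemand;
    }
    return scan;
  }

  public static void main(String[] args) {
    Get get = new Get();
    get.loadColumnFamiliesOnDemand = true;
    System.out.println(toScan(get).loadColumnFamiliesOnDemand); // true
  }
}
```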





[jira] [Commented] (HBASE-17381) ReplicationSourceWorkerThread can die due to unhandled exceptions

2017-02-07 Thread huzheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857303#comment-15857303
 ] 

huzheng commented on HBASE-17381:
-

Thanks for your help, [~ghelmling]

> ReplicationSourceWorkerThread can die due to unhandled exceptions
> -
>
> Key: HBASE-17381
> URL: https://issues.apache.org/jira/browse/HBASE-17381
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Reporter: Gary Helmling
>Assignee: huzheng
> Fix For: 2.0.0, 1.4.0, 1.3.1, 1.2.5
>
> Attachments: HBASE-17381.patch, HBASE-17381.v1.patch, 
> HBASE-17381.v2.patch, HBASE-17381.v3.patch
>
>
> If a ReplicationSourceWorkerThread encounters an unexpected exception in the 
> run() method (for example failure to allocate direct memory for the DFS 
> client), the exception will be logged by the UncaughtExceptionHandler, but 
> the thread will also die and the replication queue will back up indefinitely 
> until the Regionserver is restarted.
> We should make sure the worker thread is resilient to all exceptions that it 
> can actually handle.  For those that it really can't, it seems better to 
> abort the regionserver rather than just allow replication to stop with 
> minimal signal.
> Here is a sample exception:
> {noformat}
> ERROR regionserver.ReplicationSource: Unexpected exception in 
> ReplicationSourceWorkerThread, 
> currentPath=hdfs://.../hbase/WALs/XXXwalfilenameXXX
> java.lang.OutOfMemoryError: Direct buffer memory
> at java.nio.Bits.reserveMemory(Bits.java:693)
> at java.nio.DirectByteBuffer.(DirectByteBuffer.java:123)
> at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
> at 
> org.apache.hadoop.crypto.CryptoOutputStream.(CryptoOutputStream.java:96)
> at 
> org.apache.hadoop.crypto.CryptoOutputStream.(CryptoOutputStream.java:113)
> at 
> org.apache.hadoop.crypto.CryptoOutputStream.(CryptoOutputStream.java:108)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil.createStreamPair(DataTransferSaslUtil.java:344)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:490)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getSaslStreams(SaslDataTransferClient.java:391)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:263)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:211)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.peerSend(SaslDataTransferClient.java:160)
> at 
> org.apache.hadoop.hdfs.net.TcpPeerServer.peerFromSocketAndKey(TcpPeerServer.java:92)
> at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3444)
> at 
> org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:778)
> at 
> org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:695)
> at 
> org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:356)
> at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:673)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:882)
> at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
> at java.io.DataInputStream.read(DataInputStream.java:100)
> at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:308)
> at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:276)
> at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:264)
> at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:423)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationWALReaderManager.openReader(ReplicationWALReaderManager.java:70)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$ReplicationSourceWorkerThread.openReader(ReplicationSource.java:830)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$ReplicationSourceWorkerThread.run(ReplicationSource.java:572)
> {noformat}
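The resilience pattern described above — retry on exceptions the worker can recover from, and fail loudly rather than die silently on those it cannot — can be sketched in plain Java. This is illustrative only, not the actual ReplicationSource code; names like runResiliently are invented for the sketch.

```java
// Hedged sketch of a worker loop that survives recoverable exceptions instead
// of letting the thread die and the replication queue back up. Illustrative
// stand-in for the ReplicationSourceWorkerThread behavior discussed above.
public class ResilientWorkerSketch {
  interface Task {
    void run() throws Exception;
  }

  // Runs the task up to maxAttempts times; returns true on success.
  // Exceptions are logged and retried; Errors (e.g. OutOfMemoryError) are not
  // caught here, so they propagate to a handler that can abort the process.
  static boolean runResiliently(Task task, int maxAttempts) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        task.run();
        return true;
      } catch (Exception e) {
        // Recoverable failure: log and retry instead of dying silently.
        System.err.println("attempt " + attempt + " failed: " + e.getMessage());
      }
    }
    return false;
  }

  public static void main(String[] args) {
    final int[] calls = {0};
    boolean ok = runResiliently(() -> {
      if (++calls[0] < 3) {
        throw new java.io.IOException("transient");
      }
    }, 5);
    System.out.println(ok + " after " + calls[0] + " attempts"); // true after 3 attempts
  }
}
```

Note the deliberate asymmetry: checked exceptions are retried, while JVM Errors such as the OutOfMemoryError in the stack trace above fall through to whatever abort handling the caller installs.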





[jira] [Comment Edited] (HBASE-14123) HBase Backup/Restore Phase 2

2017-02-07 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857270#comment-15857270
 ] 

Vladimir Rodionov edited comment on HBASE-14123 at 2/8/17 2:56 AM:
---

{quote}
Can you please check or comment on the findbugs warnings and test failures.
{quote}

All findbugs warnings are in protobuf-generated code, but I will take a look at 
the failed UTs. I do not think they are related.

What kind of writeup do you expect, [~enis]?

Upd.

Test failures have nothing to do with the patch. It is a separate issue; two of 
them are fine now in current master, but TestScannerResource is still 
failing. 


was (Author: vrodionov):
{quote}
Can you please check or comment on the findbugs warnings and test failures.
{quote}

All findbugs are in protobuf - generated code, but I will take a look at failed 
UTs. Do not think they are related.

What kind of writeup do you expect, [~enis]

> HBase Backup/Restore Phase 2
> 
>
> Key: HBASE-14123
> URL: https://issues.apache.org/jira/browse/HBASE-14123
> Project: HBase
>  Issue Type: Umbrella
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: 14123-master.v14.txt, 14123-master.v15.txt, 
> 14123-master.v16.txt, 14123-master.v17.txt, 14123-master.v18.txt, 
> 14123-master.v19.txt, 14123-master.v20.txt, 14123-master.v21.txt, 
> 14123-master.v24.txt, 14123-master.v25.txt, 14123-master.v27.txt, 
> 14123-master.v28.txt, 14123-master.v29.full.txt, 14123-master.v2.txt, 
> 14123-master.v30.txt, 14123-master.v31.txt, 14123-master.v32.txt, 
> 14123-master.v33.txt, 14123-master.v34.txt, 14123-master.v35.txt, 
> 14123-master.v36.txt, 14123-master.v37.txt, 14123-master.v38.txt, 
> 14123.master.v39.patch, 14123-master.v3.txt, 14123.master.v40.patch, 
> 14123.master.v41.patch, 14123.master.v42.patch, 14123.master.v44.patch, 
> 14123.master.v45.patch, 14123.master.v46.patch, 14123.master.v48.patch, 
> 14123.master.v49.patch, 14123.master.v50.patch, 14123.master.v51.patch, 
> 14123.master.v52.patch, 14123.master.v54.patch, 14123.master.v56.patch, 
> 14123.master.v57.patch, 14123-master.v5.txt, 14123-master.v6.txt, 
> 14123-master.v7.txt, 14123-master.v8.txt, 14123-master.v9.txt, 14123-v14.txt, 
> HBASE-14123-for-7912-v1.patch, HBASE-14123-for-7912-v6.patch, 
> HBASE-14123-v10.patch, HBASE-14123-v11.patch, HBASE-14123-v12.patch, 
> HBASE-14123-v13.patch, HBASE-14123-v15.patch, HBASE-14123-v16.patch, 
> HBASE-14123-v1.patch, HBASE-14123-v2.patch, HBASE-14123-v3.patch, 
> HBASE-14123-v4.patch, HBASE-14123-v5.patch, HBASE-14123-v6.patch, 
> HBASE-14123-v7.patch, HBASE-14123-v9.patch
>
>
> Phase 2 umbrella JIRA. See HBASE-7912 for design document and description. 





[jira] [Commented] (HBASE-17599) Use mayHaveMoreCellsInRow instead of isPartial

2017-02-07 Thread Phil Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857278#comment-15857278
 ] 

Phil Yang commented on HBASE-17599:
---

{code}
* @deprecated the word 'partial' ambiguous, use {@link 
#mayHaveMoreCellsInRow()} instead.
* Deprecated since 1.4.0, will be removed in 2.0.0.
* @see #mayHaveMoreCellsInRow()
*/
@Deprecated
public boolean isPartial() 
{code}
Should we just remove it in the patch for the master branch?


> Use mayHaveMoreCellsInRow instead of isPartial
> --
>
> Key: HBASE-17599
> URL: https://issues.apache.org/jira/browse/HBASE-17599
> Project: HBase
>  Issue Type: Sub-task
>  Components: Client, scan
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17599.patch, HBASE-17599-v1.patch, 
> HBASE-17599-v2.patch
>
>
> For now if we set scan.allowPartial(true), the partial result returned will 
> have the partial flag set to true. But for scan.setBatch(xx), the partial 
> result returned will not be marked as partial.
> This is an incompatible change, indeed. But I do not think it will introduce 
> any issues, as we just provide more information to the client. The old partial 
> flag for a batched scan is always false, so I do not think anyone can make use 
> of it.
> This is very important for the limited scan to support partial results from 
> server. If we get a Result which partial flag is false then we know we get 
> the whole row. Otherwise we need to fetch one more row to see if the row key 
> is changed which causes the logic to be more complicated.
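The client-side simplification described above can be sketched with a stand-in Result class: accumulate cells while mayHaveMoreCellsInRow is true, and emit a complete row as soon as it is false, with no extra fetch to check whether the row key changed. Result here is a minimal mock, not the real HBase class.

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch of stitching partial Results into whole rows using the
// mayHaveMoreCellsInRow flag. Result is an illustrative stand-in.
public class PartialRowSketch {
  static class Result {
    final String row;
    final List<String> cells;
    final boolean mayHaveMoreCellsInRow;

    Result(String row, List<String> cells, boolean more) {
      this.row = row;
      this.cells = cells;
      this.mayHaveMoreCellsInRow = more;
    }
  }

  // Accumulates partial Results until the flag says the row is complete.
  static List<List<String>> stitchRows(List<Result> scanned) {
    List<List<String>> rows = new ArrayList<>();
    List<String> current = new ArrayList<>();
    for (Result r : scanned) {
      current.addAll(r.cells);
      if (!r.mayHaveMoreCellsInRow) { // row complete; no look-ahead fetch needed
        rows.add(current);
        current = new ArrayList<>();
      }
    }
    return rows;
  }

  public static void main(String[] args) {
    List<Result> scanned = List.of(
        new Result("r1", List.of("c1", "c2"), true),
        new Result("r1", List.of("c3"), false),
        new Result("r2", List.of("c1"), false));
    System.out.println(stitchRows(scanned)); // [[c1, c2, c3], [c1]]
  }
}
```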





[jira] [Commented] (HBASE-17605) Refactor procedure framework code

2017-02-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857277#comment-15857277
 ] 

Hadoop QA commented on HBASE-17605:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 27s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
0s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
19s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
23s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 0s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s 
{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
52s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 47s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
20s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
22s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
26m 59s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha2. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 52s 
{color} | {color:red} hbase-server generated 1 new + 0 unchanged - 0 fixed = 1 
total (was 0) {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 28s 
{color} | {color:red} hbase-server generated 2 new + 1 unchanged - 0 fixed = 3 
total (was 1) {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 41s 
{color} | {color:green} hbase-procedure in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 87m 6s 
{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
29s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 131m 6s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hbase-server |
|  |  Should org.apache.hadoop.hbase.master.procedure.SchemaLocking$Lock be a 
_static_ inner class?  At SchemaLocking.java:inner class?  At 
SchemaLocking.java:[lines 50-108] |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:8d52d23 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12851513/HBASE-17605.master.003.patch
 |
| JIRA Issue | HBASE-17605 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux d7572f85c08b 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 
15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-perso

[jira] [Commented] (HBASE-14123) HBase Backup/Restore Phase 2

2017-02-07 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857270#comment-15857270
 ] 

Vladimir Rodionov commented on HBASE-14123:
---

{quote}
Can you please check or comment on the findbugs warnings and test failures.
{quote}

All findbugs warnings are in protobuf-generated code, but I will take a look at 
the failed UTs. I do not think they are related.

What kind of writeup do you expect, [~enis]?

> HBase Backup/Restore Phase 2
> 
>
> Key: HBASE-14123
> URL: https://issues.apache.org/jira/browse/HBASE-14123
> Project: HBase
>  Issue Type: Umbrella
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: 14123-master.v14.txt, 14123-master.v15.txt, 
> 14123-master.v16.txt, 14123-master.v17.txt, 14123-master.v18.txt, 
> 14123-master.v19.txt, 14123-master.v20.txt, 14123-master.v21.txt, 
> 14123-master.v24.txt, 14123-master.v25.txt, 14123-master.v27.txt, 
> 14123-master.v28.txt, 14123-master.v29.full.txt, 14123-master.v2.txt, 
> 14123-master.v30.txt, 14123-master.v31.txt, 14123-master.v32.txt, 
> 14123-master.v33.txt, 14123-master.v34.txt, 14123-master.v35.txt, 
> 14123-master.v36.txt, 14123-master.v37.txt, 14123-master.v38.txt, 
> 14123.master.v39.patch, 14123-master.v3.txt, 14123.master.v40.patch, 
> 14123.master.v41.patch, 14123.master.v42.patch, 14123.master.v44.patch, 
> 14123.master.v45.patch, 14123.master.v46.patch, 14123.master.v48.patch, 
> 14123.master.v49.patch, 14123.master.v50.patch, 14123.master.v51.patch, 
> 14123.master.v52.patch, 14123.master.v54.patch, 14123.master.v56.patch, 
> 14123.master.v57.patch, 14123-master.v5.txt, 14123-master.v6.txt, 
> 14123-master.v7.txt, 14123-master.v8.txt, 14123-master.v9.txt, 14123-v14.txt, 
> HBASE-14123-for-7912-v1.patch, HBASE-14123-for-7912-v6.patch, 
> HBASE-14123-v10.patch, HBASE-14123-v11.patch, HBASE-14123-v12.patch, 
> HBASE-14123-v13.patch, HBASE-14123-v15.patch, HBASE-14123-v16.patch, 
> HBASE-14123-v1.patch, HBASE-14123-v2.patch, HBASE-14123-v3.patch, 
> HBASE-14123-v4.patch, HBASE-14123-v5.patch, HBASE-14123-v6.patch, 
> HBASE-14123-v7.patch, HBASE-14123-v9.patch
>
>
> Phase 2 umbrella JIRA. See HBASE-7912 for design document and description. 





[jira] [Commented] (HBASE-17437) Support specifying a WAL directory outside of the root directory

2017-02-07 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857263#comment-15857263
 ] 

Andrew Purtell commented on HBASE-17437:


Tests pass here too

> Support specifying a WAL directory outside of the root directory
> 
>
> Key: HBASE-17437
> URL: https://issues.apache.org/jira/browse/HBASE-17437
> Project: HBase
>  Issue Type: Improvement
>  Components: Filesystem Integration, wal
>Affects Versions: 1.2.4
>Reporter: Yishan Yang
>Assignee: Zach York
>  Labels: patch
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17437.branch-1.001.patch, 
> HBASE-17437.branch-1.002.patch, HBASE-17437.branch-1.003.patch, 
> HBASE-17437.branch-1.004.patch, hbase-17437-branch-1.2.patch, 
> HBASE-17437.master.001.patch, HBASE-17437.master.002.patch, 
> HBASE-17437.master.003.patch, HBASE-17437.master.004.patch, 
> HBASE-17437.master.005.patch, HBASE-17437.master.006.patch, 
> HBASE-17437.master.007.patch, HBASE-17437.master.008.patch, 
> HBASE-17437.master.009.patch, HBASE-17437.master.010.patch, 
> HBASE-17437.master.011.patch, HBASE-17437.master.012.patch, 
> hbase-17437-master.patch
>
>
> Currently, the WAL and the StoreFiles need to be on the same FileSystem. Some 
> FileSystems (such as Amazon S3) don’t support append or consistent writes. 
> These two properties are imperative for the WAL in order to avoid loss of 
> writes. However, StoreFiles don’t necessarily need the same consistency 
> guarantees (since writes are cached locally and if writes fail, they can 
> always be replayed from the WAL).
>  
> This JIRA aims to allow users to configure a log directory (for WALs) that is 
> outside of the root directory or even in a different FileSystem. The default 
> value will still put the log directory under the root directory.
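The intended usage would look roughly like the hbase-site.xml fragment below. The property name is an assumption, inferred from the HFileSystem.HBASE_WAL_DIR constant referenced in the patch, and the filesystem URIs are purely illustrative.

```xml
<!-- Hypothetical hbase-site.xml fragment: StoreFiles under the root dir on
     S3, WALs on HDFS. Property name hbase.wal.dir is assumed from the
     HFileSystem.HBASE_WAL_DIR constant in the patch; URIs are examples. -->
<property>
  <name>hbase.rootdir</name>
  <value>s3a://my-bucket/hbase</value>
</property>
<property>
  <name>hbase.wal.dir</name>
  <value>hdfs://namenode:8020/hbase-wal</value>
</property>
```

If the WAL property is left unset, the log directory stays under the root directory, preserving the current behavior.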





[jira] [Commented] (HBASE-14123) HBase Backup/Restore Phase 2

2017-02-07 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857253#comment-15857253
 ] 

Enis Soztutar commented on HBASE-14123:
---

bq. This JIRA: https://issues.apache.org/jira/browse/HBASE-16825 Contains 2 
follow up sub tasks and 3 related JIRAs.
This came up before, but still the way that this effort is tracked is not 
correct. 
There are a couple of things that we are not following in the "standard way". 
This is important because otherwise reviewers like me cannot figure out what is 
committed and where, and what is to follow once the merge is complete. 

First, when an issue is resolved with a fixVersion set, it means that the patch 
itself is available in the branch that will be released as that version. 
Looking at all 9 subtasks of HBASE-16825, they are marked as fixed with 
fixVersion 2.0.0. This is wrong, since they are not committed in the master 
code (or anywhere for that matter). 
If an issue is committed to a branch, it should be resolved with a fixVersion 
equal to the branch name. Once the branch is merged, then we have the option to 
add fixVersions. 

Second, when doing patch iterations, we do not track "addressing comments" with 
individual jiras. In that sense, the first 9 subtasks of HBASE-16825 
seem completely unnecessary. You should open a jira only in cases where the 
comments are not addressed and we need a jira for tracking those changes 
afterwards. 

Third, looking at the Phase 3 jiras, a bunch of them are in fixed state, again 
with a fixVersion of 2.0.0. Are any of these jiras committed anywhere? Some of 
them I can find in the branch. [~ted_yu], why are these jiras marked with 
2.0.0? 

Can you guys please mark all the jiras in the branch with 
fixVersion=HBASE-7912 (and not 2.0.0)? Once the merge is done, we can correct 
the fixVersions. 

Where is the jira for moving to the backup module? 

Can you please check or comment on the findbugs warnings and test failures? 
Also, please do the write-up (that I asked for above) for the other reviewers.  

> HBase Backup/Restore Phase 2
> 
>
> Key: HBASE-14123
> URL: https://issues.apache.org/jira/browse/HBASE-14123
> Project: HBase
>  Issue Type: Umbrella
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: 14123-master.v14.txt, 14123-master.v15.txt, 
> 14123-master.v16.txt, 14123-master.v17.txt, 14123-master.v18.txt, 
> 14123-master.v19.txt, 14123-master.v20.txt, 14123-master.v21.txt, 
> 14123-master.v24.txt, 14123-master.v25.txt, 14123-master.v27.txt, 
> 14123-master.v28.txt, 14123-master.v29.full.txt, 14123-master.v2.txt, 
> 14123-master.v30.txt, 14123-master.v31.txt, 14123-master.v32.txt, 
> 14123-master.v33.txt, 14123-master.v34.txt, 14123-master.v35.txt, 
> 14123-master.v36.txt, 14123-master.v37.txt, 14123-master.v38.txt, 
> 14123.master.v39.patch, 14123-master.v3.txt, 14123.master.v40.patch, 
> 14123.master.v41.patch, 14123.master.v42.patch, 14123.master.v44.patch, 
> 14123.master.v45.patch, 14123.master.v46.patch, 14123.master.v48.patch, 
> 14123.master.v49.patch, 14123.master.v50.patch, 14123.master.v51.patch, 
> 14123.master.v52.patch, 14123.master.v54.patch, 14123.master.v56.patch, 
> 14123.master.v57.patch, 14123-master.v5.txt, 14123-master.v6.txt, 
> 14123-master.v7.txt, 14123-master.v8.txt, 14123-master.v9.txt, 14123-v14.txt, 
> HBASE-14123-for-7912-v1.patch, HBASE-14123-for-7912-v6.patch, 
> HBASE-14123-v10.patch, HBASE-14123-v11.patch, HBASE-14123-v12.patch, 
> HBASE-14123-v13.patch, HBASE-14123-v15.patch, HBASE-14123-v16.patch, 
> HBASE-14123-v1.patch, HBASE-14123-v2.patch, HBASE-14123-v3.patch, 
> HBASE-14123-v4.patch, HBASE-14123-v5.patch, HBASE-14123-v6.patch, 
> HBASE-14123-v7.patch, HBASE-14123-v9.patch
>
>
> Phase 2 umbrella JIRA. See HBASE-7912 for design document and description. 





[jira] [Commented] (HBASE-17275) Assign timeout may cause region to be unassigned forever

2017-02-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857249#comment-15857249
 ] 

Hudson commented on HBASE-17275:


SUCCESS: Integrated in Jenkins build HBase-1.2-JDK8 #88 (See 
[https://builds.apache.org/job/HBase-1.2-JDK8/88/])
HBASE-17275 Assign timeout may cause region to be unassigned forever (tedyu: 
rev 2aaf7851a4de28e40b1a0d641d8fc98e54f5342d)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java


> Assign timeout may cause region to be unassigned forever
> 
>
> Key: HBASE-17275
> URL: https://issues.apache.org/jira/browse/HBASE-17275
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 1.2.3, 1.1.7
>Reporter: Allan Yang
>Assignee: Allan Yang
> Fix For: 1.4.0, 1.3.1, 1.2.5, 1.1.9
>
> Attachments: HBASE-17275-branch-1.patch, 
> HBASE-17275-branch-1.v2.patch, HBASE-17275-branch-1.v3.patch
>
>
> This is a real case that happened in my test cluster.
> I had more than 8000 regions to assign when I restarted a cluster, but I only 
> started one regionserver. That means the master needed to assign these 8000 
> regions to a single server (I know it is not right, but just for testing).
> The RS received the open-region RPC and began to open regions. But due to the 
> huge number of regions, the master timed out the RPC call (though some 
> regions had actually already opened) after 1 minute, as you can see from log 1.
> {noformat}
> 1. 2016-11-22 10:17:32,285 INFO  [example.org:30001.activeMasterManager] 
> master.AssignmentManager: Unable to communicate with 
> example.org,30003,1479780976834 in order to assign regions,
> java.io.IOException: Call to /example.org:30003 failed on local exception: 
> org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=1, waitTime=60001, 
> operationTimeout=6 expired.
> at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl.wrapException(RpcClientImpl.java:1338)
> at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1272)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:216)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:290)
> at 
> org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$BlockingStub.openRegion(AdminProtos.java:30177)
> at 
> org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:1000)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1719)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:2828)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:2775)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assignAllUserRegions(AssignmentManager.java:2876)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.processDeadServersAndRegionsInTransition(AssignmentManager.java:646)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.joinCluster(AssignmentManager.java:493)
> at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:796)
> at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:188)
> at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1711)
> at java.lang.Thread.run(Thread.java:756)
> Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=1, 
> waitTime=60001, operationTimeout=6 expired.
> at org.apache.hadoop.hbase.ipc.Call.checkAndSetTimeout(Call.java:81)
> at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1246)
> ... 14 more  
> {noformat}
> For region 7e9aee32eb98a6fc9d503b99fc5f9615 (like many others), after the 
> timeout the master used a pool to re-assign it, as in log 2.
> {noformat}
> 2. 2016-11-22 10:17:32,303 DEBUG [AM.-pool1-t26] master.AssignmentManager: 
> Force region state offline {7e9aee32eb98a6fc9d503b99fc5f9615 
> state=PENDING_OPEN, ts=1479780992078, server=example.org,30003,1479780976834} 
>  
> {noformat}
> But this region was actually opened on the RS; (maybe) due to the huge 
> pressure, the OPENED ZK event was received by the master late, as you can 
> tell from log 3: "which is more than 15 seconds late"
> {noformat}
> 3. 2016-11-22 10:17:32,304 DEBUG [AM.ZK.Worker-pool2-t3] 
> master.AssignmentManager: Handling RS_ZK_REGION_OPENED, 
> server=example.org,30003,1479780976834, 
> region=7e9aee32eb98a6fc9d503b99fc5f9615, which is more than 15 seconds late, 
> current_state={7e9aee32eb98a6fc9d503b99fc5f9615 state=PENDING_OPEN, 
> ts=1479780992078, server=example.org,3000

[jira] [Updated] (HBASE-17381) ReplicationSourceWorkerThread can die due to unhandled exceptions

2017-02-07 Thread Gary Helmling (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Helmling updated HBASE-17381:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 1.2.5
   1.3.1
   1.4.0
   2.0.0
   Status: Resolved  (was: Patch Available)

Committed to branch-1.2+.  This would require some substantial rework for 
branch-1.1.

Thanks for the fix [~openinx]!

> ReplicationSourceWorkerThread can die due to unhandled exceptions
> -
>
> Key: HBASE-17381
> URL: https://issues.apache.org/jira/browse/HBASE-17381
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Reporter: Gary Helmling
>Assignee: huzheng
> Fix For: 2.0.0, 1.4.0, 1.3.1, 1.2.5
>
> Attachments: HBASE-17381.patch, HBASE-17381.v1.patch, 
> HBASE-17381.v2.patch, HBASE-17381.v3.patch
>
>
> If a ReplicationSourceWorkerThread encounters an unexpected exception in the 
> run() method (for example failure to allocate direct memory for the DFS 
> client), the exception will be logged by the UncaughtExceptionHandler, but 
> the thread will also die and the replication queue will back up indefinitely 
> until the Regionserver is restarted.
> We should make sure the worker thread is resilient to all exceptions that it 
> can actually handle.  For those that it really can't, it seems better to 
> abort the regionserver rather than just allow replication to stop with 
> minimal signal.
> Here is a sample exception:
> {noformat}
> ERROR regionserver.ReplicationSource: Unexpected exception in 
> ReplicationSourceWorkerThread, 
> currentPath=hdfs://.../hbase/WALs/XXXwalfilenameXXX
> java.lang.OutOfMemoryError: Direct buffer memory
> at java.nio.Bits.reserveMemory(Bits.java:693)
> at java.nio.DirectByteBuffer.(DirectByteBuffer.java:123)
> at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
> at 
> org.apache.hadoop.crypto.CryptoOutputStream.(CryptoOutputStream.java:96)
> at 
> org.apache.hadoop.crypto.CryptoOutputStream.(CryptoOutputStream.java:113)
> at 
> org.apache.hadoop.crypto.CryptoOutputStream.(CryptoOutputStream.java:108)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil.createStreamPair(DataTransferSaslUtil.java:344)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:490)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getSaslStreams(SaslDataTransferClient.java:391)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:263)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:211)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.peerSend(SaslDataTransferClient.java:160)
> at 
> org.apache.hadoop.hdfs.net.TcpPeerServer.peerFromSocketAndKey(TcpPeerServer.java:92)
> at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3444)
> at 
> org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:778)
> at 
> org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:695)
> at 
> org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:356)
> at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:673)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:882)
> at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
> at java.io.DataInputStream.read(DataInputStream.java:100)
> at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:308)
> at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:276)
> at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:264)
> at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:423)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationWALReaderManager.openReader(ReplicationWALReaderManager.java:70)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$ReplicationSourceWorkerThread.openReader(ReplicationSource.java:830)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$ReplicationSourceWorkerThread.run(ReplicationSource.java:572)
> {noformat}
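The distinction drawn above, between exceptions the worker can handle and fatal ones that deserve an abort, can be sketched as below. This is a minimal illustration, not HBase code; `runOnce` and the `abortServer` hook are hypothetical names.

```java
// Sketch: keep the worker alive through recoverable exceptions, but
// escalate fatal Errors (e.g. OutOfMemoryError) by aborting loudly
// instead of letting the replication queue back up silently.
class ResilientWorker {
    /** Runs one unit of work; returns false only when a fatal Error
     *  forced an abort, so the caller can stop the loop. */
    static boolean runOnce(Runnable task, Runnable abortServer) {
        try {
            task.run();
            return true;
        } catch (RuntimeException e) {
            System.err.println("Recoverable, will retry: " + e);
            return true;                // worker thread survives
        } catch (Error e) {
            abortServer.run();          // loud signal, not a silent stall
            return false;
        }
    }

    public static void main(String[] args) {
        boolean[] aborted = {false};
        runOnce(() -> { throw new RuntimeException("transient I/O"); },
                () -> { aborted[0] = true; });
        System.out.println("aborted after RuntimeException? " + aborted[0]);
        runOnce(() -> { throw new OutOfMemoryError("Direct buffer memory"); },
                () -> { aborted[0] = true; });
        System.out.println("aborted after OutOfMemoryError? " + aborted[0]);
    }
}
```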



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17437) Support specifying a WAL directory outside of the root directory

2017-02-07 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857224#comment-15857224
 ] 

Enis Soztutar commented on HBASE-17437:
---

I was checking the branch-1 patch. 
We are not doing this in the master patch: 
{code}
+checkRootDir(this.rootdir, conf, this.fs, HConstants.HBASE_DIR, 
HBASE_DIR_PERMS);
+// if the log directory is different from root, check if it exists
+if (!this.walRootDir.equals(this.rootdir)) {
+  checkRootDir(this.walRootDir, conf, this.walFs, 
HFileSystem.HBASE_WAL_DIR, HBASE_WAL_DIR_PERMS);
+}
{code}
We only check the root dir. Is this check needed in both? Seems likely. We can 
do an addendum to master if needed. 

Do we need to check both the root dir and the WAL dir here (and in the checks 
below)? 
{code}
-if (this.fsOk) {
+if (this.walFsOk) {
{code}
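The two checks under discussion reduce to a small rule: validate the root dir always, and the WAL dir only when it differs. A sketch under stated assumptions (plain strings stand in for `Path`, and `dirsToCheck` is a hypothetical helper, not HBase API):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: which directories need a checkRootDir-style validation.
class WalDirCheck {
    static List<String> dirsToCheck(String rootDir, String walRootDir) {
        List<String> dirs = new ArrayList<>();
        dirs.add(rootDir);                 // root dir is always checked
        if (!walRootDir.equals(rootDir)) {
            dirs.add(walRootDir);          // WAL dir lives elsewhere: check it too
        }
        return dirs;
    }

    public static void main(String[] args) {
        // Default setup: WAL under root, single check.
        System.out.println(dirsToCheck("hdfs://nn/hbase", "hdfs://nn/hbase"));
        // Split setup (e.g. S3 root, HDFS WAL): both get checked.
        System.out.println(dirsToCheck("s3://bucket/hbase", "hdfs://nn/hbase-wal"));
    }
}
```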





> Support specifying a WAL directory outside of the root directory
> 
>
> Key: HBASE-17437
> URL: https://issues.apache.org/jira/browse/HBASE-17437
> Project: HBase
>  Issue Type: Improvement
>  Components: Filesystem Integration, wal
>Affects Versions: 1.2.4
>Reporter: Yishan Yang
>Assignee: Zach York
>  Labels: patch
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17437.branch-1.001.patch, 
> HBASE-17437.branch-1.002.patch, HBASE-17437.branch-1.003.patch, 
> HBASE-17437.branch-1.004.patch, hbase-17437-branch-1.2.patch, 
> HBASE-17437.master.001.patch, HBASE-17437.master.002.patch, 
> HBASE-17437.master.003.patch, HBASE-17437.master.004.patch, 
> HBASE-17437.master.005.patch, HBASE-17437.master.006.patch, 
> HBASE-17437.master.007.patch, HBASE-17437.master.008.patch, 
> HBASE-17437.master.009.patch, HBASE-17437.master.010.patch, 
> HBASE-17437.master.011.patch, HBASE-17437.master.012.patch, 
> hbase-17437-master.patch
>
>
> Currently, the WAL and the StoreFiles need to be on the same FileSystem. Some 
> FileSystems (such as Amazon S3) don’t support append or consistent writes. 
> These two properties are imperative for the WAL in order to avoid loss of 
> writes. However, StoreFiles don’t necessarily need the same consistency 
> guarantees (since writes are cached locally and if writes fail, they can 
> always be replayed from the WAL).
>  
> This JIRA aims to allow users to configure a log directory (for WALs) that is 
> outside of the root directory or even in a different FileSystem. The default 
> value will still put the log directory under the root directory.





[jira] [Commented] (HBASE-17278) [C++] Cell Scanner and KeyValueCodec for encoding cells in RPC

2017-02-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857222#comment-15857222
 ] 

Hadoop QA commented on HBASE-17278:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 7s {color} 
| {color:red} HBASE-17278 does not apply to master. Rebase required? Wrong 
Branch? See https://yetus.apache.org/documentation/0.3.0/precommit-patchnames 
for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12851516/hbase-17287_v6.patch |
| JIRA Issue | HBASE-17278 |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/5623/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> [C++] Cell Scanner and KeyValueCodec for encoding cells in RPC
> --
>
> Key: HBASE-17278
> URL: https://issues.apache.org/jira/browse/HBASE-17278
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Sudeep Sunthankar
>Assignee: Sudeep Sunthankar
> Attachments: HBASE-17278.HBASE-14850.v1.patch, 
> HBASE-17278.HBASE-14850.v2.patch, HBASE-17278.HBASE-14850.v3.patch, 
> HBASE-17278.HBASE-14850.v4.patch, hbase-17278_v5.patch, hbase-17287_v6.patch
>
>






[jira] [Commented] (HBASE-17275) Assign timeout may cause region to be unassigned forever

2017-02-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857221#comment-15857221
 ] 

Hudson commented on HBASE-17275:


FAILURE: Integrated in Jenkins build HBase-1.3-JDK8 #108 (See 
[https://builds.apache.org/job/HBase-1.3-JDK8/108/])
HBASE-17275 Assign timeout may cause region to be unassigned forever (tedyu: 
rev 6391c53e9f47355ced07758ff08879cdcbf49d15)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java


> Assign timeout may cause region to be unassigned forever
> 
>
> Key: HBASE-17275
> URL: https://issues.apache.org/jira/browse/HBASE-17275
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 1.2.3, 1.1.7
>Reporter: Allan Yang
>Assignee: Allan Yang
> Fix For: 1.4.0, 1.3.1, 1.2.5, 1.1.9
>
> Attachments: HBASE-17275-branch-1.patch, 
> HBASE-17275-branch-1.v2.patch, HBASE-17275-branch-1.v3.patch
>
>
> This is a real case that happened in my test cluster.
> I had more than 8000 regions to assign when I restarted a cluster, but I only 
> started one regionserver. That means the master needed to assign these 8000 
> regions to a single server (I know it is not right, but just for testing).
> The RS received the open-region RPC and began to open regions. But due to the 
> huge number of regions, the master timed out the RPC call after 1 minute 
> (though some regions had actually already opened), as you can see from log 1.
> {noformat}
> 1. 2016-11-22 10:17:32,285 INFO  [example.org:30001.activeMasterManager] 
> master.AssignmentManager: Unable to communicate with 
> example.org,30003,1479780976834 in order to assign regions,
> java.io.IOException: Call to /example.org:30003 failed on local exception: 
> org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=1, waitTime=60001, 
> operationTimeout=6 expired.
> at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl.wrapException(RpcClientImpl.java:1338)
> at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1272)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:216)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:290)
> at 
> org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$BlockingStub.openRegion(AdminProtos.java:30177)
> at 
> org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:1000)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1719)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:2828)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:2775)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assignAllUserRegions(AssignmentManager.java:2876)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.processDeadServersAndRegionsInTransition(AssignmentManager.java:646)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.joinCluster(AssignmentManager.java:493)
> at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:796)
> at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:188)
> at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1711)
> at java.lang.Thread.run(Thread.java:756)
> Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=1, 
> waitTime=60001, operationTimeout=6 expired.
> at org.apache.hadoop.hbase.ipc.Call.checkAndSetTimeout(Call.java:81)
> at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1246)
> ... 14 more  
> {noformat}
> For the region 7e9aee32eb98a6fc9d503b99fc5f9615 (like many others), after the 
> timeout, the master used a pool to re-assign them, as in log 2:
> {noformat}
> 2. 2016-11-22 10:17:32,303 DEBUG [AM.-pool1-t26] master.AssignmentManager: 
> Force region state offline {7e9aee32eb98a6fc9d503b99fc5f9615 
> state=PENDING_OPEN, ts=1479780992078, server=example.org,30003,1479780976834} 
>  
> {noformat}
> This region was actually opened on the RS, but (maybe) due to the huge 
> pressure, the OPENED ZK event was received late by the master, as you can 
> tell from log 3, "which is more than 15 seconds late":
> {noformat}
> 3. 2016-11-22 10:17:32,304 DEBUG [AM.ZK.Worker-pool2-t3] 
> master.AssignmentManager: Handling RS_ZK_REGION_OPENED, 
> server=example.org,30003,1479780976834, 
> region=7e9aee32eb98a6fc9d503b99fc5f9615, which is more than 15 seconds late, 
> current_state={7e9aee32eb98a6fc9d503b99fc5f9615 state=PENDING_OPEN, 
> ts=1479780992078, server=example.org,30

[jira] [Commented] (HBASE-17599) Use mayHaveMoreCellsInRow instead of isPartial

2017-02-07 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857220#comment-15857220
 ] 

Duo Zhang commented on HBASE-17599:
---

Any other concerns? Thanks.

> Use mayHaveMoreCellsInRow instead of isPartial
> --
>
> Key: HBASE-17599
> URL: https://issues.apache.org/jira/browse/HBASE-17599
> Project: HBase
>  Issue Type: Sub-task
>  Components: Client, scan
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17599.patch, HBASE-17599-v1.patch, 
> HBASE-17599-v2.patch
>
>
> For now if we set scan.allowPartial(true), the partial result returned will 
> have the partial flag set to true. But for scan.setBatch(xx), the partial 
> result returned will not be marked as partial.
> This is an incompatible change, indeed. But I do not think it will introduce 
> any issues, as we just provide more information to the client. The old partial 
> flag for batched scans was always false, so I do not think anyone can make use 
> of it.
> This is very important for the limited scan to support partial results from 
> server. If we get a Result which partial flag is false then we know we get 
> the whole row. Otherwise we need to fetch one more row to see if the row key 
> is changed which causes the logic to be more complicated.
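The simplification the flag buys the client can be sketched as follows; `Result` here is a stand-in class, not the HBase client type. With a reliable `mayHaveMoreCellsInRow`, a row is known to be complete without fetching the next row and comparing row keys:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PartialRows {
    // Stand-in for the client Result: some cells plus the completeness flag.
    static class Result {
        final List<String> cells;
        final boolean mayHaveMoreCellsInRow;
        Result(List<String> cells, boolean more) {
            this.cells = cells;
            this.mayHaveMoreCellsInRow = more;
        }
    }

    /** Stitch a stream of (possibly partial) Results back into whole rows. */
    static List<List<String>> assembleRows(List<Result> results) {
        List<List<String>> rows = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (Result r : results) {
            current.addAll(r.cells);
            if (!r.mayHaveMoreCellsInRow) {  // row complete: no lookahead needed
                rows.add(current);
                current = new ArrayList<>();
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Result> scanned = Arrays.asList(
            new Result(Arrays.asList("r1:a", "r1:b"), true),   // partial
            new Result(Arrays.asList("r1:c"), false),          // row r1 done
            new Result(Arrays.asList("r2:a"), false));         // row r2 done
        System.out.println(assembleRows(scanned));
    }
}
```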





[jira] [Commented] (HBASE-16731) Inconsistent results from the Get/Scan if we use the empty FilterList

2017-02-07 Thread Pankaj Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857218#comment-15857218
 ] 

Pankaj Kumar commented on HBASE-16731:
--

Can we backport this fix to branch-1/1.x.x?

> Inconsistent results from the Get/Scan if we use the empty FilterList
> -
>
> Key: HBASE-16731
> URL: https://issues.apache.org/jira/browse/HBASE-16731
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: ChiaPing Tsai
>Assignee: ChiaPing Tsai
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-16731.v0.patch, HBASE-16731.v1.patch, 
> HBASE-16731.v2.patch, HBASE-16731.v3.patch
>
>
> RSRpcServices#get() converts the Get to a Scan without 
> scan#setLoadColumnFamiliesOnDemand. This causes the results retrieved from 
> Get and Scan to differ if we use an empty filter: Scan doesn't return any 
> data but Get does.
> see [HBASE-16729 |https://issues.apache.org/jira/browse/HBASE-16729]
> Any comments? Thanks.
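The shape of the inconsistency can be sketched with stand-in types (not the real `Get`/`Scan` classes): the Get-to-Scan conversion must copy every read attribute, including the on-demand column-family flag, or the two paths evaluate an empty filter under different settings:

```java
class GetToScan {
    static class Get  { Object filter; boolean loadColumnFamiliesOnDemand; }
    static class Scan { Object filter; boolean loadColumnFamiliesOnDemand; }

    static Scan toScan(Get get) {
        Scan scan = new Scan();
        scan.filter = get.filter;
        // The line whose absence caused the inconsistency: without it the
        // Scan path runs with a different on-demand setting than the Get.
        scan.loadColumnFamiliesOnDemand = get.loadColumnFamiliesOnDemand;
        return scan;
    }

    public static void main(String[] args) {
        Get get = new Get();
        get.loadColumnFamiliesOnDemand = true;
        System.out.println("flag copied: " + toScan(get).loadColumnFamiliesOnDemand);
    }
}
```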





[jira] [Updated] (HBASE-17590) Drop cache hint should work for StoreFile write path

2017-02-07 Thread Ashu Pachauri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashu Pachauri updated HBASE-17590:
--
Summary: Drop cache hint should work for StoreFile write path  (was: Drop 
cache hint should work for StoreFileWriter)

> Drop cache hint should work for StoreFile write path
> 
>
> Key: HBASE-17590
> URL: https://issues.apache.org/jira/browse/HBASE-17590
> Project: HBase
>  Issue Type: Bug
>Reporter: Appy
>Assignee: Ashu Pachauri
>
> We have this in the code right now.
> {noformat}
> public Builder withShouldDropCacheBehind(boolean 
> shouldDropCacheBehind/*NOT USED!!*/) {
>   // TODO: HAS NO EFFECT!!! FIX!!
>   return this;
> }
> {noformat}
> Creating jira to track it.
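A minimal sketch of what the fix needs to do: record the hint instead of dropping it. Field and method names beyond the quoted one are illustrative, not the actual StoreFile writer builder API.

```java
// Sketch: the builder keeps the drop-cache hint and the build step consumes it.
class StoreFileWriterBuilderSketch {
    private boolean shouldDropCacheBehind;   // previously discarded

    StoreFileWriterBuilderSketch withShouldDropCacheBehind(boolean drop) {
        this.shouldDropCacheBehind = drop;   // actually store the hint
        return this;                         // preserve the fluent style
    }

    boolean dropCacheBehind() {
        return shouldDropCacheBehind;        // read at build/write time
    }

    public static void main(String[] args) {
        StoreFileWriterBuilderSketch b =
            new StoreFileWriterBuilderSketch().withShouldDropCacheBehind(true);
        System.out.println("hint kept: " + b.dropCacheBehind());
    }
}
```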





[jira] [Commented] (HBASE-17590) Drop cache hint should work for StoreFileWriter

2017-02-07 Thread Ashu Pachauri (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857162#comment-15857162
 ] 

Ashu Pachauri commented on HBASE-17590:
---

Picking this up

> Drop cache hint should work for StoreFileWriter
> ---
>
> Key: HBASE-17590
> URL: https://issues.apache.org/jira/browse/HBASE-17590
> Project: HBase
>  Issue Type: Bug
>Reporter: Appy
>Assignee: Ashu Pachauri
>
> We have this in the code right now.
> {noformat}
> public Builder withShouldDropCacheBehind(boolean 
> shouldDropCacheBehind/*NOT USED!!*/) {
>   // TODO: HAS NO EFFECT!!! FIX!!
>   return this;
> }
> {noformat}
> Creating jira to track it.





[jira] [Assigned] (HBASE-17590) Drop cache hint should work for StoreFileWriter

2017-02-07 Thread Ashu Pachauri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashu Pachauri reassigned HBASE-17590:
-

Assignee: Ashu Pachauri

> Drop cache hint should work for StoreFileWriter
> ---
>
> Key: HBASE-17590
> URL: https://issues.apache.org/jira/browse/HBASE-17590
> Project: HBase
>  Issue Type: Bug
>Reporter: Appy
>Assignee: Ashu Pachauri
>
> We have this in the code right now.
> {noformat}
> public Builder withShouldDropCacheBehind(boolean 
> shouldDropCacheBehind/*NOT USED!!*/) {
>   // TODO: HAS NO EFFECT!!! FIX!!
>   return this;
> }
> {noformat}
> Creating jira to track it.





[jira] [Commented] (HBASE-17275) Assign timeout may cause region to be unassigned forever

2017-02-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857135#comment-15857135
 ] 

Hudson commented on HBASE-17275:


FAILURE: Integrated in Jenkins build HBase-1.2-IT #591 (See 
[https://builds.apache.org/job/HBase-1.2-IT/591/])
HBASE-17275 Assign timeout may cause region to be unassigned forever (tedyu: 
rev 2aaf7851a4de28e40b1a0d641d8fc98e54f5342d)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java


> Assign timeout may cause region to be unassigned forever
> 
>
> Key: HBASE-17275
> URL: https://issues.apache.org/jira/browse/HBASE-17275
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 1.2.3, 1.1.7
>Reporter: Allan Yang
>Assignee: Allan Yang
> Fix For: 1.4.0, 1.3.1, 1.2.5, 1.1.9
>
> Attachments: HBASE-17275-branch-1.patch, 
> HBASE-17275-branch-1.v2.patch, HBASE-17275-branch-1.v3.patch
>
>
> This is a real case that happened in my test cluster.
> I had more than 8000 regions to assign when I restarted a cluster, but I only 
> started one regionserver. That means the master needed to assign these 8000 
> regions to a single server (I know it is not right, but just for testing).
> The RS received the open-region RPC and began to open regions. But due to the 
> huge number of regions, the master timed out the RPC call after 1 minute 
> (though some regions had actually already opened), as you can see from log 1.
> {noformat}
> 1. 2016-11-22 10:17:32,285 INFO  [example.org:30001.activeMasterManager] 
> master.AssignmentManager: Unable to communicate with 
> example.org,30003,1479780976834 in order to assign regions,
> java.io.IOException: Call to /example.org:30003 failed on local exception: 
> org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=1, waitTime=60001, 
> operationTimeout=6 expired.
> at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl.wrapException(RpcClientImpl.java:1338)
> at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1272)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:216)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:290)
> at 
> org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$BlockingStub.openRegion(AdminProtos.java:30177)
> at 
> org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:1000)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1719)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:2828)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:2775)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assignAllUserRegions(AssignmentManager.java:2876)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.processDeadServersAndRegionsInTransition(AssignmentManager.java:646)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.joinCluster(AssignmentManager.java:493)
> at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:796)
> at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:188)
> at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1711)
> at java.lang.Thread.run(Thread.java:756)
> Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=1, 
> waitTime=60001, operationTimeout=6 expired.
> at org.apache.hadoop.hbase.ipc.Call.checkAndSetTimeout(Call.java:81)
> at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1246)
> ... 14 more  
> {noformat}
> For the region 7e9aee32eb98a6fc9d503b99fc5f9615 (like many others), after the 
> timeout, the master used a pool to re-assign them, as in log 2:
> {noformat}
> 2. 2016-11-22 10:17:32,303 DEBUG [AM.-pool1-t26] master.AssignmentManager: 
> Force region state offline {7e9aee32eb98a6fc9d503b99fc5f9615 
> state=PENDING_OPEN, ts=1479780992078, server=example.org,30003,1479780976834} 
>  
> {noformat}
> This region was actually opened on the RS, but (maybe) due to the huge 
> pressure, the OPENED ZK event was received late by the master, as you can 
> tell from log 3, "which is more than 15 seconds late":
> {noformat}
> 3. 2016-11-22 10:17:32,304 DEBUG [AM.ZK.Worker-pool2-t3] 
> master.AssignmentManager: Handling RS_ZK_REGION_OPENED, 
> server=example.org,30003,1479780976834, 
> region=7e9aee32eb98a6fc9d503b99fc5f9615, which is more than 15 seconds late, 
> current_state={7e9aee32eb98a6fc9d503b99fc5f9615 state=PENDING_OPEN, 
> ts=1479780992078, server=example.org,30003,

[jira] [Updated] (HBASE-17585) [C++] Use KVCodec in the RPC request/response

2017-02-07 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-17585:
--
Attachment: hbase-17585_v2.patch

v2 patch, which is significantly different from v1. 
 - Service is changed from {{unique_ptr to Response}} to  
{{unique_ptr to unique_ptr}}. This is needed since we do not 
want to copy-construct the Response when passing it around.
- Codec is initialized from higher levels, then passed to RpcClient and below 
layers (this is needed for the BUCK module dependency). 
- Response has a CellScanner now coming from the rpc layers
- ResponseConverter can create Results from GetResponse or ScanResponse with 
optional CellScanner. 

[~sudeeps] do you mind taking a quick look? 
[~xiaobingo] FYI, this changes some of the RPC layer signatures. 

> [C++] Use KVCodec in the RPC request/response
> -
>
> Key: HBASE-17585
> URL: https://issues.apache.org/jira/browse/HBASE-17585
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Enis Soztutar
> Attachments: hbase-17585_v1.patch, hbase-17585_v2.patch
>
>
> After HBASE-17278, we need to start using the KVCodec in RPC, so that we do 
> not serialize Cells via PB back and forth. 





[jira] [Assigned] (HBASE-17585) [C++] Use KVCodec in the RPC request/response

2017-02-07 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar reassigned HBASE-17585:
-

Assignee: Enis Soztutar

> [C++] Use KVCodec in the RPC request/response
> -
>
> Key: HBASE-17585
> URL: https://issues.apache.org/jira/browse/HBASE-17585
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Attachments: hbase-17585_v1.patch, hbase-17585_v2.patch
>
>
> After HBASE-17278, we need to start using the KVCodec in RPC, so that we do 
> not serialize Cells via PB back and forth. 





[jira] [Commented] (HBASE-17609) Allow for region merging in the UI

2017-02-07 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857125#comment-15857125
 ] 

Lars Hofhansl commented on HBASE-17609:
---

Let's do one thing at a time: the basic functionality first, then we can make 
it look nice (unless the making-it-look-nice is simple).

> Allow for region merging in the UI 
> ---
>
> Key: HBASE-17609
> URL: https://issues.apache.org/jira/browse/HBASE-17609
> Project: HBase
>  Issue Type: Task
>Affects Versions: 2.0.0, 1.4.0
>Reporter: churro morales
>Assignee: churro morales
> Attachments: HBASE-17609-branch-1.3.patch, HBASE-17609.patch
>
>
> HBASE-49 discussed having the ability to merge regions through the HBase UI, 
> but online region merging wasn't around back then. 
> I have created additional form fields for the table.jsp where you can pass in 
> two encoded region names (must be adjacent regions) and a merge can be called 
> through the UI. 





[jira] [Resolved] (HBASE-17565) StochasticLoadBalancer may incorrectly skip balancing due to skewed multiplier sum

2017-02-07 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved HBASE-17565.

Resolution: Fixed

> StochasticLoadBalancer may incorrectly skip balancing due to skewed 
> multiplier sum
> --
>
> Key: HBASE-17565
> URL: https://issues.apache.org/jira/browse/HBASE-17565
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Critical
> Fix For: 2.0.0, 1.4.0
>
> Attachments: 17565.addendum, 17565.v1.txt, 17565.v2.txt, 
> 17565.v3.txt, 17565.v4.txt, 17565.v5.txt, 17565.v6.txt
>
>
> I was investigating why a 6 node cluster kept skipping balancing requests.
> Here were the region counts on the servers:
> 449, 448, 447, 449, 453, 0
> {code}
> 2017-01-26 22:04:47,145 INFO  
> [RpcServer.deafult.FPBQ.Fifo.handler=1,queue=0,port=16000] 
> balancer.StochasticLoadBalancer: Skipping load balancing because balanced 
> cluster; total cost is 127.0171157050385, sum multiplier is 111087.0 min cost 
> which need balance is 0.05
> {code}
> The big multiplier sum caught my eye. Here is what additional debug logging 
> showed:
> {code}
> 2017-01-27 23:25:31,749 DEBUG 
> [RpcServer.deafult.FPBQ.Fifo.handler=9,queue=0,port=16000] 
> balancer.StochasticLoadBalancer: class 
> org.apache.hadoop.hbase.master.balancer.  
> StochasticLoadBalancer$RegionReplicaHostCostFunction with multiplier 10.0
> 2017-01-27 23:25:31,749 DEBUG 
> [RpcServer.deafult.FPBQ.Fifo.handler=9,queue=0,port=16000] 
> balancer.StochasticLoadBalancer: class 
> org.apache.hadoop.hbase.master.balancer.  
> StochasticLoadBalancer$RegionReplicaRackCostFunction with multiplier 1.0
> {code}
> Note, however, that no table in the cluster used read replicas.
> I can think of two ways of fixing this situation:
> 1. If there is no read replica in the cluster, ignore the multipliers for the 
> above two functions.
> 2. When cost() returned by the CostFunction is 0 (or very very close to 0.0), 
> ignore the multiplier.
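Option 2 above can be sketched as a tiny change to how the normalized cost is computed: drop a function's multiplier from the denominator when its cost is effectively zero, so inert functions (like the RegionReplica* ones on a cluster without read replicas) cannot dilute real imbalance. The epsilon and method name are illustrative, not the actual balancer code.

```java
class BalancerCostSketch {
    static final double EPSILON = 1e-9;

    /** Weighted-average cost, ignoring functions whose cost is ~0. */
    static double normalizedCost(double[] costs, double[] multipliers) {
        double total = 0.0, sumMultiplier = 0.0;
        for (int i = 0; i < costs.length; i++) {
            if (costs[i] < EPSILON) {
                continue;   // skip no-op functions: e.g. replica costs with no replicas
            }
            total += multipliers[i] * costs[i];
            sumMultiplier += multipliers[i];
        }
        return sumMultiplier == 0.0 ? 0.0 : total / sumMultiplier;
    }

    public static void main(String[] args) {
        // One real imbalance signal, one inert function with a huge multiplier.
        double[] costs = {0.5, 0.0};
        double[] multipliers = {2.0, 10000.0};
        // Without the skip, the huge multiplier would drown out the 0.5.
        System.out.println(normalizedCost(costs, multipliers));
    }
}
```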





[jira] [Commented] (HBASE-16942) FavoredNodes - Balancer improvements

2017-02-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857112#comment-15857112
 ] 

Hadoop QA commented on HBASE-16942:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 8 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
42s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
37s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
33s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
35s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 33s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
38s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
24m 25s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha2. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
41s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 79m 30s 
{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 115m 4s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:8d52d23 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12851486/HBASE-16942.master.003.patch
 |
| JIRA Issue | HBASE-16942 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux d312b7993f37 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / c55fce0 |
| Default Java | 1.8.0_121 |
| findbugs | v3.0.0 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/5621/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/5621/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |





> FavoredNodes - Balancer improvements
> 
>
> Key: HBASE-16942
> URL: https://issues.apache.org/jira/browse/HBASE-16942
> Project: HBase
>  Issue Type: Sub-task
>  Components: FavoredNodes
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
> Fix For: 2.0.0
>
> Attachments: HBASE-16942.master.001.patch, 
> HBASE-16

[jira] [Updated] (HBASE-17278) [C++] Cell Scanner and KeyValueCodec for encoding cells in RPC

2017-02-07 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-17278:
--
Attachment: hbase-17287_v6.patch

v6 patch moves the classes to the serde module. This was needed because of BUCK 
build module dependencies (the connection module cannot use the core module). 



> [C++] Cell Scanner and KeyValueCodec for encoding cells in RPC
> --
>
> Key: HBASE-17278
> URL: https://issues.apache.org/jira/browse/HBASE-17278
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Sudeep Sunthankar
>Assignee: Sudeep Sunthankar
> Attachments: HBASE-17278.HBASE-14850.v1.patch, 
> HBASE-17278.HBASE-14850.v2.patch, HBASE-17278.HBASE-14850.v3.patch, 
> HBASE-17278.HBASE-14850.v4.patch, hbase-17278_v5.patch, hbase-17287_v6.patch
>
>






[jira] [Updated] (HBASE-17612) [C++] Set client version info in RPC header

2017-02-07 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-17612:
--
Attachment: hbase-17612_v1.patch

v1 patch. When we hook up the build system, we may be able to clean up some of 
the code generation / copying bits, but this should do for now. 

> [C++] Set client version info in RPC header
> ---
>
> Key: HBASE-17612
> URL: https://issues.apache.org/jira/browse/HBASE-17612
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Attachments: hbase-17612_v1.patch
>
>
> We need to set the RPC header version info in the RPC Header to use the 
> KVCodec in get path. 
> This is needed after HBASE-13158 (and a couple others where they check the 
> client version). 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17275) Assign timeout may cause region to be unassigned forever

2017-02-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857099#comment-15857099
 ] 

Hudson commented on HBASE-17275:


SUCCESS: Integrated in Jenkins build HBase-1.1-JDK7 #1839 (See 
[https://builds.apache.org/job/HBase-1.1-JDK7/1839/])
HBASE-17275 Assign timeout may cause region to be unassigned forever (tedyu: 
rev a8158b550053aa72815577f6e77786ed590c4817)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java


> Assign timeout may cause region to be unassigned forever
> 
>
> Key: HBASE-17275
> URL: https://issues.apache.org/jira/browse/HBASE-17275
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 1.2.3, 1.1.7
>Reporter: Allan Yang
>Assignee: Allan Yang
> Fix For: 1.4.0, 1.3.1, 1.2.5, 1.1.9
>
> Attachments: HBASE-17275-branch-1.patch, 
> HBASE-17275-branch-1.v2.patch, HBASE-17275-branch-1.v3.patch
>
>
> This is a real case that happened in my test cluster.
> I had more than 8000 regions to assign when I restarted a cluster, but I 
> only started one regionserver. That means the master needed to assign these 
> 8000 regions to a single server (I know it is not right, but just for 
> testing).
> The rs received the open region rpc and began to open regions. But due to 
> the huge number of regions, the master timed out the rpc call (though some 
> regions had actually already opened) after 1 minute, as you can see from 
> log 1.
> {noformat}
> 1. 2016-11-22 10:17:32,285 INFO  [example.org:30001.activeMasterManager] 
> master.AssignmentManager: Unable to communicate with 
> example.org,30003,1479780976834 in order to assign regions,
> java.io.IOException: Call to /example.org:30003 failed on local exception: 
> org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=1, waitTime=60001, 
> operationTimeout=6 expired.
> at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl.wrapException(RpcClientImpl.java:1338)
> at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1272)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:216)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:290)
> at 
> org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$BlockingStub.openRegion(AdminProtos.java:30177)
> at 
> org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:1000)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1719)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:2828)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:2775)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assignAllUserRegions(AssignmentManager.java:2876)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.processDeadServersAndRegionsInTransition(AssignmentManager.java:646)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.joinCluster(AssignmentManager.java:493)
> at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:796)
> at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:188)
> at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1711)
> at java.lang.Thread.run(Thread.java:756)
> Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=1, 
> waitTime=60001, operationTimeout=6 expired.
> at org.apache.hadoop.hbase.ipc.Call.checkAndSetTimeout(Call.java:81)
> at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1246)
> ... 14 more  
> {noformat}
> For the region 7e9aee32eb98a6fc9d503b99fc5f9615 (like many others), after 
> the timeout, the master used a pool to re-assign them, as in 2:
> {noformat}
> 2. 2016-11-22 10:17:32,303 DEBUG [AM.-pool1-t26] master.AssignmentManager: 
> Force region state offline {7e9aee32eb98a6fc9d503b99fc5f9615 
> state=PENDING_OPEN, ts=1479780992078, server=example.org,30003,1479780976834} 
>  
> {noformat}
> But this region was actually opened on the rs; (maybe) due to the huge 
> pressure, the OPENED zk event was received late by the master, as you can 
> tell from 3: "which is more than 15 seconds late".
> {noformat}
> 3. 2016-11-22 10:17:32,304 DEBUG [AM.ZK.Worker-pool2-t3] 
> master.AssignmentManager: Handling RS_ZK_REGION_OPENED, 
> server=example.org,30003,1479780976834, 
> region=7e9aee32eb98a6fc9d503b99fc5f9615, which is more than 15 seconds late, 
> current_state={7e9aee32eb98a6fc9d503b99fc5f9615 state=PENDING_OPEN, 
> ts=1479780992078, server=example.org,
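The failure mode described in this report — a client-side RPC timeout that does not stop the server-side work — can be reproduced in miniature. The sketch below is a hypothetical model, not HBase code: the "master" gives up on the call after its timeout, yet the "regionserver" task still completes, leaving the two sides with different views of the region state.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model (not HBase code) of the race above: the caller's RPC
// times out, but the server-side work still completes afterwards.
public class TimeoutRace {
    static final AtomicBoolean regionOpened = new AtomicBoolean(false);

    // Simulates a slow openRegion call on the regionserver side.
    static Runnable slowOpen(long openMillis) {
        return () -> {
            try {
                Thread.sleep(openMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            regionOpened.set(true); // the region really does open
        };
    }

    // Returns true if the caller saw a timeout even though the open completed.
    static boolean callWithTimeout(long timeoutMillis, long openMillis)
            throws Exception {
        ExecutorService rs = Executors.newSingleThreadExecutor();
        Future<?> call = rs.submit(slowOpen(openMillis));
        boolean timedOut = false;
        try {
            call.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            timedOut = true; // the "master" gives up; it does NOT stop the open
        }
        rs.shutdown();
        rs.awaitTermination(5, TimeUnit.SECONDS); // the open finishes anyway
        return timedOut && regionOpened.get();
    }
}
```

With a 50 ms timeout and a 200 ms "open", the caller reliably times out while the open still succeeds — the same divergence between master state and regionserver state that the logs above show.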

[jira] [Commented] (HBASE-17574) Clean up how to run tests under hbase-spark module

2017-02-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857090#comment-15857090
 ] 

Hudson commented on HBASE-17574:


SUCCESS: Integrated in Jenkins build HBase-Trunk_matrix #2463 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/2463/])
HBASE-17574 Clean up how to run tests under hbase-spark module (Yi (jerryjch: 
rev 8088aa3733539a09cb258f98cb12c1d96ea2463a)
* (edit) hbase-spark/README.txt
* (edit) hbase-spark/pom.xml


> Clean up how to run tests under hbase-spark module 
> ---
>
> Key: HBASE-17574
> URL: https://issues.apache.org/jira/browse/HBASE-17574
> Project: HBase
>  Issue Type: Bug
>  Components: spark
>Affects Versions: 2.0.0
>Reporter: Yi Liang
>Assignee: Yi Liang
> Fix For: 2.0.0
>
> Attachments: HBase-17574-V1.patch, HBase-17574-V2.patch
>
>
> In the master branch, the tests of the hbase-spark module need clean-up.
> I think we need to let hbase-spark follow the rules that exist in the whole 
> hbase project.
> 1. In hbase-spark, all the scala test cases are regarded as integration 
> tests, i.e. we need to go to the hbase-spark folder and use mvn verify to 
> run them. I think these tests had better be regarded as unit tests for the 
> following reasons:
> (1) All the scala tests are very small; most of them can be finished within 
> 20s.
> (2) Integration tests usually go into the hbase-it module, not into their 
> own module.
> (3) Hadoop QA could not run those scala tests in hbase-spark; I guess Hadoop 
> QA only runs mvn test under the root dir, whereas hbase-spark needs mvn 
> verify.
> (4) From its pom.xml below, you can see that both the integration-test and 
> test executions are bound to the same test goal. From the MVN reference, 
> http://maven.apache.org/guides/introduction/introduction-to-the-lifecycle.html#Built-in_Lifecycle_Bindings,
>  we know that if a goal is bound to one or more build phases, that goal will 
> be called in all those phases. It means that mvn test and mvn 
> integration-test would do the same thing, except that <skip>true</skip> in 
> the test phase just disables the mvn test command. It is uncommon to have a 
> definition like that.
> {code}
> <executions>
>   <execution>
>     <id>test</id>
>     <phase>test</phase>
>     <goals>
>       <goal>test</goal>
>     </goals>
>     <configuration>
>       <skip>true</skip>
>     </configuration>
>   </execution>
>   <execution>
>     <id>integration-test</id>
>     <phase>integration-test</phase>
>     <goals>
>       <goal>test</goal>
>     </goals>
>     <configuration>
>       <suffixes>Integration-Test</suffixes>
>       <argLine>-Xmx1536m -XX:MaxPermSize=512m 
> -XX:ReservedCodeCacheSize=512m</argLine>
>       <skip>false</skip>
>     </configuration>
>   </execution>
> </executions>
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17565) StochasticLoadBalancer may incorrectly skip balancing due to skewed multiplier sum

2017-02-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857089#comment-15857089
 ] 

Hudson commented on HBASE-17565:


SUCCESS: Integrated in Jenkins build HBase-Trunk_matrix #2463 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/2463/])
HBASE-17565 StochasticLoadBalancer may incorrectly skip balancing due to 
(tedyu: rev d0498d979cdb9aa17065c27572f35a80fc7d59c9)
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestStochasticLoadBalancer.java


> StochasticLoadBalancer may incorrectly skip balancing due to skewed 
> multiplier sum
> --
>
> Key: HBASE-17565
> URL: https://issues.apache.org/jira/browse/HBASE-17565
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Critical
> Fix For: 2.0.0, 1.4.0
>
> Attachments: 17565.addendum, 17565.v1.txt, 17565.v2.txt, 
> 17565.v3.txt, 17565.v4.txt, 17565.v5.txt, 17565.v6.txt
>
>
> I was investigating why a 6 node cluster kept skipping balancing requests.
> Here were the region counts on the servers:
> 449, 448, 447, 449, 453, 0
> {code}
> 2017-01-26 22:04:47,145 INFO  
> [RpcServer.deafult.FPBQ.Fifo.handler=1,queue=0,port=16000] 
> balancer.StochasticLoadBalancer: Skipping load balancing because balanced 
> cluster; total cost is 127.0171157050385, sum multiplier is 111087.0 min cost 
> which need balance is 0.05
> {code}
> The big multiplier sum caught my eye. Here is what additional debug logging 
> showed:
> {code}
> 2017-01-27 23:25:31,749 DEBUG 
> [RpcServer.deafult.FPBQ.Fifo.handler=9,queue=0,port=16000] 
> balancer.StochasticLoadBalancer: class 
> org.apache.hadoop.hbase.master.balancer.  
> StochasticLoadBalancer$RegionReplicaHostCostFunction with multiplier 10.0
> 2017-01-27 23:25:31,749 DEBUG 
> [RpcServer.deafult.FPBQ.Fifo.handler=9,queue=0,port=16000] 
> balancer.StochasticLoadBalancer: class 
> org.apache.hadoop.hbase.master.balancer.  
> StochasticLoadBalancer$RegionReplicaRackCostFunction with multiplier 1.0
> {code}
> Note, however, that no table in the cluster used read replicas.
> I can think of two ways of fixing this situation:
> 1. If there is no read replica in the cluster, ignore the multipliers for the 
> above two functions.
> 2. When cost() returned by the CostFunction is 0 (or very very close to 0.0), 
> ignore the multiplier.
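The balancer's total cost is the weighted sum of per-function costs divided by the sum of multipliers, which is why an unused cost function with a large multiplier can drown out real imbalance. Below is a minimal sketch of proposal (2) above — ignoring a function's multiplier when its cost is (near) zero. The names are made up for illustration; this is not the actual StochasticLoadBalancer code.

```java
// Hypothetical sketch of the weighted-cost idea above, with proposal (2)
// applied: a cost function whose cost is (near) zero contributes neither its
// weighted cost nor its multiplier to the normalizing sum.
public class WeightedCost {
    static final double EPSILON = 0.0001;

    // costs[i] in [0, 1]; multipliers[i] >= 0. Returns the normalized total.
    static double totalCost(double[] costs, double[] multipliers) {
        double weighted = 0.0;
        double multiplierSum = 0.0;
        for (int i = 0; i < costs.length; i++) {
            if (costs[i] < EPSILON) {
                continue; // proposal (2): skip the multiplier of a zero cost
            }
            weighted += multipliers[i] * costs[i];
            multiplierSum += multipliers[i];
        }
        return multiplierSum == 0.0 ? 0.0 : weighted / multiplierSum;
    }
}
```

With this change, a huge replica multiplier paired with a zero replica cost no longer dilutes the total, so a genuinely imbalanced cluster (like the 449/448/447/449/453/0 case above) scores above the balance threshold.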



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-15995) Separate replication WAL reading from shipping

2017-02-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857088#comment-15857088
 ] 

Hudson commented on HBASE-15995:


SUCCESS: Integrated in Jenkins build HBase-Trunk_matrix #2463 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/2463/])
HBASE-15995 Separate replication WAL reading from shipping (tedyu: rev 
c55fce00f3a5757c706ce50ed42980c7f6c4b97b)
* (add) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/ClusterMarkingEntryFilter.java
* (add) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/WALEntryStream.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
* (add) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceWALReaderThread.java
* (delete) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationWALReaderManager.java
* (add) 
hbase-server/src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestWALEntryStream.java
* (delete) 
hbase-server/src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationWALReaderManager.java


> Separate replication WAL reading from shipping
> --
>
> Key: HBASE-15995
> URL: https://issues.apache.org/jira/browse/HBASE-15995
> Project: HBase
>  Issue Type: Sub-task
>  Components: Replication
>Affects Versions: 2.0.0
>Reporter: Vincent Poon
>Assignee: Vincent Poon
> Fix For: 2.0.0
>
> Attachments: HBASE-15995.master.v1.patch, 
> HBASE-15995.master.v2.patch, HBASE-15995.master.v3.patch, 
> HBASE-15995.master.v4.patch, HBASE-15995.master.v6.patch, 
> HBASE-15995.master.v7.patch, replicationV1_100ms_delay.png, 
> replicationV2_100ms_delay.png
>
>
> Currently ReplicationSource reads edits from the WAL and ships them in the 
> same thread.
> By breaking out the reading from the shipping, we can introduce greater 
> parallelism and lay the foundation for further refactoring to a pipelined, 
> streaming model.
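The reader/shipper split described above is, in essence, a bounded producer/consumer pipeline. Here is a minimal sketch under assumed names (ReaderShipper is a hypothetical stand-in, not the patch's ReplicationSourceWALReaderThread): one thread reads WAL entries into a bounded queue, another ships them, so a slow sink applies backpressure instead of stalling the read loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical minimal shape of the reader/shipper split described above.
public class ReaderShipper {
    static List<String> replicate(List<String> walEntries) throws Exception {
        final BlockingQueue<String> queue = new LinkedBlockingQueue<>(1000);
        final List<String> shipped = new ArrayList<>();
        final String POISON = "\0EOF"; // end-of-stream marker

        Thread reader = new Thread(() -> { // producer: reads the WAL
            try {
                for (String e : walEntries) {
                    queue.put(e); // blocks when queue is full (backpressure)
                }
                queue.put(POISON);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
        });
        Thread shipper = new Thread(() -> { // consumer: "ships" entries
            try {
                while (true) {
                    String e = queue.take();
                    if (e.equals(POISON)) break;
                    shipped.add(e);
                }
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
        });
        reader.start();
        shipper.start();
        reader.join();
        shipper.join();
        return shipped;
    }
}
```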



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17275) Assign timeout may cause region to be unassigned forever

2017-02-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857086#comment-15857086
 ] 

Hudson commented on HBASE-17275:


SUCCESS: Integrated in Jenkins build HBase-1.3-JDK7 #96 (See 
[https://builds.apache.org/job/HBase-1.3-JDK7/96/])
HBASE-17275 Assign timeout may cause region to be unassigned forever (tedyu: 
rev 6391c53e9f47355ced07758ff08879cdcbf49d15)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java


> Assign timeout may cause region to be unassigned forever
> 
>
> Key: HBASE-17275
> URL: https://issues.apache.org/jira/browse/HBASE-17275
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 1.2.3, 1.1.7
>Reporter: Allan Yang
>Assignee: Allan Yang
> Fix For: 1.4.0, 1.3.1, 1.2.5, 1.1.9
>
> Attachments: HBASE-17275-branch-1.patch, 
> HBASE-17275-branch-1.v2.patch, HBASE-17275-branch-1.v3.patch
>
>

[jira] [Updated] (HBASE-17605) Refactor procedure framework code

2017-02-07 Thread Appy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Appy updated HBASE-17605:
-
Attachment: HBASE-17605.master.003.patch

> Refactor procedure framework code
> -
>
> Key: HBASE-17605
> URL: https://issues.apache.org/jira/browse/HBASE-17605
> Project: HBase
>  Issue Type: Improvement
>  Components: proc-v2
>Reporter: Appy
>Assignee: Appy
> Attachments: HBASE-17605.master.001.patch, 
> HBASE-17605.master.002.patch, HBASE-17605.master.003.patch, 
> without-patch.png, with-patch.png
>
>
> - Moved locks out of MasterProcedureScheduler#Queue. One Queue object is 
> used for each namespace/table, and there aren't more than 100 of them, so 
> we avoid the complexity arising from all functionality being in one place. 
> MasterProcedureLocking#Lock is the new locking class.
> - Removed NamespaceQueue because it wasn't being used as a Queue 
> (add, peek, poll, etc. functions threw UnsupportedOperationException). It 
> was only used for locks on namespaces. Now that locks have been moved out 
> of the Queue class, it's not needed anymore.
> - Removed RegionEvent, which was there only for locking on regions. 
> Tables/namespaces used locking from the Queue class and regions couldn't 
> (there is no separate proc queue at the region level), hence the 
> redundancy. Now that locking is separate, we can use the same for regions 
> too.
> - Removed QueueInterface class. No declarations, except one 
> implementation, which makes the point of having an interface moot.
> - Removed QueueImpl, which was the only concrete implementation of 
> abstract Queue class. Moved functions to Queue class itself to avoid 
> unnecessary level in inheritance hierarchy.
> - Removed ProcedureEventQueue class which was just a wrapper around 
> ArrayDeque class.
> - Encapsulated table priority related stuff in a single class.
> - Removed some unused functions.
> *Perf using MasterProcedureSchedulerPerformanceEvaluation*
> 10 threads, 10M ops, 5 tables
> Without patch:
> 10 regions/table : #yield 584980, addBack time 4.1s, poll time 10s
> 1M regions/table: #yield 16, addBack time 5.9s, poll time 12.9s
> With patch:
> 10 regions/table : #yield 86413, addBack time 4.1s, poll time 8.2s
> 1M regions/table: #yield 9, addBack time 6s, poll time 13s
> *Memory footprint and CPU* (don't compare GC as that depends on life of 
> objects which will be much longer in real-world scenarios)
> Without patch
> !without-patch.png|width=800!
> With patch
> !with-patch.png|width=800!
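The core of the refactor — lock bookkeeping pulled out of the per-table Queue into a standalone class keyed by resource — can be sketched as follows. Class and method names here are hypothetical stand-ins, not the patch's actual MasterProcedureLocking#Lock API.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the refactor above: lock bookkeeping lives in
// its own class keyed by resource, so tables, namespaces, and regions all
// reuse it instead of inheriting it from the per-table Queue.
public class SchedulerSketch {
    // Stand-in for the standalone lock class.
    static class Lock {
        private long exclusiveOwner = -1; // procedure id holding the lock

        boolean tryExclusive(long procId) {
            if (exclusiveOwner != -1 && exclusiveOwner != procId) {
                return false; // held by another procedure
            }
            exclusiveOwner = procId; // re-entrant for the same procedure
            return true;
        }

        void releaseExclusive() {
            exclusiveOwner = -1;
        }
    }

    // Locks exist for any resource; queues exist only per table.
    final Map<String, Lock> locks = new HashMap<>();
    final Map<String, ArrayDeque<Long>> queues = new HashMap<>();

    Lock lockFor(String resource) {
        return locks.computeIfAbsent(resource, k -> new Lock());
    }
}
```

The point of the separation: a region has no Queue of its own, but it can still get a Lock from the same map as a table or namespace.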



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17275) Assign timeout may cause region to be unassigned forever

2017-02-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857060#comment-15857060
 ] 

Hudson commented on HBASE-17275:


SUCCESS: Integrated in Jenkins build HBase-1.1-JDK8 #1923 (See 
[https://builds.apache.org/job/HBase-1.1-JDK8/1923/])
HBASE-17275 Assign timeout may cause region to be unassigned forever (tedyu: 
rev a8158b550053aa72815577f6e77786ed590c4817)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java


> Assign timeout may cause region to be unassigned forever
> 
>
> Key: HBASE-17275
> URL: https://issues.apache.org/jira/browse/HBASE-17275
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 1.2.3, 1.1.7
>Reporter: Allan Yang
>Assignee: Allan Yang
> Fix For: 1.4.0, 1.3.1, 1.2.5, 1.1.9
>
> Attachments: HBASE-17275-branch-1.patch, 
> HBASE-17275-branch-1.v2.patch, HBASE-17275-branch-1.v3.patch
>
>

[jira] [Commented] (HBASE-17612) [C++] Set client version info in RPC header

2017-02-07 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857022#comment-15857022
 ] 

Enis Soztutar commented on HBASE-17612:
---

We need this for HBASE-17585 and HBASE-17278. 

> [C++] Set client version info in RPC header
> ---
>
> Key: HBASE-17612
> URL: https://issues.apache.org/jira/browse/HBASE-17612
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
>
> We need to set the RPC header version info in the RPC Header to use the 
> KVCodec in get path. 
> This is needed after HBASE-13158 (and a couple others where they check the 
> client version). 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17612) [C++] Set client version info in RPC header

2017-02-07 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857020#comment-15857020
 ] 

Enis Soztutar commented on HBASE-17612:
---

[~saint@gmail.com] this is for the C++ client. Sorry for the confusion. 

> [C++] Set client version info in RPC header
> ---
>
> Key: HBASE-17612
> URL: https://issues.apache.org/jira/browse/HBASE-17612
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
>
> We need to set the RPC header version info in the RPC Header to use the 
> KVCodec in get path. 
> This is needed after HBASE-13158 (and a couple others where they check the 
> client version). 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17275) Assign timeout may cause region to be unassigned forever

2017-02-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15856985#comment-15856985
 ] 

Hudson commented on HBASE-17275:


FAILURE: Integrated in Jenkins build HBase-1.3-IT #825 (See 
[https://builds.apache.org/job/HBase-1.3-IT/825/])
HBASE-17275 Assign timeout may cause region to be unassigned forever (tedyu: 
rev 2aaf7851a4de28e40b1a0d641d8fc98e54f5342d)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java


> Assign timeout may cause region to be unassigned forever
> 
>
> Key: HBASE-17275
> URL: https://issues.apache.org/jira/browse/HBASE-17275
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 1.2.3, 1.1.7
>Reporter: Allan Yang
>Assignee: Allan Yang
> Fix For: 1.4.0, 1.3.1, 1.2.5, 1.1.9
>
> Attachments: HBASE-17275-branch-1.patch, 
> HBASE-17275-branch-1.v2.patch, HBASE-17275-branch-1.v3.patch
>
>

[jira] [Commented] (HBASE-17612) [C++] Set client version info in RPC header

2017-02-07 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15856958#comment-15856958
 ] 

stack commented on HBASE-17612:
---

We do this already [~enis], or do we not do it enough?

See VersionInfo field in ConnectionHeader...

// This is sent on connection setup after the connection preamble is sent.
message ConnectionHeader {
  optional UserInformation user_info = 1;
  optional string service_name = 2;
  // Cell block codec we will use sending over optional cell blocks.  Server
  // throws exception if cannot deal.  Null means no codec'ing going on so we
  // are pb all the time (SLOW!!!)
  optional string cell_block_codec_class = 3;
  // Compressor we will use if cell block is compressed.  Server will throw
  // exception if not supported.  Class must implement hadoop's CompressionCodec
  // Interface.  Can't compress if no codec.
  optional string cell_block_compressor_class = 4;
  optional VersionInfo version_info = 5;
  // the transformation for rpc AES encryption with Apache Commons Crypto
  optional string rpc_crypto_cipher_transformation = 6;
}
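As context for the version check being discussed: once version_info is present in the ConnectionHeader, the server side can gate features (such as a cell codec) on the reported client version. The sketch below only illustrates that comparison; the class, method names, and threshold version are made up, and this is not HBase's actual implementation.

```java
// Illustrative sketch of gating a feature on the client version carried
// in ConnectionHeader.version_info. Not the real HBase code; it only
// shows the "is the client new enough?" comparison.
public class VersionGate {
  // Returns true if "major.minor.patch" version a >= version b.
  // Missing components are treated as 0, so "2.0" == "2.0.0".
  static boolean atLeast(String a, String b) {
    String[] x = a.split("\\.");
    String[] y = b.split("\\.");
    for (int i = 0; i < Math.max(x.length, y.length); i++) {
      int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
      int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
      if (xi != yi) {
        return xi > yi;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // A server could enable the codec only for clients reporting a recent
    // enough version (the threshold here is arbitrary).
    System.out.println(atLeast("1.3.1", "1.3.0")); // true
    System.out.println(atLeast("1.2.4", "1.3.0")); // false
  }
}
```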

> [C++] Set client version info in RPC header
> ---
>
> Key: HBASE-17612
> URL: https://issues.apache.org/jira/browse/HBASE-17612
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
>
> We need to set the version info in the RPC header in order to use the 
> KVCodec in the get path. 
> This is needed after HBASE-13158 (and a couple others where they check the 
> client version). 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-16942) FavoredNodes - Balancer improvements

2017-02-07 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-16942:
-
Attachment: HBASE-16942.master.003.patch

> FavoredNodes - Balancer improvements
> 
>
> Key: HBASE-16942
> URL: https://issues.apache.org/jira/browse/HBASE-16942
> Project: HBase
>  Issue Type: Sub-task
>  Components: FavoredNodes
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
> Fix For: 2.0.0
>
> Attachments: HBASE-16942.master.001.patch, 
> HBASE-16942.master.002.patch, HBASE-16942.master.003.patch, 
> HBASE_16942_rough_draft.patch
>
>
> This deals with the balancer based enhancements to favored nodes patch as 
> discussed in HBASE-15532.





[jira] [Commented] (HBASE-17609) Allow for region merging in the UI

2017-02-07 Thread Lars George (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15856926#comment-15856926
 ] 

Lars George commented on HBASE-17609:
-

What would be nice is to have checkboxes next to the regions, tick what you 
want (for merging usually regions that are next to each other) and then merge 
(or split or compact) the selected regions. The current page is sooo 1999. 

> Allow for region merging in the UI 
> ---
>
> Key: HBASE-17609
> URL: https://issues.apache.org/jira/browse/HBASE-17609
> Project: HBase
>  Issue Type: Task
>Affects Versions: 2.0.0, 1.4.0
>Reporter: churro morales
>Assignee: churro morales
> Attachments: HBASE-17609-branch-1.3.patch, HBASE-17609.patch
>
>
> HBASE-49 discussed having the ability to merge regions through the HBase UI, 
> but online region merging wasn't around back then. 
> I have created additional form fields for the table.jsp where you can pass in 
> two encoded region names (must be adjacent regions) and a merge can be called 
> through the UI. 





[jira] [Created] (HBASE-17612) [C++] Set client version info in RPC header

2017-02-07 Thread Enis Soztutar (JIRA)
Enis Soztutar created HBASE-17612:
-

 Summary: [C++] Set client version info in RPC header
 Key: HBASE-17612
 URL: https://issues.apache.org/jira/browse/HBASE-17612
 Project: HBase
  Issue Type: Sub-task
Reporter: Enis Soztutar
Assignee: Enis Soztutar


We need to set the version info in the RPC header in order to use the KVCodec 
in the get path. 

This is needed after HBASE-13158 (and a couple others where they check the 
client version). 





[jira] [Commented] (HBASE-17609) Allow for region merging in the UI

2017-02-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15856895#comment-15856895
 ] 

Hadoop QA commented on HBASE-17609:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
41s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
37s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
24m 31s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha2. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 79m 21s 
{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
15s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 109m 11s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:8d52d23 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12851454/HBASE-17609.patch |
| JIRA Issue | HBASE-17609 |
| Optional Tests |  asflicense  javac  javadoc  unit  |
| uname | Linux 312b2f8e9f19 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 6c5eec2 |
| Default Java | 1.8.0_121 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/5620/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/5620/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> Allow for region merging in the UI 
> ---
>
> Key: HBASE-17609
> URL: https://issues.apache.org/jira/browse/HBASE-17609
> Project: HBase
>  Issue Type: Task
>Affects Versions: 2.0.0, 1.4.0
>Reporter: churro morales
>Assignee: churro morales
> Attachments: HBASE-17609-branch-1.3.patch, HBASE-17609.patch
>
>
> HBASE-49 discussed having the ability to merge regions through the HBase UI, 
> but online region merging wasn't around back then. 
> I have created additional form fields for the table.jsp where you can pass in 
> two encoded region names (must be adjacent regions) and a merge can be called 
> through the UI. 





[jira] [Commented] (HBASE-17442) Move most of the replication related classes to hbase-server package

2017-02-07 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15856859#comment-15856859
 ] 

Enis Soztutar commented on HBASE-17442:
---

Sorry, I missed this. Agreed with the plan. However, if the replication module 
seems too difficult to do before 2.0, I think we should still move these classes 
to hbase-server at least. Having them in the client should be avoided. 

> Move most of the replication related classes to hbase-server package
> 
>
> Key: HBASE-17442
> URL: https://issues.apache.org/jira/browse/HBASE-17442
> Project: HBase
>  Issue Type: Sub-task
>  Components: build, Replication
>Affects Versions: 2.0.0
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
> Fix For: 2.0.0
>
> Attachments: 0001-hbase-replication-module.patch
>
>
> After the replication requests are routed through the master, replication 
> implementation details no longer need to be exposed to the client. We should 
> move most of the replication-related classes to the hbase-server package.





[jira] [Created] (HBASE-17611) Thrift 2 per-call latency metrics are capped at ~ 2 seconds

2017-02-07 Thread Gary Helmling (JIRA)
Gary Helmling created HBASE-17611:
-

 Summary: Thrift 2 per-call latency metrics are capped at ~ 2 
seconds
 Key: HBASE-17611
 URL: https://issues.apache.org/jira/browse/HBASE-17611
 Project: HBase
  Issue Type: Bug
  Components: metrics, Thrift
Reporter: Gary Helmling
Assignee: Gary Helmling
 Fix For: 1.3.1


Thrift 2 latency metrics are measured in nanoseconds.  However, the duration 
used for per-method latencies is cast to an int, meaning the values are capped 
at 2.147 seconds.  Let's use a long instead.
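The overflow described above can be shown with plain arithmetic: Integer.MAX_VALUE nanoseconds is about 2.147 seconds, and casting a larger nanosecond duration to int silently truncates to 32 bits. The class and method names below are illustrative, not the actual Thrift metrics code.

```java
// Demonstrates why nanosecond latencies must stay in a long: a 3-second
// duration overflows a 32-bit int and wraps negative.
public class LatencyCast {
  // Buggy variant: the (int) cast truncates to the low 32 bits.
  static int castToInt(long durationNanos) {
    return (int) durationNanos;
  }

  // Fixed variant: keep the full 64-bit value.
  static long keepAsLong(long durationNanos) {
    return durationNanos;
  }

  public static void main(String[] args) {
    long threeSeconds = 3_000_000_000L;            // 3 s in nanoseconds
    System.out.println(castToInt(threeSeconds));   // wraps negative
    System.out.println(keepAsLong(threeSeconds));  // correct
    System.out.println(Integer.MAX_VALUE);         // 2147483647 ns ~= 2.147 s
  }
}
```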





[jira] [Updated] (HBASE-17275) Assign timeout may cause region to be unassigned forever

2017-02-07 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-17275:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 1.1.9
   1.2.5
   1.3.1
   1.4.0
   Status: Resolved  (was: Patch Available)

Thanks for the patch, Allan.

Thanks for the review, Stephen.

> Assign timeout may cause region to be unassigned forever
> 
>
> Key: HBASE-17275
> URL: https://issues.apache.org/jira/browse/HBASE-17275
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 1.2.3, 1.1.7
>Reporter: Allan Yang
>Assignee: Allan Yang
> Fix For: 1.4.0, 1.3.1, 1.2.5, 1.1.9
>
> Attachments: HBASE-17275-branch-1.patch, 
> HBASE-17275-branch-1.v2.patch, HBASE-17275-branch-1.v3.patch
>
>
> This is a real case that happened in my test cluster.
> I had more than 8000 regions to assign when I restarted the cluster, but I 
> only started one regionserver. That means the master needed to assign these 
> 8000 regions to a single server (I know it is not right, but just for testing).
> The RS received the open-region RPC and began to open regions. But due to the 
> huge number of regions, the master timed out the RPC call (though some regions 
> had actually already opened) after 1 minute, as you can see from log 1.
> {noformat}
> 1. 2016-11-22 10:17:32,285 INFO  [example.org:30001.activeMasterManager] 
> master.AssignmentManager: Unable to communicate with 
> example.org,30003,1479780976834 in order to assign regions,
> java.io.IOException: Call to /example.org:30003 failed on local exception: 
> org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=1, waitTime=60001, 
> operationTimeout=6 expired.
> at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl.wrapException(RpcClientImpl.java:1338)
> at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1272)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:216)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:290)
> at 
> org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$BlockingStub.openRegion(AdminProtos.java:30177)
> at 
> org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:1000)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1719)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:2828)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:2775)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assignAllUserRegions(AssignmentManager.java:2876)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.processDeadServersAndRegionsInTransition(AssignmentManager.java:646)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.joinCluster(AssignmentManager.java:493)
> at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:796)
> at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:188)
> at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1711)
> at java.lang.Thread.run(Thread.java:756)
> Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=1, 
> waitTime=60001, operationTimeout=6 expired.
> at org.apache.hadoop.hbase.ipc.Call.checkAndSetTimeout(Call.java:81)
> at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1246)
> ... 14 more  
> {noformat}
> For the region 7e9aee32eb98a6fc9d503b99fc5f9615 (like many others), after the 
> timeout, the master used a pool to re-assign them, as in 2:
> {noformat}
> 2. 2016-11-22 10:17:32,303 DEBUG [AM.-pool1-t26] master.AssignmentManager: 
> Force region state offline {7e9aee32eb98a6fc9d503b99fc5f9615 
> state=PENDING_OPEN, ts=1479780992078, server=example.org,30003,1479780976834} 
>  
> {noformat}
> But this region had actually been opened on the RS; (maybe) due to the huge 
> pressure, the OPENED ZK event was received by the master late, as you can tell 
> from 3: "which is more than 15 seconds late"
> {noformat}
> 3. 2016-11-22 10:17:32,304 DEBUG [AM.ZK.Worker-pool2-t3] 
> master.AssignmentManager: Handling RS_ZK_REGION_OPENED, 
> server=example.org,30003,1479780976834, 
> region=7e9aee32eb98a6fc9d503b99fc5f9615, which is more than 15 seconds late, 
> current_state={7e9aee32eb98a6fc9d503b99fc5f9615 state=PENDING_OPEN, 
> ts=1479780992078, server=example.org,30003,1479780976834}
> {noformat}
> In the meantime, the master was still trying to re-assign this region in 
> another thread. Master 

[jira] [Updated] (HBASE-17275) Assign timeout may cause region to be unassigned forever

2017-02-07 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-17275:
---
Summary: Assign timeout may cause region to be unassigned forever  (was: 
Assign timeout cause region unassign forever)

> Assign timeout may cause region to be unassigned forever
> 
>
> Key: HBASE-17275
> URL: https://issues.apache.org/jira/browse/HBASE-17275
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 1.2.3, 1.1.7
>Reporter: Allan Yang
>Assignee: Allan Yang
> Attachments: HBASE-17275-branch-1.patch, 
> HBASE-17275-branch-1.v2.patch, HBASE-17275-branch-1.v3.patch
>
>
> This is a real case that happened in my test cluster.
> I had more than 8000 regions to assign when I restarted the cluster, but I 
> only started one regionserver. That means the master needed to assign these 
> 8000 regions to a single server (I know it is not right, but just for testing).
> The RS received the open-region RPC and began to open regions. But due to the 
> huge number of regions, the master timed out the RPC call (though some regions 
> had actually already opened) after 1 minute, as you can see from log 1.
> {noformat}
> 1. 2016-11-22 10:17:32,285 INFO  [example.org:30001.activeMasterManager] 
> master.AssignmentManager: Unable to communicate with 
> example.org,30003,1479780976834 in order to assign regions,
> java.io.IOException: Call to /example.org:30003 failed on local exception: 
> org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=1, waitTime=60001, 
> operationTimeout=6 expired.
> at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl.wrapException(RpcClientImpl.java:1338)
> at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1272)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:216)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:290)
> at 
> org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$BlockingStub.openRegion(AdminProtos.java:30177)
> at 
> org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:1000)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1719)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:2828)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:2775)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assignAllUserRegions(AssignmentManager.java:2876)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.processDeadServersAndRegionsInTransition(AssignmentManager.java:646)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.joinCluster(AssignmentManager.java:493)
> at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:796)
> at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:188)
> at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1711)
> at java.lang.Thread.run(Thread.java:756)
> Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=1, 
> waitTime=60001, operationTimeout=6 expired.
> at org.apache.hadoop.hbase.ipc.Call.checkAndSetTimeout(Call.java:81)
> at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1246)
> ... 14 more  
> {noformat}
> For the region 7e9aee32eb98a6fc9d503b99fc5f9615 (like many others), after the 
> timeout, the master used a pool to re-assign them, as in 2:
> {noformat}
> 2. 2016-11-22 10:17:32,303 DEBUG [AM.-pool1-t26] master.AssignmentManager: 
> Force region state offline {7e9aee32eb98a6fc9d503b99fc5f9615 
> state=PENDING_OPEN, ts=1479780992078, server=example.org,30003,1479780976834} 
>  
> {noformat}
> But this region had actually been opened on the RS; (maybe) due to the huge 
> pressure, the OPENED ZK event was received by the master late, as you can tell 
> from 3: "which is more than 15 seconds late"
> {noformat}
> 3. 2016-11-22 10:17:32,304 DEBUG [AM.ZK.Worker-pool2-t3] 
> master.AssignmentManager: Handling RS_ZK_REGION_OPENED, 
> server=example.org,30003,1479780976834, 
> region=7e9aee32eb98a6fc9d503b99fc5f9615, which is more than 15 seconds late, 
> current_state={7e9aee32eb98a6fc9d503b99fc5f9615 state=PENDING_OPEN, 
> ts=1479780992078, server=example.org,30003,1479780976834}
> {noformat}
> In the meantime, the master was still trying to re-assign this region in 
> another thread. The master first closed this region to guard against a multi 
> assign, then changed the state of this region from PENDING_OPEN > OFFLINE > 
> PENDING_OPEN. Its RIT node in ZK was also transitioned to OFFLINE, as 
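The race described in this report, a timeout-driven force-offline acting on stale information while a late OPENED event has already advanced the region state, can be sketched with a simple timestamp guard. Everything below is illustrative (made-up class, enum, and method names), not HBase's actual assignment code.

```java
// Sketch: a "force offline" decided at some observed timestamp should be
// skipped when a newer transition (e.g. the late OPENED event) has already
// updated the region state.
public class StaleTransitionGuard {
  enum State { PENDING_OPEN, OPEN, OFFLINE }

  static final class RegionState {
    State state;
    long ts;  // timestamp of the last observed transition

    RegionState(State state, long ts) {
      this.state = state;
      this.ts = ts;
    }
  }

  // Only force the region offline if no transition newer than the one we
  // based our decision on has been recorded; otherwise our view is stale.
  static boolean tryForceOffline(RegionState rs, long observedTs) {
    if (rs.ts > observedTs) {
      return false;  // a newer event (e.g. OPENED) won the race
    }
    rs.state = State.OFFLINE;
    rs.ts = System.currentTimeMillis();
    return true;
  }

  public static void main(String[] args) {
    // Region was PENDING_OPEN at ts=1000, then the late OPENED event
    // arrived and advanced it to OPEN at ts=2000.
    RegionState rs = new RegionState(State.OPEN, 2000L);
    // The timeout handler still holds the old ts=1000 view; its
    // force-offline is rejected as stale.
    System.out.println(tryForceOffline(rs, 1000L)); // false
  }
}
```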

[jira] [Commented] (HBASE-17437) Support specifying a WAL directory outside of the root directory

2017-02-07 Thread Zach York (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15856770#comment-15856770
 ] 

Zach York commented on HBASE-17437:
---

[~stack] Sorry for the late response on the online meeting times. I could do 
Thursday, Friday, or next week. I can make my schedule work for most of those 
times. Is there any time that works best for your coworkers and yourself?

Thanks! 

> Support specifying a WAL directory outside of the root directory
> 
>
> Key: HBASE-17437
> URL: https://issues.apache.org/jira/browse/HBASE-17437
> Project: HBase
>  Issue Type: Improvement
>  Components: Filesystem Integration, wal
>Affects Versions: 1.2.4
>Reporter: Yishan Yang
>Assignee: Zach York
>  Labels: patch
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17437.branch-1.001.patch, 
> HBASE-17437.branch-1.002.patch, HBASE-17437.branch-1.003.patch, 
> HBASE-17437.branch-1.004.patch, hbase-17437-branch-1.2.patch, 
> HBASE-17437.master.001.patch, HBASE-17437.master.002.patch, 
> HBASE-17437.master.003.patch, HBASE-17437.master.004.patch, 
> HBASE-17437.master.005.patch, HBASE-17437.master.006.patch, 
> HBASE-17437.master.007.patch, HBASE-17437.master.008.patch, 
> HBASE-17437.master.009.patch, HBASE-17437.master.010.patch, 
> HBASE-17437.master.011.patch, HBASE-17437.master.012.patch, 
> hbase-17437-master.patch
>
>
> Currently, the WAL and the StoreFiles need to be on the same FileSystem. Some 
> FileSystems (such as Amazon S3) don’t support append or consistent writes. 
> These two properties are imperative for the WAL in order to avoid loss of 
> writes. However, StoreFiles don’t necessarily need the same consistency 
> guarantees (since writes are cached locally and if writes fail, they can 
> always be replayed from the WAL).
>  
> This JIRA aims to allow users to configure a log directory (for WALs) that is 
> outside of the root directory or even in a different FileSystem. The default 
> value will still put the log directory under the root directory.
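For reference, the configuration this change enables looks roughly like the fragment below. The property name hbase.wal.dir is the one this JIRA introduces; the filesystem URIs are placeholders, not recommendations.

```xml
<!-- hbase-site.xml: keep WALs on an append-capable filesystem (HDFS)
     while the root dir lives on an eventually consistent store (S3).
     Paths are placeholders. -->
<property>
  <name>hbase.rootdir</name>
  <value>s3a://my-bucket/hbase</value>
</property>
<property>
  <name>hbase.wal.dir</name>
  <value>hdfs://namenode:8020/hbase-wal</value>
</property>
```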





[jira] [Updated] (HBASE-15995) Separate replication WAL reading from shipping

2017-02-07 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-15995:
---
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Thanks for the patch, Vincent.

Thanks all for the reviews.

> Separate replication WAL reading from shipping
> --
>
> Key: HBASE-15995
> URL: https://issues.apache.org/jira/browse/HBASE-15995
> Project: HBase
>  Issue Type: Sub-task
>  Components: Replication
>Affects Versions: 2.0.0
>Reporter: Vincent Poon
>Assignee: Vincent Poon
> Fix For: 2.0.0
>
> Attachments: HBASE-15995.master.v1.patch, 
> HBASE-15995.master.v2.patch, HBASE-15995.master.v3.patch, 
> HBASE-15995.master.v4.patch, HBASE-15995.master.v6.patch, 
> HBASE-15995.master.v7.patch, replicationV1_100ms_delay.png, 
> replicationV2_100ms_delay.png
>
>
> Currently ReplicationSource reads edits from the WAL and ships them in the 
> same thread.
> By breaking out the reading from the shipping, we can introduce greater 
> parallelism and lay the foundation for further refactoring to a pipelined, 
> streaming model.
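The reader/shipper split described above is a classic producer/consumer arrangement: one thread tails the WAL and enqueues edits, another dequeues and ships them. The sketch below only illustrates that structure; the names are made up and it is not the code from this patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Minimal reader/shipper sketch: a bounded queue decouples reading edits
// from shipping them, so each side can proceed at its own pace.
public class ReaderShipperSketch {
  private static final String POISON = "__END_OF_STREAM__";

  static List<String> readAndShip(List<String> edits) {
    BlockingQueue<String> queue = new ArrayBlockingQueue<>(16);
    List<String> shipped = new ArrayList<>();

    // Reader: would tail the WAL in the real system; here it enqueues the
    // given edits and then a poison pill to signal the end of the stream.
    Thread reader = new Thread(() -> {
      try {
        for (String edit : edits) {
          queue.put(edit);
        }
        queue.put(POISON);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });

    // Shipper: drains the queue and "ships" each edit (here: records it).
    Thread shipper = new Thread(() -> {
      try {
        while (true) {
          String edit = queue.take();
          if (POISON.equals(edit)) {
            break;
          }
          shipped.add(edit);
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });

    reader.start();
    shipper.start();
    try {
      reader.join();
      shipper.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return shipped;
  }

  public static void main(String[] args) {
    // Edits come out in order because there is one reader and one shipper.
    System.out.println(readAndShip(List.of("edit-1", "edit-2", "edit-3")));
  }
}
```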





[jira] [Updated] (HBASE-17574) Clean up how to run tests under hbase-spark module

2017-02-07 Thread Jerry He (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jerry He updated HBASE-17574:
-
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

> Clean up how to run tests under hbase-spark module 
> ---
>
> Key: HBASE-17574
> URL: https://issues.apache.org/jira/browse/HBASE-17574
> Project: HBase
>  Issue Type: Bug
>  Components: spark
>Affects Versions: 2.0.0
>Reporter: Yi Liang
>Assignee: Yi Liang
> Fix For: 2.0.0
>
> Attachments: HBase-17574-V1.patch, HBase-17574-V2.patch
>
>
> In the master branch, the tests of the hbase-spark module need clean-up.
> I think we need to let hbase-spark follow the rules that exist in the whole 
> hbase project.
> 1. In hbase-spark, all the scala test cases are regarded as integration tests, 
> i.e. we need to go to the hbase-spark folder and use mvn verify to run the 
> test cases. I think these tests should be regarded as unit tests for the 
> following reasons:
> (1) All the scala tests are very small; most of them finish within 20s.
> (2) Integration tests are usually put into the hbase-it module, not into 
> their own module.
> (3) Hadoop QA could not run those scala tests in hbase-spark; I guess Hadoop 
> QA only runs mvn test under the root dir, while hbase-spark needs mvn verify.
> (4) From its pom.xml below, you can see that both the integration-test and 
> the test executions point to the same test goal. From the Maven reference, 
> http://maven.apache.org/guides/introduction/introduction-to-the-lifecycle.html#Built-in_Lifecycle_Bindings,
>  we know that if a goal is bound to one or more build phases, that goal will 
> be called in all those phases. It means that mvn test and mvn 
> integration-test would do the same thing; however, <skipTests>true</skipTests> 
> in the test phase just disables the mvn test command. It is uncommon to have 
> a definition like that. 
> {code}
>   <executions>
>     <execution>
>       <id>test</id>
>       <phase>test</phase>
>       <goals>
>         <goal>test</goal>
>       </goals>
>       <configuration>
>         <skipTests>true</skipTests>
>       </configuration>
>     </execution>
>     <execution>
>       <id>integration-test</id>
>       <phase>integration-test</phase>
>       <goals>
>         <goal>test</goal>
>       </goals>
>       <configuration>
>         <tagsToInclude>Integration-Test</tagsToInclude>
>         <argLine>-Xmx1536m -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=512m</argLine>
>         <skipTests>false</skipTests>
>       </configuration>
>     </execution>
>   </executions>
> {code}





[jira] [Commented] (HBASE-17574) Clean up how to run tests under hbase-spark module

2017-02-07 Thread Jerry He (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15856740#comment-15856740
 ] 

Jerry He commented on HBASE-17574:
--

Pushed to master.
Thanks for the reviews.
Thanks for the patch, [~easyliangjob].  A reminder to use 'git format-patch' to 
generate patches in the future.

> Clean up how to run tests under hbase-spark module 
> ---
>
> Key: HBASE-17574
> URL: https://issues.apache.org/jira/browse/HBASE-17574
> Project: HBase
>  Issue Type: Bug
>  Components: spark
>Affects Versions: 2.0.0
>Reporter: Yi Liang
>Assignee: Yi Liang
> Fix For: 2.0.0
>
> Attachments: HBase-17574-V1.patch, HBase-17574-V2.patch
>
>
> In the master branch, the tests of the hbase-spark module need clean-up.
> I think we need to let hbase-spark follow the rules that exist in the whole 
> hbase project.
> 1. In hbase-spark, all the scala test cases are regarded as integration tests, 
> i.e. we need to go to the hbase-spark folder and use mvn verify to run the 
> test cases. I think these tests should be regarded as unit tests for the 
> following reasons:
> (1) All the scala tests are very small; most of them finish within 20s.
> (2) Integration tests are usually put into the hbase-it module, not into 
> their own module.
> (3) Hadoop QA could not run those scala tests in hbase-spark; I guess Hadoop 
> QA only runs mvn test under the root dir, while hbase-spark needs mvn verify.
> (4) From its pom.xml below, you can see that both the integration-test and 
> the test executions point to the same test goal. From the Maven reference, 
> http://maven.apache.org/guides/introduction/introduction-to-the-lifecycle.html#Built-in_Lifecycle_Bindings,
>  we know that if a goal is bound to one or more build phases, that goal will 
> be called in all those phases. It means that mvn test and mvn 
> integration-test would do the same thing; however, <skipTests>true</skipTests> 
> in the test phase just disables the mvn test command. It is uncommon to have 
> a definition like that. 
> {code}
>   <executions>
>     <execution>
>       <id>test</id>
>       <phase>test</phase>
>       <goals>
>         <goal>test</goal>
>       </goals>
>       <configuration>
>         <skipTests>true</skipTests>
>       </configuration>
>     </execution>
>     <execution>
>       <id>integration-test</id>
>       <phase>integration-test</phase>
>       <goals>
>         <goal>test</goal>
>       </goals>
>       <configuration>
>         <tagsToInclude>Integration-Test</tagsToInclude>
>         <argLine>-Xmx1536m -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=512m</argLine>
>         <skipTests>false</skipTests>
>       </configuration>
>     </execution>
>   </executions>
> {code}





[jira] [Updated] (HBASE-17565) StochasticLoadBalancer may incorrectly skip balancing due to skewed multiplier sum

2017-02-07 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-17565:
---
Attachment: 17565.addendum

Addendum which resets min cost to the default value before returning from 
testNeedBalance()

> StochasticLoadBalancer may incorrectly skip balancing due to skewed 
> multiplier sum
> --
>
> Key: HBASE-17565
> URL: https://issues.apache.org/jira/browse/HBASE-17565
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Critical
> Fix For: 2.0.0, 1.4.0
>
> Attachments: 17565.addendum, 17565.v1.txt, 17565.v2.txt, 
> 17565.v3.txt, 17565.v4.txt, 17565.v5.txt, 17565.v6.txt
>
>
> I was investigating why a 6 node cluster kept skipping balancing requests.
> Here were the region counts on the servers:
> 449, 448, 447, 449, 453, 0
> {code}
> 2017-01-26 22:04:47,145 INFO  
> [RpcServer.deafult.FPBQ.Fifo.handler=1,queue=0,port=16000] 
> balancer.StochasticLoadBalancer: Skipping load balancing because balanced 
> cluster; total cost is 127.0171157050385, sum multiplier is 111087.0 min cost 
> which need balance is 0.05
> {code}
> The big multiplier sum caught my eye. Here is what additional debug logging 
> showed:
> {code}
> 2017-01-27 23:25:31,749 DEBUG 
> [RpcServer.deafult.FPBQ.Fifo.handler=9,queue=0,port=16000] 
> balancer.StochasticLoadBalancer: class 
> org.apache.hadoop.hbase.master.balancer.  
> StochasticLoadBalancer$RegionReplicaHostCostFunction with multiplier 10.0
> 2017-01-27 23:25:31,749 DEBUG 
> [RpcServer.deafult.FPBQ.Fifo.handler=9,queue=0,port=16000] 
> balancer.StochasticLoadBalancer: class 
> org.apache.hadoop.hbase.master.balancer.  
> StochasticLoadBalancer$RegionReplicaRackCostFunction with multiplier 1.0
> {code}
> Note however, that no table in the cluster used read replica.
> I can think of two ways of fixing this situation:
> 1. If there is no read replica in the cluster, ignore the multipliers for the 
> above two functions.
> 2. When cost() returned by the CostFunction is 0 (or very very close to 0.0), 
> ignore the multiplier.
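The skip decision quoted above boils down to comparing the weighted cost divided by the multiplier sum against the "min cost which need balance" threshold. The sketch below reproduces that arithmetic with the numbers from the log; it is a simplification, not the actual StochasticLoadBalancer code, and the "replica multiplier" figure used in the second call is an assumption for illustration.

```java
// Simplified balance-decision arithmetic: balance only when the
// normalized cost exceeds the configured minimum.
public class BalanceCheck {
  static boolean needsBalance(double totalWeightedCost, double sumMultiplier,
                              double minCostNeedBalance) {
    return (totalWeightedCost / sumMultiplier) > minCostNeedBalance;
  }

  public static void main(String[] args) {
    // Numbers from the log above: 127.017 / 111087.0 ~= 0.00114 < 0.05,
    // so the badly skewed cluster is reported as "balanced".
    System.out.println(needsBalance(127.0171157050385, 111087.0, 0.05)); // false
    // Option 2 from the description: if zero-cost functions (here, the
    // replica cost functions, assumed to contribute ~110000 of the sum)
    // stop contributing their multipliers, the ratio crosses the
    // threshold and balancing proceeds.
    System.out.println(needsBalance(127.0171157050385, 1087.0, 0.05));   // true
  }
}
```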





[jira] [Commented] (HBASE-17381) ReplicationSourceWorkerThread can die due to unhandled exceptions

2017-02-07 Thread Gary Helmling (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15856716#comment-15856716
 ] 

Gary Helmling commented on HBASE-17381:
---

+1 on patch v3.  Thanks for the fix!  I'll commit shortly.

> ReplicationSourceWorkerThread can die due to unhandled exceptions
> -
>
> Key: HBASE-17381
> URL: https://issues.apache.org/jira/browse/HBASE-17381
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Reporter: Gary Helmling
>Assignee: huzheng
> Attachments: HBASE-17381.patch, HBASE-17381.v1.patch, 
> HBASE-17381.v2.patch, HBASE-17381.v3.patch
>
>
> If a ReplicationSourceWorkerThread encounters an unexpected exception in the 
> run() method (for example failure to allocate direct memory for the DFS 
> client), the exception will be logged by the UncaughtExceptionHandler, but 
> the thread will also die and the replication queue will back up indefinitely 
> until the Regionserver is restarted.
> We should make sure the worker thread is resilient to all exceptions that it 
> can actually handle.  For those that it really can't, it seems better to 
> abort the regionserver rather than just allow replication to stop with 
> minimal signal.
> Here is a sample exception:
> {noformat}
> ERROR regionserver.ReplicationSource: Unexpected exception in 
> ReplicationSourceWorkerThread, 
> currentPath=hdfs://.../hbase/WALs/XXXwalfilenameXXX
> java.lang.OutOfMemoryError: Direct buffer memory
> at java.nio.Bits.reserveMemory(Bits.java:693)
> at java.nio.DirectByteBuffer.(DirectByteBuffer.java:123)
> at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
> at 
> org.apache.hadoop.crypto.CryptoOutputStream.(CryptoOutputStream.java:96)
> at 
> org.apache.hadoop.crypto.CryptoOutputStream.(CryptoOutputStream.java:113)
> at 
> org.apache.hadoop.crypto.CryptoOutputStream.(CryptoOutputStream.java:108)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil.createStreamPair(DataTransferSaslUtil.java:344)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:490)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getSaslStreams(SaslDataTransferClient.java:391)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:263)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:211)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.peerSend(SaslDataTransferClient.java:160)
> at 
> org.apache.hadoop.hdfs.net.TcpPeerServer.peerFromSocketAndKey(TcpPeerServer.java:92)
> at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3444)
> at 
> org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:778)
> at 
> org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:695)
> at 
> org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:356)
> at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:673)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:882)
> at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
> at java.io.DataInputStream.read(DataInputStream.java:100)
> at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:308)
> at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:276)
> at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:264)
> at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:423)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationWALReaderManager.openReader(ReplicationWALReaderManager.java:70)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$ReplicationSourceWorkerThread.openReader(ReplicationSource.java:830)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$ReplicationSourceWorkerThread.run(ReplicationSource.java:572)
> {noformat}
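One hedged sketch of the direction described above (the interface and names are illustrative, not the actual ReplicationSource code): retry the loop on recoverable exceptions, but escalate `Error`s such as the `OutOfMemoryError` in the stack trace to a server abort instead of letting the worker thread die silently.

```java
// Illustrative only: a worker loop that retries on recoverable exceptions
// but escalates Errors (e.g. OutOfMemoryError) to a server abort instead of
// letting the thread die and the replication queue back up indefinitely.
public class WorkerSketch {
    interface Abortable { void abort(String why, Throwable cause); }

    static void runWorker(Runnable oneIteration, Abortable server, int maxIterations) {
        for (int i = 0; i < maxIterations; i++) {
            try {
                oneIteration.run();
            } catch (RuntimeException e) {
                // Recoverable: log and keep the worker alive for the next pass.
                System.out.println("retrying after: " + e.getMessage());
            } catch (Error e) {
                // Unrecoverable: abort loudly rather than stall replication.
                server.abort("replication worker hit fatal error", e);
                return;
            }
        }
    }

    public static void main(String[] args) {
        Abortable server = (why, cause) -> System.out.println("ABORT: " + why);
        runWorker(() -> { throw new RuntimeException("transient"); }, server, 2);
        runWorker(() -> { throw new OutOfMemoryError("direct buffer"); }, server, 2);
    }
}
```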





[jira] [Commented] (HBASE-17565) StochasticLoadBalancer may incorrectly skip balancing due to skewed multiplier sum

2017-02-07 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15856713#comment-15856713
 ] 

Ted Yu commented on HBASE-17565:


In 
https://builds.apache.org/job/HBase-TRUNK_matrix/2462/jdk=JDK%201.8%20(latest),label=Hadoop/testReport/org.apache.hadoop.hbase.master.balancer/TestStochasticLoadBalancer/testSmallCluster/
 :
{code}
2017-02-07 19:49:15,444 INFO  [main] balancer.StochasticLoadBalancer(296): 
Skipping load balancing because balanced cluster; total cost is 0.0, sum 
multiplier is 1062.0 min cost which need balance is 1.0
{code}
The min cost of 1.0 should only apply to testNeedBalance().
It turns out that testNeedBalance() should properly reset the min-cost value 
when it finishes, so later tests do not inherit it.

> StochasticLoadBalancer may incorrectly skip balancing due to skewed 
> multiplier sum
> --
>
> Key: HBASE-17565
> URL: https://issues.apache.org/jira/browse/HBASE-17565
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Critical
> Fix For: 2.0.0, 1.4.0
>
> Attachments: 17565.v1.txt, 17565.v2.txt, 17565.v3.txt, 17565.v4.txt, 
> 17565.v5.txt, 17565.v6.txt
>
>
> I was investigating why a 6 node cluster kept skipping balancing requests.
> Here were the region counts on the servers:
> 449, 448, 447, 449, 453, 0
> {code}
> 2017-01-26 22:04:47,145 INFO  
> [RpcServer.deafult.FPBQ.Fifo.handler=1,queue=0,port=16000] 
> balancer.StochasticLoadBalancer: Skipping load balancing because balanced 
> cluster; total cost is 127.0171157050385, sum multiplier is 111087.0 min cost 
> which need balance is 0.05
> {code}
> The big multiplier sum caught my eye. Here is what additional debug logging 
> showed:
> {code}
> 2017-01-27 23:25:31,749 DEBUG 
> [RpcServer.deafult.FPBQ.Fifo.handler=9,queue=0,port=16000] 
> balancer.StochasticLoadBalancer: class 
> org.apache.hadoop.hbase.master.balancer.  
> StochasticLoadBalancer$RegionReplicaHostCostFunction with multiplier 10.0
> 2017-01-27 23:25:31,749 DEBUG 
> [RpcServer.deafult.FPBQ.Fifo.handler=9,queue=0,port=16000] 
> balancer.StochasticLoadBalancer: class 
> org.apache.hadoop.hbase.master.balancer.  
> StochasticLoadBalancer$RegionReplicaRackCostFunction with multiplier 1.0
> {code}
> Note, however, that no table in the cluster used read replicas.
> I can think of two ways of fixing this situation:
> 1. If there is no read replica in the cluster, ignore the multipliers for the 
> above two functions.
> 2. When cost() returned by the CostFunction is 0 (or very very close to 0.0), 
> ignore the multiplier.





[jira] [Reopened] (HBASE-17565) StochasticLoadBalancer may incorrectly skip balancing due to skewed multiplier sum

2017-02-07 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu reopened HBASE-17565:


> StochasticLoadBalancer may incorrectly skip balancing due to skewed 
> multiplier sum
> --
>
> Key: HBASE-17565
> URL: https://issues.apache.org/jira/browse/HBASE-17565
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Critical
> Fix For: 2.0.0, 1.4.0
>
> Attachments: 17565.v1.txt, 17565.v2.txt, 17565.v3.txt, 17565.v4.txt, 
> 17565.v5.txt, 17565.v6.txt
>
>
> I was investigating why a 6 node cluster kept skipping balancing requests.
> Here were the region counts on the servers:
> 449, 448, 447, 449, 453, 0
> {code}
> 2017-01-26 22:04:47,145 INFO  
> [RpcServer.deafult.FPBQ.Fifo.handler=1,queue=0,port=16000] 
> balancer.StochasticLoadBalancer: Skipping load balancing because balanced 
> cluster; total cost is 127.0171157050385, sum multiplier is 111087.0 min cost 
> which need balance is 0.05
> {code}
> The big multiplier sum caught my eye. Here is what additional debug logging 
> showed:
> {code}
> 2017-01-27 23:25:31,749 DEBUG 
> [RpcServer.deafult.FPBQ.Fifo.handler=9,queue=0,port=16000] 
> balancer.StochasticLoadBalancer: class 
> org.apache.hadoop.hbase.master.balancer.  
> StochasticLoadBalancer$RegionReplicaHostCostFunction with multiplier 10.0
> 2017-01-27 23:25:31,749 DEBUG 
> [RpcServer.deafult.FPBQ.Fifo.handler=9,queue=0,port=16000] 
> balancer.StochasticLoadBalancer: class 
> org.apache.hadoop.hbase.master.balancer.  
> StochasticLoadBalancer$RegionReplicaRackCostFunction with multiplier 1.0
> {code}
> Note, however, that no table in the cluster used read replicas.
> I can think of two ways of fixing this situation:
> 1. If there is no read replica in the cluster, ignore the multipliers for the 
> above two functions.
> 2. When cost() returned by the CostFunction is 0 (or very very close to 0.0), 
> ignore the multiplier.





[jira] [Updated] (HBASE-17609) Allow for region merging in the UI

2017-02-07 Thread churro morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

churro morales updated HBASE-17609:
---
Status: Patch Available  (was: Open)

> Allow for region merging in the UI 
> ---
>
> Key: HBASE-17609
> URL: https://issues.apache.org/jira/browse/HBASE-17609
> Project: HBase
>  Issue Type: Task
>Affects Versions: 2.0.0, 1.4.0
>Reporter: churro morales
>Assignee: churro morales
> Attachments: HBASE-17609-branch-1.3.patch, HBASE-17609.patch
>
>
> HBASE-49 discussed having the ability to merge regions through the HBase UI, 
> but online region merging wasn't around back then. 
> I have created additional form fields for the table.jsp where you can pass in 
> two encoded region names (must be adjacent regions) and a merge can be called 
> through the UI. 
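The adjacency precondition mentioned above can be illustrated stand-alone: in HBase, two regions are adjacent when one region's end key equals the other's start key (keys are byte arrays). The class and method names below are hypothetical, not the actual table.jsp code.

```java
import java.util.Arrays;

// Hypothetical sketch of the "must be adjacent regions" check a merge form
// would enforce: regions are adjacent when one's end key equals the other's
// start key, compared as byte arrays.
public class AdjacencySketch {
    static boolean areAdjacent(byte[] startKeyA, byte[] endKeyA,
                               byte[] startKeyB, byte[] endKeyB) {
        return Arrays.equals(endKeyA, startKeyB) || Arrays.equals(endKeyB, startKeyA);
    }

    public static void main(String[] args) {
        byte[] a0 = "aaa".getBytes(), a1 = "mmm".getBytes();
        byte[] b0 = "mmm".getBytes(), b1 = "zzz".getBytes();
        System.out.println(areAdjacent(a0, a1, b0, b1)); // prints true
        System.out.println(areAdjacent(a0, a1, "nnn".getBytes(), b1)); // prints false
    }
}
```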





[jira] [Updated] (HBASE-17610) Replication source ageOfLastShipped value not updated properly in below scenarios

2017-02-07 Thread Maddineni Sukumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maddineni Sukumar updated HBASE-17610:
--
Fix Version/s: (was: 1.1.9)
   (was: 1.2.5)
   (was: 1.3.1)
   (was: 1.4.0)
   (was: 2.0.0)

> Replication source ageOfLastShipped value not updated properly in below 
> scenarios
> -
>
> Key: HBASE-17610
> URL: https://issues.apache.org/jira/browse/HBASE-17610
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.2.4, 1.1.8
>Reporter: Maddineni Sukumar
>Assignee: Maddineni Sukumar
>Priority: Minor
>  Labels: easyfix
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The ageOfLastShipped value is not updated properly when cross-realm trust is 
> broken between the primary cluster and its (secured) peer cluster, in two 
> scenarios:
> 1. The region server is running when cross-realm trust breaks
> 2. Cross-realm trust breaks and the region server is then restarted





[jira] [Updated] (HBASE-17610) Replication source ageOfLastShipped value not updated properly in below scenarios

2017-02-07 Thread Maddineni Sukumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maddineni Sukumar updated HBASE-17610:
--
Description: 
The ageOfLastShipped value is not updated properly when cross-realm trust is 
broken between the primary cluster and its (secured) peer cluster, in two 
scenarios:

1. The region server is running when cross-realm trust breaks
2. Cross-realm trust breaks and the region server is then restarted

  was:
For every client connect call to the RS we log the line below at DEBUG level, 
which causes too many log entries in the RegionServer:

2017-01-18 17:51:40,739 DEBUG [.reader=4,port=60020] 
security.HBaseSaslRpcServer - SASL server GSSAPI callback: setting 
canonicalized client ID: hbase/hostname-xxx@KDC


> Replication source ageOfLastShipped value not updated properly in below 
> scenarios
> -
>
> Key: HBASE-17610
> URL: https://issues.apache.org/jira/browse/HBASE-17610
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.2.4, 1.1.8
>Reporter: Maddineni Sukumar
>Assignee: Maddineni Sukumar
>Priority: Minor
>  Labels: easyfix
> Fix For: 2.0.0, 1.4.0, 1.3.1, 1.2.5, 1.1.9
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The ageOfLastShipped value is not updated properly when cross-realm trust is 
> broken between the primary cluster and its (secured) peer cluster, in two 
> scenarios:
> 1. The region server is running when cross-realm trust breaks
> 2. Cross-realm trust breaks and the region server is then restarted





[jira] [Commented] (HBASE-17565) StochasticLoadBalancer may incorrectly skip balancing due to skewed multiplier sum

2017-02-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15856698#comment-15856698
 ] 

Hudson commented on HBASE-17565:


FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #2462 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/2462/])
HBASE-17565 StochasticLoadBalancer may incorrectly skip balancing due to 
(tedyu: rev 9d8de85fa513dde34f2382af9221f93164248dc8)
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestStochasticLoadBalancer2.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/BalancerTestBase.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestStochasticLoadBalancer.java


> StochasticLoadBalancer may incorrectly skip balancing due to skewed 
> multiplier sum
> --
>
> Key: HBASE-17565
> URL: https://issues.apache.org/jira/browse/HBASE-17565
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Critical
> Fix For: 2.0.0, 1.4.0
>
> Attachments: 17565.v1.txt, 17565.v2.txt, 17565.v3.txt, 17565.v4.txt, 
> 17565.v5.txt, 17565.v6.txt
>
>
> I was investigating why a 6 node cluster kept skipping balancing requests.
> Here were the region counts on the servers:
> 449, 448, 447, 449, 453, 0
> {code}
> 2017-01-26 22:04:47,145 INFO  
> [RpcServer.deafult.FPBQ.Fifo.handler=1,queue=0,port=16000] 
> balancer.StochasticLoadBalancer: Skipping load balancing because balanced 
> cluster; total cost is 127.0171157050385, sum multiplier is 111087.0 min cost 
> which need balance is 0.05
> {code}
> The big multiplier sum caught my eye. Here is what additional debug logging 
> showed:
> {code}
> 2017-01-27 23:25:31,749 DEBUG 
> [RpcServer.deafult.FPBQ.Fifo.handler=9,queue=0,port=16000] 
> balancer.StochasticLoadBalancer: class 
> org.apache.hadoop.hbase.master.balancer.  
> StochasticLoadBalancer$RegionReplicaHostCostFunction with multiplier 10.0
> 2017-01-27 23:25:31,749 DEBUG 
> [RpcServer.deafult.FPBQ.Fifo.handler=9,queue=0,port=16000] 
> balancer.StochasticLoadBalancer: class 
> org.apache.hadoop.hbase.master.balancer.  
> StochasticLoadBalancer$RegionReplicaRackCostFunction with multiplier 1.0
> {code}
> Note, however, that no table in the cluster used read replicas.
> I can think of two ways of fixing this situation:
> 1. If there is no read replica in the cluster, ignore the multipliers for the 
> above two functions.
> 2. When cost() returned by the CostFunction is 0 (or very very close to 0.0), 
> ignore the multiplier.





[jira] [Commented] (HBASE-17484) Add non cached version of OffheapKV for write path

2017-02-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15856697#comment-15856697
 ] 

Hudson commented on HBASE-17484:


FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #2462 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/2462/])
HBASE-17484 Add non cached version of OffheapKV for write path (Ram) 
(ramkrishna: rev 6c5eec249c6fcedd3d9f7fd810f89656647c1c67)
* (edit) hbase-common/src/main/java/org/apache/hadoop/hbase/CellUtil.java
* (edit) 
hbase-common/src/main/java/org/apache/hadoop/hbase/util/test/RedundantKVGenerator.java
* (edit) 
hbase-common/src/test/java/org/apache/hadoop/hbase/TestOffheapKeyValue.java
* (edit) 
hbase-common/src/main/java/org/apache/hadoop/hbase/io/encoding/RowIndexSeekerV1.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderImpl.java
* (edit) hbase-common/src/main/java/org/apache/hadoop/hbase/OffheapKeyValue.java
* (edit) 
hbase-common/src/main/java/org/apache/hadoop/hbase/codec/KeyValueCodec.java
* (edit) 
hbase-common/src/test/java/org/apache/hadoop/hbase/io/TestTagCompressionContext.java


> Add non cached version of OffheapKV for write path
> --
>
> Key: HBASE-17484
> URL: https://issues.apache.org/jira/browse/HBASE-17484
> Project: HBase
>  Issue Type: Sub-task
>  Components: regionserver
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: HBASE-17484_1.patch, HBASE-17484_2.patch, 
> HBASE-17484.patch
>
>
> After running a lot of different performance tests across various scenarios 
> and with multiple threads, we concluded it is better to have a version of 
> OffheapKV in the write path that does not cache anything and whose 
> fixed_overhead is equal to that of KeyValue.





[jira] [Created] (HBASE-17610) Replication source ageOfLastShipped value not updated properly in below scenarios

2017-02-07 Thread Maddineni Sukumar (JIRA)
Maddineni Sukumar created HBASE-17610:
-

 Summary: Replication source ageOfLastShipped value not updated 
properly in below scenarios
 Key: HBASE-17610
 URL: https://issues.apache.org/jira/browse/HBASE-17610
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 1.2.4, 1.1.8
Reporter: Maddineni Sukumar
Assignee: Maddineni Sukumar
Priority: Minor
 Fix For: 2.0.0, 1.4.0, 1.3.1, 1.2.5, 1.1.9


For every client connect call to the RS we log the line below at DEBUG level, 
which causes too many log entries in the RegionServer:

2017-01-18 17:51:40,739 DEBUG [.reader=4,port=60020] 
security.HBaseSaslRpcServer - SASL server GSSAPI callback: setting 
canonicalized client ID: hbase/hostname-xxx@KDC





[jira] [Updated] (HBASE-17609) Allow for region merging in the UI

2017-02-07 Thread churro morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

churro morales updated HBASE-17609:
---
Attachment: HBASE-17609.patch
HBASE-17609-branch-1.3.patch

> Allow for region merging in the UI 
> ---
>
> Key: HBASE-17609
> URL: https://issues.apache.org/jira/browse/HBASE-17609
> Project: HBase
>  Issue Type: Task
>Affects Versions: 2.0.0, 1.4.0
>Reporter: churro morales
>Assignee: churro morales
> Attachments: HBASE-17609-branch-1.3.patch, HBASE-17609.patch
>
>
> HBASE-49 discussed having the ability to merge regions through the HBase UI, 
> but online region merging wasn't around back then. 
> I have created additional form fields for the table.jsp where you can pass in 
> two encoded region names (must be adjacent regions) and a merge can be called 
> through the UI. 





[jira] [Commented] (HBASE-17001) [RegionServer] Implement enforcement of quota violation policies

2017-02-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15856603#comment-15856603
 ] 

Hadoop QA commented on HBASE-17001:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 18m 7s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 20 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 2m 16s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
32s {color} | {color:green} HBASE-16961 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 20s 
{color} | {color:green} HBASE-16961 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 6m 
46s {color} | {color:green} HBASE-16961 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
8s {color} | {color:green} HBASE-16961 passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 18s 
{color} | {color:red} hbase-protocol-shaded in HBASE-16961 has 24 extant 
Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 27s 
{color} | {color:green} HBASE-16961 passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
38s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 38s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 38s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 38s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 6m 
57s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
52s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 28 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 1m 50s 
{color} | {color:red} The patch causes 306 errors with Hadoop v2.4.0. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 3m 29s 
{color} | {color:red} The patch causes 306 errors with Hadoop v2.4.1. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 5m 4s 
{color} | {color:red} The patch causes 306 errors with Hadoop v2.5.0. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 6m 38s 
{color} | {color:red} The patch causes 306 errors with Hadoop v2.5.1. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 8m 13s 
{color} | {color:red} The patch causes 306 errors with Hadoop v2.5.2. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 2m 
39s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 8m 
51s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 44s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 28s 
{color} | {color:green} hbase-protocol in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 42s 
{color} | {color:green} hbase-protocol-shaded in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 31s 
{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 24m 43s {color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
53s {color} | {color:green} The patch d

[jira] [Created] (HBASE-17609) Allow for region merging in the UI

2017-02-07 Thread churro morales (JIRA)
churro morales created HBASE-17609:
--

 Summary: Allow for region merging in the UI 
 Key: HBASE-17609
 URL: https://issues.apache.org/jira/browse/HBASE-17609
 Project: HBase
  Issue Type: Task
Affects Versions: 2.0.0, 1.4.0
Reporter: churro morales
Assignee: churro morales


HBASE-49 discussed having the ability to merge regions through the HBase UI, 
but online region merging wasn't around back then. 

I have created additional form fields for the table.jsp where you can pass in 
two encoded region names (must be adjacent regions) and a merge can be called 
through the UI. 





[jira] [Commented] (HBASE-16179) Fix compilation errors when building hbase-spark against Spark 2.0

2017-02-07 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15856491#comment-15856491
 ] 

Ted Yu commented on HBASE-16179:


[~busbey]
Hopefully you can have some time this week.

Thanks

> Fix compilation errors when building hbase-spark against Spark 2.0
> --
>
> Key: HBASE-16179
> URL: https://issues.apache.org/jira/browse/HBASE-16179
> Project: HBase
>  Issue Type: Bug
>  Components: spark
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: 16179.v0.txt, 16179.v10.txt, 16179.v11.txt, 
> 16179.v12.txt, 16179.v12.txt, 16179.v12.txt, 16179.v13.txt, 16179.v15.txt, 
> 16179.v16.txt, 16179.v18.txt, 16179.v19.txt, 16179.v19.txt, 16179.v1.txt, 
> 16179.v1.txt, 16179.v20.txt, 16179.v22.txt, 16179.v23.txt, 16179.v4.txt, 
> 16179.v5.txt, 16179.v7.txt, 16179.v8.txt, 16179.v9.txt
>
>
> I tried building hbase-spark module against Spark-2.0 snapshot and got the 
> following compilation errors:
> http://pastebin.com/bg3w247a
> Some Spark classes such as DataTypeParser and Logging are no longer 
> accessible to downstream projects.
> hbase-spark module should not depend on such classes.





[jira] [Commented] (HBASE-17437) Support specifying a WAL directory outside of the root directory

2017-02-07 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15856441#comment-15856441
 ] 

Ted Yu commented on HBASE-17437:


With latest patch for branch-1, the above test passes.

> Support specifying a WAL directory outside of the root directory
> 
>
> Key: HBASE-17437
> URL: https://issues.apache.org/jira/browse/HBASE-17437
> Project: HBase
>  Issue Type: Improvement
>  Components: Filesystem Integration, wal
>Affects Versions: 1.2.4
>Reporter: Yishan Yang
>Assignee: Zach York
>  Labels: patch
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17437.branch-1.001.patch, 
> HBASE-17437.branch-1.002.patch, HBASE-17437.branch-1.003.patch, 
> HBASE-17437.branch-1.004.patch, hbase-17437-branch-1.2.patch, 
> HBASE-17437.master.001.patch, HBASE-17437.master.002.patch, 
> HBASE-17437.master.003.patch, HBASE-17437.master.004.patch, 
> HBASE-17437.master.005.patch, HBASE-17437.master.006.patch, 
> HBASE-17437.master.007.patch, HBASE-17437.master.008.patch, 
> HBASE-17437.master.009.patch, HBASE-17437.master.010.patch, 
> HBASE-17437.master.011.patch, HBASE-17437.master.012.patch, 
> hbase-17437-master.patch
>
>
> Currently, the WAL and the StoreFiles need to be on the same FileSystem. Some 
> FileSystems (such as Amazon S3) don’t support append or consistent writes. 
> These two properties are imperative for the WAL in order to avoid loss of 
> writes. However, StoreFiles don’t necessarily need the same consistency 
> guarantees (since writes are cached locally and if writes fail, they can 
> always be replayed from the WAL).
>  
> This JIRA aims to allow users to configure a log directory (for WALs) that is 
> outside of the root directory or even in a different FileSystem. The default 
> value will still put the log directory under the root directory.
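A minimal hbase-site.xml fragment sketching the intended usage, assuming the configuration key is `hbase.wal.dir` as in the master patches (confirm against the committed change): store files live on S3 while WALs go to an HDFS that supports append and consistent writes.

```xml
<!-- Sketch only: key name hbase.wal.dir assumed from the master patches.
     Store files on S3; WALs on HDFS, which supports append/consistency. -->
<property>
  <name>hbase.rootdir</name>
  <value>s3a://my-bucket/hbase</value>
</property>
<property>
  <name>hbase.wal.dir</name>
  <value>hdfs://namenode:8020/hbase-wal</value>
</property>
```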





[jira] [Commented] (HBASE-17437) Support specifying a WAL directory outside of the root directory

2017-02-07 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15856414#comment-15856414
 ] 

Ted Yu commented on HBASE-17437:


Can you look at the following test failure, which I encountered running the 
test locally?
{code}
testSeqIdsFromReplay(org.apache.hadoop.hbase.regionserver.TestHRegionReplayEvents)
  Time elapsed: 2.996 sec  <<< ERROR!
java.lang.IllegalStateException: Illegal WAL directory specified. WAL 
directories are not permitted to be under the root directory if set.
at 
org.apache.hadoop.hbase.regionserver.TestHRegionReplayEvents.initHRegion(TestHRegionReplayEvents.java:1660)
at 
org.apache.hadoop.hbase.regionserver.TestHRegionReplayEvents.initHRegion(TestHRegionReplayEvents.java:1653)
at 
org.apache.hadoop.hbase.regionserver.TestHRegionReplayEvents.testSeqIdsFromReplay(TestHRegionReplayEvents.java:1084)
{code}
Thanks

> Support specifying a WAL directory outside of the root directory
> 
>
> Key: HBASE-17437
> URL: https://issues.apache.org/jira/browse/HBASE-17437
> Project: HBase
>  Issue Type: Improvement
>  Components: Filesystem Integration, wal
>Affects Versions: 1.2.4
>Reporter: Yishan Yang
>Assignee: Zach York
>  Labels: patch
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17437.branch-1.001.patch, 
> HBASE-17437.branch-1.002.patch, HBASE-17437.branch-1.003.patch, 
> HBASE-17437.branch-1.004.patch, hbase-17437-branch-1.2.patch, 
> HBASE-17437.master.001.patch, HBASE-17437.master.002.patch, 
> HBASE-17437.master.003.patch, HBASE-17437.master.004.patch, 
> HBASE-17437.master.005.patch, HBASE-17437.master.006.patch, 
> HBASE-17437.master.007.patch, HBASE-17437.master.008.patch, 
> HBASE-17437.master.009.patch, HBASE-17437.master.010.patch, 
> HBASE-17437.master.011.patch, HBASE-17437.master.012.patch, 
> hbase-17437-master.patch
>
>
> Currently, the WAL and the StoreFiles need to be on the same FileSystem. Some 
> FileSystems (such as Amazon S3) don’t support append or consistent writes. 
> These two properties are imperative for the WAL in order to avoid loss of 
> writes. However, StoreFiles don’t necessarily need the same consistency 
> guarantees (since writes are cached locally and if writes fail, they can 
> always be replayed from the WAL).
>  
> This JIRA aims to allow users to configure a log directory (for WALs) that is 
> outside of the root directory or even in a different FileSystem. The default 
> value will still put the log directory under the root directory.





[jira] [Commented] (HBASE-17521) Avoid stopping the load balancer in graceful stop

2017-02-07 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15856433#comment-15856433
 ] 

stack commented on HBASE-17521:
---

Sounds good to me, [~sandeepbits.g]. You have a patch, sir? Thanks.

> Avoid stopping the load balancer in graceful stop
> -
>
> Key: HBASE-17521
> URL: https://issues.apache.org/jira/browse/HBASE-17521
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>
> ... instead setting the regionserver in question to draining.
> [~sandeep.guggilam], FYI





[jira] [Commented] (HBASE-17565) StochasticLoadBalancer may incorrectly skip balancing due to skewed multiplier sum

2017-02-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15856431#comment-15856431
 ] 

Hudson commented on HBASE-17565:


FAILURE: Integrated in Jenkins build HBase-1.4 #616 (See 
[https://builds.apache.org/job/HBase-1.4/616/])
HBASE-17565 StochasticLoadBalancer may incorrectly skip balancing due to 
(tedyu: rev 5a0020e8674f83667e422bbcab2b8641342e2c67)
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestStochasticLoadBalancer.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/BalancerTestBase.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestStochasticLoadBalancer2.java


> StochasticLoadBalancer may incorrectly skip balancing due to skewed 
> multiplier sum
> --
>
> Key: HBASE-17565
> URL: https://issues.apache.org/jira/browse/HBASE-17565
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Critical
> Fix For: 2.0.0, 1.4.0
>
> Attachments: 17565.v1.txt, 17565.v2.txt, 17565.v3.txt, 17565.v4.txt, 
> 17565.v5.txt, 17565.v6.txt
>
>
> I was investigating why a 6 node cluster kept skipping balancing requests.
> Here were the region counts on the servers:
> 449, 448, 447, 449, 453, 0
> {code}
> 2017-01-26 22:04:47,145 INFO  
> [RpcServer.deafult.FPBQ.Fifo.handler=1,queue=0,port=16000] 
> balancer.StochasticLoadBalancer: Skipping load balancing because balanced 
> cluster; total cost is 127.0171157050385, sum multiplier is 111087.0 min cost 
> which need balance is 0.05
> {code}
> The big multiplier sum caught my eyes. Here was what additional debug logging 
> showed:
> {code}
> 2017-01-27 23:25:31,749 DEBUG 
> [RpcServer.deafult.FPBQ.Fifo.handler=9,queue=0,port=16000] 
> balancer.StochasticLoadBalancer: class 
> org.apache.hadoop.hbase.master.balancer.  
> StochasticLoadBalancer$RegionReplicaHostCostFunction with multiplier 10.0
> 2017-01-27 23:25:31,749 DEBUG 
> [RpcServer.deafult.FPBQ.Fifo.handler=9,queue=0,port=16000] 
> balancer.StochasticLoadBalancer: class 
> org.apache.hadoop.hbase.master.balancer.  
> StochasticLoadBalancer$RegionReplicaRackCostFunction with multiplier 1.0
> {code}
> Note, however, that no table in the cluster used read replicas.
> I can think of two ways of fixing this situation:
> 1. If there is no read replica in the cluster, ignore the multipliers for the 
> above two functions.
> 2. When the cost() returned by a CostFunction is 0 (or very close to 0.0), 
> ignore its multiplier.
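Proposal 2 above can be sketched in plain Java. This is a hypothetical illustration of the idea, not the committed patch: the class name, method name, and the EPSILON threshold are assumptions.

```java
// Sketch of proposal 2: when a cost function reports ~0 cost, leave its
// multiplier out of the sum, so zero-cost functions (e.g. the region-replica
// cost functions on a cluster with no read replicas) cannot inflate the
// "sum multiplier" used in the balance/skip decision.
public class MultiplierSumSketch {
    static final double EPSILON = 1e-6; // assumed threshold, not from the patch

    // costs[i] is the raw cost from function i, multipliers[i] its weight
    public static double effectiveMultiplierSum(double[] costs, double[] multipliers) {
        double sum = 0;
        for (int i = 0; i < costs.length; i++) {
            if (costs[i] > EPSILON) { // only count functions that contribute cost
                sum += multipliers[i];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // The replica cost functions report 0.0 when no table uses read
        // replicas, so their large multipliers are excluded here.
        double[] costs = {0.0, 0.0, 0.8, 0.3};
        double[] multipliers = {100000.0, 10000.0, 500.0, 100.0};
        System.out.println(effectiveMultiplierSum(costs, multipliers)); // prints 600.0
    }
}
```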





[jira] [Commented] (HBASE-17275) Assign timeout cause region unassign forever

2017-02-07 Thread Stephen Yuan Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15856386#comment-15856386
 ] 

Stephen Yuan Jiang commented on HBASE-17275:


+1 V3 patch looks good.

> Assign timeout cause region unassign forever
> 
>
> Key: HBASE-17275
> URL: https://issues.apache.org/jira/browse/HBASE-17275
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 1.2.3, 1.1.7
>Reporter: Allan Yang
>Assignee: Allan Yang
> Attachments: HBASE-17275-branch-1.patch, 
> HBASE-17275-branch-1.v2.patch, HBASE-17275-branch-1.v3.patch
>
>
> This is a real case that happened in my test cluster.
> I had more than 8000 regions to assign when I restarted a cluster, but I only 
> started one regionserver. That means the master needed to assign these 8000 
> regions to a single server (I know it is not right, but just for testing).
> The RS received the open-region RPC and began to open regions. But due to the 
> huge number of regions, the master timed out the RPC call (though some regions 
> had already opened) after 1 minute, as you can see from log 1.
> {noformat}
> 1. 2016-11-22 10:17:32,285 INFO  [example.org:30001.activeMasterManager] 
> master.AssignmentManager: Unable to communicate with 
> example.org,30003,1479780976834 in order to assign regions,
> java.io.IOException: Call to /example.org:30003 failed on local exception: 
> org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=1, waitTime=60001, 
> operationTimeout=6 expired.
> at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl.wrapException(RpcClientImpl.java:1338)
> at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1272)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:216)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:290)
> at 
> org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$BlockingStub.openRegion(AdminProtos.java:30177)
> at 
> org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:1000)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1719)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:2828)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:2775)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assignAllUserRegions(AssignmentManager.java:2876)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.processDeadServersAndRegionsInTransition(AssignmentManager.java:646)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.joinCluster(AssignmentManager.java:493)
> at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:796)
> at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:188)
> at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1711)
> at java.lang.Thread.run(Thread.java:756)
> Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=1, 
> waitTime=60001, operationTimeout=6 expired.
> at org.apache.hadoop.hbase.ipc.Call.checkAndSetTimeout(Call.java:81)
> at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1246)
> ... 14 more  
> {noformat}
> For the region 7e9aee32eb98a6fc9d503b99fc5f9615 (like many others), after the 
> timeout, the master used a pool to re-assign them, as in 2:
> {noformat}
> 2. 2016-11-22 10:17:32,303 DEBUG [AM.-pool1-t26] master.AssignmentManager: 
> Force region state offline {7e9aee32eb98a6fc9d503b99fc5f9615 
> state=PENDING_OPEN, ts=1479780992078, server=example.org,30003,1479780976834} 
>  
> {noformat}
> But this region was actually opened on the RS; (maybe) due to the huge 
> pressure, the OPENED zk event was received late by the master, as you can tell 
> from 3, "which is more than 15 seconds late":
> {noformat}
> 3. 2016-11-22 10:17:32,304 DEBUG [AM.ZK.Worker-pool2-t3] 
> master.AssignmentManager: Handling RS_ZK_REGION_OPENED, 
> server=example.org,30003,1479780976834, 
> region=7e9aee32eb98a6fc9d503b99fc5f9615, which is more than 15 seconds late, 
> current_state={7e9aee32eb98a6fc9d503b99fc5f9615 state=PENDING_OPEN, 
> ts=1479780992078, server=example.org,30003,1479780976834}
> {noformat}
> In the meantime, the master still tried to re-assign this region in another 
> thread. The master first closed this region to guard against a multi-assign, 
> then changed the region's state from PENDING_OPEN -> OFFLINE -> PENDING_OPEN. 
> Its RIT node in zk was also transitioned to OFFLINE, as in 4, 5, 6, 7:
> {noformat}
> 4. 2016-11-22 10

[jira] [Commented] (HBASE-17437) Support specifying a WAL directory outside of the root directory

2017-02-07 Thread Zach York (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15856383#comment-15856383
 ] 

Zach York commented on HBASE-17437:
---

This test passes locally. [~enis] [~tedyu] Can someone take another look?

> Support specifying a WAL directory outside of the root directory
> 
>
> Key: HBASE-17437
> URL: https://issues.apache.org/jira/browse/HBASE-17437
> Project: HBase
>  Issue Type: Improvement
>  Components: Filesystem Integration, wal
>Affects Versions: 1.2.4
>Reporter: Yishan Yang
>Assignee: Zach York
>  Labels: patch
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17437.branch-1.001.patch, 
> HBASE-17437.branch-1.002.patch, HBASE-17437.branch-1.003.patch, 
> HBASE-17437.branch-1.004.patch, hbase-17437-branch-1.2.patch, 
> HBASE-17437.master.001.patch, HBASE-17437.master.002.patch, 
> HBASE-17437.master.003.patch, HBASE-17437.master.004.patch, 
> HBASE-17437.master.005.patch, HBASE-17437.master.006.patch, 
> HBASE-17437.master.007.patch, HBASE-17437.master.008.patch, 
> HBASE-17437.master.009.patch, HBASE-17437.master.010.patch, 
> HBASE-17437.master.011.patch, HBASE-17437.master.012.patch, 
> hbase-17437-master.patch
>
>
> Currently, the WAL and the StoreFiles need to be on the same FileSystem. Some 
> FileSystems (such as Amazon S3) don’t support append or consistent writes. 
> These two properties are imperative for the WAL in order to avoid loss of 
> writes. However, StoreFiles don’t necessarily need the same consistency 
> guarantees (since writes are cached locally and if writes fail, they can 
> always be replayed from the WAL).
>  
> This JIRA aims to allow users to configure a log directory (for WALs) that is 
> outside of the root directory or even in a different FileSystem. The default 
> value will still put the log directory under the root directory.
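The separate-WAL-directory setup this JIRA describes would surface as a configuration property. A hypothetical hbase-site.xml fragment follows; the property name hbase.wal.dir is assumed here and should be checked against the committed patch:

```xml
<!-- Hypothetical fragment: StoreFiles live under hbase.rootdir (possibly on
     an eventually consistent store like S3), while WALs go to a filesystem
     that supports append and consistent writes, e.g. HDFS. -->
<property>
  <name>hbase.rootdir</name>
  <value>s3a://my-bucket/hbase</value>
</property>
<property>
  <name>hbase.wal.dir</name>
  <value>hdfs://namenode:8020/hbase-wal</value>
</property>
```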





[jira] [Commented] (HBASE-17599) Use mayHaveMoreCellsInRow instead of isPartial

2017-02-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15856347#comment-15856347
 ] 

Hadoop QA commented on HBASE-17599:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 8s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
38s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 41s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 5m 
23s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
42s {color} | {color:green} master passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 8s 
{color} | {color:red} hbase-protocol-shaded in master has 24 extant Findbugs 
warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 6s 
{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
49s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 37s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 37s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 37s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 5m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
39s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
27m 33s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha2. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 1m 
22s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 
56s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 59s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 17s 
{color} | {color:green} hbase-protocol in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 28s 
{color} | {color:green} hbase-protocol-shaded in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 18s 
{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 79m 52s 
{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
58s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 149m 35s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:8d52d23 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12851407/HBASE-17599-v2.patch |
| JIRA Issue | HBASE-17599 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compil

[jira] [Updated] (HBASE-17001) [RegionServer] Implement enforcement of quota violation policies

2017-02-07 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HBASE-17001:
---
Attachment: HBASE-17001.006.HBASE-16961.patch

.006 updates from rb

> [RegionServer] Implement enforcement of quota violation policies
> 
>
> Key: HBASE-17001
> URL: https://issues.apache.org/jira/browse/HBASE-17001
> Project: HBase
>  Issue Type: Sub-task
>  Components: regionserver
>Reporter: Josh Elser
>Assignee: Josh Elser
> Fix For: 2.0.0
>
> Attachments: HBASE-17001.001.patch, HBASE-17001.003.patch, 
> HBASE-17001.004.HBASE-16961.patch, HBASE-17001.005.HBASE-16961.patch, 
> HBASE-17001.006.HBASE-16961.patch
>
>
> When the master enacts a quota violation policy, the RegionServers need to 
> actually enforce that policy per its definition.





[jira] [Commented] (HBASE-17472) Correct the semantic of permission grant

2017-02-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15856257#comment-15856257
 ] 

Hadoop QA commented on HBASE-17472:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 26s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
13s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 9s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 4m 
20s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
30s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
31s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 48s 
{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 8s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 8s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 8s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 4m 
22s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
27m 19s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha2. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 0m 
49s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
54s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 16s 
{color} | {color:green} hbase-protocol in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 16s 
{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 86m 11s {color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
40s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 144m 45s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.security.access.TestTablePermissions |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:8d52d23 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12851400/HBASE-17472.v2.patch |
| JIRA Issue | HBASE-17472 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  cc  hbaseprotoc  |
| uname | Linux b0b3b3d60482 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 
15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreComm

[jira] [Updated] (HBASE-17484) Add non cached version of OffheapKV for write path

2017-02-07 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-17484:
---
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

> Add non cached version of OffheapKV for write path
> --
>
> Key: HBASE-17484
> URL: https://issues.apache.org/jira/browse/HBASE-17484
> Project: HBase
>  Issue Type: Sub-task
>  Components: regionserver
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: HBASE-17484_1.patch, HBASE-17484_2.patch, 
> HBASE-17484.patch
>
>
> After running a lot of different performance tests for various scenarios and 
> with multiple threads, we thought it is better to have a version of OffheapKV 
> in the write path that does not cache anything and whose fixed_overhead is 
> equal to that of KeyValue.





[jira] [Updated] (HBASE-17484) Add non cached version of OffheapKV for write path

2017-02-07 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-17484:
---
Attachment: HBASE-17484_2.patch

This is what I will commit. Thanks for the comments [~anoop.hbase].

> Add non cached version of OffheapKV for write path
> --
>
> Key: HBASE-17484
> URL: https://issues.apache.org/jira/browse/HBASE-17484
> Project: HBase
>  Issue Type: Sub-task
>  Components: regionserver
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: HBASE-17484_1.patch, HBASE-17484_2.patch, 
> HBASE-17484.patch
>
>
> After running a lot of different performance tests for various scenarios and 
> with multiple threads, we thought it is better to have a version of OffheapKV 
> in the write path that does not cache anything and whose fixed_overhead is 
> equal to that of KeyValue.





[jira] [Commented] (HBASE-17472) Correct the semantic of permission grant

2017-02-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15856232#comment-15856232
 ] 

Hadoop QA commented on HBASE-17472:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 22s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
43s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 3s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 
46s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
29s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
13s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 47s 
{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
9s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 4s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 4s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 4s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 
42s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
29s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
24m 41s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha2. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 0m 
45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
39s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 46s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 17s 
{color} | {color:green} hbase-protocol in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 20s 
{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 83m 44s {color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
29s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 136m 36s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.security.access.TestTablePermissions |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:8d52d23 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12851393/HBASE-17472.v1.patch |
| JIRA Issue | HBASE-17472 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  cc  hbaseprotoc  |
| uname | Linux 06b502f9c6eb 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-H

[jira] [Commented] (HBASE-17571) Add batch coprocessor service support

2017-02-07 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15856119#comment-15856119
 ] 

Duo Zhang commented on HBASE-17571:
---

After reviewing the API, I do not think we need to provide a separate 
'batchCoprocessorService' method. This is just an implementation detail. Just 
as we group requests to the same RS together when implementing multi, we do 
not need to provide a 'groupedMulti' method.

What do you think sir? [~stack]

Thanks.
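The grouping idea mentioned above (batching per-region requests by their hosting server inside the client, rather than exposing a separate batch method) can be sketched in plain Java. The names here are illustrative stand-ins, not HBase classes:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: collapse a region -> server mapping into server -> regions, so the
// client can send one RPC per server covering all of that server's regions.
public class GroupBySketch {
    public static Map<String, List<String>> groupByServer(Map<String, String> regionToServer) {
        Map<String, List<String>> byServer = new HashMap<>();
        for (Map.Entry<String, String> e : regionToServer.entrySet()) {
            byServer.computeIfAbsent(e.getValue(), k -> new ArrayList<>())
                    .add(e.getKey());
        }
        return byServer; // one batched request per server
    }
}
```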

> Add batch coprocessor service support
> -
>
> Key: HBASE-17571
> URL: https://issues.apache.org/jira/browse/HBASE-17571
> Project: HBase
>  Issue Type: Sub-task
>  Components: Client
>Reporter: Duo Zhang
> Fix For: 2.0.0
>
>






[jira] [Comment Edited] (HBASE-17603) TestScannerResource#testTableDoesNotExist fails

2017-02-07 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15855238#comment-15855238
 ] 

Ted Yu edited comment on HBASE-17603 at 2/7/17 2:34 PM:


If we do not fetch data when calling getScanner(), the return code would not be 
404 for a non-existent table.
This is an incompatible change.

For getScanner() issued from hbase-rest module, maybe data can be fetched to 
maintain backward compatibility?




was (Author: yuzhih...@gmail.com):
If we do not fetch data when calling getScanner(), the return code would not be 
404 for a non-existent table.
This is an incompatible change.

For getScanner() issued from hbase-rest module, can data be fetched to maintain 
backward compatibility, [~Apache9] ?

> TestScannerResource#testTableDoesNotExist fails
> ---
>
> Key: HBASE-17603
> URL: https://issues.apache.org/jira/browse/HBASE-17603
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Ted Yu
>
> This was the first Jenkins build where 
> TestScannerResource#testTableDoesNotExist started failing.
> https://builds.apache.org/job/HBase-1.4/612/jdk=JDK_1_8,label=Hadoop/testReport/junit/org.apache.hadoop.hbase.rest/TestScannerResource/testTableDoesNotExist/
> The test failure can be reproduced locally.
> The test failure seemed to start after HBASE-17508 went in.





[jira] [Updated] (HBASE-17565) StochasticLoadBalancer may incorrectly skip balancing due to skewed multiplier sum

2017-02-07 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-17565:
---
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Thanks for the reviews.

> StochasticLoadBalancer may incorrectly skip balancing due to skewed 
> multiplier sum
> --
>
> Key: HBASE-17565
> URL: https://issues.apache.org/jira/browse/HBASE-17565
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Critical
> Fix For: 2.0.0, 1.4.0
>
> Attachments: 17565.v1.txt, 17565.v2.txt, 17565.v3.txt, 17565.v4.txt, 
> 17565.v5.txt, 17565.v6.txt
>
>
> I was investigating why a 6 node cluster kept skipping balancing requests.
> Here were the region counts on the servers:
> 449, 448, 447, 449, 453, 0
> {code}
> 2017-01-26 22:04:47,145 INFO  
> [RpcServer.deafult.FPBQ.Fifo.handler=1,queue=0,port=16000] 
> balancer.StochasticLoadBalancer: Skipping load balancing because balanced 
> cluster; total cost is 127.0171157050385, sum multiplier is 111087.0 min cost 
> which need balance is 0.05
> {code}
> The big multiplier sum caught my eyes. Here was what additional debug logging 
> showed:
> {code}
> 2017-01-27 23:25:31,749 DEBUG 
> [RpcServer.deafult.FPBQ.Fifo.handler=9,queue=0,port=16000] 
> balancer.StochasticLoadBalancer: class 
> org.apache.hadoop.hbase.master.balancer.  
> StochasticLoadBalancer$RegionReplicaHostCostFunction with multiplier 10.0
> 2017-01-27 23:25:31,749 DEBUG 
> [RpcServer.deafult.FPBQ.Fifo.handler=9,queue=0,port=16000] 
> balancer.StochasticLoadBalancer: class 
> org.apache.hadoop.hbase.master.balancer.  
> StochasticLoadBalancer$RegionReplicaRackCostFunction with multiplier 1.0
> {code}
> Note, however, that no table in the cluster used read replicas.
> I can think of two ways of fixing this situation:
> 1. If there is no read replica in the cluster, ignore the multipliers for the 
> above two functions.
> 2. When cost() returned by the CostFunction is 0 (or very very close to 0.0), 
> ignore the multiplier.
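The second fix can be sketched in isolation. The following is a self-contained model of the weighted-cost computation, not the actual StochasticLoadBalancer code; all class and method names are illustrative:

```java
import java.util.List;
import java.util.function.DoubleSupplier;

public class WeightedCostSketch {

    // Illustrative stand-in for a balancer cost function: a configured
    // multiplier plus a cost() supplier returning a value in [0.0, 1.0].
    record CostFunction(double multiplier, DoubleSupplier cost) {}

    // Total weighted cost and effective multiplier sum, skipping any
    // function whose cost is (near) zero -- fix #2 from the description.
    static double[] weightedCost(List<CostFunction> functions) {
        final double EPSILON = 1e-9;
        double total = 0.0;
        double sumMultiplier = 0.0;
        for (CostFunction f : functions) {
            double cost = f.cost().getAsDouble();
            if (f.multiplier() <= 0 || cost < EPSILON) {
                continue; // would only inflate the multiplier sum
            }
            total += f.multiplier() * cost;
            sumMultiplier += f.multiplier();
        }
        return new double[] { total, sumMultiplier };
    }

    public static void main(String[] args) {
        // Region-count skew is high, but the two replica cost functions
        // report 0.0 because no table uses read replicas.
        List<CostFunction> functions = List.of(
            new CostFunction(500.0, () -> 0.9),     // region count skew
            new CostFunction(100000.0, () -> 0.0),  // replica-host, unused
            new CostFunction(10000.0, () -> 0.0));  // replica-rack, unused
        double[] result = weightedCost(functions);
        System.out.println("total=" + result[0]
            + " sumMultiplier=" + result[1]);
        // With the skip, total/sumMultiplier reflects the real skew;
        // without it, the large idle multipliers would mask it.
    }
}
```

Under this scheme, the "total cost / sum multiplier vs. min cost" comparison in the log message above is no longer diluted by cost functions that contribute nothing.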



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17599) Use mayHaveMoreCellsInRow instead of isPartial

2017-02-07 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-17599:
--
Attachment: HBASE-17599-v2.patch

Moved the implementation hint into the field's comment. Renamed 
partialResultFormed to mayHaveMoreCellsInRow in ScannerContext.

> Use mayHaveMoreCellsInRow instead of isPartial
> --
>
> Key: HBASE-17599
> URL: https://issues.apache.org/jira/browse/HBASE-17599
> Project: HBase
>  Issue Type: Sub-task
>  Components: Client, scan
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17599.patch, HBASE-17599-v1.patch, 
> HBASE-17599-v2.patch
>
>
> For now, if we set scan.allowPartial(true), the partial Result returned will 
> have the partial flag set to true. But for scan.setBatch(xx), the partial 
> Result returned will not be marked as partial.
> This is an incompatible change, indeed, but I do not think it will introduce 
> any issues, as we just provide more information to the client. The old 
> partial flag for a batched scan is always false, so I do not think anyone 
> can make use of it.
> This is very important for the limited scan to support partial results from 
> the server. If we get a Result whose partial flag is false, then we know we 
> have the whole row. Otherwise we need to fetch one more row to see whether 
> the row key has changed, which makes the logic more complicated.
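The client-side simplification this enables can be sketched with a toy model. This is an illustrative, self-contained sketch, not the HBase client API; the Result record below only mimics the flag being discussed:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PartialResultAssembler {

    // Illustrative stand-in for an HBase Result: a row key, some cells,
    // and the flag this issue renames to mayHaveMoreCellsInRow.
    record Result(String row, List<String> cells,
                  boolean mayHaveMoreCellsInRow) {}

    // Groups a stream of possibly-partial Results into complete rows.
    // A row is known to be complete as soon as a Result arrives with
    // mayHaveMoreCellsInRow == false; no look-ahead at the next row key
    // is needed, which is the simplification described above.
    static Map<String, List<String>> assembleRows(List<Result> results) {
        Map<String, List<String>> rows = new LinkedHashMap<>();
        List<String> current = new ArrayList<>();
        for (Result r : results) {
            current.addAll(r.cells());
            if (!r.mayHaveMoreCellsInRow()) {
                rows.put(r.row(), current); // row complete, emit it
                current = new ArrayList<>();
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Result> stream = List.of(
            new Result("row1", List.of("c1", "c2"), true), // partial
            new Result("row1", List.of("c3"), false),      // row1 done
            new Result("row2", List.of("c1"), false));     // row2 done
        System.out.println(assembleRows(stream));
        // {row1=[c1, c2, c3], row2=[c1]}
    }
}
```

Without the flag, the loop would have to buffer each row until it saw a Result with a different row key before it could emit anything.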





[jira] [Commented] (HBASE-17574) Clean up how to run tests under hbase-spark module

2017-02-07 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15856067#comment-15856067
 ] 

Sean Busbey commented on HBASE-17574:
-

+1 thanks for this [~easyliangjob]!

> Clean up how to run tests under hbase-spark module 
> ---
>
> Key: HBASE-17574
> URL: https://issues.apache.org/jira/browse/HBASE-17574
> Project: HBase
>  Issue Type: Bug
>  Components: spark
>Affects Versions: 2.0.0
>Reporter: Yi Liang
>Assignee: Yi Liang
> Fix For: 2.0.0
>
> Attachments: HBase-17574-V1.patch, HBase-17574-V2.patch
>
>
> In the master branch, the tests of the hbase-spark module need clean-up.
> I think we need to let hbase-spark follow the conventions that exist in the 
> whole hbase project:
> 1. In hbase-spark, all the scala test cases are treated as integration 
> tests, i.e. we need to go to the hbase-spark folder and use mvn verify to 
> run them. I think these tests would be better treated as unit tests, for 
> the following reasons:
> (1) All the scala tests are very small; most of them finish within 20s.
> (2) Integration tests are usually put into the hbase-it module, not into a 
> module of their own.
> (3) Hadoop QA cannot run those scala tests in hbase-spark; I guess Hadoop 
> QA only runs mvn test under the root dir, whereas hbase-spark needs mvn 
> verify.
> (4) From its pom.xml below, you can see that both the integration-test and 
> test executions bind the same test goal. From the Maven reference, 
> http://maven.apache.org/guides/introduction/introduction-to-the-lifecycle.html#Built-in_Lifecycle_Bindings,
>  we know that if a goal is bound to one or more build phases, that goal will 
> be called in all those phases. That means mvn test and mvn integration-test 
> would do the same thing, except that the skip flag set to true in the test 
> phase simply disables the mvn test command. It is uncommon to define it 
> like that. 
> {code}
> <execution>
>   <id>test</id>
>   <phase>test</phase>
>   <goals>
>     <goal>test</goal>
>   </goals>
>   <configuration>
>     <skip>true</skip>
>   </configuration>
> </execution>
> <execution>
>   <id>integration-test</id>
>   <phase>integration-test</phase>
>   <goals>
>     <goal>test</goal>
>   </goals>
>   <configuration>
>     <!-- enclosing element name lost in the mail archive -->
>     Integration-Test
>     <argLine>-Xmx1536m -XX:MaxPermSize=512m
>       -XX:ReservedCodeCacheSize=512m</argLine>
>     <skip>false</skip>
>   </configuration>
> </execution>
> {code}





[jira] [Commented] (HBASE-17606) Fix failing TestRpcControllerFactory introduced by HBASE-17508

2017-02-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15856022#comment-15856022
 ] 

Hudson commented on HBASE-17606:


SUCCESS: Integrated in Jenkins build HBase-Trunk_matrix #2460 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/2460/])
HBASE-17606 Fix failing TestRpcControllerFactory introduced by (zhangduo: rev 
5c77a7dcd455f7a6e0ba3f289266032be687dc4f)
* (edit) 
hbase-endpoint/src/test/java/org/apache/hadoop/hbase/client/TestRpcControllerFactory.java


> Fix failing TestRpcControllerFactory introduced by HBASE-17508
> --
>
> Key: HBASE-17606
> URL: https://issues.apache.org/jira/browse/HBASE-17606
> Project: HBase
>  Issue Type: Bug
>  Components: Client, scan
>Affects Versions: 2.0.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0
>
> Attachments: HBASE-17606.patch
>
>






[jira] [Commented] (HBASE-17472) Correct the semantic of permission grant

2017-02-07 Thread huzheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15855994#comment-15855994
 ] 

huzheng commented on HBASE-17472:
-

Uploaded patch v2, which removes the unnecessary modifications.

> Correct the semantic of  permission grant
> -
>
> Key: HBASE-17472
> URL: https://issues.apache.org/jira/browse/HBASE-17472
> Project: HBase
>  Issue Type: Improvement
>  Components: Admin
>Reporter: huzheng
>Assignee: huzheng
> Fix For: 2.0.0
>
> Attachments: HBASE-17472.v1.patch, HBASE-17472.v2.patch
>
>
> Currently, the HBase grant operation has the following semantics:
> {code}
> hbase(main):019:0> grant 'hbase_tst', 'RW', 'ycsb'
> 0 row(s) in 0.0960 seconds
> hbase(main):020:0> user_permission 'ycsb'
> User        Namespace,Table,Family,Qualifier:Permission
>  hbase_tst  default,ycsb,,: [Permission:actions=READ,WRITE]
> 1 row(s) in 0.0550 seconds
> hbase(main):021:0> grant 'hbase_tst', 'CA', 'ycsb'
> 0 row(s) in 0.0820 seconds
> hbase(main):022:0> user_permission 'ycsb'
> User        Namespace,Table,Family,Qualifier:Permission
>  hbase_tst  default,ycsb,,: [Permission: actions=CREATE,ADMIN]
> 1 row(s) in 0.0490 seconds
> {code}
> A later grant replaces previously granted permissions, which has confused 
> many HBase administrators.
> It seems more reasonable for HBase to merge the permissions from multiple 
> grants.
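The proposed merge semantics can be sketched with a toy ACL model. The sketch below is illustrative only; it deliberately ignores namespaces, tables, and the real AccessController machinery:

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;

public class GrantMergeSketch {

    enum Action { READ, WRITE, CREATE, ADMIN, EXEC }

    // user -> granted actions; a toy stand-in for the hbase:acl table.
    private final Map<String, EnumSet<Action>> acl = new HashMap<>();

    // Proposed semantics: a new grant adds to, rather than replaces,
    // the user's previously granted actions.
    void grant(String user, EnumSet<Action> actions) {
        acl.computeIfAbsent(user, u -> EnumSet.noneOf(Action.class))
           .addAll(actions);
    }

    EnumSet<Action> permissionsOf(String user) {
        return acl.getOrDefault(user, EnumSet.noneOf(Action.class));
    }

    public static void main(String[] args) {
        GrantMergeSketch table = new GrantMergeSketch();
        table.grant("hbase_tst", EnumSet.of(Action.READ, Action.WRITE));   // 'RW'
        table.grant("hbase_tst", EnumSet.of(Action.CREATE, Action.ADMIN)); // 'CA'
        System.out.println(table.permissionsOf("hbase_tst"));
        // [READ, WRITE, CREATE, ADMIN] -- merged instead of replaced
    }
}
```

Under today's replace semantics, the second grant in the shell transcript above leaves only CREATE and ADMIN; with merge semantics the user would keep all four actions.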




