[jira] [Commented] (HBASE-16278) Use ConcurrentHashMap instead of ConcurrentSkipListMap if possible

2016-07-24 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15391387#comment-15391387
 ] 

Duo Zhang commented on HBASE-16278:
---

[~ikeda] One problem is that we may use the same byte[] as a key multiple times 
in a method, so declaring the map with a key type such as ByteArrayWrapper lets 
us wrap the array once and reuse the wrapper, instead of allocating an extra 
object every time.

And I think it is also a burden to track interface changes between Java 
versions. For example, Java 8 added a computeIfAbsent method, which is very 
useful. Since master is declared to support only Java 8+, a custom map class in 
master should implement this method too. But on branch-1 we cannot implement 
it, since branch-1 must also support Java 7. Of course this is not an 
unsolvable problem, but I think a wrapper class is simple and sufficient.
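A minimal sketch of the two points above, using a hypothetical ByteArrayWrapper class written only for illustration (not taken from HBase or from the attached ConcurrentHashByteArrayMap.java): the wrapper is allocated once and reused for several map calls, and the same get-or-create step is written both with Java 8's ConcurrentHashMap.computeIfAbsent and with the Java 7-compatible putIfAbsent idiom that a branch-1 version would need.

```java
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

public class WrapperReuse {
  /** Hypothetical key type: value-based equals/hashCode over a byte[] that must not be mutated. */
  static final class ByteArrayWrapper {
    final byte[] bytes;
    private final int hash;

    ByteArrayWrapper(byte[] bytes) {
      this.bytes = bytes;
      this.hash = Arrays.hashCode(bytes); // computed once, reused for every lookup
    }

    @Override
    public boolean equals(Object o) {
      return o instanceof ByteArrayWrapper && Arrays.equals(bytes, ((ByteArrayWrapper) o).bytes);
    }

    @Override
    public int hashCode() {
      return hash;
    }
  }

  // Java 8+ only: ConcurrentHashMap.computeIfAbsent performs get-or-create atomically.
  static AtomicLong counterJava8(ConcurrentHashMap<ByteArrayWrapper, AtomicLong> map,
      ByteArrayWrapper key) {
    return map.computeIfAbsent(key, k -> new AtomicLong());
  }

  // Java 7-compatible get-or-create: a losing candidate may be allocated and then discarded.
  static AtomicLong counterJava7(ConcurrentMap<ByteArrayWrapper, AtomicLong> map,
      ByteArrayWrapper key) {
    AtomicLong existing = map.get(key);
    if (existing != null) {
      return existing;
    }
    AtomicLong created = new AtomicLong();
    existing = map.putIfAbsent(key, created);
    return existing != null ? existing : created;
  }

  public static void main(String[] args) {
    ConcurrentHashMap<ByteArrayWrapper, AtomicLong> counters = new ConcurrentHashMap<>();
    ByteArrayWrapper key = new ByteArrayWrapper(new byte[] {1, 2, 3}); // one allocation...
    counterJava8(counters, key).incrementAndGet();                      // ...reused here
    counterJava7(counters, key).incrementAndGet();                      // ...and here
    System.out.println(counters.get(key).get()); // prints 2
  }
}
```

Both helpers return the same counter for equal keys; only the first one needs Java 8.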

Thanks.

> Use ConcurrentHashMap instead of ConcurrentSkipListMap if possible
> --
>
> Key: HBASE-16278
> URL: https://issues.apache.org/jira/browse/HBASE-16278
> Project: HBase
>  Issue Type: Improvement
>Reporter: Duo Zhang
> Attachments: ConcurrentHashByteArrayMap.java
>
>
> SSDs and 10G networks have made our system CPU-bound again, so the speed of 
> purely in-memory code is becoming more and more important.
> In HBase, if we want to use byte[] as a map key, we always use CSLM, even if 
> we do not need the map to be ordered. I know this saves one object 
> allocation, since we cannot use byte[] directly as a CHM key. But we all know 
> that CHM is faster than CSLM, so I wonder whether it is worth using CSLM 
> instead of CHM only to avoid one extra object allocation.
> Then I wrote a simple JMH micro-benchmark to compare the performance of CHM 
> and CSLM. The code can be found here:
> https://github.com/Apache9/microbench
> It turns out that CHM is still much faster than CSLM, even with one extra 
> object allocation.
> So I think we should always use CHM when we do not need the keys to be sorted.
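The following sketch (not the linked JMH benchmark) shows where the extra allocation comes from: a raw byte[] key uses identity equals/hashCode, so a hash map cannot find an equal-content probe, while CSLM accepts byte[] directly through a comparator. The comparator (similar in spirit to HBase's Bytes.BYTES_COMPARATOR) and the Key wrapper are assumptions written for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class ByteKeyMaps {
  // Unsigned lexicographic byte order; a shorter prefix sorts first.
  static final Comparator<byte[]> UNSIGNED_LEXICOGRAPHIC = (a, b) -> {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int diff = (a[i] & 0xff) - (b[i] & 0xff);
      if (diff != 0) {
        return diff;
      }
    }
    return a.length - b.length;
  };

  /** Hypothetical wrapper giving byte[] value semantics so it can key a hash map. */
  static final class Key {
    final byte[] bytes;

    Key(byte[] bytes) {
      this.bytes = bytes;
    }

    @Override
    public boolean equals(Object o) {
      return o instanceof Key && Arrays.equals(bytes, ((Key) o).bytes);
    }

    @Override
    public int hashCode() {
      return Arrays.hashCode(bytes);
    }
  }

  public static void main(String[] args) {
    byte[] stored = "row-1".getBytes(StandardCharsets.UTF_8);
    byte[] probe = stored.clone(); // same content, different array instance

    // Raw byte[] in a hash map: identity equals/hashCode, so the probe misses.
    ConcurrentHashMap<byte[], Long> raw = new ConcurrentHashMap<>();
    raw.put(stored, 1L);

    // CSLM takes byte[] directly via the comparator; lookups cost O(log n) comparisons.
    ConcurrentSkipListMap<byte[], Long> sorted = new ConcurrentSkipListMap<>(UNSIGNED_LEXICOGRAPHIC);
    sorted.put(stored, 1L);

    // CHM with a wrapper: expected O(1) lookups, one extra object per key or probe.
    ConcurrentHashMap<Key, Long> hashed = new ConcurrentHashMap<>();
    hashed.put(new Key(stored), 1L);

    System.out.println(raw.get(probe) + " " + sorted.get(probe) + " " + hashed.get(new Key(probe)));
    // prints "null 1 1"
  }
}
```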



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16275) Change ServerManager#onlineServers from ConcurrentHashMap to ConcurrentSkipListMap

2016-07-24 Thread huaxiang sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huaxiang sun updated HBASE-16275:
-
Status: Open  (was: Patch Available)

Work in progress.

> Change ServerManager#onlineServers from ConcurrentHashMap to 
> ConcurrentSkipListMap
> --
>
> Key: HBASE-16275
> URL: https://issues.apache.org/jira/browse/HBASE-16275
> Project: HBase
>  Issue Type: Improvement
>Reporter: huaxiang sun
>Assignee: huaxiang sun
>Priority: Minor
> Attachments: HBASE-16275-v001.patch
>
>
> In class ServerManager, onlineServers is declared as a ConcurrentHashMap. In 
> findServerWithSameHostnamePortWithLock(), it has to loop over all entries to 
> find whether there is a ServerName with the same host:port pair. If it were 
> replaced with a ConcurrentSkipListMap, findServerWithSameHostnamePortWithLock() 
> could be reimplemented in O(log N). 
> I ran some performance comparisons (of this function only); it seems that 
> there is no difference with 1000 servers. With more servers, the 
> ConcurrentSkipListMap implementation is going to win big.
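A sketch of the lookup described above, assuming a hypothetical ServerKey class ordered by host, then port, then startcode (the real ServerName differs in detail). With a sorted map, the first key at or after (host, port, Long.MIN_VALUE) is the only candidate that needs checking, so the O(N) scan becomes a single O(log N) ceilingKey call.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class HostPortLookup {
  /** Hypothetical stand-in for ServerName: ordered by host, then port, then startcode. */
  static final class ServerKey implements Comparable<ServerKey> {
    final String host;
    final int port;
    final long startcode;

    ServerKey(String host, int port, long startcode) {
      this.host = host;
      this.port = port;
      this.startcode = startcode;
    }

    boolean sameHostAndPort(ServerKey other) {
      return host.equals(other.host) && port == other.port;
    }

    @Override
    public int compareTo(ServerKey o) {
      int c = host.compareTo(o.host);
      if (c != 0) return c;
      c = Integer.compare(port, o.port);
      if (c != 0) return c;
      return Long.compare(startcode, o.startcode); // no (int) cast of a long difference
    }

    @Override
    public boolean equals(Object o) {
      return o instanceof ServerKey && compareTo((ServerKey) o) == 0;
    }

    @Override
    public int hashCode() {
      return 31 * (31 * host.hashCode() + port) + Long.hashCode(startcode);
    }
  }

  // O(N): a hash map gives no useful order, so every key must be inspected.
  static ServerKey findLinear(Map<ServerKey, ?> servers, String host, int port) {
    ServerKey probe = new ServerKey(host, port, Long.MIN_VALUE);
    for (ServerKey sk : servers.keySet()) {
      if (sk.sameHostAndPort(probe)) return sk;
    }
    return null;
  }

  // O(log N): jump to the first key >= (host, port, Long.MIN_VALUE) and check only that one.
  static ServerKey findSorted(ConcurrentSkipListMap<ServerKey, ?> servers, String host, int port) {
    ServerKey probe = new ServerKey(host, port, Long.MIN_VALUE);
    ServerKey candidate = servers.ceilingKey(probe);
    return candidate != null && candidate.sameHostAndPort(probe) ? candidate : null;
  }

  public static void main(String[] args) {
    ConcurrentHashMap<ServerKey, Object> hashed = new ConcurrentHashMap<>();
    ConcurrentSkipListMap<ServerKey, Object> sorted = new ConcurrentSkipListMap<>();
    for (int i = 0; i < 5; i++) {
      ServerKey sk = new ServerKey("rs" + i + ".example.com", 16020, 1000L + i);
      hashed.put(sk, Boolean.TRUE);
      sorted.put(sk, Boolean.TRUE);
    }
    System.out.println(findLinear(hashed, "rs3.example.com", 16020).startcode); // 1003
    System.out.println(findSorted(sorted, "rs3.example.com", 16020).startcode); // 1003
    System.out.println(findSorted(sorted, "rs3.example.com", 16030));           // null
  }
}
```

If several entries ever shared a host:port, findSorted would return the one with the lowest startcode, whereas findLinear returns whichever it meets first.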





[jira] [Work started] (HBASE-16275) Change ServerManager#onlineServers from ConcurrentHashMap to ConcurrentSkipListMap

2016-07-24 Thread huaxiang sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-16275 started by huaxiang sun.



[jira] [Commented] (HBASE-16275) Change ServerManager#onlineServers from ConcurrentHashMap to ConcurrentSkipListMap

2016-07-24 Thread huaxiang sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15391379#comment-15391379
 ] 

huaxiang sun commented on HBASE-16275:
--

[~allan163], Yeah, I do not have numbers yet; I am still working on some real 
performance testing and will share more later.



[jira] [Updated] (HBASE-16278) Use ConcurrentHashMap instead of ConcurrentSkipListMap if possible

2016-07-24 Thread Hiroshi Ikeda (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hiroshi Ikeda updated HBASE-16278:
--
Attachment: ConcurrentHashByteArrayMap.java

Added a trial class which might be useful.



[jira] [Commented] (HBASE-9465) Push entries to peer clusters serially

2016-07-24 Thread Phil Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15391364#comment-15391364
 ] 

Phil Yang commented on HBASE-9465:
--

HBASE-16281

> Push entries to peer clusters serially
> --
>
> Key: HBASE-9465
> URL: https://issues.apache.org/jira/browse/HBASE-9465
> Project: HBase
>  Issue Type: New Feature
>  Components: regionserver, Replication
>Reporter: Honghua Feng
>Assignee: Phil Yang
> Attachments: HBASE-9465-v1.patch, HBASE-9465-v2.patch, HBASE-9465.pdf
>
>
> When a region move or RS failure occurs in the master cluster, the hlog 
> entries that were not pushed before the move or failure are pushed by the 
> original RS (for a region move) or by another RS that takes over the remaining 
> hlogs of the dead RS (for an RS failure), while new entries for the same 
> region(s) are pushed by the RS that now serves the region(s). Both push hlog 
> entries of the same region concurrently, without coordination.
> This can lead to data inconsistency between the master and peer clusters:
> 1. A put and then a delete are written to the master cluster.
> 2. Due to the region move / RS failure, they are pushed to the peer cluster by 
> different replication-source threads.
> 3. If the delete is pushed to the peer cluster before the put, and a flush and 
> major compaction occur in the peer cluster before the put is pushed, the 
> delete is collected and the put remains in the peer cluster.
> In this scenario the put remains in the peer cluster, but in the master 
> cluster it is masked by the delete, hence the data inconsistency between the 
> master and peer clusters.
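To make step 3 concrete, here is a toy single-cell model (an assumption for illustration, not HBase's storage code): a delete marker masks puts whose timestamp is less than or equal to its own, and a major compaction drops both the masked puts and the marker. Applying the same two mutations in a different order on the peer leaves the put visible.

```java
import java.util.ArrayList;
import java.util.List;

public class ReplicationOrderingDemo {
  /** Toy single-row, single-column store holding put timestamps and delete-marker timestamps. */
  static final class ToyStore {
    final List<Long> putTimestamps = new ArrayList<>();
    final List<Long> deleteTimestamps = new ArrayList<>();

    void put(long ts) {
      putTimestamps.add(ts);
    }

    void delete(long ts) {
      deleteTimestamps.add(ts);
    }

    private boolean masked(long putTs) {
      for (long d : deleteTimestamps) {
        if (d >= putTs) return true;
      }
      return false;
    }

    /** A read sees the cell if any put is not masked by a delete marker. */
    boolean visible() {
      for (long p : putTimestamps) {
        if (!masked(p)) return true;
      }
      return false;
    }

    /** Toy major compaction: drop masked puts, then drop the delete markers themselves. */
    void majorCompact() {
      putTimestamps.removeIf(this::masked);
      deleteTimestamps.clear();
    }
  }

  public static void main(String[] args) {
    // Master cluster: put at ts=1, then delete at ts=2, applied in order.
    ToyStore master = new ToyStore();
    master.put(1);
    master.delete(2);

    // Peer cluster: the delete arrives first, flush + major compaction run, then the put arrives.
    ToyStore peer = new ToyStore();
    peer.delete(2);
    peer.majorCompact();
    peer.put(1);

    System.out.println("master visible=" + master.visible() + ", peer visible=" + peer.visible());
    // prints "master visible=false, peer visible=true"
  }
}
```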





[jira] [Updated] (HBASE-16281) TestMasterReplication is flaky

2016-07-24 Thread Phil Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phil Yang updated HBASE-16281:
--
Attachment: HBASE-16281-v1.patch

> TestMasterReplication is flaky
> --
>
> Key: HBASE-16281
> URL: https://issues.apache.org/jira/browse/HBASE-16281
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.1.5, 1.2.2, 0.98.20
>Reporter: Phil Yang
>Assignee: Phil Yang
> Attachments: HBASE-16281-v1.patch
>
>
> In TestMasterReplication we put some mutations and wait until we can read the 
> data from the slave cluster. However, the waiting time is too short: the 
> replication service in the slave cluster may not be initialized and ready to 
> handle replication RPC requests within several seconds. 
> We should wait longer.
> {quote}
> 2016-07-25 11:47:03,156 WARN  [Time-limited 
> test-EventThread.replicationSource,1.replicationSource.10.235.114.28%2C56313%2C1469418386448,1]
>  regionserver.HBaseInterClusterReplicationEndpoint(310): Can't replicate 
> because of a local or network error: 
> java.io.IOException: java.io.IOException: Replication services are not 
> initialized yet
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2263)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:118)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:189)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:169)
> Caused by: com.google.protobuf.ServiceException: Replication services are not 
> initialized yet
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.replicateWALEntry(RSRpcServices.java:1935)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:22751)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2212)
>   ... 3 more
> {quote}
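One way to make such a wait explicit is a polling helper with a deadline, so the timeout is a named parameter that can be chosen from how long the peer's replication service actually takes to start, rather than a fixed short wait. The waitUntil helper below is hypothetical (not HBase's test API), and the 200 ms startup delay is simulated.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

public class WaitUntil {
  /**
   * Polls {@code condition} every {@code intervalMs} until it holds or {@code timeoutMs} elapses.
   * Returns true on success, false on timeout. Hypothetical helper, not an HBase API.
   */
  static boolean waitUntil(BooleanSupplier condition, long timeoutMs, long intervalMs)
      throws InterruptedException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      long remainingNs = deadline - System.nanoTime(); // overflow-safe nanoTime comparison
      if (remainingNs <= 0) {
        return false;
      }
      TimeUnit.NANOSECONDS.sleep(Math.min(remainingNs, TimeUnit.MILLISECONDS.toNanos(intervalMs)));
    }
  }

  public static void main(String[] args) throws InterruptedException {
    long start = System.nanoTime();
    // Simulated peer whose replication service becomes ready about 200 ms after start.
    BooleanSupplier peerReady =
        () -> System.nanoTime() - start > TimeUnit.MILLISECONDS.toNanos(200);
    System.out.println(waitUntil(peerReady, 50, 10));    // very likely false: timeout < startup
    System.out.println(waitUntil(peerReady, 5_000, 10)); // true once the simulated service is up
  }
}
```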





[jira] [Updated] (HBASE-16281) TestMasterReplication is flaky

2016-07-24 Thread Phil Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phil Yang updated HBASE-16281:
--
Status: Patch Available  (was: Open)



[jira] [Updated] (HBASE-16281) TestMasterReplication is flaky

2016-07-24 Thread Phil Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phil Yang updated HBASE-16281:
--
Description: 
In TestMasterReplication we put some mutations and wait until we can read the 
data from the slave cluster. However, the waiting time is too short: the 
replication service in the slave cluster may not be initialized and ready to 
handle replication RPC requests within several seconds. 
We should wait longer.

{quote}
2016-07-25 11:47:03,156 WARN  [Time-limited 
test-EventThread.replicationSource,1.replicationSource.10.235.114.28%2C56313%2C1469418386448,1]
 regionserver.HBaseInterClusterReplicationEndpoint(310): Can't replicate 
because of a local or network error: 
java.io.IOException: java.io.IOException: Replication services are not 
initialized yet
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2263)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:118)
at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:189)
at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:169)
Caused by: com.google.protobuf.ServiceException: Replication services are not 
initialized yet
at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.replicateWALEntry(RSRpcServices.java:1935)
at 
org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:22751)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2212)
... 3 more
{quote}

  was:
In TestMasterReplication we put some mutations and wait until we can read the 
data from the slave cluster. However, the waiting time is too short: the 
replication service in the slave cluster may not be initialized and ready to 
handle replication RPC requests within several seconds. 
We should wait longer.




[jira] [Created] (HBASE-16281) TestMasterReplication is flaky

2016-07-24 Thread Phil Yang (JIRA)
Phil Yang created HBASE-16281:
-

 Summary: TestMasterReplication is flaky
 Key: HBASE-16281
 URL: https://issues.apache.org/jira/browse/HBASE-16281
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.20, 1.2.2, 1.1.5
Reporter: Phil Yang
Assignee: Phil Yang


In TestMasterReplication we put some mutations and wait until we can read the 
data from the slave cluster. However, the waiting time is too short: the 
replication service in the slave cluster may not be initialized and ready to 
handle replication RPC requests within several seconds. 
We should wait longer.





[jira] [Commented] (HBASE-9465) Push entries to peer clusters serially

2016-07-24 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15391287#comment-15391287
 ] 

Duo Zhang commented on HBASE-9465:
--

I think we should open a new issue to revisit this test case. We need to 
understand the test case and then choose a proper timeout, not just enlarge 
the timeout.



[jira] [Created] (HBASE-16280) Use hash based map in SequenceIdAccounting

2016-07-24 Thread Duo Zhang (JIRA)
Duo Zhang created HBASE-16280:
-

 Summary: Use hash based map in SequenceIdAccounting
 Key: HBASE-16280
 URL: https://issues.apache.org/jira/browse/HBASE-16280
 Project: HBase
  Issue Type: Sub-task
Reporter: Duo Zhang
Assignee: Duo Zhang


SequenceIdAccounting's update method is on the write path.





[jira] [Commented] (HBASE-9465) Push entries to peer clusters serially

2016-07-24 Thread Phil Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15391278#comment-15391278
 ] 

Phil Yang commented on HBASE-9465:
--

The failure message is "Waited too much time for replication" because the logs 
are not pushed in time. It seems unrelated to my patch. Maybe we need to open a 
new issue to enlarge the waiting time in TestMasterReplication and reduce the 
probability of failure.



[jira] [Commented] (HBASE-14479) Apply the Leader/Followers pattern to RpcServer's Reader

2016-07-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15391276#comment-15391276
 ] 

Hadoop QA commented on HBASE-14479:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 17s {color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s {color} | {color:green} branch-1 passed with JDK v1.8.0 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s {color} | {color:green} branch-1 passed with JDK v1.7.0_80 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 18s {color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 32s {color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 52s {color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 42s {color} | {color:green} branch-1 passed with JDK v1.8.0 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s {color} | {color:green} branch-1 passed with JDK v1.7.0_80 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s {color} | {color:green} the patch passed with JDK v1.8.0 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s {color} | {color:green} the patch passed with JDK v1.7.0_80 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 35s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 58s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 14m 35s {color} | {color:red} Patch causes 11 errors with Hadoop v2.6.1. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 16m 20s {color} | {color:red} Patch causes 11 errors with Hadoop v2.6.2. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 18m 3s {color} | {color:red} Patch causes 11 errors with Hadoop v2.6.3. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 19m 47s {color} | {color:red} Patch causes 11 errors with Hadoop v2.7.1. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 9s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s {color} | {color:green} the patch passed with JDK v1.8.0 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 33s {color} | {color:green} the patch passed with JDK v1.7.0_80 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 85m 30s {color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s {color} | {color:green} Patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 128m 56s {color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.ipc.TestRpcClientLeaks |
|   | hadoop.hbase.procedure.TestProcedureManager |
|   | hadoop.hbase.master.balancer.TestRegionLocationFinder |
| Timed out junit tests | org.apache.hadoop.hbase.ipc.TestAsyncIPC |
|   | org.apache.hadoop.hbase.ipc.TestIPC |
|   | org.apache.hadoop.hbase.security.TestAsyncSecureIPC |
|   | org.apache.hadoop.hbase.security.TestSecureIPC |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch 

[jira] [Commented] (HBASE-16275) Change ServerManager#onlineServers from ConcurrentHashMap to ConcurrentSkipListMap

2016-07-24 Thread Allan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15391248#comment-15391248
 ] 

Allan Yang commented on HBASE-16275:


As you said, "it seems that there is no difference if there are 1000 
servers", so at how many servers does the patch start to make a difference? As 
far as I know, the biggest HBase cluster is still no more than several 
thousand nodes, so I think this fix is trivial.



[jira] [Commented] (HBASE-16266) Do not throw ScannerTimeoutException when catch UnknownScannerException

2016-07-24 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15391238#comment-15391238
 ] 

Duo Zhang commented on HBASE-16266:
---

This is not a big change, so let's finish it. I will commit this evening if 
there are no objections.

Thanks.

> Do not throw ScannerTimeoutException when catch UnknownScannerException
> ---
>
> Key: HBASE-16266
> URL: https://issues.apache.org/jira/browse/HBASE-16266
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Scanners
>Affects Versions: 1.1.5, 1.2.2
>Reporter: Phil Yang
>Assignee: Phil Yang
> Attachments: HBASE-16266-v1.patch, HBASE-16266-v2.patch, 
> HBASE-16266-v3.patch
>
>
> Scanners now have heartbeats to prevent timeouts, so the time blocked in 
> ResultScanner.next() may be much longer than the scanner timeout. It is 
> therefore no longer necessary to throw ScannerTimeoutException when the 
> server throws UnknownScannerException; we can just reset the scanner, as we 
> do for NotServingRegionException.
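A toy sketch of the proposed behavior, assuming hypothetical ToyRegion and ScannerLostException classes in place of a real region server and UnknownScannerException: when the server no longer knows the scanner, the client reopens a scanner just after the last row it returned instead of surfacing a timeout error. This does not mirror the HBase client's actual retry code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class ResumingScanner {
  /** Stand-in for the server no longer knowing an open scanner. */
  static final class ScannerLostException extends RuntimeException {
    ScannerLostException() {
      super("scanner no longer known to the server");
    }
  }

  /** Toy region: serves sorted rows and "forgets" an open scanner a limited number of times. */
  static final class ToyRegion {
    private final NavigableSet<String> rows;
    private final int rowsBeforeFailure;
    private int failuresLeft;

    ToyRegion(NavigableSet<String> rows, int rowsBeforeFailure, int failuresLeft) {
      this.rows = rows;
      this.rowsBeforeFailure = rowsBeforeFailure;
      this.failuresLeft = failuresLeft;
    }

    /** Opens a scanner over rows strictly after {@code startAfter} (null = from the first row). */
    Iterator<String> openScanner(String startAfter) {
      Iterator<String> delegate =
          (startAfter == null ? rows : rows.tailSet(startAfter, false)).iterator();
      return new Iterator<String>() {
        private int served;

        @Override
        public boolean hasNext() {
          return delegate.hasNext();
        }

        @Override
        public String next() {
          if (served == rowsBeforeFailure && failuresLeft > 0) {
            failuresLeft--;
            throw new ScannerLostException();
          }
          served++;
          return delegate.next();
        }
      };
    }
  }

  /** Client loop: on a lost scanner, reopen after the last returned row instead of failing. */
  static List<String> scanAll(ToyRegion region) {
    List<String> out = new ArrayList<>();
    String lastRow = null;
    Iterator<String> scanner = region.openScanner(null);
    while (scanner.hasNext()) {
      try {
        lastRow = scanner.next();
        out.add(lastRow);
      } catch (ScannerLostException e) {
        scanner = region.openScanner(lastRow); // reset and resume, as the issue proposes
      }
    }
    return out;
  }

  public static void main(String[] args) {
    ToyRegion region = new ToyRegion(new TreeSet<>(Arrays.asList("a", "b", "c", "d", "e")), 2, 1);
    System.out.println(scanAll(region)); // prints [a, b, c, d, e]
  }
}
```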





[jira] [Updated] (HBASE-14479) Apply the Leader/Followers pattern to RpcServer's Reader

2016-07-24 Thread Hiroshi Ikeda (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hiroshi Ikeda updated HBASE-14479:
--
Attachment: HBASE-14479.branch-1.V5_e.patch

Added a renamed patch for Hadoop QA, with a typo fixed.

> Apply the Leader/Followers pattern to RpcServer's Reader
> 
>
> Key: HBASE-14479
> URL: https://issues.apache.org/jira/browse/HBASE-14479
> Project: HBase
>  Issue Type: Improvement
>  Components: IPC/RPC, Performance
>Reporter: Hiroshi Ikeda
>Assignee: Hiroshi Ikeda
>Priority: Minor
> Attachments: HBASE-14479-V2 (1).patch, HBASE-14479-V2.patch, 
> HBASE-14479-V2.patch, HBASE-14479-V3-experimental_branch-1.patch, 
> HBASE-14479-V4-experimental_branch-1.patch, HBASE-14479.branch-1.V5_e.patch, 
> HBASE-14479.patch, flamegraph-19152.svg, flamegraph-32667.svg, gc.png, 
> gets.png, io.png, median.png
>
>
> {{RpcServer}} uses multiple selectors to read data for load distribution, but 
> the distribution is done simply by round-robin. It is uncertain, especially 
> over long runs, whether the load is divided equally and resources are used 
> without waste.
> Moreover, multiple selectors may cause excessive context switches, which 
> favor low latency (even though we just add the requests to queues) and may 
> reduce the throughput of the whole server.
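For reference, a minimal sketch of the Leader/Followers pattern itself, not the HBASE-14479 patch: a pool of threads shares one event source; the thread holding the lock is the leader and is the only one waiting for events, and it hands leadership to a follower before processing what it received. A LinkedBlockingQueue stands in for a selector here; it is already thread-safe on its own, so it only illustrates the hand-off, not real NIO readiness selection.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class LeaderFollowers {
  private final ReentrantLock leadership = new ReentrantLock(); // the lock holder is the leader
  private final BlockingQueue<Integer> eventSource;             // stand-in for a demultiplexer
  final AtomicInteger handled = new AtomicInteger();

  LeaderFollowers(BlockingQueue<Integer> eventSource) {
    this.eventSource = eventSource;
  }

  /** Each pool thread runs this loop; followers block on the lock until promoted. */
  void workerLoop(CountDownLatch done) {
    while (!Thread.currentThread().isInterrupted()) {
      Integer event;
      leadership.lock();
      try {
        event = eventSource.poll(100, TimeUnit.MILLISECONDS); // only the leader waits here
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      } finally {
        leadership.unlock(); // promote a follower before processing
      }
      if (event != null) {
        handled.incrementAndGet(); // "process" the event outside the lock
        done.countDown();
      }
    }
  }

  public static void main(String[] args) throws InterruptedException {
    int events = 1000;
    BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
    for (int i = 0; i < events; i++) {
      queue.add(i);
    }
    LeaderFollowers pool = new LeaderFollowers(queue);
    CountDownLatch done = new CountDownLatch(events);
    Thread[] threads = new Thread[4];
    for (int i = 0; i < threads.length; i++) {
      threads[i] = new Thread(() -> pool.workerLoop(done));
      threads[i].start();
    }
    done.await();
    for (Thread t : threads) {
      t.interrupt();
      t.join();
    }
    System.out.println(pool.handled.get()); // prints 1000
  }
}
```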





[jira] [Updated] (HBASE-16272) Overflow in ServerName's compareTo method

2016-07-24 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-16272:
---
Fix Version/s: 1.4.0

> Overflow in ServerName's compareTo method
> -
>
> Key: HBASE-16272
> URL: https://issues.apache.org/jira/browse/HBASE-16272
> Project: HBase
>  Issue Type: Bug
>  Components: hbase
>Reporter: huaxiang sun
>Assignee: huaxiang sun
> Fix For: 2.0.0, 1.3.0, 1.4.0, 1.1.6, 0.98.21, 1.2.3
>
> Attachments: HBASE-16272-v001.patch
>
>
> Looking at ServerName's compareTo(), 
> https://github.com/apache/hbase/blob/master/hbase-common/src/main/java/org/apache/hadoop/hbase/ServerName.java#L303
> it computes the int return value by casting a long to int, as in 
> (int)(longValue), which can be incorrect when the value overflows. It needs to 
> be replaced with Long.compare(a, b).
> [~mbertozzi] found some others as well, such as
> https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/LruBlockCache.java#L990
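A small illustration of the bug class described above, written as a standalone Java sketch; the class and method names are hypothetical and the code is not taken from ServerName or LruBlockCache. Subtracting two longs and casting the difference to int keeps only the low 32 bits, so the sign of the result can flip or collapse to zero, while Long.compare never subtracts and therefore cannot overflow.

```java
// Hypothetical demo of why (int) (a - b) is unsafe as a compareTo result.
final class CompareOverflowDemo {
    // Buggy pattern: the long difference can overflow, and even when it does
    // not, the narrowing cast to int discards the high 32 bits.
    static int badCompare(long a, long b) {
        return (int) (a - b);
    }

    // Safe pattern: Long.compare returns -1, 0 or 1 without subtracting.
    static int goodCompare(long a, long b) {
        return Long.compare(a, b);
    }
}
```

For example, badCompare(1L << 31, 0L) returns Integer.MIN_VALUE, reporting the larger value as the smaller one, and badCompare(1L << 32, 0L) returns 0, reporting two different values as equal. goodCompare returns a positive value in both cases.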





[jira] [Commented] (HBASE-16278) Use ConcurrentHashMap instead of ConcurrentSkipListMap if possible

2016-07-24 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15391123#comment-15391123
 ] 

stack commented on HBASE-16278:
---

Makes sense [~Apache9]

> Use ConcurrentHashMap instead of ConcurrentSkipListMap if possible
> --
>
> Key: HBASE-16278
> URL: https://issues.apache.org/jira/browse/HBASE-16278
> Project: HBase
>  Issue Type: Improvement
>Reporter: Duo Zhang
>
> SSDs and 10G networks make our system CPU bound again, so the speed of code 
> that only performs memory operations becomes more and more important.
> In HBase, if we want to use byte[] as a map key, we always use CSLM, even if 
> we do not need the map to be ordered. I know this saves one object allocation, 
> since we cannot use byte[] directly as a CHM key. But we all know that CHM is 
> faster than CSLM, so I wonder whether it is worth using CSLM instead of CHM 
> just to avoid one extra object allocation.
> So I wrote a simple JMH micro benchmark to compare the performance of CHM and 
> CSLM. The code can be found here:
> https://github.com/Apache9/microbench
> It turns out that CHM is still much faster than CSLM, even with one extra 
> object allocation.
> So I think we should always use CHM if we do not need the keys to be sorted.
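To make the trade-off concrete, here is a hedged Java sketch of the two options. ByteArrayKey and ByteKeyMaps are hypothetical names, not classes from HBase or from the attached ConcurrentHashByteArrayMap.java. A raw byte[] uses identity equality, so a ConcurrentHashMap needs a wrapper with content-based equals/hashCode (the one extra allocation per lookup), while a ConcurrentSkipListMap can take byte[] directly given an unsigned lexicographic comparator, similar in spirit to HBase's Bytes.BYTES_COMPARATOR.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical wrapper giving byte[] content-based equality so it can be a CHM key.
final class ByteArrayKey {
    private final byte[] bytes;
    private final int hash;  // cached, since the key is hashed on every operation

    ByteArrayKey(byte[] bytes) {
        this.bytes = bytes;  // not copied: callers must not mutate the array afterwards
        this.hash = Arrays.hashCode(bytes);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof ByteArrayKey && Arrays.equals(bytes, ((ByteArrayKey) o).bytes);
    }

    @Override
    public int hashCode() {
        return hash;
    }
}

final class ByteKeyMaps {
    // Unsigned lexicographic order. The subtractions cannot overflow because
    // the operands are 0..255 and non-negative array lengths.
    static final Comparator<byte[]> UNSIGNED_LEX = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = (a[i] & 0xff) - (b[i] & 0xff);
            if (c != 0) {
                return c;
            }
        }
        return a.length - b.length;
    };

    // Sorted map keyed directly by byte[]: no wrapper, O(log n) operations.
    static ConcurrentSkipListMap<byte[], String> sortedMap() {
        return new ConcurrentSkipListMap<>(UNSIGNED_LEX);
    }

    // Unsorted map: each lookup allocates a ByteArrayKey.
    static ConcurrentHashMap<ByteArrayKey, String> hashMap() {
        return new ConcurrentHashMap<>();
    }
}
```

Which option wins in practice is what the JMH benchmark linked above measures; the sketch only shows where the extra allocation comes from.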





[jira] [Commented] (HBASE-16229) Cleaning up size and heapSize calculation

2016-07-24 Thread Anastasia Braginsky (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15391032#comment-15391032
 ] 

Anastasia Braginsky commented on HBASE-16229:
-

Very long patch, can you please put it on Review Board?

> Cleaning up size and heapSize calculation
> -
>
> Key: HBASE-16229
> URL: https://issues.apache.org/jira/browse/HBASE-16229
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Anoop Sam John
>Assignee: Anoop Sam John
> Fix For: 2.0.0
>
> Attachments: HBASE-16229.patch, HBASE-16229_V2.patch, 
> HBASE-16229_V3.patch
>
>
> It is a bit ugly now. For example, in
> AbstractMemStore
> {code}
> public final static long FIXED_OVERHEAD = ClassSize.align(
>     ClassSize.OBJECT +
>     (4 * ClassSize.REFERENCE) +
>     (2 * Bytes.SIZEOF_LONG));
> public final static long DEEP_OVERHEAD = ClassSize.align(FIXED_OVERHEAD +
>     (ClassSize.ATOMIC_LONG + ClassSize.TIMERANGE_TRACKER +
>     ClassSize.CELL_SKIPLIST_SET + ClassSize.CONCURRENT_SKIPLISTMAP));
> {code}
> We also include the heap overhead of Segment here. It would be better if the 
> Segment accounted for its own overhead and the MemStore implementation used 
> the heap sizes of all of its segments to calculate its size.
> Also this
> {code}
> public long heapSize() {
>   return getActive().getSize();
> }
> {code}
> heapSize() should consider the size of all segments, not just the active one. 
> I do not see an overriding method in CompactingMemstore.
> This JIRA tries to solve some of these issues.
> When we create a Segment, we seem to pass some initial heap size value to it. 
> Why? The segment object itself should know its heap size, rather than having 
> someone else dictate it.
> More to add when doing this cleanup
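The direction proposed above (each segment owns its overhead; the memstore sums all segments) can be sketched in a few lines of Java. SketchSegment and SketchMemStore are hypothetical, heavily simplified classes, not HBase's Segment or MemStore API, and FIXED_OVERHEAD is an illustrative constant rather than a real ClassSize computation.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical segment that knows its own heap size; nobody passes it in.
class SketchSegment {
    // Illustrative fixed overhead of an empty segment (not a real ClassSize value).
    static final long FIXED_OVERHEAD = 64;
    private final AtomicLong dataSize = new AtomicLong();

    void add(long cellHeapSize) {
        dataSize.addAndGet(cellHeapSize);
    }

    long heapSize() {
        return FIXED_OVERHEAD + dataSize.get();
    }
}

// Hypothetical memstore: its heap size is derived from its segments.
class SketchMemStore {
    private final SketchSegment active = new SketchSegment();
    private final List<SketchSegment> pipeline = new CopyOnWriteArrayList<>();

    SketchSegment getActive() {
        return active;
    }

    void pushToPipeline(SketchSegment segment) {
        pipeline.add(segment);
    }

    // Sum every segment instead of reporting only the active one.
    long heapSize() {
        long total = active.heapSize();
        for (SketchSegment s : pipeline) {
            total += s.heapSize();
        }
        return total;
    }
}
```

With this shape, a memstore with compaction pipelines does not need a separate override just to get a correct total; it only needs to enumerate its segments.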





[jira] [Commented] (HBASE-16229) Cleaning up size and heapSize calculation

2016-07-24 Thread Anastasia Braginsky (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15391029#comment-15391029
 ] 

Anastasia Braginsky commented on HBASE-16229:
-

I found this JIRA through a code review comment on HBASE-14921. 

Note that some of the cases here are sensitive and were done on purpose. At 
least in HBASE-14921, the heap size can be known only by the Segment, as only 
the Segment knows what type it is.
Going to review this patch...



[jira] [Commented] (HBASE-16205) When Cells are not copied to MSLAB, deep clone it while adding to Memstore

2016-07-24 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15390974#comment-15390974
 ] 

Anoop Sam John commented on HBASE-16205:


Exactly [~carp84]. Thanks.
That is why the V2 patch caused some test failures.
Anyway, do the check and deep clone in upsert() and then call internalAdd(), 
where there is no extra check. It's OK to have the same one-liner in 2 
methods.

> When Cells are not copied to MSLAB, deep clone it while adding to Memstore
> --
>
> Key: HBASE-16205
> URL: https://issues.apache.org/jira/browse/HBASE-16205
> Project: HBase
>  Issue Type: Sub-task
>  Components: regionserver
>Affects Versions: 2.0.0
>Reporter: Anoop Sam John
>Assignee: Anoop Sam John
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: HBASE-16205.patch, HBASE-16205_V2.patch, 
> HBASE-16205_V3.patch, HBASE-16205_V3.patch
>
>
> This is important after the HBASE-15180 optimization. After that, the cells 
> flowing in the write path will be backed by the same byte[] that the RPC read 
> the request into. By default MSLAB is on, so we have a copy operation while 
> adding Cells to the memstore. This copy might not happen if:
> 1. MSLAB is turned OFF
> 2. Cell size is more than a configurable max size. This defaults to 256 KB
> 3. The operation is Append/Increment.
> In such cases, we should just clone the Cell into a new byte[] and then add it 
> to the memstore. Otherwise we keep referring to the bigger byte[] chunk for a 
> longer time.
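The three cases listed above reduce to a single predicate. The following Java sketch is hypothetical: CellCloneSketch, needsDeepClone and deepClone are invented names, not HBase methods, and a cell is modeled as an (offset, length) slice of a request buffer. It shows the decision and why copying only the cell's bytes lets the large request buffer be garbage collected.

```java
import java.util.Arrays;

// Hypothetical helper; names and parameters are illustrative, not HBase's API.
final class CellCloneSketch {
    // The default max MSLAB allocation mentioned in the issue: 256 KB.
    static final int DEFAULT_MAX_ALLOC = 256 * 1024;

    // True when the cell would NOT be copied into an MSLAB chunk, so it would
    // otherwise keep pointing into the (much larger) RPC request buffer.
    static boolean needsDeepClone(boolean mslabEnabled, int cellSize, int maxAlloc,
                                  boolean isAppendOrIncrement) {
        return !mslabEnabled || cellSize > maxAlloc || isAppendOrIncrement;
    }

    // Copy only the cell's bytes so the memstore no longer pins the whole buffer.
    static byte[] deepClone(byte[] requestBuffer, int offset, int length) {
        return Arrays.copyOfRange(requestBuffer, offset, offset + length);
    }
}
```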





[jira] [Commented] (HBASE-16008) A robust way deal with early termination of HBCK

2016-07-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15390958#comment-15390958
 ] 

Hudson commented on HBASE-16008:


FAILURE: Integrated in HBase-1.4 #305 (See 
[https://builds.apache.org/job/HBase-1.4/305/])
HBASE-16008 A robust way deal with early termination of HBCK (Stephen 
(syuanjiangdev: rev a8dd359d7e2c1f6c92eaed3fdcb6d7455aae4ef8)
* hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/MasterProtos.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/zookeeper/SplitOrMergeTracker.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/CatalogJanitor.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/client/Admin.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterServices.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterRpcServices.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java
* hbase-protocol/src/main/protobuf/Master.proto
* hbase-server/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionManager.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/zookeeper/MasterMaintenanceModeTracker.java


> A robust way deal with early termination of HBCK
> 
>
> Key: HBASE-16008
> URL: https://issues.apache.org/jira/browse/HBASE-16008
> Project: HBase
>  Issue Type: Improvement
>  Components: hbck
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16008.v0-master.patch, 
> HBASE-16008.v1-branch-1.patch, HBASE-16008.v1-master.patch
>
>
> When HBCK is running, we want to disable the Catalog Janitor, the Balancer and 
> Split/Merge. Today, the implementation is not robust. If HBCK is terminated 
> early (e.g. by Control-C), the changed state is not reset to the original. 
> HBASE-15406 tried to solve this problem for the Split/Merge switch. That 
> implementation is complicated, and it did not cover CJ and the Balancer. 
> The proposal is to use a znode to indicate that HBCK is running. CJ, the 
> Balancer, and the Split/Merge switch all check for this znode before doing 
> their operations.
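Why a znode is more robust than flipping switches can be shown with a small in-memory Java model. This is a sketch under an explicit assumption: the issue only says "a znode", and the sketch assumes it is an ephemeral node, which ZooKeeper deletes automatically when the owning session ends, even if the client process was killed before it could clean up. FakeZk, MaintenanceAwareChore and the path /hbase/hbck-running are hypothetical and are not ZooKeeper or HBase APIs.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory model of ZooKeeper ephemeral nodes: each node is
// owned by a session and disappears when that session closes or expires.
final class FakeZk {
    private final Map<String, Long> ephemeralOwners = new ConcurrentHashMap<>();

    boolean createEphemeral(String path, long sessionId) {
        return ephemeralOwners.putIfAbsent(path, sessionId) == null;
    }

    boolean exists(String path) {
        return ephemeralOwners.containsKey(path);
    }

    // Models session close/expiry: all ephemeral nodes of the session go away.
    void endSession(long sessionId) {
        ephemeralOwners.values().removeIf(owner -> owner == sessionId);
    }
}

// Hypothetical chore (think CatalogJanitor or the balancer) that skips its
// work while the HBCK marker node exists.
final class MaintenanceAwareChore {
    static final String HBCK_ZNODE = "/hbase/hbck-running";  // hypothetical path
    private final FakeZk zk;
    private int runs;

    MaintenanceAwareChore(FakeZk zk) {
        this.zk = zk;
    }

    boolean tryRun() {
        if (zk.exists(HBCK_ZNODE)) {
            return false;  // HBCK is running: do nothing this time
        }
        runs++;
        return true;
    }

    int runs() {
        return runs;
    }
}
```

The point of the model is the last step: if HBCK is killed, nothing has to "reset" the chore's state, because the marker vanishes with the session and the chore simply resumes on its next run.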


