[jira] [Commented] (HBASE-9465) Push entries to peer clusters serially

2016-08-07 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411314#comment-15411314
 ] 

Duo Zhang commented on HBASE-9465:
--

[~ashish singhi] Any other concerns?

Thanks,

> Push entries to peer clusters serially
> --
>
> Key: HBASE-9465
> URL: https://issues.apache.org/jira/browse/HBASE-9465
> Project: HBase
>  Issue Type: New Feature
>  Components: regionserver, Replication
>Reporter: Honghua Feng
>Assignee: Phil Yang
> Attachments: HBASE-9465-branch-1-v1.patch, 
> HBASE-9465-branch-1-v1.patch, HBASE-9465-branch-1-v2.patch, 
> HBASE-9465-branch-1-v3.patch, HBASE-9465-v1.patch, HBASE-9465-v2.patch, 
> HBASE-9465-v2.patch, HBASE-9465-v3.patch, HBASE-9465-v4.patch, 
> HBASE-9465-v5.patch, HBASE-9465-v6.patch, HBASE-9465.pdf
>
>
> When a region move or RS failure occurs in the master cluster, the hlog 
> entries that were not pushed before the region move or RS failure will be 
> pushed by the original RS (for a region move) or by another RS that takes 
> over the remaining hlog of the dead RS (for an RS failure), while the new 
> entries for the same region(s) will be pushed by the RS that now serves the 
> region(s). These sources push the hlog entries of the same region 
> concurrently, without coordination.
> This can lead to data inconsistency between the master and peer clusters:
> 1. A put and then a delete are written to the master cluster.
> 2. Due to a region move / RS failure, they are pushed to the peer cluster 
> by different replication-source threads.
> 3. If the delete is pushed to the peer cluster before the put, and a flush 
> and major compaction occur in the peer cluster before the put arrives, the 
> delete is collected and the put remains in the peer cluster.
> In this scenario, the put remains in the peer cluster, but in the master 
> cluster the put is masked by the delete, hence data inconsistency between 
> the master and peer clusters.
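
To make the ordering hazard above concrete, here is a minimal sketch using the 
standard HBase client API. It is illustration only, not the replication code 
itself; the class name, table handles and values are made up.

{code:java}
import java.io.IOException;

import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch only: "masterTable" and "peerTable" stand for Table handles to the
// same user table on the master and peer clusters.
public class SerialPushExample {
  static final byte[] ROW = Bytes.toBytes("r1");
  static final byte[] CF = Bytes.toBytes("cf");
  static final byte[] Q = Bytes.toBytes("q");

  // What the client writes to the master cluster: a put at t=1, then a
  // delete that covers it at t=2.
  static void writeToMaster(Table masterTable) throws IOException {
    masterTable.put(new Put(ROW).addColumn(CF, Q, 1L, Bytes.toBytes("v1")));
    masterTable.delete(new Delete(ROW).addColumns(CF, Q, 2L));
  }

  // What two uncoordinated replication sources may do on the peer cluster:
  // the delete is shipped first, then (after a flush and major compaction
  // have already collected the delete marker) the put arrives and survives.
  static void replayOnPeerOutOfOrder(Table peerTable) throws IOException {
    peerTable.delete(new Delete(ROW).addColumns(CF, Q, 2L));
    // ... flush + major compaction on the peer discards the delete marker ...
    peerTable.put(new Put(ROW).addColumn(CF, Q, 1L, Bytes.toBytes("v1")));
    // A Get on the master now returns nothing, while the peer still returns
    // "v1" -- the two clusters disagree.
  }
}
{code}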



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-12770) Don't transfer all the queued hlogs of a dead server to the same alive server

2016-08-08 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-12770:
--
Affects Version/s: 1.4.0
   2.0.0
Fix Version/s: 1.4.0
   2.0.0

> Don't transfer all the queued hlogs of a dead server to the same alive server
> -
>
> Key: HBASE-12770
> URL: https://issues.apache.org/jira/browse/HBASE-12770
> Project: HBase
>  Issue Type: Improvement
>  Components: Replication
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Jianwei Cui
>Assignee: Phil Yang
>Priority: Minor
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-12770-branch-1-v1.patch, 
> HBASE-12770-branch-1-v2.patch, HBASE-12770-branch-1-v3.patch, 
> HBASE-12770-branch-1-v3.patch, HBASE-12770-branch-1-v3.patch, 
> HBASE-12770-branch-1-v3.patch, HBASE-12770-trunk.patch, HBASE-12770-v1.patch, 
> HBASE-12770-v2.patch, HBASE-12770-v3.patch, HBASE-12770-v3.patch
>
>
> When a region server is down (or the cluster is restarted), all of its hlog 
> queues are transferred to the same live region server. In a shared cluster, 
> we might create several peers replicating data to different peer clusters, 
> and many hlogs can be queued for these peers for several reasons: some peers 
> might be disabled, errors from a peer cluster might block replication, or 
> the replication sources may fail to read some hlogs because of HDFS 
> problems. If the server is then down or restarted, another live server will 
> take over all of the dead server's replication jobs, which can put heavy 
> pressure on the live server's resources (network / disk reads) and is also 
> not fast enough to drain the queued hlogs. And if that live server goes down 
> as well, all of its replication jobs, including those taken from other dead 
> servers, are once again transferred to yet another live server, which can 
> leave one server with a very large number of queued hlogs (in our shared 
> cluster, we have seen one server holding thousands of queued hlogs for 
> replication). As an alternative, is it reasonable for a live server to 
> transfer only one peer's hlogs from the dead server at a time? Other live 
> region servers would then have the opportunity to take the hlogs of the 
> remaining peers, which should also help the queued hlogs be processed 
> faster. Any discussion is welcome.
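
For illustration only, a rough sketch of the "one peer's queue at a time" 
idea; the names below are hypothetical and this is not the actual 
ReplicationSourceManager logic.

{code:java}
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Instead of claiming every queue under the dead server's node, a live server
// picks a single queue and leaves the rest for other live servers to claim.
public class ClaimOneQueueSketch {

  /** Returns the single queue id to claim, or null if nothing is left. */
  static String pickOneQueueToClaim(List<String> queuesOfDeadServer) {
    if (queuesOfDeadServer.isEmpty()) {
      return null;
    }
    // Pick randomly so concurrent claimers tend to spread across the queues.
    int idx = ThreadLocalRandom.current().nextInt(queuesOfDeadServer.size());
    return queuesOfDeadServer.get(idx);
  }
}
{code}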



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-12770) Don't transfer all the queued hlogs of a dead server to the same alive server

2016-08-08 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-12770:
--
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Pushed to master and branch-1.

Thanks all for reviewing.

> Don't transfer all the queued hlogs of a dead server to the same alive server
> -
>
> Key: HBASE-12770
> URL: https://issues.apache.org/jira/browse/HBASE-12770
> Project: HBase
>  Issue Type: Improvement
>  Components: Replication
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Jianwei Cui
>Assignee: Phil Yang
>Priority: Minor
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-12770-branch-1-v1.patch, 
> HBASE-12770-branch-1-v2.patch, HBASE-12770-branch-1-v3.patch, 
> HBASE-12770-branch-1-v3.patch, HBASE-12770-branch-1-v3.patch, 
> HBASE-12770-branch-1-v3.patch, HBASE-12770-trunk.patch, HBASE-12770-v1.patch, 
> HBASE-12770-v2.patch, HBASE-12770-v3.patch, HBASE-12770-v3.patch
>
>
> When a region server is down(or the cluster restart), all the hlog queues 
> will be transferred by the same alive region server. In a shared cluster, we 
> might create several peers replicating data to different peer clusters. There 
> might be lots of hlogs queued for these peers caused by several reasons, such 
> as some peers might be disabled, or errors from peer cluster might prevent 
> the replication, or the replication sources may fail to read some hlog 
> because of hdfs problem. Then, if the server is down or restarted, another 
> alive server will take all the replication jobs of the dead server, this 
> might bring a big pressure to resources(network/disk read) of the alive 
> server and also is not fast enough to replicate the queued hlogs. And if the 
> alive server is down, all the replication jobs including that takes from 
> other dead servers will once again be totally transferred to another alive 
> server, this might cause a server have a large number of queued hlogs(in our 
> shared cluster, we find one server might have thousands of queued hlogs for 
> replication). As an optional way, is it reasonable that the alive server only 
> transfer one peer's hlogs from the dead server one time? Then, other alive 
> region servers might have the opportunity to transfer the hlogs of rest 
> peers. This may also help the queued hlogs be processed more fast. Any 
> discussion is welcome.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16285) Drop RPC requests if it must be considered as timeout at client

2016-08-08 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-16285:
--
Attachment: HBASE-16285-v7.patch

Retry and ping [~stack].

> Drop RPC requests if it must be considered as timeout at client
> ---
>
> Key: HBASE-16285
> URL: https://issues.apache.org/jira/browse/HBASE-16285
> Project: HBase
>  Issue Type: Improvement
>Reporter: Phil Yang
>Assignee: Phil Yang
> Attachments: HBASE-16285-branch-1-v1.patch, 
> HBASE-16285-branch-1-v2.patch, HBASE-16285-branch-1-v3.patch, 
> HBASE-16285-branch-1-v4.patch, HBASE-16285-v1.patch, HBASE-16285-v2.patch, 
> HBASE-16285-v3.patch, HBASE-16285-v4.patch, HBASE-16285-v5.patch, 
> HBASE-16285-v6.patch, HBASE-16285-v7.patch, HBASE-16285-v7.patch
>
>
> After HBASE-15593, we have a timeout param in the header of RPC requests, 
> and we can use it in more scenarios.
> A straightforward one is to drop a request if it has waited so long in the 
> RPC queue that the client has already given up on it. Even if we handle the 
> request and send the response back, the response will never be used, and the 
> client may already have sent a retry. In an extreme case, if the server is 
> slow, all requests may end in a timeout or a queue-full exception because we 
> keep handling requests that the client has already dropped, wasting a lot of 
> server resources.
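
A hedged sketch of the idea, purely for illustration (the field and method 
names are made up, this is not the actual RpcServer code): compare how long a 
call has waited in the queue with the timeout the client sent in the request 
header, and drop the call if the client must already have given up on it.

{code:java}
public class DropTimedOutCallSketch {

  static final class QueuedCall {
    final long enqueueTimeMs;   // when the call was put on the RPC queue
    final int clientTimeoutMs;  // timeout carried in the request header, 0 = none

    QueuedCall(long enqueueTimeMs, int clientTimeoutMs) {
      this.enqueueTimeMs = enqueueTimeMs;
      this.clientTimeoutMs = clientTimeoutMs;
    }
  }

  /** True if the client must already consider this call timed out. */
  static boolean shouldDrop(QueuedCall call, long nowMs) {
    return call.clientTimeoutMs > 0
        && nowMs - call.enqueueTimeMs > call.clientTimeoutMs;
  }

  public static void main(String[] args) {
    // Waited 5s but the client only allowed 2s: handling it is wasted work.
    QueuedCall call = new QueuedCall(System.currentTimeMillis() - 5000, 2000);
    System.out.println(shouldDrop(call, System.currentTimeMillis())); // true
  }
}
{code}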



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-9465) Push entries to peer clusters serially

2016-08-08 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-9465:
-
Attachment: HBASE-9465-v6.patch

Retry.

> Push entries to peer clusters serially
> --
>
> Key: HBASE-9465
> URL: https://issues.apache.org/jira/browse/HBASE-9465
> Project: HBase
>  Issue Type: New Feature
>  Components: regionserver, Replication
>Reporter: Honghua Feng
>Assignee: Phil Yang
> Attachments: HBASE-9465-branch-1-v1.patch, 
> HBASE-9465-branch-1-v1.patch, HBASE-9465-branch-1-v2.patch, 
> HBASE-9465-branch-1-v3.patch, HBASE-9465-v1.patch, HBASE-9465-v2.patch, 
> HBASE-9465-v2.patch, HBASE-9465-v3.patch, HBASE-9465-v4.patch, 
> HBASE-9465-v5.patch, HBASE-9465-v6.patch, HBASE-9465-v6.patch, HBASE-9465.pdf
>
>
> When a region move or RS failure occurs in the master cluster, the hlog 
> entries that were not pushed before the region move or RS failure will be 
> pushed by the original RS (for a region move) or by another RS that takes 
> over the remaining hlog of the dead RS (for an RS failure), while the new 
> entries for the same region(s) will be pushed by the RS that now serves the 
> region(s). These sources push the hlog entries of the same region 
> concurrently, without coordination.
> This can lead to data inconsistency between the master and peer clusters:
> 1. A put and then a delete are written to the master cluster.
> 2. Due to a region move / RS failure, they are pushed to the peer cluster 
> by different replication-source threads.
> 3. If the delete is pushed to the peer cluster before the put, and a flush 
> and major compaction occur in the peer cluster before the put arrives, the 
> delete is collected and the put remains in the peer cluster.
> In this scenario, the put remains in the peer cluster, but in the master 
> cluster the put is masked by the delete, hence data inconsistency between 
> the master and peer clusters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-9465) Push entries to peer clusters serially

2016-08-08 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411516#comment-15411516
 ] 

Duo Zhang commented on HBASE-9465:
--

[~yangzhe1991] Can you please upload a rebased patch? Thanks.

> Push entries to peer clusters serially
> --
>
> Key: HBASE-9465
> URL: https://issues.apache.org/jira/browse/HBASE-9465
> Project: HBase
>  Issue Type: New Feature
>  Components: regionserver, Replication
>Reporter: Honghua Feng
>Assignee: Phil Yang
> Attachments: HBASE-9465-branch-1-v1.patch, 
> HBASE-9465-branch-1-v1.patch, HBASE-9465-branch-1-v2.patch, 
> HBASE-9465-branch-1-v3.patch, HBASE-9465-v1.patch, HBASE-9465-v2.patch, 
> HBASE-9465-v2.patch, HBASE-9465-v3.patch, HBASE-9465-v4.patch, 
> HBASE-9465-v5.patch, HBASE-9465-v6.patch, HBASE-9465-v6.patch, HBASE-9465.pdf
>
>
> When a region move or RS failure occurs in the master cluster, the hlog 
> entries that were not pushed before the region move or RS failure will be 
> pushed by the original RS (for a region move) or by another RS that takes 
> over the remaining hlog of the dead RS (for an RS failure), while the new 
> entries for the same region(s) will be pushed by the RS that now serves the 
> region(s). These sources push the hlog entries of the same region 
> concurrently, without coordination.
> This can lead to data inconsistency between the master and peer clusters:
> 1. A put and then a delete are written to the master cluster.
> 2. Due to a region move / RS failure, they are pushed to the peer cluster 
> by different replication-source threads.
> 3. If the delete is pushed to the peer cluster before the put, and a flush 
> and major compaction occur in the peer cluster before the put arrives, the 
> delete is collected and the put remains in the peer cluster.
> In this scenario, the put remains in the peer cluster, but in the master 
> cluster the put is masked by the delete, hence data inconsistency between 
> the master and peer clusters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-9465) Push entries to peer clusters serially

2016-08-08 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411369#comment-15411369
 ] 

Duo Zhang commented on HBASE-9465:
--

Good. Thanks for reviewing.

Will commit shortly.

> Push entries to peer clusters serially
> --
>
> Key: HBASE-9465
> URL: https://issues.apache.org/jira/browse/HBASE-9465
> Project: HBase
>  Issue Type: New Feature
>  Components: regionserver, Replication
>Reporter: Honghua Feng
>Assignee: Phil Yang
> Attachments: HBASE-9465-branch-1-v1.patch, 
> HBASE-9465-branch-1-v1.patch, HBASE-9465-branch-1-v2.patch, 
> HBASE-9465-branch-1-v3.patch, HBASE-9465-v1.patch, HBASE-9465-v2.patch, 
> HBASE-9465-v2.patch, HBASE-9465-v3.patch, HBASE-9465-v4.patch, 
> HBASE-9465-v5.patch, HBASE-9465-v6.patch, HBASE-9465.pdf
>
>
> When a region move or RS failure occurs in the master cluster, the hlog 
> entries that were not pushed before the region move or RS failure will be 
> pushed by the original RS (for a region move) or by another RS that takes 
> over the remaining hlog of the dead RS (for an RS failure), while the new 
> entries for the same region(s) will be pushed by the RS that now serves the 
> region(s). These sources push the hlog entries of the same region 
> concurrently, without coordination.
> This can lead to data inconsistency between the master and peer clusters:
> 1. A put and then a delete are written to the master cluster.
> 2. Due to a region move / RS failure, they are pushed to the peer cluster 
> by different replication-source threads.
> 3. If the delete is pushed to the peer cluster before the put, and a flush 
> and major compaction occur in the peer cluster before the put arrives, the 
> delete is collected and the put remains in the peer cluster.
> In this scenario, the put remains in the peer cluster, but in the master 
> cluster the put is masked by the delete, hence data inconsistency between 
> the master and peer clusters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12770) Don't transfer all the queued hlogs of a dead server to the same alive server

2016-08-08 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411466#comment-15411466
 ] 

Duo Zhang commented on HBASE-12770:
---

The failed UT is unrelated. Will commit shortly.

Thanks.

> Don't transfer all the queued hlogs of a dead server to the same alive server
> -
>
> Key: HBASE-12770
> URL: https://issues.apache.org/jira/browse/HBASE-12770
> Project: HBase
>  Issue Type: Improvement
>  Components: Replication
>Reporter: Jianwei Cui
>Assignee: Phil Yang
>Priority: Minor
> Attachments: HBASE-12770-branch-1-v1.patch, 
> HBASE-12770-branch-1-v2.patch, HBASE-12770-branch-1-v3.patch, 
> HBASE-12770-branch-1-v3.patch, HBASE-12770-branch-1-v3.patch, 
> HBASE-12770-branch-1-v3.patch, HBASE-12770-trunk.patch, HBASE-12770-v1.patch, 
> HBASE-12770-v2.patch, HBASE-12770-v3.patch, HBASE-12770-v3.patch
>
>
> When a region server is down (or the cluster is restarted), all of its hlog 
> queues are transferred to the same live region server. In a shared cluster, 
> we might create several peers replicating data to different peer clusters, 
> and many hlogs can be queued for these peers for several reasons: some peers 
> might be disabled, errors from a peer cluster might block replication, or 
> the replication sources may fail to read some hlogs because of HDFS 
> problems. If the server is then down or restarted, another live server will 
> take over all of the dead server's replication jobs, which can put heavy 
> pressure on the live server's resources (network / disk reads) and is also 
> not fast enough to drain the queued hlogs. And if that live server goes down 
> as well, all of its replication jobs, including those taken from other dead 
> servers, are once again transferred to yet another live server, which can 
> leave one server with a very large number of queued hlogs (in our shared 
> cluster, we have seen one server holding thousands of queued hlogs for 
> replication). As an alternative, is it reasonable for a live server to 
> transfer only one peer's hlogs from the dead server at a time? Other live 
> region servers would then have the opportunity to take the hlogs of the 
> remaining peers, which should also help the queued hlogs be processed 
> faster. Any discussion is welcome.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-9465) Push entries to peer clusters serially

2016-08-08 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411796#comment-15411796
 ] 

Duo Zhang commented on HBASE-9465:
--

Is TestReplicationSourceManager related?

> Push entries to peer clusters serially
> --
>
> Key: HBASE-9465
> URL: https://issues.apache.org/jira/browse/HBASE-9465
> Project: HBase
>  Issue Type: New Feature
>  Components: regionserver, Replication
>Reporter: Honghua Feng
>Assignee: Phil Yang
> Attachments: HBASE-9465-branch-1-v1.patch, 
> HBASE-9465-branch-1-v1.patch, HBASE-9465-branch-1-v2.patch, 
> HBASE-9465-branch-1-v3.patch, HBASE-9465-branch-1-v4.patch, 
> HBASE-9465-v1.patch, HBASE-9465-v2.patch, HBASE-9465-v2.patch, 
> HBASE-9465-v3.patch, HBASE-9465-v4.patch, HBASE-9465-v5.patch, 
> HBASE-9465-v6.patch, HBASE-9465-v6.patch, HBASE-9465-v7.patch, HBASE-9465.pdf
>
>
> When a region move or RS failure occurs in the master cluster, the hlog 
> entries that were not pushed before the region move or RS failure will be 
> pushed by the original RS (for a region move) or by another RS that takes 
> over the remaining hlog of the dead RS (for an RS failure), while the new 
> entries for the same region(s) will be pushed by the RS that now serves the 
> region(s). These sources push the hlog entries of the same region 
> concurrently, without coordination.
> This can lead to data inconsistency between the master and peer clusters:
> 1. A put and then a delete are written to the master cluster.
> 2. Due to a region move / RS failure, they are pushed to the peer cluster 
> by different replication-source threads.
> 3. If the delete is pushed to the peer cluster before the put, and a flush 
> and major compaction occur in the peer cluster before the put arrives, the 
> delete is collected and the put remains in the peer cluster.
> In this scenario, the put remains in the peer cluster, but in the master 
> cluster the put is masked by the delete, hence data inconsistency between 
> the master and peer clusters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16209) Provide an ExponentialBackOffPolicy sleep between failed region open requests

2016-08-01 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402986#comment-15402986
 ] 

Duo Zhang commented on HBASE-16209:
---

It is forceNewPlan, which means we must generate a new plan even if we already 
have one. I think we should rename the parameter of the invokeAssign method to 
forceNewPlan to avoid misunderstanding.

Thanks.

> Provide an ExponentialBackOffPolicy sleep between failed region open requests
> -
>
> Key: HBASE-16209
> URL: https://issues.apache.org/jira/browse/HBASE-16209
> Project: HBase
>  Issue Type: Bug
>Reporter: Joseph
>Assignee: Joseph
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16209-addendum.patch, 
> HBASE-16209-branch-1-addendum-v2.patch, HBASE-16209-branch-1-addendum.patch, 
> HBASE-16209-branch-1-v3.patch, HBASE-16209-branch-1.patch, 
> HBASE-16209-v2.patch, HBASE-16209.patch
>
>
> Related to HBASE-16138. As of now we have no pause between retries of failed 
> region open requests, and with a low maximumAttempt default we can quickly 
> use up all our regionOpen retries if the server is in a bad state. I added an 
> ExponentialBackOffPolicy so that we spread out the timing of our open region 
> retries in AssignmentManager. Review board at 
> https://reviews.apache.org/r/50011/
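
For readers unfamiliar with the pattern, a generic exponential-backoff sketch 
(the constants and names are made up; this is not the AssignmentManager or 
RetryCounter code from the patch).

{code:java}
import java.util.concurrent.ThreadLocalRandom;

public class BackoffSketch {
  static final long BASE_SLEEP_MS = 1000;
  static final long MAX_SLEEP_MS = 60000;

  /** Sleep time before the given retry attempt (0-based), with some jitter. */
  static long backoffMs(int attempt) {
    long exp = BASE_SLEEP_MS << Math.min(attempt, 16); // cap the shift
    long capped = Math.min(exp, MAX_SLEEP_MS);
    // Jitter: somewhere between half the cap and the full cap.
    return capped / 2 + ThreadLocalRandom.current().nextLong(capped / 2 + 1);
  }

  public static void main(String[] args) {
    for (int attempt = 0; attempt < 5; attempt++) {
      System.out.println("attempt " + attempt + " -> sleep ~" + backoffMs(attempt) + " ms");
    }
  }
}
{code}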



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-16323) Introduce a new type of filter for compaction scan

2016-08-02 Thread Duo Zhang (JIRA)
Duo Zhang created HBASE-16323:
-

 Summary: Introduce a new type of filter for compaction scan
 Key: HBASE-16323
 URL: https://issues.apache.org/jira/browse/HBASE-16323
 Project: HBase
  Issue Type: Sub-task
Reporter: Duo Zhang


Some projects use a CP (coprocessor) to drop cells during compaction, and the 
old filter is too general and powerful for this usage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-9465) Push entries to peer clusters serially

2016-08-02 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403507#comment-15403507
 ] 

Duo Zhang commented on HBASE-9465:
--

Also +1 on v3.

[~stack] [~lhofhansl] [~enis] Any concerns?

Thanks.

> Push entries to peer clusters serially
> --
>
> Key: HBASE-9465
> URL: https://issues.apache.org/jira/browse/HBASE-9465
> Project: HBase
>  Issue Type: New Feature
>  Components: regionserver, Replication
>Reporter: Honghua Feng
>Assignee: Phil Yang
> Attachments: HBASE-9465-v1.patch, HBASE-9465-v2.patch, 
> HBASE-9465-v2.patch, HBASE-9465-v3.patch, HBASE-9465.pdf
>
>
> When a region move or RS failure occurs in the master cluster, the hlog 
> entries that were not pushed before the region move or RS failure will be 
> pushed by the original RS (for a region move) or by another RS that takes 
> over the remaining hlog of the dead RS (for an RS failure), while the new 
> entries for the same region(s) will be pushed by the RS that now serves the 
> region(s). These sources push the hlog entries of the same region 
> concurrently, without coordination.
> This can lead to data inconsistency between the master and peer clusters:
> 1. A put and then a delete are written to the master cluster.
> 2. Due to a region move / RS failure, they are pushed to the peer cluster 
> by different replication-source threads.
> 3. If the delete is pushed to the peer cluster before the put, and a flush 
> and major compaction occur in the peer cluster before the put arrives, the 
> delete is collected and the put remains in the peer cluster.
> In this scenario, the put remains in the peer cluster, but in the master 
> cluster the put is masked by the delete, hence data inconsistency between 
> the master and peer clusters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16225) Refactor ScanQueryMatcher

2016-08-02 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-16225:
--
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Pushed to master and branch-1.

Thanks all for reviewing.

> Refactor ScanQueryMatcher
> -
>
> Key: HBASE-16225
> URL: https://issues.apache.org/jira/browse/HBASE-16225
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver, Scanners
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16225-branch-1-v1.patch, 
> HBASE-16225-branch-1.patch, HBASE-16225-v1.patch, HBASE-16225-v2.patch, 
> HBASE-16225-v3.patch, HBASE-16225-v4.patch, HBASE-16225-v5.patch, 
> HBASE-16225-v6.patch, HBASE-16225.patch
>
>
> As said in HBASE-16223, the code of {{ScanQueryMatcher}} is too complicated. 
> I suggest we extract an interface and implement several subclasses that 
> separate the different logic into different implementations. For example, 
> the requirements of compaction scans and user scans are different, yet today 
> we have to consider the user-scan logic even when we only want to change the 
> compaction logic. And a raw scan does not need a query matcher at all... we 
> can implement a dummy query matcher for it.
> Suggestions are welcome. Thanks.
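
Purely as an illustration of the suggested split (all names below are 
hypothetical, not the interfaces from the patch), one way the per-purpose 
matchers could look:

{code:java}
import org.apache.hadoop.hbase.Cell;

interface QueryMatcher {
  /** Decide what to do with the next cell: include it, skip it, or stop. */
  MatchCode match(Cell cell);

  enum MatchCode { INCLUDE, SKIP, DONE }
}

// A compaction matcher only needs retention logic (versions, TTL, delete
// markers); it never has to care about user-scan concerns such as filters.
class CompactionQueryMatcher implements QueryMatcher {
  @Override
  public MatchCode match(Cell cell) {
    // ... apply TTL / max-versions / delete-marker handling only ...
    return MatchCode.INCLUDE;
  }
}

// A raw scan returns everything, including delete markers, so a dummy
// matcher is enough.
class RawScanQueryMatcher implements QueryMatcher {
  @Override
  public MatchCode match(Cell cell) {
    return MatchCode.INCLUDE;
  }
}
{code}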



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-16320) Revisit scan semantics and implementations

2016-08-02 Thread Duo Zhang (JIRA)
Duo Zhang created HBASE-16320:
-

 Summary: Revisit scan semantics and implementations
 Key: HBASE-16320
 URL: https://issues.apache.org/jira/browse/HBASE-16320
 Project: HBase
  Issue Type: Umbrella
Reporter: Duo Zhang


Create an umbrella issue to track the pending discussions in HBASE-16225.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-16322) Disable filter for raw scan

2016-08-02 Thread Duo Zhang (JIRA)
Duo Zhang created HBASE-16322:
-

 Summary: Disable filter for raw scan
 Key: HBASE-16322
 URL: https://issues.apache.org/jira/browse/HBASE-16322
 Project: HBase
  Issue Type: Sub-task
Reporter: Duo Zhang


Because a raw scan will pass delete markers to the filter, the filter should 
be disabled for raw scans.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16225) Refactor ScanQueryMatcher

2016-08-02 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-16225:
--
Affects Version/s: 1.4.0
   2.0.0
Fix Version/s: 1.4.0
   2.0.0
  Component/s: Scanners
   regionserver

> Refactor ScanQueryMatcher
> -
>
> Key: HBASE-16225
> URL: https://issues.apache.org/jira/browse/HBASE-16225
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver, Scanners
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16225-branch-1-v1.patch, 
> HBASE-16225-branch-1.patch, HBASE-16225-v1.patch, HBASE-16225-v2.patch, 
> HBASE-16225-v3.patch, HBASE-16225-v4.patch, HBASE-16225-v5.patch, 
> HBASE-16225-v6.patch, HBASE-16225.patch
>
>
> As said in HBASE-16223, the code of {{ScanQueryMatcher}} is too complicated. 
> I suggest we extract an interface and implement several subclasses that 
> separate the different logic into different implementations. For example, 
> the requirements of compaction scans and user scans are different, yet today 
> we have to consider the user-scan logic even when we only want to change the 
> compaction logic. And a raw scan does not need a query matcher at all... we 
> can implement a dummy query matcher for it.
> Suggestions are welcome. Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-16324) Remove LegacyScanQueryMatcher

2016-08-02 Thread Duo Zhang (JIRA)
Duo Zhang created HBASE-16324:
-

 Summary: Remove LegacyScanQueryMatcher
 Key: HBASE-16324
 URL: https://issues.apache.org/jira/browse/HBASE-16324
 Project: HBase
  Issue Type: Sub-task
Reporter: Duo Zhang


After we eliminate all references to it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16209) Provide an ExponentialBackOffPolicy sleep between failed region open requests

2016-08-01 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403183#comment-15403183
 ] 

Duo Zhang commented on HBASE-16209:
---

Why do you want to call invokeAssignLaterOnFailure in ClosedRegionHandler? 
There is no exception here, right?

> Provide an ExponentialBackOffPolicy sleep between failed region open requests
> -
>
> Key: HBASE-16209
> URL: https://issues.apache.org/jira/browse/HBASE-16209
> Project: HBase
>  Issue Type: Bug
>Reporter: Joseph
>Assignee: Joseph
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16209-addendum.patch, 
> HBASE-16209-branch-1-addendum-v2.patch, HBASE-16209-branch-1-addendum.patch, 
> HBASE-16209-branch-1-v3.patch, HBASE-16209-branch-1.patch, 
> HBASE-16209-v2.patch, HBASE-16209.patch
>
>
> Related to HBASE-16138. As of now we have no pause between retries of failed 
> region open requests, and with a low maximumAttempt default we can quickly 
> use up all our regionOpen retries if the server is in a bad state. I added an 
> ExponentialBackOffPolicy so that we spread out the timing of our open region 
> retries in AssignmentManager. Review board at 
> https://reviews.apache.org/r/50011/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16196) Update jruby to a newer version.

2016-08-01 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403184#comment-15403184
 ] 

Duo Zhang commented on HBASE-16196:
---

Can we try printing out which dependency is missing a license?

> Update jruby to a newer version.
> 
>
> Key: HBASE-16196
> URL: https://issues.apache.org/jira/browse/HBASE-16196
> Project: HBase
>  Issue Type: Bug
>  Components: dependencies, shell
>Reporter: Elliott Clark
>Assignee: Matt Mullins
>Priority: Critical
> Fix For: 2.0.0, 1.4.0
>
> Attachments: 0001-Update-to-JRuby-9.1.2.0-and-JLine-2.12.patch, 
> hbase-16196.branch-1.patch, hbase-16196.v2.branch-1.patch
>
>
> Ruby 1.8.7 is no longer maintained.
> The TTY library in the old jruby is bad. The newer one is less bad.
> Since this is only a dependency of the hbase-shell module and not of 
> hbase-client or hbase-server, this should be a pretty simple change with no 
> backwards-compat issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16196) Update jruby to a newer version.

2016-08-01 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403214#comment-15403214
 ] 

Duo Zhang commented on HBASE-16196:
---

Good.

> Update jruby to a newer version.
> 
>
> Key: HBASE-16196
> URL: https://issues.apache.org/jira/browse/HBASE-16196
> Project: HBase
>  Issue Type: Bug
>  Components: dependencies, shell
>Reporter: Elliott Clark
>Assignee: Matt Mullins
>Priority: Critical
> Fix For: 2.0.0, 1.4.0
>
> Attachments: 0001-Update-to-JRuby-9.1.2.0-and-JLine-2.12.patch, 
> hbase-16196.branch-1.patch, hbase-16196.v2.branch-1.patch
>
>
> Ruby 1.8.7 is no longer maintained.
> The TTY library in the old jruby is bad. The newer one is less bad.
> Since this is only a dependency of the hbase-shell module and not of 
> hbase-client or hbase-server, this should be a pretty simple change with no 
> backwards-compat issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16209) Provide an ExponentialBackOffPolicy sleep between failed region open requests

2016-08-01 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403193#comment-15403193
 ] 

Duo Zhang commented on HBASE-16209:
---

And in my experience, there is no NPE. The problem is that when moving a 
region, the region does not come back online quickly enough after being taken 
offline, which causes a lot of UTs to fail. I found that the cause is that we 
call invokeAssignLater in ClosedRegionHandler, which adds a delay before 
assigning the region to a new place. The original implementation called assign 
directly. Is there any reason why we should change it to invokeLater? Thanks 
very much.

> Provide an ExponentialBackOffPolicy sleep between failed region open requests
> -
>
> Key: HBASE-16209
> URL: https://issues.apache.org/jira/browse/HBASE-16209
> Project: HBase
>  Issue Type: Bug
>Reporter: Joseph
>Assignee: Joseph
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16209-addendum.patch, 
> HBASE-16209-branch-1-addendum-v2.patch, HBASE-16209-branch-1-addendum.patch, 
> HBASE-16209-branch-1-v3.patch, HBASE-16209-branch-1.patch, 
> HBASE-16209-v2.patch, HBASE-16209.patch
>
>
> Related to HBASE-16138. As of now we have no pause between retries of failed 
> region open requests, and with a low maximumAttempt default we can quickly 
> use up all our regionOpen retries if the server is in a bad state. I added an 
> ExponentialBackOffPolicy so that we spread out the timing of our open region 
> retries in AssignmentManager. Review board at 
> https://reviews.apache.org/r/50011/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16225) Refactor ScanQueryMatcher

2016-08-01 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-16225:
--
Attachment: (was: HBASE-16225-branch-1-v1.patch)

> Refactor ScanQueryMatcher
> -
>
> Key: HBASE-16225
> URL: https://issues.apache.org/jira/browse/HBASE-16225
> Project: HBase
>  Issue Type: Improvement
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Attachments: HBASE-16225-branch-1-v1.patch, 
> HBASE-16225-branch-1.patch, HBASE-16225-v1.patch, HBASE-16225-v2.patch, 
> HBASE-16225-v3.patch, HBASE-16225-v4.patch, HBASE-16225-v5.patch, 
> HBASE-16225-v6.patch, HBASE-16225.patch
>
>
> As said in HBASE-16223, the code of {{ScanQueryMatcher}} is too complicated. 
> I suggest we extract an interface and implement several subclasses that 
> separate the different logic into different implementations. For example, 
> the requirements of compaction scans and user scans are different, yet today 
> we have to consider the user-scan logic even when we only want to change the 
> compaction logic. And a raw scan does not need a query matcher at all... we 
> can implement a dummy query matcher for it.
> Suggestions are welcome. Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16225) Refactor ScanQueryMatcher

2016-08-01 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-16225:
--
Attachment: HBASE-16225-branch-1-v1.patch

Retry.

> Refactor ScanQueryMatcher
> -
>
> Key: HBASE-16225
> URL: https://issues.apache.org/jira/browse/HBASE-16225
> Project: HBase
>  Issue Type: Improvement
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Attachments: HBASE-16225-branch-1-v1.patch, 
> HBASE-16225-branch-1.patch, HBASE-16225-v1.patch, HBASE-16225-v2.patch, 
> HBASE-16225-v3.patch, HBASE-16225-v4.patch, HBASE-16225-v5.patch, 
> HBASE-16225-v6.patch, HBASE-16225.patch
>
>
> As said in HBASE-16223, the code of {{ScanQueryMatcher}} is too complicated. 
> I suggest we extract an interface and implement several subclasses that 
> separate the different logic into different implementations. For example, 
> the requirements of compaction scans and user scans are different, yet today 
> we have to consider the user-scan logic even when we only want to change the 
> compaction logic. And a raw scan does not need a query matcher at all... we 
> can implement a dummy query matcher for it.
> Suggestions are welcome. Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16135) PeerClusterZnode under rs of removed peer may never be deleted

2016-07-01 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15358668#comment-15358668
 ] 

Duo Zhang commented on HBASE-16135:
---

The failed tests are unrelated. Any other concerns? Thanks.

> PeerClusterZnode under rs of removed peer may never be deleted
> --
>
> Key: HBASE-16135
> URL: https://issues.apache.org/jira/browse/HBASE-16135
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.5, 1.2.2, 0.98.20
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0, 1.3.0, 1.4.0, 1.1.6, 0.98.21, 1.2.3
>
> Attachments: HBASE-16135-0.98.patch, HBASE-16135-branch-1.1.patch, 
> HBASE-16135-branch-1.2.patch, HBASE-16135-branch-1.patch, 
> HBASE-16135-v1.patch, HBASE-16135-v2.patch, HBASE-16135-v3.patch, 
> HBASE-16135.patch
>
>
> One of our clusters ran out of space recently, and we found that the 
> .oldlogs directory had almost the same size as the data directory.
> It turned out that we had removed a peer about 3 months ago, but there were 
> still some replication queue znodes for it under some rs nodes, which 
> prevented the deletion of the .oldlogs files.
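
For illustration only, one way to spot such leftover queue znodes with the 
plain ZooKeeper client. The layout assumed below 
(/hbase/replication/rs/<server>/<queueId>) is an assumption of this sketch, 
and this is not the fix in the attached patches.

{code:java}
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

public class FindAbandonedQueues {
  static void listQueuesOfPeer(ZooKeeper zk, String removedPeerId)
      throws KeeperException, InterruptedException {
    String rsRoot = "/hbase/replication/rs";
    for (String server : zk.getChildren(rsRoot, false)) {
      for (String queueId : zk.getChildren(rsRoot + "/" + server, false)) {
        // A queue id starts with the peer id (e.g. "2" or "2-<deadServer>,...").
        if (queueId.equals(removedPeerId) || queueId.startsWith(removedPeerId + "-")) {
          System.out.println("leftover queue " + queueId + " under " + server);
        }
      }
    }
  }
}
{code}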



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16135) PeerClusterZnode under rs of removed peer may never be deleted

2016-06-30 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15358260#comment-15358260
 ] 

Duo Zhang commented on HBASE-16135:
---

oldsources is a {{CopyOnWriteArrayList}}; its iterator does not support 
remove, which means we cannot remove elements from it on the fly.
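
For reference, a small standalone demo of that {{CopyOnWriteArrayList}} 
behaviour (the variable name is just borrowed from the discussion):

{code:java}
import java.util.Iterator;
import java.util.concurrent.CopyOnWriteArrayList;

public class CowListRemoveDemo {
  public static void main(String[] args) {
    CopyOnWriteArrayList<String> oldsources = new CopyOnWriteArrayList<>();
    oldsources.add("source-1");
    oldsources.add("source-2");

    Iterator<String> it = oldsources.iterator();
    it.next();
    try {
      it.remove(); // not supported: the iterator works on an immutable snapshot
    } catch (UnsupportedOperationException e) {
      System.out.println("iterator.remove() is not supported");
    }

    // Removing via the list itself is fine (it copies the backing array).
    oldsources.remove("source-1");
    System.out.println(oldsources); // [source-2]
  }
}
{code}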

> PeerClusterZnode under rs of removed peer may never be deleted
> --
>
> Key: HBASE-16135
> URL: https://issues.apache.org/jira/browse/HBASE-16135
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.5, 1.2.2, 0.98.20
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0, 1.3.0, 1.4.0, 1.1.6, 0.98.21, 1.2.3
>
> Attachments: HBASE-16135-0.98.patch, HBASE-16135-branch-1.1.patch, 
> HBASE-16135-branch-1.2.patch, HBASE-16135-branch-1.patch, 
> HBASE-16135-v1.patch, HBASE-16135-v2.patch, HBASE-16135.patch
>
>
> One of our clusters ran out of space recently, and we found that the 
> .oldlogs directory had almost the same size as the data directory.
> It turned out that we had removed a peer about 3 months ago, but there were 
> still some replication queue znodes for it under some rs nodes, which 
> prevented the deletion of the .oldlogs files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16135) PeerClusterZnode under rs of removed peer may never be deleted

2016-06-30 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15358416#comment-15358416
 ] 

Duo Zhang commented on HBASE-16135:
---

I can not reproduce it locally. The strange log output is

{noformat}
2016-07-01 04:09:57,901 DEBUG [main] replication.ReplicationQueueInfo(112): 
Found dead servers:[hostname1.example.org,1234,1]
2016-07-01 04:09:57,910 INFO  [main] 
replication.TableBasedReplicationQueuesImpl(250): hostname.example.org,1234,1 
has deleted abandoned queue 2-hostname1.example.org,1234,1 from 
hostname1.example.org,1234,1
{noformat}

It should be
{noformat}
2016-07-01 13:19:01,981 DEBUG [main] replication.ReplicationQueueInfo(112): 
Found dead servers:[hostname1.example.org,1234,1]
2016-07-01 13:19:01,983 INFO  [main] 
replication.TableBasedReplicationQueuesImpl(246): 
dummyserver1.example.org,1234,1 has claimed queue 
1-hostname1.example.org,1234,1 from hostname1.example.org,1234,1
{noformat}

Let me dig more.

> PeerClusterZnode under rs of removed peer may never be deleted
> --
>
> Key: HBASE-16135
> URL: https://issues.apache.org/jira/browse/HBASE-16135
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.5, 1.2.2, 0.98.20
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0, 1.3.0, 1.4.0, 1.1.6, 0.98.21, 1.2.3
>
> Attachments: HBASE-16135-0.98.patch, HBASE-16135-branch-1.1.patch, 
> HBASE-16135-branch-1.2.patch, HBASE-16135-branch-1.patch, 
> HBASE-16135-v1.patch, HBASE-16135-v2.patch, HBASE-16135.patch
>
>
> One of our clusters ran out of space recently, and we found that the 
> .oldlogs directory had almost the same size as the data directory.
> It turned out that we had removed a peer about 3 months ago, but there were 
> still some replication queue znodes for it under some rs nodes, which 
> prevented the deletion of the .oldlogs files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16135) PeerClusterZnode under rs of removed peer may never be deleted

2016-06-30 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15358344#comment-15358344
 ] 

Duo Zhang commented on HBASE-16135:
---

You can file another issue to do refactoring.

> PeerClusterZnode under rs of removed peer may never be deleted
> --
>
> Key: HBASE-16135
> URL: https://issues.apache.org/jira/browse/HBASE-16135
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.5, 1.2.2, 0.98.20
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0, 1.3.0, 1.4.0, 1.1.6, 0.98.21, 1.2.3
>
> Attachments: HBASE-16135-0.98.patch, HBASE-16135-branch-1.1.patch, 
> HBASE-16135-branch-1.2.patch, HBASE-16135-branch-1.patch, 
> HBASE-16135-v1.patch, HBASE-16135-v2.patch, HBASE-16135.patch
>
>
> One of our clusters ran out of space recently, and we found that the 
> .oldlogs directory had almost the same size as the data directory.
> It turned out that we had removed a peer about 3 months ago, but there were 
> still some replication queue znodes for it under some rs nodes, which 
> prevented the deletion of the .oldlogs files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16135) PeerClusterZnode under rs of removed peer may never be deleted

2016-06-30 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15358391#comment-15358391
 ] 

Duo Zhang commented on HBASE-16135:
---

Let me check the failed UT.

> PeerClusterZnode under rs of removed peer may never be deleted
> --
>
> Key: HBASE-16135
> URL: https://issues.apache.org/jira/browse/HBASE-16135
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.5, 1.2.2, 0.98.20
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0, 1.3.0, 1.4.0, 1.1.6, 0.98.21, 1.2.3
>
> Attachments: HBASE-16135-0.98.patch, HBASE-16135-branch-1.1.patch, 
> HBASE-16135-branch-1.2.patch, HBASE-16135-branch-1.patch, 
> HBASE-16135-v1.patch, HBASE-16135-v2.patch, HBASE-16135.patch
>
>
> One of our clusters ran out of space recently, and we found that the 
> .oldlogs directory had almost the same size as the data directory.
> It turned out that we had removed a peer about 3 months ago, but there were 
> still some replication queue znodes for it under some rs nodes, which 
> prevented the deletion of the .oldlogs files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-16165) Decrease RpcServer.callQueueSize before writeResponse causes OOM

2016-07-01 Thread Duo Zhang (JIRA)
Duo Zhang created HBASE-16165:
-

 Summary: Decrease RpcServer.callQueueSize before writeResponse 
causes OOM
 Key: HBASE-16165
 URL: https://issues.apache.org/jira/browse/HBASE-16165
 Project: HBase
  Issue Type: Bug
Reporter: Duo Zhang


In RpcServer, we use {{callQueueSizeInBytes}} to avoid queuing too many calls, 
which could cause an OOM. But in {{CallRunner.run}}, we decrease it before 
sending the response back. Even after calling {{sendResponseIfReady}}, the 
call object can stay on the heap for a long time if we cannot write out the 
response (that is why we need a Responder thread...). This makes it possible 
for the actual size of all call objects on the heap to exceed 
{{maxQueueSizeInBytes}} and cause an OOM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16165) Decrease RpcServer.callQueueSize before writeResponse causes OOM

2016-07-01 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15358726#comment-15358726
 ] 

Duo Zhang commented on HBASE-16165:
---

One possible way to fix this is to decrease the call queue size only after we 
have written out the whole response. In fact, the param field of a call object 
is useless once we are ready to write the response, so maybe we could set it 
to null to reduce memory pressure?

Suggestions are welcome. Thanks.
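
A minimal sketch of that suggestion, with made-up names (not the actual 
RpcServer/CallRunner code): only subtract a call's size from the accounting 
once the whole response has been written, and drop the request param as soon 
as it is no longer needed.

{code:java}
import java.util.concurrent.atomic.AtomicLong;

public class CallQueueAccountingSketch {
  static final AtomicLong callQueueSizeInBytes = new AtomicLong();

  static final class Call {
    Object param;      // the deserialized request, can be large
    final long size;   // bytes accounted for this call

    Call(Object param, long size) {
      this.param = param;
      this.size = size;
    }

    void doneWritingResponse() {
      // Decrement only here, after the response is fully on the wire,
      // instead of at the start of CallRunner.run().
      callQueueSizeInBytes.addAndGet(-size);
    }
  }

  static void handle(Call call) {
    // ... process the request and build the response ...
    call.param = null; // the request body is no longer needed; release it early
    // ... hand the response to the responder, which calls doneWritingResponse()
    //     once everything has been written out ...
  }
}
{code}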

> Decrease RpcServer.callQueueSize before writeResponse causes OOM
> 
>
> Key: HBASE-16165
> URL: https://issues.apache.org/jira/browse/HBASE-16165
> Project: HBase
>  Issue Type: Bug
>Reporter: Duo Zhang
>
> In RpcServer, we use {{callQueueSizeInBytes}} to avoid queuing too many 
> calls, which could cause an OOM. But in {{CallRunner.run}}, we decrease it 
> before sending the response back. Even after calling {{sendResponseIfReady}}, 
> the call object can stay on the heap for a long time if we cannot write out 
> the response (that is why we need a Responder thread...). This makes it 
> possible for the actual size of all call objects on the heap to exceed 
> {{maxQueueSizeInBytes}} and cause an OOM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16135) PeerClusterZnode under rs of removed peer may never be deleted

2016-06-30 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-16135:
--
Attachment: HBASE-16135-v3.patch

Haven't found the root cause yet, but peer id '2' is only introduced in the new 
test. So I changed the server hostname to hostname2 in the new test to make 
them independent.

> PeerClusterZnode under rs of removed peer may never be deleted
> --
>
> Key: HBASE-16135
> URL: https://issues.apache.org/jira/browse/HBASE-16135
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.5, 1.2.2, 0.98.20
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0, 1.3.0, 1.4.0, 1.1.6, 0.98.21, 1.2.3
>
> Attachments: HBASE-16135-0.98.patch, HBASE-16135-branch-1.1.patch, 
> HBASE-16135-branch-1.2.patch, HBASE-16135-branch-1.patch, 
> HBASE-16135-v1.patch, HBASE-16135-v2.patch, HBASE-16135-v3.patch, 
> HBASE-16135.patch
>
>
> One of our clusters ran out of space recently, and we found that the 
> .oldlogs directory had almost the same size as the data directory.
> It turned out that we had removed a peer about 3 months ago, but there were 
> still some replication queue znodes for it under some rs nodes, which 
> prevented the deletion of the .oldlogs files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16135) PeerClusterZnode under rs of removed peer may never be deleted

2016-06-29 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-16135:
--
Attachment: HBASE-16135-branch-1.1.patch

> PeerClusterZnode under rs of removed peer may never be deleted
> --
>
> Key: HBASE-16135
> URL: https://issues.apache.org/jira/browse/HBASE-16135
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.0.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Attachments: HBASE-16135-branch-1.1.patch, 
> HBASE-16135-branch-1.2.patch, HBASE-16135-branch-1.patch, 
> HBASE-16135-v1.patch, HBASE-16135.patch
>
>
> One of our clusters ran out of space recently, and we found that the 
> .oldlogs directory had almost the same size as the data directory.
> It turned out that we had removed a peer about 3 months ago, but there were 
> still some replication queue znodes for it under some rs nodes, which 
> prevented the deletion of the .oldlogs files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16135) PeerClusterZnode under rs of removed peer may never be deleted

2016-06-29 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-16135:
--
Attachment: HBASE-16135-0.98.patch

> PeerClusterZnode under rs of removed peer may never be deleted
> --
>
> Key: HBASE-16135
> URL: https://issues.apache.org/jira/browse/HBASE-16135
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.0.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Attachments: HBASE-16135-0.98.patch, HBASE-16135-branch-1.1.patch, 
> HBASE-16135-branch-1.2.patch, HBASE-16135-branch-1.patch, 
> HBASE-16135-v1.patch, HBASE-16135.patch
>
>
> One of our clusters ran out of space recently, and we found that the 
> .oldlogs directory had almost the same size as the data directory.
> It turned out that we had removed a peer about 3 months ago, but there were 
> still some replication queue znodes for it under some rs nodes, which 
> prevented the deletion of the .oldlogs files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16135) PeerClusterZnode under rs of removed peer may never be deleted

2016-06-29 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-16135:
--
Attachment: HBASE-16135-branch-1.2.patch

> PeerClusterZnode under rs of removed peer may never be deleted
> --
>
> Key: HBASE-16135
> URL: https://issues.apache.org/jira/browse/HBASE-16135
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.0.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Attachments: HBASE-16135-branch-1.2.patch, 
> HBASE-16135-branch-1.patch, HBASE-16135-v1.patch, HBASE-16135.patch
>
>
> One of our clusters ran out of space recently, and we found that the 
> .oldlogs directory had almost the same size as the data directory.
> It turned out that we had removed a peer about 3 months ago, but there were 
> still some replication queue znodes for it under some rs nodes, which 
> prevented the deletion of the .oldlogs files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-15454) Archive store files older than max age

2016-06-29 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15356292#comment-15356292
 ] 

Duo Zhang commented on HBASE-15454:
---

I haven't tested this in a real cluster yet...

And one thing I found is that the freeze window boundaries in HFile's metadata 
are useless. I need to consider lots of other properties to decide whether I 
can do EC on a storefile.

I will post new patches if we begin to deploy DTCS in a real cluster (maybe 
several months later...).

And I think we can also track this jira for more production experience.

https://issues.apache.org/jira/browse/CASSANDRA-10195

Thanks.

> Archive store files older than max age
> --
>
> Key: HBASE-15454
> URL: https://issues.apache.org/jira/browse/HBASE-15454
> Project: HBase
>  Issue Type: Sub-task
>  Components: Compaction
>Affects Versions: 2.0.0, 1.3.0, 0.98.18, 1.4.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0, 1.4.0, 0.98.21
>
> Attachments: HBASE-15454-v1.patch, HBASE-15454-v2.patch, 
> HBASE-15454-v3.patch, HBASE-15454-v4.patch, HBASE-15454-v5.patch, 
> HBASE-15454-v6.patch, HBASE-15454-v7.patch, HBASE-15454.patch
>
>
> In date tiered compaction, the store files older than max age are never 
> touched by minor compactions. Here we introduce a 'freeze window' operation, 
> which does the following:
> 1. Find all store files that contain cells whose timestamps fall in the 
> given window.
> 2. Compact all these files and output one file for each window that these 
> files covered.
> After the compaction, we will have only one file in the given window, and 
> all cells whose timestamps are in that window are in this single file. And 
> if you do not write new cells with an older timestamp into this window, the 
> file will never change. This makes it easier to do erasure coding on the 
> frozen file to reduce redundancy, and it also makes it possible to check 
> consistency between the master and peer clusters incrementally.
> And why use the word 'freeze'?
> Because there is already an 'HFileArchiver' class. I want to use a different 
> word to avoid confusion.
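
A very rough sketch of the selection step (step 1) with hypothetical types, 
just to illustrate the idea; the real logic lives in the date tiered 
compaction code and has to consider many more properties.

{code:java}
import java.util.ArrayList;
import java.util.List;

public class FreezeWindowSketch {

  static final class StoreFileInfo {
    final String name;
    final long minTimestamp;
    final long maxTimestamp;

    StoreFileInfo(String name, long minTimestamp, long maxTimestamp) {
      this.name = name;
      this.minTimestamp = minTimestamp;
      this.maxTimestamp = maxTimestamp;
    }
  }

  /** Step 1: find all store files containing cells inside [windowStart, windowEnd). */
  static List<StoreFileInfo> selectFilesForWindow(
      List<StoreFileInfo> allFiles, long windowStart, long windowEnd) {
    List<StoreFileInfo> selected = new ArrayList<>();
    for (StoreFileInfo f : allFiles) {
      boolean overlaps = f.minTimestamp < windowEnd && f.maxTimestamp >= windowStart;
      if (overlaps) {
        selected.add(f);
      }
    }
    // Step 2 (not shown) would compact the selected files and emit one output
    // file per window they cover, leaving exactly one file in the frozen window.
    return selected;
  }
}
{code}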



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16135) PeerClusterZnode under rs of removed peer may never be deleted

2016-06-29 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-16135:
--
Affects Version/s: 1.2.2
   1.4.0
   1.3.0
   1.1.5
   0.98.20
Fix Version/s: 1.2.3
   0.98.21
   1.1.6
   1.4.0
   1.3.0
   2.0.0

> PeerClusterZnode under rs of removed peer may never be deleted
> --
>
> Key: HBASE-16135
> URL: https://issues.apache.org/jira/browse/HBASE-16135
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.5, 1.2.2, 0.98.20
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0, 1.3.0, 1.4.0, 1.1.6, 0.98.21, 1.2.3
>
> Attachments: HBASE-16135-0.98.patch, HBASE-16135-branch-1.1.patch, 
> HBASE-16135-branch-1.2.patch, HBASE-16135-branch-1.patch, 
> HBASE-16135-v1.patch, HBASE-16135.patch
>
>
> One of our clusters ran out of space recently, and we found that the 
> .oldlogs directory had almost the same size as the data directory.
> It turned out that we had removed a peer about 3 months ago, but there were 
> still some replication queue znodes for it under some rs nodes, which 
> prevented the deletion of the .oldlogs files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16135) PeerClusterZnode under rs of removed peer may never be deleted

2016-06-29 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15356506#comment-15356506
 ] 

Duo Zhang commented on HBASE-16135:
---

Ping [~ashu210890] and [~ghelmling], let's commit if you guys do not have other 
concerns?

Thanks.

> PeerClusterZnode under rs of removed peer may never be deleted
> --
>
> Key: HBASE-16135
> URL: https://issues.apache.org/jira/browse/HBASE-16135
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.5, 1.2.2, 0.98.20
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0, 1.3.0, 1.4.0, 1.1.6, 0.98.21, 1.2.3
>
> Attachments: HBASE-16135-0.98.patch, HBASE-16135-branch-1.1.patch, 
> HBASE-16135-branch-1.2.patch, HBASE-16135-branch-1.patch, 
> HBASE-16135-v1.patch, HBASE-16135.patch
>
>
> One of our clusters ran out of space recently, and we found that the 
> .oldlogs directory had almost the same size as the data directory.
> It turned out that we had removed a peer about 3 months ago, but there were 
> still some replication queue znodes for it under some rs nodes, which 
> prevented the deletion of the .oldlogs files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16110) AsyncFS WAL doesn't work with Hadoop 2.8+

2016-06-30 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15356594#comment-15356594
 ] 

Duo Zhang commented on HBASE-16110:
---

Ping [~busbey]

> AsyncFS WAL doesn't work with Hadoop 2.8+
> -
>
> Key: HBASE-16110
> URL: https://issues.apache.org/jira/browse/HBASE-16110
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 2.0.0
>Reporter: Sean Busbey
>Assignee: Duo Zhang
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HBASE-16110.patch
>
>
> The async wal implementation doesn't work with Hadoop 2.8+. Fails compilation 
> and will fail running.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16165) Decrease RpcServer.callQueueSize before writeResponse causes OOM

2016-07-01 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-16165:
--
Priority: Minor  (was: Major)

> Decrease RpcServer.callQueueSize before writeResponse causes OOM
> 
>
> Key: HBASE-16165
> URL: https://issues.apache.org/jira/browse/HBASE-16165
> Project: HBase
>  Issue Type: Bug
>Reporter: Duo Zhang
>Priority: Minor
>
> In RpcServer, we use {{callQueueSizeInBytes}} to avoid queuing too many calls, 
> which would cause OOM. But in {{CallRunner.run}}, we decrease it before sending 
> the response back. And even after calling {{sendResponseIfReady}}, the call 
> object could stay in our heap for a long time if we can not write out the 
> response (that's why we need a Responder thread...). This makes it possible 
> that the actual size of all call objects in the heap is larger than 
> {{maxQueueSizeInBytes}}, which causes OOM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16165) Decrease RpcServer.callQueueSize before writeResponse causes OOM

2016-07-01 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15359877#comment-15359877
 ] 

Duo Zhang commented on HBASE-16165:
---

We hit this recently, but it only happens on our legacy 0.94 clusters. And we 
found that there is another bug in 0.94.

In 0.94, when we can not write back the whole response in the first place, we 
attach the call to the channel's SelectionKey and never detach it. So if we have 
lots of connections whose selection key is attached with a call, and the call's 
param field is large (this usually happens when replication is enabled), then we 
will run into OOM.

So for HBase 0.98+, I think this is only theoretical. It could only happen if a 
client keeps sending large put requests but never receives the responses. Let's 
modify the priority. :)
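
A minimal sketch of the ordering the issue title is about. This is not the real 
RpcServer/CallRunner code; every name below is hypothetical and it only illustrates 
decrementing {{callQueueSizeInBytes}} after the response has actually been written 
instead of before:

{code:java}
// Hypothetical sketch: keep a call counted against callQueueSizeInBytes until its
// response has been flushed, so queued-but-unwritten calls stay bounded in the heap.
final class CallRunnerSketch {
  private final java.util.concurrent.atomic.AtomicLong callQueueSizeInBytes;

  CallRunnerSketch(java.util.concurrent.atomic.AtomicLong queueSize) {
    this.callQueueSizeInBytes = queueSize;
  }

  void run(Call call) {
    byte[] response = service(call);        // handle the request
    call.setResponse(response);
    call.sendResponseIfReady();             // may hand off to a Responder thread
    // Decrement only once the response has really been written out, not before.
    call.whenResponseWritten(() -> callQueueSizeInBytes.addAndGet(-call.getSize()));
  }

  private byte[] service(Call call) {
    return new byte[0];                     // placeholder for the real handler
  }

  interface Call {
    long getSize();
    void setResponse(byte[] response);
    void sendResponseIfReady();
    void whenResponseWritten(Runnable onFlushed);
  }
}
{code}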

> Decrease RpcServer.callQueueSize before writeResponse causes OOM
> 
>
> Key: HBASE-16165
> URL: https://issues.apache.org/jira/browse/HBASE-16165
> Project: HBase
>  Issue Type: Bug
>Reporter: Duo Zhang
>
> In RpcServer, we use {{callQueueSizeInBytes}} to avoid queuing too many calls, 
> which would cause OOM. But in {{CallRunner.run}}, we decrease it before sending 
> the response back. And even after calling {{sendResponseIfReady}}, the call 
> object could stay in our heap for a long time if we can not write out the 
> response (that's why we need a Responder thread...). This makes it possible 
> that the actual size of all call objects in the heap is larger than 
> {{maxQueueSizeInBytes}}, which causes OOM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16135) PeerClusterZnode under rs of removed peer may never be deleted

2016-07-03 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15360740#comment-15360740
 ] 

Duo Zhang commented on HBASE-16135:
---

Will commit this evening (GMT+8) if no objections.

Thanks.

> PeerClusterZnode under rs of removed peer may never be deleted
> --
>
> Key: HBASE-16135
> URL: https://issues.apache.org/jira/browse/HBASE-16135
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.5, 1.2.2, 0.98.20
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0, 1.3.0, 1.4.0, 1.1.6, 0.98.21, 1.2.3
>
> Attachments: HBASE-16135-0.98.patch, HBASE-16135-branch-1.1.patch, 
> HBASE-16135-branch-1.2.patch, HBASE-16135-branch-1.patch, 
> HBASE-16135-v1.patch, HBASE-16135-v2.patch, HBASE-16135-v3.patch, 
> HBASE-16135.patch
>
>
> One of our clusters ran out of space recently, and we found that the .oldlogs 
> directory had almost the same size as the data directory.
> Finally we found that the problem is that we removed a peer about 3 months ago, 
> but there are still some replication queue znodes under some rs nodes. This 
> prevents the deletion of .oldlogs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16135) PeerClusterZnode under rs of removed peer may never be deleted

2016-07-04 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-16135:
--
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Pushed to all branches. Thanks all for reviewing.

> PeerClusterZnode under rs of removed peer may never be deleted
> --
>
> Key: HBASE-16135
> URL: https://issues.apache.org/jira/browse/HBASE-16135
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.5, 1.2.2, 0.98.20
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0, 1.3.0, 1.4.0, 1.1.6, 0.98.21, 1.2.3
>
> Attachments: HBASE-16135-0.98.patch, HBASE-16135-branch-1.1.patch, 
> HBASE-16135-branch-1.2.patch, HBASE-16135-branch-1.patch, 
> HBASE-16135-v1.patch, HBASE-16135-v2.patch, HBASE-16135-v3.patch, 
> HBASE-16135.patch
>
>
> One of our clusters ran out of space recently, and we found that the .oldlogs 
> directory had almost the same size as the data directory.
> Finally we found that the problem is that we removed a peer about 3 months ago, 
> but there are still some replication queue znodes under some rs nodes. This 
> prevents the deletion of .oldlogs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16135) PeerClusterZnode under rs of removed peer may never be deleted

2016-06-29 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-16135:
--
Attachment: (was: HBASE-16135-branch-1.patch)

> PeerClusterZnode under rs of removed peer may never be deleted
> --
>
> Key: HBASE-16135
> URL: https://issues.apache.org/jira/browse/HBASE-16135
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.0.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Attachments: HBASE-16135-v1.patch, HBASE-16135.patch
>
>
> One of our clusters ran out of space recently, and we found that the .oldlogs 
> directory had almost the same size as the data directory.
> Finally we found that the problem is that we removed a peer about 3 months ago, 
> but there are still some replication queue znodes under some rs nodes. This 
> prevents the deletion of .oldlogs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16135) PeerClusterZnode under rs of removed peer may never be deleted

2016-06-29 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-16135:
--
Attachment: HBASE-16135-branch-1.patch

Patch for branch-1.

> PeerClusterZnode under rs of removed peer may never be deleted
> --
>
> Key: HBASE-16135
> URL: https://issues.apache.org/jira/browse/HBASE-16135
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.0.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Attachments: HBASE-16135-branch-1.patch, HBASE-16135-v1.patch, 
> HBASE-16135.patch
>
>
> One of our clusters ran out of space recently, and we found that the .oldlogs 
> directory had almost the same size as the data directory.
> Finally we found that the problem is that we removed a peer about 3 months ago, 
> but there are still some replication queue znodes under some rs nodes. This 
> prevents the deletion of .oldlogs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16135) PeerClusterZnode under rs of removed peer may never be deleted

2016-06-29 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-16135:
--
Attachment: HBASE-16135-branch-1.patch

Missed one @Test annotation...

> PeerClusterZnode under rs of removed peer may never be deleted
> --
>
> Key: HBASE-16135
> URL: https://issues.apache.org/jira/browse/HBASE-16135
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.0.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Attachments: HBASE-16135-branch-1.patch, HBASE-16135-v1.patch, 
> HBASE-16135.patch
>
>
> One of our clusters ran out of space recently, and we found that the .oldlogs 
> directory had almost the same size as the data directory.
> Finally we found that the problem is that we removed a peer about 3 months ago, 
> but there are still some replication queue znodes under some rs nodes. This 
> prevents the deletion of .oldlogs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-15921) Add first AsyncTable impl and create TableImpl based on it

2016-08-16 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15423827#comment-15423827
 ] 

Duo Zhang commented on HBASE-15921:
---

OK, you use it in this patch...

I left a comment on rb. I do not think it is a good idea to jump over the 
protobuf stub layer and call an RpcChannel directly: the method name mapping is 
fragile, and we also lose the type checking...

Thanks.

> Add first AsyncTable impl and create TableImpl based on it
> --
>
> Key: HBASE-15921
> URL: https://issues.apache.org/jira/browse/HBASE-15921
> Project: HBase
>  Issue Type: Improvement
>Reporter: Jurriaan Mous
>Assignee: Jurriaan Mous
> Attachments: HBASE-15921.patch, HBASE-15921.v1.patch
>
>
> First we create an AsyncTable interface with an implementation, without the Scan 
> functionality. That will land in a separate patch since it needs a refactor of 
> existing scans.
> Also added is a new TableImpl to replace HTable. It uses the AsyncTableImpl 
> internally and should be a bit faster because it jumps through fewer hoops to do 
> ProtoBuf transportation. This way we can run all existing tests on the 
> AsyncTableImpl to guarantee its quality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-15798) Add Async RpcChannels to all RpcClients

2016-08-16 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15423786#comment-15423786
 ] 

Duo Zhang commented on HBASE-15798:
---

Sorry, a bit late, but how do you use the {{AsyncRpcChannel}} in HBase?

A protobuf stub can only be instantiated using an {{RpcChannel}} or a 
{{BlockingRpcChannel}}.

Thanks.
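
For context, a minimal sketch of the stub layer in question, assuming the 
pre-shaded 2016-era generated classes ({{ClientProtos.ClientService}}); the exact 
package and service names may differ:

{code:java}
import com.google.protobuf.BlockingRpcChannel;
import com.google.protobuf.RpcChannel;
import org.apache.hadoop.hbase.protobuf.generated.ClientProtos.ClientService;

public class StubSketch {
  // Async stub: responses are delivered through protobuf RpcCallbacks.
  static ClientService.Interface asyncStub(RpcChannel channel) {
    return ClientService.newStub(channel);
  }

  // Blocking stub: each call waits for its response.
  static ClientService.BlockingInterface blockingStub(BlockingRpcChannel channel) {
    return ClientService.newBlockingStub(channel);
  }
}
{code}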

> Add Async RpcChannels to all RpcClients
> ---
>
> Key: HBASE-15798
> URL: https://issues.apache.org/jira/browse/HBASE-15798
> Project: HBase
>  Issue Type: New Feature
>Reporter: Jurriaan Mous
>Assignee: Jurriaan Mous
> Fix For: 2.0.0
>
> Attachments: HBASE-15798-v1.patch, HBASE-15798-v1.patch, 
> HBASE-15798-v2.patch, HBASE-15798.patch
>
>
> The RpcClients all need to expose an async protobuf RpcChannel and our own 
> custom AsyncRpcChannel (without protobuf overhead) so an Async table 
> implementation can be made.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16388) Prevent client threads being blocked by only one slow region server

2016-08-17 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15425911#comment-15425911
 ] 

Duo Zhang commented on HBASE-16388:
---

How do you deal with the async call? It is not used right now, and I plan to 
reimplement it anyway...

> Prevent client threads being blocked by only one slow region server
> ---
>
> Key: HBASE-16388
> URL: https://issues.apache.org/jira/browse/HBASE-16388
> Project: HBase
>  Issue Type: New Feature
>Reporter: Phil Yang
>Assignee: Phil Yang
> Attachments: HBASE-16388-v1.patch, HBASE-16388-v2.patch, 
> HBASE-16388-v2.patch
>
>
> It is a common use case for HBase's users to have several threads/handlers in 
> their service, each handler with its own Table/HTable instance. Generally users 
> think each handler is independent and they won't interact with each other.
> However, in an extreme case, if a region server is very slow, every request to 
> this RS will time out, and the handlers of the users' service may be occupied by 
> the long-waiting requests, so even requests belonging to other RSs will also 
> time out.
> For example: 
> Say we have 100 handlers in a client service (timeout is 1000ms) and HBase has 
> 10 region servers whose average response time is 50ms. If no region server is 
> slow, we can handle 2000 requests per second.
> Now this service's QPS is 1000. If there is one region server that is very slow, 
> all requests to it will time out. Users hope that only 10% of the requests fail, 
> and that 90% of the requests still see a 50ms response time, because only 10% of 
> the requests are routed to the slow RS. However, each second we have 100 
> long-waiting requests, which occupy exactly all 100 handlers. So all handlers 
> are blocked and the availability of this service is almost zero.
> To prevent this case, we can limit the max concurrent requests to one RS at the 
> process level. Requests exceeding the limit will throw ServerBusyException 
> (extends DoNotRetryIOE) to users immediately. In the above case, if we set this 
> limit to 20, only 20 handlers will be occupied and the other 80 handlers can 
> still handle requests to other RSs. The availability of this service is 90%, as 
> expected.
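
A minimal sketch of the idea, with hypothetical names rather than the actual patch 
code (the real feature throws ServerBusyException; a plain DoNotRetryIOException is 
used here only to keep the sketch self-contained):

{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.hadoop.hbase.DoNotRetryIOException;

public class PerServerLimiter {
  private final int maxConcurrentPerServer;
  private final ConcurrentMap<String, AtomicInteger> inFlight = new ConcurrentHashMap<>();

  public PerServerLimiter(int maxConcurrentPerServer) {
    this.maxConcurrentPerServer = maxConcurrentPerServer;
  }

  /** Call before sending a request to the given server; fails fast when the cap is hit. */
  public void acquire(String serverName) throws DoNotRetryIOException {
    AtomicInteger counter = inFlight.computeIfAbsent(serverName, k -> new AtomicInteger());
    if (counter.incrementAndGet() > maxConcurrentPerServer) {
      counter.decrementAndGet();
      throw new DoNotRetryIOException("Too many concurrent requests to " + serverName);
    }
  }

  /** Call when the request finishes, successfully or not. */
  public void release(String serverName) {
    AtomicInteger counter = inFlight.get(serverName);
    if (counter != null) {
      counter.decrementAndGet();
    }
  }
}
{code}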



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17508) Unify the implementation of small scan and regular scan for sync client

2017-02-02 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15849784#comment-15849784
 ] 

Duo Zhang commented on HBASE-17508:
---

{quote}
How hard to do this?
{quote}

I haven't read the code carefully yet so I'm not sure. But I think it is worth 
fixing even if it is very hard. It is also a little confusing to the end users, 
although the comment says 'we will modify the scan'...

Anyway, this should be addressed in another issue. Let me finish this issue 
first. Only one UT left now.

> Unify the implementation of small scan and regular scan for sync client
> ---
>
> Key: HBASE-17508
> URL: https://issues.apache.org/jira/browse/HBASE-17508
> Project: HBase
>  Issue Type: Task
>  Components: Client, scan
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17508.patch, HBASE-17508-v1.patch, 
> HBASE-17508-v2.patch, HBASE-17508-v3.patch, HBASE-17508-v4.patch
>
>
> Implement the same logic as HBASE-17045 for the sync client.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17552) Update developer section in hbase book

2017-02-02 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851169#comment-15851169
 ] 

Duo Zhang commented on HBASE-17552:
---

Sorry for being late. +1. Nice work [~appy].

> Update developer section in hbase book
> --
>
> Key: HBASE-17552
> URL: https://issues.apache.org/jira/browse/HBASE-17552
> Project: HBase
>  Issue Type: Improvement
>Reporter: Appy
>Assignee: Appy
> Fix For: 2.0.0
>
> Attachments: HBASE-17552.master.001.patch, 
> HBASE-17552.master.002.patch
>
>
> Updates
> - Changes 'Create Patch' in a major way, promoting use of the submit-patch.py 
> script.
> - Changes the instructions for committing a patch so that the contributor of a 
> patch is also its author, giving proper credit to contributors in the github 
> history.
> - Rewording in 'code formatting guidelines'



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17508) Unify the implementation of small scan and regular scan for sync client

2017-02-02 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-17508:
--
Attachment: HBASE-17508-v6.patch

Address the comments on rb.

> Unify the implementation of small scan and regular scan for sync client
> ---
>
> Key: HBASE-17508
> URL: https://issues.apache.org/jira/browse/HBASE-17508
> Project: HBase
>  Issue Type: Sub-task
>  Components: Client, scan
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17508.patch, HBASE-17508-v1.patch, 
> HBASE-17508-v2.patch, HBASE-17508-v3.patch, HBASE-17508-v4.patch, 
> HBASE-17508-v5.patch, HBASE-17508-v6.patch
>
>
> Implement the same logic as HBASE-17045 for the sync client.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17508) Unify the implementation of small scan and regular scan for sync client

2017-02-02 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851163#comment-15851163
 ] 

Duo Zhang commented on HBASE-17508:
---

Any other concerns, sir? [~stack]

If there are no big concerns, let me start preparing the patch for branch-1.

Thanks.

> Unify the implementation of small scan and regular scan for sync client
> ---
>
> Key: HBASE-17508
> URL: https://issues.apache.org/jira/browse/HBASE-17508
> Project: HBase
>  Issue Type: Sub-task
>  Components: Client, scan
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17508.patch, HBASE-17508-v1.patch, 
> HBASE-17508-v2.patch, HBASE-17508-v3.patch, HBASE-17508-v4.patch, 
> HBASE-17508-v5.patch, HBASE-17508-v6.patch
>
>
> Implement the same logic as HBASE-17045 for the sync client.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17508) Unify the implementation of small scan and regular scan for sync client

2017-02-03 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15852520#comment-15852520
 ] 

Duo Zhang commented on HBASE-17508:
---

Filed HBASE-17595 to track it. [~stack].

Let me prepare the patch for branch-1.

> Unify the implementation of small scan and regular scan for sync client
> ---
>
> Key: HBASE-17508
> URL: https://issues.apache.org/jira/browse/HBASE-17508
> Project: HBase
>  Issue Type: Sub-task
>  Components: Client, scan
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17508.patch, HBASE-17508-v1.patch, 
> HBASE-17508-v2.patch, HBASE-17508-v3.patch, HBASE-17508-v4.patch, 
> HBASE-17508-v5.patch, HBASE-17508-v6.patch
>
>
> Implement the same logic as HBASE-17045 for the sync client.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17508) Unify the implementation of small scan and regular scan for sync client

2017-02-03 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15852515#comment-15852515
 ] 

Duo Zhang commented on HBASE-17508:
---

{quote}
Only concern is the follow-on JIRA that will ensure we maintain compatibility.
{quote}
You mean the 'allowPartial' for small and limited scans? Let me file an issue to 
track it.

> Unify the implementation of small scan and regular scan for sync client
> ---
>
> Key: HBASE-17508
> URL: https://issues.apache.org/jira/browse/HBASE-17508
> Project: HBase
>  Issue Type: Sub-task
>  Components: Client, scan
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17508.patch, HBASE-17508-v1.patch, 
> HBASE-17508-v2.patch, HBASE-17508-v3.patch, HBASE-17508-v4.patch, 
> HBASE-17508-v5.patch, HBASE-17508-v6.patch
>
>
> Implement the same logic as HBASE-17045 for the sync client.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HBASE-17595) Add partial result for small/limited scan

2017-02-03 Thread Duo Zhang (JIRA)
Duo Zhang created HBASE-17595:
-

 Summary: Add partial result for small/limited scan
 Key: HBASE-17595
 URL: https://issues.apache.org/jira/browse/HBASE-17595
 Project: HBase
  Issue Type: Sub-task
  Components: asyncclient, Client, scan
Affects Versions: 2.0.0, 1.4.0
Reporter: Duo Zhang
Priority: Blocker
 Fix For: 2.0.0, 1.4.0


The partial result support was marked as a 'TODO' when implementing HBASE-17045. 
And when implementing HBASE-17508, we found that if we make the small scan share 
the same logic with the general scan, the scan requests other than open scanner 
will not have the small flag, so the server may return a partial result to the 
client and cause some strange behavior. It is solved by modifying the logic at the 
server side, but this means the 1.4.x client is not safe to contact an earlier 1.x 
server. So we'd better address the problem at the client side. Marked as blocker 
as this issue should be finished before any 2.x and 1.4.x releases.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17508) Unify the implementation of small scan and regular scan for sync client

2017-02-01 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15848372#comment-15848372
 ] 

Duo Zhang commented on HBASE-17508:
---

Fix TestSyncTable. The problem is that in the syncRange method we use one Scan 
object to instantiate two scanners at the same time. We modify the Scan object 
during the scan operation, for example, we change the startKey and the 
mvccReadPoint of the Scan object. But why could we pass the test before this 
patch? Just because we were lucky... It is very easy to get a strange result as 
the two scanners will both modify the Scan object's startKey...

And why not use a copy of the Scan object internally? This is because we need to 
use the Scan object to pass the ScanMetrics. I think this is a bad practice. IMO, 
we should make the ResultScanner carry the ScanMetrics, not the Scan object.

Thanks.
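
A minimal sketch of the hazard and the obvious workaround, assuming a hypothetical 
test setup; note that with the copy constructor the ScanMetrics would then be 
reported on the copies, which is exactly the ResultScanner/ScanMetrics point above:

{code:java}
import java.io.IOException;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;

public class ScanCopySketch {
  // Two ResultScanners built from the *same* Scan object can step on each other,
  // because the client mutates the Scan (startRow, mvcc read point) while scanning.
  // Giving each scanner its own copy avoids that.
  static void scanTwice(Table table, Scan template) throws IOException {
    try (ResultScanner left = table.getScanner(new Scan(template));
         ResultScanner right = table.getScanner(new Scan(template))) {
      left.next();   // each scanner now mutates only its own Scan copy
      right.next();
    }
  }
}
{code}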

> Unify the implementation of small scan and regular scan for sync client
> ---
>
> Key: HBASE-17508
> URL: https://issues.apache.org/jira/browse/HBASE-17508
> Project: HBase
>  Issue Type: Task
>  Components: Client, scan
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17508.patch, HBASE-17508-v1.patch, 
> HBASE-17508-v2.patch, HBASE-17508-v3.patch
>
>
> Implement the same logic as HBASE-17045 for the sync client.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17508) Unify the implementation of small scan and regular scan for sync client

2017-02-01 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-17508:
--
Attachment: HBASE-17508-v3.patch

> Unify the implementation of small scan and regular scan for sync client
> ---
>
> Key: HBASE-17508
> URL: https://issues.apache.org/jira/browse/HBASE-17508
> Project: HBase
>  Issue Type: Task
>  Components: Client, scan
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17508.patch, HBASE-17508-v1.patch, 
> HBASE-17508-v2.patch, HBASE-17508-v3.patch
>
>
> Implement the same logic as HBASE-17045 for the sync client.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17508) Unify the implementation of small scan and regular scan for sync client

2017-02-01 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-17508:
--
Attachment: HBASE-17508-v4.patch

Some minor cleanups.

> Unify the implementation of small scan and regular scan for sync client
> ---
>
> Key: HBASE-17508
> URL: https://issues.apache.org/jira/browse/HBASE-17508
> Project: HBase
>  Issue Type: Task
>  Components: Client, scan
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17508.patch, HBASE-17508-v1.patch, 
> HBASE-17508-v2.patch, HBASE-17508-v3.patch, HBASE-17508-v4.patch
>
>
> Implement the same logic as HBASE-17045 for the sync client.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17599) Use mayHaveMoreCellsInRow instead of isPartial

2017-02-06 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-17599:
--
Attachment: HBASE-17599-v1.patch

Will open a new issue to clean up the usage of isPartial.

> Use mayHaveMoreCellsInRow instead of isPartial
> --
>
> Key: HBASE-17599
> URL: https://issues.apache.org/jira/browse/HBASE-17599
> Project: HBase
>  Issue Type: Sub-task
>  Components: Client, scan
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17599.patch, HBASE-17599-v1.patch
>
>
> For now, if we set scan.allowPartial(true), the partial result returned will 
> have the partial flag set to true. But for scan.setBatch(xx), the partial 
> result returned will not be marked as partial.
> This is an incompatible change, indeed. But I do not think it will introduce 
> any issues, as we just provide more information to the client. The old partial 
> flag for a batched scan is always false, so I do not think anyone can make use 
> of it.
> This is very important for the limited scan to support partial results from 
> the server. If we get a Result whose partial flag is false, then we know we got 
> the whole row. Otherwise we need to fetch one more row to see if the row key 
> has changed, which makes the logic more complicated.
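
A minimal sketch of how a client could consume the flag, assuming the 
{{Result.mayHaveMoreCellsInRow()}} accessor introduced with this change:

{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;

public class RowAssembler {
  // Keep appending partial Results until the flag goes false; at that point the
  // whole row has been received and no extra "peek at the next row key" is needed.
  static List<Result> readOneRow(ResultScanner scanner) throws IOException {
    List<Result> pieces = new ArrayList<>();
    for (Result r = scanner.next(); r != null; r = scanner.next()) {
      pieces.add(r);
      if (!r.mayHaveMoreCellsInRow()) {
        break; // no more cells in this row, the row is complete
      }
    }
    return pieces;
  }
}
{code}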



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17402) TestAsyncTableScan sometimes hangs

2017-02-06 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-17402:
--
Attachment: HBASE-17402.patch

The fix is simple... I forgot to record toSend in pendingRequest in the 
onComplete method. The newly added test 
TestAsyncNonMetaRegionLocator.testConcurrentLocate can reproduce the problem.

With the fix, TestAsyncNonMetaRegionLocator.testConcurrentLocate never hangs for 
me. And this bug will also cause TestAsyncNonMetaRegionLocatorConcurrenyLimit to 
fail.
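
A minimal sketch of the bookkeeping involved, with hypothetical names rather than 
the actual AsyncNonMetaRegionLocator code:

{code:java}
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

public class LocateLimiterSketch {
  private final int maxConcurrent;
  private final Set<String> pendingRequests = new HashSet<>();
  private final Queue<String> allRequests = new ArrayDeque<>();

  public LocateLimiterSketch(int maxConcurrent) {
    this.maxConcurrent = maxConcurrent;
  }

  public synchronized void submit(String req) {
    allRequests.add(req);
    send(takeFromQueue());
  }

  public synchronized void onComplete(String finished) {
    pendingRequests.remove(finished);
    // The bug described above was forgetting this bookkeeping on the completion
    // path: whatever is sent from here must be recorded as pending too, otherwise
    // the accounting drifts and later submits are never sent, so callers hang.
    send(takeFromQueue());
  }

  private List<String> takeFromQueue() {
    List<String> toSend = new ArrayList<>();
    while (!allRequests.isEmpty() && pendingRequests.size() + toSend.size() < maxConcurrent) {
      toSend.add(allRequests.poll());
    }
    pendingRequests.addAll(toSend);  // record toSend in pendingRequests before sending
    return toSend;
  }

  private void send(List<String> toSend) {
    // issue the actual locate RPCs here; omitted in this sketch
  }
}
{code}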

> TestAsyncTableScan sometimes hangs
> --
>
> Key: HBASE-17402
> URL: https://issues.apache.org/jira/browse/HBASE-17402
> Project: HBase
>  Issue Type: Sub-task
>  Components: Client
>Affects Versions: 2.0.0
>Reporter: Duo Zhang
> Fix For: 2.0.0
>
> Attachments: HBASE-17402.patch
>
>
> It may hang in the setUp method. Never seen this in a jenkins build, but it 
> happens several times locally. I think the problem is in 
> AsyncNonMetaRegionLocator, where we have logic to limit the concurrency of 
> requests to the meta table.
> Opening an issue here, will dig more later.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17402) TestAsyncTableScan sometimes hangs

2017-02-06 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-17402:
--
 Assignee: Duo Zhang
Affects Version/s: 2.0.0
   Status: Patch Available  (was: Open)

> TestAsyncTableScan sometimes hangs
> --
>
> Key: HBASE-17402
> URL: https://issues.apache.org/jira/browse/HBASE-17402
> Project: HBase
>  Issue Type: Sub-task
>  Components: Client
>Affects Versions: 2.0.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0
>
> Attachments: HBASE-17402.patch
>
>
> It may hang in the setUp method. Never seen this in a jenkins build, but it 
> happens several times locally. I think the problem is in 
> AsyncNonMetaRegionLocator, where we have logic to limit the concurrency of 
> requests to the meta table.
> Opening an issue here, will dig more later.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17599) Also set the partial flag of Result to true if we reach the batch limit

2017-02-06 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-17599:
--
Assignee: Duo Zhang
  Status: Patch Available  (was: Open)

> Also set the partial flag of Result to true if we reach the batch limit
> ---
>
> Key: HBASE-17599
> URL: https://issues.apache.org/jira/browse/HBASE-17599
> Project: HBase
>  Issue Type: Sub-task
>  Components: Client, scan
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17599.patch
>
>
> For now, if we set scan.allowPartial(true), the partial result returned will 
> have the partial flag set to true. But for scan.setBatch(xx), the partial 
> result returned will not be marked as partial.
> This is an incompatible change, indeed. But I do not think it will introduce 
> any issues, as we just provide more information to the client. The old partial 
> flag for a batched scan is always false, so I do not think anyone can make use 
> of it.
> This is very important for the limited scan to support partial results from 
> the server. If we get a Result whose partial flag is false, then we know we got 
> the whole row. Otherwise we need to fetch one more row to see if the row key 
> has changed, which makes the logic more complicated.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17599) Use mayHaveMoreCellsInRow instead of isPartial

2017-02-06 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-17599:
--
Summary: Use mayHaveMoreCellsInRow instead of isPartial  (was: Also set the 
partial flag of Result to true if we reach the batch limit)

> Use mayHaveMoreCellsInRow instead of isPartial
> --
>
> Key: HBASE-17599
> URL: https://issues.apache.org/jira/browse/HBASE-17599
> Project: HBase
>  Issue Type: Sub-task
>  Components: Client, scan
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17599.patch
>
>
> For now, if we set scan.allowPartial(true), the partial result returned will 
> have the partial flag set to true. But for scan.setBatch(xx), the partial 
> result returned will not be marked as partial.
> This is an incompatible change, indeed. But I do not think it will introduce 
> any issues, as we just provide more information to the client. The old partial 
> flag for a batched scan is always false, so I do not think anyone can make use 
> of it.
> This is very important for the limited scan to support partial results from 
> the server. If we get a Result whose partial flag is false, then we know we got 
> the whole row. Otherwise we need to fetch one more row to see if the row key 
> has changed, which makes the logic more complicated.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17599) Use mayHaveMoreCellsInRow instead of isPartial

2017-02-06 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15855111#comment-15855111
 ] 

Duo Zhang commented on HBASE-17599:
---

Any concerns on the new approach? [~stack] [~anoop.hbase] [~yangzhe1991].

Thanks.

> Use mayHaveMoreCellsInRow instead of isPartial
> --
>
> Key: HBASE-17599
> URL: https://issues.apache.org/jira/browse/HBASE-17599
> Project: HBase
>  Issue Type: Sub-task
>  Components: Client, scan
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17599.patch, HBASE-17599-v1.patch
>
>
> For now, if we set scan.allowPartial(true), the partial result returned will 
> have the partial flag set to true. But for scan.setBatch(xx), the partial 
> result returned will not be marked as partial.
> This is an incompatible change, indeed. But I do not think it will introduce 
> any issues, as we just provide more information to the client. The old partial 
> flag for a batched scan is always false, so I do not think anyone can make use 
> of it.
> This is very important for the limited scan to support partial results from 
> the server. If we get a Result whose partial flag is false, then we know we got 
> the whole row. Otherwise we need to fetch one more row to see if the row key 
> has changed, which makes the logic more complicated.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17597) TestMetaWithReplicas.testMetaTableReplicaAssignment is flaky

2017-02-06 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-17597:
--
Affects Version/s: 1.3.0
   1.2.4
   1.1.8
Fix Version/s: 1.1.9
   1.2.5
   1.3.1

> TestMetaWithReplicas.testMetaTableReplicaAssignment is flaky
> 
>
> Key: HBASE-17597
> URL: https://issues.apache.org/jira/browse/HBASE-17597
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.3.0, 1.4.0, 1.2.4, 1.1.8
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 1.4.0, 1.3.1, 1.2.5, 1.1.9
>
> Attachments: HBASE-17597-branch-1.patch
>
>
> The problem is we get an NPE when getting the ServerName from the HRegionLocation.
> I think this is a test issue, not something wrong with our code. The location 
> of the meta region is fetched from zk and it could be null if the region has not 
> been assigned yet. We should deal with a null HRegionLocation in the test code.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17599) Use mayHaveMoreCellsInRow instead of isPartial

2017-02-06 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15855175#comment-15855175
 ] 

Duo Zhang commented on HBASE-17599:
---

{quote}
This is a little unsettling.
{quote}
Yeah, for end users this is not useful. Let me move this comment to another place.

{quote}
Do you want to change the ScannerContext method partialResultFormed to match 
the above?
{quote}
Let me do it.

> Use mayHaveMoreCellsInRow instead of isPartial
> --
>
> Key: HBASE-17599
> URL: https://issues.apache.org/jira/browse/HBASE-17599
> Project: HBase
>  Issue Type: Sub-task
>  Components: Client, scan
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17599.patch, HBASE-17599-v1.patch
>
>
> For now, if we set scan.allowPartial(true), the partial result returned will 
> have the partial flag set to true. But for scan.setBatch(xx), the partial 
> result returned will not be marked as partial.
> This is an incompatible change, indeed. But I do not think it will introduce 
> any issues, as we just provide more information to the client. The old partial 
> flag for a batched scan is always false, so I do not think anyone can make use 
> of it.
> This is very important for the limited scan to support partial results from 
> the server. If we get a Result whose partial flag is false, then we know we got 
> the whole row. Otherwise we need to fetch one more row to see if the row key 
> has changed, which makes the logic more complicated.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HBASE-17606) Fix failing tests introduced by HBASE-17508

2017-02-06 Thread Duo Zhang (JIRA)
Duo Zhang created HBASE-17606:
-

 Summary: Fix failing tests introduced by HBASE-17508
 Key: HBASE-17606
 URL: https://issues.apache.org/jira/browse/HBASE-17606
 Project: HBase
  Issue Type: Bug
  Components: Client, scan
Affects Versions: 2.0.0, 1.4.0
Reporter: Duo Zhang
Assignee: Duo Zhang
 Fix For: 2.0.0, 1.4.0


TestRpcControllerFactory and TestScannerResource.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17606) Fix failing TestRpcControllerFactory introduced by HBASE-17508

2017-02-06 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-17606:
--
Attachment: HBASE-17606.patch

> Fix failing TestRpcControllerFactory introduced by HBASE-17508
> --
>
> Key: HBASE-17606
> URL: https://issues.apache.org/jira/browse/HBASE-17606
> Project: HBase
>  Issue Type: Bug
>  Components: Client, scan
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17606.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17606) Fix failing TestRpcControllerFactory introduced by HBASE-17508

2017-02-06 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-17606:
--
Status: Patch Available  (was: Open)

> Fix failing TestRpcControllerFactory introduced by HBASE-17508
> --
>
> Key: HBASE-17606
> URL: https://issues.apache.org/jira/browse/HBASE-17606
> Project: HBase
>  Issue Type: Bug
>  Components: Client, scan
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17606.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17402) TestAsyncTableScan sometimes hangs

2017-02-06 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-17402:
--
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Pushed to master.

Thanks all for reviewing.

> TestAsyncTableScan sometimes hangs
> --
>
> Key: HBASE-17402
> URL: https://issues.apache.org/jira/browse/HBASE-17402
> Project: HBase
>  Issue Type: Sub-task
>  Components: Client
>Affects Versions: 2.0.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0
>
> Attachments: HBASE-17402.patch, HBASE-17402-v1.patch
>
>
> It may hang in the setUp method. Never seen this in a jenkins build, but it 
> happens several times locally. I think the problem is in 
> AsyncNonMetaRegionLocator, where we have logic to limit the concurrency of 
> requests to the meta table.
> Opening an issue here, will dig more later.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HBASE-17607) Rest api for scan should return 404 when table not exists

2017-02-06 Thread Duo Zhang (JIRA)
Duo Zhang created HBASE-17607:
-

 Summary: Rest api for scan should return 404 when table not exists
 Key: HBASE-17607
 URL: https://issues.apache.org/jira/browse/HBASE-17607
 Project: HBase
  Issue Type: Bug
  Components: REST, scan
Affects Versions: 2.0.0, 1.4.0
Reporter: Duo Zhang
Priority: Critical
 Fix For: 2.0.0, 1.4.0


The problem is introduced by HBASE-17508. After HBASE-17508 we will not contact 
the RS in getScanner. So for REST, get scanner will not return 404 either. But we 
should get a 404 when fetching data from the scanner, whereas now it will return 
204.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17508) Unify the implementation of small scan and regular scan for sync client

2017-02-06 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15855256#comment-15855256
 ] 

Duo Zhang commented on HBASE-17508:
---

Filed HBASE-17606 and HBASE-17607.

> Unify the implementation of small scan and regular scan for sync client
> ---
>
> Key: HBASE-17508
> URL: https://issues.apache.org/jira/browse/HBASE-17508
> Project: HBase
>  Issue Type: Sub-task
>  Components: Client, scan
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17508-branch-1.patch, 
> HBASE-17508-branch-1-v1.patch, HBASE-17508.patch, HBASE-17508-v1.patch, 
> HBASE-17508-v2.patch, HBASE-17508-v3.patch, HBASE-17508-v4.patch, 
> HBASE-17508-v5.patch, HBASE-17508-v6.patch, HBASE-17508-v7.patch
>
>
> Implement the same logic as HBASE-17045 for the sync client.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17606) Fix failing TestRpcControllerFactory introduced by HBASE-17508

2017-02-06 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15855291#comment-15855291
 ] 

Duo Zhang commented on HBASE-17606:
---

Will commit later if no objections.

> Fix failing TestRpcControllerFactory introduced by HBASE-17508
> --
>
> Key: HBASE-17606
> URL: https://issues.apache.org/jira/browse/HBASE-17606
> Project: HBase
>  Issue Type: Bug
>  Components: Client, scan
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17606.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17402) TestAsyncTableScan sometimes hangs

2017-02-06 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15855112#comment-15855112
 ] 

Duo Zhang commented on HBASE-17402:
---

Will fix the star import and commit shortly.

> TestAsyncTableScan sometimes hangs
> --
>
> Key: HBASE-17402
> URL: https://issues.apache.org/jira/browse/HBASE-17402
> Project: HBase
>  Issue Type: Sub-task
>  Components: Client
>Affects Versions: 2.0.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0
>
> Attachments: HBASE-17402.patch
>
>
> It may hang in the setUp method. Never seen this in a jenkins build, but it 
> happens several times locally. I think the problem is in 
> AsyncNonMetaRegionLocator, where we have logic to limit the concurrency of 
> requests to the meta table.
> Opening an issue here, will dig more later.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17606) Fix failing TestRpcControllerFactory introduced by HBASE-17508

2017-02-06 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-17606:
--
Summary: Fix failing TestRpcControllerFactory introduced by HBASE-17508  
(was: Fix failing tests introduced by HBASE-17508)

> Fix failing TestRpcControllerFactory introduced by HBASE-17508
> --
>
> Key: HBASE-17606
> URL: https://issues.apache.org/jira/browse/HBASE-17606
> Project: HBase
>  Issue Type: Bug
>  Components: Client, scan
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0, 1.4.0
>
>
> TestRpcControllerFactory and TestScannerResource.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17508) Unify the implementation of small scan and regular scan for sync client

2017-02-06 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15855192#comment-15855192
 ] 

Duo Zhang commented on HBASE-17508:
---

Thanks [~tedyu] for pointing this out. Let me open an issue to fix it, and also 
TestScannerResource.

I think the flaky test finder may be too aggressive... [~appy]. And 
TestRpcControllerFactory and TestScannerResource fail consistently, so I do not 
think these should be regarded as flaky?

Thanks.

> Unify the implementation of small scan and regular scan for sync client
> ---
>
> Key: HBASE-17508
> URL: https://issues.apache.org/jira/browse/HBASE-17508
> Project: HBase
>  Issue Type: Sub-task
>  Components: Client, scan
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17508-branch-1.patch, 
> HBASE-17508-branch-1-v1.patch, HBASE-17508.patch, HBASE-17508-v1.patch, 
> HBASE-17508-v2.patch, HBASE-17508-v3.patch, HBASE-17508-v4.patch, 
> HBASE-17508-v5.patch, HBASE-17508-v6.patch, HBASE-17508-v7.patch
>
>
> Implement the same logic as HBASE-17045 for the sync client.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17606) Fix failing TestRpcControllerFactory introduced by HBASE-17508

2017-02-06 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-17606:
--
Description: (was: TestRpcControllerFactory and TestScannerResource.)

> Fix failing TestRpcControllerFactory introduced by HBASE-17508
> --
>
> Key: HBASE-17606
> URL: https://issues.apache.org/jira/browse/HBASE-17606
> Project: HBase
>  Issue Type: Bug
>  Components: Client, scan
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0, 1.4.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17597) TestMetaWithReplicas.testMetaTableReplicaAssignment is flaky

2017-02-06 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-17597:
--
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Pushed to branch-1 and branch-1.x. Thanks all for reviewing.

> TestMetaWithReplicas.testMetaTableReplicaAssignment is flaky
> 
>
> Key: HBASE-17597
> URL: https://issues.apache.org/jira/browse/HBASE-17597
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.3.0, 1.4.0, 1.2.4, 1.1.8
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 1.4.0, 1.3.1, 1.2.5, 1.1.9
>
> Attachments: HBASE-17597-branch-1.patch
>
>
> The problem is we get an NPE when getting the ServerName from the HRegionLocation.
> I think this is a test issue, not something wrong with our code. The location 
> of the meta region is fetched from zk and it could be null if the region has not 
> been assigned yet. We should deal with a null HRegionLocation in the test code.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17402) TestAsyncTableScan sometimes hangs

2017-02-06 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-17402:
--
Attachment: HBASE-17402-v1.patch

> TestAsyncTableScan sometimes hangs
> --
>
> Key: HBASE-17402
> URL: https://issues.apache.org/jira/browse/HBASE-17402
> Project: HBase
>  Issue Type: Sub-task
>  Components: Client
>Affects Versions: 2.0.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0
>
> Attachments: HBASE-17402.patch, HBASE-17402-v1.patch
>
>
> It may hang in the setUp method. Never seen this in a jenkins build, but it 
> happens several times locally. I think the problem is in 
> AsyncNonMetaRegionLocator, where we have logic to limit the concurrency of 
> requests to the meta table.
> Opening an issue here, will dig more later.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17606) Fix failing TestRpcControllerFactory introduced by HBASE-17508

2017-02-06 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-17606:
--
Affects Version/s: (was: 1.4.0)
Fix Version/s: (was: 1.4.0)

> Fix failing TestRpcControllerFactory introduced by HBASE-17508
> --
>
> Key: HBASE-17606
> URL: https://issues.apache.org/jira/browse/HBASE-17606
> Project: HBase
>  Issue Type: Bug
>  Components: Client, scan
>Affects Versions: 2.0.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0
>
> Attachments: HBASE-17606.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HBASE-17608) Add suspend support for RawScanResultConsumer

2017-02-07 Thread Duo Zhang (JIRA)
Duo Zhang created HBASE-17608:
-

 Summary: Add suspend support for RawScanResultConsumer
 Key: HBASE-17608
 URL: https://issues.apache.org/jira/browse/HBASE-17608
 Project: HBase
  Issue Type: Sub-task
  Components: asyncclient, Client, scan
Affects Versions: 2.0.0
Reporter: Duo Zhang
 Fix For: 2.0.0


Now for the AsyncResultScanner, we can only close the scanner when we reach the 
cache size limit and open a new scanner later. This breaks the region level 
consistency. We should just stop fetching data and leave the scanner open at the 
RS.
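
One possible shape of such a suspend hook, sketched with hypothetical names (the 
final API may well look different):

{code:java}
import org.apache.hadoop.hbase.client.Result;

public interface SuspendableScanConsumerSketch {

  interface ScanResumeHandle {
    /** Resume fetching on the scanner that was left open at the region server. */
    void resume();
  }

  interface ScanFlowControl {
    /** Stop fetching more data but keep the scanner open; returns a handle to resume. */
    ScanResumeHandle suspend();
  }

  /** Called for each batch of results; the consumer may suspend via the control. */
  void onNext(Result[] results, ScanFlowControl control);
}
{code}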



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17606) Fix failing TestRpcControllerFactory introduced by HBASE-17508

2017-02-06 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-17606:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Pushed to master. branch-1 does not have the problem.

> Fix failing TestRpcControllerFactory introduced by HBASE-17508
> --
>
> Key: HBASE-17606
> URL: https://issues.apache.org/jira/browse/HBASE-17606
> Project: HBase
>  Issue Type: Bug
>  Components: Client, scan
>Affects Versions: 2.0.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0
>
> Attachments: HBASE-17606.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17583) Add inclusive/exclusive support for startRow and endRow of scan for sync client

2017-02-06 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-17583:
--
Attachment: HBASE-17583-v2.patch

Add more comments.

> Add inclusive/exclusive support for startRow and endRow of scan for sync 
> client
> ---
>
> Key: HBASE-17583
> URL: https://issues.apache.org/jira/browse/HBASE-17583
> Project: HBase
>  Issue Type: Sub-task
>  Components: Client, scan
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17583.patch, HBASE-17583-v1.patch, 
> HBASE-17583-v2.patch
>
>
> Implement the same feature as HBASE-17320 for the sync client.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17508) Unify the implementation of small scan and regular scan for sync client

2017-02-04 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15853009#comment-15853009
 ] 

Duo Zhang commented on HBASE-17508:
---

The failed UT is branch-1 only and was introduced in HBASE-17238. The location of 
the meta region is fetched from zk, so I do not think it is broken by our patch 
here. And I skimmed the code; it seems to be a test issue. Will open another issue 
to address it.

Let me commit the patch here.

Thanks.

> Unify the implementation of small scan and regular scan for sync client
> ---
>
> Key: HBASE-17508
> URL: https://issues.apache.org/jira/browse/HBASE-17508
> Project: HBase
>  Issue Type: Sub-task
>  Components: Client, scan
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17508-branch-1.patch, 
> HBASE-17508-branch-1-v1.patch, HBASE-17508.patch, HBASE-17508-v1.patch, 
> HBASE-17508-v2.patch, HBASE-17508-v3.patch, HBASE-17508-v4.patch, 
> HBASE-17508-v5.patch, HBASE-17508-v6.patch, HBASE-17508-v7.patch
>
>
> Implement the same logic as HBASE-17045 for the sync client.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17508) Unify the implementation of small scan and regular scan for sync client

2017-02-04 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-17508:
--
  Resolution: Fixed
Hadoop Flags: Reviewed
Release Note: Now the scan.setSmall method is deprecated. Consider using 
scan.setLimit and scan.setReadType in the future.
  Status: Resolved  (was: Patch Available)

Pushed to master and branch-1.

Thanks [~stack] for reviewing.
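
A migration sketch for the release note above, assuming the {{Scan.setLimit}} and 
{{Scan.setReadType(ReadType.PREAD)}} replacements available on these branches:

{code:java}
import org.apache.hadoop.hbase.client.Scan;

public class SmallScanMigration {
  static Scan oldStyle() {
    Scan scan = new Scan();
    scan.setSmall(true);                     // deprecated by this change
    return scan;
  }

  static Scan newStyle(int rows) {
    Scan scan = new Scan();
    scan.setLimit(rows);                     // stop after the given number of rows
    scan.setReadType(Scan.ReadType.PREAD);   // use pread, like a small scan used to do
    return scan;
  }
}
{code}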

> Unify the implementation of small scan and regular scan for sync client
> ---
>
> Key: HBASE-17508
> URL: https://issues.apache.org/jira/browse/HBASE-17508
> Project: HBase
>  Issue Type: Sub-task
>  Components: Client, scan
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17508-branch-1.patch, 
> HBASE-17508-branch-1-v1.patch, HBASE-17508.patch, HBASE-17508-v1.patch, 
> HBASE-17508-v2.patch, HBASE-17508-v3.patch, HBASE-17508-v4.patch, 
> HBASE-17508-v5.patch, HBASE-17508-v6.patch, HBASE-17508-v7.patch
>
>
> Implement the same logic as HBASE-17045 for the sync client.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17597) TestMetaWithReplicas.testMetaTableReplicaAssignment is flaky

2017-02-04 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-17597:
--
Attachment: HBASE-17597-branch-1.patch

Rewrite the test with HTU.waitFor.
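
A minimal sketch of the waitFor-based approach, assuming the usual 
{{HBaseTestingUtility.waitFor(timeout, predicate)}} helper and a hypothetical way 
of reading the replica location from zk:

{code:java}
import java.util.function.Supplier;
import org.apache.hadoop.hbase.HBaseTestingUtility;
import org.apache.hadoop.hbase.HRegionLocation;

public class MetaReplicaWaitSketch {
  // Poll instead of asserting right away: the replica location read from zk can
  // legitimately be null until the region is assigned.
  static HRegionLocation waitForMetaReplica(HBaseTestingUtility util,
      Supplier<HRegionLocation> locationFromZk) throws Exception {
    util.waitFor(60000, () -> {
      HRegionLocation loc = locationFromZk.get();
      return loc != null && loc.getServerName() != null;
    });
    return locationFromZk.get();
  }
}
{code}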

> TestMetaWithReplicas.testMetaTableReplicaAssignment is flaky
> 
>
> Key: HBASE-17597
> URL: https://issues.apache.org/jira/browse/HBASE-17597
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.4.0
>Reporter: Duo Zhang
> Fix For: 1.4.0
>
> Attachments: HBASE-17597-branch-1.patch
>
>
> The problem is we get an NPE when getting the ServerName from the HRegionLocation.
> I think this is a test issue, not something wrong with our code. The location 
> of the meta region is fetched from zk and it could be null if the region has not 
> been assigned yet. We should deal with a null HRegionLocation in the test code.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17597) TestMetaWithReplicas.testMetaTableReplicaAssignment is flaky

2017-02-04 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-17597:
--
Assignee: Duo Zhang
  Status: Patch Available  (was: Open)

> TestMetaWithReplicas.testMetaTableReplicaAssignment is flaky
> 
>
> Key: HBASE-17597
> URL: https://issues.apache.org/jira/browse/HBASE-17597
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.4.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 1.4.0
>
> Attachments: HBASE-17597-branch-1.patch
>
>
> The problem is we get an NPE when getting the ServerName from the HRegionLocation.
> I think this is a test issue, not something wrong with our code. The location 
> of the meta region is fetched from zk and it could be null if the region has not 
> been assigned yet. We should deal with a null HRegionLocation in the test code.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17508) Unify the implementation of small scan and regular scan for sync client

2017-02-04 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15852996#comment-15852996
 ] 

Duo Zhang commented on HBASE-17508:
---

Let me check the failed UT.

> Unify the implementation of small scan and regular scan for sync client
> ---
>
> Key: HBASE-17508
> URL: https://issues.apache.org/jira/browse/HBASE-17508
> Project: HBase
>  Issue Type: Sub-task
>  Components: Client, scan
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17508-branch-1.patch, 
> HBASE-17508-branch-1-v1.patch, HBASE-17508.patch, HBASE-17508-v1.patch, 
> HBASE-17508-v2.patch, HBASE-17508-v3.patch, HBASE-17508-v4.patch, 
> HBASE-17508-v5.patch, HBASE-17508-v6.patch, HBASE-17508-v7.patch
>
>
> Implement the same logic as HBASE-17045 for the sync client.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17595) Add partial result for small/limited scan

2017-02-07 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-17595:
--
Priority: Major  (was: Blocker)

While implementing other scan features I found that it is not easy to make the
new client compatible with an old server, and this compatibility is not required
by our compatibility matrix. So I am changing the priority from blocker to major.

> Add partial result for small/limited scan
> -
>
> Key: HBASE-17595
> URL: https://issues.apache.org/jira/browse/HBASE-17595
> Project: HBase
>  Issue Type: Sub-task
>  Components: asyncclient, Client, scan
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Duo Zhang
> Fix For: 2.0.0, 1.4.0
>
>
> The partial result support was marked as a 'TODO' when implementing
> HBASE-17045. And when implementing HBASE-17508, we found that if we make
> small scan share the same logic with the general scan, the scan requests other
> than the open-scanner request will not carry the small flag, so the server may
> return partial results to the client and cause some strange behavior. It was
> solved by modifying the logic at the server side, but this means the 1.4.x
> client is not safe when contacting an earlier 1.x server. So we'd better
> address the problem at the client side. Marked as blocker as this issue should
> be finished before any 2.x and 1.4.x releases.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (HBASE-17505) Do not issue close scanner request if RS tells us there is no more results for this region

2017-02-07 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang resolved HBASE-17505.
---
Resolution: Duplicate

The logic has been implemented by HBASE-17508.

> Do not issue close scanner request if RS tells us there is no more results 
> for this region
> --
>
> Key: HBASE-17505
> URL: https://issues.apache.org/jira/browse/HBASE-17505
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, scan
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Duo Zhang
> Fix For: 2.0.0, 1.4.0
>
>
> The follow-on issue of HBASE-17489.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17505) Do not issue close scanner request if RS tells us there is no more results for this region

2017-02-07 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-17505:
--
Affects Version/s: (was: 1.3.0)
Fix Version/s: (was: 1.3.1)

> Do not issue close scanner request if RS tells us there is no more results 
> for this region
> --
>
> Key: HBASE-17505
> URL: https://issues.apache.org/jira/browse/HBASE-17505
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, scan
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Duo Zhang
> Fix For: 2.0.0, 1.4.0
>
>
> The follow-on issue of HBASE-17489.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17571) Add batch coprocessor service support

2017-02-07 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856119#comment-15856119
 ] 

Duo Zhang commented on HBASE-17571:
---

After reviewing the API, I do not think we need to provide a separate
'batchCoprocessorService' method. This is just an implementation detail. Just
as we group requests to the same RS together when implementing multi, we do
not need to provide a 'groupedMulti' method.
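
A rough, purely illustrative sketch of that grouping idea (Request, locate and
sendBatch are hypothetical names, not existing APIs):

{code:java}
// Group the requests by the region server hosting each row, the same way
// multi groups actions per server, then send one batch per server.
Map<ServerName, List<Request>> byServer = new HashMap<>();
for (Request req : requests) {
  ServerName server = locate(req.getRow()); // hypothetical region lookup
  byServer.computeIfAbsent(server, k -> new ArrayList<>()).add(req);
}
byServer.forEach((server, batch) -> sendBatch(server, batch)); // hypothetical send
{code}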

What do you think sir? [~stack]

Thanks.

> Add batch coprocessor service support
> -
>
> Key: HBASE-17571
> URL: https://issues.apache.org/jira/browse/HBASE-17571
> Project: HBase
>  Issue Type: Sub-task
>  Components: Client
>Reporter: Duo Zhang
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17599) Use mayHaveMoreCellsInRow instead of isPartial

2017-02-07 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-17599:
--
Attachment: HBASE-17599-v2.patch

Moved the implementation hint to the comment of the field. Renamed
partialResultFormed to mayHaveMoreCellsInRow in ScannerContext.
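
A hedged sketch of what the renamed flag enables on the client side; the
accessor name follows the naming in this patch, so treat it as an assumption
rather than a settled API:

{code:java}
// Accumulate cells until the server indicates there may be no more cells in
// the current row; the row is then known to be complete without peeking at
// the next row's key.
List<Cell> rowCells = new ArrayList<>();
for (Result result : results) {
  rowCells.addAll(result.listCells());
  if (!result.mayHaveMoreCellsInRow()) {
    handleCompleteRow(rowCells); // hypothetical callback
    rowCells = new ArrayList<>();
  }
}
{code}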

> Use mayHaveMoreCellsInRow instead of isPartial
> --
>
> Key: HBASE-17599
> URL: https://issues.apache.org/jira/browse/HBASE-17599
> Project: HBase
>  Issue Type: Sub-task
>  Components: Client, scan
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17599.patch, HBASE-17599-v1.patch, 
> HBASE-17599-v2.patch
>
>
> For now, if we set scan.allowPartial(true), the partial result returned will
> have the partial flag set to true. But for scan.setBatch(xx), the partial
> result returned will not be marked as partial.
> This is an incompatible change, indeed. But I do not think it will introduce
> any issues as we just provide more information to the client. The old partial
> flag for batched scans is always false, so I do not think anyone can make use
> of it.
> This is very important for the limited scan to support partial results from
> the server. If we get a Result whose partial flag is false then we know we got
> the whole row. Otherwise we need to fetch one more row to see if the row key
> has changed, which makes the logic more complicated.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17603) Rest api for scan should return 404 when table not exists

2017-02-08 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15858909#comment-15858909
 ] 

Duo Zhang commented on HBASE-17603:
---

What if the table is deleted after the existence check but before we actually
send a next request?

> Rest api for scan should return 404 when table not exists
> -
>
> Key: HBASE-17603
> URL: https://issues.apache.org/jira/browse/HBASE-17603
> Project: HBase
>  Issue Type: Bug
>  Components: REST, scan
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Ted Yu
>Priority: Blocker
> Fix For: 2.0.0, 1.4.0
>
> Attachments: 17603.v1.txt
>
>
> This was the first Jenkins build where
> TestScannerResource#testTableDoesNotExist started failing.
> https://builds.apache.org/job/HBase-1.4/612/jdk=JDK_1_8,label=Hadoop/testReport/junit/org.apache.hadoop.hbase.rest/TestScannerResource/testTableDoesNotExist/
> The test failure can be reproduced locally and seemed to start after
> HBASE-17508 went in.
> The problem was introduced by HBASE-17508: after that change we no longer
> contact the RS in getScanner, so for REST, creating a scanner no longer
> returns 404 either. We should still get a 404 when fetching data from the
> scanner, but now it returns 204.
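
A hedged illustration of the interaction described above, in the style of the
REST test client; the table name and scanner body are placeholders:

{code:java}
// Creating a scanner on a missing table may now succeed (no RS contact),
// so the 404 has to come from the first fetch against the scanner.
Response put = client.put("/NON_EXISTENT_TABLE/scanner",
    Constants.MIMETYPE_XML, scannerBody);       // no longer guaranteed to be 404
String scannerUri = put.getLocation();
Response next = client.get(scannerUri, Constants.MIMETYPE_XML);
assertEquals(404, next.getCode());              // expected, but currently 204
{code}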



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17599) Use mayHaveMoreCellsInRow instead of isPartial

2017-02-08 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-17599:
--
Attachment: HBASE-17599-v3.patch

Fix comment.

> Use mayHaveMoreCellsInRow instead of isPartial
> --
>
> Key: HBASE-17599
> URL: https://issues.apache.org/jira/browse/HBASE-17599
> Project: HBase
>  Issue Type: Sub-task
>  Components: Client, scan
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17599.patch, HBASE-17599-v1.patch, 
> HBASE-17599-v2.patch, HBASE-17599-v3.patch
>
>
> For now, if we set scan.allowPartial(true), the partial result returned will
> have the partial flag set to true. But for scan.setBatch(xx), the partial
> result returned will not be marked as partial.
> This is an incompatible change, indeed. But I do not think it will introduce
> any issues as we just provide more information to the client. The old partial
> flag for batched scans is always false, so I do not think anyone can make use
> of it.
> This is very important for the limited scan to support partial results from
> the server. If we get a Result whose partial flag is false then we know we got
> the whole row. Otherwise we need to fetch one more row to see if the row key
> has changed, which makes the logic more complicated.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17599) Use mayHaveMoreCellsInRow instead of isPartial

2017-02-08 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-17599:
--
Attachment: HBASE-17599-branch-1.patch

Patch for branch-1.

> Use mayHaveMoreCellsInRow instead of isPartial
> --
>
> Key: HBASE-17599
> URL: https://issues.apache.org/jira/browse/HBASE-17599
> Project: HBase
>  Issue Type: Sub-task
>  Components: Client, scan
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17599-branch-1.patch, HBASE-17599.patch, 
> HBASE-17599-v1.patch, HBASE-17599-v2.patch, HBASE-17599-v3.patch
>
>
> For now, if we set scan.allowPartial(true), the partial result returned will
> have the partial flag set to true. But for scan.setBatch(xx), the partial
> result returned will not be marked as partial.
> This is an incompatible change, indeed. But I do not think it will introduce
> any issues as we just provide more information to the client. The old partial
> flag for batched scans is always false, so I do not think anyone can make use
> of it.
> This is very important for the limited scan to support partial results from
> the server. If we get a Result whose partial flag is false then we know we got
> the whole row. Otherwise we need to fetch one more row to see if the row key
> has changed, which makes the logic more complicated.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17599) Use mayHaveMoreCellsInRow instead of isPartial

2017-02-08 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15858908#comment-15858908
 ] 

Duo Zhang commented on HBASE-17599:
---

{quote}
We should just remove it in the patch for master branch?
{quote}

Oops, I forgot to remove it. As [~stack] said, we'd better keep it for a little
longer, maybe removing it in 3.0?

Let me remove it and prepare a new patch.

> Use mayHaveMoreCellsInRow instead of isPartial
> --
>
> Key: HBASE-17599
> URL: https://issues.apache.org/jira/browse/HBASE-17599
> Project: HBase
>  Issue Type: Sub-task
>  Components: Client, scan
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17599.patch, HBASE-17599-v1.patch, 
> HBASE-17599-v2.patch
>
>
> For now, if we set scan.allowPartial(true), the partial result returned will
> have the partial flag set to true. But for scan.setBatch(xx), the partial
> result returned will not be marked as partial.
> This is an incompatible change, indeed. But I do not think it will introduce
> any issues as we just provide more information to the client. The old partial
> flag for batched scans is always false, so I do not think anyone can make use
> of it.
> This is very important for the limited scan to support partial results from
> the server. If we get a Result whose partial flag is false then we know we got
> the whole row. Otherwise we need to fetch one more row to see if the row key
> has changed, which makes the logic more complicated.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17603) Rest api for scan should return 404 when table not exists

2017-02-08 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15858943#comment-15858943
 ] 

Duo Zhang commented on HBASE-17603:
---

I mean the HTTP status code for the REST client.

> Rest api for scan should return 404 when table not exists
> -
>
> Key: HBASE-17603
> URL: https://issues.apache.org/jira/browse/HBASE-17603
> Project: HBase
>  Issue Type: Bug
>  Components: REST, scan
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Ted Yu
>Priority: Blocker
> Fix For: 2.0.0, 1.4.0
>
> Attachments: 17603.v1.txt
>
>
> This was the first Jenkins build where
> TestScannerResource#testTableDoesNotExist started failing.
> https://builds.apache.org/job/HBase-1.4/612/jdk=JDK_1_8,label=Hadoop/testReport/junit/org.apache.hadoop.hbase.rest/TestScannerResource/testTableDoesNotExist/
> The test failure can be reproduced locally and seemed to start after
> HBASE-17508 went in.
> The problem was introduced by HBASE-17508: after that change we no longer
> contact the RS in getScanner, so for REST, creating a scanner no longer
> returns 404 either. We should still get a 404 when fetching data from the
> scanner, but now it returns 204.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17603) Rest api for scan should return 404 when table not exists

2017-02-08 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15858973#comment-15858973
 ] 

Duo Zhang commented on HBASE-17603:
---

Yes, it is possible. But if you bypass the meta cache, the meta table will be
hammered: every scan request will lead to a request to the meta table.

> Rest api for scan should return 404 when table not exists
> -
>
> Key: HBASE-17603
> URL: https://issues.apache.org/jira/browse/HBASE-17603
> Project: HBase
>  Issue Type: Bug
>  Components: REST, scan
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Ted Yu
>Priority: Blocker
> Fix For: 2.0.0, 1.4.0
>
> Attachments: 17603.v1.txt
>
>
> This was the first Jenkins build where
> TestScannerResource#testTableDoesNotExist started failing.
> https://builds.apache.org/job/HBase-1.4/612/jdk=JDK_1_8,label=Hadoop/testReport/junit/org.apache.hadoop.hbase.rest/TestScannerResource/testTableDoesNotExist/
> The test failure can be reproduced locally and seemed to start after
> HBASE-17508 went in.
> The problem was introduced by HBASE-17508: after that change we no longer
> contact the RS in getScanner, so for REST, creating a scanner no longer
> returns 404 either. We should still get a 404 when fetching data from the
> scanner, but now it returns 204.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17608) Add suspend support for RawScanResultConsumer

2017-02-08 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-17608:
--
Attachment: HBASE-17608-v1.patch

Missed the two new files.

> Add suspend support for RawScanResultConsumer
> -
>
> Key: HBASE-17608
> URL: https://issues.apache.org/jira/browse/HBASE-17608
> Project: HBase
>  Issue Type: Sub-task
>  Components: asyncclient, Client, scan
>Affects Versions: 2.0.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0
>
> Attachments: HBASE-17608.patch, HBASE-17608-v1.patch
>
>
> Now for the AsyncResultScanner, we can only close the scanner when we reach
> the cache size limit and open a new scanner later. This breaks the region-level
> consistency. We should just stop fetching data and leave the scanner open on
> the RS.
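
A hedged sketch of how a consumer could use the suspend support; the
ScanController and ScanResumer names follow this patch, so treat the exact
signatures as assumptions:

{code:java}
// When the local cache is full, suspend instead of closing the scanner so the
// server-side scanner (and region-level consistency) is preserved.
class BufferingConsumer implements RawScanResultConsumer {
  private ScanResumer resumer; // non-null while we are suspended

  @Override
  public void onNext(Result[] results, ScanController controller) {
    cache.addAll(Arrays.asList(results)); // hypothetical bounded cache
    if (cache.size() >= maxCacheSize) {
      resumer = controller.suspend(); // stop fetching, keep the scanner open
    }
  }

  // Called by the application once the cache has been drained.
  void resumeIfSuspended() {
    if (resumer != null) {
      resumer.resume(); // ask the client to start fetching again
      resumer = null;
    }
  }

  @Override
  public void onError(Throwable error) { /* propagate to the caller */ }

  @Override
  public void onComplete() { /* signal end of scan */ }
}
{code}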



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

