[jira] [Commented] (GEODE-3055) waitUntilFlush did not check the brq's tempQueue, which caused data mismatch

2017-07-31 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/GEODE-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16108077#comment-16108077
 ] 

ASF subversion and git services commented on GEODE-3055:


Commit 2a2b107593e335ca41d99e6704214470793b0515 in geode's branch 
refs/heads/feature/GEM-1483 from zhouxh
[ https://git-wip-us.apache.org/repos/asf?p=geode.git;h=2a2b107 ]

GEODE-3055: The old primary's shadow bucket is not
initialized when rebalance removes it. Thus the new primary candidate can
never initialize from it. The fix is to wait until a new primary exists before
removing the old primary's bucket in rebalance.

When a colocated child bucket fails to initialize, remove the leader
bucket, since all these buckets should be created atomically.
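
The first part of the fix described above (do not remove the old primary's
bucket until another member has become primary) could be sketched roughly as
follows. This is only an illustration: PrimaryView and RebalanceRemoveSketch
are hypothetical names, members are modelled as plain strings, and none of
this is Geode's actual code.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a bucket's view of its primary; not Geode's BucketAdvisor.
final class PrimaryView {
    private final String localMember;
    private volatile String primary; // null means no primary is known yet

    PrimaryView(String localMember, String primary) {
        this.localMember = localMember;
        this.primary = primary;
    }

    void setPrimary(String member) {
        this.primary = member;
    }

    // True only when some other member has taken over as primary.
    boolean anotherPrimaryExists() {
        String p = primary;
        return p != null && !p.equals(localMember);
    }
}

final class RebalanceRemoveSketch {
    // Polls until another member is primary or the timeout elapses.
    // Returns true if the local bucket may now be removed.
    static boolean waitForNewPrimary(PrimaryView view, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (!view.anotherPrimaryExists()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // still no new primary: keep the bucket
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

A caller would remove the bucket only when waitForNewPrimary returns true, so
a new primary candidate always has an initialized copy to initialize from.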


> waitUntilFlush did not check the brq's tempQueue, which caused data mismatch
>
> Key: GEODE-3055
> URL: https://issues.apache.org/jira/browse/GEODE-3055
> Project: Geode
> Issue Type: Bug
> Reporter: xiaojian zhou
> Assignee: xiaojian zhou
> Labels: lucene
>
> /export/buglogs_bvt/xzhou/lucene/concParRegHAPersist-0601-171739
> lucene/concParRegHAPersist.conf
> A=accessor
> B=dataStore
> accessorHosts=1
> accessorThreadsPerVM=5
> accessorVMsPerHost=1
> dataStoreHosts=6
> dataStoreThreadsPerVM=5
> dataStoreVMsPerHost=1
> numVMsToStop=2
> redundantCopies=0
> no local.conf
> In dataStoregemfire5_7483/system.log, thread tid=0xdf, putAll Object_11066
> 17:22:27.135 tid=0xdf] generated tag {v1; rv13 shadowKey=2939
> 17:22:27.136 _partitionedRegionPARALLELGATEWAYSENDER_QUEUE_1 bucket : null // brq is not ready yet
> is enqueued to the tempQueue
> 17:22:27.272 tid=0xdf] generated tag {v3; rv15 shadowKey=3278
> 17:22:33.111 Subregion created: /_PR/_BAsyncEventQueueindex#partitionedRegionPARALLELGATEWAYSENDER_QUEUE_1
> vm_3_dataStore3_r02-s28_28143.log:
> 17:22:33.120 Put successfully in the queue shadowKey= 2939
> 17:22:33.156 tid=0x7fe started query
> 17:22:33.176 Peeked shadowKey= 2939
> So the root cause is: the query ran while the event was still in the
> tempQueue, before the event had been processed. waitUntilFlush should also
> wait until the tempQueue is flushed.
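
To make the race described in the issue concrete, here is a minimal sketch of
a flush wait that also looks at the temporary queue. FlushWaitSketch and its
two queues are hypothetical and only model the idea; they are not the real
bucket region queue or the real waitUntilFlush implementation.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical model: events sit in tempQueue until the bucket queue is ready.
final class FlushWaitSketch {
    final Queue<String> tempQueue = new ConcurrentLinkedQueue<>();
    final Queue<String> bucketQueue = new ConcurrentLinkedQueue<>();

    // Flushed only when BOTH queues are empty; checking bucketQueue alone
    // would miss events that are still parked in tempQueue.
    boolean isFlushed() {
        return tempQueue.isEmpty() && bucketQueue.isEmpty();
    }

    // Polls until both queues are empty or the timeout elapses.
    boolean waitUntilFlushed(long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (!isFlushed()) {
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

With a check like this, a query issued after the wait returns would not run
while an event such as shadowKey=2939 is still waiting in the temporary queue.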



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (GEODE-3055) waitUntilFlush did not check the brq's tempQueue, which caused data mismatch

2017-07-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/GEODE-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105950#comment-16105950
 ] 

ASF subversion and git services commented on GEODE-3055:


Commit 731954e04a206004311726216676062079c6186a in geode's branch 
refs/heads/feature/GEM-1483 from zhouxh
[ https://git-wip-us.apache.org/repos/asf?p=geode.git;h=731954e ]

GEODE-3055: The old primary's shadow bucket is not
initialized when rebalance removes it. Thus the new primary candidate can
never initialize from it. The fix is to wait until a new primary exists before
removing the old primary's bucket in rebalance.

When a colocated child bucket fails to initialize, remove the leader
bucket, since all these buckets should be created atomically.




[jira] [Commented] (GEODE-3055) waitUntilFlush did not check the brq's tempQueue, which caused data mismatch

2017-07-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/GEODE-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105940#comment-16105940
 ] 

ASF subversion and git services commented on GEODE-3055:


Commit 1504e0f28e0f36a87eb8bd81a2026c1c7163a18d in geode's branch 
refs/heads/feature/GEM-1483 from zhouxh
[ https://git-wip-us.apache.org/repos/asf?p=geode.git;h=1504e0f ]

GEODE-3055: The old primary's shadow bucket is not
initialized when rebalance removes it. Thus the new primary candidate can
never initialize from it. The fix is to wait until a new primary exists before
removing the old primary's bucket in rebalance.

When a colocated child bucket fails to initialize, remove the leader
bucket, since all these buckets should be created atomically.




[jira] [Commented] (GEODE-3055) waitUntilFlush did not check the brq's tempQueue, which caused data mismatch

2017-07-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/GEODE-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105925#comment-16105925
 ] 

ASF subversion and git services commented on GEODE-3055:


Commit c2ac3b7c984972d1ce0da17da13b815985f32107 in geode's branch 
refs/heads/feature/GEM-1483 from zhouxh
[ https://git-wip-us.apache.org/repos/asf?p=geode.git;h=c2ac3b7 ]

GEODE-3055: The old primary's shadow bucket is not
initialized when rebalance removes it. Thus the new primary candidate can
never initialize from it. The fix is to wait until a new primary exists before
removing the old primary's bucket in rebalance.

When a colocated child bucket fails to initialize, remove the leader
bucket, since all these buckets should be created atomically.




[jira] [Commented] (GEODE-3055) waitUntilFlush did not check the brq's tempQueue, which caused data mismatch

2017-07-21 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/GEODE-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16096855#comment-16096855
 ] 

ASF subversion and git services commented on GEODE-3055:


Commit 24c004d8a4ccf212fd3ea89adab027fc52522326 in geode's branch 
refs/heads/feature/GEM-1483 from zhouxh
[ https://git-wip-us.apache.org/repos/asf?p=geode.git;h=24c004d ]

GEODE-3055: The old primary's shadow bucket is not
initialized when rebalance removes it. Thus the new primary candidate can
never initialize from it. The fix is to wait until a new primary exists before
removing the old primary's bucket in rebalance.

There is another issue: CreateMissingBucketsTask did not wait until the
region's buckets finished recovery, so its check usually did nothing.
If a shadow region bucket failed to initialize due to a race condition, there
was no way to create the missing bucket of the shadow region.
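
The second issue above (a missing-bucket check that runs before recovery has
finished and therefore does nothing) can be illustrated with a latch that
gates the check. MissingBucketCheckSketch is a hypothetical class, and the
boolean array is just a stand-in for "which buckets exist"; the real
CreateMissingBucketsTask is not shown in this thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: gate the missing-bucket check on recovery completion.
final class MissingBucketCheckSketch {
    private final CountDownLatch recoveryDone = new CountDownLatch(1);

    // Called once the region's buckets have finished recovery.
    void markRecoveryFinished() {
        recoveryDone.countDown();
    }

    // Waits for recovery, then creates every bucket that is still missing.
    // Returns the number of buckets created, or -1 if recovery did not
    // finish within the timeout (in which case nothing is touched).
    int createMissingBuckets(boolean[] present, long timeoutMillis) {
        try {
            if (!recoveryDone.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return -1;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
        int created = 0;
        for (int i = 0; i < present.length; i++) {
            if (!present[i]) {
                present[i] = true; // stand-in for actually creating bucket i
                created++;
            }
        }
        return created;
    }
}
```

The point of the latch is ordering only: the check sees the state left behind
by recovery, including any shadow bucket that failed to initialize.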




[jira] [Commented] (GEODE-3055) waitUntilFlush did not check the brq's tempQueue, which caused data mismatch

2017-07-20 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/GEODE-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16095866#comment-16095866
 ] 

ASF subversion and git services commented on GEODE-3055:


Commit ab13ac474e9455d5b07665f0f8d05c7af8318634 in geode's branch 
refs/heads/feature/GEM-1483 from zhouxh
[ https://git-wip-us.apache.org/repos/asf?p=geode.git;h=ab13ac4 ]

GEODE-3055: The old primary's shadow bucket is not
initialized when rebalance removes it. Thus the new primary candidate can
never initialize from it. The fix is to wait until a new primary exists before
removing the old primary's bucket in rebalance.

There is another issue: CreateMissingBucketsTask did not wait until the
region's buckets finished recovery, so its check usually did nothing.
If a shadow region bucket failed to initialize due to a race condition, there
was no way to create the missing bucket of the shadow region.




[jira] [Commented] (GEODE-3055) waitUntilFlush did not check the brq's tempQueue, which caused data mismatch

2017-07-20 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/GEODE-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16095863#comment-16095863
 ] 

ASF subversion and git services commented on GEODE-3055:


Commit 40fb5fdf9e7053de7b1f1df0fc1adfd3fe78546b in geode's branch 
refs/heads/feature/GEM-1483 from zhouxh
[ https://git-wip-us.apache.org/repos/asf?p=geode.git;h=40fb5fd ]

GEODE-3055: The old primary's shadow bucket is not
initialized when rebalance removes it. Thus the new primary candidate can
never initialize from it. The fix is to wait until a new primary exists before
removing the old primary's bucket in rebalance.




[jira] [Commented] (GEODE-3055) waitUntilFlush did not check the brq's tempQueue, which caused data mismatch

2017-06-27 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/GEODE-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16065051#comment-16065051
 ] 

ASF subversion and git services commented on GEODE-3055:


Commit 1ceb97c7a9be2143f8cb5a4c4c38e8c8e2d216e3 in geode's branch 
refs/heads/feature/GEM-1483 from zhouxh
[ https://git-wip-us.apache.org/repos/asf?p=geode.git;h=1ceb97c ]

GEODE-3055: The old primary's shadow bucket is not
initialized when rebalance removes it. Thus the new primary candidate can
never initialize from it. The fix is to wait until a new primary exists before
removing the old primary's bucket in rebalance.




[jira] [Commented] (GEODE-3055) waitUntilFlush did not check the brq's tempQueue, which caused data mismatch

2017-06-26 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/GEODE-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16064271#comment-16064271
 ] 

ASF subversion and git services commented on GEODE-3055:


Commit 1ef7d2b6a80880a1bd5ac58c5c518a842825ae13 in geode's branch 
refs/heads/feature/GEM-1483 from zhouxh
[ https://git-wip-us.apache.org/repos/asf?p=geode.git;h=1ef7d2b ]

GEODE-3055: The old primary's shadow bucket is not
initialized when rebalance removes it. Thus the new primary candidate can
never initialize from it. The fix is to wait until a new primary exists before
removing the old primary's bucket in rebalance.




[jira] [Commented] (GEODE-3055) waitUntilFlush did not check the brq's tempQueue, which caused data mismatch

2017-06-22 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/GEODE-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16059603#comment-16059603
 ] 

ASF subversion and git services commented on GEODE-3055:


Commit 8b629d35649091433b7000f7eb60f591dd342b4e in geode's branch 
refs/heads/feature/GEM-1483 from zhouxh
[ https://git-wip-us.apache.org/repos/asf?p=geode.git;h=8b629d3 ]

GEODE-3055: The old primary's shadow bucket is not
initialized when rebalance removes it. Thus the new primary candidate can
never initialize from it. The fix is to wait until a new primary exists before
removing the old primary's bucket in rebalance.




[jira] [Commented] (GEODE-3055) waitUntilFlush did not check the brq's tempQueue, which caused data mismatch

2017-06-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/GEODE-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16049776#comment-16049776
 ] 

ASF GitHub Bot commented on GEODE-3055:
---

Github user upthewaterspout commented on a diff in the pull request:

https://github.com/apache/geode/pull/570#discussion_r122085382
  
--- Diff: geode-core/src/main/java/org/apache/geode/internal/cache/PartitionedRegionDataStore.java ---
@@ -1472,6 +1472,19 @@ public boolean removeBucket(int bucketId, boolean forceRemovePrimary) {
   }
 
   BucketAdvisor bucketAdvisor = bucketRegion.getBucketAdvisor();
+  InternalDistributedMember primary = bucketAdvisor.getPrimary();
+  InternalDistributedMember myId =
+      this.partitionedRegion.getDistributionManager().getDistributionManagerId();
+  if (primary == null || myId.equals(primary)) {
--- End diff --

This seems similar to the logic a few lines down where we say "if 
(!forceRemovePrimary && bucketAdvisor.isPrimary()) {..."

Unlike that line, your new logic doesn't honor the forceRemovePrimary flag. 
Should it? I don't actually see any cases where that is passed in as true, so 
maybe we should just remove that flag?
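
For illustration only, one possible reading of the question above is that the
new check should be skipped when the caller forces removal. The sketch below
shows that variant as a pure function; RemoveGuardSketch and mayRemoveBucket
are hypothetical names, members are modelled as strings, and this is not the
code in the pull request.

```java
// Hypothetical sketch of a removal guard that honors a force flag.
final class RemoveGuardSketch {
    // Returns true if the local bucket may be removed.
    // primary: the current primary member, or null if none is known.
    // myId: the local member.
    static boolean mayRemoveBucket(String primary, String myId, boolean forceRemovePrimary) {
        if (forceRemovePrimary) {
            return true; // caller explicitly asked to remove even a primary
        }
        // Refuse while no primary is known or while this member is the primary.
        return primary != null && !primary.equals(myId);
    }
}
```

If no caller ever passes true for the flag, as the comment suggests, the first
branch is dead code and dropping the parameter would give the same behavior.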




[jira] [Commented] (GEODE-3055) waitUntilFlush did not check the brq's tempQueue, which caused data mismatch

2017-06-14 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/GEODE-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16049733#comment-16049733
 ] 

ASF subversion and git services commented on GEODE-3055:


Commit 32c74ecec4cd88ea42221d1f7234a07cb7134186 in geode's branch 
refs/heads/feature/GEM-1483 from zhouxh
[ https://git-wip-us.apache.org/repos/asf?p=geode.git;h=32c74ec ]

GEODE-3055: waitUntilFlush should use the data region's bucket ID list. Some of
the buckets may not be initialized yet; in that case, wait for the tempQueue to
be empty.
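
A rough sketch of what driving the wait from the data region's bucket IDs
could look like is given below. It assumes a simplified model in which each
bucket ID maps to a pending-event count in the queue bucket (absent if the
queue bucket is not initialized yet) and to a pending-event count in a
temporary queue. BucketIdFlushSketch and both maps are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;

// Hypothetical sketch: decide which data-region buckets still have unflushed events.
final class BucketIdFlushSketch {
    // Iterates the DATA region's bucket IDs, not the queue's, so a bucket whose
    // queue bucket does not exist yet is still checked through its temp queue.
    static List<Integer> unflushedBuckets(SortedSet<Integer> dataBucketIds,
            Map<Integer, Integer> queueSizes, Map<Integer, Integer> tempQueueSizes) {
        List<Integer> pending = new ArrayList<>();
        for (int id : dataBucketIds) {
            int queued = queueSizes.getOrDefault(id, 0);   // 0 if queue bucket is absent
            int parked = tempQueueSizes.getOrDefault(id, 0); // events waiting for the queue bucket
            if (queued + parked > 0) {
                pending.add(id);
            }
        }
        return pending;
    }
}
```

A caller would keep polling until the returned list is empty, which covers
both initialized queue buckets and buckets that so far only have a temp queue.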




[jira] [Commented] (GEODE-3055) waitUntilFlush did not check the brq's tempQueue, which caused data mismatch

2017-06-14 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/GEODE-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16049734#comment-16049734
 ] 

ASF subversion and git services commented on GEODE-3055:


Commit 56a3fa75ad35943b1e447af00bdba37d3d225297 in geode's branch 
refs/heads/feature/GEM-1483 from zhouxh
[ https://git-wip-us.apache.org/repos/asf?p=geode.git;h=56a3fa7 ]

GEODE-3055: The real root cause is: the old primary's shadow bucket is not
initialized when rebalance removes it. Thus the new primary candidate can
never initialize from it. The fix is to wait until a new primary exists before
removing the old primary's bucket in rebalance.




[jira] [Commented] (GEODE-3055) waitUntilFlush did not check the brq's tempQueue, which caused data mismatch

2017-06-14 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/GEODE-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16049691#comment-16049691
 ] 

ASF subversion and git services commented on GEODE-3055:


Commit 84baa37185ac210bde492e67362ea101565ecdb3 in geode's branch 
refs/heads/feature/GEM-1483 from zhouxh
[ https://git-wip-us.apache.org/repos/asf?p=geode.git;h=84baa37 ]

GEODE-3055: The real root cause is: the old primary's shadow bucket is not
initialized when rebalance removes it. Thus the new primary candidate can
never initialize from it. The fix is to wait until a new primary exists before
removing the old primary's bucket in rebalance.







[jira] [Commented] (GEODE-3055) waitUntilFlush did not check the brq's tempQueue, which caused data mismatch

2017-06-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/GEODE-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043518#comment-16043518
 ] 

ASF GitHub Bot commented on GEODE-3055:
---

GitHub user gesterzhou opened a pull request:

https://github.com/apache/geode/pull/570

GEODE-3055: waitUntilFlush should use the data region's bucket id list. Some of
the buckets may not be initialized yet; in that case, wait for the tempQueue to
be empty.

@upthewaterspout @boglesby 


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/geode feature/GEM-1483

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/geode/pull/570.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #570


commit e07b513f3c564d9a4f6c69ca47fbddf70bb76563
Author: zhouxh 
Date:   2017-06-08T21:52:56Z

GEODE-3055: waitUntilFlush should use the data region's bucket id list. Some of 
the buckets may not be initialized yet; in that case, wait for the tempQueue to be empty.
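
A rough sketch of the check this commit message describes, assuming a simplified model in which queues are looked up by bucket id. BucketFlushCheck, isFlushed, and the map-based layout are hypothetical and chosen only for illustration; they are not the classes touched by the commit.

```java
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical model: queues are keyed by the data region's bucket ids.
class BucketFlushCheck {
  // Walks the DATA region's bucket ids, so a bucket whose queue region is not
  // initialized yet (no entry in bucketQueues) is still checked via its tempQueue.
  static boolean isFlushed(
      Set<Integer> dataBucketIds,
      Map<Integer, Queue<String>> bucketQueues,
      Map<Integer, Queue<String>> tempQueues) {
    for (int bucketId : dataBucketIds) {
      Queue<String> temp = tempQueues.get(bucketId);
      if (temp != null && !temp.isEmpty()) {
        return false; // events are still parked in the temporary queue
      }
      Queue<String> brq = bucketQueues.get(bucketId);
      if (brq != null && !brq.isEmpty()) {
        return false; // events are still waiting in the bucket region queue
      }
    }
    return true;
  }
}
```

Iterating over the queue's own buckets instead would skip any bucket that has no initialized queue yet, which is how an event held only in a tempQueue could go unnoticed.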




> waitUntilFlush did not check the brq's tempQueue, which caused data mismatch
> 
>
> Key: GEODE-3055
> URL: https://issues.apache.org/jira/browse/GEODE-3055
> Project: Geode
> Issue Type: Bug
> Reporter: xiaojian zhou
> Assignee: xiaojian zhou
> Labels: lucene
> Fix For: 1.2.0
>





[jira] [Commented] (GEODE-3055) waitUntilFlush did not check the brq's tempQueue, which caused data mismatch

2017-06-08 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/GEODE-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043513#comment-16043513
 ] 

ASF subversion and git services commented on GEODE-3055:


Commit e07b513f3c564d9a4f6c69ca47fbddf70bb76563 in geode's branch 
refs/heads/feature/GEM-1483 from zhouxh
[ https://git-wip-us.apache.org/repos/asf?p=geode.git;h=e07b513 ]

GEODE-3055: waitUntilFlush should use the data region's bucket id list. Some of the 
buckets may not be initialized yet; in that case, wait for the tempQueue to be empty.




