[
https://issues.apache.org/jira/browse/IGNITE-10959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16955345#comment-16955345
]
Zane Hu edited comment on IGNITE-10959 at 10/20/19 1:06 AM:
------------------------------------------------------------
An error case, TransactionalPartitionedTwoBackupFullSync, fails with the following log,
which we got from a slightly modified CacheContinuousQueryMemoryUsageTest.java:
[ERROR]
CacheContinuousQueryMemoryUsageTest>GridAbstractTest.access$000:143->GridAbstractTest.runTestInternal:2177->testTransactionalPartitionedTwoBackupFullSync:235->testContinuousQuery:355->assertEntriesReleased:423->assertEntriesReleased:435->checkEntryBuffers:466
Backup queue is not empty. Node:
continuous.CacheContinuousQueryMemoryUsageTest0; cache: test-cache.
expected:<0> but was:<1>
But we don't see this error for TransactionalReplicatedTwoBackupFullSync or
TransactionalPartitionedOneBackupFullSync.
Looking at the Ignite code, we found the following snippet of onEntryUpdated() in
CacheContinuousQueryHandler.java:
{code:java}
if (primary || skipPrimaryCheck) // TransactionalReplicatedTwoBackupFullSync goes here.
    onEntryUpdate(evt, notify, loc, recordIgniteEvt); // Notify the query client without putting evt.entry() into backupQ.
else // A backup node of TransactionalPartitionedTwoBackupFullSync goes here.
    handleBackupEntry(cctx, evt.entry()); // This will put evt.entry() into backupQ.
{code}
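To make the routing concrete, here is a minimal, self-contained sketch of the branch above (not the real Ignite API; the class, field, and parameter names are simplified stand-ins). Per the comments above, skipPrimaryCheck is effectively true in the replicated case, so every node notifies listeners directly, while a backup node of a partitioned cache parks the entry in backupQ:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Simplified stand-in for the routing in onEntryUpdated(); not the real Ignite API.
public class EntryRoutingDemo {
    final Queue<String> backupQ = new ArrayDeque<>(); // entries awaiting an ack
    int notified; // events delivered straight to the query listener

    // primary: this node owns the primary copy of the key.
    // skipPrimaryCheck: effectively true for the REPLICATED case described above.
    void onEntryUpdated(String entry, boolean primary, boolean skipPrimaryCheck) {
        if (primary || skipPrimaryCheck)
            notified++;         // notify the listener; nothing is queued
        else
            backupQ.add(entry); // backup node: park the entry until it is acked
    }

    public static void main(String[] args) {
        EntryRoutingDemo replicated = new EntryRoutingDemo();
        replicated.onEntryUpdated("k1", false, true); // replicated: direct notify
        System.out.println(replicated.backupQ.size()); // prints 0

        EntryRoutingDemo partitionedBackup = new EntryRoutingDemo();
        partitionedBackup.onEntryUpdated("k1", false, false); // partitioned backup
        System.out.println(partitionedBackup.backupQ.size()); // prints 1
    }
}
```

This is just to show why only the partitioned-with-backups configurations ever accumulate entries in backupQ.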
After notifying the query client, an ack message (CacheContinuousQueryBatchAck) appears
to be sent to the CQ server side on the backup nodes to clean up the entries in backupQ.
There is also a periodic BackupCleaner task, running every 5 seconds, that cleans up
backupQ. The actual cleanup code is below:
{code:java}
/**
 * @param updateCntr Acknowledged counter.
 */
void cleanupBackupQueue(Long updateCntr) {
    Iterator<CacheContinuousQueryEntry> it = backupQ.iterator();

    while (it.hasNext()) {
        CacheContinuousQueryEntry backupEntry = it.next();

        // Remove backupEntry if its updateCounter <= ack updateCntr.
        if (backupEntry.updateCounter() <= updateCntr)
            it.remove();
    }
}
{code}
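The cleanup semantics can be reproduced in a standalone sketch (not Ignite code; the queue and entry types are simplified stand-ins): any entry whose updateCounter is greater than the acknowledged counter survives cleanup, so if the ack covering an entry's counter never arrives at a backup node, that entry stays queued indefinitely — exactly the "Backup queue is not empty" failure:

```java
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.Queue;

// Simplified stand-in for CacheContinuousQueryEntry: only the update counter matters here.
class BackupEntry {
    final long updateCntr;

    BackupEntry(long updateCntr) {
        this.updateCntr = updateCntr;
    }
}

public class BackupQueueDemo {
    final Queue<BackupEntry> backupQ = new ArrayDeque<>();

    // Mirrors cleanupBackupQueue(): drop every entry covered by the acknowledged counter.
    void cleanupBackupQueue(long ackCntr) {
        Iterator<BackupEntry> it = backupQ.iterator();

        while (it.hasNext()) {
            if (it.next().updateCntr <= ackCntr)
                it.remove();
        }
    }

    public static void main(String[] args) {
        BackupQueueDemo demo = new BackupQueueDemo();
        demo.backupQ.add(new BackupEntry(1));
        demo.backupQ.add(new BackupEntry(2));
        demo.backupQ.add(new BackupEntry(3));

        // The ack covers counters up to 2: entry 3 must remain queued.
        demo.cleanupBackupQueue(2);
        System.out.println(demo.backupQ.size()); // prints 1

        // If no later ack ever covers counter 3 on this node, the entry
        // is never removed and the test's size check fails with
        // expected:<0> but was:<1>.
    }
}
```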
So the questions are:
# Why is a backupEntry still left over in backupQ after all of this?
# Is it possible that the updateCounter and the ack updateCntr are miscalculated?
# Is it possible that the ack message is sent to only one of the two backup nodes?
The load is 1000 updates across 3 nodes on a stable network, so a message shouldn't
simply be dropped along the way.
Please help look into this further, especially Ignite experts or developers.
Thanks.
> Memory leaks in continuous query handlers
> -----------------------------------------
>
> Key: IGNITE-10959
> URL: https://issues.apache.org/jira/browse/IGNITE-10959
> Project: Ignite
> Issue Type: Bug
> Affects Versions: 2.7
> Reporter: Denis Mekhanikov
> Priority: Major
> Fix For: 2.9
>
> Attachments: CacheContinuousQueryMemoryUsageTest.java,
> continuousquery_leak_profile.png
>
>
> Continuous query handlers don't clear internal data structures after cache
> events are processed.
> A test, that reproduces the problem, is attached.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)