[ https://issues.apache.org/jira/browse/IGNITE-10959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16955345#comment-16955345 ]

Zane Hu edited comment on IGNITE-10959 at 10/20/19 1:08 AM:
------------------------------------------------------------

The TransactionalPartitionedTwoBackupFullSync case fails with the following log, 
which we got from a slightly modified CacheContinuousQueryMemoryUsageTest.java. 
We don't see this error in the TransactionalReplicatedTwoBackupFullSync or 
TransactionalPartitionedOneBackupFullSync cases. 

 

[ERROR] 
CacheContinuousQueryMemoryUsageTest>GridAbstractTest.access$000:143->GridAbstractTest.runTestInternal:2177->testTransactionalPartitionedTwoBackupFullSync:235->testContinuousQuery:355->assertEntriesReleased:423->assertEntriesReleased:435->checkEntryBuffers:466
 Backup queue is not empty. Node: 
continuous.CacheContinuousQueryMemoryUsageTest0; cache: test-cache. 
expected:<0> but was:<1>
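
For reference, the setup we believe that case name implies is roughly the following. This is only a minimal sketch based on the case name (TRANSACTIONAL / PARTITIONED / two backups / FULL_SYNC) and the cache name from the log; the attached test may differ in the details:

{code:java}
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheAtomicityMode;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.cache.CacheWriteSynchronizationMode;
import org.apache.ignite.cache.query.ContinuousQuery;
import org.apache.ignite.cache.query.QueryCursor;
import org.apache.ignite.configuration.CacheConfiguration;

public class TwoBackupFullSyncSketch {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start(); // one of the started test nodes

        // Assumed cache configuration for TransactionalPartitionedTwoBackupFullSync.
        CacheConfiguration<Integer, String> ccfg = new CacheConfiguration<>("test-cache");

        ccfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
        ccfg.setCacheMode(CacheMode.PARTITIONED);
        ccfg.setBackups(2);
        ccfg.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);

        IgniteCache<Integer, String> cache = ignite.getOrCreateCache(ccfg);

        // Register a continuous query. The local listener gets update notifications
        // on the query client, while backup nodes buffer entries in backupQ until acked.
        ContinuousQuery<Integer, String> qry = new ContinuousQuery<>();

        qry.setLocalListener(evts -> evts.forEach(e -> {
            // Handle the update notification.
        }));

        QueryCursor<?> cur = cache.query(qry);

        // Keep the cursor open while the query should stay active; closing it
        // deregisters the continuous query.
    }
}
{code}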

 

Looking at the Ignite code, the relevant branch is the following snippet from 
onEntryUpdated() in CacheContinuousQueryHandler.java:

 
{code:java}
    if (primary || skipPrimaryCheck)
        // TransactionalReplicatedTwoBackupFullSync goes here: notify the query
        // client without putting evt.entry() into backupQ.
        onEntryUpdate(evt, notify, loc, recordIgniteEvt);
    else
        // A backup node of TransactionalPartitionedTwoBackupFullSync goes here:
        // this puts evt.entry() into backupQ.
        handleBackupEntry(cctx, evt.entry());
{code}
  

After the query client is notified, an ack message (CacheContinuousQueryBatchAck) 
appears to be sent back to the server-side CQ handlers on the backup nodes so they 
can clean up the entries in backupQ. There is also a periodic BackupCleaner task 
that runs every 5 seconds to clean up backupQ. The actual cleanup code is as follows:

 
{code:java}
    /**
     * @param updateCntr Acknowledged counter.
     */
    void cleanupBackupQueue(Long updateCntr) {
        Iterator<CacheContinuousQueryEntry> it = backupQ.iterator();

        while (it.hasNext()) {
            CacheContinuousQueryEntry backupEntry = it.next();

            // Remove backupEntry if its updateCounter <= acknowledged updateCntr.
            if (backupEntry.updateCounter() <= updateCntr)
                it.remove();
        }
    }
{code}
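
To make the suspicion behind the questions below concrete, here is a small standalone sketch (not Ignite code; the queue of plain counters is a toy stand-in for backupQ) that mimics the comparison in cleanupBackupQueue(). Any buffered counter greater than the acknowledged counter survives the cleanup, which would leave exactly one entry behind, as in the assertion failure above:

{code:java}
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.Queue;

public class BackupQueueCleanupSketch {
    public static void main(String[] args) {
        // Toy backup queue holding only update counters (stand-in for backupQ).
        Queue<Long> backupQ = new ArrayDeque<>();

        // The backup node buffered entries with counters 1..3.
        for (long cntr = 1; cntr <= 3; cntr++)
            backupQ.add(cntr);

        // Suppose the last ack this backup received only covers counter 2
        // (e.g. the ack for counter 3 never reached this backup).
        long ackedCntr = 2;

        // Same comparison as cleanupBackupQueue(): remove counters <= acked counter.
        Iterator<Long> it = backupQ.iterator();

        while (it.hasNext()) {
            if (it.next() <= ackedCntr)
                it.remove();
        }

        // Prints 1: one entry left over, matching "expected:<0> but was:<1>".
        System.out.println("Entries left in backupQ: " + backupQ.size());
    }
}
{code}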
 

So some questions are:
 # Why is a backupEntry still left over in backupQ after all these cleanup mechanisms?
 # Is it possible that the entry's updateCounter or the acknowledged updateCntr is mis-calculated?
 # Is it possible that the ack message is sent to only one of the two backup nodes? The load is 1000 updates across 3 nodes on a stable network (a sketch of the assumed load shape follows this list), so a message shouldn't simply get dropped in the middle. 
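
For reference, a minimal sketch of the update load we have in mind. This is an assumption about the shape of the load; the attached test may drive the updates differently (e.g. without explicit transactions):

{code:java}
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.transactions.Transaction;

public class UpdateLoadSketch {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start(); // one of the 3 test nodes

        IgniteCache<Integer, String> cache = ignite.cache("test-cache");

        // 1000 transactional updates; each one should eventually be acked and
        // removed from backupQ on both backup nodes.
        for (int i = 0; i < 1000; i++) {
            try (Transaction tx = ignite.transactions().txStart()) {
                cache.put(i, "value-" + i);

                tx.commit();
            }
        }
    }
}
{code}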

 

Please help look into this further, especially Ignite experts or developers.

Thanks,

 



> Memory leaks in continuous query handlers
> -----------------------------------------
>
>                 Key: IGNITE-10959
>                 URL: https://issues.apache.org/jira/browse/IGNITE-10959
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.7
>            Reporter: Denis Mekhanikov
>            Priority: Major
>             Fix For: 2.9
>
>         Attachments: CacheContinuousQueryMemoryUsageTest.java, 
> continuousquery_leak_profile.png
>
>
> Continuous query handlers don't clear internal data structures after cache 
> events are processed.
> A test, that reproduces the problem, is attached.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
