[ 
https://issues.apache.org/jira/browse/SOLR-11443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16203212#comment-16203212
 ] 

Cao Manh Dat commented on SOLR-11443:
-------------------------------------

Thanks [~dragonsinth] for reviewing!
bq. I might be thinking about this wrong, but the test seems to trying to 
thread an invisible needle, I guess we're trying to shut down overseer halfway 
through the list of updates? But we might very well just complete all 
operations quickly and restart overseer after they're all done.
bq. I feel like the maybeFlushBefore, maybeFlushAfter bits need a little more 
thinking. Seems pretty arbitrary to only check the firstCommand; maybe we 
should completely separate command-specific flush trigger from general purpose 
flush trigger? Then you could check command-level flushing on each command, if 
that's even still necessary.
bq. When would numUpdates diverge from updates.size()?

These all relate to the SOLR-11447 changes. 
For the first one, I assume you are talking about {{testDownNodeFailover}}. 
A DOWNNODE message is converted to multiple ZkWriteCommands, so the test proves 
that if we flush the clusterstate while processing the first command and the 
Overseer gets restarted right after that flush, the rest of the 
ZkWriteCommands will never get executed.
For the second comment, I fixed that in the latest patch of SOLR-11447. 
For the third one, numUpdates counts how many ZkWriteCommands were processed, 
while updates.size() indicates how many collections were affected (many 
ZkWriteCommands can affect a single collection).
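
To illustrate why the two numbers can differ, here is a minimal sketch. It is 
not the actual Overseer code: the class {{UpdateCounterSketch}}, its methods, 
and the use of plain strings in place of ZkWriteCommand are all assumptions 
made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the writer that batches cluster state updates.
final class UpdateCounterSketch {
    // One entry per affected collection; a later command overwrites an earlier one.
    private final Map<String, String> updates = new LinkedHashMap<>();
    // Incremented once per processed command, regardless of collection.
    private int numUpdates = 0;

    // Stand-in for enqueueing a single write command (assumed shape).
    void enqueue(String collection, String newState) {
        updates.put(collection, newState);
        numUpdates++;
    }

    int numUpdates() {
        return numUpdates;
    }

    int affectedCollections() {
        return updates.size();
    }

    public static void main(String[] args) {
        UpdateCounterSketch writer = new UpdateCounterSketch();
        writer.enqueue("collection1", "replica1 down");
        writer.enqueue("collection1", "replica2 down");
        writer.enqueue("collection2", "replica1 down");
        // Three commands were processed, but only two collections were touched.
        System.out.println(writer.numUpdates() + " commands, "
            + writer.affectedCollections() + " collections");
    }
}
```

Two commands against the same collection bump the counter twice but leave a 
single map entry, which is where the counts diverge.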

bq. Seems like you could just always set this dirty; but if you're trying to 
in-memory surgery as an optimization, I don't understand the need for the 
containsAll check.

It is a sanity check (for a case that should never happen). But if we know 
that some of the deleted nodes were not present in the cache, the cache 
appears to be in a dirty state. Here is a slightly better version of that 
code block.
{code}
    int cacheSizeBefore = knownChildren.size();
    knownChildren.removeAll(paths);
    if (cacheSizeBefore - paths.size() == knownChildren.size()) {
      stats.setQueueLength(knownChildren.size());
    } else {
      // Some of the deleted elements were not present in the cache,
      // so the cache is no longer valid
      knownChildren.clear();
      isDirty = true;
    }
{code}
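
A self-contained sketch of the same check, runnable outside Solr, is shown 
here for illustration. The class name {{KnownChildrenCacheSketch}} and the 
{{remove}} method are hypothetical, the {{stats}} update is omitted, and the 
check assumes that {{paths}} contains no duplicates.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.TreeSet;

// Hypothetical stand-in for the queue's in-memory cache of child node names.
final class KnownChildrenCacheSketch {
    final TreeSet<String> knownChildren = new TreeSet<>();
    boolean isDirty = false;

    // Removes deleted paths from the cache; invalidates the cache if any
    // deleted path was not cached. Assumes paths has no duplicate entries.
    void remove(Collection<String> paths) {
        int cacheSizeBefore = knownChildren.size();
        knownChildren.removeAll(paths);
        if (cacheSizeBefore - paths.size() != knownChildren.size()) {
            // At least one deleted path was missing from the cache,
            // so the cache can no longer be trusted.
            knownChildren.clear();
            isDirty = true;
        }
    }

    public static void main(String[] args) {
        KnownChildrenCacheSketch cache = new KnownChildrenCacheSketch();
        cache.knownChildren.addAll(Arrays.asList("qn-1", "qn-2", "qn-3"));

        // Both paths are cached: the cache shrinks and stays valid.
        cache.remove(Arrays.asList("qn-1", "qn-2"));
        System.out.println(cache.knownChildren.size() + " " + cache.isDirty);

        // "qn-9" was never cached: the cache is cleared and marked dirty.
        cache.remove(Arrays.asList("qn-3", "qn-9"));
        System.out.println(cache.knownChildren.size() + " " + cache.isDirty);
    }
}
```

Comparing sizes before and after the removal avoids a separate containsAll 
pass over the cache while still detecting a mismatch.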



> Remove the usage of workqueue for Overseer
> ------------------------------------------
>
>                 Key: SOLR-11443
>                 URL: https://issues.apache.org/jira/browse/SOLR-11443
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Cao Manh Dat
>            Assignee: Cao Manh Dat
>         Attachments: SOLR-11443.patch, SOLR-11443.patch
>
>
> If we can remove the usage of the workqueue, we can save a lot of blocking 
> IO in the Overseer and hence boost performance a lot.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
