[ https://issues.apache.org/jira/browse/SOLR-11443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16203212#comment-16203212 ]
Cao Manh Dat commented on SOLR-11443:
-------------------------------------

Thanks [~dragonsinth] for reviewing!

bq. I might be thinking about this wrong, but the test seems to trying to thread an invisible needle, I guess we're trying to shut down overseer halfway through the list of updates? But we might very well just complete all operations quickly and restart overseer after they're all done.

bq. I feel like the maybeFlushBefore, maybeFlushAfter bits need a little more thinking. Seems pretty arbitrary to only check the firstCommand; maybe we should completely separate command-specific flush trigger from general purpose flush trigger? Then you could check command-level flushing on each command, if that's even still necessary.

bq. When would numUpdates diverge from updates.size()?

All of these relate to the SOLR-11447 changes.

For the first one, I assume that you're talking about {{testDownNodeFailover}}. A DOWNNODE message is converted to multiple ZkWriteCommands, so the test proves that if we flush the clusterstate while processing the first command and the Overseer gets restarted right after the flush, the rest of the ZkWriteCommands will never get executed.

For the second comment, I fixed that in the last patch of SOLR-11447.

numUpdates counts the number of ZkWriteCommands that were processed, while updates.size() indicates how many collections were affected (many ZkWriteCommands can affect a single collection).

bq. Seems like you could just always set this dirty; but if you're trying to in-memory surgery as an optimization, I don't understand the need for the containsAll check.

It is a sanity check (for a case that should never happen). But if we find that some deleted nodes were not present in the cache, the cache appears to be in a dirty state. Here is a slightly better version of that code block.
{code}
int cacheSizeBefore = knownChildren.size();
knownChildren.removeAll(paths);
if (cacheSizeBefore - paths.size() == knownChildren.size()) {
  stats.setQueueLength(knownChildren.size());
} else {
  // Some deleted elements were not present in the cache,
  // so the cache is no longer valid
  knownChildren.clear();
  isDirty = true;
}
{code}

> Remove the usage of workqueue for Overseer
> ------------------------------------------
>
>                 Key: SOLR-11443
>                 URL: https://issues.apache.org/jira/browse/SOLR-11443
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public)
>            Reporter: Cao Manh Dat
>            Assignee: Cao Manh Dat
>         Attachments: SOLR-11443.patch, SOLR-11443.patch
>
>
> If we can remove the usage of the workqueue, we can save a lot of IO blocking in
> the Overseer, and hence boost performance a lot.
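
For illustration, the size-based sanity check from the {code} block above can be exercised in isolation. The following is only a minimal sketch: the class name {{ChildCacheSketch}}, the method {{onRemoved}} and the sample node names are hypothetical and are not taken from the patch, and the stats update is left out. It also assumes that {{paths}} contains no duplicate entries, since duplicates would make the size arithmetic report a mismatch.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for the queue's child-name cache; not the Solr class. */
public class ChildCacheSketch {
    private final Set<String> knownChildren = new HashSet<>();
    private boolean isDirty = false;

    public ChildCacheSketch(Collection<String> initialChildren) {
        knownChildren.addAll(initialChildren);
    }

    /**
     * Applies a batch delete to the in-memory cache.
     * Assumes {@code paths} has no duplicates. If every deleted path was
     * cached, the cache shrinks by exactly paths.size(); otherwise the
     * cache is treated as invalid, cleared, and marked dirty.
     */
    public void onRemoved(Collection<String> paths) {
        int cacheSizeBefore = knownChildren.size();
        knownChildren.removeAll(paths);
        if (cacheSizeBefore - paths.size() != knownChildren.size()) {
            // Some deleted elements were not in the cache: invalidate it.
            knownChildren.clear();
            isDirty = true;
        }
    }

    public boolean isDirty() {
        return isDirty;
    }

    public int size() {
        return knownChildren.size();
    }

    public static void main(String[] args) {
        ChildCacheSketch cache =
            new ChildCacheSketch(Arrays.asList("qn-1", "qn-2", "qn-3"));

        // Both deleted paths are cached, so the cache stays valid.
        cache.onRemoved(Arrays.asList("qn-1", "qn-2"));
        System.out.println(cache.isDirty() + " " + cache.size());

        // "qn-9" was never cached, so the cache is cleared and marked dirty.
        cache.onRemoved(Arrays.asList("qn-9"));
        System.out.println(cache.isDirty() + " " + cache.size());
    }
}
```

Comparing sizes before and after {{removeAll}} avoids a separate {{containsAll}} pass over the cache, at the cost of relying on the no-duplicates assumption stated above.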