[jira] [Commented] (SOLR-9440) ZkStateReader on a client can cache collection state and never refresh it
[ https://issues.apache.org/jira/browse/SOLR-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16243445#comment-16243445 ]

ASF subversion and git services commented on SOLR-9440:
-------------------------------------------------------

Commit 23e1aeb2ce8f7abaee4999d1a951328edebc in lucene-solr's branch refs/heads/branch_7x from [~shalinmangar]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=23e1aeb ]

SOLR-9440: Revert visibility change of collectionWatches

(cherry picked from commit 2e4b692)

> ZkStateReader on a client can cache collection state and never refresh it
> -------------------------------------------------------------------------
>
>                 Key: SOLR-9440
>                 URL: https://issues.apache.org/jira/browse/SOLR-9440
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: SolrCloud
>            Reporter: Shalin Shekhar Mangar
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 6.7, 7.0, master (8.0)
>
>         Attachments: SOLR-9440.patch, SOLR-9440.patch
>
>
> I saw this while writing a test case for SOLR-9438. The collection1
> collection which was in stateFormat=2 was somehow caching the
> CloudSolrClient's ZkStateReader such that the returned cluster state
> contained the collection state. However this collection was neither watched
> nor lazy so any call to waitForRecoveriesToFinish would see stale state and
> loop until timeout.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9440) ZkStateReader on a client can cache collection state and never refresh it
[ https://issues.apache.org/jira/browse/SOLR-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16243444#comment-16243444 ]

ASF subversion and git services commented on SOLR-9440:
-------------------------------------------------------

Commit f6f5d01be9bafedb4794d2e3a104ea8db5bcfa78 in lucene-solr's branch refs/heads/branch_7x from [~shalinmangar]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f6f5d01 ]

SOLR-9440: The ZkStateReader.removeCollectionStateWatcher method can cache a DocCollection reference and never update it causing stale state to be returned in ClusterState

(cherry picked from commit 39376cd)
[jira] [Commented] (SOLR-9440) ZkStateReader on a client can cache collection state and never refresh it
[ https://issues.apache.org/jira/browse/SOLR-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16233650#comment-16233650 ]

ASF subversion and git services commented on SOLR-9440:
-------------------------------------------------------

Commit 2e4b6929d2d714b1c12dc66cd46a2307c8bb1044 in lucene-solr's branch refs/heads/master from [~shalinmangar]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=2e4b692 ]

SOLR-9440: Revert visibility change of collectionWatches
[jira] [Commented] (SOLR-9440) ZkStateReader on a client can cache collection state and never refresh it
[ https://issues.apache.org/jira/browse/SOLR-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16233639#comment-16233639 ]

Shalin Shekhar Mangar commented on SOLR-9440:
---------------------------------------------

{quote}
I'm wondering if we need some synchronization here, with registerCollectionStateWatcher and also to make sure the watchedCollectionStates.remove and lazyCollectionStates.put is done atomically
{quote}

Hmm, good point. I don't think we need synchronization here, but we do need to ensure that the result, once visible, is consistent. This is a trick that the ZkStateReader uses: it adds all collections to the lazyCollectionStates map and never removes them unless the collection is deleted, but it gives priority to watchedCollectionStates over the lazy ones. This ensures that during constructState, the collection is always available in the cluster state even if it is removed from watchedCollectionStates. Strictly speaking, the lazyCollectionStates.put is not necessary; it is there just for safety.

bq. Is this only for testing purposes?

Oops, yes, thanks for catching that. I'll revert it.
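The priority rule described in the comment above can be sketched as a toy model. This is not Solr's code: the class `LazyPrioritySketch`, the `zk` map standing in for ZooKeeper, and the `discover` and `stateOf` helpers are all hypothetical, and states are plain strings rather than DocCollection objects. It only illustrates how a lazy entry for every collection, overridden by a watched entry when one exists, keeps the collection present in the constructed state.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Toy model only; names echo the thread but none of this is Solr's real code. */
class LazyPrioritySketch {
  // Hypothetical stand-in for the state stored in ZooKeeper.
  final Map<String, String> zk = new ConcurrentHashMap<>();
  // Cached states of collections that currently have watchers.
  final Map<String, String> watchedCollectionStates = new ConcurrentHashMap<>();
  // One lazy reference per known collection; it reads "ZooKeeper" on demand.
  final Map<String, Supplier<String>> lazyCollectionStates = new ConcurrentHashMap<>();

  /** Every known collection gets a lazy reference that is kept until deletion. */
  void discover(String collection) {
    lazyCollectionStates.putIfAbsent(collection, () -> zk.get(collection));
  }

  /** Start from the lazy references, then let watched states override them. */
  Map<String, Supplier<String>> constructState() {
    Map<String, Supplier<String>> result = new HashMap<>(lazyCollectionStates);
    watchedCollectionStates.forEach((name, state) -> result.put(name, () -> state));
    return result;
  }

  String stateOf(String collection) {
    Supplier<String> ref = constructState().get(collection);
    return ref == null ? null : ref.get();
  }

  public static void main(String[] args) {
    LazyPrioritySketch reader = new LazyPrioritySketch();
    reader.zk.put("collection1", "v1");
    reader.discover("collection1");
    reader.watchedCollectionStates.put("collection1", "v1");
    reader.zk.put("collection1", "v2"); // the watched copy is not refreshed in this toy

    System.out.println("while watched: " + reader.stateOf("collection1"));
    reader.watchedCollectionStates.remove("collection1");
    System.out.println("after removal: " + reader.stateOf("collection1"));
  }
}
```

In this model, dropping the watched entry never makes the collection disappear from the constructed state; the lookup simply falls back to the lazy reference.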
[jira] [Commented] (SOLR-9440) ZkStateReader on a client can cache collection state and never refresh it
[ https://issues.apache.org/jira/browse/SOLR-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16227172#comment-16227172 ]

Tomás Fernández Löbbe commented on SOLR-9440:
---------------------------------------------

Thanks for looking at this, Shalin!

{code:java}
if (v.canBeRemoved()) {
  watchedCollectionStates.remove(collection);
  lazyCollectionStates.put(collection, new LazyCollectionRef(collection));
  reconstructState.set(true);
{code}

I'm wondering if we need some synchronization here, with {{registerCollectionStateWatcher}}, and also to make sure that the {{watchedCollectionStates.remove}} and {{lazyCollectionStates.put}} are done atomically?

{code}
- private ConcurrentHashMap collectionWatches = new ConcurrentHashMap<>();
+ public ConcurrentHashMap collectionWatches = new ConcurrentHashMap<>();
{code}

Is this only for testing purposes?
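One way to picture the atomicity question is a toy model in which both map updates happen inside `ConcurrentHashMap.compute`, whose Javadoc states that the whole invocation is performed atomically for the key. Everything here is an assumption made for illustration: the class `AtomicRemoveSketch`, the `Watch` record with a plain counter, and the string values are hypothetical and are not taken from the patch. Note that readers of the other two maps are not blocked, so this only serializes registration and removal for the same collection.

```java
import java.util.concurrent.ConcurrentHashMap;

/** Toy model only; names echo the thread but none of this is Solr's real code. */
class AtomicRemoveSketch {
  /** Hypothetical per-collection watch record. */
  static final class Watch {
    int watchers; // only touched inside compute(), which serializes access per key
    boolean canBeRemoved() { return watchers == 0; }
  }

  final ConcurrentHashMap<String, Watch> collectionWatches = new ConcurrentHashMap<>();
  final ConcurrentHashMap<String, String> watchedCollectionStates = new ConcurrentHashMap<>();
  final ConcurrentHashMap<String, String> lazyCollectionStates = new ConcurrentHashMap<>();

  void register(String collection) {
    collectionWatches.compute(collection, (k, v) -> {
      Watch w = (v == null) ? new Watch() : v;
      w.watchers++;
      watchedCollectionStates.put(collection, "cached state");
      return w;
    });
  }

  void remove(String collection) {
    collectionWatches.compute(collection, (k, v) -> {
      if (v == null) return null;
      v.watchers--;
      if (v.canBeRemoved()) {
        // Both updates run while compute() holds this key, so a concurrent
        // register() for the same collection cannot interleave between them.
        watchedCollectionStates.remove(collection);
        lazyCollectionStates.put(collection, "lazy ref");
        return null; // drop the watch entry
      }
      return v;
    });
  }

  /** Runs register/remove pairs from several threads and returns the final state. */
  static AtomicRemoveSketch runConcurrently(int threads) throws InterruptedException {
    AtomicRemoveSketch reader = new AtomicRemoveSketch();
    Thread[] workers = new Thread[threads];
    for (int i = 0; i < threads; i++) {
      workers[i] = new Thread(() -> {
        for (int n = 0; n < 1000; n++) {
          reader.register("collection1");
          reader.remove("collection1");
        }
      });
      workers[i].start();
    }
    for (Thread t : workers) t.join();
    return reader;
  }

  public static void main(String[] args) throws InterruptedException {
    AtomicRemoveSketch reader = runConcurrently(8);
    System.out.println("watches=" + reader.collectionWatches.keySet()
        + " watched=" + reader.watchedCollectionStates.keySet()
        + " lazy=" + reader.lazyCollectionStates.keySet());
  }
}
```

Because every register is paired with a later remove on the same thread, the last operation on the key is always a removal that brings the counter to zero, so the run should end with no watch entry, no cached state, and one lazy reference.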
[jira] [Commented] (SOLR-9440) ZkStateReader on a client can cache collection state and never refresh it
[ https://issues.apache.org/jira/browse/SOLR-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16226696#comment-16226696 ]

Shalin Shekhar Mangar commented on SOLR-9440:
---------------------------------------------

I'll observe the impact of this change on Jenkins for a couple of days before porting it to branch_7x.
[jira] [Commented] (SOLR-9440) ZkStateReader on a client can cache collection state and never refresh it
[ https://issues.apache.org/jira/browse/SOLR-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16226695#comment-16226695 ]

ASF subversion and git services commented on SOLR-9440:
-------------------------------------------------------

Commit 39376cd8b5ef03b3338c2e8fa31dce732749bcd7 in lucene-solr's branch refs/heads/master from [~shalinmangar]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=39376cd ]

SOLR-9440: The ZkStateReader.removeCollectionStateWatcher method can cache a DocCollection reference and never update it causing stale state to be returned in ClusterState
[jira] [Commented] (SOLR-9440) ZkStateReader on a client can cache collection state and never refresh it
[ https://issues.apache.org/jira/browse/SOLR-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16226416#comment-16226416 ]

Shalin Shekhar Mangar commented on SOLR-9440:
---------------------------------------------

Also, we saw this manifest in tests because the SolrCloudTestCase.waitForState method eventually calls the ZkStateReader.waitForState method, which registers and then removes a collection state watcher.
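The register-then-remove pattern mentioned above can be sketched roughly as follows. This is a toy model under stated assumptions, not the actual ZkStateReader.waitForState: the class `WaitForStateSketch`, the string-valued state, and the `publish` and `watcherCount` helpers are hypothetical. It only shows that a wait helper built this way registers a temporary watcher and removes it in a `finally` block, so every call exercises the watcher-removal path.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Predicate;

/** Toy model only; none of this is Solr's real code. */
class WaitForStateSketch {
  private final List<Predicate<String>> watchers = new CopyOnWriteArrayList<>();
  private volatile String state = "recovering";

  /** Simulates a state change being pushed to all registered watchers. */
  void publish(String newState) {
    state = newState;
    for (Predicate<String> w : watchers) {
      w.test(newState);
    }
  }

  int watcherCount() {
    return watchers.size();
  }

  void waitForState(long timeout, TimeUnit unit, Predicate<String> predicate)
      throws InterruptedException, TimeoutException {
    CountDownLatch latch = new CountDownLatch(1);
    Predicate<String> watcher = s -> {
      boolean matches = predicate.test(s);
      if (matches) latch.countDown();
      return matches;
    };
    watchers.add(watcher); // register the temporary watcher
    try {
      watcher.test(state); // the desired state may already have been reached
      if (!latch.await(timeout, unit)) {
        throw new TimeoutException("state not reached in time");
      }
    } finally {
      watchers.remove(watcher); // always removed, on success and on timeout
    }
  }

  public static void main(String[] args) throws Exception {
    WaitForStateSketch reader = new WaitForStateSketch();
    Thread updater = new Thread(() -> reader.publish("active"));
    updater.start();
    reader.waitForState(5, TimeUnit.SECONDS, "active"::equals);
    updater.join();
    System.out.println("watchers left: " + reader.watcherCount());
  }
}
```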
[jira] [Commented] (SOLR-9440) ZkStateReader on a client can cache collection state and never refresh it
[ https://issues.apache.org/jira/browse/SOLR-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16226414#comment-16226414 ]

Shalin Shekhar Mangar commented on SOLR-9440:
---------------------------------------------

I found the root cause. The bug is in {{ZkStateReader.removeCollectionStateWatcher}}, which only removes the collection from the watch list, i.e. the collectionWatches map. Since ZK does not have a way to remove a watch, the watch object is fired again when the collection changes. There is code in StateWatcher's refreshAndWatch method that is supposed to evict the cached DocCollection object from watchedCollectionStates if the collection is no longer in the collectionWatches map. However, that code never gets executed, because StateWatcher's process method returns early if the collection is not in the collectionWatches map. So a cached DocCollection reference that is neither lazy nor watched is left behind.
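The sequence described above can be reproduced in a toy model. This is a sketch under stated assumptions rather than Solr's code: the class `StaleWatchSketch`, the integer "versions" standing in for DocCollection objects, and the `evictOnRemove` flag (which models evicting the cached state at removal time) are all hypothetical. Only the shape of the control flow, an early return in `process` that makes the eviction in `refreshAndWatch` unreachable, follows the comment.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Toy model only; names echo the thread but none of this is Solr's real code. */
class StaleWatchSketch {
  // collection name -> registered watcher ids
  final Map<String, Set<String>> collectionWatches = new ConcurrentHashMap<>();
  // collection name -> cached state version
  final Map<String, Integer> watchedCollectionStates = new ConcurrentHashMap<>();
  // when true, removing the last watcher also evicts the cached state
  final boolean evictOnRemove;

  StaleWatchSketch(boolean evictOnRemove) {
    this.evictOnRemove = evictOnRemove;
  }

  void registerWatcher(String collection, String watcherId, int currentVersion) {
    collectionWatches.computeIfAbsent(collection, k -> ConcurrentHashMap.newKeySet()).add(watcherId);
    watchedCollectionStates.put(collection, currentVersion);
  }

  void removeWatcher(String collection, String watcherId) {
    collectionWatches.computeIfPresent(collection, (k, ids) -> {
      ids.remove(watcherId);
      if (!ids.isEmpty()) return ids;
      if (evictOnRemove) watchedCollectionStates.remove(collection);
      return null; // last watcher gone: drop the watch entry
    });
  }

  /** Simulates the ZooKeeper watch firing after the collection changed. */
  void process(String collection, int newVersion) {
    if (!collectionWatches.containsKey(collection)) {
      return; // early return: refreshAndWatch, and its eviction, is never reached
    }
    refreshAndWatch(collection, newVersion);
  }

  void refreshAndWatch(String collection, int newVersion) {
    if (!collectionWatches.containsKey(collection)) {
      watchedCollectionStates.remove(collection); // unreachable when called via process
      return;
    }
    watchedCollectionStates.put(collection, newVersion);
  }

  public static void main(String[] args) {
    for (boolean evict : new boolean[] {false, true}) {
      StaleWatchSketch reader = new StaleWatchSketch(evict);
      reader.registerWatcher("collection1", "w1", 1);
      reader.removeWatcher("collection1", "w1");
      reader.process("collection1", 2); // the collection changes in "ZooKeeper"
      System.out.println("evictOnRemove=" + evict
          + " cached=" + reader.watchedCollectionStates.get("collection1"));
    }
  }
}
```

With `evictOnRemove` set to false, the cached version stays at 1 after the collection moves to version 2, which is the stale, neither-lazy-nor-watched entry the comment describes; with it set to true, the entry is gone once the last watcher is removed.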
[jira] [Commented] (SOLR-9440) ZkStateReader on a client can cache collection state and never refresh it
[ https://issues.apache.org/jira/browse/SOLR-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16190978#comment-16190978 ]

ASF subversion and git services commented on SOLR-9440:
-------------------------------------------------------

Commit 5982d8734adf44e12cae1985574ca682f07839ca in lucene-solr's branch refs/heads/master from [~shalinmangar]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=5982d87 ]

SOLR-11076: Added more debug logging. Ensure collections are active before we exercise autoscaling. Added workaround for SOLR-9440.
[jira] [Commented] (SOLR-9440) ZkStateReader on a client can cache collection state and never refresh it
[ https://issues.apache.org/jira/browse/SOLR-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16015005#comment-16015005 ]

Tomás Fernández Löbbe commented on SOLR-9440:
---------------------------------------------

I hit this problem all the time in the tests of SOLR-10233; the only workaround I found is to use {{cluster.getSolrClient().getZkStateReader().registerCore(collectionName);}}.
[jira] [Commented] (SOLR-9440) ZkStateReader on a client can cache collection state and never refresh it
[ https://issues.apache.org/jira/browse/SOLR-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15907791#comment-15907791 ]

Ishan Chattopadhyaya commented on SOLR-9440:
--------------------------------------------

Moving to 6.5, since 6.4 has already been released.