[jira] [Commented] (SOLR-9440) ZkStateReader on a client can cache collection state and never refresh it

2017-11-07 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16243445#comment-16243445
 ] 

ASF subversion and git services commented on SOLR-9440:
---

Commit 23e1aeb2ce8f7abaee4999d1a951328edebc in lucene-solr's branch 
refs/heads/branch_7x from [~shalinmangar]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=23e1aeb ]

SOLR-9440: Revert visibility change of collectionWatches

(cherry picked from commit 2e4b692)


> ZkStateReader on a client can cache collection state and never refresh it
> -
>
> Key: SOLR-9440
> URL: https://issues.apache.org/jira/browse/SOLR-9440
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Shalin Shekhar Mangar
>Assignee: Shalin Shekhar Mangar
> Fix For: 6.7, 7.0, master (8.0)
>
> Attachments: SOLR-9440.patch, SOLR-9440.patch
>
>
> I saw this while writing a test case for SOLR-9438. The collection1 
> collection which was in stateFormat=2 was somehow caching the 
> CloudSolrClient's ZkStateReader such that the returned cluster state 
> contained the collection state. However this collection was neither watched 
> nor lazy so any call to waitForRecoveriesToFinish would see stale state and 
> loop until timeout.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9440) ZkStateReader on a client can cache collection state and never refresh it

2017-11-07 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16243444#comment-16243444
 ] 

ASF subversion and git services commented on SOLR-9440:
---

Commit f6f5d01be9bafedb4794d2e3a104ea8db5bcfa78 in lucene-solr's branch 
refs/heads/branch_7x from [~shalinmangar]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f6f5d01 ]

SOLR-9440: The ZkStateReader.removeCollectionStateWatcher method can cache a 
DocCollection reference and never update it causing stale state to be returned 
in ClusterState

(cherry picked from commit 39376cd)





[jira] [Commented] (SOLR-9440) ZkStateReader on a client can cache collection state and never refresh it

2017-10-31 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16233650#comment-16233650
 ] 

ASF subversion and git services commented on SOLR-9440:
---

Commit 2e4b6929d2d714b1c12dc66cd46a2307c8bb1044 in lucene-solr's branch 
refs/heads/master from [~shalinmangar]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=2e4b692 ]

SOLR-9440: Revert visibility change of collectionWatches


> ZkStateReader on a client can cache collection state and never refresh it
> -
>
> Key: SOLR-9440
> URL: https://issues.apache.org/jira/browse/SOLR-9440
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Shalin Shekhar Mangar
>Assignee: Shalin Shekhar Mangar
>Priority: Major
> Fix For: 6.7, 7.0, master (8.0)
>
> Attachments: SOLR-9440.patch, SOLR-9440.patch



[jira] [Commented] (SOLR-9440) ZkStateReader on a client can cache collection state and never refresh it

2017-10-31 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16233639#comment-16233639
 ] 

Shalin Shekhar Mangar commented on SOLR-9440:
-

{quote}
I'm wondering if we need some synchronization here, with 
registerCollectionStateWatcher and also to make sure the 
watchedCollectionStates.remove and lazyCollectionStates.put is done atomically
{quote}

Good point. I don't think we need synchronization here, but we do need to 
ensure that the result, once visible, is consistent. ZkStateReader relies on a 
trick for this: it adds all collections to the lazyCollectionStates map and 
never removes them unless the collection is deleted, but it gives priority to 
watchedCollectionStates over the lazy ones. This ensures that during 
constructState the collection is always available in the cluster state, even 
if it has been removed from watchedCollectionStates. The 
lazyCollectionStates.put is therefore not strictly necessary; it is there just 
for safety.
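
The priority rule described above can be sketched roughly as follows. This is a simplified model, not the actual ZkStateReader code: the class name StatePrioritySketch is hypothetical, and plain strings stand in for the DocCollection and LazyCollectionRef values. It only illustrates why a collection stays visible after it is dropped from the watched map.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical, simplified model of the watched-over-lazy lookup priority. */
class StatePrioritySketch {
  // Strings stand in for the real state objects (assumption for illustration).
  final Map<String, String> watchedCollectionStates = new ConcurrentHashMap<>();
  final Map<String, String> lazyCollectionStates = new ConcurrentHashMap<>();

  /** Builds a view in which watched entries win and lazy entries fill the gaps. */
  Map<String, String> constructState() {
    Map<String, String> result = new LinkedHashMap<>(watchedCollectionStates);
    for (Map.Entry<String, String> e : lazyCollectionStates.entrySet()) {
      result.putIfAbsent(e.getKey(), e.getValue());
    }
    return result;
  }

  public static void main(String[] args) {
    StatePrioritySketch reader = new StatePrioritySketch();
    reader.lazyCollectionStates.put("collection1", "lazy");
    reader.watchedCollectionStates.put("collection1", "watched");
    System.out.println(reader.constructState()); // watched entry takes priority

    // Dropping the watched entry does not hide the collection:
    // the lazy entry is still there to fall back on.
    reader.watchedCollectionStates.remove("collection1");
    System.out.println(reader.constructState());
  }
}
```

Because the lazy entry is never removed while the collection exists, the remove on the watched map does not need to be atomic with anything else for readers to keep seeing the collection.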

bq. Is this only for testing purposes?

Oops, yes, thanks for catching. I'll revert it.

> ZkStateReader on a client can cache collection state and never refresh it
> -
>
> Key: SOLR-9440
> URL: https://issues.apache.org/jira/browse/SOLR-9440
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Shalin Shekhar Mangar
>Assignee: Shalin Shekhar Mangar
>Priority: Major
> Fix For: 6.7, 7.0, master (8.0)
>
> Attachments: SOLR-9440.patch, SOLR-9440.patch



[jira] [Commented] (SOLR-9440) ZkStateReader on a client can cache collection state and never refresh it

2017-10-31 Thread Tomás Fernández Löbbe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16227172#comment-16227172
 ] 

Tomás Fernández Löbbe commented on SOLR-9440:
-

Thanks for looking at this Shalin!
{code:java}
  if (v.canBeRemoved()) {
    watchedCollectionStates.remove(collection);
    lazyCollectionStates.put(collection, new LazyCollectionRef(collection));
    reconstructState.set(true);
{code}
I'm wondering if we need some synchronization here, with 
{{registerCollectionStateWatcher}} and also to make sure the 
{{watchedCollectionStates.remove}} and {{lazyCollectionStates.put}} is done 
atomically?
{code}
-  private ConcurrentHashMap collectionWatches = new ConcurrentHashMap<>();
+  public ConcurrentHashMap collectionWatches = new ConcurrentHashMap<>();
{code}
Is this only for testing purposes?




[jira] [Commented] (SOLR-9440) ZkStateReader on a client can cache collection state and never refresh it

2017-10-31 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16226696#comment-16226696
 ] 

Shalin Shekhar Mangar commented on SOLR-9440:
-

I'll observe the impact of this change on Jenkins for a couple of days before 
porting it to branch_7x.




[jira] [Commented] (SOLR-9440) ZkStateReader on a client can cache collection state and never refresh it

2017-10-31 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16226695#comment-16226695
 ] 

ASF subversion and git services commented on SOLR-9440:
---

Commit 39376cd8b5ef03b3338c2e8fa31dce732749bcd7 in lucene-solr's branch 
refs/heads/master from [~shalinmangar]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=39376cd ]

SOLR-9440: The ZkStateReader.removeCollectionStateWatcher method can cache a 
DocCollection reference and never update it causing stale state to be returned 
in ClusterState





[jira] [Commented] (SOLR-9440) ZkStateReader on a client can cache collection state and never refresh it

2017-10-31 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16226416#comment-16226416
 ] 

Shalin Shekhar Mangar commented on SOLR-9440:
-

Also, we saw this manifest in tests because the 
SolrCloudTestCase.waitForState method eventually calls the 
ZkStateReader.waitForState method, which registers and then removes a 
collection state watcher.
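
The register-then-remove pattern mentioned above looks roughly like the sketch below. It is a self-contained approximation, not the real ZkStateReader.waitForState: the class WaitForStateSketch, the publish method, and the use of plain strings as states are all assumptions made for illustration.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

/** Hypothetical sketch of a waitForState that registers and then removes a watcher. */
class WaitForStateSketch {
  private final List<Predicate<String>> watchers = new CopyOnWriteArrayList<>();
  private volatile String state = "down";

  /** Stand-in for a state change notification arriving from ZooKeeper. */
  void publish(String newState) {
    state = newState;
    for (Predicate<String> w : watchers) {
      w.test(newState);
    }
  }

  int watcherCount() {
    return watchers.size();
  }

  /** Returns true if the predicate matched before the timeout, false otherwise. */
  boolean waitForState(Predicate<String> predicate, long timeoutMs) {
    CountDownLatch latch = new CountDownLatch(1);
    Predicate<String> watcher = s -> {
      boolean matches = predicate.test(s);
      if (matches) {
        latch.countDown();
      }
      return matches;
    };
    watchers.add(watcher); // register the temporary watcher
    try {
      watcher.test(state); // check the current state right away
      return latch.await(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      watchers.remove(watcher); // always removed, whether or not the wait succeeded
    }
  }

  public static void main(String[] args) {
    WaitForStateSketch reader = new WaitForStateSketch();
    reader.publish("active");
    System.out.println(reader.waitForState("active"::equals, 1000));
    System.out.println(reader.watcherCount());
  }
}
```

Every call leaves the watcher list as it found it, so a test that waits on a collection many times exercises the removal path on each call.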

> ZkStateReader on a client can cache collection state and never refresh it
> -
>
> Key: SOLR-9440
> URL: https://issues.apache.org/jira/browse/SOLR-9440
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Shalin Shekhar Mangar
>Assignee: Shalin Shekhar Mangar
> Fix For: 6.7, 7.0, master (8.0)



[jira] [Commented] (SOLR-9440) ZkStateReader on a client can cache collection state and never refresh it

2017-10-31 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16226414#comment-16226414
 ] 

Shalin Shekhar Mangar commented on SOLR-9440:
-

I found the root cause.

The bug is in {{ZkStateReader.removeCollectionStateWatcher}}, which only 
removes the collection from the watch list, i.e. the collectionWatches map. 
Since ZK does not have a way to remove a watch, the watch object is fired again 
when the collection changes. There is code in StateWatcher's refreshAndWatch 
method which is supposed to evict the cached DocCollection object from 
watchedCollectionStates if the collection is no longer in the collectionWatches 
map. However, that code never gets executed because StateWatcher's process 
method returns early if the collection is not in the collectionWatches map. So 
a cached DocCollection reference that is neither lazy nor watched is left 
behind.
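
The sequence above can be reproduced with a toy model. The sketch below is not the real ZkStateReader: the class StaleCacheSketch is hypothetical, an integer version stands in for the DocCollection, and a plain map stands in for ZooKeeper. Only the control flow (early return in process, eviction inside refreshAndWatch) follows the description.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical toy model of the stale-cache bug; not the real ZkStateReader. */
class StaleCacheSketch {
  // Collections that currently have at least one registered watcher.
  final Set<String> collectionWatches = ConcurrentHashMap.newKeySet();
  // Cached state per collection (an int version stands in for DocCollection).
  final Map<String, Integer> watchedCollectionStates = new ConcurrentHashMap<>();
  // Stand-in for the authoritative state held in ZooKeeper.
  final Map<String, Integer> zk = new ConcurrentHashMap<>();

  /** Assumes zk already holds an entry for the collection. */
  void registerWatcher(String collection) {
    collectionWatches.add(collection);
    refreshAndWatch(collection);
  }

  // Buggy behaviour: only the watch entry is removed, the cached state stays.
  void removeWatcher(String collection) {
    collectionWatches.remove(collection);
  }

  // Called when the ZK watch, which cannot be removed, fires again.
  void process(String collection) {
    if (!collectionWatches.contains(collection)) {
      return; // early return: the eviction in refreshAndWatch is never reached
    }
    refreshAndWatch(collection);
  }

  void refreshAndWatch(String collection) {
    if (!collectionWatches.contains(collection)) {
      watchedCollectionStates.remove(collection); // the intended eviction
      return;
    }
    watchedCollectionStates.put(collection, zk.get(collection));
  }

  public static void main(String[] args) {
    StaleCacheSketch reader = new StaleCacheSketch();
    reader.zk.put("collection1", 1);
    reader.registerWatcher("collection1"); // caches version 1
    reader.removeWatcher("collection1");   // cache entry is left behind
    reader.zk.put("collection1", 2);       // collection changes
    reader.process("collection1");         // watch fires but returns early
    System.out.println("cached=" + reader.watchedCollectionStates.get("collection1")
        + " actual=" + reader.zk.get("collection1"));
  }
}
```

In this model the cached version stays at 1 while the authoritative version moves to 2, which mirrors a caller looping on stale state until it times out.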

> ZkStateReader on a client can cache collection state and never refresh it
> -
>
> Key: SOLR-9440
> URL: https://issues.apache.org/jira/browse/SOLR-9440
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Shalin Shekhar Mangar
>Assignee: Shalin Shekhar Mangar
> Fix For: 6.7, 7.0, master (8.0)



[jira] [Commented] (SOLR-9440) ZkStateReader on a client can cache collection state and never refresh it

2017-10-04 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16190978#comment-16190978
 ] 

ASF subversion and git services commented on SOLR-9440:
---

Commit 5982d8734adf44e12cae1985574ca682f07839ca in lucene-solr's branch 
refs/heads/master from [~shalinmangar]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=5982d87 ]

SOLR-11076: Added more debug logging. Ensure collections are active before we 
exercise autoscaling. Added workaround for SOLR-9440.


> ZkStateReader on a client can cache collection state and never refresh it
> -
>
> Key: SOLR-9440
> URL: https://issues.apache.org/jira/browse/SOLR-9440
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Shalin Shekhar Mangar
> Fix For: 6.7, 7.0



[jira] [Commented] (SOLR-9440) ZkStateReader on a client can cache collection state and never refresh it

2017-05-17 Thread Tomás Fernández Löbbe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16015005#comment-16015005
 ] 

Tomás Fernández Löbbe commented on SOLR-9440:
-

I hit this problem all the time in the tests of SOLR-10233. The only workaround 
I have found is to call 
{{cluster.getSolrClient().getZkStateReader().registerCore(collectionName);}}.

> ZkStateReader on a client can cache collection state and never refresh it
> -
>
> Key: SOLR-9440
> URL: https://issues.apache.org/jira/browse/SOLR-9440
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Shalin Shekhar Mangar
> Fix For: 6.6, master (7.0)



[jira] [Commented] (SOLR-9440) ZkStateReader on a client can cache collection state and never refresh it

2017-03-13 Thread Ishan Chattopadhyaya (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15907791#comment-15907791
 ] 

Ishan Chattopadhyaya commented on SOLR-9440:


Moving to 6.5, since 6.4 has already been released.

> ZkStateReader on a client can cache collection state and never refresh it
> -
>
> Key: SOLR-9440
> URL: https://issues.apache.org/jira/browse/SOLR-9440
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Shalin Shekhar Mangar
> Fix For: 6.5, master (7.0)