[ 
https://issues.apache.org/jira/browse/SOLR-6631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14186206#comment-14186206
 ] 

Hoss Man commented on SOLR-6631:
--------------------------------

bq. I'm curious how we might be able to trigger a "None" event to get raised so 
I can add that to the unit test?

IIUC: "EventType.None" is only raised for "session level events" -- i.e. when 
the KeeperState is one of Disconnected, SyncConnected (on reconnect), or Expired.

bq. I think the fix would be to only set the event in the watcher if the type 
is not None.

I'm not very familiar with this code, but perhaps a better approach would be 
to only proceed with this code if the EventType received is in an explicit set 
of _expected_ types?
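
To make that concrete, here is a minimal sketch of a latch-style watcher that 
records an event only when its type is in an explicit set. It is not the 
actual LatchChildWatcher: the EventType enum below is a stand-in for 
ZooKeeper's, and FilteringLatchWatcher and its methods are hypothetical names 
made up for illustration.

```java
import java.util.EnumSet;

// Stand-in for ZooKeeper's EventType (assumption: only the names matter here).
enum EventType { None, NodeCreated, NodeDeleted, NodeDataChanged, NodeChildrenChanged }

// Hypothetical simplification of a latch-style watcher: it remembers an
// event only when the type is one the caller said it expects.
final class FilteringLatchWatcher {
  private final Object lock = new Object();
  private final EnumSet<EventType> expected;
  private EventType seen; // stays null until an expected event arrives

  FilteringLatchWatcher(EnumSet<EventType> expected) {
    this.expected = EnumSet.copyOf(expected);
  }

  void process(EventType type) {
    if (!expected.contains(type)) {
      return; // e.g. a session-level None event: do not release waiters
    }
    synchronized (lock) {
      seen = type;
      lock.notifyAll();
    }
  }

  // Waits up to timeoutMillis (must be > 0) for an expected event and
  // returns it, or null if none has been recorded.
  EventType await(long timeoutMillis) {
    synchronized (lock) {
      if (seen == null) {
        try {
          lock.wait(timeoutMillis);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt(); // preserve interrupt status
        }
      }
      return seen;
    }
  }
}
```

With this shape, a None event never sets the member, so a caller looping on 
await() keeps blocking for the timeout instead of spinning.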

Whichever way makes sense, one trick I find very helpful in situations like 
this (a: dealing with enums from third party packages; b: wanting to behave 
according to partitions of the enum space) is to not just "do X if state in (A)" 
but "do X if state in (A), else no-op if state in (B), else ERROR", so that if 
someone upgrades zookeeper and there are suddenly new EventTypes we don't 
expect, they aren't silently ignored.

The EnumSet.allOf() and EnumSet.complementOf() methods can also help write 
very targeted unit tests that alert you to unexpected values as soon as you 
upgrade.

So for example...

{code}
public class DistributedQueue {
  public static final EnumSet<EventType> EXPECTED_EVENTS = EnumSet.of(...);
  public static final EnumSet<EventType> IGNORED_EVENTS = EnumSet.of(...);
  ...
    if (EXPECTED_EVENTS.contains(event.getType())) {
      // do stuff
      ...
    } else if (IGNORED_EVENTS.contains(event.getType())) {
      // NO-OP
    } else {
      log.error("WTF EVENT IS THIS? " + ...);
    }
  ...
}
public class TestDistributedQueue {
  ...
  /**
   * if this test fails, don't change it - go audit these EnumSets and all
   * their usages
   */
  public void testSanityOfEventTypes() {
    EnumSet<EventType> known = EnumSet.copyOf(DistributedQueue.EXPECTED_EVENTS);
    known.addAll(DistributedQueue.IGNORED_EVENTS);

    EnumSet<EventType> unknown = EnumSet.complementOf(known);
    assertEquals("un-known EventTypes found, zk upgrade?",
                 EnumSet.noneOf(EventType.class), unknown);
  }
}
{code}
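
To show what the sanity check buys you, here is a small self-contained sketch 
of the "upgrade" scenario. Everything in it is hypothetical: UpgradedEventType 
is an invented stand-in enum (SomethingNew is not a real ZooKeeper value), and 
the EXPECTED/IGNORED split is chosen purely for illustration, not as a 
recommendation for DistributedQueue.

```java
import java.util.EnumSet;

// Hypothetical stand-in for a third-party enum after an upgrade:
// SomethingNew is an invented value that existing code never anticipated.
enum UpgradedEventType {
  None, NodeCreated, NodeDeleted, NodeDataChanged, NodeChildrenChanged, SomethingNew
}

final class EventPartition {
  // Illustrative partition only (assumption).
  static final EnumSet<UpgradedEventType> EXPECTED = EnumSet.of(
      UpgradedEventType.NodeCreated, UpgradedEventType.NodeDeleted,
      UpgradedEventType.NodeDataChanged, UpgradedEventType.NodeChildrenChanged);
  static final EnumSet<UpgradedEventType> IGNORED =
      EnumSet.of(UpgradedEventType.None);

  // Returns every enum value that belongs to neither set.
  static EnumSet<UpgradedEventType> unclassified() {
    EnumSet<UpgradedEventType> known = EnumSet.copyOf(EXPECTED);
    known.addAll(IGNORED);
    return EnumSet.complementOf(known);
  }

  // Three-way dispatch: handle, deliberately ignore, or flag as an error.
  static String classify(UpgradedEventType type) {
    if (EXPECTED.contains(type)) {
      return "handle";
    } else if (IGNORED.contains(type)) {
      return "ignore";
    }
    return "error";
  }
}
```

Here unclassified() is non-empty because SomethingNew is in neither set, so a 
test asserting that it is empty would fail right after the upgrade, and at 
runtime classify() reports the new value as an error rather than silently 
dropping it.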


> DistributedQueue spinning on calling zookeeper getChildren()
> ------------------------------------------------------------
>
>                 Key: SOLR-6631
>                 URL: https://issues.apache.org/jira/browse/SOLR-6631
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>            Reporter: Jessica Cheng Mallet
>            Assignee: Timothy Potter
>              Labels: solrcloud
>         Attachments: SOLR-6631.patch
>
>
> The change from SOLR-6336 introduced a bug where now I'm stuck in a loop 
> making getChildren() request to zookeeper with this thread dump:
> {quote}
> Thread-51 [WAITING] CPU time: 1d 15h 0m 57s
> java.lang.Object.wait()
> org.apache.zookeeper.ClientCnxn.submitRequest(RequestHeader, Record, Record, 
> ZooKeeper$WatchRegistration)
> org.apache.zookeeper.ZooKeeper.getChildren(String, Watcher)
> org.apache.solr.common.cloud.SolrZkClient$6.execute()<2 recursive calls>
> org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkOperation)
> org.apache.solr.common.cloud.SolrZkClient.getChildren(String, Watcher, 
> boolean)
> org.apache.solr.cloud.DistributedQueue.orderedChildren(Watcher)
> org.apache.solr.cloud.DistributedQueue.getChildren(long)
> org.apache.solr.cloud.DistributedQueue.peek(long)
> org.apache.solr.cloud.DistributedQueue.peek(boolean)
> org.apache.solr.cloud.Overseer$ClusterStateUpdater.run()
> java.lang.Thread.run()
> {quote}
> Looking at the code, I think the issue is that LatchChildWatcher#process 
> always stores the event in its member variable, regardless of the event's 
> type. Once that member is set, await no longer waits. In this state, the 
> while loop in getChildren(long), when called with wait being Long.MAX_VALUE, 
> loops back and does NOT block in await because event != null, yet it still 
> does not get any children.
> {quote}
> while (true) {
>   if (!children.isEmpty()) break;
>   watcher.await(wait == Long.MAX_VALUE ? DEFAULT_TIMEOUT : wait);
>   if (watcher.getWatchedEvent() != null)
>     { children = orderedChildren(null); }
>   if (wait != Long.MAX_VALUE) break;
> }
> {quote}
> I think the fix would be to only set the event in the watcher if the type is 
> not None.


