Under certain circumstances, the ServiceDiscoveryManager internal
LookupCache implementation can incorrectly process attribute
change events before the lookup snapshot is processed.
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Key: RIVER-324
URL: https://issues.apache.org/jira/browse/RIVER-324
Project: River
Issue Type: Bug
Components: net_jini_lookup
Affects Versions: AR1
Reporter: Brian Murphy
Priority: Minor
Fix For: AR2
Attachments: river-324-2.diff, river-324.patch
When an attribute change event is received from the
lookup service between the time the cache registers
the event listener and the time the initial LookupTask takes
the snapshot of the associated service state, the change event can
be processed first, which can leave the cache with incorrect attribute state.
This bug has been observed in a currently deployed
system, generally at startup when the services of
the system are changing their attributes from an
initial 'unknown' state to a discovered state that is shared
among those services. What has been
observed is a sequence like the following:
1. event registration is sent to the lookup service
2. snapshot is requested (LookupTask is queued)
3. the lookup service sends back, in the requested
snapshot, the initial state the service registered
for itself
4. the service sends an attribute modification request to the
lookup service, which sends an
attribute change event to the cache
5. before the cache's LookupTask processes the snapshot from
the lookup service, the event
arrives and the event processing thread of the
cache processes the event containing the latest
state of the service's attributes.
6. the cache then processes the snapshot, replacing
the latest, most up-to-date attribute state with
the original, initial state reflected in the
snapshot.
7. the cache now has an incorrect view of the
service's state.
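
The ordering in steps 5 through 7 can be reproduced in isolation. The
following is a minimal sketch, not the actual LookupCache code: the class
StaleSnapshotDemo, its methods, the service id "svc-1", and the attribute
values "unknown" and "discovered" are all hypothetical names chosen for
illustration. It assumes only that snapshot processing overwrites whatever
the cache currently holds.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the cache; NOT the real LookupCache implementation.
class StaleSnapshotDemo {
    // service id -> attribute state currently held by the cache
    private final Map<String, String> cache = new HashMap<>();

    // Models the event processing thread applying an attribute change event.
    void processEvent(String serviceId, String newAttrs) {
        cache.put(serviceId, newAttrs);
    }

    // Models snapshot processing, assumed here to overwrite unconditionally.
    void processSnapshot(Map<String, String> snapshot) {
        cache.putAll(snapshot);
    }

    String attrsOf(String serviceId) {
        return cache.get(serviceId);
    }

    // Replays the problematic ordering from the sequence above.
    static String runBuggyOrdering() {
        StaleSnapshotDemo demo = new StaleSnapshotDemo();

        // Step 3: the snapshot is taken while the service is still in its initial state.
        Map<String, String> snapshot = new HashMap<>();
        snapshot.put("svc-1", "unknown");

        // Step 5: the change event is processed before the snapshot.
        demo.processEvent("svc-1", "discovered");

        // Step 6: the older snapshot replaces the newer state.
        demo.processSnapshot(snapshot);

        // Step 7: the cache now holds the stale value.
        return demo.attrsOf("svc-1");
    }

    public static void main(String[] args) {
        System.out.println(runBuggyOrdering()); // prints: unknown
    }
}
```

In this model the cache ends with "unknown" even though the service has
already moved to "discovered".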
Bob Scheifler has implemented a simple fix, which
is (quoting Bob) "to have the LookupTask execute the tasks it
creates directly, rather than queueing
them." That is, force any pending snapshot processing
tasks to be executed before the event processing
tasks.
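
The effect of executing directly rather than queueing can be sketched with
a simplified model. This is an illustration under stated assumptions, not
the attached patch: it assumes a single FIFO task queue drained by one
thread, and the class DirectExecutionSketch, the flag executeDirectly, and
the task names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a task queue; NOT the real ServiceDiscoveryManager code.
class DirectExecutionSketch {

    // Returns the attribute state the cache ends up with for "svc-1".
    static String run(boolean executeDirectly) {
        Map<String, String> cache = new HashMap<>();
        Deque<Runnable> queue = new ArrayDeque<>();

        // Task created by the LookupTask to apply the (older) snapshot state.
        Runnable snapshotTask = () -> cache.put("svc-1", "unknown");

        // Task created for the attribute change event (newer state).
        Runnable eventTask = () -> cache.put("svc-1", "discovered");

        // The LookupTask either runs its sub-task inline or queues it.
        Runnable lookupTask = () -> {
            if (executeDirectly) {
                snapshotTask.run();           // fixed: runs before anything queued later
            } else {
                queue.addLast(snapshotTask);  // buggy: lands behind the event task
            }
        };

        // The LookupTask is queued first; the event arrives before it runs.
        queue.addLast(lookupTask);
        queue.addLast(eventTask);

        // Drain the queue in FIFO order on a single thread.
        Runnable next;
        while ((next = queue.pollFirst()) != null) {
            next.run();
        }
        return cache.get("svc-1");
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // prints: unknown
        System.out.println(run(true));  // prints: discovered
    }
}
```

With queueing, the snapshot task ends up behind the event task and
overwrites the newer state; with direct execution, the snapshot is applied
first and the event then brings the cache up to date.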
Note that with the proposed fix, if more than
one lookup service is running, it is possible for
an attribute to "regress", because the lookup services
do not receive a given attribute change at exactly
the same time. The inconsistency will eventually correct
itself as the cache receives each attribute
change event, so it should not be a permanent condition.