Re: A new implementation of TaskManager

Peter Firmstone Thu, 08 Jul 2010 03:25:54 -0700

Actually, it might be easier to just treat that as a separate issue for now, which it mostly is, and do what you had planned with TaskManager; tackle that one later if you want, when you've had some more time to digest the codebase. My apologies, the last thing I want to do is scare you off.


Peter.


Peter Firmstone wrote:

Thanks Patricia, your effort and conviction are much appreciated. River doesn't have bugs, it's got alligators; they're a little harder to squash, but good for a challenge.
Cheers,

Peter.

Patricia Shanahan wrote:
I need to study this. I'll comment when I know more.

Patricia


Peter Firmstone wrote:
Hi Patricia,

This is an example of some timing difficulties, a bug involving Task.
Perhaps Task can extend Remote? Then we can pass them around as distributed objects, which will either be a local piece of proxy code executing or a stub. That was one advantage of allowing Task to contain its dependencies. If it's sent elsewhere to other nodes, they can add it to their Task dependencies and the result can be retrieved remotely. Perhaps with a getResult() method like RunnableFuture has.
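A minimal sketch of that idea, under stated assumptions: `RemoteTask` and `SquareTask` are hypothetical names that do not exist in River, and the interface below does not extend River's own Task interface. It only shows the shape of a task that is a `java.rmi.Remote` and exposes its result. Note that `RunnableFuture` itself exposes the result through `get()`, which blocks; the `getResult()` here simply returns null until the task has run.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;

// Hypothetical interface: a task that can be exported and called through a stub.
// Every method of a Remote interface must declare RemoteException.
interface RemoteTask<V> extends Remote {
    void run() throws RemoteException;

    // Returns the result, or null if run() has not completed yet.
    V getResult() throws RemoteException;
}

// Hypothetical local implementation, used here without exporting it.
class SquareTask implements RemoteTask<Integer> {
    private final int input;
    private Integer result;

    SquareTask(int input) {
        this.input = input;
    }

    @Override
    public synchronized void run() {
        result = input * input;
    }

    @Override
    public synchronized Integer getResult() {
        return result;
    }
}
```

Called locally, the object behaves like an ordinary task; once exported, the same two methods would be invoked through a stub, which is where the DGC issue described below would matter.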
Of course there are other ways, just passing on thoughts & knowledge, for problem solving.
There seems to be a GC & concurrency bug in DGC (Distributed Garbage Collection) reported on the list; I'll dig up the details and create a JIRA issue for it. It causes an exported object to be garbage collected before a stub can contact it; this is for a distributed object that isn't registered as a service. That bug would cause problems for a Remote Task, if it were to be implemented, among other things, so it needs to be fixed.
Cheers,

Peter.


Bob Scheifler (JIRA) wrote:
[https://issues.apache.org/jira/browse/RIVER-324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
Bob Scheifler reopened RIVER-324:
---------------------------------

      Assignee:     (was: Brian Murphy)

Original fix had a nasty flaw.  Fix to fix has been attached.
Under certain circumstances, the ServiceDiscoveryManager internal LookupCache implementation can incorrectly process attribute change events before the lookup snapshot is processed.
---------------------------------------------------------------------------
                Key: RIVER-324
                URL: https://issues.apache.org/jira/browse/RIVER-324
            Project: River
         Issue Type: Bug
         Components: net_jini_lookup
   Affects Versions: AR1
           Reporter: Brian Murphy
           Priority: Minor
            Fix For: AR2

        Attachments: river-324-2.diff, river-324.patch


When an attribute change event is received from the
lookup service between the time the cache registers
the event listener and the initial LookupTask takes
the snapshot of the associated service state, the change event can get processed first, which can result in incorrect attribute state.
This bug has been observed in a currently deployed
system, generally at startup when the services of
the system are changing their attributes from an
initial, 'unknown' state, to a discovered state that is shared among those services. What has been
observed is a sequence like the following:
1. event registration is sent to the lookup service
2. snapshot is requested (LookupTask is queued)
3. the lookup service sends back in the requested
   snapshot, the initial state the service registered
   for itself
4. the service sends an attribute modification request to the lookup service, which sends an
   attribute change event to the cache
5. before the cache's LookupTask processes the snapshot from the lookup service, the event
   arrives and the event processing thread of the
   cache processes the event containing the latest
   state of the service's attributes.
6. the cache then processes the snapshot, replacing
   the latest, most up-to-date attribute state with
   the original, initial state reflected in the
   snapshot.
7. the cache now has an incorrect view of the
   service's state.
Bob Scheifler has implemented a simple fix, which
is (quoting Bob), "to have the LookupTask execute the tasks it creates directly, rather than queueing
them." That is, force any pending snapshot processing
tasks to be executed before the event processing
tasks.
Note that with the proposed fix, if more than
one lookup service is running, it is possible for
an attribute to "regress" as the lookup services
do not receive a given attribute change at exactly
the same time, but the inconsistency will eventually correct itself as the cache receives each attribute
change event, and so should not be a permanent condition.
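
The ordering problem in steps 1 to 7, and the effect of the fix, can be replayed in a small single-threaded model. This is only a sketch under stated assumptions: `SnapshotRaceSketch`, the `"svc"` key, and the plain `Deque` standing in for the cache's task queue are all hypothetical, and the real ServiceDiscoveryManager code is multi-threaded and considerably more involved. The sketch shows only that a snapshot task queued behind an already-queued event task overwrites newer state, while running it directly from the LookupTask does not.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

class SnapshotRaceSketch {

    // Replays the scenario and returns the cache's final view of the service.
    // executeDirectly = false models the original behaviour (sub-task is queued);
    // executeDirectly = true models the fix (sub-task is run by the LookupTask).
    static String run(boolean executeDirectly) {
        Map<String, String> cache = new HashMap<>();
        Deque<Runnable> queue = new ArrayDeque<>();

        // Sub-task created by the LookupTask: applies the (older) snapshot state.
        Runnable applySnapshot = () -> cache.put("svc", "unknown");
        // Event task: applies the (newer) attribute change.
        Runnable applyEvent = () -> cache.put("svc", "discovered");

        Runnable lookupTask = () -> {
            if (executeDirectly) {
                applySnapshot.run();           // fix: run the created task in place
            } else {
                queue.addLast(applySnapshot);  // original: queue it behind the event
            }
        };

        queue.addLast(lookupTask);  // step 2: snapshot requested, LookupTask queued
        queue.addLast(applyEvent);  // steps 4-5: change event arrives and is queued

        // Drain the queue in FIFO order, as a single worker would.
        while (!queue.isEmpty()) {
            queue.pollFirst().run();
        }
        return cache.get("svc");
    }

    public static void main(String[] args) {
        System.out.println("queued: " + run(false)); // prints "queued: unknown"
        System.out.println("direct: " + run(true));  // prints "direct: discovered"
    }
}
```

In the queued variant the stale snapshot is applied last, which matches step 6; in the direct variant the snapshot is applied before the event, so the newer state survives.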
