Scheduler Resilience
- Use a heuristic (user-supplied?) within the GThread class to estimate
when a thread has taken too long.  Re-launch it with a greater running
time allowance until it completes (or until the max. # of launches is
exceeded).

- Running multiple instances of the same thread on different executors
would be great, esp. if the # of instances is an option in the Console.

- Add Round Robin scheduling soon, so that multiple CPU-intensive
applications can run side by side instead of in strict FIFO order.
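
To make the re-launch idea concrete, here is a rough sketch in Python
(not Alchemi code).  The names run_until_complete and simulated_thread,
the doubling growth factor and the default of 3 launches are all
assumptions for illustration; the real heuristic would live in GThread
or be user-supplied.

```python
def run_until_complete(launch, initial_allowance, growth=2.0, max_launches=3):
    """Call launch(allowance) until it succeeds or the launch budget runs out.

    launch is a hypothetical callable that runs the thread with the given
    time allowance and raises TimeoutError if the allowance is exceeded.
    Returns (result, number_of_launches_used).
    """
    allowance = initial_allowance
    for attempt in range(1, max_launches + 1):
        try:
            return launch(allowance), attempt
        except TimeoutError:
            # Assumed policy: grow the allowance geometrically before retrying.
            allowance *= growth
    raise RuntimeError(f"gave up after {max_launches} launches")


def simulated_thread(required_seconds):
    """Stand-in for a grid thread that needs required_seconds to finish."""
    def launch(allowance):
        if allowance < required_seconds:
            raise TimeoutError
        return "done"
    return launch


if __name__ == "__main__":
    # Allowances tried: 2, 4, 8 -> succeeds on the third launch.
    print(run_until_complete(simulated_thread(5), initial_allowance=2))
```

Growing the allowance on each launch means a poor first estimate costs a
few wasted launches rather than a thread that never finishes.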


Executor Resilience
- A thread within the Manager to monitor the state of each Executor; it
could use frequent pings or some other message system.  One thread in
the Manager talks to a Responder thread in the Executor.
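
A minimal sketch of the Manager-side bookkeeping, in Python rather than
actual Alchemi code.  ExecutorMonitor, record_pong and the timeout value
are hypothetical names and numbers; the transport for the pings is left
out entirely.

```python
import time


class ExecutorMonitor:
    """Tracks when each executor last answered a ping."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self.clock = clock          # injectable so the logic can be tested
        self.last_seen = {}         # executor id -> time of last reply

    def record_pong(self, executor_id):
        """Call whenever the Responder thread of an executor replies."""
        self.last_seen[executor_id] = self.clock()

    def dead_executors(self):
        """Executors that have been silent for longer than the timeout."""
        now = self.clock()
        return sorted(
            executor_id
            for executor_id, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )


if __name__ == "__main__":
    fake_now = [0.0]
    monitor = ExecutorMonitor(timeout_seconds=10, clock=lambda: fake_now[0])
    monitor.record_pong("exec-a")
    monitor.record_pong("exec-b")
    fake_now[0] = 8.0
    monitor.record_pong("exec-b")    # exec-b keeps answering
    fake_now[0] = 15.0
    print(monitor.dead_executors())  # exec-a has been silent for 15 s
```

The monitoring thread would call dead_executors() periodically and hand
the result to whatever re-schedules the affected threads.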

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
Tibor Biro
Sent: Thursday, January 26, 2006 11:15 AM
To: [email protected]
Subject: [Alchemi-developers] Improving Alchemi resilience


Hi all,

I am looking for ideas to improve the Alchemi resilience. 

As a first step I am trying to identify areas where improvements are
necessary. Since this is just brainstorming no idea is too wild or silly
so if you have any let's hear them.

Scheduler features that improve resilience:
- schedule a thread on multiple executors, take the response from the
first one. This improves the chances of a thread being executed.
- schedule a thread on multiple executors and compare the results before
returning to the application. This improves the quality of the
computation done by executors and weeds out executors that corrupt data.
- wait a given amount of time for a thread to be executed and if no
response is received then re-schedule the thread on another executor.
This is an optimistic implementation of the first variation.
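
The compare-the-results variation could look roughly like the following
Python sketch.  The function name compare_results, the quorum parameter
and the majority-vote rule are assumptions for illustration; results are
assumed to be hashable values that can be compared for equality.

```python
from collections import Counter


def compare_results(results_by_executor, quorum=2):
    """Pick the result most executors agree on and flag the dissenters.

    results_by_executor maps an executor id to the result it returned.
    Returns (agreed_result, suspect_executor_ids).
    """
    if not results_by_executor:
        raise ValueError("no results to compare")
    counts = Counter(results_by_executor.values())
    winner, votes = counts.most_common(1)[0]
    if votes < quorum:
        raise ValueError("no result reached the quorum")
    suspects = sorted(
        executor_id
        for executor_id, result in results_by_executor.items()
        if result != winner
    )
    return winner, suspects


if __name__ == "__main__":
    print(compare_results({"exec-a": 42, "exec-b": 42, "exec-c": 41}))
```

Executors that repeatedly end up in the suspect list would be candidates
for the "weeds out executors that corrupt data" step.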

Executor features that improve resilience:
- if the executor is shut down nicely release the running thread back to
the manager.
- if the executor is killed then release the running thread back to the
manager on startup.
- if connection to the manager is lost due to network issues, the
manager being down or whatever then re-connect and continue working on
the existing thread.

Manager features that improve resilience:
- detect dead executors and re-schedule their threads.
- detect dead applications and stop running their threads.
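
Re-scheduling the threads of dead executors might be sketched like this
in Python.  The data structures (a dict of thread-to-executor
assignments and a deque of pending threads) and the choice to put
orphaned threads at the front of the queue are assumptions, not a
description of the current Manager.

```python
from collections import deque


def reschedule_dead(assignments, dead_executors, pending):
    """Move threads assigned to dead executors back into the pending queue.

    assignments maps thread id -> executor id and is modified in place.
    Orphaned threads go to the front of pending, in their original order,
    so they are retried before work that has not started yet.
    """
    orphaned = [
        thread_id
        for thread_id, executor_id in assignments.items()
        if executor_id in dead_executors
    ]
    for thread_id in orphaned:
        del assignments[thread_id]
    pending.extendleft(reversed(orphaned))
    return orphaned


if __name__ == "__main__":
    assignments = {"t1": "exec-a", "t2": "exec-b", "t3": "exec-a"}
    pending = deque(["t4"])
    reschedule_dead(assignments, {"exec-a"}, pending)
    print(list(pending), assignments)
```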

Tibor




-------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc. Do you grep through log
files for problems?  Stop!  Download the new AJAX search engine that
makes searching your log files as easy as surfing the web.
DOWNLOAD SPLUNK!
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=103432&bid=230486&dat=121642
_______________________________________________
Alchemi-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/alchemi-developers



