https://github.com/airavata-courses/spring17-workload-management/wiki/%5BFinal%5D-Centralized-architecture-for-workload-management

So from the design presented in this link: "
How do we upgrade a worker, say with a new task ‘E’ implementation, in such a 
manner that if something goes wrong with code for ‘E’, the entire worker node 
should not fail? In short, avoid regression testing the entire worker module."
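One common way to keep a faulty task implementation from taking down the whole worker is to run each task behind its own exception boundary, so a bug in 'E' only marks that job as failed. The sketch below is a minimal illustration assuming an in-process registry of task handlers; `HANDLERS`, `task_a`, `task_e` and `run_task` are hypothetical names, not part of the linked design.

```python
from typing import Callable, Dict, Tuple

# Hypothetical handler type: takes a job payload, returns a result string.
TaskHandler = Callable[[dict], str]


def task_a(payload: dict) -> str:
    return f"A done: {payload['x']}"


def task_e(payload: dict) -> str:
    # Simulates a buggy new implementation of task 'E'.
    raise RuntimeError("bug in new task E implementation")


# Hypothetical registry mapping task type names to their implementations.
HANDLERS: Dict[str, TaskHandler] = {"A": task_a, "E": task_e}


def run_task(task_type: str, payload: dict) -> Tuple[str, str]:
    """Run one task; any exception is contained and reported as a failure."""
    handler = HANDLERS.get(task_type)
    if handler is None:
        return ("failed", f"no handler for task type {task_type!r}")
    try:
        return ("finished", handler(payload))
    except Exception as exc:  # the failure stays isolated to this one job
        return ("failed", f"{type(exc).__name__}: {exc}")


if __name__ == "__main__":
    # 'E' fails, but the worker keeps going and still runs the later 'A'.
    for task_type in ["A", "E", "A"]:
        print(task_type, run_task(task_type, {"x": 1}))
```

A try/except only contains Python exceptions; a crash, hang, or memory blow-up in the new code would still affect the worker process. Running each task in a separate process (for example with `multiprocessing` or `subprocess`) gives stronger isolation at the cost of extra overhead.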



I was thinking that we could create a queue in the worker class. It can keep 
track of which jobs are entering, which are currently being processed, which 
have failed, and which have finished. Once a job is finished, we don't have to 
report to the scheduler. If a job fails, we can tell the scheduler to put it 
back in the queue. However, another issue can arise: if that particular machine 
is the only one that handles that type of job, a failing job can keep cycling 
between the scheduler and that worker indefinitely. For that, I'm thinking of 
some sort of unique key for every job, so that repeated failures of the same 
job can be recognized. What am I missing, and do you have any recommendations?
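A minimal Python sketch of this proposal, assuming a single scheduler and a single worker in one process. `Job`, `JobState`, `Scheduler`, `Worker`, `run`, `max_attempts` and the `dead_letter` list are hypothetical names, not part of the linked design. The unique key alone identifies a job but does not by itself stop the loop; the retry cap (`max_attempts`) is an assumed addition that uses that key's per-job attempt count to stop resubmitting.

```python
import uuid
from collections import deque
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List


class JobState(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    FAILED = "failed"
    FINISHED = "finished"


@dataclass
class Job:
    task_type: str
    payload: dict
    # Unique key assigned once at creation, so every retry of this job
    # carries the same id and can be recognized.
    job_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    attempts: int = 0


class Scheduler:
    """Hypothetical scheduler that requeues failed jobs a bounded number of times."""

    def __init__(self, max_attempts: int = 3):
        self.max_attempts = max_attempts
        self.queue: deque = deque()
        self.dead_letter: List[Job] = []  # jobs that exhausted their retries

    def submit(self, job: Job) -> None:
        self.queue.append(job)

    def report_failure(self, job: Job) -> None:
        # Without this cap, a job that only one (broken) worker can run
        # would cycle between scheduler and worker forever.
        if job.attempts < self.max_attempts:
            self.queue.append(job)
        else:
            self.dead_letter.append(job)


class Worker:
    def __init__(self, handlers: Dict[str, Callable[[dict], object]]):
        self.handlers = handlers
        self.incoming: deque = deque()  # jobs accepted but not yet started
        self.states: Dict[str, JobState] = {}  # job_id -> current state

    def accept(self, job: Job) -> None:
        self.incoming.append(job)
        self.states[job.job_id] = JobState.QUEUED

    def run_next(self, scheduler: Scheduler) -> None:
        job = self.incoming.popleft()
        self.states[job.job_id] = JobState.RUNNING
        job.attempts += 1
        try:
            # An unknown task type raises KeyError and is treated as a failure.
            self.handlers[job.task_type](job.payload)
        except Exception:
            self.states[job.job_id] = JobState.FAILED
            scheduler.report_failure(job)
        else:
            # As proposed, a finished job is not reported back to the scheduler.
            self.states[job.job_id] = JobState.FINISHED


def run(scheduler: Scheduler, worker: Worker) -> None:
    """Dispatch jobs one at a time until the scheduler's queue is empty."""
    while scheduler.queue:
        worker.accept(scheduler.queue.popleft())
        worker.run_next(scheduler)
```

With `max_attempts=3`, a job whose handler always raises is tried three times and then lands in `dead_letter`, while jobs of other types still finish normally. Parking the job in a dead-letter list, rather than dropping it, leaves it available for someone to inspect once the faulty implementation is fixed.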

