On Apr 30, 2009, at 5:48 AM, [email protected] wrote:

> jm7
>
>> 1) We do it too often (event driven)
>
> Exactly what we are not listening to. The rate of tests is NOT the
> reason for incorrect switches.
No, but it is the reason we have difficulty finding them. And it is a source of instability. John, you can stick your head in the sand all you want; it will not make the problems go away just because you refuse to see them.

Also, because you schedule "globally," and because you recalculate based on the situation as it is NOW, the universe is different every time you recalculate; the situation evolves if for no other reason than that work has been done in the meantime. If you are doing that 10 times a minute, you are going to get 10 different answers. Those answers MAY be close enough that no change is needed under the rules as they stand, but coupled with the other limitations this is an issue. I did show a very specific example of this effect: task A completes and B starts; A's upload completes, B is suspended and C is started; another task D completes and E is started; D's upload completes, E is suspended and F is started.

On the last point I will remind you once again: I may not be able to walk straight anymore, and I sometimes have trouble talking, but I am a trained and skilled systems engineer. This is what I used to do. I know I cannot put my finger on a line in a log to convince you or anyone else, but this is a problem. It is a problem because it loads up the logs with unneeded entries, and it is also a cause of some of the instability we see. Anyone who works with unstable systems knows that bumping an unstable system causes problems; the more you bump it, the faster those problems arise.

>> 2) All currently running tasks are eligible for preemption
>
> Not completely true, and not the problem. Tasks that are not in the list
> of want to run are preemptable, tasks that are in the list of want to run
> are preemptable. They should only be preempted if either that task is
> past its TSI, or there is a task with deadline trouble (please work with
> me on the definition of deadline trouble).

Which means you have not looked at the code. The first loop in the code marks the next state of ALL running tasks as preempted. Dr. Anderson made a change that was supposed to cure that, but it does not.

>> 3) TSI is not respected as a limiting factor
>
> It cannot be in all cases. There may be more cases where the TSI could
> be honored.

For the reason above, it is not honored at all. I have pointed to the block of code where all tasks are marked for preemption, and that, my friend, means that TSI is not considered at all ... Again, you are thinking in terms of single-stream systems, and on those I agree that this is the case. On multi-core systems it is much less of an issue, to the point where it might never be an issue at all.

Take an 8-core system where all running tasks are 8 hours in length: the average time between task completions is 1 hour. Assuming the system has been running for a while, that is what statistics tells me. With the mix of task lengths I see on my systems the situation is usually much better than that. See the numbers below. In one of my first posts I actually listed the numbers of tasks and the run times ... but the numbers below are illustrative enough.
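To put numbers on that, here is a rough sketch of the arithmetic (my own illustration, nothing from the client code): if a host keeps N tasks of roughly equal length running at once and has been doing so for a while, completions land about task_length / N apart.

    #include <cstdio>

    // Rough estimate of the mean time between task completions on a host
    // that keeps 'cores' tasks of length 'task_hours' running at once.
    // After the host has been running a while, the tasks' finish times are
    // roughly uniformly staggered, so the mean gap is task length / cores.
    static double mean_completion_gap_hours(int cores, double task_hours) {
        return task_hours / cores;
    }

    int main() {
        // The 8-core, 8-hour case above: about one hour between completions.
        std::printf("8 cores, 8 h tasks: %.2f h between completions\n",
                    mean_completion_gap_hours(8, 8.0));
        // Shorter tasks (or more cores) shrink the gap further, which is why
        // a pending task rarely waits long for a free element on a wide box.
        std::printf("8 cores, 1 h tasks: %.2f h between completions\n",
                    mean_completion_gap_hours(8, 1.0));
        return 0;
    }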
>> 4) TSI is used in calculating deadline peril
>
> And it has to be. Since tasks may (or may not) be re-scheduled at all
> during a TSI, and the TSI may line up badly with a connection, the TSI is
> an important part of the calculation.
>
> Example:
> 12 hour TSI.
> 1 hour CPU time left on the task.
> 12 hours and 1 second left before deadline.
> No events for the next 12 hours.
> Without TSI in the calculation, there is the distinct possibility that
> there is no deadline trouble recorded.
> Wait 12 hours.
> You now have 1 second wall time left and 1 hour CPU time left. Your task
> is now late.
>
> With TSI in the calculation:
> Deadline trouble is noted at the point 12 hours and 1 second before
> deadline (if not somewhat earlier depending on other load). The task gets
> started and completes before deadline.

Proving once again that you are thinking of systems that run a single processing stream. I suppose you forgot my last test, where you did not want to read the numbers. Or the test before that. In the first test the average time between completions of the tasks run was 6 minutes (measured over 24 hours). In the other test, the counts of "Request CPU reschedule: handle_finished_apps" events were 3, 11, 14, 22, and 19 over a three-hour period, on 4, 4, 8, 4, and 8 CPU systems respectively. That means the time between one completed task and the next was at worst 60 minutes and at best about 8 minutes (6 minutes in the first test). Your theory falls apart because when the next task completes, the pending task can be picked up and scheduled next.

We are not talking about scheduling problems on single-core systems. It would be nice if you would keep that in mind. We are talking about parameters developed to control scheduling on single-thread systems being inappropriate on multi-core systems.

>> 5) Work mix is not kept "interesting"
>> 6) Resource Share is used in calculating run time allocations
>
> A simulation that tracks what the machine is likely to actually do has to
> track what happens based on resource share. It may not want to be the
> trigger for instant preemption though.

Sadly, it does exactly that right now: it triggers preemption at the slightest breeze. Last night I had 5 uFluids tasks all running in parallel because the scheduler decided that the deadline of 5/13 could not be met. It ran those tasks for several hours before I suspended most of them. Later it suspended the one it was still running, and late last night I unsuspended all of them again. They are STILL waiting to be restarted. Because their deadlines are close together, the mechanisms used to calculate "globally" will always select these tasks in batches and screw up the work mix, which means that my i7 runs in a mode that is significantly less efficient. This is also why I have proposed other metrics and rules for these decisions, to reduce how much Resource Share drives the selection process.

>> 7) Work "batches" (tasks with roughly similar deadlines) are not "bank
>> teller queued"
>
> I really don't understand this one. A bank teller queue means that tasks
> come from one queue and are spread across the available resources as they
> become available. Are they always run in FIFO? No. However, that does
> not mean that they are not coming from the same queue.

Probably because you keep refusing to read what I write carefully. See the example above. If you schedule "globally," as you so love to do, then tasks with close deadlines and relatively low Resource Shares will always cause these panics. I get them for IBERCIVIS, VTU, and just recently uFluids.
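Since "bank teller queue" keeps getting misread, here is a minimal sketch of what I mean; the structure and task names are mine for illustration, not the client's code. One deadline-ordered queue feeds whichever processing element frees up next, and nothing already running is preempted to make room.

    #include <cstdio>
    #include <queue>
    #include <string>
    #include <vector>

    // Hypothetical task record: a name and a deadline in seconds from now.
    struct Task {
        std::string name;
        double deadline;
    };

    // Comparator so the shared queue hands out the earliest deadline first.
    struct LaterDeadline {
        bool operator()(const Task& a, const Task& b) const {
            return a.deadline > b.deadline;
        }
    };

    int main() {
        // One queue feeds all processing elements ("teller windows").
        std::priority_queue<Task, std::vector<Task>, LaterDeadline> waiting;
        waiting.push({"uFluids_1",   3600.0 * 24 * 5});
        waiting.push({"NQueens_1",   3600.0 * 24 * 2});
        waiting.push({"IBERCIVIS_1", 3600.0 * 24 * 9});

        const int freed_elements = 2;   // elements that just finished a task
        for (int i = 0; i < freed_elements && !waiting.empty(); ++i) {
            Task next = waiting.top();
            waiting.pop();
            // A task is dispatched only when an element frees up; the tasks
            // still running on the other elements are left alone.
            std::printf("dispatch %s (deadline in %.0f s)\n",
                        next.name.c_str(), next.deadline);
        }
        return 0;
    }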
>> 8) History of work scheduling is not preserved and all deadlines are
>> calculated fresh each invocation.
>
> Please explain why this is a problem? The history of work scheduling may
> have no bearing on what has to happen in the future.

See above. It also leads to other instabilities that you don't want to recognize. When I re-enabled the uFluids tasks that were such a cause for panic yesterday, it sure would seem that they should be a cause for panic today. I have an NQueens task that was suspended yesterday with 12 minutes left to run, and it still has not restarted. If it was so important to start it yesterday and run it up to that point, why, 24 hours later, has BOINC been running off tasks from projects it has just downloaded work from, tasks with deadlines that are later?

>> 9) True deadline peril is rare, but "false positives" are common
>
> Methods that defer leaving RR for a long time will increase true deadline
> peril. What is needed is something in between.

Again, the systems of which we speak tend to complete tasks fast enough that this argument makes no sense. With resources coming free in minutes, on average, there is no chance that this is going to be as common as you posit. Again and again, you are thinking of the old, slow systems, and when you refuse to consider the evidence that people like Richard and I supply, well ... I know it is harder to see on a 4-core system, though I did notice these issues in 2005 after I had gotten my first 4-CPU system (the first two systems in the test above), but you can see it if you watch the patterns of operation.

>> 10) Some of the sources of work peril may be caused by a defective
>> work fetch allocation
>
> Please give examples from logs.

I don't have to. You have described, over and over again, why every suggested change cannot work because of these very issues. Go back and look at your examples. Virtually all of them involve BOINC downloading work that all of a sudden causes this magical situation where I have to madly start processing the new work because BOINC fetched something that causes the world to change. Ergo, if BOINC had not fetched that work, the problem would not have occurred and the universe would not be ending. Even so, many of those examples of panics are still modeled on only a single stream of work processing.

>> 11) Other factors either obscured by the above, I forgot them, or maybe
>> nothing else ...
>>
>>> work-fetch decisions
>>
>> Seems to be related to:
>>
>> 1) Bad debt calculations
>> 2) Asking for inappropriate work loads
>> 3) Asking for inappropriate amounts
>
> Please give examples.

I have, any number of times. I could send you another long log showing that the CUDA debt is slowly building and in another 24 hours or so will be so out of whack that the client stops asking for work from GPU Grid, the only project from which GPU work can be fetched, while BOINC happily ignores all evidence to the contrary, tries to get CUDA work from every other project in the universe, and pouts because it cannot get it. There is the Rosetta guy who cannot get a queue full of Rosetta work because of the opposite problem (he is only attached to GPU Grid and Rosetta), and there are Richard's logs where he needs one class of work and the work fetch asks for the wrong kind. Others have mentioned the next one before: I ask for 1 second of work and instead of getting one task I get 10 or more. That is a long-standing problem and the issue is on the server end, but it is still a problem.
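To illustrate the debt drift I am describing, here is a deliberately simplified model; the numbers are made up and the code is mine, not the client's debt calculation. The point is only that when CUDA debt is adjusted by resource share for every attached project while only one project can ever supply CUDA work, the debts of the CUDA-less projects climb without bound and work fetch keeps chasing them.

    #include <cstdio>

    int main() {
        // Simplified model: each hour the GPU does one hour of work for the
        // only project that has CUDA tasks, while long-term CUDA debt is
        // adjusted for every attached project according to resource share.
        const int n_projects = 4;            // project 0 is the only CUDA project
        double share[n_projects]     = {0.25, 0.25, 0.25, 0.25};
        double cuda_debt[n_projects] = {0, 0, 0, 0};

        for (int hour = 0; hour < 48; ++hour) {
            double work_done[n_projects] = {1.0, 0.0, 0.0, 0.0};
            for (int p = 0; p < n_projects; ++p) {
                // Debt grows by the share of work "owed" minus work actually done.
                cuda_debt[p] += share[p] * 1.0 - work_done[p];
            }
        }
        for (int p = 0; p < n_projects; ++p) {
            std::printf("project %d CUDA debt after 48 h: %+.1f\n", p, cuda_debt[p]);
        }
        // Project 0 ends deeply negative while the projects that can never
        // supply CUDA work keep climbing, so the client keeps asking them
        // for CUDA work and stops asking the one project that has it.
        return 0;
    }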
>> 4) Design of client / server interactions
>
> There are design constraints that limit the transaction to one round
> trip.

Actually, they are design choices. And they may or may not be the best choices. One of the recent examples and questions was why we feed the list of tasks up to the server each time. Another design choice. The server is supposed to use that information to make a good choice about the work it feeds down. If I understand the other proposal made recently, changes could be made to this exchange that might be beneficial. Changes which you have also rejected out of hand.

>>> bad debt calculation
>>
>> Seems to be related to:
>>
>> 1) Assuming that all projects have CUDA work and asking for it
>> 2) Assuming that a CUDA-only project has CPU work and asking for it.
>> 3) Not necessarily taking into account system width correctly
>
> I don't understand what you mean by system width.

More modern systems are faster, and they are also "wider," with more processing units. My i7 has 12 processing elements: 8 virtual CPUs and 4 GPU engines. I am actively considering a system with 16 CPUs and room for as many as 6 or 8 GPU cores, which could bring that number up to 24 elements. Because I have been struggling to get across that this changes the way work can be processed, I have been using this term a lot. Which tells me yet again that you have not actually been reading carefully what I have been writing. I know it is a PITA to read things carefully, but I am not wordy out of spite; I am wordy to be as clear as possible. Skimming proposals looking only for reasons to reject them is not actually that helpful.

>> 4) Not taking into account CUDA capability correctly
>>
>>> efficiency of the scheduling calculations (if it's an issue)
>>
>> It is, but you and other nay-sayers don't have systems that experience
>> the issues, so you and others denigrate or ignore the reports.
>
> Fix the algorithm FIRST, optimize SECOND.

Reducing the hit rate is not intended to optimize anything. Sadly, this is a point that I know I will never be able to prove to your satisfaction, and it is apparent that I cannot explain it well, though I have tried very hard to do so. But even with a perfect rule set, the system will retain its characteristic instability if we keep calling the scheduler at times when there is no specific need. I understand why some of those calls are made, but the way we proceed from there is the secondary cause.

And when I suggest that there may not be a specific need, you produce examples, time and again, where work is downloaded and you insist that the world is magically better if the schedule is checked instantaneously rather than 30 seconds later, once we can see how it is actually affected by the new work. With no evidence, I might add. Even your defunct project with 5-minute deadlines would only be affected if its tasks took 4 minutes and 59 seconds ... which means they would blow their deadlines anyway because of the latency of uploads and downloads. If the task were a reasonable 1 minute in length, the only effect of waiting 30 seconds to schedule it would be to trim the margin slightly. But the more cogent point is that you are offering a straw-man argument built on a project that essentially collapsed because it had unreasonable requirements. So why are we coding BOINC to handle unreasonable requirements from a project that no longer exists? That is a poser I cannot fathom.

The fact that reducing the call rate has the side effect of increasing efficiency is nice. But it is not the reason I have proposed it, and I wish you would stop pretending that it is.
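And to be clear about what "waiting 30 seconds" means in practice, something like the sketch below (the names are mine, not a patch against the client): note that a reschedule is wanted, hold off for a short window, and run the scheduler once for however many events piled up in that window.

    #include <cstdio>

    // Hypothetical debounce: instead of running the scheduler on every event,
    // record that a pass is wanted and run it at most once per hold-off window.
    struct RescheduleGate {
        double hold_off;    // seconds to wait after the first request
        double due_at;      // when the pending pass should run (-1 = none pending)
        int coalesced;      // how many requests this pass has absorbed

        void request(double now) {
            if (due_at < 0) due_at = now + hold_off;  // first request opens the window
            ++coalesced;                              // later requests just pile on
        }
        void poll(double now) {                       // called from the main loop
            if (due_at >= 0 && now >= due_at) {
                std::printf("run scheduler once for %d request(s)\n", coalesced);
                due_at = -1;
                coalesced = 0;
            }
        }
    };

    int main() {
        RescheduleGate gate{30.0, -1.0, 0};  // 30 second hold-off, as in the example
        // A completion, a finished upload, and a download arriving within a few
        // seconds become one scheduler pass instead of three.
        gate.request(0.0);
        gate.request(2.0);
        gate.request(5.0);
        for (double t = 0.0; t <= 60.0; t += 10.0) gate.poll(t);
        return 0;
    }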
In either case, the two main reasons to reduce the call rate are:

a) to lower the log clutter
b) to reduce the rate of false changes so that they are easier to identify

Your intransigence on this matter is nothing short of amazing. You complain about the large logs that obscure the very problems we are hunting, and yet you denigrate the one way we can start to get a handle on that very issue.

>> The worse point is that identifying some of the problems requires
>> logging, and because we do resource scheduling, for example, so often,
>> the logs get so big that they are not usable, all because we are
>> performing actions that ARE NOT NECESSARY ... because the assumption is
>> that there is no cost. But here is a cost right here. If we do resource
>> scheduling 10 times more often than needed, then there is 10 times more
>> data to sift. Which is the main reason I have harped on SLOWING THIS
>> DOWN.
>>
>> It is also why in my pseudo-code proposal I suggested that we do two
>> things: one, make it switchable so that we can start with a bare-bones
>> "bank teller" style queuing system and only add refinements as we see
>> where it does not work adequately. Let us not add more rules than
>> needed. Start with the simplest rule set possible, run it, find
>> exceptions, figure out why, fix those, move on ...
>
> In other words step back 5 years. We were there, and we had to add
> refinements to get it to work.

See, that is how we fixed it then; why are you so resistant to this approach now? Back then the most common system was single core, with some duals. And, as I point out, that was when I started to notice these issues on my 4-core system. Those issues were not handled back then and they are worse now ... So let's try a new mechanism for the wide systems, with as few rules as possible, and see if it works. If we can create situations where it starts to fail, well, then we add complexity. I suspect that many of the rules we have now will not be needed at all. In fact, I think that much of the complexity can go away, because now we can make choices that are not at all possible on single-thread machines.

> Let us not throw the baby out with the bath water.

If the baby is dead, why not? The problem is fundamentally that we developed elaborate rules to handle scheduling on single-thread machines. Duals made some of those rules passé, but the effects were almost unnoticeable. The effects started to become visible on 4-core systems and are now quite obvious on wider systems. This is one reason why, in my pseudo-code, I suggested that at least for the time being we keep the current scheduler for systems with fewer than 4 cores and try something new on the 4-core and wider systems, along the lines of the sketch below.
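To be concrete about the "switchable" part, this is the sort of gate I have in mind; a sketch with made-up names, not a patch. Narrow hosts keep the current rule set, and only the wide hosts are routed through the simpler path until we find a case it cannot handle.

    #include <cstdio>

    // Hypothetical placeholders for the two scheduling paths.
    static void run_current_scheduler()     { std::printf("current rule set\n"); }
    static void run_bank_teller_scheduler() { std::printf("bank-teller rule set\n"); }

    // Proposed switch: narrow hosts keep today's behavior; wide hosts (4 or
    // more processing elements) try the simpler queue until it is shown to
    // need more rules.
    static void schedule(int processing_elements, bool enable_wide_path) {
        if (enable_wide_path && processing_elements >= 4) {
            run_bank_teller_scheduler();
        } else {
            run_current_scheduler();
        }
    }

    int main() {
        schedule(2, true);    // dual core: unchanged
        schedule(12, true);   // i7 with 8 virtual CPUs + 4 GPU engines: new path
        schedule(12, false);  // switch off: everything falls back to the current code
        return 0;
    }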
