Re: The reliability of python threads
Carl J. Van Arsdall wrote:
> Steve Holden wrote:
>> [snip]
>> Are you using memory with built-in error detection and correction?
>
> You mean in the hardware? I'm not really sure; I'd assume so, but is there any way I can check on this? If the hardware isn't doing that, is there anything I can do in my software to offer more stability?

You might be able to check using OS features (have you said what OS you are using?); alternatively, Google for information from the system supplier. If you don't have that feature in hardware you are up sh*t creek without a paddle, as it can't be emulated.

regards
 Steve
-- 
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden
Blog of Note: http://holdenweb.blogspot.com
See you at PyCon? http://us.pycon.org/TX2007
-- 
http://mail.python.org/mailman/listinfo/python-list
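If the machine runs Linux (an assumption; the OS is never stated in the thread), one OS feature worth checking is the kernel's EDAC subsystem. When a driver for the memory controller is loaded, it exposes corrected (`ce_count`) and uncorrected (`ue_count`) error counters under `/sys/devices/system/edac/mc/`. The sketch below only reads those counters; the function name `read_edac_counts` is hypothetical. An empty result means no EDAC driver is loaded, which does not by itself prove the memory lacks ECC.

```python
from pathlib import Path

# Standard Linux EDAC sysfs location; absent on other OSes or without a driver.
EDAC_ROOT = Path("/sys/devices/system/edac/mc")

def read_edac_counts(root=EDAC_ROOT):
    """Return {controller: (corrected, uncorrected)} read from EDAC sysfs.

    Returns an empty dict when no EDAC memory-controller directory exists.
    """
    root = Path(root)
    counts = {}
    if not root.is_dir():
        return counts
    for mc in sorted(root.glob("mc[0-9]*")):
        ce_file, ue_file = mc / "ce_count", mc / "ue_count"
        if ce_file.is_file() and ue_file.is_file():
            # int() tolerates the trailing newline sysfs files end with.
            counts[mc.name] = (int(ce_file.read_text()), int(ue_file.read_text()))
    return counts

if __name__ == "__main__":
    counts = read_edac_counts()
    if not counts:
        print("No EDAC counters found (no ECC driver loaded, or not Linux).")
    for name, (ce, ue) in counts.items():
        print(f"{name}: {ce} corrected, {ue} uncorrected memory errors")
```

A non-zero and growing corrected count would suggest ECC is present and doing real work; any uncorrected count is worth taking up with the hardware supplier.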
Re: The reliability of python threads
Aahz wrote:
> In article [EMAIL PROTECTED],
> Carl J. Van Arsdall [EMAIL PROTECTED] wrote:
>
> My point is that an app that dies only once every few months under load is actually pretty damn stable! That is not the kind of problem that you are likely to stimulate.

This has all been so vague. How does it die?

It would be useful if Python detected obvious deadlock. If all threads are blocked on mutexes, you're stuck, and at that point it's time to abort and print tracebacks for all threads. You shouldn't have to run under a debugger to detect that.

Then add a timer, so that if the Global Interpreter Lock stays locked for more than N seconds, you get an abort and a traceback. That way, if you get stuck in some C library, it gets noticed.

Those would be some good basic facilities to have in thread support.

In real-time work, you usually have a high-priority thread which wakes up periodically and checks that a few flags have been set indicating progress of the real-time work, then clears the flags. Throughout the real-time code, flags are set indicating progress for the checking thread to notice. All serious real-time systems have some form of stall timer like that; there's often a stall timer in hardware.

 John Nagle
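The progress-flag stall timer described above translates fairly directly into Python. The sketch below is a hypothetical `StallWatchdog` class (not a stdlib API): worker code calls `progress(name)` as it gets through its work, and a daemon thread wakes every `interval` seconds, reports any flag that was not set since the last round, then clears them all. Python threads have no priority control, and this watchdog cannot catch a thread that holds the GIL inside C code, because the watchdog thread itself needs the GIL to run. For that case, `faulthandler.dump_traceback_later(timeout, exit=True)`, added in Python 3.3 (well after this thread), is the closer match to the "GIL held for N seconds" timer; as I understand it, its timer runs in a C-level thread that does not need the GIL.

```python
import faulthandler
import sys
import threading

class StallWatchdog:
    """Progress-flag stall timer (hypothetical helper, not a stdlib API).

    Workers call progress(name); a daemon thread wakes every `interval`
    seconds, reports names whose flag was not set since the previous round,
    and then clears every flag.
    """

    def __init__(self, names, interval=5.0, on_stall=None):
        self.interval = interval
        self.on_stall = on_stall or self.dump_all_threads
        self._flags = {name: False for name in names}
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, name="watchdog", daemon=True)

    def progress(self, name):
        """Called from worker code: 'I got this far since the last check'."""
        with self._lock:
            self._flags[name] = True

    def check(self):
        """One watchdog round: return the stalled names and clear all flags."""
        with self._lock:
            stalled = [name for name, seen in self._flags.items() if not seen]
            self._flags = dict.fromkeys(self._flags, False)
        if stalled:
            self.on_stall(stalled)
        return stalled

    def _run(self):
        # Event.wait returns True once stop() sets the event, ending the loop.
        while not self._stop.wait(self.interval):
            self.check()

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        if self._thread.is_alive():
            self._thread.join()

    @staticmethod
    def dump_all_threads(stalled):
        """Default stall action: print every thread's traceback to stderr."""
        print(f"watchdog: no progress from {stalled}", file=sys.stderr)
        faulthandler.dump_traceback(file=sys.stderr, all_threads=True)
```

In use, each worker loop would call something like `wd.progress("reader")` once per iteration, with `wd.start()` at program startup; choosing `interval` comfortably longer than the slowest legitimate iteration avoids false alarms.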
Re: The reliability of python threads
Steve Holden wrote:
> [snip]
> Are you using memory with built-in error detection and correction?

You mean in the hardware? I'm not really sure; I'd assume so, but is there any way I can check on this? If the hardware isn't doing that, is there anything I can do in my software to offer more stability?

-- 
Carl J. Van Arsdall
[EMAIL PROTECTED]
Build and Release
MontaVista Software
Re: The reliability of python threads
John Nagle wrote:
> Aahz wrote:
>> In article [EMAIL PROTECTED],
>> Carl J. Van Arsdall [EMAIL PROTECTED] wrote:
>>
>> My point is that an app that dies only once every few months under load is actually pretty damn stable! That is not the kind of problem that you are likely to stimulate.
>
> This has all been so vague. How does it die?

Well, before operating on most of the data I perform type checks; if a type check fails, my system raises an exception. Now I'm in the process of finding out how the data went bad. I have to wait at this point, though, so I was investigating possibilities so I could find a new way of throwing the kitchen sink at it.

> It would be useful if Python detected obvious deadlock. If all threads are blocked on mutexes, you're stuck, and at that point it's time to abort and print tracebacks for all threads. You shouldn't have to run under a debugger to detect that.
>
> Then add a timer, so that if the Global Interpreter Lock stays locked for more than N seconds, you get an abort and a traceback. That way, if you get stuck in some C library, it gets noticed.
>
> Those would be some good basic facilities to have in thread support.

I agree. That would be incredibly useful. Doesn't this spark up the debate on threads killing threads, though? From what I understand, that is frowned upon (Java deprecated Thread.stop() because it was dangerous). Still, I think a master or control thread that watched the state of the system and could intervene would be powerful. One way to do this could be to use processes: each process could catch a kill signal if it appears to be stalled, although I am absolutely sure there is more to it than that. I don't think this could be done at all with Python threads, but as a fan of Python threads and their ease of use, I'd find it a nice and powerful feature to have.

-carl
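The instinct about thread killing matches how Python behaves: the `threading` module offers no supported way for one thread to kill another, so a control thread normally *asks* a worker to stop, typically with a `threading.Event` that the worker polls between units of work. The sketch below shows that pattern; `worker` and `supervise` are hypothetical names. A thread stuck inside a C call will never see the flag, which is where separate processes, which can be killed with `multiprocessing.Process.terminate()`, earn their extra cost.

```python
import threading
import time

def worker(stop_event, results, poll=0.01):
    """Hypothetical worker that checks stop_event between small units of work."""
    units = 0
    while not stop_event.is_set():
        units += 1            # stand-in for one unit of real work
        time.sleep(poll)
    results.append(units)     # tidy up and report before the thread exits

def supervise(run_for=0.1, join_timeout=1.0):
    """Control thread: let the worker run, then ask it to stop and confirm it did."""
    stop = threading.Event()
    results = []
    t = threading.Thread(target=worker, args=(stop, results), name="worker")
    t.start()
    time.sleep(run_for)       # in real code: the control thread decides to intervene
    stop.set()                # request, don't kill
    t.join(timeout=join_timeout)
    return not t.is_alive(), results

if __name__ == "__main__":
    stopped, results = supervise()
    print("worker stopped cleanly:", stopped, "units done:", results)
```

If `supervise` reports that the worker did not stop within `join_timeout`, that is itself useful diagnostic information: the worker is blocked somewhere it never checks the event.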
Re: The reliability of python threads
Hendrik van Rooyen wrote:
> [snip]
>> could definitely do more of them. The thing will be
>
> When I read this, I thought: probably your stuff is working perfectly on your test cases. You could try sending it some random data to see what happens. Seeing as you have a test server, throw the kitchen sink at it.
>
> Possibly "random" here means something that looks like data but is malformed in some way. Kind of try to trick the system into breaking reliably.
>
> I'm sorry I can't be more specific; it sounds so weak, and you probably already have test cases that must fail, but I don't know how to put it any better...

Well, sometimes a weak analogy is the best thing, because it allows me to fill in the blanks: how can I throw a kitchen sink at it in a way I never have before? And away my mind goes, so thank you.

-carl
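One way to read "something that looks like data but is malformed" is mutation fuzzing: start from a known-good record, break it in small, plausible ways, and check that every broken variant is rejected by the validator rather than slipping through. The record layout, `validate()`, `mutate()` and `fuzz()` below are all hypothetical stand-ins for the real data format and type checks.

```python
import copy
import random

# Hypothetical record format; substitute the real data layout.
GOOD_RECORD = {"job_id": 42, "host": "build07", "status": "ok", "files": ["a.c", "b.c"]}

def validate(record):
    """Hypothetical stand-in for the application's pre-use type checks."""
    return (
        isinstance(record, dict)
        and set(record) == {"job_id", "host", "status", "files"}
        and isinstance(record["job_id"], int)
        and isinstance(record["host"], str) and record["host"] != ""
        and record["status"] in {"ok", "failed"}
        and isinstance(record["files"], list)
        and all(isinstance(f, str) for f in record["files"])
    )

def mutate(record, rng):
    """Return a copy of record broken in one small, plausible-looking way."""
    bad = copy.deepcopy(record)
    key = rng.choice(sorted(bad))
    action = rng.choice(["drop", "retype", "empty", "extra"])
    if action == "drop":
        del bad[key]                                  # missing field
    elif action == "retype":
        bad[key] = 0 if isinstance(bad[key], str) else str(bad[key])
    elif action == "empty":
        bad[key] = None                               # field present but null
    else:
        bad[key + "_x"] = bad[key]                    # unexpected extra field
    return bad

def fuzz(n=1000, seed=0):
    """Feed n mutated records to validate(); return any that were wrongly accepted."""
    rng = random.Random(seed)
    return [bad for bad in (mutate(GOOD_RECORD, rng) for _ in range(n)) if validate(bad)]
```

A non-empty result from `fuzz()` means the checks let a malformed record through. Sending the same mutants concurrently from several threads against the test server would be closer still to throwing the kitchen sink at it.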
Re: The reliability of python threads
In article [EMAIL PROTECTED],
Carl J. Van Arsdall [EMAIL PROTECTED] wrote:
> Aahz wrote:
>>
>> My response is that you're asking the wrong questions here. Our database server locked up hard Sunday morning, and we still have no idea why (the machine itself, not just the database app). I think it's more important to focus on whether you have done all that is reasonable to make your application reliable -- and then put your efforts into making your app recoverable.
>
> Well, I assume that I have done all I can to make it reliable. This list is usually my last resort, or a place where I come hoping to find ideas that aren't coming to me naturally. The only other thing I could come up with was that there might be network errors. But I've gone back and forth on that, because TCP should handle that for me and I shouldn't have to deal with it directly in Pyro, although I've added (and continue to add) checks in places that appear appropriate (and in some cases, checks because I prefer to be paranoid about errors).

My point is that an app that dies only once every few months under load is actually pretty damn stable! That is not the kind of problem that you are likely to stimulate.

>> I'm particularly making this comment in the context of your later point about the bug showing up only every three or four months.
>>
>> Side note: without knowing what error messages you're getting, there's not much anybody can say about your programs or the reliability of threads for your application.
>
> Right, I wasn't coming here to get someone to debug my app; I'm just looking for ideas. I'm constantly trying to find new ways to improve my software and new ways to reduce bugs, and when I get really stuck, new ways to track bugs down. The exception won't mean much, but I can say that the error appears to me as bad data. I do checks prior to performing actions on any data; if the data doesn't look like what it should look like, the system raises an exception.
>
> The problem I'm having is determining how the data went bad. While I was tracking down the problem, a couple of people mentioned that problems like that are usually race conditions. From there I examined my code, checked all the locking, made sure it was good, and wasn't able to find anything. Given that there's one lock and the critical sections are well defined, I'm having difficulty. One idea I have for getting a better understanding might be to check data before it's stored. Again, I still don't know how it would get messed up, nor can I reproduce the error on my own. Do any of you think that would be a good practice for trying to track this down? (Check the data after reading it, check the data before saving it.)

What we do at my company is maintain log files. When we think we have identified a potential choke point for problems, we add a log call. Tracking this down will involve logging the changes to your data until you can figure out where it goes wrong -- once you know where it goes wrong, you have an excellent chance of figuring out why.
-- 
Aahz ([EMAIL PROTECTED]) * http://www.pythoncraft.com/

"I disrespectfully agree." --SJM
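"Check the data after reading it, check the data before saving it" combines naturally with logging every change. The sketch below wraps a shared dict in a hypothetical `CheckedStore` that validates on every write and every read, logs each change with the writing thread's name, and raises as soon as bad data appears, so the log points at the first bad write rather than at a later consumer. `is_valid` is a placeholder for the application's own checks.

```python
import logging
import threading

log = logging.getLogger("checked_store")

class CheckedStore:
    """Hypothetical lock-protected store that validates on write and on read."""

    def __init__(self, is_valid):
        self._is_valid = is_valid
        self._data = {}
        self._lock = threading.Lock()

    def put(self, key, value):
        who = threading.current_thread().name
        if not self._is_valid(value):
            log.error("rejecting bad write %r=%r from %s", key, value, who)
            raise ValueError(f"bad value for {key!r} at write time: {value!r}")
        with self._lock:
            old = self._data.get(key)
            self._data[key] = value
        log.debug("%s: %r: %r -> %r", who, key, old, value)

    def get(self, key):
        with self._lock:
            value = self._data[key]
        if not self._is_valid(value):
            # It passed the write-time check, so something changed it outside put().
            log.error("value for %r went bad after it was stored: %r", key, value)
            raise ValueError(f"bad value for {key!r} at read time: {value!r}")
        return value
```

Because `get` returns the stored object itself, a caller that mutates it in place bypasses `put` and its lock entirely; the read-time check is what catches that case, and unlocked in-place mutation of shared objects is one plausible way to get "bad data" even when the lock around the store is correct.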
Re: The reliability of python threads
Carl J. Van Arsdall wrote:
> Aahz wrote:
>> [snip]
>>
>> My response is that you're asking the wrong questions here. Our database server locked up hard Sunday morning, and we still have no idea why (the machine itself, not just the database app). I think it's more important to focus on whether you have done all that is reasonable to make your application reliable -- and then put your efforts into making your app recoverable.
>
> Well, I assume that I have done all I can to make it reliable. This list is usually my last resort, or a place where I come hoping to find ideas that aren't coming to me naturally. The only other thing I could come up with was that there might be network errors. But I've gone back and forth on that, because TCP should handle that for me and I shouldn't have to deal with it directly in Pyro, although I've added (and continue to add) checks in places that appear appropriate (and in some cases, checks because I prefer to be paranoid about errors).
>
>> I'm particularly making this comment in the context of your later point about the bug showing up only every three or four months.
>>
>> Side note: without knowing what error messages you're getting, there's not much anybody can say about your programs or the reliability of threads for your application.
>
> Right, I wasn't coming here to get someone to debug my app; I'm just looking for ideas. I'm constantly trying to find new ways to improve my software and new ways to reduce bugs, and when I get really stuck, new ways to track bugs down. The exception won't mean much, but I can say that the error appears to me as bad data. I do checks prior to performing actions on any data; if the data doesn't look like what it should look like, the system raises an exception.
>
> The problem I'm having is determining how the data went bad. While I was tracking down the problem, a couple of people mentioned that problems like that are usually race conditions. From there I examined my code, checked all the locking, made sure it was good, and wasn't able to find anything. Given that there's one lock and the critical sections are well defined, I'm having difficulty. One idea I have for getting a better understanding might be to check data before it's stored. Again, I still don't know how it would get messed up, nor can I reproduce the error on my own. Do any of you think that would be a good practice for trying to track this down? (Check the data after reading it, check the data before saving it.)

Are you using memory with built-in error detection and correction?

regards
 Steve
Re: The reliability of python threads
Carl J. Van Arsdall [EMAIL PROTECTED] wrote:
> Hendrik van Rooyen wrote:
>> Carl J. Van Arsdall [EMAIL PROTECTED] wrote:
>> 8<---
>
> Yeah, I do some of that too. I use that with conditional print statements to stderr when I'm doing my validation against my test cases. But I could definitely do more of them. The thing will be

When I read this, I thought: probably your stuff is working perfectly on your test cases. You could try sending it some random data to see what happens. Seeing as you have a test server, throw the kitchen sink at it.

Possibly "random" here means something that looks like data but is malformed in some way. Kind of try to trick the system into breaking reliably.

I'm sorry I can't be more specific; it sounds so weak, and you probably already have test cases that must fail, but I don't know how to put it any better...

- Hendrik
Re: The reliability of python threads
In article [EMAIL PROTECTED],
[EMAIL PROTECTED] writes:
|
| What makes you think Paddy indicated he wouldn't try to solve the problem?
| Here's what he wrote:
|
| What I'm proposing is that if, for example, a process stops running
| three times in a year at roughly three- to four-month intervals, and it
| should have stayed up, then restart the server sooner, at a time of
| your choosing, whilst taking other measures to investigate the error.
|
| I see nothing wrong with trying to minimize the chances of a problem rearing
| its ugly head while at the same time trying to investigate its cause (and
| presumably solve it).

No, nor do I, but look more closely. His quote makes it quite clear that he has got it firmly in his mind that this is a degradation problem, and so regular restarting will improve the reliability. Well, it could also be one where failure becomes LESS likely the longer the server stays up (i.e. the settling-down problem).

No problem is as hard to find as one where you are firmly convinced that it is somewhere other than where it is.

Regards,
Nick Maclaren.
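The distinction between a degradation problem and a settling-down problem can be made precise with the hazard rate h(t): the instantaneous failure rate of a server that has already stayed up for time t. A common illustrative model (an assumption here; nobody in the thread fitted one to data) is the Weibull hazard with shape k and scale η:

```latex
h(t) = \frac{k}{\eta}\left(\frac{t}{\eta}\right)^{k-1},
\qquad
\begin{cases}
k > 1: & h(t) \text{ rises with uptime (degradation; scheduled restarts help)} \\
k = 1: & h(t) = 1/\eta \text{ is constant (Poisson; restarts change nothing)} \\
k < 1: & h(t) \text{ falls with uptime (settling down; restarts make things worse)}
\end{cases}
```

A restart resets t to zero, so it only lowers the failure rate when k > 1, which is exactly the assumption being questioned.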
Re: The reliability of python threads
On 26 Jan, 09:05, [EMAIL PROTECTED] (Nick Maclaren) wrote:
> In article [EMAIL PROTECTED], [EMAIL PROTECTED] writes:
> |
> | What makes you think Paddy indicated he wouldn't try to solve the problem?
> | Here's what he wrote:
> |
> | What I'm proposing is that if, for example, a process stops running
> | three times in a year at roughly three- to four-month intervals, and it
> | should have stayed up, then restart the server sooner, at a time of
> | your choosing, whilst taking other measures to investigate the error.
> |
> | I see nothing wrong with trying to minimize the chances of a problem rearing
> | its ugly head while at the same time trying to investigate its cause (and
> | presumably solve it).
>
> No, nor do I, but look more closely. His quote makes it quite clear that he has got it firmly in his mind that this is a degradation problem, and so regular restarting will improve the reliability. Well, it could also be one where failure becomes LESS likely the longer the server stays up (i.e. the settling-down problem).

If in the past year the settling-down problem did not rear its head when the server crashed after three to four months and was restarted, then why not implement regular, notified downtime, whilst also looking into the problem in more depth?
 * You are already having to restart.
 * Restarts last for 3-4 months.
Why burden yourself with "Oh, but it could fail once in three hours; you've not proved that it can't; we'll have to stop everything whilst we do a thorough investigation. Is it Poisson? Is it 'settling down'? Just wait whilst I prepare my next doctoral thesis..."
 - Okay, the last was extreme, but cathartic :-)

> No problem is as hard to find as one where you are firmly convinced that it is somewhere other than where it is.

Amen!

> Regards,
> Nick Maclaren.

- Paddy.
Re: The reliability of python threads
Hendrik van Rooyen wrote:
> Carl J. Van Arsdall [EMAIL PROTECTED] wrote:
> [snip]
> Are you 100% rock-bottom, gold-plated, guaranteed sure that there is not something else that is also critical that you just haven't realised is?

100%? No, definitely not. I know myself; as I explore this option and other options, I will of course be going into and out of the code, looking for that small piece I might have missed. But I'm like a modern operating system: I do lots of things at once. So after being unable to solve it the first few times, I thought to pose a question, but posing the question never means that I'm done looking at my code and hoping I missed something. I'd much rather have this be my fault... that means I have a much higher probability of fixing it. But I sought to explore some tips given to me. Ah, but the day I could be 100% sure would be a good day (hell, I'd go ask for a raise for being the best coder ever!)

> This stuff is never obvious before the fact, and always seems stupid afterward, when you have found it. Your best (some would say only) weapon is your imagination, fueled by scepticism...

Yeah, seriously!

>> ...try and get a better understanding might be to check data before it's stored. Again, I still don't know how it would get messed up, nor can I reproduce the error on my own. Do any of you think that would be a good practice for trying to track this down? (Check the data after reading it, check the data before saving it.)
>
> Nothing wrong with doing that to find a bug - not as a general practice, of course; that would be too pessimistic. In hard-to-find bugs, doing anything to narrow down the time and place of the error is fair game; the object is to get you to read some code that you *know works* with new eyes...

I really like that piece of wisdom; I'll add it to my list of coding mantras. Thanks!

> I build in a global boolean variable that I call trace, and when it's on I do all sorts of weird stuff, giving a running commentary (either by print or in some log-like file) of what the programme is doing, like "read this", "wrote that", "received this", "done that here", etc. A bare useful minimum is a "we get here" indicator like the routine name, but the data helps a lot too.

Yeah, I do some of that too. I use that with conditional print statements to stderr when I'm doing my validation against my test cases. But I could definitely do more of them. The thing will be simulating the failure. In the production server, thousands of printed messages would be bad.

I've done short but heavy simulations, but to no avail. For example, I'll have a couple of systems loop infinitely and beat on the system. This is a much heavier load than the system will ever normally face, as it's normally hit a lot at once and then idles for a while. The test environment constantly hits it, and I let that run for several days. Maybe a longer run is needed, but how long is reasonable before deciding that it's something beyond my control?

> Compared to an assert, it does not stop the execution, and you could get lucky by cross-correlating such traces from different threads - or better, if you use a queue or a pipe for the log, you might see the timing relationships directly.

Ah, store the logs in a rotating queue of fixed size? That would work pretty well to keep a large run under control, thanks!

> But this in itself is fraught with danger, as you can hit file size limits, or slow the whole thing down to unusability. On the other hand, it does not generate the volume that a genuine trace does, it is easier to read, and you can limit it to the bits that you are currently suspicious of.
>
> Programming is such fun...

Yeah, I'm one of those guys who really gets a sense of satisfaction out of coding. Thanks for the tips.

-carl
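A "rotating queue of fixed size" maps onto `collections.deque(maxlen=...)`. The sketch below is a hypothetical `RingBufferHandler` for the standard `logging` module that keeps only the last N trace records in memory, so tracing can stay on in production without flooding stderr, and dumps them when a data check fails. Each record carries a timestamp and the thread name, which gives some of the cross-thread timing mentioned above. `TRACE` plays the role of the global trace flag; `trace()` is a hypothetical helper and relies on the `stacklevel` argument added in Python 3.8.

```python
import collections
import logging

TRACE = True  # global switch in the spirit of the "trace" flag: False makes trace() a no-op

class RingBufferHandler(logging.Handler):
    """Hypothetical logging handler that keeps only the newest `capacity` records."""

    def __init__(self, capacity=10000):
        super().__init__()
        self.buffer = collections.deque(maxlen=capacity)
        self.setFormatter(logging.Formatter(
            "%(created).6f %(threadName)s %(funcName)s: %(message)s"))

    def emit(self, record):
        # logging calls emit() with this handler's lock held; a full deque
        # silently discards its oldest entry on append.
        self.buffer.append(self.format(record))

    def dump(self, stream):
        """Write the retained history, oldest first (e.g. when a data check fails)."""
        with self.lock:
            lines = list(self.buffer)
        for line in lines:
            stream.write(line + "\n")

trace_log = logging.getLogger("trace")
trace_log.setLevel(logging.DEBUG)
trace_log.propagate = False   # keep trace chatter out of the normal logs
ring = RingBufferHandler(capacity=10000)
trace_log.addHandler(ring)

def trace(msg, *args):
    """Record a 'we get here' line, attributed to the calling function."""
    if TRACE:
        trace_log.debug(msg, *args, stacklevel=2)
```

The intended use is to sprinkle `trace("read %r", item)` calls through the suspect code, then call `ring.dump(sys.stderr)` (or dump to a file) from the handler that catches the bad-data exception, so only the last few thousand events before the failure are ever written out.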
Re: The reliability of python threads
In article [EMAIL PROTECTED],
Paddy [EMAIL PROTECTED] writes:
|
| Three to four months before `strange errors`? I'd spend some time
| correlating logs; not just for your program, but for everything running
| on the server. Then I'd expect to cut my losses and arrange to safely
| re-start the program every TWO months.
| (I'd arrange the re-start after collecting logs but before their
| analysis. Life is too short).

Forget it. That strategy is fine in general, but is a waste of time where threading issues are involved (or signal handling, or some types of communication problem, for that matter).

There are three unrelated killer facts that interact:

    Such failures are usually probabilistic (a Poisson process), and so have no history.

    The expected number is usually proportional to the square of the activity, sometimes a higher power.

    Virtually nothing involved does any routine logging, or even has options to log relevant events.

The first means that the strategy of restarting doesn't help. All three mean that current logs are almost never any use.

Regards,
Nick Maclaren.
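The first two facts can be written down directly. For a Poisson process with failure rate λ, the time T to the next failure is exponentially distributed and memoryless, so having survived s months says nothing about the next t months. If, as suggested, λ scales with the square of the activity a (c below is an assumed proportionality constant), doubling the load quadruples the expected number of failures N(t) in a given period:

```latex
P(T > t) = e^{-\lambda t},
\qquad
P(T > s + t \mid T > s) = \frac{e^{-\lambda (s + t)}}{e^{-\lambda s}} = e^{-\lambda t} = P(T > t),
\qquad
\lambda(a) = c\,a^{2} \;\Rightarrow\; \mathbb{E}[N(t)] = c\,a^{2}\,t, \quad \lambda(2a) = 4\,\lambda(a).
```

A restart resets the uptime s, and the middle identity shows that s does not enter the failure probability at all.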
Re: The reliability of python threads
On Jan 25, 9:26 am, [EMAIL PROTECTED] (Nick Maclaren) wrote:
> In article [EMAIL PROTECTED], Paddy [EMAIL PROTECTED] writes:
> |
> | Three to four months before `strange errors`? I'd spend some time
> | correlating logs; not just for your program, but for everything running
> | on the server. Then I'd expect to cut my losses and arrange to safely
> | re-start the program every TWO months.
> | (I'd arrange the re-start after collecting logs but before their
> | analysis. Life is too short).
>
> Forget it. That strategy is fine in general, but is a waste of time where threading issues are involved (or signal handling, or some types of communication problem, for that matter).

Nah, it's a great strategy. It keeps you up and running when all you know for sure is that you will most likely be able to keep things together for three months normally.

The OP only thinks it's a threading problem. It doesn't matter what the true fix will be, as long as arranging to re-start the server well before it's likely to go down doesn't take too long compared to your exploration of the problem, and, of course, you have to be able to afford the glitch in availability.

> There are three unrelated killer facts that interact:
>
>     Such failures are usually probabilistic (a Poisson process), and so have no history.
>
>     The expected number is usually proportional to the square of the activity, sometimes a higher power.
>
>     Virtually nothing involved does any routine logging, or even has options to log relevant events.
>
> The first means that the strategy of restarting doesn't help. All three mean that current logs are almost never any use.
>
> Regards,
> Nick Maclaren.
Re: The reliability of python threads
Aahz wrote:
> [snip]
>
> My response is that you're asking the wrong questions here. Our database server locked up hard Sunday morning, and we still have no idea why (the machine itself, not just the database app). I think it's more important to focus on whether you have done all that is reasonable to make your application reliable -- and then put your efforts into making your app recoverable.

Well, I assume that I have done all I can to make it reliable. This list is usually my last resort, or a place where I come hoping to find ideas that aren't coming to me naturally. The only other thing I could come up with was that there might be network errors. But I've gone back and forth on that, because TCP should handle that for me and I shouldn't have to deal with it directly in Pyro, although I've added (and continue to add) checks in places that appear appropriate (and in some cases, checks because I prefer to be paranoid about errors).

> I'm particularly making this comment in the context of your later point about the bug showing up only every three or four months.
>
> Side note: without knowing what error messages you're getting, there's not much anybody can say about your programs or the reliability of threads for your application.

Right, I wasn't coming here to get someone to debug my app; I'm just looking for ideas. I'm constantly trying to find new ways to improve my software and new ways to reduce bugs, and when I get really stuck, new ways to track bugs down. The exception won't mean much, but I can say that the error appears to me as bad data. I do checks prior to performing actions on any data; if the data doesn't look like what it should look like, the system raises an exception.

The problem I'm having is determining how the data went bad. While I was tracking down the problem, a couple of people mentioned that problems like that are usually race conditions. From there I examined my code, checked all the locking, made sure it was good, and wasn't able to find anything. Given that there's one lock and the critical sections are well defined, I'm having difficulty. One idea I have for getting a better understanding might be to check data before it's stored. Again, I still don't know how it would get messed up, nor can I reproduce the error on my own. Do any of you think that would be a good practice for trying to track this down? (Check the data after reading it, check the data before saving it.)

-- 
Carl J. Van Arsdall
Re: The reliability of python threads
In article [EMAIL PROTECTED],
Paddy [EMAIL PROTECTED] writes:
|
| | Three to four months before `strange errors`? I'd spend some time
| | correlating logs; not just for your program, but for everything running
| | on the server. Then I'd expect to cut my losses and arrange to safely
| | re-start the program every TWO months.
| | (I'd arrange the re-start after collecting logs but before their
| | analysis. Life is too short).
|
| Forget it. That strategy is fine in general, but is a waste of time
| where threading issues are involved (or signal handling, or some types
| of communication problem, for that matter).
|
| Nah, it's a great strategy. It keeps you up and running when all you
| know for sure is that you will most likely be able to keep things
| together for three months normally.
|
| The OP only thinks it's a threading problem - it doesn't matter what the
| true fix will be, as long as arranging to re-start the server well
|                                                               ^^^^
| before it's likely to go down doesn't take too long, compared to your
| exploration of the problem, and, of course, you have to be able to
| afford the glitch in availability.

Consider the marked phrase in the context of a Poisson process failure model, and laugh. If you don't understand why I say that, I suggest finding out the properties of the Poisson process!

Regards,
Nick Maclaren.
Re: The reliability of python threads
On Jan 25, 7:36 pm, [EMAIL PROTECTED] (Nick Maclaren) wrote:
> In article [EMAIL PROTECTED], Paddy [EMAIL PROTECTED] writes:
> |
> | | Three to four months before `strange errors`? I'd spend some time
> | | correlating logs; not just for your program, but for everything running
> | | on the server. Then I'd expect to cut my losses and arrange to safely
> | | re-start the program every TWO months.
> | | (I'd arrange the re-start after collecting logs but before their
> | | analysis. Life is too short).
> |
> | Forget it. That strategy is fine in general, but is a waste of time
> | where threading issues are involved (or signal handling, or some types
> | of communication problem, for that matter).
> |
> | Nah, it's a great strategy. It keeps you up and running when all you
> | know for sure is that you will most likely be able to keep things
> | together for three months normally.
> |
> | The OP only thinks it's a threading problem - it doesn't matter what the
> | true fix will be, as long as arranging to re-start the server well
> |                                                               ^^^^
> | before it's likely to go down doesn't take too long, compared to your
> | exploration of the problem, and, of course, you have to be able to
> | afford the glitch in availability.
>
> Consider the marked phrase in the context of a Poisson process failure model, and laugh. If you don't understand why I say that, I suggest finding out the properties of the Poisson process!
>
> Regards,
> Nick Maclaren.

No, you should think of the service that needs to be up. You seem to be talking about how it can't be fixed rather than looking for ways to keep things going. A little learning is fine, but "it can't theoretically be fixed" is no solution. With a program that stays up for that long, the situation will usually work out for the better when either software versions are upgraded, or the OS and drivers are upgraded (sometimes as a result of the analysis, sometimes not). Keep your eye on the goal and you're more likely to score!

- Paddy.
Re: The reliability of python threads
Paddy [EMAIL PROTECTED] writes:
> No, you should think of the service that needs to be up. You seem to be talking about how it can't be fixed rather than looking for ways to keep things going.

But you're proposing cargo cult programming. There is no reason whatsoever to expect that restarting the server now and then will help the problem in the slightest. Nick used the fancy term "Poisson process", but it just means that the probability of failure at any moment is independent of what's happened in the past, like the spontaneous radioactive decay of an atom. It's not like a mechanical system where some part gradually wears out and eventually breaks, so you can prevent the failure by replacing the part every so often.

> A little learning is fine, but "it can't theoretically be fixed" is no solution.

The best you can do is identify the unfixable situations precisely and work around them. Precision is important. The next best thing is to have several servers running simultaneously, with failure detection and automatic failover.

If a server is failing at random every few months, trying to prevent that by restarting it every so often is just shooting in the dark. Think of your server stopping now and then because there's a power failure, where you get power failures every few months on average. Shutting down your server once a month, unplugging it, and plugging it back in will do nothing to prevent those outages. You need to either identify and fix whatever is causing the power outages, or install a backup generator.
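The simplest client-side form of failure detection with failover is to try replicas in order and treat a connection error or timeout as the failure signal. A minimal sketch, assuming every replica can safely handle any request (i.e. requests are idempotent or state is shared), which the thread does not establish; retrying a non-idempotent request after a timeout could duplicate work. `call_with_failover` and `send` are hypothetical names; `send` stands in for whatever transport is in use (for example a remote-object proxy call) and is not a Pyro API.

```python
def call_with_failover(servers, request, send, retryable=(ConnectionError, TimeoutError)):
    """Try each server in order; return (server, reply) from the first that answers.

    `send(server, request)` is a hypothetical transport function. Any
    exception listed in `retryable` counts as a detected failure and moves
    on to the next server; other exceptions propagate unchanged.
    """
    failures = []
    for server in servers:
        try:
            return server, send(server, request)
        except retryable as exc:
            failures.append((server, exc))
    raise ConnectionError(f"all {len(failures)} servers failed: {failures!r}")
```

Because a failed call is recorded rather than silently swallowed, the `failures` list (or a log line per failure) is also a cheap source of the "how often, and when" data that the rest of the thread is arguing about.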
Re: The reliability of python threads
On Jan 25, 8:00 pm, Paul Rubin http://[EMAIL PROTECTED] wrote: Paddy [EMAIL PROTECTED] writes: No, you should think of the service that needs to be up. You seem to be talking about how it can't be fixed rather than looking for ways to keep things going. But you're proposing cargo cult programming. i don't know that term. What I'm proposing is that if, for example, a process stops running three times in a year at roughly three to four months intervals , and it should have stayed up; then restart the server sooner, at aa time of your choosing, whilst taking other measures to investicate the error. There is no reason whatsoever to expect that restarting the server now and then will help the problem in the slightest. Thats where we most likely differ. The problem is only indirecctly the program failing. the customer wants reliable service. Which you can get from unreliable components. It happens all the time in firmware controlled systems that periodically reboot themselves as a matter of course. Nick used the fancy term Poisson process but it just means that the probability of failure at any moment is independent of what's happened in the past, like the spontaneous radioactive decay of an atom. It's not like a mechanical system where some part gradually gets worn out and eventually breaks, so you can prevent the failure by replacing the part every so often. Whilst you sit agreeing on how many fairys can dance on the end of a pin or not Your company could be loosing customers. You and Nick seem to be saying it *must* be Poisson, therefore we can't do... A little learning is fine but it can't theoretically be fixed is no solution.The best you can do is identify the unfixable situations precisely and work around them. Precision is important. I'm sorry, but your argument reminds me of when Western statistical quality control first met with the Japanese Zero defects methodologies. 
We had argued ourselves into accepting a certain number of defective cars getting out to customers as the result of our theories. The Japanese practices emphasized *no* defects were acceptable at the customer, and they seemed to deliver better-made cars. The next best thing is to have several servers running simultaneously, with failure detection and automatic failover. Yah, finally. I can work with that. If a server is failing at random every few months, trying to prevent that by restarting it every so often is just shooting in the dark. "At random" - "every few months"? Me thinking it happens every few months allows me to search for a fix. If thinking it happens at random leads you to a brick wall, then switch! Think of your server stopping now and then because there's a power failure, where you get power failures every few months on the average. Shutting down your server once a month, unplugging it, and plugging it back in will do nothing to prevent those outages. You need to either identify and fix whatever is causing the power outages, or install a backup generator. Yep. I also know that a mad bloke entering the server room with a hammer every three to four months is also not likely to be fixed by restarting the server every two months ;-) - Paddy. -- http://mail.python.org/mailman/listinfo/python-list
Re: The reliability of python threads
Paddy [EMAIL PROTECTED] writes: But you're proposing cargo cult programming. I don't know that term. http://en.wikipedia.org/wiki/Cargo_cult_programming What I'm proposing is that if, for example, a process stops running three times in a year at roughly three to four month intervals, and it should have stayed up, then restart the server sooner, at a time of your choosing... What makes you think that restarting the server will make it less likely to fail? It sounds to me like there's zero evidence of that, since you say roughly three or four month intervals and talk about threading and race conditions. If it's failing every 3 months, 15 days and 2.43 hours like clockwork, that's different, sure, restart it every three months. But the description I see so far sounds like a random failure caused by some events occurring with low enough probability that they only happen on average every few months of operation. That kind of thing is very common and is often best diagnosed by instrumenting the hell out of the code. There is no reason whatsoever to expect that restarting the server now and then will help the problem in the slightest. That's where we most likely differ. Do you think there is a reason to expect that restarting the server will help the problem in the slightest? I realize you seem to expect that, but you have not given a REASON. That's what I mean by cargo cult programming. Whilst you sit agreeing on how many fairies can dance on the end of a pin or not, your company could be losing customers. You and Nick seem to be saying it *must* be Poisson, therefore we can't do... I dunno about Nick, I'm saying it's best to assume that it's Poisson and do whatever is necessary to diagnose and fix the bug, and that the voodoo measure you're proposing is not all that likely to help and it will take years to find out whether it helps or not (i.e. restarting after 3 months and going another 3 months without a failure proves nothing).
I'm sorry, but your argument reminds me of when Western statistical quality control first met with the Japanese Zero defects methodologies. We had argued ourselves into accepting a certain amount of defective cars getting out to customers as the result of our theories. The Japanese practices emphasized *no* defects were acceptable at the customer, and they seemed to deliver better made cars. I don't see your point. You're the one who wants to keep operating defective software instead of fixing it. at random - every few months Me thinking it happens every few months allows me to search for a fix. If thinking it happens at random leads you to a brick wall, then switch! But you need evidence before you can say it happens every few months. Do you have, say, a graph of the exact dates and times of failure, the number of requests processed so far, etc.? If it happened at some exact or almost exact uniform time interval or precisely once every 1.273 million requests or whatever, that tells you something. But the earlier description didn't sound like that. Restarting the server is not much better than carrying a lucky rabbit's foot. -- http://mail.python.org/mailman/listinfo/python-list
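Paul's request for "the exact dates and times of failure" can be turned into a quick check. The sketch below, using hypothetical timestamps, computes the gaps between failures and their coefficient of variation (standard deviation divided by mean). Roughly speaking, clockwork failures give a value near 0, while a memoryless process gives values around 1; with only three or four failures a year, though, it takes a long history before the number means much.

```python
from datetime import datetime, timedelta
from statistics import mean, pstdev


def failure_gaps_days(timestamps):
    """Gaps, in days, between consecutive failure timestamps."""
    ts = sorted(timestamps)
    return [(later - earlier).total_seconds() / 86400
            for earlier, later in zip(ts, ts[1:])]


def coefficient_of_variation(gaps):
    """Std dev / mean of the gaps: about 0 for clockwork, about 1 for Poisson."""
    return pstdev(gaps) / mean(gaps)


if __name__ == "__main__":
    # Hypothetical failure log: a failure exactly every 90 days.
    start = datetime(2006, 1, 1, 3, 0)
    clockwork = [start + timedelta(days=90 * k) for k in range(5)]
    gaps = failure_gaps_days(clockwork)
    print(gaps, coefficient_of_variation(gaps))
```

The same gaps can be set against request counts or load figures from the logs, which is the other kind of regularity Paul mentions.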
Re: The reliability of python threads
In article [EMAIL PROTECTED], Paddy [EMAIL PROTECTED] writes: | | No, you should think of the service that needs to be up. You seem to be | talking about how it can't be fixed rather than looking for ways to | keep things going. A little learning is fine but "it can't | theoretically be fixed" is no solution. I suggest that you do invest in a little learning and look up Poisson processes. | Keep your eye on the goal and you're more likely to score! And, if you have your eye on the wrong goal, you would generally be better off not scoring :-) Regards, Nick Maclaren. -- http://mail.python.org/mailman/listinfo/python-list
Re: The reliability of python threads
Paul I dunno about Nick, I'm saying it's best to assume that it's Paul Poisson and do whatever is necessary to diagnose and fix the bug, Paul and that the voodoo measure you're proposing is not all that Paul likely to help and it will take years to find out whether it helps Paul or not (i.e. restarting after 3 months and going another 3 months Paul without a failure proves nothing). What makes you think Paddy indicated he wouldn't try to solve the problem? Here's what he wrote: What I'm proposing is that if, for example, a process stops running three times in a year at roughly three to four month intervals, and it should have stayed up, then restart the server sooner, at a time of your choosing, whilst taking other measures to investigate the error. I see nothing wrong with trying to minimize the chances of a problem rearing its ugly head while at the same time trying to investigate its cause (and presumably solve it). Skip -- http://mail.python.org/mailman/listinfo/python-list
Re: The reliability of python threads
[EMAIL PROTECTED] writes: What makes you think Paddy indicated he wouldn't try to solve the problem? Here's what he wrote: What I'm proposing is that if, for example, a process stops running three times in a year at roughly three to four month intervals, and it should have stayed up, then restart the server sooner, at a time of your choosing, whilst taking other measures to investigate the error. Well, OK, that's better than just rebooting every so often and leaving it at that, like the firmware systems he cited. I see nothing wrong with trying to minimize the chances of a problem... I think a measure to minimize the chance of some problem is only valid if there's some plausible theory that it WILL decrease the chance of the problem (e.g. if there's reason to think that the problem is caused by a very slow resource leak, but that hasn't been suggested). That's the part that I'm missing from this story. One thing I'd certainly want to do is set up a test server under a much heavier load than the real server sees, and check whether the problem occurs faster. -- http://mail.python.org/mailman/listinfo/python-list
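Paul's suggestion of running a test server under much heavier load can start as a small harness that calls the suspect code from many threads at once and records every exception instead of letting a thread die quietly. This is a hypothetical sketch: hammer() and the Ledger class, with its "a + b is always 1000" invariant, stand in for whatever shared state the real server keeps.

```python
import threading


def hammer(worker, n_threads=16, iterations=10_000):
    """Call worker(thread_index, iteration) from many threads at once.

    Exceptions are collected and returned instead of silently killing
    threads, so a rare failure is reported rather than lost.
    """
    errors = []
    errors_lock = threading.Lock()
    start_line = threading.Barrier(n_threads)   # release all threads together

    def run(index):
        start_line.wait()
        for i in range(iterations):
            try:
                worker(index, i)
            except Exception as exc:
                with errors_lock:
                    errors.append(exc)

    threads = [threading.Thread(target=run, args=(k,)) for k in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors


class Ledger:
    """Hypothetical shared state with an invariant: a + b always equals 1000."""

    def __init__(self):
        self.lock = threading.Lock()
        self.a, self.b = 1000, 0

    def move(self, index, i):
        amount = 1 if i % 2 == 0 else -1
        with self.lock:
            self.a -= amount
            self.b += amount
            if self.a + self.b != 1000:
                raise AssertionError("invariant broken")
```

Raising the thread count and iteration count well above production levels is the point: a bug that needs a rare interleaving gets many more chances to appear. Removing the lock from move(), or adding a code path that bypasses it, is the kind of change such a harness is meant to expose.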
Re: The reliability of python threads
Carl J. Van Arsdall [EMAIL PROTECTED] wrote: Right, I wasn't coming here to get someone to debug my app, I'm just looking for ideas. I constantly am trying to find new ways to improve my software and new ways to reduce bugs, and when I get really stuck, new ways to track bugs down. The exception won't mean much, but I can say that the error appears to me as bad data. I do checks prior to performing actions on any data; if the data doesn't look like what it should look like, then the system flags an exception. The problem I'm having is determining how the data went bad. In tracking down the problem a couple of guys mentioned that problems like that usually are a race condition. From here I examined my code, checked out all the locking stuff, made sure it was good, and wasn't able to find anything. Being that there's one lock and the critical sections are well defined, I'm having difficulty. One idea I have to [...] Are you 100% rock bottom gold plated guaranteed sure that there is not something else that is also critical that you just haven't realised is? This stuff is never obvious before the fact - and always seems stupid afterward, when you have found it. Your best (some would say only) weapon is your imagination, fueled by scepticism... [...] try and get a better understanding might be to check data before it's stored. Again, I still don't know how it would get messed up nor can I reproduce the error on my own. Do any of you think that would be a good practice for trying to track this down? (Check the data after reading it, check the data before saving it) Nothing wrong with doing that to find a bug - not as a general practice, of course - that would be too pessimistic. In hard-to-find bugs, doing anything to narrow the time and place of the error down is fair game - the object is to get you to read some code that you *know works* with new eyes...
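Carl's idea of checking data both after reading it and before saving it can be wrapped around the shared state, so that every access goes through the same lock and the same validator. This is a hypothetical sketch: CheckedStore and the validator are illustrative names, not part of any library. A failure "before store" points at the writer; a failure "after read" of a value that passed on the way in means something changed it in between, which narrows the search.

```python
import threading


class CheckedStore:
    """Hypothetical wrapper: a dict guarded by one lock, with a validator run
    both before a value is stored and after it is read back."""

    def __init__(self, validate):
        self._data = {}
        self._lock = threading.Lock()
        self._validate = validate

    def put(self, key, value):
        self._check(key, value, "before store")   # catch bad writers early
        with self._lock:
            self._data[key] = value

    def get(self, key):
        with self._lock:
            value = self._data[key]
        self._check(key, value, "after read")     # catch corruption in between
        return value

    def _check(self, key, value, where):
        if not self._validate(value):
            raise ValueError(f"bad data for {key!r} {where}: {value!r}")
```

As Hendrik says, this is a bug-hunting aid rather than a general practice; once the fault is found, the extra checks can be removed or made conditional.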
I build in a global boolean variable that I call trace, and when it's on I do all sorts of weird stuff, giving a running commentary (either by print or in some log-like file) of what the programme is doing, like read this, wrote that, received this, done that here, etc. A bare useful minimum is a "we get here" indicator like the routine name, but the data helps a lot too. Compared to an assert, it does not stop the execution, and you could get lucky by cross-correlating such traces from different threads - or better, if you use a queue or a pipe for the log, you might see the timing relationships directly. But this in itself is fraught with danger, as you can hit file size limits, or slow the whole thing down to unusability. On the other hand it does not generate the volume that a genuine trace does, it is easier to read, and you can limit it to the bits that you are currently suspicious of. Programming is such fun... hth - Hendrik -- http://mail.python.org/mailman/listinfo/python-list
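Hendrik's trace switch maps naturally onto Python's standard logging module. The sketch below is one way to do it, not his code, and assumes Python 3.2 or later: worker threads only put records on a queue (logging.handlers.QueueHandler), a single background thread writes them out (logging.handlers.QueueListener), each line carries the thread name and a timestamp so traces from different threads can be lined up, and a RotatingFileHandler caps the file size he warns about. The names TRACE, trace() and start_trace_writer() are illustrative.

```python
import logging
import logging.handlers
import queue

TRACE = True   # the global on/off switch; set to False to silence tracing

_trace_queue = queue.Queue()
_tracer = logging.getLogger("trace")
_tracer.setLevel(logging.DEBUG)
_tracer.propagate = False          # keep trace lines out of the normal log
_tracer.addHandler(logging.handlers.QueueHandler(_trace_queue))


def trace(msg, *args):
    """Record a 'we get here' line; does nothing when TRACE is off."""
    if TRACE:
        _tracer.debug(msg, *args)


def start_trace_writer(path, max_bytes=10_000_000, backups=3):
    """Drain the queue to a size-capped, rotating file on a background thread."""
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backups)
    handler.setFormatter(logging.Formatter(
        "%(relativeCreated)12.3f ms  %(threadName)-12s %(message)s"))
    listener = logging.handlers.QueueListener(_trace_queue, handler)
    listener.start()
    return listener   # call listener.stop() at shutdown to flush the queue
```

Because the traced threads only enqueue a record, the slow part (file I/O) happens on the listener thread, which softens, but does not remove, the slowdown Hendrik mentions.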
The reliability of python threads
Hey everyone, I have a question about python threads. Before anyone goes further, this is not a debate about threads vs. processes, just a question. With that, are python threads reliable? Or rather, are they safe? I've had some strange errors in the past, I use threading.lock for my critical sections, but I wonder if that is really good enough. Does anyone have any conclusive evidence that python threads/locks are safe or unsafe? Thanks, Carl -- Carl J. Van Arsdall [EMAIL PROTECTED] Build and Release MontaVista Software -- http://mail.python.org/mailman/listinfo/python-list
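For reference, the usual pattern for a critical section with threading.Lock is to hold the lock in a with statement, so it is released even if the body raises. A minimal sketch with a hypothetical shared counter:

```python
import threading


class Counter:
    """A shared counter whose read-modify-write is protected by a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def increment(self):
        # "self.value += 1" is a read, an add and a store; without the lock,
        # another thread can run in between and an update can be lost.
        with self._lock:
            self.value += 1


def bump_concurrently(counter, n_threads=8, per_thread=10_000):
    """Increment the counter from several threads and return the final value."""
    def work():
        for _ in range(per_thread):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

The lock only protects what is done while it is held: any other code path that touches the same data without taking it reopens the race.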
Re: The reliability of python threads
In article [EMAIL PROTECTED], Carl J. Van Arsdall [EMAIL PROTECTED] writes: | Hey everyone, I have a question about python threads. Before anyone | goes further, this is not a debate about threads vs. processes, just a | question. | | With that, are python threads reliable? Or rather, are they safe? I've | had some strange errors in the past, I use threading.lock for my | critical sections, but I wonder if that is really good enough. | | Does anyone have any conclusive evidence that python threads/locks are | safe or unsafe? Unsafe. They are built on top of unsafe primitives (POSIX, Microsoft etc.) Python will shield you from some problems, but not all. There is precious little that you can do, because the root cause is that the standards and specifications are hopelessly flawed. Regards, Nick Maclaren. -- http://mail.python.org/mailman/listinfo/python-list
Re: The reliability of python threads
On 24 Jan 2007 17:12:19 GMT, Nick Maclaren [EMAIL PROTECTED] wrote: In article [EMAIL PROTECTED], Carl J. Van Arsdall [EMAIL PROTECTED] writes: | Hey everyone, I have a question about python threads. Before anyone | goes further, this is not a debate about threads vs. processes, just a | question. | | With that, are python threads reliable? Or rather, are they safe? I've | had some strange errors in the past, I use threading.lock for my | critical sections, but I wonder if that is really good enough. | | Does anyone have any conclusive evidence that python threads/locks are | safe or unsafe? Unsafe. They are built on top of unsafe primitives (POSIX, Microsoft etc.) Python will shield you from some problems, but not all. There is precious little that you can do, because the root cause is that the standards and specifications are hopelessly flawed. This is sufficiently inaccurate that I would call it FUD. Using threads from Python, as from any other language, requires knowledge of the tradeoffs and limitations of threading, but claiming that their usage is *inherently* unsafe isn't true. It is almost certain that your code and locking are flawed, not that the threads underneath you are buggy. -- http://mail.python.org/mailman/listinfo/python-list
Re: The reliability of python threads
Carl Does anyone have any conclusive evidence that python threads/locks Carl are safe or unsafe? In my experience Python threads are generally safer than the programmers that use them. ;-) Skip -- http://mail.python.org/mailman/listinfo/python-list
Re: The reliability of python threads
[EMAIL PROTECTED] wrote: Carl Does anyone have any conclusive evidence that python threads/locks Carl are safe or unsafe? In my experience Python threads are generally safer than the programmers that use them. ;-) Haha, yea, tell me about it. The whole GIL thing made me nervous about the locking operations happening truly atomically and not getting weird. Thanks for reassuring me that I'm just nuts :) -carl -- Carl J. Van Arsdall [EMAIL PROTECTED] Build and Release MontaVista Software -- http://mail.python.org/mailman/listinfo/python-list
Re: The reliability of python threads
In article [EMAIL PROTECTED], Chris Mellon [EMAIL PROTECTED] writes: | | | | Does anyone have any conclusive evidence that python threads/locks are | | safe or unsafe? | | Unsafe. They are built on top of unsafe primitives (POSIX, Microsoft | etc.) Python will shield you from some problems, but not all. | | There is precious little that you can do, because the root cause is | that the standards and specifications are hopelessly flawed. | | This is sufficiently inaccurate that I would call it FUD. Using | threads from Python, as from any other language, requires knowledge of | the tradeoffs and limitations of threading, but claiming that their | usage is *inherently* unsafe isn't true. It is almost certain that | your code and locking are flawed, not that the threads underneath you | are buggy. I suggest that you find out rather more about the ill-definition of POSIX threading memory model, to name one of the better documented aspects. A Web search should provide you with more information on the ghastly mess than any sane person wants to know. And that is only one of many aspects :-( Regards, Nick Maclaren. -- http://mail.python.org/mailman/listinfo/python-list
Re: The reliability of python threads
On 24 Jan 2007 18:21:38 GMT, Nick Maclaren [EMAIL PROTECTED] wrote: In article [EMAIL PROTECTED], Chris Mellon [EMAIL PROTECTED] writes: | | | | Does anyone have any conclusive evidence that python threads/locks are | | safe or unsafe? | | Unsafe. They are built on top of unsafe primitives (POSIX, Microsoft | etc.) Python will shield you from some problems, but not all. | | There is precious little that you can do, because the root cause is | that the standards and specifications are hopelessly flawed. | | This is sufficiently inaccurate that I would call it FUD. Using | threads from Python, as from any other language, requires knowledge of | the tradeoffs and limitations of threading, but claiming that their | usage is *inherently* unsafe isn't true. It is almost certain that | your code and locking are flawed, not that the threads underneath you | are buggy. I suggest that you find out rather more about the ill-definition of POSIX threading memory model, to name one of the better documented aspects. A Web search should provide you with more information on the ghastly mess than any sane person wants to know. And that is only one of many aspects :-( I'm aware of the issues with the POSIX threading model. I still stand by my statement - bringing up the problems with the provability of correctness in the POSIX model amounts to FUD in a discussion of actual problems with actual code. Logic and programming errors in user code are far more likely to be the cause of random errors in a threaded program than theoretical (I've never come across a case in practice) issues with the POSIX standard. Emphasizing this means that people will tend to ignore bugs as being the fault of POSIX rather than either auditing their code more carefully, or avoiding threads entirely (the second being what I suspect your goal is). As a last case, I should point out that while the POSIX memory model can't be proven safe, concrete implementations do not necessarily suffer from this problem. 
-- http://mail.python.org/mailman/listinfo/python-list
Re: The reliability of python threads
Chris Mellon wrote: On 24 Jan 2007 18:21:38 GMT, Nick Maclaren [EMAIL PROTECTED] wrote: [snip] I'm aware of the issues with the POSIX threading model. I still stand by my statement - bringing up the problems with the provability of correctness in the POSIX model amounts to FUD in a discussion of actual problems with actual code. Logic and programming errors in user code are far more likely to be the cause of random errors in a threaded program than theoretical (I've never come across a case in practice) issues with the POSIX standard. Yea, typically I would think that. The problem I am seeing is incredibly intermittent. Like a simple Pyro server that gives me a problem maybe every three or four months. Just something funky will happen to the state of the whole thing, some bad data. I'm having an issue tracking it down and some more experienced programmers mentioned that it's most likely a race condition. The thing is, I'm really not doing anything too crazy, so I'm having difficulty tracking it down. I had heard in the past that there may be issues with threads, so I thought to investigate this side of things. It still proves difficult, but reassurance of the threading model helps me focus my efforts. Emphasizing this means that people will tend to ignore bugs as being the fault of POSIX rather than either auditing their code more carefully, or avoiding threads entirely (the second being what I suspect your goal is). As a last case, I should point out that while the POSIX memory model can't be proven safe, concrete implementations do not necessarily suffer from this problem. Would you consider the Linux implementation of threads to be concrete? -carl -- Carl J. Van Arsdall [EMAIL PROTECTED] Build and Release MontaVista Software -- http://mail.python.org/mailman/listinfo/python-list
Re: The reliability of python threads
In article [EMAIL PROTECTED], Carl J. Van Arsdall [EMAIL PROTECTED] writes: | Chris Mellon wrote: | | Logic and programming errors in user code are far more likely to be | the cause of random errors in a threaded program than theoretical | (I've never come across a case in practice) issues with the POSIX | standard. | | Yea, typically I would think that. The problem I am seeing is | incredibly intermittent. Like a simple Pyro server that gives me a | problem maybe every three or four months. Just something funky will | happen to the state of the whole thing, some bad data. I'm having an | issue tracking it down and some more experienced programmers mentioned | that it's most likely a race condition. The thing is, I'm really not | doing anything too crazy, so I'm having difficulty tracking it down. I | had heard in the past that there may be issues with threads, so I | thought to investigate this side of things. I have seen that many dozens of times on half a dozen Unices, but have only tracked down the cause in a handful of cases. Of those, implementation defects that are sanctioned by the standards have accounted for about half. Note that the term "race condition" is accurate but misleading! One of the worst problems with POSIX is that it does not define how non-memory global state is synchronised. For example, it is possible for a memory update and an associated signal to occur on different sides of a synchronisation boundary. Similarly, it is possible for I/O to sidestep POSIX's synchronisation boundaries. I have seen both. Perhaps the nastiest is that POSIX leaves it unclear whether the action of synchronisation is transitive. So, if A synchronises with B, and then B with C, A may not have synchronised with C. Again, I have seen that. It can happen on Intel systems, according to the experts I know. | Would you consider the Linux implementation of threads to be concrete?
In this sort of area, Linux tends to be saner than most systems, but remember that it has had MUCH less stress testing on threaded codes than many other Unices. In fact, it was only a few years ago that Linux threads became stable enough to be worth using. Note that failures due to implementation defects and flaws in the standards are likely to show up in very obscure ways; ones due to programmer error tend to be much simpler. If you want to contact me by Email, and can describe technically what you are doing and (most importantly) what you are assuming, I may be able to give some hints. But no promises. Regards, Nick Maclaren. -- http://mail.python.org/mailman/listinfo/python-list
Re: The reliability of python threads
In article [EMAIL PROTECTED], Carl J. Van Arsdall [EMAIL PROTECTED] wrote: Hey everyone, I have a question about python threads. Before anyone goes further, this is not a debate about threads vs. processes, just a question. With that, are python threads reliable? Or rather, are they safe? I've had some strange errors in the past, I use threading.lock for my critical sections, but I wonder if that is really good enough. Does anyone have any conclusive evidence that python threads/locks are safe or unsafe? My response is that you're asking the wrong questions here. Our database server locked up hard Sunday morning, and we still have no idea why (the machine itself, not just the database app). I think it's more important to focus on whether you have done all that is reasonable to make your application reliable -- and then put your efforts into making your app recoverable. I'm particularly making this comment in the context of your later point about the bug showing up only every three or four months. Side note: without knowing what error messages you're getting, there's not much anybody can say about your programs or the reliability of threads for your application. -- Aahz ([EMAIL PROTECTED]) * http://www.pythoncraft.com/ Help a hearing-impaired person: http://rule6.info/hearing.html -- http://mail.python.org/mailman/listinfo/python-list
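One simple building block for making an application recoverable, offered as a sketch rather than anything Aahz prescribed: run the unit of work under a supervisor that logs the full traceback when it fails and starts it again, up to a limit. supervise() is a hypothetical helper, and the approach only helps if the task can rebuild its state from scratch (for example by reloading it from persistent storage); restarting on top of corrupted in-memory state recovers nothing.

```python
import logging
import time


def supervise(task, max_restarts=5, backoff_seconds=1.0, sleep=time.sleep):
    """Run task(); if it raises, log the traceback and run it again.

    Gives up, re-raising the last error, after max_restarts restarts.
    The sleep argument can be replaced in tests to avoid real delays.
    """
    restarts = 0
    while True:
        try:
            return task()
        except Exception:
            # logging.exception records the full traceback for later analysis.
            logging.exception("task failed after %d restart(s)", restarts)
            if restarts >= max_restarts:
                raise
            restarts += 1
            sleep(backoff_seconds * restarts)   # back off a little more each time
```

The logged tracebacks also double as the failure record that the rest of the thread argues is needed to diagnose the underlying bug.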
Re: The reliability of python threads
In article [EMAIL PROTECTED], [EMAIL PROTECTED] (Aahz) writes: | | My response is that you're asking the wrong questions here. Our database | server locked up hard Sunday morning, and we still have no idea why (the | machine itself, not just the database app). I think it's more important | to focus on whether you have done all that is reasonable to make your | application reliable -- and then put your efforts into making your app | recoverable. Absolutely! Shit happens. In a well-designed world, that would not be the case, but we don't live in one. Until you have identified the cause, you can't tell if threading has anything to do with the failure - given what we know, it seems likely, but what Aahz says is how to tackle the problem WHATEVER the cause. Regards, Nick Maclaren. -- http://mail.python.org/mailman/listinfo/python-list
Re: The reliability of python threads
On Jan 24, 6:43 pm, Carl J. Van Arsdall [EMAIL PROTECTED] wrote: Chris Mellon wrote: On 24 Jan 2007 18:21:38 GMT, Nick Maclaren [EMAIL PROTECTED] wrote: [snip] I'm aware of the issues with the POSIX threading model. I still stand by my statement - bringing up the problems with the provability of correctness in the POSIX model amounts to FUD in a discussion of actual problems with actual code. Logic and programming errors in user code are far more likely to be the cause of random errors in a threaded program than theoretical (I've never come across a case in practice) issues with the POSIX standard. Yea, typically I would think that. The problem I am seeing is incredibly intermittent. Like a simple Pyro server that gives me a problem maybe every three or four months. Just something funky will happen to the state of the whole thing, some bad data. I'm having an issue tracking it down and some more experienced programmers mentioned that it's most likely a race condition. The thing is, I'm really not doing anything too crazy, so I'm having difficulty tracking it down. I had heard in the past that there may be issues with threads, so I thought to investigate this side of things. It still proves difficult, but reassurance of the threading model helps me focus my efforts. [snip] -carl Three to four months before `strange errors`? I'd spend some time correlating logs; not just for your program, but for everything running on the server. Then I'd expect to cut my losses and arrange to safely re-start the program every TWO months. (I'd arrange the re-start after collecting logs but before their analysis. Life is too short). - Paddy. -- http://mail.python.org/mailman/listinfo/python-list
Re: The reliability of python threads
Carl J. Van Arsdall wrote: Chris Mellon wrote: On 24 Jan 2007 18:21:38 GMT, Nick Maclaren [EMAIL PROTECTED] wrote: [snip] I'm aware of the issues with the POSIX threading model. I still stand by my statement - bringing up the problems with the provability of correctness in the POSIX model amounts to FUD in a discussion of actual problems with actual code. Logic and programming errors in user code are far more likely to be the cause of random errors in a threaded program than theoretical (I've never come across a case in practice) issues with the POSIX standard. Yea, typically I would think that. The problem I am seeing is incredibly intermittent. Like a simple Pyro server that gives me a problem maybe every three or four months. Just something funky will happen to the state of the whole thing, some bad data. I'm having an issue tracking it down and some more experienced programmers mentioned that it's most likely a race condition. Right. You're at MontaVista, which does real-time Linux systems and support. There will be people there who thoroughly understand thread issues. (I've used QNX for real time, but MontaVista has made progress in recent years.) The Python thread documentation is kind of vague about how well the Python primitives are protected against concurrency problems. For example, do you have to protect basic types like lists and hashes against concurrent access? Is pop atomic? (It is in deque, but what about regular lists?) Can you crash Python from within Python via concurrency errors? Does the garbage collector run concurrently or does it freeze all threads? What's different depending upon whether you're using real OS threads or simulated Python threads? John Nagle -- http://mail.python.org/mailman/listinfo/python-list
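On the question of pop: collections.deque documents its appends and pops as thread-safe, and the Python FAQ entry on which global mutations are thread-safe lists single calls such as L.pop() as atomic in CPython. The usual trap is the compound check-then-act, where another thread can empty the container between the test and the pop. A sketch of draining shared work without that race, using one pop call and treating IndexError as "no work left"; drain() is a hypothetical helper.

```python
import threading
from collections import deque


def drain(work, n_consumers=4):
    """Empty a shared deque from several threads; return what each one took."""
    taken = [[] for _ in range(n_consumers)]

    def consumer(slot):
        while True:
            # Racy version: "if work: item = work.pop()" - another consumer
            # may take the last item between the test and the pop.
            try:
                item = work.pop()      # one call: either an item or IndexError
            except IndexError:
                return                 # nothing left to do
            taken[slot].append(item)   # each thread writes only its own list

    threads = [threading.Thread(target=consumer, args=(k,))
               for k in range(n_consumers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return taken
```

When producers keep adding work while consumers run, queue.Queue with its blocking get() is usually the better fit, since an empty deque here simply ends the consumer.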
Re: The reliability of python threads
On Jan 24, 10:43 am, Carl J. Van Arsdall [EMAIL PROTECTED] wrote: Yea, typically I would think that. The problem I am seeing is incredibly intermittent. Like a simple Pyro server that gives me a problem maybe every three or four months. Just something funky will happen to the state of the whole thing, some bad data. I'm having an issue tracking it down and some more experienced programmers mentioned that it's most likely a race condition. The thing is, I'm really not doing anything too crazy, so I'm having difficulty tracking it down. I had heard in the past that there may be issues with threads, so I thought to investigate this side of things. POSIX issues aside, Python's threading model should be less susceptible to memory-barrier problems that are possible in other languages (this is due to the GIL). Double-checked locking, for instance, is safe in Python even though it isn't in Java. Are you ever relying solely on the GIL to access shared data? -Mike -- http://mail.python.org/mailman/listinfo/python-list
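Double-checked locking, which Klaas mentions, is a lazy-initialisation pattern: test without the lock, and only if the object is missing, take the lock and test again before creating it. A Python sketch follows, with a hypothetical factory argument. The unlocked first read relies on the global name being bound only after the object is fully built, which is the CPython behaviour Klaas describes; if you would rather not depend on that (the concern Paul Rubin raises), drop the outer if and always take the lock.

```python
import threading

_instance = None
_instance_lock = threading.Lock()


def get_instance(factory):
    """Create the shared object on first use; later calls skip the lock."""
    global _instance
    if _instance is None:                  # first check: cheap, no lock
        with _instance_lock:
            if _instance is None:          # second check: another thread may
                _instance = factory()      # have created it while we waited
    return _instance
```

Without the second check, two threads that both saw None could each call factory(), and one of the two objects would silently replace the other.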
Re: The reliability of python threads
Klaas [EMAIL PROTECTED] writes: POSIX issues aside, Python's threading model should be less susceptible to memory-barrier problems that are possible in other languages (this is due to the GIL). But the GIL is not part of Python's threading model; it's just a particular implementation artifact. Programs that rely on it are asking for trouble. Double-checked locking, frinstance, is safe in python even though it isn't in java. What's that? Are you ever relying solely on the GIL to access shared data? I think a lot of programs do that, which is probably unwise in the long run.
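To make Paul's warning concrete: the GIL does not make compound operations atomic. `c.value += 1` on a shared object compiles to several bytecode instructions (load, add, store), and a thread switch between them can lose an update. A minimal sketch with a hypothetical `Counter` class that takes an explicit `threading.Lock` instead of relying on the GIL; the `dis` call simply prints the instructions so the multiple steps are visible (exact opcodes vary between Python versions).

```python
import dis
import threading

class Counter:
    """Shared counter that uses an explicit lock instead of relying on the GIL."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # The read-modify-write below is several bytecode instructions;
        # holding the lock makes the whole sequence atomic with respect
        # to other threads calling increment().
        with self._lock:
            self.value += 1

def bump(counter, times):
    for _ in range(times):
        counter.increment()

if __name__ == "__main__":
    # Show that "c.value += 1" is not a single instruction.
    dis.dis(compile("c.value += 1", "<example>", "exec"))

    c = Counter()
    threads = [threading.Thread(target=bump, args=(c, 10_000)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(c.value)  # 40000 with the lock in place
```

This version also behaves the same on implementations without a GIL, which is Paul's underlying point.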
Re: The reliability of python threads
On Jan 24, 4:11 pm, Paul Rubin http://[EMAIL PROTECTED] wrote: Klaas [EMAIL PROTECTED] writes: POSIX issues aside, Python's threading model should be less susceptible to memory-barrier problems that are possible in other languages (this is due to the GIL). But the GIL is not part of Python's threading model; it's just a particular implementation artifact. Programs that rely on it are asking for trouble. CPython is more than a particular implementation of python, and the GIL is more than an artifact. It is a central tenet of threaded python programming. I don't advocate relying on the GIL to manage shared data when threading, but 1) it is useful for the reasons I mention, and 2) the OP's question was almost certainly about an application written for and run on CPython. Double-checked locking, frinstance, is safe in python even though it isn't in java. What's that? google.com -Mike
Re: The reliability of python threads
Klaas [EMAIL PROTECTED] writes: CPython is more than a particular implementation of python, It's precisely a particular implementation of Python. Other implementations include Jython, PyPy, and IronPython. and the GIL is more than an artifact. It is a central tenet of threaded python programming. If it's a central tenet of threaded python programming, why is it not mentioned at all in the language or library manual? The threading module documentation describes the right way to handle thread synchronization in Python, and that module implements traditional locking approaches without reference to the GIL. I don't advocate relying on the GIL to manage shared data when threading, but 1) it is useful for the reasons I mention, and 2) the OP's question was almost certainly about an application written for and run on CPython. Possibly true.
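The `threading` module's traditional primitives (`Lock`, `RLock`, `Condition`, `Semaphore`, `Event`) are the documented way to coordinate threads, with no reference to the GIL. A minimal sketch of a bounded buffer built on `threading.Condition`; the `BoundedBuffer` class is illustrative only, and in practice the standard library's `queue.Queue` already provides this behaviour.

```python
import threading
from collections import deque

class BoundedBuffer:
    """Fixed-capacity FIFO shared between producer and consumer threads."""

    def __init__(self, capacity):
        self._items = deque()
        self._capacity = capacity
        # Both conditions share one lock, so state checks and waits are atomic.
        self._lock = threading.Lock()
        self._not_full = threading.Condition(self._lock)
        self._not_empty = threading.Condition(self._lock)

    def put(self, item):
        with self._not_full:
            # wait_for re-checks the predicate after every wakeup,
            # which also guards against spurious wakeups.
            self._not_full.wait_for(lambda: len(self._items) < self._capacity)
            self._items.append(item)
            self._not_empty.notify()

    def get(self):
        with self._not_empty:
            self._not_empty.wait_for(lambda: len(self._items) > 0)
            item = self._items.popleft()
            self._not_full.notify()
            return item

if __name__ == "__main__":
    buf = BoundedBuffer(capacity=2)
    received = []

    def consumer():
        for _ in range(5):
            received.append(buf.get())

    t = threading.Thread(target=consumer)
    t.start()
    for i in range(5):
        buf.put(i)      # blocks whenever the buffer already holds 2 items
    t.join()
    print(received)     # [0, 1, 2, 3, 4]
```

Sharing one lock between the two conditions mirrors the usual producer/consumer design: a thread can only check "is there room?" or "is there data?" while no other thread is changing the deque.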
Re: The reliability of python threads
and the GIL is more than an artifact. It is a central tenet of threaded python programming. If it's a central tenet of threaded python programming, why is it not mentioned at all in the language or library manual? The threading module documentation describes the right way to handle thread synchronization in Python, and that module implements traditional locking approaches without reference to the GIL. And we all hope the GIL will one day die its natural death ... maybe... probably... hopefully ;) -- damjan
Re: The reliability of python threads
On Jan 24, 5:18 pm, Paul Rubin http://[EMAIL PROTECTED] wrote: Klaas [EMAIL PROTECTED] writes: CPython is more than a particular implementation of python, It's precisely a particular implementation of Python. Other implementations include Jython, PyPy, and IronPython. I did not deny that it is an implementation of Python. I deny that it is but an implementation of Python. Jython: several versions behind, used primarily for interfacing with Java. PyPy: years away from being a practical platform for replacing CPython. IronPython: best example you've given, but still probably three or four orders of magnitude less significant than CPython. and the GIL is more than an artifact. It is a central tenet of threaded python programming. If it's a central tenet of threaded python programming, why is it not mentioned at all in the language or library manual? The same reason why IE CSS quirks are not delineated in the HTML 4.01 spec. This doesn't mean that they aren't central to CSS web programming (they are). How could the GIL, which allows only one thread at a time to run Python code in a single process, NOT be a central part of threaded python programming? The threading module documentation describes the right way to handle thread synchronization in Python, and that module implements traditional locking approaches without reference to the GIL. No-one has argued that the GIL should be used instead of threading-based locking. How could they? The two concepts are not interchangeable, and while they affect each other, they are two different things entirely. In the post you responded to and quoted I said: I don't advocate relying on the GIL to manage shared data when threading, -Mike