Re: Killing threads (was Re: Cancel or timeout a long running regular expression)
On Wed, Sep 21, 2011 at 07:41:50AM +0200, Martin v. Loewis wrote:
>> Is it just that nobody's implemented it, or is there a good reason for avoiding offering this sort of thing?
>
> I've been considering to implement killing threads several times for the last 15 years (I think about it once every year), and every time I give up because it's too complex and just not implementable. To start with, a simple flag in the thread won't do any good.

I don't agree. Now if you had written that it wouldn't solve all problems, I could understand that. But I have been in circumstances where a simple flag in the thread implementation would have been helpful.

> It will not cancel blocking system calls, so people will complain that the threads they meant to cancel continue to run forever. Instead, you have to use some facility to interrupt blocking system calls. You then have to convince callers of those blocking system calls not to retry when they see that the first attempt to call it was interrupted. And so on.

But this is no longer an implementation problem but a use problem. If someone gets an IOError for writing on a closed pipe, catches the exception, and retries the write in a loop, then this is a problem of the author of that loop, not of exceptions. So if one thread throws an exception to another thread, for instance to indicate a timeout, and the latter catches that exception and retries what it was doing in a loop, that is entirely the problem of the author of that loop and not of the ability of one thread to throw an exception in another. Unless, of course, there are a lot of such problematic loops within the internal Python code.

-- Antoon Pardon
-- http://mail.python.org/mailman/listinfo/python-list
Re: Killing threads (was Re: Cancel or timeout a long running regular expression)
Is it just that nobody's implemented it, or is there a good reason for avoiding offering this sort of thing? I've been considering to implement killing threads several times for the last 15 years (I think about it once every year), and every time I give up because it's too complex and just not implementable. To start with, a simple flag in the thread won't do any good. It will not cancel blocking system calls, so people will complain that the threads they meant to cancel continue to run forever. Instead, you have to use some facility to interrupt blocking system calls. You then have to convince callers of those blocking system calls not to retry when they see that the first attempt to call it was interrupted. And so on. Regards, Martin -- http://mail.python.org/mailman/listinfo/python-list
Re: Killing threads (was Re: Cancel or timeout a long running regular expression)
Ian Kelly wrote:
> And what if the thread gets killed a second time while it's in the except block?
> And what if the thread gets killed in the middle of the commit?

For these kinds of reasons, any feature for raising asynchronous exceptions in another thread would need to come with some related facilities:

* A way of blocking asynchronous exceptions around a critical section would be needed.
* Once an asynchronous exception has been raised, further asynchronous exceptions should be blocked until explicitly re-enabled.
* Asynchronous exceptions should probably be disabled initially in a new thread until it explicitly enables them.

Some care would still be required to write code that is robust in the presence of asynchronous exceptions, but given these facilities, it ought to be possible.

-- Greg
Re: Killing threads (was Re: Cancel or timeout a long running regular expression)
The point of the Java Thread.stop() being deprecated seems to have very little to do with undeclared exceptions being raised and a lot to do with objects being left in a potentially damaged state. As Ian said, it's a lot more complex than just adding try/catches. Killing a thread in the middle of some non-atomic operation with side-effects that propagate beyond the thread is a recipe for trouble.

In fact, while a lot can be written about Java being a poor language, the specific article linked to about why Java deprecated Thread.stop() gives a pretty damn good explanation as to why Thread.stop() and the like are a bad idea and what a better idea might be (signalling that a graceful halt should be attempted).
Re: Killing threads (was Re: Cancel or timeout a long running regular expression)
On Mon, Sep 19, 2011 at 3:41 PM, Ian Kelly ian.g.ke...@gmail.com wrote: And what if the thread gets killed in the middle of the commit? Database managers solved this problem years ago. It's not done by preventing death until you're done - death can come from someone brutally pulling out your power cord. There's no except PowerCordRemoved to protect you from that! There are various ways, and I'm sure one of them will work for whatever situation is needed. ChrisA -- http://mail.python.org/mailman/listinfo/python-list
Re: Killing threads (was Re: Cancel or timeout a long running regular expression)
On Sun, Sep 18, 2011 at 07:35:01AM +1000, Chris Angelico wrote:
> On Sun, Sep 18, 2011 at 5:00 AM, Nobody nob...@nowhere.com wrote:
>
> Forking a thread to discuss threads ahem.
>
> Why is it that threads can't be killed? Do Python threads correspond to OS-provided threads (eg POSIX threads on Linux)? Every OS threading library I've seen has some way of killing threads, although I've not looked in detail into POSIX threads there (there seem to be two options, pthread_kill and pthread_cancel, that could be used, but I've not used either). If nothing else, it ought to be possible to implement a high level kill simply by setting a flag that the interpreter will inspect every few commands, the same way that KeyboardInterrupt is checked for.
>
> Is it just that nobody's implemented it, or is there a good reason for avoiding offering this sort of thing?

Python has a half-baked solution to this. If you go to http://docs.python.org/release/3.2.2/c-api/init.html you will find the following:

    int PyThreadState_SetAsyncExc(long id, PyObject *exc)

    Asynchronously raise an exception in a thread. The id argument is the thread id of the target thread; exc is the exception object to be raised. This function does not steal any references to exc. To prevent naive misuse, you must write your own C extension to call this. Must be called with the GIL held. Returns the number of thread states modified; this is normally one, but will be zero if the thread id isn't found. If exc is NULL, the pending exception (if any) for the thread is cleared. This raises no exceptions.

Some recipes can be found at: http://www.google.com/search?ie=UTF-8&oe=utf-8&q=python+recipe+PyThreadState_SetAsyncExc

However, this doesn't work 100% correctly. The last time I tried using this, it didn't work with an exception instance but only with an exception class as parameter. There was a discussion at http://mail.python.org/pipermail/python-dev/2006-August/068158.html about this. I don't know how it was finally resolved.

-- Antoon Pardon
Re: Killing threads (was Re: Cancel or timeout a long running regular expression)
On Sun, 18 Sep 2011 23:41:29 -0600, Ian Kelly wrote: If the transaction object doesn't get its commit() called, it does no actions at all, thus eliminating all issues of locks. And what if the thread gets killed in the middle of the commit? The essence of a commit is that it involves an atomic operation, for which there is no middle. -- http://mail.python.org/mailman/listinfo/python-list
Re: Killing threads (was Re: Cancel or timeout a long running regular expression)
On Mon, Sep 19, 2011 at 12:25 AM, Chris Angelico ros...@gmail.com wrote: On Mon, Sep 19, 2011 at 3:41 PM, Ian Kelly ian.g.ke...@gmail.com wrote: And what if the thread gets killed in the middle of the commit? Database managers solved this problem years ago. It's not done by preventing death until you're done - death can come from someone brutally pulling out your power cord. There's no except PowerCordRemoved to protect you from that! I'm aware of that. I'm not saying it's impossible, just that the example you gave is over-simplified, as writing atomic transactional logic is a rather complex topic. There may be an existing Python library to handle this, but I'm not aware of one. PowerCordRemoved is not relevant here, as that would kill the entire process, which renders the issue of broken shared data within a continuing process rather moot. Cheers, Ian -- http://mail.python.org/mailman/listinfo/python-list
Re: Killing threads (was Re: Cancel or timeout a long running regular expression)
On Tue, Sep 20, 2011 at 8:04 AM, Ian Kelly ian.g.ke...@gmail.com wrote: PowerCordRemoved is not relevant here, as that would kill the entire process, which renders the issue of broken shared data within a continuing process rather moot. Assuming that the broken shared data exists only in RAM on one single machine, and has no impact on the state of anything on the hard disk or on any other computer, yes. ChrisA -- http://mail.python.org/mailman/listinfo/python-list
Re: Killing threads (was Re: Cancel or timeout a long running regular expression)
Antoon Pardon wrote: int PyThreadState_SetAsyncExc(long id, PyObject *exc) To prevent naive misuse, you must write your own C extension to call this. Not if we use ctypes! Muahahahaaa! -- Greg -- http://mail.python.org/mailman/listinfo/python-list
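Editorial sketch of the ctypes route Greg alludes to. It assumes CPython 3.7 or later, where the id parameter is an unsigned long, and it passes an exception class rather than an instance, in line with Antoon's observation. The names StopRequested, async_raise and worker are made up for this illustration and are not part of any library. The exception is only delivered once the target thread is back to executing Python bytecode, so it does not interrupt a blocking call.

```python
import ctypes
import threading
import time


class StopRequested(Exception):
    """Hypothetical exception type used only for this illustration."""


def async_raise(thread, exc_type):
    """Ask the interpreter to raise exc_type (a class) in the given thread."""
    # ctypes.pythonapi calls are made with the GIL held, as the C API requires.
    modified = ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ctypes.c_ulong(thread.ident), ctypes.py_object(exc_type))
    if modified == 0:
        raise ValueError("thread id not found")
    return modified


def worker(log):
    try:
        while True:
            time.sleep(0.01)  # the exception shows up after the sleep returns
    except StopRequested:
        log.append("stopped")


log = []
t = threading.Thread(target=worker, args=(log,))
t.start()
time.sleep(0.05)            # give the worker time to enter its loop
async_raise(t, StopRequested)
t.join(timeout=5)
```

None of the concerns raised elsewhere in this thread go away: the worker can still be interrupted between any two bytecodes, including inside a critical section.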
Re: Cancel or timeout a long running regular expression
Thanks for everyone's comments - much appreciated! Malcolm (the OP) -- http://mail.python.org/mailman/listinfo/python-list
Re: Killing threads (was Re: Cancel or timeout a long running regular expression)
On Sat, Sep 17, 2011 at 5:38 PM, Chris Angelico ros...@gmail.com wrote:
> But if it's done as an exception, all you need is to catch that exception and reraise it:
>
> def threadWork(lock, a1, a2, rate):
>     try:
>         while True:
>             time.sleep(rate)
>             lock.acquire()
>             t = a2.balance / 2
>             a1.balance += t
>             #say a thread.kill kills at this point
>             a2.balance -= t
>             lock.release()
>     except:
>         # roll back the transaction in some way
>         lock.release()
>         raise

And what if the thread gets killed a second time while it's in the except block?

> It'd require some care in coding, but it could be done. And if the lock/transaction object can be coded for it, it could even be done automatically:
>
> def threadWork(lock, a1, a2, rate):
>     while True:
>         time.sleep(rate)
>         transaction.begin()
>         t = a2.balance / 2
>         transaction.apply(a1.balance,t)
>         #say a thread.kill kills at this point
>         transaction.apply(a2.balance,-t)
>         transaction.commit()
>
> If the transaction object doesn't get its commit() called, it does no actions at all, thus eliminating all issues of locks.

And what if the thread gets killed in the middle of the commit? Getting the code right is going to be a lot more complicated than just adding a couple of try/excepts.

Cheers, Ian
Re: Cancel or timeout a long running regular expression
On Fri, 16 Sep 2011 18:01:27 -0400, Terry Reedy wrote: Now, can you write that as a heuristic *algorithm* def dangerous_re(some_re):? return re.search(r'\\\d', some_re) is not None That will handle the most extreme case ;) If the OP is serious about analysing regexps, sre_parse.parse() will decompose a regexp to a more convenient form. However, I wouldn't rely upon being able to catch every possible bad case. The only robust solution is to use a separate process (threads won't suffice, as they don't have a .kill() method). -- http://mail.python.org/mailman/listinfo/python-list
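Editorial sketch of the separate-process approach, assuming the pattern and the text are small enough to pass on the command line. The helper name search_with_timeout and the child script are made up for this illustration. It relies on subprocess.run killing the child when the timeout expires.

```python
import subprocess
import sys

# Child script: reads pattern and text from argv, prints 1 or 0.
_CHILD = (
    "import re, sys\n"
    "print(1 if re.search(sys.argv[1], sys.argv[2]) else 0)\n"
)


def search_with_timeout(pattern, text, timeout):
    """Return True/False for a match, or None if the time limit was hit."""
    try:
        done = subprocess.run(
            [sys.executable, "-c", _CHILD, pattern, text],
            capture_output=True, text=True, timeout=timeout, check=True)
    except subprocess.TimeoutExpired:
        return None  # subprocess.run has already killed the child
    return done.stdout.strip() == "1"
```

An invalid pattern makes the child exit with an error, which surfaces here as subprocess.CalledProcessError. Starting an interpreter per search is exactly the overhead the OP was worried about, so this only suits low call rates.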
Killing threads (was Re: Cancel or timeout a long running regular expression)
On Sun, Sep 18, 2011 at 5:00 AM, Nobody nob...@nowhere.com wrote: The only robust solution is to use a separate process (threads won't suffice, as they don't have a .kill() method). Forking a thread to discuss threads ahem. Why is it that threads can't be killed? Do Python threads correspond to OS-provided threads (eg POSIX threads on Linux)? Every OS threading library I've seen has some way of killing threads, although I've not looked in detail into POSIX threads there (there seem to be two options, pthread_kill and pthread_cancel, that could be used, but I've not used either). If nothing else, it ought to be possible to implement a high level kill simply by setting a flag that the interpreter will inspect every few commands, the same way that KeyboardInterrupt is checked for. Is it just that nobody's implemented it, or is there a good reason for avoiding offering this sort of thing? Chris Angelico -- http://mail.python.org/mailman/listinfo/python-list
Re: Killing threads (was Re: Cancel or timeout a long running regular expression)
On Sat, Sep 17, 2011 at 2:35 PM, Chris Angelico ros...@gmail.com wrote: On Sun, Sep 18, 2011 at 5:00 AM, Nobody nob...@nowhere.com wrote: The only robust solution is to use a separate process (threads won't suffice, as they don't have a .kill() method). Forking a thread to discuss threads ahem. Why is it that threads can't be killed? Do Python threads correspond to OS-provided threads (eg POSIX threads on Linux)? Every OS threading library I've seen has some way of killing threads, although I've not looked in detail into POSIX threads there (there seem to be two options, pthread_kill and pthread_cancel, that could be used, but I've not used either). If nothing else, it ought to be possible to implement a high level kill simply by setting a flag that the interpreter will inspect every few commands, the same way that KeyboardInterrupt is checked for. Is it just that nobody's implemented it, or is there a good reason for avoiding offering this sort of thing? It's possible that the reason is analogous to why Java has deprecated its equivalent, Thread.stop(): http://download.oracle.com/javase/1.4.2/docs/guide/misc/threadPrimitiveDeprecation.html Cheers, Chris -- http://mail.python.org/mailman/listinfo/python-list
Re: Killing threads (was Re: Cancel or timeout a long running regular expression)
On Sun, Sep 18, 2011 at 8:27 AM, Chris Rebert c...@rebertia.com wrote: It's possible that the reason is analogous to why Java has deprecated its equivalent, Thread.stop(): http://download.oracle.com/javase/1.4.2/docs/guide/misc/threadPrimitiveDeprecation.html Interesting. The main argument against having a way to raise an arbitrary exception in a different thread is that it gets around Java's requirement to declare all exceptions that a routine might throw - a requirement that Python doesn't have. So does that mean it'd be reasonable to have a way to trigger a TerminateThread exception (like SystemExit but for one thread) remotely? The above article recommends polling a variable, but that's the exact sort of thing that exceptions are meant to save you from doing. ChrisA -- http://mail.python.org/mailman/listinfo/python-list
Re: Killing threads (was Re: Cancel or timeout a long running regular expression)
On 17Sep2011 15:27, Chris Rebert c...@rebertia.com wrote:
| On Sat, Sep 17, 2011 at 2:35 PM, Chris Angelico ros...@gmail.com wrote:
| On Sun, Sep 18, 2011 at 5:00 AM, Nobody nob...@nowhere.com wrote:
| The only robust solution is to use a separate process (threads won't
| suffice, as they don't have a .kill() method).
|
| Forking a thread to discuss threads ahem.
|
| Why is it that threads can't be killed? Do Python threads correspond
| to OS-provided threads (eg POSIX threads on Linux)? Every OS threading
| library I've seen has some way of killing threads, although I've not
| looked in detail into POSIX threads there (there seem to be two
| options, pthread_kill and pthread_cancel, that could be used, but I've
| not used either). If nothing else, it ought to be possible to
| implement a high level kill simply by setting a flag that the
| interpreter will inspect every few commands, the same way that
| KeyboardInterrupt is checked for.
|
| Is it just that nobody's implemented it, or is there a good reason for
| avoiding offering this sort of thing?
|
| It's possible that the reason is analogous to why Java has deprecated
| its equivalent, Thread.stop():
| http://download.oracle.com/javase/1.4.2/docs/guide/misc/threadPrimitiveDeprecation.html

Interesting. A lot of that discussion concerns exceptions that the Java app is unprepared for. Java's strong typing includes the throwable exceptions, so that's a quite legitimate concern. The aborting mutex regions thing is also very real.

Conversely, Python can have unexpected exceptions anywhere, anytime because it is not strongly typed in this way. That doesn't make it magically robust against this, but does mean this is _already_ an issue in Python programs, threaded or otherwise.

Context managers can help a lot here, in that they offer a reliable exception handler in a less ad hoc fashion than try/except because it is tied to the locking object; but they won't magically step in and save your basic:

    with my_lock:
        stuff...

Personally I'm of the view that thread stopping should be part of the overt program logic, not a low level facility (such as causing a ThreadDeath exception asynchronously). The latter has all the troubles in the cited URL. Doing it overtly would go like this:

    ... outside ...
    that_thread.stop()   # sets the stopping flag on the thread object
    that_thread.join()   # and now maybe we wait for it...

    ... thread code ...
    ... do stuff, eg:
    with my_lock:
        muck about ...
    if thread.stopping:
        abort now, _outside_ the mutex
    ...

This avoids the issue of aborting in the middle of supposedly mutex-safe code. It still requires scattering checks on thread.stopping through library code such as the OP's rogue regexp evaluator.

Cheers,
--
Cameron Simpson c...@zip.com.au DoD#743
http://www.cskk.ezoshosting.com/cs/

One measure of `programming complexity' is the number of mental objects you have to keep in mind simultaneously in order to understand a program. The mental juggling act is one of the most difficult aspects of programming and is the reason programming requires more concentration than other activities. It is the reason programmers get upset about `quick interruptions' -- such interruptions are tantamount to asking a juggler to keep three balls in the air and hold your groceries at the same time.
- Steve McConnell, _Code Complete_
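Editorial sketch of the overt stop flag described above, using threading.Event as the flag. The class name StoppableThread and its attributes are made up for this illustration. The flag is only tested outside the `with` block, so the thread never abandons the mutex-protected section halfway.

```python
import threading
import time


class StoppableThread(threading.Thread):
    """Thread with a cooperative stop flag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._stop_requested = threading.Event()
        self.lock = threading.Lock()
        self.iterations = 0

    def stop(self):
        """Set the stopping flag; the thread notices it at its next check."""
        self._stop_requested.set()

    def run(self):
        while not self._stop_requested.is_set():  # checked outside the mutex
            with self.lock:
                self.iterations += 1              # "muck about"
            # Wait on the flag instead of sleeping, so stop() wakes us early.
            self._stop_requested.wait(0.01)


that_thread = StoppableThread()
that_thread.start()
time.sleep(0.05)
that_thread.stop()            # sets the stopping flag on the thread object
that_thread.join(timeout=5)   # and now we wait for it
```

As noted above, this cannot interrupt code that never checks the flag, such as a single long-running call into the re module.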
Re: Killing threads (was Re: Cancel or timeout a long running regular expression)
On 9/17/2011 7:19 PM, Chris Angelico wrote:
> On Sun, Sep 18, 2011 at 8:27 AM, Chris Rebert c...@rebertia.com wrote:
>> It's possible that the reason is analogous to why Java has deprecated its equivalent, Thread.stop(): http://download.oracle.com/javase/1.4.2/docs/guide/misc/threadPrimitiveDeprecation.html
>
> Interesting. The main argument against having a way to raise an arbitrary exception in a different thread is that it gets around Java's requirement to declare all exceptions that a routine might throw - a requirement that Python doesn't have.

I saw the main argument as being that stopping a thread at an arbitrary point can have an arbitrary, unpredictable effect on all other threads. And more so than shutting down an independent process.

-- Terry Jan Reedy
Re: Killing threads (was Re: Cancel or timeout a long running regular expression)
On Sun, Sep 18, 2011 at 9:26 AM, Dennis Lee Bieber wlfr...@ix.netcom.com wrote:
> def threadWork(lock, a1, a2, rate):
>     while True:
>         time.sleep(rate)
>         lock.acquire()
>         t = a2.balance / 2
>         a1.balance += t
>         #say a thread.kill kills at this point
>         a2.balance -= t
>         lock.release()

It's obviously going to be an issue with killing processes too, which is why database engines have so much code specifically to protect against this. But if it's done as an exception, all you need is to catch that exception and reraise it:

    def threadWork(lock, a1, a2, rate):
        try:
            while True:
                time.sleep(rate)
                lock.acquire()
                t = a2.balance / 2
                a1.balance += t
                #say a thread.kill kills at this point
                a2.balance -= t
                lock.release()
        except:
            # roll back the transaction in some way
            lock.release()
            raise

It'd require some care in coding, but it could be done. And if the lock/transaction object can be coded for it, it could even be done automatically:

    def threadWork(lock, a1, a2, rate):
        while True:
            time.sleep(rate)
            transaction.begin()
            t = a2.balance / 2
            transaction.apply(a1.balance,t)
            #say a thread.kill kills at this point
            transaction.apply(a2.balance,-t)
            transaction.commit()

If the transaction object doesn't get its commit() called, it does no actions at all, thus eliminating all issues of locks.

Obviously there won't be any problem with the Python interpreter itself (refcounts etc) if the kill is done by exception - that would be a potential risk if using OS-level kills.

ChrisA
Re: Killing threads (was Re: Cancel or timeout a long running regular expression)
On 18/09/2011 00:26, Dennis Lee Bieber wrote:
> On Sun, 18 Sep 2011 07:35:01 +1000, Chris Angelico ros...@gmail.com declaimed the following in gmane.comp.python.general:
>> Is it just that nobody's implemented it, or is there a good reason for avoiding offering this sort of thing?
>
> Any asynchronous kill runs the risk of leaving shared data structures in a corrupt state. {Stupid example, but, in pseudo-Python:
>
> import threading
> import time
>
> class Account(object):
>     def __init__(self, initial=0.0):
>         self.balance = initial
>
> myAccount = Account(100.0)
> yourAccount = Account(100.0)
> accountLock = threading.Lock()
>
> def threadWork(lock, a1, a2, rate):
>     while True:
>         time.sleep(rate)
>         lock.acquire()
>         t = a2.balance / 2
>         a1.balance += t
>         #say a thread.kill kills at this point
>         a2.balance -= t
>         lock.release()
>
> # create/start thread1 passing (accountLock, myAccount, yourAccount, 60)
> # create/start thread2 passing (accountLock, yourAccount, myAccount, 120)
>
> time.sleep(300)
> thread1.kill()
>
> So... Thread1 may be killed after one account gets incremented but before the other is decremented... And what happens to the lock? If it doesn't get released as part of the .kill() processing, the program is dead-locked (and the magically appearing money will never be seen). If it does get released, then the sum total of money in the system will have increased.

[snip]

The lock won't be released if an exception is raised, for example, if 'a1' isn't an Account instance and has no 'balance' attribute. Using a context manager would help in that case.
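Editorial sketch of the context-manager suggestion, reusing the Account example from the quoted message. The function name transfer_half and the snake_case variable names are made up for this illustration. The `with` statement releases the lock when the block exits, whether normally or through an exception such as the AttributeError mentioned above.

```python
import threading


class Account:
    def __init__(self, initial=0.0):
        self.balance = initial


def transfer_half(lock, a1, a2):
    """Move half of a2's balance to a1 while holding the lock."""
    with lock:  # released on normal exit and on any exception
        t = a2.balance / 2
        a1.balance += t
        a2.balance -= t


account_lock = threading.Lock()
my_account = Account(100.0)
your_account = Account(100.0)
transfer_half(account_lock, my_account, your_account)

try:
    # object() has no 'balance' attribute, so this raises inside the block.
    transfer_half(account_lock, object(), your_account)
except AttributeError:
    pass
```

This only guarantees that the lock is released. It does not make the two balance updates atomic, so an exception arriving between them would still leave the totals inconsistent.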
Re: Cancel or timeout a long running regular expression
On Thu, 15 Sep 2011 14:54:57 -0400, Terry Reedy wrote:
>> I was thinking there might be a technique I could use to evaluate regular expressions in a thread or another process launched via multiprocessing module and then kill the thread/process after a specified timeout period.
>
> Only solution I remember ever seen posted. I wonder if there are any heuristics for detecting exponential time re's.

Exponential growth results from non-determinism, i.e. if there are multiple transitions for a given character from a given state.

Common patterns include:

    ...(a...)?a...
    ...(a...)*a...
    ...(a...)+a...

with a choice between matching the a at the start of the bracketed pattern or the a following it.

    (xxxa...|xxxb...|xxxc...)

with a choice between branches which cannot be resolved until more data has been read.

For re.search (as opposed to re.match):

    axxxa...

When axxx has been read, a following a gives a choice between continuing the existing match with the second a, or aborting and matching against the first a. For patterns which contain many copies of the initial character, each copy creates another branch.

Also, using back-references in a regexp [sic] can be a significant performance killer, as it rules out the use of a DFA (which, IIRC, Python doesn't do anyhow) and can require brute-forcing many combinations. A particularly bad case is:

    (a*)\1

matching against

If the performance issue is with re.match/re.search rather than with re.compile, one option is to use ctypes to access libc's regexp functions. Those are likely to provide better searching throughput at the expense of potentially increased compilation time.
Re: Cancel or timeout a long running regular expression
On 9/16/2011 9:57 AM, Nobody wrote:
>> I wonder if there are any heuristics for detecting exponential time re's.
>
> Exponential growth results from non-determinism, i.e. if there are multiple transitions for a given character from a given state.
>
> Common patterns include:
>
>     ...(a...)?a...
>     ...(a...)*a...
>     ...(a...)+a...
>
> with a choice between matching the a at the start of the bracketed pattern or the a following it.
>
>     (xxxa...|xxxb...|xxxc...)
>
> with a choice between branches which cannot be resolved until more data has been read.
>
> For re.search (as opposed to re.match):
>
>     axxxa...
>
> When axxx has been read, a following a gives a choice between continuing the existing match with the second a, or aborting and matching against the first a. For patterns which contain many copies of the initial character, each copy creates another branch.
>
> Also, using back-references in a regexp [sic] can be a significant performance killer, as it rules out the use of a DFA (which, IIRC, Python doesn't do anyhow) and can require brute-forcing many combinations. A particularly bad case is:
>
>     (a*)\1
>
> matching against
>
> If the performance issue is with re.match/re.search rather than with re.compile, one option is to use ctypes to access libc's regexp functions. Those are likely to provide better searching throughput at the expense of potentially increased compilation time.

Now, can you write that as a heuristic *algorithm* def dangerous_re(some_re):?

-- Terry Jan Reedy
Re: Cancel or timeout a long running regular expression
On 9/15/2011 1:19 AM, pyt...@bdurham.com wrote:
> Is there a way to cancel or timeout a long running regular expression? I have a program that accepts regular expressions from users and I'm concerned about how to handle worst case regular expressions that seem to run forever. Ideally I'm looking for a way to evaluate a regular expression and timeout after a specified time period if the regular expression hasn't completed yet. Or a way for a user to cancel a long running regular expression.

This is a general problem when evaluating *any* expression from the outside. [0]*1*1 will eat space as well as time. At least, as far as I know, an re cannot cause a disk reformat ;-). There have been previous discussions on this general topic.

> I was thinking there might be a technique I could use to evaluate regular expressions in a thread or another process launched via multiprocessing module and then kill the thread/process after a specified timeout period.

Only solution I remember ever seen posted. I wonder if there are any heuristics for detecting exponential time re's.

> My concern about the multiprocessing module technique is that launching a process for every regex evaluation sounds pretty inefficient. And I don't think the threading module supports the ability to kill threads from outside a thread itself.

-- Terry Jan Reedy
Re: Cancel or timeout a long running regular expression
On Fri, Sep 16, 2011 at 4:54 AM, Terry Reedy tjre...@udel.edu wrote: On 9/15/2011 1:19 AM, pyt...@bdurham.com wrote: I was thinking there might be a technique I could use to evaluate regular expressions in a thread or another process launched via multiprocessing module and then kill the thread/process after a specified timeout period. Only solution I remember ever seen posted. Then here's a minor refinement. Since long-running RE is the exceptional case, optimize for the other. Keep the process around and feed it all the jobs you get, and on problem, kill and respawn. That way, you pay most of the overhead cost only when you make use of the separation. (There's still some IPC overhead of course. Can't escape that.) ChrisA -- http://mail.python.org/mailman/listinfo/python-list
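Editorial sketch of the "keep the process around, kill and respawn on problem" refinement. It swaps in subprocess with a line-based JSON protocol instead of the multiprocessing module, purely to keep the example self-contained. The RegexWorker class, its methods and the child script are made up for this illustration.

```python
import json
import queue
import subprocess
import sys
import threading

# Child loop: one JSON job per input line, one JSON result per output line.
_WORKER = (
    "import json, re, sys\n"
    "for line in sys.stdin:\n"
    "    pattern, text = json.loads(line)\n"
    "    print(json.dumps(bool(re.search(pattern, text))), flush=True)\n"
)


class RegexWorker:
    """Hypothetical helper: one long-lived child, respawned after a timeout."""

    def __init__(self):
        self._spawn()

    def _spawn(self):
        self._proc = subprocess.Popen(
            [sys.executable, "-c", _WORKER],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)
        self._results = queue.Queue()
        # A reader thread turns the blocking pipe read into a queue that
        # search() can wait on with a timeout.
        threading.Thread(target=self._pump,
                         args=(self._proc.stdout, self._results),
                         daemon=True).start()

    @staticmethod
    def _pump(stream, results):
        for line in stream:          # ends at EOF once the child is gone
            results.put(json.loads(line))

    def search(self, pattern, text, timeout):
        """Return True/False, or None if the child had to be replaced."""
        self._proc.stdin.write(json.dumps([pattern, text]) + "\n")
        self._proc.stdin.flush()
        try:
            return self._results.get(timeout=timeout)
        except queue.Empty:
            self.close()
            self._spawn()            # startup cost is paid only on this path
            return None

    def close(self):
        self._proc.kill()
        self._proc.wait()
```

A typical use is `worker = RegexWorker()` followed by repeated `worker.search(pattern, text, timeout)` calls and a final `worker.close()`. In this sketch an invalid pattern crashes the child, which the caller sees as a timeout followed by a respawn. A fuller version would report the error back explicitly.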
Cancel or timeout a long running regular expression
Is there a way to cancel or timeout a long running regular expression? I have a program that accepts regular expressions from users and I'm concerned about how to handle worst case regular expressions that seem to run forever. Ideally I'm looking for a way to evaluate a regular expression and timeout after a specified time period if the regular expression hasn't completed yet. Or a way for a user to cancel a long running regular expression. I was thinking there might be a technique I could use to evaluate regular expressions in a thread or another process launched via multiprocessing module and then kill the thread/process after a specified timeout period. My concern about the multiprocessing module technique is that launching a process for every regex evaluation sounds pretty inefficient. And I don't think the threading module supports the ability to kill threads from outside a thread itself. Malcolm -- http://mail.python.org/mailman/listinfo/python-list