Re: [drlvm][threading] Is it safe to use hythread_suspend_all and hythread_resume_all?
Weldon,

> If the current scheme is the same that we had 1 or 2 years ago, the answer is no.

This is just the same scheme!

> I am really hoping that all of this is simply an implementation bug.

There are no open issues on this scheme, and there are no examples that fail right now because of suspend_all.

> The bottom line is that to make the system easy to reason about, a thread should always be in suspend_enable mode before it does anything that might block.

We already talked about that at the top of the thread. I agree with that, and will add some debug capability to the TM.

Eugeny,

Actually, the code is not ideal, and there are a lot of things to do on it. You could contribute your ideas to the code and test them. The suspend_all code contains a number of compromises that were produced by failures in different workloads and stress tests.

I attach a patch that implements one of your ideas: to hold the global thread lock between suspend_all and resume_all. As I remember, it could cause a deadlock in JVMTI. Probably something has changed in the meantime.

Thanks
Artem

Index: vm/thread/src/thread_native_suspend.c
===
--- vm/thread/src/thread_native_suspend.c (revision 464417)
+++ vm/thread/src/thread_native_suspend.c (working copy)
@@ -420,7 +420,7 @@
 }
 hythread_iterator_reset(iter);
-hythread_iterator_release(iter);
+//hythread_iterator_release(iter);
 if(t) {
 *t=iter;
@@ -450,6 +450,8 @@
 }
 hythread_iterator_release(iter);
+hythread_iterator_release(iter);
+
 return TM_ERROR_NONE;
 }

- Terms of use : http://incubator.apache.org/harmony/mailing.html To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
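The idea behind the patch, keeping the thread list frozen from suspend_all until resume_all, can be sketched in isolation. The following is only a minimal illustration and not the Harmony code: `demo_group`, `demo_group_try_add`, `demo_suspend_all` and `demo_resume_all` are hypothetical names, and a simple atomic flag stands in for the global thread lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_THREADS 8

/* Hypothetical, simplified thread group; not the real hythread types. */
typedef struct {
    atomic_bool frozen;                     /* stand-in for the global thread lock */
    int suspend_request[DEMO_MAX_THREADS];  /* one request flag per member */
    size_t count;                           /* number of members */
} demo_group;

void demo_group_init(demo_group *g) {
    atomic_init(&g->frozen, false);
    g->count = 0;
}

/* Try to take the lock; returns true if this caller now owns it. */
static bool demo_try_lock(demo_group *g) {
    return !atomic_exchange(&g->frozen, true);
}

static void demo_unlock(demo_group *g) {
    atomic_store(&g->frozen, false);
}

/* Add a member. Fails with -1 while the group is frozen or when it is full. */
int demo_group_try_add(demo_group *g) {
    int rc = -1;
    if (!demo_try_lock(g)) return -1;
    if (g->count < DEMO_MAX_THREADS) {
        g->suspend_request[g->count++] = 0;
        rc = 0;
    }
    demo_unlock(g);
    return rc;
}

/* Post a suspend request to every member and KEEP the lock held.
   Returns the number of members, or -1 if the lock was busy. */
int demo_suspend_all(demo_group *g) {
    if (!demo_try_lock(g)) return -1;
    for (size_t i = 0; i < g->count; i++) g->suspend_request[i] = 1;
    return (int)g->count;
}

/* Clear the requests, then release the lock taken by demo_suspend_all. */
void demo_resume_all(demo_group *g) {
    for (size_t i = 0; i < g->count; i++) g->suspend_request[i] = 0;
    demo_unlock(g);
}
```

With this shape, resume_all is guaranteed to see exactly the members that suspend_all saw. The price is that anything needing the lock during the suspended window cannot proceed, which is at least consistent with the JVMTI deadlock concern mentioned above.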
Re: [drlvm][threading] Is it safe to use hythread_suspend_all and hythread_resume_all?
On 10/18/06, Nikolay Kuznetsov [EMAIL PROTECTED] wrote:
>> I agree it is required to have a solid model in mind. I also believe it is required to have a design/implementation which doesn't allow breaking that model.
>
> No, that's the point: if functionality is so safe that it's impossible to break the model, it's not so highly important (as in our case) to understand how it works; this model will restore itself.

hmmm, this is probably a case where we are all saying the same thing only using different words. Nikolay, would it be possible for you to try a different way of explaining the above? It would really be appreciated.

>> It's ok to have safe points inside disabled regions only if it is really safe to enable GC at that point. All such cases should be taken with extreme caution. In our particular case we can't guarantee that it is safe to suspend the thread. That's why I think having something like assert(hythread_is_suspend_enabled) in the beginning of hythread_suspend_all is really required. Of course it will require some changes in VM and TM...
>
> Again, I agree that sometimes safepoints enable suspension in wrong places; an assert must be placed inside conditions, for instance. But suspend_all is the rare place where a safe_point is placed in a suspend_disable region intentionally, by design (please refer to the lock semantics of safe regions in my answer to Weldon).
>
>> Could you give the most impossible thing to do?
>
> Peace All Over the World? :)
>
>> I was thinking of TM as a quite independent component. But now it seems like some parts of it really depend on the DRLVM implementation. :-(
>
> First of all, TM _is_ an independent component, but some of its functions are designed for special usage (it's potentially unsafe to nail up smth with the gun, for instance). Also, I believe that TM suspension is safe enough and should not be rewritten w/o special need, and even if it should, the performance of this functionality should always be kept in mind. The current suspension scheme was tested on multiple workloads and tuned to achieve better performance, and note that not even using additional locks, but merely having CAS to perform suspend_disable/enable, leads to noticeable performance loss.
>
> Actually my idea from the beginning is that while we don't have a scenario we should not change the suspension algorithm. It's good enough, robust, and tuned for better performance of GC.
>
> Nik.

--
Weldon Washburn
Intel Middleware Products Division
Re: [drlvm][threading] Is it safe to use hythread_suspend_all and hythread_resume_all?
On 10/17/06, Nikolay Kuznetsov [EMAIL PROTECTED] wrote:
> Hello All,
>
> first of all I'd like to emphasize that suspend/resume_all functions are potentially unsafe and should be used with care. Secondly, those methods were designed mainly to support stop_the_world_enumeration and thus are usually used under certain conditions.

hmmm... it is strange to hear words like that. TM provides a specification for this particular method and it should perform according to that specification. I as a developer don't want to care about a particular use case scenario.

>> 1) I found that hythread_suspend_all calls thread_safe_point_impl inside. There is no assertion regarding thread's state upon entering hythread_suspend_all. So it can be called in suspend disabled state and nobody (at least me) expects to have a safe point during hythread_suspend_all. The problem seems to be very similar to the one discussed in "[drlvm][threading] Possible race condition in implementation of conditional variables?" Your thoughts?
>
> The code of the suspend_all method is dedicated to the cyclic suspension problem. The fact that this method is called from a suspend_disable region and has a safe_point within is all about cyclic suspension. A lot of time was spent resolving deadlocks caused by two threads trying to suspend each other. I agree that the problem is similar to the one with conditions, but I believe that this one should be discussed as a part of a particular scenario.
>
>> 2) Assume I need to suspend all threads in a particular group. OK, I pass that group to hythread_suspend_all. Later, when all suspended threads should be resumed, I pass the same group to hythread_resume_all. But while the threads were suspended, the group has changed. For example, one new thread was added to it. So the questions are: Is it acceptable to have such unsafe functionality? Would it be better to lock the group in hythread_suspend_all and unlock it in hythread_resume_all?
>
> First of all I would differentiate j.l.ThreadGroup and thread groups defined by the thread manager (by that I mean that this method was not designed for ordinary use, like ThreadGroup.suspend()), and after that return to the question of why we would need it (I mean, it would be better to have a particular scenario), and then we can discuss how to implement this. Till now the suspend_all method was designed to work within one group (in particular the default group, containing java threads), and called by GC.

Nikolay, I understand there is only one use case for now. Again, I expect the method to work according to the spec, not according to how it is used in some particular case. Could you clarify what you mean by saying "Till now the suspend_all method was designed to work within one group (in particular the default group, containing java threads), and called by GC"? Why do you have such a restriction? Where is it specified?

Thanks
Evgueni

> Thank you.
> Nik.
>
>> Thanks
>> Evgueni
Re: [drlvm][threading] Is it safe to use hythread_suspend_all and hythread_resume_all?
Weldon,

I agree with what you are saying above. Do you think it makes sense to call hythread_suspend_all in enabled state only?

Evgueni

On 10/17/06, Weldon Washburn [EMAIL PROTECTED] wrote:
> On 10/17/06, Nikolay Kuznetsov [EMAIL PROTECTED] wrote:
>> Hello All,
>>
>> first of all I'd like to emphasize that suspend/resume_all functions are potentially unsafe and should be used with care.
>
> In specific, with a solid model of system behavior in mind.
>
>> Secondly, those methods were designed mainly to support stop_the_world_enumeration and thus are usually used under certain conditions.
>>
>>> 1) I found that hythread_suspend_all calls thread_safe_point_impl inside. There is no assertion regarding thread's state upon entering hythread_suspend_all. So it can be called in suspend disabled state and nobody (at least me) expects to have a safe point during hythread_suspend_all.
>
> The simplest model is to grab the thread lock whenever thread A wants to suspend thread B at a safepoint. While this serializes thread suspension and can potentially be a bottleneck, let's wait until it's a proven performance problem to change this model.
>
> For thread A to be ready to grab the thread lock, thread A must have all its java live references put in a place where the GC will see them. Why? Because thread A may block. Once thread A obtains the lock, it can disable suspension if it likes, reload the java live refs and do whatever it needs, but make certain it is quick and non-blocking. If thread A needs to block on some OS call, etc., it will need to re-enable suspension and abandon the thread lock. Why? Because if thread A blocks while holding the global thread lock, there may be deadlock or latency problems.
>
> Did you try the above approach? Are there deadlocks?
>
>>> The problem seems to be very similar to the one discussed in "[drlvm][threading] Possible race condition in implementation of conditional variables?" Your thoughts?
>>
>> The code of the suspend_all method is dedicated to the cyclic suspension problem. The fact that this method is called from a suspend_disable region and has a safe_point within is all about cyclic suspension. A lot of time was spent resolving deadlocks caused by two threads trying to suspend each other. I agree that the problem is similar to the one with conditions, but I believe that this one should be discussed as a part of a particular scenario.
>>
>>> 2) Assume I need to suspend all threads in a particular group. OK, I pass that group to hythread_suspend_all. Later, when all suspended threads should be resumed, I pass the same group to hythread_resume_all. But while the threads were suspended, the group has changed. For example, one new thread was added to it. So the questions are: Is it acceptable to have such unsafe functionality? Would it be better to lock the group in hythread_suspend_all and unlock it in hythread_resume_all?
>>
>> First of all I would differentiate j.l.ThreadGroup and thread groups defined by the thread manager (by that I mean that this method was not designed for ordinary use, like ThreadGroup.suspend()), and after that return to the question of why we would need it (I mean, it would be better to have a particular scenario), and then we can discuss how to implement this. Till now the suspend_all method was designed to work within one group (in particular the default group, containing java threads), and called by GC.
>>
>> Thank you.
>> Nik.
>>
>>> Thanks
>>> Evgueni
>
> --
> Weldon Washburn
> Intel Middleware Products Division
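The lock discipline described in the quoted message (be suspend-enabled before waiting for the thread lock, and never block while holding it or while suspend-disabled) can be written down as a small checker. This is a sketch only, under the assumption that per-thread state can be reduced to a disable counter and a lock-ownership flag; `proto_thread` and all `proto_*` functions are hypothetical names, not TM APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-thread bookkeeping; not the real hythread structures. */
typedef struct {
    int suspend_disable_count;  /* 0 means suspension is enabled */
    bool holds_thread_lock;     /* true while the thread owns the global thread lock */
} proto_thread;

void proto_thread_init(proto_thread *t) {
    t->suspend_disable_count = 0;
    t->holds_thread_lock = false;
}

void proto_suspend_disable(proto_thread *t) { t->suspend_disable_count++; }

void proto_suspend_enable(proto_thread *t) {
    if (t->suspend_disable_count > 0) t->suspend_disable_count--;
}

/* Taking the lock may wait, so the caller must be suspend-enabled.
   Returns 0 on success, -1 if the rule is violated or the lock is already held. */
int proto_acquire_thread_lock(proto_thread *t) {
    if (t->suspend_disable_count != 0 || t->holds_thread_lock) return -1;
    t->holds_thread_lock = true;  /* a real implementation would lock a mutex here */
    return 0;
}

void proto_release_thread_lock(proto_thread *t) { t->holds_thread_lock = false; }

/* A thread may block only when suspend-enabled and not holding the thread lock. */
bool proto_may_block(const proto_thread *t) {
    return t->suspend_disable_count == 0 && !t->holds_thread_lock;
}
```

A debug build could call `proto_may_block` in an assertion before every potentially blocking call, which is one way to make violations of the model visible early.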
Re: [drlvm][threading] Is it safe to use hythread_suspend_all and hythread_resume_all?
On 10/17/06, Salikh Zakirov [EMAIL PROTECTED] wrote:
> Evgueni Brevnov wrote:
>> Hi All,
>>
>> I'd like to hear your opinion regarding hythread_suspend_all and hythread_resume_all functionality provided by TM. Actually I have two separate questions:
>>
>> 1) I found that hythread_suspend_all calls thread_safe_point_impl inside. There is no assertion regarding thread's state upon entering hythread_suspend_all. So it can be called in suspend disabled state and nobody (at least me) expects to have a safe point during hythread_suspend_all. The problem seems to be very similar to the one discussed in "[drlvm][threading] Possible race condition in implementation of conditional variables?" Your thoughts?
>
> The code you see is there to prevent the following deadlock scenario:
>
>   Thread A                            Thread B
>      |                                   |
>      |                                suspend(A);
>      |                                A->suspend_request = 1;
>      |                                wait for A to reach a safepoint...
>      |                                   |
>   suspend_all()                          |
>   B->suspend_request = 1
>   wait for B to reach a safepoint
>
> ... and then the two threads are infinitely waiting for one another.

Salikh, I see your scenario... I don't suggest removing safe points from hythread_suspend_all. On the contrary, I believe it makes sense to suspend other threads only if the suspender thread is in a safe region. Agree?

>> 2) Assume I need to suspend all threads in a particular group. OK, I pass that group to hythread_suspend_all. Later, when all suspended threads should be resumed, I pass the same group to hythread_resume_all. But while the threads were suspended, the group has changed. For example, one new thread was added to it. So the questions are: Is it acceptable to have such unsafe functionality? Would it be better to lock the group in hythread_suspend_all and unlock it in hythread_resume_all?
>
> We may as well leave it as the responsibility of the application / TI agent writer not to modify a suspended thread group. Why do you think this should be enforced?

In general, any good design should strive to eliminate/minimize cases of illegal use of the interface. In other words, it should be hard to use it in a buggy way. In our case that means it is better to ensure integrity inside TM instead of making the application responsible for it. If we can do it inside, what is the reason not to do it?

Moreover, if you look at the spec of hythread_suspend_all, it states "...This method sets a suspend request for the every thread in the group and then returns the iterator that can be used to traverse through the suspended threads..." But the implementation contradicts that. If the group is changed while you are traversing it, you can get the wrong thread. What is even worse, you can get a crash when one thread is iterating through the group while another thread inserts new elements into it (or removes them).

Evgueni
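The hazard in question 2 can be reproduced with a toy model. The sketch below is not TM code: `race_group` and the `race_*` functions are hypothetical, and threads are reduced to a per-member suspend counter. It only shows that when membership changes between the two calls, resume_all touches a thread that suspend_all never saw.

```c
#include <assert.h>
#include <stddef.h>

#define RACE_MAX_THREADS 8

/* Hypothetical group with no protection against concurrent modification. */
typedef struct {
    int suspend_count[RACE_MAX_THREADS];  /* pending suspends per member */
    size_t count;                         /* number of members */
} race_group;

void race_group_init(race_group *g) { g->count = 0; }

/* Add a member with no pending suspends. Returns 0, or -1 if the group is full. */
int race_group_add(race_group *g) {
    if (g->count >= RACE_MAX_THREADS) return -1;
    g->suspend_count[g->count++] = 0;
    return 0;
}

/* Post one suspend to every current member. */
void race_suspend_all(race_group *g) {
    for (size_t i = 0; i < g->count; i++) g->suspend_count[i]++;
}

/* Resume every current member. Returns how many members were "resumed"
   although they had never been suspended. */
size_t race_resume_all(race_group *g) {
    size_t never_suspended = 0;
    for (size_t i = 0; i < g->count; i++) {
        if (g->suspend_count[i] == 0) never_suspended++;
        else g->suspend_count[i]--;
    }
    return never_suspended;
}
```

In this model a non-zero return from `race_resume_all` marks exactly the mismatch described above; in a real thread manager the same mismatch could instead show up as a spurious resume or a missed thread.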
Re: [drlvm][threading] Is it safe to use hythread_suspend_all and hythread_resume_all?
Evgueni,

>> first of all I'd like to emphasize that suspend/resume_all functions are potentially unsafe and should be used with care. Secondly, those methods were designed mainly to support stop_the_world_enumeration and thus are usually used under certain conditions.
>
> hmmm... it is strange to hear words like that. TM provides a specification for this particular method and it should perform according to that specification. I as a developer don't want to care about a particular use case scenario.

I'd say that w/o stop_the_world enumeration we would not include this function in the interface, and w/o Thread.stop() (which is deprecated) and jvmtiThreadStop (which is used in the debugger) we would not even include the suspend function in the interface, like pthread or the original version of hythread.

Most of the other implementations have some special notes about suspend:

MSDN: "This function is primarily designed for use by debuggers. It is not intended to be used for thread synchronization."

HP pthread_suspend_np (not portable): "This functionality enables a process that is multithreaded to temporarily suspend all activity to a single thread of control. When the process is single threaded, the address space is not changing, and a consistent view of the process can be gathered."

So, this function is unsafe, and should be used with care and "In specific, with a solid model of system behavior in mind." (c) Weldon;

And returning to the question of safepoints inside suspend_disable regions: we have a lot of such places inside the VM, and suspend_all is probably the only place where this safepoint was left intentionally (I mean cyclic suspends).

> Nikolay, I understand there is only one use case for now. Again, I expect the method to work according to the spec, not according to how it is used in some particular case. Could you clarify what you mean by saying "Till now the suspend_all method was designed to work within one group (in particular the default group, containing java threads), and called by GC"? Why do you have such a restriction? Where is it specified?

We have such restrictions because it's extremely hard to implement the common case (not only for GC + limited use of TI). The spec on suspend/suspend_all function usage scenarios can be found at: drlvm/trunk/vm/thread/doc/ThreadManager.htm paragraphs 6.2 and 6.3

Thank you.
Nik.
Re: [drlvm][threading] Is it safe to use hythread_suspend_all and hythread_resume_all?
> hmm I never thought of it that way. My initial reaction is no. Suspend enable/disable and global thread lock are separate, distinct concepts. The thread lock should protect the VM internal thread structs when they are being modified. For example, the thread lock should allow only one thread create/die at any given instant. The enable/disable state is incidental to this event. This is independent of the concept of a thread running native code being in a state where the GC can find all its live references. If a thread needs to grab the thread lock, of course, it needs to put itself in a suspend enable mode because it might have to wait for the lock.

Yes, I agree that the global lock allows only one thread to create/die (and so on) at any given moment, while suspend_disable/enable affect only suspension functionality. But in fact suspend_disable is a per-thread lock for suspension, and if it's taken (suspend_disable called), another thread can't suspend that particular thread until this lock is released (suspend_enable called). And I believe that additional synchronization is excessive and very expensive.

Also, my opinion is that the suspension scheme is the last place in DRLVM that should be changed w/o any open issue or problem which depends on it (or do we have problems in GC in regard to suspension?). Do you really think that the current scheme is unsafe and should be redesigned?

Nik.
Re: [drlvm][threading] Is it safe to use hythread_suspend_all and hythread_resume_all?
Hi Nikolay!

On 10/18/06, Nikolay Kuznetsov [EMAIL PROTECTED] wrote:
> Evgueni,
>
>>> first of all I'd like to emphasize that suspend/resume_all functions are potentially unsafe and should be used with care. Secondly, those methods were designed mainly to support stop_the_world_enumeration and thus are usually used under certain conditions.
>>
>> hmmm... it is strange to hear words like that. TM provides a specification for this particular method and it should perform according to that specification. I as a developer don't want to care about a particular use case scenario.
>
> I'd say that w/o stop_the_world enumeration we would not include this function in the interface, and w/o Thread.stop() (which is deprecated) and jvmtiThreadStop (which is used in the debugger) we would not even include the suspend function in the interface, like pthread or the original version of hythread.
>
> Most of the other implementations have some special notes about suspend:
>
> MSDN: "This function is primarily designed for use by debuggers. It is not intended to be used for thread synchronization."
>
> HP pthread_suspend_np (not portable): "This functionality enables a process that is multithreaded to temporarily suspend all activity to a single thread of control. When the process is single threaded, the address space is not changing, and a consistent view of the process can be gathered."
>
> So, this function is unsafe, and should be used with care and "In specific, with a solid model of system behavior in mind." (c) Weldon;

I agree it is required to have a solid model in mind. I also believe it is required to have a design/implementation which doesn't allow breaking that model.

> And returning to the question of safepoints inside suspend_disable regions: we have a lot of such places inside the VM, and suspend_all is probably the only place where this safepoint was left intentionally (I mean cyclic suspends).

It's ok to have safe points inside disabled regions only if it is really safe to enable GC at that point. All such cases should be taken with extreme caution. In our particular case we can't guarantee that it is safe to suspend the thread. That's why I think having something like assert(hythread_is_suspend_enabled) in the beginning of hythread_suspend_all is really required. Of course it will require some changes in VM and TM...

>> Nikolay, I understand there is only one use case for now. Again, I expect the method to work according to the spec, not according to how it is used in some particular case. Could you clarify what you mean by saying "Till now the suspend_all method was designed to work within one group (in particular the default group, containing java threads), and called by GC"? Why do you have such a restriction? Where is it specified?
>
> We have such restrictions because it's extremely hard to implement the common case (not only for GC + limited use of TI). The spec on suspend/suspend_all function usage scenarios can be found at: drlvm/trunk/vm/thread/doc/ThreadManager.htm paragraphs 6.2 and 6.3

Could you give the most impossible thing to do? I was thinking of TM as a quite independent component. But now it seems like some parts of it really depend on the DRLVM implementation. :-(

Thanks
Evgueni

> Thank you.
> Nik.
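The entry check proposed in this message can be sketched as follows. It is an illustration only: `chk_thread`, `chk_is_suspend_enabled` and `chk_suspend_all` are hypothetical names, the suspend state is assumed to be a plain per-thread nesting counter, and the function returns an error code where a debug build might simply assert.

```c
#include <assert.h>

/* Hypothetical per-thread state: nesting depth of suspend_disable calls. */
typedef struct {
    int suspend_disable_count;
} chk_thread;

void chk_thread_init(chk_thread *t) { t->suspend_disable_count = 0; }

void chk_suspend_disable(chk_thread *t) { t->suspend_disable_count++; }

void chk_suspend_enable(chk_thread *t) {
    assert(t->suspend_disable_count > 0);  /* enable must pair with a disable */
    t->suspend_disable_count--;
}

int chk_is_suspend_enabled(const chk_thread *t) {
    return t->suspend_disable_count == 0;
}

/* Entry check for a suspend_all-like call: refuse to run when the calling
   thread is in a suspend-disabled region. Returns 0 if the precondition
   holds and -1 otherwise; the actual suspension work is omitted. */
int chk_suspend_all(const chk_thread *self) {
    if (!chk_is_suspend_enabled(self)) return -1;
    /* ... post suspend requests to the other threads here ... */
    return 0;
}
```

Whether such a check is acceptable depends on the point debated in this thread: callers that today enter suspend_all from a suspend-disabled region on purpose would have to be changed first.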
Re: [drlvm][threading] Is it safe to use hythread_suspend_all and hythread_resume_all?
> I agree it is required to have a solid model in mind. I also believe it is required to have a design/implementation which doesn't allow breaking that model.

No, that's the point: if functionality is so safe that it's impossible to break the model, it's not so highly important (as in our case) to understand how it works; this model will restore itself.

> It's ok to have safe points inside disabled regions only if it is really safe to enable GC at that point. All such cases should be taken with extreme caution. In our particular case we can't guarantee that it is safe to suspend the thread. That's why I think having something like assert(hythread_is_suspend_enabled) in the beginning of hythread_suspend_all is really required. Of course it will require some changes in VM and TM...

Again, I agree that sometimes safepoints enable suspension in wrong places; an assert must be placed inside conditions, for instance. But suspend_all is the rare place where a safe_point is placed in a suspend_disable region intentionally, by design (please refer to the lock semantics of safe regions in my answer to Weldon).

> Could you give the most impossible thing to do?

Peace All Over the World? :)

> I was thinking of TM as a quite independent component. But now it seems like some parts of it really depend on the DRLVM implementation. :-(

First of all, TM _is_ an independent component, but some of its functions are designed for special usage (it's potentially unsafe to nail up smth with the gun, for instance). Also, I believe that TM suspension is safe enough and should not be rewritten w/o special need, and even if it should, the performance of this functionality should always be kept in mind. The current suspension scheme was tested on multiple workloads and tuned to achieve better performance, and note that not even using additional locks, but merely having CAS to perform suspend_disable/enable, leads to noticeable performance loss.

Actually my idea from the beginning is that while we don't have a scenario we should not change the suspension algorithm. It's good enough, robust, and tuned for better performance of GC.

Nik.
Re: [drlvm][threading] Is it safe to use hythread_suspend_all and hythread_resume_all?
It seems we are not in sync. I don't suggest changing the current suspension scheme... I like it. I'm talking about one particular case, and still can't see any disadvantages in what I propose...

On 10/18/06, Nikolay Kuznetsov [EMAIL PROTECTED] wrote:
>> I agree it is required to have a solid model in mind. I also believe it is required to have a design/implementation which doesn't allow breaking that model.
>
> No, that's the point: if functionality is so safe that it's impossible to break the model, it's not so highly important (as in our case) to understand how it works; this model will restore itself.
>
>> It's ok to have safe points inside disabled regions only if it is really safe to enable GC at that point. All such cases should be taken with extreme caution. In our particular case we can't guarantee that it is safe to suspend the thread. That's why I think having something like assert(hythread_is_suspend_enabled) in the beginning of hythread_suspend_all is really required. Of course it will require some changes in VM and TM...
>
> Again, I agree that sometimes safepoints enable suspension in wrong places; an assert must be placed inside conditions, for instance. But suspend_all is the rare place where a safe_point is placed in a suspend_disable region intentionally, by design (please refer to the lock semantics of safe regions in my answer to Weldon).
>
>> Could you give the most impossible thing to do?
>
> Peace All Over the World? :)
>
>> I was thinking of TM as a quite independent component. But now it seems like some parts of it really depend on the DRLVM implementation. :-(
>
> First of all, TM _is_ an independent component, but some of its functions are designed for special usage (it's potentially unsafe to nail up smth with the gun, for instance). Also, I believe that TM suspension is safe enough and should not be rewritten w/o special need, and even if it should, the performance of this functionality should always be kept in mind. The current suspension scheme was tested on multiple workloads and tuned to achieve better performance, and note that not even using additional locks, but merely having CAS to perform suspend_disable/enable, leads to noticeable performance loss.
>
> Actually my idea from the beginning is that while we don't have a scenario we should not change the suspension algorithm. It's good enough, robust, and tuned for better performance of GC.
>
> Nik.
Re: [drlvm][threading] Is it safe to use hythread_suspend_all and hythread_resume_all?
On 10/18/06, Evgueni Brevnov [EMAIL PROTECTED] wrote:
> On 10/17/06, Salikh Zakirov [EMAIL PROTECTED] wrote:
>> Evgueni Brevnov wrote:
>>> Hi All,
>>>
>>> I'd like to hear your opinion regarding hythread_suspend_all and hythread_resume_all functionality provided by TM. Actually I have two separate questions:
>>>
>>> 1) I found that hythread_suspend_all calls thread_safe_point_impl inside. There is no assertion regarding thread's state upon entering hythread_suspend_all. So it can be called in suspend disabled state and nobody (at least me) expects to have a safe point during hythread_suspend_all. The problem seems to be very similar to the one discussed in "[drlvm][threading] Possible race condition in implementation of conditional variables?" Your thoughts?
>>
>> The code you see is there to prevent the following deadlock scenario:
>>
>>   Thread A                            Thread B
>>      |                                   |
>>      |                                suspend(A);
>>      |                                A->suspend_request = 1;
>>      |                                wait for A to reach a safepoint...
>>      |                                   |
>>   suspend_all()                          |
>>   B->suspend_request = 1
>>   wait for B to reach a safepoint
>>
>> ... and then the two threads are infinitely waiting for one another.
>
> Salikh, I see your scenario... I don't suggest removing safe points from hythread_suspend_all. On the contrary, I believe it makes sense to suspend other threads only if the suspender thread is in a safe region. Agree?

This seems to make the most sense. I think there might be an argument that Thread A trying to suspend all other threads really does not have to be in suspend_enable mode. But this corner case probably adds nothing but clutter/confusion to the design. As a design rule, I believe any time a thread calls a function that might block, the thread should be in suspend enable mode. This simple rule makes it much easier for the whole team working on the code base to know what to do.

Suppose Thread A intends to suspend a subset of all java threads. Thread A will need to block somehow and wait for the complete subset to get to suspended state with their stacks enumerable (suspend_enabled). Meanwhile, over on another CPU in the SMP box, a bunch of non-suspended threads chew up gobs of memory and one of them calls for a stop-the-world GC. Ultimately Thread A as well as the subset that was suspended needs to be in a suspend_enabled state. The easiest sync model to reason about is one where Thread A suspends its target subset of java threads *before* allowing the stop-the-world gc to proceed. A global thread/gc lock provides this guarantee.

>>> 2) Assume I need to suspend all threads in a particular group. OK, I pass that group to hythread_suspend_all. Later, when all suspended threads should be resumed, I pass the same group to hythread_resume_all. But while the threads were suspended, the group has changed. For example, one new thread was added to it. So the questions are: Is it acceptable to have such unsafe functionality? Would it be better to lock the group in hythread_suspend_all and unlock it in hythread_resume_all?
>>
>> We may as well leave it as the responsibility of the application / TI agent writer not to modify a suspended thread group. Why do you think this should be enforced?
>
> In general, any good design should strive to eliminate/minimize cases of illegal use of the interface. In other words, it should be hard to use it in a buggy way. In our case that means it is better to ensure integrity inside TM instead of making the application responsible for it. If we can do it inside, what is the reason not to do it?

Yes. Good idea, provided it really can be done.

> Moreover, if you look at the spec of hythread_suspend_all, it states "...This method sets a suspend request for the every thread in the group and then returns the iterator that can be used to traverse through the suspended threads..." But the implementation contradicts that. If the group is changed while you are traversing it, you can get the wrong thread.

Agreed. What you describe can be a problem. I would like to see a conservative, simple design. Once we get the sync functional part robust, we can look at the performance problems. While it is possible to walk a linked list while the list is changing, this adds way too much confusion at this stage of VM development. And the VM does not yet have enough performance for it to make sense to look at such fine detail.

> What is even worse, you can get a crash when one thread is iterating through the group while another thread inserts new elements into it (or removes them).

Actually, a crash would be the best failure I can think of. At least the crash occurs somewhere close to the bad code. A worse scenario is that you can fool yourself into thinking you have processed all the threads on the list when, in fact, you may have missed a couple. And the impact may not surface for 10 seconds...

> Evgueni
Re: [drlvm][threading] Is it safe to use hythread_suspend_all and hythread_resume_all?
On 10/18/06, Nikolay Kuznetsov [EMAIL PROTECTED] wrote:

hmm I never thought of it that way. My initial reaction is no. Suspend enable/disable and global thread lock are separate, distinct concepts. The thread lock should protect the VM internal thread structs when they are being modified. For example, the thread lock should allow only one thread create/die at any given instant. The enable/disable state is incidental to this event. This is independent of the concept of a thread running native code being in a state where the GC can find all its live references. If a thread needs to grab the thread lock, of course, it needs to put itself in a suspend enable mode because it might have to wait for the lock.

Yes, I agree that the global lock allows only one thread to create/die (and so on) at any given moment, while suspend_disable/enable affect only suspension functionality. But in fact suspend_disable is a per-thread lock for suspension, and if it's taken (suspend_disable called), another thread can't suspend that particular thread until this lock is released (suspend_enable called). And I believe that additional synchronization is excessive and very expensive.

This is interesting. A thread's suspend enable/disable state is basically one bit of thread-local storage info that is only written by the owning thread and only read by other threads in the system. There is no lock protocol on this bit. It should be a very cheap operation. Is there evidence that this operation is expensive? Also, note we have to take into account the hardware memory model. And, as fate would have it, different HW has different memory models. For example, Intel 32-bit has what is known as write ordering. Basically this means that writes inside of a CPU will hit the SMP coherency domain in program order. There is no guarantee precisely when the writes hit the bus.
Bottom line: Thread A can toggle its enable/disable bit and other CPUs will _eventually_ see the writes in the order they happened. PPC is different, IPF is different. Grabbing the thread system lock will get expensive if it is done at a high rate. My initial hunch is that grabbing the thread system lock happens at low frequency. Why? Because operations such as thread create/kill, thread suspend/resume, get thread group, thread interrupt, etc. happen at rather low frequency. Is there evidence that workloads we care about will cause high-frequency use of the thread system lock?

Also, my opinion is that the suspension scheme is the last place in DRLVM that should be changed without any open issue or problem that depends on it (or we do have problems in GC with regard to suspension). Do you really think that the current scheme is unsafe and should be redesigned?

If the current scheme is the same that we had 1 or 2 years ago, the answer is no. I am really hoping that all of this is simply an implementation bug. The bottom line is that to make the system easy to reason about, a thread should always be in suspend_enable mode before it does anything that might block.

Nik.

--
Weldon Washburn
Intel Middleware Products Division
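The "one bit written by the owner, read by everyone else" description can be sketched with C11 atomics. This is an illustration under assumptions, not the DRLVM implementation: the flag_thread type and flag_* functions are made-up names, and a second flag for the pending suspend request is added so the handshake is visible. The sketch uses the default sequentially consistent operations because each side stores its own flag and then loads the other one, and that store-then-load pattern is exactly where weaker hardware ordering can let both sides miss each other.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-thread state; field names are illustrative only. */
typedef struct flag_thread {
    atomic_int suspend_disabled;  /* written only by the owning thread */
    atomic_int suspend_request;   /* written only by a suspender */
} flag_thread;

static void flag_thread_init(flag_thread *t) {
    atomic_init(&t->suspend_disabled, 0);
    atomic_init(&t->suspend_request, 0);
}

/* Owner: enter a region where the GC must not enumerate this thread. */
static void flag_disable(flag_thread *self) {
    atomic_store(&self->suspend_disabled, 1);
}

/* Owner: leave the region. Returns true if a suspend request arrived in
 * the meantime, in which case the owner should park at a safe point. */
static bool flag_enable(flag_thread *self) {
    atomic_store(&self->suspend_disabled, 0);
    return atomic_load(&self->suspend_request) != 0;
}

/* Suspender: post a request. Returns true if the target is already
 * suspend-enabled; false means the suspender has to wait. */
static bool flag_request_suspend(flag_thread *target) {
    atomic_store(&target->suspend_request, 1);
    return atomic_load(&target->suspend_disabled) == 0;
}
```

On hardware with strong store ordering the toggle itself stays cheap; the cost question raised above is about how much ordering the handshake needs on each platform, which this sketch sidesteps by asking for the strongest ordering everywhere.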
Re: [drlvm][threading] Is it safe to use hythread_suspend_all and hythread_resume_all?
On 10/19/06, Weldon Washburn [EMAIL PROTECTED] wrote:
On 10/18/06, Evgueni Brevnov [EMAIL PROTECTED] wrote:
On 10/17/06, Salikh Zakirov [EMAIL PROTECTED] wrote:
Evgueni Brevnov wrote:

Hi All, I'd like to hear your opinion regarding the hythread_suspend_all and hythread_resume_all functionality provided by TM. Actually I have two separate questions:

1) I found that hythread_suspend_all calls thread_safe_point_impl inside. There is no assertion regarding the thread's state upon entering hythread_suspend_all. So it can be called in suspend disabled state, and nobody (at least me) expects to have a safe point during hythread_suspend_all. The problem seems to be very similar to the one discussed in "[drlvm][threading] Possible race condition in implementation of conditional variables?" Your thoughts?

The code you see is there to prevent the following deadlock scenario:

    Thread A                          Thread B
       |                                 |
       |                             suspend(A);
       |                             A->suspend_request = 1;
       |                             wait for A to reach a safepoint...
       |                                 |
    suspend_all()                        |
    B->suspend_request = 1
    wait for B to reach a safepoint

... and then the two threads are infinitely waiting for one another.

Salikh, I see your scenario... I don't suggest removing safe points from hythread_suspend_all. On the contrary, I believe it makes sense to suspend other threads only if the suspender thread is in a safe region. Agree?

This seems to make the most sense. I think there might be an argument that Thread A trying to suspend all other threads really does not have to be in suspend_enable mode. But this corner case probably adds nothing but clutter/confusion to the design. As a design rule, I believe any time a thread calls a function that might block, the thread should be in suspend enable mode. This simple rule makes it much easier for the whole team working on the code base to know what to do.

Strongly agree!

Suppose Thread A intends to suspend a subset of all java threads.
Thread A will need to block somehow and wait for the complete subset to get to suspended state with their stacks enumerable (suspend_enabled). Meanwhile, over on another CPU in the SMP box, a bunch of non-suspended threads chew up gobs of memory and one of them calls for a stop-the-world GC. Ultimately Thread A, as well as the subset that was suspended, needs to be in a suspend_enabled state. The easiest sync model to reason about is one where Thread A suspends its target subset of java threads *before* allowing the stop-the-world GC to proceed. A global thread/gc lock provides this guarantee.

2) Assume I need to suspend all threads in a particular group. OK, I pass that group to hythread_suspend_all. Later, when all suspended threads should be resumed, I pass the same group to hythread_resume_all. But while the threads were suspended, the group has changed. For example, one new thread was added to it. So the questions are: Is it acceptable to have such unsafe functionality? Would it be better to lock the group in hythread_suspend_all and unlock it in hythread_resume_all?

We may as well leave it as the responsibility of the application / TI agent writer not to modify a suspended thread group. Why do you think this should be enforced?

In general, any good design should strive to eliminate or minimize cases of illegal use of the interface. In other words, it should be hard to use it in a buggy way. In our case that means it is better to ensure integrity inside TM instead of making the application responsible for it. If we can do it inside, what is the reason not to do it?

Yes. Good idea, provided it really can be done.

Moreover, if you look at the spec of hythread_suspend_all, it states: "...This method sets a suspend request for the every thread in the group and then returns the iterator that can be used to traverse through the suspended threads..." But the implementation contradicts that. If the group is changed while you are traversing it, you can get the wrong thread.

Agreed.
What you describe can be a problem. I would like to see a conservative, simple design. Once we get the sync functional part robust, we can look at the performance problems. While it is possible to walk a linked list while the list is changing, this adds way too much confusion at this stage of VM development. And the VM does not yet have enough performance where it makes sense to look at such fine detail.

What is even worse, you can get a crash when one thread is iterating through the group while another thread inserts new elements into it (or removes them).

Actually, a crash would be the best failure I can think of. At least the crash occurs somewhere close to the bad code. A worse scenario is that you can fool yourself into thinking you have processed all the threads on the list when, in fact, you may have missed a couple. And the impact may not surface for 10
[drlvm][threading] Is it safe to use hythread_suspend_all and hythread_resume_all?
Hi All,

I'd like to hear your opinion regarding the hythread_suspend_all and hythread_resume_all functionality provided by TM. Actually I have two separate questions:

1) I found that hythread_suspend_all calls thread_safe_point_impl inside. There is no assertion regarding the thread's state upon entering hythread_suspend_all. So it can be called in suspend disabled state, and nobody (at least me) expects to have a safe point during hythread_suspend_all. The problem seems to be very similar to the one discussed in "[drlvm][threading] Possible race condition in implementation of conditional variables?" Your thoughts?

2) Assume I need to suspend all threads in a particular group. OK, I pass that group to hythread_suspend_all. Later, when all suspended threads should be resumed, I pass the same group to hythread_resume_all. But while the threads were suspended, the group has changed. For example, one new thread was added to it. So the questions are: Is it acceptable to have such unsafe functionality? Would it be better to lock the group in hythread_suspend_all and unlock it in hythread_resume_all?

Thanks
Evgueni
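The missing entry check from question 1 could look roughly like the sketch below. Everything in it is hypothetical: the demo_thread record, the counter-based suspend_disable state, and the demo_* function names are stand-ins, since the actual TM structures are not shown in this thread. The check is written as an error return so it can be exercised directly; a debug build could use a plain assert instead.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-thread record; not the real TM thread structure. */
typedef struct demo_thread {
    int suspend_disable_count;  /* > 0 means inside a suspend-disabled region */
} demo_thread;

static void demo_suspend_disable(demo_thread *self) {
    self->suspend_disable_count++;
}

static void demo_suspend_enable(demo_thread *self) {
    assert(self->suspend_disable_count > 0);  /* unbalanced enable is a bug */
    self->suspend_disable_count--;
}

static bool demo_is_suspend_enabled(const demo_thread *self) {
    return self->suspend_disable_count == 0;
}

/* Stand-in for a suspend-all entry point. It contains a safe point and may
 * block, so it refuses to run when the caller is suspend-disabled.
 * Returns 0 on success, -1 on a precondition violation. */
static int demo_suspend_all(demo_thread *self) {
    if (!demo_is_suspend_enabled(self))
        return -1;
    /* ... post suspend requests and wait for the targets here ... */
    return 0;
}
```

A check like this turns "nobody expects a safe point here" from a silent hazard into an immediate, local failure, at the cost of forcing existing callers in suspend-disabled regions to be reworked.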
Re: [drlvm][threading] Is it safe to use hythread_suspend_all and hythread_resume_all?
Hello All,

First of all, I'd like to emphasize that the suspend/resume_all functions are potentially unsafe and should be used with care. Secondly, those methods were designed mainly to support stop_the_world_enumeration and thus are usually used under certain conditions.

1) I found that hythread_suspend_all calls thread_safe_point_impl inside. There is no assertion regarding the thread's state upon entering hythread_suspend_all. So it can be called in suspend disabled state, and nobody (at least me) expects to have a safe point during hythread_suspend_all. The problem seems to be very similar to the one discussed in "[drlvm][threading] Possible race condition in implementation of conditional variables?" Your thoughts?

The code of the suspend_all method is dedicated to the cyclic suspension problem. The fact that this method is called from a suspend_disable region and has a safe_point within it is all about cyclic suspension. A lot of time was spent resolving deadlocks caused by two threads trying to suspend each other. I agree that the problem is similar to the one with conditions, but I believe that this one should be discussed as part of a particular scenario.

2) Assume I need to suspend all threads in a particular group. OK, I pass that group to hythread_suspend_all. Later, when all suspended threads should be resumed, I pass the same group to hythread_resume_all. But while the threads were suspended, the group has changed. For example, one new thread was added to it. So the questions are: Is it acceptable to have such unsafe functionality? Would it be better to lock the group in hythread_suspend_all and unlock it in hythread_resume_all?

First of all, I would differentiate j.l.ThreadGroup from the thread groups defined by the thread manager (by that I mean that this method was not designed for ordinary use, like ThreadGroup.suspend()). After that, I would return to the question of why we would need it (I mean, it would be better to have a particular scenario), and then we can discuss how to implement this. Until now the suspend_all method was designed to work within one group (in particular the default group, containing java threads) and to be called by GC.

Thank you.
Nik.

Thanks
Evgueni
Re: [drlvm][threading] Is it safe to use hythread_suspend_all and hythread_resume_all?
On 10/17/06, Nikolay Kuznetsov [EMAIL PROTECTED] wrote:

Hello All, first of all I'd like to emphasize that the suspend/resume_all functions are potentially unsafe and should be used with care.

Specifically, with a solid model of system behavior in mind.

Secondly, those methods were designed mainly to support stop_the_world_enumeration and thus are usually used under certain conditions.

1) I found that hythread_suspend_all calls thread_safe_point_impl inside. There is no assertion regarding the thread's state upon entering hythread_suspend_all. So it can be called in suspend disabled state, and nobody (at least me) expects to have a safe point during hythread_suspend_all.

The simplest model is to grab the thread lock whenever thread A wants to suspend thread B at a safepoint. While this serializes thread suspension and can potentially be a bottleneck, let's wait until it's a proven performance problem to change this model. For thread A to be ready to grab the thread lock, thread A must have all its java live references put in a place where the GC will see them. Why? Because thread A may block. Once thread A obtains the lock, it can disable suspension if it likes, reload the java live refs and do whatever it needs, but make certain it is quick and non-blocking. If thread A needs to block on some OS call, etc., it will need to re-enable suspension and abandon the thread lock. Why? Because if thread A blocks while holding the global thread lock, there may be deadlock or latency problems. Did you try the above approach? Are there deadlocks?

The problem seems to be very similar to the one discussed in "[drlvm][threading] Possible race condition in implementation of conditional variables?" Your thoughts?

The code of the suspend_all method is dedicated to the cyclic suspension problem. The fact that this method is called from a suspend_disable region and has a safe_point within it is all about cyclic suspension. A lot of time was spent resolving deadlocks caused by two threads trying to suspend each other. I agree that the problem is similar to the one with conditions, but I believe that this one should be discussed as part of a particular scenario.

2) Assume I need to suspend all threads in a particular group. OK, I pass that group to hythread_suspend_all. Later, when all suspended threads should be resumed, I pass the same group to hythread_resume_all. But while the threads were suspended, the group has changed. For example, one new thread was added to it. So the questions are: Is it acceptable to have such unsafe functionality? Would it be better to lock the group in hythread_suspend_all and unlock it in hythread_resume_all?

First of all, I would differentiate j.l.ThreadGroup from the thread groups defined by the thread manager (by that I mean that this method was not designed for ordinary use, like ThreadGroup.suspend()). After that, I would return to the question of why we would need it (I mean, it would be better to have a particular scenario), and then we can discuss how to implement this. Until now the suspend_all method was designed to work within one group (in particular the default group, containing java threads) and to be called by GC.

Thank you.
Nik.

Thanks
Evgueni

--
Weldon Washburn
Intel Middleware Products Division
Re: [drlvm][threading] Is it safe to use hythread_suspend_all and hythread_resume_all?
Evgueni Brevnov wrote:

Hi All, I'd like to hear your opinion regarding the hythread_suspend_all and hythread_resume_all functionality provided by TM. Actually I have two separate questions:

1) I found that hythread_suspend_all calls thread_safe_point_impl inside. There is no assertion regarding the thread's state upon entering hythread_suspend_all. So it can be called in suspend disabled state, and nobody (at least me) expects to have a safe point during hythread_suspend_all. The problem seems to be very similar to the one discussed in "[drlvm][threading] Possible race condition in implementation of conditional variables?" Your thoughts?

The code you see is there to prevent the following deadlock scenario:

    Thread A                          Thread B
       |                                 |
       |                             suspend(A);
       |                             A->suspend_request = 1;
       |                             wait for A to reach a safepoint...
       |                                 |
    suspend_all()                        |
    B->suspend_request = 1
    wait for B to reach a safepoint

... and then the two threads are infinitely waiting for one another.

2) Assume I need to suspend all threads in a particular group. OK, I pass that group to hythread_suspend_all. Later, when all suspended threads should be resumed, I pass the same group to hythread_resume_all. But while the threads were suspended, the group has changed. For example, one new thread was added to it. So the questions are: Is it acceptable to have such unsafe functionality? Would it be better to lock the group in hythread_suspend_all and unlock it in hythread_resume_all?

We may as well leave it as the responsibility of the application / TI agent writer not to modify a suspended thread group. Why do you think this should be enforced?
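The cyclic wait in the scenario above, and why a safe point inside the waiting loop breaks it, can be shown with a small single-threaded simulation in C. This is only a model under assumptions: sim_thread and the sim_* functions are invented for illustration, real threads are replaced by alternating polling steps, and nothing here reflects how the TM actually parks or resumes a thread.

```c
#include <assert.h>

/* Simulated thread state; both fields are illustrative only. */
typedef struct sim_thread {
    int suspend_request;  /* someone asked this thread to suspend */
    int at_safepoint;     /* this thread has acknowledged the request */
} sim_thread;

/* One polling step of "self waits for target to reach a safepoint".
 * With with_safepoint set, self first honors a request against itself.
 * Returns nonzero once the target has reached a safepoint. */
static int sim_wait_step(sim_thread *self, const sim_thread *target,
                         int with_safepoint) {
    if (with_safepoint && self->suspend_request)
        self->at_safepoint = 1;
    return target->at_safepoint;
}

/* A and B each request suspension of the other, then both wait.
 * Returns 1 if either wait completes within max_steps, 0 if both spin. */
static int sim_mutual_suspend(int with_safepoint, int max_steps) {
    sim_thread a = {0, 0}, b = {0, 0};
    assert(max_steps > 0);
    a.suspend_request = 1;  /* B called suspend(A) */
    b.suspend_request = 1;  /* A called suspend_all() */
    for (int i = 0; i < max_steps; i++) {
        int a_done = sim_wait_step(&a, &b, with_safepoint);  /* A waits for B */
        int b_done = sim_wait_step(&b, &a, with_safepoint);  /* B waits for A */
        if (a_done || b_done)
            return 1;
    }
    return 0;
}
```

Without the safe point neither waiter ever acknowledges the request against itself, so the loop runs out of steps; with it, one side acknowledges and the other side's wait completes. Which of the two threads should win in the real system is a policy question the simulation does not answer.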
Re: [drlvm][threading] Is it safe to use hythread_suspend_all and hythread_resume_all?
The simplest model is to grab the thread lock whenever thread A wants to suspend thread B at a safepoint. While this serializes thread suspension and can potentially be a bottleneck, let's wait until it's a proven performance problem to change this model. For thread A to be ready to grab the thread lock, thread A must have all its java live references put in a place where the GC will see them. Why? Because thread A may block. Once thread A obtains the lock, it can disable suspension if it likes, reload the java live refs and do whatever it needs, but make certain it is quick and non-blocking. If thread A needs to block on some OS call, etc., it will need to re-enable suspension and abandon the thread lock. Why? Because if thread A blocks while holding the global thread lock, there may be deadlock or latency problems. Did you try the above approach? Are there deadlocks?

I wonder if a suspend_disable call can be treated as a thread lock; if yes, we already have nearly the same scheme for stop_the_world suspension.

Nik.
Re: [drlvm][threading] Is it safe to use hythread_suspend_all and hythread_resume_all?
On 10/17/06, Nikolay Kuznetsov [EMAIL PROTECTED] wrote:

The simplest model is to grab the thread lock whenever thread A wants to suspend thread B at a safepoint. While this serializes thread suspension and can potentially be a bottleneck, let's wait until it's a proven performance problem to change this model. For thread A to be ready to grab the thread lock, thread A must have all its java live references put in a place where the GC will see them. Why? Because thread A may block. Once thread A obtains the lock, it can disable suspension if it likes, reload the java live refs and do whatever it needs, but make certain it is quick and non-blocking. If thread A needs to block on some OS call, etc., it will need to re-enable suspension and abandon the thread lock. Why? Because if thread A blocks while holding the global thread lock, there may be deadlock or latency problems. Did you try the above approach? Are there deadlocks?

I wonder if a suspend_disable call can be treated as a thread lock; if yes, we already have nearly the same scheme for stop_the_world suspension.

hmm I never thought of it that way. My initial reaction is no. Suspend enable/disable and global thread lock are separate, distinct concepts. The thread lock should protect the VM internal thread structs when they are being modified. For example, the thread lock should allow only one thread create/die at any given instant. The enable/disable state is incidental to this event. This is independent of the concept of a thread running native code being in a state where the GC can find all its live references. If a thread needs to grab the thread lock, of course, it needs to put itself in a suspend enable mode because it might have to wait for the lock.

Nik.

--
Weldon Washburn
Intel Middleware Products Division
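The lock discipline described in this message (be suspend-enabled before taking the thread lock, optionally disable suspension for a short non-blocking section once the lock is held, and never block while holding it) can be sketched in C with a POSIX mutex. All names below (tl_thread, tl_lock, tl_with_lock and so on) are hypothetical and exist only for this illustration; they are not part of the TM interface.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical per-thread state; not a DRLVM structure. */
typedef struct tl_thread {
    int suspend_disable_count;  /* 0 means suspension is enabled */
} tl_thread;

static pthread_mutex_t tl_global_lock = PTHREAD_MUTEX_INITIALIZER;

/* Acquire the global thread lock. The caller may block here, so it must
 * already be suspend-enabled; otherwise the call is refused with -1. */
static int tl_lock(tl_thread *self) {
    if (self->suspend_disable_count != 0)
        return -1;
    pthread_mutex_lock(&tl_global_lock);
    return 0;
}

static void tl_unlock(void) {
    pthread_mutex_unlock(&tl_global_lock);
}

/* Run a short, non-blocking piece of work under the lock. Suspension is
 * disabled only after the lock is held and re-enabled before release. */
static int tl_with_lock(tl_thread *self, void (*quick_work)(void *), void *arg) {
    if (tl_lock(self) != 0)
        return -1;
    self->suspend_disable_count++;
    quick_work(arg);            /* must not block or call into the OS */
    self->suspend_disable_count--;
    assert(self->suspend_disable_count == 0);
    tl_unlock();
    return 0;
}

/* Example of quick work: increment a counter passed by pointer. */
static void tl_bump(void *arg) {
    ++*(int *)arg;
}
```

The sketch keeps the two concepts separate, as argued above: the mutex serializes changes to thread structures, while the counter only records whether the GC may enumerate the calling thread.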