Re: RFC 178 (v2) Lightweight Threads
At 09:43 PM 9/9/00 -0400, Chaim Frenkel wrote:

"DS" == Dan Sugalski [EMAIL PROTECTED] writes:

DS Right, but databases are all dealing with mainly disk access. A 1ms lock
DS operation's no big deal when it takes 100ms to fetch the data being locked.
DS A 1ms lock operation *is* a big deal when it takes 100ns to fetch the data
DS being locked...

Actually, even a database can't waste too much time on locks, not when
there may be thousands or millions of rows affected.

But... but... it does. The time isn't wasted, but it is spent. It has to
be. Locking things in a relational database with ACID guarantees isn't
cheap. Records, being disk-based with rollback guarantees, are
significantly heavier-weight than perl variables.

DS Correctness is what we define it as. I'm more worried about expense.
DS Detecting deadlocks is expensive and it means rolling our own locking
DS protocols on most systems. You can't do it at all easily with PThreads
DS locks, unfortunately. Just detecting a lock that blocks doesn't cut it,
DS since that may well be legit, and doing a scan for circular locking issues
DS every time a lock blocks is expensive.

DS Rollbacks are also expensive, and they can generate unbounded amounts of
DS temporary data, so they're also fraught with expense and peril.

Then all "we" are planning on delivering is correctness with a
possibility of deadlocks with no notification.

At the moment, yup. I'd like otherwise, and I don't mind planning for it
(like giving lock a return value), but I think you'll find it rather
difficult and pricey.

Is deadlock detection really that expensive? The cost would be borne by
the thread that will be going to sleep. Can't get the lock, do the scan.

First, you don't scan on every block. That's really expensive. I'm not
sure how things like Oracle do it, but VMS' lock manager only scans once
a second if there are blocked locks. Secondly, the cost is borne by the
entire system. Scanning a tree of locks can be expensive.
Depending on how many threads and how many locks are outstanding, you
have a potentially very large list of things to go through. That takes
CPU time, memory, and cache space.

I really think we will have to do it. And we should come up with the
deadlock resolution. I don't think we will fly without it. We are going
to be deluged with reports of "my program hangs. Bug in locking."

Yup. That's not at all uncommon with threads. :( Adding in read/write
locks and trylocks will help. Full thread deadlock detection means
tossing out POSIX's mutex scheme and rolling our own, and doing that
even close to efficiently is very platform-dependent and error-prone.
I'd rather not if we could manage it. (You can't, for example, smack a
thread blocked waiting on a mutex.)

Dan

--"it's like this"---
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                     have teddy bears and even
                                      teddy bears get drunk
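[Editor's note: the periodic cycle scan Dan describes can be made concrete.
This is an illustrative sketch in Python, not VMS or Oracle code, and all
names in it are invented: each blocked thread records which thread holds
the lock it wants, and a scanner walks that wait-for graph looking for a
cycle. The point is that the scan costs time proportional to the number of
blocked threads, which is why a lock manager runs it once a second rather
than on every blocked lock.]

```python
# Toy deadlock detector over a wait-for graph. waits_for maps a blocked
# thread to the thread holding the lock it wants (one lock per thread,
# as with a simple mutex). Returns the threads forming a cycle, or None.

def find_deadlock(waits_for):
    for start in waits_for:
        seen = []
        node = start
        while node in waits_for:          # follow the chain of waiters
            if node in seen:
                return seen[seen.index(node):]   # found a cycle
            seen.append(node)
            node = waits_for[node]
    return None

# T1 waits on T2 and T2 waits on T1: the classic deadly embrace.
print(find_deadlock({"T1": "T2", "T2": "T1"}))   # ['T1', 'T2']
print(find_deadlock({"T1": "T2"}))               # None: T2 is runnable
```

Note the scan touches every blocked thread in the worst case, which is the
"CPU time, memory, and cache space" cost Dan is pointing at.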
Re: RFC 178 (v2) Lightweight Threads
Chaim Frenkel wrote:

Please elaborate. How deep do you go? $h{a}{b}{c}{d}{e}{f}

This is my last mail on this subject - it is a half-assed idea, and this
whole thread is becoming too tedious for words. Actually, I'd extend
that to the whole p6 process. In fact I think I'll just unsubscribe.
It's doomed.

Alan Burlison
Re: RFC 178 (v2) Lightweight Threads
Alan Burlison [EMAIL PROTECTED] writes:

Nick Ing-Simmons wrote: The tricky bit - i.e. the _design_ - is to
separate the op-ness from the var-ness. I assume that there is something
akin to hv_fetch_ent() which takes a flag to say - by the way, this is
going to be stored ...

I'm not entirely clear on what you mean here - is it something like
this, where $a is shared and $b is unshared?

    $a = $a + $b;

because there is a potential race condition between the initial fetch of
say $a and the assignment to it? My response to this is simple - tough.

That is mine too - I was trying to deduce why you thought the op tree
had to change. I can make a weak case for

    $a += $b;

expanding to

    a->vtable[STORE](DONE => 1) =
        a->vtable[FETCH](LVALUE => 1) + b->vtable[FETCH](LVALUE => 0);

but that can still break easily if b turns out to be tied to something
that also dorks with a.

-- Nick Ing-Simmons
Re: RFC 178 (v2) Lightweight Threads
Chaim Frenkel [EMAIL PROTECTED] writes:

What tied scalar? All you can contain in an aggregate is a reference to
a tied scalar. The bucket in the aggregate is a regular bucket. No?

A tied scalar is still a scalar, and can be stored in an aggregate.
Well, if you want to place that restriction on perl6 so be it, but in
perl5 I can say

    tie $a[4], 'Something';

Indeed, that is exactly how tied arrays work - they (automatically) add
'p' magic (internal tie) to their elements. Tk apps do this all the
time:

    $parent->Label(-textvariable => \$somehash{'Foo'});

The reference is just to get the actual element rather than a copy. Tk
then ties the actual element so it can see STORE ops and update the
label.

-- Nick Ing-Simmons
Re: RFC 178 (v2) Lightweight Threads(multiversionning)
I don't even want to take things out a step to guarantee atomicity at
the statement level. There are speed issues there, since it means every
statement will need to conditionally lock everything. (Since we can't
necessarily know at compile time which variables are shared and which
aren't.) There are also lock ordering issues, which get us deadlock fun.
And, of course, let's not forget some of the statements can last a
*long* time and cause all sorts of fun - eval comes to mind, as do some
of the funkier regex things.

]- what if we don't use "locks", but multiple versions of the same
variable!!! What I have in mind: if there are transaction-based
variables THEN we can use a multiversioning mechanism like some DBs do -
Interbase, for example. Check here:
http://216.217.141.125/document/InternalsOverview.htm

just thoughts, i've not read the whole discussion.
= iVAN [EMAIL PROTECTED] =
Re: RFC 178 (v2) Lightweight Threads
You aren't being clear here.

    Thread 1         Thread 2
    fetch($a)        fetch($a)
                     fetch($b)
    ...              add
    store($a)        store($a)

Now all of the perl internals are done 'safely' but the result is
garbage. You don't even know the result of the addition.

Sorry, you are right, I wasn't clear. You are correct - the final value
of $a will depend on the exact ordering of the FETCHes and STOREs in the
two threads. ...I hadn't been thinking in terms of the stack machine.
OK, we could put the internal locks around fetch and store. Now, can
everyone deal with these examples?

Example:

    $a = 0;
    $thread = new Thread sub { $a++ };
    $a++;
    $thread->join;
    print $a;

Output: 1 or 2

Example:

    @a = ();
    async { push @a, (1, 2, 3) };
    push @a, (4, 5, 6);
    print @a;

Possible output: 142536

- SWM
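[Editor's note: the lost-update interleaving being discussed is easy to
play out by hand. A deterministic Python simulation of one bad schedule,
purely illustrative - the two "threads" are unrolled inline so the
interleaving is fixed: both fetch $a before either stores, so one
increment vanishes and the race in SWM's first example yields 1, not 2.
Per-op atomicity of fetch and store alone does not make $a++ atomic.]

```python
# Simulate two threads each doing $a++ with the worst interleaving:
# both fetch the old value of a before either stores its result.

a = 0
t1_tmp = a + 1      # thread 1: fetch($a), add
t2_tmp = a + 1      # thread 2: fetch($a), add -- still sees 0
a = t1_tmp          # thread 1: store($a)
a = t2_tmp          # thread 2: store($a), overwriting thread 1's store
print(a)            # 1: one of the two increments was lost
```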
Re: RFC 178 (v2) Lightweight Threads
Example:

    @a = ();
    async { push @a, (1, 2, 3) };
    push @a, (4, 5, 6);
    print @a;

Possible output: 142536

Actually, I'm not sure I understand this. Can someone show how to
program push() on a stack machine?

- SWM
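[Editor's note: one way to picture it, as a hedged sketch rather than real
perl internals - decompose push(@a, LIST) into one append op per element.
If each append is individually atomic but the op streams of the two
threads interleave element by element, the 142536 output falls out. The op
names below are invented for illustration.]

```python
# Toy stack-machine view of push(@a, (1,2,3)): one APPEND op per element.

def push_ops(values):
    # a push compiles to a sequence of per-element APPEND ops
    return [("APPEND", v) for v in values]

def run_interleaved(array, ops1, ops2):
    # strict alternation: one op from each "thread" in turn
    for op1, op2 in zip(ops1, ops2):
        array.append(op1[1])
        array.append(op2[1])
    return array

a = run_interleaved([], push_ops([1, 2, 3]), push_ops([4, 5, 6]))
print("".join(map(str, a)))   # 142536
```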
Re: RFC 178 (v2) Lightweight Threads
"NI" == Nick Ing-Simmons [EMAIL PROTECTED] writes: NI Chaim Frenkel [EMAIL PROTECTED] writes: NI Well if you want to place that restriction on perl6 so be it but in perl5 NI I can say NI tie $a[4],'Something'; That I didn't realize. NI Indeed that is exactly how tied arrays work - they (automatically) add NI 'p' magic (internal tie) to their elements. Hmm, I always understood a tied array to be the _array_ not each individual element. NI Tk apps to this all the time : NI $parent-Lable(-textvariable = \$somehash{'Foo'}); NI The reference is just to get the actual element rather than a copy. NI Tk then ties the actual element so it can see STORE ops and up date NI label. Would it be a loss to not allow the elements? The tie would then be to the aggregate. I might argue that under threading tieing to the aggregate may be 'more' correct for coherency (locking the aggregate before accessing.) chaim -- Chaim FrenkelNonlinear Knowledge, Inc. [EMAIL PROTECTED] +1-718-236-0183
Re: RFC 178 (v2) Lightweight Threads
At 06:18 PM 9/7/00 -0400, Chaim Frenkel wrote:

"AB" == Alan Burlison [EMAIL PROTECTED] writes:

The problem I have with this plan, is reconciling the fact that a
database update does all of this and more. And how to do it is a known
problem; it's been developed over and over again.

AB I'm sorry, but you are wrong. You are confusing transactions with
AB threading, and the two are fundamentally different. Transactions are
AB just a way of saying 'I want to see all of these changes, or none of
AB them'. You can do this even in a non-threaded environment by
AB serialising everything. Deadlock avoidance in databases is difficult,
AB and Oracle for example 'resolves' a deadlock by picking one of the two
AB deadlocking transactions at random and forcibly aborting it.

Actually, I wasn't. I was considering the locking/deadlock handling part
of database engines. (Map row -> variable.)

The problem with using database locking and transactions as your model
is that they're *expensive*. Amazingly so. The expense is certainly
worth it for what you get, and in many cases the expense is hidden (at
least to some extent) by the cost you pay in disk I/O, but it's
definitely there.

Heavyweight locking schemes are fine for relatively infrequent or
expensive operations (your average DLM for cluster-wide file access is
an example), but we're not dealing with rare or heavy operations. We're
dealing with very lightweight, frequent operations. That means we need a
really cheap locking scheme for this sort of thing, or we're going to be
spending most of our time in the lock manager...

Dan

--"it's like this"---
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                     have teddy bears and even
                                      teddy bears get drunk
Re: RFC 178 (v2) Lightweight Threads
"AB" == Alan Burlison [EMAIL PROTECTED] writes: AB Chaim Frenkel wrote: You aren't being clear here. fetch($a) fetch($a) fetch($b) ... add ... store($a) store($a) Now all of the perl internals are done 'safely' but the result is garbage. You don't even know the result of the addition. AB Sorry you are right, I wasn't clear. You are correct - the final value AB of $a will depend on the exact ordering of the FETCHEs and STOREs in the AB two threads. As I said - tough. The problem is that defining a AB 'statement' is hard. Does map or grep constitute a single statement? I AB bet most perl programmers would say 'Yes'. However I suspect it AB wouldn't be practical to make it auto-locking in the manner you AB describe. In that case you aren't actually making anyone's life easier AB by adding auto-locking, as they now have a whole new problem to solve - AB remembering which operations are and aren't auto-locking. Explicit AB locks don't require a feat of memory - they are there for all to see in AB the code. I want to make _my_ life easier. I don't expect to have mucho communication between threads, the more communication the more lossage in performace to the sheer handshaking. So with minimal interaction, why bother with the sprinkling of the lock. You in effect are tell _all_ users that they must do lock($a) ... unlock($a) around all :shared variables. Now if that has to be done why not do it automatically. AB The other issue is that auto-locking operations will inevitably be done AB inside explicitly locked sections. This is firstly inefficient as it AB adds another level of locking, and secondly may well be prone to causing AB deadlocks. Aha, You might have missed one of my comments. A lock() within a scope would turn off the auto locking. The user _knows_ what he wants and is now willing to accept responsibility. Doing that store of a value in $h, ior pushing something onto @queue is going to be a complex operation. 
If you are going to keep a lock on %h while the entire expression/statement completes, then you have essentially given me an atomic operation which is what I would like. AB And you have given me something that I don't like, which is to make AB every shared hash a serialisation point. I'm sure you've done a lot of work in the core and serialization. But haven't you seen that when you want to lock an entry deep in the heart of some chain, you have to _first_ lock the chain to prevent changes? So, lock(aggregate) fetch(key) lock(chain) fetch(chain) lock(value) fetch(value) unlock(value) unlock(chain) unlock(key) unlock(aggregate) Actually, these could be readlocks, and they might be freed as soon as they aren't needed, but I'm not sure that the rule to keep holding might be better. (e.g. Promotion to an exclusive), but these algorithms have already been worked on for quite a long time. AB If I'm thinking of speeding up an app that uses a shared hash by AB threading it I'll see limited speedup because under your scheme, AB any accesses will be serialised by that damn automatic lock that I AB DON'T WANT! Then just use lock() in the scope. AB A more common approach to locking hashes is to have a lock per AB chain - this allows concurrent updates to happen as long as they AB are on different chains. Don't forget that the aggregate needs to be locked before trying to lock the chain. The aggregate may disappear underneath you unless you lock it down. AB Also, I'm not clear what type of automatic lock you are intending AB to cripple me with - an exclusive one or a read/write lock for AB example. My shared variable might be mainly read-only, so AB automatically taking out an exclusive lock every time I fetch its AB value isn't really helping me much. I agree with having a read-only vs. exclusive. But do all platforms provide this type of locking? If not would the overhead of implementing it kill any performance wins. Does promoting a read-only to an exclusive cost that much? 
So the automatic lock would be a read-only and the store promotes to an exclusive during the short storage period. AB I think what I'm trying to say is please stop trying to be helpful AB by adding auto locking, because in most cases it will just get in AB the way. Here we are arguing about which is the more common access method. Multiple shared or singluar shared. This is an experience issue. Those times that I've done threaded code, I've kept the sharing down to a minimum. Mostly work queues, or using a variable to create a critical section. AB If you *really* desperately want it, I think it should be optional, e.g. ABmy $a : shared, auto lock; AB or somesuch. This will probably be fine for those people who are using AB threads but who don't actually understand what they are doing.
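[Editor's note: Alan's lock-per-chain scheme is worth a concrete sketch.
This is an illustrative Python model with invented names, not perl's hash
implementation: the hash is split into chains (buckets), each with its own
lock, so stores to different chains need not serialise against each other.]

```python
# Striped (per-chain) locking for a hash: lock only the bucket the key
# falls in, so concurrent updates to different chains can proceed.

import threading

class StripedHash:
    def __init__(self, nchains=8):
        self.nchains = nchains
        self.chains = [dict() for _ in range(nchains)]
        self.locks = [threading.Lock() for _ in range(nchains)]

    def _chain(self, key):
        return hash(key) % self.nchains

    def store(self, key, value):
        i = self._chain(key)
        with self.locks[i]:              # lock only this chain
            self.chains[i][key] = value

    def fetch(self, key):
        i = self._chain(key)
        with self.locks[i]:
            return self.chains[i].get(key)

h = StripedHash()
h.store("Foo", 42)
print(h.fetch("Foo"))   # 42
```

Chaim's objection still applies to this sketch: nothing here protects the
StripedHash object itself from disappearing while a chain lock is held.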
Re: RFC 178 (v2) Lightweight Threads
"AB" == Alan Burlison [EMAIL PROTECTED] writes: AB Chaim Frenkel wrote: What tied scalar? All you can contain in an aggregate is a reference to a tied scalar. The bucket in the aggregate is a regular bucket. No? AB So you don't intend being able to roll back anything that has been AB modified via a reference then? And if you do intend to allow this, how AB will you know when to stop chasing references? What happens if there AB are circular references? How much time do you think it will take to AB scan a 4Gb array to find out which elements need to be checkpointed? AB Please consider carefully the potential consequences of your proposal. No scanning. I was considering that all variables on a store would safe store the previous value in a thread specific holding area[*]. Then upon a deadlock/rollback, the changed values would be restored. (This restoration should be valid, since the change could not have taken place without an exclusive lock on the variable.) Then the execution stack and program counter would be reset to the checkpoint. And then restarted. chaim [*] Think of it as the transaction log. -- Chaim FrenkelNonlinear Knowledge, Inc. [EMAIL PROTECTED] +1-718-236-0183
Re: RFC 178 (v2) Lightweight Threads
"AB" == Alan Burlison [EMAIL PROTECTED] writes: AB Please consider carefully the potential consequences of your proposal. I just realized, that no one has submitted a language level proposal how deadlocks are detected, delivered to the perl program, how they are to be recovered from, What happens to the held locks, etc. chaim -- Chaim FrenkelNonlinear Knowledge, Inc. [EMAIL PROTECTED] +1-718-236-0183
Re: RFC 178 (v2) Lightweight Threads(multiversionning)
"r" == raptor [EMAIL PROTECTED] writes: r ]- what if we don't use "locks", but multple versions of the same variable r !!! What I have in mind : r If there is transaction-based variables THEN we can use multiversioning r mechanism like some DB - Interbase for example. r Check here : http://216.217.141.125/document/InternalsOverview.htm r just thoughts, i've not read the whole discussion. Doesn't really help. You just move the problem to commit time. Remember, the final result has to be _as if_ all of the interleaved changes were done serially (one thread finishing before the other). If this can not be done, then one or the other thread has to be notified of deadlock and the relevant changes thrown away. (As a former boss liked to say, "Work is conserved." or perhaps TANSTAFL) chaim -- Chaim FrenkelNonlinear Knowledge, Inc. [EMAIL PROTECTED] +1-718-236-0183
Re: RFC 178 (v2) Lightweight Threads
Chaim Frenkel [EMAIL PROTECTED] writes:

"JH" == Jarkko Hietaniemi [EMAIL PROTECTED] writes:

JH Multithreaded programming is hard and for a given program the only
JH person truly knowing how to keep the data consistent and threads not
JH strangling each other is the programmer. Perl shouldn't try to be too
JH helpful and get in the way. Just give user the bare minimum, the
JH basic synchronization primitives, and plenty of advice.

The problem I have with this plan, is reconciling the fact that a
database update does all of this and more. And how to do it is a known
problem; it's been developed over and over again.

Yes - by the PROGRAMMER that does the database access code - that is far
higher level than typical perl code. If all your data lives in a
database and you are prepared to lock the database while you get/set it,
sure, we can apply that logic to making statements coherent in perl:

    while (1) {
        lock PERL_LOCK;
        do_statement;
        unlock PERL_LOCK;
    }

So ONLY 1 thread is ever _in_ perl at a time - easy! But now _by
constraint_ a threaded perl program can NEVER be a performance win. The
reason this isn't a pain for databases is they have other things to do
while they wait ...

-- Nick Ing-Simmons [EMAIL PROTECTED]
Via, but not speaking for: Texas Instruments Ltd.
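[Editor's note: Nick's one-big-lock constraint can be restated as a small
Python sketch (illustrative only; PERL_LOCK and run_statements are made-up
names). Every "statement" runs under one global lock, so the result is
always consistent, but the four threads can never overlap inside the
interpreter, which is exactly why such a scheme can never be a
performance win.]

```python
# One global lock around every statement: correct, but fully serialised.

import threading

PERL_LOCK = threading.Lock()
counter = 0

def run_statements(n):
    global counter
    for _ in range(n):
        with PERL_LOCK:       # lock PERL_LOCK; do_statement; unlock
            counter += 1      # the whole statement is serialised

threads = [threading.Thread(target=run_statements, args=(1000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)   # 4000: correct, but the threads never ran concurrently
```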
Re: RFC 178 (v2) Lightweight Threads
Chaim Frenkel [EMAIL PROTECTED] writes:

Some series of points (I can't remember what they are called in C) where
operations are considered to have completed will have to be defined;
between these points operations will have to be atomic.

Sequence points. And no, quite the reverse - absolutely no promises are
made as to the state of anything between sequence points - BUT - the
state at the sequence points is _AS IF_ the operations between them had
executed in sequence. So it is not that _inside_ these points the
sub-operations are atomic, but rather that the sequence of operations as
a whole is atomic.

The problem with big "atoms" is that if CPU A is doing a complex atomic
operation, CPU B has to stop working on perl and go find something else
to do till it finishes.

chaim

-- Nick Ing-Simmons [EMAIL PROTECTED]
Via, but not speaking for: Texas Instruments Ltd.
Re: RFC 178 (v2) Lightweight Threads
Jarkko Hietaniemi wrote: Multithreaded programming is hard and for a given program the only person truly knowing how to keep the data consistent and threads not strangling each other is the programmer. Perl shouldn't try to be too helpful and get in the way. Just give user the bare minimum, the basic synchronization primitives, and plenty of advice. Amen. I've been watching the various thread discussions with increasing despair. Most of the proposals have been so uninformed as to be laughable. I'm sorry if that puts some people's noses out of joint, but it is true. Doesn't it occur to people that if it was easy to add automatic locking to a threaded language it would have been done long ago? Although I've seen some pretty whacky Perl6 RFCs, I've yet to see one that says 'Perl6 should be a major Computer Science research project'. -- Alan Burlison
Re: RFC 178 (v2) Lightweight Threads
Chaim Frenkel wrote:

The problem I have with this plan, is reconciling the fact that a
database update does all of this and more. And how to do it is a known
problem; it's been developed over and over again.

I'm sorry, but you are wrong. You are confusing transactions with
threading, and the two are fundamentally different. Transactions are
just a way of saying 'I want to see all of these changes, or none of
them'. You can do this even in a non-threaded environment by serialising
everything. Deadlock avoidance in databases is difficult, and Oracle for
example 'resolves' a deadlock by picking one of the two deadlocking
transactions at random and forcibly aborting it.

Perl has full control of its innards, so up until any data leaves perl's
control, perl should be able to restart any changes. Take a mark at some
point, run through the code; if the changes take, we're ahead of the
game. If something fails, back off to the checkpoint and try the code
again. So any stretch of code with only operations on internal
structures could be made eligible for retries.

Which will therefore be utterly useless. And how on earth will you
identify sections that "only operate on internal data"?

-- Alan Burlison
Re: RFC 178 (v2) Lightweight Threads
Chaim Frenkel wrote:

UG i don't see how you can do atomic ops easily. assuming interpreter
UG threads as the model, an interpreter could run in the middle of another
UG and corrupt it. most perl ops do too much work for any easy way to make
UG them atomic without explicit locks/mutexes. leave the locking to the
UG coder and keep perl clean. in fact the whole concept of transactions in
UG perl makes me queasy. leave that to the RDBMS and their ilk.

If this is true, then give up on threads. Perl will have to do atomic
operations, if for no other reason than to keep from core dumping and
maintaining sane states.

I don't see that this is necessarily true. The best suggestion I have
seen so far is to have each thread be effectively a separate instance of
the interpreter, with all variables being by default local to that
thread. If inter-thread communication is required it would be done via
special 'shareable' variables, which are appropriately protected to
ensure all operations on them are atomic, and that concurrent access
doesn't cause corruption. This avoids the locking penalty for 95% of the
cases where variables won't be shared.

Note however that it will *still* be necessary to provide primitive
locking operations, because code will inevitably require exclusive
access to more than one shared variable at the same time:

    push(@shared_names, "fred");
    $shared_name_count++;

will need a lock around it, for example.

Another good reason for having separate interpreter instances for each
thread is that it will allow people to write non-threaded modules that
can still be safely used inside a threaded program. Let's not forget
that the overwhelming bulk of CPAN modules will probably never be
threaded. By loading the unthreaded module inside a 'wrapper' thread in
the program you can safely use an unthreaded module in a threaded
program - as far as the module is concerned, the fact that there are
multiple threads is invisible.
This will however require that different threads are allowed to have different optrees - perhaps some sort of 'copy on write' semantic should be used so that optrees can be shared cheaply for the cases where no changes are made to it. Alan Burlison
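[Editor's note: Alan's two-statement example is the crux, so here it is as
a runnable Python sketch (names translated from the Perl; illustrative
only). Even if every individual operation on a shared variable is atomic,
keeping the list and its counter consistent *with each other* needs one
explicit lock spanning both updates.]

```python
# One lock must cover both updates, or the list and the count can drift
# out of sync between them.

import threading

shared_names = []
shared_name_count = 0
names_lock = threading.Lock()

def add_name(name):
    global shared_name_count
    with names_lock:                 # push(@shared_names, $name);
        shared_names.append(name)    # $shared_name_count++;
        shared_name_count += 1       # ...as one indivisible unit

threads = [threading.Thread(target=add_name, args=("fred%d" % i,))
           for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(shared_names) == shared_name_count)   # True
```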
Re: RFC 178 (v2) Lightweight Threads
Chaim Frenkel wrote:

I'd like to make the easy things easy. By making _all_ shared variables
require a user-level lock, you make the code cluttered. In some (I
think) large percentage of cases, a single variable or queue will be
used to communicate between threads. Why not make it easy for the
programmer?

Because, contrary to your assertion, I fear it will be a special case
that will cover such a tiny percentage of useful threaded code as to
make it virtually useless. In general, any meaningful operation that
needs to be covered by a lock will involve the update of several pieces
of state, and implicit locking just won't work. We are not talking
syntactical niceties here - the code plain won't work.

It's these isolated "drop something in the mailbox" cases that a lock
around the statement would make sense for.

An exact definition of 'statement' would help. Also, some means of
beaming into the skull of every perl6 developer exactly what does and
does not constitute a statement would be useful ;-) It is all right
sweeping awkward details under the rug, but make the mound big enough
and everyone will trip over it.

    my $a :shared;
    $a += $b;

If you read my suggestion carefully, you would see that I explicitly
covered this case and said that the internal consistency of $a would
always be maintained (it would have to be, otherwise the interpreter
would explode), so two threads both adding to a shared $a would result
in $a being updated appropriately - it is just that you wouldn't know
the order in which the two additions were made.

I think you are getting confused between the locking needed within the
interpreter to ensure that its internal state is always consistent and
sane, and the explicit application-level locking that will have to be in
multithreaded perl programs to make them function correctly. Interpreter
consistency and application correctness are *not* the same thing.
    my %h :shared;
    $h{$xyz} = $somevalue;

    my @queue :shared;
    push(@queue, $b);

Again, all of these would have to be OK in an interpreter that ensured
internal consistency. The trouble is if you want to update $a, %h and
@queue in an atomic fashion - then the application programmer MUST state
his intent to the interpreter by providing explicit locking around the 3
updates.

-- Alan Burlison
Re: RFC 178 (v2) Lightweight Threads
Chaim Frenkel wrote:

AB I'm sorry, but you are wrong. You are confusing transactions with
AB threading, and the two are fundamentally different. Transactions are
AB just a way of saying 'I want to see all of these changes, or none of
AB them'. You can do this even in a non-threaded environment by
AB serialising everything. Deadlock avoidance in databases is difficult,
AB and Oracle for example 'resolves' a deadlock by picking one of the two
AB deadlocking transactions at random and forcibly aborting it.

Actually, I wasn't. I was considering the locking/deadlock handling part
of database engines. (Map row -> variable.)

Locking, transactions and deadlock detection are all related, but aren't
the same thing. Relational databases and procedural programming
languages aren't the same thing. Beware of misleading comparisons.

How on earth does a compiler recognize checkpoints (or whatever they are
called) in an expression?

If you are talking about SQL it doesn't. You have to explicitly say
where you want a transaction completed (COMMIT) or aborted (ROLLBACK).
Rollback goes back to the point of the last COMMIT.

I'm probably way off base, but this was what I had in mind (I. ==
Internal):

    I.Object     - A non-tied scalar or aggregate object
    I.Expression - An expression (no function calls) involving only I.Objects
    I.Operation  - (non-io operators) operating on I.Expressions
    I.Function   - A function made up of only I.Operations/I.Expressions
    I.Statement  - A statement made up of only I.Functions, I.Operations
                   and I.Expressions

And if the aggregate contains a tied scalar - what then? The only way of
knowing this would be to check every item of an aggregate before
starting. I think not.

Because if we can recover, we can take locks in arbitrary order and
simply retry on deadlock. A variable could put its prior value into an
undo log for use in recovery.

Nope. Which one of the competing transactions wins? Do you want a
nondeterministic outcome?
Deadlocks are the bane of any DBA's life. They are exceedingly difficult
to track down, and generally the first course of action of the DBA is to
go looking for the responsible programmer with a baseball bat in one
hand and a body bag in the other. If you get a deadlock it means your
application is broken - it is trying to do two things which are mutually
inconsistent at the same time. If you feel that automatically resolving
this class of problem is an appropriate thing for perl to do, please
submit an RFC entitled "Why perl6 should automatically fix all the
broken programs out there and how I suggest it should be done". Then you
can sit back and wait for the phone call from Stockholm ;-)

-- Alan Burlison
Re: RFC 178 (v2) Lightweight Threads
I think there may be a necessity for more than just a work area to be non-shared. There has been no meaningful discussion so far related to the fact that the vast majority of perl6 modules will *NOT* be threaded, but that people will want to use them in threaded programs. That is a non-trivial problem that may best be solved by keeping the entirety of such modules private to a single thread. In that case the optree might also have to be private, and with that and private work area it looks very much like a full interpreter to me. RFC 1 proposes this model, and there was some discussion of it on perl6-language-flow. RFC 178 argues against it, under DISCUSSION, Globals and Reentrancy. - SWM
Re: RFC 178 (v2) Lightweight Threads
On Thu, 07 Sep 2000, Steven W McDougall wrote: RFC 1 proposes this model, and there was some discussion of it on perl6-language-flow. Which is strange, since it was released for this group. Hmmm. But yes, we did seem to hash out at least some of this before, which, to Steven's credit, was the reason behind RFC 178. (To document an alternate solution to, and possible shortcomings of, RFC 1.) To reiterate (or clarify) RFC 1 - I'll investigate the next rev this weekend - the only atomicy (atomicity?) I was guaranteeing automatically in the shared variables was really fetch and restore. (In other words, truly internal. Whether that would extend to op dispatch, or other truly internal variable attributes would be left for those with more internals intuits than I. Existence is also another thing to be guaranteed, for whatever GC method we're going to use, but I think that's assumed.) $b = $a + foo($a); The $a passed to foo() is not guaranteed *by perl* to be the same $a the return value is added to. But the $a that you start introspecting to retrieve the value so that you can pass that value to foo() is guaranteed to be the same $a at the completion of retrieving that value. That's all. Any more automagical guarantees beyond that is beyond the scope of RFC 1, and my abilities, for that matter. -- Bryan C. Warnock ([EMAIL PROTECTED])
Re: RFC 178 (v2) Lightweight Threads
-----Original Message-----
From: Nick Ing-Simmons [EMAIL PROTECTED]
To: [EMAIL PROTECTED] [EMAIL PROTECTED]
Cc: Jarkko Hietaniemi [EMAIL PROTECTED]; Dan Sugalski [EMAIL PROTECTED];
Perl6-Internals [EMAIL PROTECTED]; Nick Ing-Simmons [EMAIL PROTECTED]
Date: Thursday, September 07, 2000 9:03 AM
Subject: Re: RFC 178 (v2) Lightweight Threads

Alan Burlison [EMAIL PROTECTED] writes:

Jarkko Hietaniemi wrote: Multithreaded programming is hard and for a
given program the only person truly knowing how to keep the data
consistent and threads not strangling each other is the programmer. Perl
shouldn't try to be too helpful and get in the way. Just give user the
bare minimum, the basic synchronization primitives, and plenty of
advice.

Amen. I've been watching the various thread discussions with increasing
despair.

I am glad it isn't just me! And thanks for re-stating the
interpreter-per-thread model.

Most of the proposals have been so uninformed as to be laughable.

-- Nick Ing-Simmons [EMAIL PROTECTED]
Via, but not speaking for: Texas Instruments Ltd.

Ok, I'm not super familiar with threads, so bear with me, and smack me
upside the head when need be. But if we want threads written in Perl6 to
be able to take advantage of multiple processors, won't we inherently
have to make perl6 itself multithreaded (and thus have multiple
instances of the interpreter)?

Glenn King
Re: RFC 178 (v2) Lightweight Threads
(We are not (quite) discussing what to do for Perl6 any longer. I'm going through a learning phase here. I.e. where are my thoughts miswired.) "AB" == Alan Burlison [EMAIL PROTECTED] writes: Actually, I wasn't. I was considering the locking/deadlock handling part of database engines. (Map row - variable.) AB Locking, transactions and deadlock detection are all related, but aren't AB the same thing. Relational databases and procedural programming AB languages aren't the same thing. Beware of misleading comparisons. You are conflating what I'm saying. Doing locking and deadlock detection is the mapping. Transactions/rollback is what I was suggesting perl could use to accomplish under-the-covers recovery. How on earth does a compiler recognize checkpoints (or whatever they are called) in an expression? AB If you are talking about SQL it doesn't. You have to explicitly say AB where you want a transaction completed (COMMIT) or aborted (ROLLBACK). AB Rollback goes back to the point of the last COMMIT. Sorry, I meant 'C' and Nick pointed out the correct term was sequence point. I'm probably way off base, but this was what I had in mind. (I. == Internal)

    I.Object     - A non-tied scalar or aggregate object
    I.Expression - An expression (no function calls) involving only I.Objects
    I.Operation  - (non-io operators) operating on I.Expressions
    I.Function   - A function that is made up of only I.Operations/I.Expressions
    I.Statement  - A statement made up of only I.Functions, I.Operations and I.Expressions

AB And if the aggregate contains a tied scalar - what then? The only way AB of knowing this would be to check every item of an aggregate before AB starting. I think not. What tied scalar? All you can contain in an aggregate is a reference to a tied scalar. The bucket in the aggregate is a regular bucket. No? Because if we can recover, we can take locks in arbitrary order and simply retry on deadlock. A variable could put its prior value into an undo log for use in recovery. AB Nope.
AB Which one of the competing transactions wins? Do you want a AB nondeterministic outcome? It is already non-deterministic. Even if you lock up the gazoo, depending upon how the threads get there the value can be anything.

    Thread A          Thread B
    lock($a);         lock($a);
    $a = 2;           $a = 5;
    unlock($a);       unlock($a);

Is the value 5 or 2? It doesn't matter. All that a sequence of locking has to accomplish is to make them look as if one or the other completed in sequence. (I've got a reference here somewhere to this definition of consistency.) The approach that I was suggesting is somewhat akin to what (as I understand it) a versioning approach to transactions would take. AB Deadlocks are the bane of any DBA's life. Not any of the DBAs that I'm familiar with. They just let the application programmers duke it out. AB If you get a deadlock it means your application is broken - it is AB trying to do two things which are mutually inconsistent at the AB same time. Sorry, that doesn't mean anything. There may be more than one application in a database. And they may have very logical things that they need done in a different order. The deadlock could quite well be the effect of the database engine. (I know sybase does this, or at least did it a few revisions ago; it took the locks it needed on an index a bit late.) A deadlock is not a sin or something wrong. Avoiding it is a useful (extremely useful) optimization. Working with it might be another approach. I think of it like I think of ethernet's back off and retry. AB If you feel that automatically resolving this class of problem is AB an appropriate thing for perl to do. Because I did it already in a simple situation. I wrote a layer that handled database interactions. Given a set of database operations, I saved a queue of all operations. If a deadlock occurred I retried it until successful _unless_ I had already returned some data to the client. Once some data was returned I cleaned out the queue. The recovery was invisible to the client.
Since no data ever left my service layer, no external effects/changes could have been made. Similarly, all of the locking and deadlocks here could be internal to perl, and never visible to the user; so even if taking out a series of locks does deadlock, perl can recover. Again, this is probably too expensive and complex, but it isn't something that is completely infeasible. chaim -- Chaim Frenkel, Nonlinear Knowledge, Inc. [EMAIL PROTECTED] +1-718-236-0183
Re: RFC 178 (v2) Lightweight Threads
"AB" == Alan Burlison [EMAIL PROTECTED] writes: my $a :shared; $a += $b; AB If you read my suggestion carefully, you would see that I explicitly AB covered this case and said that the internal consistency of $a would AB always be maintained (it would have to be otherwise the interpreter AB would explode), so two threads both adding to a shared $a would result AB in $a being updated appropriately - it is just that you wouldn't know AB the order in which the two additions were made. You aren't being clear here. fetch($a) fetch($a) fetch($b) ... add ... store($a) store($a) Now all of the perl internals are done 'safely' but the result is garbage. You don't even know the result of the addition. Without some of this minimal consistency, Every shared variable even those without cross variable consistancy, will need locks sprinkled around. AB I think you are getting confused between the locking needed within the AB interpreter to ensure that it's internal state is always consistent and AB sane, and the explicit application-level locking that will have to be in AB multithreaded perl programs to make them function correctly. AB Interpreter consistency and application correctness are *not* the same AB thing. I just said the same thing to someone else. I've been assuming that perl would make sure it doesn't dump core. I've been arguing for having perl do a minimal guarentee at the user level. my %h :shared; $h{$xyz} = $somevalue; my @queue :shared; push(@queue, $b); AB Again, all of these would have to be OK in an interpreter that ensured AB internal consistency. The trouble is if you want to update both $a, %h AB and @queue in an atomic fashion - then the application programmer MUST AB state his intent to the interpreter by providing explicit locking around AB the 3 updates. Sorry, internal consistancy isn't enough. Doing that store of a value in $h, ior pushing something onto @queue is going to be a complex operation. 
If you are going to keep a lock on %h while the entire expression/statement completes, then you have essentially given me an atomic operation, which is what I would like. I think we all would agree that an op is atomic. +, op=, push, delete, exists, etc. Yes? Then let's go on from there. chaim -- Chaim Frenkel, Nonlinear Knowledge, Inc. [EMAIL PROTECTED] +1-718-236-0183
Re: RFC 178 (v2) Lightweight Threads
"NI" == Nick Ing-Simmons [EMAIL PROTECTED] writes: NI The snag with attempting to automate such things is illustrated by : NI thread Athread B NI $a = $a + $b++; $b = $b + $a++; NI So we need to 'lock' both $a and $b both sides. NI So thread A will attempt to acquire locks on $a,$b (say) NI and (in this case by symetry but perhaps just by bad luck) thread B will NI go for locks on $b,$a - opposite order. They then both get 1st lock NI they wanted and stall waiting for the 2nd. We are in then in NI a "classic" deadly embrace. NI So the 'dragons' that Dan alludes to are those of intuiting the locks NI and the sequence of the locks to acquire, deadlock detection and backoff, ... Agreed. But for a single 'statement', it may be possible to gather all the objects needing a lock and then grabbing them in order (say by address). Also the thread doesn't need to make any changes until all the locks are available so a backoff algorithm may work. This would keep a _single_ statment 'consistent'. But wouldn't do anything for anything more complex. chaim -- Chaim FrenkelNonlinear Knowledge, Inc. [EMAIL PROTECTED] +1-718-236-0183
Re: RFC 178 (v2) Lightweight Threads
"DS" == Dan Sugalski [EMAIL PROTECTED] writes: DS Well, there'll be safe access to individual variables when perl needs to DS access them, but that's about it. DS Some things we can guarantee to be atomic. The auto increment/decrement DS operators can be reasonably guaranteed atomic, for example. But I don't DS think we should go further than "instantaneous access to shared data will DS see consistent internal data structures". This is going to be tricky. A list of atomic guarentees by perl will be needed. $a[++$b]; pop(@a); push(@a, @b); Will these? And given that users will be doing the locking. What do you see for handling deadlock detection and recovery/retry. chaim -- Chaim FrenkelNonlinear Knowledge, Inc. [EMAIL PROTECTED] +1-718-236-0183
Re: RFC 178 (v2) Lightweight Threads
But for a single 'statement', it may be possible to gather all the objects needing a lock and then grabbing them in order (say by address). I still don't buy that. In Perl even simple assignments and increments are not atomic, which means that even 'single statements' would require locking and unlocking of a pile of data structures, leaving plenty of room for both inconsistencies and deadlocks. This would keep a _single_ statement 'consistent'. But wouldn't do anything for anything more complex. Why bother, then? Multithreaded programming is hard and for a given program the only person truly knowing how to keep the data consistent and threads not strangling each other is the programmer. Perl shouldn't try to be too helpful and get in the way. Just give user the bare minimum, the basic synchronization primitives, and plenty of advice. -- $jhi++; # http://www.iki.fi/jhi/ # There is this special biologist word we use for 'stable'. # It is 'dead'. -- Jack Cohen
Re: RFC 178 (v2) Lightweight Threads
"JH" == Jarkko Hietaniemi [EMAIL PROTECTED] writes: JH Multithreaded programming is hard and for a given program the only JH person truly knowing how to keep the data consistent and threads not JH strangling each other is the programmer. Perl shouldn't try to be too JH helpful and get in the way. Just give user the bare minimum, the JH basic synchronization primitives, and plenty of advice. my views exactly. most perl programs will not be multithreaded. we can support them with locks, a thread per interpreter paradigm and support other stuff like event and signal delivery and such. but no implied locks. the coder has to do some work and take responsibility. uri -- Uri Guttman - [EMAIL PROTECTED] -- http://www.sysarch.com SYStems ARCHitecture, Software Engineering, Perl, Internet, UNIX Consulting The Perl Books Page --- http://www.sysarch.com/cgi-bin/perl_books The Best Search Engine on the Net -- http://www.northernlight.com
Re: RFC 178 (v2) Lightweight Threads
DS Some things we can guarantee to be atomic. This is going to be tricky. A list of atomic guarantees by perl will be needed. From RFC 178: ...we have to decide which operations are [atomic]. As a starting point, we can take all the operators documented in C<perlop.pod> and all the functions documented in C<perlfunc.pod> as [atomic]. - SWM
Re: RFC 178 (v2) Lightweight Threads
what if i do $i++ and overflow into the float (or bigint) domain? that is enough work that you would need to have a lock around the ++. so then all ++ would have implied locks and their baggage. i say no atomic ops in perl. From RFC 178: [Atomic] operations typically lock their operands to avoid race conditions

    Perl source     C Implementation
    $a = $b         lock(a.mutex);
                    lock(b.mutex);
                    free(a.pData);
                    a.length = b.length;
                    a.pData = malloc(a.length);
                    memcpy(a.pData, b.pData, a.length);
                    unlock(a.mutex);
                    unlock(b.mutex);

leave the locking to the coder and keep perl clean. If we don't provide this level of locking internally, then

    async { $a = $b }

is liable to crash the interpreter. - SWM
Re: RFC 178 (v2) Lightweight Threads
"UG" == Uri Guttman [EMAIL PROTECTED] writes: UG i don't see how you can do atomic ops easily. assuming interpreter UG threads as the model, an interpreter could run in the middle of another UG and corrupt it. most perl ops do too much work for any easy way to make UG them atomic without explicit locks/mutexes. leave the locking to the UG coder and keep perl clean. in fact the whole concept of transactions in UG perl makes me queasy. leave that to the RDBMS and their ilk. If this is true, then give up on threads. Perl will have to do atomic operations, if for no other reason than to keep from core dumping and maintaining sane states. If going from an int to a bigint is not atomic, woe to anyone using threads. If it is atomic, then the ++ has to be atomic, since the actual operation isn't complete until it finishes. Think ++$a(before int, after ++ value is bigint) Some series of points (I can't remember what they are called in C) where operations are consider to have completed will have to be defined, between these points operations will have to be atomic. chaim -- Chaim FrenkelNonlinear Knowledge, Inc. [EMAIL PROTECTED] +1-718-236-0183
Re: RFC 178 (v2) Lightweight Threads
At 10:57 PM 9/4/00 -0400, Chaim Frenkel wrote: "SWM" == Steven W McDougall [EMAIL PROTECTED] writes: PRL All threads share the same global variables _All_ or only as requested by the user (ala :shared)? SWM All. Dan has gone through this with perl5 and he really would rather not have to go through that. He would like the amount of data that needs protection reduced. I don't mind if package variables are all shared by default, or if there's a simple way to share them all at thread creation time. We do tell people to not use package variables, so if access to them is slow, well, no biggie. You are also creating problems when I don't want mediation. What if I know better than perl and I want to use a single item to protect a critical section? I'd definitely rather perl not do any sort of explicit user-level locking. That's not our job, and there be dragons. SWM Data coherence just means that the interpreter won't crash or corrupt SWM its internal data representation. RFC178 uses the term *data SWM synchronization* for coordinating access to multiple variables between SWM threads. Then this RFC seems to be confusing two things. This is for -internals; we don't even have any internal structures, so how can you be protecting them? If you are working at the language level this is the wrong forum. Perl will guarantee coherence for any internal data structure that's shared between threads. No core dumps because lock()'s not thread-safe here... Perhaps, I'm archaic, but I really wouldn't mind if the thread model basically copied the fork() model and required those variables that have to live across threads to be marked as :shared. SWM Sigh...if that's the best I can get, I'll take it. I'm not the decisor here, I'm just pointing out another way to look at the problem. I really don't think you want to have _all_ variables actually visible. Even if they were, you will most likely have only a limited number that you want visible.
Reducing the number of visible items is good, because it means perl doesn't have to keep its internal locks on thread-specific data elements. Perl can, to some extent, intuit which items aren't visible and skip making them shared, but that requires what's likely to be rather expensive flow analysis. (Which we might do anyway, sort of, but I wouldn't count on it.) Taking out mutexes isn't free. Cheap, yes, but not free, and those 20ns mutex acquisitions and releases do add up after a while. Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: RFC 178 (v2) Lightweight Threads
What I'm trying to do in RFC178 is take the thread model that you get in compiled languages like C and C++, and combine it with the Perl5 programming model in a way that makes sense. There may be reasons not to follow RFC178 in Perl6. Maybe

- it's too hard to implement
- there are performance problems
- Perl6 can actually do more for the user
- it just doesn't make sense for Perl6

But RFC178 is the thread model that I'd like to program in, and I'm spec'ing it in the hopes that I'll actually get it in Perl6. PRL All threads see the same compiled subroutines Why? Why not allow two different threads to have a different view of the universe? 1. That's how it works in compiled languages. You have one .exe, and all threads run it. 2. Thread programming is difficult to begin with. A language where different threads see different code could be *very* difficult to program in. PRL All threads share the same global variables _All_ or only as requested by the user (ala :shared)? All. PRL Each thread gets its own copy of block-scoped lexicals upon execution PRL of C<my> Why? Perhaps I want a shared my? Different invocations of a subroutine within the same thread get their own lexicals. It seems a natural extension to say that different invocations of a subroutine in different threads also get their own lexicals. PRL Threads can share block-scoped lexicals by passing a reference to a PRL lexical into a thread, by declaring one subroutine within the scope of PRL another, or with closures. Sounds complex to me. Why not make it simply visible by marking it as such? These are the ways in which one subroutine can get access to the lexical variables of another in Perl5. RFC178 specifies that these mechanisms work across threads. PRL The interpreter guarantees data coherence It can't, don't even try. What if I need two or more variables kept in sync. The user has to mediate. Perl can't determine this.
Data coherence just means that the interpreter won't crash or corrupt its internal data representation. RFC178 uses the term *data synchronization* for coordinating access to multiple variables between threads. Perhaps, I'm archaic, but I really wouldn't mind if the thread model basically copied the fork() model and required those variable that have to live across threads to be marked as :shared. Sigh...if that's the best I can get, I'll take it. - SWM
Re: RFC 178 (v2) Lightweight Threads
"SWM" == Steven W McDougall [EMAIL PROTECTED] writes: PRL All threads see the same compiled subroutines Why? Why not allow two different threads to have a different view of the universe? SWM 1. That's how it works in compiled languages. You have one .exe, and SWM all threads run it. Perl is not C. One of its strengths is its introspection. SWM 2. Thread programming is difficult to begin with. A language where SWM different threads see different code could be *very* difficult to SWM program in. I'm thinking of threads as fork on steroids. Fork doesn't let you easily share things. What we really should get is the isolation of fork, but with the ease of sharing what is necessary. And I don't know about you, but I don't see what is morally wrong with having one thread using foo and getting 7 back and another using foo and getting an -42. PRL All threads share the same global variables _All_ or only as requested by the user (ala :shared)? SWM All. Dan has gone through this with perl5 and he really would rather not have to go through that. He would like the amount of data that needs protection reduced. You are also creating problems when I don't want mediation. What if I know better than perl and I want to us a single item to protect a critical section? PRL Threads can share block-scoped lexicals by passing a reference to a PRL lexical into a thread, by declaring one subroutine within the scope of PRL another, or with closures. Sounds complex to me. Why not make it simply visible by marking it as such? SWM These are the ways in which one subroutine can get access to the SWM lexical variables of another in Perl5. RFC178 specifies that these SWM mechanisms work across threads. References are a completely different animal than access. A data item is independent of a thread. It is a chunk of memory. If a thread can see it, then it is available. PRL The interpreter guarantees data coherence It can't, don't even try. What if I need two or more variables kept in sync. 
The user has to mediate. Perl can't determine this. SWM Data coherence just means that the interpreter won't crash or corrupt SWM its internal data representation. RFC178 uses the term *data SWM synchronization* for coordinating access to multiple variables between SWM threads. Then this RFC seems to be confusing two things. This is for -internals; we don't even have any internal structures, so how can you be protecting them? If you are working at the language level this is the wrong forum. Perhaps, I'm archaic, but I really wouldn't mind if the thread model basically copied the fork() model and required those variables that have to live across threads to be marked as :shared. SWM Sigh...if that's the best I can get, I'll take it. I'm not the decisor here, I'm just pointing out another way to look at the problem. I really don't think you want to have _all_ variables actually visible. Even if they were, you will most likely have only a limited number that you want visible. chaim -- Chaim Frenkel, Nonlinear Knowledge, Inc. [EMAIL PROTECTED] +1-718-236-0183