Re: Threads... last call
On Wednesday, January 28, 2004, at 12:53 , Melvin Smith wrote: At 12:27 PM 1/23/2004 -0800, Damien Neil wrote: Java Collections are a standard Java library of common data structures such as arrays and hashes. Collections are not synchronized; access involves no locks at all. Multiple threads accessing the same collection at the same time cannot, however, result in the virtual machine crashing. (They can result in data structure corruption, but this corruption is limited to surprising results rather than VM crash.) But this accomplishes nothing useful and still means the data structure is not re-entrant, nor is it corruption resistant, regardless of how we judge it.

It does accomplish something very useful indeed: It avoids the overhead of automatic locking when it isn't necessary. When *is* that locking necessary? To a second order approximation, ***NEVER.***

Never? Yes. From the user's perspective, synchronized objects more often than not provide no value! For instance, this Java code, run from competing threads, does not perform its intended purpose:

    // Vector was Java 1's dynamically sizable array class.
    if (!vector.contains(obj))
        vector.add(obj);

It does not prevent obj from appearing more than once in the collection. It's equivalent to this:

    boolean temp;
    synchronized (vector) {
        temp = vector.contains(obj);
    }
    // Preemption is possible between here...
    if (!temp) {
        synchronized (vector) {
            // ... and here.
            vector.add(obj);
        }
    }

The correct code is this:

    synchronized (vector) {
        if (!vector.contains(obj))
            vector.add(obj);
    }

If we again make explicit Vector's synchronized methods, a startling redundancy becomes apparent:

    synchronized (vector) {
        boolean temp;
        synchronized (vector) {
            temp = vector.contains(obj);
        }
        if (!temp) {
            synchronized (vector) {
                vector.add(obj);
            }
        }
    }

This code performs 3 times as many locks as necessary! More realistic code will perform many times more than 3 times the necessary locking.
Imagine the waste in sorting a 10,000 element array. Beyond that stunning example of wasted effort, most structures aren't shared in the first place, and so the overhead is *completely* unnecessary for them. Still further, many shared objects are unmodified once they become shared, and so again require no locking.

The only time automatically synchronized objects are useful is when the user's semantics exactly match the object's methods. (e.g., an automatically synchronized queue might be quite useful.) In that case, it's a trivial matter for the user to wrap the operation in a synchronized block, or subclass the object with a method that does so. But it is quite impossible to remove the overhead from the other 99% of uses, since the VM cannot discern the user's locking strategy.

If the user is writing a threaded program, then data integrity is necessarily his problem; a virtual machine cannot solve it for him, and automatic synchronization provides little help. Protecting a policy-neutral object (such as an array) from its user is pointless; all we could do is to ensure that the object is in a state consistent with an incorrect sequence of operations. That's useless to the user, because his program still behaves incorrectly; the object is, to his eyes, corrupted since it no longer maintains his invariant conditions.

Thus, since the VM cannot guarantee that the user's program behaves as intended (only as written), and all that locking is wasted 4 times over, the virtual machine would be wise to limit its role to the prevention of crashes and to limiting the corruption that can result from incorrect (unsynchronized) access to objects. If automatic locking is the only way to do that for a particular data structure, then so be it. Oftentimes, though, thoughtful design can ensure that even unsynchronized accesses cannot crash or corrupt the VM as a whole--even if those operations might corrupt one object's state, or provide less than useful results.
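[Editorial note: the check-then-act pattern above can be sketched as a small runnable program. The helper name addIfAbsent and the test harness are illustrative additions, not from the original mail; the technique itself is the one the message describes: even a synchronized collection needs one external lock spanning the whole compound operation.]

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class AddIfAbsent {
    // One lock held across *both* the check and the add, as in the
    // "correct code" above. Holding it only per-method would leave a
    // preemption window between contains() and add().
    static void addIfAbsent(List<Integer> list, Integer obj) {
        synchronized (list) {
            if (!list.contains(obj)) {
                list.add(obj);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<Integer> shared = Collections.synchronizedList(new ArrayList<>());
        Thread[] threads = new Thread[8];
        for (int t = 0; t < threads.length; t++) {
            threads[t] = new Thread(() -> {
                // Many threads racing to insert the same 100 values.
                for (int i = 0; i < 100; i++) {
                    addIfAbsent(shared, i);
                }
            });
            threads[t].start();
        }
        for (Thread t : threads) t.join();
        // The external lock guarantees each value appears exactly once.
        System.out.println(shared.size());
    }
}
```

Without the external synchronized block, the size would nondeterministically exceed 100 under contention, exactly the failure the message describes.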
As an example of how this applies to parrot, all of the FixedMumbleArray classes[*] recently discussed could clearly be implemented completely and safely without automatic locks on all modern platforms, if only parrot could allow lock-free access to at least some PMCs.

([*] Except FixedMixedArray. That stores UVals [plus more for a discriminator field], and couldn't be quite safe, as a UVal isn't an atomic write on many platforms. Then again, nor is a double.)

Remember, Java already made the mistake of automatic synchronization with their original collection classes.
Re: Threads... last call
On Wed, Jan 28, 2004 at 12:53:09PM -0500, Melvin Smith wrote: At 12:27 PM 1/23/2004 -0800, Damien Neil wrote: Java Collections are a standard Java library of common data structures such as arrays and hashes. Collections are not synchronized; access involves no locks at all. Multiple threads accessing the same collection at the same time cannot, however, result in the virtual machine crashing. (They can result in data structure corruption, but this corruption is limited to surprising results rather than VM crash.) But this accomplishes nothing useful and still means the data structure is not re-entrant, nor is it corruption resistant, regardless of how we judge it.

Quite the contrary--it is most useful. Parrot must, we all agree, under no circumstances crash due to unsynchronized data access. For it to do so would be, among other things, a gross security hole when running untrusted code in a restricted environment. There is no need for any further guarantee about unsynchronized data access, however. If unsynchronized threads invariably cause an exception, that's fine. If they cause the threads involved to halt, that's fine too. If they cause what was once an integer variable to turn into a string containing the definition of mulching...well, that too falls under the heading of undefined results. Parrot cannot and should not attempt to correct for bugs in user code, beyond limiting the extent of the damage to the threads and data structures involved.

Java, when released, took the path that Parrot appears to be about to take--access to complex data structures (such as Vector) was always synchronized. This turned out to be a mistake--sufficiently so that Java programmers would often implement their own custom, unsynchronized replacements for the core classes. As a result, when the Collections library (which replaces those original data structures) was released, the classes in it were left unsynchronized.
In Java's case, the problem was at the library level, not the VM level; as such, it was relatively easy to fix at a later date. Parrot's VM-level data structure locking will be less easy to change. - Damien
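[Editorial note: the history Damien recounts is still visible in the Java library itself, and can be illustrated with a short sketch (the example is an illustration, not from the original mail): Vector locks on every call whether or not the object is shared, while the Collections-era ArrayList takes no locks at all and lets the caller opt in with a wrapper only when sharing actually happens.]

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Vector;

public class OptInLocking {
    public static void main(String[] args) {
        // Original Java: every Vector method is synchronized, so even
        // single-threaded code pays for a lock on each call.
        List<String> always = new Vector<>();
        always.add("pays for a lock even when unshared");

        // Collections era: ArrayList takes no locks at all...
        List<String> plain = new ArrayList<>();
        plain.add("no lock");

        // ...and the caller opts in only when the structure is shared,
        // by going through a synchronized wrapper view.
        List<String> sharedView = Collections.synchronizedList(plain);
        sharedView.add("locked only through this wrapper");

        // The wrapper writes through to the same backing list.
        System.out.println(always.size() + " " + plain.size());
    }
}
```

This is the opt-in model Damien argues Parrot should follow at the VM level: the unsynchronized case costs nothing, and synchronization is layered on only where the user's sharing actually requires it.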
Re: IMCC - PerlArray getting trounced
Sorry about the delay in responding. My current sample program is 2760 lines of imcc in 23 files, plus a small .tcl script. I'll see if I can trim that down to a more reasonable test case. On Monday, January 26, 2004, at 05:32 AM, Leopold Toetsch wrote: Will Coleda [EMAIL PROTECTED] wrote: I'm trying to track down a problem with a PerlArray that is getting modified on me. I have a snippet of code like: Could you please provide a complete program that exposes the bug? leo -- Will Coke Coleda will at coleda dot com
Re: t/src/extend.t hanging, and a (probably not very good) patch
Hi, I'll look into what's involved with upgrading. I'm not sure if I'll elect to do it or not. It's probably the one major upgrade to a linux system that I've never done so I'd like to do it as a learning experience. At the same time, I've heard it can be painful. And downloading anything with the Internet connections I have in this country (Malaysia) *is* painful :-p Jeff On Sat, Jan 24, 2004 at 11:09:11PM -0800, chromatic wrote: On Sat, 2004-01-24 at 19:51, Jeffrey Dik wrote: glibc-2.3.1-51a according to rpm -q Interesting. I'm using 2.3.2. Do you have the opportunity to try out NPTL with a newer glibc and kernel? It may involve some pain, so that's fine if you'd rather not. (I'm in the same boat.) I'm just about ready to bring this up on the Linux PPC user list in the meantime. -- c
[DOCS] Updated documentation in src
I've added inline docs to everything in src (except for malloc.c and malloc-trace.c). At times I wondered whether this was the right thing to do. For example, in mmd.c, where Dan had already created a mmd.pod, I ended up duplicating information. At other times I reckoned that what was needed was an autodoc. Other times the best I could do was rephrase the function name. All issues to address in phase 2. Next I think, for a bit of light relief, I'll do the examples. For those who want to browse: http://homepage.mac.com/michael_scott/Parrot/docs/html/ Mike
Messages delayed, out of order
What's going on with the ordering of messages? My message of : Tue, 27 Jan 2004 19:53:15 -0500 just made it to the list, a day after my (also delayed) /followup/ to that message. And Leo, who responded to the most recent, had his email make it to the list before either of these. =-) Regards.
Re: Messages delayed, out of order
[EMAIL PROTECTED] (Will Coleda) writes: What's going on with the ordering of messages? My message of : Tue, 27 Jan 2004 19:53:15 -0500 just made it to the list, a day after my (also delayed) /followup/ to that message. And Leo, who responded to the most recent, had his email make it to the list before either of these. =-) I think the mail servers for cpan.org/perl.org are having to shift rather a lot of mail at the moment, for some reason. -- Pray to God, but keep rowing to shore. -- Russian Proverb
Re: Some namespace notes
On Jan 28, 2004, at 6:42 AM, Peter Haworth wrote: On Thu, 15 Jan 2004 23:00:53 -0800, Jeff Clites wrote: I think we shouldn't try to do any sort of cross-language unification. That is, if we some day have a Parrot version of Java, and in Perl6 code I want to reference a global created inside of some Java class I've loaded in, it would be clearer to just reference this as java.lang.String.CASE_INSENSITIVE_ORDER, even inside of Perl6 code--rather than having to do something like java::lang::String::CASE_INSENSITIVE_ORDER. Parrot itself would be completely ignorant of any concept of a separator character--these would just be uninterpreted strings, and foo::bar and foo.bar would be separate namespaces, whatever the language.

What about languages which have the same separator, such as '::' (perl5, perl6, ruby) or '.' (java, python)? They are going to be unified either way.

Not necessarily. Each language has a choice, at compile-time, of what separator to pass to the parrot API. For instance, even though I type Foo::Bar in my perl5 code, it's possible that the perl5 compiler passes Foo+++Bar on to parrot. So each language, via its compiler, has a choice of what it wants to do: unify, explicitly avoid unification, or not care. My point is that parrot shouldn't force the unification, but rather let each language choose its policy.

I think it's confusing to try to unify namespaces across languages, and doesn't buy us anything.

Without namespace unification, how else are you going to even specify File::Spec from java:

    import java.lang.reflect.*;
    Class.forName("File::Spec");

or java.lang.String from perl5?

    bless $ref, "java.lang.String";

We can obviously invent suitable syntax for perl6, so that it can cope with arbitrarily named packages, but we don't have that luxury with most of the other languages we want to support.
I believe that most languages will have a similar reflection API; if not, we can add such an API--languages which didn't previously have all of the features which parrot provides either don't get all of those features, or will need to take advantage of them via a new API. Then the question becomes, What about namespace clashes?, which Tim has already addressed. We could certainly do some sort of language-specific prefixing, as Tim suggested, but it seems that we are then going to the trouble to unify, only to immediately de-unify. Certainly, a random Java programmer shouldn't have to worry about naming a class so that it doesn't conflict with any class in any other language in the world--that's silly, especially since this Java programmer may not even know about parrot. (But it matters, if later someone wants to run his code on top of parrot.) If we use per-language prefixing, then I'm certainly going to have to be aware of what language is hosting a given class, and so it seems natural to instead just use that class name as I would expect it to be written--java.lang.String for Java, for example. And again, one of my basic points here is that although namespace nesting could be useful for some things (maybe), it isn't what's going on in most languages anyway. For instance, Java does not have nested namespaces--it just happens to have a convention for namespace naming which involves dots, and also uses a dot as the separator between a namespace name and the variable or class name. (Though I suppose there are really 2 separate, but related, issues--how namespace names are represented internally, and whether these names imply namespace nesting.) Jeff
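[Editorial note: the reflection API Jeff has in mind can be sketched in Java, where a class is looked up at runtime by its namespace-qualified *string* name. The point relevant to the thread is that the separator ('.' here, '::' elsewhere) only matters when the string is built; the lookup machinery can treat the name opaquely, which is exactly the policy Jeff proposes for parrot. The example itself is illustrative, not from the mail.]

```java
public class ReflectDemo {
    public static void main(String[] args) throws Exception {
        // Runtime lookup of a class by its fully qualified string name.
        // Whatever separator convention produced this string, the lookup
        // itself just matches an uninterpreted key.
        Class<?> c = Class.forName("java.lang.String");
        System.out.println(c.getName());
    }
}
```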
Re: Threads... last call
At 11:45 PM 1/28/2004 -0500, Gordon Henriksen wrote: On Wednesday, January 28, 2004, at 12:53 , Melvin Smith wrote: At 12:27 PM 1/23/2004 -0800, Damien Neil wrote: Java Collections are a standard Java library of common data structures such as arrays and hashes. Collections are not synchronized; access involves no locks at all. Multiple threads accessing the same collection at the same time cannot, however, result in the virtual machine crashing. (They can result in data structure corruption, but this corruption is limited to surprising results rather than VM crash.) But this accomplishes nothing useful and still means the data structure is not re-entrant, nor is it corruption resistant, regardless of how we judge it. It does accomplish something very useful indeed: It avoids the overhead of automatic locking when it isn't necessary. When *is* that locking necessary? To a second order approximation, ***NEVER.*** Pardon me but I've apparently lost track of context here. I thought we were discussing correct behavior of a shared data structure, not general cases. Or maybe this is the general case and I should go read more backlog? :) -Melvin
Re: Messages delayed, out of order
Simon Cozens wrote: I think the mail servers for cpan.org/perl.org are having to shift rather a lot of mail at the moment, for some reason. For those that are unaware, there is currently a rather serious virus in the wild, and a lot of mailing lists are being affected. I think it's called the MyDoom virus or some such terribly script-kiddie-like name. So, if you receive an email with subject "hello" and a zipped attachment, then delete it. As far as I am aware it only affects the smell checker of those on loonix H
Re: IO subsystem stuff
On Jan 27, 2004, at 3:47 PM, Cory Spencer wrote: Perhaps someone with a bit more familiarity with the Parrot IO subsystem could give me some guidance here. I'm currently trying to get a new 'peek' opcode working, and I'm having difficulties getting the io_unix layer implemented correctly. As far as I know, I'd get a call down into the io_unix layer when the ParrotIO object isn't buffered. What I want to be able to do is to read()/fread() a character off of the io->fd file descriptor, copy it into the buffer, then ungetc() it back onto the stream.

You can't push a character back onto a Unix file descriptor. In order to emulate this for parrot, you'll need some storage hanging off of the ParrotIO structure to store the pushed back characters, and then munge the read methods to pull data from here before reading from the real descriptor, if there has been anything pushed back. This is, essentially, what the C std. lib. buffered IO API does--the core Unix IO API doesn't provide this functionality. For parrot, I think that we should only do this for the io_buf layer (and maybe the io_stdio layer), which is the buffered IO layer, and already has a buffer which can be used for this purpose. I don't think it's appropriate for the io_unix layer--I see that as a direct wrapper around the Unix API.

Unfortunately, however, ungetc requires a (FILE *), while the ParrotIO object carries around only a raw file descriptor (I think).

Yes, the C std. lib. IO API is a wrapper on top of the core OS IO routines (for Unix or Windows), and we're using the core IO routines to implement our IO functionality, rather than going through an extra layer. (And io_stdio is based on the C std. lib., and I believe is provided so that it can be used on systems for which none of the other base layers is available--non-Unix and non-Windows.)

I've seen some instances where people will cast the raw descriptor to a (FILE *)

I can't imagine where that would ever work.
A (FILE *) is a pointer to a struct which stores various bits of data, including the actual file descriptor. A file descriptor is just an integer, and isn't going to be interpretable as a pointer to such a struct--using the core Unix IO API, no such struct will have been created anywhere in memory. So you can't get a FILE* via any sort of casting, at least not on Unix platforms.

however the man page for ungetc warns ominously in its BUGS section that: It is not advisable to mix calls to input functions from the stdio library with low-level calls to read() for the file descriptor associated with the input stream; the results will be undefined and very probably not what you want.

This is warning about something else. It's saying don't use the C API to do IO on a FILE*, and also use the Unix IO API on the descriptor which is the fileno() of that FILE*. But even this you wouldn't do by casting--you'd either get the descriptor from the FILE* using fileno(), or use fdopen() to create a FILE* from a descriptor. But in any event, we don't want to use the C std. lib. IO API inside of the io_unix layer.

That being said, what is the best course for buffering such characters at the io_unix layer? I apparently am not able to use the standard library functions to do so (additionally, they only guarantee that you can peek and replace a single character).

As I said above, I think we'd only want to do this for the io_buf layer, though others may disagree. If we do want to do it at the io_unix layer, then we can just copy down a bunch of code from io_buf, because we will be making the io_unix layer a buffered layer (with the difference being that the buffer would only be populated in the case of pushing back read items, and not during reads). Jeff
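[Editorial note: the pushback-buffer design Jeff describes--a small buffer in front of the raw stream, consulted before the real read--is exactly what the standard Java library packages as java.io.PushbackInputStream. A sketch, with a hypothetical peek helper built on it, to show how a ParrotIO peek could work over an unbuffered layer:]

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;

public class PeekDemo {
    // peek = read one byte, then unread it, so the next read() sees it
    // again. The pushback buffer lives in front of the raw stream, just
    // as Jeff suggests hanging storage off the ParrotIO structure.
    static int peek(PushbackInputStream in) throws IOException {
        int c = in.read();
        if (c != -1) {
            in.unread(c);   // push the byte back into the buffer
        }
        return c;
    }

    public static void main(String[] args) throws IOException {
        // A pushback buffer of 1 byte is enough for a single-char peek,
        // matching ungetc()'s one-character guarantee.
        PushbackInputStream in = new PushbackInputStream(
                new ByteArrayInputStream("hi".getBytes()), 1);
        System.out.println((char) peek(in));  // peek does not consume
        System.out.println((char) in.read()); // the same byte again
        System.out.println((char) in.read()); // then the next one
    }
}
```

Note that reads consult the pushback buffer before the underlying stream, which is the "munge the read methods" step in Jeff's description.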
Re: Threads... last call
Melvin Smith [EMAIL PROTECTED] wrote: I thought we were discussing correct behavior of a shared data structure, not general cases. Or maybe this is the general case and I should go read more backlog? :)

Basically we have three kinds of locking:

- HLL user level locking [1]
- user level locking primitives [2]
- vtable pmc locking to protect internals

Locking at each stage and for each PMC will be slow and can deadlock too. Very coarse grained locking (like Python's interpreter_lock) doesn't give any advantage on MP systems--only one interpreter is running at one time.

We can't solve user data integrity at the lowest level: data logic and such isn't really visible here. But we should be able to integrate HLL locking with our internal needs, so that the former doesn't cause deadlocks[3] because we have to lock internally too, and we should be able to omit internal locking, if HLL locking code already provides this safety for a specific PMC.

The final strategy of when to lock what depends on the system the code is running on. Python's model is fine for single processors. Fine grained PMC locking gives more boost on multi-processor machines. All generalization is evil and 47.4 +- 1.1% of all statistics^Wbenchmarks are wrong ;)

leo

[1] BLOCK scoped, e.g. synchronized { ... } or { lock $x; ... } These can be rwlocks or mutex typed locks
[2] lock.acquire, lock.release
[3] not user caused deadlocks--the mix of e.g. one user lock and internal locking.
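[Editorial note: leo's levels [1] and [2] map onto familiar Java constructs--a synchronized block is [1]-style block-scoped HLL locking, while an explicit lock object with acquire/release calls is a [2]-style primitive. A sketch of the [2] style guarding a shared counter; the example itself is an illustration, not from the thread:]

```java
import java.util.concurrent.locks.ReentrantLock;

public class LockLevels {
    static int counter = 0;
    // A [2]-style user-level primitive: the program explicitly
    // acquires and releases, rather than relying on block scoping.
    static final ReentrantLock lock = new ReentrantLock();

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 1000; i++) {
                lock.lock();       // explicit acquire
                try {
                    counter++;     // critical section
                } finally {
                    lock.unlock(); // explicit release, even on exception
                }
            }
        };
        Thread a = new Thread(work), b = new Thread(work);
        a.start(); b.start();
        a.join(); b.join();
        System.out.println(counter);
    }
}
```

leo's third level--vtable PMC locking inside the VM--has no user-visible equivalent here; the thread's question is precisely when that internal level can be omitted because a [1] or [2] lock already covers the access.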
Re: Some namespace notes
Jeff Clites writes: We could certainly do some sort of language-specific prefixing, as Tim suggested, but it seems that we are then going to the trouble to unify, only to immediately de-unify. Certainly, a random Java programmer shouldn't have to worry about naming a class so that it doesn't conflict with any class in any other language in the world--that's silly, especially since this Java programmer may not even know about parrot. (But it matters, if later someone wants to run his code on top of parrot.) If we use per-language prefixing, then I'm certainly going to have to be aware of what language is hosting a given class, and so it seems natural to instead just use that class name as I would expect it to be written--java.lang.String for Java, for example. And again, one of my basic points here is that although namespace nesting could be useful for some things (maybe), it isn't what's going on in most languages anyway. For instance, Java does not have nested namespaces--it just happens to have a convention for namespace naming which involves dots, and also uses a dot as the separator between a namespace name and the variable or class name. (Though I suppose there are really 2 separate, but related, issues--how namespace names are represented internally, and whether these names imply namespace nesting.) But in fact, one of Perl 6's key features depends on hierarchical namespaces: namespace aliasing. And then if the Java programmer accidentally named a class the same as some perl module (unlikely, because you probably won't see Perl with an org::bazbar::Foo module), Perl is able to move that namespace somewhere else to make room for another conflicting module. Luke
Re: Messages delayed, out of order
On Thursday, January 29, 2004, at 01:51 PM, Simon Cozens wrote: [EMAIL PROTECTED] (Will Coleda) writes: What's going on with the ordering of messages? My message of : Tue, 27 Jan 2004 19:53:15 -0500 just made it to the list, a day after my (also delayed) /followup/ to that message. And Leo, who responded to the most recent, had his email make it to the list before either of these. =-) I think the mail servers for cpan.org/perl.org are having to shift rather a lot of mail at the moment, for some reason. Heh. I had guessed the cause of the delay, but the out of order-ness was confusing, until I thought about it for a minute. =-) (Note to self: think about all sends for a minute before pressing enter.) -- Will Coke Coledawill at coleda dot com
Re: Messages delayed, out of order
I have been getting out of order messages from this list for months... I just assumed that the internet was a mysterious thing... Matt Will Coleda wrote: On Thursday, January 29, 2004, at 01:51 PM, Simon Cozens wrote: [EMAIL PROTECTED] (Will Coleda) writes: What's going on with the ordering of messages? My message of : Tue, 27 Jan 2004 19:53:15 -0500 just made it to the list, a day after my (also delayed) /followup/ to that message. And Leo, who responded to the most recent, had his email make it to the list before either of these. =-) I think the mail servers for cpan.org/perl.org are having to shift rather a lot of mail at the moment, for some reason. Heh. I had guessed the cause of the delay, but the out of order-ness was confusing, until I thought about it for a minute. =-) (Note to self: think about all sends for a minute before pressing enter.) -- Will Coke Coledawill at coleda dot com
Re: [DOCS] Updated documentation in src
Matt Fowles [EMAIL PROTECTED] wrote: [ another TOFU [1] ] AOL leo Mike~ You rock. That is really nice. Matt Michael Scott wrote: I've added inline docs to everything in src (except for malloc.c and malloc-trace.c). At times I wondered whether this was the right thing to do. For example, in mmd.c, where Dan had already created a mmd.pod, I ended up duplicating information. At other times I reckoned that what was needed was an autodoc. Other times the best I could do was rephrase the function name. All issues to address in phase 2. Next I think, for a bit of light relief, I'll do the examples. For those who want to browse: http://homepage.mac.com/michael_scott/Parrot/docs/html/ Mike [1] German: Text oben, Fullquote unten (text on top, full quote below) - a posting style I don't like (and an ETLA that very likely doesn't deserve more translation :) - *but* there are of course exceptions ;)
Re: Messages delayed, out of order
On Thursday, January 29, 2004, at 09:12 , Matt Fowles wrote: I have been getting out of order messages from this list for months... I just assumed that the internet was a mysterious thing... Methinks the list is also manually moderated Gordon Henriksen [EMAIL PROTECTED]
Re: Threads... last call
On Thursday, January 29, 2004, at 11:55 , Melvin Smith wrote: At 11:45 PM 1/28/2004 -0500, Gordon Henriksen wrote: On Wednesday, January 28, 2004, at 12:53 , Melvin Smith wrote: At 12:27 PM 1/23/2004 -0800, Damien Neil wrote: Java Collections are a standard Java library of common data structures such as arrays and hashes. Collections are not synchronized; access involves no locks at all. Multiple threads accessing the same collection at the same time cannot, however, result in the virtual machine crashing. (They can result in data structure corruption, but this corruption is limited to surprising results rather than VM crash.) But this accomplishes nothing useful and still means the data structure is not re-entrant, nor is it corruption resistant, regardless of how we judge it. It does accomplish something very useful indeed: It avoids the overhead of automatic locking when it isn't necessary. When *is* that locking necessary? To a second order approximation, ***NEVER.*** Pardon me but I've apparently lost track of context here. I thought we were discussing correct behavior of a shared data structure, not general cases. Or maybe this is the general case and I should go read more backlog? :) A shared data structure, as per Dan's document? It's a somewhat novel approach, trying to avoid locking overhead with dynamic dispatch and vtable swizzling. I'm discussing somewhat more traditional technologies, which simply allow an object to perform equally correctly and with no differentiation between shared and unshared cases. In essence, I'm arguing that a shared case isn't necessary for some data structures in the first place. Gordon Henriksen [EMAIL PROTECTED]