Re: Threads... last call

2004-01-29 Thread Gordon Henriksen
On Wednesday, January 28, 2004, at 12:53 , Melvin Smith wrote:

At 12:27 PM 1/23/2004 -0800, Damien Neil wrote:

Java Collections are a standard Java library of common data structures 
such as arrays and hashes.  Collections are not synchronized; access 
involves no locks at all.  Multiple threads accessing the same 
collection at the same time cannot, however, result in the virtual 
machine crashing.  (They can result in data structure corruption, but 
this corruption is limited to surprising results rather than VM 
crash.)
But this accomplishes nothing useful and still means the data structure 
is not re-entrant, nor is it corruption resistant, regardless of how 
we judge it.
It does accomplish something very useful indeed: It avoids the overhead 
of automatic locking when it isn't necessary. When *is* that locking 
necessary? To a second order approximation, ***NEVER.***

Never? Yes. From the user's perspective, synchronized objects more 
often than not provide no value! For instance, this Java code, run from 
competing threads, does not perform its intended purpose:

//  Vector was Java 1's dynamically sizable array class.
if (!vector.contains(obj))
    vector.add(obj);
It does not prevent obj from appearing more than once in the collection. 
It's equivalent to this:

boolean temp;
synchronized (vector) {
    temp = vector.contains(obj);
}
//  Preemption is possible between here...
if (!temp) {
    synchronized (vector) {
        //  ... and here.
        vector.add(obj);
    }
}
The correct code is this:

synchronized (vector) {
    if (!vector.contains(obj))
        vector.add(obj);
}
If we again make Vector's synchronized methods explicit, a startling 
redundancy becomes apparent:

synchronized (vector) {
    boolean temp;
    synchronized (vector) {
        temp = vector.contains(obj);
    }
    if (!temp) {
        synchronized (vector) {
            vector.add(obj);
        }
    }
}
This code acquires three times as many locks as necessary! More 
realistic code will waste far more than that. Imagine the waste in 
sorting a 10,000 element array.

Beyond that stunning example of wasted effort, most structures aren't 
shared in the first place, and so the overhead is *completely* 
unnecessary for them.

Still further, many shared objects are unmodified once they become 
shared, and so again require no locking.

The only time automatically synchronized objects are useful is when the 
user's semantics exactly match the object's methods. (e.g., an 
automatically synchronized queue might be quite useful.) In that case, 
it's a trivial matter for the user to wrap the operation in a 
synchronized block, or subclass the object with a method that does so. 
But it is quite impossible to remove the overhead from the other 99% of 
uses, since the VM cannot discern the user's locking strategy.
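In Java, the wrap-or-subclass option looks roughly like this (a sketch; the AddIfAbsent class and its method are illustrative, not any standard API):

```java
import java.util.ArrayList;
import java.util.List;

public class AddIfAbsent {
    private final List<Object> list = new ArrayList<>(); // unsynchronized

    // The user supplies exactly the locking his semantics require: one
    // lock around the whole test-and-add, not one lock per method call.
    public synchronized boolean addIfAbsent(Object obj) {
        if (!list.contains(obj)) {
            list.add(obj);
            return true;
        }
        return false;
    }

    public synchronized int size() {
        return list.size();
    }
}
```

Because only this wrapper locks, every other user of an unshared ArrayList pays nothing.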

If the user is writing a threaded program, then data integrity is 
necessarily his problem: a virtual machine cannot solve it for him, and 
automatic synchronization provides little help. Protecting a 
policy-neutral object (such as an array) from its user is pointless; all 
we could do is to ensure that the object is in a state consistent with 
an incorrect sequence of operations. That's useless to the user, because 
his program still behaves incorrectly; the object is, to his eyes, 
corrupted since it no longer maintains his invariant conditions.

Thus, since the VM cannot guarantee that the user's program behaves 
as intended (only as written), and all that locking is wasted 4 times 
over, the virtual machine would be wise to limit its role to the 
prevention of crashes and to limiting the corruption that can result 
from incorrect (unsynchronized) access to objects. If automatic locking 
is the only way to do that for a particular data structure, then so be 
it. Oftentimes, though, thoughtful design can ensure that even 
unsynchronized accesses cannot crash or corrupt the VM as a whole--even 
if those operations might corrupt one object's state, or provide less 
than useful results.

As an example of how this applies to parrot, all of the FixedMumbleArray 
classes[*] recently discussed could clearly be implemented completely 
and safely without automatic locks on all modern platforms, if only 
parrot could allow lock-free access to at least some PMCs. ([*] Except 
FixedMixedArray. That stores UVals [plus more for a discriminator 
field], and couldn't be quite safe, as a UVal isn't an atomic write on 
many platforms. Then again, nor is a double.)

Remember, Java already made the mistake of automatic synchronization 
with their original 

Re: Threads... last call

2004-01-29 Thread Damien Neil
On Wed, Jan 28, 2004 at 12:53:09PM -0500, Melvin Smith wrote:
 At 12:27 PM 1/23/2004 -0800, Damien Neil wrote:
 Java Collections are a standard Java library of common data structures
 such as arrays and hashes.  Collections are not synchronized; access
 involves no locks at all.  Multiple threads accessing the same
 collection at the same time cannot, however, result in the virtual
 machine crashing.  (They can result in data structure corruption,
 but this corruption is limited to surprising results rather than
 VM crash.)
 
 But this accomplishes nothing useful and still means the data structure
 is not re-entrant, nor is it corruption resistant, regardless of how we 
 judge it.

Quite the contrary--it is most useful.

Parrot must, we all agree, under no circumstances crash due to
unsynchronized data access.  For it to do so would be, among other
things, a gross security hole when running untrusted code in a
restricted environment.

There is no need for any further guarantee about unsynchronized data
access, however.  If unsynchronized threads invariably cause an exception,
that's fine.  If they cause the threads involved to halt, that's fine
too.  If they cause what was once an integer variable to turn into a
string containing the definition of mulching...well, that too falls
under the heading of undefined results.  Parrot cannot and should
not attempt to correct for bugs in user code, beyond limiting the extent
of the damage to the threads and data structures involved.

Java, when released, took the path that Parrot appears to be about
to take--access to complex data structures (such as Vector) was
always synchronized.  This turned out to be a mistake--sufficiently
so that Java programmers would often implement their own custom,
unsynchronized replacements for the core classes.  As a result,
when the Collections library (which replaces those original data
structures) was released, the classes in it were left unsynchronized.

In Java's case, the problem was at the library level, not the VM
level; as such, it was relatively easy to fix at a later date.
Parrot's VM-level data structure locking will be less easy to change.
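For reference, the opt-in style that the Collections library settled on looks like this in later Java (a minimal sketch; the demo method is mine):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Vector;

public class OptInSync {
    static int demo() {
        // Vector: every single call locks, needed or not.
        List<String> legacy = new Vector<>();
        legacy.add("obj");

        // ArrayList: no locks; the common, unshared case pays nothing.
        // When sharing really is needed, the user opts in explicitly...
        List<String> shared = Collections.synchronizedList(new ArrayList<>());

        // ...and still must lock compound operations himself:
        synchronized (shared) {
            if (!shared.contains("obj")) {
                shared.add("obj");
            }
        }
        return legacy.size() + shared.size();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 2
    }
}
```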

 - Damien


Re: IMCC - PerlArray getting trounced

2004-01-29 Thread Will Coleda
Sorry about the delay in responding.

My current sample program is 2760 lines of imcc in 23 files, plus a 
small .tcl script.

I'll see if I can trim that down to a more reasonable test case.

On Monday, January 26, 2004, at 05:32  AM, Leopold Toetsch wrote:

Will Coleda [EMAIL PROTECTED] wrote:
I'm trying to track down a problem with a PerlArray that is getting
modified on me.

I have a snippet of code like:
Could you please provide a complete program, that imposes the bug?

leo


--
Will "Coke" Coleda
will at coleda dot com



Re: t/src/extend.t hanging, and a (probably not very good) patch

2004-01-29 Thread Jeffrey Dik
Hi,

I'll look into what's involved with upgrading.  I'm not sure if I'll
elect to do it or not.  It's probably the one major upgrade to a linux
system that I've never done so I'd like to do it as a learning
experience.  At the same time, I've heard it can be painful.  And
downloading anything with the Internet connections I have in this
country (Malaysia) *is* painful :-p

Jeff

On Sat, Jan 24, 2004 at 11:09:11PM -0800, chromatic wrote:
 On Sat, 2004-01-24 at 19:51, Jeffrey Dik wrote:
 
  glibc-2.3.1-51a according to rpm -q
 
 Interesting.  I'm using 2.3.2.  Do you have the opportunity to try out
 NPTL with a newer glibc and kernel?  It may involve some pain, so that's
 fine if you'd rather not.  (I'm in the same boat.)
 
 I'm just about ready to bring this up on the Linux PPC user list in the
 meantime.
 
 -- c
 
 


[DOCS] Updated documentation in src

2004-01-29 Thread Michael Scott
I've added inline docs to everything in src (except for malloc.c and 
malloc-trace.c).

At times I wondered whether this was the right thing to do. For 
example, in mmd.c, where Dan had already created a mmd.pod, I ended up 
duplicating information. At other times I reckoned that what was needed 
was an autodoc. Other times the best I could do was rephrase the 
function name. All issues to address in phase 2.

Next I think, for a bit of light relief, I'll do the examples.

For those who want to browse:

	http://homepage.mac.com/michael_scott/Parrot/docs/html/

Mike




Messages delayed, out of order

2004-01-29 Thread Will Coleda

What's going on with the ordering of messages? My message of : Tue, 27 Jan 2004 
19:53:15 -0500 just made it to the list, a day after my (also delayed) /followup/ to 
that message. And Leo, who responded to the most recent, had his email make it to the 
list before either of these. =-)

Regards.




Re: Messages delayed, out of order

2004-01-29 Thread Simon Cozens
[EMAIL PROTECTED] (Will Coleda) writes:
 What's going on with the ordering of messages? My message of : Tue, 27 Jan 2004 
 19:53:15 -0500 just made it to the list, a day after my (also delayed) /followup/ to 
 that message. And Leo, who responded to the most recent, had his email make it to 
 the list before either of these. =-)

I think the mail servers for cpan.org/perl.org are having to shift
rather a lot of mail at the moment, for some reason.

-- 
Pray to God, but keep rowing to shore.
 -- Russian Proverb


Re: Some namespace notes

2004-01-29 Thread Jeff Clites
On Jan 28, 2004, at 6:42 AM, Peter Haworth wrote:

On Thu, 15 Jan 2004 23:00:53 -0800, Jeff Clites wrote:
I think we shouldn't try to do any sort of cross-language unification.
That is, if we some day have a Parrot version of Java, and in Perl6 
code I
want to reference a global created inside of some Java class I've 
loaded
in, it would be clearer to just reference this as
java.lang.String.CASE_INSENSITIVE_ORDER, even inside of Perl6 code--
rather than having to do something like
java::lang::String::CASE_INSENSITIVE_ORDER. Parrot itself would be
completely ignorant of any concept of a separator character--these 
would
just be uninterpreted strings, and foo::bar and foo.bar would be
separate namespaces, whatever the language.
What about languages which have the same separator, such as '::' 
(perl5,
perl6, ruby) or '.' (java, python)? They are going to be unified 
either way.
Not necessarily. Each language has a choice, at compile-time, of what 
separator to pass to the parrot API. For instance, even though I type 
Foo::Bar in my perl5 code, it's possible that the perl5 compiler 
passes "Foo+++Bar" on to parrot. So each language, via its compiler, 
has a choice of what it wants to do: unify, explicitly avoid 
unification, or not care. My point is that parrot shouldn't force the 
unification, but rather let each language choose its policy.
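The "uninterpreted strings" idea can be sketched as follows (here in Java; all names are hypothetical, nothing here is actual parrot API):

```java
import java.util.HashMap;
import java.util.Map;

public class OpaqueNamespaces {
    // Hypothetical sketch: the VM keys namespaces by whatever string the
    // compiler hands it, with no separator semantics at the VM level.
    static final Map<String, String> globals = new HashMap<>();

    static void register(String compilerChosenName, String value) {
        globals.put(compilerChosenName, value);
    }

    public static void main(String[] args) {
        // A perl5 compiler might pass "Foo+++Bar"; a java compiler passes
        // "java.lang.String"; the strings never collide by accident.
        register("Foo+++Bar", "perl5 global");
        register("java.lang.String", "java class");
        System.out.println(globals.size()); // 2
    }
}
```

Unification, when a compiler wants it, is just a matter of two compilers agreeing to emit the same key.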

I think it's confusing to try to unify namespaces across languages, 
and
doesn't buy us anything.
Without namespace unification, how else are you going to even specify
File::Spec from java
import java.lang.reflect.*;
Class.forName("File::Spec");
, or java.lang.string from perl5?
bless $ref, "java.lang.String";

We can obviously invent suitable syntax for perl6, so that it can cope 
with arbitrarily named packages, but we don't have that luxury with 
most of the other languages we want to support.
I believe that most languages will have a similar reflection API; if not, 
we can add such an API--languages which didn't previously have all of the 
features which parrot provides either don't get all of those features, 
or will need to take advantage of them via a new API.

Then the question becomes, "What about namespace clashes?", which Tim 
has already addressed.
We could certainly do some sort of language-specific prefixing, as Tim 
suggested, but it seems that we are then going to trouble to unify, 
only to immediately de-unify. Certainly, a random Java programmer 
shouldn't have to worry about naming a class so that it doesn't 
conflict with any class in any other language in the world--that's 
silly, especially since this Java programmer may not even know about 
parrot. (But it matters, if later someone wants to run his code on top 
of parrot.) If we use per-language prefixing, then I'm certainly going 
to have to be aware of what language is hosting a given class, and so 
it seems natural to instead just use that class name as I would expect 
it to be written--java.lang.String for Java, for example.

And again, one of my basic points here is that although namespace 
nesting could be useful for some things (maybe), it isn't what's going 
on in most languages anyway. For instance, Java does not have nested 
namespaces--it just happens to have a convention for namespace naming 
which involves dots, and also uses a dot as the separator between a 
namespace name and the variable or class name. (Though I suppose 
there's really 2 separate, but related, issues--how namespace names are 
represented internally, and whether these names imply namespace 
nesting.)

JEff



Re: Threads... last call

2004-01-29 Thread Melvin Smith
At 11:45 PM 1/28/2004 -0500, Gordon Henriksen wrote:
On Wednesday, January 28, 2004, at 12:53 , Melvin Smith wrote:

At 12:27 PM 1/23/2004 -0800, Damien Neil wrote:

Java Collections are a standard Java library of common data structures 
such as arrays and hashes.  Collections are not synchronized; access 
involves no locks at all.  Multiple threads accessing the same 
collection at the same time cannot, however, result in the virtual 
machine crashing.  (They can result in data structure corruption, but 
this corruption is limited to surprising results rather than VM crash.)
But this accomplishes nothing useful and still means the data structure 
is not re-entrant, nor is it corruption resistant, regardless of how we 
judge it.
It does accomplish something very useful indeed: It avoids the overhead of 
automatic locking when it isn't necessary. When *is* that locking 
necessary? To a second order approximation, ***NEVER.***
Pardon me but I've apparently lost track of context here.

I thought we were discussing correct behavior of a shared data structure,
not general cases. Or maybe this is the general case and I should
go read more backlog? :)
-Melvin




Re: Messages delayed, out of order

2004-01-29 Thread Harry Jackson
Simon Cozens wrote:
I think the mail servers for cpan.org/perl.org are having to shift
rather a lot of mail at the moment, for some reason.
For those that are unaware, there is currently a rather serious virus in 
the wild and a lot of mailing lists are being affected. I 
think it's called the MyDoom virus or some such terribly script-kiddy-like 
name. So, if you receive an email with subject "hello" and a zipped 
attachment then delete it. As far as I am aware it only affects the 
smell checker of those on loonix

H



Re: IO subsystem stuff

2004-01-29 Thread Jeff Clites
On Jan 27, 2004, at 3:47 PM, Cory Spencer wrote:

Perhaps someone with a bit more familiarity with the Parrot IO 
subsystem
could give me some guidance here.  I'm currently trying to get a new
'peek' opcode working, and I'm having difficulties getting the io_unix
layer implemented correctly.

As far as I know, I'd get a call down into the io_unix layer when the
ParrotIO object isn't buffered.  What I want to be able to do is to
read()/fread() a character off of the io->fd file descriptor, copy it into
the buffer, then ungetc() it back onto the stream.
You can't push a character back onto a Unix file descriptor. In order 
to emulate this for parrot, you'll need some storage hanging off of the 
ParrotIO structure to store the pushed back characters, and then 
munge the read methods to pull data from here before reading from the 
real descriptor, if there has been anything pushed back. This is, 
essentially what the C std. lib. buffered IO API does--the core Unix IO 
API doesn't provide this functionality. For parrot, I think that we 
should only do this for the io_buf layer (and maybe the io_stdio 
layer), which is the buffered IO layer, and already has a buffer which 
can be used for this purpose. I don't think it's appropriate for the 
io_unix layer--I see that as a direct wrapper around the Unix API.

Unfortunately, however, ungetc requires a (FILE *), while the ParrotIO 
object carries around only a raw file descriptor (I think).
Yes, the C std. lib. IO API is a wrapper on top of the core OS IO 
routines (for Unix or Windows), and we're using the core IO routines to 
implement our IO functionality, rather than going through an extra 
layer. (And io_stdio is based on the C std. lib., and I believe is 
provided so that it can be used on systems for which none of the other 
base layers is available--non-Unix and non-Windows.)

I've seen some instances where people will cast the raw descriptor to a
(FILE *)
I can't imagine where that would ever work. A (FILE *) is a pointer to 
a struct which stores various bits of data, including the actual file 
descriptor. A file descriptor is just an integer, and isn't going to be 
interpretable as a pointer to such a struct--using the core Unix IO 
API, no such struct will have been created anywhere in memory. So you 
can't get a FILE* via any sort of casting, at least not on Unix 
platforms.

however the man page for ungetc warns ominously in its BUGS
section that:
   It is not advisable to mix calls to input functions from
   the stdio library with low-level calls to read() for the
   file descriptor associated with the input stream; the
   results will be undefined and very probably not what you
   want.
This is warning about something else. It's saying don't use the C API 
to do IO on a FILE*, and also use the Unix IO API on the descriptor 
which is the fileno() of that FILE*. But even this you wouldn't do 
by casting--you'd either get the descriptor from the FILE* using 
fileno(), or use fdopen() to create a FILE* from a descriptor.

But in any event, we don't want to use the C std. lib. IO API inside of 
the io_unix layer.

That being said, what is the best course for buffering such characters 
at
the io_unix layer?  I apparently am not able to use the standard 
library
functions to do so (additionally, they only guarantee that you can peek
and replace a single character).
As I said above, I think we'd only want to do this for the io_buf 
layer, though others may disagree. If we do want to do it at the 
io_unix layer, then we can just copy down a bunch of code from io_buf, 
because we will be making the io_unix layer a buffered layer (with the 
difference being that the buffer would only be populated in the case of 
pushing back read items, and not during reads).
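Incidentally, Java's standard library implements exactly this pushback-buffer technique in java.io.PushbackInputStream, which may be a useful reference for the io_buf approach; a minimal sketch:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;

public class PeekDemo {
    // Peek: read one byte, then push it back so the next read sees it again.
    // The pushed-back byte lives in a small buffer in front of the stream;
    // it never goes back down to the underlying descriptor.
    static int peek(PushbackInputStream in) throws IOException {
        int c = in.read();
        if (c != -1) {
            in.unread(c);
        }
        return c;
    }

    public static void main(String[] args) throws IOException {
        PushbackInputStream in =
            new PushbackInputStream(new ByteArrayInputStream("abc".getBytes()));
        System.out.println((char) peek(in));    // a
        System.out.println((char) in.read());   // a  (peek did not consume it)
    }
}
```

Note that, like ungetc(), the default pushback buffer holds only a single byte; a deeper buffer must be requested explicitly.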

JEff



Re: Threads... last call

2004-01-29 Thread Leopold Toetsch
Melvin Smith [EMAIL PROTECTED] wrote:

 I thought we were discussing correct behavior of a shared data structure,
 not general cases. Or maybe this is the general case and I should
 go read more backlog? :)

Basically we have three kinds of locking:
- HLL user level locking [1]
- user level locking primitives [2]
- vtable pmc locking to protect internals

Locking at each stage and for each PMC will be slow and can deadlock
too. Very coarse grained locking (like Pythons interpreter_lock) doesn't
give any advantage on MP systems - only one interpreter is running at
one time.

We can't solve user data integrity at the lowest level: data logic and
such isn't really visible here. But we should be able to integrate HLL
locking with our internal needs, so that the former doesn't cause
deadlocks[3] because we have to lock internally too, and we should be able
to omit internal locking, if HLL locking code already provides this
safety for a specific PMC.

The final strategy when to lock what depends on the system the code is
running. Python's model is fine for single processors. Fine grained PMC
locking gives more boost on multi-processor machines.

All generalization is evil and 47.4 +- 1.1% of all
statistics^Wbenchmarks are wrong;)

leo

[1] BLOCK scoped, e.g. synchronized {... } or { lock $x; ... }
These can be rwlocks or mutex typed locks
[2] lock.acquire, lock.release

[3] not user caused deadlocks - the mix of e.g. one user lock and
internal locking.
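Kinds [1] and [2] map naturally onto Java constructs; a rough sketch (the class and counter names are mine, purely illustrative):

```java
import java.util.concurrent.locks.ReentrantLock;

public class LockingKinds {
    static final Object monitor = new Object();
    static final ReentrantLock lock = new ReentrantLock();
    static int counter = 0;

    // [1] BLOCK-scoped locking, like synchronized { ... } or { lock $x; ... }:
    // the lock is released automatically when the block exits.
    static void blockScoped() {
        synchronized (monitor) {
            counter++;
        }
    }

    // [2] Explicit user-level primitives, like lock.acquire / lock.release:
    // the user is responsible for pairing each acquire with a release.
    static void explicitPrimitives() {
        lock.lock();
        try {
            counter++;
        } finally {
            lock.unlock();
        }
    }
}
```

The vtable-level locking of kind [3] has no user-visible analogue here; it is exactly the layer the preceding discussion proposes to omit when the upper layers already guarantee safety.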



Re: Some namespace notes

2004-01-29 Thread Luke Palmer
Jeff Clites writes:
 We could certainly do some sort of language-specific prefixing, as Tim 
 suggested, but it seems that we are then going to trouble to unify, 
 only to immediately de-unify. Certainly, a random Java programmer 
 shouldn't have to worry about naming a class so that it doesn't 
 conflict with any class in any other language in the world--that's 
 silly, especially since this Java programmer may not even know about 
 parrot. (But it matters, if later someone wants to run his code on top 
 of parrot.) If we use per-language prefixing, then I'm certainly going 
 to have to be aware of what language is hosting a given class, and so 
 it seems natural to instead just use that class name as I would expect 
 it to be written--java.lang.String for Java, for example.
 
 And again, one of my basic points here is that although namespace 
 nesting could be useful for some things (maybe), it isn't what's going 
 on in most languages anyway. For instance, Java does not have nested 
 namespaces--it just happens to have a convention for namespace naming 
 which involves dots, and also uses a dot as the separator between a 
 namespace name and the variable or class name. (Though I suppose 
 there's really 2 separate, but related, issues--how namespace names are 
 represented internally, and whether these names imply namespace 
 nesting.)

But in fact, one of Perl 6's key features depends on hierarchical
namespaces: namespace aliasing.  And then if the Java programmer
accidentally named a class the same as some perl module (unlikely,
because you probably won't see Perl with a org::bazbar::Foo module),
Perl is able to move that namespace somewhere else to make room for
another conflicting module.

Luke


Re: Messages delayed, out of order

2004-01-29 Thread Will Coleda
On Thursday, January 29, 2004, at 01:51  PM, Simon Cozens wrote:

[EMAIL PROTECTED] (Will Coleda) writes:
What's going on with the ordering of messages? My message of : Tue, 
27 Jan 2004 19:53:15 -0500 just made it to the list, a day after my 
(also delayed) /followup/ to that message. And Leo, who responded to 
the most recent, had his email make it to the list before either of 
these. =-)
I think the mail servers for cpan.org/perl.org are having to shift
rather a lot of mail at the moment, for some reason.
Heh. I had guessed the cause of the delay, but the out of order-ness 
was confusing, until I thought about it for a minute. =-)

(Note to self: think about all sends for a minute before pressing 
enter.)

--
Will "Coke" Coleda
will at coleda dot com



Re: Messages delayed, out of order

2004-01-29 Thread Matt Fowles
I have been getting out of order messages from this list for months... 
I just assumed that the internet was a mysterious thing...

Matt

Will Coleda wrote:

On Thursday, January 29, 2004, at 01:51  PM, Simon Cozens wrote:

[EMAIL PROTECTED] (Will Coleda) writes:

What's going on with the ordering of messages? My message of : Tue, 
27 Jan 2004 19:53:15 -0500 just made it to the list, a day after my 
(also delayed) /followup/ to that message. And Leo, who responded to 
the most recent, had his email make it to the list before either of 
these. =-)


I think the mail servers for cpan.org/perl.org are having to shift
rather a lot of mail at the moment, for some reason.


Heh. I had guessed the cause of the delay, but the out of order-ness was 
confusing, until I thought about it for a minute. =-)

(Note to self: think about all sends for a minute before pressing enter.)

--
Will "Coke" Coleda
will at coleda dot com





Re: [DOCS] Updated documentation in src

2004-01-29 Thread Leopold Toetsch
Matt Fowles [EMAIL PROTECTED] wrote:

[ another TOFU [1] ]

AOL

leo

 Mike~

 You rock.  That is really nice.

 Matt

 Michael Scott wrote:
 I've added inline docs to everything in src (except for malloc.c and
 malloc-trace.c).

 At times I wondered whether this was the right thing to do. For example,
 in mmd.c, where Dan had already created a mmd.pod, I ended up
 duplicating information. At other times I reckoned that what was needed
 was an autodoc. Other times the best I could do was rephrase the
 function name. All issues to address in phase 2.

 Next I think, for a bit of light relief, I'll do the examples.

 For those who want to browse:

 http://homepage.mac.com/michael_scott/Parrot/docs/html/

 Mike

[1]

German: "Text oben, fullquote unten" (text on top, full quote below) - a posting style I don't like 
(and an ETLA that very likely doesn't deserve more translation :) -
*but* there are of course exceptions ;)


Re: Messages delayed, out of order

2004-01-29 Thread Gordon Henriksen
On Thursday, January 29, 2004, at 09:12 , Matt Fowles wrote:

I have been getting out of order messages from this list for months... 
I just assumed that the internet was a mysterious thing...
Methinks the list is also manually moderated



Gordon Henriksen
[EMAIL PROTECTED]


Re: Threads... last call

2004-01-29 Thread Gordon Henriksen
On Thursday, January 29, 2004, at 11:55 , Melvin Smith wrote:

At 11:45 PM 1/28/2004 -0500, Gordon Henriksen wrote:

On Wednesday, January 28, 2004, at 12:53 , Melvin Smith wrote:

At 12:27 PM 1/23/2004 -0800, Damien Neil wrote:

Java Collections are a standard Java library of common data 
structures such as arrays and hashes.  Collections are not 
synchronized; access involves no locks at all.  Multiple threads 
accessing the same collection at the same time cannot, however, 
result in the virtual machine crashing.  (They can result in data 
structure corruption, but this corruption is limited to surprising 
results rather than VM crash.)
But this accomplishes nothing useful and still means the data 
structure is not re-entrant, nor is it corruption resistant, 
regardless of how we judge it.
It does accomplish something very useful indeed: It avoids the 
overhead of automatic locking when it isn't necessary. When *is* that 
locking necessary? To a second order approximation, ***NEVER.***
Pardon me but I've apparently lost track of context here.

I thought we were discussing correct behavior of a shared data 
structure, not general cases. Or maybe this is the general case and I 
should go read more backlog? :)
A shared data structure, as per Dan's document? It's a somewhat novel 
approach, trying to avoid locking overhead with dynamic dispatch and 
vtable swizzling. I'm discussing somewhat more traditional technologies, 
which simply allow an object to perform equally correctly and with no 
differentiation between shared and unshared cases. In essence, I'm 
arguing that a shared case isn't necessary for some data structures in 
the first place.



Gordon Henriksen
[EMAIL PROTECTED]