Re: Some namespace notes
On Jan 15, 2004, at 9:52 AM, Dan Sugalski wrote:

At 10:13 AM -0800 1/13/04, Jeff Clites wrote: Here are some notes on namespaces, picking up a thread from about a month ago:

On Dec 11, 2003, at 8:57 AM, Dan Sugalski wrote: That does, though, argue that we need to revisit the global access opcodes. If we're going hierarchic, and we want to separate out the name from the namespace, that would seem to argue that we'd want it to look like:

find_global P1, ['global', 'namespace', 'hierarchy'], thingname

That is, split the namespace path from the name of the thing, and make the namespace path a multidimensional key.

I definitely agree that we should have separate slots for namespace and name, as you have above. So I think the discussion boils down to whether a namespace specifier is logically a string or an array of strings. Short version: I was originally going to argue for fully hierarchical namespaces, identified as above, but after turning this over in my head for a while, I came to the conclusion that namespaces are not conceptually hierarchical (especially as used in languages such as Perl5 and Java, at least), so I'm going to argue for a single string (rather than an array) as a namespace identifier.

Here's my big, and in fact *only*, reason to go hierarchical: We don't need to mess around with separator character substitution. Other than that I don't much care and, as you've pointed out, most of the languages don't really do a hierarchical structure as such. Going hierarchical, though, means we don't have to do ::/:///whatever substitutions to present a unified view of the global namespaces.

A key part of my argument (and it's fine if you understood this, and disagree--just wanted to make sure that it was clear) is that I think we shouldn't try to do any sort of cross-language unification.
That is, if we some day have a Parrot version of Java, and in Perl6 code I want to reference a global created inside of some Java class I've loaded in, it would be clearer to just reference this as java.lang.String.CASE_INSENSITIVE_ORDER, even inside of Perl6 code--rather than having to do something like java::lang::String::CASE_INSENSITIVE_ORDER. Parrot itself would be completely ignorant of any concept of a separator character--these would just be uninterpreted strings, and foo::bar and foo.bar would be separate namespaces, whatever the language.

I think it's confusing to try to unify namespaces across languages, and it doesn't buy us anything. I think it's much cleaner to say namespaces have names which are arbitrary strings, and if you want to put colons or periods in the name, so what--parrot doesn't care.

(That said, if the Perl6 creators and the Java-on-Parrot creators decided that it _is_ good to try to unify their namespaces, they could still do this at the compiler level--so maybe Perl6 would substitute . for :: in namespace names at compile time. But parrot itself wouldn't know or care. And, if the Python people decide it's better not to try to unify with this mega-namespace, that's up to them.)

So I'm arguing here against a unified view of the global namespaces. But, if we decide that's needed, then I definitely agree that it's best to avoid having some magic separator character--much cleaner to treat it as an array.

JEff
Re: JVM as a threading example (threads proposal)
Damien Neil [EMAIL PROTECTED] wrote:

On Thu, Jan 15, 2004 at 09:31:39AM +0100, Leopold Toetsch wrote: I don't see any advantage of such a model, especially as it doesn't guarantee atomic access to e.g. longs or doubles. The atomic access to ints and pointers seems to rely on the architecture but is of course reasonable.

You *can't* guarantee atomic access to longs and doubles on some architectures, unless you wrap every read or write to one with a lock. The CPU support isn't there.

Yes, that's what I'm saying. I don't see an advantage of the JVM's multi-step variable access, because it doesn't even provide such atomic access.

Parrot deals with PMCs, which can contain (let's consider scalars only) e.g. a PerlInt or a PerlNum. Now we would have atomic access (normally) to the former and very likely non-atomic access to the latter, just depending on the value which happened to be stored in the PMC. This implies that we have to wrap almost[1] all shared write *and* read PMC access with LOCK/UNLOCK.

[1] except plain ints and pointers on current platforms

- Damien

leo
Re: Problem during make test
Chromatic [EMAIL PROTECTED] wrote:

On Sun, 2004-01-04 at 12:09, Harry Jackson wrote: I tried that as well, it spits out identical PASM each time but on the odd occasion I need to use CTRL-C to get back to the shell.

I'm seeing the same thing on Linux PPC -- odd hangs from time to time when running PIR, while running the PASM emitted with -o works well. t/op/arithmetics 3 and 9 seem to be the big culprits in the test suite.

Could you attach gdb to the hanging parrot?

$ cat sl.pasm
sleep 1
end
$ parrot sl.pasm

[ in second term ]
$ ps ax | grep [p]arrot
28952 pts/0 S 0:00 parrot sl.pasm
28953 pts/0 S 0:00 parrot sl.pasm
28954 pts/0 S 0:00 parrot sl.pasm
$ gdb parrot 28952
GNU gdb 5.3
...
0x4011a391 in __libc_nanosleep () at __libc_nanosleep:-1
-1 __libc_nanosleep: No such file or directory.
in __libc_nanosleep
(gdb) bac
#0 0x4011a391 in __libc_nanosleep () at __libc_nanosleep:-1
#1 0x4011a31b in __sleep (seconds=1) at ../sysdeps/unix/sysv/linux/sleep.c:82
#2 0x08086792 in Parrot_sleep (seconds=1) at src/platform.c:47
#3 0x080f89c4 in Parrot_sleep_ic (cur_opcode=0x826e488, interpreter=0x824b0a8) at ops/sys.ops:151
#4 0x08082921 in runops_slow_core (interpreter=0x824b0a8, pc=0x826e488) at src/runops_cores.c:115
...

This is on linux; the lowest PID is the main thread. There should be some hints where it hangs.

-- c

leo
[FYI] Win32 SFU
I don't know if we should depend on that, but it would definitely help. Could some Windows guys have a look at: http://www.microsoft.com/windows/sfu/

"Interoperability. Integration. Extensibility. Windows Services for UNIX (SFU) 3.5 provides the tools and environment that IT professionals and developers need to integrate Windows with UNIX and Linux environments."

leo
Re: Some namespace notes
On Jan 15, 2004, at 8:26 PM, Benjamin K. Stuhl wrote:

Thus wrote Dan Sugalski: At 10:13 AM -0800 1/13/04, Jeff Clites wrote: Short version: I was originally going to argue for fully hierarchical namespaces, identified as above, but after turning this over in my head for a while, I came to the conclusion that namespaces are not conceptually hierarchical (especially as used in languages such as Perl5 and Java, at least), so I'm going to argue for a single string (rather than an array) as a namespace identifier. ...

Performance-wise, I would guesstimate that it's more-or-less a wash between parsing strings and parsing multidimensional keys, so as long as we precreate the keys (keep them in a constant table or something), I see no performance issues.

It turns out that it makes a big difference in lookup times--doing one hash lookup v. several. I did this experiment using Perl5 (5.8.0): Create a structure holding 1296 entries, each logically 12 characters long--either one level of 12 character strings, or 2 levels of 6 character strings, or 3 levels of 4 character strings, or 4 levels of 3 character strings--and look up the same item 10 million times. Here is the time it takes for the lookups:

1-level: 14 sec.
2-level: 20 sec.
3-level: 25 sec.
4-level: 32 sec.

Conclusion: It's faster to do one lookup of a single, longer string than several lookups of shorter strings. Of course, as Uri pointed out, even if we go with hierarchical namespaces, we could implement these internally as a single-level hash, behind the scenes, as an implementation detail and optimization.

JEff
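Jeff's experiment is easy to reproduce. Here is a rough Python analogue (the key shapes are mine, and absolute timings will of course differ from the Perl 5 numbers above): 1296 entries, each with a 12-character logical name, stored once as a flat dict keyed by the joined name and once as a 4-level nest of dicts keyed by 3-character segments.

```python
import timeit

# Six 3-character segments; 6^4 = 1296 entries, 12 chars per full name.
segs = ["s%02d" % i for i in range(6)]
flat = {}
nested = {}
for a in segs:
    for b in segs:
        for c in segs:
            for d in segs:
                flat[a + b + c + d] = 42
                nested.setdefault(a, {}).setdefault(b, {}) \
                      .setdefault(c, {})[d] = 42

joined = "s03s01s04s01"

# Sanity check: both layouts hold the same value.
assert flat[joined] == nested["s03"]["s01"]["s04"]["s01"] == 42

# One lookup of a long key vs. four lookups of short keys.
t_flat = timeit.timeit(lambda: flat[joined], number=100_000)
t_nest = timeit.timeit(
    lambda: nested["s03"]["s01"]["s04"]["s01"], number=100_000)
print(f"flat:   {t_flat:.3f}s")
print(f"nested: {t_nest:.3f}s")
```

On most runs the flat layout wins, matching Jeff's conclusion, though dict lookups are cheap enough in CPython that the gap is smaller than Perl 5's.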
Re: Problem during make test
On Thu, 2004-01-15 at 23:26, Leopold Toetsch wrote: Could you attach gdb to the hanging parrot?

This time, it's hanging at t/op/00ff-dos.t:

(gdb) bac
#0 0x0fd0e600 in sigsuspend () from /lib/libc.so.6
#1 0x0ff970ac in __pthread_wait_for_restart_signal () from /lib/libpthread.so.0
#2 0x0ff96cf8 in pthread_onexit_process () from /lib/libpthread.so.0
#3 0x0fd10bc8 in exit () from /lib/libc.so.6
#4 0x1008c750 in Parrot_exit (status=0) at src/exit.c:54
#5 0x100320b4 in main (argc=1, argv=0x75c0) at imcc/main.c:555

Here's another run, this time hanging at test #3 in t/op/arithmetics.t:

#0 0x0fd0e600 in sigsuspend () from /lib/libc.so.6
#1 0x0ff970ac in __pthread_wait_for_restart_signal () from /lib/libpthread.so.0
#2 0x0ff96cf8 in pthread_onexit_process () from /lib/libpthread.so.0
#3 0x0fd10bc8 in exit () from /lib/libc.so.6
#4 0x1008c750 in Parrot_exit (status=0) at src/exit.c:54
#5 0x100320b4 in main (argc=1, argv=0x75b0) at imcc/main.c:555

I can upgrade glibc to see if that helps.

-- c
Re: Some namespace notes
Jeff Clites writes:

On Jan 15, 2004, at 8:26 PM, Benjamin K. Stuhl wrote: Thus wrote Dan Sugalski: At 10:13 AM -0800 1/13/04, Jeff Clites wrote: Short version: I was originally going to argue for fully hierarchical namespaces, identified as above, but after turning this over in my head for a while, I came to the conclusion that namespaces are not conceptually hierarchical (especially as used in languages such as Perl5 and Java, at least), so I'm going to argue for a single string (rather than an array) as a namespace identifier. ...

Performance-wise, I would guesstimate that it's more-or-less a wash between parsing strings and parsing multidimensional keys, so as long as we precreate the keys (keep them in a constant table or something), I see no performance issues.

It turns out that it makes a big difference in lookup times--doing one hash lookup v. several. I did this experiment using Perl5 (5.8.0): Create a structure holding 1296 entries, each logically 12 characters long--either one level of 12 character strings, or 2 levels of 6 character strings, or 3 levels of 4 character strings, or 4 levels of 3 character strings--and look up the same item 10 million times. Here is the time it takes for the lookups:

1-level: 14 sec.
2-level: 20 sec.
3-level: 25 sec.
4-level: 32 sec.

Conclusion: It's faster to do one lookup of a single, longer string than several lookups of shorter strings. Of course, as Uri pointed out, even if we go with hierarchical namespaces, we could implement these internally as a single-level hash, behind the scenes, as an implementation detail and optimization.

JEff

My two cents: I don't care as long as we can toss symbol tables around as PMCs, and replace symbol tables with different (alternately implemented) PMCs. I think it's possible to do this using a clever ordered hash scheme even if we go one level, but it's something to keep in mind.

Luke
Re: JVM as a threading example (threads proposal)
On Jan 15, 2004, at 10:55 PM, Leopold Toetsch wrote:

Damien Neil [EMAIL PROTECTED] wrote: On Thu, Jan 15, 2004 at 09:31:39AM +0100, Leopold Toetsch wrote: I don't see any advantage of such a model, especially as it doesn't guarantee atomic access to e.g. longs or doubles. The atomic access to ints and pointers seems to rely on the architecture but is of course reasonable.

You *can't* guarantee atomic access to longs and doubles on some architectures, unless you wrap every read or write to one with a lock. The CPU support isn't there.

Yes, that's what I'm saying. I don't see an advantage of the JVM's multi-step variable access, because it doesn't even provide such atomic access.

What I was expecting that the Java model was trying to do (though I didn't find this) was something along these lines: Accessing the main store involves locking, so by copying things to a thread-local store we can perform several operations on an item before we have to move it back to the main store (again, with locking). If we worked directly from the main store, we'd have to lock for each and every use of the variable. The reason I'm not finding it is that the semantic rules spelled out in the spec _seem_ to imply that every local access implies a corresponding access to the main store, one-to-one. On the other hand, maybe the point is that it can save up these accesses--that is, lock the main store once, and push back several values from the thread-local store. If it can do this, then it is saving some locking.

Parrot deals with PMCs, which can contain (let's consider scalars only) e.g. a PerlInt or a PerlNum. Now we would have atomic access (normally) to the former and very likely non-atomic access to the latter, just depending on the value which happened to be stored in the PMC. This implies that we have to wrap almost[1] all shared write *and* read PMC access with LOCK/UNLOCK.
[1] except plain ints and pointers on current platforms Ah, but this misses a key point: We know that user data is allowed to get corrupted if the user isn't locking properly--we only have to protect VM-internal state. The key point is that it's very unlikely that there will be any floats involved in VM-internal state--it's going to be all pointers and ints (for offsets and lengths). That is, a corrupted float won't crash the VM. JEff
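The batching Jeff describes can be sketched as a toy model (not the actual JVM memory model, and all names here are mine): a thread works on a local copy and takes the main-store lock only when it syncs, instead of on every access. Counting lock acquisitions makes the saving visible.

```python
import threading

class MainStore:
    """Toy shared store: every direct access takes the lock."""
    def __init__(self):
        self._lock = threading.Lock()
        self._vars = {}
        self.lock_count = 0           # how often we had to lock

    def load(self, name):
        with self._lock:
            self.lock_count += 1
            return self._vars.get(name, 0)

    def store(self, name, value):
        with self._lock:
            self.lock_count += 1
            self._vars[name] = value

class WorkingCopy:
    """Thread-local copies: lock once per sync, not per access."""
    def __init__(self, store):
        self.store = store
        self.local = {}

    def read_in(self, *names):        # one lock for many loads
        with self.store._lock:
            self.store.lock_count += 1
            for n in names:
                self.local[n] = self.store._vars.get(n, 0)

    def write_back(self):             # one lock for many stores
        with self.store._lock:
            self.store.lock_count += 1
            self.store._vars.update(self.local)

# Five increments, touching the store directly: 2 locks per iteration.
direct = MainStore()
for _ in range(5):
    direct.store("x", direct.load("x") + 1)
assert direct.lock_count == 10

# Same work through a working copy: 2 locks total.
batched = MainStore()
wc = WorkingCopy(batched)
wc.read_in("x")
for _ in range(5):
    wc.local["x"] += 1                # no locking here
wc.write_back()
assert batched.lock_count == 2
assert batched._vars["x"] == direct._vars["x"] == 5
```

Of course, as the thread notes, the saving only materializes if the semantics actually let the VM defer the write-back; a strict one-to-one mapping between local and main-store accesses buys nothing.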
Re: Problem during make test
On Jan 15, 2004, at 10:42 PM, chromatic wrote:

On Sun, 2004-01-04 at 12:09, Harry Jackson wrote: I tried that as well, it spits out identical PASM each time but on the odd occasion I need to use CTRL-C to get back to the shell.

I'm seeing the same thing on Linux PPC -- odd hangs from time to time when running PIR, while running the PASM emitted with -o works well. t/op/arithmetics 3 and 9 seem to be the big culprits in the test suite. Perl 5.8.2, gcc version 3.2.3 20030422. I've checked out a fresh source tree and still see this behavior. Removing -DHAVE_JIT from the Makefile (since I didn't find the configure argument) had no effect.

Yeah, I think JIT is a red herring--I don't see how JIT problems can be involved when not running with the JIT core.

JEff
Q: thread function
Should a thread function always be non-prototyped, or do we allow prototyped ones too? On preparing a thread, the parameters of the thread function are copied/cloned into the new thread interpreter. So I'd like to know which registers get the state from the calling interpreter (all, or according to prototyped or non-prototyped calling conventions).

leo
Re: Optimization brainstorm: variable clusters
At 19:52 -0500 1/15/04, Melvin Smith wrote:

At 04:26 PM 1/15/2004 -0700, Luke Palmer wrote: I can see some potential problems to solve with regards to some languages where variables are dynamic and can be undefined, such as Perl6, but the optimization would certainly work for constants in all languages.

The only problem with Perl6 would be if a global or package variable's address changed after it was stored in the register group at bytecode load time (which could probably happen).

Which is very hard not to happen as soon as you get into Exporter land. ;-( For example:

use Scalar::Util qw(blessed weaken reftype);
use POSIX;

Anytime we cache something dynamic, we have to make sure the caches know about changes. I think that is where notifications might help. For constants it is easy. IMCC might say: this routine requires us to initialize at least 3 registers with a constant value, let's make it into a register block. This may be a premature optimization, but for certain cases I think it's pretty nifty.

This smells like premature optimization to me for languages such as Perl[\d]. The number of times a variable occurs in a program may have _no_ relation to how many times it will be accessed. So what's the optimization then? If you're thinking about this, then maybe a better heuristic would be to group globals into groups that are _only_ referenced within a specific scope, and fetch them on scope entry and store them on scope exit. But then, anything like eval or the equivalent of a glob assignment (or even worse: an event) within that scope will cause problems.

But please, people around me always tell me that I'm way too negative. That I'm always saying why things _can't_ happen. I'd like to be proven wrong... ;-)

Liz
Re: [DOCS] POD Errors
They're already committed.

On 16 Jan 2004, at 00:21, chromatic wrote: On Thu, 2004-01-15 at 15:02, Michael Scott wrote: So, after migrating from Pod::Checker to Pod-Simple, I've cleared up all the pod errors and done a rudimentary html tree.

Do you have patches to fix the errors in CVS or are they even necessary?

-- c
Re: Problem during make test
Chromatic [EMAIL PROTECTED] wrote:

On Thu, 2004-01-15 at 23:26, Leopold Toetsch wrote: Could you attach gdb to the hanging parrot?

This time, it's hanging at t/op/00ff-dos.t:

(gdb) bac
#0 0x0fd0e600 in sigsuspend () from /lib/libc.so.6
#1 0x0ff970ac in __pthread_wait_for_restart_signal () from /lib/libpthread.so.0
#2 0x0ff96cf8 in pthread_onexit_process () from /lib/libpthread.so.0
#3 0x0fd10bc8 in exit () from /lib/libc.so.6
#4 0x1008c750 in Parrot_exit (status=0) at src/exit.c:54
#5 0x100320b4 in main (argc=1, argv=0x75c0) at imcc/main.c:555

Ugly. Parrot starts the event thread as detached, so that *should* not cause problems. But maybe I'm doing something stupid somewhere. The event thread is waiting on a queue condition, but when main exits, it should just terminate AFAIK. I can send a special event_loop_terminate event though.

I can upgrade glibc to see if that helps. -- c

That's another possibility.

leo
Re: Problem during make test
Chromatic [EMAIL PROTECTED] wrote: This time, it's hanging at t/op/00ff-dos.t:

I've checked in now:

* terminate the event loop thread on destroying the last interp
* this could help against the spurious hangs reported on p6i

Could you please check if that helps. Thanks,
leo
Re: Optimization brainstorm: variable clusters
Elizabeth Mattijsen [EMAIL PROTECTED] wrote: If you're thinking about this, then maybe a better heuristic would be to group globals into groups that are _only_ referenced within a specific scope and fetch them on scope entry and store them on scope exit. But then, anything like eval or the equivalent of a glob assignment (or even worse: an event) within that scope, will cause problems.

Liz

Storing lexicals or globals isn't needed:

$ cat g.pasm
new P0, .PerlInt
set P0, 4
store_global "$a", P0
# ...
find_global P1, "$a"
inc P1
find_global P2, "$a"
print P2
print "\n"
end
$ parrot g.pasm
5

So the optimization is to just keep lexicals/globals in registers as long as we have some. Where currently spilling is done, we just forget about that register (but not *reuse* it; C<new> or such is ok) - and refetch the variable later. So the *only* current optimization is: we need HLL directives for lexicals and globals so that the spilling code and register allocator can use this information. That is: we can always cut the life range of lexicals/globals, *if* we refetch where we now fetch from the spill array.

leo
Re: Unicode, internationalization, C++, and ICU
Maybe we can use someone else's solution... http://lists.ximian.com/archives/public/mono-list/2003-November/016731.html

On 16 Jan 2004, at 00:33, Jonathan Worthington wrote:

- Original Message - From: Dan Sugalski [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Thursday, January 15, 2004 8:09 PM Subject: Unicode, internationalization, C++, and ICU

Now, assuming there's still anyone left reading this message... We've been threatening to build ICU into parrot, and it's time for that to start happening. Unfortunately there's a problem--it doesn't work right now. So, what we need is some brave soul to track ICU development and keep us reasonably up to date. What I'd really like is: 1) ICU building and working 2) ICU not needing any C++

I've done some testing, and I hate to be the bearer of bad news, but I believe we have something of a problem. :-( The configure script turns out to be a shell script which, unless I'm mistaken, means we're currently unable to build ICU anywhere we don't have bash or similar. Win32 for starters, which is where I'm testing.

A possible solution might be to re-write the configure script in Perl - though we'd have to keep it maintained as we do ICU updates. Another one, for Win32 at least, is that we *might* be able to use UNIX Services For Win32 and run configure under that, generate a Win32 makefile and just copy it in place with the configure script. Less portable to other places with the same problem, though, and again we have to maintain it as we update ICU. There is also a problem with the configure stage on Win32, but that's an aside until the above issue is sorted out.

I also gave it a spin in cygwin, where the configure script for ICU runs OK, but there's no C++ compiler so it doesn't get built.

Thoughts?

Jonathan
[PATCHish] Food for thought about allocation
I did a little benchmark test today that had some enlightening results. I'll explain what my goal was, then show you said results, and finally a patch (not meant to be committed) and the benchmark programs themselves.

I dearly wanted real Continuations to be faster, because I loathe RetContinuations. They suck. Honest. They're of no use as or with continuations -- they do little more than put a return value on the proverbial stack. And they can't be promoted without serious danger of the state being corrupted before this happens.

So, I implemented a new register stack scheme without chunks. The register stack is now a linked list of single frames. This has the advantage that you *never* have to copy frames, nor do you have to mess with marking things COW. I figured it would be an improvement. I wrote it using Parrot's default small object allocator for each frame. I ran benchmark_1 (see below), and got these results (or around there, I idiotically didn't keep them around):

% time parrot-sync benchmark_1.imc   # Parrot from CVS
parrot-sync benchmark_1.imc 0.87s user ...
% time parrot benchmark_1.imc        # My modified version
parrot benchmark_1.imc 2.04s user ...

Benchmark 1 is about raw save/restore speed, without respect for anything fancy. This was clearly unacceptable. It doesn't matter how much faster it makes continuations; a speed hit this large can't be accepted. I thought the problem might be in the small object allocator, as I couldn't imagine how the simple algorithms in my modification could possibly take so long. So I wrote my own small object allocator for the register stacks, and got:

% time parrot benchmark_1.imc
parrot benchmark_1.imc 1.11s user ...

Much better. Didn't make me entirely happy, but a great improvement indeed. This could probably be improved with some careful and attentive coding. Here are the times for the other benchmarks:

% time parrot-sync benchmark_2.imc
parrot-sync benchmark_2.imc 0.78s user ...
% time parrot benchmark_2.imc
parrot benchmark_2.imc 0.89s user ...

Benchmark 2 calls a sub many times, creating a RetContinuation each time. And finally the one with real continuations:

% time parrot-sync benchmark_3.imc
parrot-sync benchmark_3.imc 1.58s user ...
% time parrot benchmark_3.imc
parrot benchmark_3.imc 0.45s user ...

Benchmark 3 does the same as benchmark 2, except that it creates a real Continuation each time. Note the number of iterations is half that of benchmark 2.

Well, I achieved my goal for sure. Side results weren't so great. But the most enlightening thing is how much things improved by writing my own small object allocator -- even a quick and dirty one. Last time I was fishing through that subsystem, it seemed to have a lot of overhead. Are there things about it that require this overhead, or could it take an optimization run bringing it near the overhead level of this little one? If so, we might have an opportunity to boost parrot's usual speed by a fine degree.

Here are the benchmarks:

benchmark_1.imc
---
.sub _main
    $I0 = 50
again:
    unless $I0 goto quit
    savetop
    restoretop
    dec $I0
    goto again
quit:
    end
.end

benchmark_2.imc
---
.sub _main
    $I0 = 10
    newsub P0, .Sub, _other
again:
    unless $I0 goto quit
    saveall
    newsub P1, .RetContinuation, back
    invokecc
back:
    restoreall
    dec $I0
    goto again
quit:
    end
.end

.sub _other
    invoke P1
.end

benchmark_3.imc
---
.sub _main
    $I0 = 5
    newsub P0, .Sub, _other
again:
    unless $I0 goto quit
    saveall
    newsub P1, .Continuation, back
    invokecc
back:
    restoreall
    dec $I0
    goto again
quit:
    end
.end

.sub _other
    invoke P1
.end

And the exemplary patch:

Index: config/gen/config_h/config_h.in
===
RCS file: /cvs/public/parrot/config/gen/config_h/config_h.in,v
retrieving revision 1.20
diff -u -r1.20 config_h.in
--- config/gen/config_h/config_h.in 24 Dec 2003 14:54:16 - 1.20
+++ config/gen/config_h/config_h.in 16 Jan 2004 10:08:56 -
@@ -100,15 +100,8 @@
 /* typedef INTVAL *(*opcode_funcs)(void *, void *) OPFUNC; */

-#define FRAMES_PER_CHUNK 16
-
 /*
Default amount of memory to allocate in one whack */
 #define DEFAULT_SIZE 32768
-
-#define FRAMES_PER_PMC_REG_CHUNK FRAMES_PER_CHUNK
-#define FRAMES_PER_NUM_REG_CHUNK FRAMES_PER_CHUNK
-#define FRAMES_PER_INT_REG_CHUNK FRAMES_PER_CHUNK
-#define FRAMES_PER_STR_REG_CHUNK FRAMES_PER_CHUNK

 #define JIT_CPUARCH ${jitcpuarch}
 #define JIT_OSNAME ${jitosname}
Index: imcc/pcc.c
===
RCS file: /cvs/public/parrot/imcc/pcc.c,v
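The hand-rolled allocator Luke describes amounts to a free list: freed frames are chained together and handed back on the next allocation, so the hot path is a pointer swap rather than a trip through a general-purpose allocator. A minimal Python sketch of the idea (class and field names are mine, not Parrot's):

```python
class Frame:
    """A register frame; 'next' doubles as the free-list link."""
    __slots__ = ("regs", "next")
    def __init__(self):
        self.regs = [0] * 32
        self.next = None

class FrameAllocator:
    """Tiny free-list allocator: pop a recycled frame if one exists,
    otherwise construct a fresh one; free() just pushes onto the list."""
    def __init__(self):
        self.free_list = None

    def alloc(self):
        f = self.free_list
        if f is not None:
            self.free_list = f.next   # pop: O(1), no real allocation
            f.next = None
            return f
        return Frame()                # slow path: fresh frame

    def free(self, f):
        f.next = self.free_list       # push: O(1)
        self.free_list = f

pool = FrameAllocator()
a = pool.alloc()
pool.free(a)
b = pool.alloc()
assert a is b                         # the frame was recycled
```

The question Luke raises is exactly whether Parrot's general small object allocator has overhead beyond this push/pop core that could be optimized away.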
Vtables organization
PMCs use vtables for almost all their functionality *and* for stuff that in Perl5 terms is magic (or they should, AFAIK). E.g. setting the _ro property of a PMC (that supports it[1]) swaps in the Const$PMC vtable, where all vtable methods that would change the PMC throw an exception. Or: setting a PMC shared would swap in a vtable that locks e.g. internal aggregate state on access. That is, a non-shared PMC doesn't suffer any locking slowdown. Tieing will very likely swap in just another vtable, and so on.

The questions are:
- Where and how should we store these vtables?
- Are these PMC variants distinct types (with class_enum and name)?
- Or are these sub_types (BTW, what is vtable->subtype)? E.g. hanging off from the main vtable?

Comments welcome, leo

[1] This still needs more work: Real constant PMCs are allocated in a separate arena which isn't scanned during DOD, *but* all items that the PMC may refer to have to be constant too, including Buffers it may use. But swapping in the vtable is working.
Events and JIT
Event handling currently works for all run cores[1] except JIT. The JIT core can't use the schemes described below, but we could:

1) explicitly insert checks for whether events are to be handled, either 1a) everywhere, or 1b) in places like those described below under [1] c)
2) patch the native opcodes at these places with e.g. an int3 (SIG_TRAP, debugger hook) cpu instruction and catch the trap. Running the event handler (sub) from there should be safe, as we are in a consistent state in the run loop.
3) more ideas?

1) of course slows down execution of all JIT code; 2) is platform/architecture dependent, but JIT code is that anyway.

Comments welcome, leo

[1] a) Run cores with an opcode dispatch table get a new dispatch table, where all entries point to the event handling code.
b) The switch core checks at the beginning of the switch statement.
c) Prederefed run cores get the opcode stream patched, where backward branches and invoke or such[2] are replaced with event-check opcodes.

While a) and c) are run async from the event thread, this shouldn't cause problems, because (assuming atomic word access) either the old function table/opcode pointer is visible or already the new one; there is no inconsistent state. Events using a) or b) are handled instantly, while c) events get handled some (probably very short) time after they were scheduled.

[2] Explicit hints from the ops files, where to check events, would simplify that. E.g.

op event invoke()
op event_after sleep(..)
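Scheme [1] a) -- atomically swapping in a dispatch table whose entries all route through the event checker -- can be sketched with a toy interpreter loop (this is an illustration of the table-swap idea only, not Parrot's actual core; all names are mine):

```python
# Toy op-dispatch core. Real Parrot indexes a table of C function
# pointers; here the "table" is a list of Python callables.
events = []

def op_add(state):   state["acc"] += 1
def op_print(state): state["out"].append(state["acc"])

NORMAL_TABLE = [op_add, op_print]

def check_events(opnum, state):
    """Handle pending events, then run the op from the normal table."""
    while events:
        state["handled"].append(events.pop(0))
    NORMAL_TABLE[opnum](state)

# The event-table entries all route through check_events first.
EVENT_TABLE = [lambda s, n=n: check_events(n, s)
               for n in range(len(NORMAL_TABLE))]

state = {"acc": 0, "out": [], "handled": []}
program = [0, 0, 1]        # add, add, print

# The event thread "schedules" an event by swapping the whole table:
# a single pointer store, so the run loop sees either the old table
# or the new one -- never an inconsistent mix.
table = EVENT_TABLE
events.append("timer")

for opnum in program:
    table[opnum](state)

assert state["handled"] == ["timer"]   # event handled on next dispatch
assert state["out"] == [2]             # program semantics unchanged
```

After the handler drains the queue it could swap the normal table back in, so the per-op cost returns to zero when no events are pending.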
Re: Some namespace notes
Here's my proposal:

* Basics: Parrot uses nested hashes for namespaces (like perl does). The high-level language splits namespace strings using whatever its separator is ('::', '.' etc) to generate an array of strings for the namespace lookup.

* Relative roots: Namespace lookup starts from a 'root' namespace (think root directory). Here the P2 argument holds the root namespace to start the lookup from:

find_global P1, P2, ['global', 'namespace', 'hierarchy'], thingname

If it's null then the interpreter's default root namespace is used. This scheme allows chroot() style shifting of the effective root. (It's a key part of how the perl Safe module works, for example.)

* Per-language root: Each HLL could use a 'root' that's one level down from the true root. Using a directory tree for illustration:

/perl/Carp/carp       perl sees Carp at top level
/java/java/lang/...   java sees java at top level

* Backlinks:

/perl/main -.   main points back to perl's own root
           ^--' (so $main::main::main::foo works as it should)
/perl/parrot -.   parrot points back to the true root
             ^-'

* Accessing namespaces of other languages: Given the above, accessing the namespace of other languages is as simple as /perl/parrot/java/java/lang/String/..., eg $parrot::java::java::lang::String::CASE_INSENSITIVE_ORDER for perl, and parrot.perl.Carp.carp for Java (perhaps, I don't claim to know any Java).

* Summary:
- Nested hashes allow chroot() style shifting of the root.
- That requires the 'effective root' to be passed to find_global.
- Each HLL could have its own 'root' to avoid name clashes.
- Backlinks can be used to provide access to other HLL namespaces.
- This combination of unification (all in one tree) and isolation (each HLL has a separate root) offers the best of all worlds.

Tim.
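Tim's scheme maps naturally onto nested dicts. A rough sketch of the roots and backlinks (the function and key names are mine, not a proposed Parrot API):

```python
# Nested-dict namespace tree with per-language roots and backlinks.
root = {}
root["perl"] = {"Carp": {"carp": "<perl sub carp>"}}
root["java"] = {"java": {"lang": {"String":
    {"CASE_INSENSITIVE_ORDER": "<cmp>"}}}}

# Backlinks: perl's 'main' points at perl's own root, and 'parrot'
# points back at the true root (cycles are fine in a dict).
root["perl"]["main"] = root["perl"]
root["perl"]["parrot"] = root

def find_global(ns_root, path, name):
    """Walk 'path' down from ns_root, then look up 'name' there."""
    for seg in path:
        ns_root = ns_root[seg]
    return ns_root[name]

# Perl code sees Carp at top level (its effective root is /perl)...
assert find_global(root["perl"], ["Carp"], "carp") == "<perl sub carp>"

# ...and reaches Java's globals through the parrot backlink.
assert find_global(root["perl"],
                   ["parrot", "java", "java", "lang", "String"],
                   "CASE_INSENSITIVE_ORDER") == "<cmp>"

# The $main::main::main::foo-style self-reference also works:
assert root["perl"]["main"]["main"] is root["perl"]
```

Passing a different `ns_root` is exactly the chroot() style shift: a Safe-like sandbox just gets handed a fresh subtree as its effective root.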
Ops file hints
IMHO we need some more information in the ops files:

1) If an INT argument refers to a label/branch destination
2) For the event-checking code
3) For the safe run core

ad 1) e.g.

inline op eq(in INT, in INT, inconst INT) {
inline op eq(in INT, in INT, in_branch_const INT) {

The C<in_branch_const> translates during ops file mangling to C<inconst> *and* sets an appropriate flag in OpLib/core.pm, which is carried on into the op_info structure. Currently imcc just estimates if an opcode would branch and where the label is, which is bad (and error prone) - or more precisely, branching ops are known, but not all have an associated label.

ad 2) e.g.

op event invoke()
op event_after sleep()

would mean to check at or after that opcode if events are to be handled. This would be a flag (or 2) in the op_info.

ad 3) The safe run-core will very likely need to disallow opcodes per category, similar as in perldoc Opcode. e.g.

op :base_io print(...)

Comments welcome (and takers, if we agree)

leo
Re: [PATCHish] Food for thought about allocation
Luke Palmer wrote: I did a little benchmark test today that had some enlightening results.

First of all: did you compile Parrot optimized? Because:

I thought the problem might be in the small object allocator,

You actually have a similar thing, but mainly inlined. Object allocation and DOD need optimized compiles.

saveall
newsub P1, .RetContinuation, back
invokecc

You are creating *two* RetContinuations with that.

leo
Re: Some namespace notes
Tim Bunce [EMAIL PROTECTED] wrote: Here's my proposal: * Basics: Parrot uses nested hashes for namespaces (like perl does). * Relative roots: Namespace lookup starts from a 'root' namespace (think root directory). Here the P2 argument holds the root namespace to start the lookup from: find_global P1, P2, ['global', 'namespace', 'hierarchy'], thingname

Tim.

I like that, except: *again*, the above syntax sucks.

find_global P1, P2 ['global'; 'namespace'; 'hierarchy'; thingname]

P2 can be a namespace PMC or the interpreter itself.

find_global P3, P2 ['global'; 'namespace'; 'hierarchy']

returns another namespace, and ...

find_global P1, P3 [thingname]

is the same as the first. The original syntax would need heavy modifications in the assembler; the latter fits nicely.

leo
Re: Some namespace notes
On Fri, Jan 16, 2004 at 12:49:09PM +0100, Leopold Toetsch wrote:

Tim Bunce [EMAIL PROTECTED] wrote: Here's my proposal: * Basics: Parrot uses nested hashes for namespaces (like perl does). * Relative roots: Namespace lookup starts from a 'root' namespace (think root directory). Here the P2 argument holds the root namespace to start the lookup from: find_global P1, P2, ['global', 'namespace', 'hierarchy'], thingname

I like that, except: *again*, the above syntax sucks.

find_global P1, P2 ['global'; 'namespace'; 'hierarchy'; thingname]

P2 can be a namespace PMC or the interpreter itself.

find_global P3, P2 ['global'; 'namespace'; 'hierarchy']

returns another namespace, and ...

find_global P1, P3 [thingname]

is the same as the first. The original syntax would need heavy modifications in the assembler; the latter fits nicely.

Sure. Sounds good. (I'm not well placed to talk about syntax as I've not yet written any parrot code, though that may be about to change; it's the principles of a unified hierarchy, chroot, and backlinks that are important.)

Tim.
Re: Optimization brainstorm: variable clusters
On Thu, Jan 15, 2004 at 06:27:52PM -0500, Melvin Smith wrote: At 06:13 PM 1/15/2004 -0500, Melvin Smith wrote: At 10:02 PM 1/15/2004 +0100, Elizabeth Mattijsen wrote: At 15:51 -0500 1/15/04, Melvin Smith wrote: Comments/questions welcome. Why am I thinking of the register keyword in C? I have no idea and I can't see the relationship. :) I just realized my response sounded crass, and wasn't meant to be. I welcome comments, I just didn't understand what relation you were getting at. Feel free to point it out to me. The context: Jonathan was asking about importing constants at runtime and/or constant namespaces. Dan and I were discussing the issues and how routines with lots of package globals or constants would spend a significant part of their time retrieving symbols. Jonathan did not want compile time constants, Dan did not want importable constants that mutate the bytecode at runtime, so I was trying to come up with a compromise, ugly as it may be. Aren't constant strings going to be saved in a way that lets the address of the saved string be used to avoid string comparisons? (As is done for hash keys in perl5.) Perhaps that's already done. Then bytecode could 'declare' all the string constants it contains. The byteloader could merge them into the global saved strings pool and 'fixup' references to them in the bytecode. If the namespace lookup code knew when it was being given saved string pointers it could avoid the string compares as it walks the namespace tree. Maybe all that's been done. Here's an idea that builds on that: Perhaps a variant of a hash that worked with large integers (pointers) as keys could be of some use here. The namespace could be a tree of these 'integer hashes'. Most namespace lookups use constants which can be 'registered' in the unique string pool at byteloader time. To look up a non-constant string you just need to check if it's in the unique string pool and get that pointer if it is. If it's not then you know it doesn't exist anywhere.
If it is, you do the lookup using the address of the string in the pool. The JudyL functions (http://judy.sourceforge.net/) provide a very efficient 'integer hash'. Tim.
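Tim's unique-string-pool idea can be modeled with Python's own string interning. This is an analogy, not Parrot's implementation: sys.intern stands in for the "unique string pool", and object identity stands in for pointer comparison.

```python
# Sketch (assumption, not Parrot internals): once two equal strings are
# interned into a shared pool, they are the same object, so an identity
# (pointer) comparison replaces a character-by-character compare -- the
# trick Tim describes for constant namespace keys.
import sys

pool_a = sys.intern("some::namespace::key")
# Same contents built "at runtime" from pieces:
pool_b = sys.intern("some::" + "namespace" + "::key")

# Equal contents collapse to one shared object after interning:
assert pool_a is pool_b

# A table keyed on object identity stands in for the 'integer hash'
# (JudyL-style, keyed on pointers) from the proposal:
by_id = {id(pool_a): "the value"}
assert by_id[id(pool_b)] == "the value"
```

A non-constant key is handled the way Tim says: intern it first (one hash lookup into the pool), and if it is absent from the pool it cannot exist in any namespace.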
postgres lib
hello i noticed that harry jackson is writing a postgres interface to wrap the nci functions. i'd like him to show what he's written so far and give his comments -- guess i'm not the only one hoping to test it. LF
Re: Optimization brainstorm: variable clusters
At 9:30 AM +0100 1/16/04, Elizabeth Mattijsen wrote: At 19:52 -0500 1/15/04, Melvin Smith wrote: At 04:26 PM 1/15/2004 -0700, Luke Palmer wrote: I can see some potential problems to solve with regards to some languages where variables are dynamic and can be undefined, such as Perl6, but the optimization would certainly work for constants in all languages. The only problem with Perl6 would be if a global or package variable's address changed after it was stored in the register group at bytecode load time, (which could probably happen). Which is very hard not to happen as soon as you get into Exporter land. ;-( Well... sorta. A lot of that stuff's known at compile time. Anytime we cache something dynamic, we have to make sure the caches know about changes. I think that is where notifications might help. For constants it is easy. IMCC might say, this routine requires us to initialize at least 3 registers with a constant value, let's make it into a register block. This may be a premature optimization, but for certain cases I think it's pretty nifty. This smells like premature optimization to me for languages such as Perl[\d]. To some extent, yes. I just had a really nasty thought, and I think the compiler writers need to get Official Rulings on behavior. With perl, for example, it's distinctly possible that this: our $foo; # It's a global $foo = 12; if ($foo > 10) { print $foo; } will require fetching $foo's PMC out of the global namespace three times, once for each usage. I don't know offhand if this is how perl 5 works (I think it might be) and we should check for perl 6, python, and ruby. This is mainly because of the possibility of tied or overridden namespaces, which would argue for a refetch on each use. *Not* refetching is a perfectly valid thing, and not specifying is also perfectly valid, but we need to check. -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Optimization brainstorm: variable clusters
Tim Bunce [EMAIL PROTECTED] wrote: Aren't constant strings going to be saved in a way that lets the address of the saved string be used to avoid string comparisons? Constant strings get an entry in the constant_table. So when comparing 2 constant strings of the *same* code segment, the strings differ if their addresses differ. (As is done for hash keys in perl5.) Perhaps that's already done. Not yet, but it's worth a look. The byteloader could merge them into the global saved strings pool and 'fixup' references to them in the bytecode. That's not possible generally. E.g. eval()ing a piece of code with varying string constants would grow the global string constants forever. Perhaps a variant of a hash that worked with large integers (pointers) as keys could be of some use here. That doesn't play well with dynamic code AFAIK. Namespace keys can be string vars too. The JudyL functions (http://judy.sourceforge.net/) provide a very efficient 'integer hash'. I had a look at that some time ago, but the internals are horribly complex and it was leaking memory too. Tim. leo
Re: cvs commit: parrot/src dynext.c extend.c
Dan Sugalski [EMAIL PROTECTED] wrote: +++ nci.pmc 16 Jan 2004 13:29:52 - 1.22 +STRING* name () { +return SELF->vtable->whoami; All classes inherit the C<name> method from default.pmc. Did it not work without this addition? leo
Re: The todo list
At 12:45 PM -0500 1/15/04, Dan Sugalski wrote: What I'd like is for a volunteer or two to manage the todo queue. Nothing fancy, just be there to assign todo list items to the folks that volunteer, make sure they're closed out when done, and reassign them if whoever's handling a task needs to bail for whatever reason. Okay, I've two volunteers, Dave Pippenger and Stephane Peiry. (Make sure you get me your perl.org logins if you've not done so yet (I've not checked (Wheee, parenthesis!))) When Robert's back from vacation we'll get you installed with access to the RT queues for parrot and see what we can do. In the mean time, if anyone else has todo list items, send them (*ONE* per e-mail!) to bugs-parrot at bugs6.perl.org to get 'em in the queue and we'll start sorting them out from there. If we're lucky and have sufficient web luck we might even get 'em into a web-accessible TODO list (so make sure the subjects are descriptive too!) -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Optimization brainstorm: variable clusters
At 3:51 PM -0500 1/15/04, Melvin Smith wrote: While sitting on IRC with Dan and Jonathan discussing how to optimize a certain construct involving how we handle globals/package variables, etc., I came to the conclusion that it would be valuable to not have to fetch each and every global, lexical or package variable by name, individually, but instead fetch them in clusters (4-16 at a time). We already have register frames which are saved and restored very efficiently. Two things: 1) Lexicals should be reasonably fast, as they're integer indexable in most cases. (The only time we *need* hashlike access is when we're doing symbolic lookup, which is pretty uncommon. Downright impossible for many languages) 2) I can easily see having something equivalent to the lookback op for register stacks. Won't help for ints and floats, but it'll work fine for PMCs and strings. -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Some namespace notes
At 12:49 PM +0100 1/16/04, Leopold Toetsch wrote: Tim Bunce [EMAIL PROTECTED] wrote: Here's my proposal: * Basics: Parrot uses nested hashes for namespaces (like perl does). * Relative roots: Namespace lookup starts from a 'root' namespace (think root directory). Here the P2 argument holds the root namespace to start the lookup from: find_global P1, P2, ['global', 'namespace', 'hierarchy'], thingname I like that except: *again* above syntax sucks. find_global P1, P2 ['global'; 'namespace'; 'hierarchy'; thingname ] No. The thing will be a separate parameter. The original syntax would need heavy modifications in the assembler, the latter fits nicely. We can cope. The assembler needs a good kick with regards to keyed stuff anyway, I expect, and we're going to need this for constructing keys at runtime, something we've not, as yet, addressed. -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Unicode, internationalization, C++, and ICU
At 10:40 AM +0100 1/16/04, Michael Scott wrote: Maybe we can use someone else's solution... http://lists.ximian.com/archives/public/mono-list/2003-November/016731.html Could be handy. We really ought to detect a system-installed ICU and use that rather than our local copy at configure time, if it's of an appropriate version. That'd at least avoid having two copies, and potentially get us some system-wide runtime memory savings. - Original Message - From: Dan Sugalski [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Thursday, January 15, 2004 8:09 PM Subject: Unicode, internationalization, C++, and ICU Now, assuming there's still anyone left reading this message... We've been threatening to build ICU into parrot, and it's time for that to start happening. Unfortunately there's a problem--it doesn't work right now. So, what we need is some brave soul to track ICU development and keep us reasonably up to date. What I'd really like is: 1) ICU building and working 2) ICU not needing any C++ I've done some testing, and I hate to be the bearer of bad news but I believe we have something of a problem. :-( The configure script turns out to be a shell script which, unless I'm mistaken, means we're currently unable to build ICU anywhere we don't have bash or similar. Win32 for starters, which is where I'm testing. A possible solution might be to re-write the configure script in Perl - though we'd have to keep it maintained as we do ICU updates. Another one, for Win32 at least, is that we *might* be able to use UNIX Services For Win32 and run configure under that, generate a Win32 makefile and just copy it in place with the configure script. Less portable to other places with the same problem, though, and again we have to maintain it as we update ICU. There is also a problem with the configure stage on Win32, but that's an aside until the above issue is sorted out. 
I also gave it a spin in cygwin, where the configure script for ICU runs OK, but there's no C++ compiler so it doesn't get built. Thoughts? Jonathan -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Some namespace notes
At 11:00 PM -0800 1/15/04, Jeff Clites wrote: A key part of my argument (and it's fine if you understood this, and disagree--just wanted to make sure that it was clear) is that I think we shouldn't try to do any sort of cross-language unification. I saw that and wasn't really looking to deal with it, but I should've. I think we should have the potential for cross-language unification. It shouldn't be obligatory, but it should be easy, and I think we're going to see perl 5, perl 6, ruby, and python at least sharing a global namespace once we get things going sufficiently. -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Optimization brainstorm: variable clusters
Simon Cozens [EMAIL PROTECTED] wrote: [EMAIL PROTECTED] (Dan Sugalski) writes: if ($foo > 10) { print $foo; } This is mainly because of the possibility of tied or overridden namespaces, which would argue for a refetch on each use. No, come on, Dan. It's far worse than that. It'll be possible, from Perl-space, to override either < or print, and it should well be possible for them to, for instance, tie their operands. Wee! That's not the problem. But if the overridden < op (or the print) changes $foo's namespace on the fly, then a refetch would be necessary. I'd prefer to have a hint: our $foo is volatile; The normal case would be to fetch $foo exactly once - before any loop. Honestly an overridden op could insert new rules in the code and recompile everything. If we have to always check for such nasty things, then we can forget all performance and optimizations. leo
Re: Some namespace notes
At 11:07 AM + 1/16/04, Tim Bunce wrote: Here's my proposal: I like it all except for the backlink part, and that only because I'm not sure the names are right. I'm tempted to use reasonably unavailable characters under the hood (yeah, I'm looking at NUL (ASCII 0) and maybe SOH (ASCII 1) for language root and global root). Otherwise it looks good, and I think it's the way to be going. Languages can have the option of sharing a common root if they so choose, and set their search paths, since we're going to allow that sort of thing with nested namespaces. The default global space can be a two-level nest with the language level coming before the generic one in the search space. -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Optimization brainstorm: variable clusters
At 2:37 PM + 1/16/04, Simon Cozens wrote: [EMAIL PROTECTED] (Dan Sugalski) writes: if ($foo > 10) { print $foo; } This is mainly because of the possibility of tied or overridden namespaces, which would argue for a refetch on each use. No, come on, Dan. It's far worse than that. It'll be possible, from Perl-space, to override either < or print, and it should well be possible for them to, for instance, tie their operands. Wee! Yeah, but we still control what they get. We've not gone so completely mad that we hand in strings and code snippets for runtime evaluation... -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Numeric formatting
At 12:21 AM + 1/16/04, Tim Bunce wrote: On Thu, Jan 15, 2004 at 05:15:05PM -0500, Dan Sugalski wrote: At 2:39 PM -0500 1/15/04, Dan Sugalski wrote: At 8:31 PM +0100 1/15/04, Michael Scott wrote: Is this relevant? http://oss.software.ibm.com/icu/userguide/formatNumbers.html I'm still not clear in my mind what the plan is with regard to ICU. Is it intended eventually to be: a) an always-there part of parrot, or b) just a sometimes-there thing that gets linked in if you mess with unicode? A) is the case. I didn't realize that the ICU library did numeric formatting. And then I realized, somewhat belatedly, that this won't necessarily work. Won't work, or work but potentially have too much overhead? Too much overhead. It means yanking in ICU and initializing parrot's Unicode subsystem just to do numeric formatting, which seems like a bit of overkill just to make: format foo, 9, .00 stick 0009.00 in foo. (Except for the case where it ought to stick 0009,00, I suppose) I think I'd rather not have to yank things into Unicode just to format numbers. I'm not quite sure what you mean here. Well, what I'm trying hard to do is make Unicode as optional for Parrot as any other encoding. While I know that in some likely common cases (perl 6) it'll be required, I'd prefer it not be mandatory everywhere. For code that requires unicode it certainly makes sense to yank it in, but I'm not sure numeric formatting really does. Using ICU in this case seems more a matter of convenience than need. OTOH, it brings in locale issues, and Jarkko's warned me about those often enough, though I remain in blissful ignorance about them for the moment. :) -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
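Dan's point that zero-padded numeric formatting shouldn't require ICU can be illustrated with plain string formatting. This is a Python sketch for illustration only; the function name format_num and the naive separator swap are assumptions, and the swap is exactly the kind of locale shortcut Jarkko's warnings are about.

```python
# Sketch: zero-padded numeric formatting with no Unicode machinery.
# The "0009,00" variant is the locale issue Dan mentions; here it is a
# naive decimal-separator substitution, which real locale handling
# (ICU's territory) does far more carefully.
def format_num(value, width=7, decimals=2, decimal_sep="."):
    s = f"{value:0{width}.{decimals}f}"   # e.g. 9 -> "0009.00"
    return s.replace(".", decimal_sep) if decimal_sep != "." else s

assert format_num(9) == "0009.00"
assert format_num(9, decimal_sep=",") == "0009,00"
```

So the simple case really is just convenience; it's the locale-aware cases that would justify pulling in a full formatting library.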
Re: cvs commit: parrot/src dynext.c extend.c
At 3:07 PM +0100 1/16/04, Leopold Toetsch wrote: Dan Sugalski [EMAIL PROTECTED] wrote: +++ nci.pmc 16 Jan 2004 13:29:52 - 1.22 +STRING* name () { +return SELF->vtable->whoami; All classes inherit the C<name> method from default.pmc. Did it not work without this addition? Ah, this was a junk leftover thing. I was wedging in debugging information, so I needed a local copy, and when I was cleaning up my tree I just ripped out the debug code and not the mainline code. This chunk can get tossed. -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Optimization brainstorm: variable clusters
At 4:02 PM +0100 1/16/04, Leopold Toetsch wrote: Honestly an overridden op could insert new rules in the code and recompile everything. If we have to always check for such nasty things, then we can forget all performance and optimizations. That, at least, we don't have to worry about. None of the existing languages (well, Tcl might, but I don't care there) require going quite so mad, and if Larry tries, well... we have ways to deal with that. :) -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: [PATCHish] Food for thought about allocation
At 3:34 AM -0700 1/16/04, Luke Palmer wrote: I dearly wanted real Continuations to be faster, because I loathe RetContinuations. They suck. Honest. There really ought to be no difference between a regular continuation and a return continuation. (I certainly can't think of a reason they should be different) But the most enlightening thing is how much things improved by writing my own small object allocator -- even a quick and dirty one. Last time I was fishing through that subsystem, it seemed to have a lot of overhead. Are there things about it that require this overhead, or could it take an optimization run bringing it near the overhead level of this little one? If so, we might have an opportunity to boost parrot's usual speed by a fine degree. True enough. It's certainly worth poking around with, and since this is all (pleasantly) internal stuff, it's a good place for experimentation and research without affecting user visible behaviour. I fully expect to tweak the object allocator and frame sizes as we get closer to release time. I looked at the patch and I don't think this is an issue, but we should probably have separate allocation functions for different sized objects, even if they're just macro'd to be the generic allocator function with a hard-coded parameter. That'll let us tweak the allocation system as we go along. Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
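Dan's closing suggestion -- a generic allocator with size-specific entry points layered on top -- can be sketched abstractly. This is a Python model for illustration, not Parrot's allocator; the class and the 32/64-byte size classes are invented for the example, and the lambdas stand in for the "macro'd" C functions.

```python
# Sketch: one generic free-list allocator, with per-size entry points
# that are just the generic one with the size baked in. Callers use the
# named entry points, so the underlying scheme can be swapped later
# without touching them -- Dan's point about tweakability.
class SmallObjectAllocator:
    def __init__(self):
        self.free_lists = {}              # size -> recycled slots

    def allocate(self, size):
        free = self.free_lists.setdefault(size, [])
        return free.pop() if free else bytearray(size)

    def release(self, size, obj):
        self.free_lists[size].append(obj)

alloc = SmallObjectAllocator()

# The "macro'd" size-specific allocation functions:
allocate_pmc    = lambda: alloc.allocate(32)   # hypothetical PMC size
allocate_string = lambda: alloc.allocate(64)   # hypothetical string header size

p = allocate_pmc()
alloc.release(32, p)
assert allocate_pmc() is p   # recycled from the free list, no fresh allocation
```

Recycling through a per-size free list is the usual way a quick small-object allocator beats a general-purpose one, which matches what Luke observed in his benchmark.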
Re: Q: thread function
At 9:13 AM +0100 1/16/04, Leopold Toetsch wrote: Should a thread function be always non-prototyped or do we allow prototyped ones too? On preparing a thread, the parameters of the thread function are copied/cloned into the new thread interpreter. I'm not sure I want to do it that way. (Actually, I think I don't, at least in some cases) Give me a bit -- I'm working on the proposed thread spec, and it gets this low-level. -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Ops file hints
At 12:15 PM +0100 1/16/04, Leopold Toetsch wrote: IMHO we need some more information in the ops files: 1) If an INT argument refers to a label/branch destination 2) For the event-checking code 3) For the safe run core ad 1) e.g. inline op eq(in INT, in INT, inconst INT) { inline op eq(in INT, in INT, in_branch_const INT) { Works, go for it. ad 2) e.g. op event invoke() op event_after sleep() would mean checking at or after that opcode whether events are to be handled. This would be a flag (or 2) in the op_info. Works as well. (Though we need to change sleep to wait on an event to wake up, but that's separate) ad 3) The safe run-core will very likely need to disallow opcodes per category similar as in perldoc Opcode. Yep. I'd also like to have the ability to add in some other parameters to the ops file, so that when we're digging in we could wedge in callouts that by default are ignored, that'd be great. -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: postgres lib
At 2:59 PM +0100 1/16/04, LF wrote: hello i noticed that harry jackson is writing a postgres interface to wrap the nci functions. i'd like him to show what he's written so far and give his comments guess i'm not the only one hoping to test it :) Harry, if you're OK with this, go ahead and check it in. If you don't have checkin privs, grab a perl.org account and we'll get you some. -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Unicode, internationalization, C++, and ICU
On Jan 15, 2004, at 3:33 PM, Jonathan Worthington wrote: - Original Message - From: Dan Sugalski [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Thursday, January 15, 2004 8:09 PM Subject: Unicode, internationalization, C++, and ICU Now, assuming there's still anyone left reading this message... We've been threatening to build ICU into parrot, and it's time for that to start happening. Unfortunately there's a problem--it doesn't work right now. So, what we need is some brave soul to track ICU development and keep us reasonably up to date. What I'd really like is: 1) ICU building and working 2) ICU not needing any C++ I've done some testing, and I hate to be the bearer of bad news but I believe we have something of a problem. :-( The configure script turns out to be a shell script which, unless I'm mistaken, means we're currently unable to build ICU anywhere we don't have bash or similar. Win32 for starters, which is where I'm testing. This page gives instructions for building on Windows--it doesn't seem to require installing bash or anything: http://oss.software.ibm.com/cvs/icu/~checkout~/icu/readme.html#HowToBuildWindows I assume that on Windows you don't need to run the configure script. JEff
Re: Unicode, internationalization, C++, and ICU
snip This page gives instructions for building on Windows--it doesn't seem to require installing bash or anything: http://oss.software.ibm.com/cvs/icu/~checkout~/icu/readme.html#HowToBuildWindows I assume that on Windows you don't need to run the configure script. Thanks for that, I'll work on and test a patch for the Configure script to do this on Win32 later. It won't help with any compiler other than MSVC++, but it certainly helps. Thanks, Jonathan
Re: Ops file hints
Dan Sugalski [EMAIL PROTECTED] wrote: At 12:15 PM +0100 1/16/04, Leopold Toetsch wrote: op event_after sleep() Works as well. (Though we need to change sleep to wait on an event to wake up, but that's separate) I thought so too. It's mainly a workaround until we have all the events done, albeit a plain sleep should be easy. I'd also like to have the ability to add in some other parameters to the ops file, so if when we're digging in we could wedge in callouts that by default are ignored, that'd be great. Takers wanted. It's mainly a Perl task mangling the ops files. If more info is needed, please just ask. leo
Re: Numeric formatting
Well, that's sort of how I imagined this would play out. And from a code writer's ASCII-centric outlook I agree it makes sense to have unicode as a special case brought in only when the pesky data requires it. What about using the ICU API to make things easier when the formatter changes? Mike On 16 Jan 2004, at 16:18, Dan Sugalski wrote: At 12:21 AM + 1/16/04, Tim Bunce wrote: On Thu, Jan 15, 2004 at 05:15:05PM -0500, Dan Sugalski wrote: At 2:39 PM -0500 1/15/04, Dan Sugalski wrote: At 8:31 PM +0100 1/15/04, Michael Scott wrote: Is this relevant? http://oss.software.ibm.com/icu/userguide/formatNumbers.html I'm still not clear in my mind what the plan is with regard to ICU. Is it intended eventually to be: a) an always-there part of parrot, or b) just a sometimes-there thing that gets linked in if you mess with unicode? A) is the case. I didn't realize that the ICU library did numeric formatting. And then I realized, somewhat belatedly, that this won't necessarily work. Won't work, or work but potentially have too much overhead? Too much overhead. It means yanking in ICU and initializing parrot's Unicode subsystem just to do numeric formatting, which seems like a bit of overkill just to make: format foo, 9, .00 stick 0009.00 in foo. (Except for the case where it ought to stick 0009,00, I suppose) I think I'd rather not have to yank things into Unicode just to format numbers. I'm not quite sure what you mean here. Well, what I'm trying hard to do is make Unicode as optional for Parrot as any other encoding. While I know that in some likely common cases (perl 6) it'll be required, I'd prefer it not be mandatory everywhere. For code that requires unicode it certainly makes sense to yank it in, but I'm not sure numeric formatting really does. Using ICU in this case seems more a matter of convenience than need. OTOH, it brings in locale issues, and Jarkko's warned me about those often enough, though I remain in blissful ignorance about them for the moment.
:) -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Some namespace notes
Dan Sugalski [EMAIL PROTECTED] wrote: At 12:49 PM +0100 1/16/04, Leopold Toetsch wrote: find_global P1, P2 ['global'; 'namespace'; 'hierarchy'; thingname ] No. The thing will be a separate parameter. Why? Nested keys get you down the key chain until there is no more key. This can be a variable (above case) or another namespace PMC. Above lookup can be totally cached. When thingname is separate at least 2 hash lookups are necessary. Or if a separate thingname is there just append it - should be equivalent. The original syntax would need heavy modifications in the assembler, the latter fits nicely. We can cope. The assembler needs a good kick with regards to keyed stuff anyway, I expect, and we're going to need this for constructing keys at runtime, something we've not, as yet, addressed. We have:

$ cat k.pasm
new P1, .PerlHash
new P2, .PerlString
set P2, "hello\n"
set P1["b"], P2
new P3, .PerlHash
set P3["a"], P1
set P5, P3["a"; "b"]   # HoH access by key constants
print P5
new P6, .Key
set P6, "a"
new P7, .Key
set P7, "b"
push P6, P7
set P5, P3[P6]   # fully dynamic HoH access
print P5
end
$ parrot k.pasm
hello
hello

leo
[perl #24922] Need Ops file metadata/hints system
# New Ticket Created by Dan Sugalski # Please include the string: [perl #24922] # in the subject line of all future correspondence about this issue. # URL: http://rt.perl.org/rt3/Ticket/Display.html?id=24922 We need to revamp the ops file parsers to allow easier addition of ops metadata and parameter hints. -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
[PATCH] Fix imcpasm tests on Win32
Hi, The attached patch fixes a problem in imcc/TestCompiler.pm which was causing all imcpasm tests to fail on Win32. Jonathan imcpasmtests.patch Description: Binary data
Re: postgres lib
Dan Sugalski wrote: At 2:59 PM +0100 1/16/04, LF wrote: hello i noticed that harry jackson is writing a postgres interface to wrap the nci functions. i'd like him to show what he's written so far and give his comments guess i'm not the only one hoping to test it Just something to bear in mind: Please do not assume anything about this code with regards to its suitability as a driver etc. I only did it because Dan wanted something to ease the pain of using libpq via Parrot. It does this in _areas_ but has not really been tested yet; it's a bit of a mess as it stands at the moment but works for selects. It is probably OK to use it to play with or learn IMCC and to see how to use the NCI interface, but none of it will become production code! Some of the IMCC gurus may have some questions as to its suitability to learn anything from ;-) :) Harry, if you're OK with this, go ahead and check it in. If you don't have checkin privs, grab a perl.org account and we'll get you some. I have an account, username hjackson. Where is it going to go in the tree? I will try and get it tidied up a bit and check it in. Do you want insert/update to be fully working as well? For those of you thinking it's a silly question ;-) trust me, it's not as straightforward an operation as you might think ;-). I believe you want this for a demo (your post 12/02/03 18:35 ) What sort of thing is it for? Harry
Re: Events and JIT
On Fri, 16 Jan 2004, Dan Sugalski wrote: 2) Those that explicitly check for events ... Ops like spin_in_event_loop (or whatever we call it) or checkevent are in category two. They check events because, well, that's what they're supposed to do. Compilers should emit these with some frequency, though it's arguable how frequent they ought to be. I don't understand that part. Why the compiler? If the high-level code doesn't do anything with the events, then there's no point in checking. If it does use the events, then shouldn't developers call the event loop explicitly? Sincerely, Michal J Wallace Sabren Enterprises, Inc. - contact: [EMAIL PROTECTED] hosting: http://www.cornerhost.com/ my site: http://www.withoutane.com/ --
cygwin link failure
Hi, On cygwin, the final link fails with the following error:- gcc -o parrot.exe -s -L/usr/local/lib -g imcc/main.o blib/lib/libparrot.a -lcrypt blib/lib/libparrot.a(io_unix.o)(.text+0x87e): In function `PIO_sockaddr_in': /home/Jonathan/parrot_test/io/io_unix.c:468: undefined reference to `_inet_pton' inet_pton has not yet been implemented in cygwin, but it is being worked on... http://win6.jp/Cygwin/ Jonathan
Re: Vtables organization
At 11:53 AM +0100 1/16/04, Leopold Toetsch wrote: PMCs use Vtables for almost all their functionality *and* for stuff that in Perl5 terms is magic (or they should AFAIK). E.g. setting the _ro property of a PMC (that supports it[1]) swaps in the Const$PMC vtable, where all vtable methods that would change the PMC throw an exception. Or: setting a PMC shared would swap in a vtable that locks e.g. internal aggregate state on access. That is, a non-shared PMC doesn't suffer any locking slowdown. Tying will very likely swap in just another vtable and so on. The questions are: - Where and how should we store these vtables? - Are these PMC variants distinct types (with class_enum and name) - Or are these sub_types (BTW what is vtable-subtype)? E.g. hanging off from the main vtable? I was going to go on about a few ways to do this, but after I did I realized that only one option is viable. So, let's try this on for size: Vtables are chained. That means each vtable has a link to the next in the chain. It *also* means that each call into a vtable function has to pass in a pointer to the vtable the call came from so calls can be delegated properly. If we don't want this to suck down huge amounts of memory it also means that the vtable needs to be split into a vtable header and vtable function table body. Downside there is that we have an extra parameter (somewhat pricey) to all the vtable functions. -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
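The chained-vtable scheme Dan describes can be sketched in Python. This is a model for illustration only, not Parrot's C structs: the method names set_value/get_value and the dict-as-PMC representation are invented for the example; what it shows is the chain link and the delegation that the extra vtable parameter enables.

```python
# Sketch: each vtable layer holds a link to the next one in the chain,
# and delegates calls downward. A read-only layer (the Const variant
# swapped in by the _ro property) intercepts mutators and delegates reads.
class VTable:
    def __init__(self, next_vtable=None):
        self.next = next_vtable

    def set_value(self, pmc, value):       # base layer: really store it
        pmc["value"] = value

    def get_value(self, pmc):
        return pmc["value"]

class ConstVTable(VTable):
    def set_value(self, pmc, value):       # mutator: refuse
        raise TypeError("PMC is read-only")

    def get_value(self, pmc):              # reader: delegate down the chain
        return self.next.get_value(pmc)

pmc = {"value": 1}
plain = VTable()
ro = ConstVTable(next_vtable=plain)        # the "vtable swap": plain -> ro

assert ro.get_value(pmc) == 1
try:
    ro.set_value(pmc, 2)
except TypeError:
    pass
assert pmc["value"] == 1                   # mutation was blocked
```

A shared (locking) or tied variant would be another layer of the same shape, which is why chaining covers all three cases Leo lists.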
Re: Ops file hints
At 5:58 PM +0100 1/16/04, Leopold Toetsch wrote: Dan Sugalski [EMAIL PROTECTED] wrote: I'd also like to have the ability to add in some other parameters to the ops file, so if when we're digging in we could wedge in callouts that by default are ignored, that'd be great. Takers wanted. Its mainly a Perl task mangling the ops files. If more info is needed, please just ask. As you probably noticed, I threw a bug/todo item in for this. :) -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
RE: Events and JIT
Leopold Toetsch wrote: Event handling currently works for all run cores[1] except JIT. The JIT core can't use the schemes described below, but we could: 2) Patch the native opcodes at these places with e.g. an int3 (SIGTRAP, debugger hook) cpu instruction and catch the trap. Running the event handler (sub) from there should be safe, as we are in a consistent state in the run loop. I don't think that bytecode-modifying versions should fly; they're not threadsafe, and it would be nice to write-protect the instruction stream to avert that attack vector. 1) explicitly insert checks, if events are to be handled 1a) everywhere or 1b) in places like described below under [1] c) I like this (1b). With the JIT, an event check could be inlined to 1 load and 1 conditional branch to the event dispatcher, yes? (So long as interp is already in a register.) If that's done before blocking and at upward branches, the hit probably won't be a killer for most code. For REALLY tight loops (i.e., w/o branches or jumps, and w/ an op count less than a particular threshold), maybe unroll the loop a few times and then still check on the upward branch. Those branches will almost always fall straight through, so while there will be some load on the platform's branch prediction cache and a bit of bloat, there shouldn't be much overhead in terms of pipeline bubbles. The event-ready word (in the interpreter, presumably) will stay in the L1 or L2 cache, avoiding stalls. No, it's not zero-overhead, but it's simple and easy enough to do portably. Crazy platform-specific zero-overhead schemes can come later as optimizations. -- Gordon Henriksen IT Manager ICLUBcentral Inc. [EMAIL PROTECTED]
Re: Some namespace notes
I've used non-hierarchical file systems in the distant past, and it wasn't pleasant. I think aliases (symlinks) work much better in a hierarchy. So do inner packages, modules, and classes, which we plan to have in Perl 6. And package aliasing will be the basis for allowing different versions of the same module to coexist. And if Parrot makes people put /perl/parrot/java on the front of Java names, the first thing people will do is to alias them all to /java. Larry
Re: Events and JIT
On Fri, 16 Jan 2004, Dan Sugalski wrote: I don't understand that part. Why the compiler? Because we don't have the sort of control of the async environment that hardware does to deal with interrupts. And, realistically, all code has to deal with the possibility of interrupts. Even if they aren't doing any IO at all they're still potentially watching for keyboard (or other OS-initiated) breaks/interrupts. I see your point. In Python if you press ^C, it should raise a KeyboardInterrupt exception. But rather than always monitoring for that, I'd want to register a listener and then have parrot handle the polling for me. Maybe by default, parrot does the check every N instructions... And then you could turn that off if you wanted more speed. Well... There's the issue of signals, which is the big one. If we could skip signals, that'd be great, but we can't, even on systems that don't do them in the true Unix-y sense. Windows programs should respond to breaks from the keyboard (or close-window requests in a terminal-esque environment if we build one) and have a chance to shut down cleanly, so... that's an event. This is probably a dumb question but: what if signals threw exceptions instead? I mean, they're pretty rare, aren't they? They seem like a completely different kind of thing than watching a mouse or socket... Different because signals have nothing to do with the program itself but come entirely from the outside. (Whereas with the regular events, the data comes from outside but the choice to listen for the data was made inside the program.) Sincerely, Michal J Wallace Sabren Enterprises, Inc. - contact: [EMAIL PROTECTED] hosting: http://www.cornerhost.com/ my site: http://www.withoutane.com/ --
RE: Events and JIT
Michal, But rather than always monitoring for that, I'd want to register a listener and then have parrot handle the polling for me. This is precisely what's being discussed. This is probably a dumb question but: what if signals threw exceptions instead? I'd hope that the event handler for a signal event could elect to throw an exception; it could even be the default. But the exception has to get into the thread somehow-- exceptions don't autonomously happen, and they require considerable cooperation from the thread on which the exception occurs. High-priority events are the mechanism through which the code that will throw the exception can interrupt normal program execution. -- Gordon Henriksen IT Manager ICLUBcentral Inc. [EMAIL PROTECTED]
Re: Events and JIT
Dan Sugalski [EMAIL PROTECTED] wrote: At 11:38 AM +0100 1/16/04, Leopold Toetsch wrote: Event handling currently works for all run cores[1] except JIT. What I'd planned for with events is a bit less responsive than the system you've put together for the non-JIT case, and I think it'll be OK generally speaking. Ops fall into three categories: 1) Those that don't check for events 2) Those that explicitly check for events 3) Those that implicitly check for events Yep, those are the cases. I think I have boiled that scheme down to no cost for non-JIT run cores[1], that is, in the absence of events there is no overhead for event checking. Event delivery (which I consider rare in terms of CPU cycles) takes a bit more instead - but not much. But the JIT core has to deal with event delivery too. So we have to decide which JITted ops are 3) - (case 2) the explicit check op is already available, that's no problem - we need hints for 3). Ops in the third category are a bit trickier. Anything that sleeps or waits should spin on the event queue Ok, the latter is the simple part - all IO or event related ops. But the problem remains: What about the loop[2] of mops.pasm? Only integers in registers, running at one Parrot op per CPU cycle. The big thing to ponder is which ops ought to go in category three. I can see the various invoke ops doing it, but beyond that I'm up in the air. Yes. First: do we guarantee timely event handling in highly optimized loops like the one in mops.pasm? Can we use schemes like my proposal of using the int3 x86 instruction... leo [1] the switched core currently checks after the switch statement, but it's not simple to optimize that [2]
jit_func+116: sub %edi,%ebx
jit_func+118: jne 0x81c73a4 <jit_func+116>
Re: Problem during make test
Chromatic [EMAIL PROTECTED] wrote: Yes, that's better. (Upgrading glibc didn't help -- I was worried that this was an NPTL issue that Parrot couldn't fix.) Cool. Now it hangs on t/pmc/timer: 0x10090b30 in Parrot_del_timer_event (interpreter=0x10273e88, Ah yep. When committing the first (trial) fix, I thought about such a problem, which is related: - if it seems to hang on a condition variable (still AFAIK: it shouldn't) - but anyway - it could depend on objects that need destruction, like a timer event, so ... I moved killing the event loop a bit further down in the interpreter destroy sequence. By the time that point is reached, timers should be removed from the queue. HTH, and thanks for your valuable feedback in tracking things down. -- c leo
Re: Events and JIT
Michal Wallace [EMAIL PROTECTED] wrote: On Fri, 16 Jan 2004, Dan Sugalski wrote: interrupts. Even if they aren't doing any IO at all they're still potentially watching for keyboard (or other OS-initiated) breaks/interrupts. I see your point. In Python if you press ^C, it should raise a KeyboardInterrupt exception. But rather than always monitoring for that, I'd want to register a listener Ahem: that's what we are talking about. There is a listener already running, that's the event thread. It currently doesn't do much, but it listens e.g. for a timer event, that is, it waits on a condition. This doesn't take any CPU time during waiting, except for a bit of overhead when the kernel awakens that thread and it executes (for a very short time). So this event thread sees: Oh, the kernel just notified me that the cookie for interpreter #5 is ready, so I'll tell that interpreter. That's what the event thread is doing. Now interpreter #5 - currently busy running in a tight loop - doesn't see that his cookie is ready. *If* this interpreter checked at every step whether there is a cookie, it would be a big slowdown. To minimize the overhead, the event thread throws a big piece of wood in front of the fast-running interpreter #5, which finally, after a bit of stumbling, realizes: Oh, my cookie has arrived. This is probably a dumb question but: what if signals threw exceptions instead? We will (AFAIK) convert signals to events, which dispatch further. I mean, they're pretty rare aren't they? Async is the problem. Michal J Wallace leo
Re: Events and JIT
Gordon Henriksen [EMAIL PROTECTED] wrote: Leopold Toetsch wrote: 2) Patch the native opcodes at these places with e.g. int3 I don't think that bytecode-modifying versions should fly; they're not threadsafe, Why? The bytecode is patched by a different thread *if* an event is due (which in CPU cycles is rare). And I don't see a thread safety problem. The (possibly different) CPU reads an opcode and runs it. Somewhere in the meantime, the opcode at that memory position changes to the byte sequence 0xCC (on Intel: int3). One byte changes, the CPU executes the trap or not (of course changing that memory position is assumed to be atomic, which AFAIK works on i386) - but next time in the loop the trap is honored. ... and it would be nice to write-protect the instruction stream to avert that attack vector. We did protect it, so we can un- and reprotect it, that's not the problem. 1b) in places like described below under [1] c) I like this (1b). With the JIT, an event check could be inlined to 1 load and 1 conditional branch to the event dispatcher, yes? Yep. That's the plain average slower case :) It's a fallback if there are no better and faster solutions. (So long as interp is already in a register.) Arghh, damned i386 with *zero* registers, where zero is around 4 (usable, general... ) ;) So no interpreter in registers here - no. It's at least 3(?) cycles + branch prediction overhead, so a lot compared to zero overhead... ... If that's done before blocking and at upward branches, the hit probably won't be a killer for most code. For REALLY tight loops (i.e., w/o branches or jumps, and w/ op count less than a particular threshold), maybe unroll the loop a few times and then still check on the upward branch. Yep, loop unrolling would definitely help; that was the most likely working solution in my head. 
Those branches will almost always fall straight through, so while there will be some load on the platform's branch prediction cache and a bit of bloat, there shouldn't be much overhead in terms of pipeline bubbles. The event-ready word (in the interpreter, presumably) will stay in the L1 or L2 cache, avoiding stalls. Yep. Still I like these numbers: $ parrot -j examples/assembly/mops.pasm M op/s: 790.105001 # on AMD 800 No, it's not zero-overhead, but it's simple and easy enough to do portably. Crazy platform-specific zero-overhead schemes can come later as optimizations. s(Crazy)(Reasonable), but later is ok :) leo
Re: cygwin link failure
Hi First of all, yet_another_shy_lurker++; On cygwin, the final link fails with the following error:- gcc -o parrot.exe -s -L/usr/local/lib -g imcc/main.o blib/lib/libparrot.a -lcrypt blib/lib/libparrot.a(io_unix.o)(.text+0x87e): In function `PIO_sockaddr_in': /home/Jonathan/parrot_test/io/io_unix.c:468: undefined reference to `_inet_pton' I had that problem when I tried to compile parrot on one of our school machines (cygwin). inet_pton is an address-family-independent version of inet_aton that works with normal IP addresses as well as IPv6 addresses, but is mostly only defined on machines that support IPv6. inet_pton has not yet been implemented in cygwin, but it is being worked on... http://win6.jp/Cygwin/ Indeed, but I think there might be other unix-like environments that (do not yet|will never) provide the inet_pton function. So I tried to add an inet_pton implementation for the cases where the platform does not provide it. Apache 2.0 goes that way, http://lxr.webperf.org/source.cgi/srclib/apr/network_io/unix/inet_pton.c I already managed to adapt that piece of source slightly so that it compiles during the parrot build process. Now I'm trying to understand Parrot's configuration system in order to compile this only if there is no inet_pton defined. But then, I'm only a shy_lurker, so this might take some time... Thomas
Re: JVM as a threading example (threads proposal)
On Thu, Jan 15, 2004 at 11:58:22PM -0800, Jeff Clites wrote: On Jan 15, 2004, at 10:55 PM, Leopold Toetsch wrote: Yes, that's what I'm saying. I don't see an advantage of JVMs multi-step variable access, because it even doesn't provide such atomic access. You're missing the point of the multi-step access. It has nothing to do with threading or atomic access to variables. The JVM is a stack machine. JVM opcodes operate on the stack, not on main memory. The stack is thread-local. In order for a thread to operate on a variable, therefore, it must first copy it from main store to thread- local store (the stack). Parrot, so far as I know, operates in exactly the same way, except that the thread-local store is a set of registers rather than a stack. Both VMs separate working-set data (the stack and/or registers) from main store to reduce symbol table lookups. What I was expecting that the Java model was trying to do (though I didn't find this) was something along these lines: Accessing the main store involves locking, so by copying things to a thread-local store we can perform several operations on an item before we have to move it back to the main store (again, with locking). If we worked directly from the main store, we'd have to lock for each and every use of the variable. I don't believe accesses to main store require locking in the JVM. This will all make a lot more sense if you keep in mind that Parrot-- unthreaded as it is right now--*also* copies variables to working store before operating on them. This isn't some odd JVM strangeness. The JVM threading document is simply describing how the stack interacts with main memory. - Damien
RE: Events and JIT
Leopold Toetsch wrote: Why? The bytecode is patched by a different thread *if* an event is due (which in CPU cycles is rare). And I don't see a thread safety problem. The (possibly different) CPU reads an opcode and runs it. Somewhere in the meantime, the opcode at that memory position changes to the byte sequence 0xCC (on intel: int3 ) one byte changes, the CPU executes the trap or not (or course changing that memory position is assumed to be atomic, which AFAIK works on i386) - but next time in the loop the trap is honored. Other threads than the target could be executing the same chunk of JITted code at the same time. -- Gordon Henriksen IT Manager ICLUBcentral Inc. [EMAIL PROTECTED]
[PATCH] Configure ICU for building on Win32
Hi, The attached patch adds support to configure for building ICU with MSVC++, as recommended in:- http://oss.software.ibm.com/cvs/icu/~checkout~/icu/readme.html#HowToBuildWindows Unfortunately, the VC++ project/workspace files are missing from our ICU tree in CVS, see:- http://oss.software.ibm.com/cvs/icu/~checkout~/icu/source/allinone/ http://oss.software.ibm.com/cvs/icu/~checkout~/icu/source/allinone/all/ Once they are there, and with this patch, it should (unless I've messed up) work. Jonathan
Re: cygwin link failure
From: Seiler Thomas [EMAIL PROTECTED] First of all, yet_another_shy_lurker++; Welcome. :-) On cygwin, the final link fails with the following error:- gcc -o parrot.exe -s -L/usr/local/lib -g imcc/main.o blib/lib/libparrot.a -lcrypt blib/lib/libparrot.a(io_unix.o)(.text+0x87e): In function `PIO_sockaddr_in': /home/Jonathan/parrot_test/io/io_unix.c:468: undefined reference to `_inet_pton' I had that problem when I tried to compile parrot on one of our school machines (cygwin). inet_pton is an address-family-independent version of inet_aton that works with normal IP addresses as well as IPv6 addresses, but is mostly only defined on machines that support IPv6. inet_pton has not yet been implemented in cygwin, but it is being worked on... http://win6.jp/Cygwin/ Indeed, but I think there might be other unix-like environments that (do not yet|will never) provide the inet_pton function. So I tried to add an inet_pton implementation for the cases where the platform does not provide it. Apache 2.0 goes that way, http://lxr.webperf.org/source.cgi/srclib/apr/network_io/unix/inet_pton.c This was the kinda solution I had in mind, but my network programming knowledge is way under par. I already managed to adapt that piece of source slightly so that it compiles during the parrot build process. Now I'm trying to understand Parrot's configuration system in order to compile this only if there is no inet_pton defined. You may want to take a look at config/auto/memalign.pl, which I believe is one of a number of scripts that generates a C file and attempts to compile it, then does something based upon the success of that attempt. But then, I'm only a shy_lurker, so this might take some time... Thanks for having a crack at it. Jonathan
Re: [PATCH] Configure ICU for building on Win32
From: Jonathan Worthington [EMAIL PROTECTED] The attached patch... Which I forgot to attach. Sorry. Jonathan icuwin32.patch Description: Binary data
Re: JVM as a threading example (threads proposal)
On Jan 16, 2004, at 1:01 PM, Damien Neil wrote: On Thu, Jan 15, 2004 at 11:58:22PM -0800, Jeff Clites wrote: On Jan 15, 2004, at 10:55 PM, Leopold Toetsch wrote: Yes, that's what I'm saying. I don't see an advantage of JVMs multi-step variable access, because it even doesn't provide such atomic access. You're missing the point of the multi-step access. It has nothing to do with threading or atomic access to variables. The JVM is a stack machine. JVM opcodes operate on the stack, not on main memory. The stack is thread-local. In order for a thread to operate on a variable, therefore, it must first copy it from main store to thread- local store (the stack). Parrot, so far as I know, operates in exactly the same way, except that the thread-local store is a set of registers rather than a stack. Both VMs separate working-set data (the stack and/or registers) from main store to reduce symbol table lookups. ... This will all make a lot more sense if you keep in mind that Parrot-- unthreaded as it is right now--*also* copies variables to working store before operating on them. This isn't some odd JVM strangeness. The JVM threading document is simply describing how the stack interacts with main memory. I think the JVM spec is actually implying something beyond this. For instance, section 8.3 states, A store operation by T on V must intervene between an assign by T of V and a subsequent load by T of V. Translating this to parrot terms, this would mean that the following is illegal, which it clearly isn't:

find_global P0, V
set P0, P1          # assign by T of V
find_global P0, V   # a subsequent load by T of V w/o an intervening store operation by T on V

I think it is talking about something below the Java-bytecode level--remember, this is the JVM spec, and it constrains how an implementation of the JVM must behave when executing a sequence of opcodes, not the rules a Java compiler must follow when generating a sequence of opcodes from Java source code. 
What I think it's really saying, again translated into Parrot terms, is this:

store_global foo, P0 # internally, may cache value and not push to main memory
find_global P0, foo  # internally, can't pull value from main memory if above value was not yet pushed there

and I think the point is this:

find_global P0, foo  # internally, also caches value in thread-local storage
find_global P0, foo  # internally, can use cached thread-local value

And, as mentioned in section 8.6, any time a lock is taken, cached values need to be pushed back into main memory, and the local cache emptied. This doesn't make any sense if the thread's working memory is interpreted as the stack. JEff
Re: Events and JIT
On Jan 16, 2004, at 1:20 PM, Leopold Toetsch wrote: Gordon Henriksen [EMAIL PROTECTED] wrote: Leopold Toetsch wrote: 2) Patch the native opcodes at these places with e.g. int3 I don't think that bytecode-modifying versions should fly; they're not threadsafe, Why? The bytecode is patched by a different thread *if* an event is due (which in CPU cycles is rare). And I don't see a thread safety problem. The (possibly different) CPU reads an opcode and runs it. Somewhere in the meantime, the opcode at that memory position changes to the byte sequence 0xCC (on Intel: int3). One byte changes, the CPU executes the trap or not (of course changing that memory position is assumed to be atomic, which AFAIK works on i386) - but next time in the loop the trap is honored. Where in the stream would we patch? If not in a loop, you may never hit the single patched location again, but the program still might not end for a very long time. If we are patching all locations with branches and such, in large bytecode this could take a long time, and the executing thread might outrun the patching thread. Also, once the handler is entered, we'd have to fix all of the patched locations, which again could be time-consuming for large bytecode. It could work, but seems problematic. JEff
Re: JVM as a threading example (threads proposal)
On Friday, January 16, 2004, at 02:58 , Jeff Clites wrote: On Jan 15, 2004, at 10:55 PM, Leopold Toetsch wrote: Damien Neil [EMAIL PROTECTED] wrote: On Thu, Jan 15, 2004 at 09:31:39AM +0100, Leopold Toetsch wrote: I don't see any advantage of such a model. The more so as it doesn't guarantee any atomic access to e.g. longs or doubles. The atomic access to ints and pointers seems to rely on the architecture but is of course reasonable. You *can't* guarantee atomic access to longs and doubles on some architectures, unless you wrap every read or write to one with a lock. The CPU support isn't there. Yes, that's what I'm saying. I don't see an advantage of JVMs multi-step variable access, because it even doesn't provide such atomic access. What I was expecting that the Java model was trying to do (though I didn't find this) was something along these lines: Accessing the main store involves locking, so by copying things to a thread-local store we can perform several operations on an item before we have to move it back to the main store (again, with locking). If we worked directly from the main store, we'd have to lock for each and every use of the variable. I think the real purpose of the model was to say thread-local values may be committed to main memory (perhaps significantly) after the local copy is logically assigned. Thus: In the absence of explicit synchronization, threads may manipulate potentially inconsistent local copies of variables. This model addresses: copies of variables in registers, copies on the JVM stack, copies in the stack frame, thread preemption prior to store (which occurs on uniprocessors and multiprocessors alike), and delayed write-back caches in SMP systems.[*] In short, this portion of the spec provides bounds for the undefinedness of the behavior that occurs when programs do not use Java's synchronization primitives. It does so realistically, in a manner that contemporary computer systems can implement efficiently. 
(In fact, the spec is far more descriptive than it is proscriptive.) Or, as an example, it allows the natural thing to happen here:

; PPC[**] implementation of:
;     var = var + var;
lwz  r30, 0(r29)     ; load var (1 JVM load)
addi r30, r30, r30   ; double var (2 JVM uses, 1 JVM assign)
                     ; if your thread is preempted here
stw  r30, 0(r29)     ; store var (1 JVM store)

And allows obvious optimizations like this:

; PPC implementation of:
;     var = var + var;
;     var = var + var;
lwz  r30, 0(r29)
addi r30, r30, r30
                     ; imagine your thread is preempted here
addi r30, r30, r30
stw  r30, 0(r29)

But it explicitly disallows that same optimization for a case like this:

var = var + var;
synchronized (other) { other++; }
var = var + var;

That--that, and the whole cache coherency/delayed write thing:

; CPU 1                     ; CPU 2
loadi r29, 0xFFF8           loadi r29, 0xFFF8
loadi r30, 0xDEAD           loadi r30, 0xBEEF
stw   r30, 0(r29)           stw   r30, 0(r29)
lwz   r28, 0(r29)           lwz   r28, 0(r29)
; r28 is probably 0xDEAD    ; r28 is probably 0xBEEF
; (but could be 0xBEEF)     ; (but could be 0xDEAD)
sync                        noop
lwz   r28, 0(r29)           lwz   r28, 0(r29)
; r28 matches on both CPUs now, either 0xDEAD or
; 0xBEEF (but not 0xDEEF or 0xBEAD or 0x).

[* - On many SMP systems, processors do not have coherent views of main memory (due to their private data caches) unless the program executes explicit memory synchronization operations, which are at least expensive enough that you don't want to execute them on every opcode.] [** - Forgive my rusty assembler.]
Parrot deals with PMCs, which can contain (let's consider scalars only) e.g. a PerlInt or a PerlNum. Now we would have atomic access (normally) to the former and very likely non-atomic access to the latter, just depending on the value which happened to be stored in the PMC. This implies that we have to wrap almost[1] all shared write *and* read PMC accesses with LOCK/UNLOCK. [1] except plain ints and pointers on current platforms Ah, but this misses a key point: We know that user data is allowed to get corrupted if the user isn't locking properly--we only have to protect VM-internal state. The key point is that it's very unlikely that there will be any floats involved in VM-internal state--it's going to be all pointers and ints (for offsets and lengths). That is, a corrupted float won't crash the VM. On one
Re: JVM as a threading example (threads proposal)
On Friday, January 16, 2004, at 08:38 , Jeff Clites wrote: On Jan 16, 2004, at 1:01 PM, Damien Neil wrote: On Thu, Jan 15, 2004 at 11:58:22PM -0800, Jeff Clites wrote: On Jan 15, 2004, at 10:55 PM, Leopold Toetsch wrote: Yes, that's what I'm saying. I don't see an advantage of JVMs multi-step variable access, because it even doesn't provide such atomic access. You're missing the point of the multi-step access. It has nothing to do with threading or atomic access to variables. ... it has everything to do with allowing multiprocessors to operate without extraneous synchronization. The JVM is a stack machine. JVM opcodes operate on the stack, not on main memory. The stack is thread-local. In order for a thread to operate on a variable, therefore, it must first copy it from main store to thread-local store (the stack). Parrot, so far as I know, operates in exactly the same way, except that the thread-local store is a set of registers rather than a stack. Both VMs separate working-set data (the stack and/or registers) from main store to reduce symbol table lookups. ... This will all make a lot more sense if you keep in mind that Parrot--unthreaded as it is right now--*also* copies variables to working store before operating on them. This isn't some odd JVM strangeness. The JVM threading document is simply describing how the stack interacts with main memory. I think the JVM spec is actually implying something beyond this. For instance, section 8.3 states, A store operation by T on V must intervene between an assign by T of V and a subsequent load by T of V. Translating this to parrot terms, this would mean that the following is illegal, which it clearly isn't:

find_global P0, V
set P0, P1          # assign by T of V
find_global P0, V   # a subsequent load by T of V w/o an intervening store operation by T on V

This rule addresses aliasing. 
It says that this (in PPC assembly):

; presume &(obj->i) == obj+12
lwz  r29, 12(r30)   ; read, load
addi r29, r29, 1    ; use, assign
lwz  r28, 12(r30)   ; read, load
addi r28, r28, 1    ; use, assign
stw  r29, 12(r30)   ; store, eventual write
stw  r28, 12(r30)   ; store, eventual write

... is an invalid implementation of this: j.i = j.i + 1; k.i = k.i + 1; ... where the JVM cannot prove j == k to be false. The rule states that the stw of r29 must precede the stw of r28. Why this is under threading... beyond me. Let me briefly highlight the operations as discussed and digress a little bit as to why all the layers:

main memory --read+load-- working copy (register file, stack frame, etc.) --use-- execution engine (CPU core) --assign-- working copy (register file, stack frame, etc.) --write+store-- main memory

(I paired the read+load and write+store due to the second set of rules in 8.2.) The spec never says where a read puts something such that a load can use it, or where a store puts something such that a write can use it. A store with its paired write pending is simply an in-flight memory transaction (and the same for a read+load pair). Possible places the value could be: in-flight on the system bus; queued by the memory controller; on a dirty line in a write-back cache; somewhere in transit on a NUMA architecture. Store-write and read-load are just different ends of the underlying ISA's load and store memory transactions. The read and write operations specify the operations from the memory controller's perspective; load and store specify them from the program's perspective. Note that reads and writes are performed by main memory, not by a thread. That distinction is crucial to reading the following section from the spec: 8.2 EXECUTION ORDER AND CONSISTENCY The rules of execution order constrain the order in which certain events may occur. 
There are four general constraints on the relationships among actions: * The actions performed by any one thread are totally ordered; that is, for any two actions performed by a thread, one action precedes the other. * The actions performed by the main memory for any one variable are totally ordered; that is, for any two actions performed by the main memory on the same variable, one action precedes the other. ... The extra read/write step essentially allows main memory (the memory controller) to order its operations with bounded independence of any particular thread. Careful reading of the other rules will show that this is only a useful abstraction in the case of true concurrency (e.g., SMP), as the other rules ensure that a single processor will always load variables in a state consistent with what it last stored. I think it is talking about something below the Java-bytecode level--remember, this is the JVM spec, and constrains how an implementation of the JVM must behave when executing a sequence of opcodes, not the rules a Java compiler must