[perl #55000] [PATCH] Threads Failures on Optimized Build
On Sun, Jun 1, 2008 at 1:31 PM, Vasily Chekalkin [EMAIL PROTECTED] wrote:

> interp->exceptions is initialized lazily, but really_destroy_exception_list has a signature with __attribute_notnull__. So we should either check this value before the function call or change the function signature to accept NULL.

I tried this variant:

--- src/exceptions.c    (revision 28050)
+++ src/exceptions.c    (working copy)
@@ -772,8 +772,10 @@
 void
 destroy_exception_list(PARROT_INTERP)
 {
-    really_destroy_exception_list(interp->exceptions);
-    really_destroy_exception_list(interp->exc_free_list);
+    if (interp->exceptions)
+        really_destroy_exception_list(interp->exceptions);
+    if (interp->exc_free_list)
+        really_destroy_exception_list(interp->exc_free_list);
 }

 /*

On my platform, Ubuntu 8.04 i386, this solves both this problem and #55170. The diagnosis is the same: the root of the problem is passing NULL to a parameter attributed as non-null. (Optionally add several rants about premature optimization here.)

-- Salu2
Re: [perl #55000] [PATCH] Threads Failures on Optimized Build
On Tuesday 03 June 2008 10:50:27 NotFound via RT wrote:

> On Sun, Jun 1, 2008 at 1:31 PM, Vasily Chekalkin [EMAIL PROTECTED] wrote:
>> interp->exceptions is initialized lazily, but really_destroy_exception_list has a signature with __attribute_notnull__. So we should either check this value before the function call or change the function signature to accept NULL.
>
> I tried this variant:
>
> --- src/exceptions.c    (revision 28050)
> +++ src/exceptions.c    (working copy)
> @@ -772,8 +772,10 @@
>  void
>  destroy_exception_list(PARROT_INTERP)
>  {
> -    really_destroy_exception_list(interp->exceptions);
> -    really_destroy_exception_list(interp->exc_free_list);
> +    if (interp->exceptions)
> +        really_destroy_exception_list(interp->exceptions);
> +    if (interp->exc_free_list)
> +        really_destroy_exception_list(interp->exc_free_list);
>  }
>
> On my platform, Ubuntu 8.04 i386, this solves both this problem and #55170. The diagnosis is the same: the root of the problem is passing NULL to a parameter attributed as non-null. (Optionally add several rants about premature optimization here.)

Agreed, and applied as r28051.

Thanks, everyone!

-- c
Re: [perl #55000] Threads Failures on Optimized Build
chromatic wrote:

Here is a slightly different patch for it:

--- a/src/exceptions.c
+++ b/src/exceptions.c
@@ -772,7 +772,9 @@
 associated exceptions free list for the specified interpreter.

 void
 destroy_exception_list(PARROT_INTERP)
 {
-    really_destroy_exception_list(interp->exceptions);
+    if (interp->exceptions != NULL) {
+        really_destroy_exception_list(interp->exceptions);
+    }
     really_destroy_exception_list(interp->exc_free_list);
 }

interp->exceptions is initialized lazily, but really_destroy_exception_list has a signature with __attribute_notnull__. So we should either check this value before the function call or change the function signature to accept NULL.

-- Bacek.
[perl #55000] Threads Failures on Optimized Build
# New Ticket Created by chromatic # Please include the string: [perl #55000] # in the subject line of all future correspondence about this issue. # URL: http://rt.perl.org/rt3/Ticket/Display.html?id=55000 I'm seeing several test failures from an optimized build (Ubuntu Hardy Heron x86 32-bit). Here's the verbose output from prove. I can post backtraces if necessary. As far as I can tell, they've been present for at least a thousand commits. t/pmc/threads.t. 1..20 ok 1 - interp identity not ok 2 - thread type 1 # Failed test 'thread type 1' # at t/pmc/threads.t line 80. # Exited with error code: 139 # Received: # thread # main 10 # Segmentation fault # # Expected: # thread # main 10 # not ok 3 - thread type 1 -- repeated # Failed test 'thread type 1 -- repeated' # at t/pmc/threads.t line 115. # Exited with error code: 139 # Received: # thread # main 10 # Segmentation fault # # Expected: # thread # main 10 # thread # main 10 # not ok 4 - thread type 2 # Failed test 'thread type 2' # at t/pmc/threads.t line 161. # Exited with error code: 139 # Received: # ok 1 # hello from thread # ParrotThread tid 1 # from 10 interp # Segmentation fault # # Expected: # ok 1 # hello from thread # ParrotThread tid 1 # from 10 interp # ok 5 - thread - kill not ok 6 - join, get retval # Failed test 'join, get retval' # at t/pmc/threads.t line 237. # Exited with error code: 139 # Received: # Segmentation fault # # Expected: # 500500 # 500500 # not ok 7 - detach # Failed test 'detach' # at t/pmc/threads.t line 290. # Exited with error code: 139 # Received: # thread # done # Segmentation fault # # Expected: # /(done\nthread\n)|(thread\ndone\n)/ # not ok 8 - share a PMC # Failed test 'share a PMC' # at t/pmc/threads.t line 319. # Exited with error code: 139 # Received: # thread # 20 # Segmentation fault # # Expected: # thread # 20 # done # 21 # not ok 9 - multi-threaded # Failed test 'multi-threaded' # at t/pmc/threads.t line 355. 
# Exited with error code: 139 # Received: # 3 # 1 # 2 # 3 # done thread # Segmentation fault # # Expected: # 3 # 1 # 2 # 3 # done thread # done main # not ok 10 - sub name lookup in new thread # Failed test 'sub name lookup in new thread' # at t/pmc/threads.t line 403. # Exited with error code: 139 # Received: # ok # ok # Segmentation fault # # Expected: # ok # ok # not ok 11 - CLONE_CODE only # Failed test 'CLONE_CODE only' # at t/pmc/threads.t line 436. # Exited with error code: 139 # Received: # ok 1 # ok 2 # ok 3 # ok 4 # Segmentation fault # # Expected: # ok 1 # ok 2 # ok 3 # ok 4 # ok 5 # not ok 12 - CLONE_CODE | CLONE_GLOBALS # Failed test 'CLONE_CODE | CLONE_GLOBALS' # at t/pmc/threads.t line 495. # Exited with error code: 139 # Received: # in thread: # ok alpha # ok beta1 # ok beta2 # ok beta3 # Segmentation fault # # Expected: # in thread: # ok alpha # ok beta1 # ok beta2 # ok beta3 # in main: # ok alpha # ok beta1 # ok beta2 # ok beta3 # not ok 13 - CLONE_CODE | CLONE_CLASSES; superclass not built-in # TODO vtable overrides aren't properly cloned RT# 46511 # Failed (TODO) test 'CLONE_CODE | CLONE_CLASSES; superclass not built-in' # at t/pmc/threads.t line 580. # Exited with error code: 139 # Received: # in thread: # Segmentation fault # # Expected: # in thread: # A Bar # called Bar's barmeth # called Foo's foometh # Integer? 0 # Foo? 1 # Bar? 1 # in main: # A Bar # called Bar's barmeth # called Foo's foometh # Integer? 0 # Foo? 1 # Bar? 1 # not ok 14 - CLONE_CODE | CLONE_CLASSES; superclass built-in # Failed test 'CLONE_CODE | CLONE_CLASSES; superclass built-in' # at t/pmc/threads.t line 665. # Exited with error code: 139 # Received: # in thread: # A Bar # called Bar's barmeth # called Foo's foometh # Integer? 1 # Foo? 1 # Bar? 1 # Segmentation fault # # Expected: # in thread: # A Bar # called Bar's barmeth # called Foo's foometh # Integer? 1 # Foo? 1 # Bar? 1 # in main: # A Bar # called Bar's barmeth # called Foo's foometh # Integer? 1 # Foo? 1 # Bar? 
1 # not ok 15 - CLONE_CODE | CLONE_GLOBALS| CLONE_HLL # Failed test 'CLONE_CODE | CLONE_GLOBALS| CLONE_HLL' # at t/pmc/threads.t line 750. # Exited with error code: 139 # Received: # in thread: # ok 1 # ok 2 # Segmentation fault # # Expected: # in thread: # ok 1 # ok 2 # in main: # ok 1 # ok 2 # not ok 16 - globals + constant table subs issue # Failed test 'globals + constant table subs issue' # at t/pmc/threads.t line 816. # Exited with error code: 139 # Received: # ok 1 # ok 2 # ok 3 # ok 4 # ok 5 # ok 6 # ok 7 # ok 8 # ok 9 # ok 10 # ok 11 # ok 12 # ok 13 # ok 14 # ok 15 # ok 16 # ok 17 # ok 18 # ok 19 # ok 20 # ok 21 # ok 22 # ok 23 # ok 24 # ok 25 # ok 26 # ok 27 # ok 28 # ok 29 # ok 30 # ok 31 # ok 32 # ok 33 # ok 34 # ok 35 # ok 36 # ok 37 # ok 38 # ok 39 # ok 40 # ok 41 # ok 42 # ok 43 # ok 44 # ok 45 # ok 46 # ok 47 # ok 48 # ok 49 # ok 50 # ok 51 # ok 52 # ok 53 # ok 54 # ok 55 # ok 56 # ok 57 # ok 58 # ok 59 # ok 60 # ok 61
Dynamic binding (e.g. Perl5 local), continuations, and threads
The obvious way to implement Perl5 local bindings in Parrot would be to:

1. Capture the old value in a temporary;
2. Use store_global to set the new value;
3. Execute the rest of the block; and
4. Do another store_global to reinstate the old value.

This has a serious flaw: it leaves the new binding in effect if executing the rest of the block causes a nonlocal exit. Throwing can be dealt with by establishing an error handler that reinstates the old value and rethrows, but this doesn't begin to address continuations, not to mention coroutines, which can be used to jump to an arbitrary call frame without unwinding the stack.

Then there are threads to consider. The naive approach makes each thread's dynamic bindings visible to all other threads, which may or may not be desirable (not, IMHO, but this is a language design issue). Worse, the final state of the variable depends on which thread exits first, which is surely a bug.

Proposal: The only reasonable approach (it seems to me) is to keep dynamic binding information in the call frame. One possible implementation is as follows:

1. Add a dynamic_bindings pointer to the call frame. This points to a linked list of the frame's current bindings, each entry of which holds a name/value pair. Each thread's initial frame gets its dynamic_bindings list initialized to NULL. Each new frame's dynamic_bindings list is initialized from the calling frame's dynamic_bindings. Nothing additional need be done for continuation calling. The dynamic_bindings list needs to be visited during GC.

2. Modify store_global and find_global to search this list for the desired global. If found, store_global modifies the entry value, and find_global fetches the entry value. If not found, the existing hash is consulted in the current fashion (modulo namespace implementation).

3. Add a bind_global op with the same prototype as store_global that pushes a new entry on the dynamic_bindings list.

4. Add an unbind_global op that takes an integer or integer constant and pops that many entries off of the dynamic_bindings list. Bindings are effectively popped when the sub exits, but this op is still needed for cases where the end of a dynamic binding's lifetime comes before the end of the sub.

Advantages:

+ The scheme is robust with respect to continuations, threads, coroutines, and nonlocal exits. Each sub creates dynamic bindings that are visible only within its dynamic scope (i.e. subs that it calls, directly or indirectly), and not to other threads or coroutines.

+ The overhead is low: a pointer copy on call, none on return, and zero context switching overhead. For typical programs with little or no dynamic binding, these are the only costs.

+ The code that a human or compiler needs to emit is even simpler than that of the naive scheme described above.

Disadvantages:

+ The time to fetch or store a dynamic binding is proportional to the depth of the dynamic_bindings stack, which could be considerable for languages that do a lot of dynamic binding. Conceivably, this could be addressed by using a language-dependent PMC class for the binding entry, and any such dynamic-binding-intensive language could define a per-frame class that acted as a linked list of hashes. But this could be postponed until it was needed, possibly indefinitely.

If this is acceptable (and there isn't already a better plan), I will have time to address this over the holiday week.

TIA,

-- Bob Rogers http://rgrjr.dyndns.org/
Re: threads on Solaris aren't parallel?
On Mon, Dec 12, 2005 at 10:28:31PM +0100, Leopold Toetsch wrote:

> On Dec 12, 2005, at 17:53, Erik Paulson wrote:
>> Hi - I'm using an older version of Parrot (0.2.2) so I can use threads. It seems that Parrot on Solaris doesn't ever use more than one processor.
> [ ... ]
>> Is there some way we can check to see if Parrot is actually creating more than one thread? Is it some sort of crazy green-thread issue?
>
> There are AFAIK some issues with Solaris (but I don't know the details). It might need a different threading lib or some additional init code to create 'real' threads.

I've got it to work now, thanks to Joe Wilson who gave me the last clue. I turned on pthreads in configure:

perl Configure.pl --ccflags=:add{ -pthreads -D_REENTERANT } --linkflags=:add{ -pthreads }

and I changed the definition of THREAD_CREATE_JOINABLE:

# define THREAD_CREATE_JOINABLE(t, func, arg) do { \
        pthread_attr_t attr; \
        int rc = pthread_attr_init(&attr); \
        assert(rc == 0); \
        rc = pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM); \
        assert(rc == 0); \
        rc = pthread_setconcurrency(8); \
        assert(rc == 0); \
        pthread_create(&t, NULL, func, arg); \
    } while (0)

The default for pthread_attr_setscope on Solaris is PTHREAD_SCOPE_PROCESS; the default on Linux is PTHREAD_SCOPE_SYSTEM. I'm on Solaris 8, and without the call to pthread_setconcurrency, I only ran one thread at a time. Starting in Solaris 9, pthread_setconcurrency doesn't do anything. (I don't have a Solaris 9 SMP machine I can test on to see if Parrot uses multiple processors concurrently without the call to pthread_setconcurrency.) My runtimes get about twice as fast every time I add a processor.

I'm not sure what the minimal set of calls I need to add is - the setconcurrency call was the last thing I tried, and once I added it things started working. I don't know if that means I can remove my other changes and things will still work; I'll do that experiment later.

-Erik
Re: threads on Solaris aren't parallel?
On Dec 13, 2005, at 19:35, Erik Paulson wrote:

> I've got it to work now, thanks to Joe Wilson who gave me the last clue.

Great.

> I turned on pthreads in configure:
> perl Configure.pl --ccflags=:add{ -pthreads -D_REENTERANT } --linkflags=:add{ -pthreads }

Have a look at config/init/hints/solaris.pm to make this change more permanent. Also having some SOLARIS_VERSION define (for below) if it's Solaris would be good, I presume.

> and I changed the definition of THREAD_CREATE_JOINABLE:

For a final patch you could include some #ifdef SOLARIS_VERSION == foo to just include the necessary extensions.

> -Erik

leo
Re: threads on Solaris aren't parallel?
Leopold Toetsch wrote: and I changed the definitino of CREATE_THREAD_JOINABLE: For a final patch you could include some #ifdef SOLARIS_VERSION == foo to just include necessary extensions. Doesn't look like there's anything Solaris-specific here. Other non-Linux OSes will need the same changes (FreeBSD supports both SCOPE_SYSTEM and SCOPE_PROCESS, for example), will they not? -- http://www.velocityvector.com/ | http://glmiller.blogspot.com/ http://www.classic-games.com/ | The hand ain't over till the river.
threads on Solaris aren't parallel?
Hi -

I'm using an older version of Parrot (0.2.2) so I can use threads. It seems that Parrot on Solaris doesn't ever use more than one processor. The attached program should create argv[1] threads and divide argv[2] increments up among them - i.e. perfect linear speedup. I've got a dual-processor Xeon (a real one, not this hyperthreaded stuff) and I indeed get speedup:

tonic(19)% time ./parrot /common/tmp/jon/thread_test.pir 1 5
...
18.870u 0.010s 0:18.93 99.7%    0+0k 0+0io 534pf+0w
tonic(20)% time ./parrot /common/tmp/jon/thread_test.pir 2 5
...
19.360u 0.030s 0:09.93 195.2%   0+0k 0+0io 534pf+0w
tonic(21)%

However, on a Solaris machine that has 8 CPUs, we get no speedup:

pinot(6)% time ./parrot-solaris /common/tmp/jon/thread_test.pir 1 5000
...
9.69u 0.05s 0:09.77 99.6%
pinot(7)% time ./parrot-solaris /common/tmp/jon/thread_test.pir 2 5000
...
9.08u 0.09s 0:09.19 99.7%
pinot(8)% time ./parrot-solaris /common/tmp/jon/thread_test.pir 4 5000
...
9.28u 0.07s 0:09.38 99.6%
pinot(9)% time ./parrot-solaris /common/tmp/jon/thread_test.pir 8 5000
...
9.67u 0.03s 0:09.74 99.5%

Is there some way we can check to see if Parrot is actually creating more than one thread? Is it some sort of crazy green-thread issue?
Thanks,
-Erik

# Basic shared array program for parrot
.sub _main
    .param pmc argv
    .sym int threadIncs
    .sym pmc threads
    .sym pmc child
    .sym pmc Inc_array
    .local pmc increment_pass
    .local pmc seed_param
    .local int i, value, seed
    .local pmc temp
    .local int tmp
    .local int offset
    .local int numThreads
    .local pmc logtmlib
    .local pmc DoBreakpoint
    .local string parameter
    parameter = shift argv
    parameter = shift argv
    numThreads = parameter
    parameter = shift argv
    #numThreads = 1
    threadIncs = parameter
    threadIncs = threadIncs / numThreads
init_array:
    Inc_array = global "increment_array"   # get function pointer
    # setup an array to hold threads
    threads = new .FixedPMCArray
    threads = 50
    # Set the number of increments to do in each thread
    increment_pass = new .Integer
    increment_pass = threadIncs
    seed = 54433
    seed_param = new .Integer
    i = 0
create_Thread:
    child = new .ParrotThread          # basically new thread
    .sym pmc New_thread
    find_method New_thread, child, "thread3"
    seed_param = seed
    increment_pass = increment_pass
    .pcc_begin
    .arg Inc_array
    .arg increment_pass
    .arg seed_param
    .invocant child
    .nci_call New_thread
    .pcc_end
    threads[i] = child
    inc i
    if i < numThreads goto create_Thread
    i = 0
    # Join and wait on threads
_join_thread:
    .sym int tid
    .sym pmc Thread_join
    child = threads[i]
    tid = threads[i]
    find_method Thread_join, child, "join"
    .pcc_begin
    .arg tid
    .nci_call Thread_join
    .pcc_end
    threads[i] = child
    inc i
    if i < numThreads goto _join_thread
    #DoBreakpoint()
    i = 0
    tmp = 0
_main_print_loop:
    tmp = tmp + value
    print value
    print "\n"
    inc i
    if i < 100 goto _main_print_loop
    print "\n"
    print tmp
    print "\n"
.end

# The code to execute in the thread
.sub increment_array
    .param pmc sub
    .param pmc increments
    .param pmc seed_param
    .local int i, tmp, value, numIncs, rand, index
    .local int temp
    numIncs = increments
    i = 0
s_loop:
    inc i
    if i < numIncs goto s_loop
    i = 0
.end
Re: threads on Solaris aren't parallel?
On Dec 12, 2005, at 17:53, Erik Paulson wrote:

> Hi - I'm using an older version of Parrot (0.2.2) so I can use threads. It seems that Parrot on Solaris doesn't ever use more than one processor.
[ ... ]
> Is there some way we can check to see if Parrot is actually creating more than one thread? Is it some sort of crazy green-thread issue?

There are AFAIK some issues with Solaris (but I don't know the details). It might need a different threading lib or some additional init code to create 'real' threads.

> Thanks, -Erik

leo
Re: threads on Solaris aren't parallel?
Leopold Toetsch wrote: There are AFAIK some issues with solaris (but I don't know the details) It might need a different threading lib or some additional init code to create 'real' threads. You just have to know how they implement pthreads, which is weasel-worded in POSIX and allows Solaris much divergence from what you expect. It's the LWP versus full-fledged process thing. -- Jack J. Woehr # I never played fast and loose with the PO Box 51, Golden, CO 80402 # Constitution. Never did and never will. http://www.well.com/~jax # - Harry S Truman
Re: threads
Dave Frost [EMAIL PROTECTED] wrote:

> From the outset i decided i wanted the vm to provide its own threading mechanism i.e. not based on posix threads for example.

Parrot had the option of providing its own threads, thread scheduling and the like. As leo mentioned, we're using OS threads. The problem with threads in user space (as you propose) is that the operating system doesn't know of the threads - it just sees the single OS thread running the program. And that means that if you have 4 CPUs (or a 4-core CPU, which I doubt will be uncommon in the near future) then any program running on your VM can only ever use 2 of those (the OS can only schedule threads that it knows about, and your VM's program's threads wouldn't be real ones, so couldn't be scheduled on separate CPUs). With concurrency being of increasing importance, I think this is a powerful argument for using OS threads.

> My first plan was to have 2 native threads, one for execution of the main 'core' execution code - the runtime if you like - the other thread was used to tell the execution code to swap threads.

The most expensive part of using threading, aside from thread creation, is doing context switches (between threads). Here, you are requiring two context switches to provide one virtual context switch to code you are executing. And the switches are, of course, wasted if you decide not to switch.

> I thought i could synchronise these 2 using native semaphores. When it comes down to it a single op code takes a number of native instructions, i.e. to execute an add instruction i may have to do (say) 5 things, so I just check after each op code has been executed to see if the thread needs swapping out.

That seems like a bad idea mainly due to speed/efficiency.

> Each thread is a top level object, so the stack, all stack pointers and register data etc resides in the thread, but i still can't have the execution engine swapping threads mid operation, so in my add example i still don't think i would want the execution engine swapping out a thread after 3 instructions, it would need to complete all 5. It's been a bit of a brick wall this, it seemed to be going quite well up to this point and i need to solve this before i can move on with lwm (light weight machine). Any pointers, thoughts or comments are welcome.

I guess what I really want to say is consider using OS threads. :-) But more helpfully, here's a (hacky, but maybe workable) approach that immediately occurs to me. I assume you have a sequence of bytecode that you execute. When you need to do a context switch, the thread doing the signalling to say "context switch now" takes a copy of the instruction part of the next opcode that would be executed in the current virtual thread and then replaces the instruction with a context switch opcode. Then, when the context switch opcode executes, it replaces the instruction in the bytecode with the original one and does the context switch.

But you need special cases for ops that want to try and obtain locks, so you can force a switch if the lock isn't available and stuff. And probably a flag to set when a switch happens, so you don't mess up the bytecode by re-writing the context switch opcode twice. Not saying it's the best scheme, but it may be Good Enough.

Have fun,
Jonathan
threads
hi all,

I'm interested to know how perl6/parrot implements threads. I'm mainly interested as I'm writing a small vm of my own. Not a production vm like parrot, more for interest/curiosity etc.

From the outset I decided I wanted the vm to provide its own threading mechanism, i.e. not based on posix threads for example. My first plan was to have 2 native threads: one for execution of the main 'core' execution code - the runtime if you like - the other thread was used to tell the execution code to swap threads. I thought I could synchronise these 2 using native semaphores. When it comes down to it a single op code takes a number of native instructions, i.e. to execute an add instruction I may have to do (say) 5 things, so I just check after each op code has been executed to see if the thread needs swapping out. That seems like a bad idea mainly due to speed/efficiency.

Each thread is a top level object, so the stack, all stack pointers and register data etc. resides in the thread, but I still can't have the execution engine swapping threads mid operation, so in my add example I still don't think I would want the execution engine swapping out a thread after 3 instructions; it would need to complete all 5. It's been a bit of a brick wall this; it seemed to be going quite well up to this point and I need to solve this before I can move on with lwm (light weight machine).

Any pointers, thoughts or comments are welcome.

Cheers
Dave
Re: threads
On Sep 27, 2005, at 17:14, Dave Frost wrote:

> hi all, Im interested to know how perl6/parrot implements threads.

*) based on OS threads
*) one interpreter per thread
*) STM for shared objects / atomicity

> Any pointers, thoughts or comments are welcome. Cheers Dave

leo
Re: [perl #35305] [PATCH] skip threads 'detatch' test on win32
Mr. Gay, let me know if you are waiting for a special request to uncomment the line

    /*#include <parrot/thr_windows.h>*/

in config/gen/platform/win32/threads.h.

Whatever was broken has now been fixed. Patch applied, and ticket closed.

~jerry
Re: [perl #35305] [PATCH] skip threads 'detatch' test on win32
On 5/9/05, jerry gay [EMAIL PROTECTED] wrote:

> much better! one failing test now...

my initial exuberance was unfounded. one test fails in t/pmc/threads.t, but hundreds fail in the rest of the test suite. it seems this line (from above) is the culprit:

-# ifdef _MCS_VER1
+# ifdef _MCS_VER

so it seems the definition of THREAD_CREATE_JOINABLE() (which follows this directive in include/parrot/thr_windows.h) is incorrect.

On 5/19/05, Leopold Toetsch [EMAIL PROTECTED] wrote:

> Vladimir Lipsky [EMAIL PROTECTED] wrote:
>> Parrot_really_destroy needs to be fixed
> $verbose++ please, thanks

yes, please. until this issue is fixed, i'm rolling back these patches so the threads test 6 is again skipped on windows, and the 200-odd failing tests will work again. feel free to send more patches, i'll happily test them (more carefully next time) and work out the bugs before applying.

patch applied as r8165.

~jerry
Re: [perl #35305] [PATCH] skip threads 'detatch' test on win32
Vladimir Lipsky [EMAIL PROTECTED] wrote:

> 1) Why the heck

Easy: it's not in the MANIFEST. Why: patches scattered between inline and attached and the MANIFEST part missing ... it's easy to overlook.

> -# ifdef _MCS_VER1
> +# ifdef _MCS_VER

Thanks, applied - hope that's really the whole thing now ;-)

leo
Re: [perl #35305] [PATCH] skip threads 'detatch' test on win32
jerry gay [EMAIL PROTECTED] wrote: much better! one failing test now... D:\usr\local\parrot-HEAD\trunkperl t/harness t/pmc/threads.t t/pmc/threadsok 3/11# Failed test (t/pmc/threads.t at line 163) # got: 'start 1 # in thread # done # Can't spawn .\parrot.exe D:\usr\local\parrot-HEAD\trunk\t\pmc\threads_4.pasm: Bad file descriptor at lib/Parrot/Test.pm line 231. # ' # expected: 'start 1 t/pmc/threadsNOK 4# in thread # done # ' # '.\parrot.exe D:\usr\local\parrot-HEAD\trunk\t\pmc\threads_4.pasm' failed with exit code 255 Parrot_really_destroy needs to be fixed
Re: [perl #35305] [PATCH] skip threads 'detatch' test on win32
Jerry Gay [EMAIL PROTECTED] wrote: here's the patch to unskip test 6: Thanks, applied. leo
Re: [perl #35305] [PATCH] skip threads 'detatch' test on win32
Vladimir Lipsky [EMAIL PROTECTED] wrote: D:\usr\local\parrot-HEAD\trunk\t\pmc\threads_4.pasm' failed with exit code 255 Parrot_really_destroy needs to be fixed $verbose++ please, thanks leo
Re: [perl #35305] [PATCH] skip threads 'detatch' test on win32
> As stated already, this (and possibly other thread) test(s) can't succeed as long as Win32 has no event loop that passes the terminate event on to the running interpreter.

1) Why the heck

--- parrot/config/gen/platform/win32/threads.h Mon May 2 14:40:59 2005
+++ parrot-devel/config/gen/platform/win32/threads.h Mon May 2 14:42:58 2005
@@ -0,0 +1,3 @@
+
+#include <parrot/thr_windows.h>
+

isn't in the repository?

2) To test both cases (MS compiler and not), I played with the macro #ifdef _MCS_VER in thr_windows.h and forgot a 1 at the end of it. The patch applied removes it. Though it couldn't affect the test results as long as thr_windows.h wasn't included at all.

mcs_ver.patch
Description: Binary data
Re: [perl #35305] [PATCH] skip threads 'detatch' test on win32
Vladimir Lipsky wrote: parrot (r8016): no change. hangs w/98% cpu. here's the -t output: As stated already, this (and possibly other thread) test(s) can't succeed as long as Win32 has no event loop that passes the terminate event on to the running interpreter. The last two pmc's are allocated from a place which is clearly not the pmc pool arena from which other pmc's are allocated. Run is_pmc_ptr(interp, pmc) or check the involved pmc arenas to verify this assumption. leo
Re: [perl #35305] [PATCH] skip threads 'detatch' test on win32
parrot (r8016): no change. hangs w/98% cpu. here's the -t output:

parrot -t test_b.pasm
     0 find_global P5, _foo        - P5=SArray=PMC(0x7d5a50),
     3 new P2, 18                  - P2=PMCNULL,
     6 find_method P0, P2, thread3 - P0=PMCNULL, P2=ParrotThread=PMC(0x7d5a08),
    10 new P6, 54                  - P6=PMCNULL,
    13 set I3, 2                   - I3=1,
    16 invoke
    17 set I5, P2                  - I5=0, P2=ParrotThread=PMC(0x7d5a08)
    20 getinterp P2                - P2=ParrotThread=PMC(0x7d5a08)
    22 find_method P0, P2, detach  - P0=NCI=PMC(0x638620), P2=ParrotInterpreter=PMC(0x637dc8),
    26 invoke
    27 defined I0, P6              - I0=1, P6=TQueue=PMC(0x7d59f0)
    30 unless I0, -3               - I0=0,
    27 defined I0, P6              - I0=0, P6=TQueue=PMC(0x7d59f0)
    30 unless I0, -3               - I0=0,
    27 defined I0, P6              - I0=0, P6=TQueue=PMC(0x7d59f0)
    30 unless I0, -3               - I0=0,
    27 defined I0, P6              - I0=0, P6=TQueue=PMC(0x7d59f0)
    30 unless I0, -3               - I0=0,
    27 defined I0, P6              - I0=0, P6=TQueue=PMC(0x7d59f0)
    30 unless I0, -3               - I0=0,
    27 defined I0, P6              - I0=0, P6=TQueue=PMC(0x7d59f0)
    30 unless I0, -3               - I0=0,
etc

On 5/15/05, Vladimir Lipsky [EMAIL PROTECTED] wrote:

>> the 'detatch' threads test hangs on win32. this small patch skips one
>
> Could you try the following code (the 'detatch' threads test with one tweak) and tell me if it hangs either and what output you get?
>
>     find_global P5, "_foo"
>     new P2, .ParrotThread
>     find_method P0, P2, "thread3"
>     new P6, .TQueue    # need a flag that thread is done
>     set I3, 2
>     invoke             # start the thread
>     set I5, P2
>     getinterp P2
>     find_method P0, P2, "detach"
>     invoke
> wait:
>     defined I0, P6
>     unless I0, wait
>     print "done\n"
>     sleep 5            # Maybe a race condition?
>     end
>
> .pcc_sub _foo:
>     print "thread\n"
>     new P2, .Integer
>     push P6, P2        # push item on queue
>     returncc
Re: [perl #35305] [PATCH] skip threads 'detatch' test on win32
Jerry Gay [EMAIL PROTECTED] wrote: the 'detatch' threads test hangs on win32. this small patch skips one test, so others may fail :) Thanks, applied. leo
[perl #35305] [PATCH] skip threads 'detatch' test on win32
# New Ticket Created by jerry gay
# Please include the string: [perl #35305]
# in the subject line of all future correspondence about this issue.
# URL: https://rt.perl.org/rt3/Ticket/Display.html?id=35305

the 'detatch' threads test hangs on win32. this small patch skips one test, so others may fail :)

~jerry

Index: t/pmc/threads.t
===
--- t/pmc/threads.t (revision 7994)
+++ t/pmc/threads.t (working copy)
@@ -263,6 +263,8 @@
 500500
 OUTPUT

+SKIP: {
+    skip("detach broken on $^O", 1) if ($^O =~ /MSWin32/);
 output_like(<<'CODE', <<'OUTPUT', "detach");
     find_global P5, _foo
     new P2, .ParrotThread
@@ -290,6 +292,7 @@
 CODE
 /(done\nthread\n)|(thread\ndone\n)/
 OUTPUT
+}

 output_is(<<'CODE', <<'OUTPUT', "share a PMC");
     find_global P5, _foo
Re: COND macros (was: Threads, events, Win32, etc.)
> Parrot's locks will all have wait/signal/broadcast capabilities. We should go rename the macros and rejig the code. This may have to wait

Really? I'm not sure I understand what broadcast does on a lock. Are you talking about something like P5's condpair? If so, why not just cop that code? Of course, I don't have a clue what it does on Win32, so maybe that's not such a good idea.

GNS
Re: Threads, events, Win32, etc.
[ long win32 proposal ] I've to read through that some more times. OK; let me know if you have any questions on how the Win32 stuff works. I tried to explain things that are unlike POSIX, but of course it makes sense to me. Do you alread have ideas for a common API, or where to split the existing threads.c into platform and common code? I didn't see anything in thread.c that was platform specific -- or at least nothing that looked like it wouldn't work on Win32. Obviously thr_win32.h will be much different than thr_pthreads.h. As for a common API, I suppose we would have to figure out how the modules would interact. In the case of IO (any function names I made up have capital letters to distinguish them from anything that may be in the current codebase): * There is some generic IO code that sets up IO and event objects, and it sits below the buffering layer. All this stuff is going to be thread-safe so the low-level IO code shouldn't have to worry about it. This generic IO code sets up the IO and Event objects indicating what file, which operation (read/write/lock/unlock), what to do when the operation completes, how many bytes, file position, and any memory buffer needed. Then it would pass this information to the OS-specific code. So this generic code (say, StartIO()) would create an IO object that contains the file object, a pointer to the r/w buffer if the operation requires it, and a file position and byte count if the operation uses it. It would also contain an event object indicating what to do in case of failure or completion. StartIO pins the memory buffer, locks the IO object, and calls Win32AsyncRead or whatever. The XXXAsyncYYY funtcion starts the IO and returns to StartIO which unlocks the IO object and returns it to the caller. That can then be passed into functions to find out if it's complete, cancel it, etc. If XXX is Win32, the module would just start Read/WriteFile; if it's Solaris, maybe it'll call aioread/write; if it's POSIX without aio (e.g. 
Linux), it'll start up a new thread to do a blocking read/write. When the IO completes, the XXX code figures out the return code and how many bytes read/written. This information is passed to the generic CompleteIO(), which locks the IO object, updates the status (return code, byte count), unpins the buffer, dispatches the event as described by the Event object, and unlocks the IO object. * In the case of timers, there would be XXXCreateTimer and XXXCancelTimer, while the XXX code would need to call FireTimer(). * I suppose there should be an XXXQueueEvent function as well, but I'm not certain of its uses. Actual EventDispatch() would be a generic function that puts an event into a thread's queue and notifies the thread. I am not sure how to handle general events like RunGC. * GUI message handlers need to dispatch events synchronously. So the generic check_events function would call XXXGetGUIMessages which would need to dispatch the messages to whatever registered for them in that thread. Since this would amount to method dispatch instead of event dispatch, it would probably need something special for this. GNS
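The StartIO/CompleteIO flow proposed above could be sketched roughly as below. Everything here is invented for illustration (the struct layout, the names ParrotIOReq/StartIO/CompleteIO, and the fake_platform_read stub standing in for Win32AsyncRead, aioread, or a worker-thread shim); it is not actual Parrot code, just a minimal model of the proposal:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of the generic IO layer described above. */

typedef enum { IO_PENDING, IO_DONE, IO_FAILED } io_status;

typedef struct ParrotIOReq {
    void     *buffer;       /* pinned r/w buffer */
    size_t    nbytes;       /* bytes requested */
    size_t    done_bytes;   /* filled in on completion */
    io_status status;
    void    (*on_complete)(struct ParrotIOReq *); /* the Event object, simplified */
} ParrotIOReq;

/* Platform hook: a real build would point at Win32AsyncRead, aioread,
 * or a blocking read in a worker thread.  Here it completes at once. */
static void fake_platform_read(ParrotIOReq *req)
{
    memset(req->buffer, 'x', req->nbytes);  /* pretend the OS read nbytes */
    req->done_bytes = req->nbytes;
}

ParrotIOReq *StartIO(void *buf, size_t n, void (*cb)(ParrotIOReq *))
{
    ParrotIOReq *req = calloc(1, sizeof *req);
    req->buffer      = buf;
    req->nbytes      = n;
    req->status      = IO_PENDING;
    req->on_complete = cb;
    fake_platform_read(req);  /* the XXXAsyncYYY call; returns immediately */
    return req;
}

void CompleteIO(ParrotIOReq *req)
{
    req->status = IO_DONE;      /* update status; a real version also unpins the buffer */
    if (req->on_complete)
        req->on_complete(req);  /* dispatch the event */
}
```

The caller keeps the returned ParrotIOReq around to poll for completion or cancel, exactly as the proposal describes; only the platform hook differs per OS.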
Re: COND macros (was: Threads, events, Win32, etc.)
On Wed, 17 Nov 2004 16:30:04 +0100, Leopold Toetsch [EMAIL PROTECTED] wrote: Gabe Schaffer [EMAIL PROTECTED] wrote: The problem is a different one: the COND_INIT macro just passes a condition location, the mutex is created in a second step, which isn't needed for windows. OTOH a mutex aka critical section is needed separately. So we should probably define these macros to be: COND_INIT(c, m) COND_DESTROY(c, m) see src/tsq.c for usage. Does win32 require more info to create conditions/mutexes or would these macros suffice? Win32 doesn't require anything else, but I don't think I like this idea. If you do COND_INIT(c, m) and Win32 ignores the 'm', what happens when some code goes to LOCK(m)? It would work under POSIX but break under Win32. I think there should be an opaque struct that contains c,m for POSIX and c for Win32. GNS
Re: COND macros (was: Threads, events, Win32, etc.)
At 8:42 AM -0500 11/19/04, Gabe Schaffer wrote: On Wed, 17 Nov 2004 16:30:04 +0100, Leopold Toetsch [EMAIL PROTECTED] wrote: Gabe Schaffer [EMAIL PROTECTED] wrote: The problem is a different one: the COND_INIT macro just passes a condition location, the mutex is created in a second step, which isn't needed for windows. OTOH a mutex aka critical section is needed separately. So we should probably define these macros to be: COND_INIT(c, m) COND_DESTROY(c, m) see src/tsq.c for usage. Does win32 require more info to create conditions/mutexes or would these macros suffice? Win32 doesn't require anything else, but I don't think I like this idea. If you do COND_INIT(c, m) and Win32 ignores the 'm', what happens when some code goes to LOCK(m)? It would work under POSIX but break under Win32. I think there should be an opaque struct that contains c,m for POSIX and c for Win32. This'll mean that every mutex will have a corresponding condition variable, something that I'm not sure we need. On the other hand, I can't picture us having so many of these that it makes any difference at all, so I don't have a problem with it. It isn't a good general-purpose thread solution (there are plenty of good reasons to unbundle these) but we don't really *need* a general-purpose solution. :) Parrot's locks will all have wait/signal/broadcast capabilities. We should go rename the macros and rejig the code. This may have to wait a little -- we're cleaning up the last of subs, I've still got the string stuff outstanding, and I promised Sam Ruby I'd deal with classes and metaclasses next. So much time, so little to do. No, wait, that's not right... -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Threads, events, Win32, etc.
Not quite. COND_WAIT takes an opaque type defined by the platform, that happens to be a mutex for the pthreads based implementation. It should, but it doesn't. Here's the definition: # define COND_WAIT(c,m) pthread_cond_wait(c, m) You are already in the POSIX specific part. It came from thr_pthread.h, so it should be POSIX. The issue here is that it's #define COND_WAIT(c,m) instead of #define COND_WAIT(c). Every place in the code, whether it's Win32 or POSIX, is going to have to pass in a condition variable and a mutex. Just because Win32 will ignore the second parameter, that isn't going to prevent the code from creating the mutex, initializing it, and passing it in. I'm not sure, if we even should support Win9{8,5}. I'd be happy with simply implementing Win9x as a non-threaded platform. Of course, hopefully nobody will even ask... We'll see. But as Parrot's IO system is gonna be asynchronous in core, I doubt that we'll support it. Obviously Parrot has to run on non-threaded platforms where the kernel threading and AIO stuff just won't work. You can still do user threads, but file IO will still block everything. rationale. I can understand why there would need to be a global event thread (timers, GC, DoD), but why would passing a message from one thread to another need to be serialized through a global event queue? The main reason for the global event queue isn't message passing. The reason is POSIX signals. Basically you aren't allowed to do anything serious in a signal handler, especially you aren't allowed to broadcast a condition or something. So I came up with that experimental code of one thread doing signals. Yes, there has to be a separate thread to get signals, and each thread needs its own event queue, but why does the process have a global event_queue? I suppose there are generic events that could be handled just by the next thread to call check_events, but that isn't what this sounds like. 
And as for IO, I see the obvious advantages of performing synchronous IO functions in a separate thread to make them asynchronous, but that sounds like the job of a worker thread pool. There are many ways to implement this, but serializing them all through one queue sounds like a bottleneck to me. Yes. The AIO library is doing that anyway, i.e. utilizing a thread pool for IO operations. I don't see why there needs to be a separate thread to listen for IOs to finish. Can't that be the same thread that listens for signals? That is, the IO thread just spends its whole life doing select(). If it got a signal, select() should return EINTR, so the thread could then check a flag to see which signal was raised, queue the event in the proper queue(s), and call select() again. OK, I think I understand why...the event thread is in a loop waiting for somebody to tell it that there's an event in the global event queue...which is really the part I don't get yet. Dan did post a series of documents to the list some time ago. Sorry I've no exact subject, but with relevant keywords like events you should find it. Yeah, I remember reading some of his discussions with Damien Neil because I think I went to school with him. Anyway, here's my first draft for a Win32 event model: As for a Win32 event model, I think I should clarify what I'm talking about when I say Win32. Win32 IS NOT: The MS Services for Unix package provides a POSIX subsystem for Windows called Interix which is completely separate from Win32 (i.e. no GUI is possible, no Win SDK calls are available). It has fork(), symlinks, pthreads, SysV IPC, POSIX signals, ptys, and maybe even AIO. This config would be compiled like any other Unix variant with its own idiosyncrasies. Win32 IS PROBABLY NOT: There are various POSIX emulation layers for Win32, such as cygwin and MinGW. These provide many function calls that Unix programs expect, but only to the degree that the Win32 subsystem allows (e.g.
chmod likely will not do anything sensible). Since these programs still run under the Win32 subsystem, Windows GUIs are still possible. I don't know how these will interact with my event model. Win32 IS: This is the standard Win32 API as defined by NT4.0sp6a and higher. If you want to drop support for NT4, then we go to Win2k, but don't gain much. GUI message queues in Win32 are per thread. Each thread has a message queue that is autovivified. Any window that a thread creates has its messages sent to that thread's queue. However, there is no reason that a message actually has to have an associated window. You can send any thread in any process a message, so long as the thread has had its queue autovivified and is not crossing security boundaries. All files or things that look like files can be opened for async access. For example, sockets, files, and pipes can all be async. Any read, write, lock, unlock, or ioctl call can either signal a condition var (Win32 calls them events, and they don't have POSIX
Re: Threads, events, Win32, etc.
Gabe Schaffer [EMAIL PROTECTED] wrote: Yes, there has to be a separate thread to get signals, and each thread needs its own event queue, but why does the process have a global event_queue? I suppose there are generic events that could be handled just by the next thread to call check_events, but that isn't what this sounds like. It's mainly intended for broadcasts and timers. POSIX signals are weird and more or less broken from platform to platform. The only reliable way to get at them is to block the desired signal in all but one thread. This signal gets converted to a global event and from there it can be put into specific threads if they have installed signal handlers for that signal. But as said the existing code is experimental and is likely to change a lot. I don't see why there needs to be a separate thread to listen for IOs to finish. Can't that be the same thread that listens for signals? That's the plan, yes. AIO completion can be delivered as a signal. OK, I think I understand why...the event thread is in a loop waiting for somebody to tell it that there's an event in the global event queue...which is really the part I don't get yet. Well, the event thread is handling timer events on behalf of an interpreter. [ long win32 proposal ] I've to read through that some more times. Do you already have ideas for a common API, or where to split the existing threads.c into platform and common code? GNS leo
COND macros (was: Threads, events, Win32, etc.)
Gabe Schaffer [EMAIL PROTECTED] wrote: Not quite. COND_WAIT takes an opaque type defined by the platform, that happens to be a mutex for the pthreads based implementation. It should, but it doesn't. Here's the definition: # define COND_WAIT(c,m) pthread_cond_wait(c, m) You are already in the POSIX specific part. It came from thr_pthread.h, so it should be POSIX. The issue here is that it's #define COND_WAIT(c,m) instead of #define COND_WAIT(c). Well, in the mentioned (TODO) platform/win32/threads.h you have to define your own COND_WAIT(c, m) - this is the interface of that macro, as POSIX needs the mutex, but you would ignore the 2nd parameter. Please have a look at the empty defines in include/parrot/threads.h. The problem is a different one: the COND_INIT macro just passes a condition location, the mutex is created in a second step, which isn't needed for windows. OTOH a mutex aka critical section is needed separately. So we should probably define these macros to be: COND_INIT(c, m) COND_DESTROY(c, m) see src/tsq.c for usage. Does win32 require more info to create conditions/mutexes or would these macros suffice? [ I'll try to answer more in a separate thread ] leo
Re: Threads, events, Win32, etc.
On Mon, 15 Nov 2004 12:57:00 +0100, Leopold Toetsch [EMAIL PROTECTED] wrote: Gabe Schaffer [EMAIL PROTECTED] wrote: * COND_WAIT takes a mutex because that's how pthreads works, but Win32 condition variables (called events) are kernel objects that do not require any other object to be associated with them. I think this could be cleaned up with further abstraction. Not quite. COND_WAIT takes an opaque type defined by the platform, that happens to be a mutex for the pthreads based implementation. It should, but it doesn't. Here's the definition: # define COND_WAIT(c,m) pthread_cond_wait(c, m) It explicitly takes a condition and a mutex, while it should just be passed a Parrot_cond (or something like that):

typedef struct {
#ifdef pthreads
    pthread_mutex_t m;
    pthread_cond_t c;
#elif Win32
    HANDLE h;
#endif
} Parrot_cond;

The big issue, though, is with the IO thread. On NT the IO is already async and there are no signals (Ctrl+C is handled with a callback), so each interpreter thread should just be able to handle all of this in the check_events functions. Not all. We need to do check_events() for e.g. message passing too. Win9x doesn't have async IO on files, so it still might require separate threads to do IOs. I'm not sure, if we even should support Win9{8,5}. I'd be happy with simply implementing Win9x as a non-threaded platform. Of course, hopefully nobody will even ask... Anyway, it seems to me that all this event/IO stuff needs significantly more abstraction in order to prevent it from becoming a hacked-up mess of #ifdefs. Yep. The system-specific stuff should be split into platform files. A common Parrot API then talks to platform code. ...However, I couldn't find any docs on this, so I just guessed how it all works based on the source. The current state of the implemented pthread model is summarized in docs/dev/events.pod. Thanks, I didn't see that. My problem isn't with what the implementation does, though -- it's that I don't understand the rationale.
I can understand why there would need to be a global event thread (timers, GC, DoD), but why would passing a message from one thread to another need to be serialized through a global event queue? And as for IO, I see the obvious advantages of performing synchronous IO functions in a separate thread to make them asynchronous, but that sounds like the job of a worker thread pool. There are many ways to implement this, but serializing them all through one queue sounds like a bottleneck to me. Au contraire. Your analysis is precise. Do you like to take a shot at a Win32 threads/event model? So we could figure out the necessary splitting of API/implementation. OK. I think I need to have a better understanding on what events actually are, though. Who sends them? What do they mean? Which signals do we actually care about? What are notifications? How will AIO actually be handled? You know, that sort of thing... Maybe there should be a PDD for it? GNS
Re: Threads, events, Win32, etc.
Gabe Schaffer [EMAIL PROTECTED] wrote: On Mon, 15 Nov 2004 12:57:00 +0100, Leopold Toetsch [EMAIL PROTECTED] wrote: Gabe Schaffer [EMAIL PROTECTED] wrote: * COND_WAIT takes a mutex because that's how pthreads works, but Win32 condition variables (called events) are kernel objects that do not require any other object to be associated with them. I think this could be cleaned up with further abstraction. Not quite. COND_WAIT takes an opaque type defined by the platform, that happens to be a mutex for the pthreads based implementation. It should, but it doesn't. Here's the definition: # define COND_WAIT(c,m) pthread_cond_wait(c, m) You are already in the POSIX specific part.

1) During configure parrot includes platform code from files located in config/gen/platform/*/
2) if a platform doesn't have an implementation the ../generic/ directory is used.
3) $ find config -name threads.h
   config/gen/platform/generic/threads.h

So there is no win32/threads.h (yet ;) If the implementation needs some additional libraries, the hints/* are consulted e.g. config/init/hints/linux.pl:$libs .= ' -lpthread'; I'm not sure, if we even should support Win9{8,5}. I'd be happy with simply implementing Win9x as a non-threaded platform. Of course, hopefully nobody will even ask... We'll see. But as Parrot's IO system is gonna be asynchronous in core, I doubt that we'll support it. The current state of the implemented pthread model is summarized in docs/dev/events.pod. Thanks, I didn't see that. My problem isn't with what the implementation does, though -- it's that I don't understand the rationale. I can understand why there would need to be a global event thread (timers, GC, DoD), but why would passing a message from one thread to another need to be serialized through a global event queue? The main reason for the global event queue isn't message passing. The reason is POSIX signals.
Basically you aren't allowed to do anything serious in a signal handler, especially you aren't allowed to broadcast a condition or something. So I came up with that experimental code of one thread doing signals. And as for IO, I see the obvious advantages of performing synchronous IO functions in a separate thread to make them asynchronous, but that sounds like the job of a worker thread pool. There are many ways to implement this, but serializing them all through one queue sounds like a bottleneck to me. Yes. The AIO library is doing that anyway, i.e. utilizing a thread pool for IO operations. Au contraire. Your analysis is precise. Do you like to take a shot at a Win32 threads/event model? So we could figure out the necessary splitting of API/implementation. OK. I think I need to have a better understanding on what events actually are, though. Who sends them? What do they mean? Which signals do we actually care about? What are notifications? How will AIO actually be handled? You know, that sort of thing... Maybe there should be a PDD for it? Dan did post a series of documents to the list some time ago. Sorry I've no exact subject, but with relevant keywords like events you should find it. GNS leo
Threads, events, Win32, etc.
I was just browsing the Parrot source, and noticed that the threading implementation is a bit Unix/pthread-centric. For example: * COND_WAIT takes a mutex because that's how pthreads works, but Win32 condition variables (called events) are kernel objects that do not require any other object to be associated with them. I think this could be cleaned up with further abstraction. * CLEANUP_PUSH doesn't have any Win32 analog that I know of, although it's not clear why this might be needed for Parrot anyway. Right now it just looks like it's used to prevent threads from abandoning a mutex, which isn't a problem with Win32. The big issue, though, is with the IO thread. On NT the IO is already async and there are no signals (Ctrl+C is handled with a callback), so each interpreter thread should just be able to handle all of this in the check_events functions. That is, AIO and timers allow you to specify a completion callback (asynchronous procedure call) that gets executed once you tell the OS that you're ready for them (e.g. via Sleep), so the whole event dispatching system may not even be necessary. Win9x doesn't have async IO on files, so it still might require separate threads to do IOs. Note that the Windows message queue does not really get involved here (unless you want it to), as it is mainly for threads that have UIs or use COM/DDE. Anyway, it seems to me that all this event/IO stuff needs significantly more abstraction in order to prevent it from becoming a hacked-up mess of #ifdefs. However, I couldn't find any docs on this, so I just guessed how it all works based on the source. Feel free to whack me with a cluestick if I'm wrong about anything. GNS
Re: Threads, events, Win32, etc.
Gabe Schaffer [EMAIL PROTECTED] wrote: I was just browsing the Parrot source, and noticed that the threading implementation is a bit Unix/pthread-centric. For example: * COND_WAIT takes a mutex because that's how pthreads works, but Win32 condition variables (called events) are kernel objects that do not require any other object to be associated with them. I think this could be cleaned up with further abstraction. Not quite. COND_WAIT takes an opaque type defined by the platform, that happens to be a mutex for the pthreads based implementation. * CLEANUP_PUSH doesn't have any Win32 analog that I know of, although it's not clear why this might be needed for Parrot anyway. Right now it just looks like it's used to prevent threads from abandoning a mutex, which isn't a problem with Win32. Yes. And it'll very likely go away. But anyway - it's a define by the platform. So you can define it being a noop for win32. The big issue, though, is with the IO thread. On NT the IO is already async and there are no signals (Ctrl+C is handled with a callback), so each interpreter thread should just be able to handle all of this in the check_events functions. Not all. We need to do check_events() for e.g. message passing too. Win9x doesn't have async IO on files, so it still might require separate threads to do IOs. I'm not sure, if we even should support Win9{8,5}. Anyway, it seems to me that all this event/IO stuff needs significantly more abstraction in order to prevent it from becoming a hacked-up mess of #ifdefs. Yep. The system-specific stuff should be split into platform files. A common Parrot API then talks to platform code. ...However, I couldn't find any docs on this, so I just guessed how it all works based on the source. The current state of the implemented pthread model is summarized in docs/dev/events.pod. ... Feel free to whack me with a cluestick if I'm wrong about anything. Au contraire. Your analysis is precise. 
Do you like to take a shot at a Win32 threads/event model? So we could figure out the necessary splitting of API/implementation. GNS leo
Re: Threads, events, Win32, etc.
At 12:57 PM +0100 11/15/04, Leopold Toetsch wrote: Gabe Schaffer [EMAIL PROTECTED] wrote: I was just browsing the Parrot source, and noticed that the threading implementation is a bit Unix/pthread-centric. For example: * COND_WAIT takes a mutex because that's how pthreads works, but Win32 condition variables (called events) are kernel objects that do not require any other object to be associated with them. I think this could be cleaned up with further abstraction. Not quite. COND_WAIT takes an opaque type defined by the platform, that happens to be a mutex for the pthreads based implementation. Yep. This is important to note -- the joys of portability often mean that functions in the source carry parameters that might not actually get used. That's the case here, since POSIX threads (which the unices and VMS use for their threading model) requires a mutex. I fully expect we'll have similar bits carried around to accommodate Windows too. The big issue, though, is with the IO thread. On NT the IO is already async and there are no signals (Ctrl+C is handled with a callback), so each interpreter thread should just be able to handle all of this in the check_events functions. Not all. We need to do check_events() for e.g. message passing too. And notifications, and possibly cleanup of objects with finalizers. Win9x doesn't have async IO on files, so it still might require separate threads to do IOs. I'm not sure, if we even should support Win9{8,5}. Nope. Or, rather, we officially don't care if we run on Win9x/WinME. If we do, swell. If not, well... Win9x isn't particularly special here. We feel the same about AmigaDOS, VMS 5.5, HP/UX 10.x, SunOS, Linux 1.x, and BeOS. Amongst others. Anyway, it seems to me that all this event/IO stuff needs significantly more abstraction in order to prevent it from becoming a hacked-up mess of #ifdefs. Yep. The system-specific stuff should be split into platform files. A common Parrot API then talks to platform code. Yeah.
The event stuff's definitely primitive, and not much thought's been given to it as of yet. -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: [Proposal] JIT, exec core, threads, and architectures
Jeff Clites [EMAIL PROTECTED] wrote: On Oct 17, 2004, at 3:18 AM, Leopold Toetsch wrote: Nevertheless we have to create managed objects (a Packfile PMC) so that we can recycle unused eval-segments. True, and some eval-segments are done as soon as they run (eval "3 + 4"), whereas others may result in code which needs to stay around (eval "sub {}"), and even in the latter case not _all_ of the code generated in the eval would need to stay around. It seems that it may be hard to determine what can be recycled, and when. Well, not really. As long as you have a reference to the code piece, it's alive. And we have to protect the packfile dictionary with mutexes, when this managing structure changes, i.e. when new segments get chained into this list or when they get destroyed. Yes, though it's not clear to me if all eval-segments will need to go into a globally-accessible dictionary. (e.g., it seems the 3 + 4 case above would not.) It probably depends on the generated code. If this code creates globals (e.g. Sub PMCs) it ought to stay around. [ toss constant op variations ] For PIR yes, but the PASM assembler can't know for sure what register would be safe to use--the code could be using its own obscure calling conventions. PASM would need rewriting to only use the available ops, basically. JEff leo
Re: [Proposal] JIT, exec core, threads, and architectures
On Oct 19, 2004, at 1:56 AM, Leopold Toetsch wrote: Jeff Clites [EMAIL PROTECTED] wrote: On Oct 17, 2004, at 3:18 AM, Leopold Toetsch wrote: Nevertheless we have to create managed objects (a Packfile PMC) so that we can recycle unused eval-segments. True, and some eval-segments are done as soon as they run (eval "3 + 4"), whereas others may result in code which needs to stay around (eval "sub {}"), and even in the latter case not _all_ of the code generated in the eval would need to stay around. It seems that it may be hard to determine what can be recycled, and when. Well, not really. As long as you have a reference to the code piece, it's alive. Yes, that's what I meant. In the case of: $sum = eval "3 + 4"; you don't have any such reference. In the case of: $sub = eval "sub { return 7 }"; you do. In the case of: $sub = eval "3 + 4; sub { return 7 }"; you've got a reference to the sub still, but the 3 + 4 code is no longer reachable, so wouldn't need to stay around. But it's possible that Parrot won't be able to tell the difference, and will have to keep around more than is necessary. And we have to protect the packfile dictionary with mutexes, when this managing structure changes, i.e. when new segments get chained into this list or when they get destroyed. Yes, though it's not clear to me if all eval-segments will need to go into a globally-accessible dictionary. (e.g., it seems the 3 + 4 case above would not.) It probably depends on the generated code. If this code creates globals (e.g. Sub PMCs) it ought to stay around. Yes, that's what I meant by not all--some yes, some no. JEff
Re: [Proposal] JIT, exec core, threads, and architectures
Jeff Clites wrote: On Oct 16, 2004, at 4:47 AM, Leopold Toetsch wrote: String, number (and PMC) constants are all addressed in terms of the compiling interpreter. ... When we do an eval() e.g. in a loop, we have to create a new constant table (and recycle it later, which is a different problem). Running such a compiled piece of code with different threads would currently do the wrong thing. The correct constant table depends on the code segment, rather than the specific interpreter, right? Yes. ...That means that referencing the absolute address of the const table entry would be correct for JIT code no matter the executing thread, but getting the const table from the compiling interpreter is wrong if that interpreter isn't holding a reference to the corresponding code segment. Thinking more about that I've to admit that my above conclusion is very likely wrong. When a piece of code gets compiled, we can say that we are creating a read-only data structure (code, constants, metadata). This data structure can be shared between different threads and using absolute addresses for the constants is ok. Nevertheless we have to create managed objects (a Packfile PMC) so that we can recycle unused eval-segments. And we have to protect the packfile dictionary with mutexes, when this managing structure changes, i.e. when new segments get chained into this list or when they get destroyed. Access to constants in the constant table is not only a problem for the JIT runcore, it's a lengthy operation for all code. For a string constant at PC[i]:

interpreter->code->const_table->constants[PC[i]]->u.string

So for JIT and prederefed cores that's not a problem. OTOH it might be better to just toss all the constant table access in all instructions, except:

set_n_nc
set_s_sc
set_p_pc    # alias set_p_kc

This would reduce the interpreter size significantly (compare the size of core_ops_cgp.o with core_ops_cg.o).
Reducing the size is good, but this doesn't overall reduce the number of accesses to the constant table, just changes which op is doing them. Not quite, e.g.:

set P0["foo"], "bar"
set S0, P0["foo"]

are 3 accesses to the constant table. It would be

set S1, "foo"
set S2, "bar"
set P0[S1], S2
set S0, P0[S1]

with 2 accesses as long as there is no pressure on the register allocator. The assembler could still allow all constant variations of opcodes and just translate it. For this we'd need a special register to hold the loaded constant, so that we don't overwrite a register which is in use. No, just the registers we have anyway. JEff leo
Re: [Proposal] JIT, exec core, threads, and architectures
On Oct 17, 2004, at 3:18 AM, Leopold Toetsch wrote: Jeff Clites wrote: On Oct 16, 2004, at 4:47 AM, Leopold Toetsch wrote: Nevertheless we have to create managed objects (a Packfile PMC) so that we can recycle unused eval-segments. True, and some eval-segments are done as soon as they run (eval "3 + 4"), whereas others may result in code which needs to stay around (eval "sub {}"), and even in the latter case not _all_ of the code generated in the eval would need to stay around. It seems that it may be hard to determine what can be recycled, and when. And we have to protect the packfile dictionary with mutexes, when this managing structure changes, i.e. when new segments get chained into this list or when they get destroyed. Yes, though it's not clear to me if all eval-segments will need to go into a globally-accessible dictionary. (e.g., it seems the 3 + 4 case above would not.) OTOH it might be better to just toss all the constant table access in all instructions, except:

set_n_nc
set_s_sc
set_p_pc    # alias set_p_kc

This would reduce the interpreter size significantly (compare the size of core_ops_cgp.o with core_ops_cg.o). Reducing the size is good, but this doesn't overall reduce the number of accesses to the constant table, just changes which op is doing them. Not quite, e.g.:

set P0["foo"], "bar"
set S0, P0["foo"]

are 3 accesses to the constant table. It would be

set S1, "foo"
set S2, "bar"
set P0[S1], S2
set S0, P0[S1]

with 2 accesses as long as there is no pressure on the register allocator. Sure, but you can do this optimization today--narrowing it down to just those 3 ops isn't required. But it only helps if local re-use of the same constants is frequent, and it may not be. (But still, it's a good optimization for a compiler to implement--it just may not have a huge effect.) Also, there's some subtlety. This:

set S1, "foo"
set S2, "foo"

isn't the same as:

set S1, "foo"
set S2, S1

but rather:

set S1, "foo"
clone S2, S1

since 'set' copies in the s_sc case. (That's not a problem, just something to keep in mind.) The assembler could still allow all constant variations of opcodes and just translate it. For this we'd need a special register to hold the loaded constant, so that we don't overwrite a register which is in use. No, just the registers we have anyway, For PIR yes, but the PASM assembler can't know for sure what register would be safe to use--the code could be using its own obscure calling conventions. JEff
Re: [Proposal] JIT, exec core, threads, and architectures
Jeff Clites [EMAIL PROTECTED] wrote: On Oct 14, 2004, at 12:10 PM, Leopold Toetsch wrote: Proposal:

* we mandate that JIT code uses interpreter-relative addressing
  - because almost all platforms do it
  - because some platforms just can't do anything else
  - and of course to avoid re-JITting for every new thread

FYI, the PPC JIT does already do parrot register addressing relative to the interpreter pointer, which as you said is already in a CPU register. This is actually fewer instructions than using absolute addressing would require (one rather than three). Yes, and not only PPC, *all* but i386. We do still re-JIT for each thread on PPC, though we wouldn't have to (just never changed it to not). Doing that or not depending on a specific JIT platform is error prone and clutters the source code. ... But, we use this currently, because there is one issue with threads: With a thread, you don't start from the beginning of the JITted code segment, This isn't a threading issue. We can always start execution in the middle of one segment, e.g. after an exception. That's already handled on almost all JIT platforms and no problem. The code emitted in Parrot_jit_begin gets the C<cur_opcode *> as argument and has to branch there, always. JEff leo
Re: [Proposal] JIT, exec core, threads, and architectures
Jeff Clites wrote: We do still re-JIT for each thread on PPC, though we wouldn't have to The real problem that all JIT architectures still have is a different one: it's called const_table and hidden either in the CONST macro or in syntax like NUM_CONST, which is translated by the jit2h.pl utility. String, number (and PMC) constants are all addressed in terms of the compiling interpreter. Basically everywhere where the exec code adds text relocations we aren't safe (e.g. load_nc in the PPC jit_emit code). When we do an eval() e.g. in a loop, we have to create a new constant table (and recycle it later, which is a different problem). Running such a compiled piece of code with different threads would currently do the wrong thing. Access to constants in the constant table is not only a problem for the JIT runcore, it's a lengthy operation for all code. For a string constant at PC[i]: interpreter->code->const_table->constants[PC[i]]->u.string These are 3 indirections to get at the constants pointer array, and worse, they depend on each other; emitting these 3 instructions on an i386 stalls for 1 cycle twice (but the compiler is clever and interleaves other instructions). For the JIT core, we can precalculate the location of the constants array and store it in the stack or even in a register (on not so register-crippled machines like i386). It only needs reloading when an C<invoke> statement is emitted. OTOH it might be better to just toss all the constant table access in all instructions, except: set_n_nc set_s_sc set_p_pc # alias set_p_kc This would reduce the interpreter size significantly (compare the size of core_ops_cgp.o with core_ops_cg.o). The assembler could still allow all constant variations of opcodes and just translate it. leo
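The three-indirection chain and the proposed fix can be sketched in plain C. This is only an illustration with invented mock structs (the field names mirror the access chain above, but the real Parrot layouts are more involved): the slow path re-walks interpreter->code->const_table->constants on every access, while the fast path hoists the constants array into a local, which is what caching it in a stack slot or spare register would buy the JIT core.

```c
#include <assert.h>
#include <stddef.h>

/* Mock structures; names follow the mail's access chain, real Parrot
 * layouts differ. */
typedef struct { const char *str; } Constant;          /* stands in for ->u.string */
typedef struct { Constant **constants; } ConstTable;
typedef struct { ConstTable *const_table; } PackFile;
typedef struct { PackFile *code; } Interp;

/* Naive per-access path: three dependent loads per constant fetch. */
static const char *get_const_slow(Interp *interp, int idx) {
    return interp->code->const_table->constants[idx]->str;
}

/* The optimization described above: the constants array is loaded once
 * (into a local, a stack slot, or a spare register) and reloaded only
 * when an invoke could swap code segments. */
static const char *get_const_fast(Constant **cached_constants, int idx) {
    return cached_constants[idx]->str;
}
```

In a loop over many constant accesses, the fast variant turns three dependent loads per access into one.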
Re: [Proposal] JIT, exec core, threads, and architectures
On Oct 16, 2004, at 12:26 AM, Leopold Toetsch wrote: Jeff Clites [EMAIL PROTECTED] wrote: ... But, we use this currently, because there is one issue with threads: With a thread, you don't start from the beginning of the JITted code segment, This isn't a threading issue. We can always start execution in the middle of one segment, e.g. after an exception. That's already handled on almost all JIT platforms and no problem. The code emitted in Parrot_jit_begin gets the C<cur_opcode *> as argument and has to branch there, always. I was remembering wrong--we do this on PPC too. On Oct 16, 2004, at 4:47 AM, Leopold Toetsch wrote: String, number (and PMC) constants are all addressed in terms of the compiling interpreter. ... When we do an eval() e.g. in a loop, we have to create a new constant table (and recycle it later, which is a different problem). Running such a compiled piece of code with different threads would currently do the wrong thing. The correct constant table depends on the code segment, rather than the specific interpreter, right? That means that referencing the absolute address of the const table entry would be correct for JIT code no matter the executing thread, but getting the const table from the compiling interpreter is wrong if that interpreter isn't holding a reference to the corresponding code segment. Access to constants in the constant table is not only a problem for the JIT runcore, it's a lengthy operation for all code. For a string constant at PC[i]: interpreter->code->const_table->constants[PC[i]]->u.string These are 3 indirections to get at the constants pointer array, and worse, they depend on each other; emitting these 3 instructions on an i386 stalls for 1 cycle twice (but the compiler is clever and interleaves other instructions). For the JIT core, we can precalculate the location of the constants array and store it in the stack or even in a register (on not so register-crippled machines like i386). 
It only needs reloading when an C<invoke> statement is emitted. For PPC JIT, it seems that we are putting in the address of the specific const table entry, as an immediate. OTOH it might be better to just toss all the constant table access in all instructions, except: set_n_nc set_s_sc set_p_pc # alias set_p_kc This would reduce the interpreter size significantly (compare the size of core_ops_cgp.o with core_ops_cg.o). Reducing the size is good, but this doesn't overall reduce the number of accesses to the constant table, it just changes which op is doing them. The assembler could still allow all constant variations of opcodes and just translate it. For this we'd need a special register to hold the loaded constant, so that we don't overwrite a register which is in use. Jeff
Re: [Proposal] JIT, exec core, threads, and architectures
On Oct 14, 2004, at 12:10 PM, Leopold Toetsch wrote: Proposal: * we mandate that JIT code uses interpreter-relative addressing - because almost all platforms do it - because some platforms just can't do anything else - and of course to avoid re-JITting for every new thread FYI, the PPC JIT does already do parrot register addressing relative to the interpreter pointer, which as you said is already in a CPU register. This is actually fewer instructions than using absolute addressing would require (one rather than three). We do still re-JIT for each thread on PPC, though we wouldn't have to (just never changed it to not). But, we use this currently, because there is one issue with threads: With a thread, you don't start from the beginning of the JITted code segment, but rather you need to start with a specific Parrot function call, somewhere in the middle. But you can't just jump to that instruction, because it would not have the setup code needed when entering the JITted section. So currently, we use a technique whereby the beginning of the JITted section has, right after the setup code, a jump to the correct starting address--in the main thread case, this is just a jump to the next instruction (essentially a noop), but in the thread case, it's a jump to the function which the thread is going to run. So right now the JITted code for a secondary thread differs by one instruction from that for the main thread. We'll need to work out a different mechanism for handling this--probably just a tiny separate JITted section to set things up for a secondary thread, before doing an inter-section jump to the right place. Jeff
[Proposal] JIT, exec core, threads, and architectures
First some facts: - all JIT platforms *except* i386 have a register reserved for the runtime interpreter - Parrot register addressing is done relative to that CPU register - that would allow reusing the JITted code for different threads aka interpreters - but because i386 uses absolute addresses, that's not done - the latter point is also the cause for zillions of text relocations currently needed for the EXEC/i386 run core. I can't imagine that this will speed up program start :) Proposal: * we mandate that JIT code uses interpreter-relative addressing - because almost all platforms do it - because some platforms just can't do anything else - and of course to avoid re-JITting for every new thread * src/jit.c calls some platform interface functions, which copy between Parrot registers and CPU registers. These are called at the begin and end of JITted code sections and are now based on absolute memory addresses. This should be changed to use offsets. To accommodate platforms that have the interpreter cached in a CPU register, I'm thinking of the following interface: MACRO int Parrot_jit_emit_get_base_reg_no(jit_info *pc) // possibly emit code and return register number of base pointer // this register should be one of the scratch registers // this must be a macro used as C<reg = foo(pc);> Parrot_jit_emit_mov_MR(..., int base_reg_no, size_t offset, int src_reg_no) ... other 3 move functions similar Register addressing is done relative to the base pointer, which currently is REG_INT(0) or the interpreter, but that might change. The code to load this register will be emitted just before the actual register moves are done. It's currently a noop for all but i386, which would need one instruction mov 16(%ebp), %eax * Currently all platforms are using homegrown defines to calculate the register offset, some of these are even readable ;) To get rid of that, we should provide a set of macros that calculate the register offset relative to the base pointer. 
REG_OFFS_INT(x) // get offset for INTVAL reg no x REG_OFFS_STR(x) ... * Implementation First the framework should be implemented. To allow some transition time, the old register move semantics remain for some time. Depending on the defined()ness of C<Parrot_jit_emit_get_base_reg_no> the new code will be used. Comments welcome leo
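The proposed REG_OFFS_INT(x)-style macros amount to base-plus-offset addressing. A minimal sketch, with an invented MockInterp layout standing in for the real interpreter (the actual Parrot register file differs): the macros compute offsets with offsetof, and register access becomes a load from base pointer plus offset instead of from an absolute address baked into the JITted code.

```c
#include <assert.h>
#include <stddef.h>

typedef long INTVAL;

/* Hypothetical layout; the real Parrot register file differs. This
 * only illustrates the addressing scheme. */
typedef struct {
    INTVAL int_regs[32];
    double num_regs[32];
} MockInterp;

/* Offsets relative to the base pointer, as the proposed macros
 * would compute them. */
#define REG_OFFS_INT(x) (offsetof(MockInterp, int_regs) + (x) * sizeof(INTVAL))
#define REG_OFFS_NUM(x) (offsetof(MockInterp, num_regs) + (x) * sizeof(double))

/* What an emitted mov would do at runtime: dereference base + offset.
 * The base register holds the per-thread interpreter, so the same
 * JITted code works for every thread. */
static INTVAL load_int_reg(const char *base, size_t offs) {
    return *(const INTVAL *)(base + offs);
}
```

Because only the base pointer changes per thread, no text relocations and no re-JITting are needed.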
Update to Threads/IO issue on Cygwin
Since all the tests were passing in the past, I decided to play the CVS game to find exactly when/what changed. Good news is - nothing to do with Parrot Bad news is - it means it was an upgrade to Cygwin, which I also do on a daily basis. I have no way of tracking down what changed but I could ping the Cygwin list if anyone thinks it might help. Cheers, Joshua Gatcomb a.k.a. Limbic~Region ___ Do you Yahoo!? Declare Yourself - Register online to vote today! http://vote.yahoo.com
Re: Update to Threads/IO issue on Cygwin
Joshua Gatcomb [EMAIL PROTECTED] wrote: Since all the tests were passing in the past, I decided to play the CVS game to find exactly when/what changed. Good news is - nothing to do with Parrot Good, thanks for taking the time to investigate that. Bad news is - it means it was an upgrade to Cygwin, which I also do on a daily basis. Well, there are probably releases which get used most. A daily update always has some risk. Cheers, Joshua Gatcomb leo
Another Update to threads/IO problem on Cygwin
I happened to have found the last cygwin1.dll lying around in /tmp that I kept as a backup. I swapped it with the current cygwin1.dll just to see if it would make the IO problem go away and much to my happy surprise - it did. Details: cygwin1.dll-1.5.10-3 - previous stable build, works great cygwin1.dll-1.5.11-1 - current stable build, blows up I will be pinging the Cygwin list momentarily to see if they have any insight. Cheers Joshua Gatcomb a.k.a. Limbic~Region
Re: Another Update to threads/IO problem on Cygwin
--- Joshua Gatcomb [EMAIL PROTECTED] wrote: I happened to have found the last cygwin1.dll lying around in /tmp that I kept as a backup. I swapped it with the current cygwin1.dll just to see if it would make the IO problem go away and much to my happy surprise - it did. Details: cygwin1.dll-1.5.10-3 - previous stable build, works great cygwin1.dll-1.5.11-1 - current stable build, blows up I will be pinging the Cygwin list momentarily to see if they have any insight. I didn't get a response from the Cygwin list, but I asked one of the Cygwin knowledgeable monks at the Monastery (http://www.perlmonks.org). They indicated there was a major problem with 1.5.11-1 with threads losing output (my problem exactly) and that it was corrected with one of the latest snapshots (http://cygwin.com/snapshots/). I downloaded it and tried it - everything is working great. make test - all tests pass make testj - all tests pass Both work even with some aggressive optimizations passed to Configure.pl All is once again right in the world ;-) Cheers Joshua Gatcomb a.k.a. Limbic~Region __ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com
Re: Threads on Cygwin
Joshua Gatcomb [EMAIL PROTECTED] wrote: [ Cygwin thread tests don't print all ] Does this patch help? It creates shared IO resources. But it's of course not final: there are no precautions against one thread changing the PIO of another thread or such, no locks yet, nothing. leo

--- parrot/src/inter_create.c	Fri Oct  1 15:26:26 2004
+++ parrot-leo/src/inter_create.c	Sat Oct  2 12:06:10 2004
@@ -31,6 +31,12 @@
 #define ATEXIT_DESTROY

 /*
+ * experimental: use shared IO resources for threads
+ */
+
+#define PARROT_SHARED_IO 1
+
+/*

=item C<static int is_env_var_set(const char* var)>

@@ -125,7 +131,15 @@
     /* PANIC will fail until this is done */
     SET_NULL(interpreter->piodata);
+#if PARROT_SHARED_IO
+    if (parent) {
+        interpreter->piodata = parent->piodata;
+    }
+    else
+        PIO_init(interpreter);
+#else
     PIO_init(interpreter);
+#endif

     if (is_env_var_set("PARROT_GC_DEBUG")) {
 #if ! DISABLE_GC_DEBUG
@@ -225,6 +239,9 @@
     setup_default_compreg(interpreter);

     /* setup stdio PMCs */
+#if PARROT_SHARED_IO
+    if (!parent)
+#endif
     PIO_init(interpreter);

     /* Done. Return and be done with it */
@@ -330,6 +347,9 @@
 */

     /* Now the PIOData gets also cleared */
+#if PARROT_SHARED_IO
+    if (!interpreter->parent_interpreter)
+#endif
     PIO_finish(interpreter);

 /*
Re: Threads on Cygwin
On Saturday 02 October 2004 12:49, Leopold Toetsch wrote: Does this patch help? No, it makes things worse: --- without-patch.txt 2004-10-03 14:35:58.824775096 +0200 +++ with-patch.txt 2004-10-03 14:35:37.843964664 +0200 @@ -30,7 +30,12 @@ # expected: '500500 # 500500 # ' -ok 6 - detach +not ok 6 - detach +# Failed test (t/pmc/threads.t at line 257) +# 'thread +# ' +# doesn't match '/(done\nthread\n)|(thread\ndone\n)/ +# ' not ok 7 - share a PMC # Failed test (t/pmc/threads.t at line 285) # got: 'thread @@ -73,4 +78,4 @@ # ' ok 10 # skip no shared PerlStrings yet ok 11 # skip no shared PerlStrings yet -# Looks like you failed 6 tests of 11. +# Looks like you failed 7 tests of 11. with-patch.txt: $ perl -Ilib t/pmc/threads.t 1..11 ok 1 - interp identity not ok 2 - thread type 1 # Failed test (t/pmc/threads.t at line 61) # got: 'thread 1 # ' # expected: 'thread 1 # main 10 # ' not ok 3 - thread type 2 # Failed test (t/pmc/threads.t at line 98) # got: 'ok 1 # ok 2 # hello from 1 thread # ParrotThread tid 1 # Sub # ' # expected: 'ok 1 # ok 2 # hello from 1 thread # ParrotThread tid 1 # Sub # from 10 interp # ' ok 4 - thread - kill not ok 5 - join, get retval # Failed test (t/pmc/threads.t at line 189) # got: '' # expected: '500500 # 500500 # ' not ok 6 - detach # Failed test (t/pmc/threads.t at line 257) # 'thread # ' # doesn't match '/(done\nthread\n)|(thread\ndone\n)/ # ' not ok 7 - share a PMC # Failed test (t/pmc/threads.t at line 285) # got: 'thread # 20 # ' # expected: 'thread # 20 # done # 21 # ' not ok 8 - multi-threaded # Failed test (t/pmc/threads.t at line 320) # got: '3 # 1 # 2 # 3 # done thread # ' # expected: '3 # 1 # 2 # 3 # done thread # done main # ' not ok 9 - multi-threaded strings via SharedRef # Failed test (t/pmc/threads.t at line 368) # got: '3 # ok 1 # ok 2 # ok 3 # done thread # ' # expected: '3 # ok 1 # ok 2 # ok 3 # done thread # done main # ' ok 10 # skip no shared PerlStrings yet ok 11 # skip no shared PerlStrings yet # Looks like you 
failed 7 tests of 11. without-patch.txt: $ perl -Ilib t/pmc/threads.t 1..11 ok 1 - interp identity not ok 2 - thread type 1 # Failed test (t/pmc/threads.t at line 61) # got: 'thread 1 # ' # expected: 'thread 1 # main 10 # ' not ok 3 - thread type 2 # Failed test (t/pmc/threads.t at line 98) # got: 'ok 1 # ok 2 # hello from 1 thread # ParrotThread tid 1 # Sub # ' # expected: 'ok 1 # ok 2 # hello from 1 thread # ParrotThread tid 1 # Sub # from 10 interp # ' ok 4 - thread - kill not ok 5 - join, get retval # Failed test (t/pmc/threads.t at line 189) # got: '' # expected: '500500 # 500500 # ' ok 6 - detach not ok 7 - share a PMC # Failed test (t/pmc/threads.t at line 285) # got: 'thread # 20 # ' # expected: 'thread # 20 # done # 21 # ' not ok 8 - multi-threaded # Failed test (t/pmc/threads.t at line 320) # got: '3 # 1 # 2 # 3 # done thread # ' # expected: '3 # 1 # 2 # 3 # done thread # done main # ' not ok 9 - multi-threaded strings via SharedRef # Failed test (t/pmc/threads.t at line 368) # got: '3 # ok 1 # ok 2 # ok 3 # done thread # ' # expected: '3 # ok 1 # ok 2 # ok 3 # done thread # done main # ' ok 10 # skip no shared PerlStrings yet ok 11 # skip no shared PerlStrings yet # Looks like you failed 6 tests of 11. jens
Re: Threads on Cygwin
--- Jens Rieks [EMAIL PROTECTED] wrote: On Saturday 02 October 2004 12:49, Leopold Toetsch wrote: Does this patch help? No, it makes things worse: Actually it doesn't. There is something wrong with threads_6.pasm as my output for the test doesn't change with or without the patch and yet one passes and the other doesn't. Judging from the actual code it is supposed to print "done" in there somewhere and it doesn't. Test 6 is one of the few that has a regex for checking output: /(done\nthread\n)|(thread\ndone\n)/ So I am rather confused as to why it is passing without the patch since it only ever prints "thread" jens Joshua Gatcomb a.k.a. Limbic~Region __ Do you Yahoo!? Yahoo! Mail - 50x more storage than other providers! http://promotions.yahoo.com/new_mail
Re: Threads on Cygwin
Joshua Gatcomb [EMAIL PROTECTED] wrote: PIO_OS_UNIX is the one defined and now parrot squawks Polly wanna Unix every time I run it ;-) Now what? Fix the thread related IO bug? Seriously, I don't know yet if the IO initialization is done correctly for threads. Currently each thread has its own IO subsystem, which might be wrong. It could be that the IO PMCs for standard handles (or for all open files?) have to be shared between threads. leo
Re: Threads on Cygwin
Joshua Gatcomb [EMAIL PROTECTED] wrote: ... only 1 of the two messages is displayed I've fixed a flaw in the IO flush code. Please try again, thanks. leo
Re: Threads on Cygwin
Joshua Gatcomb [EMAIL PROTECTED] wrote: I agree, but that doesn't explain why only 1 of the two messages is displayed to the screen when the sleep statement is present. Overlooked that in the first place. So what you get is that the one *or* the other string is displayed. That's a serious problem, likely related to the IO subsystem. So the first question is: which IO system is active on Cygwin: the windows or the unix variant? But looking at the source code it seems that the IO system shutdown code is rather broken: only stdout and stderr streams are flushed but not the actual PIOs. I'll try to fix that. Joshua Gatcomb leo
Re: Threads on Cygwin
--- Leopold Toetsch [EMAIL PROTECTED] wrote: Joshua Gatcomb [EMAIL PROTECTED] wrote: ... only 1 of the two messages is displayed I've fixed a flaw in the IO flush code. Please try again, thanks. Still not working, but thanks! The behavior has changed a bit though. Here is the behavior prior to the fix - notice the location of the sleep statement Case 1: (as checked out) $ cat t/pmc/threads_2.pasm snipped set I3, 1 invoke # start the thread sleep 1 print main print I5 $ ./parrot t/pmc/threads_2.pasm thread 1 Case 2: (remove sleep altogether) $ ./parrot t/pmc/threads_2.pasm main 10 thread 1 Case 3: $ cat t/pmc/threads_2.pasm snipped invoke # start the thread print main sleep 1 print I5 $ ./parrot t/pmc/threads_2.pasm main 10 After the change - case 3 now prints thread 1. You mentioned in the previous email that you were interested in knowing if this was Windows IO or the Cygwin variant. I would love to give you that information, but color me clueless. leo Joshua Gatcomb a.k.a. Limbic~Region
Re: Threads on Cygwin
Joshua Gatcomb [EMAIL PROTECTED] wrote: After the change - case 3 now prints thread 1. Strange. You mentioned in the previous email that you were interested in knowing if this was Windows IO or the Cygwin variant. I would love to give you that information, but color me clueless. S/Cygwin/unix/ Have a look at the defines in include/parrot/io.h. It's not quite visible which one is active w/o debugger, but you could insert some print statements in io/io.c:PIO_init_stacks(), where there are explicit cases for PIO_OS_*. Joshua Gatcomb leo
Re: Threads on Cygwin
--- Leopold Toetsch [EMAIL PROTECTED] wrote: Joshua Gatcomb [EMAIL PROTECTED] wrote: After the change - case 3 now prints thread 1. Strange. indeed You mentioned in the previous email that you were interested in knowing if this was Windows IO or the Cygwin variant. I would love to give you that information, but color me clueless. S/Cygwin/unix/ Have a look at the defines in include/parrot/io.h. It's not quite visible which one is active w/o debugger, but you could insert some print statements in io/io.c:PIO_init_stacks(), where there are explicit cases for PIO_OS_*. s/S/s/ PIO_OS_UNIX is the one defined and now parrot squawks Polly wanna Unix everytime I run it ;-) Now what? Joshua Gatcomb leo Joshua Gatcomb a.k.a. Limbic~Region __ Do you Yahoo!? New and Improved Yahoo! Mail - Send 10MB messages! http://promotions.yahoo.com/new_mail
Re: Threads on Cygwin
Joshua Gatcomb [EMAIL PROTECTED] wrote: Up until a couple of weeks ago, all the threads tests were passing on Cygwin. I had submitted a patch some time ago enabling tests for threads, timer, and extend_13 that never got applied. I figured there was good reason ... Overlooked? Please rediff and resend. It says at the bottom that the output could appear in reversed order and so I am guessing the sleep statement is to ensure that it comes out in the proper order. The sleep is of course a hack only and wrong. The real thing to do is to convert the test result into a regexp that allows both orderings. leo
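The order-insensitive check leo suggests (accept either interleaving instead of forcing one with a sleep) can be sketched as a plain C predicate. The helper below is invented for illustration and is not part of the Parrot test harness; it accepts exactly the two orderings the regexp /(done\nthread\n)|(thread\ndone\n)/ matches.

```c
#include <assert.h>
#include <string.h>

/* Accept either interleaving of the two threads' output, instead of
 * sleeping to force one particular ordering. */
static int matches_either_order(const char *got) {
    return strcmp(got, "done\nthread\n") == 0
        || strcmp(got, "thread\ndone\n") == 0;
}
```

A test written this way passes regardless of which thread wins the race, while still failing when one thread's output is lost entirely.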
Re: Threads on Cygwin
--- Leopold Toetsch [EMAIL PROTECTED] wrote: Joshua Gatcomb [EMAIL PROTECTED] wrote: I had submitted a patch some time ago enabling tests for threads, timer, and extend_13 that never got applied. Overlooked? Please rediff and resend. I will do - likely tomorrow. It says at the bottom that the output could appear in reversed order and so I am guessing the sleep statement is to ensure that it comes out in the proper order. The sleep is of course a hack only and wrong. The real thing to do is to convert the test result into a regexp that allows both orderings. I agree, but that doesn't explain why only 1 of the two messages is displayed to the screen when the sleep statement is present. I don't want to brush a bug under the rug. If one thread finishes before the other thread gets to a print statement, the print does not appear on the screen at all. leo Joshua Gatcomb a.k.a. Limbic~Region
Threads on Cygwin
Up until a couple of weeks ago, all the threads tests were passing on Cygwin. I had submitted a patch some time ago enabling tests for threads, timer, and extend_13 that never got applied. I figured there was good reason so I didn't say anything about the tests failing except an occasional "that's weird" on #parrot. So today I decided to look at threads_2.pasm It says at the bottom that the output could appear in reversed order and so I am guessing the sleep statement is to ensure that it comes out in the proper order. So - why is the test failing? Because the second print statement never makes it to the screen. If I remove the sleep statement entirely, I see both things in the reverse expected order. If I place the sleep statement after the main thread print then all I get to the screen is the main thread print and not the print statement from thread 1 It is almost as if by the time the second print happens, the filehandle is already closed So - since threads aren't officially supposed to be working on Cygwin - is this something I should care about or not? Cheers Joshua Gatcomb a.k.a. Limbic~Region
Re: Threads on Cygwin
--- Joshua Gatcomb [EMAIL PROTECTED] wrote: Up until a couple of weeks ago, all the threads tests were passing on Cygwin. I had submitted a patch some time ago enabling tests for threads, timer, and extend_13 that never got applied. I figured there was good reason so I didn't say anything about the tests failing except an occasional "that's weird" on #parrot. So today I decided to look at threads_2.pasm It says at the bottom that the output could appear in reversed order and so I am guessing the sleep statement is to ensure that it comes out in the proper order. So - why is the test failing? Because the second print statement never makes it to the screen. If I remove the sleep statement entirely, I see both things in the reverse expected order. If I place the sleep statement after the main thread print then all I get to the screen is the main thread print and not the print statement from thread 1 It is almost as if by the time the second print happens, the filehandle is already closed So - since threads aren't officially supposed to be working on Cygwin - is this something I should care about or not? Cheers Joshua Gatcomb a.k.a. Limbic~Region In summary, all code in all threads runs to completion but whichever thread finishes last can't print to the screen $ perl t/harness --gc-debug --running-make-test -b t/pmc/threads.t Failed 7/11 tests, 36.36% okay (less 2 skipped tests: 2 okay, 18.18%)
Failed Test      Stat Wstat Total Fail  Failed  List of Failed
---------------------------------------------------------------
t/pmc/threads.t     7  1792    11    7  63.64%  2-3 5-9
2 subtests skipped. Failed 1/1 test scripts, 0.00% okay. 7/11 subtests failed, 36.36% okay. __ Do you Yahoo!? Read only the mail you want - Yahoo! Mail SpamGuard. http://promotions.yahoo.com/new_mail
[perl #31651] [TODO] Win32 - Threads, Events, Signals, Sockets
# New Ticket Created by Will Coleda # Please include the string: [perl #31651] # in the subject line of all future correspondence about this issue. # URL: http://rt.perl.org:80/rt3/Ticket/Display.html?id=31651 (No details. This comes from TODO.win32)
As the world stops: of GC and threads
So, I haven't heard any convincing evidence that execution in other threads can continue while garbage collection is executing, copying collector or not. (Point of fact, the copying collector has nothing to do with it.) So what are the options to stop the world? I've heard the first 2 of these from Dan and Leo. The third is mine. Any others? * Taking the read half of a reader-writer lock for all pointer mutations The point here is to block all mutators from mutating the object graph while the collector is traversing it. I can't imagine this being good for performance. This is in addition to the mutex on the PMC. Other threads might be scheduled before GC completes, but don't have to be for GC to proceed. * OS pause thread support Lots of operating systems (including at least several versions of Mac OS X) don't support this. Maybe parrot can use it when it is available. It's always a dangerous proposition, though; the paused thread might be holding the system's malloc() mutex or something similarly evil. (This evil is why platforms don't always support the construct.) I don't think this is feasible for portability reasons. In this case, other threads would not be scheduled before GC completes. * Use events This is my proposition: Have the garbage collector broadcast a STOP! event to every other parrot thread, and then wait for them all to rendezvous that they've stopped before GC proceeds. Hold a lock on the thread mutex while doing this, so threads won't start or finish. This has the worst latency characteristics of the three: All other threads would need to be scheduled before GC can begin. Corner cases exist: NCI calls could be long-running and shouldn't block parrot until completion; another thread might try to allocate memory before checking events. Neither is insurmountable. Gordon Henriksen [EMAIL PROTECTED]
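The "use events" rendezvous can be sketched with pthreads. This is a hedged illustration, not Parrot code: every name here (mutator_safe_point, run_gc, the counters) is invented. The collector raises a stop flag, each mutator acknowledges at a safe point and blocks, and collection proceeds only after every registered thread has checked in, which is exactly the latency cost described above: all other threads must be scheduled at least once before GC can begin.

```c
#include <assert.h>
#include <pthread.h>

#define N_MUTATORS 4

static pthread_mutex_t gc_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  gc_cond = PTHREAD_COND_INITIALIZER;
static int stop_requested = 0;
static int stopped_count  = 0;
static int gc_done        = 0;
static int stopped_at_gc  = 0;   /* recorded by the collector */

/* Each mutator calls this at safe points (e.g. between ops). */
static void mutator_safe_point(void) {
    pthread_mutex_lock(&gc_lock);
    if (stop_requested) {
        stopped_count++;
        pthread_cond_broadcast(&gc_cond);     /* tell the collector */
        while (!gc_done)                      /* block until GC is over */
            pthread_cond_wait(&gc_cond, &gc_lock);
    }
    pthread_mutex_unlock(&gc_lock);
}

static void *mutator(void *arg) {
    (void)arg;
    for (;;) {                                /* spin until released by GC */
        pthread_mutex_lock(&gc_lock);
        int done = gc_done;
        pthread_mutex_unlock(&gc_lock);
        if (done)
            break;
        mutator_safe_point();
    }
    return NULL;
}

/* The collector: broadcast "STOP!", rendezvous, collect, release. */
static void run_gc(void) {
    pthread_mutex_lock(&gc_lock);
    stop_requested = 1;
    while (stopped_count < N_MUTATORS)        /* wait for the rendezvous */
        pthread_cond_wait(&gc_cond, &gc_lock);
    stopped_at_gc = stopped_count;            /* world is stopped; collect here */
    stop_requested = 0;
    gc_done = 1;
    pthread_cond_broadcast(&gc_cond);         /* release the mutators */
    pthread_mutex_unlock(&gc_lock);
}
```

The corner cases from the mail still apply: a thread blocked in a long-running NCI call never reaches a safe point, so a real implementation would have to treat such threads as already stopped for GC purposes.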
Re: Threads... last call
At 1:27 AM -0500 1/30/04, Gordon Henriksen wrote: On Thursday, January 29, 2004, at 11:55 , Melvin Smith wrote: At 11:45 PM 1/28/2004 -0500, Gordon Henriksen wrote: On Wednesday, January 28, 2004, at 12:53 , Melvin Smith wrote: At 12:27 PM 1/23/2004 -0800, Damien Neil wrote: Java Collections are a standard Java library of common data structures such as arrays and hashes. Collections are not synchronized; access involves no locks at all. Multiple threads accessing the same collection at the same time cannot, however, result in the virtual machine crashing. (They can result in data structure corruption, but this corruption is limited to surprising results rather than VM crash.) But this accomplishes nothing useful and still means the data structure is not re-entrant, nor is it corruption resistant, regardless of how we judge it. It does accomplish something very useful indeed: It avoids the overhead of automatic locking when it isn't necessary. When *is* that locking necessary? To a second order approximation, ***NEVER.*** Pardon me but I've apparently lost track of context here. I thought we were discussing correct behavior of a shared data structure, not general cases. Or maybe this is the general case and I should go read more backlog? :) A shared data structure, as per Dan's document? It's a somewhat novel approach, trying to avoid locking overhead with dynamic dispatch and vtable swizzling. I'm discussing somewhat more traditional technologies, which simply allow an object to perform equally correctly and with no differentiation between shared and unshared cases. In essence, I'm arguing that a shared case isn't necessary for some data structures in the first place. Oh, absolutely. 
One thing that was made very clear from the (arguably failed) perl 5 experiments with threads is that trying to have the threaded and unthreaded code paths be merged was an exercise in extreme pain and code maintenance hell. Allowing for vtable fiddling's a way to get around that problem by isolating the threaded and nonthreaded paths. It also allows us to selectively avoid threaded paths on a per-class per-method basis--on those classes that don't permit morph (and it is *not* a requirement that pmcs support it) For those data types that don't require synchronization on some (or all) paths, the locking method can be a noop. Not free, as it still has to be called (unfortunately) but as cheap as possible. If someone wants to make a good case for it I'm even OK with having a NULL be valid for the lock vtable method and checking for non-NULLness in the lock before calling if anyone wants to try their hand at benchmarking. (I could see it going either way-- if(!NULL) or an empty sub being faster) Still the No crash VM requirement is a bit of a killer, though. It will definitely impact performance, and there's no way around that that I know of. I've a few ideas to minimize the locking needs, though, and I'll try and get an addendum to the design doc out soon. -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
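The two variants Dan floats for a cheap no-op lock, a NULL check before the call versus an unconditional call through an empty sub, look like this in a toy vtable. The names below are invented for illustration and are not Parrot's actual vtable layout; the point is only the shape of the two call sites that would be benchmarked against each other.

```c
#include <assert.h>
#include <stddef.h>

/* Toy vtable with a lock slot: NULL, a real lock, or an empty sub. */
typedef struct PMCVtable {
    void (*lock)(void *pmc);
} PMCVtable;

static int lock_calls = 0;
static void real_lock(void *pmc) { (void)pmc; lock_calls++; }
static void noop_lock(void *pmc) { (void)pmc; }   /* the "empty sub" */

/* Variant 1: NULL in the slot means "no locking needed". */
static void with_null_check(PMCVtable *vt, void *pmc) {
    if (vt->lock)
        vt->lock(pmc);
}

/* Variant 2: always make the indirect call; unshared PMCs carry a
 * no-op in the slot. */
static void with_noop(PMCVtable *vt, void *pmc) {
    vt->lock(pmc);
}
```

Which is faster depends on branch prediction versus indirect-call cost on the target CPU, which is why Dan says it could go either way and wants it benchmarked.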
Re: More on threads
At 10:50 AM -0500 1/24/04, Gordon Henriksen wrote: On Saturday, January 24, 2004, at 09:23 , Leopold Toetsch wrote: Gordon Henriksen [EMAIL PROTECTED] wrote: ... Best example: morph. morph must die. Morph is necessary. But please note: morph changes the vtable of the PMC to point to the new data type's table. It has nothing to do with a typed union. The vtable IS the discriminator. I'm referring to this:

typedef union UnionVal {
    struct {                    /* Buffers structure */
        void * bufstart;
        size_t buflen;
    } b;
    struct {                    /* PMC unionval members */
        DPOINTER* _struct_val;  /* two ptrs, both are defines */
        PMC* _pmc_val;
    } ptrs;
    INTVAL int_val;
    FLOATVAL num_val;
    struct parrot_string_t * string_val;
} UnionVal;

So long as the discriminator does not change, the union is type stable. The vtable's not the discriminator there, the flags in the pmc are the discriminator, as they're what indicates that the union's a GCable thing or not. I will admit, though, that looks *very* different than it did when I put that stuff in originally. (It used to be just a union of FLOATVAL, INTVAL, and string pointer...) Still, point taken. That needs to die and it needs to die now. For the moment, let's split it into two pieces, a buffer pointer and an int/float union, so we don't have to guess whether the contents have issues with threads. -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
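Dan's suggested split can be sketched as follows. This is one possible shape only, not the layout Parrot actually adopted: the GC-traceable buffer pointer gets its own field, and only the non-pointer scalars share a union, so the collector can inspect bufstart without first consulting a flags discriminator, and writing a scalar can never clobber a pointer another thread might be tracing.

```c
#include <assert.h>
#include <stddef.h>

typedef long   INTVAL;
typedef double FLOATVAL;

/* Sketch of the proposed split: pointer data and scalar data no
 * longer overlap, so the buffer field is always safe to examine. */
typedef struct {
    struct {                 /* always valid for the GC to look at */
        void  *bufstart;
        size_t buflen;
    } buf;
    union {                  /* int/float only: no pointers here */
        INTVAL   int_val;
        FLOATVAL num_val;
    } val;
} SplitCache;
```

With the original full union, storing int_val would overwrite bufstart's bytes; with the split, the two cannot interfere.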
Threads. Again. Dammit.
Okay, it's obvious that we still have some issues to work out before we hit implementation details. (Hey, it could be worse--this is easy compared to strings...) I think there are some ways we can minimize locking, and I think we have some unpleasant potential issues to deal with in the interaction between strings and threads (I thought we could dodge that, but, well... I was wrong). This needs more thought and more work before we go anywhere. Some of the obvious stuff, like fixing up the cache slot of the PMC, should be done regardless. I also think we need more real-worldish tests for this, so we can see if the problems really are as bad as they seem. That, at least, I think I can help with, since I conveniently happen to have a compiler that targets parrot near-done enough to test some reasonably abusive HLL(ish) code to see what sort of hit we take. -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: More on threads
At 1:47 AM + 1/25/04, Pete Lomax wrote: On Sat, 24 Jan 2004 13:59:26 -0500, Gordon Henriksen [EMAIL PROTECTED] wrote: snip It doesn't matter if an int field could read half of a double or v.v.; it won't crash the program. Only pointers matter. snip These rules ensure that dereferencing a pointer will not segfault. In this model, wouldn't catching the segfault Apart from anything else, I don't want to catch segfaults and bus errors in parrot. (Well, OK, that's not true--I *do* want to catch segfaults and bus errors, I just don't think it's feasible, or possible on all platforms) -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
RE: More on threads
Dan Sugalski wrote: Gordon Henriksen wrote: Leopold Toetsch wrote: Gordon Henriksen wrote: ... Best example: morph. morph must die. Morph is necessary. But please note: morph changes the vtable of the PMC to point to the new data types table. It has nothing to do with a typed union. The vtable IS the discriminator. I'm referring to this: typedef union UnionVal { struct {/* Buffers structure */ void * bufstart; size_t buflen; } b; struct {/* PMC unionval members */ DPOINTER* _struct_val; /* two ptrs, both are defines */ PMC* _pmc_val; } ptrs; INTVAL int_val; FLOATVAL num_val; struct parrot_string_t * string_val; } UnionVal; So long as the discriminator does not change, the union is type stable. The vtable's not the discriminator there, the flags in the pmc are the discriminator, as they're what indicates that the union's a GCable thing or not. I will admit, though, that looks *very* different than it did when I put that stuff in originally. (It used to be just a union of FLOATVAL, INTVAL, and string pointer...) Hm. Well, both are a discriminator, then; dispatch to code which presumes the contents of the union is quite frequently done without examining the flags. Maybe use a VTABLE func instead to get certain flags? i.e., INTVAL parrot_string_get_flags(..., PMC *pmc) { return PMC_FLAG_IS_POBJ + ...; } Then, updating the vtable would atomically update the flags as well. Or, hell, put the flags directly in the VTABLE if it's not necessary for them to vary across instances. I have the entire source tree (save src/ tests) scoured of that rat's nest of macros for accessing PMC/PObj fields, but I broke something and haven't had the motivation to track down what in the multi-thousand- line-diff it was, yet. :( Else you'd have the patch already and plenty of mobility in the layout of that struct. Near time to upgrade my poor old G3, methinks; the build cycle kills me when I touch parrot/pobj.h. Do any PMC classes use *both* struct_val *and* pmc_val concurrently? 
I was looking for that, but am afraid I didn't actually notice. -- Gordon Henriksen IT Manager ICLUBcentral Inc. [EMAIL PROTECTED]
Re: More on threads
Dan Sugalski [EMAIL PROTECTED] wrote: [ PObj union ] Still, point taken. That needs to die and it needs to die now. For the moment, let's split it into two pieces, a buffer pointer and an int/float union, so we don't have to guess whether the contents have issues with threads. The Buffer members (bufstart, buflen) of the union are never used for a PMC. Also a PMC can't get converted into a Buffer or vv. These union members are just there for DOD, so that one pobject_lives() (and other functions) can be used for both PMCs and Buffers. That was introduced when uniting Buffers and PMCs. I don't see a problem with that. The problem that Gordon expressed with morph is:

    thread1                              thread2
    PerlInt->vtable->set_string_native
      (int_val = 3)
      LOCK()
      perlscalar->vtable->morph:
        pmc->vtable is now a PerlString
        vtable, str_val is invalid
                                         read access on pmc - non-locked
                                         PerlString->vtable->get_integer
                                         STRING *s = pmc->str_val
                                         SIGBUS/SEGV on access of s

But that can be solved by first clearing str_val, then changing the vtable. leo
RE: More on threads
Leopold Toetsch wrote: Gordon Henriksen wrote: ... in the multi-thousand-line diff it was, yet. :( Else you'd have the patch already 1) *no* multi-thousand-line diffs 2) what is the problem you'd like to solve? Er? Extending to the rest of the source tree the huge patch to classes which you already applied. No logic changes; just cleaning those PObj accessor macros up. -- Gordon Henriksen IT Manager ICLUBcentral Inc. [EMAIL PROTECTED]
Re: More on threads
Gordon Henriksen wrote: Er? Extending to the rest of the source tree the huge patch to classes which you already applied. No logic changes; just cleaning those PObj accessor macros up. Ah sorry, that one. Please send in small bunches, a few files changed at once. leo
Re: More on threads
Gordon Henriksen wrote: Hm. Well, both are a discriminator, then; dispatch to code which presumes the contents of the union is quite frequently done without examining the flags. The flags are *never* consulted for a vtable call in classes/*. DOD does different things if a Buffer or PMC is looked at, but that doesn't matter here. Then, updating the vtable would atomically update the flags as well. Doesn't matter. Or, hell, put the flags directly in the VTABLE if it's not necessary for them to vary across instances. No, flags are mutable and per PMC *not* per class. ... in the multi-thousand-line diff it was, yet. :( Else you'd have the patch already 1) *no* multi-thousand-line diffs 2) what is the problem you'd like to solve? Do any PMC classes use *both* struct_val *and* pmc_val concurrently? E.g. iterator.pmc. UnmanagedStruct uses int_val & pmc_val. This is no problem. These PMCs don't morph. leo
RE: More on threads
Leopold Toetsch wrote: Gordon Henriksen wrote: Or, hell, put the flags directly in the VTABLE if it's not necessary for them to vary across instances. No, flags are mutable and per PMC *not* per class. Of course there are flags which must remain per-PMC. I wasn't referring to them. Sorry if that wasn't clear. If a flag is only saying my VTABLE methods use the UnionVal as {a void*/a PObj*/a PMC*/data}, so GC should trace accordingly, it may be a waste of a per-object flag bit to store those flags with the PMC instance rather than with the PMC class. And if it's with the VTABLE, then it doesn't need to be traced. (But, then, all PObjs don't have VTABLEs...) Sidebar: If we're looking at lock-free concurrency, flag updates probably have to be performed with atomic &'s and |'s. BUT: Doesn't apply during GC, since other threads will have to be stalled then. Do any PMC classes use *both* struct_val *and* pmc_val concurrently? E.g. iterator.pmc. UnmanagedStruct uses int_val & pmc_val. This is no problem. These PMCs don't morph. Er, int_val and pmc_val at the same time? That's not quite what the layout provides for:

    typedef union UnionVal {
        struct {                    /* Buffers structure */
            void *  bufstart;
            size_t  buflen;
        } b;
        struct {                    /* PMC unionval members */
            DPOINTER* _struct_val;  /* two ptrs, both are defines */
            PMC*      _pmc_val;
        } ptrs;
        INTVAL   int_val;
        FLOATVAL num_val;
        struct parrot_string_t * string_val;
    } UnionVal;

Says to me:

    struct_val and pmc_val concurrently
    -- or -- bufstart and buflen concurrently
    -- or -- int_val
    -- or -- num_val
    -- or -- string_val

I don't know if C provides a guarantee that int_val and ptrs._pmc_val won't overlap just because INTVAL and DPOINTER* fields happen to be the same size. At least one optimizing compiler I know of, MrC/MrC++, would do some struct rearrangement when it felt like it. -- Gordon Henriksen IT Manager ICLUBcentral Inc. [EMAIL PROTECTED]
Re: More on threads
Gordon Henriksen wrote: Leopold Toetsch wrote: No, flags are mutable and per PMC *not* per class. Of course there are flags which must remain per-PMC. I wasn't referring to them. Sorry if that wasn't clear. If a flag is only saying my VTABLE methods use the UnionVal as {a void*/a PObj*/a PMC*/data}, so GC should trace accordingly, it may be a waste of a per-object flag bit to store those flags with the PMC instance rather than with the PMC class. All DOD related flags in the fast paths (i.e. for marking scalars) are located in the PMC's arena (when ARENA_DOD_FLAGS is on). This reduces cache misses during DOD to nearly nothing. More DOD related information is in the flags part of the PObj - but accessing that also means cache pollution. Putting flags elsewhere too needs one more indirection and always an access to the PMC memory itself. This doesn't give us any advantage. But again, flags don't matter during setting or getting a PMC's data. Flags aren't used in classes for these purposes. There are very few places in classes where flags are even changed. This is morphing scalars, and Key PMCs come to my mind. If we're looking at lock-free concurrency, flag updates probably have to be performed with atomic &'s and |'s. Almost all mutating vtable methods will lock the pmc. Er, int_val and pmc_val at the same time? I know :) This isn't the safest thing we have. After your union accessor patches, we can clean that up, and use a notation so that for this case, the two union members really can't overlap. leo
Re: More on threads
Leopold Toetsch [EMAIL PROTECTED] wrote: [ perlscalar morph ] But that can be solved by first clearing str_val, then changing the vtable. Fixed. I currently don't see any more problems related to perlscalars. PerlStrings are unsafe per se, as long as we have the copying GC. They need a lock during reading too. All other perlscalars should be safe now for non-locked reading. Mutating vtable methods get a lock. leo
Re: Threads... last call
On Wednesday, January 28, 2004, at 12:53 , Melvin Smith wrote: At 12:27 PM 1/23/2004 -0800, Damien Neil wrote: Java Collections are a standard Java library of common data structures such as arrays and hashes. Collections are not synchronized; access involves no locks at all. Multiple threads accessing the same collection at the same time cannot, however, result in the virtual machine crashing. (They can result in data structure corruption, but this corruption is limited to surprising results rather than VM crash.) But this accomplishes nothing useful and still means the data structure is not re-entrant, nor is it corruption resistant, regardless of how we judge it. It does accomplish something very useful indeed: It avoids the overhead of automatic locking when it isn't necessary. When *is* that locking necessary? To a second order approximation, ***NEVER.*** Never? Yes. From the user's perspective, synchronized objects more often than not provide no value! For instance, this Java code, run from competing threads, does not perform its intended purpose:

    // Vector was Java 1's dynamically sizable array class.
    if (!vector.contains(obj))
        vector.add(obj);

It does not prevent obj from appearing more than once in the collection. It's equivalent to this:

    boolean temp;
    synchronized (vector) {
        temp = vector.contains(obj);
    }
    // Preemption is possible between here...
    if (!temp) {
        synchronized (vector) {
            // ... and here.
            vector.add(obj);
        }
    }

The correct code is this:

    synchronized (vector) {
        if (!vector.contains(obj))
            vector.add(obj);
    }

If we again make explicit Vector's synchronized methods, a startling redundancy will become apparent:

    synchronized (vector) {
        boolean temp;
        synchronized (vector) {
            temp = vector.contains(obj);
        }
        if (!temp) {
            synchronized (vector) {
                vector.add(obj);
            }
        }
    }

This code is performing 3 times as many locks as necessary! More realistic code will perform many times more than 3 times the necessary locking.
Imagine the waste in sorting a 10,000 element array. Beyond that stunning example of wasted effort, most structures aren't shared in the first place, and so the overhead is *completely* unnecessary for them. Still further, many shared objects are unmodified once they become shared, and so again require no locking. The only time automatically synchronized objects are useful is when the user's semantics exactly match the object's methods. (e.g., an automatically synchronized queue might be quite useful.) In that case, it's a trivial matter for the user to wrap the operation in a synchronized block, or subclass the object with a method that does so. But it is quite impossible to remove the overhead from the other 99% of uses, since the VM cannot discern the user's locking strategy. If the user is writing a threaded program, then data integrity is necessarily his problem; a virtual machine cannot solve it for him, and automatic synchronization provides little help. Protecting a policy-neutral object (such as an array) from its user is pointless; all we could do is to ensure that the object is in a state consistent with an incorrect sequence of operations. That's useless to the user, because his program still behaves incorrectly; the object is, to his eyes, corrupted since it no longer maintains his invariant conditions. Thus, since the VM cannot guarantee that the user's program behaves as intended (only as written), and all that locking is wasted 4 times over, the virtual machine would be wise to limit its role to the prevention of crashes and to limiting the corruption that can result from incorrect (unsynchronized) access to objects. If automatic locking is the only way to do that for a particular data structure, then so be it. Oftentimes, though, thoughtful design can ensure that even unsynchronized accesses cannot crash or corrupt the VM as a whole--even if those operations might corrupt one object's state, or provide less than useful results.
As an example of how this applies to parrot, all of the FixedMumbleArray classes[*] recently discussed could clearly be implemented completely and safely without automatic locks on all modern platforms, if only parrot could allow lock-free access to at least some PMCs. ([*] Except FixedMixedArray. That stores UVals [plus more for a discriminator field], and couldn't be quite safe, as a UVal isn't an atomic write on many platforms. Then again, nor is a double.) Remember, Java already made the mistake of automatic synchronization with their original collection classes.
Re: Threads... last call
On Wed, Jan 28, 2004 at 12:53:09PM -0500, Melvin Smith wrote: At 12:27 PM 1/23/2004 -0800, Damien Neil wrote: Java Collections are a standard Java library of common data structures such as arrays and hashes. Collections are not synchronized; access involves no locks at all. Multiple threads accessing the same collection at the same time cannot, however, result in the virtual machine crashing. (They can result in data structure corruption, but this corruption is limited to surprising results rather than VM crash.) But this accomplishes nothing useful and still means the data structure is not re-entrant, nor is it corruption resistant, regardless of how we judge it. Quite the contrary--it is most useful. Parrot must, we all agree, under no circumstances crash due to unsynchronized data access. For it to do so would be, among other things, a gross security hole when running untrusted code in a restricted environment. There is no need for any further guarantee about unsynchronized data access, however. If unsyncronized threads invariably cause an exception, that's fine. If they cause the threads involved to halt, that's fine too. If they cause what was once an integer variable to turn into a string containing the definition of mulching...well, that too falls under the heading of undefined results. Parrot cannot and should not attempt to correct for bugs in user code, beyond limiting the extent of the damage to the threads and data structures involved. Java, when released, took the path that Parrot appears to be about to take--access to complex data structures (such as Vector) was always synchronized. This turned out to be a mistake--sufficiently so that Java programmers would often implement their own custom, unsynchronized replacements for the core classes. As a result, when the Collections library (which replaces those original data structures) was released, the classes in it were left unsynchronized. 
In Java's case, the problem was at the library level, not the VM level; as such, it was relatively easy to fix at a later date. Parrot's VM-level data structure locking will be less easy to change. - Damien
Re: Threads... last call
At 11:45 PM 1/28/2004 -0500, Gordon Henriksen wrote: On Wednesday, January 28, 2004, at 12:53 , Melvin Smith wrote: At 12:27 PM 1/23/2004 -0800, Damien Neil wrote: Java Collections are a standard Java library of common data structures such as arrays and hashes. Collections are not synchronized; access involves no locks at all. Multiple threads accessing the same collection at the same time cannot, however, result in the virtual machine crashing. (They can result in data structure corruption, but this corruption is limited to surprising results rather than VM crash.) But this accomplishes nothing useful and still means the data structure is not re-entrant, nor is it corruption resistant, regardless of how we judge it. It does accomplish something very useful indeed: It avoids the overhead of automatic locking when it isn't necessary. When *is* that locking necessary? To a second order approximation, ***NEVER.*** Pardon me but I've apparently lost track of context here. I thought we were discussing correct behavior of a shared data structure, not general cases. Or maybe this is the general case and I should go read more backlog? :) -Melvin
Re: Threads... last call
Melvin Smith [EMAIL PROTECTED] wrote: I thought we were discussing correct behavior of a shared data structure, not general cases. Or maybe this is the general case and I should go read more backlog? :) Basically we have three kinds of locking:

- HLL user level locking [1]
- user level locking primitives [2]
- vtable pmc locking to protect internals

Locking at each stage and for each PMC will be slow and can deadlock too. Very coarse grained locking (like Python's interpreter_lock) doesn't give any advantage on MP systems - only one interpreter is running at one time. We can't solve user data integrity at the lowest level: data logic and such isn't really visible here. But we should be able to integrate HLL locking with our internal needs, so that the former doesn't cause deadlocks[3] because we have to lock internally too, and we should be able to omit internal locking, if HLL locking code already provides this safety for a specific PMC. The final strategy when to lock what depends on the system the code is running on. Python's model is fine for single processors. Fine grained PMC locking gives more boost on multi-processor machines. All generalization is evil and 47.4 +- 1.1% of all statistics^Wbenchmarks are wrong ;)

[1] BLOCK scoped, e.g. synchronized { ... } or { lock $x; ... } These can be rwlocks or mutex typed locks
[2] lock.acquire, lock.release
[3] not user caused deadlocks - the mix of e.g. one user lock and internal locking.

leo
Re: Threads... last call
On Thursday, January 29, 2004, at 11:55 , Melvin Smith wrote: At 11:45 PM 1/28/2004 -0500, Gordon Henriksen wrote: On Wednesday, January 28, 2004, at 12:53 , Melvin Smith wrote: At 12:27 PM 1/23/2004 -0800, Damien Neil wrote: Java Collections are a standard Java library of common data structures such as arrays and hashes. Collections are not synchronized; access involves no locks at all. Multiple threads accessing the same collection at the same time cannot, however, result in the virtual machine crashing. (They can result in data structure corruption, but this corruption is limited to surprising results rather than VM crash.) But this accomplishes nothing useful and still means the data structure is not re-entrant, nor is it corruption resistant, regardless of how we judge it. It does accomplish something very useful indeed: It avoids the overhead of automatic locking when it isn't necessary. When *is* that locking necessary? To a second order approximation, ***NEVER.*** Pardon me but I've apparently lost track of context here. I thought we were discussing correct behavior of a shared data structure, not general cases. Or maybe this is the general case and I should go read more backlog? :) A shared data structure, as per Dan's document? It's a somewhat novel approach, trying to avoid locking overhead with dynamic dispatch and vtable swizzling. I'm discussing somewhat more traditional technologies, which simply allow an object to perform equally correctly and with no differentiation between shared and unshared cases. In essence, I'm arguing that a shared case isn't necessary for some data structures in the first place. Gordon Henriksen [EMAIL PROTECTED]
Re: More on threads
Gordon Henriksen [EMAIL PROTECTED] wrote: I overstated when I said that morph must die. morph could live IF: [ long proposal ] Increasing the union size, so that each pointer is distinct, is not an option. This imposes considerable overhead on a non-threaded program too, due to its bigger PMC size. To keep internal state consistent we have to LOCK shared PMCs, that's it. This locking is sometimes necessary for reading too. leo
Re: More on threads
What do you think the overall performance effect of fine-grained locking will be? You just showed in a microbenchmark that it's 400% for some operations. We've also heard anecdotal evidence of 400% *overall* performance hits from similar threading strategies in other projects. And remember, these overheads are ON TOP OF the user's synchronization requirements; the PMC locks will rarely coincide with the user's high-level synchronization requirements. If these are the two options, I as a user would rather have a separate threaded parrot executable which takes the 2.1% hit, rather than the 400% overhead as per above. It's easily the difference between usable threads and YAFATTP (yet another failed attempt to thread perl). Gordon Henriksen [EMAIL PROTECTED]
Re: More on threads
Gordon Henriksen [EMAIL PROTECTED] wrote: Leopold Toetsch wrote: Increasing the union size, so that each pointer is distinct, is not an option. This imposes considerable overhead on a non-threaded program too, due to its bigger PMC size. That was the brute-force approach, separating out all pointers. If the scalar hierarchy doesn't use all 4 of the pointers, then the bloat can be reduced. Yep. Your proposal is a very thorough analysis of what the problems of morph() currently are. I can imagine that we have a distinct string_val pointer that isn't part of the value union. Morph is currently implemented only (and necessary) for PerlScalar types. So if a PerlString's string_val member is valid at any time, we could probably save a lot of locking overhead. And what of the per-PMC mutex? Is that not also considerable overhead? More than an unused field, even. We have to weigh single-CPU (non-threaded) performance against threaded. For the latter, we have from single-CPU to many-multi-CPU NUMA systems a wide spectrum of possibilities. I currently don't want to slow down the normal non-threaded case. To keep internal state consistent we have to LOCK shared PMCs, that's it. This locking is sometimes necessary for reading too. Sometimes? Unless parrot can prove a PMC is not shared, PMC locking is ALWAYS necessary for ALL accesses to ANY PMC. get_integer() on a PerlInt PMC is always safe. *If* the vtable is pointing to a PerlInt PMC it yields a correct value (atomic int access presumed). If for some reason (which I can't imagine now) the vtable pointer and the cache union are out of sync, get_integer would produce a wrong value, which is currently considered to be a user problem (i.e. missing user-level locking). The same holds for e.g. PerlNum, which might read a mixture of lo and hi words, but again, pmc->vtable->get_number of a PerlNum is a safe operation on shared PMCs without locking too.
The locking primitives provide AFAIK the (maybe needed) memory barriers to update the vtable *and* the cache values to point to consistent data during the *locking* of mutating vtable methods. BENCHMARK USER SYS %CPU TOTAL That's only 2.1%. [ big snip on benchmarks ] All these benchmarks show a scheme like: allocate once and use once. I.e. these benchmarks don't reuse the PMCs. They don't show RL program behavior. We don't have any benchmarks currently that expose a near-to-worst-case slowdown, which went towards 400% for doubled PMC sizes. We had some test programs (C only, with simulated allocation schemes) that shuffled allocated PMCs in memory (a typical effect of reusing PMC memory a few times). The current 20 byte PMC went down by 200%; the old 32 byte PMC went down by 400%. The point is that when you get allocated PMCs from the free_list, they are more and more scattered in the PMC arenas. All current benchmarks tend to touch contiguous memory, while RL (or a bit longer running) programs don't. That said, I do consider PMC sizes and cache pollution the major performance issues of RL Parrot programs. These benchmarks don't show that yet. What do you think the overall performance effect of fine-grained locking will be? You just showed in a microbenchmark that it's 400% for some operations. Yes. That's the locking overhead for the *fastest* PMC vtable methods. I think that we'll be able to have different locking strategies in the long run. If you want to have a more scalable application on a many-CPU system, a build option may provide this. For a one or two CPU system, we can do less fine grained locking with less overhead. That might be a global interpreter lock for that specific case. And remember, these overheads are ON TOP OF the user's synchronization requirements; the PMC locks will rarely coincide with the user's high-level synchronization requirements. User level locking isn't laid out yet.
But my 2¢ towards that are: a user will put a lock around unsafe (that is, shared) variable access. We have to lock internally a lot of times to keep our data integrity, which is guaranteed. So the question is, why not provide that integrity per se (the user will need it anyway and lock). I don't see a difference here for data integrity, *but* if all locking is under our control, we can optimize it so that it doesn't conflict or deadlock. If these are the two options, I as a user would rather have a separate threaded parrot executable which takes the 2.1% hit, rather than the 400% overhead as per above. It's easily the difference between usable threads and YAFATTP (yet another failed attempt to thread perl). All these numbers are by far too premature to have any impact on RL applications. Please note that even with a 400% slowdown for one vtable operation, the mops_p.pasm benchmark would run 4 to 8 times faster than an *unthreaded* perl5. Thread spawning is currently 8 times faster than on perl5. leo
Re: More on threads
Gordon Henriksen [EMAIL PROTECTED] wrote: ... Best example: morph. morph must die. Morph is necessary. But please note: morph changes the vtable of the PMC to point to the new data types table. It has nothing to do with a typed union. Gordon Henriksen leo
Re: More on threads
On Saturday, January 24, 2004, at 09:23 , Leopold Toetsch wrote: Gordon Henriksen [EMAIL PROTECTED] wrote: ... Best example: morph. morph must die. Morph is necessary. But please note: morph changes the vtable of the PMC to point to the new data type's table. It has nothing to do with a typed union. The vtable IS the discriminator. I'm referring to this:

    typedef union UnionVal {
        struct {                    /* Buffers structure */
            void *  bufstart;
            size_t  buflen;
        } b;
        struct {                    /* PMC unionval members */
            DPOINTER* _struct_val;  /* two ptrs, both are defines */
            PMC*      _pmc_val;
        } ptrs;
        INTVAL   int_val;
        FLOATVAL num_val;
        struct parrot_string_t * string_val;
    } UnionVal;

So long as the discriminator does not change, the union is type stable. When the discriminator does change, as per here:

    void
    Parrot_PerlInt_set_string_native(Parrot_Interp interpreter, PMC* pmc,
            STRING* value)
    {
        VTABLE_morph(interpreter, pmc, enum_class_PerlString);
        VTABLE_set_string_native(interpreter, pmc, value);
    }

    void
    Parrot_perlscalar_morph(Parrot_Interp interpreter, PMC* pmc, INTVAL type)
    {
        if (pmc->vtable->base_type == enum_class_PerlString) {
            if (type == enum_class_PerlString)
                return;
            PObj_custom_mark_CLEAR(pmc);
            pmc->vtable = Parrot_base_vtables[type];
            return;
        }
        if (type == enum_class_PerlString) {
            pmc->vtable = Parrot_base_vtables[type];
            VTABLE_init(interpreter, pmc);
            return;
        }
        PObj_custom_mark_CLEAR(pmc);
        pmc->vtable = Parrot_base_vtables[type];
    }

... then these can both run:

    STRING*
    Parrot_scalar_get_string(Parrot_Interp interpreter, PMC* pmc)
    {
        return (STRING*)pmc->cache.string_val;
    }

    FLOATVAL
    Parrot_scalar_get_number(Parrot_Interp interpreter, PMC* pmc)
    {
        return pmc->cache.num_val;
    }

That clearly allows a struct parrot_string_t * to freely share the same memory as a double. Were it an int and a double, the surprising results from this unprotected access wouldn't violate the no crashes guarantee. But it's a pointer! Dereferencing it could cause a segfault, or a read or write of an arbitrary memory location.
Both clearly violate the crucial guarantee. Gordon Henriksen [EMAIL PROTECTED]
Re: More on threads
Leopold Toetsch wrote: Gordon Henriksen [EMAIL PROTECTED] wrote: ... Best example: morph. morph must die. Morph is necessary. But please note: morph changes the vtable of the PMC to point to the new data type's table. It has nothing to do with a typed union. I overstated when I said that morph must die. morph could live IF: the UnionVal struct were rearranged, and bounds were placed upon how far a morph could... well, morph. It doesn't matter if an int field could read half of a double or v.v.; it won't crash the program. Only pointers matter. To allow PMC classes to guarantee segfault-free operation, morph and cooperating PMC classes must conform to the following rule. Other classes would require locking. With this vocabulary:

variable: A memory location which is reachable (i.e., not garbage). [*]
pointer: The address of a variable.
pointer variable: A variable which contains a pointer.
access: For a pointer p, any dereference of p (*p, p->field, or p[i]), whether for the purposes of reading or writing to that variable.

And considering:

- any specific pointer variable (ptr),
- all accesses which parrot might perform[**] on any pointer ever stored in ptr (A) [***], and
- any proposed assignment to ptr

Then: If any A which once accessed a pointer variable would now access a non-pointer variable, then the proposed assignment MUST NOT be performed. This is a relaxed type-stability definition. (Relaxed: It provides type stability only for pointer variables, not for data variables. It does not discriminate the types of pointers, only that the data structures they directly reference have the same layout of pointers. Also, a loophole allows non-pointer variables to become pointer variables, but not the reverse.) These rules ensure that dereferencing a pointer will not segfault.
They also ensure that it is safe to dereference a pointer obtained from a union according to the union's discriminator, regardless of when or in which order or how often parrot read the pointer or the discriminator.[***] I think they're actually the loosest possible set of rules to do this. [*] Two union members are the same variable. [**] This is in the variable ptr specifically, not merely in the same field of a similar struct. That is, having an immutable discriminator which selects s.u.v or s.u.i from struct { union { void * v; int i; } u } s is valid. A mutable discriminator is also valid, so long as the interpretation of pointer fields does not change. [***] But only if the architecture prevents shearing in pointer reads and writes. From another perspective this is to say: Every pointer variable must forever remain a pointer. Union discriminators must not change such that a pointer will no longer be treated as a pointer, or will be treated as a pointer to a structure with a different layout. The first step in conforming to these rules is guaranteeing that a perlscalar couldn't morph into an intlist or some other complete nonsense. So the default for PMCs should be to prohibit morphing. Also, morphable classes will have a hard time using struct_val without violating the above rules. But for this price, parrot could get lock-free, guaranteed crash-proof readers for common data types. But note that pmc->cache.pmc_val can be used freely! So if exotic scalars wrap their data structures in a PMC *cough*perlobject*cough*managedstruct*ahem*, then those PMCs can be part of a cluster of morphable PMC classes without violating these rules. Next, the scalar hierarchy (where morphing strikes me as most important) could be adjusted to provide the requisite guarantees, such as: perlstring's vtable methods would never look for its struct parrot_string_t * in the same memory location that a perlnum vtable method might be storing half of a floatval.
Right now, that sort of guarantee is not made, and so ALL shared PMCs REALLY DO require locking. That's bad, and it's solvable. Specifically, UnionVal, with its present set of fields, would have to become something more like this:

struct UnionVal {
    struct parrot_string_t *string_val;
    DPOINTER               *struct_val;
    PMC                    *pmc_val;
    void                   *b_bufstart;
    union {
        INTVAL   _int_val;
        size_t   _buflen;
        FLOATVAL _num_val;
    } _data_vals;
};

If no scalar types use struct_val or pmc_val or b_bufstart, then those fields can go inside the union. Unconstrained morphing is the only technology that *in all cases* *completely* prevents the crash-proof guarantee for lock-free access to shared PMCs. Without changes to this, we're stuck with implicit PMC locking and what looks like an unusable threading implementation. This is only the beginning! For example, if parrot can provide type stability, mutable strings can be crash-proofed from multithreaded access. Wha?! 1. Add to the buffer structure an immutable
Re: More on threads
I wrote: With this vocabulary: variable: A memory location which is reachable (i.e., not garbage). [*] pointer: The address of a variable. pointer variable: A variable which contains a pointer. access: For a pointer p, any dereference of p (*p, p->field, or p[i]), whether for the purposes of reading or writing to that variable. And considering: any specific pointer variable (ptr), and all accesses which parrot might perform[**] on any pointer ever stored in ptr (A) [***], and any proposed assignment to ptr. Then: If any A which once accessed a pointer variable would now access a non-pointer variable, then the proposed assignment MUST NOT be performed. D'oh. This actually has to be recursive. Considering: any specific pointer variable (ptr'), and all accesses which parrot might perform on any pointer ever stored in ptr, and all accesses which parrot might perform on any pointer ever stored in those variables, ..., ... (A), and any proposed assignment to ptr -- else it allows char * a = ...; char ** b = a; Doesn't change the conclusions I drew at all. (Nor does it require some massively recursive algorithm to run at pointer assignment time, just as the first one didn't require anything more than pointer assignment at pointer assignment time.) Could probably be simplified with the addition of "pointer type" to the definitions section. Anyhoo. Gordon Henriksen [EMAIL PROTECTED]
Re: More on threads
On Sat, 24 Jan 2004 13:59:26 -0500, Gordon Henriksen [EMAIL PROTECTED] wrote: snip It doesn't matter if an int field could read half of a double or v.v.; it won't crash the program. Only pointers matter. snip These rules ensure that dereferencing a pointer will not segfault. In this model, wouldn't catching the segfault and retrying (once or twice) work? - If I'm reading you correctly, which is unlikely, this has little to do with program correctness, but about the interpreter not crashing because of an unfortunate context switch.. which the programmer should have guarded against in the first place... no, I think I just lost the plot again ;-) Pete
Re: More on threads
Pete Lomax wrote: Gordon Henriksen wrote: snip It doesn't matter if an int field could read half of a double or v.v.; it won't crash the program. Only pointers matter. snip These rules ensure that dereferencing a pointer will not segfault. In this model, wouldn't catching the segfault and retrying (once or twice) work? Determining how to retry in the general case would be... much more interesting than this proposal. :) Furthermore, worse than segfaults could potentially result from using half of a double as a pointer. There's no assurance that *((caddr_t *) double) won't in fact be a valid memory address. In this case, there would be no segfault, but memory would be subtly corrupted. There's no way to detect that, so there's no way to retry. The point of all the tweaks and care is to prevent ever dropping something else in a particular variable where parrot would, at another point in the program, expect a pointer of a particular type. I think you probably got the following, but I'd just like to elaborate more specifically. I think the greatest subtlety of the rules was in the interpretation of "accesses which parrot might perform" and the word "specific" in "any specific pointer variable (ptr)". Without understanding precisely what I meant there, one might think that even a simple polymorphic system like the following is prohibited:

struct eg;
typedef int (*func_ptr)(struct eg *);

struct eg {
    func_ptr fp;
    union {
        int *pointer;
        int  integer;
    } u;
};

int pointer_meth(struct eg *thiz) { return ++*(thiz->u.pointer); }
int integer_meth(struct eg *thiz) { return ++(thiz->u.integer); }

void print_it(char *name, struct eg *some_eg)
{
    printf("%s says %d\n", name, some_eg->fp(some_eg));
}

int main(void)
{
    struct eg eg1 = { pointer_meth, { NULL } };
    struct eg eg2 = { integer_meth, { .integer = 1 } };

    eg1.u.pointer = malloc(sizeof(int));
    *eg1.u.pointer = 0;
    print_it("eg1", &eg1);
    print_it("eg2", &eg2);
    return 0;
}

But the program IS allowed.
While print_it might behave in any number of ways depending on some_eg->fp, it will always access a particular eg.u in a consistent fashion, since it always respects the discriminator (eg.fp), which the program never changes. By extension of this, C++ instances do not violate these rules, either.[*] Were the following line added to main(), though, then the program would be in violation:

eg1.fp = integer_meth;

Because now some other thread could have observed eg1.fp == pointer_meth and begun invoking pointer_meth. pointer_meth might now access u.pointer and, instead of a pointer, see n + (int) u.pointer. That probably won't segfault for small values of n, but will certainly not do the right thing either. This is a trite tangent, but also note that the type stability rule prohibits this:

eg1.u.pointer = NULL;

But would not if the definition of pointer_meth became:

int pointer_meth(struct eg *thiz)
{
    int *pointer = thiz->u.pointer;
    return pointer == NULL ? -1 : ++*pointer;
}

Because now the program will not dereference u.pointer if its value is NULL. How cute. (But if u.pointer were not copied to a local, then bets are off again, because C might perform an extra load and get a value inconsistent with the one it received when testing for NULL.) ... But even that extra local copy isn't required if u.pointer begins NULL, and can become non-NULL, but will not become NULL again. Why? "all accesses which parrot MIGHT perform on any pointer ever storED in ptr (A)". Note the past tense there. That's why. If I'm reading you correctly, which is unlikely, That has little to do with you, but much to do with my burying important parts of my message in pages of dense text. :) this has little to do with program correctness, but about the interpreter not crashing because of an unfortunate context switch.. which the programmer should have guarded against in the first place... Yes. Precisely.
no, I think I just lost the plot again ;-) I think you're pretty close, just missing a few of the subtleties that got buried in that long missive. :) Gordon Henriksen [EMAIL PROTECTED] [*] Even though C++ changes the vtable of an instance during instantiation, it does so in a broadening fashion, making formerly-inaccessible variables accessible. A C++ instance of class derived_class : public base_class is not a derived_class until base_class's constructor finishes and derived_class's constructor begins. (Side-effect: A subclass cannot influence the instantiation behavior of a base class.) (Objective C's class methods are wildly useful. Static languages tend to ignore them. It's sad.)
Re: Threads... last call
At 5:24 PM -0500 1/22/04, Deven T. Corzine wrote: Dan Sugalski wrote: Last chance to get in comments on the first half of the proposal. If it looks adequate, I'll put together the technical details (functions, protocols, structures, and whatnot) and send that off for abuse^Wdiscussion. After that we'll finalize it, PDD the thing, and get the implementation in and going. Dan, Sorry to jump in out of the blue here, but did you respond to Damien Neil's message about locking issues? (Or did I just miss it?) Damien's issues were addressed before he brought them up, though not in one spot. A single global lock, like python and ruby use, kills any hope of SMP-ability. Hand-rolled threading has unpleasant complexity issues, is a big pain, and terribly limiting. And kills any hope of SMP-ability. Corruption-resistant data structures without locking just don't exist. This sounds like it could be a critically important design question; wouldn't it be best to address it before jumping into implementation? If there's a better approach available, wouldn't this be the best time to determine that? Deven Date: Wed, 21 Jan 2004 13:32:52 -0800 From: Damien Neil [EMAIL PROTECTED] To: [EMAIL PROTECTED] Subject: Re: Start of thread proposal Message-ID: [EMAIL PROTECTED] References: [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] In-Reply-To: [EMAIL PROTECTED] Content-Length: 1429 On Wed, Jan 21, 2004 at 01:14:46PM -0500, Dan Sugalski wrote: ... seems to indicate that even whole ops like add P,P,P are atomic. Yep. They have to be, because they need to guarantee the integrity of the pmc structures and the data hanging off them (which includes buffer and string stuff) Personally, I think it would be better to use corruption-resistant buffer and string structures, and avoid locking during basic data access.
While there are substantial differences in VM design--PMCs are much more complicated than any JVM data type--the JVM does provide a good example that this can be done, and done efficiently. Failing this, it would be worth investigating what the real-world performance difference is between acquiring multiple locks per VM operation (current Parrot proposal) vs. having a single lock controlling all data access (Python) or jettisoning OS threads entirely in favor of VM-level threading (Ruby). This forfeits the ability to take advantage of multiple CPUs--but Leopold's initial timing tests of shared PMCs were showing a potential 3-5x slowdown from excessive locking. I've seen software before that was redesigned to take advantage of multiple CPUs--and then required no less than four CPUs to match the performance of the older, single-CPU version. The problem was largely attributed to excessive locking of mostly-uncontested data structures. - Damien -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Threads... last call
At 5:58 PM -0500 1/22/04, Josh Wilmes wrote: I'm also concerned by those timings that leo posted. 0.0001 vs 0.0005 ms on a set- that magnitude of locking overhead seems pretty crazy to me. It looks about right. Don't forget, part of what you're seeing isn't that locking mutexes is slow, it's that parrot does a lot of stuff awfully fast. It's also a good idea to get more benchmarks before jumping to any conclusions -- changing designs based on a single, first cut, quick-n-dirty benchmark isn't necessarily a wise thing. It seemed like a few people have said that the JVM style of locking can reduce this, so it seems to me that it merits some serious consideration, even if it may require some changes to the design of parrot. There *is* no JVM-style locking. I've read the docs and looked at the specs, and they're not doing anything at all special, and nothing different from what we're doing. Some of the low-level details are somewhat different because Java has more immutable base data structures (which don't require locking) than we do. Going more immutable is an option, but one we're not taking since it penalizes things we'd rather not penalize. (String handling mainly) There is no JVM Magic here. If you're accessing shared data, it has to be locked. There's no getting around that. The only way to reduce locking overhead is to reduce the amount of data that needs locking. I'm not familiar enough with the implementation details here to say much one way or another. But it seems to me that if this is one of those low-level decisions that will be impossible to change later and will forever constrain perl's performance, then it's important not to rush into a bad choice because it seems more straightforward. This can all be redone if we need to -- the locking and threading strategies can be altered in a dozen ways or ripped out and rewritten, as none of them affect the semantics of bytecode execution. At 17:24 on 01/22/2004 EST, Deven T.
Corzine [EMAIL PROTECTED] wrote: Dan Sugalski wrote: Last chance to get in comments on the first half of the proposal. If it looks adequate, I'll put together the technical details (functions, protocols, structures, and whatnot) and send that off for abuse^Wdiscussion. After that we'll finalize it, PDD the thing, and get the implementation in and going. Dan, Sorry to jump in out of the blue here, but did you respond to Damien Neil's message about locking issues? (Or did I just miss it?) This sounds like it could be a critically important design question; wouldn't it be best to address it before jumping into implementation? If there's a better approach available, wouldn't this be the best time to determine that? Deven snip -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Threads... last call
Dan Sugalski wrote: At 5:24 PM -0500 1/22/04, Deven T. Corzine wrote: Damien's issues were addressed before he brought them up, though not in one spot. A single global lock, like python and ruby use, kills any hope of SMP-ability. Hand-rolled threading has unpleasant complexity issues, is a big pain, and terribly limiting. And kills any hope of SMP-ability. What about the single-CPU case? If it can really take 4 SMP CPUs with locking to match the speed of 1 CPU without locking, as mentioned, perhaps it would be better to support one approach for single-CPU systems (or applications that are happy to be confined to one CPU), and a different approach for big SMP systems? Corruption-resistant data structures without locking just don't exist. The most novel approach I've seen is the one taken by Project UDI (Uniform Driver Interface). Their focus is on portable device drivers, so I don't know if this idea could work in the Parrot context, but the approach they take is to have the driver execute in regions. Each driver needs to have at least one region, and it can create more if it wants better parallelism. All driver code executes inside a region, but the driver does no locking or synchronization at all. Instead, the environment on the operating-system side of the UDI interface handles such issues. UDI is designed to ensure that only one driver instance can ever be executing inside a region at any given moment, and the mechanism it uses is entirely up to the environment, and can be changed without touching the driver code. This white paper has a good technical overview (the discussion of regions starts on page 9): http://www.projectudi.org/Docs/pdf/UDI_tech_white_paper.pdf I'm told that real-world experience with UDI has shown performance is quite good, even when layered over existing native drivers.
The interesting thing is that a UDI driver could run just as easily on a single-tasking, single-CPU system (like DOS) or a multi-tasking SMP system equally well, and without touching the driver code. It doesn't have to know or care if it's an SMP system or not, although it does have to create multiple regions to actually be able to benefit from SMP. (Of course, even with single-region drivers, multiple instances of the same driver could benefit from SMP, since each instance could run on a different CPU.) I don't know if it would be possible to do anything like this with Parrot, but it might be interesting to consider... Deven
Re: Threads... last call
On Fri, Jan 23, 2004 at 10:07:25AM -0500, Dan Sugalski wrote: A single global lock, like python and ruby use, kills any hope of SMP-ability. Assume, for the sake of argument, that locking almost every PMC every time a thread touches it causes Parrot to run four times slower. Assume also that all multithreaded applications are perfectly parallelizable, so overall performance scales linearly with number of CPUs. In this case, threaded Parrot will need to run on a 4-CPU machine to match the speed of a single-lock design running on a single CPU. The only people that will benefit from the multi-lock design are those using machines with more than 4 CPUs--everyone else is worse off. This is a theoretical case, of course. We don't know exactly how much of a performance hit Parrot will incur from a lock-everything design. I think that it would be a very good idea to know for certain what the costs will be, before it becomes too late to change course. Perhaps the cost will be minimal--a 20% per-CPU overhead would almost certainly be worth the ability to take advantage of multiple CPUs. Right now, however, there is no empirical data on which to base a decision. I think that making a decision without that data is unwise. As I said, I've seen a real-world program which was rewritten to take advantage of multiple CPUs. The rewrite fulfilled the design goals: the new version scaled with added CPUs. Unfortunately, lock overhead made it sufficiently slower that it took 2-4 CPUs to match the old performance on a single CPU--despite the fact that almost all lock attempts succeeded without contention. The current Parrot design proposal looks very much like the locking model that app used. Corruption-resistant data structures without locking just don't exist. An existence proof: Java Collections are a standard Java library of common data structures such as arrays and hashes. Collections are not synchronized; access involves no locks at all.
Multiple threads accessing the same collection at the same time cannot, however, result in the virtual machine crashing. (They can result in data structure corruption, but this corruption is limited to surprising results rather than VM crash.) - Damien
Re: Threads... last call
On Fri, 23 Jan 2004 10:24:30 -0500, [EMAIL PROTECTED] (Dan Sugalski) wrote: If you're accessing shared data, it has to be locked. There's no getting around that. The only way to reduce locking overhead is to reduce the amount of data that needs locking. One slight modification I would make to that statement is: You can reduce locking overhead by only incurring that overhead when locking is necessary. If there is a 'cheaper' way of detecting the need for locking, then avoiding the cost of locking by only using it when needed is beneficial. This requires the detection mechanism to be extremely fast and simple relative to the cost of acquiring a lock. This was what I attempted to describe before, in win32 terms, without much success. I still can't help thinking that other platforms probably have similar possibilities, but I do not know enough of them to describe the mechanism in those terms. Nigel.
RE: Threads... last call
Deven T. Corzine wrote: The most novel approach I've seen is the one taken by Project UDI (Uniform Driver Interface). This is very much the ithreads model which has been discussed. The problem is that, from a functional perspective, it's not so much threading as it is forking. -- Gordon Henriksen IT Manager ICLUBcentral Inc. [EMAIL PROTECTED]
More on threads
Just thought I'd share some more thoughts on threading. I don't think the threading proposal is baked, yet, unfortunately. I've come to agree with Dan: As the threading requirements and the architecture stand, parrot requires frequent and automatic locking to prevent crashes. This is completely apart from user synchronization. As the architecture stands? What's wrong with it? I think the most problematic items are:

1. parrot's core operations are heavy and multi-step, not lightweight and atomic. -- This makes it harder for parrot to provide a crash-proof environment.

2. PMCs are implemented in C, not PIR. -- Again, makes parrot's job of providing a crash-proof environment much harder. If a small set of safe operations can be guaranteed safe, then the crash-proofing bubbles upward.

3. New code tends to appear in parrot's core rather than accumulating in a standard library. -- This bloats the core, increasing our exposure to bugs at that level.

4. Memory in parrot is not type-stable. -- Unions with mutable discriminators are evil, because checking the discriminator and accessing the data field could be preempted by a change of discriminator and value. Thus, unions containing pointers require locking for even read access, lest seg faults or unsafe memory accesses occur. Best example: morph. morph must die.

But parrot's already much too far along for the above to change. (ex.: morph must die.) The JVM and CLR have successful threading implementations because their core data types are either atomic or amenable to threading. (I've been over this before, but I'm playing devil's advocate today.) -- Many of Perl's core string types, for instance, are not threadsafe and they never will be. (note: I said Perl, not parrot.) Even if implemented on the JVM, Perl's string types would still require locking. That Perl doesn't use them yet doesn't mean parrot can't also have data structures that are amenable to locking.
Immutable strings wouldn't require locking on parrot any more than on the JVM, so long as morph and transcode could be prevented. (Three cheers for type-stable memory.) If parrot can prove that a P-reg will point to a PMC of such-and-such type, and can know that such-and-such operation requires no locking on that type, it can avoid locking the PMC. That, and neither environment (any longer) makes any misguided attempt to provide user-level consistency when it hasn't been requested. -- That means they simply don't lock except when the user tells them to. No reader-writer locks to update a number. Like Dan mentioned, there's no JVM magic, but rather there is a lot of very careful design. The core is crash-proofed, and is small enough that the crash-proofing is reasonable and provable. Atop that is built code which inherits that crash-proofing. Thread-safety is a very high-level guarantee, only rarely necessary. Dan Sugalski wrote: =item All shared PMCs must have a threadsafe vtable The first thing that any vtable function of a shared PMC must do is to acquire the mutex of the PMCs in its parameter list, in ascending address order. When the mutexes are released they are not required to be released in any order. Wait a sec. $2->vtable->add(interpreter, $1, $2, $3). That's one dynamic dispatch. I see 2 variables that could be shared. I think that's fatal, actually. The algorithm I'd suggest instead is this: Newborn objects couldn't have been shared, and as such can safely be accessed without locks. This is a lot, given how Perl treats values, though certainly not all. All objects from foreign sources, which have been passed to another routine, or stored into a sharable container must be presumed to require locks. It's not as aggressive, true, but I think the overall cost is lower. To back up Dan: Regarding Leo's timings, everyone that's freaking out should remember that he was testing a very fast operation, the worst case scenario for the locking overhead.
*Of course* the overhead will appear high. Most of parrot's operations are much heavier, and the locking overhead will be less apparent when those are executing. That said, a 400% penalty is too high a price to pay for what, after all, isn't even a useful level of threadsafety from a user's standpoint. But, again, without respecification and redesign, parrot requires the locking. The trick is to lock less. One way I can see to do that is to move locking upward, so that several operations can be carried out under the auspices of one lock. How would I propose to do this? Add some lock opcodes to PBC. The pluralized version allows parrot to acquire locks in ascending address order (hardcoded bubble sorts), according to Dan's very important deadlock-avoidance algorithm.

- op lock(in PMC)
- op lock2(in PMC, in PMC)
- ...
- op lock5(in PMC, in PMC, in PMC, in PMC, in PMC)
- ...

Add unlock opcode(s), too. Pluralized? Doesn't matter. Force all locks to be released before any of:

- locking more
Threads... last call
Last chance to get in comments on the first half of the proposal. If it looks adequate, I'll put together the technical details (functions, protocols, structures, and whatnot) and send that off for abuse^Wdiscussion. After that we'll finalize it, PDD the thing, and get the implementation in and going. -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Threads... last call
Dan Sugalski wrote: Last chance to get in comments on the first half of the proposal. If it looks adequate, I'll put together the technical details (functions, protocols, structures, and whatnot) and send that off for abuse^Wdiscussion. After that we'll finalize it, PDD the thing, and get the implementation in and going. Dan, Sorry to jump in out of the blue here, but did you respond to Damien Neil's message about locking issues? (Or did I just miss it?) This sounds like it could be a critically important design question; wouldn't it be best to address it before jumping into implementation? If there's a better approach available, wouldn't this be the best time to determine that? Deven Date: Wed, 21 Jan 2004 13:32:52 -0800 From: Damien Neil [EMAIL PROTECTED] To: [EMAIL PROTECTED] Subject: Re: Start of thread proposal Message-ID: [EMAIL PROTECTED] References: [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] In-Reply-To: [EMAIL PROTECTED] Content-Length: 1429 On Wed, Jan 21, 2004 at 01:14:46PM -0500, Dan Sugalski wrote: ... seems to indicate that even whole ops like add P,P,P are atomic. Yep. They have to be, because they need to guarantee the integrity of the pmc structures and the data hanging off them (which includes buffer and string stuff) Personally, I think it would be better to use corruption-resistant buffer and string structures, and avoid locking during basic data access. While there are substantial differences in VM design--PMCs are much more complicated than any JVM data type--the JVM does provide a good example that this can be done, and done efficiently. Failing this, it would be worth investigating what the real-world performance difference is between acquiring multiple locks per VM operation (current Parrot proposal) vs. having a single lock controlling all data access (Python) or jettisoning OS threads entirely in favor of VM-level threading (Ruby). 
This forfeits the ability to take advantage of multiple CPUs--but Leopold's initial timing tests of shared PMCs were showing a potential 3-5x slowdown from excessive locking. I've seen software before that was redesigned to take advantage of multiple CPUs--and then required no less than four CPUs to match the performance of the older, single-CPU version. The problem was largely attributed to excessive locking of mostly-uncontested data structures. - Damien