[perl #55000] [PATCH] Threads Failures on Optimized Build

2008-06-03 Thread NotFound
On Sun, Jun 1, 2008 at 1:31 PM, Vasily Chekalkin [EMAIL PROTECTED] wrote:

 interp->exceptions is initialized lazily, but really_destroy_exception has a
 signature with __attribute_notnull__. So we should either check this value
 before the function call or change the function signature to accept NULL.

I tried this variant:

--- src/exceptions.c	(revision: 28050)
+++ src/exceptions.c	(working copy)
@@ -772,8 +772,10 @@
 void
 destroy_exception_list(PARROT_INTERP)
 {
-    really_destroy_exception_list(interp->exceptions);
-    really_destroy_exception_list(interp->exc_free_list);
+    if (interp->exceptions)
+        really_destroy_exception_list(interp->exceptions);
+    if (interp->exc_free_list)
+        really_destroy_exception_list(interp->exc_free_list);
 }

 /*

On my platform (Ubuntu 8.04 i386), this solves both this problem and #55170

The diagnosis is the same: the root of the problem is passing NULL to
a parameter attributed as non-null.
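
To make the failure mode concrete, here is a minimal standalone C sketch
(the names are made up, not the real Parrot declarations) of how a nonnull
attribute plus a lazily initialized pointer turns into a crash only once
the optimizer starts trusting the attribute:

/* Illustration only: a callee declared nonnull lets the compiler assume the
 * argument is never NULL, so the caller must guard lazily created pointers. */
#include <stdlib.h>

typedef struct List { struct List *next; } List;

/* The callee promises it is never passed NULL ... */
static void really_destroy_list(List *list) __attribute__((nonnull(1)));

static void
really_destroy_list(List *list)
{
    while (list) {               /* the first NULL test may be optimized away */
        List *next = list->next;
        free(list);
        list = next;
    }
}

/* ... so the caller has to do the NULL check itself, as in the patch above. */
static void
destroy_lists(List *a, List *b)
{
    if (a)
        really_destroy_list(a);
    if (b)
        really_destroy_list(b);
}

int main(void)
{
    destroy_lists(calloc(1, sizeof (List)), NULL);
    return 0;
}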

(Optionally add several rants about premature optimization here).

-- 
Salu2
Index: src/exceptions.c
===
--- src/exceptions.c	(revision: 28050)
+++ src/exceptions.c	(working copy)
@@ -772,8 +772,10 @@
 void
 destroy_exception_list(PARROT_INTERP)
 {
-    really_destroy_exception_list(interp->exceptions);
-    really_destroy_exception_list(interp->exc_free_list);
+    if (interp->exceptions)
+        really_destroy_exception_list(interp->exceptions);
+    if (interp->exc_free_list)
+        really_destroy_exception_list(interp->exc_free_list);
 }
 
 /*


Re: [perl #55000] [PATCH] Threads Failures on Optimized Build

2008-06-03 Thread chromatic
On Tuesday 03 June 2008 10:50:27 NotFound via RT wrote:

 On Sun, Jun 1, 2008 at 1:31 PM, Vasily Chekalkin [EMAIL PROTECTED] wrote:
  interp->exceptions is initialized lazily, but really_destroy_exception has a
  signature with __attribute_notnull__. So we should either check this
  value before the function call or change the function signature to accept NULL.

 I tried this variant:

 --- src/exceptions.c	(revision: 28050)
 +++ src/exceptions.c	(working copy)
 @@ -772,8 +772,10 @@
  void
  destroy_exception_list(PARROT_INTERP)
  {
 -    really_destroy_exception_list(interp->exceptions);
 -    really_destroy_exception_list(interp->exc_free_list);
 +    if (interp->exceptions)
 +        really_destroy_exception_list(interp->exceptions);
 +    if (interp->exc_free_list)
 +        really_destroy_exception_list(interp->exc_free_list);
  }

  /*

 On my platform (Ubuntu 8.04 i386), this solves both this problem and #55170

 The diagnosis is the same: the root of the problem is passing NULL to
 a parameter attributed as non-null.

 (Optionally add several rants about premature optimization here).

Agreed, and applied as r28051.  Thanks, everyone!

-- c


Re: [perl #55000] Threads Failures on Optimized Build

2008-06-01 Thread Vasily Chekalkin

chromatic wrote:

There is little bit different patch for it.

--- a/src/exceptions.c
+++ b/src/exceptions.c
@@ -772,7 +772,9 @@ associated exceptions free list for the specified interpreter.

 void
 destroy_exception_list(PARROT_INTERP)
 {
-    really_destroy_exception_list(interp->exceptions);
+    if (interp->exceptions != NULL) {
+        really_destroy_exception_list(interp->exceptions);
+    }
     really_destroy_exception_list(interp->exc_free_list);
 }



interp->exceptions is initialized lazily, but really_destroy_exception has a 
signature with __attribute_notnull__. So we should either check this 
value before the function call or change the function signature to accept NULL.


--
Bacek.


[perl #55000] Threads Failures on Optimized Build

2008-05-28 Thread via RT
# New Ticket Created by  chromatic 
# Please include the string:  [perl #55000]
# in the subject line of all future correspondence about this issue. 
# URL: http://rt.perl.org/rt3/Ticket/Display.html?id=55000 


I'm seeing several test failures from an optimized build (Ubuntu Hardy Heron 
x86 32-bit).  Here's the verbose output from prove.  I can post backtraces if 
necessary.

As far as I can tell, they've been present for at least a thousand commits.

t/pmc/threads.t.
1..20
ok 1 - interp identity
not ok 2 - thread type 1

#   Failed test 'thread type 1'
#   at t/pmc/threads.t line 80.
# Exited with error code: 139
# Received:
# thread
# main 10
# Segmentation fault
# 
# Expected:
# thread
# main 10
# 
not ok 3 - thread type 1 -- repeated

#   Failed test 'thread type 1 -- repeated'
#   at t/pmc/threads.t line 115.
# Exited with error code: 139
# Received:
# thread
# main 10
# Segmentation fault
# 
# Expected:
# thread
# main 10
# thread
# main 10
# 
not ok 4 - thread type 2

#   Failed test 'thread type 2'
#   at t/pmc/threads.t line 161.
# Exited with error code: 139
# Received:
# ok 1
# hello from thread
# ParrotThread tid 1
# from 10 interp
# Segmentation fault
# 
# Expected:
# ok 1
# hello from thread
# ParrotThread tid 1
# from 10 interp
# 
ok 5 - thread - kill
not ok 6 - join, get retval

#   Failed test 'join, get retval'
#   at t/pmc/threads.t line 237.
# Exited with error code: 139
# Received:
# Segmentation fault
# 
# Expected:
# 500500
# 500500
# 
not ok 7 - detach

#   Failed test 'detach'
#   at t/pmc/threads.t line 290.
# Exited with error code: 139
# Received:
# thread
# done
# Segmentation fault
# 
# Expected:
# /(done\nthread\n)|(thread\ndone\n)/
# 
not ok 8 - share a PMC

#   Failed test 'share a PMC'
#   at t/pmc/threads.t line 319.
# Exited with error code: 139
# Received:
# thread
# 20
# Segmentation fault
# 
# Expected:
# thread
# 20
# done
# 21
# 
not ok 9 - multi-threaded

#   Failed test 'multi-threaded'
#   at t/pmc/threads.t line 355.
# Exited with error code: 139
# Received:
# 3
# 1
# 2
# 3
# done thread
# Segmentation fault
# 
# Expected:
# 3
# 1
# 2
# 3
# done thread
# done main
# 
not ok 10 - sub name lookup in new thread

#   Failed test 'sub name lookup in new thread'
#   at t/pmc/threads.t line 403.
# Exited with error code: 139
# Received:
# ok
# ok
# Segmentation fault
# 
# Expected:
# ok
# ok
# 
not ok 11 - CLONE_CODE only

#   Failed test 'CLONE_CODE only'
#   at t/pmc/threads.t line 436.
# Exited with error code: 139
# Received:
# ok 1
# ok 2
# ok 3
# ok 4
# Segmentation fault
# 
# Expected:
# ok 1
# ok 2
# ok 3
# ok 4
# ok 5
# 
not ok 12 - CLONE_CODE | CLONE_GLOBALS

#   Failed test 'CLONE_CODE | CLONE_GLOBALS'
#   at t/pmc/threads.t line 495.
# Exited with error code: 139
# Received:
# in thread:
# ok alpha
# ok beta1
# ok beta2
# ok beta3
# Segmentation fault
# 
# Expected:
# in thread:
# ok alpha
# ok beta1
# ok beta2
# ok beta3
# in main:
# ok alpha
# ok beta1
# ok beta2
# ok beta3
# 
not ok 13 - CLONE_CODE | CLONE_CLASSES; superclass not built-in # TODO vtable 
overrides aren't properly cloned RT# 46511
#   Failed (TODO) test 'CLONE_CODE | CLONE_CLASSES; superclass not built-in'
#   at t/pmc/threads.t line 580.
# Exited with error code: 139
# Received:
# in thread:
# Segmentation fault
# 
# Expected:
# in thread:
# A Bar
# called Bar's barmeth
# called Foo's foometh
# Integer? 0
# Foo? 1
# Bar? 1
# in main:
# A Bar
# called Bar's barmeth
# called Foo's foometh
# Integer? 0
# Foo? 1
# Bar? 1
# 
not ok 14 - CLONE_CODE | CLONE_CLASSES; superclass built-in

#   Failed test 'CLONE_CODE | CLONE_CLASSES; superclass built-in'
#   at t/pmc/threads.t line 665.
# Exited with error code: 139
# Received:
# in thread:
# A Bar
# called Bar's barmeth
# called Foo's foometh
# Integer? 1
# Foo? 1
# Bar? 1
# Segmentation fault
# 
# Expected:
# in thread:
# A Bar
# called Bar's barmeth
# called Foo's foometh
# Integer? 1
# Foo? 1
# Bar? 1
# in main:
# A Bar
# called Bar's barmeth
# called Foo's foometh
# Integer? 1
# Foo? 1
# Bar? 1
# 
not ok 15 - CLONE_CODE | CLONE_GLOBALS| CLONE_HLL

#   Failed test 'CLONE_CODE | CLONE_GLOBALS| CLONE_HLL'
#   at t/pmc/threads.t line 750.
# Exited with error code: 139
# Received:
# in thread:
# ok 1
# ok 2
# Segmentation fault
# 
# Expected:
# in thread:
# ok 1
# ok 2
# in main:
# ok 1
# ok 2
# 
not ok 16 - globals + constant table subs issue

#   Failed test 'globals + constant table subs issue'
#   at t/pmc/threads.t line 816.
# Exited with error code: 139
# Received:
# ok 1
# ok 2
# ok 3
# ok 4
# ok 5
# ok 6
# ok 7
# ok 8
# ok 9
# ok 10
# ok 11
# ok 12
# ok 13
# ok 14
# ok 15
# ok 16
# ok 17
# ok 18
# ok 19
# ok 20
# ok 21
# ok 22
# ok 23
# ok 24
# ok 25
# ok 26
# ok 27
# ok 28
# ok 29
# ok 30
# ok 31
# ok 32
# ok 33
# ok 34
# ok 35
# ok 36
# ok 37
# ok 38
# ok 39
# ok 40
# ok 41
# ok 42
# ok 43
# ok 44
# ok 45
# ok 46
# ok 47
# ok 48
# ok 49
# ok 50
# ok 51
# ok 52
# ok 53
# ok 54
# ok 55
# ok 56
# ok 57
# ok 58
# ok 59
# ok 60
# ok 61

Dynamic binding (e.g. Perl5 local), continuations, and threads

2005-12-23 Thread Bob Rogers
   The obvious way to implement Perl5 local bindings in Parrot would
be to:

   1.  Capture the old value in a temporary;

   2.  Use store_global to set the new value;

   3.  Execute the rest of the block; and

   4.  Do another store_global to reinstate the old value.

This has a serious flaw:  It leaves the new binding in effect if
executing the rest of the block causes a nonlocal exit.  Throwing can be
dealt with by establishing an error handler that reinstates the old
value and rethrows, but this doesn't begin to address continuations, not
to mention coroutines, which can be used to jump to an arbitrary call
frame without unwinding the stack.
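
A C analogue makes the hole obvious (Parrot would express this with
store_global/find_global ops, but the shape is the same; the nonlocal exit
is simulated here with longjmp):

#include <setjmp.h>
#include <stdio.h>

static int the_global = 1;      /* stands in for a global variable */
static jmp_buf escape;

static void block_body(void)
{
    printf("inside block: %d\n", the_global);  /* sees the new binding: 2 */
    longjmp(escape, 1);                        /* nonlocal exit */
}

int main(void)
{
    if (setjmp(escape) == 0) {
        int saved = the_global; /* 1. capture the old value */
        the_global = 2;         /* 2. store the new value */
        block_body();           /* 3. run the rest of the block */
        the_global = saved;     /* 4. never reached on a nonlocal exit */
    }
    printf("after block: %d\n", the_global);   /* prints 2: the binding leaked */
    return 0;
}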

   Then there are threads to consider.  The naive approach makes each
thread's dynamic bindings visible to all other threads, which may or may
not be desirable (not, IMHO, but this is a language design issue).
Worse, the final state of the variable depends on which thread exits
first, which is surely a bug.

Proposal:

   The only reasonable approach (it seems to me) is to keep dynamic
binding information in the call frame.  One possible implementation is
as follows:

   1.  Add a dynamic_bindings pointer to the call frame.  This points to
a linked list of the frame's current bindings, each entry of which holds
a name/value pair.  Each thread's initial frame gets its
dynamic_bindings list initialized to NULL.  Each new frame's
dynamic_bindings list is initialized from the calling frame's
dynamic_bindings.  Nothing additional need be done for continuation
calling.  The dynamic_bindings list needs to be visited during GC.

   2.  Modify store_global and find_global to search this list for the
desired global.  If found, store_global modifies the entry value, and
find_global fetches the entry value.  If not found, the existing hash is
consulted in the current fashion (modulo namespace implementation).

   3.  Add a bind_global op with the same prototype as store_global that
pushes a new entry on the dynamic_bindings list.

   4.  Add an unbind_global op that takes an integer or integer constant
and pops that many entries off of the dynamic_bindings list.  Bindings
are effectively popped when the sub exits, but this op is still needed
for cases where the end of a dynamic binding lifetime comes before the
end of the sub.  (See the sketch below.)
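
Roughly, in C (the names and layout here are illustrative, not Parrot's
actual structures; in Parrot the entries would be GC-traced PMCs):

#include <string.h>
#include <stdlib.h>

typedef struct Binding {
    const char     *name;       /* global's name                  */
    void           *value;      /* bound value (a PMC in Parrot)  */
    struct Binding *next;
} Binding;

typedef struct Frame {
    Binding *dynamic_bindings;  /* head of this frame's binding list */
    /* ... registers, continuation, etc. ... */
} Frame;

/* A new frame simply starts with the caller's list (step 1). */
void init_frame(Frame *f, const Frame *caller)
{
    f->dynamic_bindings = caller ? caller->dynamic_bindings : NULL;
}

/* find_global/store_global search the frame's list first (step 2);
 * a NULL return means "fall back to the ordinary global hash". */
Binding *find_binding(Frame *f, const char *name)
{
    Binding *b;
    for (b = f->dynamic_bindings; b; b = b->next)
        if (strcmp(b->name, name) == 0)
            return b;
    return NULL;
}

/* bind_global pushes a new entry (step 3). */
void bind_global(Frame *f, const char *name, void *value)
{
    Binding *b = malloc(sizeof *b);
    b->name  = name;
    b->value = value;
    b->next  = f->dynamic_bindings;
    f->dynamic_bindings = b;
}

/* unbind_global pops n entries (step 4); the entries themselves would be
 * reclaimed by the GC in Parrot, so there is no free() here. */
void unbind_global(Frame *f, int n)
{
    while (n-- > 0 && f->dynamic_bindings)
        f->dynamic_bindings = f->dynamic_bindings->next;
}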

Advantages:

   + The scheme is robust with respect to continuations, threads,
coroutines, and nonlocal exits.  Each sub creates dynamic bindings that
are visible only within its dynamic scope (i.e. subs that it calls,
directly or indirectly), and not to other threads or coroutines.

   + The overhead is low:  a pointer copy on call, none on return, and
zero context switching overhead.  For typical programs with little or no
dynamic binding, these are the only costs.

   + The code that a human or compiler needs to emit is even simpler
than that of the naive scheme described above.

Disadvantages:

   + The time to fetch or store a dynamic binding is proportional to the
depth of the dynamic_bindings stack, which could be considerable for
languages that do a lot of dynamic binding.  Conceivably, this could be
addressed by using a language-dependent PMC class for the binding entry,
and any such dynamic-binding-intensive language could define a per-frame
class that acted as a linked list of hashes.  But this could be
postponed until it was needed, possibly indefinitely.

   If this is acceptable (and there isn't already a better plan), I will
have time to address this over the holiday week.  TIA,

-- Bob Rogers
   http://rgrjr.dyndns.org/


Re: threads on Solaris aren't parallel?

2005-12-13 Thread Erik Paulson
On Mon, Dec 12, 2005 at 10:28:31PM +0100, Leopold Toetsch wrote:
 
 On Dec 12, 2005, at 17:53, Erik Paulson wrote:
 
 Hi -
 
 I'm using an older version of Parrot (0.2.2) so I can use threads.
 
 It seems that Parrot on Solaris doesn't ever use more than one 
 processor.
 
 [ ... ]
 
 Is there some way we can check to see if Parrot is actually creating 
 more than
 one thread? Is it some sort of crazy green-thread issue?
 
 There are AFAIK some issues with solaris (but I don't know the details) 
 It might need a different threading lib or some additional init code to 
 create 'real' threads.
 

I've got it to work now, thanks to Joe Wilson who gave me the last clue.

I turned on pthreads in configure:
perl Configure.pl --ccflags=:add{ -pthreads -D_REENTRANT } 
--linkflags=:add{ -pthreads }

and I changed the definition of THREAD_CREATE_JOINABLE:


#  define THREAD_CREATE_JOINABLE(t, func, arg) do {\
        pthread_attr_t  attr;   \
        int rc = pthread_attr_init(&attr);  \
        assert(rc == 0);\
        rc = pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);   \
        assert(rc == 0);\
        rc = pthread_setconcurrency(8);   \
        assert(rc == 0);\
        pthread_create(&t, NULL, func, arg);  \
    } while(0)



The default for pthread_attr_setscope on Solaris is SCOPE_PROCESS; the default on Linux
is SCOPE_SYSTEM. 

I'm on Solaris 8, and without the call to pthread_setconcurrency, I only ran
one thread at a time. Starting in Solaris 9, pthread_setconcurrency doesn't
do anything.  (I don't have a Solaris 9 SMP machine I can test on to see if Parrot
uses multiple processors concurrently without the call to pthread_setconcurrency.)

My runtimes get about twice as fast every time I add a processor. I'm not sure
what the minimal set of calls I need to add are - the setconcurrency was the
last thing I tried, and once I added it in things started working - I don't
know if that means I can remove my other changes and things will still work,
I'll do that experiment later.

-Erik



Re: threads on Solaris aren't parallel?

2005-12-13 Thread Leopold Toetsch


On Dec 13, 2005, at 19:35, Erik Paulson wrote:

I've got it to work now, thanks to Joe Wilson who gave me the last 
clue.


Great.


I turned on pthreads in configure:
perl Configure.pl --ccflags=:add{ -pthreads -D_REENTRANT } 
--linkflags=:add{ -pthreads }


Have a look at config/init/hints/solaris.pm to make this change more 
permanent.
Also having some SOLARIS_VERSION define (for below) if it's solaris 
would be good I presume.



and I changed the definition of THREAD_CREATE_JOINABLE:


For a final patch you could include some #ifdef SOLARIS_VERSION == foo 
to just include necessary extensions.



-Erik


leo



Re: threads on Solaris aren't parallel?

2005-12-13 Thread Greg Miller

Leopold Toetsch wrote:


and I changed the definition of THREAD_CREATE_JOINABLE:



For a final patch you could include some #ifdef SOLARIS_VERSION == foo 
to just include necessary extensions.


Doesn't look like there's anything Solaris-specific here. Other 
non-Linux OSes will need the same changes (FreeBSD supports both 
SCOPE_SYSTEM and SCOPE_PROCESS, for example), will they not?

--
http://www.velocityvector.com/ | http://glmiller.blogspot.com/
http://www.classic-games.com/  |
 The hand ain't over till the river.


threads on Solaris aren't parallel?

2005-12-12 Thread Erik Paulson
Hi -

I'm using an older version of Parrot (0.2.2) so I can use threads.

It seems that Parrot on Solaris doesn't ever use more than one processor.

The program attached should create argv[1] threads and divide the argv[2]
increments up among them - i.e. perfect linear speedup.
I've got a dual-processor Xeon (a real one, not this hyperthreaded stuff)
and I indeed get speedup:

tonic(19)% time ./parrot /common/tmp/jon/thread_test.pir 1 5
...
18.870u 0.010s 0:18.93 99.7%0+0k 0+0io 534pf+0w

tonic(20)% time ./parrot /common/tmp/jon/thread_test.pir 2 5
...
19.360u 0.030s 0:09.93 195.2%   0+0k 0+0io 534pf+0w
tonic(21)% 

However, on a Solaris machine that has 8 CPUs, we get no speedup:

pinot(6)% time ./parrot-solaris /common/tmp/jon/thread_test.pir 1 5000
...
9.69u 0.05s 0:09.77 99.6%

pinot(7)% time ./parrot-solaris /common/tmp/jon/thread_test.pir 2 5000
...
9.08u 0.09s 0:09.19 99.7%

pinot(8)% time ./parrot-solaris /common/tmp/jon/thread_test.pir 4 5000
...
9.28u 0.07s 0:09.38 99.6%

pinot(9)% time ./parrot-solaris /common/tmp/jon/thread_test.pir 8 5000
...
9.67u 0.03s 0:09.74 99.5%

Is there some way we can check to see if Parrot is actually creating more than 
one thread? Is it some sort of crazy green-thread issue?

Thanks,

-Erik


# Basic shared array program for parrot
.sub _main
.param pmc argv
.sym int threadIncs
.sym pmc threads
.sym pmc child
.sym pmc Inc_array
.local pmc increment_pass
.local pmc seed_param
.local int i, value, seed
.local pmc temp
.local int tmp
.local int offset
.local int numThreads
.local pmc logtmlib
.local pmc DoBreakpoint
.local string parameter


parameter = shift argv
parameter = shift argv
numThreads = parameter
parameter = shift argv
#numThreads = 1
threadIncs = parameter
threadIncs = threadIncs/numThreads
 
init_array:

Inc_array = global "increment_array" # get function pointer

# setup an array to hold threads
threads = new .FixedPMCArray
threads = 50

# Set the number of increments to do in each thread
increment_pass = new .Integer
increment_pass = threadIncs

seed = 54433
seed_param = new .Integer

i = 0

create_Thread:
child = new .ParrotThread  # basically new thread
.sym pmc New_thread
find_method New_thread, child, "thread3"
seed_param = seed
increment_pass = increment_pass 
.pcc_begin
.arg Inc_array
.arg increment_pass
.arg seed_param
.invocant child
.nci_call New_thread
.pcc_end
threads[i] = child
inc i

if i < numThreads goto create_Thread

i = 0
# Join and wait on threads
_join_thread:
.sym int tid
.sym pmc Thread_join
child = threads[i]
tid = threads[i]
find_method Thread_join, child, "join"

.pcc_begin
.arg tid
.nci_call Thread_join
.pcc_end
threads[i] = child
inc i
if i < numThreads goto _join_thread

#DoBreakpoint()
i = 0
tmp = 0
_main_print_loop:
tmp = tmp + value
print value
print " "
inc i
if i < 100 goto _main_print_loop
print "\n"
print tmp
print "\n"

.end


# The code to execute in the thread
.sub increment_array
.param pmc sub
.param pmc increments
.param pmc seed_param
.local int i, tmp, value, numIncs, rand, index
.local int temp

numIncs = increments
i = 0

s_loop:
inc i
if i < numIncs goto s_loop

i = 0
.end




Re: threads on Solaris aren't parallel?

2005-12-12 Thread Leopold Toetsch


On Dec 12, 2005, at 17:53, Erik Paulson wrote:


Hi -

I'm using an older version of Parrot (0.2.2) so I can use threads.

It seems that Parrot on Solaris doesn't ever use more than one 
processor.


[ ... ]

Is there some way we can check to see if Parrot is actually creating 
more than

one thread? Is it some sort of crazy green-thread issue?


There are AFAIK some issues with solaris (but I don't know the details) 
It might need a different threading lib or some additional init code to 
create 'real' threads.



Thanks,

-Erik


leo



Re: threads on Solaris aren't parallel?

2005-12-12 Thread Jack Woehr

Leopold Toetsch wrote:

There are AFAIK some issues with solaris (but I don't know the 
details) It might need a different threading lib or some additional 
init code to create 'real' threads.


You just have to know how they implement pthreads, which is weasel-worded
in POSIX and allows Solaris much divergence from what you expect. It's the
LWP versus full-fledged process thing.

--
Jack J. Woehr # I never played fast and loose with the
PO Box 51, Golden, CO 80402   # Constitution. Never did and never will.
http://www.well.com/~jax  # - Harry S Truman



Re: threads

2005-10-02 Thread Jonathan Worthington

Dave Frost [EMAIL PROTECTED] wrote:
From the outset i decided i wanted the vm to provide its own threading 
mechanism i.e. not based on posix threads for example.
Parrot had the option of providing its own threads, thread scheduling and 
the like.  As leo mentioned, we're using OS threads.  The problem with 
threads in user space (as you propose) is that the operating system doesn't 
know of the threads - it just sees the single OS thread running the program. 
And that means that if you have 4 CPUs (or a 4-core CPU, which I doubt will 
be uncommon in the near future) then any program running on your VM can only 
ever use 2 of those (the OS can only schedule threads that it knows about, 
and your VM's program's threads wouldn't be real ones, so couldn't be 
scheduled on separate CPUs).  With concurrency being of increasing 
importance, I think this is a powerful argument for using OS threads.


My first plan was to have 2 native threads, one for execution of the main 
'core' execution code the runtime if you like, the other thread was used 
to tell the execution code to swap threads.
The most expensive part of using threading, aside from thread creation, is 
doing context switches (between threads).  Here, you are requiring two 
context switches to provide one virtual context switch to code you are 
executing.  And the switches are, of course, wasted if you decide not to 
switch.


I thought i could synchronise these 2 using native semaphores.  When it 
comes down to it a single op code takes a number of native instructions 
i.e. to execute an add instruction i may have to do (say) 5 things.  so  I 
just check after each op code has been executed to see if the thread needs 
swapping out.  That seems like a bad idea mainly due to speed/efficiency. 
Each thread is a top level object, so the stack, all stack pointers and 
register data etc resides in the thread, but i still cant have the 
execution engine swapping threads mid operation, so in my add example i 
still dont think i would want the execution engine swapping out a thread 
after 3 instructions, it would need to complete all 5.


Its been a bit of  a brick wall this, it seemed to be going quite well up 
to this point and i need to solve this before i can move on with 
lwm,(light weight machine).


Any pointers, thoughts or comments are welcome.

I guess what I really want to say is consider using OS threads.  :-)  But 
more helpfully, here's a (hacky, but maybe workable) approach that 
immediately occurs to me.  I assume you have a sequence of bytecode that you 
execute.  When you need to do a context switch, the thread is doing the 
signalling to say "context switch now" takes a copy of the instruction part 
of the next opcode that would be executed in the current virtual thread 
and then replaces the instruction with a "context switch" opcode.  Then, 
when the context switch opcode executes, it replaces the instruction in the 
bytecode with the original one and does the context switch.  But you need 
special cases for ops that want to try and obtain locks, so you can force a 
switch if the lock isn't available and stuff.  And probably a flag to set 
when a switch happens, so you don't mess up the bytecode by re-writing the 
context switch opcode twice.
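
Something like this, very roughly (hypothetical VM types, and it ignores
the synchronization you would need so the interpreter never fetches a
half-patched opcode, which is part of what makes the scheme hacky):

#include <stdint.h>

typedef uint32_t opcode_t;

enum { OP_CTX_SWITCH = 0xFFFFu };   /* reserved "switch now" opcode */

typedef struct VThread {
    opcode_t *next_pc;              /* next instruction to execute      */
    opcode_t  saved_op;             /* original opcode we patched over  */
    int       patch_pending;        /* don't patch the same slot twice  */
} VThread;

/* Called by the signalling thread to request a switch. */
void request_switch(VThread *t)
{
    if (!t->patch_pending) {
        t->saved_op      = *t->next_pc;
        *t->next_pc      = OP_CTX_SWITCH;
        t->patch_pending = 1;
    }
}

/* Called by the interpreter when it decodes OP_CTX_SWITCH. */
void do_ctx_switch(VThread *t)
{
    *t->next_pc      = t->saved_op;  /* put the real opcode back */
    t->patch_pending = 0;
    /* ... save this VThread's registers, pick another one, resume it ... */
}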


Not saying it's the best scheme, but it may be Good Enough.

Have fun,

Jonathan 



threads

2005-09-27 Thread Dave Frost

hi all,

Im interested to know how perl6/parrot implements threads.  Im mainly 
interested as im writing a small vm of my own.  Not a production vm like 
parrot, more for interest/curiosity etc.  From the outset i decided i 
wanted the vm to provide its own threading mechanism i.e. not based on 
posix threads for example.  My first plan was to have 2 native threads, 
one for execution of the main 'core' execution code the runtime if you 
like, the other thread was used to tell the execution code to swap 
threads.  I thought i could synchronise these 2 using native 
semaphores.  When it comes down to it a single op code takes a number of 
native instructions i.e. to execute an add instruction i may have to do 
(say) 5 things.  so  I just check after each op code has been executed 
to see if the thread needs swapping out.  That seems like a bad idea 
mainly due to speed/efficiency.  Each thread is a top level object, so 
the stack, all stack pointers and register data etc resides in the 
thread, but i still cant have the execution engine swapping threads mid 
operation, so in my add example i still dont think i would want the 
execution engine swapping out a thread after 3 instructions, it would 
need to complete all 5.


Its been a bit of  a brick wall this, it seemed to be going quite well 
up to this point and i need to solve this before i can move on with 
lwm,(light weight machine).


Any pointers, thoughts or comments are welcome.

Cheers

Dave



Re: threads

2005-09-27 Thread Leopold Toetsch


On Sep 27, 2005, at 17:14, Dave Frost wrote:



hi all,

Im interested to know how perl6/parrot implements threads.


*) based on OS threads
*) one interpreter per thread
*) STM for shared objects / atomicity


Any pointers, thoughts or comments are welcome.

Cheers

Dave


leo



Re: [perl #35305] [PATCH] skip threads 'detatch' test on win32

2005-06-03 Thread jerry gay
 Mr. Gay, let me know if you wait for a special request to uncomment the line
 
 /*#include "parrot/thr_windows.h"*/
 
 in config/gen/platform/win32/threads.h
 
whatever was broken, has now been fixed. patch applied, and ticket closed.

~jerry


Re: [perl #35305] [PATCH] skip threads 'detatch' test on win32

2005-05-25 Thread jerry gay
On 5/9/05, jerry gay [EMAIL PROTECTED] wrote:
 much better! one failing test now...

my initial exuberance was unfounded. one test fails in
t/pmc/threads.t, but hundreds fail in the rest of the test suite. it
seems this line (from above) is the culprit:

 -#  ifdef _MCS_VER1
 +#  ifdef _MCS_VER

so it seems the definition of THREAD_CREATE_JOINABLE() (which follows
this directive in include/parrot/thr_windows.h) is incorrect.

On 5/19/05, Leopold Toetsch [EMAIL PROTECTED] wrote:
 Vladimir Lipsky [EMAIL PROTECTED] wrote:
 
  Parrot_really_destroy needs to be fixed
 
 $verbose++ please, thanks
 
yes, please. until this issue is fixed, i'm rolling back these patches
so the threads test 6 is again skipped on windows, and the 200-odd
failing tests will work again. feel free to send more patches, i'll
happily test them (more carefully next time) and work out the bugs
before applying.

patch applied as r8165.
~jerry


Re: [perl #35305] [PATCH] skip threads 'detatch' test on win32

2005-05-19 Thread Leopold Toetsch
Vladimir Lipsky [EMAIL PROTECTED] wrote:

 1) Why the heck

Easy: it's not in the MANIFEST. Why: patches scattered between inline
and attached and the MANIFEST part missing ... it's easy to overlook.

 -#  ifdef _MCS_VER1
 +#  ifdef _MCS_VER

Thanks, applied - hope that's really the whole thing now ;-)

leo


Re: [perl #35305] [PATCH] skip threads 'detatch' test on win32

2005-05-19 Thread Vladimir Lipsky
 jerry gay [EMAIL PROTECTED] wrote:
much better! one failing test now...
D:\usr\local\parrot-HEAD\trunkperl t/harness t/pmc/threads.t
t/pmc/threadsok 3/11# Failed test (t/pmc/threads.t at line 
163)
#  got: 'start 1
# in thread
# done
# Can't spawn .\parrot.exe 
D:\usr\local\parrot-HEAD\trunk\t\pmc\threads_4.pasm: Bad file
descriptor at lib/Parrot/Test.pm line 231.
# '
# expected: 'start 1
t/pmc/threadsNOK 4# in thread
# done
# '
# '.\parrot.exe 
D:\usr\local\parrot-HEAD\trunk\t\pmc\threads_4.pasm' failed with
exit code 255
Parrot_really_destroy needs to be fixed


Re: [perl #35305] [PATCH] skip threads 'detatch' test on win32

2005-05-19 Thread Leopold Toetsch
Jerry Gay [EMAIL PROTECTED] wrote:

 here's the patch to unskip test 6:

Thanks, applied.
leo


Re: [perl #35305] [PATCH] skip threads 'detatch' test on win32

2005-05-19 Thread Leopold Toetsch
Vladimir Lipsky [EMAIL PROTECTED] wrote:

 D:\usr\local\parrot-HEAD\trunk\t\pmc\threads_4.pasm' failed with
 exit code 255

 Parrot_really_destroy needs to be fixed

$verbose++ please, thanks

leo


Re: [perl #35305] [PATCH] skip threads 'detatch' test on win32

2005-05-18 Thread Vladimir Lipsky
As stated already, this (and possibly other thread) test(s) can't succeed
as long as Win32 has no event loop that passes the terminate event on to
the running interpreter.
1) Why the heck
--- parrot/config/gen/platform/win32/threads.h Mon May 2 14:40:59 2005
+++ parrot-devel/config/gen/platform/win32/threads.h Mon May 2 14:42:58 2005
@@ -0,0 +1,3 @@
+
+#include "parrot/thr_windows.h"
+
isn't in the repository?
2) To test both cases (MS compiler and not), I played with the macro #ifdef 
_MCS_VER in thr_windows.h and forgot a 1 at the end of it. The patch applied 
removes it. Though it couldn't affect the test results as long as 
thr_windows.h wasn't included at all.


mcs_ver.patch
Description: Binary data


Re: [perl #35305] [PATCH] skip threads 'detatch' test on win32

2005-05-17 Thread Leopold Toetsch
Vladimir Lipsky wrote:
parrot (r8016): no change. hangs w/98% cpu. here's the -t output:
As stated already, this (and possibly other thread) test(s) can't 
succeed as long as Win32 has no event loop that passes the terminate 
event on to the running interpreter.

The last two pmc's are allocated from a place which is clearly not the pmc
pool arena from which other pmc's are allocated.
Run is_pmc_ptr(interp, pmc) or check the involved pmc arenas to verify 
this assumption.

leo


Re: [perl #35305] [PATCH] skip threads 'detatch' test on win32

2005-05-16 Thread jerry gay
parrot (r8016): no change. hangs w/98% cpu. here's the -t output:

parrot -t test_b.pasm
 0 find_global P5, _foo   - P5=SArray=PMC(0x7d5a50),
 3 new P2, 18   - P2=PMCNULL,
 6 find_method P0, P2, thread3- P0=PMCNULL,
P2=ParrotThread=PMC(0x7d5a08),
10 new P6, 54   - P6=PMCNULL,
13 set I3, 2- I3=1,
16 invoke
17 set I5, P2   - I5=0, P2=ParrotThread=PMC(0x7d5a08)
20 getinterp P2 - P2=ParrotThread=PMC(0x7d5a08)
22 find_method P0, P2, detach - P0=NCI=PMC(0x638620), P2=ParrotInterpr
eter=PMC(0x637dc8),
26 invoke
27 defined I0, P6   - I0=1, P6=TQueue=PMC(0x7d59f0)
30 unless I0, -3- I0=0,
27 defined I0, P6   - I0=0, P6=TQueue=PMC(0x7d59f0)
30 unless I0, -3- I0=0,
27 defined I0, P6   - I0=0, P6=TQueue=PMC(0x7d59f0)
30 unless I0, -3- I0=0,
27 defined I0, P6   - I0=0, P6=TQueue=PMC(0x7d59f0)
30 unless I0, -3- I0=0,
27 defined I0, P6   - I0=0, P6=TQueue=PMC(0x7d59f0)
30 unless I0, -3- I0=0,
27 defined I0, P6   - I0=0, P6=TQueue=PMC(0x7d59f0)
30 unless I0, -3- I0=0,
etc

On 5/15/05, Vladimir Lipsky [EMAIL PROTECTED] wrote:
  the 'detatch' threads test hangs on win32. this small patch skips one
 
 Could you try the following code('the detatch' threads test with one tweak)
 and tell me if it hangs either and what output you get?
 
 find_global P5, "_foo"
 new P2, .ParrotThread
 find_method P0, P2, thread3
 new P6, .TQueue # need a flag that thread is done
 set I3, 2
 invoke # start the thread
 set I5, P2
 getinterp P2
 find_method P0, P2, detach
 invoke
 wait:
 defined I0, P6
 unless I0, wait
 print "done\n"
 sleep 5 # Maybe a race condition?
 end
 .pcc_sub _foo:
 print "thread\n"
 new P2, .Integer
 push P6, P2 # push item on queue
 returncc
 



Re: [perl #35305] [PATCH] skip threads 'detatch' test on win32

2005-05-07 Thread Leopold Toetsch
Jerry Gay [EMAIL PROTECTED] wrote:

 the 'detatch' threads test hangs on win32. this small patch skips one
 test, so others may fail :)

Thanks, applied.
leo


[perl #35305] [PATCH] skip threads 'detatch' test on win32

2005-05-06 Thread via RT
# New Ticket Created by  jerry gay 
# Please include the string:  [perl #35305]
# in the subject line of all future correspondence about this issue. 
# URL: https://rt.perl.org/rt3/Ticket/Display.html?id=35305 


the 'detatch' threads test hangs on win32. this small patch skips one
test, so others may fail :)

~jerry


Index: t/pmc/threads.t

===

--- t/pmc/threads.t (revision 7994)

+++ t/pmc/threads.t (working copy)

@@ -263,6 +263,8 @@

 500500
 OUTPUT
 
+SKIP: {
+  skip("detach broken on $^O", 1) if ($^O =~ /MSWin32/);
 output_like(<<'CODE', <<'OUTPUT', "detach");
 find_global P5, "_foo"
 new P2, .ParrotThread
@@ -290,6 +292,7 @@

 CODE
 /(done\nthread\n)|(thread\ndone\n)/
 OUTPUT
+}
 
 output_is(<<'CODE', <<'OUTPUT', "share a PMC");
 find_global P5, "_foo"


Re: COND macros (was: Threads, events, Win32, etc.)

2004-11-20 Thread Gabe Schaffer
 Parrot's locks will all have wait/signal/broadcast capabilities. We
 should go rename the macros and rejig the code. This may have to wait

Really? I'm not sure I understand what broadcast does on a lock. Are
you talking about something like P5's condpair? If so, why not just
cop that code? Of course, I don't have a clue what it does on Win32,
so maybe that's not such a good idea.

GNS


Re: Threads, events, Win32, etc.

2004-11-19 Thread Gabe Schaffer
 [ long win32 proposal ]
 
 I've to read through that some more times.

OK; let me know if you have any questions on how the Win32 stuff
works. I tried to explain things that are unlike POSIX, but of course
it makes sense to me.

 Do you already have ideas for a common API, or where to split the
 existing threads.c into platform and common code?

I didn't see anything in thread.c that was platform specific -- or at
least nothing that looked like it wouldn't work on Win32. Obviously
thr_win32.h will be much different than thr_pthreads.h.

As for a common API, I suppose we would have to figure out how the
modules would interact. In the case of IO (any function names I made
up have capital letters to distinguish them from anything that may be
in the current codebase):

* There is some generic IO code that sets up IO and event objects, and
it sits below the buffering layer. All this stuff is going to be
thread-safe so the low-level IO code shouldn't have to worry about it.
This generic IO code sets up the IO and Event objects indicating what
file, which operation (read/write/lock/unlock), what to do when the
operation completes, how many bytes, file position, and any memory
buffer needed. Then it would pass this information to the OS-specific
code.

So this generic code (say, StartIO()) would create an IO object that
contains the file object, a pointer to the r/w buffer if the operation
requires it, and a file position and byte count if the operation uses
it. It would also contain an event object indicating what to do in
case of failure or completion. StartIO pins the memory buffer, locks
the IO object, and calls Win32AsyncRead or whatever.

The XXXAsyncYYY function starts the IO and returns to StartIO, which
unlocks the IO object and returns it to the caller. That can then be
passed into functions to find out if it's complete, cancel it, etc.

If XXX is Win32, the module would just start Read/WriteFile; if it's
Solaris, maybe it'll call aioread/write; if it's POSIX without aio
(e.g. Linux), it'll start up a new thread to do a blocking read/write.

When the IO completes, the XXX code figures out the return code and
how many bytes read/written. This information is passed to the generic
CompleteIO(), which locks the IO object, updates the status (return
code, byte count), unpins the buffer, dispatches the event as
described by the Event object, and unlocks the IO object.  (A rough sketch
of this flow follows the list below.)

* In the case of timers, there would be XXXCreateTimer and
XXXCancelTimer, while the XXX code would need to call FireTimer().

* I suppose there should be an XXXQueueEvent function as well, but I'm
not certain of its uses. Actual EventDispatch() would be a generic
function that puts an event into a thread's queue and notifies the
thread. I am not sure how to handle general events like RunGC.

* GUI message handlers need to dispatch events synchronously. So the
generic check_events function would call XXXGetGUIMessages which would
need to dispatch the messages to whatever registered for them in that
thread. Since this would amount to method dispatch instead of event
dispatch, it would probably need something special for this.
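
Here is a rough C sketch of the IO part, using the made-up names above
(StartIO, CompleteIO; a single PlatformAsyncStart stands in for the
per-platform XXXAsyncYYY calls). None of this is an existing Parrot API:

#include <stddef.h>
#include <stdlib.h>

typedef enum { IO_READ, IO_WRITE, IO_LOCK, IO_UNLOCK } IOOp;

typedef struct Event {                /* what to do on completion/failure */
    void (*dispatch)(struct Event *self, int status);
} Event;

typedef struct IORequest {
    void   *file;                     /* platform file object              */
    IOOp    op;
    void   *buffer;                   /* pinned r/w buffer, if needed      */
    size_t  nbytes;
    long    offset;
    int     status;                   /* return code, set on completion    */
    size_t  done;                     /* bytes actually transferred        */
    Event   event;
} IORequest;

/* Platform layer: Win32 would issue ReadFile/WriteFile with OVERLAPPED,
 * Solaris might use aioread/aiowrite, plain POSIX a worker thread. */
void PlatformAsyncStart(IORequest *io);

/* Generic layer, sitting below the buffering code. */
IORequest *StartIO(void *file, IOOp op, void *buf, size_t n, long off, Event ev)
{
    IORequest *io = calloc(1, sizeof *io);
    io->file = file;  io->op = op;      io->buffer = buf;
    io->nbytes = n;   io->offset = off; io->event  = ev;
    /* pin the buffer, lock io ... */
    PlatformAsyncStart(io);
    /* ... unlock io */
    return io;        /* the caller can poll, wait on, or cancel this */
}

/* Called from the platform layer when the operation finishes. */
void CompleteIO(IORequest *io, int status, size_t done)
{
    /* lock io ... */
    io->status = status;
    io->done   = done;
    /* unpin the buffer, unlock io */
    io->event.dispatch(&io->event, status);   /* queue/dispatch the event */
}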

GNS


Re: COND macros (was: Threads, events, Win32, etc.)

2004-11-19 Thread Gabe Schaffer
On Wed, 17 Nov 2004 16:30:04 +0100, Leopold Toetsch [EMAIL PROTECTED] wrote:
 Gabe Schaffer [EMAIL PROTECTED] wrote:
 The problem is a different one: the COND_INIT macro just passes a
 condition location, the mutex is created in a second step, which isn't
 needed for windows. OTOH a mutex aka critical section is needed
 separately.
 
 So we should probably define these macros to be:
 
  COND_INIT(c, m)
  COND_DESTROY(c, m)
 
 see src/tsq.c for usage.
 
 Does win32 require more info to create conditions/mutexes or would these
 macros suffice?

Win32 doesn't require anything else, but I don't think I like this
idea. If you do COND_INIT(c, m) and Win32 ignores the 'm', what
happens when some code goes to LOCK(m)? It would work under POSIX but
break under Win32. I think there should be an opaque struct that
contains c,m for POSIX and c for Win32.

GNS


Re: COND macros (was: Threads, events, Win32, etc.)

2004-11-19 Thread Dan Sugalski
At 8:42 AM -0500 11/19/04, Gabe Schaffer wrote:
On Wed, 17 Nov 2004 16:30:04 +0100, Leopold Toetsch [EMAIL PROTECTED] wrote:
 Gabe Schaffer [EMAIL PROTECTED] wrote:
 The problem is a different one: the COND_INIT macro just passes a
 condition location, the mutex is created in a second step, which isn't
 needed for windows. OTOH a mutex aka critical section is needed
 separately.
 So we should probably define these macros to be:
  COND_INIT(c, m)
  COND_DESTROY(c, m)
 see src/tsq.c for usage.
 Does win32 require more info to create conditions/mutexes or would these
 macros suffice?
Win32 doesn't require anything else, but I don't think I like this
idea. If you do COND_INIT(c, m) and Win32 ignores the 'm', what
happens when some code goes to LOCK(m)? It would work under POSIX but
break under Win32. I think there should be an opaque struct that
contains c,m for POSIX and c for Win32.
This'll mean that every mutex will have a corresponding condition 
variable, something that I'm not sure we need.

On the other hand, I can't picture us having so many of these that it 
makes any difference at all, so I don't have a problem with it. It 
isn't a good general-purpose thread solution (there are plenty of 
good reasons to unbundle these) but we don't really *need* a 
general-purpose solution. :)

Parrot's locks will all have wait/signal/broadcast capabilities. We 
should go rename the macros and rejig the code. This may have to wait 
a little -- we're cleaning up the last of subs, I've still got the 
string stuff outstanding, and I promised Sam Ruby I'd deal with 
classes and metaclasses next.

So much time, so little to do. No, wait, that's not right...
--
Dan
--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: Threads, events, Win32, etc.

2004-11-17 Thread Gabe Schaffer
  Not quite. COND_WAIT takes an opaque type defined by the platform, that
  happens to be a mutex for the pthreads based implementation.
 
  It should, but it doesn't. Here's the definition:
  #  define COND_WAIT(c,m) pthread_cond_wait(c, m)
 
 You are already in the POSIX specific part.

It came from thr_pthread.h, so it should be POSIX. The issue here is
that it's #define COND_WAIT(c,m) instead of #define COND_WAIT(c).
Every place in the code, whether it's Win32 or POSIX, is going to have
to pass in a condition variable and a mutex. Just because Win32 will
ignore the second parameter, that isn't going to prevent the code from
creating the mutex, initializing it, and passing it in.

  I'm not sure, if we even should support Win9{8,5}.
 
  I'd be happy with simply implementing Win9x as a non-threaded
  platform. Of course, hopefully nobody will even ask...
 
 We'll see. But as Parrot's IO system is gonna be asynchronous in core, I
 doubt that we'll support it.

Obviously Parrot has to run on non-threaded platforms where the kernel
threading and AIO stuff just won't work. You can still do user
threads, but file IO will still block everything.

  rationale. I can understand why there would need to be a global event
  thread (timers, GC, DoD), but why would passing a message from one
  thread to another need to be serialized through a global event queue?
 
 The main reason for the global event queue isn't message passing. The
 reason is POSIX signals. Basically you aren't allowed to do anything
 serious in a signal handler, especially you aren't allowed to broadcast
 a condition or something.
 So I came up with that experimental code of one thread doing signals.

Yes, there has to be a separate thread to get signals, and each thread
needs its own event queue, but why does the process have a global
event_queue? I suppose there are generic events that could be handled
just by the next thread to call check_events, but that isn't what this
sounds like.

  And as for IO, I see the obvious advantages of performing synchronous
  IO functions in a separate thread to make them asynchronous, but that
  sounds like the job of a worker thread pool. There are many ways to
  implement this, but serializing them all through one queue sounds like
  a bottleneck to me.
 
 Yes. The AIO library is doing that anyway i.e. utilizing a thread pool
 for IO operations.

I don't see why there needs to be a separate thread to listen for IOs
to finish. Can't that be the same thread that listens for signals?
That is, the IO thread just spends its whole life doing select(). If
it got a signal, select() should return EINTR, so the thread could
then check a flag to see which signal was raised, queue the event in
the proper queue(s), and call select() again.

OK, I think I understand why...the event thread is in a loop waiting
for somebody to tell it that there's an event in the global event
queue...which is really the part I don't get yet.

 Dan did post a series of documents to the list some time ago. Sorry I've
 no exact subject, but with relevant keywords like events you should
 find it.

Yeah, I remember reading some of his discussions with Damien Neil
because I think I went to school with him.

Anyway, here's my first draft for a Win32 event model:

As for a Win32 event model, I think I should clarify what I'm talking
about when I say Win32.

Win32 IS NOT: The MS Services for Unix package provides a POSIX
subsystem for Windows called Interix which is completely separate from
Win32 (i.e. no GUI is possible, no Win SDK calls are available). It
has fork(), symlinks, pthreads, SysV IPC, POSIX signals, pttys, and
maybe even AIO. This config would be compiled like any other Unix
variant with its own idiosyncrasies.

Win32 IS PROBABLY NOT: There are various POSIX emulation layers for
Win32, such as cygwin and MinGW. These provide many function calls
that Unix programs expect, but only to the degree that the Win32
subsystem allows (e.g. chmod likely will not do anything sensible).
Since these programs still run under the Win32 subsystem, Windows GUIs
are still possible. I don't know how these will interact with my event
model.

Win32 IS: This is the standard Win32 API as defined by NT4.0sp6a and
higher. If you want to drop support for NT4, then we go to Win2k, but
don't gain much.

GUI message queues in Win32 are per thread. Each thread has a message
queue that is autovivified. Any window that a thread creates has its
messages sent to that thread's queue. However, there is no reason that
a message actually has to have an associated window. You can send any
thread in any process a message, so long as the thread has had its
queue autovivified and is not crossing security boundaries.

All files or things that look like files can be opened for async
access. For example, sockets, files, and pipes can all be async. Any
read, write, lock, unlock, or ioctl call can either signal a condition
var (Win32 calls them events, and they don't have POSIX

Re: Threads, events, Win32, etc.

2004-11-17 Thread Leopold Toetsch
Gabe Schaffer [EMAIL PROTECTED] wrote:

 Yes, there has to be a separate thread to get signals, and each thread
 needs its own event queue, but why does the process have a global
 event_queue? I suppose there are generic events that could be handled
 just by the next thread to call check_events, but that isn't what this
 sounds like.

It's mainly intended for broadcasts and timers. POSIX signals are weird
and more or less broken from platform to platform. The only reliable way
to get at them is to block the desired signal in all but one thread.
This signal gets converted to a global event and from there it can be
put into specifc threads if they have installed signal handlers for that
signal.
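
In code the pattern looks roughly like this (push_global_event is a
placeholder for whatever queues the event, not an existing Parrot call):

#include <pthread.h>
#include <signal.h>

void push_global_event(int signo);       /* placeholder for the event queue */

static sigset_t handled;

static void *signal_thread(void *arg)
{
    (void)arg;
    for (;;) {
        int signo;
        if (sigwait(&handled, &signo) == 0)
            push_global_event(signo);    /* safe: we're not in a handler */
    }
    return NULL;
}

void start_signal_thread(void)
{
    pthread_t tid;

    sigemptyset(&handled);
    sigaddset(&handled, SIGINT);
    sigaddset(&handled, SIGHUP);

    /* Block these signals here; threads created afterwards inherit the
     * mask, so only the dedicated thread ever sees them. */
    pthread_sigmask(SIG_BLOCK, &handled, NULL);
    pthread_create(&tid, NULL, signal_thread, NULL);
}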

But as said the existing code is experimental and is likely to change a
lot.

 I don't see why there needs to be a separate thread to listen for IOs
 to finish. Can't that be the same thread that listens for signals?

That's the plan yes. AIO completion can be delivered as a signal.

 OK, I think I understand why...the event thread is in a loop waiting
 for somebody to tell it that there's an event in the global event
 queue...which is really the part I don't get yet.

Well, the event thread is handling timer events on behalf of an
interpreter.

[ long win32 proposal ]

I've to read through that some more times.

Do you already have ideas for a common API, or where to split the
existing threads.c into platform and common code?

 GNS

leo


COND macros (was: Threads, events, Win32, etc.)

2004-11-17 Thread Leopold Toetsch
Gabe Schaffer [EMAIL PROTECTED] wrote:
  Not quite. COND_WAIT takes an opaque type defined by the platform, that
  happens to be a mutex for the pthreads based implementation.

  It should, but it doesn't. Here's the definition:
  #  define COND_WAIT(c,m) pthread_cond_wait(c, m)

 You are already in the POSIX specific part.

 It came from thr_pthread.h, so it should be POSIX. The issue here is
 that it's #define COND_WAIT(c,m) instead of #define COND_WAIT(c).

Well in the mentioned (TODO) platform/win32/threads.h you have to define
your own COND_WAIT(c, m) - this is the interface of that macro, as POSIX
needs the mutex, but you would ignore the 2nd parameter.

Please have a look at the empty defines in include/parrot/threads.h.

The problem is a different one: the COND_INIT macro just passes a
condition location, the mutex is created in a second step, which isn't
needed for windows. OTOH a mutex aka critical section is needed
separately.

So we should probably define these macros to be:

  COND_INIT(c, m)
  COND_DESTROY(c, m)

see src/tsq.c for usage.

Does win32 require more info to create conditions/mutexes or would these
macros suffice?

[ I'll try to answer more in a separate thread ]

leo


Re: Threads, events, Win32, etc.

2004-11-16 Thread Gabe Schaffer
On Mon, 15 Nov 2004 12:57:00 +0100, Leopold Toetsch [EMAIL PROTECTED] wrote:
 Gabe Schaffer [EMAIL PROTECTED] wrote:
  * COND_WAIT takes a mutex because that's how pthreads works, but Win32
  condition variables (called events) are kernel objects that do not
  require any other object to be associated with them. I think this
  could be cleaned up with further abstraction.
 
 Not quite. COND_WAIT takes an opaque type defined by the platform, that
 happens to be a mutex for the pthreads based implementation.

It should, but it doesn't. Here's the definition:
#  define COND_WAIT(c,m) pthread_cond_wait(c, m)
It explicitly takes a condition and a mutex, while it should just be
passed a Parrot_cond (or something like that):

typedef struct {
#ifdef pthreads
    pthread_mutex_t m;
    pthread_cond_t  c;
#elif defined(Win32)
    HANDLE h;
#endif
} Parrot_cond;

  The big issue, though, is with the IO thread. On NT the IO is already
  async and there are no signals (Ctrl+C is handled with a callback), so
  each interpreter thread should just be able to handle all of this in
  the check_events functions.
 
 Not all. We need to do check_events() for e.g. message passing too.


   Win9x doesn't have async IO on files, so it still might
  require separate threads to do IOs.
 
 I'm not sure, if we even should support Win9{8,5}.

I'd be happy with simply implementing Win9x as a non-threaded
platform. Of course, hopefully nobody will even ask...

  Anyway, it seems to me that all this event/IO stuff needs
  significantly more abstraction in order to prevent it from becoming a
  hacked-up mess of #ifdefs.
 
 Yep. The system-specific stuff should be split into platform files. A
 common Parrot API then talks to platform code.
 
  ...However, I couldn't find any docs on this,
  so I just guessed how it all works based on the source.
 
 The current state of the implemented pthread model is summarized in
 docs/dev/events.pod.

Thanks, I didn't see that. My problem isn't with what the
implementation does, though -- it's that I don't understand the
rationale. I can understand why there would need to be a global event
thread (timers, GC, DoD), but why would passing a message from one
thread to another need to be serialized through a global event queue?

And as for IO, I see the obvious advantages of performing synchronous
IO functions in a separate thread to make them asynchronous, but that
sounds like the job of a worker thread pool. There are many ways to
implement this, but serializing them all through one queue sounds like
a bottleneck to me.

 Au contraire. Your analysis is precise. Do you like to take a shot at a
 Win32 threads/event model? So we could figure out the necessary
 splitting of API/implementation.

OK. I think I need to have a better understanding on what events
actually are, though. Who sends them? What do they mean? Which signals
do we actually care about? What are notifications? How will AIO
actually be handled? You know, that sort of thing... Maybe there
should be a PDD for it?

GNS


Re: Threads, events, Win32, etc.

2004-11-16 Thread Leopold Toetsch
Gabe Schaffer [EMAIL PROTECTED] wrote:
 On Mon, 15 Nov 2004 12:57:00 +0100, Leopold Toetsch [EMAIL PROTECTED] wrote:
 Gabe Schaffer [EMAIL PROTECTED] wrote:
  * COND_WAIT takes a mutex because that's how pthreads works, but Win32
  condition variables (called events) are kernel objects that do not
  require any other object to be associated with them. I think this
  could be cleaned up with further abstraction.

 Not quite. COND_WAIT takes an opaque type defined by the platform, that
 happens to be a mutex for the pthreads based implementation.

 It should, but it doesn't. Here's the definition:
 #  define COND_WAIT(c,m) pthread_cond_wait(c, m)

You are already in the POSIX specific part.

1) During configure parrot includes platform code from files located in
  config/gen/platform/*/

2) if a platform doesn't have an implementation the ../generic/
directory is used.

3) $ find config -name threads.h
config/gen/platform/generic/threads.h

So there is no win32/threads.h (yet ;)

If the implementation needs some additional libraries, the hints/* are
consulted e.g.

config/init/hints/linux.pl:$libs .= ' -lpthread';

 I'm not sure, if we even should support Win9{8,5}.

 I'd be happy with simply implementing Win9x as a non-threaded
 platform. Of course, hopefully nobody will even ask...

We'll see. But as Parrot's IO system is gonna be asynchronous in core, I
doubt that we'll support it.

 The current state of the implemented pthread model is summarized in
 docs/dev/events.pod.

 Thanks, I didn't see that. My problem isn't with what the
 implementation does, though -- it's that I don't understand the
 rationale. I can understand why there would need to be a global event
 thread (timers, GC, DoD), but why would passing a message from one
 thread to another need to be serialized through a global event queue?

The main reason for the global event queue isn't message passing. The
reason is POSIX signals. Basically you aren't allowed to do anything
serious in a signal handler, especially you aren't allowed to broadcast
a condition or something.
So I came up with that experimental code of one thread doing signals.

 And as for IO, I see the obvious advantages of performing synchronous
 IO functions in a separate thread to make them asynchronous, but that
 sounds like the job of a worker thread pool. There are many ways to
 implement this, but serializing them all through one queue sounds like
 a bottleneck to me.

Yes. The AIO library is doing that anyway i.e. utilizing a thread pool
for IO operations.

 Au contraire. Your analysis is precise. Do you like to take a shot at a
 Win32 threads/event model? So we could figure out the necessary
 splitting of API/implementation.

 OK. I think I need to have a better understanding on what events
 actually are, though. Who sends them? What do they mean? Which signals
 do we actually care about? What are notifications? How will AIO
 actually be handled? You know, that sort of thing... Maybe there
 should be a PDD for it?

Dan did post a series of documents to the list some time ago. Sorry I've
no exact subject, but with relevant keywords like events you should
find it.

 GNS

leo


Threads, events, Win32, etc.

2004-11-15 Thread Gabe Schaffer
I was just browsing the Parrot source, and noticed that the threading
implementation is a bit Unix/pthread-centric. For example:

* COND_WAIT takes a mutex because that's how pthreads works, but Win32
condition variables (called events) are kernel objects that do not
require any other object to be associated with them. I think this
could be cleaned up with further abstraction.

* CLEANUP_PUSH doesn't have any Win32 analog that I know of, although
it's not clear why this might be needed for Parrot anyway. Right now
it just looks like it's used to prevent threads from abandoning a
mutex, which isn't a problem with Win32.

The big issue, though, is with the IO thread. On NT the IO is already
async and there are no signals (Ctrl+C is handled with a callback), so
each interpreter thread should just be able to handle all of this in
the check_events functions. That is, AIO and timers allow you to
specify a completion callback (asynchronous procedure call) that gets
executed once you tell the OS that you're ready for them (e.g. via
Sleep), so the whole event dispatching system may not even be
necessary. Win9x doesn't have async IO on files, so it still might
require separate threads to do IOs.

Note that the Windows message queue does not really get involved here
(unless you want it to), as it is mainly for threads that have UIs or
use COM/DDE.

Anyway, it seems to me that all this event/IO stuff needs
significantly more abstraction in order to prevent it from becoming a
hacked-up mess of #ifdefs. However, I couldn't find any docs on this,
so I just guessed how it all works based on the source. Feel free to
whack me with a cluestick if I'm wrong about anything.

GNS


Re: Threads, events, Win32, etc.

2004-11-15 Thread Leopold Toetsch
Gabe Schaffer [EMAIL PROTECTED] wrote:
 I was just browsing the Parrot source, and noticed that the threading
 implementation is a bit Unix/pthread-centric. For example:

 * COND_WAIT takes a mutex because that's how pthreads works, but Win32
 condition variables (called events) are kernel objects that do not
 require any other object to be associated with them. I think this
 could be cleaned up with further abstraction.

Not quite. COND_WAIT takes an opaque type defined by the platform, that
happens to be a mutex for the pthreads based implementation.

 * CLEANUP_PUSH doesn't have any Win32 analog that I know of, although
 it's not clear why this might be needed for Parrot anyway. Right now
 it just looks like it's used to prevent threads from abandoning a
 mutex, which isn't a problem with Win32.

Yes. And it'll very likely go away. But anyway - it's a define by the
platform. So you can define it being a noop for win32.

 The big issue, though, is with the IO thread. On NT the IO is already
 async and there are no signals (Ctrl+C is handled with a callback), so
 each interpreter thread should just be able to handle all of this in
 the check_events functions.

Not all. We need to do check_events() for e.g. message passing too.

  Win9x doesn't have async IO on files, so it still might
 require separate threads to do IOs.

I'm not sure, if we even should support Win9{8,5}.

 Anyway, it seems to me that all this event/IO stuff needs
 significantly more abstraction in order to prevent it from becoming a
 hacked-up mess of #ifdefs.

Yep. The system-specific stuff should be split into platform files. A
common Parrot API then talks to platform code.

 ...However, I couldn't find any docs on this,
 so I just guessed how it all works based on the source.

The current state of the implemented pthread model is summarized in
docs/dev/events.pod.

 ... Feel free to
 whack me with a cluestick if I'm wrong about anything.

Au contraire. Your analysis is precise. Do you like to take a shot at a
Win32 threads/event model? So we could figure out the necessary
splitting of API/implementation.

 GNS

leo


Re: Threads, events, Win32, etc.

2004-11-15 Thread Dan Sugalski
At 12:57 PM +0100 11/15/04, Leopold Toetsch wrote:
Gabe Schaffer [EMAIL PROTECTED] wrote:
 I was just browsing the Parrot source, and noticed that the threading
 implementation is a bit Unix/pthread-centric. For example:

 * COND_WAIT takes a mutex because that's how pthreads works, but Win32
 condition variables (called events) are kernel objects that do not
 require any other object to be associated with them. I think this
 could be cleaned up with further abstraction.
Not quite. COND_WAIT takes an opaque type defined by the platform, that
happens to be a mutex for the pthreads based implementation.
Yep. This is important to note -- the joys of portability often mean 
that functions in the source carry parameters that might not actually 
get used. That's the case here, since POSIX threads (which the unices 
and VMS use for their threading model) requires a mutex. I fully 
expect we'll have similar bits carried around to accommodate Windows 
too.

  The big issue, though, is with the IO thread. On NT the IO is already
 async and there are no signals (Ctrl+C is handled with a callback), so
 each interpreter thread should just be able to handle all of this in
 the check_events functions.
Not all. We need to do check_events() for e.g. message passing too.
And notifications, and possibly cleanup of objects with finalizers.
   Win9x doesn't have async IO on files, so it still might
 require separate threads to do IOs.
I'm not sure, if we even should support Win9{8,5}.
Nope. Or, rather, we officially don't care if we run on Win9x/WinME. 
If we do, swell. If not, well...

Win9x isn't particularly special here. We feel the same about 
AmigaDOS, VMS 5.5, HP/UX 10.x, SunOS, Linux 1.x, and BeOS. Amongst 
others.

  Anyway, it seems to me that all this event/IO stuff needs
 significantly more abstraction in order to prevent it from becoming a
 hacked-up mess of #ifdefs.
Yep. The system-specific stuff should be split into platform files. A
common Parrot API then talks to platform code.
Yeah. The event stuff's definitely primitive, and not much thought's 
been given to it as of yet.
--
Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: [Proposal] JIT, exec core, threads, and architectures

2004-10-19 Thread Leopold Toetsch
Jeff Clites [EMAIL PROTECTED] wrote:
 On Oct 17, 2004, at 3:18 AM, Leopold Toetsch wrote:

Nevertheless we have to create managed objects (a Packfile PMC) so
 that we can recycle unused eval-segments.

 True, and some eval-segments are done as soon as they run (eval 3 +
 4), whereas others may result in code which needs to stay around (eval
 sub {}), and even in the latter case not _all_ of the code
 generated in the eval would need to stay around. It seems that it may
 be hard to determine what can be recycled, and when.

Well, not really. As long as you have a reference to the code piece,
it's alive.

 And we have to protect the packfile dictionary with mutexes, when this
 managing structure changes i.e. when new segments gets chained into
 this list or when they get destroyed.

 Yes, though it's not clear to me if all eval-segments will need to go
 into a globally-accessible dictionary. (e.g., it seems the 3 + 4 case
 above would not.)

It probably depends on the generated code. If this code creates globals
(e.g. Sub PMCs) it ought to stay around.

[ toss constant op variations ]

 For PIR yes, but the PASM assembler can't know for sure what register
 would be safe to use--the code could be using its own obscure calling
 conventions.

PASM would need rewriting to only use the available ops, basically.

 JEff

leo


Re: [Proposal] JIT, exec core, threads, and architectures

2004-10-19 Thread Jeff Clites
On Oct 19, 2004, at 1:56 AM, Leopold Toetsch wrote:
Jeff Clites [EMAIL PROTECTED] wrote:
On Oct 17, 2004, at 3:18 AM, Leopold Toetsch wrote:

Nevertheless we have to create managed objects (a Packfile PMC) so
that we can recycle unused eval-segments.

True, and some eval-segments are done as soon as they run (eval 3 +
4), whereas others may result in code which needs to stay around 
(eval
sub {}), and even in the latter case not _all_ of the code
generated in the eval would need to stay around. It seems that it may
be hard to determine what can be recycled, and when.
Well, not really. As long as you have a reference to the code piece,
it's alive.
Yes, that's what I meant. In the case of:
$sum = eval 3 + 4;
you don't have any such reference. In the case of:
$sub = eval sub { return 7 };
you do. In the case of:
$sub = eval 3 + 4; sub { return 7 };
you've got a reference to the sub still, but the 3 + 4 code is no 
longer reachable, so wouldn't need to stay around.

But it's possible that Parrot won't be able to tell the difference, and 
will have to keep around more than is necessary.

And we have to protect the packfile dictionary with mutexes, when 
this
managing structure changes i.e. when new segments gets chained into
this list or when they get destroyed.

Yes, though it's not clear to me if all eval-segments will need to go
into a globally-accessible dictionary. (e.g., it seems the 3 + 4 
case
above would not.)
It probably depends on the generated code. If this code creates globals
(e.g. Sub PMCs) it ought to stay around.
Yes, that's what I meant by not all--some yes, some no.
JEff


Re: [Proposal] JIT, exec core, threads, and architectures

2004-10-17 Thread Leopold Toetsch
Jeff Clites wrote:
On Oct 16, 2004, at 4:47 AM, Leopold Toetsch wrote:
String, number (and PMC) constants are all addressed in terms of the 
compiling interpreter.
...
When we do an eval() e.g. in a loop, we have to create a new constant 
table (and recycle it later, which is a different problem). Running 
such a compiled piece of code with different threads would currently 
do the wrong thing.

The correct constant table depends on the code segment, rather than the 
specific interpreter, right? 
Yes.
...That means that referencing the absolute 
address of the const table entry would be correct for JIT code no matter 
the executing thread, but getting the const table from the compiling 
interpreter is wrong if that interpreter isn't holding a reference to 
the corresponding code segment.
Thinking more about that I've to admit that my above conclusion is very 
likely wrong. When a piece of code gets compiled, we can say that we are 
creating a read-only data structure (code, constants, metadata). This 
data structure can be shared between different threads and using 
absolute addresses for the constants is ok.

Nevertheless we have to create managed objects (a Packfile PMC) so that 
we can recycle unused eval-segments. And we have to protect the packfile 
dictionary with mutexes, when this managing structure changes i.e. when 
new segments gets chained into this list or when they get destroyed.

Access to constants in the constant table is not only a problem for 
the JIT runcore, it's a lengthy operation for all code. For a string 
constant at PC[i]:

   interpreter->code->const_table->constants[PC[i]]->u.string
So for JIT and prederefed cores that's not a problem.
OTOH it might be better to just toss all the constant table access in 
all instructions, except:

   set_n_nc
   set_s_sc
   set_p_pc   # alias set_p_kc
This would reduce the interpreter size significantly (compare the size 
of core_ops_cgp.o with core_ops_cg.o).

Reducing the size is good, but this doesn't overall reduce the number of 
accesses to the constant table, just changes which op is doing them.
Not quite, e.g:
   set P0["foo"], "bar"
   set S0, P0["foo"]
are 3 accesses to the constant table. It would be
   set S1, "foo"
   set S2, "bar"
   set P0[S1], S2
   set S0, P0[S1]
with 2 accesses as long as there is no pressure on the register allocator.
The assembler could still allow all constant variations of opcodes and 
just translate it.
For this we'd need a special register to hold the loaded constant, so 
that we don't overwrite a register which is in use.
No, just the registers we have anyway,
JEff
leo


Re: [Proposal] JIT, exec core, threads, and architectures

2004-10-17 Thread Jeff Clites
On Oct 17, 2004, at 3:18 AM, Leopold Toetsch wrote:
Jeff Clites wrote:
On Oct 16, 2004, at 4:47 AM, Leopold Toetsch wrote:
Nevertheless we have to create managed objects (a Packfile PMC) so 
that we can recycle unused eval-segments.
True, and some eval-segments are done as soon as they run (eval 3 + 
4), whereas others may result in code which needs to stay around (eval 
sub {}), and even in the latter case not _all_ of the code 
generated in the eval would need to stay around. It seems that it may 
be hard to determine what can be recycled, and when.

And we have to protect the packfile dictionary with mutexes, when this 
managing structure changes i.e. when new segments gets chained into 
this list or when they get destroyed.
Yes, though it's not clear to me if all eval-segments will need to go 
into a globally-accessible dictionary. (e.g., it seems the 3 + 4 case 
above would not.)

OTOH it might be better to just toss all the constant table access 
in all instructions, except:

   set_n_nc
   set_s_sc
   set_p_pc   # alias set_p_kc
This would reduce the interpreter size significantly (compare the 
size of core_ops_cgp.o with core_ops_cg.o).
Reducing the size is good, but this doesn't overall reduce the number 
of accesses to the constant table, just changes which op is doing 
them.
Not quite, e.g:
   set P0["foo"], "bar"
   set S0, P0["foo"]
are 3 accesses to the constant table. It would be
   set S1, "foo"
   set S2, "bar"
   set P0[S1], S2
   set S0, P0[S1]
with 2 accesses as long as there is no pressure on the register 
allocator.
Sure, but you can do this optimization today--narrowing it down to just 
those 3 ops isn't required. But it only helps if local re-use of the 
same constants is frequent, and it may not be. (But still, it's a good 
optimization for a compiler to implement--it just may not have a huge 
effect.)

Also, there's some subtlety. This:
set S1, "foo"
set S2, "foo"
isn't the same as:
set S1, "foo"
set S2, S1
but rather:
set S1, "foo"
clone S2, S1
since 'set' copies in the s_sc case. (That's not a problem, just 
something to keep in mind.)

The assembler could still allow all constant variations of opcodes 
and just translate it.
For this we'd need a special register to hold the loaded constant, so 
that we don't overwrite a register which is in use.
No, just the registers we have anyway,
For PIR yes, but the PASM assembler can't know for sure what register 
would be safe to use--the code could be using its own obscure calling 
conventions.

JEff


Re: [Proposal] JIT, exec core, threads, and architectures

2004-10-16 Thread Leopold Toetsch
Jeff Clites [EMAIL PROTECTED] wrote:
 On Oct 14, 2004, at 12:10 PM, Leopold Toetsch wrote:

 Proposal:
 * we mandate that JIT code uses interpreter-relative addressing
 - because almost all platforms do it
 - because some platforms just can't do anything else
 - and of course to avoid re-JITting for every new thread

 FYI, the PPC JIT does already do parrot register addressing relative to
 the interpreter pointer, which as you said is already in a CPU
 register. This is actually less instructions than using absolute
 addressing would require (one rather than three).

Yes, and not only PPC, *all* but i386.

 We do still re-JIT for each thread on PPC, though we wouldn't have to
 (just never changed it to not).

Doing that or not depending on a specific JIT platform is error prone
and clutters the source code.

 ... But, we use this currently, because
 there is one issue with threads: With a thread, you don't start from
 the beginning of the JITted code segment,

This isn't a threading issue. We can always start execution in the
middle of one segment, e.g. after an exception. That's already handled
on almost all JIT platforms and no problem. The code emitted in
Parrot_jit_begin gets the C<cur_opcode *> as argument and has to branch
there, always.

 JEff

leo


Re: [Proposal] JIT, exec core, threads, and architectures

2004-10-16 Thread Leopold Toetsch
Jeff Clites wrote:
We do still re-JIT for each thread on PPC, though we wouldn't have to 
The real problem that all JIT architectures still have is a different 
one: it's called const_table and hidden either in the CONST macro or in 
syntax like NUM_CONST, which is translated by the jit2h.pl utility.

String, number (and PMC) constants are all addressed in terms of the 
compiling interpreter. Basically everywhere, where the exec code adds 
text relocations we aren't safe (e.g. load_nc in the PPC jit_emit code).

When we do an eval() e.g. in a loop, we have to create a new constant 
table (and recycle it later, which is a different problem). Running such 
a compiled piece of code with different threads would currently do the 
wrong thing.

Access to constants in the constant table is not only a problem for the 
JIT runcore, it's a lengthy operation for all code. For a string constant 
at PC[i]:

   interpreter->code->const_table->constants[PC[i]]->u.string

These are 3 indirections to get at the constants pointer array, and 
worse, they depend on each other; emitting these 3 instructions on an 
i386 stalls for 1 cycle twice (but the compiler is clever and 
interleaves other instructions).

For the JIT core, we can precalculate the location of the constants 
array and store it in the stack or even in a register (on not so 
register-crippled machines like i386). It only needs reloading when an 
C<invoke> statement is emitted.
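
The difference is easy to see in C (an illustrative sketch only; the
struct and field names below are simplified stand-ins for the real
Parrot headers):

#include <stddef.h>

typedef struct { const char *u_string; } PConst;
typedef struct { PConst **constants;   } ConstTable;
typedef struct { ConstTable *const_table; } CodeSeg;
typedef struct { CodeSeg *code; } Interp;

/* Per-access path: three dependent pointer loads before the element load. */
static const char *get_sconst_slow(Interp *interp, size_t idx)
{
    return interp->code->const_table->constants[idx]->u_string;
}

/* Hoisted path: cache the constants array once per code segment (e.g. on
 * entering JITted code) and reload it only after an invoke-style transfer. */
static const char *get_sconst_cached(PConst **constants, size_t idx)
{
    return constants[idx]->u_string;
}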

OTOH it might be better to just toss all the constant table access in 
all instructions, except:

   set_n_nc
   set_s_sc
   set_p_pc   # alias set_p_kc
This would reduce the interpreter size significantly (compare the size 
of core_ops_cgp.o with core_ops_cg.o). The assembler could still allow 
all constant variations of opcodes and just translate it.

leo


Re: [Proposal] JIT, exec core, threads, and architectures

2004-10-16 Thread Jeff Clites
On Oct 16, 2004, at 12:26 AM, Leopold Toetsch wrote:
Jeff Clites [EMAIL PROTECTED] wrote:
... But, we use this currently, because
there is one issue with threads: With a thread, you don't start from
the beginning of the JITted code segment,
This isn't a threading issue. We can always start execution in the
middle of one segment, e.g. after an exception. That's already handled
on almost all JIT platforms and no problem. The code emitted in
Parrot_jit_begin gets the C<cur_opcode *> as argument and has to branch
there, always.
I was remembering wrong--we do this on PPC too.
On Oct 16, 2004, at 4:47 AM, Leopold Toetsch wrote:
String, number (and PMC) constants are all addressed in terms of the 
compiling interpreter.
...
When we do an eval() e.g. in a loop, we have to create a new constant 
table (and recycle it later, which is a different problem). Running 
such a compiled piece of code with different threads would currently 
do the wrong thing.
The correct constant table depends on the code segment, rather than the 
specific interpreter, right? That means that referencing the absolute 
address of the const table entry would be correct for JIT code no 
matter the executing thread, but getting the const table from the 
compiling interpreter is wrong if that interpreter isn't holding a 
reference to the corresponding code segment.

Access to constants in the constant table is not only a problem for 
the JIT runcore, it's a lengthy operation for all code. For a string 
constant at PC[i]:

   interpreter->code->const_table->constants[PC[i]]->u.string

These are 3 indirections to get at the constants pointer array, and 
worse, they depend on each other; emitting these 3 instructions on an 
i386 stalls for 1 cycle twice (but the compiler is clever and 
interleaves other instructions).

For the JIT core, we can precalculate the location of the constants 
array and store it in the stack or even in a register (on not so 
register-crippled machines like i386). It only needs reloading when 
an C<invoke> statement is emitted.
For PPC JIT, it seems that we are putting in the address of the 
specific const table entry, as an immediate.

OTOH it might be better to just toss all the constant table access in 
all instructions, except:

   set_n_nc
   set_s_sc
   set_p_pc   # alias set_p_kc
This would reduce the interpreter size significantly (compare the size 
of core_ops_cgp.o with core_ops_cg.o).
Reducing the size is good, but this doesn't overall reduce the number 
of accesses to the constant table, just changes which op is doing them.

The assembler could still allow all constant variations of opcodes and 
just translate it.
For this we'd need a special register to hold the loaded constant, so 
that we don't overwrite a register which is in use.

JEff


Re: [Proposal] JIT, exec core, threads, and architectures

2004-10-15 Thread Jeff Clites
On Oct 14, 2004, at 12:10 PM, Leopold Toetsch wrote:
Proposal:
* we mandate that JIT code uses interpreter-relative addressing
- because almost all platforms do it
- because some platforms just can't do anything else
- and of course to avoid re-JITting for every new thread
FYI, the PPC JIT does already do parrot register addressing relative to 
the interpreter pointer, which as you said is already in a CPU 
register. This is actually less instructions than using absolute 
addressing would require (one rather than three).

We do still re-JIT for each thread on PPC, though we wouldn't have to 
(just never changed it to not). But, we use this currently, because 
there is one issue with threads: With a thread, you don't start from 
the beginning of the JITted code segment, but rather you need to 
start with a specific Parrot function call, somewhere in the middle. 
But you can't just jump to that instruction, because it would not have 
the setup code needed when entering the JITted section. So currently, 
we use a technique whereby the beginning of the JITted section has, 
right after the setup code, a jump to the correct starting address--in 
the main thread case, this is just a jump to the next instruction 
(essentially a noop), but in the thread case, it's a jump to the 
function which the thread is going to run. So right now the JITted code 
for a secondary thread differs by one instruction from that for the 
main thread. We'll need to work out a different mechanism for handling 
this--probably just a tiny separate JITted section to set things up for 
a secondary thread, before doing an inter-section jump to the right 
place.
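
Expressed in C rather than in emitted machine code (a conceptual sketch
only, with made-up names), the setup-then-jump arrangement amounts to:

typedef void (*jit_native_fn)(void);

/* Run the common setup once, then branch to the native address that
 * corresponds to the requested starting opcode -- the segment start for
 * the main thread, a Sub's entry point for a secondary thread. */
static void jit_enter(jit_native_fn *op_to_native, unsigned long start_op)
{
    /* ... common prologue: save registers, load interpreter base ... */
    op_to_native[start_op]();
}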

JEff


[Proposal] JIT, exec core, threads, and architectures

2004-10-14 Thread Leopold Toetsch
First some facts:
- all JIT platforms *except* i386 have a register reserved for the
  runtime interpreter
- Parrot register addressing is done relative to that CPU register
- that would allow to reuse the JITted code for different threads
  aka interpreters
- but because of i386 is using absolute addresses that's not done
- the latter point also is the cause for zillions of text relocations
  currently needed for the EXEC/i386 run core. I can't imagine that
  this will speed up program start :)
Proposal:
* we mandate that JIT code uses interpreter-relative addressing
- because almost all platforms do it
- because some platforms just can't do anything else
- and of course to avoid re-JITting for every new thread
* src/jit.c calls some platform interface functions, which copy between
  Parrot registers and CPU registers. These are called at the begin
  and end of JITted code sections and are now based on absolute
  memory addresses.
  This should be changed to use offsets. To accommodate platforms that
  have the interpreter cached in a CPU register, I'm thinking of the
  following interface:
  MACRO int Parrot_jit_emit_get_base_reg_no(jit_info *pc)
// possibly emit code  and return register number of base pointer
// this register should be one of the scratch registers
// this must be a macro used as C<reg = foo(pc);>
  Parrot_jit_emit_mov_MR(,..int base_reg_no, size_t offset, int src_reg_no)
  ... other 3 move functions similar
  Register addressing is done relative to the base pointer, which currently
  is REG_INT(0) or the interpreter, but that might change.
  The code to load this register will be emitted just before the actual
  register moves are done. It's currently a noop for all but i386, which
  would need one instruction mov 16(%ebp), %eax
* Currently all platforms are using homegrown defines to calculate the
  register offset, some of these are even readable ;)
  To get rid of that, we should provide a set of macros that calculate
  the register offset relative to the base pointer (see the sketch below).
 REG_OFFS_INT(x)   // get offset for INTVAL reg no x
 REG_OFFS_STR(x)
 ...
* Implementation
  First the framework should be implemented.
  To allow some transition time the old register move semantics remain
  for some time. Depending on the defined()ness of
  C<Parrot_jit_emit_get_base_reg_no> the new code will be used.
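
A minimal sketch of such offset macros and moves, with a simplified
interpreter layout (the real macros would of course use the actual
structures):

#include <stddef.h>

#define NUM_REGISTERS 32
typedef long   INTVAL;
typedef double FLOATVAL;

typedef struct {
    INTVAL   int_reg[NUM_REGISTERS];
    FLOATVAL num_reg[NUM_REGISTERS];
    /* ... strings, PMCs, and the rest of the interpreter ... */
} Interp;

/* Offsets of Parrot registers relative to the base pointer (here the
 * interpreter itself), so JIT code emits "[base + offset]" accesses
 * instead of absolute addresses and can be shared between threads. */
#define REG_OFFS_INT(x)  (offsetof(Interp, int_reg) + (x) * sizeof(INTVAL))
#define REG_OFFS_NUM(x)  (offsetof(Interp, num_reg) + (x) * sizeof(FLOATVAL))

/* What an emitted register-to-memory move boils down to, shown in C: */
static void store_int_reg(char *base, size_t offs, INTVAL value)
{
    *(INTVAL *)(base + offs) = value;
}
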
Comments welcome
leo


Update to Threads/IO issue on Cygwin

2004-10-06 Thread Joshua Gatcomb
Since all the tests were passing in the past, I
decided to play the CVS game to find exactly when/what
changed.

Good news is - nothing to do with Parrot

Bad news is - it means it was an upgrade to Cygwin,
which I also do on a daily basis.  I have no way of
tracking down what changed but I could ping the Cygwin
list if anyone thinks it might help.

Cheers,
Joshua Gatcomb
a.k.a. Limbic~Region





Re: Update to Threads/IO issue on Cygwin

2004-10-06 Thread Leopold Toetsch
Joshua Gatcomb [EMAIL PROTECTED] wrote:
 Since all the tests were passing in the past, I
 decided to play the CVS game to find exactly when/what
 changed.

 Good news is - nothing to do with Parrot

Good, thanks for taking the time to investigate that.

 Bad news is - it means it was an upgrade to Cygwin,
 which I also do on a daily basis.

Well, there are probably releases which get used most. A daily update
always has some risk,

 Cheers,
 Joshua Gatcomb

leo


Another Update to threads/IO problem on Cygwin

2004-10-06 Thread Joshua Gatcomb
I happened to have found the last cygwin1.dll lying
around in /tmp that I kept as a backup.  I swapped it
with the current cygwin1.dll just to see if it would
make the IO problem go away and much to my happy
surprise - it did.

Details:
cygwin1.dll-1.5.10-3 - previous stable build, works
great
cygwin1.dll-1.5.11-1 - current stable build, blows up

I will be pinging the Cygwin list momentarily to see
if they have any insight.

Cheers
Joshua Gatcomb
a.k.a. Limbic~Region





Re: Another Update to threads/IO problem on Cygwin

2004-10-06 Thread Joshua Gatcomb
--- Joshua Gatcomb [EMAIL PROTECTED]
wrote:

 I happened to have found the last cygwin1.dll lying
 around in /tmp that I kept as a backup.  I swapped
 it
 with the current cygwin1.dll just to see if it would
 make the IO problem go away and much to my happy
 surprise - it did.
 
 Details:
 cygwin1.dll-1.5.10-3 - previous stable build, works
 great
 cygwin1.dll-1.5.11-1 - current stable build, blows
 up
 
 I will be pinging the Cygwin list momentarily to see
 if they have any insight.

I didn't get a response from the Cygwin list, but I
asked one of the Cygwin knowledgeable monks at the
Monastery (http://www.perlmonks.org).  They indicated
there was a major problem with 1.5.11-1 with threads
losing output (my problem exactly) and that it was
corrected with one of the latest snapshots
(http://cygwin.com/snapshots/).  I downloaded it and
tried it - everything is working great.

make test  - all tests pass
make testj - all tests pass

Both work even with some aggressive optimizations
passed to Configure.pl

All is once again right in the world ;-)

Cheers
Joshua Gatcomb
a.k.a. Limbic~Region




Re: Threads on Cygwin

2004-10-03 Thread Leopold Toetsch
Joshua Gatcomb [EMAIL PROTECTED] wrote:

[ Cygwin thread tests don't print all ]

Does this patch help? It creates shared IO resources. But its of course
not final: there are no precautions against one thread changing the PIO
of another thread or such, no locks yet, nothing.

leo

--- parrot/src/inter_create.c   Fri Oct  1 15:26:26 2004
+++ parrot-leo/src/inter_create.c   Sat Oct  2 12:06:10 2004
@@ -31,6 +31,12 @@
 #define ATEXIT_DESTROY

 /*
+ * experimental: use shared IO resources for threads
+ */
+
+#define PARROT_SHARED_IO 1
+
+/*

=item C<static int is_env_var_set(const char* var)>

@@ -125,7 +131,15 @@

 /* PANIC will fail until this is done */
 SET_NULL(interpreter-piodata);
+#if PARROT_SHARED_IO
+if (parent) {
+interpreter->piodata = parent->piodata;
+}
+else
+PIO_init(interpreter);
+#else
 PIO_init(interpreter);
+#endif

 if (is_env_var_set(PARROT_GC_DEBUG)) {
 #if ! DISABLE_GC_DEBUG
@@ -225,6 +239,9 @@
 setup_default_compreg(interpreter);

 /* setup stdio PMCs */
+#if PARROT_SHARED_IO
+if (!parent)
+#endif
 PIO_init(interpreter);

 /* Done. Return and be done with it */
@@ -330,6 +347,9 @@
  */

 /* Now the PIOData gets also cleared */
+#if PARROT_SHARED_IO
+if (!interpreter->parent_interpreter)
+#endif
 PIO_finish(interpreter);

 /*


Re: Threads on Cygwin

2004-10-03 Thread Jens Rieks
On Saturday 02 October 2004 12:49, Leopold Toetsch wrote:
 Does this patch help?
No, it makes things worse:

--- without-patch.txt   2004-10-03 14:35:58.824775096 +0200
+++ with-patch.txt  2004-10-03 14:35:37.843964664 +0200
@@ -30,7 +30,12 @@
 # expected: '500500
 # 500500
 # '
-ok 6 - detach
+not ok 6 - detach
+# Failed test (t/pmc/threads.t at line 257)
+#   'thread
+# '
+# doesn't match '/(done\nthread\n)|(thread\ndone\n)/
+# '
 not ok 7 - share a PMC
 # Failed test (t/pmc/threads.t at line 285)
 #  got: 'thread
@@ -73,4 +78,4 @@
 # '
 ok 10 # skip no shared PerlStrings yet
 ok 11 # skip no shared PerlStrings yet
-# Looks like you failed 6 tests of 11.
+# Looks like you failed 7 tests of 11.







with-patch.txt:
$ perl -Ilib t/pmc/threads.t
1..11
ok 1 - interp identity
not ok 2 - thread type 1
# Failed test (t/pmc/threads.t at line 61)
#  got: 'thread 1
# '
# expected: 'thread 1
# main 10
# '
not ok 3 - thread type 2
# Failed test (t/pmc/threads.t at line 98)
#  got: 'ok 1
# ok 2
# hello from 1 thread
# ParrotThread tid 1
# Sub
# '
# expected: 'ok 1
# ok 2
# hello from 1 thread
# ParrotThread tid 1
# Sub
# from 10 interp
# '
ok 4 - thread - kill
not ok 5 - join, get retval
# Failed test (t/pmc/threads.t at line 189)
#  got: ''
# expected: '500500
# 500500
# '
not ok 6 - detach
# Failed test (t/pmc/threads.t at line 257)
#   'thread
# '
# doesn't match '/(done\nthread\n)|(thread\ndone\n)/
# '
not ok 7 - share a PMC
# Failed test (t/pmc/threads.t at line 285)
#  got: 'thread
# 20
# '
# expected: 'thread
# 20
# done
# 21
# '
not ok 8 - multi-threaded
# Failed test (t/pmc/threads.t at line 320)
#  got: '3
# 1
# 2
# 3
# done thread
# '
# expected: '3
# 1
# 2
# 3
# done thread
# done main
# '
not ok 9 - multi-threaded strings via SharedRef
# Failed test (t/pmc/threads.t at line 368)
#  got: '3
# ok 1
# ok 2
# ok 3
# done thread
# '
# expected: '3
# ok 1
# ok 2
# ok 3
# done thread
# done main
# '
ok 10 # skip no shared PerlStrings yet
ok 11 # skip no shared PerlStrings yet
# Looks like you failed 7 tests of 11.










without-patch.txt:
$ perl -Ilib t/pmc/threads.t
1..11
ok 1 - interp identity
not ok 2 - thread type 1
# Failed test (t/pmc/threads.t at line 61)
#  got: 'thread 1
# '
# expected: 'thread 1
# main 10
# '
not ok 3 - thread type 2
# Failed test (t/pmc/threads.t at line 98)
#  got: 'ok 1
# ok 2
# hello from 1 thread
# ParrotThread tid 1
# Sub
# '
# expected: 'ok 1
# ok 2
# hello from 1 thread
# ParrotThread tid 1
# Sub
# from 10 interp
# '
ok 4 - thread - kill
not ok 5 - join, get retval
# Failed test (t/pmc/threads.t at line 189)
#  got: ''
# expected: '500500
# 500500
# '
ok 6 - detach
not ok 7 - share a PMC
# Failed test (t/pmc/threads.t at line 285)
#  got: 'thread
# 20
# '
# expected: 'thread
# 20
# done
# 21
# '
not ok 8 - multi-threaded
# Failed test (t/pmc/threads.t at line 320)
#  got: '3
# 1
# 2
# 3
# done thread
# '
# expected: '3
# 1
# 2
# 3
# done thread
# done main
# '
not ok 9 - multi-threaded strings via SharedRef
# Failed test (t/pmc/threads.t at line 368)
#  got: '3
# ok 1
# ok 2
# ok 3
# done thread
# '
# expected: '3
# ok 1
# ok 2
# ok 3
# done thread
# done main
# '
ok 10 # skip no shared PerlStrings yet
ok 11 # skip no shared PerlStrings yet
# Looks like you failed 6 tests of 11.


jens


Re: Threads on Cygwin

2004-10-03 Thread Joshua Gatcomb

--- Jens Rieks [EMAIL PROTECTED] wrote:

 On Saturday 02 October 2004 12:49, Leopold Toetsch
 wrote:
  Does this patch help?
 No, it makes things worse:

Actually it doesn't.  There is something wrong with
threads_6.pasm as my output for the test doesn't
change with or without the patch and yet one passes
and the other doesn't.  Judging from the actual code
it is supposed to print done in there somewhere and
it doesn't.  Test 6 is one of the few that has a regex
for checking output :

/(done\nthread\n)|(thread\ndone\n)/

So I am rather confused as to why it is passing
without the patch since it only ever prints thread

 jens
Joshua Gatcomb
a.k.a. Limbic~Region





Re: Threads on Cygwin

2004-10-02 Thread Leopold Toetsch
Joshua Gatcomb [EMAIL PROTECTED] wrote:

 PIO_OS_UNIX is the one defined and now parrot squawks
 "Polly wanna Unix" every time I run it ;-)

 Now what?

Fix the thread related IO bug? Seriously, I don't know yet, if the IO
initialization is done correctly for threads. Currently each thread has
its own IO subsystem, which might be wrong.

It could be that the IO PMCs for standard handles (or for all open
files?) have to be shared between threads.

leo


Re: Threads on Cygwin

2004-10-01 Thread Leopold Toetsch
Joshua Gatcomb [EMAIL PROTECTED] wrote:
 ... only 1 of the
 two messages is displayed

I've fixed a flaw in the IO flush code. Please try again, thanks.

leo


Re: Threads on Cygwin

2004-10-01 Thread Leopold Toetsch
Joshua Gatcomb [EMAIL PROTECTED] wrote:

 I agree, but that doesn't explain why only 1 of the
 two messages is displayed to the screen when the sleep
 statement is present.

Overlooked that in the first place. So what you get is that the one *or*
the other string is displayed. That's a serious problem, likely related
to the IO subsystem.
So the first question is: which IO system is active on Cygwin: the
windows or the unix variant?

But looking at the source code it seems that the IO system shutdown code
is rather broken: only stdout and stderr streams are flushed but not the
actual PIOs. I'll try to fix that.

 Joshua Gatcomb

leo


Re: Threads on Cygwin

2004-10-01 Thread Joshua Gatcomb
--- Leopold Toetsch [EMAIL PROTECTED] wrote:

 Joshua Gatcomb [EMAIL PROTECTED] wrote:
  ... only 1 of the
  two messages is displayed
 
 I've fixed a flaw in the IO flush code. Please try
 again, thanks.

Still not working, but thanks!  The behavior has
changed a bit though.

Here is the behavior prior to the fix - notice the
location of the sleep statement

Case 1:
(as checked out)
$ cat t/pmc/threads_2.pasm  snipped
set I3, 1
invoke  # start the thread

sleep 1
print main 
print I5

$ ./parrot t/pmc/threads_2.pasm
thread 1

Case 2:
(remove sleep all together)

$ ./parrot t/pmc/threads_2.pasm
main 10
thread 1

Case 3:
$ cat t/pmc/threads_2.pasm snipped
invoke  # start the thread

print main 
sleep 1
print I5

$ ./parrot t/pmc/threads_2.pasm
main 10

After the change - case 3 now prints thread 1.

You mentioned in the previous email that you were
interested in knowing if this was Windows IO or the
Cygwin variant.  I would love to give you that
information, but color me clueless.
 
 leo
 
Joshua Gatcomb
a.k.a. Limbic~Region






Re: Threads on Cygwin

2004-10-01 Thread Leopold Toetsch
Joshua Gatcomb [EMAIL PROTECTED] wrote:

 After the change - case 3 now prints thread 1.

Strange.

 You mentioned in the previous email that you were
 interested in knowing if this was Windows IO or the
 Cygwin variant.  I would love to give you that
 information, but color me clueless.

S/Cygwin/unix/

Have a look at the defines in include/parrot/io.h. It's not quite
visible which one is active w/o debugger, but you could insert some
print statements in io/io.c:PIO_init_stacks(), where there are explicit
cases for PIO_OS_*.

 Joshua Gatcomb

leo


Re: Threads on Cygwin

2004-10-01 Thread Joshua Gatcomb
--- Leopold Toetsch [EMAIL PROTECTED] wrote:

 Joshua Gatcomb [EMAIL PROTECTED] wrote:
 
  After the change - case 3 now prints thread 1.
 
 Strange.

indeed
 
  You mentioned in the previous email that you were
  interested in knowing if this was Windows IO or
 the
  Cygwin variant.  I would love to give you that
  information, but color me clueless.
 
 S/Cygwin/unix/
 
 Have a look at the defines in include/parrot/io.h.
 It's not quite
 visible which one is active w/o debugger, but you
 could insert some
 print statements in io/io.c:PIO_init_stacks(), where
 there are explicit
 cases for PIO_OS_*.

s/S/s/

PIO_OS_UNIX is the one defined and now parrot squawks
"Polly wanna Unix" every time I run it ;-)

Now what?

 
  Joshua Gatcomb
 
 leo

Joshua Gatcomb
a.k.a. Limbic~Region





Re: Threads on Cygwin

2004-09-30 Thread Leopold Toetsch
Joshua Gatcomb [EMAIL PROTECTED] wrote:
 Up until a couple of weeks ago, all the threads tests
 were passing on Cygwin.  I had submitted a patch some
 time ago enabling tests for threads, timer, and
 extend_13 that never got applied.
 I figured there was good reason ...

Overlooked? Please rediff and resend.

 It says at the bottom that the output could appear in
 reversed order and so I am guessing the sleep
 statement is to ensure that it comes out in the proper
 order.

The sleep is of course a hack only and wrong. The real thing to do is
to convert the test result into a regexp that allows both orderings.

leo


Re: Threads on Cygwin

2004-09-30 Thread Joshua Gatcomb
--- Leopold Toetsch [EMAIL PROTECTED] wrote:

 Joshua Gatcomb [EMAIL PROTECTED]
wrote:
 I had submitted a patch some time ago that never
got
 applied enabling tests for threads, timer, and
 extend_13.

 Overlooked? Please rediff and resend.

I will do - likely tomorrow.

  It says at the bottom that the output could appear
  in reversed order and so I am guessing the sleep
  statement is to ensure that it comes out in the
  proper order.
 
 The sleep is of course a hack only and wrong. The
 real thing to do is to convert the test result into a
 regexp that allows both orderings.

I agree, but that doesn't explain why only 1 of the
two messages is displayed to the screen when the sleep
statement is present.  I don't want to brush a bug
under the rug.  If one thread finishes before the
other thread gets to a print statement, the print does
not appear on the screen at all.
 
 leo

Joshua Gatcomb
a.k.a. Limbic~Region 






Threads on Cygwin

2004-09-29 Thread Joshua Gatcomb
Up until a couple of weeks ago, all the threads tests
were passing on Cygwin.  I had submitted a patch some
time ago enabling tests for threads, timer, and
extend_13 that never got applied. 
I figured there was good reason so I didn't say
anything about the tests failing except an occasional
that's weird on #parrot.

So today I decide to look at threads_2.pasm

It says at the bottom that the output could appear in
reversed order and so I am guessing the sleep
statement is to ensure that it comes out in the proper
order.

So - why is the test failing?  Because the second
print statement never makes it to the screen.

If I remove the print statement entirely, I see both
things in the reverse expected order.

If I place the sleep statement after the main thread
print, then all I get to the screen is that print and not
the print statement from thread 1

It is almost as if by the time the second
print happens, the filehandle is already closed


So - since threads aren't officially supposed to be
working on Cygwin - is this something I should care
about or not?

Cheers
Joshua Gatcomb
a.k.a. Limbic~Region





Re: Threads on Cygwin

2004-09-29 Thread Joshua Gatcomb
--- Joshua Gatcomb [EMAIL PROTECTED]
wrote:

 Up until a couple of weeks ago, all the threads tests
 were passing on Cygwin.  I had submitted a patch some
 time ago enabling tests for threads, timer, and
 extend_13 that never got applied. 
 I figured there was good reason so I didn't say
 anything about the tests failing except an
 occasional
 that's weird on #parrot.
 
 So today I decide to look at threads_2.pasm
 
 It says at the bottom that the output could appear
 in
 reversed order and so I am guessing the sleep
 statement is to ensure that it comes out in the
 proper
 order.
 
 So - why is the test failing?  Because the second
 print statement never makes it to the screen.
 
 If I remove the print statement entirely, I see both
 things in the reverse expected order.
 
 If I place the sleep statement after the main thread
 print, then all I get to the screen is that print and
 not the print statement from thread 1
 
 It is almost as if by the time the second
 print happens, the filehandle is already closed
 
 
 So - since threads aren't officially supposed to be
 working on Cygwin - is this something I should care
 about or not?
 
 Cheers
 Joshua Gatcomb
 a.k.a. Limbic~Region
 

In summary, all code in all threads runs to completion
but whichever thread finishes last can't print to the
screen

$ perl t/harness --gc-debug --running-make-test -b
t/pmc/threads.t
Failed 7/11 tests, 36.36% okay (less 2 skipped
tests: 2 okay, 18.18%)
Failed Test Stat Wstat Total Fail  Failed  List of
Failed
---
t/pmc/threads.t7  1792117  63.64%  2-3 5-9
2 subtests skipped.
Failed 1/1 test scripts, 0.00% okay. 7/11 subtests
failed, 36.36% okay.






[perl #31651] [TODO] Win32 - Threads, Events, Signals, Sockets

2004-09-20 Thread via RT
# New Ticket Created by  Will Coleda 
# Please include the string:  [perl #31651]
# in the subject line of all future correspondence about this issue. 
# URL: http://rt.perl.org:80/rt3/Ticket/Display.html?id=31651 


(No details. This comes from TODO.win32)


As the world stops: of GC and threads

2004-02-07 Thread Gordon Henriksen
So, I haven't heard any convincing evidence that execution in other 
threads can continue while garbage collection is executing, copying 
collector or not. (Point of fact, the copying collector has nothing to 
do with it.) So what are the options to stop the world? I've heard the 
first 2 of these from Dan and Leo. The third is mine. Any others?

* Taking the read half of a reader-writer lock for all pointer mutations
The point here is to block all mutators from mutating the object graph 
while the collector is traversing it. I can't imagine this being good 
for performance. This is in addition to the mutex on the PMC. Other 
threads might be scheduled before GC completes, but don't have to be for 
GC to proceed.

* OS pause thread support
Lots of operating systems (including at least several versions of Mac OS 
X) don't support this. Maybe parrot can use it when it is available. 
It's always a dangerous proposition, though; the paused thread might be 
holding the system's malloc() mutex or something similarly evil. (This 
evil is why platforms don't always support the construct.) I don't think 
this is feasible for portability reasons. In this case, other threads 
would not be scheduled before GC completes.

* Use events
This is my proposition: Have the garbage collector broadcast a STOP! 
event to every other parrot thread, and then wait for them all to 
rendezvous that they've stopped before GC proceeds. Hold a lock on the 
thread mutex while doing this, so threads won't start or finish. This 
has the worst latency characteristics of the three: All other threads 
would need to be scheduled before GC can begin. Corner cases exist: NCI 
calls could be long-running and shouldn't block parrot until completion; 
another thread might try to allocate memory before checking events. 
Neither is insurmountable.
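
For what it's worth, the rendezvous itself is easy to sketch with pthreads
(illustrative only; this is not Parrot's event system, and the names are
made up):

#include <pthread.h>

static pthread_mutex_t gc_lock    = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  gc_stopped = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  gc_resume  = PTHREAD_COND_INITIALIZER;
static int stop_requested = 0;
static int num_threads    = 0;   /* registered mutator threads */
static int num_stopped    = 0;

/* Called by each mutator thread from its periodic event check. */
void gc_check_stop(void)
{
    pthread_mutex_lock(&gc_lock);
    if (stop_requested) {
        ++num_stopped;
        pthread_cond_signal(&gc_stopped);
        while (stop_requested)
            pthread_cond_wait(&gc_resume, &gc_lock);
        --num_stopped;
    }
    pthread_mutex_unlock(&gc_lock);
}

/* Called by the thread that wants to collect. */
void gc_stop_the_world(void)
{
    pthread_mutex_lock(&gc_lock);
    stop_requested = 1;
    while (num_stopped < num_threads)        /* wait for the rendezvous */
        pthread_cond_wait(&gc_stopped, &gc_lock);

    /* ... run the collection here: no mutator is running ... */

    stop_requested = 0;
    pthread_cond_broadcast(&gc_resume);
    pthread_mutex_unlock(&gc_lock);
}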



Gordon Henriksen
[EMAIL PROTECTED]


Re: Threads... last call

2004-02-02 Thread Dan Sugalski
At 1:27 AM -0500 1/30/04, Gordon Henriksen wrote:
On Thursday, January 29, 2004, at 11:55 , Melvin Smith wrote:

At 11:45 PM 1/28/2004 -0500, Gordon Henriksen wrote:

On Wednesday, January 28, 2004, at 12:53 , Melvin Smith wrote:

At 12:27 PM 1/23/2004 -0800, Damien Neil wrote:

Java Collections are a standard Java library of common data 
structures such as arrays and hashes.  Collections are not 
synchronized; access involves no locks at all.  Multiple threads 
accessing the same collection at the same time cannot, however, 
result in the virtual machine crashing.  (They can result in 
data structure corruption, but this corruption is limited to 
surprising results rather than VM crash.)
But this accomplishes nothing useful and still means the data 
structure is not re-entrant, nor is it corruption resistant, 
regardless of how we judge it.
It does accomplish something very useful indeed: It avoids the 
overhead of automatic locking when it isn't necessary. When *is* 
that locking necessary? To a second order approximation, 
***NEVER.***
Pardon me but I've apparently lost track of context here.

Elizabeth Mattijsen [EMAIL PROTECTED], Leopold Toetsch 
[EMAIL PROTECTED], [EMAIL PROTECTED] I thought we were 
discussing correct behavior of a shared data structure, not general 
cases. Or maybe this is the general case and I should go read more 
backlog? :)
A shared data structure, as per Dan's document? It's a somewhat 
novel approach, trying to avoid locking overhead with dynamic 
dispatch and vtable swizzling. I'm discussing somewhat more 
traditional technologies, which simply allow an object to perform 
equally correctly and with no differentiation between shared and 
unshared cases. In essence, I'm arguing that a shared case isn't 
necessary for some data structures in the first place.
Oh, absolutely.

One thing that was made very clear from the (arguably failed) perl 5 
experiments with threads is that trying to have the threaded and 
unthreaded code paths be merged was an exercise in extreme pain and 
code maintenance hell.

Allowing for vtable fiddling's a way to get around that problem by 
isolating the threaded and nonthreaded paths. It also allows us to 
selectively avoid threaded paths on a per-class per-method basis--on 
those classes that don't permit morph (and it is *not* a requirement 
that pmcs support it)

For those data types that don't require synchronization on some (or 
all) paths, the locking method can be a noop. Not free, as it still 
has to be called (unfortunately) but as cheap as possible. If someone 
wants to make a good case for it I'm even OK with having a NULL be 
valid for the lock vtable method and checking for non-NULLness in the 
lock before calling if anyone wants to try their hand at 
benchmarking. (I could see it going either way-- if(!NULL) or an 
empty sub being faster)

Still the No crash VM requirement is a bit of a killer, though. It 
will definitely impact performance, and there's no way around that 
that I know of. I've a few ideas to minimize the locking needs, 
though, and I'll try and get an addendum to the design doc out soon.
--
Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: More on threads

2004-01-30 Thread Dan Sugalski
At 10:50 AM -0500 1/24/04, Gordon Henriksen wrote:
On Saturday, January 24, 2004, at 09:23 , Leopold Toetsch wrote:

Gordon Henriksen [EMAIL PROTECTED] wrote:

... Best example: morph. morph must die.
Morph is necessary. But please note: morph changes the vtable of 
the PMC to point to the new data types table. It has nothing to do 
with a typed union.
The vtable IS the discriminator. I'm referring to this:

typedef union UnionVal {
struct {/* Buffers structure */
void * bufstart;
size_t buflen;
} b;
struct {/* PMC unionval members */
DPOINTER* _struct_val;   /* two ptrs, both are defines */
PMC* _pmc_val;
} ptrs;
INTVAL int_val;
FLOATVAL num_val;
struct parrot_string_t * string_val;
} UnionVal;
So long as the discriminator does not change, the union is type stable.
The vtable's not the discriminator there, the flags in the pmc are 
the discriminator, as they're what indicates that the union's a 
GCable thing or not. I will admit, though, that looks *very* 
different than it did when I put that stuff in originally. (It used 
to be just a union of FLOATVAL, INTVAL, and string pointer...)

Still, point taken. That needs to die and it needs to die now. For 
the moment, let's split it into two pieces, a buffer pointer and an 
int/float union, so we don't have to guess whether the contents have 
issues with threads.
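
Roughly like this (a sketch with simplified types, not a patch): the
possibly-GCable buffer pointer gets its own slot, and only plain numeric
data stays in a union, so the collector never has to guess whether the
union currently holds a traceable pointer.

typedef long   INTVAL;
typedef double FLOATVAL;

typedef struct {
    void  *bufstart;        /* always a buffer pointer (or NULL)      */
    size_t buflen;
    union {
        INTVAL   int_val;   /* never reinterpreted as a pointer, so   */
        FLOATVAL num_val;   /* type-stable as seen from other threads */
    } val;
} PObjData;
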
--
Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Threads. Again. Dammit.

2004-01-30 Thread Dan Sugalski
Okay, it's obvious that we still have some issues to work out before 
we hit implementation details. (Hey, it could be worse--this is easy 
compared to strings...) I think there are some ways we can minimize 
locking, and I think we have some unpleasant potential issues to deal 
with in the interaction between strings and threads (I thought we 
could dodge that, but, well... I was wrong).

This needs more thought and more work before we go anywhere. Some of 
the obvious stuff, like fixing up the cache slot of the PMC, should 
be done regardless.

I also think we need more real-worldish tests for this, so we can see 
if the problems really are as bad as they seem. That, at least, I 
think I can help with, since I conveniently happen to have a compiler 
that targets parrot near-done enough to test some reasonably abusive 
HLL(ish) code to see what sort of hit we take.
--
Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: More on threads

2004-01-30 Thread Dan Sugalski
At 1:47 AM + 1/25/04, Pete Lomax wrote:
On Sat, 24 Jan 2004 13:59:26 -0500, Gordon Henriksen
[EMAIL PROTECTED] wrote:
snip
It doesn't matter if an int field could read half of a double or v.v.;
it won't crash the program. Only pointers matter.
snip
These rules ensure that dereferencing a pointer will not segfault.
In this model, wouldn't catching the segfault
Apart from anything else, I don't want to catch segfaults and bus 
errors in parrot. (Well, OK, that's not true--I *do* want to catch 
segfaults and bus errors, I just don't think it's feasible, or 
possible on all platforms)
--
Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


RE: More on threads

2004-01-30 Thread Gordon Henriksen
Dan Sugalski wrote:

 Gordon Henriksen wrote:
 
  Leopold Toetsch wrote:
 
   Gordon Henriksen wrote:
  
... Best example: morph. morph must die.
  
   Morph is necessary. But please note: morph changes the vtable of
   the PMC to point to the new data types table. It has nothing to do

   with a typed union.
 
  The vtable IS the discriminator. I'm referring to this:
 
  typedef union UnionVal {
  struct {/* Buffers structure */
  void * bufstart;
  size_t buflen;
  } b;
  struct {/* PMC unionval members */
  DPOINTER* _struct_val;  /* two ptrs, both are 
defines */
  PMC* _pmc_val;
  } ptrs;
  INTVAL int_val;
  FLOATVAL num_val;
  struct parrot_string_t * string_val;
  } UnionVal;
 
  So long as the discriminator does not change, the union is 
  type stable.
 
 The vtable's not the discriminator there, the flags in the pmc are 
 the discriminator, as they're what indicates that the union's a 
 GCable thing or not. I will admit, though, that looks *very* 
 different than it did when I put that stuff in originally. (It used 
 to be just a union of FLOATVAL, INTVAL, and string pointer...)

Hm. Well, both are a discriminator, then; dispatch to code which
presumes the contents of the union is quite frequently done without
examining the flags. Maybe use a VTABLE func instead to get certain
flags? i.e.,

INTVAL parrot_string_get_flags(..., PMC *pmc) {
return PMC_FLAG_IS_POBJ + ...;
}

Then, updating the vtable would atomically update the flags as well.
Or, hell, put the flags directly in the VTABLE if it's not necessary
for them to vary across instances.

I have the entire source tree (save src/ & tests) scoured of that rat's
nest of macros for accessing PMC/PObj fields, but I broke something and
haven't had the motivation to track down what in the multi-thousand-
line-diff it was, yet. :( Else you'd have the patch already and plenty
of mobility in the layout of that struct. Near time to upgrade my poor
old G3, methinks; the build cycle kills me when I touch parrot/pobj.h.


Do any PMC classes use *both* struct_val *and* pmc_val concurrently? I
was looking for that, but am afraid I didn't actually notice.

-- 

Gordon Henriksen
IT Manager
ICLUBcentral Inc.
[EMAIL PROTECTED]



Re: More on threads

2004-01-30 Thread Leopold Toetsch
Dan Sugalski [EMAIL PROTECTED] wrote:

[ PObj union ]

 Still, point taken. That needs to die and it needs to die now. For
 the moment, lets split it into two pieces, a buffer pointer and an
 int/float union, so we don't have to guess whether the contents have
 issues with threads.

The Buffer members (bufstart, buflen) of the union are never used for a
PMC. Also a PMC can't get converted into a Buffer or vv. These union
members are just there for DOD, so that one pobject_lives() (and other
functions) can be used for both PMCs and Buffers. That was introduced
when uniting Buffers and PMCs.

I don't see a problem with that.

The problem that Gordon expressed with morph is:

  thread1                              thread2

  PerlInt->vtable->set_string_native
    (int_val = 3)
  LOCK()
  perlscalar->vtable->morph:
  pmc->vtable is now a PerlString
    vtable, str_val is invalid
                                       read access on pmc - non-locked
                                       PerlString->vtable->get_integer
                                       STRING *s = pmc->str_val
                                       SIGBUS/SEGV on access of s

But that can be solved by first clearing str_val, then changing the
vtable.
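
I.e., something like this (simplified, hypothetical types and names; only
the order of the two stores matters): the new type's payload becomes valid
before the vtable is switched, so an unlocked reader dispatching through
the new vtable never dereferences a stale pointer.

typedef struct VTable VTable;
typedef struct parrot_string_t STRING;

typedef struct {
    VTable *vtable;
    union {
        long    int_val;
        STRING *string_val;
    } cache;
} PMC;

extern VTable *PerlString_vtable;   /* stand-in for the real table */

static void perlint_morph_to_string(PMC *pmc, STRING *s)
{
    pmc->cache.string_val = s;             /* 1: valid payload first  */
    pmc->vtable = PerlString_vtable;       /* 2: switch dispatch last */
}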

leo


RE: More on threads

2004-01-30 Thread Gordon Henriksen
Leopold Toetsch wrote:

 Gordon Henriksen wrote:
 
  ... in the multi-thousand-
  line-diff it was, yet. :( Else you'd have the patch already
 
 1) *no* multi-thousands line diffs
 2) what is the problem, you like to solve?

Er? Extending to the rest of the source tree the huge patch to
classes which you already applied. No logic changes; just
cleaning those PObj accessor macros up.

-- 

Gordon Henriksen
IT Manager
ICLUBcentral Inc.
[EMAIL PROTECTED]



Re: More on threads

2004-01-30 Thread Leopold Toetsch
Gordon Henriksen wrote:

Er? Extending to the rest of the source tree the huge patch to
classes which you already applied. No logic changes; just
cleaning those PObj accessor macros up.
Ah sorry, that one. Please send in small bunches, a few files changed at 
once.

leo





Re: More on threads

2004-01-30 Thread Leopold Toetsch
Gordon Henriksen wrote:

Hm. Well, both are a discriminator, then; dispatch to code which
presumes the contents of the union is quite frequently done without
examining the flags. 
The flags are *never* consulted for a vtable call in classes/*. DOD does 
different things if a Buffer or PMC is looked at, but that doesn't 
matter here.

Then, updating the vtable would atomically update the flags as well.
Doesn't matter.


Or, hell, put the flags directly in the VTABLE if it's not necessary
for them to vary across instances.
No, flags are mutable and per PMC *not* per class.


... in the multi-thousand-
line-diff it was, yet. :( Else you'd have the patch already 
1) *no* multi-thousands line diffs
2) what is the problem, you like to solve?

Do any PMC classes use *both* struct_val *and* pmc_val concurrently? 
E.g. iterator.pmc. UnmanagedStruct uses int_val & pmc_val. This is no 
problem. These PMCs don't morph.

leo



RE: More on threads

2004-01-30 Thread Gordon Henriksen
Leopold Toetsch wrote:

 Gordon Henriksen wrote:
 
  Or, hell, put the flags directly in the VTABLE if it's not 
  necessary for them to vary across instances.
 
 No, flags are mutable and per PMC *not* per class.

Of course there are flags which must remain per-PMC. I wasn't
referring to them. Sorry if that wasn't clear.

If a flag is only saying my VTABLE methods use the UnionVal as {a
void*/a PObj*/a PMC*/data}, so GC should trace accordingly, it may be
a waste of a per-object flag bit to store those flags with the PMC
instance rather than with the PMC class. And if it's with the VTABLE,
then it doesn't need to be traced. (But, then, all PObjs don't have
VTABLES...)


Sidebar:
If we're looking at lock-free concurrency, flag updates probably have
to be performed with atomic &'s and |'s. BUT: Doesn't apply during GC,
since other threads will have to be stalled then.
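
E.g. (one possible spelling, using GCC's __sync builtins; the struct and
flag bit here are made up):

typedef struct { unsigned int flags; } PObjHdr;    /* simplified */

#define POBJ_IS_SHARED_FLAG  (1u << 5)             /* made-up bit */

static void pobj_set_shared(PObjHdr *o)
{
    __sync_fetch_and_or(&o->flags, POBJ_IS_SHARED_FLAG);    /* atomic |= */
}

static void pobj_clear_shared(PObjHdr *o)
{
    __sync_fetch_and_and(&o->flags, ~POBJ_IS_SHARED_FLAG);  /* atomic &= */
}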


  Do any PMC classes use *both* struct_val *and* pmc_val concurrently?
 
 E.g. iterator.pmc. UnmanagedStruct uses int_val & pmc_val. This is no 
 problem. These PMCs don't morph.

Er, int_val and pmc_val at the same time? That's not quite what the
layout
provides for:

typedef union UnionVal {
struct {/* Buffers structure */
void * bufstart;
size_t buflen;
} b;
struct {/* PMC unionval members */
DPOINTER* _struct_val;   /* two ptrs, both are defines */
PMC* _pmc_val;
} ptrs;
INTVAL int_val;
FLOATVAL num_val;
struct parrot_string_t * string_val;
} UnionVal;

Says to me:

struct_val and pmc_val concurrently
  -- or --
bufstart and buflen concurrently
  -- or --
int_val
  -- or --
num_val
  -- or --
string_val

I don't know if C provides a guarantee that int_val and ptrs._pmc_val
won't 
overlap just because INTVAL and DPOINTER* fields happen to be the same
size.
At  least one optimizing compiler I know of, MrC/MrC++, would do some
struct
rearrangement when it felt like it.

-- 

Gordon Henriksen
IT Manager
ICLUBcentral Inc.
[EMAIL PROTECTED]



Re: More on threads

2004-01-30 Thread Leopold Toetsch
Gordon Henriksen wrote:

Leopold Toetsch wrote:
No, flags are mutable and per PMC *not* per class.
Of course there are flags which must remain per-PMC. I wasn't
referring to them. Sorry if that wasn't clear.
If a flag is only saying my VTABLE methods use the UnionVal as {a
void*/a PObj*/a PMC*/data}, so GC should trace accordingly, it may be
a waste of a per-object flag bit to store those flags with the PMC
instance rather than with the PMC class. 
All DOD related flags in the fast paths (i.e. for marking scalars) are 
located in the PMCs arena (when ARENA_DOD_FLAGS is on). This reduces 
cache misses during DOD to nearly nothing. More DOD related information 
is in the flags part of the PObj - but accessing that also means cache 
pollution. Putting flags elsewhere too needs one more indirection and 
always an access to the PMC memory itself. This doesn't give us any 
advantage.

But again, flags don't matter during setting or getting a PMCs data. 
Flags aren't used in classes for these purposes.

There are very few places in classes, where flags are even changed. This 
is morphing scalars, and Key PMCs come to my mind.


If we're looking at lock-free concurrency, flag updates probably have
to be performed with atomic 's and |'s. 
Almost all mutating vtable methods will lock the pmc.


Er, int_val and pmc_val at the same time? 
I know :) This isn't the safest thing we have. After your union accessor 
patches, we can clean that up, and use a notation so that for this case, 
the two union members really can't overlap.

leo



Re: More on threads

2004-01-30 Thread Leopold Toetsch
Leopold Toetsch [EMAIL PROTECTED] wrote:

[ perlscalar morph ]

 But that can be solved by first clearing str_val, then changing the
 vtable.

Fixed. I currently don't see any more problems related to perscalars.

PerlStrings are unsafe per se, as long as we have the copying GC. They
need a lock during reading too. All other perlscalars should be safe now
for non-locked reading. Mutating vtables get a lock.

leo


Re: Threads... last call

2004-01-29 Thread Gordon Henriksen
On Wednesday, January 28, 2004, at 12:53 , Melvin Smith wrote:

At 12:27 PM 1/23/2004 -0800, Damien Neil wrote:

Java Collections are a standard Java library of common data structures 
such as arrays and hashes.  Collections are not synchronized; access 
involves no locks at all.  Multiple threads accessing the same 
collection at the same time cannot, however, result in the virtual 
machine crashing.  (They can result in data structure corruption, but 
this corruption is limited to surprising results rather than VM 
crash.)
But this accomplishes nothing useful and still means the data structure 
is not re-entrant, nor is it corruption resistant, regardless of how 
we judge it.
It does accomplish something very useful indeed: It avoids the overhead 
of automatic locking when it isn't necessary. When *is* that locking 
necessary? To a second order approximation, ***NEVER.***

Never? Yes. From the user's perspective, synchronized objects more 
often than not provide no value! For instance, this Java code, run from 
competing threads, does not perform its intended purpose:

//  Vector was Java 1's dynamically sizable array class.
if (!vector.contains(obj))
    vector.add(obj);
It does not prevent obj from appearing more than once in the collection. 
It's equivalent to this:

boolean temp;
synchronized (vector) {
    temp = vector.contains(obj);
}
//  Preemption is possible between here...
if (!temp) {
    synchronized (vector) {
        //  ... and here.
        vector.add(obj);
    }
}
The correct code is this:

synchronized (vector) {
    if (!vector.contains(obj))
        vector.add(obj);
}
If we again make explicit Vector's synchronized methods, a startling 
redundancy will become apparent:

synchronized (vector) {
    boolean temp;
    synchronized (vector) {
        temp = vector.contains(obj);
    }
    if (!temp) {
        synchronized (vector) {
            vector.add(obj);
        }
    }
}
This code is performing 3 times as many locks as necessary! More 
realistic code will perform many times more than 3 times the necessary 
locking. Imagine the waste in sorting a 10,000 element array.

Beyond that stunning example of wasted effort, most structures aren't 
shared in the first place, and so the overhead is *completely* 
unnecessary for them.

Still further, many shared objects are unmodified once they become 
shared, and so again require no locking.

The only time automatically synchronized objects are useful is when the 
user's semantics exactly match the object's methods. (e.g., an 
automatically synchronized queue might be quite useful.) In that case, 
it's a trivial matter for the user to wrap the operation in a 
synchronized block, or subclass the object with a method that does so. 
But it is quite impossible to remove the overhead from the other 99% of 
uses, since the VM cannot discern the user's locking strategy.

If the user is writing a threaded program, then data integrity is 
necessarily his problem; a virtual machine cannot solve it for him, and 
automatic synchronization provides little help. Protecting a 
policy-neutral object (such as an array) from its user is pointless; all 
we could do is to ensure that the object is in a state consistent with 
an incorrect sequence of operations. That's useless to the user, because 
his program still behaves incorrectly; the object is, to his eyes, 
corrupted since it no longer maintains his invariant conditions.

Thus, since the VM cannot guarantee that the user's program behaves 
as intended (only as written), and all that locking is wasted 4 times 
over, the virtual machine would be wise to limit its role to the 
prevention of crashes and to limiting the corruption that can result 
from incorrect (unsynchronized) access to objects. If automatic locking 
is the only way to do that for a particular data structure, then so be 
it. Oftentimes, though, thoughtful design can ensure that even 
unsynchronized accesses cannot crash or corrupt the VM as a whole--even 
if those operations might corrupt one object's state, or provide less 
than useful results.

As an example of how this applies to parrot, all of the FixedMumbleArray 
classes[*] recently discussed could clearly be implemented completely 
and safely without automatic locks on all modern platforms, if only 
parrot could allow lock-free access to at least some PMCs. ([*] Except 
FixedMixedArray. That stores UVals [plus more for a discriminator 
field], and couldn't be quite safe, as a UVal isn't an atomic write on 
many platforms. Then again, nor is a double.)

Remember, Java already made the mistake of automatic synchronization 
with their original

Re: Threads... last call

2004-01-29 Thread Damien Neil
On Wed, Jan 28, 2004 at 12:53:09PM -0500, Melvin Smith wrote:
 At 12:27 PM 1/23/2004 -0800, Damien Neil wrote:
 Java Collections are a standard Java library of common data structures
 such as arrays and hashes.  Collections are not synchronized; access
 involves no locks at all.  Multiple threads accessing the same
 collection at the same time cannot, however, result in the virtual
 machine crashing.  (They can result in data structure corruption,
 but this corruption is limited to surprising results rather than
 VM crash.)
 
 But this accomplishes nothing useful and still means the data structure
 is not re-entrant, nor is it corruption resistant, regardless of how we 
 judge it.

Quite the contrary--it is most useful.

Parrot must, we all agree, under no circumstances crash due to
unsynchronized data access.  For it to do so would be, among other
things, a gross security hole when running untrusted code in a
restricted environment.

There is no need for any further guarantee about unsynchronized data
access, however.  If unsyncronized threads invariably cause an exception,
that's fine.  If they cause the threads involved to halt, that's fine
too.  If they cause what was once an integer variable to turn into a
string containing the definition of mulching...well, that too falls
under the heading of undefined results.  Parrot cannot and should
not attempt to correct for bugs in user code, beyond limiting the extent
of the damage to the threads and data structures involved.

Java, when released, took the path that Parrot appears to be about
to take--access to complex data structures (such as Vector) was
always synchronized.  This turned out to be a mistake--sufficiently
so that Java programmers would often implement their own custom,
unsynchronized replacements for the core classes.  As a result,
when the Collections library (which replaces those original data
structures) was released, the classes in it were left unsynchronized.

In Java's case, the problem was at the library level, not the VM
level; as such, it was relatively easy to fix at a later date.
Parrot's VM-level data structure locking will be less easy to change.

 - Damien


Re: Threads... last call

2004-01-29 Thread Melvin Smith
At 11:45 PM 1/28/2004 -0500, Gordon Henriksen wrote:
On Wednesday, January 28, 2004, at 12:53 , Melvin Smith wrote:

At 12:27 PM 1/23/2004 -0800, Damien Neil wrote:

Java Collections are a standard Java library of common data structures 
such as arrays and hashes.  Collections are not synchronized; access 
involves no locks at all.  Multiple threads accessing the same 
collection at the same time cannot, however, result in the virtual 
machine crashing.  (They can result in data structure corruption, but 
this corruption is limited to surprising results rather than VM crash.)
But this accomplishes nothing useful and still means the data structure 
is not re-entrant, nor is it corruption resistant, regardless of how we 
judge it.
It does accomplish something very useful indeed: It avoids the overhead of 
automatic locking when it isn't necessary. When *is* that locking 
necessary? To a second order approximation, ***NEVER.***
Pardon me but I've apparently lost track of context here.

I thought we were discussing correct behavior of a shared data structure,
not general cases. Or maybe this is the general case and I should
go read more backlog? :)
-Melvin




Re: Threads... last call

2004-01-29 Thread Leopold Toetsch
Melvin Smith [EMAIL PROTECTED] wrote:

 I thought we were discussing correct behavior of a shared data structure,
 not general cases. Or maybe this is the general case and I should
 go read more backlog? :)

Basically we have three kinds of locking:
- HLL user level locking [1]
- user level locking primitives [2]
- vtable pmc locking to protect internals

Locking at each stage and for each PMC will be slow and can deadlock
too. Very coarse grained locking (like Python's interpreter_lock) doesn't
give any advantage on MP systems - only one interpreter is running at
one time.

We can't solve user data integrity at the lowest level: data logic and
such isn't really visible here. But we should be able to integrate HLL
locking with our internal needs, so that the former doesn't cause
deadlocks[3] because we have to lock internally too, and we should be able
to omit internal locking, if HLL locking code already provides this
safety for a specific PMC.

The final strategy when to lock what depends on the system the code is
running. Python's model is fine for single processors. Fine grained PMC
locking gives more boost on multi-processor machines.

All generalization is evil and 47.4 +- 1.1% of all
statistics^Wbenchmarks are wrong;)

leo

[1] BLOCK scoped, e.g. synchronized {... } or { lock $x; ... }
These can be rwlocks or mutex typed locks
[2] lock.acquire, lock.release

[3] not user caused deadlocks - the mix of e.g. one user lock and
internal locking.

 -Melvin

leo


Re: Threads... last call

2004-01-29 Thread Gordon Henriksen
On Thursday, January 29, 2004, at 11:55 , Melvin Smith wrote:

At 11:45 PM 1/28/2004 -0500, Gordon Henriksen wrote:

On Wednesday, January 28, 2004, at 12:53 , Melvin Smith wrote:

At 12:27 PM 1/23/2004 -0800, Damien Neil wrote:

Java Collections are a standard Java library of common data 
structures such as arrays and hashes.  Collections are not 
synchronized; access involves no locks at all.  Multiple threads 
accessing the same collection at the same time cannot, however, 
result in the virtual machine crashing.  (They can result in data 
structure corruption, but this corruption is limited to surprising 
results rather than VM crash.)
But this accomplishes nothing useful and still means the data 
structure is not re-entrant, nor is it corruption resistant, 
regardless of how we judge it.
It does accomplish something very useful indeed: It avoids the 
overhead of automatic locking when it isn't necessary. When *is* that 
locking necessary? To a second order approximation, ***NEVER.***
Pardon me but I've apparently lost track of context here.

I thought we were discussing correct behavior of a shared data 
structure, not general cases. Or maybe this is the general case and I 
should go read more backlog? :)
A shared data structure, as per Dan's document? It's a somewhat novel 
approach, trying to avoid locking overhead with dynamic dispatch and 
vtable swizzling. I'm discussing somewhat more traditional technologies, 
which simply allow an object to perform equally correctly and with no 
differentiation between shared and unshared cases. In essence, I'm 
arguing that a shared case isn't necessary for some data structures in 
the first place.



Gordon Henriksen
[EMAIL PROTECTED]


Re: More on threads

2004-01-25 Thread Leopold Toetsch
Gordon Henriksen [EMAIL PROTECTED] wrote:

 I overstated when I said that morph must die. morph could live IF:

[ long proposal ]

Increasing the union size, so that each pointer is distinct is not an
option. This imposes considerable overhead on a non-threaded program
too, due to its bigger PMC size.

To keep internal state consistent we have to LOCK shared PMCs, that's
it. This locking is sometimes necessary for reading too.

leo


Re: More on threads

2004-01-25 Thread Gordon Henriksen
 
What do you think the overall performance effect of fine-grained locking 
will be? You just showed in a microbenchmark that it's 400% for some 
operations. We've also heard anecdotal evidence of 400% *overall* 
performance hits from similar threading strategies in other projects. 
And remember, these overheads are ON TOP OF the user's synchronization 
requirements; the PMC locks will rarely coincide with the user's 
high-level synchronization requirements.

If these are the two options, I as a user would rather have a separate 
threaded parrot executable which takes the 2.1% hit, rather than the 
400% overhead as per above. It's easily the difference between usable 
threads and YAFATTP (yet another failed attempt to thread perl).



Gordon Henriksen
[EMAIL PROTECTED]


Re: More on threads

2004-01-25 Thread Leopold Toetsch
Gordon Henriksen [EMAIL PROTECTED] wrote:
 Leopold Toetsch wrote:

 Increasing the union size, so that each pointer is distinct is not an
 option. This imposes considerable overhead on a non-threaded program
 too, due its bigger PMC size.

 That was the brute-force approach, separating out all pointers. If the
 scalar hierarchy doesn't use all 4 of the pointers, then the bloat can
 be reduced.

Yep. Your proposal is a very thorough analysis of what the problems of
morph() currently are. I can imagine that we have a distinct string_val
pointer that isn't part of the value union. Morph is currently
implemented only (and necessary) for PerlScalar types. So if a
PerlString's string_val member is valid at any time, we could probably
save a lot of locking overhead.

 And what of the per-PMC mutex? Is that not also considerable overhead?
 More than an unused field, even.

We have to weigh single-CPU (non-threaded) performance against threaded.
For the latter, we have from single-CPU to many-multi-CPU NUMA systems a
wide spectrum of possibilities.

I currently don't want to slow down the normal non-threaded case.

 To keep internal state consistent we have to LOCK shared PMCs, that's
 it. This locking is sometimes necessary for reading too.

 Sometimes? Unless parrot can prove a PMC is not shared, PMC locking is
 ALWAYS necessary for ALL accesses to ANY PMC.

get_integer() on a PerlInt PMC is always safe. *If* the vtable is
pointing to a PerlInt PMC it yields a correct value (atomic int access
presumed). If for some reason (which I can't imagine now) the vtable
pointer and the cache union are out of sync, get_integer would produce a
wrong value, which is currently considered to be a user problem (i.e.
missing user-level locking).

The same holds for e.g. PerlNum, which might read a mixture of lo and hi
words, but again, the pmc->vtable->get_number of a PerlNum is a safe
operation on shared PMCs without locking too.

The locking primitives provide AFAIK the (possibly needed) memory barriers
to update the vtable *and* the cache values to point to consistent data
during the *locking* of mutating vtable methods.
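
As a rough sketch of that intended sequence (LOCK_PMC/UNLOCK_PMC are again
hypothetical names, and the lock/unlock pair is assumed to supply the
barriers):

/* Sketch: update the cache value and the vtable inside one critical
 * section; the unlock publishes both together, so a reader that takes
 * the lock never sees a vtable that disagrees with the cache union. */
LOCK_PMC(interpreter, pmc);
pmc->cache.string_val = new_string;
pmc->vtable = Parrot_base_vtables[enum_class_PerlString];
UNLOCK_PMC(interpreter, pmc);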

 BENCHMARK  USER  SYS  %CPU  TOTAL
 That's only 2.1%.

[ big snip on benchmarks ]

All these benchmarks show a scheme like: allocate once and use once. I.e.
these benchmarks don't reuse the PMCs. They don't show RL program
behavior. We don't have any benchmarks currently that expose a near to
worst-case slowdown, which went towards 400% for doubled PMC sizes.

We had some test programs (C only, with simulated allocation schemes) that
shuffled allocated PMCs in memory (a typical effect of reusing PMC
memory a few times). The current 20 byte PMC went down by 200%, the old 32
byte PMC went down by 400%.

The point is that when you get allocated PMCs from the free_list, they
are more and more scattered in the PMC arenas. All current benchmarks
tend to touch contiguous memory, while RL (or a bit longer running)
programs don't.

That said, I do consider PMC sizes and cache pollution the major
performance issues of RL Parrot programs. These benchmarks don't show
that yet.

 What do you think the overall performance effect of fine-grained locking
 will be? You just showed in a microbenchmark that it's 400% for some
 operations.

Yes. That's the locking overhead for the *fastest* PMC vtable methods. I
think that we'll be able to have different locking strategies in the
long run. If you want to have a more scalable application on a many-CPU
system, a build-option may provide this. For a one or two CPU system, we
can do less fine-grained locking with less overhead. That might be a
global interpreter lock for that specific case.

 And remember, these overheads are ON TOP OF the user's synchronization
 requirements; the PMC locks will rarely coincide with the user's
 high-level synchronization requirements.

User level locking isn't laid out yet. But my 2¢ towards that are: a
user will put a lock around unsafe (that is, shared) variable access. We
have to lock internally a lot of times to keep our data-integrity, which
is guaranteed. So the question is, why not provide that integrity
per se (the user will need it anyway and lock). I don't see a difference
here for data integrity, *but* if all locking is under our control, we
can optimize it and it doesn't conflict or it shouldn't deadlock.

 If these are the two options, I as a user would rather have a separate
 threaded parrot executable which takes the 2.1% hit, rather than the
 400% overhead as per above. It's easily the difference between usable
 threads and YAFATTP (yet another failed attempt to thread perl).

All these numbers are by far too premature to have any impact on RL
applications.

Please note that even with a 400% slowdown for one vtable operation the
mops_p.pasm benchmark would run four to eight times faster than on an
*unthreaded* perl5. Thread spawning is currently 8 times faster than on
perl5.

 ?

!!!1

 Gordon Henriksen

leo


Re: More on threads

2004-01-24 Thread Leopold Toetsch
Gordon Henriksen [EMAIL PROTECTED] wrote:

 ... Best example: morph. morph must die.

Morph is necessary. But please note: morph changes the vtable of the PMC
to point to the new data type's table. It has nothing to do with a typed
union.

 Gordon Henriksen

leo


Re: More on threads

2004-01-24 Thread Gordon Henriksen
On Saturday, January 24, 2004, at 09:23 , Leopold Toetsch wrote:

Gordon Henriksen [EMAIL PROTECTED] wrote:

... Best example: morph. morph must die.
Morph is necessary. But please note: morph changes the vtable of the 
PMC to point to the new data types table. It has nothing to do with a 
typed union.
The vtable IS the discriminator. I'm referring to this:

typedef union UnionVal {
    struct {                        /* Buffers structure */
        void * bufstart;
        size_t buflen;
    } b;
    struct {                        /* PMC unionval members */
        DPOINTER* _struct_val;      /* two ptrs, both are defines */
        PMC* _pmc_val;
    } ptrs;
    INTVAL int_val;
    FLOATVAL num_val;
    struct parrot_string_t * string_val;
} UnionVal;
So long as the discriminator does not change, the union is type stable. 
When the discriminator does change, as per here:

void
Parrot_PerlInt_set_string_native(Parrot_Interp interpreter, PMC* pmc, STRING* value)
{
    VTABLE_morph(interpreter, pmc, enum_class_PerlString);
    VTABLE_set_string_native(interpreter, pmc, value);
}

void
Parrot_perlscalar_morph(Parrot_Interp interpreter, PMC* pmc, INTVAL type)
{
    if (pmc->vtable->base_type == enum_class_PerlString) {
        if (type == enum_class_PerlString)
            return;
        PObj_custom_mark_CLEAR(pmc);
        pmc->vtable = Parrot_base_vtables[type];
        return;
    }
    if (type == enum_class_PerlString) {
        pmc->vtable = Parrot_base_vtables[type];
        VTABLE_init(interpreter, pmc);
        return;
    }
    PObj_custom_mark_CLEAR(pmc);
    pmc->vtable = Parrot_base_vtables[type];
}

... then these can both run:

STRING*
Parrot_scalar_get_string(Parrot_Interp interpreter, PMC* pmc)
{
    return (STRING*)pmc->cache.string_val;
}

FLOATVAL
Parrot_scalar_get_number(Parrot_Interp interpreter, PMC* pmc)
{
    return pmc->cache.num_val;
}
That clearly allows a struct parrot_string_t * to freely share the same 
memory as a double. Were it an int and a double, the "surprising 
results" from this unprotected access wouldn't violate the "no crashes" 
guarantee. But it's a pointer! Dereferencing it could cause a segfault, 
or a read or write of an arbitrary memory location. Both clearly violate 
the crucial guarantee.



Gordon Henriksen
[EMAIL PROTECTED]


Re: More on threads

2004-01-24 Thread Gordon Henriksen
Leopold Toetsch wrote:

Gordon Henriksen [EMAIL PROTECTED] wrote:

... Best example: morph. morph must die.
Morph is necessary. But please note: morph changes the vtable of the 
PMC to point to the new data types table. It has nothing to do with a 
typed union.
I overstated when I said that morph must die. morph could live IF:

 - the UnionVal struct were rearranged
 - bounds were placed upon how far a morph could... well, morph
It doesn't matter if an int field could read half of a double or v.v.; 
it won't crash the program. Only pointers matter.

To allow PMC classes to guarantee segfault-free operation, morph and 
cooperating PMC classes must conform to the following rule. Other 
classes would require locking.

With this vocabulary:
	variable: A memory location which is reachable (i.e., not garbage). [*]
	pointer: The address of a variable.
	pointer variable: A variable which contains a pointer.
	access: For a pointer p, any dereference of p (*p, p->field, or 
p[i]), whether for the purposes of reading or writing to that variable.

And considering:
	any specific pointer variable (ptr), and
	all accesses which parrot might perform[**] on any pointer ever 
stored in ptr (A) [***], and
	any proposed assignment to ptr

Then:
	If any A which once accessed a pointer variable would now access a 
non-pointer variable,
	Then the proposed assignment MUST NOT be performed.

This is a relaxed "type stability" definition. (Relaxed: It provides type 
stability only for pointer variables, not for data variables. It does 
not discriminate the types of pointers, only that the data structures 
they directly reference have the same layout of pointers. Also, a 
loophole allows non-pointer variables to become pointer variables, but 
not the reverse.)

These rules ensure that dereferencing a pointer will not segfault. They 
also ensure that it is safe to dereference a pointer obtained from a union 
according to the union's discriminator, regardless of when or in which 
order or how often parrot read the pointer or the discriminator.[***] I 
think they're actually the loosest possible set of rules to do this.

[*] Two union members are the same variable.
[**] This is in the variable ptr specifically, not merely in the same 
field of a similar struct. That is, having an immutable discriminator 
which selects s.u.v or s.u.i from struct { union { void * v; int i; } 
u; } s is valid. A mutable discriminator is also valid, so long as the 
interpretation of pointer fields does not change.
[***] But only if the architecture prevents shearing in pointer reads 
and writes.

From another perspective this is to say:

Every pointer variable must forever remain a pointer.
Union discriminators must not change such that a pointer will no longer 
be treated as a pointer, or will be treated as a pointer to a structure 
with a different layout.

The first step in conforming to these rules is guaranteeing that a 
perlscalar couldn't morph into an intlist or some other complete 
nonsense. So the default for PMCs should be to prohibit morphing. Also, 
morphable classes will have a hard time using struct_val without 
violating the above rules. But for this price, parrot could get 
lock-free, guaranteed crash-proof readers for common data types. But 
note that pmc->cache.pmc_val can be used freely! So if exotic scalars 
wrap their data structures in a PMC 
*cough*perlobject*cough*managedstruct*ahem*, then those PMCs can be part 
of a cluster of morphable PMC classes without violating these rules.

Next, the scalar hierarchy (where morphing strikes me as most important) 
could be adjusted to provide the requisite guarantees, such as: 
perlstring's vtable methods would never look for its struct 
parrot_string_t * in the same memory location that a perlnum vtable 
method might be storing half of a floatval. Right now, that sort of 
guarantee is not made, and so ALL shared PMCs REALLY DO require locking. 
That's bad, and it's solvable.

Specifically, UnionVal with its present set of fields, would have to 
become something more like this:

struct UnionVal {
    struct parrot_string_t * string_val;
    DPOINTER* struct_val;
    PMC* pmc_val;
    void *b_bufstart;
    union {
        INTVAL _int_val;
        size_t _buflen;
        FLOATVAL _num_val;
    } _data_vals;
};
If no scalar types use struct_val or pmc_val or b_bufstart, then those 
fields can go inside the union.
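
For illustration only, that variant might look something like this (a
sketch, not a proposed patch, and assuming those three pointer fields
really are untouched by every morphable scalar class):

struct UnionVal {
    struct parrot_string_t * string_val;  /* stays outside: scalars use it */
    union {
        INTVAL    _int_val;
        FLOATVAL  _num_val;
        size_t    _buflen;
        DPOINTER* _struct_val;    /* only non-morphing classes touch  */
        PMC*      _pmc_val;       /* these three, so they can share   */
        void*     _bufstart;      /* space with the plain data values */
    } _other_vals;
};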

Unconstrained morphing is the only technology that *in all cases* 
*completely* prevents the crash-proof guarantee for lock-free access to 
shared PMCs. Without changes to this, we're stuck with implicit PMC 
locking and what looks like an unusable threading implementation.

This is only the beginning! For example, if parrot can provide type 
stability, mutable strings can be crash-proofed from multithreaded 
access. Wha?!

1. Add to the buffer structure an immutable 

Re: More on threads

2004-01-24 Thread Gordon Henriksen
I wrote:

With this vocabulary:
	variable: A memory location which is reachable (i.e., not 
garbage). [*]
	pointer: The address of a variable.
	pointer variable: A variable which contains a pointer.
	access: For a pointer p, any dereference of p (*p, p->field, or 
p[i]), whether for the purposes of reading or writing to that variable.

And considering:
	any specific pointer variable (ptr), and
	all accesses which parrot might perform[**] on any pointer ever 
stored in ptr (A) [***], and
	any proposed assignment to ptr

Then:
	If any A which once accessed a pointer variable would now access a 
non-pointer variable,
	Then the proposed assignment MUST NOT be performed.
D'oh. This actually has to be recursive.

Considering:
	any specific pointer variable (ptr'), and
	all accesses which parrot might perform on any pointer ever stored 
in ptr,
		and all accesses which parrot might perform on any pointer 
ever stored in those variables,
			...,
... A, and
	any proposed assignment to ptr

else it allows

char * a = ...;
char ** b = &a;  
Doesn't change the conclusions I drew at all. (Nor does it require some 
massively recursive algorithm to run at pointer assignment time, just as 
the first one didn't require anything more than pointer assignment at 
pointer assignment time.) Could probably be simplified with the addition 
of pointer type to the definitions section.

Anyhoo.



Gordon Henriksen
[EMAIL PROTECTED]


Re: More on threads

2004-01-24 Thread Pete Lomax
On Sat, 24 Jan 2004 13:59:26 -0500, Gordon Henriksen
[EMAIL PROTECTED] wrote:

snip
It doesn't matter if an int field could read half of a double or v.v.; 
it won't crash the program. Only pointers matter.
snip
These rules ensure that dereferencing a pointer will not segfault.
In this model, wouldn't catching the segfault and retrying (once or
twice) work? - If I'm reading you correctly, which is unlikely, this
has little to do with program correctness, but about the interpreter
not crashing because of an unfortunate context switch.. which the
programmer should have guarded against in the first place... no, I
think I just lost the plot again ;-)

Pete


Re: More on threads

2004-01-24 Thread Gordon Henriksen
Pete Lomax wrote:

Gordon Henriksen wrote:

snip
It doesn't matter if an int field could read half of a double or v.v.;
it won't crash the program. Only pointers matter.
snip
These rules ensure that dereferencing a pointer will not segfault.
In this model, wouldn't catching the segfault and retrying (once or 
twice) work?
Determining how to retry in the general case would be... much more 
interesting than this proposal. :)

Furthermore, worse than segfaults could potentially result from using 
half of a double as a pointer. There's no assurance that *((caddr_t *) 
double) won't in fact be a valid memory address. In this case, there 
would be no segfault, but memory would be subtly corrupted. There's no 
way to detect that, so there's no way to retry.

The point of all the tweaks and care is to prevent ever dropping 
something else in a particular variable where parrot would, at another 
point in the program, expect a pointer of a particular type.

I think you probably got the following, but I'd just like to elaborate 
more specifically. I think the greatest subtlety of the rules was in the 
interpretation of

"accesses which parrot might perform"
and the word "specific" in

"any specific pointer variable ptr"
Without understanding precisely what I meant there, one might think that 
even a simple polymorphic system like the following is prohibited:

#include <stdio.h>
#include <stdlib.h>

struct eg;
typedef int (*func_ptr)(struct eg*);
struct eg {
    func_ptr fp;
    union { int *pointer; int integer; } u;
};

int pointer_meth(struct eg *thiz) { return ++*(thiz->u.pointer); }
int integer_meth(struct eg *thiz) { return ++(thiz->u.integer); }

void print_it(char *name, struct eg *some_eg) {
    printf("%s says %d\n", name, some_eg->fp(some_eg));
}

int main(void) {
    struct eg eg1 = { pointer_meth, { NULL } };
    struct eg eg2 = { integer_meth, { .integer = 1 } };
    eg1.u.pointer = malloc(sizeof(int));
    *eg1.u.pointer = 0;

    print_it("eg1", &eg1);
    print_it("eg2", &eg2);
    return 0;
}
But the program IS allowed. While print_it might behave in any number of 
ways depending on some_eg->fp, it will always access a particular eg.u 
in a consistent fashion, since it always respects the discriminator 
(eg.fp), which the program never changes. By extension of this, C++ 
instances do not violate these rules, either.[*] Were the following line 
added to main(), though, then the program would be in violation:

		eg1.fp = integer_meth;

Because now some other thread could have observed eg1.fp == 
pointer_meth and begun invoking pointer_meth. pointer_meth might now 
access u.pointer and, instead of a pointer, see n + (int) u.pointer. 
That probably won't segfault for small values of n, but will certainly 
not do the right thing either.

This is trite tangent, but also note that the type stability rule 
prohibits this:

		eg1.u.pointer = NULL;

But would not if the definition of pointer_meth became:

int pointer_meth(struct eg *thiz) {
    int* pointer = thiz->u.pointer;
    return pointer == NULL ? -1 : ++*pointer;
}
Because now the program will not dereference u.pointer if its value 
is NULL. How cute. (But if u.pointer were not copied to a local, then 
bets are off again, because C might perform an extra load and get a 
value inconsistent with the one it received when testing for NULL.)

... But even that extra local copy isn't required if u.pointer begins 
NULL, and can become non-NULL, but will not become NULL again. Why?

	all accesses which parrot MIGHT perform on any pointer ever 
storED in ptr (A)
Note the past tense there. That's why.


If I'm reading you correctly, which is unlikely,
That has little to do with you, but much to do with my burying important 
parts of my message in pages of dense text. :)

this has little to do with program correctness, but about the 
interpreter not crashing because of an unfortunate context switch.. 
which the programmer should have guarded against in the first place...
Yes. Precisely.

no, I think I just lost the plot again ;-)
I think you're pretty close, just missing a few of the subtleties that 
got buried in that long missive. :)



Gordon Henriksen
[EMAIL PROTECTED]
[*] Even though C++ changes the vtable of an instance during 
instantiation, it does so in a broadening fashion, making 
formerly-inaccessible variables accessible. A C++ instance of class 
derived_class : public base_class is not a derived_class until 
base_class's constructor finishes and derived_class's constructor 
begins. (Side-effect: A subclass cannot influence the instantiation 
behavior of a base class.) (Objective C's class methods are wildly 
useful. Static languages tend to ignore them. It's sad.)


Re: Threads... last call

2004-01-23 Thread Dan Sugalski
At 5:24 PM -0500 1/22/04, Deven T. Corzine wrote:
Dan Sugalski wrote:

Last chance to get in comments on the first half of the proposal. 
If it looks adequate, I'll put together the technical details 
(functions, protocols, structures, and whatnot) and send that off 
for abuse^Wdiscussion. After that we'll finalize it, PDD the thing, 
and get the implementation in and going.
Dan,

Sorry to jump in out of the blue here, but did you respond to Damien 
Neil's message about locking issues?  (Or did I just miss it?)
Damian's issues were addressed before he brought them up, though not 
in one spot.

A single global lock, like python and ruby use, kills any hope of SMP-ability.

Hand-rolled threading has unpleasant complexity issues, is a big 
pain, and terribly limiting. And kills any hope of SMP-ability.

Corruption-resistant data structures without locking just don't exist.

This sounds like it could be a critically important design question; 
wouldn't it be best to address it before jumping into 
implementation?  If there's a better approach available, wouldn't 
this be the best time to determine that?

Deven

Date: Wed, 21 Jan 2004 13:32:52 -0800
From: Damien Neil [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: Re: Start of thread proposal
Message-ID: [EMAIL PROTECTED]
References: [EMAIL PROTECTED] 
[EMAIL PROTECTED] 
[EMAIL PROTECTED]
In-Reply-To: [EMAIL PROTECTED]
Content-Length: 1429

On Wed, Jan 21, 2004 at 01:14:46PM -0500, Dan Sugalski wrote:
 ... seems to indicate that even whole ops like add P,P,P are atomic.

Yep. They have to be, because they need to guarantee the integrity 
of the pmc structures and the data hanging off them (which includes 
buffer and string stuff)
Personally, I think it would be better to use corruption-resistant
buffer and string structures, and avoid locking during basic data
access.  While there are substantial differences in VM design--PMCs
are much more complicated than any JVM data type--the JVM does provide
a good example that this can be done, and done efficiently.
Failing this, it would be worth investigating what the real-world
performance difference is between acquiring multiple locks per VM
operation (current Parrot proposal) vs. having a single lock
controlling all data access (Python) or jettisoning OS threads
entirely in favor of VM-level threading (Ruby).  This forfeits the
ability to take advantage of multiple CPUs--but Leopold's initial
timing tests of shared PMCs were showing a potential 3-5x slowdown
from excessive locking.
I've seen software before that was redesigned to take advantage of
multiple CPUs--and then required no less than four CPUs to match
the performance of the older, single-CPU version.  The problem was
largely attributed to excessive locking of mostly-uncontested data
structures.
   - Damien


--
Dan
--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: Threads... last call

2004-01-23 Thread Dan Sugalski
At 5:58 PM -0500 1/22/04, Josh Wilmes wrote:
I'm also concerned by those timings that leo posted.
0.0001 vs 0.0005 ms on a set - that magnitude of locking overhead
seems pretty crazy to me.
It looks about right. Don't forget, part of what you're seeing isn't 
that locking mutexes is slow, it's that parrot does a lot of stuff 
awfully fast. It's also a good idea to get more benchmarks before 
jumping to any conclusions -- changing designs based on a single, 
first cut, quick-n-dirty benchmark isn't necessarily a wise thing.

It seemed like a few people have said that the JVM style of locking
can reduce this, so it seems to me that it merits some serious
consideration, even if it may require some changes to the design of
parrot.
There *is* no JVM-style locking. I've read the docs and looked at 
the specs, and they're not doing anything at all special, and nothing 
different from what we're doing. Some of the low-level details are 
somewhat different because Java has more immutable base data 
structures (which don't require locking) than we do. Going more 
immutable is an option, but one we're not taking since it penalizes 
things we'd rather not penalize. (String handling mainly)

There is no JVM Magic here. If you're accessing shared data, it has 
to be locked. There's no getting around that. The only way to reduce 
locking overhead is to reduce the amount of data that needs locking.

I'm not familiar enough with the implementation details here to say much
one way or another. But it seems to me that if this is one of those
low-level decisions that will be impossible to change later and will
forever constrain perl's performance, then it's important not to rush
into a bad choice because it seems more straightforward.
This can all be redone if we need to -- the locking and threading 
strategies can be altered in a dozen ways or ripped out and 
rewritten, as none of them affect the semantics of bytecode execution.

At 17:24 on 01/22/2004 EST, Deven T. Corzine [EMAIL PROTECTED] wrote:

 Dan Sugalski wrote:

  Last chance to get in comments on the first half of the proposal. If
  it looks adequate, I'll put together the technical details (functions,
  protocols, structures, and whatnot) and send that off for
  abuse^Wdiscussion. After that we'll finalize it, PDD the thing, and
  get the implementation in and going.
 Dan,

 Sorry to jump in out of the blue here, but did you respond to Damien
 Neil's message about locking issues?  (Or did I just miss it?)
 This sounds like it could be a critically important design question;
 wouldn't it be best to address it before jumping into implementation? 
 If there's a better approach available, wouldn't this be the best time
 to determine that?

 Deven

 Date: Wed, 21 Jan 2004 13:32:52 -0800
 From: Damien Neil [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Subject: Re: Start of thread proposal
 Message-ID: [EMAIL PROTECTED]
 References: [EMAIL PROTECTED] 
[EMAIL PROTECTED]
8.leo.home [EMAIL PROTECTED]
 In-Reply-To: [EMAIL PROTECTED]
 Content-Length: 1429
 On Wed, Jan 21, 2004 at 01:14:46PM -0500, Dan Sugalski wrote:
  ... seems to indicate that even whole ops like add P,P,P are atomic.
 
  Yep. They have to be, because they need to guarantee the integrity of
  the pmc structures and the data hanging off them (which includes
  buffer and string stuff)
 Personally, I think it would be better to use corruption-resistant
 buffer and string structures, and avoid locking during basic data
 access.  While there are substantial differences in VM design--PMCs
 are much more complicated than any JVM data type--the JVM does provide
 a good example that this can be done, and done efficiently.
 Failing this, it would be worth investigating what the real-world
 performance difference is between acquiring multiple locks per VM
  operation (current Parrot proposal) vs. having a single lock
 controlling all data access (Python) or jettisoning OS threads
 entirely in favor of VM-level threading (Ruby).  This forfeits the
 ability to take advantage of multiple CPUs--but Leopold's initial
 timing tests of shared PMCs were showing a potential 3-5x slowdown
 from excessive locking.
 I've seen software before that was redesigned to take advantage of
 multiple CPUs--and then required no less than four CPUs to match
 the performance of the older, single-CPU version.  The problem was
 largely attributed to excessive locking of mostly-uncontested data
 structures.
 - Damien



--
Dan
--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: Threads... last call

2004-01-23 Thread Deven T. Corzine
Dan Sugalski wrote:

At 5:24 PM -0500 1/22/04, Deven T. Corzine wrote:

Damian's issues were addressed before he brought them up, though not 
in one spot.

A single global lock, like python and ruby use, kills any hope of 
SMP-ability.

Hand-rolled threading has unpleasant complexity issues, is a big pain, 
and terribly limiting. And kills any hope of SMP-ability.
What about the single-CPU case?  If it can really take 4 SMP CPUs with 
locking to match the speed of 1 CPU without locking, as mentioned, 
perhaps it would be better to support one approach for single-CPU 
systems (or applications that are happy to be confined to one CPU), and 
a different approach for big SMP systems?

Corruption-resistant data structures without locking just don't exist.
The most novel approach I've seen is the one taken by Project UDI 
(Uniform Driver Interface).  Their focus is on portable device drivers, 
so I don't know if this idea could work in the Parrot context, but the 
approach they take is to have the driver execute in regions.  Each 
driver needs to have at least one region, and it can create more if it 
wants better parallelism.  All driver code executes inside a region, 
but the driver does no locking or synchronization at all.  Instead, the 
environment on the operating-system side of the UDI interface handles 
such issues.  UDI is designed to ensure that only one driver instance 
can ever be executing inside a region at any given moment, and the 
mechanism it uses is entirely up to the environment, and can be changed 
without touching the driver code.

This white paper has a good technical overview (the discussion of 
regions starts on page 9):

http://www.projectudi.org/Docs/pdf/UDI_tech_white_paper.pdf

I'm told that real-world experience with UDI has shown performance is 
quite good, even when layered over existing native drivers.  The 
interesting thing is that a UDI driver could run just as easily on a 
single-tasking, single-CPU system (like DOS) or a multi-tasking SMP 
system equally well, and without touching the driver code.  It doesn't 
have to know or care if it's an SMP system or not, although it does have 
to create multiple regions to actually be able benefit from SMP.  (Of 
course, even with single-region drivers, multiple instances of the same 
driver could benefit from SMP, since each instance could run on a 
different CPU.)

I don't know if it would be possible to do anything like this with 
Parrot, but it might be interesting to consider...

Deven



Re: Threads... last call

2004-01-23 Thread Damien Neil
On Fri, Jan 23, 2004 at 10:07:25AM -0500, Dan Sugalski wrote:
 A single global lock, like python and ruby use, kill any hope of 
 SMP-ability.

Assume, for the sake of argument, that locking almost every PMC
every time a thread touches it causes Parrot to run four times
slower.  Assume also that all multithreaded applications are
perfectly parallelizable, so overall performance scales linearly
with number of CPUs.  In this case, threaded Parrot will need
to run on a 4-CPU machine to match the speed of a single-lock
design running on a single CPU.  The only people that will benefit
from the multi-lock design are those using machines with more than
4 CPUs--everyone else is worse off.

This is a theoretical case, of course.  We don't know exactly how
much of a performance hit Parrot will incur from a lock-everything
design.  I think that it would be a very good idea to know for
certain what the costs will be, before it becomes too late to change
course.

Perhaps the cost will be minimal--a 20% per-CPU overhead would
almost certainly be worth the ability to take advantage of multiple
CPUs.  Right now, however, there is no empirical data on which to
base a decision.  I think that making a decision without that data
is unwise.

As I said, I've seen a real-world program which was rewritten to
take advantage of multiple CPUs.  The rewrite fulfilled the design
goals: the new version scaled with added CPUs. Unfortunately, lock
overhead made it sufficiently slower that it took 2-4 CPUs to match
the old performance on a single CPU--despite the fact that almost
all lock attempts succeeded without contention.

The current Parrot design proposal looks very much like the locking
model that app used.


 Corruption-resistant data structures without locking just don't exist.

An existence proof:

Java Collections are a standard Java library of common data structures
such as arrays and hashes.  Collections are not synchronized; access
involves no locks at all.  Multiple threads accessing the same
collection at the same time cannot, however, result in the virtual
machine crashing.  (They can result in data structure corruption,
but this corruption is limited to surprising results rather than
VM crash.)

  - Damien


Re: Threads... last call

2004-01-23 Thread nigelsandever
On Fri, 23 Jan 2004 10:24:30 -0500, [EMAIL PROTECTED] (Dan Sugalski) wrote:
 If you're accessing shared data, it has 
 to be locked. There's no getting around that. The only way to reduce 
 locking overhead is to reduce the amount of data that needs locking.
 

One slight modification I would make to that statement is:

You can reduce locking overhead by only invoking that overhead
when locking is necessary. If there is a 'cheaper'
way of detecting the need for locking, then avoiding the cost
of locking, by only using it when needed, is beneficial.

This requires the detection mechanism to be extremely fast and
simple relative to the cost of acquiring a lock. This was what I 
attempted to describe before, in win32 terms, without much success.

I still can't help thinking that other platforms probably have similar 
possibilities, but I do not know enough of them to describe the 
mechanism in those terms.
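
One way such a cheap test could look, as a sketch only; the shared flag
and the locking macros here are hypothetical, not existing Parrot API:

/* Sketch: take the mutex only when the PMC has ever been made visible
 * to another thread. The is-shared flag would be set once, under a lock,
 * when the PMC is stored into a shared container, and never cleared, so
 * reading it without a lock stays a cheap and safe test. */
static inline void
maybe_lock_pmc(Parrot_Interp interpreter, PMC *pmc)
{
    if (PMC_is_shared(pmc))          /* cheap flag test, no atomic op    */
        LOCK_PMC(interpreter, pmc);  /* expensive path, shared PMCs only */
}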

Nigel. 




RE: Threads... last call

2004-01-23 Thread Gordon Henriksen
Deven T. Corzine wrote:

 The most novel approach I've seen is the one taken by Project UDI 
 (Uniform Driver Interface).

This is very much the ithreads model which has been discussed. The
problem is that, from a functional perspective, it's not so much
threading as it is forking.

-- 

Gordon Henriksen
IT Manager
ICLUBcentral Inc.
[EMAIL PROTECTED]



More on threads

2004-01-23 Thread Gordon Henriksen
Just thought I'd share some more thoughts on threading. I don't think 
the threading proposal is baked, yet, unfortunately.

I've come to agree with Dan: As the threading requirements and the 
architecture stand, parrot requires frequent and automatic locking to 
prevent crashes. This is completely apart from user synchronization.

As the architecture stands? What's wrong with it? I think the most 
problematic items are:

1. parrot's core operations are heavy and multi-step, not lightweight 
and atomic.
-- This makes it harder for parrot to provide a crash-proof environment.
2. PMCs are implemented in C, not PIR.
-- Again, makes parrot's job of providing a crash-proof environment 
much harder. If a small set of safe operations can be guaranteed safe, 
then the crash-proofing bubbles upward.
3. New code tends to appear in parrot's core rather than accumulating in 
a standard library.
-- This bloats the core, increasing our exposure to bugs at that level.
4. Memory in parrot is not type-stable.
-- unions with mutable discriminators are evil, because checking the 
discriminator and accessing the data field could be preempted by a 
change of discriminator and value. Thus, unions containing pointers 
require locking for even read access, lest seg faults or unsafe memory 
accesses occur. Best example: morph. morph must die.

But parrot's already much too far along for the above to change. (ex.: 
morph must die.)

The JVM and CLR have successful threading implementations because their 
core data types are either atomic or amenable to threading. (I've been 
over this before, but I'm playing devil's advocate today.)
-- Many of Perl's core string types, for instance, are not threadsafe
and they never will be. (note: I said Perl, not parrot.) Even if 
implemented on the JVM, Perl's string types would still require locking. 
That Perl doesn't use them yet doesn't mean parrot can't also have data 
structures that are amenable to locking. Immutable strings wouldn't 
require locking on parrot any more than on the JVM, so long as morph and 
transcode could be prevented. (Three cheers for type-stable memory.) If 
parrot can prove that a P-reg will point to a PMC of such-and-such type, 
and can know that such-and-such operation requires no locking on that 
type, it can avoid locking the PMC.

That, and neither environment (any longer) makes any misguided attempt 
to provide user-level consistency when it hasn't been requested.
-- That means they simply don't lock except when the user tells them 
to. No reader-writer locks to update a number.

Like Dan mentioned, there's no JVM magic, but rather there is a lot of 
very careful design. The core is crash-proofed, and is small enough that 
the crash-proofing is reasonable and provable. Atop that is built code 
which inherits that crash-proofing. Thread-safety is a very high-level 
guarantee, only rarely necessary.

Dan Sugalski wrote:

=item All shared PMCs must have a threadsafe vtable

The first thing that any vtable function of a shared PMC must do is to 
aquire the mutex of the PMCs in its parameter list, in ascending 
address order. When the mutexes are released they are not required to 
be released in any order.
Wait a sec. $2->vtable->add(interpreter, $1, $2, $3). That's one dynamic 
dispatch. I see 2 variables that could be shared. I think that's fatal, 
actually.

The algorithm I'd suggest instead is this: Newborn objects couldn't have 
been shared, and as such can safely be accessed without locks. This is a 
lot given how Perl treats values, though certainly not all. All objects 
from foreign sources, which have been passed to another routine, or 
stored into a sharable container must be presumed to require locks. It's 
not as aggressive, true, but I think the overall cost is lower.

To back up Dan: Regarding Leo's timings, everyone that's freaking out 
should remember that he was testing a very fast operation, the worst 
case scenario for the locking overhead. *Of course* the overhead will 
appear high. Most of parrot's operations are much heavier, and the 
locking overhead will be less apparent when those are executing. That 
said, a 400% penalty is too high a price to pay for what, after all, 
isn't even a useful level of threadsafety from a user's standpoint. But, 
again, without respecification and redesign, parrot requires the 
locking. The trick is to lock less.

One way I can see to do that is to move locking upward, so that several 
operations can be carried out under the auspices of one lock. How would 
I propose to do this?

* Add some lock opcodes to PBC. The pluralized version allows parrot to 
acquire locks in ascending address order (hardcoded bubble sorts), 
according to Dan's very important deadlock-avoidance algorithm.
  - op lock(in PMC)
  - op lock2(in PMC, in PMC)
  - ...
  - op lock5(in PMC, in PMC, in PMC, in PMC, in PMC)
  - ...
* Add unlock opcode(s), too. Pluralized? Doesn't matter.
* Force all locks to be released before any of:
- locking more 

Threads... last call

2004-01-22 Thread Dan Sugalski
Last chance to get in comments on the first half of the proposal. If 
it looks adequate, I'll put together the technical details 
(functions, protocols, structures, and whatnot) and send that off for 
abuse^Wdiscussion. After that we'll finalize it, PDD the thing, and 
get the implementation in and going.
--
Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: Threads... last call

2004-01-22 Thread Deven T. Corzine
Dan Sugalski wrote:

Last chance to get in comments on the first half of the proposal. If 
it looks adequate, I'll put together the technical details (functions, 
protocols, structures, and whatnot) and send that off for 
abuse^Wdiscussion. After that we'll finalize it, PDD the thing, and 
get the implementation in and going.
Dan,

Sorry to jump in out of the blue here, but did you respond to Damien 
Neil's message about locking issues?  (Or did I just miss it?)

This sounds like it could be a critically important design question; 
wouldn't it be best to address it before jumping into implementation?  
If there's a better approach available, wouldn't this be the best time 
to determine that?

Deven

Date: Wed, 21 Jan 2004 13:32:52 -0800
From: Damien Neil [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: Re: Start of thread proposal
Message-ID: [EMAIL PROTECTED]
References: [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED]
In-Reply-To: [EMAIL PROTECTED]
Content-Length: 1429
On Wed, Jan 21, 2004 at 01:14:46PM -0500, Dan Sugalski wrote:
... seems to indicate that even whole ops like add P,P,P are atomic.

Yep. They have to be, because they need to guarantee the integrity of 
the pmc structures and the data hanging off them (which includes 
buffer and string stuff)
Personally, I think it would be better to use corruption-resistant
buffer and string structures, and avoid locking during basic data
access.  While there are substantial differences in VM design--PMCs
are much more complicated than any JVM data type--the JVM does provide
a good example that this can be done, and done efficiently.
Failing this, it would be worth investigating what the real-world
performance difference is between acquiring multiple locks per VM
operation (current Parrot proposal) vs. having a single lock
controlling all data access (Python) or jettisoning OS threads
entirely in favor of VM-level threading (Ruby).  This forfeits the
ability to take advantage of multiple CPUs--but Leopold's initial
timing tests of shared PMCs were showing a potential 3-5x slowdown
from excessive locking.
I've seen software before that was redesigned to take advantage of
multiple CPUs--and then required no less than four CPUs to match
the performance of the older, single-CPU version.  The problem was
largely attributed to excessive locking of mostly-uncontested data
structures.
   - Damien




  1   2   3   >