>>>>> "EM" == Elizabeth Mattijsen <[EMAIL PROTECTED]> writes:

  >> ding! ding! ding! you just brought in a cpu specific instruction which
  >> is not guaranteed to be on any other arch. in fact many have such a
  >> beast but again, it is not accessible from c.

  EM> I just _can't_ believe I'm hearing this.  So what if it's not
  EM> accessible from C?  Could we just not build a little C-program that
  EM> would create a small in whatever loadable library?  Or have a
  EM> post-processing run through the binary image inserting the right
  EM> machine instructions in the right places?  Not being from a *nix
  EM> background, but more from a MS-DOS background, I've been used to
  EM> inserting architecture specific machine codes from higher level
  EM> languages into executable streams since 1983!  Don't tell me that's
  EM> not "done" anymore?  ;-)

it is not that it isn't done anymore but the effect has to be the same
on machines without test/set. and on top of that, it still needs to be a
kernel level operation so a thread can block on the lock. that is the
more important issue, and it makes using a test/set in user space a moot
point.

  EM> I disagree.  I'm not a redmond fan, so I agree with a lot of your
  EM> _sentiment_, but you should also realize that a _lot_ of intel
  EM> hardware is running Linux.  Heck, even some Solarises run on it.

we are talking maybe 10-20 architectures out there that we would want
parrot to run on. maybe more. how many does p5 run on now?

  EM> The portability is in Parrot itself: not by using the lowest common
  EM> denominator of C runtime systems out there _today_!   It will take a
  EM> lot of trouble to create a system that will run everywhere, but that's
  EM> just what makes it worthwhile.  Not that it offers the same limited
  EM> capabilities on all systems!

but we need a common denominator of OS features even more than of cpu
features. the fibre/thread stuff is redmond only. and they still require
system calls. so as i said the test/set is not a stopping point
(dijkstra) but the OS support is. how and where and when we lock is the
only critical factor and that hasn't been decided yet. we don't want to
lock at global thread levels and we are not sure we can lock at PMC or
object levels (GC and alloc can break that). we should be focusing on
that issue. think about how DBs did it. sybase used to do page locking
(coarse grained) since it was faster (this was 15 years ago) and they
had the fastest engine. but when multithreading and multicpu designs
came in, finer grained row locking was faster (oracle). sybase fell
behind and has not caught up. we have the same choices to make so we
need to study locking algorithms and techniques from that perspective
and not how to do a single lock (test/set vs kernel). but i will keep
reiterating that it has to be a kernel lock since we must block threads
and GC and such without spinning or manual scheduling (fibres).

  >> virtual ram is what counts on unix. you can't request some large amount
  >> without using real swap space. it may not allocate real ram until later
  >> (page faulting on demand) but it is swap space that counts. it is used
  >> up as soon as you allocate it.

  EM> So, maybe a wrapper is needed, either for *nix, or for Win32, or maybe both.

this is very different behavior IMO and not something that can be
wrapped easily. i could be wrong but separating virtual allocation from
real allocation can't be emulated without kernel support. and we need
the same behavior on all platforms. this again brings up how we lock so
that GC/alloc will work properly with threads. do we lock a thread pool
but not the thread when we access a shared thingy? that is a medium
grain lock. can the GC/alloc break the lock if it is called inside that
operation? or could only the pool inside the active thread do that? what
about a shared object alloced from thread A's pool that triggers an
alloc when being accessed in thread B? these are the questions that need
to be asked and answered. i was just trying to point out to nigel that
the intel/redmond solutions are not portable as they require OS
support and that all locks need to be kernel level. given that
requirement, we need to decide how to do the locks so those questions
can be answered with reasonable efficiency. of course a single global
lock would work but that stinks and we all know it. so what is the lock
granularity? how do we handle GC/alloc across shared objects?

  EM> This sounds too much like dogma to me.  Why?  Isn't Parrot about
  EM> borgifying all good things from all OS's and VM's, now and in the
  EM> future?   ;-)

but parrot can only use a common set of features across OS's. we can't
use a redmond feature that can't be emulated on other platforms. and my
karma ran over my dogma :( :-)

  >> hardware is tossed out with portable VM design. we have a user space
  >> process written in c with a VM and VM threads that are to be based on
  >> kernel threads. the locking issues for shared objects is the toughest
  >> nut to crack right now. there is no simple or fast solution to this
  >> given the known constraints. intel/redmond specific solutions are not
  >> applicable (though we can learn from them).

  EM> Then please lets.  Instead of tossing them without further investigation.

i didn't toss them out as much as dismiss them as a portable solution.
i have asked a bunch of questions. i would like to see some suggested
solutions.

  >> ok, i can see where you got the test/set and yield stuff from now.
  >> fibres seem to be manually scheduled threads. this means user code has
  >> to decide to yield to another thread blocking on the same lock. this
  >> means all locks must have a list of threads/fibres blocking on that
  >> lock. and manually scheduled stuff is not appropriate for
  >> parrot. finally, it is redmond specific again and very unportable. i
  >> have never heard of this fibre concept on any unix flavor.

  EM> It's never too late.  Maybe Parrot can learn from it.

manual scheduling is not a good solution IMO. it means all locks must
have a good amount of code executed to yield the thread and figure out
what thread to be run next, etc. the kernel already does this with its
scheduler so we would be reinventing a big wheel. also the kernel still
does scheduling underneath our scheduling so it would be even slower.

  >> and that is not possible on other platforms. also it means parrot would
  >> have to have its own scheduler with all the pain that brings.

  EM> Ah, but the joy when it _does_ come together!   ;-)

heh. having done low level rtos stuff with its own 'scheduler' (really
event queues) i can remember the joy. but it works best on raw iron. in
user space it is much trickier. proper event loop code is in effect a
manual scheduler as you use 'return' from event handler code instead of
yield and that gets you back to the main loop. you also have to either
avoid blocking operations (i/o, etc.) or shunt them off to
threads/processes where they can block and not block the main
thread/process. to do manual scheduling of threads would be even
harder. we would need to work around every blocking operation and either
let the kernel reschedule another thread or do a test/set lock and yield
(this is from the fibre stuff i read) and wait until we get rescheduled
and do the same thing again. this is effectively spinning with kernel
yields in each cycle. not a nice solution IMO.

  >> your ideas make sense but only on redmond/intel which is not the target
  >> space for parrot.

  EM> Wouldn't that need to read "Intel", and therefore include most of
  EM> Linuxes running out there?  Are we talking CPU specific or OS specific
  EM> here?  If OS specific, but still on Intel, it still is some set of
  EM> machine instructions at some level that could be borgified. Don't tell
  EM> me Redmond has the monopoly on using some of the Intel x86 machine
  EM> instructions?

a combination of both IMO. the test/set instruction is intel (but many
cpus have something similar) but the fibre stuff (manual thread
scheduling) seems to be redmond specific. using a kernel lock is
portable but again brings up the lock granularity issue. i suggest we
focus on that problem and not about how we do the locking.

uri

-- 
Uri Guttman  ------  [EMAIL PROTECTED]  -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs  ----------------------------  http://jobs.perl.org