At 5:58 PM -0500 1/22/04, Josh Wilmes wrote:
I'm also concerned by those timings that Leo posted.
0.0001 vs 0.0005 ms on a set -- that magnitude of locking overhead
seems pretty crazy to me.

It looks about right. Don't forget, part of what you're seeing isn't that locking mutexes is slow, it's that parrot does a lot of stuff awfully fast. It's also a good idea to get more benchmarks before jumping to any conclusions -- changing designs based on a single, first-cut, quick-n-dirty benchmark isn't necessarily a wise thing.
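
A useful follow-up benchmark is easy to sketch. Something like the following (the names and iteration count are illustrative, not Leo's actual test) separates the cost of an uncontended lock/unlock pair from the cost of the store it guards:

    #include <pthread.h>
    #include <stdio.h>
    #include <time.h>

    #define ITERS 10000000L

    static volatile long value;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static double secs(struct timespec a, struct timespec b) {
        return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
    }

    int main(void) {
        struct timespec t0, t1, t2;
        long i;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (i = 0; i < ITERS; i++)
            value = i;                      /* the bare "set" */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        for (i = 0; i < ITERS; i++) {
            pthread_mutex_lock(&lock);      /* uncontended lock/unlock pair */
            value = i;
            pthread_mutex_unlock(&lock);
        }
        clock_gettime(CLOCK_MONOTONIC, &t2);

        printf("bare store:   %.3fs\n", secs(t0, t1));
        printf("locked store: %.3fs\n", secs(t1, t2));
        return 0;
    }

If the gap looks like a small constant per lock rather than a multiple of the op cost, that supports the "parrot ops are just very fast" reading.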


It seems like a few people have said that the JVM style of locking
can reduce this, so to me it merits some serious
consideration, even if it may require some changes to the design of
parrot.

There *is* no "JVM-style" locking. I've read the docs and looked at the specs, and they're not doing anything special, nor anything different from what we're doing. Some of the low-level details differ because Java has more immutable base data structures (which don't require locking) than we do. Going more immutable is an option, but one we're not taking, since it penalizes things we'd rather not penalize (string handling, mainly).
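
A minimal sketch of what "immutable means no locking" buys you (illustrative names, not Parrot's string API): once built, the bytes never change, so any number of threads can share and read the string with no synchronization; "modifying" it means allocating a new one, which is exactly the string-handling penalty mentioned above.

    #include <stdlib.h>
    #include <string.h>

    typedef struct {
        size_t len;
        char   data[];              /* filled at creation, never touched again */
    } ImmutableStr;

    /* error handling omitted throughout */
    ImmutableStr *istr_new(const char *src, size_t len) {
        ImmutableStr *s = malloc(sizeof *s + len);
        s->len = len;
        memcpy(s->data, src, len);
        return s;
    }

    /* "Modification" builds a fresh string; the inputs are untouched, so
       concurrent readers of a and b need no synchronization at all. */
    ImmutableStr *istr_concat(const ImmutableStr *a, const ImmutableStr *b) {
        ImmutableStr *s = malloc(sizeof *s + a->len + b->len);
        s->len = a->len + b->len;
        memcpy(s->data, a->data, a->len);
        memcpy(s->data + a->len, b->data, b->len);
        return s;
    }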


There is no "JVM Magic" here. If you're accessing shared data, it has to be locked. There's no getting around that. The only way to reduce locking overhead is to reduce the amount of data that needs locking.

I'm not familiar enough with the implementation details here to say much
one way or another. But it seems to me that if this is one of those
low-level decisions that will be impossible to change later and will
forever constrain perl's performance, then it's important not to rush
into a bad choice because it seems more straightforward.

This can all be redone if we need to -- the locking and threading strategies can be altered in a dozen ways or ripped out and rewritten, as none of them affect the semantics of bytecode execution.


At 17:24 on 01/22/2004 EST, "Deven T. Corzine" <[EMAIL PROTECTED]> wrote:

Dan Sugalski wrote:

 > Last chance to get in comments on the first half of the proposal. If
 > it looks adequate, I'll put together the technical details (functions,
 > protocols, structures, and whatnot) and send that off for
 > abuse^Wdiscussion. After that we'll finalize it, PDD the thing, and
 > get the implementation in and going.

Dan,

 Sorry to jump in out of the blue here, but did you respond to Damien
 Neil's message about locking issues?  (Or did I just miss it?)

This sounds like it could be a critically important design question;
wouldn't it be best to address it before jumping into implementation?
If there's a better approach available, wouldn't this be the best time
to determine that?


Deven

Date: Wed, 21 Jan 2004 13:32:52 -0800
From: Damien Neil <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Subject: Re: Start of thread proposal

 On Wed, Jan 21, 2004 at 01:14:46PM -0500, Dan Sugalski wrote:
 > >... seems to indicate that even whole ops like add P,P,P are atomic.
 >
 > Yep. They have to be, because they need to guarantee the integrity of
 > the pmc structures and the data hanging off them (which includes
 > buffer and string stuff)
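
Concretely, that atomicity means the op must hold every operand's lock for its duration. A sketch of one way to do it, with illustrative types rather than real Parrot structures -- acquiring in address order so two threads running the op with swapped operands can't deadlock:

    #include <pthread.h>

    typedef struct PMC {
        pthread_mutex_t lock;
        long            int_val;    /* stand-in for real PMC guts */
    } PMC;

    void op_add(PMC *dest, PMC *src1, PMC *src2) {
        PMC *p[3] = { dest, src1, src2 }, *t;
        int i;

        /* sort operands by address so every thread acquires locks in
           the same global order */
        if (p[0] > p[1]) { t = p[0]; p[0] = p[1]; p[1] = t; }
        if (p[1] > p[2]) { t = p[1]; p[1] = p[2]; p[2] = t; }
        if (p[0] > p[1]) { t = p[0]; p[0] = p[1]; p[1] = t; }

        for (i = 0; i < 3; i++)
            if (i == 0 || p[i] != p[i - 1])     /* skip aliased operands */
                pthread_mutex_lock(&p[i]->lock);

        dest->int_val = src1->int_val + src2->int_val;

        for (i = 2; i >= 0; i--)
            if (i == 0 || p[i] != p[i - 1])
                pthread_mutex_unlock(&p[i]->lock);
    }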

 Personally, I think it would be better to use corruption-resistant
 buffer and string structures, and avoid locking during basic data
 access.  While there are substantial differences in VM design--PMCs
 are much more complicated than any JVM data type--the JVM does provide
 a good example that this can be done, and done efficiently.
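
One corruption-resistant scheme, sketched here with illustrative names: keep the string body immutable and make the only mutable thing a single pointer to it, updated atomically. Readers never lock; they just load the current snapshot. (Safely reclaiming old bodies is the hard part; this sketch hands it to a garbage collector.)

    #include <stdatomic.h>
    #include <stddef.h>

    typedef struct {
        size_t len;
        char   data[];          /* never modified after creation */
    } StrBody;

    typedef struct {
        _Atomic(StrBody *) body;
    } SharedStr;

    StrBody *str_read(SharedStr *s) {
        /* lock-free: a consistent snapshot, current or slightly stale */
        return atomic_load(&s->body);
    }

    void str_replace(SharedStr *s, StrBody *fresh) {
        /* writers swap in a fully-built body, so readers never see a
           half-written one; the old body is left for GC to collect */
        atomic_store(&s->body, fresh);
    }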

 Failing this, it would be worth investigating what the real-world
 performance difference is between acquiring multiple locks per VM
 operation (current Parrot proposal) vs. having a single lock
 controlling all data access (Python) or jettisoning OS threads
 entirely in favor of VM-level threading (Ruby).  This forfeits the
 ability to take advantage of multiple CPUs--but Leopold's initial
 timing tests of shared PMCs were showing a potential 3-5x slowdown
 from excessive locking.
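
The Python-style single lock is simple to picture; a sketch with illustrative names -- one interpreter-wide mutex taken around each op, so per-PMC locks vanish but only one thread runs VM code at a time:

    #include <pthread.h>

    static pthread_mutex_t vm_lock = PTHREAD_MUTEX_INITIALIZER;
    static long pc, program_len = 1000;

    static void dispatch_one_op(void) { pc++; }   /* stand-in for real dispatch */

    void run_ops(void) {
        for (;;) {
            pthread_mutex_lock(&vm_lock);    /* one interpreter-wide lock per
                                                op, instead of per-PMC locks */
            if (pc >= program_len) {
                pthread_mutex_unlock(&vm_lock);
                break;
            }
            dispatch_one_op();
            pthread_mutex_unlock(&vm_lock);  /* the unlock is the only point
                                                where another thread can run */
        }
    }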

 I've seen software before that was redesigned to take advantage of
 multiple CPUs--and then required no fewer than four CPUs to match
 the performance of the older, single-CPU version.  The problem was
 largely attributed to excessive locking of mostly-uncontested data
 structures.

- Damien



--
                                        Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                         have teddy bears and even
                                      teddy bears get drunk
