Yes, I do understand the reader needs barriers, too. I guess I was wondering more why the reader would need something stronger than what dependencies, etc., could enforce. I guess I'll read what Martin forwarded first.

Alex


On 09/12/2014 21:37, David Holmes wrote:
See my earlier response to Martin. The reader has to force a consistent view of memory - the writer can't, as the write escapes before it can issue the barrier.
David

    -----Original Message-----
    From: concurrency-interest-boun...@cs.oswego.edu
    [mailto:concurrency-interest-boun...@cs.oswego.edu] On Behalf Of Oleksandr Otenko
    Sent: Wednesday, 10 December 2014 6:04 AM
    To: Hans Boehm; dhol...@ieee.org
    Cc: core-libs-dev; concurrency-inter...@cs.oswego.edu
    Subject: Re: [concurrency-interest] RFR: 8065804: JEP171: Clarifications/corrections for fence intrinsics

    On 26/11/2014 02:04, Hans Boehm wrote:
    To be concrete here, on Power, loads can normally be ordered by
    an address dependency or a light-weight fence (lwsync).  However,
    neither is enough to prevent the questionable outcome for IRIW,
    since neither ensures that the stores in T1 and T2 will be made
    visible to other threads in a consistent order.  That outcome can
    be prevented by using heavyweight fence (sync) instructions
    between the loads instead.
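
    For concreteness, here is a rough Java-level rendering of the IRIW shape
    being discussed; the class and field names are made up for illustration
    and are not from this thread:

    // IRIW litmus test in Java terms -- a minimal sketch, hypothetical names.
    class IRIW {
        static volatile int x = 0, y = 0;
        static int r1, r2, r3, r4;

        static void writer1() { x = 1; }             // T1
        static void writer2() { y = 1; }             // T2
        static void reader1() { r1 = x; r2 = y; }    // T3: reads x, then y
        static void reader2() { r3 = y; r4 = x; }    // T4: reads y, then x

        // Java volatile semantics forbid r1 == 1, r2 == 0, r3 == 1, r4 == 0,
        // i.e. the two readers observing the two stores in opposite orders.
        // An address dependency or lwsync between each reader's loads does not
        // rule this out on Power; a full sync between the loads does.
    }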

    Why would they need fences between loads instead of syncing the
    order of stores?


    Alex


    Peter Sewell's group concluded that to enforce correct volatile
    behavior on Power, you essentially need a heavyweight fence
    between every pair of volatile operations.  That cannot
    be understood based on simple ordering constraints.
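
    As a sketch of one pairing that forces the heavyweight fence, a volatile
    store followed by a volatile load in the same thread (again, the names are
    made up for illustration):

    // Store-buffering (Dekker-style) litmus test -- hypothetical names.
    class StoreBuffering {
        static volatile int x = 0, y = 0;
        static int r1, r2;

        static void t1() { x = 1; r1 = y; }   // volatile store, then volatile load
        static void t2() { y = 1; r2 = x; }   // volatile store, then volatile load

        // Java forbids r1 == 0 && r2 == 0 for volatile x and y.  On Power,
        // lwsync does not order a store before a later load, so the JIT has to
        // place a heavyweight sync between each thread's store and its load.
    }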

    As Stephan pointed out, there are similar issues on ARM, but
    they're less commonly encountered in a Java implementation.  If
    you're lucky, you can get to the right implementation recipe by
    looking only at reordering, I think.


    On Tue, Nov 25, 2014 at 4:36 PM, David Holmes <davidchol...@aapt.net.au> wrote:

        Stephan Diestelhorst writes:
        >
        > David Holmes wrote:
        > > Stephan Diestelhorst writes:
        > > > On Tuesday, 25 November 2014 at 11:15:36, Hans Boehm wrote:
        > > > > I'm no hardware architect, but fundamentally it seems to me that
        > > > >
        > > > > load x
        > > > > acquire_fence
        > > > >
        > > > > imposes a much more stringent constraint than
        > > > >
        > > > > load_acquire x
        > > > >
        > > > > Consider the case in which the load from x is an L1 hit, but a
        > > > > preceding load (from, say, y) is a long-latency miss.  If we enforce
        > > > > ordering by just waiting for completion of the prior operation, the
        > > > > former has to wait for the load from y to complete, while the
        > > > > latter doesn't.  I find it hard to believe that this doesn't leave
        > > > > an appreciable amount of performance on the table, at least for
        > > > > some interesting microarchitectures.
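
        As a sketch of the two shapes being contrasted, expressed with the
        later VarHandle API rather than the Unsafe fence intrinsics under
        review in this thread (class and field names are made up):

        import java.lang.invoke.MethodHandles;
        import java.lang.invoke.VarHandle;

        class AcquireShapes {
            int x, y;
            static final VarHandle X;
            static {
                try {
                    X = MethodHandles.lookup()
                            .findVarHandle(AcquireShapes.class, "x", int.class);
                } catch (ReflectiveOperationException e) {
                    throw new ExceptionInInitializerError(e);
                }
            }

            int loadThenAcquireFence() {
                int v = x;                  // plain load of x
                VarHandle.acquireFence();   // orders all earlier loads (including an
                                            // unrelated long-latency load of y) before
                                            // anything that follows
                return v;
            }

            int loadAcquire() {
                return (int) X.getAcquire(this);  // only this one load is an acquire;
                                                  // in principle a pending miss on y
                                                  // need not be waited for
            }
        }

        Whether a JIT can actually exploit that latitude is exactly the
        performance question raised above.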
        > > >
        > > > I agree, Hans, that this is a reasonable assumption.  Load_acquire x
        > > > does allow roach motel, whereas the acquire fence does not.
        > > >
        > > > > In addition, for better or worse, fencing requirements on at least
        > > > > Power are actually driven as much by store atomicity issues, as by
        > > > > the ordering issues discussed in the cookbook.  This was not
        > > > > understood in 2005, and unfortunately doesn't seem to be amenable to
        > > > > the kind of straightforward explanation as in Doug's cookbook.
        > > >
        > > > Coming from a strongly ordered architecture to a weakly ordered one
        > > > myself, I also needed some mental adjustment about store (multi-copy)
        > > > atomicity.  I can imagine others will be unaware of this difference,
        > > > too, even in 2014.
        > >
        > > Sorry, I'm missing the connection between fences and multi-copy
        > > atomicity.
        >
        > One example is the classic IRIW.  With non-multi-copy-atomic stores, but
        > ordered (say through a dependency) loads in the following example:
        >
        > Memory: foo = bar = 0
        > _T1_         _T2_         _T3_                              _T4_
        > st (foo),1   st (bar),1   ld r1, (bar)                      ld r3, (foo)
        >                           <addr dep / local "fence" here>   <addr dep>
        >                           ld r2, (foo)                      ld r4, (bar)
        >
        > You may observe r1 = 1, r2 = 0, r3 = 1, r4 = 0 on non-multi-copy-atomic
        > machines.  On TSO boxes, this is not possible.  That means that the
        > memory fence that will prevent such a behaviour (DMB on ARM) needs to
        > carry some additional oomph in ensuring multi-copy atomicity, or rather
        > prevent you from seeing it (which is the same thing).

        I take it as given that any code for which you may have ordering
        constraints must first have basic atomicity properties for loads and
        stores.  I would not expect any kind of fence to add
        multi-copy atomicity where there was none.

        David

        > Stephan
        >