Thanks Petri,

You accurately summarized what we said in arch call today.

Bill, we'll cover the topic again on Wednesday. I have seen how it happened
behind the scenes for DPDK: 1 year architectural discussion of implications
(kind of background mode, coffee time), then little by little patch
additions bottom up (from pool changes to full API impact). I think we need
to be extra-careful with this complex topic.

Cordially,

FF



On 20 February 2017 at 16:01, Savolainen, Petri (Nokia - FI/Espoo) <
petri.savolai...@nokia-bell-labs.com> wrote:

> Hi,
>
> We are already in the phase where code in master should be maintained for
> production quality. There is no hurry to merge in code that has
> questionable quality. Zero copy packet references is not a trivial feature
> to implement. We are in much better position to review, test and use the
> reference code, if it's developed in phases. That's why I propose that we
> do multiple smaller steps:
>
> 1) merge simple, copy based implementation first (in api-next and master),
> which we can be sure that is does not break anything
> 2) write multi-threaded (performance) test apps for refs
> 3) cleanup, optimize normal packet code towards zero-copy multi-ref
> support (minimize places where refs are visible)
> 4) implement zero-copy multi-ref for most obvious use cases: maybe static
> references first ...
> 5) continue multi-ref implementation, or decide that some rare corner
> cases can be left with copy based implementation
>
> This actually follows what has been done before. Add simple, copy based
> implementation first and continue development/optimization from there. So,
> that we can step back and compare easily with previous version, if e.g.
> race conditions are found.
>
> -Petri
>
>
> > -----Original Message-----
> > From: lng-odp [mailto:lng-odp-boun...@lists.linaro.org] On Behalf Of
> Bill
> > Fischofer
> > Sent: Saturday, February 18, 2017 6:28 PM
> > To: Francois Ozog <francois.o...@linaro.org>
> > Cc: lng-odp@lists.linaro.org
> > Subject: Re: [lng-odp] [API-NEXT PATCHv7 2/5] linux-generic: packet:
> > implement reference apis
> >
> > On Sat, Feb 18, 2017 at 9:57 AM, Francois Ozog <francois.o...@linaro.org
> >
> > wrote:
> >
> > > Well, problem is still there.
> > > You are doing something on a packet that may not exist anymore.
> > >
> >
> > Can you elaborate? The bug fix patch eliminates the race condition that
> > Janne pointed out because no thread manipulates a packet after
> > decrementing
> > the ref_count other than to free it if that operation decremented the
> > ref_count to 0.
> >
> >
> > >
> > > On 17 February 2017 at 22:08, Bill Fischofer <
> bill.fischo...@linaro.org>
> > > wrote:
> > >
> > >> I've posted patch http://patches.opendataplane.org/patch/8155/ to
> > >> address this issue.  It goes on api-next on top of patches
> > >> http://patches.opendataplane.org/patch/7879/ and
> > >> http://patches.opendataplane.org/patch/8154/
> > >>
> > >> On Fri, Feb 17, 2017 at 2:39 PM, Bill Fischofer
> > >> <bill.fischo...@linaro.org> wrote:
> > >> > First off, thank you very much for this review.
> > >> >
> > >> > Please note that this code has been streamlined in patch
> > >> > http://patches.opendataplane.org/patch/7879/ and has been further
> > >> > refined with patch http://patches.opendataplane.org/patch/8145/ but
> > >> > the exposure you identify still exists in that code.
> > >> >
> > >> > On Fri, Feb 17, 2017 at 11:31 AM, Peltonen, Janne (Nokia - FI/Espoo)
> > >> > <janne.pelto...@nokia.com> wrote:
> > >> >> Hi,
> > >> >>
> > >> >> I took a look at the packet references and it seems to me that
> > >> >> either the implementation is a bit racy or I confused myself
> > >> >> when reading the code. Or maybe I got the intended concurrency
> > >> >> semantics of the packet references wrong?
> > >> >>
> > >> >> My first issue is that packet_free() may access freed packet
> > >> >> header or corrupt unshared_len.
> > >> >>
> > >> >> The packet free function looks like this:
> > >> >>
> > >> >> static inline void packet_free(odp_packet_hdr_t *pkt_hdr)
> > >> >> {
> > >> >>         odp_packet_hdr_t *ref_hdr;
> > >> >>         uint32_t ref_count;
> > >> >>
> > >> >>         do {
> > >> >>                 ref_hdr = pkt_hdr->ref_hdr;
> > >> >>                 ref_count = packet_ref_count(pkt_hdr) - 1;
> > >> >>                 free_bufs(pkt_hdr, 0, pkt_hdr->buf_hdr.segcount);
> > >> >>
> > >> >>                 if (ref_count == 1)
> > >> >>                         pkt_hdr->unshared_len = pkt_hdr->frame_len;
> > >> >>
> > >> >>                 pkt_hdr = ref_hdr;
> > >> >>         } while (pkt_hdr);
> > >> >> }
> > >> >>
> > >> >> The problem here is that decrementing the ref_count, checking
> > >> >> its value and updating unshared_len is not single atomic
> > >> >> operation. By the time packet_free() checks if ref_count == 1
> > >> >> (i.e. if there is exactly one other reference left somewhere),
> > >> >> the true ref_count may have already been changed by another
> > >> >> thread doing packet_free() or packet_ref().
> > >> >>
> > >> >> For example, if two threads have a reference to the same packet
> > >> >> then execution (or the relevant memory ops) may get "interleaved"
> > >> >> as follows:
> > >> >>
> > >> >> T1: call packet_free()
> > >> >> T1: ref_count = packet_ref_count(pkt_hdr) - 1;
> > >> >> At this point ref_count variable is 1
> > >> >> T1: call free_bufs()
> > >> >> T1: call packet_ref_dec()
> > >> >> Now the ref_count of the packet header is 1.
> > >> >> T2: call and complete packet_free()
> > >> >> Thread 2 sees refcount 1 in the packet and frees the buffers
> > >> >> T1: pkt_hdr->unshared_len = pkt_hdr->frame_len;
> > >> >> Thread 1 accesses freed buffer for reading and writing.
> > >> >
> > >> > I agree. These steps should be reversed so that the code should
> read:
> > >> >
> > >> > if (ref_count == 1)
> > >> >    pkt_hdr->unshared_len = pkt_hdr->frame_len;
> > >> >
> > >> > free_bufs(pkt_hdr, 0, pkt_hdr->buf_hdr.segcount);
> > >> >
> > >> > Or using the code with the above two patches applied, the code
> should
> > >> read:
> > >> >
> > >> > static inline void packet_free(odp_packet_hdr_t *pkt_hdr)
> > >> > {
> > >> >         odp_packet_hdr_t *ref_hdr;
> > >> >         uint32_t ref_count;
> > >> >         int num_seg;
> > >> >
> > >> >         do {
> > >> >                 ref_count = packet_ref_count(pkt_hdr);
> > >> >                 num_seg = pkt_hdr->buf_hdr.segcount;
> > >> >                 ref_hdr = pkt_hdr->ref_hdr;
> > >> >
> > >> >                 if (odp_likely((CONFIG_PACKET_MAX_SEGS == 1 ||
> > num_seg
> > >> == 1) &&
> > >> >                     ref_count == 1)) {
> > >> >                         buffer_free_multi((odp_buffer_t
> > >> > *)&pkt_hdr->buf_hdr.handle.handle, 1);
> > >> >                 } else {
> > >> >                         if (ref_count == 2)
> > >> >                                 pkt_hdr->unshared_len =
> > >> pkt_hdr->frame_len;
> > >> >
> > >> >                         free_bufs(pkt_hdr, 0, num_seg);
> > >> >                  }
> > >> >
> > >> >                  pkt_hdr = ref_hdr;
> > >> >         } while (pkt_hdr);
> > >> > }
> > >> >
> > >> > The mistake was trying to optimize things so that unshared_len is
> not
> > >> > set if the buffers are being freed, but that exposes these race
> > >> > conditions. So the worst that should now happen is that it is set
> > >> > unnecessarily before being freed.
> > >> >
> > >> > If you concur I'll fold this fix into a v3 for patch
> > >> > http://patches.opendataplane.org/patch/8145/
> > >> >
> > >> >>
> > >> >> Similarly, if T2 created a new reference, T1 would have
> > >> >> a wrong idea of the number of remaining references and
> > >> >> would adjust the unshared_len to an incorrect value.
> > >> >>
> > >> >> Right?
> > >> >>
> > >> >> Maybe other modifications of unshared_len are also racy.
> > >> >
> > >> > I don't believe so, because references do not change the existing
> ODP
> > >> > restriction that two threads cannot share the same odp_packet_t.
> > When
> > >> > a packet reference is created it returns a separate odp_packet_t
> that
> > >> > has its own metadata. So unshared_len is always private to an
> > >> > individual odp_packet_t. The exception is static references but in
> > >> > this case the entire
> > >> > packet along with its metadata must be treated as read only so
> > >> > operations like odp_packet_push_head() that would try to modify
> > >> > unshared_len are prohibited.
> > >> >
> > >> >>
> > >> >>
> > >> >>
> > >> >> The second issue is that the atomic ops for setting and
> > >> >> reading the ref count seem to have too relaxed memory
> > >> >> ordering. In particular, packet_ref_dec() must not happen
> > >> >> (be visible to other threads) before its caller is done
> > >> >> with the packet and the related memory accesses have
> > >> >> completed. Now there does not seem to be any optimization
> > >> >> and memory barrier to prevent the ref count decrementing
> > >> >> happening too early. So I think it is at least theoretically
> > >> >> possible that a thread e.g. reads from a packet buffer
> > >> >> after it has already been freed by another thread, somehow
> > >> >> like this:
> > >> >>
> > >> >> Source code order:
> > >> >> T1: interesting_data = read_from_pkt(pkt)
> > >> >> T1: packet_free(pkt)
> > >> >>
> > >> >> Order visible to T2:
> > >> >> 1: ref count decr
> > >> >> 2: read from pkt
> > >> >>
> > >> >> Now if T2 goes and frees the remaining reference between
> > >> >> steps 1 and 2, T1 may get even more interesting data.
> > >> >>
> > >> >> Right?
> > >> >
> > >> > I don't believe so. The semantics of odp_atomic_fetch_dec_u32(),
> > which
> > >> > is what packet_ref_dec() uses, says that no two calls can see the
> > same
> > >> > fetched value, so only one thread will return ref_count == 1 and
> free
> > >> > the buffer. Note that if I see ref_count == 1 no other thread can be
> > >> > trying to increment it via a concurrent odp_packet_ref() call
> because
> > >> > that would mean that two threads were trying to manipulate the same
> > >> > odp_packet_t, which is prohibited.
> > >> >
> > >> > For CPUs that support out of order instruction execution, this is
> > only
> > >> > permitted providing the reordering and speculative executions are
> > >> > semantically consistent with sequential execution. If this were not
> > >> > the case you'd constantly have to worry about a processor turning
> > >> >
> > >> > T1: interesting_data = read_from_pkt(pkt)
> > >> > T1: packet_free(pkt)
> > >> >
> > >> > into
> > >> >
> > >> > T1: packet_free(pkt)
> > >> > T1: interesting_data = read_from_pkt(pkt)
> > >> >
> > >> > In your scenario above: T2 cannot be issuing a read to pkt after
> > >> > ref_count is decremented because the only way that T2 could be
> > >> > decrementing ref_count would be if T2 issued an odp_packet_free()
> > call
> > >> > for it. Obviously if it tries to reference pkt after such a call
> that
> > >> > is an application error.
> > >> >
> > >> > Thanks again for your much-appreciated help in looking at this!
> > >> >
> > >> >>
> > >> >>         Janne
> > >> >>
> > >> >>
>



-- 
[image: Linaro] <http://www.linaro.org/>
François-Frédéric Ozog | *Director Linaro Networking Group*
T: +33.67221.6485
francois.o...@linaro.org | Skype: ffozog

Reply via email to