Thanks Petri, You accurately summarized what we said in arch call today.
Bill, we'll cover the topic again on Wednesday. I have seen how it happened behind the scenes for DPDK: 1 year architectural discussion of implications (kind of background mode, coffee time), then little by little patch additions bottom up (from pool changes to full API impact). I think we need to be extra-careful with this complex topic. Cordially, FF On 20 February 2017 at 16:01, Savolainen, Petri (Nokia - FI/Espoo) < petri.savolai...@nokia-bell-labs.com> wrote: > Hi, > > We are already in the phase where code in master should be maintained for > production quality. There is no hurry to merge in code that has > questionable quality. Zero copy packet references is not a trivial feature > to implement. We are in much better position to review, test and use the > reference code, if it's developed in phases. That's why I propose that we > do multiple smaller steps: > > 1) merge simple, copy based implementation first (in api-next and master), > which we can be sure that is does not break anything > 2) write multi-threaded (performance) test apps for refs > 3) cleanup, optimize normal packet code towards zero-copy multi-ref > support (minimize places where refs are visible) > 4) implement zero-copy multi-ref for most obvious use cases: maybe static > references first ... > 5) continue multi-ref implementation, or decide that some rare corner > cases can be left with copy based implementation > > This actually follows what has been done before. Add simple, copy based > implementation first and continue development/optimization from there. So, > that we can step back and compare easily with previous version, if e.g. > race conditions are found. > > -Petri > > > > -----Original Message----- > > From: lng-odp [mailto:lng-odp-boun...@lists.linaro.org] On Behalf Of > Bill > > Fischofer > > Sent: Saturday, February 18, 2017 6:28 PM > > To: Francois Ozog <francois.o...@linaro.org> > > Cc: lng-odp@lists.linaro.org > > Subject: Re: [lng-odp] [API-NEXT PATCHv7 2/5] linux-generic: packet: > > implement reference apis > > > > On Sat, Feb 18, 2017 at 9:57 AM, Francois Ozog <francois.o...@linaro.org > > > > wrote: > > > > > Well, problem is still there. > > > You are doing something on a packet that may not exist anymore. > > > > > > > Can you elaborate? The bug fix patch eliminates the race condition that > > Janne pointed out because no thread manipulates a packet after > > decrementing > > the ref_count other than to free it if that operation decremented the > > ref_count to 0. > > > > > > > > > > On 17 February 2017 at 22:08, Bill Fischofer < > bill.fischo...@linaro.org> > > > wrote: > > > > > >> I've posted patch http://patches.opendataplane.org/patch/8155/ to > > >> address this issue. It goes on api-next on top of patches > > >> http://patches.opendataplane.org/patch/7879/ and > > >> http://patches.opendataplane.org/patch/8154/ > > >> > > >> On Fri, Feb 17, 2017 at 2:39 PM, Bill Fischofer > > >> <bill.fischo...@linaro.org> wrote: > > >> > First off, thank you very much for this review. > > >> > > > >> > Please note that this code has been streamlined in patch > > >> > http://patches.opendataplane.org/patch/7879/ and has been further > > >> > refined with patch http://patches.opendataplane.org/patch/8145/ but > > >> > the exposure you identify still exists in that code. > > >> > > > >> > On Fri, Feb 17, 2017 at 11:31 AM, Peltonen, Janne (Nokia - FI/Espoo) > > >> > <janne.pelto...@nokia.com> wrote: > > >> >> Hi, > > >> >> > > >> >> I took a look at the packet references and it seems to me that > > >> >> either the implementation is a bit racy or I confused myself > > >> >> when reading the code. Or maybe I got the intended concurrency > > >> >> semantics of the packet references wrong? > > >> >> > > >> >> My first issue is that packet_free() may access freed packet > > >> >> header or corrupt unshared_len. > > >> >> > > >> >> The packet free function looks like this: > > >> >> > > >> >> static inline void packet_free(odp_packet_hdr_t *pkt_hdr) > > >> >> { > > >> >> odp_packet_hdr_t *ref_hdr; > > >> >> uint32_t ref_count; > > >> >> > > >> >> do { > > >> >> ref_hdr = pkt_hdr->ref_hdr; > > >> >> ref_count = packet_ref_count(pkt_hdr) - 1; > > >> >> free_bufs(pkt_hdr, 0, pkt_hdr->buf_hdr.segcount); > > >> >> > > >> >> if (ref_count == 1) > > >> >> pkt_hdr->unshared_len = pkt_hdr->frame_len; > > >> >> > > >> >> pkt_hdr = ref_hdr; > > >> >> } while (pkt_hdr); > > >> >> } > > >> >> > > >> >> The problem here is that decrementing the ref_count, checking > > >> >> its value and updating unshared_len is not single atomic > > >> >> operation. By the time packet_free() checks if ref_count == 1 > > >> >> (i.e. if there is exactly one other reference left somewhere), > > >> >> the true ref_count may have already been changed by another > > >> >> thread doing packet_free() or packet_ref(). > > >> >> > > >> >> For example, if two threads have a reference to the same packet > > >> >> then execution (or the relevant memory ops) may get "interleaved" > > >> >> as follows: > > >> >> > > >> >> T1: call packet_free() > > >> >> T1: ref_count = packet_ref_count(pkt_hdr) - 1; > > >> >> At this point ref_count variable is 1 > > >> >> T1: call free_bufs() > > >> >> T1: call packet_ref_dec() > > >> >> Now the ref_count of the packet header is 1. > > >> >> T2: call and complete packet_free() > > >> >> Thread 2 sees refcount 1 in the packet and frees the buffers > > >> >> T1: pkt_hdr->unshared_len = pkt_hdr->frame_len; > > >> >> Thread 1 accesses freed buffer for reading and writing. > > >> > > > >> > I agree. These steps should be reversed so that the code should > read: > > >> > > > >> > if (ref_count == 1) > > >> > pkt_hdr->unshared_len = pkt_hdr->frame_len; > > >> > > > >> > free_bufs(pkt_hdr, 0, pkt_hdr->buf_hdr.segcount); > > >> > > > >> > Or using the code with the above two patches applied, the code > should > > >> read: > > >> > > > >> > static inline void packet_free(odp_packet_hdr_t *pkt_hdr) > > >> > { > > >> > odp_packet_hdr_t *ref_hdr; > > >> > uint32_t ref_count; > > >> > int num_seg; > > >> > > > >> > do { > > >> > ref_count = packet_ref_count(pkt_hdr); > > >> > num_seg = pkt_hdr->buf_hdr.segcount; > > >> > ref_hdr = pkt_hdr->ref_hdr; > > >> > > > >> > if (odp_likely((CONFIG_PACKET_MAX_SEGS == 1 || > > num_seg > > >> == 1) && > > >> > ref_count == 1)) { > > >> > buffer_free_multi((odp_buffer_t > > >> > *)&pkt_hdr->buf_hdr.handle.handle, 1); > > >> > } else { > > >> > if (ref_count == 2) > > >> > pkt_hdr->unshared_len = > > >> pkt_hdr->frame_len; > > >> > > > >> > free_bufs(pkt_hdr, 0, num_seg); > > >> > } > > >> > > > >> > pkt_hdr = ref_hdr; > > >> > } while (pkt_hdr); > > >> > } > > >> > > > >> > The mistake was trying to optimize things so that unshared_len is > not > > >> > set if the buffers are being freed, but that exposes these race > > >> > conditions. So the worst that should now happen is that it is set > > >> > unnecessarily before being freed. > > >> > > > >> > If you concur I'll fold this fix into a v3 for patch > > >> > http://patches.opendataplane.org/patch/8145/ > > >> > > > >> >> > > >> >> Similarly, if T2 created a new reference, T1 would have > > >> >> a wrong idea of the number of remaining references and > > >> >> would adjust the unshared_len to an incorrect value. > > >> >> > > >> >> Right? > > >> >> > > >> >> Maybe other modifications of unshared_len are also racy. > > >> > > > >> > I don't believe so, because references do not change the existing > ODP > > >> > restriction that two threads cannot share the same odp_packet_t. > > When > > >> > a packet reference is created it returns a separate odp_packet_t > that > > >> > has its own metadata. So unshared_len is always private to an > > >> > individual odp_packet_t. The exception is static references but in > > >> > this case the entire > > >> > packet along with its metadata must be treated as read only so > > >> > operations like odp_packet_push_head() that would try to modify > > >> > unshared_len are prohibited. > > >> > > > >> >> > > >> >> > > >> >> > > >> >> The second issue is that the atomic ops for setting and > > >> >> reading the ref count seem to have too relaxed memory > > >> >> ordering. In particular, packet_ref_dec() must not happen > > >> >> (be visible to other threads) before its caller is done > > >> >> with the packet and the related memory accesses have > > >> >> completed. Now there does not seem to be any optimization > > >> >> and memory barrier to prevent the ref count decrementing > > >> >> happening too early. So I think it is at least theoretically > > >> >> possible that a thread e.g. reads from a packet buffer > > >> >> after it has already been freed by another thread, somehow > > >> >> like this: > > >> >> > > >> >> Source code order: > > >> >> T1: interesting_data = read_from_pkt(pkt) > > >> >> T1: packet_free(pkt) > > >> >> > > >> >> Order visible to T2: > > >> >> 1: ref count decr > > >> >> 2: read from pkt > > >> >> > > >> >> Now if T2 goes and frees the remaining reference between > > >> >> steps 1 and 2, T1 may get even more interesting data. > > >> >> > > >> >> Right? > > >> > > > >> > I don't believe so. The semantics of odp_atomic_fetch_dec_u32(), > > which > > >> > is what packet_ref_dec() uses, says that no two calls can see the > > same > > >> > fetched value, so only one thread will return ref_count == 1 and > free > > >> > the buffer. Note that if I see ref_count == 1 no other thread can be > > >> > trying to increment it via a concurrent odp_packet_ref() call > because > > >> > that would mean that two threads were trying to manipulate the same > > >> > odp_packet_t, which is prohibited. > > >> > > > >> > For CPUs that support out of order instruction execution, this is > > only > > >> > permitted providing the reordering and speculative executions are > > >> > semantically consistent with sequential execution. If this were not > > >> > the case you'd constantly have to worry about a processor turning > > >> > > > >> > T1: interesting_data = read_from_pkt(pkt) > > >> > T1: packet_free(pkt) > > >> > > > >> > into > > >> > > > >> > T1: packet_free(pkt) > > >> > T1: interesting_data = read_from_pkt(pkt) > > >> > > > >> > In your scenario above: T2 cannot be issuing a read to pkt after > > >> > ref_count is decremented because the only way that T2 could be > > >> > decrementing ref_count would be if T2 issued an odp_packet_free() > > call > > >> > for it. Obviously if it tries to reference pkt after such a call > that > > >> > is an application error. > > >> > > > >> > Thanks again for your much-appreciated help in looking at this! > > >> > > > >> >> > > >> >> Janne > > >> >> > > >> >> > -- [image: Linaro] <http://www.linaro.org/> François-Frédéric Ozog | *Director Linaro Networking Group* T: +33.67221.6485 francois.o...@linaro.org | Skype: ffozog