On Jan 11, 2013, at 5:48 PM, Christian Peron <c...@sqrt.ca> wrote:

> Guy,
> 
> Can you please describe the race and how to reproduce it? When we introduced 
> zerocopy-bpf, we also introduced bpf_bufheld() and bpf_bufreclaimed() which 
> are effectively nops for the regular buffer mode. Maybe we could maintain 
> state there as well? In any case, I would like to understand the race/and 
> reproduce it
> 
> Thanks!

Related to rwatson's comment from r175903, dropping the bpf descriptor lock 
while performing the uiomove() in bpfread() enables a couple of races: 1) 
simultaneous reads on a bpf device (not great), and 2) races between bpfread() 
and catchpacket() (panics due to NULL pointer dereference).

The symptom I encountered was a panic in catchpacket() where the save buffer 
was NULL. I added diagnostic code and found that one of the two buffers was 
being lost. As I reviewed the code, it seemed that the only way this could 
happen was if the hold buffer was being invalidated (rotated) out from under 
these lines in bpfread():

        BPFD_UNLOCK(d); 
 
        /* 
         * Move data from hold buffer into user space. 
         * We know the entire buffer is transferred since 
         * we checked above that the read buffer is bpf_bufsize bytes. 
         * 
         * XXXRW: More synchronization needed here: what if a second thread 
         * issues a read on the same fd at the same time?  Don't want this 
         * getting invalidated. 
         */ 
        error = bpf_uiomove(d, d->bd_hbuf, d->bd_hlen, uio); 
 
        BPFD_LOCK(d); 
        d->bd_fbuf = d->bd_hbuf; /* if other code in the driver simultaneously 
sets hbuf to NULL, this loses a buffer! */
        d->bd_hbuf = NULL; 
        d->bd_hlen = 0; 
        bpf_buf_reclaimed(d); 
        BPFD_UNLOCK(d); 

My instrumented code validated this hypothesis, and I added the bd_hbuf_in_use 
flag to allow dropping the descriptor lock over the bpf_uiomove() call while 
protecting the hold buffer from rotation anywhere else in the driver. In my own 
tree, I have left some additional diagnostic code to make sure hbuf is still 
valid after uiomove completes, and that code has not been triggered since I 
added the bd_hbuf_in_use flag and associated code.

The way I was able to trigger the panic in catchpacket() was to call 
pcap_setfilter() more than one time on a live, promiscuous pcap descriptor 
while reading from a very busy network. I was able to temporarily work around 
this issue by using the BIOCSETFNR ioctl directly on the bpf descriptor rather 
than using pcap_setfilter(), which avoided calling reset_d() in the ioctl 
handler and thus avoided clearing bd_hbuf in a race with bpfread().

Just as I'm writing this, I realized the code that triggered the problem may 
have called pcap_setfilter() in a separate thread from the packet reading 
thread. I wrote a unit test for this issue, but as it was single-threaded, I 
was never able to cause the crash with it.



Regarding the zero-copy code, I'm not sure how bpf_bufheld() could be used as a 
general approach to solving this race. It appears to me that a condition 
variable is necessary to work around the necessity of releasing the bpf 
descriptor lock over the uiomove() call.

Hope this helps!

Guy

> 
> On 2012-11-13, at 3:40 PM, Guy Helmer wrote:
> 
>> To try to completely resolve the race in bpfread(), I have put together 
>> these changes to add a flag to indicate when the hold buffer cannot be 
>> modified because it is in use. Since it's my first time using mtx_sleep() 
>> and wakeup(), I wanted to run these past the list to see if I can get any 
>> feedback on the approach.
>> 
>> 
>> Index: bpf.c
>> ===================================================================
>> --- bpf.c    (revision 242997)
>> +++ bpf.c    (working copy)
>> @@ -819,6 +819,7 @@ bpfopen(struct cdev *dev, int flags, int fmt, stru
>>       * particular buffer method.
>>       */
>>      bpf_buffer_init(d);
>> +    d->bd_hbuf_in_use = 0;
>>      d->bd_bufmode = BPF_BUFMODE_BUFFER;
>>      d->bd_sig = SIGIO;
>>      d->bd_direction = BPF_D_INOUT;
>> @@ -872,6 +873,9 @@ bpfread(struct cdev *dev, struct uio *uio, int iof
>>              callout_stop(&d->bd_callout);
>>      timed_out = (d->bd_state == BPF_TIMED_OUT);
>>      d->bd_state = BPF_IDLE;
>> +    while (d->bd_hbuf_in_use)
>> +            mtx_sleep(&d->bd_hbuf_in_use, &d->bd_lock,
>> +                PRINET|PCATCH, "bd_hbuf", 0);
>>      /*
>>       * If the hold buffer is empty, then do a timed sleep, which
>>       * ends when the timeout expires or when enough packets
>> @@ -940,27 +944,27 @@ bpfread(struct cdev *dev, struct uio *uio, int iof
>>      /*
>>       * At this point, we know we have something in the hold slot.
>>       */
>> +    d->bd_hbuf_in_use = 1;
>>      BPFD_UNLOCK(d);
>> 
>>      /*
>>       * Move data from hold buffer into user space.
>>       * We know the entire buffer is transferred since
>>       * we checked above that the read buffer is bpf_bufsize bytes.
>> -     *
>> -     * XXXRW: More synchronization needed here: what if a second thread
>> -     * issues a read on the same fd at the same time?  Don't want this
>> -     * getting invalidated.
>> +     *
>> +     * We do not have to worry about simultaneous reads because
>> +     * we waited for sole access to the hold buffer above.
>>       */
>>      error = bpf_uiomove(d, d->bd_hbuf, d->bd_hlen, uio);
>> 
>>      BPFD_LOCK(d);
>> -    if (d->bd_hbuf != NULL) {
>> -            /* Free the hold buffer only if it is still valid. */
>> -            d->bd_fbuf = d->bd_hbuf;
>> -            d->bd_hbuf = NULL;
>> -            d->bd_hlen = 0;
>> -            bpf_buf_reclaimed(d);
>> -    }
>> +    KASSERT(d->bd_hbuf != NULL, ("bpfread: lost bd_hbuf"));
>> +    d->bd_fbuf = d->bd_hbuf;
>> +    d->bd_hbuf = NULL;
>> +    d->bd_hlen = 0;
>> +    bpf_buf_reclaimed(d);
>> +    d->bd_hbuf_in_use = 0;
>> +    wakeup(&d->bd_hbuf_in_use);
>>      BPFD_UNLOCK(d);
>> 
>>      return (error);
>> @@ -1114,6 +1118,9 @@ reset_d(struct bpf_d *d)
>> 
>>      BPFD_LOCK_ASSERT(d);
>> 
>> +    while (d->bd_hbuf_in_use)
>> +            mtx_sleep(&d->bd_hbuf_in_use, &d->bd_lock, PRINET,
>> +                "bd_hbuf", 0);
>>      if ((d->bd_hbuf != NULL) &&
>>          (d->bd_bufmode != BPF_BUFMODE_ZBUF || bpf_canfreebuf(d))) {
>>              /* Free the hold buffer. */
>> @@ -1254,6 +1261,9 @@ bpfioctl(struct cdev *dev, u_long cmd, caddr_t add
>> 
>>                      BPFD_LOCK(d);
>>                      n = d->bd_slen;
>> +                    while (d->bd_hbuf_in_use)
>> +                            mtx_sleep(&d->bd_hbuf_in_use, &d->bd_lock,
>> +                                PRINET, "bd_hbuf", 0);
>>                      if (d->bd_hbuf)
>>                              n += d->bd_hlen;
>>                      BPFD_UNLOCK(d);
>> @@ -1967,6 +1977,9 @@ filt_bpfread(struct knote *kn, long hint)
>>      ready = bpf_ready(d);
>>      if (ready) {
>>              kn->kn_data = d->bd_slen;
>> +            while (d->bd_hbuf_in_use)
>> +                    mtx_sleep(&d->bd_hbuf_in_use, &d->bd_lock,
>> +                        PRINET, "bd_hbuf", 0);
>>              if (d->bd_hbuf)
>>                      kn->kn_data += d->bd_hlen;
>>      } else if (d->bd_rtout > 0 && d->bd_state == BPF_IDLE) {
>> @@ -2299,6 +2312,9 @@ catchpacket(struct bpf_d *d, u_char *pkt, u_int pk
>>       * spot to do it.
>>       */
>>      if (d->bd_fbuf == NULL && bpf_canfreebuf(d)) {
>> +            while (d->bd_hbuf_in_use)
>> +                    mtx_sleep(&d->bd_hbuf_in_use, &d->bd_lock,
>> +                        PRINET, "bd_hbuf", 0);
>>              d->bd_fbuf = d->bd_hbuf;
>>              d->bd_hbuf = NULL;
>>              d->bd_hlen = 0;
>> @@ -2341,6 +2357,9 @@ catchpacket(struct bpf_d *d, u_char *pkt, u_int pk
>>                      ++d->bd_dcount;
>>                      return;
>>              }
>> +            while (d->bd_hbuf_in_use)
>> +                    mtx_sleep(&d->bd_hbuf_in_use, &d->bd_lock,
>> +                        PRINET, "bd_hbuf", 0);
>>              ROTATE_BUFFERS(d);
>>              do_wakeup = 1;
>>              curlen = 0;
>> Index: bpf.h
>> ===================================================================
>> --- bpf.h    (revision 242997)
>> +++ bpf.h    (working copy)
>> @@ -1235,7 +1235,8 @@ SYSCTL_DECL(_net_bpf);
>> /*
>> * Rotate the packet buffers in descriptor d.  Move the store buffer into the
>> * hold slot, and the free buffer ino the store slot.  Zero the length of the
>> - * new store buffer.  Descriptor lock should be held.
>> + * new store buffer.  Descriptor lock should be held. Hold buffer must
>> + * not be marked "in use".
>> */
>> #define      ROTATE_BUFFERS(d)       do {                                    
>> \
>>      (d)->bd_hbuf = (d)->bd_sbuf;                                    \
>> Index: bpf_buffer.c
>> ===================================================================
>> --- bpf_buffer.c     (revision 242997)
>> +++ bpf_buffer.c     (working copy)
>> @@ -189,6 +189,9 @@ bpf_buffer_ioctl_sblen(struct bpf_d *d, u_int *i)
>>              return (EINVAL);
>>      }
>> 
>> +    while (d->bd_hbuf_in_use)
>> +            mtx_sleep(&d->bd_hbuf_in_use, &d->bd_lock,
>> +                PRINET, "bd_hbuf", 0);
>>      /* Free old buffers if set */
>>      if (d->bd_fbuf != NULL)
>>              free(d->bd_fbuf, M_BPF);
>> Index: bpfdesc.h
>> ===================================================================
>> --- bpfdesc.h        (revision 242997)
>> +++ bpfdesc.h        (working copy)
>> @@ -63,6 +63,7 @@ struct bpf_d {
>>      caddr_t         bd_sbuf;        /* store slot */
>>      caddr_t         bd_hbuf;        /* hold slot */
>>      caddr_t         bd_fbuf;        /* free slot */
>> +    int             bd_hbuf_in_use; /* don't rotate buffers */
>>      int             bd_slen;        /* current length of store buffer */
>>      int             bd_hlen;        /* current length of hold buffer */
>> 
>> _______________________________________________
>> freebsd-net@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-net
>> To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
> 

_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"

Reply via email to