Following up on this - I was able to poke around in DDB a little. When this read call blocks (even though it shouldn’t, in multiple ways), the kernel trace looks like this:
db{0}> trace/t 139 trace: pid 313 lid 1 at 0xbace3c7c 0xbace3c7c: netbsd:mi_switch+0x10 0xbace3cac: netbsd:sleepq_block+0x16c 0xbace3ce4: netbsd:cv_timedwait_sig+0x108 0xbace3d14: netbsd:ttysleep+0x80 0xbace3d94: netbsd:ttread+0x308 0xbace3dcc: netbsd:ucomread+0x230 0xbace3e4c: netbsd:spec_read+0xcc 0xbace3e7c: netbsd:VOP_READ+0x48 0xbace3eb4: netbsd:vn_read+0x8c 0xbace3f04: netbsd:dofileread+0x98 0xbace3f34: netbsd:sys_read+0x70 0xbace3fac: netbsd:syscall+0x18c Next step is to look into spec_read and below, to see if I can figure out why it’s not returning a byte immediately. Rob > On Feb 22, 2019, at 11:41 AM, Rob Newberry <robthed...@mac.com> wrote: > > I’ve been debugging this from as many angles as I can, and right now, the > summary question is this: > > Why would “read( fd, but, 1);” block for almost 30 seconds, when: > > - the descriptor is in non-blocking mode > - poll indicates the descriptor is safely readable > - top indicates there is no CPU load (machine is also > interactively responsive) > > Any ideas? I think my next step is to trace down through the “read” system > call and try to see where it’s blocking, but so far I’ve gotten a little lost > trying to figure out the path of the read call into the TTY code (mostly > because I don’t know all this code, and I’m slowly figuring out how > “/dev/ttyU0” ends up mapping to a vnode, how the v_ops get mapped, where > TTY’s buffering gets mapped in, etc. I will eventually figure this out, but > it’s a bit painful just reading and adding printfs :-). > > > > Here’s some more details: > > It may indeed be unrelated to USB/umodem. > > The code does use select() and poll() to wait for input from the serial port. > > Eventually, the code decides that it can read without blocking from the port > (both select and poll have indicated that the descriptor is safely readable). > > It then calls "read( fd, but, 1 );” (the code is adamant about reading a byte > at a time). > > This call blocks for a LONG time. > > But the descriptor is in non-blocking mode. I’ve added debug code just > before the “read” call that (a) checks the flags to make sure the descriptor > is non-blocking, (b) does a “poll” once more to make sure the descriptor > appears readable, and (c) prints out the time before the call — and then I > added a print out of the time AFTER the “read()” call completes (using > clock_gettime(CLOCK_MONOTONIC)); > > In every case, the descriptor is in non-blocking mode, and poll says it is > readable. > > But the “read” call takes 25 seconds or more! > > I’ve got “top” running in another shell — there’s no CPU load while I run > this. > > Here’s the output: > > FDWrapper::read 0x7f7fffab3479, 1 > in non-blocking mode (flags = 0x00004006) > poll indicates descriptor is readable (0x00000040) > time before read = 8436, 316529950 > FDWrapper::read - got 1 > time after read = 8461, 854271618 > > > Here’s the code that generates the above output: > > > ssize_t > FDWrapper::read(void* data, size_t len) > { > > printf("FDWrapper::read %p, %u\n", data, (unsigned int)len ); > > { > int fl; > fl = fcntl( mFDRead, F_GETFL ); > if ( fl & O_NONBLOCK ) > { > printf( "\tin non-blocking mode (flags = 0x%08X)\n", fl > ); > } > else > { > printf( "\tin BLOCKING mode (flags = 0x%08X)\n", fl ); > } > } > > { > struct pollfd pfd; > int flags = POLLRDNORM|POLLERR|POLLNVAL|POLLHUP; > int err; > > pfd.fd = mFDRead; > pfd.events = flags; > pfd.revents = 0; > > err = poll( &pfd, 1, 0 ); > if ( ( err > 0 ) && ( pfd.revents & flags ) ) > { > printf( "\tpoll indicates descriptor is readable > (0x%08lX)\n", (long)(pfd.revents & flags ) ); > } > else > { > printf( "\tpoll indicates descriptor is NOT readable > (err = %d, 0x%08lX)\n", err, (long)(pfd.revents & flags ) ); > } > } > > { > struct timespec start; > clock_gettime( CLOCK_MONOTONIC, &start ); > printf( "\ttime before read = %lu, %lu\n", start.tv_sec, > start.tv_nsec ); > } > > ssize_t ret = ::read(mFDRead, data, len); > printf("\tFDWrapper::read - got %d\n", (int)ret ); > > { > struct timespec stop; > clock_gettime( CLOCK_MONOTONIC, &stop ); > printf( "\ttime after read = %lu, %lu\n", stop.tv_sec, > stop.tv_nsec ); > } > > > … > > > > > > > >> On Feb 18, 2019, at 11:05 AM, Michael van Elst <mlel...@serpens.de> wrote: >> >> mar...@duskware.de (Martin Husemann) writes: >> >>> On Mon, Feb 18, 2019 at 09:11:51AM -0800, Rob Newberry wrote: >>>> I'm pretty sure there are differences in how NetBSD's USB modem class >>>> driver interacts with the device from how other platforms do. For the >>>> time being, I'm trying to read the code of the NetBSD and the FreeBSD >>>> drivers, along with the USB CDC spec, to see what's going on. >> >>> Hmm, I thought I would be using umodem, but actually it is u3g. >>> However, I am pretty sure that at that level the driver does not >>> do anything special but just pass on the data in both ways. >> >> I'm using umodem for a 3G interface, nothing special there. >> >> You also see from the traces that the same data is sent and the >> first byte received is also the same. This could be as simple >> as a timeout on NetBSD, I first thought something was using very >> small timeouts that cannot be handled with our low HZ value, but >> the code seems to use only timeouts of at least several seconds. >> >> The code also uses a rare combination of select() and poll() to >> wait for input. I tend to believe the problem is not related to >> USB or umodem at all. >> >> -- >> -- >> Michael van Elst >> Internet: mlel...@serpens.de >> "A potential Snark may lurk in every tree." >