ron minnich wrote:

we're doing some work here with Andrey's port of ssh2. It *almost*
works. But I'm seeing a stack trace I don't understand.

I can't give you all the details -- it's ssh, therefore it is pretty
awful -- but here is the short form: There is a proc called fromnet()
which has this inner loop:
        for(;;){
                if((n = libssh2_channel_read(c, buf, Bufsize)) > 0)
                        write(1, buf, n);
                else
                        goto Donenet;
        }

When this proc is entered, ape has forked off two procs to handle the
fd 'c'. From the fromnet function, we see the libssh2_channel_read
does a select. Here is where I get confused. The stk() for the two
procs looks like this:
pread()+0x7 /sys/src/libc/9syscall/pread.s:5
read(fd=0x5,buf=0x110414,n=0x1000)+0x2f /sys/src/libc/9sys/read.c:7
recv(flags=0x0,fd=0x5,a=0x110414,n=0x1000)+0x3e /sys/src/ape/lib/bsd/send.c:30
libssh2_packet_read(session=0x1102f8)+0x176
/usr/bootes/libssh2/libssh2-0.18/src/transport.c:326
libssh2_channel_read_ex(channel=0x114460,buflen=0x1000,stream_id=0x0,buf=0xdfffdee8)+0x2a7
/usr/bootes/libssh2/libssh2-0.18/src/channel.c:1442
fromnet(c=0x114460,s=0x1102f8)+0x2e
/usr/bootes/libssh2/libssh2-0.18/clients/ssh2.c:75
main(argc=0x2,argv=0xdfffef94)+0x47c
/usr/bootes/libssh2/libssh2-0.18/clients/ssh2.c:253
_main+0x31 /sys/src/libc/386/main9.s:16

Note the read on fd 5. That's the socket. Here is the other proc.

_PREAD()+0x7 /sys/src/ape/lib/ap/syscall/_PREAD.s:5
_READ(fd=0x5,buf=0x600003c,n=0x2000)+0x2f /sys/src/ape/lib/ap/plan9/9read.c:10
_copyproc(b=0x6000028,fd=0x5)+0x86 /sys/src/ape/lib/ap/plan9/_buf.c:166
_startbuf(fd=0x5)+0x1dd /sys/src/ape/lib/ap/plan9/_buf.c:107
select(timeout=0xdfffde90,rfds=0xdfffde80,wfds=0x0,efds=0x0,nfds=0x6)+0xe9
/sys/src/ape/lib/ap/plan9/_buf.c:292
libssh2_waitsocket(session=0x1102f8,seconds=0x0)+0x7b
/usr/bootes/libssh2/libssh2-0.18/src/packet.c:1054
libssh2_channel_read_ex(channel=0x114460,buflen=0x1000,stream_id=0x0,buf=0xdfffdee8)+0x69
/usr/bootes/libssh2/libssh2-0.18/src/channel.c:1408
fromnet(c=0x114460,s=0x1102f8)+0x2e
/usr/bootes/libssh2/libssh2-0.18/clients/ssh2.c:75
main(argc=0x2,argv=0xdfffef94)+0x47c
/usr/bootes/libssh2/libssh2-0.18/clients/ssh2.c:253
_main+0x31 /sys/src/libc/386/main9.s:16

ok, I think this stack is a bit messed up, since I don't see how we
can have the copyproc in the call chain from select(), but ... is it?

Plan 9 has no select functionality. Select is emulated in APE by
forking a child proc that reads an fd and fills a buffer (in a shared
memory area). read() should then pick up the data from the buffer and
wake up the reader proc if it is sleeping (because the buffer filled
up). select() will start such a reader proc (_startbuf()) if the fd is
not already buffered and then check whether the buffer has data
available, so the stack trace looks valid to me.

Maybe the buffered file descriptors don't work with the recv() call
and are only implemented for read()? I think you should find some kind
of switch in read() that checks whether the fd is buffered and then
calls some _buf.c function that copies the data from the buffer.
Maybe this is missing for recv()?
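
To make the kind of check I mean concrete, here is a rough sketch in
portable C. It is not the APE code: Fdbuf, Fdbufsize and buffered_read
are names made up for illustration, and the real structures in _buf.c
will differ. The point is only that every reading entry point (read()
and recv() alike) would have to go through a test like this once a
copy proc owns the fd.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

enum { Fdbufsize = 8192 };

/* Hypothetical per-fd state; not the actual structures in APE's _buf.c. */
typedef struct Fdbuf Fdbuf;
struct Fdbuf {
	int buffered;		/* nonzero once select() has started a copy proc */
	size_t rp;		/* next unread byte in data */
	size_t wp;		/* one past the last byte stored by the copy proc */
	char data[Fdbufsize];
};

/*
 * Read from the shared buffer if the fd is buffered,
 * otherwise fall through to the plain read().
 */
ssize_t
buffered_read(Fdbuf *b, int fd, void *dst, size_t len)
{
	size_t avail;

	assert(b != NULL && dst != NULL);
	if(!b->buffered)
		return read(fd, dst, len);	/* nobody else is reading this fd */

	assert(b->rp <= b->wp && b->wp <= (size_t)Fdbufsize);
	avail = b->wp - b->rp;
	if(avail == 0)
		return 0;	/* a real implementation would sleep here until data arrives */
	if(len > avail)
		len = avail;
	memcpy(dst, b->data + b->rp, len);
	b->rp += len;
	return (ssize_t)len;
}
```

If recv() skips a check of this sort and reads the fd directly, it
competes with the copy proc for the same bytes, which would match the
two procs both sitting in a read on fd 5.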

I realize there is very little information here, sorry ... here's what
is bothering me. It seems we have two procs hanging on a read on fd 5.
I think the copyproc and some other proc are in conflict but ... I am
unsure. The problems we are seeing might be explained by the wrong
proc grabbing output at the wrong time -- it feels like a race
condition. Any acid trips we can take to hammer this one down?

Anyone ever done a select on a socket in ape?
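
One way to test that question apart from libssh2 might be a small
program that follows the same pattern as the traces: select on the fd,
then read it. The sketch below is plain POSIX C. wait_readable and
read_when_ready are made-up helper names, and the header names are the
POSIX ones, so they may need adjusting under APE.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns >0 if fd is readable, 0 on timeout, -1 on error. */
int
wait_readable(int fd, int seconds)
{
	fd_set rfds;
	struct timeval tv;

	assert(fd >= 0 && fd < FD_SETSIZE);
	FD_ZERO(&rfds);
	FD_SET(fd, &rfds);
	tv.tv_sec = seconds;
	tv.tv_usec = 0;
	return select(fd + 1, &rfds, NULL, NULL, &tv);
}

/*
 * Select first, then read: the same order of calls as
 * libssh2_waitsocket() followed by the read in the traces.
 * Returns 0 on timeout and -1 on select error.
 */
ssize_t
read_when_ready(int fd, void *buf, size_t len, int seconds)
{
	int r;

	r = wait_readable(fd, seconds);
	if(r <= 0)
		return r;
	return read(fd, buf, len);
}
```

Calling read_when_ready() in a loop on a connected socket, and then
swapping the read() for recv(), should show whether the two paths
behave differently once select has been called on the fd.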

