On Tuesday 14 October 2008 05:34:46 Roberto A. Foglietta wrote: > 2008/10/14 Denys Vlasenko <[EMAIL PROTECTED]>: > > On Tuesday 14 October 2008 10:48:54 am Rob Landley wrote: > >> On Sunday 12 October 2008 23:48:22 Rob Landley wrote: > >> > Next time it reads a buffer, it starts with the last character of a > >> > cursor left sequence: capital D. Capital D is "delete to end of > >> > line", which it does. > >> > > >> > So basically, busybox vi is corrupting your data when you cursor > >> > around in a file on a loaded system. Wheee... > >> > >> Hmmm... I redid the readit() code to only read ahead when processing an > >> escape sequence. (This let me shrink the readahead buffer to 8 bytes, > >> actually 5 but with 32 bit alignment 8 makes more sense. Bloatcheck > >> says I shrunk the code by 17 bytes.) > > > > Disregard my previous patch, I just looked at your code and it's > > as good but it's smaller than mine, so let's use yours. > > > >> Unfortunately, this mitigated the problem a bit, but didn't actually > >> _fix_ it. It happens less often, but I can still trigger it. > >> > >> I _think_ this is actually a qemu issue. The escape sequences are being > >> generated by the host Linux, which are then sent to the qemu process > >> over a virtual serial console, which breaks them down into individual > >> bytes with an interrupt for each. > >> > >> This means that the blocking we're depending on to parse escape > >> sequences doesn't work over a serial console. You _can get an escape > >> character by itself with poll saying there's no more data, and then on > >> the next clock cycle you can get a "[D". > >> > >> Hmmm... > >> > >> Ok, making poll wait 300 miliseconds before deciding there's no next > >> character in a pending escape sequence seems to have fixed it. (At > >> least I can't reproduce the problem under qemu anymore.) > > > > Please document this next time, or someone else might come later > > and delete the timeout. I did this a few mins ago :( will fix it now. > > > > Did you try something smaller than 300ms? > > As far as I understood the problem: considering a 1200bps line, 120 > chars per second, 40 escape sequence per seconds, then the minimum > timeout should be at least 1/40 sec = 25 ms.
It's not the number of escape sequences per second, it's the delay between each character in the escape sequence. (The problem is when we don't recognize that this string of characters should be interpreted as a block, and instead interpret them as individual characters.) So if this was the _only_ issue, the minimum delay needed is 1/120th of a second. (But see below.) > This is for a fixed speed > line, considering an asynchronous data line with an average speed of > 1200bps and a bell curve variance of 25 ms then using a timeout of > 100ms would catch the 0.999936657516% of the escape sequences. I'm not following that at all. If it's just a question of giving the next character time to come in at 1200bps, a 25 ms delay should catch 100.00% of the escape sequences. There's no "variance" in the timing of characters coming in on a serial line, if they didn't work on a hard clock the data would be corrupted at the physical link layer. > Enlarging the timeout to 150ms 0.999999998027%. Over 200ms should not > make any sense any more if the variance has been correctly estimated > in 25 ms. > > http://en.wikipedia.org/wiki/Normal_distribution Where does statistics enter into serial communications? No, the specific case I'm worried about is virtual systems running under an emulator. (In addition to kvm and xen and such, I bang on qemu a lot to test busybox on non-x86 platforms. A virtual serial console is a standard way to talk to that sort of thing.) The emulator process is a normal userspace process on the host system, subject to the process scheduler and thus capable of having its execution suspended at any time for an arbitrary length of time on a loaded host system. When the host system schedules away the emulator process, the virtual system's processor pauses. But the poll() timeout is in wall clock time, which doesn't pause. This means that when the emulator _does_ resume, there's a race between the timer interrupt (telling the poll to return because there was no data within the requested time period) and the serial interrupt (queueing the new data). Which one takes precedence? I have no idea, that's a "dig into the guts of the Linux kernel" thing. If the timeout take precedence over the servicing the serial interrupt, then we need a long enough delay to make sure that the emulator process gets scheduled to handle the next character before the timeout. If the serial interrupt takes precedence over the timer, then even a delay of 25 miliseconds should be plenty. (The other fun little detail is that the _sending_ process on the other end of the serial line can get scheduled away, but it submits data to the kernel as a multi-character block so even if your terminal process gets scheduled away, it either hasn't sent the escape sequence yet or has queued the whole thing into the kernel's serial output buffer which isn't subject to userspace scheduling. At least as long as the write blocks until complete instead of returning a short write. I think.) I think I can work out a test for this (write a script suspending the qemu process with "while true; do kill -STOP $PID; sleep 1; kill -CONT $PID; sleep 1; done" and then hold down "cursor left" in vi for a couple minutes and see if it zaps the line it's one. Those delays are long enough that even the current 300 ms wait to connect a partial sequence won't be long enough, so it should either manifest the bug or let us know that we can cut the delay down to something unnoticeable like 50ms and still be safe.) (P.S. Most network things send data in larger blocks (256 bytes for ppp, 1500 bytes for 100baseT, 65k for gigabit) so this problem doesn't easily manifest via ssh, just via serial connection. It could theoretically manifest via ssh on a poor connection where you queue up 500+ cursor movements, have an escape sequence cross packet boundaries due to the MTU size, and then have the second packet delayed by retransmission. But that's kind of unlikely.) Rob _______________________________________________ busybox mailing list busybox@busybox.net http://busybox.net/cgi-bin/mailman/listinfo/busybox