On Fri, Mar 29, 2019 at 08:19:12PM +0100, Martijn van Duren wrote:
> Running getdelim on a nonblocking socket results in data loss of the
> first part of the message if the said message is sent in chunks.
> Code below shows how to repeat.
>
> POSIX states the following:
> For the conditions under which the getdelim() and getline() functions
> shall fail and may fail, refer to fgetc.
>
> And for fgetc:
> [EAGAIN]
> The O_NONBLOCK flag is set for the file descriptor underlying stream
> and the thread would be delayed in the fgetc() operation
>
> This to me reads that the first call should retain the data in the
> buffer and the second call should return the entire sentence.
> I also ran the code on Alpine Linux (musl libc) and Linux Mint (glibc).
> Musl behaves just like us and glibc returns the first part of the
> sentence without the delimiter (returning a positive value, indicating
> there's no error so far), which is a violation of the specs, which
> states that the data returned contains the newline.
>
> I also looked at the getdelim.c code, but I don't have the knowledge/
> time to send a diff at this time. But maybe someone has some useful
> input on this.

Hmmm, interesting corner case.  Clobbering the data without ever
telling the caller about it is bad.

One solution is something like: check in the error case if we read
anything, and if so then "put it all back" a la ungetc(3).

I don't think we can just store what we read in the buffer across
calls.  The only state we're given when we call getdelim(3) is the
size of the buffer, which is at least buflen.  But the spec says
nothing about the caller using the same buffer between calls: even if
that's idiomatic we can't rely on the application to do that for us.

I'm beat, so this isn't happening immediately, but I think if we
refactored __srefill() and added something like __sappend() to append
new data to the FILE's buffer (growing it if necessary) and then
changed the getdelim(3) logic to __sappend() if fp->_r > 0 and the
delimiter is not in the buffer, we'd... have done what I just
described.  But I need to look more closely at stdio to figure out
how... and to make sure I'm not talking nonsense.

We sort of do something similar in fgetln(3) already, but we use an
auxiliary buffer.

Maybe of note is that our fgetc(3) docs claim C89 behavior, not all
the additional stuff POSIX.1-2008 specifies.  Which doesn't make our
getdelim(3) non-conforming per se, but it makes it difficult for the
application developer to even know that this case is possible.
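
Until something like that lands, an application can sidestep stdio and
keep the partial line itself.  Below is a rough sketch of that idea,
not a proposed libc change: struct linebuf, set_nonblock() and
linebuf_getdelim() are hypothetical names made up for illustration.
It reads with read(2) into a caller-owned buffer, so whatever arrived
before an EAGAIN is still there on the next call.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper type: bytes read so far that don't form a line yet. */
struct linebuf {
	char	*data;
	size_t	 len;	/* bytes currently held */
	size_t	 cap;	/* allocated size */
};

/* Hypothetical helper: put fd into non-blocking mode. */
int
set_nonblock(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags == -1)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: detach the first n held bytes as a C string. */
static char *
linebuf_take(struct linebuf *lb, size_t n)
{
	char *line = malloc(n + 1);

	if (line == NULL)
		return NULL;
	memcpy(line, lb->data, n);
	line[n] = '\0';
	memmove(lb->data, lb->data + n, lb->len - n);
	lb->len -= n;
	return line;
}

/*
 * Hypothetical helper, not a libc interface.
 * Returns 1 and stores a malloc'd line (delimiter included, except for
 * an unterminated last line at EOF) in *out, 0 at EOF with nothing
 * pending, -1 on error.  On -1 with errno == EAGAIN the bytes read so
 * far stay in lb, so a later call can complete the line.
 */
int
linebuf_getdelim(struct linebuf *lb, int fd, int delim, char **out)
{
	for (;;) {
		char *p = lb->len ? memchr(lb->data, delim, lb->len) : NULL;
		ssize_t r;

		if (p != NULL) {
			*out = linebuf_take(lb, (size_t)(p - lb->data) + 1);
			return *out == NULL ? -1 : 1;
		}
		if (lb->cap - lb->len < 512) {
			/* Grow the buffer before reading more. */
			size_t ncap = lb->cap ? lb->cap * 2 : 1024;
			char *nd = realloc(lb->data, ncap);

			if (nd == NULL)
				return -1;
			lb->data = nd;
			lb->cap = ncap;
		}
		r = read(fd, lb->data + lb->len, lb->cap - lb->len);
		if (r == -1)
			return -1;	/* e.g. EAGAIN: partial data is kept */
		if (r == 0) {
			if (lb->len == 0)
				return 0;
			/* EOF: hand back the unterminated remainder. */
			*out = linebuf_take(lb, lb->len);
			return *out == NULL ? -1 : 1;
		}
		lb->len += (size_t)r;
		assert(lb->len <= lb->cap);
	}
}
```

The caller starts with a zeroed struct linebuf, calls
linebuf_getdelim() whenever poll(2) reports the descriptor readable,
and simply retries later on EAGAIN.  The cost is that the descriptor
can no longer be mixed with stdio reads on the same stream.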