How do I get the buffered bytes in a FILE *?
A bunch of protocols (git, http, mbox, etc.) start with lines of data followed by a block of data, so it's natural to want to call getline() and then handle the data block. But getline() takes a FILE * and things like zlib and sendfile() take an integer file descriptor.

Posix lets me get the file descriptor out of a FILE * with fileno(), but the point of FILE * is to read ahead and buffer. How do I get the buffered data out without reading more from the file descriptor?

I can't find a portable way to do this?

In musl I can #include <stdio_ext.h> and then use __freadahead(fp) to get the number of bytes. Various other packages do an if/else staircase mucking with the internals of every libc they personally know, a la:

https://sources.debian.org/src/m4/1.4.18-2/lib/freadahead.c

But I can't find a portable way to do this?

Rob
Re: How do I get the buffered bytes in a FILE *?
On 4/11/22 13:54, Person Who Did Not CC The List wrote:
> On 4/11/22 2:48 PM, Rob Landley via austin-group-l at The Open Group wrote:
>
>> But I can't find a portable way to do this?
>
> There isn't one.

How do I request one? (Does WG14 have a mailing list?)

Rob
Re: How do I get the buffered bytes in a FILE *?
On 4/11/22 14:20, Another Person Who Did Not CC The List wrote:
> On Mon, Apr 11, 2022 at 01:48:06PM -0500, Rob Landley via austin-group-l at The Open Group wrote:
>> A bunch of protocols (git, http, mbox, etc) start with lines of data followed by a block of data, so it's natural to want to call getline() and then handle the data block. But getline() takes a FILE * and things like zlib and sendfile() take an integer file descriptor.
>>
>> Posix lets me get the file descriptor out of a FILE * with fileno(), but the point of FILE * is to readahead and buffer. How do I get the buffered data out without reading more from the file descriptor?
>>
>> I can't find a portable way to do this?
>
> Can you forgo using stdio entirely and just use open? Most API's that take a file descriptor are generally prepared to do large I/O requests rather than character-by-character I/O anyway. Modern kernels will buffer the file in the file/buffer cache.

That's what I did for years (toybox had a get_line() function instead of getline() that did byte at a time reads with realloc()), introduced in 2007:

https://github.com/landley/toybox/blob/bc07865a504c/lib/lib.c#L594

But the Android guys complained it was really slow and ugly, which it was. (Sure Linux buffers it but a system call per byte is still nuts, drives the scheduler batty, makes your strace output impossible to follow...)

Plus I was reimplementing a libc API which I try not to do without a really good reason, so when getline() was added to posix in 2008 I switched over MOST users to the new posix api...

But the byte-at-a-time get_line() can't QUITE go away:

https://github.com/landley/toybox/commit/15cbb92dffc8

(Hands off to sendfile() when it's out of hunks...)

And it keeps wanting to come back:

https://github.com/landley/toybox/commit/601828982a53

(Adding gzip/deflate support means http 1.1 data lines are followed by gzipped data payload.)

Last week a contributor implementing a subset of "git" ran into the same problem...

http://lists.landley.net/pipermail/toybox-landley.net/2022-April/012817.html

(Internally git's file format is a bunch of "keyword: value" text lines followed by payload.)

It's come up other times over the years. Perennial problem. Unix has been doing "keyword:value lines followed by payload" since the days of mbox files. Sure http bodies started out as text, but they didn't stay that way...

It seems like a simple question to ask a FILE *: "what data do you have buffered?" The reply "we're not capable of answering that question, therefore the programmer shouldn't ever want to ask it" seems... fixable?

Rob

(Note that if I implement my own get_line() with extra leftover buffer handed off between line reads to avoid the byte-at-a-time inefficiency, I've reinvented FILE *. And none of the above use cases guarantee seekable input, so it can't put the extra data it read BACK into the file descriptor.)
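For readers who have not seen the byte-at-a-time approach described above, here is a minimal sketch of the idea in C. It is not toybox's actual get_line(); the names fd_get_line() and demo_fd_get_line() are made up for illustration, and EINTR handling is left out. Because it never reads past the newline, the descriptor is left positioned exactly at the start of the payload, at the cost of one read() system call per byte.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: read one line from fd a byte at a time so nothing
 * past the newline is consumed. Returns a malloc()ed string without the
 * newline, or NULL on EOF/error when no data was read. */
char *fd_get_line(int fd)
{
    char *line = NULL, c;
    size_t len = 0, cap = 0;

    assert(fd >= 0);
    for (;;) {
        ssize_t n = read(fd, &c, 1);        /* one system call per byte */
        if (n <= 0) {
            if (len) break;                 /* last line had no newline */
            free(line);
            return NULL;
        }
        if (c == '\n') break;
        if (len + 2 > cap) {                /* room for c and the NUL */
            char *tmp = realloc(line, cap = cap ? cap * 2 : 64);
            if (!tmp) { free(line); return NULL; }
            line = tmp;
        }
        line[len++] = c;
    }
    if (!line && !(line = malloc(1))) return NULL;   /* empty line */
    line[len] = 0;
    return line;
}

/* Demo: a header line followed by a payload, sent through a pipe.
 * Returns 0 if the payload is still readable from the fd afterwards. */
int demo_fd_get_line(void)
{
    static const char msg[] = "Content-Length: 7\npayload";
    char rest[16];
    int p[2], ok;

    if (pipe(p)) return -1;
    if (write(p[1], msg, sizeof(msg) - 1) != (ssize_t)(sizeof(msg) - 1)) return -1;
    close(p[1]);

    char *line = fd_get_line(p[0]);
    ssize_t n = read(p[0], rest, sizeof(rest));      /* untouched payload */
    ok = line && !strcmp(line, "Content-Length: 7")
         && n == 7 && !memcmp(rest, "payload", 7);
    free(line);
    close(p[0]);
    return ok ? 0 : -1;
}
```

After fd_get_line() returns, the same fd could be handed to sendfile() or zlib without losing data, which is the property stdio's read-ahead takes away.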
Re: How do I get the buffered bytes in a FILE *?
On 4/11/22 15:41, Rich Felker wrote:
>> But I can't find a portable way to do this?
>
> To give some context to this question, the __freadahead function present in musl libc was created in 2012 to resolve a conflict between gnulib, which has traditionally used an #ifdef jungle to provide "freadahead" and other functionality by poking at FILE internals for each known target they support, and musl, which explicitly makes FILE an opaque non-ABI type. The idea was to let them implement the function in a way that keeps the private member accesses on the stdio implementation side.
>
> While I don't like this interface, gnulib is longstanding historical precedent for its existence ~somewhere~ (just not as part of the implementation), and it's historical precedent for major software wanting this kind of access to stdio.

I just emailed the chair of the C standard group and 90% of his reply was about text vs binary mode with FILE * (which is not present in Linux, MacOS, Android, iOS, Solaris, any embedded OS I've encountered...)

I'm personally fine with fileno() returning -1 when the FILE * is in text mode, let alone freadahead(). Even the coreutils developers are noping out of support for things like cygwin now that Windows Subsystem for Linux exists:

https://lists.gnu.org/archive/html/coreutils/2022-04/msg00038.html

The C committee chair then said:

> File descriptors are outside the scope of the C standard, so any support for switching back and forth between streams and file descriptors belongs elsewhere.

I.E. ANSI C doesn't have read(), write(), or open(). They don't do ANYTHING with file descriptors.

That's why fileno() and fdopen() are only in posix, not in ANSI C. And the issues I'm currently trying to solve are a result of getline() showing up in posix-2008, which also does not exist in ANSI C.

Thus an freadahead() function to encapsulate the horrible #ifdef staircase people are already repeatedly reinventing belongs in Posix, not ANSI. The function is needed to make Posix's existing fileno() reliable, and in the absence of a standard this has already been reimplemented multiple times.

GNU has attempted to centralize its workaround collection in gnulib:

https://github.com/digitalocean/gnulib/blob/master/lib/freadahead.c

(Leading to a bunch of patches for m4 and glib coming up when you google for freadahead because said staircase breaks a lot.)

But even IBM Z/OS implemented __freadahead():

https://www.ibm.com/docs/en/zos/2.4.0?topic=lf-freadahead-retrieve-number-bytes-remaining-in-input-buffer#freadahead

The lack of this function derailed a port to VMS a few years back:

https://sourceforge.net/p/vms-ports/tickets/61/

Here's somebody trying to update i386 support in buildroot and guess what they needed to patch:

https://www.zephray.me/post/create_coremark_boot_disk_for_386/#:~:text=freadahead

*shrug* This is a thing people are reimplementing a lot.

Rob
Re: How do I get the buffered bytes in a FILE *?
Rob Landley wrote, on 11 Apr 2022:
>
> A bunch of protocols (git, http, mbox, etc) start with lines of data followed by a block of data, so it's natural to want to call getline() and then handle the data block. But getline() takes a FILE * and things like zlib and sendfile() take an integer file descriptor.
>
> Posix lets me get the file descriptor out of a FILE * with fileno(), but the point of FILE * is to readahead and buffer. How do I get the buffered data out without reading more from the file descriptor?
>
> I can't find a portable way to do this?

I tried this sequence of calls on a few systems, and it worked in the way you would expect:

    fgets(buf, sizeof buf, fp);
    int fd = dup(fileno(fp));
    close(fileno(fp));
    while ((ret = fread(buf, 1, sizeof buf, fp)) > 0) { ... }
    read(fd, buf, sizeof buf);

It relies on fread() not detecting EBADF until it tries to read more data from the underlying fd.

It has some caveats:

1. It needs a file descriptor to be available.

2. The close() will remove any fcntl() locks that the calling process holds for the file.

3. In a multi-threaded process it has the usual problem around fd inheritance, but that's addressed in Issue 8 with the addition of dup3().

Also, for the standard to require it to work, I think we would need to tweak the EBADF error for fgetc() (which fread() references) to say:

    The file descriptor underlying stream is not a valid file
    descriptor open for reading and there is no buffered data
    available to be returned.

(adding the "and ..." part).

--
Geoff Clare
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
Re: How do I get the buffered bytes in a FILE *?
On Tue, Apr 12, 2022 at 10:42:02AM +0100, Geoff Clare via austin-group-l at The Open Group wrote:
> Rob Landley wrote, on 11 Apr 2022:
> > A bunch of protocols (git, http, mbox, etc) start with lines of data followed by a block of data, so it's natural to want to call getline() and then handle the data block. But getline() takes a FILE * and things like zlib and sendfile() take an integer file descriptor.
>
> > Posix lets me get the file descriptor out of a FILE * with fileno(), but the point of FILE * is to readahead and buffer. How do I get the buffered data out without reading more from the file descriptor?
>
> > I can't find a portable way to do this?
>
> I tried this sequence of calls on a few systems, and it worked in the way you would expect:
>
>     fgets(buf, sizeof buf, fp);
>     int fd = dup(fileno(fp));
>     close(fileno(fp));
>     while ((ret = fread(buf, 1, sizeof buf, fp)) > 0) { ... }
>     read(fd, buf, sizeof buf);
>
> It relies on fread() not detecting EBADF until it tries to read more data from the underlying fd.
>
> It has some caveats:
>
> 1. It needs a file descriptor to be available.
>
> 2. The close() will remove any fcntl() locks that the calling process holds for the file.
>
> 3. In a multi-threaded process it has the usual problem around fd inheritance, but that's addressed in Issue 8 with the addition of dup3().

There is another dangerous problem: if another thread or a signal handler allocates another fd and it is assigned the number fileno(fp), the while loop might read data from a completely unrelated file.

This could be avoided by dup2/dup3'ing /dev/null onto fileno(fp) instead of closing it (at the cost of another file descriptor).

> Also, for the standard to require it to work, I think we would need to tweak the EBADF error for fgetc() (which fread() references) to say:
>
>     The file descriptor underlying stream is not a valid file
>     descriptor open for reading and there is no buffered data
>     available to be returned.

Although I don't expect it to break in practice, the close(fileno(fp)) or dup2(..., fileno(fp)) violates the rules about the "active handle" in XSH 2.5.1 Interaction of File Descriptors and Standard I/O Streams.

I believe the "correct" solution with a stdio implementation that doesn't offer something like freadahead() is not to use stdio but to implement your own buffering.

--
Jilles Tjoelker
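To make the /dev/null variant concrete, here is a rough sketch in C. The function names detach_fd() and demo_detach() are invented for this example. As noted in the message above, redirecting the descriptor underneath a live stream steps outside the "active handle" rules, so this shows what tends to work in practice rather than something the standard promises.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: return a new descriptor for fp's underlying file and
 * point fp's own descriptor at /dev/null, so further reads through fp yield
 * only the already-buffered bytes followed by EOF. Returns -1 on failure. */
int detach_fd(FILE *fp)
{
    int fd, nul;

    assert(fp != NULL);
    if ((fd = dup(fileno(fp))) < 0) return -1;
    if ((nul = open("/dev/null", O_RDONLY)) < 0) {
        close(fd);
        return -1;
    }
    /* dup2() instead of close(): the fd number stays occupied, so another
     * thread cannot be handed it while the buffer is being drained. */
    if (dup2(nul, fileno(fp)) < 0) {
        close(nul);
        close(fd);
        return -1;
    }
    close(nul);
    return fd;
}

/* Demo: read a header with fgets(), then collect the rest of the data,
 * first from the stdio buffer and then from the detached descriptor.
 * Returns 0 if no bytes were lost in the handoff. */
int demo_detach(void)
{
    char line[64], data[64];
    size_t len = 0, n;
    ssize_t r;
    int p[2], fd;
    FILE *fp;

    if (pipe(p)) return -1;
    if (write(p[1], "header\nbuffered", 15) != 15) return -1;
    if (!(fp = fdopen(p[0], "r"))) return -1;
    if (!fgets(line, sizeof(line), fp)) return -1;  /* stdio may read ahead */
    if (write(p[1], " later", 6) != 6) return -1;   /* stays in the pipe */
    close(p[1]);

    if ((fd = detach_fd(fp)) < 0) return -1;
    while ((n = fread(data + len, 1, sizeof(data) - 1 - len, fp)) > 0)
        len += n;                                   /* drain the buffer */
    while ((r = read(fd, data + len, sizeof(data) - 1 - len)) > 0)
        len += (size_t)r;                           /* then the raw fd */
    data[len] = 0;
    fclose(fp);
    close(fd);
    return strcmp(data, "buffered later") ? -1 : 0;
}
```

The demo gives the same result whether or not the libc actually read ahead: any bytes stdio did not buffer are simply picked up by the read() loop on the detached descriptor.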
Re: How do I get the buffered bytes in a FILE *?
Sigh, spam filter impounded this. (Gotta move off gmail...)

On 4/12/22 04:42, Geoff Clare via austin-group-l at The Open Group wrote:
> Rob Landley wrote, on 11 Apr 2022:
>>
>> A bunch of protocols (git, http, mbox, etc) start with lines of data followed by a block of data, so it's natural to want to call getline() and then handle the data block. But getline() takes a FILE * and things like zlib and sendfile() take an integer file descriptor.
>>
>> Posix lets me get the file descriptor out of a FILE * with fileno(), but the point of FILE * is to readahead and buffer. How do I get the buffered data out without reading more from the file descriptor?
>>
>> I can't find a portable way to do this?
>
> I tried this sequence of calls on a few systems, and it worked in the way you would expect:
>
>     fgets(buf, sizeof buf, fp);
>     int fd = dup(fileno(fp));
>     close(fileno(fp));
>     while ((ret = fread(buf, 1, sizeof buf, fp)) > 0) { ... }
>     read(fd, buf, sizeof buf);
>
> It relies on fread() not detecting EBADF until it tries to read more data from the underlying fd.

Hmmm. That's an interesting approach.

> It has some caveats:
>
> 1. It needs a file descriptor to be available.

Understood, but acceptable.

> 2. The close() will remove any fcntl() locks that the calling process holds for the file.

Fine.

> 3. In a multi-threaded process it has the usual problem around fd inheritance, but that's addressed in Issue 8 with the addition of dup3().

Threads break everything anyway, but you could dup2(/dev/null) if you cared about them.

> Also, for the standard to require it to work, I think we would need to tweak the EBADF error for fgetc() (which fread() references) to say:
>
>     The file descriptor underlying stream is not a valid file
>     descriptor open for reading and there is no buffered data
>     available to be returned.
>
> (adding the "and ..." part).

Sounds reasonable. I'll give this a try.

Thanks,

Rob
Reply: How do I get the buffered bytes in a FILE *?
Rob, you can use the MSG_PEEK flag on recv(2) instead of relying on stdio FILE* handles.

From: Rob Landley via austin-group-l at The Open Group
Date: Tuesday, 2022-04-12 05:59:31
To: Rich Felker
Cc: austin-group-l@opengroup.org
Subject: Re: How do I get the buffered bytes in a FILE *?

On 4/11/22 15:41, Rich Felker wrote:
>> But I can't find a portable way to do this?
>
> To give some context to this question, the __freadahead function present in musl libc was created in 2012 to resolve a conflict between gnulib, which has traditionally used an #ifdef jungle to provide "freadahead" and other functionality by poking at FILE internals for each known target they support, and musl, which explicitly makes FILE an opaque non-ABI type. The idea was to let them implement the function in a way that keeps the private member accesses on the stdio implementation side.
>
> While I don't like this interface, gnulib is longstanding historical precedent for its existence ~somewhere~ (just not as part of the implementation), and it's historical precedent for major software wanting this kind of access to stdio.

I just emailed the chair of the C standard group and 90% of his reply was about text vs binary mode with FILE * (which is not present in Linux, MacOS, Android, iOS, Solaris, any embedded OS I've encountered...)

I'm personally fine with fileno() returning -1 when the FILE * is in text mode, let alone freadahead().
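As an illustration of the MSG_PEEK suggestion, the sketch below reads one line from a stream socket without consuming anything after the newline. The names sock_get_line() and demo_sock_get_line() are hypothetical. Keep in mind that recv() only works on sockets, so this does not help when the input is a regular file or a pipe.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read one line (newline included) from a stream
 * socket. Each pass peeks at the queued bytes, then consumes only up to
 * and including the newline. Returns the line length, or 0 on EOF/error. */
ssize_t sock_get_line(int sock, char *buf, size_t size)
{
    size_t len = 0;

    assert(buf != NULL && size > 0);
    while (len + 1 < size) {
        /* Look at what is queued without removing it from the socket. */
        ssize_t n = recv(sock, buf + len, size - 1 - len, MSG_PEEK);
        if (n <= 0) break;

        char *nl = memchr(buf + len, '\n', (size_t)n);
        size_t take = nl ? (size_t)(nl - (buf + len)) + 1 : (size_t)n;

        /* Now actually consume exactly the bytes we want to keep. */
        n = recv(sock, buf + len, take, 0);
        if (n <= 0) break;
        len += (size_t)n;
        if (nl && (size_t)n == take) break;     /* whole line consumed */
    }
    buf[len] = 0;
    return (ssize_t)len;
}

/* Demo over a socketpair: header line, then payload read from the same fd.
 * Returns 0 if the payload was left untouched by the line read. */
int demo_sock_get_line(void)
{
    static const char msg[] = "Host: example\nBODY";
    char line[64], rest[16];
    int sv[2], ok;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv)) return -1;
    if (write(sv[0], msg, sizeof(msg) - 1) != (ssize_t)(sizeof(msg) - 1)) return -1;
    close(sv[0]);

    ssize_t n = sock_get_line(sv[1], line, sizeof(line));
    ssize_t m = read(sv[1], rest, sizeof(rest));
    ok = n == 14 && !strcmp(line, "Host: example\n")
         && m == 4 && !memcmp(rest, "BODY", 4);
    close(sv[1]);
    return ok ? 0 : -1;
}
```

This costs two system calls per chunk instead of one per byte, but it sidesteps FILE * entirely rather than answering how to recover what a FILE * has buffered.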
Re: Reply: How do I get the buffered bytes in a FILE *?
Q) "How do I switch from FILE * to fd via fileno() without losing data."

A) "Don't use FILE *"

That's not the question I asked?

The C99 guys said they haven't got fileno() or anything using file descriptors, so this ball is not in their court. Posix has fileno(). That's why I'm asking here.

Rob

On 4/16/22 00:44, Danny Niu wrote:
> Rob, you can use the MSG_PEEK flag on recv(2) instead of relying on stdio FILE* handles.
>
> From: Rob Landley via austin-group-l at The Open Group
> Date: Tuesday, 2022-04-12 05:59:31
> To: Rich Felker
> Cc: austin-group-l@opengroup.org
> Subject: Re: How do I get the buffered bytes in a FILE *?
>
> On 4/11/22 15:41, Rich Felker wrote:
>>> But I can't find a portable way to do this?
>>
>> To give some context to this question, the __freadahead function present in musl libc was created in 2012 to resolve a conflict between gnulib, which has traditionally used an #ifdef jungle to provide "freadahead" and other functionality by poking at FILE internals for each known target they support, and musl, which explicitly makes FILE an opaque non-ABI type. The idea was to let them implement the function in a way that keeps the private member accesses on the stdio implementation side.
>>
>> While I don't like this interface, gnulib is longstanding historical precedent for its existence ~somewhere~ (just not as part of the implementation), and it's historical precedent for major software wanting this kind of access to stdio.
>
> I just emailed the chair of the C standard group and 90% of his reply was about text vs binary mode with FILE * (which is not present in Linux, MacOS, Android, iOS, Solaris, any embedded OS I've encountered...)
>
> I'm personally fine with fileno() returning -1 when the FILE * is in text mode, let alone freadahead().
Re: Reply: How do I get the buffered bytes in a FILE *?
On 4/16/22 2:58 PM, Rob Landley via austin-group-l at The Open Group wrote:
> Q) "How do I switch from FILE * to fd via fileno() without losing data."
>
> A) "Don't use FILE *"
>
> That's not the question I asked?

The answer is correct, but incomplete. The missing piece is that if you want to use FILE *, the operation you want, and the information you need to implement it, are not part of the public API.

Other than using a strategy like Geoff suggested early on, or trying something like setvbuf to turn off buffering on the FILE * completely, the buffer associated with a FILE * and the indexes into it that say how much data you've consumed from the underlying source are opaque. If you want to manipulate that information, or expose it to a caller, you can't use FILE * (or, if you want a direct answer, "you can't").

I found it easier to write my own buffered input package to satisfy the POSIX read ahead requirements than try to coerce stdio into doing it.

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
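The setvbuf option mentioned above could look roughly like this. The helper name unbuffered_header() and the demo are invented for illustration. The sketch assumes that an unbuffered stream reads no further than getline() needs; common implementations behave that way, but treat it as an assumption rather than a guarantee, and note that it brings back one read per byte.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: switch fp to unbuffered mode and read one header
 * line, leaving the payload in the descriptor. setvbuf() has to be called
 * before any other operation on the stream. Caller frees the result,
 * which still includes the trailing newline. */
char *unbuffered_header(FILE *fp)
{
    char *line = NULL;
    size_t cap = 0;

    assert(fp != NULL);
    if (setvbuf(fp, NULL, _IONBF, 0)) return NULL;
    if (getline(&line, &cap, fp) < 0) {
        free(line);
        return NULL;
    }
    return line;
}

/* Demo: header through getline(), payload straight from fileno(fp).
 * Returns 0 if the payload was not swallowed by stdio. */
int demo_unbuffered(void)
{
    static const char msg[] = "Content-Length: 7\npayload";
    char rest[16];
    int p[2], ok;
    FILE *fp;

    if (pipe(p)) return -1;
    if (write(p[1], msg, sizeof(msg) - 1) != (ssize_t)(sizeof(msg) - 1)) return -1;
    close(p[1]);
    if (!(fp = fdopen(p[0], "r"))) return -1;

    char *line = unbuffered_header(fp);
    ssize_t n = read(fileno(fp), rest, sizeof(rest));
    ok = line && !strcmp(line, "Content-Length: 7\n")
         && n == 7 && !memcmp(rest, "payload", 7);
    free(line);
    fclose(fp);
    return ok ? 0 : -1;
}
```

This keeps the getline() API but gives up the buffering that made FILE * attractive in the first place.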
Re: Reply: How do I get the buffered bytes in a FILE *?
On 4/17/22 18:10, Chet Ramey wrote:
> On 4/16/22 2:58 PM, Rob Landley via austin-group-l at The Open Group wrote:
>> Q) "How do I switch from FILE * to fd via fileno() without losing data."
>>
>> A) "Don't use FILE *"
>>
>> That's not the question I asked?
>
> The answer is correct, but incomplete. The missing piece is that if you want to use FILE *, the operation you want, and the information you need to implement it, are not part of the public API.

Which is a fixable problem.

> Other than using a strategy like Geoff suggested early on, or trying something like setvbuf to turn off buffering on the FILE * completely, the buffer associated with a FILE * and the indexes into it that say how much data you've consumed from the underlying source are opaque.

https://github.com/coreutils/gnulib/blob/master/lib/freadahead.c

https://sources.debian.org/src/m4/1.4.18-2/lib/freadahead.c

> If you want to manipulate that information, or expose it to a caller, you can't use FILE * (or, if you want a direct answer, "you can't").

The if/else staircase in m4 and gnulib and so on says I can. I was just wondering if there was a _clean_ way to do it.

(The gnu lot exposed it as part of glib's exports, but glib's license isn't compatible with my projects and pulling in a library like that just for one accessor function to read an integer out of a struct is a bit silly anyway.)

The C99 guys point out they haven't got file descriptors and thus this would logically belong in posix, for the same reason fileno() does. "But FILE * doesn't have a way to fetch the file descriptor" was answered by adding fileno(). That is ALSO grabbing an integer out of the guts of FILE *.

> I found it easier to write my own buffered input package to satisfy the POSIX read ahead requirements than try to coerce stdio into doing it.

Reimplementing FILE * from scratch and not using the ANSI one is a common way to address this limitation, yes. So is the if/else staircase of libc specific hacks. Or in the case of musl (and now bionic, according to the end of http://lists.landley.net/pipermail/toybox-landley.net/2022-April/012824.html) adding __freadahead(fp) to return the number of bytes in the buffer.

Z/OS has it too:

https://www.ibm.com/docs/en/zos/2.3.0?topic=lf-freadahead-retrieve-number-bytes-remaining-in-input-buffer

Of course the lack of standardization means that Dragonfly BSD called theirs "__sreadahead()" instead...

https://github.com/DragonFlyBSD/DragonFlyBSD/blob/master/lib/libc/stdio/sreadahead.c

But can you really say it's opaque when so many independent reimplementations of how to access it already exist?

This exists. It would be nice if it got standardized.

Rob
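For comparison, here is a toy version of the "write your own buffered input" route discussed in this exchange. It is not Chet's package or toybox code; struct linebuf, lb_getline(), and lb_take_pending() are names made up for this sketch. The point is that once the buffer is yours, asking what is left in it is trivial, which is exactly the question stdio has no standard way to answer.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical minimal line reader that owns its read-ahead buffer. */
struct linebuf {
    int fd;
    size_t pos, len;        /* unread bytes live in buf[pos..len) */
    char buf[4096];
};

/* Copy the next line into out without its newline (over-long lines are
 * silently truncated). Returns the stored length, or -1 at EOF. */
ssize_t lb_getline(struct linebuf *lb, char *out, size_t size)
{
    size_t n = 0;

    assert(lb != NULL && out != NULL && size > 0);
    for (;;) {
        if (lb->pos == lb->len) {               /* buffer empty: refill */
            ssize_t r = read(lb->fd, lb->buf, sizeof(lb->buf));
            if (r <= 0) {
                out[n] = 0;
                return n ? (ssize_t)n : -1;
            }
            lb->pos = 0;
            lb->len = (size_t)r;
        }
        char c = lb->buf[lb->pos++];
        if (c == '\n') break;
        if (n + 1 < size) out[n++] = c;
    }
    out[n] = 0;
    return (ssize_t)n;
}

/* Hand over the bytes that were read ahead but not yet consumed and mark
 * the buffer empty, so lb->fd can be used directly afterwards. */
size_t lb_take_pending(struct linebuf *lb, const char **data)
{
    size_t n = lb->len - lb->pos;

    *data = lb->buf + lb->pos;
    lb->pos = lb->len;
    return n;
}

/* Demo: header line, then the read-ahead payload recovered from the buffer.
 * Returns 0 on success. */
int demo_linebuf(void)
{
    static const char msg[] = "Content-Length: 7\npayload";
    struct linebuf lb = { 0 };
    const char *pending;
    char line[64];
    int p[2], ok;

    if (pipe(p)) return -1;
    if (write(p[1], msg, sizeof(msg) - 1) != (ssize_t)(sizeof(msg) - 1)) return -1;
    close(p[1]);
    lb.fd = p[0];

    ssize_t n = lb_getline(&lb, line, sizeof(line));
    size_t left = lb_take_pending(&lb, &pending);
    ok = n == 17 && !strcmp(line, "Content-Length: 7")
         && left == 7 && !memcmp(pending, "payload", 7);
    close(p[0]);
    return ok ? 0 : -1;
}
```

lb_take_pending() plays the role that __freadahead() plus fread() would play for a FILE *, and the sketch as a whole is the "reinvented FILE *" the thread keeps circling back to.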
Re: Reply: How do I get the buffered bytes in a FILE *?
On 4/18/22 12:53 AM, Rob Landley wrote:
> On 4/17/22 18:10, Chet Ramey wrote:
>> On 4/16/22 2:58 PM, Rob Landley via austin-group-l at The Open Group wrote:
>>> Q) "How do I switch from FILE * to fd via fileno() without losing data."
>>>
>>> A) "Don't use FILE *"
>>>
>>> That's not the question I asked?
>>
>> The answer is correct, but incomplete. The missing piece is that if you want to use FILE *, the operation you want, and the information you need to implement it, are not part of the public API.
>
> Which is a fixable problem.

Sure, everything's fixable. It's not what you asked, though.

>> Other than using a strategy like Geoff suggested early on, or trying something like setvbuf to turn off buffering on the FILE * completely, the buffer associated with a FILE * and the indexes into it that say how much data you've consumed from the underlying source are opaque.
>
> https://github.com/coreutils/gnulib/blob/master/lib/freadahead.c

So the gnulib folks looked at a bunch of different stdio implementations and used non-public (or at least non-standard) portions of the implementation to augment the stdio API. If that's what you want to do, propose adding freadahead to the standard. Or reimplement the gnulib work and accept that the stdio implementation can potentially change out from under you. Current POSIX provides no help here.

>> If you want to manipulate that information, or expose it to a caller, you can't use FILE * (or, if you want a direct answer, "you can't").
>
> The if/else staircase in m4 and gnulib and so on says I can.

Not in a way that protects you against changes to one of the underlying stdio implementations. And isn't that the point? You can always offer that functionality if you have stable access to stdio internals, but it's not in the standard.

> I was just wondering if there was a _clean_ way to do it.

OK. Do you think you've gotten an answer to that?

> The C99 guys point out they haven't got file descriptors and thus this would logically belong in posix, for the same reason fileno() does. "But FILE * doesn't have a way to fetch the file descriptor" was answered by adding fileno(). That is ALSO grabbing an integer out of the guts of FILE *.

Sure. And adding that to the standard would require the usual things, for which there's a process.

> This exists. It would be nice if it got standardized.

Maybe it would. But that's a different question.

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
Re: Reply: How do I get the buffered bytes in a FILE *?
On 4/17/22 23:53, Rob Landley wrote:
> On 4/17/22 18:10, Chet Ramey wrote:
>> On 4/16/22 2:58 PM, Rob Landley via austin-group-l at The Open Group wrote:
>>> Q) "How do I switch from FILE * to fd via fileno() without losing data."
>>>
>>> A) "Don't use FILE *"
>>>
>>> That's not the question I asked?
>>
>> The answer is correct, but incomplete. The missing piece is that if you want to use FILE *, the operation you want, and the information you need to implement it, are not part of the public API.
>
> Which is a fixable problem.
>
>> Other than using a strategy like Geoff suggested early on, or trying something like setvbuf to turn off buffering on the FILE * completely, the buffer associated with a FILE * and the indexes into it that say how much data you've consumed from the underlying source are opaque.
>
> https://github.com/coreutils/gnulib/blob/master/lib/freadahead.c
>
> https://sources.debian.org/src/m4/1.4.18-2/lib/freadahead.c

And now it's in bionic:

https://android-review.googlesource.com/c/platform/bionic/+/2227544

Backstory: protocols like http use mbox style data formats with lines of text followed by a blob of data. I want to use getline() to read the lines of text (which among other things tells me the size of the ensuing data), and then sendfile() the data afterwards.

Posix provides fileno(FILE) to hand off an fd from getline() context to sendfile() context, but data is lost in the transition when the FILE * has read and buffered extra data. The API to ask how much data is in the buffer (so I can fread() it and be sure I have all of it, without triggering additional reads from the underlying fd) is __freadahead(), which posix does not yet standardize.

The C standards committee does not use file descriptors: fileno() is posix.

Rob