How do I get the buffered bytes in a FILE *?

2022-04-11 Thread Rob Landley via austin-group-l at The Open Group
A bunch of protocols (git, http, mbox, etc) start with lines of data followed by
a block of data, so it's natural to want to call getline() and then handle the
data block. But getline() takes a FILE * and things like zlib and sendfile()
take an integer file descriptor.

Posix lets me get the file descriptor out of a FILE * with fileno(), but the
point of FILE * is to readahead and buffer. How do I get the buffered data out
without reading more from the file descriptor?

I can't find a portable way to do this?

In musl I can #include  and then use __freadahead(fp) to get the
number of bytes. Various other packages do an if/else staircase mucking with the
internals of every libc they personally know, ala:

https://sources.debian.org/src/m4/1.4.18-2/lib/freadahead.c

But I can't find a portable way to do this?

Rob
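
For illustration, the accessor being asked about can be sketched roughly as follows. This is a minimal sketch, not a portable answer: the helper name buffered_bytes() and the HAVE___FREADAHEAD macro are made up for this example, the glibc branch reaches into FILE internals in the same spirit as the staircase linked above, and every other libc simply reports "unknown".

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

#ifdef HAVE___FREADAHEAD        /* hypothetical configure-time macro */
#include <stdio_ext.h>          /* musl declares __freadahead() here */
#endif

/* Number of unread bytes in fp's input buffer, or -1 when this sketch
 * does not know how to ask the libc it was built against. */
static ssize_t buffered_bytes(FILE *fp)
{
    assert(fp != NULL);
#if defined(HAVE___FREADAHEAD)
    return (ssize_t)__freadahead(fp);
#elif defined(__GLIBC__)
    /* Assumption: pokes at glibc's FILE layout; ignores ungetc() pushback. */
    return (ssize_t)(fp->_IO_read_end - fp->_IO_read_ptr);
#else
    return -1;
#endif
}
```

With a count like this in hand, a caller could fread() exactly that many bytes and then use fileno(fp) directly, without triggering another read on the descriptor.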



Re: How do I get the buffered bytes in a FILE *?

2022-04-11 Thread Rob Landley via austin-group-l at The Open Group
On 4/11/22 13:54, Person Who Did Not CC The List wrote:
> On 4/11/22 2:48 PM, Rob Landley via austin-group-l at The Open Group wrote:
> 
>> But I can't find a portable way to do this?
> 
> There isn't one.

How do I request one? (Does WG14 have a mailing list?)

Rob



Re: How do I get the buffered bytes in a FILE *?

2022-04-11 Thread Rob Landley via austin-group-l at The Open Group
On 4/11/22 14:20, Another Person Who Did Not CC The List wrote:
> On Mon, Apr 11, 2022 at 01:48:06PM -0500, Rob Landley via austin-group-l at 
> The Open Group wrote:
>> A bunch of protocols (git, http, mbox, etc) start with lines of data 
>> followed by
>> a block of data, so it's natural to want to call getline() and then handle 
>> the
>> data block. But getline() takes a FILE * and things like zlib and sendfile()
>> take an integer file descriptor.
>> 
>> Posix lets me get the file descriptor out of a FILE * with fileno(), but the
>> point of FILE * is to readahead and buffer. How do I get the buffered data 
>> out
>> without reading more from the file descriptor?
>> 
>> I can't find a portable way to do this?
> 
> Can you forgo using stdio entirely and just use open?   Most
> API's that take a file descriptor are generally prepared to
> do large I/O requests rather than character-by-character I/O
> anyway.   Modern kernels will buffer the file in the file/buffer
> cache.

That's what I did for years (toybox had a get_line() function instead of
getline() that did byte at a time reads with realloc()), introduced in 2007:

https://github.com/landley/toybox/blob/bc07865a504c/lib/lib.c#L594

But the Android guys complained it was really slow and ugly, which it was. (Sure
Linux buffers it but a system call per byte is still nuts, drives the scheduler
batty, makes your strace output impossible to follow...) Plus I was
reimplementing a libc API which I try not to do without a really good reason, so
when getline() was added to posix in 2008 I switched over MOST users to the new
posix api...
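
For readers who have not seen the pattern, a byte-at-a-time line reader looks roughly like this. It is a sketch written for this discussion, not the toybox code linked above; the name slow_get_line() is made up, and EINTR handling is left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Read one line from fd with one read() call per byte, so nothing past
 * the newline is ever consumed. Returns a malloc()ed string without the
 * trailing newline, or NULL on error or EOF with no data. Caller frees. */
static char *slow_get_line(int fd)
{
    char *line = NULL;
    size_t len = 0, cap = 0;
    int saw_input = 0;

    assert(fd >= 0);
    for (;;) {
        char c;
        ssize_t got = read(fd, &c, 1);

        if (got < 0) {                      /* read error: give up */
            free(line);
            return NULL;
        }
        if (got == 0) {                     /* EOF */
            if (!saw_input)
                return NULL;                /* nothing was allocated yet */
            break;
        }
        saw_input = 1;
        if (len + 2 > cap) {                /* room for this byte plus NUL */
            size_t ncap = cap ? cap * 2 : 64;
            char *tmp = realloc(line, ncap);
            if (!tmp) {
                free(line);
                return NULL;
            }
            line = tmp;
            cap = ncap;
        }
        if (c == '\n')
            break;
        line[len++] = c;
    }
    line[len] = '\0';
    return line;
}
```

The file descriptor is left positioned exactly after the newline, which is the property the handoff needs, at the cost of one system call per byte.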

But the byte-at-a-time get_line() can't QUITE go away:

https://github.com/landley/toybox/commit/15cbb92dffc8

(Hands off to sendfile() when it's out of hunks...)

And it keeps wanting to come back:

https://github.com/landley/toybox/commit/601828982a53

(Adding gzip/deflate support means http 1.1 data lines are followed by gzipped
data payload.)

Last week a contributor implementing a subset of "git" ran into the same
problem...

http://lists.landley.net/pipermail/toybox-landley.net/2022-April/012817.html

(Internally git's file format is a bunch of "keyword: value" text lines followed
by payload.)

It's come up other times over the years. Perennial problem. Unix has been doing
"keyword:value lines followed by payload" since the days of mbox files. Sure
http bodies started out as text, but they didn't stay that way...

It seems like a simple question to ask a FILE *: "what data do you have buffered?"
The reply "we're not capable of answering that question, therefore the
programmer shouldn't ever want to ask it" seems... fixable?

Rob

(Note that if I implement my own get_line() with extra leftover buffer handed
off between line reads to avoid the byte-at-a-time inefficiency, I've reinvented
FILE *. And none of the above use cases guarantee seekable input, so it can't put
the extra data it read BACK into the file descriptor.)
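
To make the "reinvented FILE *" point concrete, here is a rough sketch of such a reader. Everything in it (struct linebuf, lb_getline(), lb_leftover()) is invented for this example; it drops a final line that lacks a newline and rejects lines longer than its fixed buffer.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <unistd.h>

struct linebuf {
    int fd;
    size_t start, end;          /* unread bytes are buf[start..end) */
    char buf[4096];
};

/* Copy the next line (without '\n') into out, NUL-terminated.
 * Returns its length, or -1 on EOF, error, or a line that does not fit. */
static ssize_t lb_getline(struct linebuf *lb, char *out, size_t outsz)
{
    assert(lb != NULL && out != NULL);
    for (;;) {
        char *base = lb->buf + lb->start;
        char *nl = memchr(base, '\n', lb->end - lb->start);

        if (nl) {
            size_t len = (size_t)(nl - base);
            if (len >= outsz)
                return -1;
            memcpy(out, base, len);
            out[len] = '\0';
            lb->start += len + 1;           /* skip the newline too */
            return (ssize_t)len;
        }
        /* No newline yet: slide the unread bytes down and refill. */
        memmove(lb->buf, base, lb->end - lb->start);
        lb->end -= lb->start;
        lb->start = 0;
        if (lb->end == sizeof lb->buf)
            return -1;                      /* line longer than the buffer */
        ssize_t got = read(lb->fd, lb->buf + lb->end, sizeof lb->buf - lb->end);
        if (got <= 0)
            return -1;
        lb->end += (size_t)got;
    }
}

/* Hand over whatever was read ahead. The pointer stays valid until the
 * next lb_getline() call; afterwards the caller may use lb->fd directly. */
static size_t lb_leftover(struct linebuf *lb, const char **data)
{
    size_t n = lb->end - lb->start;

    *data = lb->buf + lb->start;
    lb->start = lb->end = 0;
    return n;
}
```

The one operation here that stdio lacks is lb_leftover(), which is the whole reason for writing the rest.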



Re: How do I get the buffered bytes in a FILE *?

2022-04-11 Thread Rob Landley via austin-group-l at The Open Group
On 4/11/22 15:41, Rich Felker wrote:
>> But I can't find a portable way to do this?
> 
> To give some context to this question, the __freadahead function
> present in musl libc was created in 2012 to resolve a conflict between
> gnulib, which has traditionally used an #ifdef jungle to provide
> "freadahead" and other functionality by poking at FILE internals for
> each known target they support, and musl, which explicitly makes FILE
> an opaque non-ABI type. The idea was to let them implement the
> function in a way that keeps the private member accesses on the stdio
> implementation side.
> 
> While I don't like this interface, gnulib is longstanding historical
> precedent for its existence ~somewhere~ (just not as part of the
> implementation), and it's historical precedent for major software
> wanting this kind of access to stdio.

I just emailed the chair of the C standard group and 90% of his reply was about
text vs binary mode with FILE * (which is not present in Linux, MacOS, Android,
iOS, Solaris, any embedded OS I've encountered...)

I'm personally fine with fileno() returning -1 when the FILE * is in text mode,
let alone freadahead(). Even the coreutils developers are noping out of support
for things like cygwin now that Windows Subsystem for Linux exists:

https://lists.gnu.org/archive/html/coreutils/2022-04/msg00038.html

The C committee chair then said:

>   File descriptors are outside the scope of the C standard, so any
> support for switching back and forth between streams and file
> descriptors belongs elsewhere.

I.E. ANSI C doesn't have read(), write(), or open(). They don't do ANYTHING with
file descriptors.

That's why fileno() and fdopen() are only in posix, not in ANSI C. And the
issues I'm currently trying to solve are a result of getline() showing up in
posix-2008, which also does not exist in ANSI C.

Thus an freadahead() function to encapsulate the horrible #ifdef staircase
people are already repeatedly reinventing belongs in Posix, not ANSI. The
function is needed to make Posix's existing fileno() reliable, and in the
absence of a standard this has already been reimplemented multiple times.

GNU has attempted to centralize its workaround collection in gnulib:
https://github.com/digitalocean/gnulib/blob/master/lib/freadahead.c

(Leading to a bunch of patches for m4 and glib coming up when you google for
freadahead because said staircase breaks a lot.)

But even IBM Z/OS implemented __freadahead():

https://www.ibm.com/docs/en/zos/2.4.0?topic=lf-freadahead-retrieve-number-bytes-remaining-in-input-buffer#freadahead

The lack of this function derailed a port to VMS a few years back:

https://sourceforge.net/p/vms-ports/tickets/61/

Here's somebody trying to update i386 support in buildroot and guess what they
needed to patch:

https://www.zephray.me/post/create_coremark_boot_disk_for_386/#:~:text=freadahead

*shrug* This is a thing people are reimplementing a lot.

Rob



Re: How do I get the buffered bytes in a FILE *?

2022-04-12 Thread Geoff Clare via austin-group-l at The Open Group
Rob Landley wrote, on 11 Apr 2022:
>
> A bunch of protocols (git, http, mbox, etc) start with lines of data followed 
> by
> a block of data, so it's natural to want to call getline() and then handle the
> data block. But getline() takes a FILE * and things like zlib and sendfile()
> take an integer file descriptor.
> 
> Posix lets me get the file descriptor out of a FILE * with fileno(), but the
> point of FILE * is to readahead and buffer. How do I get the buffered data out
> without reading more from the file descriptor?
> 
> I can't find a portable way to do this?

I tried this sequence of calls on a few systems, and it worked in the
way you would expect:

fgets(buf, sizeof buf, fp);
int fd = dup(fileno(fp));
close(fileno(fp));
while ((ret = fread(buf, 1, sizeof buf, fp)) > 0) { ... }
read(fd, buf, sizeof buf);

It relies on fread() not detecting EBADF until it tries to read more
data from the underlying fd.

It has some caveats:

1. It needs a file descriptor to be available.

2. The close() will remove any fcntl() locks that the calling process
   holds for the file.

3. In a multi-threaded process it has the usual problem around fd
   inheritance, but that's addressed in Issue 8 with the addition
   of dup3().

Also, for the standard to require it to work, I think we would need to
tweak the EBADF error for fgetc() (which fread() references) to say:

The file descriptor underlying stream is not a valid file
descriptor open for reading and there is no buffered data
available to be returned.

(adding the "and ..." part).
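
Wrapped up as a function, the call sequence above might look like the sketch below. The name detach_fd() is made up, and the same assumption applies: fread() must hand back buffered data before it notices the closed descriptor, which the standard does not currently promise. The output buffer has to be at least as large as whatever stdio read ahead, or the excess is lost.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Drain the bytes stdio has already buffered for fp into out and return
 * a duplicate of the underlying descriptor for direct use. The number of
 * drained bytes is stored in *drained. Returns -1 if dup() fails.
 * Afterwards fp must not be used for further reads. */
static int detach_fd(FILE *fp, char *out, size_t outsz, size_t *drained)
{
    size_t total = 0, got;
    int fd;

    assert(fp != NULL && out != NULL && drained != NULL);
    fd = dup(fileno(fp));
    if (fd < 0)
        return -1;
    close(fileno(fp));          /* refills now fail instead of reading ahead */

    while (total < outsz &&
           (got = fread(out + total, 1, outsz - total, fp)) > 0)
        total += got;
    *drained = total;
    return fd;
}
```

A later fclose(fp) will call close() on the old descriptor number, so in this form it is only safe if nothing else has been opened in between.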

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: How do I get the buffered bytes in a FILE *?

2022-04-16 Thread Jilles Tjoelker via austin-group-l at The Open Group
On Tue, Apr 12, 2022 at 10:42:02AM +0100, Geoff Clare via austin-group-l at The 
Open Group wrote:
> Rob Landley wrote, on 11 Apr 2022:
> > A bunch of protocols (git, http, mbox, etc) start with lines of data
> > followed by a block of data, so it's natural to want to call
> > getline() and then handle the data block. But getline() takes a FILE
> > * and things like zlib and sendfile() take an integer file
> > descriptor.

> > Posix lets me get the file descriptor out of a FILE * with fileno(),
> > but the point of FILE * is to readahead and buffer. How do I get the
> > buffered data out without reading more from the file descriptor?

> > I can't find a portable way to do this?

> I tried this sequence of calls on a few systems, and it worked in the
> way you would expect:

> fgets(buf, sizeof buf, fp);
> int fd = dup(fileno(fp));
> close(fileno(fp));
> while ((ret = fread(buf, 1, sizeof buf, fp)) > 0) { ... }
> read(fd, buf, sizeof buf);

> It relies on fread() not detecting EBADF until it tries to read more
> data from the underlying fd.

> It has some caveats:

> 1. It needs a file descriptor to be available.

> 2. The close() will remove any fcntl() locks that the calling process
>    holds for the file.

> 3. In a multi-threaded process it has the usual problem around fd
>    inheritance, but that's addressed in Issue 8 with the addition
>    of dup3().

There is another dangerous problem: if another thread or a signal
handler allocates another fd and it is assigned the number fileno(fp),
the while loop might read data from a completely unrelated file. This
could be avoided by dup2/dup3'ing /dev/null onto fileno(fp) instead of
closing it (at the cost of another file descriptor).

> Also, for the standard to require it to work, I think we would need to
> tweak the EBADF error for fgetc() (which fread() references) to say:

> The file descriptor underlying stream is not a valid file
> descriptor open for reading and there is no buffered data
> available to be returned.

Although I don't expect it to break in practice, the close(fileno(fp))
or dup2(..., fileno(fp)) violates the rules about the "active handle" in
XSH 2.5.1 Interaction of File Descriptors and Standard I/O Streams.

I believe the "correct" solution with a stdio implementation that
doesn't offer something like freadahead() is not to use stdio but
to implement your own buffering.
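
For completeness, the /dev/null variant mentioned above could be sketched like this. The function name detach_fd_devnull() is made up, and the active-handle concern from XSH 2.5.1 still applies; this only removes the risk of the stream reading from a recycled descriptor number.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Like closing fileno(fp), but the stream's descriptor number is
 * redirected to /dev/null, so no other open() can recycle it.
 * Returns a duplicate of the original descriptor, or -1 on failure;
 * the buffered bytes land in out and their count in *drained. */
static int detach_fd_devnull(FILE *fp, char *out, size_t outsz, size_t *drained)
{
    size_t total = 0, got;
    int fd, nul;

    assert(fp != NULL && out != NULL && drained != NULL);
    fd = dup(fileno(fp));
    nul = open("/dev/null", O_RDONLY);
    if (fd < 0 || nul < 0 || dup2(nul, fileno(fp)) < 0) {
        if (fd >= 0)
            close(fd);
        if (nul >= 0)
            close(nul);
        return -1;
    }
    close(nul);                 /* fileno(fp) now refers to /dev/null */

    /* Buffered data comes out first; any refill just sees EOF. */
    while (total < outsz &&
           (got = fread(out + total, 1, outsz - total, fp)) > 0)
        total += got;
    *drained = total;
    return fd;
}
```

Since the stream still owns a valid descriptor, fclose(fp) afterwards is harmless.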

-- 
Jilles Tjoelker



Re: How do I get the buffered bytes in a FILE *?

2022-04-18 Thread Rob Landley via austin-group-l at The Open Group
Sigh, spam filter impounded this. (Gotta move off gmail...)

On 4/12/22 04:42, Geoff Clare via austin-group-l at The Open Group wrote:
> Rob Landley wrote, on 11 Apr 2022:
>>
>> A bunch of protocols (git, http, mbox, etc) start with lines of data 
>> followed by
>> a block of data, so it's natural to want to call getline() and then handle 
>> the
>> data block. But getline() takes a FILE * and things like zlib and sendfile()
>> take an integer file descriptor.
>> 
>> Posix lets me get the file descriptor out of a FILE * with fileno(), but the
>> point of FILE * is to readahead and buffer. How do I get the buffered data 
>> out
>> without reading more from the file descriptor?
>> 
>> I can't find a portable way to do this?
> 
> I tried this sequence of calls on a few systems, and it worked in the
> way you would expect:
> 
> fgets(buf, sizeof buf, fp);
> int fd = dup(fileno(fp));
> close(fileno(fp));
> while ((ret = fread(buf, 1, sizeof buf, fp)) > 0) { ... }
> read(fd, buf, sizeof buf);
> 
> It relies on fread() not detecting EBADF until it tries to read more
> data from the underlying fd.

Hmmm. That's an interesting approach.

> It has some caveats:
> 
> 1. It needs a file descriptor to be available.

Understood, but acceptable.

> 2. The close() will remove any fcntl() locks that the calling process
>    holds for the file.

Fine.

> 3. In a multi-threaded process it has the usual problem around fd
>    inheritance, but that's addressed in Issue 8 with the addition
>    of dup3().

Threads break everything anyway, but you could dup2(/dev/null) if you cared
about them.

> Also, for the standard to require it to work, I think we would need to
> tweak the EBADF error for fgetc() (which fread() references) to say:
> 
> The file descriptor underlying stream is not a valid file
> descriptor open for reading and there is no buffered data
> available to be returned.
> 
> (adding the "and ..." part).

Sounds reasonable. I'll give this a try.

Thanks,

Rob



Reply: How do I get the buffered bytes in a FILE *?

2022-04-15 Thread Danny Niu via austin-group-l at The Open Group
Rob, you can use the MSG_PEEK flag on recv(2) instead of relying on stdio FILE* 
handles.
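
A rough sketch of what that suggestion amounts to follows. The helper name peek_line() is made up, it only works on sockets (not pipes or regular files), and it assumes the whole line is already queued and fits in the caller's buffer.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Read one line from a connected stream socket without consuming
 * anything after the '\n'. Returns the line length (newline excluded),
 * or -1 on error, EOF, or when no newline is visible in the peeked data. */
static ssize_t peek_line(int sock, char *out, size_t outsz)
{
    ssize_t seen;
    char *nl;
    size_t len;

    assert(out != NULL && outsz > 0);
    seen = recv(sock, out, outsz, MSG_PEEK);    /* look, do not consume */
    if (seen <= 0)
        return -1;
    nl = memchr(out, '\n', (size_t)seen);
    if (!nl)
        return -1;
    len = (size_t)(nl - out);

    /* Now consume exactly the line and its newline, nothing more. */
    if (recv(sock, out, len + 1, MSG_WAITALL) != (ssize_t)(len + 1))
        return -1;
    out[len] = '\0';
    return (ssize_t)len;
}
```

The payload stays queued in the socket, so the descriptor can be handed to something else afterwards.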

From: Rob Landley via austin-group-l at The Open Group
Date: Tuesday, 2022-04-12 05:59:31
To: Rich Felker
Cc: austin-group-l@opengroup.org
Subject: Re: How do I get the buffered bytes in a FILE *?
On 4/11/22 15:41, Rich Felker wrote:
>> But I can't find a portable way to do this?
>
> To give some context to this question, the __freadahead function
> present in musl libc was created in 2012 to resolve a conflict between
> gnulib, which has traditionally used an #ifdef jungle to provide
> "freadahead" and other functionality by poking at FILE internals for
> each known target they support, and musl, which explicitly makes FILE
> an opaque non-ABI type. The idea was to let them implement the
> function in a way that keeps the private member accesses on the stdio
> implementation side.
>
> While I don't like this interface, gnulib is longstanding historical
> precedent for its existence ~somewhere~ (just not as part of the
> implementation), and it's historical precedent for major software
> wanting this kind of access to stdio.

I just emailed the chair of the C standard group and 90% of his reply was about
text vs binary mode with FILE * (which is not present in Linux, MacOS, Android,
iOS, Solaris, any embedded OS I've encountered...)

I'm personally fine with fileno() returning -1 when the FILE * is in text mode,
let alone freadahead(). Even the coreutils developers are noping out of support
for things like cygwin now that Windows Subsystem for Linux exists:

https://lists.gnu.org/archive/html/coreutils/2022-04/msg00038.html

The C committee chair then said:

>   File descriptors are outside the scope of the C standard, so any
> support for switching back and forth between streams and file
> descriptors belongs elsewhere.

I.E. ANSI C doesn't have read(), write(), or open(). They don't do ANYTHING with
file descriptors.

That's why fileno() and fdopen() are only in posix, not in ANSI C. And the
issues I'm currently trying to solve are a result of getline() showing up in
posix-2008, which also does not exist in ANSI C.

Thus an freadahead() function to encapsulate the horrible #ifdef staircase
people are already repeatedly reinventing belongs in Posix, not ANSI. The
function is needed to make Posix's existing fileno() reliable, and in the
absence of a standard this has already been reimplemented multiple times.

GNU has attempted to centralize its workaround collection in gnulib:
https://github.com/digitalocean/gnulib/blob/master/lib/freadahead.c

(Leading to a bunch of patches for m4 and glib coming up when you google for
freadahead because said staircase breaks a lot.)

But even IBM Z/OS implemented __freadahead():

https://www.ibm.com/docs/en/zos/2.4.0?topic=lf-freadahead-retrieve-number-bytes-remaining-in-input-buffer#freadahead

The lack of this function derailed a port to VMS a few years back:

https://sourceforge.net/p/vms-ports/tickets/61/

Here's somebody trying to update i386 support in buildroot and guess what they
needed to patch:

https://www.zephray.me/post/create_coremark_boot_disk_for_386/#:~:text=freadahead

*shrug* This is a thing people are reimplementing a lot.

Rob


Re: Reply: How do I get the buffered bytes in a FILE *?

2022-04-16 Thread Rob Landley via austin-group-l at The Open Group
Q) "How do I switch from FILE * to fd via fileno() without losing data."

A) "Don't use FILE *"

That's not the question I asked?

The C99 guys said they haven't got fileno() or anything using file descriptors,
so this ball is not in their court. Posix has fileno(). That's why I'm
asking here.

Rob

On 4/16/22 00:44, Danny Niu wrote:
> Rob, you can use the MSG_PEEK flag on recv(2) instead of relying on stdio 
> FILE*
> handles.
> 
>  
> 
> *From:* Rob Landley via austin-group-l at The Open Group
> 
> *Date:* Tuesday, 2022-04-12 05:59:31
> *To:* Rich Felker
> *Cc:* austin-group-l@opengroup.org
> *Subject:* Re: How do I get the buffered bytes in a FILE *?
> 
> On 4/11/22 15:41, Rich Felker wrote:
>>> But I can't find a portable way to do this?
>> 
>> To give some context to this question, the __freadahead function
>> present in musl libc was created in 2012 to resolve a conflict between
>> gnulib, which has traditionally used an #ifdef jungle to provide
>> "freadahead" and other functionality by poking at FILE internals for
>> each known target they support, and musl, which explicitly makes FILE
>> an opaque non-ABI type. The idea was to let them implement the
>> function in a way that keeps the private member accesses on the stdio
>> implementation side.
>> 
>> While I don't like this interface, gnulib is longstanding historical
>> precedent for its existence ~somewhere~ (just not as part of the
>> implementation), and it's historical precedent for major software
>> wanting this kind of access to stdio.
> 
> I just emailed the chair of the C standard group and 90% of his reply was 
> about
> text vs binary mode with FILE * (which is not present in Linux, MacOS, 
> Android,
> iOS, Solaris, any embedded OS I've encountered...)
> 
> I'm personally fine with fileno() returning -1 when the FILE * is in text 
> mode,
> let alone freadahead(). Even the coreutils developers are noping out of 
> support
> for things like cygwin now that Windows Subsystem for Linux exists:
> 
> https://lists.gnu.org/archive/html/coreutils/2022-04/msg00038.html
> 
> The C committee chair then said:
> 
>>   File descriptors are outside the scope of the C standard, so any
>> support for switching back and forth between streams and file
>> descriptors belongs elsewhere.
> 
> I.E. ANSI C doesn't have read(), write(), or open(). They don't do ANYTHING 
> with
> file descriptors.
> 
> That's why fileno() and fdopen() are only in posix, not in ANSI C. And the
> issues I'm currently trying to solve are a result of getline() showing up in
> posix-2008, which also does not exist in ANSI C.
> 
> Thus an freadahead() function to encapsulate the horrible #ifdef staircase
> people are already repeatedly reinventing belongs in Posix, not ANSI. The
> function is needed to make Posix's existing fileno() reliable, and in the
> absence of a standard this has already been reimplemented multiple times.
> 
> GNU has attempted to centralize its workaround collection in gnulib:
> https://github.com/digitalocean/gnulib/blob/master/lib/freadahead.c

Re: Reply: How do I get the buffered bytes in a FILE *?

2022-04-17 Thread Chet Ramey via austin-group-l at The Open Group

On 4/16/22 2:58 PM, Rob Landley via austin-group-l at The Open Group wrote:
> Q) "How do I switch from FILE * to fd via fileno() without losing data."
> 
> A) "Don't use FILE *"
> 
> That's not the question I asked?

The answer is correct, but incomplete. The missing piece is that if you
want to use FILE *, the operation you want, and the information you need to
implement it, are not part of the public API.

Other than using a strategy like Geoff suggested early on, or trying
something like setvbuf to turn off buffering on the FILE * completely, the
buffer associated with a FILE * and the indexes into it that say how much
data you've consumed from the underlying source are opaque. If you want to
manipulate that information, or expose it to a caller, you can't use FILE *
(or, if you want a direct answer, "you can't").

I found it easier to write my own buffered input package to satisfy the
POSIX read ahead requirements than try to coerce stdio into doing it.
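
The setvbuf option mentioned above can be sketched as follows. This is a sketch under an assumption the standard does not spell out: that an unbuffered stream never pulls more than it returns from the descriptor, which matches how common implementations behave but effectively turns getline() back into a byte-at-a-time reader. The function name read_line_unbuffered() is made up.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Read one line (newline included) from fd through a temporary
 * unbuffered stream. Returns a malloc()ed string or NULL; caller frees.
 * Assumption: with _IONBF stdio does not read past the newline. */
static char *read_line_unbuffered(int fd)
{
    char *line = NULL;
    size_t cap = 0;
    int copy;
    FILE *fp;

    assert(fd >= 0);
    copy = dup(fd);             /* so fclose() leaves the caller's fd open */
    if (copy < 0)
        return NULL;
    fp = fdopen(copy, "r");
    if (!fp) {
        close(copy);
        return NULL;
    }
    /* setvbuf() has to come before any other operation on the stream. */
    if (setvbuf(fp, NULL, _IONBF, 0) != 0 || getline(&line, &cap, fp) < 0) {
        free(line);
        line = NULL;
    }
    fclose(fp);
    return line;
}
```

The duplicated descriptor shares its file offset with fd, so the caller can continue with read() on fd right after the line.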

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/



Re: Reply: How do I get the buffered bytes in a FILE *?

2022-04-17 Thread Rob Landley via austin-group-l at The Open Group
On 4/17/22 18:10, Chet Ramey wrote:
> On 4/16/22 2:58 PM, Rob Landley via austin-group-l at The Open Group wrote:
>> Q) "How do I switch from FILE * to fd via fileno() without losing data."
>> 
>> A) "Don't use FILE *"
>> 
>> That's not the question I asked?
> 
> The answer is correct, but incomplete. The missing piece is that if you
> want to use FILE *, the operation you want, and the information you need to
> implement it, are not part of the public API.

Which is a fixable problem.

> Other than using a strategy like Geoff suggested early on, or trying
> something like setvbuf to turn off buffering on the FILE * completely, the
> buffer associated with a FILE * and the indexes into it that say how much
> data you've consumed from the underlying source are opaque.

https://github.com/coreutils/gnulib/blob/master/lib/freadahead.c

https://sources.debian.org/src/m4/1.4.18-2/lib/freadahead.c

> If you want to
> manipulate that information, or expose it to a caller, you can't use FILE *
> (or, if you want a direct answer, "you can't").

The if/else staircase in m4 and gnulib and so on says I can. I was just
wondering if there was a _clean_ way to do it. (The gnu lot exposed it as part
of glib's exports, but glib's license isn't compatible with my projects and
pulling in a library like that just for one accessor function to read an integer
out of a struct is a bit silly anyway.)

The C99 guys point out they haven't got file descriptors and thus this would
logically belong in posix, for the same reason fileno() does. "But FILE *
doesn't have a way to fetch the file descriptor" was answered by adding
fileno(). That is ALSO grabbing an integer out of the guts of FILE *.

> I found it easier to write my own buffered input package to satisfy the
> POSIX read ahead requirements than try to coerce stdio into doing it.

Reimplementing FILE * from scratch and not using the ANSI one is a common way to
address this limitation, yes. So is the if/else staircase of libc specific
hacks. Or in the case of musl (and now bionic, according to the end of
http://lists.landley.net/pipermail/toybox-landley.net/2022-April/012824.html)
adding __freadahead(fp) to return the number of bytes in the buffer.

Z/OS has it too:

https://www.ibm.com/docs/en/zos/2.3.0?topic=lf-freadahead-retrieve-number-bytes-remaining-in-input-buffer

Of course the lack of standardization means that Dragonfly BSD called theirs
"__sreadahead()" instead...

https://github.com/DragonFlyBSD/DragonFlyBSD/blob/master/lib/libc/stdio/sreadahead.c

But can you really say it's opaque when so many independent reimplementations of
how to access it already exist?

This exists. It would be nice if it got standardized.

Rob



Re: Reply: How do I get the buffered bytes in a FILE *?

2022-04-18 Thread Chet Ramey via austin-group-l at The Open Group

On 4/18/22 12:53 AM, Rob Landley wrote:
> On 4/17/22 18:10, Chet Ramey wrote:
>> On 4/16/22 2:58 PM, Rob Landley via austin-group-l at The Open Group wrote:
>>> Q) "How do I switch from FILE * to fd via fileno() without losing data."
>>> 
>>> A) "Don't use FILE *"
>>> 
>>> That's not the question I asked?
>> 
>> The answer is correct, but incomplete. The missing piece is that if you
>> want to use FILE *, the operation you want, and the information you need to
>> implement it, are not part of the public API.
> 
> Which is a fixable problem.


Sure, everything's fixable. It's not what you asked, though.




>> Other than using a strategy like Geoff suggested early on, or trying
>> something like setvbuf to turn off buffering on the FILE * completely, the
>> buffer associated with a FILE * and the indexes into it that say how much
>> data you've consumed from the underlying source are opaque.
> 
> https://github.com/coreutils/gnulib/blob/master/lib/freadahead.c


So the gnulib folks looked at a bunch of different stdio implementations
and used non-public (or at least non-standard) portions of the
implementation to augment the stdio API.

If that's what you want to do, propose adding freadahead to the standard.

Or reimplement the gnulib work and accept that the stdio implementation
can potentially change out from under you. Current POSIX provides no help
here.



>> If you want to
>> manipulate that information, or expose it to a caller, you can't use FILE *
>> (or, if you want a direct answer, "you can't").
> 
> The if/else staircase in m4 and gnulib and so on says I can.


Not in a way that protects you against changes to one of the underlying
stdio implementations. And isn't that the point? You can always offer that
functionality if you have stable access to stdio internals, but it's not in
the standard.


> I was just wondering if there was a _clean_ way to do it.


OK. Do you think you've gotten an answer to that?



> The C99 guys point out they haven't got file descriptors and thus this would
> logically belong in posix, for the same reason fileno() does. "But FILE *
> doesn't have a way to fetch the file descriptor" was answered by adding
> fileno(). That is ALSO grabbing an integer out of the guts of FILE *.


Sure. And adding that to the standard would require the usual things, for
which there's a process.


> This exists. It would be nice if it got standardized.


Maybe it would. But that's a different question.


--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/



Re: Reply: How do I get the buffered bytes in a FILE *?

2022-09-22 Thread Rob Landley via austin-group-l at The Open Group
On 4/17/22 23:53, Rob Landley wrote:
> On 4/17/22 18:10, Chet Ramey wrote:
>> On 4/16/22 2:58 PM, Rob Landley via austin-group-l at The Open Group wrote:
>>> Q) "How do I switch from FILE * to fd via fileno() without losing data."
>>> 
>>> A) "Don't use FILE *"
>>> 
>>> That's not the question I asked?
>> 
>> The answer is correct, but incomplete. The missing piece is that if you
>> want to use FILE *, the operation you want, and the information you need to
>> implement it, are not part of the public API.
> 
> Which is a fixable problem.
> 
>> Other than using a strategy like Geoff suggested early on, or trying
>> something like setvbuf to turn off buffering on the FILE * completely, the
>> buffer associated with a FILE * and the indexes into it that say how much
>> data you've consumed from the underlying source are opaque.
> 
> https://github.com/coreutils/gnulib/blob/master/lib/freadahead.c
> 
> https://sources.debian.org/src/m4/1.4.18-2/lib/freadahead.c

And now it's in bionic:

https://android-review.googlesource.com/c/platform/bionic/+/2227544

Backstory: protocols like http use mbox style data formats with lines of text
followed by a blob of data. I want to use getline() to read the lines of text
(which among other things tells me the size of the ensuing data), and then
sendfile() the data afterwards.

Posix provides fileno(FILE) to hand off an fd from getline() context to
sendfile() context, but data is lost in the transition when the FILE * has read
and buffered extra data. The API to ask how much data is in the buffer (so I can
fread() it and be sure I have all of it, without triggering additional reads
from the underlying fd) is __freadahead(), which posix does not yet standardize.

The C standards committee does not use file descriptors: fileno() is posix.

Rob