Re: How do I get the buffered bytes in a FILE *?

2022-04-16 Thread Jilles Tjoelker via austin-group-l at The Open Group
On Tue, Apr 12, 2022 at 10:42:02AM +0100, Geoff Clare via austin-group-l at The 
Open Group wrote:
> Rob Landley wrote, on 11 Apr 2022:
> > A bunch of protocols (git, http, mbox, etc) start with lines of data
> > followed by a block of data, so it's natural to want to call
> > getline() and then handle the data block. But getline() takes a FILE
> > * and things like zlib and sendfile() take an integer file
> > descriptor.

> > Posix lets me get the file descriptor out of a FILE * with fileno(),
> > but the point of FILE * is to readahead and buffer. How do I get the
> > buffered data out without reading more from the file descriptor?

> > I can't find a portable way to do this?

> I tried this sequence of calls on a few systems, and it worked in the
> way you would expect:

> fgets(buf, sizeof buf, fp);
> int fd = dup(fileno(fp));
> close(fileno(fp));
> while ((ret = fread(buf, 1, sizeof buf, fp)) > 0) { ... }
> read(fd, buf, sizeof buf);

> It relies on fread() not detecting EBADF until it tries to read more
> data from the underlying fd.

> It has some caveats:

> 1. It needs a file descriptor to be available.

> 2. The close() will remove any fcntl() locks that the calling process
>    holds for the file.

> 3. In a multi-threaded process it has the usual problem around fd
>    inheritance, but that's addressed in Issue 8 with the addition
>    of dup3().

There is another dangerous problem: if another thread or a signal
handler allocates another fd and it is assigned the number fileno(fp),
the while loop might read data from a completely unrelated file. This
could be avoided by dup2/dup3'ing /dev/null onto fileno(fp) instead of
closing it (at the cost of another file descriptor).
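
A minimal sketch of the /dev/null variant described above, in C. The helper names detach_fd() and copy_rest() are hypothetical, and the sketch assumes the stdio implementation hands back already-buffered bytes before it touches the descriptor again. Replacing the descriptor behind an active stream is outside what XSH 2.5.1 permits, so this illustrates the idea and is not a portable guarantee.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: after stdio has buffered some input, return a
 * duplicate of the stream's descriptor (sharing its file offset) and
 * point fileno(fp) at /dev/null. The descriptor number is replaced in
 * one step by dup2(), so it is never free for another thread to reuse.
 * Returns -1 on error.
 */
static int detach_fd(FILE *fp)
{
    assert(fp != NULL);
    int orig = fileno(fp);
    int fd = dup(orig);
    if (fd < 0)
        return -1;
    int nullfd = open("/dev/null", O_RDONLY);
    if (nullfd < 0) {
        close(fd);
        return -1;
    }
    if (dup2(nullfd, orig) < 0) {
        close(nullfd);
        close(fd);
        return -1;
    }
    close(nullfd);
    return fd;
}

/*
 * Hypothetical helper: copy everything after the current stream
 * position to out. First the bytes stdio already buffered (fread()
 * then sees end-of-file from /dev/null), then the remainder straight
 * from the duplicated descriptor. Returns bytes copied, or -1.
 */
static long copy_rest(FILE *fp, FILE *out)
{
    char buf[4096];
    long total = 0;
    size_t n;
    ssize_t r;
    int fd = detach_fd(fp);

    if (fd < 0)
        return -1;
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
        fwrite(buf, 1, n, out);
        total += (long)n;
    }
    while ((r = read(fd, buf, sizeof buf)) > 0) {
        fwrite(buf, 1, (size_t)r, out);
        total += (long)r;
    }
    close(fd);
    return r < 0 ? -1 : total;
}
```

The sketch still needs two spare descriptors for a moment (the duplicate and /dev/null), and the stream is only good for fclose() afterwards. In place of the second loop, the returned descriptor could equally be handed to something like sendfile() or zlib.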

> Also, for the standard to require it to work, I think we would need to
> tweak the EBADF error for fgetc() (which fread() references) to say:

> The file descriptor underlying stream is not a valid file
> descriptor open for reading and there is no buffered data
> available to be returned.

Although I don't expect it to break in practice, the close(fileno(fp))
or dup2(..., fileno(fp)) violates the rules about the "active handle" in
XSH 2.5.1 Interaction of File Descriptors and Standard I/O Streams.

I believe the "correct" solution with a stdio implementation that
doesn't offer something like freadhead() is not to use stdio but to
implement one's own buffering.
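
As a rough illustration of what such private buffering could look like, here is a small C sketch. The struct fdbuf type and the fdbuf_getline() and fdbuf_take() functions are hypothetical names invented for this example. Because the caller owns the buffer, the read-ahead bytes are always reachable.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical reader state; unread data is buf[pos..len). */
struct fdbuf {
    int fd;
    size_t pos, len;
    char buf[4096];
};

/*
 * Read one line, including its '\n', into line[] and NUL-terminate it.
 * Returns the length, 0 at end-of-file, or -1 on a read error.
 * Lines longer than cap - 1 bytes are returned in pieces.
 * EINTR handling is left out to keep the sketch short.
 */
static ssize_t fdbuf_getline(struct fdbuf *b, char *line, size_t cap)
{
    size_t out = 0;

    assert(cap >= 2);
    while (out < cap - 1) {
        if (b->pos == b->len) {
            ssize_t r = read(b->fd, b->buf, sizeof b->buf);
            if (r < 0)
                return -1;
            if (r == 0)
                break;
            b->pos = 0;
            b->len = (size_t)r;
        }
        char c = b->buf[b->pos++];
        line[out++] = c;
        if (c == '\n')
            break;
    }
    line[out] = '\0';
    return (ssize_t)out;
}

/*
 * Hand up to cap already-buffered bytes to the caller and mark them
 * consumed. Once this returns 0, b->fd can be used directly.
 */
static size_t fdbuf_take(struct fdbuf *b, char *dst, size_t cap)
{
    size_t n = b->len - b->pos;

    if (n > cap)
        n = cap;
    memcpy(dst, b->buf + b->pos, n);
    b->pos += n;
    return n;
}
```

A caller would read the header lines with fdbuf_getline(), drain the leftover bytes with fdbuf_take(), and then pass b->fd to whatever consumes the data block.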

-- 
Jilles Tjoelker



Re: POSIX gettext() and uselocale()

2021-05-24 Thread Jilles Tjoelker via austin-group-l at The Open Group
On Tue, May 04, 2021 at 01:07:39AM +0200, Bruno Haible via
austin-group-l at The Open Group wrote:
> https://posix.rhansen.org/p/gettext_split
> says (line 92):

>   "The returned string may be invalidated by a subsequent call to
>    bind_textdomain_codeset(), bindtextdomain(), setlocale(),
>    textdomain(), or uselocale()."

> While in most programs setlocale(), textdomain(), bindtextdomain(),
> bind_textdomain_codeset() are being called at the beginning of the
> program execution, before any call to gettext(), the situation is
> very different for uselocale().

> 1) uselocale() is meant to have effects ONLY on the thread in which it
>    is called.

> 2) uselocale() is a helper function to implement *_l functions where
>    the POSIX standard does not specify them or the system does not have
>    them.
>    For example, when a program wants to have a function to parse
>    a number, recognizing only the ASCII digits and only '.' as decimal
>    separator, a reliable way to implement such a function is by calling
>    uselocale of the "C" locale, strtod(), and then uselocale() again
>    to switch the thread back to the previous locale.

>    If POSIX did not have uselocale(), it would need to provide many
>    more *_l functions.
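
The number-parsing pattern in point 2 might look roughly like this in C. The function name strtod_c() is hypothetical, and the sketch assumes a system where newlocale(), uselocale() and freelocale() are declared in <locale.h> as POSIX.1-2008 specifies (some systems need an extra header).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <stdlib.h>

/*
 * Hypothetical helper: strtod() that always parses in the "C" locale,
 * so only ASCII digits and '.' are recognized. Only the calling
 * thread's locale is switched, and it is switched back before return.
 * If the "C" locale object cannot be created, no conversion is done:
 * 0.0 is returned and *end (if requested) is set to s.
 */
static double strtod_c(const char *s, char **end)
{
    assert(s != NULL);
    locale_t c_loc = newlocale(LC_ALL_MASK, "C", (locale_t)0);
    if (c_loc == (locale_t)0) {
        if (end != NULL)
            *end = (char *)s;
        return 0.0;
    }
    locale_t prev = uselocale(c_loc);   /* affects this thread only */
    double v = strtod(s, end);
    uselocale(prev);                    /* restore previous locale */
    freelocale(c_loc);
    return v;
}
```

A library could call a helper like this at any time on any thread, which is why tying the lifetime of gettext() results to uselocale() calls matters in practice.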

> If the gettext() result may be invalidated by a uselocale() call (in
> any other thread!), this would mean that

>   ** Programs can use gettext() or uselocale() but not both. **

> and - more or less -

>   ** Multithreaded programs that use libraries (that may use uselocale())
>      cannot use gettext(). **

> I think that specifying gettext() to be so restricted is not useful.
> It would make more sense to allow concurrent uselocale() calls.

> Proposed wording:

>   "The returned string may be invalidated by a subsequent call to
>    bind_textdomain_codeset(), bindtextdomain(), setlocale(),
>    or textdomain()."

This may be a bit too weak. Now the implementation can never free a
string that was returned by a gettext call on a thread with uselocale()
active, while logically the string may be owned by the locale and could
be freed if that locale is no longer set on any thread and freelocale()
has been called on it as needed.

-- 
Jilles Tjoelker



Re: [1003.1(2016/18)/Issue7+TC2 0001418]: Add options to ulimit to match get/setrlimit()

2020-11-17 Thread Jilles Tjoelker via austin-group-l at The Open Group
On Tue, Nov 17, 2020 at 03:14:43PM +, Geoff Clare via austin-group-l
at The Open Group wrote:
> Or I could just go with my original suggestion of adding:

> Conforming applications shall specify each option separately; that is,
> grouping option letters (for example, −fH) need not be recognized by
> all implementations.

> to my proposal.

Unfortunately, even that will not be enough. A -H or -S option or the
end-of-options argument "--" may go between the resource option and the
limit value. For example, the following three commands
  ulimit -c -S 1
  ulimit -c -- 1
  ulimit -c -S -- 1
comply with the amended proposed specification, but do not work as
expected in bash (they seem to write the -c limit to stdout and ignore
the operand).

Of course, there is no need for the end-of-options argument, but
consistency with other utilities suggests it should be allowed.

So perhaps additionally:
  Conforming applications shall not use the -- argument to indicate the
  end of options and shall not place the -S and -H options after options
  indicating resources.

These are violations of Guidelines 10 and 11 from 12.2 Utility Syntax
Guidelines.

One might also wonder whether the language of options and operands is
still worth the trouble at this point.

-- 
Jilles Tjoelker



Re: [1003.1(2016/18)/Issue7+TC2 0001418]: Add options to ulimit to match get/setrlimit()

2020-11-13 Thread Jilles Tjoelker via austin-group-l at The Open Group
On Mon, Nov 09, 2020 at 03:07:43PM +, Geoff Clare via austin-group-l
at The Open Group wrote:
> The ksh and bash behaviour of reporting multiple values seems more
> useful to me, but I wouldn't object if others want to make this
> unspecified.

With bash, reporting multiple values does not work if the options are
grouped into a single argument:

% bash -c 'ulimit -fn' 
bash: line 0: ulimit: n: invalid number
% bash -c 'ulimit -f -n'
file size   (blocks, -f) unlimited
open files  (-n) 231138

With ksh93, both these commands work as expected.

Similarly, commands like  ulimit -fH  do not work in bash. It must be
-Hf, -H -f or -f -H.

I'm testing with
GNU bash, version 5.0.18(3)-release (amd64-portbld-freebsd12.1)
and
  version sh (AT&T Research) 93u+ 2012-08-01

-- 
Jilles Tjoelker