Re: [Issue 8 drafts 0001550]: clarifications/ambiguities in the description of context addresses and their delimiters for sed

2022-04-16 Thread Christoph Anton Mitterer via austin-group-l at The Open Group
On Tue, 2022-04-05 at 23:33 +0700, Robert Elz via austin-group-l at The Open Group wrote:
> Not just portable, but sane.   Only a moron would actually use . ? * [ ( ...
> as a delimiter, there are plenty of perfectly good alternatives available
> when good old / isn't the best choice (which it often isn't when
> manipulating path names).   Personally I'm quite fond of ascii BEL (^G)
> as the delimiter in the cases when neither / nor ; (my 2 favourites)
> are really available (while BEL probably isn't technically portable,
> it always works in my experience).
> 
> It still needs to be clear that it is possible to be a moron if one
> wants, but in such cases, some things just might not be possible.

I should perhaps add that I don't actually want to use special
characters as delimiters myself. ;-)

My use case is rather writing a function which escapes arbitrary
strings as literals for use in BREs or EREs, and also for use in sed
commands (so the delimiter chosen by the user of my function needs to
be taken into account).

I didn't want to simply forbid special characters as delimiters if they
would technically work.
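Purely as an illustration of the kind of function described above, here is a conservative C sketch. `sed_bre_escape()` is a hypothetical name, not an existing API, and unlike what Chris is aiming for it simply rejects special characters (as well as alphanumerics, backslash and newline) as delimiters instead of trying to support them. That sidesteps the ambiguities discussed in this thread rather than resolving them.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from any library): escape `s` so that it
 * matches itself literally as the BRE of a sed context address or
 * s command that uses `delim` as its delimiter.  Returns a malloc()ed
 * string, or NULL if `delim` is rejected or allocation fails.
 * Assumes single-byte characters. */
char *sed_bre_escape(const char *s, char delim)
{
    assert(s != NULL);

    /* Refuse delimiters whose escaped form is ambiguous or undefined:
     * NUL, newline, alphanumerics (think "\n"), backslash, and characters
     * that are special in a BRE or whose backslash-escaped form is
     * special or undefined in a BRE. */
    if (delim == '\0' || delim == '\n' || isalnum((unsigned char)delim) ||
        strchr("\\.[]*^$(){}+?|", delim) != NULL)
        return NULL;

    char *out = malloc(2 * strlen(s) + 1);  /* worst case: every char escaped */
    if (out == NULL)
        return NULL;

    char *p = out;
    for (; *s != '\0'; s++) {
        if (*s == '\n') {
            /* sed's "\n" matches a newline embedded in the pattern space */
            *p++ = '\\';
            *p++ = 'n';
            continue;
        }
        /* BRE special characters outside a bracket expression, plus the
         * delimiter, are preceded by a backslash. */
        if (strchr(".[\\*^$", *s) != NULL || *s == delim)
            *p++ = '\\';
        *p++ = *s;
    }
    *p = '\0';
    return out;
}
```

For example, `sed_bre_escape("a.b/c*", '/')` yields `a\.b\/c\*`, while `sed_bre_escape("x", '.')` returns NULL.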


>   | It might be worth altering this somehow, but "literal" is wrong
>   | (specifically if the delimiter is '^' or '-', or things like ':' in
>   | [[:alpha:]]).
> 
> That depends upon the context of the word "literal" there - I just took
> it to mean that the character would mean the same thing as it would if it
> were not also the delimiter, not that it would be deprived of any other
> magic properties it might gain by such use.

I like Geoff's choice of "normal" plus the example.

"Literal" and assuming some context (even when explaining it) would
have just made the text again ambiguous or at least more complex to
read.


>   | > => And perhaps something like "should put it inside a bracket
>   | > expression __with no other characters__" to make clear that one
>   | > cannot re-use one, e.g. 'sX\X[0-9]XfooX' can NOT be written as
>   | > 'sX[X0-9]XfooX' but only as 'sX[X][0-9]XfooX'.
>   |
>   | Incorrect, sX[X0-9]XfooX is required to "work"
> 
> I think the point there was that it doesn't mean the same thing, in that
> one a single char is being substituted, in the others, it is a 2 char
> sequence, the delimiter, followed by a digit, not either the delimiter or a
> digit.

I've put my thoughts on the solution that Geoff and you came up with
into my upcoming reply (probably tomorrow) to Geoff's longer mail.


Thanks,
Chris.



[Issue 8 drafts 0001578]: sed y-command: error in description about the number of characters in string1 and string2

2022-04-16 Thread Austin Group Bug Tracker via austin-group-l at The Open Group


The following issue has been SUBMITTED. 
== 
https://www.austingroupbugs.net/view.php?id=1578 
== 
Reported By:         calestyo
Assigned To:
==
Project:             Issue 8 drafts
Issue ID:            1578
Category:            Base Definitions and Headers
Type:                Error
Severity:            Editorial
Priority:            normal
Status:              New
Name:                Christoph Anton Mitterer
Organization:
User Reference:
Section:             Utilities, sed
Page Number:         3138
Line Number:         106249
Final Accepted Text:
== 
Date Submitted: 2022-04-17 01:40 UTC
Last Modified:  2022-04-17 01:40 UTC
== 
Summary:             sed y-command: error in description about the number of characters in string1 and string2
Description: 
Hey.

I originally noted this in
https://austingroupbugs.net/view.php?id=1551#c5780, as point (VI)
at the very bottom of that note:

The description of the y-command contains on page 3138, line 106249:
"If the number of characters in string1 and string2 are not equal, or if
any of the characters in string1 appear more than once, the results are
undefined."

Strictly speaking, that is wrong when string1 and/or string2 contain a
'\'-escaped 'n' (for newline) or '\'-escaped delimiters and the number of
such escapes differs between the two strings: each escape sequence is two
characters in the raw string but stands for a single character.
Desired Action: 
Perhaps simply write "If the number of characters (after resolving any
escape sequences)..." or so?
== 

Issue History 
Date Modified    Username   Field          Change
2022-04-17 01:40 calestyo   New Issue
2022-04-17 01:40 calestyo   Name        => Christoph Anton Mitterer
2022-04-17 01:40 calestyo   Section     => Utilities, sed
2022-04-17 01:40 calestyo   Page Number => 3138
2022-04-17 01:40 calestyo   Line Number => 106249
==




Re: Reply: How do I get the buffered bytes in a FILE *?

2022-04-16 Thread Rob Landley via austin-group-l at The Open Group
Q) "How do I switch from FILE * to fd via fileno() without losing data."

A) "Don't use FILE *"

That's not the question I asked?

The C99 guys said they haven't got fileno() or anything using file descriptors,
so this ball is not in their court. Posix has fileno(). That's why I'm
asking here.

Rob

On 4/16/22 00:44, Danny Niu wrote:
> Rob, you can use the MSG_PEEK flag on recv(2) instead of relying on stdio
> FILE* handles.
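A sketch of the MSG_PEEK suggestion, assuming the input is a socket (MSG_PEEK does not apply to pipes or regular files): peek at the pending bytes, find the end of the header line, then consume exactly that many bytes, leaving the rest queued in the kernel for whatever code takes over the descriptor next. The function name `recv_line_exact()` is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: read one '\n'-terminated line from the socket `sock`
 * into `buf` (NUL-terminated) without consuming any byte after the newline.
 * Returns the line length, 0 on EOF, or -1 on error or if no complete line
 * is currently available within len - 1 bytes. */
ssize_t recv_line_exact(int sock, char *buf, size_t len)
{
    assert(buf != NULL && len > 1);

    /* Look at what is queued without removing it from the socket. */
    ssize_t n = recv(sock, buf, len - 1, MSG_PEEK);
    if (n <= 0)
        return n;

    char *nl = memchr(buf, '\n', (size_t)n);
    if (nl == NULL)
        return -1;

    /* Now consume exactly the line, newline included. */
    size_t want = (size_t)(nl - buf) + 1;
    ssize_t got = recv(sock, buf, want, 0);
    if (got < 0)
        return -1;
    buf[got] = '\0';
    return got;
}
```

A fuller version would wait and peek again when the queued data does not yet contain a newline, instead of giving up.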
> 
>
> From: Rob Landley via austin-group-l at The Open Group
> Date: Tuesday, 2022-04-12 05:59:31
> To: Rich Felker
> Cc: austin-group-l@opengroup.org
> Subject: Re: How do I get the buffered bytes in a FILE *?
> 
> On 4/11/22 15:41, Rich Felker wrote:
>>> But I can't find a portable way to do this?
>> 
>> To give some context to this question, the __freadahead function
>> present in musl libc was created in 2012 to resolve a conflict between
>> gnulib, which has traditionally used an #ifdef jungle to provide
>> "freadahead" and other functionality by poking at FILE internals for
>> each known target they support, and musl, which explicitly makes FILE
>> an opaque non-ABI type. The idea was to let them implement the
>> function in a way that keeps the private member accesses on the stdio
>> implementation side.
>> 
>> While I don't like this interface, gnulib is longstanding historical
>> precedent for its existence ~somewhere~ (just not as part of the
>> implementation), and it's historical precedent for major software
>> wanting this kind of access to stdio.
> 
> I just emailed the chair of the C standard group and 90% of his reply was about
> text vs binary mode with FILE * (which is not present in Linux, MacOS, Android,
> iOS, Solaris, any embedded OS I've encountered...)
>
> I'm personally fine with fileno() returning -1 when the FILE * is in text mode,
> let alone freadahead(). Even the coreutils developers are noping out of support
> for things like cygwin now that Windows Subsystem for Linux exists:
> 
> https://lists.gnu.org/archive/html/coreutils/2022-04/msg00038.html
> 
> 
> The C committee chair then said:
> 
>>   File descriptors are outside the scope of the C standard, so any
>> support for switching back and forth between streams and file
>> descriptors belongs elsewhere.
> 
> I.E. ANSI C doesn't have read(), write(), or open(). They don't do ANYTHING with
> file descriptors.
> 
> That's why fileno() and fdopen() are only in posix, not in ANSI C. And the
> issues I'm currently trying to solve are a result of getline() showing up in
> posix-2008, which also does not exist in ANSI C.
> 
> Thus an freadahead() function to encapsulate the horrible #ifdef staircase
> people are already repeatedly reinventing belongs in Posix, not ANSI. The
> function is needed to make Posix's existing fileno() reliable, and in the
> absence of a standard this has already been reimplemented multiple times.
> 
> GNU has attempted to centralize its workaround collection in gnulib:
> https://github.com/digitalocean/gnulib/blob/master/lib/freadahead.c
> 
> 
> (Leading to a bunch of patches for m4 and glib coming up when you google for
> freadahead because said staircase breaks a lot.)
> 
> But even IBM Z/OS implemented __freadahead():
> 
> 

Re: How do I get the buffered bytes in a FILE *?

2022-04-16 Thread Jilles Tjoelker via austin-group-l at The Open Group
On Tue, Apr 12, 2022 at 10:42:02AM +0100, Geoff Clare via austin-group-l at The Open Group wrote:
> Rob Landley wrote, on 11 Apr 2022:
> > A bunch of protocols (git, http, mbox, etc) start with lines of data
> > followed by a block of data, so it's natural to want to call
> > getline() and then handle the data block. But getline() takes a FILE
> > * and things like zlib and sendfile() take an integer file
> > descriptor.

> > Posix lets me get the file descriptor out of a FILE * with fileno(),
> > but the point of FILE * is to readahead and buffer. How do I get the
> > buffered data out without reading more from the file descriptor?

> > I can't find a portable way to do this?
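The data loss described above is easy to reproduce. In this sketch (the function name is made up for the demo), a header line followed by a few body bytes is written to a pipe. On common implementations such as glibc and musl, stdio pulls everything available into its buffer on the first read, so after getline() the body is no longer visible to read() on fileno(fp).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: feed `data` through a pipe, read the first line with
 * getline(), then ask read(fileno(fp), ...) for whatever follows.
 * Returns the number of bytes read() sees, or -1 if setup fails.
 * `data` must be small enough to fit in the pipe without blocking. */
ssize_t bytes_visible_after_getline(const char *data, size_t len)
{
    int p[2];

    assert(data != NULL);
    if (pipe(p) != 0)
        return -1;
    if (write(p[1], data, len) != (ssize_t)len) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    close(p[1]);                 /* the reader will hit EOF after `data` */

    FILE *fp = fdopen(p[0], "r");
    if (fp == NULL) {
        close(p[0]);
        return -1;
    }

    char *line = NULL;
    size_t cap = 0;
    ssize_t r = getline(&line, &cap, fp);   /* consumes the header line... */
    free(line);
    if (r < 0) {
        fclose(fp);
        return -1;
    }

    /* ...but stdio has usually read ahead, so the rest of `data` now sits
     * in fp's buffer where a plain read() on the descriptor cannot see it. */
    char buf[256];
    ssize_t n = read(fileno(fp), buf, sizeof buf);
    fclose(fp);
    return n;
}
```

With `"header\nBODY"` as input, read() typically reports 0 (end of file): the four body bytes are stranded in the FILE buffer.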

> I tried this sequence of calls on a few systems, and it worked in the
> way you would expect:

> fgets(buf, sizeof buf, fp);
> int fd = dup(fileno(fp));
> close(fileno(fp));
> while ((ret = fread(buf, 1, sizeof buf, fp)) > 0) { ... }
> read(fd, buf, sizeof buf);

> It relies on fread() not detecting EBADF until it tries to read more
> data from the underlying fd.

> It has some caveats:

> 1. It needs a file descriptor to be available.

> 2. The close() will remove any fcntl() locks that the calling process
>    holds for the file.

> 3. In a multi-threaded process it has the usual problem around fd
>    inheritance, but that's addressed in Issue 8 with the addition
>    of dup3().
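Geoff's fragment leaves out the declarations and the loop body. Here it is filled out as a small C function, a sketch under the assumptions stated in its comments; `drain_and_detach()` is a made-up name. It inherits the caveats listed above, plus a descriptor-reuse hazard: after close(), the number is free again, and a later fclose(fp) will close that number a second time.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Geoff's sequence as a function (the name drain_and_detach() is made up).
 * Call it after fgets()/getline() has consumed the header.  Copies the
 * bytes stdio had already buffered into `out` (up to `outsz`), stores a
 * dup of the underlying descriptor in *fd_out (-1 on failure), and returns
 * the number of bytes copied.  Relies on fread() handing out buffered data
 * before it notices EBADF, which the standard does not yet guarantee. */
size_t drain_and_detach(FILE *fp, char *out, size_t outsz, int *fd_out)
{
    size_t total = 0, ret;

    assert(fp != NULL && out != NULL && fd_out != NULL);
    *fd_out = dup(fileno(fp));
    if (*fd_out < 0)
        return 0;

    /* After this close(), any attempt by stdio to refill fails with EBADF.
     * Caution: the descriptor number is now free and may be reused by
     * another thread or a signal handler, and a later fclose(fp) will
     * close that number again. */
    close(fileno(fp));

    while (total < outsz &&
           (ret = fread(out + total, 1, outsz - total, fp)) > 0)
        total += ret;
    clearerr(fp);               /* the failed refill set the error flag */
    return total;               /* continue with read(*fd_out, ...) */
}
```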

There is another dangerous problem: if another thread or a signal
handler allocates another fd and it is assigned the number fileno(fp),
the while loop might read data from a completely unrelated file. This
could be avoided by dup2/dup3'ing /dev/null onto fileno(fp) instead of
closing it (at the cost of another file descriptor).
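The /dev/null refinement can be sketched the same way (`drain_via_devnull()` is again a made-up name). A descriptor for /dev/null is dup2()ed over fileno(fp) instead of closing it, so the number never becomes free, and stdio's refill after the buffered bytes simply sees end-of-file. The sketch uses dup2() rather than Issue 8's dup3() for portability and, as noted below, still breaks the active-handle rules.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* /dev/null variant (function name made up): instead of closing fileno(fp),
 * dup2() a /dev/null descriptor onto it.  The descriptor number stays in
 * use, so it cannot be handed to another thread or a signal handler, and
 * stdio's next refill simply sees EOF.  Costs one extra descriptor
 * briefly, and still breaks the XSH 2.5.1 "active handle" rules. */
size_t drain_via_devnull(FILE *fp, char *out, size_t outsz, int *fd_out)
{
    size_t total = 0, ret;

    assert(fp != NULL && out != NULL && fd_out != NULL);
    *fd_out = -1;

    int null_fd = open("/dev/null", O_RDONLY);
    if (null_fd < 0)
        return 0;
    int fd = dup(fileno(fp));
    if (fd < 0 || dup2(null_fd, fileno(fp)) < 0) {
        if (fd >= 0)
            close(fd);
        close(null_fd);
        return 0;
    }
    close(null_fd);             /* fileno(fp) itself now refers to /dev/null */

    while (total < outsz &&
           (ret = fread(out + total, 1, outsz - total, fp)) > 0)
        total += ret;
    *fd_out = fd;               /* continue with read(fd, ...) */
    return total;               /* a later fclose(fp) closes only /dev/null */
}
```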

> Also, for the standard to require it to work, I think we would need to
> tweak the EBADF error for fgetc() (which fread() references) to say:

> The file descriptor underlying stream is not a valid file
> descriptor open for reading and there is no buffered data
> available to be returned.

Although I don't expect it to break in practice, the close(fileno(fp))
or dup2(..., fileno(fp)) violates the rules about the "active handle" in
XSH 2.5.1 Interaction of File Descriptors and Standard I/O Streams.

I believe the "correct" solution with a stdio implementation that
doesn't offer something like freadhead() is not to use stdio but
implement own buffering.
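A minimal sketch of the own-buffering approach, assuming a plain file descriptor as input. `struct lbuf`, `lb_getline()` and `lb_pending()` are hypothetical names, not an existing API. Because the application owns the buffer, the bytes read ahead past the header line are always reachable, which is exactly what stdio does not expose portably.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Minimal own-buffered line reader (all names are made up). */
struct lbuf {
    int fd;
    size_t start, end;          /* unconsumed bytes are buf[start, end) */
    char buf[4096];
};

/* Copies the next '\n'-terminated line into `line` (NUL-terminated).
 * Returns its length, 0 at EOF (a final unterminated line stays pending),
 * or -1 on a read error or if the line is too long for `line` or buf. */
ssize_t lb_getline(struct lbuf *b, char *line, size_t linesz)
{
    assert(b != NULL && line != NULL);
    for (;;) {
        char *p = b->buf + b->start;
        size_t avail = b->end - b->start;
        char *nl = memchr(p, '\n', avail);

        if (nl != NULL) {
            size_t n = (size_t)(nl - p) + 1;
            if (n >= linesz)
                return -1;
            memcpy(line, p, n);
            line[n] = '\0';
            b->start += n;
            return (ssize_t)n;
        }

        /* No complete line yet: move what we have to the front, refill. */
        memmove(b->buf, p, avail);
        b->start = 0;
        b->end = avail;
        if (b->end == sizeof b->buf)
            return -1;
        ssize_t r = read(b->fd, b->buf + b->end, sizeof b->buf - b->end);
        if (r < 0 && errno == EINTR)
            continue;
        if (r <= 0)
            return r < 0 ? -1 : 0;
        b->end += (size_t)r;
    }
}

/* Exposes the bytes that were read from fd but not yet consumed. */
const char *lb_pending(const struct lbuf *b, size_t *len)
{
    *len = b->end - b->start;
    return b->buf + b->start;
}
```

After lb_getline() returns the header, lb_pending() gives the start of the body; once those bytes have been handled, the program can carry on with plain read() on `b.fd`.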

-- 
Jilles Tjoelker