Re: test.1 -> "if file exists" -> "if pathname resolves to an existing directory entry"

2024-07-24 Thread tlaronde
On Wed, Jul 24, 2024 at 12:24:50AM +0700, Robert Elz wrote:
> Date:Tue, 23 Jul 2024 16:16:46 +0200
> From:
> Message-ID:  
> 
>   |   -e file   True if file exists (regardless of type).
>   |
>   | let me wondering: what "file" is supposed to exist? The symlink by
>   | itself? or what it points to?
> 
> Note the following wording in test(1) just before the EXIT STATUS
> section heading:
> 
>  Note that all file tests with the exception of -h and -L follow symbolic
>  links and thus evaluate the test for the file pointed at.
> 
> Which is another way of saying that they use stat(2) rather than lstat(2)
> except for -h (and the obsolete -L).

I would argue that the sentence is not in the right place. It should
be at the head, before the options are described.

For me, having to read the whole man page before attempting to parse
it (because one might misread it when something is defined,
inconspicuously, _after_ being used) is suboptimal. Not to mention
that the form:

-e file   True if file exists (regardless of type).
-h file   True if file exists and is a symbolic link.

automatically leads the reader to identify the first "file" with the
second one, i.e. "file" is whatever name you gave (subject to the
standard qualification relative to directories) without any
indirection.
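
To make the distinction concrete, here is a minimal sketch (not taken
from the man page) that combines -h and -e to tell apart the cases
discussed in this thread. The function name classify and the use of
mktemp -d for a scratch directory are assumptions made for
illustration only.

```shell
# classify NAME: report what "test" can tell us about a pathname.
# -h looks at the directory entry itself (lstat-like); -e follows
# symbolic links (stat-like).
classify() {
    if [ -h "$1" ]; then
        if [ -e "$1" ]; then
            echo "symlink to existing target"
        else
            echo "dangling symlink"
        fi
    elif [ -e "$1" ]; then
        echo "exists"
    else
        echo "missing"
    fi
}

# Usage: a scratch directory with one dangling symlink and one file.
dir=$(mktemp -d) || exit 1
ln -s "$dir/no-such-target" "$dir/SL"
: > "$dir/plain"
classify "$dir/SL"
classify "$dir/plain"
classify "$dir/absent"
rm -rf "$dir"
```

Testing -h first matters here: a dangling symlink fails -e, so an -e
test alone cannot tell it apart from a missing name.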

I usually invoke sed(1) with the quit action (sed -n '5{p;q;}') once I
have what I was after, typically a header or an excerpt from a header,
so as not to waste cycles reading a whole (perhaps huge) file whose
remainder does not interest me. I read a man page the same way ;-)
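
As a small sketch of that habit (the wrapper name nth_line is made up
for this example), the quit action makes sed stop reading as soon as
the wanted line has been printed:

```shell
# nth_line N: print only line N of standard input, then quit so the
# rest of the input is never read.
nth_line() {
    sed -n "${1}{p;q;}"
}

# Usage: prints the fifth line and stops there.
printf '%s\n' one two three four five six | nth_line 5
```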

> 
> You can verify this by doing:
> 
>   $ ln -s /foo/bar /tmp/SL
>   $ test -e /tmp/SL && echo SL exists
>   $
> 
> Needless to say the assumption here is that /foo/bar does not exist.
> 
> To achieve what you want:
> 
>   $ test -e /tmp/SL || test -h /tmp/SL && echo SL exists or is a symlink
>   SL exists or is a symlink
>   $
> 
> If you need a single (unraceable) test, that can often be achieved by
> attempting to make a link to the target filename, as link(2) and hence
> ln(1) without -f will fail if the target name exists.
> 
> kre

-- 
Thierry Laronde 
 http://www.kergis.com/
http://kertex.kergis.com/
 http://nunc-et-hic.fr/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C


test.1 -> "if file exists" -> "if pathname resolves to an existing directory entry"

2024-07-23 Thread tlaronde
FWIW, the current test.1 man page could, IMHO, be improved by
mimicking the POSIX wording, i.e. by replacing "if file exists" with
the corresponding variant of "if pathname resolves to an existing
directory entry...".

Context: I wondered how to test that a file exists in all cases, that
is, if it is a symlink, testing that it points to an existing file,
and the current description of "-e":

-e file   True if file exists (regardless of type).

leaves me wondering: what "file" is supposed to exist? The symlink
itself? Or what it points to?

Hence the suggestion.
-- 
Thierry Laronde 
 http://www.kergis.com/
http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C


Re: POSIX.2, IFS and echo command

2024-07-07 Thread tlaronde
On Sat, Jul 06, 2024 at 02:47:20PM -0700, Greg A. Woods wrote:
> At Sat, 6 Jul 2024 20:09:24 +0200, tlaro...@kergis.com wrote:
> Subject: Re: POSIX.2, IFS and echo command
> >
> > Thanks. I think then that dash(1) is, in some circumstances (what
> > version? what libc? Bug report was from a Debian node and I couldn't
> > reproduce this with the dash in pkgsrc) at fault and doesn't split the
> > arguments and passes "a\tb" as a single arg to echo.
> 
> dash(1) is kind of broken, I think by design/desire.
> 
> Herbert Xu forked a really old, and notably buggy at the time, NetBSD
> /bin/sh to create dash back in mid 1997.  Then for some reason he
> stopped pulling in NetBSD changes and fixes early in 2002 and so dash
> has diverged ever since, especially with its own unique changes.
> 
> For some interesting reading:
> 
>   https://www.in-ulm.de/~mascheck/various/ash/
> 
> Go up a couple of levels for even more good reading.

Thanks for the link! Quite a good read (and not limited to this
chapter). Indeed, although I check against POSIX.2 when I write
something that has to run on a myriad of systems, I have had to limit
the features used in the scripts to a common subset implemented and
accepted by all the variants shipped as the system sh(1). Your link
provides an interesting history of the variants and forks, of what was
fixed or not, and when, in each of them, and shows that some forks were
taken once and have been frozen since (Android, for example).

But, BTW, for the present bug (in dash): since the pkgsrc version is
up to date with the latest release (2022), the bug may be in glibc,
since dash run on NetBSD does not exhibit the problem (or the dash in
question is not the latest version; whatever).

On this version, echo $var seems to behave like echo "$var", i.e.
without field splitting.
-- 
Thierry Laronde 
 http://www.kergis.com/
http://kertex.kergis.com/
 http://nunc-et-hic.fr/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C


Re: POSIX.2, IFS and echo command

2024-07-06 Thread tlaronde
On Sat, Jul 06, 2024 at 07:46:51PM +0200, ??? wrote:
> On Sat, Jul 06, 2024 at 06:09:16PM +0200, tlaro...@kergis.com wrote:
> > With sh(1), and specially on NetBSD, I tend to expect this behavior:
> > 
> > $ line=$(printf "a\tb")
> > $ echo $line | od -a
> > 0000000    a  sp   b  nl
> > 0000004
> 
> line has the value a, tab, b.
> 
> Given the default value of IFS (space, tab, newline):
> $line splits this into a, b,
> so echo is as-if it were executed with "echo", "a", "b".
> 
> POSIX recommends against using echo in favour of printf due to the
> nonportability of echo, but in this case this is okay
> (no \es, first argument doesn't start with -).
> 

OK, this was my expectation.

> Thus (POSIX.1-2024 line numbers):
> 93207  STDOUT
> 93208  The echo utility arguments shall be separated by single <space>
> 93209  characters and a <newline> character shall follow the last argument.
> 
> So echo produces a, space, b, newline.
> 

Uh! I missed the STDOUT paragraph and only read the DESCRIPTION:

"The echo utility writes its arguments to standard output, followed by
a <newline>. If there are no arguments, only the <newline> is
written."

I would argue that the "single <space>" detail could be put in the
description, since it is as essential as the trailing newline, and
since writing to stdout is not a by-product but the core of what echo
does...

> Your sample above is (for the default IFS), guaranteed to hold.
> 
> If IFS=' ' (so tab doesn't split line) or echo "$line" then you'd expect
>   $ echo "$line" | od -ta
>   0000000    a  ht   b  nl
>   0000004
> 

Thanks. I think then that dash(1) is, in some circumstances (what
version? what libc? Bug report was from a Debian node and I couldn't
reproduce this with the dash in pkgsrc) at fault and doesn't split the
arguments and passes "a\tb" as a single arg to echo.

Thanks,
-- 
Thierry Laronde 
 http://www.kergis.com/
http://kertex.kergis.com/
 http://nunc-et-hic.fr/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C


POSIX.2, IFS and echo command

2024-07-06 Thread tlaronde
It's not a bug report, it's a search for enlightenment.

KerTeX is used on many systems, and my own compilation/installation
framework uses a very limited subset of POSIX.2 utilities, starting of
course with sh(1).

With sh(1), and especially on NetBSD, I tend to expect this behavior:

$ line=$(printf "a\tb")
$ echo $line | od -a
0000000    a  sp   b  nl
0000004

That is, line, not being between double quotes, is expanded; then field
splitting is done with the default IFS, and echo receives two arguments
and prints them separated by a space.
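
The expectation can be checked without involving echo at all, by
counting the fields the shell produces. This is only a sketch; the
helper name count_args is made up for the example.

```shell
# count_args: print the number of arguments actually received.
count_args() {
    echo $#
}

line=$(printf 'a\tb')

count_args $line              # unquoted, default IFS: split at the tab
count_args "$line"            # quoted: a single field, tab preserved
(IFS=' '; count_args $line)   # tab no longer in IFS: a single field
```

With a conforming sh(1) the three calls should report 2, 1 and 1
fields respectively.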

It happens that dash(1) (at least one version), when such a call is
made in a subshell invocation '(...)', keeps the tab separating a and
b. The output is piped to a sed call, and the regex then fails because
it expects spaces, not spaces or tabs.
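
One defensive option, sketched here under the assumption that the
consuming regex only needs single spaces between fields, is to
normalise runs of blanks before the data reaches that regex. The
function name squeeze_blanks is hypothetical.

```shell
# squeeze_blanks: replace every run of spaces and/or tabs on standard
# input with a single space, using a POSIX bracket expression.
squeeze_blanks() {
    sed 's/[[:blank:]][[:blank:]]*/ /g'
}

# Usage: the tab survives an unsplit echo, and is normalised afterwards.
printf 'a\tb\n' | squeeze_blanks
```

This does not fix the shell; it only makes the script independent of
whether the tab was split away or not.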

I guess that the problem lies with the implementation of echo as a
built-in, with "shortcuts", that is, it is not called with its
arguments as a variadic list, nor does it iterate over the argument
list.

But reading the description of the echo command in the Open Group
spec, I find nowhere _what_ delimiter echo has to print between
successive arguments. All in all, an echo that simply concatenates its
arguments would be POSIX compliant, no?

Am I reading the spec incorrectly?

TIA
-- 
Thierry Laronde 
 http://www.kergis.com/
http://kertex.kergis.com/
 http://nunc-et-hic.fr/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C


Re: GSOC - Inetd enhancements

2024-03-30 Thread tlaronde
On Sat, Mar 30, 2024 at 04:20:28PM +0200, cristian dragoi wrote:
> Hello,
> 
> My name is Cristian-Theodor Drăgoi, and I am currently in my final year at
> the Faculty of Automatic Control and Computers, University POLITEHNICA of
> Bucharest. Alongside my studies, I have been working as a part-time
> software engineer for the past five months.
> 
> I am writing to express my interest in the "Inetd Enhancements" project for
> GSOC 2024. My  interest in systems engineering and a desire to broaden my
> knowledge in this area have drawn me to this project. Although I do not
> have experience with programming in NetBSD, I am eager to learn and
> contribute to the project.
> 
> Should you wish to discuss this further, please feel free to contact me via
> this email address or refer to the contact information provided in my CV.
> 

I haven't read the Inetd enhancements proposal but, for what it's
worth, I have already made significant modifications to NetBSD's
inetd(8). They are published at:

https://github.com/tlaronde/BeSiDe

(it's current NetBSD incorporating several modifications made by me,
including to inetd(8) and, as a side effect, to mountd(8), which was
sharing code with inetd(8).)

I dare say that it will be easier to add features to this version
than to work with the current one.

But since I'm not directly concerned by the GSoC affair (nor by the
evolution of NetBSD; I'm not a NetBSD developer), this is just to give
you an additional pointer to existing alternative code.
-- 
Thierry Laronde 
 http://www.kergis.com/
http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C


Re: MesaLib mpeg12 bug: lack of samples

2023-12-10 Thread tlaronde
On Sat, Dec 09, 2023 at 08:35:34PM +0100, tlaro...@kergis.com wrote:
> At least xine(1) and vlc(1) crashes, in various circumstances (xine(1)
> directly if invoked without argument; vlc(1) when trying to read a
> dvd) but the failure in both cases comes from MesaLib, gallium,
> namely, in MesaLib sources:
> 
> src/gallium/auxiliary/vl/vl_mpeg12_decoder.c
> 
> There are functions (not procedures) but treated like procedures i.e.
> the return value is in fact ignored, taken as granted with assertions
> being used to catch the failing cases by aborting.
> 
> And it aborts.
> 
> The problem is that a buffer is initted (guessing by the names),
> from samples. 3 planes are expected (a guess once more: red,
> green, blue in whatever order) but in my case there is only sample (one
> plane).
> 
>   => My screen has pixel depth = 16 bits stored in 2 bytes.
>   XXX I should try if going 24 bits exhibit the problem or not...
> 
>[...]

Switching to 24 bits doesn't circumvent the problem, but there is a
difference in behavior.

At the moment:

- xine(1) crashes at whatever depth when invoked without argument
because it tries to display the logo: "xine --no-logo" gets past this
and, at least then, doesn't crash. If invoked with an mp4 file: at 16
bits, there is sound but no image; at 24 bits, it renders. In all
cases, it is unable to play a DVD (crashes): bug in MesaLib/gallium, at
least with the radeon r600 driver;

- vlc(1): doesn't crash when invoked without argument. Renders mp4 at
whatever depth (16 bits or 24 bits). Unable to play a DVD (crashes):
bug in MesaLib/gallium, at least with the radeon r600 driver;

- ogle(1): able to play a DVD since it doesn't use "new" stuff...

-- 
Thierry Laronde 
 http://www.kergis.com/
http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C


MesaLib mpeg12 bug: lack of samples

2023-12-09 Thread tlaronde
At least xine(1) and vlc(1) crash, in various circumstances (xine(1)
directly if invoked without argument; vlc(1) when trying to read a
DVD), but the failure in both cases comes from MesaLib, gallium,
namely, in the MesaLib sources:

src/gallium/auxiliary/vl/vl_mpeg12_decoder.c

There are functions (not procedures) that are treated like procedures,
i.e. the return value is in fact ignored and success is taken for
granted, with assertions being used to catch the failing cases by
aborting.

And it aborts.

The problem is that a buffer is initialized (guessing by the names)
from samples. 3 planes are expected (a guess once more: red, green,
blue in whatever order) but in my case there is only one sample (one
plane).

=> My screen has pixel depth = 16 bits stored in 2 bytes.
XXX I should check whether going to 24 bits exhibits the problem or not...

But a loop runs for 3 iterations (3 planes) and encounters NULL for the
second one, which triggers the assertions.

If the code is modified to stop the iteration when a NULL is
encountered, and the assertion in the caller is then bypassed, the DVD
is rendered with a rough image, partially scrambled, mainly green in my
case.

=> This resembles what I had with the 640 permissions on
/dev/dri/cardN...
XXX I should definitely test going to 24 bits...

1) So does anyone know anything about the processing, this part of the
code, and the way the "samples" (if this is it) are supposed to be
acquired? From where?

2) Is a 16-bit screen depth known to cause problems, and is the code
normally ready for whatever color depth (both in kernel and in
userland)?

-- 
Thierry Laronde 
 http://www.kergis.com/
http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C


[PATCH] gpt.8: add suffixes known by dehumanize_number

2023-10-26 Thread tlaronde
Here is a small diff that adds the suffixes known by dehumanize_number
to the description of how a size is specified, and states that the
suffix is case insensitive.
-- 
Thierry Laronde 
 http://www.kergis.com/
http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C
diff --git a/sbin/gpt/gpt.8 b/sbin/gpt/gpt.8
index d13782285fd..2445722fa2a 100644
--- a/sbin/gpt/gpt.8
+++ b/sbin/gpt/gpt.8
@@ -160,7 +160,7 @@ or
 .Sq S
 then size is in sectors, otherwise size is in bytes which must be
 a multiple of the device's sector size.
-Accepted suffix units are
+Accepted suffix units (case insensitive) are
 .Sq b
 to denote bytes,
 .Sq k
@@ -168,7 +168,13 @@ to denote kilobytes,
 .Sq m
 to denote megabytes and
 .Sq g
-to denote gigabytes.
+to denote gigabytes,
+.Sq t
+to denote terabytes,
+.Sq p
+to denote petabytes, and
+.Sq e
+to denote exabytes.
 The minimum size is 1 sector.
 .Pp
 The
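
For illustration only, the suffixes documented by this diff can be
mimicked in shell. This is not the dehumanize_number(3) code; the
function name size_to_bytes is made up, the multipliers are assumed to
be powers of 1024, the 's' (sectors) suffix of gpt(8) is left out, and
the shell arithmetic is assumed to be 64-bit.

```shell
# size_to_bytes SIZE: convert "4k", "1G", "2t"... to a byte count.
# Suffixes are case insensitive; no suffix (or b) means bytes.
size_to_bytes() {
    stb_num=${1%[a-zA-Z]}          # strip one trailing letter, if any
    stb_suffix=${1#"$stb_num"}     # whatever was stripped
    case $stb_num in
        ''|*[!0-9]*) return 1 ;;   # digits only
    esac
    case $stb_suffix in
        ''|[bB]) stb_shift=0 ;;
        [kK]) stb_shift=10 ;;
        [mM]) stb_shift=20 ;;
        [gG]) stb_shift=30 ;;
        [tT]) stb_shift=40 ;;
        [pP]) stb_shift=50 ;;
        [eE]) stb_shift=60 ;;
        *) return 1 ;;             # unknown suffix
    esac
    echo $(( stb_num << stb_shift ))
}

# Usage:
size_to_bytes 4k
size_to_bytes 1G
```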


DRM/KMS: report

2023-10-13 Thread tlaronde
[CAVEATS: Please remember that I'm not a native English speaker, and
that what follows is not a "lecture" or a judgement about what has been
done, but a home-made translation into some kind of English of some of
my notes (there is more documentation to come later).
If I wanted to look at the DRM/KMS stuff, it was because I felt (and
still feel...) that I would never have embarked on such an appalling
task as trying to tame a thing like that ;-) I'm not "blaming" or
"naming and shaming" (or whatever the term is), nor despising work
or people.]

Three months ago, I committed to taking a look at the DRM/KMS object,
with the goal of ensuring that the NetBSD kernel could be severed from
it at will.

Here is the report.

I will start with code for the impatient, continue with
documentation / comments, and end with future directions (for me).

Note: I finally have an optical fiber Internet connection again
(after mishaps with a previous provider), so I have been able to
pull and push to a fork that is here:

https://github.com/tlaronde/src


 WHAT IS IN THESE SOURCES

commit 6d715506703ed9f0bec6a39fec8794b5b8eb
Author: Thierry LARONDE 
Date:   Fri Oct 13 18:39:03 2023 +0200

In order to allow changing, disabling, enabling, finding or listing
devices according to a pattern (specified between slashes; can be
anchored at the beginning with '^' and at the end with '$'; but no
wildcard dot, count or range...), the userconf parsing is modified.

It works... but not for what I wanted. Giving /drm/ as a pattern, for
example, will actually disable all matching devices, but since
"radeondrmkmsfb" matches, you end up with no display at all because the
drm is nonetheless attempted.

"/kms$/" and "/drm$/" could work. But for the moment this is more a
debugging feature (except for find or list) than something to use
bluntly.

Should we have /pattern/@/parent_pattern/? Or enforce a namespace
policy?

At the very least, one should use "list /pattern/" or "find /pattern/"
before modifying blindly.
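
The matching rules described in this commit message can be mimicked in
shell to see which names a pattern would catch before using it. This
is a sketch of the described semantics only, not the kernel code; the
function name devname_matches is hypothetical.

```shell
# devname_matches NAME /PATTERN/: succeed if NAME matches PATTERN.
# PATTERN is a plain substring, optionally anchored with a leading '^'
# and/or a trailing '$'; there are no wildcards, counts or ranges.
devname_matches() {
    dm_name=$1
    dm_pat=${2#/}
    dm_pat=${dm_pat%/}
    case $dm_pat in
        '^'*'$')                       # anchored at both ends: equality
            dm_pat=${dm_pat#'^'}; dm_pat=${dm_pat%'$'}
            [ "$dm_name" = "$dm_pat" ] ;;
        '^'*)                          # anchored at the beginning
            dm_pat=${dm_pat#'^'}
            case $dm_name in "$dm_pat"*) return 0 ;; esac
            return 1 ;;
        *'$')                          # anchored at the end
            dm_pat=${dm_pat%'$'}
            case $dm_name in *"$dm_pat") return 0 ;; esac
            return 1 ;;
        *)                             # unanchored substring
            case $dm_name in *"$dm_pat"*) return 0 ;; esac
            return 1 ;;
    esac
}

# Usage: /drm/ catches the framebuffer name too, /drm$/ does not.
devname_matches radeondrmkmsfb '/drm/'  && echo "/drm/ matches radeondrmkmsfb"
devname_matches radeondrmkmsfb '/drm$/' || echo "/drm\$/ does not"
```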

commit e62e0b293986bfb3a749ab499d8367b5c6a161a2
Author: Thierry LARONDE 
Date:   Thu Oct 12 18:07:13 2023 +0200

Just make it explicit that the pmap_pv_untrack() users are DRM2
aka DRMKMS drivers (not "legacy" DRM ones).

commit 930cf9cd86c51551b7731777df2882a64ba655b7
Author: Thierry LARONDE 
Date:   Thu Oct 12 09:00:56 2023 +0200

For consistency, what is related to monitors is not taken from
XFree86 but taken from the latest VESA DMT (v 1.0, Rev. 13).  So
modelines are removed, and dmt added, and the code fixed to work
with this with no user visible change for the moment. And some
modes not defined in the VESA DMT are put in an extradmt file, with
fixes for Mac monitors (taken from parameters in the Linux framebuffer
code).

For consistency too, published strings like "800x600x60" are replaced
by "800x600@60Hz" to avoid multiplying apples by oranges and
ambiguity about exactly what the last number describes.

The double scan entries were not used and are not generated.

DRM, DRM2 aka DRM/KMS: SOME NOTES

DRM or now DRM2 (aka DRM/KMS) are inherently and heavily linked to
X11 and to Linux.  Due to the size of the thing, NetBSD is deriving
a version from the one FreeBSD tries to derive. To make things worse,
the API is changing significantly. So we can only adapt late; and, de
facto, we always drag behind.

The important thing to keep in mind is that this is heavily linked to
X11. It's not something independent.

To make things even worse, the abuse of acronyms is blurring things that
didn't need to be made even less clear. Not to mention the fact that DRM
is also used for Digital Rights Management---that has strictly nothing
to do with the thing---, DRI (a part of the X11 stuff) is also used
instead of DRM for the X11 part, and DRM2 is also referred to as
DRM/KMS.

The "legacy" ("first" version, at least in NetBSD) DRM drivers are these
ones (for x86 ones):

#i915drm*       at drm?         # Intel i915, i945 DRM driver
#mach64drm*     at drm?         # mach64 (3D Rage Pro, Rage) DRM driver
#mgadrm*        at drm?         # Matrox G[24]00, G[45]50 DRM driver
#r128drm*       at drm?         # ATI Rage 128 DRM driver
#radeondrm*     at drm?         # ATI Radeon DRM driver
#savagedrm*     at drm?         # S3 Savage DRM driver
#sisdrm*        at drm?         # SiS DRM driver
#tdfxdrm*       at drm?         # 3dfx (voodoo) DRM driver

The drivers using the new API have sometimes "kms" in the name (for
i915, I guess to make a difference with the previous "legacy"
i915drm), but generally not, or if this is the case, this is not the
device attaching early:

# DRMKMS drivers 
i915drmkms* at pci? dev 

Re: bin/57544: sed(1) and regex(3) problem with encoding

2023-08-30 Thread tlaronde
On Wed, Aug 30, 2023 at 02:32:25PM -, Christos Zoulas wrote:
> In article ,
> RVP   wrote:
> >On Wed, 26 Jul 2023, tlaro...@polynum.com wrote:
> >
> >> $ export LC_CTYPE=fr_FR.ISO8859-15
> >>
> >> and then:
> >>
> >> $ echo "éé" | sed 's/é/\é/g'
> >> sed: 1: "s/é/\é/g": RE error: trailing backslash (\)
> >>
> >
> >Not running NetBSD right now, but, FreeBSD 13.2 has the same issue which
> >can be seen even with a plain grep(1)--as it relies on the libc regexp
> >engine.
> >
> >Can you try the patch below (it is for NetBSD):
> 
> Why don't we make next and end unsigned char so that all instances are fixed?

Because one would need to review all the macros and all the
invocations of the macros: there are comparisons between next and other
characters, and comparing an unsigned char on one side with a signed
char on the other is sure to open another can of worms.

I think RVP and I are in agreement about this: the whole lib should be
carefully reviewed. The patch proposed by RVP (the two casts, last patch
attached to the PR) is safe: it corrects a fault and modifies nothing
else. It perhaps, and even probably, does not correct all the faults,
but at least it does not immediately introduce new ones.
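
To illustrate why mixing signed and unsigned char in comparisons is
delicate, here is a small arithmetic sketch in shell. It only shows
the value an 8-bit byte takes when read as a signed char and promoted
to int; it does not reproduce the regcomp.c code path, and the
function name as_signed_char is made up.

```shell
# as_signed_char BYTE: print the value of an 8-bit byte when it is
# interpreted as a signed char and promoted to int (two's complement).
as_signed_char() {
    asc_v=$(( $1 & 0xFF ))
    if [ "$asc_v" -ge 128 ]; then
        asc_v=$(( asc_v - 256 ))
    fi
    echo "$asc_v"
}

# Usage, with the octal values quoted in the thread:
as_signed_char 0351   # 'é' in ISO8859-15: negative once sign-extended
as_signed_char 0134   # backslash: unchanged, it is below 128
```

The same byte is 233 as an unsigned char, so a comparison between the
two interpretations of 'é' fails even though the bits are identical.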

I would have preferred the library to be "eight bits" clean, i.e.
handling the C language (ASCII) correctly and treating the extra range
as is. Higher level libraries, if the user wants them, would deal with
extended character sets and regexes by "compiling" them to basic ones
running on the core library, the way microcode converts CISC into
RISC. The core would be simpler (no extended chars), would stick to C,
and so would be easier to make or prove correct (the higher library
spelling out character classes and so on according to the language,
the encoding, etc.).

This whole "i18n" and "l10n" business is a nightmare, and this is a
non-native English speaker who writes it...
-- 
Thierry Laronde 
 http://www.kergis.com/
http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C


Re: sed(1) and LC_CTYPE

2023-07-31 Thread tlaronde
On Wed, Jul 26, 2023 at 07:17:17PM +0200, tlaro...@polynum.com wrote:
> On Wed, Jul 26, 2023 at 06:32:15PM +0200, Martin Husemann wrote:
> > On Wed, Jul 26, 2023 at 12:19:39PM -0400, Mouse wrote:
> > > > $ export LC_CTYPE=fr_FR.ISO8859-15
> > > 
> > > > $ echo "éé" | sed 's/é/\é/g'
> > > > sed: 1: "s/é/\é/g": RE error: trailing backslash (\)
> > > 
> > > I agree that's broken.
> > > 
> > > > Since, to my knowledge, we do not support anything via iconv or
> > > > whatever, shouldn't we assume simply a string of bytes \`a la C, that
> > > > is:
> > > 
> > > Seems to me there's a deeper problem.  Even if something like iconv
> > > _were_ available, fr_FR.ISO8859-15 is a single-octet character set, so
> > > 
> > > > -   (void) setlocale(LC_ALL, "");
> > > > +   (void) setlocale(LC_ALL, "POSIX");
> > > 
> > > should, it seems to me, make no difference.  Am I misunderstanding?
> > 
> > Indeed - and it only does on architectures where char == signed char:
> 
> Very good catch, indeed.
> 
> And this is a regression vs 9.3 and I suspect the main difference is the
> setlocale(3)---that allows not to solve, but to circumvent a more deeper
> problem.
> 
> PR sent as bin/57544

RVP has spotted the culprit (for this one; the whole code would need
a review for similar problems in other uses and in the interaction
with the locales).

The amended diff, with more explanations (and caveats), has been put in
bin/57544, and the correct behavior has been verified by compiling the
libc with this diff and statically linking sed(1) against this amended
libc.
-- 
Thierry Laronde 
 http://www.kergis.com/
http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C


Re: sed(1) and LC_CTYPE

2023-07-26 Thread tlaronde
On Wed, Jul 26, 2023 at 05:27:51PM +, Taylor R Campbell wrote:
> > Date: Wed, 26 Jul 2023 17:32:03 +0200
> > From: tlaro...@polynum.com
> > 
> > If setting LC_CTYPE to this:
> > 
> > $ export LC_CTYPE=fr_FR.ISO8859-15
> > 
> > and then:
> > 
> > $ echo "éé" | sed 's/é/\é/g'
> > sed: 1: "s/é/\é/g": RE error: trailing backslash (\)
> > 
> > Where does the program manage to find a backslash i.e. 0134? While
> > 'é' is 0351.
> 
> Exactly what bytes are passed as an argument to sed?  Can you write a
> program that will hexdump argv[1] and pass the same argument to it?
> 
> Next step, if that reveals the expected 0xe9, is to find exactly what
> string is passed to regcomp inside sed.

RVP has sent a diff against regex/regcomp.c (in one place, an int c
receives, with promotion, a signed char obtained by GETNEXT()).

The diff is attached to GNATS bin/57544.

I will try to compile a fixed NetBSD 10.0_BETA libc to see if this
solves the problem.
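
The byte check suggested in the quoted message does not need a C
program; a shell function is enough to see exactly which bytes reach a
command as an argument. This is a sketch; the name dump_arg is made up
for the example.

```shell
# dump_arg STRING: print the bytes of STRING in hexadecimal, exactly
# as they are passed in the argument (no trailing newline added).
dump_arg() {
    printf '%s' "$1" | od -An -tx1
}

# Usage: pass the same argument that is given to sed.
dump_arg 's/a/\b/g'
```

In an ISO8859-15 session, passing the sed script from the report
should show a single e9 byte for each 'é' and 5c for the backslash.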
-- 
Thierry Laronde 
 http://www.kergis.com/
http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C


Re: sed(1) and LC_CTYPE

2023-07-26 Thread tlaronde
On Wed, Jul 26, 2023 at 06:32:15PM +0200, Martin Husemann wrote:
> On Wed, Jul 26, 2023 at 12:19:39PM -0400, Mouse wrote:
> > > $ export LC_CTYPE=fr_FR.ISO8859-15
> > 
> > > $ echo "éé" | sed 's/é/\é/g'
> > > sed: 1: "s/é/\é/g": RE error: trailing backslash (\)
> > 
> > I agree that's broken.
> > 
> > > Since, to my knowledge, we do not support anything via iconv or
> > > whatever, shouldn't we assume simply a string of bytes \`a la C, that
> > > is:
> > 
> > Seems to me there's a deeper problem.  Even if something like iconv
> > _were_ available, fr_FR.ISO8859-15 is a single-octet character set, so
> > 
> > > - (void) setlocale(LC_ALL, "");
> > > + (void) setlocale(LC_ALL, "POSIX");
> > 
> > should, it seems to me, make no difference.  Am I misunderstanding?
> 
> Indeed - and it only does on architectures where char == signed char:

Very good catch, indeed.

And this is a regression vs 9.3, and I suspect the main difference is
the setlocale(3) call, which does not solve but merely circumvents a
deeper problem.

PR sent as bin/57544

Thanks,
-- 
Thierry Laronde 
 http://www.kergis.com/
http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C


sed(1) and LC_CTYPE

2023-07-26 Thread tlaronde
If setting LC_CTYPE to this:

$ export LC_CTYPE=fr_FR.ISO8859-15

and then:

$ echo "éé" | sed 's/é/\é/g'
sed: 1: "s/é/\é/g": RE error: trailing backslash (\)

Where does the program manage to find a backslash, i.e. 0134, while
'é' is 0351?

Since, to my knowledge, we do not support anything via iconv or
whatever, shouldn't we simply assume a string of bytes à la C,
that is:

diff --git a/usr.bin/sed/main.c b/usr.bin/sed/main.c
index d87bce2a5c85..c6b69a83cd57 100644
--- a/usr.bin/sed/main.c
+++ b/usr.bin/sed/main.c
@@ -136,7 +136,7 @@ main(int argc, char *argv[])
 	char *temp_arg;
 
 	setprogname(argv[0]);
-	(void) setlocale(LC_ALL, "");
+	(void) setlocale(LC_ALL, "POSIX");
 
 	fflag = 0;
 	inplace = NULL;

? With such a change, the result is:


$ echo "éé" | ./sed 's/é/\é/g'
éé

and this is what I expected.

What is the rationale for taking the locale from the environment when
all the code in the src expects ASCII to start with (for commands,
ranges and so on)?

What am I doing wrong?
-- 
Thierry Laronde 
 http://www.kergis.com/
http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C


github and branch for inetd(8)

2023-07-20 Thread tlaronde
Would putting my inetd(8) work in a branch of NetBSD/src on GitHub be
a step forward for further testing and, later, merging into the main
trunk?
-- 
Thierry Laronde 
 http://www.kergis.com/
http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C


Re: DRM/KMS: countdown started

2023-07-13 Thread tlaronde
Le Thu, Jul 13, 2023 at 11:25:40AM +, Taylor R Campbell a écrit :
> > Date: Thu, 13 Jul 2023 11:35:50 +0200
> > From: tlaro...@polynum.com
> > 
> > Severing will be the first thing done. Whether I will be able, in this
> > first stage, to allow, from this newly created branch, to still
> > optionally use the current implementation of DRM/KMS is an open
> > question, and this goal is beyond the objectives of the first stage.
> 
> You know you can just omit the drm drivers from the kernel, right?
> For example, on amd64:
> 
> no i915drmkms* at pci?
> no radeon* at pci?
> no nouveau* at pci?
> #no amdgpu* at pci?   # (not even enabled by default on amd64)
> 
> You can also disable them at boot-time with userconf, and you'll get a
> dumb pci framebuffer with genfb instead:
> 
>   > boot -c
>   userconf> disable i915drmkms
>   userconf> disable radeon
>   userconf> disable nouveau
>   userconf> quit
> 

Yes. But you have to specify all of them, or know which one you will
encounter, to disable it. This is what, from my POV, is wrong. It
should be possible to opt in later and, at the minimum, the kernel
should be able to survive a problematic 2D setting by falling back to
a simple generic framebuffer. There are ways to circumvent the problem.
I want to try to address it.

> Same for the Arm boards that use the drm modesetting APIs, like
> tilcdc(4) and tifb(4).  But you'd have to rewrite all those drivers --
> and test them on a menagerie of Arm boards where the drm stuff has
> already been tested and works and has very small maintenance burden at
> this point.

That's the part I want to disentangle.

And, from my POV, the problem is not the "maintenance" of existing/old
code. It's the increasing burden of trying to follow upstream and to
upgrade: even simply upgrading the NetBSD part without having to touch
DRM and without being hampered by the present implementation creeping
into the kernel, or upgrading DRM/KMS without touching the rest of
NetBSD.

And if DRM/KMS stays at an "old" version, applications will stop
working because they are Linux centric and always go for "the latest".
And since the motto is "release often", the latest is every other week.

> 
> I think you'll find you've misplaced where the painful parts of all
> this are, but go ahead...

Well, I know it will be painful ;-) That's why I have estimated 3 months
(not full time; but a significant part of my spare time).
-- 
Thierry Laronde 
 http://www.kergis.com/
http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C


DRM/KMS: countdown started

2023-07-13 Thread tlaronde
Today I received, sent by Martin Husemann (thanks!), a copy of the
trees: cvs, git and hg.

So the countdown has started. I give myself at most 3 months to
disentangle the tree so that the present Linux-originated DRM/KMS
implementation is clearly severed, so that alien and awkward
interfaces do not creep into the NetBSD kernel and so that, eventually,
graphics capabilities can be tested and opted into later in the
process, without taking down the whole kernel in case of a problem.

Severing will be the first thing done. Whether I will be able, in this
first stage, to still allow, from this newly created branch, optional
use of the current implementation of DRM/KMS is an open question, and
this goal is beyond the objectives of the first stage.


You will hear from me, reporting success or failure, on the 14th of
October 2023 at the latest.

Le Thu, Jul 06, 2023 at 01:32:31PM +0200, tlaro...@polynum.com a écrit :
> As I think I have proven with inetd(8), when I engage to do something, I
> do.
> 
> I have started, some months ago now, to review problems with the display
> starting by the end: the monitor.
> 
> I have updated the DMT (sys/dev/videomode) and identified some problems
> (with Mac monitors) and published the result (that NetBSD has ignored
> while this corrects things, breaks nothing and is at least a step
> forward, independent from anything else).
> 
> So now I plan to resume my walk through the code but what is on
> https://wiki.netbsd.org/releng/netbsd-10/ seems to just confirm my
> engineeging guts feeling: that DRM/KMS is too big, too convoluted, too
> alien and that it's a lost battle to try to make this work with the
> kernel.
> 
> I'm the "software Hercules".
> 
> I have already "cleaned the GRASS stables"---because the GPL GRASS
> state was not satisfactory, so I simply came back to the last public
> domain CERL GRASS release and was able, alone, to get everything
> working---and to my surprise not only have what GPL GRASS had working,
> but also what GPL GRASS had _not_ working, while I had simply reorganised
> and posixified the sources: I had added nothing back then (now it is
> my professional code base; it has of course evolved and saw many
> additions).
> 
> I have also "cleaned the TeX stables"---once again totally unsatisfied
> of having the obligation to download gigabytes of stuff for a software
> system written to be ultra-portable (this is string manipulation so this
> is typically almost totally userland stuff) and small, a needle lost in
> tons of hay stack. The result is kerTeX. And not only a whole
> distribution, but I have written an extension: Prote, to accommodate
> LaTeX new requirements.
> 
> So yes: this is a perfectly valid engineering solution, if you took the
> wrong way to go back to the fork, and take another one.
> 
> So I'm proposing to go back to the fork (for this part starting by
> identifying what is immune, orthogonal to the thing) when the Linux
> DRM way was taken and to eject the Linux DRM/KMS.
> 
> If DRM has to be addressed, it will be addressed after, but doing it
> our own way.
> 
> And I'm proposing to help.
> -- 
> Thierry Laronde 
>  http://www.kergis.com/
> http://kertex.kergis.com/
> Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C

-- 
Thierry Laronde 
 http://www.kergis.com/
http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C


Re: DRM/KMS

2023-07-07 Thread tlaronde
On Fri, Jul 07, 2023 at 08:02:22PM +0100, David Brownlee wrote:
> On Fri, 7 Jul 2023 at 18:19,  wrote:
> >
> > Le Fri, Jul 07, 2023 at 03:08:25PM +0100, David Brownlee a écrit :
> > > On Fri, 7 Jul 2023 at 15:03, Martin Husemann  wrote:
> > > >
> > > > On Fri, Jul 07, 2023 at 02:30:14PM +0100, David Brownlee wrote:
> > > > > drm/kms definitely is hugely complicated, overly Linux focussed, and
> > > > > difficult to maintain and update. A lot of effort has been put into
> > > > > getting it to run on NetBSD (and updating from previous versions), but
> > > > > it's currently the only viable game in town.
> > > >
> > > > I also thing it is not *that* far away from working fine.
> > > >
> > > > The releng wiki page lists a bunch of PRs against it, but those are 
> > > > mostly
> > > > hard to fix because the problem only happens on *some* hardware, and
> > > > sometimes only in special scenarios (e.g. serial console used and the
> > > > monitor powered on during drm/kms attaching).
> > > >
> > > > That it all is a mess we probably all agree with.
> > > >
> > > > And this will require more updates, every year - GPU hardware does
> > > > evolve, and available options change.
> > > >
> > > > Using no drm/kms is a good alternative (and works great on NetBSD in 
> > > > general).
> > > > But you loose WebGL and sometimes accalerated video playback, and also
> > > > often support for mulitple displays (but that part might even be easy 
> > > > to fix).
> > >
> > > So far some good changes (cribbing shamelessly from other suggestions) 
> > > might be:
> > > - Implement "boot -D" (or similar) to boot with all DRM disabled, to
> > > make it easier for hardware with issues
> > > - Allow optionally initialising DRM after boot and transferring
> > > console ownership (may add more work in future upgrades, but works
> > > well with above item :)
> > > - Rework wsdisplay to try to reduce abstraction violations and make it
> > > cleaner to work with
> > > - Looking at issues with certain hardware (can probably find systems
> > > to ship if anyone is interested...)
> >
> > The 1), 2) and a part of 3) is what I have in mind for the first step.
> 
> Would this be from the three bullet points above? (In which case,
> wonderful, let me get out of your way :), or your original numbered
> points, in which case I'll may respond later

This is from your 3 points above. Obviously, 3) will only be
partially done, since I will have only one side of the view.

> 
> > As far as I'm concerned, I will not help to "fix" the present state of 
> > DRM/KMS
> > since for me the amount of work already needed to try to make the thing 
> > work is
> > out of proportion and will have to be doubled the next time; this is, 
> > already
> > and even more so for the future, hopeless.
> 
> There are certainly different aspects to fixing the present state of
> DRM/KMS. Working directly on the existing DRM code is definitely
> useful, but very much not the only need.
> 
> I'm just trying to make related suggestions that are likely to be of
> immediate benefit to NetBSD and picked up quickly and imported into
> the tree, plus feedback on your earlier comments (with the same aim).
> Obviously you're at liberty to give each as much (or little)
> consideration as you choose!
> 
> >From your earlier email:
> 
> 1) Sever this DRM/KMS clearly from the kernel [...]
> 
> As alluded to earlier, for ~any hardware on which the kernel can run
> with DRM enabled, it can run without DRM (with reduced functionality).
> Cleaning up the interface between the DRM and the rest of the kernel
> (I'm just going to wave hands and say "wsdisplay") would definitely be
> of some benefit. Would I be correct in assuming this would include
> making the existing DRM "more pluggable" while trying not to impede
> future upstream merges - to allow other things (see 4) to be
> conditionally in its place?

Yes. My general view of the thing is that a kernel is in fact a general
resources manager, implementing a policy for resources sharing, without
any specialization (as a file system orders files without caring what
is in them).

There can be co-processing units for specialized tasks, like Auxiliary
Graphical Processing Units, that can be hardware or software (with
almost any combination in between hardware and software).

So the aim is that the AGPUs can be discovered, listed, and used if
wanted, but supplementary to the kernel; aside, or whatever the term
might be. With a pure software basic AGPU being always or almost
always at disposal---there can perfectly well be a computer that has
only a 0D display (isolated LEDs: "works, doesn't work") or a 1D
display (a line of display). AGPU starts at 2D.

And the aim is also to put some kind of protocol between the
kernel and the AGPU, so that changing an AGPU has as little impact on the
kernel as possible and so that the work can be done almost independently.

Would it be possible and to what extent? This is the question I want to
answer now.

> 
> 2) Audit

Re: DRM/KMS

2023-07-07 Thread tlaronde
On Fri, Jul 07, 2023 at 03:08:25PM +0100, David Brownlee wrote:
> On Fri, 7 Jul 2023 at 15:03, Martin Husemann  wrote:
> >
> > On Fri, Jul 07, 2023 at 02:30:14PM +0100, David Brownlee wrote:
> > > drm/kms definitely is hugely complicated, overly Linux focussed, and
> > > difficult to maintain and update. A lot of effort has been put into
> > > getting it to run on NetBSD (and updating from previous versions), but
> > > it's currently the only viable game in town.
> >
> > I also thing it is not *that* far away from working fine.
> >
> > The releng wiki page lists a bunch of PRs against it, but those are mostly
> > hard to fix because the problem only happens on *some* hardware, and
> > sometimes only in special scenarios (e.g. serial console used and the
> > monitor powered on during drm/kms attaching).
> >
> > That it all is a mess we probably all agree with.
> >
> > And this will require more updates, every year - GPU hardware does
> > evolve, and available options change.
> >
> > Using no drm/kms is a good alternative (and works great on NetBSD in 
> > general).
> > But you loose WebGL and sometimes accalerated video playback, and also
> > often support for mulitple displays (but that part might even be easy to 
> > fix).
> 
> So far some good changes (cribbing shamelessly from other suggestions) might 
> be:
> - Implement "boot -D" (or similar) to boot with all DRM disabled, to
> make it easier for hardware with issues
> - Allow optionally initialising DRM after boot and transferring
> console ownership (may add more work in future upgrades, but works
> well with above item :)
> - Rework wsdisplay to try to reduce abstraction violations and make it
> cleaner to work with
> - Looking at issues with certain hardware (can probably find systems
> to ship if anyone is interested...)

Items 1), 2) and part of 3) are what I have in mind for the first step.

As far as I'm concerned, I will not help to "fix" the present state of DRM/KMS
since for me the amount of work already needed to try to make the thing work is
out of proportion and will have to be doubled the next time; this is, already
and even more so for the future, hopeless.

So I maintain the offer and thank David Brownlee for the offer to send
me the CVS (and yes, the git and hg would be worthwhile too), but since I
try to make a small donation every year to NetBSD, it will not "cost" me
anything I was not already planning to give. So I ask core to make
the estimate and, once I have donated, if David is doing the work, to
arrange things so that he at least gets back what it has cost him to
make and send me the copy I have requested.
-- 
Thierry Laronde 
 http://www.kergis.com/
http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C


Re: DRM/KMS

2023-07-07 Thread tlaronde
On Thu, Jul 06, 2023 at 05:57:01PM +, Bruno Melo wrote:
> > What is frightening me now, is that the impressive amount of work needed
> > to make the thing "work", and only temporarily, is a work that will
> > be lost because the code base will continue to evolve (not to say:
> > dissolve...).
> 
> I think here is real problem. For these you need a team, not any team, you 
> need a big team with different backgrounds for each GPU vendor. 
> 
> But, being optimistic, if you decide to create a new GPU drivers technology 
> using rump and being able to use unmodified GPU drivers for any Unix-like OS 
> reduces costs for these companies and this way promoting rump across open 
> source world, so maybe (just maybe) some of the vendors start to invest in 
> this new technology and other non-GPU drivers vendors can be interested on 
> rump too.

I will know what is achievable or not only after the "reconnaissance"
of the code ;-)
-- 
Thierry Laronde 
 http://www.kergis.com/
http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C


Re: DRM/KMS

2023-07-07 Thread tlaronde
On Thu, Jul 06, 2023 at 03:31:13PM -0400, Mouse wrote:
> > 4. And of course, GPU is hard.  [...]
> 
> This is true.
> 
> I know someone who works on GPU support _with_ the benefit of NDAed
> documentation.  He says typical GPUs have documentation errata lists
> substantially longer than the documentation itself (as in 400 pages of
> doc, 900 pages of errata).
> 
> That said, if tlaronde@ is willing to try, I say go for it.  In case of
> failure, NetBSD is no worse off; in case of success, substantially
> better.  And there's enough track record there that I don't consider it
> nearly as unlikely to succeed as I would for most people (myself
> included).

OK, so I take up the challenge.

What I need from core (someone who has access to the CVS server) is a
copy of the CVS tree. I have only GSM access to the Internet, with varying
speed and paying by the byte, so "downloading", supposing it were
available, is not an option.

So to whoever from core:

1) Make the estimate of costs in USB keys needed, time needed for the
operator and shipping costs to France and add a margin/profit for the
foundation;

2) Send me the estimate. I will then make a donation of that amount
(margin included) to the NetBSD foundation;

3) Once I have received the USB key(s), the countdown of 3
months will start for the achievement of the first step: put DRM/KMS
clearly aside so that the remainder of the kernel will be orthogonal to
its present design and so that there will be a working framebuffer,
independent of any code coming from DRM/KMS (or X11, BTW...), DRM/KMS
being only an optional addition.

After three months, I will report success or failure (or enough success
to extend the period once, by 3 months, to complete the work).

Whether core is OK or not, please let me know.
-- 
Thierry Laronde 
 http://www.kergis.com/
http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C


Re: DRM/KMS

2023-07-06 Thread tlaronde
On Fri, Jul 07, 2023 at 01:05:03AM +0900, PHO wrote:
> On 7/6/23 20:32, tlaro...@polynum.com wrote:
> > So I'm proposing to go back to the fork (for this part starting by
> > identifying what is immune, orthogonal to the thing) when the Linux
> > DRM way was taken and to eject the Linux DRM/KMS.
> >
> > If DRM has to be addressed, it will be addressed after, but doing it
> > our own way.
> >
> > And I'm proposing to help.
> 
> Wait, are you suggesting we should throw the Linux DRM/KMS away and build
> our framework and individual drivers?
> 
> Hmm... h...
> 
> I agree that it's a huge convoluted alien that is ridiculously hard to tame.
> I'm currently trying hard to make the half-ported vmwgfx driver work on
> NetBSD. In my local tree it now compiles and successfully initializes but
> fails to correctly update the framebuffer.
> 
> There are mainly four reasons why it's so hard to make it work:
> [...]

The 4 reasons you gave are what, from an external point of view, were
already my conclusions.

But let me take a real-world example.

More than ten years ago---ten years!---I proposed to develop a software
solution for a certain task to an enterprise. The answer was: it will
take years to develop, we have not the time! There are current solutions
that can be adapted for the task.

Ten years later, I incidentally learned that they were still searching
for a solution to the problem since the existing solutions have never
managed to... solve the problem.

In the meantime I had developed a framework that made it easy to solve
the problem, no longer with years of development (because those years
had already been spent developing the framework and not trying to
patch an existing Titanic---gigantic and sinking).

I do know that GPU is hard, trendy since improvements in the CPU
are almost at a standstill, and complex, and that the industry is
making it even harder by not documenting things.

But I think the time spent, constantly, to try to make the thing sort
of work, while it is a code nightmare, is a waste of time.

What I'm proposing is:

1) Sever this DRM/KMS clearly from the kernel, so that it is not on by
default and cannot hamper the kernel; and make, on the kernel side, no
decision concerning the display and the framebuffer that is "bent" just
in order to accommodate this DRM/KMS: do it totally orthogonally, the
way that seems logical, matching what the kernel has to achieve;

2) Audit the DRM/KMS code, going back to its source---Linux---without
the addition of FreeBSD's own patching, which simply adds noise to what
already seems to have a poor signal/noise ratio, to see if we can sever
the end driving code (your point #4) from the rest of the gasworks;

3) In all cases, design what seems to be logical, sound and maintainable
to deliver the service with the kernel, blending correctly with
the other tasks. In fact, design "on paper" without implementing
anything at first: the time spent on paper, trying to consider the
problem from all angles, is time gained when implementing or
maintaining; and in this area the experience of those who have been
confronted with the "thing" will be invaluable, if only to say: "not like
that!". If the need for new kernel interfaces arises, OK; but
considering the kernel as a whole, with an interface blending correctly
with everything else and not mimicking something else simply to be able
to run the alien code;

4) If 2) has shown that the end part can be re-used, good. If not,
let's choose one good card (this means a card with "advanced" features
but with documentation) for a template driver and let the need
encourage people to write such a driver for the cards they want to
use with DRM support.

What frightens me now is that the impressive amount of work needed
to make the thing "work", and only temporarily, is work that will
be lost because the code base will continue to evolve (not to say:
dissolve...).

And the feeling from your own description is that even you consider
that the code is not worth the effort because we will never be
sure that it can indeed work correctly.

If, instead of returning to the last CERL published version of GRASS, I
had wanted to "fix" GPL GRASS or if, instead of returning to the very
early (not GPL: Public domain indeed) web2c TeX utilities, I had wanted
to "fix" the current distribution, I would still be at it, or,
more probably, would have dropped the task after some years before
seeing the end of it.
-- 
Thierry Laronde 
 http://www.kergis.com/
http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C


inetd(8) bugs

2023-07-06 Thread tlaronde
You will find attached an example of a configuration exercising bugs in
the current implementation of inetd(8).

To emphasize: this is the current inetd(8), not my implementation.

How it works:

There is the possibility to set a default host address by specifying an
address with a trailing ':' and nothing after.

The problem is that the current implementation allows continuation lines
(for legacy or v2 syntax) and swallows any continuation line without
verifying whether a statement (my definition; it is not defined in the
current man page) has begun.

If (optional) empty lines follow, and the next line that is neither
empty nor blank has an unfortunate leading blank, the defhost statement
is concatenated with what follows and hence is not treated as a defhost
statement: the address is applied only to that next line (here the
line starting with a blank and 5432), but the default remains '*' (any),
and if one invokes, not as root:

$ inetd -d bug.conf

the next entry, 5433, is rejected because it can't be applied (if not
root) to any. (If you remove the leading blank before 5432, the
defhost is set and 5433, applied to 127.0.0.1, succeeds.)
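
The mechanism can be reproduced with a toy model. What follows is a
minimal sketch in Python, not the actual inetd(8) parser: the function
name and the joining rule are assumptions, written only to mirror the
behaviour described above (a line with a leading blank is glued to the
previous logical line, empty or blank lines being skipped, without
checking whether a statement has begun).

```python
def join_continuations(lines):
    """Toy model of the behaviour described above (NOT the inetd source)."""
    logical = []
    for line in lines:
        if not line.strip():
            continue  # empty or blank-only lines are silently skipped
        if line[0] in " \t" and logical:
            # Leading blank: glued to the previous logical line,
            # whether or not a statement had really begun.
            logical[-1] += " " + line.strip()
        else:
            logical.append(line.strip())
    return logical


# Same input twice; only the blank before 5432 differs.
with_blank = ["127.0.0.1:", " ", " 5432 on", "\tprotocol = udp;"]
without_blank = ["127.0.0.1:", " ", "5432 on", "\tprotocol = udp;"]

print(join_continuations(with_blank))
print(join_continuations(without_blank))
```

In the first case the defhost line never stands alone as a statement;
in the second it does.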

Other problems:

- A line starting with a blank is a continuation line, so a statement
must start at the beginning of a line; yet the parser accepts
leading blanks at the beginning of a statement. This comes from
having introduced ';' as a statement terminator in the v2 syntax
(although it was superfluous) and allowing blanks around it. This is
why, in the syntax I have re-specified, the end of a statement is
a newline or, for a non-empty statement (and only a non-empty
statement), '[[:blank:]]*;[[:blank:]]*': it is the only way to be able
to define an empty statement AND to allow continuation lines for
non-empty statements, i.e. statements that have begun.

- It is not said in the manual page, but handling of quotes is done for
anything, legacy syntax included;

- Quoting is said to be '/* Parse shell-style quotes */' while there
is no difference made between single quotes quoting and double quotes
quoting (contrary to shell quoting);

- Escape sequences work only in v2 and only between quotes, not outside.
This renders them almost useless, because quoting already tokenizes
and incidentally "escapes" special characters inside quotes;

- The way the parsing is done, the "#@ []" and "ipsec =" can only
take one argument.

There is one thing "curious" when testing (not as root) the bug.conf
attached. Here is the result:

---8<---
$ inetd -d /tmp/bug.conf

/tmp/bug.conf line 5: Found service definition '5432'
ADD : 127.0.0.1:5432 proto=udp, wait.max=1.5, user:group=root:(null) builtin=0 
server=test_server policy="in discard" 
/tmp/bug.conf line 15: Found service definition '5433'
/tmp/bug.conf line 21: Ignoring invalid definition.
1 service(s) loaded.
Going away.

--->8---

After "Going away\n" there is a spurious sequence of four double quotes:

""""

This comes from the ipsec and is sent to stdout (other messages from
inetd(8) are sent to stderr).

I have not tried to identify exactly where it comes from (but this is
ipsec related since, if one removes the default ipsec directive from
the bug.conf, no quadruple double quotes are printed).
-- 
Thierry Laronde 
 http://www.kergis.com/
http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C
#@ "in discard" 
# Invalidate wrong deprecated defhost directive.
127.0.0.1:
 
 5432 on
protocol = udp,
wait = yes,
user = root,
service_max = 5,
ip_max = 3,
exec = test_server,
args = test_server dgram wait '\x00some_arg';

#Test ip_max of 0
 5433 on
protocol = udp,
wait = yes,
user = \x72oot,
ip_max = 0,
exec = test_server,
args = test_server dgram wait;


Re: [CODE] inetd FINAL

2023-07-03 Thread tlaronde
On Mon, Jul 03, 2023 at 08:23:13PM +0100, David Brownlee wrote:
> Some random thoughts :)
> 
> - Would it make sense to actively reject -l or similar when -c is given

I don't think so. They can be a "slip of the thumb" when passing
switches and they do no harm. The problem is that the checking is not
purely lexical and syntactic; there is a bit of semantics
involved too (whether the protocols are supported or not, etc.), and a
user may be a little hesitant about the switches and could think that,
to validate a config to be served, the safest way is to call the program
with exactly the same options as for the server, except for requesting
only the checking.

Not illogical.

So I think these should be ignored---allowing rc(8) to in fact simply
prefix or postfix inetd_flags with " -c " in order to validate a config.

> - Resilient mode would probably benefit from a new {sub ,}section
> heading in the man page

Probably. And it needs a bit of rework too to make it perfectly clear
(difference between startup, when the fallback can be used in case of a
problem, and later, when the fallback is not read unless specifically
requested with the USR1 signal).

> - I'd be inclined to reject a config which tries to embed a null with
> \000 or similar with an error

I too think that such an interpreted escape (passing the literal
string '\000' is not a problem) is obviously either an error or a very
convoluted way to say that the value is the empty string---I forgot to
add to the man page that "keyword = [,;\n]" is a valid way of saying
value == "" (simply because it was allowed in the previous
implementation).

Since, when spawning, every argument is passed as a C string, such an
embedded null char can only truncate a string, so I don't see how it
could have any legitimate usage---except, as explained above, as a
convoluted way of saying that the value is the empty string; but, in
some sense, that is knowing too much about the implementation: the config
is made of user-level strings, and an empty string is either nothing
between the equals sign and the separator or terminator, or '' or "";
the way strings are handled internally should not be the concern of the
user.

Is someone seeing a possible legitimate use of this?

If not, yes, I think too that it should be an error.

There is one more area that needs discussion: compatibility.

My implementation is compatible except for one thing: previous
implementation treated quoting between single quotes the same as quoting
between double quotes; in my implementation, it works like with sh(1):
between single quotes, no interpretation at all---and one can not put
an escaped single quote between single quotes.
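
For reference, the sh(1)-like rule can be illustrated with Python's
shlex module, which follows POSIX shell quoting. This is only an
illustration of the rule, under the assumption that shlex is a fair
stand-in for sh(1) here; it is not the inetd parser.

```python
import shlex

# Between single quotes nothing is interpreted: the backslash stays.
single = shlex.split(r"""'a\"b'""")

# Between double quotes the backslash escapes the double quote.
double = shlex.split(r'''"a\"b"''')

# An escaped single quote cannot be put between single quotes:
# the quote closes the string and the trailing one is left unmatched.
try:
    shlex.split(r"""'a\'b'""")
    unmatched = False
except ValueError:
    unmatched = True

print(single, double, unmatched)
```

A configuration that relied on the previous behaviour (single quotes
treated like double quotes) is exactly the case where the two rules
give different tokens.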

As for the other compatibility features, i.e. the deprecated syntax, the
deprecated features are:

listen-address:

(for specifying the default host; but this causes problems if there are
spurious blanks as I will explain in the ChangeLog. This is replaced
by:

.defhost [value]
)

and:

#@ [ipsec_policy...]

(abusing a comment. Replaced by:

.ipsec [value...]
)

I would be inclined to treat deprecated features as faults in checking,
and only warnings in running during one release (in order for present
configurations to still work, when upgrading, but warning the
administrator that the period of grace will be limited).

And drop the compatibility in the following release (deprecated features
then being errors even when serving).
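
The proposed grace period boils down to a small decision table. A
minimal sketch in Python follows; the function and its arguments are
hypothetical names, used only to make the policy explicit.

```python
def deprecated_syntax_severity(checking, grace_release):
    """Severity for a deprecated directive (hypothetical helper).

    checking:      True in checking mode, False when serving.
    grace_release: True for the one release where compatibility is kept.
    """
    if checking:
        return "error"      # always a fault when checking a config
    if grace_release:
        return "warning"    # still served, but the administrator is warned
    return "error"          # following release: rejected even when serving


for checking in (True, False):
    for grace in (True, False):
        print(checking, grace, deprecated_syntax_severity(checking, grace))
```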
-- 
Thierry Laronde 
 http://www.kergis.com/
http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C


[CODE] inetd FINAL

2023-07-03 Thread tlaronde
Here is the final code with everything implemented:

http://downloads.kergis.com/misc/inetd.tar.gz

The new version:

$ ls -l ./inetd
-rwxr-xr-x  1 alceste  wheel  76136 Jul  3 19:55 ./inetd

$ size ./inetd
   text    data     bss     dec     hex filename
  56307    3008   11016   70331   112bb ./inetd

vs:

$ ls -l /usr/sbin/inetd
-r-xr-xr-x  1 root  wheel  70752 Jun  4 13:24 /usr/sbin/inetd

$ size /usr/sbin/inetd
   text    data     bss     dec     hex filename
  52508    2040    6088   60636    ecdc /usr/sbin/inetd


I thank Mouse for having corrected a previous version of the man page,
but since I have reworked it in the meantime, don't blame him for
errors or anything else that may have been introduced since.

I will have to write a ChangeLog to explain what is not working in the
current implementation in the NetBSD tree, the modifications I have made
and the additions. But for the present state, everything should be
explained in the man page.

You can start playing with it.
-- 
Thierry Laronde 
 http://www.kergis.com/
http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C


Re: inetd(8): security considerations

2023-07-03 Thread tlaronde
On Mon, Jul 03, 2023 at 08:36:23AM -0400, Mouse wrote:
> > There is one more thing I'd be inclined to add: when _serving_ a
> > config as root[*], error if the configuration (including sourced
> > chunks) is writable by someone else than root.
> 
> > What do you think?
> 
> A reasonable thing if it's an overridable default.  An extremely
> annoying thing (albeit only occasionally) if it's non-overridable.
> 
> Also, I'm not sure how I'd modify that if the UID it's serving as is
> someone other than root.

For the moment, I have written it as an error if in server mode
and if uid == root. For another user, the check is not done since 
various combinations are possible and, for me, legitimate with no clear
pattern.

I can create a server flag '-s' for "strict" mode, enforcing the check,
and not set it by default.
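
The check itself is small. A minimal sketch, in Python rather than C
and not the actual inetd code: the helper names and the `strict`
parameter (standing for the possible '-s' flag) are assumptions, and
group-writable files are simply treated as writable by someone other
than root.

```python
import os
import stat


def writable_by_non_root(st_uid, st_mode):
    """True if someone other than root could modify the file."""
    if st_uid != 0:
        return True  # a non-root owner can always change the mode
    return bool(st_mode & (stat.S_IWGRP | stat.S_IWOTH))


def config_is_acceptable(path, serving, euid, strict=True):
    """Reject only when serving as root with the (hypothetical) strict check on."""
    if not (serving and euid == 0 and strict):
        return True  # checking mode, or non-root: the test is not applied
    st = os.stat(path)
    return not writable_by_non_root(st.st_uid, st.st_mode)


print(writable_by_non_root(0, 0o100644))     # root-owned, mode 644
print(writable_by_non_root(0, 0o100664))     # root-owned, group-writable
print(writable_by_non_root(1000, 0o100600))  # owned by an ordinary user
```

Sourced chunks would need the same test applied to each file in turn.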

YMMV. Since there is a checker mode that needs no privilege and
raises no error on permissions (the file(s) need only be readable), I
tend to think that, when writing or verifying, permissions can be
whatever they are, so the check does not hamper the work; but when
installing the config for serving, making the file writable only by
root is also a safety precaution (against one's own blunders).

There are pros and cons either way---meaning that, you are right, it has
to be configurable; the question remains: what should be the default?
Strict or not?
-- 
Thierry Laronde 
 http://www.kergis.com/
http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C


Re: printf(1), sh(1), POSIX.2 and octal escape sequences

2023-07-03 Thread tlaronde
On Mon, Jul 03, 2023 at 01:36:54PM +0200, ??? wrote:
> On Mon, Jul 03, 2023 at 06:13:45AM +, David Holland wrote:
> > On Fri, Jun 30, 2023 at 05:51:13PM +0200, tlaro...@polynum.com wrote:
> >  > For this one I will go with the established behavior, but what should I
> >  > do when someone is passing, in octal or in hexa: "\000" ou "\x00"?
> > If you don't support embedded nulls in the strings you're handling
> > (and most things don't), it's an error.
> Or just expand everything unchecked and document that input must
> be a text file before and after expansion, then it's the user's
> fault, especially if they don't know what that means.
> 
> That's how dad did it after all, and that's how other programs in
> the distribution behave.

The String Auxiliary Processor that I have written, and that is part of
inetd, handles strings with an explicit length, not relying on '\0' when
walking the string, and hence allows embedded '\0' (it takes
null-terminated strings on entry, but manipulates series of bytes
internally and indeed accepts splitting a string with embedded '\0'---I
use this in order to make only one allocation for all the strings of the
structure).

So I will accept embedded escaped nuls, even if this will simply
truncate the string afterwards, since the rest of the code
manipulates char *. But this can't, as far as I can see, introduce a
security problem. So, as long as the result is acceptable, let it be.
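
The difference between the two views of a string can be shown in a few
lines. A minimal sketch using Python's ctypes, purely as an
illustration (it is not the String Auxiliary Processor): `raw` is the
counted series of bytes, `value` is what code walking a char * sees.

```python
import ctypes

# One buffer holding an argument with an embedded NUL, as '\x00' would give.
buf = ctypes.create_string_buffer(b"some\x00arg")

counted = buf.raw[:-1]    # length-counted view: every byte, minus the final NUL
as_c_string = buf.value   # C string view: stops at the first NUL

print(len(counted), counted)
print(len(as_c_string), as_c_string)
```

The counted view keeps everything; the C string view is silently
truncated, which is what happens once the value reaches code handling
char *.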
-- 
Thierry Laronde 
 http://www.kergis.com/
http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C


inetd(8): security considerations

2023-07-03 Thread tlaronde
I'm almost finished with inetd(8)---I am still waiting for an answer about
ATF tests: they will be added if my version of inetd reaches the NetBSD
src tree; if it does not, I will not bother with ATF.

There is one more thing I'd be inclined to add: when
_serving_ a config as root[*], error if the configuration (including
sourced chunks) is writable by someone other than root.

What do you think?

*: checking mode is unprivileged and can be done by whoever with
whatever readable configuration.
-- 
Thierry Laronde 
 http://www.kergis.com/
http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C


Re: style, sysexits(3), and man RETURN VALUES for sys programs

2023-07-02 Thread tlaronde
On Sat, Jul 01, 2023 at 06:39:32PM -, Christos Zoulas wrote:
> In article <20230603120221.0766b60...@jupiter.mumble.net>,
> Taylor R Campbell   wrote:
> >> Date: Sat, 3 Jun 2023 13:45:44 +0200
> >> From: tlaro...@polynum.com
> >> 
> >> So I suggest to add a mention of sysexits(7) to style.
> >
> >I don't think sysexits(7) is consistently used enough, or really
> >useful enough, to warrant being a part of the style guide.  Very few
> >programs, even those in src, use it, and I don't think anything
> >_relies_ on it for semantics in calling programs.
> 
> I agree; nothing really uses sysexits except inside sendmail perhaps...
> It has been around for more than 40 years:
> 
> ^As 00062/0/0
> ^Ad D 1.1 81/10/15 20:29:54 eric 1 0 
> ^Ac date and time created 81/10/15 20:29:54 by eric
> 
> and one would think that if it was useful, it would have caught on by now.
> 

Since you don't discuss anything particular to sysexits, your sentence
is then a broad, general judgement. So let's see:

"NetBSD 1996 -- 2023

It has been around for 28 years. And one would think that if it was
useful, it would have caught on by now."

If the former is true, the latter is true. Is this latter true?

As far as I'm concerned, even if the latter is true, in numbers, it has
absolutely no bearing on the usefulness of NetBSD. Only on the stupidity
of the mob.

And since you mention init(8), the funny thing is that we are discussing
a server that, generally, daemonizes and is hence reparented to
init(8)...

And when it does not daemonize, it is in debugging mode, and providing
an information that the program has, and that will be lost when casting
all errors to EXIT_FAILURE, is a debugging feature...

Not to mention that if there was the user interface equivalent of
strerror(3) (a sysstrerror(1)), one would not have to plague the programs
with variable strings, more or less accurate.
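
What such a sysstrerror(1) could look like is easy to sketch. The
following is a minimal sketch in Python, under the assumption that the
utility would simply map a status to the symbolic name and the comment
found in <sysexits.h>; the function name follows the hypothetical
sysstrerror mentioned above and the output format is invented.

```python
# Values and descriptions as found in <sysexits.h>.
SYSEXITS = {
    0: ("EX_OK", "successful termination"),
    64: ("EX_USAGE", "command line usage error"),
    65: ("EX_DATAERR", "data format error"),
    66: ("EX_NOINPUT", "cannot open input"),
    67: ("EX_NOUSER", "addressee unknown"),
    68: ("EX_NOHOST", "host name unknown"),
    69: ("EX_UNAVAILABLE", "service unavailable"),
    70: ("EX_SOFTWARE", "internal software error"),
    71: ("EX_OSERR", "system error (e.g., can't fork)"),
    72: ("EX_OSFILE", "critical OS file missing"),
    73: ("EX_CANTCREAT", "can't create (user) output file"),
    74: ("EX_IOERR", "input/output error"),
    75: ("EX_TEMPFAIL", "temp failure; user is invited to retry"),
    76: ("EX_PROTOCOL", "remote error in protocol"),
    77: ("EX_NOPERM", "permission denied"),
    78: ("EX_CONFIG", "configuration error"),
}


def sysstrerror(status):
    """Return a fixed, program-independent message for an exit status."""
    name, text = SYSEXITS.get(status, ("?", "unknown exit status"))
    return f"{name}: {text}"


print(sysstrerror(78))
print(sysstrerror(1))
```

A caller (a shell, or a supervising daemon) would then need nothing
from the program but its exit status.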

And, if a daemon was reparented to a daemon server, not closing stdin,
stdout and stderr, but redirecting them, this would allow passing
commands to the server via its stdin, solving the problem that was
discussed, incidentally, in the course of the inetd(8) thread. And this
super daemon could make something of standardized return statuses, if
only for stats purposes.

sysexits(3) is a good idea. And this is probably why it hasn't caught
on.
-- 
Thierry Laronde 
 http://www.kergis.com/
http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C


Re: Trivial program size inflation

2023-07-02 Thread tlaronde
Le Sat, Jul 01, 2023 at 12:04:25PM -0700, Jason Thorpe a écrit :
> 
> > On Jul 1, 2023, at 8:20 AM, tlaro...@polynum.com wrote:
> > 
> > This is also what I meant by "static seems to be considered deprecated".
> 
> Honestly, I find the obsession with static linking hilariously quaint.  
> NetBSD already bends backwards to an extreme degree by ensuring that old 
> version of *system calls* work correctly *in the kernel*.  Some other systems 
> provide the ABI contract at the libc boundary, and let the libc <-> kernel 
> interface be fluid (keep the compatibility stuff in user-space where it 
> belongs!)

When it comes to this particular case, the binary format is ELF, and ELF
has an interpreter field; wouldn't it be possible to have a versioned
ld.elf_so providing emulation?

> 
> Obviously this is not feasible to do with static binaries, and we have one 
> platform that ONLY supports static, but for the rest, I honestly think we 
> should officially deprecate static linking in the general case (obviously it 
> has some uses in super-constrained environments, but in those cases we are 
> often also using crunch?d binaries as well).

It is curious that you react this way in a thread where you, like
others, had your jaw drop at the size of a literally do-nothing
executable. This went unseen precisely because few use static linking.

Dshared encourages "inflation".

Dshared is a way to hell: there is not only the Windows DLL hell. Have
you never had a third-party application, from pkgsrc, needing to be
updated, and, because even trivial libraries of a few kilobytes are
linked dynamically, the thing considers that the previous version is not
"good enough" (while there is no API or ABI change) and forces you to
upgrade it, rendering all the other programs linked against the previous
version not executable? While dshared is advertised as allowing
concurrent versions of the same library, generally only one version is
allowed, the other one being uninstalled, forcing one to upgrade
absolutely everything for, in fact, a library that generally simply
implements "Hello, world!", but with a ton of fat.

Not to mention all the security problems implied by the search
feature for an ELF dshared with rpath: without extra care one cannot
be sure that what will be executed is what was expected.

Dshared is not exactly what I would call a panacea.

But as Henri Poincaré wrote: "To believe everything or to doubt
everything are two different ways of being equally superficial." (Tout
croire ou douter de tout sont deux façons différentes d'être également
superficiel.)

I sometimes use totally static; frequently use partially static, hence
frequently also use partially dshared; and sometimes use totally
dshared.

There are uses for both static and dshared (if there were shared static,
I would also use it in some cases).


Re: Trivial program size inflation

2023-07-01 Thread tlaronde
Le Sat, Jul 01, 2023 at 03:00:21PM +, RVP a écrit :
> On Sat, 1 Jul 2023, Mouse wrote:
> 
> > dlopen, that doesn't make sense to me.  For a statically linked
> > program, the linker can tell whether it calls dlopen et al.
> > 
> 
> Oh, I didn't mean the program needing to call dlopen() directly.
> libc itself may load shared objects to support things like i18n and NSS
> on an as-needed basis.

This is also what I meant by "static seems to be considered deprecated".

The libc (and the crt) depend on things one doesn't know or guess, and
you end up either with a "static" binary that is in fact
dynamic (because missing features were added by default as
dynamically shared libraries) or with a failure because you have not
added libraries you didn't know about and that vary from system to
system, and from release to release.

The result is that people prefer to compile dshared because it works
by default, doing things "magically"...

And only old guys like me still care about the size of the
executable... (But curiously enough, the "young" ones are always talking
about "ecology" and "saving the planet", while they don't care a hoot
about the memory, storage and energy that their software, or their use
of software, wastes... I wonder from time to time whether, if, for
example, "researchers" used the correct utilities for their papers about
"global warming" instead of some common huge beasts (proprietary or open
source), we would not be in an ice age instead...)


Re: ipsec: slight inconsistency

2023-07-01 Thread tlaronde
Le Sat, Jul 01, 2023 at 10:49:45AM -0400, Greg Troxel a écrit :
> tlaro...@polynum.com writes:
> 
> > The two functions are said "inverse" from each other but the problem is
> > that if one gives a delimiter to ipsec_dump_policy(3) that is neither
> > a blank nor a new line, the string obtained can not be an input
> > to ipsec_set_policy(3). So there are not really inverse from each other.
> >
> > Wouldn't it be more logical whether to have no delimiter to
> > ipsec_dump_policy(3) (defaulting to '\n' for separating the elementary
> > statements) or to allow a delimiter to ipsec_set_policy(3) when parsing
> > the policy passed?
> 
> I think it would be most logical to document in ipsec_dump_policy that
> the default delimeter matches what is expected by ipsec_set_policy, and
> that alternate delimiters might be useful for people but do not produce
> valid syntax.  That resolves your consternation but does not break
> anyone relying on the current behavior.  This problem is surely
> longstanding and that you seem to be the first to notice or care, so the
> severity would seem to be extremely low.

Yes, at least for it to be documented.

But there is a more general lesson to draw from this and inetd(8): when 
giving a syntax, it has to be checked from all angles in order to settle
for something that makes sense and a documentation that matches. And the
documentation is really part of the implementation---in fact, it should
come first, and be amended while you implement...


ipsec: slight inconsistency

2023-07-01 Thread tlaronde
[Still inetd(8) related.]

I'm rewriting the inetd/ipsec code, since some acrobatics are not really
needed, but when trying to use ipsec_set_policy(3) and
ipsec_dump_policy(3) directly, there is a slight inconsistency.

The two functions are said to be the "inverse" of each other, but the
problem is that if one gives ipsec_dump_policy(3) a delimiter that is
neither a blank nor a newline, the string obtained cannot be an input
to ipsec_set_policy(3). So they are not really inverses of each other.

Wouldn't it be more logical either to have no delimiter for
ipsec_dump_policy(3) (defaulting to '\n' for separating the elementary
statements) or to allow a delimiter for ipsec_set_policy(3) when parsing
the policy passed?
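
Until one of these is settled, a caller can work around the mismatch by
normalizing the dumped string itself. The following is only a sketch in
plain C: it does not call either ipsec function, the name
delim_to_space is hypothetical, the policy text in the usage note is
purely illustrative, and it assumes the chosen delimiter never occurs
inside a statement.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy `dump` into `out`, replacing every occurrence of the string
 * `delim` by a single blank.  Hypothetical helper, not part of libipsec.
 * Returns 0 on success, -1 if `delim` is empty or `out` is too small.
 */
int
delim_to_space(const char *dump, const char *delim, char *out, size_t outsz)
{
	size_t dl = strlen(delim);
	size_t o = 0;

	if (dl == 0 || outsz == 0)
		return -1;
	while (*dump != '\0') {
		if (o + 1 >= outsz)	/* keep room for the final NUL */
			return -1;
		if (strncmp(dump, delim, dl) == 0) {
			out[o++] = ' ';
			dump += dl;
		} else {
			out[o++] = *dump++;
		}
	}
	out[o] = '\0';
	return 0;
}
```

With a ";" delimiter, an illustrative "in ipsec;esp/transport//require"
becomes "in ipsec esp/transport//require", i.e. blank-separated again.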


Re: Trivial program size inflation

2023-06-30 Thread tlaronde
Le Fri, Jun 30, 2023 at 01:37:10PM -0400, Mouse a écrit :
> Based on something at work, I was looking at executable sizes.  I
> eventually tried a program stripped about as far down as I could:
> 
> int main(void);
> int main(void)
> {
>  return(0);
> }
> 
> and built it -static.  size on the resulting binary:
> 
> sparc, my mutant 1.4T:
> 
> text    data    bss     dec     hex     filename
> 12616   124     288     13028   32e4    main
> 
> amd64, my mutant 5.2:
> 
>text  data bss dec hex filename
>  152613  4416   16792  173821   2a6fd main
> 
> amd64, 9.0_STABLE (ftp.n.o):
> 
>    text    data     bss     dec     hex filename
>  562318   29064 2176416 2767798  2a3bb6 main
> 
> 12K to do nothing is bad enough (I'm going to be looking at why it's
> that big).  149K is even more disturbing (I'll be looking at that too).
> But over half a meg of text and two megs of BSS?  To do nothing?
> Surely something is wrong somewhere.

What are the compiler (especially the linker) and compiler version on
your 1.4T vs. 9.0?

Having the obligation to support a myriad of systems for kerTeX, I have
seen that, unfortunately, static linking is nowadays considered a
second-rate feature, if not a deprecated one, and this is a part I have
to adjust from time to time, generally for Linuxes, to obtain static
binaries.

Another thing to look at: the so-called i18n. On NetBSD, iconv doesn't
work with static linking, but could it be that its objects are
nonetheless added too? As well as threading and so on?


Re: printf(1), sh(1), POSIX.2 and octal escape sequences

2023-06-30 Thread tlaronde
Le Fri, Jun 30, 2023 at 03:37:18PM +, David Holland a écrit :
> On Wed, Jun 28, 2023 at 06:32:10PM +0200, tlaro...@polynum.com wrote:
>  > > If you want to write a two digit octal number you can not continue with
>  > > another ocatal digit. In C you could do "...\77" "7" and have it concat
>  > > the literals. In config files (without concatenation) you need some
>  > > other trick.
>  > 
>  > I beg to differ: since due to this very unfortunate "variable length"
>  > feature, your scanner has to read char by char, it can reject the third
>  > digit since it would yield an out of range byte value.
> 
> The behavior of escapes in C strings is widely used and well
> understood. Don't improvise.
> 
> There are such things as invalid inputs. Reject them with a reasonable
> diagnostic message instead of trying to guess what the user might have
> meant. Works out much better in the long run.

For this one I will go with the established behavior, but what should I
do when someone passes, in octal or in hex, "\000" or "\x00"?

I have decided that this value will be put back as an escape
sequence (possibly for an argument of some program), since if the
program "interprets" the escape sequence (as the current inetd(8) does)
while obviously manipulating C strings internally, it will certainly
not provide what was intended... (supposing the user knows what he
wants, and this is, I admit, quite an optimistic view).

So what is the established behavior in this case? And, BTW, most
utilities ignore errors with octal sequences (printf(1) for example).


[CODE] inetd OK

2023-06-29 Thread tlaronde
The new version passes the tests for current ATF, and does what was
proposed.

I have corrected two blunders:

- wrong macro: the debug flag (in the usage and in support) was
conditionally added with:

#define DEBUG
#else
#endif

when the macro is DEBUG_ENABLE...

- the program writes via syslog only if syslogging, and I had forgotten
to define the boolean syslogging to true when it was syslogging...

The corrected tarball is here:

http://downloads.kergis.com/misc/inetd.tar.gz

Things to do:

1) I'd like a native English speaker to review the man page. The man
page needs some more work on the description of the syntax, but this
will be completed along with 2) below;

2) I'd like to know if NetBSD will later (NOT for 10.0) take the code or
not, in order to know if I have to put the tests in the ATF form or
not: if NetBSD doesn't want it, I will use my version for my own usage
and publish it along with the other open source things I publish, but
in that case I will add tests my own way.


Re: printf(1), sh(1), POSIX.2 and octal escape sequences

2023-06-28 Thread tlaronde
Le Wed, Jun 28, 2023 at 06:58:57PM +0200, Roland Illig a écrit :
> Am 28.06.2023 um 12:57 schrieb tlaro...@polynum.com:
> > But isn't it incorrect? POSIX 2018 says:
> >
> > '"\ddd", where ddd is a one, two, or three-digit octal number, shall be
> > written as a byte with the numeric value specified by the octal number.'
> 
> The main intended takeaway from this sentence is that '\0000' is not a
> single escape sequence but rather the escape sequence '\000' followed by
> the digit '0'. That's a different to the hex escape sequence introduced
> by C90, which allows an arbitrary number of digits, so '\x12'
> forms a single escape sequence.
> 
> That sentence defines that '\778' is parsed as '\77' followed by the
> digit '8', as '8' is not an octal digit.
> 
> That sentence also says that '\777' is parsed as a single escape
> sequence (due to the common lexer rule that at each time, the longest
> possible token is matched), as '777' is a syntactically valid octal
> number. The range constraints are usually not expressed in the grammar,
> they are left to another layer of the parser or interpreter instead.
> 
> So '\778' should be parsed as '\77' followed by '8', and '\777' should
> be parsed as '\777' and then rejected as out of range, just like a port
> number 7 is rejected as well.

OK for the interpretation linked to the lexer. But as for the "reject",
POSIX says nothing, and the result is simply truncated to 8 bits.

The devil is indeed in the details...


Re: printf(1), sh(1), POSIX.2 and octal escape sequences

2023-06-28 Thread tlaronde
Le Wed, Jun 28, 2023 at 04:24:20PM +, RVP a écrit :
> On Wed, 28 Jun 2023, tlaro...@polynum.com wrote:
> 
> > But you can't: from the syntax given, \777 is a perfectly valid \77
> > octal sequence followed by the character '7'.
> > 
> 
> That would be a very surprising way to resolve the ambiguity which is
> present here. There are others when it comes to octal notation:
> 
> Single-digit octal escapes can be confused with regexp back-references, so
> POSIX says octal escapes must have at least 2 digits in certain situations.
> 
> As for resolving \777 as \777 and not \77'7 is this note in the EXTENDED
> DESCRIPTION for tr(1) (I knew I had read this somewhere in my travels through
> POSIX-land):
> 
>\octal
>   Octal sequences can be used to represent characters with specific
>   coded values. An octal sequence shall consist of a <backslash>
>   followed by the _longest_ sequence of one, two, or three-octal-digit
>   characters (01234567).
> 
>(my emphasis)
> 
> What's good for the goose is also good for the gander, I say.
> 

OK, if this is specified somewhere, and tied to the way lexers behave,
I will go with it. (It would be good if a POSIX revision could remove
all the separate explanations of octal, put a single common definition
in one place, and link to it.)
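
For illustration only, the "longest match, then range check" reading
can be sketched in C as follows. This is not the inetd(8) code; the
name octal_strict and the return convention (-1 for out of range) are
assumptions made for the example.

```c
/*
 * Decode an octal escape; `s` points just after the backslash.
 * Greedy: take the longest run of one to three digits in 0-7,
 * then check the range.  Hypothetical helper.
 * Returns the number of digits consumed, 0 if there is no octal
 * digit, or -1 if the value does not fit in a byte.
 */
int
octal_strict(const char *s, unsigned char *out)
{
	unsigned int v = 0;
	int n = 0;

	while (n < 3 && s[n] >= '0' && s[n] <= '7') {
		v = v * 8 + (unsigned int)(s[n] - '0');
		n++;
	}
	if (n == 0)
		return 0;
	if (v > 0377)		/* e.g. \777: syntactically fine, rejected */
		return -1;
	*out = (unsigned char)v;
	return n;
}
```

With this reading, "778" yields the byte 077 followed by a literal '8',
while "777" is consumed as one sequence and then refused.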


Re: printf(1), sh(1), POSIX.2 and octal escape sequences

2023-06-28 Thread tlaronde
Le Wed, Jun 28, 2023 at 12:45:55PM -0400, Mouse a écrit :
> >>> "\ddd", where ddd is a one, two, or three-digit octal number, shall
> >>> be written as a byte with the numeric value specified by the octal
> >>> number."
> >> [...]
> > I beg to differ: since due to this very unfortunate "variable length"
> > feature, your scanner has to read char by char, it can reject the
> > third digit since it would yield an out of range byte value.
> 
> Would it?  Only if your bytes are smaller than nine bits - or if
> they're signed and smaller than ten bits.
> 
> Is the size of a `byte' specified anywhere?

From memory, in POSIX a char holds a byte (8-bit) representable value
(it may be implemented with a wider size, but only this range is valid).


Re: printf(1), sh(1), POSIX.2 and octal escape sequences

2023-06-28 Thread tlaronde
Le Wed, Jun 28, 2023 at 06:06:38PM +0200, Martin Husemann a écrit :
> On Wed, Jun 28, 2023 at 05:59:10PM +0200, tlaro...@polynum.com wrote:
> > "\ddd", where ddd is a one, two, or three-digit octal number, shall be
> > written as a byte with the numeric value specified by the octal number."
> > 
> > ? Because I parse it as: an octal escape sequence can be \d, or \dd or
> > \ddd; and the result is a byte value.
> 
> Exactly. But for the parser the "byte value" is irrelevant, that part is
> semantics (and checked later). Syntactically you write an octal number
> with upto three digits.
> 
> If you want to write a two digit octal number you can not continue with
> another ocatal digit. In C you could do "...\77" "7" and have it concat
> the literals. In config files (without concatenation) you need some
> other trick.

I beg to differ: since due to this very unfortunate "variable length"
feature, your scanner has to read char by char, it can reject the third
digit since it would yield an out of range byte value.

And it shall be emphasized that POSIX says strictly nothing about this.
What is the correct interpretation: swallow up to three digits in the
0-7 range, without evaluating the value, which may be out of range? Or
swallow up to three digits in the 0-7 range as long as the value is in
the byte range? The latter seems more consistent than the former, but
neither is in the spec.
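
To make the second interpretation concrete, here is a minimal C sketch
of a scanner that stops swallowing digits as soon as the next one would
push the value out of the byte range. The name octal_in_range is
hypothetical and this is not taken from any existing utility.

```c
#include <stddef.h>

/*
 * Decode an octal escape; `s` points just after the backslash.
 * Consume up to three digits in 0-7, but only while the accumulated
 * value stays within [0, 0377].  Hypothetical helper.
 * Returns the number of digits consumed (0 if none).
 */
size_t
octal_in_range(const char *s, unsigned char *out)
{
	unsigned int v = 0;
	size_t n = 0;

	while (n < 3 && s[n] >= '0' && s[n] <= '7') {
		unsigned int next = (v << 3) + (unsigned int)(s[n] - '0');

		if (next > 0377)	/* leave this digit as a plain char */
			break;
		v = next;
		n++;
	}
	if (n > 0)
		*out = (unsigned char)v;
	return n;
}
```

Here "777" consumes only two digits (byte 077) and leaves the third '7'
as an ordinary character, whereas "377" consumes all three.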

And the whole "variable length" feature should never have been
"standardized", especially for user-level utilities: it is almost
impossible to verify a script because it is impossible to parse
such a string correctly from a cursory look. It is a highway to
security hell.


Re: printf(1), sh(1), POSIX.2 and octal escape sequences

2023-06-28 Thread tlaronde
Le Wed, Jun 28, 2023 at 05:26:55PM +0200, Martin Husemann a écrit :
> On Wed, Jun 28, 2023 at 05:01:46PM +0200, tlaro...@polynum.com wrote:
> > But you can't: from the syntax given, \777 is a perfectly valid \77
> > octal sequence followed by the character '7'.
> 
> No, from the Posix text you quoted it clearly is a three digit ocatl
> sequence, and its value is out of range.

Since you are not a native English speaker either, would you mind
explaining to me how you parse the POSIX sentence:

"\ddd", where ddd is a one, two, or three-digit octal number, shall be
written as a byte with the numeric value specified by the octal number."

? Because I parse it as: an octal escape sequence can be \d, or \dd or
\ddd; and the result is a byte value.

You seem to parse it as: an octal sequence is always three digits, \ddd,
but translated into a "three digit" byte value; this doesn't make sense
to me, because if you specify that it is a byte, you specify the range
[0, 255], and the number of digits has nothing to do with it.

If this is your interpretation, this is not what printf(1) does:

$ printf '\11|\n'
|

$ printf '\74\n'
<

and you might try the BEL if your terminal supports it for a single
digit case. This is variable length. If printf doesn't take '|' or
'\n' because it cannot be an octal digit, why would it take \777, which
can't be the value of a byte?

So why assume that an incorrect three-digit octal should be interpreted
when a correct two-digit octal followed by a character (that happens to
correspond, in ASCII, to a digit) is a valid string in this case?


Re: printf(1), sh(1), POSIX.2 and octal escape sequences

2023-06-28 Thread tlaronde
Le Wed, Jun 28, 2023 at 01:10:04PM +, RVP a écrit :
> On Wed, 28 Jun 2023, tlaro...@polynum.com wrote:
> 
> > But isn't it incorrect? POSIX 2018 says:
> > 
> > '"\ddd", where ddd is a one, two, or three-digit octal number, shall be
> > written as a byte with the numeric value specified by the octal number.'
> > 
> > since 477 -> 777 are not byte values, shouldn't \777 be interpreted as
> > '\77' octal then the digit 7; \677 -> \67 octal then 7 etc. ?
> > 
> 
> Better to do what the C compiler does: try converting up to 3 digits, and
> if the result of the conversion is > 255, complain.

But you can't: from the syntax given, \777 is a perfectly valid \77
octal sequence followed by the character '7'. Why would the program
report an error when what is given is, from the syntax, perfectly valid,
the octal sequence being an unfortunate variable-length feature? It is
not even specified that the octal sequence will function like the '*' in
a regex, that is, that it will try to swallow the maximum number of
following digits in the range 0-7.

If you try '\778', it will print '?8' and I think it is the correct
behavior from the syntax description.

I would hope that a POSIX revision specifies that this lousy
feature of "variable length octal" is deprecated and discouraged
(unfortunately it probably cannot be nuked) and that the new
sequence "\oDDD" (backslash, 'o' and exactly three digits from 000
to 377) is the way to unambiguously express an octal number from
now on.
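
As a sketch of how simple a scanner for such a fixed-length form would
be (the "\oDDD" syntax is only the proposal made here, it exists in no
standard, and the function name octal_fixed is hypothetical):

```c
/*
 * Decode the proposed fixed-length form: backslash, 'o', then exactly
 * three digits in 0-7 with a value in [000, 377].  `s` points at the
 * backslash.  Hypothetical helper for a syntax that is only proposed.
 * Returns 5 (characters consumed) on success, -1 otherwise.
 */
int
octal_fixed(const char *s, unsigned char *out)
{
	unsigned int v = 0;
	int i;

	if (s[0] != '\\' || s[1] != 'o')
		return -1;
	for (i = 2; i < 5; i++) {
		/* A NUL or any non-octal char ends the scan with an error. */
		if (s[i] < '0' || s[i] > '7')
			return -1;
		v = (v << 3) + (unsigned int)(s[i] - '0');
	}
	if (v > 0377)
		return -1;
	*out = (unsigned char)v;
	return 5;
}
```

There is no ambiguity left: "\o0777" is the byte 077 followed by a
literal '7', and "\o777" or "\o77" are plain errors.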


printf(1), sh(1), POSIX.2 and octal escape sequences

2023-06-28 Thread tlaronde
When refactoring and rewriting the scanning/parsing code for inetd(8), I
wanted to add, too, the possibility to pass octal escape sequences
(hex ones were already added) in order to be less surprising and to
actually support whatever an admin is accustomed to using when invoking
utilities.

Since this is unfortunately variable length, I first implemented it as:
if escape and digit between 0 and 7 -> octal; if next digit between
0 and 7, shift left by 3 and add the digit's value; repeat for next.

The problem is that \777 is accepted -> 0377; \677 -> 0277;
\577 -> 0177; \477 -> 0077.
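
A minimal C sketch of that first implementation (not the actual
inetd(8) code; the name octal_naive is hypothetical) shows where the
wraparound comes from: the accumulated value is simply stored in a
byte, so anything above 0377 loses its high bit.

```c
#include <stddef.h>

/*
 * Decode an octal escape; `s` points just after the backslash.
 * Consume one to three digits in 0-7, shifting left by 3 each time,
 * and keep only the low 8 bits of the result.  Hypothetical helper.
 * Returns the number of digits consumed (0 if none).
 */
size_t
octal_naive(const char *s, unsigned char *out)
{
	unsigned int v = 0;
	size_t n = 0;

	while (n < 3 && s[n] >= '0' && s[n] <= '7') {
		v = (v << 3) + (unsigned int)(s[n] - '0');
		n++;
	}
	if (n > 0)
		*out = (unsigned char)(v & 0xff);	/* silent truncation */
	return n;
}
```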

So I had the curiosity to use printf(1) and sh(1) to see how they
handled them.

Well: this way.

But isn't it incorrect? POSIX 2018 says:

'"\ddd", where ddd is a one, two, or three-digit octal number, shall be
written as a byte with the numeric value specified by the octal number.'

since 477 -> 777 are not byte values, shouldn't \777 be interpreted as
'\77' octal then the digit 7; \677 -> \67 octal then 7 etc. ?


[RFC] String Auxiliary Processor: a... SAP!

2023-06-27 Thread tlaronde
When re-organizing the parsing code of inetd(8), I wanted to achieve
several goals:

1 - Keep the buffer with the statement as is in order to provide the
context in case of syntax error (and improve efficiency by not shifting
in the same buffer strings around);

2 - Make almost no memory allocation at least while in the parsing phase
for a service;

3 - Be able to reconstruct and to publish a v2 version of a successfully
parsed configuration file.

3) happened to be problematic because the successfully parsed config
has to be sent to syslog, and translating the servtab to a correct config
definition would have implied either adding a line for every field
or constructing a line in a buffer, dealing constantly with sizes and so
on.

In fact, a solution solved all the problems at once: the SAP...

It is a very basic "machine" with a stack of strings---not a stack of
pointers: the strings are written in a buffer of fixed size.

The machine provides unary and binary operations. The unary operations
deal with the top of the stack (the last string "pushed"). Binary
operations deal with the top and its predecessor.

A string is "pushed" on the stack and can be manipulated: taken as is;
unquoted; quoted; escape sequences interpreted. All the lexical details
are handled by the SAP.

A string can be pushed creating a new entry or it can be appended to the
string on the stack.

Unary operations:

sap_pop()
sap_dup()
sap_split(char)       split the string on top like strrchr(3) (I needed
                      this one; other options can be added)
sap_store(flag, ...)  store the string, either as a string (allocated),
                      as a string to a FILE*, or as an interpreted
                      binary value, handling type, size, sign and
                      bounds; this pops the string

Binary operations:

sap_swap()            exchange in place the two last strings
sap_join(char)        assemble the two last strings with the char as
                      separator
sap_merge()           put the last string directly at the end of the
                      previous one

It shall be noted that you can join with '\0'. This means that one
creates a single string with chunks separated by '\0'.

This solved my memory allocation problem: I only allocate once a
whole record with sap_store(), having "joined" the fields and having
popped out the binary values, so that the members of the servtab
structure all point inside a single allocated memory chunk.

This is small and consists of the two files sap.h and sap.c in the
inetd(8) sources here:

http://downloads.kergis.com/misc/inetd.tar.gz
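
To give an idea of the principle without opening the tarball, here is a
toy C sketch of a stack of strings kept back to back in one fixed
buffer. It is NOT the code of sap.c: every toy_ name, the buffer size
and the stack depth are assumptions made for the illustration, and only
push, pop and join are shown.

```c
#include <stddef.h>
#include <string.h>

#define TOY_BUFSZ 256	/* arbitrary sizes for the sketch */
#define TOY_DEPTH 16

struct toy_sap {
	char   buf[TOY_BUFSZ];		/* strings stored back to back */
	size_t start[TOY_DEPTH];	/* offset of each stacked string */
	size_t len[TOY_DEPTH];		/* length, final NUL excluded */
	size_t depth;			/* number of stacked strings */
	size_t used;			/* bytes of buf in use */
};

void
toy_init(struct toy_sap *m)
{
	m->depth = 0;
	m->used = 0;
}

/* Push a copy of `s` as a new top entry; -1 if there is no room. */
int
toy_push(struct toy_sap *m, const char *s)
{
	size_t n = strlen(s);

	if (m->depth == TOY_DEPTH || n + 1 > TOY_BUFSZ - m->used)
		return -1;
	memcpy(m->buf + m->used, s, n + 1);
	m->start[m->depth] = m->used;
	m->len[m->depth] = n;
	m->depth++;
	m->used += n + 1;
	return 0;
}

/* Drop the top entry; -1 if the stack is empty. */
int
toy_pop(struct toy_sap *m)
{
	if (m->depth == 0)
		return -1;
	m->depth--;
	m->used = m->start[m->depth];
	return 0;
}

/*
 * Join the two top entries with `sep`.  Since entries are contiguous,
 * the NUL ending the lower one sits right before the top one, so it is
 * enough to overwrite it: nothing is moved.  `sep` may be '\0', hence
 * the explicit length.
 */
int
toy_join(struct toy_sap *m, char sep)
{
	size_t top, prev;

	if (m->depth < 2)
		return -1;
	top = m->depth - 1;
	prev = top - 1;
	m->buf[m->start[prev] + m->len[prev]] = sep;
	m->len[prev] += 1 + m->len[top];
	m->depth--;
	return 0;
}

/* Return the top entry and its length, or NULL if the stack is empty. */
const char *
toy_top(const struct toy_sap *m, size_t *len)
{
	if (m->depth == 0)
		return NULL;
	if (len != NULL)
		*len = m->len[m->depth - 1];
	return m->buf + m->start[m->depth - 1];
}
```

Pushing "tcp" and "nowait" and joining with ' ' leaves the single entry
"tcp nowait"; joining further fields with '\0' gives one chunk whose
pieces can each be pointed to as ordinary C strings.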


[CODE] inetd(8)

2023-06-27 Thread tlaronde
You will find here:

http://downloads.kergis.com/misc/inetd.tar.gz

the sources for implementation of what I had proposed for inetd(8):

- Create a checker mode;
- Never serve an invalid config;
- Resilient mode;
- Fallback feature;
- Restate the syntax (there are extensions to bypass things I find
unfortunate; example: instead of overloading comments with '#@', just
add a ".ipsec" directive; likewise, add a ".defhost" directive instead
of acrobatics about "host:" on a line by itself. These are still
supported, but a warning is emitted to encourage use of the new
directives);
- Publish a validated config via syslog or on stdout (if requested in
checker mode with the -d flag);
- The exit status is one of sysexits(3).

I have untangled things and totally rewritten the scanning and parsing:
parse_v2.c doesn't exist anymore, but since I needed to publish back
the config, I developed a String Auxiliary Processor, i.e. a... SAP!
It is put aside because it could be more widely useful.

The SAP deserves more explanation and I will send a separate message
about it later.

The result adds only 1 KB to the executable.

The syntax is compatible with the existing one except for one thing: I
make a difference between single quotes and double quotes: a sequence
between single quotes is taken "as is", like in sh(1); between
double quotes, escape sequences are interpreted.
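
The distinction can be sketched in C as follows. This is an
illustration only, not the inetd(8) scanner: the name toy_unquote is
hypothetical and only a handful of escapes (\n, \t, \\ and \") are
handled, to keep the example short.

```c
#include <stddef.h>

/*
 * Copy the content of a quoted token into `out`.  `in` must start with
 * a single or a double quote.  Between single quotes everything is
 * taken as is; between double quotes a few escapes are interpreted.
 * Hypothetical helper.  Returns the length written, or -1 on error
 * (no quote, unknown escape, missing closing quote, `out` too small).
 */
int
toy_unquote(const char *in, char *out, size_t outsz)
{
	char q = in[0];
	size_t i = 1, o = 0;

	if (q != '\'' && q != '"')
		return -1;
	while (in[i] != '\0' && in[i] != q) {
		char c = in[i++];

		if (q == '"' && c == '\\') {
			switch (in[i]) {
			case 'n':  c = '\n'; break;
			case 't':  c = '\t'; break;
			case '\\': c = '\\'; break;
			case '"':  c = '"';  break;
			default:   return -1;	/* includes a trailing '\' */
			}
			i++;
		}
		if (o + 1 >= outsz)	/* keep room for the final NUL */
			return -1;
		out[o++] = c;
	}
	if (in[i] != q || outsz == 0)
		return -1;
	out[o] = '\0';
	return (int)o;
}
```

So 'a\tb' stays the four characters a, \, t, b, while "a\tb" becomes
a, TAB, b.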

Extensions:
- I cover the whole escaping club and added octals. For the moment
this is compatible with printf(1), which is unfortunate
because \777 is a valid octal... \oDDD imposing three digits would be
better;
- The escape sequences are allowed anywhere, and not only inside
quoted sequences, when a value is expected;
- I have added three directives: ".ipsec", ".defhost" and
".grestore", the latter being a flag instructing to reset defhost and
ipsec to the values they had before the inclusion of the file (this for
compatibility with previous implementation).

The ".include" now works as in the shell: it is the same as if the
lines had been directly written in the config file, with the exception
of continuation lines: a statement starting in one file must end in the
very same file.

I have added the essentials in the man page but it needs more
attention.

The program compiles and runs but it needs a complete bunch of tests
(there are corner cases in the syntax and the rate_limit and ipsec
stuff have been only superficially touched; as well as the accept filter
with a comment, in the original sources, still saying it needed to be
verified that the filters still worked...).


inetd(8): final version tomorrow

2023-06-26 Thread tlaronde
I will release the final (minus further bug hunting or cosmetic changes)
tomorrow.

I have extensions to the syntax and one incompatible change:

The current syntax treats single quoting and double quoting the same and,
furthermore, escape sequences are only interpreted inside quoted
segments.

In my implementation, single quoting is the same as in shell: no
interpretation at all. And escape sequences can appear, but only for
_values_, inside double quotes as well as outside quotes.

(Furthermore this quoting and escaping lexicographical sugar can be
used in positional definition as well, since all are values in this
case.)

This will have to be discussed.


Re: vi(1) one line crasher

2023-06-23 Thread tlaronde
Would you mind adding your input to gnats bin/57482?

Thanks!

Le Fri, Jun 23, 2023 at 12:56:05AM +0200, Martin Neitzel a écrit :
> > And to be more exhaustive (perhaps it's this) is here my EXINIT:
> > EXINIT='set nu showmatch ts=8 wl=72'
> 
> Ah!  I can seg-fault vi(1) now, too!
> 
> It boils down to:  a "set nu" alone + "$" causes the crash for me.
> 
> I run vi(1) without any personal settings in .exrc/.virc/$EXINIT.
> Also, $DISPLAY wasn't set, $TERM was "screen".
> (terminal size: 54 rows x 80 cols).
> 
> 
> To be explicit (:set all):
> 
> noaltwerase     noexrc        matchtime=7  report=5       noterse
> noautoindent    noextended    mesg         noruler        notildeop
> autoprint       filec=""      nomodeline   scroll=26      timeout
> noautowrite     flash         msgcat="./"  nosearchincr   nottywerase
> backup=""       nogtagsmode   noprint=""   nosecure       noverbose
> nobeautify      hardtabs=0    nonumber     shiftwidth=8   warn
> cdpath=":"      noiclower     nooctal      noshowmatch    window=53
> cedit=""        noignorecase  open         noshowmode     nowindowname
> columns=80      keytime=6     optimize     sidescroll=16  wraplen=0
> nocombined      noleftright   path=""      noslowopen     wrapmargin=0
> nocomment       lines=54      print=""     nosourceany    wrapscan
> noedcompatible  nolisp        prompt       tabstop=8      nowriteany
> noerrorbells    nolist        noreadonly   taglength=0
> escapetime=1    lock          noredraw     tags="tags"
> noexpandtab     magic         remap        term="screen"
> directory="/tmp"
> fileencoding="UTF-8"
> inputencoding="UTF-8"
> matchchars="()[]{}<>"
> paragraphs="IPLPPPQPP LIpplpipbp"
> recdir="/var/tmp/vi.recover"
> sections="NHSHH HUnhsh"
> shell="/usr/pkg/bin/tcsh"
> shellmeta="~{[*?$`'"\"
> 
>   Martin Neitzel



Re: vi(1) one line crasher

2023-06-22 Thread tlaronde
PR submitted.

Thanks!

Le Thu, Jun 22, 2023 at 08:54:30PM +0200, Martin Husemann a écrit :
> On Thu, Jun 22, 2023 at 08:52:12PM +0200, Martin Husemann wrote:
> > env EXINIT='set nu showmatch ts=8 wl=72' vi /tmp/vi_crasher.txt
> > 
> > does crash for me when typing $
> 
> #2  0x009a54da in vs_paint (sp=sp@entry=0x6fb6f40de000, 
> flags=flags@entry=3) at 
> /work/src/external/bsd/nvi/dist/vi/vs_refresh.c:726
> 726 abort(); /* XXX infinite recursion */
> (gdb) list
> 721 abort();
> 722 }
> 723 #else
> 724 if (vip->sc_smap == NULL) {
> 725 if (F_ISSET(sp, SC_SCR_REFORMAT))
> 726 abort(); /* XXX infinite recursion */
> 727 F_SET(sp, SC_SCR_REFORMAT);
> 728 return (vs_paint(sp, flags));
> 729 }
> 730 #endif
> #3  0x009983de in vs_paint (sp=sp@entry=0x6fb6f40de000, flags=3)
> at /work/src/external/bsd/nvi/dist/vi/vs_refresh.c:728
> #4  0x009990d5 in vs_refresh (sp=sp@entry=0x6fb6f40de000, 
> forcepaint=forcepaint@entry=0)
> at /work/src/external/bsd/nvi/dist/vi/vs_refresh.c:99
> #5  0x009941e6 in vi (spp=spp@entry=0x7f7fffe31740)
> at /work/src/external/bsd/nvi/dist/vi/vi.c:115
> #6  0x0097bdf8 in editor (wp=wp@entry=0x6fb6f40f1000, 
> argc=, argc@entry=2, argv=, 
> argv@entry=0x7f7fffe319c8)
> at /work/src/external/bsd/nvi/dist/common/main.c:436
> #7  0x009a5864 in main (argc=2, argv=0x7f7fffe319c8)
> at /work/src/external/bsd/nvi/dist/cl/cl_main.c:134
> 
> 
> Can you file a PR please?
> 
> Thanks!
> 
> Martin



vi(1) one line crasher

2023-06-22 Thread tlaronde
And to be more exhaustive (perhaps this is the cause), here is my EXINIT:

EXINIT='set nu showmatch ts=8 wl=72'

Could the problem be the combination of wl=72, a window width of 80 and
tabs expanded to 8 columns?

The "end" of the displayed line is, for me, the 72nd char, the 'l'.

So wl=72 and ts=8 may reproduce it for you?


Re: vi(1) one line crasher

2023-06-22 Thread tlaronde
Le Thu, Jun 22, 2023 at 08:00:22PM +0200, Martin Neitzel a écrit :
> Hi Thierry,
> 
> > If one opens this one line script in vi, in an xterm with default size,
> > using the dollar '$' to go to the end of the line crashes vi(1).
> 
> I cannot reproduce that with:
> 
> % env TERM=xterm LC_ALL=fr_FR.ISO8859-15 vi vi_crasher.txt 
> 
> with local (linux) xterm -> ssh netbsd-host -> vi
> 
> NetBSD 8.2_STABLE (GENERIC), amd64,  #3: Sat May  2 15:05:24 CEST 2020
> 
> /usr/bin/vi,
> :ve yields:  Version (1.81.6-2013-11-20nb4)
> and ts=8

Others seem unable to reproduce it either, but for me it is 100%
reproducible.

So here is all the information I have:

$ xwininfo


xwininfo: Please select the window about which you
  would like information by clicking the
  mouse in that window.

xwininfo: Window id: 0x6d "xterm"

  Absolute upper-left X:  3
  Absolute upper-left Y:  24
  Relative upper-left X:  0
  Relative upper-left Y:  21
  Width: 884
  Height: 556
  Depth: 16
  Visual: 0x21
  Visual Class: TrueColor
  Border width: 0
  Class: InputOutput
  Colormap: 0x20 (installed)
  Bit Gravity State: NorthWestGravity
  Window Gravity State: NorthWestGravity
  Backing Store State: NotUseful
  Save Under State: no
  Map State: IsViewable
  Override Redirect State: no
  Corners:  +3+24  -713+24  -713-320  +3-320
  -geometry 80x24+1+1

(Note: the position is irrelevant here; the geometry matters.)

vi: set ts=8

A vis(1) rendering of the one line:

$ vis -w
\011\011v_stack[nval].ival\040=\040v_stack[nval-1].ival\040+\040v_stack[nval-1].len\011\012

Suspicious thing: when it crashes with ts=8, the last char displayed,
right against the right border of the window (the rest of the line
is not wrapped; it is simply not displayed), is the 'l' of the third
"nval". After it there are exactly 8 chars before the newline, the
last one being a tab (a tab and 8; and the width is 80)...
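
The column arithmetic behind this observation can be checked with a small
sketch. It is not nvi code: display_width and crasher are names made up
here, characters are assumed to be single-byte and single-width, and the
line is the one given by vis -w above.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the number of screen columns occupied by the first n bytes of
 * the line s once tabs are expanded to multiples of tabstop.
 * Counting stops at the end of the line.
 */
size_t
display_width(const char *s, size_t n, size_t tabstop)
{
	size_t col = 0;

	assert(tabstop > 0);
	for (size_t i = 0; i < n && s[i] != '\0' && s[i] != '\n'; i++) {
		if (s[i] == '\t')
			col += tabstop - (col % tabstop); /* next tab stop */
		else
			col++;
	}
	return col;
}

/* The reported line: two tabs, the statement, one trailing tab, newline. */
const char crasher[] =
    "\t\tv_stack[nval].ival = v_stack[nval-1].ival + v_stack[nval-1].len\t\n";
```

With ts=8, the 'l' of the third "nval" (byte index 57) ends at text column
72, the statement ends at column 79 and the trailing tab brings it to
exactly 80. If "set nu" reserves 8 columns for the line number (an
assumption about nvi, not something stated in this thread), 72 + 8 is also
exactly the 80-column window width. With ts=4 the same 'l' ends at column 64.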

How to reproduce: in vi(1), use '$' to go to the end of the line.

Another way to reproduce, for me: set ts=4, move to the end of the line,
and then set ts=8.

The locales:
$ locale
LANG="POSIX"
LC_CTYPE="fr_FR.ISO8859-15"
LC_COLLATE="POSIX"
LC_TIME="POSIX"
LC_NUMERIC="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="fr_FR.ISO8859-15"
LC_ALL=""

and finally the window manager is twm (all are from NetBSD X11; not
pkgsrc).


Re: vi(1) one line crasher

2023-06-22 Thread tlaronde
Le Thu, Jun 22, 2023 at 06:40:08PM +0200, Martin Husemann a écrit :
> On Thu, Jun 22, 2023 at 06:22:42PM +0200, tlaro...@polynum.com wrote:
> > If one opens this one line script in vi, in an xterm with default size,
> > using the dollar '$' to go to the end of the line crashes vi(1).
> 
> I can't reproduce this. Which version of NetBSD? The in-tree vi?
> What is your $TERM? Which xterm?

$ uname -a

NetBSD cauchy.polynum.local 10.0_BETA NetBSD 10.0_BETA (cauchy) #0: Mon Feb 27 
11:28:34 CET 2023  
tlaronde@cauchy.polynum.local:/usr/obj/polynum.NODECONF-cauchy.polynum.local_netbsd-9.3-amd64_netbsd-amd64/netbsd/obj/sys/arch/amd64/compile/cauchy
 amd64

Stock userland; native X11; native xterm.

The xterm is launched through twm with this:

"Xterm" f.exec "LC_CTYPE=fr_FR.ISO8859-15; exec xterm -bg black -fg 
white -geometry +100+1 -ls&"

!!! IMPORTANT !!!

Forgot one important thing: set ts to 8 in vi. With ts == 4 it doesn't
crash:

vi -> set ts=8
   __

and the backtrace:

(gdb) bt
#0  0x78860599c47a in _lwp_kill () from /usr/lib/libc.so.12
#1  0x78860599c97a in abort () from /usr/lib/libc.so.12
#2  0x00012f65637a in vs_paint.cold ()
#3  0x00012f64927e in vs_paint ()
#4  0x00012f649f75 in vs_refresh ()
#5  0x00012f645086 in vi ()
#6  0x00012f62cc98 in editor ()
#7  0x00012f656704 in main ()
(gdb) 


vi(1) one line crasher

2023-06-22 Thread tlaronde
If one opens this one line script in vi, in an xterm with default size,
using the dollar '$' to go to the end of the line crashes vi(1).

I have frequently had problems with the last character in the vi window,
on the edge of the xterm window, not displaying correctly (the previous char
is repeated and what is displayed is not what is in the file).

But this is the first time it crashes---with 100% "reliability", if I
can say so...
-- 
Thierry Laronde 
 http://www.kergis.com/
http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C
v_stack[nval].ival = v_stack[nval-1].ival + v_stack[nval-1].len 


Re: [CODE] inetd

2023-06-20 Thread tlaronde
Le Mon, Jun 19, 2023 at 07:34:43PM +0200, tlaro...@polynum.com a écrit :
> The new version of inetd is here:
> 
> http://downloads.kergis.com/misc/inetd.tar.gz
> 
> I have rewritten the majority of the parsing code, and put everything
> in parse.c, with a lot of comments---there is no more parse_v2.c since
> with two different files, for the same thing, things were not parsed
> exactly the same way.
> 
>[...] 
> There is one thing I have dropped and I need feedback about it:
> separating statements with semicolon in new syntax.
> 
> This allows "to put several service definitions on the same line" and
> even to put a legacy positional service definition after v2 ones.
> 
> It seems to me to be absolutely useless. Continuation lines allow to
> clarify things by shortening the lines. The semicolon allows to lengthen
> them, with absolutely no reason (if root wants to "obfuscate" its
> config file he can change the reading permissions bits...).
> 
> So is it used? Have I to support this---or can I simply support it
> (silently) at the end of a v2 syntax statement (for backward
> compatibility), and error if something is put after?

Well, at least it is used in src/tests/usr.sbin/inetd/

So I will support the trailing ';' for service definitions---even if this
creates corner cases when another entry is put on the same line: is
';[[:blank:]]*\.' valid? I will go for yes. That is, after a ';' the
following initial blanks are ignored and it is as if the statement
started at the very first char of a line. The problem is with
continuation lines, which are defined by a leading blank; since escape
sequences are supported, a backslash before eol would have been
a simpler way to define continuation lines for directives or service
definitions...

Another devil in the details: in the present src/ code, escaping is done
only in quoted sections. All in all, quoting should have meant "as is"
(there is no difference between single and double quotes), except for an
escaped quote inside quotes, and escape sequences should be allowed in
value definitions (not keywords) outside quotes. This is what I have
implemented now; so it is compatible with the previous behavior, but an
extension---and it is valid for the legacy syntax as well...


[CODE] inetd

2023-06-19 Thread tlaronde
The new version of inetd is here:

http://downloads.kergis.com/misc/inetd.tar.gz

I have rewritten the majority of the parsing code, and put everything
in parse.c, with a lot of comments---there is no more parse_v2.c since
with two different files, for the same thing, things were not parsed
exactly the same way.

I have started to rewrite the manual page since I had to create new
directives to be able to mimic (if wanted) the current behavior---the
.include directive was not behaving like a dot'ed file in sh(1), and
the default host and the default policy were reset after inclusion.

I have reworked the syntax in order to clarify a lot of corner cases---but
there is still work to do on the manual page.

But the essentials are already here.

There is one thing I have dropped and I need feedback about it:
separating statements with semicolon in new syntax.

This allows "to put several service definitions on the same line" and
even to put a legacy positional service definition after v2 ones.

It seems to me to be absolutely useless. Continuation lines allow to
clarify things by shortening the lines. The semicolon allows to lengthen
them, with absolutely no reason (if root wants to "obfuscate" its
config file he can change the reading permissions bits...).

So is it used? Have I to support this---or can I simply support it
(silently) at the end of a v2 syntax statement (for backward
compatibility), and error if something is put after?

I will now focus on debugging, trimming and adjusting the syntax
information displayed (the machinery allows pinpointing what goes
wrong).


Re: inetd(8): cmdif as builtin

2023-06-09 Thread tlaronde
Le Fri, Jun 09, 2023 at 03:48:38PM -0400, Mouse a écrit :
> > Le Fri, Jun 09, 2023 at 08:47:10AM -0400, Mouse a écrit :
> 
> I find it amusing that it's "Fri, Jun 09, 2023 at 08:47:10AM -0400"
> rather than something like "ven, 09 jui 2023, a 08:47:10 -0400", when
> the surrounding text _is_ en français.

It's because my system is configured mostly to run with the native
language: 'C'---which is not English... ;-)

> 
> >>> BTW; just an idea: in the case of inetd(8), wouldn't it be more
> >>> simple and logical, in this very case, to add a "cmdif" (cmd
> >>> interface) builtin?
> >> Simpler and more logical than what?
> > Emphasis: in the inetd(8) context.  [...]
> 
> Sure.  But what is it you're comparing a cmdif builtin to, what is it
> that it's simpler and more logical than?  Signals? My (hypothetical as
> far as inetd is concerned) pidconn?  Something else?
> 

Let's say it seems logical to me. I like the (almost defunct, it seems) idea
of a bootstrapping compiler: a rudimentary version easy to compile, but
that can then compile itself with all the bells and whistles.

Here, inetd would in fact use what it already does to provide its own
interface.

> 
> >> The biggest difference I see between this and using signals to
> >> provoke these actions is the target namespace: filesystem names for
> >> AF_LOCAL or process IDs for signals.
> > More than this: you can pass parameters with signals.
> 
> I assume there's a negation missing there.

Yes: signals have no parameters. And few of them are free, so they do the
job they were designed for, but they cannot be extended.

>  Yes, this is part of the
> reason I built pidconn: signals use PIDs for their destination
> namespace, but are an extremely restricted communication channel,
> suitable for little more than a very few seldom-used commands.  I've
> got at least two programs that already use both SIGUSR1 and SIGUSR2
> (designed before I had pidconn).
> 
> > I'm still thinking (background process) about the subject you have
> > started in another thread (about a way to pass commands to a process
> > in a more broad way than what is allowed by signals).
> 
> I'd be interested in hearing any thoughts you may come up with.  I'm
> hardly wedded to pidconn; it's just the best alternative I came up with
> that looked sanely implementable.
> 
> In the particular case of inetd, I agree: its raison d'être is, as you
> say, to listen for network connections and do things in response to
> them, so, if a network connection is suitable for the purpose, an
> internal service is a good model.

For the moment, I always fall back to the basics: stdin, stdout and
stderr are linked, by default, to the controlling terminal and have
conventional numbers. Why not three other conventional numbers that a
program could listen on or write to, and that could be connected via a
user level program, given only the pid of the process to contact?

The advantage of the Plan9 approach is that "anything is a file". So
such an interface appears as a file in the name space (say like
/proc/pid/ctl). That's simple.

> 
> > All in all, why daemon(3) or a variation of daemon(3) would not
> > change stdin and stdout to not be linked to the controlling terminal
> > but precisely to another interface that allows sending commands and
> > receiving results, and deferring error or logs via stderr?
> 
> That would certainly be possible.  But:
> 
> (1) I would not want this to be restricted to daemonized processes.
> 

No, not restricted to daemons (but I thought of daemons because it could
seem logical not to close the connections, but to redirect them; that
stdin, stdout and stderr be "plumbed" with something else than the
controlling terminal: the "controlling interface").

> I have at least two programs which export pidconn interfaces which have
> proven useful even when the processes are not daemonized (and neither
> of them is anything for which daemon() would even be appropriate,
> though one of has an option to auto-background itself by forking and
> the parent exiting).
> 
> In the particular case of inetd, I would want the management interface
> to be available even when running non-daemonized, such as for
> debugging.

Agreed.

> 
> (2) You'd still have to design the rendezvous mechanism, the thing you
> describe as just "another interface".

Either a well defined name appearing in the name space (like
/proc/pid/ctl) or a user level program "knowing" how to connect to
these channels based on the target pid?


Re: inetd(8): cmdif as builtin

2023-06-09 Thread tlaronde
Le Fri, Jun 09, 2023 at 08:47:10AM -0400, Mouse a écrit :
> > BTW; just an idea: in the case of inetd(8), wouldn't it be more
> > simple and logical, in this very case, to add a "cmdif" (cmd
> > interface) builtin?
> 
> Simpler and more logical than what?

Emphasis: in the inetd(8) context. inetd is designed to wait, listening
for sparsely used services (which is the case for a command line
interface); it can limit the number of connections; it can connect to
various types of interfaces. In this context, it does not seem logical to
add code that would duplicate part of its own "raison d'être".

> 
> In any case, the major issue I would have with it is the lack of
> authentication.  But that's so obvious that I assume you would be doing
> something like requiring a password - or doing it only for AF_LOCAL
> sockets and using LOCAL_PEEREID.  (This is pretty close to what most of
> my pidconn servers do - they use the pidconn analog of LOCAL_PEEREID to
> verify that the client is either root or the same UID the server is
> running as.)

Yes, of course. The scheme (in principle) is simple: inetd waits for a
connection on the given interface (as defined by a service entry in its
configuration file). When a connection occurs, it defers the
authentication to some service that, if authentication is
successful, "plumbs" the external user to the inetd command interpreter
(once the authentication is done, it is just a pass-through).
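
The check Mouse describes (the client is either root or the same UID as
the server) is small enough to sketch. This is only an illustration:
cmdif_peer_allowed is a hypothetical name, and obtaining the peer UID (for
instance with LOCAL_PEEREID on an AF_LOCAL socket) is left out.

```c
#include <stdbool.h>
#include <sys/types.h>

/*
 * Decide whether a peer may use the command interface: only root, or
 * the user the server itself runs as, is accepted.
 */
bool
cmdif_peer_allowed(uid_t peer_uid, uid_t server_uid)
{
	return peer_uid == 0 || peer_uid == server_uid;
}
```

A server would call this right after accepting the connection, with its
own geteuid() as server_uid, and close the connection on a false result.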

> 
> The biggest difference I see between this and using signals to provoke
> these actions is the target namespace: filesystem names for AF_LOCAL or
> process IDs for signals.

More than this: you can pass parameters with signals. So for a command
interface, signals are convenient if you need only a few totally defined
actions to trigger, but are not convenient if you want parameters or a
richer set of actions.

I'm still thinking (background process) about the subject you have started
in another thread (about a way to pass commands to a process in a more
broad way than what is allowed by signals).

I think it deserves more attention and deserves a solution in the system.

All in all, why daemon(3) or a variation of daemon(3) would not change stdin
and stdout to not be linked to the controlling terminal but precisely to
another interface that allows sending commands and receiving results, and
deferring error or logs via stderr? This is just "plumbing"---and Plan9 has
simply pushed the concept further; but it's already there.


inetd(8): cmdif as builtin

2023-06-09 Thread tlaronde
BTW; just an idea: in the case of inetd(8), wouldn't it be more simple
and logical, in this very case, to add a "cmdif" (cmd interface)
builtin, one that can be activated via the config with a normal service
entry, and that then listens for commands for managing inetd(8)
itself?

Note: it is for future reference, and will not be done now. I'm first
finishing what I have committed to do.


[CODE] inetd

2023-06-08 Thread tlaronde
For the ones who want to look at it, the current state of my work is
here:

http://downloads.kergis.com/misc/inetd.tar.gz

You will at least see that I have rewritten a lot of parse.c and
parse_v2.c (and will in fact rewrite almost everything about the
parsing), inetd.c being the legacy and "easy" part (there are very
probably blunders added by me since, although the code compiles, I haven't
run it yet: I always write everything and start debugging only when I'm
at least theoretically satisfied with what I have done).

(I have put assertions so no need to tell me: "it doesn't work", it is
not meant to at the present stage.)

And yes, there is still scaffolding, ladders, etc.: WIP.

So the code is here and it compiles but it is not functional yet.

My next milestones are 2023-06-19 for a functional version and 2023-06-26
for a fully functional one with "enough" testing done.

For the record, I have finally managed to understand what the parsing
code was doing and realized that what I initially thought were bugs
were not: it is only that, the way the code is written, it is almost
impossible for someone who has not studied it carefully to grasp at once
what is going on.

So here is how the old (present) code works, each point followed by a
sketch of what the code I'm currently writing does:

Old code:

The manual page states that the config has to be given as an absolute path
in normal mode, but that it can be a relative path in debug mode.

=> This is so only by side effect. In debugging mode, the process
does not daemonize. In normal mode, the call daemon(0,0) changes CWD to
root. This is why, in normal mode, even if one has not specified a
rooted path, it is always rooted: it is always taken
relative to CWD, which happens to be '/' when not in debugging mode.

New: the path is verified in every case to be rooted (leading '/')
so that the result of checking or running does not depend on CWD.
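
A minimal sketch of such a check, assuming it only tests for a leading
'/' as described; config_path_is_rooted is a hypothetical name, not the
one used in the tarball.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if path is non-empty and starts at the root directory. */
bool
config_path_is_rooted(const char *path)
{
	return path != NULL && path[0] == '/';
}
```

Doing the test before any daemonization means that "inetd.conf" is
rejected in both modes, instead of silently meaning "/inetd.conf" in one
mode and "./inetd.conf" in the other.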

The man page states that a lock is written.

=> This is not accurate: a lock is attempted but the return value is
not used, and hence several processes can be running in parallel
until a fatal error stops one of them. The problem is that they can
pollute the logs.

New code: if there is a lock, it is in /var/run, so we are running
against root: the process exits. If there is no lock but we can't
write to /var/run, syslog is not used and messages go to stderr
(this is allowed not for the checker, which runs before daemonization
and before the lock, but for a testing mode using builtins and
non-privileged ports).
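
What "using the return value" amounts to can be sketched as follows. This
is a sketch under assumptions, not the code from the tarball: it assumes
the lock is taken with flock(2) on a lock file, and the function name is
hypothetical.

```c
#include <sys/file.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Try to take an exclusive, non-blocking lock on lockpath.
 * Returns the descriptor holding the lock, or -1 if the file cannot be
 * opened or if the lock is already held through another descriptor.
 */
int
acquire_single_instance_lock(const char *lockpath)
{
	int fd = open(lockpath, O_RDWR | O_CREAT, 0644);

	if (fd == -1)
		return -1;
	if (flock(fd, LOCK_EX | LOCK_NB) == -1) { /* result is checked */
		close(fd);
		return -1;
	}
	return fd;	/* keep it open for the lifetime of the process */
}
```

A caller that gets -1 can then tell "lock held by another instance" from
"cannot write the lock file" by looking at errno, and exit or fall back to
stderr accordingly.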

When parsing, the code constructs a linked list of servtab structures.
The implementation of the ".include" directive has not been made so that
it behaves as a C programmer or a sh(1) user expects: the effect is
not equivalent (minus continuation lines---escaped lines) to having
directly written the chunk in the file.

The implementation in fact is using config() with recursive calls. This
is what I absolutely missed when I glanced through the code. Hence such
"horrors" as:

defhost = newstr(defhost);

(making a copy of the string and "losing" the initial pointer) were
increasing my blood pressure.  But in fact, in another routine, far from
this one, the current value was saved before config() was called again,
after a convoluted succession of routines, with a new config
(the included file). The copied string was just there so that on "popping"
it will be freed and the previous value restored. But it took me quite
some time to understand because neither the names nor the comments were
of any help, and there was no brief explanation of the process.

=> But the result is that if someone "includes" a file with
IPSEC policies (".include ipsec.conf"), assuming that they will apply,
they are totally lost: when the included file pops off, the preceding
policy is restored. The inheritance is always from parent to child, and
the "included" file is not considered a chunk of the config, but a
child.

New: in my new code, there is no recursion. The routine filling the
line buffer simply switches to the new file to read the next line
from the included file. The ".include" directive is handled like
a sourced file in shell (or an #include in C). But the structure
put in place could allow another directive to implement the previous
behavior (but not with the "include" name).
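
The non-recursive scheme can be sketched as a stack of open files
consulted by the line reader. This is an illustration under assumptions,
not the code from the tarball: the names line_source, source_push and
source_next_line and the MAX_DEPTH limit are invented here, and glob
patterns, realpath(3) and loop detection are left out.

```c
#include <stdio.h>
#include <string.h>

#define MAX_DEPTH 8	/* hypothetical limit on nested includes */

struct line_source {
	FILE *stack[MAX_DEPTH];	/* open files; the top one is active */
	int depth;
};

/* Start reading from path. Returns 0 on success, -1 on failure. */
int
source_push(struct line_source *ls, const char *path)
{
	FILE *fp;

	if (ls->depth >= MAX_DEPTH || (fp = fopen(path, "r")) == NULL)
		return -1;
	ls->stack[ls->depth++] = fp;
	return 0;
}

/*
 * Store the next configuration line in buf, switching files on
 * ".include" and resuming the including file at end of file.
 * Returns buf, or NULL when the input is exhausted or when an
 * included file cannot be opened.
 */
char *
source_next_line(struct line_source *ls, char *buf, size_t size)
{
	static const char kw[] = ".include ";

	while (ls->depth > 0) {
		if (fgets(buf, (int)size, ls->stack[ls->depth - 1]) == NULL) {
			fclose(ls->stack[--ls->depth]);	/* EOF: pop */
			continue;
		}
		buf[strcspn(buf, "\n")] = '\0';
		if (strncmp(buf, kw, sizeof(kw) - 1) != 0)
			return buf;	/* ordinary line */
		if (source_push(ls, buf + sizeof(kw) - 1) == -1)
			return NULL;	/* unreadable or nested too deep */
	}
	return NULL;
}
```

Because the lines of the included file come out of the same stream as the
lines around the directive, any default (host, IPSEC policy) set by an
included line stays in effect afterwards, as with a file sourced in sh(1).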

In the old code, the loops were indeed prevented, contrary to what I
stated when I had not yet understood the recursion. And my initial bug
report:

save_defhost = defhost;
new_file.abs = realpath(CONFIG, NULL); <--- here
new_file.next = file_list_head;
#ifdef IPSEC
save_policy = policy;
#endif
/* Put new_file at the top of the config stack */
file_list_head = &new_file;
read_glob_configs(pattern);
free(new_file.abs); <--- here
 

inetd(8): code tomorrow or, at worst, on Thursday

2023-06-05 Thread tlaronde
I'm rewriting far more than I expected (but it goes well), so I'm
shifting the schedule: I will show the code not today, but tomorrow, or
if I need more time for debugging, on Thursday (2023-06-08).


Re: style, sysexits(3), and man RETURN VALUES for sys programs

2023-06-03 Thread tlaronde
Le Sat, Jun 03, 2023 at 11:28:31PM +0700, Robert Elz a écrit :
> Date:Sat, 3 Jun 2023 13:45:44 +0200
> From:tlaro...@polynum.com
> Message-ID:  
> 
>   | Rhialto pointed me to sysexits(3) that was exactly what I was looking
>   | for (for inetd(8) revision). So kudos to him!
> 
> I deliberately didn't mention sysexits.h (or sysexits(3)) as I don't
> think it is really appropriate here.
> 
> sysexits works when the calling program (one which execs & then waits for
> the one which is to use those exit values) understands the convention, and
> can take action based upon the different exit codes.
> 

But there is such a calling framework: it is called rc(8).

It is rc(8) that "ensures" (it can't if it is not called) that
there is only one inetd(8) server; the program by itself has strictly no
code to ensure that another server is not running...

The man page states: "should be run at boot time by /etc/rc"

but it is not "should": it is "shall", because nothing else ensures
uniqueness.

Since rc(8) is an automated framework, it has to understand exit values
and certainly not parse variable strings (why not confront rc(8) with
i18n or l10n then?).

Furthermore, you seem all to be OK with the fact that if a user asks:
"Do integers wear white socks?", the program shall answer: "NO". I'm
sorry, but the correct answer is: "NONSENSE" unless you want the user to
ask: "Their socks are black then?"

For rc(8), if every program handled by rc(8) exited with EX_USAGE on a
usage error, it would be a piece of cake to verify before release that
rc(8) is at least up to date with the calling convention of the programs
it handles.

So, I use sysexits(3) in inetd(8): if 0 for OK and whatever for
anything else will do, then sysexits(3) is a choice that is no less
legitimate than any other.

And I do claim that sysexits was and still is a good idea ;-)


Re: style, sysexits(3), and man RETURN VALUES for sys programs

2023-06-03 Thread tlaronde
Le Sat, Jun 03, 2023 at 12:25:01PM +, Taylor R Campbell a écrit :
> > Date: Sat, 3 Jun 2023 14:12:21 +0200
> > From: tlaro...@polynum.com
> > 
> > Le Sat, Jun 03, 2023 at 12:02:20PM +, Taylor R Campbell a écrit :
> > > > Date: Sat, 3 Jun 2023 13:45:44 +0200
> > > > From: tlaro...@polynum.com
> > > > 
> > > > So I suggest to add a mention of sysexits(7) to style.
> > > 
> > > I don't think sysexits(7) is consistently used enough, or really
> > > useful enough, to warrant being a part of the style guide.  Very few
> > > programs, even those in src, use it, and I don't think anything
> > > _relies_ on it for semantics in calling programs.
> > 
> > But I think it is a loss of information to put everything in
> > EXIT_FAILURE. All in all, the majority of scripts will simply test
> > against 0, so being more fine grained (there are only 15 exit values at
> > the moment) doesn't cause problems and, IMHO, adds some value that can
> > be useful.
> 
> It's not really a loss of information: usually the error message
> printed to stderr is much more informative.
> 
> The question is whether it's useful for composition, so that calling
> programs can make meaningful decisions to take useful action on the
> basis of the called program's exit code -- like the convention of zero
> for success, nonzero for failure, which is absolutely useful for
> composition.
> 
> Unless you're devising a scheme to do that with sysexits(3), and
> implementing it systematically so that other programs derive some
> benefit from it, spending time to make inetd(8) scrupulously adhere to
> the sysexits(3) ontology of failure modes is likely a distraction from
> your main goals.

Don't worry: it is already fixed and was only a matter of minutes.

I still plan to release the alpha on Monday the 5th (of June 2023...).


Re: style, sysexits(3), and man RETURN VALUES for sys programs

2023-06-03 Thread tlaronde
Le Sat, Jun 03, 2023 at 12:02:20PM +, Taylor R Campbell a écrit :
> > Date: Sat, 3 Jun 2023 13:45:44 +0200
> > From: tlaro...@polynum.com
> > 
> > So I suggest to add a mention of sysexits(7) to style.
> 
> I don't think sysexits(7) is consistently used enough, or really
> useful enough, to warrant being a part of the style guide.  Very few
> programs, even those in src, use it, and I don't think anything
> _relies_ on it for semantics in calling programs.
> 

But I think it is a loss of information to put everything in
EXIT_FAILURE. All in all, the majority of scripts will simply test
against 0, so being more fine grained (there are only 15 exit values at
the moment) doesn't cause problems and, IMHO, adds some value that can
be useful.

> > But I'd like also to request some additions to sysexits(3):
> > [...]
> 
> Sounds like overthinking this, unless you see specific semantic value
> for composing programs that goes beyond the standard convention of 0
> for success and nonzero for failure.
> 
> There are extremely rare cases of making finer distinctions than that.
> For example, cmp(1) returns 0 for identical, 1 for difference, >1 for
> error.
> 

But in more complex cases, I think this can add value (I'm not
requesting to make it mandatory; but if sysexits(3) is not widely used,
it's perhaps simply because the majority---I was part of it---don't know
it exists)...

> > Furthermore, I'm adding a RETURN VALUES section to inetd.8 and I think
> > it should be standard practice for sys programs.
> 
> Normally this would go under EXIT STATUS, not RETURN VALUES.

OK.

> 
> > BTW, and still concerning style, is there a defined way of generating a
> > MAN page needing to edit some part of the manual (ex.: usage) depending
> > on some macros defined or not (in the case of inetd.8---even if this is
> > not an actual problem because LIBWRAP is always defined---the [-l] flag
> > depends on the macro; but it is always present in the usage).
> 
> I'd just document it unconditionally, and if it's really important,
> mention in the text that it depends on a compile-time option.

OK (and, for inetd.8, LIBWRAP is on since 1996, so it was a hypothetical
question).


Re: style, sysexits(3), and man RETURN VALUES for sys programs

2023-06-03 Thread tlaronde
Le Sat, Jun 03, 2023 at 02:01:50PM +0200, Martin Husemann a écrit :
> On Sat, Jun 03, 2023 at 01:45:44PM +0200, tlaro...@polynum.com wrote:
> > Furthermore, I'm adding a RETURN VALUES section to inetd.8 and I think
> > it should be standard practice for sys programs.
> 
> That is for functions returning a value, the proper .Sh here would
> be EXIT STATUS.
> 

OK.


style, sysexits(3), and man RETURN VALUES for sys programs

2023-06-03 Thread tlaronde
Rhialto pointed me to sysexits(3) that was exactly what I was looking
for (for inetd(8) revision). So kudos to him!

I'm converting inetd(8) to exit with the conventional values defined
in sysexits(3) since:
- it auto-documents the code;
- it allows scripting against the return status of a sys program
in a consistent manner;
- it is a help when tracking a bug since it can point the
developer at the culprit or at least narrow down what to look for.
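
As an illustration of the first two points, a checker could map its
outcomes to <sysexits.h> values as sketched below. The check_result
enumeration and exit_status_for are hypothetical, and the choice of
EX_DATAERR for a syntax error and EX_OSERR for resource exhaustion is an
assumption, not what inetd(8) actually does.

```c
#include <sysexits.h>

/* Hypothetical outcomes of checking or starting with a config file. */
enum check_result {
	CHECK_OK,
	CHECK_BAD_USAGE,	/* wrong flags or arguments */
	CHECK_CANNOT_OPEN,	/* config file missing or unreadable */
	CHECK_BAD_SYNTAX,	/* config file has invalid contents */
	CHECK_NO_RESOURCES	/* e.g. memory allocation failed */
};

/* Translate an outcome into a conventional exit status. */
int
exit_status_for(enum check_result r)
{
	switch (r) {
	case CHECK_OK:
		return EX_OK;
	case CHECK_BAD_USAGE:
		return EX_USAGE;
	case CHECK_CANNOT_OPEN:
		return EX_NOINPUT;
	case CHECK_BAD_SYNTAX:
		return EX_DATAERR;
	case CHECK_NO_RESOURCES:
		return EX_OSERR;
	}
	return EX_SOFTWARE;	/* unknown outcome: internal error */
}
```

A calling script that only tests against 0 keeps working, while one that
wants to distinguish a usage error from a bad config file can compare the
status with the documented values.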

So I suggest to add a mention of sysexits(7) to style.

Furthermore, I'm adding a RETURN VALUES section to inetd.8 and I think
it should be standard practice for sys programs.

But I'd like also to request some additions to sysexits(3):

- an EX_RESOURCES, for an error not in the flow of the program but
due to a contextual exhaustion of resources (for example allocations
on the heap---that's the error detected even if there is probably
another problem elsewhere in this case, except in very small memory
environments);

- more fine grained error values for EX_OSERR; especially, errors
for major interfaces---I'd like an EX_KEVENT for example.

BTW, and still concerning style, is there a defined way of generating a
man page when some part of the manual (ex.: usage) has to be edited
depending on whether some macros are defined or not (in the case of
inetd.8---even if this is not an actual problem because LIBWRAP is always
defined---the [-l] flag depends on the macro; but it is always present in
the usage)?


Re: [RFC] inetd(8) changes proposal

2023-06-02 Thread tlaronde
Le Fri, Jun 02, 2023 at 10:59:04PM +0200, Rhialto a écrit :
> On Wed 31 May 2023 at 00:18:26 +0700, Robert Elz wrote:
> > Date:Tue, 30 May 2023 16:11:55 +0200
> > From:tlaro...@polynum.com
> > Message-ID:  
> > 
> > 
> >   | -c  check a config file (and does not execute). Returns 0 on success and
> >   | ENOENT or EINVAL on error.
> > 
> > If you mean what that seems to say, then no.  The check only part is fine,
> > but functions can return ENOENT or EINVAL (or can return -1, or NULL, or
> > something with one of those in errno) - programs do not exit with those
> > values as the status (as their values aren't specified, it is possible
> > that ENOENT%256 (or ENOENT&0xFF if you prefer) == 0.
> > 
> > You can either just exit with status 1, or exit 1 for file open failed
> > (which covers a whole range of errno values, not just ENOENT), and 2
> > for invalid contents if you prefer.   But never use errno values as an
> > exit parameter.
> 
> There is  which defines "Exit status codes for system
> programs." EX_DATAERR could be appropriate for an invalid config file.
> 

Thanks for the info! I didn't know about this...
-- 
Thierry Laronde 
 http://www.kergis.com/
http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C


Re: [RFC] inetd(8) changes proposal

2023-06-02 Thread tlaronde
Le Fri, Jun 02, 2023 at 07:35:54AM +0930, Brett Lymn a écrit :
> On Thu, Jun 01, 2023 at 09:08:52AM +0200, tlaro...@polynum.com wrote:
> > 
> > But for now, it will be far simpler to only modify the NetBSD source
> > without trying to merge something external.
> > 
> 
> It isn't external - the mods were made to the NetBSD source code.

But since I have to disentangle the code, it will simply add burden now,
because it is external to the src code in the NetBSD official sources
and to my proposal.

So it is noted for a next step. But it will not be integrated now.
-- 
Thierry Laronde 
 http://www.kergis.com/
http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C


Re: inetd(8): continue or exit on error?

2023-06-02 Thread tlaronde
Le Fri, Jun 02, 2023 at 07:01:40AM +, David Holland a écrit :
> On Mon, May 29, 2023 at 10:11:09AM +0200, tlaro...@polynum.com wrote:
>  > There are infelicities in /usr/src/usr.sbin/inetd/parse.c and I will
>  > send a PR with patches attached.
>  > 
>  > The question is what to do in case of a config file not found (this is
>  > the initial problem: the realpath() return status is not tested and a
>  > structure is unconditionally added to a linked list with an unreachable
>  > config file).
>  > 
>  > It seems to me, since these are services, that the failure to load a
>  > config is critical enough (since the server may be then servicing what
>  > was not intended to be serviced; the reverse is less problematic)
>  > to exit at least on this error.
>  > 
>  > What do others think?
> 
> I have not read most of the traffic yet, but I feel, fairly strongly,
> that inetd should _not_ exit, except (maybe) if the config is broken
> during its initial startup. It's a critical service.

So I will put together the result of the exchanges in the thread and of
my reading of the source:

Not stopping on an error was logical with the old syntax since _all the
directives were independent_: failing to read a line for a service
shouldn't have the side effect of failing to serve the other services.

But this assumption does not hold anymore with the new syntax: the feature of
the implicit address---one does not need to specify an address: it will
then be whatever the default is at that moment---makes the config a whole,
and failing on a line may define the default address as something
completely different from what was intended: what will be the result of
running login on the external interface instead of an internal one?

=> a failure in the config now must discard the whole config.

Hence to address this and the need for essential services the following
will be done (in the process in fact):

Two modes:

Server mode: inetd [-rl] [-f [-d]] [rooted_config]

Checker mode: inetd -c [-d] [rooted_config]

inetd always checks the config it is asked to serve first and does not
serve anything before validating the config. In default mode, a failure
to validate the config causes the daemon to exit.

Disentangling the parsing from the executing/serving, there is a check
mode that just parses the config and returns EXIT_SUCCESS if the config
is OK, EXIT_FAILURE if not (unfortunately, there is no EXIT_RTFM to
indicate that the call was nonsense...).

Checker mode:

So inetd just exits with the status in check mode. In check mode, if -d
is given, debug messages about the parsing (and the errors) are sent to
stderr and, if successful, the parsed config in new syntax is sent to
stdout. Nothing is printed if [-d] is not given.

Server mode:

Inetd always checks the config and if the config is invalid it exits
with error (default mode).

If the [-r] mode is asked for (r for "resilient"), in case of failure to
validate the config, inetd falls back to "/etc/inetd.fallback.conf", which
is also parsed and, if the check is successful, served. If even this
fallback fails, inetd(8) does not exit but serves the "no-op" config: it
does nothing, simply waiting for the instruction to reload the config
("the config" is always the one passed, explicitly or implicitly, on
the call).

Still in resilient mode, if the server is instructed to reload the
config, it does not stop serving the old but first checks the new. If
the new is valid, it switches to these new instructions. If the new is
not valid, it continues to serve the current served one.
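
The decision logic described above can be sketched as follows. This is
only an illustration under assumptions: the enum, the function names and
the idea of passing the validation results as booleans are mine, not
what the inetd sources do.

```c
#include <stdbool.h>

/* What ends up being served; all names here are hypothetical. */
enum served {
	SERVE_EXIT,	/* nothing: the daemon exits with an error */
	SERVE_MAIN,	/* "the config" given on the call */
	SERVE_FALLBACK,	/* /etc/inetd.fallback.conf */
	SERVE_NOOP	/* keep running, serve nothing */
};

/* Choice made at startup, once both configs have been checked. */
enum served
choose_at_start(bool conf_ok, bool fallback_ok, bool resilient)
{
	if (conf_ok)
		return SERVE_MAIN;
	if (!resilient)
		return SERVE_EXIT;	/* default mode: invalid config is fatal */
	return fallback_ok ? SERVE_FALLBACK : SERVE_NOOP;
}

/* Choice made on reload in resilient mode: switch only to a valid config. */
enum served
choose_on_reload(bool conf_ok, enum served current)
{
	return conf_ok ? SERVE_MAIN : current;
}
```

Note that choose_on_reload() never looks at the fallback: a failed
reload simply keeps whatever is currently served, including the no-op.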

In daemon mode and not in foreground, messages are logged via syslog.

When in foreground, messages go to stderr.

A new control is added:

Sending the USR1 signal instructs inetd to switch to the fallback config (it
is not "sticky": when instructed to reload via HUP, it will reload "the
config" not the fallback one).

Compatibility:

I was unhappy about:

-d run in foreground and add debug messages
-f run in foreground

which seems to me not very Unix-like.

Hence the [-f [-d]], but for compatibility it is a fallthrough in the
switch: case d will fall through to f, so the behavior will be the same.

-c also takes -d; no problem since -cfd will invoke the checker, not the
daemon: "inetd is human and always grasps at the first opportunity to do
less work".
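
The fallthrough can be sketched like this. It is a hedged illustration
only: the struct, the function names and the reduced option string
"cdfr" are assumptions (the real inetd has more flags, such as -l).

```c
#include <stdbool.h>
#include <unistd.h>

/* Subset of the flags discussed here; the struct is hypothetical. */
struct inetd_opts {
	bool check;		/* -c: checker mode */
	bool debug;		/* -d: debug messages */
	bool foreground;	/* -f: stay in the foreground */
	bool resilient;		/* -r: resilient mode */
};

/* Record one flag. Returns 0 on success, -1 for an unknown flag. */
int
apply_flag(struct inetd_opts *o, int ch)
{
	switch (ch) {
	case 'c':
		o->check = true;
		break;
	case 'd':
		o->debug = true;
		/* -d given alone keeps its old meaning: it implies -f. */
		/* FALLTHROUGH */
	case 'f':
		o->foreground = true;
		break;
	case 'r':
		o->resilient = true;
		break;
	default:
		return -1;
	}
	return 0;
}

/* Scan the command line with getopt(3). Returns 0 or -1. */
int
parse_flags(int argc, char *const argv[], struct inetd_opts *o)
{
	int ch;

	while ((ch = getopt(argc, argv, "cdfr")) != -1)
		if (apply_flag(o, ch) == -1)
			return -1;
	return 0;
}
```

After parsing, testing o->check first is enough to make -cfd run the
checker and never the daemon.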
-- 
Thierry Laronde 
 http://www.kergis.com/
http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C


Re: [RFC] inetd(8) changes proposal

2023-06-01 Thread tlaronde
Le Thu, Jun 01, 2023 at 12:50:30PM +0930, Brett Lymn a écrit :
> On Wed, May 31, 2023 at 12:43:40PM +0200, tlaro...@polynum.com wrote:
> > 
> > And I think you're right: the info will go in a 0400 file in /tmp, and
> > will be a way to obtain various running infos---but for now, just the
> > running config (it could perhaps be extended, but not now, to add
> > stats, what is masked by a secmodel etc.)
> >
> 
> There was a GSoC project last year that was looking to extend inetd.
> Unfortunately, it is incomplete but one of the things that was done was
> to write a inetdctl that could, among other things, dump the running
> configuration.  You may be able to use some of that code.

Thank you for the info.

But for now, it will be far simpler to only modify the NetBSD source
without trying to merge something external.

In fact, what I'm going to do is not very difficult.

And, from an engineering point of view, the problems now present are the
consequence of introducing a new syntax without realizing that it totally
changes a fundamental implicit assumption: every directive in the
conf file (legacy version) was totally distinct from the others, so
failing to parse a directive only impacted _this_ service and could
not impact others.

So it was safe and indeed logical to continue, precisely so as not to
impact other services because of one blunder for another service.

So, since I now have a quite clear vision of what I have to do and want
to obtain, it would be unwise to try to master or to incorporate code
written with another vision. It would simply make my task harder and risk
letting in through the window the bugs I want to put out through the door.

Let's put the code right again. A side effect will be that it will be
easier to extend.
-- 
Thierry Laronde 
 http://www.kergis.com/
http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C


Re: [RFC] inetd(8) changes proposal

2023-05-31 Thread tlaronde
Le Wed, May 31, 2023 at 09:57:00PM +1000, Luke Mewburn a écrit :
> This isn't a NetBSD convention per se, although I could write up in
> more detail the conventions we do use and pass them around separately
> for consideration/refinement for NetBSD services/daemons.

Yes, please. It would be worth having guidelines and set rules, if only
to ease automatic monitoring by making syslog logs easier to script.
-- 
Thierry Laronde 
 http://www.kergis.com/
http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C


Re: [RFC] inetd(8) changes proposal

2023-05-31 Thread tlaronde
Le Wed, May 31, 2023 at 08:16:52AM -0400, Mouse a écrit :
> > The inclusion directive is a dot i.e. a here script:
> 
> Well, as I think someone else pointed out, I wouldn't call that a here
> script.  sh spells it ".", csh spells it "source", but here script is
> more often used for "<<" style input, input that's inline, which an
> included file by definition is not.
> 
> > This means that all definitions have a global scope and that the
> > feature of the default address, if not specified, is not limited to
> > the file that is dot'ed there.
> 
> Then there is definitely a use for including the same file more than
> once:
> 
> 10.7.44.184:
> . private-services
> 172.18.9.1:
> . private-services
> *:
> . everyone-services

Yes. This is why the prohibition on including the same file several times
will be removed. Only loops will cause failure---in fact, in the
code, forbidding the same file to be loaded several times was,
apparently, the way chosen to prevent loops.
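
One way to allow repeated inclusion while still rejecting loops is to
track only the files currently being read, rather than every file ever
read. The following is a sketch under assumptions: the names, the fixed
depth limit and the use of plain string comparison (on paths assumed to
be already canonical, e.g. via realpath(3)) are mine, not the actual
inetd code.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_DEPTH 16	/* arbitrary nesting limit for this sketch */

/* Files currently being read, outermost first. */
struct inc_stack {
	const char *path[MAX_DEPTH];
	int depth;
};

/*
 * Enter a file. Returns false if the file is already open further up
 * the chain (a loop) or if the nesting is too deep.
 */
bool
inc_push(struct inc_stack *s, const char *path)
{
	for (int i = 0; i < s->depth; i++)
		if (strcmp(s->path[i], path) == 0)
			return false;
	if (s->depth == MAX_DEPTH)
		return false;
	s->path[s->depth++] = path;
	return true;
}

/* Leave the file entered last. */
void
inc_pop(struct inc_stack *s)
{
	if (s->depth > 0)
		s->depth--;
}
```

With this scheme, dotting private-services twice from the same parent
succeeds (it is popped in between), whereas a file that dots itself,
directly or through another file, is refused.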
-- 
Thierry Laronde 
 http://www.kergis.com/
http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C


Re: [RFC] inetd(8) changes proposal

2023-05-31 Thread tlaronde
Le Wed, May 31, 2023 at 07:50:03AM -0400, Mouse a écrit :
> >> So I added a new file descriptor type (well, semi-new; they're
> >> DTYPE_MISC) and a new syscall (pidconn) [...]
> 
> > In this area (as others, in fact), the Plan9 solution is quite
> > elegant (and consistent with the whole design): in the /proc/
> > directory, the process has its directory /proc/$pid/, under which
> > there is "ctl" file to which one can write textual messages to
> > control the process:
> 
> > echo $cmd >/proc/$pid/ctl
> 
> That is unidirectional communication.  pidconn is bidirectional.  Aside
> from that, how does the process (a) notice that this has been done and
> (b) get the string ($cmd in the above) to act on it?

I haven't much time to spend with Plan9---to my regret when it comes to
mastering the concepts---so take the following with a grain of salt:

Controlling the process can be done by writing to the "ctl" file (this
does what some Unix signals do).

Other Unix-like signals can be emulated via the use of notes. There
is also a "note" file to which notes can be written (for delivery
to the corresponding process).

And if bi-directional IPC is wanted, it can be done at user level with
the plumber, which allows one to "receive, examine, rewrite and dispatch
messages between programs" as long as the program allows itself to be
plumbed (i.e. is prepared to read from and write to conventionally named
files in the namespace). (It does multiplexing too; it is not only 1<->1.)

In fact, one can do the same thing probably with Unix. The difference is
that the facilities are offered by the system and a lot is leveraged by
the manipulation of the namespace (everything can write to "/tmp" except
that what corresponds to "/tmp" for a user is different from what is
used by another one).

> 
> The procfs I have has a ctl file, but it is ptrace-style control, not
> communication a la pidconn (which is optional, more like sockets).  It
> also appears to have vanished by 9.1.  Besides those, I'm not sure how
> I feel about depending on procfs.

Well proc/ in Plan9 is not an external addition. All this machinery is
the core of the system. This is why some concepts of Plan9 have been
adapted in Unix, but others could not so easily be adapted because they
depend too much on things done rather differently and in the core.

But the concept of having well-known file names to communicate
with a process is appealing, and not having to set up the communication
channels in every program, but just being prepared to use them if the
program wants to, is a bonus.
-- 
Thierry Laronde 
 http://www.kergis.com/
http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C


Re: [RFC] inetd(8) changes proposal

2023-05-31 Thread tlaronde
Le Wed, May 31, 2023 at 09:57:00PM +1000, Luke Mewburn a écrit :
> On 23-05-31 13:12, tlaro...@polynum.com wrote:
>   | > Also, on 23-05-30 21:03, tlaro...@polynum.com wrote:
>   | >   | (When checking, so even with -C, nothing is written via
>   | >   | syslog since, then, inetd is not a daemon but just an utility---a syntax
>   | >   | checker---printing messages on stdout or stderr and a inetd daemon could be
>   | >   | serving at the very same time: so don't spoil its log.)
>   | > 
>   | > I don't consider inetd in "check" mode using syslog (except when in -f
>   | > foreground of course) as "spoil[ing] the log"; the relevant entries will
>   | > have a different process ID in the log and sysadmins and log processors
>   | > are quite familiar with dealing with parallel streams of logs from the
>   | > same service with different process IDs, 
>   | 
>   | But inetd used in check mode is probably to try to validate a candidate
>   | config in the current process to be written. If it is actually used,
>   | the config parsed will be syslogged when actually loaded.
>   | 
>   | So I think when using inetd in check mode, it is an utility and the
>   | config will go to stdout, and the errors about the parsing to stderr
>   | (and when trying to run, the syslog will have only the information about
>   | failure to parse, or success and config dump, not the details of the
>   | parsing since these failures can be explored at length by using inetd in
>   | check mode).
> 
> IMHO, if check + -f foreground; errors to stdout/stderr,
> and if check + (background default); output to syslog.
> In any case, success is 0 status, failure is non-zero (or signal raise).
> 
> If check mode is used in an rc.d script (for example), dumping a lot of
> errors to stderr can ruin "clean" rc.d output, and if the stderr errors
> occur at startup, you may not have easy access to errors printed to
> stderr on boot, whereas you generally have the syslog.

Since I have replied in chronological order, and it might be difficult
to gather the pieces in the right order with a hierarchy of threads, I
will restate (or state more clearly) what I have in mind in this area:

inetd(8) called with just '-c' will only validate or invalidate a config
without writing anything to stdout, to stderr or via syslog.

With '-C', it will write the "explained" (all comments stripped; all
addresses explicit) config parsed if successful to stdout (and nothing
to stderr).

If, with '-[cC]', the '-d' flag is also given, debugging information
about the parsing will be sent to stderr.

When not checking, the first step of inetd(8) will be to parse the
config before doing anything. If the config is not valid, a message will
be written via syslog stating that the config is rejected and, depending
on default behavior or resilient mode, it will either exit (logging
via syslog that it is exiting on error) or fall back to the alternate
fallback config.

But, even with '-d', the debugging information about the parsing of the
config will not be written via syslog since it would, IMHO, add too much
hay around the needles.

Furthermore, if the parsing succeeds, we don't care about the debugging
information. If the parsing failed, invoking 'inetd -cd config' will
give all the information to the one in charge of fixing it. And if
"config" has vanished in the meantime, the debugging information would
have been useless altogether, since the config will not have been
"serviced".

> 
> Again, we've found both helpful at work. Originally we used to just
> print daemon startup errors to stderr. We eventually changed to
> duplicate to syslog until we know if we're foreground/background mode or
> not. This isn't a NetBSD convention per se, although I could write up in
> more detail the conventions we do use and pass them around separately
> for consideration/refinement for NetBSD services/daemons.
> 

For the daemon (not invoked as a parse checker, even if the daemon
always checks the config before executing the directives), the messages
will go via syslog.

The remaining question is whether, after a successful parsing when
running in daemon mode, the config should always be dumped via syslog (I
finally think so) or whether another signal should be accepted so that
the config is dumped (to syslog, since you think there may be a problem
with dumping to a file in /tmp) only when requested.
-- 
Thierry Laronde 
 http://www.kergis.com/
http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C


Re: [RFC] inetd(8) changes proposal

2023-05-31 Thread tlaronde
Le Wed, May 31, 2023 at 01:12:12PM +0200, tlaro...@polynum.com a écrit :
> 
> But inetd used in check mode is probably to try to validate a candidate
> config in the current process to be written.

My English is definitely not very good: inetd in check mode is
probably used to try out a config that is currently being written (for
future actual use; "in the process" of being written).

I hope this is more understandable...
-- 
Thierry Laronde 
 http://www.kergis.com/
http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C


Re: [RFC] inetd(8) changes proposal

2023-05-31 Thread tlaronde
Le Wed, May 31, 2023 at 08:42:11PM +1000, Luke Mewburn a écrit :
> On 23-05-31 03:54, Robert Elz wrote:
>   | Date:Tue, 30 May 2023 21:03:21 +0200
>   | From:tlaro...@polynum.com
>   | Message-ID:  
>   | 
>   |   | Do you think that SIGINFO is sound as the signal to obtain a config DUMP in
>   |   | the syslog?
>   | 
>   | First, dumping config to syslog seems like an odd thing to do at all, I'd
>   | normally expect it to be dumped in a file instead (something in /tmp
>   | perhaps, or where defined in the config file perhaps),
> 
> Over years of developing and maintaining various long-running
> services/daemons on various work systems that run for long periods of
> time, we've developed a convention to log a bunch of pertinent
> configuration on startup, and when the configuration reloads (if reload
> is supported) - either via SIGHUP or dynamically when configuration
> changes, depending upon the service and its implementation history.
> In some cases we just log what's changed from compiled-in defaults (on
> startup) or changes from previous configuration before the reload.
> 
> Working with our support team and customers we've found that the
> practice of having such configuration in the log with the other messages
> from the service is quite helpful for debugging issues or confirming
> configuration especially if it's been running for a long time and the
> configuration has been changed since startup (without a reload).
> 
> We also have a convention of logging a "starting" event once the service
> has validated its options and configuration and is about to actually
> start work, "stopping" when shutting down (with a summary of the error
> that triggered it, if any - more details of the event is usually found
> in a previous log message), or "terminating by signal %d" if a signal
> (with appropriate set default handler and re-raise of the
> signal, so that ^C in a shell loop actually stops).
> 
> These conventions may be useful here, or not. Just food for thought.
> YMMV :)

I think that this is important because following some convention helps
whoever is reading a log to grasp easily what is going on (or not...) and
helps scripting the log.

So I will go with these words (and case).

> 
> 
>   | But that's independent of the signal used to make it happen, that
>   | could be SIGINFO, which normally makes a process list its current
>   | state (what it is doing) - and for that knowing what config inet is
>   | serving could be considered part - but that is also typically a lot
>   | more than SIGINFO generates (usually just a line).   Personally
>   | I'd probably pick a different signal for that, but I am not sure which.
> 
> Yes, SIGINFO is really intended for foreground apps receiving a signal
> from the tty (e.g., I use ^T in tcsh) to display a short line with
> status, rather than services/daemons logging to syslog.
> 
> E.g., 
>   lukem% ^T
>   [ 6909969.1505419] load: 0.00  cmd: tcsh 2750 [0x4bb99a/4] 0.31u 0.28s 0% 1456k
> 
> SIGINFO support has been added to various other applications over the
> years: dump, gzip, ftp (etc). According to CHANGES.prev I sent a patch
> to implement TIOCSTAT in the tty driver in ~ 1993, but I don't recall
> what inspired that patch; maybe it was just tcsh's optional support for
> SIGINFO / TIOCSTAT. (I certainly didn't invent the idea).
> 
> 
> As to what signal to use. Maybe USR1 ? Or just leverage off the idea
> to log on (re)load the pertinent configuration.
> Maybe add a -D flag to enable debug without foreground
> (-d enables debug and foreground together), and then dump all the
> configuration.

I think that if, in other circumstances, this kind of info has been seen
to be relevant, I will go with dumping the successfully parsed config
that is about to be served to syslog (in every case; since if it is
useful when doing remote support, it can't depend on the flags given
when starting the daemon) and then add only a signal SIGUSR1 for
switching to the fallback config.

> 
> 
> Also, on 23-05-30 21:03, tlaro...@polynum.com wrote:
>   | (When checking, so even with -C, nothing is written via
>   | syslog since, then, inetd is not a daemon but just an utility---a syntax
>   | checker---printing messages on stdout or stderr and a inetd daemon could be
>   | serving at the very same time: so don't spoil its log.)
> 
> I don't consider inetd in "check" mode using syslog (except when in -f
> foreground of course) as "spoil[ing] the log"; the relevant entries will
> have a different process ID in the log and sysadmins and log processors
> are quite familiar with dealing with parallel streams of logs from the
> same service with different process IDs, 

But inetd used in check mode is probably to try to validate a candidate
config in the current process to be written. If it is actually used,
the config parsed will be syslogged when actually loaded.

So I think when using inetd in check mode, it is an utility and the
con

Re: [RFC] inetd(8) changes proposal

2023-05-31 Thread tlaronde
Le Tue, May 30, 2023 at 07:34:55PM -0400, Mouse a écrit :
> > But that's independent of the signal used to make it happen, [...]
> 
> > Personally I'd probably pick a different signal for that, but I am
> > not sure which.
> 
> I have long felt that having only two uncommitted signals (SIGUSR1 and
> SIGUSR2) is way too few.
> 
> I've also long wanted a way to contact processes by PID.  Something
> like sockets where addresses are process IDs.
> 
> A while ago, I finally sat down to do something about that.  I
> eventually came to the conclusion that sockets would not do, not
> because of any conceptual problem but simply because the internal
> infrastructure involved was incompatible with the design goals I had.
> (It's possible the socket API design _is_ incompatible with my design
> goals, but I found the internal issues before I became convinced of
> that.)  So I added a new file descriptor type (well, semi-new; they're
> DTYPE_MISC) and a new syscall (pidconn) to do the analogs of
> socket+listen and socket+connect and a few other relevant things.
> 
> The relevance to this thread is that the way I'd handle this is to give
> inetd a(n optional) pidconn listener via which it could be told to dump
> its config, either to the pidconn connection or to a file whose name is
> specified in the command.
> 
> Assuming NetBSD isn't interested in going that far, perhaps it would be
> reasonable to have a config-file syntax which specifies a listening
> point (AF_LOCAL, AF_INET, AF_INET6, whatever) via which it could be
> similarly commanded?

In this area (as others, in fact), the Plan9 solution is quite elegant
(and consistent with the whole design): in the /proc/ directory, the
process has its directory /proc/$pid/, under which there is "ctl" file
to which one can write textual messages to control the process:

echo $cmd >/proc/$pid/ctl

For inetd(8), I will exhaust the spares (SIGUSR1 and SIGUSR2) and
instead of writing (for the dumping of the current config file) via
syslog, it will write to a file in /tmp (the config chunk being guarded
by comment sentries: "##CONFIG_BEGIN" and "##CONFIG_END") and the scheme
could be extended (not now) to write other infos as well (stats and so
on).
-- 
Thierry Laronde 
 http://www.kergis.com/
http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C


Re: [RFC] inetd(8) changes proposal

2023-05-31 Thread tlaronde
Le Wed, May 31, 2023 at 03:54:12AM +0700, Robert Elz a écrit :
> Date:Tue, 30 May 2023 21:03:21 +0200
> From:tlaro...@polynum.com
> Message-ID:  
> 
>   | Do you think that SIGINFO is sound as the signal to obtain a config DUMP in
>   | the syslog?
> 
> First, dumping config to syslog seems like an odd thing to do at all, I'd
> normally expect it to be dumped in a file instead (something in /tmp
> perhaps, or where defined in the config file perhaps),
> 
> But that's independent of the signal used to make it happen, that
> could be SIGINFO, which normally makes a process list its current
> state (what it is doing) - and for that knowing what config inet is
> serving could be considered part - but that is also typically a lot
> more than SIGINFO generates (usually just a line).   Personally
> I'd probably pick a different signal for that, but I am not sure which.
Well, I will go with using SIGUSR1 and SIGUSR2 hence, as Mouse noted,
exhausting the spares.

And I think you're right: the info will go in a 0400 file in /tmp, and
will be a way to obtain various running infos---but for now, just the
running config (it could perhaps be extended, but not now, to add
stats, what is masked by a secmodel etc.)
-- 
Thierry Laronde 
 http://www.kergis.com/
http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C


Re: [RFC] inetd(8) changes proposal

2023-05-30 Thread tlaronde
Le Wed, May 31, 2023 at 12:18:26AM +0700, Robert Elz a écrit :
> Date:Tue, 30 May 2023 16:11:55 +0200
> From:tlaro...@polynum.com
> Message-ID:  
> 
>   | The inclusion directive is a dot i.e. a here script:
> 
> Definitely should not be called that, a "here xxx" is an xxx that is,
> obviously, here, not elsewhere.   Something in another file is included,
> sourced, "dotted" if you like, but not "here".   Of course this is just
> a minor terminology issue.

Yes, you are right: it is a slip of the keyboard.

Nonetheless this part will not go in the man page. It is simply to
answer a question I asked myself in the thread about the feature
of the "default address". I was proposing to limit the default
address to the scope of the file parsed at the moment. But this
would be inconsistent, since there is, in fact, only one file even
if bits are dotted from other files. So it will stay as it is
(but hence the necessity to validate the whole config since one error
could change the whole meaning of the config).

> 
> 
>   | -c  check a config file (and does not execute). Returns 0 on success and
>   |   ENOENT or EINVAL on error.
> 
> If you mean what that seems to say, then no.  The check only part is fine,
> but functions can return ENOENT or EINVAL (or can return -1, or NULL, or
> something with one of those in errno) - programs do not exit with those
> values as the status (as their values aren't specified, it is possible
> that ENOENT%256 (or ENOENT&0xFF if you prefer) == 0.
> 
> You can either just exit with status 1, or exit 1 for file open failed
> (which covers a whole range of errno values, not just ENOENT), and 2
> for invalid contents if you prefer.   But never use errno values as an
> exit parameter.

OK.
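
The point about the status being reduced modulo 256 can be checked with
a small sketch. The function name observed_exit_status() is made up for
this illustration, and the value 256 used in the note below is not a
real errno value, just a number chosen to show the truncation.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that exits with `code` and return the status the
 * parent actually observes through waitpid(2), or -1 on failure.
 */
int
observed_exit_status(int code)
{
	int status;
	pid_t pid = fork();

	if (pid == -1)
		return -1;
	if (pid == 0)
		_exit(code);	/* child: exit immediately */
	if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);	/* only the low 8 bits survive */
}
```

A child exiting with 256 is seen by its parent as exiting with 0, i.e.
as a success, which is exactly why an unspecified errno value is not a
safe exit status.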

> 
> The general outline looks OK to me, but if this happens, and is
> documented, try to be a little less long winded in the man page...
> 

Don't worry: for the man page, I will limit myself to the addition
of the new options, a brief description of what they do and correcting
the descriptions of features changed (globbing now including in
lexicographical order; including the same file several times
legitimate as long as there is no loop; only regular files as
config chunks).

And a native English speaker will then have to review the result---preferably
reviewing the implementation in parallel to catch discrepancies.

> One possibility for the config files (and SIGUSR1 etc) might be to
> allow a list of config files (with a default list compiled in, possibly with
> one name as a fallback always appended to the list) and have
> SIGUSR1 move to the next on the list (in one direction) and SIGUSR2
> move to the next in the other direction (and SIGHUP just do a reload
> on whatever is being used currently, as it always has).
> 

Yes this could be. But cycling through alternatives will cost more
time than to simply stop and restart with another config, or to
simply use a symbolic link for the config that can be changed (for
what it links to) at will and then asking to reload. I prefer to
keep it simple since the two main functions of this feature for me
are:

1) belt and suspenders---try to be sure to have at least some fundamental
services in case of wrong config;

2) reducing the level of service for some maintenance or due to
excessive load etc.

So I will limit myself to one defined fallback, which should provide
enough safety against a refusal to serve anything because of an incorrect
config (if the administrator decides to use the feature; nothing will be
provided/served by default).

Do you think that SIGINFO is sound as the signal to obtain a config DUMP in
the syslog? (When checking, so even with -C, nothing is written via
syslog since, then, inetd is not a daemon but just an utility---a syntax
checker---printing messages on stdout or stderr and a inetd daemon could be
serving at the very same time: so don't spoil its log.)
-- 
Thierry Laronde 
 http://www.kergis.com/
http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C


[RFC] inetd(8) changes proposal

2023-05-30 Thread tlaronde
After a little reflection, here are the proposed changes to inetd(8):


THE CONFIGURATION

There is only one configuration file given by a path---whether on the
command line or defaulting to "/etc/inetd.conf".

The inclusion directive is a dot i.e. a here script: there are not
different configuration files, but only one that may simply be assembled
by dot'ing others.

This means that all definitions have a global scope and that the
feature of the default address, if not specified, is not limited to the
file that is dot'ed there.

The configuration has hence to be wholly correct since any failure may
totally change the meaning of the directives (by changing the default
address).

=> I will disentangle the execution of the directives from the parsing
of the config (at the moment, services are launched once a line is
parsed, ignoring errors). To begin any action, the configuration will
have to be globally accepted.

As a consequence, it will be possible to check the config without
running anything, and without adding spaghetti code to skip
execution when only checking.
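
The separation described above can be sketched as two phases. This is only
an illustration under assumptions: struct directive, parse_line(),
parse_config() and apply_config() are hypothetical names, and the validity
rule (a non-empty line) stands in for the real inetd.conf grammar.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced directive: a real inetd entry has many more fields. */
struct directive {
    const char *line;
    bool valid;
};

/* Placeholder validity rule for the sketch: a line must be non-empty. */
static bool
parse_line(const char *line, struct directive *d)
{
    d->line = line;
    d->valid = (line != NULL && line[0] != '\0');
    return d->valid;
}

/*
 * Phase 1: parse everything; nothing is started here.
 * Returns true only if every line was accepted.
 */
static bool
parse_config(const char *const *lines, size_t n, struct directive *out)
{
    bool ok = true;

    for (size_t i = 0; i < n; i++)
        if (!parse_line(lines[i], &out[i]))
            ok = false;    /* keep going so that all errors are seen */
    return ok;
}

/*
 * Phase 2: to be called only for a globally accepted config.
 * Returns the number of services that would be started.
 */
static size_t
apply_config(const struct directive *d, size_t n, bool check_only)
{
    (void)d;
    if (check_only)
        return 0;      /* checking mode: nothing is executed */
    return n;          /* placeholder: one service per directive */
}
```

With this shape, a checking mode is simply phase 1 without phase 2, so no
"do not execute" tests need to be scattered through the parser.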

For checking, two options will be added:

-c  check a config file (without executing anything). Returns 0 on
success and ENOENT or EINVAL on error.

-C  as -c, plus dump on stdout an explicit normalized config as it has
been parsed. If the config is invalid, nothing is dumped: no action
is taken if the config is invalid. (There is no security problem if
inetd is run by a non-root user, since the user must have read
access to the whole configuration to obtain any result: nothing is
revealed that he could not have got without the program.)

NORMAL BEHAVIOR

In the normal behavior, an incorrect config is fatal: a message is
logged via syslog, and the process exits with an error (ENOENT or
EINVAL).

RESILIENT BEHAVIOR

Since a config can be rejected and hence nothing serviced, another mode
will be added:

-r  Resilient. At start, if the given config is invalid, try to load
a fallback config file, "/etc/inetd.fallback.conf", instead. If this
config also fails (perhaps because it does not exist), serve nothing but
keep running (the program is then still loaded, meaning resilience also
against an update of the binaries). In this mode, an instruction to
reload will try to reload THE config (the one defined at the
beginning, not the fallback one) and, if this new config fails,
the process will continue to serve the previous config (since
nothing is done before validating the whole configuration, the
previous one, as it is, continues to be serviced; if it was the no-op,
it continues to be the no-op: the fallback config is not reloaded.
But see below.).

Note: resilient does not mean restricted. The fallback conf can be as
convoluted as a "normal" conf file. But one usage (not mandatory) will
probably be a safe minimal file with essential services to ensure in all
cases that the minimum is accessible.

CONTROL

In addition to SIGHUP:

SIGUSR1 will instruct inetd to load the fallback conf. This doesn't change
THE config (a following SIGHUP will try to reload THE config and not
the fallback), but loads, now, in its stead, the fallback one. This
is a way to reduce the set of services offered without stopping and
restarting with another config file.

SIGUSR2 (or SIGINFO?) will instruct to dump the current config via
syslog. (Rationale: whether for debugging purposes, or because the
administrator has tampered with his config and no longer knows
what worked, offer the possibility to tell what is supposed to work
now.)

CONFIG DUMP

A config dump shall be such that loading it as a config leads to
exactly the same behavior as the one obtained with the config file
initially provided to the parser.

MODIFICATION FROM PRESENT PARSING HANDLING

A file can be loaded several times as long as there is no loop (in
fact, at the moment, contrary to what is stated in the man page, loops
are not detected: they are prevented by not allowing the same file to
be loaded twice and, due to bugs, even this is not guaranteed). The loop
detection will not use realpath(3) but dev/inode.
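
As an illustration of loop detection by dev/inode, here is a sketch that
keeps a stack of the files currently being read. struct include_stack,
MAX_DEPTH and the three functions are hypothetical, not taken from the
inetd sources; the struct stat is assumed to come from fstat(2) on the
opened file.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/stat.h>

#define MAX_DEPTH 16    /* hypothetical nesting limit for this sketch */

/* Files currently being read, outermost first. */
struct include_stack {
    struct { dev_t dev; ino_t ino; } id[MAX_DEPTH];
    size_t depth;
};

/* True if the file described by st is already open higher up: a loop. */
static bool
is_loop(const struct include_stack *s, const struct stat *st)
{
    for (size_t i = 0; i < s->depth; i++)
        if (s->id[i].dev == st->st_dev && s->id[i].ino == st->st_ino)
            return true;
    return false;
}

/* Enter a file; false if it would loop or nest too deep. */
static bool
push_file(struct include_stack *s, const struct stat *st)
{
    if (s->depth == MAX_DEPTH || is_loop(s, st))
        return false;
    s->id[s->depth].dev = st->st_dev;
    s->id[s->depth].ino = st->st_ino;
    s->depth++;
    return true;
}

/* Leave the innermost file, so that it may be included again later. */
static void
pop_file(struct include_stack *s)
{
    if (s->depth > 0)
        s->depth--;
}
```

Because a file is popped once fully processed, loading it a second time
from elsewhere is accepted; only a file that includes itself, directly
or indirectly, is refused.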

A config file has to be a regular file (not a pipe). Its size is
retrieved at opening and no more than its initial size is processed. If
the processing succeeds but the size of the file has changed when reaching
the end (of the processing of this file), the file has been tampered
with and the config fails (this does not guard against all
tampering, but at least it ensures that there will be, per file, a
definite amount of data to process).
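
The size rule above could look like this, assuming the file is kept open
while it is processed so that fstat(2) can be used on the descriptor.
record_size() and size_unchanged() are hypothetical helper names.

```c
#include <stdbool.h>
#include <sys/types.h>
#include <sys/stat.h>

/* At opening: accept only a regular file and remember its size. */
static bool
record_size(int fd, off_t *size)
{
    struct stat st;

    if (fstat(fd, &st) == -1 || !S_ISREG(st.st_mode))
        return false;    /* not a regular file (e.g. a pipe): reject */
    *size = st.st_size;
    return true;
}

/* After processing: the config is rejected if the size has changed. */
static bool
size_unchanged(int fd, off_t initial)
{
    struct stat st;

    return fstat(fd, &st) == 0 && st.st_size == initial;
}
```

The reader in between would stop after the recorded number of bytes, so
a file that keeps growing cannot feed the parser indefinitely.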

The pathnames list resulting from globing will 

Re: inetd(8): continue or exit on error?

2023-05-29 Thread tlaronde
On Mon, May 29, 2023 at 07:49:44AM -0400, Mouse wrote:
> >> I'm not sure inetd(8) has any business calling realpath in the first
> >> place.
> 
> I agree.
> 
> > It has to call realpath(3) since in order to not include several
> > times the same file, it makes strings comparaisons about names.
> 
> If I as an admin write a config that tries to include the same file
> twice, whether via the same or different paths, I expect it to include
> the same file twice.  If that leads to errors, that's on me.  (It might
> or might not _always_ lead to an error, depending on how the include
> file syntax is defined to interact with the stateful aspects of the
> config language and what's in the config files.)

I tend to agree that since, with globbing, the order of inclusion of
files is not guaranteed, loading the same file several times, as long
as there is no loop, is no less legitimate (but for
globbing, a defined behavior could be added: sort the list of files
lexicographically, so that carefully chosen names
guarantee a defined order of inclusion; this would be compatible with
the "present" behavior since, no order being guaranteed, any order
imposed now will do).
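
A sketch of the sorting idea, assuming the matched pathnames are
available as an array of strings (as in the gl_pathv member filled by
glob(3)). cmp_path() and sort_paths() are hypothetical names. Note that
POSIX glob(3) already sorts its matches unless GLOB_NOSORT is given;
whether inetd passes that flag has not been checked here.

```c
#include <stdlib.h>
#include <string.h>

/* qsort(3) comparator for an array of char *: plain byte-wise order. */
static int
cmp_path(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Sort n pathnames in place so that the inclusion order is reproducible. */
static void
sort_paths(char **paths, size_t n)
{
    qsort(paths, n, sizeof paths[0], cmp_path);
}
```

Using strcmp(3) rather than strcoll(3) keeps the order independent of
the locale the daemon happens to be started in.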

But, at first, I will add a flag to check the present syntax and keep
the present theoretical behavior (in fact, due to bugs, it doesn't
work), and will correct the bugs and make the program exit on error.

And only in a second step will I perhaps review the way things are
processed (perhaps allowing multiple inclusions of the same file: do what
the admin requested; as you wrote, it is his work, so his responsibility:
as long as the program can syntactically parse what he asked for, deliver).

> If you really want to avoid including the same file twice even if
> that's what the config says to do, I'd say it should do so with
> dev/ino comparisons, not pathname comparisons.

Yes. Calling realpath(3) and then open(2) is definitely
not atomic, so the success of realpath(3) is no guarantee that the
file can be opened; and since realpath(3) has to go into kernel space too,
no processing time is gained. So even if I keep the ban on multiple
inclusions for the first step, I will make the comparisons with lower
level identifiers.
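
One way to read "lower level identifiers": open the file first, then ask
the descriptor what was actually opened. This is a sketch only;
open_identified() and same_file() are hypothetical names.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Open path and report the identity of what was actually opened.
 * Returns the descriptor, or -1 on error.
 */
static int
open_identified(const char *path, dev_t *dev, ino_t *ino)
{
    struct stat st;
    int fd = open(path, O_RDONLY);

    if (fd == -1)
        return -1;
    if (fstat(fd, &st) == -1) {
        close(fd);
        return -1;
    }
    *dev = st.st_dev;
    *ino = st.st_ino;
    return fd;
}

/* Two names designate the same file when both identifiers match. */
static bool
same_file(dev_t d1, ino_t i1, dev_t d2, ino_t i2)
{
    return d1 == d2 && i1 == i2;
}
```

Since the identity is taken from the open descriptor, there is no window
between resolving the name and opening the file.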


Re: inetd(8): continue or exit on error?

2023-05-29 Thread tlaronde
On Mon, May 29, 2023 at 11:20:32AM +, Taylor R Campbell wrote:
> > Date: Mon, 29 May 2023 10:11:09 +0200
> > From: tlaro...@polynum.com
> > 
> > The question is what to do in case of a config file not found (this is
> > the initial problem: the realpath() return status is not tested and a
> > structure is inconditionnally added to a linked list with an unreachable
> > config file).
> 
> I'm not sure inetd(8) has any business calling realpath in the first
> place.  Can we just remove it and let it continue to use paths exactly
> as written in the config file?

It has to call realpath(3) because, in order not to include the same
file several times, it makes string comparisons on names. Hence a file
has to have some canonical form, not masquerading under differing forms,
or inetd will not be able to detect that it is the same file, or to
detect a loop.


Re: inetd(8): continue or exit on error?

2023-05-29 Thread tlaronde
On Mon, May 29, 2023 at 11:19:22AM +, Taylor R Campbell wrote:
> > Date: Mon, 29 May 2023 13:13:33 +0200
> > From: tlaro...@polynum.com
> > 
> > I'm for: exit on any error. If we provide a way to check, that's the
> > responsability of the administrator to check his config before trying to
> > run the thing.
> 
> Yes please.  Should provide an option to check a configuration file
> without starting inetd(8), make `service inetd check' do this, and
> make `service inetd reload' (maybe also `service inetd restart') do
> check first.  This should be standard practice for all daemons.

So: OK. I take this one (inetd) and will do it sometime in the week
(at worst, you should hear from me on Monday the 4th).


Re: inetd(8): continue or exit on error?

2023-05-29 Thread tlaronde
On Mon, May 29, 2023 at 11:43:33AM +0100, David Brownlee wrote:
> On Mon, 29 May 2023 at 11:38, Michael van Elst  wrote:
> >
> > tlaro...@polynum.com writes:
> >
> > >If inetd is not running, if the administrator doesn't look at the logs,
> >
> > That's why people monitor services and logs and use manual or
> > automated procedures to validate and deploy configuration changes.
> 
> I have a slight preference towards 'exit on error', but both options
> have completely valid use cases.
> Could add a command line flag to determine whether to exit on error.
> Is there any prior art in other BSDs/Linux?
> One aspect to bear in mind is that inetd has been around ~forever, and
> conventions have changed over time.
> 
> > >At least, wouldn't it be worth to add a flag simply to parse and
> > >validate the syntax without running the daemon?
> >
> > It's always a good thing to be able to validate a configuration.
> 
> This absolutely sounds like a nice idea - could then be chained in
> rc.d so 'inetd reload' could check the file and abort with an error
> rather than reloading (similar to the recent sshd changes)
> 

So it seems others are OK at least with checking without running (which
seems to me the bare minimum to provide, given the complexity of the
thing and the security implications).

But I will once more plead for exit on error due to this feature (from
the man page):

---8<---
To avoid the need to repeat listen addresses over and over again, listen
addresses are inherited from line to line, and the listen address
can be changed without defining a service by including a line containing
just a listen-addr followed by a colon.
--->8---

If one such line fails, all that will be parsed after (necessarily from
another file) will potentially not listen where it should!

Imagine what it can be if telnet is listening on 22!

(One solution: clear the definition of the default address defhost when
including another file.)

I'm for: exit on any error. If we provide a way to check, it is the
responsibility of the administrator to check his config before trying to
run the thing.

I will also modify the man page, because including the same config
file several times does not have undefined behavior (well, at the moment
it does, but only because of a blunder in the code): the file is only
included once; every other appearance is skipped (it doesn't really work
at the moment due to the bugs).

Something that I will not do now but that could be done: when globbing
is used and this results in several files, since the order of inclusion
is random, verify that the files are orthogonal and exit with an error if
they are not.


Re: inetd(8): continue or exit on error?

2023-05-29 Thread tlaronde
On Mon, May 29, 2023 at 11:47:28AM +0200, tlaro...@polynum.com wrote:
> 
> And some log messages are problematic too:
> 
> DPRINTCONF("Syntax error; Exiting '%s'", CONFIG);
> 
> while it never exits: the function returns an invalid status code, and
> the process goes on...

For this one, I mean that "Quitting '%s'" would describe what it does.
"Exiting" has an implicit meaning when it comes to a program. A program
exits, but you don't "exit" a file.


Re: inetd(8): continue or exit on error?

2023-05-29 Thread tlaronde
On Mon, May 29, 2023 at 09:03:07AM -, Michael van Elst wrote:
> tlaro...@polynum.com writes:
> 
> >It seems to me, since these are services, that the failure to load a
> >config is critical enough (since the server may be then servicing what
> >was not intended to be serviced; the reverse is less problematic)
> >to exit at least on this error.
> 
> inetd will service what is configured. Skipping an unparsable directive
> may have unwanted side effects, but so will a syntactically correct but
> otherwise wrong directive.
> 
> The impact of not providing some services in case of a syntax error
> can easily be as problematic or dangerous as a wrongly configured service
> that the parser is unable to detect.
> 
> If you want to protect against bad configurations, you could separate
> each service, e.g. chose a syntax without side effects or even use
> a config file per service.

We cannot achieve "semantic" correctness, i.e. be able to "understand"
what the user wanted to do. But, at least, if a config file is not
reachable or if a directive is unparsable, there is obviously something
wrong.

If inetd is not running and the administrator doesn't look at the logs,
he will very probably be reachable by phone or by email, and users will
be sure to tell him that something is wrong...

At least, wouldn't it be worth adding a flag simply to parse and
validate the syntax without running the daemon?

And some log messages are problematic too:

DPRINTCONF("Syntax error; Exiting '%s'", CONFIG);

while it never exits: the function returns an invalid status code, and
the process goes on...


inetd(8): continue or exit on error?

2023-05-29 Thread tlaronde
There are infelicities in /usr/src/usr.sbin/inetd/parse.c and I will
send a PR with patches attached.

The question is what to do when a config file is not found (this is
the initial problem: the realpath() return status is not tested and a
structure is unconditionally added to a linked list with an unreachable
config file).

It seems to me, since these are services, that the failure to load a
config is critical enough (since the server may be then servicing what
was not intended to be serviced; the reverse is less problematic)
to exit at least on this error.

What do others think?


Re: MAXPATHLEN vs PATH_MAX

2023-05-28 Thread tlaronde
On Sun, May 28, 2023 at 06:26:44AM +, David Holland wrote:
> 
> Also, I'm not sure everyone agrees with you on that distinction of
> "size" and "length" (even though it makes a certain amount of sense)
> so be careful about drawing too many conclusions.
> 

For this, it's simply, for me, C:

char string[];

the difference between strlen(string) and sizeof string.
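
To make the distinction concrete, here is a small sketch with an array
sized by its initializer (the content "abc" and the two helper function
names are only illustrative):

```c
#include <string.h>

/* An array sized by its initializer: 3 characters plus the terminating NUL. */
static const char string[] = "abc";

/* Length: number of bytes before the NUL. */
static size_t
path_length(void)
{
    return strlen(string);
}

/* Size: number of bytes the array occupies, NUL included. */
static size_t
buffer_size(void)
{
    return sizeof string;
}
```

Here the length is 3 and the size is 4: a buffer of a given size holds a
string whose length is at most that size minus one.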

And for those who might think that I'm beating around the bush,
stumbling now and then on something at random, no: as in Poe's first
Dupin tale, once you have the... path, it's clear why you ended up
there:

I wanted to use the jemalloc debugging features. But they are not enabled
by default in libc, although jemalloc is now the allocation engine. I then
wanted to LD_PRELOAD an ad hoc compiled version: the program crashed in
ld.elf_so.

So I started to look at the code of the rtld and saw that LD_PRELOAD
takes a list of libraries, using strsep(3) to tokenize (on both colon and
space). Well, I use strtok(3) or a hand-made variant, so I looked at what
strsep(3) does differently from strtok(3).

Among its features is the ability to detect "empty fields". Uh? But what
would an empty string give as a library name?
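
The difference can be observed by counting what each function returns
for the same input. This is a sketch: count_strsep() and count_strtok()
are hypothetical helpers, and strsep(3) is a BSD extension (also
provided by glibc), not part of ISO C.

```c
#include <string.h>

/* Count the fields strsep(3) returns, empty ones included. buf is modified. */
static int
count_strsep(char *buf, const char *delim)
{
    int n = 0;
    char *field;

    while ((field = strsep(&buf, delim)) != NULL)
        n++;
    return n;
}

/* Count the tokens strtok(3) returns; it never returns an empty one. buf is modified. */
static int
count_strtok(char *buf, const char *delim)
{
    int n = 0;

    for (char *tok = strtok(buf, delim); tok != NULL; tok = strtok(NULL, delim))
        n++;
    return n;
}
```

For "liba.so::libb.so" with the delimiters ": ", strsep(3) yields three
fields, the middle one being the empty string, while strtok(3) yields
two tokens. A caller of strsep(3) therefore has to decide what an empty
name means.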

My first reaction was: "obviously" an empty string will fail. But I'm
always suspicious when I answer: "obviously". It's generally because I
never thought of any reason.

So I take K&R and look for fopen(3): the empty string is not mentioned.
Uh...

So I take POSIX for open(2): the empty string is not clearly mentioned
either but, distilling the mandatory errors, it seems that an empty
string could be used for a directory (being converted to the CWD) but
could not be used for a filename. But I'm unsure of my interpretation.

So I look in the NetBSD implementation and look at realpath(), then
stumble on the faulty pathadj().

Having found one bug, the normal process is to check whether the same
fault was made elsewhere (when I find a fault in my code, I review all my
code to see whether I have made the same fault, or another one, for the
same thing elsewhere).

Then I find what I think is a nest of bugs in inetd.

Going back to realpath(), I see that it uses MAXPATHLEN while PATH_MAX
is defined by NetBSD and is used in the 2018 version of POSIX. So I look
at the definition of MAXPATHLEN to see if there can be a mismatch...


Re: MAXPATHLEN vs PATH_MAX

2023-05-27 Thread tlaronde
On Sat, May 27, 2023 at 08:50:21PM +, David Holland wrote:
> On Sat, May 27, 2023 at 09:10:34AM +0200, tlaro...@polynum.com wrote:
>  > Shouldn't be MAXPATHLEN be defined as PATH_MAX - 1?
> 
> No. They're the same. The existence of some old code where somebody
> didn't read the definition carefully (or that predates a clear
> definition) doesn't change that. Not sure why you think it should...

The comment in sys/param.h:

/*
 * MAXPATHLEN defines the longest permissible path length after expanding
 * symbolic links. It is used to allocate a temporary buffer from the buffer
 * pool in which to do the name expansion, hence should be a power of two,
 * and must be less than or equal to MAXBSIZE.  MAXSYMLINKS defines the
 * maximum number of symbolic links that may be expanded in a path name.
 * It should be set high enough to allow all legitimate uses, but halt
 * infinite loops reasonably quickly.
 *
 * MAXSYMLINKS should be >= _POSIX_SYMLOOP_MAX (see <limits.h>)
 */

It uses "length" once more so that, reading it, a user (at least me)
doesn't know whether it is a string length or a buffer size, and so
whether the buffer should be (MAXPATHLEN+1) or MAXPATHLEN. One could
argue that if one reads "carefully", since a power of two is wanted, one
cannot add 1 since it would not be a power of two anymore... But this
would be arguing.

Shouldn't MAXPATHLEN be marked as deprecated precisely because of the
unfortunate ambiguity of its name, PATH_MAX being preferred in its stead,
and the comment modified to clearly state it is a size and not a length?

BTW I have realized that modifying the definition to match the "description"
would require reviewing all the code (old and added) to track down
off-by-one problems, so it would open a can of worms.

IMHO, at least the comment should be amended to state things more clearly and to
not use "length".


Re: Sanitizing (canonicalising) the block device name in mount_ffs ??

2023-05-27 Thread tlaronde
On Sat, May 27, 2023 at 11:56:16PM +0700, Robert Elz wrote:
> I'm dual-posting this to tech-kern and tech-userlevel, as while it is
> a userlevel issue, it could have kernel implications.   Please respect
> the Reply-To and send replies only to tech-userlevel
> 
> You may have noticed that a recent change (mine) to the pathadj()
> function (which converts an abritrary path name to its canonical form).
> That function is not permitted to fail, but could.   Now instead of
> failing, and returning (potential) nonsense, it exits if it cannot
> do what it is required to do (usually it can).  In practice this
> affects nothing real.
> 
> However, it affects some uses of rump - which sets up a "block device"
> in a way that its name cannot be canonicalised.   It was relying upon
> the way that pathadj() happens to work (based upon how realpath(3) works)
> to make things function - pathadj() was issuing an error message, which
> some rump using ATF tests were simply ignoring (deliberately).
> 
> Yesterday, I was trying to find a way to make this all work - unsuccessfully.
> 

Since pathadj() was just sugar, calling realpath(3) (without really
testing the return) and emitting some messages, can you, in this special
case, simply "flatten" the thing, i.e. replace the call to pathadj() by a
call to realpath(3)?

And then, there should be code similar to what is done in
src/sbin/mount/mount.c: if canonical_path is NULL, try what the user
passed:

219,224
    /*
     * Create a canonical version of the device or mount path
     * passed to us.  It's ok for this to fail.  It's also ok
     * for the result to be exactly the same as the original.
     */
    canonical_path = realpath(*argv, canonical_path_buf);

227,238
    /*
     * Try looking up the canonical path first,
     * then try exactly what the user entered.
     */
    if ((canonical_path == NULL ||
        (mntbuf = getmntpt(canonical_path)) == NULL) &&
        (mntbuf = getmntpt(*argv)) == NULL) {
out:
        errx(EXIT_FAILURE,
            "Unknown special file or file system `%s'",
            *argv);
    }

From a superficial knowledge, it seems to me that, eventually,
the __mount50() syscall has to be called with a canonical path,
since the syscall does no acrobatics with the path (and shall not be
passed garbage).

FWIW


Re: Sanitizing (canonicalising) the block device name in mount_ffs ??

2023-05-27 Thread tlaronde
On Sat, May 27, 2023 at 11:56:16PM +0700, Robert Elz wrote:
> I'm dual-posting this to tech-kern and tech-userlevel, as while it is
> a userlevel issue, it could have kernel implications.   Please respect
> the Reply-To and send replies only to tech-userlevel
> 
> You may have noticed that a recent change (mine) to the pathadj()
> function (which converts an abritrary path name to its canonical form).
> That function is not permitted to fail, but could.   Now instead of
> failing, and returning (potential) nonsense, it exits if it cannot
> do what it is required to do (usually it can).  In practice this
> affects nothing real.
> 
> However, it affects some uses of rump - which sets up a "block device"
> in a way that its name cannot be canonicalised.   It was relying upon
> the way that pathadj() happens to work (based upon how realpath(3) works)
> to make things function - pathadj() was issuing an error message, which
> some rump using ATF tests were simply ignoring (deliberately).
> 
> Yesterday, I was trying to find a way to make this all work - unsuccessfully.
> 
> Today I am wondering why we need to bother?That is, not why we bother
> with rump, not even why rump has to make its magic etfs work the way it
> does.   But why we need to canonicalise the block device name for mount.
> 

Isn't it because, in order to be able to compare strings, the path has
to have a unique (canonical) form, independent of the way the user
enters it? For example, at the user level, how could mount(8) compare
its single argument to what is in /etc/fstab without first trying
to give that pathname some normal form? (I imagine that
with the NAME=...  syntax in fstab, there are now alternatives to
the canonical form, but in a limited number.)


[BUG] in src/usr.sbin/inetd (?)

2023-05-27 Thread tlaronde
In /usr/src/usr.sbin/inetd/parse.c, a linked list of file_list
structures is included.

When adding a config include file, realpath(3) is called:

parse.c:include_configs()
1193 
new_file.abs = realpath(CONFIG, NULL);

new_file.abs is not tested against NULL, and the structure is
unconditionally added to the list. But before returning:

1201
free(new_file.abs);

the allocated buffer is freed (it may or may not be NULL) but the member
is not reset to NULL.

Then in check_no_reinclude(), the list is walked down and this test
is made:

1314
char *abs_path = realpath(glob_path, NULL);

1324,1325
for (cur = file_list_head; cur != NULL; cur = cur->next) {
if (strcmp(cur->abs, abs_path) == 0) {

So:
- if realpath(3) failed, strcmp(3) is called against NULL;
- if realpath(3) succeeded, the comparison is made with a location
that has been freed!

It may be that the alloc engine doesn't reuse a freed region, so it may
work (if realpath(3) succeeded...), but it seems to me there is
something wrong... Am I reading the source correctly?

And on a minor style note:

In check_no_reinclude(), cur is assigned at declaration, but is in fact
redefined (to the same value) in the for loop:

1310,1325
static bool
check_no_reinclude(const char *glob_path)
{
    struct file_list *cur = file_list_head;
    char *abs_path = realpath(glob_path, NULL);

    if (abs_path == NULL) {
        ERR("Error checking real path for '%s': %s",
            glob_path, strerror(errno));
        return false;
    }

    DPRINTCONF("Absolute path '%s'", abs_path);

    for (cur = file_list_head; cur != NULL; cur = cur->next) {
        if (strcmp(cur->abs, abs_path) == 0) {

(abs_path is also assigned at declaration, but it is not redundant.)


MAXPATHLEN vs PATH_MAX

2023-05-27 Thread tlaronde
In src/sys/sys/param.h, MAXPATHLEN is defined as PATH_MAX.

In POSIX, PATH_MAX is defined like this:

{PATH_MAX}
Maximum number of bytes the implementation will store as a pathname
in a user-supplied buffer of unspecified size, including the terminating
null character. Minimum number the implementation will accept as the
maximum number of bytes in a pathname.

So the maximum length of a path is PATH_MAX - 1.

How is MAXPATHLEN to be interpreted: the maximum length of the string
as the name seems to imply, or the maximum number of bytes, including
terminating nul, occupied by the maximum path?

In the sources, we have the two uses:

in src/bin/pax/ar_subs.c, the buffer allocated for a call to realpath(3)
is MAXPATHLEN (no problem, since realpath will not store more than
PATH_MAX).

In src/crypto/dist/ipsec-tools/src/racoon/privsep.c, we have a buffer of
MAXPATHLEN + 1, taking into account a trailing nul and interpreting
'LEN' as in strlen(3), which seems logical.

Shouldn't MAXPATHLEN be defined as PATH_MAX - 1?


Bugs in sbin/mount/pathadj.c?

2023-05-24 Thread tlaronde
It seems to me that there are two problems:

- one minor : with the warning that has been modified and that is less
pertinent than the "old" one;

- the other one major: the value returned by realpath(3)
is lost for the real caller. And testing the value of
adjusted is wrong since, according to the manpage, "adjusted" can
be set to the portion of the pathname causing the problem.
And I don't see any caller testing errno...

For the minor bug:

--- /data/m/netbsd_9.3/usr/src/sbin/mount/pathadj.c 2011-02-17 17:57:46.0 +0100
+++ pathadj.c   2023-01-21 18:51:35.0 +0100
@@ -1,4 +1,4 @@
-/* $NetBSD: pathadj.c,v 1.2 2011/02/17 16:57:46 pooka Exp $*/
+/* $NetBSD: pathadj.c,v 1.3 2020/07/26 08:20:22 mlelstv Exp $  */
 
 /*
  * Copyright (c) 2008 The NetBSD Foundation.  All Rights Reserved.
@@ -37,10 +37,13 @@
 pathadj(const char *input, char *adjusted)
 {
 
-   if (realpath(input, adjusted) == NULL)
+   if (realpath(input, adjusted) == NULL) {
warn("Warning: realpath %s", input);
-   if (strncmp(input, adjusted, MAXPATHLEN)) {
-   warnx("\"%s\" is a non-resolved or relative path.", input);
+   return;
+   }
+
+   if (input[0] != '/') {
+   warnx("\"%s\" is a relative path.", input);
warnx("using \"%s\" instead.", adjusted);
}
 }


An input path such as "/usr/lib/../libexec" still gets adjusted, so
testing only the first char of "const char *input" against '/' says
nothing.

The old way was "too strong" in the sense that "/usr/lib/" resulted in
an error (the trailing '/' is removed by realpath(3), so input and
adjusted would differ). But this is a minor inconvenience.

The present warning is false. This has no real consequence since it
is not pathadj() that does anything with the result. But nonetheless, the
modification, for this, seems wrong. And the message for realpath()
failure should be adjusted too.

And it is precisely because pathadj() calls realpath(3) but doesn't do
anything when realpath(3) fails that there is the major problem.

pathadj() should return something on error.


Re: NetBSD 10 and NetBSD 11...

2023-05-17 Thread tlaronde
On Wed, May 17, 2023 at 08:50:04PM +0200, Martin Husemann wrote:
> On Wed, May 17, 2023 at 08:47:33PM +0200, tlaro...@polynum.com wrote:
> > But the fact that the advertised list of changes for 10 stops at
> > February 2023 is probably something the webmaster(s) should look
> > at: it's a bit confusing/disturbing...
> 
> Where do you see that?

Here:

https://www.netbsd.org/changes/changes-10.0.html


Re: NetBSD 10 and NetBSD 11...

2023-05-17 Thread tlaronde
On Wed, May 17, 2023 at 08:38:53PM +0200, Martin Husemann wrote:
> On Wed, May 17, 2023 at 11:29:43AM -0700, Jason Thorpe wrote:
> > There have been a steady stream of bug fixes from the trunk being
> > pulled into the netbsd-10 release branch.  I don't have knowledge of
> > releng@'s plans vis a vis a release date.
> 
> You can find details about the 10.0 release state
> at
> 
>   https://wiki.netbsd.org/releng/netbsd-10/
> 
> (but right now that site is not reachable for me)

Ah! So it's not just me having connection problems.

FWIW: the main domain www.netbsd.org works, but every other subdomain
fails.


Re: NetBSD 10 and NetBSD 11...

2023-05-17 Thread tlaronde
On Wed, May 17, 2023 at 11:29:43AM -0700, Jason Thorpe wrote:
> 
> 
> > On May 17, 2023, at 11:21 AM, tlaro...@polynum.com wrote:
> > 
> > I don't know on what mailing list to ask this...
> > 
> > I have seen on the web site that changes to 10 have stopped in february
> > and that changes are now for 11...
> > 
> > Does this mean that 10 will never be released and that the focus is on
> > 11 now?
> 
> There have been a steady stream of bug fixes from the trunk being pulled into 
> the netbsd-10 release branch.  I don't have knowledge of releng@'s plans vis 
> a vis a release date.
> 

Thanks for the answer.

It seems that there are difficulties, at least from my location
(France), with the NetBSD site now (every page redirected is in
error in Firefox while trying to download the very same page with
ftp(1) succeeds...), so I have a very partial view of the
infos online.

But the fact that the advertised list of changes for 10 stops at
February 2023 is probably something the webmaster(s) should look
at: it's a bit confusing/disturbing...


NetBSD 10 and NetBSD 11...

2023-05-17 Thread tlaronde
I don't know on what mailing list to ask this...

I have seen on the web site that changes to 10 have stopped in February
and that changes are now for 11...

Does this mean that 10 will never be released and that the focus is on
11 now?


Re: jemalloc profiling

2023-04-28 Thread tlaronde
Hello,

On Thu, Apr 27, 2023 at 10:54:30PM +, RVP wrote:
> On Thu, 27 Apr 2023, tlaro...@polynum.com wrote:
> 
> > I was trying to use the profiling capabilities of jemalloc and found
> > that there is a /usr/lib/libjemalloc_p.a that seems (from
> > /usr/src/distrib/sets/lists/comp/mi) to be the profiling version.
> > 
> 
> The *_p.a files are used for call-profiling. What you want is jemalloc's
> allocation profiling which is a different beast entirely. For this you'll
> have to re-compile jemalloc with `JEMALLOC_PROF' defined either in the
> Makefile:
> 
> src/external/bsd/jemalloc/lib/Makefile.inc
> 
> or, in the defs file:
> 
> src/external/bsd/jemalloc/include/jemalloc/internal/jemalloc_internal_defs.h
> 

Thanks for the explanations! I will compile an ad hoc version.

Best,


jemalloc profiling

2023-04-27 Thread tlaronde
I was trying to use the profiling capabilities of jemalloc and found
that there is a /usr/lib/libjemalloc_p.a that seems (from
/usr/src/distrib/sets/lists/comp/mi) to be the profiling version.

When setting MALLOC_CONF, the "normally" linked processes complain
with either "Malformed conf string" (when using "opt.prof:true,...") or
"Invalid conf pair: x:y" ("prof:true,...") but I can ignore these and
guess that "prof:true,prof_leak:true,..." is valid for the programs
linked against libjemalloc_p. Note that jemalloc(3) seems to be the
pristine version, and there are discrepancies with the NetBSD use:
for example, there is no jemalloc/jemalloc.h (since it is integrated
as the malloc implementation), and "opt.prof" etc. are mentioned while
it seems that in MALLOC_CONF one needs "prof" etc.

But I don't get any report or dump with my program compiled against
libjemalloc_p.a, so I'm in the dark as to whether this can work or not.

Is there some special incantation that makes it work? Or is it
necessary to install the debug set for it to work? Or is it simply
not possible to mix in the profiling version on NetBSD because the
non-profiling version is integrated into libc?

TIA,


BearSSL: alternative to openSSL?

2023-02-04 Thread tlaronde
Is there some consideration given to BearSSL:

https://bearssl.org/

as a possible (future) alternative to OpenSSL?

