partly revoking NFS export
Is there a way to partly revoke NFS export, temporarily? Say I export /export/foo and /export/bar and I want to fsck /export/foo while continuing to export /export/bar. What would happen if I remove /export/foo from /etc/exports (and HUP mountd) and then unmount it? Would NFS clients get ESTALE for any files they happen to have open? Is there a way to make them just hang on those files until I re-mount and re-export the file system?
Re: sh(1) read: add LINE_MAX safeguard and "-n" option
As I understand it, the performance problem of the read built-in utility originates from its need to read one byte at a time in order not to swallow input it doesn't process. Would it make sense to add an "exclusive" option (call it "-x" for now) to read, where "read -x" essentially means "I promise to do all processing on this open file with read -x, or not to complain if input gets lost"? This may be infeasible due to not being able to guarantee that any "read -x" really uses the built-in version. The point, of course, is to allow the shell to buffer the input.
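A quick illustration of why the byte-at-a-time behaviour matters (input text invented): after the builtin consumes one line, the rest of the input must still be there for the next consumer. With buffered reads (the proposed "read -x"), the remaining lines could already sit in the shell's buffer and be lost to cat.

```shell
printf 'first\nrest1\nrest2\n' | {
    read line    # consumes exactly "first\n", one byte at a time
    cat          # still sees rest1 and rest2
}
```

This prints the two remaining lines; a shell that buffered read's input could not guarantee that.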
Re: (b)make: Outputting a huge variable value
> What happens here is that the child "make -V" reads the complete Makefile
> again, and after that, it prints the variable value.

Oh, then it's probably useless in the pkgsrc context the question came up in.
Re: (b)make: Outputting a huge variable value
> main: .PHONY
> 	make -V X | wc -c

Wow, I didn't know that worked. How on earth /does/ it work? If a command happens to be "make", make just forks without the child process exec()ing anything, so all the make variables etc. are inherited?
Re: (b)make: Outputting a huge variable value
> Does "make -V '$VARIABLE'" (or without the $, depending on exactly what > you want) not work? I must be missing something. The point is to do this in a makefile, not as an isolated invocation of make. So something like ${VAR:wfile} or ${VAR:|echo >file}.
(b)make: Outputting a huge variable value
Following a discussion (make mdi: shell limit exceeded) on tech-pkg@, I keep asking myself whether there's a sensible way to output the contents of a make variable to a file (or pipe), even if the contents exceed ARG_MAX.
Re: strtoi(3) ERANGE vs ENOTSUP
I think it depends on what you consider valid use cases of strtoi().

> This is the case when the number is out of range, and there's trailing
> garbage -- e.g., s="42z", min=3, max=7.

Is it correct to consider the first non-digit character garbage? I.e., would you regard it as an abuse of strtoi() to parse strings like "32k", "16M" or "1.5", "4 2/3"? If the answer is "don't do that" (i.e. a valid use should be entirely parseable), then I'd expect ENOTSUP. If that's valid use, I'd expect ERANGE.
Re: child of a multithreaded process looping in malloc()
Oh well.

> It effectively does -- the calls to _malloc_prefork/postfork(_child)
> are baked into fork in NetBSD's libc:

It does in -current, but the problem shows up in -8. I'll try to back-port the changes. Am I right that I'll need the changes of malloc.c 1.60, jemalloc.c 1.53 (and 1.48), extern.h 1.26 and pthread_atfork.c 1.15? Did I miss something? Thanks for the hint. I grep'd for pthread_atfork in -current's jemalloc.c, but, of course, that didn't reveal anything.
child of a multithreaded process looping in malloc()
I've a long-standing problem of a process eating CPU time without doing anything useful, which, most probably, goes like this: Of a multi-threaded process, one thread is in malloc() while another thread fork()s. (So the child is born with the malloc lock held.) The child process (becoming single-threaded by the fork) calls malloc(). The child loops forever because there's no other thread to release the lock. Strictly speaking, malloc(3) is not declared to be async-signal-safe, so a threaded program shouldn't call malloc() in the child after fork()ing. In my case, the code doing the fork/malloc is the Perl interpreter embedded into collectd, so there's little I can do about it. Couldn't malloc simply install a pthread_atfork() handler that releases the lock in the child?
Re: new certificate stuff
What about certctl.conf in the etc set defaulting to "manual" and sysinst (optionally?) changing it to automatic mode? Of course, then, updating to -10 wouldn't give you automatic mode.
Re: epoll exposure
> It also is a wrong way to build self-configuration; such a test is
> vulnerable to both false positives and false negatives. It should be
> reported upstream as a bug. Much righter is to test whether epoll, if
> present, produces the behaviour the program expects in the uses it
> makes of it.

As Linux introduced epoll (or so I think), isn't it appropriate -- absent a SUS specification -- to assume it works as under Linux? How would you argue if some other OS were to introduce something called kqueue with semantics different from FreeBSD's?
submission port usage (was: /etc/services losses)
I think the de-facto rationale for a larger network goes like this:

-- You don't want to get your IPs blacklisted because infected clients send spam from within your network.
-- Other sites will allow mail submission on their submission port only after authentication (SASL).

So you block outgoing smtp, but allow outgoing submission.
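The policy above might look something like the following npf.conf fragment. This is a hypothetical sketch from memory (interface name and relay address invented), not a tested ruleset:

```
$ext_if = "wm0"             # hypothetical external interface
$relay = { 198.51.100.25 }  # the site's own MTA, the only host allowed to speak smtp

group default {
	# infected clients can't spam directly ...
	block out final on $ext_if proto tcp to any port 25
	# ... but the relay may
	pass stateful out final on $ext_if proto tcp from $relay to any port 25
	# everyone may use authenticated submission elsewhere
	pass stateful out final on $ext_if proto tcp to any port 587
	pass final all          # placeholder: the rest of the ruleset goes here
}
```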
Re: unresolvable R_X86_64_NONE relocation against symbol `__sF'
> Can you post the build log for this and the
>
> /var/work/pkgsrc/pkgtools/pkg_install/work/libfetch/libfetch.a
>
> file itself?

Attached (gzip-ed and .work.log renamed): dot_work.log.gz, libfetch.a.gz
porting nss-pam-ldapd (was: pulling in changes to nsswitch from FreeBSD?)
> For "reasons" I have been looking to build nss-pam-ldapd I've already done that eight years ago: see pkg/49804.
unresolvable R_X86_64_NONE relocation against symbol `__sF'
And another weird problem: Using a lang/gcc8 compiler patched to use gas and gld from devel/binutils 2.26.1 (see other thread), when building pkgtools/pkg_install, I get

/usr/pkg/bin/gld: /var/work/pkgsrc/pkgtools/pkg_install/work/libfetch/libfetch.a(common.o)(.text+0x1e5): unresolvable R_X86_64_NONE relocation against symbol `__sF'
/usr/pkg/bin/gld: final link failed: Nonrepresentable section on output
collect2: error: ld returned 1 exit status
*** [pkg_add] Error code 1

What the hell does that mean? What's really strange is that pkgtools/pkg_install is the only package from my set that fails to build. I can build monstrosities like devel/cmake, devel/boost-libs and lang/gcc8 (using said gcc8), but not pkgtools/pkg_install! Any help?
Re: using gas/gld from devel/binutils
[Adding tech-userlevel for the base system questions]

For involved reasons, I patched lang/gcc8 to use gas/gld from devel/binutils (analogous to what is done to use gas on Solaris). This makes lang/gcc8 fail to build: it segfaults (cc1 nil pointer reference in etc_set_father() calling et_splay(rmost) with rmost being NULL) during stage 2. This is reproducible on different hardware, under different NetBSD versions (-6 and -8), but all amd64. Updating devel/binutils to pkgsrc-current doesn't help. But downgrading devel/binutils to 2018Q1 (binutils 2.26.1) makes it work. The next update in pkgsrc is to 2.34 (in 2020Q1), which fails. It looks strange that GCC is unable to compile itself using GNU as and ld. This is on NetBSD-8 and 2022Q4. Building lang/gcc8 with the base binutils (2.27) works. What version of binutils do -9, -10 and -current use? Are there patches in base that are missing in pkgsrc? Any other hints? I worked around it (with binutils 2.26.1, I can compile a set of packages including net/icinga2, which depends on devel/cmake and devel/boost-libs), but there seems to be a nasty bug lurking somewhere.
Re: debugging/tracing a setuid program
I haven't investigated this further, but it worked to ktrace -p and revealed openat() as the culprit. It's detected by autoconf in the -6 chroot on -8, while -6 itself doesn't implement it.
Re: debugging/tracing a setuid program
> (a) I'd say it shouldn't stop ktracing I suspect it stops as soon as sudo calls setuid.
Re: debugging/tracing a setuid program
> As root, ktrace -i the shell (or other process) it's started by. That gives me a ktrace that stops in the middle of the GIO where sudo is reading the sudoers file.
debugging/tracing a setuid program
I have an interesting problem: How do you debug or ktrace a setuid binary that exhibits the problem only when run as non-root? (Specifically, this is sudo built for NetBSD-6 via kver in a chroot on -8 failing to read the timestamp files on real -6. When called as root, it doesn't use the timestamps.)
Re: split(1): add '-c' to continue creating files
> How about instead adding an option that sets the first name explicitly
> and keeps the "abort on failure" behaviour?

That looks like a much better idea to me.
Re: tcsh as csh default
> About the only argument for retaining csh that makes zero
> sense is to retain it for scripts.

I guess it's a matter of (varying) taste. I used to prefer csh at some time, and that was definitely after V7 appeared. Also, there are csh scripts out there. Has anyone fully converted metamail to sh?
Re: Permissions of the root dot files
> what difference does the user 'w' (or 'r' ... 'x' does matter) permission
> bit really make on a root owned file?

To me, it implies that the file should not be written regardless of the fact that it technically can be.
PATH order (was: sh(1) and ksh(1) default PATH)
If you need the base version of a utility, why not call it by full path? OTOH, if people need a newer version of foo, I install foo from pkgsrc and want that to take precedence over the base version. If I write a wrapper around bar, I put it in /usr/local/bin.
Re: /rescue/tar needing liblzma.so.2
> if /rescue/tar is going to run gzip as a subprocess, it should have
> the full path /rescue/gzip (or gzcat or whatever) baked in so
> its behaviour doesn't depend on PATH.

Yes, that looks like TRT to me.
Re: /rescue/tar needing liblzma.so.2
> Fortunately, I could /rescue/gzcat base.tgz >base.tar and tar x that

I also couldn't /rescue/gunzip base.tgz. Why?
/rescue/tar needing liblzma.so.2
I nearly locked myself out updating a server (from -6 to -8). I booted the new kernel (single user), mounted /usr, /var and /tmp, extracted base.tgz (excluding /rescue, fortunately), but had forgotten to remount / rw. After that (I interrupted tar), tar didn't work any more. Well, I had /rescue/tar. Unfortunately, with z, it said

Shared object "liblzma.so.2" not found

Fortunately, I could /rescue/gzcat base.tgz >base.tar and tar x that. So why does something in /rescue need a shared library?
pthread__initmain() calling err() (was: cmake core dumps in -6 emulation)
> No, this is just too early in the init sequence.
> It shouldn't be using err()...

Then someone(TM) should fix that? Shall I file a PR?
Re: cmake core dumps in -6 emulation
EF> What I don't understand is why it dumps core while reporting an error. KRE> Perhaps NetBSD 6 required an explicit setprogname() which is no longer KRE> required? But that's a matter of the C library, no?
Re: cmake core dumps in -6 emulation
> you could test with paxctl on the cmake file

With paxctl +m /usr/pkg/bin/cmake (inside the chroot, of course), it now core dumps even earlier in the build, with

#0  0x0079d33c in cmsys::SystemTools::FilesDiffer(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) ()
(gdb) bt
#0  0x0079d33c in cmsys::SystemTools::FilesDiffer(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) ()
#1  0x00437072 in cmGeneratedFileStreamBase::Close() ()
#2  0x00437141 in cmGeneratedFileStreamBase::~cmGeneratedFileStreamBase() ()
#3  0x00437263 in cmGeneratedFileStream::~cmGeneratedFileStream() ()
#4  0x006ec4da in cmLocalUnixMakefileGenerator3::ScanDependencies(std::__cxx11::basic_string, std::allocator > const&, std::map, std::allocator >, cmDepends::DependencyVector, std::less, std::allocator > >, std::allocator, std::allocator > const, cmDepends::DependencyVector> > >&) ()
#5  0x006ecbe4 in cmLocalUnixMakefileGenerator3::UpdateDependencies(std::__cxx11::basic_string, std::allocator > const&, bool, bool) ()
#6  0x0041d958 in cmcmd::ExecuteCMakeCommand(std::vector, std::allocator >, std::allocator, std::allocator > > >&) ()
#7  0x0040cb41 in main ()

> Maybe you should set sysctl security.pax.mprotect.global=0 while building
> the old pkgs?

That works. What I don't understand is why it dumps core while reporting an error.
Re: cmake core dumps in -6 emulation
> Ktrace it

As mentioned, that doesn't work (well, it works, which is the problem).

> there are 3 err() calls in pthread__init()

Starting with

#8  0x71b551460ac0 in err () from /usr/lib/libc.so.12
#9  0x71b55240c47b in pthread__init () from /usr/lib/libpthread.so.1

I disassembled a bit before 0x71b55240c47b, the relevant part being

0x71b55240c468 :	lea    0x56a(%rip),%rsi        # 0x71b55240c9d9
0x71b55240c46f :	mov    $0x1,%edi
0x71b55240c474 :	xor    %eax,%eax
0x71b55240c476 :	callq  0x71b552405eb0
0x71b55240c47b :	mov    %rdi,%rax

and x/s 0x71b55240c9d9 says

0x71b55240c9d9: "mprotect stack"

So what's going wrong?
Re: cmake core dumps in -6 emulation
> but it's got to mean _something_ Timing?
Re: cmake core dumps in -6 emulation
> Ktrace it

That way, it proceeds past the error. I can then interrupt it and proceed with a normal make build. That core dumps again (later); ktrace-d, it proceeds past the error again. Interrupt, proceed normally; core dumps. Finally, I get a core dump with the ktrace-d run, but no sensible backtrace. Last entries of the process dumping core:

23687 1 cmake NAMI "/libkver_machine_arch"
23687 1 cmake RET  readlink -1 errno 2 No such file or directory
23687 1 cmake CALL __sysctl(0x7f7fffc03160,2,0x71b5548018c4,0x7f7fffc03178,0,0)
23687 1 cmake RET  __sysctl 0
23687 1 cmake CALL __sysctl(0x7f7fffc031f0,2,0x71b552612b40,0x7f7fffc031e8,0,0)
23687 1 cmake RET  __sysctl 0
23687 1 cmake CALL _lwp_unpark_all(0,0,0)
23687 1 cmake RET  _lwp_unpark_all 1024/0x400
23687 1 cmake CALL __sysctl(0x7f7fffc03110,2,0x71b55173d340,0x7f7fffc03108,0,0)
23687 1 cmake RET  __sysctl 0
23687 1 cmake CALL getrlimit(3,0x7f7fffc031d0)
23687 1 cmake RET  getrlimit 0
23687 1 cmake CALL mprotect(0x7f7fff80,0x40,3)
23687 1 cmake RET  mprotect -1 errno 13 Permission denied
23687 1 cmake PSIG SIGSEGV SIG_DFL: code=SEGV_MAPERR, addr=0x4c, trap=6)
23687 1 cmake NAMI "cmake.core"

So could this be a problem with libkver?
cmake core dumps in -6 emulation
In order to be able to build packages for -6 on a -8 machine, I set up a subdirectory in /var/chroot containing a tar-ed copy of a real -6 machine. I then chroot into there with kver -r 6.1_STABLE /bin/sh. I tried to build icinga2, but cmake core dumps with this backtrace:

#0  0x77a602807644 in write () from /usr/lib/libpthread.so.1
#1  0x77a6018ee079 in __swrite () from /usr/lib/libc.so.12
#2  0x77a6018ed84b in __sflush () from /usr/lib/libc.so.12
#3  0x77a6018ed894 in fflush () from /usr/lib/libc.so.12
#4  0x77a6018ce870 in __vfprintf_unlocked () from /usr/lib/libc.so.12
#5  0x77a6018d1c68 in vfprintf () from /usr/lib/libc.so.12
#6  0x77a6018ccdf4 in fprintf () from /usr/lib/libc.so.12
#7  0x77a60187e660 in verr () from /usr/lib/libc.so.12
#8  0x77a601860ac0 in err () from /usr/lib/libc.so.12
#9  0x77a60280c47b in pthread__init () from /usr/lib/libpthread.so.1
#10 0x77a60184f105 in _libc_init () from /usr/lib/libc.so.12
#11 0x77a6018389e4 in ?? () from /usr/lib/libc.so.12
#12 0x77a604d7a6a0 in ?? ()
#13 0x77a6018344a9 in _init () from /usr/lib/libc.so.12
#14 0x in ?? ()

Of course, it builds on the real -6 machine (being one of the machines I developed the package on). Any ideas?
Re: Checking whether an rc.d script was invoked from rc
> Make the script require CRITLOCALMOUNTED (in -current, or mountcritlocal
> in older versions)?

I don't get that. That wouldn't mount /usr, no? My (second) question was about patching the binary so it would link to libc, libm and libpcap from /lib, not /usr/lib. Or would it do that automatically when /usr/lib is not available?
Checking whether an rc.d script was invoked from rc
Two probably very basic questions: Is there a way to check whether an rc.d script was invoked from rc (as opposed to running it manually, via service(8) or whatever)? During autoboot, rc_fast is set, but if I go single user and back multi user, or boot to single user and go multi user from there, it isn't. However, there is an operation (sending NAs using /usr/pkg/bin/na6) in a script of mine that can safely be skipped when autobooting and most probably when going multi user, but needs /usr mounted (I could add that to critical_filesystems_local, yes). Alternatively, I could copy na6 to /local/bin (or wherever). It only needs libc, libpcap and libm, which are all in /lib, but ldd says the binary references /usr/lib. How do I change that? Thanks.
Re: rc of built-in printf
Oops, it looks like /usr/bin/printf will only exit non-zero because it receives SIGPIPE. If called with SIGPIPE ignored, it will still exit 0.
rc of built-in printf
Is it on purpose that sh's (at least, NetBSD-8's sh's) built-in printf doesn't give a non-zero rc if the underlying write(2) fails (with EPIPE, in my case)? It turns out that

{ sleep 1; printf "Hallo" || echo "ERROR" >&2; } | echo Foo

doesn't print "ERROR" with both sh and bash, while it does with ksh. Replacing printf with echo changes nothing, while using the /usr/bin (or /bin, in case of echo) form does what I would expect. kre?
Re: sh(1) wait builtin command and stopped jobs
> Then add an option to wait [...] to indicate that wait should complete
> if the [...] process enters stopped state

I guess "enters stopped state" includes the case where the process already was in the stopped state when the wait command was issued?

> My inclination is to go that way, rather than having default wait complete
> when a (selected) job stops, with a possible option to avoid that,

I don't have any strong opinion, but also find it slightly more natural that way. Long ago, I used processes stopping themselves as a primitive synchronisation tool (not from a shell script, however). I used an ELC to feed four CD writers, which worked well when the four cdrdao processes were in sync, but miserably failed otherwise. So I added a --stop option to cdrdao which stopped the process as soon as the lengthy initialization was complete and then manually issued a kill -CONT to make them continue.
Re: ZFS - mounting filesystems
> I don't see a real problem with deciding to mount all local filesystems
> (marked auto of course) at mountcritlocal time.

What if /usr is on NFS and /usr/local is local?
Re: installboot: Old BPB too big, use -f (may invalidate filesystem)
> BPB is PC specific and linked to MBR if I'm not mistaken. What seems to
> me problematic is that you mention fdisk (i.e. MBR) first and the
> error message is about GPT...

I'm using MBR, not GPT. But fdisk provokes an error message about GPT.
Re: installboot: Old BPB too big, use -f (may invalidate filesystem)
> If you are trying to setup the machine for NetBSD only use

It's NetBSD only.

> - if you are UEFI booting from a FAT partition

No, it's plain old BIOS boot (the server is from 2005).

> - if you really want ffsv1 boot code, then sd0e better not be a FAT file
> system

sd0e is fstype RAID.
Re: installboot: Old BPB too big, use -f (may invalidate filesystem)
> I see no explicit indication of which port you're doing this on. Ah, sorry, it's amd64.
installboot: Old BPB too big, use -f (may invalidate filesystem)
What does

installboot: Old BPB too big, use -f (may invalidate filesystem)

mean? I have a RAIDframe level 1 RAID consisting of /dev/sd0e and /dev/sd1e. Now, sd0 failed. I replaced the disc, fdisk'ed and disklabel'ed it and performed a raidctl -R, which succeeded. Now, I need to write the boot code, so I can boot from the new disc. But installboot /dev/rsd0e /usr/mdec/bootxx_ffsv1 gives me the above error message. I tried installboot -n /dev/rsd1e /usr/mdec/bootxx_ffsv1, which works. The only other oddity I see with the new disc is that fdisk provokes a SCSI error

sd0d: error reading fsbn 71819495 (sd0 bn 71819495; cn 49191 tn 0 sn 635)

plus

fdisk: Can't read secondary GPT header: Invalid argument

(the disc size is 71819433).
Re: sh: killing a pipe head from the tail
> apart from patching collectd For the record: https://github.com/collectd/collectd/pull/3954
Re: sh: killing a pipe head from the tail
> It is irrelevant to the question you asked. But people often ask "how
> do I do XYZ?" when their problem is actually "how do I do ABC?" and
> they believe XYZ is the only way (or sometimes, the correct way) to do
> ABC. Such people are often wrong; there are often other ways to do ABC.

Ah, OK, that's very true, as I know too well from the receiving end.

> If you want to sidestep such questions, a very brief sketch of
> why you want to do XYZ can help. In this case, saying something like
> "I was looking at a problem which is irrelevant here, because it led me
> to wonder if there's some way for the process on the read end of a pipe
> to kill the process on the write end without depending on the writing
> process getting SIGPIPE" might help make it clear that you're looking
> for XYZ in its own right, independent of the ABC that motivated your
> initial interest in it.

Oh yes, that would have been a much better way to phrase it.

> Personally, I would call it a bug in collectd that it leaves SIGPIPE
> ignored by plugin children it forks; that is not what least surprise
> would lead me to expect. I would say it'd be better for it to catch
> SIGPIPE and have the catcher do nothing. That way, it'd get reset for
> free upon exec in the child.

Yes, I'm going to file a bug with collectd; I'll only need to look up where it may be relying on a child ignoring SIGPIPE.

> It occurs to me that you might be able to do it by having the script
> kill its entire process group. This may or may not do what you want,
> depending on what collectd and the shell in question do with process
> groups. It does seem to me like one of the cleaner possible solutions.

I must admit I don't know enough of the details of process groups, let alone the question of what Poetterix has done to them (while collect_envstat surely only needs to work on NetBSD, derivatives may need to work on Linux).
Re: sh: killing a pipe head from the tail
I'm confused. Is my English so bad that no-one understands what I'm asking, or is my understanding of SIGPIPE wrong?

> Can't you just close() the pipe?

Yes, of course. I wrote:

EF> Of course, when the tail exits, the head will get SIGPIPE as soon as it
EF> tries to output something, but can the tail explicitly kill it earlier?

So yes, closing the tail of the pipe will deliver a SIGPIPE to the head, but only as soon as it writes to the pipe, no? I was asking for a method to kill it even before it next tries to write something to the pipe. I was asked about the context and what I was really trying to do. I didn't give the context because I thought it was irrelevant. What I was really trying to do is partly trying to learn. So the context is a collectd exec plugin translating envstat output to collectd PUTVALs. The structure is about

envstat -i 10 | while read ...
	...
	printf "PUTVAL ..." || exit 1
	...
done

Now the problem is that collectd sets a handler to ignore SIGPIPE, which is inherited by the exec'd plugin (I could change that, after all, it's Open Source), so envstat continues to run. After I realized what the problem was (SIGPIPE ignored), I thought about what to do, and the obvious solution (apart from patching collectd) is to trap - PIPE before calling envstat. But since that would make envstat continue to run until it next tries to output something, I was asking myself whether there was a more elegant solution where I could explicitly kill envstat from within the while loop instead. Of course, the question is not really relevant to envstat -i, but maybe I'll be facing a similar situation in the future where the head must be killed because no two incarnations may run at once, or letting it continue is expensive, or whatnot.
sh: killing a pipe head from the tail
Is there a sane way to kill a pipe's head from within the pipe's tail, at least on tail exit? Of course, when the tail exits, the head will get SIGPIPE as soon as it tries to output something, but can the tail explicitly kill it earlier?
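One way to get an explicit, immediate kill is to start the head yourself so its PID is known, connecting the two through a named pipe instead of a plain |. A sketch (names invented; the endless echo loop stands in for something like envstat -i):

```shell
fifo=/tmp/headkill.$$
mkfifo "$fifo"

# the "head": runs in the background, writing into the fifo
( while :; do echo tick; sleep 1; done ) > "$fifo" &
headpid=$!

# the "tail": reads from the fifo and can kill the head by PID,
# without waiting for the head's next write to fail with SIGPIPE
while read line; do
	# ... process "$line" ...
	kill "$headpid"		# kill the head right away on our way out
	break
done < "$fifo"

wait "$headpid" 2>/dev/null
rm -f "$fifo"
echo done
```

The price is the temporary fifo and the bookkeeping; with a plain pipeline, the tail has no portable way to learn the head's PID.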
fuser(1)
I just stumbled over the fact that NetBSD userland seems to be missing the fuser(1) command mandated by (the XSI extension of) POSIX. Is there any reason (other than "nobody cared") for that?
Re: test a -nt b if b doesn't exist
> /bin/test and the test builtin to /bin/sh are the same source code,

I already supposed that.

> Edgar's question was more on what the definition of -nt should be, when
> the 2nd arg file does not exist.

Right.

> But since no portable script can really use -nt (as it isn't standardised)
> I'm not sure that this is all that important.

It's a pity it's not standardised, but if different shells behave differently, there's little chance it ever will be, no? I think test -nt/-ot is about the only non-SUS thing (apart from some find operand, I forget which one it is) my shell scripts knowingly rely on (I do sometimes rely on bmake and some non-standard M4 extensions). I'm avoiding local, which is a pain to do, but using find -newer to imitate test -nt looks like too great an overkill. Note that stat(1) isn't SUS either, so the only workaround I could think of is parsing two ls -l outputs.
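For illustration, a sketch of that find -newer imitation (the helper name is invented). Note its behaviour when the second file is missing: find fails, so the test is false, which matches the "can't confirm" reading discussed in this thread:

```shell
# newer_than A B: true iff A exists and has a newer mtime than B
newer_than() {
	[ -n "$(find "$1" -prune -newer "$2" 2>/dev/null)" ]
}

a=/tmp/nt_a.$$ b=/tmp/nt_b.$$
touch -t 202001010000 "$b"	# give B an old timestamp
touch "$a"			# A gets the current time
newer_than "$a" "$b" && echo "a is newer than b"
newer_than "$b" "$a" || echo "b is not newer than a"
rm -f "$a" "$b"
```

find and -prune/-newer are POSIX, so this stays within SUS, at the cost of a process per test.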
Re: test a -nt b if b doesn't exist
EF> Is there any cons[e]nsus among shell developers what the prefer[e]d
EF> behaviour is?

I just noticed that you understood "shell developers" as "people writing shell code" whereas I originally meant "people writing shells". But, of course, both are interesting.

> Incidentally, this means that, from this point of view, as you described
> it, bash on NetBSD is buggy.

People could also argue that bash behaving differently on NetBSD than elsewhere is buggy.

> As for which behaviour - whether shell builtin or /bin/test - is
> better? I'm not sure.

Both behave the same way on NetBSD.

> I'd say the test is false regardless of whether the second file exists

Yes, that's documented.

> Personally, I'd tend to treat a nonexistent second file as an
> infinitely old second file: -nt is true and -ot is false.

I tend to concur, but one could also argue that the test should always fail if the second file doesn't exist, along the lines of "I can't confirm X being newer/older than Y because Y doesn't exist".

> Just to complicate things, you/we might arguably want A -nt B to be the
> same as B -ot A, which disagrees with the above two paragraphs.

I wouldn't mind if A -nt B was different from B -ot A, at least not if that's documented.
test a -nt b if b doesn't exist
I've noticed a subtle difference between our sh and bash when applying test -nt to a pair of files where the first one exists, but the second doesn't: In bash, the result is true (0), while with our sh (and /bin/test), it's false (1). Unfortunately, test -nt is not specified by SUS. Is there any consensus among shell developers what the preferred behaviour is? At least, I think sh's behaviour should be documented (if only in the sense that it's undefined).
lout (was: Summary of man-page formatting)
> However, I took a quick look at the source yesterday and it seems the
> code comes as 52 source files numbered 01-52, which is not, shall we
> say, entirely auspicious. :-|

Well, any decent C programmer would have called them z00.c through z51.c.
attaching gdb to stopexit'ed process (was: unconditional core dump on exit?)
EF> Is there a way to make a process dump core on exit no matter what?
EF> I have a daemon dying (or whatever) from time to time with no trace to the
EF> cause and guess a backtrace from a dump would help.

KR> Start under a debugger, break on 'exit'.

I tried to set proc.$pid.stopexit, but when I attach to the stopped process via gdb, it exits. Is that a bug?
Re: Set USB device ownership based on vendorid/productid
> devpubd does that in-tree.

OK. Is there a sane way of obtaining VendorId/ProductId in the devpubd script?
Set USB device ownership based on vendorid/productid
Does NetBSD provide any framework that allows USB device ownership/permissions to be automatically set based on USB VendorId/ProductId? E.g., if a USB device with VendorId/ProductId 06da/0002 appears and becomes ugen0, do

chown nut:nut /dev/ugen0.*; chmod 0660 /dev/ugen0.*
unconditional core dump on exit?
Is there a way to make a process dump core on exit no matter what? I have a daemon dying (or whatever) from time to time with no trace to the cause and guess a backtrace from a dump would help.
fdisk: Can't read secondary GPT header: Invalid argument
I just replaced a failed (SAS) disc in a RAIDframe RAID with an identical one. Everything seems to have worked well, only fdisk utters

fdisk: Can't read secondary GPT header: Invalid argument

combined with a kernel message

sd0(ahd0:0:0:0): Check Condition on CDB: 0x28 00 04 47 e0 e7 00 00 01 00
    SENSE KEY: Illegal Request
    INFO FIELD: 71687372
    ASC/ASCQ: Logical Block Address Out of Range

fdisk says total sectors: 71819496, of which, in disklabel, I only use 7168. I tried to scsictl detach the failed disc, but that gave me something like "device busy", so I just swapped. Why is a GPT fdisk's business? Would there have been a method of cleanly detaching the old and attaching the new disc?
merging /usr/bin etc. to / (was: Solving the syslogd problem)
> This is elegant and I would like to see it. Just remove /usr entirely and
> collapse its contents into / - no /usr/bin, no /usr/lib, etc.

This thread started because syslogd lives in /usr, mounting of /usr depends on NETWORK, and so network daemons are unable to log to syslog (at least in the beginning). I guess mounting /usr depends on NETWORK for a reason, that reason most probably being to make it network-mountable. So if you move /usr (or /usr/bin etc.) to /, you lose the ability to have the contents of /usr/bin etc. reside on the network. You could just as well decide that /usr must be a local filesystem, no?
Re: Fonts for console/fb for various locales: a proposal
> This is the whole purpose of METAFONT. METAFONT is a rasterizer. I'm somewhat sceptical that the results will be usable at the low resolution of a console frame buffer.
bmake: .if !empty(i:M[0-9]*)
I once again ran into an oddity with bmake's .for loops. It looks like .if !empty(i:Mxxx) doesn't work if i is a .for loop variable. Assigning the loop variable to another var makes it work. Oddly enough, .if !empty(:U123:Mxxx) (outside a .for loop) works as expected. [I'm aware that [0-9]* will match "2nd".]

.for i in 123 abc 456
.info ${i} ${i:M[0-9]*}
.if !empty(i:M[0-9]*)
.info ${i}: num
.else
.info ${i}: alpha
.endif
i_:= ${i:M[0-9]*}
.if !empty(i_)
.info ${i}: _num
.else
.info ${i}: _alpha
.endif
.endfor
.if !empty(:U123:M[0-9]*)
.info num
.else
.info alpha
.endif

default:

bmake: "/tmp/x.mk" line 4: 123 123
bmake: "/tmp/x.mk" line 8: 123: alpha
bmake: "/tmp/x.mk" line 12: 123: _num
bmake: "/tmp/x.mk" line 4: abc
bmake: "/tmp/x.mk" line 8: abc: alpha
bmake: "/tmp/x.mk" line 14: abc: _alpha
bmake: "/tmp/x.mk" line 4: 456 456
bmake: "/tmp/x.mk" line 8: 456: alpha
bmake: "/tmp/x.mk" line 12: 456: _num
bmake: "/tmp/x.mk" line 19: num

Do I have some stupid error in that?
Re: NetBSD truss(1), coredumper(1) and performance bottlenecks
> I was asked for truss(1) by Christos back some time ago. So here it is. I'm surely missing something, but what's the advantage of truss over ktruss?
dynamic symbol resolving preference (was: SHA384_Update symbol clash with libgs)
> IMO, they're both at fault.

I'm not at all an expert in the field of dynamic linking, but what strikes me as odd is:

1. The dynamic linker should be able to notice that two libraries are pulled in which export conflicting symbols and warn about it, no? That would have saved me three working days.

2. If a routine inside libssl references a symbol (like SHA384_Update) present in two libraries (libc and libgs) having been pulled into the process, but the file (libssl) containing the reference only depends on one of the two libraries providing the symbol (libc) and not the other (libgs), not even indirectly, it should have a strong preference to resolve the reference to the library being depended on (libc), not some random other one (libgs) having been loaded, no?

But probably my idea of dynamic linking is too simplistic, or there are real use cases for that situation where one wants the opposite behaviour?
SHA384_Update symbol clash with libgs
I'm not sure at which level this needs to be dealt with. libgs, in its infinite wisdom, exports SHA384_Update, which of course clashes with OpenSSL's well-known symbol of the same name. Which means that as soon as you pull in libgs, your TLS may fail in mysterious ways. [In my case, it was php-ldap failing to StartTLS because OpenSSL's tls1_setup_key_block() failed. It took three working days (including two stepping through OpenSSL code) to pinpoint that. libgs was pulled in via php-imagick.] Apart from a GS rendering library exporting a symbol with the name of a well-known crypto function being a strange idea, who is at fault? Why does libssl's reference to SHA384_Update get resolved to libgs's symbol and not libc's? Any workaround to make the code work?
Re: [PATCH] dump -U to specify dumpdates device
> IIRC Irix had this, both for EFS and XFS. No, as far as I remember and see, neither dump nor xfsdump had this. I didn't actually start my O2, though.
Re: [PATCH] dump -U to specify dumpdates device
> Example usages: > dump -a0uf /dump/data.dump -D NAME=data /data/ > dump -a0uf /dump/data-abc.dump -D data-abc /data/a* /data/b* /data/c* I fail to see any -U in there. > Opinions? Any prior art in one of the other BSDs or in systemd?
Re: Re-establishing the "magic" of the special vars in sh(1)
> So, one solution to this might be to add a new builtin command
> something like this:
>
> specialvar variable ...

While I concur that there's a problem to be solved (looks like it may even be a security problem to setenv RANDOM or the like), I guess such a change will only make sense if widely adopted (as in "every sensible shell has it" or "it's POSIX"). Do you happen to know what other shells' authors think about this? Would it be more promising to suggest a new option to the "unset" command? Through NIH syndrome, one may end up with specialvar, makespecial, isspecial, unset -S and unset -s otherwise.
Re: /bin/sh startup file processing
> only current use for the (posix "is undefined") relative path in $ENV is
> if the intent is to run the script relative to whatever directory the
> shell happens to start in. I doubt that is often intended.

I would guess the most probable intent is to run it from $HOME (and wonder why it sometimes doesn't work if you happen to explicitly start a new shell).
Re: Moving telnet/telnetd from base to pkgsrc
> Y'all seem to think it's totally reasonable to telnet in the open internet What's the problem with "telnet www.uni-bonn.de http"?
Re: Moving telnet/telnetd from base to pkgsrc
> send hate mail my way.

I guess you are overlooking my (and probably a lot of other network administrators') primary use case for /usr/bin/telnet: connect to a HTTP/SMTP/IMAP/whatever port and speak the protocol.
Re: /bin/sh startup file processing
> unless /etc/profile changes it, $HOME (for .profile) So, would it make sense to treat relative paths as relative to $HOME, then? That way, you don't break existing setups where that was intended.
Re: /bin/sh startup file processing
> I'm considering, if it seems reasonable to those here, to change sh so it does > not read profile files (any of them) from relative paths (simply ignore any > such attempt). Yes, that sounds reasonable to me. I don't know how many people's profiles it might break, though. In the current version, what's a login-sh's wd at that time?
Re: X=1 :
> X=/ cd $X > > cd's to $HOME, not to / ... > > This really violates the POLA, I'd say... Depends on how much shell programming you do. I used to trip over it often enough. The point is that $X is expanded before X=/ is assigned. You run cd (with no arguments) with X set to /.
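A minimal sketch of that expansion order (the variable names are just for illustration):

```shell
# $X on the command line is expanded by the shell *before* the temporary
# assignment X=/ is placed into the command's environment.
unset X
X=/ sh -c 'echo "child sees: $X"'   # child sees: /   (assignment is exported)
X=/ echo "parent expanded: >$X<"    # parent expanded: ><  (expanded too early)
```

So "X=/ cd $X" runs cd with no argument at all, which is cd to $HOME.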
Re: X=1 :
> There are two weird things with this list:

sh is definitively weird in places.

> : doesn't need to be a special built-in at all. It can be
> implemented perfectly well the same as /usr/bin/true.

But it's defined to be a special built-in. (As an aside, true is a built-in -- see 2.9.1 1.d.)

> cd is missing, yet it can't possibly be implemented externally.

cd is a built-in, but not a special built-in.

> So I suppose I would amend my proposal to do this in the more sensible
> way, and make cd special, and : not special.

This would violate POSIX. There's a strange hierarchy: special built-in utility -- built-in utility -- utility implemented as a built-in. If you implement a utility (say echo) as a built-in, well, it's a utility implemented as a built-in and just behaves as if it weren't. 2.9.1 1.e.i.a even specifies that the built-in is not found when there's no such file in PATH. (Special) built-in utilities must be built in; you're not allowed to implement them externally. What the reason is for some of the built-ins being special and some not, I never understood. KRE, can you shed some light on this?
Re: X=1 :
See SUS, 2.9.1: If the command name is a special built-in utility, variable assignments shall affect the current execution environment. (and colon is a special built-in utility).
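A quick demonstration of that clause, using /bin/true as an arbitrary external utility for contrast (note that bash only follows the special-built-in rule when invoked as sh, i.e. in POSIX mode):

```shell
# ':' is a special built-in, so the preceding assignment persists in the
# current execution environment; before an external utility it only ends
# up in that utility's environment.
sh -c 'X=1 :;         echo "after colon: ${X-unset}"'   # after colon: 1
sh -c 'Y=1 /bin/true; echo "after true: ${Y-unset}"'    # after true: unset
```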
Re: /bin/sh redirect without command failure mode
> All other shells seem to not exit

As even dash seems not to exit: did they change their behaviour on purpose to match ksh/bash? I guess it would be better to line up with ksh/dash unless there's some compelling reason to keep the differing behaviour.
Re: dirname(3) used on its own results?
> But there's nothing in that bit that implies (to me!) > "You can't subsequently call dirname(3) on the results of > a previous call to dirname(3)". You are calling dirname() on an argument that may be destroyed by that very call. You are calling dirname() on a possibly invalid argument.
Re: dirname(3) used on its own results?
> or are we restricted by some standard from having dirname(3) > be able to be called on the results of a previous dirname(3) call Not exactly restricted from, but This is not strictly a bug; it is explicitly allowed by IEEE Std 1003.1-2001 (``POSIX.1''). > in which case we should document this in the manpage? The dirname() function returns a pointer to static storage that may be overwritten by subsequent calls to dirname().
Re: Updating old scripts ... #1 /usr/sbin/ypinit
I don't get that. It looks like you are looking for a script that reads and understands the documentation (which is wrong wrt. the current implementation) and would get confused if you change the documentation to match actual behaviour? Or are you looking for a script that reads and understands the documentation, reads the code, doesn't understand that, using getopt, the documented behaviour is impossible, but, after you convert the script to getopts, would understand that the behaviour doesn't match the documentation and commit suicide/create a black hole/post "leave /bin/sh alone" comments etc.?
Re: shell jobs -p
> You're using an old version, not NetBSD current (or 8) right?

6.1, mostly (for the ash part), yes.

> and now the only way for a script to make a job vanish from the
> jobs table (and so, from being seen in the output of jobs -p) is
> to "wait" for it.

Surely you've digested what SUS says about this. I didn't try too hard, but I didn't grasp it. Could you explain what SUS allows?

> In interactive shells, as running "wait" is not normally something that
> interactive users do, once the user has been told that a job is
> complete (either via async notification, or from the jobs command)
> it is removed, never to be seen again.

OK, I understand.

> The dash behaviour (clearing the jobs table in a subshell) is technically
> correct (but as you say, makes life difficult)

I'm not sure whether we are talking about the same thing. What I mean is that if you have background jobs running, jobs -p will surely list them. However, if you want to use the jobs -p output in a script, the only way is to run it inside $(...). That's a (potential) subshell environment, and in dash, the output of $(jobs -p) is always empty. While I suppose the words of SUS allow that, SUS also gives $(jobs -p) as an example, which, in dash, could better be written "". This cannot be intended.

> the correct way to handle things like this is to not make a sub-shell
> to run "simple" command substitutions which do not affect the current
> shell environment.

I don't get that. Either we're talking past each other or I simply don't understand what you mean. Can you give a way where jobs -p and [insert your code here] echo "$j" give the same output in dash?

> Your parallel.jobs.sh doesn't work, as it never does a "wait".

OK.

> "jobs >/dev/null" might work in bash, but it is not the correct
> method.

Yes, sure.

> but I don't know if this is how it is supposed to be run...
>
> jinx$ ./sh /tmp/parallel.jobs.sh 4
> 1 2 3 4 5 6 7 8 9

Yes, that's the intended way (which I guess is obvious from the source).
Would you say parallel.list.sh is portable?
Re: shell jobs -p
> You mean xargs -p, essentially?

xargs -L 1 -P (capital P) would have done the job if a) I had known about it (thanks for the hint!) and b) it were POSIX.
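For the record, the non-POSIX invocation that would have covered the use case (GNU and the BSDs have it):

```shell
# -L 1: one input line per invocation; -P 4: at most 4 invocations in
# parallel.  Neither -P nor this combination is required by POSIX.
printf '%s\n' 1 2 3 4 5 6 7 8 9 | xargs -L 1 -P 4 echo
```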
shell jobs -p
I could probably as well directly mail kre@, but who knows.

The objective was to write a shell script that parallelized invocations of some command, but only up to a certain number of jobs running in parallel. There were two ideas to implement this, one using the jobs utility to track the current invocations and one to manually track them on a list. I ran into a number of problems with the "jobs" variant where I'm unsure whether I was running into corner cases undefined by POSIX or plain bugs in certain shells.

1. When you background a job (with &), and later, the job has finished
   a) in the sense that it's a shell function that has returned
   b) in the sense that it's an external process that called exit(3)
   but the script has not wait-ed for the job (wait meaning the utility, not the system call): is the job then supposed to show up in "jobs -p" output? In bash, at least for a), the job does show up until you call jobs without -p. In ash and dash, it doesn't show up.

2. SUS says "The jobs utility does not work as expected when it is operating in its own utility execution environment". Indeed, in dash, $(jobs -p) outputs nothing, rendering jobs useless for non-interactive use. But SUS also says "Usage such as $(jobs -p) provides a way of ...". So, does dash's behaviour qualify as a bug?

I attach both implementations; maybe there are other useful comments.

parallel.jobs.sh (attachment, Bourne shell script)
parallel.list.sh (attachment, Bourne shell script)
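Not the attached script, but a minimal sketch of the "list" variant (MAX, the job list and the echo stand-in are illustrative): track the PIDs yourself instead of relying on $(jobs -p), which is empty in dash's command substitution.

```shell
# Throttle background jobs by keeping their PIDs in a plain variable.
run_parallel() {
    MAX=3
    pids=""
    count=0
    for i in 1 2 3 4 5 6 7 8; do
        ( echo "$i" ) &         # stand-in for the real command
        pids="$pids $!"
        count=$((count + 1))
        if [ "$count" -ge "$MAX" ]; then
            set -- $pids        # oldest job first
            wait "$1"           # block until it has finished
            shift
            pids="$*"
            count=$((count - 1))
        fi
    done
    wait                        # collect the remaining jobs
}
run_parallel
```

Note that this waits for the oldest job rather than for whichever finishes first, so it is a cruder throttle than a real jobs-table-based one.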
Re: Redoing the code in /bin/sh to handle the issues in PR bin/48875
A bit late due to vacation: > To me, a command substitution that runs in the background seems weird > and like something to avoid. Is there really a case where this is useful, > or would it be reasonable to include a suggestion that it's best to avoid > such a construct? The case that led to the PR mentioned was a shell script forking a (shell) daemon process with the need to know that daemon's PID (to register the incarnation). So there was the fork-and-background chant followed by an "echo $!", and all that surrounded by something like daemon_pid=$(...). If you want to know even more detail, it was dotcache (http://www.math.uni-bonn.de/people/ef/dotcache) and I worked around it by invoking the record-pid code inside the backgrounding code. And to further complicate things, that code was written for Debian, so dash, and the whole thing here started just because I was investigating whether my problem was specific to dash or triggered on another ash as well.
Re: Weirdness in /bin/sh of 8.0
> It just stops after printing the package list: 1. What's the exit code? 2. If you run it with sh -x, do you see where it exits?
Re: shell prefix/suffix removal with quoted word
> I have susv4tc2.

AOL.

> It is specified

Possibly yes.

> but clarification should be done

Surely yes.

> Enclosing the full parameter expansion string in double-
> quotes shall not cause the following four varieties of
> pattern characters to be quoted, whereas _quoting_
> _characters_within_the_braces_shall_have_this_effect_.

I did read that, but I (mis?)understood that it was about quoting %/#.

> And at the end of the informative matter:
>
> The double-quoting of patterns is different depending on where
> the double-quotes are placed:

I missed that. It's fairly well-hidden.
shell prefix/suffix removal with quoted word
It has been brought to my attention that quoting the "word" in sh's substring processing causes word to be matched literally rather than being treated as a pattern. I.e., x="abc" y="?" echo "${x#"$y"}" outputs "abc", while x="abc" y="?" echo "${x#$y}" outputs "bc". I can't see this behaviour specified by SUS nor mentioned in sh(1). bash and ksh seem to behave the same.
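A minimal reproduction of the behaviour in question:

```shell
x="abc" y="?"
echo "quoted:   ${x#"$y"}"   # abc -- "$y" is matched as the literal string ?
echo "unquoted: ${x#$y}"     # bc  -- $y is a pattern matching any one character
```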
Re: swapctl -l
> Pages are only removed from swap when they are freed or accessed. Ah, I see, thanks! Can I find out which processes own pages that are paged out?
Re: swapctl -l
EF> It appears to me that swapctl -l lists how much of the swap devices
EF> have ever been in use since they were configured.
JS> No? It seems to list exactly how much space is currently in use.

Sorry, I was confused by systat vm not showing any paging activity. Looks like I have some daemons with memory leaks. I restarted them, but swap usage dropped to 1%, not zero.
swapctl -l
It appears to me that swapctl -l lists how much of the swap devices have ever been in use since they were configured. Is there a way to display how much swap is currently in use, in the sense of "how many pages are currently swapped out"? Maybe due to my lack of understanding of vm, the question doesn't make sense.
printf '%b' '\64' (was: printf(1) and incomplete escape sequence)
> that is, \0123 in a format string, and in a %b arg are treated differently. There's always something new to learn. Do you have any idea which system's printf's bug POSIX is modeled after here?
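To spell out the difference as I understand the POSIX printf rules (od is only there to make the bytes visible):

```shell
# In the format string, octal escapes are \ddd (one to three digits),
# so '\0123' is \012 (newline) followed by a literal 3.
printf '\0123' | od -c
# In a %b operand, octal escapes are \0ddd (leading zero required),
# so '\0123' is octal 123, i.e. the letter S.
printf '%b' '\0123' | od -c
```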
Re: open()ing a named pipe with no current readers
> Won't O_NONBLOCK cover this? No. POSIX says: O_NONBLOCK When opening a FIFO with O_RDONLY or O_WRONLY set: If O_NONBLOCK is set, an open() for reading-only shall return without delay. An open() for writing-only shall return an error if no process currently has the file open for reading. > I don't think you can distinguish "buffer is full right now" from "no > reader" though. I wouldn't need to distinguish that. > Depending on what you're doing you might want a stream socket instead > of a pipe. I'm not in control of the reading side.
open()ing a named pipe with no current readers
I guess that's a rather silly/basic question, but ...

Is there a way to open(.. O_WRONLY ..) an existing named pipe that currently has no reader such that
-- the open() will neither block nor fail
-- subsequent write()s to that fd will
   -- succeed if someone started reading in the meantime
   -- block/return EWOULDBLOCK if still no-one is reading
?

Of course there's the work-around of lazily open()ing the fd before each write() call.
Re: Suggestion: add a new option for chown(1)/chmod(1)
> similar to the -x option available in find(1)? YES!
make: empty() and friends on loop variables
It appears to me that empty(), defined() etc. do not work on .for loop variables, at least not in NetBSD 7's make and bmake-20150505. Assigning the loop variable to another intermediate variable seems to cure it.

.for i in aXa bYb
.if defined(i)
.info DEFINED
.else
.info UNDEFINED
.endif
.if empty(i:M*X*)
.info ${i}: NO
.else
.info ${i}: YES
.endif
j:= ${i}
.if empty(j:M*X*)
.info ${j}: NO
.else
.info ${j}: YES
.endif
.endfor

My guess would be that the logic that magically substitutes i with :U misses the defined(i) case.
Re: Changing basename(3) and dirname(3) for future posix compatibility
> there might be NetBSD applications currently which are assuming
> that the input string is not modified by these functions

I'd be heavily surprised if that change wouldn't break half (OK, 10%) of the consumers, either because they call both functions on the same path argument or because they continue to use the path argument (if only for debug output). Has anyone scanned NetBSD's (or pkgsrc's) codebase for such uses? What does Windo^WLinu^Wglibc do?

> which would have a '\0' dumped on top of a '/' when needed
> (which can happen with both basename and dirname).

Is this because the basename of "C:\Program Files\Important\" is defined to be "Important" sans slash -- err, of "/etc/pam.d/" to be "pam.d"?
Re: RFC: Change to the way sh -x works?
I still prefer your original approach: SUS doesn't specify the behaviour. No-one is known to rely on the old behaviour. The new behaviour makes much more sense (to me). If I understood you correctly, the old behaviour can be emulated by set -x 2>/dev/stderr after the change in case anyone is going to need it.
Re: RFC: Change to the way sh -x works?
> What "to standard error" means there, that is, which version of stderr,
> isn't clear.

So, can anyone imagine a use case where "follow whatever stderr happens to be subsequently redirected to" (the current behaviour) would make more sense than "whatever stderr currently points to" (after your change)? Or anything relying on the current behaviour? After your change, would set -x 2>/dev/stderr effect the old behaviour?
Re: RFC: Change to the way sh -x works?
This looks very useful to me. I guess SUS doesn't specify this?

> For this, stderr is remembered only when tracing turns from off to on,
> if it is on already, and another "set -x" is performed, nothing changes.

I can't unambiguously parse that. On second reading, treating the comma at the end of the first line more as a semicolon or full stop, I stumble over what "nothing changes" means: does the shell treat this as a no-op, or does it behave as it would without your change? From further reading, I guess it's a no-op.

> And yes, if tracing is off, "set -x 2>/tmp/trace-file" works as expected

This is brilliant!
Re: getaddrinfo(3) on numerical addresses
EF> Ah yes, of course. Stupid me. Thanks.

So what I've learned (thanks!) from this discussion: Calling getaddrinfo() with a hint of AF_INET/AF_INET6 means "if you try hard, can you make this an IPv4/IPv6 address" (e.g., look up 1.2.3.4.numerical.org or ::1.i-like-colons.org). If you want "does this look like an IPv4/IPv6 address" instead, call getaddrinfo() without a hint and examine res->ai_family. Which makes sense.
Re: getaddrinfo(3) on numerical addresses
EF> Given this is a monitoring system, whose job it is to detect server failures,
EF> marking random servers/switches as dead while the resolver is going mad and
EF> so check_ping on their numerical IPv4 times out is not particularly useful.
VU> BTW, isn't that exactly the argument that this program must use
VU> AI_NUMERICHOST which guarantees that no name resolution will be
VU> attempted?

While it's debatable how much sense it makes to use check_ping with a non-numerical address, as-is, it does accept them. I can't upstream a patch breaking this.
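A sketch of the AI_NUMERICHOST approach for the "does this look numeric, and which family" question (the helper's name is mine):

```c
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Classify a string without ever consulting the resolver. */
static const char *numeric_family(const char *s) {
    struct addrinfo hints, *res;
    memset(&hints, 0, sizeof hints);
    hints.ai_flags = AI_NUMERICHOST;    /* fail instead of resolving names */
    if (getaddrinfo(s, NULL, &hints, &res) != 0)
        return "not numeric";
    const char *r = res->ai_family == AF_INET  ? "IPv4" :
                    res->ai_family == AF_INET6 ? "IPv6" : "other";
    freeaddrinfo(res);
    return r;
}

int main(void) {
    printf("%s\n", numeric_family("1.2.3.4"));      /* IPv4 */
    printf("%s\n", numeric_family("::1"));          /* IPv6 */
    printf("%s\n", numeric_family("example.org"));  /* not numeric */
    return 0;
}
```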