Re: [Toybox] POSIX 2024 changes
On 7/23/24 09:13, Ray Gardner wrote:
> I modified an old HTML diff program by Ian Bicking to run with
> Python3, and wrote a little shell script to compare the 2018 version of
> a utility spec to the 2024 version. Run like:
>
> ./diffposixutil.sh sed

Useful, thanks. (Modulo I'm changing it to work with my already downloaded version of posix-2008 back from 2008, since I wanna see _all_ the changes since then in case I've missed some drip-feed...)

> This downloads the old and new versions of the sed.html spec from
> https://pubs.opengroup.org/onlinepubs/9699919799/utilities/ and
> https://pubs.opengroup.org/onlinepubs/9799919799/utilities/ and runs
> htmldiff.py, putting the result in dif/sed.html. I've attached the
> shell script, htmldiff.py, and the opengroup style.css from their
> utilities folder.

I miss the ability to download posix versions as tarballs instead of having to screen scrape them. This was explicitly supported before:

susv4: https://pubs.opengroup.org/onlinepubs/9699919799/download/
susv3: https://pubs.opengroup.org/onlinepubs/009695399/download/

But for susv5:

https://pubs.opengroup.org/onlinepubs/9799919799/download

404 error. And no link from the main index page either, instead there's a "participants". "You can't have it, this is ours." That's how you know it's a standard...

> I ran this on the awk spec and got quite a few differences I'll have
> to review, including some that are obvious editorial errors. I'll try
> to bring the errors to the attention of someone at the open group.

While you're at it, ask them why there's an awk.html.orig in the new directory?

> Ray

Thanks,

Rob
___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net
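For readers without the attachments, the workflow described above can be approximated roughly as follows. This is only a sketch in Python, not Ray's diffposixutil.sh or htmldiff.py: the function names are made up for illustration, it assumes each utility page lives at NAME.html under the two base URLs quoted above, and it uses the standard library's difflib on plain text rather than an HTML-aware diff.

```python
import difflib
import urllib.request

# Base URLs quoted in the thread: older edition first, 2024 edition second.
OLD_BASE = "https://pubs.opengroup.org/onlinepubs/9699919799/utilities/"
NEW_BASE = "https://pubs.opengroup.org/onlinepubs/9799919799/utilities/"


def spec_urls(utility):
    """Return (old, new) URLs for one utility page (hypothetical helper)."""
    page = utility + ".html"
    return OLD_BASE + page, NEW_BASE + page


def fetch(url):
    """Download one page as text; this needs network access."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def diff_text(old, new):
    """Return unified-diff lines between two versions of a page."""
    return list(difflib.unified_diff(old.splitlines(), new.splitlines(),
                                     "old", "new", lineterm=""))


if __name__ == "__main__":
    old_url, new_url = spec_urls("sed")
    print(old_url)
    print(new_url)
    # To compare the live pages: diff_text(fetch(old_url), fetch(new_url))
```

The download step is kept separate from the diff step so the comparison can also be pointed at an already downloaded copy of an older edition.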
Re: [Toybox] [PATCH] tar.test: don't test non-`-p` behavior as root.
On 7/23/24 14:55, enh via Toybox wrote:
> Fixes https://github.com/landley/toybox/issues/512.

Applied, but could you email me the tarball for the tar "ownership" test that produces the 2d7b hash? I'm getting a d9e7 hash here, and just confirmed the previous laptop doing TEST_HOST with devuan brochitis was also producing d9e7. (Which I didn't notice because passing all root tests is under the mkroot todo item...)

The downside of testing hashes instead of hd output is when it differs and you can't reproduce the old one, it doesn't say why. You added this hash in commit 43d398ad5d7b and it's possible it never worked for me, but if --owner --group --mtime aren't squashing all the variables I'd like to know what still varies.

Both toybox tar and host tar are producing the same output, so it's presumably a filesystem thing, possibly different passwd uids for "nobody"...?

Thanks,

Rob
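The point about hashes hiding the cause can be illustrated with a small sketch. It is Python rather than the test suite's shell, it builds its archives with the standard tarfile module instead of running tar with --owner --group --mtime, and the two uid values are invented to mimic systems that disagree about the numeric id of "nobody". The hash only says the archives differ; listing the differing byte offsets shows that only the ustar uid field (bytes 108-115) and the header checksum (bytes 148-155) changed.

```python
import hashlib
import io
import tarfile


def make_tar(uid):
    """Build a one-file ustar archive in memory with everything but uid fixed."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        info = tarfile.TarInfo("file")
        info.size = 5
        info.mtime = 0          # fixed timestamp, like --mtime
        info.uid = uid          # the one thing allowed to vary here
        info.gid = 0
        info.uname = "nobody"   # same name on both "systems"
        info.gname = "nobody"
        tar.addfile(info, io.BytesIO(b"hello"))
    return buf.getvalue()


def sha1(data):
    return hashlib.sha1(data).hexdigest()


def differing_offsets(a, b):
    """Byte offsets at which two equal-length archives disagree."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    one, two = make_tar(65534), make_tar(99)   # hypothetical uids for "nobody"
    print(sha1(one)[:4], sha1(two)[:4])
    print(differing_offsets(one, two))
```

A hex dump comparison gives the same information as differing_offsets() without needing a copy of the archive that produced the old hash.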
Re: [Toybox] [PATCH] devmem: add -f FILE, arbitrary amounts of data.
On 7/26/24 07:13, enh via Toybox wrote:
>> Don't know if Rob cares about this, but ?: is not ISO C. It's a GNU
>> extension. Specifically https://gcc.gnu.org/onlinedocs/gcc-14.1.0/gcc/Conditionals.html
> with 80 existing uses of https://en.wikipedia.org/wiki/Elvis_operator
> in toybox, i'm pretty sure rob knows this already :-)

Yeah, and it's been discussed multiple times over the years, starting with http://lists.landley.net/pipermail/toybox-landley.net/2012-April/013359.html where we were already looking into the portability of it.

I should try to dig something up to put in code.html but the decision way back when was "if the 2.6 kernel needed it to build, toybox is probably ok relying on it being there" because we exist in an ecosystem. (The fact I maintained a tinycc fork for 3 years trying to extend it to make tccboot work with current vanilla source may have influenced this. :)

I _thought_ this was recorded somewhere in code.html or design.html but it's not the easiest thing to grep for. There's several links at https://landley.net/toybox/cleanup.html#advice but this isn't one of them...

There's a couple other things like this, "int x=x;" to shut up the stupid gcc "is never used uninitialized" warnings comes to mind (although that's migrated to a QUIET macro that does conditional zero initialization)...

Rob
Re: [Toybox] toybox tar test failure
On 7/17/24 07:56, enh wrote:
>> > FAIL: tar honor umask
>> > echo -ne '' | umask 0022 && rm -rf dir && mkdir dir && tar xf
>> > $FILES/tar/dir.tar && stat -c%A dir dir/file
>> > --- expected 2024-07-15 16:20:47.217287424 +
>> > +++ actual 2024-07-15 16:20:47.257287423 +
>> > @@ -1,2 +1,2 @@
>> > -drwxr-xr-x
>> > --rwxr-xr-x
>> > +drwxrwxrwx
>> > +-rwxrwxrwx
>>
>> I can't reproduce it. I just did a fresh clone on the old machine and "make
>> clean defconfig tests" completed, including running the new tar tests.
>>
>> The _symptom_ is that it ran the new tests against old toybox tar from before
>> commit 93718452b9f6. That's the behavior from before the fix. Is your test setup
>> calling an older toybox tar out of the host $PATH maybe?
>
> no, this was on a device, so it'll be the /system/bin/toybox from the build.
...
> yeah, that seems to be a mksh compatibility issue.

I suspect the other issue might also be mksh, or your slightly different test setup. It could be that the shell umask command isn't setting umask properly, or maybe something's violating the assumption that the test command line is in a new process so state changes a test makes to umask, cd, or environment variables don't persist in the parent context...

I'll see if I can reproduce a mksh run of the test here.

Rob
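The "new process" assumption mentioned above can be shown in isolation. The sketch below is Python, not the toybox test harness, and the helper names are invented for the illustration. It sets a umask inside a child sh and checks that the parent's own umask is untouched afterwards, which is the property the tests rely on. It does not reproduce the mksh behavior, which would need the failing command line run under mksh itself.

```python
import os
import subprocess


def current_umask():
    """Read this process's umask (os.umask can only read by setting)."""
    old = os.umask(0)
    os.umask(old)
    return old


def umask_seen_by_child(mask):
    """Set the umask inside a child sh and report what that child sees."""
    script = "umask %04o && umask" % mask
    result = subprocess.run(["sh", "-c", script], capture_output=True,
                            text=True, check=True)
    return int(result.stdout.strip(), 8)


if __name__ == "__main__":
    before = current_umask()
    child = umask_seen_by_child(0o077)
    after = current_umask()
    print("parent before %04o, child %04o, parent after %04o"
          % (before, child, after))
```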
Re: [Toybox] toybox tar test failure
On 7/15/24 12:11, enh via Toybox wrote:
> tried to sync this morning, but i'm getting a new tar test failure:

Saw this yesterday but didn't have the email laptop with me. (I need to move that over...)

> FAIL: tar honor umask
> echo -ne '' | umask 0022 && rm -rf dir && mkdir dir && tar xf
> $FILES/tar/dir.tar && stat -c%A dir dir/file
> --- expected 2024-07-15 16:20:47.217287424 +
> +++ actual 2024-07-15 16:20:47.257287423 +
> @@ -1,2 +1,2 @@
> -drwxr-xr-x
> --rwxr-xr-x
> +drwxrwxrwx
> +-rwxrwxrwx

I can't reproduce it. I just did a fresh clone on the old machine and "make clean defconfig tests" completed, including running the new tar tests.

The _symptom_ is that it ran the new tests against old toybox tar from before commit 93718452b9f6. That's the behavior from before the fix. Is your test setup calling an older toybox tar out of the host $PATH maybe?

> also (since possibly WAI, and not a blocker because we ignore toybox
> failures in tests if we're not using the toybox implementation), one
> of the awk tests fails against Android's "one true awk":

Ray replied to that, I haven't opened the can of worms of awk cleanup yet (wanna cut a release with it in pending first) so I'll happily apply a fix from him if he posts one and otherwise punt for now...

Rob
Re: [Toybox] gcc vs llvm -static-libasan
On 7/9/24 12:35, enh wrote:
> On Tue, Jul 9, 2024 at 10:02 AM enh wrote:
>> On Tue, Jul 9, 2024 at 8:47 AM Rob Landley wrote:
>>
>> i'll forward this to the llvm asan folks anyway, in particular the
>> "gcc and llvm use different flags" is the kind of papercut that i
>> think both gcc and llvm have been trying harder to avoid lately.
>
> https://github.com/llvm/llvm-project/pull/98194 for that.

Wunderbar. (I think it's related to the stroopwafel.)

When does your AOSP toolchain update in the main build so I can commit/push a change using it for ASAN=1 builds without inconveniencing you?

> apparently your "static asan" request is even harder than i thought...

Last time I tried setting up a container we wound up debugging stdin handling in bionic's _start code (although that _was_ because the laptop I was using at the time had an old processor throwing illegal instruction errors, so I needed to build a kernel and run it through qemu and used mkroot, which triggered some sad initramfs behavior in the kernel.)

The current laptop seems to run the binaries so far. (I'm grinding through make tests even, just static build without ASAN enabled...)

> the folks who actually know what they're talking about (because they
> implemented all of these things), fully static asan isn't a thing ---

It is for gcc/glibc, although I expect they're only implementing half the tests the LLVM guys have...

> "bionic's hwasan support is a special well-integrated combination that
> works in static binaries, but everything else relies on symbol
> interposition and the presence of the dynamic loader".
>
> so my default knee-jerk "have you tried hwasan?" was closer to the
> mark than i realized at the time :-)

Does qemu-system-aarrcchh6644 emulate hwasan? If so, how do I get a dynamic bionic chroot of a reasonable size to boot under qemu to test binaries? (If the NDK doesn't have an extractable sysroot dir, does AOSP have some sort of "lunch shellprompt"?)
Or should I stick with the glibc one once I can check that in? (Dynamic linked bionic would allow more of the utf-8 plumbing to work, and thus get tested by me, but I have to finish replacing -lcrypt with lib/crypt.c before "make defconfig" builds under bionic, so I'm busy here for a while yet...)

Rob
Re: [Toybox] gcc vs llvm -static-libasan
On 7/9/24 09:02, enh wrote:
> On Tue, Jul 9, 2024 at 8:47 AM Rob Landley wrote:
>> Any suggestions?
>
> none you're going to like, but...

I don't have to like it, I just have to make it work. :)

> you're basically checking all the "not supported" boxes here:

Welcome to my life.

> x86-64 (as opposed to arm64),

The android-ndk-r26d/toolchains/llvm/prebuilt directory only has linux-x86_64 binaries. (The download page has Platform "Linux 64-bit (x86)" as the only Linux option. It can produce arm output, but doesn't provide a toolchain that would run on an arm system, therefore a native compile-and-test cycle...)

> asan (as opposed to hwasan),

Which wasn't available on x86-64 last I checked. I could whip up a raspberry-pi alike if pushed, but if you don't publish an ndk that runs there... (I could try building it from source again?)

Adding qemu system emulation into the compile and test cycle is on the todo list. Adding qemu APPLICATION emulation is a recipe for weird corner cases in system call translation and something I've generally tried to avoid because system call translation is a categorically harder problem than emulating hardware interfaces. (Containers leverage the existing kernel, based on work done by OpenVZ since 1999, running the same host/guest binaries with all the endianness and page size and magic constants and structure packing issues guaranteed identical, and it STILL took 10 years to get it load bearing.)

> static (as opposed to dynamic).

Again, running on a debian host. When they come out with debian-bionic I'll happily debootstrap a chroot for testing. Getting an android chroot to work as a development environment is the goal, hence chicken-and-egg problem.

> i believe that arm64 hwasan static binaries "just work",
> and if they don't, that's something we'd actually be able to find the
> time to look at.
> until there's x86-64 hardware with whatever they're
> calling "x86-64 top byte ignore", though, you're stuck with asan, and
> that's been known to be mostly/somewhat broken on Android for years
> (with no-one to work on it).
>
> why can't you run the dynamic version? i thought you'd upgraded your
> laptop?

I did a fresh OS install on a spare laptop of the same model, but it's still a glibc system.

> is the new one also too old to run ~2012-era x86-64 binaries?

It runs the static binaries just fine, but it's a glibc system: "./toybox: cannot execute: required file not found". (And then "ldd toybox" said "/usr/lib/x86_64-linux-gnu/libm.so: invalid ELF header" which isn't _entirely_ surprising given that gnu/ldd is a bash script calling glibc but was another "wha...?" moment I did not pursue further.)

I vaguely remembered you saying you could create a "/system" symlink into the ndk somewhere, so I ran readelf on the binary which is requesting /system/bin/linker64 and did a "find android-ndk-r26d -name linker64" and there were no hits, so even setting up a dynamic chroot for it remains a todo list item.

> alternatively, build arm64 hwasan binaries and run them on your phone
> or orangepi?
>
> i'll forward this to the llvm asan folks anyway, in particular the
> "gcc and llvm use different flags" is the kind of papercut that i
> think both gcc and llvm have been trying harder to avoid lately.

Thanks. As long as ASAN works for me _somewhere_ I don't really care where (should be the same bugs), and I mostly want all the tests to pass with ASAN before cutting a release (which is due again: time flies like an arrow, fruit flies like an apple, of the two time flies eating all your arrows sounds like the bigger problem...).

But if I can't check in the "make it work with gcc on debian" flag without breaking the Android build, I has a sad. :(

Rob
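The readelf step above comes down to reading the PT_INTERP program header, which names the dynamic loader the kernel must find before it can run the binary. When that path (here /system/bin/linker64) does not exist on the host, the shell reports the "required file not found" error quoted above. A minimal sketch of that lookup follows, in Python and handling only little-endian 64-bit ELF images; elf_interpreter() is an illustrative helper, not a toybox or NDK tool.

```python
import struct
import sys

PT_INTERP = 3  # program header type holding the loader's path


def elf_interpreter(data):
    """Return the requested dynamic loader path, or None for a static binary."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a little-endian ELF64 image")
    (e_phoff,) = struct.unpack_from("<Q", data, 32)       # program header table
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)
    for i in range(e_phnum):
        off = e_phoff + i * e_phentsize
        p_type, _flags, p_offset, _vaddr, _paddr, p_filesz = \
            struct.unpack_from("<IIQQQQ", data, off)
        if p_type == PT_INTERP:
            raw = data[p_offset:p_offset + p_filesz]
            return raw.rstrip(b"\0").decode()
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        print(elf_interpreter(f.read()))
```

Run against a dynamically linked binary this prints the loader path; a fully static binary has no PT_INTERP entry, so the function returns None.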
[Toybox] gcc vs llvm -static-libasan
On 6/12/24 16:08, Rob Landley wrote:
> The new debian toolchain also broke gcc/glibc ASAN, complaining (at runtime)
> "ASan runtime does not come first in initial library list; you should
> either link runtime to your application or manually preload it with
> LD_PRELOAD." which is that library ordering
> nonsense back to rear its ugly head again and I refuse to humor
> these INSANE ASSHOLES. If LLVM/bionic works without this, then it's
> NOT REQUIRED, they're just really bad at it. Notice how the error message
> doesn't say which library to LD_PRELOAD if I _did_ want to fix it,
> it just refuses to work where the previous version worked. A clear regression.
> Which I'm late enough in reporting it's a fait accompli, and I'm in the wrong
> for not noticing their fuck-up in a timely manner. Far too late to start making
> a fuss about it now...
>
> (Is a required library not installed? I used "build-essential" instead
> of manually installing gcc and make precisely so it would scoop up that kind
> of nonsense... And it's complaining about library ORDERING, which is not
> supposed to be a thing when dynamic linking.)

So the local fix I've had for this is to add -static-libasan to CFLAGS in the ASAN portion of scripts/portability.sh which fixes gcc. Note that I need this to do a NON-STATIC ASAN=1 build, because otherwise I get the "runtime does not come first" error above. And no, this is not a toybox issue:

$ gcc hello.c -fsanitize=address -O1 -g -fno-omit-frame-pointer -fno-optimize-sibling-calls
$ ./a.out
==5942==ASan runtime does not come first in initial library list; you should either link runtime to your application or manually preload it with LD_PRELOAD.

But I wanted to regression test -static-libasan against the NDK before checking it in, and I just installed ndk-r26d on the new laptop to do that.

The first reason it doesn't work is that llvm chose -static-libsan instead.
I note that gcc does not accept "-static-libsan" and llvm does not accept "-static-libasan".

The SECOND reason it doesn't work is that a --static linked NDK toybox binary with -static-libsan says "error: undefined symbol: _DYNAMIC" referenced by sanitizer_linux.cpp called from libclang_rt.asan-x86_64-android.a

To reproduce that, I extracted the ndk zip file into my home directory and did:

$ ln -s ~/android-ndk-r26d/toolchains/llvm/prebuilt/linux-x86_64/bin ~/llvm
$ echo -e '#!/bin/bash\n"$(dirname "$0")"/clang --target=x86_64-linux-android34 "$@"' > ~/llvm/llvm-cc
$ chmod +x ~/llvm/llvm-cc
$ cd ~/toybox/toybox
$ CROSS_COMPILE=~/llvm/llvm- LDFLAGS=--static make clean defconfig toybox

And then switched off SU and LOGIN in .config because of crypt (it's on the todo list), but the rest built!

But then adding ASAN=1, the NDK build broke, although -static-libsan doesn't seem to affect that (it builds dynamic with it but I can't run the result, and the build breaks static without it because the OTHER asan libraries inappropriately assume dynamic linking).

I'd like to check in the -static-libasan at the end of the CFLAGS string for ASAN to fix the debian build (on the theory ASAN doesn't work for me in the NDK _anyway_), but I don't want to break the (dynamic) android build (which complains it's an unknown flag with "do you mean libsan").

Any suggestions?

Rob
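One way to avoid breaking either build is to pick the flag spelling per compiler. The sketch below is Python rather than the shell of scripts/portability.sh, and it is not what toybox does: the helper names are invented, and detecting clang by looking for "clang" in the first line of the --version output is an assumption that may not hold for every wrapper. The only facts taken from the thread are that gcc accepts -static-libasan and llvm accepts -static-libsan.

```python
import subprocess


def compiler_version(cc):
    """Return the first line of `cc --version` (this runs the compiler)."""
    out = subprocess.run([cc, "--version"], capture_output=True,
                         text=True, check=True).stdout
    return out.splitlines()[0] if out else ""


def static_asan_flag(version_line):
    """Pick the static ASan runtime flag spelling for this compiler."""
    # Assumption: clang identifies itself by name; anything else is
    # treated as gcc.
    if "clang" in version_line.lower():
        return "-static-libsan"
    return "-static-libasan"


if __name__ == "__main__":
    for sample in ("gcc (Debian) 12.2.0", "clang version 17.0.0"):
        print(sample, "->", static_asan_flag(sample))
```

A build script would call static_asan_flag(compiler_version(cc)) and append the result to the ASAN portion of CFLAGS.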
Re: [Toybox] [PATCH] test -ef -ot -nt (POSIX 2024)
On 7/3/24 00:37, Oliver Webb via Toybox wrote:
> Looking over the new stuff in POSIX 2024, toybox already has most of the
> stuff it specifies

I've been waiting for the html to go up on opengroup.org before considering it "real", in part due to PDF being more awkward to deal with (eyestrain) and in part due to a members-only spec behind a paywall with samizdat copies in the wild not being something open source should bother with.

Andrew Josey's 6/21 email https://www.mail-archive.com/austin-group-l@opengroup.org/msg12710.html said it would take "up to a month" to publish the html.

That said, happy to get a jump on things if some subset of recent debian features is now blessed by a third party...

> Excluding things like make which toybox doesn't have,

But should. I wonder if a posix-2024 make would actually build Linux From Scratch packages?

> and gettext/msgfmt which from all
> the design documentation I've read Rob doesn't wanna add to toybox,

I consider them out of scope. When Aboriginal Linux built Linux From Scratch it used https://landley.net/aboriginal/mirror/gettext-stub-1.tar.gz which NOPs out the functions.

> These are the POSIX 2024 features toybox doesn't have:
>
> dd iflag=fullblock

ok

> rm -d

Heh. Ok.

> tail -r (which from my checking coreutils doesn't have)

Which sadly means debian's man page doesn't either.

> test -ef -nt -ot
>
> The attached patch adds in the test -ef -nt -ot
>
> As for symbolic links:
>
> $ test /bin/bash -ef /bin/sh && echo yes
> yes
> $ test /bin/bash -ot /bin/sh && echo yes
> yes
>
> test is meant to follow symlinks _only_ when checking inodes with -ef

Applied locally, but you didn't update the help text. I can do that here if -nt -ef and -ot have the same meanings as the 2019 debian "man 1 test" page...

Hmmm, -nt = newer than, -ot = older than, -ef is... how did "same" become e? Equal? Equivalent? That sounds like contents, not "these are hardlinks to the same dev/inode".
Merriam Webster's thesaurus for "same" just gives those two E words. Expected, extracted, extruded, ectopic, electric, endemic, eloquent...

Bash "help test" says "True if file1 is a hardlink to file2". Can't argue...

Thanks,

Rob
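The three operators discussed above reduce to comparisons on stat() results. The following is a rough Python sketch, not the toybox patch: the function names are invented, it follows symlinks for all three operators (os.stat does), and it simply raises an error if either file is missing rather than implementing whatever the standard says for that case.

```python
import os
import tempfile


def same_file(a, b):
    """-ef: both names resolve to the same device and inode."""
    sa, sb = os.stat(a), os.stat(b)
    return (sa.st_dev, sa.st_ino) == (sb.st_dev, sb.st_ino)


def newer_than(a, b):
    """-nt: a's modification time is later than b's."""
    return os.stat(a).st_mtime_ns > os.stat(b).st_mtime_ns


def older_than(a, b):
    """-ot: a's modification time is earlier than b's."""
    return os.stat(a).st_mtime_ns < os.stat(b).st_mtime_ns


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        first = os.path.join(d, "first")
        link = os.path.join(d, "link")
        open(first, "w").close()
        os.link(first, link)            # hard link: same dev/inode
        print(same_file(first, link), newer_than(first, link))
```

Because a hard link shares its inode with the original, it also shares the timestamp, so neither -nt nor -ot is true between the two names.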
[Toybox] Heads up, posix 2024 dropped.
I've been waiting for the HTML edition (which says "to follow soon") so I can recombobulate the roadmap. (Not doing it with the PDF, although several people have written up their own difference lists.)

https://www.opengroup.org/austin/

Previous web version does not have the forward link yet either:

https://pubs.opengroup.org/onlinepubs/9699919799/

But just so you know... :)

Rob
Re: [Toybox] [PATCH] lspci fixes for multi-controller hosts
On 6/14/24 08:51, Dmitry Buzdyk wrote:
> Couple of patches for the systems with multiple PCIe controllers.
> In my case it has 2 controllers and output of the lspci (with patches applied)
> looks like this:
>
> :00:00.0 [060400]: [17cb:0113]
> :01:00.0 [0c0330]: [1912:0014] (rev 03) xhci_hcd
> 0001:00:00.0 [060400]: [17cb:0113]
> 0001:01:00.0 [060400]: [1179:0623]
> 0001:02:01.0 [060400]: [1179:0623]
> 0001:02:02.0 [060400]: [1179:0623]
> 0001:02:03.0 [060400]: [1179:0623]
> 0001:03:00.0 [04]: [17cd:0202]
> 0001:04:00.0 [0c0330]: [1912:0014] (rev 03) xhci_hcd
> 0001:05:00.0 [02]: [1179:0220] tc956x_pci-eth
> 0001:05:00.1 [02]: [1179:0220] tc956x_pci-eth
>
>
> 1. Fix "lspci -x" taking into account full PCI_PATH_NAME
> 2. Print full PCI_PATH_NAME to the output.

I applied the first one, but #2 changes the output to not match what debian's lspci is doing. Is there a rule about when to print the extra info here? (If there's more than one controller?)

Rob
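On the question of a rule: as far as I recall, the pciutils lspci man page describes -D as "always show PCI domain numbers" and says they are otherwise suppressed on machines that only have domain 0. A minimal sketch of that rule in Python follows. It is neither toybox's nor pciutils' code, the function name is invented, and the sample device names are made up rather than taken from the output quoted above.

```python
def display_names(sysfs_names, always_show_domain=False):
    """Format 'dddd:bb:dd.f' names per the rule described in the lead-in.

    The domain prefix is kept only when some device sits in a domain
    other than 0000, or when the caller asks for it (like lspci -D).
    """
    domains = {name.split(":", 1)[0] for name in sysfs_names}
    show = always_show_domain or any(d != "0000" for d in domains)
    if show:
        return list(sysfs_names)
    return [name.split(":", 1)[1] for name in sysfs_names]


if __name__ == "__main__":
    single = ["0000:00:00.0", "0000:01:00.0"]
    multi = single + ["0001:00:00.0"]
    print(display_names(single))
    print(display_names(multi))
```

Under this rule a single-controller system keeps the short form that existing scripts expect, and the longer form only appears where it is needed to tell devices apart.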
Re: [Toybox] awk (was: strlower() bug)
On 6/15/24 17:22, Ray Gardner wrote:
> On Wed, Jun 12, 2024 at 2:57 PM Rob Landley wrote:
>>
>> On 6/11/24 16:56, Ray Gardner wrote:
>> > Elliot, thanks for the positive feedback on the docs, but I really
>> > wish you and Rob would try the program. I waited a while to see what
>> > Rob would have to say. He doesn't seem the sort to be at a loss for
>> > words, but ... nothing. Any idea why he's had nothing to say about an
>> > awk for toybox?
>>
>> Why are you asking Elliott?
>
> He responded to my post; you didn't. I know you've had a lot on your plate
> lately with your move, selling the house, working on toysh. But you
> responded to most posts here since mine on 5/14.

I had the window open, but hadn't yet done the reading. :)

> After all you've written about awk, I was puzzled by your non-response,
> and inferred wrongly that it was intentional, so I thought asking you why
> you didn't respond to a post you intended to ignore would be ... not well
> received. I thought Elliot might have some insight there.

A reasonable amount of follow-up is fine. I _do_ drop the ball a lot.

>> Remember the "poke me a week later if I forget"?
>
> No. But I dug into the archive, and find that you said that to Oliver in a
> post about toysh in March. But never about awk, or to me.
> (http://lists.landley.net/pipermail/toybox-landley.net/2024-March/030146.html)

Sigh, I keep thinking it's in the FAQ. (I should update the FAQ, but have like 8 half-finished updates to it already...)

>> I consider myself poked, somewhat passive-aggressively. :)
>
> No passive aggression intended.

My fault for not documenting the expected procedure better.

> But you've been looking for an awk for at least 8 years, so I really
> thought you'd welcome one that's complete and written for toybox, with
> some tests and documentation.

I am very interested, yes.

I downloaded your repo, copied toybox/awk.c to toys/pending/awk, and built it. It compiled. I grabbed awk.test and ran that, and it passed.
Didn't QUITE pass test_host:

awk: cmd. line:1: warning: escape sequence `\u' treated as plain `u'
FAIL: awk \u
echo -ne '' | "/usr/bin/awk" 'BEGIN{print "\u20\u255"}' < /dev/null
--- expected 2024-06-16 08:36:12.147722288 -0500
+++ actual 2024-06-16 08:36:12.155722288 -0500
@@ -1 +1 @@
- ɕ
+u20u255

But eh, passed all the others (with VERBOSE=all), close enough. (Adding "utf8locale" to the test file didn't fix it, dunno what it's trying to do...) *shrug*

I'm happy to check it into pending as is, if you don't mind discarding commit history. (Um, URL to the github commit I got it from maybe? The trees haven't got the same base so there isn't an obvious "pull" option here...)

$ git log toybox/awk.c toybox_awk_test/awk.test | grep '^commit ' | wc -l
30

It's a _bit_ granular but toysh is way worse, and for that matter:

$ git log --follow toys/*/sed.c | grep '^commit ' | wc -l
110

Hmmm, maybe I can do something clever fiddly with fishing out git format-patch entries, trimming them a bit and adjusting the paths, and "git am" in the other tree...

>> I have the tab open, the reason I haven't looked at it yet is A) it's 4500
>> lines, B) in a thing I have WAY insufficient existing domain expertise in (but
>> multiple bookmarked tutorials and an entire book on somewhere).
>
> It's really 3523 lines of non-blank non-comment code, measured with:
>
> toybox awk -f cnt_sloc.awk awk.c
>
> where cnt_sloc.awk is:
> /^[ \t]*\/\*/ , /\*\/[ \t]*$/ { next } # Skip /* ... */ comments
> /^ *$/ || /^ *\/\// { next } # Skip empty and //comment-only lines
> { sloc++ }
> END { print sloc }

Hmmm...

$ ./awk -f <(echo '/^[ \t]*\/\*/ , /\*\/[ \t]*$/ { next }'$'\n''/^ *$/ || /^ *\/\// { next }'$'\n''{ sloc++ }'$'\n''END { print sloc }') toys/*/awk.c
3523

> BTW regarding not getting an SSD at Target: there's a MicroCenter in the
> Minneapolis metro area; might be worth the drive. The one where I am is good.
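For anyone not fluent in awk range patterns, the cnt_sloc.awk script quoted above can be paraphrased as the Python sketch below. It is a reading of what those four awk rules do, not a replacement for the script: a comment block starts on a line beginning with /* (after optional spaces or tabs) and runs through the first line ending in */, blank lines and //-only lines are skipped, and everything else counts. The function name is invented.

```python
import re

START = re.compile(r"^[ \t]*/\*")     # line that opens a /* comment
END = re.compile(r"\*/[ \t]*$")       # line that closes it
SKIP = re.compile(r"^ *$|^ *//")      # empty or //comment-only line


def count_sloc(lines):
    """Count lines the way the quoted awk rules would."""
    sloc = 0
    in_comment = False
    for line in lines:
        line = line.rstrip("\n")
        if in_comment:
            # Inside a range: skip, and leave it once the end matches.
            if END.search(line):
                in_comment = False
            continue
        if START.search(line):
            # awk tests the end pattern on the same line that starts
            # the range, so a one-line /* ... */ opens and closes here.
            if not END.search(line):
                in_comment = True
            continue
        if SKIP.search(line):
            continue
        sloc += 1
    return sloc


if __name__ == "__main__":
    sample = ["/* header", " * text", " */", "", "// note", "int x;"]
    print(count_sloc(sample))
```

Like the awk original, this still counts a code line that merely ends with a trailing comment, and it treats a line containing only a tab as code because the blank-line pattern matches spaces only.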
I took the green line to the A line to Best Buy, which still had a few of the right kind of ssd locked in a misc old parts filing cabinet. New(er) laptop is up and running, with non-EOL os version installed on it. (Hence the list of things the new environment/compiler broke.) Part of my slow/quiet here is the old machine is still the "master" for email and blog pushing, so I write notes-to-self and then have to copy them over when that's the one I took out with me for the day. I'm slowly getting used to firefox (dunno if it really scales yet, "pkill -f renderer" at chrome hasn't got an obvious equivalent that leaves the tab open and reloadable). I also have to decide if it's gonna be Thunderbird again or something else, which is presumably bundled
Re: [Toybox] awk (was: strlower() bug)
Sigh, composed a reply on the other laptop but still can't send it from there...

On 6/12/24 16:09, enh wrote:
> On Wed, Jun 12, 2024 at 4:57 PM Rob Landley wrote:
> yeah, i make a lot of noise, but i don't have any real power :-)

Nah, you just have bigger fish to fry. :)

> (fwiw, there's a second edition just come out. from one-true-awk it
> seems like csv support is the main new feature. amusingly one of the
> errata for the new edition on https://awk.dev/ is a behavioral
> difference between one-true-awk and gawk.)

I tend to drill down fairly deeply into things to be comfortable, and people have implemented fairly large things in awk as a programming language.

>> The new debian toolchain is hallucinating a warning when I build toybox
>> with it, toys/posix/grep.c:211:24: warning: 'regexec0' accessing 8 bytes
>> in a region of size 4 [-Wstringop-overflow=]
>> and further note: referencing argument 5 of type 'regmatch_t[0]'.

1) This only happens for ASAN=1 by the way. (Yes, at compile time.)

2) Changing the prototype from regmatch_t pmatch[] to regmatch_t *pmatch made the warning go away.

*jazzhands*

Rob
Re: [Toybox] awk (was: strlower() bug)
On 6/11/24 16:56, Ray Gardner wrote:
> Elliot, thanks for the positive feedback on the docs, but I really
> wish you and Rob would try the program. I waited a while to see what
> Rob would have to say. He doesn't seem the sort to be at a loss for
> words, but ... nothing. Any idea why he's had nothing to say about an
> awk for toybox?

Why are you asking Elliott? He's in California, I'm currently in Minneapolis, we haven't spoken in person since before the pandemic. (I mean, he has my cell phone number and can text me, but it doesn't come up much?)

Remember the "poke me a week later if I forget"? I consider myself poked, somewhat passive-aggressively. :)

I have the tab open, the reason I haven't looked at it yet is A) it's 4500 lines, B) in a thing I have WAY insufficient existing domain expertise in (but multiple bookmarked tutorials and an entire book on somewhere).

I'm currently re-familiarizing myself with the toysh redirection code to fix a nasty bug there, which has been a todo item I haven't finished for 3 weeks, and in the past week I installed a new laptop with a non-EOL version of devuan, and setting that up I found multiple things that the world moving on without me broke.

Sigh, my blog's way behind, just updated it to the 21st, but here's blog spoilers which probably do not render properly as html yet and doesn't have links to things I mention (the patch mentioned at the end is https://landley.net/bin/mkroot/latest/linux-patches/0008-elfcrap.patch and no I haven't fixed it yet):

June 7

Stuff's a bit chopped up since I'm straddling two laptops. Still blogging from the old one, and the old one has the reasonable battery (I should order another battery) so I can't take the new one out to random coffee shop yet but only use it plugged in at the desk. So I'm blogging about what I did based on a notes.txt file I scp'd over to the old machine.
Package dependencies remain out of control: for some reason "apt-get install git" wanted "libperl-error" which is just sad. I'm vaguely annoyed that build-essential installed fakeroot and three *-perl packages and so on, but that's the cost of using a meta-package somebody else curates. (Saying "the following additional packages will be installed" and then "the following NEW packages will be installed" with the only difference being the second one includes the package I requested... that seems non-optimal, especially when the list is 37 packages long).

The new debian toolchain is hallucinating a warning when I build toybox with it, toys/posix/grep.c:211:24: warning: 'regexec0' accessing 8 bytes in a region of size 4 [-Wstringop-overflow=] and further note: referencing argument 5 of type 'regmatch_t[0]'.

This warning is wrong in multiple ways. First the code's been run under ASAN a lot without complaint, and no other toolchain produces this warning: not llvm, not gcc, and musl-cross-make has been building the same gcc 12.0 version which does NOT produce the warning. Something debian locally patched into its "gcc 12.0-14" is producing a warning that vanilla gcc does not produce. That makes it a bit suspicious to begin with.

I inspected the code anyway, and argument 5 of the call to regexec0() in do_grep() is an 8 byte pointer to a 16 byte structure. There's no "region of size 4" to be found. The argument shoe->m is a pointer to an entry of type regmatch_t, and that struct contains two entries of regoff_t which is ssize_t which is long, thus 16 bytes on a 64-bit system. Even on a 32 bit system, the two of them would still add up to 8 bytes. The structure is allocated to its full size. There's nothing wrong with the code that I've been able to spot.
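The size arithmetic in that paragraph can be checked mechanically. The sketch below uses Python's ctypes to model the layouts as described (two regoff_t members in regmatch_t, three pointers in struct double_list); it is an illustration, not the toybox source. One caveat that is my own recollection rather than something stated in the post: I believe glibc's regex.h defines regoff_t as int unless _REGEX_LARGE_OFFSETS is defined, in which case regmatch_t would be 8 bytes on glibc and 16 on a libc where regoff_t is long. Checking sizeof(regmatch_t) with the actual toolchain would settle it.

```python
import ctypes


def regmatch_size(regoff_type):
    """Size of a struct holding two offsets of the given C type."""
    class RegMatch(ctypes.Structure):
        # rm_so and rm_eo are the POSIX member names of regmatch_t.
        _fields_ = [("rm_so", regoff_type), ("rm_eo", regoff_type)]
    return ctypes.sizeof(RegMatch)


class DoubleList(ctypes.Structure):
    """Stand-in for the list node described above: next, prev, data."""
    _fields_ = [("next", ctypes.c_void_p),
                ("prev", ctypes.c_void_p),
                ("data", ctypes.c_void_p)]


if __name__ == "__main__":
    print("regmatch_t with ssize_t offsets:", regmatch_size(ctypes.c_ssize_t))
    print("regmatch_t with int offsets:    ", regmatch_size(ctypes.c_int))
    print("double_list:                    ", ctypes.sizeof(DoubleList))
    print("pointer:                        ", ctypes.sizeof(ctypes.c_void_p))
```

With either offset type, none of these sizes is 4 on a 64-bit system apart from a single int-sized offset.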
I _think_ what might be happening is shoe->m lives in "shoe" which is most recently assigned to in the enclosing for() loop via shoe = (void *)TT.reg; and TT.reg in the GLOBALS() block is declared as struct double_list *reg; because at that level we only care that it's a doubly linked list, not what members each list entry has in the command-local "struct reg".

Except even THAT theory is funky because double_list has three pointers: next, prev, and data, each of which is 8 bytes, where is it getting size 4? If it was comparing sizeof(*TT.reg) with sizeof(*shoe) then shoe->m starts off the end of the smaller struct. If the compiler can't keep the types straight then it's not a size 4 issue, it's an out of bounds access.

The type of the "shoe" pointer is "struct reg", which has 5 members. The argument it's complaining about is a pointer to the 5th member, which is indeed a regmatch_t. (And the error is SAYING it's a regmatch_t, which is neither 4 nor 8 bytes long, it's 16. Neither the pointer, nor the struct, nor any member OF that struct, match the constraint it's insisting was violated.)

The only place there's a member of size 4 is "int rc", the third member of struct reg. And struct double_list only HAS 3 members, and "m" is the last member of struct reg, so maybe somehow the compiler is confusing (struct reg
Re: [Toybox] [PATCH] vi: rename `-s` flag to `-c`
On 6/11/24 07:50, enh wrote:
>> Looking further at this, what is the behavioral difference between -c and -s?
>> The patch does nothing but change one into the other, with no other behavior
>> change I've spotted?
...
>> I thought for a moment that -c was jumping straight into esc-colon mode with the
>> command line at the bottom of the screen, but the -c example above does not
>> provide -c on the command line and I am just CONFUSED.
...
> i think the objection is quite simple, no? -c takes a _command_
> whereas -s takes a _filename_ (and that file is full of commands).
> what's currently implemented _is_ -s, and it would be wrong to rename
> it to -c.

Ok, so -c _does_ jump to esc-colon mode and -s doesn't, and the patch isn't changing the behavior. Got it.

(Typed that, then didn't send it and opened vi.c instead, rewrote most of the help text, gave up on making sense of the vim man page and instead started reading the posix vi page, which delegates to the ex page...)

CAN OF WORMS.

No idea how to cleanly explain that "a is like i but moves the cursor one character to the right first, for no obvious reason". I mean, does that come up a lot? And "A is END then i".

Design question of whether to try to explain to someone how to use this (assuming they have cursor keys and thus don't need weird combos like that), or is there a duty to list what we've implemented? (Which in general has _not_ been toybox policy: the help text doesn't mention things we included for compatibility with old scripts/habits that are never the obvious answer to "how do I do X", there should be one obvious way to do it and was before Guido retired... But that's a heck of an editorial judgement, isn't it?)

Sigh, over the years I've had various vi cheatsheets (and once taught an intro to unix course at the local community college which had a week on vi), but if I still have any they're packed in a box in storage after the move.
This command is the poster child for "each user uses a subset of the options, but no two quite agree on WHICH subset..." Rob ___ Toybox mailing list Toybox@lists.landley.net http://lists.landley.net/listinfo.cgi/toybox-landley.net
Re: [Toybox] [PATCH] vi: rename `-s` flag to `-c`
On 6/7/24 03:41, Rob Landley wrote:
> On 6/5/24 00:46, Jarno Mäkipää wrote:
>> You cannot test against other vi clones with -c after this patch. But
>> you could have test against vim with -s {script} implementation. I
>> used vim as reference for testing with original test case files.
>
> If I'm cloning bash specifically for toysh, I don't have an objection to
> targeting vim specifically in a toybox vi implementation.
>
>> Ex command only switch -c could be added as addition to -s if you
>> wanna achieve something with ex commands, but maybe don't delete -s
>> implementation, unless you have better way to test vi mode motions.
>
> Right now the regression test contexts I'm paying attention to are busybox
> defconfig and debian's default install. How does this impact testing against
> those?

Looking further at this, what is the behavioral difference between -c and -s? The patch does nothing but change one into the other, with no other behavior change I've spotted?

The vim man page says:

  -c {command}
      {command} will be executed after the first file has been read.
      {command} is interpreted as an Ex command. If the {command}
      contains spaces it must be enclosed in double quotes (this
      depends on the shell that is used). Example: Vim "+set si" main.c
      Note: You can use up to 10 "+" or "-c" commands.

  -s {scriptin}
      The script file {scriptin} is read. The characters in the file
      are interpreted as if you had typed them. The same can be done
      with the command ":source! {scriptin}". If the end of the file
      is reached before the editor exits, further characters are read
      from the keyboard.

I thought for a moment that -c was jumping straight into esc-colon mode with the command line at the bottom of the screen, but the -c example above does not provide -c on the command line and I am just CONFUSED.

Jarno: what "other clones" were you referring to? Posix has -c, and does not have -s. Are we supporting non-posix clones other than vim? (Is there a default freebsd or macos version that ignores posix, maybe?)

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/vi.html

I do not have the domain expertise to understand the objection here.

> Rob

Still Rob
Re: [Toybox] [PATCH] vi: rename `-s` flag to `-c`
Sorry I'm a bit slow, I'm setting up a new laptop and haven't got email moved over yet. (Cutting the gordian knot of closing all the windows on the old laptop by just not waiting for that, since I _do_ have a spare laptop and bought a new hard drive for the new install anyway...)

On 6/5/24 00:46, Jarno Mäkipää wrote:
> You cannot test against other vi clones with -c after this patch. But
> you could have test against vim with -s {script} implementation. I
> used vim as reference for testing with original test case files.

If I'm cloning bash specifically for toysh, I don't have an objection to targeting vim specifically in a toybox vi implementation.

> Ex command only switch -c could be added as addition to -s if you
> wanna achieve something with ex commands, but maybe don't delete -s
> implementation, unless you have better way to test vi mode motions.

Right now the regression test contexts I'm paying attention to are busybox defconfig and debian's default install. How does this impact testing against those?

Rob
Re: [Toybox] [PATCH] Fix ionice's return value for getting process IO priority
On 5/30/24 07:23, enh via Toybox wrote:
> [note: this isn't my patch; it was
> https://android-review.googlesource.com/c/platform/external/toybox/+/3106282,
> and i'm just forwarding it. the attachment is the original patch with
> the original author's details, and i've cc:ed them. lgtm to me,
> though, and matches the next function in the same file.]
>
> In the user version, if you use ionice to get the process IO
> priority without permission, -1 will be returned, but Idle: prio 7
> will be printed at this time. This is an incorrect priority and
> should return permission denied.

Applied, and then I did 70a5259261ea cleanup on top of it (move the error handling into the syscall wrappers, renamed with x prefix, and clean up a couple printfs that didn't need to assign back to the global variables while I was there), which I don't THINK broke anything but I don't have a particularly thorough way to test it here?

I set the ioprio of a backgrounded "sleep" command, and it read back what I'd sent, but debian's and toybox's commands varied a bit:

  $ sleep 1000 &
  [1] 18361
  $ ./ionice -p 18361
  unknown: prio 0
  $ ionice -p 18361
  none: prio 0
  $ ./ionice -p 18361 -c 3 -n 2
  $ ./ionice -p 18361
  Idle: prio 2
  $ ionice -p 18361
  idle

Looks ok to me?

Rob
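The failure mode in the quoted report (a -1 error return printed as "Idle: prio 7") falls out of decoding the return value without checking it first. The following is a minimal sketch of that check, not the toybox code: decode_ioprio() and get_ioprio() are hypothetical names, the two constants are copied from the Linux ioprio ABI so the sketch builds without kernel headers, and it is Linux-only.

```c
#define _DEFAULT_SOURCE   // syscall() on glibc
#include <sys/syscall.h>
#include <unistd.h>

// Values from the Linux ioprio ABI (linux/ioprio.h), repeated here so the
// sketch does not need kernel headers.
#define IOPRIO_CLASS_SHIFT 13
#define IOPRIO_WHO_PROCESS 1

// Split a raw ioprio_get() result into scheduling class and level (0-7).
// Returns 0 on success, -1 if raw is an error return rather than a priority.
int decode_ioprio(long raw, int *cls, int *level)
{
  if (raw < 0) return -1;   // -1 would otherwise decode as class 3, level 7
  *cls = (raw >> IOPRIO_CLASS_SHIFT) & 7;
  *level = raw & 7;

  return 0;
}

// Hypothetical wrapper: query one process, leaving errno set on failure.
int get_ioprio(int pid, int *cls, int *level)
{
  return decode_ioprio(syscall(SYS_ioprio_get, IOPRIO_WHO_PROCESS, pid),
    cls, level);
}
```

With class 3 (idle) and level 2 packed as (3<<13)|2, this decodes to the "Idle: prio 2" shown in the transcript above.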
Re: [Toybox] strlower() bug
On 5/31/24 12:53, enh wrote:
>> Let's see... Ah:
>>
>> https://www.unicode.org/L2/L1999/UnicodeData.html
>>
>> That's a bit long. My suggestion had 9 decimal numbers, this has "IDEOGRAPHIC
>> TELEGRAPH SYMBOL FOR JANUARY" as one of fifteen fields, with "<compat> 0031
>> 6708" being another single field. How nice. (And still extensive warnings
>> that this doesn't cover everything. I think "too much is never enough" was
>> an MTV slogan back in the 1980s? Ah, it's from "The Marriage of Figaro" in
>> 1784.)
>
> citation needed? (or if you want me to keep trying to think of where
> that or something similar occurs in the libretto, at least tell me
> whether it's an aria or recitative :-) )

Sorry, not the Mozart one. And not the Italian one Mozart based his version on, but the original French version the Italian one was based on:

https://en.wikipedia.org/wiki/The_Marriage_of_Figaro_(play)

The quote gets translated a few ways out of the 300 year old French:

https://www.oxfordreference.com/display/10.1093/acref/9780191826719.001.0001/q-oro-ed4-0807

And to clarify again, I mean Wolfgang, not his equally (if not more) talented sister Maria who toured together with her sibling as child prodigies but was sidelined as soon as she reached "marriageable age" and had to teach piano for a living:

https://en.wikipedia.org/wiki/Maria_Anna_Mozart

Some letters from Wolfgang praising her compositions have survived, but her parents destroyed all her actual sheet music because it had cooties. Next time people talk about the "great men of history"... Don't get me started about Einstein's first wife.

>> In ascii, wcwidth() is basically isprint() plus "tab is weird".
>>
>> For unicode, wcwidth() comes into play. The unicode bureaucracy committee
>> being too microsofted to competently provide one is irrelevant to wcwidth()
>> not being needed for ascii.
>>
>> (I also note the assumption of monospaced fonts in all this.
Java's >> fontmetrics() was about measuring pixel counts in non-monospaced fonts, which >> this doesn't even contemplate.) > > this is why i keep telling you that wcwidth() only really makes sense > for tty-based stuff. and even there ... I need to figure out where to wrap lines in command line editing and text editors and so on. (I have been relieved of duty on vi, but I still need to make shell command line editing work. Plus fold and so on. And screen, and watch. Might do a nano-alike at some point. This is already sort of in top...) > i'm curious whether the > different terminal emulators actually behave the same in any of the > interesting cases. (_especially_ when you get to the "that can't > happen in well-formed text in the language that uses that script" > cases.) I have an ANSI probe sequence to ask where the cursor is, but even if I wanted to be that chatty (and didn't mind that the amount of time it takes to get a response is arbitrary and variable, with no response actually guaranteed to come anyway, and other input surrounding the response), if the output's already wrapped and scrolled the screen since the last time I asked it's bad. And if I _disable_ screen wrap then A) I dunno if it's truncated the output, B) lots of other stuff breaks (it's like leaving the screen in raw mode, only SUBTLY wrong, and yes QEMU does this from time to time and drives bash line editing NUTS, that's why run-qemu.sh echoes the relevant stop doing that sequence AND mkroot's init also outputs it)... Which means I need a wcwidth() to know how many columns the next character will advance the cursor in the terminal before outputting it. >> Not that I particularly want to ship a large ascii table either. When I dug >> into >> musl's take on this, I was mostly reverse engineering their compression >> format >> and then going "huh, yeah you probably do want to compress this". 
>> >> I could generate the table I listed with a C program that runs ispunct() and >> similar on every unicode code point and outputs the result. I could then >> compare >> what musl, glibc, and bionic produce for their output. The problem is it's >> not >> authoritative, it's downwind of the "macos is still using 2002 data" issue >> that >> keeps provoking this. :( > > i'm really confused that you keep mentioning ascii. if you really mean > ispunct() here, say, and not iswpunct(), The difference between them is that ispunct() has always taken an int but the C committee was cowardly and refused to make it actually respond to the whole range, so they created a new function to do the same thing. At least fseeko() can blame LP64 for long and pointer being the same size having splash damage. (Moore's Law didn't advance the components in a coordinated manner, we hit the need for >2 gig files ten years before we hit the need for >4gig system RAM and thus 64 bit registers...) (I suppose the C committee was fighting IBM and Microsoft for 10 years before utf8 happened, and then the unicode committee had Microsoft on it and thus combining
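The line-wrapping use of wcwidth() described earlier in this message (knowing how many columns the next character advances the cursor before outputting it) can be sketched as below. This is an illustration under assumptions, not code from toybox: chars_that_fit() is a hypothetical helper, and treating characters that wcwidth() rejects as zero columns is a simplification (tab and other control characters need their own handling).

```c
#define _XOPEN_SOURCE 700   // wcwidth() on glibc
#include <wchar.h>

// Return how many leading characters of s fit in cols terminal columns.
// Characters wcwidth() rejects (it returns -1) are counted as zero columns,
// which is an assumption; an editor would more likely escape them.
size_t chars_that_fit(const wchar_t *s, int cols)
{
  size_t i;
  int used = 0;

  for (i = 0; s[i]; i++) {
    int w = wcwidth(s[i]);

    if (w < 0) w = 0;
    if (used + w > cols) break;   // next character would cross the margin
    used += w;
  }

  return i;
}
```

A caller would call setlocale(LC_ALL, "") first so that wcwidth() uses the user's locale data rather than the default "C" locale.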
Re: [Toybox] strlower() bug
On 5/30/24 16:12, enh wrote: >> > hmm... looking at Apple's online FreeBSD code, it looks like they have >> > very different (presumably older) FreeBSD code >> > [https://opensource.apple.com/source/Libc/Libc-320.1.3/locale/FreeBSD/tolower.c.auto.html], >> > and the footer of the file that reads implies that they're using data >> > from Unicode 3.2 (released in 2002, which would make sense given the >> > 2002 BSD copyright date in the tolower.c source): >> >> Sigh, can't they just ship machine consumable bitmaps or something? > > because everyone wants different formats. even the same library has > changed over time. (and not just because characters went from 16 bits > to 21 bits!) Conversion from a simple format seems straightforward to me. Part of my frame of reference here is Tim Berners Lee inventing the 404 error. That was Tim's big advance that made HTML work where Ted Nelson's overdesigned hyper-cyber-iText didn't. Tim 80/20'd the problem by just handling the easy cases (we have the data) and punting the hard cases (updating links when they moved) to humans. Ted published his hyper-hype paper in 1965 and then failed to interest anyone in it for a quarter century before Tim made something actually useful (beating Gopher by about 6 months). Crediting Ted as the inventor of html is like crediting Jules Verne as the inventor of the submarine, or H.G. Wells as the (eventual) inventor of the time machine. (Lazerpig had a rant about this in his video on stealth planes: the inventor is the person who made it WORK, not who came up with the idea of humans flying or a knob on the wall that controls the air temperature.) So to me, the question is "how much can we put in a simple format", and then have a list of broken characters you need an exception handler function for. How do we 80/20 this? 
>> I can have
>> my test plumbing pull "standards" files, ala:
>>
>> https://github.com/landley/toybox/blob/master/mkroot/packages/tests
>>
>> But an organization shipping a PDF or 9 interlocking JSON files with a turing
>> complete stylesheet doesn't help much.
>
> (not really the point, but the one you want for the stuff you're
> talking about here is actually just a text file.

Let's see... Ah:

https://www.unicode.org/L2/L1999/UnicodeData.html

That's a bit long. My suggestion had 9 decimal numbers, this has "IDEOGRAPHIC TELEGRAPH SYMBOL FOR JANUARY" as one of fifteen fields, with "<compat> 0031 6708" being another single field. How nice. (And still extensive warnings that this doesn't cover everything. I think "too much is never enough" was an MTV slogan back in the 1980s? Ah, it's from "The Marriage of Figaro" in 1784.)

  aosp/external/icu/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode/UnicodeData.txt
  aosp/external/icu/android_icu4j/src/main/tests/android/icu/dev/data/unicode/UnicodeData.txt
  aosp/external/icu/icu4c/source/data/unidata/UnicodeData.txt
  aosp/external/pcre/maint/Unicode.tables/UnicodeData.txt
  aosp/external/cronet/third_party/icu/source/data/unidata/UnicodeData.txt
  aosp/out/soong/workspace/external/cronet/third_party/icu/source/data/unidata/UnicodeData.txt

Android seems to have checked in multiple copies of this file.

  $ for i in $THAT; do [ -n "$OLD" ] && diff -u $OLD $i; OLD=$i; done | grep +++
  +++ aosp/external/pcre/maint/Unicode.tables/UnicodeData.txt 2023-08-18 15:16:31.239657629 -0500
  +++ aosp/external/cronet/third_party/icu/source/data/unidata/UnicodeData.txt 2023-08-18 15:14:44.351661450 -0500

And I need to re-pull my tree for them to match.

> i've repeatedly been
> tempted to teach unicode(1) to read it, since it's always installed on
> macOS and debian anyway [for values of "always" that include "all my
> machines, anyway"], to be able to show far more information about any
> given character.)

I've thrown a note on the todo heap...
>> Which is _sad_ because there's only a dozen ispunct() variants that read a
>> bit out of a bitmap (and haven't significantly changed since K&R: neither
>> isblank() nor isascii() is worth the wrapper), plus a toupper/tolower pair
>> that map integers with "no change" being the common case.
>
> (one of the things you'll learn from parsing the file is that that's
> not how toupper()/tolower() works for all characters. plus there's
> titlecase. plus case folding.)

"For all characters". I'm just looking for low hanging fruit and a list of exceptions to punt to a function.

>> Plus unicode has wcwidth().
>
> no, it doesn't. (i wouldn't be maintaining my own if it did!)

In ascii, wcwidth() is basically isprint() plus "tab is weird".

For unicode, wcwidth() comes into play. The unicode bureaucracy committee being too microsofted to competently provide one is irrelevant to wcwidth() not being needed for ascii.

(I also note the assumption of monospaced fonts in all this. Java's fontmetrics() was about measuring pixel counts in non-monospaced fonts, which this doesn't even contemplate.)

>> So code, alpha, cntrl, digit, punct, space, width, upper,
Re: [Toybox] this week in weird coreutils stuff: chmod
On 5/30/24 14:59, enh wrote:
>> *shrug* Removing all uses of mode_t and using "unsigned" instead consistently
>> should work fine. Only "struct stat" should really care, and even then they
>> could just use the actual primitive type in the struct definition...
>>
>> (I'm not a fan of data hiding without some _reason_ for it. I used to humor
>> it a lot more, but now I want to know what/why it's doing.)
>
> funnily enough, i'm having exactly this argument with the person who
> asked for this chmod functionality, since they own the ABI checker,
> and i'm claiming that telling me that i've "changed" u_int32_t to
> uint32_t is not helpful, and that when talking about ABI i always want
> the underlying type :-)

LP64 remains a good idea. Pity the ANSI C committee for C89 didn't have a spine.

(Yeah they had to navigate the 16->32 bit transition, but LP32 for 32-bit ANSI C systems would have made OBVIOUS SENSE. "There are older systems that aren't LP32" was a given. The 68k came out in 1980 and the 386 in 1985, they weren't exactly taken by surprise by 32 bit registers and a flat memory model...)

Rob
Re: [Toybox] [PATCH] microcom, stty: Use TCSADRAIN to set tty device attribute
On 5/20/24 23:43, Yi-Yo Chiang via Toybox wrote:
> Don't flush the tty device input queue when setting device attribute.
>
> Rationale:
> * microcom: The tty device might already have a _good enough_ termios
>   to read data from. Flushing the input queue just to set the terminal
>   attribute would result in data loss in this case.
> * stty: This command doesn't read nor write data to the tty device, so
>   flushing the input queue doesn't make sense here. The program
>   actually talking to the tty should decide if it wants the tty
>   flushed or not.
>   Note: other implementations of stty also use TCSANOW (bsd) or
>   TCSADRAIN (coreutils), so I think this assumption is fine.

Was commit 2043855a4bd5 sufficient or do you still have a use case that needs this?

(Email dated the 20th containing a patch dated the 21st vs a git commit applied the 23rd...)

Rob
Re: [Toybox] this week in weird coreutils stuff: chmod
On 5/29/24 14:20, enh wrote:
> seems to have broken the macOS build?
> ```
> lib/lib.c:953:10: error: conflicting types for 'string_to_mode'
> unsigned string_to_mode(char *modestr, unsigned mode)
>          ^
> ./lib/lib.h:413:10: note: previous declaration is here
> unsigned string_to_mode(char *mode_str, mode_t base);
>          ^
> ```

Oops, missed one. Try commit 3c276ac106a4.

So what _is_ mac using... Sigh:

  /Library/Developer/CommandLineTools/SDKs/MacOSX13.1.sdk/usr/include/sys/_types/_mode_t.h:typedef __darwin_mode_t mode_t;
  /Library/Developer/CommandLineTools/SDKs/MacOSX13.1.sdk/usr/include/sys/_types.h:typedef __uint16_t __darwin_mode_t;/* [???] Some file attributes */

They typedef it to unsigned short instead of unsigned int. Even though type promotion will pass an int on the stack for anything smaller than an int, and use an int register to do the math...

I guess back in 1974 "int" was a 16 bit type, and they stuck with that in the move to 32 and then 64 bit processors because SUGO times 3 bits each is only using 12 of those 16 bits, leaving 4 for file types and we've only used 7 of those 16 combinations for IFDIR and IFBLK and so on (well, 8 on mac but the header says IFWHT is obsolete), clearly that will never run out...

*shrug* Removing all uses of mode_t and using "unsigned" instead consistently should work fine. Only "struct stat" should really care, and even then they could just use the actual primitive type in the struct definition...

(I'm not a fan of data hiding without some _reason_ for it. I used to humor it a lot more, but now I want to know what/why it's doing.)

Rob
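The bit budget described above (12 permission bits plus a 4 bit file type filling exactly 16 bits) can be checked with the standard S_IF* macros. A small sketch, with type_code() and perm_bits() as hypothetical helper names that are not part of toybox:

```c
#define _DEFAULT_SOURCE   // S_IFMT and friends on glibc
#include <sys/stat.h>

// The top 4 of the 16 mode bits select the file type
// (S_IFDIR, S_IFBLK, S_IFREG, and so on).
unsigned type_code(unsigned mode)
{
  return (mode & S_IFMT) >> 12;
}

// Everything below the type field: setuid, setgid, and sticky, then
// read/write/execute for user, group, and other. Twelve bits in total.
unsigned perm_bits(unsigned mode)
{
  return mode & 07777;
}
```

Taking plain unsigned here rather than mode_t is the same choice the message argues for: the value fits either way, and the prototype no longer depends on how a given libc typedefs mode_t.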
Re: [Toybox] this week in weird coreutils stuff: chmod
On 5/28/24 08:00, enh via Toybox wrote:
> apparently chmod allows something like
>
>   chmod u+rwX-s,g+rX-ws,o+rX-wt
>
> as a (far less readable!) synonym for
>
>   chmod u+rwX,u-s,g+rX,g-ws,o+rX,o-wt
>
> i'm told that toybox silently accepts the former too, but does not
> interpret it as if it means the latter?

Try commit a2c4a53e155c. (Needed to zero a variable inside the loop rather than just once at the beginning. Random cleanups while I was there, plus tests.)

Rob
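To illustrate why chained operators need per-operator state, here is a deliberately small sketch of a symbolic mode parser. It is not the toybox string_to_mode() implementation: apply_symbolic() is a hypothetical name, it handles only "+" and "-" (no "=", umask, octal, or permission copying), and the guess that the accumulated permission bits are what needed zeroing inside the loop is an assumption, since the message does not say which variable it was.

```c
#include <string.h>

// Apply a symbolic mode string to the permission bits in *mode.
// Grammar handled: [ugoa]*([+-][rwxXst]*)+ clauses separated by commas.
// Returns 0 on success, -1 on input it does not understand.
int apply_symbolic(const char *s, unsigned *mode)
{
  while (*s) {
    unsigned who = 0;

    for (; *s && strchr("ugoa", *s); s++) {
      if (*s == 'u') who |= 04700;
      else if (*s == 'g') who |= 02070;
      else if (*s == 'o') who |= 01007;
      else who |= 07777;
    }
    if (!who) who = 07777;
    if (*s != '+' && *s != '-') return -1;

    // One clause can chain several operators: u+rwX-s
    while (*s == '+' || *s == '-') {
      char op = *s++;
      unsigned bits = 0;   // must start empty for each operator

      for (; *s && strchr("rwxXst", *s); s++) {
        if (*s == 'r') bits |= 0444;
        else if (*s == 'w') bits |= 0222;
        else if (*s == 'x') bits |= 0111;
        // X: execute only if some execute bit is already set
        // (real chmod also sets it for directories, ignored here)
        else if (*s == 'X') bits |= (*mode & 0111) ? 0111 : 0;
        else if (*s == 's') bits |= 06000;
        else bits |= 01000;
      }
      bits &= who;
      if (op == '+') *mode |= bits;
      else *mode &= ~bits;
    }

    if (*s == ',') s++;
    else if (*s) return -1;
  }

  return 0;
}
```

If bits were initialized once per clause instead of once per operator, the "-s" in "u+rwX-s" would also strip the rwX bits that the "+" had just added, and the chained form would stop matching the expanded one.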
Re: [Toybox] strlower() bug
On 5/22/24 09:30, enh wrote: > On Tue, May 14, 2024 at 2:58 PM Rob Landley wrote: >> It looks like macos towlower() refuses to return expanding unicode >> characters. >> Possibly to avoid exactly the kind of bug this fixed, in exchange for >> corrupting >> the data. > > yeah, i don't know whether it's on purpose or a bug, but that does > seem to be the case... i tested with another Latin Extended-B > character whose uppercase and lowercase forms are both in the same > block (and thus have the same utf8 encoding length), and macOS > towlower() does work for that. > > hmm, actually maybe it's just that their Unicode data is out of date? > it looks like they don't know about Latin Extended-C at all? a code > point like U+2c62 that gets _smaller_ (because it's in the IPA > Extensions block) doesn't work either. > > i did try looking in FreeBSD, but i've never understood how this stuff > works there. FreeBSD questions go to Ed Maste who is theoretically subscribed here but keeps getting unsubscribed by gmail bounces. > i'm guessing from the fact i've never found them that the > implementations are all generated at build time, subtly enough that my > attempts to grep for the generators fail. > > hmm... looking at Apple's online FreeBSD code, it looks like they have > very different (presumably older) FreeBSD code > [https://opensource.apple.com/source/Libc/Libc-320.1.3/locale/FreeBSD/tolower.c.auto.html], > and the footer of the file that reads implies that they're using data > from Unicode 3.2 (released in 2002, which would make sense given the > 2002 BSD copyright date in the tolower.c source): Sigh, can't they just ship machine consumable bitmaps or something? I can have my test plumbing pull "standards" files, ala: https://github.com/landley/toybox/blob/master/mkroot/packages/tests But an organization shipping a PDF or 9 interlocking JSON files with a turing complete stylesheet doesn't help much. 
> so, yeah, i don't think there was anything clever or mysterious going
> on here --- macOS is just using Unicode data from 22 years ago. (which
> is an amusing real-world example of why i keep saying "you probably
> don't want to get into the business of redistributing Unicode data; it
> changes every year" :-) )

A youtuber named Ryan McBeth is fond of explaining the difference between a "problem" and a "dilemma". A problem has an obvious solution, which may be painful or expensive but there's not a lot of disagreement on what success looks like. A dilemma has multiple ways to address it, each of which has something uniquely wrong with it. Problems don't lead to indecision, dilemmas do (and thus accumulate).

In this case, the dilemma is "trusting libc to get it wrong differently in each new environment" vs "taking a large expense onboard with borderline xkcd violation". (If there is an xkcd strip explaining why not to do something, you probably shouldn't do it. In this case https://xkcd.com/927/ )

Which is _sad_ because there's only a dozen ispunct() variants that read a bit out of a bitmap (and haven't significantly changed since K&R: neither isblank() nor isascii() is worth the wrapper), plus a toupper/tolower pair that map integers with "no change" being the common case. Plus unicode has wcwidth(). Yes, it's over a (sparse!) table with space for a million entries, but CSV encoding all that data in human+machine readable ASCII should gzip down to what, 500k?

Let's see, the bits seem to be alpha, cntrl, digit, punct, and space, and then width (mostly 0, 1, or 2 but we've talked about exceptions), and two translation codepoints for toupper and tolower. You can easily derive isalnum() and isxdigit(), and isascii() and isblank() are trivial according to the man page. If the table has upper and lower mappings (I.E.
what character this turns into, zero if it doesn't) then you don't need isupper() or islower() bits unless there's cases where "this isn't upper case but can be converted to lower case" (which aren't covered by having BOTH toupper() and tolower() mappings for the same character). I'm honestly unclear on what "isgraph" does, "any printable character except space"... if isprint() means "not width 0" then that's just adding && !isspace() so doesn't need to be in the table.

So code, alpha, cntrl, digit, punct, space, width, upper, lower. Something like:

  0,0,0,0,0,0,0,0,0
  13,0,1,0,0,1,0,0,0
  32,0,0,0,0,1,1,0,0
  57,0,0,1,0,0,1,0,0
  58,0,0,0,1,0,1,0,0
  65,1,0,0,0,0,1,0,97

No, that doesn't cover weird stuff like the right-to-left gearshift or the excluded mapping ranges or even the low ascii characters having special effects like newline and tab, but those aren't really "characters" are they? Special case the special cases, don't try to represent them
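The nine-column row format proposed above (code, alpha, cntrl, digit, punct, space, width, upper, lower) can be produced from whatever the local libc believes, which is the "not authoritative" generator idea mentioned elsewhere in this thread. A hedged sketch: table_row() and dump_table() are hypothetical names, and reporting a negative wcwidth() as 0 is an assumption made to match the sample rows.

```c
#define _XOPEN_SOURCE 700   // wcwidth() on glibc
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>

// Format one row: code,alpha,cntrl,digit,punct,space,width,upper,lower
// upper and lower are 0 when the mapping leaves the character unchanged.
// Returns what snprintf() returns.
int table_row(char *buf, size_t len, wint_t c)
{
  int width = wcwidth(c);
  wint_t up = towupper(c), low = towlower(c);

  return snprintf(buf, len, "%u,%d,%d,%d,%d,%d,%d,%u,%u", (unsigned)c,
    !!iswalpha(c), !!iswcntrl(c), !!iswdigit(c), !!iswpunct(c),
    !!iswspace(c), width < 0 ? 0 : width,
    (unsigned)(up == c ? 0 : up), (unsigned)(low == c ? 0 : low));
}

// Write a row for every code point, one per line. The caller should
// setlocale(LC_ALL, "") first, or only the "C" locale's data shows up.
void dump_table(FILE *out)
{
  char row[64];
  wint_t c;

  for (c = 0; c <= 0x10FFFF; c++) {
    table_row(row, sizeof(row), c);
    fprintf(out, "%s\n", row);
  }
}
```

Running dump_table() under musl, glibc, and bionic and diffing the results would show where the libcs disagree, though as the thread notes the output is only as current as each libc's Unicode data.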
Re: [Toybox] microcom.c discarding data due to TCSAFLUSH
On 5/20/24 09:42, Yi-Yo Chiang via Toybox wrote:
> Is there any particular reason to use TCSAFLUSH here?

Partly because it's what strace said busybox and minicom were doing, and partly because I've had serial hardware that produced initial static on more than one occasion.

In this case, it looks like Elliott also put it in his initial contribution (commit 12fcf08b5c96).

> If not, can we change to TCSADRAIN or TCSANOW. I don't think there is good
> reason to _discard received data_ just to set the terminal mode...? Is there
> really a real world case that the device termios is so dirty that all data,
> from before setting raw mode, must be discarded?

I've seen multiple instances where there was initial noise from the port going live before the speed stabilized, or static from a physical connection plugging in or powering up, or truncated bootloader messages that filled up the input buffer then abruptly cut off.

> I also tried to modify the microcom code to skip tcsetattr() if the device
> termios is already equal to the mode we are setting it.
> `if (old_termios != new_termios) tcsetattr(new_termios, TCSAFLUSH)`
> However this doesn't work because microcom always tries to set the device
> baud.

Hmmm, you're right, it shouldn't mess with that unless we specify -s.

I could also make TCSAFLUSH only happen when we do -s (because otherwise it's an existing connection and we're not messing with it, but I still need to make sure it's in raw mode)...

Note: FLAG(s)*TCSAFLUSH becomes 0 (TCSANOW) in the absence of -s.

> For example a pty device might be configured to use baud 38400,

Why set the baud at all on a pty? A pseudo-terminal doesn't have a baud rate, leave it alone. (You can also inherit a serial port that was set up by the bootloader and should again just be left alone...)

> but microcom would want it to be 115200, thus flushing its data. but pty
> doesn't really care about the baud most of the time AFAIK, so flushing data
> in this case just seems disruptive to the user experience.

Setting baud rate and flushing are two different switches in the interface, but in this case flushing only when setting the baud rate seems a good use of the existing controls.

Try commit 2043855a4bd5.

Rob
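The policy settled on above (always switch to raw mode, but only touch the speed and only discard pending input when a speed was explicitly requested) can be sketched as follows. This is an illustration and not the microcom source: pick_action() and set_raw() are hypothetical names, and the ternary is a portable spelling of the FLAG(s)*TCSAFLUSH trick, which relies on TCSANOW being 0.

```c
#define _DEFAULT_SOURCE   // cfmakeraw() and cfsetspeed() on glibc
#include <termios.h>

// Discard pending input only when the caller is also changing the speed.
int pick_action(int setting_speed)
{
  return setting_speed ? TCSAFLUSH : TCSANOW;
}

// Put fd into raw mode. If set_speed is nonzero, also set the line speed
// (a Bxxx constant such as B115200) and flush. Returns 0 on success, -1 on
// error with errno set by the failing termios call.
int set_raw(int fd, int set_speed, speed_t speed)
{
  struct termios t;

  if (tcgetattr(fd, &t)) return -1;
  cfmakeraw(&t);
  if (set_speed && cfsetspeed(&t, speed)) return -1;

  return tcsetattr(fd, pick_action(set_speed), &t);
}
```

A pty or a port the bootloader already configured would then be opened with set_speed of 0, leaving both its speed and any buffered input alone.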
Re: [Toybox] [PATCH] netcat: clarify documentation.
On 5/20/24 07:06, enh via Toybox wrote:
> "Collate" means "sort", but -O is like -o other than buffering.

It means "group". (The dictionary says "gather or arrange in the proper sequence".)

I can see the confusion, but the collate button on the copier in high school stapled pages together (the pages came out the same order either way, the question was should the groups be attached). I also had a data entry work-study job in college where I had to "collate" reports (basically doing the same thing by hand, except using transparent file folders and this little plastic strip that slid along the edge to hold the pages in).

We just had a thread about "buffering", and I find that _less_ illuminating in context.

Sigh, "grouped", "streamed", "together", "terse", "what I thought it was doing until I actually compared the output side by side", "showing the actual data instead of the transaction boundaries that survived the nagle algorithm", "assembled", "congregated", "collected", "packaged", "decluttered", "thesaurus"...

How does "packed" sound?

Rob
Re: [Toybox] [PATCH] xputs: Do flush
On 5/20/24 08:32, Yi-Yo Chiang wrote:
> Thanks! Adding TOYFLAG_NOBUF worked.
>
> I feel the same way about "manual flushing of the output buffer is a terrible
> interface". I asked myself the question "Why am I manually flushing so much?
> There must be a better way..." multiple times when I wrote the other patch
> that does s/xprintf/dprintf/, s/xputs/xputsn/

It's an annoying design quandary.

> > Your other patch changes a bunch of xprintf() to dprintf() which is even
> > _more_ fun because dprintf() writes directly to the file descriptor
> > (bypassing the buffer in the libc global FILE * instance "stdout"), which
> > means in the absence of manual flushing the dprintf() data will go out
> > first and then the stdio data will go out sometime later, in the wrong
> > order. Mixing the two output formats tends to invert the order that way,
> > unless you disable output buffering.
>
> Which is the reason I replaced those all with the "flush" functions (xputsn)
> or direct fd-write functions (dprintf), so that their order won't shuffle.
> Anyway the problem is moot now that we have TOYFLAG_NOBUF.

Eh, not moot. Shifted. Currently there's one command using TOYFLAG_NOBUF, and a lot of recent buffering fixes: ea119151ccc5 59b041d14aec afeed2d46a9a a57e42a386b0 ca6bde9e1c43

I should probably audit all the commands and figure out which buffering type to use for each. (grep currently finds manual fflush() in hexedit, login, watch, and ps). But not today...

> > But that hasn't been popular, and it's a pain to implement in userspace
> > (because we don't have access to multiple cheap timers like the kernel
> > does, we need to take a signal and there's both a limited number of
> > signals).
>
> do you run on anything that doesn't have real-time signals? i was
> going to say that -- since toybox is a closed world -- you could just
> use SIGUSR2, but i see that dhcp is already using that! but if you can
> assume real-time signals, there are plenty of them...

Within toybox I could probably come up with something, true. Although fflush() locking is still a bit problematic if I'm not depending on thread infrastructure. (Either I don't use FILE * and do it myself, or I require libc to be thread aware.)

Rob
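The ordering inversion from mixing stdio with direct descriptor writes, as described in the quoted text above, is easy to reproduce in isolation. The sketch below uses a pipe in place of stdout so the result can be read back; write_order() is a hypothetical helper written for this illustration, not a toybox function.

```c
#define _POSIX_C_SOURCE 200809L   // dprintf(), fdopen()
#include <stdio.h>
#include <unistd.h>

// Send "A" through a block-buffered FILE * and "B" straight to the same
// descriptor, then read back what a consumer would see. With flush_first
// set, the stdio buffer is drained before the direct write.
// Returns 0 on success, -1 on error; out receives a NUL-terminated string.
int write_order(char *out, size_t len, int flush_first)
{
  int fds[2];
  FILE *fp;
  ssize_t got;

  if (pipe(fds)) return -1;
  if (!(fp = fdopen(fds[1], "w"))) {
    close(fds[0]);
    close(fds[1]);
    return -1;
  }
  setvbuf(fp, 0, _IOFBF, BUFSIZ);   // block buffering, like stdout to a pipe

  fputs("A", fp);                   // sits in the FILE * buffer
  if (flush_first) fflush(fp);
  dprintf(fds[1], "B");             // reaches the pipe immediately
  fclose(fp);                       // flushes what is left, closes fds[1]

  got = read(fds[0], out, len - 1);
  close(fds[0]);
  if (got < 0) return -1;
  out[got] = 0;

  return 0;
}
```

Without the flush the reader sees "BA" even though the program wrote "A" first; with it the reader sees "AB". Either every write on the descriptor goes through the same buffer, or each buffered write is flushed before a direct one.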
Re: [Toybox] [PATCH] xputs: Do flush
On 5/20/24 07:36, enh wrote: >> Adding flushing to xputs() is an Elliott question is because he's the one who >> can presumably keep stdio buffer semantics in his head. They make zero sense >> to >> me. I added a flush to xputsn() because in line buffering mode it was the >> "dangerous" one that had no newline thus no flush, but then when we go to >> block >> buffering mode xputs() needs a flush just like xputsn() would, and MAYBE it's >> good to have the flush because in line buffer mode it would be a NOP? Except >> the >> command selected block buffering mode precisely BECAUSE it didn't want to >> flush >> each line, so why should xputs() do it when the command asked not to? And if >> xputs() is doing it, why is xprintf() _not_ doing it? And if xprintf() _is_ >> doing it, then we're back to basically not buffering the output... > > this to me was exactly why it should be "everything flushes" or > "nothing flushes". not "some subset of calls for some subset of > inputs", because no-one can keep all the special cases in their heads. > and "everything flushes" is problematic from a performance > perspective, so "nothing flushes" wins by default. (but, yes, when we > have our own kernel, have a time-based component to buffering layer's > flushing is an interesting idea :-) ) Eh, now Yi-Yo's pointed me back at timer_create() and reminded me of realtime signals, it seems like the plumbing is there to make FILE * output use nagle. The problem is a userspace wrapper trying to fflush() from signal context assumes everything's written in an async-safe way (and of course that everyone else will SA_RESTART when interrupted), which either means spray it down with thread locking or use cmpxchg() within the flush() implementation, and either way involves me trusting libc in a way I currently don't. (And I can't do it "right" myself due to FILE * internals being opaque, I have to wrap an unknown implementation...) 
But it sounds quite feasible for _libc_ to do setvbuf(_IONAG) these days. :) (Sigh, or threads with signalfd(). Grrr.) >> I like systematic solutions that mean the problem never comes up again. >> Elliott >> has been advocating the whack-a-mole "fix them as we find them" approach here >> which I am not comfortable with. I've been leaning towards adding a >> TOYFLAG_NOBUF so we can select a third buffering type, and go back to "do not >> buffer output at all, flush after every ANSI write function" for at least >> some >> commands I'd like to be able to reason about. Especially for commands like >> this >> where output performance really doesn't seem like it should be an issue. > > +1 --- an inherently interactive program like this seems reasonable to > be unbuffered. (except for file transfer?) Isn't file transfer sending 4k blocks? Buffering gets really weird with this kind of program anyway: when you're sending data across a serial port that's breaking it up into individually transmitted bytes, and depending on what your 16550a-or-similar threshold is set to the recipient's probably getting notified of each group of 8 bytes. (And yes, the hardware uses SOMETHING LIKE NAGLE internally to enforce the input and output notification thresholds.) Then you layer ppp over it, which breaks your 4k into something like 1.5k chunks, then does nagle on that trying to fill out the last packet, and that's assuming there were no pipes in there which do their own re-collating on the data. And then of course files, once there's filesystem involvement... but of course the VFS is in there before that marshalling data into and out of page cache... The point of the output buffer is to deal with chunks of data "big enough" to amortize the transaction overhead. Zerocopy of the data has always been somewhat aspirational, and handing off buffers between page table contexts is often more expensive than copying it. (Or not! 
It changes between hardware generations and I didn't even PRETEND to be current on how the "mitigations for cache speculation side channel attacks" differ between different kernel versions running on different arm processors... There comes a point where "locality within process good, launch largeish buffer out into the operating system, wave bye-bye as it goes off into the machinery" is the best I can do. Although the definition of largeish still has the dregs of moore's law clinging to it with the recent 4k->256k push. (xmodem had 128 byte packets, I suppose it's roughly the same jump...) >> https://lists.gnu.org/archive/html/coreutils/2024-03/msg00016.html > > (fwiw, i think that was just some internet rando asking for it, no? > and they didn't actually implement it?) Padraig's reply was "this does seem like useful functionality" and a pointer to the libc people, and then there were over a dozen additional replies in the thread, so I wouldn't call it a clear no... > do you run on anything that doesn't have real-time signals? i was > going to say that -- since toybox is a closed world -- you
Re: [Toybox] [PATCH] xputs: Do flush
On 5/18/24 21:53, Yi-Yo Chiang wrote:
> What I wanted to address with this patch are:
>
> 1. Fix this line of xputs()
> https://github.com/landley/toybox/blob/master/toys/net/microcom.c#L113
> The prompt text is not flushed immediately, so it is not shown to the user
> until the escape char is entered (which defeats the purpose of the prompt,
> that is to

I agree you've identified two problems (unflushed prompt, comment not matching code) that both need to be fixed. I'm just unhappy with the solutions, and am concerned about a larger design problem.

I implemented TOYFLAG_NOBUF and applied it to this command. The result compiles but I'm not near serial hardware at the moment, does it fix the problem for you?

Trying to fix it via micromanagement (adding more flushing and switching some but not all output contexts in the same command between FILE * and file descriptor) makes my head hurt...

Adding flushing to xputs() is an Elliott question because he's the one who can presumably keep stdio buffer semantics in his head. They make zero sense to me. I added a flush to xputsn() because in line buffering mode it was the "dangerous" one that had no newline thus no flush, but then when we go to block buffering mode xputs() needs a flush just like xputsn() would, and MAYBE it's good to have the flush because in line buffer mode it would be a NOP? Except the command selected block buffering mode precisely BECAUSE it didn't want to flush each line, so why should xputs() do it when the command asked not to? And if xputs() is doing it, why is xprintf() _not_ doing it? And if xprintf() _is_ doing it, then we're back to basically not buffering the output...

> tell the user what the escape char is) and stdout is flushed by handle_esc.
> To fix this we either make xputs() flush automatically, or we just add a
> single line of manual flush() after xputs() in microcom.c.
> Either is fine with me.
When I searched for the first xputs in microcom I got:

    xputsn("\r\n[b]reak, [p]aste file, [q]uit: ");
    if (read(0, &input, 1)<1 || input == CTRL('D') || input == 'q') {

Which is a separate function (the n version is no newline, it does not add the newline the way libc puts() traditionally does), with its own flushing semantics: xputsn() doesn't call xputs(), and neither calls or is called by xprintf(). "Some functions flush, some functions don't" is a bit of a design sharp edge.

The larger problem is that manual flushing of the output buffer is a terrible interface, and leads to missing error checking, without which a command won't reliably exit when its output terminal closes (the whole SIGPIPE thing was its own can of worms that even bionic used to manually mess with). Which is why I originally made toybox not ever do that (systemic fix) but I got complaints about performance.

Your other patch changes a bunch of xprintf() to dprintf() which is even _more_ fun because dprintf() writes directly to the file descriptor (bypassing the buffer in the libc global FILE * instance "stdout"), which means in the absence of manual flushing the dprintf() data will go out first and then the stdio data will go out sometime later, in the wrong order. Mixing the two output formats tends to invert the order that way, unless you disable output buffering.

I like systematic solutions that mean the problem never comes up again. Elliott has been advocating the whack-a-mole "fix them as we find them" approach here which I am not comfortable with. I've been leaning towards adding a TOYFLAG_NOBUF so we can select a third buffering type, and go back to "do not buffer output at all, flush after every ANSI write function" for at least some commands I'd like to be able to reason about. Especially for commands like this where output performance really doesn't seem like it should be an issue.
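The ordering inversion from mixing FILE * output with dprintf() can be reproduced in a few lines. This is a minimal sketch assuming a POSIX libc; the function name mixed_output_order() is made up for the demonstration, and a pipe stands in for a redirected stdout so the result can be read back.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

// Write "first" through a block buffered FILE * and "second" straight to the
// same file descriptor, then collect what a reader of the pipe actually sees.
// Returns the number of bytes stored in out (NUL terminated), or -1 on
// setup failure.
int mixed_output_order(char *out, size_t size)
{
  int fds[2];
  size_t len = 0;
  ssize_t rc;
  FILE *fp;

  if (!size || pipe(fds)) return -1;
  if (!(fp = fdopen(fds[1], "w"))) {
    close(fds[0]);
    close(fds[1]);
    return -1;
  }
  setvbuf(fp, NULL, _IOFBF, BUFSIZ);  // block buffering, no flush per line

  fputs("first\n", fp);               // stays in the stdio buffer
  dprintf(fds[1], "second\n");        // goes to the descriptor immediately
  fclose(fp);                         // flushes "first\n", closes write end

  while (len < size-1 && (rc = read(fds[0], out+len, size-1-len)) > 0)
    len += rc;
  out[len] = 0;
  close(fds[0]);

  return (int)len;
}
```

Called with a 64 byte buffer this leaves "second\nfirst\n" in it. Switching the stream to _IONBF, or calling fflush() before the dprintf(), restores the expected order.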
And OTHER implementations can't consistently get this right, which is why whether: for i in {1..100}; do echo -n .; sleep .1; done | less Produces any output before 10 seconds have elapsed is potluck, and varies from release to release of the same distro. Oh, and the gnu/crazies just came up with a fourth category of write as a gnu/extension: flush after NUL byte. https://lists.gnu.org/archive/html/coreutils/2024-03/msg00016.html It's very gnu to fix "this is too complicated to be reliable" by adding MORE complication. Note how the problem WE hit here was 1) we didn't ask for LINEBUF mode, 2) \r doesn't count as a line for buffer flushing purposes anyway, 3) the new feature making it trigger on NUL instead _still_ wouldn't make \r count as a line for buffer flushing purposes. My suggestion for a "proper fix" to the problem _category_ of small writes being too expensive was to have either libc or the kernel use nagle's algorithm for writes from userspace, like it does for network connections. (There was a fix to this category of issue decades ago, it just never got applied HERE.) But that hasn't been popular, and it's a pain to implement in
Re: [Toybox] [PATCH] xputs: Do flush
On 5/16/24 06:46, Yi-Yo Chiang via Toybox wrote:
> The comment string claims xputs() to write, flush and check error.
> However the 'flush' operation is actually missing due to 3e0e8c6
> changing the default buffering mode from 'line' to 'block'.

That's sort of an Elliott question?

Originally, xprintf() and friends all flushed (which is necessary to detect output errors and xexit() if so), but Elliott complained that was too slow, so the flushes got removed, and then we changed the default stdout buffering type, and...

Alas, it was a whole multi-year thing. Elliott has volunteered to put manual flushes everywhere it's a problem. I've seriously thought about going exclusively to file descriptor output (dprintf() is in posix now) and leaving FILE * for input only.

Personally, I honestly believe the _proper_ fix is to upgrade the kernel to use vdso to implement nagle's algorithm on file descriptor 1:

https://landley.net/notes-2024.html#28-04-2024

But I'm not holding my breath.

Rob

P.S. I should post some subset of https://landley.net/bin/mkroot/latest/linux-patches/ to linux-kernel again. So they can be ignored again.
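Why the flush is needed to detect output errors at all can be shown with a stream whose reader has gone away. A minimal sketch, assuming a POSIX libc: puts_flush_check() is a hypothetical stand-in for the "write, flush and check error" behavior the comment describes (it is not the toybox xputs() source), and stream_with_closed_reader() is a helper invented purely for the demonstration.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

// Write a line, flush it, and report whether the stream is still healthy.
// Returns 0 on success, -1 if the write or the flush failed.
int puts_flush_check(FILE *fp, const char *s)
{
  if (fputs(s, fp) == EOF || fputc('\n', fp) == EOF) return -1;
  if (fflush(fp) == EOF || ferror(fp)) return -1;

  return 0;
}

// Demonstration helper: a block buffered stream nobody is reading from.
FILE *stream_with_closed_reader(void)
{
  int fds[2];
  FILE *fp;

  signal(SIGPIPE, SIG_IGN);   // get EPIPE from write() instead of dying
  if (pipe(fds)) return NULL;
  close(fds[0]);
  if (!(fp = fdopen(fds[1], "w"))) close(fds[1]);

  return fp;
}
```

On the broken stream the fputs() itself succeeds, because the data only lands in the stdio buffer. The failure surfaces at fflush(), so code that never flushes never finds out its output went nowhere.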
Re: [Toybox] netcat -f bug
On 5/11/24 02:11, Yi-Yo Chiang wrote: > On Sat, May 11, 2024 at 1:30 AM Rob Landley <mailto:r...@landley.net>> wrote: > > What's your use case triggering this patch? Because without that, I go > off on > various design tangents, as seen below: > > I just wanted some tool to communicate with a pty or socket node on android. > Wanted a program to be able to send/recv towards a duplex data stream. (more > precisely I want a command that does exactly what pollinate() does) > Since socat nor minicom is available on Android, I'm just using `stty raw > -echo > && nc -f` to "talk" to my pty. > > Why didn't I use <> redirector? Because I wasn't aware of that feature before > reading this mail... > Let me fiddle with it a bit: > > cat <>/dev/pts/0 >> Shows the pts output, but my input doesn't get passed back Sorry for sitting on this, my confusion here is I don't know what /dev/pts/0 means in your test, and the pts man page isn't illuminating. It doesn't seem to be special, it just seems to be the first one allocated? (So who allocated it on android?) According to "tty" in a random command line tab that one's using /dev/pts/17, and ps ax | grep pts/0 says it's PID 14597 a random bash instance, so I don't think the test lines up on a debian+xfce laptop. What is your test trying to _do_? (What process are you talking to?) > yeah like you said it should had fall through and be like -l. > However digging the git history the fall through line got removed > here > https://github.com/landley/toybox/commit/67bf48c1cb3ed55249c27e6f02f5c938b20e027d > which is unintentional I think? Yeah, lack of automated regression testing for this, which is why I want to understand and fix the test... Rob ___ Toybox mailing list Toybox@lists.landley.net http://lists.landley.net/listinfo.cgi/toybox-landley.net
Re: [Toybox] strlower() bug
On 5/14/24 12:12, enh wrote: > On Tue, May 14, 2024 at 1:04 PM Rob Landley wrote: >> >> On 5/14/24 07:10, enh wrote: >> > macOS tests seem to be broken since this commit? >> > >> > FAIL: find strlower edge case >> > echo -ne '' | touch aⱥ; find . -iname aȺ >> > --- expected 2024-05-10 17:32:56.0 + >> > +++ actual 2024-05-10 17:32:56.0 + >> > @@ -1 +0,0 @@ >> > -./aⱥ >> >> Sigh. Apple's handling of utf8/unicode continues to be... "a challenge". >> >> When I run "make test_find" standalone, it gives me: >> >> scripts/runtest.sh: line 219: syntax error near unexpected token `;' >> scripts/runtest.sh: line 219: ` R) LEN=0; B=1; ;&' >> >> Because bash 3.2 from 2007 doesn't understand ;& > > yeah, nor does mksh. it hasn't caused me any problems though; i've > been ignoring it for years now. > >> And THEN it goes: >> >> touch: out of range or illegal time specification: >> -MM-DDThh:mm:SS[.frac][tz] >> touch: out of range or illegal time specification: >> -MM-DDThh:mm:SS[.frac][tz] >> FAIL: find newerat >> echo -ne '' | find dir -type f -newerat @12345 >> --- expected2024-05-14 11:16:40.0 -0500 >> +++ actual 2024-05-14 11:16:40.0 -0500 >> @@ -1 +0,0 @@ >> -dir/two >> >> Which is a different error that DOESN'T happen with the global tests, because >> those are using toybox touch rather than homebrew's $TOUCH. But it works on >> debian. Let's see: >> >> $ touch --version >> touch: illegal option -- - >> usage: touch [-A [-][[hh]mm]SS] [-achm] [-r file] [-t [[CC]YY]MMDDhhmm[.SS]] >>[-d -MM-DDThh:mm:SS[.frac][tz]] file ... >> >> Thank you, gnu project. I'm gonna assume this is _also_ from 2007. (I made >> scripts/prereq/build.sh for a REASON...) > > no, i think this is a BSD touch. 
> > yeah, that looks very like the FreeBSD touch's usage: > > static void > usage(const char *myname) > { > fprintf(stderr, "usage: %s [-A [-][[hh]mm]SS] [-achm] [-r file] " > "[-t [[CC]YY]MMDDhhmm[.SS]]\n" > " [-d -MM-DDThh:mm:SS[.frac][tz]] " > "file ...\n", myname); > exit(1); > } > > >> Then when I run "make clean macos_defconfig tests" I get: >> >> Undefined symbols for architecture arm64: >> "_iconv", referenced from: >> _do_iconv in iconv.o >> (maybe you meant: _iconv_main) >> "_iconv_open", referenced from: >> _iconv_main in iconv.o >> ld: symbol(s) not found for architecture arm64 >> >> Because the Makefile has: >> >> tests: ASAN=1 >> tests: toybox >> scripts/test.sh >> >> And ASAN apparently breaks on homebrew's toolchain but not debian's >> toolchain. >> Why does it break there but not on Linux... >> >> probe cc -Wall -Wundef -Werror=implicit-function-declaration >> -Wno-char-subscripts -Wno-pointer-sign -funsigned-char >> -Wno-deprecated-declarations -Wno-string-plus-int >> -Wno-invalid-source-encoding >> -fsanitize=address -O1 -g -fno-omit-frame-pointer -fno-optimize-sibling-calls >> -xc -o /dev/null - >> error: cannot parse the debug map for '/dev/null': The file was not >> recognized >> as a valid object file >> clang: error: dsymutil command failed with exit code 1 (use -v to see >> invocation) >> >> Because it tries to read back the -o output we discarded, and fails when it >> can't do so, thus all library probes fail and it tries to build with no >> libraries. But only when ASAN is enabled, because ASAN uses -o as INPUT. >> Bravo. >> >> None of this is the actual unicode failure, this is just ambient macos... FAIL: find strlower edge case echo -ne '' | touch aⱥ; find . -iname aȺ --- expected2024-05-14 13:32:19.0 -0500 +++ actual 2024-05-14 13:32:19.0 -0500 @@ -1 +0,0 @@ -./aⱥ make: *** [tests] Error 1 cfarm104 (homebrew):toybox landley$ ls generated/testdir/testdir/ a? $ LC_ALL=en_US.UTF-8 ls generated/testdir/testdir a? 
$ generated/testdir/ls generated/testdir/testdir a\342\261\245\342\261\245\342\261\245\342\261\245\342\261\245\342\261\245\342\261\245\342\261\245\342\261\245 $ echo -./aⱥ -./aⱥ $ generated/testdir/ls -N generated/testdir/testdir aⱥ cfarm104 (homebrew):toybox landley$ generated/testdir/ls -N g
Re: [Toybox] strlower() bug
On 5/14/24 07:10, enh wrote: > macOS tests seem to be broken since this commit? > > FAIL: find strlower edge case > echo -ne '' | touch aⱥ; find . -iname aȺ > --- expected 2024-05-10 17:32:56.0 + > +++ actual 2024-05-10 17:32:56.0 + > @@ -1 +0,0 @@ > -./aⱥ Sigh. Apple's handling of utf8/unicode continues to be... "a challenge". When I run "make test_find" standalone, it gives me: scripts/runtest.sh: line 219: syntax error near unexpected token `;' scripts/runtest.sh: line 219: ` R) LEN=0; B=1; ;&' Because bash 3.2 from 2007 doesn't understand ;& And THEN it goes: touch: out of range or illegal time specification: -MM-DDThh:mm:SS[.frac][tz] touch: out of range or illegal time specification: -MM-DDThh:mm:SS[.frac][tz] FAIL: find newerat echo -ne '' | find dir -type f -newerat @12345 --- expected2024-05-14 11:16:40.0 -0500 +++ actual 2024-05-14 11:16:40.0 -0500 @@ -1 +0,0 @@ -dir/two Which is a different error that DOESN'T happen with the global tests, because those are using toybox touch rather than homebrew's $TOUCH. But it works on debian. Let's see: $ touch --version touch: illegal option -- - usage: touch [-A [-][[hh]mm]SS] [-achm] [-r file] [-t [[CC]YY]MMDDhhmm[.SS]] [-d -MM-DDThh:mm:SS[.frac][tz]] file ... Thank you, gnu project. I'm gonna assume this is _also_ from 2007. (I made scripts/prereq/build.sh for a REASON...) Then when I run "make clean macos_defconfig tests" I get: Undefined symbols for architecture arm64: "_iconv", referenced from: _do_iconv in iconv.o (maybe you meant: _iconv_main) "_iconv_open", referenced from: _iconv_main in iconv.o ld: symbol(s) not found for architecture arm64 Because the Makefile has: tests: ASAN=1 tests: toybox scripts/test.sh And ASAN apparently breaks on homebrew's toolchain but not debian's toolchain. Why does it break there but not on Linux... 
probe cc -Wall -Wundef -Werror=implicit-function-declaration -Wno-char-subscripts -Wno-pointer-sign -funsigned-char -Wno-deprecated-declarations -Wno-string-plus-int -Wno-invalid-source-encoding -fsanitize=address -O1 -g -fno-omit-frame-pointer -fno-optimize-sibling-calls -xc -o /dev/null - error: cannot parse the debug map for '/dev/null': The file was not recognized as a valid object file clang: error: dsymutil command failed with exit code 1 (use -v to see invocation) Because it tries to read back the -o output we discarded, and fails when it can't do so, thus all library probes fail and it tries to build with no libraries. But only when ASAN is enabled, because ASAN uses -o as INPUT. Bravo. None of this is the actual unicode failure, this is just ambient macos... Rob ___ Toybox mailing list Toybox@lists.landley.net http://lists.landley.net/listinfo.cgi/toybox-landley.net
[Toybox] I'm aware landley.net is saying "site.not found".
That dreamhost server migration they did? (The recent "only 2 years old" version thread?) Does not seem to have correctly updated the DNS record. Of the domain they manage for me. Dreamhost continues to provide nine fives of uptime. I've pinged support already, they'll probably get back to me in the morning.

Rob
Re: [Toybox] today in "shut up, gnu!"
On 4/12/24 13:24, enh via Toybox wrote:
> ~/aosp-main-with-phones$ find external/ -name NOTICE -type l -maxdepth 2
> find: warning: you have specified the global option -maxdepth after
> the argument -name, but global options are not positional, i.e.,
> -maxdepth affects tests specified before it as well as those specified
> after it. Please specify global options before other arguments.

Looking back at this (ok, closing tabs), I think I implemented this the same as any other option, so you can "-type l -o -maxdepth 2" and friends.

The thing is, when maxdepth triggers it returns without recursing on the path being evaluated, so you'd have to "-type d -o -maxdepth 2" for the difference to matter. (Recursing into lower entries but not triggering on them.) But it's not "global" in any magic way. It's just... another option? Which doesn't seem WRONG...

And of course https://pubs.opengroup.org/onlinepubs/9699919799/utilities/find.html hasn't got maxdepth. Meanwhile, busybox has:

    //config:config FEATURE_FIND_MAXDEPTH
    //config:	bool "Enable -mindepth N and -maxdepth N"
    //config:	default y
    //config:	depends on FIND

And:

    #define INIT_G() do { \
    	setup_common_bufsiz(); \
    	BUILD_BUG_ON(sizeof(G) > COMMON_BUFSIZE); \
    	/* we have to zero it out because of NOEXEC */ \
    	memset(&G, 0, sizeof(G)); \
    	IF_FEATURE_FIND_MAXDEPTH(G.minmaxdepth[1] = INT_MAX;) \
    	IF_FEATURE_FIND_EXEC_PLUS(G.max_argv_len = bb_arg_max() - 2048;) \
    	G.need_print = 1; \
    	G.recurse_flags = ACTION_RECURSE; \
    } while (0)

And I miss the days when I worked on that project and it was SIMPLE. I liked simple. That's what attracted me to it in the first place...

https://git.busybox.net/busybox/commit/?id=053c12e0de30

Yeah, I'm not even trying to understand that right now. I'll take my 730 lines over their 1750 lines any day, I don't CARE who has the smaller binary size after stripping specific ELF table entries...

Anyway, I should come up with a test for maxdepth acting as a normal option vs acting as a global option...
Rob
Re: [Toybox] nproc(1)
Relevant blog entry is https://landley.net/notes-2022.html#26-07-2022

> Meanwhile, I found out that musl has a bug! The nproc command has two
> modes, the default shows available processors (as modified by taskset),
> and nproc --all shows installed processors (whether or not the current process
> can schedule on them). One codepath is _SC_NPROCESSORS_CONF and the other
> is _SC_NPROCESSORS_ONLN. Except musl does ONLN for both, it hasn't got the
> second codepath, which according to strace is checking /sys/devices/system/cpu
> in glibc, and the bionic source has a comment saying that /proc/cpuinfo
> works fine on x86 but arm is broken because arm filters out the
> taskset-unavailable processors from that, so you have to look at the sysfs
> one to work around the arm bug.

And then me ruminating that mkroot is all single processor emulations so testing this is once again a design issue...

Pretty sure I poked Rich about it at the time, but I just confirmed that musl still has the bug in today's git.

And the above bionic note is apparently why my code is looking at sysfs to get the data, and "strace nproc --all" on debian says that's what they're doing too, and ltrace says it's doing the getconf() so yes glibc is also doing it. Musl will apparently allow itself to read data out of /proc, or at least there's 13 hits in the current codebase, but has zero instances of reading out of /sys.

Rob

On 5/2/24 11:20, enh wrote:
> (to be fair, i was shocked the first time i had to deal with an
> Android device where these weren't both the same...)
>
> On Thu, May 2, 2024 at 9:18 AM enh wrote:
>>
>> /facepalm
>>
>> maybe move your hand-written version into portability just for musl,
>> and everyone with a working libc just uses sysconf()?
>>
>> On Tue, Apr 30, 2024 at 8:26 PM Rob Landley wrote:
>> >
>> > On 4/29/24 16:56, enh via Toybox wrote:
>> > > isn't nproc(1) just a call to sysconf(3) with either
>> > > _SC_NPROCESSORS_ONLN for regular behavior, or _SC_NPROCESSORS_CONF for
>> > > --all?
>> >
>> > From musl src/conf/sysconf.c:
>> >
>> >         case JT_NPROCESSORS_CONF & 255:
>> >         case JT_NPROCESSORS_ONLN & 255: ;
>> >                 unsigned char set[128] = {1};
>> >                 int i, cnt;
>> >                 __syscall(SYS_sched_getaffinity, 0, sizeof set, set);
>> >                 for (i=cnt=0; i<sizeof set; i++)
>> >                         for (; set[i]; set[i]&=set[i]-1, cnt++);
>> >                 return cnt;
>> >
>> > Musl returns the same thing for "conf" and "online".
>> >
>> > Rob
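The three views of "how many processors" discussed in this thread can be put side by side. A minimal sketch assuming Linux with glibc or musl (CPU_COUNT() and the _SC_NPROCESSORS_* names are extensions, not POSIX); the function names are invented for illustration and this is not the toybox nproc code.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stddef.h>
#include <unistd.h>

// Count set bits across a byte array, using the same clear-the-lowest-bit
// trick as the musl loop quoted above (but without modifying the input).
int count_bits(const unsigned char *set, size_t len)
{
  int cnt = 0;
  size_t i;

  for (i = 0; i < len; i++) {
    unsigned char c = set[i];

    for (; c; c &= c-1) cnt++;
  }

  return cnt;
}

// CPUs this process may be scheduled on (the number taskset changes).
// Returns -1 if the affinity mask cannot be read.
int cpus_schedulable(void)
{
  cpu_set_t set;

  if (sched_getaffinity(0, sizeof(set), &set)) return -1;

  return CPU_COUNT(&set);
}

// The sysconf() view: nonzero all asks for configured CPUs, zero for online.
long cpus_sysconf(int all)
{
  return sysconf(all ? _SC_NPROCESSORS_CONF : _SC_NPROCESSORS_ONLN);
}
```

On a libc with the behavior described above, cpus_sysconf(1) and cpus_sysconf(0) both come back equal to the affinity count, so comparing them under taskset is one way to see whether a given libc distinguishes the two.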
Re: [Toybox] stty bug
On 5/10/24 06:15, Yi-Yo Chiang via Toybox wrote:
> The _negate & combination_ type of settings are bugged.
>
> `stty cooked` and `stty raw` works fine, but the negated options:
>
> $ stty -raw
> stty: unknown option: cooked
> $ stty -cooked
> stty: unknown option: raw

Ack, added to the notes for that command. (It's in pending for a reason...)

Thanks,

Rob
Re: [Toybox] unshare/nsenter and flags
On 5/10/24 18:46, Yifan Hong wrote:
> I am running all commands as a non-root user. Here are the two commands I run:
>
> strace ./toybox unshare --mount --map-root-user --user /bin/bash -c 'echo' 2>&1 | tee /tmp/user.txt
> strace ./toybox unshare --mount --map-root-user /bin/bash -c 'echo' 2>&1 | tee /tmp/no_user.txt
> strace unshare --mount --map-root-user /bin/bash -c 'echo' 2>&1 | tee /tmp/no_user_linux.txt

    $ unshare --mount --map-root-user --user /bin/bash -c echo
    unshare: unshare failed: Operation not permitted

That's on my host devuan. Let's see about newer... Ah, booting a daedalus ISO under KVM, the command works. Looks like they added (enabled?) new kernel plumbing between 3.0 and 5.0.

> Got about half my laptop tabs closed so far! Working towards a reboot...

Ok, time to bite the bullet and finish that, if I need the upgrade to test a fix...

Rob
Re: [Toybox] strlower() bug
On 5/8/24 16:27, Ray Gardner wrote:
> BTW I was a bit surprised that mentioning my awk for toybox got no reaction.

Oh I'm interested, but somebody (probably you) mentioned they were looking into it before, and I'll wait to see some code first. :)

(The problem with asking to see code early is pending/git.c isn't useful and that's as far as the original developer had time/energy for, and I haven't personally opened that can of worms yet. The problem with waiting until it's done is pending/bc.c was several times larger than I expected and I'm uncertain if I even want to personally open that can of worms.)

That said, if you're actively working on it and wanted to do a brief design infodump here, consider it solicited. :)

Rob

P.S. Also, my old Austin house finally went on the market last weekend and we got a lowball bid two days later that the realtor was doing the GO GO GO DON'T STOP TO THINK ACT NOW SUPPLIES RUNNING OUT pressure thing you see in most scams, because apparently houses and bananas have a similar lifespan and if it's on the market for longer than it takes anyone outside the realtor's immediate friend circle to find out about it the world will end. So now there's all the paperwork in the world. Fade and I had to get a printout notarized on Wednesday. They're asking when we last had all the plumbing and wiring in the walls replaced. (Is that a thing people regularly do after houses are built?) I had to docusign a leaded paint affidavit addendum. The old bank that holds the mortgage we're paying off called me to try to upsell me on a NEW mortgage (we're renting for now), and the person wanting to "transfer me to an agent" wouldn't get off the phone for 20 minutes. And Wells Fargo got our addresses updated so it says our checking account type is changing to one that charges us a $10 fee/month for existing. Anyway, if I seem a bit distracted right now it's because I am.
Re: [Toybox] netcat -f bug
What's your use case triggering this patch? Because without that, I go off on various design tangents, as seen below: On 5/10/24 06:09, Yi-Yo Chiang via Toybox wrote: > Hi, > The -f option for netcat doesn't seem to be doing anything right now. I should have a test for that, but to be honest I came up with netcat -f back in busybox (commit 1cca9484db69 says 2006) before I knew about bash's <> redirector to open a file for both reading _and_ writing (or had bash not added it yet?), meaning the example in that commit probably _should_ have been stty 115200 -F /dev/ttyS0 && stty raw -echo -ctlecho && cat <>/dev/ttyS0 >&0 2>&0 (I should NOT ask Chet for "{0-2}<>/dev/ttyS0" syntax operating on a filehandle range. I should not do it. That would be... I dunno, rude? I mean in theory I'd just want him to fix the existing {1..2} syntax to do one open() and then dup() redirects instead of opening the device multiple times, which was the initial problem because reopening the /dev node instead of dup() an existing filehandle to it either gave -EBUSY or hardware reset the UART depending on the underlying driver, and the reason chet would give me a LOOK if I asked is {brace,expansion} is resolved _before_ variable expansion and redirection, so it literally turns INTO 3 arguments with different numbers and thus three separate open() calls to the char device, and making it do something else is basically a layering violation...) Ahem. Sorry. Tangent. It's possible netcat -ft makes it still useful, but A) that implies there should be some sort of tty wrapper in the nice/taskset/time/chroot/nohup mold, B) I think -t is currently broken because I needed to rewrite it to add nommu support (decompose forkpty() into the underlying openpty() and login_tty() calls around the vfork() instead of fork()) and just commented it out and put it on the todo list... 
The original theory was -f should fall through to the "else" case on line 191, and thus naturally inherit any other applicable options. Which is hard to see in my current tree because there's a bunch of half-finished work in it:

    $ git diff toys/*/netcat.c | diffstat
     netcat.c | 62 +-
     1 file changed, 49 insertions(+), 13 deletions(-)

Sorry for falling behind...

> It is
> missing a call to pollinate() after opening the specified device file.
> The patch adds back that line of pollinate().

Which makes it not work with running commands (ala -f should work like -l).

> Also make sure that the timeout handler is not armed for -f mode as -f
> shouldn't timeout. File open() should just succeed or fail immediately.

Why shouldn't -f timeout? Various /dev nodes take a while to open, automount behind the scenes... Is there a downside to leaving that part as is? (Other than the new case you added not alarm(0) disarming it?)

Rob
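The "one open() and then dup()" behavior described in the redirection tangent of this message is easy to state in C, even if the shell syntax for it is awkward. A minimal sketch: open_once() is a hypothetical helper name, not anything in toybox or bash.

```c
#include <fcntl.h>
#include <unistd.h>

// Open path read/write exactly once, then dup() the result so that count
// descriptors all share that single open of the device node.
// Fills fds[0] through fds[count-1]. Returns 0 on success, or -1 on failure
// with anything already opened closed again.
int open_once(const char *path, int *fds, int count)
{
  int i;

  if (count < 1 || (fds[0] = open(path, O_RDWR)) == -1) return -1;
  for (i = 1; i < count; i++) {
    if ((fds[i] = dup(fds[0])) == -1) {
      while (i--) close(fds[i]);

      return -1;
    }
  }

  return 0;
}
```

A wrapper that wants the device on stdin, stdout and stderr would use dup2(fd, 0), dup2(fd, 1) and dup2(fd, 2) instead of plain dup(), but the point is the same: the driver sees one open(), so there is no second open to return -EBUSY or reset the UART.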
Re: [Toybox] unshare/nsenter and flags
Ok, cycling back to this... On 5/2/24 21:51, enh wrote: >> > it seems like -r _doesn't_ actually imply -U in practice (and they >> > seemed to have strace output to prove it). >> >> So... should it? > > i think so? i have no idea about any of this, but > https://man7.org/linux/man-pages/man1/unshare.1.html says > >-r, --map-root-user >Run the program only after the current effective user and >group IDs have been mapped to the superuser UID and GID in >the newly created user namespace. This makes it possible to >conveniently gain capabilities needed to manage various >aspects of the newly created namespaces (such as configuring >interfaces in the network namespace or mounting filesystems >in the mount namespace) even when run unprivileged. As a mere >convenience feature, it does not support more sophisticated >use cases, such as mapping multiple ranges of UIDs and GIDs. >This option implies --setgroups=deny and --user. This option >is equivalent to --map-user=0 --map-group=0. > > which sounds like it supports the toybox documentation rather than the > toybox source? > >> What did they try to do, and what did they _want_ to happen? > > unshare --mount --map-root-user /bin/sh -c "mount --bind $A $B" Running that as my normal user gave EPERM on the unshare(CLONE_NEWNS) which is the reason I haven't poked at this more. (To be useful, it seems like it probably needs to be setuid and then drop permissions after unsharing stuff, and I need to come up to speed on the security implications of that and possibly write a "contain" command with as little novelty as possible. Which is not a can of worms I want to open without a clear desk...) 
Running it under sudo I got: openat(AT_FDCWD, "/proc/self/setgroups", O_WRONLY) = 3 write(3, "deny", 4) = -1 EPERM (Operation not permitted) > they looked at strace for toybox and saw > > unshare(CLONE_NEWNS)= -1 EPERM (Operation not permitted) > > but for the util-linux one they saw > > unshare(CLONE_NEWNS|CLONE_NEWUSER) = 0 Are they root or a normal user? Because adding -U to the above command line I got: geteuid() = 1000 getegid() = 1000 unshare(CLONE_NEWNS|CLONE_NEWUSER) = -1 EPERM (Operation not permitted) But with sudo, that succeeded and adding an ls -l to the bash command yes it did the bind mount, which is gone again when it exits. >> The "22.04" means it came out two years and one month ago, and that's what >> they're migrating me TO. So, you know, I can presumably feel less bad about >> my >> laptop... > > (to be fair, until _last week_ that was the current LTS release :-) > but, yeah, odd timing unless they deliberately like to be on the > previous LTS release! i'll throw no stones as long as i'm living so > close to the Android build server glass house though...) Got about half my laptop tabs closed so far! Working towards a reboot... Rob ___ Toybox mailing list Toybox@lists.landley.net http://lists.landley.net/listinfo.cgi/toybox-landley.net
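What "-r implies --user" amounts to at the syscall level can be sketched as follows, assuming Linux and the man page text quoted in this message. Both helper names are hypothetical and this is not the toybox or util-linux source. It only computes the unshare(2) flags and the map line, it does not call unshare(), since whether that succeeds depends on privileges and kernel configuration as the strace output here shows.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

// Hypothetical helper: combine the requested namespaces into unshare(2)
// flags, treating map_root as implying a new user namespace the way the
// man page describes -r.
int unshare_flags(int want_mount, int want_user, int map_root)
{
  int flags = 0;

  if (want_mount) flags |= CLONE_NEWNS;
  if (want_user || map_root) flags |= CLONE_NEWUSER;

  return flags;
}

// Format the single-range line for /proc/self/uid_map (or gid_map) that maps
// outside_id to ID 0 inside the namespace: "inside outside count".
// Returns the length snprintf() reports.
int root_map_line(char *buf, size_t size, unsigned outside_id)
{
  return snprintf(buf, size, "0 %u 1\n", outside_id);
}
```

With map_root set and no explicit --user, the flags come out as CLONE_NEWNS|CLONE_NEWUSER, matching the util-linux strace line rather than the bare CLONE_NEWNS that toybox was seen passing. For an unprivileged caller the "deny" write to /proc/self/setgroups has to happen before gid_map is written.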
Re: [Toybox] strlower() bug
On 5/6/24 17:12, Ray Gardner wrote:
> While working on an awk implementation for toybox, I found a bug in
> strlower(), which is used only in find.c. I've attached some tests to
> put in find.test to reveal it. I can't put them here directly because
> I don't think the UTF-8 names will come through. (I modelled my awk
> tolower()/toupper() code on your strlower().)

Your test doesn't create the files you're finding, so find is supposed to fail? Your first test doesn't barf under ASAN, and then the second one's going to fail because echo -n | wc says it's 258 bytes and the VFS file length limit is 255 bytes, so there CAN'T be a file named that on Linux. (Path length != path component length, there's no slashes in there.)

> The problem is in the test if the output string needs to be enlarged
> to take an expanded lowercase:
>     // Case conversion can expand utf8 representation, but with extra mlen
>     // space above we should basically never need to realloc
>     if (mlen+4 > (len = new-try)) continue;
>
> The mlen+4 needs to be mlen-4 to leave at least 4 bytes for the next
> character.

Hmmm, possibly. I still don't understand what your test case is testing. (Just trying to trigger an ASAN violation with an otherwise nonsense test?)

> As the comment indicates, it should "never" need to realloc;

No, the first comment is "never" because triggering probably indicates a libc bug (we converted it from valid utf8 to a unicode code point, ran it through libc's towlower(), and are now trying to convert the result _back_ to utf8, an encoding hiccup at this point seems unlikely? But I don't trust locale plumbing ever, so...) The second is "basically never" because it requires an insane input string, but that's user controlled and users do crazy things, sometimes even intentionally.

> it takes
> a very long name of uppercase characters that do expand when made
> lowercase. But the code is there to handle that very case.
The first malloc rounds the allocation up to next 8 byte boundary _after_ what it's actually using, so 9-16 bytes of zeroes at the end, and assuming the conversion only ever grows 1 byte (I don't remember the pathological expansion case, it's in my blog somewhere, but your test is turning c8 ba into e2 b1 a5 which is 1 byte of expansion) then you need at least 8 expanding unicode code points to burn through the padding, so your first test string is too short to trigger a problem. And your second is too long to produce a valid filename, so the test can't _succeed_...

Sigh, lemme come up with a test that demonstrates the fix working... the minimal one seems to be

  ./find . -iname aȺ

And then, of course, TEST_HOST fails because I need to enable a utf8 locale, but I made plumbing for that recent-ish-ly... commit 6800a95ef328

> BTW, when I run those tests, they "PASS", but show as aborted:
> corrupted size vs. prev_size
> scripts/runtest.sh: line 137: 265983 Aborted find .
> -iname
> AC
> PASS: find utf8 uppercase long name

Odd.

> The test echos and checks the $? return code and the abort apparently
> leaves that as 0.

That could be anything from a bash issue to your distro's libc. The only trap in tests.sh is for SIGINT, and that handler isn't inherited by child processes. The return code of a process killed by a signal should be 128+signum, which the test plumbing would notice if it was the actual exit code of your shell snippet.

I checked in a test that should actually succeed, but would fail with ASAN enabled before the bug was fixed.

> Is there a way to fix the test system so it can
> force the exit code to be something else?

Not if the signal/exit isn't allowed to propagate back to it by the test. You ran a child process and then unconditionally did an ;echo $? meaning test.sh doesn't get notified of the child process getting killed by a signal, it unconditionally (because ;) went on to run a second command, "echo" which is returning whatever your bash recorded.
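The byte arithmetic in the strlower() discussion above (c8 ba turning into e2 b1 a5) can be checked with a small UTF-8 encoder. This is a minimal sketch and not the toybox code: utf8_len(), utf8_encode() and has_room() are hypothetical names made up for illustration, and has_room() only shows the general "leave 4 bytes for the next character" idea rather than the exact expression used in find.c.

```c
#include <stddef.h>

/* Number of bytes needed to encode one Unicode code point as UTF-8. */
static size_t utf8_len(unsigned cp)
{
  if (cp < 0x80) return 1;
  if (cp < 0x800) return 2;
  if (cp < 0x10000) return 3;

  return 4;
}

/* Encode cp into out (which needs up to 4 bytes), return bytes written. */
static size_t utf8_encode(unsigned cp, unsigned char *out)
{
  size_t len = utf8_len(cp);

  if (len == 1) out[0] = cp;
  else if (len == 2) {
    out[0] = 0xc0 | (cp >> 6);
    out[1] = 0x80 | (cp & 0x3f);
  } else if (len == 3) {
    out[0] = 0xe0 | (cp >> 12);
    out[1] = 0x80 | ((cp >> 6) & 0x3f);
    out[2] = 0x80 | (cp & 0x3f);
  } else {
    out[0] = 0xf0 | (cp >> 18);
    out[1] = 0x80 | ((cp >> 12) & 0x3f);
    out[2] = 0x80 | ((cp >> 6) & 0x3f);
    out[3] = 0x80 | (cp & 0x3f);
  }

  return len;
}

/* Hypothetical guard: is there space for one more worst-case (4 byte)
   character, checked BEFORE writing it? */
static int has_room(size_t used, size_t allocated)
{
  return used + 4 <= allocated;
}
```

U+023A (uppercase) encodes to 2 bytes and U+2C65 (its lowercase) to 3, so each such character costs one byte of padding, which is why a short run of them fits in the slack and a long run does not.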
Some distros have horrible fault interceptors that log crap into syslog or dmesg or some such, AND THEN RETURN SUCCESS. (Which is doubly insane: A) a program faulting does not need to be globally logged on a development system, B) returning success when that happens is very sad, but their "logic" was that some scripts would otherwise misbehave.)

> When I run the test from a
> command line directly in bash, it gets a code of 134 (SIGABRT).

Without ASAN I'm getting 139 (128+11 = SIGSEGV). There would appear to be a difference in our environments.

Rob
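The 128+signum convention mentioned in this thread (134 for SIGABRT, 139 for SIGSEGV) is what bash and similar shells derive from the wait status of a killed child. A minimal sketch of that translation, assuming a Linux-style signal numbering; shell_style_status() and status_after_signal() are hypothetical helper names, not anything from the toybox test plumbing:

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Turn a waitpid() status into the number a shell like bash would put
   in $?: the exit code, or 128+signal if the child was killed. */
static int shell_style_status(int wstatus)
{
  if (WIFEXITED(wstatus)) return WEXITSTATUS(wstatus);
  if (WIFSIGNALED(wstatus)) return 128 + WTERMSIG(wstatus);

  return -1;
}

/* Fork a child that kills itself with sig, then report its status the
   way a shell would. Returns -1 if fork() or waitpid() fails. */
static int status_after_signal(int sig)
{
  int wstatus;
  pid_t pid = fork();

  if (pid < 0) return -1;
  if (!pid) {
    signal(sig, SIG_DFL);  /* make sure the default action applies */
    raise(sig);
    _exit(0);              /* only reached if the signal didn't kill us */
  }
  if (waitpid(pid, &wstatus, 0) != pid) return -1;

  return shell_style_status(wstatus);
}
```

A test harness only sees this value if the killed command's status is what the snippet actually returns, so a trailing "; echo $?" replaces it with the exit code of echo.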
Re: [Toybox] [mkroot] Cannot Overwrite non-directory "$ROOT/bin/" with directory "[Path to overlay]"
On 5/7/24 15:50, Rob Landley wrote: > And THAT was based on the old environment setup I used to do in Firmware Linux > to give User Mode Linux a mostly writeable chroot despite starting with > https://user-mode-linux.sourceforge.net/hostfs.html but that was back before > git > was invented so I just have a bunch of tarball snapshots over the years (at > https://landley.net/aboriginal/downloads/old/) rather than A) Sorry, forgot to explain, B) That's not even the old one I'm talking about, https://landley.net/aboriginal/old/download/snapshots probably is. User Mode Linux is a port of Linux to userspace, I.E. making the "vmlinux" ELF file built at the top of the tree an actual runnable Linux program, which boots its own little VM and runs processes inside it. This predated QEMU or KVM by a decade, and was one of the first ways to run a virtual Linux system without requiring root access on the host. Firmware Linux was built around it, Aboriginal Linux was the relaunch targeting QEMU instead (and doing cross compiling, because UML only ever properly supported x86 for some reason). UML had the "hostfs" filesystem, which acted like a network filesystem making a directory from the host appear in a directory of the virtual system. (Again, decades before virtfs and friends, although NFS and Samba were around.) The problem was, a hostfs file belonging to root (UID 0) wasn't writeable to root within the VM. The mount point was SORT of writeable, but it was getting translated on the host to reads/writes/renames/deletes as the host user running UML, and then the translated syscall would fail and failures that shouldn't happen were getting returned on the client. And this included fixups you needed to do like replacing /etc/mtab with a symlink to /proc/mounts (because mount points became a per-process attribute in Linux 2.5, so a single global mount table as a filesystem maintained by the userspace mount tool didn't cut it anymore). 
So I made a script that created a new directory in the host user's fully writeable space and populated it with symlinks to host resources before chrooting into it (all within UML), so I had access to the host stuff I needed but could also replace it all as needed. And that's what I did my emulated Linux From Scratch build in, back around 2004.

Anyway, "here's a thing that needs to be spliced into the $PATH, you may want to use symlinks" sometimes goes "whoosh" over my head as "hard for people who haven't done it before" because to ME it's a 20 year old trick. Sorry 'bout that...

Rob
Re: [Toybox] [mkroot] Cannot Overwrite non-directory "$ROOT/bin/" with directory "[Path to overlay]"
On 5/6/24 23:33, Oliver Webb wrote:
> On Sunday, May 5th, 2024 at 21:21, Rob Landley wrote:
>
>> Oh, the other todo item here is "multiple overlays". The current overlay package
>> was a quick hack, never did the design work to figure out what more
>> complication should look like. Partly waiting for people to complain to me that
>> they need more than it does...
>
> Maybe making the OVERLAY variable a delimiter separated list, looping over
> it each time the overlay package is specified. Then indexing the OVERLAY
> variable like an array with that counter (I don't really know how bash arrays
> work, I think this is easy with them though from my vague knowledge, although
> I don't know)?

Various things are easy to implement, the question is what user interface works best. The rest of mkroot uses CSV internally so having CSV in an option isn't that heavy a lift. (Although it hasn't been presented as external UI before, and relative vs absolute paths in a comma separated list is a bit tricksy, and we never DID address "what happens if you define the same variable twice on the command line?" Right now it overwrites...)

The other sharp edge is "when files conflict between overlays, do you overwrite or leave the old one or what". And of course the "following symlinks out of tree" problem, which I added a tar option to address and what I've vaguely thought of doing here was having toybox tar handle it doing the tar c | tar xv trick with --restrict to just leverage the existing stuff.
And I have a note about sparse file handling, which is at least 3 todo items combined into one note: 1) cpio sparse handling is part of the periodic https://lwn.net/Articles/789228/ threads that never resolved last I checked, the last attempt wound up diverging into https://lkml.iu.edu/hypermail/linux/kernel/2207.3/06939.html which eventually went upstream (after it got completely rewritten to not smell like me so Greg KH would tolerate it) but the cpio extension part didn't get brought back up that I've been cc'd on... 2) tar sparse handling should have both modes (SEEK_HOLE and detect runs of zeroes), and then the tar.test stuff updated to mostly use the runs of zeroes because there are some TERRIBLE FILESYSTEM implementations out there and none of them seem to agree on span granularity. (How big IS the run of zeroes? Where are the edges? Just seek past 4k aligned blocks isn't good enough, and it doesn't look like 64k is either. Don't get me started on "ecryptfs"... 3) add sparse support to cp.c. (Grumble grumble --sparse longopt without short opt, and should --sparse=auto be the default behavior? If the filesystem doesn't support sparse files then presumably seek-and-write will zero fill anyway and we don't have to do anything. Or seek would fail, which I guess we should gracefully handle but sendfile_pad() already has plumbing for that?) >> It hasn't got "make". Kind of limiting factor not to have a make command on >> the >> target. > > gmake has a "./build.sh" that you can use to bootstrap it up on a system > without make. 
My first step in this after I hacked together a overlay was > to get a gmake tarball and try to build it with "./configure && ./build.sh && > ./make install", > which configure (on host bash, not toysh) goes into a infinite loop without > expr, and putting that in will fail because "host compiler does not produce > run-able executable" (Which might be true because I have to manually hack > together > a overlay each time and I throw out quick "hello world" tests mostly). Good to know. Way back when, I had a script that would splice the toolchain.sqf into the host filesystem with a bunch of symlinks, ala https://github.com/landley/aboriginal/blob/master/system-image.sh#L65 splicing together https://github.com/landley/aboriginal/blob/master/sources/toys/dev-environment.sh and https://github.com/landley/aboriginal/blob/master/sources/toys/make-hdb.sh although the interesting part was probably https://github.com/landley/aboriginal/blob/master/sources/toys/dev-environment.sh#L72 And THAT was based on the old environment setup I used to do in Firmware Linux to give User Mode Linux a mostly writeable chroot despite starting with https://user-mode-linux.sourceforge.net/hostfs.html but that was back before git was invented so I just have a bunch of tarball snapshots over the years (at https://landley.net/aboriginal/downloads/old/) rather than Pretty sure I have old blog entries at https://landley.livejournal.com explaining what I was doing and why, but ever since the servers moved around I haven't wanted to fish in them, archive.org is slow and has terrible UI, and my backup disks from that period are... somewhere. Everything's still packed from the move, I could
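The "detect runs of zeroes" mode from item 2 of the sparse file note above can be sketched without touching the filesystem at all, which is the point of preferring it in tests. This is a minimal illustration, not toybox tar's code: zero_run(), data_run() and the min_hole threshold are hypothetical names and parameters chosen for the example.

```c
#include <stddef.h>

/* Length of the run of zero bytes at the start of buf (0 if none). */
static size_t zero_run(const unsigned char *buf, size_t len)
{
  size_t i = 0;

  while (i < len && !buf[i]) i++;

  return i;
}

/* Length of the data at the start of buf, stopping at the first run of
   zeroes that is at least min_hole bytes long. Shorter zero runs are
   treated as ordinary data. min_hole should be at least 1. */
static size_t data_run(const unsigned char *buf, size_t len, size_t min_hole)
{
  size_t i = 0;

  while (i < len) {
    size_t zeroes = zero_run(buf + i, len - i);

    if (zeroes >= min_hole) break;
    i += zeroes ? zeroes : 1;
  }

  return i;
}
```

A writer would alternate data_run() (copy that many bytes) and zero_run() (seek past that many bytes), and the hole granularity becomes a choice made by the tool rather than by whatever the underlying filesystem reports.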
Re: [Toybox] [mkroot] Cannot Overwrite non-directory "$ROOT/bin/" with directory "[Path to overlay]"
On 4/27/24 20:44, Oliver Webb via Toybox wrote: > Doing minimal linux system setup with mkroot and trying to create a minimal > environment > with a native toolchain to run autoconf in. This would mean getting the > native static > toolchain for my architecture from > https://landley.net/toybox/downloads/binaries/toolchains/latest/. > Mounting the image (Why are cross compilers tarballs while native compilers > are fs images? Copying the native compiler into the initramfs takes more space than initramfs can comfortably hold. The run-qemu.sh in mkroot defaults to -m 256 (I.E. 256 megabytes system memory), and some board emulations (like mips) _can't_ map more than that. (Making the boards consistent is good, it's enough to run a single threaded compile, and it's nice for running lots of instances in parallel on the host ala mkroot/testroot.sh. Even ignoring that, the kernel's cpio extractor generally has its own size limits. The initial physical memory layout only leaves so large a gap between "where we loaded the cpio.gz" and "where we extract it to", and when you fill up that gap at a certain point the extract overwrites the data it's reading, because initramfs isn't _expected_ to be multiple gigabytes in size. Again, how much you've got varies by target but adding a quarter gigabyte of toolchain didn't work on multiple boards when I tried it. Shrinking the toolchain down has some hard limits: even way back in the aboriginal linux days when I was trying to set up a tinycc compiler on target, just the extracted /usr/include headers took up quite a bit of space: $ cd ccc $ du -s i686-*cross/*/include 23148 i686-linux-musl-cross/i686-linux-musl/include Currently 23 megabytes (and another couple megabytes for the compiler includes). Keeping them in a squashfs was more memory efficient. > Wouldn't making them tarballs mean that you could extract their contents > without running > losetup and dealing with mounting devices and needing root permissions ? 
Squashfs is an archive format, there's an unsquashfs command to extract it if you want to fiddle with it on the host, although mount-and-copy in mkroot works too. The problem (read-only) mounting a compressed archive is seekability: on normal block devices the kernel can jump around and grab chunks of directory information and file contents into dcache and page cache, and be free to discard them again under memory pressure so they should be cheap to get back. That's the design expectation for filesystems. The problem with a tarball is you need to extract the whole thing starting at the beginning to find where anything _is_. You can fix that by building an index at mount time (extract the whole thing, examine the contents, and make notes) but that makes mount really slow and also means you have a data tree you can't discard so you've more or less pinned your directory cache if you want to know where all the files start. Zip file format addresses the dentry part because it was designed to let you extract individual files, but it doesn't address seekability _within_ a file. If you try seek 10 megs into a file (or mmap from that point) it has to extract and discard 10 megs of data. (The main downside of zip files A) individually compressing each file is less efficient than compressing the whole archive, so they tend to be larger, B) zip puts all its metadata at the _end_ of the file, so if the file is truncated at all you've lost ALL the contents because it doesn't know what any of the rest means anymore. Incomplete zip file transfers were worthless because it has to start reading at the end to find anything. The reason it did that was so amending existing zip files in place was quick, because it can remove and rewrite the metadata easily. If the metadata wasn't at the end and needed to be expanded, it would either need to move all the file contents to make room, or break the metadata into chunks and parse together scattered overlays. 
Of course replacing a file in the archive wasted space because unless the old file had coincidentally been at the very end of the archive, it left the old one in there and just added the new copy and updated the index to point at it.) Most compression formats handle files in chunks: bzip2 does 900k blocks, gzip does periodic dictionary resets, etc. Using a compression format with a reasonable chunk size and tracking where each chunk starts lets you handle seeks reasonably well, and that's what squashfs does. I haven't looked up the actual file format, but conceptually it's a zip file plus chunk indexes within files. > I trust they were > made fs images for a good reason, but... _why_). Within mkroot, squashfs is easier to deal with because I don't need to reserve destination space to extract everything into to poke at the contents. Outside of mkroot, squashfs isn't that much harder to play with, mostly just less familiar. > And ideally running a mkroot overlay on > it because that's what the overlays seem to be
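The "track where each chunk starts" idea described above for seekable compressed archives comes down to one division and one table lookup. A minimal sketch under stated assumptions: struct chunk_index, struct chunk_pos and locate() are hypothetical, and this is not the real squashfs on-disk layout (which the message itself says it hasn't looked up).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-file index: where each compressed chunk begins. */
struct chunk_index {
  uint32_t chunk_size;    /* uncompressed bytes per chunk */
  size_t count;           /* number of chunks in the file */
  const uint64_t *start;  /* archive offset of each compressed chunk */
};

/* Which chunk to decompress, and how many bytes of it to skip. */
struct chunk_pos {
  size_t chunk;
  uint32_t skip;
};

/* Find the chunk holding uncompressed file offset "offset".
   Returns 0 on success, -1 if the offset is past the indexed chunks. */
static int locate(const struct chunk_index *idx, uint64_t offset,
  struct chunk_pos *pos)
{
  uint64_t chunk;

  if (!idx->chunk_size) return -1;
  chunk = offset / idx->chunk_size;
  if (chunk >= idx->count) return -1;
  pos->chunk = chunk;
  pos->skip = offset % idx->chunk_size;

  return 0;
}
```

With an index like this a seek decompresses at most one chunk (starting at idx->start[pos.chunk]) and discards pos.skip bytes of it, instead of decompressing and discarding everything before the target offset.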
Re: [Toybox] Fw: Re: Dude.
On 5/4/24 11:34, Oliver Webb via Toybox wrote: > (Rob wants this on the list anyways, and he hasn't CC:-ed it. If I want to send a message to the list, I'm capable of doing so. As I said in the postscript I don't _object_ to it being on the list (in an "I say the same things in public as in private" way), and I did lament that I'd spent half a work day composing several thousand words to just one person rather than as a general resource I could refer back to in future or maybe get a FAQ entry out of. But thinking about it after the fact (when I got your reply), I honestly didn't expect other people on the list to be interested beyond maybe closure. It's potentially useful to know that the guy who wrote about half of all messages to the list last month (35/76 in the web archive) might stop. > I want it on the list for multiple reasons. Or might not. Reading lots of text is _work_. I reference "pascal's apology" a lot (him being sorry for writing a long letter because he didn't have time to write a short one), because people try to _read_ this stuff. (Or worse, they stop trying.) I try to keep the signal to noise ratio up, and that means editing it DOWN. Which takes time and energy. (This reply is uncomfortably rambly, but I've already spent a day away from the keyboard going "I have to coherently reply to this" and not wanting to.) > (I gave him permission > to cc it in a reply email I intend to forward to the list)) And I didn't, so you put words in my mouth again about what I "want". This thread doesn't advance the project, and I doubt this exchange offers much insight into _my_ behavior. I've been posting publicly for a quarter century on linux-kernel and busybox and uclibc and toybox and j-core and elsewhere. I've maintained _this_ project for 17 years. 
I've made policy statements about it in design.html and the faq and on the list and in my blog (and twitter and mastodon and livejournal and talks on youtube and mp3 recorded panels from penguicon and linucon and heck, you can pull my old comments out of slashdot and lwn.net if you try). There's even a code of conduct which HIGHLY IRONICALLY was originally copied from twitter's. (No really: https://github.com/landley/toybox/commit/bc308973ffb6) People already have _plenty_ of rope to hang me with if they decide they need a reason. I care about the _code_. what's best for the _project_. I also care about documentation, but the problem is usually "too much" and needing to boil it down and put it somewhere obvious where it's indexed and people can easily find it. I'm trying NOT to make it about me. I'm very fiddly about the work, sometimes trying (and failing) to do the programming equivalent of Faberge Eggs, and the perfect can be the enemy of the good. But that's what distinguishes this project from the half-dozen other implementations of the same stuff already out there. That said, you pointed me at a message where you'd asked an actual question: > > > because I've been trying to run gcc under mkroot and a response to > > > http://lists.landley.net/pipermail/toybox-landley.net/2024-April/030334.html > > > would've been helpful. > > Hadn't seen it. It got, quite literally, lost in the noise. And I'd started replying to that (before you sent this to the list), and stopped because it was too long and I needed to edit it down. I should just press "send" on the ramble and move on to the next todo item. (Top of stack is fixing unshare I think...) Rob ___ Toybox mailing list Toybox@lists.landley.net http://lists.landley.net/listinfo.cgi/toybox-landley.net
Re: [Toybox] unshare/nsenter and flags
On 5/2/24 13:14, enh via Toybox wrote: > another googler wanted a host unshare(1) for some testing... i added > that, and they complained that although the docs say > > -r Become root (map current euid/egid to 0/0, implies -U) > (--map-root-user) > > it seems like -r _doesn't_ actually imply -U in practice (and they > seemed to have strace output to prove it). So... should it? What did they try to do, and what did they _want_ to happen? I'd compare with my debian unshare command but my install is a bit out of date. (According to https://endoflife.date/devuan I've still got 4 weeks of support.) Coincidentally, I just got an email yesterday morning from "The Happy Dreamhost Upgrade Robot" (yes really) that they're updating landley.net's web container: > We have great news! As part of our mission to support you with your digital > presence, we are always looking to improve your products and provide you with > the most advanced and powerful hardware. > > On Wednesday, May 8th we will be migrating you to a newer shared server. As > part of this maintenance, the operating system will be upgraded from Ubuntu > Bionic to Ubuntu Jammy Jellyfish 22.04.2. > > In most cases, no action is required on your part, but we've prepared some > documentation that will help you prepare for the upgrade to Ubuntu Jammy: > https://help.dreamhost.com/hc/en-us/articles/15506945971220 The "22.04" means it came out two years and one month ago, and that's what they're migrating me TO. So, you know, I can presumably feel less bad about my laptop... > i was assuming the code was just missing, but when i looked, i found: > > // unshare -U does not imply -r, so we cannot use [+rU] > if (test_r()) toys.optflags |= FLAG_U; Let's see, git annotate says that comment comes from commit 3c0be8a473c0: Author: Samuel Holland Date: Sun Apr 12 16:00:16 2015 -0500 unshare: fix -r Calling unshare(2) immediately puts us in the new namespace with the "overflow" user and group ID. 
By calling geteuid() and getegid() in handle_r() after calling unshare(), we try to map that to root, which Linux refuses to let us do. What we really want to map to root is the caller's uid/gid in the original namespace. So we have to save them before calling unshare(). Meanwhile the "implies" in the help text comes from commit fb4a241f35cf two months earlier: Author: Rob Landley Date: Wed Feb 18 15:19:15 2015 -0600 Patch from Isaac Dunham to add -r, fixed up so it doesn't try to include two flag contexts simultaneously. So it looks like Isaac made -r imply -U and Samuel made it _not_ do so, without changing the help text, and I didn't notice because I'd really like to build domain expertise here but haven't got it. (Largely because doing container stuff tends to require root access, and if I'm requiring root access anyway I tend to just chroot, or launch a qemu instance that does NOT require root access on the host. It's on the todo list...) I've used toybox's unshare command a bunch of times, but not the UID remapping parts... > but note the unshare/nsenter sharing there --- is it a problem that i > have unshare enabled but not nsenter? is that expected to work? I'm happy to implement proper semantics here if I know what they _are_. What _should_ it do? I recently blogged (https://landley.net/notes.html#13-04-2024) about attending yet another container talk at txlf, but if I really want a "contain" command what I should probably do is dig through: https://github.com/p8952/bocker https://github.com/Fewbytes/rubber-docker https://blog.lizzie.io/linux-containers-in-500-loc.html And "come up with something". It would be really nice if there was a simple existing syntax I could be compatible with, which is why I was vaguely looking at what minijail does, and https://github.com/rkt/rkt and https://github.com/opencontainers/runc and https://github.com/containers/crun and https://github.com/containerd/containerd and so on. 
But that's a fresh can of worms to open after I close a couple of existing ones, and to get to 1.0 the LFS build needs "awk" more than container support...

Rob
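The ordering problem described in Samuel Holland's commit message above (read the uid/gid first, then unshare, then write the maps) can be sketched as follows. This is an illustration assuming Linux with user namespaces available, not the toybox unshare source: format_id_map(), write_file() and become_mapped_root() are hypothetical names, and whether unshare(CLONE_NEWUSER) is permitted depends on the host configuration.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Format a one-entry uid_map/gid_map line: "inside outside 1\n".
   Returns the length written, or -1 if buf is too small. */
static int format_id_map(char *buf, size_t size, unsigned inside,
  unsigned outside)
{
  int len = snprintf(buf, size, "%u %u 1\n", inside, outside);

  return (len < 0 || (size_t)len >= size) ? -1 : len;
}

/* Write a string to an existing file, 0 on success. */
static int write_file(const char *path, const char *data)
{
  int fd = open(path, O_WRONLY), rc = -1;
  size_t len = strlen(data);

  if (fd < 0) return -1;
  if (write(fd, data, len) == (ssize_t)len) rc = 0;
  close(fd);

  return rc;
}

/* Enter a new user namespace where the caller shows up as uid/gid 0. */
static int become_mapped_root(void)
{
  char map[64];
  /* Save these BEFORE unshare(): afterwards geteuid()/getegid()
     return the "overflow" IDs, which can't be mapped to root. */
  unsigned uid = geteuid(), gid = getegid();

  if (unshare(CLONE_NEWUSER)) return -1;
  /* An unprivileged process must deny setgroups before writing gid_map. */
  if (write_file("/proc/self/setgroups", "deny")) return -1;
  if (format_id_map(map, sizeof(map), 0, uid) < 0
    || write_file("/proc/self/uid_map", map)) return -1;
  if (format_id_map(map, sizeof(map), 0, gid) < 0
    || write_file("/proc/self/gid_map", map)) return -1;

  return 0;
}
```

In this sketch "-r implies -U" is simply the fact that become_mapped_root() calls unshare(CLONE_NEWUSER) itself; a real tool combining it with other namespace flags would OR them into one unshare() call.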
Re: [Toybox] nproc(1)
On 4/29/24 16:56, enh via Toybox wrote:
> isn't nproc(1) just a call to sysconf(3) with either
> _SC_NPROCESSORS_ONLN for regular behavior, or _SC_NPROCESSORS_CONF for
> --all?

From musl src/conf/sysconf.c:

  case JT_NPROCESSORS_CONF & 255:
  case JT_NPROCESSORS_ONLN & 255: ;
    unsigned char set[128] = {1};
    int i, cnt;
    __syscall(SYS_sched_getaffinity, 0, sizeof set, set);
    for (i=cnt=0; i [...]
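For reference, the two approaches in this exchange (asking sysconf(3), versus counting bits in an affinity mask) look roughly like this. A minimal sketch: count_cpus() and nproc() are hypothetical helpers, the bit count is the generic clear-lowest-set-bit loop rather than a reconstruction of the musl loop that is cut off above, and _SC_NPROCESSORS_ONLN / _SC_NPROCESSORS_CONF are common extensions rather than something every libc is required to provide.

```c
#include <stddef.h>
#include <unistd.h>

/* Count the set bits in a CPU mask, one byte at a time. Each pass of
   the inner loop clears the lowest set bit of the byte. */
static int count_cpus(const unsigned char *set, size_t len)
{
  int cnt = 0;
  size_t i;

  for (i = 0; i < len; i++) {
    unsigned char byte = set[i];

    for (; byte; byte &= byte - 1) cnt++;
  }

  return cnt;
}

/* Processor count via sysconf(): configured CPUs if all is nonzero,
   otherwise online CPUs. Falls back to 1 if sysconf() reports an error. */
static long nproc(int all)
{
  long n = sysconf(all ? _SC_NPROCESSORS_CONF : _SC_NPROCESSORS_ONLN);

  return n < 1 ? 1 : n;
}
```

The difference matters because an affinity mask reflects which CPUs this process may run on, so a libc that answers both sysconf() queries from the mask can report the same number for both.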
Re: [Toybox] DreamHost Security Alert
On 4/24/24 13:10, Rob Landley wrote:
> Alas, my website's likely to be down for a bit while I explain to them that "the
> compiler that got used to build an exploit" and "the exploit" can share strings
> because gnu is incompetent and leaks the path where things got built into the
> resulting binaries, but that does not mean that the compiler the strings came
> from in the first place is actually infected.

And it's back. Human saw the email thread at 9am and took reasonable action. I was a little annoyed it was down all day, but eh: nine fives. Close enough. They're cheap and I don't have to do it.

Rob

(Before them I had a server with a static IP where I ran all my own servers, which meant I had one DNS server pointing to all the other services, and a number of sites went "but DNS says you need TWO authoritative servers" and I went "I'm not paying for a second static IP and all the records would point to the first static IP so if it goes down what does being able to look up the name of the services that aren't currently THERE accomplish?" And that's before DNS required cryptographic signatures, and then "sender permitted from" showed up in email around then and NONE of those checkers would work without 2 DNS servers so I _couldn't_ set it up...

So yes I _could_ get one of my orange pi boards sent to one of the raspberry pi hosting sites that give a static ipv4 as part of the hosting package, but... I really don't want to?)
Re: [Toybox] DreamHost Security Alert
Alas, my website's likely to be down for a bit while I explain to them that "the compiler that got used to build an exploit" and "the exploit" can share strings because gnu is incompetent and leaks the path where things got built into the resulting binaries, but that does not mean that the compiler the strings came from in the first place is actually infected. I mean, here's an article from 2018: https://www.bleepingcomputer.com/news/security/mirai-iot-malware-uses-aboriginal-linux-to-target-multiple-platforms/ Rob (I'd point to old blog entries where I went "huh, my compilers got used to build random russian malware" ten years ago, but my blog was on my site so you wouldn't see it unless I fish it out of archive.org...) Forwarded Message Subject: DreamHost Security Alert - Malware on landley.net Date: Wed, 24 Apr 2024 09:53:09 -0700 (PDT) From: DreamHost Abuse Team To: r...@landley.net Hello Rob Landley, We have received a report of malware at the following location: hXXps://landley.net/aboriginal/downloads/old/binaries/1.2.6/cross-compiler-armv7l.tar.bz2 This means that your site has likely been compromised. We have taken the site offline by renaming its directory (appended _DISABLED_BY_DREAMHOST). Please do not re-enable it until you can address the problem. In general, the three most common entry points for a compromised website are: 1. Vulnerable, typically out-of-date software (such as blogs, forums, CMS, associated themes and plugins, etc.) 2. A cracked/brute-forced admin login for a web application like WordPress, Joomla, Drupal etc. 3. A compromised FTP/SFTP/SSH user password. 1. All software you have installed under your domain should always be kept up-to-date with the most recent version available from the vendors' website, as these often contain security patches for known issues. Older versions of well-known and popular web software (including Wordpress, Drupal, Joomla, etc.) 
are known to have vulnerabilities that can allow injection and execution of arbitrary code. 2. If you utilize a web application with a script-based administrative backend (like WordPress, Joomla, or Drupal), make sure that you're not using a generic username like "admin" or "webmaster" for the user with administrative privileges. Hackers will slowly brute-force common usernames in order to get access to a script's backend and whatever tools exist there that allow file uploads, alterations, or execution of code. 3. FTP/SFTP/SSH passwords can be compromised and used to modify files. The most important part of securing your account in this case is to change your FTP user's password via the (USERS > MANAGE USERS) -> "Edit" area of the control panel. Passwords should not contain dictionary words and should be a string of at least 8 mixed-case alpha characters, numbers, and symbols. It is also recommended to always use Secure FTP (SFTP) or SSH rather than regular FTP, which sends passwords over the internet in plaintext. You can disable FTP for your user(s) within the DreamHost panel (USERS > MANAGE USERS) section. At this point, we recommend logging into your DreamHost server and removing the content we listed. (Note: You may first need to reset the permissions). You should also look for any other files/directories you did not upload yourself and update all your website components where applicable. As for determining which entry point is the cause of this incident, for 1 and 2, you can review the Apache logs for suspicious activity and requests to suspicious files. Keep in mind that we typically only keep around 5 days worth of Apache logs. 
For 3, you can refer to this article to find recent logins to your user: https://help.dreamhost.com/hc/en-us/articles/214915728-Determining-how-your-site-was-hacked For further help on this topic, you can refer to our Knowledge Base: https://help.dreamhost.com/hc/en-us/articles/215604737-Hacked-sites-overview https://help.dreamhost.com/hc/en-us/sections/203242117-Logs Lastly, we have scheduled an automated malware scan and if anything is found, we will send you a separate email with those results. If you need further assistance, please respond directly to this email. Thank you for your cooperation! -DreamHost Abuse Team ___ Toybox mailing list Toybox@lists.landley.net http://lists.landley.net/listinfo.cgi/toybox-landley.net
Re: [Toybox] [PATCH] xxd: -d Decimal Lables flag, Don't cap at one file
On 4/22/24 17:17, enh via Toybox wrote:
> ah, yeah, the _include_ path uses the full buffer and -r uses stdio
> buffering, but "regular" xxd was doing neither. i've sent out the
> trivial patch to switch to stdio.

Ah, performance tweak. *shrug* Applied...

Rob
Re: [Toybox] [PATCH] xxd: buffer input via stdio.
On 4/22/24 17:17, enh via Toybox wrote:
> ---
>  toys/other/xxd.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)

What's the issue this fixes? It's not:

  for i in $(seq 1 100); do echo $i; sleep 1; done | ./xxd

Because that won't produce output for a couple minutes...

Rob
Re: [Toybox] shuf works on FreeBSD
On 4/20/24 03:42, Vidar Karlsen via Toybox wrote:
> toys/other/shuf.c builds and runs on FreeBSD and can be enabled in
> freebsd_miniconfig with CONFIG_SHUF=y.
>
> I can't think of a use case for it, but I'm sure there are some.

I thought it was enabled in commit 93c8ea40a back in November?

Rob
Re: [Toybox] Microsoft github took down the xz repo.
On 4/15/24 03:53, Jarno Mäkipää wrote:
> On Sun, Apr 14, 2024 at 9:14 AM Oliver Webb via Toybox wrote:
>>
>> To revive an old thread with new technical info I stumbled upon:
>>
>> On Saturday, March 30th, 2024 at 15:58, Rob Landley wrote:
>>
>> > I set up gitea for Jeff on a j-core internal server, and it was fine
>> > except it used a BUNCH of memory and cpu for very few users. Running
>> > cgi on dreamhost's servers is a bother at the best of times (I don't
>> > want to worry about exploits), and the available memory/CPU there is
>> > wind-up toy levels.
>> >
>> > My website is a bunch of static pages rsynced into place, some of
>> > which use xbithack to enable a crude #include syntax, and that's
>> > about what the server can handle.
>>
>> Going through the list of "minimal tools" on https://suckless.org/rocks/,

Not really a fan of that site. I did a roadmap section on them long ago
(https://landley.net/toybox/roadmap.html#sbase), but I'm trying to implement
mostly compatible versions of things that already exist, and they're trying
to invent new things that didn't previously exist because
https://xkcd.com/927/ which I mostly consider fragmentation rather than
helping, and I try not to encourage them.

>> I stumbled upon a git frontend called stagit
>> (https://git.codemadness.org/stagit/file/README.html)
>> which the suckless project uses as its git frontend.

When microsoft bought github I mirrored my repo on my website so you could
pull it from there, but doing that doesn't have any web interface so I did a
quick and dirty bash script to upload the "git format-patch" of each commit,
with symlinks from the 12 character hash to the full hash (because doing
_each_ one was an insanely slow exercise in inode exhaustion).

You're once again telling me what I did was not good enough for you, and
that I am wrong, and must change to suit you.

>> But to have a solution, you must have a problem.
>> The 2 main issues I have with the current git management are the fact

I'm very tired.

>> there doesn't seem to be a way to clone the current repo directly from
>> landley.net (Making Microsoft GitHub the middleman).

  $ git annotate www/header.html | grep -w git
  fb47b0120 (Rob Landley 2021-09-12 14:33:36 -0500 30) https://landley.net/toybox/git>local
  $ git show fb47b0120
  commit fb47b0120f7aa73c0821a8c55e15540d83baed01
  Author: Rob Landley
  Date: Sun Sep 12 14:33:36 2021 -0500

      Add a local git mirror (todo item since github was acquired)...

  diff --git a/www/git/index.html b/www/git/index.html
  new file mode 100644
  index ..bade8d1b
  --- /dev/null
  +++ b/www/git/index.html
  @@ -0,0 +1 @@
  +Not browseable: git clone https://landley.net/toybox/git
  $ git log scripts/git-static-index.sh
  commit 990e0e7a40e4509c7987a190febe5d867f412af6
  Author: Rob Landley
  Date: Sat Dec 24 06:34:11 2022 -0600

      Script to put something browseable in https://landley.net/toybox/git

https://landley.net/notes-2022.html#22-12-2022

>> And the fact I can't browse the source code without github or android code
>> search acting as the middleman

I do not have source tree snapshots up. Kinda hard to do in a static manner
without uploading rather a LOT of files (and even if you upload each version
of "git log" for each file and create an index file for each commit with the
ls -lR of the whole tree linking to the relevant version, the URLs to the
files are ugly. I can do it, but don't really want to? Linking to individual
lines of the file while also having the raw text kinda implies uploading two
versions and I just dowanna. Oh, and dreamhost's server config doesn't have
sane file associations for all the types so if I put up a .c file it wants to
DOWNLOAD it instead of displaying it as text and trying to .htaccess that is
more of a pain than I'm up for, so I would wind up having blah.c.txt and
blah.c.html files and that's just ugly...)
Plus, syntax highlighting: you'd THINK there would be some nice linux syntax highlighting packages out there but not counting "use vi" (which doesn't work for me anyway, :syntax = "E319: Sorry, the command is not available in this version")... Searching around I found https://github.com/alecthomas/chroma which is very proud that it's written in "pure go"... except it's a wrapper for a python library, and python's runtime is written in C, so DEFINE PURE... Digging into the aforementioned python (don't get me started) library, the "python-pigmentize" package installs the man page for a command "pygmentize", and the bash completion for the command pygmentize, but does not install the actual command in the $PATH (or anywhere
Re: [Toybox] df not working on FreeBSD
On 4/15/24 04:37, Vidar Karlsen via Toybox wrote:
> Hello,
>
> df throws the following error on FreeBSD:
>
> root@140amd64_noopts-usrports:/usr/local/toybox/bin # ./df /
> df: getmntinfo: Invalid argument
>
> A little bit of poking around shows that getmntinfo expects the second
> argument (the mode) to be one of these, and not 0:

Presumably it worked at one point, but I didn't write that bit...

> sys/sys/mount.h:
> #define MNT_WAIT    1 /* synchronously wait for I/O to complete */
> #define MNT_NOWAIT  2 /* start all I/O, but do not wait for it */
> #define MNT_LAZY    3 /* push data not written by filesystem syncer */
> #define MNT_SUSPEND 4 /* Suspend file system after sync */
>
> Changing 0 to MNT_NOWAIT in portability.c makes df happy again.

And doesn't break macos, so I'm not adding the #ifdef in your patch. (I don't
have an openbsd test environment lying around, but
https://man.openbsd.org/getmntinfo.3 links to
https://man.openbsd.org/getfsstat.2 which says the options are MNT_WAIT and
MNT_NOWAIT so presumably they're happy too.)

Commit 7d9ee89d3cf8.

Rob
Re: [Toybox] httpd: How is this supposed to be _used_?
On 4/13/24 14:09, Oliver Webb via Toybox wrote: > The first thing I ran into is that httpd doesn't do that by default, > running "toybox httpd dist/" won't actually host those pages > on localhost. It's an inetd client: https://en.wikipedia.org/wiki/Inetd toybox netcat -s 127.0.0.1 -p 8 -L httpd . I've been meaning to come up with an actual inetd, and possibly lib/*.c plumbing to do standalone servers, but nommu support and rate limiting incoming connections and so on all go in a layer I haven't implemented yet and am not interested in reproducing in multiple commands. Genericizing the plumbing I've already got in netcat, but making it available from individual commands, implies having a standard set of command line utilities that get exposed in commands to specify address to bind to and port to listen on and max simultaneous connections (including max per source IP) and output inactivity timeout Possibly some sort of IPSERVER macro flung into the option string, with a corresponding structure in TT and then a function I call? Or maybe just stick with inetd so it's somebody else's problem... I explain this here periodically, by the way: http://lists.landley.net/pipermail/toybox-landley.net/2024-January/03.html In theory a tcpsvd was contributed to pending long ago, which kind of has the od/hexdump/xxd problem of multiple implementations not sharing code (as I periodically mention here, ala http://lists.landley.net/pipermail/toybox-landley.net/2023-January/029410.html). It's in the todo heap... > "Why?": Looking at the source code and typing > input into httpd, it wants input from stdin and seemingly outputs to > stdout like a normal unix tool (which httpd is usually not). As all inetd clients do, yes: nbd_server.c is another one. Lots of other things (like the tftpd in pending, or dropbear) can work in inetd mode. > Forgive me, but I'm going to compare this to busybox httpd. You do you. 
Rob
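The inetd model described in this message (the service reads one connection on stdin and answers on stdout, while a separate listener owns the socket) can be illustrated with a few lines of shell. This is only a sketch under stated assumptions: serve_one is a hypothetical function name, the canned response is made up for illustration, and none of it reflects how toybox httpd is actually implemented.

```shell
#!/bin/sh
# Hypothetical inetd-style handler: it never opens a socket itself.
# Whatever listener launched it (inetd, or netcat in listen mode) has
# already connected stdin and stdout to the client.

cr=$(printf '\r')

serve_one() {
  # Request line looks like: GET /index.html HTTP/1.0
  IFS=' ' read -r method path version || return 1

  # Skip the request headers up to the blank line that ends them.
  while IFS= read -r line; do
    line=${line%"$cr"}
    [ -z "$line" ] && break
  done

  # Canned reply, written to stdout like any other unix filter.
  body="you asked for $path"
  printf 'HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\nContent-Length: %s\r\n\r\n%s' \
    "${#body}" "$body"
}
```

Saved as a script that ends by calling serve_one, this could be handed to a listener the same way the message hands httpd to "toybox netcat ... -L", with a port of your choosing. Because the handler is just a filter, it can also be exercised from a pipe with no network involved.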
Re: [Toybox] today in "shut up, gnu!"
On 4/12/24 13:24, enh via Toybox wrote: > ~/aosp-main-with-phones$ find external/ -name NOTICE -type l -maxdepth 2 > find: warning: you have specified the global option -maxdepth after > the argument -name, but global options are not positional, i.e., > -maxdepth affects tests specified before it as well as those specified > after it. Please specify global options before other arguments. > > (it does do the right thing, but insists on whining first.) I've hit that too, and am big into Not Doing That. Thought I'd blogged about it, but it could have been irc, or twitter (which I deleted when twitler bought it but have an archive I should probably post somewhere), or... probably too old for mastodon? There's a reason I get so exasperated about each new gnu/nag I stub my toe on. It's gone beyond isolated incident into "pattern of looking down on everyone else and sneering". Unix has always been a silent protagonist, without which shell scripts are a pain to do. If it doesn't work, they'll figure out why. Just behave consistently (according to SOME kind of understandable logic) and let them keep the pieces. Sometimes there's a -v flag to activate printfs() stuck into the code, but don't express opinions when they didn't ASK. (Put them in the man page or --help if it's that important.) This has ALWAYS been the unix way. There are ALWAYS corner cases, and deterministic behavior is not difficult to debug. The gnu/FSF never got that. Stallman only decamped to unix under protest, a refugee from the Jupiter project's collapse orphaning ITS, and he never really understood it. RMS did not INVENT the idea of cloning unix with his big announcement in 1983. Unix was a diverse community starting from the 1974 ACM article, let alone the Berkeley Software Distribution in 1975. The first full from-scratch Unix clone (writing their own kernel, compiler, and command line) was Coherent, which shipped in 1980. 
Paul Allen copied subdirectories and file descriptors from unix into DOS 2.0 not long after. Minix started in 1983 and shipped in 1986, and Linux is 100% a descendant of Minix (developed on minix, its first filesystem was minix, the development discussion on comp.os.minix, he inherited 80% of the minix community because he took patches and Tanenbaum didn't...) There's a famous tanenbaum-torvalds debate preserved for posterity, there is NOT a stallman-torvalds debate because nobody cared what stallman had to say. Nor did he invent freeware, which was the universal norm before the Apple vs Franklin decision in 1983 because you couldn't copyright binaries before Steve Jobs got the appeals court to change the law. Byte and Compute magazines had basic listings in the back of each issue for you to type in, decus and CP/M northwest had software libraries, the commodore 64 came bundled with a disk of Jim Butterfield's software but he didn't WORK for them: he founded the Toronto Pet User's Group (TPUG) and published free software with source code. But Stallman mansplained at everyone else at the top of his lungs nonstop from the moment he showed up, and there are all sorts of topics that can't NOT have an "as opposed to what stallman's saying, the truth is" section today... https://en.wikipedia.org/wiki/Freeware Sigh, watching https://youtu.be/2gOGHdZDmEk and https://youtu.be/WWfsz5R6irs and https://youtu.be/9RO5ZAmzjvI every time the narration talks about Pierre Spray I get Stallman vibes. There's a broadcast version of Dunning-Kruger where you plausibly preach to an audience who doesn't know better, and become The Expert that everybody must get a quote from every time something happens in that area, while the people actually doing the work facepalm at every third word. Rob ___ Toybox mailing list Toybox@lists.landley.net http://lists.landley.net/listinfo.cgi/toybox-landley.net
Re: [Toybox] uname no longer broken on FreeBSD?
On 4/13/24 03:00, Vidar Karlsen via Toybox wrote:
> Hello,
>
> toys/posix/uname.c builds and runs on FreeBSD now. I have tested it on
> 13.2-amd64, 14.0-amd64, 13.2-arm64 and 14.0-arm64. I think it's safe to
> put CONFIG_UNAME=y back into kconfig/freebsd_miniconfig.

Ah, commit d2bada0e42e6 fixed it but I only remembered to add it to
macos_defconfig, forgot the other one.

Thanks,

Rob
Re: [Toybox] prereq build, what is the motivation behind building od?
On 4/8/24 14:20, Oliver Webb via Toybox wrote:
> Although I may be wrong, "od" doesn’t seem to be in
> the build infrastructure. What’s the reason for it being a
> "prereq" command.

  $ vi scripts/recreate-prereq.sh
  ...
  $ grep '^od ' log.txt
  od "-Anone" "-vtx1"

https://github.com/landley/toybox/blob/0.8.11/scripts/make.sh#L230

> Also, have you thought about specifying FILES through
> the command line to reduce build time by only building what we need to.

Have I thought about micromanaging the build in a way that may not link in
combination with a given set of generated/*.h files? Probably at some point.
Keep in mind I've been doing this stuff on and off since... depending on how
you want to look at it, 1999.

> Scanning for commands with “which”

"which" looks at what's installed on the host out of the $PATH. What does
that have to do with what's configured in toybox? (If I supplied an airlock
I specified the $PATH...)

> and maybe uname for stuff like gsed

You mean use uname to figure out if we're running on MacOS or FreeBSD like
the code already does in scripts/portability.sh?

> and putting them
> in FILES if we don’t have a good enough version.

I built defconfig under record-commands, and then did the standard
"awk '{print $1}' | sort -u | xargs" trick from literally _decades_ ago:

https://github.com/landley/aboriginal/blob/dbd0349d8ae6/sources/toys/report_recorded_commands.sh#L10
https://landley.net/aboriginal/FAQ.html#:~:text=logging%20wrapper

to get a list of the commands used by that, and used that to generate a
toybox .config file enabling those commands.
I then made a SHELL SCRIPT that DID ALL THAT so you could SEE HOW/WHY IT WAS
BUILT (and also so I could automate updating it, yes I should probably add it
to release.txt):

https://github.com/landley/toybox/blob/master/scripts/recreate-prereq.sh

And tried to explain that I'd done so:

https://github.com/landley/toybox/commit/d1acc6e88be5

And how to use the result:

https://github.com/landley/toybox/commit/3bbc31c78b41

> Then generating generated/
> files based off of that?

No.

Rob
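The "awk '{print $1}' | sort -u | xargs" pipeline quoted in this message can be tried on a small made-up log. The sketch below assumes a log with one command invocation per line and the command name in the first field, as in the od line shown in the message. The sample log contents and the used_commands wrapper are hypothetical, and this is not the actual recreate-prereq.sh.

```shell
#!/bin/sh
# Hypothetical sample of a recorded-commands log: one invocation per line,
# command name first, quoted arguments after it.
log='od "-Anone" "-vtx1"
sed "-n" "s/x/y/p"
od "-Anone" "-vtx1"
grep "-q" "foo"'

# Keep the first field, drop duplicates, and join the result onto one line.
used_commands() {
  awk '{print $1}' | sort -u | xargs
}

printf '%s\n' "$log" | used_commands   # prints: grep od sed
```

The one-line result is convenient for looping over in a shell script, for example to emit one config symbol per command name.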
Re: [Toybox] [PATCH] timeout.test: reduce flake.
Catching up. (I let stuff pile up preparing for the release and then took a
couple days off, and now I'm at texas linuxfest doing sleep deprived talk
prep for tomorrow...)

On 4/8/24 15:28, enh via Toybox wrote:
> A (presumably overloaded) CI server saw the `exit 0` test time out.
> Given that several of these tests should just fail immediately,
> having a huge timeout isn't even a bad thing --- if we had a bug
> that caused us to report the correct status, but not until the
> timeout had _also_ expired, this would make that failure glaringly
> obvious.
>
> Aren't the other tests with 0.1s timeouts potentially flaky? Yes,
> obviously, but I'll worry about those if/when we see them in real
> life? (Because increasing those timeouts _would_ increase overall
> test time.)

Yes it should never happen, but 11 minutes seems like a footgun. I bumped it
up to 1 second (10 times as long as before). If you see it again I can bump
it to 5 seconds, but much beyond 1 second and the "timeout -v .1 sleep 3"
test later on gets flaky, as does:

  toyonly testcmd "-i" \
    "-i 1 sh -c 'for i in .25 .50 2; do sleep \$i; echo hello; done'" \
    "hello\nhello\n" "" ""

Rob
[Toybox] Release 0.8.11 is out.
Yeah, a bit overdue. Lemme know if anything in the release notes isn't clear.

Still doing a texas linuxfest talk soonish. Hopefully they post a video
eventually...

Rob
Re: [Toybox] tail test failures?
On 4/8/24 11:14, enh via Toybox wrote:
> looks like the github CI has been red for ubuntu and macOS since april 5th?
>
> this revert fixes the current failing test:
>
> [master 8368f8f9] Revert "Enforce min/max for % input type (time in
> seconds w/millisecond granularity)."
>
> but that just gets me a different failing test, so it's obviously a
> bit more subtle than that :-)

Darn it, didn't get a release out on leap day, didn't get a release out
during the eclipse... Always one more thing.

(Pay no attention to the binaries I just uploaded, gotta rebuild them and do
it again. This is why I push the tag and update the news.html file on the
website LAST...)

Rob
Re: [Toybox] utf8towc(), stop being defective on null bytes
On 4/8/24 11:53, Oliver Webb wrote:
> Still, U+0000 is a valid code point, and having a special case especially
> for it that isn’t mentioned but you have to watch out for is either a bug
> or a documentation error.

I say it's intentional, you reassert that I'm wrong. I'll leave you to your
opinion...

Rob
Re: [Toybox] utf8towc(), stop being defective on null bytes
On 4/8/24 11:01, enh wrote:
>> > Returning length 0 means we hit a null terminator,
>>
>> Null bytes aren't always "terminators". You can embed null bytes into data
>> and still want to do utf8 processing with it.
>
> that's questionable ... the desire to have ASCII NUL in utf-8
> sequences (without breaking the "utf-8 sequences are usable as c
> strings" property) is the main reason for the existence of "modified
> utf-8".

You don't need a conversion function to grab a nul byte, you can check if
it's a null byte. That value _is_ a special case, the enclosing loop can deal
with it easily enough (there's nothing to convert, it's a NUL byte, check
directly).

I've got functions like regexec0() that work over a range instead of using a
NUL, and those have to deal with libc's regex stopping at NUL so the
enclosing loop advances past it and restarts.

Rob
Re: [Toybox] utf8towc(), stop being defective on null bytes
On 4/6/24 17:48, Oliver Webb via Toybox wrote: > Heya, looking more at the utf8 code in toybox. The first thing I spotted is > that > utf8towc() and wctoutf8() are both in lib.c instead of utf8.c, why haven't > they > been moved yet, is it easier to track code that way? The "yet" seems a bit presumptuous and accusatory, but given the title of the post I suppose that's a given. I have no current plans to move xstrtol() from lib.c to xwrap() And atolx() is only called that instead of xatol() because it does suffixes. The reason it had to go in lib.c back in the day was explained in the commit that moved it to lib.c: https://github.com/landley/toybox/commit/6e766936396e As for moving it again someday, unnecessarily moving files is churn that makes the history harder to see, and lib/*.c has never been a strict division (more "one giant file seems a bit much"). The basic conversion to/from utf8 is different from caring about the characteristics of unicode code points (which the rest of utf8.c does), so having it in lib.c makes a certain amount of sense, and I'm not strongly motivated to change it without a good reason. It might happen eventually because I'm still not happy with the general unicode handling design "yet", but that's a larger story. Way back when there was "interestingtimes.c" for all my not-curses code, but it was too long to type and mixed together a couple different kinds of things, so I split it into utf8.c and tty.c both of which were shorter and didn't screw up "ls" columnization. (I probably should have called it unicode.c instead, but unicode is icky, the name is longer, and half the unicode stuff is still in libc anyway). Unicode is icky because utf8 and unicode are not the same thing. 
Ken Thompson came up with a very elegant utf8 design and microsoft crapped all over it (cap the conversion range, don't add the base value covered by the previous range so there are no duplicate encodings) for no apparent reason, and then unicode just plain got nuts. (You had an ENORMOUS encoding space, the bottom bit could totally have been combining vs physical characters so we don't need a function to tell, and combining characters should 100% have gone BEFORE the physical characters rather than after to avoid the whole problem of FLUSHING them, and higher bits could indicate 1 column vs 2 column or upper/lower/numeric so you don't have to test with special functions like that, just collage them into LARGE BLOCKS which is LESS SILLY than the whole "skipping 0xd800" or whatever that is for the legacy 16 bit windows encoding space that microsoft CRAPPED INTO THE STANDARD... Ahem.) But alas, microsoft bought control of the unicode committee, so you need functions to say what each character is, and those functions are unnecessarily complicated. In theory libc has code to do wide char conversions already, but glibc refuses to enable it unless you've installed and selected a utf8-aware locale (which is just nuts, but that's glibc for you). I made some clean dependency-free functions to do the simple stuff that doesn't care what locale you're in, but there's still wcwidth() and friends that depend on libc's whims (hence the dance to try to find a utf8 locale in main.c, and the repeated discussion on this list between me and Elliott and Rich Felker about trying to come up with portable fontmetrics code. Well, column-metrics. Elliott keeps trying to dissuade me, but bionic's code for this still didn't work static linked last I checked...) Moving stuff around between files when I'm not entirely satisfied with the design (partly depending on libc state and partly _not_ depending on it) doesn't seem helpful. 
> Also, the documentation (header comment) should probably mention that
> they store stuff as unicode codepoints,

Because I consistently attach comments before the function _body_ explaining
what the function does, instead of putting long explanations in the .h files
included from every other file which the compiler would have to churn through
repeatedly. In this case:

  // Convert utf8 sequence to a unicode wide character
  // returns bytes consumed, or -1 if err, or -2 if need more data.
  int utf8towc(unsigned *wc, char *str, unsigned len)

> I spent a while scratching my head at the fact wide characters are 4 byte
> int's when the maximum utf8 single character length is 6 bytes.

Because Microsoft broke utf8 in multiple ways through the unicode consortium,
among other things making 4 bytes the max:

http://lists.landley.net/pipermail/toybox-landley.net/2017-September/017184.html

In addition to the mailing list threads, I thought I blogged about this
rather a lot at the time:

https://landley.net/notes-2017.html#29-08-2017
https://landley.net/notes-2017.html#01-09-2017
https://landley.net/notes-2017.html#19-10-2017

Which was contemporaneous with the above git commit that added the function
to lib/lib.c. I generally find that stuff by going "when did this code show
up and/or get
[Toybox] scripts/prereq/build.sh
I recently added scripts/prereq/build.sh which runs a "cc -I dir *.c" style
build against canned headers. Theoretically a portable build not requiring a
system to have any command line utilities except "cc" and a shell. (Ok, you
still need bash to run scripts/make.sh and scripts/install.sh until toysh is
promoted. And until I replace kconfig, you still need gmake to run "make
defconfig", but I've got a design for that one now.)

Both that build.sh script and the saved scripts/prereq/generated headers are
created by scripts/recreate-prereq.sh which figures out what commands a
toybox build uses out of the $PATH (by doing a defconfig build under
mkroot/record-commands.sh), makes a .config file with just those commands
enabled and all dependencies switched off (and hardwires the two not-android
not-mmu symbols that get compiler probed), then strips down the resulting
headers to have just the symbols those commands need. (Well, I haven't
stripped down config.h yet but all the OTHERS are hit with sed/grep to remove
stuff for the commands that aren't enabled.)

Of course when I ran it on macos it went "boing":

  toys/other/taskset.c:52:17: error: use of undeclared identifier '__NR_sched_getaffinity'
  toys/other/taskset.c:81:15: error: use of undeclared identifier '__NR_sched_setaffinity'
  toys/other/taskset.c:119:29: error: use of undeclared identifier '__NR_sched_getaffinity'
  3 warnings and 3 errors generated.

It's trying to build nproc, which scripts/make.sh uses out of the $PATH to
query available processors. And yes, nproc calls sched_getaffinity() on linux
(even the debian one, according to strace) which isn't really portable...

In theory, I've got some workaround code for nproc being unavailable in
scripts/portability.sh already:

  # Probe number of available processors, and add one.
  : ${CPUS:=$(($(nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null)+1))}

I'm uncomfortable leaning in to "linux else bsd/mac" because I was also
thinking about stuff like qnx and vxworks and so on with the new "canned"
build, but if all the probes fail that becomes CPUS=$((+1)) and thus sets it
to 1, which should still work if I filter out nproc and sysctl isn't there
either?

But I'd also like to build nproc for other targets if I could. Which sounds
like it turns into a portability.c mess pretty quickly...

Rob
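The fallback reasoning in this message (both probes failing leaves an empty string, so the arithmetic collapses to $((+1)) and yields 1) can be checked with a small sketch. It restates the portability.sh line with the two probe commands passed in as arguments, which is a hypothetical change made only so the failure paths can be exercised. The function and the fake probes below are illustrative names, not toybox code.

```shell
#!/bin/sh
# probe_cpus FIRST SECOND
# Same shape as the CPUS line quoted above: try FIRST, fall back to
# "SECOND -n hw.ncpu", then add one. If neither probe prints anything the
# expression inside $(( )) is just "+1", which evaluates to 1.
probe_cpus() {
  echo $(($("$1" 2>/dev/null || "$2" -n hw.ncpu 2>/dev/null)+1))
}

# Hypothetical stand-ins for nproc and sysctl so the result is predictable.
fake_nproc() { echo 4; }
fake_sysctl() { echo 8; }

probe_cpus fake_nproc fake_sysctl           # first probe works: prints 5
probe_cpus no_such_probe_a fake_sysctl      # falls back to second: prints 9
probe_cpus no_such_probe_a no_such_probe_b  # both fail: prints 1
```

Passing the real nproc and sysctl names gives the same behavior as the original one-liner on a host that has either tool.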
[Toybox] Mkroot talk at texas linuxfest on the 13th.
They posted the description. It's basically "45 minutes about mkroot":

https://2024.texaslinuxfest.org/talks/mkroot-tiny-linux-system-builder/

Rob
Re: [Toybox] more gnu nonsense: cp -n
On 4/2/24 01:35, Ryan Prichard wrote:
> Apparently upstream coreutils "cp -n" changed between 9.1 and 9.2, and the
> Debian maintainers reverted the change temporarily(?) and also added the
> "non-portable" error.
>
> In coreutils 9.1 and older, "cp -n" quietly skipped a file if the
> destination existed, but as of 9.2, it instead prints an error and exits
> with non-zero at the end. (I saw some stuff about "immediately failing" on
> the Debian bug, but AFAICT, cp keeps going and fails at the end.) It does
> look like the new 9.2+ behavior matches "cp -n" on macOS (14.3.1) (and
> probably FreeBSD but I didn't test that).

In toybox, I tend to repeat an option to get that sort of behavior, so I'd do:

  cp -n thingy...  - skip files, no error
  cp -nn thingy... - skip files, with error

That way the existing behavior doesn't change, and old versions that don't
understand the doubling still provide the old behavior (because cp -n -n =
cp -n by default) without erroring out on an unknown flag or consuming more
namespace.

See toybox's "ls -ll" (shows nanoseconds) or "lsusb -nn" (numeric AND
non-numeric output) for examples. And yes, debian handles "ls -ll" just
fine. :)

Rob
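The repeated-option idea in this message (one -n skips quietly, a second -n escalates to an error) amounts to counting how many times the flag appears. Here is a minimal sketch using POSIX getopts. The count_n and describe helpers and the three mode names are hypothetical, and this is not how toybox parses its option strings.

```shell
#!/bin/sh
# count_n ARGS...
# Print how many times -n appears among the leading options.
count_n() {
  n=0
  OPTIND=1
  while getopts n opt; do
    case $opt in
      n) n=$((n+1)) ;;
      *) return 2 ;;
    esac
  done
  echo "$n"
}

# describe ARGS...
# Map the repeat count to the behavior sketched in the message.
describe() {
  case $(count_n "$@") in
    0) echo overwrite ;;
    1) echo skip-quietly ;;
    *) echo skip-with-error ;;
  esac
}

describe one two        # prints: overwrite
describe -n one two     # prints: skip-quietly
describe -nn one two    # prints: skip-with-error
describe -n -n one two  # prints: skip-with-error
```

A parser that only records whether -n was seen, rather than counting, treats -nn exactly like -n, which is the backward-compatibility property the message relies on.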
Re: [Toybox] more gnu nonsense: cp -n
On 4/1/24 10:31, enh via Toybox wrote:
> hadn't seen this one before...
>
> cp: warning: behavior of -n is non-portable and may change in future;
> use --update=none instead
>
> (consider me skeptical that a system without -n is going to have
> --update=none...)

Define non-portable? Freebsd 14 has -n, macos has -n, busybox cp has -n, and
of course toybox (and thus android) has -n. Meanwhile:

  $ ./busybox cp --update=none one two
  cp: option '--update' doesn't allow an argument
  root@freebsd:~ # cp --update=none one two
  cp: illegal option -- -
  root@freebsd:~ # cp --update=none one two
  cp: illegal option -- -
  $ toybox cp --update=none one two
  cp: Unknown option 'update=none' (see "cp --help")

Those clowns are explicitly advocating for a LESS portable option.

This is why I'm not removing "egrep", which is a shell wrapper on my devuan
system by the way:

  $ which egrep
  /bin/egrep
  $ cat /bin/egrep
  #!/bin/sh
  exec grep -E "$@"

At least THAT one is easy for distributions to keep doing regardless of
gnu/stupid. If the solution for cp -n isn't "distro patches out the stupid",
then "install busybox cp" or just "use alpine". Spurious warnings from gnu
are just that: spurious.

Rob
Re: [Toybox] Microsoft github took down the xz repo.
On 3/30/24 15:16, Oliver Webb wrote: > On Saturday, March 30th, 2024 at 15:06, Rob Landley wrote: >> FYI, Microsoft Github disabled the xz repository because it became >> "controversial" (I.E. there was an exploit in the news). >> >> https://social.coop/@eb/112182149429056593 >> >> https://github.com/tukaani-project/xz > > They couldn't have removed commit access for the trojan horse and got on with > their lives? Mastodon's been talking about this at length all day: https://mstdn.social/@rysiek/112184610302366603 https://hachyderm.io/@dalias/112182128889536710 https://cyberplace.social/@GossiTheDog/112184645230558304 https://social.secret-wg.org/@julf/112184194797977290 https://mastodon.social/@richlv/112180479433832095 And a lot of things the discussion was linking to went away. Oh well... >> I'm assuming if toybox ever has a significant bug, microsoft would respond by >> deleting the toybox repository. There's a reason that I have >> https://landley.net/toybox/git on my website, and my send.sh script pushes to >> that before pushing to microsoft github. > > As much as it doesn't matter, I've wondered what git web frontend you use, > The html source for > the massive table of commits doesn't give a copyright notice. https://github.com/landley/toybox/blob/master/scripts/git-static-index.sh https://landley.net/notes-2022.html#22-12-2022 > Do you just have a script make > a table out of "git log"? Furthermore, have you considered using cgit or > gitea or another > fancier git frontend for your own site? I engaged with cgit at one point and found it overcomplicated and unpleasant. I set up gitea for Jeff on a j-core internal server, and it was fine except it used a BUNCH of memory and cpu for very vew users. Running cgi on dreamhost's servers is a bother at the best of times (I don't want to worry about exploits), and the available memory/CPU there is wind-up toy levels. 
My website is a bunch of static pages rsynced into place, some of which use
xbithack to enable a crude #include syntax, and that's about what the server
can handle.

> There is also the issue of you not being able to push commits to the github
> repo because github is forcing everyone to use 2FA.

I haven't been hit by that yet for some reason. I push from the command line
anyway (which is basically ssh), so if I lost website access I could
presumably still update the README to let people know where to go.

Rob
Re: [Toybox] Microsoft github took down the xz repo.
On 3/30/24 15:11, Rob Landley wrote: > upstream of the xz-embedded repo with the public domain code I cloned is: > > https://git.tukaani.org/xz-embedded.git > > Which is still available. Although now that I look at it, a5390fd368f8 in September is the last commit that wasn't from the backdoor guy anyway, so nothing new of interest. Rob
[Toybox] Microsoft github took down the xz repo.
FYI, Microsoft Github disabled the xz repository because it became "controversial" (I.E. there was an exploit in the news). https://social.coop/@eb/112182149429056593 https://github.com/tukaani-project/xz I'm assuming if toybox ever has a significant bug, microsoft would respond by deleting the toybox repository. There's a reason that I have https://landley.net/toybox/git on my website, and my send.sh script pushes to that _before_ pushing to microsoft github. Luckily the xz guys don't seem to trust microsoft github either, because the upstream of the xz-embedded repo with the public domain code I cloned is: https://git.tukaani.org/xz-embedded.git Which is still available. Rob
Re: [Toybox] [PATCH] Clean up xz a good amount
On 3/29/24 17:50, Oliver Webb wrote: >> > ah, crap, that's another thing to put on the riscv64 to-do list... >> > (thanks for bringing that to light!) >> >> so, TIL that upstream already added a risc-v bcj implementation... > > I always thought that the xz decompressor we use in toybox ("xz-embedded") and > the main > one (The one with the CVE) were different projects (Separate git repos, one > is much slower > than the other, etc). The exploit was somebody checked a "test case" into the build system that hacked the rest of the build with an x86-64 binary blob that linked before the other functions? https://youtu.be/jqjtNDtbDNI I was only halfway paying attention once I was sure it didn't affect toybox. My systems here use dropbear for ssh anyway, yes including my laptop. :) > That being said, There are 0BSD licensed parts in the xz repo > (one of SIX different licenses). Huh, really? Cool... >> (rob will of course be delighted to hear of systemd's involvement in >> the exploit chain :-) ) > > Who would've known that an over-complicated, extremely large hairball with a > massive dependency chain > that tries to consume _everything_ makes it easy to perform exploits. Deleted long grumbling about adding complexity probably means you're _reducing_ security because the system is less auditable: a signing chain of custody is still GIGO it just means it was delivered to you by TIVO with a mandatory EULA so you can't personally FIX it... Ahem. Tangent. Not going there. Rob
Re: [Toybox] [PATCH] Clean up xz a good amount
On 3/29/24 17:28, enh wrote: > On Wed, Feb 28, 2024 at 9:13 AM enh wrote: >> > > @@ -639,6 +640,20 @@ enum xz_ret xz_dec_bcj_run(struct xz_dec_bcj *s, >> > > struct xz_dec_lzma2 *lzma2, >> > > */ >> > > enum xz_ret xz_dec_bcj_reset(struct xz_dec_bcj *s, char id) >> > > { >> > > + switch (id) { >> > > + case BCJ_X86: >> > > + case BCJ_POWERPC: >> > > + case BCJ_IA64: >> > > + case BCJ_ARM: >> > > + case BCJ_ARMTHUMB: >> > > + case BCJ_SPARC: >> > > +break; >> > > + >> > > + default: >> > > +/* Unsupported Filter ID */ >> > > +return XZ_OPTIONS_ERROR; >> > > + } >> > > + >> > >s->type = id; >> > >s->ret = XZ_OK; >> > >s->pos = 0; >> >> ah, crap, that's another thing to put on the riscv64 to-do list... >> (thanks for bringing that to light!) > > so, TIL that upstream already added a risc-v bcj implementation... I'm happy to call the public domain repo our "upstream" for this, but there's still some collation damage (they have many files and we want either one or two), and a lot of cleanup that could be done in our code that moves it farther from their code. As for whether we want one file or two: one model is the engine in the command ala toys/*/bzcat.c and the other is lib/deflate.c called by toys/*/gzip.c (but also available for other things to pull in without having to fork a child process and pipe data through it). But the real difference there is deflate has half an inflate already that I REALLY SHOULD FINISH (dictionary selection and resets, everything else is just a question of doing the work) and xz compression seems a bit out of scope. (Being able to read everything: yay. Being able to compress data, gzip is the 80/20. Modulo busybox refuses to build without bzip2 compression (I hit it until it confessed in mkroot/packages/busybox.c but that broke all the help text), and I did WRITE a cleaned up bzip2 engine many moons ago (reposted it here not too long ago I think), so I _could_ have a lib/bzip2.c with a compression side if I wanted to? 
Modulo the bzip2 compression side string sort logic never made sense to me (what is the logic of falling back from one sort mechanism to the next, why those in that order with those thresholds) so to test my engine I had to block copy the original sort logic, which has licensing issues... > ...but i only learned that because i was looking at > https://www.openwall.com/lists/oss-security/2024/03/29/4 which was > fascinating in many ways. > > (rob will of course be delighted to hear of systemd's involvement in > the exploit chain :-) ) I saw a youtube video on it, and it's been all over mastodon today. So much unnecessary complexity. Adding layers to "solve" problems is painting over dry rot. There are reasons I also want to simplify the build system itself, and care so much about comparing the behavior across multiple platforms... Rob ___ Toybox mailing list Toybox@lists.landley.net http://lists.landley.net/listinfo.cgi/toybox-landley.net
Re: [Toybox] Poke about the bc.c cleanup patches I submitted a while ago
On 3/27/24 08:31, Rob Landley wrote: >>> ipcrm, ipcs, ... >> I don't know how I'm supposed to test resources I have no way to create, >> we'll need ipcmk eventually. These seem more feasible to test, although >> their tests will fail under mkroot until we >> have ipcmk ... > To be honest, I'm tempted to clean up and promote them to "examples". Leaving > them "default n". There in case somebody needs it, but if so it would be nice > if > they could send us a note letting us know they exist... I did a quick cleanup pass on ipcrm, but... yeah, I have no idea how to test this? Also... what ARE keys vs IDs? I thought ID was a number and a key would be arbitrary strings, because a key gets washed through a lookup function and an id is just strtol(), but the code that's there does: function(int key, char *name...) { ... id = strtol(name, &c, 0); if (*c) { error_msg("invalid number :%s", name); return; } if (key) { if (id == IPC_PRIVATE) { error_msg("illegal key (%s)", name); return; IPC_PRIVATE is zero. So even if you set "key" to 1, strtol() has to consume the whole thing or you get "invalid number" error and an abort before it even checks key. There's no !key test around that first bit. And then right afterwards it checks if the strtol() it did returned zero (IPC_PRIVATE is zero) and barfs if it did, so even if that first part WAS a thinko with a missing test, it still wouldn't work for anything that didn't at least START with a nonzero number. So what's a "key"? I did a "git log */ipcrm.c" over in busybox and there hasn't been a patch to it from an actual USER of this command since it was introduced. It's all code size shrink, compiler flag damage, white space fixes, help text style updates, annotating with size estimates, NOEXEC, "make GNU licensing statement forms more regular", "use can't instead of cannot", using EXIT_SUCCESS and EXIT_FAILURE macros (really???), whatever "strtoul() fixes" was, and so on. 
Churn for being a busybox applet, global search and replace over the tree. No actual _user_ of the code has touched it since it was added to the tree, and it turns out that was MY fault: commit 6eb1e416743c597f8ecd3b595ddb00d3aa42c1f4 Author: Rob Landley Date: Mon Jun 20 04:30:36 2005 + Rodney Radford submitted ipcs and ipcrm (system V IPC stuff). They could use some more work to shrink them down. And in my defense, I had no idea what they WERE back then. That whole mess started with a poke from some Qualcomm developers from India: http://lists.busybox.net/pipermail/busybox/2005-June/048807.html Which led to a newbie looking for something to do asking how you submit new commands to the project: http://lists.busybox.net/pipermail/busybox/2005-June/048828.html And then two other devs piping up to show interest: http://lists.busybox.net/pipermail/busybox/2005-June/048847.html http://lists.busybox.net/pipermail/busybox/2005-June/048848.html Which led to the patch. So three people showed interest in 2005, resulting in a new dev porting the commands from util-linux-2.12a, but none of them actually submitted anything like a test case: http://lists.busybox.net/pipermail/busybox/2005-June/048851.html So I have what I think is a cleaned up version but can't prove I didn't break it, and I have no idea if 19 years after it was added to busybox and then (as far as I can tell) completely ignored... anyone still cares? Rob ___ Toybox mailing list Toybox@lists.landley.net http://lists.landley.net/listinfo.cgi/toybox-landley.net
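The argument check described earlier in this message (strtol() first, then the IPC_PRIVATE test only when "key" is set) can be sketched on its own to show which inputs it accepts. This is a minimal reconstruction for illustration, not the busybox or toybox source: check_arg(), its return codes, and SKETCH_IPC_PRIVATE are made-up names, and the constant is defined locally as 0 (matching the "IPC_PRIVATE is zero" observation) so that no SysV IPC headers are needed.

```c
#include <stdlib.h>

// Stand-in for IPC_PRIVATE (zero on Linux), defined locally for the sketch.
#define SKETCH_IPC_PRIVATE 0

// Hypothetical helper mirroring the order of checks quoted in the email.
// Returns 0 if the argument is accepted, 1 for "invalid number",
// 2 for "illegal key". On success the parsed value is stored in *out.
int check_arg(int key, char *name, long *out)
{
  char *c;
  long id = strtol(name, &c, 0);

  // Any unconsumed trailing text is rejected before "key" is even looked at.
  if (*c) return 1;

  // A key that parses to zero collides with IPC_PRIVATE and is rejected.
  if (key && id == SKETCH_IPC_PRIVATE) return 2;

  *out = id;

  return 0;
}
```

Under this ordering a "key" such as "abc" fails the number check before the key test runs, "0" is rejected as a key but accepted as an id, and "0x1234" is accepted either way because base 0 lets strtol() take hex.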
Re: [Toybox] [PATCH] modeprobe.c and last.c: Codeshare identical llist_add()
On 3/26/24 15:05, Oliver Webb via Toybox wrote: > 2 identical versions of the same function, variable names and everything > > 31 bytes saved in bloatcheck The problem being it moves code from pending/ to lib/ whose only users are in pending. I've generally just done singly linked list additions inline. When you don't mind reversing the list order it's literally two assignments and a dereference: node->next = head; head = node; Pushing two arguments onto the stack and making a function call is approximately as much code. (When I want to preserve list order I tend to use the existing doubly linked list functions.) Rob
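For anyone following along, the inline push above looks roughly like this in context. A minimal sketch only: struct node and build_reversed() are hypothetical names for illustration, not toybox's list types or lib/ functions, and the wrapper function exists just so the loop can be called and checked.

```c
#include <stdlib.h>

// Hypothetical node type for illustration, not a toybox structure.
struct node {
  struct node *next;
  int value;
};

// Build a list from an array using the inline push. Because each new node
// goes on the front, the resulting list is in reverse order of the input.
struct node *build_reversed(int *values, int count)
{
  struct node *head = 0, *node;
  int i;

  for (i = 0; i < count; i++) {
    if (!(node = malloc(sizeof(*node)))) break;
    node->value = values[i];

    // The inline push: two assignments, no helper function call.
    node->next = head;
    head = node;
  }

  return head;
}
```

Feeding it {1, 2, 3} gives a list that walks as 3, 2, 1, which is the "don't mind reversing the list order" caveat.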
Re: [Toybox] [PATCH] toysh: fix -Wuse-after-free
On 3/25/24 20:24, enh wrote: > But "dpkg-query -S $(which $NAME)" is pretty easy to do the mapping > yourself on > debian... > > > (yeah, though i suspect anyone trying to do this hypothetical "swap package $X > for toybox" would want the _opposite_ mapping, from package name to all the > commands. and i don't know of a way to ask apt that question? $ dpkg-query -L tar | grep bin/ /bin/tar /usr/sbin/rmt-tar /usr/sbin/tarcat > other than > brute-forcing all of the executables in all of the directories in $PATH, > anyway.) Checking the $PATH would be clever but the above covers it for me. There are some insane packages which crap binaries under /usr/lib, such as /usr/lib/libreoffice/program/oosplash or /usr/lib/man-db/manconv and generally I consider these packages to be maintained by madmen. I mean honestly: $ cat /usr/bin/7z #! /bin/sh exec /usr/lib/p7zip/7z "$@" Why would you do that? Why would ANYONE voluntarily DO that? Rob
Re: [Toybox] [PATCH] toysh: fix -Wuse-after-free
On 3/25/24 20:20, enh wrote: > On Sun, Mar 24, 2024 at 12:45 AM Rob Landley wrote: > On 3/22/24 10:24, enh wrote: > > On Thu, Mar 21, 2024 at 8:45 PM Rob Landley wrote: > >> Anyway, toys/android basically meant (to me), "commands that come from > and are > >> maintained by Elliott which I can't even test because they don't apply > to a > >> vanilla linux system that isn't running the full android environment". > Although > >> that's a personally idiosyncratic definition because I lumped selinux > in with > >> that; > > > > (heh. you beat me to it :-) ) > > If the new kconfig greyed out unavailable entries and had a status line > saying > "depends on TOYBOX_ON_ANDROID" or similar when you cursored over a greyed > out > entry... > > ah, as the kind of lunatic who only ever edits these files by hand with vi, > i'd > actually just assumed that was kind of the whole point of the _existing_ > kconfig > stuff? To me half the point is it's the same UI as configuring the linux kernel, busybox, and buildroot. Meaning A) a bunch of people out there are familiar with it already, B) presumably the worst sharp edges have been filed off over the past 15 years. > (to be fair, i did launch it once, but saw it was a ridiculously deeply > nested > ui [and not expanded by default?], and thought "i don't understand the purpose > of this", couldn't see how to search, It literally has help text at the top of the screen. Forward slash is search, cursor up and down, space to toggle the highlighted thingy, enter to go into a menu, ESC to back out again, ESC from the top level to exit (it prompts you whether or not you want to save), ESC twice from _that_ to abort the exit. There's also a menu at the bottom, where if you cursor left and right it highlights different things, and the ENTER will do that thing instead. (The default is "select". I cursor right to "help" and hit enter because I never remember that ? is the hotkey for that.) 
Mostly I'm assuming "same UI as linux kernel" is like 2/3 of the userbase though. > and immediately went back to editing by > hand. at least that way i only need to know how to use my editor, which i need > to know regardless :-) ) Dependency resolution comes to mind. > If we really wanted to rush this, I could make a TOYBOX_UNFINISHED symbol > that > the pending stuff could depend on, and then the blocker is the kconfig > replacement... > > no, i've been cursing the broken tab-complete for -- wow, almost a decade now! > -- so i think i can survive :-) I admit I sometimes do "ls toys/*/skel*" when I can't remember whether I called it "example" or "examples". > Not THIS release though. Working on release notes! (And lowering my > standards on > the todo list.) > > indeed... something that benefits the handful of folks working on toybox isn't > worth much compared to something that benefits the users! Working on it... Rob
Re: [Toybox] [PATCH] toysh: fix -Wuse-after-free
On 3/24/24 10:15, Oliver Webb wrote: > On Sunday, March 24th, 2024 at 04:09, Rob Landley wrote: >> On 3/24/24 01:00, Oliver Webb wrote: > >> This isn't the hard part. To me, the hard part is wanting to share lib/.c >> code >> with this new binary, which implies it would live in toys/example/.c, which >> means in the NEW design it would be a normal command that's "default n"... >> and >> maybe depends on TOYBOX_BUILD or some such? Except moving stuff from >> scripts/.c >> to toys/.c is conceptually ugly. But if we're getting rid of the >> subdirectories... Maybe make.sh needs to be able to build commands that DON'T >> live in toys/ but then... > > There is a chicken and egg problem with the build infrastructure and kconfig > being a toy, Yes, I know. That's why I've avoided it up until now. > We need a .config file to build toys, and parsing the help text requires some > kconfig > parser, But we can't make a .config file until we have kconfig. You don't need a .config file to build lib/*.c (policy, and why lib/lib.h is separate from toys.h). I've talked before about doing packaged minimal headers to build "sed" and "sh" standalone, as part of toybox airlock stuff. (Possibly the full airlock command list needed to build toybox.) Ones that assume all the config probes failed and $LIBRARIES is empty and so on. The EASY way to do that is to have a scripts/shipped/generated with handcrafted header files, and then stick -I scripts/shipped at the start of $CFLAGS. The hard way involves more cleanup so there are fewer header entry points, and I could have a single "hairball.h". The individual toys/*/*.c files only #include toys.h and generated/flags.h (which could be a re-include of toys.h with a little #ifdef cleverness). The rest are all included from toys.h and main.c. 
The above #ifdef cleverness could wrap the generated/ includes in toys.h in a __has_include("hairball.h") or similar, so I could provide a single replacement file with the collated stuff I need for specific commands to build in a way that assumes the host system has no brain, and then generated/build.sh it. The problem is the includes are in two places: "generated/config.h" comes before lib/portability.h (which comes before everything else so it can override standard header #includes), and then the rest of the generated/*.h are after a #define NEWTOY() and OLDTOY() so there's some reordering to do to combine them all into one header. (I don't think any of the generated/*.h files care about stuff in portability.h? There would be various structs used before they were defined in generated/globals.h if it got moved up, but that _should_ be ok? Nothing takes the sizeof() them or similar that early. Eh, I should be able to work it out, just haven't sat down to try yet. Far too many already open cans of worms...) And then there's the #includes in main.c, the other half of the declaration/definition pair for various global data: that needs newtoys.h, help.h, and zhelp.h. One of which is already chopped out by a config option, the second of which just needs a way to stub it to "", and which leaves newtoys.h to address... > The solution I thought of was to use the infrastructure that we will have to > have to remove > bash and gsed dependencies to build kconfig as a early step in the process. No. > But then we will > still need to extract the help text. config TOYBOX_HELP bool "Help messages" default y help Include help text for each command. You can configure help out entirely. (This isn't CONFIG_HELP the command, this is "the help subsystem" in the toybox general settings menu.) This was intentional. > Do you plan on not keeping 2 different kconfig parsers or moving scripts/*.c > to toys/example Look up at the first paragraph of mine you quoted in this email. 
It's an open question, but stripping down a "cc -I scripts/prebuilts main.c lib/*.c toys/*/{abc,def,ghi,jkl}.c" build so it could provide commands with nothing but a compiler would be a step towards that. Modulo that "cc *.c" doesn't parallelize across processors because C++ developers took over compiler development about 2 years after the Core Duo hit the market and brought SMP to the cheap retail mainstream, at which point making compilers better rather than merely more complicated hit a sudden brick wall. And thus even on my ancient 4x laptop: $ time make clean defconfig toybox ... real 0m16.170s $ time generated/build.sh ... real 0m27.474s I don't want to significantly slow down the build by compiling prerequisites? In theory: $ time gcc -I . main.c lib/*.c -o blah ... real 0m1.780s (Yeah exits with a link error but that's not the point.) And I mean yeah, 2 seconds, not that big a deal. But I'd pr
Re: [Toybox] hexdump tests.
On 3/25/24 10:42, enh wrote: > On Sun, Mar 24, 2024 at 1:40 AM Rob Landley wrote: >> >> On 3/22/24 15:02, enh wrote: >> >> > CANONICALIZE_SPACE_IF_RUNNING_HOST_VERSION=1? so we trust ourselves but >> >> > no-one >> >> > else? :-) >> >> >> >> I _don't_ trust myself, and I'm not special. (That's policy.) >> > >> > yeah, but that's why i suggested >> > CANONICALIZE_SPACE_IF_RUNNING_HOST_VERSION --- that way we can say "we >> > can't make hard assertions about the _host's_ whitespace, but we can >> > still make hard assertions about _ours_". if we just canonicalize all >> > the whitespace all the time, we can't (say) ensure that columns line >> > up or whatever. >> >> Or we could just "NOSPACE=1 TEST_HOST=1 make tests" if that's the test we >> want >> to run...? > > it's not though. that's my point. there are several cases: > > 1. testing toybox --- we know what whitespace we're expecting to > produce, and want tests to protect against regressions. > > 2. testing host tools --- we _don't_ have control over what whitespace > the host produces. > a) in some cases we manually mark individual tests to show "we don't > care about host whitespace for this test case". > b) sometimes this applies to _all_ the tests for a toy. > > we're talking about case 2b here, which is currently the > least-well-supported variant. You can NOSPACE=1 in an individual tests/command.test and it should last until the end of the file? That's why scripts/test.sh does: # Run command.test in a subshell (. "$1"; cd "$TESTDIR"; echo "$FAILCOUNT" > continue) So the variables and functions and so on defined in one test don't leak into others. I spent like 3 commits getting that to work properly, the last of which was commit 07bbc1f61280 and mentions the previous 2. 
> i think we're talking at cross purposes because _i'm_ talking about > variables set _within the tests, by the tests themselves_ and you're > talking about variables set on the command-line, which i don't think > make any sense here, because we're talking about properties of the > individual tests/commands. There are three scopes: 1) Variables exported into all tests POTATO=1 make tests 2) Variables set for a single test: POTATO=1 testcmd "thingy" "-x woo" "expected\n" "file" "stdin" 3) Variables set for the current test file. [ -n "$TEST_HOST" ] && NOSPACE=1 Which is just a normal assignment (or export) in a tests/file.test, they go away at the end of the current file (because of the above parenthetical subshell calling it), and which was the new thing I added in 2022. I remember my first attempt at this years ago ctrl-c didn't work reliably, but the fix to that was just a trap at the top of scripts/test.sh: # Kill child processes when we exit trap 'kill $(jobs -p) 2>/dev/null; exit 1' INT > (unless you really do want to say "there's absolutely nothing we can > do about host whitespace, so give up completely", which i think has > yet to be proven that it's _that_ bad. but there are commands where > having a test that says "this whitespace -- that toybox produces -- is > reasonable [but as long as the non-whitespace matches, and there's > _some_ whitespace everywhere we have whitespace, we'll accept any > whitespace from the host tool]".) I think per-command [ -n "$TEST_HOST" ] && NOSPACE=1 might be reasonable. I'd rather not blanket do it for all commands. Rob ___ Toybox mailing list Toybox@lists.landley.net http://lists.landley.net/listinfo.cgi/toybox-landley.net
Re: [Toybox] Poke about the bc.c cleanup patches I submitted a while ago
Yesterday I did NOT spend all my energy reading email, and instead got https://landley.net/bin/toolchains updated with a musl 1.2.5 and or1k and riscv in the list, and that seems to have fixed the sh2eb build break as well (although I haven't tried booting it on a Turtle board yet, haven't unpacked any here in Minneapolis...) and rebuilt all the mkroot targets against the 6.8 kernel (the tmpfs patch went upstream-ish but the rest all still apply, none of those issues will ever voluntarily be fixed by the kernel clique), and the tests told me I need kernel/qemu configs for armv4l armv7m microblaze mips64 riscv32 riscv64 sh4eb, which reminded me of my "make the fdpic loader work on sh4 with mmu work" which should become another patch and get finished now that I've got updated toolchains with the sh4 longjmp bug fixed... But today I'm being good and back to spending my energy responding to email instead. On 3/24/24 21:45, Oliver Webb wrote: > On Sunday, March 24th, 2024 at 18:27, Rob Landley wrote: > >> > I've been looking to do a cleanup pass on bc because there are a lot of >> > very obvious things >> > that can be removed (typedefed structs as far as the eye can see, all the >> > "posixError" garbage, >> >> Agreed. I still haven't decided whether to throw it out and start over, but >> you >> can't make it worse. (Your cleanup patch broke xzcat, but I can't tell if >> this >> one is right or wrong outside of its test suite already, and only really care >> about the kernel timeconst.bc use case anyway, so...) > > Permission to remove the annoying signal handling that only really matters > (gets in the way of exiting) > on interactive sessions? "You can't make it worse." >> Why typecast at all? You're assigning to a variable of that size, shouldn't >> the >> typecast do the assignment? (Does this suppress a warning or something?) 
> > I did ":%s/uchar/char/g" instead of going over every individual use of > "uchar", > This patch (attached) removes a lot of those unnecessary typecasts, and > cleans up > the code formatting a lot, among other things like getting rid of the > posixError stuff, > about 350 lines removed > >> Is sizeof(char) ever not 1? > > There is support for multi-byte chars in gcc (i.e. "char x = 'ABCD';") That's a character literal (which has type int), not a char variable. Assigning it to a char will give you... I'm going to guess 'D'. > but no one uses that terrible extension from my knowledge It seems to warn about using it by default, even: $ cat test2.c #include <stdio.h> int main(int argc, char *argv[]) { char c = 'ABCD'; printf("%d\n", c); } $ gcc test2.c test2.c: In function ‘main’: test2.c:5:12: warning: multi-character character constant [-Wmultichar] char c = 'ABCD'; ^~ test2.c:5:12: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘1094861636’ to ‘68’ [-Woverflow] $ ./a.out 68 >> > or the xz stuff, >> >> If you want to peel out individual upstream public domain xz patches and >> adapt >> them (one at a time) to apply to toybox's xzcat, I'd be very interested in >> reading and applying the results. > > The main problem is that it takes a lot of work to patch upstream stuff and > not break everything, > I'll see what I can do, but I can't guarantee that I'll be able to get the > bigger blocks of code > like the ARM64 decoder in. > >> > nor the csplit regressions I started to patch out, >> >> What were the csplit regressions? > > A lot of things since I was testing the command manually when I first wrote > it, A test suite that TEST_HOST passes would be nice. I have the start of one, but csplit is such an utterly terrible command (a half-assed sed that only wants to write to files), I can't wrap my head around what anybody would ever WANT to use it for. I mean why have "prefix" and "suffix" when suffix is an arbitrary sprintf string? 
Prefix on WHAT, it's not adding in the input filename, and you can't if you try: $ seq 1 10 | csplit - 2 %4% 7 -b '%s' csplit: invalid conversion specifier in suffix: s I checked busybox to see if they had tests, but the only mention of csplit in the entire git tree there is docs/posix_conformance.txt under "Tools not supported". >> Glancing at pending, I don't have a test environment for >> arp, arping, > > Networking administration stuff for ARP caches that can manipulate kernel > ARP table entries, > would probably require mkroot to test safely. Yes, I know. >> bootchartd, > > A command with no standard; Described as "bootchartd is commonly used to > profile the boot process.&
Re: [Toybox] test.sh: Don't override "C" command path in TEST_HOST if it's set
On 3/24/24 18:40, Rob Landley wrote: >> Also, different command names, there's a dozen different vi implementations >> and >> only a few have the name "vi". This is true for some other commands as well > > I've been doing: > > mkdir sub > ln -s $(which potato) sub/vi > PATH=$PWD/sub:$PATH make tests > > Comes up a bit already, such as testing toybox tar --xform which requires > toybox > sed, and thus even the standalone test skips those unless you put toybox sed > in > the $PATH. > > In theory you could PATH=$PWD/sub:$PATH TEST_HOST=1 make test_vi above, in > which > case "C" should wind up pointing into sub... P.S. I don't want to commit to there still BEING a "C" a year from now. That's an internal implementation detail, not an API. Rob
Re: [Toybox] test.sh: Don't override "C" command path in TEST_HOST if it's set
On 3/22/24 16:10, Oliver Webb wrote: >> On 3/21/24 21:38, Oliver Webb via Toybox wrote: >> >> > A mildly annoying issue if you are trying to test with different >> > implementations of commands >> > such as plan9 ones or sbase or busybox ones, things with different >> > conflicting implementations >> > of things like xxd or vi. With this patch you can do "make test_cmd >> > TEST_HOST=1 C=/path/to/other/cmd" >> > and have it work >> >> I've been doing "PATH=/path/to/thingy:$PATH TEST_HOST=1 make test_cmd" for >> years, I didn't know that needed to be documented... > > plan9 has an incompatible diff implementation, which means to test plan9 utils > I'd > either need to separate diff from the rest of the binaries or have some way > of overriding "C". > > Also, different command names, there's a dozen different vi implementations > and > only a few have the name "vi". This is true for some other commands as well I've been doing: mkdir sub ln -s $(which potato) sub/vi PATH=$PWD/sub:$PATH make tests Comes up a bit already, such as testing toybox tar --xform which requires toybox sed, and thus even the standalone test skips those unless you put toybox sed in the $PATH. In theory you could PATH=$PWD/sub:$PATH TEST_HOST=1 make test_vi above, in which case "C" should wind up pointing into sub... Rob
Re: [Toybox] [PATCH] toysh: fix -Wuse-after-free
On 3/24/24 01:00, Oliver Webb wrote: > I've done some research on this too, we have no "select" statements in any of > our config symbols, for a definition of "we" that is "I have intentionally not merged any", since I review and approve all the kconfig command sections in the headers and have been tracking that. (At one point the config2help.c stuff was trying to stitch dependencies together to merge help text, and didn't understand complicated syntax.) That said, forking the kconfig language definition is not something I do lightly. Ours has fallen way behind the kernel's, and thus looks like something else but is only compatible with a subset of it. We are about to _shrink_ that subset. This needs a FAQ entry at least. > but we do have a fair amount of that ""SYMBOL && (SYMBOL||SYMBOL)"" > expression processing that's > annoying to deal with. I was referring to that, yes. I need to implement processing for it. I've already implemented such processing in find, test, and twice in toysh (both command && command and $((math&)) ). > Also a "choice" block and a few number ranges in the main Config.in we will > need to deal with in some way, the depends/selects stuff seems easy but with > that expr evaluating probably isn't Yes, I know. > I tried to write a kconfig parser (As a toy to make the codesharing easier) I've written a bunch, and mostly thrown them away again. There's a simple one in scripts/config2help.c and I wrote one in Python at https://landley.net/hg/kdocs/file/tip/make/menuconfig2html.py which generated https://landley.net/kdocs/menuconfig/ way back when. (Those are the only two published ones that come to mind, but I've written more over the years.) > and got absolutely nowhere. The approach I took to it was... This isn't the hard part. To me, the hard part is wanting to share lib/*.c code with this new binary, which implies it would live in toys/example/*.c, which means in the NEW design it would be a normal command that's "default n"... 
and maybe depends on TOYBOX_BUILD or some such? Except moving stuff from scripts/*.c to toys/*.c is conceptually ugly. But if we're getting rid of the subdirectories... Maybe make.sh needs to be able to build commands that DON'T live in toys/ but then... Unanswered design questions looming here, have not been jigsawed into an elegant picture yet. (How much of that is assembling pieces and how much is SAWING THEM UP I don't know yet...) Anyway, it seems like config2help.c should also share this plumbing if it's parsing the kconfig input anyway, which is convenient since I've been meaning to rewrite all that too (and yes THAT has a motivating "somebody is waiting for me to fix this", ala https://github.com/landley/toybox/issues/458 ), but there's also the usage: line regularization (https://landley.net/notes-2023.html#06-11-2023) and fixing the remaining sub-options with maybe some sort of help text include syntax for inserting other help texts at controllable points (as either blogged about or mentioned here on the list, I'd have to check my notes to see where I left off on that)... Once I've got the design worked out, coding it is usually the easy part. Rob ___ Toybox mailing list Toybox@lists.landley.net http://lists.landley.net/listinfo.cgi/toybox-landley.net
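As a rough illustration of the "SYMBOL && (SYMBOL||SYMBOL)" processing discussed in this message, here is a minimal recursive descent sketch. It is an assumption-laden toy, not toybox's or the kernel's kconfig code: struct expr, eval_depends(), and the "enabled symbols are a NULL-terminated string list" representation are all made up for the example, and it only handles !, &&, ||, and parentheses (no "=", "!=", tristates, or error reporting).

```c
#include <ctype.h>
#include <string.h>

// Hypothetical parser state: position in the expression plus a
// NULL-terminated list of symbol names that are switched on.
struct expr {
  char *s;
  char **enabled;
};

static int parse_or(struct expr *e);

static void skip_space(struct expr *e)
{
  while (isspace((unsigned char)*e->s)) e->s++;
}

// primary: SYMBOL, !primary, or a parenthesized expression
static int parse_primary(struct expr *e)
{
  char *start, **sym;
  size_t len;
  int val;

  skip_space(e);
  if (*e->s == '!') {
    e->s++;

    return !parse_primary(e);
  }
  if (*e->s == '(') {
    e->s++;
    val = parse_or(e);
    skip_space(e);
    if (*e->s == ')') e->s++;

    return val;
  }

  // Collect the symbol name, then look it up in the enabled list.
  start = e->s;
  while (isalnum((unsigned char)*e->s) || *e->s == '_') e->s++;
  len = e->s - start;
  for (sym = e->enabled; len && *sym; sym++)
    if (strlen(*sym) == len && !strncmp(*sym, start, len)) return 1;

  return 0;
}

// and: primary { && primary }
static int parse_and(struct expr *e)
{
  int val = parse_primary(e);

  for (;;) {
    skip_space(e);
    if (strncmp(e->s, "&&", 2)) return val;
    e->s += 2;
    // Always parse the right side so the position advances past it.
    val &= parse_primary(e);
  }
}

// or: and { || and }
static int parse_or(struct expr *e)
{
  int val = parse_and(e);

  for (;;) {
    skip_space(e);
    if (strncmp(e->s, "||", 2)) return val;
    e->s += 2;
    val |= parse_and(e);
  }
}

// Returns 1 if the dependency expression is satisfied, 0 otherwise.
int eval_depends(char *str, char **enabled)
{
  struct expr e = {str, enabled};

  return parse_or(&e);
}
```

With A and C enabled, "A && (B || C)" evaluates true and "A && (B || D)" evaluates false. The same three-level shape (or over and over primary) is the usual starting point whether the operands are config symbols, find predicates, or test arguments.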
Re: [Toybox] hexdump tests.
On 3/22/24 15:02, enh wrote:
>> > CANONICALIZE_SPACE_IF_RUNNING_HOST_VERSION=1? so we trust ourselves but
>> > no-one else? :-)
>>
>> I _don't_ trust myself, and I'm not special. (That's policy.)
>
> yeah, but that's why i suggested
> CANONICALIZE_SPACE_IF_RUNNING_HOST_VERSION --- that way we can say "we
> can't make hard assertions about the _host's_ whitespace, but we can
> still make hard assertions about _ours_". if we just canonicalize all
> the whitespace all the time, we can't (say) ensure that columns line
> up or whatever.

Or we could just "NOSPACE=1 TEST_HOST=1 make tests" if that's the test we want
to run...?

>> Erik did lash (lame-ass shell) to be tiny, Ash was the bigass lump of
>> complexity copied out of debian or some such and nailed to the side of the
>> project by that insane Russian developer who never did learn english and
>> communicated entirely through a terrible translator program (so any
>> conversation longer than 2 sentences turned into TL;DR in EITHER direction,
>> he was also hugely territorial about anybody else touching "his" code), and
>> msh was the minix shell mostly used on nommu systems.
>
> did lash _stay_ tiny?

Yes, but it was also borderline unusable.

> i feel like the trouble with projects like that
> is usually that no-one can agree on what's necessary versus bloat, so
> you trend towards just being a bad implementation of whatever. iirc
> inferno had _two_ different "tiny" shells.

Erik implemented something tiny for his own personal use, and ignored everybody
else who tried to add stuff to it. When Erik moved on, I studied it. When I
moved on, Bernhard removed it:

https://git.busybox.net/busybox/commit/?id=96702ca945a8

>> > because, to be fair to the confused, in english
>> > "pending" _can_ legitimately mean "almost there". whereas your whole point
>> > with pending is "i actually have _no_ idea how close this is yet".
>>
>> Linux has drivers/staging but I didn't like that.
>
> yeah, "staging" also sounds very much like "nearly there!".

The problem is motivated reasoning. We could call the directory
instant_death_do_not_touch and people would still enable stuff in it to see if
it worked for them. (And then ship it when it Worked For Them.)

Rob
Re: [Toybox] [PATCH] toysh: fix -Wuse-after-free
On 3/22/24 10:26, enh wrote: > On Fri, Mar 22, 2024 at 8:24 AM enh wrote: >> (tbh, just merging "lsb" into "other" would be a step forwards. wtf >> is/was "lsb" anyway? and while i can _usually_ guess "POSIX or not?" >> correctly, "lsb or other" is impossible by virtue of being >> meaningless.) > > (and to be clear, although "lsb" is particularly obscure, i think this > is the same problem busybox's organization has: why do i have to care > whether something is in coreutils or linux-utils or procps? how is > that relevant to me? There's a reason I didn't use that as an organizing method. Although I did try to map them at the end of the roadmap, and need to redo that analysis now since it's been a while... > the best answer i can think of is "because i want > to only use toybox/busybox to replace _that_ package", but i don't > think the _directory structure_ helps there, right? that hypothetical > person actually wants more metadata in the kconfig part of the comment > inside each file?) That's the theoretical use, yes. So distros (and system builders like gentoo, buildroot, yocto, etc) can annotate package alternatives so if you want to install busybox's tar instead of gnu tar your package management system could cope. In practice, making something like dpkg handle that was near impossible, and buildroot only did it because the maintainer of busybox created buildroot. I tried to add toybox to buildroot years ago and... https://lists.buildroot.org/pipermail/buildroot/2014-September/409298.html People still try from time to time: https://lists.buildroot.org/pipermail/buildroot/2017-January/181960.html http://lists.busybox.net/pipermail/buildroot/2022-September/652474.html But even a build system that ALREADY lets you swap in/out buildroot vs gnu versions of packages accomplished that by hardwiring busybox support deep into its build system. Getting something like debian to do that on the fly is... it's not really designed for it. 
I can think of better ways to do it (and am studying debian's build system in my copious free time), but I've been busy with other things and most people aren't motivated to try... I note that I did it by hand back when creating aboriginal linux, which is what led me to maintaining busybox in the first place, ala: https://landley.net/aboriginal/old/ > When the Firmware Linux project started, busybox applets like sed and sort > weren't powerful enough to handle the "./configure; make; make install" of > packages like binutils or gcc. Busybox was usable in an embedded router or > rescue floppy, but trying to get real work done with it revealed numerous > bugs and limitations. > > Busybox has now been fixed, and in Firmware Linux Busybox functions as an > effective replacement for bzip2, coreutils, e2fsprogs, file, findutils, gawk, > grep, inetutils, less, modutils, net-tools, patch, procps, sed, shadow, > sysklogd, sysvinit, tar, util-linux, and vim. (Eventually, it should be > capable of replacing bash and diffutils as well, but it's not there yet.) That's the old page from before I restarted the project and renamed it Aboriginal Linux (based on QEMU instead of User Mode Linux, ala https://landley.net/notes-2005.html#27-10-2005). Before that I was going though the Linux From Scratch package list and _disposing_ of gnu packages, one by one, as I got busybox to replace them. But "dpkg-query -S $(which $NAME)" is pretty easy to do the mapping yourself on debian... Rob ___ Toybox mailing list Toybox@lists.landley.net http://lists.landley.net/listinfo.cgi/toybox-landley.net
Re: [Toybox] [PATCH] toysh: fix -Wuse-after-free
On 3/22/24 10:24, enh wrote:
> On Thu, Mar 21, 2024 at 8:45 PM Rob Landley wrote:
>> Anyway, toys/android basically meant (to me), "commands that come from and
>> are maintained by Elliott which I can't even test because they don't apply
>> to a vanilla linux system that isn't running the full android environment".
>> Although that's a personally idiosyncratic definition because I lumped
>> selinux in with that;
>
> (heh. you beat me to it :-) )

If the new kconfig greyed out unavailable entries and had a status line saying
"depends on TOYBOX_ON_ANDROID" or similar when you cursored over a greyed out
entry...

There _is_ a way to collapse everything together into one directory and make it
manage-ish-able. But there are currently 52 command files in pending, and "ip.c"
alone is 6 commands and 3000 lines of "we already have route and ifconfig and
iptables and so on as separate commands, why did they do it again?"

>> It's been the status quo for a dozen years now (commit 3a9241add947 in 2012)
>> and moving everything AGAIN would have costs, so I'd want a reason and
>> assurance that we're not going to change our minds again.
>
> for me the holy grail is "tab complete works and i don't have to think
> about arbitrary partitions".

It's a good point.

> i think "not yet default 'y'" is pretty
> defensible (though the reason we're having this discussion is because
> people _don't_ read "pending" as "danger, keep out!"), but the rest
> seem so arbitrary.

I'd like there to not BE "danger, keep out" in the tree, but a certain large
korean company wanted their contributions checked in, I fell behind, and it
snowballed from there.

>> Collapsing the directories together when the last command is promoted (or
>> deleted) out of pending might make sense, figuring out what to do about
>> example/ (trusting to the demo_ prefix to annotate the example commands is
>> nice, but hello.c hostid.c logpath.c and skeleton.c would need...
>> something).
>
> no, i think example/ is defensible too. (i'd argue you're only ever
> going to look in there if you have a _reason_ to. or you've done a
> `grep -r` for something you're changing/checking all references to.
> the reason i completely forgot about example/ is that it never causes
> me the "where the fuck is _mount_?!" annoyance :-) )

Right now everything is at the same level. Having files at two different levels
is not a simplification. Designing a way to have toys/*.c with no subdirectories
and make it manageable seems a reasonable goal, if tricky to get to. Having
toys/*.c _and_ toys/*/*.c does not smell like an improvement?

We've got:

  android example lsb net other pending posix

Pending needs everything cleaned up and promoted or deleted. Posix can be a
defconfig file. Example can be commands that "default n". Android isn't
necessary if a kconfig replacement greys things out instead of hiding them and
displays WHY they're greyed out when you cursor over them (and the rewrite is
needed to address pull request 332). Other, net, and lsb aren't sufficient
distinction to persist in the absence of other directories. And that's all of
them, I think?

If we really wanted to rush this, I could make a TOYBOX_UNFINISHED symbol that
the pending stuff could depend on, and then the blocker is the kconfig
replacement...

Not THIS release though. Working on release notes! (And lowering my standards on
the todo list.)

Rob
Re: [Toybox] [PATCH] more.c: More stuff, down cursor key scrolls down. Also stuff about less
On 3/21/24 06:52, Jarno Mäkipää wrote: > On Thu, Mar 21, 2024 at 1:08 AM Rob Landley wrote: >> >> > There is also a testing problem. vi.c doesn't do TEST_HOST because it >> >> > needs a -s option >> >> > to pass in scripts to test with. >> >> >> >> Which is an issue I need to figure out how to address. What does a test >> >> that >> >> only toybox passes actually prove? (That it hasn't changed since we last >> >> looked at it?) >> > >> > There is vi -c which preforms a ex command which we could implement > > I took -s from vim, so toybox vi could be tested comparing to vim, > since vi itself does not have -s. And I was not interested in -c since > ex was out of the scope of implementation at that time. I'm not saying it's bad, I'm saying it's not sufficient. (The toysh tests have _both_ "testcmd" and "shxpect" tests.) Also, I'm not UPSET that someone's been making vi usable. Something is better than nothing and I'm thankful. I'm just really annoyed at myself for not having been able to get to it myself in a reasonable amount of time. The vi that's there has users, and at some point I _do_ need to go through and digest it all and wrap my head around it and take ownership of the thing, but i haven't even managed to reboot my laptop for months to install the new devuan version and put the 16 gig memory sticks back in, because I've been opening tabs as fast as I've been closing them and trying to close them turns into "let me fix this one thing real quick"... (It's like trying to pack bookshelves and winding up reading books, which I also spent too much of last month doing.) >> I leave vi to the people who are maintaining that vi. I got out of way for >> that >> command. >> > > Well im not sure who is "maintaining" vi.c at this point, I wrote base > implementation years ago, Elliott extended it with few commands, > because he had some use case for it. But mostly development has been > dormant for few years with few segfault bugfix here and there. 
Its not > very pleasant experience to maintain it, since everything lead to huge > bikeshedding, since there is no particular standard to follow, > everyone want different things. Indeed. I taught an "intro to unix" course at austin community college many moons ago which had like 20 vi keys on the syllabus (half of which were new to me, and most of which I've forgotten again). And every time I install a fresh debian I have to go through my checklist including: sudo ln -sf vimrc /etc/vim/vimrc.tiny && echo export EDITOR=vi >> ~/.profile Because going into "insert" mode and having the cursor keys crap capital letters all over your text is stupid (this vimrc.tiny mode STILL RUNS THE SAME BIG EXECUTABLE), and as with dash and upstart and mir and unity I suspect Mark Shuttleworth was behind it: https://mstdn.jp/@landley/112119853431329313 And no I'm not typing "vim" any more than gsed, gawk, or gmake... > Also from what I understand reading > your postings, you have never been very satisfied on it. And that is > understandable. The thing is, I'm not a vi expert any more than I was a sed expert before I wrote my own sed (twice). At some point, I have to learn enough awk to write an awk that can replace gawk in every package build in LFS and BLFS (and hopefully someday AOSP), and I'm not looking forward to that. I know I _need_ to, but I'm currently overwhelmed with half-finished stuff and am trying to dig out. I'm somewhat familiar with the subset busybox chose for its vi, although that was always missing several things I use, so good point of reference but not a standard. And I need to read the posix standard for vi. And then I was going to implement some low-hanging fruit have people tell me what they missed... >> >> I have been planning one all along, yes. 
The crunch_str() stuff I did was >> >> a >> >> first pass at general line handling stuff that could be used by less and >> >> by >> >> shell line editing and by vi and so on, but people wrote a vi that does >> >> not and >> >> never will share code with the rest of those so that's off the table >> >> permanently. > > vi.c uses crunch_str from lib for utf8 handling, there was just few > corner cases it needs to use vi only crunch_nstr, since it cant spit > up text until nul all the time. vi.c tried to use some other > functionality from lib also, but some of it got removed from lib and > some functionality have probably been added way after vi.c was written > in 2018-2020. I tend to do passes over the whole tree from time to time cleaning stuff up and modernizing it. (I re-review commands I hadn't seen
Re: [Toybox] [PATCH] toysh: fix -Wuse-after-free > FYI lsb
On 3/22/24 18:09, scsijon wrote: > Date: Fri, 22 Mar 2024 08:24:18 -0700 > >> From: enh >> To: Rob Landley >> Cc: Oliver Webb , toybox >> >> Subject: Re: [Toybox] [PATCH] toysh: fix -Wuse-after-free >> Message-ID: >> >> Content-Type: text/plain; charset="UTF-8" >> >> On Thu, Mar 21, 2024 at 8:45?PM Rob Landley wrote: >>> On 3/17/24 14:52, Oliver Webb wrote: >>>> On Thursday, March 14th, 2024 at 12:04, enh wrote: >>>>> at a high level, it does seem like many/most people interpret "pending" >>>>> as "almost done" (he says, being part of the problem himself, having >>>>> several pending things building and shipping on all Android devices) >>>>> whereas in actual fact it can mean anything from "yeah, actually pretty >>>>> much done" to "will be completely rewritten" via "still just trying >>>>> random experiments trying to work out _how_ this should be rewritten". >>>>> sadly i don't have a better suggestion... >>>> pending/experimental and pending/functional maybe, or something along that >>>> gist? >>> That would be my "not adding more complexity to manage transient clutter >>> that >>> should instead go away" objection, already made. >>> >>>> Then again it'd make it harder to track the history of pending commands, >>>> adding only new ones >>>> to those 2 directories would fix that, but would make the organizational >>>> problem for the old >>>> ones worse. >>> https://en.wikipedia.org/wiki/Fundamental_theorem_of_software_engineering >>> >>> Stop. No. Halt. Wait. Hold it. Woah. Cease. Desist. Caution severe tire >>> damage. >>> Klatu barata nikto. Subcalifragilisticexpialidocious. >>> >>>>> a branch would be the usual git option, but that would probably mean "no >>>>> pending stuff in the main branch" >>>> Also a problem if you want to switch Version Control systems or distribute >>>> tarballs without a .git/ directory. >>> I already DID switch version control systems (from mercurial to git), and I >>> already distribute release tarballs. 
Why do you think these are new issues? >>> >>>> It'd hide these commands too, >>> I want to close tabs. I am not creating additional scaffolding for clutter >>> management: >>> >>> $ ls -d */toys >>> clean3/toys clean8/toys github/toys kl4/toys kl9/toys >>> toybox/toys >>> clean5/toys clean.old/toys kl10/toyskl6/toys kleen/toys >>> clean6/toys clean/toys kl2/toys kl7/toys kl/toys >>> clean7/toys debian/toys kl3/toys kl8/toys release/toys >>> >>> I already try not to publish quite as much clutter as accumulates locally. >>> >>> There's some real fossils checked into the tree. I started work on gene2fs >>> back >>> under busybox, checked in what I had to the toybox repo in 055cfcbe5b05 in >>> 2007 >>> and haven't LOOKED at it this decade because I just haven't gotten back >>> around >>> to it. Since then they INVENTED EXT4. (I still hope to get back to it, but >>> at >>> the moment I'm answering email.) >>> >>>> For the first time I checked if there were any special branches in the >>>> repo because >>>> I didn't bother to think about that in the months I spent working on it. >>>> >>>>> i still struggle between "other" and "lsb" in particular. >>>> Same here, I can remember the posix commands. >>> Can you? I still have to check some from time to time, and the definition of >>> whether "tar" is a posix command or not is outright eldrich bordering on >>> quantum. >>> >>>> But I don't care about LSB enough to >>>> memorize everything in wants. And keeping all completed commands that >>>> aren't in poisx, >>>> lsb, networking or android >>> The "example" directory is important because it's the only other directory >>> of >>> commands that should not "default y" in defconfig. It has a policy >>> distinction. >>> >>> Back in 2012, when the number of commands was growing fast and having one >>> big >>> directory of them all was getting a bit busy, the alternative of sorting >>> them &g
Re: [Toybox] [PATCH] toysh: fix -Wuse-after-free
On 3/21/24 23:59, Oliver Webb wrote: > On Thursday, March 21st, 2024 at 22:45, Rob Landley wrote: >> On 3/17/24 14:52, Oliver Webb wrote: >> > Same here, I can remember the posix commands. >> >> Can you? I still have to check some from time to time, and the definition of >> whether "tar" is a posix command or not is outright eldrich bordering on >> quantum. > > I can certainly remember them better then the LSB commands. Most of the time > I can > remember if a command is in posix, which is what matters when trying to find > it usually. Congratulations? >> Collapsing the directories together when the last command is >> promoted (or deleted) out of pending might make sense, > > What would happen when a new command shows up and we need to evaluate it then? Presumably once caught up there wouldn't usually be a dozen of them submitted the same month, so I wouldn't fall far enough behind to need a dedicated waiting room. > Or glibc does a new release and yet another thing breaks we need to demote and > re-promote eventually? I don't de-promote commands because glibc does something stupid each new release. That's just normal gnu/braindamage: https://github.com/landley/toybox/issues/450 https://github.com/landley/toybox/pull/364 https://github.com/landley/toybox/issues/362 I de-promoted a command since last release because I rewrite lib/password.c in a way that broke stuff and didn't want people poking me about it, which was me being lazy/whelmed. Not having the option to do that is fine too, and would have made that stay higher on the todo list. (I could also have "default n" it without moving it, I do that locally all the time when in-progress changes break stuff. The difference this time was I'd checked IN the stuff that broke a command, and didn't want to revert it.) 
>> I also note I think I've figured out how to replace kconfig: I can just make >> a >> list that scrolls up and down with a highlighted entry you hit space on, >> handle >> help text, search, exit/save, resolve selects and depends and have "menus" >> be a >> label line with its contents nested two spaces further to the right. > > [Some paragraphs bikeshedding about kconfig use to be here, may they rest in > a text file > until we get around to doing the kconfig rewrite] Technically a project's maintainer explaining upcoming design issues he actually plans to implement isn't "bikeshedding". Bikeshedding is vaguely related to the Dunning-Kruger effect, in which the question "how hard can it be?" requiring some expertise to actually answer gets people in trouble. Cyril Parkinson is mostly known for Parkinson's Law (work expands to fill available time) but he also came up with the bike shed example, where a committee approving plans for a nuclear reactor defers to the experts enough that at least its budget approval gets discussed quickly, but a committee approving plans for a bike shed will argue far longer about every detail because they think they could do it themselves and have strongly held opinions. Everybody has an opinion on building the bike shed, and thinks their opinion is equally valid as everyone else's with no deference to authority, experience, or expertise. But the thing about a committee approving plans is they STARTED with a viable plan for the thing, which they then ignore because they know better. 
If you feel like I'm "bikeshedding" about a kconfig replacement when I was involved in https://lkml.indiana.edu/hypermail/linux/kernel/0202.1/2037.html and argued at length with Roman Zippel about https://lwn.net/Articles/160497/ and dug rather a lot through busybox's fork of it back around https://git.busybox.net/busybox/log/scripts/config?id=7a43bd07e64e and already implemented scroll up/down/left right list logic like I'm describing in the "top" command... I think we have a different definition of the term. >> > A possible solution is to... >> >> ... >> >> > Then again... >> >> I need to stop checking email every time I sit down at my laptop, because >> bikeshedding can eat an endless amount of time and I've got other stuff to >> do. >> >> For one thing, I promised to look at >> https://github.com/landley/toybox/issues/486 tonight. > > Sorry for getting in the way of that, the technical discussion about it was > interesting enough to me to respond to. Recently found something to run off to > and do while still benefiting toybox, so I'll stop bikeshedding about stuff > like this. I'm complaining about my own insufficient time management skills, I'm not trying to discourage people from taking an interest in the project. I do find "why is it like this" easier to deal with than "l
Re: [Toybox] test.sh: Don't override "C" command path in TEST_HOST if it's set
On 3/22/24 16:11, Rob Landley wrote:
> On 3/21/24 21:38, Oliver Webb via Toybox wrote:
>> A mildly annoying issue if you are trying to test with different
>> implementations of commands such as plan9 ones or sbase or busybox ones,
>> things with different conflicting implementations of things like xxd or vi.
>> With this patch you can do
>> "make test_cmd TEST_HOST=1 C=/path/to/other/cmd" and have it work
>
> I've been doing "PATH=/path/to/thingy:$PATH TEST_HOST=1 make test_cmd" for
> years, I didn't know that needed to be documented...

P.S. The point of C= being a path is that otherwise shell builtins tend to get
called (so you're not necessarily testing what you think you are), and last I
checked I hadn't found a portable mechanism for disabling a specific shell
builtin other than providing a path to the command to run. (If you disable _all_
shell builtins the test script could break due to missing commands on some
systems.)

Rob
Re: [Toybox] test.sh: Don't override "C" command path in TEST_HOST if it's set
On 3/21/24 21:38, Oliver Webb via Toybox wrote:
> A mildly annoying issue if you are trying to test with different
> implementations of commands such as plan9 ones or sbase or busybox ones,
> things with different conflicting implementations of things like xxd or vi.
> With this patch you can do
> "make test_cmd TEST_HOST=1 C=/path/to/other/cmd" and have it work

I've been doing "PATH=/path/to/thingy:$PATH TEST_HOST=1 make test_cmd" for
years, I didn't know that needed to be documented...

Rob
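The PATH= approach in the message above works because command lookup stops at the first directory in the list that contains the name, so a prepended directory shadows everything after it. A minimal sketch of that lookup order in C follows. The function name first_in_path() is hypothetical, this is not the test harness's or any shell's actual code, and it only checks access(X_OK), so it would also accept a directory with a matching name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

// Return the first "dir/name" under the colon-separated list "path" that
// access() reports as executable, written into out. Candidates that do not
// fit in outlen bytes are skipped. Returns NULL if nothing matched.
char *first_in_path(const char *path, const char *name, char *out,
  size_t outlen)
{
  assert(path && name && out && outlen);

  while (*path) {
    size_t len = strcspn(path, ":");
    // An empty entry (leading or doubled colon) means the current directory.
    const char *dir = len ? path : ".";
    int n = snprintf(out, outlen, "%.*s/%s", len ? (int)len : 1, dir, name);

    if (n>0 && (size_t)n<outlen && !access(out, X_OK)) return out;
    path += len;
    if (*path==':') path++;
  }

  return 0;
}
```

Calling first_in_path("/path/to/thingy:/bin", "sh", buf, sizeof(buf)) yields the copy under /path/to/thingy when one exists there and falls through to /bin/sh otherwise. A shell builtin never reaches this search at all, which is why the C= variable takes an explicit path.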
Re: [Toybox] [PATCH] toysh: fix -Wuse-after-free
On 3/17/24 14:52, Oliver Webb wrote:
> On Thursday, March 14th, 2024 at 12:04, enh wrote:
>> at a high level, it does seem like many/most people interpret "pending" as
>> "almost done" (he says, being part of the problem himself, having several
>> pending things building and shipping on all Android devices) whereas in
>> actual fact it can mean anything from "yeah, actually pretty much done" to
>> "will be completely rewritten" via "still just trying random experiments
>> trying to work out _how_ this should be rewritten".
>> sadly i don't have a better suggestion...
>
> pending/experimental and pending/functional maybe, or something along that
> gist?

That would be my "not adding more complexity to manage transient clutter that
should instead go away" objection, already made.

> Then again it'd make it harder to track the history of pending commands,
> adding only new ones to those 2 directories would fix that, but would make
> the organizational problem for the old ones worse.

https://en.wikipedia.org/wiki/Fundamental_theorem_of_software_engineering

Stop. No. Halt. Wait. Hold it. Woah. Cease. Desist. Caution severe tire damage.
Klatu barata nikto. Subcalifragilisticexpialidocious.

>> a branch would be the usual git option, but that would probably mean "no
>> pending stuff in the main branch"
>
> Also a problem if you want to switch Version Control systems or distribute
> tarballs without a .git/ directory.

I already DID switch version control systems (from mercurial to git), and I
already distribute release tarballs. Why do you think these are new issues?

> It'd hide these commands too,

I want to close tabs. I am not creating additional scaffolding for clutter
management:

  $ ls -d */toys
  clean3/toys  clean8/toys     github/toys  kl4/toys  kl9/toys    toybox/toys
  clean5/toys  clean.old/toys  kl10/toys    kl6/toys  kleen/toys
  clean6/toys  clean/toys      kl2/toys     kl7/toys  kl/toys
  clean7/toys  debian/toys     kl3/toys     kl8/toys  release/toys

I already try not to publish quite as much clutter as accumulates locally.

There's some real fossils checked into the tree. I started work on gene2fs back
under busybox, checked in what I had to the toybox repo in 055cfcbe5b05 in 2007
and haven't LOOKED at it this decade because I just haven't gotten back around
to it. Since then they INVENTED EXT4. (I still hope to get back to it, but at
the moment I'm answering email.)

> For the first time I checked if there were any special branches in the repo
> because I didn't bother to think about that in the months I spent working on
> it.
>
>> i still struggle between "other" and "lsb" in particular.
>
> Same here, I can remember the posix commands.

Can you? I still have to check some from time to time, and the definition of
whether "tar" is a posix command or not is outright eldritch bordering on
quantum.

> But I don't care about LSB enough to
> memorize everything it wants. And keeping all completed commands that aren't
> in posix, lsb, networking or android

The "example" directory is important because it's the only other directory of
commands that should not "default y" in defconfig. It has a policy distinction.

Back in 2012, when the number of commands was growing fast and having one big
directory of them all was getting a bit busy, the alternative of sorting them
into directories was annotating them with tags, and THAT was a nightmare (of the
"this command has three tags" variety). And also implied future pressure to
extend the existing kconfig implementation to USE the tags, which would be
worse.

Moving them into subdirectories, with each command in ONE directory, and a
README explaining what the directory was for, with kconfig automatically
displaying them in menus and using the first line of the README as the menu's
title, seemed the least bad crowd control option at the time.

> in a massive "other" folder sorta defeats
> the purpose of these directories which are supposed to reduce clutter.

It wasn't really about reducing clutter. I mean yes, back then some web viewers
wouldn't display more than 250 files in a directory, the way github truncates at
1000 today:

https://github.com/landley/linux/tree/master/arch/arm/boot/dts

But the goal was annotating command categories. Posix and pending are obvious,
and I mentioned example. Back when I split them up, LSB was still a viable
standard (the Linux Foundation hadn't destroyed it yet), and it STILL kind of
means "this command existed back in Y2K and was considered part of the base
system back then, even if posix never caught up". Several commands in pending
get promoted into LSB (such as most of the password stuff, although oddly
mkpasswd is NOT in lsb 4.1).

Hmmm, possibly instead of a dead standard the linux foundation killed, I should
instead check the $PATH of my old red hat 9 install from the dawn of time...
Hah, it's still on busybox's website:

https://busybox.net/downloads/qemu/rh-9-shrike.img.bz2

Login as user
Re: [Toybox] [PATCH] toysh: Shut up TEST_HOST, correct 3 test cases
On 3/17/24 10:23, Oliver Webb wrote:
> On Fri, Mar 8, 2024 at 19:46, Rob Landley wrote:
>> On 3/7/24 19:39, Oliver Webb via Toybox wrote:
>> > Looking at toysh again since the toybox test suite should run under it
>> > (in mkroot or under a chroot) A problem seems to be that there is no
>> > return command, which breaks runtest.sh to its core. Don't know how to
>> > add one in yet
>> >
>> > On my version of bash (5.2.26) TEST_HOST fails on 3 test cases,
>> > and toysh also fails on those cases (Even tho toysh is doing the right
>> > thing, the same as bash) The attached patch changes the test file
>> > so that 3 test cases are resolved. And TEST_HOST works
>>
>> Because Chet changed stuff I asked him about, making bash a moving target.
>
> This does bring up the question of what to do with specific edge cases. Since
> bash can’t even be consistent with itself, most bash scripts don’t rely on
> them, at least the ones I’ve seen.
>
> Should we set out to implement every specific edge case, and if so what
> version are we confirming with? Or should we pick what’s most
> sensible/easiest to deal with and toyonly the test cases for them.

I've been studying the problem space since 2006, have read the bash manual all
the way through more than once, read some subset of the "Advanced Bash-Scripting
Guide", and was basically making judgement calls. Then Elliott got me talking
directly to the bash maintainer, which from my perspective made a lot of those
corner cases a moving target when they weren't before.

In fact my FIRST pass at this was matching the bash 2.04b behavior from like
1999 that I used in aboriginal linux, until gentoo's portage scripts needed
newer bash features, specifically =~ and some quoting corner case behavior...

"What should all those judgement calls be ahead of time, I demand preemptive
policy" does not personally strike me as helpful. I was mostly trying to
implement what seemed good to me (which still involves asking a LOT of questions
and turning them into test cases to see what bash's behavior actually IS), then
run the Linux From Scratch and Beyond Linux From Scratch package builds through
it to see what broke, then wait for people to complain and take it on a case by
case basis.

Rob
Re: [Toybox] [PATCH] more.c: More stuff, down cursor key scrolls down. Also stuff about less
On 3/21/24 16:13, Oliver Webb wrote:
> On Thursday, March 21st, 2024 at 15:53, Rob Landley wrote:
>
>> I note that "more" is from the days of daisy wheel teletypes, and was thus designed to work ok without a tty or interaction through cursor keys (you can export $COLUMNS and $LINES or just let it guess 80x25), and "less" requires a tty and cursor keys. This might make "more" a better fit for on-screen keyboards that don't provide cursor keys. (Or not...)
>
> less supports vi keys (hjkl), and all the keybindings of more. less doesn't require cursor keys in the same way vi doesn't, it's just how it's more commonly used.

Piping data through more doesn't allocate memory. Piping data through less continues to allocate memory as data is accumulated. I don't know if there's a backscroll limit, so I don't know if there's a limit on the amount of memory it allocates.

>> I would like to have one implementation sharing code. Implementing "less -R" cuts the behavior delta between the two, and having an option to let ctrl-c exit less (instead of just killing the rest of the pipeline) probably gets us close enough to handwave the rest?
>
> There is less -K and less -E (exit on C-c and exit at EOF respectively),

Good to know.

> so more_main would look something like:
>
> void more_main(void)
> {
>   toys.optflags |= FLAG_E|FLAG_K|FLAG_R;
>   less_main();
> }
>
> Once we have a good enough less.

We could implement a big thing and have it pretend to be a small simple thing, yes.

Rob
Re: [Toybox] [PATCH] more.c: More stuff, down cursor key scrolls down. Also stuff about less
On 3/20/24 11:47, enh wrote:
> On Wed, Mar 20, 2024 at 9:38 AM Rob Landley wrote:
>
>> On 3/20/24 00:02, Oliver Webb via Toybox wrote:
>> > I spotted the more implementation in pending. Looking at it, it's missing quite a lot of stuff,
>> > Such as the ability to go back in a file.
>>
>> More never had the ability to go backwards, less did. Different command.
>
> (...but there's a lot of confusion because many modern systems have more just a symlink to less.)

Ooh, there's a fun edge case. A failure mode of busybox is what if you symlink an unknown name to an existing command, busybox says the unknown name is an unknown command. But in toybox, if it doesn't recognize the name toybox_main loops resolving symlinks until it runs out of them or hits a recognized name:

  // fast path: try to exec immediately.
  // (Leave toys.which null to disable suid return logic.)
  // Try dereferencing symlinks until we hit a recognized name
  while (s) {
    char *ss = basename(s);
    struct toy_list *tl = toy_find(ss);

    if (tl==toy_list && s!=toys.argv[1]) unknown(ss);
    toy_exec_which(tl, toys.argv+1);
    s = (0 [...]

less -> toybox would act like more, not like less. (Unless you configured more out but left less in, then it should behave like less.)

I note that "more" is from the days of daisy wheel teletypes, and was thus designed to work ok without a tty or interaction through cursor keys (you can export $COLUMNS and $LINES or just let it guess 80x25), and "less" requires a tty and cursor keys. This might make "more" a better fit for on-screen keyboards that don't provide cursor keys. (Or not...)

I would _like_ to have one implementation sharing code. Implementing "less -R" cuts the behavior delta between the two, and having an option to let ctrl-c exit less (instead of just killing the rest of the pipeline) probably gets us close enough to handwave the rest?

I need to genericize my watch.c code to share the cursor tracking with less. Possibly keep a scrollback buffer.

Except there's still some extension because watch.c doesn't let you cursor left and right...

Rob
Re: [Toybox] [PATCH] mount: avoid deferencing NULL.
On 3/20/24 16:07, enh via Toybox wrote:
> I don't know why I wasn't seeing this yesterday

Because /sys was mounted, so readfile() returned a string with its contents. (And/or race condition of the mount going away between reading /proc/mounts and asking for follow-up data about a specific mount point from sysfs.)

Sigh, I initialized ss to "" so I could just printf("%s", ss) without testing, but readfile() returns NULL when the file doesn't exist and I overwrite it in place because I didn't want to juggle through a THIRD variable (mostly because I'm out of convenient names for them), and I missed an else setting it BACK to "" in the NULL case. Adding the one test doesn't fix printf() being handed NULL, which segfaults on some libcs. Lemme put the else in...

The real design failure here is that if readfile() returns an empty string we won't free it, but that should never happen, the amount of memory leaked would be trivial, and the command exits at the end of the list.

Hmmm... well, I COULD move the s = xabspath(mm->device, 0) down to the end of the if (*s == '/') and then use THAT as my third variable...

Ok, I rewrote the code to use three variables and thus leave the "" in ss when it doesn't have reason to change it. (Single point of truth: setting it BACK to "" and thus having two "" constants was icky. Yeah, tiny flaw but _I_ saw it.)

Commit d298747580c7, and once again I've only tested the "file exists" path: I'm not unmounting sysfs on my work laptop and haven't got a convenient test vm I can loopback mount a filesystem image in at the moment. (The devuan install iso image I've been using is a bit big to stick in initramfs...)

Rob
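A minimal sketch of the pattern described above, assuming nothing about the actual mount.c code: the display pointer starts as "" and is only ever changed when the read succeeds, while a separate variable owns the allocation. slurp() is a hypothetical stand-in for toybox's readfile() (here it just reads up to 4095 bytes), and describe() is an invented helper.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical stand-in for readfile(): returns a malloc()ed, NUL-terminated
// copy of (the start of) the file, or NULL if it can't be opened.
static char *slurp(const char *path)
{
  FILE *fp = fopen(path, "r");
  char *buf;
  size_t len;

  if (!fp) return 0;
  buf = malloc(4096);
  if (!buf) {
    fclose(fp);
    return 0;
  }
  len = fread(buf, 1, 4095, fp);
  buf[len] = 0;
  fclose(fp);

  return buf;
}

// Write "device" or "device extra" into out, where extra is the contents of
// extra_path when that file is readable. Returns what snprintf() returns.
int describe(char *out, size_t outlen, const char *device,
  const char *extra_path)
{
  const char *ss = "";            // what gets printed: never NULL
  char *data = slurp(extra_path); // what gets freed: may be NULL
  int len;

  // Only change ss when there is a reason to, so "" is set in one place.
  if (data) ss = data;
  len = snprintf(out, outlen, "%s%s%s", device, *ss ? " " : "", ss);
  free(data);

  return len;
}
```

Because ss is never assigned NULL, the printf-family call is safe on every libc, and because data is freed unconditionally (free(NULL) is a no-op), the empty-string case does not leak either.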
Re: [Toybox] [PATCH] chattr.test: awk -> cut so mkroot can run it
On 3/20/24 15:20, Oliver Webb via Toybox wrote:
> Patch does what it says on the tin. First thing I caught while doing a test of all commands in mkroot. chattr fails all tests on my system (A ton of "Operation not permitted" errors, on ext4), but the failures are consistent with TEST_HOST so I guess chattr is doing what it's supposed to? (Yes, I ran it as root) The .test file will need a rewrite eventually but right now I'm just trying to get all tests to run under mkroot

Elliott keeps sending me patches to remove bashisms from the test suite so it works under mksh. I was intentionally leaving those bashisms in because I intend to implement them before 1.0 and wanted to dogfood them. I have a file of the ones that were removed so I can put them BACK at some point...

Rob
Re: [Toybox] [RFC] mkroot: Possible solution to running tests in a vacum: Use the host bash in a chroot
On 3/20/24 12:38, Oliver Webb via Toybox wrote:
> A target for the 0.9 release is the test suite running under mkroot,

On all the architectures mkroot supports (endianness, word size, kernel version), under qemu with a known kernel environment so we can test things like insmod with known modules, or test ifconfig and hostname without destabilizing my development laptop.

> Which is also required for passwd to be re-promoted (We need to test it in a vacuum).

Eh, I can test that manually for one release.

My problem is I keep getting distracted by tangents. The "create changelog" todo item made it as far as commit 40e73a387329 which has a pending TODO item I tried to fix (refill toybuf to try to span EXIF data when file is identifying JPEG files). I need to instead WRITE THAT DOWN (and leave it unfixed for now) and continue to the end of the list so I _have_ a current changelog and CAN cut a release... but haven't yet.

> The main downside of this is that you have to look for the dynamic libraries bash wants and copy them into the fs directory,

The code I wrote to do that way back when was something like:

  https://git.busybox.net/busybox/commit/?id=3a324754f88b

I.E. recursively call ldd to see what its dependencies are, repeat until you run out of dependencies. I _can_ make this work. It's just not the direction I wanted to go in.

> and doing a chroot requires root permissions. Also it is very clearly not a permanent fix (None of this is needed once toysh is ready), just enough to get tests for commands like passwd and chsh running. Another downside of chroot-ing is you can't emulate things that depend on drivers or nommu.
>
> Attached is a mkroot package (Not a patch), that sets up an environment to run the test suite under a chroot in. (./mkroot/mkroot.sh testwhost && sudo chroot root/host/fs /test command_name).
> It's not something I'm actually expecting to be merged, but that doesn't mean it's not potentially useful for testing the commands that modify /etc/passwd and friends.

I was setting up a debootstrap to test it under, since that's presumably isolated enough, but last time I sat down to poke at that I got distracted into the Orange Pi 3b server setup which is the _other_ consumer of a debootstrap I have lying around, and then I went "too much for now but I can at least do the testing under a qemu-system-arm64 with devuan arm64 debootstrap" and hit the fact that trying to marshal a tarball into mkroot using "wget | tar xpv" spat out endless unexpected EOF files because the tarball autodetection logic had a regression and the child process thinks it's the parent process.

Still have a tab open for that, trying to dig back down to fix it, been distracted by external pokes instead.

> Also when making this I spotted some things in the build infrastructure we will need to work around in an airlock-ed test suite, test.sh needs configure,

Only for single builds, not for testing all of toybox. And presumably I should add a TEST_EXISTING=1 to skip the single build and just grab the command out of the current directory and/or $PATH. (There's always more work to do on the test suite...)

> and portability.sh needs something for CC or else it will throw a fit.

I know, one of my trees has a partial patch for it, but there's some design work...

Rob