Re: [Toybox] How can I contribute
On 10/17/19 3:14 PM, Ryan Prichard wrote: > I thought the order was the other way around (base character, then combining > character). I tested it with: The downside of "I had it backwards" is when you _do_ start to remember the correct way you reverse it again... Rob ___ Toybox mailing list Toybox@lists.landley.net http://lists.landley.net/listinfo.cgi/toybox-landley.net
Re: [Toybox] [PATCH] ls: Ensure file names are separated by 2 spaces
I have various torture test utf8 in test/files/utf8. To see a trailing combining character eat a following space, you can do: $ head -c 5 tests/files/utf8/test3.txt; echo ' ' Or $ head -c 5 tests/files/utf8/test1.txt; echo 'X' The problem is, it's easy to see visually, but I'm not quite sure how an automated test is expected to care? (Other than going "it produced the right output"... The OTHER fun thing is that -C and -x care about terminal size, or default to 80 columns when they can't determine it. In theory there's an ls -w option that lets you set the width, but I never needed/implemented it. (Andrew: you're welcome to if you like. I could walk you through it or you could read http://landley.net/toybox/code.html#lib_args and try to work it out yourself.) So if you did want to test this, you'd want to "touch" a test file with a funky utf8 name, make another test file with a long enough name to force minimal spacing, and then... do what? You'd just be proving they did or didn't fit on the same line. (Basically reproducing known/expected output. If they had one space between, these two would fit, with a minimum of two spaces, they'll wrap.) What _is_ a good test is "-C is utf8 fontmetrics, not a byte count". So if you do: $ mkdir sub $ cd sub $ touch $(cat ../tests/files/utf8/test3.txt) $ toybox ls The result fits on one line, when the file was 198 bytes long. (Note: if you don't put quotes around the "$()" it'll split it into 5 words because whitespace in the sentence. That said: $ wc -m ../tests/files/utf8/test3.txt 110 ../tests/files/utf8/test3.txt And I _visually_ count that as 22 characters. (That's the debian wc, not toybox saying that. Toybox matches debian here, but it seems conceptually wrong. It's like we want another option measuring combined characters. Maybe -M or something?) But testing how your terminal displays the result for "ls bridged the space or they're still visually distinct"...? I have no idea how to automate short of doing a screen capture in a known font and comparing the bitmap, which isn't remotely realistic. :) (There are, sadly, a lot of tests I know how to _perform_ but not _automate_. And the pending mkroot stuff merely _reduces_ that gap, not eliminate it.) Rob On 10/17/19 10:44 AM, enh via Toybox wrote: > new tests? (including the difficult combining-character case?) > > On Thu, Oct 17, 2019 at 8:35 AM Andrew Ilijic wrote: >> >> From d4c03b672d8b18f7ce021e4fcb02e8bb86f38f5f Mon Sep 17 00:00:00 2001 >> From: Andrew Ilijic >> Date: Thu, 17 Oct 2019 11:03:19 -0400 >> Subject: [PATCH] ls: Ensure file names are separated by 2 spaces >> >> We need two spaces between filenames because that is the convention >> followed by other implementations. More importantly, if we do not have >> two spaces, certain Unicode file names cause filenames to run together. >> In Unicode, combining characters come before the character they modify. >> If a filename ends in a combining character, the combining character >> attaches to the space that follows it, causing the space not to be >> visible. Having a two-space gap stops the above issue from happening. >> >> For context and a bit more information, see mailing list link below. >> https://www.mail-archive.com/toybox@lists.landley.net/msg05986.html >> --- >> toys/posix/ls.c | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/toys/posix/ls.c b/toys/posix/ls.c >> index 0956902f..d4c0211a 100644 >> --- a/toys/posix/ls.c >> +++ b/toys/posix/ls.c >> @@ -384,7 +384,8 @@ static void listfiles(int dirfd, struct dirtree *indir) >>memset(colsizes, 0, columns*sizeof(unsigned)); >>for (ul=0; ul> entrylen(sort[next_column(ul, dtlen, columns, )], len); >> -*len += totpad+1; >> +// Add `2` to `totpad` to ensure two spaces between filenames >> +*len += totpad+2; >> if (c == columns) break; >> // Expand this column if necessary, break if that puts us over >> budget >> if (*len > colsizes[c]) { >> -- >> 2.11.0 >> ___ >> Toybox mailing list >> Toybox@lists.landley.net >> http://lists.landley.net/listinfo.cgi/toybox-landley.net > ___ > Toybox mailing list > Toybox@lists.landley.net > http://lists.landley.net/listinfo.cgi/toybox-landley.net > ___ Toybox mailing list Toybox@lists.landley.net http://lists.landley.net/listinfo.cgi/toybox-landley.net
Re: [Toybox] Android sitrep
> the tar "no ../ files outside this directory" check is > false positiving on "./" for some reason, gotta track that down. Here's a reproducible toy tar crash, might or might not be connected. git clone https://github.com/landley/toybox wget http://musl.cc/x86_64-linux-musl-native.tgz rm -r root.tar toybox/root x86_64-linux-musl-native (cd toybox; make defconfig root) toybox/toybox tar xf x86_64-linux-musl-native.tgz (cd toybox/root/*/*fs; ../../../toybox tar c *)>root.tar cd x86_64-linux-musl-native &&../toybox/toybox tar xvf ../root.tar # ... # tar: can't remove: bin: Is a directory # tar: can't remove: lib: Is a directory # tar: 'usr/' not under '/path/to/x86_64-linux-musl-native' # Segmentation fault First two errors are correct, refusing to overwrite an existing dir with a symlink and moving on, remembering to make a bad exit code later. But then you hit an existing symlink to a dir. Other implementations replace the symlink with the dir and keep unpacking files there. Toy tar doesn't overwrite, says odd message, starts extracting children of a dir that hasn't been created, and crashes. Full output below. bin tar: can't remove: bin: Is a directory dev/ etc/ etc/passwd etc/group etc/resolv.conf home/ init lib tar: can't remove: lib: Is a directory mnt/ proc/ root/ sbin sys/ tmp/ usr/ tar: 'usr/' not under '/path/to/x86_64-linux-musl-native' usr/lib/ usr/sbin/ Segmentation fault ___ Toybox mailing list Toybox@lists.landley.net http://lists.landley.net/listinfo.cgi/toybox-landley.net
Re: [Toybox] How can I contribute
On Wed, Oct 16, 2019 at 3:27 PM Rob Landley wrote: > Right now toys/posix/ls.c only puts one space between filenames in -C or > -x mode > (which is the default output), and I have a todo item to increase that to 2 > spaces (which is what other implementations do). The reason is I > misunderstood > how unicode worked and it turns out combining characters don't come > _after_ the > character they combine with but before, which means a filename that ends > with a > combining character will attach to the space after the filename, and thus > make > two filenames visually run together unless you have a two space gap. [...] I thought the order was the other way around (base character, then combining character). I tested it with: printf 'o\xcc\x81o\n' >A.txt google-chrome A.txt I see: óo I also tried printf 'x \xcc\x81a\n', and I sometimes see "x á", but the terminal seems to think the accent is above the space (e.g. selecting 'a' doesn't select the accent). The accent appears above the space if I use a non-monospace font in a browser. -Ryan ___ Toybox mailing list Toybox@lists.landley.net http://lists.landley.net/listinfo.cgi/toybox-landley.net
Re: [Toybox] What am I doing wrong with ndk-r20?
r21beta1 is out now, if you're looking for new things to break :-) https://android-developers.googleblog.com/2019/10/introducing-ndk-r21-our-first-long-term.html On Wed, Oct 9, 2019 at 9:27 AM enh wrote: > > On Wed, Oct 9, 2019 at 9:08 AM Rob Landley wrote: > > > > On 10/7/19 3:54 PM, enh wrote: > > > d'oh. that seems to be a real bug... __ANDROID_NDK__ went missing > > > between r19 and r20. congratulations on being the first person to > > > notice! > > > > Most people aren't building as part of AOSP _and_ with the NDK. :) > > well, depends what you mean by "with the NDK". the tricky part in the > platform is whether building for the platform but with a lower target > API level should count as "NDK" or not. that's why the plan is to put > __ANDROID_NDK__ (which really just means "built in a context where it > was targeting a published API level") back even though we also have > __NDK_MAJOR__ and friends which tell you exactly what version of the > NDK you're being built with (or aren't defined at all if you're being > built with the platform). similar to how there's __ANDROID__ versus > __BIONIC__ versus __linux__ depending on exactly what you're trying to > say. > > > > it looks like the removal was my fault, though i can't understand why > > > i did it now (because there wasn't nearly enough detail in my commit > > > message). > > > > > > https://android-review.googlesource.com/c/platform/bionic/+/1135146 > > > reverts the change that broke this. > > > > Sigh, should I add that to the compile time probes to work with the exiting > > NDK, > > or is there a new release soon? > > well, r21beta1 was supposed to be tomorrow, but we're having Windows > issues. but, yeah, pretty soon. > > if you want a quick hack, and don't care about _old_ versions of the > NDK (which you don't, because they can't build toybox anyway), you > could replace __ANDROID_NDK__ with __NDK_MAJOR__. that should fix r20 > and also work for r19 and r21 (and the OS, because it won't be set > there, which is intended in this case). > > > Rob ___ Toybox mailing list Toybox@lists.landley.net http://lists.landley.net/listinfo.cgi/toybox-landley.net
Re: [Toybox] How can I contribute
On Thu, Oct 17, 2019 at 1:27 AM Rob Landley wrote: > > On 10/14/19 3:07 PM, Denys Nykula wrote: > >> saw one of the Toybox talks and wanted to see how I could contribute > > > > Completing landley.net/toybox/cleanup.html on toys/pending/{dhcp,route}.c > > to help them out of pending would be most demanded I think. > > That's kind of a high bar, though. :) > > Right now toys/posix/ls.c only puts one space between filenames in -C or -x > mode > (which is the default output), and I have a todo item to increase that to 2 > spaces (which is what other implementations do). The reason is I misunderstood > how unicode worked and it turns out combining characters don't come _after_ > the > character they combine with but before, which means a filename that ends with > a > combining character will attach to the space after the filename, and thus make > two filenames visually run together unless you have a two space gap. The loop > that needs adjusting is probably the one starting around line 401, I don't > _think_ tests/ls.test cares (since it has to pass with TEST_HOST and the host > version generally uses different spacing anyway), but make sure that still > passes. If terminal emulator behaves correctly, such as UXTerm/Xterm does. Combining chars dont render in space after, but on same area as previous char. So 'ls' implementation even with on space only between filenames are readable. That been said most other terminals dont render them correctly. I tested this on xterm, uxterm, st, alacritty, termite, terminator (I have lots of terminals installed because i were testing CSI escapes for vi) with terminator even the two spaces are not enough between zalgo text filenames... :) but long story short this problem is mostly terminal emulators problem, but having 2 spaces between filenames brings consistency with other implementations anyway. -Jarno > > Or, I just taught xargs to ignore -P but it would be nice if somebody actually > implemented that instead of ignoring it. > > Rob > ___ > Toybox mailing list > Toybox@lists.landley.net > http://lists.landley.net/listinfo.cgi/toybox-landley.net ___ Toybox mailing list Toybox@lists.landley.net http://lists.landley.net/listinfo.cgi/toybox-landley.net
Re: [Toybox] [PATCH] vi: changes to buffer drawing
I dare to say that crunch_str calculates width right already. if i do void hello_main(void) { char* str = "lll\xcc\xb4\xcc\x97\n"; char* end = str; int width; width = crunch_str(, 3, stdout, 0,0); xprintf("\nwidth: %d\n",width); end = str; width = crunch_str(, 3, 0, 0,0); xprintf("width: %d\n",width); xprintf("len: %d\n",end-str); } I get output jarno@Snow:~/work/src/toybox/toybox$ ./hello lll̴̗ width: 3 width: 3 len: 7 And if i use xterm combining chars are rendered on top of last ascii char as they are supposed to. crunch_str loop finished only if if (width-columns wrote: > > n 9/19/19 2:52 PM, Jarno Mäkipää wrote: > > Actually I think that current crunch_str prints trailing zero width > > combining chars just fine? > > > > since when width==columns its still >= 0 > > > > . > > for (end = start = *str; *end; columns += col, end += bytes) { > > wchar_t wc; > > > > if ((bytes = utf8towc(, end, 4))>0 && (col = wcwidth(wc))>=0) { > > if (!escmore || wc>255 || !strchr(escmore, wc)) { > > if (width-columns > <--col is 0 when U-0x300-0x36f > > if (out) fwrite(end, bytes, 1, out); > > > > continue; > > } > > } > > .. > > > > The problem is when you ask it how many bytes of input will fit in a given > number of columns (or to print up to this many columns), it doesn't give the > trailing combining characters. It stops right after the last printing > character > that fits. > > And the callers need to be adjusted to ask "how many bytes will fit into 0 > chars" so they can add combining characters to an existing column when > characters are coming in incrementally (from some outside source you're > reading > in chunks, or which is delivering individual characters like serial ports), > and > they've filled up the space _but_ there may be more combining characters > coming > in in future that attach to the existing space. > > > And yeah UTF-8 is good because it was originally written on napkin at > > dinner table > > by Ken Thompson and Rob Pike. Unicode on the other hand... not written > > in napkin. > > Unicode is insane. When characters come in incrementally, you have to redraw > the > same glyph repeatedly because you _can't_ know when you're done until you > overshoot. > > Rob ___ Toybox mailing list Toybox@lists.landley.net http://lists.landley.net/listinfo.cgi/toybox-landley.net
Re: [Toybox] [PATCH] ls: Ensure file names are separated by 2 spaces
new tests? (including the difficult combining-character case?) On Thu, Oct 17, 2019 at 8:35 AM Andrew Ilijic wrote: > > From d4c03b672d8b18f7ce021e4fcb02e8bb86f38f5f Mon Sep 17 00:00:00 2001 > From: Andrew Ilijic > Date: Thu, 17 Oct 2019 11:03:19 -0400 > Subject: [PATCH] ls: Ensure file names are separated by 2 spaces > > We need two spaces between filenames because that is the convention > followed by other implementations. More importantly, if we do not have > two spaces, certain Unicode file names cause filenames to run together. > In Unicode, combining characters come before the character they modify. > If a filename ends in a combining character, the combining character > attaches to the space that follows it, causing the space not to be > visible. Having a two-space gap stops the above issue from happening. > > For context and a bit more information, see mailing list link below. > https://www.mail-archive.com/toybox@lists.landley.net/msg05986.html > --- > toys/posix/ls.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/toys/posix/ls.c b/toys/posix/ls.c > index 0956902f..d4c0211a 100644 > --- a/toys/posix/ls.c > +++ b/toys/posix/ls.c > @@ -384,7 +384,8 @@ static void listfiles(int dirfd, struct dirtree *indir) >memset(colsizes, 0, columns*sizeof(unsigned)); >for (ul=0; ul entrylen(sort[next_column(ul, dtlen, columns, )], len); > -*len += totpad+1; > +// Add `2` to `totpad` to ensure two spaces between filenames > +*len += totpad+2; > if (c == columns) break; > // Expand this column if necessary, break if that puts us over budget > if (*len > colsizes[c]) { > -- > 2.11.0 > ___ > Toybox mailing list > Toybox@lists.landley.net > http://lists.landley.net/listinfo.cgi/toybox-landley.net ___ Toybox mailing list Toybox@lists.landley.net http://lists.landley.net/listinfo.cgi/toybox-landley.net
[Toybox] [PATCH] ls: Ensure file names are separated by 2 spaces
>From d4c03b672d8b18f7ce021e4fcb02e8bb86f38f5f Mon Sep 17 00:00:00 2001 From: Andrew Ilijic Date: Thu, 17 Oct 2019 11:03:19 -0400 Subject: [PATCH] ls: Ensure file names are separated by 2 spaces We need two spaces between filenames because that is the convention followed by other implementations. More importantly, if we do not have two spaces, certain Unicode file names cause filenames to run together. In Unicode, combining characters come before the character they modify. If a filename ends in a combining character, the combining character attaches to the space that follows it, causing the space not to be visible. Having a two-space gap stops the above issue from happening. For context and a bit more information, see mailing list link below. https://www.mail-archive.com/toybox@lists.landley.net/msg05986.html --- toys/posix/ls.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/toys/posix/ls.c b/toys/posix/ls.c index 0956902f..d4c0211a 100644 --- a/toys/posix/ls.c +++ b/toys/posix/ls.c @@ -384,7 +384,8 @@ static void listfiles(int dirfd, struct dirtree *indir) memset(colsizes, 0, columns*sizeof(unsigned)); for (ul=0; ul colsizes[c]) { -- 2.11.0 From d4c03b672d8b18f7ce021e4fcb02e8bb86f38f5f Mon Sep 17 00:00:00 2001 From: Andrew Ilijic Date: Thu, 17 Oct 2019 11:03:19 -0400 Subject: [PATCH] ls: Ensure file names are separated by 2 spaces We need two spaces between filenames because that is the convention followed by other implementations. More importantly, if we do not have two spaces, certain Unicode file names cause filenames to run together. In Unicode, combining characters come before the character they modify. If a filename ends in a combining character, the combining character attaches to the space that follows it, causing the space not to be visible. Having a two-space gap stops the above issue from happening. For context and a bit more information, see mailing list link below. https://www.mail-archive.com/toybox@lists.landley.net/msg05986.html --- toys/posix/ls.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/toys/posix/ls.c b/toys/posix/ls.c index 0956902f..d4c0211a 100644 --- a/toys/posix/ls.c +++ b/toys/posix/ls.c @@ -384,7 +384,8 @@ static void listfiles(int dirfd, struct dirtree *indir) memset(colsizes, 0, columns*sizeof(unsigned)); for (ul=0; ul colsizes[c]) { -- 2.11.0 ___ Toybox mailing list Toybox@lists.landley.net http://lists.landley.net/listinfo.cgi/toybox-landley.net
Re: [Toybox] How can I contribute
Thank you Rob, I'll start by taking a look at the `ls` command and let you know how I make out. On Wed, Oct 16, 2019 at 6:27 PM Rob Landley wrote: > > On 10/14/19 3:07 PM, Denys Nykula wrote: > >> saw one of the Toybox talks and wanted to see how I could contribute > > > > Completing landley.net/toybox/cleanup.html on toys/pending/{dhcp,route}.c > > to help them out of pending would be most demanded I think. > > That's kind of a high bar, though. :) > > Right now toys/posix/ls.c only puts one space between filenames in -C or -x > mode > (which is the default output), and I have a todo item to increase that to 2 > spaces (which is what other implementations do). The reason is I misunderstood > how unicode worked and it turns out combining characters don't come _after_ > the > character they combine with but before, which means a filename that ends with > a > combining character will attach to the space after the filename, and thus make > two filenames visually run together unless you have a two space gap. The loop > that needs adjusting is probably the one starting around line 401, I don't > _think_ tests/ls.test cares (since it has to pass with TEST_HOST and the host > version generally uses different spacing anyway), but make sure that still > passes. > > Or, I just taught xargs to ignore -P but it would be nice if somebody actually > implemented that instead of ignoring it. > > Rob ___ Toybox mailing list Toybox@lists.landley.net http://lists.landley.net/listinfo.cgi/toybox-landley.net