Re: [Toybox] How can I contribute

2019-10-17 Thread Rob Landley
On 10/17/19 3:14 PM, Ryan Prichard wrote:
> I thought the order was the other way around (base character, then combining
> character). I tested it with:

The downside of "I had it backwards" is when you _do_ start to remember the
correct way you reverse it again...

Rob
___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net


Re: [Toybox] [PATCH] ls: Ensure file names are separated by 2 spaces

2019-10-17 Thread Rob Landley
I have various torture test utf8 in test/files/utf8. To see a trailing combining
character eat a following space, you can do:

$ head -c 5 tests/files/utf8/test3.txt; echo ' '

Or

$ head -c 5 tests/files/utf8/test1.txt; echo 'X'

The problem is, it's easy to see visually, but I'm not quite sure how an
automated test is expected to care? (Other than going "it produced the right
output"...

The OTHER fun thing is that -C and -x care about terminal size, or default to 80
columns when they can't determine it. In theory there's an ls -w option that
lets you set the width, but I never needed/implemented it. (Andrew: you're
welcome to if you like. I could walk you through it or you could read
http://landley.net/toybox/code.html#lib_args and try to work it out yourself.)

So if you did want to test this, you'd want to "touch" a test file with a funky
utf8 name, make another test file with a long enough name to force minimal
spacing, and then... do what? You'd just be proving they did or didn't fit on
the same line. (Basically reproducing known/expected output. If they had one
space between, these two would fit, with a minimum of two spaces, they'll wrap.)

What _is_ a good test is "-C is utf8 fontmetrics, not a byte count". So if you 
do:

$ mkdir sub
$ cd sub
$ touch $(cat ../tests/files/utf8/test3.txt)
$ toybox ls

The result fits on one line, when the file was 198 bytes long. (Note: if you
don't put quotes around the "$()" it'll split it into 5 words because whitespace
in the sentence.

That said:
$ wc -m ../tests/files/utf8/test3.txt
110 ../tests/files/utf8/test3.txt

And I _visually_ count that as 22 characters. (That's the debian wc, not toybox
saying that. Toybox matches debian here, but it seems conceptually wrong. It's
like we want another option measuring combined characters. Maybe -M or 
something?)

But testing how your terminal displays the result for "ls bridged the space or
they're still visually distinct"...? I have no idea how to automate short of
doing a screen capture in a known font and comparing the bitmap, which isn't
remotely realistic. :)

(There are, sadly, a lot of tests I know how to _perform_ but not _automate_.
And the pending mkroot stuff merely _reduces_ that gap, not eliminate it.)

Rob

On 10/17/19 10:44 AM, enh via Toybox wrote:
> new tests? (including the difficult combining-character case?)
> 
> On Thu, Oct 17, 2019 at 8:35 AM Andrew Ilijic  wrote:
>>
>> From d4c03b672d8b18f7ce021e4fcb02e8bb86f38f5f Mon Sep 17 00:00:00 2001
>> From: Andrew Ilijic 
>> Date: Thu, 17 Oct 2019 11:03:19 -0400
>> Subject: [PATCH] ls: Ensure file names are separated by 2 spaces
>>
>> We need two spaces between filenames because that is the convention
>> followed by other implementations. More importantly, if we do not have
>> two spaces, certain Unicode file names cause filenames to run together.
>> In Unicode, combining characters come before the character they modify.
>> If a filename ends in a combining character, the combining character
>> attaches to the space that follows it, causing the space not to be
>> visible. Having a two-space gap stops the above issue from happening.
>>
>> For context and a bit more information, see mailing list link below.
>> https://www.mail-archive.com/toybox@lists.landley.net/msg05986.html
>> ---
>>  toys/posix/ls.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/toys/posix/ls.c b/toys/posix/ls.c
>> index 0956902f..d4c0211a 100644
>> --- a/toys/posix/ls.c
>> +++ b/toys/posix/ls.c
>> @@ -384,7 +384,8 @@ static void listfiles(int dirfd, struct dirtree *indir)
>>memset(colsizes, 0, columns*sizeof(unsigned));
>>for (ul=0; ul>  entrylen(sort[next_column(ul, dtlen, columns, )], len);
>> -*len += totpad+1;
>> +// Add `2` to `totpad` to ensure two spaces between filenames
>> +*len += totpad+2;
>>  if (c == columns) break;
>>  // Expand this column if necessary, break if that puts us over 
>> budget
>>  if (*len > colsizes[c]) {
>> --
>> 2.11.0
>> ___
>> Toybox mailing list
>> Toybox@lists.landley.net
>> http://lists.landley.net/listinfo.cgi/toybox-landley.net
> ___
> Toybox mailing list
> Toybox@lists.landley.net
> http://lists.landley.net/listinfo.cgi/toybox-landley.net
> 
___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net


Re: [Toybox] Android sitrep

2019-10-17 Thread Denys Nykula
> the tar "no ../ files outside this directory" check is
> false positiving on "./" for some reason, gotta track that down.

Here's a reproducible toy tar crash, might or might not be connected.

git clone https://github.com/landley/toybox
wget http://musl.cc/x86_64-linux-musl-native.tgz
rm -r root.tar toybox/root x86_64-linux-musl-native
(cd toybox; make defconfig root)
toybox/toybox tar xf x86_64-linux-musl-native.tgz
(cd toybox/root/*/*fs; ../../../toybox tar c *)>root.tar
cd x86_64-linux-musl-native &&../toybox/toybox tar xvf ../root.tar
# ...
# tar: can't remove: bin: Is a directory
# tar: can't remove: lib: Is a directory
# tar: 'usr/' not under '/path/to/x86_64-linux-musl-native'
# Segmentation fault

First two errors are correct, refusing to overwrite an existing dir with
a symlink and moving on, remembering to make a bad exit code later. But
then you hit an existing symlink to a dir. Other implementations replace
the symlink with the dir and keep unpacking files there. Toy tar doesn't
overwrite, says odd message, starts extracting children of a dir that
hasn't been created, and crashes. Full output below.

bin
tar: can't remove: bin: Is a directory
dev/
etc/
etc/passwd
etc/group
etc/resolv.conf
home/
init
lib
tar: can't remove: lib: Is a directory
mnt/
proc/
root/
sbin
sys/
tmp/
usr/
tar: 'usr/' not under '/path/to/x86_64-linux-musl-native'
usr/lib/
usr/sbin/
Segmentation fault
___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net


Re: [Toybox] How can I contribute

2019-10-17 Thread Ryan Prichard via Toybox
On Wed, Oct 16, 2019 at 3:27 PM Rob Landley  wrote:

> Right now toys/posix/ls.c only puts one space between filenames in -C or
> -x mode
> (which is the default output), and I have a todo item to increase that to 2
> spaces (which is what other implementations do). The reason is I
> misunderstood
> how unicode worked and it turns out combining characters don't come
> _after_ the
> character they combine with but before, which means a filename that ends
> with a
> combining character will attach to the space after the filename, and thus
> make
> two filenames visually run together unless you have a two space gap. [...]


I thought the order was the other way around (base character, then
combining character). I tested it with:

printf 'o\xcc\x81o\n' >A.txt
google-chrome A.txt

I see: óo

I also tried printf 'x \xcc\x81a\n', and I sometimes see "x á", but the
terminal seems to think the accent is above the space (e.g. selecting 'a'
doesn't select the accent). The accent appears above the space if I use a
non-monospace font in a browser.

-Ryan
___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net


Re: [Toybox] What am I doing wrong with ndk-r20?

2019-10-17 Thread enh via Toybox
r21beta1 is out now, if you're looking for new things to break :-)

https://android-developers.googleblog.com/2019/10/introducing-ndk-r21-our-first-long-term.html

On Wed, Oct 9, 2019 at 9:27 AM enh  wrote:
>
> On Wed, Oct 9, 2019 at 9:08 AM Rob Landley  wrote:
> >
> > On 10/7/19 3:54 PM, enh wrote:
> > > d'oh. that seems to be a real bug... __ANDROID_NDK__ went missing
> > > between r19 and r20. congratulations on being the first person to
> > > notice!
> >
> > Most people aren't building as part of AOSP _and_ with the NDK. :)
>
> well, depends what you mean by "with the NDK". the tricky part in the
> platform is whether building for the platform but with a lower target
> API level should count as "NDK" or not. that's why the plan is to put
> __ANDROID_NDK__ (which really just means "built in a context where it
> was targeting a published API level") back even though we also have
> __NDK_MAJOR__ and friends which tell you exactly what version of the
> NDK you're being built with (or aren't defined at all if you're being
> built with the platform). similar to how there's __ANDROID__ versus
> __BIONIC__ versus __linux__ depending on exactly what you're trying to
> say.
>
> > > it looks like the removal was my fault, though i can't understand why
> > > i did it now (because there wasn't nearly enough detail in my commit
> > > message).
> > >
> > > https://android-review.googlesource.com/c/platform/bionic/+/1135146
> > > reverts the change that broke this.
> >
> > Sigh, should I add that to the compile time probes to work with the exiting 
> > NDK,
> > or is there a new release soon?
>
> well, r21beta1 was supposed to be tomorrow, but we're having Windows
> issues. but, yeah, pretty soon.
>
> if you want a quick hack, and don't care about _old_ versions of the
> NDK (which you don't, because they can't build toybox anyway), you
> could replace __ANDROID_NDK__ with __NDK_MAJOR__. that should fix r20
> and also work for r19 and r21 (and the OS, because it won't be set
> there, which is intended in this case).
>
> > Rob
___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net


Re: [Toybox] How can I contribute

2019-10-17 Thread Jarno Mäkipää
On Thu, Oct 17, 2019 at 1:27 AM Rob Landley  wrote:
>
> On 10/14/19 3:07 PM, Denys Nykula wrote:
> >> saw one of the Toybox talks and wanted to see how I could contribute
> >
> > Completing landley.net/toybox/cleanup.html on toys/pending/{dhcp,route}.c
> > to help them out of pending would be most demanded I think.
>
> That's kind of a high bar, though. :)
>
> Right now toys/posix/ls.c only puts one space between filenames in -C or -x 
> mode
> (which is the default output), and I have a todo item to increase that to 2
> spaces (which is what other implementations do). The reason is I misunderstood
> how unicode worked and it turns out combining characters don't come _after_ 
> the
> character they combine with but before, which means a filename that ends with 
> a
> combining character will attach to the space after the filename, and thus make
> two filenames visually run together unless you have a two space gap. The loop
> that needs adjusting is probably the one starting around line 401, I don't
> _think_ tests/ls.test cares (since it has to pass with TEST_HOST and the host
> version generally uses different spacing anyway), but make sure that still 
> passes.

If terminal emulator behaves correctly, such as UXTerm/Xterm does.
Combining chars dont render in space after, but on same area as
previous char. So 'ls' implementation even with on space only between
filenames are readable. That been said most other terminals dont
render them correctly. I tested this on xterm, uxterm, st, alacritty,
termite, terminator (I have lots of terminals installed because i
were testing CSI escapes for vi)

with terminator even the two spaces are not enough between zalgo text
filenames... :)

but long story short this problem is mostly terminal emulators
problem, but having 2 spaces between filenames brings consistency with
other implementations anyway.

-Jarno




>
> Or, I just taught xargs to ignore -P but it would be nice if somebody actually
> implemented that instead of ignoring it.
>
> Rob
> ___
> Toybox mailing list
> Toybox@lists.landley.net
> http://lists.landley.net/listinfo.cgi/toybox-landley.net
___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net


Re: [Toybox] [PATCH] vi: changes to buffer drawing

2019-10-17 Thread Jarno Mäkipää
I dare to say that crunch_str calculates width right already. if i do

void hello_main(void)
{
  char* str = "lll\xcc\xb4\xcc\x97\n";
  char* end = str;
  int width;

  width = crunch_str(, 3, stdout, 0,0);
  xprintf("\nwidth: %d\n",width);
  end = str;
  width = crunch_str(, 3, 0, 0,0);
  xprintf("width: %d\n",width);
  xprintf("len: %d\n",end-str);
}

I get output
jarno@Snow:~/work/src/toybox/toybox$ ./hello
lll̴̗
width: 3
width: 3
len: 7

And if i use xterm combining chars are rendered on top of last ascii
char as they are supposed to.

crunch_str loop finished only if

if (width-columns wrote:
>
> n 9/19/19 2:52 PM, Jarno Mäkipää wrote:
> > Actually I think that current crunch_str prints trailing zero width
> > combining chars just fine?
> >
> > since when width==columns its still >= 0
> >
> > .
> > for (end = start = *str; *end; columns += col, end += bytes) {
> > wchar_t wc;
> >
> > if ((bytes = utf8towc(, end, 4))>0 && (col = wcwidth(wc))>=0) {
> >   if (!escmore || wc>255 || !strchr(escmore, wc)) {
> > if (width-columns > <--col is 0 when U-0x300-0x36f
> > if (out) fwrite(end, bytes, 1, out);
> >
> > continue;
> >   }
> > }
> > ..
> >
>
> The problem is when you ask it how many bytes of input will fit in a given
> number of columns (or to print up to this many columns), it doesn't give the
> trailing combining characters. It stops right after the last printing 
> character
> that fits.
>
> And the callers need to be adjusted to ask "how many bytes will fit into 0
> chars" so they can add combining characters to an existing column when
> characters are coming in incrementally (from some outside source you're 
> reading
> in chunks, or which is delivering individual characters like serial ports), 
> and
> they've filled up the space _but_ there may be more combining characters 
> coming
> in in future that attach to the existing space.
>
> > And yeah UTF-8 is good because it was originally written on napkin at
> > dinner table
> > by Ken Thompson and Rob Pike. Unicode on the other hand... not written
> > in napkin.
>
> Unicode is insane. When characters come in incrementally, you have to redraw 
> the
> same glyph repeatedly because you _can't_ know when you're done until you 
> overshoot.
>
> Rob
___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net


Re: [Toybox] [PATCH] ls: Ensure file names are separated by 2 spaces

2019-10-17 Thread enh via Toybox
new tests? (including the difficult combining-character case?)

On Thu, Oct 17, 2019 at 8:35 AM Andrew Ilijic  wrote:
>
> From d4c03b672d8b18f7ce021e4fcb02e8bb86f38f5f Mon Sep 17 00:00:00 2001
> From: Andrew Ilijic 
> Date: Thu, 17 Oct 2019 11:03:19 -0400
> Subject: [PATCH] ls: Ensure file names are separated by 2 spaces
>
> We need two spaces between filenames because that is the convention
> followed by other implementations. More importantly, if we do not have
> two spaces, certain Unicode file names cause filenames to run together.
> In Unicode, combining characters come before the character they modify.
> If a filename ends in a combining character, the combining character
> attaches to the space that follows it, causing the space not to be
> visible. Having a two-space gap stops the above issue from happening.
>
> For context and a bit more information, see mailing list link below.
> https://www.mail-archive.com/toybox@lists.landley.net/msg05986.html
> ---
>  toys/posix/ls.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/toys/posix/ls.c b/toys/posix/ls.c
> index 0956902f..d4c0211a 100644
> --- a/toys/posix/ls.c
> +++ b/toys/posix/ls.c
> @@ -384,7 +384,8 @@ static void listfiles(int dirfd, struct dirtree *indir)
>memset(colsizes, 0, columns*sizeof(unsigned));
>for (ul=0; ul  entrylen(sort[next_column(ul, dtlen, columns, )], len);
> -*len += totpad+1;
> +// Add `2` to `totpad` to ensure two spaces between filenames
> +*len += totpad+2;
>  if (c == columns) break;
>  // Expand this column if necessary, break if that puts us over budget
>  if (*len > colsizes[c]) {
> --
> 2.11.0
> ___
> Toybox mailing list
> Toybox@lists.landley.net
> http://lists.landley.net/listinfo.cgi/toybox-landley.net
___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net


[Toybox] [PATCH] ls: Ensure file names are separated by 2 spaces

2019-10-17 Thread Andrew Ilijic
>From d4c03b672d8b18f7ce021e4fcb02e8bb86f38f5f Mon Sep 17 00:00:00 2001
From: Andrew Ilijic 
Date: Thu, 17 Oct 2019 11:03:19 -0400
Subject: [PATCH] ls: Ensure file names are separated by 2 spaces

We need two spaces between filenames because that is the convention
followed by other implementations. More importantly, if we do not have
two spaces, certain Unicode file names cause filenames to run together.
In Unicode, combining characters come before the character they modify.
If a filename ends in a combining character, the combining character
attaches to the space that follows it, causing the space not to be
visible. Having a two-space gap stops the above issue from happening.

For context and a bit more information, see mailing list link below.
https://www.mail-archive.com/toybox@lists.landley.net/msg05986.html
---
 toys/posix/ls.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/toys/posix/ls.c b/toys/posix/ls.c
index 0956902f..d4c0211a 100644
--- a/toys/posix/ls.c
+++ b/toys/posix/ls.c
@@ -384,7 +384,8 @@ static void listfiles(int dirfd, struct dirtree *indir)
   memset(colsizes, 0, columns*sizeof(unsigned));
   for (ul=0; ul colsizes[c]) {
-- 
2.11.0
From d4c03b672d8b18f7ce021e4fcb02e8bb86f38f5f Mon Sep 17 00:00:00 2001
From: Andrew Ilijic 
Date: Thu, 17 Oct 2019 11:03:19 -0400
Subject: [PATCH] ls: Ensure file names are separated by 2 spaces

We need two spaces between filenames because that is the convention
followed by other implementations. More importantly, if we do not have
two spaces, certain Unicode file names cause filenames to run together.
In Unicode, combining characters come before the character they modify.
If a filename ends in a combining character, the combining character
attaches to the space that follows it, causing the space not to be
visible. Having a two-space gap stops the above issue from happening.

For context and a bit more information, see mailing list link below.
https://www.mail-archive.com/toybox@lists.landley.net/msg05986.html
---
 toys/posix/ls.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/toys/posix/ls.c b/toys/posix/ls.c
index 0956902f..d4c0211a 100644
--- a/toys/posix/ls.c
+++ b/toys/posix/ls.c
@@ -384,7 +384,8 @@ static void listfiles(int dirfd, struct dirtree *indir)
   memset(colsizes, 0, columns*sizeof(unsigned));
   for (ul=0; ul colsizes[c]) {
-- 
2.11.0
___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net


Re: [Toybox] How can I contribute

2019-10-17 Thread Andrew Ilijic
Thank you Rob,

I'll start by taking a look at the `ls` command and let you know how I make out.

On Wed, Oct 16, 2019 at 6:27 PM Rob Landley  wrote:
>
> On 10/14/19 3:07 PM, Denys Nykula wrote:
> >> saw one of the Toybox talks and wanted to see how I could contribute
> >
> > Completing landley.net/toybox/cleanup.html on toys/pending/{dhcp,route}.c
> > to help them out of pending would be most demanded I think.
>
> That's kind of a high bar, though. :)
>
> Right now toys/posix/ls.c only puts one space between filenames in -C or -x 
> mode
> (which is the default output), and I have a todo item to increase that to 2
> spaces (which is what other implementations do). The reason is I misunderstood
> how unicode worked and it turns out combining characters don't come _after_ 
> the
> character they combine with but before, which means a filename that ends with 
> a
> combining character will attach to the space after the filename, and thus make
> two filenames visually run together unless you have a two space gap. The loop
> that needs adjusting is probably the one starting around line 401, I don't
> _think_ tests/ls.test cares (since it has to pass with TEST_HOST and the host
> version generally uses different spacing anyway), but make sure that still 
> passes.
>
> Or, I just taught xargs to ignore -P but it would be nice if somebody actually
> implemented that instead of ignoring it.
>
> Rob
___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net