Re: [Toybox] Countering trusting trust.

2020-12-05 Thread Rob Landley
P.S. If you're wondering why I harp on this so much, and why I vanish from
toybox dev from time to time to do j-core stuff at $DAYJOB:

  https://pluralistic.net/2020/12/05/trusting-trust/#thompsons-devil

(And yes, our model involves decapping chips that come back from the fab to
confirm that what we sent them is what they produced, and yes this means doing
almost all of the low-level fab stuff with _our_ tools, and no the skywater 130
guys haven't even figured out what questions to ask in about 2/3 of the problem
space yet. And that's all _before_ you get into the design of the actually
secure parts...)

Rob

On 7/24/20 11:48 PM, Rob Landley wrote:
> This keeps coming up and I should have a writeup I can just point people at, 
> so:
> 
> 15 years ago when I was maintaining Busybox somebody told me the big NORAD
> display at Cheyenne Mountain (as recreated in the movie Wargames) ran busybox,
> which surprised me: I didn't think my code was good enough to defend the
> country from nuclear attack. But they explained they're required to audit
> every line of source for anything running on such highly secure systems, and
> they'd much rather audit a few hundred thousand lines of busybox code than
> tens of millions of lines of corresponding GNU code. This, I understood.
> 
> But it doesn't matter how secure your code is if it's running in a system
> that's already been compromised. The solution is to get a minimal secure
> base system, audit it (have experts read every line of source), and build
> up from there. At the root of any package management tree the dependencies
> go circular (everything depends on everything else), so there's a base set
> of packages you have to start with as a lump or nothing can run. These
> days, the minimal system to boot to a shell prompt is 3 packages (kernel,
> libc, and application: if you're booting linux to a shell prompt your
> kernel is linux, your application is toybox, and your libc is probably
> either musl or bionic).
> 
> Of course auditing the output isn't enough because your development tools
> could have been compromised. Creating a new chroot from a machine that's
> running spyware is not very useful. So you make a tiny self-hosting system,
> which can rebuild itself from source code under itself. This is
> conceptually FOUR packages: the kernel, libc, and toybox above, plus a
> compiler toolchain (which CAN be a single package if you upgrade Fabrice
> Bellard's tinycc, as I proposed doing in my qcc project but have never
> found time to do).
> 
> My first implementation of this concept was aboriginal linux
> (https://landley.net/aboriginal/about.html) where I got the self-hosting
> system (capable of building Linux From Scratch under the result as proof it
> could natively bootstrap up to arbitrary complexity by downloading and
> compiling source code) down to 7 packages: the kernel was linux, libc was
> uclibc, the set of command line utilities was busybox, the toolchain was 2
> packages (just gcc and binutils, it hadn't yet metastasized into 5
> packages, gone gplv3, and rewritten itself in C++), and then I needed 2
> more packages (make and bash) because the corresponding busybox commands
> were missing or not yet good enough.
> 
> My new one is based on mkroot (https://landley.net/toybox/faq.html#mkroot)
> with cross and native compilers from musl-cross-make (via
> scripts/mcm-buildall.sh in this source ala
> https://landley.net/toybox/faq.html#cross). Eventually I'd like to
> implement https://landley.net/qcc and get it down to the theoretical 4
> packages, but it's a work in progress and nobody ever wants to fund this
> stuff (ala https://elinux.org/CELF_Project_Proposal/Combine_tcg_with_tcc)
> so I can only throw scraps of hobby time/energy at it.
> 
> But then the NEXT step of paranoia is Ken Thompson's "trusting trust"
> attack, where the creator of unix modified the early BSD compiler to
> recognize and hack the login program (so the login binary contained an
> exploit the login.c source didn't, a hardwired "ken" account with a fixed
> password), and then he added a SECOND part so the compiler would recognize
> and hack itself (inserting the original exploit for login and the new one
> for cc) so now the COMPILER binary would contain an exploit even when it
> wasn't in the compiler source. Then he removed the changes from the
> compiler source, rebuilt it with the modified binary to make sure the
> exploit propagated from compiler binary to compiler binary without being in
> the source code, and sent it to Berkeley so he could always log into his
> students' system. Years later, when the ACM gave him a lifetime achievement
> award, he told this story:
> https://dl.acm.org/doi/pdf/10.1145/358198.358210
> 
> The first defense against this (presented in a PhD thesis
> https://dwheeler.com/trusting-trust/) is "countering trusting trust through
> diverse double compiling", I.E. compile your compiler's source with a 
> 

Re: [Toybox] [PATCH] tr: fix pathological flushing.

2020-12-05 Thread enh via Toybox
On Sat, Dec 5, 2020 at 3:38 AM Rob Landley wrote:

> On 12/4/20 1:58 PM, enh via Toybox wrote:
> > The AOSP build doesn't use tr (or anything that's still in pending), but
> > the kernel folks have been more aggressive. They found that tr's
> > pathological flushing was adding minutes to their build times.
>
> +  while ((n = read(0, toybuf, sizeof(toybuf)))) {
> +    if (!FLAG(d) && !FLAG(s)) {
> +      for (dst = 0; dst < n; dst++) toybuf[dst] = TT.map[toybuf[dst]];
>
> And when read returns -1 what happens?
>

enh@ realizes he thought he'd changed this to xread and sends a patch,
that's what :-)

(patch sent.)

it's a pity there's no /dev/eio or whatever to test these cases explicitly.


> Sounds like time for me to cleanup and promote this command. (Let's see,
> my todo entry for tr is: "TODO: -t (truncate) -a (ascii)" and I do NOT
> remember what -a is but I think I was going to make tr support utf8? Which
> required a redesign. Possibly it should be -U to support utf8 instead but
> everything ELSE supports utf8 by default, because planet. Hmmm...)
>

aye, but i don't think anyone makes you _pay_ for unicode (if you ignore
Plan9 and Inferno, both of which did, but you'd have to go a long way to
find anyone else who's ever used either of those).

but more than that i'm not sure anyone else _has_ done this (if you ignore
Plan9 and Inferno, both of which did, but you'd have to go a long way to
find anyone else who's ever used either of those):

~$ echo '동해 물과 백두산이' | tr '동' '東'
東해 欼과 氱摐산이
~$ echo '동해 물과 백두산이' | busybox tr '동' '東'
東해 欼과 氱摐산이
~$

(you'd expect to see '東해 물과 백두산이' if this actually worked: change from the
hangeul for "east" to the hanja for "east" but leave everything else alone.)

xxd confirms it's just screwing up bytes:

~$ echo '동해 물과 백두산이' |  xxd
00000000: eb8f 99ed 95b4 20eb acbc eab3 bc20 ebb0  ...... ...... ..
00000010: b1eb 9190 ec82 b0ec 9db4 0a              ...........
~$ echo '동해 물과 백두산이' | tr '동' '東' | xxd
00000000: e69d b1ed 95b4 20e6 acbc eab3 bc20 e6b0  ...... ...... ..
00000010: b1e6 9190 ec82 b0ec 9db4 0a              ...........

동 is 0xeb 0x8f 0x99 and the equivalent Chinese character 東 is 0xe6 0x9d
0xb1, and you can see from those hex dumps that what tr did was replace
0xeb with 0xe6, 0x8f with 0x9d, and 0x99 with 0xb1. this did the right
thing by accident for 동 but mangled other characters that contained any of
those bytes.

and although philosophically i'm usually on board with your "all times are
ISO, all text is UTF8", i'm really not sure it makes much sense to even
*try* to support this in tr. why? because i think it opens the i18n/l10n
can of worms again. if you think about non-binary uses of tr, they're often
stuff like "convert to all caps", but are we going to get that right for
Turkish/Azeri dotted/dotless 'i's, Greek final/non-final sigma, etc? are we
going to have tr's behavior then depend on your locale? are we going to
deal with combining characters too, or do i have to specify all the ways
you can write "ö" to get "Freude, schöner Götterfunken" right (because
without a hex dump, neither you nor i know whether those two 'ö's were
encoded the same way)?

amusingly, the Plan9 man page only gave one example of using tr(1), and it
was converting ASCII upper/lower. so i don't think they had any _use_ for
it either, they just wrote everything in terms of runes.

personally i'd s/characters/bytes/ in the docs and call it done. we can
"fix" it if/when anyone has an actual practical need for it.


> > Just removing the fflush() made tr significantly faster for my trivial
> > test, but still slow, with all the time going into stdio.
>
> Single byte writes suck no matter how you slice 'em.
>
> > Rewriting the
> > loop to modify toybuf in place and then do one write per read made most
> > of the difference, but special-casing the "neither -d nor -s" case made
> > a measurable difference too on a Xeon.
>
> Sigh. I have this patch in my tree, which I haven't applied yet because I
> don't have the regression test setup to see what it would slow down in the
> AOSP build:
>
> --- a/main.c
> +++ b/main.c
> @@ -103,7 +103,7 @@ void toy_singleinit(struct toy_list *which, char *argv[])
>    // that choose non-UTF-8 locales. macOS doesn't support C.UTF-8 though.
>if (!setlocale(LC_CTYPE, "C.UTF-8")) setlocale(LC_CTYPE, "");
>  }
> -setlinebuf(stdout);
> +setvbuf(stdout, 0, (which->flags & TOYFLAG_LINEBUF) ? _IOLBF : _IONBF, 0);
>}
>  }
>
> --- a/lib/toyflags.h
> +++ b/lib/toyflags.h
> @@ -32,6 +32,9 @@
>  // Suppress default --help processing
>  #define TOYFLAG_NOHELP   (1<<10)
>
> +// Line buffered stdout
> +#define TOYFLAG_LINEBUF  (1<<11)
> +
>  // Error code to return if argument parsing fails (default 1)
>  #define TOYFLAG_ARGFAIL(x) (x<<24)
>
> --- a/toys/posix/grep.c
> +++ b/toys/posix/grep.c
> @@ -10,9 +10,9 @@
>  * echo hello | grep -f   *
>
> -USE_GREP(NEWTOY(grep,
>
> 

[Toybox] [PATCH] tr: fix behavior if read fails.

2020-12-05 Thread enh via Toybox
From c4c3f78afe689728c374faca86bcc25ebbaba01a Mon Sep 17 00:00:00 2001
From: Elliott Hughes 
Date: Sat, 5 Dec 2020 10:29:54 -0800
Subject: [PATCH] tr: fix behavior if read fails.

---
 toys/pending/tr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/toys/pending/tr.c b/toys/pending/tr.c
index e68ae464..5c630ccc 100644
--- a/toys/pending/tr.c
+++ b/toys/pending/tr.c
@@ -212,7 +212,7 @@ static void print_map(char *set1, char *set2)
 {
   int n, src, dst, prev = -1;
 
-  while ((n = read(0, toybuf, sizeof(toybuf)))) {
+  while ((n = xread(0, toybuf, sizeof(toybuf)))) {
     if (!FLAG(d) && !FLAG(s)) {
       for (dst = 0; dst < n; dst++) toybuf[dst] = TT.map[toybuf[dst]];
     } else {
-- 
2.29.2.576.ga3fc446d84-goog

___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net


Re: [Toybox] [PATCH] tr: fix pathological flushing.

2020-12-05 Thread Rob Landley
On 12/5/20 8:55 AM, Josh Gao wrote:
> On Sat, Dec 5, 2020, 3:38 AM Rob Landley wrote:
> 
> (Also, line buffering sucks because it'll flush at the buffer size anyway
> so you're not guaranteed to get a full line of contiguous output. What it
> REALLY wants is nagle's algorithm for stdout but no libc ever bothered to
> IMPLEMENT it, possibly because of the runtime expense... Ahem. My point is
> commands should probably do sane output blocking on their own.)
> 
> 
> AFAIK, the only way this would work is if libc only ever does nonblocking
> writes to stdout,

I was thinking "it's been 1/10 of a second since pending data went into the
buffer and it hasn't gotten written out yet, perform the blocking write() now".
Which could be threads, or alarm() and have the signal handler do the flush
(which could go wrong in a half-dozen different ways and would have been way
easier to bake into the semantics in 1985 than 2020). But the expense of setting
and resetting signal handlers every time you start a write probably makes it a
net loss, because it's still extra system calls.

Mostly I was complaining that the semantics of the effect they _want_ is
available in the other context but not this one, and what they're actually doing
doesn't accomplish it. If it was easy to accomplish those semantics efficiently
in my own plumbing I'd have tried it already. :P

> which also means it would need to spawn a thread or use
> SIGIO to flush its buffer when stdout becomes available, plus modify the
> flags on STDOUT_FILENO to be O_NONBLOCK (which doesn't even work on regular
> files). I think people would be far more annoyed with this behavior than any
> potential gains would justify?

In my original message, "possibly because of the runtime expense" was a
load-bearing phrase. :)

> (io_uring might make things more interesting, though? You could eliminate
> libc's buffering entirely, and just memcpy and submit a write for every single
> fwrite, up to some buffer limit.)

If we had an output mechanism that worked like linux-vdso instead of syscalls,
we could do all sorts of fun stuff, but I don't really kernel much these days
unless it's directly job-related:

  http://lkml.iu.edu/hypermail/linux/kernel/2011.1/06282.html

They're no fun anymore.

The fundamental problem here is that single byte write() is an order of
magnitude slower than page size write(), which as far as I can tell is why FILE
* exists. (That and ungetc().) And any solution that involves another system
call isn't going to be an improvement. :(

But this has been true forever. At linuxworld expo Jon "maddog" Hall had a talk
about how back in the big iron days he sped up a reel-to-reel tape backup (and
made it fit on one tape instead of a dozen) by replacing single byte writes with
block writes. (In that case the tape spun down and spun back up, leaving blank
tape between each write and putting start and end sequences around each chunk of
data, which is why that was a pathological case. Nagle tries to avoid 1 byte per
ethernet frame which is why THAT is a pathological case. And here it's system
call overhead.)

The two hardest problems in computer science remain cache invalidation, naming
things, and off by one errors.

Rob


Re: [Toybox] [PATCH] tr: fix pathological flushing.

2020-12-05 Thread Josh Gao via Toybox
On Sat, Dec 5, 2020, 3:38 AM Rob Landley wrote:

> (Also, line buffering sucks because it'll flush at the buffer size anyway
> so you're not guaranteed to get a full line of contiguous output. What it
> REALLY wants is nagle's algorithm for stdout but no libc ever bothered to
> IMPLEMENT it, possibly because of the runtime expense... Ahem. My point is
> commands should probably do sane output blocking on their own.)


AFAIK, the only way this would work is if libc only ever does nonblocking
writes to stdout, which also means it would need to spawn a thread or use
SIGIO to flush its buffer when stdout becomes available, plus modify the
flags on STDOUT_FILENO to be O_NONBLOCK (which doesn't even work on regular
files). I think people would be far more annoyed with this behavior than any
potential gains would justify?

(io_uring might make things more interesting, though? You could eliminate
libc's buffering entirely, and just memcpy and submit a write for every
single fwrite, up to some buffer limit.)


Re: [Toybox] [PATCH] tr: fix pathological flushing.

2020-12-05 Thread Rob Landley
On 12/4/20 1:58 PM, enh via Toybox wrote:
> The AOSP build doesn't use tr (or anything that's still in pending), but
> the kernel folks have been more aggressive. They found that tr's
> pathological flushing was adding minutes to their build times.

+  while ((n = read(0, toybuf, sizeof(toybuf)))) {
+    if (!FLAG(d) && !FLAG(s)) {
+      for (dst = 0; dst < n; dst++) toybuf[dst] = TT.map[toybuf[dst]];

And when read returns -1 what happens?

Sounds like time for me to cleanup and promote this command. (Let's see, my todo
entry for tr is: "TODO: -t (truncate) -a (ascii)" and I do NOT remember what -a
is but I think I was going to make tr support utf8? Which required a redesign.
Possibly it should be -U to support utf8 instead but everything ELSE supports
utf8 by default, because planet. Hmmm...)

> Just removing the fflush() made tr significantly faster for my trivial
> test, but still slow, with all the time going into stdio.

Single byte writes suck no matter how you slice 'em.

> Rewriting the
> loop to modify toybuf in place and then do one write per read made most
> of the difference, but special-casing the "neither -d nor -s" case made
> a measurable difference too on a Xeon.

Sigh. I have this patch in my tree, which I haven't applied yet because I don't
have the regression test setup to see what it would slow down in the AOSP build:

--- a/main.c
+++ b/main.c
@@ -103,7 +103,7 @@ void toy_singleinit(struct toy_list *which, char *argv[])
   // that choose non-UTF-8 locales. macOS doesn't support C.UTF-8 though.
   if (!setlocale(LC_CTYPE, "C.UTF-8")) setlocale(LC_CTYPE, "");
 }
-setlinebuf(stdout);
+setvbuf(stdout, 0, (which->flags & TOYFLAG_LINEBUF) ? _IOLBF : _IONBF, 0);
   }
 }

--- a/lib/toyflags.h
+++ b/lib/toyflags.h
@@ -32,6 +32,9 @@
 // Suppress default --help processing
 #define TOYFLAG_NOHELP   (1<<10)

+// Line buffered stdout
+#define TOYFLAG_LINEBUF  (1<<11)
+
 // Error code to return if argument parsing fails (default 1)
 #define TOYFLAG_ARGFAIL(x) (x<<24)

--- a/toys/posix/grep.c
+++ b/toys/posix/grep.c
@@ -10,9 +10,9 @@
 * echo hello | grep -f