Re: [Toybox] Impact of global struct size

2024-01-08 Thread enh via Toybox
On Fri, Jan 5, 2024 at 10:45 PM Rob Landley  wrote:
>
> On 1/2/24 16:58, Ray Gardner wrote:
> > On Mon, Jan 1, 2024 at 1:39 PM Rob Landley  wrote:
> >> ... [ a very long and detailed reply ] ...
> >
> > Rob, thank you for the "GIANT INFODUMP", and I mean that sincerely. It
> > took me a while to read it; it must have taken quite a while to write it.
>
> It did, but you asked. And posting it to the list means I can refer back to it,
> and/or more people can learn it so they don't have to ask me. :)
>
> You know how I say I document compulsively? Combine stream of consciousness
> infodump with Pascal's Apology:
>
> https://www.npr.org/sections/13.7/2014/02/03/270680304/this-could-have-been-shorter
>
> And you get documentation. Editing it DOWN, figuring out a non-duplicative
> sequence where I'm not assuming knowledge I haven't explained yet, and chopping
> it into bite-sized chunks, is the hard part.
>
> Blathering like this is easy. Turning it into a FAQ entry or something is hard.
>
> > A lot of info on kernel-level memory management, I think I got about 90%
> > of it but I'll have to look up some stuff (PLT, GOT, ...).
>
> Procedure Linkage Table and Global Offset Table. The first tracks where
> dynamically linked functions live, the second tracks where dynamically linked
> global variables live.

(s/dynamically linked/position-independent/. but, yes, dynamically
linked stuff is a common case.)

> Ok, take everything here with a grain of salt because I last had to know this in
> detail back around 2010 and I largely avoid dynamic linking when I can because
> it is really messy. I am PROBABLY getting this wrong, but off the top of my head:
>
> [Note: Elliott started another thread while I was traveling with this
> half-finished, and he can correct most of the stuff I get wrong. I'm also
> pointing you at where the kernel code lives, and other references.]
>
> When you exec() a file, Linux checks the executable bit (if it's not executable
> it won't even try, and the suid and sgid bits get handled here too), and then
> does some simple type identification on it, which involves waving it at the
> "binary format loaders" to see if any claim it. (This is a bit like filesystem
> probe functions during mount, only for file data instead of block device data.)
>
> $ ls linux/fs/binfmt*
> linux/fs/binfmt_elf.c        linux/fs/binfmt_flat.c
> linux/fs/binfmt_elf_fdpic.c  linux/fs/binfmt_misc.c
> linux/fs/binfmt_elf_test.c   linux/fs/binfmt_script.c
>
> (Sadly, these can all be kernel modules so you can DYNAMICALLY LOAD a BINARY
> FORMAT LOADER which is just wrong.)
>
> The main one that gets 90% of the use is binfmt_elf, the kernel's ELF executable
> loader. We'll come back to that.
>
> The "binfmt_script" one gets almost all of the rest of the use: it checks if the
> first two bytes of the file are #! and if so it re-runs the exec call with the
> /path/after/that as the new file argument, and inserting everything after the
> first space in that line as argv[1] with the remaining arguments (if any) bumped
> to argv[2] and friends. This is how shell scripts work, and the mechanism perl
> and python and so on inherited. It's also how you can use tinycc to run C as a
> scripting language with the first line being "#!/usr/bin/tinycc -run" which
> turns into "tinycc -run file.c" so it compiles, links, and executes it instead
> of writing it out to a file.
>
> And yes, it catches:
>
> $ echo \#\!$(readlink -f bang.sh) > bang.sh && chmod +x bang.sh && ./bang.sh
> bash: ./bang.sh: /home/landley/bang.sh: bad interpreter: Too many levels of
> symbolic links
>
> The elf_fdpic one is the nommu variant of elf, which REALLY SHOULD be a couple
> of if () statements in the same file but they did an ext2/ext3 thing and
> duplicated the file, but unlike deleting both of those and just having ext4
> handle all three variants of the same format in modern systems, the linux-kernel
> guys never went back and cleaned that up because linux-kernel development is
> almost completely ossified and bureaucratically paralyzed these days. Oh well.
>
> You can ignore binfmt_flat as obsolete. It was the nommu fork of binfmt_aout
> which was the old executable format before everybody switched to ELF in 1996.
> There was a binfmt_aout.c which got removed in kernel commit 987f20a9dcce in
> 2022. I wrote more but am making it a FOOTNOTE. (See footnote.)
>
> People mostly stopped writing new ones once binfmt_misc was invented, because
> that sucker's programmable. It's basically a binfmt_script that can be
> programmed (via /proc) to recognize arbitrary file formats and run arbitrary
> commands to handle them:
>
> https://docs.kernel.org/admin-guide/binfmt-misc.html
>
> If you've ever run an arm binary on x86 and it magically called qemu application
> emulation for you, that's because some init script set up a binfmt_misc
> association to do that.
>
> I have no idea what binfmt_elf_test is, it was introduced recently (commit
> 

Re: [Toybox] Impact of global struct size

2024-01-08 Thread enh via Toybox
On Fri, Jan 5, 2024 at 8:25 PM Rob Landley  wrote:
>
> On 1/5/24 19:14, enh wrote:
> >> I'm out of the habit of speaking at conferences (there was a pandemic), really I
> >> should just get on a regular local schedule of Posting Crap Videos To Youtube.
> >> NOT trying to polish them but just get them out regularly and then later string
> >> together playlists of the less bad ones. (I can blather much
> >> stream-of-consciousness! You think this is bad, you should meet me in person!
> >> Elliott was subject to this at a lunch once, and I was NOT sleep deprived, and
> >> on my best behavior for that.)
> >
> > (fwiw, it's less overwhelming in person where you're actually
> > interacting than it is finding the time to even read a 500-line email
> > response, let alone reply :-) )
>
> I used to teach community college courses to vent my enthusiasm, but I wandered
> away for several years and when I looked back into it the bureaucratic
> requirements had increased dramatically. (Not certification, just... paperwork.)
>
> > keep reading the test to see a couple of extra special cases:
> > https://android.googlesource.com/platform/bionic/+/main/tests/unistd_test.cpp#1128
>
> I was reading the actual kernel code to see what the limits are, which are
> enforced here:
>
> https://android.googlesource.com/platform/bionic/+/main/tests/unistd_test.cpp#1128
>
> Based on _STK_LIM from:
>
> https://github.com/torvalds/linux/blob/v6.6/include/uapi/linux/resource.h#L63
>
> And ARG_MAX at:
>
> https://github.com/torvalds/linux/blob/v6.6/include/uapi/linux/limits.h#L8
>
> So min is 131072 bytes and max is 6 megabytes.
>
> > basically, if RLIMIT_STACK is too big, you're capped at 128KiB. more
> > weirdly, if it's too _small_ you also get the maximum 128KiB.
>
> You're seeing 128k for too _big_? That's weird. Sounds like a bug?

isn't that https://elixir.bootlin.com/linux/v6.6.4/source/fs/exec.c#L853 ?
```
stack_expand = min(rlim_stack, stack_size + stack_expand);
```

> > (i don't think this affects your code, but the reason we touched this
> > recently is that it's not "32 pages" as we claimed before --- it's
> > actually 128KiB, which _happens to be_ 32 pages if your page size is
> > 4KiB: 
> > https://android.googlesource.com/platform/bionic/+/2da31cf7b0c6071f83244eb0c89f95395a48cb37%5E%21/#F0
> > )
>
> They hardwired 131072 into the ARG_MAX #define due to hysterical raisins, so it
> didn't move when page size does. The comment in exec.c gives pages as historical
> motivation, but that's not what the code _does_.
>
> *shrug* Arbitrary and historical.
>
> >> Basically I want to know what struct is at the end of the stack (a sequenced
> >> collection of structs and arrays are conceptually in an encapsulating struct),
> >> and where does "1/4 stack size" _start_ measuring from. (From the actual end, or
> >> does some of the data there "not count"? The debian xargs behavior implies it's
> >> _just_ measuring the strings, but if so I could feed it an argv[] of a couple
> >> million "" and blow the stack because each of those is 8 bytes of argv[] to
> >> point at 1 byte of NUL terminator, and restricting _that_ to 1/4 the stack would
> >> try to write off the end of it. I'm pretty sure somebody would have noticed by
> >> now...)
> >
> > wasn't our conclusion last time we talked about this "it's always
> > likely to wobble a bit, so as long as we go with the most conservative
> > assumption, that's what we want for this use case?".
>
> I was informed of "xargs --show-limits", and if I have to implement that I would
> like to do it _right_.

didn't we have this discussion before, where we worked out that no-one
does this right? (which is why we didn't implement it at the time?)

> Also, my argument with Linus about this was years ago now and things seem to
> have stabilized. Or at least git annotate fs/exec.c says the kernel hasn't
> changed how it calculates this since... commit 655c16a8ce9c1 in 2019 was peeling
> out the calculation into a separate function... commit c31dbb146dd4 was fixing a
> race condition (fetch stack limit once at the start of exec into a variable, in
> case it changes during processing)...
>
> Looks like commit da029c11e6b1 in 2017 was the last thing to touch this? 6 years
> ago. That changed the behavior on July 7, and I argued with Linus about it
> November 3:
>
> https://lkml.iu.edu/hypermail/linux/kernel/1711.0/02949.html
>
> Both the min and max limits the kernel enforces are complete ass-pulls (and
> "max" seems likely to move someday because find | xargs is already choked by it
> and I recently bought a raspberry-pi-alike with 8 gigs RAM), but we're about 6
> months from the 7 year time horizon: seems stable enough that I can change my
> code if/when they change again...
>
> Rob
___
Toybox mailing list
Toybox@lists.landley.net

Re: [Toybox] Impact of global struct size

2024-01-05 Thread Rob Landley
On 1/2/24 16:58, Ray Gardner wrote:
> On Mon, Jan 1, 2024 at 1:39 PM Rob Landley  wrote:
>> ... [ a very long and detailed reply ] ...
> 
> Rob, thank you for the "GIANT INFODUMP", and I mean that sincerely. It
> took me a while to read it; it must have taken quite a while to write it.

It did, but you asked. And posting it to the list means I can refer back to it,
and/or more people can learn it so they don't have to ask me. :)

You know how I say I document compulsively? Combine stream of consciousness
infodump with Pascal's Apology:

https://www.npr.org/sections/13.7/2014/02/03/270680304/this-could-have-been-shorter

And you get documentation. Editing it DOWN, figuring out a non-duplicative
sequence where I'm not assuming knowledge I haven't explained yet, and chopping
it into bite-sized chunks, is the hard part.

Blathering like this is easy. Turning it into a FAQ entry or something is hard.

> A lot of info on kernel-level memory management, I think I got about 90%
> of it but I'll have to look up some stuff (PLT, GOT, ...).

Procedure Linkage Table and Global Offset Table. The first tracks where
dynamically linked functions live, the second tracks where dynamically linked
global variables live.

Ok, take everything here with a grain of salt because I last had to know this in
detail back around 2010 and I largely avoid dynamic linking when I can because
it is really messy. I am PROBABLY getting this wrong, but off the top of my head:

[Note: Elliott started another thread while I was traveling with this
half-finished, and he can correct most of the stuff I get wrong. I'm also
pointing you at where the kernel code lives, and other references.]

When you exec() a file, Linux checks the executable bit (if it's not executable
it won't even try, and the suid and sgid bits get handled here too), and then
does some simple type identification on it, which involves waving it at the
"binary format loaders" to see if any claim it. (This is a bit like filesystem
probe functions during mount, only for file data instead of block device data.)

$ ls linux/fs/binfmt*
linux/fs/binfmt_elf.c        linux/fs/binfmt_flat.c
linux/fs/binfmt_elf_fdpic.c  linux/fs/binfmt_misc.c
linux/fs/binfmt_elf_test.c   linux/fs/binfmt_script.c

(Sadly, these can all be kernel modules so you can DYNAMICALLY LOAD a BINARY
FORMAT LOADER which is just wrong.)

The main one that gets 90% of the use is binfmt_elf, the kernel's ELF executable
loader. We'll come back to that.

The "binfmt_script" one gets almost all of the rest of the use: it checks if the
first two bytes of the file are #! and if so it re-runs the exec call with the
/path/after/that as the new file argument, and inserting everything after the
first space in that line as argv[1] with the remaining arguments (if any) bumped
to argv[2] and friends. This is how shell scripts work, and the mechanism perl
and python and so on inherited. It's also how you can use tinycc to run C as a
scripting language with the first line being "#!/usr/bin/tinycc -run" which
turns into "tinycc -run file.c" so it compiles, links, and executes it instead
of writing it out to a file.
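
As a rough sketch of that #! handling (not the kernel's code: split_shebang() is a made-up helper, and trimming trailing blanks off the argument is an assumption on my part), the split into "interpreter path" plus "one optional argument" looks something like this in C:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper, not kernel code: split a "#!" line into the
// interpreter path and the single optional argument that follows it.
// Returns 1 on success, 0 if the line has no "#!", names no interpreter,
// or a piece doesn't fit in the caller's buffer.
int split_shebang(const char *line, char *interp, size_t ilen,
                  char *arg, size_t alen)
{
  size_t n;

  assert(line && interp && arg && ilen && alen);
  *interp = *arg = 0;
  if (line[0] != '#' || line[1] != '!') return 0;
  line += 2;
  line += strspn(line, " \t");    // blanks between "#!" and the path
  n = strcspn(line, " \t\n");     // the path ends at the first blank
  if (!n || n >= ilen) return 0;
  memcpy(interp, line, n);
  interp[n] = 0;
  line += n;
  line += strspn(line, " \t");
  n = strcspn(line, "\n");        // the rest of the line is ONE argument
  while (n && (line[n-1] == ' ' || line[n-1] == '\t')) n--;
  if (n >= alen) return 0;
  memcpy(arg, line, n);
  arg[n] = 0;

  return 1;
}
```

Feeding it "#!/usr/bin/tinycc -run" gives "/usr/bin/tinycc" and "-run", and the script's own path would then go in as the next argument. Note that "#!/usr/bin/env python3 -u" comes out as the single argument "python3 -u", which is the classic surprise with this mechanism.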

And yes, it catches:

$ echo \#\!$(readlink -f bang.sh) > bang.sh && chmod +x bang.sh && ./bang.sh
bash: ./bang.sh: /home/landley/bang.sh: bad interpreter: Too many levels of
symbolic links

The elf_fdpic one is the nommu variant of elf, which REALLY SHOULD be a couple
of if () statements in the same file but they did an ext2/ext3 thing and
duplicated the file, but unlike deleting both of those and just having ext4
handle all three variants of the same format in modern systems, the linux-kernel
guys never went back and cleaned that up because linux-kernel development is
almost completely ossified and bureaucratically paralyzed these days. Oh well.

You can ignore binfmt_flat as obsolete. It was the nommu fork of binfmt_aout
which was the old executable format before everybody switched to ELF in 1996.
There was a binfmt_aout.c which got removed in kernel commit 987f20a9dcce in
2022. I wrote more but am making it a FOOTNOTE. (See footnote.)

People mostly stopped writing new ones once binfmt_misc was invented, because
that sucker's programmable. It's basically a binfmt_script that can be
programmed (via /proc) to recognize arbitrary file formats and run arbitrary
commands to handle them:

https://docs.kernel.org/admin-guide/binfmt-misc.html

If you've ever run an arm binary on x86 and it magically called qemu application
emulation for you, that's because some init script set up a binfmt_misc
association to do that.
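
For a feel of what "programmed" means: the documentation linked above describes a registration string of the form :name:type:offset:magic:mask:interpreter:flags. A minimal sketch that formats the extension-matching ("E") variant follows. binfmt_ext_rule() is a hypothetical helper, and it only builds the string, it doesn't register anything:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

// Hypothetical helper: format an extension-match ("E") rule in the
// :name:type:offset:magic:mask:interpreter:flags layout. Offset, mask and
// flags are left empty. Returns the string length, or -1 if the result
// doesn't fit or a field contains the ':' separator.
int binfmt_ext_rule(char *buf, size_t len, const char *name,
                    const char *ext, const char *interp)
{
  int n;

  assert(buf && name && ext && interp);
  if (strchr(name, ':') || strchr(ext, ':') || strchr(interp, ':')) return -1;
  n = snprintf(buf, len, ":%s:E::%s::%s:", name, ext, interp);

  return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

Writing the resulting string to /proc/sys/fs/binfmt_misc/register (as root, with binfmt_misc mounted) is what actually installs the association. The qemu case uses the magic-number ("M") type instead, with the ELF header bytes and a mask.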

I have no idea what binfmt_elf_test is, it was introduced recently (commit
9e1a3ce0a952 in 2022) and from the commit message it's the Linux Test Project
people crapping unnecessary complexity into the mainline kernel for no obvious
reason.
It's the kernel equivalent of checking in debug printfs. Make an EFFORT to
ignore that one, it's NOT REAL.

Ok, so back to the ELF loader. We've more or less covered static linking
earlier, where 

Re: [Toybox] Impact of global struct size

2024-01-05 Thread Rob Landley
On 1/5/24 19:14, enh wrote:
>> I'm out of the habit of speaking at conferences (there was a pandemic), really I
>> should just get on a regular local schedule of Posting Crap Videos To Youtube.
>> NOT trying to polish them but just get them out regularly and then later string
>> together playlists of the less bad ones. (I can blather much
>> stream-of-consciousness! You think this is bad, you should meet me in person!
>> Elliott was subject to this at a lunch once, and I was NOT sleep deprived, and
>> on my best behavior for that.)
> 
> (fwiw, it's less overwhelming in person where you're actually
> interacting than it is finding the time to even read a 500-line email
> response, let alone reply :-) )

I used to teach community college courses to vent my enthusiasm, but I wandered
away for several years and when I looked back into it the bureaucratic
requirements had increased dramatically. (Not certification, just... paperwork.)

> keep reading the test to see a couple of extra special cases:
> https://android.googlesource.com/platform/bionic/+/main/tests/unistd_test.cpp#1128

I was reading the actual kernel code to see what the limits are, which are
enforced here:

https://android.googlesource.com/platform/bionic/+/main/tests/unistd_test.cpp#1128

Based on _STK_LIM from:

https://github.com/torvalds/linux/blob/v6.6/include/uapi/linux/resource.h#L63

And ARG_MAX at:

https://github.com/torvalds/linux/blob/v6.6/include/uapi/linux/limits.h#L8

So min is 131072 bytes and max is 6 megabytes.
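
A minimal sketch of that clamp, assuming the kernel takes 1/4 of the soft stack limit and bounds it by those two values (arg_budget() and the SKETCH_ names are made up for illustration, with the numbers copied from the two headers above):

```c
#include <assert.h>

// Values copied from the two headers linked above (v6.6). The names are
// prefixed so they can't clash with the real macros.
#define SKETCH_ARG_MAX 131072UL            // ARG_MAX in limits.h
#define SKETCH_STK_LIM (8UL*1024*1024)     // _STK_LIM in resource.h

// Hypothetical helper: argument+environment budget for a given soft
// RLIMIT_STACK, as 1/4 of the stack clamped between the floor and ceiling.
unsigned long arg_budget(unsigned long stack_soft_limit)
{
  unsigned long limit = stack_soft_limit/4;
  unsigned long ceiling = SKETCH_STK_LIM/4*3;   // 3/4 of 8 MiB = 6 MiB

  if (limit > ceiling) limit = ceiling;
  if (limit < SKETCH_ARG_MAX) limit = SKETCH_ARG_MAX;   // 128 KiB floor
  assert(limit >= SKETCH_ARG_MAX && limit <= ceiling);

  return limit;
}
```

With the usual 8 MiB default stack this gives 2 MiB, an unlimited stack hits the 6 MiB ceiling, and anything under 512 KiB of stack lands on the 131072 floor.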

> basically, if RLIMIT_STACK is too big, you're capped at 128KiB. more
> weirdly, if it's too _small_ you also get the maximum 128KiB.

You're seeing 128k for too _big_? That's weird. Sounds like a bug?

> (i don't think this affects your code, but the reason we touched this
> recently is that it's not "32 pages" as we claimed before --- it's
> actually 128KiB, which _happens to be_ 32 pages if your page size is
> 4KiB: 
> https://android.googlesource.com/platform/bionic/+/2da31cf7b0c6071f83244eb0c89f95395a48cb37%5E%21/#F0
> )

They hardwired 131072 into the ARG_MAX #define due to hysterical raisins, so it
didn't move when page size does. The comment in exec.c gives pages as historical
motivation, but that's not what the code _does_.

*shrug* Arbitrary and historical.

>> Basically I want to know what struct is at the end of the stack (a sequenced
>> collection of structs and arrays are conceptually in an encapsulating struct),
>> and where does "1/4 stack size" _start_ measuring from. (From the actual end, or
>> does some of the data there "not count"? The debian xargs behavior implies it's
>> _just_ measuring the strings, but if so I could feed it an argv[] of a couple
>> million "" and blow the stack because each of those is 8 bytes of argv[] to
>> point at 1 byte of NUL terminator, and restricting _that_ to 1/4 the stack would
>> try to write off the end of it. I'm pretty sure somebody would have noticed by
>> now...)
> 
> wasn't our conclusion last time we talked about this "it's always
> likely to wobble a bit, so as long as we go with the most conservative
> assumption, that's what we want for this use case?".

I was informed of "xargs --show-limits", and if I have to implement that I would
like to do it _right_.

Also, my argument with Linus about this was years ago now and things seem to
have stabilized. Or at least git annotate fs/exec.c says the kernel hasn't
changed how it calculates this since... commit 655c16a8ce9c1 in 2019 was peeling
out the calculation into a separate function... commit c31dbb146dd4 was fixing a
race condition (fetch stack limit once at the start of exec into a variable, in
case it changes during processing)...

Looks like commit da029c11e6b1 in 2017 was the last thing to touch this? 6 years
ago. That changed the behavior on July 7, and I argued with Linus about it
November 3:

https://lkml.iu.edu/hypermail/linux/kernel/1711.0/02949.html

Both the min and max limits the kernel enforces are complete ass-pulls (and
"max" seems likely to move someday because find | xargs is already choked by it
and I recently bought a raspberry-pi-alike with 8 gigs RAM), but we're about 6
months from the 7 year time horizon: seems stable enough that I can change my
code if/when they change again...

Rob
___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net


Re: [Toybox] Impact of global struct size

2024-01-05 Thread enh via Toybox
On Thu, Jan 4, 2024 at 6:21 PM Rob Landley  wrote:
>
> On 1/4/24 18:37, enh wrote:
> > On Thu, Jan 4, 2024 at 10:05 AM Rob Landley  wrote:
> >>
> >> On 1/3/24 12:19, Mouse wrote:
> >> >> (The line between PIE and dynamic linking confuses even me.  How does
> >> >> static PIE relocate itself?
> >> >
> >> > It may not.  It could get relocated by in-kernel ASLR or the like.
> >> > Also, I think PIE isn't relevant, or certainly isn't _as_ relevant, to
> >> > the final executable; my impression is that it's more important for
> >> > library code, so it doesn't need fixups.  These are less important for
> >> > static executables, since the fixups there happen once, at link time,
> >> > whereas for a .so the fixups happen at runtime and reduce the
> >> > text-segment sharing that is one of the benefits of shared objects.
> >>
> >> I want https://www.muppetlabs.com/~breadbox/software/tiny/teensy.html but a
> >> walkthrough for the kernel's ELF loader. (I've had to walk through it MYSELF
> >> several times, but I didn't do writeups afterwards so forgot it all.)
> >
> > (yeah, and the one i've done for that and for the libc side of things
> > were both just google-internal talks, so there's no record of them
> > anywhere :-( )
>
> I've stopped going to conferences that don't record and post the talks.
>
> Then there's a meta-problem of INDEXING all this information. Which is what I
> tried to tackle when I got the Linux Foundation documentation fellowship in
> 2007, but...
>
> https://landley.net/notes-2007.html#15-11-2007
>
> (Their Problem was Jon Rogers' old chestnut, "A goal is not a plan." I basically
> finally convinced them "this is what actually needs to be done" and they went
> "Huh, yeah. You're right. We're not interested in funding that." They wanted an
> author and NEEDED a librarian.)
>
> It doesn't matter if documentation exists that nobody can FIND. I'm weird in
> that I spent a whole project tracking down
> https://landley.net/notes-2007.html#13-10-2007 and
> https://landley.net/notes-2007.html#29-09-2007 and
> https://landley.net/notes-2007.html#07-09-2007 and
> https://landley.net/notes-2007.html#14-06-2007 and actually READ through the
> backlog of kernel-traffic and https://lwn.net/Kernel/Index/ and the linux
> journal articles back when the web page had an index of them (which I could
> probably fish out of archive.org if I tried...) I collected zillions of links at
> https://landley.net/kdocs/ and many were links to other indexes! (They NEST!)
> But it's all bit-rotted. I haven't even set up a man7.org replacement web page
> builder, and that's a three day weekend's work, tops.
>
> I'm out of the habit of speaking at conferences (there was a pandemic), really I
> should just get on a regular local schedule of Posting Crap Videos To Youtube.
> NOT trying to polish them but just get them out regularly and then later string
> together playlists of the less bad ones. (I can blather much
> stream-of-consciousness! You think this is bad, you should meet me in person!
> Elliott was subject to this at a lunch once, and I was NOT sleep deprived, and
> on my best behavior for that.)

(fwiw, it's less overwhelming in person where you're actually
interacting than it is finding the time to even read a 500-line email
response, let alone reply :-) )

> https://web.archive.org/web/20130123001143/http://www.homeonthestrange.com/view.php?ID=28
>
> (Except... not Youtube. They've gone septic. And setting up peertube is one of
> those blocking todo items.)
>
> > i've been meaning to tell you, apropos something you said on your blog
> > about ARG_MAX (for xargs?), that the kernel changed how that works
> > recently... see
> > https://android.googlesource.com/platform/bionic/+/main/tests/unistd_test.cpp#1128
> > for more detail and links.
>
> Define "recently"? 2.6.23 was 2007.
>
> Assuming I haven't missed something, here's from my giant dirty tree:
>
> --- a/lib/env.c
> +++ b/lib/env.c
> @@ -8,14 +8,20 @@ extern char **environ;
>  // Returns the number of bytes taken by the environment variables. For use
>  // when calculating the maximum bytes of environment+argument data that can
>  // be passed to exec for find(1) and xargs(1).
> -long environ_bytes(void)
> +long child_env_free(char **argv)
>  {
> -  long bytes = sizeof(char *);
> +  struct rlimit lim;
> +  long bytes = 2*sizeof(char *); // NULL array terminators for argv and envp
>    char **ev;
>
> -  for (ev = environ; *ev; ev++) bytes += sizeof(char *) + strlen(*ev) + 1;
> +  // Since 2.6.25, Linux's env limit has been 1/4 stack, with 32 page minimum.
> +  // sysconf(_SC_ARG_MAX) is unreliable (compile time value, not probed)
>
> -  return bytes;
> +  getrlimit(RLIMIT_STACK, &lim);
> +  if (argv) for (ev = argv; *ev; ev++) bytes += sizeof(char *)+strlen(*ev)+1;
> +  for (ev = environ; *ev; ev++) bytes += sizeof(char *)+strlen(*ev)+1;
> +
> +  return (lim.rlim_cur/4)-bytes;
>  }

keep reading the test to see a couple of extra special 

Re: [Toybox] Impact of global struct size

2024-01-05 Thread enh via Toybox
On Thu, Jan 4, 2024 at 5:07 PM Rob Landley  wrote:
>
> On 1/4/24 18:30, enh wrote:
> >> Between the two of them you can do things like check the current timestamp
> >> without a system call. What they actually provide varies by OS (and then your
> >> libc has to be taught to use each new capability out of there instead of making
> >> the syscalls).
> >>
> >> "cat /proc/self/maps" and they're the last two entries if present.
> >
> > (not necessarily. aslr applies to them too.)
>
> I thought that was in order of map creation, not the order they occurred in the
> address space?

afaik only the dynamic linker does the former. the kernel does the
latter (though iirc it's within a window of where the previous one
went, so it's not as random as you might imagine if someone just
describes the idea to you and you don't look at the implementation!).

(iirc there are differences between architectures here too. so you
might be right for x86? but certainly on arm64, literally the first
process i just picked had vmas after the vvars/vdso ones.)
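
a quick way to check on any given machine is to scan the maps file, which lists mappings in ascending address order. rough sketch in C (maps_after_vdso() is a made-up helper, and it assumes no line is longer than the buffer):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

// Hypothetical helper: count the lines in a /proc/<pid>/maps style stream
// that come after the last "[vdso]" or "[vvar]" entry. Returns -1 if
// neither mapping shows up. Assumes every line fits in the buffer.
int maps_after_vdso(FILE *fp)
{
  char line[8192];
  int after = -1;

  assert(fp);
  while (fgets(line, sizeof(line), fp)) {
    if (strstr(line, "[vdso]") || strstr(line, "[vvar]")) after = 0;
    else if (after >= 0) after++;
  }

  return after;
}
```

call it on fopen("/proc/self/maps", "r"): 0 means vdso/vvar really were the last entries, anything larger means other mappings sit above them in the address space.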

> > funnily enough (as you can see from that link), argc is there too, so
> > you don't have to count the entries in argv. (and although a null
> > argv[0] is no longer allowed, that was allowed by linux until fairly
> > recently.)
>
> I really need to check in a lot of the dirty changes in my tree:
>
> diff --git a/main.c b/main.c
> index 3d9f612e..190e65cd 100644
> --- a/main.c
> +++ b/main.c
> @@ -279,6 +280,7 @@ void toybox_main(void)
>  int main(int argc, char *argv[])
>  {
>    // don't segfault if our environment is crazy
> +  // TODO mooted by kernel commit dcd46d897adb7 5.17 kernel Jan 2022
>    if (!*argv) return 127;
>
>    // Snapshot stack location so we can detect recursion depth later.
>
> It's the tabsplosion problem in code form...
>
> Rob


Re: [Toybox] Impact of global struct size

2024-01-04 Thread Rob Landley
On 1/4/24 18:37, enh wrote:
> On Thu, Jan 4, 2024 at 10:05 AM Rob Landley  wrote:
>>
>> On 1/3/24 12:19, Mouse wrote:
>> >> (The line between PIE and dynamic linking confuses even me.  How does
>> >> static PIE relocate itself?
>> >
>> > It may not.  It could get relocated by in-kernel ASLR or the like.
>> > Also, I think PIE isn't relevant, or certainly isn't _as_ relevant, to
>> > the final executable; my impression is that it's more important for
>> > library code, so it doesn't need fixups.  These are less important for
>> > static executables, since the fixups there happen once, at link time,
>> > whereas for a .so the fixups happen at runtime and reduce the
>> > text-segment sharing that is one of the benefits of shared objects.
>>
>> I want https://www.muppetlabs.com/~breadbox/software/tiny/teensy.html but a
>> walkthrough for the kernel's ELF loader. (I've had to walk through it MYSELF
>> several times, but I didn't do writeups afterwards so forgot it all.)
> 
> (yeah, and the one i've done for that and for the libc side of things
> were both just google-internal talks, so there's no record of them
> anywhere :-( )

I've stopped going to conferences that don't record and post the talks.

Then there's a meta-problem of INDEXING all this information. Which is what I
tried to tackle when I got the Linux Foundation documentation fellowship in
2007, but...

https://landley.net/notes-2007.html#15-11-2007

(Their Problem was Jon Rogers' old chestnut, "A goal is not a plan." I basically
finally convinced them "this is what actually needs to be done" and they went
"Huh, yeah. You're right. We're not interested in funding that." They wanted an
author and NEEDED a librarian.)

It doesn't matter if documentation exists that nobody can FIND. I'm weird in
that I spent a whole project tracking down
https://landley.net/notes-2007.html#13-10-2007 and
https://landley.net/notes-2007.html#29-09-2007 and
https://landley.net/notes-2007.html#07-09-2007 and
https://landley.net/notes-2007.html#14-06-2007 and actually READ through the
backlog of kernel-traffic and https://lwn.net/Kernel/Index/ and the linux
journal articles back when the web page had an index of them (which I could
probably fish out of archive.org if I tried...) I collected zillions of links at
https://landley.net/kdocs/ and many were links to other indexes! (They NEST!)
But it's all bit-rotted. I haven't even set up a man7.org replacement web page
builder, and that's a three day weekend's work, tops.

I'm out of the habit of speaking at conferences (there was a pandemic), really I
should just get on a regular local schedule of Posting Crap Videos To Youtube.
NOT trying to polish them but just get them out regularly and then later string
together playlists of the less bad ones. (I can blather much
stream-of-consciousness! You think this is bad, you should meet me in person!
Elliott was subject to this at a lunch once, and I was NOT sleep deprived, and
on my best behavior for that.)

https://web.archive.org/web/20130123001143/http://www.homeonthestrange.com/view.php?ID=28

(Except... not Youtube. They've gone septic. And setting up peertube is one of
those blocking todo items.)

> i've been meaning to tell you, apropos something you said on your blog
> about ARG_MAX (for xargs?), that the kernel changed how that works
> recently... see
> https://android.googlesource.com/platform/bionic/+/main/tests/unistd_test.cpp#1128
> for more detail and links.

Define "recently"? 2.6.23 was 2007.

Assuming I haven't missed something, here's from my giant dirty tree:

--- a/lib/env.c
+++ b/lib/env.c
@@ -8,14 +8,20 @@ extern char **environ;
 // Returns the number of bytes taken by the environment variables. For use
 // when calculating the maximum bytes of environment+argument data that can
 // be passed to exec for find(1) and xargs(1).
-long environ_bytes(void)
+long child_env_free(char **argv)
 {
-  long bytes = sizeof(char *);
+  struct rlimit lim;
+  long bytes = 2*sizeof(char *); // NULL array terminators for argv and envp
   char **ev;

-  for (ev = environ; *ev; ev++) bytes += sizeof(char *) + strlen(*ev) + 1;
+  // Since 2.6.25, Linux's env limit has been 1/4 stack, with 32 page minimum.
+  // sysconf(_SC_ARG_MAX) is unreliable (compile time value, not probed)

-  return bytes;
+  getrlimit(RLIMIT_STACK, &lim);
+  if (argv) for (ev = argv; *ev; ev++) bytes += sizeof(char *)+strlen(*ev)+1;
+  for (ev = environ; *ev; ev++) bytes += sizeof(char *)+strlen(*ev)+1;
+
+  return (lim.rlim_cur/4)-bytes;
 }
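
Pulled out of the diff, the byte counting on its own looks like the sketch below. It follows the patch's assumption that each entry costs one pointer plus the string and its NUL, and that the two NULL terminators count too. entry_bytes() and exec_bytes() are names invented for this example, not functions in lib/env.c:

```c
#include <assert.h>
#include <string.h>

// Hypothetical helper: bytes the entries of one NULL-terminated string array
// take, counting one pointer plus the string and its NUL per entry.
long entry_bytes(char **list)
{
  long bytes = 0;

  assert(list);
  for (; *list; list++) bytes += sizeof(char *) + strlen(*list) + 1;

  return bytes;
}

// Total for an exec: both arrays plus their two NULL terminators, matching
// the 2*sizeof(char *) starting value in the patch. argv may be NULL.
long exec_bytes(char **argv, char **envp)
{
  long bytes = 2*sizeof(char *) + entry_bytes(envp);

  if (argv) bytes += entry_bytes(argv);

  return bytes;
}
```

The patch then subtracts this total from lim.rlim_cur/4. An array of empty strings shows why the pointers matter: on a 64-bit system each "" costs 9 bytes by this count, 8 of which are the pointer.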

And my hangup on that was probably the same November 24 entry you're replying
to, where I was trying to figure out where DOES the argv[] and envp[] pointer
array live and does it come out of the same budget, and does anything ELSE on
the stack come out of that budget or is it the magic 2 pages in the start that
everything gets as "kernel stack"? Trying to printf("%p") the pointers wasn't as
enlightening as I'd hoped (not remotely adjacent), which led me to reading the
kernel 

Re: [Toybox] Impact of global struct size

2024-01-04 Thread Rob Landley
On 1/4/24 18:30, enh wrote:
>> Between the two of them you can do things like check the current timestamp
>> without a system call. What they actually provide varies by OS (and then your
>> libc has to be taught to use each new capability out of there instead of making
>> the syscalls).
>>
>> "cat /proc/self/maps" and they're the last two entries if present.
> 
> (not necessarily. aslr applies to them too.)

I thought that was in order of map creation, not the order they occurred in the
address space?

> funnily enough (as you can see from that link), argc is there too, so
> you don't have to count the entries in argv. (and although a null
> argv[0] is no longer allowed, that was allowed by linux until fairly
> recently.)

I really need to check in a lot of the dirty changes in my tree:

diff --git a/main.c b/main.c
index 3d9f612e..190e65cd 100644
--- a/main.c
+++ b/main.c
@@ -279,6 +280,7 @@ void toybox_main(void)
 int main(int argc, char *argv[])
 {
   // don't segfault if our environment is crazy
+  // TODO mooted by kernel commit dcd46d897adb7 5.17 kernel Jan 2022
   if (!*argv) return 127;

   // Snapshot stack location so we can detect recursion depth later.

It's the tabsplosion problem in code form...
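
A sketch of the same guard as a standalone helper, for readers who want to see it outside the diff. The name progname_or_null() is hypothetical and this is not toybox code; it only shows why argv[0] has to be checked before anything dereferences it.

```c
#include <stddef.h>
#include <string.h>

// Return the last path component of argv[0], or NULL when the caller handed
// us an empty argv (which execve() allowed before the kernel change that the
// TODO comment in the diff refers to).
const char *progname_or_null(char **argv)
{
  const char *slash;

  if (!argv || !*argv) return NULL;
  slash = strrchr(*argv, '/');

  return slash ? slash+1 : *argv;
}
```

A caller would treat NULL the way main() above does and bail out with an error status instead of looking up a command name.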

Rob
___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net


Re: [Toybox] Impact of global struct size

2024-01-04 Thread enh via Toybox
On Thu, Jan 4, 2024 at 10:05 AM Rob Landley  wrote:
>
> On 1/3/24 12:19, Mouse wrote:
> >> (The line between PIE and dynamic linking confuses even me.  How does
> >> static PIE relocate itself?
> >
> > It may not.  It could get relocated by in-kernel ASLR or the like.
> > Also, I think PIE isn't relevant, or certainly isn't _as_ relevant, to
> > the final executable; my impression is that it's more important for
> > library code, so it doesn't need fixups.  These are less important for
> > static executables, since the fixups there happen once, at link time,
> > whereas for a .so the fixups happen at runtime and reduce the
> > text-segment sharing that is one of the benefits of shared objects.
>
> I want https://www.muppetlabs.com/~breadbox/software/tiny/teensy.html but a
> walkthrough for the kernel's ELF loader. (I've had to walk through it MYSELF
> several times, but I didn't do writeups afterwards so forgot it all.)

(yeah, and the one i've done for that and for the libc side of things
were both just google-internal talks, so there's no record of them
anywhere :-( )

i've been meaning to tell you, apropos something you said on your blog
about ARG_MAX (for xargs?), that the kernel changed how that works
recently... see
https://android.googlesource.com/platform/bionic/+/main/tests/unistd_test.cpp#1128
for more detail and links.

> I suppose I should start by reading his dynamic version:
>
> https://www.muppetlabs.com/~breadbox/software/tiny/somewhat.html
>
> >> Luckily X11 has "detach and restart" plumbing that lets it reopen a
> >> process's network pipe without killing the window or the process,
> >
> > ...?  When did it grow that, and where can I find out more about it?
>
> Um... A) Before Scale 2011, B) ask Kir Kolyshkin? He said it was something old.
> (I think the program can just detect that the connection closed and dial out to
> the server again, opening a new window and repopulating it? It's just most
> programs don't bother.)
>
> What OpenVZ was doing was
>
> A) tell the container to create a giant multi-process coredump file that had
> every process in the container in one big file (but don't STOP anything, just
> checkpoint the live running stuff racily).
>
> B) rsync the filesystem and coredump over to the new machine.
>
> C) Suspend the container (all processes) and re-write the big coredump file.
>
> D) rsync everything AGAIN (fast because not much changed)
>
> E) do TCP/IP connection hijacking so the new machine inherits the old open
> connections (you don't have to predict sequence numbers the other side sends
> you, don't forget to broadcast an ARP update so the packets go to the new
> ethernet address):
>
> https://www.idc-online.com/technical_references/pdfs/data_communications/TCP_Sequence_Prediction_Attack.pdf
>
> F) Resume the new container in the new filesystem.
>
> He had an animated X11 window (screensaver) that paused for 1/3 of a second
> while migrating from machine to machine. His demo involved plugging in a cat 5
> to the new machine, migrating to it, and unplugging the old one's network cable.
>
> This was in 2011. I assume vanilla Linux has caught up by now, but there was
> quite the laundry list at the time...
>
> Rob


Re: [Toybox] Impact of global struct size

2024-01-04 Thread Mouse
>>> (The line between PIE and dynamic linking confuses even me.  How
>>> does static PIE relocate itself?
>> It may not.  It could get relocated by in-kernel ASLR or the like.

(This, incidentally, was an OS-agnostic remark.  I suspect doing ASLR
for statically linked executables is relatively rare.)

> I want https://www.muppetlabs.com/~breadbox/software/tiny/teensy.html
> but a walkthrough for the kernel's ELF loader.

I suspect you may have to read the source for the kernel you care
about.  That's what I did when I was implementing a userland-only
emulator and needed to mimic what the kernel I cared about did when
loading an executable.

>>> Luckily X11 has "detach and restart" plumbing that lets it reopen a
>>> process's network pipe without killing the window or the process,
>> ...?  When did it grow that, and where can I find out more about it?
> Um... A) Before Scale 2011, B) ask Kir Kolyshkin?  He said it was
> something old.  (I think the program can just detect that the
> connection closed and dial out to the server again, opening a new
> window and repopulating it?  It's just most programs don't bother.)

Yes, it can, but that's not "without killing the window"; that's
recreating it after it gets killed.  (You can set the connection's
close-down mode so it doesn't get killed, but then it will stick around
until someone explicitly kills it even if the client never reconnects.
Depending on your use case, that might be useful or might be an even
bigger headache.  And, strictly speaking, I should add "or until server
reset")

Also, depending on your X interface software (Xlib etc), it can be
difficult to handle connection death without killing the process
holding your end of the connection.  This can be worked around by using
helper processes, but it's ugly.

> What OpenVZ was doing was [...]

Okay, I guess I wasn't missing as much as I thought.  (That's not so
much reopening the connection as never closing it in the first place.)

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: [Toybox] Impact of global struct size

2024-01-04 Thread enh via Toybox
On Wed, Jan 3, 2024 at 9:30 AM Rob Landley  wrote:
>
> I note that I've written over a hundred lines of rant in response to his
> previous email already. I should dig back through this and turn it into proper
> documentation at some point. (Especially since Elliott knows more of this stuff
> than I do so I'm likely to get corrected a lot here...)
>
> On 1/2/24 20:54, enh wrote:
> >> You can look at /proc/self/maps (and /proc/self/smaps, and
> >> /proc/self/smaps_rollup) to see them for a running process (replace "self"
> >> with any running PID, self is a symlink to your current PID). The six
> >> sections are:
> >>
> >>   text - the executable functions: mmap(MAP_PRIVATE, PROT_READ|PROT_EXEC)
> >>   rodata - const globals, string constants, etc: mmap(MAP_PRIVATE, PROT_READ)
> >>   data - writeable data initialized to nonzero: mmap(MAP_PRIVATE, PROT_WRITE)
> >>   bss - writeable data initialized to zero: mmap(MAP_ANON, PROT_WRITE)
> >>   stack - function call stack, also contains environment data
> >>   heap - backing store for malloc() and free()
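
A small sketch that puts one object in each of the six mappings listed above and prints its address, so the addresses can be matched against the ranges in /proc/self/maps. All names here are invented for the example. Where a small malloc() lands is a libc decision, so the "heap" line is an assumption about typical behavior rather than a guarantee.

```c
#include <stdio.h>
#include <stdlib.h>

const char ro_msg[] = "const global";  // rodata
int data_val = 42;                     // data: nonzero initializer
int bss_buf[1024];                     // bss: zero initialized, no file space

// Print one address per mapping for comparison with /proc/self/maps.
void show_sections(FILE *out)
{
  int on_stack = 0;
  void *on_heap = malloc(16);

  fprintf(out, "text   %p\n", (void *)show_sections);
  fprintf(out, "rodata %p\n", (void *)ro_msg);
  fprintf(out, "data   %p\n", (void *)&data_val);
  fprintf(out, "bss    %p\n", (void *)bss_buf);
  fprintf(out, "stack  %p\n", (void *)&on_stack);
  fprintf(out, "heap   %p\n", on_heap);
  free(on_heap);
}
```

Calling show_sections(stdout) and then reading /proc/self/maps from the same process shows which range each address falls in.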
> >
> > (Android and modern linux distros require the relro section too.
>
> I thought that was only needed for dynamic linking? Then again you don't allow a
> lot of static stuff to run on the final system anyway...

iirc that was my reaction when the security folks came to me about
this a decade ago :-)

you still have a got and init_array/fini_array etc.

> (The line between PIE and dynamic linking confuses even me. How does static PIE
> relocate itself? I _think_ I looked it up once, but "it statically links in a
> dynamic linker in the pile of crt1.o and begin.o files" _can't_ be right...)
>
> > interestingly, there _is_ an elf program header for the stack, to
> > signal that you don't want an executable stack. iirc Android and [very
> > very recently] modern linux distros won't let you start a process with
> > an executable main stack, but afaik the code for the option no-one has
> > wanted/needed for a very long time is still in the kernel.)
>
> Cool.
>
> These days there's also vdso and vvar, which are provided by the kernel at
> runtime. The first is a .text section with magic functions you can call as an
> alternative to syscalls, and the second is a magic .rodata section that provides
> volatile variables the OS updates which you can just reach out and look at.
>
> Between the two of them you can do things like check the current timestamp
> without a system call. What they actually provide varies by OS (and then your
> libc has to be taught to use each new capability out of there instead of making
> the syscalls).
>
> "cat /proc/self/maps" and they're the last two entries if present.

(not necessarily. aslr applies to them too.)
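
Since position in the list can't be relied on, a sketch that looks mappings up by name instead. maps_line_name() and have_mapping() are hypothetical helpers, and the column layout assumed is the usual "range perms offset dev inode name" format of /proc/PID/maps.

```c
#include <stdio.h>
#include <string.h>

// Copy the name column of one /proc/PID/maps line into name[].
// Returns 1 if the line had a name, 0 for anonymous mappings or short lines.
int maps_line_name(const char *line, char *name, size_t len)
{
  int used = 0;

  // Skip five columns: address range, perms, offset, dev, inode.
  sscanf(line, "%*s %*s %*s %*s %*s %n", &used);
  if (!used || !line[used]) return 0;
  snprintf(name, len, "%s", line+used);
  name[strcspn(name, "\n")] = 0;

  return 1;
}

// Search /proc/self/maps for a mapping such as "[vdso]" or "[vvar]".
// Returns 1 if found, 0 if not, -1 if the file cannot be opened.
int have_mapping(const char *want)
{
  char line[4096], name[4096];
  FILE *fp = fopen("/proc/self/maps", "r");
  int found = 0;

  if (!fp) return -1;
  while (!found && fgets(line, sizeof(line), fp))
    found = maps_line_name(line, name, sizeof(name)) && !strcmp(name, want);
  fclose(fp);

  return found;
}
```

have_mapping("[vdso]") then answers "is it there" without caring where in the address space the kernel put it.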

> There is a "man 7 vdso" but I dunno how up to date it is. (Which gets us back to
> Michael Kerrisk's retirement and the new guy NOT MAINTAINING A WEB COPY. Grrr.)
>
> Maintaining backwards compatibility means keeping a lot of old stuff. I had a
> talk with Rich Felker last night on IRC about what musl-libc's syscall
> requirements actually _are_, and what it would take to repot it on top of a
> posix-ish RTOS du jour. (Makes the trusting trust cleansing cycle smaller if you
> can cross compile Linux from an RTOS...)
>
> We didn't come to a conclusion, but I _did_ get permission from skarnet to use
> his git://git.skarnet.org/mdevd under 0BSD. (Porting that to toybox seems easier
> than bringing my old mdev code up to speed for all the
> https://github.com/slashbeast/mdev-like-a-boss stuff it's grown since I handed
> it off.)
>
> >> The first three of those literally exist in the ELF file, as in it mmap()s a
> >> block of data out of the file at a starting offset, and the memory is thus
> >> automatically populated with data from the file. The text and rodata ones
> >> don't really care if it's MAP_PRIVATE or MAP_SHARED because they can never
> >> write anything back to the file, but the data one cares that it's
> >> MAP_PRIVATE: any changes stay local and do NOT get written back to the file.
> >> And the bss is an anonymous mapping so starts zeroed, the file doesn't bother
> >> wasting space on a run of zeroes when the OS can just provide that on
> >> request. (It stands for Block Starting Symbol which I assume meant something
> >> useful 40 years ago on DEC hardware.)
> >
> > (close, but it was IBM and the name was slightly different:
> > https://en.wikipedia.org/wiki/.bss#Origin)
>
> That says United Aircraft Corporation named it using IBM 704 hardware in an
> assembler and then in fortran. (I only give wikipedia[citation needed] about an
> 80% chance to be accurate about any given fact, but am not root causing it right
> now. :)
>
> I like to track down magic acronyms, ala grep meaning "get regular expression".
> I once emailed Dennis Ritchie to ask what "inode" meant:
>
> 

Re: [Toybox] Impact of global struct size

2024-01-04 Thread Rob Landley
On 1/3/24 12:19, Mouse wrote:
>> (The line between PIE and dynamic linking confuses even me.  How does
>> static PIE relocate itself?
> 
> It may not.  It could get relocated by in-kernel ASLR or the like.
> Also, I think PIE isn't relevant, or certainly isn't _as_ relevant, to
> the final executable; my impression is that it's more important for
> library code, so it doesn't need fixups.  These are less important for
> static executables, since the fixups there happen once, at link time,
> whereas for a .so the fixups happen at runtime and reduce the
> text-segment sharing that is one of the benefits of shared objects.

I want https://www.muppetlabs.com/~breadbox/software/tiny/teensy.html but a
walkthrough for the kernel's ELF loader. (I've had to walk through it MYSELF
several times, but I didn't do writeups afterwards so forgot it all.)

I suppose I should start by reading his dynamic version:

https://www.muppetlabs.com/~breadbox/software/tiny/somewhat.html

>> Luckily X11 has "detach and restart" plumbing that lets it reopen a
>> process's network pipe without killing the window or the process,
> 
> ...?  When did it grow that, and where can I find out more about it?

Um... A) Before Scale 2011, B) ask Kir Kolyshkin? He said it was something old.
(I think the program can just detect that the connection closed and dial out to
the server again, opening a new window and repopulating it? It's just most
programs don't bother.)

What OpenVZ was doing was

A) tell the container to create a giant multi-process coredump file that had
every process in the container in one big file (but don't STOP anything, just
checkpoint the live running stuff racily).

B) rsync the filesystem and coredump over to the new machine.

C) Suspend the container (all processes) and re-write the big coredump file.

D) rsync everything AGAIN (fast because not much changed)

E) do TCP/IP connection hijacking so the new machine inherits the old open
connections (you don't have to predict sequence numbers the other side sends
you, don't forget to broadcast an ARP update so the packets go to the new
ethernet address):

https://www.idc-online.com/technical_references/pdfs/data_communications/TCP_Sequence_Prediction_Attack.pdf

F) Resume the new container in the new filesystem.

He had an animated X11 window (screensaver) that paused for 1/3 of a second
while migrating from machine to machine. His demo involved plugging in a cat 5
to the new machine, migrating to it, and unplugging the old one's network cable.

This was in 2011. I assume vanilla Linux has caught up by now, but there was
quite the laundry list at the time...

Rob


Re: [Toybox] Impact of global struct size

2024-01-04 Thread Mouse
> (The line between PIE and dynamic linking confuses even me.  How does
> static PIE relocate itself?

It may not.  It could get relocated by in-kernel ASLR or the like.
Also, I think PIE isn't relevant, or certainly isn't _as_ relevant, to
the final executable; my impression is that it's more important for
library code, so it doesn't need fixups.  These are less important for
static executables, since the fixups there happen once, at link time,
whereas for a .so the fixups happen at runtime and reduce the
text-segment sharing that is one of the benefits of shared objects.

> Luckily X11 has "detach and restart" plumbing that lets it reopen a
> process's network pipe without killing the window or the process,

...?  When did it grow that, and where can I find out more about it?

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: [Toybox] Impact of global struct size

2024-01-03 Thread Patrick Oppenlander
On Thu, Jan 4, 2024 at 4:30 AM Rob Landley  wrote:
>
> I note that I've written over a hundred lines of rant in response to his
> previous email already. I should dig back through this and turn it into proper
> documentation at some point. (Especially since Elliott knows more of this stuff
> than I do so I'm likely to get corrected a lot here...)
>
> On 1/2/24 20:54, enh wrote:
> >> You can look at /proc/self/maps (and /proc/self/smaps, and
> >> /proc/self/smaps_rollup) to see them for a running process (replace "self"
> >> with any running PID, self is a symlink to your current PID). The six
> >> sections are:
> >>
> >>   text - the executable functions: mmap(MAP_PRIVATE, PROT_READ|PROT_EXEC)
> >>   rodata - const globals, string constants, etc: mmap(MAP_PRIVATE, PROT_READ)
> >>   data - writeable data initialized to nonzero: mmap(MAP_PRIVATE, PROT_WRITE)
> >>   bss - writeable data initialized to zero: mmap(MAP_ANON, PROT_WRITE)
> >>   stack - function call stack, also contains environment data
> >>   heap - backing store for malloc() and free()
> >
> > (Android and modern linux distros require the relro section too.
>
> I thought that was only needed for dynamic linking? Then again you don't allow a
> lot of static stuff to run on the final system anyway...
>
> (The line between PIE and dynamic linking confuses even me. How does static PIE
> relocate itself? I _think_ I looked it up once, but "it statically links in a
> dynamic linker in the pile of crt1.o and begin.o files" _can't_ be right...)
>
> > interestingly, there _is_ an elf program header for the stack, to
> > signal that you don't want an executable stack. iirc Android and [very
> > very recently] modern linux distros won't let you start a process with
> > an executable main stack, but afaik the code for the option no-one has
> > wanted/needed for a very long time is still in the kernel.)
>
> Cool.
>
> These days there's also vdso and vvar, which are provided by the kernel at
> runtime. The first is a .text section with magic functions you can call as an
> alternative to syscalls, and the second is a magic .rodata section that provides
> volatile variables the OS updates which you can just reach out and look at.
>
> Between the two of them you can do things like check the current timestamp
> without a system call. What they actually provide varies by OS (and then your
> libc has to be taught to use each new capability out of there instead of making
> the syscalls).
>
> "cat /proc/self/maps" and they're the last two entries if present.
>
> There is a "man 7 vdso" but I dunno how up to date it is. (Which gets us back to
> Michael Kerrisk's retirement and the new guy NOT MAINTAINING A WEB COPY. Grrr.)
>
> Maintaining backwards compatibility means keeping a lot of old stuff. I had a
> talk with Rich Felker last night on IRC about what musl-libc's syscall
> requirements actually _are_, and what it would take to repot it on top of a
> posix-ish RTOS du jour. (Makes the trusting trust cleansing cycle smaller if you
> can cross compile Linux from an RTOS...)

I did the "run linux-musl binaries on an RTOS" part a few years ago
and ended up with this list:

https://github.com/apexrtos/apex/blob/master/sys/kern/syscall_table.c

It's by no means exhaustive, but it was enough to run a useful set of
toybox toys, busybox's ash and enough other stuff to build a
commercial product running on an armv7-m (nommu) chip on top of it. I
had a risc-v port working and was in the middle of getting powerpc
(mmu) stuff running when circumstances changed and I had to move on.

I'm not sure how many more syscalls would be required to be able to
compile Linux, but probably not a whole lot.

Patrick

> We didn't come to a conclusion, but I _did_ get permission from skarnet to use
> his git://git.skarnet.org/mdevd under 0BSD. (Porting that to toybox seems easier
> than bringing my old mdev code up to speed for all the
> https://github.com/slashbeast/mdev-like-a-boss stuff it's grown since I handed
> it off.)
>
> >> The first three of those literally exist in the ELF file, as in it mmap()s a
> >> block of data out of the file at a starting offset, and the memory is thus
> >> automatically populated with data from the file. The text and rodata ones
> >> don't really care if it's MAP_PRIVATE or MAP_SHARED because they can never
> >> write anything back to the file, but the data one cares that it's
> >> MAP_PRIVATE: any changes stay local and do NOT get written back to the file.
> >> And the bss is an anonymous mapping so starts zeroed, the file doesn't bother
> >> wasting space on a run of zeroes when the OS can just provide that on
> >> request. (It stands for Block Starting Symbol which I assume meant something
> >> useful 40 years ago on DEC hardware.)
> >
> > (close, but it was IBM and the name was slightly different:
> > https://en.wikipedia.org/wiki/.bss#Origin)
>
> That says United 

Re: [Toybox] Impact of global struct size

2024-01-03 Thread Rob Landley
I note that I've written over a hundred lines of rant in response to his
previous email already. I should dig back through this and turn it into proper
documentation at some point. (Especially since Elliott knows more of this stuff
than I do so I'm likely to get corrected a lot here...)

On 1/2/24 20:54, enh wrote:
>> You can look at /proc/self/maps (and /proc/self/smaps, and
>> /proc/self/smaps_rollup) to see them for a running process (replace "self"
>> with any running PID, self is a symlink to your current PID). The six sections
>> are:
>>
>>   text - the executable functions: mmap(MAP_PRIVATE, PROT_READ|PROT_EXEC)
>>   rodata - const globals, string constants, etc: mmap(MAP_PRIVATE, PROT_READ)
>>   data - writeable data initialized to nonzero: mmap(MAP_PRIVATE, PROT_WRITE)
>>   bss - writeable data initialized to zero: mmap(MAP_ANON, PROT_WRITE)
>>   stack - function call stack, also contains environment data
>>   heap - backing store for malloc() and free()
> 
> (Android and modern linux distros require the relro section too.

I thought that was only needed for dynamic linking? Then again you don't allow a
lot of static stuff to run on the final system anyway...

(The line between PIE and dynamic linking confuses even me. How does static PIE
relocate itself? I _think_ I looked it up once, but "it statically links in a
dynamic linker in the pile of crt1.o and begin.o files" _can't_ be right...)
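
For what it's worth, my understanding (not confirmed anywhere in this thread) is that static PIE startup code does not carry a whole dynamic linker. It only walks the binary's own "relative" relocations before main() runs, adding the load address to each listed slot. A minimal sketch of that one step follows, with a made-up function name and the assumption of 64-bit RELA-style relocations. The real startup code also has to find the relocation table through _DYNAMIC, which is left out here.

```c
#include <elf.h>
#include <stdint.h>
#include <string.h>

// Apply relative relocations: each entry says "the pointer-sized slot at
// base+r_offset should hold base+r_addend". relative_type is the
// architecture's RELATIVE relocation type (R_X86_64_RELATIVE and friends).
// Returns how many entries were applied; other types are skipped.
size_t apply_relative(unsigned char *base, const Elf64_Rela *rela,
  size_t count, uint32_t relative_type)
{
  size_t i, done = 0;

  for (i = 0; i < count; i++) {
    uint64_t value;

    if (ELF64_R_TYPE(rela[i].r_info) != relative_type) continue;
    value = (uint64_t)(uintptr_t)base + rela[i].r_addend;
    memcpy(base + rela[i].r_offset, &value, sizeof(value));
    done++;
  }

  return done;
}
```

No symbol lookup happens in that loop, which is why it is so much smaller than a dynamic linker: there are no other objects to search.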

> interestingly, there _is_ an elf program header for the stack, to
> signal that you don't want an executable stack. iirc Android and [very
> very recently] modern linux distros won't let you start a process with
> an executable main stack, but afaik the code for the option no-one has
> wanted/needed for a very long time is still in the kernel.)

Cool.

These days there's also vdso and vvar, which are provided by the kernel at
runtime. The first is a .text section with magic functions you can call as an
alternative to syscalls, and the second is a magic .rodata section that provides
volatile variables the OS updates which you can just reach out and look at.

Between the two of them you can do things like check the current timestamp
without a system call. What they actually provide varies by OS (and then your
libc has to be taught to use each new capability out of there instead of making
the syscalls).

"cat /proc/self/maps" and they're the last two entries if present.

There is a "man 7 vdso" but I dunno how up to date it is. (Which gets us back to
Michael Kerrisk's retirement and the new guy NOT MAINTAINING A WEB COPY. Grrr.)

Maintaining backwards compatibility means keeping a lot of old stuff. I had a
talk with Rich Felker last night on IRC about what musl-libc's syscall
requirements actually _are_, and what it would take to repot it on top of a
posix-ish RTOS du jour. (Makes the trusting trust cleansing cycle smaller if you
can cross compile Linux from an RTOS...)

We didn't come to a conclusion, but I _did_ get permission from skarnet to use
his git://git.skarnet.org/mdevd under 0BSD. (Porting that to toybox seems easier
than bringing my old mdev code up to speed for all the
https://github.com/slashbeast/mdev-like-a-boss stuff it's grown since I handed
it off.)

>> The first three of those literally exist in the ELF file, as in it mmap()s a
>> block of data out of the file at a starting offset, and the memory is thus
>> automatically populated with data from the file. The text and rodata ones
>> don't really care if it's MAP_PRIVATE or MAP_SHARED because they can never
>> write anything back to the file, but the data one cares that it's MAP_PRIVATE:
>> any changes stay local and do NOT get written back to the file. And the bss is
>> an anonymous mapping so starts zeroed, the file doesn't bother wasting space
>> on a run of zeroes when the OS can just provide that on request. (It stands
>> for Block Starting Symbol which I assume meant something useful 40 years ago
>> on DEC hardware.)
> 
> (close, but it was IBM and the name was slightly different:
> https://en.wikipedia.org/wiki/.bss#Origin)

That says United Aircraft Corporation named it using IBM 704 hardware in an
assembler and then in fortran. (I only give wikipedia[citation needed] about an
80% chance to be accurate about any given fact, but am not root causing it right
now. :)

I like to track down magic acronyms, ala grep meaning "get regular expression".
I once emailed Dennis Ritchie to ask what "inode" meant:

https://lkml.iu.edu/hypermail/linux/kernel/0207.2/1182.html

But in this case I stopped paying attention once I confirmed it doesn't mean
anything of modern relevance.

The interesting part (to me) is that the name predates unix by almost 20 years
(mainframe legacy predating even the PDP-1), and predating ELF by 40 years. (The
first OS with ELF binaries was Solaris 2.0 released in 1992. Linux switched over
3-4 years later.)

If it wasn't a legacy acronym from shortly after world war II, it would probably
be called 

Re: [Toybox] Impact of global struct size

2024-01-02 Thread enh via Toybox
On Mon, Jan 1, 2024 at 12:39 PM Rob Landley  wrote:
>
> On 12/30/23 18:10, Ray Gardner wrote:
> > I am having a bit of trouble understanding the impact of globals.
> >
> > There are the probes GLOBALS and findglobals to see what the space usage
> > is for globals. The output of both show that "this" is up to 8232 bytes,
> > due to the "ip" toy using 8232 of global space.
>
> Which is in pending for a reason.
>
> > The blog entry of 31 August 2023 ends with some discussion of which
> > commands take up the most global space. It says "Everything "tr" and
> > earlier is reasonably sized, and "ip" and "telnet" are in pending."
>
> I sorted them by size and cut and pasted the end of the list, starting with
> "tr".
>
> Commands in pending haven't been (fully) cleaned up yet, so problems in them
> aren't yet confirmed to be a design problem requiring heavy lifting. Most likely
> something easily fixable I just haven't done the janitorial work for yet.
>
> > I inferred that this means commands in pending are less important here,
> > but they still seem to take up space in "this".
>
> Only when enabled, which they aren't in defconfig. If you don't switch it on in
> config, then it doesn't get built, meaning it doesn't take up any space in the
> resulting toybox binary.
>
> > How important is the space here? "tr" was 520 then, cksum was 1024. How
> > big is too big?
>
> Mostly this is an issue for embedded systems. I doubt android's going to care.

i'm the one who prods you once every year or so suggesting that 4KiB
for toybuf/libbuf is a bit on the small side :-)

> Sorry for the delay replying, I can't figure out how to explain this without a
> GIANT INFODUMP of backstory. The tl;dr is your read-only data is shared between
> all instances of the program, but your _writeable_ data needs a separate copy
> for each instance of the program that's running, and that includes every global
> variable you MIGHT write to. The physical pages can be demand-faulted on systems
> with an MMU (although each fault gets rounded up to page size), but without an
> MMU it's LD_BIND_NOW and then some. (See "man 8 ld.so" if you didn't get that
> reference...)
>
> Ok, backstory: since 1996 modern Linux executables are stored in ELF format
> (Executable Linking Format, yes "ELF format" is like "ATM machine").

(no-one made you say "ELF format" ... you did that :-) i tend to say
"ELF file".)

> It's an
> archive format like zip or tar, except what it stores is (among other things) a
> list of "sections" each containing a list of "symbols". Your linker puts this
> together from the .o files produced by the compiler.
>
> Statically linked processes have six main memory mappings, four of which are
> initialized by the kernel's ELF loader (fs/binfmt_elf.c) from sections in the
> ELF file, and the other two are generated at runtime. All six of these are
> created by the kernel during the execve(2) system call, mostly as it wanders
> through fs/binfmt_elf.c (or fs/binfmt_elf_fdpic.c which is kind of an ext2/ext3
> thing that's MOSTLY the same with minor variants and REALLY SHOULD be the same
> file but isn't because kernel development became proctologically recursive some
> years ago).
>
> You can look at /proc/self/maps (and /proc/self/smaps, and
> /proc/self/smaps_rollup) to see them for a running process (replace "self" with
> any running PID, self is a symlink to your current PID). The six sections are:
>
>   text - the executable functions: mmap(MAP_PRIVATE, PROT_READ|PROT_EXEC)
>   rodata - const globals, string constants, etc: mmap(MAP_PRIVATE, PROT_READ)
>   data - writeable data initialized to nonzero: mmap(MAP_PRIVATE, PROT_WRITE)
>   bss - writeable data initialized to zero: mmap(MAP_ANON, PROT_WRITE)
>   stack - function call stack, also contains environment data
>   heap - backing store for malloc() and free()

(Android and modern linux distros require the relro section too.
interestingly, there _is_ an elf program header for the stack, to
signal that you don't want an executable stack. iirc Android and [very
very recently] modern linux distros won't let you start a process with
an executable main stack, but afaik the code for the option no-one has
wanted/needed for a very long time is still in the kernel.)
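
The program header in question is PT_GNU_STACK, and the request is carried in its PF_X flag. A sketch of checking it, given a program header table that has already been read from somewhere (the function name is made up, and only the 64-bit header layout is handled):

```c
#include <elf.h>
#include <stddef.h>

// Look through an ELF64 program header table for PT_GNU_STACK.
// Returns 1 if it asks for an executable stack, 0 if it asks for a
// non-executable one, and -1 if the header is absent, in which case the
// kernel falls back to an architecture-specific default.
int stack_exec_requested(const Elf64_Phdr *phdr, size_t count)
{
  size_t i;

  for (i = 0; i < count; i++)
    if (phdr[i].p_type == PT_GNU_STACK) return !!(phdr[i].p_flags & PF_X);

  return -1;
}
```

On libcs that provide getauxval(), a process can find its own table through AT_PHDR and AT_PHNUM, though that part is left out to keep the sketch short.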

> The first three of those literally exist in the ELF file, as in it mmap()s a
> block of data out of the file at a starting offset, and the memory is thus
> automatically populated with data from the file. The text and rodata ones don't
> really care if it's MAP_PRIVATE or MAP_SHARED because they can never write
> anything back to the file, but the data one cares that it's MAP_PRIVATE: any
> changes stay local and do NOT get written back to the file. And the bss is an
> anonymous mapping so starts zeroed, the file doesn't bother wasting space on a
> run of zeroes when the OS can just provide that on request. (It stands for Block
> Starting Symbol which I assume meant something useful 40 years ago on

Re: [Toybox] Impact of global struct size

2024-01-02 Thread Ray Gardner
On Tue, Jan 2, 2024 at 3:58 PM Ray Gardner  wrote:
>
> On Mon, Jan 1, 2024 at 1:39 PM Rob Landley  wrote:
> > ... [ a very long and detailed reply ] ...
>
> Rob, thank you for the "GIANT INFODUMP", and I mean that sincerely. It
> took me a while to read it; it must have taken quite a while to write it.
> A lot of info on kernel-level memory management, I think I got about 90%
> of it but I'll have to look up some stuff (PLT, GOT, ...).
>
> > yes "ELF format" is like "ATM machine"
>
> where I use my PIN code?

umm  where I use my PIN number?

I can screw up the simplest things


Re: [Toybox] Impact of global struct size

2024-01-02 Thread Ray Gardner
On Mon, Jan 1, 2024 at 1:39 PM Rob Landley  wrote:
> ... [ a very long and detailed reply ] ...

Rob, thank you for the "GIANT INFODUMP", and I mean that sincerely. It
took me a while to read it; it must have taken quite a while to write it.
A lot of info on kernel-level memory management, I think I got about 90%
of it but I'll have to look up some stuff (PLT, GOT, ...).

> yes "ELF format" is like "ATM machine"

where I use my PIN code?

One bit I can contribute: BSS is an assembler directive dating back at
least to the 1960s and probably earlier (don't ask me how I know). It was
used to reserve uninitialized space; BSSZ was used to reserve space zeroed
out at load time. Don't know if it's in any current assemblers.

I tried inserting a printf of sizeof(TT) and find that it does report only
the global size of my own toy. I should have tried that before I asked
about it, and looked at how TT is defined. (I was thinking it was the
entire "this" union but obviously it could not be, given how globals are
accessed in each toy. Braino...)

I know you aren't too big on using "const", but you said (implied?) it
could put data into the rodata section. For example, would it be
beneficial to do this:

static char const * const msg = "a message";
static char const * const msgs[] = { "msg1", "msg2", 0 };

Ray


Re: [Toybox] Impact of global struct size

2024-01-01 Thread Rob Landley
On 12/30/23 18:10, Ray Gardner wrote:
> I am having a bit of trouble understanding the impact of globals.
> 
> There are the probes GLOBALS and findglobals to see what the space usage
> is for globals. The output of both show that "this" is up to 8232 bytes,
> due to the "ip" toy using 8232 of global space.

Which is in pending for a reason.

> The blog entry of 31 August 2023 ends with some discussion of which
> commands take up the most global space. It says "Everything "tr" and
> earlier is reasonably sized, and "ip" and "telnet" are in pending."

I sorted them by size and cut and pasted the end of the list, starting with 
"tr".

Commands in pending haven't been (fully) cleaned up yet, so problems in them
aren't yet confirmed to be a design problem requiring heavy lifting. Most likely
it's something easily fixable that I just haven't done the janitorial work for
yet.

> I inferred that this means commands in pending are less important here,
> but they still seem to take up space in "this".

Only when enabled, which they aren't in defconfig. If you don't switch it on in
config, then it doesn't get built, meaning it doesn't take up any space in the
resulting toybox binary.

> How important is the space here? "tr" was 520 then, cksum was 1024. How
> big is too big?

Mostly this is an issue for embedded systems. I doubt android's going to care.

Sorry for the delay replying, I can't figure out how to explain this without a
GIANT INFODUMP of backstory. The tl;dr is your read-only data is shared between
all instances of the program, but your _writeable_ data needs a separate copy
for each instance of the program that's running, and that includes every global
variable you MIGHT write to. The physical pages can be demand-faulted on systems
with an MMU (although each fault gets rounded up to page size), but without an
MMU it's LD_BIND_NOW and then some. (See "man 8 ld.so" if you didn't get that
reference...)
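
To put rough numbers on the page rounding, here's a sketch (not anything from
toybox itself). pages_spanned() is a made-up helper, and the 4096 byte page size
in the usage note is an assumption: ask sysconf(_SC_PAGESIZE) on a real system.

```c
#include <stddef.h>

// Hypothetical helper: how many pages does a block of len bytes touch if it
// starts at byte offset start within its mapping? Each page that gets written
// to costs a whole page of physical memory, however few bytes you changed.
size_t pages_spanned(size_t start, size_t len, size_t page_size)
{
  if (!len) return 0;

  // Only the offset within the first page matters, then round up.
  return (start%page_size + len + page_size - 1)/page_size;
}
```

With 4k pages, pages_spanned(0, 8232, 4096) is 3 (the 8232 byte "ip" globals)
and pages_spanned(0, 520, 4096) is 1, but the same 520 bytes starting at offset
4000 straddle a page boundary and touch 2.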

Ok, backstory: since 1996 modern Linux executables are stored in ELF format
(Executable and Linkable Format, yes "ELF format" is like "ATM machine"). It's
an archive format like zip or tar, except what it stores is (among other
things) a list of "sections" each containing a list of "symbols". Your linker
puts this together from the .o files produced by the compiler.

Statically linked processes have six main memory mappings, four of which are
initialized by the kernel's ELF loader (fs/binfmt_elf.c) from sections in the
ELF file, and the other two are generated at runtime. All six of these are
created by the kernel during the execve(2) system call, mostly as it wanders
through fs/binfmt_elf.c (or fs/binfmt_elf_fdpic.c which is kind of an ext2/ext3
thing that's MOSTLY the same with minor variants and REALLY SHOULD be the same
file but isn't because kernel development became proctologically recursive some
years ago).

You can look at /proc/self/maps (and /proc/self/smaps, and
/proc/self/smaps_rollup) to see them for a running process (replace "self" with
any running PID, self is a symlink to your current PID). The six sections are:

  text - the executable functions: mmap(MAP_PRIVATE, PROT_READ|PROT_EXEC)
  rodata - const globals, string constants, etc: mmap(MAP_PRIVATE, PROT_READ)
  data - writeable data initialized to nonzero: mmap(MAP_PRIVATE, PROT_WRITE)
  bss - writeable data initialized to zero: mmap(MAP_ANON, PROT_WRITE)
  stack - function call stack, also contains environment data
  heap - backing store for malloc() and free()
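
A minimal sketch of checking this from inside a program, assuming Linux with
/proc mounted. perms_of() is a hypothetical helper (not part of toybox) that
finds which line of /proc/self/maps contains an address and hands back that
mapping's permission field:

```c
#include <stdio.h>
#include <string.h>

int initialized = 42; // nonzero initializer, so it should live in data
int zeroed;           // no initializer, so it should live in bss

// Hypothetical helper: copy the permission field (such as "r-xp") of the
// mapping containing addr into perms. Returns 0 on success, -1 if not found.
int perms_of(const void *addr, char perms[5])
{
  unsigned long start, end, target = (unsigned long)addr;
  char line[4096], found[5];
  FILE *fp = fopen("/proc/self/maps", "r");
  int rc = -1;

  if (!fp) return -1;
  while (fgets(line, sizeof(line), fp)) {
    // Each line starts with "start-end perms" (addresses in hex).
    if (sscanf(line, "%lx-%lx %4s", &start, &end, found) != 3) continue;
    if (target >= start && target < end) {
      strcpy(perms, found);
      rc = 0;
      break;
    }
  }
  fclose(fp);

  return rc;
}
```

Calling it on &initialized, &zeroed, the address of a local variable (stack), or
a pointer from malloc() (heap) should give you a field starting with "rw", while
the address of a string literal (rodata) should come back without the w.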

The first three of those literally exist in the ELF file, as in it mmap()s a
block of data out of the file at a starting offset, and the memory is thus
automatically populated with data from the file. The text and rodata ones don't
really care if it's MAP_PRIVATE or MAP_SHARED because they can never write
anything back to the file, but the data one cares that it's MAP_PRIVATE: any
changes stay local and do NOT get written back to the file. And the bss is an
anonymous mapping so starts zeroed, the file doesn't bother wasting space on a
run of zeroes when the OS can just provide that on request. (It stands for Block
Started by Symbol which I assume meant something useful 40 years ago on DEC
hardware.)
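
A sketch of the MAP_PRIVATE behavior, using an ordinary temp file standing in
for the ELF binary. private_write_reaches_file() is a made-up name, and the /tmp
path is an assumption about your system:

```c
#define _POSIX_C_SOURCE 200809L // for mkstemp() and pread() in strict modes
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

// Hypothetical demo: write through a MAP_PRIVATE mapping, then reread the
// file. Returns 1 if the file changed, 0 if it didn't, -1 on setup failure.
int private_write_reaches_file(void)
{
  char path[] = "/tmp/mapdemoXXXXXX", buf[5], *map;
  int fd = mkstemp(path), result = -1;

  if (fd == -1) return -1;
  unlink(path); // the open fd keeps the file alive until close()
  if (write(fd, "hello", 5) == 5) {
    map = mmap(0, 5, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (map != MAP_FAILED) {
      map[0] = 'J'; // this process gets its own copy of the page
      if (pread(fd, buf, 5, 0) == 5) result = !!memcmp(buf, "hello", 5);
      munmap(map, 5);
    }
  }
  close(fd);

  return result;
}
```

This should return 0: the mapping says "Jello" but the file still says "hello".
Swap in MAP_SHARED and the change would be visible in the file.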

All four of those ELF sections (text, rodata, data, bss) are each treated as a
giant struct under the covers, because that's how C thinks. Every time you
reference a symbol that lives there, the C code goes "I have a pointer to the
start of this, and I have an offset into it where this particular symbol lives
within that segment, and I know the type and thus size of the variable living at
that offset".
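
Roughly what that looks like if you spell it out by hand. This is an analogy
only: the real offsets are symbol addresses assigned by the linker, and struct
fake_segment and read_total() are invented for illustration.

```c
#include <stddef.h>
#include <string.h>

// Stand-in for a segment: a few "globals" laid out one after another.
struct fake_segment {
  int counter;
  char name[16];
  long total;
};

// Fetch "total" the long way around: start of the block, plus the offset of
// the symbol, copying as many bytes as the symbol's type says it has.
long read_total(const void *base)
{
  long value;

  memcpy(&value, (const char *)base + offsetof(struct fake_segment, total),
    sizeof(value));

  return value;
}
```

Given struct fake_segment seg = {1, "toy", 42}; then read_total(&seg) gives the
same answer as seg.total, which is the point: the member access is shorthand
for base plus offset plus type.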

The remaining two memory blocks aren't part of ELF, but they're needed at 
runtime.

The stack is also set up by the kernel, and is funny in three ways:

1) it has environment data at the end (so all your inherited environment
variables, and your argv[] arguments, plus an array of pointers to the start of
each string which is what char *argv[] and char *environ[] actually