from:"Rob Landley"

Re: [RFC PATCH] rootfs: force mounting rootfs as tmpfs

2018-02-01 Thread Rob Landley



On 01/31/2018 10:22 PM, Mimi Zohar wrote:
> On Wed, 2018-01-31 at 21:03 -0500, Arvind Sankar wrote:
>> On Wed, Jan 31, 2018 at 05:48:20PM -0600, Rob Landley wrote:
>>> On 01/31/2018 04:07 PM, Mimi Zohar wrote:
>>>> On Wed, 2018-01-31 at 13:32 -0600, Rob Landley wrote:
>>>>> (The old "I configured in tmpfs and am using rootfs but I want that
>>>>> rootfs to be ramfs, not tmpfs" code doesn't seem to be a real-world
>>>>> concern, does it?)
>>>>
>>>> I must be missing something.  Which systems don't specify "root=" on
>>>> the boot command line.
>>>
>>> Any system using initrd or initramfs?
>>>
>>
>> Don't a lot of initramfs setups use root= to tell the initramfs which
>> actual root file system to switch to after early boot?
> 
> With your patch and specifying "root=tmpfs", dracut is complaining:
> 
> dracut: FATAL: Don't know how to handle 'root=tmpfs'
> dracut: refusing to continue

"The kernel can't break this buggy userspace package."

"The kernel must give access to a new feature to this buggy userspace
package".

I think kernel policy asks you to pick one, but I could be wrong...

Rob

Re: [RFC PATCH] rootfs: force mounting rootfs as tmpfs

2018-02-01 Thread Rob Landley

On 02/01/2018 09:55 AM, Mimi Zohar wrote:
> On Thu, 2018-02-01 at 09:20 -0600, Rob Landley wrote:
> 
>>> With your patch and specifying "root=tmpfs", dracut is complaining:
>>>
>>> dracut: FATAL: Don't know how to handle 'root=tmpfs'
>>> dracut: refusing to continue
>>
>> [googles]... I do not understand why this package exists.
>>
>> If you're switching to another root filesystem, using a tool that
>> wikipedia[citation needed] says has no purpose but to switch to another
>> root filesystem, (so let's reproduce the kernel infrastructure in
>> userspace while leaving it in the kernel too)... why do you need initramfs
>> to be tmpfs? You're using it for half a second, then discarding it,
>> what's the point of it being tmpfs?
> 
> Unlike the kernel image which is signed by the distros, the initramfs
> doesn't come signed, because it is built on the target system.  Even
> if the initramfs did come signed, it is beneficial to measure and
> appraise the individual files in the initramfs.

You can still shoot yourself in the foot with tmpfs. People mount a /run
and a /tmp and then as a normal user you can go
https://twitter.com/landley/status/959103235305951233 and maybe the
default should be a little more clever there...

I'll throw it on the todo heap. :)

>> Sigh. If people are ok with having rootfs just be tmpfs whenever tmpfs
>> is configured in, even when you're then going to overmount it with
>> something else like you're doing, let's just _remove_ the test. If it
>> can be tmpfs, have it be tmpfs.
> 
> Very much appreciated!

Not yet tested, but something like the attached? (Sorry for the
half-finished doc changes in there, I'm at work and have a 5 minute
break. I can test properly this evening if you don't get to it...)

Rob
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index b98048b..a5b44b2 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3771,8 +3771,14 @@
 			debug-uart get routed to the D+ and D- pins of the usb
 			port and the regular usb controller gets disabled.
 
-	root=		[KNL] Root filesystem
-			See name_to_dev_t comment in init/do_mounts.c.
+	root=		[KNL] Fallback root filesystem when not using initramfs
+			If initramfs contains an /init file to run as PID 1 the
+			kernel ignores this setting. When initramfs doesn't have
+			/init (or whatever rdinit= points to) the kernel calls
+			prepare_namespace() in init/do_mounts.c to mount another
+			filesystem over / and chroot into it, then looks for
+			/sbin/init in there. (And /etc/init, /bin/init, and
+			/bin/sh for historical reasons.)
 
 	rootdelay=	[KNL] Delay (in seconds) to pause before attempting to
 			mount the root filesystem
diff --git a/Documentation/filesystems/ramfs-rootfs-initramfs.txt b/Documentation/filesystems/ramfs-rootfs-initramfs.txt
index b176928..f3c57ba 100644
--- a/Documentation/filesystems/ramfs-rootfs-initramfs.txt
+++ b/Documentation/filesystems/ramfs-rootfs-initramfs.txt
@@ -67,6 +67,10 @@ A ramfs derivative called tmpfs was created to add size limits, and the ability
 to write the data to swap space.  Normal users can be allowed write access to
 tmpfs mounts.  See Documentation/filesystems/tmpfs.txt for more information.
 
+The kernel uses tmpfs for ramfs when CONFIG_TMPFS=y and no "root=" is
+specified in the kernel command line. If you can't stop yourself from
+specifying root= you can also use "root=tmpfs".
+
 What is rootfs?
 ---
 
@@ -236,22 +240,10 @@ An initramfs archive is a complete self-contained root filesystem for Linux.
 If you don't already understand what shared libraries, devices, and paths
 you need to get a minimal root filesystem up and running, here are some
 references:
-http://www.tldp.org/HOWTO/Bootdisk-HOWTO/
-http://www.tldp.org/HOWTO/From-PowerUp-To-Bash-Prompt-HOWTO.html
-http://www.linuxfromscratch.org/lfs/view/stable/
-
-The "klibc" package (http://www.kernel.org/pub/linux/libs/klibc) is
-designed to be a tiny C library to statically link early userspace
-code against, along with some related utilities.  It is BSD licensed.
 
-I use uClibc (http://www.uclibc.org) and busybox (http://www.busybox.net)
-myself.  These are LGPL and GPL, respectively.  (A self-contained initramfs
-package is planned for the busybox 1.3 release.)
-
-In theory you could use glibc, but that's not well suited for small embedded
-uses like this.  (A "hello world" program statically linked against glibc is
-over 400k.  With uClibc it's 7k.  Also note that glibc dlopens libnss to do
-name lookups, even when otherwise statically linked.)
+  http://www.tldp.org/HOWTO/Bootdisk-HOWTO/
+  http://www.tldp.org/HOWTO/From-PowerUp-To-Bash-Prompt-HOWTO.html
+  http://www.linuxfromscratc

Re: [RFC PATCH] rootfs: force mounting rootfs as tmpfs

2018-02-01 Thread Rob Landley

On 01/31/2018 10:22 PM, Mimi Zohar wrote:
> On Wed, 2018-01-31 at 21:03 -0500, Arvind Sankar wrote:
>> On Wed, Jan 31, 2018 at 05:48:20PM -0600, Rob Landley wrote:
>>> On 01/31/2018 04:07 PM, Mimi Zohar wrote:
>>>> On Wed, 2018-01-31 at 13:32 -0600, Rob Landley wrote:
>>>>> (The old "I configured in tmpfs and am using rootfs but I want that
>>>>> rootfs to be ramfs, not tmpfs" code doesn't seem to be a real-world
>>>>> concern, does it?)
>>>>
>>>> I must be missing something.  Which systems don't specify "root=" on
>>>> the boot command line.
>>>
>>> Any system using initrd or initramfs?
>>>
>>
>> Don't a lot of initramfs setups use root= to tell the initramfs which
>> actual root file system to switch to after early boot?

You mean the option that _isn't_ passed through as an environment
variable (the way ROOT= would be), so you have to parse /proc/cmdline
to see if it was passed in?

If you really, really, really, really, really want to double down on the
"no, this is the button, it doesn't do what I thought but I will MAKE it
work" obsession, sure.

> With your patch and specifying "root=tmpfs", dracut is complaining:
> 
> dracut: FATAL: Don't know how to handle 'root=tmpfs'
> dracut: refusing to continue

[googles]... I do not understand why this package exists.

If you're switching to another root filesystem, using a tool that
wikipedia[citation needed] says has no purpose but to switch to another
root filesystem, (so let's reproduce the kernel infrastructure in
userspace while leaving it in the kernel too)... why do you need initramfs
to be tmpfs? You're using it for half a second, then discarding it,
what's the point of it being tmpfs?

Sigh. If people are ok with having rootfs just be tmpfs whenever tmpfs
is configured in, even when you're then going to overmount it with
something else like you're doing, let's just _remove_ the test. If it
can be tmpfs, have it be tmpfs.

Rob

Re: [RFC PATCH] rootfs: force mounting rootfs as tmpfs

2018-01-31 Thread Rob Landley

On 01/31/2018 04:07 PM, Mimi Zohar wrote:
> On Wed, 2018-01-31 at 13:32 -0600, Rob Landley wrote:
>> (The old "I configured in tmpfs and am using rootfs but I want that
>> rootfs to be ramfs, not tmpfs" code doesn't seem to be a real-world
>> concern, does it?)
> 
> I must be missing something.  Which systems don't specify "root=" on
> the boot command line.

Any system using initrd or initramfs?

I have one at https://github.com/landley/mkroot that doesn't, for
example. It's 600 lines of bash that builds simple Linux systems for a
bunch of different architectures, each with a qemu wrapper to boot it to
a shell prompt. And yes, it's using tmpfs for its initramfs, you can
tell because "grep rootfs /proc/mounts" gives a size. That's also where
I tested the patch I sent you.

The root= option specifies the filesystem to mount OVER rootfs. I.E.
it's the fallback root filesystem to mount when initramfs doesn't
contain an executable /init that can become PID 1. If you DO have an
/init in rootfs which the kernel manages to launch as PID 1, the kernel
code never reaches the part that uses the root= argument.

(Look for the call to prepare_namespace() in init/main.c, notice how
it's only called if it can't _already_ find "/init".)

That's why the test I added for initramfs vs initmpfs was "did they
specify root=", because if they did it means they're telling the kernel
what to mount over rootfs, so they're not staying in rootfs. That's what
that argument MEANS. They're telling init/main.c what fallback
filesystem to mount over rootfs _after_ failing to find /init in rootfs,
therefore they're not keeping rootfs as their root filesystem for userspace.

That said, a lot of people don't understand how this works, and they set
root= to things like /dev/ram when using initrd because "we must set
this knob to something, this is something, therefore we must set this
knob to it". The fact that setting root=/dev/random would have the exact same
effect doesn't seem to bother them, they had Done It and It Worked,
therefore it was the Right Thing To Do. QED.

The patch in my last message was me going "alright, if people can't NOT
twiddle the knob, even when doing it breaks things in an immediate and
obvious way, and a big DO NOT TOUCH sign won't dissuade them, just give
the knob an explicit 'off' setting that literally does the same thing as
not touching it at all would".

Your solution was to add a safety catch for the knob, which is edging
into Rube Goldberg territory if you ask me.

> If we want to include and restore xattrs,
> there needs to be a way of using tmpfs.

Yes, using tmpfs for initramfs is useful, that's why I submitted patches
to hook it up back in 2013.

(Personally I find "cat /dev/zero > /filename" _not_ hard locking your
system instantly the most compelling feature. Although I believe what
motivated my initmpfs patches way back when was somebody wanting to
install an rpm into initramfs and the installer failing because ramfs
hasn't got a size so "df" always returns zero.)

> Mimi

Rob

Re: [RFC PATCH] rootfs: force mounting rootfs as tmpfs

2018-01-31 Thread Rob Landley

On 01/30/2018 03:46 PM, Mimi Zohar wrote:
> Commit 16203a7a9422 ("initmpfs: make rootfs use tmpfs when CONFIG_TMPFS
> enabled") introduced using tmpfs as the rootfs filesystem.  The use of
> tmpfs is limited to systems that do not specify "root=" on the boot
> command line.
> 
> Without the check "!saved_root_name[0]", rootfs uses tmpfs.  As there
> must be a valid reason for this check, this patch introduces a new boot
> command line option named "noramfs" to force rootfs to use tmpfs.
> 
> Signed-off-by: Mimi Zohar <zo...@linux.vnet.ibm.com>

How about just:

diff --git a/init/do_mounts.c b/init/do_mounts.c
index 7cf4f6d..af66ede 100644
--- a/init/do_mounts.c
+++ b/init/do_mounts.c
@@ -632,8 +632,8 @@ int __init init_rootfs(void)
 	if (err)
 		return err;
 
-	if (IS_ENABLED(CONFIG_TMPFS) && !saved_root_name[0] &&
-		(!root_fs_names || strstr(root_fs_names, "tmpfs"))) {
+	if (IS_ENABLED(CONFIG_TMPFS) && (!saved_root_name[0] ||
+		!strcmp(saved_root_name, "tmpfs"))) {
 		err = shmem_init();
 		is_tmpfs = true;
 	} else {

(Obviously-signed-off-by: Rob Landley <r...@landley.net>)

I.E. if you somehow just can't stop yourself from specifying root= when
using rootfs, have "root=tmpfs" do what you want.

(The old "I configured in tmpfs and am using rootfs but I want that rootfs
to be ramfs, not tmpfs" code doesn't seem to be a real-world concern, does
it?)

> ---
>  Documentation/admin-guide/kernel-parameters.txt |  2 ++
>  init/do_mounts.c                                | 15 +++++++++++++--
>  2 files changed, 15 insertions(+), 2 deletions(-)

I suppose I should do a documentation update too. Lemme send a proper one
after work...

Rob

P.S. While I'm at it, I've been meaning forever to wire up rootflags= so
you can specify a memory limit other than 50%; I should do that too. And
resend my "make DEVTMPFS_MOUNT apply to initramfs" patch (with the Debian bug
workaround)...

Allnoconfig build still broken on x86-64 in today's git.

2018-01-31 Thread Rob Landley

$ make clean && make allnoconfig && make
  HOSTCC  scripts/basic/fixdep
  HOSTCC  scripts/kconfig/conf.o
  HOSTCC  scripts/kconfig/zconf.tab.o
  HOSTLD  scripts/kconfig/conf
scripts/kconfig/conf  --allnoconfig Kconfig
#
# configuration written to .config
#
Makefile:932: *** "Cannot generate ORC metadata for
CONFIG_UNWINDER_ORC=y, please install libelf-dev, libelf-devel or
elfutils-libelf-devel".  Stop.
$ grep CONFIG_UNWINDER .config
CONFIG_UNWINDER_FRAME_POINTER=y
# CONFIG_UNWINDER_GUESS is not set
$

Still an unnecessary dependency that breaks the build even when it's
configged out.

Rob

Re: [PATCH v2 11/15] gen_init_cpio: add newcx format

2018-01-25 Thread Rob Landley

On 01/24/2018 09:27 PM, Taras Kondratiuk wrote:
> diff --git a/usr/gen_init_cpio.c b/usr/gen_init_cpio.c
> index 7a2a6d85345d..78a47a5bdcb1 100644
> --- a/usr/gen_init_cpio.c
> +++ b/usr/gen_init_cpio.c
> @@ -10,6 +10,7 @@
>  #include 
>  #include 
>  #include 
> +#include 

You're adding an assert? Really?

>   fputs(s, stdout);
> - offset += 110;
> + assert((offset & 3) == 0);
> + offset += cpio_hdr_size;

Why?

Rob

Re: [PATCH v2 01/15] Documentation: add newcx initramfs format description

2018-01-25 Thread Rob Landley

On 01/24/2018 09:27 PM, Taras Kondratiuk wrote:
> diff --git a/Documentation/early-userspace/buffer-format.txt 
> b/Documentation/early-userspace/buffer-format.txt
> index e1fd7f9dad16..d818df4f72dc 100644
> --- a/Documentation/early-userspace/buffer-format.txt
> +++ b/Documentation/early-userspace/buffer-format.txt

> +compressed and/or uncompressed cpio archives; arbitrary amounts
> +zero bytes (for padding) can be added between members.

Missing "of" between amounts and zero. (Yeah it was in the original, but
if you're touching it anyway...)

> +c_xattrs_size  8 bytes        Size of xattrs field
> +
> +Most of the fields match cpio_newc_header except c_mtime that contains
> +microseconds. c_chksum field is dropped.
> +
> +xattr_size is a total size of xattr_entry including 8 bytes of
> +xattr_size. xattr_size has the same hexadecimal ASCII encoding as other
> +fields of cpio header.

xattrs_size or xattr_size?

Total nitpicks, I know. :)

Rob

Re: [PATCH v2 01/15] Documentation: add newcx initramfs format description

2018-01-25 Thread Rob Landley

On 01/25/2018 03:29 AM, Arnd Bergmann wrote:
> On Thu, Jan 25, 2018 at 4:27 AM, Taras Kondratiuk  wrote:
>> Many of the Linux security/integrity features are dependent on file
>> metadata, stored as extended attributes (xattrs), for making decisions.
>> These features need to be initialized during initcall and enabled as
>> early as possible for complete security coverage.
>>
>> Initramfs (tmpfs) supports xattrs, but newc CPIO archive format does not
>> support including them into the archive.
>>
>> This patch describes "extended" newc format (newcx) that is based on
>> newc and has following changes:
>> - extended attributes support
>> - increased size of filesize to support files >4GB.
>> - increased mtime field size to have usec precision and more than
>>   32-bit of seconds.
>> - removed unused checksum field.
>>
>> Signed-off-by: Taras Kondratiuk 
>> Signed-off-by: Mimi Zohar 
>> Signed-off-by: Victor Kamensky 
> 
> Ah nice, I like the extension of the time handling, that certainly
> addresses one of the issues with y2038 that we have previously
> hacked around in an ugly way (interpreting the 32-bit
> number as unsigned).

Taras and I exchanged email like a year ago working out format stuff, so
I don't have any real complaints. My feedback's already worked in, and I
can make toybox cpio support -h newcx as soon as the format's finalized
and I get a free weekend.

That said, I don't think -h newcx should emit (or recognize) the
"TRAILER!!!1!" entry. That's kinda silly in-band signaling for 2018:
files have a length, pipes provide EOF, and each cpiox entry starts with
6 bytes of c_magic anyway. (I stopped toybox from producing the TRAILER
entry back in June, toybox commit 32550751997d, and the kernel consumes
the resulting cpio just fine. All the trailer does is prevent you from
concatenating cpio files, which is a feature multiple people asked me for.)

> However, if this is to become a generally supported format
> for cpio files,

After Joerg Schilling dies (or admits Solaris has) it might even make it
into POSIX.

> could we make it use nanosecond resolution
> instead? The issue that I see with microseconds is that
> storing a file in an archive and extracting it again would
> otherwise keep the mtime stamp /almost/ identical on file
> systems that have nanosecond resolution, but most of
> the time a comparison would indicate that the files are
> not the same.

I have no strong opinion on this? The tmpfs is still going to track
nanoseconds, this is just rounding when it populates them.
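Arnd's round-trip concern can be sketched in a few lines of C. The sketch assumes the archiver truncates the nanosecond part down to microseconds when it writes the header. to_usec() and from_usec() are hypothetical helpers. Only timestamps whose tv_nsec is already a multiple of 1000 come back unchanged.

```c
#include <stdint.h>
#include <time.h>

/* Pack a timestamp into signed 64-bit microseconds, truncating the
 * sub-microsecond part (the assumed archiver behavior). */
static int64_t to_usec(struct timespec ts)
{
  return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

/* Unpack microseconds into a normalized timespec (0 <= tv_nsec < 1e9),
 * flooring correctly for timestamps before 1970. */
static struct timespec from_usec(int64_t us)
{
  struct timespec ts;
  int64_t rem = us % 1000000;

  ts.tv_sec = (time_t)(us / 1000000);
  if (rem < 0) {
    rem += 1000000;
    ts.tv_sec--;
  }
  ts.tv_nsec = (long)(rem * 1000);
  return ts;
}

/* Nonzero if archiving and extracting leaves the timestamp bit-identical. */
static int survives_round_trip(struct timespec ts)
{
  struct timespec back = from_usec(to_usec(ts));

  return back.tv_sec == ts.tv_sec && back.tv_nsec == ts.tv_nsec;
}
```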

> Unfortunately, the range of a 64-bit nanoseconds counter
> is still a bit limited (584 years, or half of that if we make it
> signed). While this is clearly enough for the uses in
> initramfs, it still has a similar problem: someone creating
> a fake timestamp a long time in the past or future on
> a file system would lose information after going through
> cpio.

Hence microseconds. This came up in email when we were talking about
this (like a year ago) and I decided I didn't care. :)

64 bits of microseconds covers about 584 thousand years (+- 292 thousand
if signed), while being accurate enough[1] that making a getpid() syscall
probably takes longer than that on our highest end boxen, let alone doing
a dentry lookup in the VFS (even if it's hot in cache).
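For reference, here is the range arithmetic as a small sketch. It assumes a 365.2425-day mean Gregorian year and computes how many years a full 64-bit counter spans at a given tick rate. Nanoseconds come out around 584 years and microseconds around 584,000 years. A signed field gets half of that in each direction.

```c
/* Mean Gregorian year (365.2425 days) in seconds. */
#define SECS_PER_YEAR 31556952.0

/* Years covered by all 2^64 values of a 64-bit counter ticking at hz.
 * A signed counter covers half of this on each side of the epoch. */
static double span_years(double hz)
{
  return 18446744073709551616.0 / hz / SECS_PER_YEAR;
}
```

span_years(1e9) is roughly 584.55, and span_years(1e6) is roughly 584,554.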

Rob

[1] Is future proofing an issue here? The S-curve of Moore's law started
bending down around Y2K, back when Intel had to recall its 1.13GHz
Pentium III for having overclocked its own chip at the factory, and it's
pretty darn flat these days. Clock speeds first hit 4GHz 15 years ago
and haven't been back; most of the work since 2005 has been about
parallelism, and recent performance improvements are once again going to
Pentium 4 pipeline length levels of absurdity, as Meltdown/Spectre
demonstrates (140 instructions of prefetch!??!?). Maybe Intel will make
9 nanometer manufacturing work, but atomic limits are already an issue.

The problem with 1 second timestamps was you honestly could confuse
"make" about which file was newer once an exec() could complete in the
same second having done real work. That was the motivating issue causing
the change, going to nanoseconds was just the big hammer of "this is
large enough it won't matter again in our lifetimes". But nanosecond
time stamps are recording more jitter than useful information, and that
seems unlikely to change this century?

Commit fc72ae40e303 broke x86-64 build environment.

2018-01-13 Thread Rob Landley

You've made the ORC unwinder part of allnoconfig, which means trying to
build "make ARCH=x86_64 allnoconfig" requires installing a new package
(libelf-dev) or else the build breaks.

What's worse, if I go into menuconfig and switch it back to frame
pointer, the build STILL breaks:

$ make -j 8
Makefile:932: *** "Cannot generate ORC metadata for
CONFIG_UNWINDER_ORC=y, please install libelf-dev, libelf-devel or
elfutils-libelf-devel".  Stop.
$ grep UNWIND .config
# CONFIG_UNWINDER_ORC is not set
CONFIG_UNWINDER_FRAME_POINTER=y
# CONFIG_UNWINDER_GUESS is not set

As far as I can tell, x86-64 doesn't build anymore without libelf-dev.
It's a new hard requirement for the build.

Why?

Rob

powerpc64 kernel panic if you disable CONFIG_PPC_TRANSACTIONAL_MEM?

2017-12-16 Thread Rob Landley

I just added a ppc64 target to https://github.com/landley/mkroot, which
means I built 4.14 with the attached miniconfig and ran it with the
attached qemu command line. It works fine as is, but if you remove
the transactional mem line from the config, the kernel panics instead
of launching a shell prompt:

init[1]: unhandled signal 4 at 10001a04 nip 10001a04
lr 1002ebe8 code 1
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0004

CPU: 0 PID: 1 Comm: init Not tainted 4.14.0 #1
Call Trace:
[ce02fa40] [c04ba730] dump_stack+0xb0/0xf0 (unreliable)
[ce02fa80] [c00602a0] panic+0x138/0x2f8
[ce02fb20] [c006541c] do_exit+0xa9c/0xaa0
[ce02fbe0] [c00654d8] do_group_exit+0x58/0xf0
[ce02fc20] [c0073274] get_signal+0x1c4/0x6b0
[ce02fd10] [c00142a0] do_signal+0x60/0x290
[ce02fe00] [c001461c] do_notify_resume+0x8c/0xd0
[ce02fe30] [c000b630] ret_from_except_lite+0x5c/0x60
Rebooting in 1 seconds..
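The following sketch decodes the status in the panic line. It assumes exitcode uses the same encoding as a wait() status, with the signal number in the low 7 bits. Under that assumption, 0x0004 means init was killed by signal 4, which is SIGILL on Linux. This is consistent with the "unhandled signal 4" message above it. The sketch uses the standard <sys/wait.h> macros; killing_signal() is a hypothetical name.

```c
#include <signal.h>
#include <sys/wait.h>

/* Return the signal that killed a process, given a wait()-style status
 * word, or 0 if it exited normally instead. */
static int killing_signal(int status)
{
  return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```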

Rob


powerpc64le.miniconf
Description: Binary data


qemu-powerpc64le.sh
Description: Bourne shell script

Re: [J-core] [PATCH v5 00/22] sh: LANDISK and R2Dplus convert to device tree

2017-11-17 Thread Rob Landley

On 11/17/2017 04:37 AM, John Paul Adrian Glaubitz wrote:
> Hi there!
> 
> On 07/03/2016 06:46 PM, Yoshinori Sato wrote:
>> SH get devicetree support. But it not working on existing H/W.
>>
>> IO-DATA HDL-U (aka landisk) currentry supported.
>> This H/W like SH7751 evalution board. It's a best to use this as a
>> change base H/W.
>> RTS7751R2Dplus is QEMU-SH4 target. So easy trying.
> 
> This patch series - which would make a huge improvement - is still not
> applied. It would be very useful to be able to test the device tree
> implementation with QEMU.
> 
> Any of the SH maintainers can apply this?

It's Rich's call, but given that it's _from_ one of the SH maintainers,
it sounds to me like it can just go in if it still applies. (If bugfixes
are needed they can go in -rc2 or so, after this merge window.)

Given that qemu serial's been broken for 9 months now, I doubt this
would make anything worse. (I should really check Cedric's qemu fork to
see if he fixed that...)

Rob

Re: Regression: commit da029c11e6b1 broke toybox xargs.

2017-11-04 Thread Rob Landley

On 11/03/2017 08:37 PM, Kees Cook wrote:
> We don't. (In fact, arg copying happens before we've even figured out
> which binfmt is involved.) I lifted it to just before the point of no
> return, but moving it before arg copying looks very hard (which
> contributed to why we went with the implementation we did).
> 
>> So it's pretty painful to make the limits different for suid and
>> non-suid binaries.
> 
> I would agree.

I think I know what to implement for toybox now: xargs should trust
libc's sysconf() to provide the common-case starting limit (subtracting
env space) then implement the fallback pipe-from-child thing to
iteratively try half the argument list when that fails.
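Here is a minimal sketch of that starting limit. It assumes "env space" is counted the way exec() lays it out: every environment string with its NUL, one pointer slot per string, and a NULL terminator. env_space() and xargs_budget() are hypothetical names, not toybox's actual code.

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>

extern char **environ;

/* Bytes the environment will occupy in the new process: each string plus
 * its NUL, plus one pointer per entry and the terminating NULL pointer. */
static size_t env_space(char **envp)
{
  size_t bytes = sizeof(char *);  /* envp[] NULL terminator */

  for (; *envp; envp++) bytes += strlen(*envp) + 1 + sizeof(char *);
  return bytes;
}

/* Starting argv budget for xargs: libc's _SC_ARG_MAX minus the environment.
 * Returns 0 if sysconf() can't say or the environment already uses it all. */
static size_t xargs_budget(void)
{
  long max = sysconf(_SC_ARG_MAX);
  size_t env = env_space(environ);

  if (max <= 0 || (size_t)max <= env) return 0;
  return (size_t)max - env;
}
```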

Elliott's even cc'd so he can update bionic's sysconf for the new 10 meg
thing from the title commit. :)

Rob

Re: Regression: commit da029c11e6b1 broke toybox xargs.

2017-11-04 Thread Rob Landley

Correcting Elliott's email to Google, not Gmail. (Sorry, I'm in Tokyo for
work this month, almost over the jetlag...)

On 11/03/2017 08:07 PM, Linus Torvalds wrote:
> On Fri, Nov 3, 2017 at 4:58 PM, Rob Landley  wrote:
>> On 11/02/2017 10:40 AM, Linus Torvalds wrote:
>>
>> But it boils down to "got the limit wrong, the exec failed after the
>> fork(), dynamic recovery from which is awkward so I'm trying to figure
>> out the right limit".

Sounds later like dynamic recovery is what you recommend. (Awkward
doesn't mean I can't do it.)

> I suspect we _do_ have to raise that limit, because clearly this is a
> regression, but I absolutely _detest_ the fact that a stupid
> _embedded_ OS thinks that it should have a bigger stack limit than
> stuff that runs on supercomputers.
> 
> That just makes me go "there's something seriously wrong".

This was me trying not to assume what other people will do. I think
Android's default is still 8MB (it was in M), but my test systems for
this are literally on the other side of the planet right now.

Google's internal frame of reference is very different from mine. I got
pointed at a podcast (Android Developers Backstage #53) where Elliott
and another Android dev talked about toybox for a few minutes in the
second half, and they shared a chuckle over my complaint that
downloading AOSP takes 150 gigabytes _before_ it tries to build
anything, and that only the largest machine I own can build it at all
(and that very slowly). It was just so alien to them that this would be
a _problem_...

> For something like "xargs", I'm actually really saddened by the stupid
> decision to think it's a single value. The whole and *only* reason for
> xargs to exist is to just get it right,

Which is what I was trying very hard to do. :(

> and the natural thing for
> xargs to do would be to not ask, but simply try to do the whole thing,
> and if you get E2BIG, you decide to split it in half or something
> until it works. That kind of approach would just make it work
> _without_ depending on some magic value.
> 
> The fact that apparently xargs is too stupid to do that, and instead
> requires _SC_ARG_MAX to magically give it the "One True Value(tm)" is
> just all kinds of crap.

I'm writing this xargs, I can _make_ it do that, it just requires a pipe
back from the forked child to return status and is either slow (remove
one argument at a time) or inaccurate (cut it in half, result coulda
been longer). Either way xargs still needs an internal limit or "yes |
xargs" will try to fill all memory before ever calling exec().
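One common way to get that status back is a close-on-exec pipe. If exec() succeeds, the kernel closes the write end and the parent reads EOF. If exec() fails, the child writes its errno down the pipe before exiting. The following is a minimal sketch under that assumption; spawn_report() is a hypothetical name. With it, the parent can tell E2BIG apart from the command itself failing.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv in a child and wait for it. Returns 0 if exec() succeeded (the
 * command's own wait status goes in *status), the errno exec() failed with
 * (such as E2BIG or ENOENT), or -1 if pipe()/fcntl()/fork() failed. */
static int spawn_report(char **argv, int *status)
{
  int fds[2], err = 0, st = 0;
  ssize_t n;
  pid_t pid;

  if (pipe(fds)) return -1;
  /* Close-on-exec: a successful exec closes the write end for us. */
  if (fcntl(fds[1], F_SETFD, FD_CLOEXEC) < 0) goto fail;
  pid = fork();
  if (pid < 0) goto fail;

  if (pid == 0) {
    close(fds[0]);
    execvp(argv[0], argv);
    err = errno;                       /* only reached if exec failed */
    n = write(fds[1], &err, sizeof(err));
    (void)n;
    _exit(127);
  }

  close(fds[1]);
  do n = read(fds[0], &err, sizeof(err));
  while (n < 0 && errno == EINTR);
  close(fds[0]);
  if (n != (ssize_t)sizeof(err)) err = 0;   /* EOF: the exec went through */

  while (waitpid(pid, &st, 0) < 0 && errno == EINTR)
    continue;
  if (status) *status = st;
  return err;

fail:
  close(fds[0]);
  close(fds[1]);
  return -1;
}
```

The pipe()+fcntl() pair isn't atomic. That is fine for a single-threaded xargs, but a threaded program would want pipe2() with O_CLOEXEC instead.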

The reason I wanted to support "exactly as big as possible" is that
calling a command as one invocation vs multiple invocations can change
behavior. Once you've decided to split, how BIG you split is much less
important, so falling back to an arbitrary limit would be fine except
I'd still have to check the stack size to see if it's _lower_ than that
arbitrary limit. (If you set the stack ulimit to 128k, which nommu
systems may wanna do, then the exec limit is 32k. It can be _anything_.)

And this limit is shared with environment variables so the problem might
be that your environment's pathological and you can't run this command
line with even one argument because envp ate all the space, but that's
another story and the user can wash it through env -i to make it work.
Except:

  $ env -i {A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P}=$(printf '%0*d' 130657) \
  env | wc -c

Says 2090560 (of 2097152), but 130658 says argument list too long when
it's only 16 more bytes of the ~6k we should have left (envp[]=17*8,
argv[]=2*8, argv[0]=4...), and it sounds like you're saying I should
just stop _trying_ to figure out exact up-front measurements.
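Here is that budget spelled out. The calculation assumes 8-byte pointers and the default 8 MiB stack limit, so 2097152 = 8 MiB / 4. Each variable string is a one-letter name, "=", 130657 zeros and a NUL. That is the same 130660 bytes per line that wc -c counts, with a newline in place of the NUL.

```latex
\begin{aligned}
\text{env strings} &= 16 \times (1 + 1 + 130657 + 1) = 2090560 \\
\text{envp pointers} &= 17 \times 8 = 136 \\
\text{argv pointers} &= 2 \times 8 = 16 \\
\text{argv[0]} &= 3 + 1 = 4 \\
\text{total} &= 2090716 \\
2097152 - 2090716 &= 6436
\end{aligned}
```

Bumping the width to 130658 adds one byte to each of the 16 strings, 16 bytes in all. That is far inside the 6436 bytes of apparent headroom.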

So stacksize /4, then split in half each time, and if it strips down to
one argument that can't run, have an error message for that. Ok.
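The split-in-half plan can be sketched as a loop around a run hook. A hypothetical run_fn callback stands in for the fork/exec step and returns exec()'s errno. The sketch retries the whole remainder each round. It also leaves out the initial stack-size/4 cap on how much input to gather; a real xargs would apply that cap first.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical hook: run args[0..n) as one command line; return 0, or the
 * errno exec() failed with (E2BIG means "too big, try fewer"). */
typedef int (*run_fn)(char **args, size_t n, void *ctx);

/* Run all n args in batches, halving a batch each time it hits E2BIG.
 * Returns 0 when everything ran, E2BIG if even a single argument can't
 * be exec'd (the case that needs its own error message), or any other
 * errno run() reports. */
static int run_halving(char **args, size_t n, run_fn run, void *ctx)
{
  while (n) {
    size_t batch = n;
    int err;

    while ((err = run(args, batch, ctx)) == E2BIG && batch > 1)
      batch /= 2;
    if (err) return err;
    args += batch;
    n -= batch;
  }
  return 0;
}
```

For example, if the kernel would accept at most 3 arguments, 10 arguments would run as batches of 2, 2, 3 and 3.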

> Oh well. Enough ranting.
> 
> What _is_ the stack limit when using toybox? Is it just entirely unlimited?

Answer to second question on Ubuntu 14.04:

  landley@driftwood:~/linux/linux/fs$ ulimit -s 9
  landley@driftwood:~/linux/linux/fs$ ulimit -s
  9

Anybody can call ulimit to expand it as a normal user (up to the hard
limit), so effectively yes it is unlimited. I have no IDEA what my users
are gonna do. (If they do something stupid it's their fault, but I don't
necessarily get to say what stupid is from here.)
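The shell's ulimit -s (which takes KiB) sits on top of getrlimit()/setrlimit() with RLIMIT_STACK. An unprivileged process may raise its soft limit as far as the hard limit. Here is a minimal sketch; raise_stack_to_hard() is a hypothetical name.

```c
#include <sys/resource.h>

/* Raise this process's soft stack limit as far as an unprivileged user may:
 * up to the hard limit (which may well be RLIM_INFINITY). Returns the new
 * soft limit, or 0 on failure. */
static rlim_t raise_stack_to_hard(void)
{
  struct rlimit rl;

  if (getrlimit(RLIMIT_STACK, &rl)) return 0;
  rl.rlim_cur = rl.rlim_max;
  if (setrlimit(RLIMIT_STACK, &rl)) return 0;
  return rl.rlim_cur;
}
```

glibc and bionic derive sysconf(_SC_ARG_MAX) from this soft limit. Whatever the user picks here therefore feeds straight into xargs' starting estimate.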

Answer to first: the default is whatever I inherited from the Android
fork du jour it's running on.

The Google developers seem to be drinking from a firehose of
contributions from the half-dozen phone companies trying to get code
upstream. Elliott presumably says no to what he can, but they're hugely
outnumbered and there are politics I'm only dimly aware of (never having
worked for Google and only having met Elliott for lunch once a couple
years ago when I was

Re: Regression: commit da029c11e6b1 broke toybox xargs.

2017-11-03 Thread Rob Landley

On 11/02/2017 10:40 AM, Linus Torvalds wrote:
> On Wed, Nov 1, 2017 at 9:28 PM, Linus Torvalds
>  wrote:
>>
>> Behavior changed. Things that test particular limits will get different
>> results. That's not breakage.
>>
>> Did an actual user application or script break?

Only due to getting the limit wrong. The actual failure's in the Android
internal bugzilla I've never been able to read:

http://lists.landley.net/pipermail/toybox-landley.net/2017-September/009167.html

But it boils down to "got the limit wrong, the exec failed after the
fork(), dynamic recovery from which is awkward so I'm trying to figure
out the right limit".

> Ahh. I should have read that email more carefully. If xargs broke,
> that _will_ break actual scripts, yes. Do you actually set the stack
> limit to insane values? Anybody using toybox really shouldn't be doing
> 32MB stacks.

Toybox has been Android's default command line since M (Android went
64-bit in L), and the Pixel 2 phone has 4 gigs of RAM. My goal with
toybox is to turn Android into a self-hosting development environment no
longer cross-compiled from a PC (http://landley.net/talks/celf-2013.txt),
so I'm trying to implement a command line that can run the entire AOSP
build.

I.e., I have no idea what people will do with it, and I try not to get
in their way.

My problem here is it's hard to figure out what exec size the limit
_is_. There's a sysconf(_SC_ARG_MAX) which bionic and glibc are
currently returning as stack_limit/4, which is now too big and exec()
will error out after the fork. Musl is returning the 131072 limit from
2011-ish, meaning "/bin/echo $(printf '%0*d' 131071)" works but
"printf '%0*d' 131071 | xargs" fails, an inconsistency I was trying to
avoid. Maybe I don't have that luxury...

Each argument has its own limit separate from the argv+envp total limit,
but there's only one "size" you can query through sysconf, so the
querying API is insufficient at the design level.
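The following sketch checks both limits up front. It assumes 4 KiB pages, where the kernel's per-string MAX_ARG_STRLEN is 32 pages: 131072 bytes including the NUL, hence 131071 visible bytes. On 64 KiB-page systems the limit would be larger. argv_fits() and ARG_STRLEN_MAX are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Per-string cap, assuming 4 KiB pages: 32 pages, counting the NUL. */
#define ARG_STRLEN_MAX 131072UL

/* Check both limits for argv: each string must fit under the per-argument
 * cap, and all strings plus pointer slots must fit in total_max (for
 * example sysconf(_SC_ARG_MAX) minus the environment's share).
 * Returns 0 if it fits, 1 if one argument is too long, 2 if the total is. */
static int argv_fits(char **argv, size_t total_max)
{
  size_t total = sizeof(char *);  /* NULL terminator */

  for (; *argv; argv++) {
    size_t len = strlen(*argv) + 1;

    if (len > ARG_STRLEN_MAX) return 1;
    total += len + sizeof(char *);
  }
  return total > total_max ? 2 : 0;
}
```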

Meanwhile under bash you can allocate and dirty 256 megabytes from the
command line with:

  echo $(printf '%0*d' $((1<<28)))

That works because echo is a shell builtin, so there's no actual exec. (And if
https://sourceware.org/bugzilla/show_bug.cgi?id=17829 ever gets fixed
it'll go back to allowing INT_MAX.)

POSIX is its usual helpful self: read conservatively,
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/xargs.html
says to break the line at 2048 bytes.

> So I still do wonder if this actually breaks anything real, or just a
> test-suite or something?

I've cc'd Elliott, who would know. (He's the Android base OS userspace
maintainer, he knows everything. Or can at least decode
http://b/65818597 .)

But this just broke my _fix_, not the earlier deployed stuff. I removed
the size measuring code when the 131072 limit went away, the bug was
there's a new limit I need to not hit, I tried to figure out what the
limit is now, confirmed that the various libc implementations don't
agree, then the actual kernel limit changed again while I was looking at it.

>Linus

Should I just go back to hardwiring in 131072? It's no _less_ arbitrary
than 10 megs, and it sounds like getting it _right_ is unachievable.

Thanks,

Rob

Regression: commit da029c11e6b1 broke toybox xargs.

2017-11-01 Thread Rob Landley

Toybox has been trying to figure out how big an xargs is allowed to be
for a while:


http://lists.landley.net/pipermail/toybox-landley.net/2017-October/009186.html

We're trying to avoid the case where you can run something from the
command line, but not through xargs. In theory this limit is
sysconf(_SC_ARG_MAX) which on bionic and glibc returns 1/4 RLIMIT_STACK
(in accordance with the prophecy fs/exec.c function
get_arg_page()), but that turns out to be too simple. There's also a
131071 byte limit on each _individual_ argument, which I think I've
tracked down to fs/exec.c function setup_arg_pages() doing:

stack_expand = 131072UL; /* randomly 32*4k (or 2*64k) pages */

And then it worked under Ubuntu 14.04 but not current kernels. Why?
Because the above commit from Kees Cook broke it, by taking this:

include/uapi/linux/resource.h:
/*
 * Limit the stack by to some sane default: root can always
 * increase this limit if needed..  8MB seems reasonable.
 */
#define _STK_LIM (8*1024*1024)

And hardwiring in a random adjustment as a "640k ought to be enough for
anybody" constant on TOP of the existing RLIMIT_STACK/4 check. Without
even adjusting the "oh of course root can make this bigger, this is just
a default value" comment where it's #defined.
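This sketch assumes the check in fs/exec.c get_arg_page() after this commit takes the smaller of RLIMIT_STACK/4 and 3/4 of _STK_LIM (6 MiB). Treat that formula as an assumption to verify against the source. exec_arg_limit() and STK_LIM are hypothetical names.

```c
#include <sys/resource.h>

#define STK_LIM (8UL * 1024 * 1024)   /* mirrors the kernel's _STK_LIM */

/* Byte budget for argv+envp strings under the assumed post-da029c11e6b1
 * rule: a quarter of the soft stack limit, but never more than 3/4 of
 * _STK_LIM (6 MiB). */
static rlim_t exec_arg_limit(rlim_t stack_cur)
{
  rlim_t quarter = stack_cur / 4;
  rlim_t cap = (rlim_t)STK_LIM / 4 * 3;

  return quarter < cap ? quarter : cap;
}
```

With a 32 MiB stack, for example, a sysconf() that reports RLIMIT_STACK/4 would promise 8 MiB, while this check stops at 6 MiB. That is the kind of mismatch that makes exec() fail after xargs has already forked.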

Look, if you want to cap RLIMIT_STACK for suid binaries, go for it. The
existing code will notice and adapt. But this new commit is crazy and
arbitrary and introduces more random version dependencies. (How is
sysconf() supposed to know the value? With an #if/else staircase based
on kernel version in every libc?)
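
A minimal userspace sketch of the limit calculation described above, for illustration only. It assumes that after commit da029c11e6b1 the kernel caps the argv+envp strings at the smaller of RLIMIT_STACK/4 and 3/4 of _STK_LIM, and that each individual string (including its NUL) must fit in 131072 bytes. The helper names arg_budget(), current_arg_budget(), arg_cost() and arg_fits() are hypothetical, not toybox or libc functions.

```c
#include <assert.h>
#include <string.h>
#include <sys/resource.h>

/* Per-string limit: 32 pages of 4k, including the terminating NUL. */
#define ARG_STRLEN_MAX 131072UL
/* _STK_LIM from include/uapi/linux/resource.h: 8 MiB. */
#define STK_LIM (8UL * 1024 * 1024)

/* Assumption: argv+envp strings may use at most
 * min(RLIMIT_STACK / 4, 3/4 of _STK_LIM) on post-da029c11e6b1 kernels. */
static unsigned long arg_budget(unsigned long stack_rlim)
{
    unsigned long cap = STK_LIM / 4 * 3;

    return stack_rlim / 4 < cap ? stack_rlim / 4 : cap;
}

/* Same calculation for the calling process's current RLIMIT_STACK. */
static unsigned long current_arg_budget(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return 131072UL;             /* conservative fallback: old hardwired value */
    if (rl.rlim_cur == RLIM_INFINITY)
        return arg_budget(~0UL);     /* unlimited stack: only the cap applies */
    return arg_budget((unsigned long)rl.rlim_cur);
}

/* Conservative cost of one string: its bytes, its NUL, and its pointer. */
static size_t arg_cost(const char *s)
{
    assert(s != NULL);
    return strlen(s) + 1 + sizeof(char *);
}

/* Does a single argument fit the per-string limit? */
static int arg_fits(const char *s)
{
    assert(s != NULL);
    return strlen(s) + 1 <= ARG_STRLEN_MAX;
}
```

An xargs built this way would keep a running total of arg_cost() for each string it adds (environment included) and run the command before the total exceeds current_arg_budget(). Counting the pointer slot in arg_cost() is a deliberately conservative choice, not something the text above claims the kernel does.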

Please revert it,

Rob

[PATCH 1/1] Change ping_group_range default to what Android's init script sets.

2017-10-30 Thread Rob Landley

From: Rob Landley <r...@landley.net>

See message from the Android "native tools and libraries team" lead
(I.E. the maintainer of bionic, adb, toolbox, etc) at
http://lists.landley.net/pipermail/toybox-landley.net/2017-July/009103.html

Signed-off-by: Rob Landley <r...@landley.net>
---

 net/ipv4/af_inet.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index e31108e..5b39a96 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1712,12 +1712,8 @@ static __net_init int inet_init_net(struct net *net)
 	net->ipv4.ip_local_ports.range[1] =  60999;
 
 	seqlock_init(&net->ipv4.ping_group_range.lock);
-	/*
-	 * Sane defaults - nobody may create ping sockets.
-	 * Boot scripts should set this to distro-specific group.
-	 */
-	net->ipv4.ping_group_range.range[0] = make_kgid(&init_user_ns, 1);
-	net->ipv4.ping_group_range.range[1] = make_kgid(&init_user_ns, 0);
+	net->ipv4.ping_group_range.range[0] = make_kgid(&init_user_ns, 0);
+	net->ipv4.ping_group_range.range[1] = make_kgid(&init_user_ns, 2147483647);
 
 	/* Default values for sysctl-controlled parameters.
 	 * We set them here, in case sysctl is not compiled.
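
For context, the default being changed is what userspace sees in /proc/sys/net/ipv4/ping_group_range: a pair of group IDs, and only callers whose group IDs fall inside that inclusive range may create ICMP datagram ("ping") sockets with socket(AF_INET, SOCK_DGRAM, IPPROTO_ICMP). A minimal sketch of the range check, assuming the file holds two whitespace-separated numbers. gid_may_ping() is a hypothetical helper that models a single group ID, while the kernel checks the caller's group IDs in general.

```c
#include <assert.h>
#include <stdio.h>

/* Returns 1 if gid lies in the inclusive "low high" range read from
 * /proc/sys/net/ipv4/ping_group_range, 0 otherwise or on parse error.
 * The old default "1 0" has low > high, so it matches no group. */
static int gid_may_ping(const char *range, unsigned long gid)
{
    unsigned long low, high;

    assert(range != NULL);
    if (sscanf(range, "%lu %lu", &low, &high) != 2)
        return 0;
    return low <= gid && gid <= high;
}
```

With the old "1 0" default no group matches, which is what "nobody may create ping sockets" meant; with "0 2147483647" every group matches.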

Re: [PATCH v3] Make initramfs honor CONFIG_DEVTMPFS_MOUNT

2017-09-19 Thread Rob Landley

On 09/17/2017 08:51 AM, Henrique de Moraes Holschuh wrote:
> On Sat, 16 Sep 2017, Rob Landley wrote:
>> So, I added a workaround with a printk in hopes of embarrassing them into
>> someday fixing it.
> 
> Oh, it will be fixed in Debian alright.

Cool!

But part of the problem is people upgrade the kernel on existing
deployed root filesystems, some of which are a fork off of a fork off of
debian, so we won't exhaust the broken userspace for probably a couple
years.

I'd put it in feature-removal-schedule.txt but Linus zapped that, so...

> I am just waiting for the issue to
> settle a bit to file the bug reports, or maybe even send in the Debian
> patches myself (note that I am not responsible for the code in question,
> so I am not wearing a brown paperbag at this time).  Even if I didn't do
> it, there are several other Debian Developers reading LKML that could do
> it (provided they noticed this specific thread and are aware of the
> situation) :p

There was a previous thread last merge window they didn't notice. I was
hoping the warning would be obvious enough. :)

> I can even push for the fixes to be accepted into the stable and
> oldstable branches of Debian, but that can take anything from a few
> weeks to several months, due to the way our stable releases work.  But
> it would eventually happen.
> 
> Whether such fixes will ever make it to LTS branches, especially
> Ubuntu's, *that* I don't know.

I have no idea what that powerpc system was, the guy didn't say...

Rob

Re: [PATCH v3] Make initramfs honor CONFIG_DEVTMPFS_MOUNT

2017-09-16 Thread Rob Landley

On 09/14/2017 04:17 AM, Christophe LEROY wrote:
> Le 14/09/2017 à 01:51, Rob Landley a écrit :
>> From: Rob Landley <r...@landley.net>
>>
>> Make initramfs honor CONFIG_DEVTMPFS_MOUNT, and move
>> /dev/console open after devtmpfs mount.
>>
>> Add workaround for Debian bug that was copied by Ubuntu.
> 
> Is that a bug only for Debian ? Why ?

Look down, specifically this bit:

>> v2 discussion:
>> http://lkml.iu.edu/hypermail/linux/kernel/1705.2/05611.html

That's some discussion of version 2 of this patch, which was merged for
a while last dev cycle, then backed out again because it triggered the
same bug in a number of system init scripts:

  http://lkml.iu.edu/hypermail/linux/kernel/1705.2/07072.html
  http://lkml.iu.edu/hypermail/linux/kernel/1705.3/01182.html
  http://lkml.iu.edu/hypermail/linux/kernel/1705.3/01505.html
  http://lkml.iu.edu/hypermail/linux/kernel/1705.3/01320.html

All of whom copied the broken error "recovery" path from Debian. If they
checked whether it was already mounted, or didn't _blank_ the /dev
directory in response to mounting the exact same filesystem over itself
giving -EBUSY, the system would work fine. Heck, if you built a kernel
with a static /dev in initramfs and no devtmpfs configured in, the
script would break things exactly the same way. The breakage is that the
script takes a hammer to a perfectly functional /dev directory and then
continues the boot with an empty /dev. That's bonkers.
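
The "check whether it was already mounted" fix suggested above could look roughly like this, sketched in C against the /proc/mounts format (device, mount point, filesystem type, options, ...). This is an illustration only, not Debian's initramfs script (which is shell); the function names are hypothetical, and escaped characters such as \040 in mount points are not decoded.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if one /proc/mounts line describes devtmpfs mounted on dir.
 * Only the filesystem type matters; the device field may say "udev". */
static int is_devtmpfs_on(const char *line, const char *dir)
{
    char dev[256], mnt[4096], type[64];

    assert(line != NULL && dir != NULL);
    if (sscanf(line, "%255s %4095s %63s", dev, mnt, type) != 3)
        return 0;
    return !strcmp(mnt, dir) && !strcmp(type, "devtmpfs");
}

/* Scan an open mounts file for an existing devtmpfs mount on dir. */
static int devtmpfs_already_mounted(FILE *mounts, const char *dir)
{
    char line[8192];

    while (fgets(line, sizeof(line), mounts))
        if (is_devtmpfs_on(line, dir))
            return 1;
    return 0;
}
```

A script following this logic would skip its own devtmpfs mount when the check succeeds, and in no case overmount /dev with an empty tmpfs.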

> Why should a Debian bug be fixed by a workaround in the mainline kernel ?

That was my argument last time, and the answer was "Breaking userspace
is bad, mmmkay." Even when userspace is doing something REALLY OBVIOUSLY
STUPID and it is _clearly_ their fault, as long as they got there first
they've established the status quo and it doesn't matter how silly it is.

This was explicitly stated to me here:

  http://lkml.iu.edu/hypermail/linux/kernel/1705.3/03292.html

I.E. don't argue with me, argue with him. :)

So, I added a workaround with a printk in hopes of embarrassing them into
someday fixing it.

Rob

[PATCH v3] Make initramfs honor CONFIG_DEVTMPFS_MOUNT

2017-09-13 Thread Rob Landley

From: Rob Landley <r...@landley.net>

Make initramfs honor CONFIG_DEVTMPFS_MOUNT, and move
/dev/console open after devtmpfs mount.

Add workaround for Debian bug that was copied by Ubuntu.

Signed-off-by: Rob Landley <r...@landley.net>
---

v2 discussion: http://lkml.iu.edu/hypermail/linux/kernel/1705.2/05611.html

 drivers/base/Kconfig | 14 ++++----------
 fs/namespace.c       | 14 ++++++++++++++
 init/main.c          | 15 +++++++++------
 3 files changed, 27 insertions(+), 16 deletions(-)

diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index f046d21..97352d4 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -48,16 +48,10 @@ config DEVTMPFS_MOUNT
 	bool "Automount devtmpfs at /dev, after the kernel mounted the rootfs"
 	depends on DEVTMPFS
 	help
-	  This will instruct the kernel to automatically mount the
-	  devtmpfs filesystem at /dev, directly after the kernel has
-	  mounted the root filesystem. The behavior can be overridden
-	  with the commandline parameter: devtmpfs.mount=0|1.
-	  This option does not affect initramfs based booting, here
-	  the devtmpfs filesystem always needs to be mounted manually
-	  after the rootfs is mounted.
-	  With this option enabled, it allows to bring up a system in
-	  rescue mode with init=/bin/sh, even when the /dev directory
-	  on the rootfs is completely empty.
+	  Automatically mount devtmpfs at /dev on the root filesystem, which
+	  lets the system come up in rescue mode with [rd]init=/bin/sh.
+	  Override with devtmpfs.mount=0 on the commandline. Initramfs can
+	  create a /dev dir as needed, other rootfs needs the mount point.
 
 config STANDALONE
 	bool "Select only drivers that don't need compile-time external firmware"
diff --git a/fs/namespace.c b/fs/namespace.c
index f8893dc..06057d7 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2417,7 +2417,21 @@ static int do_add_mount(struct mount *newmnt, struct path *path, int mnt_flags)
 	err = -EBUSY;
 	if (path->mnt->mnt_sb == newmnt->mnt.mnt_sb &&
 	    path->mnt->mnt_root == path->dentry)
+	{
+		if (IS_ENABLED(CONFIG_DEVTMPFS_MOUNT) &&
+		    !strcmp(path->mnt->mnt_sb->s_type->name, "devtmpfs"))
+		{
+			/* Debian's kernel config enables DEVTMPFS_MOUNT, then
+			   its initramfs setup script tries to mount devtmpfs
+			   again, and if the second mount-over-itself fails
+			   the script overmounts a tmpfs on /dev to hide the
+			   existing contents, then boot fails with empty /dev. */
+			printk(KERN_WARNING "Debian bug workaround for devtmpfs overmount.");
+
+			err = 0;
+		}
 		goto unlock;
+	}
 
 	err = -EINVAL;
 	if (d_is_symlink(newmnt->mnt.mnt_root))
diff --git a/init/main.c b/init/main.c
index 0ee9c686..0d8e5ec 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1065,12 +1065,6 @@ static noinline void __init kernel_init_freeable(void)
 
 	do_basic_setup();
 
-	/* Open the /dev/console on the rootfs, this should never fail */
-	if (sys_open((const char __user *) "/dev/console", O_RDWR, 0) < 0)
-		pr_err("Warning: unable to open an initial console.\n");
-
-	(void) sys_dup(0);
-	(void) sys_dup(0);
 	/*
 	 * check if there is an early userspace init.  If yes, let it do all
 	 * the work
@@ -1082,8 +1076,17 @@ static noinline void __init kernel_init_freeable(void)
 	if (sys_access((const char __user *) ramdisk_execute_command, 0) != 0) {
 		ramdisk_execute_command = NULL;
 		prepare_namespace();
+	} else if (IS_ENABLED(CONFIG_DEVTMPFS_MOUNT)) {
+		sys_mkdir("/dev", 0755);
+		devtmpfs_mount("/dev");
 	}
 
+	/* Open the /dev/console on the rootfs, this should never fail */
+	if (sys_open((const char __user *) "/dev/console", O_RDWR, 0) < 0)
+		pr_err("Warning: unable to open an initial console.\n");
+	(void) sys_dup(0);
+	(void) sys_dup(0);
+
 	/*
 	 * Ok, we have completed the initial bootup, and
 	 * we're essentially up and running. Get rid of the

Re: Patch 0727d35de ("Make initramfs honor CONFIG_DEVTMPFS_MOUNT") breaks boot

2017-09-12 Thread Rob Landley

On 09/11/2017 06:45 AM, Petr Mladek wrote:
>> Except for the second printk line: If you boot with rdinit=/bin/hush
>> then the first time you mount -t devtmpfs /dev /dev after boot (with
>> CONFIG_DEVTMPFS_MOUNT already having mounted it), you get the 0 return
>> value but the last printk() doesn't output? The second and later times
>> you repeat it, both printk() lines are output.
>>
>> What's up with printk?
>>
>> (I added the second printk because the _first_ one wasn't outputting
>> that first time. Something is happening to flush the printk() queue
>> instead of writing it out?)
> 
> You need to add "\n" at the end of the line. Otherwise, it expects
> that the message would continue and puts it into a cont buffer.
> The buffer is flushed only when another non-continuous message
> is added.

Ah. The next one flushes the previous one, meaning when I repeat the
command I get the output I expected the second time but I'm seeing the
_previous_ instance of it, not the current one.
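
A toy userspace model of the behavior Petr describes, for illustration only. Text without a trailing "\n" is held in a continuation buffer and is only written out when the next message arrives. It is not the kernel's printk implementation; toy_printk(), cont and out are hypothetical names, and unlike real printk every following message is treated as a non-continuation message.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static char cont[256];   /* held-back partial line */
static char out[1024];   /* stands in for the console */

static void toy_printk(const char *msg)
{
    size_t len;

    assert(msg != NULL);
    if (cont[0]) {       /* a new message flushes the held one as a full line */
        strncat(out, cont, sizeof(out) - strlen(out) - 1);
        strncat(out, "\n", sizeof(out) - strlen(out) - 1);
        cont[0] = '\0';
    }
    len = strlen(msg);
    if (len && msg[len - 1] == '\n')
        strncat(out, msg, sizeof(out) - strlen(out) - 1);  /* complete line */
    else
        snprintf(cont, sizeof(cont), "%s", msg);           /* hold until next call */
}
```

Calling toy_printk("workaround") once leaves out empty, and calling it again emits the first instance, which matches seeing the _previous_ message each time the mount command is repeated.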

> This problem is more visible since the commit 5c2992ee7fd8a29d0412
> ("printk: remove console flushing special cases for partial buffered
> lines").

Gotcha. My bad.

Thanks,

Rob

Re: execve(NULL, argv, envp) for nommu?

2017-09-12 Thread Rob Landley

On 09/12/2017 06:30 AM, Geert Uytterhoeven wrote:
> Hi Rob,
> 
> On Tue, Sep 12, 2017 at 12:48 PM, Rob Landley <r...@landley.net> wrote:
>> Your stack has pointers. Your heap has pointers. Your data and bss (once
>> initialized) can have pointers. These pointers can be in the middle of
>> malloc()'ed structures so no ELF table anywhere knows anything about
>> them. A long variable containing a value that _could_ point into one of
>> these ranges isn't guaranteed to _be_ a pointer, in which case adjusting
>> it is breakage. Tracking them all down and fixing up just the right ones
>> without missing any or changing data you shouldn't is REALLY HARD.
> 
> Hence (make the compiler) never store pointers, only offsets relative to a
> base register. So after making copies of stack, data/bss, and heap, all you
> need to do is adjust these base registers for the child process.
> Nothing in main memory needs to be modified.

Ok, I'll bite. How do you set a signal handler under this regime, since
that needs to pass a function pointer to the syscall? Have a different
function pointer type for when you want a real pointer instead of an
offset pointer? Perhaps label them "near" and "far" pointers, since
there's precedent for that back under DOS?

When you call printf(), how does it accept both a "string constant"
living in rodata and a char array on the stack? Two printf functions
with different argument types? If it _does_ take an actual memory
address rather than an offset (which wouldn't always be relative to the
same segment), then you've written pointers to the stack...

You're also requiring static linking: shared libraries work just fine
with fdpic, but under your segment:offset addressing system all text has
to be relative to the same code segment.

Plus there's still the "fork() off of mozilla" problem that you may copy
lots of data just to immediately discard it as the common case (unless
you'd still use vfork() for most things), and you still need contiguous
blocks of memory for each segment (nommu is vulnerable to fragmentation,
increasingly so as the system stays up longer) so your fork() will fail
where vfork() succeeds. But that just makes it really slow and
unreliable, rather than requiring a large rewrite of the C language.

> Text accesses can be PC-relative => nothing to adjust.
> Local variable accesses are stack-relative => nothing to adjust.
> Data/bss accesses can be relative to a reserved register that stores the
> data base address => only adjust the base register, nothing in RAM to adjust.

Does this compiler setup you're describing actually exist?

Instead of making a minor adjustment to one system call, it's better to
extensively rewrite compilers and calling conventions, ignoring the way
C traditionally treats strings and arrays as pointers where pointers
into data, bss, heap, and stack are all used interchangeably...

> Heap accesses can be relative to a reserved register that stores the heap
> base address => only adjust the base register, nothing in RAM to adjust.

Query: if you implement a linked list ala:

struct blah {
  struct blah *next;
  char *key, *value;
};

If next points to a malloc(), key is a constant string in rodata, and
value was strchr(getenv(key), '=')+1 (with appropriate error checking of
course), how does your compiler know which segment each pointer in that
structure is offset from? (What segment IS your environment space
relative to, anyway? It's not the _current_ value of your stack pointer,
that moves.)

How does your proposed compiler rewrite handle mmap()? You can do
MAP_SHARED just fine on nommu today, it's only MAP_PRIVATE that requires
copy on write. (Yes MAP_SHARED can be read only.)

You're aware that most heap implementations can have more than one
underlying mmap(), right?

  http://git.musl-libc.org/cgit/musl/tree/src/malloc/malloc.c#n320

  https://github.com/kraj/uClibc/blob/master/libc/stdlib/malloc/malloc.c#L121

So when you say _the_ heap base address above, which chunk are you
referring to?

Rob

Re: execve(NULL, argv, envp) for nommu?

2017-09-12 Thread Rob Landley

On 09/11/2017 10:15 AM, Oleg Nesterov wrote:
> On 09/08, Rob Landley wrote:
>>
>> So is exec(NULL, argv, envp) a reasonable thing to want?
> 
> I think that something like prctl(PR_OPEN_EXE_FILE) which does
> 
>   dentry_open(current->mm->exe_file->path, O_PATH)
> 
> and returns an fd makes more sense.
> 
> Then you can do execveat(fd, "", ..., AT_EMPTY_PATH).

I'm all for it? That sounds like a cosmetic difference, a more verbose
way of achieving the same outcome.

(Of course now you've got a filehandle you can read xattrs and such
through from otherwise jailed contexts letting you do things you
couldn't necessarily do before, but I assume you know the security
implications of that more than I do. I tried to suggest something that
_didn't_ create new capabilities, just let nommu do a thing that mmu
could already do.)

> But to be honest, I can't understand the problem, because I know nothing
> about nommu.
> 
> You need to unblock parent sleeping in vfork(), and you can't do another
> fork (I don't understand why).

A nommu system doesn't have a memory management unit, so all addresses
are physical addresses. This means two processes can't see different
things at the same address: either they see the same thing or one of
them can't see that address (due to a range register masking it).

Conventional fork() creates copy on write mappings of all the existing
writable memory of the parent process. So when the new PID dirties a
page, the old page gets copied by the fault handler. The problem isn't
the copies (that's just slow), the problem is two processes seeing
different things at the same address. That requires an MMU with a TLB
loaded from page tables.

If you create _new_ mappings and copy the data over, they'll have
different addresses. But any pointers you copied will point to the _old_
addresses. Finding and adjusting all those pointers to point to the new
addresses instead is basically the same problem as doing garbage
collection in C.

Your stack has pointers. Your heap has pointers. Your data and bss (once
initialized) can have pointers. These pointers can be in the middle of
malloc()'ed structures so no ELF table anywhere knows anything about
them. A long variable containing a value that _could_ point into one of
these ranges isn't guaranteed to _be_ a pointer, in which case adjusting
it is breakage. Tracking them all down and fixing up just the right ones
without missing any or changing data you shouldn't is REALLY HARD.
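Here is a toy model of the problem (all names are hypothetical). Two nodes sit in one contiguous block, and the first node points at the second. A byte-for-byte copy leaves the copy pointing into the original. Fixing that requires a relocation pass, which is possible here only because we know exactly which field is a pointer. A real stack, heap, or data segment carries no such map.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct node { long value; struct node *next; };
struct block { struct node n[2]; };   /* stands in for a whole mapping */

void block_init(struct block *b)
{
    b->n[0].value = 1; b->n[0].next = &b->n[1];
    b->n[1].value = 2; b->n[1].next = NULL;
}

/* Naive "fork": copy the bytes to a new address. */
void block_copy(struct block *dst, const struct block *src)
{
    memcpy(dst, src, sizeof *dst);
}

/* Relocation pass: move any .next pointer that lands inside src to the
 * same offset inside dst. This works only because we know .next is the
 * only pointer field. A general fixer that had to guess would also
 * "fix" any long that merely happened to hold a value in this range. */
void block_relocate(struct block *dst, const struct block *src)
{
    uintptr_t lo = (uintptr_t)src, hi = lo + sizeof *src;
    for (size_t i = 0; i < 2; i++) {
        uintptr_t p = (uintptr_t)dst->n[i].next;
        if (p >= lo && p < hi)
            dst->n[i].next = (struct node *)((uintptr_t)dst + (p - lo));
    }
}
```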

The vfork() system call is what you use on nommu instead: it creates a
child process that uses its parent's memory mappings. The parent process
is stopped until the child calls _exit() or exec(), either of which
means it stops using those mappings and the parent can go back to using
them without the two stomping on each other. (Usually they even share
the same stack, so the child shouldn't return from the function that
called vfork() or it'll corrupt the stack for the parent process. And be
careful about changing local variables, the parent might see the changes
when it resumes. Some vfork() implementations provide a small new stack,
ala signal handlers or kernel interrupts, so you can't guarantee your
parent will see your local variable changes, but you still can't return
from the function that called vfork() in either case.)

So after calling vfork(), the child _must_ call exec() in order for
there to be two independent processes running at the same time. Until
then, the parent is stopped.
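The following minimal sketch shows the vfork()-then-exec pattern. The helper name spawn_vfork() is hypothetical. The child does nothing except call execve() or _exit(), because anything else would touch memory it shares with the suspended parent.

```c
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Run path with argv in a vfork()ed child and return its exit status,
 * or -1 on error. */
int spawn_vfork(const char *path, char *const argv[])
{
    pid_t pid = vfork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execve(path, argv, environ);
        /* execve() only returns on failure. Use _exit(), never exit()
         * or return, which would unwind the stack shared with the parent. */
        _exit(127);
    }

    /* The parent resumes here only after the child has exec'd or exited. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```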

The real problem with implementing full fork() isn't the expense of
copying the data (although if you fork and exec from a mozilla style pig
process, you could copy hundreds of megabytes of data and then
immediately discard it again; that's why fork() doesn't usually do that;
oh and on nommu systems you need _contiguous_ memory blocks for the data
because it can't collect disparate pages together into a longer mapping,
so this is actually a largeish real-world issue on those systems, not
merely slow and expensive.) The hard problem is translating the pointers
so the new mapping doesn't read/write objects in the old mapping.

> Perhaps the child can create another thread? The main thread can exit
> after that and unblock the parent. Or perhaps even something like
> clone(CLONE_VM | CLONE_PARENT), I dunno...

Launching a new thread doesn't unblock the parent. A second vfork() from
the child wouldn't unblock the parent. Your mappings are still
overcommitted, only _exit() or execve() releases the child process's use
of those mappings.

You can create threads on nommu because they're designed to share the
same mappings. In that case you're guaranteed a new stack, and not
stomping the parent's data is your problem.

But if you exec() from a thread, POSIX says it kills all the other threads:

http://pubs.opengroup.org/onlinepubs/9699919799/functions/exec.html

And even without that, we're still in the "vfork


Re: Patch 0727d35de ("Make initramfs honor CONFIG_DEVTMPFS_MOUNT") breaks boot

2017-09-10 Thread Rob Landley

Taking another stab at this old issue from last merge window...

> Rob Landley <r...@landley.net> writes:
>> On 05/23/2017 03:01 AM, Yury Norov wrote:
>>> On Mon, May 22, 2017 at 09:07:54PM -0500, Rob Landley wrote:
>>>> Your userspace mounted a tmpfs over /dev when it couldn't mount a second
>>>> identical instance of devtmpfs over itself. If you had a static /dev in
>>>> initramfs but didn't configure _in_ devtmpfs to your kernel, your broken
>>>> error path would have taken that out too with a pointless tmpfs mount.
>>>
>>> CONFIG_DEVTMPFS_MOUNT is enabled on my machine, so I think your
>>> suggestion is correct. But I didn't do that specifically - I run
>>> almost default kernel based on Ubuntu 14.04 config and environment.
>>
>> I.E. ubuntu has a bug: they enabled CONFIG_DEVTMPFS_MOUNT and then
>> launched an initramfs instead (which didn't do the automount they
>> requested so why request it), but if CONFIG_DEVTMPFS_MOUNT actually
>> starts working in initramfs they have an insane error path that breaks
>> the system, and does nothing _except_ break the system.

...

On 05/25/2017 01:13 AM, Michael Ellerman wrote:
> Hi Rob,
> 
> This is breaking a bunch of my powerpc boxes, for the exact same
> reason, they use a config that has DEVTMPFS_MOUNT=y and that trips
> up the initramfs.

I've continued to use this locally but should probably make another
stab at submitting upstream. The obvious workaround until debian fixes
its 100% obvious bug seems to be:

diff --git a/fs/namespace.c b/fs/namespace.c
index f8893dc..f57d5df 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2417,7 +2417,17 @@ static int do_add_mount(struct mount *newmnt, struct path *path, int mnt_flags)
 	err = -EBUSY;
 	if (path->mnt->mnt_sb == newmnt->mnt.mnt_sb &&
 	    path->mnt->mnt_root == path->dentry)
+	{
+		if (IS_ENABLED(CONFIG_DEVTMPFS_MOUNT) &&
+		    !strcmp(path->mnt->mnt_sb->s_type->name, "devtmpfs"))
+		{
+			printk(KERN_WARNING "Debian bug workaround for devtmpfs overmount.");
+			printk(KERN_WARNING "This line doesn't output for some reason.");
+
+			err = 0;
+		}
 		goto unlock;
+	}
 
 	err = -EINVAL;
 	if (d_is_symlink(newmnt->mnt.mnt_root))

Except for the second printk line: If you boot with rdinit=/bin/hush
then the first time you mount -t devtmpfs /dev /dev after boot (with
CONFIG_DEVTMPFS_MOUNT already having mounted it), you get the 0 return
value but the last printk() doesn't output? The second and later times
you repeat it, both printk() lines are output.

What's up with printk?

(I added the second printk because the _first_ one wasn't outputting
that first time. Something is happening to flush the printk() queue
instead of writing it out? Built for x86-64, miniconfig attached for
reference. I tested commit 4dfc2788033d from yesterday.)

Rob
# make ARCH=x86 allnoconfig KCONFIG_ALLCONFIG=x86_64.miniconf
# make ARCH=x86 -j $(nproc)
# boot arch/x86/boot/bzImage


CONFIG_64BIT=y

CONFIG_PCI=y
CONFIG_BLK_DEV_SD=y
CONFIG_ATA=y
CONFIG_ATA_SFF=y
CONFIG_ATA_BMDMA=y
CONFIG_ATA_PIIX=y

CONFIG_NET_VENDOR_INTEL=y
CONFIG_E1000=y
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_RTC_CLASS=y


# CONFIG_EMBEDDED is not set
CONFIG_EARLY_PRINTK=y
CONFIG_BINFMT_ELF=y
CONFIG_BINFMT_SCRIPT=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y

CONFIG_BLK_DEV=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_RD_GZIP=y

CONFIG_BLK_DEV_LOOP=y
CONFIG_EXT4_FS=y
CONFIG_EXT4_USE_FOR_EXT2=y
CONFIG_VFAT_FS=y
CONFIG_FAT_DEFAULT_UTF8=y
CONFIG_MISC_FILESYSTEMS=y
CONFIG_SQUASHFS=y
CONFIG_SQUASHFS_XATTR=y
CONFIG_SQUASHFS_ZLIB=y
CONFIG_DEVTMPFS=y
CONFIG_DEVTMPFS_MOUNT=y
CONFIG_TMPFS=y
CONFIG_TMPFS_POSIX_ACL=y

CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_UNIX=y
CONFIG_INET=y
CONFIG_IPV6=y
CONFIG_NETDEVICES=y
#CONFIG_NET_CORE=y
#CONFIG_NETCONSOLE=y
CONFIG_ETHERNET=y

Re: execve(NULL, argv, envp) for nommu?

2017-09-08 Thread Rob Landley

On 09/05/2017 08:12 PM, Rob Landley wrote:
> On 09/05/2017 08:24 AM, Alan Cox wrote:
>>>> honoring the suid bit if people feel that way. I just wanna unblock
>>>> vfork() while still running this code. 
>>
>> Would it make more sense to have a way to promote your vfork into a
>> fork when you hit these cases (I appreciate that fork on NOMMU has a much
>> higher performance cost as you start having to softmmu copy or swap
>> pages).
> 
> It's not the performance cost, it's rewriting all the pointers.
> 
> Without address translation, copying the existing mappings to a new
> range requires finding and adjusting every pointer to the old data,
> which you can do for the executable mappings in PIE* binaries, but
> tracking down all the pointers on the stack, heap, and in your global
> variables? Flaming pain.
> 
> Making fork() work on nommu is basically the same problem as making
> garbage collection work in C on mmu. Thus those of us who defend vfork()
> from the people who don't understand why it exists periodically
> suggesting we remove it.

So is exec(NULL, argv, envp) a reasonable thing to want?

Rob

Re: execve(NULL, argv, envp) for nommu?

2017-09-05 Thread Rob Landley

On 09/05/2017 08:24 AM, Alan Cox wrote:
>>> anymore. But I'm already _running_ this program. If I could fork() I
>>> could already get a second copy of the sucker and call main() again
>>> myself if necessary, but I can't, so...
> 
> You can - ptrace 8)

Oh I can call clone() with various flags and try to fake it myself, it
just won't do what I want. :)

>>> honoring the suid bit if people feel that way. I just wanna unblock
>>> vfork() while still running this code. 
> 
> Would it make more sense to have a way to promote your vfork into a
> fork when you hit these cases (I appreciate that fork on NOMMU has a much
> higher performance cost as you start having to softmmu copy or swap
> pages).

It's not the performance cost, it's rewriting all the pointers.

Without address translation, copying the existing mappings to a new
range requires finding and adjusting every pointer to the old data,
which you can do for the executable mappings in PIE* binaries, but
tracking down all the pointers on the stack, heap, and in your global
variables? Flaming pain.

Making fork() work on nommu is basically the same problem as making
garbage collection work in C on mmu. Thus those of us who defend vfork()
from the people who don't understand why it exists periodically
suggesting we remove it.

> Alan

Rob

* or FDPIC, which is basically just PIE with 4 individually relocatable
text/data/rodata/bss segments instead of one big mapping you relocate as
a contiguous block; both work on nommu but fdpic can fit into more
fragmented memory, and because the segments are independent it lets
nommu share some segments between processes (code+rodata**) without
sharing others (data and bss). That's why nommu can't run normal ELF but
can run PIE or FDPIC binaries. Or binflt which is the old a.out version.

** Don't ask me what happens when rodata contains a constant pointer to
a bss or data object. I'm guessing the compiler Does A Thing. Ask Rich
Felker?

Re: INITRAMFS_SOURCE broken by 6e19eded3684dc184181093af3bff2ff440f5b53?

2017-08-08 Thread Rob Landley

On 08/08/2017 07:04 AM, Willy Tarreau wrote:
> Hi Thomas,
> 
> On Tue, Aug 08, 2017 at 01:46:25PM +0200, Thomas Meyer wrote:
>> Hi,
>>
>> did the commit 6e19eded3684dc184181093af3bff2ff440f5b53 break a linux kernel
>> build with an included ramdisk?
>>
>> As fas as I understand you must expliclity add rootfstype=ramfs to the kernel
>> command line to boot from the included ramfsdisk?
>>
>> bug or feature?
> 
> Strange, I'm running my kernels with the modules packaged inside the initramfs
> and never met this problem even after this commit (my 4.9 kernels are still
> packaged this way and run fine). And yes, I do have TMPFS enabled. I can't
> tell whether tmpfs or ramfs was used however given that at this level I don't
> have all the tools available to report the FS type (and proc says "rootfs").
> Are you sure you're not missing anything ?

If your rootfs has a size= in /proc/mounts it's tmpfs, ala:

  rootfs / rootfs rw,size=126564k,nr_inodes=31641 0 0
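That check is easy to automate. Here is a minimal sketch that assumes the /proc/mounts line format shown above. The helper name is hypothetical. It returns 1 for a rootfs line with a size= option (tmpfs), 0 for a rootfs line without one (presumably ramfs), and -1 for any other line.

```c
#include <stdio.h>
#include <string.h>

/* Classify one /proc/mounts line: 1 = rootfs with size= (tmpfs),
 * 0 = rootfs without size=, -1 = not the rootfs line. */
int rootfs_is_tmpfs(const char *line)
{
    char dev[256], dir[256], type[64], opts[512];
    if (sscanf(line, "%255s %255s %63s %511s", dev, dir, type, opts) != 4)
        return -1;
    if (strcmp(dir, "/") != 0 || strcmp(type, "rootfs") != 0)
        return -1;

    /* Options are comma-separated; look for one starting with "size=". */
    for (char *opt = strtok(opts, ","); opt; opt = strtok(NULL, ","))
        if (strncmp(opt, "size=", 5) == 0)
            return 1;
    return 0;
}
```

In practice you would feed it each line read with fgets() from /proc/mounts.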

Rob

ping/icmp sockets: define "sane".

2017-07-18 Thread Rob Landley

The title is from this comment in net/ipv4:

 /*
  * Sane defaults - nobody may create ping sockets.
  * Boot scripts should set this to distro-specific group.
  */

So in 2011 you added ICMP sockets, but made it so nobody could use them
without root performing a magic incantation at boot time. From the original
commit message:

>  socket(2) is restricted to the group range specified in
>  "/proc/sys/net/ipv4/ping_group_range".  It is "1 0" by default, meaning
>  that nobody (not even root) may create ping sockets.

Why? What's the point of NOT letting root use this? So ping programs like
busybox's can't use the new api as a drop-in replacement for the old one
because even if they keep the suid bit on the command, it won't work?
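The range check itself is simple. The sketch below assumes the two-number format of /proc/sys/net/ipv4/ping_group_range, and its helper names are hypothetical. A gid passes if it falls inclusively between the two values. The default "1 0" therefore describes an empty range that excludes gid 0 as well. open_ping_socket() wraps the unprivileged ICMP datagram socket that the commit added. It fails, typically with EACCES, when none of the caller's groups fall inside the range.

```c
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/* Parse "low high" as found in ping_group_range and report whether gid
 * lies in the inclusive range. Returns -1 if the text doesn't parse. */
int gid_in_ping_range(const char *contents, long gid)
{
    long low, high;
    if (sscanf(contents, "%ld %ld", &low, &high) != 2)
        return -1;
    return low <= gid && gid <= high;
}

/* Unprivileged ICMP echo socket: returns an fd, or -1 with errno set. */
int open_ping_socket(void)
{
    return socket(AF_INET, SOCK_DGRAM, IPPROTO_ICMP);
}
```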

I thought busybox was using it, but they ripped it back out in 2014:
https://git.busybox.net/busybox/commit/?id=f0058b1b1fe9

What is the point of creating a new api to do something root could
previously do in a safer way that doesn't require root access, and then
not even let root do it by default? What's the point? I thought commit
ba6b918ab234 removed this blockage, but instead it moved it to a
different file.

Is ping flood from icmp somehow more dangerous than UDP flood from an
arbitrary user? What's the issue requiring this elaborate infrastructure
to render your new api so useless busybox went BACK to the suid root
version, and the ping in ubuntu 14.04 is also suid root?

I ask because I'm finally getting around to implementing ping in toybox
and of course I was going to use the new API, and testing it on Ubuntu
didn't work, so I dug this mess up and boggled. Perhaps you could explain
it?

The Android guys say that yes they use this API, and make it available
to everybody, even from java:

http://lists.landley.net/pipermail/toybox-landley.net/2017-July/009101.html

The kernel patch to make it work is presumably just:

--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1712,12 +1712,8 @@ static __net_init int inet_init_net(struct net *net)
 	net->ipv4.ip_local_ports.range[1] =  60999;
 
 	seqlock_init(&net->ipv4.ping_group_range.lock);
-	/*
-	 * Sane defaults - nobody may create ping sockets.
-	 * Boot scripts should set this to distro-specific group.
-	 */
-	net->ipv4.ping_group_range.range[0] = make_kgid(&init_user_ns, 1);
-	net->ipv4.ping_group_range.range[1] = make_kgid(&init_user_ns, 0);
+	net->ipv4.ping_group_range.range[0] = make_kgid(&init_user_ns, 0);
+	net->ipv4.ping_group_range.range[1] = make_kgid(&init_user_ns, 65535);
 
 	/* Default values for sysctl-controlled parameters.
 	 * We set them here, in case sysctl is not compiled.

I'm tempted to put that diff into the toybox FAQ for people who want
to use toybox on vanilla Linux. But first, I thought I'd ask for an
explanation of why it's explicitly, intentionally broken in the first
place? You made a "safer" API to not require root access, and then made
it so even root can't use it.

Why did you do that?

Rob

Re: [linux-next] PPC Lpar fail to boot with error hid: module verification failed: signature and/or required key missing - tainting kernel

2017-05-26 Thread Rob Landley

On 05/25/2017 04:24 PM, Stephen Rothwell wrote:
> Hi Michael,
> 
> On Thu, 25 May 2017 23:02:06 +1000 Michael Ellerman  
> wrote:
>>
>> It'll be:
>>
>> ee35011fd032 ("initramfs: make initramfs honor CONFIG_DEVTMPFS_MOUNT")
>
> And Andrew has asked me to drop that patch from linux-next which will
> happen today.

What approach do the kernel developers suggest I take here?

I would have thought letting it soak in linux-next for a release so
people could fix userspace bugs would be the next step, but this sounds
like that's not an option?

Is the behavior the patch implements wrong?

Rob

Re: Patch 0727d35de ("Make initramfs honor CONFIG_DEVTMPFS_MOUNT") breaks boot

2017-05-24 Thread Rob Landley

On 05/23/2017 06:08 PM, Yury Norov wrote:
>> It was 2 years ago, but AFAIR I took the Ubuntu image here:
>> http://cdimage.ubuntu.com/ubuntu-base/releases/14.04.1/release/ubuntu-base-14.04.1-core-arm64.tar.gz

Have you applied updates since then? (Maybe they fixed their init script
since 2 years ago?)

>> Kernel config is attached. I build the kernel with simple 'make'.
>>
>> Yury
> 
> Sorry, config is here.

$ diff -u yury.conf /boot/config-4.4.0-78-generic | grep '^[-+]' | wc -l
10384

So that's not Ubuntu's current 14.04 kernel config.

$ diff -u yury.conf /boot/config-4.2.0-36-generic | grep '^[-+]' | wc -l
10212

And it's not the oldest Ubuntu 14.04 config I have lying around (from a
year ago).

$ cd linux && make defconfig
$ diff -u ~/yury.conf .config | grep '^[-+]' | wc -l
4369

It's much closer to the current defconfig, but still significantly
different.

So you're using a custom config, and can't switch off a symbol.
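To see where a given .config stands on a symbol like this, a line-by-line check is enough, since kconfig writes a bool or tristate either as "SYM=y"/"SYM=m" or as "# SYM is not set". A minimal sketch, assuming only bool/tristate symbols (config_line_state is a made-up helper):

```c
#include <string.h>

/*
 * Hypothetical helper: classify one .config line for a bool/tristate
 * symbol `sym` (e.g. "CONFIG_DEVTMPFS_MOUNT").
 * Returns 1 for "SYM=y" or "SYM=m", 0 for "SYM=n" or
 * "# SYM is not set", and -1 for lines about other symbols.
 */
int config_line_state(const char *line, const char *sym)
{
	size_t n = strlen(sym);
	const char *rest;

	/* "SYM=..." form; the '=' check rejects longer names like SYM_FOO. */
	if (strncmp(line, sym, n) == 0 && line[n] == '=')
		return line[n + 1] == 'y' || line[n + 1] == 'm';

	/* "# SYM is not set" form, with or without a trailing newline. */
	if (strncmp(line, "# ", 2) == 0 && strncmp(line + 2, sym, n) == 0) {
		rest = line + 2 + n;
		if (strncmp(rest, " is not set", 11) == 0 &&
		    (rest[11] == '\0' || rest[11] == '\n'))
			return 0;
	}
	return -1;
}
```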

Rob

Re: Patch 0727d35de ("Make initramfs honor CONFIG_DEVTMPFS_MOUNT") breaks boot

2017-05-23 Thread Rob Landley

On 05/23/2017 03:01 AM, Yury Norov wrote:
> On Mon, May 22, 2017 at 09:07:54PM -0500, Rob Landley wrote:
>> Your userspace mounted a tmpfs over /dev when it couldn't mount a second
>> identical instance of devtmpfs over itself. If you had a static /dev in
>> initramfs but didn't configure _in_ devtmpfs to your kernel, your broken
>> error path would have taken that out too with a pointless tmpfs mount.
> 
> CONFIG_DEVTMPFS_MOUNT is enabled on my machine, so I think your
> suggestion is correct. But I didn't do that specifically - I run
> almost default kernel based on Ubuntu 14.04 config and environment.

I.e., Ubuntu has a bug: they enabled CONFIG_DEVTMPFS_MOUNT and then
launched an initramfs instead (which didn't do the automount they
requested, so why request it?), but if CONFIG_DEVTMPFS_MOUNT actually
starts working in initramfs they have an insane error path that breaks
the system, and does nothing _except_ break the system.

> Grepping the kernel code shows that arc, arm, arm64, m86k, metag, 
> mips, nios2, openrisc, parisc, powerpc, sh, tile, um, x86 and xetensa
> enable it by default.

Most of which Ubuntu doesn't support, so none of them could trigger the
broken error path in ubuntu's init script.

Wait, are you saying you're doing a "make defconfig" on x86-64 and
booting ubuntu from the result? (Or is this arm?) Is _that_ the config
you still haven't specified in this conversation? I thought you were
using the /boot/config-4.4.0-78-generic and friends ubuntu installs.
(Which yes, also switch this symbol on.)

I can add a "default n" line to drivers/base/Kconfig if "make defconfig"
is what you're building from. (This symbol never specified a default in
the first place, so I dunno which way it falls, but it's repeated in a
gazillion defconfig files and not present in others... meaning I still
dunno which way the default goes. When I do a "make defconfig" it uses
arch/x86/configs/x86_64_defconfig because the kernel has multiple
codepaths to accomplish the same thing. I'm not sure the built-in
"default y" lines are used at all anymore? What a mess...)

But again, I'm just guessing what config you're using because you still
haven't _said_. I'm still trying to guess what you're doing when you hit
Ubuntu's bug.

> So it means for me that (at least) users why run
> Ubuntu 14.04 will have bricked system one day after updating the
> kernel.

Unless when they build their new kernel they open up menuconfig and
switch this symbol off. Which can't be done because...?

Or you could add the devtmpfs.mount=0 argument to your kernel command
line, as documented in the CONFIG_DEVTMPFS_MOUNT menuconfig help text.
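That override works because devtmpfs registers a "devtmpfs.mount=" boot parameter whose value replaces the Kconfig default. The sketch below approximates the effective setting from the text of /proc/cmdline. The function name is hypothetical, it ignores quoting, and it assumes a later occurrence overrides an earlier one.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: effective devtmpfs.mount= value from the text of
 * /proc/cmdline. `config_default` is 1 if the kernel was built with
 * CONFIG_DEVTMPFS_MOUNT=y, else 0. Each matching word replaces the
 * previous value, so the last occurrence wins.
 */
int devtmpfs_mount_effective(const char *cmdline, int config_default)
{
	static const char key[] = "devtmpfs.mount=";
	const size_t klen = sizeof(key) - 1;
	int value = config_default;
	const char *p = cmdline;

	while (*p) {
		/* Skip separators, then examine one whitespace-delimited word. */
		p += strspn(p, " \t\n");
		if (!*p)
			break;
		if (strncmp(p, key, klen) == 0)
			value = (int)strtoul(p + klen, NULL, 0);
		p += strcspn(p, " \t\n");
	}
	return value;
}
```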

The kernel already provides multiple workarounds for Ubuntu's bug, and
the issue only hits people who are manually building a new kernel from
source. If ubuntu provides a new kernel, I assume they'll tweak their
config _and_ fix their initramfs error path (which is just plain wrong).

> If you say that currently CONFIG_DEVTMPFS_MOUNT is ignored by kernel,

It's not _me_ saying it, it's the kernel doing it. The patch is
conceptually a straightforward fix on the kernel side to make it _not_
ignore that symbol in that context.

> I think you cannot relay on it anymore because people may have it
> enabled or disabled randomly.

I expected configs would have it randomly set, but the bug here is a
broken error path that does something actively harmful rather than going
"oh, we got a static /dev from somewhere, let's just leave it alone".
This error path goes out of its way to blank the contents of /dev by
mounting an empty tmpfs over it and leaving it empty, and then
complaining that /dev is blank _because_it_blanked_it_.

If Ubuntu meant to intentionally halt the proceedings the script could
have done that explicitly. What did the author of that error path think
would happen, exactly?
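A less destructive error path would first ask what is already mounted on /dev. As a sketch (mounted_fstype is a hypothetical helper that takes the /proc/mounts text as a string and does not decode the \040-style escapes that file uses for spaces in paths), the last matching line wins because later mounts stack on top:

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: scan /proc/mounts-style text and report the type
 * of the filesystem mounted on `dir`. The last matching line wins, since
 * later mounts stack on top of earlier ones. Paths are compared as-is.
 * Returns 1 and stores the type in `fstype` if found, else 0.
 */
int mounted_fstype(const char *mounts, const char *dir,
		   char *fstype, size_t len)
{
	char line[1024], dev[256], mnt[256], type[64];
	const char *p = mounts;
	int found = 0;

	while (*p) {
		size_t n = strcspn(p, "\n");
		size_t copy = n < sizeof(line) - 1 ? n : sizeof(line) - 1;

		/* Work on one line at a time so short lines can't spill over. */
		memcpy(line, p, copy);
		line[copy] = '\0';
		if (sscanf(line, "%255s %255s %63s", dev, mnt, type) == 3 &&
		    strcmp(mnt, dir) == 0) {
			snprintf(fstype, len, "%s", type);
			found = 1;
		}
		p += n;
		if (*p == '\n')
			p++;
	}
	return found;
}
```

An init script that gets "devtmpfs" back for /dev can simply leave it alone instead of mounting an empty tmpfs on top.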

> So the proper way is to remove broken
> config option and introduce new one. BTW, I see it is used once in
> drivers/base/devtmpfs.c.

How does removing the broken config option (or renaming it to
CONFIG_DEFTMPFS_UBUNTU_IS_BROKEN) _not_ impact systems that were
previously happily using it in the contexts where it already worked?

If it's too much to ask people to switch it off when it was previously
on (but shouldn't have been), how is asking them to manually switch it
back on when it was previously on and needs to stay on better? (And if
you arrange it so "make oldconfig" migrates the old symbol to the new
one automatically, how would that work around the broken error path in
ubuntu's initramfs script? The rename becomes a NOP.)

If you're saying it should default to "n" I can send a patch. If you
want me to tweak every arch/*/configs file that redundantly includes the
same darn symbol, I can do that too. (Makes the patch big but it's just
a sed invocation to do it.)
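Assuming the goal is to switch the symbol off in each defconfig that sets it, the per-line edit such a sed invocation would make looks roughly like this in C (config_disable_line is hypothetical; lines that don't set the symbol to exactly "y" pass through unchanged):

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical per-line edit for a defconfig: rewrite "SYM=y" as
 * "# SYM is not set" (keeping any trailing newline) and copy every
 * other line through unchanged. Returns 1 if the line was rewritten.
 */
int config_disable_line(const char *line, const char *sym,
			char *out, size_t len)
{
	size_t n = strlen(sym);

	if (strncmp(line, sym, n) == 0 && strncmp(line + n, "=y", 2) == 0 &&
	    (line[n + 2] == '\0' || line[n + 2] == '\n')) {
		snprintf(out, len, "# %s is not set%s", sym,
			 line[n + 2] == '\n' ? "\n" : "");
		return 1;
	}
	snprintf(out, len, "%s", line);
	return 0;
}
```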


Re: Patch 0727d35de ("Make initramfs honor CONFIG_DEVTMPFS_MOUNT") breaks boot

2017-05-22 Thread Rob Landley

On 05/22/2017 07:05 AM, Yury Norov wrote:
> Hi Rob, 
> 
> I found that next-20170522 fails to boot on arm64 machine with the
> following log:

I don't know anything about your kernel config (is CONFIG_DEVTMPFS_MOUNT
enabled or disabled?) or what userspace you're booting with, but it
seems I can guess:

> [...]
> [4.179509] Freeing unused kernel memory: 1088K
> Loading, please wait...

At this point, the kernel has launched init and your userspace is
running. During that boot, the kernel mounted devtmpfs on /dev (you
edited the part where it did that out of your boot log), but the next line:

> mount: mounting udev on /dev failed: Device or resource busy

has an error that says you already have devtmpfs mounted on /dev, and
your userspace tries to mount devtmpfs on it _again_ and it fails
because you can't mount the exact same filesystem over itself due to a
sanity check in the kernel in fs/namespace.c line 2475 or so:

	/* Refuse the same filesystem on the same mount point */
	err = -EBUSY;
	if (path->mnt->mnt_sb == newmnt->mnt.mnt_sb &&
	    path->mnt->mnt_root == path->dentry)
		goto unlock;

> W: devtmpfs not available, falling back to tmpfs for /dev
> Couldn't get a file descriptor referring to the console

At which point your userspace does a "fixup" mounting something else
over the previously working devtmpfs, which succeeds (because you're
mounting a _different_ filesystem and not hitting the above sanity
test), thus breaking your userspace.
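Fixing the userspace side mostly means treating that EBUSY as "devtmpfs is already there" rather than "devtmpfs is missing". A minimal sketch of such a step follows. The name ensure_devtmpfs is hypothetical, and it assumes, as in the log above, that EBUSY comes from the same-filesystem check quoted above. A stricter script would confirm that against /proc/mounts, since mount(2) can return EBUSY for other reasons too.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/mount.h>
#include <sys/stat.h>

/*
 * Hypothetical init-script step: make sure devtmpfs is mounted on `dir`.
 * EBUSY is treated as "the same devtmpfs is already mounted here" (the
 * kernel refuses the same filesystem on the same mount point), which is
 * success. Any other failure is reported rather than papered over by
 * mounting an empty tmpfs on top. Returns 0, or -1 with errno set.
 */
int ensure_devtmpfs(const char *dir)
{
	if (mkdir(dir, 0755) != 0 && errno != EEXIST)
		return -1;
	if (mount("devtmpfs", dir, "devtmpfs", 0, NULL) == 0)
		return 0;
	if (errno == EBUSY)
		return 0;	/* already mounted: leave /dev alone */
	return -1;
}
```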

> Begin: Loading essential drivers ... done.
> Begin: Running /scripts/init-premount ... done.
> Begin: Mounting root file system ... Begin: Running
> /scripts/local-top ... done.
> chvt: can't open console

And then your userspace didn't notice for a while.

> Gave up waiting for root device.  Common problems:
>  - Boot args (cat /proc/cmdline)
>    - Check rootdelay= (did the system wait long enough?)
>    - Check root= (did the system wait for the right device?)
>  - Missing modules (cat /proc/modules; ls /dev)
> chvt: can't open console
> ALERT!  /dev/sda does not exist.  Dropping to a shell!
> Couldn't get a file descriptor referring to the console

And then it died.

> BusyBox v1.21.1 (Ubuntu 1:1.21.0-1ubuntu1) built-in shell (ash)
> Enter 'help' for a list of built-in commands.
> 
> (initramfs)
> 
> Bisect points to your patch (attached below). If I revert it, everything
> becomes fine. If you need to know something more about my environment,
> feel free to ask me.

You were inappropriately specifying CONFIG_DEVTMPFS_MOUNT in your
config; now that it's no longer being ignored, your init script is having
an allergic reaction to it. Either yank it from your config or fix your
userspace. It looks to me like my patch triggered a bug in your setup.

Your userspace mounted a tmpfs over /dev when it couldn't mount a second
identical instance of devtmpfs over itself. If you had a static /dev in
initramfs but didn't configure _in_ devtmpfs to your kernel, your broken
error path would have taken that out too with a pointless tmpfs mount.

By the way, _why_ are you mounting a tmpfs over /dev on _initramfs_?
That can already be tmpfs. (Commits 137fdcc18a59 through 6e19eded3684.)

Feel free to send more context if you think I'm wrong about this.

> Yury

Rob

[tip:x86/urgent] x86/boot: Use CROSS_COMPILE prefix for readelf

2017-05-21 Thread tip-bot for Rob Landley

Commit-ID:  3780578761921f094179c6289072a74b2228c602
Gitweb: http://git.kernel.org/tip/3780578761921f094179c6289072a74b2228c602
Author: Rob Landley <r...@landley.net>
AuthorDate: Sat, 20 May 2017 15:03:29 -0500
Committer:  Thomas Gleixner <t...@linutronix.de>
CommitDate: Sun, 21 May 2017 13:04:27 +0200

x86/boot: Use CROSS_COMPILE prefix for readelf

The boot code Makefile contains a straight 'readelf' invocation. This
causes build warnings in cross compile environments, when there is no
unprefixed readelf accessible via $PATH.

Add the missing $(CROSS_COMPILE) prefix.

[ tglx: Rewrote changelog ]

Fixes: 98f78525371b ("x86/boot: Refuse to build with data relocations")
Signed-off-by: Rob Landley <r...@landley.net>
Acked-by: Kees Cook <keesc...@chromium.org>
Cc: Jiri Kosina <jkos...@suse.cz>
Cc: Paul Bolle <pebo...@tiscali.nl>
Cc: "H.J. Lu" <hjl.to...@gmail.com>
Cc: sta...@vger.kernel.org
Link: http://lkml.kernel.org/r/ced18878-693a-9576-a024-113ef39a2...@landley.net
Signed-off-by: Thomas Gleixner <t...@linutronix.de>

---
 arch/x86/boot/compressed/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 44163e8..2c860ad 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -94,7 +94,7 @@ vmlinux-objs-$(CONFIG_EFI_MIXED) += $(obj)/efi_thunk_$(BITS).o
 quiet_cmd_check_data_rel = DATAREL $@
 define cmd_check_data_rel
 	for obj in $(filter %.o,$^); do \
-		readelf -S $$obj | grep -qF .rel.local && { \
+		${CROSS_COMPILE}readelf -S $$obj | grep -qF .rel.local && { \
 			echo "error: $$obj has data relocations!" >&2; \
 			exit 1; \
 		} || true; \
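The fix itself is just name construction: prefix the tool with $(CROSS_COMPILE), which is empty for native builds. A trivial C illustration of the same rule (cross_tool_name is a made-up helper):

```c
#include <stdio.h>

/*
 * Hypothetical helper: form a toolchain binary name the way the kernel
 * Makefiles do, by prepending the CROSS_COMPILE prefix to the tool name.
 * A NULL or empty prefix yields the unprefixed host tool.
 * Returns the length of the full name (as snprintf does).
 */
int cross_tool_name(char *buf, size_t len, const char *prefix,
		    const char *tool)
{
	return snprintf(buf, len, "%s%s", prefix ? prefix : "", tool);
}
```

For example, cross_tool_name(buf, sizeof(buf), getenv("CROSS_COMPILE"), "readelf") yields "x86_64-linux-musl-readelf" when CROSS_COMPILE is set to the (illustrative) prefix "x86_64-linux-musl-", and plain "readelf" when it is unset.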

[PATCH] Make x86 use $TARGET-readelf like all the other arches.

2017-05-20 Thread Rob Landley

From: Rob Landley <r...@landley.net>

My cross-compile environment doesn't provide an unprefixed
readelf in the $PATH, which works fine on every target but x86,
where you get a bunch of "/bin/sh: 1: readelf: not found"
messages (but the result still works anyway).

Signed-off-by: Rob Landley <r...@landley.net>
---

 arch/x86/boot/compressed/Makefile |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 44163e8..2c860ad 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -94,7 +94,7 @@ vmlinux-objs-$(CONFIG_EFI_MIXED) += $(obj)/efi_thunk_$(BITS).o
 quiet_cmd_check_data_rel = DATAREL $@
 define cmd_check_data_rel
 	for obj in $(filter %.o,$^); do \
-		readelf -S $$obj | grep -qF .rel.local && { \
+		${CROSS_COMPILE}readelf -S $$obj | grep -qF .rel.local && { \
 			echo "error: $$obj has data relocations!" >&2; \
 			exit 1; \
 		} || true; \

Re: [PATCHv2] Make initramfs honor CONFIG_DEVTMPFS_MOUNT

2017-05-17 Thread Rob Landley

On 05/16/2017 10:58 PM, Michael Ellerman wrote:
> Rob Landley <r...@landley.net> writes:
> 
>> diff --git a/init/main.c b/init/main.c
>> index f866510..9ec09ff 100644
>> --- a/init/main.c
>> +++ b/init/main.c
>> @@ -1055,8 +1049,17 @@ static noinline void __init kernel_init_freeable(void)
>>  if (sys_access((const char __user *) ramdisk_execute_command, 0) != 0) {
>>  ramdisk_execute_command = NULL;
>>  prepare_namespace();
>> +} else if (IS_ENABLED(CONFIG_DEVTMPFS_MOUNT)) {
>> +sys_mkdir("/dev", 0755);
>> +devtmpfs_mount("/dev");
>>  }
>>  
>> +/* Open the /dev/console on the rootfs, this should never fail */
>> +if (sys_open((const char __user *) "/dev/console", O_RDWR, 0) < 0)
> 
> Sorry to pile on,

The correct phrase is "bikeshed". (It's a verb now.)

> but while you're moving it do you want

_I_ don't, no. I intentionally moved it unmodified. If you want to
submit a patch on top of mine, be my guest.

> to update this fairly misleading comment.

Define "should". (I'll get the popcorn.)

> It definitely can fail, eg. if /dev/console doesn't exist, or if no
> console driver is registered.

/dev/console not existing in an initramfs created by pointing
CONFIG_INITRAMFS_SOURCE at a directory created by a normal user was
pretty much my initial motivation for poking at this area, yes.

That said, /dev/console should always exist. My patch was just finding a
different way for it to exist so the condition was satisfied. Meaning
the comment isn't exactly _wrong_, just really terse. Feel free to
submit a patch rephrasing it.
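For context, the code being moved opens /dev/console and then calls sys_dup(0) twice, relying on the open landing on fd 0 so the duplicates become fds 1 and 2. A userspace sketch of the same setup (attach_console is a hypothetical name) uses dup2() to place the descriptors explicitly. It also shows the failure case Michael describes: if open() fails, the standard descriptors stay as they were, which for the kernel's init means closed.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical userspace analog of the kernel's early console setup:
 * open the console read/write and make it fds 0, 1 and 2. The kernel
 * gets this by opening with no fds in use (so the open returns 0) and
 * calling dup(0) twice; dup2() places the copies explicitly instead.
 * If open() fails, existing fds are left untouched and -1 is returned.
 */
int attach_console(const char *path)
{
	int fd = open(path, O_RDWR);
	int i;

	if (fd < 0)
		return -1;
	for (i = 0; i < 3; i++) {
		if (fd != i && dup2(fd, i) < 0) {
			if (fd > 2)
				close(fd);
			return -1;
		}
	}
	if (fd > 2)
		close(fd);	/* fds 0-2 now hold their own references */
	return 0;
}
```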

Rob

Re: [PATCHv2] Make initramfs honor CONFIG_DEVTMPFS_MOUNT

2017-05-14 Thread Rob Landley

Andrew asked for "a more complete changelog" and I've had
a reply window open for _days_ trying to figure out
what he wants. Maybe it's in the following somewhere...

Otherwise the same v2 patch.

From: Rob Landley <r...@landley.net>

Make initramfs honor CONFIG_DEVTMPFS_MOUNT (fixing commit
2b2af54a5bb6 which didn't bother), move /dev/console
open after devtmpfs mount, and update help text.

Commit 456eeabab849 in 2005 made gen_initramfs_list (when run
with no arguments) spit out an 'example' config creating /dev
and /dev/console. The kernel accidentally(?) included this
for many years when you didn't specify initramfs contents,
and of course grew dependencies on this /dev/console node
in the (often hidden) initramfs. Commit c33df4eaaf41 in 2007
explicitly preserved this dependency. Commit 2bd3a997befc in
2010 claimed it "removes the occasionally problematic assumption
that /dev/console exists from the boot code" but actually just
moved it later.

But nobody ever tested statically linking an initramfs.
If you point CONFIG_INITRAMFS_SOURCE at a directory while
running the build as a normal user, you _don't_ get a
/dev/console (because you can't create it without being
root, and can't use the existing one out of /dev unless
you create your own initramfs list file), in which case init
runs with stdin/stdout/stderr closed and you get no output.
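Whether a given tree or running system actually has the node can be checked directly: the console is character device major 5, minor 1. A small sketch (is_console_node is hypothetical and relies on the major()/minor() macros from <sys/sysmacros.h>, which glibc and musl provide):

```c
#include <sys/stat.h>
#include <sys/sysmacros.h>

/*
 * Hypothetical check: is `path` the console character device
 * (major 5, minor 1)? Returns 1 if so, 0 if it is missing or is
 * anything else. stat() follows symlinks.
 */
int is_console_node(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return 0;
	return S_ISCHR(st.st_mode) && major(st.st_rdev) == 5 &&
	       minor(st.st_rdev) == 1;
}
```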

Eric's test case for his 2010 commit referenced above was:

  With this patch I was able to throw busybox on my /boot partition
  (which has no /dev directory) and boot into userspace without
  problems.

But it didn't work pointing CONFIG_INITRAMFS_SOURCE at a
directory of the same files. This provides the "automatically
mounting devtmpfs on /dev" workaround the earlier commit was
trying to avoid.

Signed-off-by: Rob Landley <r...@landley.net>
---

 drivers/base/Kconfig |   14 --
 init/main.c  |   15 +--
 2 files changed, 13 insertions(+), 16 deletions(-)

diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index d718ae4..74779ee 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -48,16 +48,10 @@ config DEVTMPFS_MOUNT
bool "Automount devtmpfs at /dev, after the kernel mounted the rootfs"
depends on DEVTMPFS
help
- This will instruct the kernel to automatically mount the
- devtmpfs filesystem at /dev, directly after the kernel has
- mounted the root filesystem. The behavior can be overridden
- with the commandline parameter: devtmpfs.mount=0|1.
- This option does not affect initramfs based booting, here
- the devtmpfs filesystem always needs to be mounted manually
- after the rootfs is mounted.
- With this option enabled, it allows to bring up a system in
- rescue mode with init=/bin/sh, even when the /dev directory
- on the rootfs is completely empty.
+ Automatically mount devtmpfs at /dev on the root filesystem, which
+ lets the system come up in rescue mode with [rd]init=/bin/sh.
+ Override with devtmpfs.mount=0 on the commandline. Initramfs can
+ create a /dev dir as needed, other rootfs needs the mount point.
 
 config STANDALONE
bool "Select only drivers that don't need compile-time external firmware"
diff --git a/init/main.c b/init/main.c
index f866510..9ec09ff 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1038,12 +1038,6 @@ static noinline void __init kernel_init_freeable(void)
 
do_basic_setup();
 
-   /* Open the /dev/console on the rootfs, this should never fail */
-   if (sys_open((const char __user *) "/dev/console", O_RDWR, 0) < 0)
-   pr_err("Warning: unable to open an initial console.\n");
-
-   (void) sys_dup(0);
-   (void) sys_dup(0);
/*
 * check if there is an early userspace init.  If yes, let it do all
 * the work
@@ -1055,8 +1049,17 @@ static noinline void __init kernel_init_freeable(void)
if (sys_access((const char __user *) ramdisk_execute_command, 0) != 0) {
ramdisk_execute_command = NULL;
prepare_namespace();
+   } else if (IS_ENABLED(CONFIG_DEVTMPFS_MOUNT)) {
+   sys_mkdir("/dev", 0755);
+   devtmpfs_mount("/dev");
}
 
+   /* Open the /dev/console on the rootfs, this should never fail */
+   if (sys_open((const char __user *) "/dev/console", O_RDWR, 0) < 0)
+   pr_err("Warning: unable to open an initial console.\n");
+   (void) sys_dup(0);
+   (void) sys_dup(0);
+
/*
 * Ok, we have completed the initial bootup, and
 * we're essentially up and running. Get rid of the

Re: [PATCHv2] Make initramfs honor CONFIG_DEVTMPFS_MOUNT

2017-05-14 Thread Rob Landley

Andrew asked for "a more complete changelog" and I've had
a reply window open for _days_ trying to figure out
what he wants. Maybe it's in the following somewhere...

Otherwise the same v2 patch.

From: Rob Landley 

Make initramfs honor CONFIG_DEVTMPFS_MOUNT (fixing commit
2b2af54a5bb6 which didn't bother), move /dev/console
open after devtmpfs mount, and update help text.

Commit 456eeabab849 in 2005 made gen_initramfs_list (when run
with no arguments) spit out an 'example' config creating /dev
and /dev/console. The kernel accidentally(?) included this
for many years when you didn't specify initramfs contents,
and of course grew dependencies on this /dev/console node
in the (often hidden) initramfs. Commit c33df4eaaf41 in 2007
explicitly preserved this dependency. Commit 2bd3a997befc in
2010 claimed it "removes the occasionally problematic assumption
that /dev/console exists from the boot code" but actually just
moved it later.

But nobody ever tested statically linking an initramfs.
If you point CONFIG_INITRAMFS_SOURCE at a directory and
run the build as a normal user, you _don't_ get a
/dev/console (because you can't create it without being
root, and can't use the existing one out of /dev unless
you create your own initramfs list file), in which case init
runs with stdin/stdout/stderr closed and you get no output.

Eric's test case for his 2010 commit referenced above was:

  With this patch I was able to throw busybox on my /boot partition
  (which has no /dev directory) and boot into userspace without
  problems.

But it didn't work when pointing CONFIG_INITRAMFS_SOURCE at a
directory of the same files. This patch provides the "automatically
mounting devtmpfs on /dev" workaround the earlier commit was
trying to avoid.

Signed-off-by: Rob Landley 
---

 drivers/base/Kconfig |   14 ++++----------
 init/main.c          |   15 +++++++++------
 2 files changed, 13 insertions(+), 16 deletions(-)

diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index d718ae4..74779ee 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -48,16 +48,10 @@ config DEVTMPFS_MOUNT
 	bool "Automount devtmpfs at /dev, after the kernel mounted the rootfs"
 	depends on DEVTMPFS
 	help
-	  This will instruct the kernel to automatically mount the
-	  devtmpfs filesystem at /dev, directly after the kernel has
-	  mounted the root filesystem. The behavior can be overridden
-	  with the commandline parameter: devtmpfs.mount=0|1.
-	  This option does not affect initramfs based booting, here
-	  the devtmpfs filesystem always needs to be mounted manually
-	  after the rootfs is mounted.
-	  With this option enabled, it allows to bring up a system in
-	  rescue mode with init=/bin/sh, even when the /dev directory
-	  on the rootfs is completely empty.
+	  Automatically mount devtmpfs at /dev on the root filesystem, which
+	  lets the system come up in rescue mode with [rd]init=/bin/sh.
+	  Override with devtmpfs.mount=0 on the commandline. Initramfs can
+	  create a /dev dir as needed, other rootfs needs the mount point.
 
 config STANDALONE
 	bool "Select only drivers that don't need compile-time external firmware"
diff --git a/init/main.c b/init/main.c
index f866510..9ec09ff 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1038,12 +1038,6 @@ static noinline void __init kernel_init_freeable(void)
 
 	do_basic_setup();
 
-	/* Open the /dev/console on the rootfs, this should never fail */
-	if (sys_open((const char __user *) "/dev/console", O_RDWR, 0) < 0)
-		pr_err("Warning: unable to open an initial console.\n");
-
-	(void) sys_dup(0);
-	(void) sys_dup(0);
 	/*
 	 * check if there is an early userspace init.  If yes, let it do all
 	 * the work
@@ -1055,8 +1049,17 @@ static noinline void __init kernel_init_freeable(void)
 	if (sys_access((const char __user *) ramdisk_execute_command, 0) != 0) {
 		ramdisk_execute_command = NULL;
 		prepare_namespace();
+	} else if (IS_ENABLED(CONFIG_DEVTMPFS_MOUNT)) {
+		sys_mkdir("/dev", 0755);
+		devtmpfs_mount("/dev");
 	}
 
+	/* Open the /dev/console on the rootfs, this should never fail */
+	if (sys_open((const char __user *) "/dev/console", O_RDWR, 0) < 0)
+		pr_err("Warning: unable to open an initial console.\n");
+	(void) sys_dup(0);
+	(void) sys_dup(0);
+
 	/*
 	 * Ok, we have completed the initial bootup, and
 	 * we're essentially up and running. Get rid of the
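
The reordered console setup depends on open() returning the lowest free descriptor: with nothing open yet, /dev/console becomes fd 0, and the two sys_dup(0) calls yield fds 1 and 2. Below is a userspace sketch of that sequence, not kernel code. The helper name open_console_fds() is hypothetical, the caller is assumed to have already closed fds 0-2, and any openable path can stand in for /dev/console (a test can use /dev/null).

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical userspace analogue of the console setup in
 * kernel_init_freeable(): open one device read/write, then dup it twice.
 * If fds 0-2 are all closed beforehand, the lowest-free-descriptor rule
 * makes the results stdin (0), stdout (1) and stderr (2).
 * Returns 0 on success, -1 if the open fails.
 */
static int open_console_fds(const char *path, int fds[3])
{
    assert(path != NULL && fds != NULL);

    fds[0] = open(path, O_RDWR);
    if (fds[0] < 0)
        return -1;
    fds[1] = dup(fds[0]);
    fds[2] = dup(fds[0]);
    return 0;
}
```

Where this sketch returns -1 on a failed open, the kernel only prints a warning and carries on, so init starts with no stdin/stdout/stderr at all.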

Re: Is there an recommended way to refer to bitkeepr commits?

2017-05-13 Thread Rob Landley

On 05/12/2017 12:49 PM, Andreas Schwab wrote:
> On May 12 2017, Rob Landley <r...@landley.net> wrote:
> 
>> Last I checked I couldn't just "git push" the fullhist tree to
>> git.kernel.org because git graft didn't propagate right.
> 
> Perhaps you could recreate them with git replace --graft.  That creates
> replace objects that can be pushed and fetched.  (They are stored in
> refs/replace, and must be pushed/fetched explicitly.)

It's the "must be pushed/fetched explicitly" part that I couldn't figure
out back when I tried it.

I inherited this tree from somebody who made it. I noticed its existence
because lwn.net covered it, and then 6 months later it had vanished
without trace (as so many things do). I reproduced it from the build
script (if you can't reproduce the experiment from initial starting
conditions, it's not science), went "look, cool thing", and hosted a
copy with an occasional repaint.

I would be _thrilled_ to hand it off to somebody who knows what they're
doing with git. I'm just unusually interested in computer history and
the preservation thereof. (https://landley.net/history/mirror).

Rob

Re: Is there an recommended way to refer to bitkeepr commits?

2017-05-13 Thread Rob Landley

On 05/13/2017 04:35 AM, Thomas Gleixner wrote:
> On Fri, 12 May 2017, Eric W. Biederman wrote:
>> Which leaves me perplexed.  The hashes from tglx's current tree:
>> https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
>> on kernel.org and the hashes in your full history tree differ.
>> Given that they are in theory the same tree this distrubs me.

The original build script used to make fullhist is at:

  http://landley.net/kdocs/fullhist/make-full-linux-history.tgz

And his original description of what he did and why is at:

  https://lwn.net/Articles/285366/

He mentioned something about rewriting dates?

  I used the "graft" feature of git (thanks to Junio and people
  on #git for the tip) to link them together. I also modified
  (via a git-filter-branch) the dates of some commits as for
  instance all commits from the Dave Jones's repo had the
  same date (23 Nov 2007). For this I mainly used the timestamp info
  of files on kernel.org. The script and info I used are also
  available on my website[2].

(I tried to read his conversion plumbing but it's in ocaml.)

Apparently he only considered the git commits in Linus's tree to be
worth preserving. I'd forgotten that part. (It was 9 years ago. I
remembered the pre-bitkeeper tree got edited but I forgot the other one
did too.)

>> Case in point in the commit connected to:
>> "[PATCH] linux-2.5.66-signal-cleanup.patch"
>> in tglx's tree is:   da334d91ff7001d234863fc7692de1ff90bed57a
> 
> That's the proper sha1 for my tree. I jsut verified it against the original
> tree which I still have in my archive.
> 
>> *scratches my head*
>>
>> Something appears to have changed somewhere.
> 
> Correct. That full history git rewrote the commits in my bitkeeper import.

I only checked that the current ones in Linus's tree were the same.
Nobody'd ever pointed me at a commit hash in your conversion of bitkeeper
to git, so over the years I forgot that the date editing extended into
the bitkeeper import for some reason.

> history.git:
> 
>   commit 7a2deb32924142696b8174cdf9b38cd72a11fc96
>   Author: Linus Torvalds 
>   Date:   Mon Feb 4 17:40:40 2002 -0800
> 
> Import changeset

February 4, 2002.

> full-history:
> 
>   commit 26245c315da55330cb25dbfdd80be62db41dedb2
>   Author: linus1 
>   Date:   Thu Jan 4 12:00:00 2001 -0600
> 
> Import changeset

January 4, 2001.

According to https://www.kernel.org/pub/linux/kernel/v2.4/, January 4,
2001 is when 2.4.0 was released. So yes, it looks like he rewrote these
dates to be correct.

I see what he did. Linus started his bitkeeper tree by importing 2.4.0
and then applying a year's worth of release diffs from 2.4.0 as
individual commits. That year+ worth of work was all dated February 4,
2002 in the repo, so the fullhist script went through and changed the
dates on those commits to match the release tarballs for those kernel
versions, and that changed the hashes in the rest of the history tree.
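
The knock-on effect on later hashes follows from how git names commits: a commit ID is a hash over the commit object, and that object records its parent's ID, so changing the date on one early commit changes its ID and, transitively, the ID of every commit built on top of it. The sketch below models only that chaining. It uses 64-bit FNV-1a as a stand-in for git's SHA-1 and hashes just the parent ID plus a date string; real commit objects also cover the tree, author, committer and message, so these values are not git hashes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FNV64_OFFSET 0xcbf29ce484222325ULL
#define FNV64_PRIME  0x100000001b3ULL

/* 64-bit FNV-1a, continuing from hash state h. A toy stand-in for SHA-1. */
static uint64_t fnv1a(uint64_t h, const void *data, size_t len)
{
    const unsigned char *p = data;

    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= FNV64_PRIME;
    }
    return h;
}

/*
 * Toy commit chain: ids[i] = hash(ids[i-1], dates[i]), with no parent
 * for ids[0]. The parent ID is hashed as raw bytes (host byte order),
 * so values differ across architectures but the chaining does not.
 */
static void chain_ids(const char *const dates[], size_t n, uint64_t ids[])
{
    assert(dates != NULL && ids != NULL);

    for (size_t i = 0; i < n; i++) {
        uint64_t h = FNV64_OFFSET;

        if (i > 0)
            h = fnv1a(h, &ids[i - 1], sizeof(ids[i - 1]));
        ids[i] = fnv1a(h, dates[i], strlen(dates[i]));
    }
}
```

Running chain_ids() twice over the same three commits, once with only the first date changed, makes every ID in the chain differ, while rerunning it on unchanged input reproduces the same IDs.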

Upside, there's no longer a year+ hole in the commit dates (which makes
looking up associated mailing list posts a lot easier). Downside: this
changed the history.git commit hashes for the rest of that era. (I'd
missed that.)

> and as a consequence all other commits have different shas as well.

The most embarrassing part is that the ocaml plumbing appears to
occasionally leak host context when doing the conversion. Specifically,
from "git log 26245c315da5" (checking to make sure the fullhist tree's
dates make sense in context) I get:

commit 26245c315da55330cb25dbfdd80be62db41dedb2
Author: linus1 
Date:   Thu Jan 4 12:00:00 2001 -0600

Import changeset

commit 13a80dffb74939e292b6e90e5d79dd26d577489f
Author: linus1 
Date:   Thu Jan 4 12:00:00 2001 -0600

add prerelease patch to get a 2.4.0

commit 4c5b4d50bb08753433f5962bd926198fe2b7105d
Author: linus1 
Date:   Sun Dec 31 12:00:00 2000 -0600

That landley@driftood should not be there. Sigh.

I guess the question is which is more broken? I linked the build scripts
above if somebody else wants to modify or rerun them, but... lithp. Do
you prefer a year gap in the archive dates, or do you prefer to call the
history.git hashes canonical?

Rob

Re: Is there an recommended way to refer to bitkeepr commits?

2017-05-12 Thread Rob Landley

On 05/12/2017 09:45 AM, Eric W. Biederman wrote:
> Thomas Gleixner  writes:
> 
>> On Fri, 12 May 2017, Michael Ellerman wrote:
>>>   Fixes: BKrev: 3e8e57a1JvR25MkFRNzoz85l2Gzccg ("[PATCH] 
>>> linux-2.5.66-signal-cleanup.patch")
>>>
>>> In your tree that is c3c107051660 ("[PATCH] 
>>> linux-2.5.66-signal-cleanup.patch"),
>>> but you don't have the 3e8e57a1JvR25MkFRNzoz85l2Gzccg revision recorded
>>> anywhere that I can see.
>>
>> That's correct. I did not include the BK revisions when I imported the
>> commits into the history git. I did not see any reason to do so. I still
>> have no idea what the value would have been or why anyone wants to
>> reference them at all.
> 
> Thomas your import seems to be significantly better than the one I got
> my hands on years ago.
> 
> I just know that if were to do something similar today we would really
> want to preserve the existing git sha1 hashes somewhere because we
> refer to commits everywhere in the code.

Which is why the https://landley.net/kdocs/fullhist tree uses "git
graft", so the git commit numbers are the same.

As Yoann Padioleau said:

> It's built from 3 other git repositories:
>  - the one from Dave Jones from 0.01 to 2.4.0,
>  - the one from tglx from 2.4.0 to 2.6.12,
>  - the one from Linus Torvalds from 2.6.12 to now.

And the hashes in his tree were the same as in each of those trees, all
three of which are on git.kernel.org. If you "git pull" the fullhist
tree to current, it still uses the same hashes today. (I think you can
still reproduce it locally using his scripts, which I mirrored. You'll
have to manually re-tag those old commits from last message, and reset
the "upstream" to pull current from.)

Last I checked I couldn't just "git push" the fullhist tree to
git.kernel.org because git graft didn't propagate right. I had to start
people from a tarball. Local clones doing the hardlink thing worked fine
though. (Maybe that's changed?)

> So I was imagining that bitkeeper would be similar.

When Larry flounced and people lost access to the bitkeeper tool, they
lost access to read the old data, so what the bitkeeper numbers were
became irrelevant. That's why nobody's cared before now.

You're looking for a consistent way to refer to old commits, but even using
bitkeeper numbers wouldn't fully solve that problem (they only go back
to 2.4). Between the Dave Jones and tglx trees, there's complete
coverage back to 0.01. Yoann stitched them together, and I've kept a
current version. I used to host it on kernel.org/doc until
http://lkml.iu.edu/hypermail/linux/kernel/1411.3/04693.html happened;
they've since deleted it, but it's GPL, so anybody who wants to host a
mirror... :)

I'm traveling and not downloading a gigabyte through my phone tether
(darn tmobile 4 gig monthly tethering limit) but the date on the
https://landley.net/kdocs/local/linux-fullhist.tar.bz2
tarball is February 2016 so I'm pretty sure that's 4.0 with the old
major releases tagged (ala last email). Anybody who wants to mirror it
somewhere more official (and presumably .xz instead of .bz2) is welcome to.

(I would if I still had rsync access to kernel.org/doc, but alas I can't
even get them to link kernel.org/doc/Documentation from the page above
it. It used to, they accidentally deleted it, and nobody maintains the
page anymore...)

> Especially since the copy of the bitkeeper
> import into git had appened to each commit a BKrev which I presume
> tacked back to the original source.
> 
> If everyone who had imported the bitkeepr tree had done that it would
> not have mattered which bitkeeper import you were using they would all
> share a common identifier for commits.  With that absent the robustness
> we have to allow looking things up in an alternate tree lies solely
> with the one line patch description.
> 
> Compare the quotes lines above with what I have below.  Every tree
> appears to have a different identifier.

The commits in the fullhist tree have been stable since at least
https://lwn.net/Articles/285366/ which was June 6, 2008. It's derived
from earlier trees with the same commits, and kept those commit hashes.

> Below is what I wound up doing, and have queued for the next merge
> window.  Comments?

I've bisected or used the "git annotate... git annotate HASH^1... git
annotate NEXTHASH^1..." peeling trick back to some really old commits
over the years, which I've then referred to by submitter and date if it
wasn't in the current git tree.

I'll happily give people hashes out of the fullhist tree if they ask,
but haven't assumed they're using it. But if you're looking for an
existing standard, this exists and predates my use of it.

> Eric

Rob

[PATCHv2] Make initramfs honor CONFIG_DEVTMPFS_MOUNT

2017-05-11 Thread Rob Landley

From: Rob Landley <r...@landley.net>

Make initramfs honor CONFIG_DEVTMPFS_MOUNT, move /dev/console
open after devtmpfs mount, and update help text.

Signed-off-by: Rob Landley <r...@landley.net>
---

 drivers/base/Kconfig |   14 ++++----------
 init/main.c          |   15 +++++++++------
 2 files changed, 13 insertions(+), 16 deletions(-)

diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index d718ae4..74779ee 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -48,16 +48,10 @@ config DEVTMPFS_MOUNT
 	bool "Automount devtmpfs at /dev, after the kernel mounted the rootfs"
 	depends on DEVTMPFS
 	help
-	  This will instruct the kernel to automatically mount the
-	  devtmpfs filesystem at /dev, directly after the kernel has
-	  mounted the root filesystem. The behavior can be overridden
-	  with the commandline parameter: devtmpfs.mount=0|1.
-	  This option does not affect initramfs based booting, here
-	  the devtmpfs filesystem always needs to be mounted manually
-	  after the rootfs is mounted.
-	  With this option enabled, it allows to bring up a system in
-	  rescue mode with init=/bin/sh, even when the /dev directory
-	  on the rootfs is completely empty.
+	  Automatically mount devtmpfs at /dev on the root filesystem, which
+	  lets the system come up in rescue mode with [rd]init=/bin/sh.
+	  Override with devtmpfs.mount=0 on the commandline. Initramfs can
+	  create a /dev dir as needed, other rootfs needs the mount point.
 
 config STANDALONE
 	bool "Select only drivers that don't need compile-time external firmware"
diff --git a/init/main.c b/init/main.c
index f866510..9ec09ff 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1038,12 +1038,6 @@ static noinline void __init kernel_init_freeable(void)
 
 	do_basic_setup();
 
-	/* Open the /dev/console on the rootfs, this should never fail */
-	if (sys_open((const char __user *) "/dev/console", O_RDWR, 0) < 0)
-		pr_err("Warning: unable to open an initial console.\n");
-
-	(void) sys_dup(0);
-	(void) sys_dup(0);
 	/*
 	 * check if there is an early userspace init.  If yes, let it do all
 	 * the work
@@ -1055,8 +1049,17 @@ static noinline void __init kernel_init_freeable(void)
 	if (sys_access((const char __user *) ramdisk_execute_command, 0) != 0) {
 		ramdisk_execute_command = NULL;
 		prepare_namespace();
+	} else if (IS_ENABLED(CONFIG_DEVTMPFS_MOUNT)) {
+		sys_mkdir("/dev", 0755);
+		devtmpfs_mount("/dev");
 	}
 
+	/* Open the /dev/console on the rootfs, this should never fail */
+	if (sys_open((const char __user *) "/dev/console", O_RDWR, 0) < 0)
+		pr_err("Warning: unable to open an initial console.\n");
+	(void) sys_dup(0);
+	(void) sys_dup(0);
+
 	/*
 	 * Ok, we have completed the initial bootup, and
 	 * we're essentially up and running. Get rid of the

Re: [PATCH] Make initramfs honor CONFIG_DEVTMPFS_MOUNT

2017-05-11 Thread Rob Landley

On 05/09/2017 04:31 PM, Andrew Morton wrote:
> On Thu, 4 May 2017 16:09:06 -0500 Rob Landley <r...@landley.net> wrote:
> 
>> From: Rob Landley <r...@landley.net>
>>
>> Make initramfs honor CONFIG_DEVTMPFS_MOUNT, and move
>> /dev/console open after devtmpfs mount.
> 
> 
> Could we please see complete description of the runtime effects of this
> change?  How does it affect users?  How does it benefit users?

It makes the behavior consistent. If you're going to have the config
symbol anyway, why is initramfs a second class citizen?

That said, I was fixing a specific bug when I started the patch: when
you statically link in an initramfs by pointing the kernel build at a
directory (so it makes its own cpio archive from that), if you're not
running the build as root you can't create dev/console in there, and
there's no obvious way to add nodes (like you can by editing the
gen_initramfs_list output).

This means there's no /dev/console when init gets launched, so PID 1's
stdin/stdout/stderr go nowhere. Until your init script can open its
own and redirect, you get no output if something goes wrong, so debugging
is fiddly and there's a hole where output gets lost. Userspace can't
close that hole.
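
What userspace can do is repair its own descriptors once it can reach a console, which helps from that point on but recovers nothing lost before it. A minimal sketch, assuming a hypothetical helper name and a device path the caller can open (for example /dev/console after devtmpfs is mounted): it probes fds 0-2 with fcntl(F_GETFD), which fails on a closed descriptor, and reopens only the missing ones.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical init helper: reopen whichever of stdin, stdout and
 * stderr are closed on `path`, leaving already-open ones alone.
 * Returns the number of descriptors repaired, or -1 on error.
 */
static int reopen_missing_std_fds(const char *path)
{
    int repaired = 0;

    assert(path != NULL);
    for (int fd = STDIN_FILENO; fd <= STDERR_FILENO; fd++) {
        if (fcntl(fd, F_GETFD) != -1)
            continue;                       /* already open */

        /* Lower fds are open by now, so open() should return fd itself. */
        int got = open(path, O_RDWR);
        if (got < 0)
            return -1;
        if (got != fd) {                    /* defensive: another thread raced us */
            int rc = dup2(got, fd);

            close(got);
            if (rc < 0)
                return -1;
        }
        repaired++;
    }
    return repaired;
}
```

Walking fds 0, 1, 2 in order keeps every lower descriptor open before the next open() call, which is the same lowest-free-descriptor rule the kernel's open-then-dup sequence relies on.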

When making the patch I did a version that mounted /proc, /sys, and
/dev/pts too, so rdinit=/bin/sh had pretty much its full environment
without an init script, just like the DEVTMPFS_MOUNT option's help text
implied... but that seemed unlikely to be accepted. The console gap is a
problem userspace can't fix; the rest userspace can, so I did the
minimal thing.

> The DEVTMPFS_MOUNT Kconfig help (drivers/base/Kconfig) says:
> 
> This option does not affect initramfs based booting, here
> the devtmpfs filesystem always needs to be mounted manually
> after the rootfs is mounted.
> 
> which seems to no longer be correct?

Ah, sorry. I rewrote the help text and didn't include that file in the
diff. And on rechecking, I see my patch didn't implement the override
part; I'll send a new one.

Rob

Re: Is there an recommended way to refer to bitkeepr commits?

2017-05-11 Thread Rob Landley

On 05/11/2017 01:59 AM, Michael Ellerman wrote:
> Linus Torvalds <torva...@linux-foundation.org> writes:
> 
>> On Wed, May 10, 2017 at 3:04 PM, Eric W. Biederman
>> <ebied...@xmission.com> wrote:
>>>
>>> Thomas Gleixner appears to have a tree with all of those same commits
>>> except with the BKrev tags stripped out.
>>
>> That's the best import - so use that tree by Thomas, and just use the
>> git revision numbers in it (and say "tglx's linux-history tree" or
>> something).
> 
> I've been using this one by Rob Landley which seems good:
> 
> https://landley.net/kdocs/fullhist/
>
> It's grafted into the modern history so you can search seamlessly
> between the two which is pretty nice. I don't see any Bitkeeper tags
> though.

I went through and found/tagged the major old releases, did I forget to
upload a new tarball after that?

v0.0.1 cff5a6fb66765e90470f4d9ca2398da0ca3c75d5
v1.0.0 a068026b4a060e822892a64d5107fb58c45743ef
v1.2.0 8610c92442d125f165dc84e4a96f5cbc9b240484
v2.0.0 a374953c636bd91ea40b2d1e31af5405b90e8bf8
v2.2.0 bf330b5e3c471d0b67737c4822b0174ef4f89bed
v2.4.0 13a80dffb74939e292b6e90e5d79dd26d577489f
v2.6.0 4e9b4bc7a660962ae5f04f939469263b91cf95c2

Rob

[PATCH] Clarify help text that compression applies to ramfs as well as legacy ramdisk.

2017-05-04 Thread Rob Landley

From: Rob Landley <r...@landley.net>

Clarify help text that compression applies to ramfs as well as legacy ramdisk.

Signed-off-by: Rob Landley <r...@landley.net>
---

 usr/Kconfig |   12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/usr/Kconfig b/usr/Kconfig
index 572dcf7..d6f4633 100644
--- a/usr/Kconfig
+++ b/usr/Kconfig
@@ -46,7 +46,7 @@ config INITRAMFS_ROOT_GID
  If you are not sure, leave it set to "0".
 
 config RD_GZIP
-   bool "Support initial ramdisks compressed using gzip"
+   bool "Support initial ramdisk/ramfs compressed using gzip"
depends on BLK_DEV_INITRD
default y
select DECOMPRESS_GZIP
@@ -55,7 +55,7 @@ config RD_GZIP
  If unsure, say Y.
 
 config RD_BZIP2
-   bool "Support initial ramdisks compressed using bzip2"
+   bool "Support initial ramdisk/ramfs compressed using bzip2"
default y
depends on BLK_DEV_INITRD
select DECOMPRESS_BZIP2
@@ -64,7 +64,7 @@ config RD_BZIP2
  If unsure, say N.
 
 config RD_LZMA
-   bool "Support initial ramdisks compressed using LZMA"
+   bool "Support initial ramdisk/ramfs compressed using LZMA"
default y
depends on BLK_DEV_INITRD
select DECOMPRESS_LZMA
@@ -73,7 +73,7 @@ config RD_LZMA
  If unsure, say N.
 
 config RD_XZ
-   bool "Support initial ramdisks compressed using XZ"
+   bool "Support initial ramdisk/ramfs compressed using XZ"
depends on BLK_DEV_INITRD
default y
select DECOMPRESS_XZ
@@ -82,7 +82,7 @@ config RD_XZ
  If unsure, say N.
 
 config RD_LZO
-   bool "Support initial ramdisks compressed using LZO"
+   bool "Support initial ramdisk/ramfs compressed using LZO"
default y
depends on BLK_DEV_INITRD
select DECOMPRESS_LZO
@@ -91,7 +91,7 @@ config RD_LZO
  If unsure, say N.
 
 config RD_LZ4
-   bool "Support initial ramdisks compressed using LZ4"
+   bool "Support initial ramdisk/ramfs compressed using LZ4"
default y
depends on BLK_DEV_INITRD
select DECOMPRESS_LZ4
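The rename reflects that one set of decompressors serves both boot paths. As far as I know, the kernel picks a decompressor for a legacy initrd image or an initramfs cpio archive the same way, by matching the leading magic bytes (decompress_method() in lib/decompress.c). Below is a minimal sketch of that matching. The function name is hypothetical, and the table is recalled from the kernel source rather than copied from it, so verify the values before relying on them.

```c
#include <stddef.h>
#include <string.h>

/* Two-byte magic numbers for the formats covered by the RD_* options,
 * modeled on the table in the kernel's lib/decompress.c (recalled, not
 * copied; check that file before relying on the exact values). */
struct compressed_format {
    unsigned char magic[2];
    const char *name;
};

static const struct compressed_format formats[] = {
    { {0x1f, 0x8b}, "gzip" },
    { {0x1f, 0x9e}, "gzip" },
    { {0x42, 0x5a}, "bzip2" },  /* "BZ" */
    { {0x5d, 0x00}, "lzma" },
    { {0xfd, 0x37}, "xz" },     /* 0xfd '7' */
    { {0x89, 0x4c}, "lzo" },    /* 0x89 'L' */
    { {0x02, 0x21}, "lz4" },    /* legacy LZ4 frame */
};

/* Returns the format name for an image, or NULL if it isn't recognized
 * (an uncompressed cpio archive, for example). */
static const char *identify_compression(const unsigned char *buf, size_t len)
{
    if (len < 2)
        return NULL;
    for (size_t i = 0; i < sizeof(formats) / sizeof(formats[0]); i++)
        if (memcmp(buf, formats[i].magic, 2) == 0)
            return formats[i].name;
    return NULL;
}
```

Because the choice is driven by the data and not by the image type, enabling RD_XZ (for example) affects both initrd and initramfs images.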

[PATCH] Teach INITRAMFS_ROOT_UID and INITRAMFS_ROOT_GID that -1 means "current user".

2017-05-04 Thread Rob Landley

From: Rob Landley <r...@landley.net>

Teach INITRAMFS_ROOT_UID and INITRAMFS_ROOT_GID that -1 means "current user".

Signed-off-by: Rob Landley <r...@landley.net>
---

 scripts/gen_initramfs_list.sh |2 ++
 usr/Kconfig   |   12 
 2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/scripts/gen_initramfs_list.sh b/scripts/gen_initramfs_list.sh
index 17fa901..7666fa1 100755
--- a/scripts/gen_initramfs_list.sh
+++ b/scripts/gen_initramfs_list.sh
@@ -268,10 +268,12 @@ while [ $# -gt 0 ]; do
case "$arg" in
"-u")   # map $1 to uid=0 (root)
root_uid="$1"
+   [ "$root_uid" = "-1" ] && root_uid=$(id -u || echo 0)
shift
;;
"-g")   # map $1 to gid=0 (root)
root_gid="$1"
+   [ "$root_gid" = "-1" ] && root_gid=$(id -g || echo 0)
shift
;;
"-d")   # display default initramfs list
diff --git a/usr/Kconfig b/usr/Kconfig
index 572dcf7..3b6ff16 100644
--- a/usr/Kconfig
+++ b/usr/Kconfig
@@ -26,10 +26,8 @@ config INITRAMFS_ROOT_UID
depends on INITRAMFS_SOURCE!=""
default "0"
help
- This setting is only meaningful if the INITRAMFS_SOURCE is
- contains a directory.  Setting this user ID (UID) to something
- other than "0" will cause all files owned by that UID to be
- owned by user root in the initial ramdisk image.
+ If INITRAMFS_SOURCE points to a directory, files owned by this UID
+ (-1 = current user) will be owned by root in the resulting image.
 
  If you are not sure, leave it set to "0".
 
@@ -38,10 +36,8 @@ config INITRAMFS_ROOT_GID
depends on INITRAMFS_SOURCE!=""
default "0"
help
- This setting is only meaningful if the INITRAMFS_SOURCE is
- contains a directory.  Setting this group ID (GID) to something
- other than "0" will cause all files owned by that GID to be
- owned by group root in the initial ramdisk image.
+ If INITRAMFS_SOURCE points to a directory, files owned by this GID
+ (-1 = current group) will be owned by root in the resulting image.
 
  If you are not sure, leave it set to "0".
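Below is a minimal C model of what INITRAMFS_ROOT_UID does once this patch is applied, assuming the mapping works as the new help text describes. The function name is hypothetical, and the real work happens in scripts/gen_initramfs_list.sh. The GID side would be identical, using getegid() and `id -g`.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical model of the ownership mapping described in the new help
 * text: a configured value of -1 is first resolved to the invoking
 * user's effective UID (the patched script does this with `id -u`),
 * then any file owned by that UID is recorded as owned by root (0). */
static uid_t map_initramfs_uid(uid_t file_uid, long configured)
{
    uid_t root_uid = (configured == -1) ? geteuid() : (uid_t) configured;

    return (file_uid == root_uid) ? 0 : file_uid;
}
```

With -1, a non-root build maps the builder's own files to root without hardcoding the builder's UID in the config.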

[PATCH] Make initramfs honor CONFIG_DEVTMPFS_MOUNT

2017-05-04 Thread Rob Landley

From: Rob Landley <r...@landley.net>

Make initramfs honor CONFIG_DEVTMPFS_MOUNT, and move
/dev/console open after devtmpfs mount.

Signed-off-by: Rob Landley <r...@landley.net>
---

 init/main.c |   15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/init/main.c b/init/main.c
index 2858be7..71ed0d7 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1016,12 +1016,6 @@ static noinline void __init kernel_init_freeable(void)
 
do_basic_setup();
 
-   /* Open the /dev/console on the rootfs, this should never fail */
-   if (sys_open((const char __user *) "/dev/console", O_RDWR, 0) < 0)
-   pr_err("Warning: unable to open an initial console.\n");
-
-   (void) sys_dup(0);
-   (void) sys_dup(0);
/*
 * check if there is an early userspace init.  If yes, let it do all
 * the work
@@ -1033,8 +1027,17 @@ static noinline void __init kernel_init_freeable(void)
if (sys_access((const char __user *) ramdisk_execute_command, 0) != 0) {
ramdisk_execute_command = NULL;
prepare_namespace();
+   } else if (IS_ENABLED(CONFIG_DEVTMPFS_MOUNT)) {
+   sys_mkdir("/dev", 0755);
+   sys_mount("dev", "dev", "devtmpfs", MS_SILENT, NULL);
}
 
+   /* Open the /dev/console on the rootfs, this should never fail */
+   if (sys_open((const char __user *) "/dev/console", O_RDWR, 0) < 0)
+   pr_err("Warning: unable to open an initial console.\n");
+   (void) sys_dup(0);
+   (void) sys_dup(0);
+
/*
 * Ok, we have completed the initial bootup, and
 * we're essentially up and running. Get rid of the
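The relocated console block relies on POSIX lowest-free-descriptor semantics. With fds 0-2 all closed, the first successful open() returns 0, and each dup(0) returns the next free descriptor, so the two sys_dup(0) calls produce stdout (1) and stderr (2). Below is a userspace sketch of the same pattern, assuming it runs with fds 0-2 closed. The function name is hypothetical, and /dev/null can stand in for /dev/console when experimenting.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical userspace analogue of the console setup in
 * kernel_init_freeable(). Assumes fds 0, 1 and 2 are all closed on
 * entry: open() then returns the lowest free descriptor (0), and each
 * dup(0) returns the next lowest (1, then 2).
 * Returns the fd from open(), or -1 on failure. */
static int open_initial_console(const char *path)
{
    int fd = open(path, O_RDWR);

    if (fd < 0)
        return -1;
    (void) dup(0);  /* becomes stdout */
    (void) dup(0);  /* becomes stderr */
    return fd;
}
```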

Re: [PATCH 1/3] futex: remove duplicated code

2017-03-08 Thread Rob Landley

On 03/04/2017 07:05 AM, Russell King - ARM Linux wrote:
> On Fri, Mar 03, 2017 at 01:27:10PM +0100, Jiri Slaby wrote:
>> diff --git a/kernel/futex.c b/kernel/futex.c
>> index b687cb22301c..c5ff9850952f 100644
>> --- a/kernel/futex.c
>> +++ b/kernel/futex.c
>> @@ -1457,6 +1457,42 @@ futex_wake(u32 __user *uaddr, unsigned int flags, int 
>> nr_wake, u32 bitset)
>>  return ret;
>>  }
>>  
>> +static int futex_atomic_op_inuser(int encoded_op, u32 __user *uaddr)
>> +{
>> +int op = (encoded_op >> 28) & 7;
>> +int cmp = (encoded_op >> 24) & 15;
>> +int oparg = (encoded_op << 8) >> 20;
>> +int cmparg = (encoded_op << 20) >> 20;
> 
> Hmm.  oparg and cmparg look like they're doing these shifts to get sign
> extension of the 12-bit values by assuming that "int" is 32-bit -
> probably worth a comment, or for safety, they should be "s32" so it's
> not dependent on the bit-width of "int".

I thought Linux depended on the LP64 standard for all architectures?

Standard: http://www.unix.org/whitepapers/64bit.html
Rationale: http://www.unix.org/version2/whatsnew/lp64_wp.html

So int has a defined bit width (32) on linux?

Rob
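Whatever the answer on int width, the decode can be written so that it doesn't depend on it. Below is a sketch that uses fixed-width types and sign-extends the two 12-bit fields arithmetically. This avoids left-shifting signed values, which is undefined in ISO C when the result overflows. The struct and function names are hypothetical, and this is not code from the patch. In-tree, the sign_extend32() helper in <linux/bitops.h> serves a similar purpose.

```c
#include <stdint.h>

/* Fields packed into the futex encoded_op argument, as decoded in the
 * quoted patch: op in bits 28-30, cmp in bits 24-27, oparg in bits 12-23
 * and cmparg in bits 0-11 (both 12-bit two's-complement values). */
struct futex_op_fields {
    int op, cmp;
    int32_t oparg, cmparg;
};

/* Sign-extends the low 12 bits of v without shifting a signed value:
 * flipping bit 11 and subtracting 0x800 maps 0x000-0x7ff to 0..2047 and
 * 0x800-0xfff to -2048..-1, independent of the width of int. */
static int32_t sext12(uint32_t v)
{
    return (int32_t) ((v & 0xfff) ^ 0x800) - 0x800;
}

static struct futex_op_fields decode_futex_op(uint32_t encoded_op)
{
    struct futex_op_fields f;

    f.op = (encoded_op >> 28) & 7;
    f.cmp = (encoded_op >> 24) & 15;
    f.oparg = sext12(encoded_op >> 12);
    f.cmparg = sext12(encoded_op);
    return f;
}
```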

Re: Runtime failure running sh:qemu in -next due to 'sh: fix copy_from_user()'

2016-09-28 Thread Rob Landley

On 09/18/2016 10:17 AM, Rich Felker wrote:
> On Sat, Sep 17, 2016 at 11:40:28PM -0500, Rob Landley wrote:
>>
>>
>> On 09/16/2016 09:23 PM, Guenter Roeck wrote:
>>> On 09/16/2016 04:32 PM, Rich Felker wrote:
>>>>> 4.6.3 from kernel.org.
>>>>
>>>> That is utterly ancient and probaby very buggy. I would recommend 5.x+
>>>> or at the very least 4.7 or 4.8.
>>>>
>>> Unfortunately that is the latest one available from kernel.org :-(.
>>> I'll try to build one myself.
>>
>> Rich, you really, really need to get an actual release version of
>> https://github.com/richfelker/musl-cross-make posted.
> 
> What do you mean? Binaries? There are release tags, though it would
> probably be a good time to make another one.
> 
> But this project (musl-cross-make) is not needed for building kernels
> -- stock gcc, any modern-ish version, should work fine. The canonical
> way (from prior to my involvement) to build sh* kernels is to use a
> gcc that supports any ISA level, and this can be done without multilib
> libgcc since the kernel provides its own libgcc replacement functions.

The above was an example of somebody using a broken toolchain because
there isn't a known-good reference toolchain for the architecture that
the kernel maintainer is known to regression-test against. Having such a
thing might help people distinguish "bug in kernel" from "bug in gcc".

> Rich

Rob

Re: Runtime failure running sh:qemu in -next due to 'sh: fix copy_from_user()'

2016-09-17 Thread Rob Landley



On 09/16/2016 09:23 PM, Guenter Roeck wrote:
> On 09/16/2016 04:32 PM, Rich Felker wrote:
>>> 4.6.3 from kernel.org.
>>
>> That is utterly ancient and probaby very buggy. I would recommend 5.x+
>> or at the very least 4.7 or 4.8.
>>
> Unfortunately that is the latest one available from kernel.org :-(.
> I'll try to build one myself.

Rich, you really, really need to get an actual release version of
https://github.com/richfelker/musl-cross-make posted.

Rob

Re: [RFC] fs: add userspace critical mounts event support

2016-09-13 Thread Rob Landley

On 09/02/2016 07:20 PM, Luis R. Rodriguez wrote:
> kernel_read_file_from_path() can try to read a file from
> the system's filesystem. This is typically done for firmware
> for instance, which lives in /lib/firmware. One issue with
> this is that the kernel cannot know for sure when the real
> final /lib/firmare/ is ready, and even if you use initramfs
> drivers are currently initialized *first* prior to the initramfs
> kicking off.

Why?

> During init we run through all init calls first
> (do_initcalls()) and finally the initramfs is processed via
> prepare_namespace():

What's the downside of moving initramfs cpio extraction earlier in the boot?

I shuffled some of that code around to make initmpfs work. Does
anybody know why initramfs extraction _before_ we initialize drivers
would be a bad thing? (The cpio is in memory, either linked into the
kernel or from the bootloader. No drivers are needed to extract it;
that's sort of the point.)

The only things I can think of are memory churn (large contiguous
physical page allocations), or if a driver somehow got us access to more
physical memory?

Rob

Re: [PATCH] sh: Fix building j2_defconfig

2016-08-19 Thread Rob Landley

On 08/16/2016 04:23 PM, Jason Cooper wrote:
> Hi Rob,
> 
> On Tue, Aug 16, 2016 at 04:15:22PM -0500, Rob Landley wrote:
>> On 08/16/2016 10:41 AM, Jason Cooper wrote:
>>> When targeting the j2, we need to retain '-m2'.  Previously, the
>>> Makefile blew out -m2 on the next line via :=.
>>>
>>> Fix this by s/:=/+=/ when building for the J2.
>>>
>>> Fixes: 5a846abad07f6 ("sh: add support for J-Core J2 processor")
>>> Signed-off-by: Jason Cooper <ja...@lakedaemon.net>
>>
>> Speaking of j2, any status on the missing pieces of infrastructure that
>> went in through other trees, without which booting hangs awaiting the
>> first interrupt?
>>
>>   http://lists.j-core.org/pipermail/j-core/2016-August/000326.html
>>
>> It would be nice if the rest of the board support could make it in this
>> release. Which trees are they going through?
> 
> I'm not aware of the status of other bits, but the irqchip driver can be
> found [1] in a stable, based off of v4.8-rc1, branch here:
> 
>   git://git.infradead.org/users/jcooper/linux.git irqchip/jcore

That's got the interrupt controller, and presumably Thomas' tree has the
timer.

Is it likely to go upstream this dev cycle? Basic j2 board support did,
and as I said it hangs before userspace without the rest of the
interrupt controller and timer plumbing (which are currently only used
by this board).

The above message to the j-core list had an attached patch that adds the
missing bits to -rc2. I tested that patch and it worked for me:

Tested-by: Rob Landley <r...@landley.net>

I just checked the current git pull (not quite rc3) and vanilla is still
hanging at the same place, and the patch still applies cleanly. I'm
aware we're in bugfix-only mode, but "kernel hangs before launching
init" seems bug-ish to me.

Rob

Re: [PATCH] sh: Fix building j2_defconfig

2016-08-16 Thread Rob Landley

On 08/16/2016 10:41 AM, Jason Cooper wrote:
> When targeting the j2, we need to retain '-m2'.  Previously, the
> Makefile blew out -m2 on the next line via :=.
> 
> Fix this by s/:=/+=/ when building for the J2.
>
> Fixes: 5a846abad07f6 ("sh: add support for J-Core J2 processor")
> Signed-off-by: Jason Cooper 

Speaking of j2, any status on the missing pieces of infrastructure that
went in through other trees, without which booting hangs awaiting the
first interrupt?

  http://lists.j-core.org/pipermail/j-core/2016-August/000326.html

It would be nice if the rest of the board support could make it in this
release. Which trees are they going through?

Rob
