from:"Kay Sievers"

Re: [PATCH] modules: CONFIG_MODULE_COMPRESS: add hint that userspace support may easily be missing.

2015-06-03 Thread Kay Sievers

On Wed, Jun 3, 2015 at 7:30 PM, Lucas De Marchi
 wrote:
> On Mon, Jun 1, 2015 at 3:26 AM, Rusty Russell  wrote:
>> Andreas Mohr  writes:
>>> Hi,
>>>
>>> I just had a not so nice experience
>>> when finally upgrading to a new 4.1-rc5
>>> with CONFIG_MODULE_COMPRESS newly enabled -
>>> userspace binary parts (kmod 18 or 20 in my case)
>>> did not have compression enabled
>>> (at least on Debian 8pre, vs. encountering it enabled on FC21)
>>> since it does not seem to be
>>> the default build configuration of kmod (yet?).
>>
>> Sure.  Let's get the maintainers to insert the actual version required
>> in the help text though.
>
> kmod supports gz since the first version and xz since version 3.  So both
> of them can be safely fall into "it's supported since the beginning of
> kmod IMO".
>
> Regarding the "default configuration", there's no such thing. Each 
> distribution
> uses a different one.

You could add something similar to this:

$ /usr/lib/systemd/systemd --version
systemd 220
+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP
+LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ -LZ4 +SECCOMP +BLKID +ELFUTILS
+KMOD +IDN

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: How to fix CDROM/DVD eject mess?

2015-02-02 Thread Kay Sievers

On Mon, Feb 2, 2015 at 8:34 PM, Maciej W. Rozycki  wrote:
> On Mon, 2 Feb 2015, Kay Sievers wrote:
>
>> > I thought that fixing the udev behavior would solve the problem.  But
>> > it turned out that I was too naive.  A bigger problem is that all
>> > user-space stuff misinterprets DISK_EVENT_EJECT_REQUEST event: they
>> > see this as if the disk is *ready* to be ejected.  KDE, for example,
>> > dismisses the DVD icon when it receives this event even if it's still
>> > mounted.
>>
>> It is not really about being "ready to eject", if the user presses the
>> button, the user does not want to wait for anything else than actually
>> ejecting the media as fast as possible. It is the same as ripping out
>> a USB cable. It needs to work, no matter if things are mounted or
>> busy.
>
>  All the technical details aside, this is a bold statement -- how do you
> know what the user actually wants?

By working with people who spent a lot of time with the questions what
the default behavior of user interfaces should be. Buttons, especially
physical ones, need to give immediate feedback to the user. If they
don't give it it, people will look for something else to get what they
want.

>  I for one want to see the medium locked if in use, just as it has been
> since 1990s.  If I wanted to do an emergency eject (the equivalent of
> ripping out a USB cable), then I would use a paperclip in the manual eject
> hole.  So you've got a counterexample to your assertion now.  All people
> are not the same.

It's just the current default setup and intentional behavior. You or
your distribution can for sure implement something else.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: How to fix CDROM/DVD eject mess?

2015-02-02 Thread Kay Sievers

On Mon, Feb 2, 2015 at 2:20 PM, Takashi Iwai  wrote:
> we've got a bug report about the mishandling of DVD/CDROM media eject
> button, and it seems indeed broken since some time ago.  In short:
> when the eject button is pressed, the media is forcibly ejected no
> matter whether it's mounted or in use.

Which is the right thing to do for all read-only stuff.

> And, the mount remains even
> after ejecting the media, eventually the kernel spews error messages
> when the further access happens.

This cosmetic issue should be "fixed" if it matters.

> There seem many problems behind the scene.  First of all, udev tries
> to lock the media unconditionally at the media insert.  This seems to
> be a workaround for making DISK_EVENT_EJECT_REQUEST working.

It is no workaround, it's the SCSI spec. You only see events if you
lock the door.

> Then,
> udev unlocks the media and issues the SCSI eject ioctl unconditionally
> when DISK_EVENT_EJECT_REQUEST event is received.  Since SCSI ioctl
> doesn't take the open refcount into account, it results in the
> forcible eject.

Which again is the expected behavior in the user's view.

> (A relevant problem is that CDROM_IOCTL doesn't behave consistently;
>  it checks the open refcount only for IDE.  For SCSI, it bypasses and
>  gives the control directly to SCSI backend.  So, using CDROM_EJECT
>  ioctl won't help as of now.)
>
> I thought that fixing the udev behavior would solve the problem.  But
> it turned out that I was too naive.  A bigger problem is that all
> user-space stuff misinterprets DISK_EVENT_EJECT_REQUEST event: they
> see this as if the disk is *ready* to be ejected.  KDE, for example,
> dismisses the DVD icon when it receives this event even if it's still
> mounted.

It is not really about being "ready to eject", if the user presses the
button, the user does not want to wait for anything else than actually
ejecting the media as fast as possible. It is the same as ripping out
a USB cable. It needs to work, no matter if things are mounted or
busy.

We have to handle "the mess" in our tools, meaning cleaning up the stale
stuff in the kernel or userspace, just like we do with all other
plugable devices when they ripped out by the user.

I'm really not convinced that suppressing events here makes any sense.
It is just a hardware button event which should not be masked out for
rather weird reasons.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v3 03/11] drivers: base: add new class "cpu" to group cpu devices

2014-08-22 Thread Kay Sievers

On Thu, Aug 21, 2014 at 2:30 PM, Sudeep Holla  wrote:
> On 21/08/14 12:20, David Herrmann wrote:
>> On Thu, Aug 21, 2014 at 12:59 PM, Sudeep Holla 
>> wrote:
>>>
>>> From: Sudeep Holla 
>>>
>>> This patch creates a new class called "cpu" and assigns it to all the
>>> cpu devices. This helps in grouping all the cpu devices and associated
>>> child devices under the same class.
>>>
>>> This patch also:
>>> 1. modifies the get_parent_device to return the legacy path
>>> (/sys/devices/system/cpu/..) for the cpu class devices to support
>>> existing sysfs ABI
>>> 2. avoids creating link in the class directory pointing to the device as
>>> there would be per-cpu instance of these devices with the same name
>>> 3. makes sure subsystem symlink continues pointing to cpu bus instead of
>>> cpu class for cpu devices
>>
>>
>> This patch lacks any explanation _why_ you add a class for CPUs. With
>> this patch applied, these directories are effectively the same:
>>/sys/bus/cpu/devices/
>>/sys/class/cpu/
>>
>
> Yes that's the intention, so that we don't break any existing sysfs/ABI
>
>
>> Why do we need a cpu-class if the same set of information is already
>> available on the cpu-bus? Furthermore, classes are deprecated anyway.
>> Everything you can do with a class can be solved with a bus. And we
>> already have a bus for cpus.
>>
>
> This was suggested[1] by GregKH. The main reason it was added is to
> reuse the device attributes rather than creating the raw kobjects.
>
> It helps to move few other cpu related subsystems using raw kobjects to
> the device attribute groups.

No, nothing should ever create a bus and a class with the same name.
This is not supported by userspace tools.

Your problem needs to be addressed by adding things to the existing
"cpu" bus, not by adding a new class.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/5] printk: more log flag simplification

2014-07-17 Thread Kay Sievers

On Thu, Jul 17, 2014 at 7:59 PM, Alex Elder  wrote:
> This series rearranges the log code in such a way that the LOG_CONT
> and LOG_PREFIX log record flags can be eliminated entirely.  The
> result should be considerably easier to understand than before.  It
> builds on another recently-posted series of patches:
> https://lkml.org/lkml/2014/7/17/363
>
> The first patch exploits the fact that LOG_CONT and LOG_NEWLINE
> are inverses, and uses LOG_NEWLINE (or its negation) anywhere
> LOG_CONT is used.  As a result, LOG_CONT is no longer needed, so
> it's eliminated.
>
> The next three patches together eliminate LOG_PREFIX.  The effect
> of LOG_PREFIX is to complete the previous log entry before recording
> a new one.  Patch 2 arranges to do this directly, marking the previous
> log record with LOG_NEWLINE whenever a new record is presented with
> LOG_PREFIX set.  Patch 3 stops saving LOG_PREFIX in any log records,
> and patch 4 finally gets rid of LOG_PREFIX.
>
> The last patch is just some cleanup of the code now that it's gone
> through this transformation.

Continuation lines are racy and unreliable by definition, they create
a problem that cannot be solved properly, so we need to try to make
the best out of it. The idea of the record buffer flags was to record
what actually happened when things race against each other. A line
without a newline is recorded as a line without a newline.

Your simplifying patches changes the code to store how things will
*look* like when exported, not what actually *happened*; like it
pretends the earlier line had a newline, while that might not have
been the case.

Maybe your simplified logic is "good enough" for the common use cases,
I haven't thought it through, but it sounds a bit strange to pretend
things in the flags of a record which actually did not happen for this
record that way, just to save a bit in a flag field.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 1/2] printk: Add context information to the header of /dev/kmsg

2014-05-19 Thread Kay Sievers

On Mon, May 19, 2014 at 7:28 PM, Yoshihiro YUNOMAE
 wrote:
> Add context information to the header of /dev/kmsg.
>
> Two printk messages connected with KERN_CONT can be divided in multiple lines
> by a different process context message. If the different context message seems
> like the 1st divided message, it is difficult to understand which the 2nd
> divided message belongs to. This problem can also occur for the situation 
> where
> multiple message lines without KERN_CONT are broken into by similar messages.
> For example, SCSI disk error messages can be show as follows:
>
> [110781.736171] sd 2:0:0:0: [sdb]
> [110781.736170] sd 3:0:0:0: [sdc] Unhandled sense code
> [110781.736172] sd 3:0:0:0: [sdc]
> [110781.736175] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [110781.736177] sd 3:0:0:0: [sdc]
> [110781.736178] Sense Key : Medium Error [current]
> [110781.736187] Sense Key : Recovered Error
> [110781.736189] [current]

[...]

> This patch adds PID and interrupt context flag to the header of /dev/kmsg as
> the context information in order to understand relation of output messages. If
> PID values of two messages and the interrupt context flags are same, it means
> that those messages are same context, so those message have some relation. If
> not so, it means that those messages are different context, so users do not
> need to take care about the relation of the messages.

[...]

> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -197,6 +197,8 @@ enum log_flags {
>
>  struct printk_log {
> u64 ts_nsec;/* timestamp in nanoseconds */
> +   pid_t pid;  /* identify PID */
> +   u32 irq_count;  /* identify irq_count */

I don't think it is worth to blow-up this heavily used struct up even
more, just because SCSI cannot log in simple single calls. How about
fixing SCSI to log to a local buffer if it cannot safely print one
line at once.

I'm not convinced, that turning the /dev/kmsg format into an
"annotation language', and requiring a rather complex state machine to
re-construct the "broken" logging makes too much sense here. If tools
rely on properly formatted logging messages, the logging should be
fixed at the source while it is logged, not be reconstructed later.

It is not not my call, but I don't this makes much sense. Continuation
lines are "best effort" not a facility that is or ever was reliable in
the past. I think the proper fix is how the log is created, not how it
is exported.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 3/3] cgroup: implement cgroup.subtree_populated for the default hierarchy

2014-04-15 Thread Kay Sievers

On Tue, Apr 15, 2014 at 8:50 PM, Eric W. Biederman
 wrote:
> Kay Sievers  writes:
>
>> On Tue, Apr 15, 2014 at 7:48 PM, Li Zefan  wrote:
>>> On 2014/4/15 5:44, Tejun Heo wrote:
>>>> cgroup users often need a way to determine when a cgroup's
>>>> subhierarchy becomes empty so that it can be cleaned up.  cgroup
>>>> currently provides release_agent for it; unfortunately, this mechanism
>>>> is riddled with issues.
>>>>
>>>> * It delivers events by forking and execing a userland binary
>>>>   specified as the release_agent.  This is a long deprecated method of
>>>>   notification delivery.  It's extremely heavy, slow and cumbersome to
>>>>   integrate with larger infrastructure.
>>>>
>>>> * There is single monitoring point at the root.  There's no way to
>>>>   delegate management of subtree.
>>>>
>>>> * The event isn't recursive.  It triggers when a cgroup doesn't have
>>>>   any tasks or child cgroups.  Events for internal nodes trigger only
>>>>   after all children are removed.  This again makes it impossible to
>>>>   delegate management of subtree.
>>>>
>>>> * Events are filtered from the kernel side.  "notify_on_release" file
>>>>   is used to subscribe to or suppress release event.  This is
>>>>   unnecessarily complicated and probably done this way because event
>>>>   delivery itself was expensive.
>>>>
>>>> This patch implements interface file "cgroup.subtree_populated" which
>>>> can be used to monitor whether the cgroup's subhierarchy has tasks in
>>>> it or not.  Its value is 0 if there is no task in the cgroup and its
>>>> descendants; otherwise, 1, and kernfs_notify() notificaiton is
>>>> triggers when the value changes, which can be monitored through poll
>>>> and [di]notify.
>>>>
>>>
>>> For the old notification mechanism, the path of the cgroup that becomes
>>> empty will be passed to the user specified release agent. Like this:
>>>
>>> # cat /sbin/cpuset_release_agent
>>> #!/bin/sh
>>> rmdir /dev/cpuset/$1
>>>
>>> How do we achieve this using inotify?
>>>
>>> - monitor all the cgroups, or
>>> - monitor all the leaf cgroups, and travel cgrp->parent to delete all
>>>   empty cgroups.
>>> - monitor root cgroup only, and travel the whole hierarchy to find
>>>   empy cgroups when it gets an fs event.
>>>
>>> Seems none of them is scalible.
>>
>> The manager would add all cgroups as watches to one inotify file
>> descriptor, it should not be problem to do that.
>
> inotify won't work on cgroupfs.

Inotify on kernfs will work.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 3/3] cgroup: implement cgroup.subtree_populated for the default hierarchy

2014-04-15 Thread Kay Sievers

On Tue, Apr 15, 2014 at 7:48 PM, Li Zefan  wrote:
> On 2014/4/15 5:44, Tejun Heo wrote:
>> cgroup users often need a way to determine when a cgroup's
>> subhierarchy becomes empty so that it can be cleaned up.  cgroup
>> currently provides release_agent for it; unfortunately, this mechanism
>> is riddled with issues.
>>
>> * It delivers events by forking and execing a userland binary
>>   specified as the release_agent.  This is a long deprecated method of
>>   notification delivery.  It's extremely heavy, slow and cumbersome to
>>   integrate with larger infrastructure.
>>
>> * There is single monitoring point at the root.  There's no way to
>>   delegate management of subtree.
>>
>> * The event isn't recursive.  It triggers when a cgroup doesn't have
>>   any tasks or child cgroups.  Events for internal nodes trigger only
>>   after all children are removed.  This again makes it impossible to
>>   delegate management of subtree.
>>
>> * Events are filtered from the kernel side.  "notify_on_release" file
>>   is used to subscribe to or suppress release event.  This is
>>   unnecessarily complicated and probably done this way because event
>>   delivery itself was expensive.
>>
>> This patch implements interface file "cgroup.subtree_populated" which
>> can be used to monitor whether the cgroup's subhierarchy has tasks in
>> it or not.  Its value is 0 if there is no task in the cgroup and its
>> descendants; otherwise, 1, and kernfs_notify() notificaiton is
>> triggers when the value changes, which can be monitored through poll
>> and [di]notify.
>>
>
> For the old notification mechanism, the path of the cgroup that becomes
> empty will be passed to the user specified release agent. Like this:
>
> # cat /sbin/cpuset_release_agent
> #!/bin/sh
> rmdir /dev/cpuset/$1
>
> How do we achieve this using inotify?
>
> - monitor all the cgroups, or
> - monitor all the leaf cgroups, and travel cgrp->parent to delete all
>   empty cgroups.
> - monitor root cgroup only, and travel the whole hierarchy to find
>   empy cgroups when it gets an fs event.
>
> Seems none of them is scalible.

The manager would add all cgroups as watches to one inotify file
descriptor, it should not be problem to do that.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v5 0/4] Provide netdev naming-policy via sysfs

2014-03-29 Thread Kay Sievers

On Sat, Mar 29, 2014 at 8:37 PM, David Miller  wrote:
> From: Tom Gundersen 
> Date: Sat, 29 Mar 2014 10:46:02 +0100
>
>> The issue I see with that is that there are several ways to generate
>> predictable names, and the user may want to chose between them, so
>> this is arguably policy that should not be in the kernel. If you don't
>> think that's a problem, I'd be happy to submit a patch to move all
>> this logic from udev to the kernel, just let me know how you see it.
>
> Unfortunately, we have already allowed this kind of thing for setting
> MAC addresses so I guess I'll have to allow your changes to be applied.
>
> But before I ask you to resubmit, do you have full buy-in from the
> udev folks for this facility?

Absolutely, it is very useful to have.

Today, there is no way for userland to distinguish explicitly
requested and predictable names provided by tools, from unpredictable
names composed by the kernel itself by adding an unpredictable
enumeration number.

It allows us the work around the common problems in general purpose
Linux, where we have no strong dependencies, not necessarily central
management, and loose coupling between components.

This information will allow us to apply best-effort or common sense to
the default udev device name policies, like:
  - leave the name alone if someone has asked explicitly for that name
  - leave the name alone if someone else has already renamed it after
its first creation

So, having that information in /sys would be very welcome from udev's
point of view.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] tty: Add sysfs symlink for console name->tty device

2014-02-27 Thread Kay Sievers

On Thu, Feb 27, 2014 at 2:31 PM, Peter Hurley  wrote:
> On 02/27/2014 06:13 AM, Kay Sievers wrote:
>>
>> On Wed, Feb 26, 2014 at 3:40 PM, Peter Hurley 
>> wrote:
>>>
>>> Enable a user-space process to discover the underlying tty device
>>> for a console, if one exists, and when the tty device is later
>>> created or destroyed.
>>>
>>> Add sysfs symlinks for registered consoles to their respective
>>> devices in [sys/class,sys/devices/virtual]/tty/console.
>>> Scan consoles at tty device (un)registration to handle deferred
>>> console<->device (un)binding.
>>
>> What tool is supposed to read that? I can't think of anything
>> interested in this, as soon as we have fixed the "active" console
>> output.

> With all due respect, that "fix" is a ridiculous hack,

No, it is not. It's fine to handle tty0 special, as it is special.

> being done
> for self-serving expedience.

I don't see the problem.

> It already caused one user-space breakage
> which you did not expect.

That is normal way to do things, only people who don't do things don't
break things. And broken things get fixed, and the "active" file is still
fixable, and that is what we should do.

We don't need to invent new things because we did not get things right
with the first try.

> This sysfs interface is superior in every way.

But nothing uses it now, and probably never will, so I don't see the
need for it at this moment.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCHv3] tty: Set correct tty name in 'active' sysfs attribute

2014-02-27 Thread Kay Sievers

On Tue, Feb 25, 2014 at 10:38 AM, David Herrmann  wrote:
> On Tue, Feb 25, 2014 at 8:51 AM, Hannes Reinecke  wrote:

>> Positive?
>> I thought this was precisely the problem, ->device() changing the
>> index '0' into something non-zero.
>> The reports we had were that the line 'tty0' changed into 'tty1'.
>> Hence ->device() converted cs[i]->index (which is '0') into index
>> (which is '1').
>> Hence the check would be correct, wouldn't it?
>
> If "cs[i]" points to tty0, then cs[i]->index is 0. If you call
> ->device(), it will store 1 (or !=0) in "index". Thus, "(driver &&
> (index > 0))" will be true and you will write tty1 into the file
> instead of tty0. So you don't want to check whether the new value is
> non-zero, but whether the *previous* value was 0, turning this into:
>
> if (driver && (cs[i]->index > 0 || driver->major != TTY_MAJOR))
>
> So loosely speaking, we use the new code only for devices which either
> are not a VT or have an idx > 0. Otherwise, we use our fallback.

Hannes, David, care to update the patch to do that? It all sounds fine
to me. And we should get this merged again.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] tty: Add sysfs symlink for console name->tty device

2014-02-27 Thread Kay Sievers

On Wed, Feb 26, 2014 at 3:40 PM, Peter Hurley  wrote:
> Enable a user-space process to discover the underlying tty device
> for a console, if one exists, and when the tty device is later
> created or destroyed.
>
> Add sysfs symlinks for registered consoles to their respective
> devices in [sys/class,sys/devices/virtual]/tty/console.
> Scan consoles at tty device (un)registration to handle deferred
> console<->device (un)binding.

What tool is supposed to read that? I can't think of anything
interested in this, as soon as we have fixed the "active" console
output.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] ksm: Expose configuration via sysctl

2014-02-25 Thread Kay Sievers

On Tue, Feb 25, 2014 at 6:34 PM, Dave Hansen  wrote:
> On 02/24/2014 03:28 PM, Alexander Graf wrote:
>> Configuration of tunables and Linux virtual memory settings has traditionally
>> happened via sysctl. Thanks to that there are well established ways to make
>> sysctl configuration bits persistent (sysctl.conf).
>>
>> KSM introduced a sysfs based configuration path which is not covered by user
>> space persistent configuration frameworks.
>>
>> In order to make life easy for sysadmins, this patch adds all access to all
>> KSM tunables via sysctl as well. That way sysctl.conf works for KSM as well,
>> giving us a streamlined way to make KSM configuration persistent.
>
> Doesn't this essentially mean "don't use sysfs for configuration"?
> Seems like at least /sys/kernel/mm/transparent_hugepage would need the
> same treatment.
>
> Couldn't we also (maybe in parallel) just teach the sysctl userspace
> about sysfs?  This way we don't have to do parallel sysctls and sysfs
> for *EVERYTHING* in the kernel:
>
> sysfs.kernel.mm.transparent_hugepage.enabled=enabled
>
> Or do we just say "sysctls are the way to go for anything that might
> need to be persistent, don't use sysfs"?

Support in sysctl for setting static data in /sys might make sense for
some rare use cases.

It's still not obvious how to handle the dynamic nature of most of the
data that is created by modules, and which data belongs into udev
rules and which in the "sysctl /sys" settings.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] ksm: Expose configuration via sysctl

2014-02-25 Thread Kay Sievers

On Wed, Feb 26, 2014 at 12:16 AM, Alexander Graf  wrote:
>
>
>>> Am 26.02.2014 um 01:19 schrieb Peter Zijlstra :
>>>
 On Tue, Feb 25, 2014 at 12:15:28PM -0500, Johannes Weiner wrote:
 On Tue, Feb 25, 2014 at 12:28:04AM +0100, Alexander Graf wrote:
 Configuration of tunables and Linux virtual memory settings has 
 traditionally
 happened via sysctl. Thanks to that there are well established ways to make
 sysctl configuration bits persistent (sysctl.conf).

 KSM introduced a sysfs based configuration path which is not covered by 
 user
 space persistent configuration frameworks.

 In order to make life easy for sysadmins, this patch adds all access to all
 KSM tunables via sysctl as well. That way sysctl.conf works for KSM as 
 well,
 giving us a streamlined way to make KSM configuration persistent.

 Reported-by: Sasche Peilicke 
 Signed-off-by: Alexander Graf 
 ---
 kernel/sysctl.c |   10 +++
 mm/ksm.c|   78 
 +++
 2 files changed, 88 insertions(+), 0 deletions(-)

 diff --git a/kernel/sysctl.c b/kernel/sysctl.c
 index 332cefc..2169a00 100644
 --- a/kernel/sysctl.c
 +++ b/kernel/sysctl.c
 @@ -217,6 +217,9 @@ extern struct ctl_table random_table[];
 #ifdef CONFIG_EPOLL
 extern struct ctl_table epoll_table[];
 #endif
 +#ifdef CONFIG_KSM
 +extern struct ctl_table ksm_table[];
 +#endif

 #ifdef HAVE_ARCH_PICK_MMAP_LAYOUT
 int sysctl_legacy_va_layout;
 @@ -1279,6 +1282,13 @@ static struct ctl_table vm_table[] = {
   },

 #endif /* CONFIG_COMPACTION */
 +#ifdef CONFIG_KSM
 +{
 +.procname= "ksm",
 +.mode= 0555,
 +.child= ksm_table,
 +},
 +#endif
>>>
>>> ksm can be a module, so this won't work.
>>>
>>> Can we make those controls proper module parameters instead?
>>
>> You can do dynamic sysctl registration and removal. Its its own little
>> filesystem of sorts.
>
> Hm. Doesn't this open another big can of worms? If we have ksm as a module 
> and our sysctl helpers try to enable ksm on boot, they may not be able to 
> because the module hasn't been loaded yet.
>
> So in that case, we want to always register the sysctl and dynamically load 
> the ksm module when the sysctl gets accessed - similar to how we can do stub 
> devices that load modiles, no?

The files sysctl tries to write to at bootup need to be all there
right from the start, otherwise there can't be anything to access, and
no way to trigger any module load.

The auto-load stuff in /dev works by userspace creating dead device
nodes with the proper dev_t before the device exists, which is a very
different model.

sysctl is not suitable for any instantiated or conditional data like
loadable module, devices, ... Things need to be there right from the
start, later registered facilities will not be noticed by sysctl in
userspace and are therefore not hooked into system management. If any
values should be applied from userspace, this will not really work
out.

There a network devices configs doing that today, and they don't
really work that well; it's a gigantic stupid hack in userspace to
fiddle with /proc/sys/ when a netdev shows up, no model to copy to any
new facility.

Usually, loadable module parameter need to live in /sys/module/ or
/sys/bus/, where uevents are generated and udev can pick up the event
and apply system configuration.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 3.13 35/99] tty: Set correct tty name in active sysfs attribute

2014-02-21 Thread Kay Sievers

On Fri, Feb 21, 2014 at 3:56 PM, Josh Boyer  wrote:
> On Fri, Feb 21, 2014 at 9:52 AM, Hannes Reinecke  wrote:
>> On 02/21/2014 03:48 PM, Josh Boyer wrote:
>>> On Thu, Feb 20, 2014 at 6:52 PM, Greg Kroah-Hartman
>>>  wrote:
 3.13-stable review patch.  If anyone has any objections, please let me 
 know.

 --

 From: Hannes Reinecke 

 commit d8a5dc3033af2fd6d16030d2ee4fbd073460fe54 upstream.

 The 'active' sysfs attribute should refer to the currently active tty
 devices the console is running on, not the currently active console.

 The console structure doesn't refer to any device in sysfs, only the tty
 the console is running on has.  So we need to print out the tty names in
 'active', not the console names.

 This resolves an issue on s390 platforms in determining the correct
 console device to use.
>>>
>>> Just to be double sure this is seen, Ray points out that it breaks
>>> current plymouth because the heuristic changed.  Hold off on this one?
>>>
>> Without this patch systemd won't present a login console for s390.
>> I'd prefer fixing plymouth.
>
> Sure.  Except fixing plymouth is easy to do, but not easy to actually
> get deployed on all of the old userspace.  So if someone runs a 3.14
> kernel on any distro that doesn't have a fixed plymouth, it's broken.
> By including this patch, you're basically trading one broken userspace
> component for another.
>
> Is there some other way this could be fixed in-kernel that would allow
> both to work?

Why did the tty0 change to tty1 now? That doesn't look like a "driver
name" vs. "device name" issue?

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] printk: Fix discarding of records

2014-02-16 Thread Kay Sievers

On Mon, Feb 17, 2014 at 2:19 AM, Kay Sievers  wrote:
> On Mon, Feb 17, 2014 at 1:57 AM, Linus Torvalds
>  wrote:
>> On Sun, Feb 16, 2014 at 4:50 PM, Kay Sievers  wrote:
>>>
>>> That should avoid the overflow, yes. I expect it will not print the
>>> first line with a prefix, which we probably should.?
>>
>> Well, it's not printing out the prefix, but it's also not printing out
>> the whole first part of the line, so quite frankly, I think that's
>> actually "more correct".
>>
>> After all, it has already skipped the beginning of the line.
>> Prepending the prefix, then skipping part of the line, and then
>> printing the last part, that sounds truly insane, no?
>
> Yeah, it depends on the idea of what a "line" is; being it a single
> printk() call or a reconstructed continuation line, which happens when
> printk calls could not be merged for some reason into a single record.
>
> But sure, your patch, it sounds fine to just skip the prefix.
>
> The syslog() dump interface never made any promises, and it is not
> used that much anymore today (even dmesg switched away from it since
> quite a while).
>
> For the dumpers, who might use that interface to "page" through the
> data, not printing the prefix sounds actually like the better option
> looking at the stream of pages they ask for.

Your patch seems to work fine here. We now start in the middle of a
cont *line*, with the next cont *record*; do not print the header, and
also do not print more than we should:

  syslog(SYSLOG_ACTION_READ_ALL, "<6>[   73.671533] ---\n<6>[
73.673677] ...", 1088) = 704
  syslog(SYSLOG_ACTION_READ_ALL, "268-...", 1089) = 1089
  syslog(SYSLOG_ACTION_READ_ALL, "268-... 1090) = 1089

The current code before the patch does this:

  syslog(SYSLOG_ACTION_READ_ALL, "<4>[  210.190007]
268-26"..., 445) = 463

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] printk: Fix discarding of records

2014-02-16 Thread Kay Sievers

On Mon, Feb 17, 2014 at 1:57 AM, Linus Torvalds
 wrote:
> On Sun, Feb 16, 2014 at 4:50 PM, Kay Sievers  wrote:
>>
>> That should avoid the overflow, yes. I expect it will not print the
>> first line with a prefix, which we probably should.?
>
> Well, it's not printing out the prefix, but it's also not printing out
> the whole first part of the line, so quite frankly, I think that's
> actually "more correct".
>
> After all, it has already skipped the beginning of the line.
> Prepending the prefix, then skipping part of the line, and then
> printing the last part, that sounds truly insane, no?

Yeah, it depends on the idea of what a "line" is; being it a single
printk() call or a reconstructed continuation line, which happens when
printk calls could not be merged for some reason into a single record.

But sure, your patch, it sounds fine to just skip the prefix.

The syslog() dump interface never made any promises, and it is not
used that much anymore today (even dmesg switched away from it since
quite a while).

For the dumpers, who might use that interface to "page" through the
data, not printing the prefix sounds actually like the better option
looking at the stream of pages they ask for.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] printk: Fix discarding of records

2014-02-16 Thread Kay Sievers

On Mon, Feb 17, 2014 at 1:41 AM, Linus Torvalds
 wrote:
> On Sun, Feb 16, 2014 at 4:19 PM, Banerjee, Debabrata
>  wrote:

> The third loop does *not* start again from the first line! It
> *continues* from where the second loop ended. Which is exactly why
> clearing "prev" is *wrong*. Because the *last* line that the second
> loop processes is indeed the previous line before the *first* line
> that the third loop starts processing.
>
> No, I haven't tested my patch, and maybe it's broken for some subtle
> reason I'm missing too.

That should avoid the overflow, yes. I expect it will not print the
first line with a prefix, which we probably should.?

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] printk: Fix discarding of records

2014-02-16 Thread Kay Sievers

On Sun, Feb 16, 2014 at 9:47 PM, Greg Kroah-Hartman
 wrote:
> On Sun, Feb 16, 2014 at 11:28:36AM -0800, Linus Torvalds wrote:
>> Adding Kay and Greg, since the original code is from commit
>> 7ff9554bb578 ("printk: convert byte-buffer to variable-length record
>> buffer") and all the "prev" flag tweaks end up building on top of
>> that.
>>
>> The whole "prev flags" is messed up, and LOG_CONT is done very confusingly.
>>
>> Why are *those* particular two "prev = msg->flags" incorrect, when
>> every other case where we walk the messages they are required?
>>
>> The code/logic makes no sense. You remove the "prev = msg->flags" at
>> line 1070, when the *identical* loop just above it has it. So now the
>> two loops count the number of characters differently. That makes no
>> sense.
>>
>> So I don't think this fixes the fundamental problem. I'm more inclined
>> to believe that LOG_CONT is wrongly set somewhere, for example because
>> a continuation wasn't actually originally printed due to coming from
>> different users or something like that.
>>
>> Or at the very least I want a coherent explanation why one loop would
>> do this and the other would not, and why counting up *different*
>> numbers could possibly make sense.
>>
>> Because as it is, there clearly is some problem, but the patch does
>> not look sensible to me.
>
> Yeah, it doesn't make much sense to me either.
>
> Kay had a printk() test module that would stress these types of paths
> out a bunch, Kay, is that module around somewhere that we could maybe
> add it to the kernel tree so it could be used to test changes like this?

The module hack only tested the new stuff, not the legacy syslog() interface.

There is a bug in syslog_print_all(). We calculate the size of the
entire buffer for the plain text output including all the needed
headers, then we start to drop messages from the start, until the
result fits into the output buffer, and we start copying it.

Some of the lines get a header printed, cont lines don't. While
dropping messages from the start, if we end up at a line that did not
get a header added in the size calculation, we still add the header
now when copying it out as the first line.

We don't account for the size of this one additional header, and
therefore might end up writing up to 18 chars (syslog + time) to many.

I'll fix that tomorrow. Dropping the prev = in the loop avoids the
problem, but it will also result in fewer messages to be copied than
we could. We only have to account for the first header.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCHv2] tty: Set correct tty name in 'active' sysfs attribute

2014-02-07 Thread Kay Sievers

On Thu, Feb 6, 2014 at 5:29 PM, Greg Kroah-Hartman
 wrote:
> On Thu, Feb 06, 2014 at 04:44:20PM +0100, Hannes Reinecke wrote:
>> On 02/06/2014 04:29 PM, Greg Kroah-Hartman wrote:
>> > On Thu, Feb 06, 2014 at 03:27:43PM +0100, Hannes Reinecke wrote:
>> >> The 'active' sysfs attribute should refer to the currently
>> >> active tty devices the console is running on, not the currently
>> >> active console.
>> >
>> > That's not what Documentation/ABI/sysfs-tty says:
>> >  Shows the list of currently configured
>> >  console devices, like 'tty1 ttyS0'.
>> >  The last entry in the file is the active
>> >  device connected to /dev/console.
>> >  The file supports poll() to detect virtual
>> >  console switches.
>> >
>> The problem is indeed with 'console devices'. There is no such
>> thing; you only have tty devices where the console is running on.
>>
>> >> The console structure doesn't refer to any device in sysfs,
>> >> only the tty the console is running on has.
>> >
>> > That sentance doesn't make sense.
>> >
>> >> So we need to print out the tty names in 'active', not
>> >> the console names.
>> >
>> > But that doesn't match the documentation.
>> >
>> > What exactly are you trying to "fix" here?  What is the problem that the
>> > current file has that is broken?  And as you are changing what this file
>> > means, what will break if the information in the file changes?
>> >
>> systemd is using the 'active' sysfs attribute to figure out on which
>> _tty_ device to start a getty on.
>> As soon as the console name and the tty name are different
>> you have no means of figuring out which _device_ to open.
>> AFAICS the console 'device' (ie the current entry in 'active')
>> doesn't have _any_ equivalent in sysfs; it just so happens that for
>> most console drivers the tty driver name is identical.
>> But this is not a requirement, and fails for drivers which have a
>> different device for the console and the tty.
>>
>> EG on S/390 the 3270 tty has the devices
>>
>> /dev/3270/tty1
>>
>> but the console driver announces the name 'tty3270'.
>> So as per current rules the 'active' attribute contains
>>
>> tty32700
>>
>> which correct as per documentation, but doesn't have _any_
>> equivalent in sysfs.
>>
>> Martin has the grubby details here.
>>
>> But of course, the documentation should be updated to match the new
>> behavior.
>
> Ok, care to send an updated version, that fixes the Documentation as
> well?  If Kay agrees that this is the correct solution, I'll be glad to
> take it.

Sounds good to me. The intention clearly was to point to the device in use,
which we can find then.

I would not expect problems with this change. For common uses it is the
same name already and nothing visibly should change, and for the ones
where it isn't the same, I expect it is not too useful to find the driver
name.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v4 0/3] Send audit/procinfo/cgroup data in socket-level control message

2014-01-23 Thread Kay Sievers

On Thu, Jan 16, 2014 at 10:29 AM, Jan Kaluža  wrote:
> On 01/16/2014 12:23 AM, Tejun Heo wrote:
>> On Wed, Jan 15, 2014 at 06:21:43PM -0500, Eric Paris wrote:
>>>
>>> Reliably being able to audit what process requested an action is
>>> extremely useful.  And I like the audit patch, as it is a couple of ints
>>> we are storing.
>>>
>>> procinfo and cgroup can both be up to 4k of data.
>>>
>>> Is there an alternative he should consider?  Some way to grab a
>>> reference on task_struct and just attach that to the message?
>>
>> Or maybe it can be made separately optional instead of tagging along
>> on an existing option so that it doesn't tax use cases which don't
>> care about the new stuff?
>
> Right, I could add new option next to SOCK_PASSCRED which could be used to
> send newly added stuff. Would this be acceptable?
>
> I would still vote for SCM_AUDIT to be part of SOCK_PASSCRED and move
> SCM_CGROUP and SCM_PROCINFO into new option.

Maybe we could just add a new SOCK_PASS_TASKINFO bit to set in
socket->flags. Set that bit with a new SO_PASS_TASKINFO sockoption.

The SOCK_PASS_TASKINFO can carry all sorts of "struct task" related
stuff, also include the audit data. It is all fully conditional, so
users which do not explicitly subscribe to TASKINFO will never see the
data or needlessly pay for the overhead.

A TASKINFO sounds generic enough to be possibly extended with new data
in the future, without wasting extra bits in the socket flags.

Users which subscribe with SO_PASS_TASKINFO expect some overhead
anyway. In the end it's still a lot cheaper than parsing /proc for the
data; which is also racy and does therefore not work for any
short-living program.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] printk: flush conflicting continuation line

2014-01-02 Thread Kay Sievers

On Fri, Jan 3, 2014 at 1:57 AM, Joe Perches  wrote:
> (Adding Kay to cc's)
>
> Kay?  any opinion on correctness?

Sounds fine by looking at it. Did not test anything though.

>> > --- a/kernel/printk/printk.c
>> > +++ b/kernel/printk/printk.c
>> > @@ -1604,7 +1604,10 @@ asmlinkage int vprintk_emit(int facility, int level,
>> >   if (!(lflags & LOG_PREFIX))
>> >   stored = cont_add(facility, level, text, text_len);
>> >   cont_flush(LOG_NEWLINE);
>> > - }
>> > + /* Flush conflicting buffer. An earlier newline was missing
>> > + * and current print is from different task */
>> > + } else if (cont.len && cont.owner != current)
>> > + cont_flush(LOG_NEWLINE);

Unless I miss something, this whole sections all go inside a:
  if (cont.len) {
...
cont_flush(LOG_NEWLINE);
  }

and look a bit less confusing than the two conditions with just the
negated "current" check and duplicated flush call?

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Re: [PATCH 0/2] [BUGFIX] printk: Fix message continuation breakage involved with structured printk

2013-12-23 Thread Kay Sievers

On Tue, Dec 24, 2013 at 3:50 AM, Yoshihiro YUNOMAE
 wrote:
> (2013/12/20 20:29), Kay Sievers wrote:
>>
>> On Fri, Dec 20, 2013 at 10:41 AM, Yoshihiro YUNOMAE
>>  wrote:
>>
>>> This patch set fixes message continuation breakage involved with
>>> structured
>>> printk. A SCSI driver may output two continuation error messages like
>>>  scmd_printk("foo");
>>>  printf("bar\n");
>>
>>
>> Which is the absolutely wrong thing to do. Structured logging and racy
>> printk continuation must never be mixed. Userspace needs to be sure
>> that dictionary entries are not subject to racy continuation hackery,
>> and that these mwssages atomic, whole and intact.
>
>
> I see.
> As you say, user tools need to support messages output in multiple lines
> for SMP environments even if this patch set is introduced.

They cannot really, they can try to re-construct, but Information and
context is sometimes lost with  the use of continuation lines.
Structured logging need to be reliable and trustable, and it it is not
"best effort", hence it cannot use continuation lines.

Continuation lines are a nice debugging tool for humans only.
Structured logging carries the human readable string but also
machine-readable context and that alwasy needs to be safely
machine-readable and recognizable.

>> Please do not mix the both and do not apply these patches.
>
> OK, I'll make important messages with KERN_CONT or no-prefix printk()
> output those in single line.

Yes, it should use the plain printk versions and not the one which add
structured data.

If structured logging is really wanted for more complex continuation
lines, it might be the simplest to buffer the line locally, instead of
the single printk-owned buffer, before the line is emitted to printk.
That way there will be no race with other print users but structured
data can still safely be added.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2] printk: Delete LOG_NEWLINE flag for structured printk

2013-12-20 Thread Kay Sievers

On Fri, Dec 20, 2013 at 10:41 AM, Yoshihiro YUNOMAE
 wrote:
> Delete LOG_NEWLINE flag for structured printk.
> When structured printk is used, the next printk message is output in a new 
> line
> from patch c313af145b9bc4fb8e8e0c83b8cfc10e1b894a50. However, in a following
> pseudo SCSI error test, the device information and the detail information are
> divided:

This is wrong. Please do not apply. Racy ontinuation lines and
strutured logging should not be mixed. Continuation lines are a nice
kernel hack for debugging, useful to have, but cannot be trusted by
userspace. The entire purpose of structured logging is to have
trustable atomic logging messages, the both facilities should not be
mixed.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/2] printk: Add dictionary information in structure cont

2013-12-20 Thread Kay Sievers

On Fri, Dec 20, 2013 at 10:41 AM, Yoshihiro YUNOMAE
 wrote:
> Add dictionary information in structure cont.
> Dictionary information is added when a driver uses structured printk, and the
> information is shown in /dev/kmsg. Current kernel directly stores the
> information to log_buf. This patch stores the dict information in structure 
> cont
> first, then the information in cont is stored to log_buf.
>
> Signed-off-by: Yoshihiro YUNOMAE 
> Cc: Kay Sievers 
> Cc: Andrew Morton 
> Cc: Joe Perches 
> Cc: Tejun Heo 
> Cc: Frederic Weisbecker 
> Cc: linux-kernel@vger.kernel.org

This is wrong. Please do not apply. Racy ontinuation lines and
strutured logging should not be mixed. Continuation lines are a nice
kernel hack for debugging, useful to have, but cannot be trusted by
userspace. The entire purpose of structured logging is to have
trustable atomic logging messages, the both facilities should not be
mixed.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/2] [BUGFIX] printk: Fix message continuation breakage involved with structured printk

2013-12-20 Thread Kay Sievers

On Fri, Dec 20, 2013 at 10:41 AM, Yoshihiro YUNOMAE
 wrote:

> This patch set fixes message continuation breakage involved with structured
> printk. A SCSI driver may output two continuation error messages like
> scmd_printk("foo");
> printf("bar\n");

Which is the absolutely wrong thing to do. Structured logging and racy
printk continuation must never be mixed. Userspace needs to be sure
that dictionary entries are not subject to racy continuation hackery,
and that these mwssages atomic, whole and intact.

Please do not mix the both and do not apply these patches.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH RESEND][pciutils] libpci: pci_id_lookup - add udev/hwdb support

2013-12-15 Thread Kay Sievers

On Sun, Dec 15, 2013 at 12:33 PM, Martin Mares  wrote:
>> It does that per process doing that, and that's the problem for how
>> udev works/worked. The binary hwdb is on-disk and can be mmaped, and
>> there is no difference between initialization, first, or subsequent
>> queries.
>
> OK, point taken.
>
> I see that a mechanism for fast lookup of hardware identification data
> is needed. However, why should such a mechanism depend on udev, systemd,
> or Linux in general?
>
> What I would really like to have is a universal library for HW lookup,
> independent of anything else and widely portable. All the hardware data
> would be provided by other packages -- pci.ids, usb.ids, kernel modules,
> etc. -- and compiled to a binary format available for instant queries.

Yeah, that's what udev's hwdb is for us now.

For the textual strings, it seems relatively easy to commit to a
reasonable stable format, but I doubt that text strings alone would
justify supporting a binary format to compile to.

I wouldn't want to commit to cross-platform stability for other device
properties we use, or things which will end up in the hwdb. It's hard
enough already to coordinate the development across Linux
distributions, and generic cross-platform device-management metadata
maintenance doesn't sound too promising or an easy goal to me.

Some data of hwdb is entirely Linux and Linux kernel device specific.
The hwdb query string to lookup the binary radix tree in the hwdb file
is just the Linux kernel modalias strings, and they are directly
passed as queries to hwdb. Other platforms could do the same thing,
sure, but I'm not sure this really makes too much sense.

> Do I miss anything?

No, you didn't. It's just that we develop for and work with Linux
only, and this is really hard enough already. :)

I personally never think about other platforms because I don't really
know/care about or use them; and hwdb was just a tiny and not
important enough piece of the problem we need to solve, so it became
part of udev.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH RESEND][pciutils] libpci: pci_id_lookup - add udev/hwdb support

2013-12-15 Thread Kay Sievers

On Sun, Dec 15, 2013 at 10:18 AM, Martin Mares  wrote:
> Hello Kay,
>
>> Libpci and its linear search through megabytes of text files for evey
>> new query is too inefficient, that we cannot afford to use it during
>> early bootup. It was the largest hit left in bootup profiling on
>> machines booting userspace in the sub-1-second range on common
>> machines. It was probably never meant to provide efficient queries,
>> but it's the reason we can never use it during early boot.
>
> I do not know what you are speaking about -- libpci definitely does
> not perform linear scans on pci.ids. It builds a hash table from pci.ids
> on the first query and and all subsequent queries are O(1) on average.

It does that per process doing that, and that's the problem for how
udev works/worked. The binary hwdb is on-disk and can be mmaped, and
there is no difference between initialization, first, or subsequent
queries.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH RESEND][pciutils] libpci: pci_id_lookup - add udev/hwdb support

2013-12-14 Thread Kay Sievers

On Sun, Dec 15, 2013 at 1:23 AM, Tom Gundersen  wrote:
>> Also, did you consider the opposite, that is making hwdb call libpci
>> to resolve PCI IDs? What are the downsides?
>
> Well, part of the reason for introducing the hwdb was to avoid libpci for
> performance reasons (I added Kay in cc who can correct me if I'm wrong):
> Firstly we don't want to parse the ids file at runtime, and secondly we want
> the constant time lookup time that hwdb gives. Due to the increased
> performance we are now able to tag (in the udev database) all devices with
> the pciids info, before we could only do this for selected devices. A result
> of this is that this info is available 'for free' for any app that uses
> libudev.

Hwdb is an indexed database which can be queried without any
measurable cost. It also carries more information than the textual
strings, and this data is needed during early boot, so we need hwdb
anyway.

Libpci and its linear search through megabytes of text files for evey
new query is too inefficient, that we cannot afford to use it during
early bootup. It was the largest hit left in bootup profiling on
machines booting userspace in the sub-1-second range on common
machines. It was probably never meant to provide efficient queries,
but it's the reason we can never use it during early boot.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] video: backlight: Remove backlight sysfs uevent

2013-11-11 Thread Kay Sievers

On Tue, Nov 12, 2013 at 3:08 AM, Kyungmin Park  wrote:
> On Tue, Nov 12, 2013 at 10:19 AM, Kay Sievers  wrote:
>> On Tue, Nov 12, 2013 at 1:56 AM, Henrique de Moraes Holschuh
>>  wrote:
>>> On Tue, 12 Nov 2013, Jingoo Han wrote:
>>>> On Tuesday, November 12, 2013 8:57 AM, Kyungmin Park wrote:
>>>> > From: Kyungmin Park 
>>>> >
>>>> > The most mobile phones have Ambient Light Sensors and it changes 
>>>> > brightness according lux.
>>>> > It means it changes backlight brightness frequently by just writing 
>>>> > sysfs node, so it generates uevent.
>>>> >
>>>> > Usually there's no user to use this backlight changes. But it forks udev 
>>>> > worker threads and it takes
>>>> > about 5ms. The main problem is that it hurts other process activities. 
>>>> > so remove it.
>>>> >
>>>> > Kay said
>>>> > "Uevents are for the major, low-frequent, global device state-changes,
>>>> >  not for carrying-out any sort of measurement data. Subsystems which
>>>> >  need that should use other facilities like poll()-able sysfs file or
>>>> >  any other subscription-based, client-tracking interface which does not
>>>> >  cause overhead if it isn't used. Uevents are not the right thing to
>>>> >  use here, and upstream udev should not paper-over broken kernel
>>>> >  subsystems."
>>>
>>> True.
>>>
>>> Now, let's take a look at reality: should you poll()/select() on a sysfs
>>> node that doesn't suport it, it will wait until the poll/select timeout
>>> happens (or EINTR happens), and userspace has absolutely NO way to detect
>>> whether a sysfs node has poll/select support.
>>>
>>> What happens if the sysfs interface did not provide poll/select support
>>> since day one, but rather added it later?  Nobody will use it for a *long*
>>> time, if ever... unless you actually took pains to version the sysfs
>>> interface, and people actually care.
>>
>> If that's an issue, we can add a new "event" file, just for that.
>>
>>>> 'thinkpad_acpi.c' uses the 'BACKLIGHT_UPDATE_SYSFS'.
>>>> Henrique, can we remove it?
>>>
>>> Can't you fix this by rate-limiting, or otherwise adding an attribute that
>>> backlight devices should set when they need to supress change events?
>>
>> Yeah, great idea, fix a bad hack with another bad one on top. :)
>> Passing measurement data through uevents is just an utterly broken
>> idea which cannot be fixed.
>>
>>> Is there a proper on-screen-display support path for the backlight class
>>> nowadays?  Otherwise, you'd be removing the only way userspace ever had to
>>> do proper OSD of backlight changes...
>>
>> OSD drawing and event sounds usually happen as a fedback for
>> keypresses of brightness control, it would be weird to show up when
>> something else, like a light-sensor, adjusts the brightness in the
>> background.
>>
>> Anyway, there might be the need for coordination and a new interface,
>> but uevents for measurement data need to die entirely; they make no
>> sense, never made any; and the sooner they are gone the better.
>
> Now power_supply, especially battery uses this scheme. it passes
> battery data using uevent.
> do you have any idea to kill it?

It should be removed too, the same applies to power_supply as to everything
else; uevents are a broken interface for any kind of device data which
is not meant as a trigger to re-configure the device itself.

But power_supply events are at least not as unfixable as backlight,
the number of events can be kept relatively low during normal
operation. So it can happen after the backlight thing is sorted out.

Note: The same rule as for generating uevents applies also to device
properties exported in the environment too; measurement data has no
place there. Reading the "uevent" file of a battery in sysfs (we need to
do that at bootup) sometimes synchronously blocks 1 second to return,
just because it tries to add measurement data reading it live from the
hardware to add it to the event itself.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] video: backlight: Remove backlight sysfs uevent

2013-11-11 Thread Kay Sievers

On Tue, Nov 12, 2013 at 1:56 AM, Henrique de Moraes Holschuh
 wrote:
> On Tue, 12 Nov 2013, Jingoo Han wrote:
>> On Tuesday, November 12, 2013 8:57 AM, Kyungmin Park wrote:
>> > From: Kyungmin Park 
>> >
>> > The most mobile phones have Ambient Light Sensors and it changes 
>> > brightness according lux.
>> > It means it changes backlight brightness frequently by just writing sysfs 
>> > node, so it generates uevent.
>> >
>> > Usually there's no user to use this backlight changes. But it forks udev 
>> > worker threads and it takes
>> > about 5ms. The main problem is that it hurts other process activities. so 
>> > remove it.
>> >
>> > Kay said
>> > "Uevents are for the major, low-frequent, global device state-changes,
>> >  not for carrying-out any sort of measurement data. Subsystems which
>> >  need that should use other facilities like poll()-able sysfs file or
>> >  any other subscription-based, client-tracking interface which does not
>> >  cause overhead if it isn't used. Uevents are not the right thing to
>> >  use here, and upstream udev should not paper-over broken kernel
>> >  subsystems."
>
> True.
>
> Now, let's take a look at reality: should you poll()/select() on a sysfs
> node that doesn't suport it, it will wait until the poll/select timeout
> happens (or EINTR happens), and userspace has absolutely NO way to detect
> whether a sysfs node has poll/select support.
>
> What happens if the sysfs interface did not provide poll/select support
> since day one, but rather added it later?  Nobody will use it for a *long*
> time, if ever... unless you actually took pains to version the sysfs
> interface, and people actually care.

If that's an issue, we can add a new "event" file, just for that.

>> 'thinkpad_acpi.c' uses the 'BACKLIGHT_UPDATE_SYSFS'.
>> Henrique, can we remove it?
>
> Can't you fix this by rate-limiting, or otherwise adding an attribute that
> backlight devices should set when they need to supress change events?

Yeah, great idea, fix a bad hack with another bad one on top. :)
Passing measurement data through uevents is just an utterly broken
idea which cannot be fixed.

> Is there a proper on-screen-display support path for the backlight class
> nowadays?  Otherwise, you'd be removing the only way userspace ever had to
> do proper OSD of backlight changes...

OSD drawing and event sounds usually happen as a fedback for
keypresses of brightness control, it would be weird to show up when
something else, like a light-sensor, adjusts the brightness in the
background.

Anyway, there might be the need for coordination and a new interface,
but uevents for measurement data need to die entirely; they make no
sense, never made any; and the sooner they are gone the better.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] usb: ohci/uhci - add soft dependencies on ehci_hcd

2013-09-10 Thread Kay Sievers

On Tue, Sep 10, 2013 at 7:02 PM, Alan Stern  wrote:

> Where is MODULE_SOFTDEP defined?  It isn't mentioned in any .h files in
> my kernel tree.

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=7cb14ba75d57910cc4b62115dd5db7bd83c93684

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG?] staging: zram: Add auto loading of module if user opens /dev/zram.

2013-09-09 Thread Kay Sievers

On Mon, Sep 9, 2013 at 5:58 PM, Tom Gundersen  wrote:
> Hi Konrad,
>
> The above commit (c70bda9 in mainline) doesn't appear to work for me.
> I.e., depmod does not create an entry in modules.devname and hence no
> device node is created on boot.
>
> If I understand correctly, you'd also need to create the correct
> "char-major--" alias. But I don't really see how this is
> meant to work as I thought zram only created /dev/zramX type device
> nodes, and not /dev/zram, or am I missing something?

Please just remove it. "devname" is meant to be used for
single-instance devices with a static dev_t, never for things like
zramX.

It will not do anything useful here, it does nothing really without a
statically assigned dev_t, and it should not be used for devices of
this kind anyway.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2 3/3] Send cgroup_path in SCM_CGROUP

2013-09-02 Thread Kay Sievers

On Thu, Aug 29, 2013 at 4:13 PM, Jan Kaluza  wrote:
> Add new SCM type called SCM_CGROUP to send "cgroup_path" in SCM.
> This is useful for journald (systemd logging daemon) to get additional context
> with each log line received using UNIX socket.
>
> Signed-off-by: Jan Kaluza 

In many cases it's generally more useful to explain *why* something is
done, not *where* it is used. It makes it easier for people to match
the described problem to their own use-cases, where it possibly occurs
too. The problem this patch solves is very generic and not so much
specific to logging or the journal. Maybe something like this:

"Server-like processes in many cases need credentials and other
metadata of the peer, to decide if the calling process is allowed to
request a specific action, or the server just wants to log away this
type of information for auditing tasks.

The current practice to retrieve such process metadata is to look that
information up in procfs with the $PID received over SCM_CREDENTIALS.
This is sufficient for long-running tasks, but introduces a race which
cannot be worked around for short-living processes; the calling
process and all the information in /proc/$PID/ is gone before the
receiver of the socket message can look it up.

This introduces a new SCM type called SCM_CGROUP to allow the direct
attaching of "cgroup_path" to SCM, which is significantly more
efficient and will reliably avoid the race with the round-trip over
procfs."

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] udev: fail firmware loading immediately if no search path is defined

2013-08-10 Thread Kay Sievers

On Sat, Aug 10, 2013 at 11:00 PM, Tom Gundersen  wrote:
> It would be simple enough to add an udev rule to just print 'ignoring
> firmware event' to the logs.

This and I guess:
  SUBSYSTEM=="firmware", ACTION=="add", ATTR{loading}="-1"
would also just cancel the request at the same time without any other
code needed.

The udev firmware support just a configure option, just like the
kernel has them. So distributions should enable it in udev and the
kernel if they need it.

We simply cannot coordinate the defaults of systemd and the kernel
because the rules of the kernel are different. The kernel does "keep
defaults like stuff has been in the past" and udev does "make default
what makes the most sense on current systems".

> We should really ignore the event though, and
> not cancel it. Not sure if this is something we want upstream (after all,
> there are plenty of situations where we don't warn if the recommendations in
> the README file are not followed), or if distros and whoever wants it should
> ship that themselves. I'll leave that for Kay to decide.

The proper fix is that userspace firmware should be disabled in the
kernel for new systems, and kept enabled only for old systems. Old
systems need to enable a new udev version to support firmware loading.

There are currently broken in-kernel mis-users of the firmware
interface that use the firmware interface but disable uevents, they
still pull-in the user interface of the firmware loader. If nobody
wants to fix them, the code for the common users of the firmware
loader should at least get rid of the userspace fallback to call out
to userspace. At that point the udev configure option would not matter
any more.

> Lastly, note that the plan is to drop all the firmware code from udev in the
> not too distant future, so it doesn't really maker much sense to add new
> functionality to that at this point.

Right, I think all is fine. It's something that people can control
with the kernel and udev configuration options. It's just that the
defaults of the kernel and udev don't match at the moment, because
they have different policies of setting default values.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [systemd-devel] Slow firmware timeouts again (Re: [3.11 regression?] iwlwifi firmware takes two minutes to load)

2013-08-05 Thread Kay Sievers

On Fri, Aug 2, 2013 at 6:28 PM, Zbigniew Jędrzejewski-Szmek
 wrote:
> On Fri, Aug 02, 2013 at 09:04:44AM -0700, Andy Lutomirski wrote:
>> CONFIG_FW_LOADER_USER_HELPER=y
> Do you need this? Unsetting this should help.
>
> """This option enables / disables the invocation of user-helper
> (e.g. udev) for loading firmware files as a fallback after the
> direct file loading in kernel fails. The user-mode helper is
> no longer required unless you have a special firmware file that
> resides in a non-standard path."""

On recent systems, if the kernel configures
CONFIG_FW_LOADER_USER_HELPER=y and a firmware is not found by the
kernel, the kernel will issue a request which is ignored by userspace
and will block in that for 60 seconds.

Udev is no longer in the game of firmware loading, not even as a
fallback, it will just completely ignore all kernel firmware class
events.

The source code in udev to handle firmware requests is disabled by
default, currently still kept around for old kernels without the
in-kernel firmware loader, but it will be deleted in the near future.

Any issues with firmware timeouts should be addressed in the kernel by
disabling CONFIG_FW_LOADER_USER_HELPER or by removing the fallback
code from the in-kernel loader.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Re: [RFC PATCH 0/5] Add a hash value for each line in /dev/kmsg

2013-07-29 Thread Kay Sievers

On Mon, Jul 29, 2013 at 1:54 PM, Hidehiro Kawai
 wrote:

> Also, I heard about the discussion
> at the kernel summit 2 years ago.  According to the article of LWN,
> it seems that Linus objected your approach (i.e. adding random bit as
> message ID).  Were there some agreements on this issue at the kernel summit?

No, there are no further discussions about this.

Pre-allocated, static, randomly created 128-bit IDs are just the
simplest and most robust option to identify messages. It's an
unmanaged namespace that needs no coordination, the IDs are always
stable, never change and are guaranteed to be unique. None of the
hashing-of-the strings solutions can provide that by default.

I would expect that over time, the automatic hashes would end up
becoming static numbers explicitly add to the messages anyway, because
changing the message text will change the hash, which is nothing we
really want to deal with. For that reason, I think that we can add the
ID right away, without any of the hashing; and do that only for a very
tiny fraction of the messages where such IDs make sense and add value.

Message IDs is how userspace logging works today; so from the
userspace side this would fit into the already existing
infrastructure, while possibly changing hashes which require another
type of translation catalog would not.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 0/5] Add a hash value for each line in /dev/kmsg

2013-07-26 Thread Kay Sievers

On Wed, Jul 3, 2013 at 3:46 AM, Hidehiro Kawai
 wrote:

> This patch series adds hash values of printk format strings into
> each line of /dev/kmsg outputs as follows:
>
> 6,154,325061,-,b7db707c@kernel/smp.c:554;Brought up 4 CPUs

/dev/kmsg is to a certain degree a kernel ABI. Having source code
locations in exported log records might cause people / userspace tools
to rely on these strings and expect stability here. The kernel though
cannot make any promises of its source code layout.

The hash is supposed to identify the content of a message, but what if
someone fixes the string? Maybe someone just fixes a one char typo,
the hash will change and the message will not be recognizable any
more.

As much as "automated" hash creation sounds simple; I really think
adding explicit "manually" created random message ids to the bunch of
messages that are interesting is the better option long-term. It
shouldn't be that many messages, most of the printk output is not
really useful for automated inspection or to trigger specific actions.

Messages ideally should not have any direct context to the code that
emits them, they should just identify the message and add a few
structured properties to the message.

This is how userspace identifies log messages and maintains abstract
descriptions of the specific messages to act when they appear:
  http://www.freedesktop.org/wiki/Software/systemd/catalog/

Connecting kernel log message texts and source code locations with
message identifiers looks quite dangerous, hard to keep stable if
things are evolving. It might cause serious problems over time.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] Drivers: base: memory: Export symbols for onlining memory blocks

2013-07-25 Thread Kay Sievers

On Thu, Jul 25, 2013 at 5:03 PM, Dave Hansen  wrote:
> On 07/25/2013 04:14 AM, KY Srinivasan wrote:
>> As promised, I have sent out the patches for (a) an implementation of an 
>> in-kernel API
>> for onlining  and a consumer for this API. While I don't know the exact 
>> reason why the
>> user mode code is delayed (under some low memory conditions), what is the 
>> harm in having
>> a mechanism to online memory that has been hot added without involving user 
>> space code.
>
> KY, your potential problem, not being able to online more memory because
> of a shortage of memory, is a serious one.
>
> However, this potential issue exists in long-standing code, and
> potentially affects all users of memory hotplug.  The problem has not
> been described in sufficient detail for the rest of the developers to
> tell if you are facing a new problem, or whether *any* proposed solution
> will help the problem you face.
>
> Your propsed solution changes the semantics of existing user/kernel
> interfaces, duplicates existing functionality, and adds code complexity
> to the kernel.

Complexity, well, it's just a bit of code which belongs in the kernel.
The mentioned unconditional hotplug loop through userspace is
absolutely pointless. Such defaults never belong in userspace tools if
they do not involve data that is only available in userspace and
something would make a decision about that. Saying "hello" to
userspace and usrspace has a hardcoded "yes" in return makes no sense
at all. The kernel can just go ahead and do its job, like it does for
all other devices it finds too.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: udev: New default rule for autoloading kernel modules matching CPU modalias

2013-07-20 Thread Kay Sievers

On Sat, Jul 20, 2013 at 1:47 PM, Kay Sievers  wrote:
> On Sat, Jul 20, 2013 at 12:56 PM, Rafael J. Wysocki  wrote:
>
>> After a recent change present in 3.11-rc1 there is a driver, called 
>> processor,
>> that can be bound to the CPU devices whose sysfs directories are located 
>> under
>> /sys/devices/system/cpu/.  A side effect of this is that, after the driver 
>> has
>> been bound to those devices, the kernel adds DRIVER=processor to ENV for CPU
>> uevents and they don't match the default rule for autoloading modules 
>> matching
>> MODALIAS:
>>
>> DRIVER!="?*", ENV{MODALIAS}=="?*", IMPORT{builtin}="kmod load $env{MODALIAS}"
>>
>> any more.  However, there are some modules whose module aliases match 
>> specific
>> CPU features through the modalias string and those modules should be loaded
>> automatically if a compatible CPU is present.  Yet, with the processor driver
>> bound to the CPU devices the above rule is not sufficient for that, so we 
>> need
>> a new default udev rule allowing those modules to be autoloaded even if the
>> CPU devices have drivers.
>>
>> On my test systems I added the following rule for that:
>>
>> ACTION="add", SUBSYSTEM=="cpu", ENV{MODALIAS}=="?*", IMPORT{builtin}="kmod 
>> load $env{MODALIAS}"
>>
>> in a separate file, but I'm not a udev expert, so I guess it may be done in a
>> better way.
>>
>> Can you please consider adding such a rule to the default set of udev rules?
>
> The DRIVER!="?*" is an optimization which made a huge difference at
> the time we called out to /sbin/modprobe from udev rules and all the
> devices which already had a driver would not cause needless execution
> of the rather expensive modprobe binary.
>
> This obviously can't do the right thing for the completely generic,
> not bound to a driver-state, CPU modaliases when they have a driver
> now. :)
>
> These days we load the entire kmod modalias database into the udev
> process, and lookup what we need to load and load the module from
> within the udev worker process. It could be, that the optimization is
> not measurable on today's systems, if that's the case I'll remove it,
> which would be the simplest and most future proof option. Otherwise
> I'll add the CPU specific rule.
>
> I'll find that out and let you know.

I cannot see any measurable difference here that justifies that
optimization, so I removed it:
  
http://cgit.freedesktop.org/systemd/systemd/commit/?id=bf7f800f2b3e93ccd1229d4717166f3a4d3af72f

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: udev: New default rule for autoloading kernel modules matching CPU modalias

2013-07-20 Thread Kay Sievers

On Sat, Jul 20, 2013 at 12:56 PM, Rafael J. Wysocki  wrote:

> After a recent change present in 3.11-rc1 there is a driver, called processor,
> that can be bound to the CPU devices whose sysfs directories are located under
> /sys/devices/system/cpu/.  A side effect of this is that, after the driver has
> been bound to those devices, the kernel adds DRIVER=processor to ENV for CPU
> uevents and they don't match the default rule for autoloading modules matching
> MODALIAS:
>
> DRIVER!="?*", ENV{MODALIAS}=="?*", IMPORT{builtin}="kmod load $env{MODALIAS}"
>
> any more.  However, there are some modules whose module aliases match specific
> CPU features through the modalias string and those modules should be loaded
> automatically if a compatible CPU is present.  Yet, with the processor driver
> bound to the CPU devices the above rule is not sufficient for that, so we need
> a new default udev rule allowing those modules to be autoloaded even if the
> CPU devices have drivers.
>
> On my test systems I added the following rule for that:
>
> ACTION="add", SUBSYSTEM=="cpu", ENV{MODALIAS}=="?*", IMPORT{builtin}="kmod 
> load $env{MODALIAS}"
>
> in a separate file, but I'm not a udev expert, so I guess it may be done in a
> better way.
>
> Can you please consider adding such a rule to the default set of udev rules?

The DRIVER!="?*" is an optimization which made a huge difference at
the time we called out to /sbin/modprobe from udev rules and all the
devices which already had a driver would not cause needless execution
of the rather expensive modprobe binary.

This obviously can't do the right thing for the completely generic,
not bound to a driver-state, CPU modaliases when they have a driver
now. :)

These days we load the entire kmod modalias database into the udev
process, and lookup what we need to load and load the module from
within the udev worker process. It could be, that the optimization is
not measurable on today's systems, if that's the case I'll remove it,
which would be the simplest and most future proof option. Otherwise
I'll add the CPU specific rule.

I'll find that out and let you know.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: cgroup: status-quo and userland efforts

2013-07-02 Thread Kay Sievers

On Wed, Jul 3, 2013 at 1:57 AM, Thomas Gleixner  wrote:
> On Sun, 30 Jun 2013, Lennart Poettering wrote:
>> On 29.06.2013 05:05, Tim Hockin wrote:
>> > But that's not my point.  It seems pretty easy to make this cgroup
>> > management (in "native mode") a library that can have either a thin
>> > veneer of a main() function, while also being usable by systemd.  The
>> > point is to solve all of the problems ONCE.  I'm trying to make the
>> > case that systemd itself should be focusing on features and policies
>> > and awesome APIs.
>>
>> You know, getting this all right isn't easy. If you want to do things
>> properly, then you need to propagate attribute changes between the units you
>> manage. You also need something like a scheduler, since a number of
>> controllers can only be configured under certain external conditions (for
>> example: the blkio or devices controller use major/minor parameters for
>> configuring per-device limits. Since major/minor assignments are pretty much
>> unpredictable these days -- and users probably want to configure things with
>> friendly and stable /dev/disk/by-id/* symlinks anyway -- this requires us to
>> wait for devices to show up before we can configure the parameters.) Soo...
>> you need a graph of units, where you can propagate things, and schedule 
>> things
>> based on some execution/event queue. And the propagation and scheduling are
>> closely intermingled.
>
> you are confusing policy and mechanisms.
>
> The access to cgroupfs is mechanism.
>
> The propagation of changes, the scheduling of cgroupfs access and
> the correlation to external conditions are policy.
>
> What Tim is asking for is to have a common interface, i.e. a library
> which implements the low level access to the cgroupfs mechanism
> without imposing systemd defined policies to it (It might implement a
> set of common useful policies, but that's a different discussion).
>
> That's definitely not an unreasonable request, because he wants to
> implement his own set of policies which are not necessarily the same
> as those which are implemented by systemd.
>
> You are simply ignoring the fact, that Linux is used in other ways
> than those which you are focussed on. That's true for Google's way to
> manage its gazillion machines and that's equally true for the other
> end of the spectrum which is deep embedded or any other specialized
> use case. Just face it: running Linux on your laptop and on some RHT
> lab machines is covering about 1% of the use cases.
>
> Nevertheless you repeatedly claim, that systemd is the only way to
> deal with system startup and system management, is covering _ALL_ use
> cases and the interfaces you expose are sufficient.
>
> Did you ever work on specialized embedded or big data use cases? I
> really doubt that, but I might be wrong as usual.
>
> So I invite you to prove that you can beat an existing setup for an
> automotive use case with your magic systemd foo. I refund you fully,
> if you can beat the mark of a functional system less than 800ms after
> reset release on a 200MHz ARM machine. Functional is defined by the
> use case requirements and means:
>
> - Basic cgroups management working
> - GUI up and running
> - Main communication interface (CAN bus) up and running
>
> The rest of the system is starting up after that including a more
> complex cgroup management.
>
> According to your claim that systemd is covering everything and some
> more, this should take you a few hours. I grant you a full week to
> work on that.
>
> The use case Tim is talking about is different, but has similar
> constraints which are completely driven by his particular use case
> scenario. I'm sure, that Tim can persuade his management to setup a
> similar contest to prove your expertise on the other extreme of the
> Linux world.
>
> Before answering please think about the relevance of your statements
> "getting this all right isn't easy", "something like a scheduler",
> "users probably want ..."  and "stable /dev/disk/by-id/* symlinks" in
> those contexts.

I don't think anybody needs your money.

But it's sure an improvement over last time when you wanted to use a
"Kantholz" to make your statement.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] proc: make high precision system boot time available

2013-06-30 Thread Kay Sievers

On Sat, Jun 29, 2013 at 10:53 PM, Sami Kerola  wrote:

> BTW having a way to measure effect of suspend/resume could lead to a
> way to fix time time distortion.

> Perhaps there is better alternative to fix user space programs.
> Unfortunately I do not have either knowledge, or imagination, to come
> up with any. Fix hints, ideas, and other proposal are most welcome.

Stuff should just use:
  clock_gettime(CLOCK_BOOTTIME, &ts);

to get the time in better resolution, and with suspend time taken into account?

As a general note, the kmsg/printk() time in the raw kernel log is the
time of the CPU it runs on, especially during early init this time
might not be in sync across CPUs and stuff jumps event in the very
same stream on events from the kernel itself:
  
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/kernel/sched/clock.c#n329

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/3] firmware: Fix usermodehelper deadlock at shutdown

2013-05-08 Thread Kay Sievers

On Wed, May 8, 2013 at 6:26 PM, Takashi Iwai  wrote:
> At Thu, 9 May 2013 00:07:17 +0800,
> Ming Lei wrote:

>> On Wed, May 8, 2013 at 2:56 PM, Takashi Iwai  wrote:
>> > Hi,
>> >
>> > this is a series of patches for the issue we faced in the firmware
>> > loader code during debugging the problem with dell_rbu driver with
>> > 3.9 kernel.
>> >
>> > The original problem was that the shutdown gets stuck when DELL BIOS
>> > update is performed.  This turned out to be a problem in the firmware
>> > loader.  Although the reason of dell_rbu driver breakage is still
>>
>> Sorry, from these patchset, I can't see why it is a problem in firmware.
>>
>> > unclear, we should fix the firmware loader side, at least, not to
>> > stall during shutdown.
>>
>> Firstly you need to describe what/why is the stall? In fact, firmware
>> loading can't stall forever and it will timeout, but the current 60sec
>> timeout might be too long.
>
> The timeout check is activated only when uevent flag is set, and
> dell_rbu driver doesn't set it explicitly (because it's not supposed
> to be handled via udev or whatever).

These use the firmware loader not in the way the interface was intended:
  drivers/misc/lattice-ecp3-config.c
  drivers/firmware/dell_rbu.c

They just use the mechanism without any of the usual userspace setup.
It's really a nasty hack to hijack the interface that way.

The commit 6e3eaab02028c4087a92711b20abb9e72cc803a7 is a pretty broken
idea to start with. If something triggers uevents during runtime which
is not uncommon, these on-demand silently created firmware devices
would get really confused and race against the udev firmware loader
which cancels the events.

As if the userspace firmware loading in general wasn't a bad enough
idea already. :)

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] time: Revert ALWAYS_USE_PERSISTENT_CLOCK compile time optimizaitons

2013-04-25 Thread Kay Sievers

On Wed, Apr 24, 2013 at 8:55 PM, John Stultz  wrote:

>> FWIW, in the light of the original change, I've just removed the
>> /dev/rtc creation from the default udev rules now, so that thing will
>> be phased out in the future.
>
> Is that actually wanted? What happens to applications that use /dev/rtc?
>
> I think setting up the /dev/rtc link is important. Its just that setting it
> up exclusively by the hctosys flag is maybe more fragile then we'd like.
> Instead the hctosys flag maybe should only be used as a hint if there is
> more then one RTC available.

Ok, convinced.

I've changed the udev rules now to first "search" for the rtc with
"hctosys" flag set, and if none is found, just fall back to /dev/rtc0.

It should work reliably on most boxes, and still do the right thing in
most cases if none of the rtcs has that flag.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: CONFIG_RTC_HCTOSYS lost on x86 with ALWAYS_USE_PERSISTENT_CLOCK changes?

2013-04-25 Thread Kay Sievers

On Thu, Apr 25, 2013 at 8:33 PM, Jason Gunthorpe
 wrote:
> So, my conclusion is nobody with a RTC looking for space savings, will
> disable CONFIG_RTC, which means we can safely rely on
> CONFIG_RTC_SYSTOHC to do this work. To that end, I would encourage
> everyone who sets CONFIG_GENERIC_CMOS_UPDATE to also set
> CONFIG_RTC_SYSTOHC - that will let update_persistent_clock to be
> ripped out over time without impacting users.

John's original patch forcefully disabled CONFIG_RTC_SYSTOHC on x86,
which is quite the opposite of what you recommend here. :)

Could you guys both sort that out and give an idea what the
recommended setup should look like today, ignoring all space saving
and possible hctosys API changes caused by this. Should
CONFIG_RTC_SYSTOHC be enabled or not?

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: CONFIG_RTC_HCTOSYS lost on x86 with ALWAYS_USE_PERSISTENT_CLOCK changes?

2013-04-25 Thread Kay Sievers

On Thu, Apr 25, 2013 at 9:11 AM, Alexander Holler  wrote:

> Hmm, I thought RTC_SYSTOHC was there to update the used RTC clock with the
> time from NTP (and liked that).

That seems to have the nice self-explaining name CONFIG_GENERIC_CMOS_UPDATE. :)

> Therefor I don't understand why it is
> redundant and unnecessary on x86.

Because the x86 native RTC/cmos is updated with platform code, not
generic rtc code:
  arch/x86/kernel/rtc.c

> Of course, most systems do have something
> in userspace to set the RTC on shutdown, so it isn't really needed.

That is gone on most systems today. Systems without NTP or something
else running have no clue about the time, and should not touch the
hardware clock with a boot cycle. Only if a reliable time source like
NTP is available, it should update the hardware clock accordingly.

> Anyway, thanks a lot for the great overview.

Yeah, thanks John, from my side too.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2] kmsg: Honor dmesg_restrict sysctl on /dev/kmsg

2013-04-24 Thread Kay Sievers

On Wed, Apr 24, 2013 at 11:51 PM, Josh Boyer  wrote:

>> In the daemon case, it's nice to be able to drop privileges after
>> setting up resources. The past was open /proc/kmsg with CAP_SYS_ADMIN,
>> then drop CAP_SYS_ADMIN and keep reading. Then later CAP_SYS_LOG was
>> introduced. So if a daemon switched from /proc/kmsg to /dev/kmsg they
>> wouldn't be able to drop the capability. But, it's much saner to carry
>> CAP_SYS_LOG than CAP_SYS_ADMIN on a long-running daemon.
>
> I have no idea on this front.  I'll let Kay speak to that.

The original code checks once at open() only, which would allow to do
do all that privilege dropping. It is how I would expect it to work,
instead of checking the permissions at every read().

> On my
> currently running Fedora 18 system, I actually have systemd-journald
> using /dev/kmsg

That's the recent structured logging stuff.

> and rsyslog using /proc/kmsg.

That's the old plain text syslog daemon stuff.

> Why I have both, I have no friggin idea.

Nobody removed the old syslog dameon by default from the distro. If
you don't want or need the plain text files in /var/log/ anymore, just
uninstall it and use journalctl(1) to see the system logs from then
on.

>> Is there an intention to use /dev/kmsg for the syslog management daemon?

Not that I know.

> Maybe?  I mean, systemd-journald seems to be using it for something.
> Kay?

I doubt that old syslog implementations will be ported to a new kernel
interface. They work just fine the way they are, and the structured
data that is additionally put out on the new interface, they cannot
really store away anyway in their plain text files, so they do not
gain anything really.

What we can probably expect though, is that in the future the default
systems will not install any old syslog daemon, which uses that
interface anymore.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] time: Revert ALWAYS_USE_PERSISTENT_CLOCK compile time optimizaitons

2013-04-24 Thread Kay Sievers

On Wed, Apr 24, 2013 at 8:32 PM, John Stultz  wrote:
> Kay Sievers noted that the ALWAYS_USE_PERSISTENT_CLOCK config,
> which enables some minor compile time optimization to avoid
> uncessary code in mostly the suspend/resume path could cause
> problems for userland.
>
> In particular, the dependency for RTC_HCTOSYS on
> !ALWAYS_USE_PERSISTENT_CLOCK, which avoids setting the time
> twice and simplifies suspend/resume, has the side effect
> of causing the /sys/class/rtc/rtcN/hctosys flag to always be
> zero, and this flag is commonly used by udev to setup the
> /dev/rtc symlink to /dev/rtcN, which can cause pain for
> older applications.

An alternative to reverting this could be to set that flag
unconditionally on the rtc that matches the persistent clock the
kernel uses internally?

I mean regardless of the pretty weird config option
CONFIG_RTC_HCTOSYS_DEVICE, which internally just does a strcmp() with
the given device name when the flag is queried. :)

Can't we just have the same interface and your original optimization
on x86, even without CONFIG_RTC_HCTOSYS_DEVICE enabled at all?

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] time: Revert ALWAYS_USE_PERSISTENT_CLOCK compile time optimizaitons

2013-04-24 Thread Kay Sievers

On Wed, Apr 24, 2013 at 8:55 PM, John Stultz  wrote:
> On 04/24/2013 11:41 AM, Kay Sievers wrote:
>>
>> On Wed, Apr 24, 2013 at 8:32 PM, John Stultz 
>> wrote:
>>>
>>> Kay Sievers noted that the ALWAYS_USE_PERSISTENT_CLOCK config,
>>> which enables some minor compile time optimization to avoid
>>> uncessary code in mostly the suspend/resume path could cause
>>> problems for userland.
>>>
>>> In particular, the dependency for RTC_HCTOSYS on
>>> !ALWAYS_USE_PERSISTENT_CLOCK, which avoids setting the time
>>> twice and simplifies suspend/resume, has the side effect
>>> of causing the /sys/class/rtc/rtcN/hctosys flag to always be
>>> zero, and this flag is commonly used by udev to setup the
>>> /dev/rtc symlink to /dev/rtcN, which can cause pain for
>>> older applications.
>>
>> FWIW, in the light of the original change, I've just removed the
>> /dev/rtc creation from the default udev rules now, so that thing will
>> be phased out in the future.
>
> Is that actually wanted? What happens to applications that use /dev/rtc?
>
> I think setting up the /dev/rtc link is important. Its just that setting it
> up exclusively by the hctosys flag is maybe more fragile then we'd like.
> Instead the hctosys flag maybe should only be used as a hint if there is
> more then one RTC available.

So far we only created the symlink for an rtc with the hctosys flag
set. It was added as a workaround for ARM, which sometimes has
multiple RTCs. But there is now also logic in udev/systemd anyway to
search for the system's rtc, which does not rely on the symlink. Other
commonly used tools we checked just use /dev/rtc0 directly.

As mentioned in the mail yesterday, I expected that
CONFIG_RTC_HCTOSYS=y was also needed on x86. The current udev logic
would need updating anyway, if we disable CONFIG_RTC_HCTOSYS=y now. So
let's just find out what's really needed here and add it back properly
if something really needs it.

While we are at it, the interface to read and set the persistent clock
from userspace, the clock the kernel internally uses, is still to just
use the /dev/rtc0 device with the ioctls? There is no other native
kernel timer interface or something else for that, right?

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] time: Revert ALWAYS_USE_PERSISTENT_CLOCK compile time optimizaitons

2013-04-24 Thread Kay Sievers

On Wed, Apr 24, 2013 at 8:32 PM, John Stultz  wrote:
> Kay Sievers noted that the ALWAYS_USE_PERSISTENT_CLOCK config,
> which enables some minor compile time optimization to avoid
> uncessary code in mostly the suspend/resume path could cause
> problems for userland.
>
> In particular, the dependency for RTC_HCTOSYS on
> !ALWAYS_USE_PERSISTENT_CLOCK, which avoids setting the time
> twice and simplifies suspend/resume, has the side effect
> of causing the /sys/class/rtc/rtcN/hctosys flag to always be
> zero, and this flag is commonly used by udev to setup the
> /dev/rtc symlink to /dev/rtcN, which can cause pain for
> older applications.

FWIW, in the light of the original change, I've just removed the
/dev/rtc creation from the default udev rules now, so that thing will
be phased out in the future.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2] kmsg: Honor dmesg_restrict sysctl on /dev/kmsg

2013-04-24 Thread Kay Sievers

On Tue, Apr 9, 2013 at 6:33 PM, Kees Cook  wrote:
> On Tue, Apr 9, 2013 at 8:48 AM, Josh Boyer  wrote:
>> The dmesg_restrict sysctl currently covers the syslog method for access
>> dmesg, however /dev/kmsg isn't covered by the same protections.  Most
>> people haven't noticed because util-linux dmesg(1) defaults to using the
>> syslog method for access in older versions.  With util-linux dmesg(1)
>> defaults to reading directly from /dev/kmsg.
>>
>> Fix this by reworking all of the access methods to use the
>> check_syslog_permissions function and adding checks to devkmsg_open and
>> devkmsg_read.
>>
>> This fixes https://bugzilla.redhat.com/show_bug.cgi?id=903192
>>
>> Reported-by: Christian Kujau 
>> CC: sta...@vger.kernel.org
>> Signed-off-by: Eric Paris 
>> Signed-off-by: Josh Boyer 
>
> Thanks!
>
> Acked-by: Kees Cook 

If that's the version currently in Fedora, we just cannot do this.
   https://bugzilla.redhat.com/show_bug.cgi?id=952655

/dev/kmsg is supposed, and was added, to be a sane alternative to
syslog(). It is already used in dmesg(1) which is now broken with this
patch.

The access rules for /dev/kmsg should follow the access rules of
syslog(), and not be any stricter.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: CONFIG_RTC_HCTOSYS lost on x86 with ALWAYS_USE_PERSISTENT_CLOCK changes?

2013-04-24 Thread Kay Sievers

On Wed, Apr 24, 2013 at 6:30 PM, John Stultz  wrote:

>> Until the above commits we always needed:
>>CONFIG_RTC_HCTOSYS=y
>>CONFIG_RTC_HCTOSYS_DEVICE="rtc0"
>> to get the system time correctly initialized at bootup on x86.

> So... always needed to get system time correctly initialized? I'm not sure I
> agree with this. On non-x86 (mostly embedded) platforms that do not have
> persistent_clock support, yes, the above is needed.

Yeah, right, that's an "always" like the "forever" in "we support that
for forever" :)

> But I'm unaware of the above actually being necessary on x86, as its always
> had persistent_clock support.

Maybe it goes back longer, I remember that we needed to run hwclock in
userspace otherwise we had the 1970 state.

Here is the Fedora bug from 2009 to enable it:
  https://bugzilla.redhat.com/show_bug.cgi?id=489494

>> These options are gone now and cannot be selected anymore. You are
>> saying that this is all fine, that they are gone, but all initial
>> clock syncing should still work?
>
> Yes, we're just removing a duplicative initialization of time, and compiling
> out code in the suspend/resume path that would never trigger when
> persistent_clocks were present.

I see, makes sense.

>> Also:
>>$ cat /sys/class/rtc/rtc0/hctosys
>>0
>> always carried "1", and this now breaks setups which expect an
>> automatically created symlink /dev/rtc to the actual "system rtc".
>
>
> Sigh. So just turning off HCTOSYS on those systems causes them to break?

Well, the symlink is no longer there, which is visible. People asked
where it is gone now. That's the "breakage" which might not deserve
that word, if nothing really breaks that way. Stuff we looked at so
far, falls back to /dev/rtc0 which covers that.

> That is sort of obnoxiously fragile.  I've always been somewhat skeptical of
> the multi-rtc configs - as they're all the "system's" RTCs after all. 99%
> probably only have one rtc device, so checking the hctosys in that case is a
> little silly.

Yeah, ARM is as a mess, they often have rtc1 as the "system rtc", that
is why all this symlink game was "invented".

> But the terribly annoying interface breakage when /dev/rtc went to /dev/rtcN
> with the generic rtc layer landing shouldn't have happened, so I won't
> begrudge too much the userland hacks needed to fix that up.

Right.

> So I'll send Thomas a revert for the HCTOSYS optimization. In the kernel
> we'll still avoid using HCTOSYS when the persistent_clock is there, but at
> least userland will still have some /sys/class/rtc/rtcN that has the
> "offical" flag.

So in case you really revert it, x86 should not enable any of that RTC
stuff by default, right?

>> There was also always a line in dmesg that told the rtc_cmos time it
>> wrote to the system clock. This is also gone?
>
> More worrisome, I'll turn the question around: is that an expected interface
> never to break?

No, sure not. I was just noticing that, when looking what was going
on, and I couldn't make sense out of it before you explained the
details.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: CONFIG_RTC_HCTOSYS lost on x86 with ALWAYS_USE_PERSISTENT_CLOCK changes?

2013-04-24 Thread Kay Sievers

On Wed, Apr 24, 2013 at 6:07 PM, John Stultz  wrote:

> So summarizing the above, because as much as I'm aware, its always been
> redundant and unnecessary on x86.  Thus being able at build time to mark it
> as unnecessary was attractive, since it reduced the code paths running at
> suspend/resume.
>
> That said, Kay's concerns about userland implications (basically the
> userland side effects of SYSTOHC being enabled) give me pause, so I may
> revert the HAS_PERSISTENT_CLOCK on x86 change.

Thanks a lot for all the missing details!

No, I think that all makes too much sense to revert it. Let's just
find a way to solve it properly. I don't think it is of any pressing
importance to keep the old behaviour, if we can still provide the
functionality today.

I'll continue replying in the later mail ...

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: CONFIG_RTC_HCTOSYS lost on x86 with ALWAYS_USE_PERSISTENT_CLOCK changes?

2013-04-23 Thread Kay Sievers

On Wed, Apr 24, 2013 at 5:33 AM, Kay Sievers  wrote:

> Also:
>   $ cat /sys/class/rtc/rtc0/hctosys
>   0
> always carried "1", and this now breaks setups which expect an
> automatically created symlink /dev/rtc to the actual "system rtc".

We used to do this in upstream standard udev rules:
  SUBSYSTEM=="rtc", ATTR{hctosys}=="1", MODE="0644"
  
http://cgit.freedesktop.org/systemd/systemd/tree/rules/50-udev-default.rules#n18

If that information is expected to be gone now, we need to update some
tools. Whats' the proper way now to find the "system rtc" to use?

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: CONFIG_RTC_HCTOSYS lost on x86 with ALWAYS_USE_PERSISTENT_CLOCK changes?

2013-04-23 Thread Kay Sievers

On Wed, Apr 24, 2013 at 5:19 AM, John Stultz  wrote:
> On 04/23/2013 08:05 PM, Kay Sievers wrote:
>>
>> On Wed, Apr 24, 2013 at 4:43 AM, John Stultz 
>> wrote:
>>>
>>> On 04/23/2013 06:34 PM, Kay Sievers wrote:
>>>>
>>>> Hey,
>>>>
>>>> what's the intention of:
>>>> e90c83f757fffdacec8b3c5eee5617dcc038338f ?
>>>> x86: Select HAS_PERSISTENT_CLOCK on x86
>>>>
>>>> It unconditionally sets:
>>>> HAS_PERSISTENT_CLOCK
>>>> but:
>>>> RTC_SYSTOHC
>>>> got a depends on !HAS_PERSISTENT_CLOCK
>>>>
>>>> This makes it impossible to sync the system time from the RTC on x86.
>>>> What's going on here?
>>>
>>>
>>> So I suspect just some confusion, but let me know if thats wrong and
>>> you're
>>> actually seeing an issue.
>>>
>>> SYSTOHC is what *sets the RTC* to the system time when we're synced with
>>> NTP.
>>>
>>> HCTOSYS is what sets the system time from the RTC.
>>
>> Right, and RTC_HCTOSYS is not NTP related. It just reads the time from
>> the RTC_HCTOSYS_DEVICE at bootup so we do not boot in 1970 time mode.
>> We need that it in all cases, at every bootup on x86. But it's no
>> longer there with the above commits. :)
>
> On x86 the persistent_clock() is backed by the CMOS/EFI/kvm-wall/xen/vrtc
> clock (all via x86_platform.get_wallclock) should be present and we'll
> initialize the time in timekeeping_init() there.
>
> Its only systems where there isn't a persistent_clock is where the RTC layer
> and the HCTOSYS is helpful.
>
> Again, if you're having a problem where an x86 system isn't getting its time
> initialized correctly, please let me know the details of the system.

Until the above commits we always needed:
  CONFIG_RTC_HCTOSYS=y
  CONFIG_RTC_HCTOSYS_DEVICE="rtc0"
to get the system time correctly initialized at bootup on x86.

These options are gone now and cannot be selected anymore. You are
saying that this is all fine, that they are gone, but all initial
clock syncing should still work?

Also:
  $ cat /sys/class/rtc/rtc0/hctosys
  0
always carried "1", and this now breaks setups which expect an
automatically created symlink /dev/rtc to the actual "system rtc".

There was also always a line in dmesg that told the rtc_cmos time it
wrote to the system clock. This is also gone?

I haven't looked what goes wrong, I expected the make(1) errors with
"time in the future" after bootup before the network is up, which I
see since a week or two, to be a fallout of that.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: CONFIG_RTC_HCTOSYS lost on x86 with ALWAYS_USE_PERSISTENT_CLOCK changes?

2013-04-23 Thread Kay Sievers

On Wed, Apr 24, 2013 at 4:43 AM, John Stultz  wrote:
> On 04/23/2013 06:34 PM, Kay Sievers wrote:
>>
>> Hey,
>>
>> what's the intention of:
>>e90c83f757fffdacec8b3c5eee5617dcc038338f ?
>>x86: Select HAS_PERSISTENT_CLOCK on x86
>>
>> It unconditionally sets:
>>HAS_PERSISTENT_CLOCK
>> but:
>>RTC_SYSTOHC
>> got a depends on !HAS_PERSISTENT_CLOCK
>>
>> This makes it impossible to sync the system time from the RTC on x86.
>> What's going on here?
>
>
> So I suspect just some confusion, but let me know if thats wrong and you're
> actually seeing an issue.
>
> SYSTOHC is what *sets the RTC* to the system time when we're synced with
> NTP.
>
> HCTOSYS is what sets the system time from the RTC.

Right, and RTC_HCTOSYS is not NTP related. It just reads the time from
the RTC_HCTOSYS_DEVICE at bootup so we do not boot in 1970 time mode.
We need that it in all cases, at every bootup on x86. But it's no
longer there with the above commits. :)

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

CONFIG_RTC_HCTOSYS lost on x86 with ALWAYS_USE_PERSISTENT_CLOCK changes?

2013-04-23 Thread Kay Sievers

Hey,

what's the intention of:
  e90c83f757fffdacec8b3c5eee5617dcc038338f ?
  x86: Select HAS_PERSISTENT_CLOCK on x86

It unconditionally sets:
  HAS_PERSISTENT_CLOCK
but:
  RTC_SYSTOHC
got a depends on !HAS_PERSISTENT_CLOCK

This makes it impossible to sync the system time from the RTC on x86.
What's going on here?

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] driver core: add uid and gid to devtmpfs

2013-04-07 Thread Kay Sievers

On Sat, Apr 6, 2013 at 7:58 PM, Greg KH  wrote:
> On Sat, Apr 06, 2013 at 06:45:12PM +0100, Al Viro wrote:
>> On Sat, Apr 06, 2013 at 10:26:12AM -0700, Greg KH wrote:
>>
>> > Why not?  "closed" systems, like Android and other embedded systems,
>> > have "assigned" uid and gid values that never change.  Right now they
>> > have to have a horrible shell-script to set these values in devtmpfs
>> > when the device shows up to the needed numbers.  This tiny patch gets
>> > rid of that shell script entirely, allowing them to specify it from the
>> > kernel as needed.
>>
>> What's to stop them from using static /dev?  Has an extra benefit of
>> getting rid of devtmpfs shite completely...
>
> True, it would, but they like using devtmpfs :)
>
> This change also allows systems that have "control" devices to properly
> be able to pass in the uid for the device they are creating, like rawctl
> (which I know is "obsolete"), and probably dm and lvm.  I thought loop
> devices might also want this, as they can now be created by normal
> users, but I don't think that's needed for them.

It is generally useful to be able to control the uid/gid of
userspace-initiated device nodes, instead of racy post-adjusting this
setting from udev rules with an unpredictable long window between the
request and the adjustment.

The created device node can inherit the ownership of the calling
process, in a similar way like we do for devpts.

We have the same for the mode already, and want to be able to do the
same for the other permissions properties.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] init: fix name of root device in /proc/mounts

2013-03-20 Thread Kay Sievers

On Wed, Mar 20, 2013 at 10:11 PM, William Hubbs  wrote:
> On Wed, Mar 20, 2013 at 02:03:20AM -0500, Rob Landley wrote:
>> On 03/19/2013 07:20:17 PM, William Hubbs wrote:
>> > On Tue, Mar 19, 2013 at 04:17:11PM -0700, H. Peter Anvin wrote:
>> > > On 03/19/2013 03:28 PM, William Hubbs wrote:
>> > > > The issue is that /dev/root appears in /proc/mounts if you do not
>> > > > boot with an initramfs, but /dev/root is not a device node. In the
>> > > > past, udev created a symbolic link from /dev/root to the
>> > > > appropriate block device, but it does not do this any longer.
>> > Also,
>> > > > devtmpfs does not create this symbolic link.
>> > > >
>> > > > This is causing bugs with software that depends on the existence
>> > > > of /dev/root [2] for example.
>> > >
>> > > Seems okay to me, although even better would be to use the udev name
>> > > of the device in question.
>> >
>> > I'm not following what you mean.
>> >
>> > The problem is that "/dev/root" should not be in /proc/mounts,
>> > since there is always another entry that points to the root
>> > file system.
>>
>> What gave you that idea?
>>
>> wget http://landley.net/aboriginal/bin/system-image-i686.tar.bz2
>> extract it and ./run-emulator.sh and in there:
>>
>> (i686:1) /home # cat /proc/mounts
>> rootfs / rootfs rw 0 0
>> /dev/root / squashfs ro,relatime 0 0
>> proc /proc proc rw,relatime 0 0
>> sys /sys sysfs rw,relatime 0 0
>> dev /dev devtmpfs rw,relatime,size=63072k,nr_inodes=15768,mode=755 0 0
>> dev/pts /dev/pts devpts rw,relatime,mode=600 0 0
>> /tmp /tmp tmpfs rw,relatime 0 0
>> /home /home tmpfs rw,relatime 0 0
>>
>> Userspace can totally determine what /dev/root points to, I made mdev
>> do it in 2006 (udev started doing so shortly thereafter). Busybox git
>> commit a7e3d052.:4
>
> There are situations where it doesn't work though -- suppose that root
> is btrfs for example.
>
> Also, the other message that answered you is correct, the udev
> maintainers say we should not be relying on /dev/root at all so to make
> it work distro packagers have to add a rule themselves.
>
> Kay,
>
> if you are reading, can you jump in and explain why /dev/root is a bad
> idea?

stat("/") is just the better approach for tools to find out what
"root" is, there is not much point in doing symlinks here just because
the kernel uses that name to mount internally.

/dev/root was never part of the default udev setup, it happened in the
distros init scripts, and only some distributions added that.

Newer filesystems like btrfs do not have an 1:1 fs:device relation
anyway, there cannot be a /dev/root symlink anymore, so tools need to
catch up here anyway, and the sooner the better. /dev/root is a
concept that will probably no longer exist in the future, because
filesystems will work differently than they used to.

As Peter said, the kernel should internally just use the name of the
kernel block device instead of inventing and exporting the name of an
artificial /dev/root node, which does not exist later in the real
system.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] udevadm-info: Don't access sysfs 'resource' files

2013-03-17 Thread Kay Sievers

On Sun, Mar 17, 2013 at 3:36 PM, Myron Stowe  wrote:
> On Sun, 2013-03-17 at 15:29 +0100, Kay Sievers wrote:
>> On Sun, Mar 17, 2013 at 3:20 PM, Myron Stowe  wrote:
>> > On Sun, 2013-03-17 at 15:00 +0100, Kay Sievers wrote:
>> >> On Sun, Mar 17, 2013 at 2:38 PM, Alex Williamson
>> >>  wrote:
>> >> > I'm assuming that the device only breaks because udevadm is dumping the
>> >> > full I/O port register space of the device and that if an actual driver
>> >> > was interacting with it through this interface that it would work.  Who
>> >> > knows how many devices will have read side-effects by udevadm blindly
>> >> > dumping these files.  Thanks,
>> >>
>> >> Sysfs is a too public interface to export things there which make
>> >> devices/driver choke on a simple read() of an attribute.
>> >>
>> >> This is nothing specific to udevadm, any tool can do that. Udevadm
>> >> will never read any of the files during normal operation. The admin
>> >> explicitly asked udevadm with a specific command to dump all the stuff
>> >> the device offers.
>> >>
>> >> The kernel driver needs to be fixed to allow that, in the worst case,
>> >> the attributes not exported at all. People should take more care what
>> >> they export in /sys, it's not a hidden and private ioctl what's
>> >> exported there, stuff is very visible and will be looked at.
>> >>
>> >> Telling userspace not to use specific stuff in /sys I would not expect
>> >> to work as a strategy; there is too much weird stuff out there that
>> >> will always try to do that ...
>> >
>> > Kay - could you comment on Foot Note 3 in
>> > https://lkml.org/lkml/2013/3/16/168
>> >
>> > With respect to 'udev', you are working on the assumption that all files
>> > in sysfs must be readable with no consequences which may be implied by
>> > the Documentation's sysfs.txt file's mentioning ASCII.  If we are to
>> > interpret that as strictly as you seem to want to then why is there
>> > sysfs support for creating binary files?
>>
>> They cannot be distinguished from outside, so there is nothing I know
>> that could make a difference to userspace tools.
>
> Agreed
>>
>> Tools -- no matter how useful they are not not, it's that they do that
>> for many years already -- need to be able to read() the stuff in
>> there, without causing any damage to the system.
>
> So then, why are certain sysfs files skipped in udevadm-info's parsing
> (./src/udevadm-info.c::skip_attribute())?

Because they are not useful to use in udev rules, or are just not
recommended to use in rules because they break other assumptions and
would encode specific settings, which can rightfully change at
runtime, into rules.

The list is in no way a list to ensure a system/driver/device is not
choking on read().

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] udevadm-info: Don't access sysfs 'resource' files

2013-03-17 Thread Kay Sievers

On Sun, Mar 17, 2013 at 3:20 PM, Myron Stowe  wrote:
> On Sun, 2013-03-17 at 15:00 +0100, Kay Sievers wrote:
>> On Sun, Mar 17, 2013 at 2:38 PM, Alex Williamson
>>  wrote:
>> > I'm assuming that the device only breaks because udevadm is dumping the
>> > full I/O port register space of the device and that if an actual driver
>> > was interacting with it through this interface that it would work.  Who
>> > knows how many devices will have read side-effects by udevadm blindly
>> > dumping these files.  Thanks,
>>
>> Sysfs is a too public interface to export things there which make
>> devices/driver choke on a simple read() of an attribute.
>>
>> This is nothing specific to udevadm, any tool can do that. Udevadm
>> will never read any of the files during normal operation. The admin
>> explicitly asked udevadm with a specific command to dump all the stuff
>> the device offers.
>>
>> The kernel driver needs to be fixed to allow that, in the worst case,
>> the attributes not exported at all. People should take more care what
>> they export in /sys, it's not a hidden and private ioctl what's
>> exported there, stuff is very visible and will be looked at.
>>
>> Telling userspace not to use specific stuff in /sys I would not expect
>> to work as a strategy; there is too much weird stuff out there that
>> will always try to do that ...
>
> Kay - could you comment on Foot Note 3 in
> https://lkml.org/lkml/2013/3/16/168
>
> With respect to 'udev', you are working on the assumption that all files
> in sysfs must be readable with no consequences which may be implied by
> the Documentation's sysfs.txt file's mentioning ASCII.  If we are to
> interpret that as strictly as you seem to want to then why is there
> sysfs support for creating binary files?

They cannot be distinguished from outside, so there is nothing I know
that could make a difference to userspace tools.

Tools -- no matter how useful they are not not, it's that they do that
for many years already -- need to be able to read() the stuff in
there, without causing any damage to the system.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] udevadm-info: Don't access sysfs 'resource' files

2013-03-17 Thread Kay Sievers

On Sun, Mar 17, 2013 at 2:38 PM, Alex Williamson
 wrote:
> I'm assuming that the device only breaks because udevadm is dumping the
> full I/O port register space of the device and that if an actual driver
> was interacting with it through this interface that it would work.  Who
> knows how many devices will have read side-effects by udevadm blindly
> dumping these files.  Thanks,

Sysfs is a too public interface to export things there which make
devices/driver choke on a simple read() of an attribute.

This is nothing specific to udevadm, any tool can do that. Udevadm
will never read any of the files during normal operation. The admin
explicitly asked udevadm with a specific command to dump all the stuff
the device offers.

The kernel driver needs to be fixed to allow that, in the worst case,
the attributes not exported at all. People should take more care what
they export in /sys, it's not a hidden and private ioctl what's
exported there, stuff is very visible and will be looked at.

Telling userspace not to use specific stuff in /sys I would not expect
to work as a strategy; there is too much weird stuff out there that
will always try to do that ...

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 30/31] driver/base: implement subsys_virtual_register()

2013-03-10 Thread Kay Sievers

On Sun, Mar 10, 2013 at 6:24 PM, Greg Kroah-Hartman
 wrote:
> On Sun, Mar 10, 2013 at 06:00:18PM +0100, Kay Sievers wrote:
>> On Sun, Mar 10, 2013 at 5:45 PM, Greg Kroah-Hartman
>>  wrote:
>> > On Sun, Mar 10, 2013 at 04:57:02AM -0700, Tejun Heo wrote:
>> >> Hey, guys.
>> >>
>> >> On Fri, Mar 08, 2013 at 01:04:25AM +0100, Kay Sievers wrote:
>> >> > > Sorry for the delay, I'm at a conference all this week, and haven't 
>> >> > > had
>> >> > > much time to think about this.
>> >> > >
>> >> > > If Kay says this is ok for now, that's good enough for me.
>> >> >
>> >> > Yes, it looks fine to me. If we provide the unified handling of
>> >> > classes and buses some day, this can probably go away, but until that
>> >> > it looks fine and is straight forward to do it that way,
>> >>
>> >> How should this be routed?  I can take it but Kay needs it too so
>> >> workqueue tree probably isn't the best fit although I can set up a
>> >> separate branch if needed.
>> >
>> > What patch set does Kay need it for?  I have no objection for you to
>> > take it through the workqueue tree:
>>
>> The dbus bus has the same issues and needs the devices put under
>> virtual/ and not the devices/ root.
>
> Yes, but I can keep Tejun's patch in my local queue for now, dbus is
> going to not make 3.10, right?

No, sure not. It's just something we will need there too, but there is
no hurry, it's only a cosmetic issue anyway and nothing that matters
functionality-wise.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 30/31] driver/base: implement subsys_virtual_register()

2013-03-10 Thread Kay Sievers

On Sun, Mar 10, 2013 at 5:45 PM, Greg Kroah-Hartman
 wrote:
> On Sun, Mar 10, 2013 at 04:57:02AM -0700, Tejun Heo wrote:
>> Hey, guys.
>>
>> On Fri, Mar 08, 2013 at 01:04:25AM +0100, Kay Sievers wrote:
>> > > Sorry for the delay, I'm at a conference all this week, and haven't had
>> > > much time to think about this.
>> > >
>> > > If Kay says this is ok for now, that's good enough for me.
>> >
>> > Yes, it looks fine to me. If we provide the unified handling of
>> > classes and buses some day, this can probably go away, but until that
>> > it looks fine and is straight forward to do it that way,
>>
>> How should this be routed?  I can take it but Kay needs it too so
>> workqueue tree probably isn't the best fit although I can set up a
>> separate branch if needed.
>
> What patch set does Kay need it for?  I have no objection for you to
> take it through the workqueue tree:

The dbus bus has the same issues and needs the devices put under
virtual/ and not the devices/ root.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 30/31] driver/base: implement subsys_virtual_register()

2013-03-07 Thread Kay Sievers

On Fri, Mar 8, 2013 at 12:31 AM, Greg Kroah-Hartman
 wrote:
> On Tue, Mar 05, 2013 at 12:43:27PM -0800, Tejun Heo wrote:
>> On Sun, Mar 03, 2013 at 07:42:31AM +0100, Kay Sievers wrote:
>> > On Sat, Mar 2, 2013 at 7:17 PM, Greg Kroah-Hartman
>> >  wrote:
>> > > On Fri, Mar 01, 2013 at 07:24:21PM -0800, Tejun Heo wrote:
>> > >> Kay tells me the most appropriate place to expose workqueues to
>> > >> userland would be /sys/devices/virtual/workqueues/WQ_NAME which is
>> > >> symlinked to /sys/bus/workqueue/devices/WQ_NAME and that we're lacking
>> > >> a way to do that outside of driver core as virtual_device_parent()
>> > >> isn't exported and there's no inteface to conveniently create a
>> > >> virtual subsystem.
>> > >
>> > > I'm almost afraid to ask what you want to export to userspace for a
>> > > workqueue that userspace would care about...
>> > >
>> > > If you create a subsystem, the devices will show up under the virtual
>> > > "bus" if you don't give them a parent, so this patch shouldn't be
>> > > needed, unless you are abusing the driver model.  What am I missing
>> > > here?
>> >
>> > Unfortunately, the parent == NULL --> /sys/devices/virtual//
>> > we have only implemented for classes, and not for buses. We should fix
>> > that.
>>
>> Greg, how should I proceed on this?  As I wrote before, I don't really
>> care about where or how.  As long as I can make workqueues visible to
>> userland, I'm happy.
>
> Sorry for the delay, I'm at a conference all this week, and haven't had
> much time to think about this.
>
> If Kay says this is ok for now, that's good enough for me.

Yes, it looks fine to me. If we provide the unified handling of
classes and buses some day, this can probably go away, but until that
it looks fine and is straight forward to do it that way,

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2] fs: Limit sys_mount to only request filesystem modules.

2013-03-05 Thread Kay Sievers

On Mon, Mar 4, 2013 at 8:51 AM, Eric W. Biederman  wrote:
>
> Modify the request_module to prefix the file system type with "fs-"
> and add aliases to all of the filesystems that can be built as modules
> to match.
>
> A common practice is to build all of the kernel code and leave code
> that is not commonly needed as modules, with the result that many
> users are exposed to any bug anywhere in the kernel.
>
> Looking for filesystems with a fs- prefix limits the pool of possible
> modules that can be loaded by mount to just filesystems trivially
> making things safer with no real cost.

'-' is a commonly used part of a module name, and does not mix well
with ramdom user provided names.

We usually use ':' as the prefix separator for modaliases, when
user-supplied strings are prefixed with the subsystem.

I think it would be nicer to change that, and I'm sure some creative
guy calls the next filesystem of the month fs-$something :)

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 30/31] driver/base: implement subsys_virtual_register()

2013-03-02 Thread Kay Sievers

On Sat, Mar 2, 2013 at 7:17 PM, Greg Kroah-Hartman
 wrote:
> On Fri, Mar 01, 2013 at 07:24:21PM -0800, Tejun Heo wrote:
>> Kay tells me the most appropriate place to expose workqueues to
>> userland would be /sys/devices/virtual/workqueues/WQ_NAME which is
>> symlinked to /sys/bus/workqueue/devices/WQ_NAME and that we're lacking
>> a way to do that outside of driver core as virtual_device_parent()
>> isn't exported and there's no inteface to conveniently create a
>> virtual subsystem.
>
> I'm almost afraid to ask what you want to export to userspace for a
> workqueue that userspace would care about...
>
> If you create a subsystem, the devices will show up under the virtual
> "bus" if you don't give them a parent, so this patch shouldn't be
> needed, unless you are abusing the driver model.  What am I missing
> here?

Unfortunately, the parent == NULL --> /sys/devices/virtual//
we have only implemented for classes, and not for buses. We should fix
that.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] pstore: Create a convenient mount point for pstore

2013-02-27 Thread Kay Sievers

On Tue, Feb 12, 2013 at 12:15 AM, Josh Boyer  wrote:
> Using /dev/pstore as a mount point for the pstore filesystem is slightly
> awkward.  We don't normally mount filesystems in /dev/ and the /dev/pstore
> file isn't created automatically by anything.  While this method will
> still work, we can create a persistent mount point in sysfs.  This will
> put pstore on par with things like cgroups and efivarfs.

Mounted by default now, if available:
  
http://cgit.freedesktop.org/systemd/systemd/commit/?id=c06bf414042cd1bf94e0af63e9e2a0c291bfc546

Thanks everybody,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] virtio-blk: emit udev event when device is resized

2013-02-25 Thread Kay Sievers

On Mon, Feb 25, 2013 at 11:43 PM, Greg KH  wrote:
> On Mon, Feb 25, 2013 at 11:39:52PM +0100, Kay Sievers wrote:
>> On Mon, Feb 25, 2013 at 11:12 PM, Greg KH  wrote:
>>
>> > Hm, I thought we were frowning apon running binaries from udev rules
>> > these days, especially ones that might have big consequences (like
>> > resizing a disk image) like this.
>> >
>> > Kay, am I right?
>>
>> We removed most of them from the default setups, yes. But there is
>> nothing wrong if people want to ship that in some package or as custom
>> rules.
>>
>> It looks fine to me, we would just not add such things to the default
>> set of of rules these days.
>>
>> > We already emit KOBJECT_CHANGE events when block devices change, from
>> > within the block core code.  Why is the patch below needed instead of
>> > using these events that are already generated?  How are virtio block
>> > devices special?
>>
>> I think we only do that for dm and md and a couple of special cases
>> like loop and read-only settings.
>
> What about when we repartition a block device?  I've seen the events
> happen then.

Right, from the common block code we send events for removable media
changes like cdroms, sd cards, when a device is switched to read-only,
and when we re-scan a partition table like on re-partitioning. Most of
the other events are block subsystem-specific like this one. For
things like device-mapper they are used pretty heavily.

> Anyway, if you are ok with this, no objection from my side then Rusty.

Looks fine to me, it should not do any harm if there are not heavy
programs hooked up -- which is nothing the kernel could fix if people
do that. :)

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] virtio-blk: emit udev event when device is resized

2013-02-25 Thread Kay Sievers

On Mon, Feb 25, 2013 at 11:12 PM, Greg KH  wrote:

> Hm, I thought we were frowning apon running binaries from udev rules
> these days, especially ones that might have big consequences (like
> resizing a disk image) like this.
>
> Kay, am I right?

We removed most of them from the default setups, yes. But there is
nothing wrong if people want to ship that in some package or as custom
rules.

It looks fine to me, we would just not add such things to the default
set of of rules these days.

> We already emit KOBJECT_CHANGE events when block devices change, from
> within the block core code.  Why is the patch below needed instead of
> using these events that are already generated?  How are virtio block
> devices special?

I think we only do that for dm and md and a couple of special cases
like loop and read-only settings.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH V2 00/23] printk: refactoring

2012-11-29 Thread Kay Sievers

On Thu, Nov 29, 2012 at 11:46 PM, Joe Perches  wrote:

> It's still a scheduling issue with Kay and Jan's patches.
> Does anyone have any idea if/when those patches are going in?

I was expecting that Jan rebases the patches incorporating
the latest fix. Jan?

It should not hold back anything else that is already ready to merge.
We will manage it to rebase, I think.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH RESEND 1/3] printk: convert byte-buffer to variable-length record buffer

2012-11-29 Thread Kay Sievers

On Thu, Nov 29, 2012 at 3:18 PM, Michael Kerrisk (man-pages)
 wrote:
> So, just to be clear: you better not apply your patch; it might break
> something ;-).

Sounds good to me. :)

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH RESEND 1/3] printk: convert byte-buffer to variable-length record buffer

2012-11-29 Thread Kay Sievers

On Thu, Nov 29, 2012 at 2:37 PM, Michael Kerrisk (man-pages)
 wrote:
> On Thu, Nov 29, 2012 at 2:28 PM, Kay Sievers  wrote:
>> On Thu, Nov 29, 2012 at 2:18 PM, Michael Kerrisk (man-pages)
>>  wrote:
>>> On Wed, Nov 28, 2012 at 6:51 PM, Kay Sievers  wrote:
>>
>>>> Before:
>>>>   syslog(SYSLOG_ACTION_SIZE_UNREAD, 0, 0) = 286965
>>>>   syslog(SYSLOG_ACTION_READ_CLEAR, "<12>"..., 100) = 24000
>>>>   syslog(SYSLOG_ACTION_SIZE_UNREAD, 0, 0) = 286965
>>>>
>>>> After:
>>>>   syslog(SYSLOG_ACTION_SIZE_UNREAD, 0, 0) = 90402
>>>>   syslog(SYSLOG_ACTION_READ_CLEAR, "<5>"..., 100) = 90402
>>>>   syslog(SYSLOG_ACTION_SIZE_UNREAD, 0, 0) = 0
>>
>>> I'm going to call my report yesterday bogus. Somewhere along the way,
>>> I got confused while testing something, and my statement about 2.6.31
>>> behavior is wrong: the 2.6.31 and 3.5 behaviors are the same. As such,
>>> your patch is unneeded. Sorry for wasting your time.
>>
>> I think you have been right with your report. The above pasted
>> before/after from the patch commit text is actually a result of real
>> testing with current git. And your initial description sounds right,
>> and the patch seems to produce the expected results here. I just
>> confused the numbers in your report and wrongly parsed 2.6 > 3.6.
>>
>> Hmm, at least do far we did not blame anybody else than ourselves as
>> confused. One of us at least is right, and it looks you have been, and
>> I also think the patch is at least intended to be right. :)
>
> Okay -- I'm pretty sure I am right about being wrong ;-).
>
> Could you do some comparative testing please between 3.5 and pre-3.5.
> I have a little test program below. When I rechecked 2.6.31 and 3.5
> using this program I found the behavior was the same, which is why I
> conclude my report is wrong. (And also, your proposed patch in
> response to my bogus report produces different behavior from 2.6.31).

Oh, seems you are right.

The old kernel does not return 0, while it probably should. The
current kernel seems to do the same thing.

But the behaviour with the patch stills seems like the better and the
obvious and expected behaviour to me. :)

Thanks,
Kay

# old fedora kernel (pre-new-logging) ###
# uname -a
Linux nja 3.4.4-3.fc17.x86_64

# echo 1234567890 > /dev/kmsg

# ./syslog-test 9
SYSLOG_ACTION_SIZE_UNREAD
Number of unread bytes: 95299

# ./syslog-test 4
SYSLOG_ACTION_READ_CLEAR
Return value: 29
<4>[   54.933282] 1234567890
Ring buffer cleared

# ./syslog-test 9
SYSLOG_ACTION_SIZE_UNREAD
Number of unread bytes: 95299

# current fedora kernel (new logging) 

# uname -a
Linux nja 3.6.6-3.fc18.x86_64

# echo 1234567890 > /dev/kmsg

# ./syslog-test 9
SYSLOG_ACTION_SIZE_UNREAD
Number of unread bytes: 286822

# ./syslog-test 4
SYSLOG_ACTION_READ_CLEAR
Return value: 31
<12>[259301.067699] 1234567890
Ring buffer cleared

# ./syslog-test 9
SYSLOG_ACTION_SIZE_UNREAD
Number of unread bytes: 286822

# git kernel with the above patch applied #

# uname -a
Linux nja 3.7.0-rc7+

# echo 1234567890 > /dev/kmsg

# ./syslog-test 9
SYSLOG_ACTION_SIZE_UNREAD
Number of unread bytes: 30

# ./syslog-test 4
SYSLOG_ACTION_READ_CLEAR
Return value: 30
<12>[   69.591745] 1234567890
Ring buffer cleared

# ./syslog-test 9
SYSLOG_ACTION_SIZE_UNREAD
Number of unread bytes: 0
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH RESEND 1/3] printk: convert byte-buffer to variable-length record buffer

2012-11-29 Thread Kay Sievers

On Thu, Nov 29, 2012 at 2:18 PM, Michael Kerrisk (man-pages)
 wrote:
> On Wed, Nov 28, 2012 at 6:51 PM, Kay Sievers  wrote:

>> Before:
>>   syslog(SYSLOG_ACTION_SIZE_UNREAD, 0, 0) = 286965
>>   syslog(SYSLOG_ACTION_READ_CLEAR, "<12>"..., 100) = 24000
>>   syslog(SYSLOG_ACTION_SIZE_UNREAD, 0, 0) = 286965
>>
>> After:
>>   syslog(SYSLOG_ACTION_SIZE_UNREAD, 0, 0) = 90402
>>   syslog(SYSLOG_ACTION_READ_CLEAR, "<5>"..., 100) = 90402
>>   syslog(SYSLOG_ACTION_SIZE_UNREAD, 0, 0) = 0

> I'm going to call my report yesterday bogus. Somewhere along the way,
> I got confused while testing something, and my statement about 2.6.31
> behavior is wrong: the 2.6.31 and 3.5 behaviors are the same. As such,
> your patch is unneeded. Sorry for wasting your time.

I think you have been right with your report. The above pasted
before/after from the patch commit text is actually a result of real
testing with current git. And your initial description sounds right,
and the patch seems to produce the expected results here. I just
confused the numbers in your report and wrongly parsed 2.6 > 3.6.

Hmm, at least do far we did not blame anybody else than ourselves as
confused. One of us at least is right, and it looks you have been, and
I also think the patch is at least intended to be right. :)

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH RESEND 1/3] printk: convert byte-buffer to variable-length record buffer

2012-11-28 Thread Kay Sievers

On Wed, 2012-11-28 at 17:49 +0100, Kay Sievers wrote:
> On Wed, Nov 28, 2012 at 5:37 PM, Linus Torvalds
>  wrote:
> > On Wed, Nov 28, 2012 at 8:22 AM, Kay Sievers  wrote:
> >> On Wed, Nov 28, 2012 at 2:33 PM, Michael Kerrisk  
> >> wrote:
> >>
> >>> On a 2.6.31 system, immediately after SYSLOG_ACTION_READ_CLEAR, a
> >>> SYSLOG_ACTION_SIZE_UNREAD returns 0.
> >>
> >> Hmm, sounds like the right thing to do.
> >
> > Right.
> >
> > And that's the *OLD* behavior (2.6.31).
> 
> Ah, hmm, I read 2.6... as 3.6... :)
> 
> > So the new behavior is insane and different. Let's fix it.
> 
> Yeah.
> 
> > It looks like it is because the new SYSLOG_ACTION_SIZE_UNREAD code
> > does not take the new clear_seq code into account. Hmm?
> 
> Right, something like that. I'll take a look now ...

From: Kay Sievers 
Subject: printk: respect SYSLOG_ACTION_READ_CLEAR for SYSLOG_ACTION_SIZE_UNREAD

On Wed, Nov 28, 2012 at 2:33 PM, Michael Kerrisk  wrote:
> It looks as though the changes here have broken SYSLOG_ACTION_SIZE_UNREAD.
> 
> On a 2.6.31 system, immediately after SYSLOG_ACTION_READ_CLEAR, a
> SYSLOG_ACTION_SIZE_UNREAD returns 0.
> 
> On 3.5, immediately after SYSLOG_ACTION_READ_CLEAR, the value returned
> by SYSLOG_ACTION_SIZE_UNREAD is unchanged (i.e., assuming that the
> value returned was non-zero before SYSLOG_ACTION_SIZE_UNREAD, it is
> still nonzero afterward), even though a subsequent
> SYSLOG_ACTION_READ_CLEAR indicates that there are zero bytes to read.

Fix SYSLOG_ACTION_SIZE_UNREAD to return the amount of available
characters by starting to count at the first available record after
the last SYSLOG_ACTION_READ_CLEAR, instead of the first message
record for SYSLOG_ACTION_READ.

Before:
  syslog(SYSLOG_ACTION_SIZE_UNREAD, 0, 0) = 286965
  syslog(SYSLOG_ACTION_READ_CLEAR, "<12>"..., 100) = 24000
  syslog(SYSLOG_ACTION_SIZE_UNREAD, 0, 0) = 286965

After:
  syslog(SYSLOG_ACTION_SIZE_UNREAD, 0, 0) = 90402
  syslog(SYSLOG_ACTION_READ_CLEAR, "<5>"..., 100) = 90402
  syslog(SYSLOG_ACTION_SIZE_UNREAD, 0, 0) = 0

Reported-By: Michael Kerrisk 
Signed-Off-By: Kay Sievers 
---
 printk.c |   15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/kernel/printk.c b/kernel/printk.c
index 2d607f4..35a7f4f 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -1183,12 +1183,10 @@ int do_syslog(int type, char __user *buf, int len, bool 
from_file)
/* Number of chars in the log buffer */
case SYSLOG_ACTION_SIZE_UNREAD:
raw_spin_lock_irq(&logbuf_lock);
-   if (syslog_seq < log_first_seq) {
+   if (clear_seq < log_first_seq) {
/* messages are gone, move to first one */
-   syslog_seq = log_first_seq;
-   syslog_idx = log_first_idx;
-   syslog_prev = 0;
-   syslog_partial = 0;
+   clear_seq = log_first_seq;
+   clear_idx = log_first_idx;
}
if (from_file) {
/*
@@ -1198,9 +1196,9 @@ int do_syslog(int type, char __user *buf, int len, bool 
from_file)
 */
error = log_next_idx - syslog_idx;
} else {
-   u64 seq = syslog_seq;
-   u32 idx = syslog_idx;
-   enum log_flags prev = syslog_prev;
+   u64 seq = clear_seq;
+   u32 idx = clear_idx;
+   enum log_flags prev = 0;
 
error = 0;
while (seq < log_next_seq) {
@@ -1211,7 +1209,6 @@ int do_syslog(int type, char __user *buf, int len, bool 
from_file)
seq++;
prev = msg->flags;
}
-   error -= syslog_partial;
}
raw_spin_unlock_irq(&logbuf_lock);
break;


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH RESEND 1/3] printk: convert byte-buffer to variable-length record buffer

2012-11-28 Thread Kay Sievers

On Wed, Nov 28, 2012 at 5:37 PM, Linus Torvalds
 wrote:
> On Wed, Nov 28, 2012 at 8:22 AM, Kay Sievers  wrote:
>> On Wed, Nov 28, 2012 at 2:33 PM, Michael Kerrisk  
>> wrote:
>>
>>> On a 2.6.31 system, immediately after SYSLOG_ACTION_READ_CLEAR, a
>>> SYSLOG_ACTION_SIZE_UNREAD returns 0.
>>
>> Hmm, sounds like the right thing to do.
>
> Right.
>
> And that's the *OLD* behavior (2.6.31).

Ah, hmm, I read 2.6... as 3.6... :)

> So the new behavior is insane and different. Let's fix it.

Yeah.

> It looks like it is because the new SYSLOG_ACTION_SIZE_UNREAD code
> does not take the new clear_seq code into account. Hmm?

Right, something like that. I'll take a look now ...

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH RESEND 1/3] printk: convert byte-buffer to variable-length record buffer

2012-11-28 Thread Kay Sievers

On Wed, Nov 28, 2012 at 2:33 PM, Michael Kerrisk  wrote:
> On Thu, May 3, 2012 at 2:29 AM, Kay Sievers  wrote:
>> From: Kay Sievers 
> [...]
>> case SYSLOG_ACTION_SIZE_UNREAD:
>> -   error = log_end - log_start;
>> +   raw_spin_lock_irq(&logbuf_lock);
>> +   if (syslog_seq < log_first_seq) {
>> +   /* messages are gone, move to first one */
>> +   syslog_seq = log_first_seq;
>> +   syslog_idx = log_first_idx;
>> +   }
>> +   if (from_file) {
>> +   /*
>> +* Short-cut for poll(/"proc/kmsg") which simply 
>> checks
>> +* for pending data, not the size; return the count 
>> of
>> +* records, not the length.
>> +*/
>> +   error = log_next_idx - syslog_idx;
>> +   } else {
>> +   u64 seq;
>> +   u32 idx;
>> +
>> +   error = 0;
>> +   seq = syslog_seq;
>> +   idx = syslog_idx;
>> +   while (seq < log_next_seq) {
>> +   error += syslog_print_line(idx, NULL, 0);
>> +   idx = log_next(idx);
>> +   seq++;
>> +   }
>> +   }
>> +   raw_spin_unlock_irq(&logbuf_lock);
>> break;
> [...]
>
> It looks as though the changes here have broken SYSLOG_ACTION_SIZE_UNREAD.

Any specifics that it causes actual problems we need to address?

> On a 2.6.31 system, immediately after SYSLOG_ACTION_READ_CLEAR, a
> SYSLOG_ACTION_SIZE_UNREAD returns 0.

Hmm, sounds like the right thing to do.

We have read everything, even cleared the buffer for later queries. So
there is nothing to read anymore for later calls, and they will
actually never return anything if they are called, so returning 0 here
sounds like the right thing. The current SYSLOG_ACTION_SIZE_UNREAD
seems to match properly the expectations one can make for
SYSLOG_ACTION_READ_ALL.

> On 3.5, immediately after SYSLOG_ACTION_READ_CLEAR, the value returned
> by SYSLOG_ACTION_SIZE_UNREAD is unchanged
>
> (i.e., assuming that the
> value returned was non-zero before SYSLOG_ACTION_SIZE_UNREAD, it is
> still nonzero afterward), even though a subsequent
> SYSLOG_ACTION_READ_CLEAR indicates that there are zero bytes to read.

Which sounds at least like weird behaviour, if not "broken".

Any indication that we need to restore the old behaviour to fix some
weird assumptions? To me the current one sounds like the better and
more correct option, and what one would expect from it. But maybe we
cannot get away with it ...

(I hope I understood what you explained correctly, I'm a bit confused
by the issue.)


Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC] driver-core: Remove dummy 'platform_bus'

2012-11-23 Thread Kay Sievers

On Thu, Nov 22, 2012 at 10:20 PM, Grant Likely
 wrote:
> On Thu, Nov 22, 2012 at 7:17 PM, Kay Sievers  wrote:
>> On Wed, Nov 21, 2012 at 3:52 PM, Greg Kroah-Hartman 
>>  wrote:

>>> If the devices don't show up under platform/ where are they going to be
>>> at now, virtual/ ?  That doesn't sound like a good plan, they should be
>>> somewhere "useful".
>>
>> Just a note to keep in mind: We usually need and want devices to have
>> a bus or class. Devices without a "subsystem" are invisible to udev,
>> and do not get proper coldplug support at bootup.
>
> Note: this patch is only about the "platform_bus" dummy device. It has
> nothing to do with platform_bus_type.

Ah, I see now.

Why do you want to remove the "platform_bus" fake-parent, entirely? I
understand and agree that drivers should not fiddle with that
directly. But I don't see a real reason why it should not be private
to the platform_bus_type. It's nothing really wrong with it, I guess.

I have on x86:
  coretemp.0 -> ../../../devices/platform/coretemp.0
  dock.0 -> ../../../devices/platform/dock.0
  dock.1 -> ../../../devices/platform/dock.1
  efifb.0 -> ../../../devices/platform/efifb.0
  i8042 -> ../../../devices/platform/i8042
  iTCO_wdt -> ../../../devices/pci:00/:00:1f.0/iTCO_wdt
  pcspkr -> ../../../devices/platform/pcspkr
  serial8250 -> ../../../devices/platform/serial8250
  thinkpad_acpi -> ../../../devices/platform/thinkpad_acpi
  thinkpad_hwmon -> ../../../devices/platform/thinkpad_hwmon

which look pretty reasonable in a "platform parent" instead of ending
up in virtual/.

We should probably remove the explicit assignment in the drivers like
your patch does, unexport "platform_bus" from the core, so nobody can
use it anymore in the future. The /sys/bus/platform/devices/ directory
can then be created on-demand only, with the first registration of a
device without a parent, we do that in other parts of the core
already.

Wouldn't that solve your problem and still do not touch any other
stuff visible in /sys?

Also, it seems there are devices without any subsystem in the list of
things your patch touches? That is really not how things should work.
All devices should have a bus or class, so userspace receives proper
events, and can enumerate them.
The /sys/devices/ hierarchy is not really meant to be used directly,
it can for some devices even change at runtime. It's not considered a
stable or reasonable predictable ABI, all lookups should start in the
flat device symlink lists of the class/ and bus/, and not in the
hierarchical tree in devices/.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: /proc/kmsg giving eof on blocking read

2012-11-22 Thread Kay Sievers

On Thu, Nov 22, 2012 at 11:44 AM, Jan Schmidt  wrote:
> I'm currently debugging something in btrfs in good old printk style, 
> generating
> around 10MB/min. I'm seeing /proc/kmsg returning eof on a blocking read (and,
> side note, syslog-ng won't reopen it, effectively stopping logging kernel
> messages silently).

Are you sure there is not something else that opens the same file?
Even once might be enough to return 0. The too simple locking logic in
/proc/kmsg cannot support multiple readers properly, it never did.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC] driver-core: Remove dummy 'platform_bus'

2012-11-22 Thread Kay Sievers

On Wed, Nov 21, 2012 at 3:52 PM, Greg Kroah-Hartman
 wrote:
> On Wed, Nov 21, 2012 at 02:44:31PM +, Grant Likely wrote:
>> The "platform_bus" (note: not platform_bus_type) only exists as an empty
>> directory to put platform devices into. However, it really doesn't make
>> sense to segregate all the platform devices into a sub directory when
>> typically they are memory mapped devices that doen't go through any
>> particular bus. Particularly on embedded type platforms the platform_bus
>> directory doesn't add anything.
>>
>> However, this will probably just end up breaking some userspace that
>> depends on the /sys/devices/platform/ path to be present (no matter how
>> much we protest that userspace must not depend on paths in sysfs). So
>> while I'm seriously proposing this change, it may just be unacceptable
>> ABI breakage
>
> If the devices don't show up under platform/ where are they going to be
> at now, virtual/ ?  That doesn't sound like a good plan, they should be
> somewhere "useful".

Just a note to keep in mind: We usually need and want devices to have
a bus or class. Devices without a "subsystem" are invisible to udev,
and do not get proper coldplug support at bootup.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2] devtmpfs: mount with noexec and nosuid

2012-11-20 Thread Kay Sievers

On Tue, Nov 20, 2012 at 9:42 PM, Kees Cook  wrote:
> Since devtmpfs is writable, make the default noexec,nosuid as well. This
> protects from the case of a privileged process having an arbitrary file
> write flaw and an argumentless arbitrary execution (i.e. it would lack
> the ability to run "mount -o remount,exec,suid /dev").

This only really applies to systems without an initramfs when the
kernel mounts /dev over the rootfs it has mounted; with an initramfs,
/dev is always mounted by user code.

Just checking, that is the use case you are doing that for?

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] devtmpfs: mount with noexec and nosuid

2012-11-16 Thread Kay Sievers

On Sat, Nov 17, 2012 at 1:27 AM, Greg Kroah-Hartman
 wrote:
> On Fri, Nov 16, 2012 at 04:20:16PM -0800, Kees Cook wrote:
>> Since devtmpfs is writable, make the default noexec nosuid as well. This
>> protects from the case of a privileged process having an arbitrary file
>> write flaw and an argumentless arbitrary execution (i.e. it would lack
>> the ability to run "mount -o remount,exec,suid /dev"), with a system
>> that already has nosuid,noexec on all other writable mounts.
>>
>> Cc: ellyjo...@chromium.org
>> Signed-off-by: Kees Cook 
>> ---
>>  drivers/base/devtmpfs.c |6 --
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> Have you tested this to verify that it doesn't break anything?
>
> Kay, could this cause any problems that you could think of?

It breaks all sorts of old, possibly outdated, stuff, that does things
like mapping /dev/mem executable. It for sure used to break X drivers,
that fiddle with the BIOS of cards.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 00/12] printk() fixes, optimizations, and clean ups

2012-11-15 Thread Kay Sievers

On Thu, Nov 15, 2012 at 10:22 PM, "Jan H. Schönherr"
 wrote:
> Am 15.11.2012 17:05, schrieb Kay Sievers:
>> This with all your patches applied:
>> [1.032804] input: ImExPS/2 Generic Explorer Mouse as 
>> /devices/platform/i8042/serio1/input/input2
>> [1.063915] List of all partitions:
>> [1.065521] 0800   117220824 sda  driver: sd
>> [1.067444]   0801 1048576 sda1 
>> 1fcbc57f-4bfc-4c2b-91a3-9c84fbcd9af1  08023072 sda2 
>> 084917b7-8be2-4e86-838d-f771a9902e08  08031536 sda3 
>> 180053b6-6697-4f4c-b331-4925773197ff  080454730752 sda4 
>> b67dfc6e-d06f-4b11-bd52-96c09163aca9  08051536 sda5 
>> 6d0d537c-3107-4057-a57b-5379a0cd8e31No filesystem could mount root, tried:  
>> btrfs
>> [1.077120] Kernel panic - not syncing: VFS: Unable to mount root fs on 
>> unknown-block(8,1)
>>
>> Something seems broken in the patches regarding the console or newline logic.
>
> (Just to mention it: I did do quite some testing, but this case must have
> escaped me.)

Yeah, don't tell me, it was not really fun to discover all the weird
edges here because nobody wants to have rules. :)

> Please, try the patch below on top of everything. If this works, I'll prepare
> a -v2 where everything is sorted into the correct patches.

Looks all fine now.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 00/12] printk() fixes, optimizations, and clean ups

2012-11-15 Thread Kay Sievers

On Wed, 2012-11-14 at 00:12 +0100, Jan H. Schönherr wrote:
> Hi Greg, hi Kay, and all other interested people.
> 
> This series aims at cleaning up and fixing some bugs around printk().
> Patches 9 and 11 might require some discussion, see below.

This is how current git looks like:
[1.062953] input: ImExPS/2 Generic Explorer Mouse as 
/devices/platform/i8042/serio1/input/input2
[1.090595] List of all partitions:
[1.091703] 0800   117220824 sda  driver: sd
[1.093054]   0801 1048576 sda1 1fcbc57f-4bfc-4c2b-91a3-9c84fbcd9af1
[1.094662]   08023072 sda2 084917b7-8be2-4e86-838d-f771a9902e08
[1.096624]   08031536 sda3 180053b6-6697-4f4c-b331-4925773197ff
[1.098624]   080454730752 sda4 b67dfc6e-d06f-4b11-bd52-96c09163aca9
[1.100304]   08051536 sda5 6d0d537c-3107-4057-a57b-5379a0cd8e31
[1.101918] No filesystem could mount root, tried:  btrfs
[1.103633] Kernel panic - not syncing: VFS: Unable to mount root fs on 
unknown-block(8,1)


This with all your patches applied:
[1.032804] input: ImExPS/2 Generic Explorer Mouse as 
/devices/platform/i8042/serio1/input/input2
[1.063915] List of all partitions:
[1.065521] 0800   117220824 sda  driver: sd
[1.067444]   0801 1048576 sda1 1fcbc57f-4bfc-4c2b-91a3-9c84fbcd9af1 
 08023072 sda2 084917b7-8be2-4e86-838d-f771a9902e08  0803
1536 sda3 180053b6-6697-4f4c-b331-4925773197ff  080454730752 sda4 
b67dfc6e-d06f-4b11-bd52-96c09163aca9  08051536 sda5 
6d0d537c-3107-4057-a57b-5379a0cd8e31No filesystem could mount root, tried:  
btrfs
[1.077120] Kernel panic - not syncing: VFS: Unable to mount root fs on 
unknown-block(8,1)


Something seems broken in the patches regarding the console or newline logic.

Thanks,
Kay

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] printk: drop ambiguous LOG_CONT flag

2012-11-01 Thread Kay Sievers

On Mon, Oct 8, 2012 at 9:56 PM, Kay Sievers  wrote:
> On Mon, Oct 8, 2012 at 9:54 PM, "Jan H. Schönherr"

>>> Jan,
>>> any updates, did you try something else?
>>> Or should we merge the first version for now?
>>
>> I'm working on it, though I cannot spend as much time as I want. :)
>>
>> My current version does mostly well for race-printk()s, now. But
>> there's still one issue to resolve and some polishing to do.
>>
>> If we can afford to wait a little longer, we might get a nicer
>> solution (and avoid a possible mostly-revert later).
>
> Sure, no hurry, I was just going through my TODO emails. :)

My TODO is nagging me again. :)

Any updates? No problems if not, but we should merge the current
version then, I think.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] vlan: set sysfs device_type to 'vlan'

2012-10-23 Thread Kay Sievers

On Tue, Oct 23, 2012 at 8:36 AM, David Miller  wrote:
> From: Doug Goldstein 
> Date: Mon, 22 Oct 2012 00:53:57 -0500
>
>> Sets the sysfs device_type to 'vlan' for udev. This makes it easier for
>> applications that query network information via udev to identify vlans
>> instead of using strrchr().
>>
>> Signed-off-by: Doug Goldstein 
>
> You're extremely misguided.  This change, in fact, makes it ten times
> harder for such applications to query such devices.

That makes not much sense, really. Every new interface would fall into
that category. At least I can't see any mis-guidance here. The other
devtypes for the major netif types are not that much older.

> Because now the application has to decide whether it wants to support
> EVERY EXISTING SYSTEM OUT THERE or not.  Hundreds of millions of Linux
> systems do not provide this attribute.
>
> Applications have to handle the case of not having the 'vlan' device
> type available attribute essentially forever.

Which is an entirely separate issue, and not a technical reason not to
add new interfaces which are already in use for most other types of
netifs.

> So providing it in new kernels provides zero value whatsoever.

It sure does provide a value. The kernel can efficiently filter
uevents in the socket with this available. All other major types of
netdevs support that too, it's just a matter of completeness. For that
reason, it looks useful to me.

> I'm not applying this patch, sorry.

That's just sad. Not that I really care about that functionality, but
your reasoning is absolutely not transparent.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] printk: drop ambiguous LOG_CONT flag

2012-10-08 Thread Kay Sievers

On Mon, Oct 8, 2012 at 9:54 PM, "Jan H. Schönherr"
 wrote:
> Am 08.10.2012 21:24, schrieb Kay Sievers:
>> On Fri, Sep 28, 2012 at 4:56 PM, Kay Sievers  wrote:
>>> On Fri, Sep 28, 2012 at 4:49 PM, "Jan H. Schönherr"
>>
>>>> Given that I'm able to fix the racing case, would you be in favor of
>>>> this approach, or should we stick to the earlier version?
>>>
>>> I'm open to everything that makes sense. Let's see how it looks and we
>>> can decide when we have something that passes the tests.
>>
>> Jan,
>> any updates, did you try something else?
>> Or should we merge the first version for now?
>
> I'm working on it, though I cannot spend as much time as I want. :)
>
> My current version does mostly well for race-printk()s, now. But
> there's still one issue to resolve and some polishing to do.
>
> If we can afford to wait a little longer, we might get a nicer
> solution (and avoid a possible mostly-revert later).

Sure, no hurry, I was just going through my TODO emails. :)

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] printk: drop ambiguous LOG_CONT flag

2012-10-08 Thread Kay Sievers

On Fri, Sep 28, 2012 at 4:56 PM, Kay Sievers  wrote:
> On Fri, Sep 28, 2012 at 4:49 PM, "Jan H. Schönherr"

>> Given that I'm able to fix the racing case, would you be in favor of
>> this approach, or should we stick to the earlier version?
>
> I'm open to everything that makes sense. Let's see how it looks and we
> can decide when we have something that passes the tests.

Jan,
any updates, did you try something else?
Or should we merge the first version for now?

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

2012-10-03 Thread Kay Sievers

On Thu, Oct 4, 2012 at 12:58 AM, Linus Torvalds
 wrote:
> That said, there's clearly enough variation here that I think that for
> now I won't take the step to disable the udev part. I'll do the patch
> to support "direct filesystem firmware loading" using the udev default
> paths, and that hopefully fixes the particular case people see with
> media modules.

If that approach looks like it works out, please aim for full
in-kernel-*only* support. I would absolutely like to get udev entirely
out of the sick game of firmware loading here. I would welcome if we
are not falling back to the blocking timeouted behaviour again.

The whole story would be contained entirely in the kernel, and we get
rid of the rather fragile "userspace transaction" to execute
module_init(), where the kernel has no idea if userspace is even up to
ever responding to its requests.

There would be no coordination with userspace tools needed, which
sounds like a better fit in the way we develop things with the loosely
coupled kernel <-> udev requirements.

If that works out, it would a bit like devtmpfs which turned out to be
very simple, reliable and absolutely the right thing we could do to
primarily mange /dev content.

The whole dance with the fake firmware struct device, which has a 60
second timeout to wait for userspace, is a long story of weird
failures at various aspects.

It would not only solve the unfortunate modprobe lockup with
init=/bin/sh we see here, also big servers with an insane amount of
devices happen to run into the 60 sec timeout, because udev, which
runs with 4000-8000 threads in parallel handling things like 30.000
disks is not scheduled in time to fulfill network card firmware
requests. It would be nice if we don't have that arbitrary timeout at
all.

Having any timeout at all to answer the simple question if a file
stored in the rootfs exists, should be a hint that there is something
really wrong with the model that stuff is done.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

2012-10-03 Thread Kay Sievers

On Wed, Oct 3, 2012 at 11:05 PM, Greg KH  wrote:

> As for the firmware path, maybe we should
> change that to be modified by userspace (much like /sbin/hotplug was) in
> a proc file so that distros can override the location if they need to.

If that's needed, a CONFIG_FIRMWARE_PATH= with the array of locations
would probably be sufficient.

Like udev's defaults here:
  http://cgit.freedesktop.org/systemd/systemd/tree/configure.ac#n550

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

2012-10-03 Thread Kay Sievers

On Wed, Oct 3, 2012 at 10:39 PM, Linus Torvalds
 wrote:
> On Wed, Oct 3, 2012 at 12:50 PM, Greg KH  wrote:
>>>
>>> Ok, like this?
>>
>> This looks good to me.  Having udev do firmware loading and tieing it to
>> the driver model may have not been such a good idea so many years ago.
>> Doing it this way makes more sense.
>
> Ok, I wish this had been getting more testing in Linux-next or
> something, but I suspect that what I'll do is to commit this patch
> asap, and then commit another patch that turns off udev firmware
> loading entirely for the synchronous firmware loading case.
>
> Why? Just to get more testing, and seeing if there are reports of
> breakage. Maybe some udev out there has a different search path (or
> because udev runs in a different filesystem namespace or whatever), in
> which case running udev as a fallback would otherwise hide the fact
> that he direct kernel firmware loading isn't working.

> Ok? Comments?

The current udev directory search order is:
  /lib/firmware/updates/$(uname -r)/
  /lib/firmware/updates/
  /lib/firmware/$(uname -r)/
  /lib/firmware/

There is no commonly known /firmware directory.

http://cgit.freedesktop.org/systemd/systemd/tree/src/udev/udev-builtin-firmware.c#n100
http://cgit.freedesktop.org/systemd/systemd/tree/configure.ac#n548

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

2012-10-03 Thread Kay Sievers

On Wed, Oct 3, 2012 at 6:57 PM, Greg KH  wrote:

>> It's the same in the current release, we still haven't wrapped our
>> head around how to fix it/work around it.
>
> Ick, as this is breaking people's previously-working machines, shouldn't
> this be resolved quickly?

Nothing really "breaks", It's "slow" and it will surely be fixed when
we know what's the right fix, which we haven't sorted out at this
moment.

> module_init() can do lots of "bad" things, sleeping, asking for
> firmware, and lots of other things.  To have userspace block because of
> this doesn't seem very wise.

Not saying that it is right or nice, but it's the kernel itself that
blocks. Run init=/bin/sh and do a modprobe of one of these drivers and
it hangs un-interruptible until the kernel's internal firmware loading
request times out, just because userspace is not there.

> But previously this all "just worked" as we ran 'modprobe' in a new
> thread/process right?

No, we used to un-queue events which had a timeout specified in the
environment, that code caused other issues and was removed.

> it can do without worrying about stopping anything else in the system that 
> might
> want to happen at the same time (like load multiple modules in a row).

It should not be an issue, the serialization is strictly parent <->
child, everything else runs in parallel.

>> If that unfortunate module_init() lockup can't be solved properly in
>> the kernel, we need to find out if we need to make the modprobe
>> handling in udev async, or let firmware events bypass dependency
>> resolving. As mentioned, we haven't decided as of now which road to
>> take here.
>
> It's not a lockup, there have never been rules about what a driver could
> and could not do in its module_init() function.  Sure, there are some
> not-nice drivers out there, but don't halt the whole system just because
> of them.

It is a kind of lock up, just try modprobe with the init=/bin/sh boot.

> I recommend making module loading async, like it used to be, and then
> all should be fine, right?

That's the current idea, and Tom is looking into it how it could look like.

I also have no issues at all if the kernel does load the firmware from
the filesystem on its own; it sounds like the simplest and most robust
solution from a general look at the problem. It would also make the
difference between in-kernel firmware and out-of-kernel firmware less
visible, which sounds good.
Honestly, requiring firmware-loading userspace-transactions to
successfully link a module into the kernel sounds like a pretty bad
idea to start with. Unlike module loading which needs the depmod alias
database and userspace configuration; with firmware loading, there is
no policy involved where userspace would add any single additional
value to that step.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

2012-10-03 Thread Kay Sievers

On Wed, Oct 3, 2012 at 12:12 AM, Greg KH  wrote:

> Mauro, what version of udev are you using that is still showing this
> issue?
>
> Kay, didn't you resolve this already?  If not, what was the reason why?

It's the same in the current release, we still haven't wrapped our
head around how to fix it/work around it.

Unlike what the heated and pretty uncivilized and rude emails here
claim, udev does not dead-lock or "break" things, it's just "slow".
The modprobe event handling runs into a ~30 second event timeout.
Everything is still fully functional though, there's only this delay.

Udev ensures full dependency resolution between parent and child
events. Parent events have to finish the event handling and have to
return, before child event handlers are started. We need to ensure
such things so that (among other things) disk events have finished
their operations before the partition events are started, so they can
rely and access their fully set up parent devices.

What happens here is that the module_init() call blocks in a userspace
transaction, creating a child event that is not started until the
parent event has finished. The event handler for modprobe times out
then the child event loads the firmware.

Having kernel module relying on a running and fully functional
userspace to return from module_init() is surely a broken driver
model, at least it's not how things should work. If userspace does not
respond to firmware requests, module_init() locks up until the
firmware timeout happens.

This all is not so much about how probe() should behave, it's about a
fragile dependency on a specific userspace transaction to link a
loadable module into the kernel. Drivers should avoid such loops for
many reasons. Also, it's unclear in many cases how such a model should
work at all if the module is compiled in and initialized when no
userspace is running.

If that unfortunate module_init() lockup can't be solved properly in
the kernel, we need to find out if we need to make the modprobe
handling in udev async, or let firmware events bypass dependency
resolving. As mentioned, we haven't decided as of now which road to
take here.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] printk: drop ambiguous LOG_CONT flag

2012-09-28 Thread Kay Sievers

On Fri, Sep 28, 2012 at 4:49 PM, "Jan H. Schönherr"
 wrote:
> Am 28.09.2012 16:34, schrieb Kay Sievers:
>> On Fri, Sep 28, 2012 at 10:25 AM, Jan H. Schönherr

>> The current behaviour has the advantage, that non-cont users will not
>> race against a cont user (which is like 99.x% of the races I expect).
>> The cont buffer is currently only used when we expect a cont user,
>> non-cont users happening in the middle of a cont-print will not flush
>> the and disturb the cont buffer.
>
> That should be fixable by using a second set of flags, owner, and so on
> within vprintk... I still think, that getting rid of of remotely tracking
> the flags is worth something.

Sure, sounds fine in theory.

> (Ideally, we should also be able to correctly reassemble multiple
> simultaneous cont users. But that it still a bit out of scope I think.)

Yeah, that's messy, and not really worth it, I guess. We enter per-cpu
data and in_interrupt() territory with that, which is not really worth
the complexity for cont printing. The one separated cont buffer works
pretty well already.

> Given that I'm able to fix the racing case, would you be in favor of
> this approach, or should we stick to the earlier version?

I'm open to everything that makes sense. Let's see how it looks and we
can decide when we have something that passes the tests.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] printk: drop ambiguous LOG_CONT flag

2012-09-28 Thread Kay Sievers

On Fri, Sep 28, 2012 at 10:25 AM, Jan H. Schönherr
 wrote:

>> If really, really everything passes through vprintk_emit()
>> then we could keep all info about the previous message
>> there and definitely decide whether the current message continues
>> the previous one.
>>
>> Then, we wouldn't need to track the previous flags everywhere.
>
> Here is a patch that does just that.
>
> Seems to work. And it makes the code easier to understand.
> A more detailed description is below.

> -   if (!(lflags & LOG_NEWLINE)) {
> -   /*
> -* Flush the conflicting buffer. An earlier newline was 
> missing,
> -* or another task also prints continuation lines.
> -*/
> -   if (cont.len && (lflags & LOG_PREFIX || cont.owner != 
> current))
> -   cont_flush(LOG_NEWLINE);
> +   cont_add(facility, level, lflags, dict, dictlen, text, text_len);

That fails the racing task test, and a cont user that was nicely
merged before is now all in separate records.

It seems, unconditionally using the cont buffer like in your patch,
for all incoming messages just makes the entire cont merge buffer
dance useless when it comes to races.

The current behaviour has the advantage, that non-cont users will not
race against a cont user (which is like 99.x% of the races I expect).
The cont buffer is currently only used when we expect a cont user,
non-cont users happening in the middle of a cont-print will not flush
the and disturb the cont buffer.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] printk: drop ambiguous LOG_CONT flag

2012-09-27 Thread Kay Sievers

On Thu, Sep 27, 2012 at 5:46 PM, "Jan H. Schönherr"
 wrote:
> Am 27.09.2012 15:39, schrieb Kay Sievers:

>> It is a flag that we have not been able to merge a continuation line
>> in the buffer, because we had a race with another thread, or the
>> console lock was taken for a long time and we couldn't use the merge
>> buffer.
>
> But it is also set, if we don't know yet, whether there is going to
> be a continuation (printk.c, line 1583):
>
> log_store(facility, level, lflags | LOG_CONT, 0,
>   dict, dictlen, text, text_len);
>
> This confuses devkmsg_read() and msg_print_text() later on.

Yeah, I can see that here too. Tested your patch and seems to behave
well with a bunch of other tests I still have available. Looks good
and worth to try:
  Tested-By: Kay Sievers 

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] printk: drop ambiguous LOG_CONT flag

2012-09-27 Thread Kay Sievers

On Thu, Sep 27, 2012 at 12:33 AM, "Jan H. Schönherr"
 wrote:
> Am 26.09.2012 23:15, schrieb Greg Kroah-Hartman:
>> On Wed, Sep 26, 2012 at 07:58:45PM +0200, Jan H. Schönherr wrote:
>>> Against v3.6-rc7, only lightly tested.
>>
>> Well, against linux-next and highly tested would be best.  It's a bit
>> late to get this into linux-next for 3.7, how important is it really?
>
> There are no conflicting commits in linux-next, so it should apply there
> as well.
>
> "Tested" as in: it fixes my use case: multiple printk()s shortly after each
> other -- with KERN_prefix but without a newline at the end. Those were
> sometimes concatenated since that printk-rewrite.

Please provide the output of /dev/kmsg so we can see what's going on here.

> All other printk()s that I come across more often look as usual, before and
> after the patch. (Mostly singular printk()s, but I also checked the output
> from the oom-killer.)
>
> There is no need to include this hastily -- at least not from my point of view
> -- as it is already broken in 3.5 and nobody else seems to notice it
> (... and I have now a fix for my development printk()s). Should I resend the
> patch later?
>
> I was also hoping that Kay might share his opinion, as the LOG_CONT
> flag is rather young, and he might have some different plans for it.

It is a flag that we have not been able to merge a continuation line
in the buffer, because we had a race with another thread, or the
console lock was taken for a long time and we couldn't use the merge
buffer.

LOG_CONT is used to merge (not intended to be) separate records at
time we read them from the record buffer, and print them, it is also
exported as a flag in /dev/kmsg.

I don't think we can just remove it, we can not get that information
from the prefix+newlines, they are not sufficient.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC] cgroup TODOs

2012-09-14 Thread Kay Sievers

On Fri, Sep 14, 2012 at 9:29 PM, Tejun Heo  wrote:
> On Fri, Sep 14, 2012 at 09:58:30AM -0400, Vivek Goyal wrote:
>> I am little concerned about above and wondering how systemd and libvirt
>> will interact and behave out of the box.
>>
>> Currently systemd does not create its own hierarchy under blkio and
>> libvirt does. So putting all together means there is no way to avoid
>> the overhead of systemd created hierarchy.
>>
>> \
>> |
>> +- system
>>  |
>>  +- libvirtd.service
>>   |
>>   +- virt-machine1
>>   +- virt-machine2
>>
>> So there is now way to avoid the overhead of two levels of hierarchy
>> created by systemd. I really wish that systemd gets rid of "system"
>> cgroup and puts services directly in top level group. Creating deeper
>> hieararchices is expensive.

The idea here is to split equally between the "system" and the "user"s
at that level.

That all can be re-considered and changed if really needed, but it's
not an unintentionally created directory.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: udev 182: response timeout for request_firmware in module_probe path

2012-08-23 Thread Kay Sievers

On Thu, Aug 23, 2012 at 5:16 PM, Ming Lei  wrote:
> On Tue, Aug 21, 2012 at 1:34 PM, Ming Lei  wrote:

>> I found that udev 182 doesn't response for the request_firmware in
>> module_probe path until 30sec later after the 'ADD' event of firmware
>> device, but no such problem in udev175, sounds like a regression of udev?
>
> Looks udevd is capable of handling the firmware ADD event even though
> the device ADD event is being processed( modprobe is triggered by device
> ADD event).

Calling out from inside the kernel and blocking in a firmware loading
userspace transaction during module_init() is kind of weird.

The firmware loading call should not rely on a fully functional
userspace, and modprobe should not hang and block until the firmware
request is handled.

The firmware should be requested asynchronously or from a different
context as module_init(). It depends on the type of driver/hardware
what's the best approach here.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

1 2 3 4 5 >

1 - 100 of 428 matches

Mail list logo