Re: [PATCH] bcache: add cond_resched() in __bch_cache_cmp()

2019-03-07 Thread Vojtech Pavlik
On Thu, Mar 07, 2019 at 11:36:18PM +0800, Coly Li wrote:
> On 2019/3/7 11:06 下午, Shile Zhang wrote:
> > 
> > On 2019/3/7 18:34, Coly Li wrote:
> >> On 2019/3/7 1:15 下午, shile.zh...@linux.alibaba.com wrote:
> >>> From: Shile Zhang 
> >>>
> >>> Read /sys/fs/bcache//cacheN/priority_stats can take very long
> >>> time with huge cache after long run.
> >>>
> >>> Signed-off-by: Shile Zhang 
> >> Hi Shile,
> >>
> >> Do you test your change ? It will be helpful with more performance data
> >> (what problem that you improved).
> > 
> > In case of 960GB SSD cache device, once read of the 'priority_stats'
> > costs about 600ms in our test environment.
> > 
> 
> After the fix, how much time it takes ?
> 
> 
> > The perf tool shown that near 50% CPU time consumed by 'sort()', this
> > means once sort will hold the CPU near 300ms.
> > 
> > In our case, the statistics collector reads the 'priority_stats'
> > periodically, it will trigger the schedule latency jitters of the
> > 
> > task which shared same CPU core.
> > 
> 
> Hmm, it seems you just make the sort slower, and nothing more changes.
> Am I right ?

Well, it has to make the sort slower, but it'll also avoid hogging the
CPU (on a non-preemptible kernel), avoiding a potential soft lockup
warning and allowing other tasks to run.

-- 
Vojtech Pavlik
VP SUSE Labs


Re: [rfd] saving old mice -- button glitching/debouncing

2018-12-15 Thread Vojtech Pavlik
On Sat, Dec 15, 2018 at 10:47:22AM +0100, Pavel Machek wrote:

> > > b) would it be acceptable if done properly? (cmd line option to
> > > enable, avoiding duplicate/wrong events?)
> > 
> > Well, for one, you shouldn't be using a timer, all the debouncing can be
> > done by math on the event timestamps.
> 
> Not... really, right? You need to send an release some time after
> button indicates release if bounce did not happen. It is similar to
> autorepeat needing a timer.

You can send the first release and ignore all presses and releases in a
time window after that.

> Let me gain some experience with the patch. I don't think hardware
> does as heavy debouncing as you describe.

Microswitches vibrate significantly when clicked. Without hardware
debouncing, you'd have many clicks instead of one even on a new mouse.


-- 
Vojtech Pavlik


Re: [rfd] saving old mice -- button glitching/debouncing

2018-12-15 Thread Vojtech Pavlik
On Sat, Dec 15, 2018 at 12:24:37AM +0100, Pavel Machek wrote:

> Patch is obviously not ready; but:
> 
> a) would it be useful to people

Probably not.

Mice already do internal software/hardware debouncing on switches. If you
see duplicate clicks making it all the way to the kernel driver, something
is very wrong with the switch, to the point where it'll soon fail
completely.

Hence the value of doing extra processing in the kernel is very limited.

> b) would it be acceptable if done properly? (cmd line option to
> enable, avoiding duplicate/wrong events?)

Well, for one, you shouldn't be using a timer, all the debouncing can be
done by math on the event timestamps.

You'd also have a hard time distinguishing between doubleclicks and bounces.

Since the mice already filter the quick bounces and there is significant
delay (100ms on USB, similar on PS/2) between updates from the hardware, a
real doubleclick can look absolutely the same as what you observe as a
bounce.

As such, it couldn't be enabled by default.

If you really want to go this way, a userspace solution is the better choice.

-- 
Vojtech Pavlik


Re: [PATCH] Input: iforce - Add the Saitek R440 Force Wheel

2018-11-14 Thread Vojtech Pavlik
On Wed, Nov 14, 2018 at 10:52:40AM +0100, Tim Schumacher wrote:
> Signed-off-by: Tim Schumacher 
> ---
> Please note that I do NOT own this device.
> 
> I'm adding this based on the fact that this is an iforce-based
> device and that the Windows driver for the R440 works for the
> Logitech WingMan Formula Force after replacing the device/vendor
> IDs (I got the vendor/device IDs from there as well).
> 
> Please don't add this patch if adding devices based on that is
> not ok.
> 
> Also (not related to this patch specifically), does anyone know
> what the question marks at the end of some device definitions
> are supposed to mean?

I believe those are unconfirmed devices based on similar reasoning to yours.

> ---
>  drivers/input/joystick/iforce/iforce-main.c | 1 +
>  drivers/input/joystick/iforce/iforce-usb.c  | 1 +
>  2 files changed, 2 insertions(+)
> 
> diff --git a/drivers/input/joystick/iforce/iforce-main.c 
> b/drivers/input/joystick/iforce/iforce-main.c
> index 58d5cfe46526..432deecaeff9 100644
> --- a/drivers/input/joystick/iforce/iforce-main.c
> +++ b/drivers/input/joystick/iforce/iforce-main.c
> @@ -67,6 +67,7 @@ static struct iforce_device iforce_device[] = {
>   { 0x05ef, 0x, "AVB Top Shot Force Feedback Racing Wheel",   
> btn_wheel, abs_wheel, ff_iforce }, //?
>   { 0x061c, 0xc0a4, "ACT LABS Force RS",  
> btn_wheel, abs_wheel, ff_iforce }, //?
>   { 0x061c, 0xc084, "ACT LABS Force RS",  
> btn_wheel, abs_wheel, ff_iforce },
> + { 0x06a3, 0xff04, "Saitek R440 Force Wheel",
> btn_wheel, abs_wheel, ff_iforce }, //?
>   { 0x06f8, 0x0001, "Guillemot Race Leader Force Feedback",   
> btn_wheel, abs_wheel, ff_iforce }, //?
>   { 0x06f8, 0x0001, "Guillemot Jet Leader Force Feedback",
> btn_joystick, abs_joystick_rudder, ff_iforce },
>   { 0x06f8, 0x0004, "Guillemot Force Feedback Racing Wheel",  
> btn_wheel, abs_wheel, ff_iforce }, //?
> diff --git a/drivers/input/joystick/iforce/iforce-usb.c 
> b/drivers/input/joystick/iforce/iforce-usb.c
> index 78073259c9a1..cdea61ae838d 100644
> --- a/drivers/input/joystick/iforce/iforce-usb.c
> +++ b/drivers/input/joystick/iforce/iforce-usb.c
> @@ -214,6 +214,7 @@ static const struct usb_device_id iforce_usb_ids[] = {
>   { USB_DEVICE(0x05ef, 0x) }, /* AVB Top Shot FFB Racing 
> Wheel */
>   { USB_DEVICE(0x061c, 0xc0a4) }, /* ACT LABS Force RS */
>   { USB_DEVICE(0x061c, 0xc084) }, /* ACT LABS Force RS */
> + { USB_DEVICE(0x06a3, 0xff04) }, /* Saitek R440 Force Wheel */
>   { USB_DEVICE(0x06f8, 0x0001) }, /* Guillemot Race Leader Force 
> Feedback */
>   { USB_DEVICE(0x06f8, 0x0003) }, /* Guillemot Jet Leader Force 
> Feedback */
>   { USB_DEVICE(0x06f8, 0x0004) }, /* Guillemot Force Feedback 
> Racing Wheel */
> -- 
> 2.19.1.450.ga4b8ab536
> 

-- 
Vojtech Pavlik
Director SUSE Labs


Re: [PATCH] Input: stop telling users to snail-mail Vojtech

2018-07-26 Thread Vojtech Pavlik
On Tue, Jul 24, 2018 at 11:45:54AM -0700, Dmitry Torokhov wrote:

> I do not think Vojtech wants snail mail these days, even if the
> address were correct (I do not know if it is), so let's remove
> snail-mail instructions from the sources.
 
Thanks. Nobody has ever sent me mail this way, but nevertheless, the
address isn't valid anymore, so removing it makes good sense.

-- 
Vojtech Pavlik
Director SUSE Labs


Re: bcache with existing ext4 filesystem

2017-07-25 Thread Vojtech Pavlik
On Tue, Jul 25, 2017 at 08:43:04AM +0200, Pavel Machek wrote:
> On Tue 2017-07-25 00:51:56, Theodore Ts'o wrote:
> > On Mon, Jul 24, 2017 at 10:04:51PM +0200, Pavel Machek wrote:
> > > Question for you was... Is the first 1KiB of each ext4 filesystem still
> > > free and "reserved for a bootloader"?
> > 
> > Yes.
> 
> Thanks.
> 
> > > If I needed more for bcache superblock (8KiB, IIRC), would that be
> > > easy to accomplish on existing filesystem?
> > 
> > Huh?  Why would the bcache superblock matter when you're talking about
> > the ext4 layout?  The bcache superblock will be on the bcache
> > device/partition, and the ext4 superblock will be on the ext4
> > device/partition.
> 
> I'd like to enable bcache on already existing ext4 partition. AFAICT
> normal situation, even on the backing device, is:
> 
> | 8KiB bcache superblock | 1KiB reserved | ext4 superblock | 400GB data |
> 
> Unfortunately, that would mean shifting 400GB data 8KB forward, and
> compatibility problems. So I'd prefer adding bcache superblock into
> the reserved space, so I can have caching _and_ compatibility with
> grub2 etc (and avoid 400GB move):

The common way to do that is to move the beginning of the partition,
assuming your ext4 lives in a partition.

I don't see how overlapping the ext4 and the bcache backing device
starts would give you what you want, because bcache assumes the
backing device data starts with an offset.

-- 
Vojtech Pavlik


Re: [PATCH 7/7] DWARF: add the config option

2017-05-08 Thread Vojtech Pavlik
On Sun, May 07, 2017 at 04:48:36PM -0500, Josh Poimboeuf wrote:

> > Can objtool verify the unwinder at each address in the kernel, or is that 
> > an AI-complete problem?
> 
> It can't verify the *unwinder*, but it can verify the data which is fed
> to the unwinder (either DWARF or the structs I proposed above).  For
> each function, it follows every possible code path, and it can keep
> track of the stack pointer while doing so.

In that case, the kernel build process can verify the DWARF data and its
compatibility with the kernel unwinder by running the unwinder against
each kernel code address verifying the output and bail if there is a bug
in the toolchain that affects it.

-- 
Vojtech Pavlik
Director SuSE Labs


Re: [PATCH] Input: joystick: gf2k - change msleep to usleep_range for small msecs

2016-11-28 Thread Vojtech Pavlik
On Tue, Nov 29, 2016 at 01:11:49AM +0530, Aniroop Mathur wrote:

> msleep(1~20) may not do what the caller intends, and will often sleep longer.
> (~20 ms actual sleep for any value given in the 1~20ms range)
> This is not the desired behaviour for many cases like device resume time,
> device suspend time, device enable time, connection time, probe time,
> loops, retry logic, etc
> msleep is built on jiffies / legacy timers which are not precise whereas
> usleep_range is build on top of hrtimers so the wakeups are precise.
> Thus, change msleep to usleep_range for precise wakeups.
> 
> For example:
> On a machine with tick rate / HZ as 100, msleep(4) will make the process to
> sleep for a minimum period of 10 ms whereas usleep_range(4000, 4100) will make
> sure that the process does not sleep for more than 4100 us or 4.1ms

And once more, patch not needed.

> 
> Signed-off-by: Aniroop Mathur 
> ---
>  drivers/input/joystick/gf2k.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/input/joystick/gf2k.c b/drivers/input/joystick/gf2k.c
> index 0f519db..e9d5095 100644
> --- a/drivers/input/joystick/gf2k.c
> +++ b/drivers/input/joystick/gf2k.c
> @@ -42,7 +42,7 @@ MODULE_LICENSE("GPL");
>  
>  #define GF2K_START   400 /* The time we wait for the first bit 
> [400 us] */
>  #define GF2K_STROBE  40  /* The time we wait for the first bit 
> [40 us] */
> -#define GF2K_TIMEOUT 4   /* Wait for everything to settle [4 ms] 
> */
> +#define GF2K_TIMEOUT 4000/* Wait for everything to settle [4000 
> us] */
>  #define GF2K_LENGTH  80  /* Max number of triplets in a packet */
>  
>  /*
> @@ -138,7 +138,7 @@ static void gf2k_trigger_seq(struct gameport *gameport, 
> short *seq)
>   i = 0;
>  do {
>   gameport_trigger(gameport);
> - t = gameport_time(gameport, GF2K_TIMEOUT * 1000);
> + t = gameport_time(gameport, GF2K_TIMEOUT);
>   while ((gameport_read(gameport) & 1) && t) t--;
>  udelay(seq[i]);
>  } while (seq[++i]);
> @@ -259,11 +259,11 @@ static int gf2k_connect(struct gameport *gameport, 
> struct gameport_driver *drv)
>  
>   gf2k_trigger_seq(gameport, gf2k_seq_reset);
>  
> - msleep(GF2K_TIMEOUT);
> + usleep_range(GF2K_TIMEOUT, GF2K_TIMEOUT + 100);
>  
>   gf2k_trigger_seq(gameport, gf2k_seq_digital);
>  
> - msleep(GF2K_TIMEOUT);
> + usleep_range(GF2K_TIMEOUT, GF2K_TIMEOUT + 100);
>  
>   if (gf2k_read_packet(gameport, GF2K_LENGTH, data) < 12) {
>   err = -ENODEV;
> -- 
> 2.6.2
> 

-- 
Vojtech Pavlik


Re: [PATCH] Input: joystick: analog - change msleep to usleep_range for small msecs

2016-11-28 Thread Vojtech Pavlik
On Tue, Nov 29, 2016 at 01:11:31AM +0530, Aniroop Mathur wrote:

> msleep(1~20) may not do what the caller intends, and will often sleep longer.
> (~20 ms actual sleep for any value given in the 1~20ms range)
> This is not the desired behaviour for many cases like device resume time,
> device suspend time, device enable time, connection time, probe time,
> loops, retry logic, etc
> msleep is built on jiffies / legacy timers which are not precise whereas
> usleep_range is build on top of hrtimers so the wakeups are precise.
> Thus, change msleep to usleep_range for precise wakeups.
> 
> For example:
> On a machine with tick rate / HZ as 100, msleep(3) will make the process to
> sleep for a minimum period of 10 ms whereas usleep_range(3000, 3100) will make
> sure that the process does not sleep for more than 3100 us or 3.1ms

Again, not needed, if the MAX_TIME sleeps are longer, nobody cares.

> 
> Signed-off-by: Aniroop Mathur 
> ---
>  drivers/input/joystick/analog.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/input/joystick/analog.c b/drivers/input/joystick/analog.c
> index 3d8ff09..2891704 100644
> --- a/drivers/input/joystick/analog.c
> +++ b/drivers/input/joystick/analog.c
> @@ -88,7 +88,7 @@ MODULE_PARM_DESC(map, "Describes analog joysticks 
> type/capabilities");
>  #define ANALOG_EXTENSIONS0x7ff00
>  #define ANALOG_GAMEPAD   0x8
>  
> -#define ANALOG_MAX_TIME  3   /* 3 ms */
> +#define ANALOG_MAX_TIME  3000/* 3000 us */
>  #define ANALOG_LOOP_TIME 2000/* 2 * loop */
>  #define ANALOG_SAITEK_DELAY  200 /* 200 us */
>  #define ANALOG_SAITEK_TIME   2000/* 2000 us */
> @@ -257,7 +257,7 @@ static int analog_cooked_read(struct analog_port *port)
>   int i, j;
>  
>   loopout = (ANALOG_LOOP_TIME * port->loop) / 1000;
> - timeout = ANALOG_MAX_TIME * port->speed;
> + timeout = (ANALOG_MAX_TIME / 1000) * port->speed;
>  
>   local_irq_save(flags);
>   gameport_trigger(gameport);
> @@ -625,20 +625,20 @@ static int analog_init_port(struct gameport *gameport, 
> struct gameport_driver *d
>  
>   gameport_trigger(gameport);
>   t = gameport_read(gameport);
> - msleep(ANALOG_MAX_TIME);
> + usleep_range(ANALOG_MAX_TIME, ANALOG_MAX_TIME + 100);
>   port->mask = (gameport_read(gameport) ^ t) & t & 0xf;
>   port->fuzz = (port->speed * ANALOG_FUZZ_MAGIC) / port->loop / 
> 1000 + ANALOG_FUZZ_BITS;
>  
>   for (i = 0; i < ANALOG_INIT_RETRIES; i++) {
>   if (!analog_cooked_read(port))
>   break;
> - msleep(ANALOG_MAX_TIME);
> + usleep_range(ANALOG_MAX_TIME, ANALOG_MAX_TIME + 100);
>   }
>  
>   u = v = 0;
>  
> - msleep(ANALOG_MAX_TIME);
> - t = gameport_time(gameport, ANALOG_MAX_TIME * 1000);
> + usleep_range(ANALOG_MAX_TIME, ANALOG_MAX_TIME + 100);
> +     t = gameport_time(gameport, ANALOG_MAX_TIME);
>   gameport_trigger(gameport);
>   while ((gameport_read(port->gameport) & port->mask) && (u < t))
>   u++;
> -- 
> 2.6.2
> 

-- 
Vojtech Pavlik


Re: [PATCH] Input: joystick: sidewinder - change msleep to usleep_range for small msecs

2016-11-28 Thread Vojtech Pavlik
tal */
> - msleep(SW_TIMEOUT);
> + usleep_range(SW_TIMEOUT, SW_TIMEOUT + 100);
>   i = sw_read_packet(gameport, buf, SW_LENGTH, 0);/* 
> Retry reading packet */
> - msleep(SW_TIMEOUT);
> + usleep_range(SW_TIMEOUT, SW_TIMEOUT + 100);
>   dbg("Init 1b: Length %d.", i);
>   if (!i) {   /* No 
> data -> FAIL */
>   err = -ENODEV;
> @@ -636,7 +636,7 @@ static int sw_connect(struct gameport *gameport, struct 
> gameport_driver *drv)
>   dbg("Init 2: Mode %d. ID Length %d.", m, j);
>  
>   if (j <= 0) {   /* Read 
> ID failed. Happens in 1-bit mode on PP */
> - msleep(SW_TIMEOUT);
> + usleep_range(SW_TIMEOUT, SW_TIMEOUT + 100);
>   i = sw_read_packet(gameport, buf, SW_LENGTH, 0);/* 
> Retry reading packet */
>   m |= sw_guess_mode(buf, i);
>   dbg("Init 2b: Mode %d. Length %d.", m, i);
> @@ -644,7 +644,7 @@ static int sw_connect(struct gameport *gameport, struct 
> gameport_driver *drv)
>   err = -ENODEV;
>   goto fail2;
>   }
> - msleep(SW_TIMEOUT);
> + usleep_range(SW_TIMEOUT, SW_TIMEOUT + 100);
>   j = sw_read_packet(gameport, idbuf, SW_LENGTH, i);  /* 
> Retry reading ID */
>       dbg("Init 2c: ID Length %d.", j);
>   }
> @@ -655,7 +655,7 @@ static int sw_connect(struct gameport *gameport, struct 
> gameport_driver *drv)
>  
>   do {
>   k--;
> - msleep(SW_TIMEOUT);
> + usleep_range(SW_TIMEOUT, SW_TIMEOUT + 100);
>   i = sw_read_packet(gameport, buf, SW_LENGTH, 0);/* Read 
> data packet */
>   dbg("Init 3: Mode %d. Length %d. Last %d. Tries %d.", m, i, l, 
> k);
>  
> -- 
> 2.6.2
> 

-- 
Vojtech Pavlik


Re: [RFC PATCH v1.9 12/14] livepatch: create per-task consistency model

2016-04-05 Thread Vojtech Pavlik
On Mon, Apr 04, 2016 at 01:33:07PM -0500, Josh Poimboeuf wrote:
> On Mon, Apr 04, 2016 at 08:27:59PM +0200, Vojtech Pavlik wrote:
> > On Mon, Apr 04, 2016 at 01:21:38PM -0500, Josh Poimboeuf wrote:
> > 
> > > Hm, can you explain why it should be copied from the parent?
> > > 
> > > I'm thinking the above code is correct for today, but it should still be
> > > changed to be more future-proof.
> > > 
> > > Here's my thinking:
> > > 
> > > A forked task starts out with no stack, so if I understand correctly, it
> > > can safely start out in the goal universe, regardless of which universe
> > > its parent belongs to.
> > > 
> > > However, the current ret_from_fork code is a mess, and Andy Lutomirski
> > > has mentioned that he would like to give newly forked tasks a proper
> > > stack such that instead of jumping to ret_from_fork, they would just
> > > return from schedule().  In that case, it would no longer be safe to
> > > start the new task in the goal universe because it could be "sleeping"
> > > on a to-be-patched function.
> > > 
> > > So for proper future proofing, newly forked tasks should be started in
> > > the initial universe (rather than starting in the goal universe or
> > > inheriting the parent's universe).  They can then be transitioned over
> > > to the goal universe like any other task.  How does that sound?
> > 
> > How could a newly forked task start in the old universe if its parent
> > has already been migrated? Any context it inherits will already be from
> > the new universe.
> 
> Can you be more specific about "context" and why inheritance of it would
> be a problem?

Currently a forked task starts out with no stack, and as such it can
start in the goal universe.

If we create a synthetic stack, then we may need to start in the initial
universe, as the synthetic stack would likely be created using initial
universe return addresses. 

If we simply copy the stack of the parent process, which is in my
opionion the safest way, as it places little assumptions on the
compiler, then it may contain either old or new addresses
and we need to copy the universe flag along.

-- 
Vojtech Pavlik
Director SUSE Labs


Re: [RFC PATCH v1.9 12/14] livepatch: create per-task consistency model

2016-04-04 Thread Vojtech Pavlik
On Mon, Apr 04, 2016 at 01:21:38PM -0500, Josh Poimboeuf wrote:

> Hm, can you explain why it should be copied from the parent?
> 
> I'm thinking the above code is correct for today, but it should still be
> changed to be more future-proof.
> 
> Here's my thinking:
> 
> A forked task starts out with no stack, so if I understand correctly, it
> can safely start out in the goal universe, regardless of which universe
> its parent belongs to.
> 
> However, the current ret_from_fork code is a mess, and Andy Lutomirski
> has mentioned that he would like to give newly forked tasks a proper
> stack such that instead of jumping to ret_from_fork, they would just
> return from schedule().  In that case, it would no longer be safe to
> start the new task in the goal universe because it could be "sleeping"
> on a to-be-patched function.
> 
> So for proper future proofing, newly forked tasks should be started in
> the initial universe (rather than starting in the goal universe or
> inheriting the parent's universe).  They can then be transitioned over
> to the goal universe like any other task.  How does that sound?

How could a newly forked task start in the old universe if its parent
has already been migrated? Any context it inherits will already be from
the new universe.

-- 
Vojtech Pavlik
Director SuSE Labs


Re: [PATCH] livepatch: Update maintainers

2016-03-18 Thread Vojtech Pavlik
On Wed, Mar 16, 2016 at 10:03:36AM -0500, Josh Poimboeuf wrote:

> Seth and Vojtech are no longer active maintainers of livepatch, so
> remove them in favor of Jessica and Miroslav.
> 
> Also add Petr as a designated reviewer.
> 
> Signed-off-by: Josh Poimboeuf 
> ---
>  MAINTAINERS | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 860e306..e04e0a5 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -6583,9 +6583,10 @@ F: drivers/platform/x86/hp_accel.c
>  
>  LIVE PATCHING
>  M:   Josh Poimboeuf 
> -M:   Seth Jennings 
> +M:   Jessica Yu 
>  M:   Jiri Kosina 
> -M:   Vojtech Pavlik 
> +M:   Miroslav Benes 
> +R:   Petr Mladek 
>  S:   Maintained
>  F:   kernel/livepatch/
>  F:   include/linux/livepatch.h
> -- 
> 2.4.3

Acked-by: Vojtech Pavlik 

-- 
Vojtech Pavlik
Director SuSE Labs


[PATCH] bcache: Fix writeback_thread never writing back incomplete stripes.

2015-09-05 Thread Vojtech Pavlik
Fix writeback_thread never finishing writing back all dirty data in bcache when
partial_stripes_expensive is set, and spinning, consuming 100% of CPU instead.

Signed-off-by: Vojtech Pavlik 
---

This is a fix for the current upstream bcache, not the devel branch.

If partial_stripes_expensive is set for a cache set, then writeback_thread
always attempts to write full stripes out back to the backing device first.
However, since it does that based on a bitmap and not a simple linear
search, like the rest of the code of refill_dirty(), it changes the
last_scanned pointer so that never points to 'end'. refill_dirty() then
never tries to scan from 'start', resulting in the writeback_thread
looping, consuming 100% of CPU, but never making progress in writing out
the incomplete dirty stripes in the cache.

Scanning the tree after not finding enough full stripes fixes the issue.

Incomplete dirty stripes are written to the backing device, the device
eventually reaches a clean state if there is nothing dirtying data and
writeback_thread sleeps. This also fixes the problem of the cache device
not being possible to detach in the partial_stripes_expensive scenario.

It may be more efficient to separate the last_scanned field for normal and
stripe scans instead.

 drivers/md/bcache/writeback.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index f1986bc..6f8b81d 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -382,6 +382,7 @@ static bool refill_dirty(struct cached_dev *dc)
refill_full_stripes(dc);
if (array_freelist_empty(&buf->freelist))
return false;
+   bch_refill_keybuf(dc->disk.c, buf, &end, dirty_pred);
}
 
if (bkey_cmp(&buf->last_scanned, &end) >= 0) {
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 02/10] x86: Compile-time asm code validation

2015-06-10 Thread Vojtech Pavlik
On Wed, Jun 10, 2015 at 10:21:36AM -0700, Andy Lutomirski wrote:
> On Jun 10, 2015 5:07 AM, "Josh Poimboeuf"  wrote:
> >
> > Add a new CONFIG_ASM_VALIDATION option which adds an asmvalidate host
> > tool which runs on every compiled .S file.  Its goal is to enforce sane
> > rules on all asm code, so that stack debug metadata (frame/back chain
> > pointers and/or DWARF CFI metadata) can be made reliable.
> >
> > It enforces the following rules:
> >
> > 1. Each callable function must be annotated with the ELF STT_FUNC type.
> >This is typically done using the ENTRY/ENDPROC macros.  If
> >asmvalidate finds a return instruction outside of a function, it
> >flags an error, since that usually indicates callable code which
> >should be annotated accordingly.
> >
> > 2. Each callable function must never leave its own bounds (i.e. with a
> >jump to outside the function) except when returning.
> 
> Won't that break with sibling/tail calls?  GCC can generate those, and
> the ia32_ptregs_common label is an example of such a thing.

It only validates hand-written assembly, so how it could?

-- 
Vojtech Pavlik
Director SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v4] Add virtio-input driver.

2015-03-24 Thread Vojtech Pavlik
On Wed, Mar 25, 2015 at 01:51:43PM +1030, Rusty Russell wrote:

> Gerd Hoffmann  writes:
> > virtio-input is basically evdev-events-over-virtio, so this driver isn't
> > much more than reading configuration from config space and forwarding
> > incoming events to the linux input layer.
> >
> > Signed-off-by: Gerd Hoffmann 
> 
> Is the input layer sane?  I've never dealt with it, so I don't know.

I'm rather biased having designed it, but I'd say it's reasonable. It
certainly has limitations and design mistakes, but none are too bad.
One testimony to that Android has based its own Input API around it.

> What's it used for?

For all human input devices, like keyboards, mice, touchscreens, etc. 

> Imagine a future virtio standard which incorporates this.  And a Windows
> or FreeBSD implementation of the device and or driver.  How ugly would
> they be?

A windows translation layer is fairly easy, people porting software from
Windows to Linux told me numerous times that adapting isn't hard. I also
believe that at least one of the BSD's has a compatible implementation
these days based on the fact that I was asked to allow copying the
headers in the past.

-- 
Vojtech Pavlik
Director SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: live kernel upgrades (was: live kernel patching design)

2015-02-24 Thread Vojtech Pavlik
On Tue, Feb 24, 2015 at 11:23:29AM +0100, Ingo Molnar wrote:

> > Your upgrade proposal is an *enormous* disruption to the 
> > system:
> > 
> > - a latency of "well below 10" seconds is completely
> >   unacceptable to most users who want to patch the kernel 
> >   of a production system _while_ it's in production.
> 
> I think this statement is false for the following reasons.

The statement is very true.

>   - I'd say the majority of system operators of production 
> systems can live with a couple of seconds of delay at a 
> well defined moment of the day or week - with gradual, 
> pretty much open ended improvements in that latency 
> down the line.

In the most usual corporate setting any noticeable outage, even out of
business hours, requires an ahead notice, and an agreement of all
stakeholders - teams that depend on the system.

If a live patching technology introduces an outage, it's not "live" and
because of these bureaucratic reasons, it will not be used and a regular
reboot will be scheduled instead.

>   - I think your argument ignores the fact that live 
> upgrades would extend the scope of 'users willing to 
> patch the kernel of a production system' _enormously_. 
> 
> For example, I have a production system with this much 
> uptime:
> 
>10:50:09 up 153 days,  3:58, 34 users,  load average: 0.00, 0.02, 0.05
> 
> While currently I'm reluctant to reboot the system to 
> upgrade the kernel (due to a reboot's intrusiveness), 
> and that is why it has achieved a relatively high 
> uptime, but I'd definitely allow the kernel to upgrade 
> at 0:00am just fine. (I'd even give it up to a few 
> minutes, as long as TCP connections don't time out.)
> 
> And I don't think my usecase is special.

I agree that this is useful. But it is a different problem that only
partially overlaps with what we're trying to achieve with live patching.

If you can make full kernel upgrades to work this way, which I doubt is
achievable in the next 10 years due to all the research and
infrastructure needed, then you certainly gain an additional group of
users. And a great tool. A large portion of those that ask for live
patching won't use it, though.

But honestly, I prefer a solution that works for small patches now, than
a solution for unlimited patches sometime in next decade.

> What gradual improvements in live upgrade latency am I 
> talking about?
> 
>  - For example the majority of pure user-space process 
>pages in RAM could be saved from the old kernel over 
>into the new kernel - i.e. they'd stay in place in RAM, 
>but they'd be re-hashed for the new data structures. 
>This avoids a big chunk of checkpointing overhead.

I'd have hoped this would be a given. If you can't preserve memory
contents and have to re-load from disk, you can just as well reboot
entirely, the time needed will not be much more..

>  - Likewise, most of the page cache could be saved from an
>old kernel to a new kernel as well - further reducing
>checkpointing overhead.
> 
>  - The PROT_NONE mechanism of the current NUMA balancing
>code could be used to transparently mark user-space 
>pages as 'checkpointed'. This would reduce system 
>interruption as only 'newly modified' pages would have 
>to be checkpointed when the upgrade happens.
> 
>  - Hardware devices could be marked as 'already in well
>defined state', skipping the more expensive steps of 
>driver initialization.
> 
>  - Possibly full user-space page tables could be preserved 
>over an upgrade: this way user-space execution would be 
>unaffected even in the micro level: cache layout, TLB
>patterns, etc.
> 
> There's lots of gradual speedups possible with such a model 
> IMO.

Yes, as I say above, guaranteeing decades of employment. ;)

> With live kernel patching we run into a brick wall of 
> complexity straight away: we have to analyze the nature of 
> the kernel modification, in the context of live patching, 
> and that only works for the simplest of kernel 
> modifications.

But you're able to _use_ it.

> With live kernel upgrades no such brick wall exists, just 
> about any transition between kernel versions is possible.

The brick wall you run to is "I need to implement full kernel state
serialization before I can do anything at all." That's something that
isn't even clear _how_ to do. Particularly with Linux kernel's
development model where internal ABI and structures are always in flux
it may not even be realistic.

> Granted, with live kernel upgrades it'

Re: live kernel upgrades (was: live kernel patching design)

2015-02-24 Thread Vojtech Pavlik
On Tue, Feb 24, 2015 at 11:53:28AM +0100, Ingo Molnar wrote:
> 
> * Jiri Kosina  wrote:
> 
> > [...] We could optimize the kernel the craziest way we 
> > can, but hardware takes its time to reinitialize. And in 
> > most cases, you'd really need to reinitalize it; [...]
> 
> If we want to reinitialize a device, most of the longer 
> initialization latencies during bootup these days involve 
> things like: 'poke hardware, see if there's any response'. 
> Those are mostly going away quickly with modern, 
> well-enumerated hardware interfaces.
> 
> Just try a modprobe of a random hardware driver - most 
> initialization sequences are very fast. (That's how people 
> are able to do cold bootups in less than 1 second.)

Have you ever tried to boot a system with a large (> 100) number of
drives connected over FC? That takes time to discover and you have to do
the discovery as the configuration could have changed while you were not
looking.

Or a machine with terabytes of memory? Just initializing the memory
takes minutes.

Or a desktop with USB? And you have to reinitialize the USB bus and the
state of all the USB devices, because an application might be accessing
files on an USB drive.

> In theory this could also be optimized: we could avoid the 
> reinitialization step through an upgrade via relatively 
> simple means, for example if drivers define their own 
> version and the new kernel's driver checks whether the 
> previous state is from a compatible driver. Then the new 
> driver could do a shorter initialization sequence.

There you're clearly getting in the "so complex to maintain that it'll never
work reliably" territory.

> But I'd only do it only in special cases, where for some 
> reason the initialization sequence takes longer time and it 
> makes sense to share hardware discovery information between 
> two versions of the driver. I'm not convinced such a 
> mechanism is necessary in the general case.

-- 
Vojtech Pavlik
Director SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: live kernel upgrades (was: live kernel patching design)

2015-02-24 Thread Vojtech Pavlik
On Tue, Feb 24, 2015 at 10:44:05AM +0100, Ingo Molnar wrote:

> > This is the most common argument that's raised when live 
> > patching is discussed. "Why do need live patching when we 
> > have redundancy?"
> 
> My argument is that if we start off with a latency of 10 
> seconds and improve that gradually, it will be good for 
> everyone with a clear, actionable route for even those who 
> cannot take a 10 seconds delay today.

Sure, we can do it that way. 

Or do it in the other direction.

Today we have a tool (livepatch) in the kernel that can apply trivial
single-function fixes without a measurable disruption to applications.

And we can improve it gradually to expand the range of fixes it can
apply.

Dependent functions can be done by kGraft's lazy migration.

Limited data structure changes can be handled by shadowing.

Major data structure and/or locking changes require stopping the kernel,
and trapping all tasks at the kernel/userspace boundary is clearly the
cleanest way to do that. I comes at a steep latency cost, though.

Full code replacement without change scope consideration requires full
serialization and deserialization of hardware and userspace
interface state, which is something we don't have today and would
require work on every single driver. Possible, but probably a decade of
effort.

With this approach you have something useful at every point and every
piece of effort put in gives you a rewars.

> Lets see the use cases:
> 
> > [...] Examples would be legacy applications which can't 
> > run in an active-active cluster and need to be restarted 
> > on failover.
> 
> Most clusters (say web frontends) can take a stoppage of a 
> couple of seconds.

It's easy to find examples of workloads that can be stopped. It doesn't
rule out a significant set of those where stopping them is very
expensive.

> > Another usecase is large HPC clusters, where all nodes 
> > have to run carefully synchronized. Once one gets behind 
> > in a calculation cycle, others have to wait for the 
> > results and the efficiency of the whole cluster goes 
> > down. [...]
> 
> I think calculation nodes on large HPC clusters qualify as 
> the specialized case that I mentioned, where the update 
> latency could be brought down into the 1 second range.
> 
> But I don't think calculation nodes are patched in the 
> typical case: you might want to patch Internet facing 
> frontend systems, the rest is left as undisturbed as 
> possible. So I'm not even sure this is a typical usecase.

They're not patched for security bugs, but stability bugs are an
important issue for multi-month calculations.

> In any case, there's no hard limit on how fast such a 
> kernel upgrade can get in principle, and the folks who care 
> about that latency will sure help out optimizing it and 
> many HPC projects are well funded.

So far, unless you come up with an effective solutions, if you're
catching all tasks at the kernel/userspace boundary (the "Kragle"
approach), the service interruption is effectively unbounded due to
tasks in D state.

> > The value of live patching is in near zero disruption.
> 
> Latency is a good attribute of a kernel upgrade mechanism, 
> but it's by far not the only attribute and we should 
> definitely not design limitations into the approach and 
> hurt all the other attributes, just to optimize that single 
> attribute.

It's an attribute I'm not willing to give up. On the other hand, I
definitely wouldn't argue against having modes of operation where the
latency is higher and the tool is more powerful.

> I.e. don't make it a single-issue project.

There is no need to worry about that. 

-- 
Vojtech Pavlik
Director SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: live kernel upgrades (was: live kernel patching design)

2015-02-23 Thread Vojtech Pavlik
On Mon, Feb 23, 2015 at 11:42:17AM +0100, Richard Weinberger wrote:

> > Of course, if you are random Joe User, you can do whatever you want, i.e.
> > also compile your own home-brew patches and apply them randomly and brick
> > your system that way. But that's in no way different to what you as Joe
> > User can do today; there is nothing that will prevent you from shooting
> > yourself in a foot if you are creative.
> 
> Sorry if I ask something that got already discussed, I did not follow
> the whole live-patching discussion.
> 
> How much of the userspace tools will be public available?
> With live-patching mainline the kernel offers the mechanism, but
> random Joe user still needs
> the tools to create good live patches.

All the tools for kGraft and kpatch are available in public git
repositories.

Also, while kGraft has tools to automate the generation of patches,
these are generally not required to create a patch.

-- 
Vojtech Pavlik
Director SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: live kernel upgrades (was: live kernel patching design)

2015-02-22 Thread Vojtech Pavlik
On Sun, Feb 22, 2015 at 03:01:48PM -0800, Andrew Morton wrote:

> On Sun, 22 Feb 2015 20:13:28 +0100 (CET) Jiri Kosina  wrote:
> 
> > But if you ask the folks who are hungry for live bug patching, they 
> > wouldn't care.
> > 
> > You mentioned "10 seconds", that's more or less equal to infinity to them. 
> 
> 10 seconds outage is unacceptable, but we're running our service on a
> single machine with no failover.  Who is doing this??

This is the most common argument that's raised when live patching is
discussed. "Why do need live patching when we have redundancy?"

People who are asking for live patching typically do have failover in
place, but prefer not to have to use it when they don't have to.

In many cases, the failover just can't be made transparent to the
outside world and there is a short outage. Examples would be legacy
applications which can't run in an active-active cluster and need to be
restarted on failover. Or trading systems, where the calculations must
be strictly serialized and response times are counted in tens of
microseconds. 

Another usecase is large HPC clusters, where all nodes have to run
carefully synchronized. Once one gets behind in a calculation cycle,
others have to wait for the results and the efficiency of the whole
cluster goes down. There are people who run realtime on them for
that reason. Dumping all data and restarting the HPC cluster takes a lot
of time and many nodes (out of tens of thousands) may not come back up,
making the restore from media difficult. Doing a rolling upgrade causes
the nodes one by one stall by 10+ seconds, which times 10k is a long
time, too.

And even the case where you have a perfect setup with everything
redundant and with instant failover does benefit from live patching.
Since you have to plan for failure, you have to plan for failure while
patching, too. With live patching you need 2 servers minimum (or N+1),
without you need 3 (or N+2), as one will be offline while during the
upgrade process.

10 seconds of outage may be acceptable in a disaster scenario. Not
necessarily for a regular update scenario.

The value of live patching is in near zero disruption.

-- 
Vojtech Pavlik
Director SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: live patching design (was: Re: [PATCH 1/3] sched: add sched_task_call())

2015-02-20 Thread Vojtech Pavlik
On Fri, Feb 20, 2015 at 08:49:01PM +0100, Ingo Molnar wrote:

> > ... the choice the sysadmins have here is either have the 
> > system running in an intermediate state, or have the 
> > system completely dead for the *same time*. Because to 
> > finish the transition successfully, all the tasks have to 
> > be woken up in any case.
> 
> That statement is false: an 'intermediate state' system 
> where 'new' tasks are still running is still running and 
> will interfere with the resolution of 'old' tasks.

Can you suggest a way how they would interfere? The transition happens
on entering or returning from a syscall, there is no influence between
individual tasks.

If you mean that the patch author has to consider the fact that both old
and new code will be running simultaneously, then yes, they have to.

> > But I do get your point; what you are basically saying is 
> > that your preference is what kgraft is doing, and option 
> > to allow for a global synchronization point should be 
> > added in parallel to the gradual lazy migration.
> 
> I think you misunderstood: the 'simple' method I outlined 
> does not just 'synchronize', it actually executes the live 
> patching atomically, once all tasks are gathered and we 
> know they are _all_ in a safe state.

The 'simple' method has to catch and freeze all tasks one by one in
syscall entry/exit, at the kernel/userspace boundary, until all are
frozen and then patch the system atomically. 

This means that each and every sleeping task in the system has to be
woken up in some way (sending a signal ...) to exit from a syscall it is
sleeping in. Same for CPU hogs. All kernel threads need to be parked.

This is exactly what you need to do for kGraft to complete patching.

This may take a significant amount of time to achieve and you won't be
able to use a userspace script to send the signals, because it'd be
frozen immediately.

> I.e. it's in essence the strong stop-all atomic patching 
> model of 'kpatch', combined with the reliable avoidance of 
> kernel stacks that 'kgraft' uses.

> That should be the starting point, because it's the most 
> reliable method.

In the consistency models discussion, this was marked the
"LEAVE_KERNEL+SWITCH_KERNEL" model. It's indeed the strongest model of
all, but also comes at the highest cost in terms of impact on running
tasks. It's so high (the interruption may be seconds or more) that it
was deemed not worth implementing.

-- 
Vojtech Pavlik
Director SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/3] sched: add sched_task_call()

2015-02-19 Thread Vojtech Pavlik
On Thu, Feb 19, 2015 at 11:32:55AM -0600, Josh Poimboeuf wrote:
> On Thu, Feb 19, 2015 at 06:19:29PM +0100, Vojtech Pavlik wrote:
> > On Thu, Feb 19, 2015 at 11:03:53AM -0600, Josh Poimboeuf wrote:
> > > On Thu, Feb 19, 2015 at 05:33:59PM +0100, Vojtech Pavlik wrote:
> > > > On Thu, Feb 19, 2015 at 10:24:29AM -0600, Josh Poimboeuf wrote:
> > > > 
> > > > > > No, these tasks will _never_ make syscalls. So you need to guarantee
> > > > > > they don't accidentally enter the kernel while you flip them. 
> > > > > > Something
> > > > > > like so should do.
> > > > > > 
> > > > > > You set TIF_ENTER_WAIT on them, check they're still in userspace, 
> > > > > > flip
> > > > > > them then clear TIF_ENTER_WAIT.
> > > > > 
> > > > > Ah, that's a good idea.  But how do we check if they're in user space?
> > > > 
> > > > I don't see the benefit in holding them in a loop - you can just as well
> > > > flip them from the syscall code as kGraft does.
> > > 
> > > But we were talking specifically about HPC tasks which never make
> > > syscalls.
> > 
> > Yes. I'm saying that rather than guaranteeing they don't enter the
> > kernel (by having them spin) you can flip them in case they try to do
> > that instead. That solves the race condition just as well.
> 
> Ok, gotcha.
> 
> We'd still need a safe way to check if they're in user space though.

Having a safe way would be very nice and actually quite useful in other
cases, too.

For this specific purpose, however, we don't need a very safe way,
though. We don't require atomicity in any way, we don't mind even if it
creates false negatives, only false positives would be bad.

kGraft looks at the stacktrace of CPU hogs and if it finds no kernel
addresses there, it assumes userspace. Not very nice, but does the job.

> How about with a TIF_IN_USERSPACE thread flag?  It could be cleared/set
> right at the border.  Then for running tasks it's as simple as:
> 
> if (test_tsk_thread_flag(task, TIF_IN_USERSPACE))
>   klp_switch_task_universe(task);

-- 
Vojtech Pavlik
Director SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/3] sched: add sched_task_call()

2015-02-19 Thread Vojtech Pavlik
On February 19, 2015 6:32:55 PM CET, Josh Poimboeuf  wrote:

>> Yes. I'm saying that rather than guaranteeing they don't enter the
>> kernel (by having them spin) you can flip them in case they try to do
>> that instead. That solves the race condition just as well.
>
>Ok, gotcha.
>
>We'd still need a safe way to check if they're in user space though.
>
>How about with a TIF_IN_USERSPACE thread flag?  It could be cleared/set
>right at the border.  Then for running tasks it's as simple as:
>
>if (test_tsk_thread_flag(task, TIF_IN_USERSPACE))
>   klp_switch_task_universe(task);

The s390x arch used to have a TIF_SYSCALL, which was doing exactly that (well, 
negated). I think it would work well, but it isn't entirely for free: two 
instructions per syscall and an extra TIF bit, which are (near) exhausted on 
some archs.

-- 
Vojtech Pavlik
Director SuSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/3] sched: add sched_task_call()

2015-02-19 Thread Vojtech Pavlik
On Thu, Feb 19, 2015 at 11:03:53AM -0600, Josh Poimboeuf wrote:
> On Thu, Feb 19, 2015 at 05:33:59PM +0100, Vojtech Pavlik wrote:
> > On Thu, Feb 19, 2015 at 10:24:29AM -0600, Josh Poimboeuf wrote:
> > 
> > > > No, these tasks will _never_ make syscalls. So you need to guarantee
> > > > they don't accidentally enter the kernel while you flip them. Something
> > > > like so should do.
> > > > 
> > > > You set TIF_ENTER_WAIT on them, check they're still in userspace, flip
> > > > them then clear TIF_ENTER_WAIT.
> > > 
> > > Ah, that's a good idea.  But how do we check if they're in user space?
> > 
> > I don't see the benefit in holding them in a loop - you can just as well
> > flip them from the syscall code as kGraft does.
> 
> But we were talking specifically about HPC tasks which never make
> syscalls.

Yes. I'm saying that rather than guaranteeing they don't enter the
kernel (by having them spin) you can flip them in case they try to do
that instead. That solves the race condition just as well.

-- 
Vojtech Pavlik
Director SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/3] sched: add sched_task_call()

2015-02-19 Thread Vojtech Pavlik
On Thu, Feb 19, 2015 at 10:24:29AM -0600, Josh Poimboeuf wrote:

> > No, these tasks will _never_ make syscalls. So you need to guarantee
> > they don't accidentally enter the kernel while you flip them. Something
> > like so should do.
> > 
> > You set TIF_ENTER_WAIT on them, check they're still in userspace, flip
> > them then clear TIF_ENTER_WAIT.
> 
> Ah, that's a good idea.  But how do we check if they're in user space?

I don't see the benefit in holding them in a loop - you can just as well
flip them from the syscall code as kGraft does.

> > I still absolutely hate you need to disturb userspace like that. Signals
> > are quite visible and perturb userspace state.
> 
> Yeah, SIGSTOP on a sleeping task can be intrusive to user space if it
> results in EINTR being returned from a system call.  It's not ideal, but
> it's much less intrusive than something like suspend.
> 
> But anyway we leave it up to the user to decide whether they want to
> take that risk, or wait for the task to wake up on its own, or cancel
> the patch.
> 
> > Also, you cannot SIGCONT a task that was SIGSTOP'ed by userspace for
> > what they thought was a good reason. You'd wreck their state.
> 
> Hm, that's a good point.  Maybe use the freezer instead of signals?
> 
> (Again this would only be for those user tasks which are sleeping on a
> patched function)
> 
> > > But now I'm thinking that kthreads will almost never be a problem.  Most
> > > kthreads are basically this:
> > 
> > You guys are way too optimistic; maybe its because I've worked on
> > realtime stuff too much, but I'm always looking at worst cases. If you
> > can't handle those, I feel you might as well not bother :-)
> 
> Well, I think we're already resigned to the fact that live patching
> won't work for every patch, every time.  And that the patch author must
> know what they're doing, and must do it carefully.
> 
> Our target worst case is that the patching fails gracefully and the user
> can't patch their system with that particular patch.  Or that the system
> remains in a partially patched state forever, if the user is ok with
> that.
> 
> > > Patching thread_fn wouldn't be possible unless we killed the thread.
> > 
> > It is, see kthread_park().
> 
> When the kthread returns from kthread_parkme(), it'll still be running
> the old thread_fn code, regardless of whether we flipped the task's
> patch state.

We could instrument kthread_parkme() to be able to return to a different
function, but it'd be a bit crazy indeed.

-- 
Vojtech Pavlik
Director SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 6/9] livepatch: create per-task consistency model

2015-02-19 Thread Vojtech Pavlik
On Thu, Feb 19, 2015 at 10:52:51AM +0100, Peter Zijlstra wrote:

> On Wed, Feb 18, 2015 at 09:44:44PM +0100, Vojtech Pavlik wrote:
> > For live patching it doesn't matter whether code is running, sleeping or
> > frozen.
> > 
> > What matters is whether there is state before patching that may not be
> > valid after patching.
> > 
> > For userspace tasks, the exit from a syscall is a perfect moment for
> > switching to the "after" state, as all stacks, and thus all local
> > variables are gone and no local state exists in the kernel for the
> > thread.
> > 
> > The freezer is a logical choice for kernel threads, however, given that
> > kernel threads have no defined entry/exit point and execute within a
> > single main function, local variables stay and thus local state persists
> > from before to after freezing.
> > 
> > Defining that no local state within a kernel thread may be relied upon
> > after exiting from the freezer is certainly possible, and is already
> > true for many kernel threads.
> > 
> > It isn't a given property of the freezer itself, though. And isn't
> > obvious for author of new kernel threads either.
> > 
> > The ideal solution would be to convert the majority of kernel threads to
> > workqueues, because then there is a defined entry/exit point over which
> > state isn't transferred. That is a lot of work, though, and has other
> > drawbacks, particularly in the realtime space.
> 
> kthread_park() functionality seems to be exactly what you want.

It might be exactly that, indeed. The requrement of not just cleaning
up, but also not using contents of local variables from before parking
would need to be documented.

And kernel threads would need to start using it, too. I have been able
to find one instance where this functionality is actually used. So it is
again a matter of a massive patch adding that, like with the approach of
converting kernel threads to workqueues.

By the way, if kthread_park() was implemented all through the kernel,
would we still need the freezer for kernel threads at all? Since parking
seems to be stronger than freezing, it could also be used for that
purpose.

Vojtech
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 6/9] livepatch: create per-task consistency model

2015-02-18 Thread Vojtech Pavlik
On Wed, Feb 18, 2015 at 09:17:55PM +0100, Ingo Molnar wrote:
> 
> * Jiri Kosina  wrote:
> 
> > On Thu, 12 Feb 2015, Peter Zijlstra wrote:
> > 
> > > And what's wrong with using known good spots like the freezer?
> > 
> > Quoting Tejun from the thread Jiri Slaby likely had on 
> > mind:
> > 
> > "The fact that they may coincide often can be useful as a 
> > guideline or whatever but I'm completely against just 
> > mushing it together when it isn't correct.  This kind of 
> > things quickly lead to ambiguous situations where people 
> > are not sure about the specific semantics or guarantees 
> > of the construct and implement weird voodoo code followed 
> > by voodoo fixes.  We already had a full round of that 
> > with the kernel freezer itself, where people thought that 
> > the freezer magically makes PM work properly for a 
> > subsystem.  Let's please not do that again."
> 
> I don't follow this vague argument.
> 
> The concept of 'freezing' all userspace execution is pretty 
> unambiguous: tasks that are running are trapped out at 
> known safe points such as context switch points or syscall 
> entry. Once all tasks have stopped, the system is frozen in 
> the sense that only the code we want is running, so you can 
> run special code without worrying about races.
> 
> What's the problem with that? Why would it be fundamentally 
> unsuitable for live patching?

For live patching it doesn't matter whether code is running, sleeping or
frozen.

What matters is whether there is state before patching that may not be
valid after patching.

For userspace tasks, the exit from a syscall is a perfect moment for
switching to the "after" state, as all stacks, and thus all local
variables are gone and no local state exists in the kernel for the
thread.

The freezer is a logical choice for kernel threads, however, given that
kernel threads have no defined entry/exit point and execute within a
single main function, local variables stay and thus local state persists
from before to after freezing.

Defining that no local state within a kernel thread may be relied upon
after exiting from the freezer is certainly possible, and is already
true for many kernel threads.

It isn't a given property of the freezer itself, though. And isn't
obvious for author of new kernel threads either.

The ideal solution would be to convert the majority of kernel threads to
workqueues, because then there is a defined entry/exit point over which
state isn't transferred. That is a lot of work, though, and has other
drawbacks, particularly in the realtime space.

-- 
Vojtech Pavlik
Director SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv7 0/3] Kernel Live Patching

2014-12-17 Thread Vojtech Pavlik
On Wed, Dec 17, 2014 at 01:22:21PM +0530, Balbir Singh wrote:
> On Wed, Dec 17, 2014 at 12:16 PM, Jiri Kosina  wrote:
> > On Wed, 17 Dec 2014, Balbir Singh wrote:
> >
> >> >> Could you describe what this does to signing? I presume the patched
> >> >> module should cause a taint on module signing?
> >> >
> >> > Hmm, why should it?
> >>
> >> I wanted to clarify it from a different perspective
> >>
> >> If the base image is signed by X and the patched module is signed by
> >> Y, is that supported. What does it imply in the case of live-patching?
> >
> > Why should that matter? Both are signed by keys that kernel is configured
> > to trust, which makes them equal (even though they are technically
> > different).
> >
> 
> I am not sure they are equal, others can comment

Since any loaded kernel module can do virtually anything on a machine,
there can only be one level of trust. As such, all trusted keys are
equally trusted.

-- 
Vojtech Pavlik
Director SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv3 2/3] kernel: add support for live patching

2014-11-24 Thread Vojtech Pavlik
On Mon, Nov 24, 2014 at 02:26:08PM +0100, Thomas Gleixner wrote:
> On Mon, 24 Nov 2014, Jiri Kosina wrote:
> > On Mon, 24 Nov 2014, Thomas Gleixner wrote:
> > > How is determined whether a change can be applied w/o a consistency
> > > mechanism or not?
> > 
> > By a human being producing the "live patch" code.
> > 
> > If the semantics of the patch requires consistency mechanism to be applied 
> > (such as "old and new function must not run in parallel, because locking 
> > rules would be violated", or "return value from a function that is being 
> > called in a loop is changing its meaning", etc.), then this first naive 
> > implementation simply can't be used.
> > 
> > For simple things though, such as "add a missing bounds check to sys_foo() 
> > prologue and return -EINVAL if out-of-bounds", this is sufficient.
> > 
> > It's being designed in a way that more advanced consistency models (such 
> > as the ones kgraft and kpatch are currently implementing) can be built on 
> > top of it.
> > 
> > The person writing the patch would always need to understand what he is 
> > doing to be able to pick correct consistency model to be used. I 
> > personally think this is a good thing -- this is nothing where we should 
> > be relying on any kinds of tools.
> 
> But why want we to provide a mechanism which has no consistency
> enforcement at all?
> 
> Surely you can argue that the person who is doing that is supposed to
> know what he's doing, but what's the downside of enforcing consistency
> mechanisms on all live code changes?

The consistency engine implementing the consistency model is the most
complex part of the live patching technology. We want to have something
small, easy to understand pushed out first, to build on top of that.

Plus we're still discussing which exact consistency model to use for
upstream live patching (there are many considerations) and whether one
is enough, or whether an engine that can do more than one is required.

The consistency models of kpatch and kGraft aren't directly compatible.

I think we're on a good way towards a single model, but we'll see when
it's implemented within the live patching framework just posted.

-- 
Vojtech Pavlik
Director SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv3 2/3] kernel: add support for live patching

2014-11-24 Thread Vojtech Pavlik
On Mon, Nov 24, 2014 at 12:13:20PM +0100, Thomas Gleixner wrote:

> > This commit introduces code for the live patching core.  It implements
> > an ftrace-based mechanism and kernel interface for doing live patching
> > of kernel and kernel module functions.
> > 
> > It represents the greatest common functionality set between kpatch and
> > kgraft and can accept patches built using either method.
> > 
> > This first version does not implement any consistency mechanism that
> > ensures that old and new code do not run together.  In practice, ~90% of
> > CVEs are safe to apply in this way, since they simply add a conditional
> > check.  However, any function change that can not execute safely with
> > the old version of the function can _not_ be safely applied in this
> > version.
> 
> To be honest this sounds frightening.
> 
> How is determined whether a change can be applied w/o a consistency
> mechanism or not?

If there are any syntactic (function prototype - arguments, return value
type) or semantic dependencies (data value semantics, locking order,
...) between multiple functions the patch changes, then it cannot be
applied.

If the changes are small and localized, don't depend on the order in
which individual functions are replaced, when both new and old code can
run in parallel on different CPUs or in sequence in a single thread
safely, then it can be applied.

-- 
Vojtech Pavlik
Director SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] Kernel Live Patching

2014-11-13 Thread Vojtech Pavlik
y including setting the counter
> > to 0 on syscall entry/exit, but it'd make the switch per-thread like
> > kGraft does, not for the whole system, when the respective counters
> > reach zero.
> 
> I'm not sure what happens if a process sleeps on the patched-set?

Then the patching process will be stuck until it is woken up somehow.
But it's still much better to only have to care about processes sleeping
on the patched-set than about processes sleeping anywhere (kGraft).

> If we switch the other threads, when this sleeping thread wakes up
> that will see the old functions (and old data).

Yes, until the patching process is complete, data must be kept in the
old format, even by new functions.

> So I think we need both SWITCH_THREAD and SWITCH_KERNEL options in
> that case.

With data shadowing that's not required. It still may be worth having
it.

> What I'm thinking is to merge the code (technique) of both and
> allow to choose the "switch-timing" based on the patch's consistency
> requirement.

That's what I'm thinking about, too. But I'm also thinking, "this will
be complex, is it really needed"?

> Anyway, I'd like to support for this effort from kernel side.
> At least I have to solve ftrace regs conflict by IPMODIFY flag and
> a headache kretprobe failure case by sharing per-thread retstack
> with ftrace-callgraph.

While I don't know enough about the IPMODIFY flag, I wholeheartedly
support sharing the return stacks between kprobes and ftrace graph
caller.

-- 
Vojtech Pavlik
Director SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Re: Re: [PATCH 0/2] Kernel Live Patching

2014-11-12 Thread Vojtech Pavlik
ll be included in the patched set. (Or you can
just include it manually in the set.)

Then, you can be sure that the place which calls func() is not on the
stack when patching. This way, in your classification, PATCH_KERNEL can
be as good as PATCH_THREAD. In my classification, I'm saying that
LEAVE_PATCHED_SET is as good as LEAVE_KERNEL.

> >> (*) Instead of checking stacks, at first, wait for all threads leaving
> >> the kernel once, after that, wait for refcount becomes zero and switch
> >> all the patched functions.
> > 
> > This is a very beautiful idea.
> > 
> > It does away with both the stack parsing and the kernel stopping,
> > achieving kGraft's goals, while preserving kpatch's consistency model.
> > 
> > Sadly, it combines the disadvantages of both kpatch and kGraft: From
> > kpatch it takes the inability to patch functions where threads are
> > sleeping often and as such never leave them at once. From kGraft it
> > takes the need to annotate kernel threads and wake sleepers from
> > userspace.
> 
> But how frequently the former case happens? It seems very very rare.
> And if we aim to enable both kpatch mode and kGraft mode in the kernel,
> anyway we'll have something for the latter cases.

The kpatch problem case isn't that rare. It just happened with a CVE in
futexes recently. It will happen if you try to patch anything that is on
the stack when a TTY or TCP read is waiting for data as another example. 

The kGraft problem case will happen when you load a 3rd party module
with a non-annotated kernel thread. Or a different problem will happen
when you have an application sleeping that will exit when receiving any
signal.

Both the cases can be handled with tricks and workarounds. But it'd be
much nicer to have a patching engine that is reliable.

> > So while it is beautiful, it's less practical than either kpatch or
> > kGraft alone. 
> 
> Ah, sorry for confusing, I don't tend to integrate kpatch and kGraft.
> Actually, it is just about modifying kpatch, since it may shorten
> stack-checking time.
> This means that does not change the consistency model.
> We certainly need both of kGraft mode and kpatch mode.

What I'm proposing is a LEAVE_PATCHED_SET + SWITCH_THREAD mode. It's
less consistency, but it is enough. And it is more reliable (likely to
succeed in finite time) than either kpatch or kGraft.

It'd be mostly based on your refcounting code, including stack
checking (when a process sleeps, counter gets set based on number of
patched functions on the stack), possibly including setting the counter
to 0 on syscall entry/exit, but it'd make the switch per-thread like
kGraft does, not for the whole system, when the respective counters
reach zero.

This handles the frequent sleeper case, it doesn't need annotated kernel
thread main loops, it will not need the user to wake up every process in
the system unless it sleeps in a patched function.

And it can handle all the patches that kpatch and kGraft can (it needs
shadowing for some).

> > Yes, this is what I call 'extending the patched set'. You can do that
> > either by deliberately changing the prototype of the patched function
> > being called, which causes the calling function to be considered
> > different, or just add it to the set of functions considered manually.
> 
> I'd prefer latter one :) or just gives hints of watching targets.

Me too.


-- 
Vojtech Pavlik
Director SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Re: [PATCH 0/2] Kernel Live Patching

2014-11-11 Thread Vojtech Pavlik
On Tue, Nov 11, 2014 at 10:24:03AM +0900, Masami Hiramatsu wrote:

> Hmm, I doubt this can cover all. what I'm thinking is a combination of
> LEAVE_KERNEL and SWITCH_KERNEL by using my refcounting and kGraft's
> per-thread "new universe" flagging(*). It switches all threads but not
> change entire kernel as kexec does.

While your approach described below indeed forces all threads to leave
kernel once, to initialize the reference counts, this can be considered
a preparatory phase before the actual patching begins.

The actual consistency model remains unchanged from what kpatch offers
today, which guarantees that at the time of switch, no execution thread
is inside the set of patched functions and that the switch happens at
once for all threads. Hence I'd still classify the consistency model
offered as LEAVE_PATCHED_SET SWITCH_KERNEL.

> So, I think the patch may be classified by following four types
> 
> PATCH_FUNCTION - Patching per function. This ignores context, just
>change the function.
>User must ensure that the new function can co-exist
>with old functions on the same context (e.g. recursive
>call can cause inconsistency).
> 
> PATCH_THREAD - Patching per thread. If a thread leave the kernel,
>changes are applied for that thread.
>User must ensure that the new functions can co-exist
>with old functions per-thread. Inter-thread shared
>data acquisition(locks) should not be involved.
> 
> PATCH_KERNEL - Patching all threads. This wait for all threads leave the
>all target functions.
>User must ensure that the new functions can co-exist
>with old functions on a thread (note that if there is a
>loop, old one can be called first n times, and new one
>can be called afterwords).(**)

Yes, but only when the function calling it is not included in the
patched set, which is only a problem for semantic changes accompanied by
no change in the function prototyppe. This can be avoided by changing
the prototype deliberately.

> RENEW_KERNEL - Renew entire kernel and reset internally. No patch limitation,
>but involving kernel resetting. This may take a time.

And involves recording the userspace-kernel interface state exactly. The
interface is fairly large, so this can become hairy.

> (*) Instead of checking stacks, at first, wait for all threads leaving
> the kernel once, after that, wait for refcount becomes zero and switch
> all the patched functions.

This is a very beautiful idea.

It does away with both the stack parsing and the kernel stopping,
achieving kGraft's goals, while preserving kpatch's consistency model.

Sadly, it combines the disadvantages of both kpatch and kGraft: From
kpatch it takes the inability to patch functions where threads are
sleeping often and as such never leave them at once. From kGraft it
takes the need to annotate kernel threads and wake sleepers from
userspace.

So while it is beautiful, it's less practical than either kpatch or
kGraft alone. 

> (**) For the loops, if it is a simple loop or some simple lock calls,
> we can wait for all threads leave the caller function to avoid inconsistency
> by using refcounting.

Yes, this is what I call 'extending the patched set'. You can do that
either by deliberately changing the prototype of the patched function
being called, which causes the calling function to be considered
different, or just add it to the set of functions considered manually.

> > There would be the refcounting engine, counting entries/exits of the
> > area of interest (nothing for LEAVE_FUNCTION, patched functions for
> > LEAVE_PATCHED_SET - same as Masami's work now, or syscall entry/exit for
> > LEAVE_KERNEL), and it'd do the counting either per thread, flagging a
> > thread as 'new universe' when the count goes to zero, or flipping a
> > 'new universe' switch for the whole kernel when the count goes down to zero.
> 
> Ah, that's similar thing what I'd like to try next :)

Cool.

> Sorry, here is an off-topic talk.  I think a problem of kGraft's
> LEAVE_KERNEL work is that the sleeping processes. To ensure all the
> threads are changing to new universe, we need to wakeup all the
> threads, or we need stack-dumping to find someone is sleeping on the
> target functions. What would the kGraft do for this issue?

Yes, kGraft uses an userspace helper to find such sleeper and wake them
by sending a SIGUSR1 or SIGSTOP/SIGCONT. It's one of the disadvantages
of kGraft that sleeper threads have to be handled possibly case by case.

Also, kernel threads are problematic for kGraft (as you may have seen i

Re: [PATCH 0/2] Kernel Live Patching

2014-11-11 Thread Vojtech Pavlik
used for other functions in the patch.  In that case we'd be forfeiting
> consistency just for those skipped functions in the list.

So you would not be skipping the stack checking entirely, just allowing
certain functions from the patched set to be on the stack while you
switch to the new universe.

That indeed would make it a mixed SWITCH_FUNCTION/SWITCH_KERNEL
situation. 

The big caveat in such a situation is that you must not change the
calling convention or semantics of any function called directly from the
function you skipped the stack check for. As doing so would crash the
kernel when an old call calls a new function.

> > But there are a few (probably much less than 10%) cases like the locking
> > one I used above, where SWITCH_THREAD just isn't going to cut it and for
> > those I would need SWITCH_KERNEL or get very creative with refactoring
> > the patch to do things differently.
> 
> I'm not opposed to having both if necessary.  But I think the code would
> be _much_ simpler if we could agree on a single consistency model that
> can be used in all cases.  Plus there wouldn't be such a strong
> requirement to get incremental patching to work safely (which will add
> more complexity).
> 
> I actually agree with you that LEAVE_PATCHED_SET + SWITCH_THREAD is
> pretty nice.

Cool! Do you see it as the next step consistency model we would focus on
implementing in livepatch after the null model is complete and upstream?

(That wouldn't preclude extending it or implementing more models later.)

> So I'd like to hear more about cases where you think we _need_
> SWITCH_KERNEL.  As I mentioned above, I think many of those cases can be
> solved by using data structure versioning with shadow data fields.

I have tried, but so far I can't find a situation whether we would
absolutely need SWITCH_KERNEL, assuming we have LEAVE_PATCHED_SET +
SWITCH_THREAD + TRANSFORM_ON_ACCESS.

> > > Why would LEAVE_PATCHED_SET SWITCH_THREAD finish much quicker than
> > > LEAVE_PATCHED_SET SWITCH_KERNEL?  Wouldn't they be about the same?
> > 
> > Because with LEAVE_PATCHED_SET SWITCH_THREAD you're waiting for each
> > thread to leave the patched set and when they each have done that at
> > least once, you're done. Even if some are already back within the set.
> 
> Ok, so you're talking about the case when you're trying to patch a
> function which is always active.  Agreed :-)

Yes.

> > With LEAVE_PATCHED_SET SWITCH_KERNEL, you have to find the perfect
> > moment when all of the threads are outside of the patched set at the
> > same time. Depending on how often the functions are used and how large
> > the set is, reaching that moment will get harder.
> 
> Yeah, I think this is the biggest drawback of SWITCH_KERNEL.  More
> likely to fail (or never succeed).

If some threads are sleeping in a loop inside the patched set:

With SWITCH_THREAD you can wake (eg. by a signal) the threads from
userspace as a last resort and that will complete your patching.

With SWITCH_KERNEL you'd have somehow to wake them at the same time
hoping they also leave the patched set together. That's unlikely to
happen when many threads are involved.

In addition the "in progress" behavior is nicer for SWITCH_THREAD, as
any new thread will already be running patched code. With SWITCH_KERNEL,
you're waiting with applying the fix until the perfect moment.

-- 
Vojtech Pavlik
Director SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] Kernel Live Patching

2014-11-08 Thread Vojtech Pavlik
On Fri, Nov 07, 2014 at 09:45:53PM -0600, Josh Poimboeuf wrote:

> On Fri, Nov 07, 2014 at 10:27:35PM +0100, Vojtech Pavlik wrote:
> > On Fri, Nov 07, 2014 at 09:45:00AM -0600, Josh Poimboeuf wrote:
> > 
> > > > LEAVE_FUNCTION
> > > > LEAVE_PATCHED_SET
> > > > LEAVE_KERNEL
> > > > 
> > > > SWITCH_FUNCTION
> > > > SWITCH_THREAD
> > > > SWITCH_KERNEL
> > > > 
> > 
> > There would be the refcounting engine, counting entries/exits of the
> > area of interest (nothing for LEAVE_FUNCTION, patched functions for
> > LEAVE_PATCHED_SET - same as Masami's work now, or syscall entry/exit
> > for LEAVE_KERNEL), and it'd do the counting either per thread,
> > flagging a thread as 'new universe' when the count goes to zero, or
> > flipping a 'new universe' switch for the whole kernel when the count
> > goes down to zero.
> > 
> > A patch would have flags which specify a combination of the above
> > properties that are needed for successful patching of that specific
> > patch.
> 
> Would it really be necessary to support all possible permutations?  I
> think that would add a lot of complexity to the code.  Especially if we
> have to support LEAVE_KERNEL, which adds a lot of interdependencies with
> the rest of the kernel (kthreads, syscall, irq, etc).
> 
> It seems to me that the two most interesting options are:
> - LEAVE_PATCHED_SET + SWITCH_THREAD (Masami-kGraft)
> - LEAVE_PATCHED_SET + SWITCH_KERNEL (kpatch and/or Masami-kpatch)

I agree here.

In fact LEAVE_KERNEL can be approximated by extending the patched
set as required to include functions which are not changed per se, but
are "contaminated" by propagation of semantic changes in the calling
convention, and/or data format.

This applies to cases like this (needs LEAVE_KERNEL or extending patched
set beyond changed functions):

---

int bar() {
[...]
-   return x;
+   return x + 1;
}

foo() {
int ret = bar();
do {
wait_interruptible();
} while (ret == bar());
}

---

Or like this (needs SWITCH_KERNEL so won't work with kGraft, but also
extending patched set, will not work with kpatch as it stands today):

---

void lock_a()
{
-   spin_lock(&x);
+   spin_lock(&y);
}
void lock_b()
{
-   spin_lock(&y);
+   spin_lock(&x);
}
void unlock_a()
}
-   spin_unlock(&x);
+   spin_unlock(&y);
}
void unlock_b()
{
-   spin_unlock(&y);
+   spin_unlock(&x);
}

void foo()
{
lock_a();
lock_b();
[...]
unlock_b();
unlock_a();
}
---


So once we can extend the patched set as needed (manually, after
review), we can handle all the cases that LEAVE_KERNEL offers, making
LEAVE_KERNEL unnecessary.

It'd be nice if we wouldn't require actually patching those functions,
only include them in the set we have to leave to proceed.

The remaining question is the performance impact of using a large set
with refcounting. LEAVE_KERNEL's impact as implemented in kGraft is
beyond being measurable, it's about 16 added instructions for each
thread that get executed.

> I do see some appeal for the choose-your-own-consistency-model idea.
> But it wouldn't be practical for cumulative patch upgrades, which I
> think we both agreed at LPC seems to be safer than incremental
> patching.  If you only ever have one patch module loaded at any given
> time, you only get one consistency model anyway.

> In order for multiple consistency models to be practical, I think we'd
> need to figure out how to make incremental patching safe.

I believe we have to get incremental patching working anyway as it is a
valid usecase for many users, just not for major distributions.

And we may want to take a look at how to mark parts of a cumulative
patch with different consistency models, when we combine eg. the recent
futex CVE patch (not working with SWITCH_KERNEL) and anything requiring
SWITCH kernel.

> > For data layout an semantic changes, there are two approaches:
> > 
> > 1) TRANSFORM_WORLD
 
> I'm kind of surprised to hear that Ksplice does this.  I had

Re: [PATCH 0/2] Kernel Live Patching

2014-11-07 Thread Vojtech Pavlik
It works with any of the approaches (except null model) and while it
needs two steps (patch, then enable conversion), it doesn't require two
rounds of patching. Also, you don't have to consider oldfunc/newdata as
that will never happen. oldfunc/olddata obviously works, so you only
have to look at newfunc/olddata and newfunc/newdata as the
transformation goes on.

I don't see either of these as really that much simpler. But I do see value
in offering both.

> On the other hand, SWITCH_KERNEL doesn't have those problems.  It does
> have the problem you mentioned, roughly 2% of the time, where it can't
> patch functions which are always in use.  But in that case we can skip
> the backtrace check ~90% of the time.  

An interesting bit is that when you skip the backtrace check you're
actually reverting to LEAVE_FUNCION SWITCH_FUNCTION, forfeiting all
consistency and not LEAVE_FUNCTION SWITCH_KERNEL as one would expect.

Hence for those 2% of cases (going with your number, because it's a
guess anyway) LEAVE_PATCHED_SET SWITCH_THREAD would in fact be a safer
option.

> So it's really maybe something
> like 0.2% of patches which can't be patched with SWITCH_KERNEL.  But
> even then I think we could overcome that by getting creative, e.g. using
> the multiple patch approach.
> 
> So my perspective is that SWITCH_THREAD causes big headaches 10% of the
> time, whereas SWITCH_KERNEL causes small headaches 1.8% of the time, and
> big headaches 0.2% of the time :-)

My preferred way would be to go with SWITCH_THREAD for the simpler stuff
and do a SWITCH_KERNEL for the 10% of complex patches. This because
(LEAVE_PATCHED_SET) SWITCH_THREAD finishes much quicker. But I'm biased
there. ;)

It seems more and more to me that we will actually want the more
powerful engine coping with the various options.

-- 
Vojtech Pavlik
Director SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] Kernel Live Patching

2014-11-07 Thread Vojtech Pavlik
On Fri, Nov 07, 2014 at 07:11:53AM -0600, Josh Poimboeuf wrote:

> 2. Add consistency model(s) (e.g. kpatch stop_machine, kGraft per-task
>consistency, Masami's per task ref counting)

I have given some thought to the consistency models and how they differ
and how they potentially could be unified.

I have to thank Masami, because his rewrite of the kpatch model based on
refcounting is what brought it closer to the kGraft model and thus
allowed me to find the parallels.

Let me start by defining the properties of the patching consistency
model. First, what entity the execution must be outside of to be able to
make the switch, ordered from weakest to strongest:

LEAVE_FUNCTION
- execution has to leave a patched function to switch
  to the new implementation

LEAVE_PATCHED_SET
- execution has to leave the set of patched functions
  to switch to the new implementation

LEAVE_KERNEL
- execution has to leave the entire kernel to switch
  to the new implementation

Then, what entity the switch happens for. Again, from weakest to strongest:

SWITCH_FUNCTION
- the switch to the new implementation happens on a per-function
   basis

SWITCH_THREAD
- the switch to the new implementation is per-thread.

SWITCH_KERNEL
- the switch to the new implementation happens at once for
  the whole kernel

Now with those definitions:

livepatch (null model), as is, is LEAVE_FUNCTION and SWITCH_FUNCTION

kpatch, masami-refcounting and Ksplice are LEAVE_PATCHED_SET and 
SWITCH_KERNEL

kGraft is LEAVE_KERNEL and SWITCH_THREAD

CRIU/kexec is LEAVE_KERNEL and SWITCH_KERNEL

By blending kGraft and masami-refcounting, we could create a consistency
engine capable of almost any combination of these properties and thus
all the consistency models.

However, I'm currently thinking that the most interesting model is
LEAVE_PATCHED_SET and SWITCH_THREAD, as it is reliable, fast converging,
doesn't require annotating kernel threads nor fails with frequent
sleepers like futexes. 

It provides the least consistency that is required to be able to change
the calling convention of functions and still allows for semantic
dependencies.

What do you think?



PS.: Livepatch's null model isn't in fact the weakest possible, as it still
guarantees executing complete intact functions, this thanks to ftrace.
That is much more than what would direct overwriting of the function in
memory achieve.

This is also the reason why Ksplice is locked to a very specific
consistency model. Ksplice can patch only when the kernel is stopped and
the model is built from that.

masami-refcounting, kpatch, kGraft, livepatch have a lot more freedom,
thanks to ftrace, into what the consistency model should look like.

PPS.: I haven't included any handling of changed data structures in
this, that's another set of properties.

-- 
Vojtech Pavlik
Director SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] Kernel Live Patching

2014-11-07 Thread Vojtech Pavlik
On Fri, Nov 07, 2014 at 06:31:54AM -0600, Josh Poimboeuf wrote:
> On Thu, Nov 06, 2014 at 09:24:23PM +0100, Vojtech Pavlik wrote:
> > On Thu, Nov 06, 2014 at 10:58:57AM -0800, Christoph Hellwig wrote:
> > 
> > > On Thu, Nov 06, 2014 at 07:51:57PM +0100, Vojtech Pavlik wrote:
> > > > I don't think this specific example was generated. 
> > > > 
> > > > I also don't think including the whole kpatch automation into the kernel
> > > > tree is a viable development model for it. (Same would apply for kGraft
> > > > automation.)
> > > 
> > > Why?  We (IMHO incorrectly) used the argument of tight coupling to put
> > > perf into the kernel tree.  Generating kernel live patches is way more
> > > integrated that it absolutely has to go into the tree to be able to do
> > > proper development on it in an integrated fashion.
> > 
> > One reason is that there are currently at least two generators using
> > very different methods of generation (in addition to the option of doing
> > the patch module by hand), and neither of them are currently in a state
> > where they would be ready for inclusion into the kernel (although the
> > kpatch one is clearly closer to that).
> 
> What generator does kGraft have?  Is that the one that generates the
> source patch, or is there one that generates a binary patch module?

The generator for kGraft:

* extracts a list of changed functions from a patch (rather naïvely so 
far)
* uses DWARF debuginfo of the old kernel to handle things like inlining
  and create a complete list of functions that need to be replaced
* compiles the kernel with -fdata-sections -ffunction-sections
* uses a modified objcopy to extract functions from the kernel
  into a single .o file
* creates a stub .c file that references those functions
* compiles the .c and links with the .o to build a .ko

The main difference is in that the kGraft generator doesn't try to
compare the old and new binary objects, but rather works with function
lists and the DWARF info of the old code and extracts new functions from
the new binary.

However, as I said before, we have found enough trouble around eg.
IPA-SRA and other optimizations that make any automated approach fragile
and in our view more effort than benefit. Hence, we're intend to use the
manual way of creating live patches until proven that we were wrong in
this assessment. :)

-- 
Vojtech Pavlik
Director SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] Kernel Live Patching

2014-11-06 Thread Vojtech Pavlik
On Thu, Nov 06, 2014 at 10:58:57AM -0800, Christoph Hellwig wrote:

> On Thu, Nov 06, 2014 at 07:51:57PM +0100, Vojtech Pavlik wrote:
> > I don't think this specific example was generated. 
> > 
> > I also don't think including the whole kpatch automation into the kernel
> > tree is a viable development model for it. (Same would apply for kGraft
> > automation.)
> 
> Why?  We (IMHO incorrectly) used the argument of tight coupling to put
> perf into the kernel tree.  Generating kernel live patches is way more
> integrated that it absolutely has to go into the tree to be able to do
> proper development on it in an integrated fashion.

One reason is that there are currently at least two generators using
very different methods of generation (in addition to the option of doing
the patch module by hand), and neither of them are currently in a state
where they would be ready for inclusion into the kernel (although the
kpatch one is clearly closer to that).

A generator is not required for using the infrastructure and is merely a
means of preparing the live patch with less effort.

I'm not opposed at all to adding the generator(s) eventually.

However, given that their use is optional, I would prefer not to have to
wait on finishing and cleaning up the generator(s) to include of the
in-kernel live patching infrastructure.

-- 
Vojtech Pavlik
Director SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] Kernel Live Patching

2014-11-06 Thread Vojtech Pavlik
On Thu, Nov 06, 2014 at 10:44:46AM -0800, Christoph Hellwig wrote:

> On Thu, Nov 06, 2014 at 08:39:06AM -0600, Seth Jennings wrote:
> > An example patch module can be found here:
> > https://github.com/spartacus06/livepatch/blob/master/patch/patch.c
> 
> Please include the generator for this patch in the kernel tree.
> Providing interfaces for out of tree modules (or generators) is not
> how the kernel internal APIs work.

I don't think this specific example was generated. 

I also don't think including the whole kpatch automation into the kernel
tree is a viable development model for it. (Same would apply for kGraft
automation.)

So I believe including one or more human produced examples makes most sense.

-- 
Vojtech Pavlik
Director SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] kernel: add support for live patching

2014-11-06 Thread Vojtech Pavlik
On Thu, Nov 06, 2014 at 10:20:49AM -0600, Seth Jennings wrote:

> Yes, I should explain it.
> 
> This is something that is currently only used in the kpatch approach.
> It allows the patching core to do dynamic relocations on the new
> function code, similar to what the kernel module linker does, but this
> works for non-exported symbols as well.
> 
> This is so the patch module doesn't have to do a kallsyms lookup on
> every non-exported symbol that the new functions use.
> 
> The fields of the dynrela structure are those of a normal ELF rela
> entry, except for the "external" field, which conveys information about
> where the core module should go looking for the symbol referenced in the
> dynrela entry.
> 
> Josh was under the impression that Vojtech was ok with putting the
> dynrela stuff in the core.  Is that not correct (misunderstanding)?

Yes, that is correct, as obviously the kpatch way of generating patches
by extracting code from a compiled kernel would not be viable without
it.

For our own kGraft usage we're choosing to compile patches from C
source, and there we can simply replace the function calls by calls via
pointer looked up through kallsyms.

However, kGraft also has tools to create patches in an automated way,
where the individual functions are extracted from the compiled patched
kernel using a modified objopy and this is hitting exactly the same
issue of having to do relocation of unexported symbols if any are
referenced.

So no objection to the idea. We'll have to look more into the code to
comment on the implementation of the dynrela stuff.

-- 
Vojtech Pavlik
Director SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3 0/2] s390 vs. kprobes on ftrace

2014-10-22 Thread Vojtech Pavlik
On Wed, Oct 22, 2014 at 10:26:25AM +0200, Heiko Carstens wrote:

> > I can confirm that kGraft works well on top of current mainline with
> > this patch added.
> > 
> > Another reason for a performance impact when kGraft is enabled is that
> > kGraft still adds two instructions to the syscall path on s390x, as
> > there is no space left for a kgraft TIF in the first eight bits of
> > thread info flags. Renumbering the thread info flags such that _TIF_WORK
> > occupies the first eight bits and TIF_TRACE the next eight would fix
> > that problem: Do you believe it is feasible?
> 
> Hi Vojtech,
> 
> I think you're talking about the SLES12 kernel? 

I'm working with upstream git right now, to make sure kGraft keeps up
with it.

> There you can simply move the TIF_SYSCALL bit to the same byte where
> your TIF_KGR bit resides.

On 3.12 (SLE12) this would allow me to clear the TIF_KGR in a single
instruction when exiting a syscall simply by extending the mask. 

I need to clear it before entering a syscall, too, and TIF_SYSCALL is
set there, not cleared.

What could work is moving TIF_SYSCALL and TIF_KGR to the same byte as
TIF_TRACE. Then I could use the TIF_TRACE check to also clear the
TIF_KGR bit in sysc_tracesys without adding an instruction to the hot
path.

> Upstream is a bit different since the TIF_SYSCALL bit is already gone (got
> replaced with an s390 specific "PIF" bit). However the free TIF bit got
> already eaten up by uprobes..

Yes, and all the 8 bits are eaten now in upstream. That's what got me
thinking about separating TIF_WORK and TIF_TRACE to separate bytes, with
no other flags in them. Then the syscall code would then just check the
whole byte for zero instead of doing test-under-mask.

> However we can think of a better solution for upstream if the combined
> solution of kGraft/kpatch is ready to be merged.

We may, indeed, end up doing things very differently there.

At least initially the plan is to go entirely without enforcing any kind
of consistency when patching, so TIF_KGR and the lazy migration will not
exist, and neither will we be stopping the kernel or checking stacks.

This will make applying patches bigger than a single functions tricky,
but it's a good initial goal.

-- 
Vojtech Pavlik
Director SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3 0/2] s390 vs. kprobes on ftrace

2014-10-21 Thread Vojtech Pavlik
Hello Heiko,

I can confirm that kGraft works well on top of current mainline with
this patch added.

Another reason for a performance impact when kGraft is enabled is that
kGraft still adds two instructions to the syscall path on s390x, as
there is no space left for a kgraft TIF in the first eight bits of
thread info flags. Renumbering the thread info flags such that _TIF_WORK
occupies the first eight bits and TIF_TRACE the next eight would fix
that problem: Do you believe it is feasible?

Vojtech

On Tue, Oct 21, 2014 at 10:30:27AM +0200, Heiko Carstens wrote:
> v3:
> Changed patch 1/2 to incorporate feedback from Steven Rostedt and
> Masami Hiramatsu: rename helper function check_ftrace_location()
> to arch_check_ftrace_location() and convert it to a weak function,
> so architectures can override it without the need for new config
> option.
> 
> v2:
> Changed patch 1/2 to incorporate feedback from Masami Hiramatsu, and
> introduce a new helper function check_ftrace_location().
> The requested ftracetest has been sent as an own patch set, since it
> has no dependency to these patches.
> 
> v1:
> We would like to implement an architecture specific variant of "kprobes
> on ftrace" without using the current HAVE_KPROBES_ON_FTRACE infrastructure
> which is currently only used by x86.
> 
> The rationale for these two patches is:
> - we want to patch the first instruction of the mcount code block to
>   reduce the overhead of the function tracer
> - we'd like to keep the ftrace_caller function as simple as possible and
>   not require it to generate a 100% valid pt_regs structure as required
>   by the combination of DYNAMIC_FTRACE_WITH_REGS and HAVE_KPROBES_ON_FTRACE.
>   This allows us to not generate the psw mask field in the pt_regs
>   structure on each function tracer enabled function, which otherwise would
>   be very expensive. Besides that program check generated pt_regs contents
>   are "more" accurate than program generated ones and don't require any
>   maintenance.
>   And also we can keep the ftrace and kprobes backends quite separated.
> 
> In order to make this work a small common code change is necessary which
> removes a check if kprobe is being placed on an ftrace location (see
> first patch).
> 
> If possible, I'd like to have an ACK from at least one of the kprobes
> maintainers for the first patch and bring it upstream via the s390 tree.
> 
> Thanks,
> Heiko
> 
> Heiko Carstens (2):
>   kprobes: introduce weak arch_check_ftrace_location() helper function
>   s390/ftrace,kprobes: allow to patch first instruction
> 
>  arch/s390/include/asm/ftrace.h  |  52 ++--
>  arch/s390/include/asm/kprobes.h |   1 +
>  arch/s390/include/asm/lowcore.h |   4 +-
>  arch/s390/include/asm/pgtable.h |  12 
>  arch/s390/kernel/asm-offsets.c  |   1 -
>  arch/s390/kernel/early.c|   4 --
>  arch/s390/kernel/ftrace.c   | 132 
> +---
>  arch/s390/kernel/kprobes.c  |  92 
>  arch/s390/kernel/mcount.S   |   1 +
>  arch/s390/kernel/setup.c|   2 -
>  arch/s390/kernel/smp.c  |   1 -
>  include/linux/kprobes.h |   1 +
>  kernel/kprobes.c|  18 +++---
>  scripts/recordmcount.c  |   2 +-
>  scripts/recordmcount.pl |   2 +-
>  15 files changed, 226 insertions(+), 99 deletions(-)
> 
> -- 
> 1.8.5.5
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] SOUND: kill gameport bits

2014-08-28 Thread Vojtech Pavlik
On Thu, Aug 28, 2014 at 10:03:55PM +0200, Clemens Ladisch wrote:
> Takashi Iwai wrote:
> > did anyone test the patch at all...?
> 
> Appears to work.  The ymfpci gameport seems to be somewhat unreliable:
> 
>  analog.c: 100 out of 17347 reads (0%) on pci:06:06.1/gameport0 failed
>  analog.c: 122 out of  reads (10%) on pci:06:07.0/gameport0 failed

The analog.c gameport read routine is unreliable by design. 

The 558 chip is not an ADC, it's an one-shot timer from 1971. The analog
position of the joystick is measured by timing bit changes on the
gameport.

analog.c does that without disabling interrupts, as the read can take
several milliseconds. analog.c instead detects when an interrupt influenced
the measurement too much and retries.

The retries are counted and reported.

10% is a largeish number, but still something the analog.c driver can
cope with and give reliable results. 

> There is still some dependence of the CPU speed on the loop counts
> calculated by gameport_time(), but it's not very high; the following are
> for 3200 and 800 MHz, 5 times each:
> 
>  gameport gameport7: EMU10K1 is pci:06:06.1/gameport0, io 0xe480, speed 
> 651kHz
>  gameport gameport8: EMU10K1 is pci:06:06.1/gameport0, io 0xe480, speed 
> 651kHz
>  gameport gameport9: EMU10K1 is pci:06:06.1/gameport0, io 0xe480, speed 
> 651kHz
>  gameport gameport10: EMU10K1 is pci:06:06.1/gameport0, io 0xe480, speed 
> 651kHz
>  gameport gameport11: EMU10K1 is pci:06:06.1/gameport0, io 0xe480, speed 
> 651kHz
>  gameport gameport12: EMU10K1 is pci:06:06.1/gameport0, io 0xe480, speed 
> 612kHz
>  gameport gameport13: EMU10K1 is pci:06:06.1/gameport0, io 0xe480, speed 
> 612kHz
>  gameport gameport14: EMU10K1 is pci:06:06.1/gameport0, io 0xe480, speed 
> 612kHz
>  gameport gameport15: EMU10K1 is pci:06:06.1/gameport0, io 0xe480, speed 
> 611kHz
>  gameport gameport16: EMU10K1 is pci:06:06.1/gameport0, io 0xe480, speed 
> 612kHz

The gameport speed is speed of the i/o to the port. It may change as
frequencies in the system change. It's used for timeouts on digital
gameport protocols only and thus a variation of less than 20% shouldn't cause
trouble.

The analog.c driver uses its own timing calibration to make the
analog_cooked_read() reliable even when the speed of the i/o operations
changes.

What is important is that the GET_TIME() macro is fast (0.1 usec or
less), precise (sub-microsecond resolution) and reliable. Digital
protocols also rely on udelay() and mdelay() waiting precisely for the
amount of time specified.

-- 
Vojtech Pavlik
Director SuSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] SOUND: kill gameport bits

2014-08-19 Thread Vojtech Pavlik
On Tue, Aug 19, 2014 at 10:18:15PM -0700, Dmitry Torokhov wrote:

> Are you actively testing gameport interfaces with real joysticks/gamepads on
> these cards? And what software is still in use that runs on these old boxes
> (with mainline kernel)?

I still do have a huge box of gameport hardware in my office, it's just
that I haven't opened it for a number of years. However, if this thread
spurred enough interest in gameport devices, I would be willing to open
it and do the needed fixes.

If not, I think dropping makes sense. I still would shed a tear for all
those weird devices in the box, and possibly design an ATMega-based
USB<->Gameport adapter that actually works and supports even the digital
joysticks.

> > Also, I'm left wondering why e.g. my Athlon XP system (a very popular
> > choice for longer times) would be affected by Cpufreq...
> > And there are no details on how exactly cpufreq is a problem or how this
> > timing issue could be fixed...
> 
> If you take a look at gameport_measure_speed() in gameport.c you will see that
> it counts cycles for timing, which obviously does not work that well when CPU
> frequency changes.
> 
> The bugs have been opened in bugzilla/reported on lists ages ago but nobody
> stepped up to fix that.

It wouldn't be hard to fix: That code was developed when the timing
infrastructure in the kernel was non-existent, making use of it today
would make things a lot easier.

> > The obvious workaround for such an ensuing dearth of hardware support
> > could be USB 15-pin gameport adapters - but are they even supported on
> > Linux? Haven't seen info on this...
> > And even if supported, these adapters (at least the non-perfect ones, as
> > can be seen from reviews on a well-known online shop site) are said to
> > be hit-or-miss.
> > 
> > http://www.flightsim.com/vbfs/showthread.php?238938-joystick-GamePort-to-USB-adapter-question
> > http://reviews.thesource.ca/9026/2600164/nexxtech-usb-gameport-adapter-reviews/reviews.htm
> > 
> 
> They have better chance of being supported ;) I had a couple a few years back
> and they did work for me.

They do work for analog joysticks if you don't want any extended
functionality. I have a couple in said box.

-- 
Vojtech Pavlik
Director SuSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] s390: add support for DYNAMIC_FTRACE_WITH_REGS

2014-07-03 Thread Vojtech Pavlik
Add support for DYNAMIC_FTRACE_WITH_REGS to 64-bit and 31-bit s390
architectures. This is required for kGraft and kpatch to work on s390.

It's done by adding a _regs variant of ftrace_caller that preserves
registers and puts them on stack in a struct pt_regs layout and
allows modification of return address by changing the PSW (instruction
pointer) member od struct pt_regs. 

Signed-off-by: Vojtech Pavlik 
Reviewed-by: Jiri Kosina 
Cc: Steven Rostedt 

---

 arch/s390/Kconfig   |1 
 arch/s390/include/asm/ftrace.h  |4 ++
 arch/s390/include/asm/lowcore.h |   12 ---
 arch/s390/kernel/asm-offsets.c  |1 
 arch/s390/kernel/early.c|3 +
 arch/s390/kernel/ftrace.c   |   68 +++-
 arch/s390/kernel/mcount.S   |   43 +
 arch/s390/kernel/mcount64.S |   38 ++
 arch/s390/kernel/setup.c|1 
 arch/s390/kernel/smp.c  |1 
 include/linux/ftrace.h  |1 
 11 files changed, 160 insertions(+), 13 deletions(-)

diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index bb63499..1de14d3 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -113,6 +113,7 @@ config S390
select HAVE_C_RECORDMCOUNT
select HAVE_DEBUG_KMEMLEAK
select HAVE_DYNAMIC_FTRACE
+   select HAVE_DYNAMIC_FTRACE_WITH_REGS
select HAVE_FTRACE_MCOUNT_RECORD
select HAVE_FUNCTION_GRAPH_TRACER
select HAVE_FUNCTION_TRACER
diff --git a/arch/s390/include/asm/ftrace.h b/arch/s390/include/asm/ftrace.h
index bf246da..df87fd2 100644
--- a/arch/s390/include/asm/ftrace.h
+++ b/arch/s390/include/asm/ftrace.h
@@ -17,6 +17,10 @@ static inline unsigned long ftrace_call_adjust(unsigned long 
addr)
 
 #endif /* __ASSEMBLY__ */
 
+#ifdef CONFIG_DYNAMIC_FTRACE
+#define ARCH_SUPPORTS_FTRACE_OPS 1
+#endif
+
 #ifdef CONFIG_64BIT
 #define MCOUNT_INSN_SIZE  12
 #else
diff --git a/arch/s390/include/asm/lowcore.h b/arch/s390/include/asm/lowcore.h
index 4349197..31b064a 100644
--- a/arch/s390/include/asm/lowcore.h
+++ b/arch/s390/include/asm/lowcore.h
@@ -142,9 +142,10 @@ struct _lowcore {
__u32   percpu_offset;  /* 0x02f0 */
__u32   machine_flags;  /* 0x02f4 */
__u32   ftrace_func;/* 0x02f8 */
-   __u32   spinlock_lockval;   /* 0x02fc */
+   __u32   ftrace_regs_func;   /* 0x02fc */
+   __u32   spinlock_lockval;   /* 0x0300 */
 
-   __u8pad_0x0300[0x0e00-0x0300];  /* 0x0300 */
+   __u8pad_0x0304[0x0e00-0x0304];  /* 0x0304 */
 
/*
 * 0xe00 contains the address of the IPL Parameter Information
@@ -287,9 +288,10 @@ struct _lowcore {
__u64   vdso_per_cpu_data;  /* 0x0380 */
__u64   machine_flags;  /* 0x0388 */
__u64   ftrace_func;/* 0x0390 */
-   __u64   gmap;   /* 0x0398 */
-   __u32   spinlock_lockval;   /* 0x03a0 */
-   __u8pad_0x03a0[0x0400-0x03a4];  /* 0x03a4 */
+   __u64   ftrace_regs_func;   /* 0x0398 */
+   __u64   gmap;   /* 0x03a0 */
+   __u32   spinlock_lockval;   /* 0x03a8 */
+   __u8pad_0x03ac[0x0400-0x03ac];  /* 0x03ac */
 
/* Per cpu primary space access list */
__u32   paste[16];  /* 0x0400 */
diff --git a/arch/s390/kernel/asm-offsets.c b/arch/s390/kernel/asm-offsets.c
index afe1715..12ef6ce 100644
--- a/arch/s390/kernel/asm-offsets.c
+++ b/arch/s390/kernel/asm-offsets.c
@@ -150,6 +150,7 @@ int main(void)
DEFINE(__LC_MCCK_CLOCK, offsetof(struct _lowcore, mcck_clock));
DEFINE(__LC_MACHINE_FLAGS, offsetof(struct _lowcore, machine_flags));
DEFINE(__LC_FTRACE_FUNC, offsetof(struct _lowcore, ftrace_func));
+   DEFINE(__LC_FTRACE_REGS_FUNC, offsetof(struct _lowcore, 
ftrace_regs_func));
DEFINE(__LC_DUMP_REIPL, offsetof(struct _lowcore, ipib));
BLANK();
DEFINE(__LC_CPU_TIMER_SAVE_AREA, offsetof(struct _lowcore, 
cpu_timer_save_area));
diff --git a/arch/s390/kernel/early.c b/arch/s390/kernel/early.c
index 0dff972..2ab52d2 100644
--- a/arch/s390/kernel/early.c
+++ b/arch/s390/kernel/early.c
@@ -493,5 +493,8 @@ void __init startup_init(void)
 #ifdef CONFIG_DYNAMIC_FTRACE
S390_lowcore.ftrace_func = (unsigned long)ftrace_caller;
 #endif
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
+   S390_lowcore.ftrace_regs_func = (unsigned long)ftrace_regs_caller;
+#endif
lockdep_on();
 }
diff --git a/arch/s390/kernel/ftrace.c b/arch/s390/kernel/ftrace.c
index 54d6493..7d1ec79 100644
--- a/arch/s390/kernel/ftrace.c
+++ b/arch/s390/kernel/ftrace.c
@@ -21,6 +21,7 @@
 
 void ftrace_disable_code(void);
 void ftrace_enable_insn(void);
+void ftrace_enable_regs_insn(void);
 
 #ifdef CONFIG_64BIT
 /*
@@ -56,7 +57,13 @@ asm(

Re: [PATCH 00/21] kGraft

2014-06-25 Thread Vojtech Pavlik
On Wed, Jun 25, 2014 at 05:54:50PM +0200, Jiri Kosina wrote:
> > -   wait_event_freezable(khubd_wait,
> > +   wait_event_freezable(khubd_wait,
> >   ({ kgr_task_safe(current);
> > 
> > The changes are somewhat ugly with all the kgraft crap leaking into plces
> > like jbd and freezer and usb. That says to me its not well isolated and
> > there must be a better way of hiding that kgr stuff so we don't have to
> > kepe putting more code into all the kthreads, and inevitably missing them
> > or have people getting it wrong.
> 
> Tejun commented on this very issue during the first RFC submission a 
> couple weeks ago. Part of his proposal was actually to take this as a good 
> pretext to review the existing kernel threads, as the idea is that a big 
> portion of those could easily be converted to workqueues.

In the past few years there was a significant proliferation of kernel
threads just for cases where something needs to be done from a process
context now and then.

This is underlined by the fact that on an average installation there are
more kernel threads than real userspace processes.

Converting these bits to workqueues would then also take care of their
interaction with freezer, kthread_stop. The main loop code wouldn't have
to be replicated over and over with slight variations. kgr_taks_safe
would then plug into that rather elegantly.

In the absence of this rework, kGraft hijacks the freezer and
kthread_stop to pinpoint the 'end' of the main loop of most kernel
threads and annotates the rest that doesn't handle freezing or stopping
with kgr_task_safe directly.


> It of course has its own implications, such as not being able to 
> prioritize that easily, but there is probably a lot of low-hanging fruits 
> where driver authors and their grandmas have been creating kthreads where 
> workqueue would suffice.

Indeed, on the other hand, there are enough workqueues today and on
realtime systems the need for prioritizing them exists already.

> But it's a very long term goal, and it's probably impractical to make 
> kGraft dependent on it.

Should the kernel thread changes turn to be a big issue, we could do the
initial submission without them and depend on the kthread->workqueue
rework. Once we add support for multiple patches in progress, the fact
that we don't support kernel threads would not be as painful.
 
> > You add instructions to various hotpaths (despite the CONFIG
> > comment).
> 
> Kernel entry is hidden behind _TIF_WORK_SYSCALL_ENTRY and
> _TIF_ALLWORK_MASK, so it doesn't add any overhead if kGraft is not
> enabled. What other hot paths do you refer to?

As Jiří says, I don't see any hot path where kGraft would add
instructions: Kernel exit and entry are handled outside the hot path
in the _TIF_WORK_SYSCALL_ENTRY and _TIF_ALLWORK_MASK handlers. [patch 15/21]

It adds instructions into the main loops of kernel threads, but those
can hardly be called hot paths as they mostly sleep for long times.

-- 
Vojtech Pavlik
Director SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 64bit x86: NMI nesting still buggy?

2014-05-21 Thread Vojtech Pavlik
On Wed, May 21, 2014 at 04:20:55PM +0200, Borislav Petkov wrote:
> On Wed, May 21, 2014 at 03:42:24PM +0200, Jiri Kosina wrote:
> > Alright, Andy's iret optimization efforts do immediately bring a
> > followup question -- why is this not a problem with iret-based return
> > from #MC possibly interrupting NMI?
> 
> Yeah, and frankly, I don't see this nesting fun at all protected against
> a #MC interrupting it at any point actually. Because once the #MC
> handler returns, it goes into paranoid_exit and that place doesn't
> account for NMIs at all, AFAICS.
> 
> Which would mean:
> 
> * NMI goes off
> * MCE happens, we switch to machine_check which is paranoidzeroentry
> * #MC handler is done -> paranoid_exit -> IRET
> -> boom! Or if not "boom", at least, the NMI gets forgotten.
> 
> Am I missing something?

I think to get a full BOOM you need a bit more complex process, namely:

* NMI triggered
* NMI handler starts
* MCE happens
* Second NMI triggered and queued
* handler done, IRET
* Second NMI handler starts and overwrites NMI return address on stack
* Second NMI handler ends
* First NMI handler ends and goes into an infinite IRET loop, always
  returning to the beginning of itself

But you do have all the ingredients.

And I don't see any other way out than not calling IRET for MCEs.

-- 
Vojtech Pavlik
Director SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching

2014-05-17 Thread Vojtech Pavlik
On Sat, May 17, 2014 at 03:09:57AM +0900, Masami Hiramatsu wrote:
> (2014/05/17 1:27), Jiri Kosina wrote:
> > On Tue, 6 May 2014, Steven Rostedt wrote:
> > 
> >>> However, I also think if users can accept such freezing wait-time,
> >>> it means they can also accept kexec based "checkpoint-restart" patching.
> >>> So, I think the final goal of the kpatch will be live patching without
> >>> stopping the machine. I'm discussing the issue on github #138, but that is
> >>> off-topic. :)
> >>
> >> I agree with Ingo too. Being conservative at first is the right
> >> approach here. We should start out with a stop_machine making sure that
> >> everything is sane before we continue. Sure, that's not much different
> >> than a kexec, but lets take things one step at a time.
> >>
> >> ftrace did the stop_machine (and still does for some archs), and slowly
> >> moved to a more efficient method. kpatch/kgraft should follow suit.
> > 
> > I don't really agree here.
> > 
> > I actually believe that "lazy" switching kgraft is doing provides a little 
> > bit more in the sense of consistency than stop_machine()-based aproach.
> > 
> > Consider this scenario:
> > 
> > void foo()
> > {
> > for (i=0; i<1; i++) {
> > bar(i);
> > something_else(i);
> > }
> > }
> 
> In this case, I'd recommend you to add foo() to replacing target
> as dummy. Then, kpatch can ensure foo() is actually not running. :)

The problem is: Where do you stop?

Adding the whole list of functions that transitively may ever call bar()
can be pretty much the whole kernel. And given that you can be calling
via pointer, you can't often even tell who is calling bar().

So yes, a developer could manually include foo() in the to be patched
list, so that it is checked that foo() isn't being executed while
patching.

But that would have to be done entirely manually after thoroughly
understanding the implications of the patch.

> > Let's say you want to live-patch bar(). With stop_machine()-based aproach, 
> > you can easily end-up with old bar() and new bar() being called in two 
> > consecutive iterations before the loop is even exited, right? (especially 
> > on preemptible kernel, or if something_else() goes to sleep).
> > 
> > With lazy-switching implemented in kgraft, this can never happen.
> 
> And I guess similar thing may happen with kgraft. If old function and
> new function share a non-auto variable and they modify it different way,
> the result will be unexpected by the mutual interference.

By non-auto I assume you mean globally accessible variable or data
structures.

Yes, with kGraft the new function has to be backward compatible with
pre-patch global data layout and semantics.

The parameters, return value and their semantics are free to change,
though, and kGraft guarantees consistency there.

-- 
Vojtech Pavlik
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC 09/16] kgr: mark task_safe in some kthreads

2014-05-14 Thread Vojtech Pavlik
On Wed, May 14, 2014 at 04:59:05PM +0200, Jiri Slaby wrote:

> I see the worst case scenario. (For curious readers, it is for example
> this kthread body:
> while (1) {
>   some_paired_call(); /* invokes pre-patched code */
>   if (kthread_should_stop()) { /* kgraft switches to the new code */
> its_paired_function(); /* invokes patched code (wrong) */
> break;
>   }
>   its_paired_function(); /* the same (wrong) */
> })
> 
> What to do with that now? We have come up with a couple possibilities.
> Would you consider try_to_freeze() a good state-defining function? As it
> is called when a kthread expects weird things can happen, it should be
> safe to switch to the patched version in our opinion.
> 
> The other possibility is to patch every kthread loop (~300) and insert
> kgr_task_safe() semi-manually at some proper place.
> 
> Or if you have any other suggestions we would appreciate that?

A heretic idea would be to convert all kernel threads into functions
that do not sleep and exit after a single iteration and are called from
a central kthread main loop function. That would get all of
kthread_should_stop() and try_to_freeze() and kgr_task_safe() nicely
into one place and at the same time put enough constraint on what the
thread function can do to prevent it from breaking the assumptions of
each of these calls. 

-- 
Vojtech Pavlik
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 64bit x86: NMI nesting still buggy?

2014-05-01 Thread Vojtech Pavlik
On Tue, Apr 29, 2014 at 05:24:32PM +0200, Jiri Kosina wrote:
> On Tue, 29 Apr 2014, Steven Rostedt wrote:
> 
> > > According to 38.4 of [1], when SMM mode is entered while the CPU is 
> > > handling NMI, the end result might be that upon exit from SMM, NMIs will 
> > > be re-enabled and latched NMI delivered as nested [2].
> > 
> > Note, if this were true, then the x86_64 hardware would be extremely
> > buggy. That's because NMIs are not made to be nested. If SMM's come in
> > during an NMI and re-enables the NMI, then *all* software would break.
> > That would basically make NMIs useless.
> > 
> > The only time I've ever witness problems (and I stress NMIs all the
> > time), is when the NMI itself does a fault. Which my patch set handles
> > properly. 
> 
> Yes, it indeed does. 
> 
> In the scenario I have outlined, the race window is extremely small, plus 
> NMIs don't happen that often, plus SMIs don't happen that often, plus 
> (hopefully) many BIOSes don't enable NMIs upon SMM exit.
> 
> The problem is, that Intel documentation is clear in this respect, and 
> explicitly states it can happen. And we are violating that, which makes me 
> rather nervous -- it'd be very nice to know what is the background of 38.4 
> section text in the Intel docs.

If we cannot disable IST for NMI on x86_64, because it'd break SYSCALL,
and thus cannot handle this situation well, then we should at least try
to detect it post-factum.

In the NMI handler, after ascertaining that the first NMI is executing
(in_nmi not yet set) we check the return address stored on the stack.

If it points anywhere inside the NMI handler (in reality only in the
space between the NMI handler start and the check), a SMI-triggered
nested NMI has happened.

Then we should be able to at least report it before dying.

If it doesn't ever happen: Great, this wasn't a real concern. If it
does, we can pester BIOS vendors.

-- 
Vojtech Pavlik
Director SuSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC 09/16] kgr: mark task_safe in some kthreads

2014-04-30 Thread Vojtech Pavlik
On Wed, Apr 30, 2014 at 09:55:32AM -0700, Paul E. McKenney wrote:
> On Wed, Apr 30, 2014 at 04:30:42PM +0200, Jiri Slaby wrote:
> > Some threads do not use kthread_should_stop. Before we enable a
> > kthread support in kgr, we must make sure all those mark themselves
> > safe explicitly.
> 
> Would it make sense to bury kgr_task_safe() in wait_event_interruptible()
> and friends?  The kgr_task_safe() implementation looks pretty lightweight,
> so it should not be a performance problem.

For userspace tasks, the kGraft in progress flag is cleared when
entering or exiting userspace. At that point it is safe to switch the
task to a post-patch world view.

For kernel threads, it's a bit more complicated: They never exit the
kernel, they keep executing within the kernel continuously. The
kgr_task_safe() call is thus inserted at a location within the main loop
where a 'new loop' begins - where there are no dependencies on results
of calls of functions from the previous loop.

Hence, putting kgr_task_safe() into every wait_event_interruptible()
wouldn't work, only a few of them are at that strategic spot where a
'new loop' can be indicated to kGraft.

The reason kgr_task_safe() is called from within the condition
evaluation statement in wait_event_interruptible() in this patch is
because we want it to be called as soon as a new loop begins - even if
that loop is empty because the condition to stop waiting has not been
met.

This also means that kGraft currently cannot patch the main loops of
kernel threads themselves as the thread of execution never exits them.

Jiří (Slabý) has some ideas about how to do without calling
kgr_task_safe() from within the kernel thread main loops, but for now,
the goal is to keep things simple and easy to understand.

> One reason might this might be a bad idea is that there are calls to
> wait_event_interruptible() all over the place, which might therefore
> constrain where grafting could be safely done.  That would be fair enough,
> but does that also imply new constraints on where kthread_should_stop()
> can be invoked?  Any new constraints might not be a big deal given that
> a very large fraction of the kthreads (and maybe all of them) invoke
> kthread_should_stop() from their top-level function, but would be good
> to call out.

> So, what is the story?

kGraft currently assumes that kthread_should_stop() is always in a part
of the main loop which doesn't carry over effect dependencies from the
previous iteration. This is currently true for all the uses of
kthread_should_stop(), but indeed it is an additional constraint for the
future.


Vojtech
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Standing for the Linux Foundation Technical Advisory Board

2013-10-22 Thread Vojtech Pavlik

I am standing for the Linux Foundation Technical advisory board.

Since I'm not attending LinuxCon EU, nor the Kernel Summit, I won't be
there for the TAB election at the evening event either, hence I'll try
to introduce myself in this email.

My first encounter with Linux was Slackware in 1993 with kernel 0.99,
installed from a stack of floppies, and while it was still very young,
the possibility to improve it was a reason to forget about DOS, Xenix,
SINIX, SCO UNIX and LynxOS that I was using back then.

And I did. Initially just a few patches to the (now gone) ARCnet
drivers, then a joystick driver rewrite, eventually a rewrite of the
whole input subsystem and a bunch of work on USB. I rewrote a bunch of
IDE drivers. For a few years I was maintaining the Input and HID
subsystems of the Linux kernel, until my other duties forced me to hand
that over to their present excellent maintainers. I was also a member of
the team that worked on a port of Linux to the x86-64 architecture,
achieving a successful boot when the first silicion was available.

I used Linux to power autonomous robots, figuring out the
limits of its realtime behavior and after a few less than successful
attempts, eventually scored a 4th place in the Eurobot Open robotics
championship.

My day job today is that of a Director of SUSE Labs, a department within
SUSE development that focuses on developing core open source projects:
the Linux kernel, the GNU toolchain, including GCC and glibc and Samba.
My work is to create an environment where open source hackers thrive
inside SUSE, allowing us to contribute to open source projects, while
creating a stable foundation for each release of SUSE's enterprise
distribution.

While the time I can devote to real hacking and coding is limited today,
I still love going very much technical and solving problems by looking
at them from new perspectives: When secure boot came around, I designed
the MOK concept, which allows users to manage their own keys even in the
absence of any help from the platform, returning to them the freedom
that was threatened.

This I'm offering my time, my helping hand and my thoughts to the TAB,
hoping that I can contribute to solving any future problems Linux and
the Linux Foundation may face.

Vojtech Pavlik
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC V4 PATCH 00/15] Signature verification of hibernate snapshot

2013-09-26 Thread Vojtech Pavlik
On Thu, Sep 26, 2013 at 04:48:00PM +0200, Jiri Kosina wrote:

> > The only two problems I see are
> > 
> >  1. The key isn't generational (any compromise obtains it).  This
> > can be fixed by using a set of keys generated on each boot and
> > passing in both K_{N-1} and K_N
> 
> I think this could be easily made optional, leaving the user with choice 
> of faster or "safer" boot.

Ideally, the key should be regenerated on each true reboot and kept the
same if it is just a resume. Unfortunately, I don't see a way to
distinguish those before we call ExitBootServices().

The reasoning behind that is that in the case of a kernel compromise, a
suspended-and-resumed kernel will still be compromised, so there is no
value in passing it a new key. A freshly booted kernel, though, should
get a new key, exactly because the attacker could have obtained a key
from the previous, compromised one.

This speeds up the ususal suspend-and-resume cycle, but provides full
security once the user performs a full reboot.

The question that remains is how to tell in advance.

> >  2. No external agency other than the next kernel can do the
> > validation since the validating key has to be secret
> 
> This is true, but as you said, the relevance of this seems to be rather 
> questionable.

Indeed, it's hard to imagine a scenario that is also valid within the
secure boot threat model.

-- 
Vojtech Pavlik
Director SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC V4 PATCH 00/15] Signature verification of hibernate snapshot

2013-09-26 Thread Vojtech Pavlik
On Thu, Sep 26, 2013 at 02:21:23PM +0200, Michal Marek wrote:

> > Is not it as simple as storing hash of hibernation image into NVRAM
> > and then verifying the hash matches the value in NVRAM on next
> > startup? No encryption needed. 
> 
> I think that part of the exercise is to minimize the number of writes to
> the NVRAM. The hash changes with every hibernation, obviously.

The key should, too. 

-- 
Vojtech Pavlik
Director SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC V4 PATCH 00/15] Signature verification of hibernate snapshot

2013-09-26 Thread Vojtech Pavlik
On Thu, Sep 26, 2013 at 02:06:21PM +0200, Pavel Machek wrote:

> > For the symmetric key solution, I will try HMAC (Hash Message
> > Authentication Code). It's already used in networking, hope the
> > performance is not too bad to a big image.
> 
> Kernel already supports crc32 of the hibernation image, you may want
> to take a look how that is done.
> 
> Maybe you want to replace crc32 with cryptographics hash (sha1?) and
> then use only hash for more crypto? That way speed of whatever crypto
> you do should not be an issue.

Well, yes, one could skip the CRC when the signing is enabled to gain a
little speedup.

> Actually...
> 
> Is not it as simple as storing hash of hibernation image into NVRAM
> and then verifying the hash matches the value in NVRAM on next
> startup? No encryption needed. 

First, there is no encryption going on. Only doing a HMAC (digest (hash)
using a key) of the image.

Second, since NVRAM is accessible through efivarsfs, storing the hash in
NVRAM wouldn't prevent an attacker from modifying the hash to match a
modified image.

There is a reason why the key for the HMAC is stored in the NVRAM in a
BootServices variable that isn't accessible from the OS and is
write-protected on hardware level from the OS.

> And that may even be useful for non-secure-boot people, as it ensures
> you boot right image after resume, boot it just once, etc...

The HMAC approach isn't much more complicated, and it gives you all
these benefits even with secure boot disabled.

-- 
Vojtech Pavlik
Director SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] Load keys from signed PE binaries

2013-03-01 Thread Vojtech Pavlik
On Thu, Feb 28, 2013 at 10:51:15PM +, Matthew Garrett wrote:
> On Thu, Feb 28, 2013 at 11:48:06PM +0100, Jiri Kosina wrote:
> 
> > Let me formulate my point more clearly -- Microsoft very likely going to 
> > sign hello world EFI PE binary, no matter the contents of .keylist 
> > section, as they don't give a damn about this section, as it has zero 
> > semantic value to them, right?
> > 
> > They sign the binary. By signing the binary, they are *NOT* establishing 
> > cryptographic chain of trust to the key stored in .keylist, but your 
> > patchset seems to imply so.
> 
> Mr Evil Blackhat's binary is then a mechanism for circumventing the 
> Windows trust mechanism, and as such his account is subject to 
> termination and his binary can be added to dbx. 

Why?

Let's take a look on what would happen in this scenario:

A PE binary, from Mr. Blackhat, doing nothing, in general, but
containing a key in a section, was signed by MS on the grounds that the
binary isn't harmful.

By issuing the signature, MS is attesting that the binary is safe, but
isn't saying anything about the data (key) embedded in it. It doesn't
say the key comes from a trusted party. It just says "this isn't
malware", and that's what their tools verify.

Your shim loader (signed by MS) loads your Linux kernel.

Your Linux kernel, then, based on the key-in-PE model decides to trust
the key, although nobody really said it's to be trusted.

Mr. Blackhat then can load his i_own_your_ring0.ko module signed by his
key on your system, having obtained root access previously.

You call Microsoft, telling them what Mr. Blackhat has done.

They now can:

a) Do what you want: Disable Mr. Blackhat's account and revoke the hash
   of his binary.

But also:

b) Say, "oh well, we're sorry this kills your security model, but it's
enough for us that you already fully booted Linux to worry about Windows
security, this affects only your distribution and we don't care".

c) Decide your security model is flawed, because you're abusing their
signature process to mean something else from what they intended and
revoke your shim hash instead.

And I don't think you can rely on MS doing 'a'. Particularly when there
will be a large number of key-in-PE binaries signed by them at that
point, with them not being able to tell by any analysis which of them
are evil and which not.

I would even say b) is most likely.

> We'd check the binary hash against dbx and refuse to load it on
> systems that have received the update, and Mr Evil Blackhat would have
> to find a new bunch of identity documents to create a new account to
> repeat the process.

Yes, from the point it gets blacklisted, it's fairly clear. You're
forced to reboot, but under the Secure Boot model, you have to do that
on any system that used code whose hash has been revoked.

-- 
Vojtech Pavlik
Director SuSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/1 try#2] [INPUT] keypad driver: Added support for OpenCores Keyboard Controller

2008-02-05 Thread Vojtech Pavlik
On Tue, Feb 05, 2008 at 12:18:15PM +0100, Javier Herrero wrote:
> Dear Vojtech,
>
> I think that a 1:1 mapping between linux keycodes and what keyboard sends 
> is right, because the scan code to key code conversion is already 
> programmed and done inside the FPGA code.

And the FPGA code changes with different keyboards attached?

> Best regards,
>
> Javier
>
> Vojtech Pavlik escribió:
>> On Thu, Jan 31, 2008 at 01:18:22AM +0800, Bryan Wu wrote:
>>
>>> +static irqreturn_t opencores_kbd_isr(int irq, void *dev_id)
>>> +{
>>> +   unsigned char c;
>>> +   struct platform_device *pdev = dev_id;
>>> +   struct opencores_kbd *opencores_kbd = platform_get_drvdata(pdev);
>>> +   struct input_dev *input = opencores_kbd->input;
>>> +
>>> +   c = readb(opencores_kbd->addr_res->start);
>>> +   input_report_key(input, c & 0x7f, c & 0x80 ? 0 : 1);
>>> +   input_sync(input);
>>> +
>>> +   return IRQ_HANDLED;
>>> +}
>>  This looks utterly wrong: It assumes 1:1 mapping between Linux keycodes
>> and what the keyboard sends, which I can't believe is the case.
>>
>
> -- 
> 
> Javier HerreroEMAIL: [EMAIL PROTECTED]
> HV Sistemas S.L.      PHONE: +34 949 336 806
> Los Charcones, 17AFAX:   +34 949 336 792
> 19170 El Casar - Guadalajara - Spain  WEB: http://www.hvsistemas.com 

-- 
Vojtech Pavlik
Director SuSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/1 try#2] [INPUT] keypad driver: Added support for OpenCores Keyboard Controller

2008-02-05 Thread Vojtech Pavlik
On Thu, Jan 31, 2008 at 01:18:22AM +0800, Bryan Wu wrote:

> +static irqreturn_t opencores_kbd_isr(int irq, void *dev_id)
> +{
> + unsigned char c;
> + struct platform_device *pdev = dev_id;
> + struct opencores_kbd *opencores_kbd = platform_get_drvdata(pdev);
> + struct input_dev *input = opencores_kbd->input;
> +
> + c = readb(opencores_kbd->addr_res->start);
> + input_report_key(input, c & 0x7f, c & 0x80 ? 0 : 1);
> + input_sync(input);
> +
> + return IRQ_HANDLED;
> +}
 
This looks utterly wrong: It assumes 1:1 mapping between Linux keycodes
and what the keyboard sends, which I can't believe is the case.

-- 
Vojtech Pavlik
Director SuSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Linux and remote control device

2008-01-22 Thread Vojtech Pavlik
On Tue, Jan 22, 2008 at 11:20:53AM +0100, Francis Moreau wrote:

> > It would still limit the IR ability to sending/receiving "just"
> > keypresses. You may want to send and receive more (arbitrary data) over
> > an IR dongle attached to the machine.
> 
> Don't know about such devices. The device I actually have, an infrared
> remote control, seems simple enough to use the input device model. I
> have a wheel on it but I think I can make it work too...
 
Yes, wheel isn't a problem. I should have said 'input events'.

-- 
Vojtech Pavlik
Director SuSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Linux and remote control device

2008-01-22 Thread Vojtech Pavlik
On Tue, Jan 22, 2008 at 10:47:22AM +0100, Francis Moreau wrote:
> On Jan 22, 2008 9:19 AM, Vojtech Pavlik <[EMAIL PROTECTED]> wrote:
> > On Tue, Jan 22, 2008 at 09:01:10AM +0100, Francis Moreau wrote:
> > > Hello,
> > >
> > > I'd like to add support for my Infrared remote control to Linux.
> > >
> > > So far I only see LIRCD project that make the kernel support
> > > such device but I'm not sure if this project is the best choice
> > > since it's not part of mainline kernels. And there are certainly
> > > good reasons which I'm not aware of.
> > >
> > > Another possibility is to make the remote controle device an input
> > > device. But I see several flaws:
> > >
> > > - The IR receiver on my board can't be use as a transmitter any
> > > more
> >
> > The input subsystem moves events in both directions. It would need some
> > additional hacking to work with IR transmitters, but it might be
> > possible.
> >
> > > - All scancodes are embedded in the kernel.
> >
> > While they reside in the kernel, they can be changed from userspace.
> >
> > > Do you think it's still the way to go despite the 2 points raised above ?
> >
> > It might be, but I'm not 100% convinced either.
> >
> 
> All your answers sound good however.
> 
> Could you please tell me why you're not sure ?

It would still limit the IR ability to sending/receiving "just"
keypresses. You may want to send and receive more (arbitrary data) over
an IR dongle attached to the machine.

-- 
Vojtech Pavlik
Director SuSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Linux and remote control device

2008-01-22 Thread Vojtech Pavlik
On Tue, Jan 22, 2008 at 09:01:10AM +0100, Francis Moreau wrote:
> Hello,
> 
> I'd like to add support for my Infrared remote control to Linux.
> 
> So far I only see LIRCD project that make the kernel support
> such device but I'm not sure if this project is the best choice
> since it's not part of mainline kernels. And there are certainly
> good reasons which I'm not aware of.
> 
> Another possibility is to make the remote controle device an input
> device. But I see several flaws:
> 
> - The IR receiver on my board can't be use as a transmitter any
> more

The input subsystem moves events in both directions. It would need some
additional hacking to work with IR transmitters, but it might be
possible.

> - All scancodes are embedded in the kernel.

While they reside in the kernel, they can be changed from userspace.

> Do you think it's still the way to go despite the 2 points raised above ?

It might be, but I'm not 100% convinced either.

-- 
Vojtech Pavlik
Director SuSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Whats the purpose of get_cycles_sync()

2007-10-30 Thread Vojtech Pavlik
On Tue, Oct 30, 2007 at 09:21:02PM +0100, Andi Kleen wrote:
> "Joerg Roedel" <[EMAIL PROTECTED]> writes:
> 
> > I would like to answer what the special purpose of the get_cycles_sync()
> > function is in the x86 architecture. In special I ask myself why
> > this function has to be *sync*?
> 
> Vojtech had one test that tested time monotonicity over CPUs 
> and it constantly failed until we added the CPUID on K8 C stepping. 
> He can give details on the test.
> 
> I suspect the reason was because the CPU reordered the RDTSCs so that
> a later RDTSC could return a value before an earlier one. This can
> happen because gettimeofday() is so fast that a tight loop calling it can
> fit more than one iteration into the CPU's reordering window.

The K8's still guarantee that subsequent RDTSCs return increasing
values, even if the processor reorders them.

What could have been happening then was that the RDTSC instruction might
have been reordered by the CPU out of the seqlock, causing trouble in
the calculation.

Anyway, adding the CPUID didn't solve all the problems we've seen back
then, and so far none of the approaches for using TSC without acquiring
a spinlock on multi-socket AMD boxes worked 100% correctly.

> That is why newer kernels use RDTSCP if available which doesn't need
> to be intercepted and is synchronous.  And since all AMD SVM systems
> have RDTSCP they are fine.
> 
> On Intel Core2 without RDTSCP the CPUID can be still intercepted right
> now, but the real fix there is to readd FEATURE_SYNC_TSC for Core2 --
> the RDTSC there is always monotonic per CPU and the patch that changed
> that (f3d73707a1e84f0687a05144b70b660441e999c7) was bogus and must be
> reverted. I didn't catch that in time unfortunately.

-- 
Vojtech Pavlik
Director SuSE Labs
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: ALPS touchpad with new Dell not recognised

2007-08-05 Thread Vojtech Pavlik
On Sun, Aug 05, 2007 at 08:45:29AM +1000, William Pettersson wrote:
> Hi,
> This patch adds support for the Alps touchpad on my Dell Vostro 1400 to
> the linux kernel.
> 
> Signed-off-by: William Pettersson <[EMAIL PROTECTED]>
> 
> Vojtech, I went over my debug info again with your thoughts in mind, and
> you were right.  The highest bit is always set in the sync byte of the
> protocol.  It must have been luck that it didn't drop sync while it was
> testing.  So a mask and bit of 0xCF does work.  I then decided to go
> through the logs more thoroughly, and determine if I could make this
> mask more rigid.  Turns out that a bit of 0xCF and a mask of 0xFF also
> works.

Just one more question: Have you checked that the 0xcf doesn't change
when you press any of the ALPS buttons? Including those associated with
the passthrough? Some bits in the sync byte might still be mapped to
these.

> I've also attached the output dumps.  The one labelled mouse.nosync.log
> is the the one that I first worked on.  However, it was hard to see
> where the start of the packets were, so I modified alps.c to give me the
> output shown in alps-packets.log (I can submit this patch if you deem it
> useful too).  It was once I did this that I saw straight away what the
> bit0 had to be.  And then mouse.alps.log is just i8042.debug output
> after my patch has been applied.
> 
> William
> 
> Vojtech Pavlik wrote:
> > On Sat, Aug 04, 2007 at 10:15:54PM +1000, William Pettersson wrote:
> >
> >   
> >> Hi LKML,
> >> Thanks to Vojtech for some advice, I finally got something happening
> >> with this.  Turns out I had to add support for a new model in alps.c,
> >> the attached file does this successfully for my laptop.  It'd be nice to
> >> have others test it, but I am yet to find anyone else who has this same
> >> laptop, with the same mouse issues I have.
> >>
> >> With this patch, I now get a functioning touchpad, including side
> >> scrolling, circle scrolling, corner taps etc.  However, I did have to
> >> ramp the configuration numbers up significantly from my old Thinkpad
> >> with its synaptics touchpad.
> >> 
> >
> > That is normal. The ALPS's have very much different parameters -
> > sensitivity, resolution, etc., so that the settings aren't easily
> > comparable.
> >
> >   
> >> Minspeed is now 0.8 and maxspeed is 10. 
> >> Also the VertScrollDelta is 10.  I don't know whether these numbers are
> >> crazy high, or just normal for the Alps though.
> >> 
> >
> > I don't remember exactly myself - I'm only interested in the low level
> > of the protocol - making sure the kernel delivers correctly decoded data
> > to userspace.
> >
> > Please send the patch to Dmitry Torokhov, who is nowadays a maintainer
> > of this driver and the input subsystem. And don't forget to add a
> > description of the patch and a "Signed-off-by:" line.
> >
> > I'm quite surprised you needed the value of 0x4f, I'd expect the highest
> > bit always set in the sync byte of the protocol. Can you send me (and
> > dmitry) the dump data you gathered using i8042.debug?
> >
> > Thanks,
> > Vojtech
> >
> >   
> >> --- drivers/input/mouse/alps.c.old 2007-07-31 21:42:37.0 +1000
> >> +++ drivers/input/mouse/alps.c 2007-08-04 21:45:43.0 +1000
> >> @@ -53,6 +53,7 @@
> >>{ { 0x20, 0x02, 0x0e }, 0xf8, 0xf8, ALPS_PASS | ALPS_DUALPOINT }, /* 
> >> XXX */
> >>{ { 0x22, 0x02, 0x0a }, 0xf8, 0xf8, ALPS_PASS | ALPS_DUALPOINT },
> >>{ { 0x22, 0x02, 0x14 }, 0xff, 0xff, ALPS_PASS | ALPS_DUALPOINT }, /* 
> >> Dell Latitude D600 */
> >> +  { { 0x73, 0x02, 0x50 }, 0x4f, 0x4f, ALPS_FW_BK_1 } /* Dell Vostro 1400 
> >> */
> >>  };
> >> 
> >
> >   

> --- drivers/input/mouse/alps.c.old2007-07-31 21:42:37.0 +1000
> +++ drivers/input/mouse/alps.c2007-08-05 08:34:00.0 +1000
> @@ -53,6 +53,7 @@
>   { { 0x20, 0x02, 0x0e }, 0xf8, 0xf8, ALPS_PASS | ALPS_DUALPOINT }, /* 
> XXX */
>   { { 0x22, 0x02, 0x0a }, 0xf8, 0xf8, ALPS_PASS | ALPS_DUALPOINT },
>   { { 0x22, 0x02, 0x14 }, 0xff, 0xff, ALPS_PASS | ALPS_DUALPOINT }, /* 
> Dell Latitude D600 */
> + { { 0x73, 0x02, 0x50 }, 0xcf, 0xff, ALPS_FW_BK_1 } /* Dell Vostro 1400 
> */
>  };
>  
>  /*





-- 
Vojtech Pavlik
Director SuSE Labs
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: ALPS touchpad with new Dell not recognised

2007-08-04 Thread Vojtech Pavlik
On Sat, Aug 04, 2007 at 10:15:54PM +1000, William Pettersson wrote:

> Hi LKML,
> Thanks to Vojtech for some advice, I finally got something happening
> with this.  Turns out I had to add support for a new model in alps.c,
> the attached file does this successfully for my laptop.  It'd be nice to
> have others test it, but I am yet to find anyone else who has this same
> laptop, with the same mouse issues I have.
> 
> With this patch, I now get a functioning touchpad, including side
> scrolling, circle scrolling, corner taps etc.  However, I did have to
> ramp the configuration numbers up significantly from my old Thinkpad
> with its synaptics touchpad.

That is normal. The ALPS's have very much different parameters -
sensitivity, resolution, etc., so that the settings aren't easily
comparable.

> Minspeed is now 0.8 and maxspeed is 10. 
> Also the VertScrollDelta is 10.  I don't know whether these numbers are
> crazy high, or just normal for the Alps though.

I don't remember exactly myself - I'm only interested in the low level
of the protocol - making sure the kernel delivers correctly decoded data
to userspace.

Please send the patch to Dmitry Torokhov, who is nowadays a maintainer
of this driver and the input subsystem. And don't forget to add a
description of the patch and a "Signed-off-by:" line.

I'm quite surprised you needed the value of 0x4f, I'd expect the highest
bit always set in the sync byte of the protocol. Can you send me (and
dmitry) the dump data you gathered using i8042.debug?

Thanks,
Vojtech

> --- drivers/input/mouse/alps.c.old2007-07-31 21:42:37.0 +1000
> +++ drivers/input/mouse/alps.c2007-08-04 21:45:43.0 +1000
> @@ -53,6 +53,7 @@
>   { { 0x20, 0x02, 0x0e }, 0xf8, 0xf8, ALPS_PASS | ALPS_DUALPOINT }, /* 
> XXX */
>   { { 0x22, 0x02, 0x0a }, 0xf8, 0xf8, ALPS_PASS | ALPS_DUALPOINT },
>   { { 0x22, 0x02, 0x14 }, 0xff, 0xff, ALPS_PASS | ALPS_DUALPOINT }, /* 
> Dell Latitude D600 */
> + { { 0x73, 0x02, 0x50 }, 0x4f, 0x4f, ALPS_FW_BK_1 } /* Dell Vostro 1400 
> */
>  };

-- 
Vojtech Pavlik
Director SuSE Labs
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Hibernation considerations

2007-07-28 Thread Vojtech Pavlik
On Mon, Jul 16, 2007 at 12:38:11AM +0200, Rafael J. Wysocki wrote:

> > Or the user unplugs their flash drive after hibernation rather than before.
> > 
> > Two things which I think would be nice to consider are:
> >1) Encryption - I'd actually prefer if my luks device did not
> >remember the key accross a hibernation; I want to be forced to
> >reenter the phrase.  However I don't know what the best thing
> >to do to partitions/applications using the luks device is.
> 
> Encryption is possible with both the userland hibernation (aka uswsusp) and
> TuxOnIce (formerly known as suspend2).  Still, I don't consider it as a "must
> have" feature for a framework to be generally useful (many users don't use it
> anyway).

If a user uses an encrypted filesystem, then he also needs an encrypted
swap and encrypted hibernation image: Otherwise the fileystem encryption
is not very useful.

Forgetting the filesystem/swap decryption keys before hibernation is
probably harder to do - there may be sensitive data in the kernel memory
image that weren't cleared - even if the key itself is not there.

In my opinion, encrypted hibernation is what every notebook user should
want - that's the only way how to make sure data from the notebook
aren't available when the notebook is physically stolen.

-- 
Vojtech Pavlik
Director SuSE Labs
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: keyboard stopped working after de9ce703c6b807b1dfef5942df4f2fadd0fdb67a

2007-07-28 Thread Vojtech Pavlik
On Tue, Jul 17, 2007 at 03:52:07PM -0400, Dmitry Torokhov wrote:
> Hi Christoph,
> 
> On 7/17/07, Christoph Pfister <[EMAIL PROTECTED]> wrote:
> >
> >Yep, attached (cold reboot, not pressing any keys, 2.6.21).
> >
> 
> Ok, I see. You don't use PS/2 mouse and so BIOS told us that mouse is
> absent and reassigned IRQ12 to EHCI controller. However we do not
> listen to BIOS on i386 (for historucal reasons) and process with
> registering AUX port... Now IRQ12 is shared between AUX port and EHCI
> and the keyboard controller is unhappy wehereas before (with polling
> timer) it would release IRQ12 and close port...

Here I should add that IRQ sharing between ISA/LPC where i8042 lives and
PCI where EHCI lives doesn't work - only one of the sides will ever be
able to trigger interrupts, depending on the bridge config.

> Does your keyboard start working if you boot with i8042.noaux?

-- 
Vojtech Pavlik
Director SuSE Labs
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] fix return value of i8042_aux_test_irq

2007-07-26 Thread Vojtech Pavlik
On Thu, Jul 26, 2007 at 09:54:50AM -0400, Dmitry Torokhov wrote:
> Hi,
> 
> On 7/26/07, Fernando Luis Vázquez Cao <[EMAIL PROTECTED]> wrote:
> >I made an interesting finding while testing the two patches below.
> >
> >http://lkml.org/lkml/2007/7/19/685
> >http://lkml.org/lkml/2007/7/19/687
> >
> >These patches modify the traditional CONFIG_DEBUG_KERNEL in such a way
> >that the request_irq prints a warning if after calling the handler it
> >returned IRQ_HANDLED .
> >
> >The code looks like this:
> >
> >int request_irq(unsigned int irq, irq_handler_t handler,
> >   unsigned long irqflags, const char *devname, void
> >*dev_id)
> >.
> >   if (irqflags & IRQF_DISABLED) {
> >   unsigned long flags;
> >
> >   local_irq_save(flags);
> >   retval = handler(irq, dev_id);
> >   local_irq_restore(flags);
> >   } else
> >   retval = handler(irq, dev_id);
> >   if (retval == IRQ_HANDLED) {
> >   printk(KERN_WARNING
> >  "%s (IRQ %d) handled a spurious interrupt\n",
> >  devname, irq);
> >   }
> >.
> >
> >I discovered that i8042_aux_test_irq handles the "fake" interrupt,
> >which, in principle, is not correct because it obviously isn't a real
> >interrupt and it could have been a spurious interrupt as well.
> >
> >The problem is that the interrupt handler unconditionally returns IRQ
> >handled, which does not seem correct. Anyway I am not very familiar with
> >this code so I may be missing the whole point. I would appreciate your
> >comments on this.
> >
> 
> The handler does handle the interrupt - both status and data registers
> are read so from the keyboard controller point of view the interrupt
> has been handled even if we happen to discard the data. As far as I
> know IRQ12 is never shared by BIOS... Vojtech, do you remember why we
> request IRQ12 with IRQF_SHARED?
 
I believe it was needed on PPC.

-- 
Vojtech Pavlik
Director SuSE Labs
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH]: Add command-line option to i8042 to completely disable it

2007-07-25 Thread Vojtech Pavlik
On Wed, Jul 25, 2007 at 12:05:04AM -0700, Andrew Morton wrote:
> On Mon, 23 Jul 2007 13:05:22 -0400 Chris Lalancette <[EMAIL PROTECTED]> wrote:
> 
> > (I tried to send this patch to linux-input@, but it seems to be currently 
> > having
> > some problems, so I'm going directly to LKML).
> > 
> > Certain (broken) pieces of South Bridge hardware will respond to
> > i8042_read_status() on boot with 0x0, despite there not being a real i8042
> > controller hooked up in the south bridge.  This can cause the detection for 
> > the
> > i8042 to return a "phantom" device, which hangs up later initialization.  
> > Note
> > that using "i8042.nokbd" and/or "i8042.noaux" do not help with this, since 
> > this
> > shows up during i8042_controller_check() (before either of those options are
> > checked).  This patch adds a command-line option "i8042.disable", which just
> > completely disables any checking for the i8042 controller.
> 
> That's an unfortunate fix.  Is there really no way in which we
> can auto-detect such a situation without requiring a manual
> setting?

The fact that the current detection hangs suggests that the scenario is
possible to detect.

-- 
Vojtech Pavlik
Director SuSE Labs
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Input: Support for a less exclusive grab.

2007-07-03 Thread Vojtech Pavlik
On Tue, Jul 03, 2007 at 12:45:55PM -0400, Zephaniah E. Hull wrote:

> Not really, what happens when the user presses alt-F1?

Well, if the console is switched to medium raw or raw mode, nothing
happens.

> A way to tell the kernel that events from a given input device should
> not go to the console has been needed since the very first time an X
> driver allowed keyboard events from /dev/input/event, and that
> still has not changed.

And the ages old method that X has been using before the existence of
the input layer still works fine, as far as I know.

> We just want a more flexible approach then what we are already using[0].

I'm fine with that, I just wonder whether attacking it on the input
layer side is the right way to do it: If you want to disable the
console, tell the console.

That way, we don't have to figure out which exact kernel input handlers
are console and which are not.

> I'll see about writing something up when I get back to my computers[1]
> and have things set back up[2].

Enjoy whatever computerless period you're planning!

-- 
Vojtech Pavlik
Director SuSE Labs
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Input: Support for a less exclusive grab.

2007-07-02 Thread Vojtech Pavlik
On Tue, Jun 12, 2007 at 01:40:31AM -0400, Zephaniah E. Hull wrote:
> On Tue, Jun 12, 2007 at 01:35:05AM -0400, Dmitry Torokhov wrote:
> > On Tuesday 12 June 2007 01:23, Zephaniah E. Hull wrote:
> > > On Tue, Jun 12, 2007 at 01:19:59AM -0400, Dmitry Torokhov wrote:
> > > > 
> > > > Like I said I would love if xf86-input-evdev did not grab the
> > > > device at all.
> > > 
> > > We have to disable the legacy input handlers somehow, not doing so
> > > simply isn't an option.
> > 
> > I do not follow. If user's xorg.conf does not use /dev/input/mice and
> > does not use "kbd" driver then grabbing is not required, is it? Now,
> > as far as I understand, lack of hotplug support in X is the main
> > obstacle for removing "mouse" and "kbd" drivers, correct?
> 
> Sadly, not quite.
> 
> The problem is that if the user is not using the mouse and kbd drivers
> at all, but is instead using xf86-input-evdev, and no grabbing is done,
> then all key presses end up going to the console.

X still switches to its own VT, so those keys go to the X server via the
console, and GPM also knows about console switching ...

That is a sane way how to prevent the regular console from getting
keypresses/mouse movements.

-- 
Vojtech Pavlik
Director SuSE Labs
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: PATCH: change keycode for scancode e0 32 from 150 to 172

2007-06-19 Thread Vojtech Pavlik
On Tue, Jun 19, 2007 at 12:59:50AM +0200, Hans de Goede wrote:
> Dmitry Torokhov wrote:
> >On 6/13/07, Jiri Kosina <[EMAIL PROTECTED]> wrote:
> >>On Wed, 13 Jun 2007, Hans de Goede wrote:
> >>
> >>> Good to hear, so as everyone smees to agree, shall I write a (massive,
> >>> complex, intrusive) patch to fix this, or are there until now silent
> >>> parties that object?
> >>
> >>Well, Dmitry is the one who has final word on this, as it is related to
> >>PS/2 keyboards. Just noticed he is not in CC, added.
> >>
> >
> >Yep, just send me the patch (don't forget to make change to
> >drivers/char/keyboard.c that Vojtech mentioned in other e-mail).
> >
> 
> Here is a patch as promised, attached.
> 
> Thanks & Regards,
> 
> Hans

Patch looks OK (didn't test it, but it seems to do all what's needed).

Acked-by: Vojtech Pavlik <[EMAIL PROTECTED]>


> As discussed by mail change keycode for scancode e0 32 from 150 to 172
> 
> Signed-off-by: Hans de Goede <[EMAIL PROTECTED]>
> diff -ur linux-2.6.22-rc4.orig/drivers/char/keyboard.c 
> linux-2.6.22-rc4/drivers/char/keyboard.c
> --- linux-2.6.22-rc4.orig/drivers/char/keyboard.c 2007-06-16 
> 22:54:47.0 +0200
> +++ linux-2.6.22-rc4/drivers/char/keyboard.c  2007-06-18 23:15:27.0 
> +0200
> @@ -1005,8 +1005,8 @@
>   284,285,309,  0,312, 91,327,328,329,331,333,335,336,337,338,339,
>   367,288,302,304,350, 89,334,326,267,126,268,269,125,347,348,349,
>   360,261,262,263,268,376,100,101,321,316,373,286,289,102,351,355,
> - 103,104,105,275,287,279,306,106,274,107,294,364,358,363,362,361,
> - 291,108,381,281,290,272,292,305,280, 99,112,257,258,359,113,114,
> + 103,104,105,275,287,279,258,106,274,107,294,364,358,363,362,361,
> + 291,108,381,281,290,272,292,305,280, 99,112,257,306,359,113,114,
>   264,117,271,374,379,265,266, 93, 94, 95, 85,259,375,260, 90,116,
>   377,109,111,277,278,282,283,295,296,297,299,300,301,293,303,307,
>   308,310,313,314,315,317,318,319,320,357,322,323,324,325,276,330,
> diff -ur linux-2.6.22-rc4.orig/drivers/input/keyboard/atkbd.c 
> linux-2.6.22-rc4/drivers/input/keyboard/atkbd.c
> --- linux-2.6.22-rc4.orig/drivers/input/keyboard/atkbd.c  2007-06-16 
> 22:54:48.0 +0200
> +++ linux-2.6.22-rc4/drivers/input/keyboard/atkbd.c   2007-06-18 
> 23:07:24.0 +0200
> @@ -89,7 +89,7 @@
> 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
>   217,100,255,  0, 97,165,  0,  0,156,  0,  0,  0,  0,  0,  0,125,
>   173,114,  0,113,  0,  0,  0,126,128,  0,  0,140,  0,  0,  0,127,
> - 159,  0,115,  0,164,  0,  0,116,158,  0,150,166,  0,  0,  0,142,
> + 159,  0,115,  0,164,  0,  0,116,158,  0,172,166,  0,  0,  0,142,
>   157,  0,  0,  0,  0,  0,  0,  0,155,  0, 98,  0,  0,163,  0,  0,
>   226,  0,  0,  0,  0,  0,  0,  0,  0,255, 96,  0,  0,  0,143,  0,
> 0,  0,  0,  0,  0,  0,  0,  0,  0,107,  0,105,102,  0,  0,112,
> @@ -111,7 +111,7 @@
>82, 83, 80, 76, 77, 72, 69, 98,  0, 96, 81,  0, 78, 73, 55,183,
>  
>   184,185,186,187, 74, 94, 92, 93,  0,  0,  0,125,126,127,112,  0,
> -   0,139,150,163,165,115,152,150,166,140,160,154,113,114,167,168,
> +   0,139,172,163,165,115,152,172,166,140,160,154,113,114,167,168,
>   148,149,147,140
>  };
>  


-- 
Vojtech Pavlik
Director SuSE Labs
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Proposal: change keycode for scancode e0 32 from 150 to 172

2007-06-13 Thread Vojtech Pavlik
On Wed, Jun 13, 2007 at 11:25:34AM +0200, Hans de Goede wrote:
> Jiri Kosina wrote:
> >On Tue, 12 Jun 2007, H. Peter Anvin wrote:
> >
> >>In PS/2 mode it reports E0 32 which gets converted to keycode 150. In 
> >>USB mode it reports E0 02 which gets converted to keycode 172.
> >>I don't know if it's the keyboard itself that's being inconsistent, or 
> >>if it is the table in usbkbd.c that's broken (in which case it should be 
> >>fixed to be consistent with the keyboard in PS/2 mode.)
> >
> >Hi Peter,
> >
> >First, usbkbd.c has very probably zero business with this - the mappings 
> >are being done in hid-input.c, usbkdb.c is only for embedded/debugging 
> >cases, and is almost never used on modern systems (see the corresponding 
> >Kconfig help text).
> >
> >>You seem to be of the opinion that "usb behaviour is correct", but don't 
> >>give any motivation why usb should take precedence.  Offhand, I would 
> >>expect there to be fewer translation layers for PS/2 and would therefore 
> >>assume PS/2 is more inherently correct.
> >
> >For USB, we have Hid Usage Pages, which define this to be KEY_HOMEPAGE. 
> >There is no such specification for PS/2 though, so what Hans is proposing 
> >is to make it consistent with behavior of USB HID devices, which I agree 
> >with.
> 
> Good to hear, so as everyone smees to agree, shall I write a (massive, 
> complex, intrusive) patch to fix this, or are there until now silent 
> parties that object?

Just don't forget to update the char/keyboard.c scancode synthesis table
to generate e0 32 for keycode 172 instead for the current 150.

-- 
Vojtech Pavlik
Director SuSE Labs
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Proposal: change keycode for scancode e0 32 from 150 to 172

2007-06-13 Thread Vojtech Pavlik
On Tue, Jun 12, 2007 at 03:29:48PM -0700, H. Peter Anvin wrote:
> I just tested this using Microsoft Natural Keyboard Pro, which is a
> dual-mode (USB-PS/2) keyboard.
> 
> This key is labelled Web/Home and has a picture of a house on the keycap.
> 
> In PS/2 mode it reports E0 32 which gets converted to keycode 150.
> In USB mode it reports E0 02 which gets converted to keycode 172.
> 
> I don't know if it's the keyboard itself that's being inconsistent, or
> if it is the table in usbkbd.c that's broken (in which case it should be
> fixed to be consistent with the keyboard in PS/2 mode.)

USB keyboards don't send scancodes, what you see is smoke and mirrors.

Yes, the atkbd.c and hid-input.c mappings are inconsistent. We probably
want to fix atkbd.c (scancode->keycode mapping), and maybe also
char/keyboard.c (keycode->synth scancode mapping).

> > The logo on the key is a homepage logo, the text below is www/homepage.
> > So what to send? I believe that for consistency with the usb codes send
> > it should be KEY_HOMEPAGE, but thats based on a sample of 1 usb
> > keyboard. Input on what other usb keyboards send for the key with the
> > homepage iocn is very much welcome.
> 
> You seem to be of the opinion that "usb behaviour is correct", but don't
> give any motivation why usb should take precedence.  Offhand, I would
> expect there to be fewer translation layers for PS/2 and would therefore
> assume PS/2 is more inherently correct.

KEY_HOMEPAGE seems to better fit the description of how the key looks.
Microsoft calls it "WWW Home", and it's supposed to be e0 32. 

-- 
Vojtech Pavlik
Director SuSE Labs
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Proposal: change keycode for scancode e0 32 from 150 to 172

2007-06-13 Thread Vojtech Pavlik
On Tue, Jun 12, 2007 at 11:47:14PM +0200, Hans de Goede wrote:
> Hi all,
> 
> As some of you might know from my earlier post/thread about atkbd and 
> softraw, I'm currently working on getting keyboards with internet/easy 
> access keys to work painlessly / plug and play.
> 
> In order to be able to better test / develop this I've bought 2 cheap such 
> keyboards today, one ps2 and one both usb and ps2 capable.
> 
> When comparing usb vs ps2 / testing the keycodes generated for the easy 
> access
> keys on my trust (microsoft compatible) keyboard. I noticed the homepage 
> key sends keycode 150 with ps2 and 172 with USB, or for those who don't 
> know the keycodes by head with ps2 it sends KEY_WWW and with usb it sends 
> KEY_HOMEPAGE
> 
> I personally believe that the usb behaviour is correct and that the ps/2 
> code should be modified to match for consistency. The ps/2 scancode to 
> keycode mapping is set up to handle easy access / internet keys for 
> microsoft compatible keyboards. So what is the right code to send here, 
> tricky, see:
> http://www.s2.com.br/s2arquivos/361/Imagens/555Image.jpg
> http://www.keyboardco.com/keyboard_images/microsoft_ergonomic_keyboard_4000_black_usb_large.jpg
> 
> The logo on the key is a homepage logo, the text below is www/homepage. So 
> what to send? I believe that for consistency with the usb codes send it 
> should be KEY_HOMEPAGE, but thats based on a sample of 1 usb keyboard. 
> Input on what other usb keyboards send for the key with the homepage iocn 
> is very much welcome.

KEY_HOMEPAGE is likely correct. Maybe we don't want to have KEY_WWW at
all, since it seems to be redundant.

-- 
Vojtech Pavlik
Director SuSE Labs
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: problem with softraw and keycodes > 128

2007-06-07 Thread Vojtech Pavlik
On Thu, Jun 07, 2007 at 08:05:42PM +0200, Hans de Goede wrote:
 
> This is an atkbd only setting, right, so indeed it doesn't affect USB, 
> right?

Right, what I meant is that USB keyboards are affected by the very same
problem you describe (163 in kernel will be 153 in X), and it can't be
worked around by setting softraw to 0.

> What I'm wondering if is there is any harm to setting softraw to 0, atleast 
> until there is a better fix. With it set to 0, for ps/2 keyboards all the 
> user addtionally needs todo is select the correct model, for which there 
> are nice gui tools.

By doing that, you'll be following a dead end road - and noone will fix
the issue for non-PS/2 keyboards. 

> Add a usb-keyboard (which in reality is the linuxkeyboard) model to the 
> mix, for the user to select / to make default even, and things will work 
> for usb users to, right?

Yes, but once you create the linuxkeyboard description, you don't need
softraw=0.

> >>   Since the xkb discriptions for most of the easy access keyboards
> >>   have been written by Suse, I assume the tested them and it works
> >>   for them. Does anyone know how suse does this?
> >
> >I don't think it does work. One of our guys is trying to fix it by
> >
> > 1) Converting the xkb multimedia keyboard descriptions to 
> > setkeycodes descriptions
> > 2) Creating a "linux keyboard" X xkb description
> >
> >Then, after loading the setkeycodes at boot for the right keyboard
> >ty[e, AT keyboard will work fine, and USB and other keyboards will not
> >need *any* setup to have their multimedia keys working out of the box.
> 
> Thats good news, but what about gui tools to select the model, as this 
> cannot be autoprobed?

I'm sure SUSE will update some of the existing tools to work with the
kernel-based keyboard selection well.

> And what about portability of said tools to for example freebsd?

That will likely not work, obviously.

> Using xkb models should work across different OS's, and is available
> now.  All that is needed (for ps2 keyboards) is for the path of the
> special keys from the kernel to X-server to not change the codes.

Yes, the linux kernel solution is linux specific. But a lot of other
interfaces that X uses are OS specific.

-- 
Vojtech Pavlik
Director SuSE Labs
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: problem with softraw and keycodes > 128

2007-06-07 Thread Vojtech Pavlik
On Thu, Jun 07, 2007 at 04:55:23PM +0200, Hans de Goede wrote:

> >Since the table mapping Linux event keycodes to the synthesized XT
> >scancodes in step 6. is constant and shared for all (even USB and other)
> >keyboards, this means that the Linux kernel always presents a virtual
> >"Linux keyboard" to X.
> >
> >The "Linux keyboard" always sends the same scancodes for keys with the
> >same meaning, regardless of what the real hardware does.
> 
> But doesn't it only know the meaning after doing a setkeycodes for 
> easy-access keys? 

Most microsoft-compatible keys work by default without setkeycodes. But
yes, for those that aren't in the default table, it only knows the
meaning after you tell it with setkeycodes.

> I find it somewhat counter inituitive that doing 
> "setkeycodes e023 163" will give me a keycode 163 on the console (as 
> reported by showkeys), but will send a scancode of 153 to the X-server.

As I said, the algorithms/table to compute the keycode from the XT
scancodes are different in the kernel and in the X server, and that
can't be changed without breaking backwards compatibility.

The good news is that while the numbers aren't the same, there is a 1:1
mapping, that is, you'll get 153 in the server for 163 in the kernel,
always, regardless of the keyboard used or regardless of the actual key
you assigned this code to.

And I don't have the mapping table: It could likely be fairly easily
created by simply going through all the keys and writing the results
down.

> Well to be honest I'm looking for a somewhat more quick fix then that. Let 
> me try to explain. Xorg comes with keyboard descriptions for many 
> special-access keys having keyboards. These descriptions can be easily 
> selected by the end user from the kde / gnome keyboard properties applets.
> 
> This however only works with softraw=0 for 2 reasons:
> 1) without softraw=0 the key presses never reach the server when there 
> hasn't
>been an setkeycodes command for the easy access key
> 2) even if a setkeycodes command was given, then the keycode given to
>setkeycodes doesn't match the scancode seen by xorg, messing things up.
>(unless one uses setkeycodes with a reverse mapping of the mapping done 
>in
> the kernel)
> 
> There are 3 ways out of this:
> 1) write special loadkey maps for all the easy access keyboards, since 
> unlike X
>loadkeys AFAIK doesn't support inheriting keys from a global layout, the
>number of language-layout <-> keyboard model variants would become quite
>frightening.

No. You just need a 'setkeycodes' map for the extra keys, and that's
language-independent.

>When loadkeys has been thought to map the special keys according to the
>linuxkeyboard codes, then X needs to be tought how to handle a linux
>keyboard and default to this
> 
>Also configuration utils need to be written to properly set things up for
>loadkeys, as autodetecting this is impossible

X doesn't get the loadkeys (keymap) processed data, it gets the
keycodes, which are only affected by setkeycodes.

> 2) Somehow fix things so that selecting the right model in gnome/kde
>keyboard-preferences will make the keys work. Like it does now with
>softraw=0. Which leads me to asking what are the downsides of using
>softraw=0?

It doesn't work with anything else but PS/2 keyboards. It's useless on
eg. USB.

>What about be default defining keycodes for all the 128+ scancodes
>not yet defined, and defining these in such a way that the actual
>scancodes is what gets send to X?

Ugly.

>Or maybe it would be an option to send the raw scancode just like
>is done with softraw=0 when there is no mapping (instead of
>ignoring the key as is done now).

That'd confuse people even more than the current situation.

>Since the xkb discriptions for most of the easy access keyboards
>have been written by Suse, I assume the tested them and it works
>for them. Does anyone know how suse does this?

I don't think it does work. One of our guys is trying to fix it by

1) Converting the xkb multimedia keyboard descriptions to setkeycodes 
descriptions
2) Creating a "linux keyboard" X xkb description

Then, after loading the setkeycodes at boot for the right keyboard
ty[e, AT keyboard will work fine, and USB and other keyboards will not
need *any* setup to have their multimedia keys working out of the box.

-- 
Vojtech Pavlik
Director SuSE Labs
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: problem with softraw and keycodes > 128

2007-06-07 Thread Vojtech Pavlik
On Thu, Jun 07, 2007 at 10:21:33AM +0200, Hans de Goede wrote:

Hans,

> I've been experimenting with getting the internetkeys on several
> keyboards to work. My biggest problem with this currently is the
> following:

> Step 1: press key, dmesg says:
>   atkbd.c: Unknown key released (translated set 2, code 0xa3 on  
> isa0060/serio0).
>   atkbd.c: Use 'setkeycodes e023 ' to make it known.
> Step 2: map key: setkeycodes e023 163
> Step 3: run xev, press key. X-keycode is: 153 instead of 163 ?


> Doing:
>   echo -n 0 >/sys/devices/platform/i8042/serio1/softraw
> However does make them identical.

this is a well known problem. It's not easy to explain and next to
impossible to fix within the scope of operation of your test case -
otherwsie it'd be fixed ages ago.

The explanation why this happen will be a little longer, and will follow
the life of a keystroke as it passes through the different layers.

1. First, the user presses a key. Since we're talking a 102-key
PS/2-type keyboard here, the keyboard scans its matrix and figures out
the position of the key. Then, since the connection to the PC is running
in AT mode, it translates the key to a string (one to eight bytes) of
AT-compatible scancodes.

2. The i8042 in the computer receives the scancodes, translates them to
XT-compatible scancodes and makes them available to the OS.

3. In OS the i8042.c gets the scancodes and forwards them to atkbd.c

4. atkbd.c decodes the string of scancodes and turns them into a single
Linux input event, and forwards it to the input core.

5. The Linux input core dispatches the events to char/keyboard.c.

6. keyboard.c, a part of the VT driver, sees that X told the VT that it
wants the "raw mode", which means that the VT should give the keystrokes
as the original string of XT scancodes. Hence (when softraw is 1), it
consults a table and synthesizes what it thinks the original string of
bytes could have been and gives that to X. For all keys that atkbd.c
knows by default, this string is IDENTICAL to what atkbd.c has received.

7. X takes the string, and using an algorithm similar to, but DIFFERENT
from the algoithm that atkbd.c uses, it converts them to an X scancode.
The X scancodes are similar to, but different from the Linux event
keycodes, for historical reasons going back to the 90's.

8. Using an xmodmap table compiled from xkb sources, X converts the X
scancode to an X keysym.

In the softraw == 0 case, the original string of XT scancodes is passed
all along the path to X, so that X really gets what the keyboard sent.

Since the table mapping Linux event keycodes to the synthesized XT
scancodes in step 6. is constant and shared for all (even USB and other)
keyboards, this means that the Linux kernel always presents a virtual
"Linux keyboard" to X.

The "Linux keyboard" always sends the same scancodes for keys with the
same meaning, regardless of what the real hardware does.

And this is why X sees a different code in softraw==1 mode than in
softraw==0 mode: In the first it gets the virtual "Linux keyboard", in
the other it gets access to the real hardware.

> Problem, as the xkb files for these keyboards expect the X-keycode to
> be 163, as just it is under the console. Now I know that X-keycodes !=
> console-keycodes, for example the A key is 30 on the console and 38 in
> X, but in the case of this special keys, both the xkb files for these
> internet keyboards (written by suse) and config files for special
> daemons like lineak, expect them to be identical.

This is wrong. Someone simply needs to write a xkb/lineak description
for the "Linux keyboard", and then all these problems will go away.

PS.: And it's best to use the 'evdev' driver for X instead the 'kbd'
driver, since that one gets the Linux keycodes directly from the kernel,
without any back/forward translation.

-- 
Vojtech Pavlik
Director SuSE Labs
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Acecad USB Tablet: usbmouse takeover and odd motion

2007-04-20 Thread Vojtech Pavlik
On Fri, Apr 20, 2007 at 06:09:55PM +0200, Giuseppe Bilotta wrote:
> On 4/20/07, Dmitry Torokhov <[EMAIL PROTECTED]> wrote:
> >On 4/20/07, Giuseppe Bilotta <[EMAIL PROTECTED]> wrote:
> >>
> >> Sorry, it seems I was wrong, it's not usbhid but usbmouse taking over.
> >> After a fresh plug (e.g. at bootup) I get the following:
> >>
> >
> >Well, the question is - why do you have usbmouse module on your system?
> 
> Stock Debian kernel 2.6.18 comes with it.
> 
> With my custom kernels I can probably skip compiling it at all, if you
> so suggest; should I blacklist it for the distro kernel? Or is there a
> chance that some random USB mouse plugged in would fail to function by
> doing so?
 
usbmouse and usbkbd are only intended for embedded systems where the
full usbhid doesn't fit and for testing purposes: Normal distros
shouldn't have them enabled.

-- 
Vojtech Pavlik
Director SuSE Labs
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] wistron_btns: More keymaps

2007-03-14 Thread Vojtech Pavlik
On Wed, Mar 14, 2007 at 02:58:49PM -0400, Dmitry Torokhov wrote:

> >> I have couple of comments/requests:
> >>
> >> 1. KEY_OPEN and KEY_CLOSE should not be used to signal state of the
> >> lid, they correspond to Accpication Control Open and Close keys from
> >> USB HID HUT spec:
> >>
> >> http://www.usb.org/developers/devclass_docs/Hut1_12.pdf
> >>
> >> SW_LID shoudl be used to signal lid state instead.
> >>
> >> 2. I also have a concern about using KEY_SCREEN to signal toggling
> >> display on and off. I am CCing Vojtech - he must know what the
> >> original intent of this key code was. BTW, when user presses
> >> corresponding button - does the display actually goes off? Maybe we
> >> need another switch.
> >
> >Sorry, Dmitry, I probably should have documented it back then, I don't
> >have the slightest idea anymore.
> >
> 
> That's what I was afraid of ;) Well, maybe we should just use it to
> indicate display switch and add a comment to that effect to input.h...

Digging somewhat deeper into my memory, I think it just so happened that
there was a keyboard with a key with a pictogram of a computer screen on
it with no really defined purpose. Since at that time there was only a
limited number of different extended keyboard types, it was added.

I think it'd make sense to use it eg. for the key that switches between
internal and external output on laptops, if the laptop generates a
keycode for the key at all. But maybe that function is already taken by
KEY_SWITCHVIDEOMODE ...

-- 
Vojtech Pavlik
Director SuSE Labs
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] wistron_btns: More keymaps

2007-03-14 Thread Vojtech Pavlik
On Wed, Mar 14, 2007 at 09:54:27AM -0400, Dmitry Torokhov wrote:
> Hi Eric,
> 
> On 3/13/07, Eric Piel <[EMAIL PROTECTED]> wrote:
> >Hello,
> >
> >As a sequel to my patch "Wistron button support for TravelMate 610" of
> >last week, here is a bigger addition of keymaps for the wistron_btns.
> >
> >Patch 1 adds all the database of acerhk which fits this driver (about 25
> >more laptops).
> >Patch 2 adds a generic map that should fit most users but has the
> >disadvantage of not being automatic.
> >
> >Dmitry, I've tried to make them against your tree. Still, if they don't
> >apply cleanly, just tell me and I'll try harder!
> >
> 
> I have couple of comments/requests:
> 
> 1. KEY_OPEN and KEY_CLOSE should not be used to signal state of the
> lid, they correspond to Accpication Control Open and Close keys from
> USB HID HUT spec:
> 
> http://www.usb.org/developers/devclass_docs/Hut1_12.pdf
> 
> SW_LID shoudl be used to signal lid state instead.
> 
> 2. I also have a concern about using KEY_SCREEN to signal toggling
> display on and off. I am CCing Vojtech - he must know what the
> original intent of this key code was. BTW, when user presses
> corresponding button - does the display actually goes off? Maybe we
> need another switch.

Sorry, Dmitry, I probably should have documented it back then, I don't
have the slightest idea anymore.

> 3. The number of keymap tables grew considerably ;) Do you think it
> woudl make sense to allocate memory for keymap at device creation time
> and copy selected keymap and them mark all original keymaps as
> __initdata so they are discarded one module completed initialization?
> 
> Thank you.

-- 
Vojtech Pavlik
Director SuSE Labs
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [KJ][RFC][PATCH] BIT macro cleanup

2007-02-24 Thread Vojtech Pavlik
On Fri, Feb 23, 2007 at 11:43:44PM +0100, Richard Knutsson wrote:

> >I am saying that IMO input's BIT definition should be
> >adequate for 99% of potential users and that I would be OK with moving
> >said BIT definition from input.h to bitops.h and maybe supplementing
> >it with LLBIT. I am also saying that I do not want BITWRAP, BITSWAP
> >(what swap btw?) nor BIT(x % BITS_PER_LONG) in input drivers.

And I totally agree with Dmitry. The "% BITS_PER_LONG" doesn't hurt
other users, and it's needed for larger-than-single-long bit arrays.

> Is the reason for the modulo to put a bitmask larger then the variable 
> into an array?

The complementary LONG() macro will tell you the index of an array of
longs where the bit should be set.

> I did just a quick 'grep' for "BIT(" in drivers/input/ 
> and from what I saw, most (or all?) of the values are defined constants 
> and those in input.h were noway near the limits of a 'long'.

Well, many do not need it, but for example BIT(BTN_LEFT) does, and
that's used in a lot of places.

> The reason I don't like it with modulo is simply because it hides 
> potential bugs (when x is to big). 

That would be my only concern - losing compiler warnings.

> And what about the "1%"?

The 1% will need either LLBIT or an extra % 8.

> IMHO BIT should be as simple as possible.

-- 
Vojtech Pavlik
Director SuSE Labs
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [KJ][RFC][PATCH] BIT macro cleanup

2007-02-24 Thread Vojtech Pavlik
On Fri, Feb 23, 2007 at 01:44:41PM +0530, Milind Choudhary wrote:
> Hi all
>   working towards the cleanup of BIT macro,
> I've added one to  & cleaned some obvious users.
> 
> include/linux/input.h also has a BIT macro
> which does a wrap
> so currently i've done something like
> 
> +#undef BIT
> #define BIT(nr)(1UL << ((nr) % BITS_PER_LONG))
 
Since the previous definition of

#define BIT(nr) (1UL << (nr))

gives the same results as the above one for all reasonable usage
scenarios (you don't want to supply nr larger than BITS_PER_LONG),
why not just use the modulo version everywhere?

The only problem I see is that the compiler would not warn where nr IS
too large.

-- 
Vojtech Pavlik
Director SuSE Labs
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 4/9] Remove the TSC synchronization on SMP machines

2007-02-13 Thread Vojtech Pavlik
On Tue, Feb 13, 2007 at 11:38:33PM +0100, Andrea Arcangeli wrote:
> Hi,
> 
> On Tue, Feb 13, 2007 at 11:18:48PM +0100, Vojtech Pavlik wrote:
> > It's not inherent to ntpd's design, but the current (which may have been
> > fixed since I looked last) implementation of the NTP PLL in the kernel.
> > 
> > The interaction with ntpd can be fixed and I've done it in the past
> > once, although the fix wasn't all that nice.
> 
> Yep, it can slowly move towards the correct time, but ntpdate (or more
> generally settimeofday) remains a fundamental issue (and I prefer time
> skews to be fixed ASAP, not slowly).

Skipping forward is trivial. For going backward, you can stop time (or
make it go forward very slowly). Still the output will be strictly
monotonic (but not more than that).

For small changes you simply change your estimate of the base clock
frequency to be different from what the specs say. Tuning that in a PLL
will get you to sync with true atomic GMT.

-- 
Vojtech Pavlik
Director SuSE Labs
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 4/9] Remove the TSC synchronization on SMP machines

2007-02-13 Thread Vojtech Pavlik
On Tue, Feb 13, 2007 at 06:20:14PM +0100, Andi Kleen wrote:
> On Tuesday 13 February 2007 18:09, Christoph Lameter wrote:
> > On Tue, 13 Feb 2007, Arjan van de Ven wrote:
> > 
> > > no quite the opposite. gettimeofday() currently is NOT monotonic
> > > unfortunately. With this patchseries it actually has a better chance of
> > > becoming that...
> > 
> > It is monotonic on IA64 at least and we have found that subtle application 
> > bugs occur if it is not. IA64 (and other arches using time interpolation) 
> > can insure the monotoneity of time sources. Are you sure about this? I 
> > wonder why the new time of day subsystem does not have that?
> 
> Just to avoid spreading misinformation: modulo some new broken hardware
> (which we always try to work around when found) i386/x86-64 gettimeofday
> is monotonic.  AFAIK on the currently known hardware it should be generally
> ok.
> 
> However ntpd can always screw you up, but that's inherent in the design.

It's not inherent to ntpd's design, but the current (which may have been
fixed since I looked last) implementation of the NTP PLL in the kernel.

The interaction with ntpd can be fixed and I've done it in the past
once, although the fix wasn't all that nice.

> Safer in general is to use clock_gettime(CLOCK_MONOTONIC, ...) which
> guarantees no interference from ntpd

-- 
Vojtech Pavlik
Director SuSE Labs
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: +1 4 cz (Re: [Ksummit-2006-discuss] 2007 Linux Kernel Summit)

2007-02-06 Thread Vojtech Pavlik
On Tue, Feb 06, 2007 at 07:29:09PM +, James Simmons wrote:
> 
> Has the place for the KS been decided? If not I like to suggest switzerland. 
> 
> > On Tue, Jan 23, 2007 at 07:18:46PM +, Oleg Verych wrote:
> > 
> > > >> Ditto..
> > > >> 
> > > >> Definitely disagree with that. I'd like to see the conference somewhere
> > > >> else different this time - perhaps *Czech Republic*, or somewhere else 
> > > >> more
> > > >> easterly and Linux active (or even Finland...)
> > > >
> > > > This is my position as well.
> > > > 
> > > > For the first time in many years I'm strongly considering actually
> > > > going to the kernel summit, however if it goes back to Ottawa I
> > > > definitely will stop going again.
> > > 
> > > It would be interesting. Thank you Alan, David!
> > 
> > I (and SUSE/Novell) would be happy to help organizing the Kernel Summit
> > in Czech Republic.
 
I didn't get any response yet to my proposal.

-- 
Vojtech Pavlik
Director SuSE Labs
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/9] x86_64: reliable TSC-based gettimeofday

2007-02-01 Thread Vojtech Pavlik
On Thu, Feb 01, 2007 at 08:56:48AM -0800, john stultz wrote:
> On Thu, 2007-02-01 at 15:52 +0100, Jiri Bohac wrote:
> > On Thu, Feb 01, 2007 at 12:20:59PM +0100, Andi Kleen wrote:
> >>
> > > The big strategic problem is how to marry your patchkit to John Stultz's
> > > clocksources work which is also competing for merge. Any thoughts on 
> > > that? 
> > 
> > I'll look into that next week. Sorry, I wanted to do that a long time
> > ago, but I spent weeks (over a month) fighting a nasty livelock
> > in the code. (Morale: think twice before using a spinlock inside
> >   a {do .. while (read_seqretry(..))} loop)
> 
> The first step here shouldn't be too difficult. Just create a _read
> function that uses your code to return monotonic TSC cycles (instead of
> nanoseconds w/ gettimeofday).  Then just create a clocksource structure
> for it.

guess_mt() is more or less the function you're looking for. (With the
exception of the cpufreq and mode switching logic.)

> The harder part will be the vsyscall, as you will need extra per cpu
> data in the vsyscall read. I had some test code for this situation
> awhile back, so if you get the first part functioning correctly (just a
> clocksource w/o a vread pointer), I'll gladly help you get the vsyscall
> bits working.
> 
> thanks
> -john
> 
> 

-- 
Vojtech Pavlik
Director SuSE Labs
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 9/9] Make use of the Master Timer

2007-02-01 Thread Vojtech Pavlik
On Thu, Feb 01, 2007 at 03:29:31PM +0100, Jiri Bohac wrote:
> On Thu, Feb 01, 2007 at 12:36:05PM +0100, Andi Kleen wrote:
> > On Thursday 01 February 2007 11:00, [EMAIL PROTECTED] wrote:
> > 
> > > + case VXTIME_TSC:
> > > + rdtscll(tsc);
> > 
> > Where is the CPU synchronization? 
> > 
> > > + cpu = smp_processor_id();
> > > + rdtscll(t);
> > 
> > Also no synchronization. It's slower, but needed.
> 
> Hmm, I wasn't sure. Why is it needed? How outdated can the
> result of RDTSC / RDTSCP be?
> 
> If I do:
>   rdtscll(a)
>   ...
>   rdtscll(b)
> is it guaranteed that (b > a) ?

On a single CPU this is always guaranteed. Even on AMD.

> > >  unsigned long long sched_clock(void)
> > >  {
> > > - unsigned long a = 0;
> > > -
> > > - rdtscll(a);
> > > - return cycles_2_ns(a);
> > > + return monotonic_clock();
> > >  }
> > 
> > This is overkill because sched_clock() doesn't need a globally monotonic
> > clock, per CPU monotonic is enough. The old version was fine.
> 
> OK, thanks for spotting this. I'll change it to use __guess_mt().
> (more or less equal to cycles_2_ns(), no need to maintain yet another
> tsc->ns ratio just for cycles_2_ns().

Will this also work correctly during CPU frequency changes?

> > > - tv->tv_sec = sec + usec / 100;
> > > - tv->tv_usec = usec % 100;
> > > + sec += nsec / NSEC_PER_SEC;
> > > + nsec %= NSEC_PER_SEC;
> > 
> > Using while() here is probably faster (done in vdso patchkit where
> > gtod got mysteriously faster). Modulo and divisions are slow, even 
> > for constants when they are large.
> 
> OK, will do that

I'd suggest benchmarking the difference.

-- 
Vojtech Pavlik
Director SuSE Labs
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 4/9] Remove the TSC synchronization on SMP machines

2007-02-01 Thread Vojtech Pavlik
On Thu, Feb 01, 2007 at 02:17:15PM +0100, Jiri Bohac wrote:
> On Thu, Feb 01, 2007 at 12:14:23PM +0100, Andi Kleen wrote:
> > On Thursday 01 February 2007 10:59, [EMAIL PROTECTED] wrote:
> > > TSC is either synchronized by design or not reliable
> > > to be used for anything, let alone timekeeping.
> > 
> > In my tree this is already done better by a patch from Ingo.
> > Check if they look synchronized and don't use TSC if they are not.
> 
> The whole purpose of this patchset is to make use of TSC even if
> it's not synchronized.
> 
> Synchronizing it will not make anything better in any way -- the
> implementation just does not care whether TSCs are synchronized.
> That's why I think the synchronization code is not needed.
 
It might even make sense to desycnhronize the TSCs on such (AMD)
machines on purpose, so that applications that rely on TSC break
immediately and not after some time when the error becomes too large.

-- 
Vojtech Pavlik
Director SuSE Labs
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ANNOUNCE] System Inactivity Monitor v1.0

2007-01-30 Thread Vojtech Pavlik
On Tue, Jan 30, 2007 at 01:33:10PM +0100, Alessandro Di Marco wrote:
> Vojtech Pavlik <[EMAIL PROTECTED]> writes:
> 
>On Mon, Jan 29, 2007 at 11:42:08PM +0100, Alessandro Di Marco wrote:
> 
>> OK, but what about the time-warp problem?. To fix it I need to know when 
> the
>> system goes to sleep/resumes. In SIN I've solved via the platform driver,
>> introducing suspend() resume() callbacks...
> 
>Well, you just need to make sure that a resume() actually is a visible
>event ...
> 
> Sorry, but I don't see the point. Visible to what? 

Counted as activity.

> Mine problem here is that the input device doesn't care about suspend/resume
> cycles (it is a straight char driver), probably because it doesn't need to (so
> far.) Low-level drivers (kbd & co) on the contrary are all bus or platform
> drivers, hooking directly into suspend/resume callbacks.
> 
> Do you mean that I should back-propagate a suspend/resume event from the
> low-level drivers to the input one?
 
Yes, but not as a callback, but instead as an input event. 

-- 
Vojtech Pavlik
Director SuSE Labs
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ANNOUNCE] System Inactivity Monitor v1.0

2007-01-30 Thread Vojtech Pavlik
On Mon, Jan 29, 2007 at 11:42:08PM +0100, Alessandro Di Marco wrote:
> Pavel Machek <[EMAIL PROTECTED]> writes:
> 
>Hi!
> 
>>The /proc/bus/input/devices has an extensible structure. You can just
>>add an "A:" line (for Activity) instead of adding a new proc file.
>> 
>> I know, but IMO there is too much stuff to parse in there. Activity 
> counters
>> are frequently accessed by daemons, and four or five concurrent daemons 
> are the
>> norm in a typical X11 linux box...
> 
>Syscalls are fast enough, and the file is _very_ easy (=> fast) to parse.
> 
>>Also, the activity counters should IMO coincide with the event times
>>passed through /dev/input/event, and should not be jiffies based.
>>Ideally, both should be based on clock_gettime(CLOCK_MONOTONIC).
>> 
>> In evdev.c do_gettimeofday() is used. Anyway I just need of a monotonic
>> counter, so get_jiffies_64() wouldn't be better? It isn't affected by 
> wrapping
>> issues and it is probably faster than do_gtod().
> 
>Just use same time source rest of inputs already do...
> 
> OK, but what about the time-warp problem?. To fix it I need to know when the
> system goes to sleep/resumes. In SIN I've solved via the platform driver,
> introducing suspend() resume() callbacks...
 
Well, you just need to make sure that a resume() actually is a visible
event ...

-- 
Vojtech Pavlik
Director SuSE Labs
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ANNOUNCE] System Inactivity Monitor v1.0

2007-01-27 Thread Vojtech Pavlik
On Sat, Jan 27, 2007 at 05:45:25PM +, Pavel Machek wrote:
> Hi!
> 
> >Well, I do not think your kernel code is mergeable. But bits to enable
> >similar functionality in userspace probably would be mergeable.
> > 
> > You said it :-)
> > 
> > This patch exports to the user space the inactivity time (in msecs) of a 
> > given
> > input device. Example follows:
> 
> Looks okay to me. I guess you should sign it off, and ask Dmitry
> (input maintainer) for a merge?

The /proc/bus/input/devices has an extensible structure. You can just
add an "A:" line (for Activity) instead of adding a new proc file.
Anyway, I believe this should be also available through sysfs, if not
only there.

Also, the activity counters should IMO coincide with the event times
passed through /dev/input/event, and should not be jiffies based.
Ideally, both should be based on clock_gettime(CLOCK_MONOTONIC).

> > <0> $ cat /proc/bus/input/activity
> > 0011 0001 0001 ab41 1
> > 0011 0002 0008  3160799
> > 0011 0002 0008 7321 549991
> > 0019  0005  3160799
> > 0019  0001  3454901
> > 0010 104d   3160799
> > 0010 104d   2162833
> > 
> > The device ordering matches the /proc/bus/input/devices one, anyway I 
> > reported
> > also vendor, product, etc. Now the daemon is trivial...
> 
> > @@ -482,6 +484,30 @@
> > return seq_open(file, &input_devices_seq_ops);
> >  }
> >  
> > +static int input_activity_seq_show(struct seq_file *seq, void *v)
> > +{
> > +   struct input_dev *dev = container_of(v, struct input_dev, node);
> > +
> > +   seq_printf(seq, "%04x %04x %04x %04x\t%u\n",
> > +  dev->id.bustype, dev->id.vendor,
> > +  dev->id.product, dev->id.version,
> > +  jiffies_to_msecs((long) jiffies - (long) 
> > dev->last_activity));
> > +
> > +   return 0;
> > +}
> > +
> > +static struct seq_operations input_activity_seq_ops = {
> > +   .start  = input_devices_seq_start,
> > +   .next   = input_devices_seq_next,
> > +   .stop   = input_devices_seq_stop,
> > +   .show   = input_activity_seq_show,
> > +};
> > +
> > +static int input_proc_activity_open(struct inode *inode, struct file *file)
> > +{
> > +   return seq_open(file, &input_activity_seq_ops);
> > +}
> > +
> >  static struct file_operations input_devices_fileops = {
> > .owner  = THIS_MODULE,
> > .open   = input_proc_devices_open,
> > @@ -491,6 +517,15 @@
> > .release= seq_release,
> >  };
> >  
> > +static struct file_operations input_activity_fileops = {
> > +   .owner  = THIS_MODULE,
> > +   .open   = input_proc_activity_open,
> > +   .poll   = input_proc_devices_poll,
> > +   .read   = seq_read,
> > +   .llseek = seq_lseek,
> > +   .release= seq_release,
> > +};
> > +
> >  static void *input_handlers_seq_start(struct seq_file *seq, loff_t *pos)
> >  {
> > /* acquire lock here ... Yes, we do need locking, I knowi, I know... */
> > @@ -558,15 +593,23 @@
> > entry->owner = THIS_MODULE;
> > entry->proc_fops = &input_devices_fileops;
> >  
> > -   entry = create_proc_entry("handlers", 0, proc_bus_input_dir);
> > +   entry = create_proc_entry("activity", 0, proc_bus_input_dir);
> > if (!entry)
> > goto fail2;
> >  
> > entry->owner = THIS_MODULE;
> > +   entry->proc_fops = &input_activity_fileops;
> > +
> > +   entry = create_proc_entry("handlers", 0, proc_bus_input_dir);
> > +   if (!entry)
> > +   goto fail3;
> > +
> > +   entry->owner = THIS_MODULE;
> > entry->proc_fops = &input_handlers_fileops;
> >  
> > return 0;
> >  
> > + fail3:remove_proc_entry("activity", proc_bus_input_dir);
> >   fail2:remove_proc_entry("devices", proc_bus_input_dir);
> >   fail1: remove_proc_entry("input", proc_bus);
> > return -ENOMEM;
> > diff -ur OLD/include/linux/input.h NEW/include/linux/input.h
> > --- OLD/include/linux/input.h   2007-01-26 16:59:38.0 +0100
> > +++ NEW/include/linux/input.h   2007-01-26 17:31:29.0 +0100
> > @@ -949,6 +949,8 @@
> > const char *uniq;
> > struct input_id id;
> >  
> > +   unsigned long last_activity;
> > +   
> > unsigned long evbit[NBITS(EV_MAX)];
> > unsigned long keybit[NBITS(KEY_MAX)];
> > unsigned long relbit[NBITS(REL_MAX)];
> 
> > 
> > -- 
> > I don't think anyone should write their autobiography until after they're
> > dead. - Samuel Goldwyn
> 
> 
> -- 
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) 
> http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
> 

-- 
Vojtech Pavlik
Director SuSE Labs
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: +1 4 cz (Re: [Ksummit-2006-discuss] 2007 Linux Kernel Summit)

2007-01-25 Thread Vojtech Pavlik
On Tue, Jan 23, 2007 at 07:18:46PM +, Oleg Verych wrote:

> >> Ditto..
> >> 
> >> Definitely disagree with that. I'd like to see the conference somewhere
> >> else different this time - perhaps *Czech Republic*, or somewhere else more
> >> easterly and Linux active (or even Finland...)
> >
> > This is my position as well.
> > 
> > For the first time in many years I'm strongly considering actually
> > going to the kernel summit, however if it goes back to Ottawa I
> > definitely will stop going again.
> 
> It would be interesting. Thank you Alan, David!

I (and SUSE/Novell) would be happy to help organizing the Kernel Summit
in Czech Republic.

-- 
Vojtech Pavlik
Director SuSE Labs
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2.6.19] USB HID: proper LED-mapping (support for SpaceNavigator)

2007-01-15 Thread Vojtech Pavlik
On Mon, Jan 15, 2007 at 07:38:10PM +0100, Marcel Holtmann wrote:
> Hi Vojtech,
> 
> > > No, it didn't disappear, it was just moved to include/linux/hid-debug.h. 
> > 
> > Do you think that makes sense? It's code, not a header file.
> > 
> > > Should I wait for an updated patch that uses hid-debug.h again?
> 
> actually that code shouldn't be in a header file at all. It should be
> drivers/hid/hid-debug.c and the Makefile should compile it conditionally
> depending on a Kconfig option.
 
Agreed. It was a .h file just because of me being lazy.

-- 
Vojtech Pavlik
Director SuSE Labs
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2.6.19] USB HID: proper LED-mapping (support for SpaceNavigator)

2007-01-15 Thread Vojtech Pavlik
On Mon, Jan 15, 2007 at 05:44:26PM +0100, Jiri Kosina wrote:
> On Mon, 15 Jan 2007, Simon Budig wrote:
> 
> > > thanks for the patch. It seems that it is based on pre-2.6.20-rc1 kernel 
> > > (this is where the USBHID split happened and generic HID layer was 
> > > introduced). Could you please rebase it against newer version of kernel 
> > > and resend it?
> > I've updated the patch, will submit it to the list in a few minutes.
> 
> I got it, thanks.
> 
> > > All your changes happen to be in the transport-independent code, so it 
> > > seems that this would be rather trivial task - probably only pathnames 
> > > (and diff offsets) will change - your changes should now go to 
> > > drivers/hid/hid-*, not drivers/usb/input/hid-*.
> > Yeah, it was easy to port over. Did the hid-debug stuff disappear
> > completely? What would I use instead?
> 
> No, it didn't disappear, it was just moved to include/linux/hid-debug.h. 

Do you think that makes sense? It's code, not a header file.

> Should I wait for an updated patch that uses hid-debug.h again?

-- 
Vojtech Pavlik
Director SuSE Labs
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Note subscribers only lists for input subsystem

2006-12-13 Thread Vojtech Pavlik
On Tue, Dec 12, 2006 at 12:40:35PM -0500, Dmitry Torokhov wrote:
> On 12/12/06, Cal Peake <[EMAIL PROTECTED]> wrote:
> >According to Dmitry in <http://lkml.org/lkml/2006/10/17/280>, the input
> >list is subscribers only. I'm assuming here that both are but a
> >confirmation would be nice... :)
> >
> >From: Cal Peake <[EMAIL PROTECTED]>
> >
> >Annotate the MAINTAINERS file to reflect the subscribers only nature of
> >the input mailing lists.
> >
> 
> Actually I'd rather have them open, let's see if Vojtech could change
> that... The main problem is that not only input lists accept posts
> from subscribers only but they silently drop everything else. Maybe
> dropping only HTML posts would be a decent compromise.
 
Ok. I'll take a look at it.

-- 
Vojtech Pavlik
Director SuSE Labs
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Fix misspelled i8259 typo in io_apic.c

2005-09-09 Thread Vojtech Pavlik
On Fri, Sep 09, 2005 at 12:59:04PM +0200, Karsten Wiese wrote:
> The legacy PIC's name is "i8259".
> 
> Signed-off-by: Karsten Wiese <[EMAIL PROTECTED]>

Indeed, i82559 is an Ethernet NIC.

Signed-off-by: Vojtech Pavlik <[EMAIL PROTECTED]>

> diff -upr linux-2.6.13-rc6/arch/i386/kernel/io_apic.c 
> linux-2.6.13/arch/i386/kernel/io_apic.c
> --- linux-2.6.13-rc6/arch/i386/kernel/io_apic.c   2005-08-08 
> 11:46:00.0 +0200
> +++ linux-2.6.13/arch/i386/kernel/io_apic.c   2005-08-08 13:24:52.0 
> +0200
> @@ -1641,9 +1641,9 @@ void disable_IO_APIC(void)
>   clear_IO_APIC();
>  
>   /*
> -  * If the i82559 is routed through an IOAPIC
> +  * If the i8259 is routed through an IOAPIC
>* Put that IOAPIC in virtual wire mode
> -  * so legacy interrups can be delivered.
> +  * so legacy interrupts can be delivered.
>*/
>   pin = find_isa_irq_pin(0, mp_ExtINT);
>   if (pin != -1) {
> diff -upr linux-2.6.13-rc6/arch/x86_64/kernel/io_apic.c 
> linux-2.6.13/arch/x86_64/kernel/io_apic.c
> --- linux-2.6.13-rc6/arch/x86_64/kernel/io_apic.c 2005-08-08 
> 11:46:01.0 +0200
> +++ linux-2.6.13/arch/x86_64/kernel/io_apic.c 2005-08-08 13:25:09.0 
> +0200
> @@ -1138,9 +1138,9 @@ void disable_IO_APIC(void)
>   clear_IO_APIC();
>  
>   /*
> -  * If the i82559 is routed through an IOAPIC
> +  * If the i8259 is routed through an IOAPIC
>* Put that IOAPIC in virtual wire mode
> -  * so legacy interrups can be delivered.
> +  * so legacy interrupts can be delivered.
>*/
>   pin = find_isa_irq_pin(0, mp_ExtINT);
>   if (pin != -1) {
> _
> 
>   
> 
>   
>   
> ___ 
> Gesendet von Yahoo! Mail - Jetzt mit 1GB Speicher kostenlos - Hier anmelden: 
> http://mail.yahoo.de
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

-- 
Vojtech Pavlik
SuSE Labs, SuSE CR
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: INPUT: keyboard_tasklet - don't touch LED's of already grabed device

2005-09-06 Thread Vojtech Pavlik
On Tue, Sep 06, 2005 at 04:52:28AM -0700, Hugo Vanwoerkom wrote:
> 
> 
> --- Aivils Stoss <[EMAIL PROTECTED]> wrote:
> 
> > Hi, Vojtech!
> > 
> > Recent kernels allow exclusive usage of input device
> > when
> > input device is grabed. keyboard_tasklet does not
> > check
> > device state and switch LED's of all keyboards.
> > However
> > grabed device may be use another LED steering code.
> > 
> > This patch forbid keyboard_tasklet switch LED's of
> > grabed devices.
> > 
> > Aivils Stoss
> 
> 
> While trying this with 2.6.12 it gets a compilation
> error. Not when you move the added statements after
> the structure declaration.
> 
> Is that me heading for them thar hills?
 
The patch probably wasn't tested. ;)

-- 
Vojtech Pavlik
SuSE Labs, SuSE CR
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Change behaviour of psmouse-base.c under error conditions.

2005-09-05 Thread Vojtech Pavlik
e->resync_time = parent ? 0 : psmouse_resync_time;
>   psmouse->smartscroll = psmouse_smartscroll;
>  
>   psmouse_switch_protocol(psmouse, NULL);
> @@ -944,6 +1094,7 @@ static int psmouse_connect(struct serio 
>   device_create_file(&serio->dev, &psmouse_attr_rate);
>   device_create_file(&serio->dev, &psmouse_attr_resolution);
>   device_create_file(&serio->dev, &psmouse_attr_resetafter);
> + device_create_file(&serio->dev, &psmouse_attr_resync_time);
>  
>   psmouse_activate(psmouse);
>  
> @@ -1233,6 +1384,24 @@ static ssize_t psmouse_attr_set_resetaft
>   return count;
>  }
>  
> +static ssize_t psmouse_attr_show_resync_time(struct psmouse *psmouse, char 
> *buf)
> +{
> + return sprintf(buf, "%d\n", psmouse->resync_time);
> +}
> +
> +static ssize_t psmouse_attr_set_resync_time(struct psmouse *psmouse, const 
> char *buf, size_t count)
> +{
> + unsigned long value;
> + char *rest;
> +
> + value = simple_strtoul(buf, &rest, 10);
> + if (*rest)
> + return -EINVAL;
> +
> + psmouse->resync_time = value;
> + return count;
> +}
> +
>  static int psmouse_set_maxproto(const char *val, struct kernel_param *kp)
>  {
>   struct psmouse_protocol *proto;
> @@ -1259,13 +1428,21 @@ static int psmouse_get_maxproto(char *bu
>  
>  static int __init psmouse_init(void)
>  {
> + kpsmoused_wq = create_singlethread_workqueue("kpsmoused");
> + if (!kpsmoused_wq) {
> + printk(KERN_ERR "psmouse: failed to create kpsmoused 
> workqueue\n");
> + return -ENOMEM;
> + }
> +
>   serio_register_driver(&psmouse_drv);
> +
>   return 0;
>  }
>  
>  static void __exit psmouse_exit(void)
>  {
>   serio_unregister_driver(&psmouse_drv);
> + destroy_workqueue(kpsmoused_wq);
>  }
>  
>  module_init(psmouse_init);
> Index: work/drivers/input/mouse/logips2pp.c
> ===
> --- work.orig/drivers/input/mouse/logips2pp.c
> +++ work/drivers/input/mouse/logips2pp.c
> @@ -117,7 +117,7 @@ static int ps2pp_cmd(struct psmouse *psm
>   if (psmouse_sliced_command(psmouse, command))
>   return -1;
>  
> - if (ps2_command(&psmouse->ps2dev, param, PSMOUSE_CMD_POLL))
> + if (ps2_command(&psmouse->ps2dev, param, PSMOUSE_CMD_POLL | 0x0300))
>   return -1;
>  
>   return 0;
> Index: work/drivers/input/mouse/alps.c
> ===
> --- work.orig/drivers/input/mouse/alps.c
> +++ work/drivers/input/mouse/alps.c
> @@ -347,6 +347,39 @@ static int alps_tap_mode(struct psmouse 
>   return 0;
>  }
>  
> +/*
> + * alps_poll() - poll the touchpad for current motion packet.
> + * Used in resync.
> + */
> +static int alps_poll(struct psmouse *psmouse)
> +{
> + struct alps_data *priv = psmouse->private;
> + unsigned char buf[6];
> +
> + if (priv->i->flags & ALPS_PASS)
> + alps_passthrough_mode(psmouse, 1);
> +
> + if (ps2_command(&psmouse->ps2dev, buf, PSMOUSE_CMD_POLL | 
> (psmouse->pktsize << 8)))
> + return -1;
> +
> + if ((buf[0] & priv->i->mask0) != priv->i->byte0)
> + return -1;
> +
> + if (priv->i->flags & ALPS_PASS)
> + alps_passthrough_mode(psmouse, 0);
> +
> + if ((psmouse->badbyte & 0xc8) == 0x08) {
> +/*
> + * Poll the track stick ...
> + */
> + if (ps2_command(&psmouse->ps2dev, buf, PSMOUSE_CMD_POLL | (3 << 
> 8)))
> + return -1;
> + }
> +
> + memcpy(psmouse->packet, buf, sizeof(buf));
> + return 0;
> +}
> +
>  static int alps_reconnect(struct psmouse *psmouse)
>  {
>   struct alps_data *priv = psmouse->private;
> @@ -448,6 +481,7 @@ int alps_init(struct psmouse *psmouse)
>   printk(KERN_INFO "input: %s on %s\n", priv->dev2.name, 
> psmouse->ps2dev.serio->phys);
>  
>   psmouse->protocol_handler = alps_process_byte;
> + psmouse->poll = alps_poll;
>   psmouse->disconnect = alps_disconnect;
>   psmouse->reconnect = alps_reconnect;
>   psmouse->pktsize = 6;
> 

-- 
Vojtech Pavlik
SuSE Labs, SuSE CR
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


  1   2   3   4   5   6   >