Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-11 Thread Ingo Molnar

* Linus Torvalds  wrote:

> On Mon, Mar 10, 2014 at 1:19 PM, Linus Torvalds
>  wrote:
> >
> > If the only immediate problem is the code generation size, then Andy
> > already had a (simpler) hack-around:
> >
> >   #undef CONFIG_OPTIMIZE_INLINING
> >   #undef CONFIG_X86_PPRO_FENCE
> >
> > in vclock_gettime.c
> 
> Btw, we should seriously consider getting rid of CONFIG_X86_PPRO_FENCE.
> 
> It was of questionable value to begin with, and I think that the
> actual PPro bug is about one of
> 
>  - Errata 66, "Delayed line invalidation".
>  - Errata 92, "Potential loss of data coherency"
> 
> both of which affect all PPro versions afaik (there is also a UP 
> errata 51 wrt ordering of cached and uncached accesses that was 
> fixed in the sB1 stepping).
>
> And as far as I know, we have never actually seen the bug in real 
> life, EVEN WHEN PPRO WAS COMMON. The workaround was always based on 
> knowledge of the errata afaik.

I'm not aware of any active PPro testers either. Even P4 feedback has 
become very rare. New systems have become so cheap and so fast, and 
energy use an issue, that there's very little upside left to using old 
CPUs, other than the vintage thrill factor.

But ... when PPro was common our parallelization sucked, so I'd not be 
surprised if it triggered more frequently with a modern kernel.

Still I agree that it most likely does not matter:

> So I do think we might want to consider retiring that config option 
> entirely as a "historical oddity".

Ack.

> And very much so for the vdso case. Do we even do the asm 
> alternative fixups for the vdso?
> 
> I also suspect we should get rid of CONFIG_X86_OOSTORE, or at least 
> limit it to !SMP - I don't think anybody ever made SMP systems with 
> those IDT/Centaur Winchip chips in them.

Yeah.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-11 Thread H. Peter Anvin
On 03/10/2014 02:29 PM, stef...@seibold.net wrote:
> 
> Do you expect a complete new patch set or an incremental patch based on the
> current patch set?
> 

An incremental patch is probably easier.

-hpa


Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-10 Thread H. Peter Anvin
On 03/10/2014 02:39 PM, Linus Torvalds wrote:
> On Mon, Mar 10, 2014 at 2:25 PM,   wrote:
>>
>> This was discovered by me.
> 
> Sorry for the misattribution.
> 
>> But this is not a real solution, at least when vcpu function support
>> will be added, then the code size will exceed the page size. Reserving
>> two pages for the VDSO is a good option.
> 
> Quite frankly, there is no way in hell I will take a patch like that
> for 3.14 any more, and I would argue against it for stable.
> 
> Now, if this problem never happens with current kernels (because it's
> purely due to the patch in -tip), then I don't much care.
> 

It is only for tip:x86/vdso, so current kernels don't matter.

There is going to be 32-bit use in the embedded sector for a long time
to come, I suspect/fear, so I'm not opposed to giving it a bit of a
performance boost as long as it isn't too invasive.

I think Andy's commentary applies, though :)

> IMO this is dumb.  I can think of two sensible solutions:
> 
> 1. Get rid of compat vdso and replace it with no vdso at all.  This is
> compatible with everything and requires almost no code 
> 
> 2. Fix compat vdso.  Give it as much space as needed, make the address
> dynamic, and relocate it to the right place.


-hpa



Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-10 Thread H. Peter Anvin
On 03/10/2014 02:51 PM, Dave Jones wrote:
> 
> Even when it worked, it was only a small performance increase anyway,
> 

If it is performance rather than correctness, then let's kill it now.

I'd love to push that patchset already for 3.15, anyone want to write it
up (I'm on a trip)... otherwise I'll do it Wednesday or so.

-hpa



Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-10 Thread Andy Lutomirski
On Mon, Mar 10, 2014 at 3:03 PM, Andy Lutomirski  wrote:
> On Mon, Mar 10, 2014 at 2:53 PM,   wrote:
>> Quoting Linus Torvalds:
>>
>>> On Mon, Mar 10, 2014 at 2:25 PM,   wrote:


>>>> This was discovered by me.
>>>
>>>
>>> Sorry for the misattribution.
>>>
>>>> But this is not a real solution, at least when vcpu function support
>>>> will be added, then the code size will exceed the page size. Reserving
>>>> two pages for the VDSO is a good option.
>>>
>>>
>>> Quite frankly, there is no way in hell I will take a patch like that
>>> for 3.14 any more, and I would argue against it for stable.
>>>
>>> Now, if this problem never happens with current kernels (because it's
>>> purely due to the patch in -tip), then I don't much care.
>>>
>>> That said, I don't understand why we are even adding new features like
>>> this to 32-bit mode in the first place, so if that patch is the sole
>>> source of all this headache, then why not just throw the patch away?
>>>
>>
>> The patch is working. And for this current issue there is a solution I
>> already announced.
>>
>> A dual VDSO: a one-page VDSO for the compat mode which has only the
>> syscall code, and one multi-page VDSO which is mapped into user space
>> for the non-compat mode.
>>
>> This will work and has no side effects.
>
> IMO this is dumb.  I can think of two sensible solutions:
>
> 1. Get rid of compat vdso and replace it with no vdso at all.  This is
> compatible with everything and requires almost no code :)
>
> 2. Fix compat vdso.  Give it as much space as needed, make the address
> dynamic, and relocate it to the right place.
>
> I see no legitimate reason to further increase the number of 32-bit
> vdso images.  Three is already ridiculous, and adding more is IMO
> hideous.
>
> #1 is actually a serious proposal.  To do it right, I think we should
> rename the config option to CONFIG_BROKEN_GLIBC_VDSO, default it to n,
> and make the help text clarify that this only affects certain
> non-released glibc versions and that anyone building a new kernel is
> highly unlikely to be affected.  Then make vdso=2 act just like
> vdso=0.  CONFIG_BROKEN_GLIBC_VDSO just changes the default from vdso=1
> to vdso=0.
>
> Damn it, the number of users who (a) have a buggy copy of glibc, (b)
> are using new kernels, and (c) are using CONFIG_COMPAT_VDSO as opposed
> to, say, vdso=2 is probably very close to zero.  (These users will
> have issues until they fix their config.)
>
> The number of users who (a) have a buggy copy of glibc, (b) are using
> new kernels, and (c) have cpus that derive significant benefit from
> using a vdso instead of int 80 and care at all is probably also very
> close to zero.
>
> The maintenance burden of this piece of shite is empirically quite far
> from zero.

I'm testing a patch.  If it seems to work, I'll send it out.  It's a
big cleanup.

>
> --Andy



-- 
Andy Lutomirski
AMA Capital Management, LLC


Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-10 Thread Andy Lutomirski
On Mon, Mar 10, 2014 at 2:53 PM,   wrote:
> Quoting Linus Torvalds:
>
>> On Mon, Mar 10, 2014 at 2:25 PM,   wrote:
>>>
>>>
>>> This was discovered by me.
>>
>>
>> Sorry for the misattribution.
>>
>>> But this is not a real solution, at least when vcpu function support
>>> will be added, then the code size will exceed the page size. Reserving
>>> two pages for the VDSO is a good option.
>>
>>
>> Quite frankly, there is no way in hell I will take a patch like that
>> for 3.14 any more, and I would argue against it for stable.
>>
>> Now, if this problem never happens with current kernels (because it's
>> purely due to the patch in -tip), then I don't much care.
>>
>> That said, I don't understand why we are even adding new features like
>> this to 32-bit mode in the first place, so if that patch is the sole
>> source of all this headache, then why not just throw the patch away?
>>
>
> The patch is working. And for this current issue there is a solution I
> already announced.
>
> A dual VDSO: a one-page VDSO for the compat mode which has only the
> syscall code, and one multi-page VDSO which is mapped into user space
> for the non-compat mode.
>
> This will work and has no side effects.

IMO this is dumb.  I can think of two sensible solutions:

1. Get rid of compat vdso and replace it with no vdso at all.  This is
compatible with everything and requires almost no code :)

2. Fix compat vdso.  Give it as much space as needed, make the address
dynamic, and relocate it to the right place.

I see no legitimate reason to further increase the number of 32-bit
vdso images.  Three is already ridiculous, and adding more is IMO
hideous.

#1 is actually a serious proposal.  To do it right, I think we should
rename the config option to CONFIG_BROKEN_GLIBC_VDSO, default it to n,
and make the help text clarify that this only affects certain
non-released glibc versions and that anyone building a new kernel is
highly unlikely to be affected.  Then make vdso=2 act just like
vdso=0.  CONFIG_BROKEN_GLIBC_VDSO just changes the default from vdso=1
to vdso=0.
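
A minimal sketch of how proposal #1 would map the settings, written as plain C rather than kernel code. The enum and both function names are hypothetical; only the vdso=0/1/2 values and the CONFIG_BROKEN_GLIBC_VDSO name come from the text above.

```c
#include <stdbool.h>

/* Hypothetical model of the proposal; not the kernel's actual code. */
enum vdso_mode {
    VDSO_DISABLED = 0,  /* vdso=0: no vdso is mapped */
    VDSO_ENABLED  = 1,  /* vdso=1: the normal vdso */
    VDSO_COMPAT   = 2   /* vdso=2: today's compat vdso */
};

/* Default used when no vdso= parameter is given:
 * CONFIG_BROKEN_GLIBC_VDSO only flips it from 1 to 0. */
static enum vdso_mode default_vdso_mode(bool config_broken_glibc_vdso)
{
    return config_broken_glibc_vdso ? VDSO_DISABLED : VDSO_ENABLED;
}

/* Under the proposal, vdso=2 is still accepted but acts just like vdso=0.
 * In this sketch any other unknown value is also treated as disabled. */
static enum vdso_mode effective_vdso_mode(int requested)
{
    if (requested == VDSO_ENABLED)
        return VDSO_ENABLED;
    return VDSO_DISABLED;
}
```

With this mapping the compat image no longer needs to exist at all: every request resolves to either "normal vdso" or "no vdso".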

Damn it, the number of users who (a) have a buggy copy of glibc, (b)
are using new kernels, and (c) are using CONFIG_COMPAT_VDSO as opposed
to, say, vdso=2 is probably very close to zero.  (These users will
have issues until they fix their config.)

The number of users who (a) have a buggy copy of glibc, (b) are using
new kernels, and (c) have cpus that derive significant benefit from
using a vdso instead of int 80 and care at all is probably also very
close to zero.

The maintenance burden of this piece of shite is empirically quite far
from zero.

--Andy


Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-10 Thread stefani

Quoting Linus Torvalds:

> On Mon, Mar 10, 2014 at 2:25 PM,   wrote:
>>
>> This was discovered by me.
>
> Sorry for the misattribution.
>
>> But this is not a real solution, at least when vcpu function support
>> will be added, then the code size will exceed the page size. Reserving
>> two pages for the VDSO is a good option.
>
> Quite frankly, there is no way in hell I will take a patch like that
> for 3.14 any more, and I would argue against it for stable.
>
> Now, if this problem never happens with current kernels (because it's
> purely due to the patch in -tip), then I don't much care.
>
> That said, I don't understand why we are even adding new features like
> this to 32-bit mode in the first place, so if that patch is the sole
> source of all this headache, then why not just throw the patch away?

The patch is working. And for this current issue there is a solution I
already announced.

A dual VDSO: a one-page VDSO for the compat mode which has only the
syscall code, and one multi-page VDSO which is mapped into user space
for the non-compat mode.

This will work and has no side effects.

- Stefani




Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-10 Thread Dave Jones
On Mon, Mar 10, 2014 at 02:20:34PM -0700, Linus Torvalds wrote:

 > I also suspect we should get rid of CONFIG_X86_OOSTORE, or at least
 > limit it to !SMP - I don't think anybody ever made SMP systems with
 > those IDT/Centaur Winchip chips in them.

Given the number of people who ever used that code when it was new could
probably be counted on a couple hands, I'd be amazed if a) anyone was still
using it, and b) that it hasn't regressed in some way in the last 15 years.

Even when it worked, it was only a small performance increase anyway,
and anyone who notices a circa 1998 CPU is now slightly slower on benchmarks
in 2014 probably needs psychiatric help.

I'd say rip it out completely.

Dave



Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-10 Thread Andy Lutomirski
On Mon, Mar 10, 2014 at 2:20 PM, Linus Torvalds
 wrote:
> On Mon, Mar 10, 2014 at 1:19 PM, Linus Torvalds
>  wrote:
>>
>> If the only immediate problem is the code generation size, then Andy
>> already had a (simpler) hack-around:
>>
>>   #undef CONFIG_OPTIMIZE_INLINING
>>   #undef CONFIG_X86_PPRO_FENCE
>>
>> in vclock_gettime.c
>
> Btw, we should seriously consider getting rid of CONFIG_X86_PPRO_FENCE.
>
> It was of questionable value to begin with, and I think that the
> actual PPro bug is about one of
>
>  - Errata 66, "Delayed line invalidation".
>  - Errata 92, "Potential loss of data coherency"
>
> both of which affect all PPro versions afaik (there is also a UP
> errata 51 wrt ordering of cached and uncached accesses that was fixed
> in the sB1 stepping).
>
> And as far as I know, we have never actually seen the bug in real
> life, EVEN WHEN PPRO WAS COMMON. The workaround was always based on
> knowledge of the errata afaik.

I admit I don't fully follow the description of the errata, but it's
not obvious to me that making smp_rmb() emit lfence is going to do any
good.  The description seems to be suggesting using actual LOCK
operations to work around the erratum.

>
> So I do think we might want to consider retiring that config option
> entirely as a "historical oddity".
>
> And very much so for the vdso case. Do we even do the asm alternative
> fixups for the vdso?

Yes, we've done that for a couple years for rdtsc_barrier's benefit.

>
> I also suspect we should get rid of CONFIG_X86_OOSTORE, or at least
> limit it to !SMP - I don't think anybody ever made SMP systems with
> those IDT/Centaur Winchip chips in them.

Why does OOSTORE matter for !SMP?  Is it just for poking at hardware registers?

--Andy


Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-10 Thread Linus Torvalds
On Mon, Mar 10, 2014 at 2:25 PM,   wrote:
>
> This was discovered by me.

Sorry for the misattribution.

> But this is not a real solution, at least when vcpu function support
> will be added, then the code size will exceed the page size. Reserving
> two pages for the VDSO is a good option.

Quite frankly, there is no way in hell I will take a patch like that
for 3.14 any more, and I would argue against it for stable.

Now, if this problem never happens with current kernels (because it's
purely due to the patch in -tip), then I don't much care.

That said, I don't understand why we are even adding new features like
this to 32-bit mode in the first place, so if that patch is the sole
source of all this headache, then why not just throw the patch away?

 Linus


Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-10 Thread stefani


Quoting "H. Peter Anvin":

> On 03/10/2014 01:03 PM, Stefani Seibold wrote:
>>
>> What is now the next step? Kick out the compat VDSO? Or should I
>> implement the dual VDSO? And what is now the preferred way to map the
>> VDSO into the user space? Using install_special_mapping() or map it
>> beyond the user stack?
>>
>> The easiest and fastest way to get a working result is to do the non
>> compat VDSO only mapping using install_special_mapping(). The dual VDSO
>> would take a little bit more time.
>>
>> It would be great to have first a consensus about the design before I
>> start to implement ;-)
>
> The quick way to get something working is simply to reserve more than
> one page (two should presumably be enough) in the fixmap and adjust the
> link address of the VDSO accordingly.  This is not where we want to go
> in the long term, but it doesn't seem to make sense to try to do
> everything all at once -- we are already starting to push way too close
> to the 3.15 merge window.

Do you expect a complete new patch set or an incremental patch based on the
current patch set?





Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-10 Thread stefani


Quoting Linus Torvalds:

> On Mon, Mar 10, 2014 at 1:06 PM, H. Peter Anvin  wrote:
>>
>> The quick way to get something working is simply to reserve more than
>> one page (two should presumably be enough) in the fixmap and adjust the
>> link address of the VDSO accordingly.  This is not where we want to go
>> in the long term, but it doesn't seem to make sense to try to do
>> everything all at once -- we are already starting to push way too close
>> to the 3.15 merge window.
>
> If the only immediate problem is the code generation size, then Andy
> already had a (simpler) hack-around:
>
>   #undef CONFIG_OPTIMIZE_INLINING
>   #undef CONFIG_X86_PPRO_FENCE
>
> in vclock_gettime.c.

This was discovered by me.

> I think we could make it a bit less hacky by just restricting the
> inlining of the paravirt case, since that's presumably the crap code
> that causes things to grow too large. Or find out what in there it is
> that explodes in size, and just try to de-crapify the code enough that
> it no longer does that.

Both of the config options above make the code grow. X86_PPRO_FENCE adds
alternatives which increase the code by 600 bytes, and OPTIMIZE_INLINING
adds another 500 bytes.

But this is not a real solution: once vcpu function support is added,
the code size will exceed the page size. Reserving two pages for the
VDSO is a good option.

- Stefani





Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-10 Thread Linus Torvalds
On Mon, Mar 10, 2014 at 1:19 PM, Linus Torvalds
 wrote:
>
> If the only immediate problem is the code generation size, then Andy
> already had a (simpler) hack-around:
>
>   #undef CONFIG_OPTIMIZE_INLINING
>   #undef CONFIG_X86_PPRO_FENCE
>
> in vclock_gettime.c

Btw, we should seriously consider getting rid of CONFIG_X86_PPRO_FENCE.

It was of questionable value to begin with, and I think that the
actual PPro bug is about one of

 - Errata 66, "Delayed line invalidation".
 - Errata 92, "Potential loss of data coherency"

both of which affect all PPro versions afaik (there is also a UP
errata 51 wrt ordering of cached and uncached accesses that was fixed
in the sB1 stepping).

And as far as I know, we have never actually seen the bug in real
life, EVEN WHEN PPRO WAS COMMON. The workaround was always based on
knowledge of the errata afaik.

So I do think we might want to consider retiring that config option
entirely as a "historical oddity".

And very much so for the vdso case. Do we even do the asm alternative
fixups for the vdso?

I also suspect we should get rid of CONFIG_X86_OOSTORE, or at least
limit it to !SMP - I don't think anybody ever made SMP systems with
those IDT/Centaur Winchip chips in them.

Linus


Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-10 Thread Linus Torvalds
On Mon, Mar 10, 2014 at 1:06 PM, H. Peter Anvin  wrote:
>
> The quick way to get something working is simply to reserve more than
> one page (two should presumably be enough) in the fixmap and adjust the
> link address of the VDSO accordingly.  This is not where we want to go
> in the long term, but it doesn't seem to make sense to try to do
> everything all at once -- we are already starting to push way too close
> to the 3.15 merge window.

If the only immediate problem is the code generation size, then Andy
already had a (simpler) hack-around:

  #undef CONFIG_OPTIMIZE_INLINING
  #undef CONFIG_X86_PPRO_FENCE

in vclock_gettime.c.
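
As a rough illustration of why an #undef like this shrinks the generated code, here is a minimal sketch outside the kernel. CONFIG_DEMO_FENCE, demo_rmb() and demo_load() are made-up names, and the C11 fences only stand in for the kernel's barrier macros; the real options and headers are more involved.

```c
#include <stdatomic.h>

/* Pretend the build system enabled this option globally. */
#define CONFIG_DEMO_FENCE 1

/* The hack: switch the option off for this one source file only.
 * It has to come before any code that tests the macro. */
#undef CONFIG_DEMO_FENCE

#ifdef CONFIG_DEMO_FENCE
/* Heavy variant: a real hardware fence at every inlined call site. */
static inline void demo_rmb(void) { atomic_thread_fence(memory_order_seq_cst); }
#define DEMO_RMB_IS_HEAVY 1
#else
/* Light variant: compiler-only barrier, typically no instruction emitted. */
static inline void demo_rmb(void) { atomic_signal_fence(memory_order_seq_cst); }
#define DEMO_RMB_IS_HEAVY 0
#endif

/* Every inlined caller carries whichever variant was selected above. */
static inline int demo_load(const int *p)
{
    int v = *p;
    demo_rmb();
    return v;
}
```

Because the barrier is inlined into each caller, the choice made by the preprocessor is multiplied by the number of call sites, which is how a single config option can add hundreds of bytes to a small image.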

I think we could make it a bit less hacky by just restricting the
inlining of the paravirt case, since that's presumably the crap code
that causes things to grow too large. Or find out what in there it is
that explodes in size, and just try to de-crapify the code enough that
it no longer does that.

Or is there something else going on too?

   Linus


Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-10 Thread H. Peter Anvin
On 03/10/2014 01:03 PM, Stefani Seibold wrote:
> 
> What is now the next step? Kick out the compat VDSO? Or should i
> implement the dual VDSO. And what is now the preferred way to map the
> VDSO into the user space? Using install_special_mapping() or map it
> beyond the user stack?
> 
> The easiest and fastest way to get a working result is to do the non
> compat VDSO only mapping using install_special_mapping(). The dual VDSO
> would take a little bit more time.
> 
> It would be great to have first a consensus about the design before i
> start to implement ;-)
> 

The quick way to get something working is simply to reserve more than
one page (two should presumably be enough) in the fixmap and adjust the
link address of the VDSO accordingly.  This is not where we want to go
in the long term, but it doesn't seem to make sense to try to do
everything all at once -- we are already starting to push way too close
to the 3.15 merge window.
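
The "more than one page" arithmetic can be sketched as below. This is only an illustration: DEMO_PAGE_SIZE and pages_needed() are hypothetical names, 4096 bytes is the usual 32-bit x86 page size, and no actual VDSO image size is assumed.

```c
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096u  /* 4 KiB pages, as on 32-bit x86 */

/* Number of whole pages needed to hold an image of image_bytes bytes,
 * i.e. the size rounded up to the next page boundary. */
static size_t pages_needed(size_t image_bytes)
{
    return (image_bytes + DEMO_PAGE_SIZE - 1) / DEMO_PAGE_SIZE;
}
```

An image that is even one byte over a page boundary needs a second page, which is why a few hundred bytes of extra inlined code can be enough to break a one-page reservation.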

And special thanks to Andy for doing the archaeology...

-hpa




Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-10 Thread Stefani Seibold
On Monday, 10.03.2014, at 10:12 -0700, Andy Lutomirski wrote:
> On Mon, Mar 10, 2014 at 8:11 AM, Linus Torvalds
>  wrote:
> >
> > On Mar 10, 2014 8:01 AM, "H. Peter Anvin"  wrote:
> >>
> >> I have mentioned in the past wanting to move the fixmap to the low part
> >> of the kernel space, because the top isn't really fixed...
> >
> > How about the high part of the user address space, just above the stack?
> > Leave a unmapped page in between, or something. The stack is already
> > randomized, isn't it?
> 
> For the !compat_vdso case, I don't like it -- this will put the vdso
> (which is executable) at a constant offset from the stack, which will
> make it much easier to use the vdso to defeat ASLR.
> 
> For the compat_vdso case, this only works if the address is *not*
> random, unless we're going to start giving each process its very own
> relocated vdso.
> 
> >
> > That would actually be preferable in a few ways, notably not having to mark
> > page directories user accessible in the kernel space area.
> 
> Is that where the rabid pte dogs live?
> 
> We can already avoid making fixmap pages user-accessible in the
> !compat_vdso case for 32-bit tasks -- the vdso lives in a couple of
> more-or-less ordinary vmas.
> 

What is now the next step? Kick out the compat VDSO? Or should i
implement the dual VDSO. And what is now the preferred way to map the
VDSO into the user space? Using install_special_mapping() or map it
beyond the user stack?

The easiest and fastest way to get a working result is to do the non
compat VDSO only mapping using install_special_mapping(). The dual VDSO
would take a little bit more time.

It would be great to have first a consensus about the design before i
start to implement ;-)

- Stefani



Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-10 Thread Greg Kroah-Hartman
On Sun, Mar 09, 2014 at 05:00:31PM -0700, H. Peter Anvin wrote:
> On 03/09/2014 12:08 AM, Stefani Seibold wrote:
> > 
> > This was not addressed to you, it was addressed to the x86 intel kernel
> > developers to do more testing, since this piece of code has so many side
> > effects. I apologize for this misunderstanding.
> > 
> 
> I think you're misunderstanding.
> 
> We cannot debug every single contributor's code for them.  There isn't
> enough of us to go around.  We have in fact stretched well beyond the
> point which we usually can accommodate for this particular patchset.

What is failing in the tests that need to be fixed up?

I can look into this next week when I return.

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-10 Thread Andy Lutomirski
On Mon, Mar 10, 2014 at 10:58 AM, H. Peter Anvin  wrote:
> On 03/10/2014 10:52 AM, Andy Lutomirski wrote:
>>>
>>> Hint: where is your RIP?  Where is the RIP of other processes?
>>>
>>
>> Whoa there, I'm not suggesting anything nearly that crazy :)
>>
>> I'm suggesting changing out the vvar page *for that process*, which is
>> not executable.  The actual vdso code already supports this -- from
>> userspace's point of view it's the same thing as 'echo acpi_pm >
>> /sys/devices/system/clocksource/clocksource0/current_clocksource',
>> except that if the actual clocksource is HPET, the hpet page will be
>> switched out (presumably with a zero page) while being read.
>>
>> Other processes are totally irrelevant, unless they share the same
>> struct mm.  (This is why the vvar page can't be in the fixmap for this
>> to work.)
>>
>
> I meant "threads" not "processes"...

Still okay.  The vclock_gettime code does, more or less:

do {
        seq = raw_read_seqcount_begin(&gtod->seq);
        mode = gtod->clock.vclock_mode;
        read the time;
} while (unlikely(read_seqcount_retry(&gtod->seq, seq)));

Switching the clocksource in current code will make seq odd, then
change vclock_mode, then make seq even again.  The prctl would zap the
mapping, flush the TLB, and then map something else (with a different
seq and vclock_mode) there.  User code will be hard pressed to tell
the difference.

To avoid having to carve out a special seq value, I'd actually propose
just leaving seq odd for the TSC off case -- I think that the
vclock_gettime code could move the branch for mode == NONE inside the
loop with no loss in performance.
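
A userspace sketch of that idea, using C11 atomics instead of the kernel's seqcount helpers. Everything named demo_* or DEMO_* is hypothetical, the struct is not the real vvar layout, and sequentially consistent atomics are used for simplicity rather than speed. It assumes a single writer.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_VCLOCK_NONE 0   /* no usable fast clock: use the syscall */
#define DEMO_VCLOCK_TSC  1

/* Hypothetical stand-in for the per-mm time data. */
struct demo_gtod {
    atomic_uint seq;           /* odd while an update is in progress */
    atomic_int vclock_mode;
    atomic_uint_fast64_t ns;   /* the "time" payload */
};

/* Writer: make seq odd, change the fields, make seq even again. */
static void demo_update(struct demo_gtod *g, int mode, uint64_t ns)
{
    unsigned odd = atomic_load(&g->seq) | 1u;
    atomic_store(&g->seq, odd);
    atomic_store(&g->vclock_mode, mode);
    atomic_store(&g->ns, ns);
    atomic_store(&g->seq, odd + 1);
}

/* "TSC off" case: set mode to NONE and simply leave seq odd. */
static void demo_disable(struct demo_gtod *g)
{
    atomic_store(&g->seq, atomic_load(&g->seq) | 1u);
    atomic_store(&g->vclock_mode, DEMO_VCLOCK_NONE);
}

/* Reader: the mode == NONE branch sits inside the retry loop, so a
 * permanently odd seq cannot make the loop spin forever. */
static bool demo_read(struct demo_gtod *g, uint64_t *ns_out)
{
    unsigned s0, s1;
    uint64_t ns;

    do {
        s0 = atomic_load(&g->seq);
        if (atomic_load(&g->vclock_mode) == DEMO_VCLOCK_NONE)
            return false;      /* caller falls back to the real syscall */
        ns = atomic_load(&g->ns);
        s1 = atomic_load(&g->seq);
    } while ((s0 & 1u) || s0 != s1);

    *ns_out = ns;
    return true;
}
```

Falling back to the syscall is always a correct answer, so it does not matter whether a reader that races with an update sees the old or the new mode.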

--Andy


Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-10 Thread H. Peter Anvin
On 03/10/2014 10:52 AM, Andy Lutomirski wrote:
>>
>> Hint: where is your RIP?  Where is the RIP of other processes?
>>
> 
> Whoa there, I'm not suggesting anything nearly that crazy :)
> 
> I'm suggesting changing out the vvar page *for that process*, which is
> not executable.  The actual vdso code already supports this -- from
> userspace's point of view it's the same thing as 'echo acpi_pm >
> /sys/devices/system/clocksource/clocksource0/current_clocksource',
> except that if the actual clocksource is HPET, the hpet page will be
> switched out (presumably with a zero page) while being read.
> 
> Other processes are totally irrelevant, unless they share the same
> struct mm.  (This is why the vvar page can't be in the fixmap for this
> to work.)
> 

I meant "threads" not "processes"...

-hpa




Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-10 Thread Andy Lutomirski
On Mon, Mar 10, 2014 at 10:48 AM, H. Peter Anvin  wrote:
> On 03/10/2014 10:46 AM, Andy Lutomirski wrote:
>>>
>>> Yes, we'd have to switch the vdso to using syscall access.  Doing that
>>> from inside a system call is... "interesting".
>>
>> It's a little less interesting if it just involves changing a vma.
>> It's still tricky, though -- would each struct mm have its own struct
>> file for the vvar page?  Can this be done with some
>> vm_operations_struct magic?  There are possible races, too, though --
>> another thread could access the thing concurrently with a syscall.
>>
>
> Hint: where is your RIP?  Where is the RIP of other processes?
>

Whoa there, I'm not suggesting anything nearly that crazy :)

I'm suggesting changing out the vvar page *for that process*, which is
not executable.  The actual vdso code already supports this -- from
userspace's point of view it's the same thing as 'echo acpi_pm >
/sys/devices/system/clocksource/clocksource0/current_clocksource',
except that if the actual clocksource is HPET, the hpet page will be
switched out (presumably with a zero page) while being read.

Other processes are totally irrelevant, unless they share the same
struct mm.  (This is why the vvar page can't be in the fixmap for this
to work.)
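
As a small aside, the sysfs file mentioned above can also be read from
a program to see which clocksource is currently active.  A minimal
sketch, assuming the sysfs path quoted above exists on the running
system; the helper name read_current_clocksource is made up here.

```c
#include <stdio.h>
#include <string.h>

/*
 * Copies the name of the active clocksource into buf (len must be > 0).
 * Returns 0 on success, -1 if the sysfs file cannot be opened or read.
 */
int read_current_clocksource(char *buf, size_t len)
{
	FILE *f = fopen("/sys/devices/system/clocksource/clocksource0/"
			"current_clocksource", "r");

	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';	/* drop the trailing newline */
	return 0;
}
```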

--Andy


Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-10 Thread H. Peter Anvin
On 03/10/2014 10:46 AM, Andy Lutomirski wrote:
> 
> It might be nice in general for there to be a /dev/vdso and for the
> vdso to literally be a mapping of that device node.  I bet that CRIU
> would appreciate this.  (The mmap flags would be a little odd, since
> different pages have different protections.)
> 

Actually, it presumably ought to be handled like any other (readonly)
ELF file: that is, let the mapper handle the permissions.

-hpa



Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-10 Thread H. Peter Anvin
On 03/10/2014 10:46 AM, Andy Lutomirski wrote:
>>
>> Yes, we'd have to switch the vdso to using syscall access.  Doing that
>> from inside a system call is... "interesting".
> 
> It's a little less interesting if it just involves changing a vma.
> It's still tricky, though -- would each struct mm have its own struct
> file for the vvar page?  Can this be done with some
> vm_operations_struct magic?  There are possible races, too, though --
> another thread could access the thing concurrently with a syscall.
> 

Hint: where is your RIP?  Where is the RIP of other processes?

-hpa



Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-10 Thread Andy Lutomirski
On Mon, Mar 10, 2014 at 10:38 AM, H. Peter Anvin  wrote:
> On 03/10/2014 10:31 AM, Andy Lutomirski wrote:
>>>
>>>> For 64-bit, this is an entirely different story.  The vsyscall page is
>>>> stuck in the fixmap forever, although I want to add a way for
>>>> userspace to opt out.  The vvar page, hpet, etc could move into vmas,
>>>> though.  I kind of want to do that anyway to allow processes to turn
>>>> off the ability to read the clock.
>>>
>>> Wait... you want to do what?!
>>
>> This isn't even my idea:
>>
>> commit 8fb402bccf203ecca8f9e0202b8fd3c937dece6f
>> Author: Erik Bosman 
>> Date:   Fri Apr 11 18:54:17 2008 +0200
>>
>> generic, x86: add prctl commands PR_GET_TSC and PR_SET_TSC
>>
>> This patch adds prctl commands that make it possible
>> to deny the execution of timestamp counters in userspace.
>> If this is not implemented on a specific architecture,
>> prctl will return -EINVAL.
>>
>> Currently anything that tries to use the vdso will just crash if you
>> do that, and it fails to turn off direct HPET access.  Fixing this
>> might be nice, but the current vvar implementation makes it
>> impossible.  If you want to stick something in a seccomp sandbox and
>> make it very difficult for it to exploit timing side channels, then
>> this is important :)
>>
>
> Yes, we'd have to switch the vdso to using syscall access.  Doing that
> from inside a system call is... "interesting".

It's a little less interesting if it just involves changing a vma.
It's still tricky, though -- would each struct mm have its own struct
file for the vvar page?  Can this be done with some
vm_operations_struct magic?  There are possible races, too, though --
another thread could access the thing concurrently with a syscall.

It might be nice in general for there to be a /dev/vdso and for the
vdso to literally be a mapping of that device node.  I bet that CRIU
would appreciate this.  (The mmap flags would be a little odd, since
different pages have different protections.)

Anyway, this is totally off topic for the current issue :)

--Andy


Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-10 Thread H. Peter Anvin
On 03/10/2014 10:31 AM, Andy Lutomirski wrote:
>>
>>> For 64-bit, this is an entirely different story.  The vsyscall page is
>>> stuck in the fixmap forever, although I want to add a way for
>>> userspace to opt out.  The vvar page, hpet, etc could move into vmas,
>>> though.  I kind of want to do that anyway to allow processes to turn
>>> off the ability to read the clock.
>>
>> Wait... you want to do what?!
> 
> This isn't even my idea:
> 
> commit 8fb402bccf203ecca8f9e0202b8fd3c937dece6f
> Author: Erik Bosman 
> Date:   Fri Apr 11 18:54:17 2008 +0200
> 
> generic, x86: add prctl commands PR_GET_TSC and PR_SET_TSC
> 
> This patch adds prctl commands that make it possible
> to deny the execution of timestamp counters in userspace.
> If this is not implemented on a specific architecture,
> prctl will return -EINVAL.
> 
> Currently anything that tries to use the vdso will just crash if you
> do that, and it fails to turn off direct HPET access.  Fixing this
> might be nice, but the current vvar implementation makes it
> impossible.  If you want to stick something in a seccomp sandbox and
> make it very difficult for it to exploit timing side channels, then
> this is important :)
> 

Yes, we'd have to switch the vdso to using syscall access.  Doing that
from inside a system call is... "interesting".

-hpa




Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-10 Thread Andy Lutomirski
On Mon, Mar 10, 2014 at 10:24 AM, H. Peter Anvin  wrote:
> On 03/10/2014 10:12 AM, Andy Lutomirski wrote:
>> On Mon, Mar 10, 2014 at 8:11 AM, Linus Torvalds
>>  wrote:
>>>
>>> On Mar 10, 2014 8:01 AM, "H. Peter Anvin"  wrote:

>>>> I have mentioned in the past wanting to move the fixmap to the low part
>>>> of the kernel space, because the top isn't really fixed...
>>>
>>> How about the high part of the user address space, just above the stack?
>>> Leave a unmapped page in between, or something. The stack is already
>>> randomized, isn't it?
>>
>> For the !compat_vdso case, I don't like it -- this will put the vdso
>> (which is executable) at a constant offset from the stack, which will
>> make it much easier to use the vdso to defeat ASLR.
>>
>> For the compat_vdso case, this only works if the address is *not*
>> random, unless we're going to start giving each process its very own
>> relocated vdso.
>>
>
> I presumed we were talking about compat_vdso, which thus simply turns
> into a "don't randomize the vdso flag."  A significant side benefit is
> that this should make the code more similar.

Fair enough.  I still don't like having (top of stack - vdso) being
constant, but maybe that's avoidable.

>
>> For 64-bit, this is an entirely different story.  The vsyscall page is
>> stuck in the fixmap forever, although I want to add a way for
>> userspace to opt out.  The vvar page, hpet, etc could move into vmas,
>> though.  I kind of want to do that anyway to allow processes to turn
>> off the ability to read the clock.
>
> Wait... you want to do what?!

This isn't even my idea:

commit 8fb402bccf203ecca8f9e0202b8fd3c937dece6f
Author: Erik Bosman 
Date:   Fri Apr 11 18:54:17 2008 +0200

generic, x86: add prctl commands PR_GET_TSC and PR_SET_TSC

This patch adds prctl commands that make it possible
to deny the execution of timestamp counters in userspace.
If this is not implemented on a specific architecture,
prctl will return -EINVAL.

Currently anything that tries to use the vdso will just crash if you
do that, and it fails to turn off direct HPET access.  Fixing this
might be nice, but the current vvar implementation makes it
impossible.  If you want to stick something in a seccomp sandbox and
make it very difficult for it to exploit timing side channels, then
this is important :)
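
For reference, the prctl interface added by that commit can be driven
from userspace roughly as follows.  This is only a sketch: PR_GET_TSC,
PR_SET_TSC, PR_TSC_ENABLE and PR_TSC_SIGSEGV come from the kernel
headers, while the two wrapper names are hypothetical.

```c
#include <sys/prctl.h>

/*
 * Returns PR_TSC_ENABLE or PR_TSC_SIGSEGV, or -1 if the prctl is not
 * available (for example on an architecture without support).
 */
int get_tsc_mode(void)
{
	int mode = 0;

	if (prctl(PR_GET_TSC, &mode, 0, 0, 0) != 0)
		return -1;
	return mode;
}

/*
 * Ask the kernel to deliver SIGSEGV when this task executes a
 * timestamp counter read.  Returns 0 on success, -1 on failure.
 */
int deny_tsc(void)
{
	return prctl(PR_SET_TSC, PR_TSC_SIGSEGV, 0, 0, 0);
}
```

As described above, calling deny_tsc() and then using a vdso clock path
that reads the TSC is expected to crash with the current code, so treat
the second helper as a demonstration of the problem rather than
something to call casually.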

--Andy


Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-10 Thread H. Peter Anvin
On 03/10/2014 10:12 AM, Andy Lutomirski wrote:
> On Mon, Mar 10, 2014 at 8:11 AM, Linus Torvalds
>  wrote:
>>
>> On Mar 10, 2014 8:01 AM, "H. Peter Anvin"  wrote:
>>>
>>> I have mentioned in the past wanting to move the fixmap to the low part
>>> of the kernel space, because the top isn't really fixed...
>>
>> How about the high part of the user address space, just above the stack?
>> Leave a unmapped page in between, or something. The stack is already
>> randomized, isn't it?
> 
> For the !compat_vdso case, I don't like it -- this will put the vdso
> (which is executable) at a constant offset from the stack, which will
> make it much easier to use the vdso to defeat ASLR.
> 
> For the compat_vdso case, this only works if the address is *not*
> random, unless we're going to start giving each process its very own
> relocated vdso.
> 

I presumed we were talking about compat_vdso, which thus simply turns
into a "don't randomize the vdso flag."  A significant side benefit is
that this should make the code more similar.

> For 64-bit, this is an entirely different story.  The vsyscall page is
> stuck in the fixmap forever, although I want to add a way for
> userspace to opt out.  The vvar page, hpet, etc could move into vmas,
> though.  I kind of want to do that anyway to allow processes to turn
> off the ability to read the clock.

Wait... you want to do what?!

-hpa



Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-10 Thread Andy Lutomirski
On Mon, Mar 10, 2014 at 8:11 AM, Linus Torvalds
 wrote:
>
> On Mar 10, 2014 8:01 AM, "H. Peter Anvin"  wrote:
>>
>> I have mentioned in the past wanting to move the fixmap to the low part
>> of the kernel space, because the top isn't really fixed...
>
> How about the high part of the user address space, just above the stack?
> Leave a unmapped page in between, or something. The stack is already
> randomized, isn't it?

For the !compat_vdso case, I don't like it -- this will put the vdso
(which is executable) at a constant offset from the stack, which will
make it much easier to use the vdso to defeat ASLR.

For the compat_vdso case, this only works if the address is *not*
random, unless we're going to start giving each process its very own
relocated vdso.

>
> That would actually be preferable in a few ways, notably not having to mark
> page directories user accessible in the kennel space area.

Is that where the rabid pte dogs live?

We can already avoid making fixmap pages user-accessible in the
!compat_vdso case for 32-bit tasks -- the vdso lives in a couple of
more-or-less ordinary vmas.

For 64-bit, this is an entirely different story.  The vsyscall page is
stuck in the fixmap forever, although I want to add a way for
userspace to opt out.  The vvar page, hpet, etc could move into vmas,
though.  I kind of want to do that anyway to allow processes to turn
off the ability to read the clock.

--Andy


Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-10 Thread H. Peter Anvin
On 03/09/2014 09:46 PM, Andy Lutomirski wrote:
> On Sun, Mar 9, 2014 at 8:18 PM, Andy Lutomirski  wrote:
>> (Of course, I haven't the faintest idea what l_addr in glibc means.
>> If there was a way to arrange for l_addr to be zero, then maybe none
>> of this would matter.  Hmm, I wonder if just not relocating the vdso
>> at all would have the desired effect.  Anyone out there understand
>> glibc?)
> 
> No, that won't work.  The bug is that glibc expects PT_DYNAMIC's vaddr
> to be the virtual address of the dynamic table.  This can only be true
> if the vdso is mapped at the address that the kernel relocated it to.
> 
> I also learned that glibc's code is really hideous.  Wow.
> 

At the same time it does mean we have more flexibility than having a
hard-coded address... we can at least allocate more than one page in the
fixmap; for a really "full service" solution the kernel could adjust the
vdso for whatever address the "fixmap" is at.

I have mentioned in the past wanting to move the fixmap to the low part
of the kernel space, because the top isn't really fixed...
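
To make the PT_DYNAMIC point quoted above concrete, the following
userspace sketch looks at the vdso mapped into the current process and
compares the p_vaddr recorded in PT_DYNAMIC with where the dynamic
table actually sits.  It is an illustration under assumptions, not
glibc's logic: struct vdso_layout and vdso_layout_probe are
hypothetical names, and the load-bias arithmetic follows the usual ELF
convention (runtime address minus link-time vaddr).

```c
#include <elf.h>
#include <link.h>
#include <stdint.h>
#include <sys/auxv.h>

/* Hypothetical summary of how the vdso image is laid out at runtime. */
struct vdso_layout {
	uintptr_t base;		/* where the vdso ELF header is mapped */
	uintptr_t load_bias;	/* runtime address minus link-time vaddr */
	uintptr_t dyn_vaddr;	/* p_vaddr stored in PT_DYNAMIC */
	uintptr_t dyn_runtime;	/* where the dynamic table really is */
};

/* Returns 0 on success, -1 if there is no vdso or no PT_DYNAMIC. */
int vdso_layout_probe(struct vdso_layout *out)
{
	uintptr_t base = (uintptr_t)getauxval(AT_SYSINFO_EHDR);
	const ElfW(Ehdr) *eh;
	const ElfW(Phdr) *ph;
	int found_load = 0, found_dyn = 0;
	int i;

	if (base == 0)
		return -1;

	eh = (const ElfW(Ehdr) *)base;
	ph = (const ElfW(Phdr) *)(base + eh->e_phoff);
	out->base = base;

	for (i = 0; i < eh->e_phnum; i++) {
		if (ph[i].p_type == PT_LOAD && !found_load) {
			found_load = 1;
			out->load_bias = base + ph[i].p_offset - ph[i].p_vaddr;
		} else if (ph[i].p_type == PT_DYNAMIC) {
			found_dyn = 1;
			out->dyn_vaddr = ph[i].p_vaddr;
			out->dyn_runtime = base + ph[i].p_offset;
		}
	}
	return (found_load && found_dyn) ? 0 : -1;
}
```

If dyn_vaddr equals dyn_runtime, the image is sitting at the address it
was linked or relocated for and the load bias is zero; otherwise a
consumer has to add the load bias to p_vaddr itself.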

-hpa




Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-10 Thread H. Peter Anvin
On 03/10/2014 10:52 AM, Andy Lutomirski wrote:

 Hint: where is your RIP?  Where is the RIP of other processes?

 
 Whoa there, I'm not suggesting anything nearly that crazy :)
 
 I'm suggesting changing out the vvar page *for that process*, which is
 not executable.  The actual vdso code already supports this -- from
 userspace's point of view it's the same thing as 'echo acpi_pm >
 /sys/devices/system/clocksource/clocksource0/current_clocksource',
 except that if the actual clocksource is HPET, the hpet page will be
 switched out (presumably with a zero page) while being read.
 
 Other processes are totally irrelevant, unless they share the same
 struct mm.  (This is why the vvar page can't be in the fixmap for this
 to work.)
 

I meant threads not processes...

-hpa




Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-10 Thread Andy Lutomirski
On Mon, Mar 10, 2014 at 10:58 AM, H. Peter Anvin h...@linux.intel.com wrote:
 On 03/10/2014 10:52 AM, Andy Lutomirski wrote:

 Hint: where is your RIP?  Where is the RIP of other processes?


 Whoa there, I'm not suggesting anything nearly that crazy :)

 I'm suggesting changing out the vvar page *for that process*, which is
 not executable.  The actual vdso code already supports this -- from
 userspace's point of view it's the same thing as 'echo acpi_pm >
 /sys/devices/system/clocksource/clocksource0/current_clocksource',
 except that if the actual clocksource is HPET, the hpet page will be
 switched out (presumably with a zero page) while being read.

 Other processes are totally irrelevant, unless they share the same
 struct mm.  (This is why the vvar page can't be in the fixmap for this
 to work.)


 I meant threads not processes...

Still okay.  The vclock_gettime code does, more or less:

do {
        seq = raw_read_seqcount_begin(&gtod->seq);
        mode = gtod->clock.vclock_mode;
        read the time;
} while (unlikely(read_seqcount_retry(&gtod->seq, seq)));

Switching the clocksource in current code will make seq odd, then
change vclock_mode, then make seq even again.  The prctl would zap the
mapping, flush the TLB, and then map something else (with a different
seq and vclock_mode) there.  User code will be hard pressed to tell
the difference.

To avoid having to carve out a special seq value, I'd actually propose
just leaving seq odd for the TSC off case -- I think that the
vclock_gettime code could move the branch for mode == NONE inside the
loop with no loss in performance.
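
One possible reading of the "leave seq odd and check the mode inside the loop" idea, as a minimal single-threaded sketch. The struct, field names and function below are hypothetical stand-ins, not the kernel's gtod data, and the memory barriers a real seqcount reader needs are deliberately left out:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the vclock modes discussed in the thread. */
enum vclock_mode { VCLOCK_NONE = 0, VCLOCK_TSC = 1 };

/* Hypothetical model of the data page the vdso reads. */
struct gtod_model {
    volatile unsigned seq;      /* odd while an update is in progress */
    volatile int vclock_mode;
    volatile uint64_t nsec;     /* placeholder for "the time" */
};

/*
 * Returns 0 and stores the time on success, or -1 if the caller has to
 * fall back to a real system call.  No barriers: this only models the
 * control flow, not the ordering guarantees.
 */
static int model_gettime(const struct gtod_model *g, uint64_t *out)
{
    unsigned seq;
    uint64_t ns;

    assert(out != NULL);
    for (;;) {
        seq = g->seq;
        /* The mode test comes before the odd/even test, so a page
         * whose seq stays odd forever still exits via the fallback. */
        if (g->vclock_mode == VCLOCK_NONE)
            return -1;
        if (seq & 1)
            continue;           /* writer active: retry */
        ns = g->nsec;
        if (g->seq == seq)
            break;              /* consistent snapshot */
    }
    *out = ns;
    return 0;
}
```

With this ordering, a replacement page that carries an odd seq and mode NONE never makes the reader spin; it simply takes the syscall path.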

--Andy


Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-10 Thread Greg Kroah-Hartman
On Sun, Mar 09, 2014 at 05:00:31PM -0700, H. Peter Anvin wrote:
 On 03/09/2014 12:08 AM, Stefani Seibold wrote:
  
  This was not addressed to you, it was addressed to the x86 intel kernel
  developers to do more testing, since this piece of code has so many side
  effects. I apologize for this misunderstanding.
  
 
 I think you're misunderstanding.
 
 We cannot debug every single contributor's code for them.  There aren't
 enough of us to go around.  We have in fact stretched well beyond the
 point which we usually can accommodate for this particular patchset.

What is failing in the tests that need to be fixed up?

I can look into this next week when I return.

thanks,

greg k-h


Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-10 Thread Stefani Seibold
On Monday, 10.03.2014 at 10:12 -0700, Andy Lutomirski wrote:
 On Mon, Mar 10, 2014 at 8:11 AM, Linus Torvalds
 torva...@linux-foundation.org wrote:
 
  On Mar 10, 2014 8:01 AM, H. Peter Anvin h...@linux.intel.com wrote:
 
  I have mentioned in the past wanting to move the fixmap to the low part
  of the kernel space, because the top isn't really fixed...
 
  How about the high part of the user address space, just above the stack?
  Leave an unmapped page in between, or something. The stack is already
  randomized, isn't it?
 
 For the !compat_vdso case, I don't like it -- this will put the vdso
 (which is executable) at a constant offset from the stack, which will
 make it much easier to use the vdso to defeat ASLR.
 
 For the compat_vdso case, this only works if the address is *not*
 random, unless we're going to start giving each process its very own
 relocated vdso.
 
 
  That would actually be preferable in a few ways, notably not having to mark
  page directories user accessible in the kernel space area.
 
 Is that where the rabid pte dogs live?
 
 We can already avoid making fixmap pages user-accessible in the
 !compat_vdso case for 32-bit tasks -- the vdso lives in a couple of
 more-or-less ordinary vmas.
 

What is the next step now? Kick out the compat VDSO? Or should I
implement the dual VDSO? And what is now the preferred way to map the
VDSO into user space: using install_special_mapping(), or mapping it
beyond the user stack?

The easiest and fastest way to get a working result is to do the
non-compat-VDSO-only mapping using install_special_mapping(). The dual VDSO
would take a little more time.

It would be great to first have a consensus about the design before I
start to implement ;-)

- Stefani



Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-10 Thread H. Peter Anvin
On 03/10/2014 01:03 PM, Stefani Seibold wrote:
 
 What is now the next step? Kick out the compat VDSO? Or should i
 implement the dual VDSO. And what is now the preferred way to map the
 VDSO into the user space? Using install_special_mapping() or map it
 beyond the user stack?
 
 The easiest and fastest way to get a working result is to do the non
 compat VDSO only mapping using install_special_mapping(). The dual VDSO
 would take a little bit more time.
 
 It would be great to have first a consensus about the design before i
 start to implement ;-)
 

The quick way to get something working is simply to reserve more than
one page (two should presumably be enough) in the fixmap and adjust the
link address of the VDSO accordingly.  This is not where we want to go
in the long term, but it doesn't seem to make sense to try to do
everything all at once -- we are already starting to push way too close
to the 3.15 merge window.
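
To illustrate what "reserve more than one page in the fixmap and adjust the link address" amounts to, here is a small sketch of the address arithmetic. It mirrors the kernel's __fix_to_virt() calculation, but the FIXADDR_TOP value (the usual 32-bit default) and the slot layout (a hole at index 0, VDSO pages directly after it) are assumptions made for the example:

```c
#include <assert.h>

#define PAGE_SHIFT   12
#define FIXADDR_TOP  0xfffff000UL   /* assumed default for 32-bit x86 */

/* Same arithmetic as __fix_to_virt(): fixmap slots grow downward from
 * FIXADDR_TOP, one page per index. */
static unsigned long fix_to_virt(unsigned int idx)
{
    return FIXADDR_TOP - ((unsigned long)idx << PAGE_SHIFT);
}

/* Hypothetical layout: index 0 is a hole, indices 1..npages hold the
 * VDSO.  The image has to be linked at the lowest address of that
 * range, which belongs to the highest index. */
static unsigned long vdso_link_address(unsigned int npages)
{
    assert(npages >= 1);
    return fix_to_virt(npages);
}
```

Under these assumptions a one-page VDSO links at 0xffffe000 and a two-page VDSO at 0xffffd000, so growing the reservation moves the link address down by one page.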

And special thanks to Andy for doing the archaeology...

-hpa




Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-10 Thread Linus Torvalds
On Mon, Mar 10, 2014 at 1:06 PM, H. Peter Anvin h...@linux.intel.com wrote:

 The quick way to get something working is simply to reserve more than
 one page (two should presumably be enough) in the fixmap and adjust the
 link address of the VDSO accordingly.  This is not where we want to go
 in the long term, but it doesn't seem to make sense to try to do
 everything all at once -- we are already starting to push way too close
 to the 3.15 merge window.

If the only immediate problem is the code generation size, then Andy
already had a (simpler) hack-around:

  #undef CONFIG_OPTIMIZE_INLINING
  #undef CONFIG_X86_PPRO_FENCE

in vclock_gettime.c.

I think we could make it a bit less hacky by just restricting the
inlining of the paravirt case, since that's presumably the crap code
that causes things to grow too large. Or find out what in there it is
that explodes in size, and just try to de-crapify the code enough that
it no longer does that.

Or is there something else going on too?

   Linus


Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-10 Thread Linus Torvalds
On Mon, Mar 10, 2014 at 1:19 PM, Linus Torvalds
torva...@linux-foundation.org wrote:

 If the only immediate problem is the code generation size, then Andy
 already had a (simpler) hack-around:

   #undef CONFIG_OPTIMIZE_INLINING
   #undef CONFIG_X86_PPRO_FENCE

 in vclock_gettime.c

Btw, we should seriously consider getting rid of CONFIG_X86_PPRO_FENCE.

It was of questionable value to begin with, and I think that the
actual PPro bug is about one of

 - Errata 66, "Delayed line invalidation".
 - Errata 92, "Potential loss of data coherency"

both of which affect all PPro versions afaik (there is also a UP
errata 51 wrt ordering of cached and uncached accesses that was fixed
in the sB1 stepping).

And as far as I know, we have never actually seen the bug in real
life, EVEN WHEN PPRO WAS COMMON. The workaround was always based on
knowledge of the errata afaik.

So I do think we might want to consider retiring that config option
entirely as a "historical oddity".

And very much so for the vdso case. Do we even do the asm alternative
fixups for the vdso?

I also suspect we should get rid of CONFIG_X86_OOSTORE, or at least
limit it to !SMP - I don't think anybody ever made SMP systems with
those IDT/Centaur Winchip chips in them.

Linus


Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-10 Thread stefani


Quoting Linus Torvalds torva...@linux-foundation.org:

> On Mon, Mar 10, 2014 at 1:06 PM, H. Peter Anvin h...@linux.intel.com wrote:
>>
>> The quick way to get something working is simply to reserve more than
>> one page (two should presumably be enough) in the fixmap and adjust the
>> link address of the VDSO accordingly.  This is not where we want to go
>> in the long term, but it doesn't seem to make sense to try to do
>> everything all at once -- we are already starting to push way too close
>> to the 3.15 merge window.
>
> If the only immediate problem is the code generation size, then Andy
> already had a (simpler) hack-around:
>
>   #undef CONFIG_OPTIMIZE_INLINING
>   #undef CONFIG_X86_PPRO_FENCE
>
> in vclock_gettime.c.

This was discovered by me.

> I think we could make it a bit less hacky by just restricting the
> inlining of the paravirt case, since that's presumably the crap code
> that causes things to grow too large. Or find out what in there it is
> that explodes in size, and just try to de-crapify the code enough that
> it no longer does that.

The two options above make the code grow. X86_PPRO_FENCE adds
alternatives, which increase the code by 600 bytes, and OPTIMIZE_INLINING
adds another 500 bytes.

But this is not a real solution: at the latest when vcpu function support
is added, the code size will exceed the page size. Reserving
two pages for the VDSO is a good option.

- Stefani





Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-10 Thread stefani


Quoting H. Peter Anvin h...@linux.intel.com:

> On 03/10/2014 01:03 PM, Stefani Seibold wrote:
>>
>> What is now the next step? Kick out the compat VDSO? Or should i
>> implement the dual VDSO. And what is now the preferred way to map the
>> VDSO into the user space? Using install_special_mapping() or map it
>> beyond the user stack?
>>
>> The easiest and fastest way to get a working result is to do the non
>> compat VDSO only mapping using install_special_mapping(). The dual VDSO
>> would take a little bit more time.
>>
>> It would be great to have first a consensus about the design before i
>> start to implement ;-)
>
> The quick way to get something working is simply to reserve more than
> one page (two should presumably be enough) in the fixmap and adjust the
> link address of the VDSO accordingly.  This is not where we want to go
> in the long term, but it doesn't seem to make sense to try to do
> everything all at once -- we are already starting to push way too close
> to the 3.15 merge window.

Do you expect a completely new patch set or an incremental patch based on the
current patch set?





Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-10 Thread Linus Torvalds
On Mon, Mar 10, 2014 at 2:25 PM,  stef...@seibold.net wrote:

 This was discovered by me.

Sorry for the misattribution.

 But this is not a real solution, at least when vcpu function support
 will be added, then the code size will exceed the page size. Reserving
 two pages for the VDSO is a good option.

Quite frankly, there is no way in hell I will take a patch like that
for 3.14 any more, and I would argue against it for stable.

Now, if this problem never happens with current kernels (because it's
purely due to the patch in -tip), then I don't much care.

That said, I don't understand why we are even adding new features like
this to 32-bit mode in the first place, so if that patch is the sole
source of all this headache, then why not just throw the patch away?

 Linus


Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-10 Thread Andy Lutomirski
On Mon, Mar 10, 2014 at 2:20 PM, Linus Torvalds
torva...@linux-foundation.org wrote:
 On Mon, Mar 10, 2014 at 1:19 PM, Linus Torvalds
 torva...@linux-foundation.org wrote:

 If the only immediate problem is the code generation size, then Andy
 already had a (simpler) hack-around:

   #undef CONFIG_OPTIMIZE_INLINING
   #undef CONFIG_X86_PPRO_FENCE

 in vclock_gettime.c

 Btw, we should seriously consider getting rid of CONFIG_X86_PPRO_FENCE.

 It was of questionable value to begin with, and I think that the
 actual PPro bug is about one of

  - Errata 66, "Delayed line invalidation".
  - Errata 92, "Potential loss of data coherency"

 both of which affect all PPro versions afaik (there is also a UP
 errata 51 wrt ordering of cached and uncached accesses that was fixed
 in the sB1 stepping).

 And as far as I know, we have never actually seen the bug in real
 life, EVEN WHEN PPRO WAS COMMON. The workaround was always based on
 knowledge of the errata afaik.

I admit I don't fully follow the description of the errata, but it's
not obvious to me that making smp_rmb() emit lfence is going to do any
good.  The description seems to be suggesting using actual LOCK
operations to work around the erratum.


 So I do think we might want to consider retiring that config option
 entirely as a "historical oddity".

 And very much so for the vdso case. Do we even do the asm alternative
 fixups for the vdso?

Yes, we've done that for a couple years for rdtsc_barrier's benefit.


 I also suspect we should get rid of CONFIG_X86_OOSTORE, or at least
 limit it to !SMP - I don't think anybody ever made SMP systems with
 those IDT/Centaur Winchip chips in them.

Why does OOSTORE matter for !SMP?  Is it just for poking at hardware registers?

--Andy


Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-10 Thread Dave Jones
On Mon, Mar 10, 2014 at 02:20:34PM -0700, Linus Torvalds wrote:

  I also suspect we should get rid of CONFIG_X86_OOSTORE, or at least
  limit it to !SMP - I don't think anybody ever made SMP systems with
  those IDT/Centaur Winchip chips in them.

Given the number of people who ever used that code when it was new could
probably be counted on a couple hands, I'd be amazed if a) anyone was still
using it, and b) that it hasn't regressed in some way in the last 15 years.

Even when it worked, it was only a small performance increase anyway,
and anyone who notices a circa 1998 CPU is now slightly slower on benchmarks
in 2014 probably needs psychiatric help.

I'd say rip it out completely.

Dave



Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-10 Thread stefani

Quoting Linus Torvalds torva...@linux-foundation.org:

> On Mon, Mar 10, 2014 at 2:25 PM,  stef...@seibold.net wrote:
>>
>> This was discovered by me.
>
> Sorry for the misattribution.
>
>> But this is not a real solution, at least when vcpu function support
>> will be added, then the code size will exceed the page size. Reserving
>> two pages for the VDSO is a good option.
>
> Quite frankly, there is no way in hell I will take a patch like that
> for 3.14 any more, and I would argue against it for stable.
>
> Now, if this problem never happens with current kernels (because it's
> purely due to the patch in -tip), then I don't much care.
>
> That said, I don't understand why we are even adding new features like
> this to 32-bit mode in the first place, so if that patch is the sole
> source of all this headache, then why not just throw the patch away?

The patch is working. And for the current issue there is a solution I have
already announced.

A dual VDSO: a one-page VDSO for the compat mode, which has only the
syscall code, and a multi-page VDSO, which is mapped into user space, for
the non-compat mode.

This will work and has no side effects.

- Stefani




Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-10 Thread Andy Lutomirski
On Mon, Mar 10, 2014 at 2:53 PM,  stef...@seibold.net wrote:
 Quoting Linus Torvalds torva...@linux-foundation.org:

 On Mon, Mar 10, 2014 at 2:25 PM,  stef...@seibold.net wrote:


 This was discovered by me.


 Sorry for the misattribution.

 But this is not a real solution, at least when vcpu function support
 will be added, then the code size will exceed the page size. Reserving
 two pages for the VDSO is a good option.


 Quite frankly, there is no way in hell I will take a patch like that
 for 3.14 any more, and I would argue against it for stable.

 Now, if this problem never happens with current kernels (because it's
 purely due to the patch in -tip), then I don't much care.

 That said, I don't understand why we are even adding new features like
 this to 32-bit mode in the first place, so if that patch is the sole
 source of all this headache, then why not just throw the patch away?


 The patch is working. And for this current issue there is a solution I
 already announced.

 A dual VDSO: a one-page VDSO for the compat mode, which has only the
 syscall code, and a multi-page VDSO, which is mapped into user space, for
 the non-compat mode.

 This will work and has no side effects.

IMO this is dumb.  I can think of two sensible solutions:

1. Get rid of compat vdso and replace it with no vdso at all.  This is
compatible with everything and requires almost no code :)

2. Fix compat vdso.  Give it as much space as needed, make the address
dynamic, and relocate it to the right place.

I see no legitimate reason to further increase the number of 32-bit
vdso images.  Three is already ridiculous, and adding more is IMO
hideous.

#1 is actually a serious proposal.  To do it right, I think we should
rename the config option to CONFIG_BROKEN_GLIBC_VDSO, default it to n,
and make the help text clarify that this only affects certain
non-released glibc versions and that anyone building a new kernel is
highly unlikely to be affected.  Then make vdso=2 act just like
vdso=0.  CONFIG_BROKEN_GLIBC_VDSO just changes the default from vdso=1
to vdso=0.
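
The proposed vdso= semantics can be written down as a tiny decision function. This is only a sketch of the rule described above; the function and parameter names are hypothetical and do not correspond to existing kernel code:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: decide whether the 32-bit vdso gets mapped.
 *   vdso_param          value of the vdso= boot parameter (0, 1 or 2)
 *   param_given         whether vdso= was passed at all
 *   broken_glibc_config models the proposed CONFIG_BROKEN_GLIBC_VDSO
 */
static bool vdso32_enabled(int vdso_param, bool param_given,
                           bool broken_glibc_config)
{
    if (!param_given)
        return !broken_glibc_config;  /* config only flips the default */

    assert(vdso_param >= 0 && vdso_param <= 2);
    /* vdso=2 (the old compat request) is treated exactly like vdso=0. */
    return vdso_param == 1;
}
```

The point of the design is that the config option no longer selects a third vdso flavour; it only changes which of the two remaining behaviours is the default.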

Damn it, the number of users who (a) have a buggy copy of glibc, (b)
are using new kernels, and (c) are using CONFIG_COMPAT_VDSO as opposed
to, say, vdso=2 is probably very close to zero.  (These users will
have issues until they fix their config.)

The number of users who (a) have a buggy copy of glibc, (b) are using
new kernels, and (c) have cpus that derive significant benefit from
using a vdso instead of int 80 and care at all is probably also very
close to zero.

The maintenance burden of this piece of shite is empirically quite far
from zero.

--Andy


Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-10 Thread Andy Lutomirski
On Mon, Mar 10, 2014 at 3:03 PM, Andy Lutomirski l...@amacapital.net wrote:
 On Mon, Mar 10, 2014 at 2:53 PM,  stef...@seibold.net wrote:
 Quoting Linus Torvalds torva...@linux-foundation.org:

 On Mon, Mar 10, 2014 at 2:25 PM,  stef...@seibold.net wrote:


 This was discovered by me.


 Sorry for the misattribution.

 But this is not a real solution, at least when vcpu function support
 will be added, then the code size will exceed the page size. Reserving
 two pages for the VDSO is a good option.


 Quite frankly, there is no way in hell I will take a patch like that
 for 3.14 any more, and I would argue against it for stable.

 Now, if this problem never happens with current kernels (because it's
 purely due to the patch in -tip), then I don't much care.

 That said, I don't understand why we are even adding new features like
 this to 32-bit mode in the first place, so if that patch is the sole
 source of all this headache, then why not just throw the patch away?


 The patch is working. And for this current issue there is a solution I
 already announced.

 A dual VDSO: a one-page VDSO for the compat mode, which has only the
 syscall code, and a multi-page VDSO, which is mapped into user space, for
 the non-compat mode.

 This will work and has no side effects.

 IMO this is dumb.  I can think of two sensible solutions:

 1. Get rid of compat vdso and replace it with no vdso at all.  This is
 compatible with everything and requires almost no code :)

 2. Fix compat vdso.  Give it as much space as needed, make the address
 dynamic, and relocate it to the right place.

 I see no legitimate reason to further increase the number of 32-bit
 vdso images.  Three is already ridiculous, and adding more is IMO
 hideous.

 #1 is actually a serious proposal.  To do it right, I think we should
 rename the config option to CONFIG_BROKEN_GLIBC_VDSO, default it to n,
 and make the help text clarify that this only affects certain
 non-released glibc versions and that anyone building a new kernel is
 highly unlikely to be affected.  Then make vdso=2 act just like
 vdso=0.  CONFIG_BROKEN_GLIBC_VDSO just changes the default from vdso=1
 to vdso=0.

 Damn it, the number of users who (a) have a buggy copy of glibc, (b)
 are using new kernels, and (c) are using CONFIG_COMPAT_VDSO as opposed
 to, say, vdso=2 is probably very close to zero.  (These users will
 have issues until they fix their config.)

 The number of users who (a) have a buggy copy of glibc, (b) are using
 new kernels, and (c) have cpus that derive significant benefit from
 using a vdso instead of int 80 and care at all is probably also very
 close to zero.

 The maintenance burden of this piece of shite is empirically quite far
 from zero.

I'm testing a patch.  If it seems to work, I'll send it out.  It's a
big cleanup.


 --Andy



-- 
Andy Lutomirski
AMA Capital Management, LLC


Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-10 Thread H. Peter Anvin
On 03/10/2014 02:51 PM, Dave Jones wrote:
 
 Even when it worked, it was only a small performance increase anyway,
 

If it is performance rather than correctness, then let's kill it now.

I'd love to push that patchset already for 3.15, anyone want to write it
up (I'm on a trip)... otherwise I'll do it Wednesday or so.

-hpa



Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-10 Thread H. Peter Anvin
On 03/10/2014 02:39 PM, Linus Torvalds wrote:
 On Mon, Mar 10, 2014 at 2:25 PM,  stef...@seibold.net wrote:

 This was discovered by me.
 
 Sorry for the misattribution.
 
 But this is not a real solution, at least when vcpu function support
 will be added, then the code size will exceed the page size. Reserving
 two pages for the VDSO is a good option.
 
 Quite frankly, there is no way in hell I will take a patch like that
 for 3.14 any more, and I would argue against it for stable.
 
 Now, if this problem never happens with current kernels (because it's
 purely due to the patch in -tip), then I don't much care.
 

It is only for tip:x86/vdso, so current kernels don't matter.

There is going to be 32-bit use in the embedded sector for a long time
to come, I suspect/fear, so I'm not opposed to giving it a bit of a
performance boost as long as it isn't too invasive.

I think Andy's commentary applies, though :)

 IMO this is dumb.  I can think of two sensible solutions:
 
 1. Get rid of compat vdso and replace it with no vdso at all.  This is
 compatible with everything and requires almost no code 
 
 2. Fix compat vdso.  Give it as much space as needed, make the address
 dynamic, and relocate it to the right place.


-hpa



Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-09 Thread Andy Lutomirski
On Sun, Mar 9, 2014 at 8:18 PM, Andy Lutomirski  wrote:
> (Of course, I haven't the faintest idea what l_addr in glibc means.
> If there was a way to arrange for l_addr to be zero, then maybe none
> of this would matter.  Hmm, I wonder if just not relocating the vdso
> at all would have the desired effect.  Anyone out there understand
> glibc?)

No, that won't work.  The bug is that glibc expects PT_DYNAMIC's vaddr
to be the virtual address of the dynamic table.  This can only be true
if the vdso is mapped at the address that the kernel relocated it to.
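
A small sketch of the difference, assuming that l_addr is the load bias (the difference between where the object is mapped and the addresses recorded in its ELF headers). The function names are hypothetical and only model the behaviour described here, not glibc's actual code:

```c
#include <stdint.h>

/* Post-fix behaviour: the dynamic table is found at the header's
 * vaddr adjusted by the load bias. */
static uint32_t dynamic_addr_fixed(uint32_t l_addr, uint32_t pt_dynamic_vaddr)
{
    return l_addr + pt_dynamic_vaddr;
}

/* Pre-fix behaviour as described above: the vaddr from PT_DYNAMIC is
 * used as-is, and the load bias is ignored. */
static uint32_t dynamic_addr_buggy(uint32_t l_addr, uint32_t pt_dynamic_vaddr)
{
    (void)l_addr;
    return pt_dynamic_vaddr;
}
```

The two agree only when l_addr is zero, that is, when the headers have already been relocated to the address at which the vdso is mapped. That is the situation the compat vdso relocation exists to create.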

I also learned that glibc's code is really hideous.  Wow.

--Andy


Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-09 Thread Andy Lutomirski
On Sun, Mar 9, 2014 at 5:16 PM, H. Peter Anvin  wrote:
> On 03/09/2014 12:47 AM, Stefani Seibold wrote:
>>
>> But let me ask another question: Is the compat mode still needed
>> anymore?
>>
>> Since Lguest, Xen, OLPC and the reservetop kernel parameter will change
>> the __FIXADDR_TOP, there is no fixed place for the VDSO page. Also in the
>> 32-bit emulation layer the address is not fixed.
>>
>> So all applications can fail when they try to directly access the VDSO
>> page with a hard-coded address 0xe000.
>>
>> IMHO this is broken. So another solution is to remove the whole VDSO
>> compat code.
>>
>
> Lguest, Xen, OLPC and reservetop are corner cases.  My understanding is
> that at least one widely used distro actually cared about this, and
> Linus especially is adamant that "we don't break userspace."

OK, I did some research.  I think that the commit that fixed the glibc bug was:

commit 49ad572a70b8aeb91e57483a11dd1b77e31c4468
Author: Ulrich Drepper 
Date:   Sat Feb 28 17:56:22 2004 +

Update.

* elf/rtld.c (dl_main): Adjust l->l_ld of the vDSO by l->l_addr.
* sysdeps/generic/dl-sysdep.c (_dl_sysdep_start): Only set
GL(dl_sysinfo) if non-zero.

I don't think that the actual load address of the VDSO matters at all.
 Here's what I think is going on:

When the kernel is built, vdso32-int80.so looks like this (excerpted
from objdump -T):

DYNAMIC SYMBOL TABLE:
0420 g    DF .text  0003  LINUX_2.5   __kernel_vsyscall
     g    DO *ABS*        LINUX_2.5   LINUX_2.5
0410 g    DF .text  0008  LINUX_2.5   __kernel_rt_sigreturn
0400 g    DF .text  0009  LINUX_2.5   __kernel_sigreturn

When the kernel is run, the kernel "relocates" the vdso, generating
something more like:

DYNAMIC SYMBOL TABLE:
e420 g    DF .text  0014  LINUX_2.5   __kernel_vsyscall
     g    DO *ABS*        LINUX_2.5   LINUX_2.5
e410 g    DF .text  0008  LINUX_2.5   __kernel_rt_sigreturn
e400 g    DF .text  0009  LINUX_2.5   __kernel_sigreturn

That magic 0xe000 offset comes from VDSO_HIGH_BASE - VDSO_PRELINK,
and VDSO_PRELINK seems like an amazingly complicated way to say
"zero".
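
The "relocation" step shown in the two tables boils down to adding one constant to every address-valued symbol while leaving absolute and undefined symbols alone. A minimal sketch, using a simplified symbol struct rather than the real ELF types; the struct and function names are hypothetical, while the two section-index values are the standard ELF ones:

```c
#include <stddef.h>
#include <stdint.h>

#define SHN_UNDEF_IDX 0u        /* standard ELF SHN_UNDEF */
#define SHN_ABS_IDX   0xfff1u   /* standard ELF SHN_ABS */

/* Simplified stand-in for a 32-bit ELF symbol table entry. */
struct sym_model {
    uint32_t value;   /* st_value */
    uint16_t shndx;   /* st_shndx */
};

/* Add delta (conceptually VDSO_HIGH_BASE - VDSO_PRELINK) to every
 * symbol that names an address inside the image. */
static void relocate_syms(struct sym_model *syms, size_t n, uint32_t delta)
{
    for (size_t i = 0; i < n; i++) {
        if (syms[i].shndx == SHN_UNDEF_IDX || syms[i].shndx == SHN_ABS_IDX)
            continue;           /* e.g. the *ABS* LINUX_2.5 version symbol */
        syms[i].value += delta;
    }
}
```

This matches the tables above in one respect that is easy to miss: the *ABS* entry is identical before and after, and only the .text symbols move.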

Before the fix, it looks like glibc couldn't handle a vdso that was
mapped in such a way that its ELF headers didn't match its actual
location.  Now it can.  This is borne out by this message:

commit d4f7a2c18e59e0304a1c733589ce14fc02fec1bd
Author: Jeremy Fitzhardinge 
Date:   Wed May 2 19:27:12 2007 +0200

[PATCH] i386: Relocate VDSO ELF headers to match mapped location with COMPAT

Some versions of libc can't deal with a VDSO which doesn't have its
ELF headers matching its mapped address.  COMPAT_VDSO maps the VDSO at
a specific system-wide fixed address.  Previously this was all done at
build time, on the grounds that the fixed VDSO address is always at
the top of the address space.  However, a hypervisor may reserve some
of that address space, pushing the fixmap address down.

I suspect that it's entirely safe to map the 32-bit vdso wherever the
hell we want, so long as it's relocated to match the actual mapping
address.  In principle it could even live outside the fixmap, as long
as the actual binary that gets run doesn't end up on top of it.

So... I propose that we get rid of all the madness.  Fix the vdso32
setup code to stop being insane.  That means: stop memcpying the vdso
image anywhere and get rid of all references to the magical and wrong
number "3".  Just map it wherever it needs to be mapped and relocate
the damn thing *in place*.  If some RODATA crud gets in the way,
twiddle the protection bits as needed.  That means that all this
"vvars before vdso" nonsense can go away.

(Of course, I haven't the faintest idea what l_addr in glibc means.
If there was a way to arrange for l_addr to be zero, then maybe none
of this would matter.  Hmm, I wonder if just not relocating the vdso
at all would have the desired effect.  Anyone out there understand
glibc?)

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-09 Thread H. Peter Anvin
On 03/09/2014 12:47 AM, Stefani Seibold wrote:
> 
> But let me ask an other question: Is the compat mode still needed
> anymore?
> 
> Since Lguest, XEN, OPLC and the reservetop kernel parameter will change
> the __FIXADDR_TOP, there is no fix place for the VDSO page. Also in the
> 32 bit emulation layer the address is not fix.
> 
> So all applications can fail when try directly access the VDSO page with
> a hard coded address 0xe000.
> 
> IMHO this is broken. So an other solution is to remove the whole VDSO
> compat code.
> 

Lguest, Xen, OLPC and reservetop are corner cases.  My understanding is
that at least one widely used distro actually cared about this, and
Linus especially is adamant that "we don't break userspace."

The dual vdso approach might be the best bet, for the cases where
compatibility is even possible.

-hpa




Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-09 Thread H. Peter Anvin
On 03/09/2014 12:08 AM, Stefani Seibold wrote:
> 
> This was not addressed to you, it was addressed to the x86 intel kernel
> developers to do more testing, since this piece of code has so many side
> effects. I apologizes this miss understanding.
> 

I think you're misunderstanding.

We cannot debug every single contributor's code for them.  There aren't
enough of us to go around.  We have in fact stretched well beyond the
point we can usually accommodate for this particular patchset.

-hpa




Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-09 Thread Stefani Seibold
Am Freitag, den 07.03.2014, 15:07 -0800 schrieb Andy Lutomirski:
> On Fri, Mar 7, 2014 at 1:53 PM, Stefani Seibold  wrote:
> >
> > Am Freitag, den 07.03.2014, 10:56 -0800 schrieb Andy Lutomirski:
> >> On Thu, Mar 6, 2014 at 11:21 PM, Stefani Seibold  
> >> wrote:
> >> > Hi Fengguang,
> >> >
> >> > i have build a kernel with the config, but my kvm is unable to start it.
> >> > I will try to find a way to test your kernek config.
> >> >
> >> > One thing is the crash point:
> >> >
> >> > The function sysenter_setup was modified by Andy, maybe he has an idea
> >> > what fails.
> >>
> >> *sigh*
> >>
> >> My host kernel is currently fscked up and won't run KVM.  Also, I want
> >> to confirm that I'm reproducing exactly what you're seeing, and I
> >> think it depends on the toolchain.  Can you (Fenguang) do:
> >>
> >> $ ls -l arch/x86/vdso/vdso32*.so
> >> -rwxrwxr-x. 1 luto luto 4096 Mar  7 10:19 arch/x86/vdso/vdso32-int80.so
> >> -rwxrwxr-x. 1 luto luto 4116 Mar  7 10:19 arch/x86/vdso/vdso32-sysenter.so
> >>
> >> (Of course, triggering this depends on which image gets selected.)
> >>
> >
> > Yes, that what i also figured out. There are two culprits:
> > CONFIG_OPTIMIZE_INLINING and CONFIG_X86_PPRO_FENCE. Each of them
> > increase the size of the code by about 500 bytes.
> >
> > When i add to file arch/x86/vdso/vdso32/vclock_gettime.c
> >
> > #undef CONFIG_OPTIMIZE_INLINING
> > #undef CONFIG_X86_PPRO_FENCE
> >
> > this will solve the issue.
> >
> >> Note that we have a .so file that exceeds 4k, i.e. one page.  Then
> >> read the relevant code and wonder what everyone was smoking when they
> >> wrote it.  There are so many buffer overflows, screwed up
> >> initializations, unnecessary and incorrect copies, etc, that I don't
> >> even want to speculate on what the first failure will be when the
> >> image is bigger than a page.
> >>
> >
> > Right. So the above one will not really solve it. At least when
> > __vdso_getcpu() code will also become a part of the 32 bit VDSO.
> >
> >> It's easy enough to fix, but someone should figure out what the impact
> >> will be on the compat vdso case.
> >>
> >> I wonder how hard it would be to change the compat vdso do be a dummy
> >> image a la the x86_64 fake vsyscall page so that old code can keep
> >> working (maybe with a performance hit) and new code can use a sane
> >> image.
> >>
> >
> > That is exactly what i wrote one week ago:
> >
> > Move the VDSO code before the VDSO compat fixmap area and create a kind
> > of helper VDSO for the VDSO compat fixmap page, which only calls the
> > real VDSO. But this would result in a performance regression for the
> > VDSO compat mode.
> 
> I think that regressing performance for compat_vdso (only) users is
> fine.  We need to figure out what those users are.  I have a vague
> recollection that it's a particular version of SuSE or OpenSuSE.
> 

Before I start working on this, I would like to ask whether the following
is a viable solution:

The best approach is to have two different kinds of vDSO for all x86 32-bit
variants (int80, syscall and sysenter):

- The compat vDSO, which supports only __kernel_vsyscall(),
__kernel_sigreturn() and __kernel_rt_sigreturn(). This will
never exceed the page size limit.

- And the newer vDSO, which also supports __vdso_clock_gettime(),
__vdso_gettimeofday() and __vdso_time().

In the compat vDSO case (kernel parameter vdso=2) we map the compat vDSO
to the fixmap address. So we have exactly the old behaviour, with neither
a regression nor a compatibility issue.

In the non-compat vDSO case (kernel parameter vdso=1) we can use the
larger vDSO with the time support functions, because there is no
limit on the size of the vDSO.

This could be done very easily.
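
The two-image split described above might be selected roughly as follows. This is only a sketch under assumptions: the enum and function names are hypothetical, and treating every value other than 1 or 2 as "no vDSO" is my simplification, not something stated in this mail.

```c
/* Hypothetical image identifiers for the proposed dual-vDSO scheme. */
enum sketch_vdso_image {
    SKETCH_IMAGE_NONE,    /* no vDSO mapped (assumed for other values) */
    SKETCH_IMAGE_FULL,    /* larger image with the time functions */
    SKETCH_IMAGE_COMPAT   /* one-page image for the fixmap address */
};

/* Pick an image from the value of the vdso= kernel parameter. */
static enum sketch_vdso_image sketch_pick_image(int vdso_param)
{
    switch (vdso_param) {
    case 2:
        return SKETCH_IMAGE_COMPAT;
    case 1:
        return SKETCH_IMAGE_FULL;
    default:
        return SKETCH_IMAGE_NONE;
    }
}
```

Only the compat image would have to respect the one-page limit; the full image could grow as needed.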

But let me ask another question: Is the compat mode still needed
at all?

Since Lguest, Xen, OLPC and the reservetop kernel parameter change
__FIXADDR_TOP, there is no fixed place for the VDSO page. The address is
also not fixed in the 32-bit emulation layer.

So any application can fail when it tries to access the VDSO page directly
at the hard-coded address 0xe000.

IMHO this is broken. So another solution is to remove the whole VDSO
compat code.

- Stefani




Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-09 Thread Stefani Seibold
Am Samstag, den 08.03.2014, 07:44 +0800 schrieb Fengguang Wu:
> Hi Stefani,
> 
> > So i tried my best, but without support it is impossible to find all
> > issues. But mostly what i get was bureaucracy afflictions
> > 
> > I complied, but now it is time to help finding the issues. And not only
> > do a complain, sit back and wait.
> 
> I feel sorry if that's what you perceived. But I'm just submitting
> test results rather than complaining. I should actually be glad if my
> test system catches more bugs. ;-) And there is no way for me to sit
> back - I'm actually overloaded. Yesterday I wrote 63 emails, which is
> one per 10 minutes _assuming_ I'm working 8hours. You can imagine the
> works required behind all these emails.
> 

This was not addressed to you; it was addressed to the x86 Intel kernel
developers, asking them to do more testing, since this piece of code has
so many side effects. I apologize for the misunderstanding.

- Stefani



Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-09 Thread Andy Lutomirski
On Sun, Mar 9, 2014 at 8:18 PM, Andy Lutomirski l...@amacapital.net wrote:
> (Of course, I haven't the faintest idea what l_addr in glibc means.
> If there was a way to arrange for l_addr to be zero, then maybe none
> of this would matter.  Hmm, I wonder if just not relocating the vdso
> at all would have the desired effect.  Anyone out there understand
> glibc?)

No, that won't work.  The bug is that glibc expects PT_DYNAMIC's vaddr
to be the virtual address of the dynamic table.  This can only be true
if the vdso is mapped at the address that the kernel relocated it to.
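
A small sketch of that constraint, as I read it from this thread rather than from the glibc source. All names are hypothetical and the 0x200 offset used in the note below is made up for illustration.

```c
#include <stdint.h>

/* Where an old glibc looks for the dynamic table: p_vaddr taken as-is. */
static uint32_t sketch_dynamic_assumed(uint32_t p_vaddr)
{
    return p_vaddr;
}

/* Where the table really is: mapping base plus its offset in the image. */
static uint32_t sketch_dynamic_actual(uint32_t map_base, uint32_t dyn_offset)
{
    return map_base + dyn_offset;
}

/*
 * The kernel relocates the headers by reloc_base and maps the image at
 * map_base. The old glibc only finds the table if the two agree.
 */
static int sketch_old_glibc_ok(uint32_t reloc_base, uint32_t map_base,
                               uint32_t dyn_offset)
{
    uint32_t p_vaddr = reloc_base + dyn_offset;

    return sketch_dynamic_assumed(p_vaddr) ==
           sketch_dynamic_actual(map_base, dyn_offset);
}
```

With reloc_base and map_base both 0xe000 the check passes. With reloc_base 0 (no relocation at all) and map_base 0xe000 it fails, which is why skipping the relocation cannot work.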

I also learned that glibc's code is really hideous.  Wow.

--Andy


Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-07 Thread Fengguang Wu
Hi Stefani,

> So i tried my best, but without support it is impossible to find all
> issues. But mostly what i get was bureaucracy afflictions
> 
> I complied, but now it is time to help finding the issues. And not only
> do a complain, sit back and wait.

I feel sorry if that's what you perceived. But I'm just submitting
test results rather than complaining. I should actually be glad if my
test system catches more bugs. ;-) And there is no way for me to sit
back - I'm actually overloaded. Yesterday I wrote 63 emails, which is
one per 10 minutes _assuming_ I'm working 8 hours. You can imagine the
work required behind all these emails.

> If i had an 8192 core i7 XEON machine i would be able to test all
> mutations of kernels. But i have not (despite i cannot pay the invoice).
> 
> Also i get no support by people who ask me to do this work. I am really
> pissed off.

We tried hard to build the test infrastructure for the good of the Linux
community. And if you like, I'd be happy to add your git tree to our
test pool - currently it already includes 300+ kernel git trees from
various developers. It'd feel more at home to find bugs in one's own
tree, rather than in the maintainers'. :-)

Thanks,
Fengguang


Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-07 Thread H. Peter Anvin
On 03/07/2014 08:06 AM, Stefani Seibold wrote:
>>
>> wfg@bee /tmp% git clone --reference /c/linux 
>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
> 
> As i wrote i already cloned the tip tree!!
> 
> But i cannot see the changeset, there is also no VDSO changes set in the
> git log.
> 

It isn't on the master branch because it hasn't been stable enough to merge.

You need to do:

git checkout x86/vdso

-hpa




Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-07 Thread Andy Lutomirski
On Fri, Mar 7, 2014 at 1:53 PM, Stefani Seibold  wrote:
>
> Am Freitag, den 07.03.2014, 10:56 -0800 schrieb Andy Lutomirski:
>> On Thu, Mar 6, 2014 at 11:21 PM, Stefani Seibold  wrote:
>> > Hi Fengguang,
>> >
>> > i have build a kernel with the config, but my kvm is unable to start it.
>> > I will try to find a way to test your kernek config.
>> >
>> > One thing is the crash point:
>> >
>> > The function sysenter_setup was modified by Andy, maybe he has an idea
>> > what fails.
>>
>> *sigh*
>>
>> My host kernel is currently fscked up and won't run KVM.  Also, I want
>> to confirm that I'm reproducing exactly what you're seeing, and I
>> think it depends on the toolchain.  Can you (Fenguang) do:
>>
>> $ ls -l arch/x86/vdso/vdso32*.so
>> -rwxrwxr-x. 1 luto luto 4096 Mar  7 10:19 arch/x86/vdso/vdso32-int80.so
>> -rwxrwxr-x. 1 luto luto 4116 Mar  7 10:19 arch/x86/vdso/vdso32-sysenter.so
>>
>> (Of course, triggering this depends on which image gets selected.)
>>
>
> Yes, that what i also figured out. There are two culprits:
> CONFIG_OPTIMIZE_INLINING and CONFIG_X86_PPRO_FENCE. Each of them
> increase the size of the code by about 500 bytes.
>
> When i add to file arch/x86/vdso/vdso32/vclock_gettime.c
>
> #undef CONFIG_OPTIMIZE_INLINING
> #undef CONFIG_X86_PPRO_FENCE
>
> this will solve the issue.
>
>> Note that we have a .so file that exceeds 4k, i.e. one page.  Then
>> read the relevant code and wonder what everyone was smoking when they
>> wrote it.  There are so many buffer overflows, screwed up
>> initializations, unnecessary and incorrect copies, etc, that I don't
>> even want to speculate on what the first failure will be when the
>> image is bigger than a page.
>>
>
> Right. So the above one will not really solve it. At least when
> __vdso_getcpu() code will also become a part of the 32 bit VDSO.
>
>> It's easy enough to fix, but someone should figure out what the impact
>> will be on the compat vdso case.
>>
>> I wonder how hard it would be to change the compat vdso do be a dummy
>> image a la the x86_64 fake vsyscall page so that old code can keep
>> working (maybe with a performance hit) and new code can use a sane
>> image.
>>
>
> That is exactly what i wrote one week ago:
>
> Move the VDSO code before the VDSO compat fixmap area and create a kind
> of helper VDSO for the VDSO compat fixmap page, which only calls the
> real VDSO. But this would result in a performance regression for the
> VDSO compat mode.

I think that regressing performance for compat_vdso (only) users is
fine.  We need to figure out what those users are.  I have a vague
recollection that it's a particular version of SuSE or OpenSuSE.

--Andy


Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-07 Thread Stefani Seibold

Am Freitag, den 07.03.2014, 10:56 -0800 schrieb Andy Lutomirski:
> On Thu, Mar 6, 2014 at 11:21 PM, Stefani Seibold  wrote:
> > Hi Fengguang,
> >
> > i have build a kernel with the config, but my kvm is unable to start it.
> > I will try to find a way to test your kernek config.
> >
> > One thing is the crash point:
> >
> > The function sysenter_setup was modified by Andy, maybe he has an idea
> > what fails.
> 
> *sigh*
> 
> My host kernel is currently fscked up and won't run KVM.  Also, I want
> to confirm that I'm reproducing exactly what you're seeing, and I
> think it depends on the toolchain.  Can you (Fenguang) do:
> 
> $ ls -l arch/x86/vdso/vdso32*.so
> -rwxrwxr-x. 1 luto luto 4096 Mar  7 10:19 arch/x86/vdso/vdso32-int80.so
> -rwxrwxr-x. 1 luto luto 4116 Mar  7 10:19 arch/x86/vdso/vdso32-sysenter.so
> 
> (Of course, triggering this depends on which image gets selected.)
> 

Yes, that is what I figured out too. There are two culprits:
CONFIG_OPTIMIZE_INLINING and CONFIG_X86_PPRO_FENCE. Each of them
increases the size of the code by about 500 bytes.

When I add the following to the file arch/x86/vdso/vdso32/vclock_gettime.c

#undef CONFIG_OPTIMIZE_INLINING
#undef CONFIG_X86_PPRO_FENCE

the issue goes away.

> Note that we have a .so file that exceeds 4k, i.e. one page.  Then
> read the relevant code and wonder what everyone was smoking when they
> wrote it.  There are so many buffer overflows, screwed up
> initializations, unnecessary and incorrect copies, etc, that I don't
> even want to speculate on what the first failure will be when the
> image is bigger than a page.
>

Right. So the above will not really solve it, at least not once the
__vdso_getcpu() code also becomes part of the 32-bit VDSO.
 
> It's easy enough to fix, but someone should figure out what the impact
> will be on the compat vdso case.
> 
> I wonder how hard it would be to change the compat vdso do be a dummy
> image a la the x86_64 fake vsyscall page so that old code can keep
> working (maybe with a performance hit) and new code can use a sane
> image.
> 

That is exactly what i wrote one week ago:

Move the VDSO code before the VDSO compat fixmap area and create a kind
of helper VDSO for the VDSO compat fixmap page, which only calls the
real VDSO. But this would result in a performance regression for the
VDSO compat mode.

- Stefani




Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-07 Thread Andy Lutomirski
On Thu, Mar 6, 2014 at 11:21 PM, Stefani Seibold  wrote:
> Hi Fengguang,
>
> i have build a kernel with the config, but my kvm is unable to start it.
> I will try to find a way to test your kernek config.
>
> One thing is the crash point:
>
> The function sysenter_setup was modified by Andy, maybe he has an idea
> what fails.

*sigh*

My host kernel is currently fscked up and won't run KVM.  Also, I want
to confirm that I'm reproducing exactly what you're seeing, and I
think it depends on the toolchain.  Can you (Fengguang) do:

$ ls -l arch/x86/vdso/vdso32*.so
-rwxrwxr-x. 1 luto luto 4096 Mar  7 10:19 arch/x86/vdso/vdso32-int80.so
-rwxrwxr-x. 1 luto luto 4116 Mar  7 10:19 arch/x86/vdso/vdso32-sysenter.so

(Of course, triggering this depends on which image gets selected.)

Note that we have a .so file that exceeds 4k, i.e. one page.  Then
read the relevant code and wonder what everyone was smoking when they
wrote it.  There are so many buffer overflows, screwed up
initializations, unnecessary and incorrect copies, etc, that I don't
even want to speculate on what the first failure will be when the
image is bigger than a page.
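
For the record, the size problem in the listing above comes down to simple arithmetic. The sketch below assumes 4 KiB pages and that the setup code only accounts for a single page; the names are hypothetical and not taken from the kernel.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed 4 KiB pages, as on 32-bit x86 */

/* Number of whole pages needed to hold an image of the given size. */
static size_t sketch_pages_needed(size_t image_size)
{
    return (image_size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/* Nonzero if the image fits in one page. */
static int sketch_fits_one_page(size_t image_size)
{
    return image_size <= SKETCH_PAGE_SIZE;
}
```

The 4096-byte vdso32-int80.so needs exactly one page, while the 4116-byte vdso32-sysenter.so needs two, so any code that copies or maps just one page loses the last 20 bytes.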

It's easy enough to fix, but someone should figure out what the impact
will be on the compat vdso case.

I wonder how hard it would be to change the compat vdso to be a dummy
image a la the x86_64 fake vsyscall page so that old code can keep
working (maybe with a performance hit) and new code can use a sane
image.

--Andy


Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-07 Thread Stefani Seibold
Am Freitag, den 07.03.2014, 18:21 +0800 schrieb Fengguang Wu:
> Hi Stefani,
> 
> On Fri, Mar 07, 2014 at 10:57:28AM +0100, Stefani Seibold wrote:
> > Hi Fengguang,
> > 
> > did you test the config i had sent to you?
> > 
> > My test was all done with current 3.14-rc tree. And with this i have no
> > problem. 
> 
> The regression is found on commit 4dea8e4824b363c53f320d328040d7c6c5921419
> ("x86, vdso: Add 32 bit VDSO time support for 32 bit kernel") in tip tree.
> 
> In the bisect log, you can see that next-20140306 is GOOD. So there's
> no way you can find the bug in 3.14-rcX.
> 
> > I just cloned the tip tree and I found that the patch was dropped
> > again (BTW: git log does not show that it was ever applied).
> 
> You can still access that specific commit:
> 
> wfg@bee /tmp% git clone --reference /c/linux git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git

As I wrote, I already cloned the tip tree!!

But I cannot see the changeset; there are also no VDSO changes in the
git log.





Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-07 Thread Fengguang Wu
Hi Stefani,

On Fri, Mar 07, 2014 at 10:57:28AM +0100, Stefani Seibold wrote:
> Hi Fengguang,
> 
> did you test the config I sent to you?
> 
> All my tests were done with the current 3.14-rc tree, and with it I have no
> problem.

The regression is found on commit 4dea8e4824b363c53f320d328040d7c6c5921419
("x86, vdso: Add 32 bit VDSO time support for 32 bit kernel") in tip tree.

In the bisect log, you can see that next-20140306 is GOOD. So there's
no way you can find the bug in 3.14-rcX.

> I just cloned the tip tree and I found that the patch was dropped
> again (BTW: git log does not show that it was ever applied).

You can still access that specific commit:

wfg@bee /tmp% git clone --reference /c/linux git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
Cloning into 'tip'...
remote: Counting objects: 27506, done.
remote: Compressing objects: 100% (7812/7812), done.
remote: Total 25517 (delta 18365), reused 23335 (delta 16786)
Receiving objects: 100% (25517/25517), 7.76 MiB | 31.00 KiB/s, done.
Resolving deltas: 100% (18365/18365), completed with 1321 local objects.
Checking connectivity... done.
Checking out files: 100% (46209/46209), done.
wfg@bee /tmp% cd tip
wfg@bee /tmp/tip% git show 4dea8e4824b363c53f320d328040d7c6c5921419|head   
commit 4dea8e4824b363c53f320d328040d7c6c5921419
Author: Stefani Seibold 
Date:   Mon Mar 3 22:12:20 2014 +0100

x86, vdso: Add 32 bit VDSO time support for 32 bit kernel

> Okay, that's enough for me. It is nearly impossible to cut this Gordian
> knot without support and testing from the Intel kernel developer group.
> 
> The original code was not in the best shape either. I cannot understand why
> it went into mainline without 32 bit support.
> 
> So I tried my best, but without support it is impossible to find all
> issues. Mostly what I got was bureaucratic hassle.
> 
> I complied, but now it is time to help find the issues, and not only
> complain, sit back, and wait.
> 
> If I had an 8192 core i7 XEON machine I would be able to test all
> variants of kernels. But I have not (besides, I could not pay the invoice).
> 
> Also, I get no support from the people who asked me to do this work. I am really
> pissed off.
> 
> - Stefani
> 
> On Friday, 07.03.2014, at 17:15 +0800, Fengguang Wu wrote:
> 
> > Hi Stefani,
> > 
> > On Fri, Mar 07, 2014 at 09:47:14AM +0100, Stefani Seibold wrote:
> > > Hi Fengguang,
> > > 
> > > I was now able to bring up the kernel on my KVM with some minor
> > > changes. I kicked out PARIDE, switched to IDE and activated VT
> > > support. With these modifications the kernel boots and I get no BUG;
> > > everything is fine!
> > > 
> > > So I cannot reproduce the bug, and I want to ask you to check the attached
> > > kernel config. If this also works for you, the problem is maybe located
> > > in the environment, e.g. gcc.
> > 
> > I'm using gcc 4.8.1, as you can see from the 2nd line of the dmesg below.
> > I can reproduce it reliably - see the screen dump below. You can find
> > the reproduction script at the end of this email.
> > 
> > wfg@bee /kernel/i386-randconfig-nh0-03070222/d478a960edf1ea61ca31a07a48a8771f043dba78% kvm-0day.sh vmlinuz-3.14.0-rc5-03765-gd478a96
> > early console in setup code
> > [0.00] Linux version 3.14.0-rc5-03765-gd478a96 (kbuild@nhm4) (gcc version 4.8.1 (Debian 4.8.1-8) ) #2 SMP PREEMPT Fri Mar 7 03:16:44 CST 2014
> > [0.00] e820: BIOS-provided physical RAM map:
> > [0.00] BIOS-e820: [mem 0x-0x0009fbff] usable
> > [0.00] BIOS-e820: [mem 0x0009fc00-0x0009] reserved
> > [0.00] BIOS-e820: [mem 0x000f-0x000f] reserved
> > [0.00] BIOS-e820: [mem 0x0010-0x0fffdfff] usable
> > [0.00] BIOS-e820: [mem 0x0fffe000-0x0fff] reserved
> > [0.00] BIOS-e820: [mem 0xfeffc000-0xfeff] reserved
> > [0.00] BIOS-e820: [mem 0xfffc-0x] reserved
> > [0.00] debug: ignoring loglevel setting.
> > [0.00] NX (Execute Disable) protection: active
> > [0.00] Hypervisor detected: KVM
> > [0.00] e820: update [mem 0x-0x0fff] usable ==> reserved
> > [0.00] e820: remove [mem 0x000a-0x000f] usable
> > [0.00] e820: last_pfn = 0xfffe max_arch_pfn = 0x100
> > [0.00] MTRR default type: write-back
> > [0.00] MTRR fixed ranges enabled:
> > [0.00]   0-9 write-back
> > [0.00]   A-B uncachable
> > [0.00]   C-F write-protect
> > [0.00] MTRR variable ranges enabled:
> > [0.00]   0 base 008000 mask FF8000 uncachable
> > [0.00]   1 disabled
> > [0.00]   2 disabled
> > [0.00]   3 disabled
> > [0.00]   4 disabled
> > [0.00]   5 disabled
> > [0.00]   6 disabled
> > [0.00]   7 

Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-07 Thread Stefani Seibold
Hi Fengguang,

did you test the config I sent to you?

All my tests were done with the current 3.14-rc tree, and with it I have no
problem.

I just cloned the tip tree and I found that the patch was dropped
again (BTW: git log does not show that it was ever applied).

Okay, that's enough for me. It is nearly impossible to cut this Gordian
knot without support and testing from the Intel kernel developer group.

The original code was not in the best shape either. I cannot understand why
it went into mainline without 32 bit support.

So I tried my best, but without support it is impossible to find all
issues. Mostly what I got was bureaucratic hassle.

I complied, but now it is time to help find the issues, and not only
complain, sit back, and wait.

If I had an 8192 core i7 XEON machine I would be able to test all
variants of kernels. But I have not (besides, I could not pay the invoice).

Also, I get no support from the people who asked me to do this work. I am really
pissed off.

- Stefani

On Friday, 07.03.2014, at 17:15 +0800, Fengguang Wu wrote:

> Hi Stefani,
> 
> On Fri, Mar 07, 2014 at 09:47:14AM +0100, Stefani Seibold wrote:
> > Hi Fengguang,
> > 
> > I was now able to bring up the kernel on my KVM with some minor
> > changes. I kicked out PARIDE, switched to IDE and activated VT
> > support. With these modifications the kernel boots and I get no BUG;
> > everything is fine!
> > 
> > So I cannot reproduce the bug, and I want to ask you to check the attached
> > kernel config. If this also works for you, the problem is maybe located
> > in the environment, e.g. gcc.
> 
> I'm using gcc 4.8.1, as you can see from the 2nd line of the dmesg below.
> I can reproduce it reliably - see the screen dump below. You can find
> the reproduction script at the end of this email.
> 
> wfg@bee /kernel/i386-randconfig-nh0-03070222/d478a960edf1ea61ca31a07a48a8771f043dba78% kvm-0day.sh vmlinuz-3.14.0-rc5-03765-gd478a96
> early console in setup code
> [0.00] Linux version 3.14.0-rc5-03765-gd478a96 (kbuild@nhm4) (gcc version 4.8.1 (Debian 4.8.1-8) ) #2 SMP PREEMPT Fri Mar 7 03:16:44 CST 2014
> [0.00] e820: BIOS-provided physical RAM map:
> [0.00] BIOS-e820: [mem 0x-0x0009fbff] usable
> [0.00] BIOS-e820: [mem 0x0009fc00-0x0009] reserved
> [0.00] BIOS-e820: [mem 0x000f-0x000f] reserved
> [0.00] BIOS-e820: [mem 0x0010-0x0fffdfff] usable
> [0.00] BIOS-e820: [mem 0x0fffe000-0x0fff] reserved
> [0.00] BIOS-e820: [mem 0xfeffc000-0xfeff] reserved
> [0.00] BIOS-e820: [mem 0xfffc-0x] reserved
> [0.00] debug: ignoring loglevel setting.
> [0.00] NX (Execute Disable) protection: active
> [0.00] Hypervisor detected: KVM
> [0.00] e820: update [mem 0x-0x0fff] usable ==> reserved
> [0.00] e820: remove [mem 0x000a-0x000f] usable
> [0.00] e820: last_pfn = 0xfffe max_arch_pfn = 0x100
> [0.00] MTRR default type: write-back
> [0.00] MTRR fixed ranges enabled:
> [0.00]   0-9 write-back
> [0.00]   A-B uncachable
> [0.00]   C-F write-protect
> [0.00] MTRR variable ranges enabled:
> [0.00]   0 base 008000 mask FF8000 uncachable
> [0.00]   1 disabled
> [0.00]   2 disabled
> [0.00]   3 disabled
> [0.00]   4 disabled
> [0.00]   5 disabled
> [0.00]   6 disabled
> [0.00]   7 disabled
> [0.00] initial memory mapped: [mem 0x-0x023f]
> [0.00] Base memory trampoline at [c009b000] 9b000 size 16384
> [0.00] init_memory_mapping: [mem 0x-0x000f]
> [0.00]  [mem 0x-0x000f] page 4k
> [0.00] init_memory_mapping: [mem 0x0fa0-0x0fbf]
> [0.00]  [mem 0x0fa0-0x0fbf] page 4k
> [0.00] BRK [0x01e02000, 0x01e02fff] PGTABLE
> [0.00] init_memory_mapping: [mem 0x0c00-0x0f9f]
> [0.00]  [mem 0x0c00-0x0f9f] page 4k
> [0.00] BRK [0x01e03000, 0x01e03fff] PGTABLE
> [0.00] BRK [0x01e04000, 0x01e04fff] PGTABLE
> [0.00] BRK [0x01e05000, 0x01e05fff] PGTABLE
> [0.00] BRK [0x01e06000, 0x01e06fff] PGTABLE
> [0.00] BRK [0x01e07000, 0x01e07fff] PGTABLE
> [0.00] init_memory_mapping: [mem 0x0010-0x0bff]
> [0.00]  [mem 0x0010-0x0bff] page 4k
> [0.00] init_memory_mapping: [mem 0x0fc0-0x0fffdfff]
> [0.00]  [mem 0x0fc0-0x0fffdfff] page 4k
> [0.00] RAMDISK: [mem 0x0fce6000-0x0ffe]
> [0.00] ACPI: RSDP 0x000F16B0 14 (v00 BOCHS )
> [0.00] ACPI: RSDT 0x0FFFE3F0 34 (v01 BOCHS  BXPCRSDT 0001 BXPC 0001)
> [0.00] ACPI: FACP 0x0F80 74 (v01 BOCHS  BXPCFACP 

Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-07 Thread Fengguang Wu
Hi Stefani,

On Fri, Mar 07, 2014 at 09:47:14AM +0100, Stefani Seibold wrote:
> Hi Fengguang,
> 
> I was now able to bring up the kernel on my KVM with some minor
> changes. I kicked out PARIDE, switched to IDE and activated VT
> support. With these modifications the kernel boots and I get no BUG;
> everything is fine!
> 
> So I cannot reproduce the bug, and I want to ask you to check the attached
> kernel config. If this also works for you, the problem is maybe located
> in the environment, e.g. gcc.

I'm using gcc 4.8.1, as you can see from the 2nd line of the dmesg below.
I can reproduce it reliably - see the screen dump below. You can find
the reproduction script at the end of this email.

wfg@bee /kernel/i386-randconfig-nh0-03070222/d478a960edf1ea61ca31a07a48a8771f043dba78% kvm-0day.sh vmlinuz-3.14.0-rc5-03765-gd478a96
early console in setup code
[0.00] Linux version 3.14.0-rc5-03765-gd478a96 (kbuild@nhm4) (gcc version 4.8.1 (Debian 4.8.1-8) ) #2 SMP PREEMPT Fri Mar 7 03:16:44 CST 2014
[0.00] e820: BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009fbff] usable
[0.00] BIOS-e820: [mem 0x0009fc00-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000f-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0x0fffdfff] usable
[0.00] BIOS-e820: [mem 0x0fffe000-0x0fff] reserved
[0.00] BIOS-e820: [mem 0xfeffc000-0xfeff] reserved
[0.00] BIOS-e820: [mem 0xfffc-0x] reserved
[0.00] debug: ignoring loglevel setting.
[0.00] NX (Execute Disable) protection: active
[0.00] Hypervisor detected: KVM
[0.00] e820: update [mem 0x-0x0fff] usable ==> reserved
[0.00] e820: remove [mem 0x000a-0x000f] usable
[0.00] e820: last_pfn = 0xfffe max_arch_pfn = 0x100
[0.00] MTRR default type: write-back
[0.00] MTRR fixed ranges enabled:
[0.00]   0-9 write-back
[0.00]   A-B uncachable
[0.00]   C-F write-protect
[0.00] MTRR variable ranges enabled:
[0.00]   0 base 008000 mask FF8000 uncachable
[0.00]   1 disabled
[0.00]   2 disabled
[0.00]   3 disabled
[0.00]   4 disabled
[0.00]   5 disabled
[0.00]   6 disabled
[0.00]   7 disabled
[0.00] initial memory mapped: [mem 0x-0x023f]
[0.00] Base memory trampoline at [c009b000] 9b000 size 16384
[0.00] init_memory_mapping: [mem 0x-0x000f]
[0.00]  [mem 0x-0x000f] page 4k
[0.00] init_memory_mapping: [mem 0x0fa0-0x0fbf]
[0.00]  [mem 0x0fa0-0x0fbf] page 4k
[0.00] BRK [0x01e02000, 0x01e02fff] PGTABLE
[0.00] init_memory_mapping: [mem 0x0c00-0x0f9f]
[0.00]  [mem 0x0c00-0x0f9f] page 4k
[0.00] BRK [0x01e03000, 0x01e03fff] PGTABLE
[0.00] BRK [0x01e04000, 0x01e04fff] PGTABLE
[0.00] BRK [0x01e05000, 0x01e05fff] PGTABLE
[0.00] BRK [0x01e06000, 0x01e06fff] PGTABLE
[0.00] BRK [0x01e07000, 0x01e07fff] PGTABLE
[0.00] init_memory_mapping: [mem 0x0010-0x0bff]
[0.00]  [mem 0x0010-0x0bff] page 4k
[0.00] init_memory_mapping: [mem 0x0fc0-0x0fffdfff]
[0.00]  [mem 0x0fc0-0x0fffdfff] page 4k
[0.00] RAMDISK: [mem 0x0fce6000-0x0ffe]
[0.00] ACPI: RSDP 0x000F16B0 14 (v00 BOCHS )
[0.00] ACPI: RSDT 0x0FFFE3F0 34 (v01 BOCHS  BXPCRSDT 0001 BXPC 0001)
[0.00] ACPI: FACP 0x0F80 74 (v01 BOCHS  BXPCFACP 0001 BXPC 0001)
[0.00] ACPI: DSDT 0x0FFFE430 001137 (v01 BXPC   BXDSDT   0001 INTL 20100528)
[0.00] ACPI: FACS 0x0F40 40
[0.00] ACPI: SSDT 0x06A0 000899 (v01 BOCHS  BXPCSSDT 0001 BXPC 0001)
[0.00] ACPI: APIC 0x05B0 80 (v01 BOCHS  BXPCAPIC 0001 BXPC 0001)
[0.00] ACPI: HPET 0x0570 38 (v01 BOCHS  BXPCHPET 0001 BXPC 0001)
[0.00] ACPI: Local APIC address 0xfee0
[0.00] mapped APIC to 9000 (fee0)
[0.00] 255MB LOWMEM available.
[0.00]   mapped low ram: 0 - 0fffe000
[0.00]   low ram: 0 - 0fffe000
[0.00] kvm-clock: Using msrs 4b564d01 and 4b564d00
[0.00] kvm-clock: cpu 0, msr 0:fffd001, primary cpu clock
[0.00] Zone ranges:
[0.00]   DMA  [mem 0x1000-0x00ff]
[0.00]   Normal   [mem 0x0100-0x0fffdfff]
[0.00] Movable zone start for each node
[0.00] Early memory node ranges
[0.00]   node   0: [mem 0x1000-0x0009efff]
[0.00]   node   0: [mem 0x0010-0x0fffdfff]
[0.00] On node 0 totalpages: 65436
[0.00] free_area_init_node: node 0, pgdat 

Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-07 Thread Andy Lutomirski
On Thu, Mar 6, 2014 at 11:21 PM, Stefani Seibold stef...@seibold.net wrote:
> Hi Fengguang,
>
> I have built a kernel with the config, but my KVM is unable to start it.
> I will try to find a way to test your kernel config.
>
> One thing is the crash point:
>
> The function sysenter_setup was modified by Andy; maybe he has an idea
> what fails.

*sigh*

My host kernel is currently fscked up and won't run KVM.  Also, I want
to confirm that I'm reproducing exactly what you're seeing, and I
think it depends on the toolchain.  Can you (Fengguang) do:

$ ls -l arch/x86/vdso/vdso32*.so
-rwxrwxr-x. 1 luto luto 4096 Mar  7 10:19 arch/x86/vdso/vdso32-int80.so
-rwxrwxr-x. 1 luto luto 4116 Mar  7 10:19 arch/x86/vdso/vdso32-sysenter.so

(Of course, triggering this depends on which image gets selected.)

Note that we have a .so file that exceeds 4k, i.e. one page.  Then
read the relevant code and wonder what everyone was smoking when they
wrote it.  There are so many buffer overflows, screwed up
initializations, unnecessary and incorrect copies, etc, that I don't
even want to speculate on what the first failure will be when the
image is bigger than a page.

It's easy enough to fix, but someone should figure out what the impact
will be on the compat vdso case.

I wonder how hard it would be to change the compat vdso to be a dummy
image a la the x86_64 fake vsyscall page so that old code can keep
working (maybe with a performance hit) and new code can use a sane
image.

--Andy


Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-07 Thread Stefani Seibold

On Friday, 07.03.2014, at 10:56 -0800, Andy Lutomirski wrote:
> On Thu, Mar 6, 2014 at 11:21 PM, Stefani Seibold stef...@seibold.net wrote:
> > Hi Fengguang,
> >
> > I have built a kernel with the config, but my KVM is unable to start it.
> > I will try to find a way to test your kernel config.
> >
> > One thing is the crash point:
> >
> > The function sysenter_setup was modified by Andy; maybe he has an idea
> > what fails.
>
> *sigh*
>
> My host kernel is currently fscked up and won't run KVM.  Also, I want
> to confirm that I'm reproducing exactly what you're seeing, and I
> think it depends on the toolchain.  Can you (Fengguang) do:
>
> $ ls -l arch/x86/vdso/vdso32*.so
> -rwxrwxr-x. 1 luto luto 4096 Mar  7 10:19 arch/x86/vdso/vdso32-int80.so
> -rwxrwxr-x. 1 luto luto 4116 Mar  7 10:19 arch/x86/vdso/vdso32-sysenter.so
>
> (Of course, triggering this depends on which image gets selected.)
>

Yes, that is what I also figured out. There are two culprits:
CONFIG_OPTIMIZE_INLINING and CONFIG_X86_PPRO_FENCE. Each of them
increases the size of the code by about 500 bytes.

When I add the following to arch/x86/vdso/vdso32/vclock_gettime.c

#undef CONFIG_OPTIMIZE_INLINING
#undef CONFIG_X86_PPRO_FENCE

the issue goes away.
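
How such a per-file #undef takes effect can be sketched as follows. This is a toy, not the vDSO source: CONFIG_DEMO_HEAVY_FENCE and demo_uses_heavy_fence are hypothetical names, and the sketch assumes the #undef sits before the code (or headers) that test the macro.

```c
/*
 * Stand-in for an option the build system enabled globally.
 * Hypothetical name; the kernel gets its CONFIG_* macros from a
 * generated header rather than from a #define in the source file.
 */
#define CONFIG_DEMO_HEAVY_FENCE 1

/* Per-file override: drop the option before anything tests it. */
#undef CONFIG_DEMO_HEAVY_FENCE

#ifdef CONFIG_DEMO_HEAVY_FENCE
#define DEMO_HEAVY_FENCE_ENABLED 1   /* larger code path would be built */
#else
#define DEMO_HEAVY_FENCE_ENABLED 0   /* smaller default path is built */
#endif

/* Reports which variant this translation unit was compiled with. */
static int demo_uses_heavy_fence(void)
{
    return DEMO_HEAVY_FENCE_ENABLED;
}
```

Because the macro is undefined only in this one file, the rest of the build still sees the option; the override shrinks this object and nothing else.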

> Note that we have a .so file that exceeds 4k, i.e. one page.  Then
> read the relevant code and wonder what everyone was smoking when they
> wrote it.  There are so many buffer overflows, screwed up
> initializations, unnecessary and incorrect copies, etc, that I don't
> even want to speculate on what the first failure will be when the
> image is bigger than a page.


Right. So the above will not really solve it, at least not once the
__vdso_getcpu() code also becomes part of the 32 bit VDSO.
 
> It's easy enough to fix, but someone should figure out what the impact
> will be on the compat vdso case.
>
> I wonder how hard it would be to change the compat vdso to be a dummy
> image a la the x86_64 fake vsyscall page so that old code can keep
> working (maybe with a performance hit) and new code can use a sane
> image.
 

That is exactly what I wrote one week ago:

Move the VDSO code before the VDSO compat fixmap area and create a kind
of helper VDSO for the VDSO compat fixmap page, which only calls the
real VDSO. But this would result in a performance regression for the
VDSO compat mode.

- Stefani




Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-07 Thread Andy Lutomirski
On Fri, Mar 7, 2014 at 1:53 PM, Stefani Seibold stef...@seibold.net wrote:

> On Friday, 07.03.2014 at 10:56 -0800, Andy Lutomirski wrote:
> > On Thu, Mar 6, 2014 at 11:21 PM, Stefani Seibold stef...@seibold.net wrote:
> > > Hi Fengguang,
> > >
> > > I have built a kernel with the config, but my KVM is unable to start it.
> > > I will try to find a way to test your kernel config.
> > >
> > > One thing is the crash point:
> > >
> > > The function sysenter_setup was modified by Andy, maybe he has an idea
> > > what fails.
> >
> > *sigh*
> >
> > My host kernel is currently fscked up and won't run KVM.  Also, I want
> > to confirm that I'm reproducing exactly what you're seeing, and I
> > think it depends on the toolchain.  Can you (Fengguang) do:
> >
> > $ ls -l arch/x86/vdso/vdso32*.so
> > -rwxrwxr-x. 1 luto luto 4096 Mar  7 10:19 arch/x86/vdso/vdso32-int80.so
> > -rwxrwxr-x. 1 luto luto 4116 Mar  7 10:19 arch/x86/vdso/vdso32-sysenter.so
> >
> > (Of course, triggering this depends on which image gets selected.)


> Yes, that is what I figured out as well. There are two culprits:
> CONFIG_OPTIMIZE_INLINING and CONFIG_X86_PPRO_FENCE. Each of them
> increases the size of the code by about 500 bytes.
>
> Adding
>
> #undef CONFIG_OPTIMIZE_INLINING
> #undef CONFIG_X86_PPRO_FENCE
>
> to arch/x86/vdso/vdso32/vclock_gettime.c solves the issue.

> > Note that we have a .so file that exceeds 4k, i.e. one page.  Then
> > read the relevant code and wonder what everyone was smoking when they
> > wrote it.  There are so many buffer overflows, screwed up
> > initializations, unnecessary and incorrect copies, etc, that I don't
> > even want to speculate on what the first failure will be when the
> > image is bigger than a page.


> Right. So the workaround above will not really solve it, at least not
> once the __vdso_getcpu() code also becomes part of the 32-bit VDSO.

> > It's easy enough to fix, but someone should figure out what the impact
> > will be on the compat vdso case.
> >
> > I wonder how hard it would be to change the compat vdso to be a dummy
> > image a la the x86_64 fake vsyscall page so that old code can keep
> > working (maybe with a performance hit) and new code can use a sane
> > image.


> That is exactly what I wrote one week ago:
>
> Move the VDSO code before the VDSO compat fixmap area and create a kind
> of helper VDSO for the VDSO compat fixmap page, which only calls the
> real VDSO. But this would result in a performance regression for the
> VDSO compat mode.

I think that regressing performance for compat_vdso (only) users is
fine.  We need to figure out what those users are.  I have a vague
recollection that it's a particular version of SuSE or OpenSuSE.

--Andy


Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-07 Thread H. Peter Anvin
On 03/07/2014 08:06 AM, Stefani Seibold wrote:

> > wfg@bee /tmp% git clone --reference /c/linux git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
>
> As I wrote, I already cloned the tip tree!!
>
> But I cannot see the changeset; there are also no VDSO changesets in the
> git log.
 

It isn't on the master branch because it hasn't been stable enough to merge.

You need to do:

git checkout x86/vdso

-hpa




Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-07 Thread Fengguang Wu
Hi Stefani,

> So I tried my best, but without support it is impossible to find all
> issues. But mostly what I got was bureaucratic afflictions.
>
> I complied, but now it is time to help find the issues, and not only
> complain, sit back and wait.

I'm sorry if that's how it came across. But I'm just submitting
test results rather than complaining. I should actually be glad if my
test system catches more bugs. ;-) And there is no way for me to sit
back - I'm actually overloaded. Yesterday I wrote 63 emails, which is
one per 10 minutes _assuming_ I'm working 8 hours. You can imagine the
work required behind all these emails.

> If I had an 8192-core i7 Xeon machine I would be able to test all
> mutations of kernels. But I do not have one (and I could not pay the
> invoice anyway).
>
> Also, I get no support from the people who ask me to do this work. I am
> really pissed off.

We tried hard to build the test infrastructure for the good of the Linux
community. And if you like, I'd be happy to add your git tree to our
test pool - currently it already includes 300+ kernel git trees from
various developers. It'd feel more at home to find bugs in one's own
tree, rather than in the maintainers'. :-)

Thanks,
Fengguang


Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-06 Thread Stefani Seibold
Hi Fengguang,

I have built a kernel with the config, but my KVM is unable to start it.
I will try to find a way to test your kernel config.

One thing is the crash point:

The function sysenter_setup was modified by Andy, maybe he has an idea
what fails.

- Stefani

On Friday, 07.03.2014 at 09:38 +0800, Fengguang Wu wrote:
> Hi Stefani,
> 
> I got the below dmesg and the first bad commit is
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/vdso
> commit 4dea8e4824b363c53f320d328040d7c6c5921419
> Author: Stefani Seibold <stef...@seibold.net>
> AuthorDate: Mon Mar 3 22:12:20 2014 +0100
> Commit: H. Peter Anvin <h...@linux.intel.com>
> CommitDate: Wed Mar 5 14:02:38 2014 -0800
> 
> x86, vdso: Add 32 bit VDSO time support for 32 bit kernel
> 
> This patch add the time support for 32 bit a VDSO to a 32 bit kernel.
> 
> For 32 bit programs running on a 32 bit kernel, the same mechanism is
> used as for 64 bit programs running on a 64 bit kernel.
> 
> Reviewed-by: Andy Lutomirski <l...@amacapital.net>
> Signed-off-by: Stefani Seibold <stef...@seibold.net>
> Link: http://lkml.kernel.org/r/1393881143-3569-10-git-send-email-stef...@seibold.net
> Signed-off-by: H. Peter Anvin <h...@linux.intel.com>
> 
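> +--------------------------------------------+------------+
> |                                            | 4dea8e4824 |
> +--------------------------------------------+------------+
> | boot_successes                             | 0          |
> | boot_failures                              | 19         |
> | BUG:unable_to_handle_kernel_paging_request | 19         |
> | Oops:PREEMPT_SMP_DEBUG_PAGEALLOC           | 19         |
> | EIP_is_at_sysenter_setup                   | 19         |
> | Kernel_panic-not_syncing:Fatal_exception   | 19         |
> +--------------------------------------------+------------+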
> 
> [0.004009] pid_max: default: 4096 minimum: 301
> [0.009099] Mount-cache hash table entries: 512
> [0.014838] mce: CPU supports 10 MCE banks
> [0.015243] BUG: unable to handle kernel paging request at d34bd000
> [0.016000] IP: [<c182dbca>] sysenter_setup+0x9a/0x2d4
> [0.016000] *pdpt = 018a4001 *pde = 13bea067 *pte = 8000134bd060
> [0.016000] Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> [0.016000] Modules linked in:
> [0.016000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.14.0-rc5-03765-gd478a96 #2
> [0.016000] task: c17997c0 ti: c178e000 task.ti: c178e000
> [0.016000] EIP: 0060:[<c182dbca>] EFLAGS: 00210212 CPU: 0
> [0.016000] EIP is at sysenter_setup+0x9a/0x2d4
> [0.016000] EAX: 078bfbfd EBX: d34bc000 ECX: 0004 EDX: 1004
> [0.016000] ESI: c186740c EDI: d34bd000 EBP: c178ff98 ESP: c178ff74
> [0.016000]  DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
> [0.016000] CR0: 8005003b CR2: d34bd000 CR3: 018a7000 CR4: 06f0
> [0.016000] Stack:
> [0.016000]  00200202 c1dbb0f0 0f61 0800 80002001 078bfbfd 
>  
> [0.016000]  c18a8800 c178ffa0 c1821144 c178ffbc c182117d  
> c178ffbc 
> [0.016000]   c18a8800 c178ffec c181ab11 0101  
>  c181a549
> [0.016000] Call Trace:
> [0.016000]  [<c1821144>] identify_boot_cpu+0x17/0x28
> [0.016000]  [<c182117d>] check_bugs+0xe/0x160
> [0.016000]  [<c181ab11>] start_kernel+0x401/0x470
> [0.016000]  [<c181a549>] ? repair_env_string+0x51/0x51
> [0.016000]  [<c181a364>] i386_start_kernel+0x12e/0x131
> [0.016000] Code: f6 c4 08 74 12 ba 10 74 86 c1 81 ea 0c 64 86 c1 be 0c 64 86 c1 eb 10 ba 0c 64 86 c1 81 ea 1c 54 86 c1 be 1c 54 86 c1 89 df 89 d1 <f3> a4 89 d8 e8 8b fe ff ff b9 04 00 00 00 ba d6 c2 6e c1 89 d8
> [0.016000] EIP: [<c182dbca>] sysenter_setup+0x9a/0x2d4 SS:ESP 0068:c178ff74
> [0.016000] CR2: d34bd000
> [0.016000] ---[ end trace db4b7fde7786bb07 ]---
> [0.016000] Kernel panic - not syncing: Fatal exception
> 
> git bisect start d478a960edf1ea61ca31a07a48a8771f043dba78 0414855fdc4a40da05221fc6062cccbc0c30f169 --
> git bisect  bad 6c2191ad9b6225860eef70a77d300c3d5ad39182  # 05:55  0- 15  Merge 'digsig/for-mimi' into devel-hourly-2014030618
> git bisect good 61ca01b5aa63605e033f1826dcceb41421aa72cd  # 06:03 20+  0  Merge 'ubifs/master' into devel-hourly-2014030618
> git bisect  bad 53dca0b5f0e257f00b91fc3be98fb47c07d20cfc  # 06:11  0- 16  Merge 'tip/x86/vdso' into devel-hourly-2014030618
> git bisect good f25ed0ebc194a51042a5392ca821de2ff6661275  # 06:20 20+  0  Merge 'slave-dma/next' into devel-hourly-2014030618
> git bisect good e0099b8165e2525541d7844e29e8838824b3601e  # 06:23 20+  0  Merge 'pcmoore-selinux/next' into devel-hourly-2014030618
> git bisect good c24bf54683dd0098e878a0cf40e2667e46a39a0a  # 06:29 20+  0  Merge 'renesas/next' into devel-hourly-2014030618
> git bisect good 6543ca6fee7d3b314bda69b83fd429ed3e336645  # 06:35 20+  0  x86, vdso: Cleanup __vdso_gettimeofday()
> git bisect  bad 4dea8e4824b363c53f320d328040d7c6c5921419  # 06:37  0- 15  x86, 

Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

2014-03-06 Thread Stefani Seibold
Hi Fengguang,

I have built a kernel with the config, but my KVM is unable to start it.
I will try to find a way to test your kernel config.

One thing is the crash point:

The function sysenter_setup was modified by Andy, maybe he has an idea
what fails.

- Stefani

On Friday, 07.03.2014 at 09:38 +0800, Fengguang Wu wrote:
> Hi Stefani,
>
> I got the below dmesg and the first bad commit is
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/vdso
> commit 4dea8e4824b363c53f320d328040d7c6c5921419
> Author: Stefani Seibold <stef...@seibold.net>
> AuthorDate: Mon Mar 3 22:12:20 2014 +0100
> Commit: H. Peter Anvin <h...@linux.intel.com>
> CommitDate: Wed Mar 5 14:02:38 2014 -0800
>
> x86, vdso: Add 32 bit VDSO time support for 32 bit kernel
>
> This patch add the time support for 32 bit a VDSO to a 32 bit kernel.
>
> For 32 bit programs running on a 32 bit kernel, the same mechanism is
> used as for 64 bit programs running on a 64 bit kernel.
>
> Reviewed-by: Andy Lutomirski <l...@amacapital.net>
> Signed-off-by: Stefani Seibold <stef...@seibold.net>
> Link: http://lkml.kernel.org/r/1393881143-3569-10-git-send-email-stef...@seibold.net
> Signed-off-by: H. Peter Anvin <h...@linux.intel.com>
 
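> +--------------------------------------------+------------+
> |                                            | 4dea8e4824 |
> +--------------------------------------------+------------+
> | boot_successes                             | 0          |
> | boot_failures                              | 19         |
> | BUG:unable_to_handle_kernel_paging_request | 19         |
> | Oops:PREEMPT_SMP_DEBUG_PAGEALLOC           | 19         |
> | EIP_is_at_sysenter_setup                   | 19         |
> | Kernel_panic-not_syncing:Fatal_exception   | 19         |
> +--------------------------------------------+------------+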
 
> [0.004009] pid_max: default: 4096 minimum: 301
> [0.009099] Mount-cache hash table entries: 512
> [0.014838] mce: CPU supports 10 MCE banks
> [0.015243] BUG: unable to handle kernel paging request at d34bd000
> [0.016000] IP: [<c182dbca>] sysenter_setup+0x9a/0x2d4
> [0.016000] *pdpt = 018a4001 *pde = 13bea067 *pte = 8000134bd060
> [0.016000] Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> [0.016000] Modules linked in:
> [0.016000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.14.0-rc5-03765-gd478a96 #2
> [0.016000] task: c17997c0 ti: c178e000 task.ti: c178e000
> [0.016000] EIP: 0060:[<c182dbca>] EFLAGS: 00210212 CPU: 0
> [0.016000] EIP is at sysenter_setup+0x9a/0x2d4
> [0.016000] EAX: 078bfbfd EBX: d34bc000 ECX: 0004 EDX: 1004
> [0.016000] ESI: c186740c EDI: d34bd000 EBP: c178ff98 ESP: c178ff74
> [0.016000]  DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
> [0.016000] CR0: 8005003b CR2: d34bd000 CR3: 018a7000 CR4: 06f0
> [0.016000] Stack:
> [0.016000]  00200202 c1dbb0f0 0f61 0800 80002001 078bfbfd 
>  
> [0.016000]  c18a8800 c178ffa0 c1821144 c178ffbc c182117d  
> c178ffbc 
> [0.016000]   c18a8800 c178ffec c181ab11 0101  
>  c181a549
> [0.016000] Call Trace:
> [0.016000]  [<c1821144>] identify_boot_cpu+0x17/0x28
> [0.016000]  [<c182117d>] check_bugs+0xe/0x160
> [0.016000]  [<c181ab11>] start_kernel+0x401/0x470
> [0.016000]  [<c181a549>] ? repair_env_string+0x51/0x51
> [0.016000]  [<c181a364>] i386_start_kernel+0x12e/0x131
> [0.016000] Code: f6 c4 08 74 12 ba 10 74 86 c1 81 ea 0c 64 86 c1 be 0c 64 86 c1 eb 10 ba 0c 64 86 c1 81 ea 1c 54 86 c1 be 1c 54 86 c1 89 df 89 d1 <f3> a4 89 d8 e8 8b fe ff ff b9 04 00 00 00 ba d6 c2 6e c1 89 d8
> [0.016000] EIP: [<c182dbca>] sysenter_setup+0x9a/0x2d4 SS:ESP 0068:c178ff74
> [0.016000] CR2: d34bd000
> [0.016000] ---[ end trace db4b7fde7786bb07 ]---
> [0.016000] Kernel panic - not syncing: Fatal exception
 
> git bisect start d478a960edf1ea61ca31a07a48a8771f043dba78 0414855fdc4a40da05221fc6062cccbc0c30f169 --
> git bisect  bad 6c2191ad9b6225860eef70a77d300c3d5ad39182  # 05:55  0- 15  Merge 'digsig/for-mimi' into devel-hourly-2014030618
> git bisect good 61ca01b5aa63605e033f1826dcceb41421aa72cd  # 06:03 20+  0  Merge 'ubifs/master' into devel-hourly-2014030618
> git bisect  bad 53dca0b5f0e257f00b91fc3be98fb47c07d20cfc  # 06:11  0- 16  Merge 'tip/x86/vdso' into devel-hourly-2014030618
> git bisect good f25ed0ebc194a51042a5392ca821de2ff6661275  # 06:20 20+  0  Merge 'slave-dma/next' into devel-hourly-2014030618
> git bisect good e0099b8165e2525541d7844e29e8838824b3601e  # 06:23 20+  0  Merge 'pcmoore-selinux/next' into devel-hourly-2014030618
> git bisect good c24bf54683dd0098e878a0cf40e2667e46a39a0a  # 06:29 20+  0  Merge 'renesas/next' into devel-hourly-2014030618
> git bisect good 6543ca6fee7d3b314bda69b83fd429ed3e336645  # 06:35 20+  0  x86, vdso: Cleanup __vdso_gettimeofday()
> git bisect  bad