[RFC] kernel/pid.c pid allocation weirdness

2007-03-13 Thread Pavel Emelianov
Hi.

I'm looking at how alloc_pid() works and can't understand
one (simple/stupid) thing.

It first kmem_cache_alloc()-s a struct pid, then calls
alloc_pidmap() and at the end it takes the global pidmap_lock
to add the new pid to the hash.

The question is: why does alloc_pidmap() use at least
two atomic ops and potentially loop to find a zero bit
in the pidmap? Why not call alloc_pidmap() under pidmap_lock
and find a zero bit in the pidmap w/o any loops or atomics?

The same goes for free_pid(). Am I missing something?
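
A minimal sketch of the alternative being asked about, assuming a single
lock guards the whole bitmap. This is a toy Python model, not kernel code:
PidMap, alloc() and free() are hypothetical names, and the lock merely
stands in for pidmap_lock.

```python
import threading


class PidMap:
    """Toy pid bitmap guarded by one lock (hypothetical model, not kernel code)."""

    def __init__(self, size):
        self.size = size
        self.bits = [False] * size     # False means the pid is free
        self.last = 0                  # last pid handed out; search resumes after it
        self.lock = threading.Lock()   # stands in for pidmap_lock

    def alloc(self):
        # With the lock held nobody else can change the bitmap, so a plain
        # scan and a plain store suffice: no test-and-set, no retry after
        # losing a race.
        with self.lock:
            for offset in range(1, self.size + 1):
                pid = (self.last + offset) % self.size
                if not self.bits[pid]:
                    self.bits[pid] = True
                    self.last = pid
                    return pid
            return None                # bitmap is full

    def free(self, pid):
        with self.lock:
            self.bits[pid] = False
```

The trade-off in this model is that the scan runs with the lock held, so
the lock is held for longer than it would be for the hash insertion alone.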

Thanks,
Pavel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[ALSA PATCH] alsa-git merge request

2007-03-13 Thread Jaroslav Kysela

Linus, please pull from [the linus branch at]:

  master.kernel.org:/pub/scm/linux/kernel/git/perex/alsa.git linus
gitweb interface:
  http://www.kernel.org/git/?p=linux/kernel/git/perex/alsa.git

The GNU patch is available at:

  ftp://ftp.alsa-project.org/pub/kernel-patches/alsa-git-2007-03-14.patch.gz

Additional notes:

  Only fixes / new hardware IDs.


The following files will be updated:

 Documentation/sound/alsa/ALSA-Configuration.txt |    1 +
 include/sound/version.h                         |    2 +-
 sound/pci/ac97/ac97_patch.c                     |   13 ---
 sound/pci/hda/hda_intel.c                       |   13 ++-
 sound/pci/hda/patch_analog.c                    |   41 +--
 sound/pci/hda/patch_realtek.c                   |    1 +
 sound/pci/hda/patch_sigmatel.c                  |    5 +++
 sound/pci/intel8x0.c                            |   10 --
 sound/soc/Kconfig                               |    2 +
 sound/soc/at91/Kconfig                          |    3 +-
 sound/soc/pxa/Kconfig                           |    3 +-
 11 files changed, 76 insertions(+), 18 deletions(-)


The following things were done:

Jaroslav Kysela (1):
  [ALSA] version 1.0.14rc3

Randy Cushman (1):
  [ALSA] ac97 - fix AD shared shared jack control logic

Takashi Iwai (5):
  [ALSA] soc - Fix dependencies in Kconfig files
  [ALSA] hda-intel - Fix codec probe with ATI contorllers
  [ALSA] hda-codec - Fix speaker output on MacPro
  [ALSA] intel8x0 - Fix Oops at kdump crash kernel
  [ALSA] hda-codec - Add model for HP Compaq d5700

Tobin Davis (2):
  [ALSA] hda-codec - Add suppoprt for Asus M2N-SLI motherboard
  [ALSA] hda-codec - more systems for Analog Devices

Tommi Kyntola (1):
  [ALSA] intel8x0 - Fix speaker output after S2RAM


Re: [RFC][PATCH 1/7] Resource counters

2007-03-13 Thread Pavel Emelianov
Srivatsa Vaddagiri wrote:
> On Tue, Mar 13, 2007 at 06:41:05PM +0300, Pavel Emelianov wrote:
>>> right, but atomic ops have much less impact on most
>>> architectures than locks :)
>> Right. But atomic_add_unless() is slower as it is
>> essentially a loop. See my previous letter in this sub-thread.
> 
> If I am not mistaken, you shouldn't loop in normal cases, which means
> it boils down to an atomic_read() + atomic_cmpxchg()
> 
> 

So does the lock: in the normal case (when it's not
heavily contended) it will boil down to atomic_dec_and_test().

Nevertheless, charging the way this patchset does
requires two atomic ops with atomic_xxx() and only
one with spin_lock().
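
To make the "essentially a loop" point concrete, here is a rough
single-threaded Python model of the read-plus-cmpxchg pattern discussed
above. AtomicInt and its ops counter are hypothetical helpers invented for
the illustration; only the shape of the loop is meant to match.

```python
class AtomicInt:
    """Toy atomic integer that counts the operations performed on it."""

    def __init__(self, value=0):
        self.value = value
        self.ops = 0

    def read(self):
        self.ops += 1
        return self.value

    def cmpxchg(self, old, new):
        # Store `new` only if the current value is still `old`;
        # always return the value that was seen.
        self.ops += 1
        seen = self.value
        if seen == old:
            self.value = new
        return seen


def atomic_add_unless(v, a, u):
    """Add `a` to `v` unless `v` equals `u`; return True if the add happened."""
    c = v.read()
    while c != u:
        seen = v.cmpxchg(c, c + a)
        if seen == c:
            return True
        c = seen   # lost a race: retry with the value someone else stored
    return False
```

In the uncontended case the model performs one read and one compare-and-
exchange; every lost race adds another compare-and-exchange.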


Re: [RFC][PATCH 4/7] RSS accounting hooks over the code

2007-03-13 Thread Nick Piggin

Balbir Singh wrote:

Nick Piggin wrote:



And strangely, this example does not go outside the parameters of
what you asked for AFAIKS. In the worst case of one container getting
_all_ the shared pages, they will still remain inside their maximum
rss limit.



When that does happen and if a container hits its limit, with a LRU
per-container, if the container is not actually using those pages,
they'll get thrown out of that container and get mapped into the
container that is using those pages most frequently.


Exactly. Statistically, first touch will work OK. It may mean some
reclaim inefficiencies in corner cases, but things will tend to
even out.


So they might get penalised a bit on reclaim, but maximum rss limits
will work fine, and you can (almost) guarantee X amount of memory for
a given container, and it will _work_.

But I also take back my comments about this being the only design I
have seen that gets everything, because the node-per-container idea
is a really good one on the surface. And it could mean even less impact
on the core VM than this patch. That is also a first-touch scheme.



With the proposed node-per-container, we will need to make massive core
VM changes to reorganize zones and nodes. We would want to allow

1. For sharing of nodes
2. Resizing nodes
3. Maybe more


But a lot of that is happening anyway for other reasons (eg. memory
plug/unplug). And I don't consider node/zone setup to be part of the
"core VM" as such... it is _good_ if we can move extra work into setup
rather than have it in the mm.

That said, I don't think this patch is terribly intrusive either.



With the node-per-container idea, it will be hard to control page cache
limits, independent of RSS limits or mlock limits.

NOTE: page cache == unmapped page cache here.


I don't know that it would be particularly harder than any other
first-touch scheme. If one container ends up being charged with too
much pagecache, eventually they'll reclaim a bit of it and the pages
will get charged to more frequent users.



However the messed up accounting that doesn't handle sharing between
groups of processes properly really bugs me.  Especially when we have
the infrastructure to do it right.

Does that make more sense?



I think it is simplistic.

Sure you could probably use some of the rmap stuff to account shared
mapped _user_ pages once for each container that touches them. And
this patchset isn't preventing that.

But how do you account kernel allocations? How do you account unmapped
pagecache?

What's the big deal so many accounting people have with just RSS? I'm
not a container person, this is an honest question. Because from my
POV if you conveniently ignore everything else... you may as well just
not do any accounting at all.



We decided to implement accounting and control in phases

1. RSS control
2. unmapped page cache control
3. mlock control
4. Kernel accounting and limits

This has several advantages

1. The limits can be individually set and controlled.
2. The code is broken down into simpler chunks for review and merging.


But this patch gives the groundwork to handle 1-4, and it is in a small
chunk, and one would be able to apply different limits to different types
of pages with it. Just using rmap to handle 1 does not really seem like a
viable alternative because it fundamentally isn't going to handle 2 or 4.

I'm not saying that you couldn't _later_ add something that uses rmap or
our current RSS accounting to tweak container-RSS semantics. But isn't it
sensible to lay the groundwork first? Get a clear path to something that
is good (not perfect), but *works*?

--
SUSE Labs, Novell Inc.


Re: [5/6] 2.6.21-rc3: known regressions

2007-03-13 Thread Arkadiusz Miskiewicz
On Tuesday 13 of March 2007, Adrian Bunk wrote:

> Subject: ThinkPad Z60m: usb mouse stops working after suspend to ram
> References : http://lkml.org/lkml/2007/2/21/413
>  http://lkml.org/lkml/2007/2/28/172
> Submitter  : Arkadiusz Miskiewicz <[EMAIL PROTECTED]>
> Caused-By  : Konstantin Karasyov <[EMAIL PROTECTED]>
>  commit 0a6139027f3986162233adc17285151e78b39cac
> Handled-By : Konstantin Karasyov <[EMAIL PROTECTED]>
> Status : problem is being debugged

It's fixed in git tree. Commit ff24ba74b6d3befbfbafa142582211b5a6095d45

-- 
Arkadiusz Miśkiewicz        PLD/Linux Team
arekm / maven.pl            http://ftp.pld-linux.org/


Re: [PATCH 0/8] x86 boot, pda and gdt cleanups

2007-03-13 Thread Rusty Russell
On Tue, 2007-03-13 at 21:39 -0700, Jeremy Fitzhardinge wrote:
> Rusty Russell wrote:
> > This is called "pissing in the corners".  Don't do it: we don't need to
> > touch that code and I actually prefer the original anyway (explicit is
> > *good*).  
> >
> > The habit of extracting cpu number once then using it is an optimization
> > which we should be aiming to get rid of (it simply hurts archs with
> > efficient per-cpu implementations).
> 
> No, that was for a reason.  I was worried about smp_processor_id() not
> returning valid values between init_gdt and cpu_set_gdt.  It's not
> actually a problem, but relying on smp_processor_id() while we're moving
> the foundations it's based on seems fragile.

smp_processor_id() always works, so it's fundamental, not fragile.

However, we *should* remove the arg from cpu_set_gdt, since we have such
faith in smp_processor_id() 8)

Cheers,
Rusty.





Re: Stolen and degraded time and schedulers

2007-03-13 Thread Jeremy Fitzhardinge
Daniel Walker wrote:
> The adjustments that I spoke of above are working regardless of ntp ..
> The stability of the TSC directly affects the clock mult adjustments in
> timekeeping, as does interrupt latency since the clock is essentially
> validated against the timer interrupt.
>   

Yep.  But the tsc is just an example of a clocksource, and doesn't have
any real bearing on what I'm saying.

> like I said there are other factors so that's not going to exactly model
> cpu speed changes. You could come up with another method, but that would
> likely require another known constant clock.
>   

Well, it doesn't need to be a constant clock if it's modelling a changing
rate.  And it doesn't need to be an exact model; it just needs to be
better than the current situation.

> sched_clock doesn't measure amounts of cpu work either, it's all about
> timing. 
>   

Specifically, how much cpu time a process has used.  But if the CPU is
running at half speed (or 50% duty cycle), then claiming that the
process got the full amount of time is just an error.
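
One possible reading of that argument, as a small Python sketch: scale the
wall-clock run time by the fraction of full speed the CPU was running at.
The function name and the kHz parameters are hypothetical, and this is not
how sched_clock is implemented.

```python
def scaled_cpu_time_ns(wall_ns, cur_khz, max_khz):
    """Wall-clock run time scaled by the fraction of full speed the CPU ran at."""
    if max_khz <= 0 or not 0 <= cur_khz <= max_khz:
        raise ValueError("need 0 <= cur_khz <= max_khz and max_khz > 0")
    return wall_ns * cur_khz // max_khz


# Two processes each run for 1 ms of real time, one at full speed and one
# at half speed; the second is credited with half the work.
full_speed = scaled_cpu_time_ns(1_000_000, 2_000_000, 2_000_000)
half_speed = scaled_cpu_time_ns(1_000_000, 1_000_000, 2_000_000)
```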

>> Well, lots of cpus have dynamic frequencies.  Any scheduler which
>> maintains history will suffer the same problem, even on UP.  If
>> processes A and B are supposed to have the same priority and they both
>> execute for 1ms of real time, did they make the same amount of
>> progress?  Not if the cpu changed speed in between.
>> 
>
> That's true, but given a constant clock (like what sched_clock should
> have) then the accounting is similarly inaccurate. Any connection
> between the scheduler and the TSC frequency changes aren't part of the
> design AFAIK ..
>   

Well, my whole argument is that sched_clock /should not/ be a constant
clock.  And I'm not quite sure why you keep bringing up the tsc, because
it has no relevance.

J


Re: [PATCH] Introduce load_TLS to the "for" loop.

2007-03-13 Thread Rusty Russell
On Tue, 2007-03-13 at 21:55 +0100, Andi Kleen wrote:
> On Tue, Mar 13, 2007 at 10:31:27AM -0700, Jeremy Fitzhardinge wrote:
> > Andi Kleen wrote:
> > > On Tue, Mar 13, 2007 at 05:39:36PM +1100, Rusty Russell wrote:
> > >   
> > >> GCC (4.1 at least) unrolls it anyway, but I can't believe this code
> > >> 
> > >
> > > Are you sure? Normally it doesn't unroll without -funroll-loops which
> > > the kernel does normally not set. Especially not with -Os builds.
> > >   
> > 
> > Does it matter either way in this case?
> 
> It's in the middle of the context switch.

Well, the rest of __switch_to isn't "0PTIM1Z3D!!!" like this.

But even so, that's no excuse for crap code.  If it had used memcpy, we
wouldn't be wasting cycles on this discussion.

Rusty.




Re: [RFC][PATCH 4/7] RSS accounting hooks over the code

2007-03-13 Thread Balbir Singh

Nick Piggin wrote:

Eric W. Biederman wrote:

Nick Piggin <[EMAIL PROTECTED]> writes:



Eric W. Biederman wrote:


First touch page ownership does not give me anything useful
for knowing if I can run my application or not.  Because of page
sharing my application might run inside the rss limit only because
I got lucky and happened to share a lot of pages with another running
application.  If the next time I run it that other application isn't
running, my application will fail.  That is ridiculous.


Let's be practical here, what you're asking is basically impossible.

Unless by deterministic you mean that it never enters a non-trivial
syscall, in which case you just want to know about the maximum
RSS of the process (which we already account).



Not per process; I want this on a group of processes, and yes, that
is all I want.  I just want accounting of the maximum RSS of
a group of processes and then the mechanism to limit that maximum rss.


Well don't you just sum up the maximum for each process?

Or do you want to only count shared pages inside a container once,
or something difficult like that?



I don't want sharing between vservers/VE/containers to affect how many
pages I can have mapped into my processes at once.


You seem to want total isolation. You could use virtualization?



No.  I don't want the meaning of my rss limit to be affected by what
other processes are doing.  We have constraints of how many resources
the box actually has.  But I don't want accounting so sloppy that
processes outside my group of processes can artificially
lower my rss value, which magically raises my rss limit.


So what are you going to do about all the shared caches and slabs
inside the kernel?



It is basically handwaving anyway. The only approach I've seen with
a sane (not perfect, but good) way of accounting memory use is this
one. If you care to define "proper", then we could discuss that.



I will agree that this patchset is probably in the right general
ballpark.

But the fact that pages are assigned exactly one owner is pure nonsense.
We can do better.  That is all I am asking: for someone to at least
attempt to actually account for the rss of a group of processes and get
the numbers right when we have shared pages between different groups of
processes.  We have the data structures to support this with rmap.


Well rmap only supports mapped, userspace pages.



Let me describe the situation where I think the accounting in the
patchset goes totally wonky.

Gcc as I recall maps the pages it is compiling with mmap.
If in a single kernel tree I do:
make -jN O=../compile1 &
make -jN O=../compile2 &

But set it up so that the two compiles are in different rss groups.
If I run them concurrently they will use the same files at the same
time, and most likely, because of the first touch rss limit rule, even
if I have a draconian rss limit the compiles will both be able to
complete and finish.  However, if I run either of them alone with
the most draconian rss limit that allows both compiles to finish
together, I won't be able to compile a single kernel tree.
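
The scenario can be sketched with a toy first-touch accounting model in
Python. Everything here is an assumption made for illustration: the page
counts (100 shared source pages, 20 private pages per compile), the strict
alternation of who touches a shared page first, and the function names.

```python
def first_touch_charges(touches):
    """Charge each page to the first group that touches it.

    touches: iterable of (group, page) pairs in access order.
    Returns {group: number of pages charged to it}.
    """
    owner = {}
    charged = {}
    for group, page in touches:
        charged.setdefault(group, 0)
        if page not in owner:
            owner[page] = group
            charged[group] += 1
    return charged


shared = [f"src{i}" for i in range(100)]
private = {g: [f"{g}-obj{i}" for i in range(20)] for g in ("compile1", "compile2")}


def run(groups):
    touches = []
    for i, page in enumerate(shared):
        # Alternate which group reaches each shared page first.
        order = groups if i % 2 == 0 else groups[::-1]
        touches += [(g, page) for g in order]
    for g in groups:
        touches += [(g, page) for page in private[g]]
    return first_touch_charges(touches)


together = run(["compile1", "compile2"])   # 70 pages charged to each group
alone = run(["compile1"])                  # 120 pages charged to the one group
```

Under these made-up numbers a limit of 70 pages lets both compiles finish
when run together, yet is too small for either compile run on its own.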


Yeah it is not perfect. Fortunately, there is no perfect solution,
so we don't have to be too upset about that.

And strangely, this example does not go outside the parameters of
what you asked for AFAIKS. In the worst case of one container getting
_all_ the shared pages, they will still remain inside their maximum
rss limit.



When that does happen and if a container hits its limit, with a LRU
per-container, if the container is not actually using those pages,
they'll get thrown out of that container and get mapped into the
container that is using those pages most frequently.


So they might get penalised a bit on reclaim, but maximum rss limits
will work fine, and you can (almost) guarantee X amount of memory for
a given container, and it will _work_.

But I also take back my comments about this being the only design I
have seen that gets everything, because the node-per-container idea
is a really good one on the surface. And it could mean even less impact
on the core VM than this patch. That is also a first-touch scheme.



With the proposed node-per-container, we will need to make massive core
VM changes to reorganize zones and nodes. We would want to allow

1. For sharing of nodes
2. Resizing nodes
3. Maybe more

With the node-per-container idea, it will be hard to control page cache
limits, independent of RSS limits or mlock limits.

NOTE: page cache == unmapped page cache here.




However the messed up accounting that doesn't handle sharing between
groups of processes properly really bugs me.  Especially when we have
the infrastructure to do it right.

Does that make more sense?


I think it is simplistic.

Sure you could probably use some of the rmap stuff to account shared
mapped _user_ pages once for each container that touches them. And
this patchset isn't preventing that.

But how do you account kernel allocations? How do you account unmapped
pagecache?

Wh

Re: [PATCH] Introduce load_TLS to the "for" loop.

2007-03-13 Thread Rusty Russell
On Tue, 2007-03-13 at 14:50 +0100, Andi Kleen wrote:
> On Tue, Mar 13, 2007 at 05:39:36PM +1100, Rusty Russell wrote:
> > GCC (4.1 at least) unrolls it anyway, but I can't believe this code
> 
> Are you sure? Normally it doesn't unroll without -funroll-loops which
> the kernel does normally not set. Especially not with -Os builds.

Yep, checked again:

$ gcc --version
gcc (GCC) 4.1.2 20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu5)
...
...
  gcc -Wp,-MD,arch/x86_64/kernel/.process.o.d  -nostdinc
-isystem /usr/lib/gcc/i486-linux-gnu/4.1.2/include -D__KERNEL__
-Iinclude  -include include/linux/autoconf.h -Wall -Wundef
-Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -O2
-mtune=generic -m64 -mno-red-zone -mcmodel=kernel -pipe
-fno-reorder-blocks -Wno-sign-compare -fno-asynchronous-unwind-tables
-funit-at-a-time -mno-sse -mno-mmx -mno-sse2 -mno-3dnow
-maccumulate-outgoing-args   -fno-omit-frame-pointer
-fno-optimize-sibling-calls -g  -fno-stack-protector
-Wdeclaration-after-statement -Wno-pointer-sign -D"KBUILD_STR(s)=#s"
-D"KBUILD_BASENAME=KBUILD_STR(process)"
-D"KBUILD_MODNAME=KBUILD_STR(process)" -c -o
arch/x86_64/kernel/process.o arch/x86_64/kernel/process.c
...
$ objdump -Dr arch/x86_64/kernel/process.o | less
...
 6be:   48 8b 94 00 00 00 00    mov    0x0(%rax,%rax,1),%rdx
 6c5:   00 
                        6c2: R_X86_64_32S   cpu_gdt_descr+0x2
 6c6:   48 8b 83 98 02 00 00    mov    0x298(%rbx),%rax
 6cd:   48 83 c2 60             add    $0x60,%rdx
 6d1:   48 89 02                mov    %rax,(%rdx)
 6d4:   48 8b 83 a0 02 00 00    mov    0x2a0(%rbx),%rax
 6db:   48 89 42 08             mov    %rax,0x8(%rdx)
 6df:   48 8b 83 a8 02 00 00    mov    0x2a8(%rbx),%rax
 6e6:   48 89 42 10             mov    %rax,0x10(%rdx)

If I turn on CONFIG_OPTIMIZE_FOR_SIZE, it's still unrolled,
interestingly.

Cheers,
Rusty.



Re: New thread RDSL, post-2.6.20 kernels and amanda (tar) miss-fires

2007-03-13 Thread William Lee Irwin III
> On Wednesday 14 March 2007, William Lee Irwin III wrote:
> >On Tue, Mar 13, 2007 at 11:31:53PM -0400, Gene Heskett wrote:
> >> Now, can someone suggest a patch I can revert that might fix this? 
> >> The total number of patches between 2.6.20 and 2.6.21-rc1 will have me
> >> building kernels to bisect this till the middle of June at this rate.
> >
> >4 billion patches could be bisected in 34 boots. Between 2.6.20 and
> >2.6.21-rc1 there are only:
> >
> >$ git rev-list --no-merges v2.6.20..v2.6.21-rc1  |wc -l
> >3118
> >
> >patches, requiring 14 boots. In general ceil(log(n)/log(2))+2 boots.
> >
> >Of course, this is a little optimistic because it assumes no additional
> >breakage occurring at the various bisection points. In any event,
> >assuming (pessimistically) 10 minutes per build, this is 280 minutes or
> >4 hours and 40 minutes of build time. I estimate the process should
> >complete well before Friday of this week, never mind June.
> 
On Wed, Mar 14, 2007 at 02:09:57AM -0400, Gene Heskett wrote:
> Chuckle, sorry to disappoint you wli, on that 32 cpu Niagara Con was
> calling 'poor equipment', maybe.
> Even using ccache, it's about 15-18 minutes per build, with another 10 to
> edit my build script and construct the kernel tree with the proper
> patches applied.  Then a reboot, probably 10 minutes by the time I get
> the nvidia driver installed for the new kernel and get startx'd, then it's
> another 2 hours or a bit less for an amanda run to test it.

2 hours, 48 minutes times 13 boots (see the correction post) is 36
hours, 24 minutes. One attempt a day (24 hours instead of 2 hours, 48
minutes) yields 2 weeks. So you're still done by April, not June.
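
The arithmetic can be checked with a few lines of Python. The per-iteration
figure of 2 hours, 48 minutes and the count of 13 boots are taken as stated
above; bisect_wall_time() is a hypothetical helper name.

```python
from datetime import timedelta


def bisect_wall_time(iterations, per_iteration):
    """Total elapsed time for a bisection of `iterations` build/boot/test cycles."""
    return iterations * per_iteration


back_to_back = bisect_wall_time(13, timedelta(hours=2, minutes=48))
one_per_day = bisect_wall_time(13, timedelta(days=1))

hours, remainder = divmod(int(back_to_back.total_seconds()), 3600)
minutes = remainder // 60
```

Here hours and minutes come out as 36 and 24, and one_per_day is 13 days,
which is the "2 weeks" figure rounded up.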


-- wli


Re: New thread RDSL, post-2.6.20 kernels and amanda (tar) miss-fires

2007-03-13 Thread Gene Heskett
On Wednesday 14 March 2007, William Lee Irwin III wrote:
>On Tue, Mar 13, 2007 at 11:31:53PM -0400, Gene Heskett wrote:
>> Now, can someone suggest a patch I can revert that might fix this? 
>> The total number of patches between 2.6.20 and 2.6.21-rc1 will have me
>> building kernels to bisect this till the middle of June at this rate.
>
>4 billion patches could be bisected in 34 boots. Between 2.6.20 and
>2.6.21-rc1 there are only:
>
>$ git rev-list --no-merges v2.6.20..v2.6.21-rc1  |wc -l
>3118
>
>patches, requiring 14 boots. In general ceil(log(n)/log(2))+2 boots.
>
>Of course, this is a little optimistic because it assumes no additional
>breakage occurring at the various bisection points. In any event,
>assuming (pessimistically) 10 minutes per build, this is 280 minutes or
>4 hours and 40 minutes of build time. I estimate the process should
>complete well before Friday of this week, never mind June.

Chuckle, sorry to disappoint you wli, on that 32 cpu Niagara Con was
calling 'poor equipment', maybe.

Even using ccache, it's about 15-18 minutes per build, with another 10 to
edit my build script and construct the kernel tree with the proper
patches applied.  Then a reboot, probably 10 minutes by the time I get
the nvidia driver installed for the new kernel and get startx'd, then it's
another 2 hours or a bit less for an amanda run to test it.

I've posted to the amanda lists too, so they will be aware of it.  And
because an ls -lc returns perfectly sane values for the mtimes and sizes,
I suspect the real problem may not necessarily be 100% kernel related.  I
have been intermittently ranting because both the tar api stir, and the
return from tar are such a moving target that the developers are having a
hard time staying ahead of the changes to tar; backward compatibility, it
seems, is the furthest thing from the tar maintainers' minds.  The most
recent change that I'm aware of is that tar now returns a 1 for success!
What the heck were those guys at gnu.org thinking?  Or smoking as the
case may be.

I obviously have a copy of the -rc1 patch in its entirety that I could
peruse, but I'm not sure I would recognize a change that would affect tar
if it bit me, hence the questions here to those who are far more
conversant than I.

As I've said on several occasions William, at my age, the best part I can 
play here is the Canary, in the coal mine scene, and something strange in 
the air of 2.6.21* just killed me.  It's now up to the coroner(s) to 
determine the cause, and he has several dozen very very able assistants 
monitoring this list in my NSH opinion.  Whatever, either tar, or this
particular board in the kernel's architecture surely needs fixing before
2.6.21 final.  I don't even know if gnu.org has a bugzilla setup, but
I'll look around tomorrow night as I'm tied up now till late tomorrow.  
If they do, I'll file it.

But, I'm also amazed that no one else has been bitten.  Don't any of you 
ever make a backup using tar for the pack mule?

Thanks William.

>-- wli

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Do unto others before they undo you.


Re: [3/6] 2.6.21-rc2: known regressions

2007-03-13 Thread Tejun Heo
Hello,

Mathieu Bérard wrote:
> [   15.031823] ata1.00: taskfile_load_raw: (0x1f1-1f7): hex: 10 03 00 00
> 00 a0 ef

Okay, this is interesting.  This is Enable Device-Initiated Interface
Power State Transitions.  So, after this command is executed the device
will try to transit to partial/slumber SATA PHY power states at its
discretion, which is all cool and dandy in theory but depending on
controller and drive firmware can cause all sorts of problems.
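
For reference, a small Python sketch of how the seven logged bytes map onto
the taskfile registers at ports 0x1f1 through 0x1f7. The constant names
follow the kernel's include/linux/ata.h as best I recall and should be
treated as labels only; decode_taskfile() and is_enable_dipm() are
hypothetical helpers.

```python
# Register order for ports 0x1f1..0x1f7, as in the taskfile_load_raw line.
FIELDS = ("feature", "nsect", "lbal", "lbam", "lbah", "device", "command")

ATA_CMD_SET_FEATURES = 0xEF      # SET FEATURES command
SETFEATURES_SATA_ENABLE = 0x10   # subcommand: enable a SATA feature
SATA_DIPM = 0x03                 # device-initiated interface power management


def decode_taskfile(hex_bytes):
    """Turn a string such as '10 03 00 00 00 a0 ef' into a register dict."""
    values = [int(b, 16) for b in hex_bytes.split()]
    if len(values) != len(FIELDS):
        raise ValueError("expected 7 register values")
    return dict(zip(FIELDS, values))


def is_enable_dipm(tf):
    """True if the taskfile enables device-initiated link power transitions."""
    return (tf["command"] == ATA_CMD_SET_FEATURES
            and tf["feature"] == SETFEATURES_SATA_ENABLE
            and tf["nsect"] == SATA_DIPM)


tf = decode_taskfile("10 03 00 00 00 a0 ef")
```

A check of this kind is also roughly what logging or filtering of _GTF
taskfiles would have to do, one command at a time.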

The NCQ problem you're seeing probably is some side effect of device
initiated link PS.  Can't tell whether the controller or the drive's
firmware is the problem without further info.  Due to blacklisting, NCQ
won't be turned on for your drive in future kernels and link PS doesn't
seem to cause any problem on non-NCQ, so your case is taken care of here
but this leaves me a bit worried about what _GTF feeds us.

I don't think we can reliably filter out command TFs as it might even
contain vendor-specific commands but it might be better to always log
TFs executed for _GTF such that we at least know what's going on with
the drive.

Thanks.

-- 
tejun


Re: NPTL patch for linux 2.4.28

2007-03-13 Thread Willy Tarreau
On Wed, Mar 14, 2007 at 05:49:22AM +0530, Syed Ahemed wrote:
> Hello all.
> I have a tricky problem on  hand and a straight forward question.
> 
> Tricky problem:
> -
> While debugging a simple multithreaded application using gdb on linux
> 2.4.28, I noticed the thread that has crashed after sigsegv has
> complete information in gdb (both address and function at the
> time of crash). But the other threads that are in wait state
> (executing glibc functions at the time of crash) just have the address
> but not the function name, as shown below.
> 
> 
> sh-2.05b# ./gdb a.out /mnt/cf/engg_files/core_files/
> a.out.1173437318.core.5312   a.out.1173453940.core.9829
> a.out.1173438125.core.16016  lost+found
> a.out.1173438881.core.18721
>  GNU gdb 6.3
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain 
> conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "i686-pc-linux-gnu"...Using host libthread_db 
> library
> "/lib/libthread_db.so.1".
> 
> warning: exec file is newer than core file.
> Core was generated by `./a.out'.
> Program terminated with signal 11, Segmentation fault.
> Reading symbols from /lib/libpthread.so.0...done.
> Loaded symbols for /lib/libpthread.so.0
> Reading symbols from /lib/libc.so.6...done.
> Loaded symbols for /lib/libc.so.6
> Reading symbols from /opt/lib/ld-linux.so.2...done.
> Loaded symbols for /opt//lib/ld-linux.so.2
> #0  0x080485df in a (p=0x0) at threadcore.c:34
> 34  threadcore.c: No such file or directory.
>in threadcore.c
> (gdb) info threads
>  3 process 10993  0x08053840 in ?? ()
>  2 process 1267  0xbf5ff9d0 in ?? ()
> * 1 process 9829  0x080485df in a (p=0x0) at threadcore.c:34
> (gdb) thread 3
> [Switching to thread 3 (process 10993)]#0  0x08053840 in ?? ()
> (gdb) bt
> #0  0x08053840 in ?? ()
> Cannot access memory at address 0x2b
> (gdb)
> #0  0x08053840 in ?? ()
> Cannot access memory at address 0x2b
> (gdb) thread 1
> [Switching to thread 1 (process 9829)]#0  0x080485df in a (p=0x0)
>at threadcore.c:34
> 34  in threadcore.c
> (gdb) bt
> #0  0x080485df in a (p=0x0) at threadcore.c:34
> #1  0x080485bc in main () at threadcore.c:21
> (gdb) thread 2
> [Switching to thread 2 (process 1267)]#0  0xbf5ff9d0 in ?? ()
> (gdb) bt
> #0  0xbf5ff9d0 in ?? ()
> Cannot access memory at address 0x2b
> (gdb) q
> sh-2.05b#
> 
> 
> The problem is that with the same glibc and gdb, Red Hat 9 (linux
> 2.4.20-8) does give me complete information on all the threads in the
> "info threads" command.
> Having read about similar problems on various mailing lists, I believe
> the only difference is that Red Hat 9 has patched its kernel with NPTL or
> debugging support for linux in the kernel.
> 
> Wanted to confirm if this is correct.

I really have no idea about this problem.

> My question
> --
> 
> Someone would say move to a 2.6 kernel and a different glibc, but with
> custom applications at stake I can't take that risk as yet. So I
> would want an NPTL patch for the 2.4.28 kernel.
> Where do I get it? Please do respond.

Last time I saw an NPTL patch, it was for something like 2.4.21 patched
with O(1) scheduler. I bet you'll have a hard time merging those together
in 2.4.28. It is also possible that core dumps don't look the same. I
have in mind some old changes about thread core dumps, but that's too
far away to say anything reliable on the subject. Check that your core
files are of the same sizes between NPTL and no-NPTL kernels.

Alternatively, you could try RHEL3's kernel (2.4.21) which has all those
things and which is still supported.

Regards,
Willy



Re: New thread RDSL, post-2.6.20 kernels and amanda (tar) miss-fires

2007-03-13 Thread William Lee Irwin III
On Tue, Mar 13, 2007 at 11:31:53PM -0400, Gene Heskett wrote:
>> Now, can someone suggest a patch I can revert that might fix this?  The 
>> total number of patches between 2.6.20 and 2.6.21-rc1 will have me 
>> building kernels to bisect this till the middle of June at this rate.

On Tue, Mar 13, 2007 at 10:07:21PM -0700, William Lee Irwin III wrote:
> 4 billion patches could be bisected in 34 boots. Between 2.6.20 and
> 2.6.21-rc1 there are only:
> $ git rev-list --no-merges v2.6.20..v2.6.21-rc1  |wc -l
> 3118
> patches, requiring 14 boots. In general ceil(log(n)/log(2))+2 boots.
> Of course, this is a little optimistic because it assumes no additional
> breakage occurring at the various bisection points. In any event,
> assuming (pessimistically) 10 minutes per build, this is 280 minutes or
> 4 hours and 40 minutes of build time. I estimate the process should
> complete well before Friday of this week, never mind June.

33 boots for 4 billion, 13 boots for 3118, ceil(log(n)/log(2))+1 boots
in general, 10 minutes/build gives 130 minutes or 2 hours, 10 minutes
for 13 boots. I have no plausible explanation for these errors, and
don't care to be told of any, either.
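
The corrected figures can be checked with a short sketch. It assumes the formula stated above, ceil(log(n)/log(2))+1 boots, and the pessimistic 10 minutes per build; the function names are made up for illustration and are not part of git.

```python
def bisect_boots(n_patches):
    """Boots needed to bisect n_patches, using ceil(log2(n)) + 1."""
    if n_patches < 1:
        raise ValueError("need at least one patch")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1 and avoids
    # floating-point rounding.
    return (n_patches - 1).bit_length() + 1


def bisect_minutes(n_patches, minutes_per_build=10):
    """Total build time, assuming one build per boot."""
    return bisect_boots(n_patches) * minutes_per_build


if __name__ == "__main__":
    for n in (4_000_000_000, 3118):
        print(n, bisect_boots(n), bisect_minutes(n))
```

Like the estimate itself, this ignores any extra builds caused by unrelated breakage at a bisection point.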


-- wli


Re: Linux 2.6.20.3

2007-03-13 Thread David Miller
From: Bill Irwin <[EMAIL PROTECTED]>
Date: Tue, 13 Mar 2007 22:40:18 -0700

> I'm still trying to get on this.

See the response I just gave in this thread; it has some tips that might
help track down what's going wrong here.


Re: Linux 2.6.20.3

2007-03-13 Thread Bill Irwin
On Tue, Mar 13, 2007 at 05:29:17PM -0700, Nish Aravamudan wrote:
> Ok, truly bizarre, I found that I was not running stock 2.6.20.3, but
> had your small hugetlb patch on top.
> So I went back and patched 2.6.20.1 with your patch, rebooted, got a
> soft lockup. Went back to stock 2.6.20.1 and did not.
> I don't see how your patch (C&P below for reference) could make any
> difference...Especially because no hugepages were in use at the time.
> On patched 2.6.20.1, I was just trying to check if my source tree had
> your patch applied (by `patch -p1 < davem.patch`) and got the
> soft-lockup I saw in 2.6.20.3 with the patch applied. I am going to
> try a clean 2.6.20.3 as well, now.

I'm still trying to get on this.


-- wli


[BUG] reiser4: page lock recursion in reiser4_write_extent

2007-03-13 Thread Nate Diller

This little code snippet seems to have a page_lock recursion, in
addition to overall looking particularly fragile to me.  It seems to
be handling the case where a page needs to be brought uptodate because
a partial page write is being done.  The page gets locked as many as 3
times, each checking PageUptodate, however the two failure cases here
go BUG() instead of returning an error.  I'm starting to think that
somehow the whole suspect branch just never gets taken, because
otherwise I would expect to see bug reports related to -EIO, -ENOMEM,
etc causing this to barf.

either way, it seems there's a lock recursion if another thread races
to bring @page uptodate while we're waiting on the first lock_page()
call.

---

    page = jnode_page(jnodes[i]);
    if (page_offset(page) < inode->i_size &&
        !PageUptodate(page) && to_page != PAGE_CACHE_SIZE) {
            /*
             * the above is not optimal for partial write to last
             * page of file when file size is not at boundary of
             * page
             */
takes the lock
            lock_page(page);
raced with readpage?
            if (!PageUptodate(page)) {
readpage drops lock
                    result = readpage_unix_file(NULL, page);
                    BUG_ON(result != 0);
-ENOMEM?
                    /* wait for read completion */
                    lock_page(page);
                    BUG_ON(!PageUptodate(page));
-EIO?
                    unlock_page(page);
            } else
still have the lock here
                    result = 0;
    }

    BUG_ON(get_current_context()->trans->atom != NULL);
    fault_in_pages_readable(buf, to_page);
    BUG_ON(get_current_context()->trans->atom != NULL);

BOOM!!!
    lock_page(page);
    if (!PageUptodate(page) && to_page != PAGE_CACHE_SIZE) {

---
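
The two paths through the snippet can be modelled in userspace as a rough sketch. This is a toy, not reiser4 code: a non-recursive threading.Lock stands in for the page lock, the function name is hypothetical, and the only thing it shows is which branch reaches the final lock_page() with the lock still held.

```python
import threading


def second_lock_would_block(uptodate_after_first_lock):
    """Model the locking of the quoted path with a non-recursive lock.

    Returns True if the final lock_page() would block on a lock that
    the same path already holds.
    """
    page_lock = threading.Lock()   # stands in for the page lock

    page_lock.acquire()            # first lock_page(page)
    if not uptodate_after_first_lock:
        page_lock.release()        # readpage drops the lock
        page_lock.acquire()        # lock_page(page): wait for read completion
        page_lock.release()        # unlock_page(page)
    # else: nothing unlocks the page, so the lock is still held here

    # Final lock_page(page); a non-blocking attempt shows whether it would hang.
    got_it = page_lock.acquire(blocking=False)
    if got_it:
        page_lock.release()
    return not got_it
```

In this model only the case where another thread brought the page uptodate before the first lock was granted ends up blocking on itself.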

NATE


Re: Linux 2.6.20.3

2007-03-13 Thread David Miller
From: "Nish Aravamudan" <[EMAIL PROTECTED]>
Date: Tue, 13 Mar 2007 17:29:17 -0700

> Ok, truly bizarre, I found that I was not running stock 2.6.20.3, but
> had your small hugetlb patch on top.
> 
> So I went back and patched 2.6.20.1 with your patch, rebooted, got a
> soft lockup. Went back to stock 2.6.20.1 and did not.
> 
> I don't see how your patch (C&P below for reference) could make any
> difference...Especially because no hugepages were in use at the time.
> On patched 2.6.20.1, I was just trying to check if my source tree had
> your patch applied (by `patch -p1 < davem.patch`) and got the
> soft-lockup I saw in 2.6.20.3 with the patch applied. I am going to
> try a clean 2.6.20.3 as well, now.

We've seen cases in the past where something benign like this
triggers a bug because it moves the data/bss/etc. sections
around.

For example, if the used parts of the kernel image end up extending
into another page, this can influence the bootup memory detection
logic.

The softlockup in your first trace shows it spinning in the journaling
code.  Perhaps what is happening is that it is looping endlessly over
some data structure which is in a corrupted state.  This could happen
if the bootup code erroneously frees up pages it should not have.
I would recommend figuring out exactly what the journaling code is
stuck on, then try to trace the life of that page of memory from
early bootup until it is allocated.

I hope this can help you figure out this bug as I can't reproduce it
here at all, and I did merge the hugetlb fix into Linus's tree
already (and that is the right thing to do as we certainly have some
unrelated bug here).

On the flip side, I bet removing some kernel config option might make
the heisenbug go away if you're just eager to test the hugetlb patch
:-)


Re: Regression between 2.6.20 and 2.6.21-rc1: NCQ problem with ahci and Hitachi drive

2007-03-13 Thread Len Brown

> Yes It works with acpi=off (2.6.21-rc1):
> Please notice that IRQ is changed from 19 with ACPI to 11 without.

Please verify the problem still exists in the latest 2.6.21 git.

If yes, please file a bug here:
http://bugzilla.kernel.org/enter_bug.cgi?product=ACPI

For 2.6.20.stable, please attach
the complete output from dmesg -s64000
output from acpidump
output from lspci -vv
and paste a copy of /proc/interrupts

For 2.6.21.broken, please attach as much
of the dmesg as you can capture, and the /proc/interrupts
if you can get that far.

thanks,
-Len


[PATCH 16/18] kconfig for oprofile

2007-03-13 Thread Steven Rostedt
Merge the oprofile configs.

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Cc: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>
Cc: Chris Wright <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

diff --git a/arch/x86_64/oprofile/Kconfig b/arch/x86_64/oprofile/Kconfig
deleted file mode 100644
index d8a8408..000
--- a/arch/x86_64/oprofile/Kconfig
+++ /dev/null
@@ -1,17 +0,0 @@
-config PROFILING
-   bool "Profiling support (EXPERIMENTAL)"
-   help
- Say Y here to enable the extended profiling support mechanisms used
- by profilers such as OProfile.
- 
-
-config OPROFILE
-   tristate "OProfile system profiling (EXPERIMENTAL)"
-   depends on PROFILING
-   help
- OProfile is a profiling system capable of profiling the
- whole system, include the kernel, kernel modules, libraries,
- and applications.
-
- If unsure, say N.
-

--


[PATCH 09/18] create x86/kernel/cpu/mcheck/Makefile

2007-03-13 Thread Steven Rostedt
Create the Makefile in the common hold and adjust the i386 and
x86_64 code accordingly.

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Cc: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>
Cc: Chris Wright <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

diff --git a/arch/x86/kernel/cpu/mcheck/Makefile b/arch/x86/kernel/cpu/mcheck/Makefile
new file mode 100644
index 000..6e7cb4c
--- /dev/null
+++ b/arch/x86/kernel/cpu/mcheck/Makefile
@@ -0,0 +1 @@
+obj-y  = therm_throt.o
\ No newline at end of file
diff --git a/arch/i386/kernel/cpu/mcheck/Makefile b/arch/i386/kernel/cpu/mcheck/Makefile
index f1ebe1c..30808f3 100644
--- a/arch/i386/kernel/cpu/mcheck/Makefile
+++ b/arch/i386/kernel/cpu/mcheck/Makefile
@@ -1,2 +1,2 @@
-obj-y  =   mce.o k7.o p4.o p5.o p6.o winchip.o therm_throt.o
+obj-y  =   mce.o k7.o p4.o p5.o p6.o winchip.o
 obj-$(CONFIG_X86_MCE_NONFATAL) +=  non-fatal.o

--


[PATCH 15/18] create x86/mm/Makefile

2007-03-13 Thread Steven Rostedt
Create the Makefile in the common hold and adjust the i386 and
x86_64 code accordingly.


Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Cc: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>
Cc: Chris Wright <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
new file mode 100644
index 000..1b6e922
--- /dev/null
+++ b/arch/x86/mm/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
diff --git a/arch/i386/mm/Makefile b/arch/i386/mm/Makefile
index 80908b5..0cb01e6 100644
--- a/arch/i386/mm/Makefile
+++ b/arch/i386/mm/Makefile
@@ -5,6 +5,5 @@
 obj-y  := init.o pgtable.o fault.o ioremap.o extable.o pageattr.o mmap.o
 
 obj-$(CONFIG_NUMA) += discontig.o
-obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
 obj-$(CONFIG_HIGHMEM) += highmem.o
 obj-$(CONFIG_BOOT_IOREMAP) += boot_ioremap.o
diff --git a/arch/x86_64/mm/Makefile b/arch/x86_64/mm/Makefile
index d25ac86..4beaed8 100644
--- a/arch/x86_64/mm/Makefile
+++ b/arch/x86_64/mm/Makefile
@@ -3,9 +3,6 @@
 #
 
 obj-y   := init.o fault.o ioremap.o extable.o pageattr.o mmap.o
-obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
 obj-$(CONFIG_NUMA) += numa.o
 obj-$(CONFIG_K8_NUMA) += k8topology.o
 obj-$(CONFIG_ACPI_NUMA) += srat.o
-
-hugetlbpage-y = ../../i386/mm/hugetlbpage.o

--


[PATCH 14/18] rm include pointer to i386 msr-on-cpu.c file

2007-03-13 Thread Steven Rostedt
Remove the C file with just the include that points to the
i386 msr-on-cpu.c file.


Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Cc: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>
Cc: Chris Wright <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

diff --git a/arch/x86_64/lib/msr-on-cpu.c b/arch/x86_64/lib/msr-on-cpu.c
deleted file mode 100644
index 47e0ec4..000
--- a/arch/x86_64/lib/msr-on-cpu.c
+++ /dev/null
@@ -1 +0,0 @@
-#include "../../i386/lib/msr-on-cpu.c"

--


[PATCH 03/18] acpi Makefile updates

2007-03-13 Thread Steven Rostedt
Create arch/x86/kernel/acpi/Makefile, and remove the associated stuff
from the i386 and x86_64.

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Cc: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>
Cc: Chris Wright <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

diff --git a/arch/x86/kernel/acpi/Makefile b/arch/x86/kernel/acpi/Makefile
new file mode 100644
index 000..f4aa6dc
--- /dev/null
+++ b/arch/x86/kernel/acpi/Makefile
@@ -0,0 +1,5 @@
+obj-$(CONFIG_ACPI) += boot.o
+
+ifneq ($(CONFIG_ACPI_PROCESSOR),)
+obj-y  += processor.o cstate.o
+endif
diff --git a/arch/x86_64/kernel/acpi/Makefile b/arch/x86_64/kernel/acpi/Makefile
index 080b996..eb4bc11 100644
--- a/arch/x86_64/kernel/acpi/Makefile
+++ b/arch/x86_64/kernel/acpi/Makefile
@@ -1,9 +1,2 @@
-obj-y  := boot.o
-boot-y := ../../../i386/kernel/acpi/boot.o
 obj-$(CONFIG_ACPI_SLEEP)   += sleep.o wakeup.o
 
-ifneq ($(CONFIG_ACPI_PROCESSOR),)
-obj-y  += processor.o
-processor-y:= ../../../i386/kernel/acpi/processor.o ../../../i386/kernel/acpi/cstate.o
-endif
-

--


[PATCH 07/18] mv kernel/cpu/cpufreq/speedstep-lib.c

2007-03-13 Thread Steven Rostedt
Move kernel/cpu/cpufreq/speedstep-lib.c to the common hold.

Also has the slight change to reference speedstep-lib.h that is being moved
to include/asm-i386.

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Cc: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>
Cc: Chris Wright <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

diff --git a/arch/i386/kernel/cpu/cpufreq/speedstep-lib.c b/arch/x86/kernel/cpu/cpufreq/speedstep-lib.c
similarity index 100%
rename from arch/i386/kernel/cpu/cpufreq/speedstep-lib.c
rename to arch/x86/kernel/cpu/cpufreq/speedstep-lib.c
index d59277c..ff4482b 100644
--- a/arch/i386/kernel/cpu/cpufreq/speedstep-lib.c
+++ b/arch/x86/kernel/cpu/cpufreq/speedstep-lib.c
@@ -17,7 +17,7 @@
 #include 
 
 #include 
-#include "speedstep-lib.h"
+#include 
 
 #define dprintk(msg...) cpufreq_debug_printk(CPUFREQ_DEBUG_DRIVER, "speedstep-lib", msg)
 

--


[PATCH 05/18] mv kernel/cpu/cpufreq/p4-clockmod.c

2007-03-13 Thread Steven Rostedt
Move kernel/cpu/cpufreq/p4-clockmod.c to the common hold.

Also has the slight change to reference speedstep-lib.h that is being moved
to include/asm-i386.

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Cc: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>
Cc: Chris Wright <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Jeremy Fitzhardinge <[EMAIL PROTECTED]>


diff --git a/arch/i386/kernel/cpu/cpufreq/p4-clockmod.c b/arch/x86/kernel/cpu/cpufreq/p4-clockmod.c
similarity index 100%
rename from arch/i386/kernel/cpu/cpufreq/p4-clockmod.c
rename to arch/x86/kernel/cpu/cpufreq/p4-clockmod.c
index 4786fed..ac5d5a1 100644
--- a/arch/i386/kernel/cpu/cpufreq/p4-clockmod.c
+++ b/arch/x86/kernel/cpu/cpufreq/p4-clockmod.c
@@ -33,7 +33,7 @@
 #include 
 #include 
 
-#include "speedstep-lib.h"
+#include 
 
 #define PFX"p4-clockmod: "
 #define dprintk(msg...) cpufreq_debug_printk(CPUFREQ_DEBUG_DRIVER, "p4-clockmod", msg)

--


[PATCH 08/18] create x86/kernel/cpu/Makefile

2007-03-13 Thread Steven Rostedt
Create the Makefile in the common hold and adjust the i386 and
x86_64 code accordingly.

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Cc: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>
Cc: Chris Wright <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
new file mode 100644
index 000..bf4ae59
--- /dev/null
+++ b/arch/x86/kernel/cpu/Makefile
@@ -0,0 +1,5 @@
+
+obj-$(CONFIG_MTRR) +=  mtrr/
+obj-y  += intel_cacheinfo.o
+obj-$(CONFIG_X86_MCE)  +=  mcheck/
+obj-$(CONFIG_CPU_FREQ) +=  cpufreq/
diff --git a/arch/i386/kernel/cpu/Makefile b/arch/i386/kernel/cpu/Makefile
index 010aecf..e484d74 100644
--- a/arch/i386/kernel/cpu/Makefile
+++ b/arch/i386/kernel/cpu/Makefile
@@ -8,12 +8,11 @@ obj-y +=  amd.o
 obj-y  +=  cyrix.o
 obj-y  +=  centaur.o
 obj-y  +=  transmeta.o
-obj-y  +=  intel.o intel_cacheinfo.o
+obj-y  +=  intel.o
 obj-y  +=  rise.o
 obj-y  +=  nexgen.o
 obj-y  +=  umc.o
 
 obj-$(CONFIG_X86_MCE)  +=  mcheck/
 
-obj-$(CONFIG_MTRR) +=  mtrr/
 obj-$(CONFIG_CPU_FREQ) +=  cpufreq/

--


[PATCH 18/18] Straight file moves

2007-03-13 Thread Steven Rostedt
Here's a list of files that were moved from either i386 or x86_64 over
to the arch/x86 directory.  Since I now used the git-diff -M option
(thanks Linus!), and to spare LKML a lot of patches, I put all
the renames that were unmodified (strictly renamed) into this patch,
with one exception.  I put the moving of the speedstep-lib.h file
in its own patch to allow for discussion on that (ok Chris).

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Cc: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>
Cc: Chris Wright <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

diff --git a/arch/i386/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
similarity index 100%
rename from arch/i386/kernel/acpi/boot.c
rename to arch/x86/kernel/acpi/boot.c
diff --git a/arch/i386/kernel/acpi/cstate.c b/arch/x86/kernel/acpi/cstate.c
similarity index 100%
rename from arch/i386/kernel/acpi/cstate.c
rename to arch/x86/kernel/acpi/cstate.c
diff --git a/arch/i386/kernel/acpi/processor.c b/arch/x86/kernel/acpi/processor.c
similarity index 100%
rename from arch/i386/kernel/acpi/processor.c
rename to arch/x86/kernel/acpi/processor.c
diff --git a/arch/i386/kernel/alternative.c b/arch/x86/kernel/alternative.c
similarity index 100%
rename from arch/i386/kernel/alternative.c
rename to arch/x86/kernel/alternative.c
diff --git a/arch/i386/kernel/bootflag.c b/arch/x86/kernel/bootflag.c
similarity index 100%
rename from arch/i386/kernel/bootflag.c
rename to arch/x86/kernel/bootflag.c
diff --git a/arch/i386/kernel/cpu/intel_cacheinfo.c b/arch/x86/kernel/cpu/intel_cacheinfo.c
similarity index 100%
rename from arch/i386/kernel/cpu/intel_cacheinfo.c
rename to arch/x86/kernel/cpu/intel_cacheinfo.c
diff --git a/arch/i386/kernel/cpu/mcheck/therm_throt.c b/arch/x86/kernel/cpu/mcheck/therm_throt.c
similarity index 100%
rename from arch/i386/kernel/cpu/mcheck/therm_throt.c
rename to arch/x86/kernel/cpu/mcheck/therm_throt.c
diff --git a/arch/i386/kernel/cpu/mtrr/Makefile b/arch/x86/kernel/cpu/mtrr/Makefile
similarity index 100%
rename from arch/i386/kernel/cpu/mtrr/Makefile
rename to arch/x86/kernel/cpu/mtrr/Makefile
diff --git a/arch/i386/kernel/cpu/mtrr/amd.c b/arch/x86/kernel/cpu/mtrr/amd.c
similarity index 100%
rename from arch/i386/kernel/cpu/mtrr/amd.c
rename to arch/x86/kernel/cpu/mtrr/amd.c
diff --git a/arch/i386/kernel/cpu/mtrr/centaur.c b/arch/x86/kernel/cpu/mtrr/centaur.c
similarity index 100%
rename from arch/i386/kernel/cpu/mtrr/centaur.c
rename to arch/x86/kernel/cpu/mtrr/centaur.c
diff --git a/arch/i386/kernel/cpu/mtrr/cyrix.c b/arch/x86/kernel/cpu/mtrr/cyrix.c
similarity index 100%
rename from arch/i386/kernel/cpu/mtrr/cyrix.c
rename to arch/x86/kernel/cpu/mtrr/cyrix.c
diff --git a/arch/i386/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
similarity index 100%
rename from arch/i386/kernel/cpu/mtrr/generic.c
rename to arch/x86/kernel/cpu/mtrr/generic.c
diff --git a/arch/i386/kernel/cpu/mtrr/if.c b/arch/x86/kernel/cpu/mtrr/if.c
similarity index 100%
rename from arch/i386/kernel/cpu/mtrr/if.c
rename to arch/x86/kernel/cpu/mtrr/if.c
diff --git a/arch/i386/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
similarity index 100%
rename from arch/i386/kernel/cpu/mtrr/main.c
rename to arch/x86/kernel/cpu/mtrr/main.c
diff --git a/arch/i386/kernel/cpu/mtrr/mtrr.h b/arch/x86/kernel/cpu/mtrr/mtrr.h
similarity index 100%
rename from arch/i386/kernel/cpu/mtrr/mtrr.h
rename to arch/x86/kernel/cpu/mtrr/mtrr.h
diff --git a/arch/i386/kernel/cpu/mtrr/state.c b/arch/x86/kernel/cpu/mtrr/state.c
similarity index 100%
rename from arch/i386/kernel/cpu/mtrr/state.c
rename to arch/x86/kernel/cpu/mtrr/state.c
diff --git a/arch/i386/kernel/cpuid.c b/arch/x86/kernel/cpuid.c
similarity index 100%
rename from arch/i386/kernel/cpuid.c
rename to arch/x86/kernel/cpuid.c
diff --git a/arch/x86_64/kernel/early_printk.c b/arch/x86/kernel/early_printk.c
similarity index 100%
rename from arch/x86_64/kernel/early_printk.c
rename to arch/x86/kernel/early_printk.c
diff --git a/arch/i386/kernel/i8237.c b/arch/x86/kernel/i8237.c
similarity index 100%
rename from arch/i386/kernel/i8237.c
rename to arch/x86/kernel/i8237.c
diff --git a/arch/x86_64/kernel/k8.c b/arch/x86/kernel/k8.c
similarity index 100%
rename from arch/x86_64/kernel/k8.c
rename to arch/x86/kernel/k8.c
diff --git a/arch/i386/kernel/microcode.c b/arch/x86/kernel/microcode.c
similarity index 100%
rename from arch/i386/kernel/microcode.c
rename to arch/x86/kernel/microcode.c
diff --git a/arch/i386/kernel/msr.c b/arch/x86/kernel/msr.c
similarity index 100%
rename from arch/i386/kernel/msr.c
rename to arch/x86/kernel/msr.c
diff --git a/arch/i386/kernel/pcspeaker.c b/arch/x86/kernel/pcspeaker.c
similarity index 100%
rename from arch/i386/kernel/pcspeaker.c
rename to arch/x86/kernel/pcspeaker.c
diff --git a/arch/i386/kernel/quirks.c b/arch/x86/kernel/quirks.c
similarity index 100%
rename from arch/i386/kernel/quir

[PATCH 01/18] toplevel Kconfig changes

2007-03-13 Thread Steven Rostedt
Create a toplevel Kconfig for arch/x86 and update the i386
and x86_64 Kconfigs as well.

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Cc: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>
Cc: Chris Wright <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
new file mode 100644
index 000..18223ff
--- /dev/null
+++ b/arch/x86/Kconfig
@@ -0,0 +1,4 @@
+
+
+source arch/x86/oprofile/Kconfig
+
diff --git a/arch/x86_64/Kconfig b/arch/x86_64/Kconfig
index 56eb14c..3a2e117 100644
--- a/arch/x86_64/Kconfig
+++ b/arch/x86_64/Kconfig
@@ -738,7 +738,7 @@ source fs/Kconfig
 menu "Instrumentation Support"
 depends on EXPERIMENTAL
 
-source "arch/x86_64/oprofile/Kconfig"
+source "arch/x86/Kconfig"
 
 config KPROBES
bool "Kprobes (EXPERIMENTAL)"

--


[PATCH 00/18] Make common x86 arch area for i386 and x86_64 - Take 2

2007-03-13 Thread Steven Rostedt
[Hopefully fixed email client to make it to the list this time]
[This series has changed by using git-diff -M]

Recently I've been doing some work that will affect both the i386 and x86_64
architectures.  So there will be common code for both, as well as code
that will be unique for the specific arch.  So I was looking into a way
to do this cleanly, and found that there is no clean way to share code
between x86_64 and i386.

What we have currently is a bunch of hacks.  Seems that people can't make
up their minds about what to do.

We have hack 1.
===

Reference code from i386 in the x86_64 Makefiles.

Examples:

  therm_throt-y   += ../../i386/kernel/cpu/mcheck/therm_throt.o
  bootflag-y+= ../../i386/kernel/bootflag.o

[tabs screwed up, because the above can't be consistent on that either]


We have hack 2.
===

Reference code from x86_64 in the i386 Makefiles.

Examples:

  k8-y  += ../../x86_64/kernel/k8.o
  stacktrace-y+= ../../x86_64/kernel/stacktrace.o

[again the tabs too are messed up]
[--ok I'm sure I mess up the tabs too in my code--]

Now my favorite hacks!

We have hack 3.
===

Make a sole file with just an include pointer to the i386 code.

  [EMAIL PROTECTED]:~/work/git/linus.git$ cat arch/x86_64/lib/msr-on-cpu.c
  #include "../../i386/lib/msr-on-cpu.c"
  [EMAIL PROTECTED]:~/work/git/linus.git$

We have hack 4.
===

Make a sole file with just an include pointer to the x86_64 code.

  [EMAIL PROTECTED]:~/work/git/linus.git$ cat arch/i386/kernel/early_printk.c

  #include "../../x86_64/kernel/early_printk.c"
  [EMAIL PROTECTED]:~/work/git/linus.git$
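
All four hacks share one signature: a relative path that climbs out of one arch directory and into the other. As a rough sketch (the regular expression and function name are illustrative assumptions, not an existing kernel script), such references can be picked out of Makefile or C text like this:

```python
import re

# Matches relative paths that reach into the sibling arch, for example
# ../../i386/kernel/bootflag.o or ../../x86_64/kernel/k8.o
CROSS_ARCH = re.compile(r'(?:\.\./)+(i386|x86_64)/[^\s"]+')


def find_cross_arch_refs(text):
    """Return (arch, path) pairs for every cross-arch reference in text."""
    return [(m.group(1), m.group(0)) for m in CROSS_ARCH.finditer(text)]


if __name__ == "__main__":
    sample = (
        "k8-y += ../../x86_64/kernel/k8.o\n"
        '#include "../../i386/lib/msr-on-cpu.c"\n'
    )
    for arch, path in find_cross_arch_refs(sample):
        print(arch, path)
```

Feeding it the contents of the files under arch/i386 and arch/x86_64 would give a first list of candidates for sharing, though each hit would still need checking by hand.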


So I spent last night hacking up something to try to make a common ground
for all code that is shared between x86_64 and i386.  I called this

   arch/x86


Seems appropriate, but I really don't care what it's called.  One thing about
this name, is that typing arch/x86 doesn't tab complete x86_64 anymore.
But if you can think of something better, I'd be happy to apply it.


So the following set of patches moves common code into the arch/x86 area
and updates the i386 and x86_64 files accordingly.  I separated the
patches into files that hold just Makefile changes, Kconfig changes, and
the actual moves of files.

The moves are now represented in their own patch, one big rename patch
using the git-diff -M format.

So the moves are simply renames, with the slight exception
of files that include the speedstep-lib.h file.  This file was moved from the
arch/i386/kernel/cpu/cpufreq directory and put into the include/asm-i386
directory.  This was due to the fact that some of the moved files included
it, and some files that were not moved also included it. Instead of using
the #include "../../x86/" hack again, I just simply moved it to the global
i386 include directory.  Only the arch/x86 code will use the include/asm-i386
change. The patches that move the files including this header therefore
also change their #include lines so that they locate it in its new
place.


With this change of having a single place that holds both the x86_64 files
and the i386 code, it becomes obvious which files are being shared.
This way we don't have to worry about someone changing a file in either
x86_64 or i386 and having it break the other arch, because they didn't
realize it was being shared.


Note: I left out all the shared pci code.  It seems that this code is placed
special in the Makefiles for linking order or what not, and I don't want to
spend the time sorting that out without knowing if these changes are acceptable
or not.


-- Steve

PS. Sorry for the spam. I need to figure out how to tame quilt mail!


--


[PATCH 11/18] rm include pointer to x86_64 early_printk.c

2007-03-13 Thread Steven Rostedt
Remove the C file with just the include that points to the
x86_64 early_printk.c file.

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Cc: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>
Cc: Chris Wright <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

diff --git a/arch/i386/kernel/early_printk.c b/arch/i386/kernel/early_printk.c
deleted file mode 100644
index 92f812b..000
--- a/arch/i386/kernel/early_printk.c
+++ /dev/null
@@ -1,2 +0,0 @@
-
-#include "../../x86_64/kernel/early_printk.c"

--


[PATCH 06/18] mv kernel/cpu/cpufreq/speedstep-lib.h

2007-03-13 Thread Steven Rostedt
OK, this one is a little different.

Move arch/i386/kernel/cpu/cpufreq/speedstep-lib.h to include/asm-i386/.

This file is used by files in arch/i386/kernel/cpu/cpufreq that are not moved.
So we move it into a more global area, to keep the includes from going a bit
crazy.

Note, the moved files that include this file will have the change to
locate it. So it's not just a straight copy.

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Cc: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>
Cc: Chris Wright <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

diff --git a/arch/i386/kernel/cpu/cpufreq/speedstep-lib.h b/include/asm-i386/speedstep-lib.h
similarity index 100%
rename from arch/i386/kernel/cpu/cpufreq/speedstep-lib.h
rename to include/asm-i386/speedstep-lib.h

--


[PATCH 10/18] make the kernel Makefile

2007-03-13 Thread Steven Rostedt
Create the arch/x86/kernel/Makefile and change the i386 and x86_64
Makefiles accordingly.


Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Cc: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>
Cc: Chris Wright <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
new file mode 100644
index 000..e2300ad
--- /dev/null
+++ b/arch/x86/kernel/Makefile
@@ -0,0 +1,18 @@
+
+obj-y  += bootflag.o topology.o quirks.o i8237.o alternative.o \
+  pcspeaker.o
+
+obj-$(CONFIG_STACKTRACE)   += stacktrace.o
+
+obj-y  +=  cpu/
+obj-$(CONFIG_X86_MSR)  += msr.o
+obj-$(CONFIG_MICROCODE)+= microcode.o
+obj-$(CONFIG_X86_CPUID)+= cpuid.o
+obj-$(CONFIG_ACPI) += acpi/
+obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
+
+ifeq ($(CONFIG_X86_VOYAGER), )
+obj-$(CONFIG_SMP)  += tsc_sync.o
+endif
+
+obj-$(CONFIG_K8_NB)+= k8.o
diff --git a/arch/i386/kernel/Makefile b/arch/i386/kernel/Makefile
index 4ae3dcf..bea2137 100644
--- a/arch/i386/kernel/Makefile
+++ b/arch/i386/kernel/Makefile
@@ -6,19 +6,15 @@ extra-y := head.o init_task.o vmlinux.lds
 
 obj-y  := process.o signal.o entry.o traps.o irq.o \
ptrace.o time.o ioport.o ldt.o setup.o i8259.o sys_i386.o \
-   pci-dma.o i386_ksyms.o i387.o bootflag.o e820.o\
-   quirks.o i8237.o topology.o alternative.o i8253.o tsc.o
+   pci-dma.o i386_ksyms.o i387.o  e820.o\
+   i8253.o tsc.o
 
-obj-$(CONFIG_STACKTRACE)   += stacktrace.o
 obj-y  += cpu/
 obj-y  += acpi/
 obj-$(CONFIG_X86_BIOS_REBOOT)  += reboot.o
 obj-$(CONFIG_MCA)  += mca.o
-obj-$(CONFIG_X86_MSR)  += msr.o
-obj-$(CONFIG_X86_CPUID)+= cpuid.o
-obj-$(CONFIG_MICROCODE)+= microcode.o
 obj-$(CONFIG_APM)  += apm.o
-obj-$(CONFIG_X86_SMP)  += smp.o smpboot.o tsc_sync.o
+obj-$(CONFIG_X86_SMP)  += smp.o smpboot.o
 obj-$(CONFIG_X86_TRAMPOLINE)   += trampoline.o
 obj-$(CONFIG_X86_MPPARSE)  += mpparse.o
 obj-$(CONFIG_X86_LOCAL_APIC)   += apic.o nmi.o
@@ -35,13 +31,10 @@ obj-$(CONFIG_ACPI_SRAT) += srat.o
 obj-$(CONFIG_EFI)  += efi.o efi_stub.o
 obj-$(CONFIG_DOUBLEFAULT)  += doublefault.o
 obj-$(CONFIG_VM86) += vm86.o
-obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
 obj-$(CONFIG_HPET_TIMER)   += hpet.o
-obj-$(CONFIG_K8_NB)+= k8.o
 
 obj-$(CONFIG_VMI)  += vmi.o vmitime.o
 obj-$(CONFIG_PARAVIRT) += paravirt.o
-obj-y  += pcspeaker.o
 
 EXTRA_AFLAGS   := -traditional
 
@@ -82,7 +75,3 @@ SYSCFLAGS_vsyscall-syms.o = -r
 $(obj)/vsyscall-syms.o: $(src)/vsyscall.lds \
$(obj)/vsyscall-sysenter.o $(obj)/vsyscall-note.o FORCE
$(call if_changed,syscall)
-
-k8-y  += ../../x86_64/kernel/k8.o
-stacktrace-y += ../../x86_64/kernel/stacktrace.o
-
diff --git a/arch/x86_64/kernel/Makefile b/arch/x86_64/kernel/Makefile
index bb47e86..3f10fe0 100644
--- a/arch/x86_64/kernel/Makefile
+++ b/arch/x86_64/kernel/Makefile
@@ -7,19 +7,14 @@ EXTRA_AFLAGS  := -traditional
 obj-y  := process.o signal.o entry.o traps.o irq.o \
ptrace.o time.o ioport.o ldt.o setup.o i8259.o sys_x86_64.o \
x8664_ksyms.o i387.o syscall.o vsyscall.o \
-   setup64.o bootflag.o e820.o reboot.o quirks.o i8237.o \
-   pci-dma.o pci-nommu.o alternative.o hpet.o tsc.o
+   setup64.o e820.o reboot.o \
+   pci-dma.o pci-nommu.o hpet.o tsc.o
 
-obj-$(CONFIG_STACKTRACE)   += stacktrace.o
-obj-$(CONFIG_X86_MCE)  += mce.o therm_throt.o
+obj-$(CONFIG_X86_MCE)  += mce.o
 obj-$(CONFIG_X86_MCE_INTEL)+= mce_intel.o
 obj-$(CONFIG_X86_MCE_AMD)  += mce_amd.o
-obj-$(CONFIG_MTRR) += ../../i386/kernel/cpu/mtrr/
 obj-$(CONFIG_ACPI) += acpi/
-obj-$(CONFIG_X86_MSR)  += msr.o
-obj-$(CONFIG_MICROCODE)+= microcode.o
-obj-$(CONFIG_X86_CPUID)+= cpuid.o
-obj-$(CONFIG_SMP)  += smp.o smpboot.o trampoline.o tsc_sync.o
+obj-$(CONFIG_SMP)  += smp.o smpboot.o trampoline.o
 obj-y  += apic.o  nmi.o
 obj-y  += io_apic.o mpparse.o \
genapic.o genapic_cluster.o genapic_flat.o
@@ -27,34 +22,15 @@ obj-$(CONFIG_KEXEC) += machine_kexec.o relocate_kernel.o crash.o
 obj-$(CONFIG_CRASH_DUMP)   += crash_dump.o
 obj-$(CONFIG_PM)   += suspend.o
 obj-$(CONFIG_SOFTWARE_SUSPEND) += suspend_asm.o
-obj-$(CONFIG_CPU_FREQ) += cpufreq/
-obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
 obj-$(CONFIG_IOMMU)+= pci-gart.o aperture.o

[PATCH 12/18] rm include pointer to x86_64 tsc_sync.c

2007-03-13 Thread Steven Rostedt
Remove the C file with just the include that points to the
x86_64 tsc_sync.c file.


Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Cc: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>
Cc: Chris Wright <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

diff --git a/arch/i386/kernel/tsc_sync.c b/arch/i386/kernel/tsc_sync.c
deleted file mode 100644
index 1242462..000
--- a/arch/i386/kernel/tsc_sync.c
+++ /dev/null
@@ -1 +0,0 @@
-#include "../../x86_64/kernel/tsc_sync.c"

--


[PATCH 17/18] create x86/oprofile/Makefile

2007-03-13 Thread Steven Rostedt
Create the Makefile in the common arch/x86/oprofile directory and adjust
the i386 and x86_64 code accordingly.


Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Cc: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>
Cc: Chris Wright <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

diff --git a/arch/x86_64/oprofile/Makefile b/arch/x86_64/oprofile/Makefile
deleted file mode 100644
index 6be3268..000
--- a/arch/x86_64/oprofile/Makefile
+++ /dev/null
@@ -1,19 +0,0 @@
-#
-# oprofile for x86-64.
-# Just reuse the one from i386. 
-#
-
-obj-$(CONFIG_OPROFILE) += oprofile.o
- 
-DRIVER_OBJS = $(addprefix ../../../drivers/oprofile/, \
-   oprof.o cpu_buffer.o buffer_sync.o \
-   event_buffer.o oprofile_files.o \
-   oprofilefs.o oprofile_stats.o \
-   timer_int.o )
-
-OPROFILE-y := init.o backtrace.o
-OPROFILE-$(CONFIG_X86_LOCAL_APIC) += nmi_int.o op_model_athlon.o op_model_p4.o \
-op_model_ppro.o
-OPROFILE-$(CONFIG_X86_IO_APIC)+= nmi_timer_int.o 
-
-oprofile-y = $(DRIVER_OBJS) $(addprefix ../../i386/oprofile/, $(OPROFILE-y))

--


[PATCH 13/18] create x86/lib/Makefile

2007-03-13 Thread Steven Rostedt
Create the Makefile in the common arch/x86/lib directory and adjust
the i386 and x86_64 code accordingly.

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Cc: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>
Cc: Chris Wright <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
new file mode 100644
index 000..d683d55
--- /dev/null
+++ b/arch/x86/lib/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_SMP)  += msr-on-cpu.o
diff --git a/arch/i386/lib/Makefile b/arch/i386/lib/Makefile
index 22d8ac5..304f754 100644
--- a/arch/i386/lib/Makefile
+++ b/arch/i386/lib/Makefile
@@ -8,4 +8,3 @@ lib-y = checksum.o delay.o usercopy.o getuser.o putuser.o memcpy.o strstr.o \
 
 lib-$(CONFIG_X86_USE_3DNOW) += mmx.o
 
-obj-$(CONFIG_SMP)  += msr-on-cpu.o
diff --git a/arch/x86_64/lib/Makefile b/arch/x86_64/lib/Makefile
index c943271..8d5f835 100644
--- a/arch/x86_64/lib/Makefile
+++ b/arch/x86_64/lib/Makefile
@@ -5,7 +5,6 @@
 CFLAGS_csum-partial.o := -funroll-loops
 
 obj-y := io.o iomap_copy.o
-obj-$(CONFIG_SMP)  += msr-on-cpu.o
 
 lib-y := csum-partial.o csum-copy.o csum-wrappers.o delay.o \
usercopy.o getuser.o putuser.o  \

--


[PATCH 04/18] make the cpu/cpufreq/Makefile

2007-03-13 Thread Steven Rostedt
Create the arch/x86/kernel/cpu/cpufreq/Makefile and update the
i386 and x86_64 accordingly.

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Cc: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>
Cc: Chris Wright <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

diff --git a/arch/x86/kernel/cpu/cpufreq/Makefile b/arch/x86/kernel/cpu/cpufreq/Makefile
new file mode 100644
index 000..51b32fe
--- /dev/null
+++ b/arch/x86/kernel/cpu/cpufreq/Makefile
@@ -0,0 +1,6 @@
+
+obj-$(CONFIG_X86_POWERNOW_K8) += powernow-k8.o
+obj-$(CONFIG_X86_ACPI_CPUFREQ) += acpi-cpufreq.o
+obj-$(CONFIG_X86_SPEEDSTEP_CENTRINO) += speedstep-centrino.o
+obj-$(CONFIG_X86_P4_CLOCKMOD) += p4-clockmod.o
+obj-$(CONFIG_X86_SPEEDSTEP_LIB) += speedstep-lib.o
diff --git a/arch/x86_64/kernel/cpufreq/Makefile b/arch/x86_64/kernel/cpufreq/Makefile
deleted file mode 100644
index 753ce1d..000
--- a/arch/x86_64/kernel/cpufreq/Makefile
+++ /dev/null
@@ -1,17 +0,0 @@
-#
-# Reuse the i386 cpufreq drivers
-#
-
-SRCDIR := ../../../i386/kernel/cpu/cpufreq
-
-obj-$(CONFIG_X86_POWERNOW_K8) += powernow-k8.o
-obj-$(CONFIG_X86_ACPI_CPUFREQ) += acpi-cpufreq.o
-obj-$(CONFIG_X86_SPEEDSTEP_CENTRINO) += speedstep-centrino.o
-obj-$(CONFIG_X86_P4_CLOCKMOD) += p4-clockmod.o
-obj-$(CONFIG_X86_SPEEDSTEP_LIB) += speedstep-lib.o
-
-powernow-k8-objs := ${SRCDIR}/powernow-k8.o
-speedstep-centrino-objs := ${SRCDIR}/speedstep-centrino.o
-acpi-cpufreq-objs := ${SRCDIR}/acpi-cpufreq.o
-p4-clockmod-objs := ${SRCDIR}/p4-clockmod.o
-speedstep-lib-objs := ${SRCDIR}/speedstep-lib.o
diff --git a/arch/i386/kernel/cpu/cpufreq/Makefile b/arch/i386/kernel/cpu/cpufreq/Makefile
index 560f776..49c4ca4 100644
--- a/arch/i386/kernel/cpu/cpufreq/Makefile
+++ b/arch/i386/kernel/cpu/cpufreq/Makefile
@@ -1,6 +1,6 @@
+# See also arch/x86/kernel/cpu/cpufreq/Makefile
 obj-$(CONFIG_X86_POWERNOW_K6)  += powernow-k6.o
 obj-$(CONFIG_X86_POWERNOW_K7)  += powernow-k7.o
-obj-$(CONFIG_X86_POWERNOW_K8)  += powernow-k8.o
 obj-$(CONFIG_X86_LONGHAUL) += longhaul.o
 obj-$(CONFIG_X86_E_POWERSAVER) += e_powersaver.o
 obj-$(CONFIG_ELAN_CPUFREQ) += elanfreq.o
@@ -8,9 +8,5 @@ obj-$(CONFIG_SC520_CPUFREQ) += sc520_freq.o
 obj-$(CONFIG_X86_LONGRUN)  += longrun.o  
 obj-$(CONFIG_X86_GX_SUSPMOD)   += gx-suspmod.o
 obj-$(CONFIG_X86_SPEEDSTEP_ICH)+= speedstep-ich.o
-obj-$(CONFIG_X86_SPEEDSTEP_LIB)+= speedstep-lib.o
 obj-$(CONFIG_X86_SPEEDSTEP_SMI)+= speedstep-smi.o
-obj-$(CONFIG_X86_ACPI_CPUFREQ) += acpi-cpufreq.o
-obj-$(CONFIG_X86_SPEEDSTEP_CENTRINO)   += speedstep-centrino.o
-obj-$(CONFIG_X86_P4_CLOCKMOD)  += p4-clockmod.o
 obj-$(CONFIG_X86_CPUFREQ_NFORCE2)  += cpufreq-nforce2.o

--


[PATCH 02/18] x86 Makefile changes

2007-03-13 Thread Steven Rostedt
Create the arch/x86/Makefile and modify the i386 and x86_64
Makefiles accordingly.


Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Cc: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>
Cc: Chris Wright <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

diff --git a/arch/x86/Makefile b/arch/x86/Makefile
new file mode 100644
index 000..97407b2
--- /dev/null
+++ b/arch/x86/Makefile
@@ -0,0 +1,3 @@
+
+
+drivers-$(CONFIG_OPROFILE) += arch/x86/oprofile/
diff --git a/arch/i386/Makefile b/arch/i386/Makefile
index bd28f9f..78e59ef 100644
--- a/arch/i386/Makefile
+++ b/arch/i386/Makefile
@@ -98,15 +98,17 @@ mflags-y += -Iinclude/asm-i386/mach-default
 
 head-y := arch/i386/kernel/head.o arch/i386/kernel/init_task.o
 
-libs-y += arch/i386/lib/
+libs-y += arch/i386/lib/ \
+  arch/x86/lib/
 core-y += arch/i386/kernel/ \
+  arch/x86/kernel/ \
   arch/i386/mm/ \
+  arch/x86/mm/ \
   arch/i386/$(mcore-y)/ \
   arch/i386/crypto/
 drivers-$(CONFIG_MATH_EMULATION)   += arch/i386/math-emu/
 drivers-$(CONFIG_PCI)  += arch/i386/pci/
 # must be linked after kernel/
-drivers-$(CONFIG_OPROFILE) += arch/i386/oprofile/
 drivers-$(CONFIG_PM)   += arch/i386/power/
 
 CFLAGS += $(mflags-y)
diff --git a/arch/x86_64/Makefile b/arch/x86_64/Makefile
index 2941a91..150942b 100644
--- a/arch/x86_64/Makefile
+++ b/arch/x86_64/Makefile
@@ -79,11 +79,15 @@ head-y := arch/x86_64/kernel/head.o arch/x86_64/kernel/head64.o arch/x86_64/kern
 
 libs-y += arch/x86_64/lib/
 core-y += arch/x86_64/kernel/ \
+  arch/x86/kernel/ \
   arch/x86_64/mm/ \
-  arch/x86_64/crypto/
+  arch/x86/mm/ \
+  arch/x86_64/crypto/ \
+  arch/x86/lib/
 core-$(CONFIG_IA32_EMULATION)  += arch/x86_64/ia32/
 drivers-$(CONFIG_PCI)  += arch/x86_64/pci/
-drivers-$(CONFIG_OPROFILE) += arch/x86_64/oprofile/
+
+include arch/x86/Makefile
 
 boot := arch/x86_64/boot
 

--


Re: New thread RDSL, post-2.6.20 kernels and amanda (tar) miss-fires

2007-03-13 Thread William Lee Irwin III
On Tue, Mar 13, 2007 at 11:31:53PM -0400, Gene Heskett wrote:
> Now, can someone suggest a patch I can revert that might fix this?  The 
> total number of patches between 2.6.20 and 2.6.21-rc1 will have me 
> building kernels to bisect this till the middle of June at this rate.

4 billion patches could be bisected in 34 boots. Between 2.6.20 and
2.6.21-rc1 there are only:

$ git rev-list --no-merges v2.6.20..v2.6.21-rc1  |wc -l
3118

patches, requiring 14 boots. In general ceil(log(n)/log(2))+2 boots.

Of course, this is a little optimistic because it assumes no additional
breakage occurring at the various bisection points. In any event,
assuming (pessimistically) 10 minutes per build, this is 280 minutes or
4 hours and 40 minutes of build time. I estimate the process should
complete well before Friday of this week, never mind June.
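
The boot counts above can be checked with a short sketch. The function name and the `slack` parameter are illustrative assumptions, not part of git; the "+2" is simply the extra term from the formula in this message.

```python
def bisect_boots(n_patches: int, slack: int = 2) -> int:
    """Estimate test boots for a bisection over n_patches commits.

    Uses ceil(log2(n)) + slack, the formula quoted in the message.
    (n - 1).bit_length() equals ceil(log2(n)) for n >= 1 and avoids
    floating-point rounding.
    """
    if n_patches < 1:
        raise ValueError("need at least one patch")
    return (n_patches - 1).bit_length() + slack


if __name__ == "__main__":
    # 3118 is the rev-list count between v2.6.20 and v2.6.21-rc1 quoted above.
    print(bisect_boots(3118))
    # Roughly four billion patches.
    print(bisect_boots(4_000_000_000))
```

In practice `git bisect start`, `git bisect bad` and `git bisect good` drive the same halving process and report how many steps remain.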


-- wli


Ooops with suspend to RAM

2007-03-13 Thread Ismail Dönmez
Hi all,

With latest GIT tree I am getting the following oops when I try to suspend to 
RAM:

BUG: unable to handle kernel NULL pointer dereference at virtual address 
0094
 printing eip:
c0222af4
*pde = 
Oops:  [#1]
PREEMPT
Modules linked in: i915 drm snd_pcm_oss snd_mixer_oss snd_seq_dummy 
snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device usbhid eth1394 ipw2200 
ieee80211 ieee80211_crypt snd_hda_intel snd_hda_codec snd_pcm snd_timer snd 
snd_page_alloc tifm_7xx1 tifm_core i2c_i801 i2c_core ehci_hcd uhci_hcd 
ohci1394 ieee1394 pcmcia usbcore yenta_socket rsrc_nonstatic pcmcia_core 
sony_laptop backlight
CPU:0
EIP:0060:[]Not tainted VLI
EFLAGS: 00010246   (2.6.21-rc3 #12)
EIP is at class_device_remove_attrs+0xa/0x30
eax: f7cb5b18   ebx:    ecx: f8bde010   edx: 
esi:    edi: f7cb5b18   ebp:    esp: d93e7e1c
ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
Process modprobe (pid: 12200, ti=d93e6000 task=e5770a50 task.ti=d93e6000)
Stack: f7cb5b18 f7cb5b20  c0222bc3 f7cb5990  f7cb5b18 f7cb59c4
   f8bcdc0f  c0222bfb f7cb5990 f8bcdbf6 f8bd3275 04e2c100 000f
   03c3 f8dcf05f  f7e3e000  f8bcdc17 c0220567 f7e3e0a4
Call Trace:
 [] class_device_del+0xa9/0xd9
 [] __nodemgr_remove_host_dev+0x0/0xb [ieee1394]
 [] class_device_unregister+0x8/0x10
 [] nodemgr_remove_ne+0x61/0x7a [ieee1394]
 [] ether1394_mac_addr+0x0/0x12 [eth1394]
 [] __nodemgr_remove_host_dev+0x8/0xb [ieee1394]
 [] device_for_each_child+0x1a/0x3c
 [] nodemgr_remove_host+0x30/0x90 [ieee1394]
 [] __unregister_host+0x1a/0xac [ieee1394]
 [] flush_cpu_workqueue+0x98/0xb7
 [] highlevel_remove_host+0x21/0x42 [ieee1394]
 [] hpsb_remove_host+0x37/0x58 [ieee1394]
 [] ohci1394_pci_remove+0x47/0x1ec [ohci1394]
 [] sysfs_hash_and_remove+0xfa/0x111
 [] pci_device_remove+0x16/0x35
 [] __device_release_driver+0x6e/0x8b
 [] driver_detach+0x99/0xda
 [] bus_remove_driver+0x57/0x75
 [] driver_unregister+0x8/0x13
 [] pci_unregister_driver+0xc/0x67
 [] sys_delete_module+0x15c/0x19d
 [] remove_vma+0x31/0x36
 [] do_munmap+0x19b/0x1b4
 [] sysenter_past_esp+0x5f/0x85
 [] packet_notifier+0xf3/0x157
 ===
Code: ff c3 85 c0 74 08 83 c0 08 e9 83 6d f6 ff b8 ea ff ff ff c3 85 c0 74 08 
83 c0 08 e9 4c 51 f6 ff c3 57 89 c7 56 53 8b 70 44 31 db <83> be 94 00 00 00 
00 75 09 eb 17 89 f8 e8 d7 ff ff ff 89 da 83
EIP: [] class_device_remove_attrs+0xa/0x30 SS:ESP 0068:d93e7e1c


Checking Google I see a similar oops was reported long ago: 
http://lkml.org/lkml/2006/11/16/147 .

Any ideas/patches to test? Please CC me in your replies.

Thanks.

-- 
Happiness in intelligent people is the rarest thing I know. (Ernest Hemingway)

Ismail Donmez ismail (at) pardus.org.tr
GPG Fingerprint: 7ACD 5836 7827 5598 D721 DF0D 1A9D 257A 5B88 F54C
Pardus Linux / KDE developer


Re: [PATCH 0/8] x86 boot, pda and gdt cleanups

2007-03-13 Thread Jeremy Fitzhardinge
Rusty Russell wrote:
> This is called "pissing in the corners".  Don't do it: we don't need to
> touch that code and I actually prefer the original anyway (explicit is
> *good*).  
>
> The habit of extracting cpu number once then using it is an optimization
> which we should be aiming to get rid of (it simply hurts archs with
> efficient per-cpu implementations).

No, that was for a reason.  I was worried about smp_processor_id() not
returning valid values between init_gdt and cpu_set_gdt.  It's not
actually a problem, but relying on smp_processor_id() while we're moving
the foundations it's based on seems fragile.

J


Re: Stolen and degraded time and schedulers

2007-03-13 Thread Jeremy Fitzhardinge
Dan Hecht wrote:
> With your previous definition of work time, would it be that:
>
> monotonic_time == work_time + stolen_time ??

(By monotonic time, I presume you mean monotonic real time.)  Yes, I
suppose you could, but I don't think that's terribly useful.   I think
work_time is probably most naturally measured in cpu clock cycles rather
than an actual time unit.  You could convert it to ns, but I don't see
the point.

I know it's a term in general use, but I don't think the term "stolen
time" is all that useful, particularly when we're talking about a more
general notion of cpu work contributing to the progress of process
execution.  In the cpufreq case, time isn't "stolen" per se.

(I guess I don't like the term stolen time because you don't refer to
time spent on other processes as being stolen from your process: it's
just processor time being distributed.)

> i.e. would you be defining stolen_time to include the time lost to
> processes due to the cpu running at a lower frequency?  How does this
> play into the other potential users, besides sched_clock(), of stolen
> time?  We should make sure that the abstraction introduced here makes
> sense in those places too.

Be specific.  What other uses are there?

> For example, the stuff that happens in update_process_times().  I
> think we'd want to account the stolen time to cpustat->steal.

I guess we could do something for that.  Would we account non-full-speed
cpus to it?  Maybe?

How is cpustat->steal used?  How does it get out to usermode?


>   Also we'd probably want account for stolen time with regards to
> task_running_tick().  (Though, in the latter case, maybe we first have
> to move the scheduler away from assuming HZ rate decrementing of
> p->time_slice to get this right. i.e. remove the tick based assumption
> from the scheduler, and then maybe stolen time falls in more naturally
> when accounting time slices).

I think the important part is that sched_clock() be used to actually
compute how much time each process gets.  The fact that a time quantum
gets stolen is less important.  Or do you mean something else?

> I guess taking your cpufreq as an example of work_time progressing
> slower than monotonic_time (and assuming that the remaining time is
> what you would call stolen), then e.g. top would report 50% of your
> cpu stolen when your cpu is running at 1/2 max rate.

Yes.  In the same way that clock modulation gates the cpu clock, the
hypervisor effectively gates the clock by giving time to other vcpus.

> And p->time_slice would decrement at 1/2 the rate it normally did when
> running at 1/2 speed.  Is this the right thing to do?  If so, then I
> agree it makes sense to model hypervisor stolen time in terms of your
> "work time".

Yes, that's my thought.
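
As a rough illustration of the model being discussed, the sketch below splits a wall-clock interval into work time and stolen time so that the two always sum to the monotonic interval. Everything here is an assumption for illustration: the function name, the use of nanoseconds for work time (the message suggests cycles are the more natural unit), and the idea of folding both frequency scaling and hypervisor preemption into one "stolen" figure.

```python
def split_interval(wall_ns: int, freq_ratio: float = 1.0,
                   hypervisor_stolen_ns: int = 0) -> tuple[int, int]:
    """Return (work_ns, stolen_ns) with work_ns + stolen_ns == wall_ns.

    freq_ratio is current_freq / max_freq; hypervisor_stolen_ns is time
    the vcpu was not running at all.
    """
    if not 0.0 <= freq_ratio <= 1.0:
        raise ValueError("freq_ratio must be within [0, 1]")
    if not 0 <= hypervisor_stolen_ns <= wall_ns:
        raise ValueError("stolen time must be within the interval")
    running_ns = wall_ns - hypervisor_stolen_ns
    # Work done scales with how fast the cpu ran while it was running.
    work_ns = int(running_ns * freq_ratio)
    return work_ns, wall_ns - work_ns


if __name__ == "__main__":
    # A cpu at half its maximum frequency: half of the interval counts as stolen.
    print(split_interval(1_000_000, freq_ratio=0.5))
    # Full speed, but a quarter of the interval went to other vcpus.
    print(split_interval(1_000_000, hypervisor_stolen_ns=250_000))
```

Under this model a time slice charged in work time would drain at half the rate on a half-speed cpu, which is the behaviour asked about in the quoted text.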

>   But, if not, then maybe the amount of work you can get done during a
> period of time that is not stolen and the stolen time itself are
> really two different notions, and shouldn't be confused.  I can see
> arguments both ways. 

It seems to me like a nice opportunity to solve two problems with one
mechanism.

J


Re: 2.6.21-rc3-mm2 (oops in move_freepages)

2007-03-13 Thread Bjorn Helgaas
FYI, I'm seeing the following oops with 2.6.21-rc3-mm1 (and -mm2)
on the HP rx2600 and an Intel Tiger (both ia64 boxes).

I haven't investigated this other than to determine that it
does not occur with 2.6.21-rc3 or 2.6.20-rc3-mm1, and the
instruction at move_freepages+0x10 is a load of the value
pointed to by the third argument (end_page).


Linux version 2.6.21-rc3-mm1 ([EMAIL PROTECTED]) (gcc version 4.0.3 (Debian 
4.0.3-1)) #2 SMP Tue Mar 13 16:16:22 MST 2007
...
mptbase: Initiating ioc0 bringup
ioc0: 53C1030: Capabilities={Initiator,Target}
scsi0 : ioc0: LSI53C1030, FwRev=01032300h, Ports=1, MaxQ=255, IRQ=53
scsi 0:0:0:0: Direct-Access HP 36.4G ST336706LC   HP04 PQ: 0 ANSI: 2
 target0:0:0: Beginning Domain Validation
 target0:0:0: Ending Domain Validation
 target0:0:0: FAST-80 WIDE SCSI 160.0 MB/s DT (12.5 ns, offset 63)
Unable to handle kernel paging request at virtual address a0007fffc758
swapper[1]: Oops 8813272891392 [1]
Modules linked in:

Pid: 1, CPU 1, comm:  swapper
psr : 1010085a2010 ifs : 840b ip  : []Not 
tainted
ip is at move_freepages+0x10/0x340
unat:  pfs : 030a rsc : 0003
rnat:  bsps:  pr  : a581
ldrs:  ccv :  fpsr: 0009804c8a74433f
csd :  ssd : 
b0  : a001000fea10 b6  : a001005ad980 b7  : a001bb20
f6  : 1003e f7  : 1003ed37a6f4de9bd37a7
f8  : 1003e f9  : 1003e000194a0cb8b
f10 : 1003e70658ddf530a940d f11 : 1003e
r1  : a00100dfe280 r2  :  r3  : 
r8  :  r9  : 4000 r10 : 
r11 : 0002 r12 : e0406004fc10 r13 : e04060048000
r14 : 0001 r15 : 0001c000 r16 : 0004
r17 : 0400 r18 : a0007fffc720 r19 : 4000
r20 : 4000 r21 : 2fff4000 r22 : d000
r23 : 5fff4000 r24 : 2fffa000 r25 : 6318
r26 : 0318c000 r27 : 00c5c000 r28 : 000c4000
r29 : 0008 r30 : 0003fc00 r31 : 000e

Call Trace:
 [] show_stack+0x40/0xa0
sp=e0406004f7c0 bsp=e040600496f8
 [] show_regs+0x880/0x8a0
sp=e0406004f990 bsp=e040600496a0
 [] die+0x1c0/0x2c0
sp=e0406004f990 bsp=e04060049658
 [] ia64_do_page_fault+0x820/0x9c0
sp=e0406004f9b0 bsp=e04060049608
 [] ia64_leave_kernel+0x0/0x270
sp=e0406004fa40 bsp=e04060049608
 [] move_freepages+0x10/0x340
sp=e0406004fc10 bsp=e040600495a8
 [] move_freepages_block+0x110/0x140
sp=e0406004fc10 bsp=e04060049578
 [] __rmqueue+0x4e0/0x7e0
sp=e0406004fc10 bsp=e04060049518
 [] rmqueue_bulk+0x50/0x120
sp=e0406004fc10 bsp=e040600494d0
 [] get_page_from_freelist+0x460/0xd40
sp=e0406004fc10 bsp=e04060049420
 [] __alloc_pages+0xa0/0x580
sp=e0406004fc10 bsp=e040600493a8
 [] kmem_getpages+0x150/0x3a0
sp=e0406004fc20 bsp=e04060049370
 [] cache_grow+0x1e0/0x640
sp=e0406004fc30 bsp=e04060049308
 [] cache_alloc_refill+0x490/0x580
sp=e0406004fc30 bsp=e040600492a0
 [] kmem_cache_alloc+0x120/0x1e0
sp=e0406004fc30 bsp=e04060049270
 [] sd_revalidate_disk+0x90/0x1c20
sp=e0406004fc30 bsp=e040600491f0
 [] sd_probe+0x6c0/0x7c0
sp=e0406004fc70 bsp=e04060049198
 [] driver_probe_device+0x230/0x360
sp=e0406004fc80 bsp=e04060049160
 [] __device_attach+0x30/0x60
sp=e0406004fc80 bsp=e04060049138
 [] bus_for_each_drv+0x80/0x120
sp=e0406004fc80 bsp=e04060049100
 [] device_attach+0x190/0x200
sp=e0406004fca0 bsp=e040600490c8
 [] bus_attach_device+0x80/0x160
sp=e0406004fca0 bsp=e04060049090
 [] device_add+0x940/0xf60
sp=e0406004fca0 bsp=e04060049028
 [] scsi_sysfs_add_sdev+0x60/0x520
sp=e0406004fca0 bsp=e04060048fd8
 [] scsi_probe_and_add_lun+0x1000/0x1200
sp=e0406004fca0 bsp=e04060048f68
 [] __scsi_scan_target+0x150/0xae0
sp=e0406004fcd0 bsp=e04060048f10
 [] scsi_scan_channel+0x60/0xe0
  

Re: [RFC][PATCH 4/7] RSS accounting hooks over the code

2007-03-13 Thread Nick Piggin

Eric W. Biederman wrote:
> Nick Piggin <[EMAIL PROTECTED]> writes:
>
>> Eric W. Biederman wrote:
>>
>>> First touch page ownership does not give me anything useful
>>> for knowing if I can run my application or not.  Because of page
>>> sharing my application might run inside the rss limit only because
>>> I got lucky and happened to share a lot of pages with another running
>>> application.  If the next time I run, it isn't running, my application
>>> will fail.  That is ridiculous.
>>
>> Let's be practical here, what you're asking is basically impossible.
>>
>> Unless by deterministic you mean that it never enters a non-trivial
>> syscall, in which case you just want to know about the maximum
>> RSS of the process (which we already account).
>
> Not per process, I want this on a group of processes, and yes, that
> is all I want.  I just want accounting of the maximum RSS of
> a group of processes and then the mechanism to limit that maximum rss.

Well don't you just sum up the maximum for each process?

Or do you want to only count shared pages inside a container once,
or something difficult like that?

>>> I don't want sharing between vservers/VE/containers to affect how many
>>> pages I can have mapped into my processes at once.
>>
>> You seem to want total isolation. You could use virtualization?
>
> No.  I don't want the meaning of my rss limit to be affected by what
> other processes are doing.  We have constraints of how many resources
> the box actually has.  But I don't want accounting so sloppy that
> processes outside my group of processes can artificially
> lower my rss value, which magically raises my rss limit.

So what are you going to do about all the shared caches and slabs
inside the kernel?

>> It is basically handwaving anyway. The only approach I've seen with
>> a sane (not perfect, but good) way of accounting memory use is this
>> one. If you care to define "proper", then we could discuss that.
>
> I will agree that this patchset is probably in the right general ballpark.
> But the fact that pages are assigned exactly one owner is pure nonsense.
> We can do better.  That is all I am asking for: someone to at least attempt
> to actually account for the rss of a group of processes and get the numbers
> right when we have shared pages between different groups of
> processes.  We have the data structures to support this with rmap.

Well rmap only supports mapped, userspace pages.

> Let me describe the situation where I think the accounting in the
> patchset goes totally wonky.
>
> Gcc as I recall maps the pages it is compiling with mmap.
> If in a single kernel tree I do:
> make -jN O=../compile1 &
> make -jN O=../compile2 &
>
> But set it up so that the two compiles are in different rss groups.
> If I run them concurrently they will use the same files at the same
> time and most likely, because of the first touch rss limit rule, even
> if I have a draconian rss limit the compiles will both be able to
> complete and finish.  However if I run either of them alone, and I
> use the most draconian rss limit I can that allows both compiles to
> finish, I won't be able to compile a single kernel tree.

Yeah it is not perfect. Fortunately, there is no perfect solution,
so we don't have to be too upset about that.

And strangely, this example does not go outside the parameters of
what you asked for AFAICS. In the worst case of one container getting
_all_ the shared pages, they will still remain inside their maximum
rss limit.

So they might get penalised a bit on reclaim, but maximum rss limits
will work fine, and you can (almost) guarantee X amount of memory for
a given container, and it will _work_.

But I also take back my comments about this being the only design I
have seen that gets everything, because the node-per-container idea
is a really good one on the surface. And it could mean even less impact
on the core VM than this patch. That is also a first-touch scheme.

> However the messed up accounting that doesn't handle sharing between
> groups of processes properly really bugs me.  Especially when we have
> the infrastructure to do it right.
>
> Does that make more sense?

I think it is simplistic.

Sure you could probably use some of the rmap stuff to account shared
mapped _user_ pages once for each container that touches them. And
this patchset isn't preventing that.

But how do you account kernel allocations? How do you account unmapped
pagecache?

What's the big deal so many accounting people have with just RSS? I'm
not a container person, this is an honest question. Because from my
POV if you conveniently ignore everything else... you may as well just
not do any accounting at all.
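
The two accounting policies argued over in this thread can be contrasted with a toy model. This is only a sketch under stated assumptions: pages are plain strings, "touches" is a made-up trace, and neither function reflects the actual patchset's data structures. It shows why, in the two-compile example, first-touch charging lets the second group appear much smaller than it is when run alone.

```python
from collections import Counter


def first_touch_rss(touches):
    """Charge each page to the first group that touched it."""
    owner = {}
    for group, page in touches:
        owner.setdefault(page, group)  # later touches do not change the owner
    return dict(Counter(owner.values()))


def per_group_rss(touches):
    """Charge a page once to every group that has touched it."""
    mapped = {}
    for group, page in touches:
        mapped.setdefault(group, set()).add(page)
    return {group: len(pages) for group, pages in mapped.items()}


if __name__ == "__main__":
    # Hypothetical trace: both compiles map the same four source pages,
    # and each has one private output page.
    shared = [f"src{i}" for i in range(4)]
    touches = ([("compile1", p) for p in shared]
               + [("compile2", p) for p in shared]
               + [("compile1", "obj1"), ("compile2", "obj2")])
    print(first_touch_rss(touches))
    print(per_group_rss(touches))
```

With first touch, compile2 is charged only for its private page while compile1 carries all the shared ones; charging per group gives both the same figure regardless of who ran first. Neither function says anything about kernel allocations or unmapped pagecache, which is the objection raised at the end of the message.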

--
SUSE Labs, Novell Inc.




Re: New thread RDSL, post-2.6.20 kernels and amanda (tar) miss-fires

2007-03-13 Thread Gene Heskett
On Tuesday 13 March 2007, Gene Heskett wrote:
>On Tuesday 13 March 2007, Gene Heskett wrote:
>>Greetings;
>>Someone suggested a fresh thread for this.
>>
>>I now have my scripts more or less under control, and I can report that
>>kernel-2.6.20.1 with no other patches does not exhibit the undesirable
>>behaviour where tar thinks it's all new, even when told to do a level 2
>> on a directory tree that hasn't been touched in months to update
>> anything.
>>
>>Next up, 2.6.20.2, plain and with the latest RDSL-0.30 patch.
>
>And amanda/tar worked normally for 2.6.20.2 plain.
>
>Next up, 2.6.21-rc1 if it will build here.

It built, it booted, and it's busted big time.  First, with an amdump 
running in the background, the machine is so close to unusable that I 
considered rebooting, but I needed the data to show the problem.  I am 
losing the keyboard and mouse for a minute or more at a time but the 
keystrokes seem to be being registered so it eventually catches up.

Disk i/o seems to be the killer according to gkrellm.

But to give one an idea of the fits this is giving tar, I'll snip a line 
or 2 from an amstatus report here:
coyote:/GenesAmandaHelper-0.6 1 planner: [dumps way too big, 138200 KB, 
must skip incremental dumps]

Huh?  138.2GB?  A 'du -h .' in that dir says 766megs.

coyote:/root  1 4426m wait for dumping
du -h says 5.0GB so that's ballpark, but it's also a level 1, so maybe 20 
megs is actually new since 15:57 this afternoon local.  kmail's final 
maildir is in that dir.

This goes on for much of the amstatus report, very few of the reported 
sizes are close to sane.

Now, can someone suggest a patch I can revert that might fix this?  The 
total number of patches between 2.6.20 and 2.6.21-rc1 will have me 
building kernels to bisect this till the middle of June at this rate.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Is a tattoo real, like a curb or a battleship?  Or are we suffering in 
Safeway?


Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2

2007-03-13 Thread Gabriel C

[EMAIL PROTECTED] wrote:
> On Mon, 12 Mar 2007 17:38:38 BST, Kasper Sandberg said:
>> with latest xorg, xlib will be using xcb internally,
>
> Out of curiosity, when is this "latest" Xorg going to escape to distros,

It already has: Xorg 7.2+ libx11 is built with xcb enabled.

> and is it far enough along that beta testers can gather usable numbers?



Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)

2007-03-13 Thread Roland McGrath
> Yes, the code could be reworked by moving some of the data from the CPU
> hw-breakpoint info into the thread's info.  I'll see how much simpler it
> ends up being.

I don't quite understand that characterization of the kind of change I'm
advocating.  If the common case path in context switch has really anything
at all more than the example I gave, something is wrong.

> It isn't quite that easy.  Even though the number of user breakpoints may
> not have changed, their identities may have.  So the unlikely case has to
> encompass two possibilities: the number of installable user breakpoints
> has changed, or any user breakpoints have been registered or unregistered.

Why does it matter?  When a new user breakpoint was made the
highest-priority one, it ought to update tdr[0..3] right then before the
registration call returns.  It seems fine to me for it to make an
uninstalled callback right away rather than at the thread's next switch-in.
But even if you wanted to delay it, you could just set active_dr7 to zero
or something so that the unlikely case triggers.

> > For the masks to work as I described, you need to use the same enable bit
> > (or both) for kernel and user allocations.  It really doesn't matter which
> > one you use, since all of Linux is "local" for the sense of the dr7 enable
> > bits (i.e. you should just use DR_GLOBAL_ENABLE).
> 
> This shouldn't be necessary.  So long as DR_GLOBAL_ENABLE always belongs
> to the kernel's part of DR7 and DR_LOCAL_ENABLE always belongs to the
> thread's part there will be no interference between them.

The plan I suggested relies on setting want_dr7 with the enable bits that
do include the ones the kernel uses (for contested slots).  Of course it
works as well to use either bit for this, as long as you're consistent.
But as I've said at least twice already, there is no actual meaning
whatsoever to choosing one enable bit over the other.  It's just confusing
and misleading to have the code make special efforts to set one rather than
the other for different cases.  You talk about them as if they meant
something, which keeps making me wonder if you're confused.  Since the
hardware doesn't care which bit you set, you could overload them to record
a bit and a half of information there if you really wanted to, but you're not
even doing that, unless I'm confused.

> Maybe.  I always had in the back of my mind the possibility that there
> might be a user I/O breakpoint set.  It could be triggered by an interrupt
> handler even in the SIGKILL case.  But since we're not supporting I/O
> breakpoints now, that's a moot point.

How would that happen?  This would mean that some user process has been
allowed to enable ioperm for some io port that kernel drivers also send to
from interrupt handlers.  Can that ever happen?

> Actually the code _doesn't_ already know what's there; the chbi area
> doesn't include any storage for the kernel DR7 value.  I figured it was at
> least as easy to read it from the CPU register as to read it from memory.  
> But maybe that's not true; according to my ancient processor manual, moves
> to/from debug registers take many more clock cycles than moves to/from
> memory.

The purpose of the chbi area is to optimize this path.  Make it store
whatever precomputed values are most convenient for the hot paths.  This
path doesn't need num_kbps, it needs kdr7.  So precompute that and do that
one load, instead of a load of chbi->num_kbps we don't otherwise need plus
a load from kdr7_masks that can be avoided altogether on hot paths.

I don't really know about the slowness of reading debug registers, though I
would guess it is slower than most common operations.  But regardless, you
can avoid it because kdr7 is something you need anyway, so you're not
replacing it with a load but letting a load you already had kill two birds.
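
The precomputation described above can be sketched in userspace as follows. This is only an illustration under stated assumptions: the struct layout, the slot numbering, and the helper names (kbp_recompute, switch_in_dr7) are hypothetical and are not taken from the patch under discussion. Only the DR7 bit positions (Gn enable bits in the low byte, 4-bit RW/LEN fields starting at bit 16) follow the x86 register layout.

```c
#include <assert.h>
#include <stdint.h>

#define HB_NUM 4  /* x86 has four address breakpoint registers */

/* Hypothetical stand-in for the per-CPU "chbi" area. */
struct cpu_bp_info {
	unsigned int num_kbps;      /* kernel breakpoints installed */
	uint8_t      kctrl[HB_NUM]; /* 4-bit RW/LEN field per kernel slot */
	uint32_t     kdr7;          /* precomputed kernel part of DR7 */
};

/* Slow path: run only when a kernel breakpoint is registered or removed. */
static void kbp_recompute(struct cpu_bp_info *c)
{
	uint32_t dr7 = 0;
	unsigned int i;

	assert(c->num_kbps <= HB_NUM);
	for (i = 0; i < c->num_kbps; i++) {
		dr7 |= 1u << (2 * i + 1);                             /* Gn enable bit */
		dr7 |= (uint32_t)(c->kctrl[i] & 0xf) << (16 + 4 * i); /* RWn/LENn */
	}
	c->kdr7 = dr7;
}

/* Hot path: a single load of kdr7, no mask table and no debug-register read. */
static uint32_t switch_in_dr7(const struct cpu_bp_info *c, uint32_t thread_dr7)
{
	return c->kdr7 | thread_dr7;
}
```

The point of the split is that kbp_recompute() runs rarely, while switch_in_dr7() is what a context switch would pay for.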

> No.  If a debugger has removed some user breakpoints since the last time
> the thread ran, the chbi->bps[] entries could still be present.  Likewise
> if the previously-running task had more breakpoints than the current one.

I don't really get why user breakpoints would be in chbi->bps at all.
When a debug trap hits, you can check kdr7 or whatnot to see if it was a
kernel allocation, and otherwise look in current->thbi->bps to find it.

> I don't like using DR_LEN_1, because it would force asm/debugreg.h to be 
> #included by any user of hw_breakpoint.  The raw numerical value should do 
> just as well.

Agreed.  (I just used DR_LEN_1 as shorthand and was not hot on including
asm/debugreg.h in asm/hw_breakpoint.h in the actual version.)

> > On powerpc, the address breakpoint is always for an 8-byte address range.
> 
> So there's no way to trap on accesses to a particular byte within a
> string?

There's no way to tell which of the 8 bytes were accessed, AFAIK.  It's the
same as LEN8 on x86_64 or LEN[42] on i386: some byte in there was accessed.

> Better yet, if type is HW_BREAKPOINT_TYPE_EXECUTE then just ignore the
> caller's

Re: [RFC] [Patch 1/1] IBAC Patch

2007-03-13 Thread Seth Arnold
On Thu, Mar 08, 2007 at 05:58:16PM -0500, Mimi Zohar wrote:
> This is a request for comments for a new Integrity Based Access
> Control(IBAC) LSM module which bases access control decisions
> on the new integrity framework services. 

Thanks Mimi, nice to see an example of how the integrity framework ought
to be used.

> (Hopefully this will help clarify the interaction between an LSM 
> module and LIM module.)

Is this module intended to clarify an interface, or be useful in and of
itself?

> Index: linux-2.6.21-rc3-mm2/security/ibac/Makefile
> ===
> --- /dev/null
> +++ linux-2.6.21-rc3-mm2/security/ibac/Makefile
> @@ -0,0 +1,6 @@
> +#
> +# Makefile for building IBAC
> +#
> +
> +obj-$(CONFIG_SECURITY_IBAC) += ibac.o
> +ibac-y   := ibac_main.o
> Index: linux-2.6.21-rc3-mm2/security/ibac/ibac_main.c
> ===
> --- /dev/null
> +++ linux-2.6.21-rc3-mm2/security/ibac/ibac_main.c
> @@ -0,0 +1,126 @@
> +/*
> + * Integrity Based Access Control (IBAC)
> + *
> + * Copyright (C) 2007 IBM Corporation
> + * Author: Mimi Zohar <[EMAIL PROTECTED]>
> + *
> + *  This program is free software; you can redistribute it and/or modify
> + *  it under the terms of the GNU General Public License as published by
> + *  the Free Software Foundation, version 2 of the License.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#ifdef CONFIG_SECURITY_IBAC_BOOTPARAM
> +int ibac_enabled = CONFIG_SECURITY_IBAC_BOOTPARAM_VALUE;
> +
> +static int __init ibac_enabled_setup(char *str)
> +{
> + ibac_enabled = simple_strtol(str, NULL, 0);
> + return 1;
> +}
> +
> +__setup("ibac=", ibac_enabled_setup);
> +#else
> +int ibac_enabled = 0;
> +#endif

If the command line option isn't enabled, how will ibac_enabled ever be
set to '1'? Have I overlooked or forgotten some helper routine elsewhere?

> +static unsigned int integrity_enforce = 0;
> +static int __init integrity_enforce_setup(char *str)
> +{
> + integrity_enforce = simple_strtol(str, NULL, 0);
> + return 1;
> +}
> +
> +__setup("ibac_enforce=", integrity_enforce_setup);
> +
> +#define XATTR_NAME "security.evm.hash"

Is this name unique to this IBAC module? Or should it be kept in sync
with the integrity framework?

> +static inline int is_kernel_thread(struct task_struct *tsk)
> +{
> + return (!tsk->mm) ? 1 : 0;
> +}
> +
> +static int ibac_bprm_check_security(struct linux_binprm *bprm)
> +{
> + struct dentry *dentry = bprm->file->f_dentry;
> + int xattr_len;
> + char *xattr_value = NULL;
> + int rc, status;
> +
> + rc = integrity_verify_metadata(dentry, XATTR_NAME,
> +&xattr_value, &xattr_len, &status);
> + if (rc < 0 && rc == -EOPNOTSUPP) {
> + kfree(xattr_value);
> + return 0;
> + }
> +
> + if (rc < 0) {
> + printk(KERN_INFO "verify_metadata %s failed "
> +"(rc: %d - status: %d)\n", bprm->filename, rc, status);
> + if (!integrity_enforce)
> + rc = 0;
> + goto out;
> + }
> + if (status != INTEGRITY_PASS) { /* FAIL | NO_LABEL */
> + if (!is_kernel_thread(current)) {

Please remind me why kernel threads are exempt?

> + printk(KERN_INFO "verify_metadata %s "
> +"(Integrity status: FAIL)\n", bprm->filename);

Integrity status may be FAIL or NO_LABEL at this point -- would it be
more useful to report the whole truth?
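
One way to report the whole truth is to map the status to a name before logging. The sketch below is a userspace illustration only: the enum values and the helpers integrity_status_name and format_verify_msg are hypothetical, and the real status constants are whatever the integrity framework defines.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical values; the real constants belong to the integrity framework. */
enum integrity_status { INTEGRITY_PASS, INTEGRITY_FAIL, INTEGRITY_NOLABEL };

/* Translate a status code into the string that should appear in the log. */
static const char *integrity_status_name(int status)
{
	switch (status) {
	case INTEGRITY_PASS:    return "PASS";
	case INTEGRITY_FAIL:    return "FAIL";
	case INTEGRITY_NOLABEL: return "NO_LABEL";
	default:                return "UNKNOWN";
	}
}

/* Build the log line with the actual status instead of a hard-coded FAIL. */
static int format_verify_msg(char *buf, size_t len,
			     const char *filename, int status)
{
	assert(buf != NULL && filename != NULL);
	return snprintf(buf, len, "verify_metadata %s (Integrity status: %s)",
			filename, integrity_status_name(status));
}
```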

> + if (integrity_enforce) {
> + rc = -EACCES;
> + goto out;
> + }
> + }
> + }
> +
> + rc = integrity_verify_data(dentry, &status);
> + if (rc < 0) {
> + printk(KERN_INFO "%s verify_data failed "
> +"(rc: %d - status: %d)\n", bprm->filename, rc, status);
> + if (!integrity_enforce)
> + rc = 0;
> + goto out;
> + }
> + if (status != INTEGRITY_PASS) {
> + if (!is_kernel_thread(current)) {

Please remind me why kernel threads are exempt?

> + printk(KERN_INFO "verify_data %s "
> +"(Integrity status: FAIL)\n", bprm->filename);

Same question about FAIL vs NO_LABEL.. (Would NO_LABEL be caught by a
failing verify_metadata above?)

> + if (integrity_enforce) {
> + rc = -EACCES;
> + goto out;
> + }
> + }
> + }
> +
> + kfree(xattr_value);
> +
> + /* measure all integrity level executables */
> + integrity_measure(dentry, bprm->filename, MAY_EXEC);
> + return 0;

If integrity_measure() fails (can it fail?) is allowing the exec still the
right approach? (I seem to recall that "measuring

Re: [PATCH 8/8] Convert PDA into the percpu section

2007-03-13 Thread Rusty Russell
On Tue, 2007-03-13 at 10:15 -0700, Jeremy Fitzhardinge wrote:
> Rusty Russell wrote:
> > +   pack_descriptor((u32 *)&gdt[GDT_ENTRY_PERCPU].a,
> > +   (u32 *)&gdt[GDT_ENTRY_PERCPU].b,
> > +   __per_cpu_offset[cpu], 0xF,
> > 0x80 | DESCTYPE_S | 0x2, 0); /* present read-write data 
> > segment */
> >   
> 
> Why testing with qemu is not enough.

Indeed 8(.

Thanks!
Rusty.



Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2

2007-03-13 Thread Valdis . Kletnieks
On Mon, 12 Mar 2007 17:38:38 BST, Kasper Sandberg said:
> with latest xorg, xlib will be using xcb internally,

Out of curiosity, when is this "latest" Xorg going to escape to distros,
and is it far enough along that beta testers can gather usable numbers?





Re: [PATCH 0/8] x86 boot, pda and gdt cleanups

2007-03-13 Thread Rusty Russell
On Tue, 2007-03-13 at 13:48 -0700, Jeremy Fitzhardinge wrote:
> Rusty Russell wrote:
> > Hi all,
> >
> > The GDT stuff on x86 is a little more complex than it need be, but
> > playing with boot code is always dangerous.  These compile and boot on
> > UP and SMP for me, but Andrew should let them cook in -mm for a while.
> >   
> Hi Rusty,
> 
> This is my rough hacking patch I needed to get things into a Xen-shape
> state.

Looks good.  Just one thing:

>  void __devinit native_smp_prepare_boot_cpu(void)
>  {
> - cpu_set(smp_processor_id(), cpu_online_map);
> - cpu_set(smp_processor_id(), cpu_callout_map);
> - cpu_set(smp_processor_id(), cpu_present_map);
> - cpu_set(smp_processor_id(), cpu_possible_map);
> - per_cpu(cpu_state, smp_processor_id()) = CPU_ONLINE;
> + int cpu = smp_processor_id();
> +
> + cpu_set(cpu, cpu_online_map);
> + cpu_set(cpu, cpu_callout_map);
> + cpu_set(cpu, cpu_present_map);
> + cpu_set(cpu, cpu_possible_map);
> + per_cpu(cpu_state, cpu) = CPU_ONLINE;
>  
>   /* Set up %fs to point to our per-CPU area now it's allocated */
> - init_gdt(smp_processor_id(), &init_task);
> - cpu_set_gdt(smp_processor_id());
> + init_gdt(cpu, &init_task);
> + cpu_set_gdt(cpu);
>  }

This is called "pissing in the corners".  Don't do it: we don't need to
touch that code and I actually prefer the original anyway (explicit is
*good*).  

The habit of extracting cpu number once then using it is an optimization
which we should be aiming to get rid of (it simply hurts archs with
efficient per-cpu implementations).

Cheers,
Rusty.



Re: irda rmmod lockdep trace.

2007-03-13 Thread David Miller
From: Samuel Ortiz <[EMAIL PROTECTED]>
Date: Wed, 14 Mar 2007 02:50:03 +0200

> On Mon, Mar 12, 2007 at 04:49:21PM -0700, David Miller wrote:
> > I would strongly caution against adding any run-time overhead just to
> > cure a false lockdep warning.  Even adding a new function argument
> > is too much IMHO.
> > 
> > Make the cost show up for lockdep only, perhaps by putting each
> > hashbin lock into a separate locking class?
> Does that look better to you:

Yes, it does.:)


Re: _proxy_pda still makes linking modules fail

2007-03-13 Thread Rusty Russell
On Tue, 2007-03-13 at 16:57 +0100, Andi Kleen wrote:
> On Tue, Mar 13, 2007 at 05:23:52PM +1100, Rusty Russell wrote:
> > In particular, it's been put in GCC 4.1 for
> > CONFIG_CC_STACKPROTECTOR, which assumes %gs:40 will give the stack
> > canary.
> 
> Yes that was always ugly, but I don't know a better way.

Well, "%gs:__gcc_stack_protector" would have been better.  We could have
defined __gcc_stack_protector as an absolute symbol (0x40) at the
moment, and made it a real per-cpu var later.

> > For the record: the PDA should never have existed, that's what percpu
> > vars were supposed to be for.  Something went wrong here 8(
> 
> PDA predates per cpu.

Indeed, but I should have converted it over back in 2003 (?) when the
per-cpu stuff went in 8(

> > The ideal solution has always been to use __thread, but no architecture
> > has yet managed it (I tried for i386, and it quickly caused unbearable
> 
> I tried it too, but __thread is hopeless for kernel code
> 
> > pain).  On x86-64 that uses "%fs", not "%gs" as the kernel
> > does, but I might try that if I feel particularly masochistic soon...
> 
> Then swapgs wouldn't work anymore (there is no swapfs)

Good point.

Thanks,
Rusty.



Re: Stolen and degraded time and schedulers

2007-03-13 Thread Daniel Walker
On Tue, 2007-03-13 at 14:59 -0700, Jeremy Fitzhardinge wrote:
> Daniel Walker wrote:
> > The frequency tracking you mention is done to some extent inside the
> > timekeeping adjustment functions, but I'm not sure it's totally accurate
> > for non-timekeeping, and it also tracks things like interrupt latency.
> > Tracking frequency changes where it's important to get it right
> > shouldn't be done I think ..
> >
> > If you want accurate time accounting, don't use the TSC .
> >   
> 
> I'm not sure I follow you here.  Clocksources have the means to adjust
> the rate of time progression, mostly to warp the time for things like
> ntp.  The stability or otherwise of the tsc is irrelevant.

The adjustments that I spoke of above are working regardless of ntp ..
The stability of the TSC directly affects the clock mult adjustments in
timekeeping, as does interrupt latency since the clock is essentially
validated against the timer interrupt.

> If you had a clocksource which was explicitly using the rate at which a
> CPU does work as a timebase, then using the same warping mechanism would
> allow you to model CPU speed changes.

Like I said, there are other factors so that's not going to exactly model
cpu speed changes. You could come up with another method, but that would
likely require another known constant clock.

> > The sched_clock interface is basically a stripped down clocksource..
> > I've implemented sched_clock as a clocksource in the past ..
> >   
> 
> Yes, that works.  But a clocksource is strictly about measuring the
> progression of real time, and so doesn't generally measure how much work
> a CPU has done.

sched_clock doesn't measure amounts of cpu work either, it's all about
timing. 

> >> We currently have a sched_clock interface in paravirt_ops to deal with
> >> the hypervisor aspect.  It only occurred to me this morning that cpufreq
> >> presents exactly the same problem to the rest of the kernel, and so
> >> there's room for a more general solution.
> >> 
> >
> > Are there other architectures which have this per-cpu clock frequency
> > changing issue? I worked with several other architectures beyond just
> > x86 and haven't seen this issue ..
> 
> Well, lots of cpus have dynamic frequencies.  Any scheduler which
> maintains history will suffer the same problem, even on UP.  If
> processes A and B are supposed to have the same priority and they both
> execute for 1ms of real time, did they make the same amount of
> progress?  Not if the cpu changed speed in between.

That's true, but given a constant clock (like what sched_clock should
have) then the accounting is similarly inaccurate. Any connection
between the scheduler and the TSC frequency changes aren't part of the
design AFAIK ..

> And any system which commonly runs virtualized (s390, power, etc) will
> need to deal with the notion of stolen time.

I haven't followed the "stolen time" discussion, but just a brief look
at your first email I'd say don't mess with the clocks .. The clocks
should always reflect the time accurately .. That's the point of the
clocks, and when the TSC, or any other clock, changes frequency it
sucks..

I haven't thought it through completely, but you might be able to solve
the issue by adding a value to each jiffie in the scheduler or altering
the scheduler to extend the number of jiffies a task gets depending on the
virtual speed of the cpu..

Daniel
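
The last suggestion, extending a task's timeslice according to the effective speed of the cpu, can be sketched as plain arithmetic. This is a minimal illustration, not scheduler code: the function name scaled_timeslice and the choice of kHz units are assumptions made for the example, and it leaves the clock itself untouched.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Stretch a timeslice so that a task on a slowed-down CPU gets roughly the
 * same amount of work done as it would at full speed.
 *
 * ticks:   timeslice in ticks at full speed
 * max_khz: nominal (full) CPU frequency
 * cur_khz: current effective CPU frequency
 */
static uint64_t scaled_timeslice(uint64_t ticks, uint32_t max_khz,
				 uint32_t cur_khz)
{
	assert(cur_khz != 0 && cur_khz <= max_khz);
	/* Round up so a slowed CPU never yields a shorter slice. */
	return (ticks * max_khz + cur_khz - 1) / cur_khz;
}
```

For example, a 10-tick slice on a cpu running at half its nominal frequency becomes 20 ticks.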



Re: Suspend to RAM fault in VT when resuming

2007-03-13 Thread Tim Gardner
Pavel Machek wrote:
> Hi!
> 
>> I've chased one of the 'Suspend to RAM' resume problems to a specific
>> line in drivers/char/vt.c, see attached 2.6.21-rc3 diff with
> 
> Has suspend/resume ever worked on that hardware?
> 
>> TRACE_RESUME() instrumentation. The macro scr_writew resolves to '*addr
>> = val', which appears to be causing the problem. I've verified that the
>> pointer is not NULL, but don't know if it's really valid. It's pretty
>> tough to tell what is happening, but on a Dell XPS it just hangs. A Dell
>> Precision blinks the keyboard lights.
> 
> It is possible that video is not initialized at that point, and that
> hardware goes seriously unhappy when you access non-existing vga. Does
> it resume ok when you completely disable video support? 
> 
> 
>   Pavel
> 
> 

Resume works on the Dell XPS with the Ubuntu Edgy release
Ubuntu-2.6.17-10.25 (2.6.17 plus a zillion fixes). Ubuntu's git tree is
rsync://rsync.kernel.org/pub/scm/linux/kernel/git/bcollins/ubuntu-2.6.git.
Ubuntu-2.6.17-8.21 is the first version where resume works, but there
are a boatload of changes in ACPI and SW suspend between that and the
previous tag Ubuntu-2.6.17-7.20. I don't know what made SW suspend work,
but even knowing that won't tell me what broke it again.

I've been avoiding the bisect process because it is quite time consuming
on my slow machine, there is much branch weirdness, and I'm not that
good with git. I thought if I narrowed the failure down to a small chunk
of code in 2.6.21-rc3, then the answer might be obvious. No such luck, huh?

This crash behaves like the video memory space has become unmapped. Is
that possible?

rtg
-- 
Tim Gardner [EMAIL PROTECTED]


Re: Linux 2.6.20.3

2007-03-13 Thread mdew .

many changes since 2.6.20.3-rc1?

On 3/14/07, Greg KH <[EMAIL PROTECTED]> wrote:

> We (the -stable team) are announcing the release of the 2.6.20.3 kernel.
> It contains a number of bugfixes and all 2.6.20 users are recommended to
> upgrade.
>
> The diffstat and short summary of the fixes are below.



Re: [PATCH 1/2] avoid OPEN_MAX in SCM_MAX_FD

2007-03-13 Thread Linus Torvalds


On Tue, 13 Mar 2007, Roland McGrath wrote:
> 
> Ok, fine.  But PATH_MAX is a real constant that has some meaning in the
> kernel.  It's perfectly correct to use PATH_MAX as a constant on a system
> like Linux that defines it and means what it says.  Conversely, OPEN_MAX
> has no useful relationship with anything the kernel is doing at all.

Sure. I'm just saying that some people may use OPEN_MAX the way I know 
people use PATH_MAX - whether it's what you're supposed to or not.

I do agree that PATH_MAX is much more appropriate to be used that way, and 
is more likely to have "real" meaning, I just worry.

Linus


Re: [QUICKLIST 0/4] Arch independent quicklists V2

2007-03-13 Thread William Lee Irwin III
On Tue, Mar 13, 2007 at 04:47:56AM -0800, Andrew Morton wrote:
> I'm trying to remember why we ever would have needed to zero out the
> pagetable pages if we're taking down the whole mm?  Maybe it's
> because "oh, the arch wants to put this page into a quicklist to
> recycle it", which is all rather circular.
> It would be interesting to look at a) leave the page full of random
> garbage if we're releasing the whole mm and b) return it straight to
> the page allocator.

We never did need to modify ptes on exit() or other pagetable prunings
(not that they were ever done outside exit() before 2.6.x). The only
subtlety is that pruning on munmap() needs a TLB flush for the TLB
itself to drop the references to the pages referred to by the PTE's on
pruning in the presence of hardware pagetable walkers (in the exit()
case there are no user execution contexts left to potentially utilize
the dead translations so it's less important). That's handled by
tlb_remove_page() and shouldn't need any updates across such a change.

I believe the zeroing on teardown was largely a result of idiom vs.
any particular need. Essentially using ptep_get_and_clear() to handle
the non-pruning munmap() case in a manner unified with other pagetable
teardowns. Also likely is 2.4.x legacy from when that and possibly
earlier kernels maintained arch-private quicklists for pagetables.

There are furthermore distinctions to make between fork() and execve().
fork() stomps over the entire process address space copying pagetables
en masse. After execve() a process incrementally faults in PTE's one at
a time. It should be clear that if case analyses are of interest at
all, fork() will want cache-hot pages (cache-preloaded pages?) where
such are largely wasted on incremental faults after execve(). The copy
operations in fork() should probably also be examined in the context of
shared pagetables at some point.


-- wli


Re: _proxy_pda still makes linking modules fail

2007-03-13 Thread Rusty Russell
On Tue, 2007-03-13 at 08:31 -0700, Jeremy Fitzhardinge wrote:
> Paul Mackerras wrote:
> > There is a fundamental problem with using __thread, which is that gcc
> > assumes that the addresses of __thread variables are constant within
> > one thread, and that therefore it can cache the result of address
> > calculations.  However, with preempt, threads in the kernel can't rely
> > on staying on one cpu, and therefore the addresses of per-cpu
> > variables can change.  There appears to be no way to tell gcc to drop
> > all cached __thread variable address calculations at a given point
> > (e.g. when enabling or disabling preemption).  That is basically why I
> > gave up on using __thread for per-cpu variables on powerpc.

[ Thanks for the enlightenment, Paul ]

> Doesn't that fall under the general class of "you have to be pinned to a
> particular cpu in order to meaningfully use per-cpu variables"?

No, it makes assumptions about the *address* of a per-cpu variable not
changing, even across barriers.

> In principle gcc could CSE the value of smp_processor_id() across a cpu
> change in the same way.

No, this is why preempt_enable and the like are memory barriers.
Rusty.
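
The address-caching problem Paul describes can be modelled in userspace. The sketch below is a simulation under stated assumptions: counters[], current_cpu and migrate_to() are hypothetical stand-ins for per-cpu storage and preemption, and the "cached" pointer plays the role of an address calculation the compiler has hoisted. It does not use __thread itself.

```c
#include <assert.h>

#define NR_CPUS 4

static int counters[NR_CPUS]; /* one slot per simulated CPU */
static int current_cpu;       /* CPU the simulated thread is running on */

/* Address of the per-cpu slot for whichever CPU we are on right now. */
static int *this_cpu_counter(void)
{
	return &counters[current_cpu];
}

/* Stand-in for being preempted and rescheduled on another CPU. */
static void migrate_to(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	current_cpu = cpu;
}

/* Buggy pattern: the address is computed once and reused after migrating. */
static int bump_with_cached_address(void)
{
	int *p;

	migrate_to(0);
	p = this_cpu_counter(); /* address of CPU 0's slot */
	migrate_to(1);          /* we now run on CPU 1 */
	(*p)++;                 /* still updates CPU 0's slot */
	return counters[1];     /* the current CPU's slot was not touched */
}

/* Correct pattern: recompute the address after any point where we may move. */
static int bump_with_fresh_address(void)
{
	migrate_to(1);
	(*this_cpu_counter())++;
	return counters[1];
}
```

A memory barrier forces values to be reloaded, but as discussed above it does not make the compiler redo an address calculation it considers constant, which is what the first function imitates.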




Re: SMP performance degradation with sysbench

2007-03-13 Thread Nish Aravamudan

On 3/13/07, Eric Dumazet <[EMAIL PROTECTED]> wrote:
> Nish Aravamudan wrote:
> > On 3/12/07, Anton Blanchard <[EMAIL PROTECTED]> wrote:
> >>
> >> Hi Nick,
> >>
> >> > Anyway, I'll keep experimenting. If anyone from MySQL wants to help look
> >> > at this, send me a mail (eg. especially with the sched_setscheduler issue,
> >> > you might be able to do something better).
> >>
> >> I took a look at this today and figured I'd document it:
> >>
> >> http://ozlabs.org/~anton/linux/sysbench/
> >>
> >> Bottom line: it looks like issues in the glibc malloc library, replacing
> >> it with the google malloc library fixes the negative scaling:
> >>
> >> # apt-get install libgoogle-perftools0
> >> # LD_PRELOAD=/usr/lib/libtcmalloc.so /usr/sbin/mysqld
> >
> > Quick datapoint, still collecting data and trying to verify it's
> > always the case: on my 8-way Xeon, I'm actually seeing *much* worse
> > performance with libtcmalloc.so compared to mainline. Am generating
> > graphs and such still, but maybe someone else with x86_64 hardware
> > could try the google PRELOAD and see if it helps/hurts (to rule out
> > tester stupidity)?
>
> I wish I had an 8-way test platform :)
>
> Anyway, could you post some oprofile results ?


Hopefully soon -- want to still make sure I'm not doing something
dumb. Am also hoping to get some of the gdb backtraces like Anton had.

Thanks,
Nish


Re: [RFC/PATCH 00/59] Make common x86 arch area for i386 and x86_64

2007-03-13 Thread Steven Rostedt
On Tue, 2007-03-13 at 14:45 -0700, Chris Wright wrote:

> what about asm-x86/ dir?  the asm/ symlink would still point to relevant
> arch, but the file there could be simply #include  ?

Would it be acceptable to have an include/asm-x86/ dir with one file?
Of course it will open the door to merge current code and share it there
too.

-- Steve




Re: [RFC/PATCH 00/59] Make common x86 arch area for i386 and x86_64

2007-03-13 Thread Steven Rostedt
On Tue, 2007-03-13 at 14:39 -0700, Linus Torvalds wrote:
> 
> On Tue, 13 Mar 2007, Steven Rostedt wrote:
> > 
> > What we have currently is a bunch of hacks.  Seems that people can't make
> > up their minds as to what to do.
> 
> I don't mind the patches, but I'd be a lot happier if it also was a stated 
> intention to actually make it be buildable as "x86", the same way that the 
> separate 32-bit and 64-bit POWER architectures were merged into just one 
> architecture that could be built either way.

That's actually a larger goal, but for the immediate future, I figure
this would be a good first step.  Start out by stating what's similar,
and then build off of this for something bigger.  But in the mean time,
we can have a staging ground for work that's for both i386 and x86_64
archs. And for those that know these systems in a more intimate way
(Andi :) they can work off of this to make that monster.

-- Steve



Re: SMP performance degradation with sysbench

2007-03-13 Thread Eric Dumazet

Nish Aravamudan wrote:
> On 3/12/07, Anton Blanchard <[EMAIL PROTECTED]> wrote:
>>
>> Hi Nick,
>>
>> > Anyway, I'll keep experimenting. If anyone from MySQL wants to help look
>> > at this, send me a mail (eg. especially with the sched_setscheduler issue,
>> > you might be able to do something better).
>>
>> I took a look at this today and figured I'd document it:
>>
>> http://ozlabs.org/~anton/linux/sysbench/
>>
>> Bottom line: it looks like issues in the glibc malloc library, replacing
>> it with the google malloc library fixes the negative scaling:
>>
>> # apt-get install libgoogle-perftools0
>> # LD_PRELOAD=/usr/lib/libtcmalloc.so /usr/sbin/mysqld
>
> Quick datapoint, still collecting data and trying to verify it's
> always the case: on my 8-way Xeon, I'm actually seeing *much* worse
> performance with libtcmalloc.so compared to mainline. Am generating
> graphs and such still, but maybe someone else with x86_64 hardware
> could try the google PRELOAD and see if it helps/hurts (to rule out
> tester stupidity)?

I wish I had an 8-way test platform :)

Anyway, could you post some oprofile results ?



Re: [PATCH 1/2] avoid OPEN_MAX in SCM_MAX_FD

2007-03-13 Thread Roland McGrath
> I'd actually prefer this as part of the "remove OPEN_MAX" patch.

Ok.  (But now you're going to argue with me about "remove OPEN_MAX",
and you haven't said you have any problem with changing SCM_MAX_FD,
so why make it wait?)

> That said, it actually worries me that you should call "_SC_OPEN_MAX". 
[...]
> For example, I know perfectly well that I should use _SC_PATH_MAX, but a 
> *lot* of code simply doesn't care. In git, I used PATH_MAX, and the reason 
[...]

Ok, fine.  But PATH_MAX is a real constant that has some meaning in the
kernel.  It's perfectly correct to use PATH_MAX as a constant on a system
like Linux that defines it and means what it says.  Conversely, OPEN_MAX
has no useful relationship with anything the kernel is doing at all.

> So, what's the likelihood that this will break some old programs? I 
> realize that modern distributions don't put the kernel headers in their 
> user-visible includes any more, but the breakage is most likely exactly 
> for old programs and older distributions.

Well, I don't know for sure.  It doesn't seem all that likely to me (not
like PATH_MAX), as there has been getdtablesize() since before there was
OPEN_MAX by that name (not to mention before there was Linux).  If things
use OPEN_MAX as a constant for arrays, they're already broken unless they
call setrlimit to constrain themselves.  Getting things fixed has to start
somewhere.


Thanks,
Roland
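
For programs that currently size arrays with OPEN_MAX, the run-time query mentioned in this thread looks roughly like the sketch below. The helper names fd_limit and alloc_fd_flags and the fallback of 1024 are assumptions for illustration; sysconf(_SC_OPEN_MAX) is the standard interface, and getdtablesize() is the older equivalent.

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Ask for the per-process descriptor limit at run time. */
static long fd_limit(void)
{
	long n = sysconf(_SC_OPEN_MAX);

	/* sysconf() returns -1 when the limit is indeterminate. */
	return n > 0 ? n : 1024; /* fallback value is an arbitrary assumption */
}

/* Allocate one flag byte per possible descriptor instead of a fixed array. */
static unsigned char *alloc_fd_flags(long *count)
{
	assert(count != NULL);
	*count = fd_limit();
	return calloc((size_t)*count, sizeof(unsigned char));
}
```

The limit can still change later through setrlimit, so code that keeps such a table for a long time may need to query it again; the caller must also check the returned pointer for NULL.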



Re: [RFC/PATCH 06/59] mv kernel/acpi/processor.c

2007-03-13 Thread Steven Rostedt
On Tue, 2007-03-13 at 14:32 -0700, Linus Torvalds wrote:
> 
> On Tue, 13 Mar 2007, Steven Rostedt wrote:
> >
> > Move kernel/acpi/processor.c to the common hold.
> 
> Please use
> 
>   git diff -M

OK, thanks!  I'm still quite a git newbie.

I'll update all the move patches. It may take a bit of hand work.

What I actually did for this patch series was to make all my changes in
git.  But the changes were not smooth from change set to change set, so
I did one big git-diff, then used good old midnight commander (mc)
to parse the patches, and then pulled them into quilt to comment and
send them.


> 
> for things like this.
> 
> In fact, even if you weren't a git user, I'd ask you to *become* one just 
> because I think that it's a *lot* more productive if people actually see 
> renames as renames, and will see what - if anything - changed when 
> renaming.
> 
> The "-M" flag isn't the default, simply because it generates patches that 
> cannot be applied with regular "patch", but for something like this, I 
> think it's practically imperative. The old kind of "remove file" + "add 
> file" patch just isn't acceptable when there are very viable alternatives.

I wish I knew this before breaking it up.   But I'm sure I can do
another big patch and automate these updates :)

-- Steve




Re: irda rmmod lockdep trace.

2007-03-13 Thread Samuel Ortiz
On Mon, Mar 12, 2007 at 04:49:21PM -0700, David Miller wrote:
> I would strongly caution against adding any run-time overhead just to
> cure a false lockdep warning.  Even adding a new function argument
> is too much IMHO.
> 
> Make the cost show up for lockdep only, perhaps by putting each
> hashbin lock into a separate locking class?
Does that look better to you:

diff --git a/include/net/irda/irqueue.h b/include/net/irda/irqueue.h
index 335b0ac..67cb434 100644
--- a/include/net/irda/irqueue.h
+++ b/include/net/irda/irqueue.h
@@ -71,6 +71,7 @@ typedef struct hashbin_t {
 	int        hb_size;
 	spinlock_t hb_spinlock; /* HB_LOCK - Can be used by the user */
 
+	struct lock_class_key hb_lock_key;
 	irda_queue_t* hb_queue[HASHBIN_SIZE] IRDA_ALIGN;
 
 	irda_queue_t* hb_current;
diff --git a/net/irda/irqueue.c b/net/irda/irqueue.c
index 9266233..c72ecee 100644
--- a/net/irda/irqueue.c
+++ b/net/irda/irqueue.c
@@ -370,6 +370,8 @@ hashbin_t *hashbin_new(int type)
 	/* Make sure all spinlock's are unlocked */
 	if ( hashbin->hb_type & HB_LOCK ) {
 		spin_lock_init(&hashbin->hb_spinlock);
+		lockdep_set_class(&hashbin->hb_spinlock,
+				  &hashbin->hb_lock_key);
 	}
 
 	return hashbin;


 



Re: Stolen and degraded time and schedulers

2007-03-13 Thread Dan Hecht

On 03/13/2007 02:59 PM, Jeremy Fitzhardinge wrote:

Daniel Walker wrote:

The frequency tracking you mention is done to some extent inside the
timekeeping adjustment functions, but I'm not sure it's totally accurate
for non-timekeeping, and it also tracks things like interrupt latency.
Tracking frequency changes where it's important to get it right
shouldn't be done I think ..

If you want accurate time accounting, don't use the TSC .
  


I'm not sure I follow you here.  Clocksources have the means to adjust
the rate of time progression, mostly to warp the time for things like
ntp.  The stability or otherwise of the tsc is irrelevant.

If you had a clocksource which was explicitly using the rate at which a
CPU does work as a timebase, then using the same warping mechanism would
allow you to model CPU speed changes.


The sched_clock interface is basically a stripped down clocksource..
I've implemented sched_clock as a clocksource in the past ..
  


Yes, that works.  But a clocksource is strictly about measuring the
progression of real time, and so doesn't generally measure how much work
a CPU has done.


We currently have a sched_clock interface in paravirt_ops to deal with
the hypervisor aspect.  It only occurred to me this morning that cpufreq
presents exactly the same problem to the rest of the kernel, and so
there's room for a more general solution.


Are there other architecture which have this per-cpu clock frequency
changing issue? I worked with several other architectures beyond just
x86 and haven't seen this issue ..


Well, lots of cpus have dynamic frequencies.  Any scheduler which
maintains history will suffer the same problem, even on UP.  If
processes A and B are supposed to have the same priority and they both
execute for 1ms of real time, did they make the same amount of
progress?  Not if the cpu changed speed in between.

And any system which commonly runs virtualized (s390, power, etc) will
need to deal with the notion of stolen time.



With your previous definition of work time, would it be that:

monotonic_time == work_time + stolen_time ??

i.e. would you be defining stolen_time to include the time lost to 
processes due to the cpu running at a lower frequency?  How does this 
play into the other potential users, besides sched_clock(), of stolen 
time?  We should make sure that the abstraction introduced here makes 
sense in those places too.


For example, the stuff that happens in update_process_times().  I think 
we'd want to account the stolen time to cpustat->steal.  Also we'd 
probably want to account for stolen time with regard to 
task_running_tick().  (Though, in the latter case, maybe we first have 
to move the scheduler away from assuming HZ rate decrementing of 
p->time_slice to get this right. i.e. remove the tick based assumption 
from the scheduler, and then maybe stolen time falls in more naturally 
when accounting time slices).


I guess taking your cpufreq as an example of work_time progressing 
slower than monotonic_time (and assuming that the remaining time is what 
you would call stolen), then e.g. top would report 50% of your cpu 
stolen when your cpu is running at 1/2 max rate.  And p->time_slice would 
decrement at 1/2 the rate it normally did when running at 1/2 speed.  Is 
this the right thing to do?  If so, then I agree it makes sense to model 
hypervisor stolen time in terms of your "work time".  But, if not, then 
maybe the amount of work you can get done during a period of time that 
is not stolen and the stolen time itself are really two different 
notions, and shouldn't be confused.  I can see arguments both ways.
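
To put numbers on the cpufreq example, here is a toy user-space calculation.
The struct and function names are invented for illustration (they are not
kernel interfaces), and it assumes work time scales linearly with clock
frequency, which is itself part of what is being debated:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical split of an interval into "work" and "stolen" parts. */
struct time_split {
	uint64_t work_ns;
	uint64_t stolen_ns;
};

/*
 * Model: monotonic_time == work_time + stolen_time, with work_time
 * scaled by the ratio of current to maximum cpu frequency.
 * The multiplication can overflow for very long intervals; that is
 * ignored here to keep the sketch short.
 */
struct time_split split_time(uint64_t monotonic_ns,
			     uint32_t cur_khz, uint32_t max_khz)
{
	struct time_split s;

	assert(max_khz != 0 && cur_khz <= max_khz);
	s.work_ns = monotonic_ns * cur_khz / max_khz;
	s.stolen_ns = monotonic_ns - s.work_ns;
	return s;
}
```

With a 1ms interval at 1/2 max rate this reports half of the interval as
stolen, which is the "top shows 50% stolen" behaviour in question.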


Dan


Re: [QUICKLIST 0/6] Arch independent quicklists V1

2007-03-13 Thread William Lee Irwin III
On Mon, Mar 12, 2007 at 03:51:57PM -0700, David Miller wrote:
> Someone with some extreme patience could do the sparc 32-bit port too,
> in fact it's lacking the cached PGD update logic that x86 et al. have
> so it would even end up being a bug fix :-)  This lack is why sparc32
> pre-initializes the vmalloc/module area PGDs with static page tables
> at boot time, FWIW.

I'll spare everyone the details and let code if/when it appears stand
in for promises on the sparc32 front.


-- wli


Re: SMP performance degradation with sysbench

2007-03-13 Thread Nish Aravamudan

On 3/12/07, Anton Blanchard <[EMAIL PROTECTED]> wrote:


Hi Nick,

> Anyway, I'll keep experimenting. If anyone from MySQL wants to help look
> at this, send me a mail (eg. especially with the sched_setscheduler issue,
> you might be able to do something better).

I took a look at this today and figured I'd document it:

http://ozlabs.org/~anton/linux/sysbench/

Bottom line: it looks like issues in the glibc malloc library, replacing
it with the google malloc library fixes the negative scaling:

# apt-get install libgoogle-perftools0
# LD_PRELOAD=/usr/lib/libtcmalloc.so /usr/sbin/mysqld


Quick datapoint, still collecting data and trying to verify it's
always the case: on my 8-way Xeon, I'm actually seeing *much* worse
performance with libtcmalloc.so compared to mainline. Am generating
graphs and such still, but maybe someone else with x86_64 hardware
could try the google PRELOAD and see if it helps/hurts (to rule out
tester stupidity)?

Thanks,
Nish


Re: Linux 2.6.20.3

2007-03-13 Thread Nish Aravamudan

On 3/13/07, Nish Aravamudan <[EMAIL PROTECTED]> wrote:

On 3/13/07, David Miller <[EMAIL PROTECTED]> wrote:
> From: "Nish Aravamudan" <[EMAIL PROTECTED]>
> Date: Tue, 13 Mar 2007 14:58:24 -0700
>
> > On 3/13/07, Nish Aravamudan <[EMAIL PROTECTED]> wrote:
> > > On 3/13/07, Greg KH <[EMAIL PROTECTED]> wrote:
> > > > We (the -stable team) are announcing the release of the 2.6.20.3 kernel.
> > > > It contains a number of bugfixes and all 2.6.20 users are recommended to
> > > > upgrade.
> > > >
> > > > The diffstat and short summary of the fixes are below.
> > > >
> > > > I'll also be replying to this message with a copy of the patch between
> > > > 2.6.20.2 and 2.6.20.3.
> > >
> > > Compared to 2.6.20.1 (will try 2.6.20.2 as well), I now get:
> >
> > err, duh -- this is a Sun Ultra 60, debian testing install.
>
> Figure out if 2.6.20.2 does it too, then please try to git bisect
> it down further.

Yep, that's the plan, just wanted to make folks aware.

> I took a quick look and the two sparc64 commits between 2.6.20.1
> and 2.6.20.2 are benign, a fix for E450 interrupts and a kenvctrld
> fix which is for a driver for hardware your ultra60 doesn't have. :)
>
> There is a decent amount of raid and nfs fixes in here, do you
> use either?

Neither.

> Another commit that might be relevant is:
>
> commit 530b09160744a12450fdacb2b78779c9830a29c8
> Author: Aristeu Sergio Rozanski Filho <[EMAIL PROTECTED]>
> Date:   Thu Mar 1 19:02:55 2007 -0500
>
> tty_io: fix race in master pty close/slave pty close path
>
> Hmmm...
>
> Please let us know if you can narrow it down further.

Building 2.6.20.2 right now, will let you know.


Ok, truly bizarre, I found that I was not running stock 2.6.20.3, but
had your small hugetlb patch on top.

So I went back and patched 2.6.20.1 with your patch, rebooted, got a
soft lockup. Went back to stock 2.6.20.1 and did not.

I don't see how your patch (C&P below for reference) could make any
difference...Especially because no hugepages were in use at the time.
On patched 2.6.20.1, I was just trying to check if my source tree had
your patch applied (by `patch -p1 < davem.patch`) and got the
soft-lockup I saw in 2.6.20.3 with the patch applied. I am going to
try a clean 2.6.20.3 as well, now.

diff --git a/arch/sparc64/mm/hugetlbpage.c b/arch/sparc64/mm/hugetlbpage.c
index 33fd0b2..00677b5 100644
--- a/arch/sparc64/mm/hugetlbpage.c
+++ b/arch/sparc64/mm/hugetlbpage.c
@@ -248,6 +248,7 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
   if (!pte_present(*ptep) && pte_present(entry))
   mm->context.huge_pte_count++;

+   addr &= HPAGE_MASK;
   for (i = 0; i < (1 << HUGETLB_PAGE_ORDER); i++) {
   set_pte_at(mm, addr, ptep, entry);
   ptep++;
@@ -266,6 +267,8 @@ pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
   if (pte_present(entry))
   mm->context.huge_pte_count--;

+   addr &= HPAGE_MASK;
+
   for (i = 0; i < (1 << HUGETLB_PAGE_ORDER); i++) {
   pte_clear(mm, addr, ptep);
   addr += PAGE_SIZE;
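
For anyone following along, this is all the added masking line does, sketched
in user space.  The HPAGE_SHIFT of 22 (4MB huge pages) is an assumption for
the example; the real value depends on the kernel configuration.

```c
#include <assert.h>

#define HPAGE_SHIFT	22	/* assumption: 4MB huge pages */
#define HPAGE_SIZE	(1UL << HPAGE_SHIFT)
#define HPAGE_MASK	(~(HPAGE_SIZE - 1UL))

/* Round an address down to the start of the huge page containing it. */
unsigned long hpage_align_down(unsigned long addr)
{
	unsigned long aligned = addr & HPAGE_MASK;

	assert((aligned & ~HPAGE_MASK) == 0);	/* low bits are now clear */
	return aligned;
}
```

So the loop that follows always starts at the first small page of the huge
page, even if the caller passed an address somewhere in the middle of it.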


NPTL patch for linux 2.4.28

2007-03-13 Thread Syed Ahemed

Hello all.
I have a tricky problem on hand and a straightforward question.

Tricky problem:
-
While debugging a simple multithreaded application with gdb on Linux
2.4.28, I noticed that the thread that crashed with SIGSEGV has
complete information in gdb (both the address and the function at the
time of the crash).  But the other threads, which are in a wait state
(executing glibc functions at the time of the crash), show just the
address and not the function name, as shown below.


sh-2.05b# ./gdb a.out /mnt/cf/engg_files/core_files/
a.out.1173437318.core.5312   a.out.1173453940.core.9829
a.out.1173438125.core.16016  lost+found
a.out.1173438881.core.18721


Re: sys_write() racy for multi-threaded append?

2007-03-13 Thread Michael K. Edwards

In case anyone cares, this is a snippet of my work-in-progress
read_write.c illustrating how I might handle f_pos.  Can anyone point
me to data showing whether it's worth avoiding the spinlock when the
"struct file" is not shared between threads?  (In my world,
correctness comes before code-bumming as long as the algorithm scales
properly, and there are a fair number of corner cases to think through
-- although one might be able to piggy-back on the logic in
fget_light.)

Cheers,
- Michael

/*
 *  Synchronization of f_pos is not for the purpose of serializing writes
 *  to the same file descriptor from multiple threads.  It is solely to
 *  protect against corruption of the f_pos field leading to a severe
 *  violation of its semantics, such as:
 *  - a user-visible negative value on a file type which POSIX forbids
 *    ever to have a negative offset; or
 *  - an unexpected jump from (say) (2^32 - small) to (2^33 - small),
 *    due to an interrupt between the two 32-bit write instructions
 *    needed to write out an loff_t on some architectures, leading to
 *    a delayed overwrite of half of the f_pos value written by another
 *    thread.  (Applicable to SMP and CONFIG_PREEMPT kernels.)
 *
 *  Three tiers of protection on f_pos may be needed in order to trade off
 *  between performance and least surprise:
 *
 *    1. All f_pos accesses must go through accessors that protect against
 *       problems with atomic 64-bit writes on some platforms.  These
 *       accessors are only atomic with respect to one another.
 *
 *    2. Those few accesses that cannot handle transient negative values of
 *       f_pos must be protected from a race in some llseek implementations
 *       (including generic_file_llseek).  Correct application code should
 *       never encounter this race, and the syscall use cases that are
 *       vulnerable to it are relatively infrequent.  This is a job for an
 *       rwlock, although the sense is inverted (readers need exclusive
 *       access to a "stalled pipeline", while writers only need to be able
 *       to fix things up after the fact in the event of an exception).
 *
 *    3. Applications that cannot handle transient overshoot on f_pos, under
 *       conditions where several threads are writing to the same open file
 *       concurrently and one of them experiences a short write, can be
 *       protected from themselves by an rwsem around vfs_write(v) calls.
 *       (The same applies to multi-threaded reads, mutatis mutandis.)
 *       When CONFIG_WOMBAT (waste of memory, brain, and time -- thanks,
 *       Bodo!) is enabled, this per-struct-file rwsem is taken as necessary.
 */
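
To make the second corruption case concrete, here is a single-threaded
user-space simulation of that interleaving.  The split_pos type and the
helper names are made up for the example, and the order in which the two
halves get written is an assumption; this only replays one unlucky schedule
by hand.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit position stored as two separately written 32-bit halves. */
struct split_pos {
	uint32_t lo;
	uint32_t hi;
};

static void store_hi(struct split_pos *p, uint64_t v)
{
	p->hi = (uint32_t)(v >> 32);
}

static void store_lo(struct split_pos *p, uint64_t v)
{
	p->lo = (uint32_t)v;
}

static uint64_t load(const struct split_pos *p)
{
	return ((uint64_t)p->hi << 32) | p->lo;
}

/*
 * Writer A is interrupted between its two half-stores; writer B does a
 * complete store in the gap.  Returns what a later reader would see.
 */
uint64_t torn_write_demo(uint64_t a_val, uint64_t b_val)
{
	struct split_pos pos = { 0, 0 };

	store_hi(&pos, a_val);		/* A: first half, then preempted */
	store_hi(&pos, b_val);		/* B: complete store ...         */
	store_lo(&pos, b_val);
	assert(load(&pos) == b_val);	/* ... and B's value is intact   */
	store_lo(&pos, a_val);		/* A resumes: overwrites half    */
	return load(&pos);
}
```

With A writing 2^32 - 8 and B writing 2^32 + 16, the result is 2^33 - 8:
B's high half combined with A's low half, a value neither thread wrote.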

#define file_pos_local_acquire(file, flags) \
	spin_lock_irqsave(file->f_pos_lock, flags)

#define file_pos_local_release(file, flags) \
	spin_unlock_irqrestore(file->f_pos_lock, flags)

#define file_pos_excl_acquire(file, flags) \
	do { \
		write_lock_irqsave(file->f_pos_rwlock, flags); \
		spin_lock(file->f_pos_lock); \
	} while (0)

#define file_pos_excl_release(file, flags) \
	do { \
		spin_unlock(file->f_pos_lock); \
		write_unlock_irqrestore(file->f_pos_rwlock, flags); \
	} while (0)

#define file_pos_nonexcl_acquire(file, flags) \
	do { \
		read_lock_irqsave(file->f_pos_rwlock, flags); \
		spin_lock(file->f_pos_lock); \
	} while (0)

#define file_pos_nonexcl_release(file, flags) \
	do { \
		spin_unlock(file->f_pos_lock); \
		read_unlock_irqrestore(file->f_pos_rwlock, flags); \
	} while (0)

/*
 *  Accessors for f_pos (the file descriptor "position" for seekable file
 *  types, also of interest as a bytes read/written counter on non-seekable
 *  file types such as pipes and FIFOs).  The f_pos field of struct file
 *  should be accessed exclusively through these functions, so that the
 *  changes needed to interlock these accesses atomically are localized to
 *  the accessor functions.
 *
 *  file_pos_write is defined to return the old file position so that it
 *  can be restored by the caller if appropriate.  (Note that it is not
 *  necessarily guaranteed that restoring the old position will not clobber
 *  a value written by another thread; see below.)  file_pos_adjust is also
 *  defined to return the old file position because it is more often needed
 *  immediately by the caller; the new position can always be obtained by
 *  adding the value passed into the "pos" parameter to file_pos_adjust.
 */

/*
*  Architectures on which an aligned 64-bit read/write is atomic can omit
*  l

Re: [QUICKLIST 0/4] Arch independent quicklists V2

2007-03-13 Thread Paul Mackerras
Andrew Morton writes:

> Plus, we can get in a situation where we take a cache-cold, known-zero page
> from the pte quicklist when there is a cache-hot, non-zero page sitting in
> the page allocator.  I suspect that zeroing the cache-hot page would take a
> similar amount of time to a single miss against the cache-cold page.

That is certainly the case on powerpc.

> I'm not saying that I _know_ that the quicklists are pointless, but I don't
> think it's established that they are pointful.

I don't see much point to them.  For powerpc, I would rather grab an
arbitrary page and zero it than get a page off a quicklist.

> Maybe, dunno.  It was apparently a win on powerpc many years ago.  I had a

My recollection was that it wasn't a win, but it was a long time ago...

Paul.


Re: Summary of resource management discussion

2007-03-13 Thread Herbert Poetzl
On Tue, Mar 13, 2007 at 11:28:20PM +0530, Srivatsa Vaddagiri wrote:
> On Tue, Mar 13, 2007 at 05:24:59PM +0100, Herbert Poetzl wrote:
> > what about identifying different resource categories and
> > handling them according to the typical usage pattern?
> > 
> > like the following:
> > 
> >  - cpu and scheduler related accounting/limits
> >  - memory related accounting/limits
> >  - network related accounting/limits
> >  - generic/file system related accounting/limits
> > 
> > I don't worry too much about having the generic/file stuff
> > attached to the nsproxy, but the cpu/sched stuff might be
> > better off being directly reachable from the task
> 
> I think we should experiment with both combinations (a direct pointer
> to cpu_limit structure from task_struct and an indirect pointer), get
> some numbers and then decide. Or do you have results already with
> respect to that?

nope, no numbers for that, but I appreciate some testing
and probably can do some testing in this regard too
(although I want to get some testing done for the resource
 sharing between guests first)

> > > 3. How are cpusets related to vserver/containers?
> > > 
> > >   Should it be possible to, lets say, create exclusive cpusets and
> > >   attach containers to different cpusets?
> > 
> > that is what Linux-VServer does atm, i.e. you can put
> > an entire guest into a specific cpu set 
> 
> Interesting. What abt /dev/cpuset view? 

host only for now

best,
Herbert

> Is that same for all containers or do you restrict that view 
> to the containers cpuset only?
> 
> -- 
> Regards,
> vatsa
> ___
> Containers mailing list
> [EMAIL PROTECTED]
> https://lists.osdl.org/mailman/listinfo/containers


Re: _proxy_pda still makes linking modules fail

2007-03-13 Thread Paul Mackerras
Jeremy Fitzhardinge writes:

> Or do you mean that if you have:
> 
>   preempt_disable();
>   use_my_percpu++;
>   preempt_enable();
>   // switch cpus
>   preempt_disable();
>   use_my_percpu++;
>   preempt_enable();
> 
> then it will still use the old pointer to use_my_percpu?

Yes.  It can, and sometimes does.  There's no way (that I know of) to
tell gcc "all my __thread variables might have moved to a different
address".

> In principle gcc could CSE the value of smp_processor_id() across a cpu
> change in the same way.

There it's easier to make gcc do what we want, because we can use a
barrier or a volatile.  The difference is that smp_processor_id() is
ultimately the value of something, not the address of something.  We
can tell gcc "values might have changed" but have no way to say
"addresses might have changed".

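A small user-space sketch of the "values" half of that, using a GCC-style
compiler barrier modelled on the kernel's barrier().  The function name is
made up for the example, and there is no equivalent trick shown here for
addresses, which is exactly the problem:

```c
#include <assert.h>
#include <stddef.h>

/* GCC-style compiler barrier: memory contents may have changed. */
#define barrier() __asm__ __volatile__("" : : : "memory")

/*
 * Reads *p twice.  Without the barrier the compiler is free to reuse
 * the first load; with it, the second load has to be issued again.
 * This copes with a changed value, but does nothing to make the
 * compiler recompute an address it has already worked out.
 */
int read_twice(const int *p)
{
	int first, second;

	assert(p != NULL);
	first = *p;
	barrier();
	second = *p;
	return first + second;
}
```
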
Paul.


Re: [RFC] Heads up on sys_fallocate()

2007-03-13 Thread David Chinner
On Tue, Mar 06, 2007 at 10:46:56AM -0600, Eric Sandeen wrote:
> Ulrich Drepper wrote:
> > Christoph Hellwig wrote:
> >> fallocate with the whence argument and flags is already quite complicated,
> >> I'd rather have another call for placement decisions, that would
> >> be called on an fd to do placement decisions for any further allocations
> >> (prealloc, write, etc)
> > 
> > Yes, posix_fallocate shouldn't be made more complicated.  But I don't
> > understand why requesting linear layout of the blocks should be an
> > option.  It's always an advantage if the blocks requested this way are
> > linear on disk.  So, the kernel should always do its best to make this
> > happen, without needing an additional option.
> > 
> 
> Agreed on both points.  The hints would be for things like start block,
> or speculative EOF preallocation, not contiguity, which I think should
> always be the goal.

ISTR having had this discussion before ;)

About guided preallocation for defrag:

http://marc.info/?t=11624785951&r=1&w=2

e.g.: The sorts of policies we need for effective use of
preallocation:

http://marc.info/?l=linux-fsdevel&m=116184475308164&w=2
http://marc.info/?l=linux-fsdevel&m=116278169519095&w=2

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: [ck] Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2

2007-03-13 Thread Sanjoy Mahajan
> a previous discussion that said 4 was the default...I don't see
> why. nice uses +10 by default on all linux distro...So I suspect
> that if Mike just used "nice lame" instead of "nice +5 lame", he
> would have got what he wanted.

tcsh, and probably csh, has a builtin 'nice' with default +4.  So

  tcsh% nice ps -l

will show a process with nice +4.  If you tell it not to use the builtin,

  tcsh% \nice ps -l

then it uses /usr/bin/nice and you get +10.  bash doesn't have a nice
builtin, so it always uses /usr/bin/nice and you get +10 by default.

-Sanjoy

`Not all those who wander are lost.' (J.R.R. Tolkien)


Re: sys_write() racy for multi-threaded append?

2007-03-13 Thread Michael K. Edwards

On 3/13/07, Christoph Hellwig <[EMAIL PROTECTED]> wrote:

Michael, please stop spreading this utter bullshit _now_.  You're so
full of half-knowledge that it's not funny anymore, and you try
to insult people knowing a few magnitudes more than you left and right.


Thank you Christoph for that informative response to my comments.  I
take it that you consider read_write.c to be code of the highest
quality and maintainability.  If you have something specific in mind
when you write "utter bullshit" and "half-knowledge", I'd love to hear
it.

Now, for those who still care to respond as if improving the kernel
were a goal that you and I can share, a question:  When
generic_file_llseek needs the inode in order to retrieve the current
file size, it goes through f_mapping (the pagecache entry?) rather
than through f_path.dentry (the dentry cache?).  All other inode
retrievals in read_write.c go through f_path.dentry.  Why?  Or is this
a question that can only be asked on linux-fsdevel?

Cheers,
- Michael


[PATCH 1/2] wistron_btns: Add acerhk laptop database

2007-03-13 Thread Eric Piel
This patch adds all the "tm_new" laptop information that is in acerhk 
to wistron_btns. That's about 25 more laptops. Obviously, I couldn't try 
them all; I've only tried the Aspire 3020. For this reason, I've also 
added a printk which asks the users of those laptops to confirm to me that 
it works (or not). Surprisingly, the DMI information could be found on 
Google for a majority of the laptops, so it might not work so badly.


The information about which laptop has which LED is also imported; 
however, for now it doesn't do anything. It's there in case someone adds 
LED support later, to avoid hunting through acerhk for the information 
a second time.


Eric
From: Eric Piel <[EMAIL PROTECTED]>

wistron_btns: Add acerhk laptop database

acerhk already supports a lot of laptops. Let's import its database so that everyone can benefit
from the work of Olaf Tauber. Only the "tm_new" laptops were imported. "tm_old" laptops could be imported too,
but that requires more testing, and probably only a few of those laptops are still alive. "dritek" laptops should
probably be imported into a different driver. Also compress the keymaps by fitting each entry into
an int. Most of the DMI matching was written based on Google searches, so it's rather prone to errors.
That's why I'm asking people to confirm it works.
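
A quick user-space check of the "fits in an int" part, assuming a typical ABI
where unsigned is 4 bytes and 4-byte aligned. The _old/_new struct names and
make_entry() are made up for the example, and uint8_t/uint16_t stand in for
the kernel's u8/u16:

```c
#include <assert.h>
#include <stdint.h>

/* Layout before the change: keycode is a full unsigned. */
struct key_entry_old {
	char type;
	uint8_t code;
	unsigned keycode;
};

/* Layout after the change: keycode shrunk to 16 bits. */
struct key_entry_new {
	char type;
	uint8_t code;
	uint16_t keycode;
};

/* Build a compact entry, checking that the key code still fits. */
struct key_entry_new make_entry(char type, uint8_t code, unsigned keycode)
{
	struct key_entry_new e;

	assert(keycode <= UINT16_MAX);
	e.type = type;
	e.code = code;
	e.keycode = (uint16_t)keycode;
	return e;
}
```

On such an ABI the old entry is padded out to 8 bytes while the new one takes
4, so each keymap table shrinks to about half its size.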

This adds the following hardware:
Acer TravelMate 370
Acer TravelMate 380
Acer TravelMate C300
Acer TravelMate C100
Acer TravelMate C110
Acer TravelMate 250
Acer TravelMate 350
Acer TravelMate 620
Acer TravelMate 630
Acer TravelMate 220
Acer TravelMate 230
Acer TravelMate 260
Acer TravelMate 280
Acer TravelMate 360
Acer TravelMate 2100
Acer TravelMate 2410
Acer Aspire 1500
Acer Aspire 1600
Acer Aspire 3020
Acer Aspire 5020
Medion MD 2900
Medion MD 40100
Medion MD 95400
Medion MD 96500
Fujitsu Siemens Amilo 7820

Signed-off-by: Eric Piel <[EMAIL PROTECTED]>

--- linux-2.6.21/drivers/input/misc/wistron_btns.c~tm610	2007-03-10 01:41:23.0 +0100
+++ linux-2.6.21/drivers/input/misc/wistron_btns.c	2007-03-12 00:54:54.0 +0100
@@ -233,11 +233,15 @@ static void bios_set_state(u8 subsys, in
 struct key_entry {
 	char type;		/* See KE_* below */
 	u8 code;
-	unsigned keycode;	/* For KE_KEY */
+	u16 keycode;		/* For KE_KEY */
 };
 
 enum { KE_END, KE_KEY, KE_WIFI, KE_BLUETOOTH };
 
+#define FE_MAIL_LED 0x01
+#define FE_WIFI_LED 0x02
+#define FE_UNTESTED 0x80
+
 static const struct key_entry *keymap; /* = NULL; Current key map */
 static int have_wifi;
 static int have_bluetooth;
@@ -288,7 +292,16 @@ static struct key_entry keymap_wistron_m
 	{ KE_KEY,  0x13, KEY_PROG3 },
 	{ KE_KEY,  0x31, KEY_MAIL },
 	{ KE_KEY,  0x36, KEY_WWW },
-	{ KE_END,  0 }
+	{ KE_END, FE_MAIL_LED }
+};
+
+static struct key_entry keymap_wistron_md40100[] = {
+	{ KE_KEY, 0x01, KEY_HELP },
+	{ KE_KEY, 0x02, KEY_CONFIG },
+	{ KE_KEY, 0x31, KEY_MAIL },
+	{ KE_KEY, 0x36, KEY_WWW },
+	{ KE_KEY, 0x37, KEY_SCREEN }, /* Display on/off */
+	{ KE_END, FE_MAIL_LED | FE_WIFI_LED | FE_UNTESTED }
 };
 
 static struct key_entry keymap_wistron_ms2141[] = {
@@ -305,23 +318,163 @@ static struct key_entry keymap_wistron_m
 };
 
 static struct key_entry keymap_acer_aspire_1500[] = {
+	{ KE_KEY, 0x01, KEY_HELP },
+	{ KE_KEY, 0x03, KEY_POWER },
 	{ KE_KEY, 0x11, KEY_PROG1 },
 	{ KE_KEY, 0x12, KEY_PROG2 },
 	{ KE_WIFI, 0x30, 0 },
 	{ KE_KEY, 0x31, KEY_MAIL },
 	{ KE_KEY, 0x36, KEY_WWW },
+	{ KE_KEY, 0x49, KEY_CONFIG },
 	{ KE_BLUETOOTH, 0x44, 0 },
-	{ KE_END, 0 }
+	{ KE_END, FE_UNTESTED }
+};
+
+static struct key_entry keymap_acer_aspire_1600[] = {
+	{ KE_KEY, 0x01, KEY_HELP },
+	{ KE_KEY, 0x03, KEY_POWER },
+	{ KE_KEY, 0x08, KEY_MUTE },
+	{ KE_KEY, 0x11, KEY_PROG1 },
+	{ KE_KEY, 0x12, KEY_PROG2 },
+	{ KE_KEY, 0x13, KEY_PROG3 },
+	{ KE_KEY, 0x31, KEY_MAIL },
+	{ KE_KEY, 0x36, KEY_WWW },
+	{ KE_KEY, 0x49, KEY_CONFIG },
+	{ KE_WIFI, 0x30, 0 },
+	{ KE_BLUETOOTH, 0x44, 0 },
+	{ KE_END, FE_MAIL_LED | FE_UNTESTED }
+};
+
+/* 3020 has been tested */
+static struct key_entry keymap_acer_aspire_5020[] = {
+	{ KE_KEY, 0x01, KEY_HELP },
+	{ KE_KEY, 0x03, KEY_POWER },
+	{ KE_KEY, 0x05, KEY_MEDIA }, /* Display switch */
+	{ KE_KEY, 0x11, KEY_PROG1 },
+	{ KE_KEY, 0x12, KEY_PROG2 },
+	{ KE_KEY, 0x31, KEY_MAIL },
+	{ KE_KEY, 0x36, KEY_WWW },
+	{ KE_KEY, 0x6a, KEY_CONFIG },
+	{ KE_WIFI, 0x30, 0 },
+	{ KE_BLUETOOTH, 0x44, 0 },
+	{ KE_END, FE_MAIL_LED | FE_UNTESTED }
+};
+
+static struct key_entry keymap_acer_travelmate_2410[] = {
+	{ KE_KEY, 0x01, KEY_HELP },
+	{ KE_KEY, 0x6d, KEY_POWER },
+	{ KE_KEY, 0x11, KEY_PROG1 },
+	{ KE_KEY, 0x12, KEY_PROG2 },
+	{ KE_KEY, 0x31, KEY_MAIL },
+	{ KE_KEY, 0x36, KEY_WWW },
+	{ KE_KEY, 0x6a, KEY_CONFIG },
+	{ KE_WIFI, 0x30, 0 },
+	{ KE_BLUETOOTH, 0x44, 0 },
+	{ KE_END, FE_MAIL_LED | FE_UNTESTED }
+};
+
+static struct key_entry keymap_acer_travelmate_110[] = {
+	{ KE_KEY, 0x01, KEY_HELP },
+	{ KE_KEY, 0x02, KEY_CONFIG },
+	{ KE_KEY, 0x03, KEY_POWER },
+	{ KE_KEY, 0x08, KEY_MUTE },
+	{ KE_KEY, 0x11, KEY_PROG1 },
+	{ KE_KEY, 0x12, KEY_PROG2 },
+	{ KE_KEY, 0x20, KEY_VOLUM

RSDL development plans

2007-03-13 Thread Con Kolivas
On Wednesday 14 March 2007 07:58, Con Kolivas wrote:
> On Wednesday 14 March 2007 03:03, Con Kolivas wrote:
> > On Wednesday 14 March 2007 02:31, Con Kolivas wrote:
> > > On Monday 12 March 2007 22:26, Al Boldi wrote:
> > > > I think, it should be possible to spread this max expiration latency
> > > > across the rotation, should it not?
> > >
> > > Can you try the attached patch please Al and Mike? It "dithers" the
> > > priority bitmap which tends to fluctuate the latency a lot more but in
> > > a cyclical fashion. This tends to make the max latency bound to a
> > > smaller value and should make it possible to run -nice tasks without
> > > killing the latency of the non niced tasks. Eg you could possibly run X
> > > nice -10 at a guess like we used to in 2.4 days. It's not essential of
> > > course, but is a workaround for Mike's testcase.
> >
> > Oops, one tiny fix. This is a respin of the patch, sorry.

> Bah with a bit more sleep under my belt it became clear that I forgot to
> update the expired array in any proper way so this change almost breaks
> stuff at the moment in the shape it's in. Please disregard this change for
> now apart from interest in how I'm tackling the nice issue.

The rsdl patches queued up so far are stable and boot fine and are reasonably 
performant on many architectures so I'm quite happy for them to get a run 
in -mm. The changes planned will (as you may have seen on this email thread) 
decrease average latencies across all nice levels, and make differential nice 
levels run better together. This will allow -nice to be used without 
significant latency harm to not niced tasks (as there is presently in rsdl 
and mainline). The change required on top of the patch earlier in this email 
is to make the dynamic bitmap reflect where the tasks will actually be on an 
array swap.

However, I must inform people that I have to arrest the RSDL development for 
at least this week. I have a new and fairly serious neck problem that is 
being exacerbated badly by sitting in front of the computer for any extended 
period.

I suspect the inner workings of RSDL are not yet understood well enough by 
anyone else to hack on it. I'm not at all opposed to someone 
taking up the code at the moment and making the necessary changes I've 
mentioned above in the meantime though, if they can get their head around it.

-- 
-ck
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: /proc/kallsyms race vs module unload

2007-03-13 Thread Alexey Dobriyan
On Tue, Mar 13, 2007 at 06:49:50PM +, Paulo Marques wrote:
> Alexey Dobriyan wrote:
> >[...]
> >What happens is that module_get_kallsym() drops module_mutex,
> >returns "struct module *", module unloaded, "struct module *"
> >used.
>
> The only use for the "struct module *" is to display the name of the
> module.

Ehh?

> This can be solved by adding a "char mod_name[MODULE_NAME_LEN];" field
> to "kallsym_iter" and copy the name of the module over, while still
> holding module_mutex. It would be slightly slower, but safer.

	iter->owner = module_get_kallsym(iter->pos - kallsyms_num_syms,
					 &iter->value, &iter->type,
					 iter->name, sizeof(iter->name));
	if (iter->owner == NULL)
		return 0;

	/* Label it "global" if it is exported, "local" if not exported. */
	iter->type = is_exported(iter->name, iter->owner)
	                                     ^^^^^^^^^^^
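
The pattern under discussion here (take what you need while the lock is held, instead of keeping a pointer that can dangle after unload) can be sketched in user space. This is only an illustration: struct sketch_module, struct sketch_iter and the sketch_lock helpers are hypothetical stand-ins, not kernel code, and the module lookup itself is assumed to have happened under the same lock. Note that the sketch copies the "exported" answer as well as the name, since the snippet above shows the owner pointer being used for both.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_NAME_LEN 64	/* assumption: stands in for MODULE_NAME_LEN */

/* Hypothetical stand-in for the parts of struct module used here. */
struct sketch_module {
	char name[SKETCH_NAME_LEN];
	bool symbol_exported;
};

/* What the iterator keeps instead of a "struct module *". */
struct sketch_iter {
	char mod_name[SKETCH_NAME_LEN];
	bool exported;
};

/* Stand-in for module_mutex: a flag, so misuse trips an assertion. */
static int sketch_lock_held;

static void sketch_lock(void)
{
	assert(!sketch_lock_held);
	sketch_lock_held = 1;
}

static void sketch_unlock(void)
{
	assert(sketch_lock_held);
	sketch_lock_held = 0;
}

/*
 * Copy everything the caller will need while the lock is held.
 * Returns false if there is no module to describe.
 */
bool sketch_fill_iter(struct sketch_iter *iter, const struct sketch_module *mod)
{
	bool found = false;

	sketch_lock();
	if (mod != NULL) {
		snprintf(iter->mod_name, sizeof(iter->mod_name), "%s", mod->name);
		iter->exported = mod->symbol_exported;
		found = true;
	}
	sketch_unlock();
	return found;
}
```

After sketch_fill_iter() returns, the iterator no longer depends on the module's memory, so the module going away cannot invalidate what is printed later.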



[PATCH 2/2] wistron_btns: Generic keymap

2007-03-13 Thread Eric Piel
This patch adds a generic map. That is, a keymap that should output the 
correct keycodes for most laptops. This is simply based on the 
observation of all those keymaps already gathered, as most of the 
wistron codes are always mapped to the same keycode.


Hopefully, this way users who have an unsupported laptop will have a 
quick and dirty way to use the multimedia keys.


Eric
From: Eric Piel <[EMAIL PROTECTED]>

wistron_btns: Generic keymap

It turns out that the mapping of the wistron codes is always the same, the main
difference being that some keys may not exist and some LEDs might not be
present. Therefore it's possible to write a generic keymap which allows
the use of an unknown keyboard. The user can select it by specifying the
parameter "keymap=generic".

Signed-off-by: Eric Piel <[EMAIL PROTECTED]>

--- linux-2.6.21/drivers/input/misc/wistron_btns.c	2007-03-12 00:53:51.0 +0100
+++ linux-2.6.21/drivers/input/misc/wistron_btns.c~full	2007-03-12 00:39:26.0 +0100
@@ -58,7 +58,7 @@ MODULE_PARM_DESC(force, "Load even if co
 
 static char *keymap_name; /* = NULL; */
 module_param_named(keymap, keymap_name, charp, 0);
-MODULE_PARM_DESC(keymap, "Keymap name, if it can't be autodetected");
+MODULE_PARM_DESC(keymap, "Keymap name, if it can't be autodetected [generic, 1557/MS2141]");
 
 static struct platform_device *wistron_device;
 
@@ -562,6 +562,42 @@ static struct key_entry keymap_wistron_m
 	{ KE_END, FE_UNTESTED }
 };
 
+static struct key_entry keymap_wistron_generic[] = {
+	{ KE_KEY, 0x01, KEY_HELP },
+	{ KE_KEY, 0x02, KEY_CONFIG },
+	{ KE_KEY, 0x03, KEY_POWER },
+	{ KE_KEY, 0x05, KEY_MEDIA }, /* Display switch */
+	{ KE_KEY, 0x06, KEY_SCREEN }, /* Display on/off */
+	{ KE_KEY, 0x08, KEY_MUTE },
+	{ KE_KEY, 0x11, KEY_PROG1 },
+	{ KE_KEY, 0x12, KEY_PROG2 },
+	{ KE_KEY, 0x13, KEY_PROG3 },
+	{ KE_KEY, 0x14, KEY_MAIL },
+	{ KE_KEY, 0x15, KEY_WWW },
+	{ KE_KEY, 0x20, KEY_VOLUMEUP },
+	{ KE_KEY, 0x21, KEY_VOLUMEDOWN },
+	{ KE_KEY, 0x22, KEY_REWIND },
+	{ KE_KEY, 0x23, KEY_FORWARD },
+	{ KE_KEY, 0x24, KEY_PLAYPAUSE },
+	{ KE_KEY, 0x25, KEY_STOPCD },
+	{ KE_KEY, 0x31, KEY_MAIL },
+	{ KE_KEY, 0x36, KEY_WWW },
+	{ KE_KEY, 0x37, KEY_SCREEN }, /* Display on/off */
+	{ KE_KEY, 0x40, KEY_WLAN },
+	{ KE_KEY, 0x49, KEY_CONFIG },
+	{ KE_KEY, 0x4a, KEY_CLOSE }, /* lid close */
+	{ KE_KEY, 0x4b, KEY_OPEN }, /* lid open */
+	{ KE_KEY, 0x6a, KEY_CONFIG },
+	{ KE_KEY, 0x6d, KEY_POWER },
+	{ KE_KEY, 0x71, KEY_STOPCD },
+	{ KE_KEY, 0x72, KEY_PLAYPAUSE },
+	{ KE_KEY, 0x74, KEY_REWIND },
+	{ KE_KEY, 0x78, KEY_FORWARD },
+	{ KE_WIFI, 0x30, 0 },
+	{ KE_BLUETOOTH, 0x44, 0 },
+	{ KE_END, 0 }
+};
+
 /*
  * If your machine is not here (which is currently rather likely), please send
  * a list of buttons and their key codes (reported when loading this module
@@ -880,15 +916,17 @@ static struct dmi_system_id dmi_ids[] __
 
 static int __init select_keymap(void)
 {
+	dmi_check_system(dmi_ids);
 	if (keymap_name != NULL) {
 		if (strcmp (keymap_name, "1557/MS2141") == 0)
 			keymap = keymap_wistron_ms2141;
+		else if (strcmp (keymap_name, "generic") == 0)
+			keymap = keymap_wistron_generic;
 		else {
 			printk(KERN_ERR "wistron_btns: Keymap unknown\n");
 			return -EINVAL;
 		}
 	}
-	dmi_check_system(dmi_ids);
 	if (keymap == NULL) {
 		if (!force) {
 			printk(KERN_ERR "wistron_btns: System unknown\n");
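
For readers unfamiliar with these tables, here is a minimal sketch of how a KE_END-terminated keymap like the one above can be searched. It is an assumption-laden illustration, not the driver's code: struct sketch_entry, the SK_* constants and sketch_lookup() are hypothetical names, and the keycode values are placeholders rather than the real ones from the input headers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical entry types, mirroring KE_END/KE_KEY/KE_WIFI/KE_BLUETOOTH. */
enum sketch_type { SK_END, SK_KEY, SK_WIFI, SK_BLUETOOTH };

/* Placeholder keycodes; the real driver uses KEY_* from the input headers. */
enum sketch_keycode { SK_NONE, SK_HELP, SK_MUTE, SK_MAIL };

struct sketch_entry {
	enum sketch_type type;
	unsigned char code;		/* code reported by the hardware */
	enum sketch_keycode keycode;	/* what it is translated to */
};

/* A cut-down "generic" table: two hardware codes map to the same key. */
static const struct sketch_entry sketch_generic[] = {
	{ SK_KEY, 0x01, SK_HELP },
	{ SK_KEY, 0x08, SK_MUTE },
	{ SK_KEY, 0x14, SK_MAIL },
	{ SK_KEY, 0x31, SK_MAIL },
	{ SK_WIFI, 0x30, SK_NONE },
	{ SK_END, 0, SK_NONE }
};

/* Walk the table until the terminator; NULL means "unknown code". */
const struct sketch_entry *sketch_lookup(const struct sketch_entry *map,
					 unsigned char code)
{
	assert(map != NULL);
	for (; map->type != SK_END; map++)
		if (map->code == code)
			return map;
	return NULL;
}
```

Because several laptops use different codes for the same function, a generic table can simply list every known code; a code the hardware never sends costs nothing beyond the extra table entry.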


[PATCH 0/2] wistron_btns: More keymaps

2007-03-13 Thread Eric Piel

Hello,

As a sequel to my patch "Wistron button support for TravelMate 610" of 
last week, here is a bigger addition of keymaps for the wistron_btns.


Patch 1 adds the entire acerhk database that fits this driver (about 25 
more laptops).
Patch 2 adds a generic map that should fit most users but has the 
disadvantage of not being automatic.


Dmitry, I've tried to make them against your tree. Still, if they don't 
apply cleanly, just tell me and I'll try harder!


See you,
Eric


Re: Linux 2.6.20.3

2007-03-13 Thread Nish Aravamudan

On 3/13/07, David Miller <[EMAIL PROTECTED]> wrote:
> From: "Nish Aravamudan" <[EMAIL PROTECTED]>
> Date: Tue, 13 Mar 2007 14:58:24 -0700
>
> > On 3/13/07, Nish Aravamudan <[EMAIL PROTECTED]> wrote:
> > > On 3/13/07, Greg KH <[EMAIL PROTECTED]> wrote:
> > > > We (the -stable team) are announcing the release of the 2.6.20.3 kernel.
> > > > It contains a number of bugfixes and all 2.6.20 users are recommended to
> > > > upgrade.
> > > >
> > > > The diffstat and short summary of the fixes are below.
> > > >
> > > > I'll also be replying to this message with a copy of the patch between
> > > > 2.6.20.2 and 2.6.20.3.
> > >
> > > Compared to 2.6.20.1 (will try 2.6.20.2 as well), I now get:
> >
> > err, duh -- this is a Sun Ultra 60, debian testing install.
>
> Figure out if 2.6.20.2 does it too, then please try to git bisect
> it down further.

Yep, that's the plan, just wanted to make folks aware.

> I took a quick look and the two sparc64 commits between 2.6.20.1
> and 2.6.20.2 are benign, a fix for E450 interrupts and a kenvctrld
> fix which is for a driver for hardware your ultra60 doesn't have. :)
>
> There is a decent amount of raid and nfs fixes in here, do you
> use either?

Neither.

> Another commit that might be relevant is:
>
> commit 530b09160744a12450fdacb2b78779c9830a29c8
> Author: Aristeu Sergio Rozanski Filho <[EMAIL PROTECTED]>
> Date:   Thu Mar 1 19:02:55 2007 -0500
>
> tty_io: fix race in master pty close/slave pty close path
>
> Hmmm...
>
> Please let us know if you can narrow it down further.

Building 2.6.20.2 right now, will let you know.

Thanks,
Nish


Re: [PATCH 1/1] LinuxPPS: Pulse per Second support for Linux

2007-03-13 Thread Lennart Sorensen
On Tue, Mar 13, 2007 at 10:38:43PM +0100, Rodolfo Giometti wrote:
> here my new patch for PPS support in Linux.
> 
> I tried to follow your suggestions as much possible! Please let me
> know if this new version could be more acceptable.

I have tried out 3.0.0-rc2 which seems to work pretty well so far (when
combined with the patches to the jsm driver I just posted).  It took some
work to get ntp's refclock_nmea to work though, since the patch that is
linked to from the linuxpps page seems out of date.  Here is the patch
that seems to be working for me, although I am still testing it.  Given
you know the linuxpps code better perhaps you can see if it looks sane
to you.

--- ntpd/refclock_nmea.c.ori	2007-03-13 18:38:01.0 -0400
+++ ntpd/refclock_nmea.c	2007-03-13 18:44:47.0 -0400
@@ -79,6 +79,7 @@
 #define RANGEGATE  50  /* range gate (ns) */
 
 #define LENNMEA 75  /* min timecode length */
+#define LENPPS PPS_MAX_NAME_LEN
 
 /*
  * Tables to compute the ddd of year form icky dd/mm timecode. Viva la
@@ -99,6 +100,7 @@
pps_params_t pps_params; /* pps parameters */
pps_info_t pps_info;/* last pps data */
pps_handle_t handle;/* pps handlebars */
+   int handle_created; /* pps handle created flag */
 #endif /* HAVE_PPSAPI */
 };
 
@@ -147,6 +149,11 @@
register struct nmeaunit *up;
struct refclockproc *pp;
int fd;
+#ifdef PPS_HAVE_FINDPATH
+   char id[LENPPS] = "",
+path[LENPPS],
+mylink[LENPPS] = "";/* just a default device */
+#endif /* PPS_HAVE_FINDPATH */
char device[20];
 
/*
@@ -201,7 +208,20 @@
 #else
 return (0);
 #endif
-}
+} else {
+struct serial_struct  ss;
+if (ioctl(fd, TIOCGSERIAL, &ss) < 0 ||
+(
+ss.flags |= ASYNC_HARDPPS_CD,
+ ioctl(fd, TIOCSSERIAL, &ss)) < 0) {
+ msyslog(LOG_NOTICE, "refclock_nmea: TIOCSSERIAL fd %d, %m", fd);
+ msyslog(LOG_NOTICE,
+ "refclock_nmea: optional PPS processing not available");
+} else {
+msyslog(LOG_INFO,
+"refclock_nmea: PPS detection on");
+}
+   }
 
/*
 * Allocate and initialize unit structure
@@ -238,12 +258,26 @@
 * Start the PPSAPI interface if it is there. Default to use
 * the assert edge and do not enable the kernel hardpps.
 */
+#ifdef PPS_HAVE_FINDPATH
+   /* Get the PPS source's real name */
+   //time_pps_readlink(mylink, LENPPS, path, LENPPS);
+   time_pps_readlink(device, LENPPS, path, LENPPS);
+
+   /* Try to find the source */
+   fd = time_pps_findpath(path, LENPPS, id, LENPPS);
+   if (fd < 0) {
+   msyslog(LOG_ERR, "refclock_nmea: cannot find PPS path \"%s\" in the system", path);
+   return (0);
+   }
+   msyslog(LOG_INFO, "refclock_nmea: found PPS source \"%s\" at id #%d on \"%s\"", path, fd, id);
+#endif /* PPS_HAVE_FINDPATH */
if (time_pps_create(fd, &up->handle) < 0) {
-   up->handle = 0;
+   up->handle_created = 0;
msyslog(LOG_ERR,
"refclock_nmea: time_pps_create failed: %m");
return (1);
}
+   up->handle_created = ~0;
return(nmea_ppsapi(peer, 0, 0));
 #else
return (1);
@@ -265,8 +299,10 @@
pp = peer->procptr;
up = (struct nmeaunit *)pp->unitptr;
 #ifdef HAVE_PPSAPI
-   if (up->handle != 0)
+   if (up->handle_created) {
time_pps_destroy(up->handle);
+   up->handle_created = 0;
+   }
 #endif /* HAVE_PPSAPI */
io_closeclock(&pp->io);
free(up);
@@ -374,7 +410,7 @@
/*
 * Convert the timespec nanoseconds field to ntp l_fp units.
 */ 
-   if (up->handle == 0)
+   if (!up->handle_created)
return (0);
timeout.tv_sec = 0;
timeout.tv_nsec = 0;

--
Len Sorensen


Re: [5/6] 2.6.21-rc3: known regressions

2007-03-13 Thread Tomáš Janoušek
Hi,

On Tue, Mar 13, 2007 at 04:56:24PM +0100, Tomáš Janoušek wrote:
> On Tue, Mar 13, 2007 at 04:51:39PM +0100, [EMAIL PROTECTED] wrote:
> > Can you please try to compile without nohz and without hrtimers and try it
> > again?
>
> A colleage told me to try this yesterday and if I remember correctly, it did
> not help. I may try it again because I'm not sure whether it wasn't some

Ok, this was bullshit. Nohz and hrtimers turned off really solve the issue with
having to press keys.

Seems like the yesterday's check was for the other issue and I just pressed
the keys automatically, remembering that I had to.

Sorry,
-- 
Tomáš Janoušek, a.k.a. Liskni_si, http://work.lisk.in/


Re: module.h and moduleparam.h: more header file pedantry

2007-03-13 Thread Robert P. J. Day
On Wed, 14 Mar 2007, Alexey Dobriyan wrote:

> On Mon, Mar 12, 2007 at 12:59:20PM -0400, Robert P. J. Day wrote:
> >   to my surprise, i learned only today that module.h includes
> > moduleparam.h, which flies in the face of all of the documentation
> > i've ever read which was adamant that i *had* to include moduleparam.h
> > if i was using parameters. i'm guessing this comes as a surprise to
> > the 400+ header files which include both unnecessarily.
> >
> >   so ... in a perfect world, should a module source file that doesn't
> > use parameters *at all* need to include moduleparam.h?
>
> Probably not.
>
> > as it stands
> > now, yes, it does, given some ugly inter-dependencies between the two
> > files.
> >
> >   so, at the very least, programmers can stop including moduleparam.h,
> > unless there's a cleaner way to do all that.
>
> Regardless, of what you'll do: cross-compile test!
>
> After aforementioned removal and adding "struct kernel_param;"
>
> + akmk arm-assabet -k
>   CHK include/linux/version.h
> make[2]: `include/asm-arm/mach-types.h' is up to date.
>   Using /home/linux/linux-irq-flags-t as source for kernel
>   GEN /home/linux/build/arm-assabet/Makefile
>   CHK include/linux/utsrelease.h
>   CHK include/linux/compile.h
>   CC  arch/arm/nwfpe/fpmodule.o
> arch/arm/nwfpe/fpmodule.c:179: error: syntax error before string constant
> arch/arm/nwfpe/fpmodule.c:179: warning: type defaults to `int' in declaration 
> of `__MODULE_INFO'
> arch/arm/nwfpe/fpmodule.c:179: warning: function declaration isn't a prototype
> arch/arm/nwfpe/fpmodule.c:179: warning: data definition has no type or 
> storage class

oh, i've already been by that and figured out what's going on.  i'm
going to summarize this on the KJ wiki.  it's really quite the mess.

rday
-- 

Robert P. J. Day
Linux Consulting, Training and Annoying Kernel Pedantry
Waterloo, Ontario, CANADA

http://fsdev.net/wiki/index.php?title=Main_Page



jsm driver fix for linuxpps support

2007-03-13 Thread Lennart Sorensen
The jsm driver doesn't currently use the uart_handle_*_change helper
functions, which are the obvious place for things like linuxpps to tie
into (which it now does of course), and as a result the jsm driver can
not be used with linuxpps and anything else that ties into the
serial_core helper functions.  This patch adds calls to these helper
functions whenever the value they manage changes.  The actual storage
of the state is not modified since the jsm driver caches the current
settings (the 8250 driver reads them every time a user asks for the
state), and only updates them whenever they change.

Signed-off-by: Len Sorensen <[EMAIL PROTECTED]>

--- a/drivers/serial/jsm/jsm_neo.c  2007-03-01 10:31:28.0 -0500
+++ b/drivers/serial/jsm/jsm_neo.c  2007-03-01 10:18:16.0 -0500
@@ -592,8 +592,13 @@
return;
 
    /* Scrub off lower bits. They signify delta's, which I don't care about */
-   msignals &= 0xf0;
+   /* Keep DDCD and DDSR though */
+   msignals &= 0xf8;
 
+   if (msignals & UART_MSR_DDCD)
+   uart_handle_dcd_change(&ch->uart_port, msignals & UART_MSR_DCD);
+   if (msignals & UART_MSR_DDSR)
+   uart_handle_cts_change(&ch->uart_port, msignals & UART_MSR_CTS);
if (msignals & UART_MSR_DCD)
ch->ch_mistat |= UART_MSR_DCD;
else
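
As a reading aid for the mask change in this hunk, here is a small sketch that reports which "delta" bits of a 16550-style modem status register survive a given mask. The bit values are the standard 16550 MSR layout; the SK_ names and sketch_kept_delta_bits() are made up for this illustration and are not part of the jsm driver.

```c
#include <assert.h>

/* Standard 16550 modem status register layout. */
#define SK_MSR_DCTS 0x01u	/* delta CTS */
#define SK_MSR_DDSR 0x02u	/* delta DSR */
#define SK_MSR_TERI 0x04u	/* trailing edge ring indicator */
#define SK_MSR_DDCD 0x08u	/* delta DCD */
#define SK_MSR_CTS  0x10u
#define SK_MSR_DSR  0x20u
#define SK_MSR_RI   0x40u
#define SK_MSR_DCD  0x80u

/* Return the delta bits (low nibble) that "msr & mask" would retain. */
unsigned int sketch_kept_delta_bits(unsigned int mask)
{
	unsigned int all_deltas = SK_MSR_DCTS | SK_MSR_DDSR |
				  SK_MSR_TERI | SK_MSR_DDCD;

	assert(mask <= 0xffu);
	return mask & all_deltas;
}
```

With these values, a mask of 0xf0 keeps no delta bits and 0xf8 keeps DDCD only; a mask that also kept DDSR would be 0xfa, and one that kept DCTS would have bit 0x01 set.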


Small fixes for jsm driver

2007-03-13 Thread Lennart Sorensen
The jsm driver fails when you try to use the TIOCSSERIAL ioctl.  The
reason is that the driver never sets uart_port.uartclk, causing the data
received using TIOCGSERIAL to not match the internal state of the driver.
This patch fixes this problem by setting the uartclk to the value used
by the serial_core (16 times the baud base).

Signed-off-by: Len Sorensen <[EMAIL PROTECTED]>

--- a/drivers/serial/jsm/jsm_tty.c  2007-03-13 15:53:39.0 -0400
+++ b/drivers/serial/jsm/jsm_tty.c  2007-03-13 15:55:15.0 -0400
@@ -471,6 +471,7 @@
continue;
 
brd->channels[i]->uart_port.irq = brd->irq;
+   brd->channels[i]->uart_port.uartclk = 14745600;
brd->channels[i]->uart_port.type = PORT_JSM;
brd->channels[i]->uart_port.iotype = UPIO_MEM;
brd->channels[i]->uart_port.membase = brd->re_map_membase;
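
The "16 times the baud base" relationship from the description can be checked with a few lines of arithmetic. This is a sketch with hypothetical helper names, assuming only the factor of 16 stated above; it is not code from serial_core.

```c
#include <assert.h>
#include <limits.h>

/* uartclk is 16 times the baud base, so the baud base is uartclk / 16. */
unsigned int sketch_uartclk_to_baud_base(unsigned int uartclk)
{
	return uartclk / 16;
}

/* Inverse direction; guard against overflow of the multiplication. */
unsigned int sketch_baud_base_to_uartclk(unsigned int baud_base)
{
	assert(baud_base <= UINT_MAX / 16);
	return baud_base * 16;
}
```

For the value in the patch, 14745600 / 16 gives a baud base of 921600. With uartclk left at zero the derived baud base would be 0, which is consistent with the mismatch described above.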


Re: [QUICKLIST 0/4] Arch independent quicklists V2

2007-03-13 Thread Peter Chubb
> "Jeremy" == Jeremy Fitzhardinge <[EMAIL PROTECTED]> writes:


Jeremy> And do the same in pte pages for actual mapped pages?  Or do
Jeremy> you think they would be too densely populated for it to be
Jeremy> worthwhile?

We've been doing some measurements on how densely clumped ptes are.
On 32-bit platforms, they're pretty dense.  On IA64, quite a bit
sparser, depending on the workload of course.  I think that's mostly because
of the larger pagesize on IA64 -- with 64k pages, you don't need very
many to map a small object.

I'm hoping IanW can give more details.
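
To put rough numbers on the page-size point: the count of pages (and so of pte entries) needed to map an object is just a ceiling division. The helper below is a hypothetical illustration, not measurement code from this work.

```c
#include <assert.h>

/* Number of pages needed to map object_bytes with pages of page_bytes. */
unsigned long sketch_pages_to_map(unsigned long object_bytes,
				  unsigned long page_bytes)
{
	assert(page_bytes != 0);
	return (object_bytes + page_bytes - 1) / page_bytes;
}
```

For example, a 256 KiB object needs 64 entries with 4 KiB pages but only 4 entries with 64 KiB pages, so the same workload touches far fewer slots per pte page.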


Re: Linux 2.6.20.3

2007-03-13 Thread David Miller
From: "Nish Aravamudan" <[EMAIL PROTECTED]>
Date: Tue, 13 Mar 2007 14:58:24 -0700

> On 3/13/07, Nish Aravamudan <[EMAIL PROTECTED]> wrote:
> > On 3/13/07, Greg KH <[EMAIL PROTECTED]> wrote:
> > > We (the -stable team) are announcing the release of the 2.6.20.3 kernel.
> > > It contains a number of bugfixes and all 2.6.20 users are recommended to
> > > upgrade.
> > >
> > > The diffstat and short summary of the fixes are below.
> > >
> > > I'll also be replying to this message with a copy of the patch between
> > > 2.6.20.2 and 2.6.20.3.
> >
> > Compared to 2.6.20.1 (will try 2.6.20.2 as well), I now get:
> 
> err, duh -- this is a Sun Ultra 60, debian testing install.

Figure out if 2.6.20.2 does it too, then please try to git bisect
it down further.

I took a quick look and the two sparc64 commits between 2.6.20.1
and 2.6.20.2 are benign, a fix for E450 interrupts and a kenvctrld
fix which is for a driver for hardware your ultra60 doesn't have. :)

There is a decent amount of raid and nfs fixes in here, do you
use either?

Another commit that might be relevant is:

commit 530b09160744a12450fdacb2b78779c9830a29c8
Author: Aristeu Sergio Rozanski Filho <[EMAIL PROTECTED]>
Date:   Thu Mar 1 19:02:55 2007 -0500

tty_io: fix race in master pty close/slave pty close path

Hmmm...

Please let us know if you can narrow it down further.


FF layer restrictions [Was: [PATCH 1/1] Input: add sensable phantom driver]

2007-03-13 Thread Jiri Slaby

Why did you remove all Cced people? Anyway I filtered some of them out

johann deneux wrote:
> You are right, the direction in ff_effect is meant to be an angle.
> A dirty solution would be to use the 16 bits as two 8-bits angles. Or

That would be a problem as I need 3x 16bits.

> maybe we should change the API. I don't think there are many
> applications using force feedback yet, so maybe that should be ok?
>
> If we change the API, we should remove the assumption that a device has
> at most two axes to render effects. We could for instance have a
> magnitude argument for each axis which is capable of rendering effects.
> That might be necessary even for more common gaming devices like racing
> wheels: One can think pedals could also be capable of force feedback
> some day, not just the steering wheel.

I can do that, but in that case, I need to know how people (especially the
input ones) want me to do it...


regards,
--
http://www.fi.muni.cz/~xslaby/Jiri Slaby
faculty of informatics, masaryk university, brno, cz
e-mail: jirislaby gmail com, gpg pubkey fingerprint:
B674 9967 0407 CE62 ACC8  22A0 32CC 55C3 39D4 7A7E


[ANNOUNCE] iproute2 2.6.20-070313

2007-03-13 Thread Stephen Hemminger
This is an experimental update to the iproute2 command set.

The version number includes the kernel version to denote what features are
supported. The same source should build on older systems, but obviously the
newer kernel features won't be available. As much as possible, this package
tries to be source compatible across releases.

It can be downloaded from:
  http://developer.osdl.org/dev/iproute2/download/iproute2-2.6.20-070313.tar.gz

Repository:
  git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git

For more info on iproute2 see:
  http://linux-net.osdl.org/index.php/Iproute2

Changes:

Jamal Hadi Salim:
  update rest to use nl_mgrp
  nl_mgrp to crap if base multicast groups exceeded
  Old bug on tc

Mike Frysinger:
  do not ignore build failures in subdirs of iproute2

Noriaki TAKAMIYA:
  enabled to manipulate the flags of IFA_F_HOMEADDRESS or IFA_F_NODAD from ip.

Patrick McHardy:
  tbf: fix latency printing
  Use tc_calc_xmittime() where appropriate
  Introduce tc_calc_xmitsize and use where appropriate
  Introduce TIME_UNITS_PER_SEC to represent internal clock resolution
  Replace "usec" by "time" in function names
  Add sprint_ticks() function and use in CBQ
  Handle different kernel clock resolutions
  Increase internal clock resolution to nsec

Stephen Hemminger:
  netem use read/write for changes
  fix tc-pfifo and tc-bfifo man pages
  iptables library fix
  TC bfifo man page
  Use kernel headers from 2.6.20.y

Thomas Hisch:
  Fixes use of uninitialized string



Re: [patch 3/8] per backing_dev dirty and writeback page accounting

2007-03-13 Thread David Chinner
On Tue, Mar 13, 2007 at 09:21:59AM +0100, Miklos Szeredi wrote:
> > > read request
> > > sys_write
> > >   mutex_lock(i_mutex)
> > >   ...
> > >  balance_dirty_pages
> > > submit write requests
> > > loop ... write requests completed ... dirty still over limit ... 
> > >   ... loop forever
> > 
> > Hmmm - the situation in balance_dirty_pages() after an attempt
> > to writeback_inodes(&wbc) that has written nothing because there
> > is nothing to write would be:
> > 
> > wbc->nr_write == write_chunk &&
> > wbc->pages_skipped == 0 &&
> > wbc->encountered_congestion == 0 &&
> > !bdi_congested(wbc->bdi)
> > 
> > What happens if you make that an exit condition to the loop?
> 
> That's almost right.  The only problem is that even if there's no
> congestion, the device queue can be holding a great amount of yet
> unwritten pages.  So exiting on this condition would mean, that
> dirty+writeback could go way over the threshold.

Only if the queue depth is not bound. Queue depths are bound and so
the distance we can go over the threshold is limited.  This is the
fundamental principle on which the throttling is based.

Hence, if the queue is not full, then we will have either written
dirty pages to it (i.e. wbc->nr_write != write_chunk, so we will throttle
or continue normally if write_chunk was written) or we have no more
dirty pages left.

Having no dirty pages left on the bdi and it not being congested
means we effectively have a clean, idle bdi. We should not be trying
to throttle writeback here - we can't do anything to improve the
situation by continuing to try to do writeback on this bdi, so we
may as well give up and let the writer continue. Once we have dirty
pages on the bdi, we'll get throttled appropriately.

The point I'm making here is that if the bdi is not congested, any
pages dirtied on that bdi can be cleaned _quickly_ and so writing
more pages to it isn't a big deal even if we are over the global
dirty threshold.

Remember, the global dirty threshold is not really a hard limit -
it's a threshold at which we change behaviour. Throttling idle bdi's
does not contribute usefully to reducing the number of dirty pages
in the system; all it really does is deny service to devices that could
otherwise be doing useful work.

> How much this would be a problem?  I don't know, I guess it depends on
> many things: how many queues, how many requests per queue, how many
> bytes per request.

Right, and most ppl don't have enough devices in their system for
this to be a problem. Even those of us that do have enough devices
for this to potentially be a problem usually have enough RAM in
the machine so that it is not a problem.

> > Or alternatively, adding another bit to the wbc structure to
> > say "there was nothing to do" and setting that if we find
> > list_empty(&sb->s_dirty) when trying to flush dirty inodes.
> > 
> > [ FWIW, this may also solve another problem of fast block devices
> > being throttled incorrectly when a slow block dev is consuming
> > all the dirty pages... ]
> 
> There may be a patch floating around, which I think basically does
> this, but only as long as the dirty+writeback are over a soft limit,
> but under the hard limit.
> 
> When over the hard limit, balance_dirty_pages still loops until
> dirty+writeback go below the threshold.

The difference between the two methods is that if there is any hard
limit that results in balance_dirty_pages looping then you have a
potential deadlock.  Hence the soft+hard limits will reduce the
occurrence but not remove the deadlock. Breaking out of the loop
when there is nothing to do simply means we'll reenter again
with something to do very shortly (and *then* throttle) if the
process continues to write.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: 2.6.21-rc3-mm1 RSDL results

2007-03-13 Thread Mark Lord

Con Kolivas wrote:
> On Wednesday 14 March 2007 05:21, Mark Lord wrote:
> > Con Kolivas wrote:
> > > Can you try the new version of RSDL. Assuming it doesn't oops on you it
> > > has some accounting bugfixes which may have been biting you.
> >
> > Retesting today with 2.6.21-rc3-git7 + 2.6.21-rc3-sched-rsdl-0.30.patch.
> >
> > Still not pleasant to use the GUI with a kernel build (-j1 or -j2)
> > happening unless the build is manually "nice'd".
> >
> > Also, accounting looks weird in top(1).
> >
> > With a 100% busy machine, top will show something like this :
> >
> > top - 14:20:11 up 10:22,  1 user,  load average: 2.65, 2.80, 2.18
> > Tasks: 134 total,   4 running, 128 sleeping,   0 stopped,   2 zombie
> > Cpu(s): 68.7% us,  6.7% sy, 24.7% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si
> > Mem:   2076964k total,  2002560k used,    74404k free,   148924k buffers
> > Swap:  2409740k total,      244k used,  2409496k free,  1448876k cached
> >
> >   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> >  1824 root      36  10 11748 7244 1936 R  4.0  0.3   0:00.12 cc1
> >  1845 root      31   0  8080 5272 1412 R  1.7  0.3   0:00.05 cc1
> >  4139 root      20   0  176m  35m 6860 S  1.3  1.7  18:59.35 Xorg
> > 29381 root      20   0 33712  16m  12m R  1.0  0.8   0:27.24 konsole
> >     3 root      20   0     0    0    0 S  0.3  0.0   0:00.49 events/0
> >  1529 root      20   0  2556 1460  752 S  0.3  0.1   0:00.05 make
> > 14623 root      20   0  2200 1144  860 R  0.3  0.1   0:00.89 top
> >     1 root      20   0  1568  532  464 S  0.0  0.0   0:00.22 init
> >     2 root      39  19     0    0    0 S  0.0  0.0   0:00.01 ksoftirqd/0
> >     4 root      20   0     0    0    0 S  0.0  0.0   0:00.00 khelper
> >     5 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kthread
> >
> > Mmm.. I wonder where all of that 100% CPU went to.. the busiest tasks
> > are only showing up as 4.0% and 1.7% (when in fact they are using near
> > 100%).
>
> Nothing ever looks like it stays running for very long. That would be enough
> to account for this sort of top picture.

Sorry, I just don't buy that one.  This was a 2-second sampling interval in top.
top(1) is a program that has to work, so if this scheduler breaks it like this,
then we need to understand and fix top(1) or the scheduler.

> What HZ are you running? Do you usually run two makes at different nice levels?

This was HZ=1000, with NO_HZ.  And, no, not normally different nice levels.
Here I was just trying to keep the machine usable while building a couple of
things.

Keep at it.  Someday this might be good enough for mainline,
but right now the stock scheduler beats it for my desktop (notebook) loads.

Cheers


Re: module.h and moduleparam.h: more header file pedantry

2007-03-13 Thread Alexey Dobriyan
On Mon, Mar 12, 2007 at 12:59:20PM -0400, Robert P. J. Day wrote:
>   to my surprise, i learned only today that module.h includes
> moduleparam.h, which flies in the face of all of the documentation
> i've ever read which was adamant that i *had* to include moduleparam.h
> if i was using parameters. i'm guessing this comes as a surprise to
> the 400+ header files which include both unnecessarily.
>
>   so ... in a perfect world, should a module source file that doesn't
> use parameters *at all* need to include moduleparam.h?

Probably not.

> as it stands
> now, yes, it does, given some ugly inter-dependencies between the two
> files.
>
>   so, at the very least, programmers can stop including moduleparam.h,
> unless there's a cleaner way to do all that.

Regardless of what you do: cross-compile test!

After aforementioned removal and adding "struct kernel_param;"

+ akmk arm-assabet -k
  CHK include/linux/version.h
make[2]: `include/asm-arm/mach-types.h' is up to date.
  Using /home/linux/linux-irq-flags-t as source for kernel
  GEN /home/linux/build/arm-assabet/Makefile
  CHK include/linux/utsrelease.h
  CHK include/linux/compile.h
  CC  arch/arm/nwfpe/fpmodule.o
arch/arm/nwfpe/fpmodule.c:179: error: syntax error before string constant
arch/arm/nwfpe/fpmodule.c:179: warning: type defaults to `int' in declaration of `__MODULE_INFO'
arch/arm/nwfpe/fpmodule.c:179: warning: function declaration isn't a prototype
arch/arm/nwfpe/fpmodule.c:179: warning: data definition has no type or storage class
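
[Editorial sketch, not part of the original message.] The diagnostics above are what a compiler typically reports when a function-like macro is no longer defined at file scope and the call is parsed as an old-style declaration. That would fit the case where `__MODULE_INFO` was supplied by the header that was dropped, which is an assumption here. The names `DEMO_INFO`, `DEMO_AUTHOR` and `demo_author_string` below are hypothetical stand-ins, not kernel APIs. The sketch shows one convenience macro depending on a helper macro from another header:

```c
/*
 * Hypothetical stand-in for a helper macro that lives in a separate
 * "parameter" header. It emits a static string of the form "tag=info".
 */
#define DEMO_INFO(tag, info) \
    static const char demo_info_##tag[] = #tag "=" info

/*
 * Hypothetical stand-in for a convenience macro in the "module" header.
 * It only works if DEMO_INFO is visible wherever it is expanded.
 */
#define DEMO_AUTHOR(name) DEMO_INFO(author, name)

/* File-scope use, in the same position as a MODULE_* line in a driver. */
DEMO_AUTHOR("Example Author");

/* Returns the string generated by the macros above. */
const char *demo_author_string(void)
{
    return demo_info_author;
}
```

If the `DEMO_INFO` definition is removed, the `DEMO_AUTHOR(...)` line no longer expands to a declaration and the compiler complains at the point of use, much like the fpmodule.c output.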



Re: Stolen and degraded time and schedulers

2007-03-13 Thread Jeremy Fitzhardinge
Daniel Walker wrote:
> The frequency tracking you mention is done to some extent inside the
> timekeeping adjustment functions, but I'm not sure it's totally accurate
> for non-timekeeping, and it also tracks things like interrupt latency.
> Tracking frequency changes where it's important to get it right
> shouldn't be done I think ..
>
> If you want accurate time accounting, don't use the TSC .
>   

I'm not sure I follow you here.  Clocksources have the means to adjust
the rate of time progression, mostly to warp the time for things like
ntp.  The stability or otherwise of the tsc is irrelevant.

If you had a clocksource which was explicitly using the rate at which a
CPU does work as a timebase, then using the same warping mechanism would
allow you to model CPU speed changes.

> The sched_clock interface is basically a stripped down clocksource..
> I've implemented sched_clock as a clocksource in the past ..
>   

Yes, that works.  But a clocksource is strictly about measuring the
progression of real time, and so doesn't generally measure how much work
a CPU has done.

>> We currently have a sched_clock interface in paravirt_ops to deal with
>> the hypervisor aspect.  It only occurred to me this morning that cpufreq
>> presents exactly the same problem to the rest of the kernel, and so
>> there's room for a more general solution.
>> 
>
> Are there other architectures which have this per-cpu clock frequency
> changing issue? I worked with several other architectures beyond just
> x86 and haven't seen this issue ..

Well, lots of cpus have dynamic frequencies.  Any scheduler which
maintains history will suffer the same problem, even on UP.  If
processes A and B are supposed to have the same priority and they both
execute for 1ms of real time, did they make the same amount of
progress?  Not if the cpu changed speed in between.
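
[Editorial sketch, not part of the original message.] The A/B example can be made concrete by scaling each slice of wall-clock time by the CPU speed in effect during that slice. The function name `scaled_work_ns` and the `cur_khz`/`max_khz` parameters are assumptions for illustration, not an existing kernel interface:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert wall-clock run time into "work" time, expressed as the time the
 * same work would have taken at full speed.
 *
 * wall_ns: wall-clock nanoseconds the task was on the CPU
 * cur_khz: CPU frequency during that slice (hypothetical input)
 * max_khz: the CPU's maximum frequency (hypothetical input)
 *
 * Note: wall_ns * cur_khz can overflow 64 bits for very long slices;
 * a real implementation would need to guard against that.
 */
uint64_t scaled_work_ns(uint64_t wall_ns, uint32_t cur_khz, uint32_t max_khz)
{
    assert(max_khz > 0 && cur_khz <= max_khz);
    return wall_ns * cur_khz / max_khz;
}
```

With a 2 GHz maximum, process A running 1 ms at 2 GHz is credited 1,000,000 ns of work, while process B running 1 ms at 1 GHz is credited 500,000 ns, even though both used the same amount of real time.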

And any system which commonly runs virtualized (s390, power, etc) will
need to deal with the notion of stolen time.
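
[Editorial sketch, not part of the original message.] Stolen time can be treated the same way: time during which the hypervisor ran something else is subtracted before the scheduler charges a task. The helper name `charged_ns` and its inputs are hypothetical; how a guest actually learns the stolen amount depends on the hypervisor:

```c
#include <stdint.h>

/*
 * Time to charge a task for one scheduling slice.
 *
 * wall_ns:   wall-clock nanoseconds between switch-in and switch-out
 * stolen_ns: nanoseconds within that window when the virtual CPU was
 *            not actually running (hypothetical input from the hypervisor)
 *
 * Clamps to zero so that an over-reported stolen value cannot underflow.
 */
uint64_t charged_ns(uint64_t wall_ns, uint64_t stolen_ns)
{
    if (stolen_ns >= wall_ns)
        return 0;
    return wall_ns - stolen_ns;
}
```

A task that was scheduled for 1 ms of real time while 400 us were stolen would be charged 600 us under this scheme.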

J


Re: Linux 2.6.20.3

2007-03-13 Thread Nish Aravamudan

On 3/13/07, Greg KH <[EMAIL PROTECTED]> wrote:
> We (the -stable team) are announcing the release of the 2.6.20.3 kernel.
> It contains a number of bugfixes and all 2.6.20 users are recommended to
> upgrade.
>
> The diffstat and short summary of the fixes are below.
>
> I'll also be replying to this message with a copy of the patch between
> 2.6.20.2 and 2.6.20.3.

Compared to 2.6.20.1 (will try 2.6.20.2 as well), I now get:

[  199.361347] BUG: soft lockup detected on CPU#2!

smp_percpu_timer_interrupt+0xd4/0x180
tl0_irq14+0x1c/0x20
journal_add_journal_head+0x2c/0x1e0
journal_write_metadata_buffer+0x480/0x500
journal_commit_transaction+0xc38/0x1040
kjournald+0xc0/0x1e0
kthread+0xb0/0xc0
kernel_thread+0x38/0x60
keventd_create_kthread+0x20/0xa0

shortly after the serial console prompts for login.

Thanks,
Nish


Re: Linux 2.6.20.3

2007-03-13 Thread Nish Aravamudan

On 3/13/07, Nish Aravamudan <[EMAIL PROTECTED]> wrote:
> On 3/13/07, Greg KH <[EMAIL PROTECTED]> wrote:
> > We (the -stable team) are announcing the release of the 2.6.20.3 kernel.
> > It contains a number of bugfixes and all 2.6.20 users are recommended to
> > upgrade.
> >
> > The diffstat and short summary of the fixes are below.
> >
> > I'll also be replying to this message with a copy of the patch between
> > 2.6.20.2 and 2.6.20.3.
>
> Compared to 2.6.20.1 (will try 2.6.20.2 as well), I now get:


err, duh -- this is a Sun Ultra 60, debian testing install.

Thanks,
Nish

