Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2

2007-03-12 Thread Mike Galbraith
On Tue, 2007-03-13 at 16:53 +1100, Con Kolivas wrote:
> On Tuesday 13 March 2007 16:10, Mike Galbraith wrote:

> > It's not "offensive" to me, it is a behavioral regression.  The
> > situation as we speak is that you can run cpu intensive tasks while
> > watching eye-candy.  With RSDL, you can't, you feel the non-interactive
> > load instantly.  Doesn't the fact that you're asking me to lower my
> > expectations tell you that I just might have a point?
> 
> Yet looking at the mainline scheduler code, nice 5 tasks are also supposed to 
> get 75% cpu compared to nice 0 tasks, however I cannot seem to get 75% cpu 
> with a fully cpu bound task in the presence of an interactive task.

(One more comment before I go.  You can then have the last word this
time, promise :)

Because the interactivity logic, which was put there to do precisely
this, is doing its job?

>  To me 
> that means mainline is not living up to my expectations. What you're saying 
> is your expectations are based on a false cpu expectation from nice 5. You 
> can spin it both ways.

Talk about spin, you turn an example of the current scheduler working
properly into a negative attribute, and attempt to discredit me with it.

The floor is yours.  No reply will be forthcoming.

-Mike

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [QUICKLIST 0/4] Arch independent quicklists V2

2007-03-12 Thread Andrew Morton
> On Tue, 13 Mar 2007 00:13:25 -0700 (PDT) Christoph Lameter <[EMAIL 
> PROTECTED]> wrote:
> Page table pages have the characteristics that they are typically zero
> or in a known state when they are freed.

Well if they're zero then perhaps they should be released to the page allocator
to satisfy the next __GFP_ZERO request.  If that request is for a pagetable
page, we break even (except we get to remove special-case code).  If that
__GFP_ZERO allocation was for something other than a pagetable, we
win.

iow, can we just nuke 'em?

(Will require some work in the page allocator)
(That work will open the path to using the idle thread to prezero pages)


LSM Stacking

2007-03-12 Thread JanuGerman
Hi All,

Within the security folder in the kernel tree, the
2.6.20 linux kernel distribution is shipped with a
file root_plug.c (written by Greg Kroah-Hartman),
which is a classic introduction to Linux Security
Modules (LSM). The folder also contains the folder of
SELinux.

My question is whether the root_plug.c security
module is stacked with the SELinux security module or
not. If root_plug.c is stacked, where can I find the
code that handles the stacking of SELinux and
root_plug.c within the kernel?

Further, any pointers to stacking mechanisms in the Linux
2.6.* kernel would be highly appreciated.

Thanking you in advance,
MA








Re: sys_write() racy for multi-threaded append?

2007-03-12 Thread David Miller
From: "Michael K. Edwards" <[EMAIL PROTECTED]>
Date: Mon, 12 Mar 2007 23:25:48 -0800

> Quality means the devices you ship now keep working in the field, and
> the probable cost of later rework if the requirements change does not
> exceed the opportunity cost of over-engineering up front.  Economy
> gets a look-in too, and says that it's pointless to delay shipment and
> bloat the application coding for cases that can't happen.  If POSIX
> says that any and all writes (except small pipe/FIFO writes, whatever)
> can return a short byte count -- but you know damn well you're writing
> to a device driver that never, ever writes short, and if it did you'd
> miss a timing budget recovering from it anyway -- to hell with POSIX.

You're not even safe over standard output, simply run the program over
ssh and you suddenly have socket semantics to deal with.

In the early days the fun game to play was to run programs over rsh to
see in what amusing way they would explode.  ssh has replaced rsh in
this game, but the bugs have largely stayed the same.

Even early versions of tar used to explode on TCP half-closes and
whatnot.

In short, if you don't handle short writes, you're writing a program
for something other than unix.

We're not changing write() to interlock with other parallel callers or
messing with the f_pos semantics in such cases, that's stupid, please
cope, kthx.
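
A minimal sketch of what "handling short writes" means in practice,
assuming a blocking descriptor. The helper name write_all is
hypothetical, not a POSIX function; it simply loops until write() has
accepted every byte or reports a real error.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write exactly len bytes from buf to fd, looping over short writes.
 * Returns 0 on success, -1 on error with errno set by write().
 * write_all is a hypothetical helper name, not a POSIX function.
 */
int write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted before any data moved: retry */
			return -1;		/* real error: caller decides what to do */
		}
		p += n;				/* short write: skip what was accepted */
		len -= (size_t)n;
	}
	return 0;
}
```

Note that this loop does nothing to keep two threads from interleaving
their output on the same descriptor; that still needs locking in the
application.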


Re: sys_write() racy for multi-threaded append?

2007-03-12 Thread Michael K. Edwards

On 3/12/07, Alan Cox <[EMAIL PROTECTED]> wrote:

> Writing to a file from multiple processes is not usually the problem.
> Writing to a common "struct file" from multiple threads is.

Not normally because POSIX sensibly invented pread/pwrite. Forgot
preadv/pwritev but they did the basics and end of problem


pread/pwrite address a minuscule fraction of lseek+read(v)/write(v)
use cases -- a fraction that someone cared about strongly enough to
get into X/Open CAE Spec Issue 5 Version 2 (1997), from which it
propagated into UNIX98 and thence into POSIX.1-2001.  The fact that no
one has bothered to implement preadv/pwritev in the decade since
pread/pwrite entered the Single UNIX standard reflects the rarity with
which they appear in general code.  Life is too short to spend it
rewriting application code that uses readv/writev systematically,
especially when that code is going to ship inside a widget whose
kernel you control.
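
For reference, a rough sketch of the pwrite() style being debated:
each call carries its own offset, so threads sharing one descriptor
never race on the shared file position. The helper write_record and
the fixed-size record layout are assumptions made up for illustration.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Store a fixed-size record at slot `index` without touching the
 * descriptor's shared file position. write_record and the record
 * layout are hypothetical, for illustration only.
 */
int write_record(int fd, size_t index, const void *rec, size_t rec_size)
{
	off_t off = (off_t)(index * rec_size);
	ssize_t n = pwrite(fd, rec, rec_size, off);

	/* This sketch treats a short write as a failure rather than retrying. */
	return n == (ssize_t)rec_size ? 0 : -1;
}
```

This only helps when the caller already knows where each record
belongs, which is the limitation argued above: code built around
lseek() plus readv()/writev() has no such one-call equivalent here.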


> So what?  My products are shipping _now_.

That doesn't inspire confidence.


Oh, please.  Like _your_ employer is the poster child for code
quality.  The cheap shot is also irrelevant to the point that I was
making, which is that sometimes portability simply doesn't matter and
the right thing to do is to firm up the semantics of the filesystem
primitives from underneath.


> even funny.  If POSIX mandates stupid shit, and application
> programmers don't read that part of the manual anyway (and don't code
> on that assumption in practice), to hell with POSIX.  On many file

That's funny, you were talking about quality a moment ago.


Quality means the devices you ship now keep working in the field, and
the probable cost of later rework if the requirements change does not
exceed the opportunity cost of over-engineering up front.  Economy
gets a look-in too, and says that it's pointless to delay shipment and
bloat the application coding for cases that can't happen.  If POSIX
says that any and all writes (except small pipe/FIFO writes, whatever)
can return a short byte count -- but you know damn well you're writing
to a device driver that never, ever writes short, and if it did you'd
miss a timing budget recovering from it anyway -- to hell with POSIX.
And if you want to build a test jig for this code that uses pipes or
dummy files in place of the device driver, that test jig should never,
ever write short either.


> descriptors, short writes simply can't happen -- and code that

There is almost no descriptor this is true for. Any file I/O can and will
end up short on disk full or resource limit exceeded or quota exceeded or
NFS server exploded or ...


Not on a properly engineered widget, it won't.  And if it does, and
the application isn't coded to cope in some way totally different from
an infinite retry loop, then you might as well signal the exception
condition using whatever mechanism is appropriate to the API
(-EWHATEVER, SIGCRISIS, or block until some other process makes room).
And in any case files on disk are the least interesting kind of file
descriptor in an embedded scenario -- devices and pipes and pollfds
and netlink sockets are far more frequent read/write targets.


And on the device side about the only thing with the vaguest guarantees
is pipe().


Guaranteed by the standard, sure.  Guaranteed by the implementation,
as long as you write in the size blocks that the device is expecting?
Lots of devices -- ALSA's OSS PCM emulation, most AF_LOCAL and
AF_NETLINK sockets, almost any "character" device with a
record-structured format.  A short write to any of these almost
certainly means the framing is screwed and you need to close and
reopen the device.  Not all of these are exclusively O_APPEND
situations, and there's no reason on earth not to thread-safe the
f_pos handling so that an application and filesystem/driver can agree
on useful lseek() semantics.


> purports to handle short writes but has never been exercised is
> arguably worse than code that simply bombs on short write.  So if I
> can't shim in an induce-short-writes-randomly-on-purpose mechanism
> during development, I don't want short writes in production, period.

Easy enough to do and gcov plus dejagnu or similar tools will let you
coverage analyse the resulting test set and replay it.


Here we agree.  Except that I've rarely seen embedded application code
that wouldn't explode in my face if I tried it.  Databases yes, and
the better class of mail and web servers, and relatively mature
scripting languages and bytecode interpreters; but the vast majority
of working programmers in these latter days do not exercise this level
of discipline.
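
One way such an induce-short-writes shim could look, as a sketch only:
a stand-in for write() that caps how many bytes each call accepts. The
names flaky_write and short_write_limit are hypothetical, and the cap
is deterministic rather than random so that test runs can be replayed.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Largest chunk the shim passes through; 0 disables injection. */
static size_t short_write_limit = 0;

/*
 * Drop-in stand-in for write() that deliberately accepts at most
 * short_write_limit bytes per call, so that short-write handling
 * gets exercised during testing. Both names are hypothetical.
 */
ssize_t flaky_write(int fd, const void *buf, size_t len)
{
	if (short_write_limit != 0 && len > short_write_limit)
		len = short_write_limit;	/* force a short count */
	return write(fd, buf, len);
}
```

Routing the application's writes through this wrapper in a test build,
with short_write_limit set to a small value, makes every retry path run
on every test instead of one time in a hundred.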


> Sure -- until the one code path in a hundred that handles the "short
> write" case incorrectly gets traversed in production, after having
> gone untested in a development environment that used a different
> filesystem that never happened to trigger it.

Competent QA and testing people test all the returns in the manual as
well as all the returns they can find in the cod

Re: RSDL-mm 0.28

2007-03-12 Thread Nick Piggin

David Schwartz wrote:

There's a substantial performance hit for not yielding, so we probably
want to investigate alternate semantics for it. It seems reasonable
for apps to say "let me not hog the CPU" without completely expiring
them. Imagine you're in the front of the line (aka queue) and you
spend a moment fumbling for your wallet. The polite thing to do is to
let the next guy in front. But with the current sched_yield, you go
all the way to the back of the line.




Well... are you advocating we change sched_yield semantics to a
gentler form? This is a cinch to implement but I know how Ingo feels
about this. It will only encourage more lax coding using sched_yield
instead of proper blocking (see huge arguments with the ldap people on
this one who insist it's impossible not to use yield).



The basic point of sched_yield is to allow every other process at the same
static priority level a chance to use the CPU before you get it back. It is
generally an error to use sched_yield to be nice. It's nice to get your work
done when the scheduler gives you the CPU, that's why it gave it to you.

It is proper to use sched_yield as an optimization when it is more efficient to
allow another process/thread to run than you, for example, when you
encounter a task you cannot do efficiently at that time because another
thread holds a lock.
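
A minimal sketch of that lock case, assuming POSIX threads: try the
lock, and yield instead of spinning when another thread holds it. The
helper lock_or_yield and its retry bound are hypothetical.

```c
#include <pthread.h>
#include <sched.h>

/*
 * Take mtx, yielding the CPU between attempts instead of spinning.
 * lock_or_yield is a hypothetical helper; a plain blocking
 * pthread_mutex_lock() is normally the better choice.
 */
int lock_or_yield(pthread_mutex_t *mtx, int max_tries)
{
	for (int i = 0; i < max_tries; i++) {
		if (pthread_mutex_trylock(mtx) == 0)
			return 0;	/* got the lock */
		sched_yield();		/* let the holder (or anyone else) run */
	}
	return -1;			/* still contended after max_tries attempts */
}
```

How far back in the run queue each sched_yield() call sends the caller
is exactly the semantics under discussion in this thread, so the cost
of the loop depends on the scheduler in use.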

It's also useful prior to doing something that can most efficiently be done
without interruption. So a thread that returns from 'sched_yield' should
ideally be given a full timeslice if possible. This may not be sensible if
the 'sched_yield' didn't actually yield, but then again, if nothing else
wants to run, why not give the only task that does a full slice?

In no case is much of anything guaranteed, of course. (What can you do if
there's no other process to yield to?)

Note that processes that call sched_yield should be rewarded for doing so
just as processes that block on I/O are, assuming they do in fact wind up
giving up the CPU when they would otherwise have had it.

DS






--
SUSE Labs, Novell Inc.


Re: [RFC][PATCH 2/7] RSS controller core

2007-03-12 Thread Pavel Emelianov
Herbert Poetzl wrote:
> On Mon, Mar 12, 2007 at 12:02:01PM +0300, Pavel Emelianov wrote:
> Maybe you have some ideas how we can decide on this?
 We need to work out what the requirements are before we can 
 settle on an implementation.
>>> Linux-VServer (and probably OpenVZ):
>>>
>>>  - shared mappings of 'shared' files (binaries 
>>>and libraries) to allow for reduced memory
>>>footprint when N identical guests are running
>> This is done in current patches.
> 
> nice, but the question was about _requirements_
> (so your requirements are?)
> 
>>>  - virtual 'physical' limit should not cause
>>>swap out when there are still pages left on
>>>the host system (but pages of over limit guests
>>>can be preferred for swapping)
>> So what to do when virtual physical limit is hit?
>> OOM-kill current task?
> 
> when the RSS limit is hit, but there _are_ enough
> pages left on the physical system, there is no
> good reason to swap out the page at all
> 
>  - there is no benefit in doing so (performance
>wise, that is)
> 
>  - it actually hurts performance, and could
>become a separate source for DoS
> 
> what should happen instead (in an ideal world :)
> is that the page is considered swapped out for
> the guest (add guest penalty for swapout), and

Does the page stay mapped for the container or not?
If yes, then what's the use of limits? The container has mapped
more pages than the limit allows, but all the pages are
still in memory. Sounds weird.

> when the page would be swapped in again, the guest
> takes a penalty (for the 'virtual' page in) and
> the page is returned to the guest, possibly kicking
> out (again virtually) a different page
> 
>>>  - accounting and limits have to be consistent
>>>and should roughly represent the actual used
>>>memory/swap (modulo optimizations, I can go
>>>into detail here, if necessary)
>> This is true for the current implementation of
>> both - this patchset and OpenVZ beancounters.
>>
>> If you sum up the physpages values for all containers
>> you'll get the exact number of RAM pages used.
> 
> hmm, including or excluding the host pages?

Depends on whether you account host pages or not.

>>>  - OOM handling on a per guest basis, i.e. some
>>>out of memory condition in guest A must not
>>>affect guest B
>> This is done in current patches.
> 
>> Herbert, did you look at the patches before
>> sending this mail or do you just want to
>> 'take part' in conversation w/o understanding
>> of hat is going on?
> 
> again, the question was about requirements, not
> your patches, and yes, I had a look at them _and_
> the OpenVZ implementations ...
> 
> best,
> Herbert
> 
> PS: hat is going on? :)
> 
>>> HTC,
>>> Herbert
>>>
 Sigh.  Who is running this show?   Anyone?

 You can actually do a form of overcommitment by allowing multiple
 containers to share one or more of the zones. Whether that is
 sufficient or suitable I don't know. That depends on the requirements,
 and we haven't even discussed those, let alone agreed to them.

 ___
 Containers mailing list
 [EMAIL PROTECTED]
 https://lists.osdl.org/mailman/listinfo/containers
> 



[QUICKLIST 3/4] Quicklist support for x86_64

2007-03-12 Thread Christoph Lameter
Convert x86_64 to using quicklists

This adds caching of pgds, puds, pmds and ptes. That way we can
avoid costly zeroing and initialization of special mappings in the
pgd.

A second quicklist is useful to separate out PGD handling. We can carry
the initialized pgds over to the next process needing them.

Also clean up the pgd_list handling to use regular list macros.
There is no need anymore to avoid the lru field.

Move the add/removal of the pgds to the pgdlist into the
constructor / destructor. That way the implementation is
congruent with i386.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 arch/x86_64/Kconfig  |4 ++
 arch/x86_64/kernel/process.c |1 
 arch/x86_64/kernel/smp.c |2 -
 arch/x86_64/mm/fault.c   |5 +-
 include/asm-x86_64/pgalloc.h |   76 +--
 include/asm-x86_64/pgtable.h |3 -
 mm/Kconfig   |5 ++
 7 files changed, 52 insertions(+), 44 deletions(-)

Index: linux-2.6.21-rc3-mm2/arch/x86_64/Kconfig
===
--- linux-2.6.21-rc3-mm2.orig/arch/x86_64/Kconfig   2007-03-12 
22:49:20.0 -0700
+++ linux-2.6.21-rc3-mm2/arch/x86_64/Kconfig2007-03-12 22:53:28.0 
-0700
@@ -56,6 +56,10 @@ config ZONE_DMA
bool
default y
 
+config NR_QUICK
+   int
+   default 2
+
 config ISA
bool
 
Index: linux-2.6.21-rc3-mm2/include/asm-x86_64/pgalloc.h
===
--- linux-2.6.21-rc3-mm2.orig/include/asm-x86_64/pgalloc.h  2007-03-12 
22:49:20.0 -0700
+++ linux-2.6.21-rc3-mm2/include/asm-x86_64/pgalloc.h   2007-03-12 
22:53:28.0 -0700
@@ -4,6 +4,10 @@
 #include 
 #include 
 #include 
+#include 
+
+#define QUICK_PGD 0 /* We preserve special mappings over free */
+#define QUICK_PT 1 /* Other page table pages that are zero on free */
 
 #define pmd_populate_kernel(mm, pmd, pte) \
set_pmd(pmd, __pmd(_PAGE_TABLE | __pa(pte)))
@@ -20,86 +24,77 @@ static inline void pmd_populate(struct m
 static inline void pmd_free(pmd_t *pmd)
 {
BUG_ON((unsigned long)pmd & (PAGE_SIZE-1));
-   free_page((unsigned long)pmd);
+   quicklist_free(QUICK_PT, NULL, pmd);
 }
 
 static inline pmd_t *pmd_alloc_one (struct mm_struct *mm, unsigned long addr)
 {
-   return (pmd_t *)get_zeroed_page(GFP_KERNEL|__GFP_REPEAT);
+   return (pmd_t *)quicklist_alloc(QUICK_PT, GFP_KERNEL|__GFP_REPEAT, NULL);
 }
 
 static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
-   return (pud_t *)get_zeroed_page(GFP_KERNEL|__GFP_REPEAT);
+   return (pud_t *)quicklist_alloc(QUICK_PT, GFP_KERNEL|__GFP_REPEAT, NULL);
 }
 
 static inline void pud_free (pud_t *pud)
 {
BUG_ON((unsigned long)pud & (PAGE_SIZE-1));
-   free_page((unsigned long)pud);
+   quicklist_free(QUICK_PT, NULL, pud);
 }
 
-static inline void pgd_list_add(pgd_t *pgd)
+static inline void pgd_ctor(void *x)
 {
+   unsigned boundary;
+   pgd_t *pgd = x;
struct page *page = virt_to_page(pgd);
 
+   /*
+* Copy kernel pointers in from init.
+*/
+   boundary = pgd_index(__PAGE_OFFSET);
+   memcpy(pgd + boundary,
+   init_level4_pgt + boundary,
+   (PTRS_PER_PGD - boundary) * sizeof(pgd_t));
+
spin_lock(&pgd_lock);
-   page->index = (pgoff_t)pgd_list;
-   if (pgd_list)
-   pgd_list->private = (unsigned long)&page->index;
-   pgd_list = page;
-   page->private = (unsigned long)&pgd_list;
+   list_add(&page->lru, &pgd_list);
spin_unlock(&pgd_lock);
 }
 
-static inline void pgd_list_del(pgd_t *pgd)
+static inline void pgd_dtor(void *x)
 {
-   struct page *next, **pprev, *page = virt_to_page(pgd);
+   pgd_t *pgd = x;
+   struct page *page = virt_to_page(pgd);
 
spin_lock(&pgd_lock);
-   next = (struct page *)page->index;
-   pprev = (struct page **)page->private;
-   *pprev = next;
-   if (next)
-   next->private = (unsigned long)pprev;
+   list_del(&page->lru);
spin_unlock(&pgd_lock);
 }
 
+
 static inline pgd_t *pgd_alloc(struct mm_struct *mm)
 {
-   unsigned boundary;
-   pgd_t *pgd = (pgd_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT);
-   if (!pgd)
-   return NULL;
-   pgd_list_add(pgd);
-   /*
-* Copy kernel pointers in from init.
-* Could keep a freelist or slab cache of those because the kernel
-* part never changes.
-*/
-   boundary = pgd_index(__PAGE_OFFSET);
-   memset(pgd, 0, boundary * sizeof(pgd_t));
-   memcpy(pgd + boundary,
-  init_level4_pgt + boundary,
-  (PTRS_PER_PGD - boundary) * sizeof(pgd_t));
+   pgd_t *pgd = (pgd_t *)quicklist_alloc(QUICK_PGD,
+GFP_KERNEL|__GFP_REPEAT, pgd_ctor);
+
return pgd;
 }
 
 static in

[QUICKLIST 0/4] Arch independent quicklists V2

2007-03-12 Thread Christoph Lameter
V1->V2
- Add sparc64 patch
- Single i386 and x86_64 patch
- Update attribution
- Update justification
- Update approvals
- Earlier discussion of V1 was at
  http://marc.info/?l=linux-kernel&m=117357922219342&w=2

This patchset introduces an arch independent framework to handle lists
of recently used page table pages. It is necessary for x86_64 and
i386 to avoid the special casing of SLUB because these two platforms
use fields in the page_struct (page->index and page->private)
that SLUB needs (and in fact SLAB also needs page->private if
performing debugging!). There is also the tendency of arches to use
page flags to mark page table pages. The slab also uses page flags.
Separating page table page allocation into quicklists avoids the danger
of conflicts and frees up page flags for SLUB and for the arch code.

Page table pages have the characteristics that they are typically zero
or in a known state when they are freed. This is usually exactly the
same state as needed after allocation. So it makes sense to build a list
of freed page table pages and then consume the pages already in use
first. Those pages have already been initialized correctly (thus no
need to zero them) and are likely already cached in such a way that
the MMU can use them most effectively. Page table pages are used in
a sparse way so zeroing them on allocation is not too useful.

Such an implementation already exists for ia64. However, that implementation
did not support constructors and destructors as needed by i386 / x86_64.
It also only supported a single quicklist. The implementation here has
constructor and destructor support as well as the ability for an arch to
specify how many quicklists are needed.

Quicklists are defined by an arch defining the necessary number
of quicklists in arch/<arch>/Kconfig. F.e. i386 needs two and thus
has

config NR_QUICK
	int
	default 2

If an arch has requested quicklist support then pages can be allocated
from the quicklist (or from the page allocator if the quicklist is
empty) via:

quicklist_alloc(<quicklist-nr>, <gfpflags>, <constructor>)

Page table pages can be freed using:

quicklist_free(<quicklist-nr>, <destructor>, <page>)

Pages must have a definite state after allocation and before
they are freed. If no constructor is specified then pages
will be zeroed on allocation and must be zeroed before they are
freed.

If a constructor is used then the constructor will establish
a definite page state. F.e. the i386 and x86_64 pgd constructors
establish certain mappings.

Constructors and destructors can also be used to track the pages.
i386 and x86_64 use a list of pgds in order to be able to dynamically
update standard mappings.
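
The idea can be modelled in user space as a rough sketch. This is not
the kernel implementation: ql_alloc, ql_free, PAGE_BYTES and the
calloc() fallback are hypothetical stand-ins, destructors and list
trimming are left out, and running the constructor on fresh pages only
is an assumption. Like the ia64 code this series replaces, the model
threads the free list through the first word of each page.

```c
#include <stdlib.h>

#define PAGE_BYTES 4096
#define NR_QUICK   2

typedef void (*ql_ctor)(void *page);

/* One singly linked list of free pages per quicklist number. */
static void *ql_head[NR_QUICK];
static long ql_count[NR_QUICK];

/* Pop a cached page, or fall back to a fresh zeroed allocation. */
void *ql_alloc(int nr, ql_ctor ctor)
{
	void **page = ql_head[nr];

	if (page) {
		ql_head[nr] = page[0];	/* unlink: first word holds the next pointer */
		page[0] = NULL;		/* put that word back into its zero state */
		ql_count[nr]--;
		return page;
	}
	page = calloc(1, PAGE_BYTES);	/* stands in for the page allocator */
	if (page && ctor)
		ctor(page);		/* assumed: constructor runs on fresh pages only */
	return page;
}

/* Push a page that the caller has already returned to its known state. */
void ql_free(int nr, void *p)
{
	void **page = p;

	page[0] = ql_head[nr];
	ql_head[nr] = page;
	ql_count[nr]++;
}
```

A page freed with ql_free(0, p) is handed back by the next
ql_alloc(0, NULL) without being zeroed again, which is the saving the
cover letter describes.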




[QUICKLIST 1/4] Generic quicklist implementation

2007-03-12 Thread Christoph Lameter
Abstract quicklist from the IA64 implementation

Extract the quicklist implementation for IA64, clean it up
and generalize it to allow multiple quicklists and support
for constructors and destructors.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 arch/ia64/Kconfig  |4 ++
 arch/ia64/mm/contig.c  |2 -
 arch/ia64/mm/discontig.c   |2 -
 arch/ia64/mm/init.c|   51 ---
 include/asm-ia64/pgalloc.h |   82 -
 include/linux/quicklist.h  |   81 
 mm/Kconfig |5 ++
 mm/Makefile|2 +
 mm/quicklist.c |   81 
 9 files changed, 191 insertions(+), 119 deletions(-)

Index: linux-2.6.21-rc3-mm2/arch/ia64/mm/init.c
===
--- linux-2.6.21-rc3-mm2.orig/arch/ia64/mm/init.c   2007-03-12 
22:49:21.0 -0700
+++ linux-2.6.21-rc3-mm2/arch/ia64/mm/init.c2007-03-12 22:49:23.0 
-0700
@@ -39,9 +39,6 @@
 
 DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
 
-DEFINE_PER_CPU(unsigned long *, __pgtable_quicklist);
-DEFINE_PER_CPU(long, __pgtable_quicklist_size);
-
 extern void ia64_tlb_init (void);
 
 unsigned long MAX_DMA_ADDRESS = PAGE_OFFSET + 0x1UL;
@@ -56,54 +53,6 @@ EXPORT_SYMBOL(vmem_map);
 struct page *zero_page_memmap_ptr; /* map entry for zero page */
 EXPORT_SYMBOL(zero_page_memmap_ptr);
 
-#define MIN_PGT_PAGES  25UL
-#define MAX_PGT_FREES_PER_PASS 16L
-#define PGT_FRACTION_OF_NODE_MEM   16
-
-static inline long
-max_pgt_pages(void)
-{
-   u64 node_free_pages, max_pgt_pages;
-
-#ifndef CONFIG_NUMA
-   node_free_pages = nr_free_pages();
-#else
-   node_free_pages = node_page_state(numa_node_id(), NR_FREE_PAGES);
-#endif
-   max_pgt_pages = node_free_pages / PGT_FRACTION_OF_NODE_MEM;
-   max_pgt_pages = max(max_pgt_pages, MIN_PGT_PAGES);
-   return max_pgt_pages;
-}
-
-static inline long
-min_pages_to_free(void)
-{
-   long pages_to_free;
-
-   pages_to_free = pgtable_quicklist_size - max_pgt_pages();
-   pages_to_free = min(pages_to_free, MAX_PGT_FREES_PER_PASS);
-   return pages_to_free;
-}
-
-void
-check_pgt_cache(void)
-{
-   long pages_to_free;
-
-   if (unlikely(pgtable_quicklist_size <= MIN_PGT_PAGES))
-   return;
-
-   preempt_disable();
-   while (unlikely((pages_to_free = min_pages_to_free()) > 0)) {
-   while (pages_to_free--) {
-   free_page((unsigned long)pgtable_quicklist_alloc());
-   }
-   preempt_enable();
-   preempt_disable();
-   }
-   preempt_enable();
-}
-
 void
 lazy_mmu_prot_update (pte_t pte)
 {
Index: linux-2.6.21-rc3-mm2/include/asm-ia64/pgalloc.h
===
--- linux-2.6.21-rc3-mm2.orig/include/asm-ia64/pgalloc.h2007-03-12 
22:49:21.0 -0700
+++ linux-2.6.21-rc3-mm2/include/asm-ia64/pgalloc.h 2007-03-12 
22:49:23.0 -0700
@@ -18,71 +18,18 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
-DECLARE_PER_CPU(unsigned long *, __pgtable_quicklist);
-#define pgtable_quicklist __ia64_per_cpu_var(__pgtable_quicklist)
-DECLARE_PER_CPU(long, __pgtable_quicklist_size);
-#define pgtable_quicklist_size __ia64_per_cpu_var(__pgtable_quicklist_size)
-
-static inline long pgtable_quicklist_total_size(void)
-{
-   long ql_size = 0;
-   int cpuid;
-
-   for_each_online_cpu(cpuid) {
-   ql_size += per_cpu(__pgtable_quicklist_size, cpuid);
-   }
-   return ql_size;
-}
-
-static inline void *pgtable_quicklist_alloc(void)
-{
-   unsigned long *ret = NULL;
-
-   preempt_disable();
-
-   ret = pgtable_quicklist;
-   if (likely(ret != NULL)) {
-   pgtable_quicklist = (unsigned long *)(*ret);
-   ret[0] = 0;
-   --pgtable_quicklist_size;
-   preempt_enable();
-   } else {
-   preempt_enable();
-   ret = (unsigned long *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
-   }
-
-   return ret;
-}
-
-static inline void pgtable_quicklist_free(void *pgtable_entry)
-{
-#ifdef CONFIG_NUMA
-   int nid = page_to_nid(virt_to_page(pgtable_entry));
-
-   if (unlikely(nid != numa_node_id())) {
-   free_page((unsigned long)pgtable_entry);
-   return;
-   }
-#endif
-
-   preempt_disable();
-   *(unsigned long *)pgtable_entry = (unsigned long)pgtable_quicklist;
-   pgtable_quicklist = (unsigned long *)pgtable_entry;
-   ++pgtable_quicklist_size;
-   preempt_enable();
-}
-
 static inline pgd_t *pgd_alloc(struct mm_struct *mm)
 {
-   return pgtable_quicklist_alloc();
+   return quicklist_alloc(0, GFP_KERNEL, NULL);
 }
 
 static inline void pgd_free(pgd_t 

[QUICKLIST 4/4] Quicklist support for sparc64

2007-03-12 Thread Christoph Lameter
From: David Miller <[EMAIL PROTECTED]>

[QUICKLIST]: Add sparc64 quicklist support.

I ported this to sparc64 as per the patch below, tested on
UP SunBlade1500 and 24 cpu Niagara T1000.

Signed-off-by: David S. Miller <[EMAIL PROTECTED]>

---
 arch/sparc64/Kconfig  |4 
 arch/sparc64/mm/init.c|   24 
 arch/sparc64/mm/tsb.c |2 +-
 include/asm-sparc64/pgalloc.h |   26 ++
 4 files changed, 19 insertions(+), 37 deletions(-)

Index: linux-2.6.21-rc3-mm2/arch/sparc64/Kconfig
===
--- linux-2.6.21-rc3-mm2.orig/arch/sparc64/Kconfig  2007-03-12 
22:49:19.0 -0700
+++ linux-2.6.21-rc3-mm2/arch/sparc64/Kconfig   2007-03-12 22:53:30.0 
-0700
@@ -26,6 +26,10 @@ config MMU
bool
default y
 
+config NR_QUICK
+   int
+   default 1
+
 config STACKTRACE_SUPPORT
bool
default y
Index: linux-2.6.21-rc3-mm2/arch/sparc64/mm/init.c
===
--- linux-2.6.21-rc3-mm2.orig/arch/sparc64/mm/init.c2007-03-12 
22:49:19.0 -0700
+++ linux-2.6.21-rc3-mm2/arch/sparc64/mm/init.c 2007-03-12 22:53:30.0 
-0700
@@ -176,30 +176,6 @@ unsigned long sparc64_kern_sec_context _
 
 int bigkernel = 0;
 
-struct kmem_cache *pgtable_cache __read_mostly;
-
-static void zero_ctor(void *addr, struct kmem_cache *cache, unsigned long flags)
-{
-   clear_page(addr);
-}
-
-extern void tsb_cache_init(void);
-
-void pgtable_cache_init(void)
-{
-   pgtable_cache = kmem_cache_create("pgtable_cache",
- PAGE_SIZE, PAGE_SIZE,
- SLAB_HWCACHE_ALIGN |
- SLAB_MUST_HWCACHE_ALIGN,
- zero_ctor,
- NULL);
-   if (!pgtable_cache) {
-   prom_printf("Could not create pgtable_cache\n");
-   prom_halt();
-   }
-   tsb_cache_init();
-}
-
 #ifdef CONFIG_DEBUG_DCFLUSH
 atomic_t dcpage_flushes = ATOMIC_INIT(0);
 #ifdef CONFIG_SMP
Index: linux-2.6.21-rc3-mm2/arch/sparc64/mm/tsb.c
===
--- linux-2.6.21-rc3-mm2.orig/arch/sparc64/mm/tsb.c 2007-03-12 
22:49:19.0 -0700
+++ linux-2.6.21-rc3-mm2/arch/sparc64/mm/tsb.c  2007-03-12 22:53:30.0 
-0700
@@ -252,7 +252,7 @@ static const char *tsb_cache_names[8] = 
"tsb_1MB",
 };
 
-void __init tsb_cache_init(void)
+void __init pgtable_cache_init(void)
 {
unsigned long i;
 
Index: linux-2.6.21-rc3-mm2/include/asm-sparc64/pgalloc.h
===
--- linux-2.6.21-rc3-mm2.orig/include/asm-sparc64/pgalloc.h 2007-03-12 
22:49:19.0 -0700
+++ linux-2.6.21-rc3-mm2/include/asm-sparc64/pgalloc.h  2007-03-12 
22:53:30.0 -0700
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -13,52 +14,50 @@
 #include 
 
 /* Page table allocation/freeing. */
-extern struct kmem_cache *pgtable_cache;
 
 static inline pgd_t *pgd_alloc(struct mm_struct *mm)
 {
-   return kmem_cache_alloc(pgtable_cache, GFP_KERNEL);
+   return quicklist_alloc(0, GFP_KERNEL, NULL);
 }
 
 static inline void pgd_free(pgd_t *pgd)
 {
-   kmem_cache_free(pgtable_cache, pgd);
+   quicklist_free(0, NULL, pgd);
 }
 
 #define pud_populate(MM, PUD, PMD) pud_set(PUD, PMD)
 
 static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
-   return kmem_cache_alloc(pgtable_cache,
-   GFP_KERNEL|__GFP_REPEAT);
+   return quicklist_alloc(0, GFP_KERNEL, NULL);
 }
 
 static inline void pmd_free(pmd_t *pmd)
 {
-   kmem_cache_free(pgtable_cache, pmd);
+   quicklist_free(0, NULL, pmd);
 }
 
 static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
  unsigned long address)
 {
-   return kmem_cache_alloc(pgtable_cache,
-   GFP_KERNEL|__GFP_REPEAT);
+   return quicklist_alloc(0, GFP_KERNEL, NULL);
 }
 
 static inline struct page *pte_alloc_one(struct mm_struct *mm,
 unsigned long address)
 {
-   return virt_to_page(pte_alloc_one_kernel(mm, address));
+   void *pg = quicklist_alloc(0, GFP_KERNEL, NULL);
+   return pg ? virt_to_page(pg) : NULL;
 }

 static inline void pte_free_kernel(pte_t *pte)
 {
-   kmem_cache_free(pgtable_cache, pte);
+   quicklist_free(0, NULL, pte);
 }
 
 static inline void pte_free(struct page *ptepage)
 {
-   pte_free_kernel(page_address(ptepage));
+   quicklist_free(0, NULL, page_address(ptepage));
 }
 
 
@@ -66,6 +65,9 @@ static inline void pte_free(struct page 
 #define pmd_populate(MM,PMD,PTE_PAGE)  \
pmd_populate_

[QUICKLIST 2/4] Quicklist support for i386

2007-03-12 Thread Christoph Lameter
i386: Convert to quicklists

Implement the i386 management of pgd and pmds using quicklists.

The i386 management of page table pages currently uses page sized slabs.
The page state is therefore mainly determined by the slab code. However,
i386 also uses its own fields in the page struct to mark special pages
and to build a list of pgds using the ->private and ->index field (yuck!).
This has been finely tuned to work right with SLAB but SLUB needs more
control over the page struct. Currently the only way for SLUB to support
these slabs is through special casing PAGE_SIZE slabs.

If we use quicklists instead then we can avoid the mess, and also the
overhead of manipulating page sized objects through slab.

It also allows us to use standard list manipulation macros for the
pgd list using page->lru thereby simplifying the code.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 arch/i386/Kconfig  |4 ++
 arch/i386/kernel/process.c |1 
 arch/i386/kernel/smp.c |2 -
 arch/i386/mm/fault.c   |5 +--
 arch/i386/mm/init.c|   25 -
 arch/i386/mm/pageattr.c|2 -
 arch/i386/mm/pgtable.c |   63 +
 include/asm-i386/pgalloc.h |2 -
 include/asm-i386/pgtable.h |   13 +++--
 9 files changed, 39 insertions(+), 78 deletions(-)

Index: linux-2.6.21-rc3-mm2/arch/i386/mm/init.c
===
--- linux-2.6.21-rc3-mm2.orig/arch/i386/mm/init.c   2007-03-12 22:49:20.0 -0700
+++ linux-2.6.21-rc3-mm2/arch/i386/mm/init.c2007-03-12 22:53:27.0 -0700
@@ -695,31 +695,6 @@ int remove_memory(u64 start, u64 size)
 EXPORT_SYMBOL_GPL(remove_memory);
 #endif
 
-struct kmem_cache *pgd_cache;
-struct kmem_cache *pmd_cache;
-
-void __init pgtable_cache_init(void)
-{
-   if (PTRS_PER_PMD > 1) {
-   pmd_cache = kmem_cache_create("pmd",
-   PTRS_PER_PMD*sizeof(pmd_t),
-   PTRS_PER_PMD*sizeof(pmd_t),
-   0,
-   pmd_ctor,
-   NULL);
-   if (!pmd_cache)
-   panic("pgtable_cache_init(): cannot create pmd cache");
-   }
-   pgd_cache = kmem_cache_create("pgd",
-   PTRS_PER_PGD*sizeof(pgd_t),
-   PTRS_PER_PGD*sizeof(pgd_t),
-   0,
-   pgd_ctor,
-   PTRS_PER_PMD == 1 ? pgd_dtor : NULL);
-   if (!pgd_cache)
-   panic("pgtable_cache_init(): Cannot create pgd cache");
-}
-
 /*
  * This function cannot be __init, since exceptions don't work in that
  * section.  Put this after the callers, so that it cannot be inlined.
Index: linux-2.6.21-rc3-mm2/arch/i386/mm/pgtable.c
===
--- linux-2.6.21-rc3-mm2.orig/arch/i386/mm/pgtable.c2007-03-12 22:49:20.0 -0700
+++ linux-2.6.21-rc3-mm2/arch/i386/mm/pgtable.c 2007-03-12 22:53:27.0 -0700
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -181,9 +182,12 @@ void reserve_top_address(unsigned long r
 #endif
 }
 
+#define QUICK_PGD 0
+#define QUICK_PT 1
+
 pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
 {
-   return (pte_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO);
+   return (pte_t *)quicklist_alloc(QUICK_PT, GFP_KERNEL, NULL);
 }
 
 struct page *pte_alloc_one(struct mm_struct *mm, unsigned long address)
@@ -198,11 +202,6 @@ struct page *pte_alloc_one(struct mm_str
return pte;
 }
 
-void pmd_ctor(void *pmd, struct kmem_cache *cache, unsigned long flags)
-{
-   memset(pmd, 0, PTRS_PER_PMD*sizeof(pmd_t));
-}
-
 /*
  * List of all pgd's needed for non-PAE so it can invalidate entries
  * in both cached and uncached pgd's; not needed for PAE since the
@@ -211,36 +210,15 @@ void pmd_ctor(void *pmd, struct kmem_cac
  * against pageattr.c; it is the unique case in which a valid change
  * of kernel pagetables can't be lazily synchronized by vmalloc faults.
  * vmalloc faults work because attached pagetables are never freed.
- * The locking scheme was chosen on the basis of manfred's
- * recommendations and having no core impact whatsoever.
  * -- wli
  */
 DEFINE_SPINLOCK(pgd_lock);
-struct page *pgd_list;
-
-static inline void pgd_list_add(pgd_t *pgd)
-{
-   struct page *page = virt_to_page(pgd);
-   page->index = (unsigned long)pgd_list;
-   if (pgd_list)
-   set_page_private(pgd_list, (unsigned long)&page->index);
-   pgd_list = page;
-   set_page_private(page, (unsigned long)&pgd_list);
-}
+LIST_HEAD(pgd_list);
 
-static inline void pgd_list_del(pgd_t *pgd)
-{
-   struct page *next, **pprev, *page = virt_to_page(pgd);
-   next =

Re: [RFC][PATCH 3/7] Data structures changes for RSS accounting

2007-03-12 Thread Pavel Emelianov
Dave Hansen wrote:
> On Mon, 2007-03-12 at 20:19 +0300, Pavel Emelianov wrote:
>> Dave Hansen wrote:
>>> On Mon, 2007-03-12 at 19:16 +0300, Kirill Korotaev wrote:
>>>> now VE2 maps the same page. You can't determine whether this page is mapped
>>>> to this container or another one w/o page->container pointer.
>>> Hi Kirill,
>>>
>>> I thought we can always get from the page to the VMA.  rmap provides
>>> this to us via page->mapping and the 'struct address_space' or anon_vma.
>>> Do we agree on that?
>> Not completely. When page is unmapped from the *very last*
>> user its *first* toucher may already be dead. So we'll never
>> find out who it was.
> 
> OK, but  this is assuming that we didn't *un*account for the page when
> the last user of the "owning" container stopped using the page.

That's exactly what we agreed on during our discussions:
When a page is first touched, it is charged to the touching container.
When the page is touched again by a new container, it is NOT
charged to the new container, but keeps holding the old one
until it (the page) is completely freed. Nobody was worried by the
fact that a single page can hold a container for good.

OpenVZ beancounters work the other way (and we proposed this
solution when we first sent the patches). We keep track of
*all* the containers (i.e. beancounters) holding this page.

>>> We can also get from the vma to the mm very easily, via vma->vm_mm,
>>> right?
>>>
>>> We can also get from a task to the container quite easily.  
>>>
>>> So, the only question becomes whether there is a 1:1 relationship
>>> between mm_structs and containers.  Does each mm_struct belong to one
>> No. The question is "how to get a container that touched the
>> page first" which is the same as "how to find mm_struct which
>> touched the page first". Obviously there's no answer on this
>> question unless we hold some direct page->container reference.
>> This may be a hash, a direct on-page pointer, or mirrored
>> array of pointers.
> 
> Or, you keep track of when the last user from the container goes away,
> and you effectively account it to another one.

We can migrate the page to another user, but we decided
to implement that later, once simple accounting is accepted.

> Are there problems with shifting ownership around like this?
> 
> -- Dave
> 
> 



Re: Make sure we populate the initroot filesystem late enough

2007-03-12 Thread Benjamin Herrenschmidt

> Hmm. The crash came back after I booted into Mac OS X and back. It was however
> a different crash, I believe it was coming from the USB modules (as it would
> keep going when it happened, and get another crash, which tended to scroll
> away too fast for me to capture) but I believe it was still getting down into
> the slab code and actually dying there.

Have you tried, instead, to apply
38f3323037de22bb0089d08be27be01196e7148b ? (That is revert
39d61db0edb34d60b83c5e0d62d0e906578cc707).

I suspect this is the proper fix...

Ben.

> However, reverting the reversion of
> 8d610dd52dd1da696e199e4b4545f33a2a5de5c6 and instead applying
> the following patch:
> 
> diff -ru linux-source-2.6.20.orig/arch/powerpc/mm/init_32.c linux-source-2.6.20/arch/powerpc/mm/init_32.c
> --- linux-source-2.6.20.orig/arch/powerpc/mm/init_32.c  2007-02-05 05:44:54.0 +1100
> +++ linux-source-2.6.20/arch/powerpc/mm/init_32.c   2007-03-10 11:03:56.0 +1100
> @@ -244,7 +244,8 @@
>  void free_initrd_mem(unsigned long start, unsigned long end)
>  {
> if (start < end)
> -   printk ("Freeing initrd memory: %ldk freed\n", (end - start) >> 10);
> +   printk ("NOT Freeing initrd memory: %ldk freed\n", (end - start) >> 10);
> +   return;
> for (; start < end; start += PAGE_SIZE) {
> ClearPageReserved(virt_to_page(start));
> init_page_count(virt_to_page(start));
> 
> which if I recall correctly David Woodhouse posted to this thread,
> seems to have fixed it.
> 
> I dunno if it's relevant, but my initrd.img is 13193315 bytes long,
> (ie 99 bytes over 12884k) and the above logs:
> "NOT Freeing initrd memory: 12888k freed"
> which makes sense...
> 
> I of course completely failed to think to check this with the crashing
> kernel, if it seems relevant I can roll back to it and get the numbers.
> 
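The numbers quoted above can be checked with a small sketch.  It assumes
4 KiB pages and that the range handed to free_initrd_mem() is rounded up to
a whole number of pages; the helper names are made up for illustration and
are not kernel functions.

```c
#include <assert.h>

#define EX_PAGE_SIZE 4096UL     /* assumed page size, not taken from the thread */

/* Round a byte count up to the next multiple of the (assumed) page size. */
static unsigned long ex_round_up_to_page(unsigned long bytes)
{
        return (bytes + EX_PAGE_SIZE - 1) & ~(EX_PAGE_SIZE - 1);
}

/* Size in KiB that a "(end - start) >> 10" style printk would report. */
static unsigned long ex_reported_kib(unsigned long initrd_bytes)
{
        assert(initrd_bytes > 0);
        return ex_round_up_to_page(initrd_bytes) >> 10;
}
```

Under those assumptions a 13193315 byte image occupies 3222 pages, which
matches the 12888k in the log message.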



Re: [patch 00/12] Syslets, Threadlets, generic AIO support, v5

2007-03-12 Thread Milton Miller

Anton Blanchard wrote:

Hi Ingo,


this is the v5 release of the syslet/threadlet subsystem:

   http://redhat.com/~mingo/syslet-patches/


Nice!



I too went and downloaded patches-v5 for review.

First off, one problem I noticed in sys_async_wait:

+   ah->events_left = min_wait_events - (kernel_ring_idx - user_ring_idx);

This completely misses the wraparound case of kernel_ring_idx <
user_ring_idx. I wonder if this is causing some of the benchmark 
problems?

(add max_ring_index if kernel < user).
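A minimal sketch of that wraparound handling, assuming both indices always
lie in [0, max_ring_index) and that equal indices mean an empty ring.  The
helper names are hypothetical and are not taken from the syslet patches.

```c
#include <assert.h>

/*
 * Hypothetical helper: number of completed events between the user's index
 * and the kernel's index in a ring of max_ring_index slots.
 */
static unsigned long ring_events_pending(unsigned long kernel_ring_idx,
                                         unsigned long user_ring_idx,
                                         unsigned long max_ring_index)
{
        assert(kernel_ring_idx < max_ring_index);
        assert(user_ring_idx < max_ring_index);

        if (kernel_ring_idx >= user_ring_idx)
                return kernel_ring_idx - user_ring_idx;

        /* The kernel index has wrapped: add max_ring_index back in. */
        return kernel_ring_idx + max_ring_index - user_ring_idx;
}

/* Events still to wait for; zero or negative means there is nothing to wait for. */
static long ring_events_left(long min_wait_events,
                             unsigned long kernel_ring_idx,
                             unsigned long user_ring_idx,
                             unsigned long max_ring_index)
{
        return min_wait_events - (long)ring_events_pending(kernel_ring_idx,
                                                           user_ring_idx,
                                                           max_ring_index);
}
```

With an 8 slot ring, kernel index 1 and user index 6 give 3 pending events,
where the plain subtraction would have produced a negative count.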


I tried to port this to ppc64 but found a few problems:

The 64bit powerpc ABI has the concept of a TOC (r2) which is used for
per function data. This means this won't work:

[deleted]
I think we would want to change restore_ip to restore_function, and then
create a per arch helper, perhaps:

void set_user_context(struct task_struct *task, unsigned long stack,
  unsigned long function, unsigned long retval);

ppc64 could then grab the ip and r2 values from the function descriptor.


The other issue involves the syscall table:

asmlinkage struct syslet_uatom __user *
sys_async_exec(struct syslet_uatom __user *uatom,
   struct async_head_user __user *ahu)
{
        return __sys_async_exec(uatom, ahu, sys_call_table, NR_syscalls);
}

This exposes the layout of the syscall table. Unfortunately it won't work
on ppc64. In arch/powerpc/kernel/systbl.S:

#define COMPAT_SYS(func) .llong .sys_##func,.compat_sys_##func

Both syscall tables are overlaid.

Anton


In addition, the entries in the table are not function pointers, they
are the actual code targets.  So we need an arch helper to invoke the
system call.

Here is another problem with your compat code.  Just telling user space
that everything is u64 and having the kernel retrieve pointers and ulong
doesn't work, you have to actually copy in u64 values and truncate them
down.  Your current code is broken on all 32bit big endian kernels.
Actually, the check needs to be that the upper 32 bits are 0 or return
-EINVAL.
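A rough sketch of the check described above, written as plain C rather than
kernel code.  The helper name is hypothetical; the point is only that the
full u64 is copied in first and then narrowed explicitly, so the result
does not depend on which half of the u64 a 32bit load happens to hit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the syslet patches: narrow a u64 field
 * that was copied in from user space to a 32-bit kernel value.
 */
static int narrow_user_u64(uint64_t val, uint32_t *out)
{
        assert(out != NULL);

        if (val >> 32)
                return -EINVAL;         /* upper 32 bits must be zero */

        *out = (uint32_t)val;           /* explicit, endian-independent truncation */
        return 0;
}
```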

In addition, the compat syscall entry points assume that the arguments
have been truncated to compat_ulong values by the syscall entry path,
and that they only need to do sign extension (and/or pointer conversion
on s390 with its 31 bit pointers).  So all compat kernels are broken.

These two things together make me think we want two copy
functions.  At that point we may as well define the struct uatom in
terms of ulong and compat_ulong for the compat_uatom.  That would lead
to two copies of exec_uatom, but the elimination of passing the syscall
table as an argument down.  The need_resched and signal check could
become part of the common next_uatom routine, although it would need to
know uatom+1 instead of doing the addition in itself.

Other observations:

All the logic setting at and async_ready is a bit hard to follow.
After some analysis, t->at is only ever set to &t->__at and
async_ready is only set to the same at or NULL.  Both of these
should become flags, and at->task should be converted to
container_of.  Also, the name at is hard to grep / search for.

The stop flags are decoded with a case but are not densely encoded,
rather they are one hot.  We either need to error on multiple stop
bits being set, stop on each possible condition, or encode them
densely.

There is no check for flags being set that are not recognized.
If we ever add a flag for another stop condition this would
lead to incorrect execution by the kernel.
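One of the options above (error out on unknown flags and on more than one
stop bit) could look roughly like this.  The flag names and bit values are
placeholders invented for the sketch; the real definitions are in the
syslet patches.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Placeholder flag bits, for illustration only. */
#define EX_STOP_ON_NONZERO      0x01UL
#define EX_STOP_ON_ZERO         0x02UL
#define EX_STOP_ON_NEGATIVE     0x04UL
#define EX_STOP_ON_NON_POSITIVE 0x08UL
#define EX_OTHER_FLAG           0x10UL  /* stands in for any non-stop flag */

#define EX_STOP_MASK   (EX_STOP_ON_NONZERO | EX_STOP_ON_ZERO | \
                        EX_STOP_ON_NEGATIVE | EX_STOP_ON_NON_POSITIVE)
#define EX_KNOWN_FLAGS (EX_STOP_MASK | EX_OTHER_FLAG)

/* Validate flags; on success store the single stop condition (or 0) in *stop. */
static int ex_check_flags(unsigned long flags, unsigned long *stop)
{
        unsigned long s;

        assert(stop != NULL);

        if (flags & ~EX_KNOWN_FLAGS)
                return -EINVAL;         /* a bit we do not recognize */

        s = flags & EX_STOP_MASK;
        if (s & (s - 1))
                return -EINVAL;         /* more than one stop bit is set */

        *stop = s;
        return 0;
}
```

Rejecting unknown bits up front is what keeps a later-added stop condition
from being silently ignored by an older kernel.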

There are some syscalls that can return -EFAULT but later have
force_syscall_noerror.  We should create a stop on ERROR and
clear the force_noerror flag between syscalls.  The umem_add
syscall should add force_noerror if the put_user succeeds.

In copy_uatom, you call verify_read on the entire uatom.  This means
that the struct with all user space size has to be within the process
limit, which violates your assertion that userspace doesn't need the
whole structure.  If we add the requirement that the space that would
be occupied by the complete atom has to exist, then we can copy the
whole struct uatom with copy_from_user and then copy the args with
get_user.  User space can still pack them more densely, and we can
still stop copying on a null arg pointer.  Actually, calling access_ok
then __get_user can be more expensive on some architectures because
they have to verify both start and length on access_ok but can only
verify start on get_user because they have unmapped areas between user
space and kernel space.  This would also mean that we don't check
arg_ptr for NULL without verifying that get_user actually worked.

The gotos in exec_uatom are just a while loop with a break.

sys_umem_add should be in /lib under lib-y in the Makefile.
In fact declaring the function weak does not make it a weak
syscall implementation on some architectures.

Weak syscall aliases to sys_ni_syscall are needed for when
async support is not selected in Kconfig.

The Documentation 

Re: Removal of multipath cached (was Re: [PATCH] [REVISED] net/ipv4/multipath_wrandom.c: check kmalloc() return value.)

2007-03-12 Thread Jarek Poplawski
On Mon, Mar 12, 2007 at 10:22:36PM -0800, Andrew Morton wrote:
> > On Mon, 12 Mar 2007 13:53:11 -0700 (PDT) David Miller <[EMAIL PROTECTED]> wrote:
...
> > And there is absolutely no negotiations about this, I've held back on
> > this for nearly 2 years, and nothing has happened, this code is not
> > maintained, nobody cares enough to fix the bugs, and even no
> > distributions enable it because it causes crashes.
> 
> Good stuff.
> 
> I suggest you put a big printk explaining the above into 2.6.21.
> 

Plus the official way: Documentation/feature-removal-schedule.txt
in the next rc-git.

Jarek P.


Re: [PATCH] Fix vmi time header bug

2007-03-12 Thread Zachary Amsden

Andrew Morton wrote:

Really truly?   I think we have a _lot_ of declarations which omit the section
qualifier altogether.  How come they don't all break too?
  


User build was smoking this:

make O=build -j16

This and non-repeatable results make me suspect some kind of build 
dependency problem, or perhaps a make bug.  Still, please apply, as it 
doesn't hurt.


Zach


[PATCH] Remove unused set_seg_base

2007-03-12 Thread Rusty Russell
The set_seg_base function isn't used anywhere (2.6.21-rc3-git1)

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>

diff -r 0798f7cfc709 include/asm-x86_64/desc.h
--- a/include/asm-x86_64/desc.h Mon Mar 12 16:56:18 2007 +1100
+++ b/include/asm-x86_64/desc.h Tue Mar 13 11:39:16 2007 +1100
@@ -107,16 +107,6 @@ static inline void set_ldt_desc(unsigned
  DESC_LDT, size * 8 - 1);
 }
 
-static inline void set_seg_base(unsigned cpu, int entry, void *base)
-{ 
-   struct desc_struct *d = &cpu_gdt(cpu)[entry];
-   u32 addr = (u32)(u64)base;
-   BUG_ON((u64)base >> 32); 
-   d->base0 = addr & 0xffff;
-   d->base1 = (addr >> 16) & 0xff;
-   d->base2 = (addr >> 24) & 0xff;
-} 
-
 #define LDT_entry_a(info) \
       ((((info)->base_addr & 0x0000ffff) << 16) | ((info)->limit & 0x0ffff))
 /* Don't allow setting of the lm bit. It is useless anyways because 




[PATCH] Introduce load_TLS to the "for" loop.

2007-03-12 Thread Rusty Russell
GCC (4.1 at least) unrolls it anyway, but I can't believe this code
was ever justifiable.  (I've also submitted a patch which cleans up
i386, which is even uglier).

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>

diff -r de5618b5e562 include/asm-x86_64/desc.h
--- a/include/asm-x86_64/desc.h Tue Mar 13 11:41:55 2007 +1100
+++ b/include/asm-x86_64/desc.h Tue Mar 13 16:09:56 2007 +1100
@@ -135,16 +135,13 @@ static inline void set_ldt_desc(unsigned
(info)->useable == 0&& \
(info)->lm  == 0)
 
-#if TLS_SIZE != 24
-# error update this code.
-#endif
-
 static inline void load_TLS(struct thread_struct *t, unsigned int cpu)
 {
+   unsigned int i;
u64 *gdt = (u64 *)(cpu_gdt(cpu) + GDT_ENTRY_TLS_MIN);
-   gdt[0] = t->tls_array[0];
-   gdt[1] = t->tls_array[1];
-   gdt[2] = t->tls_array[2];
+
+   for (i = 0; i < GDT_ENTRY_TLS_ENTRIES; i++)
+   gdt[i] = t->tls_array[i];
 } 
 
 /*




Re: CONFIG_REORDER Kconfig help strange sentence.

2007-03-12 Thread Rusty Russell
On Tue, 2007-03-13 at 00:56 +0100, Andi Kleen wrote:
> On Tue, Mar 13, 2007 at 10:18:03AM +1100, Rusty Russell wrote:
> > OK, this confused me:
> > 
> > Function reordering (REORDER) [N/y/?] (NEW) ?
> > 
> > This option enables the toolchain to reorder functions for a more 
> > optimal TLB usage. If you have pretty much any version of binutils, 
> > this can increase your kernel build time by roughly one minute.
> > 
> > "If you have pretty much any version of binutils"?  Huh?
> > 
> > You mean "This will slow your kernel build by about a minute"?
> 
> Yes. Lots of sections seem to trigger some quadratic behaviour in ld.
> 
> It might be fixed in some unreleased CVS version though (not 100% sure) 
> 
> -Andi

OK, well here is a patch for the moment.

==
Clarify CONFIG_REORDER explanation

if (1 && X) => if (X).

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>

diff -r de5618b5e562 arch/x86_64/Kconfig
--- a/arch/x86_64/Kconfig   Tue Mar 13 11:41:55 2007 +1100
+++ b/arch/x86_64/Kconfig   Tue Mar 13 17:27:05 2007 +1100
@@ -632,8 +632,8 @@ config REORDER
default n
help
  This option enables the toolchain to reorder functions for a more 
- optimal TLB usage. If you have pretty much any version of binutils, 
-this can increase your kernel build time by roughly one minute.
+ optimal TLB usage.  This will slow your kernel build by
+roughly one minute.
 
 config K8_NB
def_bool y




Re: [RFC PATCH 1/3] Add ability to keep track of callers of symbol_(get|put)

2007-03-12 Thread Trent Piepho
On Tue, 13 Mar 2007, Rusty Russell wrote:
> Hi Trent,
>
>   Patch looks good, just one comment:
>
> On Mon, 2007-03-12 at 07:07 -0700, Trent Piepho wrote:
> > +   use = already_uses(a, b);
> > +   if (!use) {
> > +   printk(KERN_ERR "module %s trying to un-use a module, %s, which "
> > +  "it is not using", a->name, b->name);
> > +return 0;
> > +   }
>
> s/return 0/BUG()/.  This is potentially quite a nasty bug.

Ok, I did that before, I'll change it back.

Note that the reference counting isn't perfect when it comes to catching
mistakes.

The fundamental problem is that when a module is loaded and linked, all the
modules that it used symbols from gain a "use".  To be symmetric, when a
module is unloaded all the modules it used symbols from should lose a
"use".  Except, there is no record of what modules gained a "use" at link
time.  Suppose module 1 uses a symbol from module 2.  At link time, a
module_use that "1 uses 2" is created.  Now say 1 does a symbol_put() on
something in 2, with no matching get.  The "1 uses 2" goes away.  When 1 is
unloaded, there is no way to tell that "1 uses 2", deleted by the extra
put, is missing.

If it's wanted, I think I could fix this.  I'd have a separate count of
static uses vs dynamic uses.

From: Trent Piepho <[EMAIL PROTECTED]>

Add ability to keep track of callers of symbol_(get|put)

When a module uses symbol_get() to increase the ref count of another
module, there is no record of which module called symbol_get().  A module
can show up as having other users, but there is no way to tell who those
users are.

This adds that ability to symbol_put() and symbol_get().

__symbol_get() and __symbol_put() gain another parameter, which specifies
the module that is doing the getting or putting.  symbol_put_addr() is
renamed to __symbol_put_addr() and has the same parameter added.  The
module can be NULL, in which case the symbol's owner's refcount is
incremented without recording who did it, as was the case before.

The macros symbol_get(), symbol_put(), and symbol_put_addr() will use
THIS_MODULE as the getter/putter and so don't have an extra parameter.  A
macro symbol_put_user() is added that allows specifying the putting
module.

The module_use structure that keeps track of one module's use of another
gains a count member.  The module_use will not go away until the count
goes down to zero.  The count wasn't necessary before because a module
could only use another module once, when the module was linked in, and
un-use that module once, when it was unloaded.

When a module calls symbol_get() to get a symbol from the module that owns
the symbol, the ref count of the owning module is _not_ incremented if
the getting module was already listed as using the owning module.
Rather, the count of that module_use is incremented.

When a module is loaded and the kernel module linker is resolving
symbols, it will not increment the module_use count for each symbol used,
but will just leave it at one.  We don't count each symbol resolved,
because during module unloading we wouldn't know how many times to
decrement the module_use count.

When the module is unloaded, the module_use count will only be
decremented by one, which should bring it to zero.  If it's not zero,
then the remaining count is the number of symbol_get()s the module did
that were unmatched with a symbol_put().

Signed-off-by: Trent Piepho <[EMAIL PROTECTED]>

diff --git a/include/linux/module.h b/include/linux/module.h
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -167,9 +167,10 @@ struct notifier_block;
 #ifdef CONFIG_MODULES
 
 /* Get/put a kernel symbol (calls must be symmetric) */
-void *__symbol_get(const char *symbol);
+void *__symbol_get(const char *symbol, struct module *user);
 void *__symbol_get_gpl(const char *symbol);
-#define symbol_get(x) ((typeof(&x))(__symbol_get(MODULE_SYMBOL_PREFIX #x)))
+#define symbol_get(x) ((typeof(&x))(__symbol_get(MODULE_SYMBOL_PREFIX #x, \
+   THIS_MODULE)))
 
 #ifndef __GENKSYMS__
 #ifdef CONFIG_MODVERSIONS
@@ -386,9 +387,11 @@ extern void __module_put_and_exit(struct
 
 #ifdef CONFIG_MODULE_UNLOAD
 unsigned int module_refcount(struct module *mod);
-void __symbol_put(const char *symbol);
-#define symbol_put(x) __symbol_put(MODULE_SYMBOL_PREFIX #x)
-void symbol_put_addr(void *addr);
+void __symbol_put(const char *symbol, struct module *user);
+#define symbol_put(x) __symbol_put(MODULE_SYMBOL_PREFIX #x, THIS_MODULE)
+#define symbol_put_user(x,u) __symbol_put(MODULE_SYMBOL_PREFIX #x, (u))
+void __symbol_put_addr(void *addr, struct module *user);
+#define symbol_put_addr(x) __symbol_put_addr((x), THIS_MODULE)
 
 /* Sometimes we know we already have a refcount, and it's easier not
to handle the error case (which only happens with rmmod --wait). */
diff --git a/kernel/module.c b/kernel/module.c
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -516,30 +516,54 @@ struct module_use
 {
struct list

Re: [ck] Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2

2007-03-12 Thread Mike Galbraith
On Tue, 2007-03-13 at 17:16 +1100, Con Kolivas wrote:
> On Tuesday 13 March 2007 17:08, Mike Galbraith wrote:
> > Virtual or physical cores has nothing to do with the interactivity
> > regression I noticed.  Two nice 0 tasks which combined used 50% of my
> > box can no longer share that box with two nice 5 tasks and receive the
> > 50% they need to perform.  That's it. From there, we wandered off into a
> > discussion on the relative merit and pitfalls of fairness.
> 
> And again, with X in its current implementation it is NOT like two nice 0 
> tasks at all; it is like one nice 0 task. This is being fixed in the X design 
> as we speak.

Shrug.  I don't live then, I live now.  I have expressed my concerns,
and will now switch from talk back to listen mode.

-Mike



Re: _proxy_pda still makes linking modules fail

2007-03-12 Thread Rusty Russell
On Tue, 2007-03-13 at 08:59 +1100, Rusty Russell wrote:
> On Mon, 2007-03-12 at 10:48 +0100, Andi Kleen wrote:
> > > Rusty's pda->per_cpu patch will deal with this once and for all; have
> > 
> > Not on x86-64.
> 
> Indeed.  Perhaps it's time I join the modern world and compile a 64-bit
> kernel...
> 
> Will prepare patches,

No, I don't think I will.  The PDA concept has gone too far in x86-64 to
be undone.  In particular, it's been put in GCC 4.1 for
CONFIG_CC_STACKPROTECTOR, which assumes %gs:40 will give the stack
canary.

For the record: the PDA should never have existed, that's what percpu
vars were supposed to be for.  Something went wrong here 8(

%gs is best set to the offset of the local cpu's area from the "master"
per-cpu area, not set to the local cpu area's address.  In the former
case, booting with %gs at offset 0 works naturally, in the latter case,
hoops need to be jumped through to make it work.  See how much nicer the
x86 code is post pda->percpu conversion.

So, even if we leave the PDA and place the per-cpu area immediately
after it, we still can't use "%gs:var" to access a per-cpu variable: we
need to do a subtract, so why bother using the segment reg?
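The offset scheme can be modelled in user space.  This is only a sketch
under made-up names: an ordinary array stands in for the per-cpu areas and
a table of byte offsets stands in for what the segment base would hold.  It
is not how any kernel tree implements per-cpu variables.

```c
#include <assert.h>
#include <stddef.h>

#define EX_NR_CPUS 4                    /* illustrative value */

struct ex_percpu {                      /* stand-in for the per-cpu section */
        int  counter;
        long stat;
};

/* ex_areas[0] plays the "master" copy; ex_areas[1 + cpu] is each cpu's copy. */
static struct ex_percpu ex_areas[1 + EX_NR_CPUS];

/* Byte offset of each cpu's area from the master area; 0 until set up. */
static ptrdiff_t ex_cpu_offset[EX_NR_CPUS];

static void ex_setup_cpu(int cpu)
{
        assert(cpu >= 0 && cpu < EX_NR_CPUS);
        ex_areas[1 + cpu] = ex_areas[0];        /* start from the master values */
        ex_cpu_offset[cpu] = (char *)&ex_areas[1 + cpu] - (char *)&ex_areas[0];
}

/* Address of the master variable plus this cpu's offset. */
static int *ex_counter(int cpu)
{
        assert(cpu >= 0 && cpu < EX_NR_CPUS);
        return (int *)((char *)&ex_areas[0].counter + ex_cpu_offset[cpu]);
}
```

Before ex_setup_cpu() runs the offset is 0, so accesses simply land in the
master copy, which is the "booting at offset 0 works naturally" case.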

The ideal solution has always been to use __thread, but no architecture
has yet managed it (I tried for i386, and it quickly caused unbearable
pain).  On x86-64 that uses "%fs", not "%gs" as the kernel
does, but I might try that if I feel particularly masochistic soon...

In summary, containing the PDA infection to x86-64 is possible, but
curing that patient is non-trivial 8)

Rusty.



Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2

2007-03-12 Thread David Lang

On Mon, 12 Mar 2007, Lee Revell wrote:


On 3/12/07, David Lang <[EMAIL PROTECTED]> wrote:

the problem comes when this isn't enough. if you have several CPU hogs on a
system, and they are all around the same priority level, how can the scheduler
know which one needs the CPU the most for good interactivity?

in some cases you may be able to directly detect that your high-priority process
is waiting for another one (tracing pipes and local sockets for example), but
what if you are waiting for several of them? (think a multimedia desktop waiting
for the sound card, CDRom, hard drive, and video all at once) which one needs
the extra CPU the most?


I'm not an expert in this area by any means but after reading this
thread the OSX solution of simply telling the kernel "I'm the GUI,
schedule me accordingly" looks increasingly attractive.  Why make the
kernel guess when we can just be explicit?


this can solve the specific problem (and since 'nice' is the natural way to tell 
the kernel this, it's not even a one-shot solution).


however Linus is right, the real underlying problem is where the user is 
waiting on a server. if this issue could be solved then a lot of things would 
benefit.


Con, as a quick hack (probably a bad idea as I'm not a scheduling expert), if a 
program blocks on another program (via a pipe or socket) could you easily give 
the rest of the first program's timeslice to the second one, without making it 
lose its own?


I'm thinking that doing the dumb thing and just throwing a bit more CPU at the 
thing you are waiting for may work. (assuming that the server process actually 
does something useful with the extra CPU time it gets)


as far as latencies go, it would be like turning every process on the system 
into a cpu hog.


David Lang


Does anyone know of a UNIX-like system that has managed to solve this
problem without hooking the GUI into the scheduler?

Lee




Re: [PATCH] Fix vmi time header bug

2007-03-12 Thread Zachary Amsden

Andrew Morton wrote:

Really truly?   I think we have a _lot_ of declarations which omit the section
qualifier altogether.  How come they don't all break too?
  


According to the report I have.  Perhaps a bogus section qualifier does 
more damage than an omitted one.  I'll get gcc  / linker version, but 
this could be a combination of user error, a strange toolchain, and 
perhaps a real bug somewhere.



(ARM (at least) in fact does require the section tagging on the declaration as
well as the definition, but we've thus far only fixed that in a couple of places
which were causing breakage).
  


Yes, I was surprised by this as well, and I'm still skeptical about this 
being the real cause.  Still, this reportedly fixed the problem, and is 
certainly not a bad thing.


Zach


Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2

2007-03-12 Thread Mike Galbraith
On Tue, 2007-03-13 at 16:53 +1100, Con Kolivas wrote:
> On Tuesday 13 March 2007 16:10, Mike Galbraith wrote:

> > I'm not trying to be pig-headed.  I'm of the opinion that fairness is
> > great... until you strictly enforce it wrt interactive tasks.
> 
> How about answering my question then since I offered you numerous combinations
> of ways to tackle the problem? The simplest one doesn't even need code, it
> just needs you to alter the nice value that you're already setting.

Hey, you specifically asked me to not choose 5 :)  (I mentioned 5
earlier in the thread anyway, so no sense in repeating myself)

-Mike



Re: [ck] Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2

2007-03-12 Thread Con Kolivas
On Tuesday 13 March 2007 17:08, Mike Galbraith wrote:
> Virtual or physical cores has nothing to do with the interactivity
> regression I noticed.  Two nice 0 tasks which combined used 50% of my
> box can no longer share that box with two nice 5 tasks and receive the
> 50% they need to perform.  That's it. From there, we wandered off into a
> discussion on the relative merit and pitfalls of fairness.

And again, with X in its current implementation it is NOT like two nice 0 
tasks at all; it is like one nice 0 task. This is being fixed in the X design 
as we speak.

>   -Mike

-- 
-ck


Re: [ck] Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2

2007-03-12 Thread Mike Galbraith
On Mon, 2007-03-12 at 17:38 -0400, michael chang wrote:

> Perhaps, Mike Galbraith, do you feel that it should be possible to use
> the CPU at 100% for some task and still maintain excellent
> interactivity?

Within reason, yes.  Defining "reason" is difficult.  As we speak, this
is possible to a much greater degree than with RSDL.  Before anybody
pipes in, yes, I'm very much aware of the down side of the interactivity
estimator, I've waged bloody battles with it, and have the t-shirt :)

> That said, I haven't run the test case in particular yet, although I
> will see if I can get the time to do so soon. In any case, I
> personally do have a few qualms about this test case being run on HT
> virtual cores:

Virtual or physical cores has nothing to do with the interactivity
regression I noticed.  Two nice 0 tasks which combined used 50% of my
box can no longer share that box with two nice 5 tasks and receive the
50% they need to perform.  That's it. From there, we wandered off into a
discussion on the relative merit and pitfalls of fairness.

-Mike



Re: Djprobes questions

2007-03-12 Thread Masami Hiramatsu
Hi Mathieu,

Mathieu Desnoyers wrote:
> Hi Masami,
> 
> I recently had to add support for inline code patching on i386 to my
> marker infrastructure. Clearly, it looks like what is done in djprobes,
> with the main difference that I only patch the immediate value of a 2
> bytes "load immediate" instruction.

That's interesting.

> I think I found a solution to one of the main issues with djprobes : it
> currently has to wait for each CPU to hit the probe before being sure
> that it's safe to patch the code with something else than an int3. This
> is due to PIII errata 49, which says that a CPU must execute a
> serializing instruction before executing cross-modified code.

Hmm, djprobe already does not have to wait for each CPU to hit the probe
point. It just waits for scheduler synchronization instead.
After that, it issues cpuid for cache serialization before
executing cross-modified code.

The most difficult point of djprobe is that it has to replace
"live" instructions. So we must carefully check that no other processor
is running those instructions.

> Here is what I do : While I use a breakpoint to fall in a trap for the
> CPUs that hit the site currently being modified, I also send an IPI to
> all CPUs so they execute cpuid. Once it returns, I am sure that every
> CPU has executed a serializing instruction, which enables me to go on
> with the complete code modification, therefore removing the initial
> breakpoint.

I think it's OK. That is the same way I've done it in djprobe.

> Here is my code :
> 
> http://ltt.polymtl.ca/cgi-bin/gitweb.cgi?p=linux-2.6-lttng.git;a=blob;f=arch/i386/kernel/marker.c;h=89b06f02f0966685be260d6364a0dd94c3d14456;hb=v2.6.20-lttng
> 
> (Comments are welcome)
> 
> On a second note, looking at the djprobes code triggered some question 
> in my mind about the safety of using a worker thread to "make sure"
> every interrupt context has returned (so there is no IP pointing into
> the modified code). The following scenario might be possible : an
> interrupt handler (or trap handler) reenables interrupts, does irq_exit()
> or nmi_exit() (which reenables preemption) but does not do iret yet. My
> understanding is that it could be scheduled and have a return IP
> pointing to the code that is being modified. Am I right ?

The same idea was already discussed. It might work on a normal kernel,
but, unfortunately, it doesn't work on a preemptive kernel. And actually,
that idea is the same as synchronize_sched(). So, I've used it on the
normal kernel. In the case of a preemptive kernel, I'm currently using
freeze_processes(), as suggested by Ingo.

Anyway, Satoshi and I are developing a static analysis tool to
check whether target instructions can be replaced by a long jump.
I'd like to release the djprobe patch against the latest kernel after
we have developed it.

Best regards,

-- 
Masami HIRAMATSU
Linux Technology Center
Hitachi, Ltd., Systems Development Laboratory
E-mail: [EMAIL PROTECTED]




Re: [ck] Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2

2007-03-12 Thread Rodney Gordon II
On Tuesday 13 March 2007 00:53, Con Kolivas wrote:
> On Tuesday 13 March 2007 16:10, Mike Galbraith wrote:
> > On Tue, 2007-03-13 at 09:51 +1100, Con Kolivas wrote:
> > > On 13/03/07, Mike Galbraith <[EMAIL PROTECTED]> wrote:
> > > > As soon as your cpu is fully utilized, fairness loses or
> > > > interactivity loses.  Pick one.
> > >
> > > That's not true unless you refuse to prioritise your tasks
> > > accordingly. Let's take this discussion in a different direction. You
> > > already nice your lame processes. Why? You already have the concept
> > > that you are prioritising things to normal or background tasks. You
> > > say so yourself that lame is a background task. Stating the bleedingly
> > > obvious, the unix way of prioritising things is via nice. You already
> > > do that. So moving on from that...
> >
> > Sure.  If a user wants to do anything interactive, they can indeed nice
> > 19 the rest of their box before they start.
> >
> > > Your test case you ask "how can I maximise cpu usage". Well you know
> > > the answer already. You run two threads. I won't dispute that.
> > >
> > > The debate seems to be centered on whether two tasks that are niced +5
> > > or to a higher value is background. In my opinion, nice 5 is not
> > > background, but relatively less cpu. You already are savvy enough to
> > > be using two threads and nicing them. All I ask you to do when using
> > > RSDL is to change your expectations slightly and your settings from
> > > nice 5 to nice 10 or 15 or even 19. Why is that so offensive to you?
> >
> > It's not "offensive" to me, it is a behavioral regression.  The
> > situation as we speak is that you can run cpu intensive tasks while
> > watching eye-candy.  With RSDL, you can't, you feel the non-interactive
> > load instantly.  Doesn't the fact that you're asking me to lower my
> > expectations tell you that I just might have a point?

I do not feel nearly any non-interactive load. See below.

>
> Yet looking at the mainline scheduler code, nice 5 tasks are also supposed
> to get 75% cpu compared to nice 0 tasks, however I cannot seem to get 75%
> cpu with a fully cpu bound task in the presence of an interactive task. To
> me that means mainline is not living up to my expectations. What you're
> saying is your expectations are based on a false cpu expectation from nice
> 5. You can spin it both ways. It seems to me the only one that lives up to
> a defined expectation is to be fair. Anything else is at best vague, and at
> worst starvation prone.
>
> > > Please don't pick 5.none of the above. Please try to work with me on
> > > this.
> >
> > I'm not trying to be pig-headed.  I'm of the opinion that fairness is
> > great... until you strictly enforce it wrt interactive tasks.
>
> How about answering my question then since I offered you numerous
> combinations of ways to tackle the problem? The simplest one doesn't even
> need code, it just needs you to alter the nice value that you're already
> setting.

Also, just to chime in, I am doing a large project converting over 250GB of 
FLAC audio to MP3 via lame for my archive conversion.

I am using 2.6.20.2-rsdl0.30, and I have 2 processes of flac decoding/lame 
encoding running simultaneously from a perl script I hacked up on my P-D 830. 
These processes are both nice'd to 19.

I have almost no degradation in latency in my usage of X (which is at nice 0),
if that matters at all. Please try what Con is suggesting by adjusting your 
nice level, and see if that helps you at all.

These are just useless arguments, time better spent on coding and fixing real 
problems, than a flamewar on whether nice 5 is good enough or not.

Con's rsdl implements what ingosched was supposed to do, wrt the niceness 
levels. Perhaps Mike, you are used to the impression ingosched gave you with 
nice +5, but try something else as Con suggested.. +10, +15, hell, whatever. 
Is that so hard?

My 2c,
-r

-- 
Rodney "meff" Gordon II -*- [EMAIL PROTECTED]
Systems Administrator / Coder Geek -*- Open yourself to OpenSource


Re: SMP performance degradation with sysbench

2007-03-12 Thread Eric Dumazet

Anton Blanchard wrote:
> Hi Nick,
>
>> Anyway, I'll keep experimenting. If anyone from MySQL wants to help look
>> at this, send me a mail (eg. especially with the sched_setscheduler issue,
>> you might be able to do something better).
>
> I took a look at this today and figured I'd document it:
>
> http://ozlabs.org/~anton/linux/sysbench/
>
> Bottom line: it looks like issues in the glibc malloc library, replacing
> it with the google malloc library fixes the negative scaling:
>
> # apt-get install libgoogle-perftools0
> # LD_PRELOAD=/usr/lib/libtcmalloc.so /usr/sbin/mysqld

Hi Anton, thanks for the report.
glibc certainly has many scalability problems.

One of the known problems is its (ab)use of mmap() to allocate one (yes: one!)
page every time you fopen() a file, followed by a munmap() at fclose() time.

mmap()/munmap() should be avoided like hell in multithreaded programs.


Re: [Bugme-new] [Bug 8187] New: 2.6.20 "PCI: Quirks" patch breaks X11 on I82801

2007-03-12 Thread Greg KH
On Mon, Mar 12, 2007 at 10:19:52PM -0800, Andrew Morton wrote:
> > On Mon, 12 Mar 2007 13:30:05 -0700 [EMAIL PROTECTED] wrote:
> > http://bugzilla.kernel.org/show_bug.cgi?id=8187
> > 
> >Summary: 2.6.20 "PCI: Quirks" patch breaks X11 on I82801
> > Kernel Version: 2.6.20
> > Status: NEW
> >   Severity: normal
> >  Owner: [EMAIL PROTECTED]
> >  Submitter: [EMAIL PROTECTED]
> > 
> > 
> > Most recent kernel where this bug did *NOT* occur:
> > Any 2.6.20-pre prior to commit 368c73d4f689dae0807d0a2aa74c61fd2b9b075f
> > 
> > Distribution:  Slackware 11.0
> > Hardware Environment:  HP/Compaq dc5000S (P4, 82801, 82865)
> > Software Environment:  Xorg 6.9.0
> > Problem Description:
> > 
> > Alan Cox introduced a "PCI: Quirks" patch (git commit
> > 368c73d4f689dae0807d0a2aa74c61fd2b9b075f) in 2.6.20 that breaks X11 on this
> > I82801 platform.  Specifically, it causes the PCI initialisation to become
> > buggered; Xorg 6.9.0 dumps the following to the console:
> > (EE) end of block range 0x177 < begin 0x3f0
> > (EE) end of block range 0x177 < begin 0x3f0
> > (WW) INVALID IO ALLOCATION b: 0x14d0 e: 0x14d7 correcting
> > [...]
> > Backtrace:
> > 0: X(xf86SigHandler+0x8a) [0x8088b2a]
> > 1: [0xb7f2b420]
> > 2: /usr/X11R6/lib/modules/drivers/i810_drv.so [0xb797f592]
> > 3: X(InitOutput+0xb83) [0x8072713]
> > 4: X(main+0x226) [0x80d4496]
> > 5: /lib/tls/libc.so.6(__libc_start_main+0xd4) [0xb7da7e14]
> > 6: X [0x806ff61]
> > 
> > Fatal server error:
> > Caught signal 11.  Server aborting
> > 
> > Steps to reproduce:
> > 
> > Reverting the git commit mentioned above fixes the issue.  Apparently, this 
> > may
> > be limited to certain combinations of on-motherboard chipsets, as I haven't 
> > seen
> > many bug reports.  Googling shows some people having X11 segfault issues 
> > with
> > 2.6.20 (e.g. freedesktop.org bug #9956) but in most of those cases it's due 
> > to
> > the evdev driver and not PCI initialisation.
> > 
> > I wrote to Alan (cc'ed Greg as he signed off on the patch) nearly two weeks 
> > ago
> > but have heard nothing, so I'm leaving a bug here instead.
> > 
> 
> argh.
> 
> Would we break more machines than we fix if we just revert that?

I don't know, Alan?

thanks,

greg k-h


Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2

2007-03-12 Thread Con Kolivas
On Tuesday 13 March 2007 16:10, Mike Galbraith wrote:
> On Tue, 2007-03-13 at 09:51 +1100, Con Kolivas wrote:
> > On 13/03/07, Mike Galbraith <[EMAIL PROTECTED]> wrote:
> > > As soon as your cpu is fully utilized, fairness loses or interactivity
> > > loses.  Pick one.
> >
> > That's not true unless you refuse to prioritise your tasks
> > accordingly. Let's take this discussion in a different direction. You
> > already nice your lame processes. Why? You already have the concept
> > that you are prioritising things to normal or background tasks. You
> > say so yourself that lame is a background task. Stating the bleedingly
> > obvious, the unix way of prioritising things is via nice. You already
> > do that. So moving on from that...
>
> Sure.  If a user wants to do anything interactive, they can indeed nice
> 19 the rest of their box before they start.
>
> > Your test case you ask "how can I maximise cpu usage". Well you know
> > the answer already. You run two threads. I won't dispute that.
> >
> > The debate seems to be centered on whether two tasks that are niced +5
> > or to a higher value is background. In my opinion, nice 5 is not
> > background, but relatively less cpu. You already are savvy enough to
> > be using two threads and nicing them. All I ask you to do when using
> > RSDL is to change your expectations slightly and your settings from
> > nice 5 to nice 10 or 15 or even 19. Why is that so offensive to you?
>
> It's not "offensive" to me, it is a behavioral regression.  The
> situation as we speak is that you can run cpu intensive tasks while
> watching eye-candy.  With RSDL, you can't, you feel the non-interactive
> load instantly.  Doesn't the fact that you're asking me to lower my
> expectations tell you that I just might have a point?

Yet looking at the mainline scheduler code, nice 5 tasks are also supposed to 
get 75% cpu compared to nice 0 tasks, however I cannot seem to get 75% cpu 
with a fully cpu bound task in the presence of an interactive task. To me 
that means mainline is not living up to my expectations. What you're saying 
is your expectations are based on a false cpu expectation from nice 5. You 
can spin it both ways. It seems to me the only one that lives up to a defined 
expectation is to be fair. Anything else is at best vague, and at worst 
starvation prone.
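
The 75% figure can be reproduced from the way the mainline O(1) scheduler
scales a task's base timeslice with its static priority. A minimal sketch of
that arithmetic follows. The constants (a 100 ms default timeslice, a 5 ms
minimum, MAX_PRIO of 140, MAX_USER_PRIO of 40) and the function name
timeslice_ms are assumptions drawn from 2.6.x kernel/sched.c rather than from
this thread, and the sketch ignores the interactivity bonus entirely:

```sh
#!/bin/sh
# Sketch of the mainline O(1) scheduler's nice-to-timeslice scaling.
# All constants are assumptions from 2.6.x kernel/sched.c, in milliseconds.
DEF_TIMESLICE=100
MIN_TIMESLICE=5
MAX_PRIO=140
MAX_USER_PRIO=40

# timeslice_ms NICE: print the base timeslice for a nice level (-20..19).
timeslice_ms() {
    nice=$1
    static_prio=$((120 + nice))          # nice 0 maps to static priority 120
    if [ "$nice" -lt 0 ]; then
        base=$((DEF_TIMESLICE * 4))      # negative nice scales from a 4x base
    else
        base=$DEF_TIMESLICE
    fi
    slice=$((base * (MAX_PRIO - static_prio) / (MAX_USER_PRIO / 2)))
    if [ "$slice" -lt "$MIN_TIMESLICE" ]; then
        slice=$MIN_TIMESLICE
    fi
    echo "$slice"
}

for n in 0 5 10 19; do
    echo "nice $n: $(timeslice_ms "$n") ms"
done
```

Under these assumptions nice 5 gets a 75 ms slice against 100 ms for nice 0,
and nice 19 bottoms out at the 5 ms minimum.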

> > Please don't pick 5.none of the above. Please try to work with me on
> > this.
>
> I'm not trying to be pig-headed.  I'm of the opinion that fairness is
> great... until you strictly enforce it wrt interactive tasks.

How about answering my question then since I offered you numerous combinations 
of ways to tackle the problem? The simplest one doesn't even need code, it 
just needs you to alter the nice value that you're already setting.

-- 
-ck


Re: Stracing Amanda (was: RSDL for 2.6.21-rc3- 0.29)

2007-03-12 Thread Gene Heskett
On Tuesday 13 March 2007, Willy Tarreau wrote:
>On Tue, Mar 13, 2007 at 12:04:42AM -0400, Gene Heskett wrote:
>> On Monday 12 March 2007, Nish Aravamudan wrote:
>> >On 3/12/07, Gene Heskett <[EMAIL PROTECTED]> wrote:
>> >> On Monday 12 March 2007, Douglas McNaught wrote:
>> >> >Patrick Mau <[EMAIL PROTECTED]> writes:
>> >> >> Why not temporarily replace "/bin/tar" with a shell script that
>> >> >> does:
>> >> >>
>> >> >> #!/bin/sh
>> >> >> exec strace -f -o output /bin/real.tar "$@"
>> >> >
>> >> >You beat me to it.  :) I've done that before; it's a great
>> >> > suggestion.
>> >> >
>> >> >Except that if you expect 'tar' to be invoked multiple times in a
>> >> > run, you should probably use 'output.$$' for the output filename
>> >> > so things don't get clobbered.
>> >> >
>> >> >-Doug
>> >>
>> >> In my case, Doug, it will get invoked 64 times, amanda does a dummy
>> >> run to get an estimate, calculates what to do based on that output
>> >> which is 32 runs, 1 per disklist entry and I have 32, and then
>> >> reruns tar with the appropriate level options against each
>> >> individual disklist entry.
>> >>
>> >> But I'm puzzled a bit, what does the double $$ do?  Or is it buried
>> >> someplace in the bash manpage?  It's not something I've stumbled
>> >> over yet.
>> >
>> >buried indeed:
>> >
>> >"Special Parameters:
>> >  ...
>> >   $  Expands to the process ID of the shell.  In a () subshell, it
>> > expands to the process ID of the current shell, not the subshell.
>> >"
>>
>> Well, that's clear enough, but what of the double $$ case?  Would this
>> then make a PID unique to each invocation until it finally wraps a 16
>> bit value, or will the kernel re-use them because they won't all be
>> running simultaneously, but limited by the number of unique 'spindle'
>> numbers on the system, this to prevent as best as it can, the
>> thrashing of a drive by having tar working on 2 separate (or more)
>> partitions at the same time.  In my case 2 are possible, as /var is on
>> a separate drive.
>
>Yes, there is a risk of wrapping, but it is very small. You can add the
> command line arguments to the file name if you want, like this :
>
>#!/bin/bash
>exec strace -f -o "output.$$.${*//\//_}" /bin/real.tar "$@"
>
>It will name the output file "output.<pid>.<args>", replacing slashes
> with underscores. This is very dirty but can help.
>
Excellent Willy, thanks.

>Cheers,
>Willy
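
Willy's one-liner relies on the ${*//\//_} substitution, which is a bash
extension. A minimal sketch of the same naming scheme in plain POSIX sh,
assuming a hypothetical helper called trace_name and using tr for the slash
replacement:

```sh
#!/bin/sh
# trace_name ARGS...: print a per-invocation strace output file name.
# (hypothetical helper for illustration, not part of the thread)
trace_name() {
    # $$ is the PID of the running shell, so each wrapper invocation
    # gets its own file; slashes in the arguments become underscores.
    printf 'output.%s.%s\n' "$$" "$(printf '%s' "$*" | tr '/' '_')"
}

# The wrapper would then end with:
#   exec strace -f -o "$(trace_name "$@")" /bin/real.tar "$@"
```

Spaces between arguments are kept in the file name, as in the original
one-liner, so the name has to stay quoted wherever it is used.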



-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Whatever doesn't succeed in two months and a half in California will
never succeed.
-- Rev. Henry Durant, founder of the University of California


Re: [PATCH] cleanfile: a script to clean up stealth whitespace

2007-03-12 Thread H. Peter Anvin

H. Peter Anvin wrote:
>> Fair enough.
>>
>> It'd be nice to have a clean-up-a-patch version of this.  So it does
>> all these things, except it only changes lines which start with ^+.
>
> It can do everything except kill empty lines at the end of the file; a
> patch simply doesn't contain enough information to know if blank lines
> are inserted at the end of a file as opposed to in the middle of the file.
>
> It can, of course, be done if the unpatched material is available,
> probably by applying the patch and seeing what happens.

Correction: for a context/unified diff it can be done by observing that
there is no context left at the end of the file.  It won't work if the
file already has empty space at the end of it, but that's probably good
enough.  I'll cook something up.
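
The observation about missing trailing context can be sketched as a small
filter over a unified diff. This is only an illustration under stated
assumptions: the function name trailing_blank_adds is hypothetical, the input
is a single-file unified diff, and hunk content lines are assumed not to look
like file or hunk headers:

```sh
#!/bin/sh
# trailing_blank_adds [PATCH]: print how many blank lines the last hunk
# adds with no context after them, i.e. at the end of the patched file.
trailing_blank_adds() {
    awk '
    /^(---|\+\+\+) / || /^@@ / { n = 0; next }  # file and hunk headers
    /^\+[ \t]*$/               { n++;  next }  # added blank line
                                { n = 0 }       # context, removal, other add
    END                         { print n + 0 }
    ' "$@"
}
```

A non-zero result means the patch ends by appending blank lines with no
context following them, which is the case a patch cleaner would want to trim.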


-hpa


Re: [PATCH] kthread_should_stop_check_freeze (was: Re: [PATCH -mm 3/7] Freezer: Remove PF_NOFREEZE from rcutorture thread)

2007-03-12 Thread Srivatsa Vaddagiri
On Tue, Mar 13, 2007 at 10:57:16AM +0530, Gautham R Shenoy wrote:
> CPU_DEAD:
> thaw_process(p);
> kthread_stop(p);
> p = NULL;

This need not guarantee that the thread will see the stop request before
it exits the kthread_should_stop_freeze() function. There will always
be races... So the only safe way for a thread to know whether it is time
to exit is:

	while (!kthread_should_stop_freeze()) {
		if (!cpu_online(home_cpu))
			goto wait_to_die;

		...
	}

wait_to_die:
	while (!kthread_should_stop()) {
		/* sleep */
	}


-- 
Regards,
vatsa


Re: [PATCH] cleanfile: a script to clean up stealth whitespace

2007-03-12 Thread H. Peter Anvin

Andrew Morton wrote:
> On Mon, 12 Mar 2007 12:16:30 -0700 "H. Peter Anvin" <[EMAIL PROTECTED]> wrote:
>> This script cleans up various classes of stealth whitespace.  In
>> particular, it cleans up:
>>
>> - Whitespace (spaces or tabs) before newline;
>> - DOS line endings (CR before LF);
>> - Space before tab (spaces are deleted or converted to tabs);
>> - Empty lines at end of file.
>
> Fair enough.
>
> It'd be nice to have a clean-up-a-patch version of this.  So it does
> all these things, except it only changes lines which start with ^+.

It can do everything except kill empty lines at the end of the file; a
patch simply doesn't contain enough information to know if blank lines
are inserted at the end of a file as opposed to in the middle of the file.

It can, of course, be done if the unpatched material is available,
probably by applying the patch and seeing what happens.

Let me know if you still want it; I'll whip it up.
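
Three of the four cleanups listed above are simple enough to sketch in a few
lines of awk. This is not the cleanfile script itself (which this message does
not include) but a hypothetical stand-in named clean_stream. The
space-before-tab rule is left out because it needs column tracking:

```sh
#!/bin/sh
# clean_stream [FILE...]: write a cleaned copy of the input to stdout.
# (hypothetical stand-in for illustration, not the cleanfile script)
clean_stream() {
    awk '
    {
        sub(/[ \t\r]+$/, "")           # trailing blanks and the CR of a CRLF
        if ($0 == "") { held++; next } # hold blank lines back for now
        while (held > 0) { print ""; held-- }  # not at EOF: release them
        print
    }' "$@"
}
```

Blank lines still held when the input ends are never printed, which is what
removes the empty lines at the end of the file.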

-hpa


Re: [PATCH] Fix vmi time header bug

2007-03-12 Thread Andrew Morton
> On Mon, 12 Mar 2007 14:58:08 -0800 Zachary Amsden <[EMAIL PROTECTED]> wrote:
> Some gcc put this function in .init.text because the header didn't 
> match.  For 2.6.21-rc.
> 
> Zach
> 
> 
> [vmi-devinit-header-fix.patch  text/plain (606B)]
> 
> 
> Index: linux-2.6.21/include/asm-i386/vmi_time.h
> ===
> --- linux-2.6.21.orig/include/asm-i386/vmi_time.h 2007-03-06 
> 18:56:03.0 -0800
> +++ linux-2.6.21/include/asm-i386/vmi_time.h  2007-03-12 13:55:16.0 
> -0800
> @@ -54,7 +54,7 @@ extern unsigned long vmi_cpu_khz(void);
>  
>  #ifdef CONFIG_X86_LOCAL_APIC
>  extern void __init vmi_timer_setup_boot_alarm(void);
> -extern void __init vmi_timer_setup_secondary_alarm(void);
> +extern void __devinit vmi_timer_setup_secondary_alarm(void);
>  extern void apic_vmi_timer_interrupt(void);
>  #endif

Really truly?   I think we have a _lot_ of declarations which omit the section
qualifier altogether.  How come they don't all break too?

(ARM (at least) in fact does require the section tagging on the declaration as
well as the definition, but we've thus far only fixed that in a couple of places
which were causing breakage).


Re: RSDL v0.30 cpu scheduler for mainline kernels

2007-03-12 Thread David Miller
From: Willy Tarreau <[EMAIL PROTECTED]>
Date: Tue, 13 Mar 2007 05:32:07 +0100

> On Tue, Mar 13, 2007 at 02:05:23PM +1100, Con Kolivas wrote:
> > On Tuesday 13 March 2007 10:46, David Miller wrote:
> > > From: Con Kolivas <[EMAIL PROTECTED]>
> > > Date: Mon, 12 Mar 2007 10:58:11 +1100
> > >
> > > > http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-sched-rsdl-0.
> > > >30.patch
> > >
> > > FWIW, this boots and seems to work well on sparc64.  Tested
> > > on UP SunBlade1500 and 24cpu Niagara T1000.
> > 
> > Very nice. Thanks for the feedback and I'm sorry you have to work with such 
> > lousy hardware.
> 
> BTW, I don't know if you say this as a joke,

He was definitely being sarcastic, relax :-)



Re: [PATCH] kthread_should_stop_check_freeze (was: Re: [PATCH -mm 3/7] Freezer: Remove PF_NOFREEZE from rcutorture thread)

2007-03-12 Thread Gautham R Shenoy
On Sun, Mar 11, 2007 at 06:49:08PM +0100, Rafael J. Wysocki wrote:
> On Saturday, 3 March 2007 18:32, Oleg Nesterov wrote:
> > On 03/02, Paul E. McKenney wrote:
> > >
> > > On Sat, Mar 03, 2007 at 02:33:37AM +0300, Oleg Nesterov wrote:
> > > > On 03/02, Paul E. McKenney wrote:
> > > > >
> > > > > One way to embed try_to_freeze() into kthread_should_stop() might be
> > > > > as follows:
> > > > > 
> > > > >   int kthread_should_stop(void)
> > > > >   {
> > > > >   if (kthread_stop_info.k == current)
> > > > >   return 1;
> > > > >   try_to_freeze();
> > > > >   return 0;
> > > > >   }
> > > > 
> > > > I think this is dangerous. For example, worker_thread() will probably
> > > > need some special actions after return from refrigerator. Also, a kernel
> > > > thread may check kthread_should_stop() in the place where 
> > > > try_to_freeze()
> > > > is not safe.
> > > > 
> > > > Perhaps we should introduce a new helper which does this.
> > > 
> > > Good point -- the return value from try_to_freeze() is lost if one uses
> > > the above approach.  About one third of the calls to try_to_freeze()
> > > in 2.6.20 pay attention to the return value.
> > > 
> > > One approach would be to have a kthread_should_stop_nofreeze() for those
> > > cases, and let the default be to try to freeze.
> > 
> > I personally think we should do the opposite, add 
> > kthread_should_stop_check_freeze()
> > or something. kthread_should_stop() is like signal_pending(), we can use
> > it under spin_lock (and it is probably used this way by some out-of-tree
> > driver). The new helper is obviously "might_sleep()".
> 
> Something like this, perhaps:
> 
>  include/linux/kthread.h |1 +
>  kernel/kthread.c|   16 
>  kernel/rcutorture.c |5 ++---
>  3 files changed, 19 insertions(+), 3 deletions(-)
> 
> Index: linux-2.6.21-rc3-mm2/kernel/kthread.c
> ===
> --- linux-2.6.21-rc3-mm2.orig/kernel/kthread.c2007-03-08 
> 21:58:48.0 +0100
> +++ linux-2.6.21-rc3-mm2/kernel/kthread.c 2007-03-11 18:32:59.0 
> +0100
> @@ -13,6 +13,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
> 
>  /*
> @@ -60,6 +61,21 @@ int kthread_should_stop(void)
>  }
>  EXPORT_SYMBOL(kthread_should_stop);
> 
> +/**
> + * kthread_should_stop_check_freeze - check if the thread should return now 
> and
> + * if not, check if there is a freezing request pending for it.
> + */
> +int kthread_should_stop_check_freeze(void)
> +{
> + might_sleep();
> + if (kthread_stop_info.k == current)
> + return 1;
> +
> + try_to_freeze();
> + return 0;
> +}
> +EXPORT_SYMBOL(kthread_should_stop_check_freeze);

I would prefer to have try_to_freeze() followed by the
kthread_stop_info.k check. Something like

	if (try_to_freeze())
		/* some barrier ensuring all writes are completed */

	if (kthread_stop_info.k == current)
		return 1;
	return 0;

This would be helpful in situations (at least for cpu-hotplug)
where we want to stop a frozen thread immediately after thawing it.
Something like

	CPU_DEAD:
		thaw_process(p);
		kthread_stop(p);
		p = NULL;

Is there a problem with this line of thinking ?

thanks and regards
gautham.
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"


Re: Removal of multipath cached (was Re: [PATCH] [REVISED] net/ipv4/multipath_wrandom.c: check kmalloc() return value.)

2007-03-12 Thread Andrew Morton
> On Mon, 12 Mar 2007 13:53:11 -0700 (PDT) David Miller <[EMAIL PROTECTED]> 
> wrote:
> From: Jarek Poplawski <[EMAIL PROTECTED]>
> Date: Mon, 12 Mar 2007 12:51:37 +0100
> 
> > But until then it'll unnecessarily spoil linux opinion as regards
> > stability and waste time of developers to check error messages.
> > So, maybe it's less evil to check those NULLs where possible and add
> > some WARN_ONs here and there...
> 
> It's a crash either way, so zero improvement.
> 
> And _THIS_ is my big problem with the multi-path cached code in the
> kernel.
> 
> NOBODY wants to step up and fix the code, but people refuse to let it
> get removed from the tree.  That is totally unacceptable, so I'm going
> to FIX THIS.
> 
> I'm going to FIX IT by saying that if nobody steps up to the plate to
> fix the multipath cached code by 2.6.23 IT IS GONE forever.
> 
> And there is absolutely no negotiations about this, I've held back on
> this for nearly 2 years, and nothing has happened, this code is not
> maintained, nobody cares enough to fix the bugs, and even no
> distributions enable it because it causes crashes.

Good stuff.

I suggest you put a big printk explaining the above into 2.6.21.


Need help on mach-ep93xx

2007-03-12 Thread Maxin John

Hi,

I have one question about mach-ep93xx.

In the EP93xx IRQ handling part in core.c, the 2.6.19.2 kernel and
newer kernels configure the 16 interrupts of ports A and B
together. The code does not use the interrupt capability of port
F, which can provide 3 interrupts.

Why is port F not configured for interrupts?

Thanks in advance,

Maxin B. John


Re: [Bugme-new] [Bug 8187] New: 2.6.20 "PCI: Quirks" patch breaks X11 on I82801

2007-03-12 Thread Andrew Morton
> On Mon, 12 Mar 2007 13:30:05 -0700 [EMAIL PROTECTED] wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=8187
> 
>            Summary: 2.6.20 "PCI: Quirks" patch breaks X11 on I82801
>     Kernel Version: 2.6.20
>             Status: NEW
>           Severity: normal
>              Owner: [EMAIL PROTECTED]
>          Submitter: [EMAIL PROTECTED]
> 
> 
> Most recent kernel where this bug did *NOT* occur:
> Any 2.6.20-pre prior to commit 368c73d4f689dae0807d0a2aa74c61fd2b9b075f
> 
> Distribution:  Slackware 11.0
> Hardware Environment:  HP/Compaq dc5000S (P4, 82801, 82865)
> Software Environment:  Xorg 6.9.0
> Problem Description:
> 
> Alan Cox introduced a "PCI: Quirks" patch (git commit
> 368c73d4f689dae0807d0a2aa74c61fd2b9b075f) in 2.6.20 that breaks X11 on this
> I82801 platform.  Specifically, it causes the PCI initialisation to become
> buggered; Xorg 6.9.0 dumps the following to the console:
>   (EE) end of block range 0x177 < begin 0x3f0
>   (EE) end of block range 0x177 < begin 0x3f0
>   (WW) INVALID IO ALLOCATION b: 0x14d0 e: 0x14d7 correcting
> [...]
>   Backtrace:
>   0: X(xf86SigHandler+0x8a) [0x8088b2a]
>   1: [0xb7f2b420]
>   2: /usr/X11R6/lib/modules/drivers/i810_drv.so [0xb797f592]
>   3: X(InitOutput+0xb83) [0x8072713]
>   4: X(main+0x226) [0x80d4496]
>   5: /lib/tls/libc.so.6(__libc_start_main+0xd4) [0xb7da7e14]
>   6: X [0x806ff61]
> 
>   Fatal server error:
>   Caught signal 11.  Server aborting
> 
> Steps to reproduce:
> 
> Reverting the git commit mentioned above fixes the issue.  Apparently, this
> may be limited to certain combinations of on-motherboard chipsets, as I
> haven't seen many bug reports.  Googling shows some people having X11
> segfault issues with 2.6.20 (e.g. freedesktop.org bug #9956) but in most of
> those cases it's due to the evdev driver and not PCI initialisation.
>
> I wrote to Alan (cc'ed Greg as he signed off on the patch) nearly two weeks
> ago but have heard nothing, so I'm leaving a bug here instead.
> 

argh.

Would we break more machines than we fix if we just revert that?


Re: [PATCH] cleanfile: a script to clean up stealth whitespace

2007-03-12 Thread Andrew Morton
> On Mon, 12 Mar 2007 12:16:30 -0700 "H. Peter Anvin" <[EMAIL PROTECTED]> wrote:
> This script cleans up various classes of stealth whitespace.  In
> particular, it cleans up:
> 
> - Whitespace (spaces or tabs) before newline;
> - DOS line endings (CR before LF);
> - Space before tab (spaces are deleted or converted to tabs);
> - Empty lines at end of file.

Fair enough.

It'd be nice to have a clean-up-a-patch version of this.  So it does
all these things, except it only changes lines which start with ^+.
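
As a rough illustration of the patch-only mode suggested above, here is a
minimal C sketch.  It is not part of cleanfile, whose implementation is not
shown in this thread.  The function name clean_patch_line() is hypothetical,
and the sketch covers only two of the listed cases (trailing whitespace and
CR before LF) for lines that begin with '+'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from cleanfile: strip trailing spaces,
 * tabs and CRs from one line of a unified diff, but only if the line was
 * added by the patch (it starts with '+').  The line is edited in place
 * and must not include its terminating newline.  Returns the new length.
 */
size_t clean_patch_line(char *line)
{
    size_t len;

    assert(line != NULL);
    len = strlen(line);

    /* Leave context lines, removed lines and the "+++ file" header alone. */
    if (line[0] != '+' || strncmp(line, "+++ ", 4) == 0)
        return len;

    /* Stop at index 1 so the leading '+' marker is never removed. */
    while (len > 1 && (line[len - 1] == ' ' || line[len - 1] == '\t' ||
                       line[len - 1] == '\r'))
        len--;
    line[len] = '\0';
    return len;
}
```

A caller would feed each line of the patch through this function and write
the result back out unchanged in every other respect, so hunk line counts
stay valid.  An added line whose own text begins with "++ " would be
mistaken for a file header by this sketch.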


Re: SMP performance degradation with sysbench

2007-03-12 Thread Nick Piggin

Anton Blanchard wrote:
> Hi Nick,
>
>> Anyway, I'll keep experimenting. If anyone from MySQL wants to help look
>> at this, send me a mail (eg. especially with the sched_setscheduler issue,
>> you might be able to do something better).
>
> I took a look at this today and figured I'd document it:
>
> http://ozlabs.org/~anton/linux/sysbench/
>
> Bottom line: it looks like issues in the glibc malloc library, replacing
> it with the google malloc library fixes the negative scaling:
>
> # apt-get install libgoogle-perftools0
> # LD_PRELOAD=/usr/lib/libtcmalloc.so /usr/sbin/mysqld

Hi Anton,

Very cool. Yeah I had come to the conclusion that it wasn't a kernel
issue, and basically was afraid to look into userspace ;)

That bogus setscheduler thing must surely have never worked, though.
I wonder if FreeBSD avoids the scalability issue because it is using
SCHED_RR there, or because it has a decent threaded malloc implementation.

--
SUSE Labs, Novell Inc.




Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2

2007-03-12 Thread Mike Galbraith
On Tue, 2007-03-13 at 09:51 +1100, Con Kolivas wrote:
> On 13/03/07, Mike Galbraith <[EMAIL PROTECTED]> wrote:

> > As soon as your cpu is fully utilized, fairness loses or interactivity
> > loses.  Pick one.
> 
> That's not true unless you refuse to prioritise your tasks
> accordingly. Let's take this discussion in a different direction. You
> already nice your lame processes. Why? You already have the concept
> that you are prioritising things to normal or background tasks. You
> say so yourself that lame is a background task. Stating the bleedingly
> obvious, the unix way of prioritising things is via nice. You already
> do that. So moving on from that...

Sure.  If a user wants to do anything interactive, they can indeed nice
19 the rest of their box before they start.

> Your test case you ask "how can I maximise cpu usage". Well you know
> the answer already. You run two threads. I won't dispute that.
> 
> The debate seems to be centered on whether two tasks that are niced +5
> or to a higher value are background. In my opinion, nice 5 is not
> background, but relatively less cpu. You already are savvy enough to
> be using two threads and nicing them. All I ask you to do when using
> RSDL is to change your expectations slightly and your settings from
> nice 5 to nice 10 or 15 or even 19. Why is that so offensive to you?

It's not "offensive" to me, it is a behavioral regression.  The
situation as we speak is that you can run cpu intensive tasks while
watching eye-candy.  With RSDL, you can't, you feel the non-interactive
load instantly.  Doesn't the fact that you're asking me to lower my
expectations tell you that I just might have a point?

> Please don't pick 5.none of the above. Please try to work with me on this.

I'm not trying to be pig-headed.  I'm of the opinion that fairness is
great... until you strictly enforce it wrt interactive tasks.  

-Mike



Re: [RFC][PATCH 2/7] RSS controller core

2007-03-12 Thread Andrew Morton
> On Mon, 12 Mar 2007 23:41:29 +0100 Herbert Poetzl <[EMAIL PROTECTED]> wrote:
> On Mon, Mar 12, 2007 at 11:42:59AM -0700, Dave Hansen wrote:
> > How about we drill down on these a bit more.
> > 
> > On Mon, 2007-03-12 at 02:00 +0100, Herbert Poetzl wrote:
> > >  - shared mappings of 'shared' files (binaries 
> > >and libraries) to allow for reduced memory
> > >footprint when N identical guests are running
> > 
> > So, it sounds like this can be phrased as a requirement like:
> > 
> > "Guests must be able to share pages."
> > 
> > Can you give us an idea why this is so? 
> 
> sure, one reason for this is that guests tend to
> be similar (or almost identical) which results
> in quite a lot of 'shared' libraries and executables
> which would otherwise get cached for each guest and
> would also be mapped for each guest separately

nooo.  What you're saying there amounts to text replication.  There is
no proposal here to create duplicated copies of pagecache pages: the VM
just doesn't support that (Nick has some protopatches which do this as a
possible NUMA optimisation).

So these mmapped pages will continue to be shared across all guests.  The
problem boils down to "which guest(s) get charged for each shared page".

A simple and obvious and easy-to-implement answer is "the guest which paged
it in".  I think we should firstly explain why that is insufficient.
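
To make the "charge the guest which paged it in" policy concrete, here is a
small userspace sketch in C.  It is an illustration only and not the RSS
controller code: struct shared_page, struct guest and page_map() are
hypothetical names, and real accounting would also have to deal with unmap,
reclaim and locking.

```c
#include <assert.h>
#include <stddef.h>

#define NO_OWNER (-1)

/* Hypothetical model of one shared pagecache page. */
struct shared_page {
    int charged_to;   /* id of the guest that pays for the page, or NO_OWNER */
    int mapcount;     /* number of guests currently mapping the page */
};

/* Hypothetical model of one guest's accounting state. */
struct guest {
    int id;
    long charged_pages;
};

/*
 * Called when guest g maps the page.  Only the first guest to touch the
 * page is charged; every later guest shares it at no cost.
 */
void page_map(struct shared_page *pg, struct guest *g)
{
    assert(pg != NULL && g != NULL);

    if (pg->charged_to == NO_OWNER) {
        pg->charged_to = g->id;
        g->charged_pages++;
    }
    pg->mapcount++;
}
```

With two guests mapping the same library page, the first ends up with one
charged page and the second with none, even though both use it equally.
That asymmetry is one property any explanation of the policy's limits would
presumably have to address.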



Re: [ck] Re: RSDL v0.30 cpu scheduler for mainline kernels

2007-03-12 Thread Felipe Alfaro Solana

On 3/13/07, Willy Tarreau <[EMAIL PROTECTED]> wrote:

> On Tue, Mar 13, 2007 at 02:05:23PM +1100, Con Kolivas wrote:
> > On Tuesday 13 March 2007 10:46, David Miller wrote:
> > > From: Con Kolivas <[EMAIL PROTECTED]>
> > > Date: Mon, 12 Mar 2007 10:58:11 +1100
> > >
> > > > http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-sched-rsdl-0.30.patch
> > >
> > > FWIW, this boots and seems to work well on sparc64.  Tested
> > > on UP SunBlade1500 and 24cpu Niagara T1000.
> >
> > Very nice. Thanks for the feedback and I'm sorry you have to work with such
> > lousy hardware.
>
> BTW, I don't know if you say this as a joke, but those are not necessarily
> lousy hardware. Sun does lousy hardware when they put Sparcs in PCs (ultra5,
> ultra10, blade100). But their servers generally are nice with large memory
> busses and very scalable SMP architectures.


I guess Con was kidding. A 24-CPU system can be anything but lousy hardware.


Re: [PATCH 1/2] Fix some coding-style errors in autofs

2007-03-12 Thread Randy.Dunlap
On Mon, 12 Mar 2007 [EMAIL PROTECTED] wrote:

> From: Sukadev Bhattiprolu <[EMAIL PROTECTED]>
> Subject: [PATCH 1/2] Fix some coding-style errors in autofs
>
> Fix coding style errors (extra spaces, long lines) in autofs
> and autofs4 files being modified for container/pidspace issues.
>
> ---
>  fs/autofs/inode.c  |   29 +++
>  fs/autofs/root.c   |   77 ++---
>  fs/autofs4/inode.c |   16 ---
>  fs/autofs4/root.c  |   18 ++--
>  4 files changed, 70 insertions(+), 70 deletions(-)
>
> Index: lx26-20-mm2c/fs/autofs/inode.c
> ===
> --- lx26-20-mm2c.orig/fs/autofs/inode.c   2007-02-28 14:48:35.0 -0800
> +++ lx26-20-mm2c/fs/autofs/inode.c        2007-02-28 15:47:09.0 -0800
> @@ -34,12 +34,12 @@ void autofs_kill_sb(struct super_block *
>
>   autofs_hash_nuke(sbi);
> - for ( n = 0 ; n < AUTOFS_MAX_SYMLINKS ; n++ ) {
> - if ( test_bit(n, sbi->symlink_bitmap) )
> + for (n = 0 ; n < AUTOFS_MAX_SYMLINKS ; n++) {
> + if (test_bit(n, sbi->symlink_bitmap))
>   kfree(sbi->symlink[n].data);
>   }

Please do a complete job on the 'for' line by eliminating the
space before each semi-colon.

-- 
~Randy


[PATCH 2/2] Replace pid_t in autofs with struct pid reference

2007-03-12 Thread sukadev

From: Sukadev Bhattiprolu <[EMAIL PROTECTED]>
Subject: [PATCH 2/2] Replace pid_t in autofs with struct pid reference.

Make autofs container-friendly by caching struct pid reference rather
than pid_t and using pid_nr() to retrieve a task's pid_t.

ChangeLog:
- Fix Eric Biederman's comments - Use find_get_pid() to hold a
  reference to oz_pgrp and release while unmounting; separate out
  changes to autofs and autofs4.
- Fix Cedric's comments: retain old prototype of parse_options()
  and move necessary change to its caller.
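
To make the reference-holding pattern described above concrete, here is a
small userspace sketch in C.  It only models the idea: struct pid_model,
model_find_get(), model_put(), model_fill_super() and model_kill_sb() are
hypothetical stand-ins for struct pid, find_get_pid(), put_pid(),
autofs_fill_super() and autofs_kill_sb().  The kernel versions use an atomic
counter and a hash lookup, which this sketch replaces with a plain int and a
linear scan.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for struct pid: a number plus a reference count. */
struct pid_model {
    int nr;
    int count;
};

/* Stand-in for the superblock info that caches the reference. */
struct sb_model {
    struct pid_model *oz_pgrp;
};

/* Models find_get_pid(): look up by number and take a reference. */
struct pid_model *model_find_get(struct pid_model *table, size_t n, int nr)
{
    for (size_t i = 0; i < n; i++) {
        if (table[i].nr == nr) {
            table[i].count++;
            return &table[i];
        }
    }
    return NULL;
}

/* Models put_pid(): drop one reference; NULL is treated as a no-op. */
void model_put(struct pid_model *pid)
{
    if (!pid)
        return;
    assert(pid->count > 0);
    pid->count--;
}

/*
 * Models the mount path: take the reference first, then release it on any
 * later failure, mirroring the fail_put_pid label added by the patch.
 * Returns 0 on success and -1 on failure.
 */
int model_fill_super(struct sb_model *sbi, struct pid_model *table,
                     size_t n, int pgid, int pipe_ok)
{
    sbi->oz_pgrp = model_find_get(table, n, pgid);
    if (!sbi->oz_pgrp)
        goto fail;

    if (!pipe_ok)
        goto fail_put_pid;

    return 0;

fail_put_pid:
    model_put(sbi->oz_pgrp);
    sbi->oz_pgrp = NULL;
fail:
    return -1;
}

/* Models the unmount path: the cached reference is released exactly once. */
void model_kill_sb(struct sb_model *sbi)
{
    model_put(sbi->oz_pgrp);
    sbi->oz_pgrp = NULL;
}
```

The point of the sketch is the pairing: every successful lookup that takes a
reference is matched by exactly one release, either on the error path or at
unmount.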

Signed-off-by: Sukadev Bhattiprolu <[EMAIL PROTECTED]>
Cc: Cedric Le Goater <[EMAIL PROTECTED]>
Cc: Dave Hansen <[EMAIL PROTECTED]>
Cc: Serge Hallyn <[EMAIL PROTECTED]>
Cc: Eric Biederman <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED]
Acked-by: Eric W. Biederman <[EMAIL PROTECTED]>

---
 fs/autofs/autofs_i.h |4 ++--
 fs/autofs/inode.c|   20 
 fs/autofs/root.c |6 --
 3 files changed, 22 insertions(+), 8 deletions(-)

Index: lx26-21-rc3-mm2/fs/autofs/autofs_i.h
===
--- lx26-21-rc3-mm2.orig/fs/autofs/autofs_i.h   2007-03-12 17:12:05.0 -0700
+++ lx26-21-rc3-mm2/fs/autofs/autofs_i.h        2007-03-12 17:18:55.0 -0700
@@ -101,7 +101,7 @@ struct autofs_symlink {
 struct autofs_sb_info {
u32 magic;
struct file *pipe;
-   pid_t oz_pgrp;
+   struct pid *oz_pgrp;
int catatonic;
struct super_block *sb;
unsigned long exp_timeout;
@@ -122,7 +122,7 @@ static inline struct autofs_sb_info *aut
filesystem without "magic".) */
 
 static inline int autofs_oz_mode(struct autofs_sb_info *sbi) {
-   return sbi->catatonic || process_group(current) == sbi->oz_pgrp;
+   return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp;
 }
 
 /* Hash operations */
Index: lx26-21-rc3-mm2/fs/autofs/inode.c
===
--- lx26-21-rc3-mm2.orig/fs/autofs/inode.c  2007-03-12 17:18:48.0 -0700
+++ lx26-21-rc3-mm2/fs/autofs/inode.c   2007-03-12 17:18:55.0 -0700
@@ -37,6 +37,8 @@ void autofs_kill_sb(struct super_block *
if (!sbi->catatonic)
autofs_catatonic_mode(sbi); /* Free wait queues, close pipe */
 
+   put_pid(sbi->oz_pgrp);
+
autofs_hash_nuke(sbi);
for (n = 0 ; n < AUTOFS_MAX_SYMLINKS ; n++) {
if (test_bit(n, sbi->symlink_bitmap))
@@ -139,6 +141,7 @@ int autofs_fill_super(struct super_block
int pipefd;
struct autofs_sb_info *sbi;
int minproto, maxproto;
+   pid_t pgid;
 
sbi = kzalloc(sizeof(*sbi), GFP_KERNEL);
if (!sbi)
@@ -150,7 +153,6 @@ int autofs_fill_super(struct super_block
sbi->pipe = NULL;
sbi->catatonic = 1;
sbi->exp_timeout = 0;
-   sbi->oz_pgrp = process_group(current);
autofs_initialize_hash(&sbi->dirhash);
sbi->queues = NULL;
memset(sbi->symlink_bitmap, 0, sizeof(long)*AUTOFS_SYMLINK_BITMAP_LEN);
@@ -171,7 +173,7 @@ int autofs_fill_super(struct super_block
 
/* Can this call block?  - WTF cares? s is locked. */
if (parse_options(data, &pipefd, &root_inode->i_uid,
-   &root_inode->i_gid, &sbi->oz_pgrp, &minproto,
+   &root_inode->i_gid, &pgid, &minproto,
&maxproto)) {
printk("autofs: called with bogus options\n");
goto fail_dput;
@@ -184,13 +186,21 @@ int autofs_fill_super(struct super_block
goto fail_dput;
}
 
-   DPRINTK(("autofs: pipe fd = %d, pgrp = %u\n", pipefd, sbi->oz_pgrp));
+   DPRINTK(("autofs: pipe fd = %d, pgrp = %u\n", pipefd, pgid));
+   sbi->oz_pgrp = find_get_pid(pgid);
+
+   if (!sbi->oz_pgrp) {
+   printk("autofs: could not find process group %d\n", pgid);
+   goto fail_dput;
+   }
+
pipe = fget(pipefd);

if (!pipe) {
printk("autofs: could not open pipe file descriptor\n");
-   goto fail_dput;
+   goto fail_put_pid;
}
+
if (!pipe->f_op || !pipe->f_op->write)
goto fail_fput;
sbi->pipe = pipe;
@@ -205,6 +215,8 @@ int autofs_fill_super(struct super_block
 fail_fput:
printk("autofs: pipe file descriptor does not contain proper ops\n");
fput(pipe);
+fail_put_pid:
+   put_pid(sbi->oz_pgrp);
 fail_dput:
dput(root);
goto fail_free;
Index: lx26-21-rc3-mm2/fs/autofs/root.c
===
--- lx26-21-rc3-mm2.orig/fs/autofs/root.c   2007-03-12 17:18:48.0 -0700
+++ lx26-21-rc3-mm2/fs/autofs/root.c        2007-03-12 17:18:55.0 -0700
@@ -213,8 +213,10 @@ static struct dentry *autofs_root_lookup
sbi = autofs_sbi(dir->i_sb);
 
oz_mode = autof

[PATCH 1/2] Fix some coding-style errors in autofs

2007-03-12 Thread sukadev

From: Sukadev Bhattiprolu <[EMAIL PROTECTED]>
Subject: [PATCH 1/2] Fix some coding-style errors in autofs

Fix coding style errors (extra spaces, long lines) in autofs
and autofs4 files being modified for container/pidspace issues.

Signed-off-by: Sukadev Bhattiprolu <[EMAIL PROTECTED]>
Cc: Cedric Le Goater <[EMAIL PROTECTED]>
Cc: Dave Hansen <[EMAIL PROTECTED]>
Cc: Serge Hallyn <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED]
Cc: Eric W. Biederman <[EMAIL PROTECTED]>
---
 fs/autofs/inode.c  |   29 +++
 fs/autofs/root.c   |   77 ++---
 fs/autofs4/inode.c |   16 ---
 fs/autofs4/root.c  |   18 ++--
 4 files changed, 70 insertions(+), 70 deletions(-)

Index: lx26-20-mm2c/fs/autofs/inode.c
===
--- lx26-20-mm2c.orig/fs/autofs/inode.c 2007-02-28 14:48:35.0 -0800
+++ lx26-20-mm2c/fs/autofs/inode.c  2007-02-28 15:47:09.0 -0800
@@ -34,12 +34,12 @@ void autofs_kill_sb(struct super_block *
if (!sbi)
goto out_kill_sb;
 
-   if ( !sbi->catatonic )
+   if (!sbi->catatonic)
autofs_catatonic_mode(sbi); /* Free wait queues, close pipe */
 
autofs_hash_nuke(sbi);
-   for ( n = 0 ; n < AUTOFS_MAX_SYMLINKS ; n++ ) {
-   if ( test_bit(n, sbi->symlink_bitmap) )
+   for (n = 0 ; n < AUTOFS_MAX_SYMLINKS ; n++) {
+   if (test_bit(n, sbi->symlink_bitmap))
kfree(sbi->symlink[n].data);
}
 
@@ -69,7 +69,8 @@ static match_table_t autofs_tokens = {
{Opt_err, NULL}
 };
 
-static int parse_options(char *options, int *pipefd, uid_t *uid, gid_t *gid, pid_t *pgrp, int *minproto, int *maxproto)
+static int parse_options(char *options, int *pipefd, uid_t *uid, gid_t *gid,
+   pid_t *pgrp, int *minproto, int *maxproto)
 {
char *p;
substring_t args[MAX_OPT_ARGS];
@@ -140,7 +141,7 @@ int autofs_fill_super(struct super_block
int minproto, maxproto;
 
sbi = kzalloc(sizeof(*sbi), GFP_KERNEL);
-   if ( !sbi )
+   if (!sbi)
goto fail_unlock;
DPRINTK(("autofs: starting up, sbi = %p\n",sbi));
 
@@ -169,14 +170,16 @@ int autofs_fill_super(struct super_block
goto fail_iput;
 
/* Can this call block?  - WTF cares? s is locked. */
-   if ( parse_options(data,&pipefd,&root_inode->i_uid,&root_inode->i_gid,&sbi->oz_pgrp,&minproto,&maxproto) ) {
+   if (parse_options(data, &pipefd, &root_inode->i_uid,
+   &root_inode->i_gid, &sbi->oz_pgrp, &minproto,
+   &maxproto)) {
printk("autofs: called with bogus options\n");
goto fail_dput;
}
 
/* Couldn't this be tested earlier? */
-   if ( minproto > AUTOFS_PROTO_VERSION || 
-maxproto < AUTOFS_PROTO_VERSION ) {
+   if (minproto > AUTOFS_PROTO_VERSION ||
+maxproto < AUTOFS_PROTO_VERSION) {
printk("autofs: kernel does not match daemon version\n");
goto fail_dput;
}
@@ -184,11 +187,11 @@ int autofs_fill_super(struct super_block
DPRINTK(("autofs: pipe fd = %d, pgrp = %u\n", pipefd, sbi->oz_pgrp));
pipe = fget(pipefd);

-   if ( !pipe ) {
+   if (!pipe) {
printk("autofs: could not open pipe file descriptor\n");
goto fail_dput;
}
-   if ( !pipe->f_op || !pipe->f_op->write )
+   if (!pipe->f_op || !pipe->f_op->write)
goto fail_fput;
sbi->pipe = pipe;
sbi->catatonic = 0;
@@ -230,7 +233,7 @@ static void autofs_read_inode(struct ino
inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME;
inode->i_blocks = 0;
 
-   if ( ino == AUTOFS_ROOT_INO ) {
+   if (ino == AUTOFS_ROOT_INO) {
inode->i_mode = S_IFDIR | S_IRUGO | S_IXUGO | S_IWUSR;
inode->i_op = &autofs_root_inode_operations;
inode->i_fop = &autofs_root_operations;
@@ -241,12 +244,12 @@ static void autofs_read_inode(struct ino
inode->i_uid = inode->i_sb->s_root->d_inode->i_uid;
inode->i_gid = inode->i_sb->s_root->d_inode->i_gid;

-   if ( ino >= AUTOFS_FIRST_SYMLINK && ino < AUTOFS_FIRST_DIR_INO ) {
+   if (ino >= AUTOFS_FIRST_SYMLINK && ino < AUTOFS_FIRST_DIR_INO) {
/* Symlink inode - should be in symlink list */
struct autofs_symlink *sl;
 
n = ino - AUTOFS_FIRST_SYMLINK;
-   if ( n >= AUTOFS_MAX_SYMLINKS || !test_bit(n,sbi->symlink_bitmap)) {
+   if (n >= AUTOFS_MAX_SYMLINKS || !test_bit(n,sbi->symlink_bitmap)) {
printk("autofs: Looking for bad symlink inode %u\n", (unsigned int) ino);
return;
}
Index: lx26-20-mm2c/fs/autofs/root.c
=

[PATCH] Kill unused session and group values in rocket driver

2007-03-12 Thread sukadev

From: Sukadev Bhattiprolu <[EMAIL PROTECTED]>
Subject: [PATCH] Kill unused session and group values in rocket driver

The process_session() and process_group() values are not really
used by the driver.

Signed-off-by: Sukadev Bhattiprolu <[EMAIL PROTECTED]>
Cc: Cedric Le Goater <[EMAIL PROTECTED]>
Cc: Dave Hansen <[EMAIL PROTECTED]>
Cc: Serge Hallyn <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED]
Cc: Eric W. Biederman <[EMAIL PROTECTED]>
---
 drivers/char/rocket.c |3 ---
 drivers/char/rocket_int.h |2 --
 2 files changed, 5 deletions(-)

Index: lx26-20-mm2c/drivers/char/rocket.c
===
--- lx26-20-mm2c.orig/drivers/char/rocket.c 2007-02-28 19:23:00.0 -0800
+++ lx26-20-mm2c/drivers/char/rocket.c  2007-02-28 19:24:41.0 -0800
@@ -1018,9 +1018,6 @@ static int rp_open(struct tty_struct *tt
/*
 * Info->count is now 1; so it's safe to sleep now.
 */
-   info->session = process_session(current);
-   info->pgrp = process_group(current);
-
if ((info->flags & ROCKET_INITIALIZED) == 0) {
cp = &info->channel;
sSetRxTrigger(cp, TRIG_1);
Index: lx26-20-mm2c/drivers/char/rocket_int.h
===
--- lx26-20-mm2c.orig/drivers/char/rocket_int.h 2007-02-28 19:23:00.0 -0800
+++ lx26-20-mm2c/drivers/char/rocket_int.h  2007-02-28 19:24:41.0 -0800
@@ -1156,8 +1156,6 @@ struct r_port {
int xmit_head;
int xmit_tail;
int xmit_cnt;
-   int session;
-   int pgrp;
int cd_status;
int ignore_status_mask;
int read_status_mask;


Re: Stracing Amanda (was: RSDL for 2.6.21-rc3- 0.29)

2007-03-12 Thread Willy Tarreau
On Tue, Mar 13, 2007 at 12:04:42AM -0400, Gene Heskett wrote:
> On Monday 12 March 2007, Nish Aravamudan wrote:
> >On 3/12/07, Gene Heskett <[EMAIL PROTECTED]> wrote:
> >> On Monday 12 March 2007, Douglas McNaught wrote:
> >> >Patrick Mau <[EMAIL PROTECTED]> writes:
> >> >> Why not temporarily replace "/bin/tar" with a shell script that
> >> >> does:
> >> >>
> >> >> #!/bin/sh
> >> >> exec strace -f -o output /bin/real.tar $@
> >> >
> >> >You beat me to it.  :) I've done that before; it's a great
> >> > suggestion.
> >> >
> >> >Except that if you expect 'tar' to be invoked multiple times in a
> >> > run, you should probably use 'output.$$' for the output filename so
> >> > things don't get clobbered.
> >> >
> >> >-Doug
> >>
> >> In my case, Doug, it will get invoked 64 times, amanda does a dummy
> >> run to get an estimate, calculates what to do based on that output
> >> which is 32 runs, 1 per disklist entry and I have 32, and then reruns
> >> tar with the appropriate level options against each individual
> >> disklist entry.
> >>
> >> But I'm puzzled a bit, what does the double $$ do?  Or is it buried
> >> someplace in the bash manpage?  It's not something I've stumbled over
> >> yet.
> >
> >buried indeed:
> >
> >"Special Parameters:
> >  ...
> >   $      Expands to the process ID of the shell.  In a () subshell,
> >          it expands to the process ID of the current shell, not
> >          the subshell.
> >"
> 
> Well, that's clear enough, but what of the double $$ case?  Would this
> then make a PID unique to each invocation until it finally wraps a 16
> bit value, or will the kernel re-use them because they won't all be
> running simultaneously, but limited by the number of unique 'spindle'
> numbers on the system, this to prevent as best as it can, the thrashing
> of a drive by having tar working on 2 separate (or more) partitions at
> the same time.  In my case 2 are possible, as /var is on a separate
> drive.

Yes, there is a risk of wrapping, but it is very small. You can add the command
line arguments to the file name if you want, like this:

#!/bin/bash
exec strace -f -o "output.$$.${*//\//_}" /bin/real.tar "$@"

It will name the output file "output.<pid>.<args>", replacing slashes with
underscores. This is very dirty but can help.

Cheers,
Willy



[PATCH 3/5] Use struct pid parameter in copy_process()

2007-03-12 Thread sukadev

From: Sukadev Bhattiprolu <[EMAIL PROTECTED]>
Subject: [PATCH 3/5] Use struct pid parameter in copy_process()

Modify copy_process() to take a struct pid * parameter instead of a pid_t.
This simplifies the code a bit and also avoids having to call find_pid()
to convert the pid_t to a struct pid.
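
The saving described above can be sketched in userspace C.  This is a model
under stated assumptions and not the kernel code: struct pid_obj,
struct task_model, lookup_pid(), pid_number(), attach_by_number() and
attach_by_pointer() are hypothetical names standing in for struct pid,
struct task_struct, find_pid(), pid_nr() and the two shapes of
copy_process().  The lookups counter exists only to make the extra table
walk visible.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for struct pid. */
struct pid_obj {
    int nr;
};

/* Userspace stand-in for the parts of a task touched here. */
struct task_model {
    int pid;               /* numeric id, like p->pid */
    struct pid_obj *link;  /* attached object, like the attach_pid() target */
};

int lookups;  /* counts table walks, for illustration only */

/* Models find_pid(): translate a number back into its object. */
struct pid_obj *lookup_pid(struct pid_obj *table, size_t n, int nr)
{
    lookups++;
    for (size_t i = 0; i < n; i++)
        if (table[i].nr == nr)
            return &table[i];
    return NULL;
}

/* Models pid_nr(): read the number out of the object; NULL maps to 0. */
int pid_number(const struct pid_obj *pid)
{
    return pid ? pid->nr : 0;
}

/* Old shape: the caller passes a number, so the object is looked up again. */
void attach_by_number(struct task_model *t, struct pid_obj *table,
                      size_t n, int nr)
{
    t->pid = nr;
    t->link = lookup_pid(table, n, nr);
}

/* New shape: the caller passes the object it already holds. */
void attach_by_pointer(struct task_model *t, struct pid_obj *pid)
{
    assert(pid != NULL);
    t->pid = pid_number(pid);
    t->link = pid;
}
```

Both shapes leave the task in the same state; the pointer version simply
skips the second lookup because the caller already has the object in hand.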

Changelog: 
- Fixed Badari Pulavarty's comments and passed in &init_struct_pid
  from fork_idle().
- Fixed Eric Biederman's comments and simplified this patch and
  used a new patch to remove the likely(pid) check.

Signed-off-by: Sukadev Bhattiprolu <[EMAIL PROTECTED]>
Cc: Cedric Le Goater <[EMAIL PROTECTED]>
Cc: Dave Hansen <[EMAIL PROTECTED]>
Cc: Serge Hallyn <[EMAIL PROTECTED]>
Cc: Eric Biederman <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED]
Acked-by: Eric W. Biederman <[EMAIL PROTECTED]>
---
 kernel/fork.c |   11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

Index: lx26-21-rc3-mm2/kernel/fork.c
===
--- lx26-21-rc3-mm2.orig/kernel/fork.c  2007-03-12 17:16:39.0 -0700
+++ lx26-21-rc3-mm2/kernel/fork.c   2007-03-12 17:17:48.0 -0700
@@ -966,7 +966,7 @@ static struct task_struct *copy_process(
unsigned long stack_size,
int __user *parent_tidptr,
int __user *child_tidptr,
-   int pid)
+   struct pid *pid)
 {
int retval;
struct task_struct *p = NULL;
@@ -1033,7 +1033,7 @@ static struct task_struct *copy_process(
p->did_exec = 0;
delayacct_tsk_init(p);  /* Must remain after dup_task_struct() */
copy_flags(clone_flags, p);
-   p->pid = pid;
+   p->pid = pid_nr(pid);
 
INIT_LIST_HEAD(&p->children);
INIT_LIST_HEAD(&p->sibling);
@@ -1265,7 +1265,7 @@ static struct task_struct *copy_process(
list_add_tail_rcu(&p->tasks, &init_task.tasks);
__get_cpu_var(process_counts)++;
}
-   attach_pid(p, PIDTYPE_PID, find_pid(p->pid));
+   attach_pid(p, PIDTYPE_PID, pid);
nr_threads++;
}
 
@@ -1336,7 +1336,8 @@ struct task_struct * __cpuinit fork_idle
struct task_struct *task;
struct pt_regs regs;
 
-   task = copy_process(CLONE_VM, 0, idle_regs(&regs), 0, NULL, NULL, 0);
+   task = copy_process(CLONE_VM, 0, idle_regs(&regs), 0, NULL, NULL,
+   &init_struct_pid);
if (!IS_ERR(task))
init_idle(task, cpu);
 
@@ -1364,7 +1365,7 @@ long do_fork(unsigned long clone_flags,
return -EAGAIN;
nr = pid->nr;
 
-   p = copy_process(clone_flags, stack_start, regs, stack_size, parent_tidptr, child_tidptr, nr);
+   p = copy_process(clone_flags, stack_start, regs, stack_size, parent_tidptr, child_tidptr, pid);
/*
 * Do this prior waking up the new thread - the thread pointer
 * might get invalid after that point, if the thread exits quickly.


[PATCH 2/5] Explicitly set pgid and sid of init process

2007-03-12 Thread sukadev


From: Sukadev Bhattiprolu <[EMAIL PROTECTED]>
Subject: [PATCH 2/5] Explicitly set pgid and sid of init process

Explicitly set pgid and sid of init process to 1.

Signed-off-by: Sukadev Bhattiprolu <[EMAIL PROTECTED]>
Cc: Cedric Le Goater <[EMAIL PROTECTED]>
Cc: Dave Hansen <[EMAIL PROTECTED]>
Cc: Serge Hallyn <[EMAIL PROTECTED]>
Cc: Eric Biederman <[EMAIL PROTECTED]>
Cc: Herbert Poetzl <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Acked-by: Eric W. Biederman <[EMAIL PROTECTED]>
---

 init/main.c |1 +
 1 file changed, 1 insertion(+)

Index: lx26-20-mm2c/init/main.c
===
--- lx26-20-mm2c.orig/init/main.c   2007-02-28 15:49:13.0 -0800
+++ lx26-20-mm2c/init/main.c        2007-02-28 15:49:35.0 -0800
@@ -791,6 +791,7 @@ static int __init init(void * unused)
 */
init_pid_ns.child_reaper = current;
 
+   __set_special_pids(1, 1);
cad_pid = task_pid(current);
 
smp_prepare_cpus(max_cpus);


[PATCH 4/5] Remove the likely(pid) check in copy_process

2007-03-12 Thread sukadev

From: Sukadev Bhattiprolu <[EMAIL PROTECTED]>
Subject: [PATCH 4/5] Remove the likely(pid) check in copy_process

Now that we pass in a struct pid parameter to copy_process()
and even the swapper (pid_t == 0) has a valid struct pid,
we no longer need this check.

Changelog: 
Per Eric Biederman's comments, moved this out to a separate
patch for easier review.

Signed-off-by: Sukadev Bhattiprolu <[EMAIL PROTECTED]>
Cc: Cedric Le Goater <[EMAIL PROTECTED]>
Cc: Dave Hansen <[EMAIL PROTECTED]>
Cc: Serge Hallyn <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED]
Acked-by: Eric W. Biederman <[EMAIL PROTECTED]>

---
 kernel/fork.c |   34 --
 1 file changed, 16 insertions(+), 18 deletions(-)

Index: lx26-20-mm2c/kernel/fork.c
===
--- lx26-20-mm2c.orig/kernel/fork.c 2007-02-28 15:08:46.0 -0800
+++ lx26-20-mm2c/kernel/fork.c  2007-02-28 15:33:20.0 -0800
@@ -1249,26 +1249,24 @@ static struct task_struct *copy_process(
}
}
 
-   if (likely(p->pid)) {
-   add_parent(p);
-   tracehook_init_task(p);
-
-   if (thread_group_leader(p)) {
-   pid_t pgid = process_group(current);
-   pid_t sid = process_session(current);
-
-   p->signal->tty = current->signal->tty;
-   p->signal->pgrp = pgid;
-   set_signal_session(p->signal, process_session(current));
-   attach_pid(p, PIDTYPE_PGID, find_pid(pgid));
-   attach_pid(p, PIDTYPE_SID, find_pid(sid));
+   add_parent(p);
+   tracehook_init_task(p);
 
-   list_add_tail_rcu(&p->tasks, &init_task.tasks);
-   __get_cpu_var(process_counts)++;
-   }
-   attach_pid(p, PIDTYPE_PID, pid);
-   nr_threads++;
+   if (thread_group_leader(p)) {
+   pid_t pgid = process_group(current);
+   pid_t sid = process_session(current);
+
+   p->signal->tty = current->signal->tty;
+   p->signal->pgrp = pgid;
+   set_signal_session(p->signal, process_session(current));
+   attach_pid(p, PIDTYPE_PGID, find_pid(pgid));
+   attach_pid(p, PIDTYPE_SID, find_pid(sid));
+
+   list_add_tail_rcu(&p->tasks, &init_task.tasks);
+   __get_cpu_var(process_counts)++;
}
+   attach_pid(p, PIDTYPE_PID, pid);
+   nr_threads++;
 
total_forks++;
spin_unlock(&current->sighand->siglock);


[PATCH 5/5] Use task_pgrp() task_session() in copy_process()

2007-03-12 Thread sukadev

From: Sukadev Bhattiprolu <[EMAIL PROTECTED]>
Subject: [PATCH 5/5] Use task_pgrp() task_session() in copy_process().

Use task_pgrp() and task_session() in copy_process(), and
avoid find_pid() call when attaching the task to its process
group and session.

Signed-off-by: Sukadev Bhattiprolu <[EMAIL PROTECTED]>
Cc: Cedric Le Goater <[EMAIL PROTECTED]>
Cc: Dave Hansen <[EMAIL PROTECTED]>
Cc: Serge Hallyn <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED]
Acked-by: Eric W. Biederman <[EMAIL PROTECTED]>
---
 kernel/fork.c |9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

Index: lx26-21-rc3-mm2/kernel/fork.c
===
--- lx26-21-rc3-mm2.orig/kernel/fork.c  2007-03-12 17:18:03.0 -0700
+++ lx26-21-rc3-mm2/kernel/fork.c   2007-03-12 17:18:11.0 -0700
@@ -1252,14 +1252,11 @@ static struct task_struct *copy_process(
tracehook_init_task(p);
 
if (thread_group_leader(p)) {
-   pid_t pgid = process_group(current);
-   pid_t sid = process_session(current);
-
p->signal->tty = current->signal->tty;
-   p->signal->pgrp = pgid;
+   p->signal->pgrp = process_group(current);
set_signal_session(p->signal, process_session(current));
-   attach_pid(p, PIDTYPE_PGID, find_pid(pgid));
-   attach_pid(p, PIDTYPE_SID, find_pid(sid));
+   attach_pid(p, PIDTYPE_PGID, task_pgrp(current));
+   attach_pid(p, PIDTYPE_SID, task_session(current));
 
list_add_tail_rcu(&p->tasks, &init_task.tasks);
__get_cpu_var(process_counts)++;


[PATCH 1/5] statically initialize struct pid for swapper

2007-03-12 Thread sukadev


From: Sukadev Bhattiprolu <[EMAIL PROTECTED]>
Subject: [PATCH 1/5] statically initialize struct pid for swapper

Statically initialize a struct pid for the swapper process (pid_t == 0) and
attach it to init_task.  This is needed so task_pid(), task_pgrp() and
task_session() interfaces work on the swapper process also.

Signed-off-by: Sukadev Bhattiprolu <[EMAIL PROTECTED]>
Cc: Cedric Le Goater <[EMAIL PROTECTED]>
Cc: Dave Hansen <[EMAIL PROTECTED]>
Cc: Serge Hallyn <[EMAIL PROTECTED]>
Cc: Eric Biederman <[EMAIL PROTECTED]>
Cc: Herbert Poetzl <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Acked-by: Eric W. Biederman <[EMAIL PROTECTED]>
---

 include/linux/init_task.h |   27 +++
 include/linux/pid.h   |2 ++
 kernel/pid.c  |2 ++
 3 files changed, 31 insertions(+)

Index: lx26-20-mm2c/include/linux/init_task.h
===
--- lx26-20-mm2c.orig/include/linux/init_task.h 2007-02-28 15:47:44.0 -0800
+++ lx26-20-mm2c/include/linux/init_task.h  2007-02-28 15:48:07.0 -0800
@@ -96,6 +96,28 @@ extern struct group_info init_groups;
 #define INIT_PREEMPT_RCU
 #endif
 
+#define INIT_STRUCT_PID {  \
+   .count  = ATOMIC_INIT(1),   \
+   .nr = 0,\
+   /* Don't put this struct pid in pid_hash */ \
+   .pid_chain  = { .next = NULL, .pprev = NULL },  \
+   .tasks  = { \
+   { .first = &init_task.pids[PIDTYPE_PID].node }, \
+   { .first = &init_task.pids[PIDTYPE_PGID].node },\
+   { .first = &init_task.pids[PIDTYPE_SID].node }, \
+   },  \
+   .rcu= RCU_HEAD_INIT,\
+}
+
+#define INIT_PID_LINK(type)\
+{  \
+   .node = {   \
+   .next = NULL,   \
+   .pprev = &init_struct_pid.tasks[type].first,\
+   },  \
+   .pid = &init_struct_pid,\
+}
+
 /*
  *  INIT_TASK is used to set up the first task table, touch at
  * your own risk!. Base=0, limit=0x1f (=2MB)
@@ -145,6 +167,11 @@ extern struct group_info init_groups;
.cpu_timers = INIT_CPU_TIMERS(tsk.cpu_timers),  \
.fs_excl= ATOMIC_INIT(0),   \
.pi_lock= SPIN_LOCK_UNLOCKED,   \
+   .pids = {   \
+   [PIDTYPE_PID]  = INIT_PID_LINK(PIDTYPE_PID),\
+   [PIDTYPE_PGID] = INIT_PID_LINK(PIDTYPE_PGID),   \
+   [PIDTYPE_SID]  = INIT_PID_LINK(PIDTYPE_SID),\
+   },  \
INIT_TRACE_IRQFLAGS \
INIT_LOCKDEP\
 }
Index: lx26-20-mm2c/include/linux/pid.h
===
--- lx26-20-mm2c.orig/include/linux/pid.h   2007-02-28 15:48:07.0 -0800
+++ lx26-20-mm2c/include/linux/pid.h2007-02-28 15:48:07.0 -0800
@@ -51,6 +51,8 @@ struct pid
struct rcu_head rcu;
 };
 
+extern struct pid init_struct_pid;
+
 struct pid_link
 {
struct hlist_node node;
Index: lx26-20-mm2c/kernel/pid.c
===
--- lx26-20-mm2c.orig/kernel/pid.c  2007-02-28 15:48:07.0 -0800
+++ lx26-20-mm2c/kernel/pid.c   2007-02-28 15:48:07.0 -0800
@@ -27,11 +27,13 @@
 #include 
 #include 
 #include 
+#include 
 
 #define pid_hashfn(nr) hash_long((unsigned long)nr, pidhash_shift)
 static struct hlist_head *pid_hash;
 static int pidhash_shift;
 static struct kmem_cache *pid_cachep;
+struct pid init_struct_pid = INIT_STRUCT_PID;
 
 int pid_max = PID_MAX_DEFAULT;
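
A note on the static initialization above: the two initializers refer to each other, since init_struct_pid's task lists point at init_task's pid links and each link points back at init_struct_pid. The following is only a userspace sketch of that mutual linkage, assuming simplified stand-ins for the kernel's hlist types; struct task_sketch, the trimmed struct pid and links_consistent() are hypothetical and not part of the patch.

```c
#include <stddef.h>

/* Simplified userspace stand-ins for the kernel's hlist types. */
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

enum pid_type { PIDTYPE_PID, PIDTYPE_PGID, PIDTYPE_SID, PIDTYPE_MAX };

struct pid;
struct pid_link { struct hlist_node node; struct pid *pid; };

/* Trimmed-down struct pid: only the fields the sketch needs. */
struct pid {
	int nr;
	struct hlist_head tasks[PIDTYPE_MAX];
};

/* Hypothetical stand-in for task_struct. */
struct task_sketch { struct pid_link pids[PIDTYPE_MAX]; };

extern struct task_sketch init_task;
extern struct pid init_struct_pid;

/* Each link is the only node on its list and points back at the pid. */
#define INIT_PID_LINK(type)					\
{								\
	.node = {						\
		.next = NULL,					\
		.pprev = &init_struct_pid.tasks[type].first,	\
	},							\
	.pid = &init_struct_pid,				\
}

struct pid init_struct_pid = {
	.nr = 0,
	.tasks = {
		{ .first = &init_task.pids[PIDTYPE_PID].node },
		{ .first = &init_task.pids[PIDTYPE_PGID].node },
		{ .first = &init_task.pids[PIDTYPE_SID].node },
	},
};

struct task_sketch init_task = {
	.pids = {
		[PIDTYPE_PID]  = INIT_PID_LINK(PIDTYPE_PID),
		[PIDTYPE_PGID] = INIT_PID_LINK(PIDTYPE_PGID),
		[PIDTYPE_SID]  = INIT_PID_LINK(PIDTYPE_SID),
	},
};

/* Return 1 if every list holds exactly init_task's link, wired both ways. */
int links_consistent(void)
{
	for (int t = 0; t < PIDTYPE_MAX; t++) {
		struct hlist_node *n = init_struct_pid.tasks[t].first;

		if (n != &init_task.pids[t].node || n->next != NULL)
			return 0;
		if (n->pprev != &init_struct_pid.tasks[t].first)
			return 0;
		if (init_task.pids[t].pid != &init_struct_pid)
			return 0;
	}
	return 1;
}
```

Because every address involved is a link-time constant, no runtime setup is needed, which is presumably why the patch can attach the pid to the swapper before any allocator is available.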
 


Re: RSDL v0.30 cpu scheduler for mainline kernels

2007-03-12 Thread Willy Tarreau
On Tue, Mar 13, 2007 at 02:05:23PM +1100, Con Kolivas wrote:
> On Tuesday 13 March 2007 10:46, David Miller wrote:
> > From: Con Kolivas <[EMAIL PROTECTED]>
> > Date: Mon, 12 Mar 2007 10:58:11 +1100
> >
> > > http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-sched-rsdl-0.
> > >30.patch
> >
> > FWIW, this boots and seems to work well on sparc64.  Tested
> > on UP SunBlade1500 and 24cpu Niagara T1000.
> 
> Very nice. Thanks for the feedback and I'm sorry you have to work with such 
> lousy hardware.

BTW, I don't know if you say this as a joke, but those are not necessarily
lousy hardware. Sun does lousy hardware when they put Sparcs in PCs (ultra5,
ultra10, blade100). But their servers generally are nice with large memory
busses and very scalable SMP architectures.

Regards,
Willy



Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2

2007-03-12 Thread Kyle Moffett

On Mar 12, 2007, at 11:26:25, Linus Torvalds wrote:
So "good fairness" really should involve some notion of "work done  
for others". It's just not very easy to do..


Maybe extend UNIX sockets to add another passable object type
vis-a-vis SCM_RIGHTS, except in this case "SCM_CPUTIME".  You call
SCM_CPUTIME with a time value in monotonic real-time nanoseconds
(duration) and a value out of 100 indicating what percentage of your
timeslices to give to the process (for the specified duration).  The
receiving process would be informed of the estimated total number of
nanoseconds of timeslice that it will be given based on the priority
of the processes. (Maybe it could prioritize requests?).  The X
libraries could then properly "pass" CPU time to the X server to help
with rendering their requests, and the X server could give priority
to tasks which give up more CPU time than is needed to render their
data, and penalize those which use more than they give.  Initially
even if you don't patch the X server you could at least patch the X
clients to give up CPU to the X server to promote interactivity.
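
For readers unfamiliar with the mechanism being extended: SCM_CPUTIME does not exist, but SCM_RIGHTS does, and the proposal would presumably travel the same way, as ancillary data on a UNIX socket. The sketch below shows only the existing SCM_RIGHTS path; send_fd() and recv_fd() are illustrative helper names, not part of any proposal in this thread.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one file descriptor over a UNIX socket. Returns 0 on success. */
int send_fd(int sock, int fd)
{
	char dummy = 'x';	/* at least one byte of real data must go along */
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union {
		struct cmsghdr align;	/* forces correct alignment of buf */
		char buf[CMSG_SPACE(sizeof(int))];
	} u;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;	/* a new object type would be named here */
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor. Returns the new descriptor, or -1. */
int recv_fd(int sock)
{
	char dummy;
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union {
		struct cmsghdr align;
		char buf[CMSG_SPACE(sizeof(int))];
	} u;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

A time-donation message would need a different payload (duration and percentage rather than a descriptor) and kernel support to act on it; none of that is sketched here.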


Cheers,
Kyle Moffett



2.6.21rc suspend to ram regression on Lenovo X60

2007-03-12 Thread Dave Jones
I spent considerable time over the last day or so bisecting to
find out why an X60 stopped resuming somewhen between 2.6.20 and current -git.
(Total lockup, black screen of death).

The bisect log looked like this.

git-bisect start
# bad: [c8f71b01a50597e298dc3214a2f2be7b8d31170c] Linux 2.6.21-rc1
git-bisect bad c8f71b01a50597e298dc3214a2f2be7b8d31170c
# good: [fa285a3d7924a0e3782926e51f16865c5129a2f7] Linux 2.6.20
git-bisect good fa285a3d7924a0e3782926e51f16865c5129a2f7
# bad: [574009c1a895aeeb85eaab29c235d75852b09eb8] Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
git-bisect bad 574009c1a895aeeb85eaab29c235d75852b09eb8
# bad: [43187902cbfafe73ede0144166b741fb0f7d04e1] Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6
git-bisect bad 43187902cbfafe73ede0144166b741fb0f7d04e1
# good: [1545085a28f226b59c243f88b82ea25393b0d63f] drm: Allow for 44 bit user-tokens (or drm_file offsets)
git-bisect good 1545085a28f226b59c243f88b82ea25393b0d63f
# good: [c96e2c92072d3e78954c961f53d8c7352f7abbd7] Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/usb-2.6
git-bisect good c96e2c92072d3e78954c961f53d8c7352f7abbd7
# good: [31c56d820e03a2fd47f81d6c826f92caf511f9ee] [POWERPC] pasemi: iommu support
git-bisect good 31c56d820e03a2fd47f81d6c826f92caf511f9ee
# bad: [78149df6d565c36675463352d0bfeb02b7a7] Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6
git-bisect bad 78149df6d565c36675463352d0bfeb02b7a7
# good: [3d9c18872fa1db5c43ab97d8cbca43775998e49c] shpchp: remove CONFIG_HOTPLUG_PCI_SHPC_POLL_EVENT_MODE
git-bisect good 3d9c18872fa1db5c43ab97d8cbca43775998e49c
# good: [88187dfa4d8bb565df762f272511d2c91e427e0d] MSI: Replace pci_msi_quirk with calls to pci_no_msi()
git-bisect good 88187dfa4d8bb565df762f272511d2c91e427e0d
# good: [866a8c87c4e51046602387953bbef76992107bcb] msi: Fix msi_remove_pci_irq_vectors.
git-bisect good 866a8c87c4e51046602387953bbef76992107bcb
# good: [f7feaca77d6ad6bcfcc88ac54e3188970448d6fe] msi: Make MSI useable more architectures
git-bisect good f7feaca77d6ad6bcfcc88ac54e3188970448d6fe
# good: [14719f325e1cd4ff757587e9a221ebaf394563ee] Revert "PCI: remove duplicate device id from ata_piix"
git-bisect good 14719f325e1cd4ff757587e9a221ebaf394563ee

which led me to a final 'bad' commit of 78149df6d565c36675463352d0bfeb02b7a7
which is a merge changeset of lots of PCI bits.
Seeing a couple of MSI changes in there, on a hunch I booted latest tree with
pci=nomsi, and it resumed again.

Any ideas how to further debug this?
I'll try backing out individual changes from that merge tomorrow.

Dave

-- 
http://www.codemonkey.org.uk


3280277 - ynlg

2007-03-12 Thread Virus Research
AVERT Labs - Beaverton

Current Scan Engine Version:5100.0194

Current DAT Version:4982.

Thank you for your submission.


Analysis ID: 3280277

File Name          Findings           Detection              Type   Extra
[EMAIL PROTECTED]  current detection  w32/[EMAIL PROTECTED]  Virus  no

current detection [EMAIL PROTECTED]


The file received is infected and can be detected and removed with our
current DAT files and engine. It is recommended that you update your DAT
and engine files and scan your computer again.


If you are not seeing this with the product you are using, please speak with
technical 
support so that they can help you determine the cause of this discrepancy.


To find detailed information about viruses and other malware, please review
AVERT's
Virus Information Library:


http://vil.mcafeesecurity.com


In order to get the fastest possible response, you may wish to submit future

virus-samples to:


https://www.webimmune.net/default.asp


In most cases it can respond almost instantly with a solution. This may also
be the
best option if you are having a problem with gateway scanners stripping your
sample
submission.


If you believe your computer is infected, but are unsure which files should
be 
submitted to AVERT for review, please visit:


http://vil.mcafeesecurity.com/vil/submit-sample.aspx


For other virus-related information, please review the AVERT homepage at:


http://www.mcafee.com/us/threat_center/default.asp


Support -


Virus Research accepts file-samples for analysis and possible inclusion into
AV
signature DAT sets. We are also prepared to answer general virus questions.
All
product-related questions and comments can be addressed through technical
support and  
customer service, including:


* Product installation and update questions

* Product usage questions

* Specific operating system/version questions

* Assistance with detection and cleaning or removal of viruses or trojans


Use the following link to update your DAT and scan engine to the most
current version: 

http://www.mcafee.com/apps/downloads/security_updates/dat.asp


Use the following links to reach online technical support for McAfee
products -

Corporate Customers:


http://www.mcafeesecurity.com/us/support/


Single User/Retail Customers:


http://www.mcafeehelp.com


Note -


Due to the prevalence of network gateway AV products, it is important that
all 
submissions be zipped and the zip file password-protected (password -
infected). Some  
products will reject an email that contains a virus that is not sent in this
way. In   
addition, often we receive a file that appears not to have been infected, to
find  
later that the file was infected when it left the sender, and was cleaned
somewhere
along the line.


Regards,




McAfee AVERT tm

A division of McAfee, Inc



Re: Stracing Amanda (was: RSDL for 2.6.21-rc3- 0.29)

2007-03-12 Thread Gene Heskett
On Monday 12 March 2007, Nish Aravamudan wrote:
>On 3/12/07, Gene Heskett <[EMAIL PROTECTED]> wrote:
>> On Monday 12 March 2007, Douglas McNaught wrote:
>> >Patrick Mau <[EMAIL PROTECTED]> writes:
>> >> Why not temporarily replace "/bin/tar" with a shell script that
>> >> does:
>> >>
>> >> #!/bin/sh
>> >> exec strace -f -o output /bin/real.tar $@
>> >
>> >You beat me to it.  :) I've done that before; it's a great
>> > suggestion.
>> >
>> >Except that if you expect 'tar' to be invoked multiple times in a
>> > run, you should probably use 'output.$$' for the output filename so
>> > things don't get clobbered.
>> >
>> >-Doug
>>
>> In my case, Doug, it will get invoked 64 times, amanda does a dummy
>> run to get an estimate, calculates what to do based on that output
>> which is 32 runs, 1 per disklist entry and I have 32, and then reruns
>> tar with the appropriate level options against each individual
>> disklist entry.
>>
>> But I'm puzzled a bit: what does the double $$ do? Or is it buried
>> someplace in the bash manpage?  It's not something I've stumbled over
>> yet.
>
>buried indeed:
>
>"Special Parameters:
>  ...
>   $  Expands to the process ID of the shell.  In a () subshell, it
> expands to the process ID of the current shell, not the subshell.
>"

Well, that's clear enough, but what of the double $$ case?  Would this
then make a PID unique to each invocation until it finally wraps a
16-bit value, or will the kernel re-use them because they won't all be
running simultaneously?  The number running at once is limited by the
number of unique 'spindle' numbers on the system, to prevent, as best
it can, the thrashing of a drive by having tar work on 2 (or more)
separate partitions at the same time.  In my case 2 are possible,
as /var is on a separate drive.
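
On the "double $$" point: there is only one special parameter, named "$", and it is written "$$" because the first "$" is the usual expansion prefix. Each run of the wrapper script is a new shell process with its own PID, and since the script uses exec, strace then runs under that same PID. In C terms the value is simply what getpid() returns. A minimal sketch of building the same file name follows; trace_file_name() is a hypothetical helper, not something from strace or amanda.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Build "<prefix>.<pid>", the same string a shell produces for
 * "output.$$" when prefix is "output" and pid is the shell's PID.
 * Returns 0 on success, -1 if the buffer is too small.
 */
int trace_file_name(char *buf, size_t len, const char *prefix, pid_t pid)
{
	int n = snprintf(buf, len, "%s.%ld", prefix, (long)pid);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Calling it with getpid() as the last argument gives the name for the current process. Linux normally hands out PIDs in increasing order and only reuses a number after the counter wraps at pid_max (32768 by default), so 64 runs in one backup should get distinct names, though nothing guarantees uniqueness across a wrap.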

>Thanks,
>Nish



-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
"Say yur prayers, yuh flea-pickin' varmint!"
-- Yosemite Sam


Attachment Received Autoreply

2007-03-12 Thread Virus Research
Thank you for your file-sample. We will review your email and either send
you a response or forward to the appropriate contact. If you have sent us a
file which is not in a password protected zip file (password - infected)
then your sample will not be reviewed. 

__

Virus Research accepts file-samples for analysis and possible inclusion into
AV signature DAT sets. We are also prepared to answer general virus
questions. Virus Research does not handle product related issues. 

This message has been sent based upon keywords in your message.  If you have
been sent this message in error, please resend your message with the word
"noauto" in the subject line. 

__

Information on recent threats, along with other AVERT resources and tools,
can be found at: http://www.mcafeesecurity.com/us/security/home.asp

All product-related questions and comments can be addressed through
technical support. Contact information for Technical Support can be found
at: http://www.mcafeesecurity.com/us/contact/home.htm.

Engine and DAT updates are available at:
http://www.mcafeesecurity.com/us/downloads/updates

For instructions on submitting a sample to AVERT please see:
http://vil.nai.com/vil/submit-sample.asp

If you suspect you have a new, unknown virus and have a system where you can
do a test scan, you may first wish to try our Beta Hourly DATs to get the
latest detection available at:
http://vil.mcafeesecurity.com/vil/averttools.asp


Thanks - McAfee AVERT(tm)


Re: [PATCH] kthread_should_stop_check_freeze (was: Re: [PATCH -mm 3/7] Freezer: Remove PF_NOFREEZE from rcutorture thread)

2007-03-12 Thread Srivatsa Vaddagiri
On Mon, Mar 12, 2007 at 05:45:24PM -0500, Anton Blanchard wrote:
> Then please document it _clearly_ with the kthread code somewhere. 

Document it in the kernel_thread() API as well, since I notice people still
use kernel_thread() in some places (e.g. rtasd.c in the powerpc arch)?

> The reason I brought this up is I had no idea we had to put the freezer gunk
> in all kernel thread loops and Ive been writing kernel threads for years.

I noticed that in the PowerPC code (at least for the rtas kernel thread)
here:

http://lkml.org/lkml/2007/1/9/61

That was perhaps not a serious problem because the process freezer was mostly
used in software suspend, and only those platforms supporting software suspend
had to worry about it.

But now we intend to use the process freezer for CPU hotplug as well, so all
platforms wanting to support CPU hotplug had better support the process freezer!

P.S : I believe kprobes is already using process freezer as well.

-- 
Regards,
vatsa


Re: RSDL v0.30 cpu scheduler for mainline kernels

2007-03-12 Thread Con Kolivas
On Tuesday 13 March 2007 10:46, David Miller wrote:
> From: Con Kolivas <[EMAIL PROTECTED]>
> Date: Mon, 12 Mar 2007 10:58:11 +1100
>
> > http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-sched-rsdl-0.
> >30.patch
>
> FWIW, this boots and seems to work well on sparc64.  Tested
> on UP SunBlade1500 and 24cpu Niagara T1000.

Very nice. Thanks for the feedback and I'm sorry you have to work with such 
lousy hardware.

-- 
-ck


RE: Question: removal of syscall macros?

2007-03-12 Thread albcamus

2006/12/14, Teunis Peters <[EMAIL PROTECTED]>:


Now that syscall macros have been pulled from the -mm tree, what method
is recommended to use syscalls?

(I've wasted a day grubbing through sources before giving up and copying
the old syscall macros into one key driver)

_syscall macros are used by:
ATI driver  (no choice.  I'm working with laptops)


I have the same problem as you.  Do you have any idea how to use the ATI
firegl driver with recent kernels?  Thanks in advance.

Regards,
albcamus


Re: Make sure we populate the initroot filesystem late enough

2007-03-12 Thread Kumar Gala


On Mar 12, 2007, at 6:01 PM, Paul TBBle Hampson wrote:


On Thu, Mar 01, 2007 at 09:30:56AM +0900, Michael Ellerman wrote:

On Wed, 2007-02-28 at 10:13 +, David Woodhouse wrote:

On Wed, 2007-02-28 at 07:43 +0100, Benjamin Herrenschmidt wrote:
I wouldn't be that sure ... I've had problems in the past with PMU based
cpufreq... looks like flushing all caches and hard-resetting the
processor on the fly when there can be pending DMAs might be a source of
trouble... especially on CPUs that don't have working cache flush HW
assist.

I've seen it on a PowerMac3,1 (400MHz G4) where we don't have cpufreq.
I've also seen it on the latest 1.5GHz Mac Mini, and on my shinybook.
They all fall over with the latest kernel, although the shinybook only
does so immediately when booted with mem=512M. The shinybook does crash
later with new kernels though; I don't yet know why. It could be the
same thing, or it could be something different. That one seemed to
appear between Fedora's 2.6.19-1.2913 and 2.6.19-1.2914 kernels, where
we did nothing but turned CONFIG_SYSFS_DEPRECATED on.

I don't blame cpufreq. At various times I've been equally convinced that
it was due to CONFIG_KPROBES, and Linus' initrd-moving patch.

Is there any pattern to the way it dies? Or is it just randomly dying
somewhere depending on which config options you have enabled?

This is starting to sound reminiscent of a bug I chased for a while last
year on Power5, but didn't find. It was "fixed" on some machines by
disabling CONFIG_KEXEC, and/or other random unrelated CONFIG options.
Unfortunately it magically stopped reproducing so I never caught it :/

Hmm. The crash came back after I booted into Mac OS X and back. It was
however a different crash, I believe it was coming from the USB modules
(as it would keep going when it happened, and get another crash, which
tended to scroll away too fast for me to capture) but I believe it was
still getting down into the slab code and actually dying there.

However, reverting the reversion of
8d610dd52dd1da696e199e4b4545f33a2a5de5c6 and instead applying
the following patch:

diff -ru linux-source-2.6.20.orig/arch/powerpc/mm/init_32.c linux-source-2.6.20/arch/powerpc/mm/init_32.c
--- linux-source-2.6.20.orig/arch/powerpc/mm/init_32.c  2007-02-05 05:44:54.0 +1100
+++ linux-source-2.6.20/arch/powerpc/mm/init_32.c   2007-03-10 11:03:56.0 +1100
@@ -244,7 +244,8 @@
 void free_initrd_mem(unsigned long start, unsigned long end)
 {
if (start < end)
-   printk ("Freeing initrd memory: %ldk freed\n", (end - start) >> 10);
+   printk ("NOT Freeing initrd memory: %ldk freed\n", (end - start) >> 10);
+   return;
for (; start < end; start += PAGE_SIZE) {
ClearPageReserved(virt_to_page(start));
init_page_count(virt_to_page(start));

which if I recall correctly David Woodhouse posted to this thread,
seems to have fixed it.

I dunno if it's relevant, but my initrd.img is 13193315 bytes long,
(ie 99 bytes over 12884k) and the above logs:
"NOT Freeing initrd memory: 12888k freed"
which makes sense...
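
The numbers are consistent if the freed range is rounded up to whole pages. A small sketch of that arithmetic, assuming 4 KiB pages and a page-aligned start address (both assumptions, not stated in the thread); SKETCH_PAGE_SIZE and the function names are made up for illustration.

```c
/* Assumed page size for this sketch: 4 KiB. */
#define SKETCH_PAGE_SIZE 4096UL

/* Round a byte count up to the next page boundary. */
unsigned long page_align_up(unsigned long bytes)
{
	return (bytes + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/* Whole kilobytes in the raw image, as in the "99 bytes over" remark. */
unsigned long image_kb(unsigned long image_bytes)
{
	return image_bytes >> 10;
}

/* Kilobytes the printk would report for a page-aligned range. */
unsigned long reported_kb(unsigned long image_bytes)
{
	return page_align_up(image_bytes) >> 10;
}
```

With image_bytes = 13193315, image_kb() gives 12884 (12884 * 1024 = 13193216, leaving 99 bytes), and reported_kb() gives 12888, since 3222 pages of 4 KiB are needed to cover the image.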

I of course completely failed to think to check this with the crashing
kernel, if it seems relevant I can roll back to it and get the numbers.


Have you tried 2.6.20.2?  There was a significant bug in get_order()
that was deemed to be causing these issues.


- k


Re: Stracing Amanda (was: RSDL for 2.6.21-rc3- 0.29)

2007-03-12 Thread Nish Aravamudan

On 3/12/07, Gene Heskett <[EMAIL PROTECTED]> wrote:

On Monday 12 March 2007, Douglas McNaught wrote:
>Patrick Mau <[EMAIL PROTECTED]> writes:
>> Why not temporarily replace "/bin/tar" with a shell script that does:
>>
>> #!/bin/sh
>> exec strace -f -o output /bin/real.tar $@
>
>You beat me to it.  :) I've done that before; it's a great suggestion.
>
>Except that if you expect 'tar' to be invoked multiple times in a run,
>you should probably use 'output.$$' for the output filename so things
>don't get clobbered.
>
>-Doug

In my case, Doug, it will get invoked 64 times, amanda does a dummy run to
get an estimate, calculates what to do based on that output which is 32
runs, 1 per disklist entry and I have 32, and then reruns tar with the
appropriate level options against each individual disklist entry.

But I'm puzzled a bit: what does the double $$ do? Or is it buried someplace
in the bash manpage?  It's not something I've stumbled over yet.


buried indeed:

"Special Parameters:
 ...
  $  Expands to the process ID of the shell.  In a () subshell, it
 expands to the process ID of the current shell, not the subshell.
"

Thanks,
Nish


Re: Stracing Amanda (was: RSDL for 2.6.21-rc3- 0.29)

2007-03-12 Thread Gene Heskett
On Monday 12 March 2007, Douglas McNaught wrote:
>Patrick Mau <[EMAIL PROTECTED]> writes:
>> Why not temporarily replace "/bin/tar" with a shell script that does:
>>
>> #!/bin/sh
>> exec strace -f -o output /bin/real.tar $@
>
>You beat me to it.  :) I've done that before; it's a great suggestion.
>
>Except that if you expect 'tar' to be invoked multiple times in a run,
>you should probably use 'output.$$' for the output filename so things
>don't get clobbered.
>
>-Doug

In my case, Doug, it will get invoked 64 times, amanda does a dummy run to 
get an estimate, calculates what to do based on that output which is 32 
runs, 1 per disklist entry and I have 32, and then reruns tar with the 
appropriate level options against each individual disklist entry.

But I'm puzzled a bit: what does the double $$ do? Or is it buried someplace
in the bash manpage?  It's not something I've stumbled over yet.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
rugged, adj.:
Too heavy to lift.


Re: Fwd: libata extension

2007-03-12 Thread Vitaliyi

Why is the access to Control register needed?


To execute soft reset for example.


> In the perfect case i would like to be able to execute vendor command
> set (reverse engineered).

Sounds interesting. :-)

Could you give some more details on what are you going to implement?


Reading/writing service area, uploading, downloading modules, working
with flash etc.


Re: [QUICKLIST 0/6] Arch independent quicklists V1

2007-03-12 Thread David Miller
From: David Miller <[EMAIL PROTECTED]>
Date: Mon, 12 Mar 2007 19:26:16 -0700 (PDT)

> From: Paul Mackerras <[EMAIL PROTECTED]>
> Date: Tue, 13 Mar 2007 11:37:32 +1100
> 
> > David Miller writes:
> > 
> > > I ported this to sparc64 as per the patch below, tested on
> > > UP SunBlade1500 and 24 cpu Niagara T1000.
> > 
> > Did you see any performance improvement?  We used to have quicklists
> > on ppc, but I remain to be convinced that they actually help.
> 
> It shaved about 3 or 4 seconds consistently off of my kernel
> build on Niagara which usually clocks in just over 4 minutes
> on this 24 thread machine.

I want to qualify this with the fact that all the cache false sharing
issues are irrelevant in this test because the L2 cache is shared
between all of the cpu threads on Niagara.

It was fast just because the quicklists were lighter weight than the
SLAB stuff.


Re: [QUICKLIST 0/6] Arch independent quicklists V1

2007-03-12 Thread David Miller
From: Paul Mackerras <[EMAIL PROTECTED]>
Date: Tue, 13 Mar 2007 11:37:32 +1100

> David Miller writes:
> 
> > I ported this to sparc64 as per the patch below, tested on
> > UP SunBlade1500 and 24 cpu Niagara T1000.
> 
> Did you see any performance improvement?  We used to have quicklists
> on ppc, but I remain to be convinced that they actually help.

It shaved about 3 or 4 seconds consistently off of my kernel
build on Niagara which usually clocks in just over 4 minutes
on this 24 thread machine.

> Also, I didn't understand why we have to do quicklists to take
> advantage of the fact that the pages are in a pristine state when they
> are freed.  I thought the whole point of the slab allocator was to be
> able to take advantage of that...

He just wants to side-step the issue in SLUB, which arguably
is an attempt to simplify SLUB at the expense of functionality.

I don't agree with that, but I'm merely preemptively testing his
patches and porting them to sparc64 so it does not break when/if his
code is merged in.  After being bitten by stuff like this in the past,
I've decided to become more proactive :)


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-12 Thread Paul Menage

On 3/12/07, Herbert Poetzl <[EMAIL PROTECTED]> wrote:


why? you simply enter that specific space and
use the existing mechanisms (netlink, proc, whatever)
to retrieve the information with _existing_ tools,


That's assuming that you're using network namespace virtualization,
with each group of tasks in a separate namespace. What if you don't
want the virtualization overhead, just the accounting?

Paul


Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2

2007-03-12 Thread Lee Revell

On 3/12/07, David Lang <[EMAIL PROTECTED]> wrote:

the problem comes when this isn't enough. if you have several CPU hogs on a
system, and they are all around the same priority level, how can the scheduler
know which one needs the CPU the most for good interactivity?

in some cases you may be able to directly detect that your high-priority process
is waiting for another one (tracing pipes and local sockets for example), but
what if you are waiting for several of them? (think a multimedia desktop waiting
for the sound card, CDRom, hard drive, and video all at once) which one needs
the extra CPU the most?


I'm not an expert in this area by any means but after reading this
thread the OSX solution of simply telling the kernel "I'm the GUI,
schedule me accordingly" looks increasingly attractive.  Why make the
kernel guess when we can just be explicit?

Does anyone know of a UNIX-like system that has managed to solve this
problem without hooking the GUI into the scheduler?

Lee


Re: [RFC][PATCH 2/7] RSS controller core

2007-03-12 Thread Srivatsa Vaddagiri
On Tue, Mar 13, 2007 at 07:27:06AM +0530, Balbir Singh wrote:
> I am not sure what went wrong. Could you please check your mail
> client, cause it seemed to even change email address to smtp.osdl.org
> which bounced back when I wrote to you earlier.

I have a problem doing a group-reply in mutt to Herbert's mails. His
email id gets dropped from the To or Cc list. Is that his email setting?
Don't know.

-- 
Regards,
vatsa


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-12 Thread Srivatsa Vaddagiri
On Tue, Mar 13, 2007 at 12:31:13AM +0100, Herbert Poetzl wrote:
> just means that the current Linux-VServer behaviour
> is a subset of that, no problem there as long as
> it really _is_ a subset :) we always like to provide
> more features in the future, no problem with that :)

Considering the example Sam quoted, doesn't it make sense to split
resource classes (some of them at least) independent of each other?
That would also argue for providing the multiple-hierarchy feature in
Paul's patches.

Given that and the mail Serge sent on why the nsproxy optimization is
useful given the numbers, can you reconsider your earlier proposals as
below:

- pid_ns and resource parameters should be in a single struct
  (point 1c, 7b in [1])

- pointers to resource controlling objects should be inserted
  in task_struct directly (instead of nsproxy indirection)
  (point 2c in [1])

[1] http://lkml.org/lkml/2007/3/12/138

-- 
Regards,
vatsa


Re: [RFC][PATCH 2/7] RSS controller core

2007-03-12 Thread Balbir Singh

hmm, it is very unlikely that this would happen,
for several reasons ... and indeed, checking the
thread in my mailbox shows that akpm dropped you ...



But, I got Andrew's email.



Subject: [RFC][PATCH 2/7] RSS controller core
From: Pavel Emelianov <[EMAIL PROTECTED]>
To: Andrew Morton <[EMAIL PROTECTED]>, Paul Menage <[EMAIL PROTECTED]>,
Srivatsa Vaddagiri <[EMAIL PROTECTED]>,
Balbir Singh <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED],
Linux Kernel Mailing List 
Date: Tue, 06 Mar 2007 17:55:29 +0300

Subject: Re: [RFC][PATCH 2/7] RSS controller core
From: Andrew Morton <[EMAIL PROTECTED]>
To: Pavel Emelianov <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED],
Paul Menage <[EMAIL PROTECTED]>,
List 
Date: Tue, 6 Mar 2007 14:00:36 -0800

that's the one I 'group' replied to ...

> Could you please not modify the "cc" list.

I never modify the cc unless explicitly asked
to do so. I wish others would have it that way
too :)



That's good to know, but my mailer shows


Andrew Morton <[EMAIL PROTECTED]>
to  Pavel Emelianov <[EMAIL PROTECTED]>   
cc  
Paul Menage <[EMAIL PROTECTED]>,
Srivatsa Vaddagiri <[EMAIL PROTECTED]>,
Balbir Singh <[EMAIL PROTECTED]> (see I am <>),
devel@openvz.org,
Linux Kernel Mailing List ,
[EMAIL PROTECTED],
Kirill Korotaev <[EMAIL PROTECTED]>   
date    Mar 7, 2007 3:30 AM
subject Re: [RFC][PATCH 2/7] RSS controller core
mailed-by   vger.kernel.org 
On Tue, 06 Mar 2007 17:55:29 +0300

and your reply as

Andrew Morton <[EMAIL PROTECTED]>,
Pavel Emelianov <[EMAIL PROTECTED]>,
[EMAIL PROTECTED],
[EMAIL PROTECTED],
[EMAIL PROTECTED],
Paul Menage <[EMAIL PROTECTED]>,
List
to  Andrew Morton <[EMAIL PROTECTED]> 
cc  
Pavel Emelianov <[EMAIL PROTECTED]>,
[EMAIL PROTECTED],
[EMAIL PROTECTED],
[EMAIL PROTECTED],
Paul Menage <[EMAIL PROTECTED]>,
List
date    Mar 9, 2007 10:18 PM
subject Re: [RFC][PATCH 2/7] RSS controller core
mailed-by   vger.kernel.org

I am not sure what went wrong. Could you please check your mail
client, cause it seemed to even change email address to smtp.osdl.org
which bounced back when I wrote to you earlier.


best,
Herbert



Cheers,
Balbir


Fwd: PROBLEM: 2.6.20-1 not working on ibook g4 (BUG/Oops)

2007-03-12 Thread young dave

-- Forwarded message --
Hi,
I have tested on my mac mini g4.

2.6.21-rc2 causes an oops like the one in the post above.

With the new 2.6.21-rc3-git7, the kernel loads OK and the penguin pixmap
appears, but then it stops; there are no error messages either.

Regards
dave


2007/3/7, Benjamin Herrenschmidt <[EMAIL PROTECTED]>:

On Wed, 2007-03-07 at 17:53 +1300, Paul Collins wrote:
> David Woodhouse <[EMAIL PROTECTED]> writes:
>
> > On Tue, 2007-03-06 at 14:53 +1300, Paul Collins wrote:
> >> In case it's of interest, 2.6.20 has been running fine on my
> >> PowerBook5,4.
> >
> > How much memory? What if you boot with mem=512M or mem=256M?
>
> 1GB.  Also works fine when booted with those options.

Can you try 2.6.21-rc3 ? We just fixed a nasty bug causing memory
corruption.

Ben.


___
Linuxppc-dev mailing list
[EMAIL PROTECTED]
https://ozlabs.org/mailman/listinfo/linuxppc-dev




Re: [PATCH] usb-serial regression fix

2007-03-12 Thread Mark Lord

Jim Radford wrote:
> On Mon, Mar 12, 2007 at 05:18:19PM -0700, Greg KH wrote:
> > On Mon, Mar 12, 2007 at 03:59:22PM -0700, Jim Radford wrote:
> > > On Mon, Mar 12, 2007 at 03:42:35PM -0700, Jim Radford wrote:
> > > > On Mon, Mar 12, 2007 at 01:33:31PM -0700, Greg KH wrote:
> > > > > On Mon, Mar 12, 2007 at 04:22:22PM -0400, Mark Lord wrote:
> > > > > > Oliver Neukum wrote:
> > > > > > > > Mark Lord wrote:
> > > > > > > > > Okay, from that part (above), the problem is obvious:
> > > > > > > > > in that the "MCT U232 converter now disconnected"
> > > > > > > > > appears, and then we continue to try and call the
> > > > > > > > > driver's method.. Oops!
>
> > > > > > > IMHO shutdown() is using serial->port[] and bombs.
> > > > > > > Could you reverse the order here?
>
> > > Do not NULL serial->port[i] since it is used in ->shutdown().
> > > This wasn't an issue until the order of ->shutdown() and
> > > device_unregister was corrected.
>
> > > for (i = 0; i < serial->num_ports; ++i)
> > > if (serial->port[i]->dev.parent != NULL) {
> > > device_unregister(&serial->port[i]->dev);
> > > -   serial->port[i] = NULL;
> > > }
>
> > But shouldn't you null it out somewhere?  It will be an "empty"
> > pointer at some point in time...
>
> Not as far as I can see. The serial structure that ->port[i] is in
> gets kfree()ed soon after, in the same function, and nothing in
> between, other than ->shutdown(), uses ->port[].  I assume it was
> someone being overly cautious.


So where does the memory get freed -- the structure pointed at
by the serial->port[i] thingie ?  It's not a leak, is it?

???


Re: [QUICKLIST 0/6] Arch independent quicklists V1

2007-03-12 Thread Christoph Lameter
On Tue, 13 Mar 2007, Paul Mackerras wrote:

> Also, I didn't understand why we have to do quicklists to take
> advantage of the fact that the pages are in a pristine state when they
> are freed.  I thought the whole point of the slab allocator was to be
> able to take advantage of that...

It used to be the case that initializing objects early was better. 
Today it is better to initialize the objects immediately before they are 
used. That will move them into the cpu caches and keep them there. 
Initializing them earlier may cause the cachelines of the object to be 
evicted from the cpu cache and then those have to be refetched. The 
benefit of this approach diminishes the larger objects get and the sparser 
the access to the cachelines of the object. In the case of page sized 
objects that are sparsely accessed (the PAGE_SIZE caches covered by 
quicklists) it makes sense to attempt to avoid having to touch all 
cachelines of the page on alloc.




Stracing Amanda (was: RSDL for 2.6.21-rc3- 0.29)

2007-03-12 Thread Douglas McNaught
Patrick Mau <[EMAIL PROTECTED]> writes:

> Why not temporarily replace "/bin/tar" with a shell script that does:
>
> #!/bin/sh
> exec strace -f -o output /bin/real.tar "$@"

You beat me to it.  :) I've done that before; it's a great suggestion.

Except that if you expect 'tar' to be invoked multiple times in a run,
you should probably use 'output.$$' for the output filename so things
don't get clobbered.

-Doug


Re: sys_write() racy for multi-threaded append?

2007-03-12 Thread Alan Cox
> Writing to a file from multiple processes is not usually the problem.
> Writing to a common "struct file" from multiple threads is.

Not normally, because POSIX sensibly invented pread/pwrite. They forgot
preadv/pwritev, but they did the basics, and that is the end of the problem.
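
As an illustration of the pread/pwrite point, here is a minimal sketch,
assuming fixed-size records. The names put_record and RECORD_SIZE are
made up for the example. Each caller passes its own offset, so nothing
depends on the file position shared through the descriptor.

```c
#include <sys/types.h>
#include <unistd.h>

#define RECORD_SIZE 16  /* hypothetical fixed record length */

/*
 * Store one record in slot `index` of the file behind fd.
 * pwrite() takes the offset as an argument and leaves the shared
 * file position untouched, so concurrent callers do not race on it.
 * Returns 0 on success, -1 on error or short write.
 */
int put_record(int fd, unsigned int index, const char rec[RECORD_SIZE])
{
    off_t off = (off_t)index * RECORD_SIZE;
    ssize_t n = pwrite(fd, rec, RECORD_SIZE, off);

    return n == RECORD_SIZE ? 0 : -1;
}
```

The sketch treats a short pwrite() as a plain failure; a caller that
wants to retry would loop on the remaining bytes instead.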

> So what?  My products are shipping _now_.  

That doesn't inspire confidence.

> even funny.  If POSIX mandates stupid shit, and application
> programmers don't read that part of the manual anyway (and don't code
> on that assumption in practice), to hell with POSIX.  On many file

That's funny, you were talking about quality a moment ago.

> descriptors, short writes simply can't happen -- and code that

There is almost no descriptor this is true for. Any file I/O can and will
end up short on disk full or resource limit exceeded or quota exceeded or
NFS server exploded or ...

And on the device side about the only thing with the vaguest guarantees
is pipe().
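
For reference, the usual way for an application to cope with short
writes is a retry loop. The following is only a sketch under the
assumption of a blocking descriptor; write_full is a name chosen for
the example, not an existing API.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Write all `len` bytes of `buf` to fd, retrying after short writes
 * and EINTR.  Returns 0 once everything is written, or -1 with errno
 * set by the failing write().
 */
int write_full(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before any data moved */
            return -1;      /* ENOSPC, EDQUOT, EFBIG, EIO, ... */
        }
        p += n;             /* short write: skip what was accepted */
        len -= (size_t)n;
    }
    return 0;
}
```

When a short write was caused by a full disk or an exceeded quota, the
retry normally fails with the matching errno, so the caller sees the
real reason rather than a silent truncation.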

> purports to handle short writes but has never been exercised is
> arguably worse than code that simply bombs on short write.  So if I
> can't shim in an induce-short-writes-randomly-on-purpose mechanism
> during development, I don't want short writes in production, period.

Easy enough to do and gcov plus dejagnu or similar tools will let you
coverage analyse the resulting test set and replay it.

> Sure -- until the one code path in a hundred that handles the "short
> write" case incorrectly gets traversed in production, after having
> gone untested in a development environment that used a different
> filesystem that never happened to trigger it.

Competent QA and testing people test all the returns in the manual as
well as all the returns they can find in the code. See ptrace(2) if you
don't want to do a lot of relinking and strace for some useful worked
examples of syscall hooking.

Alan


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-12 Thread Nick Piggin
On Tue, Mar 13, 2007 at 12:01:13AM +0100, Blaisorblade wrote:
> On Wednesday 07 March 2007 11:02, Nick Piggin wrote:
> > >
> > > Yeah, tmpfs/shm segs are what I was thinking about. If UML can live with
> > > that as well, then I think it might be a good option.
> >
> > Oh, hmm if you can truncate these things then you still need to
> > force unmap so you still need i_mmap_nonlinear.
> 
> Well, we don't need truncate(), but MADV_REMOVE for memory hotunplug, which is
> way similar I guess.
> 
> About the restriction to tmpfs, I have just discovered 
> '[PATCH] mm: tracking shared dirty pages' (commit 
> d08b3851da41d0ee60851f2c75b118e1f7a5fc89), which already partially conflicts 
> with remap_file_pages for file-based mmaps (and that's fully fine, for now).
> 
> Even if UML does not need it, till now if there is a VMA protection and a page
> hasn't been remapped with remap_file_pages, the VMA protection is used (just 
> because it makes sense).
> 
> However, it is only used when the PTE is first created - we can never change 
> protections on a VMA  - so if vma_wants_writenotify() is true (on all 
> file-based and on no shmfs based mapping, right?), and we write-protect the 
> VMA, it will always be write-protected.

Yes, I believe that is the case, however I wonder if that is going to be
a problem for you to distinguish between write faults for clean writable
ptes, and write faults for readonly ptes?

> That's no problem for UML, but for any other user (I guess I'll have to 
> prevent callers from trying such stuff - I started from a pretty generic 
> patch).
> 
> > But come to think of it, I still don't think nonlinear mappings are
> > too bad as they are ;)
> 
> Btw, I really like removing ->populate and merging the common code together. 
> filemap_populate and shmem_populate are so obnoxiously different that I 
> already wanted to do that (after merging remap_file_pages() core).

Yeah they are also frustratingly similar to filemap_nopage and shmem_nopage,
and duplicate a lot of the same code ;)

> Also, I'm curious. Since my patches are already changing remap_file_pages() 
> code, should they be absolutely merged after yours?

Is there a big clash? I don't think I did a great deal to fremap.c (mainly
just removing stuff)...


[PATCH] i386: Simplify smp_call_function*() by using common implementation

2007-03-12 Thread Jeremy Fitzhardinge
Subject: Simplify smp_call_function*() by using common implementation

smp_call_function and smp_call_function_single are almost complete
duplicates of the same logic.  This patch combines them by
implementing them in terms of the more general
smp_call_function_mask().
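
The target selection that the common function performs can be modelled
in userspace. This is a rough sketch only: cpumask64, ipi_kind and
pick_ipi are invented for the illustration, a 64-bit integer stands in
for cpumask_t, and `self` is assumed to be below 64. No IPI is sent.

```c
#include <stdint.h>

typedef uint64_t cpumask64;   /* toy stand-in for cpumask_t: bit n = CPU n */

enum ipi_kind { IPI_NONE, IPI_ALLBUTSELF, IPI_MASK };

/*
 * Model of the target selection: drop offline CPUs and the caller from
 * *mask, store the number of remaining targets in *cpus, and report
 * which kind of IPI would be sent.
 */
enum ipi_kind pick_ipi(cpumask64 online, unsigned int self,
                       cpumask64 *mask, unsigned int *cpus)
{
    cpumask64 allbutself = online & ~((cpumask64)1 << self);
    cpumask64 m = *mask & allbutself;
    unsigned int n = 0;

    for (cpumask64 t = m; t; t &= t - 1)   /* count set bits */
        n++;

    *mask = m;
    *cpus = n;
    if (n == 0)
        return IPI_NONE;
    return m == allbutself ? IPI_ALLBUTSELF : IPI_MASK;
}
```

In this model a caller that passes the whole online map ends up on the
broadcast path, and a caller that passes a single CPU ends up on the
targeted path.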

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
Cc: Stephane Eranian <[EMAIL PROTECTED]>
Cc: Andrew Morton <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: "Randy.Dunlap" <[EMAIL PROTECTED]>
Cc: Ingo Molnar <[EMAIL PROTECTED]>

---
 arch/i386/kernel/smp.c |  213 ++--
 1 file changed, 102 insertions(+), 111 deletions(-)

===
--- a/arch/i386/kernel/smp.c
+++ b/arch/i386/kernel/smp.c
@@ -515,6 +515,73 @@ void unlock_ipi_call_lock(void)
 
 static struct call_data_struct *call_data;
 
+
+/**
+ * smp_call_function_mask(): Run a function on a set of other CPUs.
+ * @mask: The set of cpus to run on.  Must not include the current cpu.
+ * @func: The function to run. This must be fast and non-blocking.
+ * @info: An arbitrary pointer to pass to the function.
+ * @wait: If true, wait (atomically) until function has completed on other CPUs.
+ *
+ * Returns 0 on success, else a negative status code. Does not return until
+ * remote CPUs are nearly ready to execute <<func>> or are or have finished.
+ *
+ * You must not call this function with disabled interrupts or from a
+ * hardware interrupt handler or from a bottom half handler.
+ */
+int smp_call_function_mask(cpumask_t mask,
+  void (*func)(void *), void *info,
+  int wait)
+{
+   struct call_data_struct data;
+   cpumask_t allbutself;
+   int cpus;
+
+   /* Can deadlock when called with interrupts disabled */
+   WARN_ON(irqs_disabled());
+
+   /* Holding any lock stops cpus from going down. */
+   spin_lock(&call_lock);
+
+   allbutself = cpu_online_map;
+   cpu_clear(smp_processor_id(), allbutself);
+
+   cpus_and(mask, mask, allbutself);
+   cpus = cpus_weight(mask);
+
+   if (!cpus) {
+   spin_unlock(&call_lock);
+   return 0;
+   }
+
+   data.func = func;
+   data.info = info;
+   atomic_set(&data.started, 0);
+   data.wait = wait;
+   if (wait)
+   atomic_set(&data.finished, 0);
+
+   call_data = &data;
+   mb();
+
+   /* Send a message to other CPUs */
+   if (cpus_equal(mask, allbutself))
+   send_IPI_allbutself(CALL_FUNCTION_VECTOR);
+   else
+   send_IPI_mask(mask, CALL_FUNCTION_VECTOR);
+
+   /* Wait for response */
+   while (atomic_read(&data.started) != cpus)
+   cpu_relax();
+
+   if (wait)
+   while (atomic_read(&data.finished) != cpus)
+   cpu_relax();
+   spin_unlock(&call_lock);
+
+   return 0;
+}
+
 /**
  * smp_call_function(): Run a function on all other CPUs.
  * @func: The function to run. This must be fast and non-blocking.
@@ -528,48 +595,43 @@ static struct call_data_struct *call_dat
  * You must not call this function with disabled interrupts or from a
  * hardware interrupt handler or from a bottom half handler.
  */
-int smp_call_function (void (*func) (void *info), void *info, int nonatomic,
-   int wait)
-{
-   struct call_data_struct data;
-   int cpus;
-
-   /* Holding any lock stops cpus from going down. */
-   spin_lock(&call_lock);
-   cpus = num_online_cpus() - 1;
-   if (!cpus) {
-   spin_unlock(&call_lock);
-   return 0;
-   }
-
-   /* Can deadlock when called with interrupts disabled */
-   WARN_ON(irqs_disabled());
-
-   data.func = func;
-   data.info = info;
-   atomic_set(&data.started, 0);
-   data.wait = wait;
-   if (wait)
-   atomic_set(&data.finished, 0);
-
-   call_data = &data;
-   mb();
-   
-   /* Send a message to all other CPUs and wait for them to respond */
-   send_IPI_allbutself(CALL_FUNCTION_VECTOR);
-
-   /* Wait for response */
-   while (atomic_read(&data.started) != cpus)
-   cpu_relax();
-
-   if (wait)
-   while (atomic_read(&data.finished) != cpus)
-   cpu_relax();
-   spin_unlock(&call_lock);
-
-   return 0;
+int smp_call_function(void (*func) (void *info), void *info, int nonatomic,
+ int wait)
+{
+   return smp_call_function_mask(cpu_online_map, func, info, wait);
 }
 EXPORT_SYMBOL(smp_call_function);
+
+/*
+ * smp_call_function_single - Run a function on another CPU
+ * @func: The function to run. This must be fast and non-blocking.
+ * @info: An arbitrary pointer to pass to the function.
+ * @nonatomic: Currently unused.
+ * @wait: If true, wait until function has completed on other CPUs.
+ *
+ * Returns 0 on success, else a negative

Re: sys_write() racy for multi-threaded append?

2007-03-12 Thread Michael K. Edwards

On 3/12/07, Bodo Eggert <[EMAIL PROTECTED]> wrote:

On Mon, 12 Mar 2007, Michael K. Edwards wrote:
> That's fine when you're doing integration test, and should probably be
> the default during development.  But if the race is first exposed in
> the field, or if the developer is trying to concentrate on a different
> problem, "spectacular crash and burn" may do more harm than good.
> It's easy enough to refactor the f_pos handling in the kernel so that
> it all goes through three or four inline accessor functions, at which
> point you can choose your trade-off between speed and idiot-proofness
> -- at _kernel_ compile time, or (given future hardware that supports
> standardized optionally-atomic-based-on-runtime-flag operations) per
> process at run-time.

CONFIG_WOMBAT

Waste memory, brain and time in order to grant an atomic write which is
neither guaranteed by the standard nor expected by any sane programmer,
just in case some idiot tries to write to one file from multiple
processes.

Warning: Programs expecting this behaviour are buggy and non-portable.


OK, I laughed out loud at this.  But I think you're missing my point,
which is that there's a time to be hard-core about code quality and
there's a time to be hard-core about _product_ quality.  Face it, all
products containing software more or less suck.  This is because most
programmers write crap code most of the time.  The only way to cope
with this, outside the confines of the European defense industry and
other niches insulated from economic reality, is to make the
production environment gentler on _application_ code than the
development environment is.  Hence CONFIG_WOMBAT.  (I like that name.
I'm going to use it in my patch, with your permission.  :-)

Writing to a file from multiple processes is not usually the problem.
Writing to a common "struct file" from multiple threads is.  99.999%
of the time it will work, because you're only writing as far as VFS
cache and then bumping f_pos, and your threads are probably on the
same processor anyway.  0.001% of the time the second thread will see
a stale f_pos and clobber the first write.  This is true even on file
types that can never return a short write.  If you remember to open
with O_APPEND so the pos argument to vfs_write is silently ignored, or
if the implementation underlying vfs_write effectively ignores the pos
argument irrespective of flags, you're OK.  If the pos argument isn't
ignored, or if you ever look at the result of a relative seek on any
fd that maps to that struct file, you're screwed.
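
The effect of O_APPEND on the position argument can be shown with a
deterministic stand-in. The sketch below does not reproduce the thread
race on a shared struct file; it opens the same path twice, so each
descriptor has its own file position, which is enough to see a stale
position clobber earlier data. two_writers_size is a name made up for
the example.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Open `path` twice, write 4 bytes through each descriptor, and return
 * the resulting file size (or -1 on error).  `extra` is OR-ed into the
 * open flags, so the caller can pass 0 or O_APPEND.
 */
long two_writers_size(const char *path, int extra)
{
    struct stat st;
    int a = open(path, O_WRONLY | O_CREAT | O_TRUNC | extra, 0600);
    int b = open(path, O_WRONLY | extra);
    long size = -1;

    if (a >= 0 && b >= 0 &&
        write(a, "AAAA", 4) == 4 &&
        write(b, "BBBB", 4) == 4 &&
        fstat(a, &st) == 0)
        size = (long)st.st_size;

    if (a >= 0)
        close(a);
    if (b >= 0)
        close(b);
    return size;
}
```

Without O_APPEND the second write starts at offset 0 and overwrites the
first, leaving 4 bytes; with O_APPEND both writes land at end of file
and the file holds 8 bytes.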

(Note to the alert reader:  yes, this means shell scripts should
always use >> rather than > when routing stdout and/or stderr to a
file.  You're just as vulnerable to interleaving due to stdio
buffering issues as you are when stdout and stderr are sent to the tty,
and short writes may still be a problem if you are so foolish as to
use a filesystem that generates them on anything short of a
catastrophic error, but at least you get O_APPEND and sane behavior on
ftruncate().)


> Frankly, I think that unless application programmers poke at some sort
> of magic "I promise to handle short writes correctly" bit, write()
> should always return either the full number of bytes requested or an
> error code.

If you assume that you won't have short writes, your programs may fail on
e.g. Solaris. There may be reasons for Linux to use the same semantics at
some time in the future, you never know.


So what?  My products are shipping _now_.  Future kernels are
guaranteed to break them anyway because sysfs is a moving target.
Solaris is so not in the game for my kind of embedded work, it's not
even funny.  If POSIX mandates stupid shit, and application
programmers don't read that part of the manual anyway (and don't code
on that assumption in practice), to hell with POSIX.  On many file
descriptors, short writes simply can't happen -- and code that
purports to handle short writes but has never been exercised is
arguably worse than code that simply bombs on short write.  So if I
can't shim in an induce-short-writes-randomly-on-purpose mechanism
during development, I don't want short writes in production, period.

In my world, GNU/Linux is not a crappy imitation Solaris that you get
to pay out the wazoo for to Red Hat (and get no documentation and
lousy tech support that doesn't even cover your hardware).  It's a
full-source-code platform on which you can engineer robust industrial
and consumer products, because you can control the freeze and release
schedule component-by-component, and you can point fix anything in the
system at any time.  If, that is, you understand that the source code
is not the software, and that you can't retrofit stability and
security overnight onto code that was written with no thought of
anything but performance.


If you assume you *may* have short writes, you have no problem.


Sure -- until the one code path in a hundred that handles the "short
write" case incorrectly gets traversed in 

Re: [PATCH] usb-serial regression fix

2007-03-12 Thread Jim Radford
On Mon, Mar 12, 2007 at 05:18:19PM -0700, Greg KH wrote:
> On Mon, Mar 12, 2007 at 03:59:22PM -0700, Jim Radford wrote:
> > On Mon, Mar 12, 2007 at 03:42:35PM -0700, Jim Radford wrote:
> > > On Mon, Mar 12, 2007 at 01:33:31PM -0700, Greg KH wrote:
> > > > On Mon, Mar 12, 2007 at 04:22:22PM -0400, Mark Lord wrote:
> > > > > Oliver Neukum wrote:
> > > > > > >Mark Lord wrote:
> > > > > > > >Okay, from that part (above), the problem is obvious:
> > > > > > > >in that the "MCT U232 converter now disconnected"
> > > > > > > >appears, and then we continue to try and call the
> > > > > > > >driver's method.. Oops!

> > > > > >IMHO shutdown() is using serial->port[] and bombs.
> > > > > >Could you reverse the order here?

> > Do not NULL serial->port[i] since it is used in ->shutdown().
> > This wasn't an issue until the order of ->shutdown() and
> > device_unregister was corrected.

> > for (i = 0; i < serial->num_ports; ++i)
> > if (serial->port[i]->dev.parent != NULL) {
> > device_unregister(&serial->port[i]->dev);
> > -   serial->port[i] = NULL;
> > }

> But shouldn't you null it out somewhere?  It will be an "empty"
> pointer at some point in time...

Not as far as I can see. The serial structure that ->port[i] is in
gets kfree()ed soon after, in the same function, and nothing in
between, other than ->shutdown(), uses ->port[].  I assume it was
someone being overly cautious.

-Jim


Re: /sys/devices/system/cpu/cpuX/online are missing

2007-03-12 Thread Andreas Schwab
Giuliano Pochini <[EMAIL PROTECTED]> writes:

> I had a look at arch/powerpc/kernel/smp.c but I'm not familiar at all with 
> those parts of the kernel.

See arch/powerpc/kernel/sysfs.c:topology_init.  I don't think there is
anything to do here.  You probably don't have CONFIG_HOTPLUG_CPU enabled.

Andreas.

-- 
Andreas Schwab, SuSE Labs, [EMAIL PROTECTED]
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


Re: [QUICKLIST 0/6] Arch independent quicklists V1

2007-03-12 Thread Paul Mackerras
David Miller writes:

> I ported this to sparc64 as per the patch below, tested on
> UP SunBlade1500 and 24 cpu Niagara T1000.

Did you see any performance improvement?  We used to have quicklists
on ppc, but I remain to be convinced that they actually help.

Also, I didn't understand why we have to do quicklists to take
advantage of the fact that the pages are in a pristine state when they
are freed.  I thought the whole point of the slab allocator was to be
able to take advantage of that...

Paul.



Re: [PATCH] usb-serial regression fix

2007-03-12 Thread Greg KH
On Mon, Mar 12, 2007 at 03:59:22PM -0700, Jim Radford wrote:
> On Mon, Mar 12, 2007 at 03:42:35PM -0700, Jim Radford wrote:
> > On Mon, Mar 12, 2007 at 01:33:31PM -0700, Greg KH wrote:
> > > On Mon, Mar 12, 2007 at 04:22:22PM -0400, Mark Lord wrote:
> > > > Oliver Neukum wrote:
> > > > >>Mark Lord wrote:
> > > > >>>Okay, from that part (above), the problem is obvious:
> > > > >>>in that the "MCT U232 converter now disconnected" appears,
> > > > >>>and then we continue to try and call the driver's method.. Oops!
> > 
> > > > >IMHO shutdown() is using serial->port[] and bombs.
> > > > >Could you reverse the order here?
> > 
> > > > Yup.  Fixed.  Tested.  Works.
> > 
> > > > This patch fixes the Oops that otherwise occurs whenever
> > > > a USB serial adapter is unplugged from a system, as well
> > > > the Oops seen when one is in use before resume (to RAM).
> > 
> > > Argh, no, this change was done to help the ftdi drivers out.
> > 
> > > Look at changeset d9a7ecacac5f8274d2afce09aadcf37bdb42b93a in Linus's
> > > tree from Jim Radford:
> > >   
> > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=d9a7ecacac5f8274d2afce09aadcf37bdb42b93a
> > > 
> > > It makes this change because the usb-serial drivers need the port
> > > devices when the port_remove() callbacks happen.  Otherwise you get an
> > > oops that way.
> > 
> > > Jim, can you take a look at this and see if you can figure something
> > > out?
> > 
> > The problem is really the
> > 
> >serial->port[i] = NULL;
> > 
> > line after device_unregister() which is used to flag "fake" devices
> > that don't need legacy cleanup later in destroy_serial().  That
> > flagging should be done using a *real* flag, and not by overloading
> > the ->port[i] pointer since we require it to be non-NULL in
> > ->shutdown() in all drivers that are not converted to new
> > ->port_probe()/->port_remove() framework (currently all except ftdi).
> 
> > I'll work on a patch to do that, but for now, I think you should apply
> > Mark's patch to revert the order change since the FTDI driver no
> > longer requires the correct ordering of device_unregister() and
> > ->shutdown().
> 
> Do not NULL serial->port[i] since it is used in ->shutdown().  This
> wasn't an issue until the order of ->shutdown() and device_unregister
> was corrected.
> 
> Signed-Off: Jim Radford <[EMAIL PROTECTED]>
> 
> diff --git a/drivers/usb/serial/usb-serial.c b/drivers/usb/serial/usb-serial.c
> index 8511352..871c9a8 100644
> --- a/drivers/usb/serial/usb-serial.c
> +++ b/drivers/usb/serial/usb-serial.c
> @@ -145,7 +145,6 @@ static void destroy_serial(struct kref *kref)
> for (i = 0; i < serial->num_ports; ++i)
> if (serial->port[i]->dev.parent != NULL) {
> device_unregister(&serial->port[i]->dev);
> -   serial->port[i] = NULL;
> }

But shouldn't you null it out somewhere?  It will be an "empty" pointer
at some point in time...

Mark, does this solve your oops (after you revert your previous patch)?

thanks,

greg k-h


Re: [ck] Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2

2007-03-12 Thread Thibaut VARENE

On 3/12/07, michael chang <[EMAIL PROTECTED]> wrote:


Considering the concepts put out by projects such as BOINC and
[EMAIL PROTECTED], I wouldn't be thoroughly surprised by this ideology,
although I do question the particular way this test case is being run.


If Con actually implements SCHED_IDLEPRIO in RSDL, life is good even
in that case.


This seems to me like he's saying that there has to be a mechanism
(outside of nice) that can be used to treat processes that "I" want to
be interactive all special-like. It feels like something that would
have been said in the design of what the scheduler was in -ck and is
currently in vanilla.


Exactly. Driving us again toward the fact that different workloads
might benefit from different schedulers (eg: RSDL is cool for server
loads, previous staircase did an excellent job on desktop, etc) and
thus that having a choice of schedulers might be something that would
satisfy (some) people...


To me, that fundamentally clashes with the design behind RSDL. That
said, I could be wrong -- Con appears to have something that could be
very promising up his sleeve that could come out sooner or later. Once
he's written it, of course. In any case, RSDL seems very promising,
for the most part.


It certainly is. "Negative" feedback can be a good thing too, as it
helps improve it anyway. It's nonetheless true that it's practically
impossible to satisfy 100% of use cases with a single design, so
choices will have to be made.

HTH

T-Bone

--
Thibaut VARENE
http://www.parisc-linux.org/~varenet/


Re: [PATCH] Delete superfluous source file "net/wanrouter/af_wanpipe.c".

2007-03-12 Thread David Miller
From: "Robert P. J. Day" <[EMAIL PROTECTED]>
Date: Sat, 10 Mar 2007 03:49:52 -0500 (EST)

> 
>   Delete the apparently superfluous source file
> net/wanrouter/af_wanpipe.c.
> 
> Signed-off-by: Robert P. J. Day <[EMAIL PROTECTED]>

Applied, thanks Robert.

This thing isn't even built in 2.4.x :-)  Although there is
some ancient reference to the build module in
Documentation/networking/wan-router.txt, a heavily out of date
document.


Re: [PATCH 2/2] xfs: stop using kmalloc in xfs_buf_get_noaddr

2007-03-12 Thread Timothy Shimmin

Hi,

--On 9 March 2007 12:55:11 PM +0100 Christoph Hellwig <[EMAIL PROTECTED]> wrote:


Ed Cashin found a bug in the error handling code for the case where
a page allocation fails.  Here's the updated version:

Index: linux-2.6/fs/xfs/linux-2.6/xfs_buf.c
===
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_buf.c   2007-03-08 19:08:38.0 +0100
+++ linux-2.6/fs/xfs/linux-2.6/xfs_buf.c   2007-03-09 08:59:15.0 +0100



+	for (i = 0; i < page_count; i++) {
+		bp->b_pages[i] = alloc_page(GFP_KERNEL);
+		if (!bp->b_pages[i])
+			goto fail_free_mem;
+	}
+	bp->b_flags |= _XBF_PAGES;
+
+	error = _xfs_buf_map_pages(bp, XBF_MAPPED);
+	if (unlikely(error)) {
+		printk(KERN_WARNING "%s: failed to map pages\n",
+				__FUNCTION__);
 		goto fail_free_mem;
-	bp->b_flags |= _XBF_KMEM_ALLOC;
+	}
 
 	xfs_buf_unlock(bp);
 
 	XB_TRACE(bp, "no_daddr", data);
 	return bp;
+
  fail_free_mem:
-	kmem_free(data, malloc_len);
+	for ( ; i >= 0; i--)
+		__free_page(bp->b_pages[i]);
  fail_free_buf:
 	xfs_buf_free(bp);
  fail:


It looks like you might need: for (i--; i >= 0; i--)
(or: for (j = 0; j < i; j++) etc.)

Because if the initial alloc_page loop goes to completion then:
 i == page_count
and if alloc_page loop terminates early then
 bp->b_pages[i] == NULL
So we have gone 1 too far in both cases and need to
start free'ing back one.
Unless I missed something.
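
The unwinding pattern suggested above can be sketched in user space as
follows. This is only an illustration under stated assumptions, not XFS
code: fake_alloc_page(), fake_free_page(), alloc_pages_or_unwind() and the
fail_at / map_fails parameters are hypothetical stand-ins for alloc_page(),
__free_page() and the two failure paths in the patch.

```c
#include <assert.h>
#include <stdlib.h>

static int live_pages;	/* pages currently allocated, for checking */

/* Hypothetical stand-in for alloc_page(): fails when asked to. */
static void *fake_alloc_page(int should_fail)
{
	void *p;

	if (should_fail)
		return NULL;
	p = malloc(1);
	if (p)
		live_pages++;
	return p;
}

/* Hypothetical stand-in for __free_page(): must never be handed NULL. */
static void fake_free_page(void *page)
{
	assert(page != NULL);
	free(page);
	live_pages--;
}

/*
 * Allocate page_count pages.  The allocation with index fail_at fails
 * (pass -1 for none); map_fails simulates a mapping error after all
 * pages were allocated.  Returns 0 on success, -1 after unwinding.
 */
static int alloc_pages_or_unwind(void **pages, int page_count,
				 int fail_at, int map_fails)
{
	int i;

	for (i = 0; i < page_count; i++) {
		pages[i] = fake_alloc_page(i == fail_at);
		if (!pages[i])
			goto fail_free_mem;
	}
	if (map_fails)
		goto fail_free_mem;	/* here i == page_count */
	return 0;

fail_free_mem:
	/* In both cases index i is one past the last valid page. */
	for (i--; i >= 0; i--)
		fake_free_page(pages[i]);
	return -1;
}
```

Starting the cleanup loop with i-- covers both the early-exit case
(pages[i] is NULL) and the completed-loop case (i == page_count) without
touching an invalid slot.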

--Tim





Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2

2007-03-12 Thread David Lang

On Mon, 12 Mar 2007, Mike Galbraith wrote:


On Tue, 2007-03-13 at 07:38 +1100, Con Kolivas wrote:

On Tuesday 13 March 2007 07:11, Mike Galbraith wrote:


Killing the known corner case starvation scenarios is wonderful, but
let's not just pretend that interactive tasks don't have any special
requirements.


Now you're really making a stretch of things. Where on earth did I say that
interactive tasks don't have special requirements? It's a fundamental feature
of this scheduler that I go to great pains to get them as low latency as
possible and their fair share of cpu despite having a completely fair cpu
distribution.


As soon as your cpu is fully utilized, fairness loses or interactivity
loses.  Pick one.


correct.

the problem is that it's hard (if not impossible) to properly identify what is 
needed to make a system have good interactivity. in some cases it's a matter of 
low latency (wake up a process as quickly as you can when whatever it was 
waiting on is available), but in others it's a matter of allocating the _right_ 
process enough CPU (X needs enough CPU to do things)


where it's a matter of needing low-latency, it's possible to design a scheduler 
that will do things in a predictable enough way that you know the max latency 
you have to deal with (and the RSDL seems to do this)
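
As a rough illustration of what a predictable maximum latency means, here
is a minimal sketch for plain round-robin among equal-priority tasks. It is
a simplified model, not the RSDL algorithm; rr_max_wait_ms() and its
parameters are hypothetical names chosen for this example.

```c
#include <assert.h>

/*
 * Worst-case wait, in milliseconds, before a runnable task gets the CPU
 * under plain round-robin: every other runnable task may run one full
 * timeslice first.  Simplified model, NOT the RSDL algorithm.
 */
static unsigned int rr_max_wait_ms(unsigned int nr_runnable,
				   unsigned int timeslice_ms)
{
	assert(nr_runnable >= 1);
	return (nr_runnable - 1) * timeslice_ms;
}
```

The bound depends only on the number of runnable tasks and the timeslice
length, which is what makes it something you can plan around.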


the problem comes when this isn't enough. if you have several CPU hogs on a 
system, and they are all around the same priority level, how can the scheduler 
know which one needs the CPU the most for good interactivity?


in some cases you may be able to directly detect that your high-priority process 
is waiting for another one (tracing pipes and local sockets for example), but 
what if you are waiting for several of them? (think a multimedia desktop waiting 
for the sound card, CDRom, hard drive, and video all at once) which one needs 
the extra CPU the most?


Fairness is much easier to enforce (and much easier to understand)

the RSDL is concentrating on enforcing fairness, with bounded (and predictable) 
latencies.


if you are willing to tell the system what you consider more important (and how 
much more important you consider it), then it's much easier to figure out who to 
give the CPU to. Con is just asking you to do this (and you already do, by doing 
a nice -5. but it sounds like you want that to mean more than it currently does)


David Lang




Re: [git patches] net driver fixes

2007-03-12 Thread David Miller
From: Geert Uytterhoeven <[EMAIL PROTECTED]>
Date: Mon, 12 Mar 2007 11:02:43 +0100 (CET)

> On Tue, 6 Mar 2007, Jeff Garzik wrote:
> > Jay Vosburgh (3):
> >   bonding: Improve IGMP join processing
> 
> ip_mc_rejoin_group: Kill warning about unused variable `in_dev' when
> CONFIG_IP_MULTICAST is not set.
> 
> Signed-off-by: Geert Uytterhoeven <[EMAIL PROTECTED]>

Applied, thanks Geert.


Re: CONFIG_REORDER Kconfig help strange sentence.

2007-03-12 Thread Andi Kleen
On Tue, Mar 13, 2007 at 10:18:03AM +1100, Rusty Russell wrote:
> OK, this confused me:
> 
> Function reordering (REORDER) [N/y/?] (NEW) ?
> 
> This option enables the toolchain to reorder functions for a more 
> optimal TLB usage. If you have pretty much any version of binutils, 
> this can increase your kernel build time by roughly one minute.
> 
> "If you have pretty much any version of binutils"?  Huh?
> 
> You mean "This will slow your kernel build by about a minute"?

Yes. Lots of sections seem to trigger some quadratic behaviour in ld.

It might be fixed in some unreleased CVS version though (not 100% sure) 

-Andi


Re: [RFC][PATCH 4/7] RSS accounting hooks over the code

2007-03-12 Thread Herbert Poetzl
On Mon, Mar 12, 2007 at 09:50:08AM -0700, Dave Hansen wrote:
> On Mon, 2007-03-12 at 19:23 +0300, Kirill Korotaev wrote:
> > 
> > For these you essentially need per-container page->_mapcount counter,
> > otherwise you can't detect whether rss group still has the page 
> > in question being mapped in its processes' address spaces or not. 

> What do you mean by this?  You can always tell whether a process has a
> particular page mapped.  Could you explain the issue a bit more.  I'm
> not sure I get it.

OpenVZ wants to account _shared_ pages in a guest
differently from separate pages, so that the RSS
accounted values reflect the actual used RAM instead
of the sum of all processes' RSS pages, which for
sure is more relevant to the administrator, but IMHO
not so terribly important to justify memory consuming
structures and sacrifice performance to get it right.
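
To make the difference concrete, here is a small user-space sketch that
contrasts the two numbers. It is not OpenVZ or kernel code: struct toy_proc,
rss_sum(), rss_unique() and MAX_PFN are hypothetical names, and each process
is assumed to list a given page frame at most once.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PFN 64	/* hypothetical size of the toy "physical memory" */

struct toy_proc {
	const unsigned int *pfns;	/* page frames mapped by this process */
	size_t nr;			/* number of entries in pfns */
};

/* Sum of per-process RSS: a page shared by k processes counts k times. */
static size_t rss_sum(const struct toy_proc *procs, size_t nr_procs)
{
	size_t total = 0, p;

	for (p = 0; p < nr_procs; p++)
		total += procs[p].nr;
	return total;
}

/* RAM actually in use: every distinct page frame counts once. */
static size_t rss_unique(const struct toy_proc *procs, size_t nr_procs)
{
	unsigned char seen[MAX_PFN];
	size_t unique = 0, p, i;

	memset(seen, 0, sizeof(seen));
	for (p = 0; p < nr_procs; p++) {
		for (i = 0; i < procs[p].nr; i++) {
			unsigned int pfn = procs[p].pfns[i];

			assert(pfn < MAX_PFN);
			if (!seen[pfn]) {
				seen[pfn] = 1;
				unique++;
			}
		}
	}
	return unique;
}
```

The seen[] array is the toy equivalent of the extra per-group tracking
state: getting the second number right needs memory and work proportional
to the pages involved, which is the cost being weighed here.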

YMMV, but maybe we can find a smart solution to the
issue too :)

best,
Herbert

> -- Dave
> 
> ___
> Containers mailing list
> [EMAIL PROTECTED]
> https://lists.osdl.org/mailman/listinfo/containers


Re: [PATCH 1/1] mm: Inconsistent use of node IDs

2007-03-12 Thread Ethan Solomita

Andi Kleen wrote:

On Monday 12 March 2007 23:51, Ethan Solomita wrote:

This patch corrects inconsistent use of node numbers (variously "nid" or
"node") in the presence of fake NUMA.


I think it's very consistent -- your patch would make it inconsistent though.


	It's consistent to call node_online() with a physical node ID when the 
online node mask is composed of fake nodes?



Sorry, but when you ask for NUMA emulation you will get it. I don't see
any point in a "half way only for some subsystems I like" NUMA emulation. 
It's unlikely that your ideas of where it is useful and where it is not
match other NUMA emulation users' ideas.


	I don't understand your comments. My code is intended to work for all 
systems. If the system is non-NUMA by nature, then all CPUs map to fake 
node 0.


	As an example, on a two-chip dual-core AMD Opteron system, there are 4 
"cpus" where CPUs 0 and 1 are close to the first half of memory, and 
CPUs 2 and 3 are close to the second half. Without this change CPUs 2 
and 3 are mapped to fake node 1. This results in awful performance. With 
this change, CPUs 2 and 3 are mapped to (roughly) 1/2 the fake node 
count. Their zonelists[] are ordered to do allocations preferentially 
from zones that are local to CPUs 2 and 3.
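
	The idea can be sketched as below. This is only an illustration and 
not the code in the patch: phys_to_first_fake_node() is a hypothetical 
helper, and it assumes the fake nodes are split evenly across the 
physical nodes, whereas the real mapping follows the actual memory layout.

```c
#include <assert.h>

/*
 * Map a physical node to the first fake node carved out of its memory,
 * assuming fake nodes are divided evenly among physical nodes.
 * Hypothetical helper for illustration only.
 */
static unsigned int phys_to_first_fake_node(unsigned int phys_node,
					     unsigned int nr_phys_nodes,
					     unsigned int nr_fake_nodes)
{
	assert(nr_phys_nodes > 0 && phys_node < nr_phys_nodes);
	return phys_node * nr_fake_nodes / nr_phys_nodes;
}
```

	With 2 physical nodes and 8 fake nodes, physical node 1 (CPUs 2 and 3 
in the example above) lands on fake node 4 rather than fake node 1.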


Can you tell me the scenario where my code makes things worse?

Besides, adding such a secondary node space would likely be a huge long-term
maintenance issue. I can just see it breaking with every non-trivial change.


	I'm adding no data structures to do this. The current code already has 
get_phys_node. My changes use the existing information about node 
layout, both the physical and fake, and defines a mapping. The current 
mapping just takes a physical node and says "it's the fake node too".



NACK.


	I wish you would include some specifics as to why you think what you 
do. You're suggesting we leave in place a system that destroys NUMA 
locality when using fake numa, and passes around physical node ids as an 
index into nodes[] which is indexed by fake nodes. My change has no 
effect without fake numa, and harms no one with fake numa.

-- Ethan


Re: irda rmmod lockdep trace.

2007-03-12 Thread David Miller
From: Samuel Ortiz <[EMAIL PROTECTED]>
Date: Mon, 12 Mar 2007 02:38:43 +0200

> On Sat, Mar 10, 2007 at 07:43:26PM +0200, Samuel Ortiz wrote:
> > Hi Dave,
> > 
> > On Thu, Mar 08, 2007 at 05:54:36PM -0500, Dave Jones wrote:
> > > modprobe irda ; rmmod irda in 2.6.21rc3 gets me the spew below..
> > Well it seems that we call __irias_delete_object() from hashbin_delete(). 
> > Then
> > __irias_delete_object() calls itself hashbin_delete() again. We're trying to
> > get the lock recursively.
> Looking at the code more carefully, this seems to be a false positive:
> iriap_cleanup and __irias_delete_object are taking 2 different locks from
> 2 different hashbin instances. The locks belong to the same lock class but
> they are hierarchically different. We need to tell the validator about it and
> the following patch does that. Comments are welcomed as I'm planning to push
> it to netdev soon:

I would strongly caution against adding any run-time overhead just to
cure a false lockdep warning.  Even adding a new function argument
is too much IMHO.

Make the cost show up for lockdep only, perhaps by putting each
hashbin lock into a separate locking class?



Re: Asus P5B-VM motherboard: cd drive malfunctions if internal nic in use.

2007-03-12 Thread Phil Kaslo
It is a pata cd drive, attached to the JMicron controller.

I'll look into whether the usb ports power off on shutdown.

Thanks,

Phil

