Re: pluggable scheduler thread (was Re: Volanomark slows by 80% under CFS)

2007-07-27 Thread Chris Snook

Andrea Arcangeli wrote:
> On Fri, Jul 27, 2007 at 11:43:23PM -0400, Chris Snook wrote:
>> I'm pretty sure the point of posting a patch that triples CFS performance
>> on a certain benchmark and arguably improves the semantics of sched_yield
>> was to improve CFS.  You have a point, but it is a point for a different
>> thread.  I have taken the liberty of starting this thread for you.
>
> I've no real interest in starting or participating in flamewars
> (especially the ones not backed by hard numbers). So I adjusted the
> subject a bit in the hope the discussion will not degenerate as you
> predicted, hope you don't mind.

Not at all.  I clearly misread your tone.

> I'm pretty sure the point of posting that email was to show the
> remaining performance regression with the sched_yield fix applied
> too. Given you considered my post both offtopic and inflammatory, I
> guess you think it's possible and reasonably easy to fix that
> remaining regression without a pluggable scheduler, right? So please
> enlighten us on how you intend to achieve it.


There are four possibilities that are immediately obvious to me:

a) The remaining difference is due mostly to the algorithmic complexity 
of the rbtree algorithm in CFS.


If this is the case, we should be able to vary the test parameters (CPU
count, thread count, etc.), graph the results, and see a roughly
logarithmic divergence between the schedulers as some parameter(s) vary.
If this is the problem, we may be able to fix it with data structure
tweaks or optimized base cases, like how quicksort can be optimized by
using insertion sort below a certain threshold.
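The quicksort analogy refers to the common hybrid in which small partitions are handed to insertion sort, whose low constant factor wins on tiny inputs. A minimal C sketch of that technique follows; the cutoff of 16 is an arbitrary assumption for illustration, and nothing here is CFS or rbtree code.

```c
#include <stddef.h>

/* Assumed cutoff; real implementations tune this empirically. */
#define INSERTION_SORT_THRESHOLD 16

/* Insertion sort on a[lo..hi] inclusive: cheap for small ranges. */
static void insertion_sort(int *a, long lo, long hi)
{
	for (long i = lo + 1; i <= hi; i++) {
		int key = a[i];
		long j = i - 1;

		while (j >= lo && a[j] > key) {
			a[j + 1] = a[j];
			j--;
		}
		a[j + 1] = key;
	}
}

/* Quicksort that hands small partitions to insertion sort. */
static void hybrid_quicksort(int *a, long lo, long hi)
{
	while (hi - lo + 1 > INSERTION_SORT_THRESHOLD) {
		/* Lomuto partition, using the middle element as pivot. */
		long mid = lo + (hi - lo) / 2;
		int tmp = a[mid]; a[mid] = a[hi]; a[hi] = tmp;
		int pivot = a[hi];
		long store = lo;

		for (long i = lo; i < hi; i++) {
			if (a[i] < pivot) {
				tmp = a[i]; a[i] = a[store]; a[store] = tmp;
				store++;
			}
		}
		tmp = a[store]; a[store] = a[hi]; a[hi] = tmp;

		/* Recurse on the smaller side, loop on the larger one. */
		if (store - lo < hi - store) {
			hybrid_quicksort(a, lo, store - 1);
			lo = store + 1;
		} else {
			hybrid_quicksort(a, store + 1, hi);
			hi = store - 1;
		}
	}
	/* Base case: the remaining range is small (possibly empty). */
	insertion_sort(a, lo, hi);
}

void sort_ints(int *a, size_t n)
{
	if (n > 1)
		hybrid_quicksort(a, 0, (long)n - 1);
}
```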


b) The remaining difference is due mostly to how the scheduler handles 
volanomark.


vmstat can give us a comparison of context switches between O(1), CFS, 
and CFS+patch.  If the decrease in throughput correlates with an 
increase in context switches, we may be able to induce more O(1)-like 
behavior by charging tasks for context switch overhead.
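For possibility (b), the raw number behind vmstat's cs column is, as far as I know, the cumulative ctxt counter in /proc/stat, so sampling it before and after a run gives a total context-switch count to compare across O(1), CFS and CFS+patch. A small C sketch, assuming the "ctxt N" line format described in proc(5); parse_ctxt_line(), read_ctxt() and count_ctxt_during() are hypothetical helper names.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return N if the line is "ctxt N...", otherwise -1. */
long long parse_ctxt_line(const char *line)
{
	if (strncmp(line, "ctxt ", 5) != 0)
		return -1;
	return strtoll(line + 5, NULL, 10);
}

/* Return the cumulative context-switch counter from /proc/stat, or -1. */
long long read_ctxt(void)
{
	char line[256];
	int at_line_start = 1;
	long long result = -1;
	FILE *f = fopen("/proc/stat", "r");

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		if (at_line_start) {
			result = parse_ctxt_line(line);
			if (result >= 0)
				break;
		}
		/* fgets may hand back a long line (e.g. "intr ...") in pieces. */
		at_line_start = strchr(line, '\n') != NULL;
	}
	fclose(f);
	return result;
}

/* System-wide context switches while run() executed, or -1 on error. */
long long count_ctxt_during(void (*run)(void))
{
	long long before = read_ctxt();

	run();
	long long after = read_ctxt();
	if (before < 0 || after < 0)
		return -1;
	return after - before;
}
```

Dividing the count by the benchmark's runtime gives a rate that can be set against the throughput numbers for each scheduler.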


c) The remaining difference is due mostly to how the scheduler handles 
something other than volanomark.


If context switch count is not the problem, context switch pattern still 
could be.  I doubt we'd see a 40% difference due to cache misses, but 
it's possible.  Fortunately, oprofile can sample based on cache misses, 
so we can debug this too.


d) The remaining difference is due mostly to some implementation detail 
in CFS.


It's possible there's some constant-factor overhead in CFS that is 
magnified heavily by the context switching volanomark deliberately 
induces.  If this is the case, oprofile sampling on clock cycles should 
catch it.


Tim --

	Since you're already set up to do this benchmarking, would you mind 
varying the parameters a bit and collecting vmstat data?  If you want to 
run oprofile too, that wouldn't hurt.



> Also consider that the other numbers likely used NPTL, so they shouldn't be
> affected by the sched_yield changes.
>
>> Sure there is.  We can run a fully-functional POSIX OS without using any
>> block devices at all.  We cannot run a fully-functional POSIX OS without a
>> scheduler. Any feature without which the OS cannot execute userspace code
>> is sufficiently primitive that somewhere there is a device on which it will
>> be impossible to debug if that feature fails to initialize.  It is quite
>> reasonable to insist on only having one implementation of such features in
>> any given kernel build.


> Sounds like a red herring to me... There aren't just pluggable I/O
> schedulers in the kernel, there are pluggable packet schedulers too
> (see `tc qdisc`). And both are switchable at runtime (not just at boot
> time).
>
> Can you run your fully-functional POSIX OS without a packet scheduler
> and without an I/O scheduler? I wonder where you are going to
> read/write data without HD and network?
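The runtime switching described here for qdiscs and I/O schedulers boils down to an operations table selected through a pointer. A hedged userspace sketch of that pattern follows; struct sched_policy, the two toy policies and set_policy() are all hypothetical, and a real kernel would need locking or RCU around the switch, which this ignores.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical policy interface: pick the next task id from a run queue. */
struct sched_policy {
	const char *name;
	int (*pick_next)(const int *runnable, size_t n);
};

/* Toy policy: always pick the first runnable task. */
static int pick_first(const int *runnable, size_t n)
{
	return n ? runnable[0] : -1;
}

/* Toy policy: always pick the last runnable task. */
static int pick_last(const int *runnable, size_t n)
{
	return n ? runnable[n - 1] : -1;
}

static const struct sched_policy policies[] = {
	{ "fifo", pick_first },
	{ "lifo", pick_last },
};

/* The active policy; switching is a single pointer assignment. */
static const struct sched_policy *current_policy = &policies[0];

/* Select a policy by name at runtime. Returns 0 on success, -1 if unknown. */
int set_policy(const char *name)
{
	for (size_t i = 0; i < sizeof(policies) / sizeof(policies[0]); i++) {
		if (strcmp(policies[i].name, name) == 0) {
			current_policy = &policies[i];
			return 0;
		}
	}
	return -1;
}

/* Callers never know which policy is active. */
int pick_next_task(const int *runnable, size_t n)
{
	return current_policy->pick_next(runnable, n);
}
```

An unknown name leaves the current policy in place, which mirrors the usual behaviour of rejecting a bad write to a selection knob.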


If I'm missing both, I'm pretty screwed, but if either one is 
functional, I can send something out.



> Also those pluggable things don't increase the risk of crash much,
> compared to the complexity of the schedulers.
>
>> Whether or not these alternatives belong in the source tree as config-time
>> options is a political question, but preserving boot-time debugging
>> capability is a perfectly reasonable technical motivation.


> The scheduler is invoked very late in the boot process (printk, the
> serial console and kdb have been working for ages by the time the
> scheduler kicks in), so it's fully debuggable (no debugger depends on
> the scheduler, they run inside the nmi handler...), I don't really see
> your point.


I'm more concerned about embedded systems.  These are the same people 
who want userspace character drivers to control their custom hardware. 
Having the robot point to where it hurts is a lot more convenient than 
hooking up a JTAG debugger.



> And even if there were a subtle bug in the scheduler, you'd never
> trigger it at boot with so few tasks and so few context switches.


Sure, but it's the non-subtle bugs that worry me.  These are usually 
related to low-level hardware setup, so they could miss the mainst

Re: request for patches: showing mount options

2007-07-27 Thread Ian Kent
On Fri, 2007-07-27 at 17:40 +0200, Miklos Szeredi wrote:
> >   all - fs has options, but doesn't define ->show_options()
> >   some - fs defines ->show_options(), but some options are not shown
> >   noopt - fs does not have options
> >   good - fs shows all options
> >   patch - I have a patch
> 
> [...]
> 
> > > autofs  all
> > 
> > I'm not sure I understand this.
> > How does autofs show its options without a ->show_options method?
> 
> It doesn't.  The "all" means, all of them need to be added to
> ->show_options(), not that all are shown.

Oh .. sorry, I wasn't paying enough attention.

But now might be a good time to propose the removal of autofs and rename
autofs4 to autofs. I would need to provide some way to map autofs4
module load requests to autofs for backward compatibility but haven't
thought about that yet.

> 
> I can see now that this is slightly confusing, sorry.
> 
> So the ones that need attention are "all" and "some".  The others are
> fine in theory.  Of course I may have missed something.
> 
> > > autofs4 some
> > 
> > OK, uid and gid aren't shown.
> > That should be straightforward to fix.
> > What's your time frame for this?
> 
> ASAP ;)
> 
> 2.6.24 would be nice, but it won't be easy...

The autofs4 (and, if needed, autofs) changes should be straightforward.
I'll do these.

Ian


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: mm/sparse.c:482: error: implicit declaration of function ‘sparse_early_usemap_alloc’

2007-07-27 Thread Andrew Morton
On Fri, 27 Jul 2007 23:14:04 -0700 "Miles Lane" <[EMAIL PROTECTED]> wrote:

> On 7/27/07, Miles Lane <[EMAIL PROTECTED]> wrote:
> > Do you need my .config file?
> >
> >   CC  mm/sparse.o
> > mm/sparse.c: In function 'sparse_init':
> > mm/sparse.c:482: error: implicit declaration of function
> > 'sparse_early_usemap_alloc'
> > mm/sparse.c:482: warning: assignment makes pointer from integer without a 
> > cast
> > make[1]: *** [mm/sparse.o] Error 1
> >
> 
> #
> # Automatically generated make config: don't edit
> # Linux kernel version: 2.6.23-rc1-mm1
> # Fri Jul 27 22:54:36 2007

Whatever it was has gone away in the current -mm lineup, so I
guess one of the post-2.6.23-rc1-mm1 patches I merged fixed it.


Re: 2.6.23-rc1-mm1 + hotfixes -- Section mismatches

2007-07-27 Thread Sam Ravnborg
On Fri, Jul 27, 2007 at 10:16:35PM -0700, Miles Lane wrote:
>   MODPOST vmlinux.o
> WARNING: vmlinux.o(.text+0x183): Section mismatch: reference to
> .init.text.1:start_kernel (between 'is386' and 'check_x87')

This one is not fixed - yet.

The rest are fixed in latest -linus.
modpost choked over the added number following the section name.
Like in .init.text.4 below.
  ^^

> WARNING: vmlinux.o(.data+0x53c0): Section mismatch: reference to
> .init.text.4:native_smp_prepare_boot_cpu (between 'smp_ops' and
> 'call_lock')

Sam
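The explanation is that modpost expected exact names like .init.text and tripped over numbered variants such as .init.text.4. A sketch of suffix-tolerant matching, assuming the suffix is always a dot followed by decimal digits; section_matches() is a hypothetical helper, not the code actually used in modpost.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: true if secname is exactly base, or base followed
 * by "." and one or more decimal digits (e.g. ".init.text.4").
 */
bool section_matches(const char *secname, const char *base)
{
	size_t len = strlen(base);
	const char *p;

	if (strncmp(secname, base, len) != 0)
		return false;
	p = secname + len;
	if (*p == '\0')
		return true;		/* exact match */
	if (*p != '.' || !isdigit((unsigned char)p[1]))
		return false;		/* e.g. ".init.textfoo" or ".init.text." */
	for (p++; *p; p++)
		if (!isdigit((unsigned char)*p))
			return false;	/* e.g. ".init.text.4a" */
	return true;
}
```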


Re: mm/sparse.c:482: error: implicit declaration of function ‘sparse_early_usemap_alloc’

2007-07-27 Thread Miles Lane
On 7/27/07, Adrian Bunk <[EMAIL PROTECTED]> wrote:
> On Fri, Jul 27, 2007 at 11:00:54PM -0700, Miles Lane wrote:
> > Do you need my .config file?
>
> Please always send the .config - it makes reproducing an error and
> verifying a fix much easier.
>
> This list has a 400 kB per email limit, and as long as you don't hit
> this limit you have never sent too much information.
>
> >   CC  mm/sparse.o
> > mm/sparse.c: In function 'sparse_init':
> > mm/sparse.c:482: error: implicit declaration of function
> > 'sparse_early_usemap_alloc'
> > mm/sparse.c:482: warning: assignment makes pointer from integer without a 
> > cast
> > make[1]: *** [mm/sparse.o] Error 1
>
> The .config also tells which kernel you are using.
>
> This doesn't seem to be Linus' tree.
> This seems to be 2.6.23-rc1-mm1?

Rats.  I almost always remember to specify the kernel version.
Sorry.  Yes, it's Andrew's latest tree, plus hotfixes.

Miles


Re: [PATCH] ia64: fix a few section mismatch warnings

2007-07-27 Thread Sam Ravnborg
On Fri, Jul 27, 2007 at 03:32:13PM -0700, Luck, Tony wrote:
> - mca_data = alloc_bootmem(sizeof(struct ia64_mca_cpu)
> -  * NR_CPUS + KERNEL_STACK_SIZE);
> + mca_data = mca_bootmem(NR_CPUS + KERNEL_STACK_SIZE);
> 
> Oops.  You moved the multiply by sizeof(struct ia64_mca_cpu) up into
> the mca_bootmem() function to make it very specific to this use. But
> multiply has higher precedence than addition.

Oh crap - good catch.
Shall I resubmit a corrected patch?

Sam
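The precedence problem is easiest to see with numbers. In the sketch below, NR_CPUS, KERNEL_STACK_SIZE and sizeof(struct ia64_mca_cpu) are stand-in values (the real ones depend on the ia64 configuration), and size_patched() is a hypothetical model of an mca_bootmem() that multiplies its argument by the structure size, as described in the review. With these values the original expression asks for 36864 bytes, while the patched call asks for 1024 * (4 + 32768) = 33558528 bytes, roughly 32 MB.

```c
#include <stddef.h>

/* Stand-in values; the real ones depend on the ia64 configuration. */
#define NR_CPUS			4
#define KERNEL_STACK_SIZE	(32 * 1024)
struct ia64_mca_cpu { char area[1024]; };

/* What the original alloc_bootmem() argument computed:
 * (sizeof * NR_CPUS) + KERNEL_STACK_SIZE, since * binds tighter than +. */
size_t size_original(void)
{
	return sizeof(struct ia64_mca_cpu) * NR_CPUS + KERNEL_STACK_SIZE;
}

/* Hypothetical model of the patched helper: the multiply now applies
 * to whatever expression the caller passes, including the addition. */
size_t size_patched(size_t count)
{
	return sizeof(struct ia64_mca_cpu) * count;
}
```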


Re: mm/sparse.c:482: error: implicit declaration of function ‘sparse_early_usemap_alloc’

2007-07-27 Thread Adrian Bunk
On Fri, Jul 27, 2007 at 11:00:54PM -0700, Miles Lane wrote:
> Do you need my .config file?

Please always send the .config - it makes reproducing an error and 
verifying a fix much easier.

This list has a 400 kB per email limit, and as long as you don't hit 
this limit you have never sent too much information.

>   CC  mm/sparse.o
> mm/sparse.c: In function 'sparse_init':
> mm/sparse.c:482: error: implicit declaration of function
> 'sparse_early_usemap_alloc'
> mm/sparse.c:482: warning: assignment makes pointer from integer without a cast
> make[1]: *** [mm/sparse.o] Error 1

The .config also tells which kernel you are using.

This doesn't seem to be Linus' tree.
This seems to be 2.6.23-rc1-mm1?

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed



[PATCH] flush icache before set_pte() take 5. [2/2] sync icache dcache for ia64

2007-07-27 Thread KAMEZAWA Hiroyuki
flush icache for ia64 take 5.
This patch is against 2.6.23-rc1.

Changes V4 -> V5:
  - removed sync_icache_dcache from do_wp_page() page reuse case.

Changes v3 -> v4:
  - avoid implementing flush_(i)cache_pages().
  - added sync_icache_dcache() call.
  - change Documentation/cachetlb.txt

The current ia64 kernel flushes the icache in lazy_mmu_prot_update(), which
runs *after* set_pte(). This is wrong. This patch removes
lazy_mmu_prot_update() and adds sync_icache_dcache(), which is called before
set_pte() where necessary and synchronizes the icache with the dcache (using
the fc.i instruction).

This patch fixes the SIGILL problem seen on NFS/ia64.

About Icache-Dcache inconsistency on ia64:
 - When a cache line is modified, the Icache and Dcache are purged.

 - When the I-cache misses, it fetches only from the lower-layer cache
   (memory). If that lower layer is not up to date, the I-cache will see
   stale data. To avoid this, Icache-Dcache synchronization (fc.i) is
   necessary. (Icache-Dcache synchronization means making the Dcache and
   the lower-layer unified cache (memory) consistent.)

Details:
 - In general, the cache-flushing macros are used for virtually tagged
   caches. IA64 has physically tagged caches but doesn't guarantee
   consistency between the Icache and Dcache, so a new macro,
   sync_icache_dcache(), is added. It is a no-op on other archs.
 - sync_icache_dcache() only does work if the pte is executable.
 - sync_icache_dcache() must be called before set_pte().
 - A page whose Icache and Dcache are known to be consistent is marked
   with PG_arch_1.
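The rules in the Details list (act only on executable ptes, skip pages already marked with PG_arch_1, and do it before set_pte()) can be modelled in a few lines of userspace C. This is a sketch under stated assumptions: fake_page, fake_pte, flush_icache_for_page() and install_pte() are hypothetical stand-ins for struct page, pte_t, the fc.i loop and set_pte(), and the sketch does not model when PG_arch_1 is cleared again.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct page. */
struct fake_page {
	bool arch_1;	/* models PG_arch_1: Icache/Dcache known consistent */
	int flushes;	/* how many times the fc.i loop "ran" on this page */
};

/* Hypothetical stand-in for pte_t. */
struct fake_pte {
	struct fake_page *page;
	bool exec;	/* models pte_exec() */
};

/* Models the architecture flush (fc.i over the page). */
static void flush_icache_for_page(struct fake_page *pg)
{
	pg->flushes++;
}

/* Sync only executable mappings, and skip pages already marked consistent. */
void sync_icache_dcache(struct fake_pte pte)
{
	if (!pte.exec || pte.page->arch_1)
		return;
	flush_icache_for_page(pte.page);
	pte.page->arch_1 = true;
}

/* The ordering the patch insists on: sync first, then publish the pte. */
void install_pte(struct fake_pte *slot, struct fake_pte pte)
{
	sync_icache_dcache(pte);
	*slot = pte;
}
```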

About changes in generic code:
 - do_wp_page(): needs to sync the newly copied page.
   Here, lazy_mmu_prot_update() was already called before set_pte(),
   because someone hit SIGILL with Java and a small fix was applied.

 - do_anonymous_page(): newly installed anon pages don't contain any
   instructions when set_pte() is executed, so icache-dcache
   synchronization is not necessary.

 - __do_fault(): needs to sync the newly installed page.

 - handle_pte_fault(): just changes the access bit, so there is no need
   to sync.

 - remove_migration_pte(): needs to sync the newly installed page.

 - change_pte_range(): needs to sync icache-dcache. When a user writes
   instructions into the page and then changes its protection to
   executable, it should be synced.

 - hugetlb_change_protection(): maybe the cache will already have
   expired, but it is safe to sync the Icache before set_pte().

 - page_mkclean_one(): no need to sync icache-dcache. The page contents
   are not modified and there is no protection change.

Thanks to Zoltan Menyhart for his advice.

Signed-Off-By: KAMEZAWA Hiroyuki <[EMAIL PROTECTED]>

---
 Documentation/cachetlb.txt    |   11 +++
 arch/ia64/mm/init.c           |    6 ++
 include/asm-generic/pgtable.h |    8 
 include/asm-ia64/pgtable.h    |   15 ++-
 mm/hugetlb.c                  |    3 +--
 mm/memory.c                   |    7 ++-
 mm/migrate.c                  |    2 +-
 mm/mprotect.c                 |    2 +-
 mm/rmap.c                     |    1 -
 9 files changed, 28 insertions(+), 27 deletions(-)

Index: linux-2.6.23-rc1.test/include/asm-generic/pgtable.h
===
--- linux-2.6.23-rc1.test.orig/include/asm-generic/pgtable.h
+++ linux-2.6.23-rc1.test/include/asm-generic/pgtable.h
@@ -124,14 +124,14 @@ static inline void ptep_set_wrprotect(st
 #define pgd_offset_gate(mm, addr)  pgd_offset(mm, addr)
 #endif
 
-#ifndef __HAVE_ARCH_LAZY_MMU_PROT_UPDATE
-#define lazy_mmu_prot_update(pte)  do { } while (0)
-#endif
-
 #ifndef __HAVE_ARCH_MOVE_PTE
 #define move_pte(pte, prot, old_addr, new_addr)	(pte)
 #endif
 
+#ifndef __HAVE_ARCH_SYNC_ICACHE_DCACHE
+#define sync_icache_dcache(pte)	do {} while (0)
+#endif
+
 /*
  * A facility to provide lazy MMU batching.  This allows PTE updates and
  * page invalidations to be delayed until a call to leave lazy MMU mode
Index: linux-2.6.23-rc1.test/include/asm-ia64/pgtable.h
===
--- linux-2.6.23-rc1.test.orig/include/asm-ia64/pgtable.h
+++ linux-2.6.23-rc1.test/include/asm-ia64/pgtable.h
@@ -484,11 +484,17 @@ extern struct page *zero_page_memmap_ptr
 #endif
 
 /*
- * IA-64 doesn't have any external MMU info: the page tables contain all the necessary
- * information.  However, we use this routine to take care of any (delayed) i-cache
- * flushing that may be necessary.
+ * IA-64 doesn't guarantee that the Icache is consistent with the Dcache.
+ * To ensure Icache consistency, we have to synchronize them before
+ * setting an executable pte.
  */
-extern void lazy_mmu_prot_update (pte_t pte);
+extern void __sync_icache_dcache(pte_t pte);
+static inline void sync_icache_dcache(pte_t pte) {
+   if 

[PATCH] flush icache before set_pte() take 5. [1/2] cache flush in migration

2007-07-27 Thread KAMEZAWA Hiroyuki
In migration, the new page should be cache-flushed before set_pte()
on archs that have virtually tagged caches.

V4 -> V5:
   * changed flush_icache_page to flush_cache_page.

Signed-Off-By: KAMEZAWA Hiroyuki <[EMAIL PROTECTED]>

---
 mm/migrate.c |    1 +
 1 file changed, 1 insertion(+)

Index: linux-2.6.23-rc1.test/mm/migrate.c
===
--- linux-2.6.23-rc1.test.orig/mm/migrate.c
+++ linux-2.6.23-rc1.test/mm/migrate.c
@@ -172,6 +172,7 @@ static void remove_migration_pte(struct 
 	pte = pte_mkold(mk_pte(new, vma->vm_page_prot));
 	if (is_write_migration_entry(entry))
 		pte = pte_mkwrite(pte);
+	flush_cache_page(vma, addr, pte_pfn(pte));
 	set_pte_at(mm, addr, ptep, pte);
 
 	if (PageAnon(new))



Re: mm/sparse.c:482: error: implicit declaration of function ‘sparse_early_usemap_alloc’

2007-07-27 Thread Randy Dunlap
On Fri, 27 Jul 2007 23:00:54 -0700 Miles Lane wrote:

> Do you need my .config file?

Ideally, yes.  Is this for 2.6.23-rc1-mm1?


>   CC  mm/sparse.o
> mm/sparse.c: In function 'sparse_init':
> mm/sparse.c:482: error: implicit declaration of function
> 'sparse_early_usemap_alloc'
> mm/sparse.c:482: warning: assignment makes pointer from integer without a cast
> make[1]: *** [mm/sparse.o] Error 1


---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***


[PATCH] flush icache before set_pte() take 5.

2007-07-27 Thread KAMEZAWA Hiroyuki
Applied the comments on take 4.
The patches are against 2.6.23-rc1.

Changes:
  - changed flush_icache_page to flush_cache_page() in
    remove_migration_pte().
  - removed sync_icache_dcache() in the page-reuse case of do_wp_page().

Considerations:
  - I can add CONFIG_MONTECITO if necessary. But it will be confusing, I think.

Thanks,
-Kame



mm/sparse.c:482: error: implicit declaration of function ‘sparse_early_usemap_alloc’

2007-07-27 Thread Miles Lane
Do you need my .config file?

  CC  mm/sparse.o
mm/sparse.c: In function 'sparse_init':
mm/sparse.c:482: error: implicit declaration of function
'sparse_early_usemap_alloc'
mm/sparse.c:482: warning: assignment makes pointer from integer without a cast
make[1]: *** [mm/sparse.o] Error 1


[PATCH]: Fix procfs compat_ioctl regression.

2007-07-27 Thread David Miller

Alexey reviewed the patch and is fine with this fix.

Please apply, thanks!

[PROCFS]: Fix ioctl regression.

It is important to only provide the compat_ioctl method
if the downstream de->proc_fops does too, otherwise this
utterly confuses the logic in fs/compat_ioctl.c and we
end up doing the wrong thing.

Signed-off-by: David S. Miller <[EMAIL PROTECTED]>
Acked-by: Alexey Dobriyan <[EMAIL PROTECTED]>

diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 94e2c1a..a5b0dfd 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -386,6 +386,19 @@ static const struct file_operations proc_reg_file_ops = {
 	.release	= proc_reg_release,
 };
 
+#ifdef CONFIG_COMPAT
+static const struct file_operations proc_reg_file_ops_no_compat = {
+	.llseek		= proc_reg_llseek,
+	.read		= proc_reg_read,
+	.write		= proc_reg_write,
+	.poll		= proc_reg_poll,
+	.unlocked_ioctl	= proc_reg_unlocked_ioctl,
+	.mmap		= proc_reg_mmap,
+	.open		= proc_reg_open,
+	.release	= proc_reg_release,
+};
+#endif
+
 struct inode *proc_get_inode(struct super_block *sb, unsigned int ino,
 				struct proc_dir_entry *de)
 {
@@ -413,8 +426,15 @@ struct inode *proc_get_inode(struct super_block *sb, unsigned int ino,
 		if (de->proc_iops)
 			inode->i_op = de->proc_iops;
 		if (de->proc_fops) {
-			if (S_ISREG(inode->i_mode))
-				inode->i_fop = &proc_reg_file_ops;
+			if (S_ISREG(inode->i_mode)) {
+#ifdef CONFIG_COMPAT
+				if (!de->proc_fops->compat_ioctl)
+					inode->i_fop =
+						&proc_reg_file_ops_no_compat;
+				else
+#endif
+					inode->i_fop = &proc_reg_file_ops;
+			}
 			else
 				inode->i_fop = de->proc_fops;
 		}
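The commit message's rule (advertise ->compat_ioctl only when the wrapped de->proc_fops has one) amounts to picking between two otherwise identical operation tables. A simplified userspace model of that choice; struct fops, the proxy functions and choose_fops() are hypothetical, trimmed-down stand-ins for struct file_operations and the proc_reg_* wrappers.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down file_operations with just the ioctl hooks. */
struct fops {
	long (*unlocked_ioctl)(unsigned int cmd);
	long (*compat_ioctl)(unsigned int cmd);
};

/* Stand-ins for the proxying wrappers (bodies are irrelevant here). */
static long proxy_ioctl(unsigned int cmd) { return (long)cmd; }
static long proxy_compat_ioctl(unsigned int cmd) { return (long)cmd; }

static const struct fops proxy_fops = {
	.unlocked_ioctl = proxy_ioctl,
	.compat_ioctl   = proxy_compat_ioctl,
};

/* Same wrappers minus compat_ioctl, like proc_reg_file_ops_no_compat. */
static const struct fops proxy_fops_no_compat = {
	.unlocked_ioctl = proxy_ioctl,
};

/* Only advertise compat_ioctl if the wrapped fops actually has one. */
const struct fops *choose_fops(const struct fops *downstream)
{
	if (!downstream->compat_ioctl)
		return &proxy_fops_no_compat;
	return &proxy_fops;
}
```

Keeping two static tables, rather than patching one at runtime, keeps both const, which matches the approach in the patch.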


Re: [patch] mm: reduce pagetable-freeing latencies

2007-07-27 Thread Hugh Dickins
On Sat, 28 Jul 2007, Benjamin Herrenschmidt wrote:
> 
> As I'm sweeping through arch code etc... preparing the ground for the
> proper mmu_gather surgery, I've been thinking about the way to deal with
> that per-cpu page list and finally came up with the idea that the best
> we can do is along the lines of trying to allocate the list via gfp,
> and if that fails, fall back to a (smaller than now) per-cpu one. I'm
> reworking the interfaces such that the higher level code doesn't have to
> care whether preemption is enabled or disabled at a given point.

That doesn't sound like the best way to me at all.  Using two means
of buffering, one with preemption enabled and the other not, seems
complex and prone to error (perhaps not while you're working on it,
but later on).  We do already have that problem (the i_mmap_lock case
versus the others), but it's not a complication I'd want to extend.

The onstack array seems fine to me, even if you do end up deciding on
an array of one.  Is there any evidence that it's a problem getting a
page for the freeing (other than in circumstances that are already
badly slowed down)?  It's obvious that we need a fallback route,
but optimizing throughput on that route seems premature.

Hugh
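One reading of this preference is a single batching path whose buffer comes from the allocator when possible and from a small on-stack array (possibly of one entry) otherwise, so callers never care which case they are in. The sketch below is a userspace model under that assumption; struct gather and its helpers are hypothetical and are not the mmu_gather API, and preemption is not modelled.

```c
#include <stdlib.h>

/* Hypothetical gather: pages queued for freeing in batches. */
struct gather {
	void **pages;
	size_t nr, max;
	size_t flushes;		/* number of non-empty batch flushes */
};

/* Stand-in for "flush the TLB, then free the queued pages". */
static void gather_flush(struct gather *g)
{
	if (g->nr == 0)
		return;
	for (size_t i = 0; i < g->nr; i++)
		free(g->pages[i]);
	g->nr = 0;
	g->flushes++;
}

static void gather_add(struct gather *g, void *page)
{
	g->pages[g->nr++] = page;
	if (g->nr == g->max)
		gather_flush(g);
}

/*
 * Try a heap buffer of 'want' slots; if that is unavailable, fall back
 * to the caller's on-stack array. Everything else is the same code path;
 * only the batch size differs.
 */
static void gather_init(struct gather *g, void **onstack, size_t onstack_len,
			size_t want)
{
	g->pages = want ? malloc(want * sizeof(*g->pages)) : NULL;
	g->max = want;
	if (!g->pages) {
		g->pages = onstack;
		g->max = onstack_len;
	}
	g->nr = 0;
	g->flushes = 0;
}

static void gather_finish(struct gather *g, void **onstack)
{
	gather_flush(g);
	if (g->pages != onstack)
		free(g->pages);
}

/* Queue n dummy "pages" for freeing; returns how many batches it took. */
size_t free_pages_batched(size_t n, size_t want)
{
	void *onstack[1];	/* "even an array of one" */
	struct gather g;

	gather_init(&g, onstack, 1, want);
	for (size_t i = 0; i < n; i++)
		gather_add(&g, malloc(64));
	gather_finish(&g, onstack);
	return g.flushes;
}
```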


Re: [PATCH] flush cache fixes for ia64 [1/2] migration fix

2007-07-27 Thread KAMEZAWA Hiroyuki
On Sat, 28 Jul 2007 07:06:09 +0900
KAMEZAWA Hiroyuki <[EMAIL PROTECTED]> wrote:

> On Fri, 27 Jul 2007 09:39:16 -0700 (PDT)
> Christoph Lameter <[EMAIL PROTECTED]> wrote:
> 
> > This will have no effect on x86_64, ia64 and i386. Maybe useful for 
> > virtually mapped platforms (parisc)?
> > 
> yes.
> 
Ahh... but I should notify you that I added sync_icache_dcache() (for ia64)
here. I'll post take5.

Thanks,
-Kame



Re: request for patches: showing mount options

2007-07-27 Thread Miklos Szeredi
> >> Some mount options are never passed to the kernel, and thus can't appear 
> >> in /proc/mounts.  Examples include user, users, and _netdev for NFS.
> > 
> > These options control *who* may mount and *when* to mount.  They are
> > not a property of the mount itself and are not added to /etc/mtab.
> > 
> > There's a "user=ID" option that is added to /etc/mtab in case of user
> > mounts.  This identifies the owner of the mount, so that it can be
> > unmounted by that user.  There are patches in -mm that enable the
> > kernel to store this info.
> > 
> > Do you have other examples in mind?
> 
> [no]quota comes to mind;

These are passed to the kernel.

> also auto,

This controls when a filesystem is mounted, same category as '_netdev'

> [no]owner, [no]group,

These control who can mount the filesystem, same category as 'user' and 'users'

> quiet/loud,

I can't find these in the manual as universal options.  Quiet is
defined for a couple of filesystems but with different meaning for
each of them.

> Aside: It's a confusing artifact of the mount CLI that these options 
> control who/when but are passed to the mount command in the same way the 
> other options are.

Yes, slightly.  Actually most of these options are just ignored on the
command line.  They only have an effect in /etc/fstab.

The right behavior of mount(8) would probably be to error out on these
options, since they make no sense on the command line.  But this is
not a kernel issue.

Miklos
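The classification above (options that say who may mount or when to mount stay in /etc/fstab; the rest go to the kernel) can be illustrated with a filter over a comma-separated option string. This is a hedged sketch: the option list is assembled from the ones named in this thread plus their "no" forms, and kernel_options() is a hypothetical helper, not how mount(8) is actually implemented.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative list of fstab-only options (who/when to mount). */
static const char *const fstab_only[] = {
	"user", "nouser", "users", "owner", "noowner",
	"group", "nogroup", "auto", "noauto", "_netdev",
};

/* Exact match of the option opt[0..len) against the list. */
static bool is_fstab_only(const char *opt, size_t len)
{
	for (size_t i = 0; i < sizeof(fstab_only) / sizeof(fstab_only[0]); i++)
		if (strlen(fstab_only[i]) == len &&
		    strncmp(fstab_only[i], opt, len) == 0)
			return true;
	return false;
}

/*
 * Copy the comma-separated options in 'in' to 'out' (size outlen),
 * dropping fstab-only ones. Returns false if 'out' is too small.
 */
bool kernel_options(const char *in, char *out, size_t outlen)
{
	size_t used = 0;

	if (outlen == 0)
		return false;
	out[0] = '\0';
	while (*in) {
		size_t len = strcspn(in, ",");

		if (len && !is_fstab_only(in, len)) {
			size_t need = len + (used ? 1 : 0);

			if (used + need + 1 > outlen)
				return false;
			if (used)
				out[used++] = ',';
			memcpy(out + used, in, len);
			used += len;
			out[used] = '\0';
		}
		in += len;
		if (*in == ',')
			in++;
	}
	return true;
}
```

Because matching is exact, "user=ID" passes through untouched; whether it should reach the kernel depends on the -mm patches mentioned above.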


Re: RFT: updatedb "morning after" problem [was: Re: -mm merge plans for 2.6.23]

2007-07-27 Thread Rene Herman

On 07/27/2007 10:28 PM, Daniel Hazelton wrote:


Check the attitude at the door then re-read what I actually said:


Attitude? You wanted attitude dear boy?


Updatedb or another process that uses the FS heavily runs on a users
256MB P3-800 (when it is idle) and the VFS caches grow, causing memory
pressure that causes other applications to be swapped to disk. In the
morning the user has to wait for the system to swap those applications
back in.


I never said that it was the *program* itself - or *any* specific program (I 
used "Updatedb" because it has been the big name in the discussion) - doing 
the filling of memory. I actually said that the problem is that the kernel's 
caches - VFS and others - will grow *WITHOUT* *LIMIT*, filling all available 
memory. 


WHICH SWAP-PREFETCH DOES NOT HELP WITH.
WHICH SWAP-PREFETCH DOES NOT HELP WITH.
WHICH SWAP-PREFETCH DOES NOT HELP WITH.

And now finally get that through your thick skull or shut up, right fucking now.

You want to know what causes the problem? The current design of the caches. 
They will extend without much limit, to the point of actually pushing pages 
to disk so they can grow even more. 


Due to being a generally nice guy, I am going to try _once_ more to try and 
make you understand. Not twice, once. So pay attention. Right now.


Those caches are NOT causing any problem under discussion. If any caches 
grow to the point of causing swap-out, they have filled memory and 
swap-prefetch cannot and will not do anything since it needs free (as in not 
occupied by caches) memory. As such, people maintaining that swap-prefetch 
helps their situation are not being hit by caches.


The only way swap-prefetch can (and will) do anything is when something that 
by itself takes up lots of memory runs and exits. So can we now please 
finally drop the fucking red herring and start talking about swap-prefetch?


If we accept that some of the people maintaining that swap-prefetch helps 
them are not in fact deluded -- a bit of a stretch seeing as how not a 
single one of them is substantiating anything -- we have a number of 
slightly different possibilities for "something" in the above.


-- 1)

It could be an inefficient updatedb. Although he isn't experiencing the
problem, Bjoern Steinbrink is posting numbers (w!) that show that at
least the GNU version spawns a large memory "sort" process, meaning that on a
low-memory box updatedb itself can be what causes the observed problem.


While in this situation switching to a different updatedb (slocate, mlocate)
obviously makes sense, it's the kind of situation where swap-prefetch will help.


-- 2)

It could be something else entirely, such as a backup run. I suppose people
would know if they were running anything of the sort, though, and wouldn't
blame anything on updatedb.


Other than that, it's again the situation where swap-prefetch would help.

-- 3)

The something else entirely can also run _after_ updatedb, kicking out the
VFS caches and leaving free memory upon exit. I still suppose the same thing
as under (2), but this is the only way updatedb / VFS caches can even be
part of any problem: if the _combined_ memory pressure is just enough to
make the difference.


The direct problem is still just the "something else entirely" and needs 
someone affected to tell us what it is.


I already did. You completely ignored it because I happened to use the magic 
words "updatedb" and "swap prefetch". 


No I did not. This thread is about swap-prefetch and you used the magic 
words VFS caches. I don't give a fryin' fuck if their filling is caused by 
updatedb or the cat sleeping on the "find /" keys on your keyboard, 
they're still not causing anything swap-prefetch helps with.


This thread has seen input from a selection of knowledgeable people and 
Morton was even running benchmarks to look at this supposed VFS cache 
problem and not finding it. The only further input this thread needs is 
someone affected by the supposed problem.


Which, as I of course notice in a followup of yours, you are not either --
you're just here to blabber, not to solve anything.


Rene.



2.6.23-rc1-mm1 + hotfixes -- Section mismatches

2007-07-27 Thread Miles Lane
  MODPOST vmlinux.o
WARNING: vmlinux.o(.text+0x183): Section mismatch: reference to
.init.text.1:start_kernel (between 'is386' and 'check_x87')
WARNING: vmlinux.o(.data+0x53c0): Section mismatch: reference to
.init.text.4:native_smp_prepare_boot_cpu (between 'smp_ops' and
'call_lock')
WARNING: vmlinux.o(.data+0x53c4): Section mismatch: reference to
.init.text.4:native_smp_prepare_cpus (between 'smp_ops' and
'call_lock')
WARNING: vmlinux.o(.data+0x53cc): Section mismatch: reference to
.init.text.4:native_smp_cpus_done (between 'smp_ops' and 'call_lock')
WARNING: vmlinux.o(.data+0x6598): Section mismatch: reference to
.init.text.6:machine_specific_memory_setup (between 'paravirt_ops' and
'reserve_ioports')
WARNING: vmlinux.o(.data+0x65a0): Section mismatch: reference to
.init.text.4:native_init_IRQ (between 'paravirt_ops' and
'reserve_ioports')
WARNING: vmlinux.o(.data+0x65a4): Section mismatch: reference to
.init.text.4:hpet_time_init (between 'paravirt_ops' and
'reserve_ioports')
WARNING: vmlinux.o(.data+0x65a8): Section mismatch: reference to
.init.text.5:native_pagetable_setup_start (between 'paravirt_ops' and
'reserve_ioports')
WARNING: vmlinux.o(.data+0x65ac): Section mismatch: reference to
.init.text.5:native_pagetable_setup_done (between 'paravirt_ops' and
'reserve_ioports')
WARNING: vmlinux.o(.data+0x65b0): Section mismatch: reference to
.init.text.4:default_banner (between 'paravirt_ops' and
'reserve_ioports')
WARNING: vmlinux.o(.data+0x6674): Section mismatch: reference to
.init.text.4:setup_boot_APIC_clock (between 'paravirt_ops' and
'reserve_ioports')
WARNING: vmlinux.o(.data+0x17840): Section mismatch: reference to
.init.text.19:vesafb_probe (between 'vesafb_driver' and 'vesafb_ops')
WARNING: vmlinux.o(.data+0x1ef00): Section mismatch: reference to
.init.text.19:hvc_console_setup (between 'hvc_con_driver' and
'vtermnos')
WARNING: vmlinux.o(.data+0x20780): Section mismatch: reference to
.init.text.19:serial8250_console_setup (between 'serial8250_console'
and 'serial8250_reg')
WARNING: vmlinux.o(.data+0x20784): Section mismatch: reference to
.init.text.19:serial8250_console_early_setup (between
'serial8250_console' and 'serial8250_reg')
WARNING: vmlinux.o(.data+0x259cc): Section mismatch: reference to
.init.text.19:smsc_ircc_pnp_probe (between 'smsc_ircc_pnp_driver' and
'__param_str_ircc_transceiver')
WARNING: vmlinux.o(.data+0x2e5f0): Section mismatch: reference to
.init.text.19:pci_eisa_init (between 'pci_eisa_driver' and
'pci_eisa_pci_tbl')
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: pluggable scheduler thread (was Re: Volanomark slows by 80% under CFS)

2007-07-27 Thread Andrea Arcangeli
On Fri, Jul 27, 2007 at 11:43:23PM -0400, Chris Snook wrote:
> I'm pretty sure the point of posting a patch that triples CFS performance 
> on a certain benchmark and arguably improves the semantics of sched_yield 
> was to improve CFS.  You have a point, but it is a point for a different 
> thread.  I have taken the liberty of starting this thread for you.

I've no real interest in starting or participating in flamewars
(especially the ones not backed by hard numbers). So I adjusted the
subject a bit in the hope the discussion will not degenerate as you
predicted, hope you don't mind.

I'm pretty sure the point of posting that email was to show the
remaining performance regression with the sched_yield fix applied
too. Given you considered my post both offtopic and inflammatory, I
guess you think it's possible and reasonably easy to fix that
remaining regression without a pluggable scheduler, right? So please
enlighten us on how you intend to achieve it.

Also consider that the other numbers were likely obtained with NPTL, so
they shouldn't be affected by the sched_yield changes.

> Sure there is.  We can run a fully-functional POSIX OS without using any 
> block devices at all.  We cannot run a fully-functional POSIX OS without a 
> scheduler. Any feature without which the OS cannot execute userspace code 
>  is sufficiently primitive that somewhere there is a device on which it will 
> be impossible to debug if that feature fails to initialize.  It is quite 
> reasonable to insist on only having one implementation of such features in 
> any given kernel build.

Sounds like a red-herring to me... There aren't just pluggable I/O
schedulers in the kernel, there are pluggable packet schedulers too
(see `tc qdisc`). And both are switchable at runtime (not just at boot
time).
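For readers less familiar with the pattern being compared here: a pluggable policy in C is usually an ops structure of function pointers, registered under a name and looked up at boot or at runtime, in the same spirit as the I/O elevators and `tc` qdiscs. The sketch below is hypothetical: `struct sched_policy`, `pick_next` and `select_policy` are illustrative names, not kernel APIs, and it shows only selection by name, not how a CPU scheduler could be swapped safely while tasks are running.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical policy interface: pick the index of the next task to run,
 * given each task's accumulated runtime. */
struct sched_policy {
	const char *name;
	size_t (*pick_next)(const unsigned long *runtime, size_t ntasks);
};

/* Trivial policy: always take the head of the queue. */
static size_t pick_first(const unsigned long *runtime, size_t ntasks)
{
	(void)runtime;
	assert(ntasks > 0);
	return 0;
}

/* Fair-style policy: take the task that has had the least runtime. */
static size_t pick_min_runtime(const unsigned long *runtime, size_t ntasks)
{
	size_t i, best = 0;

	assert(ntasks > 0);
	for (i = 1; i < ntasks; i++)
		if (runtime[i] < runtime[best])
			best = i;
	return best;
}

static const struct sched_policy policies[] = {
	{ "first", pick_first },
	{ "fair", pick_min_runtime },
};

/* Look up a policy by name (e.g. from a boot parameter); NULL if unknown. */
static const struct sched_policy *select_policy(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(policies) / sizeof(policies[0]); i++)
		if (strcmp(policies[i].name, name) == 0)
			return &policies[i];
	return NULL;
}
```

Callers only ever go through `policy->pick_next()`, which is what makes the implementation replaceable without touching them.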

Can you run your fully-functional POSIX OS without a packet scheduler
and without an I/O scheduler? I wonder where you are going to
read/write data without a hard disk and a network.

Also, those pluggable components don't add much crash risk compared
to the complexity of the schedulers themselves.

> Whether or not these alternatives belong in the source tree as config-time 
> options is a political question, but preserving boot-time debugging 
> capability is a perfectly reasonable technical motivation.

The scheduler is invoked very late in the boot process (printk, the
serial console and kdb have been working for ages by the time the
scheduler kicks in), so it's fully debuggable (no debugger depends on
the scheduler; they run inside the NMI handler...). I don't really see
your point.

And even if there were a subtle bug in the scheduler, you'd never
trigger it at boot with so few tasks and so few context switches.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


IRQ Delivery Problem for MCP65

2007-07-27 Thread Craig Block
Hello,

I'm having trouble getting Linux to see any hard drives on an ASUS M2N-X
motherboard with an MCP65 (nForce 520) chipset.  When the kernel probes the
AHCI controllers, it hangs for a minute or so on each one and returns the
following:

ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)

Thanks in advance for any help you can lend,

  - Craig


   

Got a little couch potato? 
Check out fun summer activities for kids.
http://search.yahoo.com/search?fr=oni_on_mail&p=summer+activities+for+kids&cs=bz
 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[2.6 patch] SOFTWARE_SUSPEND: handle HOTPLUG_CPU automatically

2007-07-27 Thread Adrian Bunk
On Fri, Jul 27, 2007 at 03:57:39PM -0700, Linus Torvalds wrote:
> 
> 
> On Sat, 28 Jul 2007, Adrian Bunk wrote:
> > 
> > The dependency of SUSPEND_SMP on HOTPLUG_CPU is quite unintuitive, so 
> > what about something like the patch below?
> 
> Yeah, this looks reasonable.
> 
> May I suggest another level of indirection, though:
> 
> > +config SUSPEND_SMP_POSSIBLE
> > +   bool
> > +   depends on (X86 && !X86_VOYAGER) || (PPC64 && (PPC_PSERIES || PPC_PMAC))
> > +   depends on SMP
> > +   default y
> 
> How about making this a bit more split up, and do it as
> 
>   # SMP suspend is possible on ..
>   config SUSPEND_SMP_POSSIBLE
>   bool
>   depends on (X86 && !X86_VOYAGER) || (PPC64 && (PPC_PSERIES || PPC_PMAC))
>   default y
> 
>   # UP suspend is possible on ..
>   config SUSPEND_UP_POSSIBLE
>   bool
>   depends on X86 || PPC64_SWSUSP || FRV || PPC32
>   default y 

Sounds good.

>   # Can we suspend?
>   config SUSPEND_POSSIBLE
>   bool
>   depends on (SMP && SUSPEND_SMP_POSSIBLE) || (SUSPEND_UP_POSSIBLE && !SMP)
>   default y

IMHO not required:

config SOFTWARE_SUSPEND
bool "Software Suspend (Hibernation)"
depends on PM && SWAP
depends on SUSPEND_UP_POSSIBLE || SUSPEND_SMP_POSSIBLE
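Whether the shorter `SUSPEND_UP_POSSIBLE || SUSPEND_SMP_POSSIBLE` is equivalent to the SUSPEND_POSSIBLE expression depends on how the two helper symbols are defined. The C sketch below simply brute-forces both boolean expressions over all assignments, assuming (as in the split version quoted above) that neither helper symbol itself depends on SMP or !SMP. The function names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

/* (SMP && SUSPEND_SMP_POSSIBLE) || (SUSPEND_UP_POSSIBLE && !SMP) */
static bool combined_expr(bool smp, bool smp_possible, bool up_possible)
{
	return (smp && smp_possible) || (up_possible && !smp);
}

/* SUSPEND_UP_POSSIBLE || SUSPEND_SMP_POSSIBLE */
static bool short_expr(bool smp, bool smp_possible, bool up_possible)
{
	(void)smp;	/* SMP does not appear in the shorter form */
	return up_possible || smp_possible;
}

/* Count the assignments of the three symbols on which the forms differ. */
static int count_mismatches(void)
{
	int mask, n = 0;

	for (mask = 0; mask < 8; mask++) {
		bool smp = mask & 1, smp_possible = mask & 2, up_possible = mask & 4;

		if (combined_expr(smp, smp_possible, up_possible) !=
		    short_expr(smp, smp_possible, up_possible))
			n++;
	}
	return n;
}
```

Under that assumption the two forms differ in two cases. The one that matters in practice is SMP=y, SUSPEND_SMP_POSSIBLE=n, SUSPEND_UP_POSSIBLE=y, which is presumably an SMP Voyager build, since the quoted SUSPEND_UP_POSSIBLE includes plain X86. The shorter form is only safe if SUSPEND_SMP_POSSIBLE also depends on SMP and SUSPEND_UP_POSSIBLE on !SMP.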

> and then we have just a
> 
>   config SOFTWARE_SUSPEND
>   bool "Software Suspend (Hibernation)"
>   depends on PM && SWAP
>   depends on SUSPEND_POSSIBLE
> 
>   config SUSPEND_SMP
>   bool
>   depends on SOFTWARE_SUSPEND && SMP
>   select HOTPLUG_CPU
>   default y
> 
> and now each of the config options looks pretty simple and describe one 
> thing.
> 
> [ For extra bonus points: the SUSPEND_POSSIBLE thing is still pretty 
>   complicated, and it might actually be a better idea to make it a 
>   per-arch config option, and just make the x86/arch say
> 
>   config SUSPEND_POSSIBLE
>   bool
>   depends on !(X86_VOYAGER && SMP)
>   default y

This would give you "trying to assign nonexistent symbol SUSPEND_POSSIBLE"
kconfig warnings on architectures without SUSPEND_POSSIBLE.

(And you missed the UP case in your example.)

>   instead: since SUSPEND_POSSIBLE is always true on x86 regardless of SMP 
>   or not, just not on X86_VOYAGER. Then, each architecture can have its 
>   own private rules for whether that architecture has SUSPEND_POSSIBLE or 
>   not, so on ppc, it might look like
> 
>   config SUSPEND_POSSIBLE
>   bool
>   depends on (PPC64 && (PPC_PSERIES || PPC_PMAC)) || PPC_SWSUSP
>   default y
> 
>   or something, but the point is, now the complexity is a per-architecture 
>   thing, so other architectures simply don't have to care any more! ]
> 
> And the user only ever sees one single question: the one for 
> "SOFTWARE_SUSPEND". All the others would directly flow either from the 
> architecture choice, or from that.
> 
> Anybody willing to rewrite it that way?

Patch below.

>   Linus

cu
Adrian


<--  snip  -->


An implementation detail of the suspend code that is not intuitive for 
the user is the HOTPLUG_CPU dependency of SOFTWARE_SUSPEND if SMP.

This patch handles the dependency of SOFTWARE_SUSPEND on HOTPLUG_CPU 
automatically, without requiring the user to know about it.

Thanks to Stefan Richter and Linus Torvalds for valuable feedback.

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

---

 arch/i386/Kconfig|   16 +++-
 arch/powerpc/Kconfig |   11 +--
 arch/x86_64/Kconfig  |   19 ---
 kernel/power/Kconfig |   23 +--
 4 files changed, 49 insertions(+), 20 deletions(-)

commit bb14e6721dc4e1a97efbfa5398d6021b321af52d
Author: Adrian Bunk <[EMAIL PROTECTED]>
Date:   Sat Jul 28 06:47:03 2007 +0200

asdf

diff --git a/arch/i386/Kconfig b/arch/i386/Kconfig
index abb582b..eb00a12 100644
--- a/arch/i386/Kconfig
+++ b/arch/i386/Kconfig
@@ -903,13 +903,19 @@ config PHYSICAL_ALIGN
 
  Don't change this unless you know what you are doing.
 
-config HOTPLUG_CPU
-   bool "Support for suspend on SMP and hot-pluggable CPUs (EXPERIMENTAL)"
+config HOTPLUG_CPU_POSSIBLE
+   bool
depends on SMP && HOTPLUG && EXPERIMENTAL && !X86_VOYAGER
+   select SUSPEND_SMP_POSSIBLE
+   default y
+
+config HOTPLUG_CPU
+   bool "Support for hot-pluggable CPUs (EXPERIMENTAL)" if !SUSPEND_SMP
+   depends on HOTPLUG_CPU_POSSIBLE
+   default y if SUSPEND_SMP
---help---
- Say Y here to experiment with turning CPUs off and on, and to
- enable suspend on SMP systems. CPUs can be controlled through
- /sys/devices/system/cpu.
+ Say Y here to experiment with turning CPUs off and on.
+ CPUs can be controlled through /sys/devices/system/cpu.
 
 config COMPAT

Re: serial flow control appears broken

2007-07-27 Thread Lee Howard

Paul Fulghum wrote:


So this seems to be a latency issue reading the receive
FIFO in the ISR. The current rx FIFO trigger level
should be 8 bytes (UART_FCR_R_TRIG_10) which gives the
ISR 694usec to get the data at 115200bps.

IIRC, in 2.2.X kernels this defaulted to 4 bytes
(TRIG_01) which gave a little more time to service the interrupt.

How does the data rate affect the frequency of the overrun errors?
Does 57600bps make them go away?
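The 694 µs figure follows from the 16550A's 16-byte receive FIFO. Once the trigger level is reached, the ISR has until the remaining free slots fill before data is overrun. A minimal sketch of that arithmetic, assuming 10 bits per character on the wire (8N1: start bit, 8 data bits, stop bit); the function name is hypothetical:

```c
#include <assert.h>

#define UART_FIFO_SIZE 16	/* 16550A receive FIFO depth in bytes */
#define BITS_PER_CHAR  10	/* assumed 8N1: 1 start + 8 data + 1 stop bit */

/* Microseconds between the rx trigger-level interrupt and the FIFO being
 * full, i.e. the time to receive the remaining free slots at this rate. */
static double rx_service_window_us(int trigger_level, int baud)
{
	assert(trigger_level > 0 && trigger_level <= UART_FIFO_SIZE);
	assert(baud > 0);
	return (double)(UART_FIFO_SIZE - trigger_level) * BITS_PER_CHAR
		* 1e6 / baud;
}
```

With the 8-byte trigger this gives about 694 µs at 115200 bps. The 4-byte trigger (TRIG_01) stretches it to about 1042 µs, and halving the line rate to 57600 bps doubles either window.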
 



The overrun error message does not occur on every instance of data 
corruption.  (I just became aware of this as I've not been paying so 
much attention to the error messages as I have been to the corrupt 
data.)  The data gets far more corrupted than the error messages would 
lead me to believe.  Since the data being sent from the fax modem to the 
host is identical (same image data) every time it's easier for me to 
measure the effect of one bitrate over another by examining the number 
of missing bytes from the data.


The image has a total of 140465 bytes.  Just now I sent it 5 times each 
at 115200, 57600, 38400, and 19200 bps.


At 115200 bps the number of bytes skipped were:  63, 5, 44, 48, and 2.

At 57600 bps the number of bytes skipped were:  0, 1, 13, 9, and 12.

At 38400 bps the number of bytes skipped were 858, 0, 0, 0, and 8.

At 19200 bps the number of bytes skipped were 0, 0, 0, 0, and 0.
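To compare the rates directly, the skipped-byte counts can be turned into a loss fraction of the 5 × 140465 bytes sent at each rate. A small sketch (the helper name is illustrative):

```c
#include <assert.h>
#include <stddef.h>

#define IMAGE_BYTES 140465L	/* size of the test fax image in bytes */

/* Fraction of bytes lost across n transfers of the same image. */
static double loss_fraction(const long *skipped, size_t n)
{
	long total = 0;
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++)
		total += skipped[i];
	return (double)total / ((double)IMAGE_BYTES * n);
}
```

At 115200 bps that is 162 of 702325 bytes, roughly 0.023%. At 57600 bps it is 35 bytes. The 38400 bps total is dominated by the single 858-byte run that coincided with the disk load.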

Curiously, the session at 38400 bps that skipped 858 bytes... coincided, 
not just in sequence but also in precise timing within the session, with 
a small but noticeable disk load that I caused by grepping through a 
hundred session logs.  (I can't reproduce it easily, though, because of 
disk caching.)


And, perhaps this is relevant... the way that I have the fax modem 
sending the data to the host is by receiving it from another fax modem 
which is sending it.  Thus, the modem on ttyS0 is sending a fax to the 
modem on ttyS1.  Due to the error correction protocol that is performed 
between the two fax endpoints I can guarantee that the data is correct 
as it leaves the DCE.  I mention this in case there is any limitation to 
how the 8250 driver performs when two modems are being run simultaneously.


Thanks,

Lee.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC/RFT 0/5] Input locking patches

2007-07-27 Thread Al Boldi
Indan Zupancic wrote:
> On Tue, July 24, 2007 06:45, Dmitry Torokhov wrote:
> > Hi everyone,
> >
> > I finally managed to put together some patches implementing
> > locking in input core and main input handles. Please look
> > over them and give them a spin.
>
> Since kernel 2.6.21 or so I was annoyed by a warping mouse, and
> one kernel version later also by "stuck" keys, causing repeated input
> at the most inconvenient moments (e.g. when opening a program by
> pressing F1).
>
> As it happened irregularly and unpredictably it was hard to debug,
> and I suspected faulty hardware. My cpu was quite hot, but after
> removing all the dust it seems all right again. Unfortunately that
> was about the same time I upgraded to 2.6.23-rc1, so all I can say
> is that the stuck key problem seems to be gone, though not sure
> thanks to what, but that neither the cleaning nor the upgrade fixed
> the warping mouse problem.
>
> I'm running with these locking patches for two days now and the
> mouse doesn't warp any more (it can also have fixed the stuck key
> problem, not sure). Normally it would warp several times a day,
> and that didn't happen yet, so I'm tempted to praise your patches.
>
> Sorry for the babbling, just wanted to say that I've tested these
> patches and that they seem to fix real problems.

Thanks for babbling!

I've been having these same intermittent problems since around 2.6.21, and 
wasn't really sure whether it was hardware or not, so I didn't bother reporting 
them.  This is what I see sometimes in the logs:

=
PNP: PS/2 Controller [PNP0303:PS2K,PNP0f03:PS2M] at 0x60,0x64 irq 1,12
serio: i8042 KBD port at 0x60,0x64 irq 1
serio: i8042 AUX port at 0x60,0x64 irq 12
mice: PS/2 mouse device common for all mice
input: PC Speaker as /class/input/input3
input: AT Translated Set 2 keyboard as /class/input/input4
input: ImPS/2 Generic Wheel Mouse as /class/input/input5
psmouse.c: bad data from KBC - bad parity
psmouse.c: Wheel Mouse at isa0060/serio1/input0 lost synchronization, 
throwing 2 bytes away.
=

Thanks!

--
Al
 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: swap-prefetch: A smart way to make good use of idle resources (was: updatedb)

2007-07-27 Thread Al Boldi
Chris Snook wrote:
> Al Boldi wrote:
> > IMHO, what everybody agrees on, is that swap-prefetch has a positive
> > effect in some cases, and nobody can prove an adverse effect (excluding
> > power consumption).  The reason for this positive effect is also crystal
> > clear: It prefetches from swap on idle into free memory, ie: it doesn't
> > force anybody out, and they are the first to be dropped without further
> > swap-out, which sounds really smart.
> >
> > Conclusion:  Either prove swap-prefetch is broken, or get this merged
> > quick.
>
> If you can't prove why it helps and doesn't hurt, then it's a hack, by
> definition.

Ok, slow down: swap-prefetch isn't a hack.  It's a kernel-thread that adds 
swap-prefetch functionality to the kernel.

> With swap prefetch, we're only optimizing the case when the box isn't
> loaded and there's RAM free, but we're not optimizing the case when the
> box is heavily loaded and we need RAM to be free.

Exactly, swap-prefetch is very specific, and that's why it's so successful:  
It does one thing, and it does that very well.

> I'm inclined to view swap prefetch as a successful scientific experiment,
> and use that data to inform a more reasoned engineering effort.  If we can
> design something intelligent which happens to behave more or less like
> swap prefetch does under the circumstances where swap prefetch helps, and
> does something else smart under the circumstances where swap prefetch
> makes no discernable difference, it'll be a much bigger improvement.

Well, a swapless OS would really be the ultimate, but that's another thread 
entirely (see thread: '[RFC] VM: I have a dream...')

Don't mistake swap-prefetch for an attempt to also fix the swap-in slowdown; 
if it did that, it would be a hack, but it doesn't.

Instead, understand that swap-prefetch is viable even if all swapper issues 
have been solved, because swapping implies pages being swapped in when 
needed, and swap-prefetch smartly uses idle time to do so.

> Because we cannot prove why the existing patch helps, we cannot say what
> impact it will have when things like virtualization and solid state drives
> radically change the coefficients of the equation we have not solved. 
> Providing a sysctl to turn off a misbehaving feature is a poor substitute
> for doing it right the first time, and leaving it off by default will
> ensure that it only gets used by the handful of people who know enough to
> rebuild with the patch anyway.

But we do know why it helps: a process eats memory, then the page cache, then 
swaps others out, and then dies, freeing its memory; now swap-prefetch comes 
in if the system is idle.  Sounds really smart.

While many people may definitely benefit, others may just want to turn it 
off.  No harm done.


Thanks!

--
Al
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


How can we make page replacement smarter (was: swap-prefetch)

2007-07-27 Thread Al Boldi
Chris Snook wrote:
> Resource size has been outpacing processing latency since the dawn of
> time. Disks get bigger much faster than seek times shrink.  Main memory
> and cache keep growing, while single-threaded processing speed has nearly
> ground to a halt.
>
> In the old days, it made lots of sense to manage resource allocation in
> pages and blocks.  In the past few years, we started reserving blocks in
> ext3 automatically because it saves more in seek time than it costs in
> disk space. Now we're taking preallocation and antifragmentation to the
> next level with extent-based allocation in ext4.
>
> Well, we're still using bitmap-style allocation for pages, and the
> prefetch-less swap mechanism adheres to this design as well.  Maybe it's
> time to start thinking about memory in a somewhat more extent-like
> fashion.
>
> With swap prefetch, we're only optimizing the case when the box isn't
> loaded and there's RAM free, but we're not optimizing the case when the
> box is heavily loaded and we need RAM to be free.  This is a complete
> reversal of sane development priorities.  If swap batching is an
> optimization at all (and we have empirical evidence that it is) then it
> should also be an optimization to swap out chunks of pages when we need to
> free memory.
>
> So, how do we go about this grouping?  I suggest that if we keep per-VMA
> reference/fault/dirty statistics, we can tell which logically distinct
> chunks of memory are being regularly used.  This would also allow us to apply
> different page replacement policies to chunks of memory that are being
> used in different fashions.
>
> With such statistics, we could then page out VMAs in 2MB chunks when we're
> under memory pressure, also giving us the option of transparently paging
> them back in to hugepages when we have the memory free, once anonymous
> hugepage support is in place.
>
> I'm inclined to view swap prefetch as a successful scientific experiment,
> and use that data to inform a more reasoned engineering effort.  If we can
> design something intelligent which happens to behave more or less like
> swap prefetch does under the circumstances where swap prefetch helps, and
> does something else smart under the circumstances where swap prefetch
> makes no discernable difference, it'll be a much bigger improvement.
>
> Because we cannot prove why the existing patch helps, we cannot say what
> impact it will have when things like virtualization and solid state drives
> radically change the coefficients of the equation we have not solved. 
> Providing a sysctl to turn off a misbehaving feature is a poor substitute
> for doing it right the first time, and leaving it off by default will
> ensure that it only gets used by the handful of people who know enough to
> rebuild with the patch anyway.
>
> Let's talk about how we can make page replacement smarter, so it naturally
> accomplishes what swap prefetch accomplishes, as part of a design we can
> reason about.
>
> CC-ing linux-mm, since that's where I think we should take this next.
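Chris's proposal (per-VMA reference/fault/dirty statistics, then paging out in 2MB chunks) could be modelled roughly as follows. Everything here is hypothetical: the `vma_stats` fields, the coldness score weighting and `pick_victim_chunk` are illustrative choices, and a real implementation would also need to decay the counters over time.

```c
#include <assert.h>
#include <stddef.h>

#define CHUNK_SIZE (2UL << 20)	/* 2 MiB chunks, as proposed */

/* Hypothetical per-VMA usage counters, periodically reset or decayed. */
struct vma_stats {
	unsigned long start, end;	/* [start, end) virtual range */
	unsigned long refs;		/* referenced-bit hits seen by the scanner */
	unsigned long faults;		/* page faults taken in this VMA */
	unsigned long dirty;		/* pages found dirty */
};

/* Lower score = colder. Dirty pages cost a write to evict, so weight them. */
static unsigned long coldness_score(const struct vma_stats *v)
{
	return v->refs + v->faults + 2 * v->dirty;
}

static unsigned long first_aligned_chunk(const struct vma_stats *v)
{
	return (v->start + CHUNK_SIZE - 1) & ~(CHUNK_SIZE - 1);
}

/* Start of the first 2 MiB-aligned chunk in the coldest VMA that can hold
 * one, or 0 if none can (0 is used as "no victim" in this sketch). */
static unsigned long pick_victim_chunk(const struct vma_stats *vmas, size_t n)
{
	size_t i, best = n;

	assert(vmas != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (first_aligned_chunk(&vmas[i]) + CHUNK_SIZE > vmas[i].end)
			continue;	/* too small for an aligned chunk */
		if (best == n ||
		    coldness_score(&vmas[i]) < coldness_score(&vmas[best]))
			best = i;
	}
	return best == n ? 0 : first_aligned_chunk(&vmas[best]);
}
```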

Good idea, but unless we understand the problems involved, we are bound to 
repeat it.  So my first question would be:  Why is swap-in so slow?

As I have posted in other threads, swap-in of consecutive pages suffers a 2x 
slowdown wrt swap-out, whereas swap-in of random pages suffers over 6x 
slowdown.

Because it is hard to quantify the expected swap-in speed for random pages, 
let's first tackle the swap-in of consecutive pages, which should be at 
least as fast as swap-out.  So again, why is swap-in so slow?

Once we understand this problem, we may be able to suggest a smart 
improvement.


Thanks!

--
Al

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] add check do_direct_IO() return val

2007-07-27 Thread Joe Jin
> I tested Andrew's patch and panic was gone but got few ENOTBLK.
> So I tried with Joe's patch , both panic and ENOTBLK are gone now.
> But in Joe's patch if (ret == -ENOTBLK && (rw & WRITE)), dio_cleanup(dio)
> was not getting called because of break. So I moved dio_cleanup just 
> after if (ret).

Guru, actually, breaking out of the loop with -ENOTBLK will still call
dio_cleanup() later. Calling it too early means put_page() is done too
early, which may cause another panic.

Thanks,
Joe
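The point above is about ordering. Page references taken inside the loop are released in exactly one place after it, so releasing them again inside the loop before `break` would drop them twice. Below is a toy model of that pattern. `fake_page`, `submit_pages` and `cleanup` are hypothetical stand-ins, not the real direct-I/O code, and in the kernel the pages stay pinned until the I/O completes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy page with a reference count, standing in for struct page. */
struct fake_page {
	int refcount;
};

/* Drop every reference still held; meant to run exactly once. */
static void cleanup(struct fake_page **held, size_t *nheld)
{
	while (*nheld > 0) {
		struct fake_page *p = held[--*nheld];

		assert(p->refcount > 0);	/* would fire on a double put */
		p->refcount--;
	}
}

/* Take a reference per page, stopping early with -ENOTBLK at fail_at. */
static int submit_pages(struct fake_page *pages, size_t n, size_t fail_at,
			struct fake_page **held, size_t *nheld)
{
	int ret = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (i == fail_at) {
			ret = -ENOTBLK;
			break;		/* no cleanup here ... */
		}
		pages[i].refcount++;
		held[(*nheld)++] = &pages[i];
	}
	cleanup(held, nheld);		/* ... the single cleanup happens here */
	return ret;
}
```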
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


pluggable scheduler flamewar thread (was Re: Volanomark slows by 80% under CFS)

2007-07-27 Thread Chris Snook

Andrea Arcangeli wrote:

On Fri, Jul 27, 2007 at 08:31:19PM -0400, Chris Snook wrote:
I think Volanomark is being pretty stupid, and deserves to run slowly, but 


Indeed, any app doing what volanomark does is pretty inefficient.

But this is not the point. I/O schedulers are pluggable to help
inefficient apps too. If apps were extremely smart, they would all
use async I/O for their reads, and there would be no need for the
anticipatory scheduler, to take just one example.


I'm pretty sure the point of posting a patch that triples CFS performance on a 
certain benchmark and arguably improves the semantics of sched_yield was to 
improve CFS.  You have a point, but it is a point for a different thread.  I 
have taken the liberty of starting this thread for you.



The fact is there's no technical reason why we should be forbidden from
choosing between CFS and O(1), at least at boot time.


Sure there is.  We can run a fully-functional POSIX OS without using any block 
devices at all.  We cannot run a fully-functional POSIX OS without a scheduler. 
 Any feature without which the OS cannot execute userspace code is sufficiently 
primitive that somewhere there is a device on which it will be impossible to 
debug if that feature fails to initialize.  It is quite reasonable to insist on 
only having one implementation of such features in any given kernel build.


Whether or not these alternatives belong in the source tree as config-time 
options is a political question, but preserving boot-time debugging capability 
is a perfectly reasonable technical motivation.


-- Chris
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: -mm merge plans for 2.6.23

2007-07-27 Thread Daniel Cheng
Andrew Morton wrote:
[...]
> 
> And userspace can do a much better implementation of this
> how-to-handle-large-load-shifts problem, because it is really quite
> complex.  The system needs to be monitored to determine what is the "usual"
[...]
> All this would end up needing runtime configurability and tweakability and
> customisability.  All standard fare for userspace stuff - much easier than
> patching the kernel.

But a patch already exists.
Which is easier: (1) apply the patch, or (2) write a new patch?

> 
> So.  We can
> a) provide a way for userspace to reload pagecache and
> b) merge maps2 (once it's finished) (pokes mpm)
> and we're done?

Might be, but merging maps2 carries a higher risk, which should be taken
in a development branch (er... 2.7, but we don't have one now).

-- 

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][Doc] memory hotplug documentaion take 2.

2007-07-27 Thread Yasunori Goto
Thanks for your comments.
The fixed patch is attached at the end of this mail.

> > +
> > +Note(1): x86_64's has special implementation for memory hotplug.
> > + This test does not describe it.
> 
>  text (?)

Oops. Yes.

> > +1.2. Phases of memory hotplug
> > +---
> > +There are 2 phases in Memory Hotplug.
> > +  1) Physical Memory Hotplug phase
> > +  2) Logical Memory Hotplug phase.
> > +
> > +The First phase is to communicate hardware/firmware and make/erase
> > +environment for hotplugged memory. Basically, this phase is necessary
> > +for the purpose (B), but this is good phase for communication between
> > +highly virtulaized environments too.
> 
>   virtualized

Yes. fixed...

> 
> > +
> > +When memory is hotplugged, the kernel recognizes new memory, makes new memory
> > +management tables, and makes sysfs files for new memory's operation.
> > +
> > +If firmware supports notification of connection of new memory to OS,
> > +this phase is triggered automatically. ACPI can notify this event. If not,
> > +"probe" operation by system administration works instead of it.
> 
>   is used instead.

Ah, ok.


> > +(see Section 4.).
> > +
> > +Logical Memory Hotplug phase is to change memory state into
> > +avaiable/unavailable for users. Amount of memory from user's view is
> > +changed by this phase. The kernel makes all memory in it as free pages
> > +when a memory range is into available.
> 
>   ?? drop "into" ?
> or is a memory range always available?  Confusing.

Ok. I didn't know it was confusing. Thanks. I dropped it.

> > +In this document, this phase is described online/offline.
> 
>described as online/offline.

OK.

> > +
> > +Logical Memory Hotplug phase is trigged by write of sysfs file by system
> 
>triggered

Oops. yes.

> 
> > +administrator. When hot-add case, it must be executed after Physical Hotplug
> 
>   For the hot-add case,

OK.

> 
> > +phase by hand.
> > +(However, if you writes udev's hotplug scripts for memory hotplug, these
> > + phases can be execute in seamless way.)
> > +
> > +
> > +1.3. Unit of Memory online/offline operation
> > +
> > +Memory hotplug uses SPARSEMEM memory model. SPARSEMEM divides the whole memory
> > +into chunks of the same size. The chunk is called a "section". The size of
> > +a section is architecture dependent. For example, power uses 16MiB, ia64 uses
> > +1GiB. The unit of online/offline operation is "one section". (see Section 3.)
> > +
> > +To know the size of sections, please read this file:
> 
>To determine the size ...

I didn't know "determine" could be used in this sentence.
I thought it just meant "decide", because of my limited English
vocabulary. Thanks. I changed it. :-)

> > +- For using remove memory, followings are necessary too
> 
>  To enable memory removal, the following are also necessary


Ok.

> 
> > +Allow for memory hot remove(CONFIG_MEMORY_HOTREMOVE)
> > +Page Migration (CONFIG_MIGRATION)
> > +
> > +- For ACPI memory hotplug, followings are necessary too
> 
>   the following are also necessary

Ok.

> > +Now, XXX is defined as start_address_of_section / secion_size.
> 
>  section_size.

Yes. Thanks.

> > +
> > +For example, assume 1GiB section size. A device for a memory starts from address
> 
>for memory starting at

Ok.

> > +
> > +In general, the firmware (ACPI) which supports memory hotplug defines
> > +memory class object of _HID "PNP0C80". When a notify is asserted to PNP0C80,
> > +Linux's ACPI handler does hot-add memory to the system and calls a hotplug udev
> > +script. This will be done in automatically.
> 
>  drop "in"

Ok.


> > +If firmware supports NUMA-node hotplug, and define object of _HID "ACPI0004",
> 
>defines an object

Ok.

> 
> > +"PNP0A05", or "PNP0A06", notification is asserted to it, and ACPI hander
> 
>  handler

Ah, yes.

Thanks again!



---
This adds a document for memory hotplug describing "How to use" and "Current
status".


---
Signed-off-by: KAMEZAWA Hiroyuki <[EMAIL PROTECTED]>
Signed-off-by: Yasunori Goto <[EMAIL PROTECTED]>


 Documentation/memory-hotplug.txt |  322 +++
 1 files changed, 322 insertions(+)

Index: makedocument/Documentation/memory-hotplug.txt
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ makedocument/Documentation/memory-hotplug.txt   2007-07-28 11:47:52.0 +0900
@@ -0,0 +1,322 @@
+===

Re: [RFC] scheduler: improve SMP fairness in CFS

2007-07-27 Thread Chris Snook

Bill Huey (hui) wrote:

On Fri, Jul 27, 2007 at 07:36:17PM -0400, Chris Snook wrote:
I don't think that achieving a constant error bound is always a good thing. 
 We all know that fairness has overhead.  If I have 3 threads and 2 
processors, and I have a choice between fairly giving each thread 1.0 
billion cycles during the next second, or unfairly giving two of them 1.1 
billion cycles and giving the other 0.9 billion cycles, then we can have a 
useful discussion about where we want to draw the line on the 
fairness/performance tradeoff.  On the other hand, if we can give two of 
them 1.1 billion cycles and still give the other one 1.0 billion cycles, 
it's madness to waste those 0.2 billion cycles just to avoid user jealousy. 
 The more complex the memory topology of a system, the more "free" cycles 
you'll get by tolerating short-term unfairness.  As a crude heuristic, 
scaling some fairly low tolerance by log2(NCPUS) seems appropriate, but 
eventually we should take the boot-time computed migration costs into 
consideration.
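The "crude heuristic" of scaling a low base tolerance by log2(NCPUS) can be written down directly. In the sketch below, the base value and the function names are assumptions, not CFS tunables. It uses the floor of log2, and on a single CPU the result is 0, which is harmless because there is nothing to migrate.

```c
#include <assert.h>

/* Floor of log2(n) for n >= 1. */
static unsigned int ilog2_u(unsigned int n)
{
	unsigned int r = 0;

	assert(n >= 1);
	while (n >>= 1)
		r++;
	return r;
}

/* Tolerated per-task runtime imbalance before rebalancing, in ns:
 * base_ns * log2(ncpus). */
static unsigned long long imbalance_tolerance_ns(unsigned long long base_ns,
						 unsigned int ncpus)
{
	return base_ns * ilog2_u(ncpus);
}
```

With a 1 ms base, a 2-CPU box tolerates 1 ms of imbalance and a 64-CPU box tolerates 6 ms.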


You have to consider the target for this kind of code. There are applications
where you need something that falls within a constant error bound. According
to the numbers, the current CFS rebalancing logic doesn't achieve that to
any degree of rigor. So CFS is ok for SCHED_OTHER, but not for anything more
strict than that.


I've said from the beginning that I think that anyone who desperately needs 
perfect fairness should be explicitly enforcing it with the aid of realtime 
priorities.  The problem is that configuring and tuning a realtime application 
is a pain, and people want to be able to approximate this behavior without doing 
a whole lot of dirty work themselves.  I believe that CFS can and should be 
enhanced to ensure SMP-fairness over potentially short, user-configurable 
intervals, even for SCHED_OTHER.  I do not, however, believe that we should take 
it to the extreme of wasting CPU cycles on migrations that will not improve 
performance for *any* task, just to avoid letting some tasks get ahead of 
others.  We should be as fair as possible but no fairer.  If we've already made 
it as fair as possible, we should account for the margin of error and correct 
for it the next time we rebalance.  We should not burn the surplus just to get 
rid of it.


On a non-NUMA box with single-socket, non-SMT processors, a constant error bound 
is fine.  Once we add SMT, go multi-core, go NUMA, and add inter-chassis 
interconnects on top of that, we need to multiply this error bound at each stage 
in the hierarchy, or else we'll end up wasting CPU cycles on migrations that 
actually hurt the processes they're supposed to be helping, and hurt everyone 
else even more.  I believe we should enforce an error bound that is proportional 
to migration cost.
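An error bound proportional to migration cost could look roughly like the sketch below. It assumes a hypothetical per-level migration cost, for example taken from the boot-time estimates, and a hypothetical proportionality constant k. struct topo_level and should_migrate() are illustrative names, not kernel interfaces.

```c
#include <stdint.h>

/* Hypothetical description of one level of the topology hierarchy. */
struct topo_level {
	const char *name;	/* e.g. "SMT", "core", "NUMA" */
	uint64_t migration_ns;	/* assumed migration cost estimate */
};

/*
 * Hypothetical: the imbalance tolerated at a level is proportional to
 * the migration cost at that level.  Costs grow toward the top of the
 * hierarchy, so the bound grows with them.
 */
static uint64_t level_error_bound(const struct topo_level *lvl, uint64_t k)
{
	return k * lvl->migration_ns;
}

/* Migrate only if the imbalance exceeds what this level tolerates. */
static int should_migrate(const struct topo_level *lvl, uint64_t k,
			  uint64_t imbalance_ns)
{
	return imbalance_ns > level_error_bound(lvl, k);
}
```

The same 50 ns imbalance would justify a cheap migration between SMT siblings but not an expensive cross-node one.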



Even the rt overload code (from my memory) is subject to these limitations
as well until it's moved to use a single global queue while using CPU
binding to turn off that logic. It's the price you pay for accuracy.

If we allow a little short-term unfairness (and I think we should), we can 
still account for this unfairness and compensate for it (again, with the 
same tolerance) at the next rebalancing.


Again, it's a function of *when* and depends on the application.

Adding system calls, while great for research, is not something which is 
done lightly in the published kernel.  If we're going to implement a user 
interface beyond simply interpreting existing priorities more precisely, it 
would be nice if this was part of a framework with a broader vision, such 
as a scheduler economy.


I'm not sure what you mean by scheduler economy, but CFS can and should
be extended to handle proportional scheduling which is outside of the
traditional Unix priority semantics. Having a new API to get at this is
unavoidable if you want it to eventually support -rt-oriented applications
that have bandwidth semantics.


A scheduler economy is basically a credit scheduler, augmented to allow 
processes to exchange credits with each other.  If you want to get more 
sophisticated with fairness, you could price CPU time proportional to load on 
that CPU.
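A minimal sketch of the scheduler economy described above. Each task has a credit account, tasks can transfer credits to each other, and CPU time is priced in proportion to load. The pricing rule base_price * (1 + nr_running) and all names here are assumptions chosen for illustration.

```c
#include <stdint.h>

/* Hypothetical per-task credit account. */
struct account {
	int64_t credits;
};

/* Hypothetical pricing: busier CPUs cost more per unit of CPU time. */
static int64_t cpu_price(int64_t base_price, unsigned int nr_running)
{
	return base_price * (1 + (int64_t)nr_running);
}

/* Exchange credits between tasks; returns 0 if funds are insufficient. */
static int transfer_credits(struct account *from, struct account *to,
			    int64_t amount)
{
	if (amount < 0 || from->credits < amount)
		return 0;
	from->credits -= amount;
	to->credits += amount;
	return 1;
}

/* Charge a task for 'units' of CPU time on a CPU with the given load. */
static int buy_cpu_time(struct account *a, int64_t base_price,
			unsigned int nr_running, int64_t units)
{
	int64_t cost = cpu_price(base_price, nr_running) * units;

	if (a->credits < cost)
		return 0;
	a->credits -= cost;
	return 1;
}
```

A frugal task can buy more time on an idle CPU than on a busy one, which is the "suburbs versus downtown" trade-off in code form.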


I've been house-hunting lately, so I like to think of it in real estate terms. 
If you're comfortable with your standard of living and you have enough money, 
you can rent the apartment in the chic part of town, right next to the subway 
station.  If you want to be more frugal because you're saving for retirement, 
you can get a place out in the suburbs, but the commute will be more of a pain. 
 If you can't make up your mind and keep moving back and forth, you spend a lot 
on moving and all your stuff gets dented and scratched.



All deadline based schedulers have API mechanisms like this to support
extended semantics. This is no different.

I had a feeling this patch was originally designed for the O(1) scheduler, 
and this is why.  The old scheduler had expired arrays, so adding a 
round-expired a

Re: Volanomark slows by 80% under CFS

2007-07-27 Thread Rik van Riel

Tim Chen wrote:

Ingo,

Volanomark slows by 80% with CFS scheduler on 2.6.23-rc1.  
Benchmark was run on a 2 socket Core2 machine.


The change in scheduler treatment of sched_yield 
could play a part in changing Volanomark behavior.

In CFS, sched_yield is implemented
by dequeueing and requeueing a process.  The time a process 
has spent running probably reduced the cpu time due to it 
by only a bit. The process could get requeued pretty close
to the head of the queue, and may get scheduled again pretty
quickly if there is still a lot of cpu time due.  
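The following simplified model shows why a dequeue/requeue yield may barely move a task. The runqueue is kept sorted by a per-task key, called vruntime here purely for illustration. It uses a sorted array instead of CFS's rbtree, and all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified runqueue entry ordered by its key. */
struct task {
	int pid;
	uint64_t vruntime;
};

/* Insert t into q[0..*n) in ascending key order; q must have room. */
static void enqueue(struct task *q, size_t *n, struct task t)
{
	size_t i = *n;

	while (i > 0 && q[i - 1].vruntime > t.vruntime) {
		q[i] = q[i - 1];
		i--;
	}
	q[i] = t;
	(*n)++;
}

/* Remove and return the entry at index idx. */
static struct task dequeue_at(struct task *q, size_t *n, size_t idx)
{
	struct task t = q[idx];

	for (size_t i = idx; i + 1 < *n; i++)
		q[i] = q[i + 1];
	(*n)--;
	return t;
}

/*
 * Yield modelled as dequeue + requeue with the key (almost) unchanged.
 * Returns the yielding task's new position in the queue.
 */
static size_t yield_requeue(struct task *q, size_t *n, size_t idx)
{
	struct task t = dequeue_at(q, n, idx);

	enqueue(q, n, t);
	for (size_t i = 0; i < *n; i++)
		if (q[i].pid == t.pid)
			return i;
	return *n; /* not reached */
}
```

If the yielder's key is still the smallest, even after growing slightly because it ran for a while, it lands back at index 0 and is picked again.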


I wonder if this explains the 30% drop in top performance
seen with the MySQL sysbench benchmark when the scheduler
changed to CFS...

See http://people.freebsd.org/~jeff/sysbench.png

--
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is.  Each group
calls the other unpatriotic.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Source organization for two drivers sharing common code

2007-07-27 Thread Subbu Seetharaman
Thanks for all the answers.  The common code mostly handles
the message passing for hardware initialization, ring creation
and some ioctls.  drivers/message looks like a good place for this
code to live.

Subbu



From: Jan Engelhardt [mailto:[EMAIL PROTECTED]
To: Chris Friesen [mailto:[EMAIL PROTECTED]
Cc: Subbu Seetharaman [mailto:[EMAIL PROTECTED], linux-kernel@vger.kernel.org
Sent: Fri, 27 Jul 2007 12:34:16 -0700
Subject: Re: Source organization for two drivers sharing common code


On Jul 27 2007 13:12, Chris Friesen wrote:
> Jan Engelhardt wrote:
>> On Jul 27 2007 10:17, Subbu Seetharaman wrote:
>> 
>> >What is the recommended way for two drivers to share common code ?
>> >...The source code for these drivers will fit under drivers/net and
>> >drivers/scsi. But both drivers share some common code.
>
>> You could create (in total) three modules, e.g. my-common.ko,
>> my-net.ko and my-scsi.ko, of which the latter two use functions from the
>> first.
>
> Where would the common code live, in such a case? Would you just pick one of
> the two locations at random, or put it in drivers/misc or maybe lib?

Perhaps drivers/message - well I can't answer that exactly.

As far as the output object files are concerned, it is not relevant,
since they will be autoloaded anyway :)



Jan
-- 




Re: [PATCH] Documentation: document HFSPlus

2007-07-27 Thread Randy Dunlap
On Fri, 27 Jul 2007 21:25:47 -0400 Wyatt Banks wrote:

> From: Wyatt Banks <[EMAIL PROTECTED]>
> 
> Documentation: document HFSPlus filesystem and its mount options.
> 
> Signed-off-by:Wyatt Banks <[EMAIL PROTECTED]>

Thanks.

> ---
> 
> Patched against 2.6.22.1

FYI:  Patches should be against the latest -rc or -git (when
available), but it probably doesn't matter in this case.


> diff -uprN linux-2.6.22.1/Documentation/filesystems/hfsplus.txt 
> linux-2.6.22.1-devel/Documentation/filesystems/hfsplus.txt
> --- linux-2.6.22.1/Documentation/filesystems/hfsplus.txt  1969-12-31 
> 19:00:00.0 -0500
> +++ linux-2.6.22.1-devel/Documentation/filesystems/hfsplus.txt
> 2007-07-27 21:11:10.0 -0400
> @@ -0,0 +1,59 @@
> +
> +Macintosh HFSPlus Filesystem for Linux
> +==
> +
> +HFSPlus is a filesystem first introduced in MacOS 8.1.
> +HFSPlus has several extensions to HFS, including 32 bit allocation

32-bit

> +blocks, 255 character unicode filenames, and file sizes of 2^63 bytes.

   255-character

> +
> +
> +Mount options
> +=
> +
> +When mounting an HFSPlus filesystem, the following options are accepted:
> +
> +  creator=, type=
> + Specifies the creator/type values as shown by the MacOS finder
> + used for creating new files.  Default values: ''.
> +
> +  uid=n, gid=n
> + Specifies the user/group that owns all files on the filesystem
> + that have uninitialized permissions structures.
> + Default:  user/group id of the mounting process.
> +
> +  umask=n
> + Specifies the umask used for files and directories that have
> + uninitialized permissions structures.
> + Default:  umask of the mounting process.

in octal

> +  session=n
> + Select the CDROM session to mount as HFSPlus filesystem.  Defaults to
> + leaving that decision to the CDROM driver.  This option will fail
> + with anything but a CDROM as underlying devices.
> +
> +  part=n
> + Select partition number n from the devices.  Does only makes
> + sense for CDROMS because they can't be partitioned under Linux.

  CDROMs or CD-ROMs
and this sentence is confusing to me.  Please check it.

> + For disk devices the generic partition parsing code does this
> + for us.  Defaults to not parsing the partition table at all.
> +
> +  decompose
> + Decompose file name characters.
> +
> +  nodecompose
> + Do not decompose file name characters.
> +
> +  force
> + Used to force write access to volumes that are marked as journalled
> + or locked.  Use at your own risk.
> +
> +  nls=
> + Encoding to use when presenting file names.
> +
> +
> +References
> +==
> +
> +kernel source:   
> +
> +Apple Technote 1150  http://developer.apple.com/technotes/tn/tn1150.html
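Because Randy points out that umask is given in octal, a program that passes HFSPlus options through the data argument of mount(2) should format it with %o. A minimal sketch with a hypothetical helper name:

```c
#include <stdio.h>
#include <sys/types.h>

/*
 * Hypothetical helper: build an HFSPlus mount option string.  umask is
 * printed with %o because the option is interpreted as octal.
 */
static int build_hfsplus_opts(char *buf, size_t len, uid_t uid, gid_t gid,
			      mode_t umask_bits)
{
	return snprintf(buf, len, "uid=%u,gid=%u,umask=%o",
			(unsigned int)uid, (unsigned int)gid,
			(unsigned int)umask_bits);
}
```

For uid 1000, gid 100 and umask 022 this yields "uid=1000,gid=100,umask=22". That string would be passed as the last (data) argument of mount(2), with "hfsplus" as the filesystem type.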


---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***


Re: [PATCH 12/68] 0 -> NULL, for arch/powerpc

2007-07-27 Thread Paul Mackerras
Yoann Padioleau writes:

> When comparing a pointer, it's clearer to compare it to NULL than to 0.

As other people have said, if you're going to spend time on this,
testing (!buf) is more idiomatic in the kernel than (buf == NULL).
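For reference, the two forms are equivalent for pointers. A minimal sketch with hypothetical helper names:

```c
#include <stddef.h>

/* Preferred kernel idiom for "allocation failed / pointer unset". */
static int buf_missing(const void *buf)
{
	return !buf;
}

/* Equivalent test written explicitly against NULL. */
static int buf_missing_explicit(const void *buf)
{
	return buf == NULL;
}
```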

Paul.


Re: Linus 2.6.23-rc1

2007-07-27 Thread Linus Torvalds


On Sat, 28 Jul 2007, Kasper Sandberg wrote:
>
> I'm still not so keen about this, Ingo never did get CFS to match SD in
> smoothness for 3d applications, where my test subjects are quake(s),
> world of warcraft via wine, unreal tournament 2004. And this is despite
> many patches he sent me to try and tweak it.

You realize that different people get different behaviour, don't you? 
Maybe not.

People who think SD was "perfect" were simply ignoring reality. Sadly, 
that seemed to include Con too, which was one of the main reasons that I 
never ended up entertaining the notion of merging SD for very long at all: 
Con ended up arguing against people who reported problems, rather than 
trying to work with them.

Andrew also reported an oops in the scheduler when SD was merged into -mm, 
so there were other issues.

> As far as I'm concerned, I may be forced to unofficially maintain SD for 
> my own systems (although lots in the gaming community are bound to be 
> interested, as it does make games lots better)

You know what? You can do whatever you want to. That's kind of the point 
of open source. Keep people honest by having alternatives.

But the thing is, if you want to do a good job of doing that, here's a 
big hint: instead of keeping to your isolated world, instead of just 
talking about your own machine and ignoring other people's machines and 
issues and instead of just denying that problems may exist, and instead of 
attacking people who report problems, how about working with them?

That was where the SD patches fell down. They didn't have a maintainer 
that I could trust to actually care about any other issues than his own.

So here's a hint: if you think that your particular graphics card setup is 
the only one that matters, it's not going to be very interesting for 
anybody else. 

[ I realize that this comes as a shock to some of the SD people, but I'm 
  told that there was a university group that did some double-blind 
  testing of the different schedulers - old, SD and CFS - and that 
  everybody agreed that both SD and CFS were better than the old, but that 
  there was no significant difference between SD and CFS. You can try 
  asking Thomas Gleixner for more details. ]

I'm happy that SD was perfect for you. It wasn't for others, and it had 
nobody who was even interested in trying to solve those issues. 

As a long-term maintainer, trust me, I know what matters. And a person who 
can actually be bothered to follow up on problem reports is a *hell* of a 
lot more important than one who just argues with reporters.

Linus


Re: Problems with framebuffer in 2.6.22-git17

2007-07-27 Thread Antonino A. Daplas
On Sat, 2007-07-28 at 10:14 +0800, Antonino A. Daplas wrote:
> On Sat, 2007-07-28 at 02:06 +0100, Adrian McMenamin wrote:
> > On 28/07/07, Antonino A. Daplas <[EMAIL PROTECTED]> wrote:
> > 
> > >

> tmp = transp << var->transp.offset | red << var->red.offset |
>   green << var->green.offset | blue << var->blue.offset;
>  

The above should be:

tmp = regno << var->transp.offset | regno << var->red.offset |
	regno << var->green.offset | regno << var->blue.offset;
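Here is a userspace sketch of the corrected DIRECTCOLOR expression, with the blue channel shifted by blue.offset. struct bitfield and struct color_layout are simplified, hypothetical stand-ins for fbdev's struct fb_bitfield and the color fields of struct fb_var_screeninfo. The real structures also carry length and msb_right.

```c
#include <stdint.h>

/* Simplified stand-in for the offset part of struct fb_bitfield. */
struct bitfield {
	uint32_t offset;
};

/* Simplified stand-in for the color layout in fb_var_screeninfo. */
struct color_layout {
	struct bitfield red, green, blue, transp;
};

/*
 * DIRECTCOLOR pseudo_palette entry: every channel carries the palette
 * index regno, so the hardware palette does the actual color lookup.
 */
static uint32_t directcolor_entry(const struct color_layout *var,
				  uint32_t regno)
{
	return regno << var->transp.offset | regno << var->red.offset |
	       regno << var->green.offset | regno << var->blue.offset;
}
```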
 
Tony



Re: Problems with framebuffer in 2.6.22-git17

2007-07-27 Thread Antonino A. Daplas
On Sat, 2007-07-28 at 02:06 +0100, Adrian McMenamin wrote:
> On 28/07/07, Antonino A. Daplas <[EMAIL PROTECTED]> wrote:
> 
> >
> But certainly better at 16bpp
> 
> Can mess about with it later to see if I can get the colours right I suppose.
> 

You can start with pvr2fb_setcolreg() and pvr2fb_set_pal_entry().

A few things I've noticed:

1. In pvr2fb_setcolreg(), pvr2fb_set_pal_entry() is called for bpp 16
and 32.  This means that the palette is modifiable, so
FB_VISUAL_TRUECOLOR is probably not the correct visual for this driver,
FB_VISUAL_DIRECTCOLOR is more appropriate.

So, you either remove the call to set_pal_entry() in setcolreg() or
change the visual to FB_VISUAL_DIRECTCOLOR. Of course, with directcolor,
the pseudo_palette is now written with tmp as:

tmp = transp << var->transp.offset | red << var->red.offset |
	green << var->green.offset | blue << var->blue.offset;
 

2. Perhaps the 3rd parameter passed to set_pal_entry() is not correct?
Maybe you can try doing it like this for all bpp's, assuming ARGB?

pvr2fb_set_pal_entry(par, regno, transp << 24 | red << 16 | green << 8 |
		     blue);

And if you want to maintain FB_VISUAL_TRUECOLOR format, initialize the
palette once on init:

for (i = 0; i < 256; i++)
	pvr2fb_set_pal_entry(par, i, i << 24 | i << 16 | i << 8 | i);

to create a linear color map consistent with truecolor, then remove all
other calls to pvr2fb_set_pal_entry().
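The linear color map suggested above can be sketched as follows, assuming ARGB8888 packing. The array stands in for the per-index pvr2fb_set_pal_entry() calls, and build_linear_map() is a hypothetical name.

```c
#include <stdint.h>

/*
 * Fill a 256-entry linear map, assuming ARGB8888.  The index is an
 * unsigned 32-bit value because shifting a plain int holding 255 left
 * by 24 would overflow a 32-bit signed int.
 */
static void build_linear_map(uint32_t map[256])
{
	for (uint32_t i = 0; i < 256; i++)
		map[i] = i << 24 | i << 16 | i << 8 | i;
}
```

In the driver, each map[i] would be what gets handed to pvr2fb_set_pal_entry() for index i.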

Tony




Re: Linus 2.6.23-rc1

2007-07-27 Thread Kasper Sandberg
(sorry for repost, but there seemed to have been some troubles..)

On Sun, 2007-07-22 at 14:04 -0700, Linus Torvalds wrote:
> Ok, right on time, two weeks after 2.6.22, there's a 2.6.23-rc1 out there.
> 
> And it has a *ton* of changes as usual for the merge window, way too much 
> for me to be able to post even just the shortlog or diffstat on the 
> mailing list (but I had many people who wanted the full logs to stay 
> around, so you'll continue to see those being uploaded to kernel.org).
> 
> Lots of architecture updates (for just about all of them - x86[-64], arm, 
> alpha, mips, ia64, powerpc, s390, sh, sparc, um..), lots of driver updates 
> (again, all over - usb, net, dvb, ide, sata, scsi, isdn, infiniband, 
> firewire, i2c, you name it).
> 
> Filesystems, VM, networking, ACPI, it's all there. And virtualization all 
> over the place (kvm, lguest, Xen).
> 
> Notable new things might be the merge of the cfs scheduler, and the UIO 
> driver infrastructure might interest some people.
> 
I'm still not so keen about this. Ingo never did get CFS to match SD in
smoothness for 3D applications, where my test subjects are Quake(s),
World of Warcraft via Wine, and Unreal Tournament 2004. And this is despite
many patches he sent me to try and tweak it. As far as I'm concerned, I
may be forced to unofficially maintain SD for my own systems (although
lots in the gaming community are bound to be interested, as it does make
games lots better).



Re: [BUG] i386 relocatable kernel breaks /proc/kcore debugging

2007-07-27 Thread Eric W. Biederman
Maxim Levitsky <[EMAIL PROTECTED]> writes:

> Hello,
>
> Today I noticed that gdb gets confused when I try to load a vmlinux image.
> gdb 'thinks' that all kernel symbols are below 0x8000 , while they are at 
> 0xC000
>
> Turning CONFIG_RELOCATABLE off fixes that, so I assume that is the reason for 
> that.
>
> I am using 2.6.23-rc1, although I don't think that older versions are better.
>
> Best regards,
>   Maxim Levitsky

Weird.

Vivek could this be related to the problem of problematic core dumps we
were seeing earlier?

Eric


> PS: 
> This is what gdb says:

>
> (gdb) disassemble sys_open
> Dump of assembler code for function sys_open:
> 0x8026fa60 :Cannot access memory at address 0x8026fa60
>
> While real address of sys_open is:
>
> [EMAIL PROTECTED] linux-2.6]# nm ./.obj/vmlinux |  grep sys_open
> .
> c016ea60 T sys_open
>
> Strange, but gdb recognizes the above address directly:
>
> (gdb) disassemble 0xc016ea60
> Dump of assembler code for function sys_open:
> 0xc016ea60 :sub$0x4,%esp
> 0xc016ea63 :mov0x10(%esp),%eax
> ...


Re: swap-prefetch: A smart way to make good use of idle resources (was: updatedb)

2007-07-27 Thread Chris Snook

Al Boldi wrote:

People wrote:

I believe the users who say their apps really do get paged back in
though, so suspect that's not the case.

Stopping the bush-circumference beating, I do not. -ck (and gentoo) have
this massive Calimero thing going among their users where people are
much less interested in technology than in how the nasty big kernel
meanies are keeping them down (*).

I think the problem is elsewhere. Users don't say: "My apps get paged
back in." They say: "My system is more responsive". They really don't
care *why* the reaction to a mouse click that takes three seconds with
a mainline kernel is instantaneous with -ck. Nasty big kernel meanies,
OTOH, want to understand *why* a patch helps in order to decide whether
it is really a good idea to merge it. So you've got a bunch of patches
(aka -ck) which visibly improve the overall responsiveness of a desktop
system, but apparently no one can conclusively explain why or how they
achieve that, and therefore they cannot be merged into mainline.

I don't have a solution to that dilemma either.


IMHO, what everybody agrees on, is that swap-prefetch has a positive effect 
in some cases, and nobody can prove an adverse effect (excluding power 
consumption).  The reason for this positive effect is also crystal clear:  
It prefetches from swap on idle into free memory, ie: it doesn't force 
anybody out, and they are the first to be dropped without further swap-out, 
which sounds really smart.


Conclusion:  Either prove swap-prefetch is broken, or get this merged quick.


If you can't prove why it helps and doesn't hurt, then it's a hack, by 
definition.  Behind any performance hack is some fundamental truth that can be 
exploited to greater effect if we reason about it.  So let's reason about it. 
I'll start.


Resource size has been outpacing processing latency since the dawn of time. 
Disks get bigger much faster than seek times shrink.  Main memory and cache keep 
growing, while single-threaded processing speed has nearly ground to a halt.


In the old days, it made lots of sense to manage resource allocation in pages 
and blocks.  In the past few years, we started reserving blocks in ext3 
automatically because it saves more in seek time than it costs in disk space. 
Now we're taking preallocation and antifragmentation to the next level with 
extent-based allocation in ext4.


Well, we're still using bitmap-style allocation for pages, and the prefetch-less 
swap mechanism adheres to this design as well.  Maybe it's time to start 
thinking about memory in a somewhat more extent-like fashion.


With swap prefetch, we're only optimizing the case when the box isn't loaded and 
there's RAM free, but we're not optimizing the case when the box is heavily 
loaded and we need RAM to be free.  This is a complete reversal of sane 
development priorities.  If swap batching is an optimization at all (and we have 
empirical evidence that it is) then it should also be an optimization to swap 
out chunks of pages when we need to free memory.


So, how do we go about this grouping?  I suggest that if we keep per-VMA 
reference/fault/dirty statistics, we can tell which logically distinct chunks of 
memory are being regularly used.  This would also allow us to apply different page 
replacement policies to chunks of memory that are being used in different fashions.
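The following sketch shows one way to picture the per-VMA statistics idea. struct vma_stats, the scoring rule, and the 4 KiB page size are all assumptions made for illustration. Nothing here corresponds to an existing kernel interface.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-VMA counters sampled over some interval. */
struct vma_stats {
	unsigned long start, end;	/* address range [start, end) */
	uint64_t refs;			/* referenced-bit hits */
	uint64_t faults;		/* page faults taken */
	uint64_t dirty;			/* pages dirtied */
};

/* Activity per assumed 4 KiB page, scaled by 1000 for integer math. */
static uint64_t activity_per_page(const struct vma_stats *v)
{
	uint64_t pages = (v->end - v->start) / 4096;

	if (pages == 0)
		pages = 1;
	return (v->refs + v->faults + v->dirty) * 1000 / pages;
}

/* Pick the least active VMA as the first candidate for bulk page-out. */
static size_t coldest_vma(const struct vma_stats *v, size_t n)
{
	size_t best = 0;

	for (size_t i = 1; i < n; i++)
		if (activity_per_page(&v[i]) < activity_per_page(&v[best]))
			best = i;
	return best;
}
```

A VMA chosen this way could then be written out in large contiguous chunks instead of page by page.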


With such statistics, we could then page out VMAs in 2MB chunks when we're under 
memory pressure, also giving us the option of transparently paging them back in 
to hugepages when we have the memory free, once anonymous hugepage support is in 
place.


I'm inclined to view swap prefetch as a successful scientific experiment, and 
use that data to inform a more reasoned engineering effort.  If we can design 
something intelligent which happens to behave more or less like swap prefetch 
does under the circumstances where swap prefetch helps, and does something else 
smart under the circumstances where swap prefetch makes no discernable 
difference, it'll be a much bigger improvement.


Because we cannot prove why the existing patch helps, we cannot say what impact 
it will have when things like virtualization and solid state drives radically 
change the coefficients of the equation we have not solved.  Providing a sysctl 
to turn off a misbehaving feature is a poor substitute for doing it right the 
first time, and leaving it off by default will ensure that it only gets used by 
the handful of people who know enough to rebuild with the patch anyway.


Let's talk about how we can make page replacement smarter, so it naturally 
accomplishes what swap prefetch accomplishes, as part of a design we can reason 
about.


CC-ing linux-mm, since that's where I think we should take this next.

-- Chris

Re: Problems with reading DVD using 2.6.21.5

2007-07-27 Thread Robert Hancock

Manuel Reimer wrote:

Hello,

today I've tried to install Slackware 12.0.

As the installer just "skipped" some install steps, I tried to find the 
error.


The problem seems to be unreadable parts on the DVD:

http://pastebin.com/f381e8a88

But the DVD is OK. I've checked the MD5sum directly from disc on the 
same system using the same DVD drive.


dmesg says:

http://pastebin.com/f63c5c389

The kernel used on the Slackware setup disk uses SMP, but my hardware 
doesn't support this (I get an error in dmesg). Could this (SMP kernel on a 
non-SMP system) cause such bugs?


Is this a known bug? How could code, which breaks DVD access, get into 
stable 2.6.21.5?


I don't think this is a bug, the drive was told to read a sector and 
returned error SK=03, ASC=02, ASCQ=00 which is "NO SEEK COMPLETE", in 
other words it couldn't find that sector. Could be that the disc is 
marginally readable and only sometimes causes read errors.
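The three values come from the sense data returned by the drive. Below is a minimal sketch of parsing fixed-format sense data (response code 0x70 or 0x71), with hypothetical names. Only the single combination quoted above gets a description: sense key 3 is MEDIUM ERROR, and ASC/ASCQ 02/00 is NO SEEK COMPLETE.

```c
#include <stddef.h>
#include <stdint.h>

/* Decoded fields of a SCSI sense buffer. */
struct sense_info {
	uint8_t key;	/* sense key */
	uint8_t asc;	/* additional sense code */
	uint8_t ascq;	/* additional sense code qualifier */
};

/*
 * Parse fixed-format sense data: sense key in the low nibble of byte 2,
 * ASC in byte 12, ASCQ in byte 13.  Returns 0 on success, -1 if the
 * buffer is too short or not in fixed format.
 */
static int parse_fixed_sense(const uint8_t *buf, size_t len,
			     struct sense_info *out)
{
	uint8_t resp;

	if (!buf || len < 14)
		return -1;
	resp = buf[0] & 0x7f;
	if (resp != 0x70 && resp != 0x71)
		return -1;
	out->key = buf[2] & 0x0f;
	out->asc = buf[12];
	out->ascq = buf[13];
	return 0;
}

static const char *describe_sense(const struct sense_info *s)
{
	if (s->key == 0x03 && s->asc == 0x02 && s->ascq == 0x00)
		return "MEDIUM ERROR: NO SEEK COMPLETE";
	return "unknown";
}
```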


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: [PATCH] Fix DMA on Dreamcast

2007-07-27 Thread Peter Bortas
On 7/26/07, Paul Mundt <[EMAIL PROTECTED]> wrote:
> On Thu, Jul 26, 2007 at 02:59:51PM +0200, Peter Bortas wrote:
> > On 7/26/07, Marcus Comstedt <[EMAIL PROTECTED]> wrote:
> > > "Peter Bortas" <[EMAIL PROTECTED]> writes:
> > > > On 7/21/07, Adrian McMenamin <[EMAIL PROTECTED]> wrote:
> > > >> On 21/07/07, Peter Bortas <[EMAIL PROTECTED]> wrote:
> > > >> > Sidenote: Does Linux handle the Dreamcast DMA errata?
> > > >>
> > > >> You need to explain what you mean (at least to me!).
> > > >>
> > > >> If you mean will it degrade gracefully - not without this patch if set
> > > >> to the (correct) defconfig. With iffy settings it will.
> > > >
> > > > If I remember correctly (and that's a big if since I last looked at it
> > > > in 2001) some revisions of the Dreamcast hardware would sporadically
> > > > lock up if you scheduled a new DMA request too quickly after a previous
> > > > one, even if you checked the ready bit. It's worked around by a delay
> > > > of X microseconds as recommended by Sega engineers. I don't remember
> > > > the value of X, nor where exactly in the flow this workaround should
> > > > be applied.
> > > >
> > > > Adding Marcus in case he has a better memory than me.
> > >
> > > I don't remember any such delay.  Are you sure you're not thinking
> > > about the G2 bus problem (where accesses need to be programmatically
> > > serialized, whether they are PIO or DMA)?
> >
> > In that case my memory is worse than I thought. I'll see if I can dig
> > up my old notes.
> >
> We've never hit any problems with the SH DMAC, so it would be interesting
> if you had some more information on this. The G2 problems are well known
> and documented, and the driver takes care of those issues already.

After grepping through some 8GiB of archived mails and notes, it seems
Marcus is absolutely correct: I was thinking of the known G2 problem.

-- 
Peter


scripts/mod/file2alias.c cross compile problem

2007-07-27 Thread Adrian Bunk
On Fri, Jul 27, 2007 at 04:21:47PM -0700, Luck, Tony wrote:
> > So it seems on ia64 with gcc 3.3.6 there's some 8 byte alignment of the 
> > array members?
> >
> > Sam and the ia64 maintainers Cc'ed - they might know better what's going 
> > on here.
> 
> This ia64 maintainer is baffled ... but I don't see the problem here (perhaps
> because my build machine has gcc 3.4.6).


I found what causes this problem, and it only occurs during cross 
compilation.


The struct is:

#define ACPI_ID_LEN 9

struct acpi_device_id {
	__u8 id[ACPI_ID_LEN];
	kernel_ulong_t driver_data;
};


When compiling for ia64, this results in:

struct acpi_device_id {
	__u8 id[9];
	uint64_t driver_data;
};


Due to different padding after id[], sizeof(struct acpi_device_id) for
ia64 is 20 bytes on i386 but 24 bytes on ia64.

scripts/mod/file2alias.c is compiled with HOSTCC and ensures that 
kernel_ulong_t is correct (in this case uint64_t for ia64), but it can't 
cope with different padding on different architectures.
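The padding difference can be reproduced on any host with a small userspace sketch. The _host struct mirrors what the ia64 layout looks like to HOSTCC. The _aligned variant is a hypothetical way to force the ia64 layout on an i386 host using the GCC/Clang aligned attribute. It is an illustration, not the fix adopted in the kernel.

```c
#include <stddef.h>
#include <stdint.h>

#define ACPI_ID_LEN 9

/* Padding before driver_data depends on the HOST ABI (4 on i386). */
struct acpi_device_id_host {
	uint8_t id[ACPI_ID_LEN];
	uint64_t driver_data;
};

/*
 * Hypothetical: force 8-byte alignment so every host computes the ia64
 * layout, 9 bytes + 7 padding + 8 = 24, even where uint64_t members are
 * only 4-byte aligned by default.
 */
struct acpi_device_id_aligned {
	uint8_t id[ACPI_ID_LEN];
	uint64_t driver_data __attribute__((aligned(8)));
};
```

On an i386 host, sizeof(struct acpi_device_id_host) would be 20, while the aligned variant is 24 everywhere.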


> -Tony

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed



Re: [PATCH 06/68] 0 -> NULL, for arch/frv

2007-07-27 Thread Mike Frysinger
On 7/27/07, Robin Getz <[EMAIL PROTECTED]> wrote:
> If there is a definite style or semantic preference that everyone should live
> with - does it make sense to put checks in checkpatch.pl to enforce it?

checkpatch.pl does not have enough semantic knowledge to know if the
thing being tested is a pointer ... I don't know if the sparse utility
would be able to pick it out, as I'm not familiar with what level that
thing runs at
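The limitation is easy to see in the sketch below. The expressions are textually identical for an integer and a pointer, and comparing a pointer with 0 is valid C. Only a tool with type information can tell the cases apart. The helper names are hypothetical.

```c
#include <stddef.h>

/* Same "!x" token pattern; only the declared type of x differs. */
static int test_int(int x)
{
	return !x;
}

static int test_ptr(const char *x)
{
	return !x;
}

/* The form the 0 -> NULL patches rewrite: legal, since 0 is a null
 * pointer constant, but indistinguishable from an integer compare
 * without type information. */
static int ptr_eq_zero(const char *x)
{
	return x == 0;
}
```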
-mike


Re: [patch] mm: reduce pagetable-freeing latencies

2007-07-27 Thread Benjamin Herrenschmidt

> > So I'll first do patch #1, which will not fix the problem, but will make
> > the fix easier to fit in, in the meantime, please provide feedback of
> > your preferred solution for avoiding the get/put_cpu of the 2 above,
> > unless you find a good 3rd one.
> 
> I too would prefer the former solution. I think preemption notifiers are
> a particularly iffy hack.
> 
> You could perhaps use C99 variable length arrays to avoid the stack
> waste when not needed, however Andi once told me that generates rather
> dubious code.

As I'm sweeping through arch code etc., preparing the ground for the
proper mmu_gather surgery, I've been thinking about how to deal with
that per-cpu page list, and finally came up with the idea that the best
we can do is something along the lines of trying to allocate the list via gfp,
and if that fails, falling back to a (smaller than now) per-cpu list. I'm
reworking the interfaces so that the higher-level code doesn't have to
care whether preemption is enabled or disabled at a given point.
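The allocate-or-fall-back approach can be sketched in userspace. malloc() stands in for a gfp allocation, and a single static array stands in for the smaller per-cpu list. All names and the FALLBACK_PAGES size are hypothetical. Unlike a real per-cpu list, the static array is not protected against concurrent use.

```c
#include <stdlib.h>

#define FALLBACK_PAGES 8	/* hypothetical size of the small list */

/* Hypothetical batch of pages waiting to be freed. */
struct page_batch {
	void **pages;
	size_t capacity;
	int on_heap;		/* 1 if pages came from the allocator */
};

/* Single-threaded stand-in for a per-cpu fallback list. */
static void *fallback_storage[FALLBACK_PAGES];

/* Try the large allocation first, else use the small fallback. */
static size_t batch_init(struct page_batch *b, size_t wanted)
{
	b->pages = malloc(wanted * sizeof(*b->pages));
	if (b->pages) {
		b->capacity = wanted;
		b->on_heap = 1;
	} else {
		b->pages = fallback_storage;
		b->capacity = FALLBACK_PAGES;
		b->on_heap = 0;
	}
	return b->capacity;
}

static void batch_release(struct page_batch *b)
{
	if (b->on_heap)
		free(b->pages);
	b->pages = NULL;
	b->capacity = 0;
}
```

Callers only look at the returned capacity, so they do not need to know which path was taken.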

Ben.




Re: [PATCH 06/68] 0 -> NULL, for arch/frv

2007-07-27 Thread Robin Getz
On Fri 27 Jul 2007 06:18, Yoann Padioleau pondered:
> David Howells <[EMAIL PROTECTED]> writes:
> 
> > Yoann Padioleau <[EMAIL PROTECTED]> wrote:
> >
> >> When comparing a pointer, it's clearer to compare it to NULL than to
> 0.
> >
> > Can you make them of style:
> >
> > if (!x)
> 
> Yes I can. I can make another semantic patch later to do that
> transformation. But some people may prefer (x == NULL) to (!x)
> so I don't know. I think the transformation of
> some 0 to NULL is less controversial.
> 
> 
> >
> > instead?

If there is a definite style or semantic preference that everyone should live 
with - does it make sense to put checks in checkpatch.pl to enforce it?

-Robin


[rfc] direct IO submission and completion scalability issues

2007-07-27 Thread Siddha, Suresh B
We have been looking into the linux kernel direct IO scalability issues with
database workloads. Comments and suggestions on our below experiments are
welcome.

In the linux kernel, direct IO requests are not batched at the block layer,
i.e., as a new request comes in, it gets directly submitted to the
IO controller on the same cpu on which the request originates. And the IO completion
likely happens on a different cpu which is processing interrupts. This results
in cacheline bouncing of some of the hot kernel cachelines (like timers, scsi
cmds, slab, sched, etc) and is becoming an important scalability issue
as the number of cpus and distance between them increase with multi-core
and numa.

In the case of controllers which support RIO/ZIO modes (like some qla2xxx),
the IO submission path on each cpu also checks if there are any completed
IO commands in the response queue and triggers a softirq on the same cpu
to process the completed commands. This results in each logical cpu in the
system spending some time in softirq processing, which causes contention on
spinlocks and other data structures.

It is not clear when IO controllers with multiple request/response queues will
be available on the market. Once they are, we can dedicate each queue pair
to a group of cpus (or a node) and be done with this problem.
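If such multi-queue controllers appear, the cpu-to-queue assignment could be as simple as grouping consecutive cpus. A hypothetical sketch, assuming the cpus of one node are numbered consecutively (not true on every system) and that `cpu_to_queue_pair()` is an illustrative name, not a driver API:

```c
#include <assert.h>

/* Map a cpu onto one of nr_queue_pairs request/response queue pairs:
 * consecutive cpus share a pair, so submission and completion for a
 * cpu stay within its group. */
static int cpu_to_queue_pair(int cpu, int nr_cpus, int nr_queue_pairs)
{
	int cpus_per_pair;

	assert(nr_queue_pairs > 0 && nr_cpus >= nr_queue_pairs);
	assert(cpu >= 0 && cpu < nr_cpus);

	/* Round up so every cpu maps to a valid pair even when the
	 * division is not exact. */
	cpus_per_pair = (nr_cpus + nr_queue_pairs - 1) / nr_queue_pairs;
	return cpu / cpus_per_pair;
}
```

For example, with 16 cpus and 4 queue pairs, cpus 0-3 share pair 0 and cpu 5 maps to pair 1.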

In the absence of such HW today, we were looking into possible solutions for
these problems and did a couple of experiments as part of this.

In the first experiment, we removed the completed IO command processing during
IO submission, so IO commands are now processed only on the cpu receiving
interrupts. This results in more interrupts (as we are not doing any
proactive processing), but we wanted to see if this is a win over each cpu
doing the softirq processing. This gave a 1.36% performance improvement on an
x86_64 MP system (16 logical cpus in total) and a 1.5% improvement on a two
node ia64 platform (2 nodes, 8 cores, 16 threads)
[please look at observation #1 below].

Reference patch for this:

diff --git a/drivers/scsi/qla2xxx/qla_iocb.c b/drivers/scsi/qla2xxx/qla_iocb.c
index c5b3c61..357a497 100644
--- a/drivers/scsi/qla2xxx/qla_iocb.c
+++ b/drivers/scsi/qla2xxx/qla_iocb.c
@@ -414,11 +414,6 @@ qla2x00_start_scsi(srb_t *sp)
 	WRT_REG_WORD(ISP_REQ_Q_IN(ha, reg), ha->req_ring_index);
 	RD_REG_WORD_RELAXED(ISP_REQ_Q_IN(ha, reg));	/* PCI Posting. */
 
-	/* Manage unprocessed RIO/ZIO commands in response queue. */
-	if (ha->flags.process_response_queue &&
-	    ha->response_ring_ptr->signature != RESPONSE_PROCESSED)
-		qla2x00_process_response_queue(ha);
-
 	spin_unlock_irqrestore(&ha->hardware_lock, flags);
 	return (QLA_SUCCESS);
 
@@ -844,11 +839,6 @@ qla24xx_start_scsi(srb_t *sp)
 	WRT_REG_DWORD(&reg->req_q_in, ha->req_ring_index);
 	RD_REG_DWORD_RELAXED(&reg->req_q_in);		/* PCI Posting. */
 
-	/* Manage unprocessed RIO/ZIO commands in response queue. */
-	if (ha->flags.process_response_queue &&
-	    ha->response_ring_ptr->signature != RESPONSE_PROCESSED)
-		qla24xx_process_response_queue(ha);
-
 	spin_unlock_irqrestore(&ha->hardware_lock, flags);
 	return QLA_SUCCESS;
 
Observation #1: This experiment puts heavy load on the cpu processing
interrupts. As such, equal distribution of task load by the scheduler didn't
give the expected performance improvement (as cpus with no interrupts race to
idle and migrate some tasks during idle balance, leading to some increase in
idle time as well as the costs associated with excessive task migration). We
tweaked our manual task binding so that cpus with no interrupts get
proportionally more load than cpus which process interrupts, and this gave a
nice performance boost as mentioned above. Perhaps we need to make the
scheduler load balancing
aware of the irq load on that cpu.
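The manual binding described above amounts to giving each cpu a share of tasks proportional to the capacity left after interrupt work. A minimal arithmetic sketch, assuming the irq load of each cpu is already known as a percentage (how to measure it is not addressed here, and `split_tasks()` is a hypothetical helper, not scheduler code):

```c
#include <assert.h>

/* Split nr_tasks across nr_cpus in proportion to the capacity each cpu
 * has left after interrupt work. irq_pct[i] is the (assumed, externally
 * measured) percentage of cpu i consumed by interrupts. */
static void split_tasks(int nr_tasks, const int *irq_pct, int nr_cpus, int *out)
{
	int total_cap = 0, assigned = 0, i;

	for (i = 0; i < nr_cpus; i++) {
		assert(irq_pct[i] >= 0 && irq_pct[i] <= 100);
		total_cap += 100 - irq_pct[i];
	}
	assert(total_cap > 0);

	/* Proportional share, rounded down. */
	for (i = 0; i < nr_cpus; i++) {
		out[i] = nr_tasks * (100 - irq_pct[i]) / total_cap;
		assigned += out[i];
	}

	/* Give the rounding remainder to the least irq-loaded cpu. */
	while (assigned < nr_tasks) {
		int best = 0;

		for (i = 1; i < nr_cpus; i++)
			if (irq_pct[i] < irq_pct[best])
				best = i;
		out[best]++;
		assigned++;
	}
}
```

With 12 tasks, one cpu spending 50% of its time in interrupts and one cpu with no interrupt load, this gives a 4/8 split.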

The second experiment we did was migrating the IO submission to the
IO completion cpu. Instead of submitting the IO on the same cpu where the
request arrived, in this experiment the IO submission is migrated to the
cpu that is processing IO completions (interrupts). This minimizes
access to remote cachelines (in the timer, slab and scsi layers). The
IO submission request is forwarded to the kblockd thread on the cpu receiving
the interrupts. As part of this, we also made the kblockd thread on each cpu
the highest priority thread, so that IO gets submitted as soon as possible on
the interrupt cpu without any delay. On an x86_64 SMP platform with 16 cores,
this resulted in a 2% performance improvement, and in a 3.3% improvement on a
two node ia64 platform.

A quick and dirty prototype patch (not meant for inclusion) for this IO
migration experiment is appended to this e-mail.

Observation #1 mentioned above is also applicable to this experiment. CPUs
processing interrupts now also have to cater to the IO submission/processing
load.

Observation #2: This introduces

[PATCH] Documentation: document HFSPlus

2007-07-27 Thread Wyatt Banks
From:   Wyatt Banks <[EMAIL PROTECTED]>

Documentation: document HFSPlus filesystem and its mount options.

Signed-off-by:  Wyatt Banks <[EMAIL PROTECTED]>

---

Patched against 2.6.22.1

diff -uprN linux-2.6.22.1/Documentation/filesystems/hfsplus.txt linux-2.6.22.1-devel/Documentation/filesystems/hfsplus.txt
--- linux-2.6.22.1/Documentation/filesystems/hfsplus.txt	1969-12-31 19:00:00.0 -0500
+++ linux-2.6.22.1-devel/Documentation/filesystems/hfsplus.txt	2007-07-27 21:11:10.0 -0400
@@ -0,0 +1,59 @@
+
+Macintosh HFSPlus Filesystem for Linux
+======================================
+
+HFSPlus is a filesystem first introduced in MacOS 8.1.
+HFSPlus has several extensions to HFS, including 32-bit allocation
+blocks, 255-character Unicode filenames, and file sizes of 2^63 bytes.
+
+
+Mount options
+=============
+
+When mounting an HFSPlus filesystem, the following options are accepted:
+
+  creator=, type=
+   Specifies the creator/type values as shown by the MacOS finder
+   used for creating new files.  Default values: ''.
+
+  uid=n, gid=n
+   Specifies the user/group that owns all files on the filesystem
+   that have uninitialized permissions structures.
+   Default:  user/group id of the mounting process.
+
+  umask=n
+   Specifies the umask used for files and directories that have
+   uninitialized permissions structures.
+   Default:  umask of the mounting process.
+
+  session=n
+   Select the CDROM session to mount as an HFSPlus filesystem.  Defaults to
+   leaving that decision to the CDROM driver.  This option will fail
+   with anything but a CDROM as the underlying device.
+
+  part=n
+   Select partition number n from the device.  This only makes
+   sense for CDROMs because they can't be partitioned under Linux.
+   For disk devices the generic partition parsing code does this
+   for us.  Defaults to not parsing the partition table at all.
+
+  decompose
+   Decompose file name characters.
+
+  nodecompose
+   Do not decompose file name characters.
+
+  force
+   Used to force write access to volumes that are marked as journalled
+   or locked.  Use at your own risk.
+
+  nls=
+   Encoding to use when presenting file names.
+
+
+References
+==========
+
+kernel source: 
+
+Apple Technote 1150 http://developer.apple.com/technotes/tn/tn1150.html



[ANNOUNCE] seekwatcher v0.3 IO graphing and animation

2007-07-27 Thread Chris Mason

Hello everyone,

I've tossed out seekwatcher v0.3.  The major changes are using rolling
averages to smooth out the seek and throughput graphs, and it can
generate mpgs of the IO done by a given trace.
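Seekwatcher's own implementation is not shown here; as a generic illustration of the smoothing technique mentioned, a trailing rolling (moving) average over a window of samples can be sketched in C like this:

```c
#include <assert.h>
#include <stddef.h>

/* Trailing rolling average: out[i] is the mean of the last `window`
 * samples ending at index i (fewer samples at the start of the series).
 * A running sum keeps this O(n) regardless of the window size. */
static void rolling_average(const double *in, double *out, size_t n, size_t window)
{
	double sum = 0.0;
	size_t i;

	assert(window > 0);
	for (i = 0; i < n; i++) {
		sum += in[i];
		if (i >= window)
			sum -= in[i - window];	/* drop the sample leaving the window */
		out[i] = sum / (double)(i < window ? i + 1 : window);
	}
}
```

A larger `window` smooths more aggressively at the cost of blurring short spikes.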

Here's a sample of the smoother graphs (creating 20 kernel trees):

http://oss.oracle.com/~mason/seekwatcher/ext3_vs_btrfs_vs_xfs.png

There are details and sample movies of the kernel tree run at:

http://oss.oracle.com/~mason/seekwatcher

-chris


Re: RFT: updatedb "morning after" problem [was: Re: -mm merge plans for 2.6.23]

2007-07-27 Thread Daniel Hazelton
On Friday 27 July 2007 19:29:19 Andi Kleen wrote:
> > Any faults in that reasoning?
>
> GNU sort uses a merge sort with temporary files on disk. Not sure
> how much it keeps in memory during that, but it's probably less
> than 150MB. At some point the dirty limit should kick in and write back the
> data of the temporary files; so it's not quite the same as anonymous
> memory. But it's not that different given.

Yes, this should occur. But how many programs use temporary files like that? 
From what I can tell FireFox and OpenOffice both keep all their data in 
memory, only using a single file for some buffering purposes. When they get 
pushed out by a memory hog (either short term or long term) it takes several 
seconds for them to be swapped back in. (I'm on a P4-1.3GHz machine with 1G 
of ram and rarely run more than four programs (Mail Client, XChat, FireFox 
and a console window) and I've seen this lag in FireFox when switching to it 
after starting OOo. I've also seen the same sort of lag when exiting OOo. 
I'll see about getting some numbers about this)

> It would be better to measure than to guess. At least Andrew's measurements
> on 128MB actually didn't show updatedb being really that big a problem.

I agree. As I've said previously, it isn't updatedb itself which causes the 
problem. It's the way the VFS cache seems to just expand and expand - to the 
point of evicting pages to make room for itself. However, I may be wrong 
about that - I haven't actually tested it for myself, just looked at the 
numbers and other information that has been posted in this thread.

> Perhaps some people have much more files or simply a less efficient
> updatedb implementation?

Yes, it could be the proliferation of files. It could also be some other sort 
of problem that is exposing a corner-case in the VFS cache or the MM. I, 
personally, am of the opinion that it is likely the aforementioned corner 
case for people reporting the "updatedb" problem. If it is, then 
swap-prefetch is just papering over the problem. However I do not have the 
knowledge and understanding of the subsystems involved to be able to do much 
more than make a (probably wrong) guess.

> I guess the people who complain here that loudly really need to supply
> some real numbers.

I've seen numerous "real numbers" posted about this. As was said earlier in 
the thread "every time numbers are posted they are claimed to be no good". 
But hey, nobody's perfect :)

Anyway, the discussion seems to be turning to the technical merits of 
swap-prefetch...

Now, a completely different question:
During the research (and lots of thinking) I've been doing while this thread 
has been going on I've often wondered why swap prefetch wasn't already in the 
kernel. The problem of slow swap-in has long been known, and, given current 
hardware, the optimal solution would be some sort of data prefetch - similar 
to what is done to speed up normal disk reads. Swap prefetch looks like it 
does exactly that. The algo could be argued over and/or improved (to suggest 
ways to do that I'd have to give it more than a 10 minute look) but it does 
provide a speed-up.

This speed increase will probably be enjoyed more by the home users, but the 
performance increase could also help on enterprise systems.

Now I'll be the first one to admit that there is a trade-off there - it will 
cause more power to be used because the disks don't get a chance to spin 
down (or go through a cycle every time the prefetch system starts) but that 
could, potentially, be alleviated by having "laptop mode" switch it off.

(And no, I'm not claiming that it is perfect - but then, what is when it's 
first merged into the kernel?)

DRH

-- 
Dialup is like pissing through a pipette. Slow and excruciatingly painful.


Re: Problems with framebuffer in 2.6.22-git17

2007-07-27 Thread Adrian McMenamin
On 28/07/07, Antonino A. Daplas <[EMAIL PROTECTED]> wrote:

>
> Is this with commit a66ad56eb2c9644717da4d7f05f971d6786145e3 reverted?
> Reapply this commit again, it might (fingers crossed) correct the color
> problem.
>
> As to your display doubling/quadrupling with bpp 24/32, I don't have any
> answers (no hardware) though it seems to be a framebuffer pitch/display
> width mismatch.
>

Mostly solved the colour problem at 16bpp (black background but pale
blue text - had previously been white). At 32bpp just as before -
oversized and yellow.

At 24bpp much as before too - all against black but two boot logos in
greenish shade and everything doubled up on screen in greenish shade
(ie around half of pixels in console text message on left, around half
in repeat on right).

But certainly better at 16bpp

Can mess about with it later to see if I can get the colours right I suppose.

Adrian


Re: Volanomark slows by 80% under CFS

2007-07-27 Thread Andrea Arcangeli
On Fri, Jul 27, 2007 at 08:31:19PM -0400, Chris Snook wrote:
> I think Volanomark is being pretty stupid, and deserves to run slowly, but 

Indeed, any app doing what volanomark does is pretty inefficient.

But this is not the point. I/O schedulers are pluggable to help
inefficient apps too. If apps were extremely smart they would all
use async-io for their reads, and there wouldn't be any need for the
anticipatory scheduler, to take just one example.

The fact is there's no technical reason why we should be forbidden
from choosing between CFS and O(1), at least at boot time.
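For comparison, the I/O scheduler is already selectable at boot with the `elevator=` option. A minimal sketch of the same idea for CPU schedulers, assuming a hypothetical `sched=` boot option and a registration table; neither exists in the kernel discussed here:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical registration table of CPU scheduler policies. */
struct sched_policy {
	const char *name;
	const char *description;
};

static const struct sched_policy policies[] = {
	{ "cfs", "Completely Fair Scheduler" },	/* first entry is the default */
	{ "o1",  "O(1) scheduler" },
};

/* Choose a policy from the value of a hypothetical "sched=" boot option,
 * falling back to the default entry when the name is missing or unknown. */
static const struct sched_policy *select_policy(const char *opt)
{
	size_t i;

	for (i = 0; i < sizeof(policies) / sizeof(policies[0]); i++)
		if (opt && strcmp(opt, policies[i].name) == 0)
			return &policies[i];
	return &policies[0];
}
```

Unknown or missing names fall back to the default entry, so a typo on the command line still boots.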


Re: [RFC] scheduler: improve SMP fairness in CFS

2007-07-27 Thread hui
On Fri, Jul 27, 2007 at 07:36:17PM -0400, Chris Snook wrote:
> I don't think that achieving a constant error bound is always a good thing. 
>  We all know that fairness has overhead.  If I have 3 threads and 2 
> processors, and I have a choice between fairly giving each thread 1.0 
> billion cycles during the next second, or unfairly giving two of them 1.1 
> billion cycles and giving the other 0.9 billion cycles, then we can have a 
> useful discussion about where we want to draw the line on the 
> fairness/performance tradeoff.  On the other hand, if we can give two of 
> them 1.1 billion cycles and still give the other one 1.0 billion cycles, 
> it's madness to waste those 0.2 billion cycles just to avoid user jealousy. 
>  The more complex the memory topology of a system, the more "free" cycles 
> you'll get by tolerating short-term unfairness.  As a crude heuristic, 
> scaling some fairly low tolerance by log2(NCPUS) seems appropriate, but 
> eventually we should take the boot-time computed migration costs into 
> consideration.

You have to consider the target for this kind of code. There are applications
where you need something that falls within a constant error bound. According
to the numbers, the current CFS rebalancing logic doesn't achieve that to
any degree of rigor. So CFS is ok for SCHED_OTHER, but not for anything more
strict than that.

Even the rt overload code (from my memory) is subject to these limitations
as well until it's moved to use a single global queue while using CPU
binding to turn off that logic. It's the price you pay for accuracy.
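Chris's suggested heuristic of scaling a low base tolerance by log2(NCPUS) is simple arithmetic. A sketch, assuming the floor of log2 and a caller-chosen base value in nanoseconds (both choices are illustrative, not taken from CFS):

```c
#include <assert.h>

/* Integer floor(log2(n)) for n >= 1. */
static unsigned int ilog2_u(unsigned int n)
{
	unsigned int r = 0;

	assert(n >= 1);
	while (n >>= 1)
		r++;
	return r;
}

/* Hypothetical imbalance tolerance: base_ns scaled by log2(NCPUS).
 * On one cpu log2(1) = 0, so no imbalance is tolerated, which is
 * harmless since there is nothing to balance there. */
static unsigned long fairness_tolerance_ns(unsigned long base_ns, unsigned int ncpus)
{
	return base_ns * ilog2_u(ncpus);
}
```

With a 1 ms base this tolerates 1 ms of imbalance on 2 cpus and 4 ms on 16.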

> If we allow a little short-term unfairness (and I think we should) we can 
> still account for this unfairness and compensate for it (again, with the 
> same tolerance) at the next rebalancing.

Again, it's a function of *when* and depends on the application.

> Adding system calls, while great for research, is not something which is 
> done lightly in the published kernel.  If we're going to implement a user 
> interface beyond simply interpreting existing priorities more precisely, it 
> would be nice if this was part of a framework with a broader vision, such 
> as a scheduler economy.

I'm not sure what you mean by scheduler economy, but CFS can and should
be extended to handle proportional scheduling which is outside of the
traditional Unix priority semantics. Having a new API to get at this is
unavoidable if you want it to eventually support -rt oriented applications
that have bandwidth semantics.

All deadline based schedulers have API mechanisms like this to support
extended semantics. This is no different.

> I had a feeling this patch was originally designed for the O(1) scheduler, 
> and this is why.  The old scheduler had expired arrays, so adding a 
> round-expired array wasn't a radical departure from the design.  CFS does 
> not have an expired rbtree, so adding one *is* a radical departure from the 
> design.  I think we can implement DWRR or something very similar without 
> using this implementation method.  Since we've already got a tree of queued 
> tasks, it might be easiest to basically break off one subtree (usually just 
> one task, but not necessarily) and migrate it to a less loaded tree 
> whenever we can reduce the difference between the load on the two trees by 
> at least half.  This would prevent both overcorrection and undercorrection.

> The idea of rounds was another implementation detail that bothered me.  In 
> the old scheduler, quantizing CPU time was a necessary evil.  Now that we 
> can account for CPU time with nanosecond resolution, doing things on an 
> as-needed basis seems more appropriate, and should reduce the need for 
> global synchronization.

Well, there's nanosecond resolution with no mechanism that exploits it for
rebalancing. Rebalancing in general is a pain and the code for it is
generally orthogonal to the in-core scheduler data structures that are in
use, so I don't understand the objection to this argument and the choice
of methods. If it gets the job done, then these kinds of choices don't
have that much meaning.
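The migration rule Chris proposes (move a subtree only if that at least halves the load difference between the two trees) can be written as a small predicate. A hypothetical sketch, with loads as abstract integers and `should_migrate()` an illustrative name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Move a group of tasks with total load w from the busier queue to the
 * less loaded one only if the move at least halves the load difference.
 * This rejects both undercorrection (w too small) and overcorrection
 * (w so large the imbalance just flips sides). */
static bool should_migrate(long busy, long idle, long w)
{
	long before, after;

	assert(busy >= idle && w >= 0 && w <= busy);
	before = busy - idle;
	after = labs((busy - w) - (idle + w));
	return w > 0 && 2 * after <= before;
}
```

With loads of 10 and 2, moving 4 or 6 units passes, while moving 1 (undercorrection) or 7 (overcorrection) is rejected.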

> In summary, I think the accounting is sound, but the enforcement is 
> sub-optimal for the new scheduler.  A revision of the algorithm more 
> cognizant of the capabilities and design of the current scheduler would 
> seem to be in order.

That would be nice. But the amount of error in Tong's solution is much
less than the current CFS logic as was previously tested even without
consideration to high resolution clocks.

So you have to give some kind of credit for that approach and recognize
that current methods in CFS are technically a dead end if there's a need for
strict fairness in a more rigorous run category than SCHED_OTHER.

> I've referenced many times my desire to account for CPU/memory hierarchy in 
> these patches.  At present, I'm not sure we have sufficient infrastructure 
> in the kernel to automatically optimize for system topology, but I think 
> whatever de

Re: Problems with framebuffer in 2.6.22-git17

2007-07-27 Thread Antonino A. Daplas
On Sat, 2007-07-28 at 01:32 +0100, Adrian McMenamin wrote:
> On 28/07/07, Antonino A. Daplas <[EMAIL PROTECTED]> wrote:
> > On Fri, 2007-07-27 at 23:25 +0100, Adrian McMenamin wrote:
> > > On 27/07/07, Antonino A. Daplas <[EMAIL PROTECTED]> wrote:
> > > > On Fri, 2007-07-27 at 21:18 +0100, Adrian McMenamin wrote:
> > > > > On 27/07/07, Adrian McMenamin <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > With the patch reverted and 24bpp, it oopses before freezing (with two
> > > > > > odd looking boot logos on the screen):
> > > > > >
> > > > > Tested this further and it fails on:
> > > > >
> > > > > rev = fb_readl(par->mmio_base + 0x04);
> > > >
> > > > Doubtful if this line is the point of failure, this line is executed
> > > > only once, on initialization.
> > >
> > >
> > > par->mmio_base is corrupted in some way during the call to
> > > register_framebuffer - still investigating how/why.
> >
> > Possible, par->mmio_base is the last field in struct pvr2fb_par,
> > after that is the pseudo_palette. The oops did not manifest when the
> > pseudo_palette was written as u16, but oops'ed when written as u32.
> > Memory alignment problems?
> >
> > Try the patch I posted before, might help.
> >
> Apologies, missed the patch before.
> 
> With the patch applied the Dreamcast no longer crashes or locks with
> either 16, 24 or 32 bpp, so that's good.
> 
> With 24bpp everything is doubled up (eg two boot logos on screen) and
> about twice (?) the size it should be - though with a black screen.
> 
> With 32 bpp everything is about 4 (?) times the size it should be and
> all on a yellow background.
> 
> With 16bpp then everything is on a blue background as before, but is
> also the correct size (as before).

Is this with commit a66ad56eb2c9644717da4d7f05f971d6786145e3 reverted?
Reapply this commit again, it might (fingers crossed) correct the color
problem.

As to your display doubling/quadrupling with bpp 24/32, I don't have any
answers (no hardware) though it seems to be a framebuffer pitch/display
width mismatch.
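As background for the suspected pitch/width mismatch: for a packed-pixel mode without padding, the line length (the `line_length` field of `struct fb_fix_screeninfo`) is the visible width times bytes per pixel. A minimal sketch of that arithmetic, not taken from pvr2fb:

```c
#include <assert.h>

/* Bytes per scanline for a packed-pixel framebuffer with no padding:
 * xres pixels times bytes per pixel, rounding bpp up to whole bytes. */
static unsigned int fb_line_length(unsigned int xres, unsigned int bpp)
{
	assert(bpp > 0);
	return xres * ((bpp + 7) / 8);
}
```

For a 640-pixel-wide mode this gives 1280, 1920 and 2560 bytes at 16, 24 and 32 bpp, so any code that kept using a 16bpp-derived value at the higher depths would be off by a factor of 1.5 or 2.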

Tony





Re: [PATCH 2/3] core_pattern: allow passing of arguments to user mode helper when core_pattern is a pipe

2007-07-27 Thread Neil Horman
On Fri, Jul 27, 2007 at 01:54:19PM -0700, Jeremy Fitzhardinge wrote:
> Neil Horman wrote:
> > +   int helper_argc = 0;
> >   
> > +   helper_argv = argv_split(GFP_KERNEL, corename+1, &helper_argc);
> >   
> 
> Hm, I suspect most users of argv_split don't really care about argc, so
> it would be useful to change argv_split to take NULL as the argc pointer,
> rather than declare a bunch of unused variables.  Interested in throwing
> a patch together?
> 
> J


Gladly, I'll take care of it next week.

Regards
Neil
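A userspace sketch of the change Jeremy suggests: an argv splitter that stores the count only when the caller passes a non-NULL argc pointer. `split_argv()` and `free_argv()` are hypothetical stand-ins, not the kernel's `argv_split()` (which takes a gfp mask and whose exact semantics are not reproduced here):

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Free an array returned by split_argv(); entries end at the NULL slot. */
static void free_argv(char **argv)
{
	char **a;

	if (!argv)
		return;
	for (a = argv; *a; a++)
		free(*a);
	free(argv);
}

/* Split str on whitespace into a NULL-terminated array of copies.
 * The word count is stored only if argcp is non-NULL, so callers that
 * don't care about argc need not declare a dummy variable. */
static char **split_argv(const char *str, int *argcp)
{
	const char *p;
	char **argv;
	int argc = 0, i = 0;

	/* First pass: count the words. */
	for (p = str; *p; ) {
		while (isspace((unsigned char)*p))
			p++;
		if (!*p)
			break;
		argc++;
		while (*p && !isspace((unsigned char)*p))
			p++;
	}

	argv = calloc((size_t)argc + 1, sizeof(*argv));
	if (!argv)
		return NULL;

	/* Second pass: copy each word into its own allocation. */
	for (p = str; *p; ) {
		const char *start;
		size_t len;

		while (isspace((unsigned char)*p))
			p++;
		if (!*p)
			break;
		start = p;
		while (*p && !isspace((unsigned char)*p))
			p++;
		len = (size_t)(p - start);
		argv[i] = malloc(len + 1);
		if (!argv[i]) {
			free_argv(argv);	/* frees the words copied so far */
			return NULL;
		}
		memcpy(argv[i], start, len);
		argv[i][len] = '\0';
		i++;
	}

	if (argcp)
		*argcp = argc;
	return argv;
}
```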



Re: Problems with framebuffer in 2.6.22-git17

2007-07-27 Thread Adrian McMenamin
On 28/07/07, Antonino A. Daplas <[EMAIL PROTECTED]> wrote:
> On Fri, 2007-07-27 at 23:25 +0100, Adrian McMenamin wrote:
> > On 27/07/07, Antonino A. Daplas <[EMAIL PROTECTED]> wrote:
> > > On Fri, 2007-07-27 at 21:18 +0100, Adrian McMenamin wrote:
> > > > On 27/07/07, Adrian McMenamin <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > With the patch reverted and 24bpp, it oopses before freezing (with two
> > > > > odd looking boot logos on the screen):
> > > > >
> > > > Tested this further and it fails on:
> > > >
> > > > rev = fb_readl(par->mmio_base + 0x04);
> > >
> > > Doubtful if this line is the point of failure, this line is executed
> > > only once, on initialization.
> >
> >
> > par->mmio_base is corrupted in some way during the call to
> > register_framebuffer - still investigating how/why.
>
> Possible, par->mmio_base is the last field in struct pvr2fb_par,
> after that is the pseudo_palette. The oops did not manifest when the
> pseudo_palette was written as u16, but oops'ed when written as u32.
> Memory alignment problems?
>
> Try the patch I posted before, might help.
>
Apologies, missed the patch before.

With the patch applied the Dreamcast no longer crashes or locks with
either 16, 24 or 32 bpp, so that's good.

With 24bpp everything is doubled up (eg two boot logos on screen) and
about twice (?) the size it should be - though with a black screen.

With 32 bpp everything is about 4 (?) times the size it should be and
all on a yellow background.

With 16bpp then everything is on a blue background as before, but is
also the correct size (as before).

So, it's better certainly, but there are still a few issues with the
driver, though nothing that takes down the box.

So thanks!

Adrian


Re: Volanomark slows by 80% under CFS

2007-07-27 Thread Chris Snook

Tim Chen wrote:

Ingo,

Volanomark slows by 80% with CFS scheduler on 2.6.23-rc1.  
Benchmark was run on a 2 socket Core2 machine.


The change in scheduler treatment of sched_yield 
could play a part in changing Volanomark behavior.

In CFS, sched_yield is implemented
by dequeueing and requeueing a process.  The time a process
has spent running probably reduced the cpu time due to it
by only a bit. The process could get re-queued pretty close
to the head of the queue, and may get scheduled again pretty
quickly if there is still a lot of cpu time due.


It may make sense to queue the
yielding process a bit further back in the queue.
I made a slight change by zeroing out wait_runtime
(i.e. having the process give
up the cpu time due to it) for experimentation.
Let's put aside gripes that Volanomark should have used a
better mechanism to coordinate threads instead of sched_yield for
a second.   Volanomark runs better
and is only 40% (instead of 80%) down from the old scheduler
without CFS.


Of course we should not tune for Volanomark and this is
reference data. 
What is your view on how CFS's sched_yield should behave?


Regards,
Tim


The primary purpose of sched_yield is for SCHED_FIFO realtime processes, where
nothing else will run, ever, unless the running thread blocks or yields the CPU.
Under CFS, the yielding process will still be leftmost in the rbtree,
otherwise it would have already been scheduled out.


Zeroing out wait_runtime on sched_yield strikes me as completely appropriate. 
If the process wanted to sleep a finite duration, it should actually call a 
sleep function, but sched_yield is essentially saying "I don't have anything 
else to do right now", so it's hardly fair to claim you've been waiting for your 
chance when you just gave it up.
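To make the effect of zeroing wait_runtime concrete, here is a toy model. It assumes, as a simplification of 2.6.23-rc CFS, that the runnable task owed the most cpu time (largest wait_runtime) is the one picked next; the structure and helpers are illustrative, not kernel code:

```c
#include <assert.h>

/* Toy stand-in for a runnable task. */
struct toy_task {
	int id;
	long wait_runtime;	/* cpu time the task is still "owed" (ns) */
};

/* Approximate "leftmost in the rbtree" as the task owed the most time;
 * ties go to the lower index. */
static int pick_next(const struct toy_task *t, int n)
{
	int best = 0, i;

	assert(n > 0);
	for (i = 1; i < n; i++)
		if (t[i].wait_runtime > t[best].wait_runtime)
			best = i;
	return best;
}

/* The experiment discussed above: a yielding task gives up what it is owed. */
static void toy_yield(struct toy_task *t)
{
	t->wait_runtime = 0;
}
```

In this model a yielding task is only passed over when some other runnable task is owed more than zero; if every other task has a negative wait_runtime, the yielder is picked again.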


As for the remaining 40% degradation, if Volanomark is using it for 
synchronization, the scheduler is probably cycling through threads until it gets 
to the one that actually wants to do work.  The O(1) scheduler will do this very 
quickly, whereas CFS has a bit more overhead.  Interactivity boosting may have 
also helped the old scheduler find the right thread faster.


I think Volanomark is being pretty stupid, and deserves to run slowly, but there 
are legitimate reasons to want to call sched_yield in a non-SCHED_FIFO process. 
If I'm performing multiple different calculations on the same set of data in 
multiple threads, and accessing the shared data in a linear fashion, I'd like to 
be able to have one thread give the other some CPU time so they can stay at the 
same point in the stream and improve cache hit rates, but this is only an 
optimization if I can do it without wasting CPU or gradually nicing myself into 
oblivion.  Having sched_yield zero out wait_runtime seems like an appropriate 
way to make this use case work to the extent possible.  Any user attempting such 
an optimization should have the good sense to do real work between sched_yield 
calls, to avoid calling the scheduler in a tight loop.


-- Chris


Re: -mm merge plans for 2.6.23

2007-07-27 Thread Matt Mackall
On Wed, Jul 25, 2007 at 11:50:37PM -0700, Andrew Morton wrote:
> On Wed, 25 Jul 2007 23:33:24 -0700 "Ray Lee" <[EMAIL PROTECTED]> wrote:
> 
> > > So.  We can
> > >
> > > a) provide a way for userspace to reload pagecache and
> > >
> > > b) merge maps2 (once it's finished) (pokes mpm)
> > >
> > > and we're done?
> > 
> > Eh, dunno. Maybe?
> > 
> > We're assuming we come up with an API for userspace to get
> > notifications of evictions (without polling, though poll() would be
> > fine -- you know what I mean), and an API for re-victing those things
> > on demand.
> 
> I was assuming that polling would work OK.  I expect it would.
> 
> > If you think that adding that API and maintaining it is
> > simpler/better than including a variation on the above heuristic I
> > offered, then yeah, I guess we are. It'll all have that vague
> > userspace s2ram odor about it, but I'm sure it could be made to work.
> 
> Actually, I overdesigned the API, I suspect.  What we _could_ do is to
> provide a way of allowing userspace to say "pretend process A touched page
> B": adopt its mm and go touch the page.  We in fact already have that:
> PTRACE_PEEKTEXT.
> 
> So I suspect this could all be done by polling maps2 and using PEEKTEXT. 
> The tricky part would be working out when to poll, and when to reestablish.
> 
> A neater implementation than PEEKTEXT would be to make the maps2 files
> writeable(!) so as a party trick you could tar 'em up and then, when you
> want to reestablish firefox's previous working set, do a untar in
> /proc/$(pidof firefox)/

Sick. But thankfully, unnecessary. The pagemaps give you more than
just a present bit, which is all we care about here. We simply need to
record which pages are mapped, then reference them all back to life.
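The "reference them all back to life" step can be illustrated within a single process by reading one byte per page of a region. A sketch under that assumption: recording which pages were mapped (via the proposed pagemaps) and touching another process's memory (for example through PTRACE_PEEKTEXT, as Andrew mentions) are not shown, and `touch_pages()` is a hypothetical helper:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/* Read one byte at each page-size stride of [buf, buf + len) so the
 * pages backing the region are referenced (and faulted back in if they
 * had been swapped out). volatile stops the compiler from dropping the
 * reads. Returns the number of reads issued. */
static size_t touch_pages(const volatile unsigned char *buf, size_t len)
{
	long ps = sysconf(_SC_PAGESIZE);
	size_t off, reads = 0;

	assert(ps > 0);
	for (off = 0; off < len; off += (size_t)ps) {
		(void)buf[off];
		reads++;
	}
	return reads;
}
```

Called on, say, a `calloc()`ed region of four pages, it issues four reads, one per page.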

-- 
Mathematics is the supreme nostalgia of our time.


Re: -mm merge plans for 2.6.23

2007-07-27 Thread Matt Mackall
On Wed, Jul 25, 2007 at 09:57:17PM -0700, Andrew Morton wrote:
> So.  We can
> 
> a) provide a way for userspace to reload pagecache and
> 
> b) merge maps2 (once it's finished) (pokes mpm)

Consider me poked, despite not being cc:ed.

-- 
Mathematics is the supreme nostalgia of our time.


Re: [PATCH] misannotation in pppol2tp

2007-07-27 Thread James Chapman

Al Viro wrote:

Address of auto variable is not a userland pointer.  A good thing, too,
since if pppol2tp_tunnel_getsockopt() _really_ got a userland pointer
as argument, it would be an instant roothole...

Signed-off-by: Al Viro <[EMAIL PROTECTED]>

Acked-by: James Chapman <[EMAIL PROTECTED]>

Thanks Al.

--
James Chapman
Katalix Systems Ltd
http://www.katalix.com
Catalysts for your Embedded Linux software development



Re: RFT: updatedb "morning after" problem [was: Re: -mm merge plans for 2.6.23]

2007-07-27 Thread Björn Steinbrink
On 2007.07.28 01:29:19 +0200, Andi Kleen wrote:
> > Any faults in that reasoning?
> 
> GNU sort uses a merge sort with temporary files on disk. Not sure
> how much it keeps in memory during that, but it's probably less
> than 150MB. At some point the dirty limit should kick in and write back the 
> data of the temporary files; so it's not quite the same as anonymous memory. 
> But it's not that different given.

Hm, does that change anything? The files need to be read at the end (so
they go into the cache) and are delete afterwards (cache gets freed I
guess?).

> It would be better to measure than to guess. At least Andrew's measurements
> on 128MB actually didn't show updatedb being really that big a problem.

Here's a before/after memory usage for an updatedb run:
[EMAIL PROTECTED]:~# free -m
             total       used       free     shared    buffers     cached
Mem:          2011       1995         15          0        269        779
-/+ buffers/cache:        946       1064
Swap:         1945          0       1945
[EMAIL PROTECTED]:~# updatedb
[EMAIL PROTECTED]:~# free -m
             total       used       free     shared    buffers     cached
Mem:          2011       1914         96          0        209        746
-/+ buffers/cache:        958       1052
Swap:         1945          0       1944

81MB more unused RAM afterwards.
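The arithmetic behind that figure, plus a consistency check of the "-/+ buffers/cache" line against the Mem line of each snapshot (values in MB; `free -m` rounds each column separately, so the checks agree only to within 1 MB):

```latex
\begin{aligned}
\Delta_{\text{free}} &= 96 - 15 = 81 \\
\text{used}_{\text{before}} - \text{buffers}_{\text{before}} - \text{cached}_{\text{before}} &= 1995 - 269 - 779 = 947 \approx 946 \\
\text{used}_{\text{after}} - \text{buffers}_{\text{after}} - \text{cached}_{\text{after}} &= 1914 - 209 - 746 = 959 \approx 958
\end{aligned}
```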

If anyone can make use of that, here's a snippet from /proc/$PID/smaps
of updatedb's sort process, when it was at about its peak memory usage
(according to the RSS column in top), which was about 50MB.

2b90ab3c1000-2b90ae4c3000 rw-p 2b90ab3c1000 00:00 0 
Size:             50184 kB
Rss:              50184 kB
Shared_Clean:         0 kB
Shared_Dirty:         0 kB
Private_Clean:        0 kB
Private_Dirty:    50184 kB
Referenced:       50184 kB
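To total a field such as Private_Dirty across all mappings of a process, a small parser is enough. This is a sketch only: it works on smaps text already read into memory (reading /proc/$PID/smaps is left out), it assumes the "Field:  <number> kB" line layout shown above, and `smaps_sum_field()` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Sums every "<field>: <n> kB" line in an smaps-style text buffer.
 * `field` is the name without the colon, e.g. "Private_Dirty". */
static unsigned long smaps_sum_field(const char *text, const char *field)
{
	size_t flen;
	unsigned long total = 0;
	const char *line = text;

	assert(text != NULL && field != NULL);
	flen = strlen(field);
	while (*line) {
		/* Match only at the start of a line and require the colon,
		 * so a field name that is a prefix of another does not match. */
		if (strncmp(line, field, flen) == 0 && line[flen] == ':')
			total += strtoul(line + flen + 1, NULL, 10);

		const char *nl = strchr(line, '\n');
		if (!nl)
			break;
		line = nl + 1;
	}
	return total;
}
```

`strtoul()` skips the padding before the number and stops at " kB", so the column alignment does not matter.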

> Perhaps some people have much more files or simply a less efficient
> updatedb implementation?

sort (GNU coreutils) 5.97

GNU updatedb version 4.2.31

> I guess the people who complain here that loudly really need to supply
> some real numbers. 

Just to clarify: I'm not complaining either way, neither about not
merging swap prefetch, nor about someone wanting it to be merged. It
was rather the "discussion" that caught my attention... Just in case ;-)

Björn



Re: swap-prefetch: A smart way to make good use of idle resources (was: updatedb)

2007-07-27 Thread Indan Zupancic
On Sat, July 28, 2007 01:34, grundig wrote:
> On Fri, 27 Jul 2007 15:06:14 -0700, Arjan van de Ven <[EMAIL PROTECTED]>
> wrote:
>
>> how do you know there will be other activity? You start the IO and that
>> basically blacks out the disk for 5 to 10 ms. If the "real" IO gets
>> submitted in that time you add latency. You cannot predict that IO
>> happening or not happening.
>
> If there hasn't been much IO for some time, it looks quite reasonable to expect
> that there won't be more in the near future.

Good argument.

> As with most heuristics, it can fail, but
> then this is a feature mostly for desktops, not servers.

Bad argument. It doesn't matter for whom the feature is intended; what matters
is what it does and whether it does it well. In this case, that means
prefetching swap without disturbing anything else.

> There's an old saying that says something like "an open source project starts
> dying when new people can't participate in the project no matter how hard
> they try". It's hard to understand why there are so many people opposed to
> this when other, more controversial features are merged much faster (e.g.
> the UIO driver framework).

Could people please stop this emotional crap non-argumentation? At best it
reduces the chance of swap-prefetch being merged.

Perhaps one of the reasons is that this is core kernel code, and that it isn't
a new feature but a performance improvement with doubtful trade-offs. The
problem statement isn't clear either. It seems like a natural enhancement, but
is that enough reason to merge it? Maybe, maybe not. But if slow swap-in is the
problem, shouldn't that be fixed instead of bypassed?

Yes, there are people who say that it works for them, but many of them also
claim it fixes the updatedb damage, while that can't be true. And how many of
those people tested swap prefetch stand-alone? The ck kernel has other mm
patches too; perhaps those are the real goodies...

And there don't seem to be many people opposing swap prefetch either. A bunch
seem in favour of it, and others seem unconvinced.

Me, I don't know if it should be merged or not. It solves one very specific
workload and nothing else (swap is used, and memory becomes free which won't
be used in the near future). All in all it seems good, but doubtful, and when
in doubt, don't merge.

Greetings,

Indan




Re: swap-prefetch: A smart way to make good use of idle resources (was: updatedb)

2007-07-27 Thread Arjan van de Ven
On Sat, 2007-07-28 at 01:34 +0200, grundig wrote:
> On Fri, 27 Jul 2007 15:06:14 -0700, Arjan van de Ven <[EMAIL PROTECTED]>
> wrote:
> 
> > how do you know there will be other activity? You start the IO and that
> > basically blacks out the disk for 5 to 10 ms. If the "real" IO gets
> > submitted in that time you add latency. You cannot predict that IO
> > happening or not happening.
> 
> If there hasn't been much IO for some time, it looks quite reasonable to expect
> that there won't be more in the near future. As with most heuristics, it can fail

exactly this was my point: just saying "there are no downsides" isn't
true.

> There's an old saying that says something like "an open source project starts
> dying when new people can't participate in the project no matter how hard
> they try". It's hard to understand why there are so many people opposed to
> this when other, more controversial features are merged much faster (e.g.
> the UIO driver framework).

I'm not opposing this or cheering for it. I'm opposing blindly saying
"there are no downsides". This needs showing with data at minimum, and
my reading of this saga seems to suggest data is the bit that is lacking
from the start...

-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via 
http://www.linuxfirmwarekit.org



RE: [1/2] 2.6.23-rc1: known regressions with patches v2

2007-07-27 Thread Luck, Tony
> Subject : ia64 build failure from recent diskquota patch
> References  : http://lkml.org/lkml/2007/7/18/407
> Last known good : ?
> Submitter   : Doug Chapman <[EMAIL PROTECTED]>
> Caused-By   : Vasily Tarasov <[EMAIL PROTECTED]>
>   commit b716395e2b8e450e294537de0c91476ded2f0395
> Handled-By  : Luck, Tony <[EMAIL PROTECTED]>
> Patch1  : http://lkml.org/lkml/2007/7/20/255
> Patch2  : http://lkml.org/lkml/2007/7/20/272
> Status  : patch available

Just sent the "please pull" message to Linus.  The fix should show up in
his tree soon as commit 7a6c813594c9eb25a9afbcbd30c9865e38ee6f39

-Tony


Volanomark slows by 80% under CFS

2007-07-27 Thread Tim Chen
Ingo,

Volanomark slows by 80% with the CFS scheduler on 2.6.23-rc1.
The benchmark was run on a 2-socket Core2 machine.

The change in the scheduler's treatment of sched_yield
could play a part in changing Volanomark behavior.
In CFS, sched_yield is implemented
by dequeueing and requeueing a process. The time a process
has spent running probably reduces the CPU time due to it
by only a bit. The process could get requeued pretty close
to the head of the queue, and may get scheduled again pretty
quickly if there is still a lot of CPU time due.
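A toy model can make the requeueing argument concrete. This is not the CFS code: it assumes, loosely following early CFS, that entities are ordered by a key of `fair_clock - wait_runtime` (smaller key runs first), and `struct toy_entity`, `toy_key()` and `toy_queue_position()` are hypothetical names. It counts how many already-queued entities a requeued task would land behind.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a scheduling entity. */
struct toy_entity {
	long long wait_runtime;	/* CPU time the entity is still "owed" */
};

/* Assumed ordering key: the more time an entity is owed, the smaller
 * its key and the closer it sits to the head of the queue. */
static long long toy_key(long long fair_clock, const struct toy_entity *se)
{
	return fair_clock - se->wait_runtime;
}

/* Returns how many of the n queued entities would run before `se`
 * once it is (re)inserted; ties go to the entities already queued. */
static size_t toy_queue_position(long long fair_clock,
				 const struct toy_entity *queued, size_t n,
				 const struct toy_entity *se)
{
	size_t ahead = 0;
	long long key = toy_key(fair_clock, se);

	assert(queued != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (toy_key(fair_clock, &queued[i]) <= key)
			ahead++;
	return ahead;
}
```

With `fair_clock` = 100 and three queued entities owed 0, 5 and 10, a yielder still owed 20 gets key 80 and lands at position 0 (the head); with its `wait_runtime` zeroed, its key is 100 and it lands behind all three.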

It may make sense to queue the
yielding process a bit further back in the queue.
For experimentation, I made a slight change that zeroes out
wait_runtime (i.e. has the process give up the
CPU time due to it). Let's put aside for a second the gripe
that Volanomark should have used a better mechanism than
sched_yield to coordinate threads. With this change, Volanomark
runs better and is only 40% (instead of 80%) down from the old
scheduler without CFS.

Of course we should not tune for Volanomark; this is just
reference data.
What are your views on how CFS's sched_yield should behave?

Regards,
Tim



--- linux-2.6.23-rc1/kernel/sched_fair.c.orig	2007-07-27 09:39:11.0 -0700
+++ linux-2.6.23-rc1/kernel/sched_fair.c	2007-07-27 09:40:41.0 -0700
@@ -841,6 +841,7 @@
 	 * position within the tree:
 	 */
 	dequeue_entity(cfs_rq, &p->se, 0, now);
+	p->se.wait_runtime = 0;
 	enqueue_entity(cfs_rq, &p->se, 0, now);
 }
 


Re: UML compile error

2007-07-27 Thread Gabriel C
Andrew Morton wrote:
> On Sat, 28 Jul 2007 00:46:57 +0200
> Gabriel C <[EMAIL PROTECTED]> wrote:
> 
>> UML does not compile on current git head. 
>>
>> 
>> $ make defconfig ARCH=um
>> [..]
>> $ make  ARCH=um
>> scripts/kconfig/conf -s arch/um/Kconfig
>> net/bluetooth/hidp/Kconfig:4:warning: 'select' used by config symbol 
>> 'BT_HIDP' refers to undefined symbol 'HID'
>> drivers/net/wireless/Kconfig:552:warning: 'select' used by config symbol 
>> 'RTL8187' refers to undefined symbol 'EEPROM_93CX6'
>>   SYMLINK arch/um/include/kern_constants.h
>>   CHK arch/um/include/uml-config.h
>>   UPD arch/um/include/uml-config.h
>>   CC  arch/um/sys-i386/user-offsets.s
>>   CHK arch/um/include/user_constants.h
>>   CHK include/linux/version.h
>>   CHK include/linux/utsrelease.h
>>   CC  arch/um/kernel/asm-offsets.s
>> In file included from include/linux/sched.h:54,
>>  from arch/um/include/sysdep/kernel-offsets.h:2,
>>  from arch/um/kernel/asm-offsets.c:1:
>> include/linux/jiffies.h:18:5: warning: "CONFIG_HZ" is not defined
>> include/linux/jiffies.h:20:7: warning: "CONFIG_HZ" is not defined
>> include/linux/jiffies.h:22:7: warning: "CONFIG_HZ" is not defined
>> include/linux/jiffies.h:24:7: warning: "CONFIG_HZ" is not defined
>> include/linux/jiffies.h:26:7: warning: "CONFIG_HZ" is not defined
>> include/linux/jiffies.h:28:7: warning: "CONFIG_HZ" is not defined
>> include/linux/jiffies.h:30:7: warning: "CONFIG_HZ" is not defined
> 
> I suspect your build setup broke.  Try `make mrproper' then
> have another go.
> 

Right, this auto-build tree broke for some reason.

A fresh git tree is fine, sorry for the noise.


Gabriel


Re: [GIT PATCH] ACPI patches for 2.6.23-rc1

2007-07-27 Thread Andreas Schwab
Jan Dittmer <[EMAIL PROTECTED]> writes:

> Len Brown wrote:
>> Hi Linus,
>>
>> please pull from: 
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6.git release
>
> This seems to break ia64 defconfig:
>
>   Building modules, stage 2.
>   MODPOST 157 modules
> FATAL: drivers/acpi/button: sizeof(struct acpi_device_id)=20 is not a modulo 
> of the size of section __mod_acpi_device_table=144.

Are you cross-compiling?  The definition of kernel_ulong_t won't work on
x86.
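The numbers in the modpost error are consistent with such a word-size mismatch. A sketch, assuming the table entry is a 16-byte ID followed by one `kernel_ulong_t`: with a 32-bit word the entry is 20 bytes, with a 64-bit word (as on ia64) it is 24, and the 144-byte section holds exactly six 24-byte entries but is not a multiple of 20. The struct names and `section_size_ok()` are hypothetical stand-ins, not modpost's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Entry layout as seen with a 32-bit kernel_ulong_t (e.g. an x86 host). */
struct acpi_id_word32 {
	uint8_t id[16];
	uint32_t driver_data;
};

/* Entry layout as seen with a 64-bit kernel_ulong_t (e.g. an ia64 target). */
struct acpi_id_word64 {
	uint8_t id[16];
	uint64_t driver_data;
};

/* modpost-style sanity check: a device table section must hold a
 * whole number of entries. */
static int section_size_ok(size_t section_size, size_t entry_size)
{
	return entry_size != 0 && section_size % entry_size == 0;
}
```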

Andreas.

-- 
Andreas Schwab, SuSE Labs, [EMAIL PROTECTED]
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


[PATCH] remove gratuitous space in airo module description

2007-07-27 Thread Bill Nottingham
Currently the modinfo looks like:

description:Support for Cisco/Aironet 802.11 wireless ethernet  
  cards.  Direct support for ISA/PCI/MPI cards and support   
for PCMCIA when used with airo_cs.

Arguably, it should be cut at the end of the first sentence. 
This at least makes it somewhat more legible.
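Why the spaces end up in modinfo: a backslash-newline inside a string literal is removed by line splicing (translation phase 2), but the next line's leading whitespace stays inside the string. Adjacent string literals, which the compiler concatenates with nothing in between, are another common way to wrap a long string. A small sketch; the variable names are illustrative only:

```c
#include <assert.h>
#include <string.h>

/* Backslash-newline continuation: the backslash and newline are
 * removed, but the second line's indentation stays in the string. */
static const char spliced[] = "ethernet \
    cards.";

/* Adjacent string literals are concatenated with nothing in between,
 * so the second piece can be indented freely. */
static const char concatenated[] = "ethernet "
    "cards.";
```

The spliced version carries four extra spaces, which is the kind of gap visible in the modinfo output above.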

diff -up linux-2.6.22.x86_64/drivers/net/wireless/airo.c.foo linux-2.6.22.x86_64/drivers/net/wireless/airo.c
--- linux-2.6.22.x86_64/drivers/net/wireless/airo.c.foo	2007-07-27 19:03:59.0 -0400
+++ linux-2.6.22.x86_64/drivers/net/wireless/airo.c	2007-07-27 19:04:15.0 -0400
@@ -241,8 +241,8 @@ static int proc_perm = 0644;
 
 MODULE_AUTHOR("Benjamin Reed");
 MODULE_DESCRIPTION("Support for Cisco/Aironet 802.11 wireless ethernet \
-   cards.  Direct support for ISA/PCI/MPI cards and support \
-  for PCMCIA when used with airo_cs.");
+cards.  Direct support for ISA/PCI/MPI cards and support \
+for PCMCIA when used with airo_cs.");
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_SUPPORTED_DEVICE("Aironet 4500, 4800 and Cisco 340/350");
 module_param_array(io, int, NULL, 0);

Signed-off-by: Bill Nottingham <[EMAIL PROTECTED]>

Bill


ATA scsi driver misbehavior under kdump capture kernel

2007-07-27 Thread Cliff Wickman



I've run into a problem with the ATA SCSI disk driver when running in a
kdump dump-capture kernel.

I'm running on a 2-processor x86_64 box.  It has 2 SCSI disks, /dev/sda and
/dev/sdb.

My kernel is 2.6.22, and built to be a dump capturing kernel loaded by kexec.
When I boot this kernel by itself, it finds both sda and sdb.

But when it is loaded by kexec and booted on a panic it only finds sda.

Any ideas from those familiar with the ATA driver?


-Cliff Wickman
 SGI



I put some printk's into it and get this:

Standalone:

   [nv_adma_error_handler]
cpw: ata_host_register probe port 1 (error_handler:81348625)
cpw: ata_host_register call ata_port_probe
cpw: ata_host_register call ata_port_schedule
cpw: ata_host_register call ata_port_wait_eh
cpw: ata_port_wait_eh entered
cpw: ata_port_wait_eh, preparing to wait
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
cpw: ata_dev_configure entered
cpw: ata_dev_configure testing class
cpw: ata_dev_configure class is ATA_DEV_ATA
ata2.00: ATA-6: ST3200822AS, 3.01, max UDMA/133
ata2.00: 390721968 sectors, multi 16: LBA48
cpw: ata_dev_configure exiting
cpw: ata_dev_configure entered
cpw: ata_dev_configure testing class
cpw: ata_dev_configure class is ATA_DEV_ATA
cpw: ata_dev_configure exiting
cpw: ata_dev_set_mode printing:
ata2.00: configured for UDMA/133
cpw: ata_port_wait_eh, finished wait
cpw: ata_port_wait_eh exiting
cpw: ata_host_register done with probe port 1


When loaded with kexec and booted on a panic:

cpw: ata_host_register probe port 1 (error_handler:81348625)
cpw: ata_host_register call ata_port_probe
cpw: ata_host_register call ata_port_schedule
cpw: ata_host_register call ata_port_wait_eh
cpw: ata_port_wait_eh entered
cpw: ata_port_wait_eh, preparing to wait
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
cpw: ata_port_wait_eh, finished wait
cpw: ata_port_wait_eh exiting
cpw: ata_host_register done with probe port 1



Re: D-Link DFE-580TX 4 port NIC problems

2007-07-27 Thread Mario Doering
On Fri, 27 Jul 2007 14:08:09 +0200
Clemens Koller <[EMAIL PROTECTED]> wrote:

> Hi, Mario!
> 
> Mario Doering wrote:
> > Hello,
> > 
> > are there any news or questions on this issue?
> 
> Can you try the latest kernel to see if the same problem
> persists?
> Is there any kernel version where it was working fine?


Hello Clemens,

I have tried different kernels with no success so far. I have not
tried a 2.6.22 kernel yet, but I can do so of course.

It would again take some time to wait for the error to show up ;-)

Bye,
Mario.


Re: LinuxPPS & spinlocks

2007-07-27 Thread Satyam Sharma
Hi,

On 7/28/07, Satyam Sharma <[EMAIL PROTECTED]> wrote:
> Hi Rodolfo,
>
> On 7/28/07, Rodolfo Giometti <[EMAIL PROTECTED]> wrote:
> > On Fri, Jul 27, 2007 at 01:40:14PM -0600, Chris Friesen wrote:
> > >
> > > My point is that the lock should be used to protect specific data. Thus, 
> > > it
> > > would be more correct to say, "spinlock foo is taken because
> > > pps_register_source() accesses variable bar".
> > >
> > > That way, if someone else wants to access "bar", they know that they need
> > > to take lock "foo".
> >
> > Ah, ok! I see. :)
>
> I only glanced through the code, so could be wrong, but I noticed that
> the only global / shared data you have in there is a global "pps_source"
> array of pps_s structs. That's accessed / modified from the various
> syscalls introduced in the API exported to userspace, as well as the
> register/unregister/pps_event API exported to in-kernel client subsystems,
> yes? So it looks like you need to introduce proper locking for it, simply
> type-qualifying it as "volatile" is not enough.
>
> However, I think you've introduced two locks for it. The syscalls (that
> run in process context, obviously) seem to use a pps_mutex and
> pps_event() seems to be using the pps_lock spinlock (because that
> gets executed from interrupt context) -- and from the looks of it, the
> register/unregister functions are using /both/ the mutex and spinlock (!)
>
> This isn't quite right, (in fact there's nothing to protect pps_event from
> racing against a syscall), so you should use *only* the spinlock for
> synchronization -- the spin_lock_irqsave/restore() variants, in fact.

Take the race between the time_pps_setparams() syscall and a concurrent
pps_event() from an interrupt for instance. From sys_time_pps_setparams,
the parameters for an existing source are not modified / set atomically,
which means a pps_event() called on the same source in between will see
invalid parameters ... and bad things will happen.
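The fix being suggested is that the syscall writer and the interrupt-time reader take the same lock, so the reader never observes a half-updated parameter set. In the kernel that would be `spin_lock_irqsave()`/`spin_unlock_irqrestore()`; the sketch below is only a userspace analogue built on a C11 `atomic_flag` spinlock, and `struct pps_params_sketch`, `setparams_sketch()` and `event_snapshot_sketch()` are hypothetical names, not LinuxPPS code.

```c
#include <assert.h>
#include <stdatomic.h>

/* Minimal userspace spinlock standing in for spinlock_t. */
struct sketch_lock {
	atomic_flag flag;
};

static void sketch_spin_lock(struct sketch_lock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->flag, memory_order_acquire))
		; /* spin until the holder releases it */
}

static void sketch_spin_unlock(struct sketch_lock *l)
{
	atomic_flag_clear_explicit(&l->flag, memory_order_release);
}

/* Hypothetical per-source parameters that must change together. */
struct pps_params_sketch {
	int mode;
	long assert_offset;
	long clear_offset;
};

struct pps_source_sketch {
	struct sketch_lock lock;
	struct pps_params_sketch params;
};

/* Syscall side: update all fields under the lock. */
static void setparams_sketch(struct pps_source_sketch *src,
			     const struct pps_params_sketch *p)
{
	sketch_spin_lock(&src->lock);
	src->params = *p;
	sketch_spin_unlock(&src->lock);
}

/* Event side: take the same lock, so it sees either the old or the
 * new parameter set, never a mix of the two. */
static struct pps_params_sketch event_snapshot_sketch(struct pps_source_sketch *src)
{
	struct pps_params_sketch copy;

	sketch_spin_lock(&src->lock);
	copy = src->params;
	sketch_spin_unlock(&src->lock);
	return copy;
}
```

What the userspace analogue cannot show is the irqsave part: because the kernel reader runs in interrupt context, the syscall side must also disable local interrupts while holding the lock, or an interrupt on the same CPU could spin forever on a lock its own CPU already holds.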

> [ Also, have you considered making pps_source a list and not an array?
> It'll help you lose a whole lot of MAX_SOURCES, pps_is_allocated, etc
> kind of gymnastics in there, and you _can_ return a pointer to the
> corresponding pps source struct from the register() function to the in-kernel
> users, so that way you get to retain the O(1) access to the corresponding
> source when a client calls into pps_event(), similar to how you're using the
> array index presently. ]

I think the above would be sane and safe -- your driver has pretty simple
lifetime rules, and "sources" are only created / destroyed from within kernel,
as and when clients call pps_register_source() and pps_unregister_source().
So pps_event() can be called on a given source only between the
corresponding register() and unregister() -- which means register() can
return us a reference/pointer on the source after allocating / adding it to
the list (instead of the fixed array index as it presently is), which remains
valid for the entire duration of the source, till unregister() is called, after
which we can't be calling pps_event() on the same source anyway.
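The list-plus-pointer approach described above could look roughly like this. It is a userspace sketch with hypothetical names (`struct pps_src`, `src_register()`, `src_event()`, `src_unregister()`), a plain singly linked list in place of `list_head`, and no locking; in the driver, the list updates would be done under the lock discussed earlier. The point is that register() hands back a pointer, so the event path needs no array index or MAX_SOURCES bound.

```c
#include <assert.h>
#include <stdlib.h>

struct pps_src {
	struct pps_src *next;
	int id;
	unsigned long events;	/* number of pulses seen */
};

static struct pps_src *src_list;	/* head of the registered sources */

/* Allocate a source, link it in, and return it to the caller. */
static struct pps_src *src_register(int id)
{
	struct pps_src *s = calloc(1, sizeof(*s));

	if (!s)
		return NULL;
	s->id = id;
	s->next = src_list;
	src_list = s;
	return s;
}

/* O(1): the client passes back the pointer it got from register(). */
static void src_event(struct pps_src *s)
{
	assert(s != NULL);
	s->events++;
}

/* Unlink and free; the pointer must not be used afterwards. */
static void src_unregister(struct pps_src *s)
{
	struct pps_src **pp;

	for (pp = &src_list; *pp; pp = &(*pp)->next) {
		if (*pp == s) {
			*pp = s->next;
			free(s);
			return;
		}
	}
}
```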

> I also noticed code like (from pps_event):
>
> +   /* Try to grab the lock, if not we prefere loose the event... */
> +   if (!spin_trylock(&pps_lock))
> +   return;
>
> which looks worrisome and unnecessary. That spinlock looks to be of
> fine enough granularity to me, do you think there'd be any contention
> on it? I /think/ you can simply make that a spin_lock().
>
> Overall the code looks simple / straightforward enough to me (except for
> the parport / uart stuff that I have no clue about), and I'll also read up on
> the relevant RFC for this and would hopefully try and give you a more
> meaningful review over the weekend.

Ok, I've looked through (most of) the RFC and code now, and am only
commenting at a design level for now. Anyway, I didn't like the way
you've significantly drifted from the RFC in several ways:

1. The RFC mandates no such userspace interface / syscall as the
time_pps_cmd() that you've implemented -- it looks, smells, and feels
like an ioctl; in fact, that's what it is for practical purposes. I'm confused
as to why you didn't just go ahead and implement the
special-file-and-file-descriptor based approach as advocated / mandated there.

[ You've implemented the (optional, as per RFC) time_pps_findsource
operation in the kernel using the above "pseudo-ioctl", but that wasn't
necessary -- as the RFC itself illustrates, it's something that can easily
be done (in fact should be done) completely in userspace itself. ]

2. If you fix the above two issues, you'll notice that you don't need to
short-circuit the (RFC-mandated) time_pps_create/destroy(handle)
syscalls in the userspace header/library anymore, as you presently are.

Here's how I'd go about designing/implementing this:

* At the time of pps_register_source()  -- called by an in-kernel client
subsystem that creates a PPS source -- allocate a pps source, gener

Re: Kernel modules compilation

2007-07-27 Thread shacky
Thank you very much for your help!
Tomorrow I will try! :-)
Bye!


Re: [RFC] scheduler: improve SMP fairness in CFS

2007-07-27 Thread Chris Snook

Tong Li wrote:

On Fri, 27 Jul 2007, Chris Snook wrote:


Tong Li wrote:
I'd like to clarify that I'm not trying to push this particular code 
to the kernel. I'm a researcher. My intent was to point out that we 
have a problem in the scheduler and my dwrr algorithm can potentially 
help fix it. The patch itself was merely a proof-of-concept. I'd be 
thrilled if the algorithm can be proven useful in the real world. I 
appreciate the people who have given me comments. Since then, I've 
revised my algorithm/code. Now it doesn't require global locking but 
retains strong fairness properties (which I was able to prove 
mathematically).


Thanks for doing this work.  Please don't take the implementation 
criticism as a lack of appreciation for the work.  I'd like to see 
dwrr in the scheduler, but I'm skeptical that re-introducing expired 
runqueues is the most efficient way to do it.


Given the inherently controversial nature of scheduler code, 
particularly that which attempts to enforce fairness, perhaps a 
concise design document would help us come to an agreement about what 
we think the scheduler should do and what tradeoffs we're willing to 
make to do those things.  Do you have a design document we could discuss?


-- Chris



Thanks for the interest. Attached is a design doc I wrote several months 
ago (with small modifications). It talks about the two pieces of my 
design: group scheduling and dwrr. The description was based on the 
original O(1) scheduler, but as my CFS patch showed, the algorithm is 
applicable to other underlying schedulers as well. It's interesting that 
I started working on this in January for the purpose of eventually 
writing a paper about it. So I knew reasonably well the related research 
work but was totally unaware that people in the Linux community were 
also working on similar things. This is good. If you are interested, I'd
like to help with the algorithms and theory side of things.


  tong

---
Overview:

Trio extends the existing Linux scheduler with support for 
proportional-share scheduling. It uses a scheduling algorithm, called 
Distributed Weighted Round-Robin (DWRR), which retains the existing 
scheduler design as much as possible, and extends it to achieve 
proportional fairness with O(1) time complexity and a constant error 
bound, compared to the ideal fair scheduling algorithm. The goal of Trio 
is not to improve interactive performance; rather, it relies on the 
existing scheduler for interactivity and extends it to support MP 
proportional fairness.


Trio has two unique features: (1) it enables users to control shares of 
CPU time for any thread or group of threads (e.g., a process, an 
application, etc.), and (2) it enables fair sharing of CPU time across 
multiple CPUs. For example, with ten tasks running on eight CPUs, Trio 
allows each task to take an equal fraction of the total CPU time. These 
features enable Trio to complement the existing Linux scheduler and
provide greater user flexibility and stronger fairness.
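As a worked example of the ten-tasks-on-eight-CPUs case: with equal weights, each task should receive 8/10 = 0.8 of a CPU. The sketch below computes these ideal shares from weights. It is not DWRR, only the target a proportional-share scheduler approximates, and `ideal_shares()` is a hypothetical helper. It assumes a thread can use at most one CPU, so a thread whose weighted share would exceed 1.0 is capped and the leftover capacity is redistributed among the rest.

```c
#include <assert.h>

#define MAX_THREADS 64

/* Fills share[i] with thread i's ideal CPU share (in CPUs), given
 * weights w[0..n-1] > 0 and ncpus CPUs, capping each thread at 1 CPU. */
static void ideal_shares(const double *w, double *share, int n, int ncpus)
{
	int capped[MAX_THREADS] = { 0 };
	double cpus_left = ncpus;
	int changed = 1;

	assert(n > 0 && n <= MAX_THREADS && ncpus > 0);
	while (changed) {
		double wsum = 0.0;

		changed = 0;
		for (int i = 0; i < n; i++)
			if (!capped[i])
				wsum += w[i];
		for (int i = 0; i < n; i++) {
			if (capped[i])
				continue;
			share[i] = wsum > 0.0 ? cpus_left * w[i] / wsum : 0.0;
			if (share[i] > 1.0) {
				/* Cap at one CPU and redistribute the remainder. */
				share[i] = 1.0;
				capped[i] = 1;
				cpus_left -= 1.0;
				changed = 1;
				break;	/* recompute with the smaller pool */
			}
		}
	}
}
```

With ten equal weights on 8 CPUs every entry comes out as 0.8. With weights {4, 1, 1} on 2 CPUs, the heavy thread would get 1.33 CPUs, so it is capped at 1.0 and the other two get 0.5 each. Capping can only raise the remaining threads' shares, so the order in which over-limit threads are capped does not change the result.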


Background:

Over the years, there has been a lot of criticism that conventional Unix 
priorities and the nice interface provide insufficient support for users 
to accurately control CPU shares of different threads or applications. 
Many have studied scheduling algorithms that achieve proportional 
fairness. Assuming that each thread has a weight that expresses its 
desired CPU share, informally, a scheduler is proportionally fair if (1) 
it is work-conserving, and (2) it allocates CPU time to threads in exact 
proportion to their weights in any time interval. Ideal proportional 
fairness is impractical since it requires that all runnable threads be 
running simultaneously and scheduled with infinitesimally small quanta. 
In practice, every proportional-share scheduling algorithm approximates 
the ideal algorithm with the goal of achieving a constant error bound. 
For more theoretical background, please refer to the following papers:


I don't think that achieving a constant error bound is always a good thing.  We 
all know that fairness has overhead.  If I have 3 threads and 2 processors, and 
I have a choice between fairly giving each thread 1.0 billion cycles during the 
next second, or unfairly giving two of them 1.1 billion cycles and giving the 
other 0.9 billion cycles, then we can have a useful discussion about where we 
want to draw the line on the fairness/performance tradeoff.  On the other hand, 
if we can give two of them 1.1 billion cycles and still give the other one 1.0 
billion cycles, it's madness to waste those 0.2 billion cycles just to avoid 
user jealousy.  The more complex the memory topology of a system, the more 
"free" cycles you'll get by tolerating short-term unfairness.  As a crude 
heuristic, scaling some fairly low tolerance by log2(NCPUS) seems appropriate, 
but eventually we should take the boot-time computed migration costs into 
consideration.
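The "scale some fairly low tolerance by log2(NCPUS)" heuristic is easy to state in code. This is only an illustration of the suggestion: `fairness_tolerance()` and `ilog2_floor()` are hypothetical, the base tolerance unit is arbitrary, and integer floor(log2) is an assumption. Note that it yields zero tolerance on a single CPU.

```c
#include <assert.h>

/* floor(log2(n)) for n >= 1. */
static unsigned int ilog2_floor(unsigned int n)
{
	unsigned int r = 0;

	assert(n >= 1);
	while (n >>= 1)
		r++;
	return r;
}

/* Allowed short-term unfairness, growing with the CPU count. */
static unsigned long fairness_tolerance(unsigned long base, unsigned int ncpus)
{
	return base * ilog2_floor(ncpus);
}
```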



[1] A. 

Re: swap-prefetch: A smart way to make good use of idle resources (was: updatedb)

2007-07-27 Thread grundig
On Fri, 27 Jul 2007 15:06:14 -0700, Arjan van de Ven <[EMAIL PROTECTED]>
wrote:

> how do you know there will be other activity? You start the IO and that
> basically blacks out the disk for 5 to 10 ms. If the "real" IO gets
> submitted in that time you add latency. You cannot predict that IO
> happening or not happening.

If there hasn't been much IO for some time, it looks quite reasonable to expect
that there won't be more in the near future. As with most heuristics, it can fail,
but then this is a feature mostly for desktops, not servers.

There's an old saying that says something like "an open source project starts
dying when new people can't participate in the project no matter how hard
they try". It's hard to understand why there are so many people opposed to
this when other, more controversial features are merged much faster (e.g.
the UIO driver framework).


[PATCH] lzo: Add some missing casts

2007-07-27 Thread Richard Purdie
Restore some casts in the LZO compression algorithm that were removed
during cleanup and shouldn't have been.

Signed-off-by: Richard Purdie <[EMAIL PROTECTED]>

---

This fixes the reported problems for me, I've checked fairly carefully
and I can't see any other issues. Edward, could you see if this resolves
the problems in your case please?

Index: linux-2.6.22/lib/lzo/lzo1x_compress.c
===
--- linux-2.6.22.orig/lib/lzo/lzo1x_compress.c
+++ linux-2.6.22/lib/lzo/lzo1x_compress.c
@@ -32,13 +32,13 @@ _lzo1x_1_do_compress(const unsigned char
ip += 4;
 
for (;;) {
-   dindex = ((0x21 * DX3(ip, 5, 5, 6)) >> 5) & D_MASK;
+   dindex = ((size_t)(0x21 * DX3(ip, 5, 5, 6)) >> 5) & D_MASK;
m_pos = dict[dindex];
 
if (m_pos < in)
goto literal;
 
-   if (ip == m_pos || (ip - m_pos) > M4_MAX_OFFSET)
+   if (ip == m_pos || ((size_t)(ip - m_pos) > M4_MAX_OFFSET))
goto literal;
 
m_off = ip - m_pos;
@@ -51,7 +51,7 @@ _lzo1x_1_do_compress(const unsigned char
if (m_pos < in)
goto literal;
 
-   if (ip == m_pos || (ip - m_pos) > M4_MAX_OFFSET)
+   if (ip == m_pos || ((size_t)(ip - m_pos) > M4_MAX_OFFSET))
goto literal;
 
m_off = ip - m_pos;
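The effect of the `(size_t)` cast on the offset check can be shown in isolation. If `m_pos` were ever past `ip`, the signed `ptrdiff_t` difference would be negative, so the signed comparison would accept it as a valid short match; after the cast it wraps to a huge value and the match is rejected. `MAX_OFFSET_SKETCH` stands in for `M4_MAX_OFFSET`, assumed here to be 0xbfff, and both helpers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_OFFSET_SKETCH 0xbfff	/* assumed value of M4_MAX_OFFSET */

/* Pre-patch check: signed comparison, a negative distance slips through. */
static int too_far_signed(const unsigned char *ip, const unsigned char *m_pos)
{
	return ip == m_pos || (ip - m_pos) > MAX_OFFSET_SKETCH;
}

/* Patched check: a negative distance wraps to a huge size_t, so it is
 * treated as "too far" and the code falls back to a literal. */
static int too_far_unsigned(const unsigned char *ip, const unsigned char *m_pos)
{
	return ip == m_pos || (size_t)(ip - m_pos) > MAX_OFFSET_SKETCH;
}
```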




Re: serial flow control appears broken

2007-07-27 Thread Paul Fulghum
On Fri, 2007-07-27 at 13:48 -0700, Lee Howard wrote:
> Here's the output:
> 
> type: 4
> line: 1
> line: 760
>  irq: 3
>flags: 1358954688
>   xmit_fifo_size: 16
>   custom_divisor: 0
>baud_base: 115200

OK, the FIFO should be enabled.

What is known:

* The error is a hardware FIFO overrun.
  - observed message is in n_tty due to driver setting TTY_OVERRUN

* The RTS/CTS flow control is not involved
  - this is done only by the ldisc in response to buffer levels
  - you verified crtscts is set
  - you did not observe an RTS change when 'overflow error' was logged
  - you did observe an RTS change when the application stopped reading

So this seems to be a latency issue reading the receive
FIFO in the ISR. The current rx FIFO trigger level
should be 8 bytes (UART_FCR_R_TRIG_10) which gives the
ISR 694usec to get the data at 115200bps.

IIRC, in 2.2.X kernels this defaulted to 4 bytes
(TRIG_01) which gave a little more time to service the interrupt.

How does the data rate affect the frequency of the overrun errors?
Does 57600bps make them go away?
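The 694usec figure follows from the character time. Assuming 10 bits per character on the wire (start bit, 8 data bits, stop bit), 115200bps moves 11520 characters per second, about 86.8usec each; with a 16-byte FIFO and the interrupt raised at 8 bytes, 8 more characters can arrive before an overrun, i.e. about 694usec. A small calculator, with the hypothetical name `overrun_headroom_us()`:

```c
#include <assert.h>

/* Time (microseconds) the ISR has between the RX trigger firing and the
 * FIFO overflowing, assuming a continuous stream of characters. */
static double overrun_headroom_us(unsigned int fifo_size, unsigned int trigger,
				  unsigned int bps, unsigned int bits_per_char)
{
	double char_time_us;

	assert(trigger <= fifo_size && bps > 0);
	char_time_us = 1e6 * bits_per_char / bps;
	return (fifo_size - trigger) * char_time_us;
}
```

Under the same assumptions, a 4-byte trigger leaves about 1042usec, and at 57600bps the 8-byte trigger leaves about 1389usec.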

--
Paul






Re: [RFC/RFT 1/5] Input: implement proper locking in input core

2007-07-27 Thread Indan Zupancic
Hi,

Not real feedback, just some nitpicks.

On Tue, July 24, 2007 06:45, Dmitry Torokhov wrote:
> +static int input_defuzz_abs_event(int value, int old_val, int fuzz)
> +{
> + if (fuzz) {
> + if (value > old_val - fuzz / 2 && value < old_val + fuzz / 2)
> + return value;
>
> - add_input_randomness(type, code, value);
> + if (value > old_val - fuzz && value < old_val + fuzz)
> + return (old_val * 3 + value) / 4;
>
> - switch (type) {
> + if (value > old_val - fuzz * 2 && value < old_val + fuzz * 2)
> + return (old_val + value) / 2;
> + }

Shouldn't the return values of the second and third case be reversed?
In the 2nd check the new value is weighted 1/4, while in the 3rd
case it counts for 1/2, which breaks the "account new value more when
it is closer to the old one" logic that I thought I saw here. So to sum up,
should the second return be "return (old_val + value * 3) / 4"?
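To make the weighting question easy to check by hand, the sketch below copies only the two blending branches from the quoted hunk into a hypothetical `blend_sketch()`. The innermost fuzz/2 branch is deliberately left out, and the final pass-through `return value` is an assumption since the hunk does not show it. With old_val = 100 and fuzz = 8, 106 falls in the "within fuzz" band and gives (3*100 + 106)/4 = 101, while 112 falls in the "within 2*fuzz" band and gives (100 + 112)/2 = 106.

```c
#include <assert.h>

/* Blending for the two outer fuzz bands of the quoted hunk. Values within
 * fuzz/2 of old_val are handled by the hunk's first branch, which is not
 * modelled here. */
static int blend_sketch(int value, int old_val, int fuzz)
{
	if (fuzz) {
		/* Within fuzz: the new value is weighted 1/4. */
		if (value > old_val - fuzz && value < old_val + fuzz)
			return (old_val * 3 + value) / 4;

		/* Within 2 * fuzz: the new value is weighted 1/2. */
		if (value > old_val - fuzz * 2 && value < old_val + fuzz * 2)
			return (old_val + value) / 2;
	}
	/* Assumed pass-through for everything else. */
	return value;
}
```

For comparison, the variant suggested above, (old_val + value * 3) / 4, would give 104 for the input 106.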


> +/*
> + * Generate software autorepeat event. Note that we take
> + * dev->event_lock here to avoid racing with input_event
> + * which may cause keys get "stuck".
> + */

Hurray. :-)

> - if (code > SW_MAX || !test_bit(code, dev->swbit) || 
> !!test_bit(code, dev->sw) == value)
> - return;
> + if (dev->rep[REP_PERIOD])
> + mod_timer(&dev->timer, jiffies +
> + msecs_to_jiffies(dev->rep[REP_PERIOD]));
> + }

Perhaps use a local var for the "msecs_to_jiffies(dev->rep[REP_PERIOD])" part.


> +static void input_start_autorepeat(struct input_dev *dev, int code)
> +{
> + if (test_bit(EV_REP, dev->evbit) &&
> + dev->rep[REP_PERIOD] && dev->rep[REP_DELAY] &&
> + dev->timer.data) {
> + dev->repeat_key = code;
> + mod_timer(&dev->timer,
> +   jiffies + msecs_to_jiffies(dev->rep[REP_DELAY]));
> + }
> +}

Same here.


> + case EV_KEY:
> + if (is_event_supported(code, dev->keybit, KEY_MAX) &&
> + !!test_bit(code, dev->key) != value) {

A bit confusing: test_bit() only returns 0 or 1 anyway, doesn't it?
So "test_bit(code, dev->key) != value" should be all right.
I noticed that the old code did it too, but still.
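The !! only matters if a bit test can return something other than 0 or 1. The sketch below uses a hypothetical mask-returning helper, test_bit_mask(), which is not the kernel's test_bit(). It shows what the normalization guards against when the result is compared with an int value.

```c
#include <limits.h>
#include <stdbool.h>

#define WORD_BITS (CHAR_BIT * sizeof(unsigned long))

/* Hypothetical mask-style bit test (NOT the kernel's test_bit()): returns
 * the masked word, which is nonzero but not necessarily 1 for a set bit. */
static unsigned long test_bit_mask(unsigned int nr, const unsigned long *addr)
{
	return addr[nr / WORD_BITS] & (1UL << (nr % WORD_BITS));
}

/* Compares the raw result with value: wrong whenever the mask is not 0 or 1. */
static bool changed_raw(unsigned int nr, const unsigned long *bits, int value)
{
	return test_bit_mask(nr, bits) != (unsigned long)value;
}

/* !! folds any nonzero result to 1 before the comparison. */
static bool changed_normalized(unsigned int nr, const unsigned long *bits, int value)
{
	return !!test_bit_mask(nr, bits) != value;
}
```

With bit 3 set, the raw comparison sees 8 != 1 and reports a change that did not happen. The normalized form does not.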

> - case EV_MSC:
> + case EV_SW:
> + if (is_event_supported(code, dev->swbit, SW_MAX) &&
> + !!test_bit(code, dev->sw) != value) {

Same.

> - break;
> + case EV_LED:
> + if (is_event_supported(code, dev->ledbit, LED_MAX) &&
> + !!test_bit(code, dev->led) != value) {

And here.


> +void input_inject_event(struct input_handle *handle,
> + unsigned int type, unsigned int code, int value)
>  {
> - struct input_dev *dev = (void *) data;
> + struct input_dev *dev = handle->dev;
> + struct input_handle *grab;
>
> - if (!test_bit(dev->repeat_key, dev->key))
> - return;
> + if (is_event_supported(type, dev->evbit, EV_MAX)) {
> + spin_lock_irq(&dev->event_lock);
>
> - input_event(dev, EV_KEY, dev->repeat_key, 2);
> - input_sync(dev);
> + grab = rcu_dereference(dev->grab);
> + if (!grab || grab == handle)
> + input_handle_event(dev, type, code, value);

'handle' can't be NULL, so you can drop the "!grab" check, as checking
"grab == handle" should be sufficient.


> +/**
> + * input_open_device - open input device
> + * @handle: handle through which device is being accessed
> + *
> + * This function should be called by input handlers when they
> + * want to start receive events from given input device.
> + */
>  int input_open_device(struct input_handle *handle)
>  {
>   struct input_dev *dev = handle->dev;
> - int err;
> + int retval;
>
> - err = mutex_lock_interruptible(&dev->mutex);
> - if (err)
> - return err;
> + retval = mutex_lock_interruptible(&dev->mutex);
> + if (retval)
> + return retval;
> +
> + if (dev->going_away) {
> + retval = -ENODEV;
> + goto out;
> + }
>
>   handle->open++;
>
>   if (!dev->users++ && dev->open)

Ugh, not your code, and perhaps it's just me, but that looks weird.
The ++ hidden in the if check is ugly, and would mean that "users"
can be negative, which is strange.
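Spelled out, the post-increment means the test sees the count from before the increment, so open() runs only for the first user. Below is a userspace sketch with a hypothetical struct dev_sketch standing in for struct input_dev.

```c
/* Hypothetical device with a user count and an optional open() hook. */
struct dev_sketch {
	int users;
	int (*open)(struct dev_sketch *dev);
};

/* Stub open() that only counts how often it is called. */
static int stub_open_calls;

static int stub_open(struct dev_sketch *dev)
{
	(void)dev;
	stub_open_calls++;
	return 0;
}

/* Same effect as "if (!dev->users++ && dev->open) retval = dev->open(dev);"
 * with the post-increment made explicit. */
static int open_device_sketch(struct dev_sketch *dev)
{
	int was_unused = (dev->users == 0);	/* value before the increment */

	dev->users++;
	if (was_unused && dev->open)
		return dev->open(dev);
	return 0;
}
```

Only the first caller, which sees users == 0, triggers open(). Later callers just bump the count.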

> - err = dev->open(dev);
> + retval = dev->open(dev);
>
> - if (err)
> - handle->open--;
> + if (retval && !--handle->open) {

Eek! That -- is hidden well there. Would it hurt to call synchronize_sched()
unconditionally? Something like:

if (retval) {
	handle->open--;
	synchronize_sched();
}

It's a rare case anyway.

> + /*
> +  * Make sure we are not delivering any more events
> +  * through this handle
> +  */
> + synchronize_

Re: UML compile error

2007-07-27 Thread Andrew Morton
On Sat, 28 Jul 2007 00:46:57 +0200
Gabriel C <[EMAIL PROTECTED]> wrote:

> UML does not compile on current git head. 
> 
> 
> $ make defconfig ARCH=um
> [..]
> $ make  ARCH=um
> scripts/kconfig/conf -s arch/um/Kconfig
> net/bluetooth/hidp/Kconfig:4:warning: 'select' used by config symbol 'BT_HIDP' refers to undefined symbol 'HID'
> drivers/net/wireless/Kconfig:552:warning: 'select' used by config symbol 'RTL8187' refers to undefined symbol 'EEPROM_93CX6'
>   SYMLINK arch/um/include/kern_constants.h
>   CHK arch/um/include/uml-config.h
>   UPD arch/um/include/uml-config.h
>   CC  arch/um/sys-i386/user-offsets.s
>   CHK arch/um/include/user_constants.h
>   CHK include/linux/version.h
>   CHK include/linux/utsrelease.h
>   CC  arch/um/kernel/asm-offsets.s
> In file included from include/linux/sched.h:54,
>  from arch/um/include/sysdep/kernel-offsets.h:2,
>  from arch/um/kernel/asm-offsets.c:1:
> include/linux/jiffies.h:18:5: warning: "CONFIG_HZ" is not defined
> include/linux/jiffies.h:20:7: warning: "CONFIG_HZ" is not defined
> include/linux/jiffies.h:22:7: warning: "CONFIG_HZ" is not defined
> include/linux/jiffies.h:24:7: warning: "CONFIG_HZ" is not defined
> include/linux/jiffies.h:26:7: warning: "CONFIG_HZ" is not defined
> include/linux/jiffies.h:28:7: warning: "CONFIG_HZ" is not defined
> include/linux/jiffies.h:30:7: warning: "CONFIG_HZ" is not defined

I suspect your build setup broke.  Try `make mrproper' then
have another go.
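The CONFIG_HZ warnings follow from a standard C preprocessor rule: inside #if, an identifier that is not a defined macro evaluates to 0. The minimal sketch below is loosely modeled on the HZ range checks in include/linux/jiffies.h. HZ_SKETCH is a hypothetical stand-in for CONFIG_HZ and is deliberately left undefined.

```c
/* HZ_SKETCH is deliberately not defined anywhere, so every #if below
 * evaluates it as 0. */
#if HZ_SKETCH >= 12 && HZ_SKETCH < 24
# define SHIFT_HZ_SKETCH 4
#elif HZ_SKETCH >= 24 && HZ_SKETCH < 48
# define SHIFT_HZ_SKETCH 5
#else
/* The real header has an #error in its final branch; this sketch records
 * the fall-through instead so that it still compiles. */
# define SHIFT_HZ_SKETCH 0
#endif

/* Returns the shift selected above; 0 means no range matched. */
static int shift_hz_sketch(void)
{
	return SHIFT_HZ_SKETCH;
}
```

Every range test fails when HZ is 0, so the real header falls through to "#error You lose.". Expressions that divide by HZ presumably turn into divisions by zero as well.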


Re: RFT: updatedb "morning after" problem [was: Re: -mm merge plans for 2.6.23]

2007-07-27 Thread Andi Kleen
> Any faults in that reasoning?

GNU sort uses a merge sort with temporary files on disk. I'm not sure
how much it keeps in memory during that, but it's probably less
than 150MB. At some point the dirty limit should kick in and write back the
data of the temporary files, so it's not quite the same as anonymous memory.
Still, it's not that different.

It would be better to measure than to guess. At least Andrew's measurements
on 128MB actually didn't show updatedb being really that big a problem.

Perhaps some people have much more files or simply a less efficient
updatedb implementation?

I guess the people who complain here that loudly really need to supply
some real numbers. 

-Andi


Re: Problems with framebuffer in 2.6.22-git17

2007-07-27 Thread Antonino A. Daplas
On Fri, 2007-07-27 at 23:25 +0100, Adrian McMenamin wrote:
> On 27/07/07, Antonino A. Daplas <[EMAIL PROTECTED]> wrote:
> > On Fri, 2007-07-27 at 21:18 +0100, Adrian McMenamin wrote:
> > > On 27/07/07, Adrian McMenamin <[EMAIL PROTECTED]> wrote:
> > >
> > > > With the patch reverted and 24bpp, it oopses before freezing (with two
> > > > odd looking boot logos on the screen):
> > > >
> > > Tested this further and it fails on:
> > >
> > > rev = fb_readl(par->mmio_base + 0x04);
> >
> > Doubtful if this line is the point of failure, this line is executed
> > only once, on initialization.
> 
> 
> par->mmio_base is corrupted in some way during the call to
> register_framebuffer - still investigating how/why.

Possibly. par->mmio_base is the last field in struct pvr2fb_par,
and the pseudo_palette comes right after it. The oops did not occur when
the pseudo_palette was written as u16, but it did when it was written as u32.
Memory alignment problems?

Try the patch I posted before, might help.

Tony 



Re: RFT: updatedb "morning after" problem [was: Re: -mm merge plans for 2.6.23]

2007-07-27 Thread Björn Steinbrink
On 2007.07.27 20:16:32 +0200, Rene Herman wrote:
> On 07/27/2007 07:45 PM, Daniel Hazelton wrote:
>
>> Updatedb or another process that uses the FS heavily runs on a users
>> 256MB P3-800 (when it is idle) and the VFS caches grow, causing memory
>> pressure that causes other applications to be swapped to disk. In the
>> morning the user has to wait for the system to swap those applications
>> back in.
>> Questions about it:
>> Q) Does swap-prefetch help with this? A) [From all reports I've seen (*)] 
>> Yes, it does. 
>
> No it does not. If updatedb filled memory to the point of causing swapping 
> (which noone is reproducing anyway) it HAS FILLED MEMORY and swap-prefetch 
> hasn't any memory to prefetch into -- updatedb itself doesn't use any 
> significant memory.
>
> Here's swap-prefetch's author saying the same:
>
> http://lkml.org/lkml/2007/2/9/112
>
> | It can't help the updatedb scenario. Updatedb leaves the ram full and
> | swap prefetch wants to cost as little as possible so it will never
> | move anything out of ram in preference for the pages it wants to swap
> | back in.
>
> Now please finally either understand this, or tell us how we're wrong.

Con might have been wrong there for boxes with really little memory.

My desktop box has fewer than 300k inodes in use (IIRC someone posted
"df -i" output showing 1 million inodes in use). Still, the memory
footprint of the "sort" process grows to about 50MB. Assuming that the
average filename length stays the same, that would mean 150MB for the
1 million inode case, just for the "sort" process.

Now, sort cannot produce any output before it has all of its input, so
that RSS usage persists at least as long as the VFS cache is growing due
to the ongoing search for files.

And all the memory that "sort" uses is really needed, because sort
has to output its results. So if there's memory pressure, the VFS
cache is likely to be dropped, because "sort" needs its data for
sorting and producing output. Then sort terminates and leaves that
whole lot of memory _unused_. The other actions of updatedb only touch
the locate db, which is just a few MB (4.5MB here), so the cache
won't grow that much again.

OK, so we end up with, say, at least 128MB of totally unused memory, maybe
even more. If you look at the vmstat output I sent, you'll see that I had
between 90MB and 128MB free, depending on the swappiness setting; with
more inodes in use, that could very well scale up.

Conclusion: updatedb does _not_ leave the RAM full. On a box with
little memory (say 256MB), 50% or more of the memory might even be
free after updatedb has run. Might that make swap prefetch kick in?


Any faults in that reasoning?

Thanks,
Björn


Re: [PATCH][sas] Fix potential NULL pointer dereference bug in sas_smp_get_phy_events()

2007-07-27 Thread Jesper Juhl


On 28/07/07, James Bottomley <[EMAIL PROTECTED]> wrote:
> On Fri, 2007-07-27 at 23:27 +0200, Jesper Juhl wrote:
> > In sas_smp_get_phy_events() we never test if the call to
> > alloc_smp_req(RPEL_REQ_SIZE) succeeds or fails. That means we run
> > the risk of dereferencing a NULL pointer if it does fail. Far
> > better to test if we got NULL back and in that case return -ENOMEM
> > just as we already do for the other memory allocation in that
> > function.
> > This patch reworks the memory allocation a bit to deal with it
> > (compile tested only).
> >
> >
> > Signed-off-by: Jesper Juhl <[EMAIL PROTECTED]>
> > ---
> >
> >  drivers/scsi/libsas/sas_expander.c |   11 +--
> >  1 files changed, 9 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
> > index b500f0c..85f5145 100644
> > --- a/drivers/scsi/libsas/sas_expander.c
> > +++ b/drivers/scsi/libsas/sas_expander.c
> > @@ -507,14 +507,21 @@ static int sas_dev_present_in_domain(struct asd_sas_port *port,
> >  int sas_smp_get_phy_events(struct sas_phy *phy)
> >  {
> >   int res;
> > + u8 *req;
> > + u8 *resp;
> >   struct sas_rphy *rphy = dev_to_rphy(phy->dev.parent);
> >   struct domain_device *dev = sas_find_dev_by_rphy(rphy);
> > - u8 *req = alloc_smp_req(RPEL_REQ_SIZE);
> > - u8 *resp = kzalloc(RPEL_RESP_SIZE, GFP_KERNEL);
> >
> > + resp = kzalloc(RPEL_RESP_SIZE, GFP_KERNEL);
> 
> Actually, this should be alloc_smp_resp(RPEL_RESP_SIZE);
> 
> >   if (!resp)
> >   return -ENOMEM;
> >
> > + req = alloc_smp_req(RPEL_REQ_SIZE);
> > + if (!req) {
> > + res = -ENOMEM;
> > + goto out;
> > + }
> 
> Just for the sake of being the same as all the rest of the code, the
> sequence should be
> 
> 	req = alloc_smp_req(xxx_REQ_SIZE);
> 	if (!req)
> 		return -ENOMEM;
>
> 	resp = alloc_smp_resp(xxx_RESP_SIZE);
> 	if (!resp) {
> 		kfree(req);
> 		return -ENOMEM;
> 	}
> 
> (allocate request then response).
> 
Fair enough. It makes the code a bit larger, though:

My way, as per the original patch:
   text    data     bss     dec     hex filename
  13820       0       8   13828    3604 drivers/scsi/libsas/sas_expander.o
Your way, as per this patch:
   text    data     bss     dec     hex filename
  13832       0       8   13840    3610 drivers/scsi/libsas/sas_expander.o

I hope this patch is acceptable:


In sas_smp_get_phy_events() we never test if the call to
alloc_smp_req(RPEL_REQ_SIZE) succeeds or fails. That means we run
the risk of dereferencing a NULL pointer if it does fail. Far
better to test if we got NULL back and in that case return -ENOMEM
just as we already do for the other memory allocation in that
function.
This patch should take care of it (compile tested only).


Signed-off-by: Jesper Juhl <[EMAIL PROTECTED]>
---

 drivers/scsi/libsas/sas_expander.c |   13 ++---
 1 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
index b500f0c..e98d2b9 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -507,14 +507,21 @@ static int sas_dev_present_in_domain(struct asd_sas_port *port,
 int sas_smp_get_phy_events(struct sas_phy *phy)
 {
 	int res;
+	u8 *req;
+	u8 *resp;
 	struct sas_rphy *rphy = dev_to_rphy(phy->dev.parent);
 	struct domain_device *dev = sas_find_dev_by_rphy(rphy);
-	u8 *req = alloc_smp_req(RPEL_REQ_SIZE);
-	u8 *resp = kzalloc(RPEL_RESP_SIZE, GFP_KERNEL);
 
-	if (!resp)
+	req = alloc_smp_req(RPEL_REQ_SIZE);
+	if (!req)
 		return -ENOMEM;
 
+	resp = alloc_smp_resp(RPEL_RESP_SIZE);
+	if (!resp) {
+		kfree(req);
+		return -ENOMEM;
+	}
+
 	req[1] = SMP_REPORT_PHY_ERR_LOG;
 	req[9] = phy->number;
 




> It looks like disc_resp could use a little love too (it's using the req
> alloc routines).
> 
I'll take a look at that later.


-- 
Jesper Juhl <[EMAIL PROTECTED]>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please  http://www.expita.com/nomime.html



Re: [2.6 patch] let SUSPEND select HOTPLUG_CPU

2007-07-27 Thread Adrian Bunk
On Sat, Jul 28, 2007 at 12:47:37AM +0200, Stefan Richter wrote:
> Adrian Bunk wrote:
> > The dependency of SUSPEND_SMP on HOTPLUG_CPU is quite unintuitive,
> 
> It's not entirely unintuitive.  That option's full name is "Support for
> suspend on SMP and hot-pluggable CPUs".
> 
> Only the place where you find the option is unintuitive, as far as its
> first application is concerned.  (It lives in the "Processor type and
> features" menu which is OK for the 2nd application of this option.)  And
> the variable name of that option is unintuitive because it covers only
> the 2nd application of the option, I suppose for historical reasons.

We can figure out ourselves when HOTPLUG_CPU is required, so there's no 
reason to bother the user with it.

> > +config SUSPEND_SMP_POSSIBLE
> > +   bool
> > +   depends on (X86 && !X86_VOYAGER) || (PPC64 && (PPC_PSERIES || PPC_PMAC))
> > +   depends on SMP
> > +   default y
> > +
> > +config SUSPEND_SMP
> > +   bool
> > +   depends on SUSPEND_SMP_POSSIBLE && SOFTWARE_SUSPEND
> > +   select HOTPLUG_CPU
> > +   default y
> 
> Yes, that's the price to pay if you want to select something that in
> turn depends on a number of other things.

Yes, but a good user interface is worth it.

> Wait, doesn't HOTPLUG_CPU also depend on EXPERIMENTAL?

Damn, I started thinking about it, and then forgot about it when 
finishing the patch.

My thoughts were:
Is HOTPLUG_CPU still an experimental feature, or has it become a
well-tested, no-longer-experimental feature now that it's used on
most recent laptops?

> Stefan Richter

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed



Re: Kernel modules compilation

2007-07-27 Thread Jesper Juhl
On 28/07/07, shacky <[EMAIL PROTECTED]> wrote:
> > Symbol: USB [=y]
> > Prompt: Support for Host-side USB
> >   Defined at drivers/usb/Kconfig:51
> >   Depends on: USB_SUPPORT && USB_ARCH_HAS_HCD
> >   Location:
> > -> Device Drivers
> >   -> USB support (USB_SUPPORT [=y])
>
> Could you tell me how you found them, please?
>

Some of them I just knew where to find from past experience, some of
them are logical (e.g. 'reiserfs' is obviously found in the
Filesystems submenu), some I searched for using "/", and a few I
googled.

> > Hint: In menuconfig, type "/" to search.
>
> Thank you very much!
>
You're welcome. By the way, if you had taken the time to read the
text at the top of the menuconfig interface, you'd have known this
already: "... Press <Esc><Esc> to exit, <?> for Help, </> for
Search."

> > Not really. Your distribution could be loading a ton of modules that
> > you don't really need. 'lsmod' will just show you what is currently
> > loaded, but that doesn't necessarily mean that all those modules
> > are really needed.
>
> OK, how can I know which modules are needed and which are not? Only by
> knowing the hardware of my system?
>
If you know the hardware of the system, the filesystems you use, etc.,
then it should be possible to deduce which modules you need.
Read the help text for each config option related to your modules and
think about whether or not you need it.

> Another question, please: what does the symbol "---" next to a kernel
> configuration entry in menuconfig mean? Is this entry activated (with
> * or M) or not?
>
It means that the option was automagically selected by some other
option you selected, so you can't disable it unless you first disable
that other option that selected it.

-- 
Jesper Juhl <[EMAIL PROTECTED]>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please  http://www.expita.com/nomime.html


Re: [PATCH][sas] Fix potential NULL pointer dereference bug in sas_smp_get_phy_events()

2007-07-27 Thread James Bottomley
On Fri, 2007-07-27 at 23:27 +0200, Jesper Juhl wrote:
> In sas_smp_get_phy_events() we never test if the call to 
> alloc_smp_req(RPEL_REQ_SIZE) succeeds or fails. That means we run 
> the risk of dereferencing a NULL pointer if it does fail. Far 
> better to test if we got NULL back and in that case return -ENOMEM 
> just as we already do for the other memory allocation in that 
> function.
> This patch reworks the memory allocation a bit to deal with it 
> (compile tested only).
> 
> 
> Signed-off-by: Jesper Juhl <[EMAIL PROTECTED]>
> ---
> 
>  drivers/scsi/libsas/sas_expander.c |   11 +--
>  1 files changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
> index b500f0c..85f5145 100644
> --- a/drivers/scsi/libsas/sas_expander.c
> +++ b/drivers/scsi/libsas/sas_expander.c
> @@ -507,14 +507,21 @@ static int sas_dev_present_in_domain(struct asd_sas_port *port,
>  int sas_smp_get_phy_events(struct sas_phy *phy)
>  {
>   int res;
> + u8 *req;
> + u8 *resp;
>   struct sas_rphy *rphy = dev_to_rphy(phy->dev.parent);
>   struct domain_device *dev = sas_find_dev_by_rphy(rphy);
> - u8 *req = alloc_smp_req(RPEL_REQ_SIZE);
> - u8 *resp = kzalloc(RPEL_RESP_SIZE, GFP_KERNEL);
>  
> + resp = kzalloc(RPEL_RESP_SIZE, GFP_KERNEL);

Actually, this should be alloc_smp_resp(RPEL_RESP_SIZE);

>   if (!resp)
>   return -ENOMEM;
>  
> + req = alloc_smp_req(RPEL_REQ_SIZE);
> + if (!req) {
> + res = -ENOMEM;
> + goto out;
> + }

Just for the sake of being the same as all the rest of the code, the
sequence should be

	req = alloc_smp_req(xxx_REQ_SIZE);
	if (!req)
		return -ENOMEM;

	resp = alloc_smp_resp(xxx_RESP_SIZE);
	if (!resp) {
		kfree(req);
		return -ENOMEM;
	}

(allocate request then response).
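The allocation order described here can be sketched in userspace. This is a minimal illustration, not the libsas code. calloc() and free() stand in for alloc_smp_req()/alloc_smp_resp() and kfree(). The sizes and the fail_resp_alloc switch are hypothetical.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical sizes standing in for RPEL_REQ_SIZE / RPEL_RESP_SIZE. */
#define REQ_SIZE_SKETCH  16
#define RESP_SIZE_SKETCH 32

/* Hypothetical switch so the error path can be exercised. */
static int fail_resp_alloc;

/* Userspace stand-ins for alloc_smp_req()/alloc_smp_resp(): zeroed buffers. */
static unsigned char *alloc_req_sketch(size_t size)
{
	return calloc(1, size);
}

static unsigned char *alloc_resp_sketch(size_t size)
{
	return fail_resp_alloc ? NULL : calloc(1, size);
}

/* Request first, then response; if the response allocation fails, the
 * request is freed before returning, so nothing leaks. */
static int get_buffers_sketch(unsigned char **req_out, unsigned char **resp_out)
{
	unsigned char *req, *resp;

	req = alloc_req_sketch(REQ_SIZE_SKETCH);
	if (!req)
		return -ENOMEM;

	resp = alloc_resp_sketch(RESP_SIZE_SKETCH);
	if (!resp) {
		free(req);
		return -ENOMEM;
	}

	*req_out = req;
	*resp_out = resp;
	return 0;
}
```

Setting fail_resp_alloc exercises the error path. That path frees the request before returning -ENOMEM and leaves the caller's pointers untouched.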

It looks like disc_resp could use a little love too (it's using the req
alloc routines).

James



Re: [2.6 patch] let SUSPEND select HOTPLUG_CPU

2007-07-27 Thread Linus Torvalds


On Sat, 28 Jul 2007, Adrian Bunk wrote:
> 
> The dependency of SUSPEND_SMP on HOTPLUG_CPU is quite unintuitive, so 
> what about something like the patch below?

Yeah, this looks reasonable.

May I suggest another level of indirection, though:

> +config SUSPEND_SMP_POSSIBLE
> + bool
> + depends on (X86 && !X86_VOYAGER) || (PPC64 && (PPC_PSERIES || PPC_PMAC))
> + depends on SMP
> + default y

How about making this a bit more split up, and do it as

# SMP suspend is possible on ..
config SUSPEND_SMP_POSSIBLE
	bool
	depends on (X86 && !X86_VOYAGER) || (PPC64 && (PPC_PSERIES || PPC_PMAC))
	default y

# UP suspend is possible on ..
config SUSPEND_UP_POSSIBLE
	bool
	depends on X86 || PPC64_SWSUSP || FRV || PPC32
	default y

# Can we suspend?
config SUSPEND_POSSIBLE
	bool
	depends on (SMP && SUSPEND_SMP_POSSIBLE) || (SUSPEND_UP_POSSIBLE && !SMP)
	default y

and then we have just a

config SOFTWARE_SUSPEND
	bool "Software Suspend (Hibernation)"
	depends on PM && SWAP
	depends on SUSPEND_POSSIBLE

config SUSPEND_SMP
	bool
	depends on SOFTWARE_SUSPEND && SMP
	select HOTPLUG_CPU
	default y

and now each of the config options looks pretty simple and describes one
thing.

[ For extra bonus points: the SUSPEND_POSSIBLE thing is still pretty 
  complicated, and it might actually be a better idea to make it a 
  per-arch config option, and just make the x86/arch say

config SUSPEND_POSSIBLE
	bool
	depends on !(X86_VOYAGER && SMP)
	default y

  instead: since SUSPEND_POSSIBLE is always true on x86 regardless of SMP 
  or not, just not on X86_VOYAGER. Then, each architecture can have its 
  own private rules for whether that architecture has SUSPEND_POSSIBLE or 
  not, so on ppc, it might look like

config SUSPEND_POSSIBLE
	bool
	depends on (PPC64 && (PPC_PSERIES || PPC_PMAC)) || PPC_SWSUSP
	default y

  or something, but the point is, now the complexity is a per-architecture 
  thing, so other architectures simply don't have to care any more! ]

And the user only ever sees one single question: the one for 
"SOFTWARE_SUSPEND". All the others would directly flow either from the 
architecture choice, or from that.

Anybody willing to rewrite it that way?

Linus


Re: RFT: updatedb "morning after" problem [was: Re: -mm merge plans for 2.6.23]

2007-07-27 Thread Daniel Hazelton
On Friday 27 July 2007 18:08:44 Mike Galbraith wrote:
> On Fri, 2007-07-27 at 13:45 -0400, Daniel Hazelton wrote:
> > On Friday 27 July 2007 06:25:18 Mike Galbraith wrote:
> > > On Fri, 2007-07-27 at 03:00 -0700, Andrew Morton wrote:
> > > > So hrm.  Are we sure that updatedb is the problem?  There are quite a
> > > > few heavyweight things which happen in the wee small hours.
> > >
> > > The balance in _my_ world seems just fine.  I don't let any of those
> > > system maintenance things run while I'm using the system, and it
> > > doesn't bother me if my working set has to be reconstructed after
> > > heavy-weight maintenance things are allowed to run.  I'm not seeing
> > > anything I wouldn't expect to see when running a job the size of
> > > updatedb.
> > >
> > >   -Mike
> >
> > Do you realize you've totally missed the point?
>
> Did you notice that I didn't make one disparaging remark about the patch
> or the concept behind it?   Did you notice that I took _my time_  to
> test, to actually look at  the problem?  No, you're too busy running
> your mouth to appreciate the efforts of others.

If you're done being an ass, take note of the fact that I never even said you 
were doing that. What I was commenting on was the fact that you (and a lot of 
the other developers) seem to keep saying "It doesn't happen here, so it 
doesn't matter!" - ie: If I don't see something happening, it doesn't matter.

> 
>
> Do yourself a favor, go dig into the VM source.  Read it, understand it
> (not terribly easy), _then_ come back and preach to me.

I've been trying to do that since the thread started. Note that you snipped 
where I said (and I'm going to paraphrase myself) "There is another way to 
fix this, but I don't have the understanding necessary".

Now, once more, I'm going to ask: What is so terribly wrong with swap
prefetch? Why does it seem that everyone against it says "It's treating a
symptom, so it can't go in"?

Try coming up with an answer that isn't "I don't see the problem on my $10K 
system" or similar - try explaining it based on the *technical* merits. Does 
it cause the processor cache to get thrashed? Does it create locking 
problems?

I stand by my statements, as vitriolic as you and Rene seem to want to get 
over it. So far in this thread I have not seen one bit of *technical* 
discussion over the merits, just the bits I've simplified and stated before.

> Have a nice day.

I am. You being nasty when somebody gets fed up with a line of BS doesn't stop
me from having a nice day. The only thing that could make my life any better
would be to have the questions I've asked answered, rather than having
supposedly intelligent people act like trolls.

DRH

-- 
Dialup is like pissing through a pipette. Slow and excruciatingly painful.


Re: Kernel modules compilation

2007-07-27 Thread shacky
> Symbol: USB [=y]
> Prompt: Support for Host-side USB
>   Defined at drivers/usb/Kconfig:51
>   Depends on: USB_SUPPORT && USB_ARCH_HAS_HCD
>   Location:
> -> Device Drivers
>   -> USB support (USB_SUPPORT [=y])

Could you tell me how you found them, please?

> Hint: In menuconfig, type "/" to search.

Thank you very much!

> Not really. Your distribution could be loading a ton of modules that
> you don't really need. 'lsmod' will just show you what is currently
> loaded, but that doesn't necessarily mean that all those modules
> are really needed.

OK, how can I know which modules are needed and which are not? Only by
knowing the hardware of my system?

Another question, please: what does the symbol "---" next to a kernel
configuration entry in menuconfig mean? Is this entry activated (with
* or M) or not?


UML compile error

2007-07-27 Thread Gabriel C
Hi,

UML does not compile on current git head. 


$ make defconfig ARCH=um
[..]
$ make  ARCH=um
scripts/kconfig/conf -s arch/um/Kconfig
net/bluetooth/hidp/Kconfig:4:warning: 'select' used by config symbol 'BT_HIDP' refers to undefined symbol 'HID'
drivers/net/wireless/Kconfig:552:warning: 'select' used by config symbol 'RTL8187' refers to undefined symbol 'EEPROM_93CX6'
  SYMLINK arch/um/include/kern_constants.h
  CHK arch/um/include/uml-config.h
  UPD arch/um/include/uml-config.h
  CC  arch/um/sys-i386/user-offsets.s
  CHK arch/um/include/user_constants.h
  CHK include/linux/version.h
  CHK include/linux/utsrelease.h
  CC  arch/um/kernel/asm-offsets.s
In file included from include/linux/sched.h:54,
 from arch/um/include/sysdep/kernel-offsets.h:2,
 from arch/um/kernel/asm-offsets.c:1:
include/linux/jiffies.h:18:5: warning: "CONFIG_HZ" is not defined
include/linux/jiffies.h:20:7: warning: "CONFIG_HZ" is not defined
include/linux/jiffies.h:22:7: warning: "CONFIG_HZ" is not defined
include/linux/jiffies.h:24:7: warning: "CONFIG_HZ" is not defined
include/linux/jiffies.h:26:7: warning: "CONFIG_HZ" is not defined
include/linux/jiffies.h:28:7: warning: "CONFIG_HZ" is not defined
include/linux/jiffies.h:30:7: warning: "CONFIG_HZ" is not defined
include/linux/jiffies.h:33:3: error: #error You lose.
include/linux/jiffies.h:225:31: warning: "CONFIG_HZ" is not defined
include/linux/jiffies.h:225:31: warning: "CONFIG_HZ" is not defined
include/linux/jiffies.h:225:31: error: division by zero in #if
[.. the same two warnings and error repeat 15 more times ..]
include/linux/jiffies.h:225:46: warning: "SHIFT_HZ" is not defined
In file included from arch/um/include/sysdep/kernel-offsets.h:2,
 from arch/um/kernel/asm-offsets.c:1:
include/linux/sched.h: In function 'dequeue_signal_lock':
include/linux/sched.h:1501: error: implicit declaration of function 'local_irq_save'
include/linux/sched.h:1503: error: implicit declaration of function 'local_irq_restore'
In file included f

Re: [2.6 patch] let SUSPEND select HOTPLUG_CPU

2007-07-27 Thread Stefan Richter
Adrian Bunk wrote:
> The dependency of SUSPEND_SMP on HOTPLUG_CPU is quite unintuitive,

It's not entirely unintuitive.  That option's full name is "Support for
suspend on SMP and hot-pluggable CPUs".

Only the place where you find the option is unintuitive, as far as its
first application is concerned.  (It lives in the "Processor type and
features" menu which is OK for the 2nd application of this option.)  And
the variable name of that option is unintuitive because it covers only
the 2nd application of the option, I suppose for historical reasons.

> +config SUSPEND_SMP_POSSIBLE
> + bool
> + depends on (X86 && !X86_VOYAGER) || (PPC64 && (PPC_PSERIES || PPC_PMAC))
> + depends on SMP
> + default y
> +
> +config SUSPEND_SMP
> + bool
> + depends on SUSPEND_SMP_POSSIBLE && SOFTWARE_SUSPEND
> + select HOTPLUG_CPU
> + default y

Yes, that's the price to pay if you want to select something that in
turn depends on a number of other things.

Wait, doesn't HOTPLUG_CPU also depend on EXPERIMENTAL?
-- 
Stefan Richter
-=-=-=== -=== ===--
http://arcgraph.de/sr/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.23-rc1-mm1 - seems OK on Dell Latitude D820, except for tpm_tis

2007-07-27 Thread Bjorn Helgaas
On Friday 27 July 2007 07:28:09 am [EMAIL PROTECTED] wrote:
> Looks like the problematic code is in tpm_tis.c tpm_tis_init() near here:
> 
> 	for (i = 3; i < 16 && chip->vendor.irq == 0; i++) {
> 		iowrite8(i, chip->vendor.iobase +
> 			 TPM_INT_VECTOR(chip->vendor.locality));
> 		if (request_irq
> 		    (i, tis_int_probe, IRQF_SHARED,
> 		     chip->vendor.miscdev.name, chip) != 0) {
> 			dev_info(chip->dev,
> 				 "Unable to request irq: %d for probe\n",
> 				 i);
> 			continue;
> 		}
> 
> This seems to be misbehaving differently for the two different DEBUG_SHIRQ
> cases.
> 
> With DEBUG_SHIRQ=n, it starts at IRQ3, gets to at least 8 (where it complains
> it can't request it for probing), and possibly all the way to 15, without ever
> actually selecting and assigning an IRQ (to refresh memories, in that range
> /proc/interrupts only lists:
> 
>   8:  0  0   IO-APIC-edge  rtc
>   9:  3  0   IO-APIC-fasteoi   acpi
>  12: 94  0   IO-APIC-edge  i8042
>  14: 148166  0   IO-APIC-edge  libata
>  15: 94  0   IO-APIC-edge  libata
> 
> So there are certainly IRQs available.  No idea why it doesn't choose one. But
> since it never chose one, it never gets into the "wait for the IRQ" protected
> by 'if (chip->vendor.irq)' at the end of tpm_tis_send.
> 
> With DEBUG_SHIRQ=y, it starts at IRQ3 and assigns it (which seems a good
> thing).
> Unfortunately, this then hits the timeouts in tpm_tis_send.
> 
> Anybody got an idea what *should* be happening here?

I don't know why tpm_tis_init() is messing around trying different
IRQs between 3 and 16.  That looks suspiciously x86-dependent.

Maybe if you don't have PNP (though I doubt TPMs exist on any
pre-PNPBIOS machines) the "check-IRQ" loop would be necessary.

But you're using the PNP probe, and PNP should just tell you what
IRQ the device is configured for (and whether the IRQ can be
shared -- see 8250_pnp.c for an example).

The BIOS should have configured the TPM IRQ, and if we go and
mess with that IRQ setting without going through the PNP interface,
e.g., the ACPI _SRS method, we're liable to mess something up.  The
TPM is often behind a few bridges, and if the bridge has any IRQ
routing configuration, only the BIOS knows how to keep that in
sync with the TPM IRQ configuration.
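A minimal userspace sketch of the ordering suggested above: use the IRQ that PNP reports and only fall back to the 3..15 probe loop when PNP reports none. The `irq_probe_fn` hook and both stub probes are hypothetical stand-ins for the request_irq()/test-interrupt sequence in tpm_tis_init(); following the quoted tpm_tis_send() logic, 0 is taken to mean "no IRQ", i.e. polled operation. This is an illustration of the policy, not the tpm_tis code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical probe hook: returns true if `irq` could be requested and the
 * device's test interrupt actually arrived on it. */
typedef bool (*irq_probe_fn)(int irq);

/* Policy sketch: trust the IRQ the BIOS configured (as reported through PNP)
 * and only probe legacy IRQs 3..15 when PNP gave us nothing. */
static int choose_tpm_irq(int pnp_irq, irq_probe_fn probe)
{
	if (pnp_irq > 0)
		return pnp_irq;         /* don't reprogram what the BIOS set up */

	for (int irq = 3; irq < 16; irq++)
		if (probe(irq))
			return irq;

	return 0;                       /* no IRQ: driver stays in polled mode */
}

/* Example stubs for trying the policy out (hypothetical hardware behaviour). */
static bool probe_only_irq5(int irq) { return irq == 5; }
static bool probe_never(int irq) { (void)irq; return false; }
```

For example, choose_tpm_irq(7, probe_only_irq5) returns 7 without probing at all, while choose_tpm_irq(0, probe_only_irq5) reaches the loop only because no PNP IRQ was supplied.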

> Just for the record, I see this in /sys:
> 
> % cat /sys/bus/pnp/devices/00:0e/id
> BCM0102
> PNP0c31

What's in /sys/bus/pnp/devices/00:0e/resources?

Bjorn



Re: [PATCH] ide: sis5513.c: Add FSC Amilo A1630 PCI subvendor/dev to laptops

2007-07-27 Thread Alan Cox
On Fri, 27 Jul 2007 22:52:43 +0200
David Lamparter <[EMAIL PROTECTED]> wrote:

> [PATCH] ide: sis5513.c: Add FSC Amilo A1630 PCI subvendor/dev to laptops
> 
> Recognise the FSC Amilo A1630's incarnation of a SiS5513 chip as laptop to
> get UDMA100 support.
> 
> Signed-off-by: David Lamparter <[EMAIL PROTECTED]>

Looks good to me - I've made a matching update to drivers/ata/pata_sis.c
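The patch body isn't quoted in this reply. Assuming, as the subject line suggests, that the driver keeps a list of laptops identified by PCI subsystem vendor/device IDs, recognising a new machine amounts to adding one table row. A minimal sketch of such a lookup follows; the struct, the function and the IDs are hypothetical placeholders, not the code or values from sis5513.c or pata_sis.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One laptop entry: the board's PCI subsystem vendor and subsystem device IDs. */
struct laptop_id {
	uint16_t subvendor;
	uint16_t subdevice;
};

/* Hypothetical placeholder IDs, NOT the real FSC Amilo A1630 values. */
static const struct laptop_id known_laptops[] = {
	{ 0xAAAA, 0x0001 },  /* hypothetical board A */
	{ 0xBBBB, 0x0002 },  /* hypothetical board B */
};

/* Returns true if the subsystem IDs match a known laptop entry. */
static bool is_known_laptop(uint16_t subvendor, uint16_t subdevice)
{
	for (size_t i = 0; i < sizeof(known_laptops) / sizeof(known_laptops[0]); i++) {
		if (known_laptops[i].subvendor == subvendor &&
		    known_laptops[i].subdevice == subdevice)
			return true;
	}
	return false;
}
```

Both halves of the ID must match, so a different board from the same vendor is not treated as a laptop.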


RE: [PATCH] ia64: fix a few section mismatch warnings

2007-07-27 Thread Luck, Tony
-   mca_data = alloc_bootmem(sizeof(struct ia64_mca_cpu)
-* NR_CPUS + KERNEL_STACK_SIZE);
+   mca_data = mca_bootmem(NR_CPUS + KERNEL_STACK_SIZE);

Oops.  You moved the multiply by sizeof(struct ia64_mca_cpu) up into
the mca_bootmem() function to make it very specific to this use. But
multiply has higher precedence than addition.
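To make the precedence problem concrete, here is a small sketch with made-up sizes: NR_CPUS = 4, KERNEL_STACK_SIZE = 32768 and a 1 KiB stand-in for struct ia64_mca_cpu are assumptions, not the real ia64 values. The helper functions are hypothetical and only model the one-argument call shown in the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real values come from the ia64 configuration. */
#define NR_CPUS           4
#define KERNEL_STACK_SIZE 32768

struct ia64_mca_cpu_stub { char pad[1024]; };  /* hypothetical 1 KiB per-CPU area */

/* Original expression: C parses it as (sizeof * NR_CPUS) + KERNEL_STACK_SIZE. */
static size_t original_size(void)
{
	return sizeof(struct ia64_mca_cpu_stub) * NR_CPUS + KERNEL_STACK_SIZE;
}

/* Buggy helper: the multiply moved inside, so the caller's whole argument,
 * NR_CPUS + KERNEL_STACK_SIZE, gets scaled by the struct size. */
static size_t buggy_helper_size(size_t n)
{
	return sizeof(struct ia64_mca_cpu_stub) * n;
}

/* One way to keep a helper: pass the two terms separately. */
static size_t fixed_helper_size(size_t ncpus, size_t extra)
{
	return sizeof(struct ia64_mca_cpu_stub) * ncpus + extra;
}
```

With these numbers original_size() is 36864 bytes, while buggy_helper_size(NR_CPUS + KERNEL_STACK_SIZE) asks for 1024 * 32772 = 33558528 bytes. Passing the CPU count and the extra stack size separately, as fixed_helper_size() does, keeps the original arithmetic.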

-Tony


Re: Problems with framebuffer in 2.6.22-git17

2007-07-27 Thread Adrian McMenamin
On 27/07/07, Antonino A. Daplas <[EMAIL PROTECTED]> wrote:
> On Fri, 2007-07-27 at 21:18 +0100, Adrian McMenamin wrote:
> > On 27/07/07, Adrian McMenamin <[EMAIL PROTECTED]> wrote:
> >
> > > With the patch reverted and 24bpp, it oopses before freezing (with two
> > > odd looking boot logos on the screen):
> > >
> > Tested this further and it fails on:
> >
> > rev = fb_readl(par->mmio_base + 0x04);
>
> I doubt this line is the point of failure; it is executed
> only once, on initialization.


par->mmio_base is corrupted in some way during the call to
register_framebuffer - still investigating how/why.


[2.6 patch] let SUSPEND select HOTPLUG_CPU

2007-07-27 Thread Adrian Bunk
On Thu, Jul 26, 2007 at 01:55:18PM -0700, Linus Torvalds wrote:
> 
> 
> On Thu, 26 Jul 2007, Rafael J. Wysocki wrote:
> > 
> > My point is we have ACPI dependent on PM, so if you want ACPI, you end
> > up with all of the STR stuff built in, which is what you don't like (if I
> > understand that correctly).  If we have CONFIG_SUSPEND, you'll be able to
> > choose ACPI alone. :-)
> 
> Good point. 
> 
> Anyway, I think the ACPI problem really is as trivial as the following 
> three-liner removal fix. If the user doesn't want suspend, ACPI shouldn't 
> force it on him.
> 
> A nicer fix might be to also make some of the ACPI helper routines depend 
> on whether they are needed or not (which in turn will depend on whether 
> suspend support has been compiled into the kernel), but quite frankly, 
> that's secondary at least for me.
> 
> So if we have a few ACPI routines that will never get called (because we 
> don't even enable the interfaces that would *cause* them to be called), I 
> don't think that's a huge problem. It's a beauty wart, but nobody really 
> cares (and it's even something that we could get the compiler to optimize 
> away for us if we really cared).
> 
>   Linus
> 
> ---
> Don't force-enable suspend/hibernate support just for ACPI
> 
> It's a totally independent decision for the user whether he wants
> suspend and/or hibernation support, and ACPI shouldn't care.
> 
> Signed-off-by: Linus Torvalds <[EMAIL PROTECTED]>
> ---
>  drivers/acpi/Kconfig |3 ---
>  1 files changed, 0 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
> index 251344c..22b401b 100644
> --- a/drivers/acpi/Kconfig
> +++ b/drivers/acpi/Kconfig
> @@ -11,9 +11,6 @@ menuconfig ACPI
>   depends on PCI
>   depends on PM
>   select PNP
> - # for sleep
> - select HOTPLUG_CPU if X86 && SMP
> - select SUSPEND_SMP if X86 && SMP
>   default y
>   ---help---
> Advanced Configuration and Power Interface (ACPI) support for 

The dependency of SUSPEND_SMP on HOTPLUG_CPU is quite unintuitive, so 
what about something like the patch below?

This should address a main issue behind Len's patch.

cu
Adrian


<--  snip  -->


An implementation detail of the suspend code that is not intuitive for 
the user is the HOTPLUG_CPU dependency of SOFTWARE_SUSPEND if SMP.

This patch changes SOFTWARE_SUSPEND if SMP to select HOTPLUG_CPU instead 
of depending on it.
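To illustrate the difference, here is a deliberately simplified boolean model in C. It is not real Kconfig: arch checks and PM/SWAP are left out, and HOTPLUG_CPU depending on EXPERIMENTAL is taken as an assumption from Stefan's question. It shows why the old `depends on` chain hides SOFTWARE_SUSPEND on SMP until HOTPLUG_CPU is enabled, and that `select` forces HOTPLUG_CPU on without checking HOTPLUG_CPU's own dependencies.

```c
#include <assert.h>
#include <stdbool.h>

/* What the user asked for, plus the global options that matter here. */
struct choices {
	bool smp;
	bool experimental;            /* assumed dependency of HOTPLUG_CPU */
	bool want_hotplug_cpu;
	bool want_software_suspend;
};

struct outcome {
	bool hotplug_cpu;
	bool suspend_smp;
	bool software_suspend;
};

/* Before the patch: on SMP, SOFTWARE_SUSPEND needs SUSPEND_SMP, which
 * "depends on" HOTPLUG_CPU, so the user must find and enable that first. */
static struct outcome before_patch(struct choices c)
{
	struct outcome o;
	o.hotplug_cpu = c.want_hotplug_cpu && c.smp && c.experimental;
	o.suspend_smp = o.hotplug_cpu;                  /* default y once deps hold */
	o.software_suspend = c.want_software_suspend && (!c.smp || o.suspend_smp);
	return o;
}

/* After the patch: SUSPEND_SMP "select"s HOTPLUG_CPU, which forces it on
 * without looking at HOTPLUG_CPU's own dependencies. */
static struct outcome after_patch(struct choices c)
{
	struct outcome o;
	bool suspend_smp_possible = c.smp;              /* arch checks omitted */
	o.software_suspend = c.want_software_suspend && (!c.smp || suspend_smp_possible);
	o.suspend_smp = suspend_smp_possible && o.software_suspend;
	o.hotplug_cpu = (c.want_hotplug_cpu && c.smp && c.experimental) || o.suspend_smp;
	return o;
}
```

For example, with SMP=y, EXPERIMENTAL=n and only hibernation requested, before_patch() leaves software_suspend false, while after_patch() enables it together with hotplug_cpu, which is the situation Stefan's EXPERIMENTAL question points at.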

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

---

 kernel/power/Kconfig |   20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)

--- a/kernel/power/Kconfig
+++ b/kernel/power/Kconfig
@@ -72,9 +72,22 @@ config PM_TRACE
CAUTION: this option will cause your machine's real-time clock to be
set to an invalid time after a resume.
 
+config SUSPEND_SMP_POSSIBLE
+   bool
+   depends on (X86 && !X86_VOYAGER) || (PPC64 && (PPC_PSERIES || PPC_PMAC))
+   depends on SMP
+   default y
+
+config SUSPEND_SMP
+   bool
+   depends on SUSPEND_SMP_POSSIBLE && SOFTWARE_SUSPEND
+   select HOTPLUG_CPU
+   default y
+
 config SOFTWARE_SUSPEND
bool "Software Suspend (Hibernation)"
-   depends on PM && SWAP && (((X86 || PPC64_SWSUSP) && (!SMP || SUSPEND_SMP)) || ((FRV || PPC32) && !SMP))
+   depends on PM && SWAP
+   depends on ((X86 || PPC64_SWSUSP || FRV || PPC32) && !SMP) || SUSPEND_SMP_POSSIBLE
---help---
  Enable the suspend to disk (STD) functionality, which is usually
  called "hibernation" in user interfaces.  STD checkpoints the
@@ -132,11 +145,6 @@ config PM_STD_PARTITION
  suspended image to. It will simply pick the first available swap 
  device.
 
-config SUSPEND_SMP
-   bool
-   depends on HOTPLUG_CPU && (X86 || PPC64) && PM
-   default y
-
 config APM_EMULATION
tristate "Advanced Power Management Emulation"
depends on PM && SYS_SUPPORTS_APM_EMULATION



Re: [RFC/RFT 0/5] Input locking patches

2007-07-27 Thread Indan Zupancic
On Tue, July 24, 2007 06:45, Dmitry Torokhov wrote:
> Hi everyone,
>
> I finally managed to put together some patches implementing
> locking in input core and main input handles. Please look
> over them and give them a spin.

Since kernel 2.6.21 or so I have been annoyed by a warping mouse, and
one kernel version later also by "stuck" keys, causing repeated input
at the most inconvenient moments (e.g. when opening a program by
pressing F1).

As it happened irregularly and unpredictably, it was hard to debug,
and I suspected faulty hardware. My CPU was quite hot, but after
removing all the dust it seems all right again. Unfortunately, that
was about the same time I upgraded to 2.6.23-rc1, so all I can say
is that the stuck key problem seems to be gone (though I'm not sure
what fixed it), and that neither the cleaning nor the upgrade fixed
the warping mouse problem.

I've been running these locking patches for two days now and the
mouse doesn't warp any more (they may also have fixed the stuck key
problem; I'm not sure). Normally it would warp several times a day,
and that hasn't happened yet, so I'm tempted to praise your patches.

Sorry for the babbling, just wanted to say that I've tested these
patches and that they seem to fix real problems.

Thanks,

Indan




Re: Edgeport UPS Monitoring Problems

2007-07-27 Thread Andrew Morton
On Fri, 27 Jul 2007 13:37:08 -0700
Nick Pasich <[EMAIL PROTECTED]> wrote:

> 
> Greg/Peter/Al,

added linux-usb-devel.

> I've been using the edgeport 4 port USB to Serial Converter 
> to monitor APC Smart UPSes via apcupsd for quite a while on 
> various Linux boxes.
> 
> I just upgraded to Kernel Version 2.6.22.1 from 2.6.20.6 on a 
> couple of systems and both the edgeports stopped communicating.
> 
> I tried applying various patches, "PATCH 026/149" and "PATCH 082/149" 
> and one by Alan Cox, but they didn't fix the problem.
> 
> I copied the 2.6.20.6 edgeport module sources to the new 
> 2.6.22.1 tree and everything works again.
> 
>   linux/drivers/usb/serial/io_edgeport.c
>   linux/drivers/usb/serial/io_edgeport.h
>   linux/drivers/usb/serial/io_edgeport.mod.c
>   linux/drivers/usb/serial/io_tables.h
> 
> 
> I thought you guys ought to be aware of this.
> 

Straightforward regression, most serious.  Thanks for reporting it.


Problems with reading DVD using 2.6.21.5

2007-07-27 Thread Manuel Reimer

Hello,

Today I tried to install Slackware 12.0.

As the installer just "skipped" some install steps, I tried to find the 
error.


The problem seems to be unreadable parts on the DVD:

http://pastebin.com/f381e8a88

But the DVD is OK: I checked its MD5 sum directly from the disc on the
same system, using the same DVD drive.


dmesg says:

http://pastebin.com/f63c5c389

The kernel used on the Slackware setup disk is built with SMP support, but
my hardware doesn't support SMP (dmesg shows an error about it). Could this
(an SMP kernel on a non-SMP system) cause such bugs?


Is this a known bug? How could code that breaks DVD access get into the
stable 2.6.21.5 release?


Thanks very much in advance

CU

Manuel



Re: swap-prefetch: A smart way to make good use of idle resources (was: updatedb)

2007-07-27 Thread Indan Zupancic
On Sat, July 28, 2007 00:06, Arjan van de Ven wrote:
> On Fri, 2007-07-27 at 23:51 +0200, Indan Zupanci
>> > also, they take up seek time (5 to 10 msec), so if you were to read
>> > something else at the time you get additional latency.
>>
>> If there's other disk activity swap prefetch shouldn't do much, so this isn't
>> really true.
>
> how do you know there will be other activity? You start the IO and that
> basically blacks out the disk for 5 to 10 ms. If the "real" IO gets
> submitted in that time you add latency. You cannot predict that IO
> happening or not happening.

Ah, I see what you mean. Yes, you're right about that (though NCQ might help
there?), but that's true for all disk activity. Still, I think swap prefetch
didn't want to run when there was CPU activity, which would reduce the chance
that new IO is submitted right at that moment. In practice I don't think this
is worth worrying about; the real issue is the extra disk activity in the
first place.
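A back-of-the-envelope version of the latency argument, under assumptions made only for illustration: a prefetch seek of S ms cannot be preempted, and a "real" request arrives at a uniformly random moment. If it lands inside a seek, the remaining seek time is uniform on [0, S], so it waits S/2 ms on average and S ms at worst. Weighting by the fraction of time prefetch keeps the disk seeking gives the overall average, which is why the amount of extra disk activity matters.

```c
#include <assert.h>

/* Expected extra wait (ms) for a request that arrives uniformly at random
 * during a non-preemptible seek of seek_ms: the mean of U(0, seek_ms). */
static double expected_added_latency_ms(double seek_ms)
{
	return seek_ms / 2.0;
}

/* Worst case: the request arrives just as the seek starts. */
static double worst_added_latency_ms(double seek_ms)
{
	return seek_ms;
}

/* Overall average if prefetch keeps the disk seeking a fraction
 * busy_fraction of the time and arrivals are independent of it. */
static double mean_added_latency_ms(double seek_ms, double busy_fraction)
{
	return busy_fraction * expected_added_latency_ms(seek_ms);
}
```

With the 5 to 10 ms seek times quoted above, expected_added_latency_ms() gives 2.5 to 5 ms per collision; if prefetch kept the disk seeking 25% of the time, mean_added_latency_ms(10.0, 0.25) comes to 1.25 ms.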

Greetings,

Indan




Re: [RFC] scheduler: improve SMP fairness in CFS

2007-07-27 Thread hui
On Fri, Jul 27, 2007 at 12:03:28PM -0700, Tong Li wrote:
> Thanks for the interest. Attached is a design doc I wrote several months 
> ago (with small modifications). It talks about the two pieces of my design: 
> group scheduling and dwrr. The description was based on the original O(1) 
> scheduler, but as my CFS patch showed, the algorithm is applicable to other 
> underlying schedulers as well. It's interesting that I started working on 
> this in January for the purpose of eventually writing a paper about it. So 
> I knew reasonably well the related research work but was totally unaware 
> that people in the Linux community were also working on similar things. 
> This is good. If you are interested, I'd like to help with the algorithms 
> and theory side of the things.

Tong,

This is sufficient as an overview of the algorithm, but I believe it is not
detailed enough to be a discussable design doc. You should ask Chris what he
means by this.

Some examples of your rebalancing scheme and how your invariant applies
across processor rounds would be helpful for me and possibly others as well.

bill


