Re: [PATCH 7/7] driver-core : convert semaphore to mutex in struct class

2008-01-18 Thread Jarek Poplawski
On Fri, Jan 18, 2008 at 03:48:02PM +0800, Dave Young wrote:
> On Jan 18, 2008 3:38 PM, Jarek Poplawski <[EMAIL PROTECTED]> wrote:
...
> > IMHO, it would be nice to get the real state of current lockdep
> > problems here to figure out if there is any chance to do this right &
> > without any warnings with current lockdep. If I got it right from
> > earlier threads it might be impossible with USB, at least.
> 
> I don't think so. USB isn't affected by the struct class mutex; it
> only uses the lock of struct device. As I replied before, the lockdep
> issue exists only between class_interface and class_device.

OK, but I meant the possibility of changing their own semaphores later.

> > So, since I think these nesting levels seem to be wrong in 7/7 patch,
> > maybe it's better to exclude it from this patchset, and to try this as
> > testing for some time.
> 
> I may file the updated patch with more nesting changes and test it of
> course. Actually I should have done it, thanks.
...
> 1) Using CLASS_NORMAL/CLASS_PARENT/CLASS_CHILD will be enough.
> or
> 2) Simply add SINGLE_DEPTH_NESTING in class_device_add and other
> class_device functions because it is the only possible nest-lock place
> as far as I know.

Is SINGLE_DEPTH_NESTING enough? (That means 2 levels in total.)

I think you should care more about the real (logical) relations here than
about what's enough to get rid of lockdep warnings.

Since there aren't many of these changes, you can try both
variants. I'll be glad to look at this - maybe I'll manage to figure
out, BTW, what it's all about...

Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86_32: remove the useless NR_syscalls macro

2008-01-18 Thread Ingo Molnar

* Dmitri Vorobiev <[EMAIL PROTECTED]> wrote:

> This is against current x86.git.
> 
> The size of the system call table for 32-bit x86 kernels is obtained
> by compile-time calculation of the size of the sys_call_table array,
> not from the value that the NR_syscalls macro expands to. This trivial
> patch removes the fossil macro.
> 
> Manually tested by grepping the x86 files for the "NR_syscalls" 
> string. No relevant use cases found.
> 
> Build-tested using allyesconfig, allnoconfig and a couple of 
> randconfig instances. All builds successfully finished.
> 
> Runtime test performed using a stripped-down Debian-ish config. The 
> system booted successfully.

thanks, applied.
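
For readers following the archive, here is a minimal userspace sketch of the idea described in the patch: letting the compiler derive the number of entries from the table itself instead of maintaining a separate count macro. Every name below (demo_call_table, DEMO_NR_CALLS, demo_dispatch, and the handlers) is hypothetical and chosen for illustration; this is not the kernel's actual table or dispatch code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical handler type; the kernel's real handlers differ. */
typedef long (*demo_call_t)(void);

static long demo_restart(void) { return 0; }
static long demo_exit(void)    { return 1; }
static long demo_fork(void)    { return 2; }

/* Illustrative table standing in for a system call table. */
static const demo_call_t demo_call_table[] = {
    demo_restart,
    demo_exit,
    demo_fork,
};

/* Entry count computed at compile time from the array itself,
 * so no hand-maintained count can drift out of sync with it. */
#define DEMO_NR_CALLS (sizeof(demo_call_table) / sizeof(demo_call_table[0]))

/* Expose the computed count. */
size_t demo_nr_calls(void)
{
    return DEMO_NR_CALLS;
}

/* Bounds-checked dispatch; returns -1 for an out-of-range number. */
long demo_dispatch(size_t nr)
{
    if (nr >= DEMO_NR_CALLS)
        return -1;
    assert(demo_call_table[nr] != NULL);
    return demo_call_table[nr]();
}
```

Adding a fourth handler to the table changes DEMO_NR_CALLS automatically, which is why a separately maintained count macro becomes redundant under this scheme.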

Ingo


Re: [PATCH] x86: clean arch/[i386|x86_64] in make mrproper

2008-01-18 Thread Michael Opdenacker
On 01/17/2008 02:37 PM, Michael Opdenacker wrote:
>
> Proposed fix: add arch/i386 and arch/x86_64 to the list of
> directories cleaned by "make mrproper"
>
> Cleaner solution: stop creating these symbolic links, but this
> could cause issues with scripts still expecting bzImage in
> arch/i386/boot/bzImage or in arch/x86_64/boot/bzImage.
>
> What do you think? I can submit another patch for the second
> option.
>
> Michael.
>
> Signed-off-by: Michael Opdenacker <[EMAIL PROTECTED]>
>
> --- linux-2.6.24-rc8-git1/Makefile	2008-01-17 09:54:22.0 +0100
> +++ linux-2.6.24-rc8-git1-mrproper-x86/Makefile	2008-01-17 10:49:19.0 +0100
> @@ -1088,7 +1088,7 @@
>  		.tmp_kallsyms* .tmp_version .tmp_vmlinux* .tmp_System.map
>
>  # Directories & files removed with 'make mrproper'
> -MRPROPER_DIRS  += include/config include2 usr/include
> +MRPROPER_DIRS  += include/config include2 usr/include arch/i386 arch/x86_64
>  MRPROPER_FILES += .config .config.old include/asm .version .old_version \
>                    include/linux/autoconf.h include/linux/version.h      \
>                    include/linux/utsrelease.h                            \
>
The problem is still there in 2.6.24-rc8-git2. In my opinion, it's a
significant bug in the kernel build system that "make mrproper" and
"make distclean" don't remove all generated files. 2.6.24 shouldn't
ship with this bug.

What do you think?

Cheers,

Michael.


-- 
Michael Opdenacker, Free Electrons
Free Embedded Linux Training Materials
on http://free-electrons.com/training
(More than 1500 pages!)



Re: [PATCH] X86: fix typo PAT to X86_PAT

2008-01-18 Thread Ingo Molnar

* Yinghai Lu <[EMAIL PROTECTED]> wrote:

>  config MTRR
>   bool "MTRR (Memory Type Range Register) support"
> - depends on !PAT
> + depends on !X86_PAT
>   ---help---
> On Intel P6 family processors (Pentium Pro, Pentium II and later)
> the Memory Type Range Registers (MTRRs) may be used to control

thanks. But, i think we should rather do the following: if X86_PAT is
enabled then /proc/mtrr should be read-only. There's no problem
_looking_ at MTRR contents, as long as we do not try to modify them. Hm?

Ingo


Re: [PATCH] x86_64: remove redundant cpu_has_ definitions

2008-01-18 Thread Ingo Molnar

* Kyle McMartin <[EMAIL PROTECTED]> wrote:

> --- a/include/asm-x86/cpufeature.h
> +++ b/include/asm-x86/cpufeature.h
> @@ -195,21 +195,6 @@
>  #undef  cpu_has_centaur_mcr
>  #define cpu_has_centaur_mcr  0
>  
> -#undef  cpu_has_pse
> -#define cpu_has_pse  1
> -
> -#undef  cpu_has_pge
> -#define cpu_has_pge  1
> -
> -#undef  cpu_has_xmm
> -#define cpu_has_xmm  1
> -
> -#undef  cpu_has_xmm2
> -#define cpu_has_xmm2 1
> -
> -#undef  cpu_has_fxsr
> -#define cpu_has_fxsr 1
> -

thanks, applied.

Ingo


Re: 2.6.24-rc8-mm1 Build Failure at scripts/mkubooting/crc32.c

2008-01-18 Thread Sam Ravnborg
On Fri, Jan 18, 2008 at 11:44:34AM +0530, Kamalesh Babulal wrote:
> Hi Andrew,
> 
> The kernel build fails with following error message
> 
> scripts/mkubootimg/crc32.c:15:18: error: zlib.h: No such file or directory
> scripts/mkubootimg/crc32.c:77: error: expected '=', ',', ';', 'asm' or 
> '__attribute__' before 'crc_table'
> scripts/mkubootimg/crc32.c:153: error: expected '=', ',', ';', 'asm' or 
> '__attribute__' before 'crc32'
> make[2]: *** [scripts/mkubootimg/crc32.o] Error 1
> make[1]: *** [scripts/mkubootimg] Error 2
> make: *** [scripts] Error 2
> 
> The patch causing this build failure may be git-kbuild.patch.

The mkubootimg patches in kbuild.git have been reverted - but that was
after akpm merged kbuild.git.
So it is fixed in the next -mm.

The workaround for now is to just remove the line
containing "mkubootimg" in scripts/Makefile.

(Assuming you do not need the uImage target).

Sam


Re: [patch] Converting writeback linked lists to a tree based data structure

2008-01-18 Thread Fengguang Wu
On Fri, Jan 18, 2008 at 06:41:09AM +0100, Andi Kleen wrote:
> Fengguang Wu <[EMAIL PROTECTED]> writes:
>
> > Suppose we want to grant longer expiration window for temp files,
> > adding a new list named s_dirty_tmpfile would be a handy solution.
>
> How would the kernel know that a file is a tmp file?

No idea - but it makes a good example ;-)

But for those making different filesystems for /tmp, /var, /data etc, 
per-superblock expiration parameters may help.

> > So the question is: should we need more than 3 QoS classes?
>
> [just a random idea; i have not worked out all the implications]
>
> Would it be possible to derive a writeback priority from the ionice
> level of the process originating the IO? e.g. we have long standing
> problems that background jobs, even when niced, can cause
> significant slow downs to foreground processes by starving IO
> and pushing out pages. ionice was supposed to help with that
> but in practice it does not seem to have helped too much and I suspect
> it needs more prioritization higher up the VM food chain. Adding
> such priorities to writeback would seem like a step in the right
> direction, although it would of course not solve the problem
> completely.

Good idea. Michael may well be considering similar interfaces :-)



Re: regression: 100% io-wait with 2.6.24-rcX

2008-01-18 Thread Martin Knoblauch
----- Original Message ----
> From: Mel Gorman <[EMAIL PROTECTED]>
> To: Martin Knoblauch <[EMAIL PROTECTED]>
> Cc: Fengguang Wu <[EMAIL PROTECTED]>; Mike Snitzer <[EMAIL PROTECTED]>;
> Peter Zijlstra <[EMAIL PROTECTED]>; [EMAIL PROTECTED];
> Ingo Molnar <[EMAIL PROTECTED]>; linux-kernel@vger.kernel.org;
> [EMAIL PROTECTED] <[EMAIL PROTECTED]>; Linus Torvalds <[EMAIL PROTECTED]>;
> [EMAIL PROTECTED]
> Sent: Thursday, January 17, 2008 11:12:21 PM
> Subject: Re: regression: 100% io-wait with 2.6.24-rcX
>
> On (17/01/08 13:50), Martin Knoblauch didst pronounce:
> > The effect is definitely depending on the IO hardware. I performed
> > the same tests on a different box with an AACRAID controller and
> > there things look different.
>
> I take it different also means it does not show this odd performance
> behaviour and is similar whether the patch is applied or not?


Here are the numbers (MB/s) from the AACRAID box, after a fresh boot:

Test      2.6.19.2   2.6.24-rc6   2.6.24-rc6-81eabcbe0b991ddef5216f30ae91c4b226d54b6d
dd1       325        350          290
dd1-dir   180        160          160
dd2       2x90       2x113        2x110
dd2-dir   2x120      2x92         2x93
dd3       3x54       3x70         3x70
dd3-dir   3x83       3x64         3x64
mix3      55,2x30    400,2x25     310,2x25

What we are seeing here is that:

a) DIRECT IO takes a much bigger hit (2.6.19 vs. 2.6.24) on this IO system
   compared to the CCISS box
b) Reverting your patch hurts single stream
c) dual/triple stream are not affected by your patch and are improved over
   2.6.19
d) the mix3 performance is improved compared to 2.6.19.
d1) reverting your patch hurts the local-disk part of mix3
e) the AACRAID setup is definitely faster than the CCISS.

So, on this box your patch is definitely needed to get the pre-2.6.24
performance when writing a single big file.

Actually things on the CCISS box might be even more complicated. I forgot the
fact that on that box we have ext2/LVM/DM/Hardware, while on the AACRAID box
we have ext2/Hardware. Do you think that the LVM/MD are sensitive to the page
order/coloring?

Anyway: does your patch only address this performance issue, or are there also
data integrity concerns without it? I may consider reverting the patch for my
production environment. It really helps two thirds of my boxes big time, while
it does not hurt the other third that much :-)

  
   I can certainly stress the box before doing the tests. Please
  define many for the kernel compiles :-)
  
 
 With 8GiB of RAM, try making 24 copies of the kernel and compiling them
 all simultaneously. Running that for for 20-30 minutes should be enough
 
 to randomise the freelists affecting what color of page is used for the
 dd  test.
 

 ouch :-) OK, I will try that.

Martin





Re: Why is the kfree() argument const?

2008-01-18 Thread Giacomo Catenazzi
And to demonstrate that Linus is not the only person
with this view, I copy some paragraphs from the C99 rationale
(you can find the standard, rationale and other documents
at http://clc-wiki.net/wiki/C_standardisation:ISO )

Page 75 of C99 rationale:

Type qualifiers were introduced in part to provide greater control over
optimization. Several important optimization techniques are based on the
principle of cacheing: under certain circumstances the compiler can
remember the last value accessed (read or written) from a location, and
use this retained value the next time that location is read. (The memory,
or cache, is typically a hardware register.) If this memory is a machine
register, for instance, the code can be smaller and faster using the
register rather than accessing external memory.
The basic qualifiers can be characterized by the restrictions they impose
on access and cacheing:

const      No writes through this lvalue. In the absence of this
           qualifier, writes may occur through this lvalue.

volatile   No cacheing through this lvalue: each operation in the abstract
           semantics must be performed (that is, no cacheing assumptions
           may be made, since the location is not guaranteed to contain
           any previous value). In the absence of this qualifier, the
           contents of the designated location may be assumed to be
           unchanged except for possible aliasing.

restrict   Objects referenced through a restrict-qualified pointer have a
           special association with that pointer. All references to that
           object must directly or indirectly use the value of this
           pointer. In the absence of this qualifier, other pointers can
           alias this object. Cacheing the value in an object designated
           through a restrict-qualified pointer is safe at the beginning
           of the block in which the pointer is declared, because no
           pre-existing aliases may also be used to reference that object.
           The cached value must be restored to the object by the end of
           the block, where pre-existing aliases again become available.
           New aliases may be formed within the block, but these must all
           depend on the value of the restrict-qualified pointer, so that
           they can be identified and adjusted to refer to the cached
           value. For a restrict-qualified pointer at file scope, the
           block is the body of main.
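
A small sketch may help make the quoted definitions concrete. It is an illustration only, with hypothetical function names (sum_readonly, add_into, write_via_alias), and it assumes nothing beyond standard C99. The last function shows the point most relevant to this thread: const restricts writes through one particular lvalue, it does not promise that the object itself never changes.

```c
#include <assert.h>
#include <stddef.h>

/* const: no writes happen through `src` inside this function. */
long sum_readonly(const int *src, size_t n)
{
    long total = 0;

    assert(src != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        total += src[i];
    return total;
}

/* restrict: the caller promises dst and src do not overlap, so the
 * compiler is free to keep loaded values cached in registers. */
void add_into(int *restrict dst, const int *restrict src, size_t n)
{
    assert((dst != NULL && src != NULL) || n == 0);
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i];
}

/* const without restrict: `view` and `alias` may point to the same
 * object, so the store through `alias` must be visible when the value
 * is read back through `view`. */
int write_via_alias(const int *view, int *alias)
{
    *alias = 42;   /* legal: the object itself is not const */
    return *view;  /* reloaded after the store */
}
```

Calling write_via_alias(&x, &x) on an ordinary int x returns 42, even though the first parameter is a pointer to const.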

ciao
cate


Re: [2.6.24-rc8-mm1] Locking API boot-time self-tests hangs.

2008-01-18 Thread Andrew Morton
On Fri, 18 Jan 2008 14:59:32 +0800 Dave Young <[EMAIL PROTECTED]> wrote:

> On Jan 18, 2008 2:40 PM, Tetsuo Handa
> <[EMAIL PROTECTED]> wrote:
> > Hello.
> >
> > Andrew Morton wrote:
> > > It could be compiler version dependent.  I used gcc-4.1.0.  Which version
> > > were you and Zan using please?
>
> I have the same problem, gcc 3.4.6
>
> > I'm using gcc 3.3.5 on Debian/Sarge .
> >
> > By the way, 2.6.24-rc8-mm1 doesn't hang with a kernel config
> >
> > # make -s allnoconfig
> > # make -s menuconfig
> >
> >   Set [*] to all entries in "Kernel hacking  --->" section
> >   (except "KGDB: kernel debugging with remote gdb").
> >
> > # make -s && make -s install

The only machine I have with gcc-3.x takes an amazing amount of time to
compile a quite minimal kernel then barfs in headers_install with

  CHK include/linux/version.h
make[1]: `scripts/unifdef' is up to date.
make[1]: *** No rule to make target `|', needed by `asm-generic'.  Stop.

which is a mainline bug, too.  ISTR someone else reported it but I don't
recall it getting fixed.

I think I'll give up on this.  Either someone else fixes it or we merge the
bug into mainline.


Re: [PATCH] x86: fix unconditional arch/x86/kernel/pcspeaker.c compiling

2008-01-18 Thread Michael Opdenacker
On 01/18/2008 04:16 AM, Taral wrote:
> On 1/17/08, Michael Opdenacker <[EMAIL PROTECTED]> wrote:
> > Another issue would be that we would no longer be able
> > to load the speaker driver module from a kernel which
> > wasn't originally compiled with support for this module.
>
> Have you looked at pcspeaker.o? As far as I can tell, it does *nothing*.
   
Do you mean almost nothing? It still allocates and adds a platform
device, and the corresponding function always gets called at boot time.

I know that not compiling this piece of code reduces the
uncompressed kernel size by only a few bytes (218). However, many small
contributions of this kind can have a significant impact on embedded
systems (or on boot media or on Linux based bootloaders).

As I said earlier, I'm starting to think that this trick should only be
used when CONFIG_EMBEDDED is set. In the non-embedded case, it's
probably not acceptable not to declare a platform device that is always
present in the system (while it's perfectly fine not to load the
corresponding driver).

Your comments and suggestions are more than welcome!

Michael.

-- 
Michael Opdenacker, Free Electrons
Free Embedded Linux Training Materials
on http://free-electrons.com/training
(More than 1500 pages!)



Re: [PATCH 00/13] writeback bug fixes and simplifications take 2

2008-01-18 Thread Fengguang Wu
On Thu, Jan 17, 2008 at 11:51:51PM -0800, Michael Rubin wrote:
> On Jan 15, 2008 4:36 AM, Fengguang Wu <[EMAIL PROTECTED]> wrote:
> > Andrew,
> >
> > This patchset mainly polishes the writeback queuing policies.
> > The main goals are:
> >
> > (1) small files should not be starved by big dirty files
> > (2) sync as fast as possible for not-blocked inodes/pages
> >     - don't leave them out; no congestion_wait() in between them
> > (3) avoid busy iowait for blocked inodes
> >     - retry them in the next go of s_io (maybe at the next wakeup of pdflush)
> >
> > The role of the queues:
> >
> > s_dirty:        park for dirtied_when expiration
> > s_io:           park for io submission
> > s_more_io:      for big dirty inodes, they will be retried in this run of pdflush
> >                 (it ensures fairness between small/large files)
> > s_more_io_wait: for blocked inodes, they will be picked up in next run of s_io
>
> Quick question to make sure I get this. Each queue is sorted as such:
>
> s_dirty - sorted by the dirtied_when field
> s_io - sorted by no explicit key but by the order we want to process
> in sync_sb_inodes
> s_more_io - held for later they are sorted in the same manner as s_io
>
> Is that it?

Yes, exactly. s_io and s_more_io can be considered as one list broken
up into two - to provide the cursor for sequential iteration.
And s_more_io_wait is simply a container for blocked inodes.
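
As a reading aid, the queue roles described above can be sketched as a tiny decision function. This is only an interpretation of the descriptions in this thread, not code from the patchset: the enum, the function name demo_requeue, and the inputs `blocked` and `pages_left` are all hypothetical, and the real logic in sync_sb_inodes may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels mirroring the queue names used in the thread. */
enum demo_queue {
    DEMO_DONE,            /* nothing left to write in this pass */
    DEMO_S_MORE_IO,       /* big inode with pages left: retried in this run */
    DEMO_S_MORE_IO_WAIT   /* blocked inode: picked up in the next run of s_io */
};

/*
 * Decide where an inode taken from s_io goes after one writeback attempt.
 * `blocked` and `pages_left` are illustrative inputs, not kernel fields.
 */
enum demo_queue demo_requeue(bool blocked, unsigned long pages_left)
{
    if (blocked)
        return DEMO_S_MORE_IO_WAIT;  /* avoid busy iowait on it */
    if (pages_left > 0)
        return DEMO_S_MORE_IO;       /* keep small files from starving */
    return DEMO_DONE;
}

/* Usage examples for the three cases described in the thread. */
void demo_requeue_examples(void)
{
    assert(demo_requeue(true, 10) == DEMO_S_MORE_IO_WAIT);
    assert(demo_requeue(false, 10) == DEMO_S_MORE_IO);
    assert(demo_requeue(false, 0) == DEMO_DONE);
}
```

The sketch checks `blocked` first on the assumption that a blocked inode should be parked regardless of how many pages it still has.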



Re: runqueue locks in schedule()

2008-01-18 Thread Nick Piggin
On Friday 18 January 2008 17:33, stephane eranian wrote:
> Nick,
>
> > It is arch specific. If an architecture wants interrupts on during
> > context switch, or runqueue unlocked, then they set it (btw
> > INTERRUPTS_ON_CTXSW also implies UNLOCKED_CTXSW).
>
> Yes, I noticed that. I am only interested in UNLOCKED_CTXSW.
> But it appears that the approach suggested by Peter does work. We are
> running some tests.

OK, that might be OK.


> > Although, eg on x86, you would hold off interrupts and runqueue lock for
> > slightly less time if you defined those, it results in _slightly_ more
> > complicated context switching... although I did once find a workload
> > where the reduced runqueue contention improved throughput a bit, it is
> > not much problem in general to hold the lock.
>
> By complicated you mean that now you'd have to make sure you don't
> need to access runqueue data?

Well, not speaking about the arch-specific code (which may involve
more complexities), but the core scheduler needs the
task_struct->oncpu variable, whereas that isn't required if the
runqueue is locked while switching tasks.


Re: [PATCH 7/7] driver-core : convert semaphore to mutex in struct class

2008-01-18 Thread Jarek Poplawski
On Fri, Jan 18, 2008 at 09:00:34AM +0100, Jarek Poplawski wrote:
> On Fri, Jan 18, 2008 at 09:42:25AM +0800, Dave Young wrote:
> ...
> > After digging the class usage code again, I found that the only
> > possible double lock place is the class_interface_register/unregister
> > in which the class_device api could be called.
>
> OK, but currently after using mostly:
> mutex_lock(parent_class)
>
> and once:
> mutex_lock_nested(parent_class, SINGLE_DEPTH_NESTING)
>
> lockdep mostly thinks these parent classes are 2 different objects,
> with only 2 possible levels of nesting, so this parent_class has
> to have wrong name (2 parents can't be locked from the same thread,
> so maybe it's class_grandparent sometimes?).

...Hmm... I was probably wrong: this could be right if there are only
two levels of nesting used and a class locks its parent only!

Jarek P.


Re: 2.6.24-rc8-mm1 Kernel oops will running kernbench

2008-01-18 Thread Kamalesh Babulal
Hi Andrew,

The following oops was seen while running kernbench on one of the test
machines (power4+ box). I tried reproducing the oops but was unsuccessful.
I will try to reproduce the oops with debug info compiled.


Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=32 NUMA pSeries
Modules linked in:
NIP: 4570 LR: 0fc42dc0 CTR: 
REGS: c0077b6bf8c0 TRAP: 0300   Not tainted  (2.6.24-rc8-mm1-autotest)
MSR: 80001000 ME  CR: 28022422  XER: 
DAR: c0077b6bfce0, DSISR: 0a00
TASK = c00773164c40[19588] 'as' THREAD: c0077b6bc000 CPU: 1
GPR00: 4000 c0077b6bfb40 7346 d032 
GPR04: 043a  000c 0004 
GPR08: 0fd278c8 48022424 c0077b6bfe30 998be2321500 
GPR12: 80001030 c05f6280 1003 1003 
GPR16: 1003 1005 1006aac0 10053cd0 
GPR20:  0fe0 1005 1005 
GPR24: 0ff8 0fe8 0062 0fd27490 
GPR28: 0fd274c8 10099420 0fd25ff4 1009a400 
NIP [4570] 0x4570
LR [0fc42dc0] 0xfc42dc0
Call Trace:
[c0077b6bfb40] [c0077b292000] 0xc0077b292000 (unreliable)
Instruction dump:
4800    41820008    
4810    f92101a0    

-- 
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.


Re: echo mem > /sys/power/state

2008-01-18 Thread Ingo Molnar

* Rafael J. Wysocki <[EMAIL PROTECTED]> wrote:

> > Probably it would be more efficient to have the people who wrote the
> > code also test it.
>
> Well, that would certainly help.
>
> I do test all of my patches and generally all of the patches I sign
> off, but surely that's not enough.

please add a .config option dependent on CONFIG_DEBUG_KERNEL=y [and 
default-disabled] that auto-tests suspend/resume functionality 60 
seconds after hitting user-space (the suspend/resume cycle kept small 
via a small RTC timeout) and s2ram correctness will be tested _a lot_ 
more.

(it doesn't matter if graphics does not resume fine - at least for my
tests)

kprobes had similar problems and it now has a few simple smoke-tests - 
which i just saw trigger on a patch that i did not notice would break 
kprobes. I think this should be done for all functionality that is not 
regularly triggered by a normal distro bootup (and which is easy to 
overlook in testing).

Ingo


Re: echo mem > /sys/power/state

2008-01-18 Thread Andrew Morton
On Fri, 18 Jan 2008 09:36:10 +0100 Ingo Molnar <[EMAIL PROTECTED]> wrote:

>
> * Rafael J. Wysocki <[EMAIL PROTECTED]> wrote:
>
> > > Probably it would be more efficient to have the people who wrote the
> > > code also test it.
> >
> > Well, that would certainly help.
> >
> > I do test all of my patches and generally all of the patches I sign
> > off, but surely that's not enough.
>
> please add a .config option dependent on CONFIG_DEBUG_KERNEL=y [and
> default-disabled] that auto-tests suspend/resume functionality 60
> seconds after hitting user-space (the suspend/resume cycle kept small
> via a small RTC timeout) and s2ram correctness will be tested _a lot_
> more.
>
> (it doesn't matter if graphics does not resume fine - at least for my
> tests)
>
> kprobes had similar problems and it now has a few simple smoke-tests -
> which i just saw trigger on a patch that i did not notice would break
> kprobes. I think this should be done for all functionality that is not
> regularly triggered by a normal distro bootup (and which is easy to
> overlook in testing).
>

Seeing as we're so lame about being able to distribute userspace stuff:
create a shell script in /proc/rc.kernel and start teaching initscripts to
run it.  Then we can modify it at will.

I hate me.


Re: 2.6.24-rc8-mm1 Kernel oops will running kernbench

2008-01-18 Thread Andrew Morton
On Fri, 18 Jan 2008 14:06:00 +0530 Kamalesh Babulal <[EMAIL PROTECTED]> wrote:

> Hi Andrew,
>
> The following oops was seen while running kernbench on one of the test
> machines (power4+ box). I tried reproducing the oops but was unsuccessful.
> I will try to reproduce the oops with debug info compiled.
>
>
> Oops: Kernel access of bad area, sig: 11 [#1]
> SMP NR_CPUS=32 NUMA pSeries
> Modules linked in:
> NIP: 4570 LR: 0fc42dc0 CTR:
> REGS: c0077b6bf8c0 TRAP: 0300   Not tainted  (2.6.24-rc8-mm1-autotest)
> MSR: 80001000 ME  CR: 28022422  XER:
> DAR: c0077b6bfce0, DSISR: 0a00
> TASK = c00773164c40[19588] 'as' THREAD: c0077b6bc000 CPU: 1
> GPR00: 4000 c0077b6bfb40 7346 d032
> GPR04: 043a  000c 0004
> GPR08: 0fd278c8 48022424 c0077b6bfe30 998be2321500
> GPR12: 80001030 c05f6280 1003 1003
> GPR16: 1003 1005 1006aac0 10053cd0
> GPR20:  0fe0 1005 1005
> GPR24: 0ff8 0fe8 0062 0fd27490
> GPR28: 0fd274c8 10099420 0fd25ff4 1009a400
> NIP [4570] 0x4570
> LR [0fc42dc0] 0xfc42dc0
> Call Trace:
> [c0077b6bfb40] [c0077b292000] 0xc0077b292000 (unreliable)
> Instruction dump:
> 4800    41820008
> 4810    f92101a0
>

odd.  Where did the stack trace go?


Re: echo mem > /sys/power/state

2008-01-18 Thread Ingo Molnar

* Jiri Slaby <[EMAIL PROTECTED]> wrote:

> On 01/17/2008 08:13 PM, Andrew Morton wrote:
> > On Thu, 17 Jan 2008 10:36:51 -0700 Zan Lynx <[EMAIL PROTECTED]> wrote:
> > > Heh.  Laptop suspend to anything has been so broken for so long in the
> > > -mm series on my Compaq R3000 that I didn't even know it was ever
> > > supposed to work.
> >
> > It gets broken more often than anything else.  I do test each release on
> > two laptops and I get to do a lot of bisection searching and
> > grumpygramming as a result.
> >
> > Probably it would be more efficient to have the people who wrote the code
> > also test it.
>
> Big fat ACK from here. Suspend issues in past few -mms were *very*
> hard (and time consuming) to track down.

it's really all a matter of reducing latencies. Testing suspend/resume
is manual work currently (it either needs me to hit a key on the laptop
or necessitates the use of a test-script that might or might not work
depending on whether the new /dev/rtc driver is enabled). So few people
besides those that rely on it will do it. The longer a patch that breaks
suspend is out in the open unidentified, the more damage it does: it
gets into more trees, gets harder to bisect, etc.

So please give us overworked maintainers an easy to use .config option
dependent on CONFIG_DEBUG_KERNEL=y that automatically triggers a simple
suspend+resume sequence 60 seconds after bootup. It would be a godsend.
(don't worry about proper gfx resume) I compile and boot up every patch i
add to x86.git, so this would catch crap the moment we add it to the
tree.
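
A rough userspace approximation of such a self-test could look like the
sketch below. It is only an illustration, not the in-kernel option asked
for above: it assumes the RTC exposes a wakealarm attribute that accepts
an absolute time in seconds, and that writing "mem" to the power state
file starts suspend-to-RAM. The function names are hypothetical, and both
paths are parameters so the sequence can be dry-run against plain files.

```c
#include <stdio.h>
#include <time.h>

/* Write a single string to a sysfs-style file; returns 0 on success. */
static int write_string(const char *path, const char *value)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	if (fputs(value, f) == EOF) {
		fclose(f);
		return -1;
	}
	/* fclose() flushes, so a rejected write is reported here. */
	return fclose(f) == 0 ? 0 : -1;
}

/*
 * Arm the RTC alarm wake_secs seconds from now, then request suspend.
 * Returns 0 if both writes succeeded, -1 otherwise.
 */
static int suspend_resume_once(const char *wakealarm_path,
			       const char *state_path, int wake_secs)
{
	char buf[32];

	snprintf(buf, sizeof(buf), "%lld", (long long)time(NULL) + wake_secs);
	if (write_string(wakealarm_path, buf))
		return -1;
	return write_string(state_path, "mem");
}
```

On a real machine the call would presumably be something like
suspend_resume_once("/sys/class/rtc/rtc0/wakealarm", "/sys/power/state", 10),
run as root from an init script once user-space is up.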

The other, more long-term trick is to make rarely used functionality
more widely used. Consolidate code. Try to merge as much of
suspend/resume with bootup/module-insert/shutdown sequences as possible.
Suspend unused devices more aggressively - such as non-mounted block
devices or downed networking ports. Try to create more network effects
with other functionality, suspend and resume is not just about suspending
laptops, it can/could be used for so much more stuff. Try to get Pavel's
Sleepy Linux concept to work reasonably well - so that more people
(including developers) would use it on a daily basis.

Test coverage of a given piece of code is a direct function of its
utility and of its ease of testing. Decreeing "this is important" really
won't get more testing done. What you should realize i think is that this
is not a social/mindset problem (so no need to get frustrated about it),
this is mostly a technology problem: you can gradually _code_ your way
into people's test efforts.

Ingo


Re: x86 refuses to build [Re: 2.6.24-rc8-mm1]

2008-01-18 Thread Ingo Molnar

* Dhaval Giani <[EMAIL PROTECTED]> wrote:

> grepping around and looking through the code, I notice it is because
> these variables just do not exist for 32 bit NUMA. I am not sure how
> to go about it, and will just leave it to folks who know what they are
> doing there :).

yes, Mike Travis has i think some patches in the works for this build 
problem. Disabling NUMA on 32-bit is the solution meanwhile.

Ingo


Re: [PATCH] X86: fix typo PAT to X86_PAT

2008-01-18 Thread Yinghai Lu
On Friday 18 January 2008 12:10:40 am Ingo Molnar wrote:
>
> * Yinghai Lu <[EMAIL PROTECTED]> wrote:
>
> >  config MTRR
> >  	bool "MTRR (Memory Type Range Register) support"
> > -	depends on !PAT
> > +	depends on !X86_PAT
> >  	---help---
> >  	  On Intel P6 family processors (Pentium Pro, Pentium II and later)
> >  	  the Memory Type Range Registers (MTRRs) may be used to control
>
> thanks. But, i think we should rather do the following: if X86_PAT is
> enabled then /proc/mtrr should be read-only. There's no problem
> _looking_ at MTRR contents, as long as we do not try to modify them. Hm?

anyway, the

	depends on !PAT

line needs to be removed.

It seems that when PAT is used, some code still touches MTRR.

YH


[PATCH] x86_64: only call early_init_amd one time

2008-01-18 Thread Yinghai Lu
[PATCH] x86_64: only call early_init_amd one time

Andi's patch

	x86: move X86_FEATURE_CONSTANT_TSC into early cpu feature detection

	Need this in the next patch in time_init and that happens early.

	This includes a minor fix on i386 where early_intel_workarounds()
	[which is now called early_init_intel] really executes early as
	the comments say.

ends up calling early_init_amd two times, once in early_identify_cpu and
once in identify_cpu.

This patch removes the one in identify_cpu.

Signed-off-by: Yinghai Lu <[EMAIL PROTECTED]>

diff --git a/arch/x86/kernel/setup_64.c b/arch/x86/kernel/setup_64.c
index aeaa17d..d236593 100644
--- a/arch/x86/kernel/setup_64.c
+++ b/arch/x86/kernel/setup_64.c
@@ -1029,6 +1029,9 @@ static void __cpuinit early_identify_cpu(struct cpuinfo_x86 *c)
 	case X86_VENDOR_AMD:
 		early_init_amd(c);
 		break;
+	case X86_VENDOR_INTEL:
+		early_init_intel(c);
+		break;
 	}
 
 }
@@ -1095,14 +1098,6 @@ void __cpuinit identify_cpu(struct cpuinfo_x86 *c)
 	numa_add_cpu(smp_processor_id());
 #endif
 
-	switch (c->x86_vendor) {
-	case X86_VENDOR_AMD:
-		early_init_amd(c);
-		break;
-	case X86_VENDOR_INTEL:
-		early_init_intel(c);
-		break;
-	}
 }
 
 void __cpuinit print_cpu_info(struct cpuinfo_x86 *c)


Re: echo mem > /sys/power/state

2008-01-18 Thread Harvey Harrison
On Fri, 2008-01-18 at 00:47 -0800, Andrew Morton wrote:
> On Fri, 18 Jan 2008 09:36:10 +0100 Ingo Molnar <[EMAIL PROTECTED]> wrote:
>
> >
> > * Rafael J. Wysocki <[EMAIL PROTECTED]> wrote:
> >
> > > > Probably it would be more efficient to have the people who wrote the
> > > > code also test it.
> > >
> > > Well, that would certainly help.
> > >
> > > I do test all of my patches and generally all of the patches I sign
> > > off, but surely that's not enough.
> >
> > please add a .config option dependent on CONFIG_DEBUG_KERNEL=y [and
> > default-disabled] that auto-tests suspend/resume functionality 60
> > seconds after hitting user-space (the suspend/resume cycle kept small
> > via a small RTC timeout) and s2ram correctness will be tested _a lot_
> > more.
> >
> > (it doesn't matter if graphics does not resume fine - at least for my
> > tests)
> >
> > kprobes had similar problems and it now has a few simple smoke-tests -
> > which i just saw trigger on a patch that i did not notice would break
> > kprobes. I think this should be done for all functionality that is not
> > regularly triggered by a normal distro bootup (and which is easy to
> > overlook in testing).
> >
>
> Seeing as we're so lame about being able to distribute userspace stuff:
> create a shell script in /proc/rc.kernel and start teaching initscripts to
> run it.  Then we can modify it at will.
>
> I hate me.

With all the discussion lately about boot-time smoketests and self-tests
maybe this kind of stuff would be a good first candidate for useful
new early-userspace functionality.  Then the kernel build could be
taught about building an initramfs that runs a bunch of tests
and leaves the user in a shell letting them know if it passed or not.

This would be a great way for increasing the number of testers, just
ask them to build with that option and a _known_ testsuite could
be reported as working or not.

Or maybe I'm out to lunch

Harvey



Re: mos7720. anyone actualy have it wokring?

2008-01-18 Thread Oliver Neukum
On Friday, 18 January 2008 00:31:26, mathewss wrote:
> This driver for me does not work when i try to cat /dev/ttyUSB2
> it fails and when i try to run
>
> statserial /dev/ttyUSB2
> statserial: TIOCMGET failed: Invalid argument

We are always looking for testers. Can you recompile with CONFIG_USB_DEBUG,
provide a log and do an strace of your applications?

Regards
Oliver


Re: [PATCH] x86: Rename stack_pointer to kernel_trap_sp

2008-01-18 Thread Ingo Molnar

* Harvey Harrison <[EMAIL PROTECTED]> wrote:

>  	struct frame_head *head = (struct frame_head *)frame_pointer(regs);
> -	unsigned long stack = stack_pointer(regs);
> +	unsigned long stack = kernel_trap_sp(regs);

thanks, applied.

Ingo


Re: [PATCH 7/7] driver-core : convert semaphore to mutex in struct class

2008-01-18 Thread Dave Young
On Jan 18, 2008 4:23 PM, Jarek Poplawski <[EMAIL PROTECTED]> wrote:
> On Fri, Jan 18, 2008 at 03:48:02PM +0800, Dave Young wrote:
> > On Jan 18, 2008 3:38 PM, Jarek Poplawski <[EMAIL PROTECTED]> wrote:
> ...
> > > IMHO, it would be nice to get the real state of current lockdep
> > > problems here to figure out if there is any chance to do this right &
> > > without any warnings with current lockdep. If I got it right from
> > > earlier threads it might be impossible with USB, at least.
> >
> > I don't think so, usb doesn't be affected by struct class mutex, they
> > only use the lock of struct device. As I replied before, the lockdep
> > issue exist only between class_interface and class_device.
>
> OK, but I've meant possibility of changing their own semaphores later.
>
> > > So, since I think these nesting levels seem to be wrong in 7/7 patch,
> > > maybe it's better to exclude it from this patchset, and to try this as
> > > testing for some time.
> >
> > I may file the updated patch with more nesting changes and test it of
> > course. Actually I should have done it, thanks.
> ...
> > 1) Using CLASS_NORMAL/CLASS_PARENT/CLASS_CHILD will be enough.
> > or
> > 2) Simply add SINGLE_LEVEL_NESTING in class_device_add and other
> > class_device functions because it is the only possible nest-lock place
> > as I know.
>
> If SINGLE_LEVEL_NESTING is enough? (means 2 levels total)

I think so.

>
> I think you should more care about real (logical) relations here, than
> what's enough to get rid of lockdep warnings.

You are quite right, thanks.

>
> Since there are not so much of these changes, you can try both
> variants.

Will do.

> I'll be glad to look at this - maybe I'll mangage to figure
> out BTW, what it's all about...
>
> Jarek P.



Re: 2.6.24-rc8-mm1 and boot lockup during locking self-test

2008-01-18 Thread Ingo Molnar

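* Zan Lynx <[EMAIL PROTECTED]> wrote:

> Included Ingo in CC because I think he did the locking test.
>
> The following is copied from a different boot, but looks the same to my
> eye as what I got on the console:
>
> ------------------------
> | Locking API testsuite:
> ----------------------------------------------------------------------------
>                                  | spin |wlock |rlock |mutex | wsem | rsem |
>   --------------------------------------------------------------------------
>                      A-A deadlock:  ok  |
>
> It just sticks there.  I assume that means a problem with wlock?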

does the patch below from Nick fix it? It went upstream just after 
rc8-mm1 so rc8-mm1 might be missing it.

Ingo

--
commit 5a26db5bd25cf4bf32ae9fa9f6136b6b6d5b45c5
Author: Nick Piggin <[EMAIL PROTECTED]>
Date:   Wed Jan 16 09:51:58 2008 +0100

    lockdep: fix internal double unlock during self-test

    Lockdep, during self-test (when it was simulating double unlocks) was
    sometimes unconditionally unlocking a spinlock when it had not been
    locked. This won't work for ticket locks.

    Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>
    Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
    Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>

diff --git a/kernel/lockdep.c b/kernel/lockdep.c
index 723bd9f..4335f12 100644
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -2943,9 +2943,10 @@ void lockdep_free_key_range(void *start, unsigned long size)
 	struct list_head *head;
 	unsigned long flags;
 	int i;
+	int locked;
 
 	raw_local_irq_save(flags);
-	graph_lock();
+	locked = graph_lock();
 
 	/*
 	 * Unhash all classes that were created by this module:
@@ -2959,7 +2960,8 @@ void lockdep_free_key_range(void *start, unsigned long size)
 			zap_class(class);
 	}
 
-	graph_unlock();
+	if (locked)
+		graph_unlock();
 	raw_local_irq_restore(flags);
 }
 
@@ -2969,6 +2971,7 @@ void lockdep_reset_lock(struct lockdep_map *lock)
 	struct list_head *head;
 	unsigned long flags;
 	int i, j;
+	int locked;
 
 	raw_local_irq_save(flags);
 
@@ -2987,7 +2990,7 @@ void lockdep_reset_lock(struct lockdep_map *lock)
 	 * Debug check: in the end all mapped classes should
 	 * be gone.
 	 */
-	graph_lock();
+	locked = graph_lock();
 	for (i = 0; i < CLASSHASH_SIZE; i++) {
 		head = classhash_table + i;
 		if (list_empty(head))
@@ -3000,7 +3003,8 @@ void lockdep_reset_lock(struct lockdep_map *lock)
 			}
 		}
 	}
-	graph_unlock();
+	if (locked)
+		graph_unlock();
 
 out_restore:
 	raw_local_irq_restore(flags);
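
The pattern the patch applies can be shown in isolation with the small
model below. It is a simplified stand-in, not the lockdep code itself: it
assumes graph_lock() declines to take the lock once debugging has been
switched off, and it models the spinlock as a plain counter so the
imbalance is visible. debug_locks_on, lock_depth and reset_classes() are
hypothetical names for this sketch.

```c
#include <stdbool.h>

static bool debug_locks_on = true;	/* stand-in for the global on/off switch */
static int lock_depth;			/* stand-in for the raw spinlock state */

/* Take the lock only while debugging is enabled; report whether we did. */
static int graph_lock(void)
{
	if (!debug_locks_on)
		return 0;
	lock_depth++;
	return 1;
}

static void graph_unlock(void)
{
	lock_depth--;
}

/* Caller in the style of the fixed functions: unlock only what was locked. */
static void reset_classes(void)
{
	int locked = graph_lock();

	/* ... work on the class hash would go here ... */

	if (locked)
		graph_unlock();
}
```

With an unconditional graph_unlock() the counter would drop to -1 whenever
the lock was not taken, which is the double unlock that a ticket lock
cannot tolerate.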


Re: [2.6.24-rc6-mm1]Build failure in drivers/net/ehea/ehea_main.c

2008-01-18 Thread Jan-Bernd Themann
Hi,

sorry for answering so late, I'm only tracking the netdev and ppc mailing lists.

On Thursday 10 January 2008 18:34, Greg KH wrote:
> > The structure device_driver(in device.h) has a member struct driver_private
> > which contains the member kobj (according to drivers/base/base.h).
> > But in device.h struct driver_private has been declared localy and
> > neither defined nor included from base.h.
> > So my effort to use driver->driver_private->obj also does not work.
> > (I am surprised from where do you access the struct device_driver)
>
> That is because a driver should not be accessing such a field.
>
> And especially not in this manner, why would this driver be creating a
> symlink that has already been created by the driver core?  This whole
> thing can just be removed with no problems.  Can you try just removing
> the ehea_driver_sysfs_add and ehea_driver_sysfs_remove functions to
> verify this as I don't have the hardware present to test it out.

The eHEA driver tries to organize its sysfs entries as closely as possible
to those of other ethernet drivers. Each eHEA NIC has multiple ports, which
is not that common in PCI. This means that each port is represented by a
subdirectory that does not have the driver symlink; only the root directory
has it. Some tools expect to find this driver link in each port directory.
That is the reason why this link is created manually.

Are there any other ways to create this link?

Regards,
Jan-Bernd Themann + Christoph Raisch


Re: [PATCHv2] x86: Use v8086_mode helper, trivial unification

2008-01-18 Thread Harvey Harrison
On Fri, 2008-01-18 at 10:12 +0100, Ingo Molnar wrote:
> * Harvey Harrison <[EMAIL PROTECTED]> wrote:
>
> > Use v8086_mode inline in fault_32.c, no functional change also ifdef
> > the section for 32-bit only and add to fault_64.c
>
> > -	if (regs->flags & VM_MASK) {
> > +	if (v8086_mode(regs)) {
>
> > --- a/arch/x86/mm/fault_64.c
> > +++ b/arch/x86/mm/fault_64.c
> > @@ -551,6 +551,16 @@ good_area:
> >  		tsk->maj_flt++;
> >  	else
> >  		tsk->min_flt++;
> > +
> > +	/*
> > +	 * Did it hit the DOS screen memory VA from vm86 mode?
> > +	 */
> > +	if (v8086_mode(regs)) {
> > +		unsigned long bit = (address - 0xA0000) >> PAGE_SHIFT;
> > +		if (bit < 32)
> > +			tsk->thread.screen_bitmap |= 1 << bit;
> > +	}
>
> hm, is there even vm86 mode in 64-bit? Anyway, gcc will eliminate it i
> guess. I've applied your patch.
>

No, it doesn't mean anything to 64-bit, but helps make the diff a little
bit smaller, getting pretty close now.

Still needs a bit of work to introduce oops_begin/end from 64-bit to
32-bit in traps_32.c and introduce a bad_pgtable-like function to
32bit, then we're down to small differences between 32/64 bit
do_page_fault and vmalloc_sync_all that should be relatively clean
to harmonize.

Got distracted with the ptrace stuff today, but patch coming soon.

Harvey



Re: [PATCH 0/3] x86: Reduce memory and intra-node effects with large count NR_CPUs fixup

2008-01-18 Thread Ingo Molnar

* Mike Travis <[EMAIL PROTECTED]> wrote:

> Hi Andrew,
>
> My automatic scripts accidentally sent this mail prematurely.  Please
> hold off applying yet.

I've picked it up for x86.git and i'll keep testing it (the patches seem 
straightforward) and will report any problems with the bite-head-off 
option unset.

[ The 32-bit NUMA compile issue is orthogonal to these patches - it's 
  due to the lack of 32-bit NUMA support in your changes :) That needs 
  fixing before this could go into v2.6.25. ]

Ingo


Re: [Bluez-devel] Oops involving RFCOMM and sysfs

2008-01-18 Thread Cornelia Huck
On Fri, 18 Jan 2008 11:37:21 +0800,
Dave Young <[EMAIL PROTECTED]> wrote:


> Lets see the device_move function, seems there's some problems in it:
>
> int device_move(struct device *dev, struct device *new_parent)
> {
> 	int error;
> 	struct device *old_parent;
> 	struct kobject *new_parent_kobj;
>
> 	dev = get_device(dev);
> 	if (!dev)
> 		return -EINVAL;
>
> 	new_parent = get_device(new_parent);
> 	new_parent_kobj = get_device_parent (dev, new_parent);
>
> Here could get kobject reference

Eww. get_device_parent() may inflate the refcount in one case
for !CONFIG_SYSFS_DEPRECATED, but often won't. (And the function is
named confusingly, since it hints that we always get a reference, which
we don't.)

>
> 	if (IS_ERR(new_parent_kobj)) {
> 		error = PTR_ERR(new_parent_kobj);
> 		put_device(new_parent);
> 		goto out;
> 	}
> 	pr_debug("DEVICE: moving '%s' to '%s'\n", dev->bus_id,
> 		 new_parent ? new_parent->bus_id : "<NULL>");
> 	error = kobject_move(&dev->kobj, new_parent_kobj);
> 	if (error) {
> 		put_device(new_parent);
>
> imagine new_parent is NULL, then the new_parent_kobj should be put

No, we would need a put_device_parent() (crappy name) which puts the
reference iff get_device_parent() grabbed it.

>
> 		goto out;
> 	}
> 	old_parent = dev->parent;
> 	dev->parent = new_parent;
> 	if (old_parent)
> 		klist_remove(&dev->knode_parent);
> 	if (new_parent)
> 		klist_add_tail(&dev->knode_parent,
> 			       &new_parent->klist_children);
> 	if (!dev->class)
> 		goto out_put;
>
> Why not put new_parent | new_parent_kobj?

Because that is the good case :)

>
> 	error = device_move_class_links(dev, old_parent, new_parent);
> 	if (error) {
> 		/* We ignore errors on cleanup since we're hosed anyway... */
> 		device_move_class_links(dev, new_parent, old_parent);
> 		if (!kobject_move(&dev->kobj, &old_parent->kobj)) {
> 			if (new_parent)
> 				klist_remove(&dev->knode_parent);
> 			if (old_parent)
> 				klist_add_tail(&dev->knode_parent,
> 					       &old_parent->klist_children);
> 		}
> 		put_device(new_parent);
>
> Same doubt as above

We'd need put_device_parent() or whatever here as well, I guess.

>
> 		goto out;
> 	}
> out_put:
> 	put_device(old_parent);
> out:
> 	put_device(dev);
> 	return error;
> }
>
> Hope I'm wrong, but if it's indeed bugs, I will send a patch about this.

There are more problems, I'm afraid :( setup_parent() calls
get_device_parent() as well, so device_add() has the same problems on
error cleanup...

I'll take a look at it if I find some time, but I'm afraid I'll not
be able to do so before next week.
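
The "put iff get grabbed it" idea can be sketched in plain C as below.
This is only a model under stated assumptions, not the driver core API:
struct kobj, get_parent() and put_parent() are hypothetical names, and the
lookup is assumed to take a reference only when it falls back to a shared
object, recording that fact for the caller.

```c
#include <stddef.h>

struct kobj {
	int refcount;
};

/*
 * Hypothetical lookup: returns the given parent without touching its
 * refcount, or takes a reference on the fallback object when there is
 * no parent. *took_ref tells the caller which case happened.
 */
static struct kobj *get_parent(struct kobj *parent, struct kobj *fallback,
			       int *took_ref)
{
	*took_ref = 0;
	if (parent)
		return parent;		/* borrowed, no reference taken */
	fallback->refcount++;
	*took_ref = 1;
	return fallback;
}

/* Counterpart for error paths: drop the reference iff one was taken. */
static void put_parent(struct kobj *k, int took_ref)
{
	if (took_ref)
		k->refcount--;
}
```

Every error exit after the lookup would then call put_parent() with the
recorded flag, so the count stays balanced in both cases.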


Re: [2.6.24-rc8-mm1] Locking API boot-time self-tests hangs.

2008-01-18 Thread Tetsuo Handa
Hello.

I tried gcc 4.1.2 on Fedora 8 using
http://I-love.SAKURA.ne.jp/tmp/config-2.6.24-rc8-mm1 .
Same result as gcc 3.3.5 on Debian/Sarge.

It seems kernel config (and possibly hardware) dependent
rather than gcc version dependent.

This is VMware workstation 6.0.0 on Thinkpad X60 (Core 2 Duo).

Regards.


Re: [PATCHv2] x86: Use v8086_mode helper, trivial unification

2008-01-18 Thread Ingo Molnar

* Harvey Harrison <[EMAIL PROTECTED]> wrote:

> Use v8086_mode inline in fault_32.c, no functional change also ifdef
> the section for 32-bit only and add to fault_64.c

> -	if (regs->flags & VM_MASK) {
> +	if (v8086_mode(regs)) {

> --- a/arch/x86/mm/fault_64.c
> +++ b/arch/x86/mm/fault_64.c
> @@ -551,6 +551,16 @@ good_area:
>  		tsk->maj_flt++;
>  	else
>  		tsk->min_flt++;
> +
> +	/*
> +	 * Did it hit the DOS screen memory VA from vm86 mode?
> +	 */
> +	if (v8086_mode(regs)) {
> +		unsigned long bit = (address - 0xA0000) >> PAGE_SHIFT;
> +		if (bit < 32)
> +			tsk->thread.screen_bitmap |= 1 << bit;
> +	}

hm, is there even vm86 mode in 64-bit? Anyway, gcc will eliminate it i 
guess. I've applied your patch.
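
For reference, the bitmap arithmetic in the quoted hunk can be tried out
in userspace with the sketch below. It assumes the constant in the hunk is
the legacy video window base 0xA0000 and a 4 KiB page size;
mark_screen_page() and VGA_BASE are names made up for this example, and
1u is used for the shift to stay clear of signed overflow at bit 31.

```c
#include <stdint.h>

#define PAGE_SHIFT	12		/* 4 KiB pages, as on 32-bit x86 */
#define VGA_BASE	0xA0000UL	/* assumed start of the video window */

/* Set the bit for the touched page if it lies in the first 32 pages. */
static uint32_t mark_screen_page(uint32_t bitmap, unsigned long address)
{
	/* Addresses below VGA_BASE wrap to a huge value and fail the test. */
	unsigned long bit = (address - VGA_BASE) >> PAGE_SHIFT;

	if (bit < 32)
		bitmap |= 1u << bit;
	return bitmap;
}
```

So 0xA0000 maps to bit 0, 0xBFFFF to bit 31, and anything from 0xC0000
upwards leaves the bitmap untouched.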

Ingo


Re: [PATCH -mm 2/2 -v2] kexec/i386: kexec page table code clean up - page table setup in C

2008-01-18 Thread Simon Horman
On Tue, Jan 15, 2008 at 02:05:49PM +0800, Huang, Ying wrote:
> This patch transforms the kexec page tables setup code from assembler
> code to C code in machine_kexec_prepare. This improves readability and
> reduces code line number.

This looks good to me.

Simon Horman <[EMAIL PROTECTED]>

> Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

-- 
Horms



Re: [PATCH 1/2] x86: add is_f00f_bug helper to fault_32|64.c

2008-01-18 Thread Ingo Molnar

* Andi Kleen <[EMAIL PROTECTED]> wrote:

> Harvey Harrison <[EMAIL PROTECTED]> writes:
>
> > Further towards unifying these files, add another helper in same
> > spirit as is_errata93.
>
> The better way to handle this would be to move all these workarounds
> into notifiers that only get registered on the CPUs that actually have
> the bugs.
>
> There is right now no die notifier in the right place for this, but
> you could just add one there. This is no performance critical place.

agreed in principle, but i think it's perhaps a bit more maintainable if
we first aimed for unification, then did such cleanups on top of the
unified code. Almost everything we do prior to unification is double the
work.
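
As an illustration of the notifier idea being discussed (not of any
existing kernel interface), the sketch below keeps a small table of fault
handlers and registers a workaround only when the CPU is flagged as
affected. All names here are hypothetical, and demo_erratum() is a
placeholder check rather than a real erratum test.

```c
#include <stddef.h>

#define MAX_HANDLERS 8

typedef int (*fault_handler_t)(unsigned long address);

static fault_handler_t handlers[MAX_HANDLERS];
static size_t nr_handlers;

/* Append a handler; returns -1 when the table is full. */
static int register_fault_handler(fault_handler_t fn)
{
	if (nr_handlers == MAX_HANDLERS)
		return -1;
	handlers[nr_handlers++] = fn;
	return 0;
}

/* Returns 1 as soon as one handler claims the fault, else 0. */
static int run_fault_handlers(unsigned long address)
{
	size_t i;

	for (i = 0; i < nr_handlers; i++)
		if (handlers[i](address))
			return 1;
	return 0;
}

/* Placeholder erratum check: claims faults in one fixed 4 KiB page. */
static int demo_erratum(unsigned long address)
{
	return (address >> 12) == 0x10;
}

/* Called once at CPU setup; unaffected CPUs never run the check. */
static void setup_workarounds(int cpu_has_erratum)
{
	if (cpu_has_erratum)
		register_fault_handler(demo_erratum);
}
```

The fault path then only calls run_fault_handlers(), and the per-bug
checks disappear from the common code on unaffected machines.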

Ingo


Re: 2.6.24-rc8-mm1 Kernel oops will running kernbench

2008-01-18 Thread Paul Mackerras
Andrew Morton writes:

> On Fri, 18 Jan 2008 14:06:00 +0530 Kamalesh Babulal <[EMAIL PROTECTED]> wrote:
>
> > Hi Andrew,
> >
> > Following oops was seen while running kernbench on one of test machine
> > (power4+ box). I tried reproducing the oops but was unsuccessful.
> > I will try to reproduce the oops with debug info compiled.
> >
> >
> > Oops: Kernel access of bad area, sig: 11 [#1]
> > SMP NR_CPUS=32 NUMA pSeries
> > Modules linked in:
> > NIP: 4570 LR: 0fc42dc0 CTR:
> > REGS: c0077b6bf8c0 TRAP: 0300   Not tainted  (2.6.24-rc8-mm1-autotest)
> > MSR: 80001000 ME  CR: 28022422  XER:
> > DAR: c0077b6bfce0, DSISR: 0a00
> > TASK = c00773164c40[19588] 'as' THREAD: c0077b6bc000 CPU: 1
> > GPR00: 4000 c0077b6bfb40 7346 d032
> > GPR04: 043a  000c 0004
> > GPR08: 0fd278c8 48022424 c0077b6bfe30 998be2321500
> > GPR12: 80001030 c05f6280 1003 1003
> > GPR16: 1003 1005 1006aac0 10053cd0
> > GPR20:  0fe0 1005 1005
> > GPR24: 0ff8 0fe8 0062 0fd27490
> > GPR28: 0fd274c8 10099420 0fd25ff4 1009a400
> > NIP [4570] 0x4570
> > LR [0fc42dc0] 0xfc42dc0
> > Call Trace:
> > [c0077b6bfb40] [c0077b292000] 0xc0077b292000 (unreliable)
> > Instruction dump:
> > 4800    41820008
> > 4810    f92101a0
> >
>
> odd.  Where did the stack trace go?

It's there, it's just really really short (one line).  The link
register is in userspace and the stack pointer looks to be right at
the top of a kernel stack area.

The trap was a data access exception which is very odd given that the
machine is in real mode (MMU off) with the pc at 0x4570.  Actually it
looks like the machine probably got a data access exception somewhere
(probably in userspace, probably a page fault or similar) and then got
another exception before it had finished saving the state from the
first exception.

Kamalesh, do you still have the vmlinux?  If so could you disassemble
the area from say 0x4500 to 0x4600, and find out what is the closest
symbol before 0xc0004570 from System.map, and show us those?
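
Looking up "the closest symbol before an address" amounts to a search for
the last entry at or below that address in a table sorted by address. A
minimal sketch follows, assuming the System.map entries have already been
parsed into an array; struct sym and closest_symbol() are hypothetical
names, and any addresses fed to it would be the caller's own.

```c
#include <stddef.h>

struct sym {
	unsigned long addr;
	const char *name;
};

/*
 * Return the last entry whose address is <= addr, or NULL if addr lies
 * below the first symbol. The table must be sorted by ascending address.
 */
static const struct sym *closest_symbol(const struct sym *tab, size_t n,
					unsigned long addr)
{
	size_t lo = 0, hi = n;

	/* Binary search for the first entry with an address above addr. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (tab[mid].addr <= addr)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo ? &tab[lo - 1] : NULL;
}
```

The offset into the function is then simply addr minus the returned
entry's address.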

Paul.


Re: x86: kdump failure

2008-01-18 Thread Ingo Molnar

* Hiroshi Shimamoto <[EMAIL PROTECTED]> wrote:

> kdump needs ELF_CORE_COPY_REGS in crash_save_cpu(). This lack of the
> macro causes the following BUG.
>
> SysRq : Trigger a crashdump
> ------------[ cut here ]------------
> kernel BUG at include/linux/elfcore.h:105!
> invalid opcode: 0000 [1] PREEMPT SMP

thanks, applied.

> +/*
> + * regs is struct pt_regs, pr_reg is elf_gregset_t (which is
> + * now struct_user_regs, they are different)
> + */
> +
> +#define ELF_CORE_COPY_REGS(pr_reg, regs) do {\

this macro got removed by the regset patches. Roland, any ideas?

Ingo


Re: [PATCH] kgdb:unify x86-kgdb

2008-01-18 Thread Jan Kiszka
Jason Wessel wrote:
> Jan Kiszka wrote:
> > Jason Wessel wrote:
> > > Jan Kiszka wrote:
> > > > diff -up arch/x86/kernel/kgdb_32.c arch/x86/kernel/kgdb_64.c
> > > >
> > > > screamed for unification. Here it is.
> > > >
> > > > Signed-off-by: Jan Kiszka <[EMAIL PROTECTED]>
> > > >
> > > > ---
> > > >  arch/x86/kernel/Makefile_32 |    2
> > > >  arch/x86/kernel/Makefile_64 |    2
> > > >  arch/x86/kernel/kgdb.c      |  561
> > > >  arch/x86/kernel/kgdb_32.c   |  414
> > > >  arch/x86/kernel/kgdb_64.c   |  495
> > > >  5 files changed, 563 insertions(+), 911 deletions(-)
> > >
> > > FYI this is already done in the head branch for kgdb which is going into
> > > the -mm tree.  You will see the new patch set in the 2.6.25-rcX series.
> > >
> > > There is now a single x86-lite.patch which contains a number of other
> > > modifications to the core-lite.patch to make use of
> > > probe_kernel_address() and a new function probe_kernel_write().  Also
> > > the die hooks for the no context memory faults were removed.
> >
> > Hmm, is this any kind of patch I should have seen in your kernel.org
> > git? Which branch?
>

Meanwhile I realized that the actual head-of-development was some branch
in the corner of kgdb's cvs on sf.net - I hope this will change in the
future... :->

> I'll be posting an update to the -mm tree soon.  Presently, I have been
> updating the x86 patch set because the x86 developers provided some
> insight as to what needed to change to gain adoption into the mainline
> kernel.
>
>
> See:
> http://git.kernel.org/?p=linux/kernel/git/jwessel/linux-2.6-kgdb.git;a=shortlog;h=for_x86

Thanks, will pick it up and continue to hammer against it.

>
> Right now the scope is only 8 patches worth.  The other archs will be
> patched in when I move the patches forward against the -mm tree.
>
> If you or anyone else would like to help, I'll be happy to forward a
> copy of the todo list :-)

Can't promise anything beyond x86 ATM, because that's the arch our
customer asked for. But I will look around here if someone happens to
twiddle thumbs. ;)

Jan


Re: [PATCH -mm 1/2 -v2] kexec/i386: kexec page table code clean up - add arch_kimage

2008-01-18 Thread Simon Horman
On Tue, Jan 15, 2008 at 02:05:46PM +0800, Huang, Ying wrote:
> This patch add an architecture specific struct arch_kimage into struct
> kimage. Three pointers to page table pages used by kexec are added to
> struct arch_kimage. The page tables pages are dynamically allocated in
> machine_kexec_prepare instead of statically from BSS segment. This
> will save up to 20k memory when kexec image is not loaded.

I like this idea a lot.

Acked-by: Simon Horman <[EMAIL PROTECTED]>

> Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

-- 
Horms



Re: [patch] Converting writeback linked lists to a tree based data structure

2008-01-18 Thread Michael Rubin
On Jan 18, 2008 12:54 AM, David Chinner <[EMAIL PROTECTED]> wrote:
> At this point, I'd say it is best to leave it to the filesystem and
> the elevator to do their jobs properly.

Amen.

mrubin


Re: x86: Debug warning: early ioremap leak of 2 areas detected.

2008-01-18 Thread Ingo Molnar

* Jeremy Fitzhardinge <[EMAIL PROTECTED]> wrote:

> I got this when doing a test boot of a current x86 kernel under kvm.

sidenote, is this failure normal:

> acpiphp_ibm: ibm_acpiphp_init: acpi_walk_namespace failed

?

the leaked ioremap seems to be:

> early_ioremap(2fff0a10, 0040) [1] = Pid: 1, comm: swapper Not tainted 2.6.24-rc8 #1877
> [<c04e3913>] early_ioremap+0x49/0x157
> [<c0117057>] __acpi_map_table+0x2f/0x31
> [<c0279459>] acpi_os_map_memory+0x1a/0x1c
> [<c028b4fc>] acpi_tb_verify_table+0x20/0x4d
> [<c028acea>] acpi_get_table+0x4a/0x91
> [<c04ec303>] acpi_processor_init+0x35/0xcf
> [<c04d44b8>] kernel_init+0x14f/0x2a5
> [<c0107c12>] ? ret_from_fork+0x6/0x1c
> [<c04d4369>] ? kernel_init+0x0/0x2a5
> [<c04d4369>] ? kernel_init+0x0/0x2a5
> [<c01089f3>] kernel_thread_helper+0x7/0x10
> ===
> 0a10 + ffd4

> Debug warning: early ioremap leak of 2 areas detected.
> please boot with early_ioremap_debug and report the dmesg.

hm, why does it say 2? I only see a single backtrace in the dmesg you 
sent. ( Could you boot with ignore_loglevel to make sure you get all 
printks to the log? )

Btw., did the bootup otherwise go fine? The typical nesting is at most 2
levels, and i've kept the max nesting at 4 so the typical 1-2 leaks
should have no functional/correctness effect on the bootup, just that
warning message. Once we hit the 5th leaked ioremap we start rejecting
early_ioremap()s and that might result in boot failures.
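
The effect of a fixed nesting limit can be modelled with a simple slot
counter, as in the sketch below. It only illustrates the behaviour
described above and is not the early_ioremap implementation: MAX_NESTING,
early_map() and early_unmap() are hypothetical names, and a leak is
modelled as a map call without a matching unmap.

```c
#define MAX_NESTING 4	/* nesting limit mentioned above */

static int nesting;	/* currently outstanding early mappings */

/* Returns the slot index used, or -1 once all slots are taken. */
static int early_map(void)
{
	if (nesting >= MAX_NESTING)
		return -1;
	return nesting++;
}

/* Release the most recent mapping, if there is one. */
static void early_unmap(void)
{
	if (nesting > 0)
		nesting--;
}
```

With one or two leaked slots there are still two or three left for normal
use, so typical two-level nesting keeps working; only after four leaks
does every further request fail.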

Ingo


Re: [2.6.24-rc8-mm1] Locking API boot-time self-tests hangs.

2008-01-18 Thread Ingo Molnar

* Dave Young [EMAIL PROTECTED] wrote:

 On Jan 18, 2008 2:40 PM, Tetsuo Handa
 [EMAIL PROTECTED] wrote:
  Hello.
 
  Andrew Morton wrote:
   It could be compiler version dependent.  I used gcc-4.1.0.  Which version
   were you and Zan using please?
 
 I have the same problem, gcc 3.4.6

is the patch below already in -rc8-mm1? hmm ... it appears it's not. 
Could you give it a try?

Ingo

-
commit 5a26db5bd25cf4bf32ae9fa9f6136b6b6d5b45c5
Author: Nick Piggin [EMAIL PROTECTED]
Date:   Wed Jan 16 09:51:58 2008 +0100

lockdep: fix internal double unlock during self-test
Lockdep, during self-test (when it was simulating double unlocks) was
sometimes unconditionally unlocking a spinlock when it had not been
locked. This won't work for ticket locks.

Signed-off-by: Nick Piggin [EMAIL PROTECTED]
Signed-off-by: Ingo Molnar [EMAIL PROTECTED]
Signed-off-by: Peter Zijlstra [EMAIL PROTECTED]

diff --git a/kernel/lockdep.c b/kernel/lockdep.c
index 723bd9f..4335f12 100644
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -2943,9 +2943,10 @@ void lockdep_free_key_range(void *start, unsigned long size)
struct list_head *head;
unsigned long flags;
int i;
+   int locked;
 
raw_local_irq_save(flags);
-   graph_lock();
+   locked = graph_lock();
 
/*
 * Unhash all classes that were created by this module:
@@ -2959,7 +2960,8 @@ void lockdep_free_key_range(void *start, unsigned long size)
zap_class(class);
}
 
-   graph_unlock();
+   if (locked)
+   graph_unlock();
raw_local_irq_restore(flags);
 }
 
@@ -2969,6 +2971,7 @@ void lockdep_reset_lock(struct lockdep_map *lock)
struct list_head *head;
unsigned long flags;
int i, j;
+   int locked;
 
raw_local_irq_save(flags);
 
@@ -2987,7 +2990,7 @@ void lockdep_reset_lock(struct lockdep_map *lock)
 * Debug check: in the end all mapped classes should
 * be gone.
 */
-   graph_lock();
+   locked = graph_lock();
for (i = 0; i < CLASSHASH_SIZE; i++) {
head = classhash_table + i;
if (list_empty(head))
@@ -3000,7 +3003,8 @@ void lockdep_reset_lock(struct lockdep_map *lock)
}
}
}
-   graph_unlock();
+   if (locked)
+   graph_unlock();
 
 out_restore:
raw_local_irq_restore(flags);
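
The pattern the patch applies (remember whether the lock was really taken, and unlock only in that case) can be sketched in user space as below. This is a toy model, not lockdep: it assumes, as I understand graph_lock(), that the lock function returns 0 with the lock NOT held once debugging has been switched off. All toy_* names and the debug_enabled flag are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

static bool debug_enabled = true;  /* stands in for "lock debugging still active" */
static int lock_depth;             /* 1 while the toy lock is held */

/* Returns 1 with the lock held, or 0 with the lock NOT held. */
int toy_graph_lock(void)
{
	lock_depth++;
	if (!debug_enabled) {
		lock_depth--;      /* back out: the caller must not unlock */
		return 0;
	}
	return 1;
}

void toy_graph_unlock(void)
{
	assert(lock_depth == 1);   /* unlocking an unheld lock is a bug */
	lock_depth--;
}

/* Pattern from the patch: unlock only if the lock was actually taken. */
void toy_reset(void)
{
	int locked = toy_graph_lock();

	/* ... work on the protected data ... */

	if (locked)
		toy_graph_unlock();
}
```

An unconditional toy_graph_unlock() in toy_reset() would trip the assertion as soon as debug_enabled is false, which mirrors the double unlock the commit message describes.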



Re: x86: kdump failure

2008-01-18 Thread Subrata Modak
On Fri, 2008-01-18 at 10:02 +0100, Ingo Molnar wrote:
 * Hiroshi Shimamoto [EMAIL PROTECTED] wrote:
 
  kdump needs ELF_CORE_COPY_REGS in crash_save_cpu(). This lack of the 
  macro causes the following BUG.
  
  SysRq : Trigger a crashdump
  [ cut here ]
  kernel BUG at include/linux/elfcore.h:105!
  invalid opcode:  [1] PREEMPT SMP
 
 thanks, applied.
 
  +/*
  + * regs is struct pt_regs, pr_reg is elf_gregset_t (which is
  + * now struct_user_regs, they are different)
  + */
  +
  +#define ELF_CORE_COPY_REGS(pr_reg, regs) do {  \
 
 this macro got removed by the regset patches. Roland, any ideas?
 
   Ingo
Hi Ingo/Hiroshi Shimamoto,

There has also been a huge update to the ltp-kdump test suite. You can find
it at http://ltp.cvs.sourceforge.net/ltp/ltp/testcases/kdump/

--Subrata


[PATCH] module: add modinfo support for all built-in modules

2008-01-18 Thread rae l
On Jan 16, 2008 8:25 PM, Rusty Russell [EMAIL PROTECTED] wrote:
 I'd love to see patches.  module_parm showed it's possible, if messy.

 Thanks!
 Rusty.

here's the patch, I added .modinfo section to the vmlinux, to collect
built-in module information.

When CONFIG_MODULES is undefined (modules compiled built-in),
__MODULE_INFO now expands to something instead of nothing, so the
meaning of MODULE_LICENSE, MODULE_AUTHOR and MODULE_DESCRIPTION
changes as well: each macro defines one struct kernel_modinfo
entry in the .modinfo section of vmlinux, and one __initcall converts
all this information to read-only
files under /sys/modules/module-name/...

The MODULE_PARM_DESC macro is still different:
it generates entries with the same tag, which would confuse
sys_create_group, so I skipped them in the __initcall.
The parameters are already in /sys/modules//parameters/ (with
non-zero perm) or do not appear at all (with perm 0).
I think the parameter description is only useful for external
module files and is not needed in memory (under /sys/module/),
so a better solution is to define MODULE_PARM_DESC to nothing when
CONFIG_MODULES is undefined.

Another possible defect is that it compares two module names with
(km->modname != modname),
which depends on a gcc feature: identical constant strings are kept as
only one copy in the image.
This did work on my test machines, but I'm not sure whether it's
standard; if not, I would change it to strcmp.
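
To illustrate the concern about comparing names by pointer: ISO C leaves it unspecified whether identical string literals share storage, so pointer equality only tests whether both pointers refer to the same object, while strcmp tests the contents. A minimal sketch, with hypothetical helper names:

```c
#include <assert.h>
#include <string.h>

/* Pointer identity: true only if both point at the same object. */
int same_name_ptr(const char *a, const char *b)
{
	return a == b;
}

/* Content equality: true whenever the characters match. */
int same_name_str(const char *a, const char *b)
{
	assert(a && b);
	return strcmp(a, b) == 0;
}
```

Two separate buffers holding "scsi_transport_iscsi" compare equal with same_name_str() but not with same_name_ptr(), so the strcmp variant does not rely on the compiler or linker merging constants.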

Apparently this approach will increase the kernel image size. On a
moderate system (with a 1.8MB bzImage),
this patch increases vmlinux by 46KB and, after compression, increases
bzImage by 9.2KB.
Of the increase in vmlinux, the .modinfo section occupies 5.3KB
and the rest is constant strings.

However, iscsid now works well when the scsi_transport_iscsi module is
built-in, without the problem referred to in my former email.

please give comments.

From 50831a260b1ad2c8b495854a58408c1fbc75a3fe Mon Sep 17 00:00:00 2001
From: Denis Cheng [EMAIL PROTECTED]
Date: Fri, 18 Jan 2008 16:37:35 +0800
Subject: [PATCH] module: add modinfo support for all built-in modules

The current modinfo support is for external modules only; it provides module
information under /sys/module/XYZ/, such as version, ...;
some applications (such as iscsid of open-iscsi) have been designed to
use this module information, but built-in modules don't have modinfo support,
so these apps break if the modules they depend on are compiled built-in.

This patch adds modinfo support for all built-in modules, so no matter
whether the modules they depend on are built-in or external, module information
can always be accessed from /sys/module/XYZ/version, and apps won't break.

Signed-off-by: Denis Cheng [EMAIL PROTECTED]
---
 include/asm-generic/vmlinux.lds.h |7 ++
 include/linux/moduleparam.h   |   18 -
 kernel/module.c   |  147 +
 3 files changed, 170 insertions(+), 2 deletions(-)

diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 9f584cc..896f0fe 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -137,6 +137,13 @@
VMLINUX_SYMBOL(__start___param) = .;\
*(__param)  \
VMLINUX_SYMBOL(__stop___param) = .; \
+   }   \
+   \
+   /* Built-in module information. */  \
+   .modinfo : AT(ADDR(.modinfo) - LOAD_OFFSET) {   \
+   VMLINUX_SYMBOL(__start___modinfo) = .;  \
+   *(.modinfo) \
+   VMLINUX_SYMBOL(__stop___modinfo) = .;   \
VMLINUX_SYMBOL(__end_rodata) = .;   \
}   \
\
diff --git a/include/linux/moduleparam.h b/include/linux/moduleparam.h
index 13410b2..86ddbd4 100644
--- a/include/linux/moduleparam.h
+++ b/include/linux/moduleparam.h
@@ -13,16 +13,30 @@
 #define MODULE_PARAM_PREFIX KBUILD_MODNAME "."
 #endif

-#ifdef MODULE
 #define ___module_cat(a,b) __mod_ ## a ## b
 #define __module_cat(a,b) ___module_cat(a,b)
+
+#ifdef MODULE
 #define __MODULE_INFO(tag, name, info)   \
 static const char __module_cat(name,__LINE__)[]   \
   __attribute_used__ \
   __attribute__((section(".modinfo"),unused)) = __stringify(tag) "=" info
 #else  /* !MODULE */
-#define __MODULE_INFO(tag, name, info)
+struct kernel_modinfo {
+   char *modname;
+   char *tag;
+   char *info;
+};
+#define 

Re: [2.6.24-rc8-mm1] Locking API boot-time self-tests hangs.

2008-01-18 Thread Dave Young
On Jan 18, 2008 5:38 PM, Tetsuo Handa
[EMAIL PROTECTED] wrote:
 Hello.

 Ingo Molnar wrote:
  is the patch below already in -rc8-mm1? hmm ... it appears it's not.
  Could you give it a try?

Yes, it fixes that problem.


 This patch solved this bug.

So quick :)


 Thank you.



Re: [PATCH] [0/7] Some random x86 patches that should all go into git-x86

2008-01-18 Thread Ingo Molnar

* Andi Kleen [EMAIL PROTECTED] wrote:

 Some are reposts, some are not. See patch descriptions for details. I 
 believe I addressed all feedback that made sense in the reposted 
 patches.

thanks Andi, i've picked them up.

Ingo


Re: [PATCH -v6 0/2] Fixing the issue with memory-mapped file times

2008-01-18 Thread Miklos Szeredi
 4. Performance test was done using the program available from the
 following link:
 
 http://bugzilla.kernel.org/attachment.cgi?id=14493
 
 Result: the impact of the changes was negligible for files of a few
 hundred megabytes.

Could you also test with ext4 and post some numbers?  Afaik, ext4 uses
nanosecond timestamps, so the time updating code would be exercised
more during the page faults.

What about the performance impact on msync(MS_ASYNC)?  Could you please do
some measurements of that as well?

Thanks,
Miklos



Re: [2.6.24-rc8-mm1] Locking API boot-time self-tests hangs.

2008-01-18 Thread Tetsuo Handa
Hello.

Ingo Molnar wrote:
 is the patch below already in -rc8-mm1? hmm ... it appears it's not. 
 Could you give it a try?

This patch solved this bug.

Thank you.


Re: [PATCH -v6 1/2] Massive code cleanup of sys_msync()

2008-01-18 Thread Miklos Szeredi
   unsigned long end;
 - struct mm_struct *mm = current->mm;
 + int error, unmapped_error;
   struct vm_area_struct *vma;
 - int unmapped_error = 0;
 - int error = -EINVAL;
 + struct mm_struct *mm;
  
 + error = -EINVAL;

I think you may have misunderstood my last comment.  These are OK:

struct mm_struct *mm = current->mm;
int unmapped_error = 0;
int error = -EINVAL;

This is not so good:

int error, unmapped_error;

This is the worst:

int error = -EINVAL, unmapped_error = 0;

So I think the original code is fine as it is.

Otherwise the patch looks OK now.

Miklos


Re: 2.6.24-rc8-mm1 Kernel oops will running kernbench

2008-01-18 Thread Kamalesh Babulal
Paul Mackerras wrote:
 Andrew Morton writes:
 
 On Fri, 18 Jan 2008 14:06:00 +0530 Kamalesh Babulal [EMAIL PROTECTED] 
 wrote:

 Hi Andrew,

 Following oops was seen while running kernbench on one of test machine
 (power4+ box). I tried reproducing the oops but was unsuccessful. 
 I will try to reproduce the oops with debug info compiled.


 Oops: Kernel access of bad area, sig: 11 [#1]
 SMP NR_CPUS=32 NUMA pSeries
 Modules linked in:
 NIP: 4570 LR: 0fc42dc0 CTR: 
 REGS: c0077b6bf8c0 TRAP: 0300   Not tainted  (2.6.24-rc8-mm1-autotest)
 MSR: 80001000 ME  CR: 28022422  XER: 
 DAR: c0077b6bfce0, DSISR: 0a00
 TASK = c00773164c40[19588] 'as' THREAD: c0077b6bc000 CPU: 1
 GPR00: 4000 c0077b6bfb40 7346 d032 
 GPR04: 043a  000c 0004 
 GPR08: 0fd278c8 48022424 c0077b6bfe30 998be2321500 
 GPR12: 80001030 c05f6280 1003 1003 
 GPR16: 1003 1005 1006aac0 10053cd0 
 GPR20:  0fe0 1005 1005 
 GPR24: 0ff8 0fe8 0062 0fd27490 
 GPR28: 0fd274c8 10099420 0fd25ff4 1009a400 
 NIP [4570] 0x4570
 LR [0fc42dc0] 0xfc42dc0
 Call Trace:
 [c0077b6bfb40] [c0077b292000] 0xc0077b292000 (unreliable)
 Instruction dump:
 4800    41820008    
 4810    f92101a0    

 odd.  Where did the stack trace go?

Only this much was captured in the serial console.
 
 It's there, it's just really really short (one line).  The link
 register is in userspace and the stack pointer looks to be right at
 the top of a kernel stack area.
 
 The trap was a data access exception which is very odd given that the
 machine is in real mode (MMU off) with the pc at 0x4570.  Actually it
 looks like the machine probably got a data access exception somewhere
 (probably in userspace, probably a page fault or similar) and then got
 another exception before it had finished saving the state from the
 first exception.
 
 Kamalesh, do you still have the vmlinux?  If so could you disassemble
 the area from say 0x4500 to 0x4600, and find out what is the closest
 symbol before 0xc0004570 from System.map, and show us those?
 
 Paul.
 --
I tried reproducing the problem and succeeded; in the following trace
the pc is again at 0x4570, as in the one above:

 Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=32 NUMA pSeries
Modules linked in:
NIP: 4570 LR: 0ff0288c CTR: 0ff013e0
REGS: c0077e61f8c0 TRAP: 0300   Not tainted  (2.6.24-rc8-mm1-autotest)
MSR: 80001000 ME  CR: 28000422  XER: 
DAR: c0077e61fce0, DSISR: 0a00
TASK = c0077207f880[23480] 'cc1' THREAD: c0077e61c000 CPU: 3
GPR00: 4000 c0077e61fb40 0088 d032 
GPR04: 0088 030c fefefeff 7f7f7f7f 
GPR08: 8000 44000428 c0077e61fe30 998be2321500 
GPR12: 80001030 c05f6680 1003 1003 
GPR16: 105b 105b 1044 105b 
GPR20: 105b 105b 105b 105b 
GPR24: 105b 105b 105b ffa11b24 
GPR28:   0ffebff4 0ffec408 
NIP [4570] 0x4570
LR [0ff0288c] 0xff0288c
Call Trace:
[c0077e61fb40] [c0077e61fcf0] 0xc0077e61fcf0 (unreliable)
[c0077e61fbd0] [1044] 0x1044
Instruction dump:
4800    41820008    
4810    f92101a0    

The disassembled vmlinux from 0x4500 to 0x4600 

c0004500:   f9 4d 01 68   std     r10,360(r13)
c0004504:   48 02 89 f9   bl      c002cefc <.slb_allocate_realmode>
c0004508:   e9 4d 01 68   ld      r10,360(r13)
c000450c:   e8 6d 01 60   ld      r3,352(r13)
c0004510:   81 2d 01 5c   lwz     r9,348(r13)
c0004514:   7d 48 03 a6   mtlr    r10
c0004518:   71 8a 00 02   andi.   r10,r12,2
c000451c:   41 82 00 28   beq-    c0004544 <unrecov_slb>
c0004520:   7d 38 01 20   mtocrf  128,r9
c0004524:   7d 30 11 20   mtocrf  1,r9
c0004528:   e9 2d 01 20   ld      r9,288(r13)
c000452c:   e9 4d 01 28   ld      r10,296(r13)
c0004530:   e9 6d 01 30   ld      r11,304(r13)
c0004534:   e9 8d 01 38   ld      r12,312(r13)
c0004538:   e9 ad 01 40 ld  

[PATCH] Documentation: mention email-clients.txt in SubmittingPatches

2008-01-18 Thread Michael Opdenacker
Applies to 2.6.24-rc8-git2

I was struggling to get my email client not to mangle my patch files,
and I didn't find enough information in the SubmittingPatches file.
By looking for more information on the web, I eventually found the
email-clients.txt file, and it answered all my needs.

This patch adds a reference to email-clients.txt in SubmittingPatches,
and removes Mozilla-related information which is no longer accurate
(as opposed to the details found in email-clients.txt).

This should be helpful for people sending their first patches,
or not sending patches on a frequent basis.

Michael.

--
Signed-off-by: Michael Opdenacker [EMAIL PROTECTED]

diff -Naur linux-2.6.24-rc8-git2/Documentation/SubmittingPatches linux-2.6.24-rc8-git2-sp/Documentation/SubmittingPatches
--- linux-2.6.24-rc8-git2/Documentation/SubmittingPatches   2008-01-17 09:48:56.0 +0100
+++ linux-2.6.24-rc8-git2-sp/Documentation/SubmittingPatches   2008-01-18 10:29:46.0 +0100
@@ -220,20 +220,8 @@
 Exception:  If your mailer is mangling patches then someone may ask
 you to re-send them using MIME.
 
-
-WARNING: Some mailers like Mozilla send your messages with
- message header 
-Content-Type: text/plain; charset=us-ascii; format=flowed
- message header 
-The problem is that format=flowed makes some of the mailers
-on receiving side to replace TABs with spaces and do similar
-changes. Thus the patches from you can look corrupted.
-
-To fix this just make your mozilla defaults/pref/mailnews.js file to look like:
-pref(mailnews.send_plaintext_flowed, false); // RFC 2646===
-pref(mailnews.display.disable_format_flowed_support, true);
-
-
+See Documentation/email-clients.txt for hints about configuring
+your e-mail client so that it sends your patches untouched.
 
 8) E-mail size.
 

-- 
Michael Opdenacker, Free Electrons
Free Embedded Linux Training Materials
on http://free-electrons.com/training
(More than 1500 pages!)


Re: [PATCH] fix wrong sized spinlock flags argument

2008-01-18 Thread Mauro Carvalho Chehab
On Thu, 17 Jan 2008 16:05:06 -0800
Daniel Walker [EMAIL PROTECTED] wrote:

 
 On Thu, 2008-01-17 at 15:48 -0800, Linus Torvalds wrote:
  Applied.
  
  However, the patch itself didn't apply cleanly, because in my source tree, 
  these two lines are in a different order:
  
  On Thu, 17 Jan 2008, Daniel Walker wrote:

 pci_set_power_state(pci_dev, PCI_D0);
 pci_restore_state(pci_dev);
  
  but I actually think your order is the *correct* one (because I'm not at 
  all sure that config space writes are even guaranteed to make a difference 
  when in D3cold).
 
 I was actually using 2.6.24-rc8-mm1 . The code looked similar enough,
 but I must have overlooked the fact that the lines above got switched..

There were lots of changes in saa7134, including the implementation of S1/S3.
I'll run some tests here fixing the order and apply Daniel's patch.

Cheers,
Mauro


Re: setting jiffies as the clocksource stops time

2008-01-18 Thread Thomas Gleixner
On Fri, 18 Jan 2008, Balaji Rao wrote:

 On Friday 18 January 2008 04:04:33 am Jan Engelhardt wrote:
  
  On Jan 16 2008 13:20, Daniel Walker wrote:
  On Thu, 2008-01-17 at 02:09 +0530, Balaji Rao wrote:
   Hi,
   
   When i set jiffies as the current_clocksource, date(1) tells me
   that wallclock time has stopped, and soon after that, the system
   becomes unresponsive. This is not seen with CONFIG_NO_HZ disabled.
   
   I wonder how can jiffies be used as a clocksource.. Its value
   depends on the tick and when we turn off ticks, we would stop
   incrementing jiffies and when we come out of idle, we update
   the jiffies by reading the current_clocksource which now is
   'jiffies', and hence jiffies wouldn't get updated. Could this be
   the explanation ?
  
  You're right, it can't be used as a clocksource with nohz, and the system
  will refuse to automatically switch to it.
  
  I think that manually changing to jiffies by echoing into sysfs
  should also be prohibited.
  
 Yea, right. But why not unregister jiffies as a clocksource itself when we 
 get into NO_HZ ? I think it's much cleaner 
 provided it has no other consequences.

As I said before. I have a patch lined up for the same issue vs. PIT
clocksource and I'm adding that for jiffies as well. It's just not an
urgent issue, which needs to go into .24
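
The circular dependency Balaji describes in the quoted text can be shown with a small user-space model. This is only a sketch under simplifying assumptions: a free-running counter stands in for a real clocksource, and catch_up() stands in for the code that brings jiffies up to date after a tickless idle period. None of these names are kernel APIs.

```c
#include <assert.h>

typedef unsigned long (*read_fn)(void);

static unsigned long hw_counter;   /* free-running hardware counter (toy) */
static unsigned long jiffies_toy;  /* software tick count */

unsigned long read_hw(void)      { return hw_counter; }
unsigned long read_jiffies(void) { return jiffies_toy; }

/*
 * After an idle period with the tick stopped, catch jiffies up by asking
 * the clocksource how much time went by. Returns the number of ticks added.
 */
unsigned long catch_up(read_fn clocksource, unsigned long last)
{
	unsigned long delta;

	assert(clocksource);
	delta = clocksource() - last;
	jiffies_toy += delta;
	return delta;
}
```

With read_hw as the clocksource the elapsed time is recovered; with read_jiffies the delta is always zero, because the value being read is the very value that was supposed to be updated.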

Thanks

tglx



Re: [2.6.24 patch] x86: allow 64bit setting in Kconfig

2008-01-18 Thread Harvey Harrison

On Fri, 2008-01-18 at 11:44 +0100, Ingo Molnar wrote:
 * Adrian Bunk [EMAIL PROTECTED] wrote:
 
   # Select 32 or 64 bit
   config 64BIT
  -   bool "64-bit kernel" if ARCH = "x86"
  +   bool "64-bit kernel"
  default ARCH = "x86_64"
  help
Say yes to build a 64-bit kernel - formerly known as x86_64
 
 thx, i've added this to x86.git.

Style question, would the following be preferred?

config 64BIT
def_bool ARCH = "x86_64"
prompt "64-bit kernel"
help...


Cheers,

Harvey



[PATCH] usb-serial: pl2303: add support for RATOC REX-USB60F

2008-01-18 Thread Akira Tsukamoto
pl2303: add support for RATOC REX-USB60F

This patch adds support for RATOC REX-USB60F Serial Adapters,
which is widely used in Japan recently.

Signed-off-by: Akira Tsukamoto [EMAIL PROTECTED]
---

diff -uprX dontdiff linux-2.6.24-rc8.orig/drivers/usb/serial/pl2303.c linux-2.6.24-rc8/drivers/usb/serial/pl2303.c
--- linux-2.6.24-rc8.orig/drivers/usb/serial/pl2303.c   2008-01-18 18:11:51.0 +0900
+++ linux-2.6.24-rc8/drivers/usb/serial/pl2303.c   2008-01-18 18:43:28.0 +0900
@@ -65,6 +65,7 @@ static struct usb_device_id id_table [] 
{ USB_DEVICE(ITEGNO_VENDOR_ID, ITEGNO_PRODUCT_ID_2080) },
{ USB_DEVICE(MA620_VENDOR_ID, MA620_PRODUCT_ID) },
{ USB_DEVICE(RATOC_VENDOR_ID, RATOC_PRODUCT_ID) },
+   { USB_DEVICE(RATOC_VENDOR_ID, RATOC_PRODUCT_ID_USB60F) },
{ USB_DEVICE(TRIPP_VENDOR_ID, TRIPP_PRODUCT_ID) },
{ USB_DEVICE(RADIOSHACK_VENDOR_ID, RADIOSHACK_PRODUCT_ID) },
{ USB_DEVICE(DCU10_VENDOR_ID, DCU10_PRODUCT_ID) },
diff -uprX dontdiff linux-2.6.24-rc8.orig/drivers/usb/serial/pl2303.h linux-2.6.24-rc8/drivers/usb/serial/pl2303.h
--- linux-2.6.24-rc8.orig/drivers/usb/serial/pl2303.h   2008-01-18 18:11:51.0 +0900
+++ linux-2.6.24-rc8/drivers/usb/serial/pl2303.h   2008-01-18 18:42:28.0 +0900
@@ -35,6 +35,7 @@
 
 #define RATOC_VENDOR_ID0x0584
 #define RATOC_PRODUCT_ID   0xb000
+#define RATOC_PRODUCT_ID_USB60F0xb020
 
 #define TRIPP_VENDOR_ID0x2478
 #define TRIPP_PRODUCT_ID   0x2008
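
What the new table entry achieves can be sketched in user space: a driver is bound to a device when the device's vendor/product pair appears in the driver's ID table. The struct and function below are hypothetical stand-ins, not the kernel's struct usb_device_id or its matching code; only the three ID values are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RATOC_VENDOR_ID          0x0584
#define RATOC_PRODUCT_ID         0xb000
#define RATOC_PRODUCT_ID_USB60F  0xb020

struct toy_usb_id {
	uint16_t vendor;
	uint16_t product;
};

static const struct toy_usb_id toy_id_table[] = {
	{ RATOC_VENDOR_ID, RATOC_PRODUCT_ID },
	{ RATOC_VENDOR_ID, RATOC_PRODUCT_ID_USB60F },  /* entry added by the patch */
};

/* Returns 1 if the vendor/product pair is listed in the table, else 0. */
int toy_match(uint16_t vendor, uint16_t product)
{
	size_t n = sizeof(toy_id_table) / sizeof(toy_id_table[0]);
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++)
		if (toy_id_table[i].vendor == vendor &&
		    toy_id_table[i].product == product)
			return 1;
	return 0;
}
```

Without the second entry, a device reporting 0x0584:0xb020 would simply not match, so the driver would never be offered the device.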




Re: 2.6.24-rc8-mm1 Kernel oops will running kernbench

2008-01-18 Thread Balbir Singh
* Kamalesh Babulal [EMAIL PROTECTED] [2008-01-18 16:14:00]:

 Paul Mackerras wrote:
  Kamalesh Babulal writes:
  
  NIP: 4570 LR: 0fc42dc0 CTR: 
  REGS: c0077b6bf8c0 TRAP: 0300   Not tainted  
  (2.6.24-rc8-mm1-autotest)
  MSR: 80001000 ME  CR: 28022422  XER: 
  DAR: c0077b6bfce0, DSISR: 0a00
  
  Actually, how much RAM does this machine have?  If it has less than
  32GB, then the problem is that the kernel stack pointer is bogus.
  (How it got to be bogus is the interesting question, of course. :)
  
  Paul.
  
 Hi Paul,
 
 The machine has around 30GB of RAM. Do you want me to take
 the git-powerpc.patch out of the series and try reproducing the oops?


Kamalesh, I thought I saw Paul's request for trying without
git-powerpc.patch (it's in a separate email). 

-- 
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL


Re: x86: remove casts

2008-01-18 Thread Ingo Molnar

* Jan Engelhardt [EMAIL PROTECTED] wrote:

 This is against x86/mm.

hm, it has checkpatch failures - the changes introduce some new 
whitespace damage. Patch looks good otherwise. (please resend against 
latest x86.git)

Ingo


Re: [PATCH 7/7] driver-core : convert semaphore to mutex in struct class

2008-01-18 Thread Kay Sievers
On Fri, 2008-01-18 at 08:38 +0100, Jarek Poplawski wrote:
 On Fri, Jan 18, 2008 at 01:31:17PM +0800, Dave Young wrote:
  On Jan 18, 2008 11:18 AM, Kay Sievers [EMAIL PROTECTED] wrote:
 ...
   Yeah, might be better to wait until class_device is gone, otherwise you
   may need to fix stuff that is just going to be removed. Your change to
   have iterators for the class devices look like a nice preparation for
   future changes though.
  
   Our rough plan is:
2.6.25:
 - get the ~100 patches in Greg's tree (in -mm) merged :)
2.6.26:
   ???  - remove the 20 char limit in struct device
 - get rid of struct class_device
  
  Fine, thanks.
  
  Let's wait for other people's comment.
 
 Dave, I doubt you'll ever manage to do this if you're going to wait:
 probably there will be always some new changes like this around...

Well, these are not changes in that sense: the class_device stuff will
be entirely ripped out, and I doubt we will want to change anything
there shortly before it's deleted.

Also your assumptions about device nesting are not really true, there is
no limit, even when there are no current users nesting deeper, and
struct device can be any nesting depth, and that's where it gets
interesting.

Kay



Re: [PATCH] Fake NUMA emulation for PowerPC (Take 2)

2008-01-18 Thread Balbir Singh
* Michael Ellerman [EMAIL PROTECTED] [2008-01-18 16:44:58]:

 On Fri, 2008-01-18 at 16:34 +1100, Michael Ellerman wrote:
  On Sat, 2007-12-08 at 04:07 +0530, Balbir Singh wrote:
   Changelog
   
   1. Get rid of the constant 5 (based on comments from
   [EMAIL PROTECTED])
   2. Implement suggestions from Olof Johannson
   3. Check if cmdline is NULL in fake_numa_create_new_node()
   
   Tested with additional parameters from Olof
   
   numa=debug,fake=
   numa=foo,fake=bar
  
  
  I'm not sure why yet, but git bisect tells me it's this patch that's
  causing the for-2.6.25 tree to explode on boot on cell machines.
 
 This fixes it, although I'm a little worried about some of the
 removals/movings of node_set_online() in the patch.
 
 
 diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
 index 1666e7d..dcedc26 100644
 --- a/arch/powerpc/mm/numa.c
 +++ b/arch/powerpc/mm/numa.c
 @@ -49,7 +49,6 @@ static int __cpuinit fake_numa_create_new_node(unsigned long end_pfn,
   static unsigned int fake_nid = 0;
   static unsigned long long curr_boundary = 0;
  
 - *nid = fake_nid;
   if (!p)
   return 0;
  
 @@ -60,6 +59,7 @@ static int __cpuinit fake_numa_create_new_node(unsigned long end_pfn,
   if (mem < curr_boundary)
   return 0;
  
 + *nid = fake_nid;
   curr_boundary = mem;
  
   if ((end_pfn << PAGE_SHIFT) > mem) {
 

This patch makes sense, ideally fake_numa_create_new_node() should
just be a no-op in the case of machines with real NUMA nodes.
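
The effect of moving the "*nid = fake_nid" assignment below the early returns can be shown with a heavily simplified sketch. It assumes the caller passes in the real node id through nid and that a NULL command line means "no fake NUMA requested"; the memory-boundary logic of the real function is left out, and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

static unsigned int fake_nid;   /* next emulated node id (toy) */

/* Old order: clobbers *nid even when no fake node is requested. */
int create_node_buggy(const char *cmdline, unsigned int *nid)
{
	assert(nid);
	*nid = fake_nid;
	if (!cmdline)
		return 0;
	fake_nid++;
	return 1;
}

/* New order: *nid is written only after all early returns have passed. */
int create_node_fixed(const char *cmdline, unsigned int *nid)
{
	assert(nid);
	if (!cmdline)
		return 0;
	*nid = fake_nid;
	fake_nid++;
	return 1;
}
```

In the buggy order a machine with real NUMA nodes and no fake= option has every node id overwritten with 0, whereas in the fixed order the function leaves the caller's value alone, which is the no-op behaviour asked for above.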


-- 
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL


Re: [PATCH] Fake NUMA emulation for PowerPC (Take 2)

2008-01-18 Thread Balbir Singh
* Michael Ellerman [EMAIL PROTECTED] [2008-01-18 16:34:53]:

 On Sat, 2007-12-08 at 04:07 +0530, Balbir Singh wrote:
  Changelog
  
  1. Get rid of the constant 5 (based on comments from
  [EMAIL PROTECTED])
  2. Implement suggestions from Olof Johannson
  3. Check if cmdline is NULL in fake_numa_create_new_node()
  
  Tested with additional parameters from Olof
  
  numa=debug,fake=
  numa=foo,fake=bar
 
 
 I'm not sure why yet, but git bisect tells me it's this patch that's
 causing the for-2.6.25 tree to explode on boot on cell machines.


Hi,

Do you boot with numa=options on your machine? Could I have your
machine configuration? Any OOPS/log would be helpful.

-- 
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL


Re: [PATCH -v6 0/2] Fixing the issue with memory-mapped file times

2008-01-18 Thread Anton Salikhmetov
2008/1/18, Miklos Szeredi [EMAIL PROTECTED]:
  4. Performance test was done using the program available from the
  following link:
 
  http://bugzilla.kernel.org/attachment.cgi?id=14493
 
  Result: the impact of the changes was negligible for files of a few
  hundred megabytes.

 Could you also test with ext4 and post some numbers?  Afaik, ext4 uses
 nanosecond timestamps, so the time updating code would be exercised
 more during the page faults.

 What about performance impact on msync(MS_ASYNC)?  Could you please do
 some measurement of that as well?

I'll do the measurements for the MS_ASYNC case and for the Ext4 filesystem.


 Thanks,
 Miklos




Re: [RFC][PATCH 4/5] memory_pressure_notify() caller

2008-01-18 Thread Daniel Spång
On 1/17/08, KOSAKI Motohiro [EMAIL PROTECTED] wrote:
 Hi Daniel

   Thank you for pointing this out!
   Could you please post your test program and the steps to reproduce?
 
  Sure:
 
  1. Fill almost all available memory with page cache in a system without 
  swap.
  2. Run attached alloc-test program.
  3. Notification fires when page cache is reclaimed.

 Unfortunately, I can't reproduce it.

 my machine
 CPU:Pentium4 2.8GHz with HT
 memory: 512M


 1. I suspect ZONE_DMA; please try the patch below, which ignores ZONE_DMA.
 2. Could you please send your .config and /etc/sysctl.conf?
 I would like to try reproducing it again.

 thanks.

 - kosaki




 Signed-off-by: KOSAKI Motohiro [EMAIL PROTECTED]

 ---
  include/linux/mem_notify.h |3 +++
  mm/page_alloc.c|6 +-
  2 files changed, 8 insertions(+), 1 deletion(-)

 Index: linux-2.6.24-rc6-mm1-memnotify/include/linux/mem_notify.h
 ===
 --- linux-2.6.24-rc6-mm1-memnotify.orig/include/linux/mem_notify.h   2008-01-16 21:31:09.0 +0900
 +++ linux-2.6.24-rc6-mm1-memnotify/include/linux/mem_notify.h   2008-01-16 21:34:24.0 +0900
 @@ -22,6 +22,9 @@ static inline void memory_pressure_notif
 unsigned long target;
 unsigned long pages_high, pages_free, pages_reserve;

 +   if (unlikely(zone->mem_notify_status == -1))
 +   return;
 +
 if (pressure) {
 target = atomic_long_read(&last_mem_notify) + MEM_NOTIFY_FREQ;
 if (likely(time_before(jiffies, target)))
 Index: linux-2.6.24-rc6-mm1-memnotify/mm/page_alloc.c
 ===
 --- linux-2.6.24-rc6-mm1-memnotify.orig/mm/page_alloc.c   2008-01-13 19:50:27.0 +0900
 +++ linux-2.6.24-rc6-mm1-memnotify/mm/page_alloc.c   2008-01-16 21:41:58.0 +0900
 @@ -3467,7 +3467,11 @@ static void __meminit free_area_init_cor
 zone->zone_pgdat = pgdat;

 zone->prev_priority = DEF_PRIORITY;
 -   zone->mem_notify_status = 0;
 +
 +   if (zone->present_pages < (pgdat->node_present_pages / 10))
 +   zone->mem_notify_status = -1;
 +   else
 +   zone->mem_notify_status = 0;

 zone_pcp_init(zone);
 INIT_LIST_HEAD(&zone->active_list);

Your patch above solves the problem I had with early notification.
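
For readers following the thread, here is a minimal userspace sketch of the
zone-size check that the quoted patch adds to free_area_init_core(). Only the
"less than a tenth of the node" rule and the -1/0 status values come from the
patch; the function names, the 4K page size and the 16M ZONE_DMA figure are
assumptions made for illustration.

```c
#include <assert.h>

/*
 * Userspace model of the check in the patch: a zone holding less than
 * 1/10 of its node's present pages starts with status -1 (notification
 * ignored), every other zone starts with 0.
 * initial_mem_notify_status() is a hypothetical name, not a kernel function.
 */
static int initial_mem_notify_status(unsigned long zone_present_pages,
				     unsigned long node_present_pages)
{
	/* A zone cannot hold more pages than the node it belongs to. */
	assert(zone_present_pages <= node_present_pages);

	if (zone_present_pages < node_present_pages / 10)
		return -1;
	return 0;
}

/* Assumed layout: a 512M node with 4K pages and a 16M ZONE_DMA. */
static int demo_zone_status(int want_dma)
{
	unsigned long node_pages = 512UL * 1024 * 1024 / 4096;	/* 131072 */
	unsigned long dma_pages = 16UL * 1024 * 1024 / 4096;	/* 4096 */

	if (want_dma)
		return initial_mem_notify_status(dma_pages, node_pages);
	return initial_mem_notify_status(node_pages - dma_pages, node_pages);
}
```

Under these assumed sizes the small DMA zone is the one that gets -1, which
would match the early notification going away once that zone is ignored.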

Cheers,
Daniel


Re: [PATCH -v6 2/2] Updating ctime and mtime for memory-mapped files

2008-01-18 Thread Peter Zijlstra

On Fri, 2008-01-18 at 11:15 +0100, Peter Zijlstra wrote:
> On Fri, 2008-01-18 at 10:51 +0100, Miklos Szeredi wrote:
>
> > > diff --git a/mm/msync.c b/mm/msync.c
> > > index a4de868..a49af28 100644
> > > --- a/mm/msync.c
> > > +++ b/mm/msync.c
> > > @@ -13,11 +13,33 @@
> > >  #include <linux/syscalls.h>
> > >
> > >  /*
> > > + * Scan the PTEs for pages belonging to the VMA and mark them read-only.
> > > + * It will force a pagefault on the next write access.
> > > + */
> > > +static void vma_wrprotect(struct vm_area_struct *vma)
> > > +{
> > > +	unsigned long addr;
> > > +
> > > +	for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
> > > +		spinlock_t *ptl;
> > > +		pgd_t *pgd = pgd_offset(vma->vm_mm, addr);
> > > +		pud_t *pud = pud_offset(pgd, addr);
> > > +		pmd_t *pmd = pmd_offset(pud, addr);
> > > +		pte_t *pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
> > > +
> > > +		if (pte_dirty(*pte) && pte_write(*pte))
> > > +			*pte = pte_wrprotect(*pte);
> > > +		pte_unmap_unlock(pte, ptl);
> > > +	}
> > > +}
> >
> > What about ram based filesystems?  They don't start out with read-only
> > pte's, so I think they don't want them read-protected now either.
> > Unless this is essential for correct mtime/ctime accounting on these
> > filesystems (I don't think it really is).  But then the mapping should
> > start out read-only as well, otherwise the time update will only work
> > after an msync(MS_ASYNC).
>
> page_mkclean() has all the needed logic for this, it also walks the rmap
> and cleans out all other users, which I think is needed too for
> consistency's sake:
>
> Process A			Process B
>
> mmap(foo.txt)			mmap(foo.txt)
>
> dirty page
> 				dirty page
>
> msync(MS_ASYNC)
>
> 				dirty page
>
> msync(MS_ASYNC) <--- now what?!
>
>
> So what I would suggest is using the page table walkers from mm to
> walk the page range, obtain the page using vm_normal_page() and call
> page_mkclean(). (Oh, and ensure you don't nest the pte lock :-)
>
> All in all, that sounds rather expensive..

Bah, and will break on s390... so we'd need a page_mkclean() variant
that doesn't actually clear dirty.



Re: [PATCH] module: add modinfo support for all built-in modules

2008-01-18 Thread Dave Young
On Jan 18, 2008 5:54 PM, rae l <[EMAIL PROTECTED]> wrote:
> On Jan 16, 2008 8:25 PM, Rusty Russell <[EMAIL PROTECTED]> wrote:
> > I'd love to see patches.  module_parm showed it's possible, if messy.
> >
> > Thanks!
> > Rusty.
>
> here's the patch, I added a .modinfo section to the vmlinux, to collect
> built-in module information.
>
> I have just defined __MODULE_INFO to another meaning while
> CONFIG_MODULES is undefined (modules compiled built-in), instead of
> nothing; and so the meaning of MODULE_LICENSE, MODULE_AUTHOR and
> MODULE_DESCRIPTION also changed: each macro would define one
> struct kernel_modinfo entry in the .modinfo section of vmlinux; and one
> __initcall converts all this information to read-only files under
> /sys/module/module-name/...
>
> but the MODULE_PARM_DESC macro is still different:
> it generates entries with the same tag, that would confuse
> sys_create_group, so I skipped them in the __initcall,
> since the parameters had been in /sys/modules//parameters/ (with perm
> non-zero) or didn't appear (with perm 0);
> I think the parameter description might be only useful for external
> module files, not needed in memory (under /sys/module/),
> so a better solution is to define MODULE_PARM_DESC to nothing while
> CONFIG_MODULES is undefined.
>
> Another possible defect is that it compares two modnames with
> (km->modname != modname),
> which depends on a gcc feature: keeping only one copy of identical
> constant strings in the image.
> This did work on my test machines, but I'm not sure whether it's
> standard or not; if not, I would change it to strcmp.
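
A short illustration of the concern raised above. The C standard leaves it
unspecified whether identical string literals share storage, so comparing
module names by pointer only works when the toolchain happens to merge them.
The sketch below is not taken from the patch; the helper names are made up,
and it uses two separate character arrays so that the "two copies of the same
name" case is guaranteed rather than compiler-dependent.

```c
#include <assert.h>
#include <string.h>

/* Pointer identity: true only if both names are the very same object. */
static int same_name_by_pointer(const char *a, const char *b)
{
	return a == b;
}

/* Content equality: true whenever the two names spell the same string. */
static int same_name_by_content(const char *a, const char *b)
{
	assert(a != NULL && b != NULL);
	return strcmp(a, b) == 0;
}

/*
 * Two arrays initialised from the same text are distinct objects, so
 * they model a build in which the name strings were not merged.
 * Returns 1 if strcmp() matches them while the pointer test does not.
 */
static int demo_modname_compare(void)
{
	char copy1[] = "scsi_transport_iscsi";
	char copy2[] = "scsi_transport_iscsi";

	return same_name_by_content(copy1, copy2) &&
	       !same_name_by_pointer(copy1, copy2);
}
```

In other words, the strcmp() variant gives the same answer whether or not the
strings are merged, at the cost of a string walk per comparison.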

> Apparently this approach will increase the kernel image size. On a
> moderate system (with a 1.8MB bzImage),
> this patch would increase vmlinux by 46KB and, after compression,
> bzImage by 9.2KB.
> Of the vmlinux increase, the .modinfo section occupies 5.3KB
> and the rest is constant strings.
>
> However, iscsid can now work well when the scsi_transport_iscsi module
> is built-in, without the problem referred to in my former email.
>
> please give comments.

> From 50831a260b1ad2c8b495854a58408c1fbc75a3fe Mon Sep 17 00:00:00 2001
> From: Denis Cheng <[EMAIL PROTECTED]>
> Date: Fri, 18 Jan 2008 16:37:35 +0800
> Subject: [PATCH] module: add modinfo support for all built-in modules
>
> the current modinfo support is for external modules only, it provides module
> information under /sys/module/XYZ/, such as version, ...;
> now some applications (such as iscsid of open-iscsi) have been designed to
> use this module information; but built-in modules don't have modinfo support,
> so these apps would break if modules they depend on are compiled built-in.
>
> this patch adds modinfo support for all built-in modules, so now no matter
> whether the modules they depend on are built-in or external, the modules'
> information can always be accessed from /sys/module/XYZ/version, and apps
> won't break.
>
> Signed-off-by: Denis Cheng <[EMAIL PROTECTED]>
> ---
>  include/asm-generic/vmlinux.lds.h |    7 ++
>  include/linux/moduleparam.h       |   18 -
>  kernel/module.c                   |  147 +
>  3 files changed, 170 insertions(+), 2 deletions(-)
>
> diff --git a/include/asm-generic/vmlinux.lds.h
> b/include/asm-generic/vmlinux.lds.h
> index 9f584cc..896f0fe 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -137,6 +137,13 @@
>  		VMLINUX_SYMBOL(__start___param) = .;			\
>  		*(__param)						\
>  		VMLINUX_SYMBOL(__stop___param) = .;			\
> +	}								\
> +									\
> +	/* Built-in module information. */				\
> +	.modinfo : AT(ADDR(.modinfo) - LOAD_OFFSET) {			\
> +		VMLINUX_SYMBOL(__start___modinfo) = .;			\
> +		*(.modinfo)						\
> +		VMLINUX_SYMBOL(__stop___modinfo) = .;			\
>  		VMLINUX_SYMBOL(__end_rodata) = .;			\
>  	}								\
>  									\
> diff --git a/include/linux/moduleparam.h b/include/linux/moduleparam.h
> index 13410b2..86ddbd4 100644
> --- a/include/linux/moduleparam.h
> +++ b/include/linux/moduleparam.h
> @@ -13,16 +13,30 @@
>  #define MODULE_PARAM_PREFIX KBUILD_MODNAME "."
>  #endif
>
> -#ifdef MODULE
>  #define ___module_cat(a,b) __mod_ ## a ## b
>  #define __module_cat(a,b) ___module_cat(a,b)
> +
> +#ifdef MODULE
>  #define __MODULE_INFO(tag, name, info)					  \
>  static const char __module_cat(name,__LINE__)[]				  \
>    __attribute_used__							  \
>    __attribute__((section(".modinfo"),unused)) = __stringify(tag) "=" info
>  #else  /* !MODULE */
 

Re: HPET timer broken using 2.6.23.13 / nanosleep() hangs

2008-01-18 Thread Thomas Gleixner
On Wed, 16 Jan 2008, Andrew Paprocki wrote:

> I applied the patch and I am still locking up after
> Time: hpet clocksource has been installed.

That was expected :)

> I rebooted with clocksource=tsc to get the logs of the trace which
> was added. I'm assuming the grep below gets all the interesting parts.
> I enabled the HPET character device as mentioned before, which is why
> the hpet0 lines appear now.
>
> # dmesg | egrep -i "(hpet|time|clock)"
> ACPI: HPET 37FE7400, 0038 (r1 RS690  AWRDACPI 42302E31 AWRD   98)
> ATI board detected. Disabling timer routing over 8254.
> ACPI: PM-Timer IO Port: 0x4008
> ACPI: HPET id: 0x10b9a201 base: 0xfed0
> Kernel command line: vga=0x31a root=/dev/sda1 ro clocksource=tsc
> HPET check: t1=5 t2=1139 s=56226339975 n=56226539985

Ok, the counter works when we initialize the HPET.

t2-t1 = 1134 ticks ~= 79us
n-s = 200010 cycles in ~79us ~= 2525MHz --> That should be the frequency of your CPU.
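
The arithmetic above can be reproduced with a few lines of C. This is only a
sketch: it assumes the HPET runs at the common 14.31818 MHz rate (which is
consistent with the ~79us figure but is not stated in the log) and that s and
n are CPU cycle counter readings taken around the same interval. The function
names are made up for the example.

```c
#include <assert.h>

#define ASSUMED_HPET_MHZ 14.31818	/* assumption: typical HPET rate */

/* Convert a tick count to microseconds for a counter running at mhz. */
static double ticks_to_us(unsigned long ticks, double mhz)
{
	assert(mhz > 0.0);
	return (double)ticks / mhz;
}

/* Cycles per microsecond is numerically the frequency in MHz. */
static double cycles_per_us(unsigned long long cycles, double us)
{
	assert(us > 0.0);
	return (double)cycles / us;
}

/* Values taken from the "HPET check" line in the quoted dmesg output. */
static double estimate_cpu_mhz(void)
{
	unsigned long hpet_ticks = 1139 - 5;				/* t2 - t1 */
	unsigned long long cycles = 56226539985ULL - 56226339975ULL;	/* n - s */

	return cycles_per_us(cycles, ticks_to_us(hpet_ticks, ASSUMED_HPET_MHZ));
}
```

With these inputs the interval comes out at roughly 79.2us and the estimate
lands a little above 2525 MHz, in line with the figures in the mail.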

> Jan 16 14:44:43 am2 kernel: Call Trace:
> Jan 16 14:44:48 am2 kernel:  [<c01371be>] enqueue_hrtimer+0xd7/0xe2
> Jan 16 14:44:48 am2 kernel:  [<c0137803>] hrtimer_start+0xe8/0xf4
> Jan 16 14:44:48 am2 kernel:  [<c03ac8d3>] do_nanosleep+0x48/0x73
> Jan 16 14:44:48 am2 kernel:  [<c03ac932>] hrtimer_nanosleep_restart+0x34/0xa1
> Jan 16 14:44:48 am2 kernel:  [<c013735d>] hrtimer_wakeup+0x0/0x18
> Jan 16 14:44:48 am2 kernel:  [<c012e837>] sys_restart_syscall+0xe/0xf
> Jan 16 14:44:48 am2 kernel:  [<c0103d0a>] sysenter_past_esp+0x5f/0x85

When the system is hung, can you please hit SysRq-Q, wait a bit and hit
SysRq-Q again? Please provide the output.

Thanks,
tglx


Re: [PATCH -v6 1/2] Massive code cleanup of sys_msync()

2008-01-18 Thread Anton Salikhmetov
2008/1/18, Miklos Szeredi <[EMAIL PROTECTED]>:
> >  	unsigned long end;
> > -	struct mm_struct *mm = current->mm;
> > +	int error, unmapped_error;
> >  	struct vm_area_struct *vma;
> > -	int unmapped_error = 0;
> > -	int error = -EINVAL;
> > +	struct mm_struct *mm;
> >
> > +	error = -EINVAL;
>
> I think you may have misunderstood my last comment.  These are OK:
>
> 	struct mm_struct *mm = current->mm;
> 	int unmapped_error = 0;
> 	int error = -EINVAL;
>
> This is not so good:
>
> 	int error, unmapped_error;
>
> This is the worst:
>
> 	int error = -EINVAL, unmapped_error = 0;
>
> So I think the original code is fine as it is.
>
> Otherwise patch looks OK now.

I moved the initialization of the variables to the code where they are needed.

I don't agree that "int a; int b;" is better than "int a, b".

>
> Miklos



Re: [PATCH -v6 2/2] Updating ctime and mtime for memory-mapped files

2008-01-18 Thread Peter Zijlstra

On Fri, 2008-01-18 at 12:17 +0100, Miklos Szeredi wrote:
  diff --git a/mm/msync.c b/mm/msync.c
  index 144a757..a1b3fc6 100644
  --- a/mm/msync.c
  +++ b/mm/msync.c
  @@ -14,6 +14,122 @@
   #include <linux/syscalls.h>
   #include <linux/sched.h>
   
  +unsigned long masync_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
  +   unsigned long addr, unsigned long end)
  +{
  +   pte_t *pte;
  +   spinlock_t *ptl;
  +
  +   pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
  +   arch_enter_lazy_mmu_mode();
  +   do {
  +   pte_t ptent = *pte;
  +
  +   if (pte_none(ptent))
  +   continue;
  +
  +   if (!pte_present(ptent))
  +   continue;
  +
  +   if (pte_dirty(ptent) && pte_write(ptent)) {
  +   flush_cache_page(vma, addr, pte_pfn(ptent));
 
 Hmm, I'm not sure flush_cache_page() is needed.  Or does dirty
 data in the cache somehow interfere with the page protection?

No, just being paranoid..

  +   ptent = ptep_clear_flush(vma, addr, pte);
  +   ptent = pte_wrprotect(ptent);
  +   set_pte_at(vma->vm_mm, addr, pte, ptent);
  +   }
  +   } while (pte++, addr += PAGE_SIZE, addr != end);
  +   arch_leave_lazy_mmu_mode();
  +   pte_unmap_unlock(pte - 1, ptl);
  +
  +   return addr;
  +}
  +
  +unsigned long masync_pmd_range(struct vm_area_struct *vma, pud_t *pud,
  +   unsigned long addr, unsigned long end)
  +{
  +   pmd_t *pmd;
  +   unsigned long next;
  +
  +   pmd = pmd_offset(pud, addr);
  +   do {
  +   next = pmd_addr_end(addr, end);
  +   if (pmd_none_or_clear_bad(pmd))
  +   continue;
  +   next = masync_pte_range(vma, pmd, addr, next);
  +   } while (pmd++, addr = next, addr != end);
  +
  +   return addr;
  +}
  +
  +unsigned long masync_pud_range(struct vm_area_struct *vma, pgd_t *pgd,
  +   unsigned long addr, unsigned long end)
  +{
  +   pud_t *pud;
  +   unsigned long next;
  +
  +   pud = pud_offset(pgd, addr);
  +   do {
  +   next = pud_addr_end(addr, end);
  +   if (pud_none_or_clear_bad(pud))
  +   continue;
  +   next = masync_pmd_range(vma, pud, addr, next);
  +   } while (pud++, addr = next, addr != end);
  +
  +   return addr;
  +}
  +
  +unsigned long masync_pgd_range()
  +{
  +   pgd_t *pgd;
  +   unsigned long next;
  +
  +   pgd = pgd_offset(vma->vm_mm, addr);
  +   do {
  +   next = pgd_addr_end(addr, end);
  +   if (pgd_none_or_clear_bad(pgd))
  +   continue;
  +   next = masync_pud_range(vma, pgd, addr, next);
  +   } while (pgd++, addr = next, addr != end);
  +
  +   return addr;
  +}
  +
  +int masync_vma_one(struct vm_area_struct *vma,
  +   unsigned long start, unsigned long end)
  +{
  +   if (start < vma->vm_start)
  +   start = vma->vm_start;
  +
  +   if (end > vma->vm_end)
  +   end = vma->vm_end;
  +
  +   masync_pgd_range(vma, start, end);
  +
  +   return 0;
  +}
  +
  +int masync_vma(struct vm_area_struct *vma, 
  +   unsigned long start, unsigned long end)
  +{
  +   struct address_space *mapping;
  +   struct vm_area_struct *vma_iter;
  +
  +   if (!(vma->vm_flags & VM_SHARED))
  +   return 0;
  +
  +   mapping = vma->vm_file->f_mapping;
  +
  +   if (!mapping_cap_account_dirty(mapping))
  +   return 0;
  +
  +   spin_lock(&mapping->i_mmap_lock);
  +   vma_prio_tree_foreach(vma_iter, &iter, &mapping->i_mmap, start, end)
  +   masync_vma_one(vma_iter, start, end);
  +   spin_unlock(&mapping->i_mmap_lock);
 
 This is holding i_mmap_lock for possibly quite long.  Isn't that going
 to cause problems?

Possibly, I didn't see a quick way to break that iteration.
From a quick glance at prio_tree.c the iterator isn't valid anymore
after releasing i_mmap_lock. Fixing that would be,.. 'fun'.

I also realized I forgot to copy/paste the prio_tree_iter declaration
and ought to make all these functions static.

But for a quick draft it conveys the idea pretty well, I guess :-)



Re: [2.6.24 patch] x86: allow 64bit setting in Kconfig

2008-01-18 Thread Ingo Molnar

* Harvey Harrison <[EMAIL PROTECTED]> wrote:

>
> On Fri, 2008-01-18 at 11:44 +0100, Ingo Molnar wrote:
> > * Adrian Bunk <[EMAIL PROTECTED]> wrote:
> >
> > >  # Select 32 or 64 bit
> > >  config 64BIT
> > > -	bool "64-bit kernel" if ARCH = "x86"
> > > +	bool "64-bit kernel"
> > >  	default ARCH = "x86_64"
> > >  	help
> > >  	  Say yes to build a 64-bit kernel - formerly known as x86_64
> >
> > thx, i've added this to x86.git.
>
> Style question, would the following be preferred?
>
> config 64BIT
> 	def_bool ARCH = "x86_64"
> 	prompt "64-bit kernel"
> 	help...

sure, we could do that too.

Ingo


Re: Why is the kfree() argument const?

2008-01-18 Thread Giacomo A. Catenazzi

Jakob Oestergaard wrote:

On Thu, Jan 17, 2008 at 01:25:39PM -0800, Linus Torvalds wrote:
...

Why do you make that mistake, when it is PROVABLY NOT TRUE!

Try this trivial program:

int main(int argc, char **argv)
{
int i;
const int *c;

i = 5;
c = &i;
i = 10;
return *c;
}

and realize that according to the C rules, if it returns anything but 10, 
the compiler is *buggy*.


That's not how this works (as we obviously agree).

Please consider a rewrite of your example, demonstrating the usefulness and
proper application of const pointers:

extern foo(const int *);

int main(int argc, char **argv)
{
 int i;

 i = 5;
 foo(&i);
 return i;
}

Now, if the program returns anything else than 5, it means someone cast away
const, which is generally considered a bad idea in most other software
projects, for this very reason.

*That* is the purpose of const pointers.


restrict exists for this reason. const is only about lvalue.

You should draw a line, not to make C more complex!

Changing the name of variables in your example:

extern print_int(const int *);

int main(int argc, char **argv)
{
  extern int errno;

  errno = 0;
  print_int(&errno);
  return errno;
}

print_int() doesn't know that errno is also the argument.
and this compilation unit doesn't know that print_int() will
modify errno.

Ok, I changed int to extern int, but you see the point?
Do you want complex rules about const, depending on
context (extern, volatile,...) ?
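
To make the point of the examples in this thread concrete, here is a small
self-contained sketch (not from any of the mails; the function names are
invented). It shows that a pointer to const is only a read-only view: the
object may still change through another name, and the value read through the
const pointer changes with it.

```c
#include <assert.h>
#include <stddef.h>

/* May only read through p; it promises nothing about who else writes. */
static int read_through(const int *p)
{
	assert(p != NULL);
	return *p;
}

/*
 * Mirrors the first example in the thread: i is modified through its
 * own (non-const) name after a const pointer to it was taken.
 */
static int const_view_after_write(void)
{
	int i = 5;
	const int *c = &i;	/* read-only view of i, i itself is not const */

	i = 10;			/* legal: the write does not go through c */
	return read_through(c);
}

static int shared_counter;	/* plays the role of errno in the mail */

/* Reads through its const argument, yet modifies the same object by name. */
static int read_then_clobber(const int *p)
{
	int seen = *p;

	shared_counter = -1;	/* the callee is free to change the global */
	return seen;
}

/* Returns the global's value after the call; the caller passed it as const. */
static int demo_alias_through_global(void)
{
	shared_counter = 0;
	(void)read_then_clobber(&shared_counter);
	return shared_counter;
}
```

Both functions compile without a cast, which is the sense in which const on a
pointer parameter says nothing about aliasing.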

ciao
cate


Re: [2.6.24 patch] x86: allow 64bit setting in Kconfig

2008-01-18 Thread Sam Ravnborg
On Fri, Jan 18, 2008 at 02:50:48AM -0800, Harvey Harrison wrote:
>
> On Fri, 2008-01-18 at 11:44 +0100, Ingo Molnar wrote:
> > * Adrian Bunk <[EMAIL PROTECTED]> wrote:
> >
> > >  # Select 32 or 64 bit
> > >  config 64BIT
> > > -	bool "64-bit kernel" if ARCH = "x86"
> > > +	bool "64-bit kernel"
> > >  	default ARCH = "x86_64"
> > >  	help
> > >  	  Say yes to build a 64-bit kernel - formerly known as x86_64
> >
> > thx, i've added this to x86.git.
>
> Style question, would the following be preferred?
>
> config 64BIT
> 	def_bool ARCH = "x86_64"
> 	prompt "64-bit kernel"
> 	help...

No.
It is most common to let the prompt follow the type and not
as a separate property.

Sam


Re: [PATCH] update checkpatch.pl to version 0.13

2008-01-18 Thread Andy Whitcroft
On Thu, Jan 17, 2008 at 11:19:23AM -0800, Andrew Morton wrote:
> On Thu, 17 Jan 2008 16:23:51 - Andy Whitcroft <[EMAIL PROTECTED]> wrote:
>
> > This version brings a large number of fixes which have built up over
> > the Christmas period.  Mostly these are fixes for false positives, both
> > through improvements to unary checks and possible type detection.  It
> > also brings new checks for while location and CVS keywords.
>
> heh.  Doctor, heal thyself.

Heh, yeah I was feeling pressure to push out the update and forgot to
check it.  Spanner.

I have fixed the three lines which have random tabs on them.  It's
something I do in vi which is adding them; one day I will figure out
what the heck it is I do.

-apw

---
clean up some space violations in checkpatch.pl

Seems that something I do in vi leaves lines with multiple tabs on
them lying about.  Clean these up before editing things more.

Signed-off-by: Andy Whitcroft [EMAIL PROTECTED]
---
 checkpatch.pl |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/checkpatch.pl b/checkpatch.pl
index 07ba401..a2b4c41 100755
--- a/checkpatch.pl
+++ b/checkpatch.pl
@@ -341,7 +341,7 @@ sub sanitise_line {
my $clean = 'X' x length($1);
$res =~ s@(#\s*(?:error|warning)\s+)[EMAIL PROTECTED]@;
}
-   
+
return $res;
 }
 
@@ -947,7 +947,7 @@ sub process {
if ($realcnt) {
# Ignore goto labels.
if ($line =~ /$Ident:\*$/) {
-   
+
# Ignore functions being called
} elsif ($line =~ /^.\s*$Ident\s*\(/) {
 
@@ -1190,7 +1190,7 @@ sub process {
 
# Ignore those directives where spaces _are_ permitted.
if ($name =~ 
/^(?:if|for|while|switch|return|volatile|__volatile__|__attribute__|format|__extension__|Copyright|case)$/)
 {
-   
+
# cpp #define statements have non-optional spaces, ie
# if there is a space between the name and the open
# parenthesis it is simply not a parameter group.
-- 


Re: [PATCH] printk deadlocks if called with runqueue lock held

2008-01-18 Thread Steven Rostedt

On Fri, 18 Jan 2008, Jiri Kosina wrote:

> If this patch is going to be merged, you should perhaps adjust the comment
> introduced by the above mentioned commit, so that it reflects the new
> behavior.

Thanks for pointing this out. Updated patch below:

-- Steve
=

I thought that one could place a printk anywhere without worrying.
But it seems that it is not wise to place a printk where the runqueue
lock is held.

I just spent two hours debugging why some of my code was locking up,
to find that the lockup was caused by some debugging printk's that
I had in the scheduler.  The printk's were only in rare paths so
they shouldn't be too much of a problem, but after I hit the printk
the system locked up.

Thinking that it was locking up on my code I went looking down the
wrong path. I finally found (after examining an NMI dump) that
the lockup happened because printk was trying to wakeup the klogd
daemon, which caused a deadlock when the try_to_wakeup code tries
to grab the runqueue lock.

Since printks are seldom called with interrupts disabled, we can
hold off the waking of klogd if they are. We don't have access to
the runqueue locks from printk, but those locks need interrupts
disabled in order to be held.

Calling printk with interrupts disabled should only be done for
emergencies and debugging anyway.

And with this patch, my code ran fine ;-)

Signed-off-by: Steven Rostedt [EMAIL PROTECTED]
---
 kernel/printk.c |   16 
 1 file changed, 12 insertions(+), 4 deletions(-)

Index: linux-mcount.git/kernel/printk.c
===
--- linux-mcount.git.orig/kernel/printk.c   2008-01-18 06:29:15.0 
-0500
+++ linux-mcount.git/kernel/printk.c2008-01-18 06:32:38.0 -0500
@@ -595,9 +595,11 @@ static int have_callable_console(void)
  * @fmt: format string
  *
  * This is printk().  It can be called from any context.  We want it to work.
- * Be aware of the fact that if oops_in_progress is not set, we might try to
- * wake klogd up which could deadlock on runqueue lock if printk() is called
- * from scheduler code.
+ *
+ * Note: if printk() is called with interrupts disabled, it will not wake
+ * up the klogd. This is to avoid a deadlock from calling printk() in schedule
+ * with the runqueue lock held and having the wake_up grab the runqueue lock
+ * as well.
  *
  * We try to grab the console_sem.  If we succeed, it's easy - we log the 
output and
  * call the console drivers.  If we fail to get the semaphore we place the 
output
@@ -978,7 +980,13 @@ void release_console_sem(void)
 	console_locked = 0;
 	up(&console_sem);
 	spin_unlock_irqrestore(&logbuf_lock, flags);
-	if (wake_klogd)
+	/*
+	 * If we try to wake up klogd while printing with the runqueue lock
+	 * held, this will deadlock. We don't have access to the runqueue
+	 * lock from here, but just checking for interrupts disabled
+	 * should be enough.
+	 */
+	if (!irqs_disabled() && wake_klogd)
 		wake_up_klogd();
 }
 EXPORT_SYMBOL(release_console_sem);
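
The reasoning behind the new condition can be checked with a tiny userspace
model. This is only a sketch with invented names: plain booleans stand in for
irqs_disabled() and for "the runqueue lock is held", and the one fact taken
from the patch description is that the runqueue lock is only ever held with
interrupts disabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the patched test: wake klogd only when interrupts are enabled. */
static bool may_wake_klogd(bool irqs_off, bool wake_klogd)
{
	return !irqs_off && wake_klogd;
}

/* The deadlock needs both: runqueue lock already held and a wakeup attempt. */
static bool deadlock_possible(bool rq_lock_held, bool will_wake)
{
	return rq_lock_held && will_wake;
}

/*
 * Walks every state in which "rq lock held implies irqs off" holds and
 * counts the ones where the patched code could still deadlock.
 */
static int count_deadlock_states(void)
{
	int bad = 0;

	for (int rq = 0; rq <= 1; rq++) {
		for (int irqs_off = 0; irqs_off <= 1; irqs_off++) {
			if (rq && !irqs_off)
				continue;	/* excluded by the invariant */
			for (int wake = 0; wake <= 1; wake++) {
				bool will_wake = may_wake_klogd(irqs_off, wake);

				if (deadlock_possible(rq, will_wake))
					bad++;
			}
		}
	}
	return bad;
}
```

The check is conservative: it also skips the wakeup when interrupts are off
for reasons unrelated to the scheduler, which is the trade-off the mail
accepts.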


Re: [PATCH 7/7] driver-core : convert semaphore to mutex in struct class

2008-01-18 Thread Jarek Poplawski
On Fri, Jan 18, 2008 at 11:45:12AM +0100, Kay Sievers wrote:
> On Fri, 2008-01-18 at 08:38 +0100, Jarek Poplawski wrote:
> > On Fri, Jan 18, 2008 at 01:31:17PM +0800, Dave Young wrote:
> > > On Jan 18, 2008 11:18 AM, Kay Sievers <[EMAIL PROTECTED]> wrote:
> > ...
> > > > Yeah, might be better to wait until class_device is gone, otherwise you
> > > > may need to fix stuff that is just going to be removed. Your change to
> > > > have iterators for the class devices looks like a nice preparation for
> > > > future changes though.
> > > >
> > > > Our rough plan is:
> > > >  2.6.25:
> > > >   - get the ~100 patches in Greg's tree (in -mm) merged :)
> > > >  2.6.26:
> > > >   - remove the 20 char limit in struct device
> > > >   - get rid of struct class_device
> > >
> > > Fine, thanks.
> > >
> > > Let's wait for other people's comment.
> >
> > Dave, I doubt you'll ever manage to do this if you're going to wait:
> > probably there will be always some new changes like this around...
>
> Well there are not changes in that sense, the class_device stuff will
> be entirely ripped out, and I doubt we will want to change anything
> there, just shortly before it's deleted.

So, 2.6.26 means shortly... And this all needs some time for testing,
debugging or maybe some change of concept, so this would take a while...
Well, it's not my problem, but since this stuff will go away, shouldn't
we care more about the stuff that will stay?

> Also your assumptions about device nesting are not really true, there is
> no limit, even when there are no current users nesting deeper, and
> struct device can be any nesting depth, and that's where it gets
> interesting.

I'm just trying to figure this out. It seems this is a real problem
while freezing, but not necessarily here (but I can miss something).

Regards,
Jarek P.


RE: [PATCH 0/3] UCC TDM driver for MPC83xx platforms

2008-01-18 Thread Aggrwal Poonam
Hello All

The TDM driver does not have a proper framework right now. The
interface probably cannot be generalised as it stands, so we could not
decide whether it would be right to think of a TDM framework. In fact,
the interface this TDM driver (for MPC8323ERDB) supplies may not be
usable as is by some other client. Please advise on this.

But you are right as far as Freescale PowerPC platforms are concerned
which have TDM devices. Like, 8315 also has a TDM driver which also
exposes similar interface as 8323 because the client it is talking to is
the same.

Following is the small description of the TDM driver along with
interface details:

The dts file keeps a track of the TDM devices present on the board.
Depending on them the TDM driver initializes those many driver instances
while coming up.

At the upper level the driver can plug into more than one TDM client,
depending on the availability of TDM devices. On every new request from
a TDM client to bind to a TDM device, a free driver instance is
allocated to the client.

The interface can be described as follows.

tdm_register_client(struct tdm_client *)
This API returns a pointer to the structure tdm_client which is
of type

struct tdm_client {
	u32 driver_handle;
	u32 (*tdm_read)(u32 driver_handle, short chn_id,
			short *pcm_buffer, short len);
	u32 (*tdm_write)(u32 driver_handle, short chn_id,
			 short *pcm_buffer, short len);
	wait_queue_head_t *wakeup_event;
};

   It consists of:
   - driver_handle: identifies the particular TDM device/driver instance.
   - tdm_read: a function pointer returned by the TDM driver, used to
     read TDM data from a particular TDM channel.
   - tdm_write: a function pointer returned by the TDM driver, used to
     write TDM data to a particular TDM channel.
   - wakeup_event: the address of a wait queue on which the client
     sleeps and on which the TDM driver wakes it up periodically.
     The driver is configured to wake up the client every 10ms.

Once the TDM client gets registered to a TDM driver instance and a TDM
device, it interfaces  with the driver using tdm_read, tdm_write and
wakeup_event.

Note: The TDM driver can be used only by kernel-level modules. The
driver does not expose any file interface to user applications. It can
be compared to the SPI driver, which interfaces with SPI clients
through some APIs.
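
To make the calling convention above easier to discuss, here is a userspace
mock of how a client might use the interface. It is a sketch under several
assumptions: only the layout of struct tdm_client comes from the description
above; the mock_* functions, the channel and buffer sizes, the loopback
behaviour and the "return the number of samples transferred, 0 on error"
convention are all invented, and wakeup_event is left out because wait queues
do not exist outside the kernel.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t u32;

/* Layout from the mail; wakeup_event is omitted in this userspace mock. */
struct tdm_client {
	u32 driver_handle;
	u32 (*tdm_read)(u32 driver_handle, short chn_id,
			short *pcm_buffer, short len);
	u32 (*tdm_write)(u32 driver_handle, short chn_id,
			 short *pcm_buffer, short len);
};

#define MOCK_CHANNELS	4	/* hypothetical channel count */
#define MOCK_SAMPLES	80	/* hypothetical: 10ms of 8kHz samples */

/* One buffer per channel stands in for the TDM line (pure loopback). */
static short mock_line[MOCK_CHANNELS][MOCK_SAMPLES];

static int mock_args_ok(short chn_id, const short *buf, short len)
{
	return buf != NULL && chn_id >= 0 && chn_id < MOCK_CHANNELS &&
	       len >= 0 && len <= MOCK_SAMPLES;
}

/* Assumed convention: return the number of samples written, 0 on error. */
static u32 mock_tdm_write(u32 handle, short chn_id, short *pcm_buffer,
			  short len)
{
	(void)handle;
	if (!mock_args_ok(chn_id, pcm_buffer, len))
		return 0;
	memcpy(mock_line[chn_id], pcm_buffer, (size_t)len * sizeof(short));
	return (u32)len;
}

/* Assumed convention: return the number of samples read, 0 on error. */
static u32 mock_tdm_read(u32 handle, short chn_id, short *pcm_buffer,
			 short len)
{
	(void)handle;
	if (!mock_args_ok(chn_id, pcm_buffer, len))
		return 0;
	memcpy(pcm_buffer, mock_line[chn_id], (size_t)len * sizeof(short));
	return (u32)len;
}

/* Stand-in for tdm_register_client(): binds the client to mock instance 1. */
static int mock_tdm_register_client(struct tdm_client *client)
{
	assert(client != NULL);
	client->driver_handle = 1;
	client->tdm_read = mock_tdm_read;
	client->tdm_write = mock_tdm_write;
	return 0;
}

/* Writes three samples to channel 2 and reads them back; 1 on a match. */
static int demo_tdm_loopback(void)
{
	struct tdm_client client;
	short out[3] = { 100, -200, 300 };
	short in[3] = { 0, 0, 0 };

	if (mock_tdm_register_client(&client) != 0)
		return 0;
	if (client.tdm_write(client.driver_handle, 2, out, 3) != 3)
		return 0;
	if (client.tdm_read(client.driver_handle, 2, in, 3) != 3)
		return 0;
	return memcmp(in, out, sizeof(out)) == 0;
}
```

In the real driver the client would sleep on wakeup_event between such
read/write rounds; that part is deliberately not modelled here.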


I need your feedback on the interface details. Some changes were
suggested by Andrew for 32 bit tdm handle which I will modify.(Thanks
Andrew)

Please give your ideas about a TDM framework in the kernel and the
interface.

Waiting for your feedback.

Thanks and Regards
Poonam 
 
 

-Original Message-
From: Kumar Gala [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, January 15, 2008 9:01 AM
To: Andrew Morton
Cc: Phillips Kim; Aggrwal Poonam; [EMAIL PROTECTED];
[EMAIL PROTECTED]; [EMAIL PROTECTED];
[EMAIL PROTECTED]; linux-kernel@vger.kernel.org; Barkowski Michael;
Kalra Ashish; Cutler Richard
Subject: Re: [PATCH 0/3] UCC TDM driver for MPC83xx platforms


On Jan 14, 2008, at 3:15 PM, Andrew Morton wrote:

> On Mon, 14 Jan 2008 12:00:51 -0600
> Kim Phillips <[EMAIL PROTECTED]> wrote:
>
> > On Thu, 10 Jan 2008 21:41:20 -0700
> > Aggrwal Poonam <[EMAIL PROTECTED]> wrote:
> >
> > > Hello  All
> > >
> > > I am waiting for more feedback on the patches.
> > >
> > > If there are no objections please consider them for 2.6.25.
> >
> > if this isn't going to go through Alessandro Rubini/misc drivers, can
> > it go through the akpm/mm tree?
>
> That would work.  But it might be more appropriate to go
> Kumar->paulus->Linus.

I'm ok w/taking the arch/powerpc bits, but I'm a bit concerned about
the driver itself.  I'm wondering if we need a TDM framework in the
kernel.

I guess if Poonam could possibly describe how this driver is actually  
used that would be helpful.  I see we have 8315 with a discrete TDM  
block and I'm guessing 82xx/85xx based CPM parts of some form of TDM  
as well.

- k


Re: [PATCH -v6 2/2] Updating ctime and mtime for memory-mapped files

2008-01-18 Thread Miklos Szeredi
 diff --git a/mm/msync.c b/mm/msync.c
 index 144a757..a1b3fc6 100644
 --- a/mm/msync.c
 +++ b/mm/msync.c
 @@ -14,6 +14,122 @@
  #include <linux/syscalls.h>
  #include <linux/sched.h>
  
 +unsigned long masync_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 + unsigned long addr, unsigned long end)
 +{
 + pte_t *pte;
 + spinlock_t *ptl;
 +
 + pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
 + arch_enter_lazy_mmu_mode();
 + do {
 + pte_t ptent = *pte;
 +
 + if (pte_none(ptent))
 + continue;
 +
 + if (!pte_present(ptent))
 + continue;
 +
 + if (pte_dirty(ptent) && pte_write(ptent)) {
 + flush_cache_page(vma, addr, pte_pfn(ptent));

Hmm, I'm not sure flush_cache_page() is needed.  Or does dirty
data in the cache somehow interfere with the page protection?

 + ptent = ptep_clear_flush(vma, addr, pte);
 + ptent = pte_wrprotect(ptent);
 + set_pte_at(vma->vm_mm, addr, pte, ptent);
 + }
 + } while (pte++, addr += PAGE_SIZE, addr != end);
 + arch_leave_lazy_mmu_mode();
 + pte_unmap_unlock(pte - 1, ptl);
 +
 + return addr;
 +}
 +
 +unsigned long masync_pmd_range(struct vm_area_struct *vma, pud_t *pud,
 + unsigned long addr, unsigned long end)
 +{
 + pmd_t *pmd;
 + unsigned long next;
 +
 + pmd = pmd_offset(pud, addr);
 + do {
 + next = pmd_addr_end(addr, end);
 + if (pmd_none_or_clear_bad(pmd))
 + continue;
 + next = masync_pte_range(vma, pmd, addr, next);
 + } while (pmd++, addr = next, addr != end);
 +
 + return addr;
 +}
 +
 +unsigned long masync_pud_range(struct vm_area_struct *vma, pgd_t *pgd,
 + unsigned long addr, unsigned long end)
 +{
 + pud_t *pud;
 + unsigned long next;
 +
 + pud = pud_offset(pgd, addr);
 + do {
 + next = pud_addr_end(addr, end);
 + if (pud_none_or_clear_bad(pud))
 + continue;
 + next = masync_pmd_range(vma, pud, addr, next);
 + } while (pud++, addr = next, addr != end);
 +
 + return addr;
 +}
 +
 +unsigned long masync_pgd_range()
 +{
 + pgd_t *pgd;
 + unsigned long next;
 +
 + pgd = pgd_offset(vma->vm_mm, addr);
 + do {
 + next = pgd_addr_end(addr, end);
 + if (pgd_none_or_clear_bad(pgd))
 + continue;
 + next = masync_pud_range(vma, pgd, addr, next);
 + } while (pgd++, addr = next, addr != end);
 +
 + return addr;
 +}
 +
 +int masync_vma_one(struct vm_area_struct *vma,
 + unsigned long start, unsigned long end)
 +{
 + if (start < vma->vm_start)
 + start = vma->vm_start;
 +
 + if (end > vma->vm_end)
 + end = vma->vm_end;
 +
 + masync_pgd_range(vma, start, end);
 +
 + return 0;
 +}
 +
 +int masync_vma(struct vm_area_struct *vma, 
 + unsigned long start, unsigned long end)
 +{
 + struct address_space *mapping;
 + struct vm_area_struct *vma_iter;
 +
 + if (!(vma->vm_flags & VM_SHARED))
 + return 0;
 +
 + mapping = vma->vm_file->f_mapping;
 +
 + if (!mapping_cap_account_dirty(mapping))
 + return 0;
 +
 + spin_lock(&mapping->i_mmap_lock);
 + vma_prio_tree_foreach(vma_iter, &iter, &mapping->i_mmap, start, end)
 + masync_vma_one(vma_iter, start, end);
 + spin_unlock(&mapping->i_mmap_lock);

This is holding i_mmap_lock for possibly quite long.  Isn't that going
to cause problems?

Miklos

 +
 + return 0;
 +}
 +
  /*
   * MS_SYNC syncs the entire file - including mappings.
   *
 
 
 
 


Re: [PATCH] printk deadlocks if called with runqueue lock held

2008-01-18 Thread Jiri Kosina
On Thu, 17 Jan 2008, Steven Rostedt wrote:

 Thinking that it was locking up on my code I went looking down the wrong 
 path. I finally found (after examining an NMI dump) that the lockup 
 happened because printk was trying to wakeup the klogd daemon, which 
 caused a deadlock when the try_to_wakeup code tries to grab the runqueue 
 lock.

... which I have documented in the printk() comment in commit 1492192b
:)

 Since printks are seldom called with interrupts disabled, we can
 hold off the waking of klogd if they are. We don't have access to
 the runqueue locks from printk, but those locks need interrupts
 disabled in order to be held.

If this patch is going to be merged, you should perhaps adjust the comment 
introduced by the above mentioned commit, so that it reflects the new 
behavior.

Thanks,

-- 
Jiri Kosina


Re: [PATCH] x86: fix unconditional arch/x86/kernel/pcspeaker.c compiling

2008-01-18 Thread Ingo Molnar

* Michael Opdenacker [EMAIL PROTECTED] wrote:

  obj-$(CONFIG_PARAVIRT)   += paravirt_32.o
 -obj-y+= pcspeaker.o
 -
  obj-$(CONFIG_SCx200) += scx200_32.o
  
 +ifdef CONFIG_INPUT_PCSPKR
 + obj-y   += pcspeaker.o
 +endif

why didnt you make this:

  obj-$(CONFIG_INPUT_PCSPKR)+= pcspeaker.o

?

Your patch looks fine to me otherwise, obviously if someone disables 
PCSPKR intentionally in the .config, the kernel should just do that. 
Could you resend it with the above thing fixed, and against x86.git#mm? 
The x86.git coordinates are at:

 http://redhat.com/~mingo/x86.git/README

Ingo


[PATCH 004 of 4] md: Fix an occasional deadlock in raid5 - FIX

2008-01-18 Thread NeilBrown

(This should be merged with fix-occasional-deadlock-in-raid5.patch)

As we don't call stripe_handle in make_request any more, we need to
clear STRIPE_DELAYED too (previously done by stripe_handle) to ensure
that we test if the stripe still needs to be delayed or not.

Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/raid5.c |1 +
 1 file changed, 1 insertion(+)

diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
--- .prev/drivers/md/raid5.c2008-01-18 14:58:55.0 +1100
+++ ./drivers/md/raid5.c2008-01-18 14:59:53.0 +1100
@@ -3549,6 +3549,7 @@ static int make_request(struct request_q
}
finish_wait(&conf->wait_for_overlap, &w);
set_bit(STRIPE_HANDLE, &sh->state);
+   clear_bit(STRIPE_DELAYED, &sh->state);
release_stripe(sh);
} else {
/* cannot get stripe for read-ahead, just give-up */


[PATCH 003 of 4] md: Change ITERATE_RDEV_GENERIC to rdev_for_each_list, and remove ITERATE_RDEV_PENDING.

2008-01-18 Thread NeilBrown

Finish ITERATE_ to for_each conversion.

Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/md.c   |8 
 ./include/linux/raid/md_k.h |   14 --
 2 files changed, 8 insertions(+), 14 deletions(-)

diff .prev/drivers/md/md.c ./drivers/md/md.c
--- .prev/drivers/md/md.c   2008-01-18 11:19:09.0 +1100
+++ ./drivers/md/md.c   2008-01-18 11:19:24.0 +1100
@@ -3766,7 +3766,7 @@ static void autorun_devices(int part)
printk(KERN_INFO "md: considering %s ...\n",
bdevname(rdev0->bdev,b));
INIT_LIST_HEAD(&candidates);
-   ITERATE_RDEV_PENDING(rdev,tmp)
+   rdev_for_each_list(rdev, tmp, pending_raid_disks)
if (super_90_load(rdev, rdev0, 0) >= 0) {
printk(KERN_INFO "md:  adding %s ...\n",
bdevname(rdev->bdev,b));
@@ -3810,7 +3810,7 @@ static void autorun_devices(int part)
} else {
printk(KERN_INFO "md: created %s\n", mdname(mddev));
mddev->persistent = 1;
-   ITERATE_RDEV_GENERIC(candidates,rdev,tmp) {
+   rdev_for_each_list(rdev, tmp, candidates) {
list_del_init(&rdev->same_set);
if (bind_rdev_to_array(rdev, mddev))
export_rdev(rdev);
@@ -3821,7 +3821,7 @@ static void autorun_devices(int part)
/* on success, candidates will be empty, on error
 * it won't...
 */
-   ITERATE_RDEV_GENERIC(candidates,rdev,tmp)
+   rdev_for_each_list(rdev, tmp, candidates)
export_rdev(rdev);
mddev_put(mddev);
}
@@ -4936,7 +4936,7 @@ static void status_unused(struct seq_fil
 
seq_printf(seq, "unused devices: ");
 
-   ITERATE_RDEV_PENDING(rdev,tmp) {
+   rdev_for_each_list(rdev, tmp, pending_raid_disks) {
char b[BDEVNAME_SIZE];
i++;
seq_printf(seq, "%s ",

diff .prev/include/linux/raid/md_k.h ./include/linux/raid/md_k.h
--- .prev/include/linux/raid/md_k.h 2008-01-18 11:19:09.0 +1100
+++ ./include/linux/raid/md_k.h 2008-01-18 11:19:24.0 +1100
@@ -313,23 +313,17 @@ static inline char * mdname (mddev_t * m
  * iterates through some rdev ringlist. It's safe to remove the
  * current 'rdev'. Dont touch 'tmp' though.
  */
-#define ITERATE_RDEV_GENERIC(head,rdev,tmp)\
+#define rdev_for_each_list(rdev, tmp, list)\
\
-   for ((tmp) = (head).next;   \
+   for ((tmp) = (list).next;   \
(rdev) = (list_entry((tmp), mdk_rdev_t, same_set)), \
-   (tmp) = (tmp)->next, (tmp)->prev != &(head) \
+   (tmp) = (tmp)->next, (tmp)->prev != &(list) \
; )
 /*
  * iterates through the 'same array disks' ringlist
  */
 #define rdev_for_each(rdev, tmp, mddev)\
-   ITERATE_RDEV_GENERIC((mddev)->disks,rdev,tmp)
-
-/*
- * Iterates through 'pending RAID disks'
- */
-#define ITERATE_RDEV_PENDING(rdev,tmp) \
-   ITERATE_RDEV_GENERIC(pending_raid_disks,rdev,tmp)
+   rdev_for_each_list(rdev, tmp, (mddev)->disks)
 
 typedef struct mdk_thread_s {
void(*run) (mddev_t *mddev);
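
The iterator in the patch above tolerates removal of the current entry because 'tmp' is advanced before the loop body runs. A minimal userspace sketch of that idea follows. Everything in it (struct item, item_for_each_list, drop_even, count_items) is a hypothetical stand-in and not md code; unlike the kernel macro, this variant checks for the list head before converting the pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Circular doubly linked list, in the style of the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	list_init(e);
}

/* Hypothetical stand-in for mdk_rdev_t. */
struct item {
	int value;
	struct list_head same_set;
};

#define item_of(ptr) \
	((struct item *)((char *)(ptr) - offsetof(struct item, same_set)))

/* 'tmp' is advanced before the body runs, so the body may unlink 'it'. */
#define item_for_each_list(it, tmp, list)				\
	for ((tmp) = (list).next;					\
	     (tmp) != &(list) &&					\
	     ((it) = item_of(tmp), (tmp) = (tmp)->next, 1); )

/* Unlink every even-valued item while iterating; returns how many. */
static int drop_even(struct list_head *head)
{
	struct item *it;
	struct list_head *tmp;
	int removed = 0;

	assert(head != NULL);
	item_for_each_list(it, tmp, *head) {
		if (it->value % 2 == 0) {
			list_del_init(&it->same_set);
			removed++;
		}
	}
	return removed;
}

/* Count the items currently linked on the list. */
static int count_items(struct list_head *head)
{
	struct item *it;
	struct list_head *tmp;
	int n = 0;

	item_for_each_list(it, tmp, *head) {
		(void)it;
		n++;
	}
	return n;
}
```

As in the original comment, the body may remove the current entry but must leave 'tmp' alone, since 'tmp' already points at the next element.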


[PATCH 000 of 4] md: assorted md patches - please read carefully.

2008-01-18 Thread NeilBrown

Following are 4 patches for md.

The first two replace
   md-allow-devices-to-be-shared-between-md-arrays.patch
which was recently removed.  They should go at the same place in the
series, between
md-allow-a-maximum-extent-to-be-set-for-resyncing.patch
and
md-lock-address-when-changing-attributes-of-component-devices.patch

The third is a replacement for

md-change-iterate_rdev_generic-to-rdev_for_each_list-and-remove-iterate_rdev_pending.patch

which conflicts with the above change.

The last is a fix for
md-fix-an-occasional-deadlock-in-raid5.patch

which makes me a lot happier about this patch.  It introduced a
performance regression and I now understand why.  I'm now happy for
that patch with this fix to go into 2.6.24 if that is convenient (If
not, 2.6.24.1 will do).

Thanks,
NeilBrown


 [PATCH 001 of 4] md: Set and test the ->persistent flag for md devices more consistently.
 [PATCH 002 of 4] md: Allow devices to be shared between md arrays.
 [PATCH 003 of 4] md: Change ITERATE_RDEV_GENERIC to rdev_for_each_list, and remove ITERATE_RDEV_PENDING.
 [PATCH 004 of 4] md: Fix an occasional deadlock in raid5 - FIX


Re: [2.6.24 patch] x86: allow 64bit setting in Kconfig

2008-01-18 Thread Ingo Molnar

* Adrian Bunk [EMAIL PROTECTED] wrote:

  # Select 32 or 64 bit
  config 64BIT
 - bool "64-bit kernel" if ARCH = "x86"
 + bool "64-bit kernel"
   default ARCH = "x86_64"
   help
 Say yes to build a 64-bit kernel - formerly known as x86_64

thx, i've added this to x86.git.

Ingo


Re: 2.6.24-rc8-mm1 Kernel oops while running kernbench

2008-01-18 Thread Kamalesh Babulal
Paul Mackerras wrote:
 Kamalesh Babulal writes:
 
 NIP: 4570 LR: 0fc42dc0 CTR: 
 REGS: c0077b6bf8c0 TRAP: 0300   Not tainted  (2.6.24-rc8-mm1-autotest)
 MSR: 80001000 ME  CR: 28022422  XER: 
 DAR: c0077b6bfce0, DSISR: 0a00
 
 Actually, how much RAM does this machine have?  If it has less than
 32GB, then the problem is that the kernel stack pointer is bogus.
 (How it got to be bogus is the interesting question, of course. :)
 
 Paul.
 
Hi Paul,

The machine has around 30GB of RAM. Do you want me to take the
git-powerpc.patch out of the series and try reproducing the oops?

-- 
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.


Re: [Bluez-devel] Oops involving RFCOMM and sysfs

2008-01-18 Thread Dave Young
On Jan 18, 2008 6:23 PM, Cornelia Huck [EMAIL PROTECTED] wrote:
 On Fri, 18 Jan 2008 10:19:33 +0100,
 Cornelia Huck [EMAIL PROTECTED] wrote:

  
   if (IS_ERR(new_parent_kobj)) {
       error = PTR_ERR(new_parent_kobj);
       put_device(new_parent);
       goto out;
   }
   pr_debug("DEVICE: moving '%s' to '%s'\n", dev->bus_id,
            new_parent ? new_parent->bus_id : "<NULL>");
   error = kobject_move(&dev->kobj, new_parent_kobj);
   if (error) {
       put_device(new_parent);
   1323 put_device(new_parent);
  
   imagine new_parent is NULL, then the new_parent_kobj should be put
 
  No, we would need a put_device_parent() (crappy name) which puts the
  reference iff get_device_parent() grabbed it.

 And looking at Greg's patchset, it has cleanup_device_parent(), which
 does just that. But it is only called in device_del(), not when
 device_move() has errors.

 (get_device_parent() also always returns a pointer to a kobject or
 NULL, so we can get rid of those IS_ERR() checks in setup_parent() and
 device_move() as well.)


Hmm, thanks.
I will be offline during weekend,  but I will still check the
device_move and other code if I have time.


Re: [PATCH] Fake NUMA emulation for PowerPC (Take 2)

2008-01-18 Thread Balbir Singh
* Michael Ellerman [EMAIL PROTECTED] [2008-01-18 16:55:03]:

 On Sat, 2007-12-08 at 04:07 +0530, Balbir Singh wrote:
  Here's a dumb simple implementation of fake NUMA nodes for PowerPC. Fake
  NUMA nodes can be specified using the following command line option
  
 
  
  Comments are as always welcome!
 
 Here's some :)
 

Thanks!

  diff -puN arch/powerpc/mm/numa.c~ppc-fake-numa-easy arch/powerpc/mm/numa.c
  --- linux-2.6.24-rc4-mm1/arch/powerpc/mm/numa.c~ppc-fake-numa-easy  
  2007-12-07 21:25:55.0 +0530
  +++ linux-2.6.24-rc4-mm1-balbir/arch/powerpc/mm/numa.c  2007-12-08 
  03:19:46.0 +0530
  @@ -24,6 +24,8 @@
   
   static int numa_enabled = 1;
   
  +static char *cmdline __initdata;
 
 Can you call this fake_numa_args or something, cmdline is a bit generic.
 


I could if it makes code easier to understand. Will put it in my TODO
list.

 
  @@ -39,6 +41,43 @@ static bootmem_data_t __initdata plat_no
   static int min_common_depth;
   static int n_mem_addr_cells, n_mem_size_cells;
   
  +static int __cpuinit fake_numa_create_new_node(unsigned long end_pfn,
  +   unsigned int *nid)
  +{
  +   unsigned long long mem;
  +   char *p = cmdline;
  +   static unsigned int fake_nid = 0;
  +   static unsigned long long curr_boundary = 0;
  +
  +   *nid = fake_nid;
 
 As I mentioned in my other email I think this is broken, you
 unconditionally overwrite *nid, even if no fake numa was specified?
 

Aah.. OK.. looks like a BUG. I'll also respond to your other email.


  +   if (!p)
  +   return 0;
  +
  +   mem = memparse(p, &p);
  +   if (!mem)
  +   return 0;
  +
  +   if (mem < curr_boundary)
  +   return 0;
  +
  +   curr_boundary = mem;
  +
  +   if ((end_pfn << PAGE_SHIFT) > mem) {
  +   /*
  +* Skip commas and spaces
  +*/
  +   while (*p == ',' || *p == ' ' || *p == '\t')
  +   p++;
  +
  +   cmdline = p;
  +   fake_nid++;
  +   *nid = fake_nid;
  +   dbg("created new fake_node with id %d\n", fake_nid);
  +   return 1;
  +   }
  +   return 0;
  +}
  +
   static void __cpuinit map_cpu_to_node(int cpu, int node)
   {
  numa_cpu_lookup_table[cpu] = node;
  @@ -344,12 +383,14 @@ static void __init parse_drconf_memory(s
  if (nid == 0x || nid >= MAX_NUMNODES)
  nid = default_nid;
  }
  -   node_set_online(nid);
   
  size = numa_enforce_memory_limit(start, lmb_size);
  if (!size)
  continue;
   
  +   fake_numa_create_new_node(((start + size) >> PAGE_SHIFT), &nid);
  +   node_set_online(nid);
 
 I can't convince myself that this is 100% ok, the moving of
 node_set_online(). At the very least it's a change in behaviour,
 previously we would online the node regardless of the memory limit.
 

Hmm.. this can be reverted, but do we gain anything by enabling nodes,
even though we are over the memory limit?


  add_active_range(nid, start >> PAGE_SHIFT,
   (start >> PAGE_SHIFT) + (size >> PAGE_SHIFT));
  }
  @@ -429,7 +470,6 @@ new_range:
  nid = of_node_to_nid_single(memory);
  if (nid < 0)
  nid = default_nid;
  -   node_set_online(nid);
   
  if (!(size = numa_enforce_memory_limit(start, size))) {
  if (--ranges)
  @@ -438,6 +478,9 @@ new_range:
  continue;
  }
   
  +   fake_numa_create_new_node(((start + size) >> PAGE_SHIFT), &nid);
  +   node_set_online(nid);
 
 Ditto previous comment.
 

Yes, point noted.

Thanks for your review and problem report.
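
To make the *nid concern above concrete, here is a minimal userspace sketch of the boundary parsing in fake_numa_create_new_node(), with one possible guard so the caller's node id is only overridden once a fake node exists. This is an assumption about how the fix could look, not the posted patch. parse_size() is a stand-in for the kernel's memparse(), struct fake_numa replaces the static variables, and PAGE_SHIFT is assumed to be 12.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Stand-in for memparse(): a number with an optional K/M/G suffix. */
static unsigned long long parse_size(const char *s, char **end)
{
	unsigned long long v = strtoull(s, end, 0);

	switch (**end) {
	case 'G': case 'g':
		v <<= 10;	/* fall through */
	case 'M': case 'm':
		v <<= 10;	/* fall through */
	case 'K': case 'k':
		v <<= 10;
		(*end)++;
		break;
	default:
		break;
	}
	return v;
}

/* Hypothetical container for what the patch keeps in static variables. */
struct fake_numa {
	char *args;			/* remaining boundaries, NULL if unset */
	unsigned int fake_nid;		/* last fake node id handed out */
	unsigned long long curr_boundary;
};

/*
 * Returns 1 and bumps *nid when the range ending at end_pfn crosses the
 * next boundary in st->args; otherwise returns 0. *nid is left alone
 * until at least one fake node has been created.
 */
static int fake_node_for_range(struct fake_numa *st, unsigned long end_pfn,
			       unsigned int *nid)
{
	unsigned long long mem;
	char *p = st->args;

	assert(nid != NULL);
	if (st->fake_nid)
		*nid = st->fake_nid;
	if (!p || !*p)
		return 0;

	mem = parse_size(p, &p);
	if (!mem || mem < st->curr_boundary)
		return 0;
	st->curr_boundary = mem;

	if (((unsigned long long)end_pfn << PAGE_SHIFT) > mem) {
		/* Skip commas and spaces before the next boundary. */
		while (*p == ',' || *p == ' ' || *p == '\t')
			p++;
		st->args = p;
		st->fake_nid++;
		*nid = st->fake_nid;
		return 1;
	}
	return 0;
}
```

With args set to "1G,2G", a range ending below 1G keeps the caller's nid, the first range ending above 1G gets node 1, and the first range ending above 2G gets node 2. With no arguments at all the nid is never touched.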

 cheers
 
 -- 
 Michael Ellerman
 OzLabs, IBM Australia Development Lab
 
 wwweb: http://michael.ellerman.id.au
 phone: +61 2 6212 1183 (tie line 70 21183)
 
 We do not inherit the earth from our ancestors,
 we borrow it from our children. - S.M.A.R.T Person



-- 
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL


Re: [PATCH -v6 2/2] Updating ctime and mtime for memory-mapped files

2008-01-18 Thread Peter Zijlstra

On Fri, 2008-01-18 at 11:38 +0100, Miklos Szeredi wrote:
  On Fri, 2008-01-18 at 10:51 +0100, Miklos Szeredi wrote:
  
diff --git a/mm/msync.c b/mm/msync.c
index a4de868..a49af28 100644
--- a/mm/msync.c
+++ b/mm/msync.c
@@ -13,11 +13,33 @@
 #include <linux/syscalls.h>
 
 /*
+ * Scan the PTEs for pages belonging to the VMA and mark them read-only.
+ * It will force a pagefault on the next write access.
+ */
+static void vma_wrprotect(struct vm_area_struct *vma)
+{
+   unsigned long addr;
+
+   for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
+   spinlock_t *ptl;
+   pgd_t *pgd = pgd_offset(vma->vm_mm, addr);
+   pud_t *pud = pud_offset(pgd, addr);
+   pmd_t *pmd = pmd_offset(pud, addr);
+   pte_t *pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
+
+   if (pte_dirty(*pte) && pte_write(*pte))
+   *pte = pte_wrprotect(*pte);
+   pte_unmap_unlock(pte, ptl);
+   }
+}
   
   What about ram based filesystems?  They don't start out with read-only
   pte's, so I think they don't want them read-protected now either.
   Unless this is essential for correct mtime/ctime accounting on these
   filesystems (I don't think it really is).  But then the mapping should
   start out read-only as well, otherwise the time update will only work
   after an msync(MS_ASYNC).
  
  page_mkclean() has all the needed logic for this, it also walks the rmap
  and cleans out all other users, which I think is needed too for
  consistency's sake:
  
  Process A                       Process B
  
  mmap(foo.txt)                   mmap(foo.txt)
  
  dirty page
                                  dirty page
  
  msync(MS_ASYNC)
  
                                  dirty page
  
  msync(MS_ASYNC) <--- now what?!

how about:

diff --git a/mm/msync.c b/mm/msync.c
index 144a757..a1b3fc6 100644
--- a/mm/msync.c
+++ b/mm/msync.c
@@ -14,6 +14,122 @@
 #include <linux/syscalls.h>
 #include <linux/sched.h>
 
+unsigned long masync_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
+   unsigned long addr, unsigned long end)
+{
+   pte_t *pte;
+   spinlock_t *ptl;
+
+   pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
+   arch_enter_lazy_mmu_mode();
+   do {
+   pte_t ptent = *pte;
+
+   if (pte_none(ptent))
+   continue;
+
+   if (!pte_present(ptent))
+   continue;
+
+   if (pte_dirty(ptent) && pte_write(ptent)) {
+   flush_cache_page(vma, addr, pte_pfn(ptent));
+   ptent = ptep_clear_flush(vma, addr, pte);
+   ptent = pte_wrprotect(ptent);
+   set_pte_at(vma->vm_mm, addr, pte, ptent);
+   }
+   } while (pte++, addr += PAGE_SIZE, addr != end);
+   arch_leave_lazy_mmu_mode();
+   pte_unmap_unlock(pte - 1, ptl);
+
+   return addr;
+}
+
+unsigned long masync_pmd_range(struct vm_area_struct *vma, pud_t *pud,
+   unsigned long addr, unsigned long end)
+{
+   pmd_t *pmd;
+   unsigned long next;
+
+   pmd = pmd_offset(pud, addr);
+   do {
+   next = pmd_addr_end(addr, end);
+   if (pmd_none_or_clear_bad(pmd))
+   continue;
+   next = masync_pte_range(vma, pmd, addr, next);
+   } while (pmd++, addr = next, addr != end);
+
+   return addr;
+}
+
+unsigned long masync_pud_range(struct vm_area_struct *vma, pgd_t *pgd,
+   unsigned long addr, unsigned long end)
+{
+   pud_t *pud;
+   unsigned long next;
+
+   pud = pud_offset(pgd, addr);
+   do {
+   next = pud_addr_end(addr, end);
+   if (pud_none_or_clear_bad(pud))
+   continue;
+   next = masync_pmd_range(vma, pud, addr, next);
+   } while (pud++, addr = next, addr != end);
+
+   return addr;
+}
+
+unsigned long masync_pgd_range(struct vm_area_struct *vma,
+   unsigned long addr, unsigned long end)
+{
+   pgd_t *pgd;
+   unsigned long next;
+
+   pgd = pgd_offset(vma->vm_mm, addr);
+   do {
+   next = pgd_addr_end(addr, end);
+   if (pgd_none_or_clear_bad(pgd))
+   continue;
+   next = masync_pud_range(vma, pgd, addr, next);
+   } while (pgd++, addr = next, addr != end);
+
+   return addr;
+}
+
+int masync_vma_one(struct vm_area_struct *vma,
+   unsigned long start, unsigned long end)
+{
+   if (start < vma->vm_start)
+   start = vma->vm_start;
+
+   if (end > vma->vm_end)
+   end = vma->vm_end;
+
+   masync_pgd_range(vma, start, end);
+
+   return 0;
+}
+
+int masync_vma(struct vm_area_struct *vma, 
+   unsigned long start, unsigned long end)
+{
+   struct address_space *mapping;
+   struct 

Re: [Bluez-devel] Oops involving RFCOMM and sysfs

2008-01-18 Thread Cornelia Huck
On Fri, 18 Jan 2008 10:19:33 +0100,
Cornelia Huck [EMAIL PROTECTED] wrote:

  
  if (IS_ERR(new_parent_kobj)) {
      error = PTR_ERR(new_parent_kobj);
      put_device(new_parent);
      goto out;
  }
  pr_debug("DEVICE: moving '%s' to '%s'\n", dev->bus_id,
           new_parent ? new_parent->bus_id : "<NULL>");
  error = kobject_move(&dev->kobj, new_parent_kobj);
  if (error) {
      put_device(new_parent);
  
  imagine new_parent is NULL, then the new_parent_kobj should be put
 
 No, we would need a put_device_parent() (crappy name) which puts the
 reference iff get_device_parent() grabbed it.

And looking at Greg's patchset, it has cleanup_device_parent(), which
does just that. But it is only called in device_del(), not when
device_move() has errors.

(get_device_parent() also always returns a pointer to a kobject or
NULL, so we can get rid of those IS_ERR() checks in setup_parent() and
device_move() as well.)
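
The "put the reference iff it was grabbed" idea can be sketched in plain userspace C. This is a hypothetical model under stated assumptions, not driver-core code: struct obj stands in for a kobject, lookup_parent() for get_device_parent(), release_parent() for the proposed cleanup helper, and move_object() for the error path of device_move(). None of these names exist in the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Toy reference-counted object standing in for a kobject. */
struct obj {
	int refs;
};

static struct obj *obj_get(struct obj *o)
{
	if (o)
		o->refs++;
	return o;
}

static void obj_put(struct obj *o)
{
	if (o) {
		assert(o->refs > 0);
		o->refs--;
	}
}

/* Lookup result that remembers whether a reference was taken. */
struct parent_lookup {
	struct obj *kobj;
	int took_ref;
};

/*
 * With an explicit parent, return it without a new reference. Without
 * one, fall back to a shared directory object and take a reference.
 */
static struct parent_lookup lookup_parent(struct obj *parent,
					  struct obj *fallback_dir)
{
	struct parent_lookup r = { parent, 0 };

	if (!parent && fallback_dir) {
		r.kobj = obj_get(fallback_dir);
		r.took_ref = 1;
	}
	return r;
}

/* Drop the reference only if lookup_parent() actually took one. */
static void release_parent(struct parent_lookup *r)
{
	if (r->took_ref)
		obj_put(r->kobj);
	r->kobj = NULL;
	r->took_ref = 0;
}

/* Move with an error path that leaves all reference counts balanced. */
static int move_object(struct obj *new_parent, struct obj *fallback_dir,
		       int simulate_failure)
{
	struct parent_lookup r;

	obj_get(new_parent);
	r = lookup_parent(new_parent, fallback_dir);
	if (simulate_failure) {
		release_parent(&r);
		obj_put(new_parent);
		return -1;
	}
	/* On success the references stay with the moved object. */
	return 0;
}
```

Recording took_ref at lookup time means the error path does not have to guess whether new_parent was NULL, which is the case the thread worries about.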


Re: 2.6.24-rc8-mm1 Kernel oops while running kernbench

2008-01-18 Thread Paul Mackerras
Kamalesh Babulal writes:

 I tried reproducing the problem and was successful with following trace
 in which the pc is at 0x4570 as the above one

What did you do to trigger it?

 c0004544 <unrecov_slb>:
 c0004544:   71 8a 40 00 andi.   r10,r12,16384
 c0004548:   7c 2a 0b 78 mr      r10,r1
 c000454c:   38 21 fd 10 addi    r1,r1,-752
 c0004550:   41 82 00 08 beq-    c0004558 <unrecov_slb+0x14>
 c0004554:   e8 2d 01 a8 ld      r1,424(r13)
 c0004558:   2c a1 00 00 cmpdi   cr1,r1,0
 c000455c:   40 84 00 08 bge-    cr1,c0004564 <unrecov_slb+0x20>
 c0004560:   48 00 00 10 b       c0004570 <unrecov_slb+0x2c>
 c0004564:   38 20 41 00 li      r1,16640
 c0004568:   b0 2d 01 c8 sth     r1,456(r13)
 c000456c:   4b ff fb 18 b       c0004084 <bad_stack>
 c0004570:   f9 21 01 a0 std     r9,416(r1)

So it's in the code that gets called on an unrecoverable SLB fault.
That's bad, we should never get those.  Does this happen with mainline
too, or only with -rc8-mm1?  I don't understand why we should start
seeing this problem unless something has changed in
arch/powerpc/kernel or arch/powerpc/mm (well I suppose a bug somewhere
else could cause memory corruption which might be able to lead to
this).

Does it still happen if you take git-powerpc.patch out of the series?

Paul.


Re: [patch] Converting writeback linked lists to a tree based data structure

2008-01-18 Thread Fengguang Wu
On Thu, Jan 17, 2008 at 10:43:15PM -0800, Michael Rubin wrote:
 On Jan 17, 2008 8:56 PM, Fengguang Wu [EMAIL PROTECTED] wrote:
  On Thu, Jan 17, 2008 at 01:07:05PM -0800, Michael Rubin wrote:
  Suppose we want to grant longer expiration window for temp files,
  adding a new list named s_dirty_tmpfile would be a handy solution.
 
 When you mean tmp do you mean files that eventually get written to

Yes, they are disk based and can be synced on.

 disk? If not I would just use the WRITEBACK_NEVER. If so I am not sure
 if that feature is worth making a special case. It seems like the
 location based ideas may be more useful.

I'm not interested in WRITEBACK_NEVER or location based writeback
for now :-)

- refill s_io iff it is drained
  this prevents promotion of big/old files
  
   Once a big file gets its first do_writepages it is moved behind the
   other smaller files via i_flushed_when. And the same in reverse for
   big vs old.
 
  You mean i_flush_gen?
 
 Yeah sorry. It was once called i_flush_when. (sheepish)
 
  No, sync_sb_inodes() will abort on every
  MAX_WRITEBACK_PAGES, and s_flush_gen will be updated accordingly.
  Hence the sync will restart from big/old files.
 
 If I understand you correctly I am not sure I agree. Here is what I
 think happens in the patch:
 
 1) pull big inode off of flush tree
 2) sync big inode
 3) Hit MAX_WRITEBACK_PAGES
 4) Re-insert big inode (without modifying the dirtied_when)
 5) update the i_flush_gen on big inode and re-insert behind small
 inodes we have not synced yet.
 
 In a subsequent sync_sb_inode we end up retrieving the small inode we
 had not serviced yet.

Yes, exactly. And then it will continue to sync the big one again.
It will never be able to move forward to the next dirtied_when before
exhausting the inodes in the current list(with the oldest dirtied_when).

- return from sync_sb_inodes() after one go of s_io
  
   I am not sure how this limit helps things out. Is this for superblock
   starvation? Can you elaborate?
 
  We should have a way to go to next superblock even if new dirty inodes
  or pages are emerging fast in this superblock. Fill and drain s_io
  only once and then abort helps.
 
 Got it.
 
  s_io is a stable and bounded working set in one go of superblock.
 
 Is this necessary with MAX_WRITEBACK_PAGES? It feels like a double limit.

We need a limit and continuing scheme at each level. It was so hard to
sort them out, that I'm really reluctant to restart all the fuss again.

  Basically you make one list_head in each rbtree node.
  That list_head is recycled cyclic, and is an analog to the old
  fashioned s_dirty. We need to know 'where we are' and 'where it ends'.
  So an extra indicator must be introduced - i_flush_gen. It's awkward.
  We are simply repeating the aged list_heads' problem.
 
 To me they both feel a little awkward. I feel like the original
 problem in 2.6.23 led to a lot of examination which is bringing new
 possibilities to light.
 
 BTW the issue that started me on this whole path (starving large
 files) was still present in 2.6.23-rc8 but now looks fixed in
 2.6.24-rc3.
 Still no idea about your changes in 2.6.24-rc6-mm1. I have given up
 trying to get that thing to boot.

Hehe, I guess the bug is still there in 2.6.24-rc3. But should be gone
in the latest patchset.



Re: [PATCH -v6 2/2] Updating ctime and mtime for memory-mapped files

2008-01-18 Thread Peter Zijlstra

On Fri, 2008-01-18 at 10:51 +0100, Miklos Szeredi wrote:

  diff --git a/mm/msync.c b/mm/msync.c
  index a4de868..a49af28 100644
  --- a/mm/msync.c
  +++ b/mm/msync.c
  @@ -13,11 +13,33 @@
   #include <linux/syscalls.h>
   
   /*
  + * Scan the PTEs for pages belonging to the VMA and mark them read-only.
  + * It will force a pagefault on the next write access.
  + */
  +static void vma_wrprotect(struct vm_area_struct *vma)
  +{
  +   unsigned long addr;
  +
  +   for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
  +   spinlock_t *ptl;
  +   pgd_t *pgd = pgd_offset(vma->vm_mm, addr);
  +   pud_t *pud = pud_offset(pgd, addr);
  +   pmd_t *pmd = pmd_offset(pud, addr);
  +   pte_t *pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
  +
  +   if (pte_dirty(*pte) && pte_write(*pte))
  +   *pte = pte_wrprotect(*pte);
  +   pte_unmap_unlock(pte, ptl);
  +   }
  +}
 
 What about ram based filesystems?  They don't start out with read-only
 pte's, so I think they don't want them read-protected now either.
 Unless this is essential for correct mtime/ctime accounting on these
 filesystems (I don't think it really is).  But then the mapping should
 start out read-only as well, otherwise the time update will only work
 after an msync(MS_ASYNC).

page_mkclean() has all the needed logic for this, it also walks the rmap
and cleans out all other users, which I think is needed too for
consistency's sake:

Process A                       Process B

mmap(foo.txt)                   mmap(foo.txt)

dirty page
                                dirty page

msync(MS_ASYNC)

                                dirty page

msync(MS_ASYNC) <--- now what?!


So what I would suggest is using the page table walkers from mm to
walk the page range, obtain the page using vm_normal_page() and call
page_mkclean(). (Oh, and ensure you don't nest the pte lock :-)

All in all, that sounds rather expensive..
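
For reference, the range-stepping idiom that such page table walkers rely on (next = xxx_addr_end(addr, end) inside a do/while) can be sketched in userspace. This is only an illustration under assumptions: two levels instead of four, a 4 KiB page and a 2 MiB block per lower-level table, and the names block_addr_end(), walk_pages() and walk_range() are made up here. The callers are expected to pass page-aligned addresses with addr < end.

```c
#include <assert.h>

#define PAGE_SHIFT	12			/* assumption: 4 KiB pages */
#define PAGE_SIZE	(1UL << PAGE_SHIFT)
#define BLOCK_SHIFT	21			/* assumption: 2 MiB per table */
#define BLOCK_SIZE	(1UL << BLOCK_SHIFT)
#define BLOCK_MASK	(~(BLOCK_SIZE - 1))

/*
 * End of the block containing addr, clamped to end. The "- 1" on both
 * sides keeps the comparison correct if the boundary wraps to 0.
 */
static unsigned long block_addr_end(unsigned long addr, unsigned long end)
{
	unsigned long boundary = (addr + BLOCK_SIZE) & BLOCK_MASK;

	return (boundary - 1 < end - 1) ? boundary : end;
}

/* Lowest level: visit each page in [addr, end); returns pages visited. */
static unsigned long walk_pages(unsigned long addr, unsigned long end)
{
	unsigned long pages = 0;

	assert(addr < end && !(addr & (PAGE_SIZE - 1)));
	do {
		pages++;
	} while (addr += PAGE_SIZE, addr != end);
	return pages;
}

/*
 * Upper level: split [addr, end) at block boundaries and hand each
 * piece to walk_pages(). Stores the number of pieces in *blocks.
 */
static unsigned long walk_range(unsigned long addr, unsigned long end,
				unsigned long *blocks)
{
	unsigned long next, pages = 0;

	assert(addr < end);
	*blocks = 0;
	do {
		next = block_addr_end(addr, end);
		pages += walk_pages(addr, next);
		(*blocks)++;
	} while (addr = next, addr != end);
	return pages;
}
```

Because the loop condition does the advancing, a 'continue' in the body of such a do/while still moves on to the next entry, which is what the skipped-entry checks in the walkers depend on.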





Re: Why is the kfree() argument const?

2008-01-18 Thread Jakob Oestergaard
On Thu, Jan 17, 2008 at 01:25:39PM -0800, Linus Torvalds wrote:
...
 Why do you make that mistake, when it is PROVABLY NOT TRUE!
 
 Try this trivial program:
 
   int main(int argc, char **argv)
   {
   int i;
   const int *c;
   
   i = 5;
   c = &i;
   i = 10;
   return *c;
   }
 
 and realize that according to the C rules, if it returns anything but 10, 
 the compiler is *buggy*.

That's not how this works (as we obviously agree).

Please consider a rewrite of your example, demonstrating the usefulness and
proper application of const pointers:

extern void foo(const int *);

int main(int argc, char **argv)
{
 int i;

 i = 5;
 foo(&i);
 return i;
}

Now, if the program returns anything other than 5, it means someone cast away
const, which is generally considered a bad idea in most other software
projects, for this very reason.

*That* is the purpose of const pointers.
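
Both points in this exchange can be put side by side in a small sketch: a const-qualified pointer is a read-only view of an object that may still change, and a callee taking such a pointer is expected to leave the caller's value alone. The function names below are made up for illustration, and peek() is just an assumed well-behaved callee.

```c
#include <assert.h>
#include <stddef.h>

/* A pointer to const is a read-only view, not a promise of immutability. */
static int read_through_const_view(void)
{
	int i = 5;
	const int *c = &i;

	i = 10;		/* legal: 'i' itself is not const */
	return *c;	/* reads the current value of 'i' */
}

/* Hypothetical callee that honours its const parameter. */
static int peek(const int *p)
{
	assert(p != NULL);
	return *p + 1;
}

/* The caller's variable is unchanged after passing it as const int *. */
static int caller_value_after_call(void)
{
	int i = 5;

	(void)peek(&i);
	return i;
}
```

The first function returns 10 and the last returns 5, the latter holding only as long as the callee does not cast const away.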

Besides, for most debugging-enabled free() implementations, free() does indeed
touch the memory pointed to by its argument, which makes giving it a const
pointer completely bogus except for a single potential optimized special-case
where it might actually not touch the memory.

-- 

 / jakob



Re: [PATCH 2/2] x86: MMCONF enable MCFG early

2008-01-18 Thread Ingo Molnar

* Ingo Molnar [EMAIL PROTECTED] wrote:

  Ingo,
  it seems you removed
   patch
 x86: validate against ACPI motherboard resources
  last night. 
  
  was it dropped?
 
 yeah, i bounced it over to Greg - but Greg has not yet indicated 
 whether he has picked it up. Andrew has it in -rc8-mm1 at the moment.

so these two patches of yours:

 Subject: [PATCH 1/2] x86: clear pci_mmcfg_virt when mmcfg get rejected
 Subject: [PATCH 2/2] x86: MMCONF enable MCFG early

should probably go to Andrew/Greg as well.

Ingo


Re: [PATCH 2/2] x86: MMCONF enable MCFG early

2008-01-18 Thread Ingo Molnar

* Yinghai Lu [EMAIL PROTECTED] wrote:

 Ingo,
 it seems you removed
  patch
  x86: validate against ACPI motherboard resources
 last night. 
 
 was it dropped?

yeah, i bounced it over to Greg - but Greg has not yet indicated 
whether he has picked it up. Andrew has it in -rc8-mm1 at the moment.

Ingo


Re: [PATCH] [0/36] Great change_page_attr patch series v3

2008-01-18 Thread Ingo Molnar

* Andi Kleen [EMAIL PROTECTED] wrote:

 Changes to previous versions: 
 - Ported to the latest git-x86 including the PAT patchkit
 This undoes some changes in the PAT patches and reimplements them
 in a different way. End result should be equivalent, but this
 made it easier for me to merge the patches.
 - Fix NX bit handling (I think even after Jeremy's fixes it was
 still not completely right) 
 - Minor fixes based on feedback

thanks Andi for porting your CPA queue on top of PAT. Now that PAT 
support is getting into shape i've test-merged your CPA series to 
x86.git.

v2.6.25 merging of CPA is still somewhat in limbo but worst-case i think 
we can still get away with just doing wbinvd instead of clflush and get 
rid of most of the risks that way. Could you please add a boot option 
and Kconfig option that does that? Something like noclflush and a 
.config option to achieve the same - just like we do for PAT.

We've got way too much stuff going on at the moment - and the PAT bits 
are more fundamental and more important than nice but non-essential 
optimizations like CPA. There's still a lot of cruft all around this 
area.

One thing, you undid a cleanup patch:

|  Subject: CPA: Undo white space changes
|  From: Andi Kleen <[EMAIL PROTECTED]>
|
|  Undo random white space changes. This reverts
|  ddb53b5735793a19dc17bcd98b050f672

this is perfectly fine as we do not want to make your merging harder via 
cleanups, as long as you redo the cleanups after your series. Your new 
code is pretty ugly to look at, and this very much shows in the 
checkpatch metrics too:

                             errors   lines of code   errors/KLOC
 arch/x86/mm/pageattr_32.c       29             419          69.2
 arch/x86/mm/pageattr_64.c       31             384          80.7

prior to the undo it was:

                             errors   lines of code   errors/KLOC
 arch/x86/mm/pageattr_32.c        0             294             0
 arch/x86/mm/pageattr_64.c        0             275             0
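
For reference, the errors/KLOC column in the tables above is the checkpatch error count scaled to a thousand lines of code. A minimal sketch of that arithmetic in C follows; the helper name errors_per_kloc() is made up for illustration and is not part of checkpatch.

```c
#include <assert.h>

/*
 * Error density in errors per thousand lines of code (KLOC).
 * Hypothetical helper, only meant to show the arithmetic.
 */
double errors_per_kloc(unsigned int errors, unsigned int lines)
{
	assert(lines > 0);	/* a density over zero lines is undefined */
	return 1000.0 * errors / lines;
}
```

Feeding it the pageattr_32.c row (29 errors, 419 lines) and the pageattr_64.c row (31 errors, 384 lines) gives values that round to the 69.2 and 80.7 quoted above.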

please restore that cleanliness state. Thanks,

Ingo


Re: [PATCH -v6 2/2] Updating ctime and mtime for memory-mapped files

2008-01-18 Thread Miklos Szeredi
> Updating file times at write references to memory-mapped files and
> forcing file times update at the next write reference after
> calling the msync() system call with the MS_ASYNC flag.
>
> Signed-off-by: Anton Salikhmetov <[EMAIL PROTECTED]>
> ---
>  mm/memory.c |    6 ++
>  mm/msync.c  |   52 +++-
>  2 files changed, 45 insertions(+), 13 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 4bf0b6d..13d5bbf 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1668,6 +1668,9 @@ gotten:
>  unlock:
>  	pte_unmap_unlock(page_table, ptl);
>  	if (dirty_page) {
> +		if (vma->vm_file)
> +			file_update_time(vma->vm_file);
> +
>  		/*
>  		 * Yes, Virginia, this is actually required to prevent a race
>  		 * with clear_page_dirty_for_io() from clearing the page dirty
> @@ -2341,6 +2344,9 @@ out_unlocked:
>  	if (anon)
>  		page_cache_release(vmf.page);
>  	else if (dirty_page) {
> +		if (vma->vm_file)
> +			file_update_time(vma->vm_file);
> +
>  		set_page_dirty_balance(dirty_page, page_mkwrite);
>  		put_page(dirty_page);
>  	}
> diff --git a/mm/msync.c b/mm/msync.c
> index a4de868..a49af28 100644
> --- a/mm/msync.c
> +++ b/mm/msync.c
> @@ -13,11 +13,33 @@
>  #include <linux/syscalls.h>
>
>  /*
> + * Scan the PTEs for pages belonging to the VMA and mark them read-only.
> + * It will force a pagefault on the next write access.
> + */
> +static void vma_wrprotect(struct vm_area_struct *vma)
> +{
> +	unsigned long addr;
> +
> +	for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
> +		spinlock_t *ptl;
> +		pgd_t *pgd = pgd_offset(vma->vm_mm, addr);
> +		pud_t *pud = pud_offset(pgd, addr);
> +		pmd_t *pmd = pmd_offset(pud, addr);
> +		pte_t *pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
> +
> +		if (pte_dirty(*pte) && pte_write(*pte))
> +			*pte = pte_wrprotect(*pte);
> +		pte_unmap_unlock(pte, ptl);
> +	}
> +}

What about ram based filesystems?  They don't start out with read-only
pte's, so I think they don't want them read-protected now either.
Unless this is essential for correct mtime/ctime accounting on these
filesystems (I don't think it really is).  But then the mapping should
start out read-only as well, otherwise the time update will only work
after an msync(MS_ASYNC).
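
A rough userspace sketch of how the behaviour under discussion could be observed: write through a shared mapping, call msync(MS_ASYNC), and compare st_mtim before and after. The function name and the scratch path are assumptions made for this illustration, not part of the patch, and the result depends on the running kernel and on the filesystem behind /tmp (on many systems that is tmpfs, i.e. the RAM-based case raised above), so no particular output is implied.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/*
 * Write one byte through a MAP_SHARED mapping of a temporary file, call
 * msync(MS_ASYNC), and report whether the file's mtime moved.
 * Returns 1 if mtime changed, 0 if it did not, -1 on any error.
 * Hypothetical helper for illustration only.
 */
int mtime_moves_after_mmap_write(void)
{
	char path[] = "/tmp/mtime-demo-XXXXXX";	/* assumed scratch location */
	struct timespec delay = { 0, 20 * 1000 * 1000 };	/* 20 ms */
	struct stat before, after;
	long pagesz = sysconf(_SC_PAGESIZE);
	char *map;
	int fd, ret = -1;

	assert(pagesz > 0);

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);	/* the file stays alive until fd is closed */

	if (ftruncate(fd, pagesz) < 0 || fstat(fd, &before) < 0)
		goto out_close;

	map = mmap(NULL, (size_t)pagesz, PROT_READ | PROT_WRITE,
		   MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		goto out_close;

	nanosleep(&delay, NULL);	/* step past coarse timestamp granularity */
	map[0] = 'x';			/* the write reference in question */

	if (msync(map, (size_t)pagesz, MS_ASYNC) == 0 && fstat(fd, &after) == 0)
		ret = before.st_mtim.tv_sec != after.st_mtim.tv_sec ||
		      before.st_mtim.tv_nsec != after.st_mtim.tv_nsec;

	munmap(map, (size_t)pagesz);
out_close:
	close(fd);
	return ret;
}
```

Calling it a second time against a file on a disk-backed filesystem (by changing the scratch path) would be one way to compare the two cases.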

> +
> +/*
>   * MS_SYNC syncs the entire file - including mappings.
>   *
> - * MS_ASYNC does not start I/O (it used to, up to 2.5.67).
> - * Nor does it mark the relevant pages dirty (it used to up to 2.6.17).
> - * Now it doesn't do anything, since dirty pages are properly tracked.
> + * MS_ASYNC does not start I/O. Instead, it marks the relevant pages
> + * read-only by calling vma_wrprotect(). This is needed to catch the next
> + * write reference to the mapped region and update the file times
> + * accordingly.
>   *
>   * The application may now run fsync() to write out the dirty pages and
>   * wait on the writeout and check the result. Or the application may run
> @@ -77,16 +99,20 @@ asmlinkage long sys_msync(unsigned long start, size_t len, int flags)
>  		error = 0;
>  		start = vma->vm_end;
>  		file = vma->vm_file;
> -		if (file && (vma->vm_flags & VM_SHARED) && (flags & MS_SYNC)) {
> -			get_file(file);
> -			up_read(&mm->mmap_sem);
> -			error = do_fsync(file, 0);
> -			fput(file);
> -			if (error || start >= end)
> -				goto out;
> -			down_read(&mm->mmap_sem);
> -			vma = find_vma(mm, start);
> -			continue;
> +		if (file && (vma->vm_flags & VM_SHARED)) {
> +			if (flags & MS_ASYNC)
> +				vma_wrprotect(vma);
> +			if (flags & MS_SYNC) {
> +				get_file(file);
> +				up_read(&mm->mmap_sem);
> +				error = do_fsync(file, 0);
> +				fput(file);
> +				if (error || start >= end)
> +					goto out;
> +				down_read(&mm->mmap_sem);
> +				vma = find_vma(mm, start);
> +				continue;
> +			}
>  		}
>
>  		vma = vma->vm_next;
> --
> 1.4.4.4
 
 


Re: 2.6.24-rc8-mm1 Kernel oops will running kernbench

2008-01-18 Thread Paul Mackerras
Kamalesh Babulal writes:

  NIP: 4570 LR: 0fc42dc0 CTR: 
  REGS: c0077b6bf8c0 TRAP: 0300   Not tainted  (2.6.24-rc8-mm1-autotest)
  MSR: 80001000 ME  CR: 28022422  XER: 
  DAR: c0077b6bfce0, DSISR: 0a00

Actually, how much RAM does this machine have?  If it has less than
32GB, then the problem is that the kernel stack pointer is bogus.
(How it got to be bogus is the interesting question, of course. :)
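
One way to read the RAM-size question: a kernel linear-map address minus the start of the linear mapping is a physical address, and that only makes sense if it falls below the amount of installed memory. The sketch below assumes a 64-bit powerpc linear mapping starting at 0xc000000000000000 and RAM that is contiguous from physical address 0. The function name is hypothetical, and since the register values in this archive look truncated, any address passed to it here is made up rather than taken from the oops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: start of the 64-bit powerpc kernel linear mapping. */
#define PPC64_PAGE_OFFSET 0xc000000000000000ULL

/*
 * Return true if a linear-map kernel address refers to memory that a
 * machine with ram_bytes of RAM (assumed contiguous from physical 0)
 * actually has. Hypothetical helper for illustration only.
 */
bool linear_addr_backed_by_ram(uint64_t vaddr, uint64_t ram_bytes)
{
	assert(vaddr >= PPC64_PAGE_OFFSET);	/* only linear-map addresses */
	return vaddr - PPC64_PAGE_OFFSET < ram_bytes;
}
```

With a made-up address 30GB into the linear mapping, the check fails on a 16GB machine and passes on a 32GB one, which is the kind of distinction the question above is probing.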

Paul.



Re: [RFC][PATCH 4/5] memory_pressure_notify() caller

2008-01-18 Thread KOSAKI Motohiro
Hi!

> > 1. I doubt ZONE_DMA, please shipment ignore zone_dma patch(below).
>
> Your patch above solves the problem I had with early notification.

really!?
I am really happy!!

Thank you.


- kosaki




Re: 2.6.24-rc8-mm1

2008-01-18 Thread Balbir Singh
* Andrew Morton <[EMAIL PROTECTED]> [2008-01-17 10:40:21]:

> On Thu, 17 Jan 2008 18:16:22 +0530 Balbir Singh <[EMAIL PROTECTED]> wrote:
>
> > * Andrew Morton <[EMAIL PROTECTED]> [2008-01-17 02:35:14]:
> >
> > >
> > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc8/2.6.24-rc8-mm1/
> > >
> > > - selinux is busted on one of my two selinux-enabled test machines.
> > >
> > > - suspend-to-ram and suspend-to-disk are totally hosed on one of my test
> > >   machines.  I guess I get to bisect this.
> > >
> > > - git-nfsd is dropped due to conflicts with git-nfs
> > >
> > > - git-newsetup is dropped due to conflicts with git-x86 (I think)
> > >
> > > - git-perfmon is dropped due to conflicts with git-x86 (I think)
> > >
> > > - git-kgdb is dropped due to conflicts with git-damn-near-everything
> > >
> > > - git-block is dropped due to conflicts with the IDE tree
> > >
> > > - kvm probably doesn't work properly because I couldn't be bothered fixing
> > >   the conflicts between git-kvm and the driver tree
> > >
> > > - the volume of rejects and build errors which are caused by subsystem
> > >   maintainers fiddling with other people's stuff is quite out of control.
> > >   Something needs to happen here.
> >
> > Hi, Andrew,
> >
> > May be it was one of the conflicts, but my system fails to get
> > ethernet working with this version. I see
> >
> > e100: Intel(R) PRO/100 Network Driver, 3.5.23-k4-NAPI
> > e100: Copyright(c) 1999-2006 Intel Corporation
> > ACPI: PCI Interrupt :04:08.0[A] -> GSI 20 (level, low) -> IRQ 20
> > modprobe:2584 conflicting cache attribute 5000-50001000 uncached<->default
> > e100: :04:08.0: e100_probe: Cannot map device registers, aborting.
> > ACPI: PCI interrupt for device :04:08.0 disabled
> > e100: probe of :04:08.0 failed with error -12
> >
> > Other interesting boot information
> >
> > Using ACPI (MADT) for SMP configuration information
> > PM: Registered nosave memory: 0008f000 - 000a
> > PM: Registered nosave memory: 000a - 000e
> > PM: Registered nosave memory: 000e - 0010
> > PM: Registered nosave memory: 3e5d1000 - 3e6e5000
> > PM: Registered nosave memory: 3f574000 - 3f57c000
> > PM: Registered nosave memory: 3f62d000 - 3f631000
> > PM: Registered nosave memory: 3f6a7000 - 3f6e9000
> > PM: Registered nosave memory: 3f6ed000 - 3f6ff000
> > Allocating PCI resources starting at 5000 (gap: 4000:bff8)
> >
> > PCI: Bridge: :00:1c.0
> >   IO window: disabled.
> >   MEM window: 0x5030-0x503f
> >   PREFETCH window: disabled.
> > PCI: Bridge: :00:1c.2
> >   IO window: disabled.
> >   MEM window: 0x5040-0x504f
> >   PREFETCH window: disabled.
> > PCI: Bridge: :00:1c.3
> >   IO window: disabled.
> >   MEM window: 0x5050-0x505f
> >   PREFETCH window: disabled.
> > PCI: Bridge: :00:1e.0
> >   IO window: 1000-1fff
> >   MEM window: 0x5000-0x500f
> >   PREFETCH window: disabled.
> >
> > I am yet to get down to the root cause, thought I'd report it first to
> > the x86 and ACPI list to see if someone has seen the problem before.
> >
>
> It appears that the new PAT code didn't like e100's pci_iomap().  Venki, can
> you take a look please?


I tried booting with nopat with no effect. 

-- 
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL