Re: [PATCH 09/20] drivers/s390/: use LIST_HEAD instead of LIST_HEAD_INIT
On Thu, Dec 06, 2007 at 11:19:41PM +0800, Denis Cheng wrote:
> A single list_head variable initialized with LIST_HEAD_INIT can almost
> always be replaced with a LIST_HEAD declaration; this shrinks the code
> and looks better.
>
> Signed-off-by: Denis Cheng <[EMAIL PROTECTED]>
> ---
> drivers/s390/block/dcssblk.c |2 +-
> drivers/s390/char/raw3270.c |4 ++--
> drivers/s390/char/tape_core.c |2 +-
> drivers/s390/net/netiucv.c|3 +--
> drivers/s390/net/smsgiucv.c |2 +-
> 5 files changed, 6 insertions(+), 7 deletions(-)

Thanks, applied. I added the possible change in arch/s390/mm/extmem.c
to your patch.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
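For readers who have not met the two macros, the following is a minimal userspace sketch of what the patch series changes. The macro bodies mirror the definitions in the kernel's include/linux/list.h as far as I know them; the variable names `old_style` and `new_style` and the helper `check_heads()` are made up for illustration and do not come from the patches.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace copy of the kernel's doubly linked list head, for illustration. */
struct list_head {
	struct list_head *next, *prev;
};

/* Initializer only: an empty list points at itself in both directions. */
#define LIST_HEAD_INIT(name) { &(name), &(name) }

/* Declaration and initialization in one step. */
#define LIST_HEAD(name) \
	struct list_head name = LIST_HEAD_INIT(name)

/* Before the series: explicit type plus initializer (hypothetical name). */
static struct list_head old_style = LIST_HEAD_INIT(old_style);

/* After the series: the same object, declared with the shorter macro. */
static LIST_HEAD(new_style);

/* A list is empty when its head points back at itself. */
static int list_empty(const struct list_head *head)
{
	return head->next == head;
}

/* Both spellings must produce an empty, self-referencing head. */
static void check_heads(void)
{
	assert(list_empty(&old_style));
	assert(list_empty(&new_style));
}
```

Both declarations expand to the same initializer, which is why the replacement is mechanical whenever a single list_head variable is being defined.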
Re: Add LZO compression support to cryptoapi
On Wed, Dec 05, 2007 at 12:24:16PM +0100, Zoltan Sogor wrote:
>
> I've modified the patch as you suggested and added another patch which adds
> a common compression test function (modifies deflate test case to use the
> common function).

Both applied to cryptodev-2.6. Thanks a lot Zoltan!
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH 17/20] net/xfrm/xfrm_state.c: use LIST_HEAD instead of LIST_HEAD_INIT
From: Denis Cheng <[EMAIL PROTECTED]>
Date: Fri, 7 Dec 2007 00:09:43 +0800

> A single list_head variable initialized with LIST_HEAD_INIT can almost
> always be replaced with a LIST_HEAD declaration; this shrinks the code
> and looks better.
>
> Signed-off-by: Denis Cheng <[EMAIL PROTECTED]>

Applied.
Re: [PATCH 16/20] net/x25/: use LIST_HEAD instead of LIST_HEAD_INIT
From: Denis Cheng <[EMAIL PROTECTED]>
Date: Fri, 7 Dec 2007 00:07:19 +0800

> A single list_head variable initialized with LIST_HEAD_INIT can almost
> always be replaced with a LIST_HEAD declaration; this shrinks the code
> and looks better.
>
> Signed-off-by: Denis Cheng <[EMAIL PROTECTED]>

Applied.
Re: [PATCH 14/20] net/ipv4/cipso_ipv4.c: use LIST_HEAD instead of LIST_HEAD_INIT
From: Denis Cheng <[EMAIL PROTECTED]>
Date: Fri, 7 Dec 2007 00:04:36 +0800

> A single list_head variable initialized with LIST_HEAD_INIT can almost
> always be replaced with a LIST_HEAD declaration; this shrinks the code
> and looks better.
>
> Signed-off-by: Denis Cheng <[EMAIL PROTECTED]>

Applied.
Re: Kernel 2.6.23.9 + mdadm 2.6.2-2 + Auto rebuild RAID1?
On Dec 7 2007 07:30, Nix wrote:
>On 6 Dec 2007, Jan Engelhardt verbalised:
>> On Dec 5 2007 19:29, Nix wrote:
On Dec 1 2007 06:19, Justin Piszcz wrote:
> RAID1, 0.90.03 superblocks (in order to be compatible with LILO, if
> you use 1.x superblocks with LILO you can't boot)
Says who? (Don't use LILO ;-)
>>>
>>>Well, your kernels must be on a 0.90-superblocked RAID-0 or RAID-1
>>>device. It can't handle booting off 1.x superblocks nor RAID-[56]
>>>(not that I could really hope for the latter).
>>
>> If the superblock is at the end (which is the case for 0.90 and 1.0),
>> then the offsets for a specific block on /dev/mdX match the ones for
>> /dev/sda, so it should be "easy" to use lilo on 1.0 too, no?
>
>Sure, but you may have to hack /sbin/lilo to convince it to create the
>superblock there at all. It's likely to recognise that this is an md
>device without a v0.90 superblock and refuse to continue. (But I haven't
>tested it.)
>
In that case, see above - move to a different bootloader.
Re: [BUG] 2.6.23-rc3 can't see sd partitions on Alpha
* Andrew Morton <[EMAIL PROTECTED]> wrote:

> > # bad: [6f37ac793d6ba7b35d338f791974166f67fdd9ba] Merge branch 'master' of
> > master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
> > git-bisect bad 6f37ac793d6ba7b35d338f791974166f67fdd9ba
> > # good: [2f1f53bdc6531696934f6ee7bbdfa2ab4f4f62a3] CRISv10 fasttimer: Scrap
> > INLINE and name timeval_cmp better
> > git-bisect good 2f1f53bdc6531696934f6ee7bbdfa2ab4f4f62a3
>
> I'm struggling to see how any of those could have broken block device
> mounting on alpha. Are you sure you bisected right?

the bisection log looks healthy so far - with nicely alternating
good/bad bisection points. Barring the possibility that the bug is
non-deterministic, i'd guess the bisection points are OK, at least
judging from their statistical properties.

but ... i went over the diffs too, and i fail to see how they could
affect the bootup path of an Alpha box, which i suspect has no
networking dependency up to the failure point.

	Ingo
Re: sockets affected by IPsec always block (2.6.23)
On Friday, 7 December 2007 04:20, David Miller wrote:
> If IPSEC takes a long time to resolve, and we don't block, the
> connect() can hard fail (we will just keep dropping the outgoing SYN
> packet send attempts, eventually hitting the retry limit) in cases
> where if we did block it would not fail (because we wouldn't send
> the first SYN until IPSEC resolved).

David - I'm aware of this, the discussion is which behaviour is ok.

Let's go back to a real life example. I've already researched that the
squid web proxy has a poll() based main loop doing nonblocking connects,
maybe with multiple threads.

Situation: One user wants to access a web page that needs IPSEC. The SA
takes 30 seconds to come up.

a) Non-blocking connect is respected: SYN packets during the first 30
seconds will be dropped as you said. Connection can be completed on the
next SYN retry (timeout in linux: 3 minutes). During this time, the 500
other users can continue to browse using the proxy.

b) Non-blocking connect is ignored during IPSEC resolving as you advocate
it: Connection for the one user can be completed immediately after IPSEC
comes up. That's the pro. However, until then, the other 500 proxy users
CANNOT ACCESS THE WEB because squid's threads are stuck in connect()s on
sockets they configured not to block. If the IPSEC SA never resolves due
to some network outage, squid will sleep forever or until an admin
configures it so that it doesn't try to connect to the address in
question and restarts it.

Don't you realize how broken this behaviour is? Can you give me ONE
example of an application that works better with b) and why this
outweighs the problems it creates for everybody else?

Even the DNS example you posted in <[EMAIL PROTECTED]> is wrong because
the second server will never be queried if the kernel puts the process
into a coma while the IPSEC SA to the first server cannot be resolved.

Stefan
[GIT PULL] LED bugfix
Linus,

Could you please pull from:

git://git.o-hand.com/linux-rpurdie-leds for-linus

This is an LED trigger locking fix for 2.6.24. This fixes the issues
discussed in bug 9264, the change has been tested in -mm.

Thanks,

Richard

drivers/leds/led-class.c|6 ++---
drivers/leds/led-triggers.c | 49 ++--
include/linux/leds.h|3 +-
3 files changed, 30 insertions(+), 28 deletions(-)

Richard Purdie (1):
      leds: Fix led trigger locking bugs
Re: [patch] ext2: xip check fix
> > I think so. The filemap_xip.c functionality doesn't work for Flash
> > memory yet. Flash memory doesn't have struct pages to back it up with
> > which this stuff depends on.
>
> Struct page is not the major issue. The primary problem is writing to
> the media (and I am not a flash expert at all, just relaying here):
> For some period of time, the flash memory is not usable and thus we
> need to make sure we can nuke the page table entries that we have in
> userland page tables. For that, we need a callback from the device so
> that it can ask to get its references back. Oh, and a put_xip_page
> counterpart to get_xip_page, so that the driver knows when it's safe
> to erase.

Well... That's the biggest/hardest problem, yes. But not the first.
First we got to tackle the easy read only case, which doesn't require
any of that unpleasantness, yet which is used in a bunch of out of tree
hacks.
Re: [PATCH] scheduler: fix x86 regression in native_sched_clock
On Fri, 7 Dec 2007 09:45:59 +0100 Ingo Molnar <[EMAIL PROTECTED]> wrote:

>
> * Stefano Brivio <[EMAIL PROTECTED]> wrote:
>
> > This patch fixes a regression introduced by:
> >
> > commit bb29ab26863c022743143f27956cc0ca362f258c
> > Author: Ingo Molnar <[EMAIL PROTECTED]>
> > Date: Mon Jul 9 18:51:59 2007 +0200
> >
> > This caused the jiffies counter to leap back and forth on cpufreq
> > changes on my x86 box. I'd say that we can't always assume that TSC
> > does "small errors" only, when marked unstable. On cpufreq changes
> > these errors can be huge.
>
> ah, printk_clock() still uses sched_clock(), not jiffies. So it's not
> the jiffies counter that goes back and forth, it's sched_clock() - so
> this is a printk timestamps anomaly, not related to jiffies. I thought
> we have fixed this bug in the printk code already: sched_clock() is a
> 'raw' interface that should not be used directly - the proper interface
> is cpu_clock(cpu). Does the patch below help?
>
> 	Ingo
>
> --->
> Subject: sched: fix CONFIG_PRINT_TIME's reliance on sched_clock()
> From: Ingo Molnar <[EMAIL PROTECTED]>
>
> Stefano Brivio reported weird printk timestamp behavior during
> CPU frequency changes:
>
> http://bugzilla.kernel.org/show_bug.cgi?id=9475
>
> fix CONFIG_PRINT_TIME's reliance on sched_clock() and use cpu_clock()
> instead.
>
> Reported-and-bisected-by: Stefano Brivio <[EMAIL PROTECTED]>
> Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
> ---
> kernel/printk.c |2 +-
> kernel/sched.c |7 ++-
> 2 files changed, 7 insertions(+), 2 deletions(-)
>
> Index: linux/kernel/printk.c
> ===
> --- linux.orig/kernel/printk.c
> +++ linux/kernel/printk.c
> @@ -680,7 +680,7 @@ asmlinkage int vprintk(const char *fmt,
>  						loglev_char = default_message_loglevel
>  							+ '0';
>  				}
> -				t = printk_clock();
> +				t = cpu_clock(printk_cpu);
>  				nanosec_rem = do_div(t, 1000000000);
>  				tlen = sprintf(tbuf,
>  						"<%c>[%5lu.%06lu] ",

A bit risky - it's quite an expansion of code which no longer can call
printk.

You might want to take that WARN_ON out of __update_rq_clock() ;)
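As background to the quoted hunk: the printk timestamp is produced by splitting a nanosecond clock value into whole seconds and a remainder, then printing the remainder in microseconds. A minimal userspace sketch follows, assuming plain 64-bit division in place of the kernel's do_div(); the function name `format_timestamp()` is made up for illustration. Whatever clock supplies the value, a backwards jump in it shows up directly as a backwards jump in the printed stamp, which is the anomaly reported in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Format a nanosecond clock value in the "[  sec.usec] " style used for
 * printk timestamps (hypothetical helper, not the kernel function).
 * Returns the number of characters snprintf() would have written.
 */
static int format_timestamp(char *buf, size_t len, uint64_t t_ns)
{
	unsigned long sec = (unsigned long)(t_ns / NSEC_PER_SEC);
	unsigned long nanosec_rem = (unsigned long)(t_ns % NSEC_PER_SEC);

	assert(buf != NULL && len > 0);
	/* Seconds right-aligned in 5 columns, remainder as 6-digit microseconds. */
	return snprintf(buf, len, "[%5lu.%06lu] ", sec, nanosec_rem / 1000);
}
```

For instance, 12345678901 ns splits into 12 s and 345678901 ns, which prints as 12.345678.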
Re: git guidance
Andreas Ericsson wrote:
> So, to get to the bottom of this, which of the following workflows is it
> you want git to support?
>
> ### WORKFLOW A ###
> edit, edit, edit
> edit, edit, edit
> edit, edit, edit
> Oops I made a mistake and need to hop back to "current - 12".
> edit, edit, edit
> edit, edit, edit
> publish everything, similar to just tarring up your workdir and sending
> out
> ### END WORKFLOW A ###
>
> ### WORKFLOW B ###
> edit, edit, edit
> ok this looks good, I want to save a checkpoint here
> edit, edit, edit
> looks good again. next checkpoint
> edit, edit, edit
> oh crap, back to checkpoint 2
> edit, edit, edit
> ooh, that's better. save a checkpoint and publish those checkpoints
> ### END WORKFLOW B ###

### WORKFLOW C ###
for every save on a gitfs mounted dir, do an implied checkpoint, commit,
or publish (should be adjustable), on its privately created on-the-fly
repository.
### END WORKFLOW C ###

For example:

echo "// last comment on this file" >> /gitfs.mounted/file

should do an implied checkpoint, and make these checkpoints immediately
visible under some checkpoint branch of the gitfs mounted dir.

Note, this way the developer gets version control without even noticing,
and works completely transparent to any kind of application.

Thanks!

--
Al
RE: ptrace API extensions for BTS
>From: Andi Kleen [mailto:[EMAIL PROTECTED]
>Sent: Friday, 7 December 2007 12:18

>> I would like to settle the discussion and find an interface that
>> everybody can agree to, so I can implement that interface and we can
>> move forward with the patch.
>
>The most efficient interface would be zero copy with tracer user process
>supplying memory that is pinned (get_user_pages()) subject to the
>mlock rlimit. Then kernel telling the CPU to directly log into
>that.

That would require users to understand all kinds of BTS formats and to
detect the hardware they are running on in order to interpret the data.

So far, there are two different formats. But one of them is wasting an
entire word of memory per record. I could imagine that this would change
some day.

Other architectures would likely use an entirely different format. Users
who want to support several architectures would benefit from a common
format for this from-to branch information.

>> Regarding 1, we currently provide scheduling timestamps, which are arch
>
>That's actually broken because you don't log the CPU number.
>sched_clock() without the CPU number associated is meaningless
>on systems without synchronized, pstate invariant TSC
>[that is older Intel systems or some larger current systems]

I see. The intention was not to provide exact timestamps, but rather a
relative order of BTS chunks that would allow debuggers to show which
parts were (actually, "might have been" is the best we can say) executed
in parallel, and which parts were definitely executed sequentially.

Without a global time, though, this becomes rather meaningless.

Is there some other metric that would allow me to order BTS chunks for
different threads?

>> Additional architectures may want to (re)use and extend the x86 bts
>> record, or they may want to invent their own format. In the former case,
>
>I think that's actually not a good goal. If the code is so complicated
>that it makes sense sharing then you did something wrong :)

Agreed;-)

Users would benefit if they wanted to support multiple architectures.
They would need to invent such a more general interface; or duplicate
code, which is never a good thing.

regards,
markus.
Re: programs vanish with 2.6.22+
On Fri, 7 Dec 2007, Markus wrote:

> Hi again!
>
> The memtest ran 14 passes (~10h) without an error.
>
> I now have a 2.6.24-rc4 with some debug-options turned on, waiting for
> something to happen... can I just leave it until a window disappears
> or do I need to manually enable something or run some user-space app?!

It depends - different options have it differently. Most simple ones are
just compile-time, so, you don't have to enable them. Look in "help" for
respective debug-options.

Thanks
Guennadi
---
Guennadi Liakhovetski
Re: ptrace API extensions for BTS
On Friday 07 December 2007 10:11:04 Metzger, Markus T wrote:
> Roland, Andi,
>
> I would like to discuss the ptrace user interface for the BTS extension.
> In previous emails,
> Andi suggested a stream-like interface, but is also OK with an
> array-like interface (as far as I understood).
> Roland is dubious about the ptrace API additions.
>
> I would like to settle the discussion and find an interface that
> everybody can agree to, so I can implement that interface and we can
> move forward with the patch.

The most efficient interface would be zero copy with tracer user process
supplying memory that is pinned (get_user_pages()) subject to the
mlock rlimit. Then kernel telling the CPU to directly log into
that. Kernel buffers would be only needed for the per CPU kernel logging.

Then the only information that would need to be passed with system
calls would be wakeup, tail position and perhaps a wrapping counter.

> Regarding 1, we currently provide scheduling timestamps, which are arch

That's actually broken because you don't log the CPU number.
sched_clock() without the CPU number associated is meaningless
on systems without synchronized, pstate invariant TSC
[that is older Intel systems or some larger current systems]

And even if you log the CPU number it is unclear how user space
would make sense of that. It can't generally, even the kernel can't.
Perhaps better to just not supply any time stamps for this.

Even on systems that don't have unsync TSC problem above it can
be tricky to convert the TSC into real time. Right now we don't
report the TSC frequency for once. Usually it tends to be at highest
p state but finding that out is also difficult and unreliable
(rounding errors) and might not always be true in the future.
Anyways could be solved by reporting that separately in /proc/cpuinfo,
but given all the other problems I have my doubts it is really
worth it.

I would suggest dropping the time stamp.

> Additional architectures may want to (re)use and extend the x86 bts
> record, or they may want to invent their own format. In the former case,

I think that's actually not a good goal. If the code is so complicated
that it makes sense sharing then you did something wrong :)

-Andi
Re: [PATCH] scheduler: fix x86 regression in native_sched_clock
Thomas Gleixner <[EMAIL PROTECTED]> writes:
>
> Hmrpf. sched_clock() is used for the time stamp of the printks. We
> need to find some better solution other than killing off the tsc
> access completely.

Doing it properly requires pretty much most of my old sched-clock ff patch.
Complicated and not pretty, but ..

Unfortunately that version still had some jumps on cpufreq, but
they are fixable there.

-Andi
Re: [BUG] 2.6.23-rc3 can't see sd partitions on Alpha
On Thu, 6 Dec 2007 23:07:08 -0600 (CST) [EMAIL PROTECTED] (Bob Tracy) wrote:

> Andrew Morton wrote:
> > commit 6f37ac793d6ba7b35d338f791974166f67fdd9ba
> > Merge: 2f1f53b... d90bf5a...
> > Author: Linus Torvalds <[EMAIL PROTECTED]>
> > Date: Wed Nov 14 18:51:48 2007 -0800
> >
> > Merge branch 'master' of
> > master.kernel.org:/pub/scm/linux/kernel/git/davem/n
> >
> > * 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
> > [NET]: rt_check_expire() can take a long time, add a cond_resched()
> > [ISDN] sc: Really, really fix warning
> > [ISDN] sc: Fix sndpkt to have the correct number of arguments
> > [TCP] FRTO: Clear frto_highmark only after process_frto that uses it
> > [NET]: Remove notifier block from chain when
> > register_netdevice_notifier f
> > [FS_ENET]: Fix module build.
> > [TCP]: Make sure write_queue_from does not begin with NULL ptr
> > [TCP]: Fix size calculation in sk_stream_alloc_pskb
> > [S2IO]: Fixed memory leak when MSI-X vector allocation fails
> > [BONDING]: Fix resource use after free
> > [SYSCTL]: Fix warning for token-ring from sysctl checker
> > [NET] random : secure_tcp_sequence_number should not assume
> > CONFIG_KTIME_S
> > [IWLWIFI]: Not correctly dealing with hotunplug.
> > [TCP] FRTO: Plug potential LOST-bit leak
> > [TCP] FRTO: Limit snd_cwnd if TCP was application limited
> > [E1000]: Fix schedule while atomic when called from mii-tool.
> > [NETX]: Fix build failure added by 2.6.24 statistics cleanup.
> > [EP93xx_ETH]: Build fix after 2.6.24 NAPI changes.
> > [PKT_SCHED]: Check subqueue status before calling hard_start_xmit
> >
> > I'm struggling to see how any of those could have broken block device
> > mounting on alpha. Are you sure you bisected right?
>
> Based on what's in that commit, it *does* appear something went wrong
> with bisection. If the implicated commit is the next one in time
> sequence relative to
>
> # good: [2f1f53bdc6531696934f6ee7bbdfa2ab4f4f62a3] CRISv10 fasttimer: Scrap
> INLINE and name timeval_cmp better
>
> then the test of whether I bisected correctly is as simple as applying
> the commit and seeing if things break, because I'm running on the
> kernel corresponding to 2f1f53bdc6531696934f6ee7bbdfa2ab4f4f62a3 right
> now. Let me give that a try and I'll report back. Worst case, I'll
> have to start over and write off the past four days...

Gad. I trust the second time will be faster.

git-bisect _is_ very error prone. I find one of the problems is that
each step is so far apart in time that you forget what you were doing.
Did I remember to test that iteration? Did I install the right kernel?
etc.

> Sorry about this...

Not appropriate ;) Thanks for helping out.
Re: [PATCH 2.6.24-rc3] Fix /proc/net breakage
On Fri, 07 Dec 2007 04:51:37 +0000 David Woodhouse <[EMAIL PROTECTED]> wrote:

> On Mon, 2007-11-26 at 15:17 -0700, Eric W. Biederman wrote:
> > Well I clearly goofed when I added the initial network namespace support
> > for /proc/net. Currently things work but there are odd details visible
> > to user space, even when we have a single network namespace.
> >
> > Since we do not cache proc_dir_entry dentries at the moment we can
> > just modify ->lookup to return a different directory inode depending
> > on the network namespace of the process looking at /proc/net, replacing
> > the current technique of using a magic and fragile follow_link method.
> >
> > To accomplish that this patch:
> > - introduces a shadow_proc method to allow different dentries to
> >   be returned from proc_lookup.
> > - Removes the old /proc/net follow_link magic
> > - Fixes a weakness in our not caching of proc generic dentries.
> >
> > As shadow_proc uses a task struct to decided which dentry to return we
> > can go back later and fix the proc generic caching without modifying any
> > code that uses the shadow_proc method.
> >
> > Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]>
> > ---
> > fs/proc/generic.c | 12 ++-
> > fs/proc/proc_net.c | 86 +++
> > include/linux/proc_fs.h |3 ++
> > 3 files changed, 19 insertions(+), 82 deletions(-)
>
> (commit 2b1e300a9dfc3196ccddf6f1d74b91b7af55e416)
>
> This seems to have broken the use of /proc/bus/usb as a mountpoint. It
> always appears empty now, whatever's supposed to be mounted there.
>

Yes. Denis and Eric are tossing around competing patches but afaik nobody
is happy with any of them.

Guys, could we get this sorted soonish please?
broken suspend (sched related) [Was: 2.6.24-rc4-mm1]
On 12/05/2007 06:17 AM, Andrew Morton wrote:
>
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc4/2.6.24-rc4-mm1/

git-sched.patch breaks suspend here since -rc3-mm2. More precisely, this one:
softlockup: automatically detect hung TASK_UNINTERRUPTIBLE tasks

2.6.24-rc4-mm1 minus this one works just fine. Otherwise disks stop,
graphics stops and then it hangs not powering down.

Core 2 Duo, SMP kernel, voluntary preempt, 250 HZ, SLUB, 64 bit.

Ideas?
Re: [patch] x86: scale cyc_2_nsec according to CPU frequency
On Fri, 7 Dec 2007 14:55:25 +0100,
Ingo Molnar <[EMAIL PROTECTED]> wrote:

> Firstly, we dont need the 'offset' anymore because cpu_clock() maintains
> offsets itself.

Yes, but a lower quality one. __update_rq_clock tries to compensate
large jumping clocks with a jiffy resolution, while my offset arranges
for a very smooth frequency transition.

I agree with keeping a single offset, but I liked the fact that with my
patch on frequency change, the clock had no jump at all.

> + * ns += offset to avoid sched_clock jumps with cpufreq

I guess this needs to go away if I don't make my point :-(

> + printk("CPU#%d: changed cyc2ns scale from %ld to %ld\n",
> + cpu, prev_scale, *scale);

Pointing it out just to be sure it does not end in the final version ;-)

Thanks for cleaning up my mess ;-)

--
Guillaume
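The idea of an offset that keeps the clock continuous across a frequency change can be sketched in userspace as below. This is only an illustration of the arithmetic, not the code from either version of the patch: the struct `cyc2ns_state`, the functions `cycles_to_ns()` and `set_scale()`, and the explicit `tsc_now` argument are all assumptions made for the example. The fixed-point scale follows the formula ns = cycles * (10^6 / cpu_khz) with a 2^10 shift.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_MSEC       1000000ULL
#define CYC2NS_SCALE_FACTOR 10	/* fixed-point shift: scale is stored times 2^10 */

/* Hypothetical per-CPU conversion state. */
struct cyc2ns_state {
	uint64_t scale;		/* (ns per cycle) << CYC2NS_SCALE_FACTOR */
	int64_t offset;		/* correction that keeps the clock continuous */
};

/* ns = ((cycles * scale) >> 10) + offset */
static uint64_t cycles_to_ns(const struct cyc2ns_state *c, uint64_t cycles)
{
	int64_t ns = (int64_t)((cycles * c->scale) >> CYC2NS_SCALE_FACTOR);

	return (uint64_t)(ns + c->offset);
}

/*
 * Switch to a new CPU frequency. The offset is recomputed so that the
 * reading at tsc_now is identical before and after the switch.
 */
static void set_scale(struct cyc2ns_state *c, uint64_t cpu_khz, uint64_t tsc_now)
{
	uint64_t ns_before;

	assert(cpu_khz != 0);
	ns_before = cycles_to_ns(c, tsc_now);

	c->scale = (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR) / cpu_khz;
	c->offset = 0;
	c->offset = (int64_t)ns_before - (int64_t)cycles_to_ns(c, tsc_now);
}
```

With a 2 GHz CPU the scale is 512, so 2,000,000,000 cycles read as one second; dropping to 1 GHz at that point changes the scale to 1024 and sets the offset to minus one second, so the reading stays at one second and advances at the new rate afterwards.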
Re: RFC: outb 0x80 in inb_p, outb_p harmful on some modern AMD64 with MCP51 laptops
Andi Kleen wrote:

Changing the delay instruction sequence from the outb to short jumps
might be the safe thing.

I don't think that makes sense to do on anything modern. The trouble is
that the jumps will effectively execute near "infinitely fast" on any
modern CPU compared to the bus. But the delay really needs to be
something that is about IO port speed.

This all presumes that you need any delay at all. From back in the early
days (when I was writing DOS and BIOS code on 80286 class machines) the
/only/ reason this was a problem was using really slow acting,
non-buffered chips compared to the processor clock (8259?).

If you think about it, if there is a sequence such as outb->device,
inb<-device, the only reason for a delay would be that the device failed
to process the out command, /and/ the device had no "done" flag. The
other "slow" problem would be an out->device, out->device at a rate
higher than the device could handle because it had a one-level buffer
that ignored input that came too fast after the previous, but didn't
stall the bus to protect the device. Modern machines just are not
designed that way - a few of the early PC compatibles were.

My machine in question, for example, needs no waiting within CMOS_READs
at all. And I doubt any other chip/device needs waiting that isn't
already provided by the bus. The i/o to port 80 is very, very odd in
this context. Actually, modern machines have potentially more serious
problems with i/o ops to non-existent addresses, which may cause real
bus weirdness.

So that's why I suggested the short-jump answer - it fixes the problem
on the ancient machines, but doesn't do anything on the modern ones,
where there should be no problem.

One patch that makes immediate sense is to use the "virtualization"
hooks for the CMOS_READ/WRITE ops that is there in the 32-bit code to
allow substitution of a workable sequence for the RTC, which is where I
experience the problem on my machine. This doesn't fix any lurking
issues with the _p APIs, since they are not virtualized.

I'd suggest the safest possible route that would fix my machine would be
either an early_quirk, a boot parameter, or both that would then control
the virtualization hook logic. That patch would fix my machine's current
issues, and would not harm any machines that need the 0x80 delay.

But I know it leaves a lurking issue for another day - for all the other
inb_p and outb_p code in the kernel drivers. A grep suggests that they
are used only in somewhat less modern drivers - perhaps for legacy
machines. I don't think any such drivers are used on any of my machines.
Re: [patch] x86: scale cyc_2_nsec according to CPU frequency
* Guillaume Chazarain <[EMAIL PROTECTED]> wrote:

> On Fri, 7 Dec 2007 14:55:25 +0100,
> Ingo Molnar <[EMAIL PROTECTED]> wrote:
>
> > Firstly, we dont need the 'offset' anymore because cpu_clock()
> > maintains offsets itself.
>
> Yes, but a lower quality one. __update_rq_clock tries to compensate
> large jumping clocks with a jiffy resolution, while my offset arranges
> for a very smooth frequency transition.

yes, but that would be easy to fix up via calling
sched_clock_idle_wakeup_event(0) when doing a frequency transition,
without burdening the normal sched_clock() codepath with the offset.
See the attached latest version.

	Ingo

--->
Subject: x86: scale cyc_2_nsec according to CPU frequency
From: "Guillaume Chazarain" <[EMAIL PROTECTED]>

scale the sched_clock() cyc_2_nsec scaling factor according to
CPU frequency changes.

[ [EMAIL PROTECTED]: simplified it and fixed it for SMP. ]

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
Signed-off-by: Thomas Gleixner <[EMAIL PROTECTED]>
---
 arch/x86/kernel/tsc_32.c | 45 +++
 arch/x86/kernel/tsc_64.c | 59 +++
 include/asm-x86/timer.h  | 23 ++
 3 files changed, 106 insertions(+), 21 deletions(-)

Index: linux-x86.q/arch/x86/kernel/tsc_32.c
===
--- linux-x86.q.orig/arch/x86/kernel/tsc_32.c
+++ linux-x86.q/arch/x86/kernel/tsc_32.c
@@ -5,6 +5,7 @@
 #include
 #include
 #include
+#include
 #include
 #include

@@ -78,15 +79,35 @@ EXPORT_SYMBOL_GPL(check_tsc_unstable);
  * cyc2ns_scale is limited to 10^6 * 2^10, which fits in 32 bits.
  * ([EMAIL PROTECTED])
  *
+ * ns += offset to avoid sched_clock jumps with cpufreq
+ *
  * [EMAIL PROTECTED] "math is hard, lets go shopping!"
  */
-unsigned long cyc2ns_scale __read_mostly;

-#define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */
+DEFINE_PER_CPU(unsigned long, cyc2ns);

-static inline void set_cyc2ns_scale(unsigned long cpu_khz)
+static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu)
 {
-	cyc2ns_scale = (1000000 << CYC2NS_SCALE_FACTOR)/cpu_khz;
+	unsigned long flags, prev_scale, *scale;
+	unsigned long long tsc_now, ns_now;
+
+	local_irq_save(flags);
+	sched_clock_idle_sleep_event();
+
+	scale = &per_cpu(cyc2ns, cpu);
+
+	rdtscll(tsc_now);
+	ns_now = __cycles_2_ns(tsc_now);
+
+	prev_scale = *scale;
+	if (cpu_khz)
+		*scale = (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR)/cpu_khz;
+
+	/*
+	 * Start smoothly with the new frequency:
+	 */
+	sched_clock_idle_wakeup_event(0);
+	local_irq_restore(flags);
 }

 /*
@@ -239,7 +260,9 @@ time_cpufreq_notifier(struct notifier_bl
 						ref_freq, freq->new);
 		if (!(freq->flags & CPUFREQ_CONST_LOOPS)) {
 			tsc_khz = cpu_khz;
-			set_cyc2ns_scale(cpu_khz);
+			preempt_disable();
+			set_cyc2ns_scale(cpu_khz, smp_processor_id());
+			preempt_enable();
 			/*
 			 * TSC based sched_clock turns
 			 * to junk w/ cpufreq
@@ -367,6 +390,8 @@ static inline void check_geode_tsc_relia

 void __init tsc_init(void)
 {
+	int cpu;
+
 	if (!cpu_has_tsc || tsc_disable)
 		goto out_no_tsc;

@@ -380,7 +405,15 @@ void __init tsc_init(void)
 				(unsigned long)cpu_khz / 1000,
 				(unsigned long)cpu_khz % 1000);

-	set_cyc2ns_scale(cpu_khz);
+	/*
+	 * Secondary CPUs do not run through tsc_init(), so set up
+	 * all the scale factors for all CPUs, assuming the same
+	 * speed as the bootup CPU. (cpufreq notifiers will fix this
+	 * up if their speed diverges)
+	 */
+	for_each_possible_cpu(cpu)
+		set_cyc2ns_scale(cpu_khz, cpu);
+
 	use_tsc_delay();

 	/* Check and install the TSC clocksource */
Index: linux-x86.q/arch/x86/kernel/tsc_64.c
===
--- linux-x86.q.orig/arch/x86/kernel/tsc_64.c
+++ linux-x86.q/arch/x86/kernel/tsc_64.c
@@ -10,6 +10,7 @@

 #include
 #include
+#include

 static int notsc __initdata = 0;

@@ -18,16 +19,50 @@ EXPORT_SYMBOL(cpu_khz);
 unsigned int tsc_khz;
 EXPORT_SYMBOL(tsc_khz);

-static unsigned int cyc2ns_scale __read_mostly;
+/* Accelerators for sched_clock()
+ * convert from cycles(64bits) => nanoseconds (64bits)
+ *  basic equation:
+ *	ns = cycles / (freq / ns_per_sec)
+ *	ns = cycles * (ns_per_sec / freq)
+ *	ns = cycles * (10^9 / (cpu_khz * 10^3))
+ *	ns = cycles * (10^6 / cpu_khz)
+ *
+ *
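The derivation in the comment above ends at ns = cycles * (10^6 / cpu_khz). The patch avoids a division on every sched_clock() call by precomputing that ratio as a fixed-point scale. What follows is only a user-space sketch of that arithmetic, assuming the 2^10 scale factor from the tsc_32.c hunk. The names cyc2ns_scale_for and cycles_to_ns are illustrative and are not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

#define CYC2NS_SCALE_FACTOR 10 /* 2^10, the same shift the patch uses */

/*
 * Hypothetical helper: nanoseconds per cycle as a fixed-point value
 * with 10 fractional bits, i.e. (10^6 << 10) / cpu_khz.
 */
static uint64_t cyc2ns_scale_for(uint64_t cpu_khz)
{
	assert(cpu_khz != 0); /* the patch likewise skips a zero frequency */
	return (UINT64_C(1000000) << CYC2NS_SCALE_FACTOR) / cpu_khz;
}

/*
 * Hypothetical helper: ns = cycles * scale / 2^10.
 * The product can overflow 64 bits for very large cycle counts;
 * this sketch does not guard against that.
 */
static uint64_t cycles_to_ns(uint64_t cycles, uint64_t scale)
{
	return (cycles * scale) >> CYC2NS_SCALE_FACTOR;
}
```

With a 1 GHz clock (cpu_khz = 1000000) the scale is 1024, so 1000 cycles convert to 1000 ns. Halving the frequency doubles the scale, which is why the scale has to be recomputed per CPU whenever cpufreq changes the speed.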
Re: RFC: outb 0x80 in inb_p, outb_p harmful on some modern AMD64 with MCP51 laptops
> My machine in question, for example, needs no waiting within CMOS_READs
> at all. And I doubt any other chip/device needs waiting that isn't

I don't know about CMOS, but there were definitely some not too ancient
systems (let's say not more than 10 years) who required IO delays in the
floppy driver and the 8253/8259. But on those the jumps are already
far too fast.

-Andi
Re: [PATCH] scheduler: fix x86 regression in native_sched_clock
* Ingo Molnar <[EMAIL PROTECTED]> wrote: > third update. the cpufreq callbacks are not quite OK yet. fourth update - the cpufreq callbacks are back. This is a version that is supposed fix all known aspects of TSC and frequency-change weirdnesses. Ingo Index: linux/arch/arm/kernel/time.c === --- linux.orig/arch/arm/kernel/time.c +++ linux/arch/arm/kernel/time.c @@ -79,17 +79,6 @@ static unsigned long dummy_gettimeoffset } #endif -/* - * An implementation of printk_clock() independent from - * sched_clock(). This avoids non-bootable kernels when - * printk_clock is enabled. - */ -unsigned long long printk_clock(void) -{ - return (unsigned long long)(jiffies - INITIAL_JIFFIES) * - (10 / HZ); -} - static unsigned long next_rtc_update; /* Index: linux/arch/ia64/kernel/time.c === --- linux.orig/arch/ia64/kernel/time.c +++ linux/arch/ia64/kernel/time.c @@ -344,33 +344,6 @@ udelay (unsigned long usecs) } EXPORT_SYMBOL(udelay); -static unsigned long long ia64_itc_printk_clock(void) -{ - if (ia64_get_kr(IA64_KR_PER_CPU_DATA)) - return sched_clock(); - return 0; -} - -static unsigned long long ia64_default_printk_clock(void) -{ - return (unsigned long long)(jiffies_64 - INITIAL_JIFFIES) * - (10/HZ); -} - -unsigned long long (*ia64_printk_clock)(void) = &ia64_default_printk_clock; - -unsigned long long printk_clock(void) -{ - return ia64_printk_clock(); -} - -void __init -ia64_setup_printk_clock(void) -{ - if (!(sal_platform_features & IA64_SAL_PLATFORM_FEATURE_ITC_DRIFT)) - ia64_printk_clock = ia64_itc_printk_clock; -} - /* IA64 doesn't cache the timezone */ void update_vsyscall_tz(void) { Index: linux/arch/x86/kernel/process_32.c === --- linux.orig/arch/x86/kernel/process_32.c +++ linux/arch/x86/kernel/process_32.c @@ -113,10 +113,19 @@ void default_idle(void) smp_mb(); local_irq_disable(); - if (!need_resched()) + if (!need_resched()) { + ktime_t t0, t1; + u64 t0n, t1n; + + t0 = ktime_get(); + t0n = ktime_to_ns(t0); safe_halt();/* enables interrupts racelessly */ - else - 
local_irq_enable(); + local_irq_disable(); + t1 = ktime_get(); + t1n = ktime_to_ns(t1); + sched_clock_idle_wakeup_event(t1n - t0n); + } + local_irq_enable(); current_thread_info()->status |= TS_POLLING; } else { /* loop is done by the caller */ Index: linux/arch/x86/kernel/tsc_32.c === --- linux.orig/arch/x86/kernel/tsc_32.c +++ linux/arch/x86/kernel/tsc_32.c @@ -5,6 +5,7 @@ #include #include #include +#include #include #include @@ -78,15 +79,35 @@ EXPORT_SYMBOL_GPL(check_tsc_unstable); * cyc2ns_scale is limited to 10^6 * 2^10, which fits in 32 bits. * ([EMAIL PROTECTED]) * + * ns += offset to avoid sched_clock jumps with cpufreq + * * [EMAIL PROTECTED] "math is hard, lets go shopping!" */ -unsigned long cyc2ns_scale __read_mostly; -#define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */ +DEFINE_PER_CPU(unsigned long, cyc2ns); -static inline void set_cyc2ns_scale(unsigned long cpu_khz) +static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu) { - cyc2ns_scale = (100 << CYC2NS_SCALE_FACTOR)/cpu_khz; + unsigned long flags, prev_scale, *scale; + unsigned long long tsc_now, ns_now; + + local_irq_save(flags); + sched_clock_idle_sleep_event(); + + scale = &per_cpu(cyc2ns, cpu); + + rdtscll(tsc_now); + ns_now = __cycles_2_ns(tsc_now); + + prev_scale = *scale; + if (cpu_khz) + *scale = (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR)/cpu_khz; + + /* +* Start smoothly with the new frequency: +*/ + sched_clock_idle_wakeup_event(0); + local_irq_restore(flags); } /* @@ -239,7 +260,9 @@ time_cpufreq_notifier(struct notifier_bl ref_freq, freq->new); if (!(freq->flags & CPUFREQ_CONST_LOOPS)) { tsc_khz = cpu_khz; - set_cyc2ns_scale(cpu_khz); + preempt_disable(); + set_cyc2ns_scale(cpu_khz, smp_processor_id()); + preempt_enable(); /* * TSC based sched_clock tur
Re: [BUG] 2.6.23-rc3 can't see sd partitions on Alpha
I wrote:
> "git diff 2f1f53bdc6531696934f6ee7bbdfa2ab4f4f62a3
> 6f37ac793d6ba7b35d338f791974166f67fdd9ba"
> produced a relatively short patch (18,437 bytes). The list of involved
> files:
>
> (omitted)
>
> Current state of the source tree is the 6f37ac... version, so I'll start
> backing out the above diffs in related groups and continue until I've got
> a working kernel. For lack of an obvious target, I'll start with the
> seemingly innocuous change to sysctl_check.c. I'll report back when I've
> got something.

That was quick :-). Backing out the sysctl_check.c diff gives me a
working kernel. Beats the [EMAIL PROTECTED] out of me how/why, though.

Michael Cree: could you try backing out the diff below from your
2.6.24-rc3 tree and see if things are now working for you?

Here's "uname -a", just to confirm (maybe) I'm running on what I say
works:

Linux smirkin 2.6.24-rc2-g6f37ac79-dirty #2 Fri Dec 7 08:03:12 CST 2007 alpha

Here's the diff I backed out (patch -R). It's short...

diff --git a/kernel/sysctl_check.c b/kernel/sysctl_check.c
index 5a2f2b2..4abc6d2 100644
--- a/kernel/sysctl_check.c
+++ b/kernel/sysctl_check.c
@@ -738,7 +738,7 @@ static struct trans_ctl_table trans_net_table[] = {
 	{ NET_ROSE, "rose", trans_net_rose_table },
 	{ NET_IPV6, "ipv6", trans_net_ipv6_table },
 	{ NET_X25, "x25", trans_net_x25_table },
-	{ NET_TR, "tr", trans_net_tr_table },
+	{ NET_TR, "token-ring", trans_net_tr_table },
 	{ NET_DECNET, "decnet", trans_net_decnet_table },
 	/* NET_ECONET not used */
 	{ NET_SCTP, "sctp", trans_net_sctp_table },

-- 
Bob Tracy          | "They couldn't hit an elephant at this dist- "
[EMAIL PROTECTED]  | - Last words of Union General John Sedgwick,
                   | Battle of Spotsylvania Court House, U.S. Civil War
[GIT PULL] avr32 fixes for 2.6.24
Linus,

Please pull from

  ssh://master.kernel.org/pub/scm/linux/kernel/git/hskinnemoen/avr32-2.6.git for-linus

to receive the following updates.

Yes, lots of complicated stuff has been changed here. There is
currently a bug in the debug trap handling code which may cause a soft
lockup while debugging userspace applications, and it took some major
surgery to get it fixed. The lockdep stuff isn't really a part of the
fix, but since it touches the low-level exception handling code, I
think it is more risky to remove it than leaving it in.

So I hope you won't be too upset by these changes. I wouldn't have
pushed it if I didn't think the bug it fixes is very serious, and I've
spent quite a few days testing that nothing broke. A customer has
verified the fix too, and the LTP test cases that fail after this
patch, failed before too.

Haavard Skinnemoen (9):
      [AVR32] Add TIF_RESTORE_SIGMASK to the work masks
      [AVR32] Fix invalid status register bit definitions in asm/ptrace.h
      [AVR32] Kconfig: Use def_bool instead of bool + default
      [AVR32] Implement stacktrace support
      [AVR32] Implement irqflags trace and lockdep support
      [AVR32] Clean up OCD register usage
      [AVR32] Follow the rules when dealing with the OCD system
      [AVR32] Fix copy_to_user_page() breakage
      [AVR32] Fix wrong pt_regs in critical exception handler

 arch/avr32/Kconfig               |  65 ++---
 arch/avr32/kernel/Makefile       |   1 +
 arch/avr32/kernel/asm-offsets.c  |   2 +
 arch/avr32/kernel/entry-avr32b.S | 285 ---
 arch/avr32/kernel/kprobes.c      |  14 +-
 arch/avr32/kernel/process.c      |   9 +-
 arch/avr32/kernel/ptrace.c       | 273 ++
 arch/avr32/kernel/stacktrace.c   |  53
 arch/avr32/kernel/traps.c        |   2 +-
 arch/avr32/kernel/vmlinux.lds.S  |   2 +-
 arch/avr32/mm/cache.c            |  20 +-
 include/asm-avr32/cacheflush.h   |  19 +-
 include/asm-avr32/ocd.h          | 592 +-
 include/asm-avr32/processor.h    |   3 +
 include/asm-avr32/ptrace.h       |   6 +-
 include/asm-avr32/sysreg.h       |   2 +
 include/asm-avr32/system.h       |   4 +-
 include/asm-avr32/thread_info.h  |  25 ++-
 18 files changed, 1006 insertions(+), 371 deletions(-)
create mode 100644 arch/avr32/kernel/stacktrace.c diff --git a/arch/avr32/Kconfig b/arch/avr32/Kconfig index 4f402c9..b77abce 100644 --- a/arch/avr32/Kconfig +++ b/arch/avr32/Kconfig @@ -6,8 +6,7 @@ mainmenu "Linux Kernel Configuration" config AVR32 - bool - default y + def_bool y # With EMBEDDED=n, we get lots of stuff automatically selected # that we usually don't need on AVR32. select EMBEDDED @@ -20,51 +19,49 @@ config AVR32 http://avr32linux.org/. config GENERIC_GPIO - bool - default y + def_bool y config GENERIC_HARDIRQS - bool - default y + def_bool y + +config STACKTRACE_SUPPORT + def_bool y + +config LOCKDEP_SUPPORT + def_bool y + +config TRACE_IRQFLAGS_SUPPORT + def_bool y config HARDIRQS_SW_RESEND - bool - default y + def_bool y config GENERIC_IRQ_PROBE - bool - default y + def_bool y config RWSEM_GENERIC_SPINLOCK - bool - default y + def_bool y config GENERIC_TIME - bool - default y + def_bool y config RWSEM_XCHGADD_ALGORITHM - bool + def_bool n config ARCH_HAS_ILOG2_U32 - bool - default n + def_bool n config ARCH_HAS_ILOG2_U64 - bool - default n + def_bool n config GENERIC_HWEIGHT - bool - default y + def_bool y config GENERIC_CALIBRATE_DELAY - bool - default y + def_bool y config GENERIC_BUG - bool - default y + def_bool y depends on BUG source "init/Kconfig" @@ -139,28 +136,22 @@ config PHYS_OFFSET source "kernel/Kconfig.preempt" config HAVE_ARCH_BOOTMEM_NODE - bool - default n + def_bool n config ARCH_HAVE_MEMORY_PRESENT - bool - default n + def_bool n config NEED_NODE_MEMMAP_SIZE - bool - default n + def_bool n config ARCH_FLATMEM_ENABLE - bool - default y + def_bool y config ARCH_DISCONTIGMEM_ENABLE - bool - default n + def_bool n config ARCH_SPARSEMEM_ENABLE - bool - default n + def_bool n source "mm/Kconfig" diff --git a/arch/avr32/kernel/Makefile b/arch/avr32/kernel/Makefile index 989fcd1..2d6d48f 100644 --- a/arch/avr32/kernel/Makefile +++ b/arch/avr32/kernel/Makefile @@ -11,3 +11,4 @@ obj-y += signal.o sys_avr32.o process.o time.o obj-y += 
init_task.o switch_to.o cpu.o obj-$(CONFIG_MODULES) += module.o avr32_ksyms.o obj-$(CONFIG_KPROBES) += kprobes.o +obj-$(CONFIG_STACKTRACE) += stacktrace.o diff --
[PATCH -mm 2/6] powerpc: convert iommu to use the IOMMU helper
This patch converts PPC's IOMMU to use the IOMMU helper functions. The IOMMU doesn't allocate a memory area spanning LLD's segment boundary anymore. iseries_hv_alloc and iseries_hv_map don't have proper device struct. 4GB boundary is used for them. Signed-off-by: FUJITA Tomonori <[EMAIL PROTECTED]> --- arch/powerpc/Kconfig |3 + arch/powerpc/kernel/dma_64.c |6 +- arch/powerpc/kernel/iommu.c| 65 arch/powerpc/platforms/iseries/iommu.c |4 +- include/asm-powerpc/iommu.h| 10 ++-- 5 files changed, 45 insertions(+), 43 deletions(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 98aef7f..1a6cf07 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -227,6 +227,9 @@ config IOMMU_VMERGE Most drivers don't have this problem; it is safe to say Y here. +config IOMMU_HELPER + def_bool PPC64 + config HOTPLUG_CPU bool "Support for enabling/disabling CPUs" depends on SMP && HOTPLUG && EXPERIMENTAL && (PPC_PSERIES || PPC_PMAC) diff --git a/arch/powerpc/kernel/dma_64.c b/arch/powerpc/kernel/dma_64.c index 1806d96..6fcb7cb 100644 --- a/arch/powerpc/kernel/dma_64.c +++ b/arch/powerpc/kernel/dma_64.c @@ -31,8 +31,8 @@ static inline unsigned long device_to_mask(struct device *dev) static void *dma_iommu_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle, gfp_t flag) { - return iommu_alloc_coherent(dev->archdata.dma_data, size, dma_handle, - device_to_mask(dev), flag, + return iommu_alloc_coherent(dev, dev->archdata.dma_data, size, + dma_handle, device_to_mask(dev), flag, dev->archdata.numa_node); } @@ -52,7 +52,7 @@ static dma_addr_t dma_iommu_map_single(struct device *dev, void *vaddr, size_t size, enum dma_data_direction direction) { - return iommu_map_single(dev->archdata.dma_data, vaddr, size, + return iommu_map_single(dev, dev->archdata.dma_data, vaddr, size, device_to_mask(dev), direction); } diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c index 7a5d247..6abf4c3 100644 --- a/arch/powerpc/kernel/iommu.c +++ 
b/arch/powerpc/kernel/iommu.c @@ -31,6 +31,7 @@ #include #include #include +#include #include #include #include @@ -81,17 +82,19 @@ static int __init setup_iommu(char *str) __setup("protect4gb=", setup_protect4gb); __setup("iommu=", setup_iommu); -static unsigned long iommu_range_alloc(struct iommu_table *tbl, +static unsigned long iommu_range_alloc(struct device *dev, + struct iommu_table *tbl, unsigned long npages, unsigned long *handle, unsigned long mask, unsigned int align_order) { - unsigned long n, end, i, start; + unsigned long n, end, start; unsigned long limit; int largealloc = npages > 15; int pass = 0; unsigned long align_mask; + unsigned long boundary_size; align_mask = 0xl >> (64 - align_order); @@ -136,14 +139,17 @@ static unsigned long iommu_range_alloc(struct iommu_table *tbl, start &= mask; } - n = find_next_zero_bit(tbl->it_map, limit, start); - - /* Align allocation */ - n = (n + align_mask) & ~align_mask; - - end = n + npages; + if (dev) + boundary_size = ALIGN(dma_get_seg_boundary(dev) + 1, + 1 << IOMMU_PAGE_SHIFT); + else + boundary_size = ALIGN(1UL << 32, 1 << IOMMU_PAGE_SHIFT); + /* 4GB boundary for iseries_hv_alloc and iseries_hv_map */ - if (unlikely(end >= limit)) { + n = iommu_area_alloc(tbl->it_map, limit, start, npages, +tbl->it_offset, boundary_size >> IOMMU_PAGE_SHIFT, +align_mask); + if (n == -1) { if (likely(pass < 2)) { /* First failure, just rescan the half of the table. * Second failure, rescan the other half of the table. @@ -158,14 +164,7 @@ static unsigned long iommu_range_alloc(struct iommu_table *tbl, } } - for (i = n; i < end; i++) - if (test_bit(i, tbl->it_map)) { - start = i+1; - goto again; - } - - for (i = n; i < end; i++) - __set_bit(i, tbl->it_map); + end = n + npages; /* Bump the hint to a new block for small allocs. */ if (largealloc) { @@ -184,16 +183,17 @@ static un
[PATCH -mm 3/6] powerpc: remove DMA 4GB boundary protection
Previously, during initialization of the IOMMU tables, the last entry at
each 4GB boundary was marked as used, since there are many adapters which
cannot handle DMAing across any 4GB boundary.

The IOMMU doesn't allocate a memory area spanning LLD's segment boundary
anymore. The segment boundary of devices is set to 4GB by default. So we
can remove the 4GB boundary protection now.

Signed-off-by: FUJITA Tomonori <[EMAIL PROTECTED]>
---
 arch/powerpc/kernel/iommu.c | 21 +
 1 files changed, 1 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 6abf4c3..bdb194c 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -448,9 +448,6 @@ void iommu_unmap_sg(struct iommu_table *tbl, struct scatterlist *sglist,
 struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid)
 {
 	unsigned long sz;
-	unsigned long start_index, end_index;
-	unsigned long entries_per_4g;
-	unsigned long index;
 	static int welcomed = 0;
 	struct page *page;

@@ -472,6 +469,7 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid)

 #ifdef CONFIG_CRASH_DUMP
 	if (ppc_md.tce_get) {
+		unsigned long index;
 		unsigned long tceval;
 		unsigned long tcecount = 0;

@@ -502,23 +500,6 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid)
 	ppc_md.tce_free(tbl, tbl->it_offset, tbl->it_size);
 #endif

-	/*
-	 * DMA cannot cross 4 GB boundary. Mark last entry of each 4
-	 * GB chunk as reserved.
-	 */
-	if (protect4gb) {
-		entries_per_4g = 0x100000000l >> IOMMU_PAGE_SHIFT;
-
-		/* Mark the last bit before a 4GB boundary as used */
-		start_index = tbl->it_offset | (entries_per_4g - 1);
-		start_index -= tbl->it_offset;
-
-		end_index = tbl->it_size;
-
-		for (index = start_index; index < end_index - 1; index += entries_per_4g)
-			__set_bit(index, tbl->it_map);
-	}
-
 	if (!welcomed) {
 		printk(KERN_INFO "IOMMU table initialized, virtual merging %s\n",
 		       novmerge ? "disabled" : "enabled");
-- 
1.5.3.4
[PATCH -mm 0/6] fix iommu segment boundary problems (powerpc and x86)
This patchset is a sequel to my patchset to fix iommu segment boundary
problems:

http://www.mail-archive.com/[EMAIL PROTECTED]/msg11919.html

This adds new IOMMU helper functions for the free area management. These
functions take care of LLD's segment boundary limit for IOMMUs. They are
useful for IOMMUs that use a bitmap for the free area management.

The helper functions are very low level. They just find a free area in
the bitmap that is appropriate for low-level drivers. The IOMMUs can
easily continue to use their hardware-specific techniques with the
low-level helper functions.

This patchset converts three IOMMUs: POWERPC, X86 calgary, and X86 gart,
but I have tested only the POWERPC patch. The rest are only compile
tested since I don't have the hardware.

This is against 2.6.24-rc4-mm1.
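As a rough illustration of the "segment boundary limit" mentioned above: the converted callers in this series turn the device's boundary mask into a size counted in IOMMU pages, using ALIGN(dma_get_seg_boundary(dev) + 1, PAGE_SIZE) >> PAGE_SHIFT. The user-space sketch below mirrors that expression under the assumption of 4KB pages. ALIGN_UP and boundary_pages are stand-in names for this example only, not kernel symbols.

```c
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4KB IOMMU pages */
#define PAGE_SIZE (UINT64_C(1) << PAGE_SHIFT)

/* Stand-in for the kernel's ALIGN(); 'a' must be a power of two. */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))

/*
 * Hypothetical helper: convert a segment boundary mask such as
 * 0xffffffff (4GB) into a boundary size in pages.
 */
static uint64_t boundary_pages(uint64_t seg_boundary_mask)
{
	return ALIGN_UP(seg_boundary_mask + 1, PAGE_SIZE) >> PAGE_SHIFT;
}
```

For the default 4GB mask this yields 2^20 pages. A mask smaller than one page is rounded up to a single page, so the boundary size never becomes zero.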
[PATCH -mm 5/6] x86: convert gart IOMMU to use the IOMMU helper
This patch converts gart IOMMU to use the IOMMU helper functions. The IOMMU doesn't allocate a memory area spanning LLD's segment boundary anymore. Signed-off-by: FUJITA Tomonori <[EMAIL PROTECTED]> --- arch/x86/Kconfig |2 +- arch/x86/kernel/pci-gart_64.c | 41 + 2 files changed, 26 insertions(+), 17 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index df22fe7..34519c2 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -434,7 +434,7 @@ config CALGARY_IOMMU_ENABLED_BY_DEFAULT If unsure, say Y. config IOMMU_HELPER - def_bool CALGARY_IOMMU + def_bool (CALGARY_IOMMU || GART_IOMMU) # need this always selected by IOMMU for the VIA workaround config SWIOTLB diff --git a/arch/x86/kernel/pci-gart_64.c b/arch/x86/kernel/pci-gart_64.c index b8595d6..d0b9033 100644 --- a/arch/x86/kernel/pci-gart_64.c +++ b/arch/x86/kernel/pci-gart_64.c @@ -25,6 +25,7 @@ #include #include #include +#include #include #include #include @@ -82,17 +83,24 @@ AGPEXTERN __u32 *agp_gatt_table; static unsigned long next_bit; /* protected by iommu_bitmap_lock */ static int need_flush; /* global flush state. 
set for each gart wrap */ -static unsigned long alloc_iommu(int size) +static unsigned long alloc_iommu(struct device *dev, int size) { unsigned long offset, flags; + unsigned long boundary_size; + unsigned long base_index; + + base_index = ALIGN(iommu_bus_base & dma_get_seg_boundary(dev), + PAGE_SIZE) >> PAGE_SHIFT; + boundary_size = ALIGN(dma_get_seg_boundary(dev) + 1, + PAGE_SIZE) >> PAGE_SHIFT; spin_lock_irqsave(&iommu_bitmap_lock, flags); - offset = find_next_zero_string(iommu_gart_bitmap, next_bit, - iommu_pages, size); + offset = iommu_area_alloc(iommu_gart_bitmap, iommu_pages, next_bit, + size, base_index, boundary_size, 0); if (offset == -1) { need_flush = 1; - offset = find_next_zero_string(iommu_gart_bitmap, 0, - iommu_pages, size); + offset = iommu_area_alloc(iommu_gart_bitmap, iommu_pages, 0, + size, base_index, boundary_size, 0); } if (offset != -1) { set_bit_string(iommu_gart_bitmap, offset, size); @@ -114,7 +122,7 @@ static void free_iommu(unsigned long offset, int size) unsigned long flags; spin_lock_irqsave(&iommu_bitmap_lock, flags); - __clear_bit_string(iommu_gart_bitmap, offset, size); + iommu_area_free(iommu_gart_bitmap, offset, size); spin_unlock_irqrestore(&iommu_bitmap_lock, flags); } @@ -235,7 +243,7 @@ static dma_addr_t dma_map_area(struct device *dev, dma_addr_t phys_mem, size_t size, int dir) { unsigned long npages = to_pages(phys_mem, size); - unsigned long iommu_page = alloc_iommu(npages); + unsigned long iommu_page = alloc_iommu(dev, npages); int i; if (iommu_page == -1) { @@ -355,10 +363,11 @@ static int dma_map_sg_nonforce(struct device *dev, struct scatterlist *sg, } /* Map multiple scatterlist entries continuous into the first. 
*/ -static int __dma_map_cont(struct scatterlist *start, int nelems, - struct scatterlist *sout, unsigned long pages) +static int __dma_map_cont(struct device *dev, struct scatterlist *start, + int nelems, struct scatterlist *sout, + unsigned long pages) { - unsigned long iommu_start = alloc_iommu(pages); + unsigned long iommu_start = alloc_iommu(dev, pages); unsigned long iommu_page = iommu_start; struct scatterlist *s; int i; @@ -394,8 +403,8 @@ static int __dma_map_cont(struct scatterlist *start, int nelems, } static inline int -dma_map_cont(struct scatterlist *start, int nelems, struct scatterlist *sout, -unsigned long pages, int need) +dma_map_cont(struct device *dev, struct scatterlist *start, int nelems, +struct scatterlist *sout, unsigned long pages, int need) { if (!need) { BUG_ON(nelems != 1); @@ -403,7 +412,7 @@ dma_map_cont(struct scatterlist *start, int nelems, struct scatterlist *sout, sout->dma_length = start->length; return 0; } - return __dma_map_cont(start, nelems, sout, pages); + return __dma_map_cont(dev, start, nelems, sout, pages); } /* @@ -452,8 +461,8 @@ static int gart_map_sg(struct device *dev, struct scatterlist *sg, int nents, if (!iommu_merge || !nextneed || !need || s->offset || (s->length + seg_size > max_seg_size) || (ps->offset + ps->length) % PAGE_
[PATCH -mm 1/6] add IOMMU helper functions for the free area management
This adds IOMMU helper functions for the free area management. These
functions take care of LLD's segment boundary limit for IOMMUs. They
would be useful for IOMMUs that use bitmap for the free area
management.

Signed-off-by: FUJITA Tomonori <[EMAIL PROTECTED]>
---
 include/linux/iommu-helper.h |  7
 lib/Makefile                 |  1 +
 lib/iommu-helper.c           | 76 ++
 3 files changed, 84 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/iommu-helper.h
 create mode 100644 lib/iommu-helper.c

diff --git a/include/linux/iommu-helper.h b/include/linux/iommu-helper.h
new file mode 100644
index 0000000..4dd4c04
--- /dev/null
+++ b/include/linux/iommu-helper.h
@@ -0,0 +1,7 @@
+extern unsigned long iommu_area_alloc(unsigned long *map, unsigned long size,
+				      unsigned long start, unsigned int nr,
+				      unsigned long shift,
+				      unsigned long boundary_size,
+				      unsigned long align_mask);
+extern void iommu_area_free(unsigned long *map, unsigned long start,
+			    unsigned int nr);
diff --git a/lib/Makefile b/lib/Makefile
index b862b90..17fb758 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -65,6 +65,7 @@ obj-$(CONFIG_SMP) += pcounter.o
 obj-$(CONFIG_AUDIT_GENERIC) += audit.o

 obj-$(CONFIG_SWIOTLB) += swiotlb.o
+obj-$(CONFIG_IOMMU_HELPER) += iommu-helper.o
 obj-$(CONFIG_FAULT_INJECTION) += fault-inject.o

 lib-$(CONFIG_GENERIC_BUG) += bug.o
diff --git a/lib/iommu-helper.c b/lib/iommu-helper.c
new file mode 100644
index 0000000..e7d8544
--- /dev/null
+++ b/lib/iommu-helper.c
@@ -0,0 +1,76 @@
+/*
+ * IOMMU helper functions for the free area management
+ */
+
+#include
+#include
+
+static unsigned long find_next_zero_area(unsigned long *map,
+					 unsigned long size,
+					 unsigned long start,
+					 unsigned int nr)
+{
+	unsigned long index, end, i;
+again:
+	index = find_next_zero_bit(map, size, start);
+	end = index + nr;
+	if (end > size)
+		return -1;
+	for (i = index + 1; i < end; i++) {
+		if (test_bit(i, map)) {
+			start = i+1;
+			goto again;
+		}
+	}
+	return index;
+}
+
+static inline void set_bit_area(unsigned long *map, unsigned long i,
+				int len)
+{
+	unsigned long end = i + len;
+	while (i < end) {
+		__set_bit(i, map);
+		i++;
+	}
+}
+
+static inline int is_span_boundary(unsigned int index, unsigned int nr,
+				   unsigned long shift,
+				   unsigned long boundary_size)
+{
+	shift = (shift + index) & (boundary_size - 1);
+	return shift + nr > boundary_size;
+}
+
+unsigned long iommu_area_alloc(unsigned long *map, unsigned long size,
+			       unsigned long start, unsigned int nr,
+			       unsigned long shift, unsigned long boundary_size,
+			       unsigned long align_mask)
+{
+	unsigned long index;
+again:
+	index = find_next_zero_area(map, size, start, nr);
+	if (index != -1) {
+		index = (index + align_mask) & ~align_mask;
+		if (is_span_boundary(index, nr, shift, boundary_size)) {
+			/* we could do more effectively */
+			start = index + 1;
+			goto again;
+		}
+		set_bit_area(map, index, nr);
+	}
+	return index;
+}
+EXPORT_SYMBOL(iommu_area_alloc);
+
+void iommu_area_free(unsigned long *map, unsigned long start, unsigned int nr)
+{
+	unsigned long end = start + nr;
+
+	while (start < end) {
+		__clear_bit(start, map);
+		start++;
+	}
+}
+EXPORT_SYMBOL(iommu_area_free);
-- 
1.5.3.4
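The heart of the helper is the boundary test in is_span_boundary(). The sketch below restates that check as a stand-alone user-space function so the arithmetic can be tried outside the kernel. It assumes, as the masking in the patch implies, that boundary_size is a power of two counted in IOMMU pages. The name spans_boundary is made up for this example.

```c
#include <stdbool.h>

/*
 * Hypothetical user-space restatement of the patch's boundary check.
 * 'shift' is the table's starting page offset, 'index' the candidate
 * position in the bitmap, 'nr' the number of pages requested.
 * boundary_size must be a power of two.
 */
static bool spans_boundary(unsigned long index, unsigned long nr,
			   unsigned long shift, unsigned long boundary_size)
{
	/* Position of the first page inside its boundary-sized window. */
	unsigned long offset = (shift + index) & (boundary_size - 1);

	/* The area crosses the boundary if it runs past the window's end. */
	return offset + nr > boundary_size;
}
```

With a 16-page boundary and no shift, two pages starting at index 14 end exactly at the boundary and are accepted, while the same request at index 15 would straddle it and is rejected, which is the case where iommu_area_alloc() retries from the next index.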
[PATCH -mm 6/6] kill __clear_bit_string and find_next_zero_string
This kills the now unused __clear_bit_string and find_next_zero_string
(they were used only by the gart and calgary IOMMUs).

Signed-off-by: FUJITA Tomonori <[EMAIL PROTECTED]>
---
 arch/x86/lib/Makefile_64    |  2 +-
 arch/x86/lib/bitstr_64.c    | 28
 include/asm-x86/bitops_64.h | 16
 3 files changed, 1 insertions(+), 45 deletions(-)
 delete mode 100644 arch/x86/lib/bitstr_64.c

diff --git a/arch/x86/lib/Makefile_64 b/arch/x86/lib/Makefile_64
index bbabad3..1b72bda 100644
--- a/arch/x86/lib/Makefile_64
+++ b/arch/x86/lib/Makefile_64
@@ -9,5 +9,5 @@ obj-$(CONFIG_SMP) += msr-on-cpu.o

 lib-y := csum-partial_64.o csum-copy_64.o csum-wrappers_64.o delay_64.o \
 	usercopy_64.o getuser_64.o putuser_64.o \
-	thunk_64.o clear_page_64.o copy_page_64.o bitstr_64.o bitops_64.o
+	thunk_64.o clear_page_64.o copy_page_64.o bitops_64.o
 lib-y += memcpy_64.o memmove_64.o memset_64.o copy_user_64.o rwlock_64.o copy_user_nocache_64.o
diff --git a/arch/x86/lib/bitstr_64.c b/arch/x86/lib/bitstr_64.c
deleted file mode 100644
index 7445caf..0000000
--- a/arch/x86/lib/bitstr_64.c
+++ /dev/null
@@ -1,28 +0,0 @@
-#include
-#include
-
-/* Find string of zero bits in a bitmap */
-unsigned long
-find_next_zero_string(unsigned long *bitmap, long start, long nbits, int len)
-{
-	unsigned long n, end, i;
-
- again:
-	n = find_next_zero_bit(bitmap, nbits, start);
-	if (n == -1)
-		return -1;
-
-	/* could test bitsliced, but it's hardly worth it */
-	end = n+len;
-	if (end > nbits)
-		return -1;
-	for (i = n+1; i < end; i++) {
-		if (test_bit(i, bitmap)) {
-			start = i+1;
-			goto again;
-		}
-	}
-	return n;
-}
-
-EXPORT_SYMBOL(find_next_zero_string);
diff --git a/include/asm-x86/bitops_64.h b/include/asm-x86/bitops_64.h
index 48adbf5..aaf1519 100644
--- a/include/asm-x86/bitops_64.h
+++ b/include/asm-x86/bitops_64.h
@@ -37,12 +37,6 @@ static inline long __scanbit(unsigned long val, unsigned long max)
 	((off)+(__scanbit(~(((*(unsigned long *)addr)) >> (off)),(size)-(off)))) : \
 	find_next_zero_bit(addr,size,off)))

-/*
- * Find string of zero bits in a bitmap. -1 when not found.
- */
-extern unsigned long
-find_next_zero_string(unsigned long *bitmap, long start, long nbits, int len);
-
 static inline void set_bit_string(unsigned long *bitmap, unsigned long i,
 				  int len)
 {
@@ -53,16 +47,6 @@ static inline void set_bit_string(unsigned long *bitmap, unsigned long i,
 	}
 }

-static inline void __clear_bit_string(unsigned long *bitmap, unsigned long i,
-				      int len)
-{
-	unsigned long end = i + len;
-	while (i < end) {
-		__clear_bit(i, bitmap);
-		i++;
-	}
-}
-
 /**
  * ffz - find first zero in word.
  * @word: The word to search
-- 
1.5.3.4
[PATCH -mm 4/6] x86: convert calgary IOMMU to use the IOMMU helper
This patch converts calgary IOMMU to use the IOMMU helper functions. The IOMMU doesn't allocate a memory area spanning LLD's segment boundary anymore. Signed-off-by: FUJITA Tomonori <[EMAIL PROTECTED]> --- arch/x86/Kconfig |3 +++ arch/x86/kernel/pci-calgary_64.c | 34 -- 2 files changed, 23 insertions(+), 14 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 48d09cb..df22fe7 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -433,6 +433,9 @@ config CALGARY_IOMMU_ENABLED_BY_DEFAULT Calgary anyway, pass 'iommu=calgary' on the kernel command line. If unsure, say Y. +config IOMMU_HELPER + def_bool CALGARY_IOMMU + # need this always selected by IOMMU for the VIA workaround config SWIOTLB bool diff --git a/arch/x86/kernel/pci-calgary_64.c b/arch/x86/kernel/pci-calgary_64.c index 21f34db..f5b47ba 100644 --- a/arch/x86/kernel/pci-calgary_64.c +++ b/arch/x86/kernel/pci-calgary_64.c @@ -35,6 +35,7 @@ #include #include #include +#include #include #include #include @@ -260,22 +261,28 @@ static void iommu_range_reserve(struct iommu_table *tbl, spin_unlock_irqrestore(&tbl->it_lock, flags); } -static unsigned long iommu_range_alloc(struct iommu_table *tbl, - unsigned int npages) +static unsigned long iommu_range_alloc(struct device *dev, + struct iommu_table *tbl, + unsigned int npages) { unsigned long flags; unsigned long offset; + unsigned long boundary_size; + + boundary_size = ALIGN(dma_get_seg_boundary(dev) + 1, + PAGE_SIZE) >> PAGE_SHIFT; BUG_ON(npages == 0); spin_lock_irqsave(&tbl->it_lock, flags); - offset = find_next_zero_string(tbl->it_map, tbl->it_hint, - tbl->it_size, npages); + offset = iommu_area_alloc(tbl->it_map, tbl->it_size, tbl->it_hint, + npages, 0, boundary_size, 0); if (offset == ~0UL) { tbl->chip_ops->tce_cache_blast(tbl); - offset = find_next_zero_string(tbl->it_map, 0, - tbl->it_size, npages); + + offset = iommu_area_alloc(tbl->it_map, tbl->it_size, 0, + npages, 0, boundary_size, 0); if (offset == ~0UL) { printk(KERN_WARNING "Calgary: 
IOMMU full.\n"); spin_unlock_irqrestore(&tbl->it_lock, flags); @@ -286,7 +293,6 @@ static unsigned long iommu_range_alloc(struct iommu_table *tbl, } } - set_bit_string(tbl->it_map, offset, npages); tbl->it_hint = offset + npages; BUG_ON(tbl->it_hint > tbl->it_size); @@ -295,13 +301,13 @@ static unsigned long iommu_range_alloc(struct iommu_table *tbl, return offset; } -static dma_addr_t iommu_alloc(struct iommu_table *tbl, void *vaddr, - unsigned int npages, int direction) +static dma_addr_t iommu_alloc(struct device *dev, struct iommu_table *tbl, + void *vaddr, unsigned int npages, int direction) { unsigned long entry; dma_addr_t ret = bad_dma_address; - entry = iommu_range_alloc(tbl, npages); + entry = iommu_range_alloc(dev, tbl, npages); if (unlikely(entry == bad_dma_address)) goto error; @@ -354,7 +360,7 @@ static void iommu_free(struct iommu_table *tbl, dma_addr_t dma_addr, badbit, tbl, dma_addr, entry, npages); } - __clear_bit_string(tbl->it_map, entry, npages); + iommu_area_free(tbl->it_map, entry, npages); spin_unlock_irqrestore(&tbl->it_lock, flags); } @@ -438,7 +444,7 @@ static int calgary_map_sg(struct device *dev, struct scatterlist *sg, vaddr = (unsigned long) sg_virt(s); npages = num_dma_pages(vaddr, s->length); - entry = iommu_range_alloc(tbl, npages); + entry = iommu_range_alloc(dev, tbl, npages); if (entry == bad_dma_address) { /* makes sure unmap knows to stop */ s->dma_length = 0; @@ -476,7 +482,7 @@ static dma_addr_t calgary_map_single(struct device *dev, void *vaddr, npages = num_dma_pages(uaddr, size); if (translation_enabled(tbl)) - dma_handle = iommu_alloc(tbl, vaddr, npages, direction); + dma_handle = iommu_alloc(dev, tbl, vaddr, npages, direction); else dma_handle = virt_to_bus(vaddr); @@ -516,7 +522,7 @@ static void* calgary_alloc_coherent(struct device *dev, size_t size, if (translation_enabled(tbl)) { /* set up tces to cover the allocated range */ - mapping = iommu_alloc(tbl, ret, npages, DMA_BIDIRECTIONAL); +
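A note on the boundary_size arithmetic in the patch above: the following is a minimal standalone sketch, not the kernel code. It assumes 4 KiB pages, and the names SKETCH_PAGE_SHIFT, align_up, boundary_pages and spans_boundary are made up for illustration. It shows how a segment boundary mask such as 0xffffffff becomes a boundary size in pages, and how an allocator could then reject a range that would cross such a boundary.

```c
#include <stdint.h>

/* ASSUMPTION: 4 KiB pages; the real value comes from the architecture. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (UINT64_C(1) << SKETCH_PAGE_SHIFT)

/* Round x up to a multiple of a (a must be a power of two). */
uint64_t align_up(uint64_t x, uint64_t a)
{
    return (x + a - 1) & ~(a - 1);
}

/* Segment boundary mask (e.g. 0xffffffff) -> boundary size in pages. */
uint64_t boundary_pages(uint64_t seg_boundary_mask)
{
    return align_up(seg_boundary_mask + 1, SKETCH_PAGE_SIZE) >> SKETCH_PAGE_SHIFT;
}

/*
 * Hypothetical check: would the page range [index, index + npages)
 * cross a multiple of `boundary` pages? `boundary` must be a power of two.
 */
int spans_boundary(uint64_t index, uint64_t npages, uint64_t boundary)
{
    uint64_t offset = index & (boundary - 1);
    return offset + npages > boundary;
}
```

With a 64 KiB boundary (mask 0xffff) the boundary is 16 pages, so a 4-page request starting at page 14 would be rejected, while one starting at page 12 or 16 would fit.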
Re: [PATCH] kexec: force x86_64 arches to boot kdump kernels on boot cpu
On Fri, Dec 07, 2007 at 09:39:44AM -0500, Vivek Goyal wrote: > On Thu, Dec 06, 2007 at 07:10:23PM -0500, Neil Horman wrote: > > On Thu, Dec 06, 2007 at 05:11:43PM -0500, Vivek Goyal wrote: > > > On Thu, Dec 06, 2007 at 04:39:51PM -0500, Neil Horman wrote: > > > > On Fri, Nov 30, 2007 at 09:51:31AM -0500, Neil Horman wrote: > > > > > On Fri, Nov 30, 2007 at 09:42:50AM -0500, Vivek Goyal wrote: > > > > > > > > > > > > > > Thats what I'm doing at the moment. I'm working on a RHEL5 patch at > > > > > the moment > > > > > (since thats whats on the production system thats failing), and will > > > > > forward > > > > > port it once its working > > > > > > > > > > And not to split hairs, but techically thats not our _only_ choice. > > > > > We could > > > > > force kdump boots on cpu0 as well ;) > > > > > > > > > > Thanks > > > > > Neil > > > > > > > > > > > Thanks > > > > > > Vivek > > > > > > > > > > > > > > > > > Sorry to have been quiet on this issue for a few days. Interesting news > > > > to > > > > report, though. So I was working on a patch to do early apic enabling > > > > on > > > > x86_64, and had something working for the old 2.6.18 kernel that we were > > > > origionally testing on. Unfortunately while it worked on 2.6.18 it > > > > failed > > > > miserably on 2.6.24-rc3-mm2, causing check_timer to consistently report > > > > that the > > > > timer interrupt wasn't getting received (even though we could > > > > successfully run > > > > calibrate_delay). Vivek and I were digging into this, when I ran > > > > accross the > > > > description of the hypertransport configuration register in the opteron > > > > specification. It contains a bit that, suprise, configures the ht bus > > > > to either > > > > unicast interrupts delivered accross the ht bus to a single cpu, or to > > > > broadcast > > > > it to all cpus. 
Since it seemed more likely that the 8259 in the nvidia > > > > southbridge was transporting legacy mode interrupts over the ht bus than > > > > directly to cpu0 via an actual wire, I wrote the attached patch to add > > > > a quirk > > > > for nvidia chipsets, which scanned for hypertransport controllers, and > > > > ensured > > > > that that broadcast bit was set. Test results indicate that this > > > > solves the > > > > problem, and kdump kernels boot just fine on the affected system. > > > > > > > > > > Hi Neil, > > > > > > Should we disable this broadcasting feature once we are through? Otherwise > > > in normal systems it might mean extra traffic on hypertransport. There > > > is no need for every interrupt to be broadcasted in normal systems? > > > > > > Thanks > > > Vivek > > > > No, I don't think thats necessecary. Once the apics are enabled, interrupts > > shouldn't travel accross the hypertransport bus anyway, opting instead to > > use > > the dedicated apic bus (at least thats my understanding). > > I think all interrupt message travel on hypertransport. Even after APICS > have been enabled. > > Look at the following document. > > http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24674.pdf > > Have a look at figure 1, figure 2 and section 3.4.2.2 and 3.4.2.3 > > That's a different thing that once IOAPIC has formed the vectored message, > Hypertransport might not touch the destination field. > Ok, that might be the case then. > Having said that, I am wondering what will happen if a system continues > to operate the timer through IOAPIC in ExtInt mode. Will hypertransport > keep on broadcasting that interrupt to every cpu? And every cpu will > process that interrupt. > I don't think so. IIRC once the other cpus are started they all disable the timer interrupt, except for one cpu, opting instead to get the timer tick via ipi, So while they all might see the interrupt packet on the ht bus, only one cpu will process it. 
> Hence, I feel it is safe to restore the broadcast bit back to BIOS value once
> we are through calibrate_delay().
>

I disagree. Looking at what Yinghai said, the default setting for the broadcast bit isn't actually to unicast the interrupt, it's just to set the broadcast mask to 0xF, or to 0xFF. Its use is actually to allow cpus with an extended 8 bit apic id to see interrupts. So it's not so much about directing interrupts to cpu0 as about directing them to the first 16 cpus rather than to all 255 available cpus. From what I've seen in my testing, systems that 'work' already have this bit set by bios, and my quirk patch above does nothing to them. Disabling this bit after calibrate_delay() is going to introduce more uncertainty in systems that have been proven to work. We should leave well enough alone, and just enable the bit if it's off and we see that we are using extended apic ids via bit 18 of the same register, as Yinghai pointed out. By enabling the quirk that way, all we are really doing is bringing into alignment two bits that should arguably be set/cleared in unison anyway.

Regards
Neil

> Thanks
> Vivek
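The rule proposed above (set the broadcast bit only when extended apic ids are in use and the bit is still clear) can be sketched as a pure function on the register value. This is an illustration, not the patch itself. Bit 18 comes from the thread; the position of the broadcast bit (17 here) is an assumption that the thread does not state, and the macro and function names are made up.

```c
#include <stdint.h>

/* Bit 18 is named in the thread (HTTC_APIC_EXT_ID). */
#define SKETCH_APIC_EXT_ID      (UINT32_C(1) << 18)
/* ASSUMPTION: bit position of the broadcast bit; not given in the thread. */
#define SKETCH_APIC_EXT_BRD_CST (UINT32_C(1) << 17)

/*
 * Given the current value of the hypertransport config register,
 * return the value the quirk would leave in it.
 */
uint32_t ht_fix_broadcast(uint32_t htcfg)
{
    /* Only touch systems that use extended (8 bit) apic ids. */
    if ((htcfg & SKETCH_APIC_EXT_ID) && !(htcfg & SKETCH_APIC_EXT_BRD_CST))
        htcfg |= SKETCH_APIC_EXT_BRD_CST;
    return htcfg;
}
```

Systems where the BIOS already set both bits, or where extended apic ids are not in use, come back unchanged, which matches the "leave well enough alone" argument.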
Re: [BUG] 2.6.23-rc3 can't see sd partitions on Alpha
* Bob Tracy <[EMAIL PROTECTED]> wrote:

> > Current state of the source tree is the 6f37ac... version, so I'll
> > start backing out the above diffs in related groups and continue
> > until I've got a working kernel. For lack of an obvious target,
> > I'll start with the seemingly innocuous change to sysctl_check.c.
> > I'll report back when I've got something.
>
> That was quick :-). Backing out the sysctl_check.c diff gives me a
> working kernel. Beats the [EMAIL PROTECTED] out of me how/why, though.
>
> Michael Cree: could you try backing out the diff below from your
> 2.6.24-rc3 tree and see if things are now working for you?
>
> Here's "uname -a", just to confirm (maybe) I'm running on what I say
> works:
>
> Linux smirkin 2.6.24-rc2-g6f37ac79-dirty #2 Fri Dec 7 08:03:12 CST 2007 alpha
>
> Here's the diff I backed out (patch -R). It's short...
>
> diff --git a/kernel/sysctl_check.c b/kernel/sysctl_check.c
> index 5a2f2b2..4abc6d2 100644
> --- a/kernel/sysctl_check.c
> +++ b/kernel/sysctl_check.c
> @@ -738,7 +738,7 @@ static struct trans_ctl_table trans_net_table[] = {
>  	{ NET_ROSE, "rose", trans_net_rose_table },
>  	{ NET_IPV6, "ipv6", trans_net_ipv6_table },
>  	{ NET_X25, "x25", trans_net_x25_table },
> -	{ NET_TR, "tr", trans_net_tr_table },
> +	{ NET_TR, "token-ring", trans_net_tr_table },
>  	{ NET_DECNET, "decnet", trans_net_decnet_table },
>  	/* NET_ECONET not used */
>  	{ NET_SCTP, "sctp", trans_net_sctp_table },

reverting this makes the kernel image shorter by 8 bytes - so perhaps some alignment issue somewhere? Or something overflows? Does any of this actually get used by your bootup?

Ingo
Re: 2.6.23.8: OOM killer kills wrong jobs
On Fri, 07 Dec 2007 10:25:23 +0100 Martin MOKREJŠ <[EMAIL PROTECTED]> wrote:

> Hi,
> first of all, sorry for not being up to date with how the OOM killer
> works. I think there used to be a kernel config option to disable
> OOM killer and instead kill the process which actually asks for the
> memory and supposedly caused the memory lack. That is what I would
> like to have on my system. I a have a 1GB RAM laptop and use t-coffee
> software from
> http://www.tcoffee.org/Projects_home_page/t_coffee_home_page.html
> to do some science. ;)

The OOM killer triggers when there is no way to fulfill a page request. Something has to go and there is no real notion of "right" or "wrong" process at that point.

You can either disable overcommit, in which case you'll get failed malloc() and similar instead, or you can set the OOM priority of tasks yourself so that your specific app of choice always dies first.

Alan
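As an illustration of the second option, here is a minimal sketch of raising a task's OOM priority from C. It assumes the per-process knob /proc/<pid>/oom_adj found on kernels of this era (values from -17 to 15, higher meaning more likely to be killed); newer kernels provide /proc/<pid>/oom_score_adj instead. The helper name set_oom_adjust is made up, and the path is passed in so the same code can target either file. For the first option, strict no-overcommit mode is selected with the vm.overcommit_memory sysctl set to 2.

```c
#include <stdio.h>

/*
 * Write an OOM adjustment value to the given proc file, for example
 * "/proc/self/oom_adj". Returns 0 on success, -1 on failure.
 * (Hypothetical helper, shown for illustration only.)
 */
int set_oom_adjust(const char *path, int value)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (!f)
        return -1;
    ok = fprintf(f, "%d\n", value) > 0;
    /* fclose() flushes the write; a failure here means it did not land. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Calling set_oom_adjust("/proc/self/oom_adj", 15) early in the memory-hungry job would mark that job as the preferred victim, so that it dies before anything else does.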
Re: [PATCH] kexec: force x86_64 arches to boot kdump kernels on boot cpu
On Thu, Dec 06, 2007 at 07:10:23PM -0500, Neil Horman wrote: > On Thu, Dec 06, 2007 at 05:11:43PM -0500, Vivek Goyal wrote: > > On Thu, Dec 06, 2007 at 04:39:51PM -0500, Neil Horman wrote: > > > On Fri, Nov 30, 2007 at 09:51:31AM -0500, Neil Horman wrote: > > > > On Fri, Nov 30, 2007 at 09:42:50AM -0500, Vivek Goyal wrote: > > > > > > > > > > > Thats what I'm doing at the moment. I'm working on a RHEL5 patch at > > > > the moment > > > > (since thats whats on the production system thats failing), and will > > > > forward > > > > port it once its working > > > > > > > > And not to split hairs, but techically thats not our _only_ choice. We > > > > could > > > > force kdump boots on cpu0 as well ;) > > > > > > > > Thanks > > > > Neil > > > > > > > > > Thanks > > > > > Vivek > > > > > > > > > > > > > Sorry to have been quiet on this issue for a few days. Interesting news to > > > report, though. So I was working on a patch to do early apic enabling on > > > x86_64, and had something working for the old 2.6.18 kernel that we were > > > origionally testing on. Unfortunately while it worked on 2.6.18 it failed > > > miserably on 2.6.24-rc3-mm2, causing check_timer to consistently report > > > that the > > > timer interrupt wasn't getting received (even though we could > > > successfully run > > > calibrate_delay). Vivek and I were digging into this, when I ran accross > > > the > > > description of the hypertransport configuration register in the opteron > > > specification. It contains a bit that, suprise, configures the ht bus to > > > either > > > unicast interrupts delivered accross the ht bus to a single cpu, or to > > > broadcast > > > it to all cpus. 
Since it seemed more likely that the 8259 in the nvidia
> > > southbridge was transporting legacy mode interrupts over the ht bus than
> > > directly to cpu0 via an actual wire, I wrote the attached patch to add a quirk
> > > for nvidia chipsets, which scanned for hypertransport controllers, and ensured
> > > that that broadcast bit was set. Test results indicate that this solves the
> > > problem, and kdump kernels boot just fine on the affected system.
> > >
> >
> > Hi Neil,
> >
> > Should we disable this broadcasting feature once we are through? Otherwise
> > in normal systems it might mean extra traffic on hypertransport. There
> > is no need for every interrupt to be broadcasted in normal systems?
> >
> > Thanks
> > Vivek
>
> No, I don't think thats necessecary. Once the apics are enabled, interrupts
> shouldn't travel accross the hypertransport bus anyway, opting instead to use
> the dedicated apic bus (at least thats my understanding).

I think all interrupt message travel on hypertransport. Even after APICS have been enabled.

Look at the following document.

http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24674.pdf

Have a look at figure 1, figure 2 and section 3.4.2.2 and 3.4.2.3

That's a different thing that once IOAPIC has formed the vectored message, Hypertransport might not touch the destination field.

Having said that, I am wondering what will happen if a system continues to operate the timer through IOAPIC in ExtInt mode. Will hypertransport keep on broadcasting that interrupt to every cpu? And every cpu will process that interrupt.

Hence, I feel it is safe to restore the broadcast bit back to BIOS value once we are through calibrate_delay().

Thanks
Vivek
re: 2.6.23.8: OOM killer kills wrong jobs
Martin Mokrejš wrote:

> first of all, sorry for not being up to date with how the OOM killer
> works. I think there used to be a kernel config option to disable
> OOM killer and instead kill the process which actually asks for the
> memory and supposedly caused the memory lack. That is what I would
> like to have on my system. I a have a 1GB RAM laptop

You probably just need to add more swap space on your system. Any time the OOM killer fires, something's wrong with the system, and it's more productive to deal with that than to wish for a more accurate OOM killer; see http://lwn.net/Articles/111408/

When I was working at a company that used embedded Linux, I eventually figured this out, and patched the kernel to panic on OOM conditions; that gave users the right incentive to avoid configuring jobs that caused the system to run out of memory.

- Dan
Re: [PATCH] kexec: force x86_64 arches to boot kdump kernels on boot cpu
On Fri, Dec 07, 2007 at 01:22:04AM -0800, Yinghai Lu wrote:
> On Dec 7, 2007 12:50 AM, Yinghai Lu <[EMAIL PROTECTED]> wrote:
> >
> > On Dec 6, 2007 4:33 PM, Eric W. Biederman <[EMAIL PROTECTED]> wrote:
> ...
> > >
> > > My feel is that if it is for legacy interrupts only it should not be a problem.
> > > Let's investigate and see if we can unconditionally enable this quirk
> > > for all opteron systems.
> >
> > i checked that bit
> >
> > http://www.openbios.org/viewvc/trunk/LinuxBIOSv2/src/northbridge/amd/amdk8/coherent_ht.c?revision=2596&view=markup
> >
> > static void enable_apic_ext_id(u8 node)
> > {
> > #if ENABLE_APIC_EXT_ID==1
> > #warning "FIXME Is the right place to enable apic ext id here?"
> >
> >         u32 val;
> >
> >         val = pci_read_config32(NODE_HT(node), 0x68);
> >         val |= (HTTC_APIC_EXT_SPUR | HTTC_APIC_EXT_ID |
> >                 HTTC_APIC_EXT_BRD_CST);
> >         pci_write_config32(NODE_HT(node), 0x68, val);
> > #endif
> > }
> >
> > that bit only be should be set when apic id is lifted and cpu apid is
> > using 8 bits and that mean broadcast is 0xff instead 0x0f.
> > for example 8 socket dual core system or 4 socket quad core
> > system,that you should make BSP start from 0x04, so cpus apic id will
> > be [0x04, 0x13)
> >
> >
> > So if you want to enable that in early_quirk, you need to
> > make sure apic id is using 8 bits by check if the bit 16 (HTTC_APIC_ID) is
> > set.
>
> it should be bit 18 (HTTC_APIC_EXT_ID)
>
>
> YH

This seems reasonable, I can reroll the patch for this. As I think about it I'm also going to update the patch to make this check occur for any pci class 0600 device from vendor AMD, since it's possible that more than just nvidia chipsets can be affected. I'll repost as soon as I've tested, thanks!

Neil

--
/***
 *Neil Horman
 *Software Engineer
 *Red Hat, Inc. [EMAIL PROTECTED]
 *gpg keyid: 1024D / 0x92A74FA1
 *http://pgp.mit.edu
 ***/
Re: [PATCH 4/4 -mm] kexec based hibernation -v7 : kimgcore
On Dec 7, 2007 8:33 PM, Rafael J. Wysocki <[EMAIL PROTECTED]> wrote:
> On Friday, 7 of December 2007, Huang, Ying wrote:
> > This patch adds a file in proc file system to access the loaded
> > kexec_image, which may contains the memory image of kexeced
> > system. This can be used by kexec based hibernation to create a file
> > image of hibernating kernel, so that a kernel booting process is not
> > needed for each hibernating.
>
> Hm, I'm not sure what you mean.
>
> Can you explain a bit, please?

The normal kexec based hibernation procedure is as follows:

1. kexec_load the kernel image and initramfs
2. jump to hibernating kernel
3. the normal boot process of kexeced kernel
4. jump back to hibernated kernel
5. execute ACPI methods
6. jump to hibernating kernel
7. write memory image of hibernated kernel
8. go to ACPI S4 state

With kimgcore:

A. Prepare a memory image of hibernation kernel:

A.1 kexec_load the kernel image and initramfs
A.2 jump to hibernating kernel
A.3 the normal boot process of kexeced kernel
A.4 jump back to hibernated kernel
A.5 save the memory image of hibernating kernel via kimgcore

The normal hibernate process is as follows:

1. kexec load the kimgcore of hibernating kernel
2. jump to the hibernating kernel
3. execute ACPI methods
4. jump to hibernating kernel
5. write memory image of hibernated kernel
6. go to ACPI S4 state

So the hibernating kernel needs to be booted only once, unless the hardware configuration is changed.

Best Regards,
Huang Ying
Re: question about sata-error on boot.
Hi, On Mittwoch, 7. November 2007, Andrew Morton wrote: > > On Fri, 2 Nov 2007 19:34:20 +0100 "Hemmann, Volker Armin" > > <[EMAIL PROTECTED]> wrote: Hi, > > (cc linux-ide) > > > for some time (and I can't say for how long, but the board is less than a > > month old) I get this error on boot: > > > > [ 42.116273] ahci :00:0a.0: version 2.2 > > [ 42.116482] ACPI: PCI Interrupt Link [LSA0] enabled at IRQ 23 > > [ 42.116653] ACPI: PCI Interrupt :00:0a.0[A] -> Link [LSA0] -> GSI > > 23 (level, low) -> IRQ 23 > > [ 43.119478] ahci :00:0a.0: AHCI 0001.0100 32 slots 4 ports 3 Gbps > > 0xf impl IDE mode > > [ 43.119778] ahci :00:0a.0: flags: 64bit led clo pmp pio > > [ 43.119943] PCI: Setting latency timer of device :00:0a.0 to 64 > > [ 43.120149] scsi0 : ahci > > [ 43.120365] scsi1 : ahci > > [ 43.120556] scsi2 : ahci > > [ 43.120741] scsi3 : ahci > > [ 43.120927] ata1: SATA max UDMA/133 cmd 0xc2014100 ctl > > 0x bmdma 0x irq 315 > > [ 43.121227] ata2: SATA max UDMA/133 cmd 0xc2014180 ctl > > 0x bmdma 0x irq 315 > > [ 43.121526] ata3: SATA max UDMA/133 cmd 0xc2014200 ctl > > 0x bmdma 0x irq 315 > > [ 43.121826] ata4: SATA max UDMA/133 cmd 0xc2014280 ctl > > 0x bmdma 0x irq 315 > > [ 43.934296] ata1: softreset failed (1st FIS failed) > > [ 43.934461] ata1: reset failed (errno=-5), retrying in 10 secs > > [ 53.885194] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300) > > [ 53.885890] ata1.00: ATA-7: WDC WD1600JS-00MHB1, 10.02E01, max > > UDMA/133 [ 53.886056] ata1.00: 312581808 sectors, multi 16: LBA48 > > [ 53.886804] ata1.00: configured for UDMA/133 > > [ 54.201147] ata2: SATA link down (SStatus 0 SControl 300) > > [ 54.517101] ata3: SATA link down (SStatus 0 SControl 300) > > [ 54.833055] ata4: SATA link down (SStatus 0 SControl 300) this is gone with 2.6.22.13 an 2.6.23.9: [ 33.277039] scsi0 : ahci [ 33.277262] scsi1 : ahci [ 33.277454] scsi2 : ahci [ 33.277645] scsi3 : ahci [ 33.277826] ata1: SATA max UDMA/133 cmd 0xc2020100 ctl 0x bmdma 0x irq 315 [ 33.278120] 
ata2: SATA max UDMA/133 cmd 0xc2020180 ctl 0x bmdma 0x irq 315
[ 33.278414] ata3: SATA max UDMA/133 cmd 0xc2020200 ctl 0x bmdma 0x irq 315
[ 33.278708] ata4: SATA max UDMA/133 cmd 0xc2020280 ctl 0x bmdma 0x irq 315
[ 33.751855] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 33.752659] ata1.00: ATA-7: WDC WD1600JS-00MHB1, 10.02E01, max UDMA/133
[ 33.752821] ata1.00: 312581808 sectors, multi 16: LBA48
[ 33.753574] ata1.00: configured for UDMA/133
[ 34.067809] ata2: SATA link down (SStatus 0 SControl 300)
[ 34.383762] ata3: SATA link down (SStatus 0 SControl 300)
[ 34.699717] ata4: SATA link down (SStatus 0 SControl 300)
[ 34.700029] scsi 0:0:0:0: Direct-Access ATA WDC WD1600JS-00M 10.0 PQ: 0 ANSI: 5
[ 34.700377] sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors (160042 MB)
[ 34.700544] sd 0:0:0:0: [sda] Write Protect is off
[ 34.700703] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 34.700712] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 34.701026] sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors (160042 MB)
[ 34.701191] sd 0:0:0:0: [sda] Write Protect is off
[ 34.701350] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 34.701358] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 34.701651] sda: sda1 sda2 sda3 sda4 < sda5 sda6 >
Re: 2.6.24-rc4-mm1
On Wed, 5 Dec 2007, David Miller wrote:

> From: Reuben Farrelly <[EMAIL PROTECTED]>
> Date: Thu, 06 Dec 2007 17:59:37 +1100
>
> > On 5/12/2007 4:17 PM, Andrew Morton wrote:
> > > - Lots of device IDs have been removed from the e1000 driver and moved over
> > >   to e1000e. So if your e1000 stops working, you forgot to set
> > >   CONFIG_E1000E.
> >
> > This non fatal oops which I have just noticed may be related to this change then
> > - certainly looks networking related.
> >
> > WARNING: at net/ipv4/tcp_input.c:2518 tcp_fastretrans_alert()
> > Pid: 0, comm: swapper Not tainted 2.6.24-rc4-mm1 #1
> >
> > Call Trace:
> > [] tcp_fastretrans_alert+0x229/0xe63
> > [] tcp_ack+0xa3f/0x127d
> > [] tcp_rcv_established+0x55f/0x7f8
> > [] tcp_v4_do_rcv+0xdb/0x3a7
> > [] :nf_conntrack:nf_ct_deliver_cached_events+0x75/0x99
>
> No, it's from TCP assertions and changes added by Ilpo to the
> net-2.6.25 tree recently.

Yeah, this is (very likely) due to the new SACK processing (in net-2.6.25). I'll look at what could go wrong with the fack_count calculations, which are the most likely cause (earlier I found one out-of-place retransmission segment in one of my test cases, which already indicated that something is incorrect with them, but I didn't have time to debug it yet).

Thanks for the report. Some info about how easily you can reproduce it, and a couple of sentences about the test case, might be useful later on when evaluating the fix.

--
 i.
Possible EXT2 race
On linux-2.6.22.1, executing the following script while the mailer is writing to /var/spool/mail/linux-os...

#!/bin/bash
while true ; do >/var/spool/mail/linux-os; sleep 1; done

...will cause the following errors to occur.

Dec 7 04:05:55 chaos kernel: sd 0:0:1:0: [sdb] Sense Key : No Sense [deferred]
Dec 7 04:05:55 chaos kernel: Info fld=0x1980240
Dec 7 04:05:55 chaos kernel: sd 0:0:1:0: [sdb] Add. Sense: Peripheral device write fault
Dec 7 04:08:13 chaos kernel: attempt to access beyond end of device
Dec 7 04:08:13 chaos kernel: sdb1: rw=0, want=29687515944, limit=33736437
Dec 7 04:08:13 chaos kernel: EXT2-fs error (device sdb1): ext2_xattr_delete_inode: inode 656387: block -584027804 read error
Dec 7 04:08:13 chaos kernel: EXT2-fs error (device sdb1): ext2_free_blocks: Freeing blocks not in datazone - block = 3710940964, count = 1
Dec 7 04:08:13 chaos kernel: EXT2-fs error (device sdb1): ext2_free_blocks: Freeing blocks not in datazone - block = 4294967295, count = 1
Dec 7 04:08:13 chaos kernel: EXT2-fs error (device sdb1): ext2_free_blocks: Freeing blocks not in datazone - block = 4294967295, count = 1
Dec 7 04:08:13 chaos kernel: EXT2-fs error (device sdb1): ext2_free_blocks: Freeing blocks not in datazone - block = 3710940980, count = 1
Dec 7 04:08:13 chaos kernel: EXT2-fs error (device sdb1): ext2_free_blocks: Freeing blocks not in datazone - block = 3710940980, count = 1
Dec 7 04:08:13 chaos kernel: EXT2-fs error (device sdb1): ext2_free_blocks: bit already cleared for block 1
Dec 7 04:08:13 chaos kernel: EXT2-fs error (device sdb1): ext2_free_blocks: Freeing blocks not in datazone - block = 3710941012, count = 1
Dec 7 04:08:13 chaos kernel: attempt to access beyond end of device
Dec 7 04:08:13 chaos kernel: sdb1: rw=0, want=29687528104, limit=33736437
Dec 7 04:08:13 chaos kernel: EXT2-fs error (device sdb1): ext2_free_branches: Read failure, inode=656399, block=-584026284
Dec 7 04:08:13 chaos kernel: attempt to access beyond end of device
Dec 7 04:08:13 chaos kernel: sdb1: rw=0, want=29687529288, limit=33736437
Dec 7 04:08:15 chaos kernel: EXT2-fs error (device sdb1): ext2_xattr_delete_inode: inode 656400: block -584026136 read error
Dec 7 04:08:18 chaos kernel: EXT2-fs error (device sdb1): ext2_xattr_delete_inode: inode 656403: bad block 30188

Caution is advised when testing because this destroyed a filesystem, making it unfixable by `fsck`.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.22.1 on an i686 machine (5588.29 BogoMips).
My book : http://www.AbominableFirebug.com/
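A side note on reading the log above: the negative block numbers and the "want=" sector values appear to describe the same bogus block pointers. The sketch below is only a worked check of that arithmetic. It assumes 4 KiB filesystem blocks on 512-byte sectors (8 sectors per block), which the report does not state but which makes the numbers line up; the function names are made up.

```c
#include <stdint.h>

/* ASSUMPTION: 4 KiB filesystem blocks on 512-byte sectors. */
#define SKETCH_SECTORS_PER_BLOCK 8

/* Reinterpret a block number that was printed as a signed 32-bit value. */
uint32_t block_as_unsigned(int32_t printed)
{
    return (uint32_t)printed;
}

/* First sector past the end of the block, i.e. the "want=" value. */
uint64_t want_sector(uint32_t block)
{
    return ((uint64_t)block + 1) * SKETCH_SECTORS_PER_BLOCK;
}
```

For example, block -584027804 is 3710939492 when read as unsigned, and (3710939492 + 1) * 8 = 29687515944, the first "want=" value. Likewise -584026284 is 3710941012, which also shows up in an ext2_free_blocks message, and gives want=29687528104. All of these lie far beyond the device limit of 33736437 sectors.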
Re: everything in wait_for_completion, what is my system doing?
Hello Andrew, thanks for your help! On Friday 07 December 2007 02:09:11 Andrew Morton wrote: > On Wed, 5 Dec 2007 21:44:54 +0100 > > Bernd Schubert <[EMAIL PROTECTED]> wrote: > > after scsi-recovery a system here went into some kind lock-up, everything > > seems to be in wait_for_completion(). Please see the attached > > blocked_states.txt and all_states.txt files. > > This is 2.6.22.12, I can easily find out the line numbers if required. > > > > Any help is highly appreciated. > > Please cc linux-scsi on scsi-related reports. Sorry, I these traces confused me a bit. I had absolutely no idea about a possible reason. > > > [blocked_states.txt text/plain (20.5KB)] > > [generate break] > > [ 1818.566436] SysRq : Show Blocked State > > [ 1818.570260] > > [ 1818.570261] free > > sibling [ 1818.579253] task PCstack pid > > father child younger older [ 1818.586987] events/7 D > > 0155dd642280 026 2 (L-TLB) [ 1818.593747] > > 81012b529ac0 0046 810128280d18 [ > > 1818.601321] 8100ba2376f8 81012b689630 81012aff76b0 > > 00078023e215 [ 1818.608870] 00010003ca14 > > 810001065400 000780430c13 [ 1818.616222] Call Trace: > > [ 1818.618925] [] io_schedule+0x28/0x36 > > [ 1818.624207] [] get_request_wait+0x104/0x158 > > [ 1818.630112] [] blk_get_request+0x36/0x6b > > [ 1818.635755] [] scsi_execute+0x51/0x129 > > [ 1818.641240] [] > > :scsi_transport_spi:spi_execute+0x87/0xf8 [ 1818.648271] > > [] > > :scsi_transport_spi:spi_dv_device_echo_buffer+0x181/0x27d [ 1818.656739] > > [] :scsi_transport_spi:spi_dv_retrain+0x4e/0x240 [ > > 1818.664139] [] > > :scsi_transport_spi:spi_dv_device+0x615/0x69c [ 1818.671542] > > [] :mptspi:mptspi_dv_device+0xb3/0x14b [ 1818.678042] > > [] :mptspi:mptspi_dv_renegotiate_work+0xcb/0xef [ > > 1818.685348] [] run_workqueue+0x8e/0x120 > > [ 1818.690905] [] worker_thread+0x106/0x117 > > [ 1818.696540] [] kthread+0x4b/0x82 > > [ 1818.701474] [] child_rip+0xa/0x12 > > [ 1818.706495] > > [ 1818.708022] unionfs-fuse- D 01a76ef63463 0 1119 1 > > (NOTLB) [ 
1818.714764] 810129765988 0082 > > 80337e22 [ 1818.722329] 8101297658c8 > > 81012b652f20 810129eec810 0006 [ 1818.729895] > > 00010005204e 81000105c400 000680337c3e [ > > 1818.737249] Call Trace: > > [ 1818.739953] [] schedule_timeout+0x8a/0xb6 > > [ 1818.745673] [] io_schedule_timeout+0x28/0x36 > > [ 1818.751664] [] congestion_wait+0x9d/0xc2 > > [ 1818.757300] [] > > balance_dirty_pages_ratelimited_nr+0x196/0x22f [ 1818.764781] > > [] generic_file_buffered_write+0x52a/0x60d [ > > 1818.771641] [] > > __generic_file_aio_write_nolock+0x45a/0x491 [ 1818.778852] > > [] generic_file_aio_write+0x61/0xc1 [ 1818.785101] > > [] nfs_file_write+0x138/0x1b7 > > [ 1818.790822] [] do_sync_write+0xcc/0x112 > > [ 1818.796372] [] vfs_write+0xc3/0x165 > > [ 1818.801575] [] sys_pwrite64+0x68/0x96 > > [ 1818.806959] [] system_call+0x7e/0x83 > > [ 1818.812250] [<2b4eeec3ea73>] > > > > [snippage] > > Possibly your device driver had conniptions and stopped generating > completion interrupts. > > Which driver is in use? This is this time easily visible from the traces (mptspi_dv_device) ;) So its the mpt driver, we are using LSI22320 cards (I CC'ed Eric). > > I don't suppose it is repeatable. Thats a clear "yes and no". Exactly this state we have got two or three times during an exhausting hardware stress test over the last weeks (with real and with simulated errors), but its not easily reproducible. Furthermore, the hardware will go into production soon and I don't have the chance to simulate further errors. However, we can easily get a similar state just on a raid6-rebuild (with high end hardware though. (You probably never won't run into into it with normal disks, we are doing software-raid over a bunch of several hardware raid systems). In the raid6-rebuild case the system is not completely locked up, just mostly. Somehow raid6-rebuild is still working, we can see this by the io usage status of the hardware-raids, but the system is completely blocked otherwise. 
Only pings and sysrq's are working. Thanks, Bernd -- Bernd Schubert Q-Leap Networks GmbH
[PATCH] kbuild: implement modules.order, take #2
When multiple built-in modules (especially drivers) provide the same capability, they're prioritized by link order specified by the order listed in Makefile. This implicit ordering is lost for loadable modules. When driver modules are loaded by udev, what comes first in modules.alias file is selected. However, the order in this file is indeterministic (depends on filesystem listing order of installed modules). This causes confusion. The solution is two-parted. This patch updates kbuild such that it generates and installs modules.order which contains the name of modules ordered according to Makefile. The second part is update to depmod such that it generates output files according to this file. Note that both obj-y and obj-m subdirs can contain modules and ordering information between those two are lost from beginning. Currently obj-y subdirs are put before obj-m subdirs. Sam Ravnborg cleaned up Makefile modifications and suggested using awk to remove duplicate lines from modules.order instead of using separate C program. Signed-off-by: Tejun Heo <[EMAIL PROTECTED]> Cc: Sam Ravnborg <[EMAIL PROTECTED]> Cc: Bill Nottingham <[EMAIL PROTECTED]> Cc: Rusty Russell <[EMAIL PROTECTED]> Cc: Greg Kroah-Hartman <[EMAIL PROTECTED]> Cc: Kay Sievers <[EMAIL PROTECTED]> --- Makefile |8 +++- scripts/Makefile.build | 17 - scripts/Makefile.lib |6 ++ 3 files changed, 29 insertions(+), 2 deletions(-) diff --git a/Makefile b/Makefile index 92dc3cb..1542dd2 100644 --- a/Makefile +++ b/Makefile @@ -1020,9 +1020,14 @@ ifdef CONFIG_MODULES all: modules # Build modules +# +# A module can be listed more than once in obj-m resulting in +# duplicate lines in modules.order files. Those are removed +# using awk while concatenating to the final file. 
PHONY += modules modules: $(vmlinux-dirs) $(if $(KBUILD_BUILTIN),vmlinux) + $(Q)$(AWK) '!x[$$0]++' $(vmlinux-dirs:%=$(objtree)/%/modules.order) > $(objtree)/modules.order @echo ' Building modules, stage 2.'; $(Q)$(MAKE) -f $(srctree)/scripts/Makefile.modpost @@ -1050,6 +1055,7 @@ _modinst_: rm -f $(MODLIB)/build ; \ ln -s $(objtree) $(MODLIB)/build ; \ fi + @cp -f $(objtree)/modules.order $(MODLIB)/ $(Q)$(MAKE) -f $(srctree)/scripts/Makefile.modinst # This depmod is only for convenience to give the initial @@ -1109,7 +1115,7 @@ clean: archclean $(clean-dirs) @find . $(RCS_FIND_IGNORE) \ \( -name '*.[oas]' -o -name '*.ko' -o -name '.*.cmd' \ -o -name '.*.d' -o -name '.*.tmp' -o -name '*.mod.c' \ - -o -name '*.symtypes' \) \ + -o -name '*.symtypes' -o -name 'modules.order' \) \ -type f -print | xargs rm -f # mrproper - Delete all generated files, including .config diff --git a/scripts/Makefile.build b/scripts/Makefile.build index de9836e..875cbdb 100644 --- a/scripts/Makefile.build +++ b/scripts/Makefile.build @@ -83,10 +83,12 @@ ifneq ($(strip $(obj-y) $(obj-m) $(obj-n) $(obj-) $(lib-target)),) builtin-target := $(obj)/built-in.o endif +modorder-target := $(obj)/modules.order + # We keep a list of all modules in $(MODVERDIR) __build: $(if $(KBUILD_BUILTIN),$(builtin-target) $(lib-target) $(extra-y)) \ -$(if $(KBUILD_MODULES),$(obj-m)) \ +$(if $(KBUILD_MODULES),$(obj-m) $(modorder-target)) \ $(subdir-ym) $(always) @: @@ -276,6 +278,19 @@ targets += $(builtin-target) endif # builtin-target # +# Rule to create modules.order file +# +# Create commands to either record .ko file or cat modules.order from +# a subdirectory +modorder-cmds =\ + $(foreach m, $(modorder), \ + $(if $(filter %/modules.order, $m), \ + cat $m;, echo kernel/$m;)) + +$(modorder-target): $(subdir-ym) FORCE + $(Q)(cat /dev/null; $(modorder-cmds)) > $@ + +# # Rule to compile a set of .o files into one .a file # ifdef lib-target diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib index 
3c5e88b..8e44023 100644 --- a/scripts/Makefile.lib +++ b/scripts/Makefile.lib @@ -25,6 +25,11 @@ lib-y := $(filter-out $(obj-y), $(sort $(lib-y) $(lib-m))) # o if we encounter foo/ in $(obj-m), remove it from $(obj-m) # and add the directory to the list of dirs to descend into: $(subdir-m) +# Determine modorder. +# Unfortunately, we don't have information about ordering between -y +# and -m subdirs. Just put -y's first. +modorder := $(patsubst %/,%/modules.order, $(filter %/, $(obj-y)) $(obj-m:.o=.ko)) + __subdir-y := $(patsubst %/,%,$(filter %/, $(obj-y))) subdir-y += $(__subdir-y) __subdir-m := $(patsubst %/,%,$(filter %/, $(obj-m))) @@ -64,6 +69,7 @@ real-objs-m := $(foreach m, $(obj-m), $(if $(strip $($(m:.o=-objs)) $($(m:.o=-y) extra-y:= $(addprefix $(obj)/,$(
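[Editor's sketch, not part of the patch] The awk expression '!x[$$0]++' used in the top-level Makefile keeps the first occurrence of each line and drops later repeats, so the relative order of modules survives the concatenation. A minimal C illustration of the same idea follows; the function name dedup_keep_first is hypothetical, and the quadratic scan is chosen only for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Keep only the first occurrence of each string, preserving the original
 * order. The array is compacted in place and the new element count is
 * returned. This mirrors what awk '!x[$0]++' does line by line.
 * Quadratic in n; fine for an illustration.
 */
size_t dedup_keep_first(const char **lines, size_t n)
{
    size_t out = 0;

    assert(lines != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        size_t j;

        /* Has this string already been kept? */
        for (j = 0; j < out; j++)
            if (strcmp(lines[j], lines[i]) == 0)
                break;

        /* Not seen before: keep it at the next free slot. */
        if (j == out)
            lines[out++] = lines[i];
    }
    return out;
}
```

A module listed twice in obj-m therefore ends up at the position of its first listing, which is the behaviour the comment in the Makefile hunk describes.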
Re: [RFC][POWERPC] Provide a way to protect 4k subpages when using 64k pages
On Friday 07 December 2007, Paul Mackerras wrote: > I have re-purposed the ioperm system call for this. The old ioperm > system call never did anything (except return an ENOSYS error) and in > fact never could have actually been useful for anything on the PowerPC > architecture, so nothing ever used it. Couldn't there be a program that relies on ioperm to return -ENOSYS on powerpc in order to fall back on some other method of I/O access? The risk of actually breaking something is certainly low, but I think you can never be sure here, so why not use a new syscall number? Arnd <><
Re: [PATCH] scheduler: fix x86 regression in native_sched_clock
On Dec 7, 2007 12:18 PM, Guillaume Chazarain <[EMAIL PROTECTED]> wrote: > Any pointer to it? Nevermind, I found it ... in this same thread :-( -- Guillaume
Re: NFSv2/3 broken exporting/mounting (permission denied) in 2.6.24-rc4
On Fri, Dec 07, 2007 at 11:54:38AM +0100, Mikael Pettersson wrote: > On Thu, 6 Dec 2007 21:20:41 -0500, Erez Zadok wrote: > > I get a "permission denied" when trying to mount a localhost nfsv2/3 > > exported volume, on v2.6.24-rc4-124-gf194d13. It works w/ nfsv4 mounting. > > It worked fine in 2.6.24-rc3. Here's a sequence of ops I tried: > > > > # mount -t ext2 /dev/hdb1 /n/lower/b0 > > # exportfs -o no_root_squash,rw localhost:/n/lower/b0 > > # mount -t nfs -o nfsvers=3 localhost:/n/lower/b0 /mnt > > I'm seeing something similar too. NFSv3 export of an ext3 partition > to another machine in my lan fails (client gets permission denied) > when the server runs 2.6.24-rc4. It worked fine in 2.6.24-rc3. > > There's no NFSv4 of any kind on either client or server. And you're not varying the client at all, you're only changing the kernel version on the server? There are literally no commits between v2.6.24-rc3 and v2.6.24-rc4 which touch fs/nfsd/. What filesystem are you exporting? Are the nfs-utils versions the same in both cases? Also, could you get a network trace showing the failure? --b.
Re: [PATCH] kexec: force x86_64 arches to boot kdump kernels on boot cpu
On Fri, Dec 07, 2007 at 10:16:23AM -0500, Vivek Goyal wrote: > On Fri, Dec 07, 2007 at 09:53:15AM -0500, Neil Horman wrote: > > On Fri, Dec 07, 2007 at 09:39:44AM -0500, Vivek Goyal wrote: > > > On Thu, Dec 06, 2007 at 07:10:23PM -0500, Neil Horman wrote: > > > > On Thu, Dec 06, 2007 at 05:11:43PM -0500, Vivek Goyal wrote: > > > > > On Thu, Dec 06, 2007 at 04:39:51PM -0500, Neil Horman wrote: > > > > > > On Fri, Nov 30, 2007 at 09:51:31AM -0500, Neil Horman wrote: > > > > > > > On Fri, Nov 30, 2007 at 09:42:50AM -0500, Vivek Goyal wrote: > > > > > > > > > > > > > > > > > > > > Thats what I'm doing at the moment. I'm working on a RHEL5 patch > > > > > > > at the moment > > > > > > > (since thats whats on the production system thats failing), and > > > > > > > will forward > > > > > > > port it once its working > > > > > > > > > > > > > > And not to split hairs, but techically thats not our _only_ > > > > > > > choice. We could > > > > > > > force kdump boots on cpu0 as well ;) > > > > > > > > > > > > > > Thanks > > > > > > > Neil > > > > > > > > > > > > > > > Thanks > > > > > > > > Vivek > > > > > > > > > > > > > > > > > > > > > > > > > Sorry to have been quiet on this issue for a few days. Interesting > > > > > > news to > > > > > > report, though. So I was working on a patch to do early apic > > > > > > enabling on > > > > > > x86_64, and had something working for the old 2.6.18 kernel that we > > > > > > were > > > > > > origionally testing on. Unfortunately while it worked on 2.6.18 it > > > > > > failed > > > > > > miserably on 2.6.24-rc3-mm2, causing check_timer to consistently > > > > > > report that the > > > > > > timer interrupt wasn't getting received (even though we could > > > > > > successfully run > > > > > > calibrate_delay). Vivek and I were digging into this, when I ran > > > > > > accross the > > > > > > description of the hypertransport configuration register in the > > > > > > opteron > > > > > > specification. 
It contains a bit that, suprise, configures the ht > > > > > > bus to either > > > > > > unicast interrupts delivered accross the ht bus to a single cpu, or > > > > > > to broadcast > > > > > > it to all cpus. Since it seemed more likely that the 8259 in the > > > > > > nvidia > > > > > > southbridge was transporting legacy mode interrupts over the ht bus > > > > > > than > > > > > > directly to cpu0 via an actual wire, I wrote the attached patch to > > > > > > add a quirk > > > > > > for nvidia chipsets, which scanned for hypertransport controllers, > > > > > > and ensured > > > > > > that that broadcast bit was set. Test results indicate that this > > > > > > solves the > > > > > > problem, and kdump kernels boot just fine on the affected system. > > > > > > > > > > > > > > > > Hi Neil, > > > > > > > > > > Should we disable this broadcasting feature once we are through? > > > > > Otherwise > > > > > in normal systems it might mean extra traffic on hypertransport. There > > > > > is no need for every interrupt to be broadcasted in normal systems? > > > > > > > > > > Thanks > > > > > Vivek > > > > > > > > No, I don't think thats necessecary. Once the apics are enabled, > > > > interrupts > > > > shouldn't travel accross the hypertransport bus anyway, opting instead > > > > to use > > > > the dedicated apic bus (at least thats my understanding). > > > > > > I think all interrupt message travel on hypertransport. Even after APICS > > > have been enabled. > > > > > > Look at the following document. > > > > > > http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24674.pdf > > > > > > Have a look at figure 1, figure 2 and section 3.4.2.2 and 3.4.2.3 > > > > > > That's a different thing that once IOAPIC has formed the vectored message, > > > Hypertransport might not touch the destination field. > > > > > Ok, that might be the case then. 
> > > > > Having said that, I am wondering what will happen if a system continues > > > to operate the timer through IOAPIC in ExtInt mode. Will hypertransport > > > keep on broadcasting that interrupt to every cpu? And every cpu will > > > process that interrupt. > > > > > I don't think so. IIRC once the other cpus are started they all disable the > > timer interrupt, except for one cpu, opting instead to get the timer tick > > via > > ipi, So while they all might see the interrupt packet on the ht bus, only > > one > > cpu will process it. > > > > Does LAPIC allow to disable a specific vector and not accept interrupts? I > don't think so. If a timer interrupt is broadcasted to every cpu I think > everybody will accept it (like broadcast IPI). That's why intelligence > is built into IOAPIC and direct interrupts to a cpu or group of cpu. > See disable_APIC_timer(). It seems to set the mask bit in the APIC_LVTT entry. > I am just trying to understand the functionality better. Can somebody help me > understand how do we make sure that same timer interrupt is not processed by
RE: ptrace API extensions for BTS
>From: Andi Kleen [mailto:[EMAIL PROTECTED] >Sent: Freitag, 7. Dezember 2007 14:04 >With Out-of-order CPUs exact global metrics are pretty difficult. >At which point of the instruction execution would you measure? All I want to do is order the execution chunks of different threads. Taking two snapshots somewhere near the beginning and the end of context switching should be good enough. There's all the scheduler code in between (or at least the context switch code). I don't think I need to worry about the exact point during instruction execution. I don't think it makes sense to try to correlate instructions from different threads. It would be a wonderful feature to show a synchronous trace across multiple threads. But that would require you to measure time for each instruction. I don't think that's feasible without reducing performance to single stepping;-) >Anyways if RDTSC doesn't work the only global alternatives are >much slower >(like southbridge timers) or very inaccurate (jiffies) Would jiffies be a metric that works across cpu's? At the granularity that I want to measure, I guess that accuracy is not important at all. >I would just drop it since it'll likely always be somewhat misleading. I guess I will (have to) drop it if it cannot be used for what I intended. thanks and regards, markus. - Intel GmbH Dornacher Strasse 1 85622 Feldkirchen/Muenchen Germany Sitz der Gesellschaft: Feldkirchen bei Muenchen Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer Registergericht: Muenchen HRB 47456 Ust.-IdNr. VAT Registration No.: DE129385895 Citibank Frankfurt (BLZ 502 109 00) 600119052 This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies. 
Re: [Patch] net/xfrm/xfrm_policy.c: Some small improvements
David Miller wrote: From: Richard Knutsson <[EMAIL PROTECTED]> Date: Thu, 06 Dec 2007 15:37:46 +0100 David Miller wrote: But this time I'll just let you know up front that I don't see much value in this patch. It is not a clear improvement to replace int's with bool's in my mind and the other changes are just whitespace changes. Is it not an improvement to distinct booleans from actual values? Do you use integers for ASCII characters too? It can also avoid some potential bugs like the 'if (i == TRUE)'... What is wrong with 'size_t' (since it is unsigned, compared to (some) 'int')? When you say "int found;" is there any doubt in your mind that this integer is going to hold a 1 or a 0 depending upon whether we "found" something? That's the problem I have with these kinds of patches, they do not increase clarity, it's just pure mindless edits. But is there not a good thing if also the compiler knows + names are sometime not as clear as that one? In new code, fine, use booleans if you want. I would even accept that it helps to change to boolean for arguments to functions that are global in scope. But not for function local variables in cases like this. Oh, I see your point now. Believed it to be yet another 'booleans is not C idiom'. Sorry about the noise Richard Knutsson
Re: [PATCH] scheduler: fix x86 regression in native_sched_clock
* Ingo Molnar <[EMAIL PROTECTED]> wrote: > > Stefano, could you try this ontop of a recent-ish Linus tree - does > > this resolve all issues? (without introducing new ones ;-) > > updated version attached below. third update. the cpufreq callbacks are not quite OK yet. Ingo Index: linux/arch/arm/kernel/time.c === --- linux.orig/arch/arm/kernel/time.c +++ linux/arch/arm/kernel/time.c @@ -79,17 +79,6 @@ static unsigned long dummy_gettimeoffset } #endif -/* - * An implementation of printk_clock() independent from - * sched_clock(). This avoids non-bootable kernels when - * printk_clock is enabled. - */ -unsigned long long printk_clock(void) -{ - return (unsigned long long)(jiffies - INITIAL_JIFFIES) * - (10 / HZ); -} - static unsigned long next_rtc_update; /* Index: linux/arch/ia64/kernel/time.c === --- linux.orig/arch/ia64/kernel/time.c +++ linux/arch/ia64/kernel/time.c @@ -344,33 +344,6 @@ udelay (unsigned long usecs) } EXPORT_SYMBOL(udelay); -static unsigned long long ia64_itc_printk_clock(void) -{ - if (ia64_get_kr(IA64_KR_PER_CPU_DATA)) - return sched_clock(); - return 0; -} - -static unsigned long long ia64_default_printk_clock(void) -{ - return (unsigned long long)(jiffies_64 - INITIAL_JIFFIES) * - (10/HZ); -} - -unsigned long long (*ia64_printk_clock)(void) = &ia64_default_printk_clock; - -unsigned long long printk_clock(void) -{ - return ia64_printk_clock(); -} - -void __init -ia64_setup_printk_clock(void) -{ - if (!(sal_platform_features & IA64_SAL_PLATFORM_FEATURE_ITC_DRIFT)) - ia64_printk_clock = ia64_itc_printk_clock; -} - /* IA64 doesn't cache the timezone */ void update_vsyscall_tz(void) { Index: linux/arch/x86/kernel/process_32.c === --- linux.orig/arch/x86/kernel/process_32.c +++ linux/arch/x86/kernel/process_32.c @@ -113,10 +113,19 @@ void default_idle(void) smp_mb(); local_irq_disable(); - if (!need_resched()) + if (!need_resched()) { + ktime_t t0, t1; + u64 t0n, t1n; + + t0 = ktime_get(); + t0n = ktime_to_ns(t0); safe_halt();/* enables interrupts 
racelessly */ - else - local_irq_enable(); + local_irq_disable(); + t1 = ktime_get(); + t1n = ktime_to_ns(t1); + sched_clock_idle_wakeup_event(t1n - t0n); + } + local_irq_enable(); current_thread_info()->status |= TS_POLLING; } else { /* loop is done by the caller */ Index: linux/arch/x86/lib/delay_32.c === --- linux.orig/arch/x86/lib/delay_32.c +++ linux/arch/x86/lib/delay_32.c @@ -38,17 +38,21 @@ static void delay_loop(unsigned long loo :"0" (loops)); } -/* TSC based delay: */ +/* cpu_clock() [TSC] based delay: */ static void delay_tsc(unsigned long loops) { - unsigned long bclock, now; + unsigned long long start, stop, now; + int this_cpu; + + preempt_disable(); + + this_cpu = smp_processor_id(); + start = now = cpu_clock(this_cpu); + stop = start + loops; + + while ((long long)(stop - now) > 0) + now = cpu_clock(this_cpu); - preempt_disable(); /* TSC's are per-cpu */ - rdtscl(bclock); - do { - rep_nop(); - rdtscl(now); - } while ((now-bclock) < loops); preempt_enable(); } Index: linux/arch/x86/lib/delay_64.c === --- linux.orig/arch/x86/lib/delay_64.c +++ linux/arch/x86/lib/delay_64.c @@ -26,19 +26,28 @@ int read_current_timer(unsigned long *ti return 0; } -void __delay(unsigned long loops) +/* cpu_clock() [TSC] based delay: */ +static void delay_tsc(unsigned long loops) { - unsigned bclock, now; + unsigned long long start, stop, now; + int this_cpu; + + preempt_disable(); + + this_cpu = smp_processor_id(); + start = now = cpu_clock(this_cpu); + stop = start + loops; + + while ((long long)(stop - now) > 0) + now = cpu_clock(this_cpu); - preempt_disable(); /* TSC's are pre-cpu */ - rdtscl(bclock); - do { - rep_nop(); - rdtscl(now); - } - while ((now-bclock) < loops); preempt_enable(); } + +void __delay(unsigned long loops) +{ + delay_tsc(loops); +} EXPORT_SYMBOL(__delay); inline void __const_udelay(unsi
Re: [BUG] 2.6.23-rc3 can't see sd partitions on Alpha
* Andrew Morton <[EMAIL PROTECTED]> wrote: > > then the test of whether I bisected correctly is as simple as > > applying the commit and seeing if things break, because I'm running > > on the kernel corresponding to > > 2f1f53bdc6531696934f6ee7bbdfa2ab4f4f62a3 right now. Let me give > > that a try and I'll report back. Worst case, I'll have to start > > over and write off the past four days... > > Gad. I trust the second time will be faster. > > git-bisect _is_ very error prone. I find one of the problems is that > each step is so far apart in time that you forget what you were doing. > Did I remember to test that iteration? Did I install the right > kernel? etc. i have a fully automated bootup-hang bisection script. It is based on "git-bisect run". I run the script, it builds and boots kernels fully automatically, and when the bootup fails (the script notices that via the serial log, which it continuously watches - or via a timeout, if the system does not come up within 10 minutes it's a "bad" kernel), the script raises my attention via a beep and i power cycle the test box. (yeah, i should make use of a managed power outlet to 100% automate it) So i dont have to make a single manual decision anytime during the bisection. But the scripts are very much tied to my ad-hoc test environment so it would not be of much general use. Ingo
Re: [PATCH] scheduler: fix x86 regression in native_sched_clock
* Nick Piggin <[EMAIL PROTECTED]> wrote: > My patch should fix the worst cpufreq sched_clock jumping issue I > think. but it degrades the precision of sched_clock() and has other problems as well. cpu_clock() is the right interface to use for such things. Ingo
Re: [PATCH] scheduler: fix x86 regression in native_sched_clock
On Friday 07 December 2007 19:45, Ingo Molnar wrote: > * Stefano Brivio <[EMAIL PROTECTED]> wrote: > > This patch fixes a regression introduced by: > > > > commit bb29ab26863c022743143f27956cc0ca362f258c > > Author: Ingo Molnar <[EMAIL PROTECTED]> > > Date: Mon Jul 9 18:51:59 2007 +0200 > > > > This caused the jiffies counter to leap back and forth on cpufreq > > changes on my x86 box. I'd say that we can't always assume that TSC > > does "small errors" only, when marked unstable. On cpufreq changes > > these errors can be huge. > > ah, printk_clock() still uses sched_clock(), not jiffies. So it's not > the jiffies counter that goes back and forth, it's sched_clock() - so > this is a printk timestamps anomaly, not related to jiffies. I thought > we have fixed this bug in the printk code already: sched_clock() is a > 'raw' interface that should not be used directly - the proper interface > is cpu_clock(cpu). It's a single CPU box, so sched_clock() jumping would still be problematic, no? My patch should fix the worst cpufreq sched_clock jumping issue I think.
[PATCH] depmod: sort output according to modules.order, take #2
Kbuild now generates and installs modules.order along with modules. This patch updates depmod so that it sorts the module list according to that file before generating output files. Modules which aren't listed in modules.order are put after the modules which are ordered by it. This makes modprobe prioritize modules according to the kernel Makefiles, just as built-in modules are link-ordered by them. This patch is against module-init-tools 3.3-pre1. Signed-off-by: Tejun Heo <[EMAIL PROTECTED]> Cc: Sam Ravnborg <[EMAIL PROTECTED]> Cc: Bill Nottingham <[EMAIL PROTECTED]> Cc: Rusty Russell <[EMAIL PROTECTED]> Cc: Greg Kroah-Hartman <[EMAIL PROTECTED]> Cc: Kay Sievers <[EMAIL PROTECTED]> --- Comment added and path comparison logic slightly modified such that the dirname part of mod->pathname is ignored instead of prepending dirname to lines read from modules.order. Behavior-wise it's identical to the previous version. Thanks. depmod.c | 49 + 1 file changed, 49 insertions(+) diff --git a/depmod.c b/depmod.c index ea7ad05..c3ae5a2 100644 --- a/depmod.c +++ b/depmod.c @@ -585,6 +585,54 @@ static struct module *grab_basedir(const char *dirname) return list; } +static void sort_modules(const char *dirname, struct module **listp) +{ + struct module *list = *listp, *tlist = NULL, **tpos = &tlist; + FILE *modorder; + int dir_len = strlen(dirname) + 1; + char file_name[dir_len + strlen("modules.order") + 1]; + char line[10240]; + + sprintf(file_name, "%s/%s", dirname, "modules.order"); + + modorder = fopen(file_name, "r"); + if (!modorder) { + /* Older kernels don't generate modules.order. Just + return if the file doesn't exist. 
*/ + if (errno == ENOENT) + return; + fatal("Could not open '%s': %s\n", file_name, strerror(errno)); + } + + sprintf(line, "%s/", dirname); + + /* move modules listed in modorder file to tlist in order */ + while (fgets(line, sizeof(line), modorder)) { + struct module **pos, *mod; + int len = strlen(line); + + if (line[len - 1] == '\n') + line[len - 1] = '\0'; + + for (pos = &list; (mod = *pos); pos = &(*pos)->next) { + if (strcmp(line, mod->pathname + dir_len) == 0) { + *pos = mod->next; + mod->next = NULL; + *tpos = mod; + tpos = &mod->next; + break; + } + } + } + + /* append the rest */ + *tpos = list; + + fclose(modorder); + + *listp = tlist; +} + static void parse_modules(struct module *list) { struct module *i; @@ -857,6 +905,7 @@ int main(int argc, char *argv[]) } else { list = grab_basedir(dirname); } + sort_modules(dirname, &list); parse_modules(list); for (i = 0; i < sizeof(depfiles)/sizeof(depfiles[0]); i++) { -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
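[Editor's sketch, not part of the patch] sort_modules() walks modules.order line by line, unlinks the matching entry from the unsorted list, appends it to a second list through a tail pointer, and finally appends whatever is left over. The stand-alone version below shows the same pointer-to-pointer technique on an in-memory order array instead of a file; struct node and reorder are hypothetical names, not depmod's.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct node {
    const char *name;
    struct node *next;
};

/*
 * Reorder *listp so that nodes named in order[0..n-1] come first, in that
 * order. Names with no matching node are skipped; nodes not named at all
 * keep their relative order and are appended at the end.
 */
void reorder(struct node **listp, const char *const *order, size_t n)
{
    struct node *list, *sorted = NULL, **tail = &sorted;

    assert(listp != NULL);
    list = *listp;

    for (size_t i = 0; i < n; i++) {
        struct node **pos, *cur;

        /* 'pos' always points at the link that leads to 'cur'. */
        for (pos = &list; (cur = *pos) != NULL; pos = &cur->next) {
            if (strcmp(order[i], cur->name) == 0) {
                *pos = cur->next;   /* unlink from the unsorted list */
                cur->next = NULL;
                *tail = cur;        /* append to the sorted list */
                tail = &cur->next;
                break;
            }
        }
    }

    *tail = list;                   /* append the unlisted remainder */
    *listp = sorted;
}
```

Because the remainder is appended as-is, a missing or partial order list degrades gracefully to the original ordering, which matches the patch's handling of modules absent from modules.order.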
[PATCH] Add support for the S-35390A RTC chip.
This adds basic get/set time support for the Seiko Instruments S-35390A. This chip communicates using I2C and is used on the QNAP TS-109/TS-209 NAS devices. Signed-off-by: Byron Bradley <[EMAIL PROTECTED]> Tested-by: Tim Ellis <[EMAIL PROTECTED]> --- drivers/rtc/Kconfig |9 ++ drivers/rtc/Makefile |1 + drivers/rtc/rtc-s35390a.c | 302 + 3 files changed, 312 insertions(+), 0 deletions(-) create mode 100644 drivers/rtc/rtc-s35390a.c diff --git a/drivers/rtc/Kconfig b/drivers/rtc/Kconfig index 1e6715e..6c0fdf9 100644 --- a/drivers/rtc/Kconfig +++ b/drivers/rtc/Kconfig @@ -246,6 +246,15 @@ config RTC_DRV_TWL92330 platforms. The support is integrated with the rest of the Menelaus driver; it's not separate module. +config RTC_DRV_S35390A + tristate "Seiko Instruments S-35390A" + help + If you say yes here you will get support for the Seiko + Instruments S-35390A. + + This driver can also be built as a module. If so the module + will be called rtc-s35390a. + endif # I2C comment "SPI RTC drivers" diff --git a/drivers/rtc/Makefile b/drivers/rtc/Makefile index 465db4d..8d6218f 100644 --- a/drivers/rtc/Makefile +++ b/drivers/rtc/Makefile @@ -41,6 +41,7 @@ obj-$(CONFIG_RTC_DRV_PL031) += rtc-pl031.o obj-$(CONFIG_RTC_DRV_RS5C313) += rtc-rs5c313.o obj-$(CONFIG_RTC_DRV_RS5C348) += rtc-rs5c348.o obj-$(CONFIG_RTC_DRV_RS5C372) += rtc-rs5c372.o +obj-$(CONFIG_RTC_DRV_S35390A) += rtc-s35390a.o obj-$(CONFIG_RTC_DRV_S3C) += rtc-s3c.o obj-$(CONFIG_RTC_DRV_SA1100) += rtc-sa1100.o obj-$(CONFIG_RTC_DRV_SH) += rtc-sh.o diff --git a/drivers/rtc/rtc-s35390a.c b/drivers/rtc/rtc-s35390a.c new file mode 100644 index 000..29a95b6 --- /dev/null +++ b/drivers/rtc/rtc-s35390a.c @@ -0,0 +1,302 @@ +/* + * Seiko Instruments S-35390A RTC Driver + * + * Copyright (c) 2007 Byron Bradley + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at 
your option) any later version. + */ + +#include +#include +#include +#include + +#define S35390A_CMD_STATUS1 0 +#define S35390A_CMD_STATUS2 1 +#define S35390A_CMD_TIME1 2 + +#define S35390A_BYTE_YEAR 0 +#define S35390A_BYTE_MONTH 1 +#define S35390A_BYTE_DAY 2 +#define S35390A_BYTE_WDAY 3 +#define S35390A_BYTE_HOURS 4 +#define S35390A_BYTE_MINS 5 +#define S35390A_BYTE_SECS 6 + +#define S35390A_FLAG_POC 0x01 +#define S35390A_FLAG_BLD 0x02 +#define S35390A_FLAG_24H 0x40 +#define S35390A_FLAG_RESET 0x80 +#define S35390A_FLAG_TEST 0x01 + +struct s35390a { + struct i2c_client *client; + struct rtc_device *rtc; + int twentyfourhour; +}; + +static int s35390a_set_reg(struct s35390a *s35390a, int reg, char *buf, int len) +{ + struct i2c_client *client = s35390a->client; + struct i2c_msg msg[] = { + { client->addr | reg, 0, len, buf }, + }; + + /* Only write to the writable bits in the status1 register */ + if (reg == S35390A_CMD_STATUS1) + buf[0] &= 0xf; + + if ((i2c_transfer(client->adapter, msg, 1)) != 1) + return -EIO; + + return 0; +} + +static int s35390a_get_reg(struct s35390a *s35390a, int reg, char *buf, int len) +{ + struct i2c_client *client = s35390a->client; + struct i2c_msg msg[] = { + { client->addr | reg, I2C_M_RD, len, buf }, + }; + + if ((i2c_transfer(client->adapter, msg, 1)) != 1) + return -EIO; + + return 0; +} + +static int s35390a_reset(struct s35390a *s35390a) +{ + char buf[1]; + + if (s35390a_get_reg(s35390a, S35390A_CMD_STATUS1, buf, sizeof(buf)) < 0) + return -EIO; + + if (!(buf[0] & (S35390A_FLAG_POC | S35390A_FLAG_BLD))) + return 0; + + buf[0] |= S35390A_FLAG_RESET; + return s35390a_set_reg(s35390a, S35390A_CMD_STATUS1, buf, sizeof(buf)); +} + +static int s35390a_disable_test_mode(struct s35390a *s35390a) +{ + char buf[1]; + + if (s35390a_get_reg(s35390a, S35390A_CMD_STATUS2, buf, sizeof(buf)) < 0) + return -EIO; + + if (!(buf[0] & S35390A_FLAG_TEST)) + return 0; + + buf[0] &= ~S35390A_FLAG_TEST; + return s35390a_set_reg(s35390a, 
S35390A_CMD_STATUS2, buf, sizeof(buf)); +} + +static char s35390a_hr2reg(struct s35390a *s35390a, int hour) +{ + if (s35390a->twentyfourhour) + return BIN2BCD(hour); + + if (hour < 12) + return BIN2BCD(hour); + + return 0x40 | BIN2BCD(hour - 12); +} + +static int s35390a_reg2hr(struct s35390a *s35390a, char reg) +{ + unsigned hour; + + if (s35390a->twentyfourhour) + return BCD2BIN(reg & 0x3f); + + hour = BCD2
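[Editor's sketch, not part of the patch] s35390a_hr2reg() stores the hour as BCD and, in 12-hour mode, sets 0x40 as a PM flag on top of the BCD value of (hour - 12). The user-space illustration below reproduces that encoding. The decoder reg_to_hour is simply the inverse of the encoder and is an assumption, since the corresponding function is cut off in the message above; all names here are hypothetical.

```c
#include <assert.h>

#define PM_FLAG 0x40u   /* same bit the patch ORs in for hours >= 12 */

/* Two-digit binary <-> packed BCD helpers (valid for values 0..99). */
static unsigned bin2bcd(unsigned v) { return ((v / 10) << 4) | (v % 10); }
static unsigned bcd2bin(unsigned v) { return (v >> 4) * 10 + (v & 0x0f); }

/* Encode an hour (0..23) the way s35390a_hr2reg() in the patch does. */
unsigned hour_to_reg(int twentyfourhour, unsigned hour)
{
    assert(hour < 24);

    if (twentyfourhour || hour < 12)
        return bin2bcd(hour);

    return PM_FLAG | bin2bcd(hour - 12);
}

/* Assumed inverse of hour_to_reg(); not taken from the truncated patch. */
unsigned reg_to_hour(int twentyfourhour, unsigned reg)
{
    unsigned hour = bcd2bin(reg & 0x3f);   /* strip the PM flag bit */

    if (!twentyfourhour && (reg & PM_FLAG))
        hour += 12;

    return hour;
}
```

For example, 23:00 is stored as 0x23 in 24-hour mode and as 0x51 (PM flag plus BCD 11) in 12-hour mode.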
Re: [PATCH] scheduler: fix x86 regression in native_sched_clock
* Guillaume Chazarain <[EMAIL PROTECTED]> wrote: > I'll clean it up and resend it later. As I don't have the necessary > knowledge to do the tsc_{32,64}.c unification, should I copy paste > common functions into tsc_32.c and tsc_64.c to ease later unification > or should I start a common .c file? note that there are a couple of existing patches in this area. One is the fix below. There's also older frequency-scaling TSC patches - i'll try to dig them out. Ingo > Subject: x86: idle wakeup event in the HLT loop From: Ingo Molnar <[EMAIL PROTECTED]> do a proper idle-wakeup event on HLT as well - some CPUs stop the TSC in HLT too, not just when going through the ACPI methods. Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]> --- arch/x86/kernel/process_32.c | 15 --- 1 file changed, 12 insertions(+), 3 deletions(-) Index: linux/arch/x86/kernel/process_32.c === --- linux.orig/arch/x86/kernel/process_32.c +++ linux/arch/x86/kernel/process_32.c @@ -113,10 +113,19 @@ void default_idle(void) smp_mb(); local_irq_disable(); - if (!need_resched()) + if (!need_resched()) { + ktime_t t0, t1; + u64 t0n, t1n; + + t0 = ktime_get(); + t0n = ktime_to_ns(t0); safe_halt();/* enables interrupts racelessly */ - else - local_irq_enable(); + local_irq_disable(); + t1 = ktime_get(); + t1n = ktime_to_ns(t1); + sched_clock_idle_wakeup_event(t1n - t0n); + } + local_irq_enable(); current_thread_info()->status |= TS_POLLING; } else { /* loop is done by the caller */ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH x86/mm] x86 vDSO: canonicalize sysenter .eh_frame
* Roland McGrath <[EMAIL PROTECTED]> wrote: > Some assembler versions automagically optimize .eh_frame contents, > changing their size. The CFI in sysenter.S was not using optimal > formatting, so it would be changed by newer/smarter assemblers. This > ran afoul of the wired constant for padding out the other vDSO images > to match its size. This changes the original hand-coded source to use > the optimal format encoding for its operations. That leaves nothing > more for a fancy assembler to do, so the sizes will match the wired-in > expected size regardless of the assembler version. > > Signed-off-by: Roland McGrath <[EMAIL PROTECTED]> --- thanks, applied. Ingo
Re: [PATCH] kexec: force x86_64 arches to boot kdump kernels on boot cpu
On Dec 7, 2007 12:50 AM, Yinghai Lu <[EMAIL PROTECTED]> wrote: > > On Dec 6, 2007 4:33 PM, Eric W. Biederman <[EMAIL PROTECTED]> wrote: ... > > > > My feel is that if it is for legacy interrupts only it should not be a > > problem. > > Let's investigate and see if we can unconditionally enable this quirk > > for all opteron systems. > > i checked that bit > > http://www.openbios.org/viewvc/trunk/LinuxBIOSv2/src/northbridge/amd/amdk8/coherent_ht.c?revision=2596&view=markup > > static void enable_apic_ext_id(u8 node) > { > #if ENABLE_APIC_EXT_ID==1 > #warning "FIXME Is the right place to enable apic ext id here?" > > u32 val; > > val = pci_read_config32(NODE_HT(node), 0x68); > val |= (HTTC_APIC_EXT_SPUR | HTTC_APIC_EXT_ID | > HTTC_APIC_EXT_BRD_CST); > pci_write_config32(NODE_HT(node), 0x68, val); > #endif > } > > that bit only be should be set when apic id is lifted and cpu apid is > using 8 bits and that mean broadcast is 0xff instead 0x0f. > for example 8 socket dual core system or 4 socket quad core > system,that you should make BSP start from 0x04, so cpus apic id will > be [0x04, 0x13) > > > So if you want to enable that in early_quirk, you need to > make sure apic id is using 8 bits by check if the bit 16 (HTTC_APIC_ID) is > set. it should be bit 18 (HTTC_APIC_EXT_ID) YH
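[Editor's sketch, not part of the thread] The correction above says the quirk should test bit 18 of the value read from config offset 0x68 before enabling broadcast. A minimal illustration of that test on an already-read 32-bit register value follows; the bit number is taken from the message, while the macro and function names are hypothetical and no real PCI access is shown.

```c
#include <assert.h>
#include <stdint.h>

/* Bit number as stated in the message; the macro name is illustrative. */
#define HTTC_APIC_EXT_ID_BIT 18u

/*
 * Returns 1 if the extended-APIC-ID bit is set in 'httc', the value a quirk
 * would have read from offset 0x68 of the HT configuration function.
 */
int apic_ext_id_enabled(uint32_t httc)
{
    return (int)((httc >> HTTC_APIC_EXT_ID_BIT) & 1u);
}
```

A quirk built on this would, under the reasoning in the message, skip setting the broadcast bit unless apic_ext_id_enabled() returns 1.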
Bloggoo.com: create a blog quickly, for free, easily, right now
Dear linux-kernel@vger.kernel.org,

[EMAIL PROTECTED] has sent you an invite to sign up at Bloggoo.com - http://bloggoo.com.

"BlogGoo (www.bloggoo.com) was created to give users a personal space where they can freely produce their own writing: telling personal stories and daily events, sharing information and articles, posting pictures, video and audio, or exchanging opinions and news, as each user wishes. BlogGoo is also an online community where blog owners can get in touch and build relationships with other blog owners, make good friendships on the Internet, and broaden their horizons.

BlogGoo is currently in a testing phase before going into production and will open officially soon. We are looking for people interested in taking part in this test. If you are interested, you can sign up and create your blog right away, for free, at http://bloggoo.com/wp-signup.php to try out blog creation immediately. You can send feedback or suggestions about the BlogGoo website to [EMAIL PROTECTED]

Finally, thank you all for your support, and we hope you enjoy using BlogGoo."

You can create your account here: http://bloggoo.com/wp-signup.php

We are looking forward to seeing you on the site.

Cheers,
--The Team @ Bloggoo.com
Re: [PATCH 15/20] net/lapb/lapb_iface.c: use LIST_HEAD instead of LIST_HEAD_INIT
From: Denis Cheng <[EMAIL PROTECTED]>
Date: Fri, 7 Dec 2007 00:07:18 +0800

> A single list_head variable initialized with LIST_HEAD_INIT can almost
> always be replaced with a LIST_HEAD declaration; this shrinks the code
> and looks better.
>
> Signed-off-by: Denis Cheng <[EMAIL PROTECTED]>

Applied.
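For readers unfamiliar with the two forms, a minimal userspace sketch of the equivalence follows. The two macros are reproduced as I recall them from include/linux/list.h; the variable names and the two helper functions are illustrative assumptions, not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>

struct list_head {
    struct list_head *next, *prev;
};

/* Userspace copies of the kernel macros. */
#define LIST_HEAD_INIT(name) { &(name), &(name) }
#define LIST_HEAD(name) struct list_head name = LIST_HEAD_INIT(name)

/* Before the patch: declaration plus explicit initializer. */
static struct list_head old_style = LIST_HEAD_INIT(old_style);

/* After the patch: one macro declares and initializes. */
static LIST_HEAD(new_style);

/* An empty list head points back at itself in both directions. */
static bool list_is_empty(const struct list_head *head)
{
    return head->next == head;
}

/* Illustrative consistency check on a head's immediate neighbours. */
static void list_sanity_check(const struct list_head *head)
{
    assert(head->next->prev == head);
    assert(head->prev->next == head);
}
```

Both variables end up as identical self-referencing empty lists; the second form just saves repeating the type and the name.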
Re: [PATCH] scheduler: fix x86 regression in native_sched_clock
* Stefano Brivio <[EMAIL PROTECTED]> wrote:

> This patch fixes a regression introduced by:
>
> commit bb29ab26863c022743143f27956cc0ca362f258c
> Author: Ingo Molnar <[EMAIL PROTECTED]>
> Date:   Mon Jul 9 18:51:59 2007 +0200
>
> This caused the jiffies counter to leap back and forth on cpufreq
> changes on my x86 box. I'd say that we can't always assume that TSC
> does "small errors" only, when marked unstable. On cpufreq changes
> these errors can be huge.

ah, printk_clock() still uses sched_clock(), not jiffies. So it's not
the jiffies counter that goes back and forth, it's sched_clock() - so
this is a printk timestamps anomaly, not related to jiffies. I thought
we have fixed this bug in the printk code already: sched_clock() is a
'raw' interface that should not be used directly - the proper interface
is cpu_clock(cpu).

Does the patch below help?

        Ingo

--->
Subject: sched: fix CONFIG_PRINT_TIME's reliance on sched_clock()
From: Ingo Molnar <[EMAIL PROTECTED]>

Stefano Brivio reported weird printk timestamp behavior during
CPU frequency changes:

    http://bugzilla.kernel.org/show_bug.cgi?id=9475

fix CONFIG_PRINT_TIME's reliance on sched_clock() and use cpu_clock()
instead.

Reported-and-bisected-by: Stefano Brivio <[EMAIL PROTECTED]>
Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 kernel/printk.c |    2 +-
 kernel/sched.c  |    7 ++-
 2 files changed, 7 insertions(+), 2 deletions(-)

Index: linux/kernel/printk.c
===
--- linux.orig/kernel/printk.c
+++ linux/kernel/printk.c
@@ -680,7 +680,7 @@ asmlinkage int vprintk(const char *fmt,
                 loglev_char = default_message_loglevel + '0';
         }
-        t = printk_clock();
+        t = cpu_clock(printk_cpu);
         nanosec_rem = do_div(t, 10);
         tlen = sprintf(tbuf,
                 "<%c>[%5lu.%06lu] ",
Index: linux/kernel/sched.c
===
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -599,7 +599,12 @@ unsigned long long cpu_clock(int cpu)
         local_irq_save(flags);
         rq = cpu_rq(cpu);
-        update_rq_clock(rq);
+        /*
+         * Only call sched_clock() if the scheduler has already been
+         * initialized (some code might call cpu_clock() very early):
+         */
+        if (rq->idle)
+                update_rq_clock(rq);
         now = rq->clock;
         local_irq_restore(flags);
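As an aside on the timestamp code touched by the printk.c hunk: the sketch below shows, in plain userspace C, how a nanosecond clock value becomes the "[seconds.microseconds]" prefix. It is a sketch under assumptions, not the kernel code. div_rem_u64() is a hypothetical stand-in for the kernel's do_div(), and the divisor is taken to be 10^9, since a split into seconds and a nanosecond remainder needs that value (the archived diff shows `10`, presumably mangled by the archive).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NSEC_PER_SEC 1000000000UL

/*
 * Stand-in for do_div(): divides *n in place by base and returns the
 * remainder. (Assumption: same contract as the kernel helper.)
 */
static unsigned long div_rem_u64(unsigned long long *n, unsigned long base)
{
    unsigned long rem = (unsigned long)(*n % base);

    *n /= base;
    return rem;
}

/* Formats "<level>[secs.usecs] " from a nanosecond timestamp. */
static int format_printk_time(char *buf, size_t len,
                              unsigned long long t_ns, char loglev_char)
{
    unsigned long nanosec_rem;

    assert(buf != NULL && len > 0);
    nanosec_rem = div_rem_u64(&t_ns, NSEC_PER_SEC);
    return snprintf(buf, len, "<%c>[%5lu.%06lu] ", loglev_char,
                    (unsigned long)t_ns, nanosec_rem / 1000);
}
```

Whatever clock feeds t_ns, a value that jumps backwards shows up directly as a non-monotonic prefix, which is the anomaly reported in this thread.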
Re: RFC: outb 0x80 in inb_p, outb_p harmful on some modern AMD64 with MCP51 laptops
On 07-12-07 16:43, Rene Herman wrote: On 07-12-07 15:54, Andi Kleen wrote: My machine in question, for example, needs no waiting within CMOS_READs at all. And I doubt any other chip/device needs waiting that isn't I don't know about CMOS, but there were definitely some not too ancient systems (let's say not more than 10 years) who required IO delays in the floppy driver and the 8253/8259. But on those the jumps are already far too fast. Also see Alan's replies in the thread I posted a link to: http://linux.derkeiler.com/Mailing-Lists/Kernel/2003-09/5700.html Also 8254 (PIT) at least it seems. By the way, David, it would be interesting if you could test 0xed. If your problem is some piece of hardware getting upset at LPC bus aborts it's not going to matter and we'd know an outb delay is just not an option on your system at least. You said you could quickly reproduce the problem with port 0x80? Rene. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Scheduler behaviour
Arjan van de Ven wrote: On Wed, 05 Dec 2007 21:15:30 +0100 Holger Wolf <[EMAIL PROTECTED]> wrote: We discovered performance degradation with dbench when using kernel 2.6.23 compared to kernel 2.6.22. In our case we booted a Linux in a IBM System z9 LPAR with 256MB of ram with 4 CPU's. This system uses a striped LV with 16 disks on a Storage Server connected via 8 4GBit links. A dbench was started on that system performing I/O operations on the striped LV. dbench runs were performed with 1 to 62 processes. Measurements with a 2.6.22 kernel were compared to measurements with a 2.6.23 kernel. We saw a throughput degradation from 7.2 to 23.4 this is good news! dbench rewards unfair behavior... so higher dbench usually means a worse kernel ;) tests with 2.6.22 including CFS show the same results. This means the pressure on page cache is much higher when all processes run in parallel. We see this behavior as well with iozone when writing on many disks with many threads and just 256 MB memory. This means the scheduler schedules as it should - fair. regards Holger -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: broken suspend (sched related) [Was: 2.6.24-rc4-mm1]
* Jiri Slaby <[EMAIL PROTECTED]> wrote:

> On 12/05/2007 06:17 AM, Andrew Morton wrote:
> >
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc4/2.6.24-rc4-mm1/
>
> > git-sched.patch
>
> breaks suspend here since -rc3-mm2. More precisely, this one:
> softlockup: automatically detect hung TASK_UNINTERRUPTIBLE tasks
>
> 2.6.24-rc4-mm1 minus this one works just fine. Otherwise disks stop, graphics
> stops and then it hangs not powering down.
>
> Core 2 Duo, SMP kernel, voluntary preempt, 250 HZ, SLUB, 64 bit.
>
> Ideas?

thanks for tracking it down. Does the patch below help?

        Ingo

---
 kernel/softlockup.c |    8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Index: linux/kernel/softlockup.c
===
--- linux.orig/kernel/softlockup.c
+++ linux/kernel/softlockup.c
@@ -101,7 +101,11 @@ void softlockup_tick(void)
         now = get_timestamp(this_cpu);
-        /* Warn about unreasonable delays: */
+        /* Wake up the high-prio watchdog task every second: */
+        if (now > (touch_timestamp + 1))
+                wake_up_process(per_cpu(watchdog_task, this_cpu));
+
+        /* Warn about unreasonable 10+ seconds delays: */
         if (now <= (touch_timestamp + softlockup_thresh))
                 return;
@@ -214,7 +218,7 @@ static int watchdog(void *__bind_cpu)
          */
         while (!kthread_should_stop()) {
                 touch_softlockup_watchdog();
-                msleep_interruptible(1);
+                schedule();
                 /*
                  * Only do the hung-tasks check on one CPU:
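The two thresholds in the softlockup_tick() hunk can be illustrated with a small userspace model. This is only a sketch of the comparison logic: the function name, the action flags and the use of plain integers for timestamps (in seconds) are assumptions made for illustration.

```c
#include <assert.h>

#define ACT_WAKE_WATCHDOG (1u << 0) /* wake the watchdog task */
#define ACT_WARN          (1u << 1) /* report a soft lockup */

/*
 * Models the per-tick decision: wake the watchdog once more than one
 * second has passed since it last touched its timestamp, and warn once
 * the delay exceeds the configured threshold.
 */
static unsigned int softlockup_tick_actions(unsigned long now,
                                            unsigned long touch_timestamp,
                                            unsigned long thresh)
{
    unsigned int actions = 0;

    assert(thresh >= 1);
    if (now > touch_timestamp + 1)
        actions |= ACT_WAKE_WATCHDOG;
    if (now > touch_timestamp + thresh)
        actions |= ACT_WARN;
    return actions;
}
```

With a threshold of 10, a watchdog that was last touched at t=100 is woken from t=102 onwards and a warning is only due from t=111.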
Re: [PATCH] SCSI: make pcmcia directory use obj-y|m instead of subdir-y|m
On Fri, Dec 07, 2007 at 10:36:23PM +0900, Tejun Heo wrote:
> subdir-y|m isn't supposed to contain modules or built-in components.
> Change subdir-$(CONFIG_PCMCIA) to obj-$(CONFIG_PCMCIA).
>
> Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>
> Cc: Sam Ravnborg <[EMAIL PROTECTED]>
> Cc: James Bottomley <[EMAIL PROTECTED]>

Acked-by: Sam Ravnborg <[EMAIL PROTECTED]>

> ---
>  drivers/scsi/Makefile |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/scsi/Makefile b/drivers/scsi/Makefile
> index 2e6129f..72c8d2e 100644
> --- a/drivers/scsi/Makefile
> +++ b/drivers/scsi/Makefile
> @@ -18,7 +18,7 @@ CFLAGS_aha152x.o = -DAHA152X_STAT -DAUTOCONF
>  CFLAGS_gdth.o    = # -DDEBUG_GDTH=2 -D__SERIAL__ -D__COM2__ -DGDTH_STATISTICS
>  CFLAGS_seagate.o = -DARBITRATE -DPARITY -DSEAGATE_USE_ASM
>
> -subdir-$(CONFIG_PCMCIA) += pcmcia
> +obj-$(CONFIG_PCMCIA) += pcmcia/
>
>  obj-$(CONFIG_SCSI) += scsi_mod.o
>  obj-$(CONFIG_SCSI_TGT) += scsi_tgt.o
Re: [patch-RFC 00/26] LTTng Kernel Trace Thread Flag
* Frank Ch. Eigler ([EMAIL PROTECTED]) wrote:
> Mathieu Desnoyers <[EMAIL PROTECTED]> writes:
>
> > This is an RFC for addition of a new thread flag, TIF_KERNEL_TRACE, to each
> > architecture to activate system-wide system call tracing.
> > [...]
>
> Instead of creating a new flag, could you overload TIF_SYSCALL_TRACE,
> putting the marker into syscall_trace(), and letting !PT_TRACED cause
> a skip over the ptrace notification logic?
>
> - FChE

I don't see any PT_TRACED flag in current kernel HEAD?

Hrm, let's see. If we share TIF_SYSCALL_TRACE with ptrace, we would then
have to figure out how to get this working:

- kernel tracing activated
- ptracing some random processes
- kernel tracing deactivated
- stop ptracing those processes

It means that we would have to keep some state information about the
ptrace status of each process. This is currently kept by
TIF_SYSCALL_TRACE, but since we would be overloading it, it would be
lost when we deactivate kernel tracing.

Adding a supplementary field to the thread_info structure is out of the
question here: we have to keep it as small as possible.

So where do you propose to keep this information other than... another
thread flag?

Mathieu

--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
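The state-loss argument above can be modelled in a few lines of userspace C. This is only an illustration of the reasoning: the bit positions are arbitrary assumptions, and run_scenario() is a hypothetical helper that replays the sequence from the message up to the point where kernel tracing is deactivated, for whichever bit the kernel tracer is given.

```c
#include <assert.h>

/* Bit positions are illustrative only, not the real per-arch values. */
#define TIF_SYSCALL_TRACE (1u << 0)
#define TIF_KERNEL_TRACE  (1u << 1)

/*
 * Replays: kernel tracing activated, ptrace starts on the task, kernel
 * tracing deactivated. Returns the thread flags afterwards; ptrace is
 * still attached at that point and still wants TIF_SYSCALL_TRACE set.
 */
static unsigned int run_scenario(unsigned int kernel_trace_bit)
{
    unsigned int flags = 0;

    flags |= kernel_trace_bit;         /* kernel tracing activated */
    flags |= TIF_SYSCALL_TRACE;        /* ptrace requests syscall tracing */
    assert(flags & TIF_SYSCALL_TRACE); /* both users see tracing enabled */
    flags &= ~kernel_trace_bit;        /* kernel tracing deactivated */
    return flags;
}
```

If the tracer overloads TIF_SYSCALL_TRACE, the final clear also wipes out ptrace's request; with a separate TIF_KERNEL_TRACE bit the ptrace state survives.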
[PATCH] SCSI: make pcmcia directory use obj-y|m instead of subdir-y|m
subdir-y|m isn't supposed to contain modules or built-in components. Change subdir-$(CONFIG_PCMCIA) to obj-$(CONFIG_PCMCIA). Signed-off-by: Tejun Heo <[EMAIL PROTECTED]> Cc: Sam Ravnborg <[EMAIL PROTECTED]> Cc: James Bottomley <[EMAIL PROTECTED]> --- drivers/scsi/Makefile |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/scsi/Makefile b/drivers/scsi/Makefile index 2e6129f..72c8d2e 100644 --- a/drivers/scsi/Makefile +++ b/drivers/scsi/Makefile @@ -18,7 +18,7 @@ CFLAGS_aha152x.o = -DAHA152X_STAT -DAUTOCONF CFLAGS_gdth.o= # -DDEBUG_GDTH=2 -D__SERIAL__ -D__COM2__ -DGDTH_STATISTICS CFLAGS_seagate.o = -DARBITRATE -DPARITY -DSEAGATE_USE_ASM -subdir-$(CONFIG_PCMCIA)+= pcmcia +obj-$(CONFIG_PCMCIA) += pcmcia/ obj-$(CONFIG_SCSI) += scsi_mod.o obj-$(CONFIG_SCSI_TGT) += scsi_tgt.o -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] kexec: force x86_64 arches to boot kdump kernels on boot cpu
On Fri, Dec 07, 2007 at 09:53:15AM -0500, Neil Horman wrote: > On Fri, Dec 07, 2007 at 09:39:44AM -0500, Vivek Goyal wrote: > > On Thu, Dec 06, 2007 at 07:10:23PM -0500, Neil Horman wrote: > > > On Thu, Dec 06, 2007 at 05:11:43PM -0500, Vivek Goyal wrote: > > > > On Thu, Dec 06, 2007 at 04:39:51PM -0500, Neil Horman wrote: > > > > > On Fri, Nov 30, 2007 at 09:51:31AM -0500, Neil Horman wrote: > > > > > > On Fri, Nov 30, 2007 at 09:42:50AM -0500, Vivek Goyal wrote: > > > > > > > > > > > > > > > > > Thats what I'm doing at the moment. I'm working on a RHEL5 patch > > > > > > at the moment > > > > > > (since thats whats on the production system thats failing), and > > > > > > will forward > > > > > > port it once its working > > > > > > > > > > > > And not to split hairs, but techically thats not our _only_ choice. > > > > > > We could > > > > > > force kdump boots on cpu0 as well ;) > > > > > > > > > > > > Thanks > > > > > > Neil > > > > > > > > > > > > > Thanks > > > > > > > Vivek > > > > > > > > > > > > > > > > > > > > > Sorry to have been quiet on this issue for a few days. Interesting > > > > > news to > > > > > report, though. So I was working on a patch to do early apic > > > > > enabling on > > > > > x86_64, and had something working for the old 2.6.18 kernel that we > > > > > were > > > > > origionally testing on. Unfortunately while it worked on 2.6.18 it > > > > > failed > > > > > miserably on 2.6.24-rc3-mm2, causing check_timer to consistently > > > > > report that the > > > > > timer interrupt wasn't getting received (even though we could > > > > > successfully run > > > > > calibrate_delay). Vivek and I were digging into this, when I ran > > > > > accross the > > > > > description of the hypertransport configuration register in the > > > > > opteron > > > > > specification. 
It contains a bit that, suprise, configures the ht > > > > > bus to either > > > > > unicast interrupts delivered accross the ht bus to a single cpu, or > > > > > to broadcast > > > > > it to all cpus. Since it seemed more likely that the 8259 in the > > > > > nvidia > > > > > southbridge was transporting legacy mode interrupts over the ht bus > > > > > than > > > > > directly to cpu0 via an actual wire, I wrote the attached patch to > > > > > add a quirk > > > > > for nvidia chipsets, which scanned for hypertransport controllers, > > > > > and ensured > > > > > that that broadcast bit was set. Test results indicate that this > > > > > solves the > > > > > problem, and kdump kernels boot just fine on the affected system. > > > > > > > > > > > > > Hi Neil, > > > > > > > > Should we disable this broadcasting feature once we are through? > > > > Otherwise > > > > in normal systems it might mean extra traffic on hypertransport. There > > > > is no need for every interrupt to be broadcasted in normal systems? > > > > > > > > Thanks > > > > Vivek > > > > > > No, I don't think thats necessecary. Once the apics are enabled, > > > interrupts > > > shouldn't travel accross the hypertransport bus anyway, opting instead to > > > use > > > the dedicated apic bus (at least thats my understanding). > > > > I think all interrupt message travel on hypertransport. Even after APICS > > have been enabled. > > > > Look at the following document. > > > > http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24674.pdf > > > > Have a look at figure 1, figure 2 and section 3.4.2.2 and 3.4.2.3 > > > > That's a different thing that once IOAPIC has formed the vectored message, > > Hypertransport might not touch the destination field. > > > Ok, that might be the case then. > > > Having said that, I am wondering what will happen if a system continues > > to operate the timer through IOAPIC in ExtInt mode. 
Will hypertransport > > keep on broadcasting that interrupt to every cpu? And every cpu will > > process that interrupt. > > > I don't think so. IIRC once the other cpus are started they all disable the > timer interrupt, except for one cpu, opting instead to get the timer tick via > ipi, So while they all might see the interrupt packet on the ht bus, only one > cpu will process it. > Does LAPIC allow to disable a specific vector and not accept interrupts? I don't think so. If a timer interrupt is broadcasted to every cpu I think everybody will accept it (like broadcast IPI). That's why intelligence is built into IOAPIC and direct interrupts to a cpu or group of cpu. I am just trying to understand the functionality better. Can somebody help me understand how do we make sure that same timer interrupt is not processed by all cpus (assuming hypertransport is broadcasting it)? > > Hence, I feel it is safe to restore the broadcast bit back to BIOS value > > once > > we are through calibrate_delay(). > > > I disagree. Looking at what Yinghai said, the default setting for the > broadcast > bit isn't actually to unicast the interrupt, its just to set the broadcast > mask > to 0xF, o
Re: [PATCH] scheduler: fix x86 regression in native_sched_clock
On Dec 7, 2007 12:13 PM, Nick Piggin <[EMAIL PROTECTED]> wrote: > My patch should fix the worst cpufreq sched_clock jumping issue > I think. Any pointer to it? Thanks. -- Guillaume -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] scheduler: fix x86 regression in native_sched_clock
* Nick Piggin <[EMAIL PROTECTED]> wrote: > > ah, printk_clock() still uses sched_clock(), not jiffies. So it's > > not the jiffies counter that goes back and forth, it's sched_clock() > > - so this is a printk timestamps anomaly, not related to jiffies. I > > thought we have fixed this bug in the printk code already: > > sched_clock() is a 'raw' interface that should not be used directly > > - the proper interface is cpu_clock(cpu). > > It's a single CPU box, so sched_clock() jumping would still be > problematic, no? sched_clock() is an internal API - the non-jumping API to be used by printk is cpu_clock(). Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] scheduler: fix x86 regression in native_sched_clock
ok, here's a rollup of 11 patches that relate to this. I hoped we could wait with this for 2.6.25, but it seems more urgent as per Stefano's testing, as udelay() and drivers are affected as well. Stefano, could you try this ontop of a recent-ish Linus tree - does this resolve all issues? (without introducing new ones ;-) Ingo Index: linux/arch/arm/kernel/time.c === --- linux.orig/arch/arm/kernel/time.c +++ linux/arch/arm/kernel/time.c @@ -79,17 +79,6 @@ static unsigned long dummy_gettimeoffset } #endif -/* - * An implementation of printk_clock() independent from - * sched_clock(). This avoids non-bootable kernels when - * printk_clock is enabled. - */ -unsigned long long printk_clock(void) -{ - return (unsigned long long)(jiffies - INITIAL_JIFFIES) * - (10 / HZ); -} - static unsigned long next_rtc_update; /* Index: linux/arch/ia64/kernel/time.c === --- linux.orig/arch/ia64/kernel/time.c +++ linux/arch/ia64/kernel/time.c @@ -344,33 +344,6 @@ udelay (unsigned long usecs) } EXPORT_SYMBOL(udelay); -static unsigned long long ia64_itc_printk_clock(void) -{ - if (ia64_get_kr(IA64_KR_PER_CPU_DATA)) - return sched_clock(); - return 0; -} - -static unsigned long long ia64_default_printk_clock(void) -{ - return (unsigned long long)(jiffies_64 - INITIAL_JIFFIES) * - (10/HZ); -} - -unsigned long long (*ia64_printk_clock)(void) = &ia64_default_printk_clock; - -unsigned long long printk_clock(void) -{ - return ia64_printk_clock(); -} - -void __init -ia64_setup_printk_clock(void) -{ - if (!(sal_platform_features & IA64_SAL_PLATFORM_FEATURE_ITC_DRIFT)) - ia64_printk_clock = ia64_itc_printk_clock; -} - /* IA64 doesn't cache the timezone */ void update_vsyscall_tz(void) { Index: linux/arch/x86/kernel/process_32.c === --- linux.orig/arch/x86/kernel/process_32.c +++ linux/arch/x86/kernel/process_32.c @@ -113,10 +113,19 @@ void default_idle(void) smp_mb(); local_irq_disable(); - if (!need_resched()) + if (!need_resched()) { + ktime_t t0, t1; + u64 t0n, t1n; + + t0 = ktime_get(); + t0n = 
ktime_to_ns(t0); safe_halt();/* enables interrupts racelessly */ - else - local_irq_enable(); + local_irq_disable(); + t1 = ktime_get(); + t1n = ktime_to_ns(t1); + sched_clock_idle_wakeup_event(t1n - t0n); + } + local_irq_enable(); current_thread_info()->status |= TS_POLLING; } else { /* loop is done by the caller */ Index: linux/arch/x86/kernel/tsc_32.c === --- linux.orig/arch/x86/kernel/tsc_32.c +++ linux/arch/x86/kernel/tsc_32.c @@ -5,6 +5,7 @@ #include #include #include +#include #include #include @@ -78,15 +79,32 @@ EXPORT_SYMBOL_GPL(check_tsc_unstable); * cyc2ns_scale is limited to 10^6 * 2^10, which fits in 32 bits. * ([EMAIL PROTECTED]) * + * ns += offset to avoid sched_clock jumps with cpufreq + * * [EMAIL PROTECTED] "math is hard, lets go shopping!" */ -unsigned long cyc2ns_scale __read_mostly; #define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */ -static inline void set_cyc2ns_scale(unsigned long cpu_khz) +DEFINE_PER_CPU(struct cyc2ns_params, cyc2ns) __read_mostly; + +static void set_cyc2ns_scale(unsigned long cpu_khz) { - cyc2ns_scale = (100 << CYC2NS_SCALE_FACTOR)/cpu_khz; + struct cyc2ns_params *params; + unsigned long flags; + unsigned long long tsc_now, ns_now; + + rdtscll(tsc_now); + params = &get_cpu_var(cyc2ns); + + local_irq_save(flags); + ns_now = __cycles_2_ns(params, tsc_now); + + params->scale = (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR)/cpu_khz; + params->offset += ns_now - __cycles_2_ns(params, tsc_now); + local_irq_restore(flags); + + put_cpu_var(cyc2ns); } /* Index: linux/arch/x86/kernel/tsc_64.c === --- linux.orig/arch/x86/kernel/tsc_64.c +++ linux/arch/x86/kernel/tsc_64.c @@ -10,6 +10,7 @@ #include #include +#include static int notsc __initdata = 0; @@ -18,16 +19,25 @@ EXPORT_SYMBOL(cpu_khz); unsigned int tsc_khz; EXPORT_SYMBOL(tsc_khz); -static unsigned int cyc2ns_scale __read_mostly; +DEFINE_PER_CPU(struct cyc2ns_params, cyc2ns) __read_mostly; -static inline void set_cyc2ns_scale(unsigned long khz) +static void 
set_cyc2ns_scale(unsigned long cp
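The "ns += offset" idea in the tsc_32.c hunk above can be sketched in userspace as follows. This is a simplified model under assumptions, not the patch itself: there is a single parameter block instead of per-CPU data, no interrupt disabling, the TSC value is passed in rather than read with rdtscll(), and the struct layout is guessed from how the patch uses `scale` and `offset`.

```c
#include <assert.h>
#include <stdint.h>

#define CYC2NS_SCALE_FACTOR 10 /* 2^10, as in the quoted code */
#define NSEC_PER_MSEC 1000000ULL

/* Assumed layout: only the two fields the patch refers to. */
struct cyc2ns_params {
    uint64_t scale;
    uint64_t offset; /* unsigned wraparound handles negative corrections */
};

static uint64_t cycles_2_ns(const struct cyc2ns_params *p, uint64_t cyc)
{
    return ((cyc * p->scale) >> CYC2NS_SCALE_FACTOR) + p->offset;
}

/*
 * Switch to a new CPU frequency at TSC value tsc_now. The offset is
 * adjusted so that the nanosecond value at tsc_now is the same before
 * and after the change, which avoids a jump in the derived clock.
 */
static void set_cyc2ns_scale(struct cyc2ns_params *p, unsigned long cpu_khz,
                             uint64_t tsc_now)
{
    uint64_t ns_now;

    assert(cpu_khz != 0);
    ns_now = cycles_2_ns(p, tsc_now);
    p->scale = (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR) / cpu_khz;
    p->offset += ns_now - cycles_2_ns(p, tsc_now);
}
```

For example, starting at 2 GHz and switching to 1 GHz after 2*10^9 cycles keeps the clock at one second across the switch, and the next 10^9 cycles advance it by exactly one more second.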
Re: [PATCH] scheduler: fix x86 regression in native_sched_clock
* Ingo Molnar <[EMAIL PROTECTED]> wrote: > > > - t = printk_clock(); > > > + t = cpu_clock(printk_cpu); > > > nanosec_rem = do_div(t, 10); > > > tlen = sprintf(tbuf, > > > "<%c>[%5lu.%06lu] ", > > > > A bit risky - it's quite an expansion of code which no longer can call > > printk. > > > > You might want to take that WARN_ON out of __update_rq_clock() ;) > > hm, dont we already detect printk recursions and turn them into a > silent return instead of a hang/crash? ugh, we dont. So i guess the (tested) patch below is highly needed. (If such incidents become frequent then we could save the stackdump of the recursion via save_stack_trace() too - but i wanted to keep the initial code simple.) Ingo > Subject: printk: make printk more robust by not allowing recursion From: Ingo Molnar <[EMAIL PROTECTED]> make printk more robust by allowing recursion only if there's a crash going on. Also add recursion detection. I've tested it with an artificially injected printk recursion - instead of a lockup or spontaneous reboot or other crash, the output was a well controlled: [ 41.057335] SysRq : <2>BUG: recent printk recursion! [ 41.057335] loglevel0-8 reBoot Crashdump show-all-locks(D) tErm Full kIll saK showMem Nice powerOff showPc show-all-timers(Q) unRaw Sync showTasks Unmount shoW-blocked-tasks also do all this printk logic with irqs disabled. Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]> --- kernel/printk.c | 52 ++-- 1 file changed, 42 insertions(+), 10 deletions(-) Index: linux/kernel/printk.c === --- linux.orig/kernel/printk.c +++ linux/kernel/printk.c @@ -623,30 +623,57 @@ asmlinkage int printk(const char *fmt, . 
/* cpu currently holding logbuf_lock */ static volatile unsigned int printk_cpu = UINT_MAX; +const char printk_recursion_bug_msg [] = + KERN_CRIT "BUG: recent printk recursion!\n"; +static int printk_recursion_bug; + asmlinkage int vprintk(const char *fmt, va_list args) { + static int log_level_unknown = 1; + static char printk_buf[1024]; + unsigned long flags; - int printed_len; + int printed_len = 0; + int this_cpu; char *p; - static char printk_buf[1024]; - static int log_level_unknown = 1; boot_delay_msec(); preempt_disable(); - if (unlikely(oops_in_progress) && printk_cpu == smp_processor_id()) - /* If a crash is occurring during printk() on this CPU, -* make sure we can't deadlock */ - zap_locks(); - /* This stops the holder of console_sem just where we want him */ raw_local_irq_save(flags); + this_cpu = smp_processor_id(); + + /* +* Ouch, printk recursed into itself! +*/ + if (unlikely(printk_cpu == this_cpu)) { + /* +* If a crash is occurring during printk() on this CPU, +* then try to get the crash message out but make sure +* we can't deadlock. Otherwise just return to avoid the +* recursion and return - but flag the recursion so that +* it can be printed at the next appropriate moment: +*/ + if (!oops_in_progress) { + printk_recursion_bug = 1; + goto out_restore_irqs; + } + zap_locks(); + } + lockdep_off(); spin_lock(&logbuf_lock); - printk_cpu = smp_processor_id(); + printk_cpu = this_cpu; + if (printk_recursion_bug) { + printk_recursion_bug = 0; + strcpy(printk_buf, printk_recursion_bug_msg); + printed_len = sizeof(printk_recursion_bug_msg); + } /* Emit the output into the temporary buffer */ - printed_len = vscnprintf(printk_buf, sizeof(printk_buf), fmt, args); + printed_len += vscnprintf(printk_buf + printed_len, + sizeof(printk_buf), fmt, args); /* * Copy the output into log_buf. 
If the caller didn't provide @@ -675,6 +702,10 @@ asmlinkage int vprintk(const char *fmt, loglev_char = default_message_loglevel + '0'; } + if (panic_timeout) { + panic_timeout = 0; + printk("recurse!\n"); + } t = cpu_clock(printk_cpu); nanosec_rem = do_div(t, 10); tlen = sprintf(tbuf, @@ -739,6
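The recursion handling in the (truncated) patch above can be modelled in userspace roughly as follows. This is a deliberately reduced sketch: there is no locking, no interrupt handling and no oops_in_progress path, the `nest` parameter is an artificial way to inject a recursive call, and all names other than the quoted bug message are assumptions.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

static unsigned int owner_cpu = UINT_MAX; /* CPU currently inside the logger */
static int recursion_bug;                 /* set when a nested call was dropped */
static char log_buf[256];

static void log_append(const char *s)
{
    size_t used = strlen(log_buf);

    assert(used + strlen(s) < sizeof(log_buf));
    strcpy(log_buf + used, s);
}

/*
 * Returns 1 if the message was logged, 0 if it was dropped because the
 * same CPU was already inside the logger. A dropped call only sets a
 * flag; the next successful call reports it first.
 */
static int guarded_log(unsigned int cpu, const char *msg, int nest)
{
    if (owner_cpu == cpu) {
        recursion_bug = 1;
        return 0;
    }
    owner_cpu = cpu;
    if (recursion_bug) {
        recursion_bug = 0;
        log_append("BUG: recent printk recursion!\n");
    }
    log_append(msg);
    if (nest)
        guarded_log(cpu, "nested\n", 0); /* simulated recursive printk */
    owner_cpu = UINT_MAX;
    return 1;
}
```

The nested message is silently dropped rather than deadlocking, and the recursion is reported ahead of the next regular message, which matches the behaviour described in the changelog.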
Re: [PATCH] scheduler: fix x86 regression in native_sched_clock
* Andrew Morton <[EMAIL PROTECTED]> wrote: > > - t = printk_clock(); > > + t = cpu_clock(printk_cpu); > > nanosec_rem = do_div(t, 10); > > tlen = sprintf(tbuf, > > "<%c>[%5lu.%06lu] ", > > A bit risky - it's quite an expansion of code which no longer can call > printk. > > You might want to take that WARN_ON out of __update_rq_clock() ;) hm, dont we already detect printk recursions and turn them into a silent return instead of a hang/crash? Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 5/6] syslets: add generic syslets infrastructure
Hi Zach. On Thu, Dec 06, 2007 at 03:20:18PM -0800, Zach Brown ([EMAIL PROTECTED]) wrote: > +/* > + * XXX todo: > + * - do we need all this '*cur = current' nonsense? > + * - try to prevent userspace from submitting too much.. lazy user ptr read? > + * - explain how to deal with waiting threads with stale data in current > + * - how does userspace tell that a syslet completion was lost? > + * provide an -errno argument to the userspace return function? > + */ > + > +/* > + * These structs are stored on the kernel stack of tasks which are waiting to > + * return to userspace. They are linked into their parent's list of syslet > + * children stored in 'syslet_tasks' in the parent's task_struct. > + */ > +struct syslet_task_entry { > + struct task_struct *task; > + struct list_head item; > +}; > + > +/* > + * syslet_ring doesn't have any kernel-side storage. Userspace allocates > them > + * in their address space and initializes their fields and then passes them > to > + * the kernel. > + * > + * These hashes provide the kernel-side storage for the wait queues which > + * sys_syslet_ring_wait() uses and the mutex which completion uses to > serialize > + * the (possible blocking) ordered writes of the completion and kernel head > + * index into the ring. > + * > + * We chose the bucket that supports a given ring by hashing a u32 that > + * userspace sets in the ring. > + */ > +#define SYSLET_HASH_BITS (CONFIG_BASE_SMALL ? 4 : 8) > +#define SYSLET_HASH_NR (1 << SYSLET_HASH_BITS) > +#define SYSLET_HASH_MASK (SYSLET_HASH_NR - 1) > +static wait_queue_head_t syslet_waitqs[SYSLET_HASH_NR]; > +static struct mutex syslet_muts[SYSLET_HASH_NR]; Why do you care about hashed tables scalability and not using trees? -- Evgeniy Polyakov -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
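For context on the question, the bucket selection described in the quoted comment amounts to something like the sketch below. It is an assumption-laden illustration: the quoted patch does not show its hash function here, so a generic multiplicative hash is swapped in, the multiplier is an arbitrary choice, and the table size is fixed at the non-CONFIG_BASE_SMALL value.

```c
#include <assert.h>
#include <stdint.h>

#define SYSLET_HASH_BITS 8 /* the !CONFIG_BASE_SMALL case */
#define SYSLET_HASH_NR   (1u << SYSLET_HASH_BITS)

/*
 * Maps the u32 that userspace stores in its ring to one of the
 * SYSLET_HASH_NR buckets (wait queue plus mutex). Multiplicative hash:
 * multiply by an odd constant and keep the top SYSLET_HASH_BITS bits.
 */
static unsigned int syslet_bucket(uint32_t key)
{
    uint32_t h = key * 0x9e3779b9u; /* assumed constant, for illustration */
    unsigned int bucket = h >> (32 - SYSLET_HASH_BITS);

    assert(bucket < SYSLET_HASH_NR);
    return bucket;
}
```

Lookup cost is constant and needs no per-ring kernel allocation; the price is that unrelated rings can share a bucket, and with it a wait queue and a mutex.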
Re: New Address Family: Inter Process Networking (IPN)
> Stop making excuses, with minor adjustments we have the facilities to
> meet your needs. There is no need for yet-another-protocol to do what

I suspect they would be better off just using IP multicast. But the
localhost latency penalty vs Unix Chris was talking about probably needs
to be investigated.

-Andi
Re: [PATCH] IB/ehca: Serialize HCA-related hCalls on POWER5
On Thursday 06 December 2007, Roland Dreier wrote: > > Regarding the performance problem, have you checked whether converting all > > your spin_lock_irqsave to spin_lock/spin_lock_irq improves your performance > > on the older machines? Maybe it's already fast enough that way. > > It does seem that the only places that the hcall_lock is taken also > use msleep, so they must always be in process context. So you can > safely just use spin_lock(), right? I think it needs some more inspection. The msleep in there is only called for hcalls that return H_IS_LONG_BUSY(). In theory, you can call ehca_plpar_hcall_norets() from inside an interrupt handler if the hcall in question never returns long busy. Arnd <>< -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Peculiar out-of-sync boot log lines
Hello. Bartlomiej Zolnierkiewicz wrote: [PATCH] ide: DMA reporting and validity checking fixes (take 2) * ide_xfer_verbose() fixups: - beautify returned mode names - fix PIO5 reporting - make it return 'const char *' * Change printk() level from KERN_DEBUG to KERN_INFO in ide_find_dma_mode(). * Add ide_id_dma_bug() helper based on ide_dma_verbose() to check for invalid DMA info in identify block. * Use ide_id_dma_bug() in ide_tune_dma() and ide_driveid_update(). As a result DMA won't be tuned or will be disabled after tuning if device reports inconsistent info about enabled DMA mode (ide_dma_verbose() does the same checks while the IDE device is probed by ide-{cd,disk} device driver). * Since (id->capability & 1) && id->tDMA is a valid configuration handle it correctly in ide_id_dma_bug(). Huh? You don't check (id->capability & 1) there... * Remove no longer needed ide_dma_verbose(). This patch should fix the following problem with out-of-sync IDE messages reported by Nick Warne: hdd: ATAPI 48X DVD-ROM DVD-R-RAM CD-R/RW drive, 2048kB Cache<7>hdd: skipping word 93 validity check , UDMA(66) and later debugged by Mark Lord to be caused by: ide_dma_verbose() printk( ... "2048kB Cache"); eighty_ninty_three() printk(KERN_DEBUG "%s: skipping word 93 validity check\n"); ide_dma_verbose() printk(", UDMA(66)" Please note that as a result ide-{cd,disk} device drivers won't report the DMA speed used but this is intended since now DMA mode being used is always reported by IDE core code. v2: * fixes suggested by Randy: - use KERN_CONT for printk()-s in ide-{cd,disk}.c - don't remove argument name from ide_xfer_verbose() declaration Cc: Nick Warne <[EMAIL PROTECTED]> Cc: Mark Lord <[EMAIL PROTECTED]> Cc: Randy Dunlap <[EMAIL PROTECTED]> Signed-off-by: Bartlomiej Zolnierkiewicz <[EMAIL PROTECTED]> [...] Index: b/drivers/ide/ide-dma.c === --- a/drivers/ide/ide-dma.c +++ b/drivers/ide/ide-dma.c @@ -806,58 +809,26 @@ static int ide_dma_check(ide_drive_t *dr return vdma ? 
0 : -1; } -void ide_dma_verbose(ide_drive_t *drive) +int ide_id_dma_bug(ide_drive_t *drive) { - struct hd_driveid *id = drive->id; - ide_hwif_t *hwif= HWIF(drive); + struct hd_driveid *id = drive->id; if (id->field_valid & 4) { if ((id->dma_ultra >> 8) && (id->dma_mword >> 8)) [...] + goto err_out; } else if (id->field_valid & 2) { if ((id->dma_mword >> 8) && (id->dma_1word >> 8)) - goto bug_dma_off; - printk(", DMA"); + goto err_out; } else if (id->field_valid & 1) { Hm, bit 0 only gurantees that current translation - goto bug_dma_off; + if (id->tDMA == 0) Despite the name, this is not a transfer period but SW DMA mode number, so why mode 0 is bad? + goto err_out; } - return; -bug_dma_off: - printk(", BUG DMA OFF"); - hwif->dma_off_quietly(drive); - return; + return 0; +err_out: + printk(KERN_ERR "%s: bad DMA info in identify block\n", drive->name); + return 1; } Index: b/drivers/ide/ide-lib.c === --- a/drivers/ide/ide-lib.c +++ b/drivers/ide/ide-lib.c @@ -29,41 +29,44 @@ * Add common non I/O op stuff here. Make sure it has proper * kernel-doc function headers or your patch will be rejected */ - + +static const char *udma_str[] = +{ "UDMA/16", "UDMA/25", "UDMA/33", "UDMA/44", + "UDMA/66", "UDMA/100", "UDMA/133", "UDMA7" }; +static const char *mwdma_str[] = + { "MWDMA0", "MWDMA1", "MWDMA2" }; +static const char *swdma_str[] = + { "SWDMA0", "SWDMA1", "SWDMA2" }; +static const char *pio_str[] = + { "PIO0", "PIO1", "PIO2", "PIO3", "PIO4", "PIO5" }; /** * ide_xfer_verbose- return IDE mode names - * @xfer_rate: rate to name + * @mode: transfer mode * * Returns a constant string giving the name of the mode * requested. */ -char *ide_xfer_verbose (u8 xfer_rate) +const char *ide_xfer_verbose(u8 mode) { [...] 
+ const char *s; + u8 i = mode & 0xf; + + if (mode >= XFER_UDMA_0 && mode <= XFER_UDMA_7) + s = udma_str[i]; + else if (mode >= XFER_MW_DMA_0 && mode <= XFER_MW_DMA_2) + s = mwdma_str[i]; + else if (mode >= XFER_SW_DMA_0 && mode <= XFER_SW_DMA_2) + s = swdma_str[i]; + else if (mode >= XFER_PIO_0 && mode <= XFER_PIO_5) + s = pio_str[i & 0x7]; + else if (mode == XFER_PIO_SLOW) + s = "XFER SLOW"; Not "PIO SLOW"? + else + s = "XFER ERROR"; + +
Re: RFC: outb 0x80 in inb_p, outb_p harmful on some modern AMD64 with MCP51 laptops
> You don't need to. Port 0x80 historically is about 8uS so just udelay(8)
> and make sure the initial default delay is conservative enough before the

How would you make it conservative enough to handle, let's say, a 6 GHz CPU
that can execute multiple jumps per cycle?

-Andi
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: git guidance
Al Boldi wrote:
> Johannes Schindelin wrote:
> > Hi,
>
> Hi
>
> > On Fri, 7 Dec 2007, Al Boldi wrote:
> > > You need to re-read the thread.
> >
> > I don't know why you write that, and then say thanks. Clearly, what you
> > wrote originally, and what Andreas pointed out, were quite obvious
> > indicators that git already does what you suggest.
> >
> > You _do_ work "transparently" (whatever you understand by that overused
> > term) in the working directory, unimpeded by git.
>
> If you go back in the thread, you may find a link to a gitfs client that
> somebody kindly posted. This client pretty much defines the transparency
> I'm talking about. The only problem is that it's read-only.
>
> To make it really useful, it has to support versioning locally,
> disconnected from the server repository. One way to implement this, could
> be by committing every update unconditionally to an on-the-fly created git
> repository private to the gitfs client.

Earlier you said that you need to be able to tell git when you want to make
a commit, which means pretty much any old filesystem could serve as gitfs.
Now you're saying you want every single update to be committed, which would
make it mimic an editor's undo functionality. I still don't get what it is
you really want.

> With this transparently created private scratch repository it should then
> be possible for the same gitfs to re-expose the locally created commits,
> all without any direct user-intervention.
>
> Later, this same scratch repository could then be managed by the normal
> git-management tools/commands to ultimately update the backend git
> repositories.

That's exactly what's happening today. I imagine whoever wrote the gitfs
thing did so to facilitate testing, or as some form of intellectual
masturbation.

So, to get to the bottom of this, which of the following workflows is it
you want git to support?

### WORKFLOW A ###
edit, edit, edit
edit, edit, edit
edit, edit, edit
Oops I made a mistake and need to hop back to "current - 12".
edit, edit, edit
edit, edit, edit
publish everything, similar to just tarring up your workdir and sending out
### END WORKFLOW A ###

### WORKFLOW B ###
edit, edit, edit
ok this looks good, I want to save a checkpoint here
edit, edit, edit
looks good again. next checkpoint
edit, edit, edit
oh crap, back to checkpoint 2
edit, edit, edit
ooh, that's better. save a checkpoint and publish those checkpoints
### END WORKFLOW B ###

If you could just answer that question and stop writing "transparent" or
any synonym thereof six times in each email, we can possibly help you. As
it stands now though, nobody is very interested because you haven't
explained how you want this "transparency" of yours to work in an every day
scenario.

--
Andreas Ericsson [EMAIL PROTECTED]
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
Re: [patch] ext2: xip check fix
Jared Hulbert wrote:
> I think so. The filemap_xip.c functionality doesn't work for Flash
> memory yet. Flash memory doesn't have struct pages to back it up with
> which this stuff depends on.

Struct page is not the major issue. The primary problem is writing to
the media (and I am not a flash expert at all, just relaying here):
For some period of time, the flash memory is not usable and thus we
need to make sure we can nuke the page table entries that we have in
userland page tables. For that, we need a callback from the device so
that it can ask to get its references back. Oh, and a put_xip_page
counterpart to get_xip_page, so that the driver knows when it's safe
to erase.

cheers,
Carsten
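The get/put pairing asked for above can be pictured as plain reference counting. The following is only a minimal sketch under that assumption: the struct and all demo_* names are hypothetical and are not the kernel's XIP interface. It just shows how a driver could tell when no direct mappings remain.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-block state; not the kernel's data structure. */
struct demo_xip_block {
	unsigned int users;	/* outstanding direct references */
};

/* Hand out a direct reference to the block (models get_xip_page). */
static void demo_get_xip_page(struct demo_xip_block *blk)
{
	blk->users++;
}

/* Drop a reference again (models the proposed put_xip_page). */
static void demo_put_xip_page(struct demo_xip_block *blk)
{
	assert(blk->users > 0);
	blk->users--;
}

/* The driver may only erase once nobody references the block any more. */
static bool demo_safe_to_erase(const struct demo_xip_block *blk)
{
	return blk->users == 0;
}
```

A real implementation would additionally need the callback mentioned above, so the device can ask the users to give their references back before it writes or erases.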
Re: [PATCH] scheduler: fix x86 regression in native_sched_clock
"Guillaume Chazarain" <[EMAIL PROTECTED]> wrote: > On Dec 7, 2007 6:51 AM, Thomas Gleixner <[EMAIL PROTECTED]> wrote: > > Hmrpf. sched_clock() is used for the time stamp of the printks. We > > need to find some better solution other than killing off the tsc > > access completely. > > Something like http://lkml.org/lkml/2007/3/16/291 that would need some > refresh? And here is a refreshed one just for testing with 2.6-git. The 64 bit part is a shamelessly untested copy/paste as I cannot test it. diff --git a/arch/x86/kernel/tsc_32.c b/arch/x86/kernel/tsc_32.c index 9ebc0da..d561b2f 100644 --- a/arch/x86/kernel/tsc_32.c +++ b/arch/x86/kernel/tsc_32.c @@ -5,6 +5,7 @@ #include #include #include +#include #include #include @@ -78,15 +79,32 @@ EXPORT_SYMBOL_GPL(check_tsc_unstable); * cyc2ns_scale is limited to 10^6 * 2^10, which fits in 32 bits. * ([EMAIL PROTECTED]) * + * ns += offset to avoid sched_clock jumps with cpufreq + * * [EMAIL PROTECTED] "math is hard, lets go shopping!" */ -unsigned long cyc2ns_scale __read_mostly; #define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */ -static inline void set_cyc2ns_scale(unsigned long cpu_khz) +DEFINE_PER_CPU(struct cyc2ns_params, cyc2ns) __read_mostly; + +static void set_cyc2ns_scale(unsigned long cpu_khz) { - cyc2ns_scale = (100 << CYC2NS_SCALE_FACTOR)/cpu_khz; + struct cyc2ns_params *params; + unsigned long flags; + unsigned long long tsc_now, ns_now; + + rdtscll(tsc_now); + params = &get_cpu_var(cyc2ns); + + local_irq_save(flags); + ns_now = __cycles_2_ns(params, tsc_now); + + params->scale = (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR)/cpu_khz; + params->offset += ns_now - __cycles_2_ns(params, tsc_now); + local_irq_restore(flags); + + put_cpu_var(cyc2ns); } /* diff --git a/arch/x86/kernel/tsc_64.c b/arch/x86/kernel/tsc_64.c index 9c70af4..93e7a06 100644 --- a/arch/x86/kernel/tsc_64.c +++ b/arch/x86/kernel/tsc_64.c @@ -10,6 +10,7 @@ #include #include +#include static int notsc __initdata = 0; @@ -18,16 +19,25 @@ 
EXPORT_SYMBOL(cpu_khz); unsigned int tsc_khz; EXPORT_SYMBOL(tsc_khz); -static unsigned int cyc2ns_scale __read_mostly; +DEFINE_PER_CPU(struct cyc2ns_params, cyc2ns) __read_mostly; -static inline void set_cyc2ns_scale(unsigned long khz) +static void set_cyc2ns_scale(unsigned long cpu_khz) { - cyc2ns_scale = (NSEC_PER_MSEC << NS_SCALE) / khz; -} + struct cyc2ns_params *params; + unsigned long flags; + unsigned long long tsc_now, ns_now; -static unsigned long long cycles_2_ns(unsigned long long cyc) -{ - return (cyc * cyc2ns_scale) >> NS_SCALE; + rdtscll(tsc_now); + params = &get_cpu_var(cyc2ns); + + local_irq_save(flags); + ns_now = __cycles_2_ns(params, tsc_now); + + params->scale = (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR)/cpu_khz; + params->offset += ns_now - __cycles_2_ns(params, tsc_now); + local_irq_restore(flags); + + put_cpu_var(cyc2ns); } unsigned long long sched_clock(void) diff --git a/include/asm-x86/timer.h b/include/asm-x86/timer.h index 0db7e99..ff4f2a3 100644 --- a/include/asm-x86/timer.h +++ b/include/asm-x86/timer.h @@ -2,6 +2,7 @@ #define _ASMi386_TIMER_H #include #include +#include #define TICK_SIZE (tick_nsec / 1000) @@ -16,7 +17,7 @@ extern int recalibrate_cpu_khz(void); #define calculate_cpu_khz() native_calculate_cpu_khz() #endif -/* Accellerators for sched_clock() +/* Accelerators for sched_clock() * convert from cycles(64bits) => nanoseconds (64bits) * basic equation: * ns = cycles / (freq / ns_per_sec) @@ -31,20 +32,44 @@ extern int recalibrate_cpu_khz(void); * And since SC is a constant power of two, we can convert the div * into a shift. * - * We can use khz divisor instead of mhz to keep a better percision, since + * We can use khz divisor instead of mhz to keep a better precision, since * cyc2ns_scale is limited to 10^6 * 2^10, which fits in 32 bits. * ([EMAIL PROTECTED]) * + * ns += offset to avoid sched_clock jumps with cpufreq + * * [EMAIL PROTECTED] "math is hard, lets go shopping!" 
*/ -extern unsigned long cyc2ns_scale __read_mostly; + +struct cyc2ns_params { + unsigned long scale; + unsigned long long offset; +}; + +DECLARE_PER_CPU(struct cyc2ns_params, cyc2ns) __read_mostly; #define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */ -static inline unsigned long long cycles_2_ns(unsigned long long cyc) +static inline unsigned long long __cycles_2_ns(struct cyc2ns_params *params, + unsigned long long cyc) { - return (cyc * cyc2ns_scale) >> CYC2NS_SCALE_FACTOR; + return ((cyc * params->scale) >> CYC2NS_SCALE_FACTOR) + params->offset; } +static inline unsigned long long cycles_2_ns(unsigned long long cyc) +{ + struct cyc2ns_para
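To make the "ns += offset" idea from the patch comment concrete, here is a user-space sketch of the same arithmetic. It assumes the scale and offset behave as in the posted patch; the demo_* names and the explicit tsc_now parameter are illustrative stand-ins (the patch reads the TSC with rdtscll() and keeps the parameters per CPU).

```c
#include <stdint.h>

#define DEMO_NSEC_PER_MSEC 1000000ULL
#define DEMO_SCALE_SHIFT   10	/* same 2^10 factor as CYC2NS_SCALE_FACTOR */

struct demo_cyc2ns {
	uint64_t scale;		/* ns per cycle, scaled by 2^10 */
	uint64_t offset;	/* ns added to keep the clock continuous */
};

static uint64_t demo_cycles_2_ns(const struct demo_cyc2ns *p, uint64_t cyc)
{
	return ((cyc * p->scale) >> DEMO_SCALE_SHIFT) + p->offset;
}

/* Switch to a new frequency at TSC value tsc_now without a jump in ns. */
static void demo_set_scale(struct demo_cyc2ns *p, uint64_t cpu_khz,
			   uint64_t tsc_now)
{
	uint64_t ns_before = demo_cycles_2_ns(p, tsc_now);

	p->scale = (DEMO_NSEC_PER_MSEC << DEMO_SCALE_SHIFT) / cpu_khz;
	/*
	 * Unsigned wraparound makes this correct even when the reading
	 * with the new scale is larger than ns_before.
	 */
	p->offset += ns_before - demo_cycles_2_ns(p, tsc_now);
}
```

With a 1 GHz clock the scale is 1024, so one cycle is one nanosecond. After switching to 2 GHz the scale drops to 512, and the offset absorbs the difference so that the reading at the switch point is unchanged.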
Re: [PATCH 13/20] net/core/dev.c: use LIST_HEAD instead of LIST_HEAD_INIT
From: Denis Cheng <[EMAIL PROTECTED]>
Date: Fri, 7 Dec 2007 00:01:26 +0800

> single list_head variable initialized with LIST_HEAD_INIT could almost
> always can be replaced with LIST_HEAD declaration, this shrinks the code
> and looks better.
>
> Signed-off-by: Denis Cheng <[EMAIL PROTECTED]>

Applied.
ptrace API extensions for BTS
Roland, Andi,

I would like to discuss the ptrace user interface for the BTS extension.

In previous emails, Andi suggested a stream-like interface, but is also
OK with an array-like interface (as far as I understood). Roland is
dubious about the ptrace API additions.

I would like to settle the discussion and find an interface that
everybody can agree to, so I can implement that interface and we can
move forward with the patch.

Here's the link to the original patch: http://lkml.org/lkml/2007/12/5/234.

Here are the facts:
- we need to provide access to an array (cyclic buffer) of BTS records
- the array can be quite big
- the most interesting part is the tail
- a BTS record can either describe a branch (from, to address) or a
  scheduling event (task arrives/departs at timestamp)

Let's look at the entire array, first. I see the following alternatives:

1. get the entire array in one command
   + simple interface, like GETREGS
   - a lot of (redundant) copying
2. array-like commands (get size, read element at index)
   + allows precise reads; minimizes copying
3. stream-like commands (read, maybe seek) [read from back to front]
   + favors most expected use cases
   - makes other uses much harder (e.g. read from front to back)
   - harder to get the semantics right and intuitive
     (when to reset read pointer? e.g. when stepping between two reads)

Alternatives 1 and 3 require a reordering to turn the cyclic buffer into
a sequential array or stream. Alternative 2 would benefit from that, as
well. When we reorder the array, the best order would be from back to
front, so users can start reading the most interesting part first, and
stop when they read enough.

I would recommend alternative 2. Number 1 may result in too much
copying, and number 3 is better done in user space; the kernel API
should be more flexible and not favor a single use case.

Let's look at the array size, next.

1. pre-defined array size
   + most simple, no extra command
   - one size will not fit all users
2. user-defined array size
   + most flexible for the user
   (need to set a system limit to restrain greedy users)

I would recommend alternative 2. A good citizen will only ask for the
space he needs. In the ideal case, the system limit would be variable
(as Andi suggested).

Let's look at the array contents. Currently, we have 3 different record
types.

1. self-describing union
   + most extensible
   + allows single bts array
   - may waste (user-space) memory
2. separate fixed-type arrays
   + get command defines interpretation
   - need additional effort to describe relative order between array
     elements
   - extension requires new set of access commands

I would recommend alternative 1. It is most flexible and most easily
extensible. And it is easier to use.

What extensions do we expect in the future?

1. more architectures
2. additional data

Regarding 2, a union would easily allow us to add additional data; at
the cost of a few wasted bytes, if the data is not evenly sized. A user
may look at the qualifier and either ignore records he does not
understand, or bail out.

Regarding 1, we currently provide scheduling timestamps, which are arch
independent, and from-to branch information, which should be available
on all architectures for a similar feature.

I could think of basic block from-to information as an alternative
representation on some architectures. I could also imagine that other
architectures provide additional information (like the predicted bit on
Netburst that was dropped for later architectures). Both could be
modelled using additional record types.

Additional architectures may want to (re)use and extend the x86 bts
record, or they may want to invent their own format. In the former case,
we may move the bts union and the bts commands to the generic ptrace
header, and provide a default implementation for architectures that do
not support it (basically pretend that the array is empty or return an
error). In the latter case, they may copy parts of the x86 header.

I would postpone the decision until there are more architectures that
wish to support this feature.

Thank you for reading until here.

regards,
markus.

-
Intel GmbH
Dornacher Strasse 1
85622 Feldkirchen/Muenchen Germany
Sitz der Gesellschaft: Feldkirchen bei Muenchen
Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer
Registergericht: Muenchen HRB 47456
Ust.-IdNr. VAT Registration No.: DE129385895
Citibank Frankfurt (BLZ 502 109 00) 600119052

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.
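To illustrate the recommended combination (array-like access plus a self-describing union), here is a minimal sketch. Everything in it is an assumption made for illustration: the qualifier values, the field layout, and the demo_* names are hypothetical and are not taken from the posted patch. The index helper further assumes a completely filled cyclic buffer in which `next` is the slot that will be written next.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical qualifiers; the real patch may name and number them differently. */
enum demo_bts_qualifier {
	DEMO_BTS_BRANCH = 1,	/* from/to branch record */
	DEMO_BTS_TASK_ARRIVES,	/* scheduling event: task arrives */
	DEMO_BTS_TASK_DEPARTS	/* scheduling event: task departs */
};

struct demo_bts_record {
	uint32_t qualifier;	/* selects the valid union member */
	union {
		struct { uint64_t from, to; } branch;
		uint64_t timestamp;	/* for arrive/depart events */
	} u;
};

/* Slot of the i-th most recent record (i == 0 is the newest one). */
static size_t demo_newest_first(size_t size, size_t next, size_t i)
{
	return (next + size - 1 - i) % size;
}

/* Count branch records, ignoring record types we do not understand. */
static size_t demo_count_branches(const struct demo_bts_record *recs, size_t n)
{
	size_t i, branches = 0;

	for (i = 0; i < n; i++)
		if (recs[i].qualifier == DEMO_BTS_BRANCH)
			branches++;
	return branches;
}
```

The reader only looks at the qualifier to decide how to interpret a record, which is what lets new record types be added later without a new set of access commands.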
Re: [PATCH] kexec: force x86_64 arches to boot kdump kernels on boot cpu
On Dec 6, 2007 4:33 PM, Eric W. Biederman <[EMAIL PROTECTED]> wrote: > Vivek Goyal <[EMAIL PROTECTED]> writes: > > > > On Thu, Dec 06, 2007 at 04:39:51PM -0500, Neil Horman wrote: > >> On Fri, Nov 30, 2007 at 09:51:31AM -0500, Neil Horman wrote: > >> > On Fri, Nov 30, 2007 at 09:42:50AM -0500, Vivek Goyal wrote: > >> > >> > > >> > Thats what I'm doing at the moment. I'm working on a RHEL5 patch at the > > moment > >> > (since thats whats on the production system thats failing), and will > >> > forward > >> > port it once its working > >> > > >> > And not to split hairs, but techically thats not our _only_ choice. We > > could > >> > force kdump boots on cpu0 as well ;) > >> > > >> > Thanks > >> > Neil > >> > > >> > > Thanks > >> > > Vivek > >> > > >> > >> > >> Sorry to have been quiet on this issue for a few days. Interesting news to > >> report, though. So I was working on a patch to do early apic enabling on > >> x86_64, and had something working for the old 2.6.18 kernel that we were > >> origionally testing on. Unfortunately while it worked on 2.6.18 it failed > >> miserably on 2.6.24-rc3-mm2, causing check_timer to consistently report > >> that > > the > >> timer interrupt wasn't getting received (even though we could successfully > >> run > >> calibrate_delay). Vivek and I were digging into this, when I ran accross > >> the > >> description of the hypertransport configuration register in the opteron > >> specification. It contains a bit that, suprise, configures the ht bus to > > either > >> unicast interrupts delivered accross the ht bus to a single cpu, or to > > broadcast > >> it to all cpus. Since it seemed more likely that the 8259 in the nvidia > >> southbridge was transporting legacy mode interrupts over the ht bus than > >> directly to cpu0 via an actual wire, I wrote the attached patch to add a > >> quirk > >> for nvidia chipsets, which scanned for hypertransport controllers, and > >> ensured > >> that that broadcast bit was set. 
Test results indicate that this solves the problem, and kdump kernels boot just fine on the affected system.
> >>
> >
> > Hi Neil,
> >
> > Should we disable this broadcasting feature once we are through? Otherwise
> > in normal systems it might mean extra traffic on hypertransport. There
> > is no need for every interrupt to be broadcasted in normal systems?
>
> My feel is that if it is for legacy interrupts only it should not be a
> problem.
> Let's investigate and see if we can unconditionally enable this quirk
> for all opteron systems.

I checked that bit:

http://www.openbios.org/viewvc/trunk/LinuxBIOSv2/src/northbridge/amd/amdk8/coherent_ht.c?revision=2596&view=markup

static void enable_apic_ext_id(u8 node)
{
#if ENABLE_APIC_EXT_ID==1
#warning "FIXME Is the right place to enable apic ext id here?"
	u32 val;

	val = pci_read_config32(NODE_HT(node), 0x68);
	val |= (HTTC_APIC_EXT_SPUR | HTTC_APIC_EXT_ID | HTTC_APIC_EXT_BRD_CST);
	pci_write_config32(NODE_HT(node), 0x68, val);
#endif
}

That bit should only be set when the APIC ID is lifted and the CPU APIC
ID is using 8 bits, which means broadcast is 0xff instead of 0x0f.

For example, on an 8-socket dual-core system or a 4-socket quad-core
system, you should make the BSP start from 0x04, so the CPUs' APIC IDs
will be [0x04, 0x13].

So if you want to enable that in early_quirk, you need to make sure the
APIC ID is using 8 bits by checking whether bit 16 (HTTC_APIC_ID) is
set. Most BIOSes already do that.

You may ask Supermicro to fix their broken BIOS instead.

YH
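The condition described above (only turn on extended broadcast when the extended APIC ID is already in use) could be expressed roughly as follows. This is a sketch, not the quirk itself: demo_enable_ext_broadcast is a hypothetical helper, and both masks are parameters because the actual bit positions have to come from the chipset documentation rather than from this example.

```c
#include <stdint.h>

/*
 * Return the new register value: set the broadcast bit only when the
 * extended-APIC-ID bit is already set. Both masks are supplied by the
 * caller and are placeholders here, not real register definitions.
 */
static uint32_t demo_enable_ext_broadcast(uint32_t val,
					  uint32_t ext_id_mask,
					  uint32_t brd_cst_mask)
{
	if (val & ext_id_mask)
		val |= brd_cst_mask;
	return val;
}
```

A quirk built this way would read the configuration register, pass the value through the helper, and write it back only if it changed.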
Re: [PATCH 20/20] net/iucv/iucv.c: use LIST_HEAD instead of LIST_HEAD_INIT
From: Denis Cheng <[EMAIL PROTECTED]>
Date: Fri, 7 Dec 2007 00:13:25 +0800

> these three list_head are all local variables, but can also use LIST_HEAD.
>
> Signed-off-by: Denis Cheng <[EMAIL PROTECTED]>

Applied.
Re: [PATCH] scheduler: fix x86 regression in native_sched_clock
* Guillaume Chazarain <[EMAIL PROTECTED]> wrote: > > Something like http://lkml.org/lkml/2007/3/16/291 that would need > > some refresh? > > And here is a refreshed one just for testing with 2.6-git. The 64 bit > part is a shamelessly untested copy/paste as I cannot test it. yeah, we can do something like this in 2.6.25 - this will improve the quality of sched_clock(). The other patch i sent should solve the problem for 2.6.24 - printk should not be using raw sched_clock() calls. (as the name says it's for the scheduler's internal use.) I've also queued up the patch below - it removes the now unnecessary printk clock code. Ingo -> Subject: sched: remove printk_clock() From: Ingo Molnar <[EMAIL PROTECTED]> printk_clock() is obsolete - it has been replaced with cpu_clock(). Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]> --- arch/arm/kernel/time.c | 11 --- arch/ia64/kernel/time.c | 27 --- kernel/printk.c |5 - 3 files changed, 43 deletions(-) Index: linux/arch/arm/kernel/time.c === --- linux.orig/arch/arm/kernel/time.c +++ linux/arch/arm/kernel/time.c @@ -79,17 +79,6 @@ static unsigned long dummy_gettimeoffset } #endif -/* - * An implementation of printk_clock() independent from - * sched_clock(). This avoids non-bootable kernels when - * printk_clock is enabled. 
- */ -unsigned long long printk_clock(void) -{ - return (unsigned long long)(jiffies - INITIAL_JIFFIES) * - (1000000000 / HZ); -} - static unsigned long next_rtc_update; /* Index: linux/arch/ia64/kernel/time.c === --- linux.orig/arch/ia64/kernel/time.c +++ linux/arch/ia64/kernel/time.c @@ -344,33 +344,6 @@ udelay (unsigned long usecs) } EXPORT_SYMBOL(udelay); -static unsigned long long ia64_itc_printk_clock(void) -{ - if (ia64_get_kr(IA64_KR_PER_CPU_DATA)) - return sched_clock(); - return 0; -} - -static unsigned long long ia64_default_printk_clock(void) -{ - return (unsigned long long)(jiffies_64 - INITIAL_JIFFIES) * - (1000000000/HZ); -} - -unsigned long long (*ia64_printk_clock)(void) = &ia64_default_printk_clock; - -unsigned long long printk_clock(void) -{ - return ia64_printk_clock(); -} - -void __init -ia64_setup_printk_clock(void) -{ - if (!(sal_platform_features & IA64_SAL_PLATFORM_FEATURE_ITC_DRIFT)) - ia64_printk_clock = ia64_itc_printk_clock; -} - /* IA64 doesn't cache the timezone */ void update_vsyscall_tz(void) { Index: linux/kernel/printk.c === --- linux.orig/kernel/printk.c +++ linux/kernel/printk.c @@ -573,11 +573,6 @@ static int __init printk_time_setup(char __setup("time", printk_time_setup); -__attribute__((weak)) unsigned long long printk_clock(void) -{ - return sched_clock(); -} - /* Check if we have any console registered that can be called early in boot. */ static int have_callable_console(void) { -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH BUGFIX] hid: the `bit' in hidinput_mapping_quirks() is an out parameter
Fix a panic, by changing hidinput_mapping_quirks(,, unsigned long *bit,) to hidinput_mapping_quirks(,, unsigned long **bit,) The `bit' in this function is an out parameter. Cc: Jiri Kosina <[EMAIL PROTECTED]> Signed-off-by: Fengguang Wu <[EMAIL PROTECTED]> --- drivers/hid/hid-input-quirks.c | 36 +++ drivers/hid/hid-input.c|2 - include/linux/hid.h|2 - 3 files changed, 20 insertions(+), 20 deletions(-) --- linux-2.6.24-rc4-mm1.orig/include/linux/hid.h +++ linux-2.6.24-rc4-mm1/include/linux/hid.h @@ -526,7 +526,7 @@ extern void hidinput_disconnect(struct h int hid_set_field(struct hid_field *, unsigned, __s32); int hid_input_report(struct hid_device *, int type, u8 *, int, int); int hidinput_find_field(struct hid_device *hid, unsigned int type, unsigned int code, struct hid_field **field); -int hidinput_mapping_quirks(struct hid_usage *, struct input_dev *, unsigned long *, int *); +int hidinput_mapping_quirks(struct hid_usage *, struct input_dev *, unsigned long **, int *); void hidinput_event_quirks(struct hid_device *, struct hid_field *, struct hid_usage *, __s32); int hidinput_apple_event(struct hid_device *, struct input_dev *, struct hid_usage *, __s32); void hid_input_field(struct hid_device *hid, struct hid_field *field, __u8 *data, int interrupt); --- linux-2.6.24-rc4-mm1.orig/drivers/hid/hid-input.c +++ linux-2.6.24-rc4-mm1/drivers/hid/hid-input.c @@ -382,7 +382,7 @@ static void hidinput_configure_usage(str } /* handle input mappings for quirky devices */ - ret = hidinput_mapping_quirks(usage, input, bit, &max); + ret = hidinput_mapping_quirks(usage, input, &bit, &max); if (ret) goto mapped; --- linux-2.6.24-rc4-mm1.orig/drivers/hid/hid-input-quirks.c +++ linux-2.6.24-rc4-mm1/drivers/hid/hid-input-quirks.c @@ -16,16 +16,16 @@ #include #include -#define map_abs(c) do { usage->code = c; usage->type = EV_ABS; bit = input->absbit; *max = ABS_MAX; } while (0) -#define map_rel(c) do { usage->code = c; usage->type = EV_REL; bit = input->relbit; *max = REL_MAX; } 
while (0) -#define map_key(c) do { usage->code = c; usage->type = EV_KEY; bit = input->keybit; *max = KEY_MAX; } while (0) -#define map_led(c) do { usage->code = c; usage->type = EV_LED; bit = input->ledbit; *max = LED_MAX; } while (0) +#define map_abs(c) do { usage->code = c; usage->type = EV_ABS; *bit = input->absbit; *max = ABS_MAX; } while (0) +#define map_rel(c) do { usage->code = c; usage->type = EV_REL; *bit = input->relbit; *max = REL_MAX; } while (0) +#define map_key(c) do { usage->code = c; usage->type = EV_KEY; *bit = input->keybit; *max = KEY_MAX; } while (0) +#define map_led(c) do { usage->code = c; usage->type = EV_LED; *bit = input->ledbit; *max = LED_MAX; } while (0) -#define map_abs_clear(c)do { map_abs(c); clear_bit(c, bit); } while (0) -#define map_key_clear(c)do { map_key(c); clear_bit(c, bit); } while (0) +#define map_abs_clear(c)do { map_abs(c); clear_bit(c, *bit); } while (0) +#define map_key_clear(c)do { map_key(c); clear_bit(c, *bit); } while (0) static int quirk_belkin_wkbd(struct hid_usage *usage, struct input_dev *input, - unsigned long *bit, int *max) + unsigned long **bit, int *max) { if ((usage->hid & HID_USAGE_PAGE) != HID_UP_CONSUMER) return 0; @@ -41,7 +41,7 @@ static int quirk_belkin_wkbd(struct hid_ } static int quirk_cherry_cymotion(struct hid_usage *usage, struct input_dev *input, - unsigned long *bit, int *max) + unsigned long **bit, int *max) { if ((usage->hid & HID_USAGE_PAGE) != HID_UP_CONSUMER) return 0; @@ -57,7 +57,7 @@ static int quirk_cherry_cymotion(struct } static int quirk_logitech_ultrax_remote(struct hid_usage *usage, struct input_dev *input, - unsigned long *bit, int *max) + unsigned long **bit, int *max) { if ((usage->hid & HID_USAGE_PAGE) != HID_UP_LOGIVENDOR) return 0; @@ -90,7 +90,7 @@ static int quirk_logitech_ultrax_remote( } static int quirk_chicony_tactical_pad(struct hid_usage *usage, struct input_dev *input, - unsigned long *bit, int *max) + unsigned long **bit, int *max) { if ((usage->hid & 
HID_USAGE_PAGE) != HID_UP_MSVENDOR) return 0; @@ -115,7 +115,7 @@ static int quirk_chicony_tactical_pad(st } static int quirk_microsoft_ergonomy_kb(struct hid_usage *usage, struct input_dev *input, - unsigned long *bit, int *max) + unsigned long **bit, int *max) { if ((usage->hid & HID_USAGE_PAGE) != HID_UP_MSVENDOR) return 0; @@ -138,7 +138
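The bug class fixed by this patch is easy to reproduce outside the kernel. The sketch below is only an illustration with made-up names (demo_keybit and the two demo_map_* functions are hypothetical): assigning to a pointer parameter changes the callee's local copy, while assigning through a pointer-to-pointer updates the caller's variable, which is what the map_*() macros need.

```c
#include <stddef.h>

/* Stands in for one of the input device's bitmaps. */
static unsigned long demo_keybit[4];

/* Broken: assigns to the local copy of the pointer; the caller sees nothing. */
static void demo_map_by_value(unsigned long *bit)
{
	bit = demo_keybit;
	(void)bit;	/* silence "set but not used" warnings */
}

/* Fixed: writes through a pointer to the caller's pointer. */
static void demo_map_by_reference(unsigned long **bit)
{
	*bit = demo_keybit;
}
```

In the unfixed kernel code the caller kept using its stale `bit` after the quirk handler returned, which is consistent with the panic mentioned in the description.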
[patch] x86: scale cyc_2_nsec according to CPU frequency
* Guillaume Chazarain <[EMAIL PROTECTED]> wrote: > > > Hmrpf. sched_clock() is used for the time stamp of the printks. We > > > need to find some better solution other than killing off the tsc > > > access completely. > > > > Something like http://lkml.org/lkml/2007/3/16/291 that would need > > some refresh? > > And here is a refreshed one just for testing with 2.6-git. The 64 bit > part is a shamelessly untested copy/paste as I cannot test it. Guillaume, i've updated your patch with a handful of changes - see the result below. Firstly, we dont need the 'offset' anymore because cpu_clock() maintains offsets itself. This simplifies the math and speeds up the sched_clock() common case. Secondly, with PER_CPU variables we need to update them for all possible CPUs - otherwise they might end up with a zero scaling factor which is not good. (not all CPUs are cpufreq capable) Thirdly, we can do a bit smarter and faster by using the fact that local_irq_disable() is preempt-safe - so we can use per_cpu() instead of get_cpu_var(). Ingo -> Subject: x86: scale cyc_2_nsec according to CPU frequency From: "Guillaume Chazarain" <[EMAIL PROTECTED]> scale the sched_clock() cyc_2_nsec scaling factor according to CPU frequency changes. [ [EMAIL PROTECTED]: simplified it and fixed it for SMP. ] Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]> Signed-off-by: Thomas Gleixner <[EMAIL PROTECTED]> --- arch/x86/kernel/tsc_32.c | 41 +++- arch/x86/kernel/tsc_64.c | 59 +++ include/asm-x86/timer.h | 23 ++ 3 files changed, 102 insertions(+), 21 deletions(-) Index: linux-x86.q/arch/x86/kernel/tsc_32.c === --- linux-x86.q.orig/arch/x86/kernel/tsc_32.c +++ linux-x86.q/arch/x86/kernel/tsc_32.c @@ -5,6 +5,7 @@ #include #include #include +#include #include #include @@ -78,15 +79,31 @@ EXPORT_SYMBOL_GPL(check_tsc_unstable); * cyc2ns_scale is limited to 10^6 * 2^10, which fits in 32 bits. 
* ([EMAIL PROTECTED]) * + * ns += offset to avoid sched_clock jumps with cpufreq + * * [EMAIL PROTECTED] "math is hard, lets go shopping!" */ -unsigned long cyc2ns_scale __read_mostly; -#define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */ +DEFINE_PER_CPU(unsigned long, cyc2ns); -static inline void set_cyc2ns_scale(unsigned long cpu_khz) +static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu) { - cyc2ns_scale = (1000000 << CYC2NS_SCALE_FACTOR)/cpu_khz; + unsigned long flags, prev_scale, *scale; + unsigned long long tsc_now, ns_now; + + local_irq_save(flags); + scale = &per_cpu(cyc2ns, cpu); + + rdtscll(tsc_now); + ns_now = __cycles_2_ns(tsc_now); + + prev_scale = *scale; + if (cpu_khz) + *scale = (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR)/cpu_khz; + + printk("CPU#%d: changed cyc2ns scale from %ld to %ld\n", + cpu, prev_scale, *scale); + local_irq_restore(flags); } /* @@ -239,7 +256,9 @@ time_cpufreq_notifier(struct notifier_bl ref_freq, freq->new); if (!(freq->flags & CPUFREQ_CONST_LOOPS)) { tsc_khz = cpu_khz; - set_cyc2ns_scale(cpu_khz); + preempt_disable(); + set_cyc2ns_scale(cpu_khz, smp_processor_id()); + preempt_enable(); /* * TSC based sched_clock turns * to junk w/ cpufreq @@ -367,6 +386,8 @@ static inline void check_geode_tsc_relia void __init tsc_init(void) { + int cpu; + if (!cpu_has_tsc || tsc_disable) goto out_no_tsc; @@ -380,7 +401,15 @@ void __init tsc_init(void) (unsigned long)cpu_khz / 1000, (unsigned long)cpu_khz % 1000); - set_cyc2ns_scale(cpu_khz); + /* +* Secondary CPUs do not run through tsc_init(), so set up +* all the scale factors for all CPUs, assuming the same +* speed as the bootup CPU. 
(cpufreq notifiers will fix this +* up if their speed diverges) +*/ + for_each_possible_cpu(cpu) + set_cyc2ns_scale(cpu_khz, cpu); + use_tsc_delay(); /* Check and install the TSC clocksource */ Index: linux-x86.q/arch/x86/kernel/tsc_64.c === --- linux-x86.q.orig/arch/x86/kernel/tsc_64.c +++ linux-x86.q/arch/x86/kernel/tsc_64.c @@ -10,6 +10,7 @@ #include #include +#include static int notsc __initdata = 0; @@ -18,16 +19,50 @@ EXPORT_SYMBOL(cpu_khz); unsigned int tsc_khz; EXPORT_SYMBOL(tsc_khz); -static unsigned
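
For readers following the arithmetic, the fixed-point conversion that the patch makes per-CPU can be sketched in plain userspace C. This is only an illustration under stated assumptions: the helper names cyc2ns_scale_for() and cycles_to_ns() are hypothetical and not part of the patch; only CYC2NS_SCALE_FACTOR and the (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR)/cpu_khz formula come from the quoted hunk.

```c
#include <assert.h>
#include <stdint.h>

#define CYC2NS_SCALE_FACTOR 10          /* 2^10, as in the quoted patch */
#define NSEC_PER_MSEC       1000000ULL  /* nanoseconds per millisecond */

/*
 * Hypothetical helper: scale factor for a CPU running at cpu_khz,
 * i.e. nanoseconds per cycle multiplied by 2^10.
 */
static uint64_t cyc2ns_scale_for(uint64_t cpu_khz)
{
	assert(cpu_khz != 0);
	return (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR) / cpu_khz;
}

/*
 * Hypothetical helper: convert cycles to nanoseconds with one multiply
 * and one shift. Assumes cycles * scale fits in 64 bits.
 */
static uint64_t cycles_to_ns(uint64_t cycles, uint64_t scale)
{
	return (cycles * scale) >> CYC2NS_SCALE_FACTOR;
}
```

With these definitions a 1 GHz CPU (cpu_khz = 1000000) gets a scale of 1024, so one cycle maps to one nanosecond, and a 2 GHz CPU gets 512. A scale left at zero turns every cycle count into 0 ns, which is why the patch initializes the factor for all possible CPUs.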
Re: ptrace API extensions for BTS
On Friday 07 December 2007 13:01:28 Metzger, Markus T wrote:
> >From: Andi Kleen [mailto:[EMAIL PROTECTED]
> >Sent: Friday, 7 December 2007 12:18
>
> >> I would like to settle the discussion and find an interface that
> >> everybody can agree to, so I can implement that interface and we can
> >> move forward with the patch.
> >
> >The most efficient interface would be zero copy with tracer
> >user process
> >supplying memory that is pinned (get_user_pages()) subject to the
> >mlock rlimit. Then kernel telling the CPU to directly log into
> >that.
>
> That would require users to understand all kinds of BTS formats
> and to detect the hardware they are running on in order to interpret
> the data.

That's true. I guess it could be abstracted in a library, but doing
it all in kernel is indeed nicer.

Ok, in theory you could go fancy and put the library into the vDSO,
which runs in ring 3. Then it would be tied to the kernel again.

> So far, there are two different formats. But one of them is wasting
> an entire word of memory per record. I could imagine that this would
> change some day.
>
> Other architectures would likely use an entirely different format.
> Users who want to support several architectures would benefit from
> a common format for this from-to branch information.

I guess some other users would prefer higher performance, but yes,
there are probably both types. I don't know which is more important.

> Is there some other metric that would allow me to order BTS
> chunks for different threads?

With out-of-order CPUs exact global metrics are pretty difficult.
At which point of the instruction execution would you measure?

Anyway, if RDTSC doesn't work, the only global alternatives are much
slower (like southbridge timers) or very inaccurate (jiffies).

I would just drop it since it'll likely always be somewhat misleading.

-Andi
Re: 2.6.24-rc3-git4 NFS crossmnt regression
On Thu, 6 Dec 2007 23:45:58 -0500 Shane <[EMAIL PROTECTED]> wrote:

> Hi,
>
> The NFS crossmnt/nohide feature has been working beautifully
> in 2.6.23. NFS in general has been really good in 2.6.23. Thanks!
>
> However, starting in 2.6.24-rc3-git4, I immediately get 'NFS Stale
> file handle' messages for any accesses to the NFS crossmnt'ed
> volumes. Regular NFS mounts are fine but the crossmnt'ed
> subdirs return only that error message.
>
> 2.6.24-rc3-git1 is last known good kernel. The problem also exists
> with the latest snap 2.6.24-rc4-git4. NFS server is 2.6.23-rc9 and
> is unchanged.

hm, there have been no nfs changes since 2.6.24-rc4.

> It is easily reproducible here, hopefully for the person who
> knows how to debug it too :)
>

I guess a full set of the commands which you typed to reproduce this
would help.

Rafael, please add to the post-2.6.23 regression list? (If there's any
room left). Thanks.
Re: programs vanish with 2.6.22+
Hi again!

The memtest ran 14 passes (~10h) without an error. I now have a
2.6.24-rc4 with some debug options turned on, waiting for something
to happen... Can I just leave it until a window disappears, or do I
need to manually enable something or run some user-space app?

Markus
Re: [PATCH] scheduler: fix x86 regression in native_sched_clock
Quoting Nick Piggin <[EMAIL PROTECTED]>:

> On Friday 07 December 2007 19:45, Ingo Molnar wrote:
>
> > ah, printk_clock() still uses sched_clock(), not jiffies. So it's not
> > the jiffies counter that goes back and forth, it's sched_clock() - so
> > this is a printk timestamps anomaly, not related to jiffies. I thought
> > we have fixed this bug in the printk code already: sched_clock() is a
> > 'raw' interface that should not be used directly - the proper interface
> > is cpu_clock(cpu).
>
> It's a single CPU box, so sched_clock() jumping would still be
> problematic, no?

I guess so. Definitely, it didn't look like a printk issue. Drivers
don't read logs, usually. But they got confused anyway (it seems that
udelay() calls get scaled or fail or somesuch - I can't test it right
now, will provide more feedback in a few hours).

--
Ciao
Stefano
Re: [PATCH] scheduler: fix x86 regression in native_sched_clock
On Fri, 7 Dec 2007 11:40:13 +0100 Ingo Molnar <[EMAIL PROTECTED]> wrote:

>
> * Andrew Morton <[EMAIL PROTECTED]> wrote:
>
> > > -	t = printk_clock();
> > > +	t = cpu_clock(printk_cpu);
> > > 	nanosec_rem = do_div(t, 10);
> > > 	tlen = sprintf(tbuf,
> > > 			"<%c>[%5lu.%06lu] ",
> >
> > A bit risky - it's quite an expansion of code which no longer can call
> > printk.
> >
> > You might want to take that WARN_ON out of __update_rq_clock() ;)
>
> hm, dont we already detect printk recursions and turn them into a silent
> return instead of a hang/crash?
>

We'll pop the locks and will proceed to do the nested printk. So
__update_rq_clock() will need rather a lot of stack ;)
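
As an aside on the hunk quoted in this thread, the way a nanosecond clock value becomes the "[seconds.microseconds]" prefix can be sketched in userspace C. This is a rough illustration, not the kernel code: format_timestamp() is a hypothetical name, and the divisor is assumed to be nanoseconds per second (the `10` shown in the quoted do_div() call looks truncated by the archive; the "%5lu.%06lu" format is what suggests a seconds/microseconds split). In the kernel, do_div() divides its first argument in place and returns the remainder; plain `/` and `%` stand in for it here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define NSEC_PER_SEC 1000000000ULL  /* assumed divisor, see lead-in */

/*
 * Hypothetical helper: format a nanosecond timestamp as
 * "<L>[sssss.uuuuuu] " into buf, where L is the log level character.
 * Returns what snprintf() returns.
 */
static int format_timestamp(char *buf, size_t len, char level, uint64_t t_ns)
{
	/* Split into whole seconds and the nanosecond remainder. */
	unsigned long secs = (unsigned long)(t_ns / NSEC_PER_SEC);
	unsigned long nanosec_rem = (unsigned long)(t_ns % NSEC_PER_SEC);

	/* The remainder is printed in microseconds, zero-padded to 6 digits. */
	return snprintf(buf, len, "<%c>[%5lu.%06lu] ",
			level, secs, nanosec_rem / 1000);
}
```

Because the prefix is derived directly from the clock value, any backwards jump in the clock that feeds it shows up as a backwards jump in the printed timestamps.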
Re: [PATCH 2.6.24-rc3] Fix /proc/net breakage
Andrew Morton wrote:
> On Fri, 07 Dec 2007 04:51:37 + David Woodhouse <[EMAIL PROTECTED]> wrote:
>
>> On Mon, 2007-11-26 at 15:17 -0700, Eric W. Biederman wrote:
>>> Well I clearly goofed when I added the initial network namespace support
>>> for /proc/net. Currently things work but there are odd details visible
>>> to user space, even when we have a single network namespace.
>>>
>>> Since we do not cache proc_dir_entry dentries at the moment we can
>>> just modify ->lookup to return a different directory inode depending
>>> on the network namespace of the process looking at /proc/net, replacing
>>> the current technique of using a magic and fragile follow_link method.
>>>
>>> To accomplish that this patch:
>>> - introduces a shadow_proc method to allow different dentries to
>>> be returned from proc_lookup.
>>> - Removes the old /proc/net follow_link magic
>>> - Fixes a weakness in our not caching of proc generic dentries.
>>>
>>> As shadow_proc uses a task struct to decided which dentry to return we
>>> can go back later and fix the proc generic caching without modifying any
>>> code that
>>> uses the shadow_proc method.
>>>
>>> Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]>
>>> ---
>>> fs/proc/generic.c | 12 ++-
>>> fs/proc/proc_net.c | 86
>>> +++
>>> include/linux/proc_fs.h |3 ++
>>> 3 files changed, 19 insertions(+), 82 deletions(-)
>> (commit 2b1e300a9dfc3196ccddf6f1d74b91b7af55e416)
>>
>> This seems to have broken the use of /proc/bus/usb as a mountpoint. It
>> always appears empty now, whatever's supposed to be mounted there.
>>
>
> Yes. Denis and Eric are tossing around competing patches but afaik nobody
> is happy with any of them. Guys, could we get this sorted soonish please?
>

Andrew, I became too relaxed after receiving
"Tested-by: Giacomo Catenazzi <[EMAIL PROTECTED]>"

Eric, I believe that reverting to the original behavior is better than
your new one, as
- you introduce a search into the depth by calling have_submounts(dentry)
  during revalidation for all(!) /proc dentries
- your shadowing behavior will be broken if you mount something
  in the depth of the shadowed tree (this can be done as a DoS attempt)

As a last minute call, maybe it would be better to pin the network
namespace like a pid namespace during mount to avoid this crap
altogether?

Regards,
	Den
Re: RFC: outb 0x80 in inb_p, outb_p harmful on some modern AMD64 with MCP51 laptops
Rene Herman <[EMAIL PROTECTED]> writes:
>
> If there are no sensible fixes, an 0x80/0xed choice could I assume be
> hung of DMI or something (if that _is_ parsed soon enough).

Another possibility would be to key this off DMI year (or existence
of DMI year since old systems don't have it). I guess it would be
reasonable to not do any delays on anything modern.

On x86-64 it could be presumably always disabled too, although
I was always too chicken to do that.

-Andi
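
The DMI-year idea above can be written down as a tiny decision function. This is purely a sketch of the reasoning in the message, not existing kernel code: the function name io_delay_wanted(), the convention that a year of 0 means "no DMI year available", and the cutoff value are all hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cutoff: machines dated before this are treated as "old". */
#define IO_DELAY_CUTOFF_YEAR 2001

/*
 * Hypothetical helper: decide whether the port-based I/O delay should
 * still be performed. dmi_year <= 0 means no DMI year was found, which
 * the message above treats as a sign of an old system.
 */
static bool io_delay_wanted(int dmi_year)
{
	if (dmi_year <= 0)
		return true;	/* no DMI year: assume old hardware */
	return dmi_year < IO_DELAY_CUTOFF_YEAR;
}
```

Whether any such cutoff is safe is exactly the open question in the thread; the sketch only shows that the check itself is cheap once DMI data has been parsed.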
Re: 2.6.24-rc4-mm1: VDSOSYM build error
* Andrew Morton <[EMAIL PROTECTED]> wrote:

> On Thu, 6 Dec 2007 18:28:25 -0500
> "Miles Lane" <[EMAIL PROTECTED]> wrote:
>
> > How can I find Roland's patches, so I can try backing them out?
> > I looked in the broken out patches and only saw one related
> > to VDSO. Backing it out did not help. I tried searching for
> > messages to LKML sent by "roland" but mostly got a bunch of
> > folks sending spam.
>
> They're all clumped into git-x86.patch. Hard.

in theory the git merges could be generated as a flat series of patch
files:

   x86.git.foo-fixes.patch
   x86.git.bar-updates.patch
   x86.git.foo-fixes-feh.patch
   ...

which could also include the commit log. "git-log -p" might be a
suitable generator. For example, x86.git can be processed per commit,
via this script:

  for N in `git-rev-list --reverse --no-merges --remove-empty master..mm`; do
    git-log -p $N
  done

the following git-export-quilt script (just wrote it, might be buggy, so
careful - and it blows away the patches/ directory wherever you run it)
will generate a series file into patches/series that can be applied via
quilt:

  rm -rf patches
  mkdir patches

  for N in `git-rev-list --reverse --no-merges --remove-empty master..mm`; do
    git-log -p -1 $N > .tmp
    export SUBJECT=`head -5 .tmp | tail -1`

    # generate filename out of subject line:
    FILE=x86.git-"`echo $SUBJECT | cut -c10- | tr '[:punct:] \t' '-' | tr -s - | tr '[:upper:]' '[:lower:]'`"

    # generate unique name:
    while [ -f patches/$FILE.patch ]; do FILE="$FILE"_; done

    echo $FILE.patch
    mv .tmp patches/$FILE.patch
    echo $FILE.patch >> patches/series
  done

  ls -l patches/series

i ran this script over x86.git and it produced a patch series with 247
patches that quilt was able to push correctly. (in theory this concept
should work for other git trees too - but i have not tried it)

this would increase the series size quite substantially though - but it
would make cherry-picking and patch based bisection a lot easier.

	Ingo
Re: [PATCH] scheduler: fix x86 regression in native_sched_clock
On Fri, 7 Dec 2007 09:51:21 +0100, Ingo Molnar <[EMAIL PROTECTED]> wrote:

> yeah, we can do something like this in 2.6.25 - this will improve the
> quality of sched_clock().

Thanks a lot for your interest!

I'll clean it up and resend it later. As I don't have the necessary
knowledge to do the tsc_{32,64}.c unification, should I copy and paste
the common functions into tsc_32.c and tsc_64.c to ease later
unification, or should I start a common .c file?

Thanks again for showing interest.

--
Guillaume
Re: [BUG] 2.6.23-rc3 can't see sd partitions on Alpha
Andrew Morton wrote:
> On Thu, 6 Dec 2007 23:07:08 -0600 (CST) [EMAIL PROTECTED] (Bob Tracy) wrote:
> > Andrew Morton wrote:
> > > commit 6f37ac793d6ba7b35d338f791974166f67fdd9ba
> > > Merge: 2f1f53b... d90bf5a...
> > > Author: Linus Torvalds <[EMAIL PROTECTED]>
> > > Date: Wed Nov 14 18:51:48 2007 -0800
> > >
> > > Merge branch 'master' of
> > > master.kernel.org:/pub/scm/linux/kernel/git/davem/n
> > >
> > > * 'master' of
> > > master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
> > > (omitted for brevity)
> > >
> > > I'm struggling to see how any of those could have broken block device
> > > mounting on alpha. Are you sure you bisected right?
> >
> > Based on what's in that commit, it *does* appear something went wrong
> > with bisection. If the implicated commit is the next one in time
> > sequence relative to
> >
> > # good: [2f1f53bdc6531696934f6ee7bbdfa2ab4f4f62a3] CRISv10 fasttimer: Scrap
> > INLINE and name timeval_cmp better
> >
> > then the test of whether I bisected correctly is as simple as applying
> > the commit and seeing if things break, because I'm running on the
> > kernel corresponding to 2f1f53bdc6531696934f6ee7bbdfa2ab4f4f62a3 right
> > now. Let me give that a try and I'll report back. Worst case, I'll
> > have to start over and write off the past four days...
>
> Gad. I trust the second time will be faster.
>
> git-bisect _is_ very error prone. I find one of the problems is that each
> step is so far apart in time that you forget what you were doing. Did I
> remember to test that iteration? Did I install the right kernel? etc.
>
> > Sorry about this...
>
> Not appropriate ;) Thanks for helping out.

Thanks for the kind words... The above-mentioned test verified that the
bisection was/is correct: 2f1f53bdc6531696934f6ee7bbdfa2ab4f4f62a3 works,
and 6f37ac793d6ba7b35d338f791974166f67fdd9ba doesn't. Now I've got to
figure out why.

"git diff 2f1f53bdc6531696934f6ee7bbdfa2ab4f4f62a3 6f37ac793d6ba7b35d338f791974166f67fdd9ba"
produced a relatively short patch (18,437 bytes). The list of involved
files:

diff --git a/drivers/char/random.c b/drivers/char/random.c
diff --git a/drivers/isdn/sc/card.h b/drivers/isdn/sc/card.h
diff --git a/drivers/isdn/sc/packet.c b/drivers/isdn/sc/packet.c
diff --git a/drivers/isdn/sc/shmem.c b/drivers/isdn/sc/shmem.c
diff --git a/drivers/net/arm/ep93xx_eth.c b/drivers/net/arm/ep93xx_eth.c
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
diff --git a/drivers/net/fs_enet/Kconfig b/drivers/net/fs_enet/Kconfig
diff --git a/drivers/net/fs_enet/Makefile b/drivers/net/fs_enet/Makefile
diff --git a/drivers/net/netx-eth.c b/drivers/net/netx-eth.c
diff --git a/drivers/net/s2io.c b/drivers/net/s2io.c
diff --git a/drivers/net/wireless/iwlwifi/iwl3945-base.c b/drivers/net/wireless/iwlwifi/iwl3945-base.c
diff --git a/include/net/sock.h b/include/net/sock.h
diff --git a/kernel/sysctl_check.c b/kernel/sysctl_check.c
diff --git a/net/core/dev.c b/net/core/dev.c
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c

Current state of the source tree is the 6f37ac... version, so I'll start
backing out the above diffs in related groups and continue until I've
got a working kernel. For lack of an obvious target, I'll start with the
seemingly innocuous change to sysctl_check.c.

I'll report back when I've got something.

--
Bob Tracy          | "They couldn't hit an elephant at this dist- "
[EMAIL PROTECTED]  |   - Last words of Union General John Sedgwick,
                   |     Battle of Spotsylvania Court House, U.S. Civil War