Re: refcounting drivers' data structures used in sysfs buffers
On Mon, 12 Mar 2007, Dmitry Torokhov wrote:

> Do you think Linus would listen if all three of us (plus maybe Greg)
> tried to convince him? If we'd accompany the argument with the patch
> that changes scsi to use wq to perform deletion so we don't have
> deadlock regression in the kernel he might be more perceptive...

I wrote that patch over the weekend but forgot to bring it in to work.
I'll post it tonight or tomorrow.

> He is right about lifetime issues but this is not strictly lifetime
> issue as you correctly point out. Plus, refcounting also bloats the
> kernel so I don't really want to use refcount for every integer I
> happen to export through sysfs if I can simply revoke access.

Agreed.

Alan Stern

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2
On Tue, 2007-03-13 at 07:38 +1100, Con Kolivas wrote:
> On Tuesday 13 March 2007 07:11, Mike Galbraith wrote:
>> Killing the known corner case starvation scenarios is wonderful, but
>> let's not just pretend that interactive tasks don't have any special
>> requirements.
>
> Now you're really making a stretch of things. Where on earth did I say
> that interactive tasks don't have special requirements? It's a
> fundamental feature of this scheduler that I go to great pains to get
> them as low latency as possible and their fair share of cpu despite
> having a completely fair cpu distribution.

As soon as your cpu is fully utilized, fairness loses or interactivity
loses. Pick one.

	-Mike
Slab corruption - file_free_rcu ?
Folks,

I'm getting this sort of message in my logs on occasion and my system
dies on me some time later.

Mar 13 08:52:02 localhost kernel: [ 343.931624] Slab corruption: start=d2756f04, len=208
Mar 13 08:52:02 localhost kernel: [ 343.932366] Redzone: 0x5a2cf071/0x5a2cf071.
Mar 13 08:52:02 localhost kernel: [ 343.932797] Last user: [<c0155562>](file_free_rcu+0xf/0x11)
Mar 13 08:52:02 localhost kernel: [ 343.933429] 090: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 75 6b
Mar 13 08:52:02 localhost kernel: [ 343.934225] 0a0: 6b 6b 6b 6b 6b 6b 00 6b 6b 6b 6b 6b 6b 6b 00 6b
Mar 13 08:52:02 localhost kernel: [ 343.934999] 0b0: 6b 6b 6b 6b 6b 6b ad 6b 6b 6b 6b 6b 6b 6b 6b 6b
Mar 13 08:52:02 localhost kernel: [ 343.935995] Prev obj: start=d2756e28, len=208
Mar 13 08:52:02 localhost kernel: [ 343.936740] Redzone: 0x5a2cf071/0x5a2cf071.
Mar 13 08:52:02 localhost kernel: [ 343.937182] Last user: [<c0155562>](file_free_rcu+0xf/0x11)
Mar 13 08:52:02 localhost kernel: [ 343.937682] 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Mar 13 08:52:02 localhost kernel: [ 343.938473] 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Mar 13 09:06:37 localhost kernel: klogd 1.4.1#20, log source = /proc/kmsg started.

Kernel is Linus' git tree as of 3 days ago (i.e. post 2.6.21rc3). I do
have some DCCP changes in there but those modules were not loaded at
the time.

I've had a quick look through lkml archives and can't find anything on
this in last few days. Apologies if I've missed something.

I'm not sure how long this has been occurring. I have been having slab
corruption on earlier kernels but they did not put an identifier on
last usage. This may have been issues with my Broadcom 4306 card as
this used to have lots of errors as well, until more recent kernels
where stability on that is much better.

My config is attached. Please cc me on any queries as I'm not
subscribed to lkml.
Regards, Ian -- Web: http://wand.net.nz/~iam4 Blog: http://iansblog.jandi.co.nz WAND Network Research Group config.gz Description: GNU Zip compressed data
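For readers decoding the dump above: 0x6b is the byte the slab debug code writes over freed objects, and the last byte of a poisoned object is 0xa5, so any other value (0x75 at offset 0x9e, 0x00 at 0xa6 and 0xae, 0xad at 0xb6) means something wrote to the object after it was freed. A minimal userspace sketch of that kind of check follows. It is only an illustration, not the kernel's own routine; the function name first_poison_mismatch() is made up for this example, and the two poison constants are assumed to match include/linux/poison.h.

```c
#include <stddef.h>
#include <stdint.h>

#define POISON_FREE 0x6b  /* assumed fill byte for freed slab objects */
#define POISON_END  0xa5  /* assumed final byte of a poisoned object */

/*
 * Hypothetical helper: return the offset of the first byte that no
 * longer holds the expected poison value, or -1 if the object is intact.
 */
static long first_poison_mismatch(const uint8_t *obj, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        /* Every byte is POISON_FREE except the very last one. */
        uint8_t expected = (i == len - 1) ? POISON_END : POISON_FREE;

        if (obj[i] != expected)
            return (long)i;
    }
    return -1;
}
```

Run against a 208-byte object like the one in the report, this would stop at the first modified byte; the offset of the damage within the structure is often the best hint as to which field a stale pointer is still writing.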
Re: [PATCH] x86_64, i386: Add command line length to boot protocol
On Mon, Mar 12, 2007 at 10:43:52AM +, Pavel Machek wrote:
> On Tue 2007-03-06 13:21:34, Dave Jones wrote:
>> On Tue, Mar 06, 2007 at 07:14:30PM +0100, Bernhard Walle wrote:
>>> +cmdline_size: .long COMMAND_LINE_SIZE-1 #length of the command line,
>>
>> Why a long? It's unlikely that someone is going to have a command
>> line bigger than 0x.
>
> Well, I could imagine overflowing that. Describing your numa setup,
> excluding few bad bits of ram using memmap=exact, set up your boot
> over iscsi on cmdline these are likely to eat insane amount of
> cmdline space.

65535 characters? Are you for real? Stop and think about just how big
that is. If you have to create a boot command line that long, you have
serious, serious issues.

	Dave

-- 
http://www.codemonkey.org.uk
Re: [PATCH] Software Suspend: Fix suspend when console is in VT_AUTO+KD_GRAPHICS mode
Hi!

> When the console is in VT_AUTO+KD_GRAPHICS mode, switching to the
> SUSPEND_CONSOLE fails, resulting in vt_waitactive() waiting
> indefinitely or until the task is interrupted. This patch tests if a
> console switch can occur in set_console() and returns early if a
> console switch is not possible.
>
> Signed-off-by: Andrew Johnson [EMAIL PROTECTED]

ACK. (I hope it still applies to latest mainline).

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: [PATCH] MPT FUSION: Delete unused header files.
From: Moore, Eric [EMAIL PROTECTED]
Date: Mon, 12 Mar 2007 10:19:18 -0600

> Valdis.Kletnieks silly little rant:
>> Certainly appropriate content for something on your website, and
>> vendors who provide programs like dmidecode and parsemce are always
>> welcome. I could probably be convinced that such info should have at
>> least a pointer somewhere in Documentation/lsi_debug.txt or some
>> such. But quite frankly, if I'm reduced to wading through *.h files
>> to figure out what some recalcitrant hardware is upset about, there's
>> been a failure in documentation. *ESPECIALLY* if I go look at
>> drivers/whatever/source.c and it doesn't even *reference* the *.h
>> file in question.
>
> It's apparent to me that you don't have our hardware, nor have you
> actually waded thru this driver source code. If you did, you would
> have noticed that the header you want to delete is actually referenced
> in the *.c source code. The file mpi_log_fc.h is indeed mentioned in
> mptbase.c, in the function called mpt_fc_log_info, in the
> documentation section above the function. This header file is very
> helpful to those supporting our hardware, and those using it.
>
> For SAS (mpi_log_sas.h), I have broken out each loginfo in the strings
> you will find defined in originator_str, iop_code_str, pl_code_str,
> etc. I probably do that with fibre. If it's that important to you to
> have the header files included, I will provide a patch that does that.

If you're going to include it just for the sake of including it, not
because the code in question actually uses types or function
declarations defined in there, don't bother, you're just using an
anti-social mechanism to keep this header file in the tree.

Please, let's kill this header file if it is unused.
Re: [PATCH 1/2] mm: move common segment checks to separate helper function (v6)
Nick Piggin [EMAIL PROTECTED] writes:

> On Mon, Mar 12, 2007 at 10:57:53AM +0300, Dmitriy Monakhov wrote:
>> I really don't want to be annoying by sending this patchset over and
>> over again. If anyone thinks this patch is really crappy, please
>> comment on what exactly is bad. Thank you.
>
> Doesn't seem like a bad idea.
>
>> Changes:
>> - patch was split in two patches.
>>
>> +/*
>> + * Performs necessary checks before doing a write
>> + *
>> + * Adjust number of segments and amount of bytes to write.
>> + * Returns appropriate error code that caller should return or
>> + * zero in case that write should be allowed.
>> + */
>> +inline int generic_segment_checks(const struct iovec *iov,
>> +                unsigned long *nr_segs, size_t *count,
>> +                unsigned long access_flags)
>
> Make it static and not inline, and the compiler will work it out.

Wow, I've just carefully checked and found more functions with
duplicated code:
  fs/xfs/linux-2.6/xfs_lrw.c:655  xfs_write()
  fs/ntfs/file.c:2339  ntfs_file_aio_write_nolock()
So I think nobody will object to exporting generic_segment_checks()
and removing the duplicated code.

> This function name doesn't really imply that it returns you the
> nr_segs and count, but that's not a big deal I guess. You also don't
> say that nr_segs should be initialised to the amount you wish to
> write, while count must be initialised to zero.
>
>> +{
>> +        unsigned long seg;
>> +        for (seg = 0; seg < *nr_segs; seg++) {
>> +                const struct iovec *iv = &iov[seg];
>> +
>> +                /*
>> +                 * If any segment has a negative length, or the cumulative
>> +                 * length ever wraps negative then return -EINVAL.
>> +                 */
>> +                *count += iv->iov_len;
>> +                if (unlikely((ssize_t)(*count|iv->iov_len) < 0))
>> +                        return -EINVAL;
>> +                if (access_ok(access_flags, iv->iov_base, iv->iov_len))
>> +                        continue;
>
> Why not invert the above test, and put the below statements inside the
> branch? OTOH, that makes it less obviously cp from the others. Maybe a
> subsequent patch.
>
>> +                if (seg == 0)
>> +                        return -EFAULT;
>> +                *nr_segs = seg;
>> +                *count -= iv->iov_len;  /* This segment is no good */
>> +                break;
>> +        }
>
> You could assign to *count here, once, and remove the requirement that
> the caller initialised it to zero?
>
>> +        return 0;
>> +}
>> +
>>  /**
>>   * generic_file_aio_read - generic filesystem read routine
>>   * @iocb: kernel I/O control block
>> @@ -1180,24 +1213,9 @@ generic_file_aio_read(struct kiocb *iocb, const struct iovec *iov,
>>          loff_t *ppos = &iocb->ki_pos;
>>
>>          count = 0;
>> -        for (seg = 0; seg < nr_segs; seg++) {
>> -                const struct iovec *iv = &iov[seg];
>> -
>> -                /*
>> -                 * If any segment has a negative length, or the cumulative
>> -                 * length ever wraps negative then return -EINVAL.
>> -                 */
>> -                count += iv->iov_len;
>> -                if (unlikely((ssize_t)(count|iv->iov_len) < 0))
>> -                        return -EINVAL;
>> -                if (access_ok(VERIFY_WRITE, iv->iov_base, iv->iov_len))
>> -                        continue;
>> -                if (seg == 0)
>> -                        return -EFAULT;
>> -                nr_segs = seg;
>> -                count -= iv->iov_len;   /* This segment is no good */
>> -                break;
>> -        }
>> +        retval = generic_segment_checks(iov, &nr_segs, &count, VERIFY_WRITE);
>> +        if (retval)
>> +                return retval;
>>
>>          /* coalesce the iovecs and go direct-to-BIO for O_DIRECT */
>>          if (filp->f_flags & O_DIRECT) {
>> @@ -2094,30 +2112,14 @@ __generic_file_aio_write_nolock(struct kiocb *iocb, const struct iovec *iov,
>>          size_t ocount;          /* original count */
>>          size_t count;           /* after file limit checks */
>>          struct inode *inode = mapping->host;
>> -        unsigned long seg;
>>          loff_t pos;
>>          ssize_t written;
>>          ssize_t err;
>>
>>          ocount = 0;
>> -        for (seg = 0; seg < nr_segs; seg++) {
>> -                const struct iovec *iv = &iov[seg];
>> -
>> -                /*
>> -                 * If any segment has a negative length, or the cumulative
>> -                 * length ever wraps negative then return -EINVAL.
>> -                 */
>> -                ocount += iv->iov_len;
>> -                if (unlikely((ssize_t)(ocount|iv->iov_len) < 0))
>> -                        return -EINVAL;
>> -                if (access_ok(VERIFY_READ, iv->iov_base, iv->iov_len))
>> -                        continue;
>> -                if (seg == 0)
>> -                        return -EFAULT;
>> -                nr_segs = seg;
>> -                ocount -= iv->iov_len;  /* This segment is no good */
>> -                break;
>> -        }
>> +        err = generic_segment_checks(iov, &nr_segs, &ocount, VERIFY_READ);
>> +        if (err)
>> +                return err;
>>
>>          count = ocount;
>>          pos = *ppos;
>> --
>> 1.5.0.1
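The loop under review can be tried outside the kernel. The sketch below is a userspace approximation only, under stated assumptions: segment_checks() is a renamed copy of the helper, the range_ok callback and the non_null_range() checker are hypothetical stand-ins for access_ok(), and the total is kept in a local and stored to *count once at the end, which is the variant Nick suggests so that callers need not zero it first.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical stand-in for access_ok(): nonzero if the range is usable. */
typedef int (*range_ok_fn)(const void *base, size_t len);

/* Trivial example checker (an assumption): only NULL bases are rejected. */
static int non_null_range(const void *base, size_t len)
{
    (void)len;
    return base != NULL;
}

/*
 * Validate an iovec array. On success *nr_segs is trimmed to the number
 * of usable leading segments and *count holds their total length.
 */
static int segment_checks(const struct iovec *iov, unsigned long *nr_segs,
                          size_t *count, range_ok_fn range_ok)
{
    unsigned long seg;
    size_t cnt = 0;

    for (seg = 0; seg < *nr_segs; seg++) {
        const struct iovec *iv = &iov[seg];

        /* A negative length or a wrapped running total is invalid. */
        cnt += iv->iov_len;
        if ((ssize_t)(cnt | iv->iov_len) < 0)
            return -EINVAL;
        if (range_ok(iv->iov_base, iv->iov_len))
            continue;
        /* A bad first segment is fatal; a later one truncates the request. */
        if (seg == 0)
            return -EFAULT;
        *nr_segs = seg;
        cnt -= iv->iov_len;     /* This segment is no good */
        break;
    }
    *count = cnt;               /* assigned once, caller need not pre-zero */
    return 0;
}
```

With three segments where the second has a NULL base, the call returns 0, leaves *nr_segs at 1 and sets *count to the length of the first segment, so a partially valid request is shortened rather than rejected.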
[PATCH 2/2] incorrect direct io error handling (v7)
Changes against v6:
 - Handle direct_io failure inside generic_file_direct_write() as it
   was recommended by Andrew (during discussion of v1) and by Nick
   (during discussion of v6).
 - Change comments, make them more clear.
 - Check once more that __generic_file_aio_write_nolock() is always
   called under i_mutex for non-blkdev files.

Tested with: fsstress, manual direct_io tests

Log:
If generic_file_direct_write() has failed (ENOSPC condition) inside
__generic_file_aio_write_nolock(), it may have instantiated a few
blocks outside i_size. fsck will then complain about a wrong i_size
(ext2, ext3 and reiserfs interpret a difference between i_size and the
biggest block as an error). After fsck fixes the error, i_size will be
increased to the biggest block, but these blocks contain garbage from
the previous write attempt. This is not an information leak, but it is
silent file data corruption. This issue affects filesystems regardless
of the values of blocksize or pagesize, and of course only for
non-blkdev files.

We need to truncate any block beyond i_size after a write has failed,
as is done in the generic_file_buffered_write() error path. We may
safely call vmtruncate() here because i_mutex is always held for
non-blkdev files.

TEST_CASE:
open("/mnt/test/BIG_FILE", O_WRONLY|O_CREAT|O_DIRECT, 0666) = 3
write(3, "aaa"..., 104857600) = -1 ENOSPC (No space left on device)

#stat /mnt/test/BIG_FILE
  File: `/mnt/test/BIG_FILE'
  Size: 0  Blocks: 110896  IO Block: 1024  regular empty file
        file size is less than biggest block idx
Device: fe07h/65031d  Inode: 14  Links: 1
Access: (0644/-rw-r--r--)  Uid: (0/root)  Gid: (0/root)
Access: 2007-01-24 20:03:38.0 +0300
Modify: 2007-01-24 20:03:38.0 +0300
Change: 2007-01-24 20:03:39.0 +0300

#fsck.ext3 -f /dev/VG/test
e2fsck 1.39 (29-May-2006)
Pass 1: Checking inodes, blocks, and sizes
Inode 14, i_size is 0, should be 56556544. Fix<y>? yes
Pass 2: Checking directory structure

Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED]
---
 mm/filemap.c | 28
 1 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 8bd1ea4..95d49fe 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1932,8 +1932,10 @@ generic_file_direct_write(struct kiocb *iocb, const struct iovec *iov,
         /*
          * Sync the fs metadata but not the minor inode changes and
          * of course not the data as we did direct DMA for the IO.
-         * i_mutex is held, which protects generic_osync_inode() from
-         * livelocking. AIO O_DIRECT ops attempt to sync metadata here.
+         * i_mutex is held in case of DIO_LOCKING, which protects
+         * generic_osync_inode() from livelocking. If it is not held, then
+         * the filesystem must prevent this livelock. AIO O_DIRECT ops
+         * attempt to sync metadata here.
          */
         if ((written >= 0 || written == -EIOCBQUEUED) &&
             ((file->f_flags & O_SYNC) || IS_SYNC(inode))) {
@@ -2155,8 +2157,26 @@ __generic_file_aio_write_nolock(struct kiocb *iocb, const struct iovec *iov,
                 loff_t endbyte;
                 ssize_t written_buffered;

+                /*
+                 * In case of non blockdev we may fail to buffered I/O.
+                 * So i_mutex must be held.
+                 */
+                if (!S_ISBLK(inode->i_mode))
+                        BUG_ON(!mutex_is_locked(&inode->i_mutex));
+
                 written = generic_file_direct_write(iocb, iov, &nr_segs, pos,
                                                         ppos, count, ocount);
+                /*
+                 * If host is not S_ISBLK generic_file_direct_write() may
+                 * have instantiated a few blocks outside i_size files
+                 * Trim these off again.
+                 */
+                if (unlikely(written < 0) && !S_ISBLK(inode->i_mode)) {
+                        loff_t isize = i_size_read(inode);
+                        if (pos + count > isize)
+                                vmtruncate(inode, isize);
+                }
+
                 if (written < 0 || written == count)
                         goto out;
                 /*
@@ -2261,8 +2281,8 @@ ssize_t generic_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
 EXPORT_SYMBOL(generic_file_aio_write);

 /*
- * Called under i_mutex for writes to S_ISREG files. Returns -EIO if something
- * went wrong during pagecache shootdown.
+ * Called under i_mutex for writes to S_ISREG files in case of DIO_LOCKING.
+ * Returns -EIO if something went wrong during pagecache shootdown.
  */
 static ssize_t
 generic_file_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
--
1.5.0.1
[PATCH 1/2] mm: move common segment checks to separate helper function (v7)
Changes against v6:
 - remove duplicated code from xfs, ntfs
 - export generic_segment_checks(), because it is used by xfs and ntfs
   now.
 - change arguments initialization policy according to Nick's comments.

Tested with: ltp readv/writev tests

Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED]
---
 fs/ntfs/file.c             | 21 ++-
 fs/xfs/linux-2.6/xfs_lrw.c | 22 ++--
 include/linux/fs.h         |  3 ++
 mm/filemap.c               | 83 ---
 4 files changed, 55 insertions(+), 74 deletions(-)

diff --git a/fs/ntfs/file.c b/fs/ntfs/file.c
index dbbac55..621de36 100644
--- a/fs/ntfs/file.c
+++ b/fs/ntfs/file.c
@@ -2129,28 +2129,13 @@ static ssize_t ntfs_file_aio_write_nolock(struct kiocb *iocb,
         struct address_space *mapping = file->f_mapping;
         struct inode *inode = mapping->host;
         loff_t pos;
-        unsigned long seg;
         size_t count;           /* after file limit checks */
         ssize_t written, err;

         count = 0;
-        for (seg = 0; seg < nr_segs; seg++) {
-                const struct iovec *iv = &iov[seg];
-                /*
-                 * If any segment has a negative length, or the cumulative
-                 * length ever wraps negative then return -EINVAL.
-                 */
-                count += iv->iov_len;
-                if (unlikely((ssize_t)(count|iv->iov_len) < 0))
-                        return -EINVAL;
-                if (access_ok(VERIFY_READ, iv->iov_base, iv->iov_len))
-                        continue;
-                if (!seg)
-                        return -EFAULT;
-                nr_segs = seg;
-                count -= iv->iov_len;   /* This segment is no good */
-                break;
-        }
+        err = generic_segment_checks(iov, &nr_segs, &count, VERIFY_READ);
+        if (err)
+                return err;
         pos = *ppos;
         vfs_check_frozen(inode->i_sb, SB_FREEZE_WRITE);
         /* We can write back this queue in page reclaim. */
diff --git a/fs/xfs/linux-2.6/xfs_lrw.c b/fs/xfs/linux-2.6/xfs_lrw.c
index ff8d64e..558076d 100644
--- a/fs/xfs/linux-2.6/xfs_lrw.c
+++ b/fs/xfs/linux-2.6/xfs_lrw.c
@@ -639,7 +639,6 @@ xfs_write(
         xfs_fsize_t isize, new_size;
         xfs_iocore_t *io;
         bhv_vnode_t *vp;
-        unsigned long seg;
         int iolock;
         int eventsent = 0;
         bhv_vrwlock_t locktype;
@@ -652,24 +651,9 @@ xfs_write(
         vp = BHV_TO_VNODE(bdp);
         xip = XFS_BHVTOI(bdp);

-        for (seg = 0; seg < segs; seg++) {
-                const struct iovec *iv = &iovp[seg];
-
-                /*
-                 * If any segment has a negative length, or the cumulative
-                 * length ever wraps negative then return -EINVAL.
-                 */
-                ocount += iv->iov_len;
-                if (unlikely((ssize_t)(ocount|iv->iov_len) < 0))
-                        return -EINVAL;
-                if (access_ok(VERIFY_READ, iv->iov_base, iv->iov_len))
-                        continue;
-                if (seg == 0)
-                        return -EFAULT;
-                segs = seg;
-                ocount -= iv->iov_len;  /* This segment is no good */
-                break;
-        }
+        error = generic_segment_checks(iovp, &segs, &ocount, VERIFY_READ);
+        if (error)
+                return error;

         count = ocount;
         pos = *offset;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 6a3d22e..3b99450 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1778,6 +1778,9 @@ extern ssize_t generic_file_sendfile(struct file *, loff_t *, size_t, read_actor
 extern void do_generic_mapping_read(struct address_space *mapping,
                                     struct file_ra_state *, struct file *,
                                     loff_t *, read_descriptor_t *, read_actor_t);
+extern int generic_segment_checks(const struct iovec *iov,
+                unsigned long *nr_segs, size_t *count,
+                unsigned long access_flags);

 /* fs/splice.c */
 extern ssize_t generic_file_splice_read(struct file *, loff_t *,
diff --git a/mm/filemap.c b/mm/filemap.c
index 8e1849a..8bd1ea4 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1159,6 +1159,46 @@ success:
         return size;
 }

+/*
+ * Performs necessary checks before doing a write
+ * @iov:        io vector request
+ * @nr_segs:    number of segments in the iovec
+ * @count:      number of bytes to write
+ * @access_flags: type of access: %VERIFY_READ or %VERIFY_WRITE
+ *
+ * Adjust number of segments and amount of bytes to write (nr_segs should be
+ * properly initialized first). Returns appropriate error code that caller
+ * should return or zero in case that write should be allowed.
+ */
+int generic_segment_checks(const struct iovec *iov,
+                        unsigned long *nr_segs, size_t *count,
+                        unsigned long access_flags)
+{
+        unsigned
Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!
Quoting Srivatsa Vaddagiri ([EMAIL PROTECTED]):
> On Fri, Mar 09, 2007 at 02:09:35PM -0800, Paul Menage wrote:
>>> 3. This next leads me to think that 'tasks' file in each directory
>>> doesnt make sense for containers. In fact it can lend itself to
>>> error situations (by administrator/script mistake) when some tasks
>>> of a container are in one resource class while others are in a
>>> different class.
>>>
>>> Instead, from a containers pov, it may be usefull to write a
>>> 'container id' (if such a thing exists) into the tasks file which
>>> will move all the tasks of the container into the new resource
>>> class. This is the same requirement we discussed long back of moving
>>> all threads of a process into new resource class.
>>
>> I think you need to give a more concrete example and use case of what
>> you're trying to propose here. I don't really see what advantage
>> you're getting.
>
> Ok, this is what I had in mind:
>
>     mount -t container -o ns /dev/namespace
>     mount -t container -o cpu /dev/cpu
>
> Lets we have the namespaces/resource-groups created as under:
>
>     /dev/namespace
>         |-- prof
>         |    |- tasks - (T1, T2)
>         |    |- container_id - 1 (doesnt exist today perhaps)
>         |
>         |-- student
>         |    |- tasks - (T3, T4)
>         |    |- container_id - 2 (doesnt exist today perhaps)
>
>     /dev/cpu
>         |-- prof
>         |    |-- tasks
>         |    |-- cpu_limit (40%)
>         |
>         |-- student
>         |    |-- tasks
>         |    |-- cpu_limit (20%)
>         |
>         |
>
> Is it possible to create the above structure in container patches?
> /me thinks so.
>
> If so, then accidentally someone can do this:
>
>     echo T1 > /dev/cpu/prof/tasks
>     echo T2 > /dev/cpu/student/tasks
>
> with the result that tasks of the same container are now in different
> resource classes.

What's wrong with that?

> Thats why in case of containers I felt we shldnt allow individual
> tasks to be cat'ed to tasks file.
>
> Or rather, it may be nice to say :
>
>     echo "cid 2" > /dev/cpu/prof/tasks
>
> and have all tasks belonging to container id 2 move to the new
> resource group.

Adding that feature sounds fine, but don't go stopping me from putting
T1 into /dev/cpu/prof/tasks and T2 into /dev/cpu/student/tasks just
because you have your own notion of what each task is supposed to be.
Just because they're in the same namespaces doesn't mean they should
get the same resource allocations. If you want to add that kind of
policy, well, it should be policy - user-definable.

-serge
Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!
On Mon, Mar 12, 2007 at 10:56:43AM -0500, Serge E. Hallyn wrote:
> What's wrong with that?

I had been asking around on what is the fundamental unit of res mgmt
for vservers and the answer I got (from Herbert) was all tasks that
are in the same pid namespace. From what you are saying above, it
seems to be that there is no such fundamental unit. It can be a random
mixture of tasks (taken across vservers) whose resource consumption
needs to be controlled. Is that correct?

>>     echo "cid 2" > /dev/cpu/prof/tasks
>
> Adding that feature sounds fine,

Ok yes ..that can be an optional feature.

-- 
Regards,
vatsa
Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!
Quoting Srivatsa Vaddagiri ([EMAIL PROTECTED]):
> On Mon, Mar 12, 2007 at 10:56:43AM -0500, Serge E. Hallyn wrote:
>> What's wrong with that?
>
> I had been asking around on what is the fundamental unit of res mgmt
> for vservers and the answer I got (from Herbert) was all tasks that
> are in the same pid namespace. From what you are saying above, it
> seems to be that there is no such fundamental unit. It can be a random
> mixture of tasks (taken across vservers) whose resource consumption
> needs to be controlled. Is that correct?

If I'm reading it right, yes.

If for vservers the fundamental unit of res mgmt is a vserver, that
can surely be done at a higher level than in the kernel.

Actually, these could be tied just by doing

    mount -t container -o ns,cpuset /containers

So now any task in /containers/vserver1 or any subdirectory thereof
would have the same cpuset constraints as /containers. OTOH, you could
mount them separately

    mount -t container -o ns /nsproxy
    mount -t container -o cpuset /cpuset

and now you have the freedom to split tasks in the same vserver (under
/nsproxy/vserver1) into different cpusets.

-serge

>>>     echo "cid 2" > /dev/cpu/prof/tasks
>>
>> Adding that feature sounds fine,
>
> Ok yes ..that can be a optional feature.
>
> --
> Regards,
> vatsa
Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!
vatsa wrote:
> This assumes that you can see the global vfs namespace right? What if
> you are inside a container/vserver which restricts your vfs namespace?
> i.e /dev/cpusets seen from one container is not same as what is seen
> from another container.

Well, yes. But that restriction on the namespace is no doing of
cpusets.

It's some vfs namespace restriction, which should be an orthogonal
mechanism.

Well, it's probably not orthogonal at present. Cpusets might not yet
handle a restricted vfs name space very well.

For example the /proc/pid/cpuset path, giving the path below
/dev/cpuset of task pid's cpuset, might not be restricted. And the set
of all CPUs and Memory Nodes that are online, which is visible in
various /proc files, and also visible in one's top cpuset, might be
inconsistent if a restricted vfs namespace mapped you to a different
top cpuset.

There are probably other loose ends as well.

-- 
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson [EMAIL PROTECTED] 1.925.600.0401
Re: [ckrm-tech] [PATCH 1/7] containers (V7): Generic container system abstracted from cpusets code
On Sun, Mar 11, 2007 at 12:38:43PM -0700, Paul Jackson wrote:
> The primary reason for the cpuset double locking, as I recall, was
> because cpusets needs to access cpusets inside the memory allocator.

"needs to access cpusets" - can you be more specific? Being able to
safely walk the cpuset->parent list - is it the only access you are
talking of or more?

-- 
Regards,
vatsa
Re: Keyboard stops working after *lock [Was: 2.6.21-rc2-mm1]
(trimmed CC list a bit)

On Mon, 12 Mar 2007, Jiri Slaby wrote:

>> UHCI: Eliminate asynchronous skeleton Queue Headers
>> Post it along with the usbmon log, and I'll try to figure out what
>> happened.
>
> Here it comes:
> USBMON:
> f7525b40 1832950485 C Ii:004:01 0 8 = 5300
> f7525b40 1832950517 S Ii:004:01 -115 8
> f7525140 1832950540 S Co:004:00 s 21 09 0200 0001 1 = 01
> f7525140 1832952485 C Co:004:00 0 1
>
> Corresponds to numlock; 7; numlock; 7.

Jiri, thanks.

Could you also please redo the test with the offending uhci patch
reverted and send the output of a working situation?

Thanks,

-- 
Jiri Kosina
Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2
Mike Galbraith [EMAIL PROTECTED] writes:

[snip]

>> And let's not lose sight of things with this one testcase.
>>
>> RSDL fixes
>> - every starvation case
>> - all fairness issues
>> - is better 95% of the time on the desktop
>
> I don't know where you got that 95% number from. For the most part,
> the existing scheduler does well. If it sucked 95% of the time, it
> would have been shredded a long time ago.

I tell you.

http://article.gmane.org/gmane.linux.kernel/500027
http://article.gmane.org/gmane.linux.kernel/502996
http://article.gmane.org/gmane.linux.kernel/500119
http://article.gmane.org/gmane.linux.kernel/500784
http://article.gmane.org/gmane.linux.kernel/500768
http://article.gmane.org/gmane.linux.kernel/502255
http://article.gmane.org/gmane.linux.kernel/502282
http://article.gmane.org/gmane.linux.kernel/503650
http://article.gmane.org/gmane.linux.kernel/503695
http://article.gmane.org/gmane.linux.kernel.ck/6512
http://article.gmane.org/gmane.linux.kernel.ck/6539
http://article.gmane.org/gmane.linux.kernel.ck/6565

Also, count my email too. I've been using RSDL since day one on my
laptop and my router/compute server and I won't come back to mainline,
needless to say why.
[PATCH] mtd: PMC MSP71xx flash/rootfs mappings
[PATCH] mtd: PMC MSP71xx flash/rootfs mappings Patch to add flash and rootfs mappings for the PMC-Sierra MSP71xx devices. This patch references some platform support files previously submitted to the [EMAIL PROTECTED] list. Thanks, Marc Signed-off-by: Marc St-Jean [EMAIL PROTECTED] --- This patch was first posted on Feb. 23rd. I didn't receive any feedback but I'm reposting based on feedback to other patches. If this is no longer the maintainer address please let me know. Changes: -Cleanup on style and formatting for comments, macros, etc. Kconfig | 33 + Makefile |2 pmcmsp-flash.c | 184 +++ pmcmsp-ramroot.c | 105 +++ 4 files changed, 324 insertions(+) diff --git a/drivers/mtd/maps/Kconfig b/drivers/mtd/maps/Kconfig index bbf0553..e28a1ad 100644 --- a/drivers/mtd/maps/Kconfig +++ b/drivers/mtd/maps/Kconfig @@ -69,6 +69,39 @@ config MTD_PHYSMAP_OF physically into the CPU's memory. The mapping description here is taken from OF device tree. +config MTD_PMC_MSP_EVM + tristate CFI Flash device mapped on PMC-Sierra MSP + depends on PMC_MSP MTD_CFI + select MTD_PARTITIONS + help + This provides a 'mapping' driver which support the way + in which user-programmable flash chips are connected on the + PMC-Sierra MSP eval/demo boards + +choice + prompt Maximum mappable memory avialable for flash IO + depends on MTD_PMC_MSP_EVM + default MSP_FLASH_MAP_LIMIT_32M + +config MSP_FLASH_MAP_LIMIT_32M + bool 32M + +endchoice + +config MSP_FLASH_MAP_LIMIT + hex + default 0x0200 + depends on MSP_FLASH_MAP_LIMIT_32M + +config MTD_PMC_MSP_RAMROOT + tristate Embedded RAM block device for root on PMC-Sierra MSP + depends on PMC_MSP_EMBEDDED_ROOTFS \ + (MTD_BLOCK || MTD_BLOCK_RO) \ + MTD_RAM + help + This provides support for the embedded root file system + on PMC MSP devices. This memory is mapped as a MTD block device. 
+ config MTD_SUN_UFLASH tristate Sun Microsystems userflash support depends on SPARC MTD_CFI diff --git a/drivers/mtd/maps/Makefile b/drivers/mtd/maps/Makefile index 071d0bf..de036c5 100644 --- a/drivers/mtd/maps/Makefile +++ b/drivers/mtd/maps/Makefile @@ -27,6 +27,8 @@ obj-$(CONFIG_MTD_CEIVA) += ceiva.o obj-$(CONFIG_MTD_OCTAGON) += octagon-5066.o obj-$(CONFIG_MTD_PHYSMAP) += physmap.o obj-$(CONFIG_MTD_PHYSMAP_OF) += physmap_of.o +obj-$(CONFIG_MTD_PMC_MSP_EVM) += pmcmsp-flash.o +obj-$(CONFIG_MTD_PMC_MSP_RAMROOT)+= pmcmsp-ramroot.o obj-$(CONFIG_MTD_PNC2000) += pnc2000.o obj-$(CONFIG_MTD_PCMCIA) += pcmciamtd.o obj-$(CONFIG_MTD_RPXLITE) += rpxlite.o diff --git a/drivers/mtd/maps/pmcmsp-flash.c b/drivers/mtd/maps/pmcmsp-flash.c new file mode 100644 index 000..24cd8c0 --- /dev/null +++ b/drivers/mtd/maps/pmcmsp-flash.c @@ -0,0 +1,184 @@ +/* + * Mapping of a custom board with both AMD CFI and JEDEC flash in partitions. + * Config with both CFI and JEDEC device support. + * + * Basically physmap.c with the addition of partitions and + * an array of mapping info to accomodate more than one flash type per board. + * + * Copyright 2005-2007 PMC-Sierra, Inc. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License as published by the + * Free Software Foundation; either version 2 of the License, or (at your + * option) any later version. + * + * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED + * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF + * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN + * NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, + * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT + * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF + * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON + * ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF + * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + * + * You should have received a copy of the GNU General Public License along + * with this program; if not, write to the Free Software Foundation, Inc., + * 675 Mass Ave, Cambridge, MA 02139, USA. + */ + +#include <linux/module.h> +#include <linux/types.h> +#include <linux/kernel.h> +#include <linux/mtd/mtd.h> +#include <linux/mtd/map.h> +#include <linux/mtd/partitions.h> + +#include <asm/io.h> + +#include <msp_prom.h> +#include <msp_regs.h> + + +static struct mtd_info **msp_flash; +static struct mtd_partition **msp_parts; +static struct map_info *msp_maps;
Re: [RFC][PATCH 2/7] RSS controller core
On Mon, Mar 12, 2007 at 12:02:01PM +0300, Pavel Emelianov wrote: Maybe you have some ideas how we can decide on this? We need to work out what the requirements are before we can settle on an implementation. Linux-VServer (and probably OpenVZ): - shared mappings of 'shared' files (binaries and libraries) to allow for reduced memory footprint when N identical guests are running This is done in current patches. nice, but the question was about _requirements_ (so your requirements are?) - virtual 'physical' limit should not cause swap out when there are still pages left on the host system (but pages of over limit guests can be preferred for swapping) So what to do when virtual physical limit is hit? OOM-kill current task? when the RSS limit is hit, but there _are_ enough pages left on the physical system, there is no good reason to swap out the page at all - there is no benefit in doing so (performance wise, that is) - it actually hurts performance, and could become a separate source for DoS what should happen instead (in an ideal world :) is that the page is considered swapped out for the guest (add guest penalty for swapout), and when the page would be swapped in again, the guest takes a penalty (for the 'virtual' page in) and the page is returned to the guest, possibly kicking out (again virtually) a different page - accounting and limits have to be consistent and should roughly represent the actual used memory/swap (modulo optimizations, I can go into detail here, if necessary) This is true for current implementation for both - this patchset and OpenVZ beancounters. If you sum up the physpages values for all containers you'll get the exact number of RAM pages used. hmm, including or excluding the host pages? - OOM handling on a per guest basis, i.e. some out of memory condition in guest A must not affect guest B This is done in current patches. 
Herbert, did you look at the patches before sending this mail or do you just want to 'take part' in conversation w/o understanding of hat is going on? again, the question was about requirements, not your patches, and yes, I had a look at them _and_ the OpenVZ implementations ... best, Herbert PS: hat is going on? :) HTC, Herbert Sigh. Who is running this show? Anyone? You can actually do a form of overcommitment by allowing multiple containers to share one or more of the zones. Whether that is sufficient or suitable I don't know. That depends on the requirements, and we haven't even discussed those, let alone agreed to them.
[PATCH 0/3] swsusp: Stop using page flags
Hi, The following three patches make swsusp use its own data structures for memory management instead of special page flags, so that these page flags can be used for other purposes. Greetings, Rafael -- If you don't have the time to read, you don't have the time or the tools to write. - Stephen King
[PATCH 1/3] swsusp: Use inline functions for changing page flags
From: Rafael J. Wysocki [EMAIL PROTECTED] Replace direct invocations of SetPageNosave(), SetPageNosaveFree() etc. with calls to inline functions that can be changed in subsequent patches without modifying the code calling them. Signed-off-by: Rafael J. Wysocki [EMAIL PROTECTED] Acked-by: Pavel Machek [EMAIL PROTECTED] --- include/linux/suspend.h | 33 + kernel/power/snapshot.c | 48 +--- mm/page_alloc.c |6 +++--- 3 files changed, 61 insertions(+), 26 deletions(-) Index: linux-2.6.21-rc2/include/linux/suspend.h === --- linux-2.6.21-rc2.orig/include/linux/suspend.h 2007-03-02 09:05:53.0 +0100 +++ linux-2.6.21-rc2/include/linux/suspend.h2007-03-02 09:24:02.0 +0100 @@ -8,6 +8,7 @@ #include linux/notifier.h #include linux/init.h #include linux/pm.h +#include linux/mm.h /* struct pbe is used for creating lists of pages that should be restored * atomically during the resume from disk, because the page frames they have @@ -49,6 +50,38 @@ void __save_processor_state(struct saved void __restore_processor_state(struct saved_context *ctxt); unsigned long get_safe_page(gfp_t gfp_mask); +/* Page management functions for the software suspend (swsusp) */ + +static inline void swsusp_set_page_forbidden(struct page *page) +{ + SetPageNosave(page); +} + +static inline int swsusp_page_is_forbidden(struct page *page) +{ + return PageNosave(page); +} + +static inline void swsusp_unset_page_forbidden(struct page *page) +{ + ClearPageNosave(page); +} + +static inline void swsusp_set_page_free(struct page *page) +{ + SetPageNosaveFree(page); +} + +static inline int swsusp_page_is_free(struct page *page) +{ + return PageNosaveFree(page); +} + +static inline void swsusp_unset_page_free(struct page *page) +{ + ClearPageNosaveFree(page); +} + /* * XXX: We try to keep some more pages free so that I/O operations succeed * without paging. Might this be more? 
Index: linux-2.6.21-rc2/kernel/power/snapshot.c === --- linux-2.6.21-rc2.orig/kernel/power/snapshot.c 2007-03-02 09:05:53.0 +0100 +++ linux-2.6.21-rc2/kernel/power/snapshot.c2007-03-02 09:27:06.0 +0100 @@ -67,15 +67,15 @@ static void *get_image_page(gfp_t gfp_ma res = (void *)get_zeroed_page(gfp_mask); if (safe_needed) - while (res PageNosaveFree(virt_to_page(res))) { + while (res swsusp_page_is_free(virt_to_page(res))) { /* The page is unsafe, mark it for swsusp_free() */ - SetPageNosave(virt_to_page(res)); + swsusp_set_page_forbidden(virt_to_page(res)); allocated_unsafe_pages++; res = (void *)get_zeroed_page(gfp_mask); } if (res) { - SetPageNosave(virt_to_page(res)); - SetPageNosaveFree(virt_to_page(res)); + swsusp_set_page_forbidden(virt_to_page(res)); + swsusp_set_page_free(virt_to_page(res)); } return res; } @@ -91,8 +91,8 @@ static struct page *alloc_image_page(gfp page = alloc_page(gfp_mask); if (page) { - SetPageNosave(page); - SetPageNosaveFree(page); + swsusp_set_page_forbidden(page); + swsusp_set_page_free(page); } return page; } @@ -110,9 +110,9 @@ static inline void free_image_page(void page = virt_to_page(addr); - ClearPageNosave(page); + swsusp_unset_page_forbidden(page); if (clear_nosave_free) - ClearPageNosaveFree(page); + swsusp_unset_page_free(page); __free_page(page); } @@ -615,7 +615,8 @@ static struct page *saveable_highmem_pag BUG_ON(!PageHighMem(page)); - if (PageNosave(page) || PageReserved(page) || PageNosaveFree(page)) + if (swsusp_page_is_forbidden(page) || swsusp_page_is_free(page) || + PageReserved(page)) return NULL; return page; @@ -681,7 +682,7 @@ static struct page *saveable_page(unsign BUG_ON(PageHighMem(page)); - if (PageNosave(page) || PageNosaveFree(page)) + if (swsusp_page_is_forbidden(page) || swsusp_page_is_free(page)) return NULL; if (PageReserved(page) pfn_is_nosave(pfn)) @@ -821,9 +822,10 @@ void swsusp_free(void) if (pfn_valid(pfn)) { struct page *page = pfn_to_page(pfn); - if (PageNosave(page) PageNosaveFree(page)) { - 
ClearPageNosave(page); - ClearPageNosaveFree(page); + if
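The point of the 1/3 patch is indirection: callers stop touching the Nosave/NosaveFree flag macros directly and go through swsusp_* accessors, so a later patch can change what backs them without touching the call sites. A minimal userspace sketch of that idea follows. It assumes a stand-in struct page holding only a flags word and uses plain, non-atomic bit operations; page_may_be_saved() is a hypothetical caller, not a function from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct page: only a flags word. */
struct page {
	unsigned long flags;
};

/* Bit numbers as listed in the page-flags.h hunk of patch 3/3. */
enum {
	PG_nosave = 13,
	PG_nosave_free = 18,
};

/*
 * Accessors with the names introduced by the patch. The kernel versions
 * wrap the atomic SetPageNosave()/ClearPageNosave() style macros; plain
 * bit operations are used here only to keep the sketch self-contained.
 */
static inline void swsusp_set_page_forbidden(struct page *page)
{
	page->flags |= 1UL << PG_nosave;
}

static inline int swsusp_page_is_forbidden(const struct page *page)
{
	return (page->flags >> PG_nosave) & 1UL;
}

static inline void swsusp_unset_page_forbidden(struct page *page)
{
	page->flags &= ~(1UL << PG_nosave);
}

static inline void swsusp_set_page_free(struct page *page)
{
	page->flags |= 1UL << PG_nosave_free;
}

static inline int swsusp_page_is_free(const struct page *page)
{
	return (page->flags >> PG_nosave_free) & 1UL;
}

static inline void swsusp_unset_page_free(struct page *page)
{
	page->flags &= ~(1UL << PG_nosave_free);
}

/* Hypothetical caller, loosely modelled on the check in saveable_page(). */
static inline int page_may_be_saved(const struct page *page)
{
	assert(page != NULL);
	return !swsusp_page_is_forbidden(page) && !swsusp_page_is_free(page);
}
```

Because page_may_be_saved() only uses the accessors, swapping the flag bits for some other storage would leave it unchanged, which is the property the patch description asks for.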
[PATCH 3/3] mm: Remove unused page flags
From: Rafael J. Wysocki [EMAIL PROTECTED] Remove the two page flags that were previously used by swsusp and are no longer needed. Signed-off-by: Rafael J. Wysocki [EMAIL PROTECTED] Acked-by: Pavel Machek [EMAIL PROTECTED] --- include/linux/page-flags.h | 12 1 file changed, 12 deletions(-) Index: linux-2.6.21-rc3/include/linux/page-flags.h === --- linux-2.6.21-rc3.orig/include/linux/page-flags.h +++ linux-2.6.21-rc3/include/linux/page-flags.h @@ -82,13 +82,11 @@ #define PG_private 11 /* If pagecache, has fs-private data */ #define PG_writeback 12 /* Page is under writeback */ -#define PG_nosave 13 /* Used for system suspend/resume */ #define PG_compound 14 /* Part of a compound page */ #define PG_swapcache 15 /* Swap page: swp_entry_t in private */ #define PG_mappedtodisk 16 /* Has blocks allocated on-disk */ #define PG_reclaim 17 /* To be reclaimed asap */ -#define PG_nosave_free 18 /* Used for system suspend/resume */ #define PG_buddy 19 /* Page is free, on buddy lists */ /* PG_owner_priv_1 users should have descriptive aliases */ @@ -214,16 +212,6 @@ static inline void SetPageUptodate(struc ret; \ }) -#define PageNosave(page) test_bit(PG_nosave, &(page)->flags) -#define SetPageNosave(page) set_bit(PG_nosave, &(page)->flags) -#define TestSetPageNosave(page) test_and_set_bit(PG_nosave, &(page)->flags) -#define ClearPageNosave(page) clear_bit(PG_nosave, &(page)->flags) -#define TestClearPageNosave(page) test_and_clear_bit(PG_nosave, &(page)->flags) - -#define PageNosaveFree(page) test_bit(PG_nosave_free, &(page)->flags) -#define SetPageNosaveFree(page) set_bit(PG_nosave_free, &(page)->flags) -#define ClearPageNosaveFree(page) clear_bit(PG_nosave_free, &(page)->flags) - #define PageBuddy(page) test_bit(PG_buddy, &(page)->flags) #define __SetPageBuddy(page) __set_bit(PG_buddy, &(page)->flags) #define __ClearPageBuddy(page) __clear_bit(PG_buddy, &(page)->flags)
[PATCH 2/3] swsusp: Do not use page flags
From: Rafael J. Wysocki [EMAIL PROTECTED] Make swsusp use memory bitmaps instead of page flags for marking 'nosave' and free pages. This allows us to 'recycle' two page flags that can be used for other purposes. Also, the memory needed to store the bitmaps is allocated when necessary (ie. before the suspend) and freed after the resume which is more reasonable. The patch is designed to minimize the amount of changes and there are some nice simplifications and optimizations possible on top of it. I am going to implement them separately in the future. Signed-off-by: Rafael J. Wysocki [EMAIL PROTECTED] Acked-by: Pavel Machek [EMAIL PROTECTED] --- arch/x86_64/kernel/e820.c | 26 +--- include/linux/suspend.h | 58 +++--- kernel/power/disk.c | 23 +++- kernel/power/power.h |2 kernel/power/snapshot.c | 250 +++--- kernel/power/user.c |4 6 files changed, 281 insertions(+), 82 deletions(-) Index: linux-2.6.21-rc3/include/linux/suspend.h === --- linux-2.6.21-rc3.orig/include/linux/suspend.h +++ linux-2.6.21-rc3/include/linux/suspend.h @@ -24,63 +24,41 @@ struct pbe { extern void drain_local_pages(void); extern void mark_free_pages(struct zone *zone); -#ifdef CONFIG_PM -/* kernel/power/swsusp.c */ -extern int software_suspend(void); - -#if defined(CONFIG_VT) defined(CONFIG_VT_CONSOLE) +#if defined(CONFIG_PM) defined(CONFIG_VT) defined(CONFIG_VT_CONSOLE) extern int pm_prepare_console(void); extern void pm_restore_console(void); #else static inline int pm_prepare_console(void) { return 0; } static inline void pm_restore_console(void) {} -#endif /* defined(CONFIG_VT) defined(CONFIG_VT_CONSOLE) */ +#endif + +#if defined(CONFIG_PM) defined(CONFIG_SOFTWARE_SUSPEND) +/* kernel/power/swsusp.c */ +extern int software_suspend(void); +/* kernel/power/snapshot.c */ +extern void __init register_nosave_region(unsigned long, unsigned long); +extern int swsusp_page_is_forbidden(struct page *); +extern void swsusp_set_page_free(struct page *); +extern void swsusp_unset_page_free(struct page *); 
+extern unsigned long get_safe_page(gfp_t gfp_mask); #else static inline int software_suspend(void) { printk(Warning: fake suspend called\n); return -ENOSYS; } -#endif /* CONFIG_PM */ + +static inline void register_nosave_region(unsigned long b, unsigned long e) {} +static inline int swsusp_page_is_forbidden(struct page *p) { return 0; } +static inline void swsusp_set_page_free(struct page *p) {} +static inline void swsusp_unset_page_free(struct page *p) {} +#endif /* defined(CONFIG_PM) defined(CONFIG_SOFTWARE_SUSPEND) */ void save_processor_state(void); void restore_processor_state(void); struct saved_context; void __save_processor_state(struct saved_context *ctxt); void __restore_processor_state(struct saved_context *ctxt); -unsigned long get_safe_page(gfp_t gfp_mask); - -/* Page management functions for the software suspend (swsusp) */ - -static inline void swsusp_set_page_forbidden(struct page *page) -{ - SetPageNosave(page); -} - -static inline int swsusp_page_is_forbidden(struct page *page) -{ - return PageNosave(page); -} - -static inline void swsusp_unset_page_forbidden(struct page *page) -{ - ClearPageNosave(page); -} - -static inline void swsusp_set_page_free(struct page *page) -{ - SetPageNosaveFree(page); -} - -static inline int swsusp_page_is_free(struct page *page) -{ - return PageNosaveFree(page); -} - -static inline void swsusp_unset_page_free(struct page *page) -{ - ClearPageNosaveFree(page); -} /* * XXX: We try to keep some more pages free so that I/O operations succeed Index: linux-2.6.21-rc3/kernel/power/snapshot.c === --- linux-2.6.21-rc3.orig/kernel/power/snapshot.c +++ linux-2.6.21-rc3/kernel/power/snapshot.c @@ -21,6 +21,7 @@ #include linux/kernel.h #include linux/pm.h #include linux/device.h +#include linux/init.h #include linux/bootmem.h #include linux/syscalls.h #include linux/console.h @@ -34,6 +35,10 @@ #include power.h +static int swsusp_page_is_free(struct page *); +static void swsusp_set_page_forbidden(struct page *); +static void 
swsusp_unset_page_forbidden(struct page *); + /* List of PBEs needed for restoring the pages that were allocated before * the suspend and included in the suspend image, but have also been * allocated by the resume kernel, so their contents cannot be written @@ -224,11 +229,6 @@ static void chain_free(struct chain_allo * of type unsigned long each). It also contains the pfns that * correspond to the start and end of the represented memory area and * the number of bit chunks in the block. - * - * NOTE: Memory bitmaps are used for two types of operations only: - * set a bit and find the next bit set. Moreover, the searching - * is always
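The 2/3 description says the 'nosave' and free markings move out of struct page and into memory bitmaps that exist only between suspend and resume. As a rough illustration of that storage change, here is a flat bitmap indexed by page frame number. This is a simplified sketch and not the kernel's struct memory_bitmap; the pfn_bitmap_* names and the single contiguous pfn range are assumptions made for the example.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical flat bitmap: one bit per pfn in [start_pfn, end_pfn). */
struct pfn_bitmap {
	unsigned long start_pfn;	/* first pfn covered */
	unsigned long end_pfn;		/* one past the last pfn covered */
	unsigned long *words;
};

/* Allocate a zeroed bitmap; would be called just before suspend. */
static struct pfn_bitmap *pfn_bitmap_create(unsigned long start_pfn,
					    unsigned long end_pfn)
{
	struct pfn_bitmap *bm;
	size_t nwords;

	assert(start_pfn < end_pfn);
	nwords = (end_pfn - start_pfn + BITS_PER_WORD - 1) / BITS_PER_WORD;
	bm = malloc(sizeof(*bm));
	if (!bm)
		return NULL;
	bm->words = calloc(nwords, sizeof(*bm->words));
	if (!bm->words) {
		free(bm);
		return NULL;
	}
	bm->start_pfn = start_pfn;
	bm->end_pfn = end_pfn;
	return bm;
}

/* Release the bitmap; would be called after resume. */
static void pfn_bitmap_free(struct pfn_bitmap *bm)
{
	if (bm) {
		free(bm->words);
		free(bm);
	}
}

/* Find the word and bit mask that represent a given pfn. */
static unsigned long *pfn_word(const struct pfn_bitmap *bm, unsigned long pfn,
			       unsigned long *mask)
{
	unsigned long bit;

	assert(pfn >= bm->start_pfn && pfn < bm->end_pfn);
	bit = pfn - bm->start_pfn;
	*mask = 1UL << (bit % BITS_PER_WORD);
	return &bm->words[bit / BITS_PER_WORD];
}

static void pfn_bitmap_set(struct pfn_bitmap *bm, unsigned long pfn)
{
	unsigned long mask;
	unsigned long *word = pfn_word(bm, pfn, &mask);

	*word |= mask;
}

static void pfn_bitmap_clear(struct pfn_bitmap *bm, unsigned long pfn)
{
	unsigned long mask;
	unsigned long *word = pfn_word(bm, pfn, &mask);

	*word &= ~mask;
}

static bool pfn_bitmap_test(const struct pfn_bitmap *bm, unsigned long pfn)
{
	unsigned long mask;
	const unsigned long *word = pfn_word(bm, pfn, &mask);

	return (*word & mask) != 0;
}
```

With two such bitmaps (one for 'forbidden', one for 'free'), accessors like swsusp_page_is_forbidden() could be implemented as a lookup by pfn instead of a test on the page's flags word, which is what frees the two page flag bits.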
RE: [PATCH] MPT FUSION: Delete unused header files.
If you're going to include it just for the sake of including it, not because the code in question actually uses types or function declarations defined in there, don't bother, you're just using an anti-social mechanism to keep this header file in the tree. Please, let's kill this header file if it is unused. Beside including the header I plan to use every define in that header defined someplace in the source code. Now can I keep the header?
Re: [PATCH] MPT FUSION: Delete unused header files.
From: Moore, Eric [EMAIL PROTECTED] Date: Mon, 12 Mar 2007 15:29:45 -0600 Beside including the header I plan to use every define in that header defined someplace in the source code. Now can I keep the header? For sure :-)
Re: [ck] Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2
On Monday 12 March 2007, Con Kolivas wrote: If we fix 95% of the desktop and worsen 5% is that bad given how much else we've gained in the process? Killing the known corner case starvation scenarios is wonderful, but let's not just pretend that interactive tasks don't have any special requirements. Now you're really making a stretch of things. Where on earth did I say that interactive tasks don't have special requirements? It's a fundamental feature of this scheduler that I go to great pains to get them as low latency as possible and their fair share of cpu despite having a completely fair cpu distribution. As far as I understand it, RSDL always gives an equal share of cpu, but interactive tasks can have lower latency, right? So you get in trouble with interactive tasks only when their share isn't enough to actually do what they have to do in that period, eg on a heavily (over?) loaded box. Staircase, like mainline which gave them MORE than their share, would support that (though this comes at a price). So, if your box is overloaded to a great extent, X, which can use a lot of cpu, can get unresponsive - unless it's negatively niced. But most other apps aren't as demanding as X is, so they won't really suffer. Thus the problem is mostly X. And at least part of that problem is being solved - X wasting cpu cycles. Also, cpu's are getting stronger, and I think it's likely X's relative CPU usage goes down as well. In the long term, RSDL seems like the best way to go. Nice X down, and you got most of the disadvantages. You still have the perfect fairness, no stalls and starvation ;-) If RSDL can be improved to help X, great. But introducing again the problem which RSDL was supposed to solve would be pretty pointless. I think that's what grumpy Con is trying to say, and he's right at it. grtz Jos -- Disclaimer: Everything I do, think and say is based on the worldview I currently hold. I am not responsible for changes in the world, or in my view of it, nor for any resulting behaviour of my own. Everything I say is meant kindly, unless explicitly stated otherwise.
Re: refcounting drivers' data structures used in sysfs buffers
On Mon, 2007-03-12 at 16:31 -0400, Dmitry Torokhov wrote: On 3/12/07, Alan Stern [EMAIL PROTECTED] wrote: On Mon, 12 Mar 2007, Oliver Neukum wrote: I don't like reverting my own code. But I predict he'll tell you that a driver's bond with a device should be represented in a data structure that is to be refcounted. There still would be a synchronization problem. Refcounts don't solve races; they only solve lifetime problems. And you would still have to change the sysfs API, plus all the other stuff... Do you think Linus would listen if all three of us (plus maybe Greg) tried to convince him? If we'd accompany the argument with the patch that changes scsi to use wq to perform deletion so we don't have deadlock regression in the kernel he might be more perceptive... He is right about lifetime issues but this is not strictly lifetime issue as you correctly point out. Plus, refcounting also bloats the kernel so I don't really want to use refcount for every integer I happen to export through sysfs if I can simply revoke access. For what it's worth, I think it makes sense if the driver no longer has to worry about sysfs attributes after they've been removed. This is something the core should look after, not each and every driver. http://marc.theaimsgroup.com/?l=linux-kernel&m=117355959020831&w=2 makes a lot of sense, particularly that No driver callbacks occur after unregistration. When writing the backlight class code, I remember checking into this, concluding that seemed to be the design of sysfs and thinking it a sane design. The alternative is to force each and every driver to do its own refcounting. My experience with locking in the extremely simple backlight class shows nobody reads the documentation or writes the code correctly. With that, I've given up and added suitable locking to the core even if not every driver needs it. In doing so, I made a net removal of a few hundred lines of broken ticking timebomb style code. 
I dread to think what would happen if every driver had to deal with sysfs refcounting. So count me as a vote for handling this in the sysfs core, not the drivers. Richard
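To make the distinction in this thread concrete (a refcount keeps data alive, while revoking access stops callbacks from reaching it), here is a small userspace sketch of the revocation idea. It is not how sysfs is implemented; struct revocable_attr and the attr_* functions are hypothetical names, and a pthread mutex stands in for whatever locking the core would use.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical attribute whose access to driver data can be revoked. */
struct revocable_attr {
	pthread_mutex_t lock;
	bool active;		/* false once the attribute is unregistered */
	const int *value;	/* driver-owned data, valid only while active */
};

static void attr_register(struct revocable_attr *attr, const int *value)
{
	assert(value != NULL);
	pthread_mutex_init(&attr->lock, NULL);
	attr->value = value;
	attr->active = true;
}

/*
 * "show" callback path: formats the value into buf.
 * Returns the number of characters written, or -1 if access was revoked.
 */
static int attr_show(struct revocable_attr *attr, char *buf, size_t len)
{
	int ret = -1;

	pthread_mutex_lock(&attr->lock);
	if (attr->active)
		ret = snprintf(buf, len, "%d\n", *attr->value);
	pthread_mutex_unlock(&attr->lock);
	return ret;
}

/*
 * Revoke access. Taking the lock waits for any show() in progress, so once
 * this returns no callback can touch the driver's data again.
 */
static void attr_unregister(struct revocable_attr *attr)
{
	pthread_mutex_lock(&attr->lock);
	attr->active = false;
	attr->value = NULL;
	pthread_mutex_unlock(&attr->lock);
}
```

After attr_unregister() returns, the driver can free its data without counting references on it, which matches the "no driver callbacks occur after unregistration" rule quoted above. In this sketch a callback that tried to unregister its own attribute would block on the lock it already holds, which seems to be the kind of deadlock the workqueue-based deletion mentioned earlier is meant to avoid.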
Re: [ck] Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2
On 3/12/07, Con Kolivas [EMAIL PROTECTED] wrote: On Tuesday 13 March 2007 07:11, Mike Galbraith wrote: On Tue, 2007-03-13 at 05:49 +1100, Con Kolivas wrote: On Tuesday 13 March 2007 01:34, Mike Galbraith wrote: On Mon, 2007-03-12 at 22:23 +1100, Con Kolivas wrote: Mike the cpu is being proportioned out perfectly according to fairness as I mentioned in the prior email, yet X is getting the lower latency scheduling. I'm not sure within the bounds of fairness what more would you have happen to your liking with this test case? It has been said that perfection is the enemy of good. The two interactive tasks receiving 40% cpu while two niced background jobs receive 60% may well be perfect, but it's damn sure not good. Again I think your test is not a valid testcase. Why use two threads for your encoding with one cpu? Is that what other dedicated desktop OSs would do? The testcase is perfectly valid. My buddies box has two full cores, so we used two encoders such that whatever bandwidth is not being actively consumed by more important things gets translated into mp3 encoding. How would you go about ensuring that there won't be any cycles wasted? _My_ box has 1 core that if fully utilized translates to 1.2 cores.. or whatever, depending on the phase of the moon. But no matter, logical vs physical cpu argument is pure hand-waving. What really matters here is the bottom line: your fair scheduler ignores the very real requirements of interactivity. Definitely not. It does not give unfair cpu towards interactive tasks. That's a very different argument. I think the issue here is that the scheduler is doing what Con expects it to do, but not what Mike Galbraith here feels it should do. Maybe Con and Mike here are using different definitions, as such, for interactivity, or at least have different ideas of how this is supposed to be accomplished. Does that sound right? I've begun using RSDL on my machines here, and so far there haven't been any issues with it, in my opinion. 
From a feel standpoint, it's not what I would call perfectly smooth, but it is better than the other schedulers I've seen (and the one case where there are still problems it is an issue of I/O contention, not CPU -- using RSDL has made a surprisingly large impact regardless). Perhaps, Mike Galbraith, do you feel that it should be possible to use the CPU at 100% for some task and still maintain excellent interactivity? (It has always seemed to me that if you wanted interactivity, you had to have the CPU idle at least a couple percent of the time. How much or how little that many percent had to be was usually affected by how much preempting you put in the kernel, and what CPU scheduler was in it at the time.) Considering the concepts put out by projects such as BOINC and [EMAIL PROTECTED], I wouldn't be thoroughly surprised by this ideology, although I do question the particular way this test case is being run. That said, I haven't run the test case in particular yet, although I will see if I can get the time to do so soon. In any case, I personally do have a few qualms about this test case being run on HT virtual cores: * I am curious about why splitting a task and running them on separate HT virtual cores improves interactivity any. (If it was Amarok on one virtual CPU and one lame on the other, I would get it. But I see two lame processes here -- wouldn't they just be allocated one to each virtual CPU, leaving Amarok out most of the time? How do you get interactivity with that?) Does using HT really fill up the CPU better than having the CPU announce itself as the single core it is? My understanding is that throughput goes down somewhat even just by using multiple threads with HT, compared to the single thread on the single core, and why would you use more than one lame thread unless you seek throughput? * Where are the lame processes encoding to/from? For example, are the results for both being sent to /dev/null? To a hard drive? etc. etc. 
In a real-world test case, I would imagine a user running TWO lame processes would be encoding from two sources to the same hard drive. (Or, they might even be both encoding FROM that same hard drive. Or both.) The need for the single HD to seek so much reduces throughput on most of these cases in HT, IIRC, which may be a factor that would probably defeat the point of this case for most users. Of course, my point is negated if they have multiple drives for their use of lame, and/or if they have sufficient memory and bandwidth to handle the issue, or if encoding throughput isn't their aim. The only reason I can think of that running two lame processes would improve interactivity would be so that if one particular portion gets stuck, then there's a chance the other thread will be working on an easier portion, making it appear like more is being done. This occurs, for example, with POV-Ray and Blender, where some parts of the image may require
Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2
On Tue, 2007-03-13 at 00:05 +0300, Serge Belyshev wrote: Mike Galbraith [EMAIL PROTECTED] writes: [snip] And let's not lose sight of things with this one testcase. RSDL fixes - every starvation case - all fairness issues - is better 95% of the time on the desktop I don't know where you got that 95% number from. For the most part, the existing scheduler does well. If it sucked 95% of the time, it would have been shredded a long time ago. I tell you. http://article.gmane.org/gmane.linux.kernel/500027 http://article.gmane.org/gmane.linux.kernel/502996 http://article.gmane.org/gmane.linux.kernel/500119 http://article.gmane.org/gmane.linux.kernel/500784 http://article.gmane.org/gmane.linux.kernel/500768 http://article.gmane.org/gmane.linux.kernel/502255 http://article.gmane.org/gmane.linux.kernel/502282 http://article.gmane.org/gmane.linux.kernel/503650 http://article.gmane.org/gmane.linux.kernel/503695 http://article.gmane.org/gmane.linux.kernel.ck/6512 http://article.gmane.org/gmane.linux.kernel.ck/6539 http://article.gmane.org/gmane.linux.kernel.ck/6565 Thanks, but I've already read them. They are part of the reason I decided to spend some time testing. -Mike
Re: Move to unshared VMAs in NOMMU mode?
On Fri 9 Mar 2007 09:12, David Howells pondered: I've been considering how to deal with the SYSV SHM problem, and I think we may have to move to unshared VMAs in NOMMU mode to deal with this. Thanks for putting some good thoughts down. Currently, what we have is each mm_struct has in its arch-specific context argument a list of VMLs. Take the FRV context for example: [include/asm-frv/mmu.h] typedef struct { #ifdef CONFIG_MMU ... struct vm_list_struct *vmlist; unsigned long end_brk; #endif ... } mm_context_t; Each VML struct contains a pointer to a systemwide VMA and the next VML in the list: struct vm_list_struct { struct vm_list_struct *next; struct vm_area_struct *vma; }; The VMAs themselves are kept in an rb-tree in mm/nommu.c: /* list of shareable VMAs */ struct rb_root nommu_vma_tree = RB_ROOT; which can then be displayed through /proc/maps. There are some restrictions of this system, mainly due to the NOMMU constraints: (*) mmap() may not be used to overlay one mapping upon another (*) mmap() may not be used with MAP_FIXED. (*) mmap()'s of the same part of the same file will result in multiple mappings returning the same base address, assuming the maps are shareable. If they aren't shareable, they'll be at different base addresses. (*) for normal shareable file mappings, two mappings will only be shared if they precisely match offset, size and protection, otherwise a new mapping will be created (this is because VMAs will be shared). Splitting VMAs would reduce this restriction, though subsequent mappings would have to be bounded by the first mapping, but wouldn't have to be the same size. (*) munmap() may only unmap a precise match amongst the mappings made; it may not be used to cut down or punch a hole in an existing mapping. The VMAs for private file mappings, private blockdev mappings and anonymous mappings, be they shared[*] or unshared, hold a pointer to the kmalloc()'d region of memory in which the mapping contents reside. 
This region is discarded when the VMA is deleted. When a region can be shared the VMA is also shared, and so no reference counting need take place on the mapping contents as that is implied by the VMA. [*] MAP_PRIVATE+!PROT_WRITE+!PT_PTRACED regions may be shared Note that for mappable chardevs with special BDI capability flags, extra VMAs may be allocated because (a) they may need to overlap non-exactly, and (b) the chardev itself pins the backing storage, if the backing storage is potentially transient. If VMAs are not shared for shared memory regions then some other means of retaining the actual allocated memory region must be found. The obvious way to do this is to have the VMA point to a shared, refcounted record that keeps track of the region: struct vm_region { /* the first parameters define the region as for the VMA */ pgprot_t vm_page_prot; unsigned long vm_start; unsigned long vm_end; unsigned long vm_pgoff; struct file *vm_file; atomic_t vm_usage; /* region usage count */ struct rb_node vm_rb; /* region tree */ }; The VMA itself would then have to be modified to include a pointer to this, but wouldn't then need its own refcount. VMAs would belong, once again, to the mm_struct, the VML struct would vanish, and the VML list rooted in mm_context_t would vanish. For R/O shareable file mappings, it might be possible to actually use the target file's pagecache for the mapping. I do something of that sort for shared-writable mappings on ramfs files (to support POSIX SHM and SYSV SHM). The downside of allocating all these extra VMAs is that, of course, it takes up more memory, though that may not be too bad, especially if it's at the gain of additional consistency with the MM code. I guess I don't look at it as consistency with the MM code as being the primary request, but consistency in operation with the MM code from a user space perspective - hopefully the two goals are not divergent. However, consistency isn't for the most part a real issue. 
As I see it, drivers and filesystems should not concern themselves with anything other than the VMA they're given, and so it doesn't matter if these are shared or not. That brings us on to the problem with SYSV SHM which keeps an attachment count that the VMA mmap(), open() and release() ops manipulate. This means that the nattch count comes out wrong on NOMMU systems. Note that on MMU systems, doing a munmap() in the middle of an attached region will *also* break the nattch count, though this is self-correcting. Another way of dealing with the nattch count on NOMMU systems is to do it through
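The proposal above (one private VMA per mapping, each pointing at a shared, refcounted vm_region that owns the memory) can be sketched in userspace as follows. This is only an illustration under stated assumptions: struct vma, region_alloc(), vma_attach() and vma_detach() are hypothetical names, the region carries just the fields needed here, and C11 atomics stand in for the kernel's atomic_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Cut-down version of the proposed record: bounds, usage count, memory. */
struct vm_region {
	unsigned long vm_start;
	unsigned long vm_end;
	atomic_int vm_usage;	/* number of VMAs pointing at this region */
	void *backing;		/* stand-in for the kmalloc()'d contents */
};

/* Hypothetical per-mm VMA: unshared, holds one reference on a region. */
struct vma {
	struct vm_region *region;
};

static struct vm_region *region_alloc(size_t len)
{
	struct vm_region *region = malloc(sizeof(*region));

	if (!region)
		return NULL;
	region->backing = calloc(1, len);
	if (!region->backing) {
		free(region);
		return NULL;
	}
	region->vm_start = (unsigned long)region->backing;
	region->vm_end = region->vm_start + len;
	atomic_init(&region->vm_usage, 0);
	return region;
}

/* A new mapping gets its own VMA and takes a reference on the region. */
static void vma_attach(struct vma *vma, struct vm_region *region)
{
	atomic_fetch_add(&region->vm_usage, 1);
	vma->region = region;
}

/*
 * Drop this VMA's reference. The backing memory is released only when the
 * last VMA goes away; returns true in that case.
 */
static bool vma_detach(struct vma *vma)
{
	struct vm_region *region = vma->region;

	assert(region != NULL);
	vma->region = NULL;
	if (atomic_fetch_sub(&region->vm_usage, 1) != 1)
		return false;
	free(region->backing);
	free(region);
	return true;
}
```

Since every attachment has its own VMA in this scheme, per-VMA open and close operations run once per attachment, which is presumably what would let a count such as the SYSV SHM nattch come out right.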
Re: _proxy_pda still makes linking modules fail
On Mon, 2007-03-12 at 10:48 +0100, Andi Kleen wrote: Rusty's pda-per_cpu patch will deal with this once and for all; have Not on x86-64. Indeed. Perhaps it's time I join the modern world and compile a 64-bit kernel... Will prepare patches, Rusty.
Re: SMP performance degradation with sysbench
Hi Nick,

Anyway, I'll keep experimenting. If anyone from MySQL wants to help look at this, send me a mail (eg. especially with the sched_setscheduler issue, you might be able to do something better).

I took a look at this today and figured I'd document it:

http://ozlabs.org/~anton/linux/sysbench/

Bottom line: it looks like issues in the glibc malloc library; replacing it with the google malloc library fixes the negative scaling:

# apt-get install libgoogle-perftools0
# LD_PRELOAD=/usr/lib/libtcmalloc.so /usr/sbin/mysqld

Anton
Re: [RFC PATCH 1/3] Add ability to keep track of callers of symbol_(get|put)
Hi Trent,

Patch looks good, just one comment:

On Mon, 2007-03-12 at 07:07 -0700, Trent Piepho wrote:
+	use = already_uses(a, b);
+	if (!use) {
+		printk(KERN_ERR "module %s trying to un-use a module, %s, which"
+		       " it is not using", a->name, b->name);
+		return 0;
+	}

s/return 0/BUG()/. This is potentially quite a nasty bug.

Thanks!
Rusty.
Re: [PATCH] BUILD_BUG_ON_ZERO -> BUILD_BUG_OR_ZERO
On Mon, 2007-03-12 at 15:14 +0100, Stefan Richter wrote: Robert P. J. Day wrote: On Mon, 12 Mar 2007, Stefan Richter wrote: Rusty Russell wrote: OTOH, BUILD_BUG_OR_ZERO says what happens: either it's a build bug, or it's zero. What about ZERO_UNLESS_BUILD_BUG_ON(e)? It's long though... how often is this going to be used? it's not like the tree is currently awash in calls to BUILD_BUG_ON_ZERO as it is. Most of the time it will be hidden as a macro-in-a-macro, like in ARRAY_SIZE(). So the length of the name doesn't matter much. But then, the _name_ itself doesn't matter much because authors of public macros are the primary user group, not John Driverhacker.

Well, there's a four line comment above it, so *someone* thought it worth documenting. Even if the new name isn't great, the old name is actively misleading.

That's a 13, and we could be a 4.
http://ozlabs.org/~rusty/ols-2003-keynote/img52.html

Cheers,
Rusty.
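For readers unfamiliar with the macro being renamed, here is a minimal userspace sketch of the "build bug, or zero" idea and of the macro-in-a-macro usage mentioned above. The definition shown is an assumption about one common way to write it and may not match the kernel header exactly; CHECKED_SIZE and int_size_checked are hypothetical names used only for illustration.

```c
#include <stddef.h>

/*
 * Sketch only: evaluates to 0 when e is false, and fails to compile
 * (negative array size) when e is true. The form of the definition is
 * an assumption, not a quote from the kernel tree.
 */
#define BUILD_BUG_OR_ZERO(e) (sizeof(char[1 - 2 * !!(e)]) - 1)

/*
 * Hypothetical macro-in-a-macro user: yields sizeof(type), but the build
 * breaks if the type is larger than max bytes.
 */
#define CHECKED_SIZE(type, max) \
	(sizeof(type) + BUILD_BUG_OR_ZERO(sizeof(type) > (max)))

/* Compiles only if sizeof(int) <= 16; the value is just sizeof(int). */
static size_t int_size_checked(void)
{
	return CHECKED_SIZE(int, 16);
}
```

Because the check is folded into an expression that adds zero, callers of the outer macro never see the inner name, which is the point made in the thread about who actually reads it.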
Re: Keyboard stops working after *lock [Was: 2.6.21-rc2-mm1]
Jiri Kosina napsal(a): (trimmed CC list a bit) On Mon, 12 Mar 2007, Jiri Slaby wrote: UHCI: Eliminate asynchronous skeleton Queue Headers Post it along with the usbmon log, and I'll try to figure out what happened. Here it comes: USBMON: f7525b40 1832950485 C Ii:004:01 0 8 = 5300 f7525b40 1832950517 S Ii:004:01 -115 8 f7525140 1832950540 S Co:004:00 s 21 09 0200 0001 1 = 01 f7525140 1832952485 C Co:004:00 0 1 Corresponds to numlock; 7; numlock; 7. Alan, sorry for the previous bad post, I mismatched 2 files. This is hopefully correct. thanks. Could you also please redo the test with the offending uhci patch reverted and send the output of a working situation? - BAD kernel: USBMON output: d28dba40 1882513063 C Ii:008:01 0 8 = 5300 d28dba40 1882513090 S Ii:008:01 -115 8 f7b31340 1882515363 S Co:008:00 s 21 09 0200 0001 1 = 00 f7b31340 1882517065 C Co:008:00 0 1 UHCI snapshot before hang: Root-hub state: running FSBR: 0 HC status usbcmd= 00c1 Maxp64 CF RS usbstat = usbint= 000f usbfrnum = (1)764 flbaseadd = 0303d764 sof = 40 stat1 = 01a5 LowSpeed Enabled Connected stat2 = 0095 Enabled Connected Most recent frame: 75a2 (418) Last ISO frame: 75a2 (418) Periodic load table 12 0 0 0 127 0 0 0 0 0 0 0 127 0 0 0 0 0 0 0 127 0 0 0 0 0 0 0 127 0 0 0 Total: 520, #INT: 4, #ISO: 0 Frame List Skeleton QHs - skel_unlink_qh [c3c41000] Skel QH link (0001) element (0001) queue is empty - skel_iso_qh [c3c41060] Skel QH link (0001) element (0001) queue is empty - skel_int128_qh [c3c410c0] Skel QH link (03c41542) element (0001) queue is empty [c3c41540] INT QH link (03c41362) element (02c4a0f0) period 128 phase 0 load 12 us urb_priv [f7b2da4c] urb [f7b314c0] qh [c3c41540] Dev=7 EP=1(IN) INT Actlen=0 1: [c2c4a0f0] link (02c4a0c0) e3 IOC Active NAK Length=7ff MaxLen=0 DT1 EndPt=1 Dev=7, PID=69(IN) (buf=36a4a040) Dummy TD [c2c4a0c0] link (02c4a120) e0 Length=0 MaxLen=7ff DT0 EndPt=0 Dev=0, PID=e1(OUT) (buf=) - skel_int64_qh [c3c41120] Skel QH link (03c41362) element (0001) queue is empty - 
skel_int32_qh [c3c41180] Skel QH link (03c41362) element (0001) queue is empty - skel_int16_qh [c3c411e0] Skel QH link (03c41362) element (0001) queue is empty - skel_int8_qh [c3c41240] Skel QH link (03c41482) element (0001) queue is empty [c3c41480] INT QH link (03c41602) element (02c4a030) period 8 phase 4 load 93 us urb_priv [f7b2d3bc] urb [d28dbc40] qh [c3c41480] Dev=2 EP=1(IN) INT Actlen=0 1: [c2c4a030] link (02c4a060) e3 LS IOC Active NAK Length=7ff MaxLen=3 DT0 EndPt=1 Dev=2, PID=69(IN) (buf=037c5000) Dummy TD [c2c4a060] link (02c4a0f0) e0 Length=0 MaxLen=7ff DT0 EndPt=0 Dev=0, PID=e1(OUT) (buf=) [c3c41600] INT QH link (03c41662) element (02c4a150) period 8 phase 4 load 17 us urb_priv [f7b2da30] urb [d28dba40] qh [c3c41600] Dev=8 EP=1(IN) INT Actlen=0 1: [c2c4a150] link (02c4a120) e3 IOC Active NAK Length=7ff MaxLen=7 DT1 EndPt=1 Dev=8, PID=69(IN) (buf=037c5180) Dummy TD [c2c4a120] link (02c4a180) e0 Length=0 MaxLen=7ff DT0 EndPt=0 Dev=0, PID=e1(OUT) (buf=) [c3c41660] INT QH link (03c41362) element (02c4a1b0) period 8 phase 4 load 17 us urb_priv [f7b2d9f8] urb [d1622840] qh [c3c41660] Dev=8 EP=2(IN) INT Actlen=0 1: [c2c4a1b0] link (02c4a1e0) e3 IOC Active NAK Length=7ff MaxLen=4 DT0 EndPt=2 Dev=8, PID=69(IN) (buf=037c5300) Dummy TD [c2c4a1e0] link (02c4a210) e0 Length=0 MaxLen=7ff DT0 EndPt=0 Dev=0, PID=e1(OUT) (buf=) - skel_int4_qh [c3c412a0] Skel QH link (03c41362) element (0001) queue is empty - skel_int2_qh [c3c41300] Skel QH link (03c41362) element (0001) queue is empty - skel_async_qh [c3c41360] Skel QH link (0001) element (02c4a000) queue is empty - skel_term_qh [c3c413c0] Skel QH link (0001) element (02c4a000) queue is empty UHCI snapshot after hang: Root-hub state: running FSBR: 0 HC status usbcmd= 00c1 Maxp64 CF RS usbstat = usbint= 000f usbfrnum = (1)c2c flbaseadd = 0303dc2c sof = 40 stat1 = 01a5 LowSpeed Enabled Connected stat2 = 0095 Enabled Connected Most recent frame: 9efc (764) Last ISO frame: 9efc (764) Periodic load table 12 0 0 0 127 0 0 0 
0 0 0 0 127 0 0 0 0 0 0 0 127 0 0 0 0 0 0 0 127 0 0 0 Total: 520,
Djprobes questions
Hi Masami,

I recently had to add support for inline code patching on i386 to my marker infrastructure. Clearly, it looks like what is done in djprobes, with the main difference that I only patch the immediate value of a 2-byte load-immediate instruction.

I think I found a solution to one of the main issues with djprobes: it currently has to wait for each CPU to hit the probe before being sure that it's safe to patch the code with something other than an int3. This is due to PIII erratum 49, which says that a CPU must execute a serializing instruction before executing cross-modified code.

Here is what I do:

While I use a breakpoint to fall in a trap for the CPUs that hit the site currently being modified, I also send an IPI to all CPUs so they execute cpuid. Once it returns, I am sure that every CPU has executed a serializing instruction, which enables me to go on with the complete code modification, therefore removing the initial breakpoint.

Here is my code:
http://ltt.polymtl.ca/cgi-bin/gitweb.cgi?p=linux-2.6-lttng.git;a=blob;f=arch/i386/kernel/marker.c;h=89b06f02f0966685be260d6364a0dd94c3d14456;hb=v2.6.20-lttng

(Comments are welcome)

On a second note, looking at the djprobes code triggered some questions in my mind about the safety of using a worker thread to make sure every interrupt context has returned (so there is no IP pointing into the modified code).

The following scenario might be possible: an interrupt handler (or trap handler) reenables interrupts, does irq_exit() or nmi_exit() (which reenables preemption) but does not do iret yet. My understanding is that it could be scheduled and have a return IP pointing to the code that is being modified. Am I right?

Regards,

Mathieu

--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
Re: [PATCH] kthread_should_stop_check_freeze (was: Re: [PATCH -mm 3/7] Freezer: Remove PF_NOFREEZE from rcutorture thread)
Hi!

Looks good to me! The other kthread_should_stop() calls in rcutorture.c should also become kthread_should_stop_check_freeze().

Why is it useful? Because we want to avoid repeating

	while (!kthread_should_stop()) {
		try_to_freeze();
		...
	}

in many places?

Do not do it, then. Confusion it causes is not worth saving one line of code. You do less typing, but the resulting code is _less_ readable, not more. NAK.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: [RFC][PATCH 2/7] RSS controller core
On Mon, Mar 12, 2007 at 11:42:59AM -0700, Dave Hansen wrote:
How about we drill down on these a bit more.

On Mon, 2007-03-12 at 02:00 +0100, Herbert Poetzl wrote:
- shared mappings of 'shared' files (binaries and libraries) to allow for reduced memory footprint when N identical guests are running

So, it sounds like this can be phrased as a requirement like:

Guests must be able to share pages.

Can you give us an idea why this is so?

sure, one reason for this is that guests tend to be similar (or almost identical) which results in quite a lot of 'shared' libraries and executables which would otherwise get cached for each guest and would also be mapped for each guest separately

On a typical vserver system,

there is nothing like a typical Linux-VServer system :)

how much memory would be lost if guests were not permitted to share pages like this?

let me give a real world example here:

- typical guest with 600MB disk space
- about 100MB guest specific data (not shared)
- assumed that 80% of the libs/tools are used

gives 400MB of shared read only data

assumed you are running 100 guests on a host, that makes ~39GB of virtual memory which will get paged in and out over and over again ...

.. compared to 400MB shared pages in memory :)

How much does this decrease the density of vservers?

well, let's look at the overall memory resource function with the above assumptions:

with sharing: f(N) = N*80M + 400M
without sharing: g(N) = N*480M

so the decrease N -> inf: g/f -> 6 (factor)

which is quite realistic, if you consider that there are only so many distributions, OTOH, the factor might become less important when the guest specific data grows ...

- virtual 'physical' limit should not cause swap out when there are still pages left on the host system (but pages of over limit guests can be preferred for swapping)

Is this a really hard requirement?

no, not hard, but a reasonable optimization ...
let me note once again, that for full isolation you better go with Xen or some other Hypervisor because if you make it work like Xen, it will become as slow and resource hungry as any other paravirtualization solution ...

It seems a bit fluffy to me.

most optimizations might look strange at first glance, but when you check what the limiting factors for OS-Level virtualizations are, you will find that it looks like this:

(in order of decreasing relevance)

- I/O subsystem
- available memory
- network performance
- CPU performance

note: this is for 'typical' guests, not for number crunching or special database, or pure network bound applications/guests ...

An added bonus if we can do it, but certainly not the most important requirement in the bunch.

nope, not the _most_ important one, but it all sums up :)

What are the consequences if this isn't done? Doesn't a loaded system eventually have all of its pages used anyway, so won't this always be a temporary situation?

let's consider a quite limited guest (or several of them) which have a 'RAM' limit of 64MB and additional 64MB of 'virtual swap' assigned ...

if they use roughly 96MB (memory footprint) then having this 'fluffy' optimization will keep them running without any effect on the host side, but without, they will continuously swap in and out which will affect not only the host, but also the other guests ...

This also seems potentially harmful if we aren't able to get pages *back* that we've given to a guest.

no, the idea is not to keep them unconditionally, the concept is to allow them to stay, even if the guest has reached the RSS limit and a 'real' system would have to swap pages out (or simply drop them) to get other pages mapped ...

Tasks can pin pages in lots of creative ways.
sure, this is why we should have proper limits for that too :)

- accounting and limits have to be consistent and should roughly represent the actual used memory/swap (modulo optimizations, I can go into detail here, if necessary)

So, consistency is important, but is precision?

IMHO precision is not that important, of course, the values should be in the same ballpark ...

If we, for instance, used one of the hashing schemes, we could have some imprecise decisions made but the system would stay consistent overall.

it is also important that the lack of precision cannot be exploited to allocate unreasonable amounts of resources ...

at least Linux-VServer could live with +/- 10% (or probably more) as I said, it is mainly used for preventing DoS or DoR attacks ...

This requirement also doesn't seem to push us in the direction of having distinct page owners, or some sharing mechanism, because both would be consistent.

- OOM handling on a per guest basis, i.e. some out of memory condition in guest A must not affect guest B

I'll agree that this one is important and well stated as-is. Any disagreement on this one?

nope
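The memory resource functions f(N) and g(N) quoted earlier in this message can be checked with a few lines of C. This is only a sketch of the arithmetic as written in the message: the 80M and 400M constants are taken from the stated formulas, and the function names are hypothetical.

```c
/* All sizes in MB; constants are the figures used in f(N) and g(N). */
#define PRIVATE_MB 80UL		/* per-guest part that is not shared */
#define SHARED_MB  400UL	/* read-only data shared by all guests */

/* f(N): total memory with page sharing between guests. */
static unsigned long mem_with_sharing(unsigned long guests)
{
	return guests * PRIVATE_MB + SHARED_MB;
}

/* g(N): total memory when every guest keeps its own copy. */
static unsigned long mem_without_sharing(unsigned long guests)
{
	return guests * (PRIVATE_MB + SHARED_MB);
}

/* g(N)/f(N); approaches 480/80 = 6 as the number of guests grows. */
static double sharing_gain(unsigned long guests)
{
	return (double)mem_without_sharing(guests) /
	       (double)mem_with_sharing(guests);
}
```

For the 100-guest case in the message this gives 8400MB with sharing against 48000MB without, a factor of about 5.7, which is already close to the limit of 6.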
Re: [PATCH 2/2] pci: Repair pci_save/restore_state so we can restore one save many times.
Eric W. Biederman wrote:
Because we do not reserve space for the pci-x and pci-e state in struct pci_dev we need to dynamically allocate it. However because we need to support restore being called multiple times after a single save it is never safe to free the buffers we have allocated to hold the state.

So this patch modifies the save routines to first check to see if we have already allocated a state buffer before allocating a new one. Then the restore routines are modified to not free the state after restoring it.

Simple and it fixes some subtle error path handling bugs, that are hard to test for.

Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]

I tested this patch and the other 2 in this series:

[PATCH 0/2] Repair pci_restore_state when used with device resets
[PATCH 1/2] msi: Safer state caching.

against e1000 with suspend/resume functionality. Apart from a minor symmetry violation in e1000 for which I will send a patch later, these patches appear to work fine on my ich8 with 5 msi capable e1000 ports.

Feel free to add my Signed-off-by: Auke Kok [EMAIL PROTECTED]

Cheers,

Auke

---
 drivers/pci/pci.c   |   12 ++++++------
 include/linux/pci.h |    5 -----
 2 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 6fb78df..b292c9a 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -551,7 +551,9 @@ static int pci_save_pcie_state(struct pci_dev *dev)
 	if (pos <= 0)
 		return 0;
 
-	save_state = kzalloc(sizeof(*save_state) + sizeof(u16) * 4, GFP_KERNEL);
+	save_state = pci_find_saved_cap(dev, PCI_CAP_ID_EXP);
+	if (!save_state)
+		save_state = kzalloc(sizeof(*save_state) + sizeof(u16) * 4, GFP_KERNEL);
 	if (!save_state) {
 		dev_err(&dev->dev, "Out of memory in pci_save_pcie_state\n");
 		return -ENOMEM;
@@ -582,8 +584,6 @@ static void pci_restore_pcie_state(struct pci_dev *dev)
 	pci_write_config_word(dev, pos + PCI_EXP_LNKCTL, cap[i++]);
 	pci_write_config_word(dev, pos + PCI_EXP_SLTCTL, cap[i++]);
 	pci_write_config_word(dev, pos + PCI_EXP_RTCTL, cap[i++]);
-	pci_remove_saved_cap(save_state);
-	kfree(save_state);
 }
 
 
@@ -597,7 +597,9 @@ static int pci_save_pcix_state(struct pci_dev *dev)
 	if (pos <= 0)
 		return 0;
 
-	save_state = kzalloc(sizeof(*save_state) + sizeof(u16), GFP_KERNEL);
+	save_state = pci_find_saved_cap(dev, PCI_CAP_ID_EXP);
+	if (!save_state)
+		save_state = kzalloc(sizeof(*save_state) + sizeof(u16), GFP_KERNEL);
 	if (!save_state) {
 		dev_err(&dev->dev, "Out of memory in pci_save_pcie_state\n");
 		return -ENOMEM;
@@ -622,8 +624,6 @@ static void pci_restore_pcix_state(struct pci_dev *dev)
 	cap = (u16 *)&save_state->data[0];
 
 	pci_write_config_word(dev, pos + PCI_X_CMD, cap[i++]);
-	pci_remove_saved_cap(save_state);
-	kfree(save_state);
 }
 
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 78417e4..481ea06 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -209,11 +209,6 @@ static inline void pci_add_saved_cap(struct pci_dev *pci_dev,
 	hlist_add_head(&new_cap->next, &pci_dev->saved_cap_space);
 }
 
-static inline void pci_remove_saved_cap(struct pci_cap_saved_state *cap)
-{
-	hlist_del(&cap->next);
-}
-
 /*
  * For PCI devices, the region numbers are assigned this way:
  *
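The behaviour the patch description asks for (look for an existing buffer before allocating, and keep it after restore) can be sketched in plain userspace C. Everything here is a simplified stand-in: struct fake_dev, struct saved_cap and the three functions are hypothetical names for illustration, not the kernel's pci_dev, pci_cap_saved_state or their helpers.

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a saved capability buffer. */
struct saved_cap {
	struct saved_cap *next;
	int cap_id;
	unsigned short data[4];
};

/* Hypothetical device: a list of saved buffers plus pretend registers. */
struct fake_dev {
	struct saved_cap *saved;
	unsigned short regs[4];
};

static struct saved_cap *find_saved_cap(struct fake_dev *dev, int cap_id)
{
	struct saved_cap *c;

	for (c = dev->saved; c; c = c->next)
		if (c->cap_id == cap_id)
			return c;
	return NULL;
}

/* Save: reuse the buffer from an earlier save, allocate only the first time. */
static int save_cap_state(struct fake_dev *dev, int cap_id)
{
	struct saved_cap *c = find_saved_cap(dev, cap_id);
	int i;

	if (!c) {
		c = calloc(1, sizeof(*c));
		if (!c)
			return -1;
		c->cap_id = cap_id;
		c->next = dev->saved;
		dev->saved = c;
	}
	for (i = 0; i < 4; i++)
		c->data[i] = dev->regs[i];
	return 0;
}

/* Restore: copy the state back but keep the buffer, so it can be repeated. */
static int restore_cap_state(struct fake_dev *dev, int cap_id)
{
	struct saved_cap *c = find_saved_cap(dev, cap_id);
	int i;

	if (!c)
		return -1;
	for (i = 0; i < 4; i++)
		dev->regs[i] = c->data[i];
	return 0;
}
```

With this shape, one save followed by any number of restores works, and a second save overwrites the existing buffer instead of leaking a new one.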
Re: [PATCH] kthread_should_stop_check_freeze (was: Re: [PATCH -mm 3/7] Freezer: Remove PF_NOFREEZE from rcutorture thread)
Do not do it, then. Confusion it causes is not worth saving one line of code. You do less typing, but the resulting code is _less_ readable, not more.

Then please document it _clearly_ with the kthread code somewhere. The reason I brought this up is I had no idea we had to put the freezer gunk in all kernel thread loops and I've been writing kernel threads for years.

Anton
Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2
On 13/03/07, Mike Galbraith [EMAIL PROTECTED] wrote:
On Tue, 2007-03-13 at 07:38 +1100, Con Kolivas wrote:
On Tuesday 13 March 2007 07:11, Mike Galbraith wrote:
Killing the known corner case starvation scenarios is wonderful, but let's not just pretend that interactive tasks don't have any special requirements.

Now you're really making a stretch of things. Where on earth did I say that interactive tasks don't have special requirements? It's a fundamental feature of this scheduler that I go to great pains to get them as low latency as possible and their fair share of cpu despite having a completely fair cpu distribution.

As soon as your cpu is fully utilized, fairness loses or interactivity loses. Pick one.

That's not true unless you refuse to prioritise your tasks accordingly. Let's take this discussion in a different direction. You already nice your lame processes. Why? You already have the concept that you are prioritising things to normal or background tasks. You say so yourself that lame is a background task. Stating the bleedingly obvious, the unix way of prioritising things is via nice. You already do that. So moving on from that...

In your test case you ask how you can maximise cpu usage. Well you know the answer already. You run two threads. I won't dispute that.

The debate seems to be centered on whether two tasks that are niced +5 or to a higher value is background. In my opinion, nice 5 is not background, but relatively less cpu. You already are savvy enough to be using two threads and nicing them. All I ask you to do when using RSDL is to change your expectations slightly and your settings from nice 5 to nice 10 or 15 or even 19. Why is that so offensive to you? nice 5 is 75% the cpu of nice 0. nice 10 is 50%, nice 15 is 25%, nice 19 is 5%. If you're so intent on defining nice 5 as background would it be a matter of me just modifying nice 5 to be 25% instead? I suspect your answer will be no because then you'll argue that you shouldn't nice at all, but it should be interesting to see your response.

You seem to be advocating that the scheduler does everything and we need to implement some complex flag instead. I don't believe that's the right thing to do at all.

So I offer you some options.

1. Be happy with changing your nice from 5 to 15. I still don't think this is in any way unreasonable.
2. Wait for me to fix -niced tasks behaviour and -nice your X. I plan to implement this change anyway, not necessarily for X.
3. Have me redefine what nice 5 is, and tell me what percentage cpu you think is right.
4. Any combination of the above.

Please don't pick 5. none of the above. Please try to work with me on this.

--
-ck
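The four figures quoted above (nice 5 is 75%, nice 10 is 50%, nice 15 is 25%, nice 19 is 5%) all lie on a straight line of 5 percentage points per nice level. The sketch below encodes that linear fit; it is an inference from the numbers in this message, not the scheduler's actual code, and rsdl_share_percent is a hypothetical name.

```c
/*
 * CPU share relative to a nice 0 task, in percent, for nice 0..19.
 * Linear fit to the four figures quoted in the message; an assumption
 * for illustration, not taken from the RSDL source.
 * Returns -1 for nice values outside 0..19.
 */
static int rsdl_share_percent(int nice)
{
	if (nice < 0 || nice > 19)
		return -1;
	return 100 - 5 * nice;
}
```

Under this fit, option 3 in the list above (making nice 5 worth 25%) would mean giving nice 5 the share that nice 15 has today.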
[PATCH 1/1] mm: Inconsistent use of node IDs
This patch corrects inconsistent use of node numbers (variously nid or node) in the presence of fake NUMA.

Both AMD and Intel x86_64 discovery code will determine a CPU's physical node and use that node when calling numa_add_cpu() to associate that CPU with the node, but numa_add_cpu() treats the node argument as a fake node. This physical node may not exist within the fake nodespace, and even if it does, it will likely incorrectly associate a CPU with a fake memory node that may not share the same underlying physical NUMA node.

Similarly, the PCI code which determines the node of the PCI bus saves it in the pci_sysdata structure. This node then propagates down to other buses and devices which hang off the PCI bus, and is used to specify a node when allocating memory. The purpose is to provide NUMA locality, but the node is a physical node, and the memory allocation code expects a fake node argument.

Provide a routine (get_fake_node()) to map a physical node ID to a fake node ID, where the fake node ID contains memory on the specified physical node ID. This fake node's zonelist is tied to other close fake nodes, maintaining NUMA locality.

Also provide node_online_phys() which is the same as node_online() but takes a physical node ID.

Change init_cpu_to_node(), x86_64 and PCI code to use get_fake_node() and node_online_phys() in order to convert to an appropriate fake ID.
Signed-off-by: Ethan Solomita [EMAIL PROTECTED]

---
 arch/i386/pci/acpi.c          |    6 +++
 arch/x86_64/kernel/setup.c    |   14
 arch/x86_64/mm/numa.c         |   70 +-
 arch/x86_64/pci/k8-bus.c      |    3 +
 include/asm-x86_64/topology.h |    8
 5 files changed, 85 insertions(+), 16 deletions(-)

diff -uprN -x install -X linux-2.6.21-rc3-mm2/Documentation/dontdiff linux-2.6.21-rc3-mm2/arch/i386/pci/acpi.c linux-2.6.21-rc3-mm2-phystofake/arch/i386/pci/acpi.c
--- linux-2.6.21-rc3-mm2/arch/i386/pci/acpi.c	2007-03-09 16:42:42.0 -0800
+++ linux-2.6.21-rc3-mm2-phystofake/arch/i386/pci/acpi.c	2007-03-12 12:36:50.0 -0700
@@ -35,8 +35,13 @@ struct pci_bus * __devinit pci_acpi_scan
 
 	pxm = acpi_get_pxm(device->handle);
 #ifdef CONFIG_ACPI_NUMA
-	if (pxm >= 0)
+	if (pxm >= 0) {
 		sd->node = pxm_to_node(pxm);
+#ifdef CONFIG_NUMA_EMU
+		if (sd->node != -1)
+			sd->node = get_fake_node(sd->node);
+#endif
+	}
 #endif
 
 	bus = pci_scan_bus_parented(NULL, busnum, &pci_root_ops, sd);
diff -uprN -x install -X linux-2.6.21-rc3-mm2/Documentation/dontdiff linux-2.6.21-rc3-mm2/arch/x86_64/kernel/setup.c linux-2.6.21-rc3-mm2-phystofake/arch/x86_64/kernel/setup.c
--- linux-2.6.21-rc3-mm2/arch/x86_64/kernel/setup.c	2007-03-09 16:42:42.0 -0800
+++ linux-2.6.21-rc3-mm2-phystofake/arch/x86_64/kernel/setup.c	2007-03-12 12:44:31.0 -0700
@@ -476,20 +476,20 @@ static void __cpuinit display_cacheinfo(
 }
 
 #ifdef CONFIG_NUMA
-static int nearby_node(int apicid)
+static int __init nearby_node(int apicid)
 {
 	int i;
 	for (i = apicid - 1; i >= 0; i--) {
 		int node = apicid_to_node[i];
-		if (node != NUMA_NO_NODE && node_online(node))
+		if (node != NUMA_NO_NODE && node_online_phys(node))
 			return node;
 	}
 	for (i = apicid + 1; i < MAX_LOCAL_APIC; i++) {
 		int node = apicid_to_node[i];
-		if (node != NUMA_NO_NODE && node_online(node))
+		if (node != NUMA_NO_NODE && node_online_phys(node))
 			return node;
 	}
-	return first_node(node_online_map); /* Shouldn't happen */
+	return NUMA_NO_NODE; /* Shouldn't happen */
 }
 #endif
 
@@ -528,7 +528,7 @@ static void __init amd_detect_cmp(struct
 	node = c->phys_proc_id;
 	if (apicid_to_node[apicid] != NUMA_NO_NODE)
 		node = apicid_to_node[apicid];
-	if (!node_online(node)) {
+	if (!node_online_phys(node)) {
 		/* Two possibilities here:
 		   - The CPU is missing memory and no node was created.
 		   In that case try picking one from a nearby CPU
@@ -543,9 +543,10 @@ static void __init amd_detect_cmp(struct
 		    apicid_to_node[ht_nodeid] != NUMA_NO_NODE)
 			node = apicid_to_node[ht_nodeid];
 		/* Pick a nearby node */
-		if (!node_online(node))
+		if (!node_online_phys(node))
 			node = nearby_node(apicid);
 	}
+	node = get_fake_node(node);
 	numa_set_node(cpu, node);
 
 	printk(KERN_INFO "CPU %d/%x -> Node %d\n", cpu, apicid, node);
@@ -679,7 +680,7 @@ static int __cpuinit intel_num_cpu_cores
 	return 1;
 }
 
-static void srat_detect_node(void)
+static void __cpuinit srat_detect_node(void)
 {
 #ifdef CONFIG_NUMA
 	unsigned node;
@@ -689,6 +690,7 @@ static void srat_detect_node(void)
 	/* Don't do
[PATCH] Fix vmi time header bug
Some gcc put this function in .init.text because the header didn't match. For 2.6.21-rc.

Zach

Index: linux-2.6.21/include/asm-i386/vmi_time.h
===
--- linux-2.6.21.orig/include/asm-i386/vmi_time.h	2007-03-06 18:56:03.0 -0800
+++ linux-2.6.21/include/asm-i386/vmi_time.h	2007-03-12 13:55:16.0 -0800
@@ -54,7 +54,7 @@ extern unsigned long vmi_cpu_khz(void);
 
 #ifdef CONFIG_X86_LOCAL_APIC
 extern void __init vmi_timer_setup_boot_alarm(void);
-extern void __init vmi_timer_setup_secondary_alarm(void);
+extern void __devinit vmi_timer_setup_secondary_alarm(void);
 extern void apic_vmi_timer_interrupt(void);
 #endif
Re: /sys/devices/system/cpu/cpuX/online are missing
On Mon, 12 Mar 2007, Heiko Carstens wrote: On Sun, Mar 11, 2007 at 10:26:52PM +0100, Giuliano Pochini wrote: Since 2.6.20 /sys/devices/system/cpu/cpuX/online isn't there anymore. The directories exist, though. I also tested linux-2.6.21rc3. I had a look at the archives and I found nothing about the removal of that file, which is still documented in Documentation/cpu-hotplug.txt. I don't know if other architectures are affected. $ uname -a Linux Jay 2.6.20 #1 SMP Mon Feb 5 22:42:18 CET 2007 ppc 7455, altivec supported PowerMac3,6 GNU/Linux No cpusets. CONFIG_HOTPLUG_CPU=y Somebody inverted the logic when and if the 'online' attribute for cpu devices appear. See 72486f1f8f0a2bc828b9d30cf4690cf2dd6807fc. The fix for s390 is this: 6721f77810dfcb7cbf8e97be6fa43fe2740dd0aa. Looks like arch/ppc was left out as well. I had a look at arch/powerpc/kernel/smp.c but I'm not familiar at all with those parts of the kernel. I'm cc'ing this message to linuxppc-dev. -- Giuliano.
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wednesday 07 March 2007 11:02, Nick Piggin wrote: On Wed, Mar 07, 2007 at 10:49:47AM +0100, Nick Piggin wrote: On Wed, Mar 07, 2007 at 01:44:20AM -0800, Bill Irwin wrote: On Wed, Mar 07, 2007 at 10:28:21AM +0100, Nick Piggin wrote: Depending on whether anyone wants it, and what features they want, we could emulate the old syscall, and make a new restricted one which is much less intrusive. For example, if we can operate only on MAP_ANONYMOUS memory and specify that nonlinear mappings effectively mlock the pages, then we can get rid of all the objrmap and unmap_mapping_range handling, forget about the writeout and msync problems... Anonymous-only would make it a doorstop for Oracle, since its entire motive for using it is to window into objects larger than user virtual Uh, duh yes I don't mean MAP_ANONYMOUS, I was just thinking of the shmem inode that sits behind MAP_ANONYMOUS|MAP_SHARED. Of course if you don't have a file descriptor to get a pgoff, then remap_file_pages is a doorstop for everyone ;) address spaces (this likely also applies to UML, though they should really chime in to confirm). Restrictions to tmpfs and/or ramfs would likely be liveable, though I suspect some things might want to do it to shm segments (I'll ask about that one). There's definitely no need for a persistent backing store for the object to be remapped in Oracle's case, in any event. It's largely the in-core destination and source of IO, not something saved on-disk itself. Yeah, tmpfs/shm segs are what I was thinking about. If UML can live with that as well, then I think it might be a good option. Oh, hmm if you can truncate these things then you still need to force unmap so you still need i_mmap_nonlinear. Well, we don't need truncate(), but MADV_REMOVE for memory hotunplug, which is way similar I guess. 
About the restriction to tmpfs, I have just discovered '[PATCH] mm: tracking shared dirty pages' (commit d08b3851da41d0ee60851f2c75b118e1f7a5fc89), which already partially conflicts with remap_file_pages for file-based mmaps (and that's fully fine, for now).

Even if UML does not need it, till now if there is a VMA protection and a page hasn't been remapped with remap_file_pages, the VMA protection is used (just because it makes sense).

However, it is only used when the PTE is first created - we can never change protections on a VMA - so if vma_wants_writenotify() is true (on all file-based and on no shmfs based mapping, right?), and we write-protect the VMA, it will always be write-protected.

That's no problem for UML, but it is for any other user (I guess I'll have to prevent callers from trying such stuff - I started from a pretty generic patch).

But come to think of it, I still don't think nonlinear mappings are too bad as they are ;)

Btw, I really like removing -populate and merging the common code together. filemap_populate and shmem_populate are so obnoxiously different that I already wanted to do that (after merging remap_file_pages() core).

Also, I'm curious. Since my patches are already changing remap_file_pages() code, should they be absolutely merged after yours?

--
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade
Re: Make sure we populate the initroot filesystem late enough
On Thu, Mar 01, 2007 at 09:30:56AM +0900, Michael Ellerman wrote: On Wed, 2007-02-28 at 10:13 +, David Woodhouse wrote: On Wed, 2007-02-28 at 07:43 +0100, Benjamin Herrenschmidt wrote: I wouldn't be that sure ... I've had problems in the past with PMU based cpufreq... looks like flushing all caches and hard-resetting the processor on the fly when there can be pending DMAs might be a source of trouble... especially on CPUs that don't have working cache flush HW assist. I've seen it on a PowerMac3,1 (400MHz G4) where we don't have cpufreq. I've also seen it on the latest 1.5GHz Mac Mini, and on my shinybook. They all fall over with the latest kernel, although the shinybook only does so immediately when booted with mem=512M. The shinybook does crash later with new kernels though; I don't yet know why. It could be the same thing, or it could be something different. That one seemed to appear between Fedora's 2.6.19-1.2913 and 2.6.19-1.2914 kernels, where we did nothing but turned CONFIG_SYSFS_DEPRECATED on. I don't blame cpufreq. At various times I've been equally convinced that it was due to CONFIG_KPROBES, and Linus' initrd-moving patch. Is there any pattern to the way it dies? Or is it just randomly dying somewhere depending on which config options you have enabled? This is starting to sound reminiscent of a bug I chased for a while last year on Power5, but didn't find. It was fixed on some machines by disabling CONFIG_KEXEC, and/or other random unrelated CONFIG options. Unfortunately it magically stopped reproducing so I never caught it :/ Hmm. The crash came back after I booted into Mac OS X and back. It was however a different crash, I believe it was coming from the USB modules (as it would keep going when it happened, and get another crash, which tended to scroll away too fast for me to capture) but I believe it was still getting down into the slab code and actually dying there.
However, reverting the reversion of 8d610dd52dd1da696e199e4b4545f33a2a5de5c6 and instead applying the following patch:

diff -ru linux-source-2.6.20.orig/arch/powerpc/mm/init_32.c linux-source-2.6.20/arch/powerpc/mm/init_32.c
--- linux-source-2.6.20.orig/arch/powerpc/mm/init_32.c 2007-02-05 05:44:54.0 +1100
+++ linux-source-2.6.20/arch/powerpc/mm/init_32.c 2007-03-10 11:03:56.0 +1100
@@ -244,7 +244,8 @@
 void free_initrd_mem(unsigned long start, unsigned long end)
 {
 	if (start < end)
-		printk ("Freeing initrd memory: %ldk freed\n", (end - start) >> 10);
+		printk ("NOT Freeing initrd memory: %ldk freed\n", (end - start) >> 10);
+	return;
 	for (; start < end; start += PAGE_SIZE) {
 		ClearPageReserved(virt_to_page(start));
 		init_page_count(virt_to_page(start));

which if I recall correctly David Woodhouse posted to this thread, seems to have fixed it. I dunno if it's relevant, but my initrd.img is 13193315 bytes long, (ie 99 bytes over 12884k) and the above logs: NOT Freeing initrd memory: 12888k freed which makes sense... I of course completely failed to think to check this with the crashing kernel, if it seems relevant I can roll back to it and get the numbers.

-- --- Paul TBBle Hampson, B.Sc, LPI, MCSE On-hiatus Asian Studies student, ANU The Boss, Bubblesworth Pty Ltd (ABN: 51 095 284 361) [EMAIL PROTECTED] Of course Pacman didn't influence us as kids. If it did, we'd be running around in darkened rooms, popping pills and listening to repetitive music. -- Kristian Wilson, Nintendo, Inc, 1989 License: http://creativecommons.org/licenses/by/2.1/au/
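The 12884k versus 12888k figures in the message above can be checked with a little arithmetic. The following is only a sketch, assuming 4 KiB pages and a page-aligned initrd start address (neither is stated in the message); page_rounded_kib() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>

/*
 * Hypothetical helper: round a byte count up to whole pages and return
 * the result in KiB, the unit the kernel prints with "(end - start) >> 10".
 * Assumes page_size is a power of two of at least 1 KiB.
 */
static unsigned long page_rounded_kib(unsigned long bytes,
                                      unsigned long page_size)
{
    unsigned long pages;

    assert(page_size >= 1024 && (page_size & (page_size - 1)) == 0);
    pages = (bytes + page_size - 1) / page_size;  /* round up to pages */
    return pages * (page_size >> 10);             /* pages -> KiB */
}
```

Under those assumptions, 13193315 bytes is 99 bytes past 12884k, so it spills into one more 4 KiB page and the rounded region comes to 12888k, which matches the logged value.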
Re: [RFC][PATCH 2/7] RSS controller core
On Mon, 2007-03-12 at 23:41 +0100, Herbert Poetzl wrote: On Mon, Mar 12, 2007 at 11:42:59AM -0700, Dave Hansen wrote: How about we drill down on these a bit more. On Mon, 2007-03-12 at 02:00 +0100, Herbert Poetzl wrote: - shared mappings of 'shared' files (binaries and libraries) to allow for reduced memory footprint when N identical guests are running So, it sounds like this can be phrased as a requirement like: Guests must be able to share pages. Can you give us an idea why this is so? sure, one reason for this is that guests tend to be similar (or almost identical) which results in quite a lot of 'shared' libraries and executables which would otherwise get cached for each guest and would also be mapped for each guest separately On a typical vserver system, there is nothing like a typical Linux-VServer system :) how much memory would be lost if guests were not permitted to share pages like this? let me give a real world example here: - typical guest with 600MB disk space - about 100MB guest specific data (not shared) - assumed that 80% of the libs/tools are used I get the general idea here, but I just don't think those numbers are very accurate. My laptop has a bunch of gunk open (xterm, evolution, firefox, xchat, etc...). I ran this command: lsof | egrep '/(usr/|lib.*\.so)' | awk '{print $9}' | sort | uniq | xargs du -Dcs and got: 113840 total On a web/database server that I have (ps aux | wc -l == 128), I just ran the same: 39168 total That's assuming that all of the libraries are fully read in and populated, just by their on-disk sizes. Is that not a reasonable measure of the kinds of things that we can expect to be shared in a vserver? If so, it's a long way from 400MB. Could you try a similar measurement on some of your machines? Perhaps mine are just weird. - virtual 'physical' limit should not cause swap out when there are still pages left on the host system (but pages of over limit guests can be preferred for swapping) Is this a really hard requirement? 
no, not hard, but a reasonable optimization ... let me note once again, that for full isolation you better go with Xen or some other Hypervisor because if you make it work like Xen, it will become as slow and resource hungry as any other paravirtualization solution ... Believe me, _I_ don't want Xen. :) It seems a bit fluffy to me. most optimizations might look strange at first glance, but when you check what the limiting factors for OS-Level virtualizations are, you will find that it looks like this: (in order of decreasing relevance) - I/O subsystem - available memory - network performance - CPU performance note: this is for 'typical' guests, not for number crunching or special database, or pure network bound applications/guests ... I don't doubt this, but doing this two-level page-out thing for containers/vservers over their limits is surely something that we should consider farther down the road, right? It's important to you, but you're obviously not doing any of the mainline coding, right? What are the consequences if this isn't done? Doesn't a loaded system eventually have all of its pages used anyway, so won't this always be a temporary situation? let's consider a quite limited guest (or several of them) which have a 'RAM' limit of 64MB and additional 64MB of 'virtual swap' assigned ... if they use roughly 96MB (memory footprint) then having this 'fluffy' optimization will keep them running without any effect on the host side, but without, they will continuously swap in and out which will affect not only the host, but also the other guests ... All workloads that use $limit+1 pages of memory will always pay the price, right? :) -- Dave
Re: [PATCH] x86_64, i386: Add command line length to boot protocol
Hi! +cmdline_size: .long COMMAND_LINE_SIZE-1 #length of the command line, Why a long? It's unlikely that someone is going to have a command line bigger than 0x. Well, I could imagine overflowing that. Describing your numa setup, excluding few bad bits of ram using memmap=exact, set up your boot over iscsi on cmdline these are likely to eat insane amount of cmdline space. 65535 characters? Are you for real? Stop and think about just how big that is. If you have to create a boot command line that long, you have serious, serious issues. Well, it is about the same size as my .config... I agree we are unlikely to hit it any time soon... I could imagine some (ab)uses, like fixed_acpi_bios=lots of hex digits, but those are ugly. I could also imagine some uses where entire embedded machine is described at kernel commandline. Yes, all those are ugly/unlikely. OTOH saving 2 bytes does not seem like that great goal. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
CONFIG_REORDER Kconfig help strange sentence.
OK, this confused me: Function reordering (REORDER) [N/y/?] (NEW) ? This option enables the toolchain to reorder functions for a more optimal TLB usage. If you have pretty much any version of binutils, this can increase your kernel build time by roughly one minute. If you have pretty much any version of binutils? Huh? You mean This will slow your kernel build by about a minute? Rusty.
Re: [PATCH 1/1] mm: Inconsistent use of node IDs
On Monday 12 March 2007 23:51, Ethan Solomita wrote: This patch corrects inconsistent use of node numbers (variously nid or node) in the presence of fake NUMA. I think it's very consistent -- your patch would make it inconsistent though. Both AMD and Intel x86_64 discovery code will determine a CPU's physical node and use that node when calling numa_add_cpu() to associate that CPU with the node, but numa_add_cpu() treats the node argument as a fake node. This physical node may not exist within the fake nodespace, and even if it does, it will likely incorrectly associate a CPU with a fake memory node that may not share the same underlying physical NUMA node. Similarly, the PCI code which determines the node of the PCI bus saves it in the pci_sysdata structure. This node then propagates down to other buses and devices which hang off the PCI bus, and is used to specify a node when allocating memory. The purpose is to provide NUMA locality, but the node is a physical node, and the memory allocation code expects a fake node argument. Sorry, but when you ask for NUMA emulation you will get it. I don't see any point in a half-way "only for some subsystems I like" NUMA emulation. It's unlikely that your ideas of where it is useful and where it is not match other NUMA emulation users' ideas too. Besides, adding such a secondary node space would likely be a huge long term maintenance issue. I can just see it breaking with every non trivial change. NACK. -Andi
Re: [PATCH] x86_64, i386: Add command line length to boot protocol
On Tue, Mar 13, 2007 at 12:12:20AM +0100, Pavel Machek wrote: 65535 characters? Are you for real? Stop and think about just how big that is. If you have to create a boot command line that long, you have serious, serious issues. Well, it is about the same size as my .config... So? That has *nothing* to do with the boot command line I agree we are unlikely to hit it any time soon... I could imagine some (ab)uses, like fixed_acpi_bios=lots of hex digits, but those are ugly. That's beyond ugly, and rapidly heading towards 'loony'. I could also imagine some uses where entire embedded machine is described at kernel commandline. There are far better ways to get configuration into the kernel than the boot command line. Anyways, I'm tired of arguing for the sake of arguing. I really could care less. Dave -- http://www.codemonkey.org.uk
Re: [RFC][PATCH 2/7] RSS controller core
On Mon, Mar 12, 2007 at 03:25:07PM +0530, Balbir Singh wrote: doesn't look so good for me, mainly because of the additional per page data and per page processing on 4GB memory, with 100 guests, 50% shared for each guest, this basically means ~1mio pages, 500k shared and 1500k x sizeof(page_container) entries, which roughly boils down to ~25MB of wasted memory ... increase the amount of shared pages and it starts getting worse, but maybe I'm missing something here We need to decide whether we want to do per-container memory limitation via these data structures, or whether we do it via a physical scan of some software zone, possibly based on Mel's patches. why not do simple page accounting (as done currently in Linux) and use that for the limits, without keeping the reference from container to page? best, Herbert Herbert, You lost me in the cc list and I almost missed this part of the thread. hmm, it is very unlikely that this would happen, for several reasons ... and indeed, checking the thread in my mailbox shows that akpm dropped you ... Subject: [RFC][PATCH 2/7] RSS controller core From: Pavel Emelianov [EMAIL PROTECTED] To: Andrew Morton [EMAIL PROTECTED], Paul Menage [EMAIL PROTECTED], Srivatsa Vaddagiri [EMAIL PROTECTED], Balbir Singh [EMAIL PROTECTED] Cc: [EMAIL PROTECTED], Linux Kernel Mailing List linux-kernel@vger.kernel.org Date: Tue, 06 Mar 2007 17:55:29 +0300 Subject: Re: [RFC][PATCH 2/7] RSS controller core From: Andrew Morton [EMAIL PROTECTED] To: Pavel Emelianov [EMAIL PROTECTED] Cc: [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], Paul Menage [EMAIL PROTECTED], List linux-kernel@vger.kernel.org Date: Tue, 6 Mar 2007 14:00:36 -0800 that's the one I 'group' replied to ... Could you please not modify the cc list. I never modify the cc unless explicitly asked to do so. I wish others would have it that way too :) best, Herbert Thanks, Balbir
Re: RSDL v0.30 cpu scheduler for mainline kernels
From: Con Kolivas [EMAIL PROTECTED] Date: Mon, 12 Mar 2007 10:58:11 +1100 http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-sched-rsdl-0.30.patch FWIW, this boots and seems to work well on sparc64. Tested on UP SunBlade1500 and 24cpu Niagara T1000.
Re: irda rmmod lockdep trace.
From: Samuel Ortiz [EMAIL PROTECTED] Date: Mon, 12 Mar 2007 02:38:43 +0200 On Sat, Mar 10, 2007 at 07:43:26PM +0200, Samuel Ortiz wrote: Hi Dave, On Thu, Mar 08, 2007 at 05:54:36PM -0500, Dave Jones wrote: modprobe irda ; rmmod irda in 2.6.21rc3 gets me the spew below.. Well it seems that we call __irias_delete_object() from hashbin_delete(). Then __irias_delete_object() calls itself hashbin_delete() again. We're trying to get the lock recursively. Looking at the code more carefully, this seems to be a false positive: iriap_cleanup and __irias_delete_object are taking 2 different locks from 2 different hashbin instances. The locks belong to the same lock class but they are hierarchically different. We need to tell the validator about it and the following patch does that. Comments are welcomed as I'm planning to push it to netdev soon: I would strongly caution against adding any run-time overhead just to cure a false lockdep warning. Even adding a new function argument is too much IMHO. Make the cost show up for lockdep only, perhaps by putting each hashbin lock into a separate locking class?
Re: [RFC][PATCH 4/7] RSS accounting hooks over the code
On Mon, Mar 12, 2007 at 09:50:08AM -0700, Dave Hansen wrote: On Mon, 2007-03-12 at 19:23 +0300, Kirill Korotaev wrote: For these you essentially need per-container page-_mapcount counter, otherwise you can't detect whether rss group still has the page in question being mapped in its processes' address spaces or not. What do you mean by this? You can always tell whether a process has a particular page mapped. Could you explain the issue a bit more. I'm not sure I get it. OpenVZ wants to account _shared_ pages in a guest different than separate pages, so that the RSS accounted values reflect the actual used RAM instead of the sum of all processes RSS' pages, which for sure is more relevant to the administrator, but IMHO not so terribly important to justify memory consuming structures and sacrifice performance to get it right YMMV, but maybe we can find a smart solution to the issue too :) best, Herbert -- Dave
Re: [PATCH 1/1] mm: Inconsistent use of node IDs
Andi Kleen wrote: On Monday 12 March 2007 23:51, Ethan Solomita wrote: This patch corrects inconsistent use of node numbers (variously nid or node) in the presence of fake NUMA. I think it's very consistent -- your patch would make it inconsistent though. It's consistent to call node_online() with a physical node ID when the online node mask is composed of fake nodes? Sorry, but when you ask for NUMA emulation you will get it. I don't see any point in a half-way "only for some subsystems I like" NUMA emulation. It's unlikely that your ideas of where it is useful and where it is not match other NUMA emulation users' ideas too. I don't understand your comments. My code is intended to work for all systems. If the system is non-NUMA by nature, then all CPUs map to fake node 0. As an example, on a two chip dual-core AMD opteron system, there are 4 cpus where CPUs 0 and 1 are close to the first half of memory, and CPUs 2 and 3 are close to the second half. Without this change CPUs 2 and 3 are mapped to fake node 1. This results in awful performance. With this change, CPUs 2 and 3 are mapped to (roughly) 1/2 the fake node count. Their zonelists[] are ordered to do allocations preferentially from zones that are local to CPUs 2 and 3. Can you tell me the scenario where my code makes things worse? Besides, adding such a secondary node space would likely be a huge long term maintenance issue. I can just see it breaking with every non trivial change. I'm adding no data structures to do this. The current code already has get_phys_node. My changes use the existing information about node layout, both the physical and fake, and define a mapping. The current mapping just takes a physical node and says it's the fake node too. NACK. I wish you would include some specifics as to why you think what you do. You're suggesting we leave in place a system that destroys NUMA locality when using fake numa, and passes around physical node ids as an index into nodes[] which is indexed by fake nodes. My change has no effect without fake numa, and harms no one with fake numa. -- Ethan
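To make the "CPUs 2 and 3 are mapped to (roughly) 1/2 the fake node count" example concrete, here is a minimal sketch of one possible physical-to-fake node mapping. It is not the patch under discussion: it assumes the fake nodes are spread evenly over the physical nodes, and both function names are hypothetical.

```c
#include <assert.h>

/*
 * Hypothetical: which physical node backs a given fake node, assuming
 * nr_fake fake nodes are divided evenly among nr_phys physical nodes.
 */
static int phys_node_of_fake(int fake_node, int nr_phys, int nr_fake)
{
    assert(nr_phys > 0 && nr_fake >= nr_phys);
    assert(fake_node >= 0 && fake_node < nr_fake);
    return fake_node * nr_phys / nr_fake;
}

/*
 * Hypothetical: the first fake node carved out of a physical node.
 * This is the smallest fake node f with phys_node_of_fake(f) == phys_node,
 * i.e. ceil(phys_node * nr_fake / nr_phys).
 */
static int first_fake_node(int phys_node, int nr_phys, int nr_fake)
{
    assert(nr_phys > 0 && nr_fake >= nr_phys);
    assert(phys_node >= 0 && phys_node < nr_phys);
    return (phys_node * nr_fake + nr_phys - 1) / nr_phys;
}
```

With two physical nodes and eight fake nodes, a CPU on physical node 1 would be associated with fake node 4 under this sketch, rather than with fake node 1 as happens when the physical node ID is reused as a fake node ID.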
Re: CONFIG_REORDER Kconfig help strange sentence.
On Tue, Mar 13, 2007 at 10:18:03AM +1100, Rusty Russell wrote: OK, this confused me: Function reordering (REORDER) [N/y/?] (NEW) ? This option enables the toolchain to reorder functions for a more optimal TLB usage. If you have pretty much any version of binutils, this can increase your kernel build time by roughly one minute. If you have pretty much any version of binutils? Huh? You mean This will slow your kernel build by about a minute? Yes. Lots of sections seem to trigger some quadratic behaviour in ld. It might be fixed in some unreleased CVS version though (not 100% sure) -Andi
Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2
On Mon, 12 Mar 2007, Mike Galbraith wrote: On Tue, 2007-03-13 at 07:38 +1100, Con Kolivas wrote: On Tuesday 13 March 2007 07:11, Mike Galbraith wrote: Killing the known corner case starvation scenarios is wonderful, but let's not just pretend that interactive tasks don't have any special requirements. Now you're really making a stretch of things. Where on earth did I say that interactive tasks don't have special requirements? It's a fundamental feature of this scheduler that I go to great pains to get them as low latency as possible and their fair share of cpu despite having a completely fair cpu distribution. As soon as your cpu is fully utilized, fairness loses or interactivity loses. Pick one. correct. the problem is that it's hard (if not impossible) to properly identify what is needed to make a system have good interactivity. in some cases it's a matter of low latency (wake up a process as quickly as you can when whatever it was waiting on is available), but in others it's a matter of allocating the _right_ process enough CPU (X needs enough CPU to do things) where it's a matter of needing low-latency, it's possible to design a scheduler that will do things in a predictable enough way that you know the max latency you have to deal with (and the RSDL seems to do this) the problem comes when this isn't enough. if you have several CPU hogs on a system, and they are all around the same priority level, how can the scheduler know which one needs the CPU the most for good interactivity? in some cases you may be able to directly detect that your high-priority process is waiting for another one (tracing pipes and local sockets for example), but what if you are waiting for several of them? (think a multimedia desktop waiting for the sound card, CDRom, hard drive, and video all at once) which one needs the extra CPU the most? Fairness is much easier to enforce (and much easier to understand) the RSDL is concentrating on enforcing fairness, with bounded (and predictable) latencies. if you are willing to tell the system what you consider more important (and how much more important you consider it), then it's much easier to figure out who to give the CPU to. Con is just asking you to do this (and you already do, by doing a nice -5. but it sounds like you want that to mean more than it currently does) David Lang
Re: [PATCH] Delete superfluous source file net/wanrouter/af_wanpipe.c.
From: Robert P. J. Day [EMAIL PROTECTED] Date: Sat, 10 Mar 2007 03:49:52 -0500 (EST) Delete the apparently superfluous source file net/wanrouter/af_wanpipe.c. Signed-off-by: Robert P. J. Day [EMAIL PROTECTED] Applied, thanks Robert. This thing isn't even built in 2.4.x :-) Although there is some ancient reference to the build module in Documentation/networking/wan-router.txt, a heavily out of date document.
Re: [PATCH 2/2] xfs: stop using kmalloc in xfs_buf_get_noaddr
Hi, --On 9 March 2007 12:55:11 PM +0100 Christoph Hellwig [EMAIL PROTECTED] wrote: Ed Cashin found a bug in the error handling code for the case where a page allocation fails. Here's the updated version:

Index: linux-2.6/fs/xfs/linux-2.6/xfs_buf.c
===
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_buf.c 2007-03-08 19:08:38.0 +0100
+++ linux-2.6/fs/xfs/linux-2.6/xfs_buf.c 2007-03-09 08:59:15.0 +0100
+	for (i = 0; i < page_count; i++) {
+		bp->b_pages[i] = alloc_page(GFP_KERNEL);
+		if (!bp->b_pages[i])
+			goto fail_free_mem;
+	}
+	bp->b_flags |= _XBF_PAGES;
+
+	error = _xfs_buf_map_pages(bp, XBF_MAPPED);
+	if (unlikely(error)) {
+		printk(KERN_WARNING "%s: failed to map pages\n",
+				__FUNCTION__);
 		goto fail_free_mem;
-	bp->b_flags |= _XBF_KMEM_ALLOC;
+	}
 	xfs_buf_unlock(bp);
 	XB_TRACE(bp, "no_daddr", data);
 	return bp;
+
 fail_free_mem:
-	kmem_free(data, malloc_len);
+	for ( ; i >= 0; i--)
+		__free_page(bp->b_pages[i]);
 fail_free_buf:
 	xfs_buf_free(bp);
 fail:

It looks like you might need: for (i--; i >= 0; i--) (or: for (j = 0; j < i; j++) etc.) Because if the initial alloc_page loop goes to completion then: i == page_count, and if the alloc_page loop terminates early then bp->b_pages[i] == NULL. So we have gone 1 too far in both cases and need to start freeing one back. Unless I missed something. --Tim
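The off-by-one Tim points out can be reproduced outside the kernel. The following is a userspace sketch only: fake_alloc_page() and fake_free_page() are hypothetical stand-ins for alloc_page() and __free_page(), the fail_at parameter exists purely to force a failure, and live_pages is a counter added so the cleanup can be checked.

```c
#include <assert.h>
#include <stdlib.h>

static int live_pages;  /* outstanding allocations, for checking only */

/* Hypothetical stand-in for alloc_page(); fails on purpose at fail_at. */
static void *fake_alloc_page(int index, int fail_at)
{
    void *p;

    if (index == fail_at)
        return NULL;
    p = malloc(64);
    if (p)
        live_pages++;
    return p;
}

/* Hypothetical stand-in for __free_page(). */
static void fake_free_page(void *p)
{
    free(p);
    live_pages--;
}

/* Allocate page_count entries, or free everything already allocated. */
static int alloc_pages_or_unwind(void **pages, int page_count, int fail_at)
{
    int i;

    assert(page_count >= 0);
    for (i = 0; i < page_count; i++) {
        pages[i] = fake_alloc_page(i, fail_at);
        if (!pages[i])
            goto fail_free_mem;
    }
    return 0;

fail_free_mem:
    /*
     * pages[i] is NULL here (or i == page_count if a later step failed),
     * so the last valid entry is i - 1: step back once before freeing.
     */
    for (i--; i >= 0; i--) {
        fake_free_page(pages[i]);
        pages[i] = NULL;
    }
    return -1;
}
```

Starting the cleanup loop at i instead of i - 1 would hand a NULL (or out-of-range) entry to the free routine, which is the case the suggested for (i--; i >= 0; i--) avoids.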
Re: [ck] Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2
On 3/12/07, michael chang [EMAIL PROTECTED] wrote: Considering the concepts put out by projects such as BOINC and [EMAIL PROTECTED], I wouldn't be thoroughly surprised by this ideology, although I do question the particular way this test case is being run. If Con actually implements SCHED_IDLEPRIO in RSDL, life is good even in that case. This seems to me like he's saying that there has to be a mechanism (outside of nice) that can be used to treat processes that I want to be interactive all special-like. It feels like something that would have been said in the design of what the scheduler was in -ck and is currently in vanilla. Exactly. Driving us again toward the fact that different workloads might benefit from different schedulers (eg: RSDL is cool for server loads, previous staircase did an excellent job on desktop, etc) and thus that having a choice of schedulers might be something that would satisfy (some) people... To me, that fundamentally clashes with the design behind RSDL. That said, I could be wrong -- Con appears to have something that could be very promising up his sleeve that could come out sooner or later. Once he's written it, of course. In any case, RSDL seems very promising, for the most part. It certainly is. Negative feedback can be a good thing too, as it helps improving it anyway. It's nonetheless true that it's practically impossible to satisfy 100% of use case with a single design, so choices will have to be made. HTH T-Bone -- Thibaut VARENE http://www.parisc-linux.org/~varenet/
Re: /sys/devices/system/cpu/cpuX/online are missing
Giuliano Pochini [EMAIL PROTECTED] writes: I had a look at arch/powerpc/kernel/smp.c but I'm not familiar at all with those parts of the kernel. See arch/powerpc/kernel/sysfs.c:topology_init. I don't think there is anything to do here. You probably don't have CONFIG_HOTPLUG_CPU enabled. Andreas. -- Andreas Schwab, SuSE Labs, [EMAIL PROTECTED] SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 And now for something completely different.
Re: sys_write() racy for multi-threaded append?
On 3/12/07, Bodo Eggert [EMAIL PROTECTED] wrote: On Mon, 12 Mar 2007, Michael K. Edwards wrote: That's fine when you're doing integration test, and should probably be the default during development. But if the race is first exposed in the field, or if the developer is trying to concentrate on a different problem, spectacular crash and burn may do more harm than good. It's easy enough to refactor the f_pos handling in the kernel so that it all goes through three or four inline accessor functions, at which point you can choose your trade-off between speed and idiot-proofness -- at _kernel_ compile time, or (given future hardware that supports standardized optionally-atomic-based-on-runtime-flag operations) per process at run-time. CONFIG_WOMBAT Waste memory, brain and time in order to grant an atomic write which is neither guaranteed by the standard nor expected by any sane programmer, just in case some idiot tries to write to one file from multiple processes. Warning: Programs expecting this behaviour are buggy and non-portable. OK, I laughed out loud at this. But I think you're missing my point, which is that there's a time to be hard-core about code quality and there's a time to be hard-core about _product_ quality. Face it, all products containing software more or less suck. This is because most programmers write crap code most of the time. The only way to cope with this, outside the confines of the European defense industry and other niches insulated from economic reality, is to make the production environment gentler on _application_ code than the development environment is. Hence CONFIG_WOMBAT. (I like that name. I'm going to use it in my patch, with your permission. :-) Writing to a file from multiple processes is not usually the problem. Writing to a common struct file from multiple threads is. 99.999% of the time it will work, because you're only writing as far as VFS cache and then bumping f_pos, and your threads are probably on the same processor anyway. 
0.001% of the time the second thread will see a stale f_pos and clobber the first write. This is true even on file types that can never return a short write. If you remember to open with O_APPEND so the pos argument to vfs_write is silently ignored, or if the implementation underlying vfs_write effectively ignores the pos argument irrespective of flags, you're OK. If the pos argument isn't ignored, or if you ever look at the result of a relative seek on any fd that maps to that struct file, you're screwed. (Note to the alert reader: yes, this means shell scripts should always use >> rather than > when routing stdout and/or stderr to a file. You're just as vulnerable to interleaving due to stdio buffering issues as you are when stdout and stderr are sent to the tty, and short writes may still be a problem if you are so foolish as to use a filesystem that generates them on anything short of a catastrophic error, but at least you get O_APPEND and sane behavior on ftruncate().) Frankly, I think that unless application programmers poke at some sort of magic "I promise to handle short writes correctly" bit, write() should always return either the full number of bytes requested or an error code. If you assume that you won't have short writes, your programs may fail on e.g. solaris. There may be reasons for linux to use the same semantics at some time in the future, you never know. So what? My products are shipping _now_. Future kernels are guaranteed to break them anyway because sysfs is a moving target. Solaris is so not in the game for my kind of embedded work, it's not even funny. If POSIX mandates stupid shit, and application programmers don't read that part of the manual anyway (and don't code on that assumption in practice), to hell with POSIX. On many file descriptors, short writes simply can't happen -- and code that purports to handle short writes but has never been exercised is arguably worse than code that simply bombs on short write.
So if I can't shim in an induce-short-writes-randomly-on-purpose mechanism during development, I don't want short writes in production, period. In my world, GNU/Linux is not a crappy imitation Solaris that you get to pay out the wazoo for to Red Hat (and get no documentation and lousy tech support that doesn't even cover your hardware). It's a full-source-code platform on which you can engineer robust industrial and consumer products, because you can control the freeze and release schedule component-by-component, and you can point fix anything in the system at any time. If, that is, you understand that the source code is not the software, and that you can't retrofit stability and security overnight onto code that was written with no thought of anything but performance. If you asume you *may* have short writes, you have no problem. Sure -- until the one code path in a hundred that handles the short write case incorrectly gets traversed in production, after having
[PATCH] i386: Simplify smp_call_function*() by using common implementation
Subject: Simplify smp_call_function*() by using common implementation

smp_call_function and smp_call_function_single are almost complete duplicates of the same logic. This patch combines them by implementing them in terms of the more general smp_call_function_mask().

Signed-off-by: Jeremy Fitzhardinge [EMAIL PROTECTED]
Cc: Stephane Eranian [EMAIL PROTECTED]
Cc: Andrew Morton [EMAIL PROTECTED]
Cc: Andi Kleen [EMAIL PROTECTED]
Cc: Randy.Dunlap [EMAIL PROTECTED]
Cc: Ingo Molnar [EMAIL PROTECTED]

---
 arch/i386/kernel/smp.c | 213 ++--
 1 file changed, 102 insertions(+), 111 deletions(-)

===
--- a/arch/i386/kernel/smp.c
+++ b/arch/i386/kernel/smp.c
@@ -515,6 +515,73 @@ void unlock_ipi_call_lock(void)
 static struct call_data_struct *call_data;
+
+/**
+ * smp_call_function_mask(): Run a function on a set of other CPUs.
+ * @mask: The set of cpus to run on. Must not include the current cpu.
+ * @func: The function to run. This must be fast and non-blocking.
+ * @info: An arbitrary pointer to pass to the function.
+ * @wait: If true, wait (atomically) until function has completed on other CPUs.
+ *
+ * Returns 0 on success, else a negative status code. Does not return until
+ * remote CPUs are nearly ready to execute func or are or have finished.
+ *
+ * You must not call this function with disabled interrupts or from a
+ * hardware interrupt handler or from a bottom half handler.
+ */
+int smp_call_function_mask(cpumask_t mask,
+			   void (*func)(void *), void *info,
+			   int wait)
+{
+	struct call_data_struct data;
+	cpumask_t allbutself;
+	int cpus;
+
+	/* Can deadlock when called with interrupts disabled */
+	WARN_ON(irqs_disabled());
+
+	/* Holding any lock stops cpus from going down. */
+	spin_lock(&call_lock);
+
+	allbutself = cpu_online_map;
+	cpu_clear(smp_processor_id(), allbutself);
+
+	cpus_and(mask, mask, allbutself);
+	cpus = cpus_weight(mask);
+
+	if (!cpus) {
+		spin_unlock(&call_lock);
+		return 0;
+	}
+
+	data.func = func;
+	data.info = info;
+	atomic_set(&data.started, 0);
+	data.wait = wait;
+	if (wait)
+		atomic_set(&data.finished, 0);
+
+	call_data = &data;
+	mb();
+
+	/* Send a message to other CPUs */
+	if (cpus_equal(mask, allbutself))
+		send_IPI_allbutself(CALL_FUNCTION_VECTOR);
+	else
+		send_IPI_mask(mask, CALL_FUNCTION_VECTOR);
+
+	/* Wait for response */
+	while (atomic_read(&data.started) != cpus)
+		cpu_relax();
+
+	if (wait)
+		while (atomic_read(&data.finished) != cpus)
+			cpu_relax();
+	spin_unlock(&call_lock);
+
+	return 0;
+}
+
 /**
  * smp_call_function(): Run a function on all other CPUs.
  * @func: The function to run. This must be fast and non-blocking.
@@ -528,48 +595,43 @@ static struct call_data_struct *call_dat
  * You must not call this function with disabled interrupts or from a
  * hardware interrupt handler or from a bottom half handler.
  */
-int smp_call_function (void (*func) (void *info), void *info, int nonatomic,
-			int wait)
-{
-	struct call_data_struct data;
-	int cpus;
-
-	/* Holding any lock stops cpus from going down. */
-	spin_lock(&call_lock);
-	cpus = num_online_cpus() - 1;
-	if (!cpus) {
-		spin_unlock(&call_lock);
-		return 0;
-	}
-
-	/* Can deadlock when called with interrupts disabled */
-	WARN_ON(irqs_disabled());
-
-	data.func = func;
-	data.info = info;
-	atomic_set(&data.started, 0);
-	data.wait = wait;
-	if (wait)
-		atomic_set(&data.finished, 0);
-
-	call_data = &data;
-	mb();
-
-	/* Send a message to all other CPUs and wait for them to respond */
-	send_IPI_allbutself(CALL_FUNCTION_VECTOR);
-
-	/* Wait for response */
-	while (atomic_read(&data.started) != cpus)
-		cpu_relax();
-
-	if (wait)
-		while (atomic_read(&data.finished) != cpus)
-			cpu_relax();
-	spin_unlock(&call_lock);
-
-	return 0;
+int smp_call_function(void (*func) (void *info), void *info, int nonatomic,
+		      int wait)
+{
+	return smp_call_function_mask(cpu_online_map, func, info, wait);
 }
 EXPORT_SYMBOL(smp_call_function);
+
+/*
+ * smp_call_function_single - Run a function on another CPU
+ * @func: The function to run. This must be fast and non-blocking.
+ * @info: An arbitrary pointer to pass to the function.
+ * @nonatomic: Currently unused.
+ * @wait: If true, wait until function has completed on other CPUs.
+ *
+ * Returns 0 on success, else a negative status code.
+ *
+ * Does
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Tue, Mar 13, 2007 at 12:01:13AM +0100, Blaisorblade wrote: On Wednesday 07 March 2007 11:02, Nick Piggin wrote: Yeah, tmpfs/shm segs are what I was thinking about. If UML can live with that as well, then I think it might be a good option. Oh, hmm if you can truncate these things then you still need to force unmap so you still need i_mmap_nonlinear. Well, we don't need truncate(), but MADV_REMOVE for memory hotunplug, which is way similar I guess. About the restriction to tmpfs, I have just discovered '[PATCH] mm: tracking shared dirty pages' (commit d08b3851da41d0ee60851f2c75b118e1f7a5fc89), which already partially conflicts with remap_file_pages for file-based mmaps (and that's fully fine, for now). Even if UML does not need it, till now if there is a VMA protection and a page hasn't been remapped with remap_file_pages, the VMA protection is used (just because it makes sense). However, it is only used when the PTE is first created - we can never change protections on a VMA - so if vma_wants_writenotify() is true (on all file-based and on no shmfs based mapping, right?), and we write-protect the VMA, it will always be write-protected. Yes, I believe that is the case, however I wonder if that is going to be a problem for you to distinguish between write faults for clean writable ptes, and write faults for readonly ptes? That's no problem for UML, but for any other user (I guess I'll have to prevent callers from trying such stuff - I started from a pretty generic patch). But come to think of it, I still don't think nonlinear mappings are too bad as they are ;) Btw, I really like removing ->populate and merging the common code together. filemap_populate and shmem_populate are so obnoxiously different that I already wanted to do that (after merging remap_file_pages() core). Yeah they are also frustratingly similar to filemap_nopage and shmem_nopage, and duplicate a lot of the same code ;) Also, I'm curious. 
Since my patches are already changing remap_file_pages() code, should they be absolutely merged after yours? Is there a big clash? I don't think I did a great deal to fremap.c (mainly just removing stuff)...
Re: sys_write() racy for multi-threaded append?
Writing to a file from multiple processes is not usually the problem. Writing to a common struct file from multiple threads is. Not normally because POSIX sensibly invented pread/pwrite. Forgot preadv/pwritev but they did the basics and end of problem So what? My products are shipping _now_. That doesn't inspire confidence. even funny. If POSIX mandates stupid shit, and application programmers don't read that part of the manual anyway (and don't code on that assumption in practice), to hell with POSIX. On many file Thats funny, you were talking about quality a moment ago. descriptors, short writes simply can't happen -- and code that There is almost no descriptor this is true for. Any file I/O can and will end up short on disk full or resource limit exceeded or quota exceeded or NFS server exploded or ... And on the device side about the only thing with the vaguest guarantees is pipe(). purports to handle short writes but has never been exercised is arguably worse than code that simply bombs on short write. So if I can't shim in an induce-short-writes-randomly-on-purpose mechanism during development, I don't want short writes in production, period. Easy enough to do and gcov plus dejagnu or similar tools will let you coverage analyse the resulting test set and replay it. Sure -- until the one code path in a hundred that handles the short write case incorrectly gets traversed in production, after having gone untested in a development environment that used a different filesystem that never happened to trigger it. Competent QA and testing people test all the returns in the manual as well as all the returns they can find in the code. See ptrace(2) if you don't want to do a lot of relinking and strace for some useful worked examples of syscall hooking. 
Alan
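The two positions in this thread (short writes must be handled, and the handling code must actually be exercised) can be combined in a small userspace sketch. Everything below is illustrative and not from any poster: write_all(), short_write(), the write_fn typedef and the 3-byte cap are hypothetical names and values. The idea is that the retry loop takes the writer as a parameter, so a development build can pass a shim that deliberately produces short writes while production passes plain write(2).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Signature shared by write(2) and the shim below. */
typedef ssize_t (*write_fn)(int fd, const void *buf, size_t len);

/* How often the shim has been called (illustrative bookkeeping only). */
static int short_write_calls;

/*
 * Hypothetical development-only shim: forwards to write(2) but never
 * passes more than 3 bytes at a time, so every caller sees short writes.
 */
static ssize_t short_write(int fd, const void *buf, size_t len)
{
    short_write_calls++;
    return write(fd, buf, len > 3 ? 3 : len);
}

/*
 * Write all len bytes through do_write, retrying after short writes
 * and EINTR. Returns 0 on success, -1 with errno set on failure.
 */
static int write_all(write_fn do_write, int fd, const void *buf, size_t len)
{
    const char *p = buf;

    assert(do_write != NULL);
    while (len > 0) {
        ssize_t n = do_write(fd, p, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before any data moved */
            return -1;
        }
        if (n == 0) {
            errno = EIO;    /* no progress: fail rather than spin */
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

Calling write_all(short_write, fd, buf, len) in test builds drives the retry path on every multi-byte write; write_all(write, fd, buf, len) is the production form. This does nothing about the shared f_pos race discussed above; for that, pwrite() with an explicit offset or O_APPEND is the relevant tool.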
Stracing Amanda (was: RSDL for 2.6.21-rc3- 0.29)
Patrick Mau [EMAIL PROTECTED] writes: Why not temporarily replace /bin/tar with a shell script that does:
#!/bin/sh
exec strace -f -o output /bin/real.tar "$@"
You beat me to it. :) I've done that before; it's a great suggestion. Except that if you expect 'tar' to be invoked multiple times in a run, you should probably use 'output.$$' for the output filename so things don't get clobbered. -Doug
Fwd: PROBLEM: 2.6.20-1 not working on ibook g4 (BUG/Oops)
-- Forwarded message -- Hi, I have tested on my mac mini g4. 2.6.21-rc2 causes an oops like the one in the above post. With the new 2.6.21-rc3-git7, the kernel loads ok and the penguin pixmap appears, but then it stops, and there are no error messages either. Regards dave 2007/3/7, Benjamin Herrenschmidt [EMAIL PROTECTED]: On Wed, 2007-03-07 at 17:53 +1300, Paul Collins wrote: David Woodhouse [EMAIL PROTECTED] writes: On Tue, 2007-03-06 at 14:53 +1300, Paul Collins wrote: In case it's of interest, 2.6.20 has been running fine on my PowerBook5,4. How much memory? What if you boot with mem=512M or mem=256M? 1GB. Also works fine when booted with those options. Can you try 2.6.21-rc3 ? We just fixed a nasty bug causing memory corruption. Ben. ___ Linuxppc-dev mailing list [EMAIL PROTECTED] https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [RFC][PATCH 2/7] RSS controller core
hmm, it is very unlikely that this would happen, for several reasons ... and indeed, checking the thread in my mailbox shows that akpm dropped you ... But, I got Andrew's email. Subject: [RFC][PATCH 2/7] RSS controller core From: Pavel Emelianov [EMAIL PROTECTED] To: Andrew Morton [EMAIL PROTECTED], Paul Menage [EMAIL PROTECTED], Srivatsa Vaddagiri [EMAIL PROTECTED], Balbir Singh [EMAIL PROTECTED] Cc: [EMAIL PROTECTED], Linux Kernel Mailing List linux-kernel@vger.kernel.org Date: Tue, 06 Mar 2007 17:55:29 +0300 Subject: Re: [RFC][PATCH 2/7] RSS controller core From: Andrew Morton [EMAIL PROTECTED] To: Pavel Emelianov [EMAIL PROTECTED] Cc: [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], Paul Menage [EMAIL PROTECTED], List linux-kernel@vger.kernel.org Date: Tue, 6 Mar 2007 14:00:36 -0800 that's the one I 'group' replied to ... Could you please not modify the cc list. I never modify the cc unless explicitely asked to do so. I wish others would have it that way too :) Thats good to know, but my mailer shows Andrew Morton [EMAIL PROTECTED] to Pavel Emelianov [EMAIL PROTECTED] cc Paul Menage [EMAIL PROTECTED], Srivatsa Vaddagiri [EMAIL PROTECTED], Balbir Singh [EMAIL PROTECTED] (see I am HERE), devel@openvz.org, Linux Kernel Mailing List linux-kernel@vger.kernel.org, [EMAIL PROTECTED], Kirill Korotaev [EMAIL PROTECTED] dateMar 7, 2007 3:30 AM subject Re: [RFC][PATCH 2/7] RSS controller core mailed-by vger.kernel.org On Tue, 06 Mar 2007 17:55:29 +0300 and your reply as Andrew Morton [EMAIL PROTECTED], Pavel Emelianov [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], Paul Menage [EMAIL PROTECTED], List linux-kernel@vger.kernel.org to Andrew Morton [EMAIL PROTECTED] cc Pavel Emelianov [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], Paul Menage [EMAIL PROTECTED], List linux-kernel@vger.kernel.org dateMar 9, 2007 10:18 PM subject Re: [RFC][PATCH 2/7] RSS controller core mailed-by vger.kernel.org I am not sure what 
went wrong. Could you please check your mail client, because it seemed to even change the email address to smtp.osdl.org, which bounced back when I wrote to you earlier. best, Herbert Cheers, Balbir
rmmod uhci_hcd - BUG: atomic counter underflow
Hi. After rmmoding of uhci_hcd on fresh booted 2.6.21-rc3-mm2 I got this:

BUG: atomic counter underflow at:
 [c0104f0b] show_trace_log_lvl+0x1a/0x30
 [c01055f3] show_trace+0x12/0x14
 [c010567a] dump_stack+0x16/0x18
 [c01dc41b] kref_put+0x4d/0xb2
 [c01db754] kobject_put+0x14/0x16
 [c01db8a3] kobject_unregister+0x22/0x25
 [c024c987] bus_remove_driver+0x75/0x82
 [c024d3b8] driver_unregister+0xb/0x18
 [c01e7020] pci_unregister_driver+0x13/0x73
 [f88dbbd9] uhci_hcd_cleanup+0xd/0x2d [uhci_hcd]
 [c013fb69] sys_delete_module+0x133/0x195
 [c0103fe0] syscall_call+0x7/0xb
 ===

Note, that this is connected:
Bus 004 Device 001: ID :
Bus 003 Device 001: ID :
Bus 002 Device 004: ID 0458:004c KYE Systems Corp. (Mouse Systems) Slimstar Pro Keyboard
Bus 002 Device 003: ID 04b4:2050 Cypress Semiconductor Corp.
Bus 002 Device 002: ID 045e:00f0 Microsoft Corp.
Bus 002 Device 001: ID :
Bus 001 Device 001: ID :
Bus 005 Device 001: ID :

What other info do you want me to post?

regards,
--
http://www.fi.muni.cz/~xslaby/ Jiri Slaby
faculty of informatics, masaryk university, brno, cz
e-mail: jirislaby gmail com, gpg pubkey fingerprint: B674 9967 0407 CE62 ACC8 22A0 32CC 55C3 39D4 7A7E
Re: rmmod uhci_hcd - BUG: atomic counter underflow
On Mon, 12 Mar 2007, Jiri Slaby wrote: Hi. After rmmoding of uhci_hcd on fresh booted 2.6.21-rc3-mm2 I got this:

BUG: atomic counter underflow at:
 [c0104f0b] show_trace_log_lvl+0x1a/0x30
 [c01055f3] show_trace+0x12/0x14
 [c010567a] dump_stack+0x16/0x18
 [c01dc41b] kref_put+0x4d/0xb2
 [c01db754] kobject_put+0x14/0x16
 [c01db8a3] kobject_unregister+0x22/0x25
 [c024c987] bus_remove_driver+0x75/0x82
 [c024d3b8] driver_unregister+0xb/0x18
 [c01e7020] pci_unregister_driver+0x13/0x73
 [f88dbbd9] uhci_hcd_cleanup+0xd/0x2d [uhci_hcd]
 [c013fb69] sys_delete_module+0x133/0x195
 [c0103fe0] syscall_call+0x7/0xb
 ===

Note, that this is connected:
Bus 004 Device 001: ID :
Bus 003 Device 001: ID :
Bus 002 Device 004: ID 0458:004c KYE Systems Corp. (Mouse Systems) Slimstar Pro Keyboard
Bus 002 Device 003: ID 04b4:2050 Cypress Semiconductor Corp.
Bus 002 Device 002: ID 045e:00f0 Microsoft Corp.
Bus 002 Device 001: ID :
Bus 001 Device 001: ID :
Bus 005 Device 001: ID :

What other info do you want me to post?

My guess is that this was caused by changes to the driver core, not by anything connected to USB. Would it be possible for you to add the atomic counter underflow check to 2.6.21-rc3 and see if the problem still occurs? If it doesn't, that's a good indication the USB stack isn't guilty -- the bus registration code hasn't changed for several kernel releases.

Alan Stern
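For readers unfamiliar with what an "atomic counter underflow" check guards against, here is a rough userspace model using C11 atomics. It is only a sketch: struct ref, ref_put(), the put_result enum and count_release() are hypothetical names, and this is not the kernel's struct kref nor the check carried in -mm. It shows the failure mode in the trace above, where one put too many happens on an object whose count has already reached zero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Minimal stand-in for a kref-style reference count (hypothetical). */
struct ref {
    atomic_int count;
};

enum put_result { PUT_STILL_REFERENCED, PUT_RELEASED, PUT_UNDERFLOW };

/* A new object starts with one reference held by its creator. */
static void ref_init(struct ref *r)
{
    atomic_init(&r->count, 1);
}

static void ref_get(struct ref *r)
{
    atomic_fetch_add(&r->count, 1);
}

/*
 * Drop one reference. Calls release() when the last reference goes away.
 * A put on a count that is already zero (or negative) is reported as an
 * underflow instead of being allowed to wrap silently.
 */
static enum put_result ref_put(struct ref *r, void (*release)(struct ref *))
{
    int old;

    assert(r != NULL && release != NULL);
    old = atomic_fetch_sub(&r->count, 1);
    if (old <= 0) {
        atomic_fetch_add(&r->count, 1);   /* undo the bogus decrement */
        return PUT_UNDERFLOW;
    }
    if (old == 1) {
        release(r);
        return PUT_RELEASED;
    }
    return PUT_STILL_REFERENCED;
}

/* Demo release callback that only counts invocations. */
static int release_calls;

static void count_release(struct ref *r)
{
    (void)r;
    release_calls++;
}
```

In this model an unbalanced unregister path shows up as PUT_UNDERFLOW on the extra put, which is the userspace analogue of the BUG message in the report.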
Re: rmmod uhci_hcd - BUG: atomic counter underflow
Alan Stern napsal(a): On Mon, 12 Mar 2007, Jiri Slaby wrote: After rmmoding of uhci_hcd on fresh booted 2.6.21-rc3-mm2 I got this: BUG: atomic counter underflow at: [...] [c01db754] kobject_put+0x14/0x16 [c01db8a3] kobject_unregister+0x22/0x25 [c024c987] bus_remove_driver+0x75/0x82 [c024d3b8] driver_unregister+0xb/0x18 [c01e7020] pci_unregister_driver+0x13/0x73 [f88dbbd9] uhci_hcd_cleanup+0xd/0x2d [uhci_hcd] [...] Would it be possible for you to add the atomic counter underflow check to 2.6.21-rc3 and see if the problem still occurs? If it doesn't, that's a good indication the USB stack isn't guilty -- the bus registration code hasn't changed for several kernel releases. Yes. regards, -- http://www.fi.muni.cz/~xslaby/ Jiri Slaby faculty of informatics, masaryk university, brno, cz e-mail: jirislaby gmail com, gpg pubkey fingerprint: B674 9967 0407 CE62 ACC8 22A0 32CC 55C3 39D4 7A7E Hnus [EMAIL PROTECTED] is an alias for /dev/null
Re: rmmod uhci_hcd - BUG: atomic counter underflow
Jiri Slaby napsal(a): Alan Stern napsal(a): On Mon, 12 Mar 2007, Jiri Slaby wrote: After rmmoding of uhci_hcd on fresh booted 2.6.21-rc3-mm2 I got this: BUG: atomic counter underflow at: [...] [c01db754] kobject_put+0x14/0x16 [c01db8a3] kobject_unregister+0x22/0x25 [c024c987] bus_remove_driver+0x75/0x82 [c024d3b8] driver_unregister+0xb/0x18 [c01e7020] pci_unregister_driver+0x13/0x73 [f88dbbd9] uhci_hcd_cleanup+0xd/0x2d [uhci_hcd] [...] Would it be possible for you to add the atomic counter underflow check to 2.6.21-rc3 and see if the problem still occurs? If it doesn't, that's a good indication the USB stack isn't guilty -- the bus registration code hasn't changed for several kernel releases. Yes. I can confirm, that this issue went upstream and is currently present there. regards, -- http://www.fi.muni.cz/~xslaby/ Jiri Slaby faculty of informatics, masaryk university, brno, cz e-mail: jirislaby gmail com, gpg pubkey fingerprint: B674 9967 0407 CE62 ACC8 22A0 32CC 55C3 39D4 7A7E Hnus [EMAIL PROTECTED] is an alias for /dev/null
Re: [PATCH] usb-serial regression (Oops) in 2.6.21-rc*
On Mon, Mar 12, 2007 at 04:22:22PM -0400, Mark Lord wrote: Oliver Neukum wrote: Mark Lord wrote: Okay, from that part (above), the problem is obvious: in that the "MCT U232 converter now disconnected" appears, and then we continue to try and call the driver's method.. Oops! .. IMHO shutdown() is using serial->port[] and bombs. Could you reverse the order here? Yup. Fixed. Tested. Works. This patch fixes the Oops that otherwise occurs whenever a USB serial adapter is unplugged from a system, as well as the Oops seen when one is in use before resume (to RAM). GregKH: This needs to go into 2.6.21-rc*.

Signed-off-by: Mark Lord [EMAIL PROTECTED]
---
--- 2.6.21-rc3/drivers/usb/serial/usb-serial.c	2007-03-12 11:22:43.0 -0400
+++ linux/drivers/usb/serial/usb-serial.c	2007-03-12 16:12:53.0 -0400
@@ -141,6 +141,9 @@
 	for (i = 0; i < serial->num_ports; ++i)
 		serial->port[i]->open_count = 0;
 
+	if (serial->type->shutdown)
+		serial->type->shutdown(serial);
+
 	/* the ports are cleaned up and released in port_release() */
 	for (i = 0; i < serial->num_ports; ++i)
 		if (serial->port[i]->dev.parent != NULL) {
@@ -148,9 +151,6 @@
 			serial->port[i] = NULL;
 		}
 
-	if (serial->type->shutdown)
-		serial->type->shutdown(serial);
-

Argh, no, this change was done to help the ftdi drivers out. Look at changeset d9a7ecacac5f8274d2afce09aadcf37bdb42b93a in Linus's tree from Jim Radford: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=d9a7ecacac5f8274d2afce09aadcf37bdb42b93a It makes this change because the usb-serial drivers need the port devices when the port_remove() callbacks happen. Otherwise you get an oops that way. Jim, can you take a look at this and see if you can figure something out?

thanks, greg k-h
Re: [linux-usb-devel] 2.6.21-rc3-mm1
Hello, Any thoughts? Another mistake on my part. The correct command is echo -n '2-2:1.0' > /sys/bus/usb/drivers/usbhid/unbind Without the -n, the system thinks that the newline character at the end of the line written by echo is part of the filename. Nice tip. Thanks. I've run some tests and as expected - no failure so far. Regards, Mariusz Kozlowski
Re: [linux-usb-devel] 2.6.21-rc3-mm1
On Mon, 12 Mar 2007, Mariusz Kozlowski wrote: echo -n '2-2:1.0' > /sys/bus/usb/drivers/usbhid/unbind Without the -n, the system thinks that the newline character at the end of the line written by echo is part of the filename. Nice tip. Thanks. I've run some tests and as expected - no failure so far. Thanks for testing. The patch fixing this already went to Linus in today's HID/USB HID update (which has not yet been merged). Thanks, -- Jiri Kosina
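The same unbind can be done from a program, where the trailing-newline issue is easier to see because the caller controls the exact bytes written. A minimal sketch follows; write_attr() is a hypothetical helper name, not an existing API. It writes exactly strlen(value) bytes, which is the equivalent of echo -n.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `value` to the attribute file at `path`
 * with no trailing newline appended. Returns 0 if every byte was
 * accepted in a single write, -1 otherwise.
 */
static int write_attr(const char *path, const char *value)
{
    size_t len;
    ssize_t n;
    int fd;

    assert(path != NULL && value != NULL);
    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    len = strlen(value);
    n = write(fd, value, len);   /* one write: one store to the attribute */
    if (close(fd) != 0)
        return -1;
    return n == (ssize_t)len ? 0 : -1;
}
```

For the case above the call would be write_attr("/sys/bus/usb/drivers/usbhid/unbind", "2-2:1.0"), run as root on a system where that interface is bound.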
Re: [patch 3/8] per backing_dev dirty and writeback page accounting
On Mon, Mar 12, 2007 at 12:40:47PM +0100, Miklos Szeredi wrote: I have no idea how serious the scalability problems with this are. If they are serious, different solutions can probably be found for the above, but this is certainly the simplest. Atomic operations to a single per-backing device from all CPUs at once? That's a pretty serious scalability issue and it will cause a major performance regression for XFS. OK. How about just accounting writeback pages? That should be much less of a problem, since normally writeback is started from pdflush/kupdate in large batches without any concurrency. Except when you are throttling you bounce the cacheline around each cpu as it triggers foreground writeback. Or is it possible to export the state of the device queue to mm? E.g. could balance_dirty_pages() query the backing dev if there are any outstanding write requests? Not directly - writeback_in_progress(bdi) is a coarse measure indicating pdflush is active on this bdi, which implies outstanding write requests). I'd call this a showstopper right now - maybe you need to look at something like the ZVC code that Christoph Lameter wrote, perhaps? That's rather a heavyweight approach for this I think. But if you want to use per-page accounting, you are going to need a per-cpu or per-zone set of counters on each bdi to do this without introducing regressions. The only info balance_dirty_pages() really needs is whether there are any dirty+writeback bound for the backing dev or not. writeback bound (i.e. writing as fast as we can) is probably indicated fairly reliably by bdi_congested(bdi). Now all you need is the number of dirty pages It knows about the diry pages, since it calls writeback_inodes() which scans the dirty pages for this backing dev looking for ones to write out. It scans the dirty inode list for dirty inodes which indirectly finds the dirty pages. It does not know about the number of dirty pages directly... 
If after returning from writeback_inodes() wbc->nr_to_write didn't decrease and wbc->pages_skipped is zero then we know that there are no more dirty pages for the device. Or at least there are no dirty pages which aren't already under writeback. Sure, you can tell if there are _no_ dirty pages on the bdi, but if there are dirty pages, you can't tell how many there are. Your followup patches need to know how many dirty+writeback pages there are on the bdi, so I don't really see any way you can solve the deadlock in this manner without scalable bdi->nr_dirty accounting. IIUC, your problem is that there's another bdi that holds all the dirty pages, and this throttle loop never flushes pages from that other bdi and we sleep instead. It seems to me that the fundamental problem is that to clean the pages we need to flush both bdi's, not just the bdi we are directly dirtying. How about a dependent bdi link? i.e. if you have a loopback filesystem, it has a direct bdi (the loopback device) and a dependent bdi - the bdi that belongs to the underlying filesystem. When we enter the throttle loop we flush from the direct bdi and if we fail to flush all the pages we require, we flush the dependent bdi (maybe even just kick pdflush for that bdi) before we call congestion_wait() and go to sleep. This way we are always making progress cleaning pages on the machine, not just transferring dirty pages from one bdi to another. Wouldn't that solve the deadlock without needing painful accounting? Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group
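To make the "per-cpu set of counters on each bdi" suggestion concrete, here is a userspace model of a sharded counter. It is a sketch under stated assumptions, not kernel code: NR_SLOTS, CACHE_LINE, struct sharded_counter and the function names are all hypothetical, and C11 atomics stand in for real per-CPU data. Each updater touches only its own cache-line-aligned slot, so updates from different CPUs do not bounce a shared cacheline; a reader pays for that by summing every slot and getting an approximate value while updates are in flight.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_SLOTS   4     /* assumed number of CPUs in this model */
#define CACHE_LINE 64    /* assumed cache line size in bytes */

/* One counter per CPU, each aligned to its own cache line. */
struct slot {
    _Alignas(CACHE_LINE) atomic_long count;
};

struct sharded_counter {
    struct slot slots[NR_SLOTS];
};

static void counter_init(struct sharded_counter *c)
{
    for (unsigned i = 0; i < NR_SLOTS; i++)
        atomic_init(&c->slots[i].count, 0);
}

/* Hot path: update only the caller's own slot. */
static void counter_add(struct sharded_counter *c, unsigned cpu, long delta)
{
    assert(cpu < NR_SLOTS);
    atomic_fetch_add_explicit(&c->slots[cpu].count, delta,
                              memory_order_relaxed);
}

/* Slow path: walk all slots; the result may lag concurrent updates. */
static long counter_sum(struct sharded_counter *c)
{
    long total = 0;

    for (unsigned i = 0; i < NR_SLOTS; i++)
        total += atomic_load_explicit(&c->slots[i].count,
                                      memory_order_relaxed);
    return total;
}
```

An individual slot can go negative (a page dirtied on one CPU and cleaned on another), which is why only the sum is meaningful.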
Re: [patch 3/8] per backing_dev dirty and writeback page accounting
I'll try to explain the reason for the deadlock first. IIUC, your problem is that there's another bdi that holds all the dirty pages, and this throttle loop never flushes pages from that other bdi and we sleep instead. It seems to me that the fundamental problem is that to clean the pages we need to flush both bdi's, not just the bdi we are directly dirtying. This is what happens:

  write fault on upper filesystem
    balance_dirty_pages
      submit write requests
        loop ...

  --- fuse IPC ---

  [fuse loopback fs thread 1]
    read request
      sys_write
        mutex_lock(i_mutex)
        ...
          balance_dirty_pages
            submit write requests
            loop ... write requests completed ... dirty still over limit ...
            ... loop forever

  [fuse loopback fs thread 1]
    read request
      sys_write
        mutex_lock(i_mutex) blocks

So the queue for the upper filesystem is full. The queue for the lower filesystem is empty. There are no dirty pages in the lower filesystem. So kicking pdflush for the lower filesystem doesn't help, there's nothing to do. balance_dirty_pages() for the lower filesystem should just realize that there's nothing to do and return, and then there would be progress. So there's really no need to do any accounting, just some logic to determine that a backing dev is nearly or completely quiescent. And getting out of this tight situation doesn't have to be efficient. This is probably a very rare corner case, that almost never happens in real life, only with aggressive test tools like bash_shared_mapping. OK. How about just accounting writeback pages? That should be much less of a problem, since normally writeback is started from pdflush/kupdate in large batches without any concurrency. Except when you are throttling you bounce the cacheline around each cpu as it triggers foreground writeback. Yeah, we'd lose a bit of CPU, but not any write performance, since it is being throttled back anyway. Or is it possible to export the state of the device queue to mm? E.g.
could balance_dirty_pages() query the backing dev if there are any outstanding write requests? Not directly - writeback_in_progress(bdi) is a coarse measure indicating pdflush is active on this bdi (which implies outstanding write requests). Hmm, not quite what I need. I'd call this a showstopper right now - maybe you need to look at something like the ZVC code that Christoph Lameter wrote, perhaps? That's rather a heavyweight approach for this I think. But if you want to use per-page accounting, you are going to need a per-cpu or per-zone set of counters on each bdi to do this without introducing regressions. Yes, this is an option, but I hope for a simpler solution. Thanks, Miklos
Re: [patch 3/8] per backing_dev dirty and writeback page accounting
On Mon, Mar 12, 2007 at 11:36:16PM +0100, Miklos Szeredi wrote: I'll try to explain the reason for the deadlock first. Ah, thanks for that. IIUC, your problem is that there's another bdi that holds all the dirty pages, and this throttle loop never flushes pages from that other bdi and we sleep instead. It seems to me that the fundamental problem is that to clean the pages we need to flush both bdi's, not just the bdi we are directly dirtying. This is what happens: write fault on upper filesystem balance_dirty_pages submit write requests loop ... Isn't this loop transferring the dirty state from the upper filesystem to the lower filesystem? What I don't see here is how the pages on this filesystem are not getting cleaned if the lower filesystem is being flushed properly. I'm probably missing something big and obvious, but I'm not familiar with the exact workings of FUSE so please excuse my ignorance. --- fuse IPC --- [fuse loopback fs thread 1] This is the lower filesystem? Or a callback thread for doing the write requests to the lower filesystem? read request sys_write mutex_lock(i_mutex) ... balance_dirty_pages submit write requests loop ... write requests completed ... dirty still over limit ... ... loop forever Hmmm - the situation in balance_dirty_pages() after an attempt to writeback_inodes(wbc) that has written nothing because there is nothing to write would be:

	wbc->nr_to_write == write_chunk &&
	wbc->pages_skipped == 0 &&
	wbc->encountered_congestion == 0 &&
	!bdi_congested(wbc->bdi)

What happens if you make that an exit condition to the loop? Or alternatively, adding another bit to the wbc structure to say there was nothing to do and setting that if we find list_empty(sb->s_dirty) when trying to flush dirty inodes. [ FWIW, this may also solve another problem of fast block devices being throttled incorrectly when a slow block dev is consuming all the dirty pages... ] Cheers, Dave. 
-- Dave Chinner Principal Engineer SGI Australian Software Group
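The exit condition proposed above can be written out as a small predicate. This is a standalone model for illustration, not a patch: struct wbc_model is a simplified, hypothetical stand-in for the kernel's struct writeback_control, the bdi_is_congested field models the result of bdi_congested(wbc->bdi), and the field is spelled nr_to_write here on the assumption that this is the counter meant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for struct writeback_control (hypothetical). */
struct wbc_model {
    long nr_to_write;             /* pages still allowed to be written */
    long pages_skipped;           /* pages writeback chose not to write */
    bool encountered_congestion;  /* a queue was congested during the pass */
    bool bdi_is_congested;        /* models bdi_congested(wbc->bdi) */
};

/*
 * True when a writeback pass that was allowed to write write_chunk pages
 * wrote nothing, skipped nothing and saw no congestion: there was simply
 * nothing to do on this backing device, so the throttle loop could exit.
 */
static bool nothing_to_write(const struct wbc_model *wbc, long write_chunk)
{
    assert(wbc != NULL);
    return wbc->nr_to_write == write_chunk &&
           wbc->pages_skipped == 0 &&
           !wbc->encountered_congestion &&
           !wbc->bdi_is_congested;
}
```

In the FUSE scenario described in this thread, the lower filesystem's pass would satisfy this predicate on every iteration, which is what would let balance_dirty_pages() return instead of looping forever.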
Re: [RFC] [Patch 1/1] IBAC Patch
On Thu, 2007-03-08 at 22:19 -0500, [EMAIL PROTECTED] wrote:

On Thu, 08 Mar 2007 17:58:16 EST, Mimi Zohar said:

This is a request for comments for a new Integrity Based Access Control (IBAC) LSM module which bases access control decisions on the new integrity framework services. (Hopefully this will help clarify the interaction between an LSM module and LIM module.)

OK, between this and the additional LIM hooks I didn't notice in an earlier patch, we're starting to see the API. The only problem is that although it may be the right API for *your* code, I suspect it's a non-starter without a discussion about whether it's the right *generic* API for an LIM (which will require at least one dramatic bun fight about what Integrity means).

Absolutely, we need to make sure that the set of LIM hooks is complete and that nothing is missing in order to implement different types of LIM providers. I'm copying the digsig mailing list for their input on requirements, which this API might not satisfy or perhaps address.

Index: linux-2.6.21-rc3-mm2/security/ibac/Kconfig

Minor cognitive-dissonance alert:

+config SECURITY_IBAC_BOOTPARAM
+        bool "IBAC boot parameter"
+        depends on SECURITY_IBAC
+        default y
+          If you are unsure how to answer this question, answer N.

The 'default' should in general match the hint we give the user.

Oops, blush. It will obviously be corrected in the next IBAC patch release.

Mimi Zohar
Re: Odd suspend regression in 2.6.21-rc[123]
On Saturday 10 March 2007 01:18, Ray Lee wrote:

Ray Lee wrote:

In 2.6.21-rc1,2,3, my laptop will fully suspend to ram, but then *immediately* resumes back from suspension. (It resumes just fine, as well.)

[...]

HP/Compaq NX6125 system, AMD64, dmesg attached.

hg bisect found the below patch as the culprit, and reverting it does fix the regression. It's supposed to address "sometime ac/battery update stops after resume from disk". This thread: http://lkml.org/lkml/2007/2/24/111 appears to talk about the same issue, and therefore it may be solved without the below patch, so perhaps we can all be happy.

Regardless, I think my laptop no longer being able to go into S3 sleep is a bit more important than someone else's laptop merely not showing the correct AC status :-). Please revert.

(git patch id ed41dab90eb40ac4911e60406bc653661f0e4ce1)

I'd rather not break the Acer, if possible.

Ray,
Please test the incremental patch below.

---
Subject: ACPI: resolve GPE immediate wakeup regression
From: Alexey Starikovskiy [EMAIL PROTECTED]

Removing disabling of GPEs from enter_sleep function causes regression on nx6125. Doing disable_all_gpes both in prepare to sleep and in enter sleep resolves regression, while still fixes Acer notebooks.

Signed-off-by: Alexey Starikovskiy [EMAIL PROTECTED]
Signed-off-by: Len Brown [EMAIL PROTECTED]
---
 drivers/acpi/hardware/hwsleep.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/drivers/acpi/hardware/hwsleep.c b/drivers/acpi/hardware/hwsleep.c
index 8fa9312..c84b1fa 100644
--- a/drivers/acpi/hardware/hwsleep.c
+++ b/drivers/acpi/hardware/hwsleep.c
@@ -300,6 +300,11 @@ acpi_status asmlinkage acpi_enter_sleep_state(u8 sleep_state)
         /*
          * 2) Enable all wakeup GPEs
          */
+        status = acpi_hw_disable_all_gpes();
+        if (ACPI_FAILURE(status)) {
+                return_ACPI_STATUS(status);
+        }
+
         acpi_gbl_system_awake_and_running = FALSE;

         status = acpi_hw_enable_all_wakeup_gpes();
ACPI: resolve GPE immediate wakeup regression
Len Brown wrote:

On Saturday 10 March 2007 01:18, Ray Lee wrote:

Ray Lee wrote:

In 2.6.21-rc1,2,3, my laptop will fully suspend to ram, but then *immediately* resumes back from suspension. (It resumes just fine, as well.)

[...]

HP/Compaq NX6125 system, AMD64, dmesg attached.

I'd rather not break the Acer, if possible.

Ray,
Please test the incremental patch below.

Tested and Alexey's patch (copied below) fixes the problem. I added a signed-off-by just in case; feel free to yank it if inappropriate. Regardless, please apply.

Thanks Len, Alexey.

Ray

---
Subject: ACPI: resolve GPE immediate wakeup regression
From: Alexey Starikovskiy [EMAIL PROTECTED]

Removing disabling of GPEs from enter_sleep function causes regression on nx6125. Doing disable_all_gpes both in prepare to sleep and in enter sleep resolves regression, while still fixes Acer notebooks.

Signed-off-by: Alexey Starikovskiy [EMAIL PROTECTED]
Signed-off-by: Len Brown [EMAIL PROTECTED]
Signed-off-by: Ray Lee [EMAIL PROTECTED]
---
 drivers/acpi/hardware/hwsleep.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/drivers/acpi/hardware/hwsleep.c b/drivers/acpi/hardware/hwsleep.c
index 8fa9312..c84b1fa 100644
--- a/drivers/acpi/hardware/hwsleep.c
+++ b/drivers/acpi/hardware/hwsleep.c
@@ -300,6 +300,11 @@ acpi_status asmlinkage acpi_enter_sleep_state(u8 sleep_state)
         /*
          * 2) Enable all wakeup GPEs
          */
+        status = acpi_hw_disable_all_gpes();
+        if (ACPI_FAILURE(status)) {
+                return_ACPI_STATUS(status);
+        }
+
         acpi_gbl_system_awake_and_running = FALSE;

         status = acpi_hw_enable_all_wakeup_gpes();
Re: ACPI: resolve GPE immediate wakeup regression
On Monday 12 March 2007 12:59, Ray Lee wrote:

Len Brown wrote:

On Saturday 10 March 2007 01:18, Ray Lee wrote:

Ray Lee wrote:

In 2.6.21-rc1,2,3, my laptop will fully suspend to ram, but then *immediately* resumes back from suspension. (It resumes just fine, as well.)

[...]

HP/Compaq NX6125 system, AMD64, dmesg attached.

I'd rather not break the Acer, if possible.

Ray,
Please test the incremental patch below.

Tested and Alexey's patch (copied below) fixes the problem. I added a signed-off-by just in case; feel free to yank it if inappropriate. Regardless, please apply.

Thanks Len, Alexey.

Thanks for testing Ray, I'll apply this one.

-Len

---
Subject: ACPI: resolve GPE immediate wakeup regression
From: Alexey Starikovskiy [EMAIL PROTECTED]

Removing disabling of GPEs from enter_sleep function causes regression on nx6125. Doing disable_all_gpes both in prepare to sleep and in enter sleep resolves regression, while still fixes Acer notebooks.

Signed-off-by: Alexey Starikovskiy [EMAIL PROTECTED]
Signed-off-by: Len Brown [EMAIL PROTECTED]
Signed-off-by: Ray Lee [EMAIL PROTECTED]
---
 drivers/acpi/hardware/hwsleep.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/drivers/acpi/hardware/hwsleep.c b/drivers/acpi/hardware/hwsleep.c
index 8fa9312..c84b1fa 100644
--- a/drivers/acpi/hardware/hwsleep.c
+++ b/drivers/acpi/hardware/hwsleep.c
@@ -300,6 +300,11 @@ acpi_status asmlinkage acpi_enter_sleep_state(u8 sleep_state)
         /*
          * 2) Enable all wakeup GPEs
          */
+        status = acpi_hw_disable_all_gpes();
+        if (ACPI_FAILURE(status)) {
+                return_ACPI_STATUS(status);
+        }
+
         acpi_gbl_system_awake_and_running = FALSE;

         status = acpi_hw_enable_all_wakeup_gpes();
Re: [PATCH] ACPI disabled due to DMI failure or blacklisted year should be noted, as is done with other ACPI blacklisting
On Wednesday 07 March 2007 17:00, [EMAIL PROTECTED] wrote:

The patch titled

  ACPI disabled due to DMI failure or blacklisted year should be noted, as is done with other ACPI blacklisting

has been added to the -mm tree. Its filename is

  acpi-disabled-due-to-dmi-failure-or-blacklisted-year-should-be-noted-as-is-done-with-other-acpi-blacklisting.patch

Thank you for applying it, Andrew.

good one -- i just ran into this yesterday:-)
applied.

thanks,
-len

No, thank you.

Tony Godshall
Re: [RFC][PATCH 2/7] RSS controller core
On Tue, Mar 13, 2007 at 07:27:06AM +0530, Balbir Singh wrote:

I am not sure what went wrong. Could you please check your mail client, because it seemed to even change the email address to smtp.osdl.org, which bounced back when I wrote to you earlier.

I have a problem doing a group-reply in mutt to Herbert's mails. His email id gets dropped from the To or Cc list. Is that his email setting? Don't know.

--
Regards,
vatsa
Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2
On 3/12/07, David Lang [EMAIL PROTECTED] wrote:

the problem comes when this isn't enough. if you have several CPU hogs on a system, and they are all around the same priority level, how can the scheduler know which one needs the CPU the most for good interactivity?

in some cases you may be able to directly detect that your high-priority process is waiting for another one (tracing pipes and local sockets for example), but what if you are waiting for several of them? (think a multimedia desktop waiting for the sound card, CDRom, hard drive, and video all at once) which one needs the extra CPU the most?

I'm not an expert in this area by any means, but after reading this thread the OSX solution of simply telling the kernel "I'm the GUI, schedule me accordingly" looks increasingly attractive. Why make the kernel guess when we can just be explicit?

Does anyone know of a UNIX-like system that has managed to solve this problem without hooking the GUI into the scheduler?

Lee
Re: Stracing Amanda (was: RSDL for 2.6.21-rc3- 0.29)
On Monday 12 March 2007, Douglas McNaught wrote:

Patrick Mau [EMAIL PROTECTED] writes:

Why not temporarily replace /bin/tar with a shell script that does:

#!/bin/sh
exec strace -f -o output /bin/real.tar "$@"

You beat me to it. :)

I've done that before; it's a great suggestion. Except that if you expect 'tar' to be invoked multiple times in a run, you should probably use 'output.$$' for the output filename so things don't get clobbered.

-Doug

In my case, Doug, it will get invoked 64 times. Amanda does a dummy run to get an estimate, which is 32 runs (1 per disklist entry, and I have 32), calculates what to do based on that output, and then reruns tar with the appropriate level options against each individual disklist entry.

But I'm puzzled a bit: what does the double $$ do? Or is it buried someplace in the bash manpage? It's not something I've stumbled over yet.

--
Cheers, Gene

There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)

rugged, adj.: Too heavy to lift.
Re: Stracing Amanda (was: RSDL for 2.6.21-rc3- 0.29)
On 3/12/07, Gene Heskett [EMAIL PROTECTED] wrote:

On Monday 12 March 2007, Douglas McNaught wrote:

Patrick Mau [EMAIL PROTECTED] writes:

Why not temporarily replace /bin/tar with a shell script that does:

#!/bin/sh
exec strace -f -o output /bin/real.tar "$@"

You beat me to it. :)

I've done that before; it's a great suggestion. Except that if you expect 'tar' to be invoked multiple times in a run, you should probably use 'output.$$' for the output filename so things don't get clobbered.

-Doug

In my case, Doug, it will get invoked 64 times. Amanda does a dummy run to get an estimate, which is 32 runs (1 per disklist entry, and I have 32), calculates what to do based on that output, and then reruns tar with the appropriate level options against each individual disklist entry.

But I'm puzzled a bit: what does the double $$ do? Or is it buried someplace in the bash manpage? It's not something I've stumbled over yet.

buried indeed:

Special Parameters:
...
$   Expands to the process ID of the shell. In a () subshell, it expands to the process ID of the current shell, not the subshell.

Thanks,
Nish
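For reference, `$$` is not a doubled dollar sign: the special parameter is named `$`, and it is read with the usual `$` prefix, the same way `$1` or `$HOME` is. A minimal sketch of how the wrapper quoted above could use it, assuming the real binary has been moved to /bin/real.tar as in that script. The helper name trace_name, the TRACE_DIR variable and the /tmp default are made up for illustration.

```sh
#!/bin/sh
# trace_name: print a per-invocation strace output path.
# $$ is the PID of the shell running this script; it keeps that value
# even inside $( ) or ( ) subshells.
# TRACE_DIR is a hypothetical knob for this sketch; /tmp is the fallback.
trace_name() {
    printf '%s/output.%s\n' "${TRACE_DIR:-/tmp}" "$$"
}

outer_pid=$$
inner_pid=$(echo "$$")   # command substitution forks, but $$ does not change

echo "shell PID:     $outer_pid"
echo "subshell sees: $inner_pid"
echo "trace file:    $(trace_name)"

# The wrapper itself would then end with (not run here):
#   exec strace -f -o "$(trace_name)" /bin/real.tar "$@"
```

Quoting "$@" keeps each original argument intact, so tar sees the same argument list amanda passed, including any names containing spaces.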
Re: Make sure we populate the initroot filesystem late enough
On Mar 12, 2007, at 6:01 PM, Paul TBBle Hampson wrote:

On Thu, Mar 01, 2007 at 09:30:56AM +0900, Michael Ellerman wrote:

On Wed, 2007-02-28 at 10:13 +0000, David Woodhouse wrote:

On Wed, 2007-02-28 at 07:43 +0100, Benjamin Herrenschmidt wrote:

I wouldn't be that sure ... I've had problems in the past with PMU based cpufreq... looks like flushing all caches and hard-resetting the processor on the fly when there can be pending DMAs might be a source of trouble... especially on CPUs that don't have working cache flush HW assist.

I've seen it on a PowerMac3,1 (400MHz G4) where we don't have cpufreq. I've also seen it on the latest 1.5GHz Mac Mini, and on my shinybook. They all fall over with the latest kernel, although the shinybook only does so immediately when booted with mem=512M.

The shinybook does crash later with new kernels though; I don't yet know why. It could be the same thing, or it could be something different. That one seemed to appear between Fedora's 2.6.19-1.2913 and 2.6.19-1.2914 kernels, where we did nothing but turn CONFIG_SYSFS_DEPRECATED on.

I don't blame cpufreq. At various times I've been equally convinced that it was due to CONFIG_KPROBES, and Linus' initrd-moving patch.

Is there any pattern to the way it dies? Or is it just randomly dying somewhere depending on which config options you have enabled?

This is starting to sound reminiscent of a bug I chased for a while last year on Power5, but didn't find. It was fixed on some machines by disabling CONFIG_KEXEC, and/or other random unrelated CONFIG options. Unfortunately it magically stopped reproducing so I never caught it :/

Hmm. The crash came back after I booted into Mac OS X and back. It was however a different crash. I believe it was coming from the USB modules (as it would keep going when it happened, and get another crash, which tended to scroll away too fast for me to capture), but I believe it was still getting down into the slab code and actually dying there.

However, reverting the reversion of 8d610dd52dd1da696e199e4b4545f33a2a5de5c6 and instead applying the following patch:

diff -ru linux-source-2.6.20.orig/arch/powerpc/mm/init_32.c linux-source-2.6.20/arch/powerpc/mm/init_32.c
--- linux-source-2.6.20.orig/arch/powerpc/mm/init_32.c 2007-02-05 05:44:54.0 +1100
+++ linux-source-2.6.20/arch/powerpc/mm/init_32.c 2007-03-10 11:03:56.0 +1100
@@ -244,7 +244,8 @@
 void free_initrd_mem(unsigned long start, unsigned long end)
 {
         if (start < end)
-                printk ("Freeing initrd memory: %ldk freed\n", (end - start) >> 10);
+                printk ("NOT Freeing initrd memory: %ldk freed\n", (end - start) >> 10);
+        return;
         for (; start < end; start += PAGE_SIZE) {
                 ClearPageReserved(virt_to_page(start));
                 init_page_count(virt_to_page(start));

which if I recall correctly David Woodhouse posted to this thread, seems to have fixed it.

I dunno if it's relevant, but my initrd.img is 13193315 bytes long (i.e. 99 bytes over 12884k), and the above logs:

NOT Freeing initrd memory: 12888k freed

which makes sense... I of course completely failed to think to check this with the crashing kernel; if it seems relevant I can roll back to it and get the numbers.

Have you tried 2.6.20.2? There was a significant bug in get_order() that was deemed to be causing these issues.

- k
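The initrd figures quoted above are consistent with the region being accounted in whole pages. A quick check of the arithmetic, assuming 4096-byte pages (an assumption; the page size is not stated in the thread):

```sh
#!/bin/sh
# Check the initrd size arithmetic from the message above.
# Assumption: PAGE_SIZE is 4096 bytes.
initrd_bytes=13193315
page_size=4096

# Bytes beyond 12884k (1k = 1024 bytes).
over=$(( initrd_bytes - 12884 * 1024 ))

# Round up to whole pages, then express the result in k the way the
# printk does, by shifting right by 10.
pages=$(( (initrd_bytes + page_size - 1) / page_size ))
freed_k=$(( pages * page_size >> 10 ))

echo "bytes over 12884k: $over"
echo "pages:             $pages"
echo "reported:          ${freed_k}k"
```

With these inputs the image spills 99 bytes into one more page, which is why the log line reports 12888k rather than 12884k.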
Re: RSDL v0.30 cpu scheduler for mainline kernels
On Tuesday 13 March 2007 10:46, David Miller wrote:

From: Con Kolivas [EMAIL PROTECTED]
Date: Mon, 12 Mar 2007 10:58:11 +1100

http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-sched-rsdl-0.30.patch

FWIW, this boots and seems to work well on sparc64. Tested on UP SunBlade1500 and 24cpu Niagara T1000. Very nice.

Thanks for the feedback, and I'm sorry you have to work with such lousy hardware.

--
-ck
RE: Question: removal of syscall macros?
2006/12/14, Teunis Peters [EMAIL PROTECTED]:

Now that syscall macros have been pulled from the -mm tree, what method is recommended to use syscalls? (I've wasted a day grubbing through sources before giving up and copying the old syscall macros into one key driver)

_syscall macros are used by: ATI driver (no choice. I'm working with laptops)

I have the same problem as you. Do you have any idea how to use the ATI firegl driver with recent kernels?

Thanks in advance.

Regards,
albcamus
Re: [PATCH] kthread_should_stop_check_freeze (was: Re: [PATCH -mm 3/7] Freezer: Remove PF_NOFREEZE from rcutorture thread)
On Mon, Mar 12, 2007 at 05:45:24PM -0500, Anton Blanchard wrote:

Then please document it _clearly_ with the kthread code somewhere.

Document it as well in the kernel_thread() API, as I notice people still use kernel_thread() in some places (e.g. rtasd.c in the powerpc arch)?

The reason I brought this up is I had no idea we had to put the freezer gunk in all kernel thread loops, and I've been writing kernel threads for years.

I noticed that in the powerpc code (at least for the rtas kernel thread) here:

http://lkml.org/lkml/2007/1/9/61

That was perhaps not a serious problem, because the process freezer was mostly used in software suspend and only those platforms supporting software suspend had to worry about it. But now we intend to use the process freezer for CPU hotplug as well, so all platforms wanting to support CPU hotplug had better support the process freezer!

P.S.: I believe kprobes is already using the process freezer as well.

--
Regards,
vatsa
Re: Stracing Amanda (was: RSDL for 2.6.21-rc3- 0.29)
On Monday 12 March 2007, Nish Aravamudan wrote:

On 3/12/07, Gene Heskett [EMAIL PROTECTED] wrote:

On Monday 12 March 2007, Douglas McNaught wrote:

Patrick Mau [EMAIL PROTECTED] writes:

Why not temporarily replace /bin/tar with a shell script that does:

#!/bin/sh
exec strace -f -o output /bin/real.tar "$@"

You beat me to it. :)

I've done that before; it's a great suggestion. Except that if you expect 'tar' to be invoked multiple times in a run, you should probably use 'output.$$' for the output filename so things don't get clobbered.

-Doug

In my case, Doug, it will get invoked 64 times. Amanda does a dummy run to get an estimate, which is 32 runs (1 per disklist entry, and I have 32), calculates what to do based on that output, and then reruns tar with the appropriate level options against each individual disklist entry.

But I'm puzzled a bit: what does the double $$ do? Or is it buried someplace in the bash manpage? It's not something I've stumbled over yet.

buried indeed:

Special Parameters:
...
$   Expands to the process ID of the shell. In a () subshell, it expands to the process ID of the current shell, not the subshell.

Well, that's clear enough, but what of the double $$ case? Would this then make a PID unique to each invocation until it finally wraps a 16 bit value, or will the kernel re-use them because they won't all be running simultaneously, but limited by the number of unique 'spindle' numbers on the system? That limit is there to prevent, as best as it can, the thrashing of a drive by having tar working on 2 (or more) separate partitions at the same time. In my case 2 are possible, as /var is on a separate drive.

Thanks,
Nish

--
Cheers, Gene

There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)

Say yur prayers, yuh flea-pickin' varmint!
-- Yosemite Sam
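On the PID question: `$$` is unique among processes that are alive at the same moment, but PIDs are handed out again once the counter wraps (on Linux the ceiling can be read from /proc/sys/kernel/pid_max), so two tar runs far enough apart could in principle get the same number. A cautious sketch that never overwrites an earlier trace follows. The helper name unique_trace_name and the .N suffix scheme are made up for illustration and are not anything proposed in the thread.

```sh
#!/bin/sh
# unique_trace_name DIR: print a path under DIR that does not exist yet.
# Starts from output.<PID> and appends .1, .2, ... if that name is taken,
# for example because an earlier run happened to get the same PID.
unique_trace_name() {
    dir=$1
    name="$dir/output.$$"
    n=0
    while [ -e "$name" ]; do
        n=$((n + 1))
        name="$dir/output.$$.$n"
    done
    printf '%s\n' "$name"
}

# Wrapper usage (not run here):
#   exec strace -f -o "$(unique_trace_name /tmp)" /bin/real.tar "$@"
```

The existence check and the later file creation are separate steps, so this is not atomic. Wrappers running at the same time have different PIDs and therefore different base names, which is why the simple check is probably adequate here.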
2.6.21rc suspend to ram regression on Lenovo X60
I spent considerable time over the last day or so bisecting to find out why an X60 stopped resuming somewhen between 2.6.20 and current -git. (Total lockup, black screen of death).

The bisect log looked like this.

git-bisect start
# bad: [c8f71b01a50597e298dc3214a2f2be7b8d31170c] Linux 2.6.21-rc1
git-bisect bad c8f71b01a50597e298dc3214a2f2be7b8d31170c
# good: [fa285a3d7924a0e3782926e51f16865c5129a2f7] Linux 2.6.20
git-bisect good fa285a3d7924a0e3782926e51f16865c5129a2f7
# bad: [574009c1a895aeeb85eaab29c235d75852b09eb8] Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
git-bisect bad 574009c1a895aeeb85eaab29c235d75852b09eb8
# bad: [43187902cbfafe73ede0144166b741fb0f7d04e1] Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6
git-bisect bad 43187902cbfafe73ede0144166b741fb0f7d04e1
# good: [1545085a28f226b59c243f88b82ea25393b0d63f] drm: Allow for 44 bit user-tokens (or drm_file offsets)
git-bisect good 1545085a28f226b59c243f88b82ea25393b0d63f
# good: [c96e2c92072d3e78954c961f53d8c7352f7abbd7] Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/usb-2.6
git-bisect good c96e2c92072d3e78954c961f53d8c7352f7abbd7
# good: [31c56d820e03a2fd47f81d6c826f92caf511f9ee] [POWERPC] pasemi: iommu support
git-bisect good 31c56d820e03a2fd47f81d6c826f92caf511f9ee
# bad: [78149df6d565c36675463352d0bfeb02b7a7] Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6
git-bisect bad 78149df6d565c36675463352d0bfeb02b7a7
# good: [3d9c18872fa1db5c43ab97d8cbca43775998e49c] shpchp: remove CONFIG_HOTPLUG_PCI_SHPC_POLL_EVENT_MODE
git-bisect good 3d9c18872fa1db5c43ab97d8cbca43775998e49c
# good: [88187dfa4d8bb565df762f272511d2c91e427e0d] MSI: Replace pci_msi_quirk with calls to pci_no_msi()
git-bisect good 88187dfa4d8bb565df762f272511d2c91e427e0d
# good: [866a8c87c4e51046602387953bbef76992107bcb] msi: Fix msi_remove_pci_irq_vectors.
git-bisect good 866a8c87c4e51046602387953bbef76992107bcb
# good: [f7feaca77d6ad6bcfcc88ac54e3188970448d6fe] msi: Make MSI useable more architectures
git-bisect good f7feaca77d6ad6bcfcc88ac54e3188970448d6fe
# good: [14719f325e1cd4ff757587e9a221ebaf394563ee] Revert PCI: remove duplicate device id from ata_piix
git-bisect good 14719f325e1cd4ff757587e9a221ebaf394563ee

which led me to a final 'bad' commit of 78149df6d565c36675463352d0bfeb02b7a7, which is a merge changeset of lots of PCI bits.

Seeing a couple of MSI changes in there, on a hunch I booted the latest tree with pci=nomsi, and it resumed again.

Any ideas how to further debug this? I'll try backing out individual changes from that merge tomorrow.

Dave

--
http://www.codemonkey.org.uk
Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2
On Mar 12, 2007, at 11:26:25, Linus Torvalds wrote:

So good fairness really should involve some notion of work done for others. It's just not very easy to do..

Maybe extend UNIX sockets to add another passable object type vis-a-vis SCM_RIGHTS, except in this case SCM_CPUTIME. You call SCM_CPUTIME with a time value in monotonic real-time nanoseconds (duration) and a value out of 100 indicating what percentage of your timeslices to give to the process (for the specified duration). The receiving process would be informed of the estimated total number of nanoseconds of timeslice that it will be given based on the priority of the processes. (Maybe it could prioritize requests?)

The X libraries could then properly pass CPU time to the X server to help with rendering their requests, and the X server could give priority to tasks which give up more CPU time than is needed to render their data, and penalize those which use more than they give. Initially, even if you don't patch the X server, you could at least patch the X clients to give up CPU to the X server to promote interactivity.

Cheers,
Kyle Moffett
Re: RSDL v0.30 cpu scheduler for mainline kernels
On Tue, Mar 13, 2007 at 02:05:23PM +1100, Con Kolivas wrote:

On Tuesday 13 March 2007 10:46, David Miller wrote:

From: Con Kolivas [EMAIL PROTECTED]
Date: Mon, 12 Mar 2007 10:58:11 +1100

http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-sched-rsdl-0.30.patch

FWIW, this boots and seems to work well on sparc64. Tested on UP SunBlade1500 and 24cpu Niagara T1000. Very nice.

Thanks for the feedback, and I'm sorry you have to work with such lousy hardware.

BTW, I don't know if you say this as a joke, but those are not necessarily lousy hardware. Sun does lousy hardware when they put Sparcs in PCs (ultra5, ultra10, blade100). But their servers are generally nice, with large memory busses and very scalable SMP architectures.

Regards,
Willy
[PATCH 1/5] statically initialize struct pid for swapper
From: Sukadev Bhattiprolu [EMAIL PROTECTED]
Subject: [PATCH 1/5] statically initialize struct pid for swapper

Statically initialize a struct pid for the swapper process (pid_t == 0) and attach it to init_task. This is needed so task_pid(), task_pgrp() and task_session() interfaces work on the swapper process also.

Signed-off-by: Sukadev Bhattiprolu [EMAIL PROTECTED]
Cc: Cedric Le Goater [EMAIL PROTECTED]
Cc: Dave Hansen [EMAIL PROTECTED]
Cc: Serge Hallyn [EMAIL PROTECTED]
Cc: Eric Biederman [EMAIL PROTECTED]
Cc: Herbert Poetzl [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Acked-by: Eric W. Biederman [EMAIL PROTECTED]
---
 include/linux/init_task.h |   27 +++
 include/linux/pid.h       |    2 ++
 kernel/pid.c              |    2 ++
 3 files changed, 31 insertions(+)

Index: lx26-20-mm2c/include/linux/init_task.h
===
--- lx26-20-mm2c.orig/include/linux/init_task.h	2007-02-28 15:47:44.0 -0800
+++ lx26-20-mm2c/include/linux/init_task.h	2007-02-28 15:48:07.0 -0800
@@ -96,6 +96,28 @@ extern struct group_info init_groups;
 #define INIT_PREEMPT_RCU
 #endif
 
+#define INIT_STRUCT_PID {	\
+	.count = ATOMIC_INIT(1),	\
+	.nr = 0,	\
+	/* Don't put this struct pid in pid_hash */	\
+	.pid_chain = { .next = NULL, .pprev = NULL },	\
+	.tasks = {	\
+		{ .first = &init_task.pids[PIDTYPE_PID].node },	\
+		{ .first = &init_task.pids[PIDTYPE_PGID].node },	\
+		{ .first = &init_task.pids[PIDTYPE_SID].node },	\
+	},	\
+	.rcu = RCU_HEAD_INIT,	\
+}
+
+#define INIT_PID_LINK(type)	\
+{	\
+	.node = {	\
+		.next = NULL,	\
+		.pprev = &init_struct_pid.tasks[type].first,	\
+	},	\
+	.pid = &init_struct_pid,	\
+}
+
 /*
  * INIT_TASK is used to set up the first task table, touch at
  * your own risk!. Base=0, limit=0x1f (=2MB)
@@ -145,6 +167,11 @@ extern struct group_info init_groups;
 	.cpu_timers = INIT_CPU_TIMERS(tsk.cpu_timers),	\
 	.fs_excl = ATOMIC_INIT(0),	\
 	.pi_lock = SPIN_LOCK_UNLOCKED,	\
+	.pids = {	\
+		[PIDTYPE_PID] = INIT_PID_LINK(PIDTYPE_PID),	\
+		[PIDTYPE_PGID] = INIT_PID_LINK(PIDTYPE_PGID),	\
+		[PIDTYPE_SID] = INIT_PID_LINK(PIDTYPE_SID),	\
+	},	\
 	INIT_TRACE_IRQFLAGS	\
 	INIT_LOCKDEP	\
 }
Index: lx26-20-mm2c/include/linux/pid.h
===
--- lx26-20-mm2c.orig/include/linux/pid.h	2007-02-28 15:48:07.0 -0800
+++ lx26-20-mm2c/include/linux/pid.h	2007-02-28 15:48:07.0 -0800
@@ -51,6 +51,8 @@ struct pid
 	struct rcu_head rcu;
 };
 
+extern struct pid init_struct_pid;
+
 struct pid_link
 {
 	struct hlist_node node;
Index: lx26-20-mm2c/kernel/pid.c
===
--- lx26-20-mm2c.orig/kernel/pid.c	2007-02-28 15:48:07.0 -0800
+++ lx26-20-mm2c/kernel/pid.c	2007-02-28 15:48:07.0 -0800
@@ -27,11 +27,13 @@
 #include <linux/bootmem.h>
 #include <linux/hash.h>
 #include <linux/pid_namespace.h>
+#include <linux/init_task.h>
 
 #define pid_hashfn(nr) hash_long((unsigned long)nr, pidhash_shift)
 static struct hlist_head *pid_hash;
 static int pidhash_shift;
 static struct kmem_cache *pid_cachep;
+struct pid init_struct_pid = INIT_STRUCT_PID;
 
 int pid_max = PID_MAX_DEFAULT;
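The mutual references this patch sets up between init_task and init_struct_pid can be tried outside the kernel. What follows is a minimal userspace sketch under stated assumptions, not kernel code: the hlist types are simplified copies, struct pid is cut down to two fields, and demo_task, demo_task_pid() and verify_links() are hypothetical names introduced only for illustration. It shows that two objects with static storage can each be initialized with the address of the other, provided both are declared before either is defined.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel's hlist types (assumption). */
struct hlist_node {
	struct hlist_node *next;
	struct hlist_node **pprev;
};

struct hlist_head {
	struct hlist_node *first;
};

enum pid_type { PIDTYPE_PID, PIDTYPE_PGID, PIDTYPE_SID, PIDTYPE_MAX };

/* Cut-down struct pid: no refcount, hash chain or RCU head. */
struct pid {
	int nr;
	struct hlist_head tasks[PIDTYPE_MAX];
};

struct pid_link {
	struct hlist_node node;
	struct pid *pid;
};

/* Hypothetical miniature of task_struct holding only the pid links. */
struct demo_task {
	struct pid_link pids[PIDTYPE_MAX];
};

/* Declared first so each initializer can take the other's address. */
extern struct demo_task init_task;
extern struct pid init_struct_pid;

#define INIT_PID_LINK(type)					\
{								\
	.node = {						\
		.next = NULL,					\
		.pprev = &init_struct_pid.tasks[type].first,	\
	},							\
	.pid = &init_struct_pid,				\
}

struct pid init_struct_pid = {
	.nr = 0,
	.tasks = {
		{ .first = &init_task.pids[PIDTYPE_PID].node },
		{ .first = &init_task.pids[PIDTYPE_PGID].node },
		{ .first = &init_task.pids[PIDTYPE_SID].node },
	},
};

struct demo_task init_task = {
	.pids = {
		[PIDTYPE_PID]  = INIT_PID_LINK(PIDTYPE_PID),
		[PIDTYPE_PGID] = INIT_PID_LINK(PIDTYPE_PGID),
		[PIDTYPE_SID]  = INIT_PID_LINK(PIDTYPE_SID),
	},
};

/* Userspace analogue of task_pid(): follow the link, no lookup. */
static struct pid *demo_task_pid(struct demo_task *t)
{
	return t->pids[PIDTYPE_PID].pid;
}

/* Asserts that every list head and its single node point at each other. */
static void verify_links(void)
{
	int type;

	for (type = 0; type < PIDTYPE_MAX; type++) {
		struct pid_link *link = &init_task.pids[type];

		assert(init_struct_pid.tasks[type].first == &link->node);
		assert(link->node.pprev == &init_struct_pid.tasks[type].first);
		assert(link->node.next == NULL);
		assert(link->pid == &init_struct_pid);
	}
}
```

Calling verify_links() from a small main() confirms that each of the three lists contains exactly one node, the one embedded in init_task, without any code having run to link them.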
[PATCH 3/5] Use struct pid parameter in copy_process()
From: Sukadev Bhattiprolu [EMAIL PROTECTED]
Subject: [PATCH 3/5] Use struct pid parameter in copy_process()

Modify copy_process() to take a struct pid * parameter instead of a pid_t. This simplifies the code a bit and also avoids having to call find_pid() to convert the pid_t to a struct pid.

Changelog:
	- Fixed Badari Pulavarty's comments and passed in &init_struct_pid from fork_idle().
	- Fixed Eric Biederman's comments and simplified this patch and used a new patch to remove the likely(pid) check.

Signed-off-by: Sukadev Bhattiprolu [EMAIL PROTECTED]
Cc: Cedric Le Goater [EMAIL PROTECTED]
Cc: Dave Hansen [EMAIL PROTECTED]
Cc: Serge Hallyn [EMAIL PROTECTED]
Cc: Eric Biederman [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Acked-by: Eric W. Biederman [EMAIL PROTECTED]
---
 kernel/fork.c |   11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

Index: lx26-21-rc3-mm2/kernel/fork.c
===
--- lx26-21-rc3-mm2.orig/kernel/fork.c	2007-03-12 17:16:39.0 -0700
+++ lx26-21-rc3-mm2/kernel/fork.c	2007-03-12 17:17:48.0 -0700
@@ -966,7 +966,7 @@ static struct task_struct *copy_process(
 					unsigned long stack_size,
 					int __user *parent_tidptr,
 					int __user *child_tidptr,
-					int pid)
+					struct pid *pid)
 {
 	int retval;
 	struct task_struct *p = NULL;
@@ -1033,7 +1033,7 @@ static struct task_struct *copy_process(
 	p->did_exec = 0;
 	delayacct_tsk_init(p);	/* Must remain after dup_task_struct() */
 	copy_flags(clone_flags, p);
-	p->pid = pid;
+	p->pid = pid_nr(pid);
 
 	INIT_LIST_HEAD(&p->children);
 	INIT_LIST_HEAD(&p->sibling);
@@ -1265,7 +1265,7 @@ static struct task_struct *copy_process(
 			list_add_tail_rcu(&p->tasks, &init_task.tasks);
 			__get_cpu_var(process_counts)++;
 		}
-		attach_pid(p, PIDTYPE_PID, find_pid(p->pid));
+		attach_pid(p, PIDTYPE_PID, pid);
 		nr_threads++;
 	}
 
@@ -1336,7 +1336,8 @@ struct task_struct * __cpuinit fork_idle
 	struct task_struct *task;
 	struct pt_regs regs;
 
-	task = copy_process(CLONE_VM, 0, idle_regs(&regs), 0, NULL, NULL, 0);
+	task = copy_process(CLONE_VM, 0, idle_regs(&regs), 0, NULL, NULL,
+				&init_struct_pid);
 	if (!IS_ERR(task))
 		init_idle(task, cpu);
 
@@ -1364,7 +1365,7 @@ long do_fork(unsigned long clone_flags,
 		return -EAGAIN;
 	nr = pid->nr;
 
-	p = copy_process(clone_flags, stack_start, regs, stack_size, parent_tidptr, child_tidptr, nr);
+	p = copy_process(clone_flags, stack_start, regs, stack_size, parent_tidptr, child_tidptr, pid);
 	/*
 	 * Do this prior waking up the new thread - the thread pointer
 	 * might get invalid after that point, if the thread exits quickly.
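The saving described in the changelog, passing the struct pid itself instead of a number that has to be looked up again, can be modelled in userspace. This is only a sketch under assumptions: the hash table, demo_task, attach_by_number() and attach_by_pointer() are hypothetical, and find_pid() and pid_nr() here are toy functions that borrow the kernel names to mirror the roles those calls play in the patch. It also illustrates why the pointer form suits the idle task, whose struct pid is deliberately kept out of the hash.

```c
#include <assert.h>
#include <stddef.h>

#define PID_HASH_SIZE 16

/* Cut-down struct pid: just the number and a hash chain pointer. */
struct pid {
	int nr;
	struct pid *chain;
};

/* Hypothetical miniature of task_struct. */
struct demo_task {
	int pid;		/* numeric id, as in p->pid */
	struct pid *link;	/* stands in for the effect of attach_pid() */
};

static struct pid *pid_hash[PID_HASH_SIZE];
static unsigned long hash_lookups;	/* counts find_pid() calls */

/* Insert a struct pid at the head of its hash bucket. */
static void hash_pid(struct pid *pid)
{
	unsigned int bucket = (unsigned int)pid->nr % PID_HASH_SIZE;

	pid->chain = pid_hash[bucket];
	pid_hash[bucket] = pid;
}

/* Number -> struct pid: walks a hash chain, NULL for unhashed pids. */
static struct pid *find_pid(int nr)
{
	struct pid *pid;

	hash_lookups++;
	for (pid = pid_hash[(unsigned int)nr % PID_HASH_SIZE]; pid; pid = pid->chain)
		if (pid->nr == nr)
			return pid;
	return NULL;
}

/* struct pid -> number: a plain field read, 0 for a NULL pid. */
static int pid_nr(const struct pid *pid)
{
	return pid ? pid->nr : 0;
}

/* Old calling convention: take the number, look the struct up again. */
static void attach_by_number(struct demo_task *p, int nr)
{
	p->pid = nr;
	p->link = find_pid(nr);
}

/* New calling convention: take the struct, derive the number from it. */
static void attach_by_pointer(struct demo_task *p, struct pid *pid)
{
	p->pid = pid_nr(pid);
	p->link = pid;
}
```

In this model the pointer form performs no hash lookup at all, and it still works for a struct pid that was never hashed, where the number form finds nothing to attach.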
[PATCH 2/5] Explicitly set pgid and sid of init process
From: Sukadev Bhattiprolu [EMAIL PROTECTED]
Subject: [PATCH 2/5] Explicitly set pgid and sid of init process

Explicitly set pgid and sid of init process to 1.

Signed-off-by: Sukadev Bhattiprolu [EMAIL PROTECTED]
Cc: Cedric Le Goater [EMAIL PROTECTED]
Cc: Dave Hansen [EMAIL PROTECTED]
Cc: Serge Hallyn [EMAIL PROTECTED]
Cc: Eric Biederman [EMAIL PROTECTED]
Cc: Herbert Poetzl [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Acked-by: Eric W. Biederman [EMAIL PROTECTED]
---
 init/main.c |    1 +
 1 file changed, 1 insertion(+)

Index: lx26-20-mm2c/init/main.c
===
--- lx26-20-mm2c.orig/init/main.c	2007-02-28 15:49:13.0 -0800
+++ lx26-20-mm2c/init/main.c	2007-02-28 15:49:35.0 -0800
@@ -791,6 +791,7 @@ static int __init init(void * unused)
 	 */
 	init_pid_ns.child_reaper = current;
 
+	__set_special_pids(1, 1);
 	cad_pid = task_pid(current);
 
 	smp_prepare_cpus(max_cpus);