Re: [BUG] scheduler: first timeslice of the exiting thread
On Sun, 2007-04-08 at 23:09 -0700, Andrew Morton wrote:
> On Sat, 07 Apr 2007 16:31:39 +0900 Satoru Takeuchi <[EMAIL PROTECTED]> wrote:
>
> > When I was examining the following program ...
> >
> > 1. There are a large amount of small jobs takes several msecs,
> >    and the number of job increases constantly.
> > 2. The process creates a thread or a process per job (I examined both
> >    the thread model and the process model).
> > 3. Each child process/thread does the assigned job and exit immediately.
> >
> > ... I found that the thread model's latency is longer than process
> > model's one against my expectation. It's because of the current
> > sched_fork()/sched_exit() implementation as follows:
> >
> > a) On sched_fork, the creator share its timeslice with new process.
> > b) On sched_exit, if the exiting process didn't exhaust its first
> >    timeslice yet, it gives its timeslice to the parent.
> >
> > It has no problem on the process model since the creator is the parent.
> > However, on the thread model, the creator is not the parent, it is same
> > as the creator's parent. Hence, on this kind of program, the creator
> > can't retrieve shared timeslice and exhausts its timeslice at a rate of
> > knots. In addition, somehow, the parent (typically shell?) gets extra
> > timeslice.
> >
> > I believe it's a bug and the exiting process should give its timeslice
> > to the creator. Now I have some patch plan to fix this problem as follows:
> >
> > a) Add the field for the creator to task_struct. It needs extra memory.
> > b) Doesn't add extra field and have thread's parent the creator, which is
> >    same as process creation. However it has many side effects, for example,
> >    we also need to change sys_getppid() implementation.
> >
> > What do you think? Any comments are welcome.
>
> This comes at an awkward time, because we might well merge the
> staircase/deadline work into 2.6.22, and I think it rewrites the part of
> the scheduler which is causing the problems you're observing.
>
> Has anyone verified that SD fixes this problem and the one at
> http://lkml.org/lkml/2007/4/7/21 ?

Not verified either way in testing, but I believe this should be a
problem for SD as well because timeslice fork/exit handling is identical
with mainline. Individual slices are much smaller than mainline, so
priority should drop rapidly, consuming bandwidth allotted for the
current rotation, sending the creator off to the expired array
prematurely.

	-Mike

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
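The fork/exit behaviour described in the report can be sketched in user space. This is only a toy model of rules a) and b) as stated above, not kernel code: `struct task`, `sim_fork()`, `sim_exit()` and `creator_slice_after_jobs()` are hypothetical names, slices are plain tick counts, and any cap the scheduler applies to a returned slice is left out.

```c
#include <assert.h>

/* Hypothetical, simplified task model; this is not the kernel's task_struct. */
struct task {
	int time_slice;        /* remaining ticks in the current slice */
	struct task *parent;   /* receives the unused slice on exit */
};

/* a) fork: the creator gives half of its remaining slice to the child. */
static void sim_fork(struct task *creator, struct task *child,
		     struct task *parent)
{
	child->time_slice = (creator->time_slice + 1) / 2;
	creator->time_slice /= 2;
	child->parent = parent;
}

/* b) exit: a child still in its first slice hands the rest to ->parent. */
static void sim_exit(struct task *child)
{
	child->parent->time_slice += child->time_slice;
	child->time_slice = 0;
}

/*
 * Run njobs fork-then-exit cycles in which each child exits before using
 * any of its slice, and return the creator's remaining slice.
 * thread_model != 0: the child's parent is the creator's parent (the shell).
 * thread_model == 0: the child's parent is the creator itself.
 */
static int creator_slice_after_jobs(int start_slice, int njobs,
				    int thread_model)
{
	struct task shell = { 0, 0 };
	struct task creator = { start_slice, &shell };
	int i;

	assert(start_slice >= 0 && njobs >= 0);
	for (i = 0; i < njobs; i++) {
		struct task child = { 0, 0 };

		sim_fork(&creator, &child, thread_model ? &shell : &creator);
		sim_exit(&child);
	}
	return creator.time_slice;
}
```

Starting from a 100-tick slice and running three jobs, the process model leaves the creator with all 100 ticks, while the thread model leaves it with 12 and hands the other 88 to the shell. That matches the reported symptom of the creator running dry while the parent gains extra timeslice.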
Re: SD scheduler testing hitch
On Sun, 2007-04-08 at 21:34 +0300, Al Boldi wrote:
> Mike Galbraith wrote:
> > On Sat, 2007-04-07 at 19:17 +0200, Mike Galbraith wrote:
> > > I lowered the time to 500us, and ran at nice -10.. it starves tenpercent
> > > here every time. (ran as taskset -c 1 nice -n -10 ./fairtest) The
> > > starving 10% duty cycle task has trouble getting 1% CPU.
> >
> > Hmm. Playing with it some more today, it still happens, but it's not
> > very repeatable. Something is odd. I wonder if any SD using readers
> > will try it.
>
> Tried it on mainline 2.6.20.3.
> It's not easily repeatable, but it's got the same problem.
>
> top - 21:21:45 up 27 min,  0 users,  load average: 0.80, 0.43, 0.20
> Tasks:  45 total,   3 running,  42 sleeping,   0 stopped,   0 zombie
> Cpu(s): 24.3% user, 0.5% system, 0.0% nice, 75.0% idle, 0.2% IO-wait
> Mem:    499488k total,    27352k used,   472136k free,     1996k buffers
> Swap:  1020088k total,        0k used,  1020088k free,     9160k cached
>
>   PID  PR  NI  VIRT  RES  SHR SWAP nFLT nDRT WCHAN     S %CPU   TIME+ Command
>   688  25   0  1804  412  352 1392    0    0 rest_init R 94.7 2:37.01 fairtest
>   689  15   0  1804  264  204 1540    0    0 rest_init R  0.0 0:00.79 fairtest

Aha! Thanks a bunch for testing it. (thing was irritating me greatly)

	-Mike
Re: Ten percent test
On Mon, 2007-04-09 at 01:23 -0400, Gene Heskett wrote:
> This may not be so informative, it's almost behaving ATM.
>
> 29252 amanda    22   0  1856  572  220 R 76.4  0.1   1:07.24 gzip
> 29235 amanda    15   0  2992 1224  888 S  5.6  0.1   0:02.80 chunker
> 29500 root      18   0  2996 1164  788 S  4.0  0.1   0:02.40 tar
> 10459 amanda    15   0  3340 1052  832 S  3.0  0.1   0:49.04 amandad
> 10536 amanda    15   0  3276 1308 1004 S  2.3  0.1   0:40.92 dumper
> 29496 amanda    18   0  2808  472  280 S  2.0  0.0   0:01.73 sendbackup
>  4057 gkrellmd  15   0 11568 1172  896 S  1.3  0.1   7:45.82 gkrellmd
> 29498 amanda    18   0  2396  780  656 S  1.0  0.1   0:00.60 tar
> 19183 root      15   0     0    0    0 S  0.7  0.0   0:01.92 pdflush
>

Yeah, this is showing the scheduler behaving properly.

	-Mike
Re: [BUG] scheduler: first timeslice of the exiting thread
On Sat, 07 Apr 2007 16:31:39 +0900 Satoru Takeuchi <[EMAIL PROTECTED]> wrote:

> When I was examining the following program ...
>
> 1. There are a large amount of small jobs takes several msecs,
>    and the number of job increases constantly.
> 2. The process creates a thread or a process per job (I examined both
>    the thread model and the process model).
> 3. Each child process/thread does the assigned job and exit immediately.
>
> ... I found that the thread model's latency is longer than process
> model's one against my expectation. It's because of the current
> sched_fork()/sched_exit() implementation as follows:
>
> a) On sched_fork, the creator share its timeslice with new process.
> b) On sched_exit, if the exiting process didn't exhaust its first
>    timeslice yet, it gives its timeslice to the parent.
>
> It has no problem on the process model since the creator is the parent.
> However, on the thread model, the creator is not the parent, it is same
> as the creator's parent. Hence, on this kind of program, the creator
> can't retrieve shared timeslice and exhausts its timeslice at a rate of
> knots. In addition, somehow, the parent (typically shell?) gets extra
> timeslice.
>
> I believe it's a bug and the exiting process should give its timeslice
> to the creator. Now I have some patch plan to fix this problem as follows:
>
> a) Add the field for the creator to task_struct. It needs extra memory.
> b) Doesn't add extra field and have thread's parent the creator, which is
>    same as process creation. However it has many side effects, for example,
>    we also need to change sys_getppid() implementation.
>
> What do you think? Any comments are welcome.

This comes at an awkward time, because we might well merge the
staircase/deadline work into 2.6.22, and I think it rewrites the part of
the scheduler which is causing the problems you're observing.

Has anyone verified that SD fixes this problem and the one at
http://lkml.org/lkml/2007/4/7/21 ?
Re: Ten percent test
On Mon, 2007-04-09 at 01:16 -0400, Gene Heskett wrote:
> On Monday 09 April 2007, Mike Galbraith wrote:
> >So tar -cvf - / | gzip --best | tar -tvzf - should reproduce the
> >problem?
> >
> >	-Mike
>
> That looks as if it should demo it pretty well if I understand correctly
> everything you're doing there.

Well, I let it process my ~250GB of data with my current tree, and it
looked utterly harmless (and since I'm running SMP, was of course).
I'll try building UP to make sure, and check mainline as well.

	-Mike
Re: Ten percent test
On Mon, 2007-04-09 at 00:08 -0400, Gene Heskett wrote:
> On Monday 09 April 2007, Mike Galbraith wrote:
> >
> >Actually, there was practically nil interest in testing. We made a
> >couple of minor adjustments to the interactivity logic, and all went
> >quiet, so I didn't think it was enough of a problem to require more
> >intrusive countermeasures.
> >
> >	-Mike
>
> Does one of these messages have a url so I can test the latest of your
> patches for -rc6? Or was the one Ingo sent the most recent?

No, my tree has a bugfix and some other adjustments that try to move the
balance closer to fair without sacrificing interactivity.

> Putting that url in your sig would be nice, and might result in its
> getting a lot more exercise which should = more feedback.

When I get it cleaned up and better tested, I'll post again. If you
want, I'll CC you... willing victims are a highly valued commodity :)

	-Mike
Re: SD scheduler testing hitch
On Mon, 2007-04-09 at 02:23 +0200, Dmitry Adamushko wrote:
> > [...]
> > Well, it's a late hour, so maybe I'm missing something... but it does
> > look to be HZ and "will run" time interval related issue. Like
> > described in (*). Or maybe we both observe similar situations but have
> > different reasons behind them.
>
> I meant that account_user_time() is also called from timer_ISR ->
> update_process_times() like scheduler_tick(). So if task's running
> intervals are shorter than 1/HZ, it's not always accounted --> so cpu%
> may be wrong for such a task...

I think you're right wrt percentages, and that's making accurate
measurement of SD fairness difficult. However, total runtime for user
tasks should be pretty accurate for kernels that use nanoseconds,
because they're added every time a task passes through schedule().

BTW, the aberration I noticed with my unverified "testcase" does _seem_
to be repeatable here. Once behavior changes, after a reboot the
repeatability returns. I have no idea what's going on, but something is
sure fishy.

	-Mike
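The accounting point above (a task whose run intervals are shorter than 1/HZ may be charged wrongly by the timer tick) can be illustrated with a small user-space simulation. This is a sketch under stated assumptions, not the kernel's accounting code: `sample_by_tick()` and `struct acct` are hypothetical names, time is modelled in whole microseconds, and the tick is assumed to charge one full tick to whatever is running at the instant it fires.

```c
#include <assert.h>
#include <stdint.h>

struct acct {
	uint64_t actual_us;   /* time the task really spent on the CPU */
	uint64_t sampled_us;  /* time charged to it by the periodic tick */
};

/*
 * The task runs during [phase_us, phase_us + run_us) of every period_us.
 * A timer tick fires every tick_us (1/HZ) and charges a whole tick to the
 * task if it happens to be running at that instant.
 */
static struct acct sample_by_tick(uint64_t total_us, uint64_t tick_us,
				  uint64_t period_us, uint64_t phase_us,
				  uint64_t run_us)
{
	struct acct a = { 0, 0 };
	uint64_t t;

	assert(tick_us > 0 && period_us > 0);
	assert(phase_us + run_us <= period_us);

	for (t = 0; t < total_us; t++) {
		uint64_t pos = t % period_us;
		int running = pos >= phase_us && pos < phase_us + run_us;

		if (running)
			a.actual_us++;
		/* The tick charges one whole tick to whoever is running now. */
		if (t % tick_us == 0 && running)
			a.sampled_us += tick_us;
	}
	return a;
}
```

With a 1000us tick and a task that runs 500us out of every 1000us, one simulated second gives 500000us of real runtime in both cases below. The tick charges 0us if the task starts 100us after each tick, and 1000000us if it starts exactly on the tick. So the same 50% task can show up as 0% or 100%, depending only on its phase.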
Fw: Re: + add-locking-to-evdev.patch added to -mm tree
On Fri, Mar 30, 2007 at 02:06:05PM -0700, [EMAIL PROTECTED] wrote:
>
> The patch titled
>      Add locking to evdev
> has been added to the -mm tree.  Its filename is
>      add-locking-to-evdev.patch
>
> *** Remember to use Documentation/SubmitChecklist when testing your code ***
>
> See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find
> out what to do about this
>
> --
> Subject: Add locking to evdev
> From: Dmitry Torokhov <[EMAIL PROTECTED]>
>
> Input: evdev - implement proper locking

OK, so I have to ask -- this is protecting multiple clients of a given
mouse or keyboard, right?  Doesn't look like it has much to do with
connecting multiple mice/keyboards/joysticks/whatever to a given system,
but thought I should ask.

Excellent start, but some concerns marked with "!!!".  If these are
fixed, either by educating me or by appropriate changes, I will ack.
A signal-related question for Oleg marked with "Oleg".

						Thanx, Paul

> Signed-off-by: Dmitry Torokhov <[EMAIL PROTECTED]>
> Cc: "Paul E. McKenney" <[EMAIL PROTECTED]>
> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
> ---
>
>  drivers/input/evdev.c |  351
>  1 files changed, 254 insertions(+), 97 deletions(-)
>
> diff -puN drivers/input/evdev.c~add-locking-to-evdev drivers/input/evdev.c
> --- a/drivers/input/evdev.c~add-locking-to-evdev
> +++ a/drivers/input/evdev.c
> @@ -31,6 +31,8 @@ struct evdev {
>  	wait_queue_head_t wait;
>  	struct evdev_client *grab;
>  	struct list_head client_list;
> +	spinlock_t client_lock;

OK, what does this one protect?

o	evdev_attach_client(): client_list field (permitting RCU readers).
	Adds element.

o	evdev_detach_client(): ditto, but deletes element.

o	evdev_hangup(): scans the list hanging off of the client_list
	field, invoking kill_fasync() on each.  Looks to be delivering
	a POLL_HUP to all parties receiving events.  Apparently the
	lock is preventing an entry from being deleted out from under
	evdev_hangup().  Need to check races with close(), I guess...
	(For example, it would be bad to have the process torn down to
	the point that it could not tolerate receiving (or ignoring)
	a signal before removing itself from the list.)

o	Readers of the evdev->client_list can use RCU.

> +	struct mutex mutex;

And what does this one protect?

o	evdev_flush(): evdev->exist flag (which handles race with
	RCU removal?)  Also invokes input_flush_device(), which invokes
	some flush-handler function.  There may be more issues here,
	but they would be with users of evdev rather than with evdev
	itself, I am guessing.

o	evdev_release(): invokes evdev_ungrab().  This NULLs the
	evdev->grab field using rcu_assign_pointer().

o	evdev_write(): invokes evdev_event_from_user() and
	input_inject_event().  The former copies from user space, so
	->mutex indeed cannot be a spinlock.  Not sure what we are
	protecting here -- perhaps event traffic?  @@@

o	evdev_ioctl_handler(): protecting ioctl.  Consistent with the
	thought of protecting event traffic.

o	evdev_mark_dead(): protect setting evdev->exist to zero, adding
	weight to the speculation under evdev_flush() above that ->exist
	handles the race with RCU removal.

o	Readers of evdev->grab can use RCU.  RCU readers caring about
	concurrent deletion should check for evdev->exist under
	evdev->mutex.

Lock order:

o	evdev->client_lock => fown_struct->lock

o	fown_struct->lock => tasklist_lock

o	tasklist_lock => sighand_struct->siglock

o	evdev_table_mutex => evdev->client_lock.

>  	struct device dev;
>  };
>
> @@ -38,39 +40,48 @@ struct evdev_client {
>  	struct input_event buffer[EVDEV_BUFFER_SIZE];
>  	int head;
>  	int tail;
> +	spinlock_t buffer_lock;

And what does this one protect?  Presumably a buffer!  ;-)

o	evdev_pass_event(): adding an event to evdev_client->buffer.
	This includes the evdev_client->head field.

	!!! Why doesn't this function need to check the
	evdev_client->tail field???  How do we know we won't overflow
	the buffer???

o	evdev_new_client() [was evdev_open()]: evdev_client->client field
	(attaching the evdev to its client, apparently).  Invokes
	evdev_attach_client() to do the list manipulation (protected
	in turn by evdev->client_lock).

	Argh...  Strike that -- spin_lock_init() rather than spin_lock().

o	evdev_fetch_next_event(): removing an event from
	evdev_client->buffer.  This includes evdev_client->head and
	evdev_client->tail.

>  	struct fasync_struct *fasync;
>  	struct evdev *evdev;
>  	struct list_head node;
>  };
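The "!!!" question about checking ->tail before advancing ->head can be pictured with a generic ring buffer. This is a user-space sketch of one common overflow policy (drop the new event when the buffer is full), not the evdev patch and not a claim about what evdev does. `struct ring`, `ring_put()`, `ring_get()` and `RING_SIZE` are hypothetical names, and the locking that buffer_lock would provide in the kernel is omitted.

```c
#include <assert.h>

#define RING_SIZE 8	/* hypothetical; must be a power of two */

struct ring {
	int buffer[RING_SIZE];
	unsigned int head;	/* next slot to write */
	unsigned int tail;	/* next slot to read */
};

/*
 * Producer side: compare the advanced head against tail before writing.
 * Returns 1 on success, 0 if the buffer is full and the event is dropped.
 * One slot is always left empty so that head == tail means "empty".
 */
static int ring_put(struct ring *r, int event)
{
	unsigned int next;

	assert(r != 0);
	next = (r->head + 1) & (RING_SIZE - 1);
	if (next == r->tail)
		return 0;
	r->buffer[r->head] = event;
	r->head = next;
	return 1;
}

/* Consumer side: returns 1 and stores the oldest event, or 0 if empty. */
static int ring_get(struct ring *r, int *event)
{
	assert(r != 0 && event != 0);
	if (r->head == r->tail)
		return 0;
	*event = r->buffer[r->tail];
	r->tail = (r->tail + 1) & (RING_SIZE - 1);
	return 1;
}
```

Without the `next == r->tail` test the producer would silently wrap past unread entries, which is the overflow being asked about. Whether to drop the new event, overwrite the oldest one, or signal the reader is a policy choice this sketch does not settle.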
Re: Ten percent test
On Sun, 2007-04-08 at 09:08 -0400, Ed Tomlinson wrote:
> Hi,
>
> I am one of those who have been happily testing Con's patches.
>
> They work better than mainline here.

(I tried a UP kernel yesterday, and even a single kernel build would
make noticeable hitches if I move a window around. YMMV etc.)

> If one really needs some sort of interactivity booster (I do not with SD), why
> not move it into user space? With SD it would be simple enough to export
> some info on estimated latency. With this user space could make a good
> attempt to keep latency within bounds for a set of tasks just by renicing

I don't think you can have very much effect on latency using nice with
SD once the CPU is fully utilized. See below.

/*
 * This contains a bitmap for each dynamic priority level with empty slots
 * for the valid priorities each different nice level can have. It allows
 * us to stagger the slots where differing priorities run in a way that
 * keeps latency differences between different nice levels at a minimum.
 * ie, where 0 means a slot for that priority, priority running from left to
 * right:
 * nice -20
 * nice -10 1001000100100010001001000100010010001000
 * nice   0 0101010101010101010101010101010101010101
 * nice   5 1101011010110101101011010110101101011011
 * nice  10 0110111011011101110110111011101101110111
 * nice  15 0101101101011011
 * nice  19 1110
 */

Nice allocates bandwidth, but as long as the CPU is busy, tasks always
proceed downward in priority until they hit the expired array. That's
the design. If X gets busy and expires, and a nice 20 CPU hog wakes up
after its previous rotation has ended, but before the current rotation
is ended (ie there is 1 task running at wakeup time), X will take a
guaranteed minimum 160ms latency hit (quite noticeable) independent of
nice level. The only way to avoid it is to use a realtime class.

A nice -20 task has maximum bandwidth allocated, but that also makes it
a bigger target for preemption from tasks at all nice levels as it
proceeds downward toward expiration. AFAICT, low latency scheduling
just isn't possible once the CPU becomes 100% utilized, but it is
bounded to runqueue length. In mainline OTOH, a nice -20 task will
always preempt a nice 0 task, giving it instant gratification, and
latency of lower priority tasks is bounded by the EXPIRED_STARVING(rq)
safety net.

	-Mike
Re: [patch 2.6.21-rc5-git] make /proc/acpi/wakeup more useful
> On Sat, 2007-04-07 at 13:08 -0700, David Brownell wrote:
> > On Friday 06 April 2007 10:01 pm, Greg KH wrote:
> >
> > > Are you _sure_ you have a 1-to-1 relationship here? No multiple devices
> > > pointing to the same acpi node? Or the other way around? If so, you
> > > are going to have to change the name to be something more unique.
> >
> > I've wondered that too. The short answer: ACPI only supports 1-1
> > here.
>
> Right.
>
> > It will emit warnings if it tries to bind more than one ACPI
> > device to a given "real" device ... but errors the other way are
> > silently ignored.
>
> My understanding is different.
> First, one "real" device can only have one device.archdata.acpi_handle,
> which means it can only be bound to one ACPI device.
> Second, AE_ALREADY_EXISTS will be returned when ACPI tries to bind more
> than one "real" devices to the same ACPI device.

Exactly. The "first" case emits a warning, the "second" case doesn't;
no matter what it is (though I only saw ALREADY_EXISTS). When I added
a warning to that case:

> > By adding a warning over this create-links patch, I found that the
> > system in the $SUBJECT patch (and likely every ACPI system) has
> > two different nodes that correspond to one ACPI node:
> >
> >   /sys/devices/pci:00 ... pci root node
> >   /sys/devices/pnp0/00:00 ... id PNP0a03
> >   /sys/devices/acpi_system:00/device:00/PNP0A03:00 ... ditto
> >
> > Arguably that's too many sysfs nodes for one device...

Presumably you've noticed this same thing (not necessarily pnp0/00:00)
on other systems ...

> > Plus, there's the issue of flakey ACPI tables; in the $SUBJECT patch
> > both MDM and AUD nodes exist in the ACPI namespace, but they could
> > only refer to one PCI device (with MDM as the wakeup source, not AUD
> > as listed in the table). Or maybe that's another case where the ACPI
> > code isn't handling the tables as sensibly as it might...
>
> Could you attach this acpidump please? :)

Off-list; yes.

- Dave
Re: Ten percent test
On Monday 09 April 2007, Mike Galbraith wrote:
>On Sun, 2007-04-08 at 13:57 -0400, Gene Heskett wrote:
>> On Sunday 08 April 2007, Mike Galbraith wrote:
>> >On Sun, 2007-04-08 at 13:40 +0200, Mike Galbraith wrote:
>> >> On Sun, 2007-04-08 at 07:33 -0400, Gene Heskett wrote:
>> >> > That seems to be the killer loading here, building a kernel (make
>> >> > -j3) doesn't seem to lag it all that bad. One session of gzip
>> >> > -best makes it fall plumb over though, which was a
>> >> > disappointment.
>> >>
>> >> Can you make a testcase that doesn't require amanda?
>> >
>> >Or at least send me a couple of 5 or 10 second top snapshots (which
>> > also show CPU usage of sleeping tasks) while the system is
>> > misbehaving?
>> >
>> >-Mike
>>
>> With what monitor utility?
>
>Top.
>
>	-Mike

This may not be so informative, it's almost behaving ATM.

29252 amanda    22   0  1856  572  220 R 76.4  0.1   1:07.24 gzip
29235 amanda    15   0  2992 1224  888 S  5.6  0.1   0:02.80 chunker
29500 root      18   0  2996 1164  788 S  4.0  0.1   0:02.40 tar
10459 amanda    15   0  3340 1052  832 S  3.0  0.1   0:49.04 amandad
10536 amanda    15   0  3276 1308 1004 S  2.3  0.1   0:40.92 dumper
29496 amanda    18   0  2808  472  280 S  2.0  0.0   0:01.73 sendbackup
 4057 gkrellmd  15   0 11568 1172  896 S  1.3  0.1   7:45.82 gkrellmd
29498 amanda    18   0  2396  780  656 S  1.0  0.1   0:00.60 tar
19183 root      15   0     0    0    0 S  0.7  0.0   0:01.92 pdflush

I also note with some disdain that I'm half a megabyte into swap, but
I've had FF-2.0.0.3 busy for the last hour while amanda was trying to
find a few cycles at the same time. Looking at a bunch of pdf's of
circuit boards to see if I wanna build them for my milling machine.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Fatal Error: Found MS-Windows System -> Repartitioning Disk for Linux...
Re: Ten percent test
On Monday 09 April 2007, Mike Galbraith wrote:
>On Sun, 2007-04-08 at 13:56 -0400, Gene Heskett wrote:
>> On Sunday 08 April 2007, Mike Galbraith wrote:
>> >On Sun, 2007-04-08 at 07:33 -0400, Gene Heskett wrote:
>> >> That seems to be the killer loading here, building a kernel (make
>> >> -j3) doesn't seem to lag it all that bad. One session of gzip
>> >> -best makes it fall plumb over though, which was a disappointment.
>> >
>> >Can you make a testcase that doesn't require amanda?
>> >
>> >-Mike
>>
>> Sure. Try 'tar czf nameofarchive.tar.gz /path/to-dir-to-be-backed-up'
>>
>> Or, from the runtar log from this morning, and this is all one line:
>>
>> runtar.20070408022016.debug:running: /bin/tar: 'gtar' '--create'
>> '--file' '-' '--directory' '/usr/dlds-rpms' '--one-file-system'
>> '--listed-incremental'
>> '/usr/local/var/amanda/gnutar-lists/coyote_usr_dlds-rpms_1.new'
>> '--sparse' '--ignore-failed-read' '--totals' '--exclude-from'
>> '/tmp/amanda/sendbackup._usr_dlds-rpms.20070408022016.exclude' '.'
>>
>> and amanda will if requested, pipe that output through a |gzip -best,
>> and its this process that brings the machine to the table begging for
>> scraps like a puppy. Tar by itself can be felt but isn't bad.
>
>So tar -cvf - / | gzip --best | tar -tvzf - should reproduce the
>problem?
>
>	-Mike

That looks as if it should demo it pretty well if I understand correctly
everything you're doing there.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
In /users3 did Kubla Kahn
A stately pleasure dome decree,
Where /bin, the sacred river ran
Through Test Suites measureless to Man
Down to a sunless C.
Re: Reiser4. BEST FILESYSTEM EVER - Christer Weinigel
On Mon, 09 Apr 2007 00:58:53 +0200, "Richard Knutsson" <[EMAIL PROTECTED]> said:

> Wow, I'm impressed. Think you got the record on how many mails you
> referenced to in a reply...

TWO actually. I guess you are easily impressed. A simple cut and paste
error.

> You have got some rude answers and you have called them back on it

Yeah, I (fairly closely) mimicked their behavior to make a point.

> + you have repeated the same statement several times, that is
> not the best way of convincing people.

I know you DON'T believe that, as you are about the tenth person to
repeat that "repeating stuff has no effect."

> I believe you picked up the "anti-Reiser religion"-phrase from previous
> rant-wars (otherwise, why does that "religion"-phrase always come up,
> and (almost) only when dealing with Reiser-fs), and yes, there has been
> some clashes caused by both sides, so please be careful when dealing
> with this matter.

NO. You people simply come across as zealots who work together, against
Reiser4. Hence the term "anti-Reiser religion."

> Would you be willing to benchmark Reiser4 with some compressed
> binary-blob and show the time as well as the CPU-usage?

I might be. I don't really know how to set it all up. Perhaps if you
guided me through it.

> >
> > You deliberately ignored the fact that bad blocks are NOT dealt with by
> > the filesystem,... but by the operating system. Like I said: If your
> > filesystem is writing to bad blocks, then throw away your operating
> > system.
> >
> I may have missed something, but if my room-mate took my harddrive,
> screwed it open, wrote a love-letter on the disk with a pencil and then
> returned it (ok, there may be some more plausible reasons for
> corruption), is the OS really suppose to handle it?

Yeah, I can't see how the OS could read the love-letter either. But one
thing is for sure. The FS ain't responsible for reading it.

> Yes, it should not
> assign any new data to those blocks but should it not also fall into the
> file-systems domain to be able to restore some/all data?

It's a tough ask of any FS. Microsoft's filesystem checker totally
roasted all my data on an XP-box last night. I had used ntfsresize to
reduce the partition size and had a power outage. Later, Windows booted,
ran the filesystem checker, seemed OK. Next time I boot, all I get is
Input/Output error.

>
> Just my 2c to the pond
> Richard Knutsson
>

Addin my 2c

John.
-- 
[EMAIL PROTECTED]
-- 
http://www.fastmail.fm - A no graphics, no pop-ups email service
Re: REISER4 FOR INCLUSION IN THE LINUX KERNEL.
On Monday 09 April 2007, [EMAIL PROTECTED] wrote:
>
>I AM SURE THERE ARE A HUGE NUMBER OF PEOPLE WHO WOULD GIVE IT A TRY.
>
Many of us have, and recall the pain. We'll pass thank you.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
It is indeed desirable to be well descended, but the glory belongs to
our ancestors.
		-- Plutarch
Re: REISER4 FOR INCLUSION IN THE LINUX KERNEL.
[EMAIL PROTECTED] wrote:
YOU GUYS WILL LAUGH ABOUT THIS:

Yes, we are laughing at you.

You keep using bonnie++ after being told it's a poor benchmark.

	Jeff
Re: REISER4 FOR INCLUSION IN THE LINUX KERNEL.
[EMAIL PROTECTED] wrote:
REISER4 FOR INCLUSION IN THE LINUX KERNEL.

Dave Lynch takes a reasoned approach to REISER4.

Dave Lynch wrote:
Jeff Garzik wrote:
If the compelling reason is that it needs a test, I'd say it's not ready.

Can you please elaborate ? I am not sure I understand what you are
arguing ?

Jeff Garzik is "saying" that he wants REISER4 to stay out of the main
kernel, for reasons he is not willing to tell you.

False. I have told you the reasons.

I for one would at least play with it if it were in the distribution
tree.

I AM SURE THERE ARE A HUGE NUMBER OF PEOPLE WHO WOULD GIVE IT A TRY.

You can download it now. Nobody is stopping you, or anyone else.

As far as I could tell Hans pretty much everything else that was
demanded. Hans eventually caved and provided - albeit with much pissing
and moaning, and holier than thou rhetoric.

It was not his pissing and moaning, etc,... these were just excuses to
keep REISER4 from succeeding. The truth is, that any excuse would do.
The real reasons are financial and backed by big money (sometimes, big
egos).

Put down the conspiracy crackpipe.

	Jeff
Re: REISER4 FOR INCLUSION IN THE LINUX KERNEL.
YOU GUYS WILL LAUGH ABOUT THIS:

I forgot all the statistics that might support the case for REISER4
inclusion.

Well, here it all is:

http://linuxhelp.150m.com/resources/fs-benchmarks.htm
and
http://m.domaindlx.com/LinuxHelp/resources/fs-benchmarks.htm

.-------------------------.
| FILESYSTEM | TIME |DISK |
| TYPE       |(secs)|USAGE|
.-------------------------.
|REISER4 lzo | 1938 | 278 |
|REISER4 gzip| 2295 | 213 |
|REISER4     | 3462 | 692 |
|EXT2        | 4092 | 816 |
|JFS         | 4225 | 806 |
|EXT4        | 4408 | 816 |
|EXT3        | 4421 | 816 |
|XFS         | 4625 | 779 |
|REISER3     | 6178 | 793 |
|FAT32       |12342 | 988 |
|NTFS-3g     |10414 | 772 |
.-------------------------.

Column one measures the time taken to complete the bonnie++ benchmarking
test (run with the parameters bonnie++ -n128:128k:0)

Column two, Disk Usage: measures the amount of disk used to store 655MB
of raw data (which was 3 different copies of the Linux kernel sources).

OR LOOK AT THE FULL RESULTS:

.-------------------------------------------------.
|File         |Disk |Copy |Copy |Tar  |Unzip| Del |
|System       |Usage|655MB|655MB|Gzip |UnTar| 2.5 |
|Type         | (MB)| (1) | (2) |655MB|655MB| Gig |
.-------------------------------------------------.
|REISER4 gzip | 213 | 148 |  68 |  83 |  48 |  70 |
|REISER4 lzo  | 278 | 138 |  56 |  80 |  34 |  84 |
|REISER4 tails| 673 | 148 |  63 |  78 |  33 |  65 |
|REISER4      | 692 | 148 |  55 |  67 |  25 |  56 |
|NTFS3g       | 772 |1333 |1426 | 585 | 767 | 194 |
|NTFS         | 779 | 781 | 173 |   X |   X |   X |
|REISER3      | 793 | 184 |  98 |  85 |  63 |  22 |
|XFS          | 799 | 220 | 173 | 119 |  90 | 106 |
|JFS          | 806 | 228 | 202 |  95 |  97 | 127 |
|EXT4 extents | 806 | 162 |  55 |  69 |  36 |  32 |
|EXT4 default | 816 | 174 |  70 |  74 |  42 |  50 |
|EXT3         | 816 | 182 |  74 |  73 |  43 |  51 |
|EXT2         | 816 | 201 |  82 |  73 |  39 |  67 |
|FAT32        | 988 | 253 | 158 | 118 |  81 |  95 |
.-------------------------------------------------.

Each test was performed 5 times and the average value recorded.

Disk Usage: The amount of disk used to store the data (which was 3
different copies of the Linux kernel sources). The raw data (without
filesystem meta-data, block alignment wastage, etc) was 655MB.
Copy 655MB (1): Copy the data over a partition boundary.
Copy 655MB (2): Copy the data within a partition.
Tar Gzip 655MB: Tar and Gzip the data.
Unzip UnTar 655MB: UnGzip and UnTar the data.
Del 2.5 Gig: Delete everything just written (about 2.5 Gig).

To get a feel for the performance increases that can be achieved by
using compression, we look at the total time (in seconds) to run the
test:

bonnie++ -n128:128k:0 (bonnie++ is Version 1.93c)

.-------------------.
| FILESYSTEM | TIME |
.-------------------.
|REISER4 lzo |  1938|
|REISER4 gzip|  2295|
|REISER4     |  3462|
|EXT4        |  4408|
|EXT2        |  4092|
|JFS         |  4225|
|EXT3        |  4421|
|XFS         |  4625|
|REISER3     |  6178|
|FAT32       | 12342|
|NTFS-3g     |>10414|
.-------------------.

-- 
[EMAIL PROTECTED]
-- 
http://www.fastmail.fm - IMAP accessible web-mail
Re: Add a norecovery option to ext3/4?
Eric Sandeen wrote:
> Samuel Thibault wrote:
>> Hi,
>>
>> Distribution installers usually try to probe OSes for building a suited
>> grub menu. Unfortunately, mounting an ext3 partition, even in read-only
>> mode, does perform some operations on the filesystem (log recovery).
>> This is not a good idea since it may silently garbage data.
>
> Can you elaborate? Under what circumstances is log replay going to harm
> data? Do you mean that the installer mounts partitions, looking for
> what OS is installed? How is that harmful?

It'll wreak havoc on my hibernated system when I've suspended it to do a test OS install on one of my spare partitions. The log replay will go fine, but then when the system resumes its idea of what's on the disk won't match what is really there and ugly, ugly things happen.

Brad
-- 
"Human beings, who are almost unique in having the ability to learn from the experience of others, are also remarkable for their apparent disinclination to do so." -- Douglas Adams
[PATCH 02/14] sysfs: fix error handling in binattr write()
Error handling in fs/sysfs/bin.c:write() was wrong because size_t count is used to receive return value from flush_write() which is negative on failure. This patch updates write() such that int variable is used instead. read() is updated the same way for consistency.

Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>
---
 fs/sysfs/bin.c |   21 ++++++++-------------
 1 files changed, 8 insertions(+), 13 deletions(-)

diff --git a/fs/sysfs/bin.c b/fs/sysfs/bin.c
index d3b9f5f..8273dd6 100644
--- a/fs/sysfs/bin.c
+++ b/fs/sysfs/bin.c
@@ -33,16 +33,13 @@ fill_read(struct dentry *dentry, char *buffer, loff_t off, size_t count)
 }
 
 static ssize_t
-read(struct file * file, char __user * userbuf, size_t count, loff_t * off)
+read(struct file *file, char __user *userbuf, size_t bytes, loff_t *off)
 {
 	char *buffer = file->private_data;
 	struct dentry *dentry = file->f_path.dentry;
 	int size = dentry->d_inode->i_size;
 	loff_t offs = *off;
-	int ret;
-
-	if (count > PAGE_SIZE)
-		count = PAGE_SIZE;
+	int count = min_t(size_t, bytes, PAGE_SIZE);
 
 	if (size) {
 		if (offs > size)
@@ -51,10 +48,9 @@ read(struct file * file, char __user * userbuf, size_t count, loff_t * off)
 			count = size - offs;
 	}
 
-	ret = fill_read(dentry, buffer, offs, count);
-	if (ret < 0)
-		return ret;
-	count = ret;
+	count = fill_read(dentry, buffer, offs, count);
+	if (count < 0)
+		return count;
 
 	if (copy_to_user(userbuf, buffer, count))
 		return -EFAULT;
@@ -78,16 +74,15 @@ flush_write(struct dentry *dentry, char *buffer, loff_t offset, size_t count)
 	return attr->write(kobj, buffer, offset, count);
 }
 
-static ssize_t write(struct file * file, const char __user * userbuf,
-		     size_t count, loff_t * off)
+static ssize_t write(struct file *file, const char __user *userbuf,
+		     size_t bytes, loff_t *off)
 {
 	char *buffer = file->private_data;
 	struct dentry *dentry = file->f_path.dentry;
 	int size = dentry->d_inode->i_size;
 	loff_t offs = *off;
+	int count = min_t(size_t, bytes, PAGE_SIZE);
 
-	if (count > PAGE_SIZE)
-		count = PAGE_SIZE;
 	if (size) {
 		if (offs > size)
 			return 0;
-- 
1.5.0.3
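The failure mode described in this patch can be reproduced in user space. The following is only a sketch under stated assumptions: flush_write_stub() is a hypothetical stand-in for flush_write(), and the two helpers mimic a "count > 0" success check like the one write() performs after the flush. It is not the kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for flush_write(): a byte count on success,
 * a negative errno-style value on failure. */
static int flush_write_stub(int fail)
{
	return fail ? -5 : 16;
}

/* Buggy shape: the result is stored in an unsigned size_t, so -5 wraps
 * to a huge positive value and the failure passes the "count > 0" test. */
static int looks_successful_size_t(int fail)
{
	size_t count = flush_write_stub(fail);

	return count > 0;
}

/* Fixed shape: a signed int keeps the sign, so the failure is visible. */
static int looks_successful_int(int fail)
{
	int count = flush_write_stub(fail);

	return count > 0;
}
```

Calling looks_successful_size_t(1) reports success even though the stub failed, while looks_successful_int(1) does not, which appears to be the distinction the patch relies on when it switches count to int.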
[PATCH 06/14] sysfs: add sysfs_dirent->s_parent
Add sysfs_dirent->s_parent. With this patch, each sd points to and holds a reference to its parent. This allows walking sysfs tree without referencing sd->s_dentry which can go away anytime if the user doesn't control when it's deleted.

sd->s_parent is initialized and parent is referenced in sysfs_attach_dirent(). Reference to parent is released when the sd is released, so as long as reference to a sd is held, s_parent can be followed.

dentry walk in sysfs_readdir() is converted to s_parent walk.

This will be used to reimplement symlink such that it uses only sysfs_dirent tree.

Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>
---
 fs/sysfs/dir.c   |   27 ++++++++++++++++++++-------
 fs/sysfs/mount.c |    1 +
 fs/sysfs/sysfs.h |    1 +
 3 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 3e460f7..8c35a60 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -16,6 +16,11 @@ DECLARE_RWSEM(sysfs_rename_sem);
 
 void release_sysfs_dirent(struct sysfs_dirent * sd)
 {
+	struct sysfs_dirent *parent_sd;
+
+ repeat:
+	parent_sd = sd->s_parent;
+
 	if (sd->s_type & SYSFS_KOBJ_LINK) {
 		struct sysfs_symlink * sl = sd->s_element;
 		kfree(sl->link_name);
@@ -24,6 +29,10 @@ void release_sysfs_dirent(struct sysfs_dirent * sd)
 	}
 	kfree(sd->s_iattr);
 	kmem_cache_free(sysfs_dir_cachep, sd);
+
+	sd = parent_sd;
+	if (sd && atomic_dec_and_test(&sd->s_count))
+		goto repeat;
 }
 
 static void sysfs_d_iput(struct dentry * dentry, struct inode * inode)
@@ -71,8 +80,10 @@ void sysfs_attach_dirent(struct sysfs_dirent *sd,
 		dentry->d_op = &sysfs_dentry_ops;
 	}
 
-	if (parent_sd)
+	if (parent_sd) {
+		sd->s_parent = sysfs_get(parent_sd);
 		list_add(&sd->s_sibling, &parent_sd->s_children);
+	}
 }
 
 /*
@@ -508,7 +519,7 @@ static int sysfs_readdir(struct file * filp, void * dirent, filldir_t filldir)
 			i++;
 			/* fallthrough */
 		case 1:
-			ino = (unsigned long)dentry->d_parent->d_fsdata;
+			ino = (unsigned long)parent_sd->s_parent;
 			if (filldir(dirent, "..", 2, i, ino, DT_DIR) < 0)
 				break;
 			filp->f_pos++;
@@ -625,13 +636,13 @@ int sysfs_make_shadowed_dir(struct kobject *kobj,
 
 struct dentry *sysfs_create_shadow_dir(struct kobject *kobj)
 {
+	struct dentry *dir = kobj->dentry;
+	struct inode *inode = dir->d_inode;
+	struct dentry *parent = dir->d_parent;
+	struct sysfs_dirent *parent_sd = parent->d_fsdata;
+	struct dentry *shadow;
 	struct sysfs_dirent *sd;
-	struct dentry *parent, *dir, *shadow;
-	struct inode *inode;
 
-	dir = kobj->dentry;
-	inode = dir->d_inode;
-	parent = dir->d_parent;
 	shadow = ERR_PTR(-EINVAL);
 	if (!sysfs_is_shadowed_inode(inode))
 		goto out;
@@ -643,6 +654,8 @@ struct dentry *sysfs_create_shadow_dir(struct kobject *kobj)
 	sd = sysfs_new_dirent(kobj, inode->i_mode, SYSFS_DIR);
 	if (!sd)
 		goto nomem;
+	/* point to parent_sd but don't attach to it */
+	sd->s_parent = sysfs_get(parent_sd);
 	sysfs_attach_dirent(sd, NULL, shadow);
 
 	d_instantiate(shadow, igrab(inode));
diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
index 23a48a3..141f7b1 100644
--- a/fs/sysfs/mount.c
+++ b/fs/sysfs/mount.c
@@ -28,6 +28,7 @@ static const struct super_operations sysfs_ops = {
 };
 
 static struct sysfs_dirent sysfs_root = {
+	.s_count	= ATOMIC_INIT(1),
 	.s_sibling	= LIST_HEAD_INIT(sysfs_root.s_sibling),
 	.s_children	= LIST_HEAD_INIT(sysfs_root.s_children),
 	.s_element	= NULL,
diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
index 0be1d94..f95ab31 100644
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -1,5 +1,6 @@
 struct sysfs_dirent {
 	atomic_t		s_count;
+	struct sysfs_dirent	* s_parent;
 	struct list_head	s_sibling;
 	struct list_head	s_children;
 	void 			* s_element;
-- 
1.5.0.3
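The "child holds a reference on its parent, and releasing the child may cascade upward" scheme from this patch can be illustrated outside the kernel. This is a rough sketch only: struct node, node_new(), node_put() and the nodes_freed counter are hypothetical names, the counts are plain ints instead of atomic_t, and the put and release steps are folded into one loop, where the patch uses a goto inside release_sysfs_dirent().

```c
#include <assert.h>
#include <stdlib.h>

struct node {
	int count;		/* reference count (not atomic in this sketch) */
	struct node *parent;	/* holds one reference on the parent, or NULL */
};

static int nodes_freed;		/* demo bookkeeping only */

/* Allocate a node with one reference; pin the parent if there is one. */
static struct node *node_new(struct node *parent)
{
	struct node *n = calloc(1, sizeof(*n));

	if (!n)
		return NULL;
	n->count = 1;
	if (parent) {
		parent->count++;
		n->parent = parent;
	}
	return n;
}

/* Drop one reference. When a node dies it drops its reference on the
 * parent too, so the loop walks up the chain iteratively instead of
 * recursing. */
static void node_put(struct node *n)
{
	while (n && --n->count == 0) {
		struct node *parent = n->parent;

		free(n);
		nodes_freed++;
		n = parent;
	}
}
```

With a root, child and grandchild chain, dropping the creators' references on the root and the child frees nothing as long as the grandchild is alive; dropping the grandchild's last reference then frees all three in one pass.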
Re: Ten percent test
On Sun, 2007-04-08 at 20:51 +0200, Rene Herman wrote:
> On 04/08/2007 12:41 PM, Ingo Molnar wrote:
>
> > this is pretty hard to get right, and the most objective way to change
> > it is to do it testcase-driven. FYI, interactivity tweaking has been
> > gradual, the last bigger round of interactivity changes were done a year
> > ago:
> >
> > commit 5ce74abe788a26698876e66b9c9ce7e7acc25413
> > Author: Mike Galbraith <[EMAIL PROTECTED]>
> > Date: Mon Apr 10 22:52:44 2006 -0700
> >
> > [PATCH] sched: fix interactive task starvation
> >
> > (and a few smaller tweaks since then too.)
> >
> > and that change from Mike responded to a testcase. Mike's latest changes
> > (the ones you just tested) were mostly driven by actual testcases too,
> > which measured long-term timeslice distribution fairness.
>
> Ah yes, that one. Here's the next one in that series:
>
> commit f1adad78dd2fc8edaa513e0bde92b4c64340245c
> Author: Linus Torvalds <[EMAIL PROTECTED]>
> Date: Sun May 21 18:54:09 2006 -0700
>
> Revert "[PATCH] sched: fix interactive task starvation"
>
> It personally had me wonder if _anyone_ was testing this stuff...

Well of course not. Making random untested changes, and reverting them later is half the fun of kernel development.

-Mike
[PATCH 05/14] sysfs: consolidate sysfs_dirent creation functions
Currently there are four functions to create sysfs_dirent - __sysfs_new_dirent(), sysfs_new_dirent(), __sysfs_make_dirent() and sysfs_make_dirent(). Other than sysfs_make_dirent(), no function has two users if calls to implement other functions are excluded. This patch consolidates sysfs_dirent creation functions into the following two. * sysfs_new_dirent() : allocate and initialize * sysfs_attach_dirent() : attach to sysfs_dirent hierarchy and/or associate with dentry This simplifies interface and gives callers more flexibility. This is in preparation of object reference simplification. Signed-off-by: Tejun Heo <[EMAIL PROTECTED]> --- fs/sysfs/dir.c | 82 fs/sysfs/file.c| 21 ++--- fs/sysfs/symlink.c |7 ++-- fs/sysfs/sysfs.h |7 +++- 4 files changed, 50 insertions(+), 67 deletions(-) diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c index 0005117..3e460f7 100644 --- a/fs/sysfs/dir.c +++ b/fs/sysfs/dir.c @@ -42,10 +42,7 @@ static struct dentry_operations sysfs_dentry_ops = { .d_iput = sysfs_d_iput, }; -/* - * Allocates a new sysfs_dirent and links it to the parent sysfs_dirent - */ -static struct sysfs_dirent * __sysfs_new_dirent(void * element) +struct sysfs_dirent *sysfs_new_dirent(void *element, umode_t mode, int type) { struct sysfs_dirent * sd; @@ -57,25 +54,25 @@ static struct sysfs_dirent * __sysfs_new_dirent(void * element) atomic_set(&sd->s_event, 1); INIT_LIST_HEAD(&sd->s_children); INIT_LIST_HEAD(&sd->s_sibling); + sd->s_element = element; + sd->s_mode = mode; + sd->s_type = type; return sd; } -static void __sysfs_list_dirent(struct sysfs_dirent *parent_sd, - struct sysfs_dirent *sd) +void sysfs_attach_dirent(struct sysfs_dirent *sd, +struct sysfs_dirent *parent_sd, struct dentry *dentry) { - if (sd) - list_add(&sd->s_sibling, &parent_sd->s_children); -} + if (dentry) { + sd->s_dentry = dentry; + dentry->d_fsdata = sysfs_get(sd); + dentry->d_op = &sysfs_dentry_ops; + } -static struct sysfs_dirent * sysfs_new_dirent(struct sysfs_dirent *parent_sd, - void * 
element) -{ - struct sysfs_dirent *sd; - sd = __sysfs_new_dirent(element); - __sysfs_list_dirent(parent_sd, sd); - return sd; + if (parent_sd) + list_add(&sd->s_sibling, &parent_sd->s_children); } /* @@ -103,39 +100,6 @@ int sysfs_dirent_exist(struct sysfs_dirent *parent_sd, return 0; } - -static struct sysfs_dirent * -__sysfs_make_dirent(struct dentry *dentry, void *element, mode_t mode, int type) -{ - struct sysfs_dirent * sd; - - sd = __sysfs_new_dirent(element); - if (!sd) - goto out; - - sd->s_mode = mode; - sd->s_type = type; - sd->s_dentry = dentry; - if (dentry) { - dentry->d_fsdata = sysfs_get(sd); - dentry->d_op = &sysfs_dentry_ops; - } - -out: - return sd; -} - -int sysfs_make_dirent(struct sysfs_dirent * parent_sd, struct dentry * dentry, - void * element, umode_t mode, int type) -{ - struct sysfs_dirent *sd; - - sd = __sysfs_make_dirent(dentry, element, mode, type); - __sysfs_list_dirent(parent_sd, sd); - - return sd ? 0 : -ENOMEM; -} - static int init_dir(struct inode * inode) { inode->i_op = &sysfs_dir_inode_operations; @@ -179,10 +143,11 @@ static int create_dir(struct kobject *kobj, struct dentry *parent, if (sysfs_dirent_exist(parent->d_fsdata, name)) goto out_dput; - error = sysfs_make_dirent(parent->d_fsdata, dentry, kobj, mode, - SYSFS_DIR); - if (error) + error = -ENOMEM; + sd = sysfs_new_dirent(kobj, mode, SYSFS_DIR); + if (!sd) goto out_drop; + sysfs_attach_dirent(sd, parent->d_fsdata, dentry); error = sysfs_create(dentry, mode, init_dir); if (error) @@ -197,7 +162,6 @@ static int create_dir(struct kobject *kobj, struct dentry *parent, goto out_dput; out_sput: - sd = dentry->d_fsdata; list_del_init(&sd->s_sibling); sysfs_put(sd); out_drop: @@ -494,13 +458,16 @@ static int sysfs_dir_open(struct inode *inode, struct file *file) { struct dentry * dentry = file->f_path.dentry; struct sysfs_dirent * parent_sd = dentry->d_fsdata; + struct sysfs_dirent * sd; mutex_lock(&dentry->d_inode->i_mutex); - file->private_data = sysfs_new_dirent(parent_sd, 
NULL); + sd = sysfs_new_dirent(NULL, 0, 0); + if (sd) + sysfs_attach_dirent(sd, parent_sd, NULL); mutex_unlock(&dentry->d_inode->i_mutex); - return file->private_da
[PATCH 01/14] sysfs: fix i_ino handling in sysfs
Inode number handling was incorrect in two ways.

1. sysfs uses the inode number allocated by new_inode() and never hashes it. When reporting the inode number, it uses iunique() if inode is inaccessible. This is incorrect because iunique() assumes the inodes are hashed. This can cause duplicate inode numbers and the condition is likely to happen because new_inode() and iunique() use separate increasing static counters to scan for empty slot.

2. sysfs_dirent->s_dentry can go away anytime and can't be referenced unless the caller knows the dentry is not deleted and is not going to be deleted.

This patch makes sysfs report the pointer to sysfs_dirent as ino. ino_t is always as big as or larger than unsigned long and the sysfs_dirent hierarchy is the internal representation of the sysfs tree, so this makes sense and is simple to implement.

Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>
---
 fs/sysfs/dir.c   |   11 ++++-------
 fs/sysfs/inode.c |    1 +
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 85a6686..5112f88 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -504,19 +504,19 @@ static int sysfs_readdir(struct file * filp, void * dirent, filldir_t filldir)
 	struct sysfs_dirent * parent_sd = dentry->d_fsdata;
 	struct sysfs_dirent *cursor = filp->private_data;
 	struct list_head *p, *q = &cursor->s_sibling;
-	ino_t ino;
+	unsigned long ino;
 	int i = filp->f_pos;
 
 	switch (i) {
 		case 0:
-			ino = dentry->d_inode->i_ino;
+			ino = (unsigned long)parent_sd;
 			if (filldir(dirent, ".", 1, i, ino, DT_DIR) < 0)
 				break;
 			filp->f_pos++;
 			i++;
 			/* fallthrough */
 		case 1:
-			ino = parent_ino(dentry);
+			ino = (unsigned long)dentry->d_parent->d_fsdata;
 			if (filldir(dirent, "..", 2, i, ino, DT_DIR) < 0)
 				break;
 			filp->f_pos++;
@@ -538,10 +538,7 @@ static int sysfs_readdir(struct file * filp, void * dirent, filldir_t filldir)
 
 				name = sysfs_get_name(next);
 				len = strlen(name);
-				if (next->s_dentry)
-					ino = next->s_dentry->d_inode->i_ino;
-				else
-					ino = iunique(sysfs_sb, 2);
+				ino = (unsigned long)next;
 
 				if (filldir(dirent, name, len, filp->f_pos, ino,
 						 dt_type(next)) < 0)
diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
index 4de5c6b..b8b010c 100644
--- a/fs/sysfs/inode.c
+++ b/fs/sysfs/inode.c
@@ -140,6 +140,7 @@ struct inode * sysfs_new_inode(mode_t mode, struct sysfs_dirent * sd)
 	inode->i_mapping->a_ops = &sysfs_aops;
 	inode->i_mapping->backing_dev_info = &sysfs_backing_dev_info;
 	inode->i_op = &sysfs_inode_operations;
+	inode->i_ino = (unsigned long)sd;
 	lockdep_set_class(&inode->i_mutex, &sysfs_inode_imutex_key);
 
 	if (sd->s_iattr) {
-- 
1.5.0.3
Re: Ten percent test
On Sun, 2007-04-08 at 13:57 -0400, Gene Heskett wrote:
> On Sunday 08 April 2007, Mike Galbraith wrote:
> >On Sun, 2007-04-08 at 13:40 +0200, Mike Galbraith wrote:
> >> On Sun, 2007-04-08 at 07:33 -0400, Gene Heskett wrote:
> >> > That seems to be the killer loading here, building a kernel (make
> >> > -j3) doesn't seem to lag it all that bad. One session of gzip -best
> >> > makes it fall plumb over though, which was a disappointment.
> >>
> >> Can you make a testcase that doesn't require amanda?
> >
> >Or at least send me a couple of 5 or 10 second top snapshots (which also
> >show CPU usage of sleeping tasks) while the system is misbehaving?
> >
> > -Mike
>
> With what monitor utility?

Top.

-Mike
[PATCH 09/14] sysfs: implement kobj_sysfs_assoc_lock
kobj->dentry can go away anytime unless the user controls when the associated sysfs node is deleted. This patch implements kobj_sysfs_assoc_lock which protects kobj->dentry. This will be used to maintain kobj based API when converting sysfs to use sysfs_dirent tree instead of dentry/kobject.

Note that this lock belongs to kobject/driver-model not sysfs. Once sysfs is converted to not use kobject in its interface, this can be removed from sysfs.

This is in preparation of object reference simplification.

Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>
---
 fs/sysfs/dir.c   |    8 +++++++-
 fs/sysfs/sysfs.h |    1 +
 2 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 4070dc4..707eba9 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -13,6 +13,7 @@
 #include "sysfs.h"
 
 DECLARE_RWSEM(sysfs_rename_sem);
+spinlock_t kobj_sysfs_assoc_lock = SPIN_LOCK_UNLOCKED;
 
 void release_sysfs_dirent(struct sysfs_dirent * sd)
 {
@@ -371,8 +372,13 @@ static void __sysfs_remove_dir(struct dentry *dentry)
 
 void sysfs_remove_dir(struct kobject * kobj)
 {
-	__sysfs_remove_dir(kobj->dentry);
+	struct dentry *d = kobj->dentry;
+
+	spin_lock(&kobj_sysfs_assoc_lock);
 	kobj->dentry = NULL;
+	spin_unlock(&kobj_sysfs_assoc_lock);
+
+	__sysfs_remove_dir(d);
 }
 
 int sysfs_rename_dir(struct kobject * kobj, struct dentry *new_parent,
diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
index b1a8a7e..5c41fc5 100644
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -60,6 +60,7 @@ extern void sysfs_remove_subdir(struct dentry *);
 extern void sysfs_drop_dentry(struct sysfs_dirent *sd, struct dentry *parent);
 extern int sysfs_setattr(struct dentry *dentry, struct iattr *iattr);
 
+extern spinlock_t kobj_sysfs_assoc_lock;
 extern struct rw_semaphore sysfs_rename_sem;
 extern struct super_block * sysfs_sb;
 extern const struct file_operations sysfs_dir_operations;
-- 
1.5.0.3
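The shape of the sysfs_remove_dir() change (clear the association under a lock, then do the teardown on a private copy outside the lock) can be sketched in user space. Everything here is an assumption made for illustration: struct object, assoc_lock, teardown() and object_remove() are hypothetical names, and a pthread mutex stands in for the kernel spinlock. Unlike the patch, this sketch also reads the old pointer inside the critical section.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct object {
	void *assoc;		/* stands in for kobj->dentry */
};

static pthread_mutex_t assoc_lock = PTHREAD_MUTEX_INITIALIZER;
static int teardown_calls;	/* demo bookkeeping only */

/* Stand-in for the expensive removal work, done without the lock held. */
static void teardown(void *assoc)
{
	if (assoc)
		teardown_calls++;
}

static void object_remove(struct object *obj)
{
	void *old;

	/* Keep the critical section tiny: snapshot and clear the pointer. */
	pthread_mutex_lock(&assoc_lock);
	old = obj->assoc;
	obj->assoc = NULL;
	pthread_mutex_unlock(&assoc_lock);

	/* Other users of the lock now see NULL; tear down the private copy. */
	teardown(old);
}
```

A second object_remove() call on the same object finds the association already cleared and does no teardown work.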
[PATCH 08/14] sysfs: make sysfs_dirent->s_element a union
Make sd->s_element a union of sysfs_elem_{dir|symlink|attr|bin_attr} and rename it to s_elem. This is to achieve... * some level of type checking : changing symlink to point to sysfs_dirent instead of kobject is much safer and less painful now. * easier / standardized dereferencing * allow sysfs_elem_* to contain more than one entry Where possible, pointer is obtained by directly deferencing from sd instead of going through other entities. This reduces dependencies to dentry, inode and kobject. to_attr() and to_bin_attr() are unused now and removed. This is in preparation of object reference simplification. Signed-off-by: Tejun Heo <[EMAIL PROTECTED]> --- fs/sysfs/bin.c | 18 ++-- fs/sysfs/dir.c | 31 +--- fs/sysfs/file.c| 19 + fs/sysfs/inode.c |2 +- fs/sysfs/mount.c |1 - fs/sysfs/symlink.c | 23 +++- fs/sysfs/sysfs.h | 56 --- 7 files changed, 71 insertions(+), 79 deletions(-) diff --git a/fs/sysfs/bin.c b/fs/sysfs/bin.c index 8273dd6..0f0027b 100644 --- a/fs/sysfs/bin.c +++ b/fs/sysfs/bin.c @@ -23,7 +23,8 @@ static int fill_read(struct dentry *dentry, char *buffer, loff_t off, size_t count) { - struct bin_attribute * attr = to_bin_attr(dentry); + struct sysfs_dirent *attr_sd = dentry->d_fsdata; + struct bin_attribute *attr = attr_sd->s_elem.bin_attr.bin_attr; struct kobject * kobj = to_kobj(dentry->d_parent); if (!attr->read) @@ -65,7 +66,8 @@ read(struct file *file, char __user *userbuf, size_t bytes, loff_t *off) static int flush_write(struct dentry *dentry, char *buffer, loff_t offset, size_t count) { - struct bin_attribute *attr = to_bin_attr(dentry); + struct sysfs_dirent *attr_sd = dentry->d_fsdata; + struct bin_attribute *attr = attr_sd->s_elem.bin_attr.bin_attr; struct kobject *kobj = to_kobj(dentry->d_parent); if (!attr->write) @@ -101,9 +103,9 @@ static ssize_t write(struct file *file, const char __user *userbuf, static int mmap(struct file *file, struct vm_area_struct *vma) { - struct dentry *dentry = file->f_path.dentry; - struct bin_attribute *attr = 
to_bin_attr(dentry); - struct kobject *kobj = to_kobj(dentry->d_parent); + struct sysfs_dirent *attr_sd = file->f_path.dentry->d_fsdata; + struct bin_attribute *attr = attr_sd->s_elem.bin_attr.bin_attr; + struct kobject *kobj = to_kobj(file->f_path.dentry->d_parent); if (!attr->mmap) return -EINVAL; @@ -114,7 +116,8 @@ static int mmap(struct file *file, struct vm_area_struct *vma) static int open(struct inode * inode, struct file * file) { struct kobject *kobj = sysfs_get_kobject(file->f_path.dentry->d_parent); - struct bin_attribute * attr = to_bin_attr(file->f_path.dentry); + struct sysfs_dirent *attr_sd = file->f_path.dentry->d_fsdata; + struct bin_attribute *attr = attr_sd->s_elem.bin_attr.bin_attr; int error = -EINVAL; if (!kobj || !attr) @@ -150,7 +153,8 @@ static int open(struct inode * inode, struct file * file) static int release(struct inode * inode, struct file * file) { struct kobject * kobj = to_kobj(file->f_path.dentry->d_parent); - struct bin_attribute * attr = to_bin_attr(file->f_path.dentry); + struct sysfs_dirent *attr_sd = file->f_path.dentry->d_fsdata; + struct bin_attribute *attr = attr_sd->s_elem.bin_attr.bin_attr; u8 * buffer = file->private_data; kobject_put(kobj); diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c index 525c0e7..4070dc4 100644 --- a/fs/sysfs/dir.c +++ b/fs/sysfs/dir.c @@ -21,11 +21,8 @@ void release_sysfs_dirent(struct sysfs_dirent * sd) repeat: parent_sd = sd->s_parent; - if (sd->s_type & SYSFS_KOBJ_LINK) { - struct sysfs_symlink * sl = sd->s_element; - kobject_put(sl->target_kobj); - kfree(sl); - } + if (sd->s_type & SYSFS_KOBJ_LINK) + kobject_put(sd->s_elem.symlink.target_kobj); if (sd->s_type & SYSFS_COPY_NAME) kfree(sd->s_name); kfree(sd->s_iattr); @@ -52,8 +49,7 @@ static struct dentry_operations sysfs_dentry_ops = { .d_iput = sysfs_d_iput, }; -struct sysfs_dirent *sysfs_new_dirent(const char *name, void *element, - umode_t mode, int type) +struct sysfs_dirent *sysfs_new_dirent(const char *name, umode_t mode, int type) { 
char *dup_name = NULL; struct sysfs_dirent * sd; @@ -76,7 +72,6 @@ struct sysfs_dirent *sysfs_new_dirent(const char *name, void *element, INIT_LIST_HEAD(&sd->s_sibling); sd->s_name = name; - sd->s_element = element; sd->s_mode = mode; sd->s_type = type; @@ -111,7 +106,7 @@ int sysfs_dirent_exist(struct sysfs_dirent *parent_sd, struct sysfs_dirent * sd;
[PATCH 11/14] sysfs: implement bin_buffer
Implement bin_buffer which contains a mutex and pointer to PAGE_SIZE buffer to properly synchronize accesses to per-openfile buffer and prepare for immediate-kobj-disconnect.

Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>
---
 fs/sysfs/bin.c |   64 ++-
 1 files changed, 49 insertions(+), 15 deletions(-)

diff --git a/fs/sysfs/bin.c b/fs/sysfs/bin.c
index 0f0027b..1dd1bf1 100644
--- a/fs/sysfs/bin.c
+++ b/fs/sysfs/bin.c
@@ -20,6 +20,11 @@
 
 #include "sysfs.h"
 
+struct bin_buffer {
+	struct mutex	mutex;
+	void		*buffer;
+};
+
 static int
 fill_read(struct dentry *dentry, char *buffer, loff_t off, size_t count)
 {
@@ -36,7 +41,7 @@ fill_read(struct dentry *dentry, char *buffer, loff_t off, size_t count)
 static ssize_t
 read(struct file *file, char __user *userbuf, size_t bytes, loff_t *off)
 {
-	char *buffer = file->private_data;
+	struct bin_buffer *bb = file->private_data;
 	struct dentry *dentry = file->f_path.dentry;
 	int size = dentry->d_inode->i_size;
 	loff_t offs = *off;
@@ -49,17 +54,23 @@ read(struct file *file, char __user *userbuf, size_t bytes, loff_t *off)
 			count = size - offs;
 	}
 
-	count = fill_read(dentry, buffer, offs, count);
+	mutex_lock(&bb->mutex);
+
+	count = fill_read(dentry, bb->buffer, offs, count);
 	if (count < 0)
-		return count;
+		goto out_unlock;
 
-	if (copy_to_user(userbuf, buffer, count))
-		return -EFAULT;
+	if (copy_to_user(userbuf, bb->buffer, count)) {
+		count = -EFAULT;
+		goto out_unlock;
+	}
 
 	pr_debug("offs = %lld, *off = %lld, count = %zd\n", offs, *off, count);
 
 	*off = offs + count;
 
+ out_unlock:
+	mutex_unlock(&bb->mutex);
 	return count;
 }
 
@@ -79,7 +90,7 @@ flush_write(struct dentry *dentry, char *buffer, loff_t offset, size_t count)
 static ssize_t write(struct file *file, const char __user *userbuf,
 		     size_t bytes, loff_t *off)
 {
-	char *buffer = file->private_data;
+	struct bin_buffer *bb = file->private_data;
 	struct dentry *dentry = file->f_path.dentry;
 	int size = dentry->d_inode->i_size;
 	loff_t offs = *off;
@@ -92,25 +103,38 @@ static ssize_t write(struct file *file, const char __user *userbuf,
 			count = size - offs;
 	}
 
-	if (copy_from_user(buffer, userbuf, count))
-		return -EFAULT;
+	mutex_lock(&bb->mutex);
+
+	if (copy_from_user(bb->buffer, userbuf, count)) {
+		count = -EFAULT;
+		goto out_unlock;
+	}
 
-	count = flush_write(dentry, buffer, offs, count);
+	count = flush_write(dentry, bb->buffer, offs, count);
 	if (count > 0)
 		*off = offs + count;
+
+ out_unlock:
+	mutex_unlock(&bb->mutex);
 	return count;
 }
 
 static int mmap(struct file *file, struct vm_area_struct *vma)
 {
+	struct bin_buffer *bb = file->private_data;
 	struct sysfs_dirent *attr_sd = file->f_path.dentry->d_fsdata;
 	struct bin_attribute *attr = attr_sd->s_elem.bin_attr.bin_attr;
 	struct kobject *kobj = to_kobj(file->f_path.dentry->d_parent);
+	int rc;
 
 	if (!attr->mmap)
 		return -EINVAL;
 
-	return attr->mmap(kobj, attr, vma);
+	mutex_lock(&bb->mutex);
+	rc = attr->mmap(kobj, attr, vma);
+	mutex_unlock(&bb->mutex);
+
+	return rc;
 }
 
 static int open(struct inode * inode, struct file * file)
@@ -118,6 +142,7 @@ static int open(struct inode * inode, struct file * file)
 	struct kobject *kobj = sysfs_get_kobject(file->f_path.dentry->d_parent);
 	struct sysfs_dirent *attr_sd = file->f_path.dentry->d_fsdata;
 	struct bin_attribute *attr = attr_sd->s_elem.bin_attr.bin_attr;
+	struct bin_buffer *bb = NULL;
 	int error = -EINVAL;
 
 	if (!kobj || !attr)
@@ -135,14 +160,22 @@ static int open(struct inode * inode, struct file * file)
 		goto Error;
 
 	error = -ENOMEM;
-	file->private_data = kmalloc(PAGE_SIZE, GFP_KERNEL);
-	if (!file->private_data)
+	bb = kzalloc(sizeof(*bb), GFP_KERNEL);
+	if (!bb)
 		goto Error;
 
+	bb->buffer = kmalloc(PAGE_SIZE, GFP_KERNEL);
+	if (!bb->buffer)
+		goto Error;
+
+	mutex_init(&bb->mutex);
+	file->private_data = bb;
+
 	error = 0;
-goto Done;
+	goto Done;
 
  Error:
+	kfree(bb);
 	module_put(attr->attr.owner);
  Done:
 	if (error)
@@ -155,11 +188,12 @@ static int release(struct inode * inode, struct file * file)
 	struct kobject * kobj = to_kobj(file->f_path.dentry->d_parent);
 	struct sysfs_dirent *attr_sd = file->f_path.dentry->d_fsdata;
 	struct bin_attribute *attr = attr_sd->s_elem.bin_attr.bin_attr;
-	u8 * buffer
[PATCH 14/14] sysfs: kill unnecessary attribute->owner
sysfs is now completely out of driver/module lifetime game. After deletion, a sysfs node doesn't access anything outside sysfs proper, so there's no reason to hold onto the attribute owners. Note that often the wrong modules were accounted for as owners leading to accessing removed modules. This patch kills now unnecessary attribute->owner. Note that with this change, userland holding a sysfs node does not prevent the backing module from being unloaded. For more info regarding lifetime rule cleanup, please read the following message. http://article.gmane.org/gmane.linux.kernel/510293 Signed-off-by: Tejun Heo <[EMAIL PROTECTED]> --- drivers/base/class.c|2 -- drivers/base/core.c |4 drivers/base/firmware_class.c |2 +- drivers/block/pktcdvd.c |3 +-- drivers/char/ipmi/ipmi_msghandler.c | 10 -- drivers/cpufreq/cpufreq_stats.c |3 +-- drivers/cpufreq/cpufreq_userspace.c |2 +- drivers/cpufreq/freq_table.c|1 - drivers/firmware/dcdbas.h |3 +-- drivers/firmware/dell_rbu.c |6 +++--- drivers/firmware/edd.c |2 +- drivers/firmware/efivars.c |6 +++--- drivers/i2c/chips/eeprom.c |1 - drivers/i2c/chips/max6875.c |1 - drivers/infiniband/core/sysfs.c |1 - drivers/input/mouse/psmouse.h |1 - drivers/media/video/pvrusb2/pvrusb2-sysfs.c | 13 - drivers/misc/asus-laptop.c |3 +-- drivers/pci/hotplug/acpiphp_ibm.c |1 - drivers/pci/pci-sysfs.c |4 drivers/pcmcia/socket_sysfs.c |2 +- drivers/rtc/rtc-ds1553.c|1 - drivers/rtc/rtc-ds1742.c|1 - drivers/scsi/arcmsr/arcmsr_attr.c |3 --- drivers/scsi/lpfc/lpfc_attr.c |2 -- drivers/scsi/qla2xxx/qla_attr.c |6 -- drivers/spi/at25.c |1 - drivers/video/aty/radeon_base.c |2 -- drivers/video/backlight/backlight.c |2 +- drivers/video/backlight/lcd.c |2 +- drivers/w1/slaves/w1_ds2433.c |1 - drivers/w1/slaves/w1_therm.c|1 - drivers/w1/w1.c |2 -- fs/ecryptfs/main.c |2 -- fs/ocfs2/cluster/masklog.c |1 - fs/partitions/check.c |1 - fs/sysfs/bin.c | 19 +-- fs/sysfs/file.c | 21 + include/linux/sysdev.h |3 +-- include/linux/sysfs.h |7 +++ kernel/module.c |9 +++-- 
kernel/params.c |1 - net/bridge/br_sysfs_br.c|3 +-- net/bridge/br_sysfs_if.c|3 +-- 44 files changed, 35 insertions(+), 130 deletions(-) diff --git a/drivers/base/class.c b/drivers/base/class.c index d596812..064c1de 100644 --- a/drivers/base/class.c +++ b/drivers/base/class.c @@ -624,7 +624,6 @@ int class_device_add(struct class_device *class_dev) goto out3; class_dev->uevent_attr.attr.name = "uevent"; class_dev->uevent_attr.attr.mode = S_IWUSR; - class_dev->uevent_attr.attr.owner = parent_class->owner; class_dev->uevent_attr.store = store_uevent; error = class_device_create_file(class_dev, &class_dev->uevent_attr); if (error) @@ -639,7 +638,6 @@ int class_device_add(struct class_device *class_dev) } attr->attr.name = "dev"; attr->attr.mode = S_IRUGO; - attr->attr.owner = parent_class->owner; attr->show = show_dev; error = class_device_create_file(class_dev, attr); if (error) { diff --git a/drivers/base/core.c b/drivers/base/core.c index d7fcf82..37930d0 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -563,8 +563,6 @@ int device_add(struct device *dev) dev->uevent_attr.attr.name = "uevent"; dev->uevent_attr.attr.mode = S_IWUSR; - if (dev->driver) - dev->uevent_attr.attr.owner = dev->driver->owner; dev->uevent_attr.store = store_uevent; error = device_create_file(dev, &dev->uevent_attr); if (error) @@ -579,8 +577,6 @@ int device_add(struct device *dev) } attr->attr.name = "dev"; attr->attr.mode = S_IRUGO; - if (dev->driver) - attr->attr.owner = dev->driver->owner; attr->show = show_dev; error = device_create_file(dev, attr); if (
[PATCH 12/14] sysfs: implement sysfs_dirent active reference and immediate disconnect
Opening a sysfs node references its associated kobject, so userland can arbitrarily prolong lifetime of a kobject which complicates lifetime rules in drivers. This patch implements active reference and makes the association between kobject and sysfs immediately breakable.

Now each sysfs_dirent has two reference counts - s_count and s_active. s_count is a regular reference count which guarantees that the containing sysfs_dirent is accessible. As long as an s_count reference is held, all sysfs internal fields in sysfs_dirent are accessible including s_parent and s_name.

The newly added s_active is the active reference count. This is acquired by invoking sysfs_get_active() and it's the caller's responsibility to ensure sysfs_dirent itself is accessible (should be holding s_count one way or the other). Dereferencing sysfs_dirent to access objects out of sysfs proper requires an active reference. This includes access to the associated kobjects, attributes and ops.

The active references can be drained and denied by calling sysfs_deactivate(). All sysfs_dirents must be deactivated after deletion but before the default reference is dropped. This enables immediate disconnect of sysfs nodes. Once a sysfs_dirent is deleted, it won't access any entity external to sysfs proper.

Because attr/bin_attr ops access both the node itself and its parent for kobject, they need to hold active references to both. sysfs_get/put_active_two() helpers are provided to help grabbing both references. Parent's is acquired first and released last.

Unlike other operations, mmapped area lingers on after mmap() is finished and the module implementing it and kobj need to stay referenced till all the mapped pages are gone. This is accomplished by holding one set of active references to the bin_attr and its parent if there have been any mmap during lifetime of an openfile. The references are dropped when the openfile is released.
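The split between a plain count and an active count can be pictured with a small single-threaded model. This is a sketch under stated assumptions, not the sysfs implementation: struct node, node_get_active(), node_put_active() and node_deactivate() are hypothetical names, plain ints replace atomics, and where a real deactivate would wait for outstanding active references to drain, this one simply reports failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct node {
	int count;	/* plain references: keep the struct itself valid */
	int active;	/* active references: >= 0 while live, -1 once denied */
	void *external;	/* stands in for the kobject/attr outside the tree */
};

/* Take an active reference; refused once the node is deactivated. */
static bool node_get_active(struct node *n)
{
	if (n->active < 0)
		return false;
	n->active++;
	return true;
}

static void node_put_active(struct node *n)
{
	n->active--;
}

/* Deny further active references and cut the link to the outside
 * object. Returns false while active references are still held. */
static bool node_deactivate(struct node *n)
{
	if (n->active > 0)
		return false;
	n->active = -1;
	n->external = NULL;
	return true;
}
```

After a successful node_deactivate() the struct is still valid for anyone holding a plain reference, but node_get_active() fails, so nothing reaches the external object again.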
This change makes sysfs lifetime rules independent from both kobject's and module's. It not only fixes several race conditions caused by sysfs not holding onto the proper module when referencing kobject, but also helps fixing and simplifying lifetime management in driver model and drivers by taking sysfs out of the equation. Please read the following message for more info. http://article.gmane.org/gmane.linux.kernel/510293 Signed-off-by: Tejun Heo <[EMAIL PROTECTED]> --- fs/sysfs/bin.c | 95 ++-- fs/sysfs/dir.c | 18 ++- fs/sysfs/file.c | 130 +++-- fs/sysfs/inode.c |8 +++- fs/sysfs/sysfs.h | 107 +++- 5 files changed, 245 insertions(+), 113 deletions(-) diff --git a/fs/sysfs/bin.c b/fs/sysfs/bin.c index 1dd1bf1..69bb8da 100644 --- a/fs/sysfs/bin.c +++ b/fs/sysfs/bin.c @@ -23,6 +23,7 @@ struct bin_buffer { struct mutexmutex; void*buffer; + int mmapped; }; static int @@ -30,12 +31,20 @@ fill_read(struct dentry *dentry, char *buffer, loff_t off, size_t count) { struct sysfs_dirent *attr_sd = dentry->d_fsdata; struct bin_attribute *attr = attr_sd->s_elem.bin_attr.bin_attr; - struct kobject * kobj = to_kobj(dentry->d_parent); + struct kobject *kobj = attr_sd->s_parent->s_elem.dir.kobj; + int rc; + + /* need attr_sd for attr, its parent for kobj */ + if (!sysfs_get_active_two(attr_sd)) + return -ENODEV; - if (!attr->read) - return -EIO; + rc = -EIO; + if (attr->read) + rc = attr->read(kobj, buffer, off, count); - return attr->read(kobj, buffer, off, count); + sysfs_put_active_two(attr_sd); + + return rc; } static ssize_t @@ -79,12 +88,20 @@ flush_write(struct dentry *dentry, char *buffer, loff_t offset, size_t count) { struct sysfs_dirent *attr_sd = dentry->d_fsdata; struct bin_attribute *attr = attr_sd->s_elem.bin_attr.bin_attr; - struct kobject *kobj = to_kobj(dentry->d_parent); + struct kobject *kobj = attr_sd->s_parent->s_elem.dir.kobj; + int rc; + + /* need attr_sd for attr, its parent for kobj */ + if (!sysfs_get_active_two(attr_sd)) + return -ENODEV; - if 
(!attr->write) - return -EIO; + rc = -EIO; + if (attr->write) + rc = attr->write(kobj, buffer, offset, count); - return attr->write(kobj, buffer, offset, count); + sysfs_put_active_two(attr_sd); + + return rc; } static ssize_t write(struct file *file, const char __user *userbuf, @@ -124,14 +141,24 @@ static int mmap(struct file *file, struct vm_area_struct *vma) struct bin_buffer *bb = file->private_data; struct sysfs_dirent *attr_sd = file->f_path.dentry->d_fsdata; struct bin_attribute *attr = attr_sd->s_elem.bin_attr.bin_attr; - struct kobject *kobj =
[PATCH 13/14] sysfs: kill attribute file orphaning
Now that sysfs_dirent can be disconnected from kobject on deletion,
there is no need to orphan each attribute files.  All [bin_]attribute
nodes are automatically orphaned when the parent node is deleted.
Kill attribute file orphaning.

Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>
---
 fs/sysfs/file.c  |   65 ++---
 fs/sysfs/inode.c |   25 
 fs/sysfs/mount.c |    8 --
 fs/sysfs/sysfs.h |   16 -
 4 files changed, 13 insertions(+), 101 deletions(-)

diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index 6dd11ca..37b5ee5 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -51,29 +51,15 @@ static struct sysfs_ops subsys_sysfs_ops = {
 	.store	= subsys_attr_store,
 };
 
-/**
- *	add_to_collection - add buffer to a collection
- *	@buffer:	buffer to be added
- *	@node:	inode of set to add to
- */
-
-static inline void
-add_to_collection(struct sysfs_buffer *buffer, struct inode *node)
-{
-	struct sysfs_buffer_collection *set = node->i_private;
-
-	mutex_lock(&node->i_mutex);
-	list_add(&buffer->associates, &set->associates);
-	mutex_unlock(&node->i_mutex);
-}
-
-static inline void
-remove_from_collection(struct sysfs_buffer *buffer, struct inode *node)
-{
-	mutex_lock(&node->i_mutex);
-	list_del(&buffer->associates);
-	mutex_unlock(&node->i_mutex);
-}
+struct sysfs_buffer {
+	size_t			count;
+	loff_t			pos;
+	char			* page;
+	struct sysfs_ops	* ops;
+	struct semaphore	sem;
+	int			needs_read_fill;
+	int			event;
+};
 
 /**
  *	fill_read_buffer - allocate and fill buffer from object.
@@ -175,10 +161,7 @@ sysfs_read_file(struct file *file, char __user *buf, size_t count, loff_t *ppos)
 
 	down(&buffer->sem);
 	if (buffer->needs_read_fill) {
-		if (buffer->orphaned)
-			retval = -ENODEV;
-		else
-			retval = fill_read_buffer(file->f_path.dentry,buffer);
+		retval = fill_read_buffer(file->f_path.dentry,buffer);
 		if (retval)
 			goto out;
 	}
@@ -276,16 +259,11 @@ sysfs_write_file(struct file *file, const char __user *buf, size_t count, loff_t
 	ssize_t len;
 
 	down(&buffer->sem);
-	if (buffer->orphaned) {
-		len = -ENODEV;
-		goto out;
-	}
 	len = fill_write_buffer(buffer, buf, count);
 	if (len > 0)
 		len = flush_write_buffer(file->f_path.dentry, buffer, len);
 	if (len > 0)
 		*ppos += len;
-out:
 	up(&buffer->sem);
 	return len;
 }
@@ -295,7 +273,6 @@ static int sysfs_open_file(struct inode *inode, struct file *file)
 	struct sysfs_dirent *attr_sd = file->f_path.dentry->d_fsdata;
 	struct attribute *attr = attr_sd->s_elem.attr.attr;
 	struct kobject *kobj = attr_sd->s_parent->s_elem.dir.kobj;
-	struct sysfs_buffer_collection *set;
 	struct sysfs_buffer * buffer;
 	struct sysfs_ops * ops = NULL;
 	int error;
@@ -319,26 +296,14 @@ static int sysfs_open_file(struct inode *inode, struct file *file)
 	else
 		ops = &subsys_sysfs_ops;
 
+	error = -EACCES;
+
 	/* No sysfs operations, either from having no subsystem,
 	 * or the subsystem have no operations.
 	 */
-	error = -EACCES;
 	if (!ops)
 		goto err_mput;
 
-	/* make sure we have a collection to add our buffers to */
-	mutex_lock(&inode->i_mutex);
-	if (!(set = inode->i_private)) {
-		error = -ENOMEM;
-		if (!(set = inode->i_private = kmalloc(sizeof(struct sysfs_buffer_collection), GFP_KERNEL)))
-			goto err_mput;
-		else
-			INIT_LIST_HEAD(&set->associates);
-	}
-	mutex_unlock(&inode->i_mutex);
-
-	error = -EACCES;
-
 	/* File needs write support.
 	 * The inode's perms must say it's ok,
 	 * and we must have a store method.
@@ -365,11 +330,9 @@ static int sysfs_open_file(struct inode *inode, struct file *file)
 	if (!buffer)
 		goto err_mput;
 
-	INIT_LIST_HEAD(&buffer->associates);
 	init_MUTEX(&buffer->sem);
 	buffer->needs_read_fill = 1;
 	buffer->ops = ops;
-	add_to_collection(buffer, inode);
 	file->private_data = buffer;
 
 	/* open succeeded, put active references and pin attr_sd */
@@ -388,10 +351,8 @@ static int sysfs_release(struct inode * inode, struct file * filp)
 {
 	struct sysfs_dirent *attr_sd = filp->f_path.dentry->d_fsdata;
 	struct attribute *attr = attr_sd->s_elem.attr.attr;
-	struct sysfs_buffer * buffer = filp->private_data;
+	struct sysfs_buffer *buffer = filp->private_data;
 
-
[PATCH 07/14] sysfs: add sysfs_dirent->s_name
Add s_name to sysfs_dirent.  This is to further reduce dependency to
the associated dentry.  Name is copied for directories and symlinks
but not for attributes.

Where possible, name dereferences are converted to use sd->s_name.
sysfs_symlink->link_name and sysfs_get_name() are unused now and
removed.

This change allows symlink to be implemented using sysfs_dirent tree
proper, which is the last remaining dentry-dependent sysfs walk.

Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>
---
 fs/sysfs/dir.c        |   33 +
 fs/sysfs/file.c       |    2 +-
 fs/sysfs/inode.c      |   33 +
 fs/sysfs/symlink.c    |    8 +---
 fs/sysfs/sysfs.h      |    7 +++
 include/linux/sysfs.h |    1 +
 6 files changed, 28 insertions(+), 56 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 8c35a60..525c0e7 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -23,10 +23,11 @@ void release_sysfs_dirent(struct sysfs_dirent * sd)
 	if (sd->s_type & SYSFS_KOBJ_LINK) {
 		struct sysfs_symlink * sl = sd->s_element;
-		kfree(sl->link_name);
 		kobject_put(sl->target_kobj);
 		kfree(sl);
 	}
+	if (sd->s_type & SYSFS_COPY_NAME)
+		kfree(sd->s_name);
 	kfree(sd->s_iattr);
 	kmem_cache_free(sysfs_dir_cachep, sd);
 
@@ -51,19 +52,30 @@ static struct dentry_operations sysfs_dentry_ops = {
 	.d_iput		= sysfs_d_iput,
 };
 
-struct sysfs_dirent *sysfs_new_dirent(void *element, umode_t mode, int type)
+struct sysfs_dirent *sysfs_new_dirent(const char *name, void *element,
+				      umode_t mode, int type)
 {
+	char *dup_name = NULL;
 	struct sysfs_dirent * sd;
 
+	if (type & SYSFS_COPY_NAME) {
+		name = dup_name = kstrdup(name, GFP_KERNEL);
+		if (!name)
+			return NULL;
+	}
+
 	sd = kmem_cache_zalloc(sysfs_dir_cachep, GFP_KERNEL);
-	if (!sd)
+	if (!sd) {
+		kfree(dup_name);
 		return NULL;
+	}
 
 	atomic_set(&sd->s_count, 1);
 	atomic_set(&sd->s_event, 1);
 	INIT_LIST_HEAD(&sd->s_children);
 	INIT_LIST_HEAD(&sd->s_sibling);
 
+	sd->s_name = name;
 	sd->s_element = element;
 	sd->s_mode = mode;
 	sd->s_type = type;
@@ -100,8 +112,7 @@ int sysfs_dirent_exist(struct sysfs_dirent *parent_sd,
 
 	list_for_each_entry(sd, &parent_sd->s_children, s_sibling) {
 		if (sd->s_element) {
-			const unsigned char *existing = sysfs_get_name(sd);
-			if (strcmp(existing, new))
+			if (strcmp(sd->s_name, new))
 				continue;
 			else
 				return -EEXIST;
@@ -155,7 +166,7 @@ static int create_dir(struct kobject *kobj, struct dentry *parent,
 		goto out_dput;
 
 	error = -ENOMEM;
-	sd = sysfs_new_dirent(kobj, mode, SYSFS_DIR);
+	sd = sysfs_new_dirent(name, kobj, mode, SYSFS_DIR);
 	if (!sd)
 		goto out_drop;
 	sysfs_attach_dirent(sd, parent->d_fsdata, dentry);
@@ -280,9 +291,7 @@ static struct dentry * sysfs_lookup(struct inode *dir, struct dentry *dentry,
 
 	list_for_each_entry(sd, &parent_sd->s_children, s_sibling) {
 		if (sd->s_type & SYSFS_NOT_PINNED) {
-			const unsigned char * name = sysfs_get_name(sd);
-
-			if (strcmp(name, dentry->d_name.name))
+			if (strcmp(sd->s_name, dentry->d_name.name))
 				continue;
 
 			if (sd->s_type & SYSFS_KOBJ_LINK)
@@ -472,7 +481,7 @@ static int sysfs_dir_open(struct inode *inode, struct file *file)
 	struct sysfs_dirent * sd;
 
 	mutex_lock(&dentry->d_inode->i_mutex);
-	sd = sysfs_new_dirent(NULL, 0, 0);
+	sd = sysfs_new_dirent("_DIR_", NULL, 0, 0);
 	if (sd)
 		sysfs_attach_dirent(sd, parent_sd, NULL);
 	mutex_unlock(&dentry->d_inode->i_mutex);
@@ -539,7 +548,7 @@ static int sysfs_readdir(struct file * filp, void * dirent, filldir_t filldir)
 				if (!next->s_element)
 					continue;
 
-				name = sysfs_get_name(next);
+				name = next->s_name;
 				len = strlen(name);
 				ino = (unsigned long)next;
 
@@ -651,7 +660,7 @@ struct dentry *sysfs_create_shadow_dir(struct kobject *kobj)
 	if (!shadow)
 		goto nomem;
 
-	sd = sysfs_new_dirent(kobj, inode->i_mode, SYSFS_DIR);
+	sd = sysfs_new_dirent("_SHADOW_", kobj, inode->i_mode, SYSFS_DIR);
 	if (!sd)
 		goto nomem;
 	/* point to parent_sd but don't attach to it */
diff --git a/fs/sysfs/file.
[PATCH 10/14] sysfs: reimplement symlink using sysfs_dirent tree
sysfs symlink is implemented by referencing dentry and kobject from
sysfs_dirent - symlink entry references kobject, dentry is used to
walk the tree.  This complicates object lifetimes rules and is
dangerous - for example, there is no way to tell to which module the
target of a symlink belongs and referencing that kobject can make it
linger after the module is gone.

This patch reimplements symlink using only sysfs_dirent tree.  sd for
a symlink points and holds reference to the target sysfs_dirent and
all walking is done using sysfs_dirent tree.  Simpler and safer.

Please read the following message for more info.

  http://article.gmane.org/gmane.linux.kernel/510293

Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>
---
 fs/sysfs/dir.c     |    2 +-
 fs/sysfs/symlink.c |   88 +++
 fs/sysfs/sysfs.h   |    9 +++--
 3 files changed, 53 insertions(+), 46 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 707eba9..5b337c7 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -23,7 +23,7 @@ void release_sysfs_dirent(struct sysfs_dirent * sd)
 	parent_sd = sd->s_parent;
 
 	if (sd->s_type & SYSFS_KOBJ_LINK)
-		kobject_put(sd->s_elem.symlink.target_kobj);
+		sysfs_put(sd->s_elem.symlink.target_sd);
 	if (sd->s_type & SYSFS_COPY_NAME)
 		kfree(sd->s_name);
 	kfree(sd->s_iattr);
diff --git a/fs/sysfs/symlink.c b/fs/sysfs/symlink.c
index 27df635..ff605d3 100644
--- a/fs/sysfs/symlink.c
+++ b/fs/sysfs/symlink.c
@@ -11,50 +11,49 @@
 
 #include "sysfs.h"
 
-static int object_depth(struct kobject * kobj)
+static int object_depth(struct sysfs_dirent *sd)
 {
-	struct kobject * p = kobj;
 	int depth = 0;
-	do { depth++; } while ((p = p->parent));
+
+	for (; sd->s_parent; sd = sd->s_parent)
+		depth++;
+
 	return depth;
 }
 
-static int object_path_length(struct kobject * kobj)
+static int object_path_length(struct sysfs_dirent * sd)
 {
-	struct kobject * p = kobj;
 	int length = 1;
-	do {
-		length += strlen(kobject_name(p)) + 1;
-		p = p->parent;
-	} while (p);
+
+	for (; sd->s_parent; sd = sd->s_parent)
+		length += strlen(sd->s_name) + 1;
+
 	return length;
 }
 
-static void fill_object_path(struct kobject * kobj, char * buffer, int length)
+static void fill_object_path(struct sysfs_dirent *sd, char *buffer, int length)
 {
-	struct kobject * p;
-
 	--length;
-	for (p = kobj; p; p = p->parent) {
-		int cur = strlen(kobject_name(p));
+	for (; sd->s_parent; sd = sd->s_parent) {
+		int cur = strlen(sd->s_name);
 
 		/* back up enough to print this bus id with '/' */
 		length -= cur;
-		strncpy(buffer + length,kobject_name(p),cur);
+		strncpy(buffer + length, sd->s_name, cur);
 		*(buffer + --length) = '/';
 	}
 }
 
-static int sysfs_add_link(struct dentry * parent, const char * name, struct kobject * target)
+static int sysfs_add_link(struct sysfs_dirent * parent_sd, const char * name,
+			  struct sysfs_dirent * target_sd)
 {
-	struct sysfs_dirent * parent_sd = parent->d_fsdata;
 	struct sysfs_dirent * sd;
 
 	sd = sysfs_new_dirent(name, S_IFLNK|S_IRWXUGO, SYSFS_KOBJ_LINK);
 	if (!sd)
 		return -ENOMEM;
 
-	sd->s_elem.symlink.target_kobj = kobject_get(target);
+	sd->s_elem.symlink.target_sd = target_sd;
 	sysfs_attach_dirent(sd, parent_sd, NULL);
 	return 0;
 }
@@ -68,6 +67,8 @@ static int sysfs_add_link(struct dentry * parent, const char * name, struct kobj
 int sysfs_create_link(struct kobject * kobj, struct kobject * target, const char * name)
 {
 	struct dentry *dentry = NULL;
+	struct sysfs_dirent *parent_sd = NULL;
+	struct sysfs_dirent *target_sd = NULL;
 	int error = -EEXIST;
 
 	BUG_ON(!name);
@@ -80,11 +81,27 @@ int sysfs_create_link(struct kobject * kobj, struct kobject * target, const char
 
 	if (!dentry)
 		return -EFAULT;
+	parent_sd = dentry->d_fsdata;
+
+	/* target->dentry can go away beneath us but is protected with
+	 * kobj_sysfs_assoc_lock.  Fetch target_sd from it.
+	 */
+	spin_lock(&kobj_sysfs_assoc_lock);
+	if (target->dentry)
+		target_sd = sysfs_get(target->dentry->d_fsdata);
+	spin_unlock(&kobj_sysfs_assoc_lock);
+
+	if (!target_sd)
+		return -ENOENT;
 
 	mutex_lock(&dentry->d_inode->i_mutex);
 	if (!sysfs_dirent_exist(dentry->d_fsdata, name))
-		error = sysfs_add_link(dentry, name, target);
+		error = sysfs_add_link(parent_sd, name, target_sd);
 	mutex_unlock(&dentry->d_inode->i_mutex);
+
+	if (error)
+		sysfs_put(target_sd);
+
 	return error;
 }
 
@@ -100,14 +117,14 @@ v
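For readers who want to see the path helpers from this patch in isolation, here is a minimal userspace sketch of the same back-to-front walk over parent pointers. It is only an illustration under stated assumptions: struct node, node_depth(), node_path_length(), node_path() and demo_node_path() are hypothetical names invented for this sketch, not sysfs symbols, and the buffer is heap-allocated here rather than supplied by the caller.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a sysfs_dirent: just a name and a parent link. */
struct node {
	const struct node *parent;	/* NULL for the root */
	const char *name;
};

/* Number of steps from the node up to the root. */
static int node_depth(const struct node *n)
{
	int depth = 0;

	for (; n->parent; n = n->parent)
		depth++;
	return depth;
}

/* Bytes needed for a path like "/a/b/c" plus the terminating NUL. */
static size_t node_path_length(const struct node *n)
{
	size_t length = 1;

	for (; n->parent; n = n->parent)
		length += strlen(n->name) + 1;
	return length;
}

/* Build the path back to front; the caller frees the result. */
static char *node_path(const struct node *n)
{
	size_t pos = node_path_length(n);
	char *buf = calloc(pos, 1);

	if (!buf)
		return NULL;

	--pos;				/* leave room for the NUL */
	for (; n->parent; n = n->parent) {
		size_t cur = strlen(n->name);

		/* back up enough to write this name preceded by '/' */
		pos -= cur;
		memcpy(buf + pos, n->name, cur);
		buf[--pos] = '/';
	}
	return buf;
}

/* Self-check: returns 0 when the helpers agree with each other. */
static int demo_node_path(void)
{
	struct node root = { NULL, "" };
	struct node devices = { &root, "devices" };
	struct node pci0 = { &devices, "pci0" };
	struct node eth0 = { &pci0, "eth0" };
	char *path = node_path(&eth0);
	int ok;

	ok = path && strcmp(path, "/devices/pci0/eth0") == 0 &&
	     node_depth(&eth0) == 3 &&
	     node_path_length(&eth0) == strlen("/devices/pci0/eth0") + 1;
	free(path);
	return ok ? 0 : 1;
}
```

As in the patch, the root contributes nothing to the path because the loops stop at the node whose parent is NULL.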
[PATCH 03/14] sysfs: move release_sysfs_dirent() to dir.c
There is no reason this function should be inlined and soon to follow
sysfs object reference simplification will make it heavier.  Move it
to dir.c.

Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>
---
 fs/sysfs/dir.c   |   12 
 fs/sysfs/sysfs.h |   13 +
 2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 5112f88..2d630bf 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -14,6 +14,18 @@
 
 DECLARE_RWSEM(sysfs_rename_sem);
 
+void release_sysfs_dirent(struct sysfs_dirent * sd)
+{
+	if (sd->s_type & SYSFS_KOBJ_LINK) {
+		struct sysfs_symlink * sl = sd->s_element;
+		kfree(sl->link_name);
+		kobject_put(sl->target_kobj);
+		kfree(sl);
+	}
+	kfree(sd->s_iattr);
+	kmem_cache_free(sysfs_dir_cachep, sd);
+}
+
 static void sysfs_d_iput(struct dentry * dentry, struct inode * inode)
 {
 	struct sysfs_dirent * sd = dentry->d_fsdata;
diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
index a77c57e..3b8aae0 100644
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -17,6 +17,7 @@ extern void sysfs_delete_inode(struct inode *inode);
 extern struct inode * sysfs_new_inode(mode_t mode, struct sysfs_dirent *);
 extern int sysfs_create(struct dentry *, int mode, int (*init)(struct inode *));
 
+extern void release_sysfs_dirent(struct sysfs_dirent * sd);
 extern int sysfs_dirent_exist(struct sysfs_dirent *, const unsigned char *);
 extern int sysfs_make_dirent(struct sysfs_dirent *, struct dentry *, void *,
 				umode_t, int);
@@ -97,18 +98,6 @@ static inline struct kobject *sysfs_get_kobject(struct dentry *dentry)
 	return kobj;
 }
 
-static inline void release_sysfs_dirent(struct sysfs_dirent * sd)
-{
-	if (sd->s_type & SYSFS_KOBJ_LINK) {
-		struct sysfs_symlink * sl = sd->s_element;
-		kfree(sl->link_name);
-		kobject_put(sl->target_kobj);
-		kfree(sl);
-	}
-	kfree(sd->s_iattr);
-	kmem_cache_free(sysfs_dir_cachep, sd);
-}
-
 static inline struct sysfs_dirent * sysfs_get(struct sysfs_dirent * sd)
 {
 	if (sd) {
-- 
1.5.0.3

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[PATCHSET #master] sysfs: make sysfs disconnect immediately on deletion, take 2
Hello, all.

This is the second take of sysfs-immediate-disconnect patchset.  In
the last take, rwsem was added to s_elem.dir to protect kobj only.
This wasn't enough because attr and bin_attr need to hold onto not
only the kobject of their parents but also the module backing
themselves and ops too, so the first set still needed separate and
duplicate attribute file orphaning mechanism.

In this take, the rwsem is generalized to become active reference
count.  Now each sysfs_dirent has two reference counts - s_count and
s_active.  s_count is a regular reference count which guarantees that
the containing sysfs_dirent is accessible.  As long as s_count
reference is held, all sysfs internal fields in sysfs_dirent are
accessible including s_parent and s_name.

The newly added s_active is active reference count.  This is acquired
by invoking sysfs_get_active() and it's the caller's responsibility to
ensure sysfs_dirent itself is accessible (should be holding s_count
one way or the other).  Dereferencing sysfs_dirent to access objects
out of sysfs proper requires active reference.  This includes access
to the associated kobjects, attributes and ops.

Because attr/bin_attr ops access both the node itself and its parent
for kobject, they need to hold active references to both.
sysfs_get/put_active_two() helpers are provided to help grabbing both
references.  Parent's is acquired first and released last.

Basically, s_count provides the reference counted objects to the upper
layer while s_active guards low level access such that low level
objects can just go away when they want to, and the same mechanism is
applied to all types of sysfs nodes.  I think it's conceptually
cleaner and thus easier to understand this way.

With all the patches applied, the same test used in the last take ran
9+hrs without any problem.

Changes from the last take are...

* Patch 3 now doesn't move sysfs_get_kobject() as it's replaced by
  active references later.

* Patch 12 updated such that sdir->rwsem is generalized into active
  reference count.

Please read the original lifetime rules discussion[1] and description
of the last take[2] for more info.

Thanks.

--
tejun

[1] http://thread.gmane.org/gmane.linux.kernel/510293
[2] http://thread.gmane.org/gmane.linux.kernel/513334
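To make the two-count rule described in this cover letter concrete, here is a small single-threaded userspace model. It is a sketch under loud assumptions: struct node and the node_*() functions are hypothetical names, plain ints stand in for whatever atomic counters the real patches use, and node_deactivate() only reports whether the count has drained instead of waiting for it. It is not the code from patch 12.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a node that carries two reference counts. */
struct node {
	struct node *parent;	/* internal field, valid while count is held */
	int count;		/* models s_count: keeps the node itself alive */
	int active;		/* models s_active: guards external objects */
	bool deactivated;	/* set once the node has been deleted */
};

/* Try to take an active reference; refused once the node is deactivated. */
static bool node_get_active(struct node *n)
{
	if (n == NULL || n->deactivated)
		return false;
	n->active++;
	return true;
}

static void node_put_active(struct node *n)
{
	n->active--;
}

/* Take active references on the parent first, then on the node. */
static bool node_get_active_two(struct node *n)
{
	if (n == NULL || !node_get_active(n->parent))
		return false;
	if (!node_get_active(n)) {
		node_put_active(n->parent);
		return false;
	}
	return true;
}

/* Release in the opposite order: node first, parent last. */
static void node_put_active_two(struct node *n)
{
	node_put_active(n);
	node_put_active(n->parent);
}

/*
 * Deny new active references.  Returns true if none remain; a real
 * implementation would have to wait here until the count drains.
 */
static bool node_deactivate(struct node *n)
{
	n->deactivated = true;
	return n->active == 0;
}

/* Self-check: returns 0 if the model behaves as the text describes. */
static int demo_active_refs(void)
{
	struct node dir = { .parent = NULL, .count = 1 };
	struct node attr = { .parent = &dir, .count = 1 };

	if (!node_get_active_two(&attr))	/* both live: succeeds */
		return 1;
	if (dir.active != 1 || attr.active != 1)
		return 2;
	node_put_active_two(&attr);

	if (!node_deactivate(&dir))		/* nothing held: drained */
		return 3;
	if (node_get_active_two(&attr))		/* parent deleted: refused */
		return 4;
	if (dir.active != 0 || attr.active != 0)
		return 5;
	return 0;
}
```

The point of the model is the last step: once the parent is deactivated, an attribute operation can no longer reach the external object, even though the node structures themselves are still held by count.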
[PATCH 04/14] sysfs: flatten cleanup paths in sysfs_add_link() and create_dir()
Flatten cleanup paths in sysfs_add_link() and create_dir() to improve
readability and ease further changes to these functions.  This is in
preparation of object reference simplification.

Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>
---
 fs/sysfs/dir.c     |   73 ++-
 fs/sysfs/symlink.c |   27 ++
 2 files changed, 58 insertions(+), 42 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 2d630bf..0005117 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -159,40 +159,53 @@ static int init_symlink(struct inode * inode)
 	return 0;
 }
 
-static int create_dir(struct kobject * k, struct dentry * p,
-		      const char * n, struct dentry ** d)
+static int create_dir(struct kobject *kobj, struct dentry *parent,
+		      const char *name, struct dentry **p_dentry)
 {
 	int error;
 	umode_t mode = S_IFDIR| S_IRWXU | S_IRUGO | S_IXUGO;
+	struct dentry *dentry;
+	struct sysfs_dirent *sd;
 
-	mutex_lock(&p->d_inode->i_mutex);
-	*d = lookup_one_len(n, p, strlen(n));
-	if (!IS_ERR(*d)) {
-		if (sysfs_dirent_exist(p->d_fsdata, n))
-			error = -EEXIST;
-		else
-			error = sysfs_make_dirent(p->d_fsdata, *d, k, mode,
-								SYSFS_DIR);
-		if (!error) {
-			error = sysfs_create(*d, mode, init_dir);
-			if (!error) {
-				inc_nlink(p->d_inode);
-				(*d)->d_op = &sysfs_dentry_ops;
-				d_rehash(*d);
-			}
-		}
-		if (error && (error != -EEXIST)) {
-			struct sysfs_dirent *sd = (*d)->d_fsdata;
-			if (sd) {
-				list_del_init(&sd->s_sibling);
-				sysfs_put(sd);
-			}
-			d_drop(*d);
-		}
-		dput(*d);
-	} else
-		error = PTR_ERR(*d);
-	mutex_unlock(&p->d_inode->i_mutex);
+	mutex_lock(&parent->d_inode->i_mutex);
+
+	dentry = lookup_one_len(name, parent, strlen(name));
+	if (IS_ERR(dentry)) {
+		error = PTR_ERR(dentry);
+		goto out_unlock;
+	}
+
+	error = -EEXIST;
+	if (sysfs_dirent_exist(parent->d_fsdata, name))
+		goto out_dput;
+
+	error = sysfs_make_dirent(parent->d_fsdata, dentry, kobj, mode,
+				  SYSFS_DIR);
+	if (error)
+		goto out_drop;
+
+	error = sysfs_create(dentry, mode, init_dir);
+	if (error)
+		goto out_sput;
+
+	inc_nlink(parent->d_inode);
+	dentry->d_op = &sysfs_dentry_ops;
+	d_rehash(dentry);
+
+	*p_dentry = dentry;
+	error = 0;
+	goto out_dput;
+
+ out_sput:
+	sd = dentry->d_fsdata;
+	list_del_init(&sd->s_sibling);
+	sysfs_put(sd);
+ out_drop:
+	d_drop(dentry);
+ out_dput:
+	dput(dentry);
+ out_unlock:
+	mutex_unlock(&parent->d_inode->i_mutex);
 	return error;
 }
 
diff --git a/fs/sysfs/symlink.c b/fs/sysfs/symlink.c
index 7b9c5bf..b463f17 100644
--- a/fs/sysfs/symlink.c
+++ b/fs/sysfs/symlink.c
@@ -49,30 +49,33 @@ static int sysfs_add_link(struct dentry * parent, const char * name, struct kobj
 {
 	struct sysfs_dirent * parent_sd = parent->d_fsdata;
 	struct sysfs_symlink * sl;
-	int error = 0;
+	int error;
 
 	error = -ENOMEM;
-	sl = kmalloc(sizeof(*sl), GFP_KERNEL);
+	sl = kzalloc(sizeof(*sl), GFP_KERNEL);
 	if (!sl)
-		goto exit1;
+		goto err_out;
 
 	sl->link_name = kmalloc(strlen(name) + 1, GFP_KERNEL);
 	if (!sl->link_name)
-		goto exit2;
+		goto err_out;
 
 	strcpy(sl->link_name, name);
 	sl->target_kobj = kobject_get(target);
 
 	error = sysfs_make_dirent(parent_sd, NULL, sl, S_IFLNK|S_IRWXUGO,
 				SYSFS_KOBJ_LINK);
-	if (!error)
-		return 0;
-
-	kobject_put(target);
-	kfree(sl->link_name);
-exit2:
-	kfree(sl);
-exit1:
+	if (error)
+		goto err_out;
+
+	return 0;
+
+ err_out:
+	if (sl) {
+		kobject_put(sl->target_kobj);
+		kfree(sl->link_name);
+		kfree(sl);
+	}
 	return error;
 }
 
-- 
1.5.0.3
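The single err_out label in the sysfs_add_link() hunk works because the structure is zero-allocated, so the cleanup code can test each field. A minimal userspace sketch of that pattern follows. Everything in it is an assumption made for illustration: struct link, make_link(), the fail_at knob used to force a failure at a chosen step, and the plain int standing in for a reference count are all hypothetical.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical link object: an owned name plus a counted target. */
struct link {
	char *name;
	int *target_refs;	/* points at the target's reference count */
};

/*
 * Create a link.  fail_at (1..3) forces a failure at that step so the
 * unwinding can be exercised; 0 means no forced failure.
 */
static int make_link(const char *name, int *target_refs, int fail_at,
		     struct link **out)
{
	struct link *l;
	int error;

	error = -ENOMEM;
	l = calloc(1, sizeof(*l));	/* zeroed, so cleanup can test fields */
	if (!l || fail_at == 1)
		goto err_out;

	l->name = malloc(strlen(name) + 1);
	if (!l->name || fail_at == 2)
		goto err_out;
	strcpy(l->name, name);

	l->target_refs = target_refs;	/* take a reference on the target */
	(*l->target_refs)++;

	error = -EEXIST;
	if (fail_at == 3)		/* stands in for a failed registration */
		goto err_out;

	*out = l;
	return 0;

err_out:
	if (l) {
		if (l->target_refs)
			(*l->target_refs)--;
		free(l->name);		/* free(NULL) is a no-op */
		free(l);
	}
	return error;
}

/* Self-check: returns 0 if success and every failure path balance out. */
static int demo_make_link(void)
{
	int refs = 0;
	struct link *l = NULL;
	int step;

	if (make_link("eth0", &refs, 0, &l) != 0 || !l || refs != 1)
		return 1;
	if (strcmp(l->name, "eth0") != 0)
		return 2;
	refs--;				/* drop the reference, free the link */
	free(l->name);
	free(l);

	for (step = 1; step <= 3; step++) {
		l = NULL;
		if (make_link("eth0", &refs, step, &l) == 0 || l || refs != 0)
			return 10 + step;
	}
	return 0;
}
```

The trade-off against the stacked labels used in create_dir() is that one label needs every field to be safely testable, while stacked labels need the acquisition order to stay in sync with the label order.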
Re: Ten percent test
On Sun, 2007-04-08 at 13:56 -0400, Gene Heskett wrote:
> On Sunday 08 April 2007, Mike Galbraith wrote:
> >On Sun, 2007-04-08 at 07:33 -0400, Gene Heskett wrote:
> >> That seems to be the killer loading here, building a kernel (make -j3)
> >> doesn't seem to lag it all that bad.  One session of gzip -best makes
> >> it fall plumb over though, which was a disappointment.
> >
> >Can you make a testcase that doesn't require amanda?
> >
> >	-Mike
>
> Sure.  Try 'tar czf nameofarchive.tar.gz /path/to-dir-to-be-backed-up'
>
> Or, from the runtar log from this morning, and this is all one line:
>
> runtar.20070408022016.debug:running: /bin/tar: 'gtar' '--create' '--file' '-'
> '--directory' '/usr/dlds-rpms' '--one-file-system' '--listed-incremental'
> '/usr/local/var/amanda/gnutar-lists/coyote_usr_dlds-rpms_1.new' '--sparse'
> '--ignore-failed-read' '--totals' '--exclude-from'
> '/tmp/amanda/sendbackup._usr_dlds-rpms.20070408022016.exclude' '.'
>
> and amanda will if requested, pipe that output through a |gzip -best, and
> its this process that brings the machine to the table begging for scraps
> like a puppy.  Tar by itself can be felt but isn't bad.

So tar -cvf - / | gzip --best | tar -tvzf - should reproduce the problem?

	-Mike
REISER4 FOR INCLUSION IN THE LINUX KERNEL.
REISER4 FOR INCLUSION IN THE LINUX KERNEL.

Dave Lynch takes a reasoned approach to REISER4.

Dave Lynch wrote:
>
> Jeff Garzik wrote:
> >
> > If the compelling reason is that it needs a test, I'd say its not ready.
> >
>
> Can you please elaborate ? I am not sure I understand what you are
> arguing ?

Jeff Garzik is "saying" that he wants REISER4 to stay out of the main
kernel, for reasons he is not willing to tell you.

> Despite his substantially less than polite rhetoric, I have read
> Hans's post from months if not years ago.
>
> Aside from the pissing contests - which where not entirely one
> sided,

On the basis of what I have seen here,... Hans Reiser was probably an
angel.

> I actually beleive that Hans made a reasonable case that
> Reiser4 had gone about as far as it could reasonably go with regard
> to testing, robustness, ... without the broader base of use that
> even an experimental filesystem in distribution tree would get.

Of course, this is an entirely reasonable request of Reiser's. One met
with an array of unreasonable actions, but mainly STALLING which has
led to REISER4 never becoming part of the main kernel. It has also led
to many false claims about REISER4. Claims that are never backed up
with solid statistics, but used to keep REISER4 out of the kernel and
tar its reputation.

> I for one would at least play with it if it were in the distribution
> tree.

I AM SURE THERE ARE A HUGE NUMBER OF PEOPLE WHO WOULD GIVE IT A TRY.

> As far as I could tell Hans pretty much everything else that
> was demanded. Hans eventually caved and provided - albeit with much
> pissing and moaning, and holy than thou rhetoric.

It was not his pissing and moaning, etc,... these were just excuses to
keep REISER4 from succeeding. The truth is, that any excuse would do.
The real reasons are financial and backed by big money (sometimes, big
egos).

> The argument that anything that needs testing can't get into the
> distribution tree's is specious. There is alot of poorly tested crap in
> the distribution trees.

Yes, the argument that anything that needs testing can't get in is
indeed stupid. But stupid things often work. Is REISER4 in the kernel?
Is REISER4 a success?

> But separately, there is the issue of scale. Namesys claims that
> they have no currently know bugs, faults ... - with their base of
> internal and external users.
>
> I would fully expect new failures to crop up with any filesystem,
> driver, ... moving up an order of magnitude in users.
>
> Are you going to subject all filesystems and drivers to the same
> high standards you are placing on Reiser4 ? If so then we need to strip
> the distribution tree now.

No. Only those things that threaten big money.

> I am not looking to defend Hans - he is likely to be in jail and no
> longer a factor for a long time. Nor am I looking to make or support
> claims for Reiser4.

Why not defend Hans? He is in jail on what appear to be trumped-up
charges, just like the trumped-up complaints about his filesystem.

> But I am asking - why we can not get past the bad blood, rhetoric,
> and zealotry -which to my eyes has not been all one sided.

Money talks, BS walks. Reiser4 is a little guy. You should play in my
league.

> I am NOT looking for a technical explanation of all the relative
> merits and demerits of Reiser4.
>
> I do not care for arguments about whether it compresses 0's well, or
> that tail combining is a bad thing. They may have merit, but there is
> not a filesystem that is going to be all things to all people.

Yeap.

> Whether Reiser4 is a small niche filesystem or a significant general
> use one, is a decision that should be reached by its performance
> in practice, not it rhetoric. Regardless, even as a niche filesystem,
> I beleive at this point it merits inclusion.

Yeap. REISER4 merits inclusion.

John.
Re: [RFC][PATCH] Kprobes: The ON/OFF knob thru debugfs
On Sun, Apr 08, 2007 at 11:22:31AM +0100, Christoph Hellwig wrote:
> On Wed, Apr 04, 2007 at 05:43:49PM +0530, Ananth N Mavinakayanahalli wrote:
> > This patch provides a debugfs knob to turn kprobes on/off
> >
> > o A new file /debug/kprobes/enabled indicates if kprobes is enabled or
> >   not (default enabled)
> > o Echoing 0 to this file will disarm all installed probes
> > o Any new probe registration when disabled will register the probe but
> >   not arm it. A message will be printed out in such a case.
> > o When a value 1 is echoed to the file, all probes (including ones
> >   registered in the intervening period) will be enabled
> > o Unregistration will happen irrespective of whether probes are globally
> >   enabled or not.
> > o Update Documentation/kprobes.txt to reflect these changes. While there
> >   also update the doc to make it current.
>
> Looks good.
>
> When I suggested a user interface to enable/disable probes was nice to
> have I was more thinking about a interface to enable/disable individual
> probes.  Any chance you could try to implement that aswell as see if
> any code can be shared with this feature?

That's on the TODO list - any preferences on what the debugfs control
should look like? One file per kprobe seems simplest, but it'd be
unwieldy if there are hundreds of active probes.

> > -		arch_arm_kprobe(p);
> > +			arch_arm_kprobe(p);
> > +	} else
> > +		printk("Kprobes are globally disabled. This kprobe [@ %p] "
> > +		"will be enabled with all other probes\n", p->addr);
>
> This printk seems far too verbose.  Just remove it and make sure
> the debugfs interface has an indicator of whether probes are en- or
> disabled.

Agreed... and "enabled" file is the indicator.

Andrew, please include this incremental patch against 2.6.21-rc6-mm1
that removes the verbose printk.

o Remove verbose printk during registration with kprobes globally disabled
o Print out a message when kprobes are enabled/disabled globally

Signed-off-by: Ananth N Mavinakyanahalli <[EMAIL PROTECTED]>
---
 kernel/kprobes.c |    7 +++
 1 files changed, 3 insertions(+), 4 deletions(-)

Index: linux-2.6.21-rc6/kernel/kprobes.c
===
--- linux-2.6.21-rc6.orig/kernel/kprobes.c
+++ linux-2.6.21-rc6/kernel/kprobes.c
@@ -574,10 +574,7 @@ static int __kprobes __register_kprobe(s
 			register_page_fault_notifier(&kprobe_page_fault_nb);
 
 		arch_arm_kprobe(p);
-	} else
-		printk("Kprobes are globally disabled. This kprobe [@ %p] "
-		"will be enabled with all other probes\n", p->addr);
-
+	}
 out:
 	mutex_unlock(&kprobe_mutex);
 
@@ -928,6 +925,7 @@ static void __kprobes enable_all_kprobes
 	}
 
 	kprobe_enabled = true;
+	printk("Kprobes globally enabled\n");
 
 already_enabled:
 	mutex_unlock(&kprobe_mutex);
@@ -948,6 +946,7 @@ static void __kprobes disable_all_kprobe
 		goto already_disabled;
 
 	kprobe_enabled = false;
+	printk("Kprobes globally disabled\n");
 	for (i = 0; i < KPROBE_TABLE_SIZE; i++) {
 		head = &kprobe_table[i];
 		hlist_for_each_entry_rcu(p, node, head, hlist) {
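The semantics of the global knob listed in the quoted patch description can be modelled in a few lines of userspace C. This is only a sketch of the described behaviour, not the kprobes code: the fixed-size probes[] table, struct probe, probe_register(), probe_unregister(), probes_set_enabled() and demo_probe_knob() are all hypothetical names, and there is no locking.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_PROBES 8

/* Hypothetical probe slot: registered and armed are tracked separately. */
struct probe {
	bool registered;
	bool armed;
};

static struct probe probes[MAX_PROBES];
static bool probes_enabled = true;	/* default enabled */

/* Registration always succeeds for a free slot; arming depends on the knob. */
static int probe_register(size_t i)
{
	if (i >= MAX_PROBES || probes[i].registered)
		return -1;
	probes[i].registered = true;
	probes[i].armed = probes_enabled;
	return 0;
}

/* Unregistration works whether or not probes are globally enabled. */
static int probe_unregister(size_t i)
{
	if (i >= MAX_PROBES || !probes[i].registered)
		return -1;
	probes[i].registered = false;
	probes[i].armed = false;
	return 0;
}

/* Models writing 0 or 1 to the knob: arm or disarm every registered probe. */
static void probes_set_enabled(bool on)
{
	size_t i;

	if (on == probes_enabled)
		return;
	probes_enabled = on;
	for (i = 0; i < MAX_PROBES; i++)
		if (probes[i].registered)
			probes[i].armed = on;
}

/* Self-check: returns 0 if the model follows the listed rules. */
static int demo_probe_knob(void)
{
	if (probe_register(0) != 0 || !probes[0].armed)
		return 1;

	probes_set_enabled(false);		/* disarms installed probes */
	if (probes[0].armed)
		return 2;

	if (probe_register(1) != 0 || probes[1].armed)
		return 3;			/* registered but not armed */
	if (probe_unregister(0) != 0)
		return 4;			/* allowed while disabled */

	probes_set_enabled(true);		/* arms probes added meanwhile */
	if (!probes[1].armed || probes[0].armed)
		return 5;
	return 0;
}
```

A per-probe control, as raised in the reply, would amount to a second flag per slot that is combined with the global one when deciding whether to arm.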
Re: Ten percent test
On Monday 09 April 2007, Mike Galbraith wrote:
>On Sun, 2007-04-08 at 13:04 -0400, Gene Heskett wrote:
>> On Sunday 08 April 2007, Ingo Molnar wrote:
>> >and note that a year ago Mike did a larger patch too, not unlike his
>> >current patch - but we hoped that his smaller change would be
>> > sufficient - and nobody came along and said "i tested Mike's and the
>> > difference is significant on my system".
>>
>> May I suggest that while it may have been noticeable, it was
>> not 'significant', so we didn't sing praises and bow to mecca at the
>> time.
>
>Actually, there was practically nil interest in testing. We made a
>couple of minor adjustments to the interactivity logic, and all went
>quiet, so I didn't think it was enough of a problem to require more
>intrusive countermeasures.
>
>	-Mike

Does one of these messages have a URL so I can test the latest of your
patches for -rc6? Or was the one Ingo sent the most recent? Putting
that URL in your sig would be nice, and might result in its getting a
lot more exercise, which should mean more feedback.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Got a complaint about the Internal Revenue Service?
Call the convenient toll-free "IRS Taxpayer Complaint Hot Line Number":
1-800-AUDITME
Re: Ten percent test
On Sun, 2007-04-08 at 13:04 -0400, Gene Heskett wrote:
> On Sunday 08 April 2007, Ingo Molnar wrote:
> >and note that a year ago Mike did a larger patch too, not unlike his
> >current patch - but we hoped that his smaller change would be sufficient
> >- and nobody came along and said "i tested Mike's and the difference is
> >significant on my system".
>
> May I suggest that while it may have been noticeable, it was
> not 'significant', so we didn't sing praises and bow to mecca at the
> time.

Actually, there was practically nil interest in testing. We made a
couple of minor adjustments to the interactivity logic, and all went
quiet, so I didn't think it was enough of a problem to require more
intrusive countermeasures.

	-Mike
Re: Add a norecovery option to ext3/4?
Samuel Thibault wrote:
>> Hm, so the root cause there seems that the installer found 2 legs of a
>> mirror and mounted them independently, recovering them independently...
>> But why did that cause problems?
>
> Because that trashed his data (or at least it didn't help to keep data
> safe).
>
>> Other options you may have in the installer, though, is to check for
>> md superblocks before mounting bare partitions, or maybe use the
>> BLKROSET ioctl to set the block device to read-only prior to mount,
>> for added insurance...
>
> That's one of the things proposed in the bug report, yes.

The reason I suggest other options is that intentionally mounting a
corrupted FS may not really be the way you want to go... norecovery on
xfs at least is an option of last resort, not something to use by
default.

-Eric
Re: Add a norecovery option to ext3/4?
Eric Sandeen wrote on Sun 08 Apr 2007 22:24:50 -0500:
> Samuel Thibault wrote:
> >Distribution installers usually try to probe OSes for building a suited
> >grub menu. Unfortunately, mounting an ext3 partition, even in read-only
> >mode, does perform some operations on the filesystem (log recovery).
> >This is not a good idea since it may silently garbage data.
>
> Can you elaborate? Under what circumstances is log replay going to harm
> data? Do you mean that the installer mounts partitions, looking for
> what OS is installed? How is that harmful?
>
> Ohhh... this is http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=417407
> isn't it?

Yes.

> Hm, so the root cause there seems that the installer found 2 legs of a
> mirror and mounted them independently, recovering them independently...
> But why did that cause problems?

Because that trashed his data (or at least it didn't help to keep data
safe).

> Other options you may have in the installer, though, is to check for
> md superblocks before mounting bare partitions, or maybe use the
> BLKROSET ioctl to set the block device to read-only prior to mount,
> for added insurance...

That's one of the things proposed in the bug report, yes.

Samuel
Re: Add a norecovery option to ext3/4?
Samuel Thibault wrote:
> Hi,
>
> Distribution installers usually try to probe OSes for building a suited
> grub menu. Unfortunately, mounting an ext3 partition, even in read-only
> mode, does perform some operations on the filesystem (log recovery).
> This is not a good idea since it may silently garbage data.

Can you elaborate? Under what circumstances is log replay going to harm
data? Do you mean that the installer mounts partitions, looking for
what OS is installed? How is that harmful?

Ohhh... this is http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=417407
isn't it?

Hm, so the root cause there seems that the installer found 2 legs of a
mirror and mounted them independently, recovering them independently...
But why did that cause problems?

> XFS has a norecovery option that allows to disable that, I'd say ext3/4
> should have it too.

The xfs mount option is useful on a purely read-only device, or if the
log is corrupted to the point where it can't be replayed... It was put
in place 9+ years ago. :) I'd have to ask the sgi guys to dig & see
what the original use was for...

It'd be easy enough to add to ext3/4, I suppose.

Other options you may have in the installer, though, is to check for
md superblocks before mounting bare partitions, or maybe use the
BLKROSET ioctl to set the block device to read-only prior to mount,
for added insurance...

-Eric
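For reference, the BLKROSET suggestion could look roughly like this
from an installer. This is only a sketch: the helper name
blockdev_set_readonly() is made up, the call needs CAP_SYS_ADMIN, and
it does the same job as "blockdev --setro" from util-linux. BLKROSET
itself comes from <linux/fs.h> and takes a pointer to an int.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>	/* BLKROSET */

/*
 * Mark a block device read-only before anything mounts it.
 * Returns 0 on success, -1 if the device cannot be opened or the
 * ioctl is refused (not a block device, no privilege, ...).
 */
int blockdev_set_readonly(const char *dev)
{
	int ro = 1;	/* non-zero means "set read-only" */
	int fd, ret;

	assert(dev != NULL);

	fd = open(dev, O_RDONLY);
	if (fd < 0)
		return -1;

	ret = ioctl(fd, BLKROSET, &ro);
	close(fd);

	return ret < 0 ? -1 : 0;
}
```

An installer would call this on each candidate partition before the
probe mount, and treat a -1 as a reason to skip the partition rather
than mount it anyway.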
Re: [PATCH 1/8] Enhance process freezer interface for usage beyond software suspend
On Mon, Apr 02, 2007 at 10:51:27PM +0200, Pavel Machek wrote:
> > >
> > > Should we create CONFIG_FREEZER?
> >
> > Why do you think so? I think the freezer should be compiled automatically
> > if any of the above is set, which is what this directive really means.
>
> Kconfig can do that. ("select statement"). If we have one such ifdef,
> it is okay, but if it would be more of them.
>

Ok.

> > > Eh? Why does kprobes code depend on config_pm?
> >
> > Because it uses the freezer? ;-)
>
> That is no longer true after this patch... Ugly ifdef above makes sure
> freezer is there for kprobes. I'm trying to say that #if above is
> now broken. Actually it was probably always broken, but it just became
> more so.

I have already removed it in my version 3.

Thanks and Regards
gautham.
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"
[PATCH] CONFIG_PACKET_MMAP should depend on MMU
The option CONFIG_PACKET_MMAP should depend on MMU.

Signed-off-by: Aubrey.Li <[EMAIL PROTECTED]>
---
 net/packet/Kconfig |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/packet/Kconfig b/net/packet/Kconfig
index 34ff93f..959c272 100644
--- a/net/packet/Kconfig
+++ b/net/packet/Kconfig
@@ -17,7 +17,7 @@ config PACKET
 
 config PACKET_MMAP
 	bool "Packet socket: mmapped IO"
-	depends on PACKET
+	depends on PACKET && MMU
 	help
 	  If you say Y here, the Packet protocol driver will use an IO
 	  mechanism that results in faster communication.
-- 
1.5.1
Re: Avoid checking for cpu gone when CONFIG_HOTPLUG_CPU not defined
On Fri, Apr 06, 2007 at 11:25:00PM -0700, Andrew Morton wrote:
> On Fri, 6 Apr 2007 14:41:50 -0700 "Keshavamurthy, Anil S" <[EMAIL PROTECTED]> wrote:
>
> > Subject: Avoid checking for cpu gone when CONFIG_HOTPLUG_CPU not defined
> >
> > Avoid checking for cpu gone in mm hot path when
> > CONFIG_HOTPLUG_CPU is not defined.
> >
> > Signed-off-by: Anil S Keshavamurthy <[EMAIL PROTECTED]>
> >
> > ---
> >  arch/i386/kernel/smp.c |4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > Index: work/arch/i386/kernel/smp.c
> > ===
> > --- work.orig/arch/i386/kernel/smp.c
> > +++ work/arch/i386/kernel/smp.c
> > @@ -365,10 +365,12 @@ static void flush_tlb_others(cpumask_t c
> >  	BUG_ON(cpu_isset(smp_processor_id(), cpumask));
> >  	BUG_ON(!mm);
> >
> > +#ifdef CONFIG_HOTPLUG_CPU
> >  	/* If a CPU which we ran on has gone down, OK. */
> >  	cpus_and(cpumask, cpumask, cpu_online_map);
> > -	if (cpus_empty(cpumask))
> > +	if (unlikely(cpus_empty(cpumask)))
> >  		return;
> > +#endif
> >
> >  	/*
> >  	 * i'm not happy about this global shared spinlock in the
>
> Fair enough.
>
> The code you're touching in with the original CPU-hotplug-for-i386 patches.
>
> x86_64 doesn't do it. It handles tlb flushing differently anyway. But I
> suspect that x86_64 is just buggy, unless all callers of flush_tlb_others()
> have taken care to disable preemption prior to their calculation of the
> passed-in cpumask.
>
> Shudder. Gautham, this is code which we can cheerfully delete when we get
> the freezer stuff done. Fortunately, Anil's patch will make it nice and
> easy to find again.

Ok, I will make a note of this one. If the IO-test results are good,
I hope to post the patchset sometime this week.

Thanks and Regards
gautham.
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"
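The shape of Anil's change, reduced to something that compiles outside
the kernel. This is only an illustration: a plain unsigned long stands
in for cpumask_t, flush_targets() is a made-up name, unlikely() is
spelled out with the GCC/clang builtin, and the assert() is a stand-in
for a BUG_ON-style precondition that is assumed here, not taken from
the quoted hunk.

```c
#include <assert.h>

/* Same idea as the kernel's unlikely(): hint that the branch is cold. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/*
 * Return the set of CPUs that still need a flush; 0 means nothing
 * is left to do. Without CONFIG_HOTPLUG_CPU no CPU can disappear,
 * so the online-map check is compiled out of the hot path entirely.
 */
unsigned long flush_targets(unsigned long cpumask, unsigned long online_map)
{
	assert(cpumask != 0);	/* caller must name at least one CPU */

#ifdef CONFIG_HOTPLUG_CPU
	/* If a CPU which we ran on has gone down, OK. */
	cpumask &= online_map;
	if (unlikely(cpumask == 0))
		return 0;
#else
	(void)online_map;	/* unused in this configuration */
#endif

	return cpumask;
}
```

Building with -DCONFIG_HOTPLUG_CPU enables the check; without it the
function simply returns its first argument.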
Re: [patch 2.6.21-rc5-git] make /proc/acpi/wakeup more useful
On Sat, 2007-04-07 at 13:08 -0700, David Brownell wrote:
> On Friday 06 April 2007 10:01 pm, Greg KH wrote:
>
> > Are you _sure_ you have a 1-to-1 relationship here? No multiple devices
> > pointing to the same acpi node? Or the other way around? If so, you
> > are going to have to change the name to be something more unique.
>
> I've wondered that too. The short answer: APCI only supports 1-1
> here.

Right.

> It will emit warnings if it tries to bind more than one ACPI
> device to a given "real" device ... but errors the other way are
> silently ignored.
>

My understanding is different.
First, one "real" device can only have one
device.archdata.acpi_handle, which means it can only be bound to one
ACPI device.
Second, AE_ALREADY_EXISTS will be returned when ACPI tries to bind more
than one "real" device to the same ACPI device.

> By adding a warning over this create-links patch, I found that the
> system in the $SUBJECT patch (and likely every ACPI system) has
> two different nodes that correspond to one ACPI node:
>
>   /sys/devices/pci:00 ... pci root node
>   /sys/devices/pnp0/00:00 ... id PNP0a03
>   /sys/devices/acpi_system:00/device:00/PNP0A03:00 ... ditto
>
> Arguably that's too many sysfs nodes for one device...
>
> Plus, there's the issue of flakey ACPI tables; in the $SUBJECT patch
> both MDM and AUD nodes exist in the ACPI namespace, but they could
> only refer to one PCI device (with MDM as the wakeup source, not AUD
> as listed in the table). Or maybe that's another case where the ACPI
> code isn't handling the tables as sensibly as it might...
>

Could you attach this acpidump please? :)

Thanks,
Rui
Re: [patch 07/20] Allow paravirt backend to choose kernel PMD sharing
On Fri, 06 Apr 2007 17:02:58 -0700 Jeremy Fitzhardinge <[EMAIL PROTECTED]> wrote:
>> You're too kind. wli's comment on the first version of this patch was
>> something along the lines of "this patch causes a surprising amount of
>> damage for what little it achieves".

On Fri, Apr 06, 2007 at 05:28:44PM -0700, Andrew Morton wrote:
> Damn, I wish I'd said that.

ISTR it went:

On Fri, Feb 16, 2007 at 02:21:07PM -0800, William Lee Irwin III wrote:
> The amount of violence this patch manages to commit is phenomenal for
> what little it actually does. There are also a number of oddities

Cheers.

-- wli
Re: [PATCH] Scheduler: Improving the scheduler performance.
[EMAIL PROTECTED] wrote:
> On Sat, 07 Apr 2007 23:42:20 +0600, root said:
>> As we know that, linux scheduler use separate runqueue for every CPU
>> of a multiprocessor system, which having an active and an expired
>> array. If we use only one expired array, then the CPUs of a
>> multiprocessor system will be able to share their expired task via
>> the accumulated expired array,
>
> I got this far, and the first thought that popped into my head was:
> "Wow. This might actually win on a UP or small MP (2-15 CPU). But the
> lock contention on a big 512-CPU machoflops box is likely going to
> *suck*".
>
> For that matter, my quick eyeballing of the code, although it doesn't
> *find* any race conditions, doesn't convince me there's any protection
> taken to make sure there aren't any. Is there some subtle algorithmic
> trick I'm missing to ensure Nothing Bad Can Happen?

Lock contention is going to be the least of your worries. Destroying
CPU affinity is the big one I suspect.

-- 
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is. Each group
calls the other unpatriotic.
Re: [RFC, PATCH 1/3] introduce SYS_CLONE_MASK
I concur with Eric's assessment. Adding new magic bits to the generic
clone path seems like a poor way to cope with kernel threads. I think
it's better if kernel thread setup gets less like normal user process
setup.

I also agree with Eric that PPID of 0 is a very natural way for kernel
threads to be displayed. We need to know more about the nature of the
compatibility issue in procps to judge whether there is good reason to
avoid changing it.

Thanks,
Roland
Re: [PATCH] zap_other_threads: remove unneeded ->exit_signal change
I think that's correct.

Thanks,
Roland
Re: [linux-usb-devel] HID bus prototype - 20070408
Hi,

It seems the hid-pidff driver should also be sticky.

Good luck

- Li Yu
Re: If not readdir() then what?
On Sun, Apr 08, 2007 at 12:28:32PM -0700, H. Peter Anvin wrote:
> Theodore Tso wrote:
> >It doesn't state explicitly that you can use the telldir cookie()
> >after closing the directory stream using closedir() and then reopening
> >it using opendir(), but given that it states that results are
> >undefined after a rewinddir() --- which is much less violent than a
> >closedir()/opendir(), I would definitely argue that an application
> >programmer would be very ill-advised to rely on this working.
> >
> >(Of course, I'd argue that an application programmer shouldn't use
> >telldir/seekdir at all.)
> >
> >Ulrich, is it too late to insert a clarification that the telldir()
> >cookie isn't guaranteed to be valid after closedir() *or* rewinddir()?
>
> More fundamentally, the telldir cookie should never be valid when
> applied to a different DIR * (even one that refers to the same directory.)

Well, Joern thought that rm -rf might be relying on the telldir cookie
being valid in precisely that circumstance. If that is true, I'd argue
that this is a BUG in GNU coreutils that should be fixed...

						- Ted
Re: [PATCH] Re: kernel oops with badly formatted module option
On Sat, 2007-04-07 at 19:47 -0700, Randy Dunlap wrote:
> On Sat, 07 Apr 2007 19:21:01 -0500 Larry Finger wrote:
>
> > With the following line in /etc/modprobe.conf.local:
> >
> > options bcm43xx fwpostfix = ".fw3" locale=8
> >
> > the kernel oops below is generated. I realize that the line should have no
> > whitespace around the "=", but I do not feel that an oops is the best way
> > to report the syntax error. Could there be a more gentle failure?
>
>
> From: Randy Dunlap <[EMAIL PROTECTED]>
>
> Catch malformed kernel parameter usage of "param = value".
> Spaces are not supported, but don't cause a kernel fault on
> such usage, just report an error.
>
> Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]>

Thanks Randy, I even read your patch before I wrote my own, for a
change!

Acked-by: Rusty Russell <[EMAIL PROTECTED]>
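For readers without the patch in front of them: this is not Randy's
code, just a userspace sketch of the kind of check being discussed.
Once "fwpostfix = .fw3" is split on whitespace, the second token is a
bare "=", i.e. a value with no parameter name in front of it. The
function name check_param_token() is made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Validate one whitespace-delimited token from an option string.
 * "name=value" and a bare "name" (flag style) are accepted; a token
 * that starts with '=' has no parameter name and is rejected.
 * Returns 0 if the token is well-formed, -EINVAL otherwise.
 */
int check_param_token(const char *tok)
{
	const char *eq;

	assert(tok != NULL);

	eq = strchr(tok, '=');
	if (eq == tok)		/* "=" or "=value": name is missing */
		return -EINVAL;

	return 0;
}
```

With Larry's line, "fwpostfix" and "locale=8" pass, while "=" is
rejected with -EINVAL instead of being handed on to the parameter
code.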
Re: [PATCH] Define EFLAGS_IF
On Fri, 2007-04-06 at 08:39 -0700, H. Peter Anvin wrote:
> I will, unless Rusty does, first. No desire to step on each other.

Oh no, please, after you!

Rusty.
Re: [RFC, PATCH 1/3] introduce SYS_CLONE_MASK
Oleg Nesterov <[EMAIL PROTECTED]> writes:

> On 04/08, Eric W. Biederman wrote:
>
>> If we are going to have kernel only flags please use an additional
>> argument to do_fork and copy_process.
>
> Yes, we can do this. But we have a number of architectures which use
> sys_clone() to implement kernel_thread(). It would be nice to have an
> architecture neutral kernel_thread() implementation as you proposed.
> We should change all of them if we want to add a new parameter to
> do_fork().
>
> Perhaps it is better to add reparent_kthread() (next patch) to kthread()
> and forget about CLONE_KERNEL_THREAD.

Please.

> Anyway, re-parenting to swapper breaks pstree, it doesn't show kernel
> threads. And if ->parent == /sbin/init, we can't remove us from ->children
> (unless we forbid sub-thread-of-init exec). So the only safe change is
> set ->exit_state = -1.

Yes. We certainly need ->exit_state = -1. Earlier I had forgotten
about the second use of ->children, to update the parent pointer of
processes when their parent exits.

There is a practical question of how much we care about pstree being
confused (I assume it doesn't crash). If this is just a confusion
issue then I say go for it. PPID == 0 is a very legitimate way to say
the kernel is the parent process. There are a few more cases where we
are likely to get PPID == 0 in the future, and /sbin/init already has
that now. Plus there is a lot of historic precedent. The odd part is
PPID = 0 having multiple children.

If we decide maintaining a tree is important I would much rather put
init_task on the task_list so we can see it in /proc than go the other
way around.

I would like a confirmation that PPID == 0 is what is confusing pstree,
just to make certain we haven't half filled in some field in init_task
and are thus giving incorrect /proc output. But that is all the double
checking I would do.

>> Your current scheme also has the bad side that if user space supplied
>> a kernel flag it is hard to detect it and return -EINVAL. Which
>> limits future expansion. Silently dropping clone flags is a real
>> pain, if you are trying to detect if a new flag has been implemented.
>
> Yes. But that is what we are doing now. copy_process() just ignores
> unknown flags.

Agreed. I fixed that in sys_unshare but I should really submit a patch
to do the same for sys_clone at some point. When known flags aren't
implemented we certainly return -EINVAL.

Given that this line of work looks to fix the race that allows a
threaded init to generate unkillable zombies, I can probably find some
time in the next while to work on it.

Eric
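The "separate argument plus -EINVAL" idea from the exchange above, as
a userspace sketch. Everything here is hypothetical: the DEMO_* flag
values and demo_do_fork() are invented for illustration and are not
the real clone flags or the real do_fork() signature.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical flag values, for illustration only. */
#define DEMO_USER_FLAG_A	0x1UL
#define DEMO_USER_FLAG_B	0x2UL
#define DEMO_USER_FLAGS		(DEMO_USER_FLAG_A | DEMO_USER_FLAG_B)

#define DEMO_KERNEL_THREAD	0x1UL
#define DEMO_KERNEL_FLAGS	(DEMO_KERNEL_THREAD)

/*
 * Kernel-only flags travel in their own argument, so the user-supplied
 * word can be checked strictly: any bit we do not implement is an
 * error instead of being silently dropped.
 */
int demo_do_fork(unsigned long user_flags, unsigned long kernel_flags)
{
	/* In-kernel callers are trusted; a stray bit here is a bug. */
	assert((kernel_flags & ~DEMO_KERNEL_FLAGS) == 0);

	/* Userspace is not trusted: reject unknown bits outright. */
	if (user_flags & ~DEMO_USER_FLAGS)
		return -EINVAL;

	return 0;
}
```

With this split, a program probing for a new flag gets -EINVAL from a
kernel that does not implement it, which is the detection property
asked for above.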
Re: [RFC] pata_icside driver
Alan Cox wrote:
>> The second FIXME area is ata_irq_ack - it is unconditionally coded for
>> SFF-type interfaces. I believe that using this function in non-BMDMA
>> interfaces is wrong - it attempts to read from the BMDMA registers
>> irrespective of whether ap->ioaddr.bmdma_addr is set or not. The
>> question this poses is: what should non-BMDMA implementations use for
>> this method? Note that pata_platform also uses this function despite
>> not supporting BMDMA which seems even more suspicious.
>
> Thats a bug that has arrived again. The older code was corrected to
> handle this properly but the fix appears to have become lost. The
> ioread/iowrite code actually made quite a mess (all the address
> reporting is also broken) and we do some iffy things like compare the
> iomap result with zero and assume thats the same as checking for true
> bus zero addresses.
>
> ata_irq_ack is part of the SFF layer so its fine that it assumes SFF
> but its wrong that it is used unconditionally and it shouldn't be used
> this way. It just needs a (!ap->ioaddr.bmdma_addr) test adding
> (assuming thats valid for iomap)

No. It does not need such a test, as it requires BMDMA, not just an
SFF-style Status register.

It is up to the driver to decide whether or not ata_irq_ack() is
appropriate for your hardware. pata_icside needs its own ata_irq_ack --
which may just be as simple as reading the Status register to clear the
interrupt condition. If others need this as well, ata_sff_irq_ack()
would be a good generic function to create.

	Jeff
Re: Linux 2.6.21-rc6
Andrew Morton wrote:
> netdev:
>
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm1/broken-out/forcedeth-work-around-null-skb-dereference-crash.patch

It sounded like this was specific to Ingo. I haven't heard anybody else
complain, and AFAIK Ayaz and Ingo were still going back and forth.

> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm1/broken-out/depcac-fix-handling-of-platorm_device_add-failure.patch

ACK this one. Need to send this up, but I'm intentionally avoiding work
as we are having a big Easter bash here in Raleigh. Silly
bunny-related traditions that have nothing to do with Jesus take
priority ;-)

I have a couple other bug fixes to push, but that will wait until
Tuesday.

	Jeff
[PATCH] intel_agp: PCI id update for Intel 965GM
[AGPGART] intel: Add 965GM chipset support

Update PCI id info for Intel 965GM chipset.

Signed-off-by: Wang Zhenyu <[EMAIL PROTECTED]>
---
diff --git a/drivers/char/agp/intel-agp.c b/drivers/char/agp/intel-agp.c
index e542a62..a9fdbf9 100644
--- a/drivers/char/agp/intel-agp.c
+++ b/drivers/char/agp/intel-agp.c
@@ -18,11 +18,14 @@
 #define PCI_DEVICE_ID_INTEL_82965Q_IG 0x2992
 #define PCI_DEVICE_ID_INTEL_82965G_HB 0x29A0
 #define PCI_DEVICE_ID_INTEL_82965G_IG 0x29A2
+#define PCI_DEVICE_ID_INTEL_82965GM_HB 0x2A00
+#define PCI_DEVICE_ID_INTEL_82965GM_IG 0x2A02
 
 #define IS_I965 (agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82946GZ_HB || \
                  agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82965G_1_HB || \
                  agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82965Q_HB || \
-                 agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82965G_HB)
+                 agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82965G_HB || \
+                 agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82965GM_HB)
 
 extern int agp_memory_reserved;
 
@@ -1921,7 +1924,13 @@ static int __devinit agp_intel_probe(struct pci_dev *pdev,
 			bridge->driver = &intel_845_driver;
 		name = "965G";
 		break;
-
+	case PCI_DEVICE_ID_INTEL_82965GM_HB:
+		if (find_i830(PCI_DEVICE_ID_INTEL_82965GM_IG))
+			bridge->driver = &intel_i965_driver;
+		else
+			bridge->driver = &intel_845_driver;
+		name = "965GM";
+		break;
 	case PCI_DEVICE_ID_INTEL_7505_0:
 		bridge->driver = &intel_7505_driver;
 		name = "E7505";
@@ -2080,6 +2089,7 @@ static struct pci_device_id agp_intel_pci_table[] = {
 	ID(PCI_DEVICE_ID_INTEL_82965G_1_HB),
 	ID(PCI_DEVICE_ID_INTEL_82965Q_HB),
 	ID(PCI_DEVICE_ID_INTEL_82965G_HB),
+	ID(PCI_DEVICE_ID_INTEL_82965GM_HB),
 	{ }
 };
Re: Linux 2.6.21-rc6
On Sun, Apr 08, 2007 at 04:09:54PM -0700, Andrew Morton wrote:
>
> I'm sitting on five patches which look like 2.6.21 material, but which
> would normally go through subsystem maintainers:
>
> driver core:
>
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm1/broken-out/update-documentation-driver-model-platformtxt.patch

Feel free to forward it on with:

Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>

As it was just a documentation update, I figured it was safe to wait
for 2.6.22, but I have no objection to it going in now.

thanks,

greg k-h
Re: SD scheduler testing hitch
[...]

Well, it's a late hour, so maybe I'm missing something... but it does
look to be an HZ and "will run" time interval related issue, as
described in (*). Or maybe we both observe similar situations but have
different reasons behind them.

I meant that account_user_time() is also called from timer_ISR ->
update_process_times(), like scheduler_tick(). So if a task's running
intervals are shorter than 1/HZ, they are not always accounted --> so
cpu% may be wrong for such a task...

--
Best regards,
Dmitry Adamushko
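A toy model of the effect described above, to show how tick-based
accounting can miss a short-running task. The model is an assumption
and not kernel code: ticks fire at multiples of tick_us, the task's
sleep is ended by a tick so it always starts running just after one,
and it is charged one tick each time a tick lands while it is on the
CPU. The function name ticks_charged() is made up.

```c
#include <assert.h>

/*
 * Count the timer ticks that land while a periodic task is running.
 * The task starts 1 usec after a tick, runs for run_us, then sleeps
 * for at least sleep_us and is woken by the next tick after that.
 */
unsigned long ticks_charged(unsigned long run_us, unsigned long sleep_us,
			    unsigned long tick_us, unsigned long total_us)
{
	unsigned long start = 1;	/* just after the tick at t = 0 */
	unsigned long charged = 0;

	assert(run_us > 0 && tick_us > 0);

	while (start + run_us <= total_us) {
		unsigned long end = start + run_us;	/* first usec off the CPU */
		unsigned long wake = end + sleep_us;

		/* number of ticks t with start <= t < end */
		charged += (end - 1) / tick_us - (start - 1) / tick_us;

		/* the wakeup is delivered by the first tick at or after 'wake' */
		start = (wake + tick_us - 1) / tick_us * tick_us + 1;
	}

	return charged;
}
```

With HZ = 250 (tick_us = 4000) and the calibration figures from the
earlier mail, the 2392 and 3880 usec runs are never charged in this
model, while the 5863 usec run is charged one tick per period. Real
wakeups are not this neatly aligned, so treat it as an illustration
of the mechanism only.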
Re: SD scheduler testing hitch
On 08/04/07, Mike Galbraith <[EMAIL PROTECTED]> wrote: On Sat, 2007-04-07 at 19:17 +0200, Mike Galbraith wrote: > I lowered the time to 500us, and ran at nice -10.. it starves tenpercent > here every time. (ran as taskset -c 1 nice -n -10 ./fairtest) The > starving 10% duty cycle task has trouble getting 1% CPU. Something is odd, very odd indeed. But surprise-surprise, it does not seem to be something merely SD-releated. In short, the question is - can we always believe statistics being provided by "top" (i.e. the way it's being collected by the kernel)? The tests are below. Somewhere in the middle are thoughts on how HZ and an interval of cpu usage by a given task may be connected to such a behaviour. The system: Pentiium 3 Coppermine 750 MHz (iThinkPad T21), 256 RAM. I tested 3 configurations: (1) 2.6.13-15 (default in SuSE 10) (2) 2.6.19 (3) 2.6.21-rc5 + sd-0.39 TEST: just a tenp.c, i.e. without Mike's "steal" (either as xx.c or as a part of modified fairtest.c) thingy, but tenp-- a tenp.c with a single running copy; tenp2 -- a tenp.c with 2 (1 additionally forked) running copies tenp15 - 15 copies (only for SD) (1) 2.6.13-15 Tasks: 74 total, 1 running, 73 sleeping, 0 stopped, 0 zombie Cpu(s): 8.6% us, 0.7% sy, 0.0% ni, 90.4% id, 0.0% wa, 0.3% hi, 0.0% si PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ COMMAND 5582 dimm 15 0 1460 428 348 S 6.0 0.2 0:02.03 tenp 4047 messageb 17 0 3520 1584 1324 S 1.3 0.6 0:00.28 dbus-daemon Tasks: 76 total, 1 running, 75 sleeping, 0 stopped, 0 zombie Cpu(s): 14.9% us, 0.3% sy, 0.0% ni, 84.8% id, 0.0% wa, 0.0% hi, 0.0% si PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ COMMAND 5598 dimm 15 0 1460 428 348 S 7.2 0.2 0:01.42 tenp2 5599 dimm 15 0 1460 432 352 S 6.9 0.2 0:00.87 tenp2 5591 dimm 16 0 2108 988 764 R 0.3 0.4 0:00.47 top 1 root 16 0 688 260 224 S 0.0 0.1 0:01.78 init I repeated 7 times each of the tests (tenp and tenp2). All are ok. Now an interesting part starts. 
(2) 2.6.19 [ 2.1 ] ks: 73 total, 1 running, 72 sleeping, 0 stopped, 0 zombie Cpu(s): 1.3% us, 0.7% sy, 0.0% ni, 98.0% id, 0.0% wa, 0.0% hi, 0.0% si PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ COMMAND 8312 root 15 0 27168 14m 2128 S 0.7 5.6 0:08.29 X 8640 dimm 15 0 28656 13m 9m S 0.7 5.4 0:03.44 konsole 8813 dimm 15 0 1460 432 352 S 0.3 0.2 0:00.32 tenp 1 root 15 0 696 268 228 S 0.0 0.1 0:01.12 init [ 2.2 ] ks: 73 total, 3 running, 70 sleeping, 0 stopped, 0 zombie Cpu(s): 6.6% us, 0.7% sy, 0.0% ni, 92.7% id, 0.0% wa, 0.0% hi, 0.0% si PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ COMMAND 8816 dimm 15 0 1464 432 352 S 5.0 0.2 0:01.49 tenp 8312 root 15 0 27168 14m 2128 R 1.7 5.6 0:09.08 X See a difference between [ 2.1 ] and [ 2.2 ] ? [ 2.2 ] (which is ok) has happened 3 out of 10 times. Now for tenp2 [ 2.3 ] ks: 74 total, 1 running, 73 sleeping, 0 stopped, 0 zombie Cpu(s): 14.6% us, 0.3% sy, 0.0% ni, 85.1% id, 0.0% wa, 0.0% hi, 0.0% si PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ COMMAND 8850 dimm 15 0 1460 432 352 S 6.6 0.2 0:01.32 tenp2 8851 dimm 15 0 1460 112 32 S 6.3 0.0 0:00.77 tenp2 8312 root 15 0 27168 14m 2128 S 0.7 5.6 0:11.73 X [ 2.4 ] ks: 74 total, 2 running, 72 sleeping, 0 stopped, 0 zombie Cpu(s): 3.3% us, 0.3% sy, 0.0% ni, 96.3% id, 0.0% wa, 0.0% hi, 0.0% si PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ COMMAND 8312 root 15 0 27168 14m 2128 S 2.0 5.6 0:12.97 X 8640 dimm 15 0 28748 13m 9m R 0.7 5.4 0:07.22 konsole 8532 dimm 18 0 2476 416 268 S 0.3 0.2 0:00.04 gpg-agent 8852 dimm 15 0 2116 996 772 R 0.3 0.4 0:00.27 top 8859 dimm 15 0 1460 432 352 S 0.3 0.2 0:00.44 tenp2 8860 dimm 15 0 1460 112 32 S 0.3 0.0 0:00.02 tenp2 1 root 15 0 696 268 228 S 0.0 0.1 0:01.12 init Again, [ 2.3 ] took place only 3 times. Some observations: /1/ for the "ok" ( [ 2.2 ] and [ 2.3 ] ) cases, the "will run" and "will sleep" times from tenp's calibration output look /higher/ than on average : e.g. Each fork will run for 5863 usecs and sleep for 52767 usecs v.s. 
something in between Each fork will run for 2392 usecs and sleep for 21528 usecs and Each fork will run for 3880 usecs and sleep for 34920 usecs in most of the cases (when tenp's cpu% is ~0.3). /2/ HZ = 250 for 2.6.19 and I think it was still 1000 for 2.6.13 (argh, I forgot to check and would like to avoid a reboot at this already late hour, but I believe 1000 was still the default at that time). (*) HZ == 250 ==> the timer tick fires once every 4 ms. So a "will run for" time < 4 ms may well go unaccounted? :o) The funny thing i
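The hypothesis above (a run shorter than one timer tick may never be charged) can be sketched in userspace C. This is only a model under stated assumptions, not the kernel's accounting code: CPU time is charged purely by sampling at each tick, ticks fire at exact multiples of the tick length, and the task starts 1 usec after a tick because its wakeup is timer driven. The helper names `ticks_charged` and `charged_us` are hypothetical.

```c
#include <assert.h>

/*
 * Count the timer ticks that fire while a task is on the CPU.
 * Assumption: ticks fire at exact multiples of tick_us and the task
 * runs during the half-open interval [start_us, start_us + run_us).
 */
static unsigned long ticks_charged(unsigned long start_us,
                                   unsigned long run_us,
                                   unsigned long tick_us)
{
    assert(tick_us > 0);
    /* index of the first tick at or after the start of the run */
    unsigned long first = (start_us + tick_us - 1) / tick_us;
    /* index of the first tick at or after the end of the run */
    unsigned long past = (start_us + run_us + tick_us - 1) / tick_us;
    return past - first;
}

/* CPU time (usecs) that pure tick sampling would attribute to the run. */
static unsigned long charged_us(unsigned long start_us,
                                unsigned long run_us,
                                unsigned long tick_us)
{
    return ticks_charged(start_us, run_us, tick_us) * tick_us;
}
```

With a 4000 usec tick (HZ == 250), the 2392 and 3880 usec runs quoted above fall entirely between two ticks in this model, while the 5863 usec run always straddles one. With a 1000 usec tick (HZ == 1000) even the 2392 usec run is sampled twice.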
Re: Security computation within Linux kernel
On 4/8/07, JanuGerman <[EMAIL PROTECTED]> wrote: Hi everyone, I have one question regarding the security libraries already shipped with the Linux kernel. Are all the PKI and RSA libraries provided by OpenSSL already integrated into the Linux kernel source code, or does one have to use OpenSSL separately for this? I can see that linux/crypto.h and linux/hash.h ship with the 2.6 kernel, and I know that SHA1 "hashing" can be done using linux/hash.h, but besides that, is there any possibility of RSA or PKI? What do you expect the kernel to use PKI for? That's userspace stuff. What problem are you trying to solve? Lee - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Add a norecovery option to ext3/4?
Hi, Distribution installers usually probe for other OSes in order to build a suitable grub menu. Unfortunately, mounting an ext3 partition, even in read-only mode, performs some operations on the filesystem (log recovery). This is not a good idea since it may silently corrupt data. XFS has a norecovery option that disables this; I'd say ext3/4 should have it too. Samuel
Re: Linux 2.6.21-rc6
On Thu, 5 Apr 2007 19:50:11 -0700 (PDT) Linus Torvalds <[EMAIL PROTECTED]> wrote: > > Ok, > I don't think there really is anything very interesting here, but we're > hopefully whittling down the list of regressions, and fixing various > random other small issues while at it. > > Some smallish MIPS updates, networking (and network driver) fixes, removal > of a long obsolete framebuffer driver, etc etc. The shortlog really tells > the story. > > We should be getting close to a 2.6.21 release, so please update any > regression reports you've done, > I'm sitting on five patches which look like 2.6.21 material, but which would normally go through subsystem maintainers: pcmcia: ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm1/broken-out/fix-hotplug-for-legacy-platform-drivers.patch ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm1/broken-out/fix-hotplug-for-legacy-platform-drivers-update.patch driver core: ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm1/broken-out/update-documentation-driver-model-platformtxt.patch netdev: ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm1/broken-out/forcedeth-work-around-null-skb-dereference-crash.patch ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm1/broken-out/depcac-fix-handling-of-platorm_device_add-failure.patch net: ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm1/broken-out/pktgen-add-try_to_freeze.patch please send acks, nacks or smacks asap, thanks. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Reiser4. BEST FILESYSTEM EVER - Christer Weinigel
Wow, I'm impressed. I think you got the record for how many mails you referenced in one reply... But dude, please calm down, caps-lock is not the answer. You have got some rude answers and you have called them back on it, and you have repeated the same statement several times; that is not the best way of convincing people. I believe you picked up the "anti-Reiser religion" phrase from previous rant-wars (otherwise, why does that "religion" phrase always come up, and (almost) only when dealing with Reiser-fs), and yes, there have been some clashes caused by both sides, so please be careful when dealing with this matter. Would you be willing to benchmark Reiser4 with some compressed binary blob and show the time as well as the CPU usage? And document how it is set up so it can be reproduced. After all, Windows is supposed to be more stable, maintained and cost-efficient than Linux, but they don't tell us how ;) since it can't benefit as much from similarity between files. So if that is the case and you really want to save diskspace you almost have to look at read-only compressed filesystems such as cramfs, squashfs, zisofs, cloop and various other variants in combination with a unionfs overlay to get read/write functionality. But in the end everything is a tradeoff. You can save diskspace, but increase the cost of corruption. You deliberately ignored the fact that bad blocks are NOT dealt with by the filesystem,... but by the operating system. Like I said: If your filesystem is writing to bad blocks, then throw away your operating system. I may have missed something, but if my room-mate took my harddrive, screwed it open, wrote a love-letter on the disk with a pencil and then returned it (ok, there may be some more plausible reasons for corruption), is the OS really supposed to handle it? Yes, it should not assign any new data to those blocks, but should it not also fall into the file system's domain to be able to restore some/all data?
Just my 2c to the pond Richard Knutsson
Re: Reiser4. BEST FILESYSTEM EVER - Christer Weinigel
Christer Weinigel: Until YOU, have actually used the REISER4 filesystem yourself, I think YOU OWE IT to the people on the linux-kernel mailing list, to, AS YOU SAY, shut the fuck up. Even reading up on the REISER4 filesystem would help. Applying a little intelligence would undoubtedly help too. > [EMAIL PROTECTED] writes: > > > Lennart. Tell me again that these results from > > > > http://linuxhelp.150m.com/resources/fs-benchmarks.htm and > > http://m.domaindlx.com/LinuxHelp/resources/fs-benchmarks.htm > > > > are not of interest to you. I still don't understand why you > > have your head in the sand. > > Oh, for fucks sake, stop sounding like a broken record. Oh, for fucks sake, would you, and your religious anti-REISER cohorts, stop sounding like a broken record. > You have repeated the same totally meaningless statistics more > times than I care to count. Please shut the fuck up. You, and your religious anti-REISER cohorts, have indeed repeated the same broken arguments (if you can call them such) more times than I care to count. NO statistics, NO real facts, just selective MANIPULATION of facts. > Please shut the fuck up. Yes, why don't you politely, shut the fuck up. Until YOU, have actually used the REISER4 filesystem yourself, I think YOU OWE IT to the people on the linux-kernel mailing list, to shut the fuck up, as YOU say. I guess, the fact that you are a TOTAL HYPOCRITE, has completely escaped you. By the way: Did I thank you "delightful" people for the "pleasant" welcome to the linux-kernel mailing list? - > So the two bonnie benchmarks with lzo and gzip are > totally meaningless for any real life usages. YOU (yes, the one with no experience and next to NO knowledge on the subject) claim that because bonnie++ writes files that are mostly zeros, the results are meaningless. It should be mentioned that bonnie++ writes files that are mostly zero for all the filesystems compared. So the results are meaningful, contrary to would you claim. 
And hopefully all will notice that you just ignore these tests:

.-------------------------------------------------.
|File         |Disk |Copy |Copy |Tar  |Unzip| Del |
|System       |Usage|655MB|655MB|Gzip |UnTar| 2.5 |
|Type         | (MB)| (1) | (2) |655MB|655MB| Gig |
.-------------------------------------------------.
|REISER4 gzip | 213 | 148 |  68 |  83 |  48 |  70 |
|REISER4 lzo  | 278 | 138 |  56 |  80 |  34 |  84 |
|REISER4 tails| 673 | 148 |  63 |  78 |  33 |  65 |
|REISER4      | 692 | 148 |  55 |  67 |  25 |  56 |
|NTFS3g       | 772 |1333 |1426 | 585 | 767 | 194 |
|NTFS         | 779 | 781 | 173 |  X  |  X  |  X  |
|REISER3      | 793 | 184 |  98 |  85 |  63 |  22 |
|XFS          | 799 | 220 | 173 | 119 |  90 | 106 |
|JFS          | 806 | 228 | 202 |  95 |  97 | 127 |
|EXT4 extents | 806 | 162 |  55 |  69 |  36 |  32 |
|EXT4 default | 816 | 174 |  70 |  74 |  42 |  50 |
|EXT3         | 816 | 182 |  74 |  73 |  43 |  51 |
|EXT2         | 816 | 201 |  82 |  73 |  39 |  67 |
|FAT32        | 988 | 253 | 158 | 118 |  81 |  95 |
.-------------------------------------------------.

where the files are definitely NOT mostly zeros. Your negligence has to be deliberate,... but why? Are you manipulating the facts just to try and win an argument? Most sane people will realize, that what you say is simply wrong. ALSO YOU IGNORE examples offered by others, on lkml, which contradict your assertion: FOR EXAMPLE:

> I see the same thing with my nightly scripts that do syslog analysis, last year
> I trimmed 2 hours from the nightly run by processing compressed files instead of
> uncompressed ones (after I did this I configured it to compress the files as
> they are rolled, but rolling every 5 min the compression takes <20 seconds, so
> the compression is < 30 min)

From David Lang http://lkml.org/lkml/2007/4/7/196 Willy Tarreau also mentions this situation in a couple of articles. Let me spoon feed you: David has said that compressing the logs takes 24 x 12 x 20 secs = 5,760 secs = 1.6 hours of CPU time (over the day) but he saves 2 hours of CPU time on the daily syslog analysis. For a total (minimum) saving of 24 minutes. The actual saving is probably much greater.
It depends on the CPU utilization when not compressing, ie, whether you are using idle CPU cycles or not. I guess it also depends on whether you can go home one and a half hours earlier by using compression, or if your boss makes you stick around anyway. NOTE THAT THE FILES IN THIS EXAMPLE ARE ALSO NOT MAINLY ZEROS. MAYBE you just lacked the knowledge to understand what David was saying, or maybe your desire to denigrate REISER4 is so strong, that you simply don't care what other people say about similar circumstances. I am not sure why you have to be spoon-fed on these matters, or why you adamantly refuse to find the facts of the matter, for yourself. --
2.6.21-rc6-mm1
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm1/ - Lots of x86 updates - This is a 25MB diff against mainline, which is rather large. Boilerplate: - See the `hot-fixes' directory for any important updates to this patchset. - To fetch an -mm tree using git, use (for example) git-fetch git://git.kernel.org/pub/scm/linux/kernel/git/smurf/linux-trees.git tag v2.6.16-rc2-mm1 git-checkout -b local-v2.6.16-rc2-mm1 v2.6.16-rc2-mm1 - -mm kernel commit activity can be reviewed by subscribing to the mm-commits mailing list. echo "subscribe mm-commits" | mail [EMAIL PROTECTED] - If you hit a bug in -mm and it is not obvious which patch caused it, it is most valuable if you can perform a bisection search to identify which patch introduced the bug. Instructions for this process are at http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt But beware that this process takes some time (around ten rebuilds and reboots), so consider reporting the bug first and if we cannot immediately identify the faulty patch, then perform the bisection search. - When reporting bugs, please try to Cc: the relevant maintainer and mailing list on any email. - When reporting bugs in this kernel via email, please also rewrite the email Subject: in some manner to reflect the nature of the bug. Some developers filter by Subject: when looking for messages to read. - Occasional snapshots of the -mm lineup are uploaded to ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/mm/ and are announced on the mm-commits list. 
Changes since 2.6.21-rc5-mm4: origin.patch git-acpi.patch git-alsa.patch git-agpgart.patch git-arm.patch git-avr32.patch git-cifs.patch git-cpufreq.patch git-powerpc.patch git-drm.patch git-dvb.patch git-gfs2-nmw.patch git-hid.patch git-ia64.patch git-ieee1394.patch git-infiniband.patch git-input.patch git-jfs.patch git-kbuild.patch git-kvm.patch git-leds.patch git-libata-all.patch git-md-accel.patch git-mips.patch git-mmc.patch git-mtd.patch git-ubi.patch git-netdev-all.patch git-e1000.patch git-net.patch git-ioat.patch git-ocfs2.patch git-parisc.patch git-r8169.patch git-selinux.patch git-pciseg.patch git-s390.patch git-scsi-misc.patch git-block.patch git-unionfs.patch git-watchdog.patch git-wireless.patch git-ipwireless_cs.patch git-cryptodev.patch git-gccbug.patch git trees. -md-avoid-a-deadlock-when-removing-a-device-from-an-md-array-via-sysfs.patch -md-avoid-a-deadlock-when-removing-a-device-from-an-md-array-via-sysfs-fix.patch -revert-driver-core-do-not-wait-unnecessarily-in-driver_unregister.patch -net-sunrpc-svcsockc-fix-a-check.patch -agp-prevent-probe-collision-of-sis-agp-and-amd64_agp.patch -cifs-remove-unneeded-checks.patch -git-libata-all-ipr-fix.patch -pcmcia-spot-slave-decode-flaws-for-testing.patch -sata_nv-dont-read-shadow-registers-when-in-adma-mode.patch -pata_ali-remove-all-the-crap-again-and-switch-to.patch -pata_amd-remove-all-the-crud-and-restore-the-cable-detect.patch -pata_netcell-re-remove-all-the-crud.patch -pata_qdi-restore-cable-detect.patch -pata_sl82c105-restore-cable-detect-method.patch -pata_winbond-restore-cable-method.patch -pata_optidma-rework-for-cable-detect-and-to-remove.patch -ide-sl82c105-rework-pio-support.patch -ide-sl82c105-dma-support-code-cleanup-take3.patch -mtd-pmc-msp71xx-flash-rootfs-mappings.patch -jffs2-delete-everything-related-to-obsolete-jffs2_proc.patch -mtd-support-for-auto-locking-flash-on-power-up.patch -make-drivers-net-qla3xxxcphy_devices-static.patch -git-wireless-debug-build-fixes.patch 
-cxgb3-safeguard-tcam-size-usage.patch -cxgb3-detect-nic-only-adapters.patch -cxgb3-tighten-xgmac-workaround.patch -cxgb3-firwmare-update.patch -fix-scsi_send_eh_cmnd-scatterlist-handling.patch -slab-mention-slab-name-when-listing-corrupt-objects.patch -turn-do_sync_file_range-into-do_sync_mapping_range.patch Merged into mainline or a subsystem tree. +fuse-validate-rootmode-mount-option.patch +proper-fix-for-highmem-kmap_atomic-functions-for-vmi-for-2621.patch +omap_cf-oops-on-suspend-fix.patch +x86_64-early-quirks-fix-early_qrk-section-tag.patch +i386-irqbalance_disable-section-fix.patch 2.6.21 queue. -vmi-paravirt-ops-bugfix-for-2621.patch Dropped. +make-proc-acpi-wakeup-more-useful.patch +sony-laptop-remove-acpi-references-from-variable-and-function-names.patch +sony-laptop-prepare-the-platform-driver-for-multiple-users.patch +sony-laptop-add-debug-macros-also-used-by-the-sonypi-reimplementation.patch +sony-laptop-add-sny6001-device-handling-sonypi-reimplementation.patch +sony-laptop-unify-the-input-subsystem-event-forwarding.patch +sony-laptop-additional-platform-attributes-coming-from-sny6001.patch +sony-laptop-sanitize-printks.patch +sony-laptop-update-documentation-and-kconfig-help.patch +sony-laptop-add-sonypi-compat-code.patch sony-laptop work. +arm-fix-section-mismatch-warning-in-board-sam9260.patch A
Re: [RFC][PATCH -mm] swsusp: Use rbtree for tracking allocated swap
Hi. On Sun, 2007-04-08 at 18:47 +0200, Rafael J. Wysocki wrote: > On Sunday, 8 April 2007 01:42, Nigel Cunningham wrote: > > Hi. > > > > On Sun, 2007-04-08 at 01:13 +0200, Rafael J. Wysocki wrote: > > > On Sunday, 8 April 2007 00:31, Nigel Cunningham wrote: > > > > Hi. > > > > > > > > On Sat, 2007-04-07 at 15:06 -0700, Andrew Morton wrote: > > > > > On Sat, 7 Apr 2007 23:20:39 +0200 "Rafael J. Wysocki" <[EMAIL > > > > > PROTECTED]> wrote: > > > > > > > > > > > This should allow us to reduce the memory usage, practically > > > > > > always, and > > > > > > improve performance. > > > > > > > > > > And does it? > > > > > > Yes. There are theoretical corner cases in which it may be less efficient > > > than the current approach, but in the usual situation it is _much_ better. > > > > > > > It will. I've been using extents for ages, for the same reasons. I don't > > > > put them in an rb_tree because I view it as less than most efficient, > > > > > > Actually, I don't agree with that. In the normal situation (ie. one > > > extent is > > > needed) there is no difference as far as the memory usage or performance > > > are concerned, but if there are more extents, the rbtree should be more > > > efficient. > > > > I don't think it's worth having a big discussion over, but let me give > > you the details, which you can then feel free to ignore :) > > > > The rb_node struct adds an unsigned long and two struct rb_node * > > pointers. My extents use one struct extent * pointer. The difference is > > thus 12/24 bytes per extent (32/64 bits) vs 20/40. > > Well, you use open-coded lists. If you used list.h lists, the numbers > would be different. :-) Yes, but I don't need doubly linked lists. > > In the normal situation, not worth worrying about, but I'm also using these > > for > > recording the sectors we write too, and thinking about swap files and > > multiple swap devices. Nearly double the memory use bites more as you > > get more extents. 
> > > > Insertion cost for rb_node includes keeping the tree balanced. For > > extents, I start with the location of the last insertion to minimise the > > cost, so insertion time is usually virtually zero (inc max of last > > extent or append a new one). > > Isn't the appending one actually linear worst-case? Worst case would be the swap allocator returning swap pages in reverse order. As you and I both know, that doesn't happen. I first implemented this in 2003. If the worst case actually happened, I would have seen the effect by now :) > > If for some reason swap was allocated out of order, I might need to traverse > > the whole chain from the start. > > Exactly. > > > Normal usage in both cases is simply iterating through the list, so I > > guess the cost would be approximately the same. > > > > Deletion cost would include rebalancing for the rb_nodes. > > In swsusp the deletions are needed only if there's an error. When freeing swap at the end of the cycle? > > Code cost is a gain for you - you're leveraging existing code, I'm > > adding a bit more. extent.c is 300 lines including code for serialising > > the chains in an image header and iterating through a group of chains > > (multiple swap devices support). > > > > rb_nodes seem to be the wrong solution to me because we generally don't > > care about searching. We care about minimising memory usage and > > maximising the speed of iteration, insertion and deletion. I believe > > I've managed to do that with a singly linked, sorted list. > > The insertion also uses searching and in fact I don't really care for anything > else. Ok :) Nigel
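The singly linked, sorted extent chain discussed in this thread can be sketched as follows. This is a minimal userspace illustration, not the actual extent.c: the names `struct extent`, `struct extent_chain`, `hint` and `chain_add` are hypothetical, and values are assumed to stay well below ULONG_MAX so that `v + 1` cannot overflow. Each extent costs two unsigned longs plus one pointer, which matches the 12/24 byte figure quoted above.

```c
#include <assert.h>
#include <stdlib.h>

struct extent {
    unsigned long start, end;   /* inclusive range of allocated offsets */
    struct extent *next;        /* singly linked, sorted by start */
};

struct extent_chain {
    struct extent *first;
    struct extent *hint;        /* extent touched by the previous insert */
    unsigned int count;         /* number of extents in the chain */
};

/* Add one value to the chain. Returns 0 on success, -1 if malloc fails. */
static int chain_add(struct extent_chain *c, unsigned long v)
{
    struct extent *prev = NULL, *cur, *e;

    assert(c != NULL);
    /* Start at the hint when v lies at or beyond it, else rescan. */
    if (c->hint && v >= c->hint->start)
        prev = c->hint;
    cur = prev ? prev->next : c->first;
    while (cur && cur->start <= v) {    /* linear only when out of order */
        prev = cur;
        cur = cur->next;
    }
    /* prev: last extent starting at or before v; cur: first one after v */
    if (prev && v <= prev->end) {       /* already covered */
        c->hint = prev;
        return 0;
    }
    if (prev && v == prev->end + 1) {   /* common case: grow the last extent */
        prev->end = v;
        if (cur && cur->start == v + 1) {   /* gap closed: merge neighbours */
            prev->end = cur->end;
            prev->next = cur->next;
            free(cur);
            c->count--;
        }
        c->hint = prev;
        return 0;
    }
    if (cur && cur->start == v + 1) {   /* grow the following extent downwards */
        cur->start = v;
        c->hint = cur;
        return 0;
    }
    e = malloc(sizeof(*e));             /* isolated value: new extent */
    if (!e)
        return -1;
    e->start = e->end = v;
    e->next = cur;
    if (prev)
        prev->next = e;
    else
        c->first = e;
    c->count++;
    c->hint = e;
    return 0;
}
```

When values arrive in ascending order the loop body never runs and every insert is constant time. The full traversal from the head only happens when a value lands below the hint, which is the reverse-order case described above.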
Re: [PATCH 0/4] Fix MTRR suspend support for specific machines (some AMD64, maybe others also)
On Tue, 3 Apr 2007 15:55:32 +0200 (CEST) Bernhard Kaindl <[EMAIL PROTECTED]> wrote: > With at least 3 of the following 4 patches, s2ram and s2disk are > fixed on at least the Acer Ferrari 1000 notebooks and at least > s2disk on the Acer Ferrari 5000 notebooks. These patches cause my Vaio to oops during suspend-to-disk. oops: http://userweb.kernel.org/~akpm/s5000499.jpg config: http://userweb.kernel.org/~akpm/config-sony.txt
Re: If not readdir() then what?
On Sat, Apr 07, 2007 at 04:36:33PM -0400, Theodore Tso wrote: > 1) Deprecate telldir/seekdir() altogether. Relatively few programs use > this functionality, and it is highly questionable how useful it is, > anyway. If you use telldir/seekdir and keep the cookie for a long > time, even the POSIX-provided guarantees about files that are created > and deleted between the telldir() and seekdir() points in time make > its utility highly dubious. How will nfsd implement readdir? > 2) If application programs must have telldir/seekdir, then expand the > size of the cookie from 32-bits to a minimum of 128 bits, and > preferably larger --- say 512 bits, to accommodate systems that might > be using 512-bit variant of SHA-2. NFS readdir cookies are currently 64 bits. It'd be interesting to think about how to modify the protocol to make this all easier, but any network filesystem protocol will need to give clients some way to read through big directories one piece at a time. Might also be nice if it worked even if the server rebooted partway through. --b.
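Reading a big directory "one piece at a time" with a resume cookie can be sketched with the POSIX calls under discussion. This is a userspace illustration only and says nothing about how nfsd is implemented; the helpers `read_chunk` and `count_in_chunks` are hypothetical, and the cookie is only ever handed back to the same DIR * that produced it.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <dirent.h>
#include <stddef.h>

/*
 * Read up to max entries from dir. When resume is nonzero, first seek to
 * the position saved in *cookie. On return *cookie holds the position to
 * resume from. Returns the number of entries read in this chunk.
 */
static int read_chunk(DIR *dir, long *cookie, int resume, int max)
{
    int n = 0;

    assert(dir != NULL && cookie != NULL);
    if (resume)
        seekdir(dir, *cookie);
    while (n < max && readdir(dir) != NULL)
        n++;
    *cookie = telldir(dir);
    return n;
}

/* Count the entries of path by reading it in chunks of chunk entries. */
static int count_in_chunks(const char *path, int chunk)
{
    DIR *dir = opendir(path);
    long cookie = 0;
    int total = 0, n, resume = 0;

    assert(chunk > 0);
    if (!dir)
        return -1;
    do {
        n = read_chunk(dir, &cookie, resume, chunk);
        total += n;
        resume = 1;
    } while (n == chunk);   /* a short chunk means end of directory */
    closedir(dir);
    return total;
}
```

The `long` returned by telldir() is the cookie whose size is being debated: on a 32-bit system it is 32 bits wide, which is smaller than the 64-bit NFS readdir cookie mentioned above.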
Re: kernel BUG at net/core/skbuff.c in linux-2.6.21-rc6
Please reproduce and provide a new crash dump without the nvidia binary-only module loaded. Hi again, Here is a new crash dump (I also removed vmnet and vmmon properitary modules), this time I also included a lspci output: Apr 8 21:47:21 localhost pppd[2114]: rcvd [proto=0xfc3b] bc d4 80 eb 43 62 d0 7e 6d 27 0a e0 22 e4 8d e6 3e f1 a3 10 39 c8 fd cb e7 23 db f1 cf a8 e0 4d ... Apr 8 21:47:21 localhost pppd[2114]: Unsupported protocol 0xfc3b received Apr 8 21:47:21 localhost pppd[2114]: sent [LCP ProtRej id=0x75 fc 3b bc d4 80 eb 43 62 d0 7e 6d 27 0a e0 22 e4 8d e6 3e f1 a3 10 39 c8 fd cb e7 23 db f1 cf a8 ...] Apr 8 21:47:22 localhost pppd[2114]: rcvd [proto=0xcd] f7 4e 69 54 c1 d2 82 d3 bf 1c 33 46 a1 ee 90 97 14 7c a7 23 9d 84 c3 d4 ff 6c ec 25 a7 65 a3 bd ... Apr 8 21:47:22 localhost pppd[2114]: Unsupported protocol 0xcd received Apr 8 21:47:22 localhost pppd[2114]: sent [LCP ProtRej id=0x76 00 cd f7 4e 69 54 c1 d2 82 d3 bf 1c 33 46 a1 ee 90 97 14 7c a7 23 9d 84 c3 d4 ff 6c ec 25 a7 65 ...] Apr 8 21:47:22 localhost kernel: skb_under_panic: text:f8c3cc0e len:1268 put:1 head:c399f800 data:c399f7ff tail:c399fcf3 end:c399fe00 dev: Apr 8 21:47:22 localhost kernel: [ cut here ] Apr 8 21:47:22 localhost kernel: kernel BUG at net/core/skbuff.c:111! 
Apr 8 21:47:22 localhost kernel: invalid opcode: [#1] Apr 8 21:47:22 localhost kernel: Modules linked in: nfs nfsd exportfs lockd nfs_acl sunrpc button xt_TCPMSS xt_limit xt_tcpudp nf_nat_irc nf_nat_ftp iptable_nat iptable_mangle ipt_LOG ipt_MASQUERADE nf_nat ipt_TOS ipt_REJECT nf_conntrack_irc nf_conntrack_ftp nf_conntrack_ipv4 xt_state nf_conntrack nfnetlink iptable_filter ip_tables x_tables ppp_async ipv6 ppp_generic slhc xfs eeprom w83781d w83627hf hwmon_vid i2c_isa ide_generic snd_via82xx snd_ac97_codec ac97_bus snd_pcm snd_timer snd_page_alloc i2c_viapro snd_mpu401_uart i2c_core via_ircc snd_rawmidi snd_seq_device floppy snd serio_raw soundcore irda rtc psmouse via_agp agpgart crc_ccitt pcspkr evdev ext3 jbd mbcache usbhid ide_cd cdrom ide_disk generic uhci_hcd usbcore via82cxxx ide_core e100 mii thermal processor fan Apr 8 21:47:22 localhost kernel: CPU:0 Apr 8 21:47:22 localhost kernel: EIP:0060:[] Tainted: P VLI Apr 8 21:47:22 localhost kernel: EFLAGS: 00010096 (2.6.21-rc6 #3) Apr 8 21:47:22 localhost kernel: EIP is at skb_under_panic+0x59/0x5d Apr 8 21:47:22 localhost kernel: eax: 0073 ebx: c399f800 ecx: edx: Apr 8 21:47:22 localhost kernel: esi: edi: c399fcf5 ebp: c399fcf1 esp: c1ce5ed8 Apr 8 21:47:22 localhost kernel: ds: 007b es: 007b fs: 00d8 gs: ss: 0068 Apr 8 21:47:22 localhost kernel: Process events/0 (pid: 3, ti=c1ce4000 task=dfd02030 task.ti=c1ce4000) Apr 8 21:47:22 localhost kernel: Stack: c02c47d0 f8c3cc0e 04f4 0001 c399f800 c399f7ff c399fcf3 c399fe00 Apr 8 21:47:22 localhost kernel:c02b7ed8 f7ef5600 00ff f8c3cc13 dfff5c20 c1fee800 0208 Apr 8 21:47:22 localhost kernel:c1fee5ad c1fee4ad c1fee800 0202 dfd7ce00 0004 c1fee400 c1fee80c Apr 8 21:47:22 localhost kernel: Call Trace: Apr 8 21:47:22 localhost kernel: [] ppp_asynctty_receive+0x3b0/0x584 [ppp_async] Apr 8 21:47:22 localhost kernel: [] ppp_asynctty_receive+0x3b5/0x584 [ppp_async] Apr 8 21:47:22 localhost kernel: [] flush_to_ldisc+0xe6/0x124 Apr 8 21:47:22 localhost kernel: [] 
flush_to_ldisc+0x0/0x124 Apr 8 21:47:22 localhost kernel: [] run_workqueue+0x70/0x101 Apr 8 21:47:22 localhost kernel: [] worker_thread+0x105/0x12e Apr 8 21:47:22 localhost kernel: [] default_wake_function+0x0/0xc Apr 8 21:47:22 localhost kernel: [] worker_thread+0x0/0x12e Apr 8 21:47:23 localhost kernel: [] kthread+0xa0/0xc8 Apr 8 21:47:23 localhost kernel: [] kthread+0x0/0xc8 Apr 8 21:47:23 localhost kernel: [] kernel_thread_helper+0x7/0x10 Apr 8 21:47:23 localhost kernel: === Apr 8 21:47:23 localhost kernel: Code: 00 00 89 5c 24 14 8b 98 a0 00 00 00 89 54 24 0c 89 5c 24 10 8b 40 60 89 4c 24 04 c7 04 24 d0 47 2c c0 89 44 24 08 e8 af c5 ef ff <0f> 0b eb fe 56 53 bb d8 7e 2b c0 83 ec 24 8b 70 14 85 f6 0f 45 Apr 8 21:47:23 localhost kernel: EIP: [] skb_under_panic+0x59/0x5d SS:ESP 0068:c1ce5ed8 Apr 8 21:48:01 localhost /USR/SBIN/CRON[6287]: (root) CMD (/usr/local/bin/pppd_test.sh) Apr 8 21:48:09 localhost pppd[2114]: No response to 5 echo-requests Apr 8 21:48:09 localhost pppd[2114]: Serial link appears to be disconnected. Apr 8 21:48:09 localhost pppd[2114]: Connect time 522.7 minutes. Apr 8 21:48:09 localhost pppd[2114]: Sent 57811374 bytes, received 186299345 bytes. Apr 8 21:48:09 localhost pppd[2114]: Script /etc/ppp/ip-down started (pid 6289) Apr 8 21:48:09 localhost pppd[2114]: sent [LCP TermReq id=0x77 "Peer not responding"] Apr 8 21:48:09 localhost pppd[2114]: Script /etc/ppp/ip-down finished (pid 6289), status = 0x0 Apr 8 21:48:12 localhost pppd[2114]: sent [LCP TermReq id=0x78 "Peer no
Re: [PATCH 2/4] Save the MTRRs of the BSP before booting an AP
On Tue, 3 Apr 2007 16:00:36 +0200 (CEST) Bernhard Kaindl <[EMAIL PROTECTED]> wrote: > --- linux-2.6.20.orig/arch/i386/kernel/smpboot.c > +++ linux-2.6.20/arch/i386/kernel/smpboot.c > @@ -59,6 +59,7 @@ > #include > #include > #include > +#include > This inclusion breaks `make headers_check'. Please always at least test allmodconfig builds before releasing a patchset. Additional hints are in Documentation/SubmitChecklist. +static __inline__ void mtrr_save_state (void) +{ + if (smp_processor_id() == 0) + mtrr_save_fixed_ranges(NULL); + else + smp_call_function_single(0, mtrr_save_fixed_ranges, NULL, 1, 1); +} - Please use inline, not __inline__ - No space before the ( - We should uninline this function - It is not performance critical - It is probably too large to be inlined anwyay - It uses a lot of tricky stuff which requires a lot of header files. From: Andrew Morton <[EMAIL PROTECTED]> Fix `make headers_check'. Cc: Andi Kleen <[EMAIL PROTECTED]> Cc: Bernhard Kaindl <[EMAIL PROTECTED]> Cc: Dave Jones <[EMAIL PROTECTED]> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]> --- arch/i386/kernel/cpu/mtrr/main.c | 11 +++ include/asm-i386/mtrr.h | 12 +--- include/asm-x86_64/proto.h | 12 +--- 3 files changed, 13 insertions(+), 22 deletions(-) diff -puN arch/i386/kernel/smpboot.c~mtrr-save-the-mtrrs-of-the-bsp-before-booting-an-ap-fix arch/i386/kernel/smpboot.c diff -puN arch/x86_64/kernel/smpboot.c~mtrr-save-the-mtrrs-of-the-bsp-before-booting-an-ap-fix arch/x86_64/kernel/smpboot.c diff -puN include/asm-i386/mtrr.h~mtrr-save-the-mtrrs-of-the-bsp-before-booting-an-ap-fix include/asm-i386/mtrr.h --- a/include/asm-i386/mtrr.h~mtrr-save-the-mtrrs-of-the-bsp-before-booting-an-ap-fix +++ a/include/asm-i386/mtrr.h @@ -25,7 +25,6 @@ #include #include -#include #defineMTRR_IOCTL_BASE 'M' @@ -71,16 +70,7 @@ struct mtrr_gentry /* The following functions are for use by other drivers */ # ifdef CONFIG_MTRR extern void mtrr_save_fixed_ranges(void *); -/** - * Save current fixed-range MTRR 
state of the BSP - */ -static __inline__ void mtrr_save_state (void) -{ - if (smp_processor_id() == 0) - mtrr_save_fixed_ranges(NULL); - else - smp_call_function_single(0, mtrr_save_fixed_ranges, NULL, 1, 1); -} +extern void mtrr_save_state(void); extern int mtrr_add (unsigned long base, unsigned long size, unsigned int type, char increment); extern int mtrr_add_page (unsigned long base, unsigned long size, diff -puN include/asm-x86_64/proto.h~mtrr-save-the-mtrrs-of-the-bsp-before-booting-an-ap-fix include/asm-x86_64/proto.h --- a/include/asm-x86_64/proto.h~mtrr-save-the-mtrrs-of-the-bsp-before-booting-an-ap-fix +++ a/include/asm-x86_64/proto.h @@ -2,7 +2,6 @@ #define _ASM_X8664_PROTO_H 1 #include -#include /* misc architecture specific prototypes */ @@ -19,16 +18,7 @@ extern void mcheck_init(struct cpuinfo_x extern void mtrr_ap_init(void); extern void mtrr_bp_init(void); extern void mtrr_save_fixed_ranges(void *); -static __inline__ void mtrr_save_state (void) -{ - /* -* Save current fixed-range MTRR state of the BSP: -*/ - if (smp_processor_id() == 0) - mtrr_save_fixed_ranges(NULL); - else - smp_call_function_single(0, mtrr_save_fixed_ranges, NULL, 1, 1); -} +extern void mtrr_save_state(void); #else #define mtrr_ap_init() do {} while (0) #define mtrr_bp_init() do {} while (0) diff -puN arch/i386/kernel/cpu/mtrr/main.c~mtrr-save-the-mtrrs-of-the-bsp-before-booting-an-ap-fix arch/i386/kernel/cpu/mtrr/main.c --- a/arch/i386/kernel/cpu/mtrr/main.c~mtrr-save-the-mtrrs-of-the-bsp-before-booting-an-ap-fix +++ a/arch/i386/kernel/cpu/mtrr/main.c @@ -729,6 +729,17 @@ void mtrr_ap_init(void) local_irq_restore(flags); } +/** + * Save current fixed-range MTRR state of the BSP + */ +void mtrr_save_state(void) +{ + if (smp_processor_id() == 0) + mtrr_save_fixed_ranges(NULL); + else + smp_call_function_single(0, mtrr_save_fixed_ranges, NULL, 1, 1); +} + static int __init mtrr_init_finialize(void) { if (!mtrr_if) _ - To unsubscribe from this list: send the line "unsubscribe 
linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] pata_icside driver
> + /* > + * DMA is based on a 16MHz clock > + */ > + if (ata_timing_compute(adev, adev->dma_mode, &t, 1000, 1)) > + return; This seems strange for a 16MHz clock. > + > + /* > + * Now, properly adjust the timings. If we have a 62.5ns clock > + * period and we ask for MWDMA2, it calculates the following > + * timings: active 125ns, recovery 62.5ns, cycle 125ns. > + * Quite obviously bogus. NAK. At this point you need to work out why you are getting bogus results and fix it or demonstrate a bug in the core code and fix that.
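The numbers in the quoted comment can be reproduced with a small userspace sketch that rounds each timing up to whole clock periods independently. This is not the libata code; `clocks_for`, `quantize` and `timing_consistent` are hypothetical helpers, and the MWDMA2 minimums used in the usage note (70 ns active, 25 ns recovery, 120 ns cycle) are an assumption taken from the ATA timing tables as I recall them.

```c
#include <assert.h>

struct timing_ps {
    unsigned long active, recover, cycle;   /* all in picoseconds */
};

/* Round a duration up to a whole number of clock periods. */
static unsigned long clocks_for(unsigned long duration_ps,
                                unsigned long period_ps)
{
    assert(period_ps > 0);
    return (duration_ps + period_ps - 1) / period_ps;
}

/* Quantize each field on its own, as a naive per-field round-up would. */
static struct timing_ps quantize(struct timing_ps want,
                                 unsigned long period_ps)
{
    struct timing_ps q;

    q.active = clocks_for(want.active, period_ps) * period_ps;
    q.recover = clocks_for(want.recover, period_ps) * period_ps;
    q.cycle = clocks_for(want.cycle, period_ps) * period_ps;
    return q;
}

/* A timing only makes sense if the cycle can hold both phases. */
static int timing_consistent(struct timing_ps t)
{
    return t.active + t.recover <= t.cycle;
}
```

Feeding `{70000, 25000, 120000}` and a 62500 ps period (16MHz) into `quantize` gives 125000, 62500 and 125000 ps: the active and recovery phases add up to 187.5 ns, which cannot fit in a 125 ns cycle. That is the inconsistency the comment calls bogus; where it should be corrected is the open question in this review.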
Re: [PATCH] timekeeping: drop irq-context clocksource polling
On Sun, 2007-04-08 at 10:33 +0200, Thomas Gleixner wrote: > > Oh well, this is a leftover from the days where we tried to use TSC > despite frequency changes. It still modifies the scale factor of the > tsc clocksource. > > I agree that it can be removed as we switch off TSC anyway in that case. That's what I was thinking .. However, I wanted to wait for John to comment on it also .. Daniel
Re: Weird MMC errors: 2 of 2 - inconsistent state after data crc error
Alex Dubov wrote:
> Problem 2: After a data crc error all subsequent commands fail. May it be
> caused by stop command
> leaving card in some bad state (something clearable by SEND_STATUS)? On the
> other hand, is there a
> real need to issue a stop command in case main command failed?
>

It might be, depending on what the problem is. E.g. timeout might still mean the card processed the command and will start sending data.

Anyway, CRC errors should be extremely rare so I'd guess that either the card or the controller has gotten confused. In many cases the card will shut down when it gets annoyed, so that might be what you're seeing here.

Other than that, I'm not sure I can help that much. The stop commands should never wedge the card, so that isn't the issue (unless the card is buggy).

Rgds
--
     -- Pierre Ossman

  Linux kernel, MMC maintainer        http://www.kernel.org
  PulseAudio, core developer          http://pulseaudio.org
  rdesktop, core developer          http://www.rdesktop.org
Re: If not readdir() then what?
On 4/8/07, H. Peter Anvin <[EMAIL PROTECTED]> wrote:
> More fundamentally, the telldir cookie should never be valid when
> applied to a different DIR * (even one that refers to the same
> directory.)

Don't worry about this. These are clearly the semantics that were always wanted. I've filed a defect report and it'll be handled.
Re: Weird MMC errors: 1 of 2 - bad ocr value
Alex Dubov wrote:
> Recently, I've obtained a bug report concerning an MMC card. Two problems are
> described, both
> sporadic.
> Problem 1: illegal ocr value is returned. You may notice, in the non-working
> case, obviously
> incorrect ocr value (0x) is returned. The card won't work after this,
> unless reinserted.
> What, to your opinion, shall we do about it?
>

I got something similar when there was a problem with the power supply. The card was booting through power it drained from the other pins, but it didn't work correctly.

Try adding some more delay after you power up in case the controller needs some time to stabilize.

--
     -- Pierre Ossman

  Linux kernel, MMC maintainer        http://www.kernel.org
  PulseAudio, core developer          http://pulseaudio.org
  rdesktop, core developer          http://www.rdesktop.org
Re: [PATCH] merge compat_ioctl.h into compat_ioctl.c
From: Arnd Bergmann <[EMAIL PROTECTED]>
Date: Sun, 8 Apr 2007 16:51:20 +0200

> On Sunday 08 April 2007, Christoph Hellwig wrote:
> >
> > Now that there is no arch-specific compat ioctl handling left there
> > is no point in having a separate compat_ioctl.h, so merge it into
> > compat_ioctl.c
>
> Yes, definitely a good idea.
>
> > Signed-off-by: Christoph Hellwig <[EMAIL PROTECTED]>
>
> Acked-by: Arnd Bergmann <[EMAIL PROTECTED]>

Acked-by: David S. Miller <[EMAIL PROTECTED]>
Re: [PATCH] MMC: Fix handling of low-voltage cards (take 2)
Philip Langdale wrote:
> Fix handling of low voltage MMC cards.
>
>

Sorry, my fifo filled up and you got stuck at the far end. I've applied this and will push to Andrew in a bit.

Rgds
--
     -- Pierre Ossman

  Linux kernel, MMC maintainer        http://www.kernel.org
  PulseAudio, core developer          http://pulseaudio.org
  rdesktop, core developer          http://www.rdesktop.org
Re: If not readdir() then what?
Theodore Tso wrote:
> It doesn't state explicitly that you can use the telldir() cookie
> after closing the directory stream using closedir() and then reopening
> it using opendir(), but given that it states that results are
> undefined after a rewinddir() --- which is much less violent than a
> closedir()/opendir(), I would definitely argue that an application
> programmer would be very ill-advised to rely on this working. (Of
> course, I'd argue that an application programmer shouldn't use
> telldir/seekdir at all.)
>
> Ulrich, is it too late to insert a clarification that the telldir()
> cookie isn't guaranteed to be valid after closedir() *or* rewinddir()?

More fundamentally, the telldir cookie should never be valid when applied to a different DIR * (even one that refers to the same directory.)

	-hpa
Re: If not readdir() then what?
On 4/8/07, Theodore Tso <[EMAIL PROTECTED]> wrote:
> Ulrich, is it too late to insert a clarification that the telldir()
> cookie isn't guaranteed to be valid after closedir() *or* rewinddir()?

It's never too late.
Re: Linux 2.6.20.5
Chris Wright wrote:
> Arnaldo Carvalho de Melo (1):
>       DCCP: Fix exploitable hole in DCCP socket options

Does this fix cure CVE-2007-1730 and CVE-2007-1734, or just one of them? They both seem to be in the exact same code path the patch touches.
Re: If not readdir() then what?
On Sun, Apr 08, 2007 at 08:41:30PM +0200, Jörn Engel wrote:
>
> Garbage-collecting them on closedir() does not work. It surprised me as
> well, but there seem to be applications that keep the telldir() cookie
> around after closedir(). Iirc, "rm -r" was one of them.
>
> Neil, is this correct?

Well, according to the Single Unix Specification:

    If the value of loc was not obtained from an earlier call to
    telldir(), or if a call to rewinddir() occurred between the call
    to telldir() and the call to seekdir(), the results of subsequent
    calls to readdir() are unspecified.

It doesn't state explicitly that you can use the telldir() cookie after closing the directory stream using closedir() and then reopening it using opendir(), but given that it states that results are undefined after a rewinddir() --- which is much less violent than a closedir()/opendir(), I would definitely argue that an application programmer would be very ill-advised to rely on this working. (Of course, I'd argue that an application programmer shouldn't use telldir/seekdir at all.)

Ulrich, is it too late to insert a clarification that the telldir() cookie isn't guaranteed to be valid after closedir() *or* rewinddir()?

						- Ted
Re: If not readdir() then what?
Theodore Tso wrote:
>
> You could, but then you're susceptible to a memory allocation attack.
> If you have an arbitrarily large directory (say, one with multiple
> millions of entries), and the attacker program calls seekdir() after
> every single readdir() call, you would then force the kernel to
> allocate and then pin arbitrarily large amounts of memory, which as
> you point out, as currently specified by the POSIX specification, you
> are not allowed to release until closedir().
>
> This could be done in userspace, by forcing glibc to readdir() the
> entire directory into memory, at which point seekdir()/telldir() will
> work just fine. But for a really big directory, this could consume a
> huge amount of space. If we had the 64-byte telldir cookie that I had
> proposed, then in userspace we could simply associate that 64-byte
> telldir cookie with a small 32-bit integer, either in memory, or in
> some berkdb or tdb interface, at least until the use of
> telldir/seekdir had actually disappeared. (Which probably wouldn't
> take that long; I really doubt there are that many users of it out
> there, so it's probably OK if they suffer a performance penalty if
> they use this really wretched and horrible interface.)
>

If you want to have large cookies, you could have glibc allocate a memory block to store them, and have glibc responsible for keeping track of them. As far as I know, off_t can hold a pointer on all our implementations (only 32-bit machines have 32-bit off_t as an option; Alpha *might* be an exception but I don't think so.)

> I'll also note, by the way, that there are those who have been much
> more cavalier with breaking the wireless interface or the udev/sys
> interface after one year. Not that I would agree with that, but over
> some deprecation period measured in years, I think it is possible to
> nuke what was a horribly misguided interface that should have never
> existed. Whoever invented it really should receive the brown paper
> award for one of the worst design decisions of all time.

Readdir/telldir are much, much, more fundamental than that. We're talking interfaces which have been standardized for longer than Linux itself has existed.

	-hpa
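The userspace bookkeeping discussed above (a large cookie kept in library memory, with the application only seeing a small integer) could look roughly like the sketch below. Everything in it is hypothetical: the 64-byte size comes from the proposal quoted in this thread, while the struct and function names are invented here and do not correspond to any glibc interface.

```c
#include <stdlib.h>
#include <string.h>

#define COOKIE_BYTES 64		/* size of the large cookie proposed in the thread */

/* Hypothetical per-stream table; zero-initialise before first use. */
struct cookie_table {
	unsigned char (*cookies)[COOKIE_BYTES];
	size_t used;
	size_t cap;
};

/* Return a small handle for the cookie, reusing an existing slot if possible. */
long cookie_intern(struct cookie_table *t, const unsigned char *cookie)
{
	for (size_t i = 0; i < t->used; i++)
		if (memcmp(t->cookies[i], cookie, COOKIE_BYTES) == 0)
			return (long)i;

	if (t->used == t->cap) {
		size_t ncap = t->cap ? t->cap * 2 : 8;
		void *grown = realloc(t->cookies, ncap * sizeof(*t->cookies));

		if (!grown)
			return -1;
		t->cookies = grown;
		t->cap = ncap;
	}
	memcpy(t->cookies[t->used], cookie, COOKIE_BYTES);
	return (long)t->used++;
}

/* Map a handle back to the stored cookie, or NULL if it was never issued. */
const unsigned char *cookie_lookup(const struct cookie_table *t, long handle)
{
	if (handle < 0 || (size_t)handle >= t->used)
		return NULL;
	return t->cookies[handle];
}

/* Called from the closedir() equivalent: the only safe release point. */
void cookie_table_free(struct cookie_table *t)
{
	free(t->cookies);
	memset(t, 0, sizeof(*t));
}
```

The table only grows until it is freed, so a caller that asks for a cookie after every entry makes it as large as the directory. That is the memory allocation concern raised in the quoted text, moved from the kernel into the process that causes it.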
Re: 2.6.21-rc5: Thinkpad X60 gets critical thermal shutdowns
On Mon, 02 Apr 2007 10:35:40 +0200, Rene Rebe said:

(Sorry for the late reply..)

> IIRC a MSI Megabook S270 (I formerly owned) BIOS notifies this
> "Critical temperature reached (128C)" when the battery run empty
> when the OS did no action due to battery low indications. I guess
> the BIOS people thought this is a good last resort to let the OS
> really shutdown before the box just turns off.

It's not just MSI - I recently managed to put a Dell Latitude D820 into its bag while still running, where it babbled to itself running on the warm side for several hours. When I finally did get it out, it *was* quite hot to the touch, but I was amazed that it managed to run the battery down to somewhere under 4% (which took some 4 or 5 hours) and then throw the thermal check that made it shut down - quite the coincidence indeed. However, "ran warm but tolerable and then used the thermal to shut down when the battery failed" matches the symptoms much better.
Re: [RFC] pata_icside driver
> The second FIXME area is ata_irq_ack - it is unconditionally coded
> for SFF-type interfaces. I believe that using this function in
> non-BMDMA interfaces is wrong - it attempts to read from the BMDMA
> registers irrespective of whether ap->ioaddr.bmdma_addr is set or
> not. The question this poses is: what should non-BMDMA implementations
> use for this method? Note that pata_platform also uses this
> function despite not supporting BMDMA which seems even more suspicious.

That's a bug that has arrived again. The older code was corrected to handle this properly, but the fix appears to have been lost. The ioread/iowrite code actually made quite a mess (all the address reporting is also broken), and we do some iffy things like comparing the iomap result with zero and assuming that's the same as checking for true bus zero addresses.

ata_irq_ack is part of the SFF layer, so it's fine that it assumes SFF, but it's wrong that it is used unconditionally. It just needs a (!ap->ioaddr.bmdma_addr) test adding (assuming that's valid for iomap).

Alan
Re: Ten percent test
On 04/08/2007 12:41 PM, Ingo Molnar wrote:

> this is pretty hard to get right, and the most objective way to change
> it is to do it testcase-driven. FYI, interactivity tweaking has been
> gradual, the last bigger round of interactivity changes were done a
> year ago:
>
>  commit 5ce74abe788a26698876e66b9c9ce7e7acc25413
>  Author: Mike Galbraith <[EMAIL PROTECTED]>
>  Date:   Mon Apr 10 22:52:44 2006 -0700
>
>      [PATCH] sched: fix interactive task starvation
>
> (and a few smaller tweaks since then too.)
>
> and that change from Mike responded to a testcase. Mike's latest
> changes (the ones you just tested) were mostly driven by actual
> testcases too, which measured long-term timeslice distribution
> fairness.

Ah yes, that one. Here's the next one in that series:

commit f1adad78dd2fc8edaa513e0bde92b4c64340245c
Author: Linus Torvalds <[EMAIL PROTECTED]>
Date:   Sun May 21 18:54:09 2006 -0700

    Revert "[PATCH] sched: fix interactive task starvation"

It personally had me wonder if _anyone_ was testing this stuff...

Rene.
Re: If not readdir() then what?
On 4/7/07, Christoph Hellwig <[EMAIL PROTECTED]> wrote:
> It's not going to solve anything at all. We can't stop supporting
> functionality that has been there forever.

Not necessarily. One problem here is that the interface for using readdir() with and without telldir()/seekdir() is the same. A second problem is that the functionality is universally required.

Both of these problems can be addressed. For the second problem, I certainly could imagine making the ability to use seekdir()/telldir() optional. It might be hard in POSIX, but this does not mean anything about implementations. Implementations just have to provide a way to allow these functions to be used. It does not mean it always and everywhere has to work.

What this means is that if, for instance, a filesystem would (for now) be able to have a mount option to not allow seekdir()/telldir(), the system can still conform to POSIX. At the same time we can gather information as to whether seekdir()/telldir() are really needed. I personally think the number of apps which depend on this functionality is minuscule.

Using a mount option isn't the nicest solution, though. If a filesystem can support seekdir()/telldir(), the better solution from the userlevel API POV would be to provide a better, alternative interface. Maybe an alternative opendir() call (opendir2?) which takes a second parameter as to whether seeking is needed or not. Then this opendir2() function can use a new getdents() syscall and return the entries. The difference would be that if the user wants to use seekdir()/telldir(), the userlevel code would cache the old results and the seekdir()/telldir() handling would be entirely at userlevel.

It's not a good idea to make this the default behavior for the old opendir(), since the vast majority of the current users don't want to seek and therefore the caching would significantly impact the performance. With the extra argument saying when caching is needed this is no problem anymore.

Over time people would migrate off of opendir() and towards opendir2() (with some "careful" encouragement) and the whole problem will go away.

And the best: this is certainly a path I can see being viable for POSIX. But it requires that we have

a) established existing practice

b) shown the impact is really low

So, I think it would be great to get started writing this new getdents call. Yes, for now it means maintaining two separate versions. If all goes well, those filesystems which feel a high burden can simply stop supporting the old syscall or at least the seek functionality.
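A minimal sketch of the userlevel half of that idea is given below, assuming seeking was requested when the directory was opened: all names are read once through the ordinary opendir()/readdir() calls and cached, and the "cookie" becomes a plain index into the cache. The cached_dir_* names are hypothetical; neither opendir2() nor the new getdents() variant exists, so neither is modelled here.

```c
#include <dirent.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userlevel stream: every name is cached at open time. */
struct cached_dir {
	char **names;
	size_t count;	/* number of cached entries */
	size_t pos;	/* index of the next entry to return */
};

void cached_dir_close(struct cached_dir *cd)
{
	if (!cd)
		return;
	for (size_t i = 0; i < cd->count; i++)
		free(cd->names[i]);
	free(cd->names);
	free(cd);
}

struct cached_dir *cached_dir_open(const char *path)
{
	DIR *dir = opendir(path);
	struct cached_dir *cd;
	struct dirent *ent;
	size_t cap = 0;

	if (!dir)
		return NULL;
	cd = calloc(1, sizeof(*cd));
	if (!cd) {
		closedir(dir);
		return NULL;
	}
	while ((ent = readdir(dir)) != NULL) {
		char *copy;

		if (cd->count == cap) {
			size_t ncap = cap ? cap * 2 : 16;
			char **grown = realloc(cd->names, ncap * sizeof(*grown));

			if (!grown)
				goto fail;
			cd->names = grown;
			cap = ncap;
		}
		copy = malloc(strlen(ent->d_name) + 1);
		if (!copy)
			goto fail;
		strcpy(copy, ent->d_name);
		cd->names[cd->count++] = copy;
	}
	closedir(dir);
	return cd;
fail:
	closedir(dir);
	cached_dir_close(cd);
	return NULL;
}

/* Next cached name, or NULL at the end of the directory. */
const char *cached_dir_read(struct cached_dir *cd)
{
	return cd->pos < cd->count ? cd->names[cd->pos++] : NULL;
}

/* The "cookie" is just an index into the cache. */
long cached_dir_tell(const struct cached_dir *cd)
{
	return (long)cd->pos;
}

void cached_dir_seek(struct cached_dir *cd, long loc)
{
	if (loc >= 0 && (size_t)loc <= cd->count)
		cd->pos = (size_t)loc;
}
```

The cost is visible in cached_dir_open(): memory proportional to the directory size is held until close, which is why the proposal keeps this path opt-in rather than making it the default for every opendir() caller.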
Re: If not readdir() then what?
On Sun, Apr 08, 2007 at 11:11:20AM -0700, H. Peter Anvin wrote:
> Christoph Hellwig wrote:
> >On Sat, Apr 07, 2007 at 04:36:33PM -0400, Theodore Tso wrote:
> >>this functionality, and it is highly questionable how useful it is,
> >>anyway. If you use telldir/seekdir and keep the cookie for a long
> >>time, even the POSIX-provided guarantees about files that are created
> >>and deleted between the telldir() and seekdir() points in time makes
> >>its utility highly dubious.
> >
> >It's not going to solve anything at all. We can't stop supporting
> >functionality that has been there forever.
>
> Well, the question is if you can keep the seekdir/telldir cookie around
> as a pointer -- preferably in userspace, of course. You would
> presumably garbage-collect them on closedir() -- there is no other point
> at which you could.
>
> I personally suspect that hch is right -- this stuff has been there
> since time immemorial and it'll be hard or impossible to deprecate it.

You could, but then you're susceptible to a memory allocation attack. If you have an arbitrarily large directory (say, one with multiple millions of entries), and the attacker program calls seekdir() after every single readdir() call, you would then force the kernel to allocate and then pin arbitrarily large amounts of memory, which as you point out, as currently specified by the POSIX specification, you are not allowed to release until closedir().

This could be done in userspace, by forcing glibc to readdir() the entire directory into memory, at which point seekdir()/telldir() will work just fine. But for a really big directory, this could consume a huge amount of space. If we had the 64-byte telldir cookie that I had proposed, then in userspace we could simply associate that 64-byte telldir cookie with a small 32-bit integer, either in memory, or in some berkdb or tdb interface, at least until the use of telldir/seekdir had actually disappeared. (Which probably wouldn't take that long; I really doubt there are that many users of it out there, so it's probably OK if they suffer a performance penalty if they use this really wretched and horrible interface.)

I'll also note, by the way, that there are those who have been much more cavalier with breaking the wireless interface or the udev/sys interface after one year. Not that I would agree with that, but over some deprecation period measured in years, I think it is possible to nuke what was a horribly misguided interface that should have never existed. Whoever invented it really should receive the brown paper award for one of the worst design decisions of all time.

						- Ted
Re: If not readdir() then what?
On Sun, 8 April 2007 11:11:20 -0700, H. Peter Anvin wrote:
>
> Well, the question is if you can keep the seekdir/telldir cookie around
> as a pointer -- preferably in userspace, of course. You would
> presumably garbage-collect them on closedir() -- there is no other point
> at which you could.

Garbage-collecting them on closedir() does not work. It surprised me as well, but there seem to be applications that keep the telldir() cookie around after closedir(). Iirc, "rm -r" was one of them.

Neil, is this correct?

Jörn

--
Data dominates. If you've chosen the right data structures and organized
things well, the algorithms will almost always be self-evident. Data
structures, not algorithms, are central to programming.
-- Rob Pike
man-pages-2.44 is released
Gidday,

After a long hiatus, a new man-pages release...

And a happy announcement! My work on man-pages is now partially supported by my employer, Google. Henceforth, something up to 20% [*] of my working week (depending on other time pressures...) will be spent on man-pages maintenance. Thanks, Google!

So... I've released man-pages-2.44.

This release is now available for download at:

    ftp://ftp.kernel.org/pub/linux/docs/manpages
    or mirrors: ftp://ftp.XX.kernel.org/pub/linux/docs/manpages

and soon at:

    ftp://ftp.win.tue.nl/pub/linux-local/manpages

Cheers,

Michael

[*] http://www.google.com/support/jobs/bin/static.py?page=about.html
    http://en.wikipedia.org/wiki/Google#.22Twenty_percent.22_time

===

This release contains a very large number of changes. Among the changes that may be of interest to readers of this list are the following:

New pages
---------

termio.7
    mtk, after a bit of prodding by Reuben Thomas
        A brief discussion of the old System V termio interface,
        with pointers to pages that will contain the information
        that the reader probably wants.

Changes to individual pages
---------------------------

access.2
    mtk
        Since 2.6.20, access() honours the MS_NOEXEC mount flag.

mincore.2
    Nick Piggin
        Kernel 2.6.21 fixes several earlier bugs in mincore().
    mtk
        Rewrote various parts to make the page clearer.

mmap.2
    mtk
        Rewrote and reorganised various parts to be clearer.

mount.2
    mtk / Val Henson
        Document MS_RELATIME, new in Linux 2.6.20.

semop.2
    mtk
        If sops contains multiple operations, then these are performed
        in array order. All Unix systems that I know of do this,
        and some Linux applications depend on this behaviour. SUSv3
        made no explicit statement here, but SUSv4 will explicitly
        require this behaviour.

ptrace.2
    Chuck Ebbert
        When the parent receives an event with PTRACE_EVENT_* set,
        the child is not in the normal signal delivery path. This
        means the parent cannot do ptrace(PTRACE_CONT) with a signal
        or ptrace(PTRACE_KILL). kill() with a SIGKILL signal can be
        used instead to kill the child process after receiving one
        of these messages.

time.7
    mtk
        Since kernel 2.6.20, the software clock can also be 300 HZ.

--
Michael Kerrisk
maintainer of Linux man pages Sections 2, 3, 4, 5, and 7

Want to help with man page maintenance? Grab the latest tarball at http://www.kernel.org/pub/linux/docs/manpages/ read the HOWTOHELP file and grep the source files for 'FIXME'.
Re: APIC error on 32-bit kernel
[Adding linux-kernel to the cc list, hoping for wider exposure.]

On Fri, 23 Mar 2007 20:08:17 -0500 Jay Cliburn <[EMAIL PROTECTED]> wrote:

> We're trying to track down the source of a problem that occurs
> whenever the atl1 network driver is activated on a 32-bit 2.6.21-rc4

and -rc5, -rc6, 2.6.20.x, 2.6.19.3, and probably others.

> We can load the driver just fine, but whenever we activate the
> network, we see APIC errors (a sample of them are shown here,
> captured from a serial console):
>
> [EMAIL PROTECTED] ~]# echo 8 > /proc/sys/kernel/printk
> [EMAIL PROTECTED] ~]# [ 93.942012] process `sysctl' is using deprecated
> sysctl (sysc.
> [ 94.396609] atl1: eth0 link is up 1000 Mbps full duplex
> [ 94.498887] APIC error on CPU0: 00(08)
> [ 94.498534] APIC error on CPU1: 00(08)
> [ 94.550079] APIC error on CPU0: 08(08)
> [ 94.549725] APIC error on CPU1: 08(08)
> [ 94.600915] APIC error on CPU1: 08(08)
> [ 94.601276] APIC error on CPU0: 08(08)
> [ 94.652108] APIC error on CPU1: 08(08)
> [ 94.652470] APIC error on CPU0: 08(08)
> [ 94.703659] APIC error on CPU0: 08(08)
> [ 94.703305] APIC error on CPU1: 08(08)
> [ 94.754852] APIC error on CPU0: 08(40)
> [ 94.806045] APIC error on CPU0: 40(08)
> [ 94.805692] APIC error on CPU1: 08(08)
> [ 94.857238] APIC error on CPU0: 08(08)
> [ 94.856884] APIC error on CPU1: 08(08)
> [ 94.908432] APIC error on CPU0: 08(08)
> [ 94.908078] APIC error on CPU1: 08(08)
> [snip, more of the same]
> [ 98.901156] APIC error on CPU1: 08(08)
> [ 98.952702] APIC error on CPU0: 08(08)
> [ 98.952349] APIC error on CPU1: 08(08)
> [ 99.003895] APIC error on CPU0: 08(08)
> [ 99.003542] APIC error on CPU1: 08(08)
>
> The machine hangs for about 5-10 seconds, then spontaneously reboots
> without further console output.

I can prompt an oops by pinging my router while the apic errors are scrolling by.

>
> This is an Asus M2V (Via K8T890) motherboard.
>
> The problem does not occur on a 32-bit kernel if we boot with
> pci=nomsi, and it doesn't occur at all on a 64-bit kernel on the same
> motherboard.
>
> We also do not see this problem on Intel-based motherboards, with
> either 32- or 64-bit kernels.

A full raft of documentation -- including acpidump and linux-firmware-kit output, console capture, kernel config, lspci -vvxxx (with apic=debug boot option), dmesg, and /proc/interrupts -- is available at http://www.hogchain.net/m2v/apic-problem/

If this is a motherboard problem, that's fine; I'd just like to know the details so I can tell users something more than "it's a motherboard problem."

Thanks,
Jay
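As an aid to reading the log above, the two hex values in each "APIC error" line can be decoded bit by bit. The sketch below assumes they are local APIC Error Status Register values (before and after the handler's write) and uses the bit names as I recall them from the comment in the kernel's APIC error interrupt handler; both points are assumptions, and the function name is invented for illustration.

```c
#include <stddef.h>

/* Assumed ESR bit meanings, lowest bit first. */
static const char *const apic_esr_names[8] = {
	"Send CS error",
	"Receive CS error",
	"Send accept error",
	"Receive accept error",
	"Reserved",
	"Send illegal vector",
	"Received illegal vector",
	"Illegal register address",
};

/*
 * Store the names of all bits set in the low byte of esr into out[]
 * and return how many were stored.
 */
size_t apic_esr_decode(unsigned int esr, const char *out[8])
{
	size_t n = 0;

	for (unsigned int bit = 0; bit < 8; bit++)
		if (esr & (1u << bit))
			out[n++] = apic_esr_names[bit];
	return n;
}
```

Under those assumptions, the dominant value 08 in the log decodes to "Receive accept error" and the occasional 40 to "Received illegal vector".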
Re: SD scheduler testing hitch
Mike Galbraith wrote:
> On Sat, 2007-04-07 at 19:17 +0200, Mike Galbraith wrote:
> > I lowered the time to 500us, and ran at nice -10.. it starves tenpercent
> > here every time. (ran as taskset -c 1 nice -n -10 ./fairtest) The
> > starving 10% duty cycle task has trouble getting 1% CPU.
>
> Hmm. Playing with it some more today, it still happens, but it's not
> very repeatable. Something is odd. I wonder if any SD using readers
> will try it.

Tried it on mainline 2.6.20.3. It's not easily repeatable, but it's got the same problem.

top - 21:21:45 up 27 min,  0 users,  load average: 0.80, 0.43, 0.20
Tasks:  45 total,   3 running,  42 sleeping,   0 stopped,   0 zombie
Cpu(s): 24.3% user,  0.5% system,  0.0% nice, 75.0% idle,  0.2% IO-wait
Mem:    499488k total,   27352k used,  472136k free,    1996k buffers
Swap:  1020088k total,       0k used, 1020088k free,    9160k cached

  PID  PR  NI  VIRT  RES  SHR  SWAP  nFLT  nDRT  WCHAN      S  %CPU    TIME+  Command
  688  25   0  1804  412  352  1392     0     0  rest_init  R  94.7  2:37.01  fairtest
  689  15   0  1804  264  204  1540     0     0  rest_init  R   0.0  0:00.79  fairtest
    1  15   0  1440  500  444   940    15     0  rest_init  S   0.0  0:00.73  init

Thanks!

--
Al
[PATCH] worker_thread: fix racy try_to_freeze() usage
worker_thread() can miss freeze_process()->signal_wake_up() if it happens between try_to_freeze() and prepare_to_wait(). We should check freezing() before entering schedule().

This race was introduced by me in

	[PATCH 1/1] workqueue: don't migrate pending works from the dead CPU

Looks like mm/vmscan.c:kswapd() has the same race.

Signed-off-by: Oleg Nesterov <[EMAIL PROTECTED]>

--- 2.6.21-rc5-mm4/kernel/workqueue.c~wq_freeze	2007-04-07 20:11:14.0 +0400
+++ 2.6.21-rc5-mm4/kernel/workqueue.c	2007-04-08 21:37:43.0 +0400
@@ -307,14 +307,14 @@ static int worker_thread(void *__cwq)
 	do_sigaction(SIGCHLD, &sa, (struct k_sigaction *)0);
 
 	for (;;) {
-		if (cwq->wq->freezeable)
-			try_to_freeze();
-
 		prepare_to_wait(&cwq->more_work, &wait, TASK_INTERRUPTIBLE);
-		if (!cwq->should_stop && list_empty(&cwq->worklist))
+		if (!freezing(current) && !cwq->should_stop
+		    && list_empty(&cwq->worklist))
 			schedule();
 		finish_wait(&cwq->more_work, &wait);
 
+		try_to_freeze();
+
 		if (cwq_should_stop(cwq))
 			break;
 
[PATCH] zap_other_threads: remove unneeded ->exit_signal change
We already depend on the fact that all sub-threads have ->exit_signal == -1, so there is no need to set it in zap_other_threads().

Signed-off-by: Oleg Nesterov <[EMAIL PROTECTED]>

--- 2.6.21-rc5-mm4/kernel/signal.c~zat	2007-04-07 20:11:14.0 +0400
+++ 2.6.21-rc5-mm4/kernel/signal.c	2007-04-08 22:09:20.0 +0400
@@ -1163,17 +1163,6 @@ void zap_other_threads(struct task_struc
 		if (t->exit_state)
 			continue;
 
-		/*
-		 * We don't want to notify the parent, since we are
-		 * killed as part of a thread group due to another
-		 * thread doing an execve() or similar. So set the
-		 * exit signal to -1 to allow immediate reaping of
-		 * the process.  But don't detach the thread group
-		 * leader.
-		 */
-		if (t != p->group_leader)
-			t->exit_signal = -1;
-
 		/* SIGKILL will be handled before any pending SIGSTOP */
 		sigaddset(&t->pending.signal, SIGKILL);
 		signal_wake_up(t, 1);
Re: Reiser4. BEST FILESYSTEM EVER.
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Theodore Tso wrote: > The reason why I ignore the tar+gzip tests is that in the past Hans > has rigged the test by using a tar ball which was generated by > unpacking a set of kernel sources on a reiser4 filesystem, and then > repacking them using tar+gzip. The result was a tar file whose files > were optimally laid out so that reiser4 could insert them into the > filesystem b-tree without doing any extra work. > > I can't say for sure whether or not this set of benchmarks has done > this (there's not enough information describing the benchmark setup), > but the sad fact of the matter is that people trying to pitch Reiser4 > have generated for themselves a reputation for using rigged > benchmarks. Hans's used of a carefully stacked and ordered tar file > (which is the same as stacking a deck of cards), and your repeated use > of the bonnee++ benchmarks despite being told that it is a meaningless > result given the fact that well, zero's compress very well and most > people are interested in storing a file of all zeros, has caused me to > look at any benchmarks cited by Reiser4 partisans with a very > jaundiced and skeptical eye. > > Fortunately for you, it's not up to me whether or not Reiser4 makes it > into the kernel. And if it works for you, hey, go wild. You can > always patch it into your own kernel and encourage others to do the > same with respect to getting it tested and adopted. 
My personal take > on it is that Reiser3, Reiser4 and JFS suffer the same problems, which > is to say they have a very small and limited development community, > and this was referenced in Novell's decision to drop Reiser3: > > http://linux.wordpress.com/2006/09/27/suse-102-ditching-reiserfs-as-it-default-fs/ > > SuSE has deprecated Reiser3 *and* JFS, and I believe quite strongly it > is the failure of the organizations to attract a diverse development > community is ultimately what doomed them in the long term, both in > terms of support as the kernel migrated and new feature support. It > is for that reason that Hans' personality traits that tend to drive > away those developers who would help them, beyond those that he hires, > is what has been so self-destructive to Reiser4. Read the > announcement Jeff Mahoney from SUSE Labs again; he pointed out was > that reiser3 was getting dropped even though it performs better than > ext3 in some scenarios. There are many other considerations, such as > a filesystem's robustness in case on-disk corruption, long term > maintenance as the kernel maintains, availability of developers to > provide bug fixes, how well the system performs on systems with > multiple cores/CPU's, etc. Those are all arguments I've made and still stand by, but I should address one point that has been repeated fairly often. Novell _isn't_ dropping support for Reiser3 in any of our products. The change only refers to the choice of a default file system. Most users don't care about which file system they use, and those that do are still free to choose reiser3 if they want it. We'll support it and I still have patches under development to improve it. 
- -Jeff - -- Jeff Mahoney SUSE Labs -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org iD8DBQFGGTHYLPWxlyuTD7IRAj0SAJ4txD5NoStOA4GFgkzcXDdE/Xf9ngCZATNL QtyNTGbi6YFbNF71T5C9hTA= =Emwr -END PGP SIGNATURE-
Re: If not readdir() then what?
Christoph Hellwig wrote:
> On Sat, Apr 07, 2007 at 04:36:33PM -0400, Theodore Tso wrote:
>> this functionality, and it is highly questionable how useful it is,
>> anyway.  If you use telldir/seekdir and keep the cookie for a long
>> time, even the POSIX-provided guarantees about files that are created
>> and deleted between the telldir() and seekdir() points in time make
>> its utility highly dubious.
>
> It's not going to solve anything at all.  We can't stop supporting
> functionality that has been there forever.

Well, the question is if you can keep the seekdir/telldir cookie around
as a pointer -- preferably in userspace, of course.  You would
presumably garbage-collect them on closedir() -- there is no other point
at which you could.

I personally suspect that hch is right -- this stuff has been there
since time immemorial and it'll be hard or impossible to deprecate it.

	-hpa
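For readers who have not used the interface under discussion, here is a minimal sketch in C of what the telldir()/seekdir() cookie does. The helper name dir_cookie_roundtrip is made up for illustration and is not an existing API. The sketch assumes the directory is not modified while the stream is open, which is the easy case; the thread above is about what happens when that assumption does not hold.

```c
#define _XOPEN_SOURCE 700   /* telldir()/seekdir() are XSI interfaces */

#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: save a telldir() cookie, read past it, then
 * seekdir() back and check that readdir() returns the same entry.
 * Returns 1 if the names match, 0 if they differ, -1 on error.
 */
int dir_cookie_roundtrip(const char *path)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    /* Skip one entry so the cookie is not simply the start of the stream. */
    if (readdir(dir) == NULL) {
        closedir(dir);
        return -1;
    }

    /* Opaque position; only meaningful for this particular DIR stream. */
    long cookie = telldir(dir);

    struct dirent *ent = readdir(dir);
    if (ent == NULL) {
        closedir(dir);
        return -1;
    }

    /* Copy the name: the dirent may be overwritten by later readdir() calls. */
    char saved[NAME_MAX + 1];
    snprintf(saved, sizeof saved, "%s", ent->d_name);

    /* Drain the rest of the stream. */
    while (readdir(dir) != NULL)
        ;

    /* Return to the saved position and re-read the entry found there. */
    seekdir(dir, cookie);
    ent = readdir(dir);
    int same = (ent != NULL && strcmp(ent->d_name, saved) == 0);

    /* After closedir() the cookie refers to nothing. */
    closedir(dir);
    return same ? 1 : 0;
}
```

On an unmodified directory, dir_cookie_roundtrip(".") would be expected to return 1. Because the cookie is tied to the stream that produced it, closedir() is the natural point at which any state kept behind the cookie can be discarded.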
Re: need help
vjn wrote:
> in my project i want to code the kernel such that when i plugged my usb
> it should ask for password and check it in the kernel space . can anyone
> help me

I think the correct solution is to use an encrypted mount, and issue the
mount command manually with the question in user space. There's no code
in the kernel to ask for input, nor any way to positively decide which
connected terminal is the terminal to ask.

-- 
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
  the machinations of the wicked."  - from Slashdot
Re: Ten percent test
On Sunday 08 April 2007, Mike Galbraith wrote:
>On Sun, 2007-04-08 at 13:40 +0200, Mike Galbraith wrote:
>> On Sun, 2007-04-08 at 07:33 -0400, Gene Heskett wrote:
>> > That seems to be the killer loading here, building a kernel (make
>> > -j3) doesn't seem to lag it all that bad.  One session of gzip --best
>> > makes it fall plumb over though, which was a disappointment.
>>
>> Can you make a testcase that doesn't require amanda?
>
>Or at least send me a couple of 5 or 10 second top snapshots (which also
>show CPU usage of sleeping tasks) while the system is misbehaving?
>
>	-Mike

With what monitor utility?

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
"Microsoft technology" -- isn't that an oxymoron?
		-- Gareth Barnard