Re: [PATCH -v6 2/2] Updating ctime and mtime for memory-mapped files
Hi Linus, On Friday 18 January 2008, Linus Torvalds wrote: > On Fri, 18 Jan 2008, Miklos Szeredi wrote: > > > > What I'm saying is that the times could be left un-updated for a long > > time if program doesn't do munmap() or msync(MS_SYNC) for a long time. > > Sure. > > But in those circumstances, the programmer cannot depend on the mtime > *anyway* (because there is no synchronization), so what's the downside? Can we get "if the write to the page hits the disk, the mtime has hit the disk already no less than SOME_GRANULARITY before"? That is very important for computer forensics. Esp. in saving your ass! Ok, now back again to making that fast :-) Best Regards Ingo Oeser -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/5] x86: Add config variables for SMP_MAX
Hi Mike, On Friday 18 January 2008, [EMAIL PROTECTED] wrote: > +config THREAD_ORDER > + int "Kernel stack size (in page order)" > + range 1 3 > + depends on X86_64_SMP > + default "3" if X86_SMP_MAX > + default "1" > + help > + Increases kernel stack size. > + Could you please elaborate, why this is needed and put more info about this requirement into this patch description? People worked hard to push data allocation from stack to heap to make THREAD_ORDER of 0 and 1 possible. So why increase it again and why does this help scalability? Many thanks and Best Regards Ingo Oeser, puzzled a bit :-) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
kinit (was: sleep before boot panic)
On Wednesday 09 January 2008, H. Peter Anvin wrote: > Pavel Machek wrote: > > > >> Of course, if we'd been using kinit, "soft panic" would > >> have been done exclusively in userspace... > > > > What's the status of kinit, btw? > > Pavel > > It's bitrotted a bit since it was first rejected. It wouldn't take too > much work to bring it back up to speed, however. klibc, and some of the > kinit components, are used for the initramfs in Debian. Yes, and I like most of it. The only thing really missing for me is LVM support. Debian (?) did a evil hack to make it work. Maybe one day this itches soo much, I'll even scratch it :-) Then I'll be able to test kernels on a standard LVM installation again. So please keep up the good work! Best Regards Ingo Oeser -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: sleep before boot panic
On Monday 07 January 2008, Andi Kleen wrote: > Bernd Schubert <[EMAIL PROTECTED]> writes: > > > Hi, > > > > I just switched to libata (pata) on my laptop and the immediate panic made > > it > > impossible to figure out why my boot partition wasn't available. > > After applying this little patch I could check boot printk output and then > > saw > > everything was properly recognized and only scsi-disk support was missing. > > The correct fix would be to make scroll back (and sysrq) still work > after panic. It's a little more complicated, but possible (essentially > it needs a polled keyboard handler) Customer: "This system could not find the root fs." Support: "Oh, yeah, just connect a (USB-) keyboard and scroll back." Hmm, device detection works after panic? I really like the "soft" panic better, where you still can operate the kernel debugging features, but just have no user space supporting it. One better hopes, that keyboards never need external firmware to be loaded at this stage :-) Best Regards Ingo Oeser, who just hit the same problem yesterday... -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: sleep before boot panic
Hi Bernd, CC'ed hpa, since I'm sure he can give useful advise on that :-) On Sunday 06 January 2008, Bernd Schubert wrote: > On Sunday 06 January 2008, Ingo Oeser wrote: > > Hi Bernd, > > > > On Sunday 06 January 2008, you wrote: > > > Index: zd1211rw.git.beno/init/do_mounts.c > > > === > > > --- zd1211rw.git.beno.orig/init/do_mounts.c 2008-01-06 > > > 18:44:23.0 > > > +0100 > > > +++ zd1211rw.git.beno/init/do_mounts.c2008-01-06 18:45:44.0 > > > +0100 @@ -330,6 +330,7 @@ > > > printk("Please append a correct \"root=\" boot option; here are > > > the > > > available partitions:\n"); > > > > > > printk_all_partitions(); > > > + msleep(60 * 1000); > > > > ssleep(60); > > feel free to replace it replace it :) Not that urgent, but if you resubmit please do it :-) > There is no dump_stack() here, but disc detection is relatively early in boot > process and on all these information are already scrolled off screen when the > panic is done. For this and any other panic it would be optimal if scrolling > still would work, but scrolling also requires kernel code, so I see there's a > reason not to this for all panics. However, for this boot problem I tend to > say there's no need to panic at all... But the kernel cannot continue from that position. You would need a "soft" panic, which allows behavior of panic=X, but let the kernel continue. Even better is to continue with the init in the builtin ramfs. That should always be available and can implement any behavior desired (like droping into a dash). > Btw, not all stack straces are useless, *most* of them are actually very > useful. I didn't say that. Just if you cannot continue due to admin error, but the kernel is in a perfect valid state otherwise, dumping stack is next to useless. Best Regards Ingo Oeser -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: sleep before boot panic
Hi Bernd, On Sunday 06 January 2008, you wrote: > Index: zd1211rw.git.beno/init/do_mounts.c > === > --- zd1211rw.git.beno.orig/init/do_mounts.c 2008-01-06 18:44:23.0 > +0100 > +++ zd1211rw.git.beno/init/do_mounts.c2008-01-06 18:45:44.0 > +0100 > @@ -330,6 +330,7 @@ > printk("Please append a correct \"root=\" boot option; here are > the > available partitions:\n"); > > printk_all_partitions(); > + msleep(60 * 1000); ssleep(60); > panic("VFS: Unable to mount root fs on %s", b); > } Better would be for this and similiar panic()s (fatal user/admin errors on boot) to NOT print a stack trace+registers, since it is useless and actually hides useful information. Best Regards Ingo Oeser -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/4] add task handling notifier
Hi Jan, I like and support your idea! On Thursday 20 December 2007, Jan Beulich wrote: > With more and more sub-systems/sub-components leaving their footprint > in task handling functions, it seems reasonable to add notifiers that > these components can use instead of having them all patch themselves > directly into core files. Yes, but why export variables? Wouldn't it be better to export an API? That simplifies the callers (they all pass "current" as task and "task_notifier_list" as arguments). It also prevents exposing internal variables (notifier lists ARE internal variables) to modules. What do you think? Best Regards Ingo Oeser -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH][RFC] dynamic pipe resizing
On Saturday 15 December 2007, Jan Engelhardt wrote: > On Aug 24 2007 10:52, Jens Axboe wrote: > >Subject: [PATCH][RFC] dynamic pipe resizing > >Like with my original splice patches from 2005, I used fcntl() > >F_GETPIPE_SZ and F_SETPIPE_SZ to change the size of the pipe. I'm not > >particularly fond of that interface, so suggestions on how to improve it > >would be appreciated. Even if fcntl() should be the preferred approach, > >I think it would be better to pass in a byte based value instead of a > >number of pages. > > > Could this patch still make it in? > Yes, I think its set() and get() parts should use bytes and convert > to/from pages. > > Perhaps just round up, and mention the rounding in the manpage, so that > noone gets a shock when the pipe is not exactly as small as requested. Yes, but document only that it is rounding and make the unit arbitrary. That reduces the ABI requirements. Best Regards Ingo Oeser -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks
On Saturday 01 December 2007, Ingo Molnar wrote: > maybe, but we'd have to see how often this gets triggered. An OOM is > something that could happen in any overloaded system - while a hung task > is likely due to a kernel bug. What about a client using hard mounted NFS shares here? That shouldn't be killed by the OOM killer in that situation, should it? Am I missing sth.? Best Regards Ingo Oeser -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] ata: ahci: Enable enclosure management via LED (resend)
On Thursday 25 October 2007, Kristen Carlson Accardi wrote: > I did look into using the LED class for this, but it didn't appropriate > as I wanted the leds to be associated with a particular disk, and not > with the platform as a whole. It seemed to me that the led_class was > a bit of overkill for what we needed to do here, since we just need > on/off and nothing else. Maybe. But didn't you want mdadm to control it? Then it would make sense. But you have a point in the LED API missing the ability to associate a LED to a specific device (e.g. where it is installed :-). So I'm fine either way, since I see you point. Best Regards Ingo Oeser - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] ata: ahci: Enable enclosure management via LED (resend)
Hi Kristen, On Thursday 25 October 2007, Kristen Carlson Accardi wrote: > Enable enclosure management via LED > > As described in the AHCI spec, some AHCI controllers may support > Enclosure management via a variety of protocols. This patch > adds support for the LED message type that is specified in > AHCI 1.1 and higher. Linux has a LED subsystem for that. May I suggest, that you just register these leds and let userspace handle them via that via the LED API? The LED userspace API is described in Documentation/leds-class.txt and the headers for registering LEDs is linux/leds.h under include/ Since you explicitly WANT user space to control these, that should be the right API. Richard, what do YOU think? Best Regards Ingo Oeser - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 13/13] RT: Cache cpus_allowed weight for optimizing migration
Hi Gregory, On Tuesday 23 October 2007, Gregory Haskins wrote: > Calculating the weight is probably relatively expensive, so it is only > done when the cpus_allowed mask is updated (which should be relatively > infrequent, especially compared to scheduling frequency) and cached in > the task_struct. Why not make it a task flag, since according to your code, you are only interested whether this is <= 1 or > 1. Since !(x <= 1) <=> (x > 1) for any given unsigned integer x, the required data structure is a "boolean" or a flag. Best Regards Ingo Oeser - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] Extending kbuild syntax
Hi Sam, On Saturday 29 September 2007, Sam Ravnborg wrote: > Introducing the following new variable could make this a oneliner: > ccflags-y > > ccflags-$(DEBUG) := -DDEBUG > > grep -r -C 1 -B 1 EXTRA_CFLAGS shows that the above is a > very common pattern especially in drivers/ ACK. Also ACK for asflags, if done the same way :-) > The second is the more controversial suggestion. Yes, but please bear in mind, what the developers are trying to express in these cases. > In several Makefile we have simple if expression of the variants: > if ($(CONFIG_FOO),y) > obj-$(CONFIG_BAR) += fubar.o > endif This is "feature FOO of module BAR" where the feature itself cannot be a module. The composition scheme described in section 3.3 is at least equally useful. And that is used today. Maybe the documentation of that scheme is not prominent enough :-) > obj-y-ifn- This is the only one needed, because it is cumbersome to express negative rules in kbuild to include stubs (e.g. nommu stuff). But again this can be done with composition rules right now, but is order dependent. If we could get rid of this requirement, I would be happy already. So kbuild is just lacking an "else" clause here. Best Regards Ingo Oeser - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] Combine path_put and path_put_conditional
Hi Andreas, On Friday 28 September 2007, Andreas Gruenbacher wrote: > The name path_put_conditional (formerly, dput_path) is a little unclear. > Replace (path_put_conditional + path_put) with path_walk_put_both, > "put a pair of paths after a path_walk" (see the kerneldoc). ^ So why not name it path_walk_put_pair() then? Rationale: "_both" is just counting, "_pair" means they are related somehow. Best regards Ingo Oeser - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[OT] kbuild syntax extension for ccflags and asflags (was: [PATCH 1/3] CodingStyle updates)
Hi Sam, On Saturday 29 September 2007, Sam Ravnborg wrote: > Lately I have considered extending the kbuild syntax a bit. > > Introducing > ccflags-y > asflags-y > > [with same functionality as the EXTRA_CFLAGS, EXTRA_AFLAGS] > would allow us to do: > > ccflags-$(CONFIG_WHATEVER_DEBUG) := -DDEBUG Please do! That is very useful for testing and developing new modules. I learnt a lot from you in this regard and used that kind of syntax to the extreme in some other non-kernel project of mine. There it included also ccflags, asflags and so on. I further split that into -debug-y and -optimize-y flags, but that was just for my own convenience. Best Regards Ingo Oeser - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] Fix coding style
On Tuesday 25 September 2007, Srivatsa Vaddagiri wrote: > Index: current/kernel/sched_debug.c > === > --- current.orig/kernel/sched_debug.c > +++ current/kernel/sched_debug.c > @@ -239,11 +239,7 @@ static int > root_user_share_read_proc(char *page, char **start, off_t off, int count, >int *eof, void *data) > { > - int len; > - > - len = sprintf(page, "%d\n", init_task_grp_load); > - > - return len; > + return sprintf(page, "%d\n", init_task_grp_load); > } > > static int > @@ -297,7 +293,7 @@ static int __init init_sched_debug_procf > pe->proc_fops = &sched_debug_fops; > > #ifdef CONFIG_FAIR_USER_SCHED > - pe = create_proc_entry("root_user_share", 0644, NULL); > + pe = create_proc_entry("root_user_cpu_share", 0644, NULL); > if (!pe) > return -ENOMEM; What about moving this debug stuff under debugfs? Please consider using the functions in . They compile into nothing, if DEBUGFS is not compiled in and have already useful functions for reading/writing integers and booleans. Best Regards Ingo Oeser - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Why do so many machines need "noapic"?
On Saturday 15 September 2007, Andrew Morton wrote: > There are 48 bugs in bugzilla which mention "noapic" > > http://bugzilla.kernel.org/buglist.cgi?query_format=advanced&short_desc_type=allwordssubstr&short_desc=&long_desc_type=substring&long_desc=noapic&kernel_version_type=allwordssubstr&kernel_version=&bug_status=NEW&bug_status=REOPENED&bug_status=ASSIGNED&emailassigned_to1=1&emailtype1=substring&email1=&emailassigned_to2=1&emailreporter2=1&emailcc2=1&emailtype2=substring&email2=&bugidtype=include&bug_id=&chfieldfrom=&chfieldto=Now&chfieldvalue=®ression=both&cmdtype=doit&order=Reuse+same+sort+as+last+time&field0-0-0=noop&type0-0-0=noop&value0-0-0= > > And there are 173,000 on the internet ;) > http://www.google.com/search?hl=en&q=linux+noapic&btnG=Google+Search > > We screwed this pooch a long time ago - years. Perhaps if some of the many > noapic users could run a bisection search to work out when it broke we > could start fixing things. But they all have a workaround so there's no > motivation. I have 2 SMP-Boards and both need noapic. One is from 2001 (AUSUS CUR-DLS), one is from June 2006 (Gigabyte M57SLI-S4). There are many reasons: 1. Bugs which have such a simple workaround don't get much attention. 2. Usually SMP boards are used for machines, which just HAVE to work, since they have been expensive. These are not consumer boards. 3. I usually had only USB problems (no IRQ), if ommiting noapic. USB technology is a cosumer grade technology and enterprise grade developers don't have much interest in it (until now?). 4. IRQ routing setup is often a BIOS issue. You might be able to fix that by upgrading your BIOS. That often needs a Windows tool. Linux people not always (want to) have access to Windows :-) I reported the all the problems (starting 2001), no developer seemed interested. I can report them against the latest RC6 kernel tomorrow and put them into bugzilla, if we now REALLY care. Best Regards Ingo Oeser - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] crypto: cleanup: Use max() in blkcipher_get_spot() to state the intention.
[PATCH] crypto: cleanup: Use max() in blkcipher_get_spot() to state the intention. Signed-off-by: Ingo Oeser <[EMAIL PROTECTED]> --- Hi Herbert, here is the requested patch against Linus' latest tree. It at least compiles. Best Regards Ingo Oeser crypto/blkcipher.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/crypto/blkcipher.c b/crypto/blkcipher.c index d8f8ec3..1c99d92 100644 --- a/crypto/blkcipher.c +++ b/crypto/blkcipher.c @@ -65,7 +65,7 @@ static inline void blkcipher_unmap_dst(struct blkcipher_walk *walk) static inline u8 *blkcipher_get_spot(u8 *start, unsigned int len) { u8 *end_page = (u8 *)(((unsigned long)(start + len - 1)) & PAGE_MASK); - return start > end_page ? start : end_page; + return max(start, end_page); } static inline unsigned int blkcipher_done_slow(struct crypto_blkcipher *tfm, -- 1.5.2.5 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] crypto: blkcipher_get_spot() handling of buffer at end of page
Hi Herbert, On Monday 10 September 2007, Herbert Xu wrote: > On Sat, Sep 08, 2007 at 12:14:23PM +0800, Herbert Xu wrote: > > > > [CRYPTO] blkcipher: Fix handling of kmalloc page straddling > > As Bob correctly noted, I had the boolean test inverted. > Here is the correction: > > [CRYPTO] blkcipher: Fix inverted test in blkcipher_get_spot > > The previous patch had the conditional inverted. This patch fixes it > so that we return the original position if it does not straddle a page. What about using max() for this to make your intention obvious? static inline u8 *blkcipher_get_spot(u8 *start, unsigned int len) { u8 *end_page = (u8 *)(((unsigned long)(start + len - 1)) & PAGE_MASK); return max(start, end_page); } Best Regards Ingo Oeser - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFD] new syscalls: suspend_other/resume_other?
Hi there, at the moment implementing a mark and sweep garbage collection subsystem is quite a hack, because you always have to use up some signals for suspend/resume all threads to implement this. For runtime environments (like D system libraries or JVMs) this is a hack, since you take away flexibility from the application. A possible solution would be a syscall or a PTRACE extension to realize the suspend/resume. I best describe the possible syscall manpages here, so you get an idea. NAME suspend_other - suspends execution of all but the calling thread SYNOPSIS long suspend_other(void); RETURN VALUE Positive count suspended threads on success. If 0, then suspend_other was a no-op and there is nothing to resume, but the call should still considered successful. If the number is -1, the errno has to be checked for possible error values. ERRORS EDEADLK We run already a suspend_other() and the calling thread has just been resumed. EPERM The calling thread is not allowed to do this. (optional case due to security) DESCRIPTION After sucessful return of this call, the affected process is single threaded and only the calling thread runs in this process (==MM struct). The thread, which calls this is responsible for resuming all the suspended threads. One can iterate through "/proc/self/task/", to verify for sure that one knows all threads, if the returned count doesn't match the expected value. Any per thread queued signals are deferred until resume_other() or process destruction. NOTES This call might be restricted to the main thread. NAME resume_other - resume execution of foreign thread in this process SYNOPSIS long resume_other(pid_t tid) RETURN VALUE Returns 0, if successful. Otherwise -1 is returned, the errno has to be checked for possible error values and the call has no effect at all. Any non-blocked signals of that thread which happend during suspend/resume are deliverd now. ERRORS EINVAL The thread is running already. (this is a severe caller BUG). ESRCH The thread with tid does not exist. (or doesn't belong to this process). EPERM The calling thread is not allowed to do this. (optional case due to security) DESCRIPTION After sucessful return of this call, the affected thread is the the state it was before it was suspended by calling suspend_other(). NOTES The value -1 for tid is reserved for future extension (e.g. meaning ALL other threads). This call might be restricted to the main thread. - Any opinions? Best Regards Ingo Oeser - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] memchr (trivial) optimization
On Wednesday 22 August 2007, lode leroy wrote: > While profiling something completely unrelated, I noticed > that on the workloads I used memchr for, I saw a 30%-40% improvement > in performance, with the following trivial changes... > (basically, it saves 3 operations for each call) Yes, but then you could be a bit more explicit to the compiler on what you are doing here: void *memchr(const void *s, int c, size_t n) { const unsigned char *p = s; for (; n != 0; n--, p++) { if ((unsigned char)c == *p) { return (void *)p; } return NULL; } Now the compiler should see the loop more clearly. Best Regards Ingo Oeser - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH]: proc: export a processes resource limits via proc/
Hi Neil, > +static struct limit_names lnames[RLIM_NLIMITS] = { static const ... may be better here. Best Regards Ingo Oeser - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Driver-level memory management
Hi Michael, On Sunday 12 August 2007, Michael Bourgeous wrote: > I'm working on a driver for older HDTV cards based on the TL880 chip. > These cards typically have 16MB of their own memory, which is > available to me over the PCI bus. Various functions of the card > require me to manage this memory, allocating and freeing chunks of it > as necessary. I can easily include my own allocation and management > code, Ok. > but I'm sure this is a problem that has been solved before. Yes! in your Kconfig select GENERIC_ALLOCATOR in your driver.c #include Code is in lib/genalloc.c, if you like to take a look. Memory for MANAGING free/allocated space is NOT taken from your on-card memory! That allocator is explicitly developed for such use cases. Happy hacking! Best regards Ingo Oeser - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 03/11] fuse: add reference counting to fuse_file
Hi Miklos, On Friday 03 August 2007, Miklos Szeredi wrote: > From: Miklos Szeredi <[EMAIL PROTECTED]> > > Make lifetime of 'struct fuse_file' independent from 'struct file' by > adding a reference counter and destructor. What about using krefs to implement that? see include/linux/kref.h and lib/kref.c Just embed that "struct kref" inside your struct fuse_file struct fuse_file { ... struct kref ref; ... } init in struct fuse_file *fuse_file_alloc(void) where you added the counter. ... kref_init(&ff->ref); ... and implement the release function like: static void fuse_file_release(struct kref *ff_ref) { struct fuse_file *ff = container_of(ff_ref, struct fuse_file, ref); struct fuse_req *req = ff->reserved_req; struct fuse_conn *fc = get_fuse_conn(req->dentry->d_inode); request_send_background(fc, req); kfree(ff); } This will also fix the missing smp_barriers, is very simple, saves code, makes your life easier and is a well known known kernel infrastructure :-) BTW: FUSE rocks! :-) You can add my "Signed-off-by: Ingo Oeser <[EMAIL PROTECTED]>", if you want to use that suggestion. Best Regards Ingo Oeser - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [EXT4 set 7][PATCH 1/1]Remove 32000 subdirs limit.
On Tuesday 17 July 2007, Kalpak Shah wrote: > Index: linux-2.6.22/include/linux/ext4_fs.h > === > --- linux-2.6.22.orig/include/linux/ext4_fs.h > +++ linux-2.6.22/include/linux/ext4_fs.h > @@ -797,12 +797,18 @@ struct ext4_dir_entry_2 { >#define is_dx(dir) (EXT4_HAS_COMPAT_FEATURE(dir->i_sb, \ > EXT4_FEATURE_COMPAT_DIR_INDEX) > && \ > (EXT4_I(dir)->i_flags & EXT4_INDEX_FL)) > -#define EXT4_DIR_LINK_MAX(dir) (!is_dx(dir) && (dir)->i_nlink >= > EXT4_LINK_MAX) > -#define EXT4_DIR_LINK_EMPTY(dir) ((dir)->i_nlink == 2 || (dir)->i_nlink == 1) > +static inline int ext4_dir_link_max(struct inode *dir) > +{ > + return (!is_dx(dir) && (dir)->i_nlink >= EXT4_LINK_MAX); > +} > +static inline int ext4_dir_link_empty(struct inode *dir) > +{ > + return ((dir)->i_nlink == 2 || (dir)->i_nlink == 1); > +} even better: static inline bool ext4_is_dx(const struct inode *dir) { #ifdef FOOBAR return EXT4_HAS_COMPAT_FEATURE(dir->i_sb, EXT4_FEATURE_COMPAT_DIR_INDEX) && (EXT4_I(dir)->i_flags & EXT4_INDEX_FL)); #else return false; #endif } static inline bool ext4_dir_link_max(const struct inode *dir) { return !ext4_is_dx(dir) && (dir->i_nlink >= EXT4_LINK_MAX); } static inline bool ext4_dir_link_empty(const struct inode *dir) { #ifdef FOOBAR return dir->i_nlink == 2 || dir->i_nlink == 1; #else return dir->i_nlink == 2; #endif } FOOBAR is the define, which enables ext4_is_dx(). That is not in the patch, so left as an exercise to the reader :-) Best Regards Ingo Oeser - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] intel-rng: Undo mess made by an 80 column extremist
On Friday 08 June 2007, John Stoffel wrote: > Jeff> On Thu, Jun 07, 2007 at 09:56:06PM -0400, John Stoffel wrote: > >> Thinking about it more, I wonder if Krysztof is bitching more about > >> the tab width of 8 characters? I know that it ticks me off, > > Jeff> Even if he is, _that_ is definitely not getting changed. > > Oh sure... I know that part is written in stone. Yes, and as a person doing Linux code review for 12 years now, I'm really thankful for it. 8 char tab, 80 column rule and 25-50 lines of code per function actually enable effective review of code snippets. Because you can see more code flow per patch. And enables high code reuse. If you can get within 1-5min, what a functions does and match it with your actually written down last 20 code lines, you just reuse it more often. If you have more to choose from, you reuse naturally. Personally I find best candidates by code position in tree and function signature. > Jeff> If code starts creeping way right due to indentation levels, > Jeff> create a new function. > > Sure... compilers are good, us humans haven't gotten much better, make > it easier on us and harder on the computer. Yes, let compile remove all the abstraction overhead. GCC does a pretty good job there, I think. I recently analyzed some code and it took much, much longer (factor 2-3), because of laxer coding rules similiar to the ones you suggest. I even asked the developers, who wrote that code and to ones who work daily with that code base and they had the same problems. They all couldn't explain the "Why?" only the "How?". Not to mention, that this was a core component. After refactoring some big messy parts into smaller functions, identical, missing, unhandled cases became visible, inappropriate usages were identified and even some loops could be removed. Now try to find such problems within Linux. They should be a small percentage and not within core components. So a big THANKS to all the code cops here: You actually make the damn fast change rate of Linux possible by keeping the base clean and neat. Best Regards Ingo oeser - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH 1/6] Storing ipcs into radix trees
Hi, On Friday 08 June 2007, Nadia Derbey wrote: > Ingo Oeser wrote: > > ... together with this means 4*256 -> 1k of precious stack space used. > > Please consider either lowering IPCS_MAX_SCAN_ENTRIES or kmalloc() that. > You're completely right, but trying to lower the extraction size, I'm > afraid this will have an impact on performances. > > Here are the results of a small test I did: I have run ctxbench on both > the 256 and and 16 entries versions > > 1) 256 entries: > 42523679 itterations in 300.005423 seconds = 141743/sec > 2) 16 entries: > 41774255 itterations in 300.005334 seconds = 139245/sec So that is around 1.8% in a benchmark. Not bad, if one considers, that this is an expensive syncronisation primitive anyway (and thus shouldn't dominate any real workload). At least _much_ better than possible stack underflow :-) BTW: You forgot to include measurements with the unmodified code as it is in Linus' tree now. They woule be a nice data point here. > Will try with a dynamic allocation. But than you have an additional error path or have to sleep until memory becomes available. Maybe try doubling IPCS_MAX_SCAN_ENTRIES - until the performance impact is in the noise - is simpler. Up to 64 seems acceptable. Best Regards Ingo Oeser - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH 1/6] Storing ipcs into radix trees
Hi Nadia, good to see someone is pounding this old beast again :-) On Thursday 07 June 2007, [EMAIL PROTECTED] wrote: > Index: linux-2.6.21/ipc/util.h > === > --- linux-2.6.21.orig/ipc/util.h 2007-06-07 11:00:30.0 +0200 > +++ linux-2.6.21/ipc/util.h 2007-06-07 11:07:22.0 +0200 > @@ -13,6 +13,8 @@ > #define USHRT_MAX 0x > #define SEQ_MULTIPLIER (IPCMNI) > > +#define IPCS_MAX_SCAN_ENTRIES 256 That ... > Index: linux-2.6.21/ipc/util.c > === > --- linux-2.6.21.orig/ipc/util.c 2007-06-07 11:00:30.0 +0200 > +++ linux-2.6.21/ipc/util.c 2007-06-07 11:29:43.0 +0200 > @@ -252,72 +241,94 @@ void __init ipc_init_proc_interface(cons > * @key: The key to find > * > * Requires ipc_ids.mutex locked. > - * Returns the identifier if found or -1 if not. > + * Returns the LOCKED pointer to the ipc structure if found or NULL > + * if not. > + * If key is found ipc contains its ipc structure > */ > > -int ipc_findkey(struct ipc_ids* ids, key_t key) > +struct kern_ipc_perm *ipc_findkey(struct ipc_ids *ids, key_t key) > { > - int id; > - struct kern_ipc_perm* p; > - int max_id = ids->max_id; > + struct kern_ipc_perm *ipc; > + struct kern_ipc_perm *ipcs[IPCS_MAX_SCAN_ENTRIES]; ... together with this means 4*256 -> 1k of precious stack space used. Please consider either lowering IPCS_MAX_SCAN_ENTRIES or kmalloc() that. Same problem with your third patch called "Changing the loops on a single ipcid into radix_tree_gang_lookup() calls" If you cannot sleep, try to lower that constant (e.g. 16-32). The current users use much smaller numbers. If you can sleep and performance goes down after lowering that constant, try to kmalloc these arrays (since kmalloc() of that small amount should succeed easily). Regards Ingo Oeser - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bridge] [BUG] Dropping fragmented IP packets within VLAN frames on bridge
On Saturday 26 May 2007, Patrick McHardy wrote: > Adam Osuchowski wrote: > > if (((skb->protocol == htons(ETH_P_IP) && skb->len > skb->dev->mtu) || > > (IS_VLAN_IP(skb) && skb->len > skb->dev->mtu - VLAN_HLEN)) && > > !skb_is_gso(skb)) > > return ip_fragment ... > > > net/8021q ignores the VLAN header overhead, so we should probably do the > same here for consistency. Using IS_VLAN_IP (and IS_PPPOE_IP for current > -rc) looks fine, additionally we should probably also check for > skb->nfct != NULL to make sure that at least without connection tracking > the bridge doesn't perform fragmentation. And could we separe the conditions for that into a static helper function explaining each of these conditions? e.g. sth. like that: static bool br_nf_need_fragment(struct sk_buff *skb) { /* Plain IP packet does not fit in MTU */ if (!(skb->protocol == htons(ETH_P_IP) && skb->len > skb->dev->mtu)) return true; /* VLAN encapsulated IP packet does not fit in MTU */ if (IS_VLAN_IP(skb) && skb->len > skb->dev->mtu - VLAN_HLEN) return true; /* PPPoE encapsulated IP packet does not fit in MTU */ if (IS_PPPOE_IP(skb) && skb->len > skb->dev->mtu - PPPOE_SES_HLEN) return true; return false; } and then br_nf_dev_queue_xmit() becomes: static int br_nf_dev_queue_xmit(struct sk_buff *skb) { if (br_nf_need_fragment(skb) && !skb_is_gso(skb)) return ip_fragment(skb, br_dev_queue_push_xmit); else return br_dev_queue_push_xmit(skb); } which is much more readable, more documented and doesn't contain a condition monster :-) @Patrick: Could you check, wether the PPPoE case is correct? What do you think? Should I submit a patch for that? Best Regards Ingo Oeser - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: epoll,threading
Hi Arunachalam, On Saturday 26 May 2007, Arunachalam wrote: > I want to know in detail about , what the events (epoll or /dev/poll or > select ) achieve in contrast to thread per client. > > i can have a thread per client and use send and recv system call directly > right? Why do i go for these event mechanisms? Try 30.000 clients or more on a x86 32bit box. That will show you the difference quite nicely :-) More seriously: Thread per client scales only to a certain amount of clients per RAM. If you like to scale beyond that to like to minimize your state per client. If you have a thread then you have a task structure as unswappable memory in kernel, a per-thread stack, which is reducing your virtual memory per process (you have only around 3GB of virtual memory per process in Linux x86 32bit). So one uses a process or thread pool to scale beyond that. Pool size is typically related to the amount of CPU cores in the system. Regards Ingo Oeser - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Introduce boot based time
Hi John and Tomas, On Thursday 10 May 2007, john stultz wrote: > I'm not sure I follow this. > > total_sleep_time stores seconds. So on 32bit systems that's 130some > years, so it shouldn't be an issue. > > Is the reason you want it to be a ktime is because you want a way to > keep sub-second sleep granularity? No, I'm just overworked and getting sloppy :-/ Sorry for the noise... Best regards Ingo Oeser - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Introduce boot based time
Hi Tomas, On Thursday 10 May 2007, Tomas Janousek wrote: > diff --git a/include/linux/time.h b/include/linux/time.h > index 8997b61..06f3eaf 100644 > --- a/include/linux/time.h > +++ b/include/linux/time.h > @@ -116,6 +116,8 @@ extern int do_setitimer(int which, struct itimerval > *value, > extern unsigned int alarm_setitimer(unsigned int seconds); > extern int do_getitimer(int which, struct itimerval *value); > extern void getnstimeofday(struct timespec *tv); > +extern void getboottime(struct timespec *ts); > +extern void monotonic_to_bootbased(struct timespec *ts); > > extern struct timespec timespec_trunc(struct timespec t, unsigned gran); > extern int timekeeping_is_continuous(void); > diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c > index f9217bf..dd9647a 100644 > --- a/kernel/time/timekeeping.c > +++ b/kernel/time/timekeeping.c > @@ -36,9 +36,17 @@ EXPORT_SYMBOL(xtime_lock); > * at zero at system boot time, so wall_to_monotonic will be negative, > * however, we will ALWAYS keep the tv_nsec part positive so we can use > * the usual normalization. > + * > + * wall_to_monotonic is moved after resume from suspend for the monotonic > + * time not to jump. We need to add total_sleep_time to wall_to_monotonic > + * to get the real boot based time offset. > + * > + * - wall_to_monotonic is no longer the boot time, getboottime must be > + * used instead. > */ > struct timespec xtime __attribute__ ((aligned (16))); > struct timespec wall_to_monotonic __attribute__ ((aligned (16))); > +static unsigned long total_sleep_time; Could you make that a ktime_t (or struct ktime)? There are machines, which sleep more than they are awake. Just imagine a surveillance camera triggered by door entrance. Yes, these things might run Linux (e.g. on "cris" architecture). Or your VCR. Yes, these devices might sleep more than they are awake, if you are not a TV junkie :-) Best regards Ingo Oeser - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] LogFS proper
On Tuesday 08 May 2007, Thomas Gleixner wrote: > On Tue, 2007-05-08 at 00:00 +0200, Jörn Engel wrote: > > +#define packed __attribute__((__packed__)) > > Please use the __attribute__((__packed__)) on your structs instead of > creating some extra "needs lookup" magic. Don't worry, we have __packed predefined for this. Just look in include/linux/compiler-gcc.h I love it, because I always forget at least one brace or undescore level :-) Regards Ingo Oeser - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] DRM TTM Memory Manager patch
On Tuesday 01 May 2007, Dave Airlie wrote: > > - what's with the /proc interface? Don't add new proc code for > > non-process related things. This should all go into sysfs > > somewhere. And yes, I know /proc/dri/ is there today, but don't add > > new stuff please. > > Well we should move all that stuff to sysfs, but we have all the > infrastructure for publishing this stuff under /proc/dri and adding > new files doesn't take a major amount, as much as I appreciate sysfs, > it isn't suitable for this sort of information dump, the whole one > value per file is quite useless to provide this sort of information > which is uni-directional for users to send to us for debugging without > have to install some special tool to join all the values into one > place.. and I don't think drmfs is the answer either... or maybe it > is Ok, what about debugfs then? If it is just for debugging blobs -> debugfs, if it is crucial for operation -> sysfs and representation of one value per file. Regards Ingo Oeser - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] crypto: Use padlock.ko only as a module
Hi Scott, On Sunday 29 April 2007, Simon Arlott wrote: > Ideally I'd just remove that module completely, all it does is > trigger the loading of the other two modules when modules are > used - so I'll submit a patch for that instead. That's much better! When you force a feature to be a module on a kernel without module support, it will effectivly be disabled. And if it is so simple to do the same in userspace like you suggest, than that's much better. Best Regards Ingo Oeser - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: init's children list is long and slows reaping children.
On Tuesday 10 April 2007, Jeff Garzik wrote: > Thus, rather than forcing authors to make their code more complex, we > should find another solution. What about sth. like the "pre-forking" concept? So just have a thread creator thread, which checks the amount of unused threads and keeps them within certain limits. So that anything which needs a thread now simply queues up the work and specifies, that it wants a new thread, if possible. One problem seems to be, that a thread is nothing else but a statement on what other tasks I can wait before doing my current one (e.g. I don't want to mlseep() twice on the same reset timeout). But we usually use locking to order that. Do I miss anything fundamental here? Regards Ingo Oeser - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: init's children list is long and slows reaping children.
On Tuesday 10 April 2007, Jeff Garzik wrote: > That's why I feel thread creation -- cheap under Linux -- is quite > appropriate for many of these situations. Maybe that (thread creation) can be done at open(), socket-creation, service request, syscall or whatever event triggers a driver/subsystem to actually queue work into a thread. And when there is a close(), socket-destruction, service completion or whatever these threads can be marked for destruction and destroyed by a timer or even immediately. Regards Ingo Oeser -- If something is getting cheap, it is getting wasted just because it is cheap. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: getting processor numbers
Hi Ulrich, On Tuesday 03 April 2007, Ulrich Drepper wrote: > So, anybody else has a proposal? This is a pressing issue and cannot > wait until someday in the distant future NUMA topology information is > easily and speedily accessible. Since for now you just need a fast and dirty hack, which will be replaced with better interfaces, I suggest creating a directory with some files in it. These should just contain, what you need to handle your most pressing cases. I propose /sys/devices/system/topology_counters/ for that. These can contain "online_cpu", "proped_cpu", "max_cpu" and maybe the same for nodes. All that as a simple file with an integer value. Since sysfs-attribute files are pollable (if the owners notifies sysfs on changes), you also have the notification system you need (select, poll, epoll etc.). If you promise to just keep the slow code around, than one day when the shiny NUMA topology stuff is ready, this directory can be completely removed and glibc (plus all their users) keeps working. It will then even work better with a new glibc version, which supports the shiny new NUMA topology stuff. The kernel can create these counters quiete easy, since most of them are the hamming weight (or population count) of some bitmaps. Does this sound like a proper hacky solution? :-) Regards Ingo Oeser pgpeUyaLE4v0G.pgp Description: PGP signature
Re: [PATCH] sysctl: vfs_cache_divisor
Hi Randy, On Monday 19 March 2007, Randy Dunlap wrote: > Were there any patches written after this? If so, I missed them. > If not, does this patch help any? How is division by zero avoided? Maybe one can avoid setting it to zero. Regards Ingo Oeser - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 0/2] semaphores: add down_interruptible_timeout() and asm-generic/semaphore.h
Hi Inaky, On Tuesday, 27. February 2007, Inaky Perez-Gonzalez wrote: > On Monday 26 February 2007 18:18, Alan wrote: > > > Yeah, I need semaphore. This is a hw register that says when the hw > > > is ready to accept a new command. Code that wants to send commands has > > > to down the semaphore and then send it. When hw is ready to get a new > > > command, it sends and IRQ and the IRQ up()s the semaphore. > > > > So you need a mutex not a semaphore > > Theoretically I could use a mutex. Practically it would trigger ugly > complications. Only the owner can unlock a mutex (for example), so > I could not unlock from an IRQ handler -- not to mention that the > semantic rules outlined in Documentation/mutex-design.txt explicitly > forbid IRQ usage. > > And then, this is what semaphores where designed for, as gates :) > for once that I get to use a semaphore properly... But they are not required for that :-) I would suggest to use an irq-safe spinlock for the hardware access and a status indicator (ready for command), if this is really just a command register. If the status indicator is updated (in IRQ) and read under spinlock, that is safe. If that command sending is speed critical, please try a FIFO and batch that stuff. Timeout based locking mechanisms are flawed, because they introduce the hard to find timing sensitive bugs. Please try sth. different (e.g. like suggested above). Semaphores aren't good "busy/ready flags", as you might have already noticed. Many Thanks and Best Regards Ingo Oeser, the down{_interruptible,}_timeout() implementation of Linux :-) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 1/1] MM: detach_vmas_to_be_unmapped fix
Hi, On Wednesday, 21. February 2007, [EMAIL PROTECTED] wrote: > > --- > > mm/mmap.c |4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff -puN mm/mmap.c~Avoiding-mmap-fragmentation_fixup mm/mmap.c > --- linux-2.6_clean/mm/mmap.c~Avoiding-mmap-fragmentation_fixup > 2007-02-21 09:49:32.0 -0800 > +++ linux-2.6_clean-akuster/mm/mmap.c 2007-02-21 09:51:26.0 -0800 > @@ -1720,9 +1720,9 @@ detach_vmas_to_be_unmapped(struct mm_str > *insertion_point = vma; > tail_vma->vm_next = NULL; > if (mm->unmap_area == arch_unmap_area) > - addr = prev ? prev->vm_end : mm->mmap_base; > + addr = prev ? prev->vm_start : mm->mmap_base; > else > - addr = vma ? vma->vm_start : mm->mmap_base; > + addr = vma ? vma->vm_end : mm->mmap_base; > mm->unmap_area(mm, addr); > mm->mmap_cache = NULL; /* Kill the cache. */ > } Please comment, why you think this is necessary. Thanks & Regards Ingo Oeser - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] MTD: fix DOC2000/2001/2001PLUS build error
On Monday, 5. February 2007, Linus Torvalds wrote: > So thank God for the few selects we have, and we should add a whole lot > more! But "select" is not fine grained enough. I would like to have "require", "recommend", "suggest" for feature A. require X does not work without X, but X is way down the tree e.g. ext3 and block device or how select currently is intended recommend X it is usable but uncomfortable without X, enabled per default e.g. firewalling recommends connection tracking support or NAT recommends all NAT helpers suggest X many people use A together with X, so you might be interested in enabling it, but I disabled it per default unless you said "featuritis mode" before. e.g. highmem and SMP or a network driver and NAPI. That is what the Debian/Ubuntu package management does and maybe other too. And this also gives us new keywords to replace select with, so migration is doable :-) This would also make "EMBEDDED" superflous, because it would just mean "disable anything not required". And this would enable an individual tree for the users current configuration problem instead of a global one. Regards Ingo "and tomorrow we change the world" Oeser :-) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH -mm 3/10][RFC] aio: use iov_length instead of ki_left
On Tuesday, 16. January 2007 06:37, Nate Diller wrote: > On 1/15/07, Christoph Hellwig <[EMAIL PROTECTED]> wrote: > > On Mon, Jan 15, 2007 at 05:54:50PM -0800, Nate Diller wrote: > > > Convert code using iocb->ki_left to use the more generic iov_length() > > > call. > > > > No way. We need to reduce the numer of iovec traversals, not adding > > more of them. > > ok, I can work on a version of this that uses struct iodesc. Maybe > something like this? > > struct iodesc { > struct iovec *iov; > unsigned long nr_segs; > size_t nbytes; > }; > > I suppose it's worth doing the iodesc thing along with this patchset > anyway, since it'll avoid an extra round of interface churn. What about this instead struct iodesc { struct iovec *iov; unsigned long nr_segs; unsigned long seg_limit; size_t nr_bytes; }; That will enable resizeable iodescs with partial completion state and will enable successive filling of an iodesc with iovs. This will be needed anyway. I built an complete short userspace module for that already. I can post and GPLv2 it somewhere, if people are interested. Regards Ingo Oeser - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 31/59] sysctl: C99 convert the ctl_tables in arch/mips/au1000/common/power.c
Hi Eric, On Tuesday, 16. January 2007 17:39, Eric W. Biederman wrote: > diff --git a/arch/mips/au1000/common/power.c b/arch/mips/au1000/common/power.c > index b531ab7..31256b8 100644 > --- a/arch/mips/au1000/common/power.c > +++ b/arch/mips/au1000/common/power.c > @@ -419,15 +419,41 @@ static int pm_do_freq(ctl_table * ctl, int write, > struct file *file, > + { > + .ctl_name = CTL_UNNUMBERED, > + .procname = "suspend", > + .data = NULL, > + .maxlen = 0, > + .mode = 0600, > + .proc_handler = &pm_do_suspend > + }, No need for zero initialization for maxlen. > + { > + .ctl_name = CTL_UNNUMBERED, > + .procname = "sleep", > + .data = NULL, > + .maxlen = 0, > + .mode = 0600, > + .proc_handler = &pm_do_sleep > + }, dito > + { > + .ctl_name = CTL_UNNUMBERED, > + .procname = "freq", > + .data = NULL, > + .maxlen = 0, > + .mode = 0600, > + .proc_handler = &pm_do_freq > + }, > + {} > }; dito Regards Ingo Oeser - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 45/59] sysctl: C99 convert ctl_tables in drivers/parport/procfs.c
Hi Eric, On Tuesday, 16. January 2007 17:39, Eric W. Biederman wrote: > diff --git a/drivers/parport/procfs.c b/drivers/parport/procfs.c > index 2e744a2..5337789 100644 > --- a/drivers/parport/procfs.c > +++ b/drivers/parport/procfs.c > @@ -263,50 +263,118 @@ struct parport_sysctl_table { > + { > + .ctl_name = DEV_PARPORT_BASE_ADDR, > + .procname = "base-addr", > + .data = NULL, > + .maxlen = 0, > + .mode = 0444, > + .proc_handler = &do_hardware_base_addr > + }, No need to initialize to zero or NULL. Just list any variable, which is NOT zero or NULL. > + { > + .ctl_name = DEV_PARPORT_AUTOPROBE + 1, > + .procname = "autoprobe0", > + .data = NULL, > + .maxlen = 0, > + .maxlen = 0444, > + .proc_handler = &do_autoprobe > + }, Typo here? .mode = 0444 makes mor sense. Regards Ingo Oeser - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] [DISCUSS] Make the variable NULL after freeing it.
On Monday, 1. January 2007 17:25, Andreas Schwab wrote: > Ingo Oeser <[EMAIL PROTECTED]> writes: > > Then this works, because the side effect (+20) is evaluated only once. > > It's not a side effect, it's a non-lvalue, and you can't take the address > of a non-lvalue. Just verified this. So If we cannot make it work in all cases, it will cause more problems then it will solve. So we are left with a function, which will a) only be used by janitors to provide "kfree(x); x = NULL;" with an macro KFREE(x) in all the simple cases. b) be used by developers, who are aware of the fact that reusable pointer values should set to NULL after kfree(). Doing a) and b) is "running into open doors", so doesn't prevent any error, obfuscates code more and works only sometimes. I give up here and would vote for dropping that idea then. Regards Ingo Oeser pgpcCSfafJsC7.pgp Description: PGP signature
Re: [PATCH] [DISCUSS] Make the variable NULL after freeing it.
Hi, On Monday, 1. January 2007 07:37, Amit Choudhary wrote: > --- Ingo Oeser <[EMAIL PROTECTED]> wrote: > > #define kfree_nullify(x) do { \ > > if (__builtin_constant_p(x)) { \ > > kfree(x); \ > > } else { \ > > typeof(x) *__addr_x = &x; \ Ok, I should change that line to typeof(x) *__addr_x = &(x); \ > > kfree(*__addr_x); \ > > *__addr_x = NULL; \ > > } \ > > } while (0) > > > > Regards > > > > Ingo Oeser > > > > This is a nice approach but what if someone does kfree_nullify(x+20). Then this works, because the side effect (+20) is evaluated only once. AFAIK __builtin_constant_p() and typeof() are both free of side effects. > I decided to keep it simple. If someone is calling kfree_nullify() with > anything other than a > simple variable, then they should call kfree(). kfree_nullify() has to replace kfree() to be of any use one day. So this is not an option. Anybody thinking of "Hey, this must be NULL afterwards!", will set it to NULL himself. Anybody else doesn't know or care about it, which is the case we like to catch. > But definitely an approach that takes care of all > situations is the best but I cannot think of a macro that can handle all > situations. The simple > macro that I sent earlier will catch all the other usage at compile time. The problems I see are: 1. parameter to kfree is a value not a pointer -> solved by using a macro instead of function, but generate new (the other) problems -> take the address of the value there. 2. possible side effects of macro parameter usage -> solved by assigning once only and using typeof 3. Constants don't have an address -> need to check for constant So apart from missing braces before taking the address, I don't see any problem with my solution :-) Should I send a patch? > Please let me know if I have missed something. I reviewed it and you missed side effects (kfree(x); x = NULL). Regards Ingo Oeser - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] [DISCUSS] Make the variable NULL after freeing it.
On Sunday, 31. December 2006 14:38, Bernd Petrovitsch wrote: > That depends on the decision/definition if (so called) "double free" is > an error or not (and "free(NULL)" must work in POSIX-compliant > environments). A double free of non-NULL is certainly an error. So the idea of setting it to NULL is ok, since then you can kfree the variable over and over again without any harm. It is just complicated to do this side effect free. Maybe one should check for builtin-constant and take the address, if this is not an builtin-constant. sth, like this #define kfree_nullify(x) do { \ if (__builtin_constant_p(x)) { \ kfree(x); \ } else { \ typeof(x) *__addr_x = &x; \ kfree(*__addr_x); \ *__addr_x = NULL; \ } \ } while (0) Regards Ingo Oeser - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/8] KVM: Implement a few system configuration msrs
Hi, On Thursday, 28. December 2006 11:11, Avi Kivity wrote: > Index: linux-2.6/drivers/kvm/svm.c > === > --- linux-2.6.orig/drivers/kvm/svm.c > +++ linux-2.6/drivers/kvm/svm.c > @@ -1068,6 +1068,9 @@ static int emulate_on_interception(struc > static int svm_get_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 *data) > { > switch (ecx) { > + case 0xc0010010: /* SYSCFG */ > + case 0xc0010015: /* HWCR */ > + case MSR_IA32_PLATFORM_ID: > case MSR_IA32_P5_MC_ADDR: > case MSR_IA32_P5_MC_TYPE: > case MSR_IA32_MC0_CTL: What about just defining constants for these? Then you can rip out these comments. Same for linux-2.6/drivers/kvm/vmx.c Regards Ingo Oeser - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 009 of 14] knfsd: SUNRPC: teach svc_sendto() to deal with IPv6 addresses
On Wednesday, 13. December 2006 00:59, NeilBrown wrote: > diff .prev/net/sunrpc/svcsock.c ./net/sunrpc/svcsock.c > --- .prev/net/sunrpc/svcsock.c2006-12-13 10:31:39.0 +1100 > +++ ./net/sunrpc/svcsock.c2006-12-13 10:32:15.0 +1100 > @@ -438,6 +439,47 @@ svc_wake_up(struct svc_serv *serv) > } > } > > +union svc_pktinfo_u { > + struct in_pktinfo pkti; > +#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) > + struct in6_pktinfo pkti6; > +#endif > +}; > + > +static void svc_set_cmsg_data(struct svc_rqst *rqstp, struct cmsghdr *cmh) > +{ > + switch (rqstp->rq_sock->sk_sk->sk_family) { > + case AF_INET: > + do { > + struct in_pktinfo *pki = > + (struct in_pktinfo *) CMSG_DATA(cmh); struct in_pktinfo *pki = CMSG_DATA(cmh); Ugly casting not needed here, since CMSG_DATA should return "void *", which can be casted to any pointer. > + > + cmh->cmsg_level = SOL_IP; > + cmh->cmsg_type = IP_PKTINFO; > + pki->ipi_ifindex = 0; > + pki->ipi_spec_dst.s_addr = rqstp->rq_daddr.addr.s_addr; > + cmh->cmsg_len = CMSG_LEN(sizeof(*pki)); > + } while (0); > + break; > +#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) > + case AF_INET6: > + do { > + struct in6_pktinfo *pki = > + (struct in6_pktinfo *) CMSG_DATA(cmh); > + No casting needed, so: struct in6_pktinfo *pki = CMSG_DATA(cmh); Regards Ingo Oeser - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2.6.19] e1000: replace kmalloc with kzalloc
On Tuesday, 12. December 2006 18:34, Pekka Enberg wrote: > On 12/12/06, Yan Burman <[EMAIL PROTECTED]> wrote: > > size = txdr->count * sizeof(struct e1000_buffer); > > - if (!(txdr->buffer_info = kmalloc(size, GFP_KERNEL))) { > > + if (!(txdr->buffer_info = kzalloc(size, GFP_KERNEL))) { > > ret_val = 1; > > goto err_nomem; > > } > > - memset(txdr->buffer_info, 0, size); > > No one seems to be using size elsewhere so why not convert to > kcalloc() and get rid of it? (Seems to apply to other places as well.) Because if done properly that often exceeds the 80 column limit. The intermediate variable should be optimized away from the compiler. But kcalloc() is better for another reason: Overflow checking. Regards Ingo Oeser - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
EEPROM infrastructure (was: [PATCH] eeprom_93cx6: Add write support)
Lennart Sorensen schrieb: > On Wed, Dec 13, 2006 at 07:56:50PM +0100, Ivo van Doorn wrote: > > This patch addes support for writing to the eeprom, > > this also moves some duplicate code into seperate functions. > > > > Signed-off-by Ivo van Doorn <[EMAIL PROTECTED]> > > Thank you. I will have a try with that to see if I can get that to work > with the jsm driver. Too bad the serial drivers don't have any > geteeprom/seteeprom standard ioctl's the way ethtool does for network > devices. It might be even better to have eeprom writing infrastructure. Many device types come with eeproms today and they implement it per driver or subsystem. On embedded platforms these EEPROMs might even be shared among different devices. So it might be time to generalize this like we did with LEDs. Any comments? Regards Ingo Oeser - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] Introduce mutex_lock_timeout
Hi Matthew, On Saturday, 25. November 2006 17:32, Matthew Wilcox wrote: > In the qla case, the mutex can be acquired by a thread which then waits > for the hardware to do something. If the hardware locks up, it is > preferable that the system not hang. Ok, I looked at it (drivers/scsi/qla2xxx/qla_mbx.c) and the solution seems simple: - Introduce an busy flag, check that BEFORE this mutex_lock() and don't protect it by that mutex. - return -EBUSY to the upper layers, if mailbox still busy - upper layers can either queue the command or use a retry mechanism There are many examples for this in the kernel. NICs have the same problems (transmitter busy or stuck) and have no problem handling that gracefully since ages. > I assumed that he'd spent enough time thinking about it that fixing it > really wasn't feasible. That doesn't depend on time, just whether you get the right idea or not. Anyway I CCed the current maintainers. So my point still stands: Timeout based locking is evil and hides bugs. In this case the bugs are: 1. That mutex protects a code path (mailbox command submission and retrieve) instead of data. 2. "Mailbox is free" is an event, so you should use wait_event_timout() for that Regards Ingo Oeser - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 1/1] ipw2100: remove by-hand function entry/exit debugging
Hi Jeff, Jeff Garzik wrote: > David S. Miller wrote: > > From: Jeff Garzik <[EMAIL PROTECTED]> > > Date: Tue, 06 Sep 2005 21:51:21 -0400 > > > >>NAK. Rationale: maintainer's choice. Pavel doesn't get to choose > >>the debugger of choice for the driver maintainer. > > > > If it makes the driver unreadable and thus harder to maintain, > > I think such changes should seriously be considered. > > > > Most of the DEBUG_INFO macro usage is fine, but those "enter" > > and "exit" ones are just pure noise and should be removed. > > I find them useful in my own drivers; they are definitely not pure noise. gcc -finstrument-functions can do that completely without adding noise to the sources. been there, done that. With a gcc-patch you don't even need to resolve symbols. Regards Ingo Oeser - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][CFLART] ipmi procfs bogosity
Hi, On Thursday 01 September 2005 22:30, Alexey Dobriyan wrote: > On Thu, Sep 01, 2005 at 03:00:44PM -0500, Corey Minyard wrote: > > Plus the scanning function I wrote handles arbitrary leading and > > trailing space, etc. Not a big deal, but a little nicer. > > You can say from the beggining that > > echo -n "2 " >/proc/FUBAR > > is illegal and don't add bloat to kernel. No, user interfaces should be robust. Just remember the mantra "Be liberal in what you accept and conservative in what you send." I would suggest adding sth. like Coreys user_strtoul() to lib/string.c which would reduce bloat and security threats for the kernel. Regards Ingo Oeser pgpZueJv3kxez.pgp Description: PGP signature
Re: [PATCH] New: Omnikey CardMan 4040 PCMCIA Driver
On Sunday 04 September 2005 12:12, Harald Welte wrote: > cmx_llseek just use return nonseekable_open(inode, filp); as your last statement in cmx_open() instead of return 0; to really disable any file pointer positioning (e.g. pwrite/pread too). Addtionally cmx_llseek() is implement already as "no_llseek()" by the VFS, so you delete it from the driver an use no_llseek() from the VFS instead. Regards Ingo Oeser pgpfDmvKLYTKl.pgp Description: PGP signature
Re: [2.6 patch] lib/sort.c: small cleanups
On Saturday 03 September 2005 15:25, Adrian Bunk wrote: > This patch contains the following small cleanups: > - make two needlessly global functions static > - every file should #include the header files containing the prototypes > of it's global functions While this is a nice cleanup, does anybody remember, why the inner loops are duplicated in the source? If there are no arguments for it, I would like to consolidate them to a function or a define, if they share to much state. Or is the duplicate just considered cleaner? Regards Ingo Oeser - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] acpi: Handle cpu_index greater than 256 properly in processor_core.c
Hi Venkatesh, On Saturday 27 August 2005 02:07, Venkatesh Pallipadi wrote: > Fix convert_acpiid_to_cpu function to handle cpu_index greater than 256. This > patch also prevents a warning in IA64 cross-compile of this file > (drivers/acpi/processor_core.c:517: warning: comparison is always false due > to limited range of data type). Why don't you just change the datatype to "unsigned int" and the return failure value to NR_CPUS? That reduces the code changes and leaves the code quite clear. It should also reduce compiled code size by some bytes, but I'm not sure about that one. Regards Ingo Oeser pgpjbXsBpz0gN.pgp Description: PGP signature
Re: [PATCH] Use sg_init_one where appropriate
Hi David, I appreciate your work on unifying common code, but have some comments. On Saturday 27 August 2005 02:33, David Härdeman wrote: > The same code as in sg_init_one can be found in a number of places, this > patch changes them to call the function instead. > Index: linux-sginitone/include/linux/scatterlist.h > === > --- linux-sginitone.orig/include/linux/scatterlist.h 2005-03-02 > 08:38:32.0 +0100 > +++ linux-sginitone/include/linux/scatterlist.h 2005-08-27 > 00:20:53.0 +0200 > @@ -1,8 +1,9 @@ > #ifndef _LINUX_SCATTERLIST_H > #define _LINUX_SCATTERLIST_H > > -static inline void sg_init_one(struct scatterlist *sg, > -u8 *buf, unsigned int buflen) > +static inline void sg_init_one(const struct scatterlist *sg, > +const u8 *buf, > +const unsigned int buflen) > { > memset(sg, 0, sizeof(*sg)); > In short: please remove all "const" markers from the function, try to uninline it somewhere and resend. Explanation: If this compiles without any warning, then your compiler is clearly broken. You promise to not modify the memory pointed to by "sg" and set it to zero then? You also assign buflen to a variable, which voids the "const" attribute anyway. For "buf" this is also wrong. The memory pointed to it will be assigned to a variable whose modification you cannot control. And while you are at it, please check, wether this can be uninlined, since it does a lot of things and is called from quite some sites then. Regards Ingo Oeser pgpmUfxLaepzk.pgp Description: PGP signature
Re: Multiple virtual address mapping for the same code on IA-64 linux kernel.
Hi, On Friday 19 August 2005 00:18, George Anzinger wrote: > Not to say that is wrong but just to make it clear that saying the > itanium speed is is like saying that a cummings diesel is fast with > out saying what sort of car/truck it is mounted in. Yes, esp. since we all known that the fastest diesel is actually Vin Diesel :-) Have Fun! Ingo Oeser pgpCokNmuRJzI.pgp Description: PGP signature
Re: Environment variables inside the kernel?
Hi Guillermo, On Thursday 18 August 2005 17:44, Guillermo López Alejos wrote: > I have a piece of code which uses environment variables. I have been > told that it is not going to work in kernel space because the concept > of environment is not applicable inside the kernel. > > I belive that, but I need to demonstrate it. I do not know how to > proof this, perhaps referring to a solid reference about Linux design > that points to the idea that it has no sense to use environment > variables in kernel space. The Linux kernel is technically one big process with lots of threads. An environment variable is per process and is usally to be threated read only within it. Also the Linux kernel is the first "process" ever. Who should set up it's environment variables? That's why there are none. These arguments are no real proof in a mathematical sense, but should help you argumenting. Regards Ingo Oeser pgpQQpmbZAcFx.pgp Description: PGP signature
Re: [PATCH] Fix mmap_kmem (was: [question] What's the difference between /dev/kmem and /dev/mem)
Hi Andi, On Friday 12 August 2005 18:54, Andi Kleen wrote: > Acessing vmalloc in /dev/mem would be pretty awkward. Yes it doesn't > also work in mmap of /dev/kmem, but at least in read/write. > There are quite a lot of scripts that use it for kernel debugging > like dumping variables. And for that you really want to access modules > and vmalloc. And it's much easier to parse than /proc/kcore Perfect! So it should be under CONFIG_DEBUG_KERNEL and default to off. So you can still debug and we raise the bar higher for rootkits, if they are the only other user. Too simple? Regards Ingo Oeser pgpciqwUeESwg.pgp Description: PGP signature
Re: [PATCH] ARCH_HAS_IRQ_PER_CPU avoids dead code in __do_IRQ()
Hi Karsten, On Sunday 07 August 2005 12:25, Karsten Wiese wrote: > With my proposal the > #if defined(ARCH_HAS_IRQ_PER_CPU) > > #endif > lets readers of __do_IRQ() immediately grasp: > "this block might not be compiled / depends an ARCH" > And you'll get compile error's using IRQ_PER_CPU on ie i386, > letting you immediately know, > that you've got to change something to be able to use IRQ_PER_CPU. > > That are advantages I think. That's a valid argument. But an if is an if for the reader. It is a conditional he has to be aware of and it usally has no idention, if it is just inside "#if" instead of "if ()". I have seen people seen missing "#if 0" [1] around code while reading it. Missing an normal if () is harder with proper idention. A normal conditional has also the advantage, that the compiler checks the code for syntactic and some semantic errors within it. In an "#if 0" you can basically write any plain text[2] and any error will go undetected, until it becomes an "#if 1". Since your define is true for most compilations out there, this argument is not very strong. Last argument: Many kernel developers -- including Linus -- don't like "#if" in C files and prefer them in headers. Their reasons might be similiar to my own. Regards Ingo Oeser [1] Let's just consider the values of the pre-processor symbols here, ok? [2] Pavel Machek used this already to combine Makefile and C file :-) pgpS2eoyKJboQ.pgp Description: PGP signature
Re: [PATCH] ARCH_HAS_IRQ_PER_CPU avoids dead code in __do_IRQ()
Hi Karsten, On Saturday 06 August 2005 18:14, Karsten Wiese wrote: > From: Karsten Wiese <[EMAIL PROTECTED]> > > IRQ_PER_CPU is not used by all architectures. > To avoid dead code generation in __do_IRQ() > this patch introduces the macro ARCH_HAS_IRQ_PER_CPU. > > ARCH_HAS_IRQ_PER_CPU is defined by architectures using > IRQ_PER_CPU in their > include/asm_ARCH/irq.h > file. Why not the other way around? Just define IRQ_PER_CPU to 0 on architectures not needing it and add a FAT comment there, that this disables it. Or make it a config option. Then just leave the code as is and let GCC optimize the dead code away without any changes in the C file. It works, I just checked it ;-) Regards Ingo Oeser pgpjkl6LFAYYy.pgp Description: PGP signature
Re: [patch 3/5] Driver core: Documentation: use snprintf and strnlen
Hi Domen, On Sunday 31 July 2005 13:12, [EMAIL PROTECTED] wrote: > From: Jan Veldeman <[EMAIL PROTECTED]> > Documentation should give the good example of using snprintf and > strnlen in stead of sprintf and strlen. > > PAGE_SIZE is used as the maximal length to reflect the behaviour of > show/store. The whole part of the Documentation is obsoleted by the fact, that struct device has no structure member called "name". People hacking sysfs should also try to hack the docu to match or at least remove the obsolete parts of it. So you can drop this patch altogether, I think. Regards Ingo Oeser pgpyZkeZGy25T.pgp Description: PGP signature
Re: Average instruction length in x86-built kernel?
Hi Karim, On Friday 29 July 2005 23:32, Karim Yaghmour wrote: > Googling around, I can find references claiming that the average > instruction length on x86 is anywhere from 2.7 to 3.5 bytes, but I > can't find anything studying Linux specifically. This is not that hard to find out yourself: Just study the output od objdump -d and average the differences of the first hex number in a line printed, which are followed by a ":" e.g. scripts/kconfig/mconf.o: file format elf32-i386 Disassembly of section .text: : 0: 83 ec 1csub$0x1c,%esp 3: 8d 44 24 10 lea0x10(%esp),%eax 7: 89 44 24 08 mov%eax,0x8(%esp) so avg(3-7, 3-0) = 2.5 and so on... Happy analyzing! Regards Ingo Oeser pgpQz9Sa4VgH3.pgp Description: PGP signature
Re: [patch 1/1] Audit return code of create_proc_*
Hi Domen, On Friday 15 July 2005 00:19, you wrote: > Audit return of create_proc_* functions. This (and related changes) spam the log, if kernel is compiled without /proc-support. Kernels without /proc-support are quite common in the embedded world. Just provide a function in a suitable header (include/linux/proc_fs.h looks promising) file, which contains the following: #ifdef CONFIG_PROC_FS #define procfs_failure(msg) do { printk(msg); } while(0) #else #define procfs_failure(msg) do {} while(0) #endif and use it instead of the direct printk call. That way you get both: Your GCC or checking tool warning is silenced and the log is not spammed for the embedded people. For code, which is broken without procfs, the code should be fixed or it should select PROC_FS in its Kconfig file. Regards Ingo Oeser pgp4s9qFpX6R1.pgp Description: PGP signature
Re: kernel guide to space
On Monday 11 July 2005 17:44, Dmitry Torokhov wrote: > >Descendant must be indented at least to the level of the innermost > >compound expression in the parent. All descendants at the same level > >are indented the same. > >if (foobar(.) + barbar * foobar(bar + > >foo * > > > > oof)) { > >} > > Ugh, that's as ugly as it can get... Something like below is much > easier to read... > > if (foobar(.) + > barbar * foobar(bar + foo * oof)) { > } Even easier is if (foobar(.) + barbar * foobar(bar + foo * oof)) { } since a statement cannot start with binary operators and as such we are SURE that there must have been something before. And this matches with old shop owner calculations like: 1 + 2 + 3 ---- 6 which we all know since early math classes. Regards Ingo Oeser pgpC5TxreXsJl.pgp Description: PGP signature
Re: [PATCH] add securityfs for all LSMs to use
Hi Greg, On Wednesday 06 July 2005 10:17, Greg KH wrote: > + * TODO: > + * I think I can get rid of these default_file_ops, but not quite sure... > + */ > +static ssize_t default_read_file(struct file *file, char __user *buf, > + size_t count, loff_t *ppos) > +{ > + return 0; > +} > + > +static ssize_t default_write_file(struct file *file, const char __user *buf, > +size_t count, loff_t *ppos) > +{ > + return count; > +} Yes, you can get rid of both, if you move read_null and write_null from drivers/char/mem.c to fs/libfs.c and export them. But for what do you need a successful dummy read/write? Regards Ingo Oeser pgpsV5J7kgPff.pgp Description: PGP signature
Re: [ARM] Group device drivers together under their own menu
Hi, Randy.Dunlap wrote: > The real problem AFAICT is that Networking options > includes some protocols and then Network Device Support > includes some other protocols. Maybe if there was a Network Protocol > section things could be clearer. ?? I would really welcome that change. Regards Ingo Oeser - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Releasing resources with children
Russell King wrote: > The only thing I'd question is whether we really need to BUG_ON() here. > ISTR Linus' policy for BUG()/BUG_ON() was only if the condition lead > directly to a filesystem-corrupting bug. I consider it quite effective to flag interface violations. Programming by contract anyone? ;-) Regards Ingo Oeser - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2.6] fix mprotect() with len=(size_t)(-1) to return -ENOMEM
Hi Arjan, You wrote: > shouldn't we just fix the alignment code instead that the overflow case > doesn't align to 0??? > that sounds really odd. How? You have to align and you are out of bits for representing the next number. What is the next number you can round to? "null" right! Just remember that integer math with limited bits is always ring math ;-) I love to abuse this for buffers and save an if. Regards Ingo Oeser - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PPC64] Allow emulation of mfpvr on ppc64 kernel
David Gibson wrote: > Andrew, please apply. > > Allow userspace programs on ppc64 to use the (privileged) mfpvr > instruction to determine the processor type. At the moment it > emulates the instruction to provide the real PVR value, though it > could be made to lie in future if for some reason we wish to restrict > what CPU features userspace uses. Why not putting the required information into the AUX table when executing your ELF programs? I loved this feature in the ix86 arch. Regards Ingo Oeser - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Xterm Hangs - Possible scheduler defect?
Chris Friesen wrote: > Ingo Oeser wrote: > > Stupid applications can starve other applications for a while, but not > > forever, because the kernel is still running and deciding. > > Not so. > > > > task 1: sched_rr, priority 1, takes mutex > task 2: sched_rr, priority 2, cpu hog, infinite loop > task 3: sched_rr, priority 99, tries to get mutex > > And now tasks 1 and 3 are starved forever. Arguably bad application > design, but it demonstrates a case where applications can starve other > applications. You are right. In "If a SCHED_RR process has been running for a time period equal to or longer than the time quantum, it will be put at the end of the list for its priority" I missed the "for its priority" part. You would need to change the priority of task 1 until it releases the mutex. Ideally the owner gets the maximum priority of his and all the waiters on it, until it releases his mutex, where he regains its old priority after release of mutex. But this priority elevation happens only, if he is runnable. If not, he gets his old priority back, until he is runnable. But then again you just need to grab a mutex shared with a high priority task and consume CPU. Since this behavior is not defined in POSIX AFAIK, you just have to write your applications properly or use SCHED_OTHER for CPU hogging. Regards Ingo Oeser - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Xterm Hangs - Possible scheduler defect?
Chad N. Tindel wrote: > I think what we have are the need for two levels of applications: > > 1. That which wishes to be the highest priority userspace application, and > wishes to preempt all other userspace applications. Such an application is > OK being preempted by the kernel when the kernel needs to do work. IMHO, > this should be the default behavior for any SCHED_FIFO application. If one > of these has a bug and goes CPU-bound, the worst it can do is prevent other > apps from ever using the CPU it is on. That is basically, what you do with SCHED_RR. (Be preempted after maximum quantum, even if having work to do) > 2. Applications which actually want to be the highest priority thing on > the system, including being higher than the kernel. These applications are > OK with the fact that they may cause system hangs and deadlocks, and are > careful not to shoot themselves in the foot. This is SCHED_FIFO. (Strict priority scheduling, allowed to starve anything below) So just try to use the right scheduler for your application right now, ok? If your system is busy with top priority task, why should the kernel disturb it? Things will stop anyway, if your high priority task is needing a resource, which is blocked. Than it becomes unrunnable and other tasks have chances to continue. Kernel threads are likely to execute then, because they are likely runnable then. Your task could even migrate, if a lot of kernel tasks are waiting in one CPU and your task is NOT bound to a specific CPU. So the system is not brought down, but just busy in a infortunate way. Stupid applications can starve other applications for a while, but not forever, because the kernel is still running and deciding. Regards Ingo Oeser - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Patch 4/6] Bind Mount Extensions 0.06
Hi, Herbert Poetzl wrote: > +static inline int mnt_may_unlink(struct vfsmount *mnt, struct inode *dir, > struct dentry *child) { + if (!child->d_inode) > + return -ENOENT; > + if (MNT_IS_RDONLY(mnt)) > + return -EROFS; > + return 0; > +} The argument "dir" is not used. Please remove it and fix the callers. Regards Ingo Oeser - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [ANNOUNCE] hotplug-ng 001 release
Hi, Greg KH write: > Very nice stuff. Ok, that's a good reason not to get rid of these > files, although they can be generated on the fly from the modules > themselves (like depmod does it.) Time to resurrect modinfo? ;-) Didn't we plan to get rid of that, too? If we like to use information from modules, there should be a scriptable tool to extract this kind of information, otherwise it will be a bitch to maintain those tools. Regards Ingo Oeser - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.11-rc3: Kylix application no longer works?
Hi, Rik van Riel wrote: > On Wed, 9 Feb 2005, Daniel Jacobowitz wrote: > > On Tue, Feb 08, 2005 at 06:10:18PM -0800, Andrew Morton wrote: > > It's asking for a lot of unwritable zeroed space. See this: > >> LOAD 0x00 0x08048000 0x08048000 0xb7354 0x1b7354 R E > >> 0x1000 LOAD 0x0b7354 0x08200354 0x08200354 0x1e3e4 0x1f648 RW > >> 0x1000 > > > > clear_user's probably not the right way to provide the extra zeroing. > > Indeed, clear_user() refuses to zero data when it's not writable > to the user process ... So if the application wants an read only range of zeroed pages, why not just map the ZERO_PAGE() multiple times there? I can imagine _valid_ uses for that (templates for zero intitialized data), although there are _better_ ways to do that. Regards Ingo Oeser - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] device-mapper: multipath hardware handler
Hi Alasdair, Alasdair G Kergon wrote: > +/* > + * Constructs a hardware handler object, takes custom arguments > + */ > +typedef int (*hwh_ctr_fn) (struct hw_handler *hwh, unsigned arc, char > **argv); +typedef void (*hwh_dtr_fn) (struct hw_handler *hwh); > + > +typedef void (*hwh_pg_init_fn) (struct hw_handler *hwh, unsigned bypassed, > +struct path *path); > +typedef unsigned (*hwh_err_fn) (struct hw_handler *hwh, struct bio *bio); > +typedef int (*hwh_status_fn) (struct hw_handler *hwh, > + status_type_t type, > + char *result, unsigned int maxlen); > + > +/* Information about a hardware handler type */ > +struct hw_handler_type { > + char *name; > + struct module *module; > + > + hwh_ctr_fn ctr; > + hwh_dtr_fn dtr; > + > + hwh_pg_init_fn pg_init; > + hwh_err_fn err; > + hwh_status_fn status; > +}; Please loose the prototypes, don't use prefixes/suffixes and use more descriptive names. Reasons are in Documentation/CodingStyle, Chapter 4. So I suggest declaring it like this: struct hardware_handler_operations { char *name; struct module *module; int (*create) (struct hw_handler *handler, unsigned int argc, char **argv); void (*destroy) (struct hw_handler *handler); void (*pg_init) (struct hw_handler *handler, unsigned int bypassed, struct path *path); unsigned (*error) (struct hw_handler *hwh, struct bio *bio); int (*status) (struct hw_handler *hwh, status_type_t type, char *result, unsigned int maxlen); }; But you might want to loose status_type_t, too. Also hw_foo is a bit generic, isn't it? We are all dealing with "hardware" in any driver (which is basically another word for "hardware handler"). So please be a bit more creative on WHAT you drive. Regards Ingo Oeser - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Re: msdos/vfat defaults are annoying
Michelle Konzack schrieb: > Am 2005-02-07 09:47:09, schrieb Pozsár Balázs: > > See? I _have_ that patch applied, that's why it tried vfat and not msdos > > first. > > With this, you will nerver mount a Filesystem "msdos". > > Because "vfat" IS "msdos" + "lfn". > > You can attach to ALL "msdos" media "lfn" and you will have "vfat". So msdos is vfat WITHOUT lfn, which is a a restriction like noatime or mounting ext3 as ext2. That's why the default should be vfat indeed and the restriction should be "nolfn", which will not allow lfns to be created and is what you actually intend, right? But this will break API today, so it should be added to list of features that will change. Regards Ingo Oeser - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Make pipe data structure be a circular list of pages, rather
Hi Linus, Linus Torvalds wrote: > +static long do_splice_from(struct inode *pipe, struct file *out, size_t len, > unsigned long flags) > +static long do_splice_to(struct file *in, struct inode *pipe, size_t len, > unsigned long flags) > +static long do_splice(struct file *in, struct file *out, size_t len, > unsigned long flags) > +asmlinkage long sys_splice(int fdin, int fdout, size_t len, unsigned long > flags) That part looks quite perfect. As long as they stay like this, I'm totally happy. I have even no problem about limiting to a length, since I can use that to measure progress (e.g. a simple progress bar). This way I also keep the process as an "actor" like "[EMAIL PROTECTED]" pointed out. It has unnecessary scheduling overhead, but the ability to stop/resume the transfer by killing the process doing it is worth it, I agree. So I would put a structure in the inode identifying the special device and check, whether the "in" and "out" parameters are from devices suitable for a direct on wire transfer. If they are, I just set up some registers and wait for the transfer to happen. Then I get an interrupt/wakeup, if the requested amount is streamed, increment some user space pointers, switch to user space, user space tells me abort or stream more and I follow. Continue until abort by user or streaming problems happen. Just to give you an idea: I debugged such a machine and I had a hard hanging kernel with interrupts disabled. It still got data from a tuner, through an MPEG decoder, an MPEG demultiplexer and played it to the audio card. Not just a buffer like ALSA/OSS, but as long as I would like and it's end to end without any CPU intervention. That behavior would be perfect, but I could also live with a "pushing process". Regards Ingo Oeser - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: How does ramfs actually fills the page cache with data?
On Fri, Jun 22, 2001 at 05:45:27PM -0400, Ho Chak Hung wrote: > In fs/ramfs/inode.c, how does ramfs actually fills the page > cache with data? In the readpage operation, it only zero-fill > the page if it didn't already exist in the page cache. However, > how do I actually fill the page with data? The page cache does it itself. "readpage" is to move pages from the backing store into the page cache. "writepage" and friends is for updating the backing store with the contents of the page cache. There is no real backing store of ramfs, since ramfs data lives completly in page cache. But we cannot give the user random memory contents, so we zero it out on readpage and prepare_write. The data is copied with copy_{from,to}_user in the generic file operations (look how ramfs_file_operations is defined and look at the functions referenced), which read/write through page cache. Regards Ingo Oeser -- Use ReiserFS to get a faster fsck and Ext2 to fsck slowly and gently. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: SMP spin-locks
On Thu, Jun 14, 2001 at 05:05:07PM -0400, Richard B. Johnson wrote: > The problem is that a data acquisition board across the PCI bus > gives a data transfer rate of 10 to 11 megabytes per second > with a UP kernel, and the transfer drops to 5-6 megabytes per > second with a SMP kernel. The ISR is really simple and copies > data, that's all. > > The 'read()' routine uses a spinlock when it modifies pointers. > > I started to look into where all the CPU clocks were going. The > SMP spinlock code is where it's going. There is often contention > for the lock because interrupts normally occur at 50 to 60 kHz. Then you need another (better?) queueing mechanism. Use multiple queues and a _overflowable_ sequence number as global variable between the queues. N Queues (N := no. of CPUs + 1), which have a spin_lock for each queue. optionally: One reader packet reassembly priority queue (APQ) ordered by sequence number (implicitly or explicitly), if this shouldn't be done in user space. In the writer ISR: Foreach Queue in RR order (start with remebered one): - Try to lock it with spin_trylock (totally inline!) + Failed * if we failed to find a free queue for x "rounds", disable device (we have no reader) and notify user space somehow * increment "rounds" * next queue + Succeed * Increment sequence number * Put data record into queue (* remember this queue as last queue used) (* mark queue "not empty") * do other IRQ work... In the reader routine: Foreach Queue in RR order (start with remebered one): - No data counter above threshold -> EAGAIN [1] - Try to lock it with spin_trylock (totally inline!) + Failed -> next queue + Succeed * if queue empty, unlock and try next one (* remember this queue as last queue used) * Get one data record from queue (in queue order!) * Move data record into APQ * Unlock queue * Deliver as much data from the APQ, as the user wants and is available - if all queues empty or locked -> increment "no data round" counter Notes: The "last queue used" variable is static, but local to routine. It is there to decrease the number of iterations and distribute the data to all queues as more equally. Statistics about lock contention per queue, per round and per try would be nice here to estimate the number of queues needed. The APQ can be quite large, if the sequences are bad distributed and some queues tend to be always locked, if the reader wants to read from this queue. The above can be solved by 2^N "One entry queues" (aka slots) and sequence numbers mapping to this slots. If you need many slots (more then 256, I would say) then this is again inaccaptable, because of the iteration cost in the ISR. What do you think? After some polishing this should decrease lock contention noticibly. Regards Ingo Oeser [1] Blocking will be harder to implement here, since we need to notify the reader routine, that we have data available, which involves some latency you cannot afford. Maybe this could be done via schedule_task(), if needed. -- Use ReiserFS to get a faster fsck and Ext2 to fsck slowly and gently. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [isocompr PATCH]: announcing stable port to kernel 2.2.18
On Mon, Jun 11, 2001 at 10:59:44PM +0200, Pavel Machek wrote: > > The current version of the patch for 2.2.18 is very stable > > (we use it for DemoLinux [see www.demolinux.org] heavily), > > and I wonder if it could not be a good idea to see if this > > code can be folded into the official releases sometime in the > > future (I have been looking at 2.4.x code, but the new page > > cache means some changes might be needed: I will try to post > > a first version for 2.4.x soon). > > I think that 2.5.0 should be your target... It is definitely new > feature, and both 2.4.X and 2.2.X are in feature freeze. Right. And besides: HPA coded a similar patch for 2.4.x, while he fixed some issues. So you might try his work or even come to an agreement on the format. Regards Ingo Oeser -- Use ReiserFS to get a faster fsck and Ext2 to fsck slowly and gently. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: APIC problem or 3com 3c590 driver problem in smp kernel 2.4.x
On Thu, May 31, 2001 at 12:27:07PM -0400, Feng Xian wrote: > The driver for my pci device, I have the SA_SHIRQ set. What kind of PCI device do you have? I had this problem once with an PCI-Matchmaker[1] based board (for which we still have the wrong PCI-ID btw, but my patch was rejected twice...). > Actually what I am thinking it may be APIC support problem. I rebuild my > kernel to use single cpu without APIC support, my device and 3c905 both > work fine. they don't work for SMP kernel (APIC is by default enabled) > Then I configured my uni-processor kernel to enable the APIC support, my > device won't work with the 3c905, just exactly same as it behaves in the > SMP kernel. With 2.2 I also had this without APIC. I have been flooded with interrupts which have been intended for the Cyclone card (3c905B 100BaseTX), and exited the ISR quickly after querying the interrupt register of my Matchmaker board without any ACKing, but the Cyclone never got these interrupts anymore. But is doesn't seem to be a 3c905 based problem, as I have 11: 95772726 XT-PIC es1371, eth0, eth1 in /proc/interrupts where eth0 and eth1 are both Cyclones. Even the vga card has IRQ 11 assigned. So this is not really unknown ;-) Regards Ingo Oeser [1] class 0b40, vendor id: 10e8, device id: 807d -- To the systems programmer, users and applications serve only to provide a test load. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Please help me fill in the blanks.
On Sat, May 26, 2001 at 10:27:09PM -0400, Jeff Garzik wrote: > > * Service Location Protocol (SLP) www.openslp.org Regards Ingo Oeser -- To the systems programmer, users and applications serve only to provide a test load. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Dedicated Interrupt handling on SMP
On Fri, May 25, 2001 at 12:43:11PM -0400, Randy wrote: > I'm trying to find the easiest way to to deidcate one CPU to responding > to a specific Interrupt request. > That CPU should only listen for that request while all other CPU should > ignore the interrupt. cat /proc/irq/*/smp_affinity There you can select on which if the 32 CPUs Linux should handle this IRQ. Read Documentation/IRQ-affinity.txt for more. Regards Ingo Oeser -- To the systems programmer, users and applications serve only to provide a test load. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFD w/info-PATCH] device arguments from lookup, partion code
On Sun, May 20, 2001 at 12:02:35PM -0700, Linus Torvalds wrote: > The problem with ioctl's is, let me repeat, not technology. > > It's people. > > ioctl's are a way to do ugly things. That's what they have ALWAYS been. > And because of that, people don't care about following the rules - if > ioctl's followed the rules, they wouldn't _be_ ioctls in the first place, > but instead have a good interface (say, read()/write()). > > Basically, ioctl's will _never_ be done right, because of the way people > think about them. They are a back door. They are by design typeless and > without rules. They are, in fact, the Microsoft of UNIX. Yes, they are. Why? Because we cannot fit all behavior of a devices _cleanly_ into read/write/mmap/lseek. If we do, we would need different device views (which implies aliasing of devices, which HPA does not like) and it would still be not that clean, because reading from readonly gives a stream and writing gives a stream too, not particular order required until now. [good points] > Would fs/ioctl.c be an ugly mess of some special cases? Yes. But would > that make the ugliness explicit and possibly easier to try to manage and > fix? Very probably. And it would mean that driver writers could not just > say "fuck design, I'm going to do this my own really ugly way". Ok, then I give you an real world example where I idly fight with design since nearly 2 years. A free programmable DSP (or set of DSPs) with several kinds of memory and additional optional devices (like DAC/ADC, ISDN frames and sth. like that) on it. This DSP is attached via some glue logic on Parallel port, PCI, ISA or (soon to come) USB. This thingie can (once programmed) act as a data sink, data source or data processing pipe. OTOH it should be randomly accessable via debuggers and program loaders. It is also resettable/rebootable, has discontinous memory of certain kinds (possibly harvard architecture) and many more funny stuff. And it needs to upload software. I try to unify all these stuff into a "Generic Processing Device Layer" for Linux. Now I like to be shown how I should fit this into clean design that: - uses NO ioctls (Linus) - has only one device per DSP (H.P.A) - Does not emulate ioctls via read/write transactions (which I consider bogus) Theory is nice, but until someone can show me a clean design for this (admittedly heavy ;-)) example, I just don't buy your arguments. A *better* ioctl would be nice, but we still need an "catch all exceptional accesses" interface, IMNSHO. Regards Ingo Oeser -- To the systems programmer, users and applications serve only to provide a test load. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: const __init
On Sun, May 20, 2001 at 05:34:48PM -0400, Jeff Garzik wrote: > This might be a very valid point... > > (let me know if the following test is flawed) It is imho. > > [jgarzik@rum tmp]$ cat > sectest.c > > #include > > #include > > static const char version[] __initdata = "foo"; static char version2[] __initdata = "bar"; > > [jgarzik@rum tmp]$ gcc -D__KERNEL__ -I/spare/cvs/linux_2_4/include -Wall >-Wstrict-prototypes -O2 -fomit-frame-pointer -fno-strict-aliasing -pipe >-mpreferred-stack-boundary=2 -march=i686-c -o sectest.o sectest.c > > [jgarzik@rum tmp]$ > > No section type conflict appears. Now it SHOULD conflict on these binutils, but doesn't on mine (2.9.5) ;-) It is decided to put it into .data.init as expected. AFAIK "const" is only a promise to the compiler, that we write this data ONCE and read only after this initial write. So the decision on the section is implementation defined. What I don't understand is, why GCC overrides our explicit override (done by setting the "section attribute" explicitly). I would consider this a BUG in GCC. I don't understand, why we support this BUG... Maybe some GCC people can enlighten me, why GCC ignores such overrides, that are for the cases where we DO KNOW BETTER than GCC, what section is correct. Regards Ingo Oeser -- To the systems programmer, users and applications serve only to provide a test load. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] Re: Linux 2.4.4-ac10
On Sun, May 20, 2001 at 05:29:49AM +0200, Mike Galbraith wrote: > I'm not sure why that helps. I didn't put it in as a trick or > anything though. I put it in because it didn't seem like a > good idea to ever have more cleaned pages than free pages at a > time when we're yammering for help.. so I did that and it helped. The rationale for this is easy: free pages is wasted memory, clean pages is hot, clean cache. The best state a cache can be in. Regards Ingo Oeser -- To the systems programmer, users and applications serve only to provide a test load. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: VIA's Southbridge bug: Latest (pseudo-)patch
On Sat, May 19, 2001 at 05:11:30PM +0100, Alan Cox wrote: > If it had been a manufacturer in most respectable areas of business they'd be > recalling and reissuing components, and paying for the end resllers to notify > each customer This is consumer hardware. Consumer products are optimized for a good buzzword count per $ ratio. Everything else is secondary. Producing cheap stuff has its price. And being so smart an buing cheapest available has the same price. QA and recalling are expensive as hell. That's why cheap products usally have this quality tradeoff. Most consumers don't like to pay for quality. Germany has learned this lesson and thus "Made in Germany" doesn't mean anything for certain products anymore :-( Regards Ingo Oeser -- To the systems programmer, users and applications serve only to provide a test load. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFD w/info-PATCH] device arguments from lookup, partion code
On Sat, May 19, 2001 at 11:34:48AM -0700, Linus Torvalds wrote: [Reasons] > So the "English is bad" argument is a complete non-argument. Jepp, I have to agree. English is used more or less as an communication protocol in computer science and for operating computers. Once you know how to operate an computer in English, you can operate nearly every computer in the world, because they have English as default locale. Let's not repeat Babel please :-( PS: English is neither mine, nor Linus native language. Why do the English natives complain instead of us? ;-) And be glad that's not German, that has this role. English sentences are WAY easier to parse by computers, because it doesn't use much suffixes and prefixes on words and has very few exceptions. Also these exceptions are eleminated from command languages WITHOUT influencing readability and comprehensability. Regards Ingo Oeser -- To the systems programmer, users and applications serve only to provide a test load. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.4.4-ac10
On Fri, May 18, 2001 at 03:23:03PM -0300, Rik van Riel wrote: > On Fri, 18 May 2001, Ingo Oeser wrote: > > > Rik: Would you take patches for such a tradeoff sysctl? > > "such a tradeoff" ? > > While this sounds reasonable, I have to point out that > up to now nobody has described exactly WHAT tradeoff > they'd like to make tunable and why... Amount of pages reclaimed from swapout_mm() versus amount of pages reclaimed from caches. A value that says: "use XX% of my main memory for RSS of processes, even if I run heavy disk loadf now" would be nice. For general purpose machines, where I run several services but also play games, this would allow both to survive. The external services would go slower. Who cares, if some CVS updates or NFS services go slower, if I can play my favorite game at full speed? ;-) > I'm not against making things tunable, but I would like > to at least see the proponents of tunable things explain > WHAT they want tunable and exactly WHY. Ideally: Every value that the kernel decides by heuristics, because heuristics can fail to get even close to an optimal result. But this is too much. Some tunables from refill_inactive would be nice. Also the patch for honouring the soft rss limit is good (is it in?). Regards Ingo Oeser -- To the systems programmer, users and applications serve only to provide a test load. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.4.4-ac10
On Fri, May 18, 2001 at 07:45:15PM +0200, Mike Galbraith wrote: > Yes, ~exactly! I chose 30 tasks because they almost do (tool/userland > dependant.. must recalibrate often) fit. The bitch is to get the vm > to automagically detect the rss/cache munch tradeoff point without all > the manual help. What about a sysctl for that? Choose decent steps and let 0 (which is an insane value) mean "let's kernel decide" and make this default. In the past we could do this by adjusting some watermarks in /proc/sys/vm but now, we can't do anything but trust the genius kernel developers. I doubt that we can test all kinds of workload and even imagine what pervert stuff some people do with their machines. Tuning _is_ manual work. Always has been and always will be. This countinously "I know it better then you" is what I hated about Windows and now this comes more and more into Linux :-( Rik: Would you take patches for such a tradeoff sysctl? Regards Ingo Oeser -- To the systems programmer, users and applications serve only to provide a test load. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.4.4-ac10
On Thu, May 17, 2001 at 05:45:38PM +0100, Alan Cox wrote: > 2.4.4-ac10 I think someone forgot this little return. It removes the following warning: serial.c:4208: warning: control reaches end of non-void function --- linux-2.4.4-ac10/drivers/char/serial.c Thu May 17 20:41:05 2001 +++ linux-2.4.4-ac10-ioe/drivers/char/serial.c Thu May 17 20:35:53 2001 @@ -4205,6 +4205,7 @@ { __set_current_state(TASK_UNINTERRUPTIBLE); schedule_timeout(HZ/10); + return 0; } /* Regards Ingo Oeser -- To the systems programmer, users and applications serve only to provide a test load. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: cmpci sound chip lockup
On Wed, May 16, 2001 at 08:02:06PM -0300, Rik van Riel wrote: > I'm seeing a similar thing on 2.4.4-pre[23], but in a far less > serious way. Using xmms the music stops after anything between > a few seconds and a minute, I suspect a race condition somewhere. > > Using mpg123 everything works fine... Your xmms uses esd[1]? Friends of mine report problems with esd and 2.4.x. Tested on SB-Live! and es1371. Regards Ingo Oeser [1] E Sound Deamon - A sound mixing framework -- To the systems programmer, users and applications serve only to provide a test load. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: CMPXCHG
On Wed, May 16, 2001 at 03:37:00PM -0700, Scott Huang wrote: > Four adapters need to produce data to a > descriptor queue which is consumed by a > user process. A lock mechanism was implemented > to sync the adapters. However, this causes > a performance hit. Is it possible to use > CMPXCHG on Intel's i-386 to avoid the locking? What about using atomic operations for that? This is more general and works on ALL architectures. CMPXCHG is just and special atomic operation on ia32. > Where can I find some doc and some sample code? Documentation/DocBook/kernel-hacking.tmpl But better do make htmldocs in the kernel top level directory and read Documentation/DocBook/kernel-hacking/lk-hacking-guide.html instead. Sample code is scattered all around in the kernel. Regards Ingo Oeser -- To the systems programmer, users and applications serve only to provide a test load. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: LANANA: To Pending Device Number Registrants
On Wed, May 16, 2001 at 02:36:44PM -0700, H. Peter Anvin wrote: > > But all devices which export a CD-ROM interface will do so. So the > > device node that is associated with the CD-ROM driver will export > > CD-ROM semantics, and the trailing name will be "/cd". > > > > Other interfaces a device exports, such as a CD-RW, appear as a > > different device node ("generic" for SCSI, because we have no CD-RW > > classification at this point). > > > > My scheme works already, and works reliably. Nothing had to be done to > > support the CD-ROM interface to CD-RW and DVD devices. > > > > It's still completely braindamaged: (a) these interfaces aren't > disjoint. They refer to the same device, and will interfere with each > other; (b) it is highly undesirable to tie the naming to the interfaces > in this way. It further restricts the namespaces you can export, for one > thing. We do this already with ide-scsi. A device is visible as /dev/hda and /dev/sda at the same time. Or think IDE-CDRW: /dev/hda, /dev/sr0 and /dev/sg0. All at the same time. It is perfectly normal to export different interfaces for the same device. This is basically, what subfunctions on PCI do: Same device with different interfaces. Just that we do it through a driver with ide and through the hardware with a multi function PCI card. Applications don't care about devices. They care about entities that have capabilities and programming interfaces. What they _really_ are and if this is only emulated is not important. Sorry, I don't see your point here :-( Regards Ingo Oeser -- 10.+11.03.2001 - 3. Chemnitzer LinuxTag <http://www.tu-chemnitz.de/linux/tag> <<<<<<<<<<<< been there and had much fun >>>>>>>>>>>> - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: LANANA: To Pending Device Number Registrants
On Mon, May 14, 2001 at 09:33:35PM -0300, Rik van Riel wrote: > Agreed. However, if this thing means I cannot use the -linus > tree without devfs, then it will also mean my VM stuff only > gets tested on -ac kernels... No Problem. I test most of your VM stuff anyway and I use devfs on that machine ;-) PS: It's not that hard to build a machine, which can support both. E-Mail me, if you would like to know _how_ to do that. Regards Ingo Oeser -- 10.+11.03.2001 - 3. Chemnitzer LinuxTag <http://www.tu-chemnitz.de/linux/tag> <<<<<<<<<<<< been there and had much fun >>>>>>>>>>>> - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: LANANA: To Pending Device Number Registrants
On Tue, May 15, 2001 at 10:44:23AM -0700, James Simmons wrote: > different. I do plan on some day merging drm and fbdev into one interface. So > I plan to change this behavior. I like to see this interface ioctl-less > (is their such a word ???). You mmap to alter buffers. Mmap is much more > flexiable than write for graphics buffers anyways. You use write to pass > "data" to the driver. The only problem with mmap(): You cannot know, if the page changed under you a**. What would first mmap()ed page of the screen look like, if some accelerator wrote a line there? Invalidating all mmap()ed pages for each and every accelerator command would be evil. Forbidding reads of that page is evil, too. I have the same problem with DSPs, which like to mmap() some of their memory into the application, but can alter this memory every instruction the execute. mmap() has it's beauties, but ... Regards Ingo Oeser -- 10.+11.03.2001 - 3. Chemnitzer LinuxTag <http://www.tu-chemnitz.de/linux/tag> <<<<<<<<<<<< been there and had much fun >>>>>>>>>>>> - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/