Re: What still uses the block layer?
[EMAIL PROTECTED] wrote:
> On Mon, 15 Oct 2007, Stefan Richter wrote:
>> Low-level networking drivers suggest a default interface name (per
>> interface or as a template like eth%d into which the networking core
>> inserts a lowest spare number).
...
>> Could low-level SCSI drivers provide similar name templates which give a
>> hint on the transport involved?
...
> one other option that could be considered (and I do realize I'm bringing
> up flame-bait here) is that drivers that have fixed addresses could
> offer up a device name that include that address.
...

That's already implemented. :-)  Transport drivers expose transport
specific information in sysfs; udev scripts examine it and create by-id
and by-path symlinks to device files of HDDs.

Not everybody agrees, but many think that it's sensible to implement just
mechanism in kernel and leave policy to userspace.  My suggestion and the
default network interface names already violate this principle to a
degree, but it can still be implemented in a transport independent way,
and userspace can continue to create whatever names the user needs.
--
Stefan Richter
-=-=-=== =-=- =
http://arcgraph.de/sr/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
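The "template like eth%d with the lowest spare number" idea from the message above can be sketched in userspace C. This is only an illustration under stated assumptions: `alloc_name`, the `in_use` array and `max_units` are hypothetical names, not the networking core's actual interface.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch: expand a template such as "eth%d" by substituting
 * the lowest index that is not already in use. in_use[i] marks index i as
 * taken. Returns the chosen index, or -1 on error.
 */
int alloc_name(const char *tmpl, const bool *in_use, int max_units,
               char *out, size_t out_len)
{
    const char *slot = strstr(tmpl, "%d");

    if (slot == NULL)
        return -1;                      /* template has no index slot */

    for (int i = 0; i < max_units; i++) {
        if (in_use[i])
            continue;
        /* prefix before "%d", then the index, then the suffix after it */
        int n = snprintf(out, out_len, "%.*s%d%s",
                         (int)(slot - tmpl), tmpl, i, slot + 2);
        if (n < 0 || (size_t)n >= out_len)
            return -1;                  /* name would not fit */
        return i;
    }
    return -1;                          /* every index is taken */
}
```

With indices 0, 1 and 3 marked as taken, the template "eth%d" yields "eth2". A transport driver could offer a different template while userspace stays free to add its own symlinks on top.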
Re: [PATCH try #3] Input/Joystick Driver: add support AD7142 joystick driver
On 10/16/07, Dmitry Torokhov <[EMAIL PROTECTED]> wrote:
> On 10/15/07, Bryan Wu <[EMAIL PROTECTED]> wrote:
> > On 10/15/07, Dmitry Torokhov <[EMAIL PROTECTED]> wrote:
> > >
> > > Completion is just not a good abstraction here... Please use work
> > > abstraction and possibly a separate workqueue.
> >
> > Yes, I agree with you now, although I have a little concern about the
> > possibility of big delay introduced by workqueue.
> >
>
> Having a separate workqueue should isolate the driver from users
> hogging keventd. Otherwise the speed should be pretty much the same as
> with a kthread.
>

Does this driver need to create a new kthread instead of using keventd?
I think keventd might be sufficient for this driver.

> >
> > Thanks a lot for your kind review.
> > I will resend update patch later.
>
> Thank you for not getting frustrated with all my change requests.

Oh, your help is very useful. It encourages us to send out our drivers to LKML.

> Btw,
> blackfin keypad driver is in my tree and should be in mainline once
> Linus does the pull I requested.
>

Thanks again, I noticed it was merged already.

-Bryan Wu
Re: [patch 0/2] Protect crashkernel against BSS overlap
On Mon, Oct 15, 2007 at 01:50:42PM +0200, Bernhard Walle wrote:
> I observed the problem that even when you choose the default 16M as
> crashkernel base address and the kernel is very big, the reserved area may
> overlap with the kernel BSS. Currently, this is not checked at runtime, so the
> kernel just crashes when you load the panic kernel in the sys_kexec call.
>
> This two patches check this at runtime. The patches are against current git,
> but with the patches
>
>     extended-crashkernel-command-line.patch
>     extended-crashkernel-command-line-update.patch
>     extended-crashkernel-command-line-comment-fix.patch
>     extended-crashkernel-command-line-improve-error-handling-in-parse_crashkernel_mem.patch
>     use-extended-crashkernel-command-line-on-i386.patch
>     use-extended-crashkernel-command-line-on-i386-update.patch
>     use-extended-crashkernel-command-line-on-x86_64.patch
>     use-extended-crashkernel-command-line-on-x86_64-update.patch
>     use-extended-crashkernel-command-line-on-ia64.patch
>     use-extended-crashkernel-command-line-on-ia64-fix.patch
>     use-extended-crashkernel-command-line-on-ia64-update.patch
>     use-extended-crashkernel-command-line-on-ppc64.patch
>     use-extended-crashkernel-command-line-on-ppc64-update.patch
>     use-extended-crashkernel-command-line-on-sh.patch
>     use-extended-crashkernel-command-line-on-sh-update.patch
>
> from -mm tree applied since they are marked to be merged in 2.6.24.
>
> I know that the implementation of both patches is only x86 (i386 and x86-64),
> but if you agree that it's the way to go, I can add the BSS resource
> and the check for all other architectures that apply.
>

Hi Bernhard,

Shouldn't the bootmem allocator have the functionality to flag an error if
we try to reserve memory which is already reserved? I see that the bootmem
allocator currently prints a warning under CONFIG_DEBUG_BOOTMEM.

Wouldn't it be better if we also reserved the code, data and bss memory
using the bootmem allocator, so that when somebody tries to reserve
crashkernel memory and there is an overlap, the boot memory allocator
screams?

In the second patch, you are checking for crash kernel reserved memory being
beyond _end. That will make sure that there is no overlap with kernel
text, data or bss. I am wondering, then, why we need the first patch and why
we should register bss memory in the resources list. The second patch would
make sure that there is no overlap with crash kernel memory, and kexec will
not place any segment outside crashkernel memory.

Thanks
Vivek
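The two checks discussed in this message (a general overlap test, and "the reservation lies beyond _end") can be sketched in plain C. This is a userspace illustration only: `struct region`, `regions_overlap` and `crash_region_ok` are hypothetical names, and `kernel_end` merely stands in for the `_end` symbol mentioned in the thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Half-open physical address range [start, end). */
struct region {
    uint64_t start;
    uint64_t end;
};

/* Two half-open ranges overlap iff each one starts before the other ends. */
bool regions_overlap(struct region a, struct region b)
{
    return a.start < b.end && b.start < a.end;
}

/*
 * Sketch of the second check: a crash kernel reservation is acceptable only
 * if it begins at or after the end of the running kernel image (text, data
 * and bss). kernel_end plays the role of _end here.
 */
bool crash_region_ok(struct region crash, uint64_t kernel_end)
{
    return crash.start >= kernel_end;
}
```

For a kernel image spanning 1M to 18M, a reservation at the default 16M base overlaps and is rejected, while one starting at 32M passes both tests.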
2.6.23-git8 missing include file
Hi,

The build fails with following error

  CC      drivers/usb/storage/scsiglue.o
  CC      drivers/usb/storage/protocol.o
  CC      drivers/usb/storage/transport.o
In file included from drivers/usb/storage/transport.c:53:
include/scsi/scsi_eh.h:79: error: field 'sense_sgl' has incomplete type
make[3]: *** [drivers/usb/storage/transport.o] Error 1
make[2]: *** [drivers/usb/storage] Error 2
make[1]: *** [drivers/usb] Error 2
make: *** [drivers] Error 2

This patch fixes the build failure

--- linux-2.6.23/include/scsi/scsi_cmnd.h 2007-10-16 09:58:30.0 +0530
+++ linux-2.6.23/include/scsi/~scsi_cmnd.h 2007-10-16 10:20:32.0 +0530
@@ -5,6 +5,7 @@
 #include
 #include
 #include
+#include

 struct request;
 struct scatterlist;
--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.
Re: [patch] forcedeth: fix the NAPI poll function
Ingo Molnar wrote:
> * Jeff Garzik <[EMAIL PROTECTED]> wrote:
>> Two comments:
>>
>> 1) we have a vague definition of "RX work processed." Due to error
>> conditions and goto's in that function, rx_processed_cnt may or may
>> not equal the number of packets actually processed.
>>
>> 2) man I dislike these inline C statement combinations (ranting at
>> original code style, not you). I would much rather waste a few extra
>> lines of source code and make the conditions obvious:
>>
>>     while (... && (rx_processed_cnt < limit)) {
>>         rx_processed_cnt++;
>>
>>         ...
>>     }
>>
>> or even
>>
>>     while (1) {
>>         ...
>>         if (rx_processed_cnt == limit)
>>             break;
>>         rx_processed_cnt++;
>>     }
>>
>> The compiler certainly doesn't care, and IMO it prevents bugs.
>
> agreed. Do you have an uptodate patch/git-URI for the forcedeth rewrite
> you did? I can throw it into the testbed.

Branch 'fe-lock' of
git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git

It works here locally, but at this very minute I am rewriting those
changesets yet again :)

    Jeff
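The loop style argued for above (budget test and counter increment as separate, obvious statements) might look like the following userspace sketch. It is not forcedeth code: `struct rx_desc`, `rx_poll` and the `delivered` counter are hypothetical, and the separate counter is one assumed way to make "work processed" unambiguous when some frames are dropped.

```c
#include <stddef.h>

/* Hypothetical RX descriptor: ready means the NIC filled it; bad means the
 * frame had an error and is dropped instead of being delivered. */
struct rx_desc {
    int ready;
    int bad;
};

/*
 * Consume at most `limit` ready descriptors from the front of the ring.
 * Returns the number of descriptors consumed; *delivered receives the
 * number of good frames, so the two figures cannot be confused.
 */
int rx_poll(struct rx_desc *ring, size_t ring_len, int limit, int *delivered)
{
    int rx_processed_cnt = 0;
    size_t i = 0;

    *delivered = 0;
    while (i < ring_len && ring[i].ready) {
        if (rx_processed_cnt == limit)
            break;                  /* budget exhausted, leave the rest */
        rx_processed_cnt++;

        if (!ring[i].bad)
            (*delivered)++;         /* only good frames reach the stack */
        ring[i].ready = 0;          /* hand the descriptor back */
        i++;
    }
    return rx_processed_cnt;
}
```

With four ready descriptors, one of them bad, and a limit of 3, the function consumes three descriptors, delivers two frames, and leaves the fourth descriptor untouched for the next poll.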
Re: linux-2.6.23-mm1 crashed
On 10/14/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
> I didn't notice that qemu was involved. Does qemu have an emulator for the
> gdth hardware?
>

I don't think so. The kernel just probes whether the hardware exists, and
hangs after that.
Re: [patch] forcedeth: fix the NAPI poll function
* Jeff Garzik <[EMAIL PROTECTED]> wrote:

> Two comments:
>
> 1) we have a vague definition of "RX work processed." Due to error
> conditions and goto's in that function, rx_processed_cnt may or may
> not equal the number of packets actually processed.
>
> 2) man I dislike these inline C statement combinations (ranting at
> original code style, not you). I would much rather waste a few extra
> lines of source code and make the conditions obvious:
>
>     while (... && (rx_processed_cnt < limit)) {
>         rx_processed_cnt++;
>
>         ...
>     }
>
> or even
>
>     while (1) {
>         ...
>         if (rx_processed_cnt == limit)
>             break;
>         rx_processed_cnt++;
>     }
>
> The compiler certainly doesn't care, and IMO it prevents bugs.

agreed. Do you have an uptodate patch/git-URI for the forcedeth rewrite
you did? I can throw it into the testbed.

    Ingo
Re: usb
On Mon, Oct 15, 2007 at 10:37:04AM -0700, Yinghai Lu wrote:
> Greg,
>
> from linus's git this morning..
>
> ACPI: PCI Interrupt :00:02.1[B] -> Link [LUB2] -> GSI 21 (level,
> low) -> IRQ 21
> ehci_hcd :00:02.1: EHCI Host Controller
> ehci_hcd :00:02.1: new USB bus registered, assigned bus number 1
> ehci_hcd :00:02.1: debug port 1
> ehci_hcd :00:02.1: irq 21, io mem 0xdefbec00
> ehci_hcd :00:02.1: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
> usb usb1: configuration #1 chosen from 1 choice
> hub 1-0:1.0: USB hub found
> hub 1-0:1.0: 10 ports detected
> ACPI: PCI Interrupt :00:02.0[A] -> Link [LUB0] -> GSI 22 (level,
> low) -> IRQ 22
> ohci_hcd :00:02.0: OHCI Host Controller
> ohci_hcd :00:02.0: new USB bus registered, assigned bus number 2
> ohci_hcd :00:02.0: irq 22, io mem 0xdefbf000
> usb 1-1: new high speed USB device using ehci_hcd and address 2
> usb usb2: configuration #1 chosen from 1 choice
> hub 2-0:1.0: USB hub found
> hub 2-0:1.0: 10 ports detected
> USB Universal Host Controller Interface driver v3.0
> Initializing USB Mass Storage driver...
> usb 1-1: configuration #1 chosen from 1 choice
> usb 1-6: new high speed USB device using ehci_hcd and address 5
> usb 1-6: configuration #1 chosen from 1 choice
> hub 1-6:1.0: USB hub found
> hub 1-6:1.0: 4 ports detected
> sysfs: duplicate filename 'bInterfaceNumber' can not be created
> WARNING: at fs/sysfs/dir.c:425 sysfs_add_one()
>
> Call Trace:
> [] sysfs_add_one+0x54/0xbd
> [] sysfs_add_file+0x50/0x81
> [] sysfs_create_group+0x9a/0xf2
> [] usb_create_sysfs_intf_files+0x32/0xc7
> [] usb_set_configuration+0x49d/0x4c0
> [] generic_probe+0x53/0x95
> [] driver_probe_device+0xd3/0x150
> [] __device_attach+0x0/0x5
> [] bus_for_each_drv+0x40/0x71
> [] device_attach+0x63/0x7a
> [] bus_attach_device+0x2a/0x78
> [] device_add+0x308/0x51e
> [] usb_new_device+0x47/0x80
> [] hub_thread+0x75a/0xb4a
> [] autoremove_wake_function+0x0/0x2e
> [] hub_thread+0x0/0xb4a
> [] kthread+0x47/0x76
> [] child_rip+0xa/0x12
> [] kthread+0x0/0x76
> [] child_rip+0x0/0x12
>
> usb 2-3: new full speed USB device using ohci_hcd and address 2
> usb 2-3: configuration #1 chosen from 1 choice
> hub 2-3:1.0: USB hub found
> hub 2-3:1.0: 4 ports detected
> usb 2-5: new low speed USB device using ohci_hcd and address 3
> usb 2-5: configuration #1 chosen from 1 choice
> scsi15 : SCSI emulation for USB Mass Storage devices
> usb 2-3.1: new low speed USB device using ohci_hcd and address 4
> usb 2-3.1: configuration #1 chosen from 1 choice
> usb 2-3.4: new low speed USB device using ohci_hcd and address 5
> usb 2-3.4: configuration #1 chosen from 1 choice

Yes, I can duplicate this here now too. Will work on this tomorrow to
track down...

thanks,

greg k-h
Re: [PATCH resend] ramdisk: fix zeroed ramdisk pages on memory pressure
On Tuesday 16 October 2007 14:57, Eric W. Biederman wrote:
> Nick Piggin <[EMAIL PROTECTED]> writes:

> >> make_page_uptodate() is most hideous part I have run into.
> >> It has to know details about other layers to know what not
> >> to stomp. I think my incorrect simplification of this is what messed
> >> things up, last round.
> >
> > Not really, it's just named funny. That's just a minor utility
> > function that more or less does what it says it should do.
> >
> > The main problem is really that it's implementing a block device
> > whose data comes from its own buffercache :P. I think.
>
> Well to put it another way, mark_page_uptodate() is the only
> place where we really need to know about the upper layers.
> Given that you can kill ramdisks by coding it as:
>
> static void make_page_uptodate(struct page *page)
> {
>     clear_highpage(page);
>     flush_dcache_page(page);
>     SetPageUptodate(page);
> }
>
> Something is seriously non-intuitive about that function if
> you understand the usual rules for how to use the page cache.

You're overwriting some buffers that were uptodate and dirty.
That would be expected to cause problems.

> The problem is that we support a case in the buffer cache
> where pages are partially uptodate and only the buffer_heads
> remember which parts are valid. Assuming we are using them
> correctly.
>
> Having to walk through all of the buffer heads in make_page_uptodate
> seems to me to be a nasty layering violation in rd.c

Sure, but it's not just about the buffers. It's the pagecache
in general. It is supposed to be invisible to the device driver
and sitting above it, and yet it is taking the buffercache and
using it to pull its data out of.

> > I think it's worthwhile, given that we'd have a "real" looking
> > block device and minus these bugs.
>
> For testing purposes I think I can agree with that.

What non-testing uses does it have?

> >> Having a separate store would
> >> solve some of the problems, and probably remove the need
> >> for carefully specifying the ramdisk block size. We would
> >> still need the magic restrictions on page allocations though
> >> and we would use them more often as the initial write to the
> >> ramdisk would not populate the pages we need.
> >
> > What magic restrictions on page allocations? Actually we have
> > fewer restrictions on page allocations because we can use
> > highmem!
>
> With the proposed rewrite yes.
>
> > And the lowmem buffercache pages that we currently pin
> > (unsuccessfully, in the case of this bug) are now completely
> > reclaimable. And all your buffer heads are now reclaimable.
>
> Hmm. Good point. So in net it should save memory even if
> it consumes a little more in the worst case.

Highmem systems would definitely like it. For others, yes, all
the duplicated pages should be able to get reclaimed if memory
gets tight, along with the buffer heads, so yeah footprint may
be a tad smaller.

> > If you mean GFP_NOIO... I don't see any problem. Block device
> > drivers have to allocate memory with GFP_NOIO; this may have
> > been considered magic or deep badness back when the code was
> > written, but it's pretty simple and accepted now.
>
> Well I always figured it was a bit rude allocating large amounts
> of memory GFP_NOIO but whatever.

You'd rather not, of course, but with dirty data limits now,
it doesn't matter much. (and I doubt anybody outside testing
is going to be hammering like crazy on rd).

Note that the buffercache based ramdisk driver is going to
also be allocating with GFP_NOFS if you're talking about a
filesystem writing to its metadata. In most systems, GFP_NOFS
isn't much different to GFP_NOIO.

We could introduce a mode which allocates pages up front
quite easily if it were a problem (which I doubt it ever would
be).
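The point about partially uptodate pages can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the rd.c implementation: `struct sim_page`, the 1024-byte buffer size and `make_page_uptodate_sketch` are hypothetical, and the per-buffer flags stand in for the state that buffer_heads track.

```c
#include <stdbool.h>
#include <string.h>

#define SIM_PAGE_SIZE     4096
#define SIM_BUF_SIZE      1024
#define SIM_BUFS_PER_PAGE (SIM_PAGE_SIZE / SIM_BUF_SIZE)

/* Toy model of a page whose validity is tracked per buffer. */
struct sim_page {
    unsigned char data[SIM_PAGE_SIZE];
    bool buf_uptodate[SIM_BUFS_PER_PAGE];
    bool page_uptodate;
};

/*
 * Zero only the buffers that hold no valid data yet, then mark the whole
 * page uptodate. Clearing the entire page unconditionally would destroy
 * buffers that are already uptodate (and possibly dirty).
 */
void make_page_uptodate_sketch(struct sim_page *page)
{
    for (int i = 0; i < SIM_BUFS_PER_PAGE; i++) {
        if (page->buf_uptodate[i])
            continue;               /* valid data: leave it alone */
        memset(page->data + (size_t)i * SIM_BUF_SIZE, 0, SIM_BUF_SIZE);
        page->buf_uptodate[i] = true;
    }
    page->page_uptodate = true;
}
```

If buffer 1 already holds data and the others do not, only buffers 0, 2 and 3 are zeroed. The simplified three-line version quoted in the thread corresponds to clearing all four regardless, which is the overwrite described above.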
[BUG] 2.6.23-git8 kernel oops at __rb_rotate_left+0x7/0x70
Hi,

While running kernbench on 2.6.23-git8, the following oops is produced:

Unable to handle kernel NULL pointer dereference at 0010 RIP:
 [] __rb_rotate_left+0x7/0x70
PGD 31f7ad067 PUD 31f14d067 PMD 0
Oops: [1] SMP
CPU 8
Modules linked in: loop dm_mod md_mod sg
Pid: 6923, comm: slpd Not tainted 2.6.23-git8-autokern1 #1
RIP: 0010:[] [] __rb_rotate_left+0x7/0x70
RSP: 0018:81031d083e90 EFLAGS: 00010086
RAX: 8106147550d0 RBX: 81033007b650 RCX:
RDX: RSI: 810330080808 RDI: 8106147550d0
RBP: 8106147550d0 R08: 81033007b650 R09: 81033007b650
R10: 8103300807e0 R11: R12: 8106147550d0
R13: 810330080808 R14: 810330080780 R15: 0008
FS: 2ab70eae80a0() GS:8106146b5440() knlGS:
CS: 0010 DS: ES: CR0: 8005003b
CR2: 0010 CR3: 00031d08f000 CR4: 06e0
DR0: DR1: DR2:
DR3: DR6: 0ff0 DR7: 0400
Process slpd (pid: 6923, threadinfo 81031d082000, task 81031f31f100)
Stack: 8033f4aa 8106147550c0 81031d083ed0 8103300807e0 81031f31f300 8022bc59 0008 0384 81031d083f70 804b6dec 81031d61b0c0 1000
Call Trace:
 [] rb_insert_color+0x8a/0xf0
 [] put_prev_task_fair+0x49/0x60
 [] schedule+0xec/0x1d1
 [] vfs_read+0xc5/0x160
 [] sys_read+0x53/0x90
 [] sysret_careful+0xd/0x10

Code: 48 8b 51 10 49 83 e0 fc 48 85 d2 48 89 57 08 74 0c 48 8b 02
RIP [] __rb_rotate_left+0x7/0x70
 RSP
CR2: 0010

--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.
# # Automatically generated make config: don't edit # Linux kernel version: 2.6.23-git8-autokern1 # Mon Oct 15 19:35:23 2007 # CONFIG_X86_64=y CONFIG_64BIT=y CONFIG_X86=y CONFIG_GENERIC_TIME=y CONFIG_GENERIC_TIME_VSYSCALL=y CONFIG_GENERIC_CMOS_UPDATE=y CONFIG_CLOCKSOURCE_WATCHDOG=y CONFIG_GENERIC_CLOCKEVENTS=y CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y CONFIG_ZONE_DMA32=y CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_SEMAPHORE_SLEEPERS=y CONFIG_MMU=y CONFIG_ZONE_DMA=y CONFIG_RWSEM_GENERIC_SPINLOCK=y CONFIG_GENERIC_HWEIGHT=y CONFIG_GENERIC_CALIBRATE_DELAY=y CONFIG_X86_CMPXCHG=y CONFIG_EARLY_PRINTK=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_IOMAP=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y CONFIG_ARCH_POPULATES_NODE_MAP=y CONFIG_DMI=y CONFIG_AUDIT_ARCH=y CONFIG_GENERIC_BUG=y # CONFIG_ARCH_HAS_ILOG2_U32 is not set # CONFIG_ARCH_HAS_ILOG2_U64 is not set CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config" # # General setup # CONFIG_EXPERIMENTAL=y CONFIG_LOCK_KERNEL=y CONFIG_INIT_ENV_ARG_LIMIT=32 CONFIG_LOCALVERSION="" CONFIG_LOCALVERSION_AUTO=y CONFIG_SWAP=y CONFIG_SYSVIPC=y CONFIG_SYSVIPC_SYSCTL=y CONFIG_POSIX_MQUEUE=y # CONFIG_BSD_PROCESS_ACCT is not set # CONFIG_TASKSTATS is not set # CONFIG_USER_NS is not set # CONFIG_AUDIT is not set CONFIG_IKCONFIG=y CONFIG_IKCONFIG_PROC=y CONFIG_LOG_BUF_SHIFT=20 # CONFIG_CPUSETS is not set CONFIG_FAIR_GROUP_SCHED=y CONFIG_FAIR_USER_SCHED=y CONFIG_SYSFS_DEPRECATED=y # CONFIG_RELAY is not set CONFIG_BLK_DEV_INITRD=y CONFIG_INITRAMFS_SOURCE="" # CONFIG_CC_OPTIMIZE_FOR_SIZE is not set CONFIG_SYSCTL=y # CONFIG_EMBEDDED is not set CONFIG_SYSCTL_SYSCALL=y CONFIG_KALLSYMS=y CONFIG_KALLSYMS_ALL=y # CONFIG_KALLSYMS_EXTRA_PASS is not set CONFIG_HOTPLUG=y CONFIG_PRINTK=y CONFIG_BUG=y CONFIG_ELF_CORE=y CONFIG_BASE_FULL=y CONFIG_FUTEX=y CONFIG_ANON_INODES=y CONFIG_EPOLL=y CONFIG_SIGNALFD=y CONFIG_EVENTFD=y CONFIG_SHMEM=y CONFIG_VM_EVENT_COUNTERS=y CONFIG_SLAB=y # CONFIG_SLUB is not set # CONFIG_SLOB is not set CONFIG_RT_MUTEXES=y # 
CONFIG_TINY_SHMEM is not set CONFIG_BASE_SMALL=0 CONFIG_MODULES=y CONFIG_MODULE_UNLOAD=y # CONFIG_MODULE_FORCE_UNLOAD is not set CONFIG_MODVERSIONS=y CONFIG_MODULE_SRCVERSION_ALL=y CONFIG_KMOD=y CONFIG_STOP_MACHINE=y CONFIG_BLOCK=y # CONFIG_BLK_DEV_IO_TRACE is not set # CONFIG_BLK_DEV_BSG is not set # # IO Schedulers # CONFIG_IOSCHED_NOOP=y CONFIG_IOSCHED_AS=y CONFIG_IOSCHED_DEADLINE=y CONFIG_IOSCHED_CFQ=y CONFIG_DEFAULT_AS=y # CONFIG_DEFAULT_DEADLINE is not set # CONFIG_DEFAULT_CFQ is not set # CONFIG_DEFAULT_NOOP is not set CONFIG_DEFAULT_IOSCHED="anticipatory" # # Processor type and features # # CONFIG_TICK_ONESHOT is not set # CONFIG_NO_HZ is not set # CONFIG_HIGH_RES_TIMERS is not set CONFIG_GENERIC_CLOCKEVENTS_BUILD=y CONFIG_X86_PC=y # CONFIG_X86_VSMP is not set CONFIG_MK8=y # CONFIG_MPSC is not set # CONFIG_MCORE2 is not set # CONFIG_GENERIC_CPU is not set CONFIG_X86_L1_CACHE_BYTES=64 CONFIG_X86_L1_CACHE_SHIFT=6 CONFIG_X86_INTERNODE_CACHE_BYTES=64 CONFIG_X86_TSC=y CONFIG_X86_GOOD_APIC=y # CONFIG_MICROCODE is not set # CONFIG_X86_MSR is not set # CONFIG_X86_CPUID is not set CONFIG_X86_IO_APIC=y CONFIG_X86_LOCAL_APIC=y # CONFIG_MTRR is not set CONFIG_SMP=y # CONFIG_SCHED_SMT is not set CONFIG_SCHED_MC=y
Re: [PATCH 11/11] maps3: make page monitoring /proc file optional
On Mon, 15 Oct 2007, Matt Mackall wrote:

> Index: l/init/Kconfig
> ===
> --- l.orig/init/Kconfig 2007-10-14 13:35:07.0 -0500
> +++ l/init/Kconfig 2007-10-15 17:18:16.0 -0500
> @@ -571,6 +571,15 @@ config SLOB
>
>  endchoice
>
> +config PROC_PAGE_MONITOR
> +    default y
> +    bool "Enable /proc page monitoring" if EMBEDDED && PROC_FS && MMU
> +    help
> +      Various /proc files exist to monitor process memory utilization:
> +      /proc/pid/smaps, /proc/pid/clear_refs, /proc/pid/pagemap,
> +      /proc/kpagecount, and /proc/kpageflags. Disabling these
> +      interfaces will reduce the size of the kernel by approximately 4kb.
> +
>  endmenu # General setup
>
>  config RT_MUTEXES

It's probably better not to include the text size savings since it will
most likely be outdated at some time in the future.
Re: [linux-usb-devel] usb+sysfs: duplicate filename 'bInterfaceNumber'
>On 10/16/07, Greg KH <[EMAIL PROTECTED]> wrote: > On Mon, Oct 15, 2007 at 02:38:25PM -0400, Alan Stern wrote: > > On Mon, 15 Oct 2007, Dave Young wrote: > > > > > On 10/14/07, Borislav Petkov <[EMAIL PROTECTED]> wrote: > > > > Hi, > > > > > > > > i get the following warning on yesterday's git tree > > > > (v2.6.23-2840-g752097c): > > > > > > > > Oct 14 09:07:15 zmei kernel: [ 49.368030] sysfs: duplicate filename > > > > 'bInterfaceNumber' can not be created > > > > Oct 14 09:07:15 zmei kernel: [ 49.368086] WARNING: at > > > > fs/sysfs/dir.c:425 sysfs_add_one() > > > > Oct 14 09:07:15 zmei kernel: [ 49.368134] [] > > > > show_trace_log_lvl+0x1a/0x2f > > > > Oct 14 09:07:15 zmei kernel: [ 49.368220] [] > > > > show_trace+0x12/0x14 > > > > Oct 14 09:07:15 zmei kernel: [ 49.368300] [] > > > > dump_stack+0x16/0x18 > > > > Oct 14 09:07:15 zmei kernel: [ 49.368379] [] > > > > sysfs_add_one+0x57/0xbc > > > > Oct 14 09:07:15 zmei kernel: [ 49.368461] [] > > > > sysfs_add_file+0x49/0x71 > > > > Oct 14 09:07:15 zmei kernel: [ 49.368541] [] > > > > sysfs_create_group+0x86/0xe8 > > > > Oct 14 09:07:15 zmei kernel: [ 49.368621] [] > > > > usb_create_sysfs_intf_files+0x27/0x9b > > > > Oct 14 09:07:15 zmei kernel: [ 49.368704] [] > > > > usb_set_configuration+0x454/0x466 > > > > Oct 14 09:07:15 zmei kernel: [ 49.368787] [] > > > > generic_probe+0x53/0x94 > > > > Oct 14 09:07:15 zmei kernel: [ 49.368867] [] > > > > usb_probe_device+0x35/0x3b > > > > Oct 14 09:07:15 zmei kernel: [ 49.368947] [] > > > > driver_probe_device+0xcb/0x14f > > > > Oct 14 09:07:15 zmei kernel: [ 49.369039] [] > > > > __device_attach+0x8/0xa > > > > Oct 14 09:07:15 zmei kernel: [ 49.369119] [] > > > > bus_for_each_drv+0x3b/0x63 > > > > Oct 14 09:07:15 zmei kernel: [ 49.369199] [] > > > > device_attach+0x70/0x85 > > > > Oct 14 09:07:15 zmei kernel: [ 49.369279] [] > > > > bus_attach_device+0x29/0x77 > > > > Oct 14 09:07:15 zmei kernel: [ 49.369359] [] > > > > device_add+0x28c/0x445 > > > > Oct 14 09:07:15 
zmei kernel: [ 49.369439] [] > > > > usb_new_device+0x44/0x82 > > > > Oct 14 09:07:15 zmei kernel: [ 49.369519] [] > > > > hub_thread+0x666/0x9c2 > > > > Oct 14 09:07:15 zmei kernel: [ 49.369598] [] > > > > kthread+0x3b/0x62 > > > > Oct 14 09:07:15 zmei kernel: [ 49.369679] [] > > > > kernel_thread_helper+0x7/0x10 > > > > Oct 14 09:07:15 zmei kernel: [ 49.369759] === > > > > > > > > The usb hub in question is named 4-1:1.0 and it has an extension > > > > connected to it > > > > which is used to activate the 2 usb connectors at the side of the pc's > > > > monitor. > > > > Correct me if i'm wrong but from what i've understood so far from > > > > reading the code, > > > > i think, it adds the bInterfaceNumber-file after calling > > > > usb_create_sysfs_intf_files(intf). > > > > However, the currently active usbhost interface alternate setting is > > > > the only one active > > > > so the bInterfaceNumber exists already and therefore the warning, but > > > > this is > > > > just a guess since i'm not that fluent in the usb internals. > > > Hi, > > > I have encountered the same problem which was reported in > > > http://lkml.org/lkml/2007/9/29/45 > > > > > > For the first one "usbcore duplicated sysfs filename" , I have submit > > > a patch to fix it. > > > > > > For the "bInterfaceNumber" one, I have no idea, the same problem still > > > exist in the latest 23-mm1 tree. > > > > I have tried several times to duplicate this, most recently under > > 2.6.23-mm1. But nothing goes wrong; the error messages don't appear. > > > > You may have to do your own debugging. Try adding printk statements to > > usb_create_sysfs_intf_files() and usb_remove_sysfs_intf_files() so you > > can tell when they get called. > > I finally duplicated this on one of my machines here at boot time, with > USB built into the kernel. I'll work tomorrow on tracking this down > further... 
Hi, I add some printk messages, dump_stack and some others, here is the dmesg dump with debug info(lines begin with "hidave"): Linux version 2.6.23-mm1 ([EMAIL PROTECTED]) (gcc version 3.4.6) #4 SMP PREEMPT Tue Oct 16 11:14:10 CST 2007 BIOS-provided physical RAM map: BIOS-e820: - 000a (usable) BIOS-e820: 000f - 0010 (reserved) BIOS-e820: 0010 - 3fe88c00 (usable) BIOS-e820: 3fe88c00 - 3fe8ac00 (ACPI NVS) BIOS-e820: 3fe8ac00 - 3fe8cc00 (ACPI data) BIOS-e820: 3fe8cc00 - 4000 (reserved) BIOS-e820: f000 - f400 (reserved) BIOS-e820: fec0 - fed00400 (reserved) BIOS-e820: fed2 - feda (reserved) BIOS-e820: fee0 - fef0 (reserved) BIOS-e820: ffb0 - 0001 (reserved) 126MB HIGHMEM available. 896MB LOWMEM available. found SMP MP-table at 000fe710
Re: [RFC] cpuset update_cgroup_cpus_allowed
> Will do - I just wanted to get this quickly out to show the idea
> that I was working on.

Ok - good.

In the final analysis, I'll take whatever works ;).

I'll lobby for keeping the code "simple" (a subjective metric) and
poke what holes I can in things, and propose what alternatives I can
muster.

But so long as setting a cpuset's 'cpus' in 2.6.24 leads, whether by
my historical "rewrite the pid to its own 'tasks' file" hack, or by
a proper solution such as you have advocated, or by some other scheme
or hack, to updating the cpus_allowed of each task in that cpuset,
then I'm ok.

Right now, that goal is not met, with the cgroup patches lined up in
*-mm for what will become 2.6.24. We're getting short of time to fix
this.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401
Re: [PATCH 7/11] maps3: move clear_refs code to task_mmu.c
On Mon, 15 Oct 2007, Matt Mackall wrote:

> Index: l/fs/proc/task_mmu.c
> ===
> --- l.orig/fs/proc/task_mmu.c 2007-10-14 13:38:43.0 -0500
> +++ l/fs/proc/task_mmu.c 2007-10-14 13:39:00.0 -0500
> @@ -324,19 +324,47 @@ static int show_smap(struct seq_file *m,
>
>  static struct mm_walk clear_refs_walk = { .pmd_entry = clear_refs_pte_range };
>
> -void clear_refs_smap(struct mm_struct *mm)
> +static ssize_t clear_refs_write(struct file *file, const char __user *buf,
> +                                size_t count, loff_t *ppos)
>  {
> +    struct task_struct *task;
> +    char buffer[13], *end;

The #define for PROC_NUMBUF will need to be moved from fs/proc/base.c to
include/linux/proc_fs.h and used here instead of hardcoding it.
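What the review comment asks for, a named constant instead of a bare 13, might look like this in a userspace sketch. The value 13 for `PROC_NUMBUF` is an assumption taken from the buffer size in the quoted patch, and `parse_proc_number` is a hypothetical helper rather than the kernel's `clear_refs_write`.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: mirrors the 13-byte buffer in the quoted patch. */
#define PROC_NUMBUF 13

/*
 * Parse a short decimal number written by a user into a bounded buffer
 * sized by the shared constant. Returns 0 and stores the value in *out,
 * or -EINVAL on malformed input.
 */
int parse_proc_number(const char *buf, size_t count, long *out)
{
    char buffer[PROC_NUMBUF];
    char *end;

    memset(buffer, 0, sizeof(buffer));
    if (count > sizeof(buffer) - 1)
        count = sizeof(buffer) - 1;     /* always leave room for the NUL */
    memcpy(buffer, buf, count);

    errno = 0;
    long value = strtol(buffer, &end, 10);
    if (errno != 0 || end == buffer)
        return -EINVAL;                 /* overflow or no digits at all */
    if (*end == '\n')
        end++;                          /* tolerate a trailing newline */
    if (*end != '\0')
        return -EINVAL;                 /* trailing garbage */

    *out = value;
    return 0;
}
```

Both the buffer declaration and the length clamp derive from the one constant, so changing it in a shared header cannot leave the two out of step.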
Re: [PATCH] Allow kconfig to accept overrides
On Monday 15 October 2007 11:29:58 pm Sam Ravnborg wrote:
> Hi Rob & Jan.
>
> On Fri, Oct 12, 2007 at 11:44:08PM +0200, Jan Engelhardt wrote:
> > Allow config variables in .config to override earlier ones in the same
> > file. In other words,
> >
> > # CONFIG_SECURITY is not defined
> > CONFIG_SECURITY=y
> >
> > will activate it. This makes it a bit easier to do
> >
> > (cat original-config myconfig myconfig2 ... >.config)
> >
> > and run menuconfig as expected.
>
> How far is this from the miniconfig functionality?
> Is it the same or can we achieve the miniconfig support
> by extending Jan's patch?
>
> See: http://lkml.org/lkml/2007/10/12/391

Way way back (2.6.10 or thereabouts) I first did a miniconfig via running
allnoconfig, concatenating a miniconfig to the result, and running "make
oldconfig" on that. This concatenation method had two main problems:

1) Around 2.6.15 the kconfig infrastructure changed so the first instance
of a symbol won rather than the last. It looks like this patch just sets
the behavior back to what we had in 2.6.14 and earlier.

2) When a symbol activates new subsymbols (opening a new menu, for example),
those dependent symbols would be activated at their oldconfig default values,
not their allnoconfig default values. This meant there might be a valid
configuration that you couldn't specify without saying "symbol=n" to turn some
of them off in your miniconfig, which is something a miniconfig should never
have to do.

(This happens when allnoconfig and oldconfig are run in two separate passes.
The oldconfig pass uses the wrong default values for newly enabled symbols.
Menuconfig has the same defaults as oldconfig, which are _not_ the same
defaults as allnoconfig.)

Note that the infrastructure I'm using to _read_ miniconfig files is just a
repurpose of the existing KCONFIG_ALLCONFIG as applied to allnoconfig. That's
in kconfig already, has been since 2.6.15-ish, and works fine. The syntax is
nonobvious (two patches from me to improve said syntax and add some error
checking were rejected), but the functionality is there and easy enough to
trigger:

  make allnoconfig KCONFIG_ALLCONFIG=mini.conf

That expands a mini.conf into a .config, and does the other setup necessary.
(You can feed that O= to build out of tree, or ARCH= to build another
architecture... Anything you can currently do with allnoconfig.)

It's the "shrinking a .config into a mini.conf" side of things that uses a
hideous shell script that's not in the tree:

  http://landley.net/hg/firmware/raw-file/tip/sources/toys/miniconfig.sh

To use it:

  make ARCH=arm defconfig
  mv .config tempname
  ARCH=arm ./miniconfig.sh tempname
  ls -l mini.config

(Obviously, the ARCH=arm is optional and you don't have to start with
defconfig.)

If I had unlimited spare time I might teach kconfig to automatically write a
mini.conf every time it writes a .config, and have it use whichever was newer
in the update commands (oldconfig/menuconfig/etc). But after two rejected
patches on this topic already, with the shell script meeting my needs, that's
impressively far down on my todo list.

Rob
--
"One of my most productive days was throwing away 1000 lines of code."
- Ken Thompson.
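The "last definition wins" behavior discussed in this thread can be modelled with a small parser. This is a hedged userspace sketch, not kconfig's confdata code: `struct config`, `parse_line` and `get_sym` are hypothetical, and unset symbols are represented here by the string "n" purely for illustration.

```c
#include <stdio.h>
#include <string.h>

#define MAX_SYMS  64
#define FIELD_LEN 64

struct sym {
    char name[FIELD_LEN];
    char value[FIELD_LEN];
};

struct config {
    struct sym syms[MAX_SYMS];
    int count;
};

/* Store name=value, overwriting any earlier definition of the same name. */
static void set_sym(struct config *cfg, const char *name, const char *value)
{
    for (int i = 0; i < cfg->count; i++) {
        if (strcmp(cfg->syms[i].name, name) == 0) {
            snprintf(cfg->syms[i].value, FIELD_LEN, "%s", value);
            return;                     /* the later line wins */
        }
    }
    if (cfg->count < MAX_SYMS) {
        snprintf(cfg->syms[cfg->count].name, FIELD_LEN, "%s", name);
        snprintf(cfg->syms[cfg->count].value, FIELD_LEN, "%s", value);
        cfg->count++;
    }
}

/* Feed one line of a concatenated .config into the table. */
void parse_line(struct config *cfg, const char *line)
{
    char name[FIELD_LEN], value[FIELD_LEN];
    int end = 0;

    /* %n is only reached when the whole "is not set" text matched */
    if (sscanf(line, "# CONFIG_%63s is not set%n", name, &end) == 1 && end > 0)
        set_sym(cfg, name, "n");
    else if (sscanf(line, "CONFIG_%63[^=]=%63s", name, value) == 2)
        set_sym(cfg, name, value);
    /* anything else (other comments, blank lines) is ignored */
}

/* Return the current value of a symbol, or NULL if it was never seen. */
const char *get_sym(const struct config *cfg, const char *name)
{
    for (int i = 0; i < cfg->count; i++)
        if (strcmp(cfg->syms[i].name, name) == 0)
            return cfg->syms[i].value;
    return NULL;
}
```

Feeding "# CONFIG_SECURITY is not set" followed by "CONFIG_SECURITY=y" leaves a single SECURITY entry with value "y". A first-wins policy, as described for 2.6.15 and later, would instead return early without overwriting. Neither policy addresses the second problem above, the defaults chosen for newly enabled subsymbols.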
Re: [RFC] cpuset update_cgroup_cpus_allowed
On 10/15/07, Paul Jackson <[EMAIL PROTECTED]> wrote:
> > currently against an older kernel
>
> ah .. which older kernel?

2.6.18, but I can do a version against 2.6.23-mm1.

> +	if (!retval) {
> +		cpus_allowed = cpuset_cpus_allowed(p);
> +		if (!cpus_subset(new_mask, cpus_allowed)) {
> +			/*
> +			 * We must have raced with a concurrent cpuset
> +			 * update. Just reset the cpus_allowed to the
> +			 * cpuset's cpus_allowed
> +			 */
> +			new_mask = cpus_allowed;
>
> This narrows the race, perhaps sufficiently, but I don't see that it
> guarantees closure. Memory accesses to two different locations are not
> guaranteed to be ordered across nodes, as best I recall. The second
> line above, that rereads the cpuset cpus_allowed, could get an old
> value, in essence.
>
>     cpuset update task          sched_setaffinity task
>     ------------------          ----------------------
>
>     A. write cpuset [Q]         V. read cpuset [Q]
>     B. read task [P]            W. check ok
>     C. write task [P]           X. write task [P]
>                                 Y. reread cpuset [Q]
>                                 Z. check ok again
>
> Two memory locations:
>     [P] the cpus_allowed mask in the task_struct of the
>         task doing the sched_setaffinity call.
>     [Q] the cpus_allowed mask in the cpuset of the cpuset
>         to which the sched_setaffinity task is attached.
>
> Even though, from the perspective of location [P], both B. and C.
> happened before X., still from the perspective of location [Q] the
> rereading in Y. could return the value the cpuset cpus_allowed had
> before the write in A. This could result in a task running with
> a cpus_allowed that was totally outside its cpusets cpus_allowed.

But cpuset_cpus_allowed() synchronizes on callback_mutex. So I assert
this race isn't an issue.

> I will grant that this is a narrow window. I won't lose much sleep
> over it.
>
> > - uses a priority heap to pick the processes to act on, based on start time
>
> This adds a fair bit of code and complexity, relative to my patch.
> This I do lose more sleep over. There has to be a compelling
> reason for doing this.

My plan was to hide this inside cgroup_iter_* so that users didn't have
to hold the cssgroup_lock across the entire iteration.

> The point that David raises, regarding the interaction of this with
> hotplug, seems to be a compelling reason for doing -something-
> different than my patch proposal.
>
> I don't know yet if it compels us to this much code, however.
>
> Any chance you could provide a patch that works against cgroups?

Will do - I just wanted to get this out quickly to show the idea that I
was working on.

Paul
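The recheck step in the quoted hunk (steps Y and Z in the table) boils down to a subset test followed by a fallback. A minimal userspace sketch of just that logic follows; it models CPU masks as 64-bit integers, and the names toy_cpumask, mask_subset() and settle_affinity() are hypothetical. It says nothing about locking or memory ordering, which is the actual point under dispute in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Toy CPU mask: bit N set means CPU N is allowed (at most 64 CPUs). */
typedef uint64_t toy_cpumask;

/* True if every CPU in a is also in b, like the kernel's cpus_subset(). */
static int mask_subset(toy_cpumask a, toy_cpumask b)
{
	return (a & ~b) == 0;
}

/*
 * Model of the recheck: if the requested mask is no longer inside the
 * cpuset's mask (because the cpuset changed underneath us), fall back
 * to the cpuset's mask instead of keeping a stale request.
 */
static toy_cpumask settle_affinity(toy_cpumask requested,
				   toy_cpumask cpuset_allowed)
{
	assert(cpuset_allowed != 0);	/* a cpuset with no CPUs is not modelled */
	if (!mask_subset(requested, cpuset_allowed))
		return cpuset_allowed;
	return requested;
}
```

Note that the fallback replaces the whole request rather than intersecting it with the cpuset's mask, which matches the "just reset" comment in the quoted hunk.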
Re: [PATCH] sched: Rationalize sys_sched_rr_get_interval()
Jarek Poplawski wrote:
> On 13-10-2007 03:29, Peter Williams wrote:
>> Jarek Poplawski wrote:
>>> On 12-10-2007 00:23, Peter Williams wrote:
>>> ...
>>>> The reason I was going that route was for modularity (which helps when
>>>> adding plugsched patches). I'll submit a revised patch for
>>>> consideration.
>>> ...
>>> IMHO, it looks like modularity could suck here:
>>>
>>> +static unsigned int default_timeslice_fair(struct task_struct *p)
>>> +{
>>> +	return NS_TO_JIFFIES(sysctl_sched_min_granularity);
>>> +}
>>>
>>> If it's needed for outside and sched_fair will use something else (to
>>> avoid double conversion) this could be misleading. Shouldn't this be
>>> kind of private and return something usable for the class mainly?
>>
>> This is supplying data for a system call not something for internal use
>> by the class. As far as the sched_fair class is concerned this is just a
>> (necessary - because it's needed by a system call) diversion.
>
> So, now all is clear: this is the misleading case! Why anything else
> than sched_fair should care about this?

sched_fair doesn't care so if nothing else does why do we even have
sys_sched_rr_get_interval()? Is this whole function an anachronism that
can be expunged? I'm assuming that the reason it exists is that there are
user space programs that use this system call. Am I correct in this
assumption? Personally, I can't think of anything it would be useful for
other than satisfying curiosity.

> Since this is for some special aim (not default for most classes, at
> least not for sched_fair) I'd suggest to change names:
> default_timeslice_fair() and .default_timeslice to something like eg.:
> rr_timeslice_fair() and .rr_timeslice or rr_interval_fair() and
> .rr_interval (maybe with this "default" before "_rr_" if necessary).
>
> On the other hand man (2) sched_rr_get_interval mentions that:
> "The identified process should be running under the SCHED_RR scheduling
> policy".
>
> Also this place seems to say about something simpler:
> http://www.gnu.org/software/libc/manual/html_node/Basic-Scheduling-Functions.html
>
> So, I still doubt sched_fair's "notion" of timeslices should be
> necessary here.

As do I. Even more so now that you've shown me the man page for
sched_rr_get_interval(). I'd suggest that we modify
sched_rr_get_interval() to return -EINVAL (with *interval set to zero) if
the target task is not SCHED_RR. That way we can save a lot of
unnecessary code. I'll work on a patch.

Unless you want to do it?

> Sorry for too harsh words.

I didn't consider them harsh.

Peter
--
Peter Williams [EMAIL PROTECTED]

"Learning, n. The kind of ignorance distinguishing the studious."
 -- Ambrose Bierce
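The behaviour proposed at the end of this message (fail with -EINVAL and zero the interval unless the task is SCHED_RR) can be written down as a small pure function. This is only a model of the proposal, not the kernel implementation and not the existing sched_rr_get_interval(2) semantics; rr_interval_model() and its rr_slice_ns parameter are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <time.h>

/*
 * Model of the proposal: only SCHED_RR tasks have a round-robin
 * interval. For any other policy, zero *interval and return -EINVAL.
 * rr_slice_ns stands in for whatever timeslice the kernel would report.
 */
static int rr_interval_model(int policy, long rr_slice_ns,
			     struct timespec *interval)
{
	assert(interval != NULL);

	if (policy != SCHED_RR) {
		interval->tv_sec = 0;
		interval->tv_nsec = 0;
		return -EINVAL;
	}
	interval->tv_sec = rr_slice_ns / 1000000000L;
	interval->tv_nsec = rr_slice_ns % 1000000000L;
	return 0;
}
```

With this shape the fair class needs no timeslice hook at all, which is the code saving Peter is after.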
Re: [PATCH 5/11] maps3: use pagewalker in clear_refs and smaps
On Mon, 15 Oct 2007, Matt Mackall wrote:

> Use the generic pagewalker for smaps and clear_refs
>
> Signed-off-by: Matt Mackall <[EMAIL PROTECTED]>

Acked-by: David Rientjes <[EMAIL PROTECTED]>
Re: [PATCH 4/11] maps3: introduce a generic page walker
On Mon, 15 Oct 2007, Matt Mackall wrote: > Introduce a general page table walker > > Signed-off-by: Matt Mackall <[EMAIL PROTECTED]> > > Index: l/include/linux/mm.h > === > --- l.orig/include/linux/mm.h 2007-10-09 17:37:59.0 -0500 > +++ l/include/linux/mm.h 2007-10-10 11:46:37.0 -0500 > @@ -773,6 +773,17 @@ unsigned long unmap_vmas(struct mmu_gath > struct vm_area_struct *start_vma, unsigned long start_addr, > unsigned long end_addr, unsigned long *nr_accounted, > struct zap_details *); > + > +struct mm_walk { > + int (*pgd_entry)(pgd_t *, unsigned long, unsigned long, void *); > + int (*pud_entry)(pud_t *, unsigned long, unsigned long, void *); > + int (*pmd_entry)(pmd_t *, unsigned long, unsigned long, void *); > + int (*pte_entry)(pte_t *, unsigned long, unsigned long, void *); > + int (*pte_hole) (unsigned long, unsigned long, void *); > +}; > + > +int walk_page_range(struct mm_struct *, unsigned long addr, unsigned long > end, > + struct mm_walk *walk, void *private); The struct mm_walk * can be qualified as const. 
> Index: l/mm/pagewalk.c > === > --- /dev/null 1970-01-01 00:00:00.0 + > +++ l/mm/pagewalk.c 2007-10-10 11:46:37.0 -0500 > @@ -0,0 +1,120 @@ > +#include > +#include > +#include > + > +static int walk_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, > + struct mm_walk *walk, void *private) > +{ > + pte_t *pte; > + int err = 0; > + > + pte = pte_offset_map(pmd, addr); > + do { > + err = walk->pte_entry(pte, addr, addr, private); > + if (err) > +break; > + } while (pte++, addr += PAGE_SIZE, addr != end); > + > + pte_unmap(pte); > + return err; > +} > + > +static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end, > + struct mm_walk *walk, void *private) > +{ > + pmd_t *pmd; > + unsigned long next; > + int err = 0; > + > + pmd = pmd_offset(pud, addr); > + do { > + next = pmd_addr_end(addr, end); > + if (pmd_none_or_clear_bad(pmd)) { > + if (walk->pte_hole) > + err = walk->pte_hole(addr, next, private); > + if (err) > + break; > + continue; > + } > + if (walk->pmd_entry) > + err = walk->pmd_entry(pmd, addr, next, private); > + if (!err && walk->pte_entry) > + err = walk_pte_range(pmd, addr, next, walk, private); > + if (err) > + break; > + } while (pmd++, addr = next, addr != end); > + > + return err; > +} > + > +static int walk_pud_range(pgd_t *pgd, unsigned long addr, unsigned long end, > + struct mm_walk *walk, void *private) > +{ > + pud_t *pud; > + unsigned long next; > + int err = 0; > + > + pud = pud_offset(pgd, addr); > + do { > + next = pud_addr_end(addr, end); > + if (pud_none_or_clear_bad(pud)) { > + if (walk->pte_hole) > + err = walk->pte_hole(addr, next, private); > + if (err) > + break; > + continue; > + } > + if (walk->pud_entry) > + err = walk->pud_entry(pud, addr, next, private); > + if (!err && (walk->pmd_entry || walk->pte_entry)) > + err = walk_pmd_range(pud, addr, next, walk, private); > + if (err) > + break; > + } while (pud++, addr = next, addr != end); > + > + return err; > +} > + > +/* > + * walk_page_range - walk a 
memory map's page tables with a callback > + * @mm - memory map to walk > + * @addr - starting address > + * @end - ending address > + * @walk - set of callbacks to invoke for each level of the tree > + * @private - private data passed to the callback function > + * > + * Recursively walk the page table for the memory area in a VMA, calling > + * a callback for every bottom-level (PTE) page table. > + */ > +int walk_page_range(struct mm_struct *mm, > + unsigned long addr, unsigned long end, > + struct mm_walk *walk, void *private) > +{ > + pgd_t *pgd; > + unsigned long next; > + int err = 0; > + > + if (addr >= end) > + return err; unlikely? > + > + pgd = pgd_offset(mm, addr); > + do { > + next = pgd_addr_end(addr, end); > + if (pgd_none_or_clear_bad(pgd)) { > + if (walk->pte_hole) > + err = walk->pte_hole(addr, next, private); > + if (err)
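The callback structure of the walker quoted above can be tried out in userspace with a toy two-level table. Everything here is a simplified assumption: the toy_* names, the four-entry second level, and the choice to pass addr + page size as the end of each entry are mine, not the patch's. The walk description is taken by const pointer, as suggested in the review comment.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE    4096UL
#define TOY_PTES_PER_PMD 4UL
#define TOY_PMD_SPAN     (TOY_PAGE_SIZE * TOY_PTES_PER_PMD)

/* One upper-level slot; ptes == NULL means nothing is mapped (a hole). */
struct toy_pmd {
	unsigned long *ptes;
};

struct toy_walk {
	int (*pte_entry)(unsigned long *pte, unsigned long addr,
			 unsigned long end, void *private);
	int (*pte_hole)(unsigned long addr, unsigned long end, void *private);
};

/* End of the upper-level slot containing addr, clamped to end. */
static unsigned long toy_span_end(unsigned long addr, unsigned long end)
{
	unsigned long next = (addr + TOY_PMD_SPAN) & ~(TOY_PMD_SPAN - 1);

	return next < end ? next : end;
}

/* Walk [addr, end); stop at the first callback returning non-zero. */
static int toy_walk_range(struct toy_pmd *pmds, unsigned long addr,
			  unsigned long end, const struct toy_walk *walk,
			  void *private)
{
	int err = 0;

	assert(addr % TOY_PAGE_SIZE == 0 && end % TOY_PAGE_SIZE == 0);
	while (addr < end && !err) {
		unsigned long next = toy_span_end(addr, end);
		struct toy_pmd *pmd = &pmds[addr / TOY_PMD_SPAN];

		if (!pmd->ptes) {
			if (walk->pte_hole)
				err = walk->pte_hole(addr, next, private);
		} else if (walk->pte_entry) {
			unsigned long a;

			for (a = addr; a != next && !err; a += TOY_PAGE_SIZE) {
				unsigned long idx = (a % TOY_PMD_SPAN) / TOY_PAGE_SIZE;

				err = walk->pte_entry(&pmd->ptes[idx], a,
						      a + TOY_PAGE_SIZE, private);
			}
		}
		addr = next;
	}
	return err;
}

struct toy_counts {
	int present;
	int hole_pages;
};

/* Example callbacks: count present entries and pages inside holes. */
static int toy_count_pte(unsigned long *pte, unsigned long addr,
			 unsigned long end, void *private)
{
	struct toy_counts *c = private;

	(void)addr;
	(void)end;
	if (*pte)
		c->present++;
	return 0;
}

static int toy_count_hole(unsigned long addr, unsigned long end, void *private)
{
	struct toy_counts *c = private;

	c->hole_pages += (int)((end - addr) / TOY_PAGE_SIZE);
	return 0;
}
```

A caller fills in a struct toy_walk with only the callbacks it cares about and passes its own accumulator as the private pointer, which is the same division of labour the patch sets up for smaps and clear_refs.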
Re: [PATCH resend] ramdisk: fix zeroed ramdisk pages on memory pressure
Nick Piggin <[EMAIL PROTECTED]> writes: >> >> make_page_uptodate() is most hideous part I have run into. >> It has to know details about other layers to now what not >> to stomp. I think my incorrect simplification of this is what messed >> things up, last round. > > Not really, it's just named funny. That's just a minor utility > function that more or less does what it says it should do. > > The main problem is really that it's implementing a block device > who's data comes from its own buffercache :P. I think. Well to put it another way, mark_page_uptodate() is the only place where we really need to know about the upper layers. Given that you can kill ramdisks by coding it as: static void make_page_uptodate(struct page *page) { clear_highpage(page); flush_dcache_page(page); SetPageUptodate(page); } Something is seriously non-intuitive about that function if you understand the usual rules for how to use the page cache. The problem is that we support a case in the buffer cache where pages are partially uptodate and only the buffer_heads remember which parts are valid. Assuming we are using them correctly. Having to walk through all of the buffer heads in make_page_uptodate seems to me to be a nasty layering violation in rd.c >> > I guess it's not nice >> > for operating on the pagecache from its request_fn, but the >> > alternative is to duplicate pages for backing store and buffer >> > cache (actually that might not be a bad alternative really). >> >> Cool. Triple buffering :) Although I guess that would only >> apply to metadata these days. > > Double buffering. You no longer serve data out of your buffer > cache. All filesystem data was already double buffered anyway, > so we'd be just losing out on one layer of savings for metadata. Yep we are in agreement there. > I think it's worthwhile, given that we'd have a "real" looking > block device and minus these bugs. For testing purposes I think I can agree with that. 
>> Having a separate store would >> solve some of the problems, and probably remove the need >> for carefully specifying the ramdisk block size. We would >> still need the magic restictions on page allocations though >> and it we would use them more often as the initial write to the >> ramdisk would not populate the pages we need. > > What magic restrictions on page allocations? Actually we have > fewer restrictions on page allocations because we can use > highmem! With the proposed rewrite yes. > And the lowmem buffercache pages that we currently pin > (unsuccessfully, in the case of this bug) are now completely > reclaimable. And all your buffer heads are now reclaimable. Hmm. Good point. So in net it should save memory even if it consumes a little more in the worst case. > If you mean GFP_NOIO... I don't see any problem. Block device > drivers have to allocate memory with GFP_NOIO; this may have > been considered magic or deep badness back when the code was > written, but it's pretty simple and accepted now. Well I always figured it was a bit rude allocating large amounts of memory GFP_NOIO but whatever. >> A very ugly bit seems to be the fact that we assume we can >> dereference bh->b_data without any special magic which >> means the ramdisk must live in low memory on 32bit machines. > > Yeah but that's not rd.c. You need to rewrite the buffer layer > to fix that (see fsblock ;)). I'm not certain which way we should go. Take fsblock and run it in parallel until everything is converted or use fsblock as a prototype and once we have figured out which way we should go convert struct buffer_head into struct fsblock one patch at a time. I'm inclined to think we should evolve the buffer_head. Eric - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
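The "partially uptodate page" problem described in this message can be shown with a toy model. The sketch below is an assumption-laden illustration, not rd.c: a page is split into four blocks with one validity flag each (standing in for buffer heads), and the two toy_* functions contrast the simplified version, which clears everything, with a version that only zeroes blocks not already valid.

```c
#include <assert.h>
#include <string.h>

#define TOY_PAGE_SIZE  4096
#define TOY_BLOCKS     4
#define TOY_BLOCK_SIZE (TOY_PAGE_SIZE / TOY_BLOCKS)

/* A page whose validity is tracked per block, as buffer heads would. */
struct toy_page {
	unsigned char data[TOY_PAGE_SIZE];
	int block_uptodate[TOY_BLOCKS];
	int page_uptodate;
};

/* The simplified version: zero the whole page, destroying valid blocks. */
static void toy_make_uptodate_naive(struct toy_page *p)
{
	assert(p != NULL);
	memset(p->data, 0, sizeof(p->data));
	p->page_uptodate = 1;
}

/* The careful version: zero only the blocks that hold no valid data. */
static void toy_make_uptodate_careful(struct toy_page *p)
{
	assert(p != NULL);
	for (int i = 0; i < TOY_BLOCKS; i++) {
		if (!p->block_uptodate[i]) {
			memset(p->data + i * TOY_BLOCK_SIZE, 0, TOY_BLOCK_SIZE);
			p->block_uptodate[i] = 1;
		}
	}
	p->page_uptodate = 1;
}
```

The careful version has to know about the per-block flags, which is the layering complaint raised above: the page-level helper cannot be correct without looking at state owned by another layer.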
Re: What still uses the block layer?
On Mon, 15 Oct 2007, Greg KH wrote:

> On Mon, Oct 15, 2007 at 10:04:01PM -0600, Matthew Wilcox wrote:
>> On Mon, Oct 15, 2007 at 07:54:22PM -0700, [EMAIL PROTECTED] wrote:
>>> do PCI devices reorder their bus numbers spontaneously, or only if you
>>> change the hardware?
>>
>> The only system I've had that reordered PCI bus numbers was when I had a
>> partitionable system and changed the partitioning. Not quite "change
>> the hardware", but neither was it "spontaneous". It was certainly
>> unexpected (for me).
>>
>> Greg probably has quite different examples.
>
> Changing the hardware (adding a new PCI device or removing one) are the
> most common times this happens. But I have seen reports of this
> happening when you upgrade/downgrade BIOS versions, and, in some
> oops-we-messed-up cases, when we changed things in the kernel.

BIOS upgrades qualify as changing hardware (or close to it).

oops-we-messed-up cases of kernel changes don't justify 'best effort'
naming; that's a regression that needs to be fixed.

now the other example given, of docking a laptop, is closer to reasonable
(and is definitely a reason to have 'best effort' naming as an option),
but that's still a relatively special case, and it _is_ definitely
changing the hardware.

David Lang
Re: linux-2.6.23-git3: Many sysfs-related warnings in dmesg
On Sat, Oct 13, 2007 at 09:26:32PM +0200, Rafael J. Wysocki wrote:
> Hi,
>
> There are many traces like this in my dmesg from 2.6.23-git3 (they don't
> appear for vanilla 2.6.23):
>
> <4>sysfs: duplicate filename 'ethxx1' can not be created
> WARNING: at /home/rafael/src/linux-2.6/fs/sysfs/dir.c:425 sysfs_add_one()
>
> Call Trace:
>  [] sysfs_add_one+0x5c/0xc9
>  [] sysfs_create_link+0xd1/0x12c
>  [] device_rename+0x17a/0x1db
>  [] dev_change_name+0x114/0x20c
>  [] dev_ifsioc+0x204/0x2d0
>  [] dev_ioctl+0x520/0x633
>  [] sk_alloc+0x37/0x10c
>  [] up_read+0x9/0xb
>  [] sock_ioctl+0x1fe/0x20c
>  [] do_ioctl+0x2a/0x77
>  [] vfs_ioctl+0x251/0x26e
>  [] sys_ioctl+0x5f/0x83
>  [] system_call+0x7e/0x83
>
> net ethxx1: device_rename: sysfs_create_symlink failed (-17)
> sysfs: duplicate filename 'eth1' can not be created
>
> Everything seems to work, but this just looks fishy.

This is a userspace program renaming your network device to a name that
is already in use.

What distro and release is this?

thanks,

greg k-h
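The failure mode named in the reply (renaming a device to a name that is already taken) can be modelled with a tiny name table. This is a hypothetical userspace sketch with made-up toy_* names; it only shows why such a rename ends in -EEXIST, which is the error the log reports as -17, and does not reflect how the kernel or sysfs track device names.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define TOY_MAX_DEVS 8
#define TOY_NAME_LEN 16

struct toy_netdevs {
	char names[TOY_MAX_DEVS][TOY_NAME_LEN];
	int count;
};

/* Return the index of a device name, or -1 if it is not registered. */
static int toy_find(const struct toy_netdevs *d, const char *name)
{
	for (int i = 0; i < d->count; i++)
		if (strcmp(d->names[i], name) == 0)
			return i;
	return -1;
}

/* Rename a device; refuse with -EEXIST if the target name is in use. */
static int toy_rename(struct toy_netdevs *d, const char *from, const char *to)
{
	int idx;

	assert(d && from && to);
	idx = toy_find(d, from);
	if (idx < 0)
		return -ENODEV;
	if (strcmp(from, to) == 0)
		return 0;
	if (toy_find(d, to) >= 0)
		return -EEXIST;
	snprintf(d->names[idx], TOY_NAME_LEN, "%s", to);
	return 0;
}
```

A userspace renaming scheme that swaps two interfaces has to go through a temporary name that is itself unused; an intermediate name colliding with an existing one would trip exactly this check.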
Re: OOM killer gripe (was Re: What still uses the block layer?)
On Tuesday 16 October 2007 14:38, Eric W. Biederman wrote: > Nick Piggin <[EMAIL PROTECTED]> writes: > > On Tuesday 16 October 2007 13:55, Eric W. Biederman wrote: > > I don't follow your logic. We don't need SWAP > RAM in order to swap > > effectively, IMO. > > The steady state of a system that is heavily and usably swapping but > not thrashing is that all of the pages in RAM are in the swap cache, > at least that used to be the case. Yeah, it works better in 2.6 (and, IIRC later 2.4 kernels). > > I don't know if there is a causal relationship there. I mean, I > > think it's been a long time since thrashing was ever a viable mode > > of operation, right? > > Right. But swapping heavily has been a viable mode of operation > and that the vast gap in disk random IO performance seems to have > hurt significantly. Or, just not improved as fast as everything else is improving. There isn't too much the kernel can do about that. It just relatively changes the point at which you'd consider "swapping heavily", right? > It be very clear is used to able to run a problem at little below > full speed with the disk pegged with swap traffic, and I did this > regularly when I started out with linux. I can do this now. In make -jhuge tests for example, you can get a 4GB, 4 core machine to max out a disk with swapping and still have 0 idle time. Of course you can also go past that point and your idle time comes up. That's not new though. > > Maybe desktops just have less need for swapping now, so nobody sees > > it much until something goes _really_ bad. When I'm using my 256MB > > machine, unused stuff goes to swap. > > There is a bit of truth in the fact that there is less need for > swapping now. At the same time however swapping simply does not > work well right now, and I'm not at all certain why. > > >> the disk for is very limited. I wonder if we could figure out > >> how to push and pull 1M or bigger chunks into and out of swap? 
> > > > Pulling in 1MB pages can really easily end up compounding the > > thrashing problem unless you're very sure a significant amount > > of it will be used. > > It's a hard call. The I/O time for 1MB of contiguous disk data > is about the I/O time of 512 bytes of contiguous disk data. And if you're thrashing, then by definition you need to throw out 1MB of your working set in order to read it in. > >> I don't know if swap has actually worked since we vmscan stopped > >> going over the virtual addresses. > > > > I do, and it does ;) > > Really? Not just the pushing of unused stuff into swap. We had several bugs and things that caused swapping performance regressions vs 2.4 in earlyish 2.6. After those were fixed, we're pretty competitive with 2.4 in some basic tests I was using. I haven't run them for a fair while, so something might have broken since then, I don't know. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [linux-usb-devel] usb+sysfs: duplicate filename 'bInterfaceNumber'
On Mon, Oct 15, 2007 at 02:38:25PM -0400, Alan Stern wrote: > On Mon, 15 Oct 2007, Dave Young wrote: > > > On 10/14/07, Borislav Petkov <[EMAIL PROTECTED]> wrote: > > > Hi, > > > > > > i get the following warning on yesterday's git tree > > > (v2.6.23-2840-g752097c): > > > > > > Oct 14 09:07:15 zmei kernel: [ 49.368030] sysfs: duplicate filename > > > 'bInterfaceNumber' can not be created > > > Oct 14 09:07:15 zmei kernel: [ 49.368086] WARNING: at > > > fs/sysfs/dir.c:425 sysfs_add_one() > > > Oct 14 09:07:15 zmei kernel: [ 49.368134] [] > > > show_trace_log_lvl+0x1a/0x2f > > > Oct 14 09:07:15 zmei kernel: [ 49.368220] [] > > > show_trace+0x12/0x14 > > > Oct 14 09:07:15 zmei kernel: [ 49.368300] [] > > > dump_stack+0x16/0x18 > > > Oct 14 09:07:15 zmei kernel: [ 49.368379] [] > > > sysfs_add_one+0x57/0xbc > > > Oct 14 09:07:15 zmei kernel: [ 49.368461] [] > > > sysfs_add_file+0x49/0x71 > > > Oct 14 09:07:15 zmei kernel: [ 49.368541] [] > > > sysfs_create_group+0x86/0xe8 > > > Oct 14 09:07:15 zmei kernel: [ 49.368621] [] > > > usb_create_sysfs_intf_files+0x27/0x9b > > > Oct 14 09:07:15 zmei kernel: [ 49.368704] [] > > > usb_set_configuration+0x454/0x466 > > > Oct 14 09:07:15 zmei kernel: [ 49.368787] [] > > > generic_probe+0x53/0x94 > > > Oct 14 09:07:15 zmei kernel: [ 49.368867] [] > > > usb_probe_device+0x35/0x3b > > > Oct 14 09:07:15 zmei kernel: [ 49.368947] [] > > > driver_probe_device+0xcb/0x14f > > > Oct 14 09:07:15 zmei kernel: [ 49.369039] [] > > > __device_attach+0x8/0xa > > > Oct 14 09:07:15 zmei kernel: [ 49.369119] [] > > > bus_for_each_drv+0x3b/0x63 > > > Oct 14 09:07:15 zmei kernel: [ 49.369199] [] > > > device_attach+0x70/0x85 > > > Oct 14 09:07:15 zmei kernel: [ 49.369279] [] > > > bus_attach_device+0x29/0x77 > > > Oct 14 09:07:15 zmei kernel: [ 49.369359] [] > > > device_add+0x28c/0x445 > > > Oct 14 09:07:15 zmei kernel: [ 49.369439] [] > > > usb_new_device+0x44/0x82 > > > Oct 14 09:07:15 zmei kernel: [ 49.369519] [] > > > hub_thread+0x666/0x9c2 > > 
> Oct 14 09:07:15 zmei kernel: [ 49.369598] [] > > > kthread+0x3b/0x62 > > > Oct 14 09:07:15 zmei kernel: [ 49.369679] [] > > > kernel_thread_helper+0x7/0x10 > > > Oct 14 09:07:15 zmei kernel: [ 49.369759] === > > > > > > The usb hub in question is named 4-1:1.0 and it has an extension > > > connected to it > > > which is used to activate the 2 usb connectors at the side of the pc's > > > monitor. > > > Correct me if i'm wrong but from what i've understood so far from reading > > > the code, > > > i think, it adds the bInterfaceNumber-file after calling > > > usb_create_sysfs_intf_files(intf). > > > However, the currently active usbhost interface alternate setting is the > > > only one active > > > so the bInterfaceNumber exists already and therefore the warning, but > > > this is > > > just a guess since i'm not that fluent in the usb internals. > > Hi, > > I have encountered the same problem which was reported in > > http://lkml.org/lkml/2007/9/29/45 > > > > For the first one "usbcore duplicated sysfs filename" , I have submit > > a patch to fix it. > > > > For the "bInterfaceNumber" one, I have no idea, the same problem still > > exist in the latest 23-mm1 tree. > > I have tried several times to duplicate this, most recently under > 2.6.23-mm1. But nothing goes wrong; the error messages don't appear. > > You may have to do your own debugging. Try adding printk statements to > usb_create_sysfs_intf_files() and usb_remove_sysfs_intf_files() so you > can tell when they get called. I finally duplicated this on one of my machines here at boot time, with USB built into the kernel. I'll work tomorrow on tracking this down further... thanks, greg k-h - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch] lockdep: fixup the inode dir annotation
* Linus Torvalds <[EMAIL PROTECTED]> wrote:

> > please pull the lockdep tree from:
> >
> >   git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-lockdep.git v2.6.24-lockdep
>
> Hmm. I'm now getting
>
>   WARNING: at kernel/lockdep.c:700 look_up_lock_class()

it triggered here too - the patch from Peter below was tested overnight
and seems to do the trick for me.

	Ingo

->
Subject: lockdep: fixup the inode dir annotation

A slight oversight tripped lockdep debugging code, each lockdep class
should have but a single init site. Rearrange the code to make this true.

Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 fs/inode.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

Index: linux/fs/inode.c
===
--- linux.orig/fs/inode.c
+++ linux/fs/inode.c
@@ -568,16 +568,16 @@ EXPORT_SYMBOL(new_inode);
 void unlock_new_inode(struct inode *inode)
 {
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
-	struct file_system_type *type = inode->i_sb->s_type;
-	/*
-	 * ensure nobody is actually holding i_mutex
-	 */
-	mutex_destroy(&inode->i_mutex);
-	mutex_init(&inode->i_mutex);
-	if (inode->i_mode & S_IFDIR)
+	if (inode->i_mode & S_IFDIR) {
+		struct file_system_type *type = inode->i_sb->s_type;
+
+		/*
+		 * ensure nobody is actually holding i_mutex
+		 */
+		mutex_destroy(&inode->i_mutex);
+		mutex_init(&inode->i_mutex);
 		lockdep_set_class(&inode->i_mutex, &type->i_mutex_dir_key);
-	else
-		lockdep_set_class(&inode->i_mutex, &type->i_mutex_key);
+	}
 #endif
 	/*
 	 * This is special! We do not need the spinlock
Re: OOM killer gripe (was Re: What still uses the block layer?)
[EMAIL PROTECTED] writes:

> on some kernel versions you are correct about needing swap > ram, but on
> current versions you are not. the swap space gets allocated as needed,
> and re-used as needed (I don't know the mechanism of this, but I remember
> the last time this changed from vm=max(ram,swap) to vm=ram+swap)

I don't think I can recall a linux kernel that required swap > ram.
However for serious swapping under linux having swap > ram was very
useful and pretty much a requirement for a workload that involved
swapping heavily (not thrashing).

>> I have not heard of many people swapping and not thrashing lately.
>> I think part of the problem is that we do random access to the swap
>> partition which makes us seek limited. And since the number of
>> seeks per unit time has been increasing at a linear or slower rate
>> that if we are doing random disk I/O then the amount we can use
>> the disk for is very limited. I wonder if we could figure out
>> how to push and pull 1M or bigger chunks into and out of swap?
>
> it has been noted by many people that linux is very slow to pull things
> back into ram from swap, significantly slower than simple seek limiting
> would seem to account for.

Yes. It may be the large amount of random access (my current guess) or
it may be something else.

I wonder if I should build an application with a configurable data set
and working set that can be used for swap testing. I don't think it
would be very hard and it might help sort through some of the swap
performance problems.

Eric
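A test program of the kind floated above could start out as something like the following sketch. It is only one guess at what "configurable data set and working set" might mean: the function name run_swap_load(), its parameters, the 4096-byte page size and the xorshift generator are all assumptions, and a real tool would also need timing and a way to size the data set relative to RAM.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_PAGE 4096u

/* Small deterministic PRNG (xorshift32) so runs are repeatable. */
static uint32_t toy_rand(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	return *state = x;
}

/*
 * Allocate data_pages pages, dirty each once so they are really backed,
 * then write to random pages inside the first working_pages pages.
 * Returns the number of distinct working-set pages touched, or -1 if
 * allocation fails.
 */
static long run_swap_load(size_t data_pages, size_t working_pages,
			  size_t touches, uint32_t seed)
{
	unsigned char *data, *seen;
	uint32_t state = seed ? seed : 1;	/* xorshift must not start at 0 */
	long distinct = 0;

	assert(working_pages > 0 && working_pages <= data_pages);
	data = malloc(data_pages * TOY_PAGE);
	seen = calloc(working_pages, 1);
	if (!data || !seen) {
		free(data);
		free(seen);
		return -1;
	}

	/* The data set: everything the process owns. */
	for (size_t i = 0; i < data_pages; i++)
		data[i * TOY_PAGE] = (unsigned char)i;

	/* The working set: the part that is touched repeatedly. */
	for (size_t i = 0; i < touches; i++) {
		size_t page = toy_rand(&state) % working_pages;

		data[page * TOY_PAGE] += 1;
		if (!seen[page]) {
			seen[page] = 1;
			distinct++;
		}
	}

	free(data);
	free(seen);
	return distinct;
}
```

Running it with a data set larger than RAM and a working set smaller than RAM would exercise the "swapping but not thrashing" case; raising the working set past RAM moves it toward thrashing.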
Re: OOM killer gripe (was Re: What still uses the block layer?)
Nick Piggin <[EMAIL PROTECTED]> writes: > On Tuesday 16 October 2007 13:55, Eric W. Biederman wrote: >> Nick Piggin <[EMAIL PROTECTED]> writes: > >> > How much swap do you have configured? You really shouldn't configure >> > so much unless you do want the kernel to actually use it all, right? >> >> No. >> >> There are three basic swapping scenarios. >> - Pushing unused data out of ram >> - Swapping >> - Thrashing >> >> To effectively swap you need SWAP > RAM because after a little while of >> swapping all of your pages in RAM should be assigned a location in the >> page cache. > > I don't follow your logic. We don't need SWAP > RAM in order to swap > effectively, IMO. The steady state of a system that is heavily and usably swapping but not thrashing is that all of the pages in RAM are in the swap cache, at least that used to be the case. >> I have not heard of many people swapping and not thrashing lately. >> I think part of the problem is that we do random access to the swap >> partition which makes us seek limited. And since the number of >> seeks per unit time has been increasing at a linear or slower rate >> that if we are doing random disk I/O then the amount we can use > > I don't know if there is a causal relationship there. I mean, I > think it's been a long time since thrashing was ever a viable mode > of operation, right? Right. But swapping heavily has been a viable mode of operation and that the vast gap in disk random IO performance seems to have hurt significantly. It be very clear is used to able to run a problem at little below full speed with the disk pegged with swap traffic, and I did this regularly when I started out with linux. > Maybe desktops just have less need for swapping now, so nobody sees > it much until something goes _really_ bad. When I'm using my 256MB > machine, unused stuff goes to swap. There is a bit of truth in the fact that there is less need for swapping now. 
At the same time however swapping simply does not work well right now, and I'm not at all certain why. >> the disk for is very limited. I wonder if we could figure out >> how to push and pull 1M or bigger chunks into and out of swap? > > Pulling in 1MB pages can really easily end up compounding the > thrashing problem unless you're very sure a significant amount > of it will be used. It's a hard call. The I/O time for 1MB of contiguous disk data is about the I/O time of 512 bytes of contiguous disk data. >> I don't know if swap has actually worked since we vmscan stopped >> going over the virtual addresses. > > I do, and it does ;) Really? Not just the pushing of unused stuff into swap. >> > Because if we're not really conservative about OOM killing, then the >> > user who actually really did want to use all the swap they configured >> > gets angry when we kill their jobs without using it all. >> >> I totally agree. The fact that the OOM killer started is a sign that >> the system was completely overwhelmed and nothing better could happen. >> >> In this case my gut feel says limiting the total number of processes >> would have been much more effective then anything at all to do with >> swap. make -j reminds me of the classic fork bomb. > > Yep. > > >> > Would an oom-kill-someone-now sysrq be of help, I wonder? >> >> Well we have SAQ which should kill everything on your current VT >> which should include X and all of it's children. > > Which is exactly what you don't want to do if you've just forkbombed > yourself. I missed the fact that we now have a manual oom kill... You probably have a point there. Eric - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Allow kconfig to accept overrides
Hi Rob & Jan.

On Fri, Oct 12, 2007 at 11:44:08PM +0200, Jan Engelhardt wrote:
>
> Allow config variables in .config to override earlier ones in the same
> file. In other words,
>
>     # CONFIG_SECURITY is not defined
>     CONFIG_SECURITY=y
>
> will activate it. This makes it a bit easier to do
>
>     (cat original-config myconfig myconfig2 ... >.config)
>
> and run menuconfig as expected.

How far is this from the miniconfig functionality?
Is it the same or can we achieve the miniconfig support
by extending Jan's patch?

See: http://lkml.org/lkml/2007/10/12/391

	Sam
Re: What still uses the block layer?
On Mon, Oct 15, 2007 at 10:04:01PM -0600, Matthew Wilcox wrote:
> On Mon, Oct 15, 2007 at 07:54:22PM -0700, [EMAIL PROTECTED] wrote:
> > do PCI devices reorder their bus numbers spontaneously, or only if you
> > change the hardware?
>
> The only system I've had that reordered PCI bus numbers was when I had a
> partitionable system and changed the partitioning. Not quite "change
> the hardware", but neither was it "spontaneous". It was certainly
> unexpected (for me).
>
> Greg probably has quite different examples.

Changing the hardware (adding a new PCI device or removing one) is the
most common time this happens. But I have seen reports of this
happening when you upgrade/downgrade BIOS versions, and, in some
oops-we-messed-up cases, when we changed things in the kernel.

thanks,

greg k-h
Re: OOM killer gripe (was Re: What still uses the block layer?)
On Tuesday 16 October 2007 13:55, Eric W. Biederman wrote:
> Nick Piggin <[EMAIL PROTECTED]> writes:
> > How much swap do you have configured? You really shouldn't configure
> > so much unless you do want the kernel to actually use it all, right?
>
> No.
>
> There are three basic swapping scenarios.
> - Pushing unused data out of ram
> - Swapping
> - Thrashing
>
> To effectively swap you need SWAP > RAM because after a little while of
> swapping all of your pages in RAM should be assigned a location in the
> page cache.

I don't follow your logic. We don't need SWAP > RAM in order to swap
effectively, IMO.

> I have not heard of many people swapping and not thrashing lately.
> I think part of the problem is that we do random access to the swap
> partition which makes us seek limited. And since the number of
> seeks per unit time has been increasing at a linear or slower rate,
> if we are doing random disk I/O then the amount we can use

I don't know if there is a causal relationship there. I mean, I think
it's been a long time since thrashing was ever a viable mode of
operation, right? Maybe desktops just have less need for swapping now,
so nobody sees it much until something goes _really_ bad. When I'm
using my 256MB machine, unused stuff goes to swap.

> the disk for is very limited. I wonder if we could figure out
> how to push and pull 1M or bigger chunks into and out of swap?

Pulling in 1MB pages can really easily end up compounding the
thrashing problem unless you're very sure a significant amount
of it will be used.

> I don't know if swap has actually worked since vmscan stopped
> going over the virtual addresses.

I do, and it does ;)

> > Because if we're not really conservative about OOM killing, then the
> > user who actually really did want to use all the swap they configured
> > gets angry when we kill their jobs without using it all.
>
> I totally agree. The fact that the OOM killer started is a sign that
> the system was completely overwhelmed and nothing better could happen.
>
> In this case my gut feel says limiting the total number of processes
> would have been much more effective than anything at all to do with
> swap. make -j reminds me of the classic fork bomb.

Yep.

> > Would an oom-kill-someone-now sysrq be of help, I wonder?
>
> Well we have SAK which should kill everything on your current VT
> which should include X and all of its children.

Which is exactly what you don't want to do if you've just forkbombed
yourself. I missed the fact that we now have a manual oom kill...
Re: [PATCH] Make m68k cross compile like every other architecture.
On Mon, Oct 15, 2007 at 07:31:54PM -0500, Rob Landley wrote:
> On Monday 15 October 2007 3:25:35 pm Geert Uytterhoeven wrote:
> > 64-bit parisc tests if /usr/bin/hppa64-linux-gnu- exists.
> > If yes, it sets CROSS_COMPILE to hppa64-linux-gnu-.
> > If no, it sets CROSS_COMPILE to hppa64-linux-
> >
> > 32-bit parisc unconditionally sets CROSS_COMPILE to hppa-linux-.
> >
> > This still breaks Rob's setup if his compiler is called differently.
>
> Another thing to take into account is that kconfig was recently changed to
> save ARCH and CROSS_COMPILE in the .config file:
>
> http://lwn.net/Articles/253889/

The patch is postponed one merge window. It caused trouble I had not
foreseen, which needs some attention first. I plan to have it ready
for the next merge window.

Sam
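The parisc prefix selection Geert describes can be written down as a small decision function. This is a sketch of the described logic only, not the actual Makefile: the function name, the `override` parameter (standing in for a CROSS_COMPILE supplied by the user or saved in .config), and the choice of `gcc` as the tool that gets probed are all assumptions made for illustration.

```python
import os


def parisc_cross_compile(is_64bit, override=None,
                         tool_exists=os.path.exists, probe_tool="gcc"):
    """Pick a CROSS_COMPILE prefix following the rules quoted above.

    override, tool_exists and probe_tool are hypothetical knobs for
    this sketch; the quoted mail only names the prefixes themselves.
    """
    if override:
        # An explicit setting wins, which is what a differently named
        # toolchain such as Rob's would need.
        return override
    if not is_64bit:
        # 32-bit parisc uses a fixed prefix.
        return "hppa-linux-"
    gnu_prefix = "hppa64-linux-gnu-"
    if tool_exists("/usr/bin/" + gnu_prefix + probe_tool):
        return gnu_prefix
    return "hppa64-linux-"


print(parisc_cross_compile(False))
print(parisc_cross_compile(True, tool_exists=lambda path: False))
```

Passing `tool_exists` as a parameter keeps the probe testable without touching the real filesystem.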
Re: What still uses the block layer?
On Mon, 15 Oct 2007 22:04:01 -0600 Matthew Wilcox <[EMAIL PROTECTED]> wrote:
> On Mon, Oct 15, 2007 at 07:54:22PM -0700, [EMAIL PROTECTED] wrote:
> > do PCI devices reorder their bus numbers spontaneously, or only if
> > you change the hardware?
>
> The only system I've had that reordered PCI bus numbers was when I
> had a partitionable system and changed the partitioning. Not quite
> "change the hardware", but neither was it "spontaneous". It was
> certainly unexpected (for me).
>

A very common one is booting your laptop docked (a real dock, not just
a port extender) versus non-docked.
Re: What still uses the block layer?
On Mon, 15 Oct 2007, Matthew Wilcox wrote:
> On Mon, Oct 15, 2007 at 07:54:22PM -0700, [EMAIL PROTECTED] wrote:
>> do PCI devices reorder their bus numbers spontaneously, or only if you
>> change the hardware?
>
> The only system I've had that reordered PCI bus numbers was when I had a
> partitionable system and changed the partitioning. Not quite "change
> the hardware", but neither was it "spontaneous". It was certainly
> unexpected (for me).

Ok, I would class that as the equivalent of 'changing the hardware'.

> Greg probably has quite different examples.

I would definitely be interested in hearing some of them.

Greg's comment makes it sound like this is something that (with modern
hardware) could happen to anyone at any time (which, if true, would be
sufficient to require 'best effort' naming of devices for everything),
while my experience is that if the hardware is static (i.e. you don't
plug in or unplug PCI devices) the numbering of existing PCI devices and
buses is static.

And while I understand that consumer distros want to have everything
'best effort' named to make it easier for users, I disagree that this
should force everyone to use 'best effort' when there are many
situations where it adds unnecessary overhead and chances for errors.

David Lang
Re: OOM killer gripe (was Re: What still uses the block layer?)
On Mon, 15 Oct 2007, Eric W. Biederman wrote:
> Nick Piggin <[EMAIL PROTECTED]> writes:
>> How much swap do you have configured? You really shouldn't configure
>> so much unless you do want the kernel to actually use it all, right?
>
> No.
>
> There are three basic swapping scenarios.
> - Pushing unused data out of ram
> - Swapping
> - Thrashing
>
> To effectively swap you need SWAP > RAM because after a little while of
> swapping all of your pages in RAM should be assigned a location in the
> page cache.

On some kernel versions you are correct about needing swap > ram, but
on current versions you are not. The swap space gets allocated as
needed, and re-used as needed (I don't know the mechanism of this, but I
remember the last time this changed from vm=max(ram,swap) to
vm=ram+swap).

> I have not heard of many people swapping and not thrashing lately.
> I think part of the problem is that we do random access to the swap
> partition which makes us seek limited. And since the number of
> seeks per unit time has been increasing at a linear or slower rate,
> if we are doing random disk I/O then the amount we can use
> the disk for is very limited. I wonder if we could figure out
> how to push and pull 1M or bigger chunks into and out of swap?

It has been noted by many people that Linux is very slow to pull things
back into ram from swap, significantly slower than simple seek limiting
would seem to account for.

David Lang
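The difference between the two accounting models David mentions, vm=max(ram,swap) and vm=ram+swap, is easiest to see with numbers. The sketch below only restates that arithmetic; `vm_capacity_mb` is a hypothetical helper and the sizes are made-up examples, with nothing here taken from the kernel's actual accounting code.

```python
def vm_capacity_mb(ram_mb, swap_mb, model):
    """Total usable virtual memory under the two models named above."""
    if model == "max":
        # Older behavior as described: pages in RAM also hold a swap
        # slot, so swap no larger than RAM adds no capacity.
        return max(ram_mb, swap_mb)
    if model == "sum":
        # Newer behavior as described: swap is allocated on demand,
        # so RAM and swap add up.
        return ram_mb + swap_mb
    raise ValueError(f"unknown model: {model}")


for ram, swap in [(2048, 1024), (2048, 4096)]:
    print(ram, swap,
          vm_capacity_mb(ram, swap, "max"),
          vm_capacity_mb(ram, swap, "sum"))
```

With 2048 MB of RAM and 1024 MB of swap, the "max" model yields 2048 MB (the swap buys nothing) while the "sum" model yields 3072 MB, which is why SWAP > RAM mattered only under the older scheme.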
Re: [PATCH resend] ramdisk: fix zeroed ramdisk pages on memory pressure
On Tuesday 16 October 2007 13:14, Eric W. Biederman wrote:
> Nick Piggin <[EMAIL PROTECTED]> writes:
> > On Monday 15 October 2007 19:16, Andrew Morton wrote:
> >> On Tue, 16 Oct 2007 00:06:19 +1000 Nick Piggin <[EMAIL PROTECTED]> wrote:
> >> > On Monday 15 October 2007 18:28, Christian Borntraeger wrote:
> >> > > Andrew, this is a resend of a bugfix patch. Ramdisk seems a bit
> >> > > unmaintained, so decided to send the patch to you :-).
> >> > > I have CCed Ted, who did work on the code in the 90s. I found no
> >> > > current email address of Chad Page.
> >> >
> >> > This really needs to be fixed...
> >>
> >> rd.c is fairly mind-boggling vfs abuse.
> >
> > Why do you say that? I guess it is _different_, by necessity(?)
> > Is there anything that is really bad?
>
> make_page_uptodate() is the most hideous part I have run into.
> It has to know details about other layers to know what not
> to stomp. I think my incorrect simplification of this is what messed
> things up, last round.

Not really, it's just named funny. That's just a minor utility
function that more or less does what it says it should do.

The main problem is really that it's implementing a block device
whose data comes from its own buffercache :P. I think.

> > I guess it's not nice
> > for operating on the pagecache from its request_fn, but the
> > alternative is to duplicate pages for backing store and buffer
> > cache (actually that might not be a bad alternative really).
>
> Cool. Triple buffering :) Although I guess that would only
> apply to metadata these days.

Double buffering. You no longer serve data out of your buffer
cache. All filesystem data was already double buffered anyway,
so we'd be just losing out on one layer of savings for metadata.
I think it's worthwhile, given that we'd have a "real" looking
block device and minus these bugs.

> Having a separate store would
> solve some of the problems, and probably remove the need
> for carefully specifying the ramdisk block size. We would
> still need the magic restrictions on page allocations though
> and we would use them more often as the initial write to the
> ramdisk would not populate the pages we need.

What magic restrictions on page allocations? Actually we have
fewer restrictions on page allocations because we can use
highmem! And the lowmem buffercache pages that we currently pin
(unsuccessfully, in the case of this bug) are now completely
reclaimable. And all your buffer heads are now reclaimable.

If you mean GFP_NOIO... I don't see any problem. Block device
drivers have to allocate memory with GFP_NOIO; this may have
been considered magic or deep badness back when the code was
written, but it's pretty simple and accepted now.

> A very ugly bit seems to be the fact that we assume we can
> dereference bh->b_data without any special magic which
> means the ramdisk must live in low memory on 32bit machines.

Yeah but that's not rd.c. You need to rewrite the buffer layer
to fix that (see fsblock ;)).
Re: What still uses the block layer?
On Mon, Oct 15, 2007 at 07:54:22PM -0700, [EMAIL PROTECTED] wrote:
> do PCI devices reorder their bus numbers spontaneously, or only if you
> change the hardware?

The only system I've had that reordered PCI bus numbers was when I had a
partitionable system and changed the partitioning. Not quite "change
the hardware", but neither was it "spontaneous". It was certainly
unexpected (for me).

Greg probably has quite different examples.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
Re: OOM killer gripe (was Re: What still uses the block layer?)
Nick Piggin <[EMAIL PROTECTED]> writes:

> On Monday 15 October 2007 18:04, Rob Landley wrote:
>> On Sunday 14 October 2007 8:45:03 pm Theodore Tso wrote:
>
>> > > excuse for conflating different categories of devices in the first
>> > > place.
>> >
>> > See the thinkpad Ultrabay drive example above.
>>
>> Last week I drove my laptop so deep into swap (with a "make -j" on qemu)
>> that after half an hour trying to repaint my kmail window, it locked solid.
>> Again. You'd think the oom killer would come to the rescue, but it didn't.
>> Maybe Ubuntu disabled it. I have _2_gigs_ of ram in this sucker, on a
>> stock Ubuntu 7.04 install (with the "upgrade all" tab pressed a few times),
>> and yet I managed to make it swap itself to death one more time.
>>
>> Virtual memory isn't perfect. I've _always_ been able to come up with
>> examples where it just doesn't work for me. This doesn't mean VM
>> overcommit should be abolished, because it's useful more often than not.
>
> I hate to go completely offtopic here, but disks are so incredibly
> slow when compared to RAM that there is really nothing the kernel
> can do about this. Presumably the job will finish, given infinite
> time.
>
> How much swap do you have configured? You really shouldn't configure
> so much unless you do want the kernel to actually use it all, right?

No.

There are three basic swapping scenarios.
- Pushing unused data out of ram
- Swapping
- Thrashing

To effectively swap you need SWAP > RAM because after a little while of
swapping all of your pages in RAM should be assigned a location in the
page cache.

I have not heard of many people swapping and not thrashing lately.
I think part of the problem is that we do random access to the swap
partition which makes us seek limited. And since the number of
seeks per unit time has been increasing at a linear or slower rate,
if we are doing random disk I/O then the amount we can use
the disk for is very limited. I wonder if we could figure out
how to push and pull 1M or bigger chunks into and out of swap?

I don't know if swap has actually worked since vmscan stopped
going over the virtual addresses.

> Because if we're not really conservative about OOM killing, then the
> user who actually really did want to use all the swap they configured
> gets angry when we kill their jobs without using it all.

I totally agree. The fact that the OOM killer started is a sign that
the system was completely overwhelmed and nothing better could happen.

In this case my gut feel says limiting the total number of processes
would have been much more effective than anything at all to do with
swap. make -j reminds me of the classic fork bomb.

> Would an oom-kill-someone-now sysrq be of help, I wonder?

Well we have SAK which should kill everything on your current VT
which should include X and all of its children.

Eric
[PATCH] powerpc64 vDSO: linker script indentation
This cleans up the formatting in the vDSO linker script, mostly just the use of whitespace. It's intended to approximate the kernel standard conventions for indenting C, treating elements of the linker script about like initialized variable definitions. Signed-off-by: Roland McGrath <[EMAIL PROTECTED]> CC: Sam Ravnborg <[EMAIL PROTECTED]> --- arch/powerpc/kernel/vdso64/vdso64.lds.S | 225 +-- 1 files changed, 122 insertions(+), 103 deletions(-) diff --git a/arch/powerpc/kernel/vdso64/vdso64.lds.S b/arch/powerpc/kernel/vdso64/vdso64.lds.S index 2d70f35..932b3fd 100644 --- a/arch/powerpc/kernel/vdso64/vdso64.lds.S +++ b/arch/powerpc/kernel/vdso64/vdso64.lds.S @@ -10,100 +10,114 @@ ENTRY(_start) SECTIONS { - . = VDSO64_LBASE + SIZEOF_HEADERS; - .hash : { *(.hash) } :text - .gnu.hash : { *(.gnu.hash) } - .dynsym : { *(.dynsym) } - .dynstr : { *(.dynstr) } - .gnu.version: { *(.gnu.version) } - .gnu.version_d : { *(.gnu.version_d) } - .gnu.version_r : { *(.gnu.version_r) } - - .note : { *(.note.*) } :text :note - - . = ALIGN (16); - .text : - { -*(.text .stub .text.* .gnu.linkonce.t.*) -*(.sfpr .glink) - }:text - PROVIDE (__etext = .); - PROVIDE (_etext = .); - PROVIDE (etext = .); - - . = ALIGN(8); - __ftr_fixup : { -*(__ftr_fixup) - } - - . 
= ALIGN(8); - __fw_ftr_fixup : { -*(__fw_ftr_fixup) - } - - /* Other stuff is appended to the text segment: */ - .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } - .rodata1: { *(.rodata1) } - .eh_frame_hdr : { *(.eh_frame_hdr) } :text :eh_frame_hdr - .eh_frame : { KEEP (*(.eh_frame)) }:text - .gcc_except_table : { *(.gcc_except_table) } - - .opd ALIGN(8) : { KEEP (*(.opd)) } - .got ALIGN(8) : { *(.got .toc) } - .rela.dyn ALIGN(8) : { *(.rela.dyn) } - - .dynamic: { *(.dynamic) }:text :dynamic - - _end = .; - PROVIDE (end = .); - - /* Stabs debugging sections are here too - */ - .stab 0 : { *(.stab) } - .stabstr 0 : { *(.stabstr) } - .stab.excl 0 : { *(.stab.excl) } - .stab.exclstr 0 : { *(.stab.exclstr) } - .stab.index0 : { *(.stab.index) } - .stab.indexstr 0 : { *(.stab.indexstr) } - .comment 0 : { *(.comment) } - /* DWARF debug sectio/ns. - Symbols in the DWARF debugging sections are relative to the beginning - of the section so we begin them at 0. */ - /* DWARF 1 */ - .debug 0 : { *(.debug) } - .line 0 : { *(.line) } - /* GNU DWARF 1 extensions */ - .debug_srcinfo 0 : { *(.debug_srcinfo) } - .debug_sfnames 0 : { *(.debug_sfnames) } - /* DWARF 1.1 and DWARF 2 */ - .debug_aranges 0 : { *(.debug_aranges) } - .debug_pubnames 0 : { *(.debug_pubnames) } - /* DWARF 2 */ - .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } - .debug_abbrev 0 : { *(.debug_abbrev) } - .debug_line 0 : { *(.debug_line) } - .debug_frame0 : { *(.debug_frame) } - .debug_str 0 : { *(.debug_str) } - .debug_loc 0 : { *(.debug_loc) } - .debug_macinfo 0 : { *(.debug_macinfo) } - /* SGI/MIPS DWARF 2 extensions */ - .debug_weaknames 0 : { *(.debug_weaknames) } - .debug_funcnames 0 : { *(.debug_funcnames) } - .debug_typenames 0 : { *(.debug_typenames) } - .debug_varnames 0 : { *(.debug_varnames) } - - /DISCARD/ : { *(.note.GNU-stack) } - /DISCARD/ : { *(.branch_lt) } - /DISCARD/ : { *(.data .data.* .gnu.linkonce.d.*) } - /DISCARD/ : { *(.bss .sbss .dynbss .dynsbss) } + . 
= VDSO64_LBASE + SIZEOF_HEADERS; + + .hash : { *(.hash) } :text + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version: { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + + .note : { *(.note.*) }:text :note + + . = ALIGN(16); + .text : { + *(.text .stub .text.* .gnu.linkonce.t.*) + *(.sfpr .glink) + } :text + PROVIDE(__etext = .); + PROVIDE(_etext = .); + PROVIDE(etext = .); + + . = ALIGN(8); + __ftr_fixup : { *(__ftr_fixup) } + + . = ALIGN(8); + __fw_ftr_fixup : { *(__fw_ftr_fixup) } + + /* +* Other stuff is appended to the text segment: +*/ + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1: { *(.rodata1) } + + .eh_frame_hdr : { *(.eh_frame_hdr) } :text :eh_frame_hdr + .eh_frame : { KEEP (*(.eh_frame)) } :text + .gcc_except_table : { *(.gcc_except_table) } + + .opd ALIGN(8) : { KEEP (*(.opd)) } + .got ALIGN(8) :
[PATCH] powerpc32 vDSO: linker script indentation
This cleans up the formatting in the vDSO linker script, mostly just the use of whitespace. It's intended to approximate the kernel standard conventions for indenting C, treating elements of the linker script about like initialized variable definitions. Signed-off-by: Roland McGrath <[EMAIL PROTECTED]> CC: Sam Ravnborg <[EMAIL PROTECTED]> --- arch/powerpc/kernel/vdso32/vdso32.lds.S | 219 +-- 1 files changed, 118 insertions(+), 101 deletions(-) diff --git a/arch/powerpc/kernel/vdso32/vdso32.lds.S b/arch/powerpc/kernel/vdso32/vdso32.lds.S index 26e138c..9352ab5 100644 --- a/arch/powerpc/kernel/vdso32/vdso32.lds.S +++ b/arch/powerpc/kernel/vdso32/vdso32.lds.S @@ -1,130 +1,147 @@ - /* * This is the infamous ld script for the 32 bits vdso * library */ #include -/* Default link addresses for the vDSOs */ OUTPUT_FORMAT("elf32-powerpc", "elf32-powerpc", "elf32-powerpc") OUTPUT_ARCH(powerpc:common) ENTRY(_start) SECTIONS { - . = VDSO32_LBASE + SIZEOF_HEADERS; - .hash : { *(.hash) } :text - .gnu.hash : { *(.gnu.hash) } - .dynsym : { *(.dynsym) } - .dynstr : { *(.dynstr) } - .gnu.version: { *(.gnu.version) } - .gnu.version_d : { *(.gnu.version_d) } - .gnu.version_r : { *(.gnu.version_r) } - - .note : { *(.note.*) } :text :note - - . = ALIGN (16); - .text : - { -*(.text .stub .text.* .gnu.linkonce.t.*) - } - PROVIDE (__etext = .); - PROVIDE (_etext = .); - PROVIDE (etext = .); - - . = ALIGN(8); - __ftr_fixup : { -*(__ftr_fixup) - } + . = VDSO32_LBASE + SIZEOF_HEADERS; + + .hash : { *(.hash) } :text + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version: { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + + .note : { *(.note.*) }:text :note + + . = ALIGN(16); + .text : { + *(.text .stub .text.* .gnu.linkonce.t.*) + } + PROVIDE(__etext = .); + PROVIDE(_etext = .); + PROVIDE(etext = .); + + . = ALIGN(8); + __ftr_fixup : { *(__ftr_fixup) } #ifdef CONFIG_PPC64 - . 
= ALIGN(8); - __fw_ftr_fixup : { -*(__fw_ftr_fixup) - } + . = ALIGN(8); + __fw_ftr_fixup : { *(__fw_ftr_fixup) } #endif - /* Other stuff is appended to the text segment: */ - .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } - .rodata1 : { *(.rodata1) } - - .eh_frame_hdr: { *(.eh_frame_hdr) } :text :eh_frame_hdr - .eh_frame: { KEEP (*(.eh_frame)) } :text - .gcc_except_table: { *(.gcc_except_table) } - .fixup : { *(.fixup) } - - .dynamic : { *(.dynamic) } :text :dynamic - .got : { *(.got) } - .plt : { *(.plt) } - - _end = .; - __end = .; - PROVIDE (end = .); - - - /* Stabs debugging sections are here too - */ - .stab 0 : { *(.stab) } - .stabstr 0 : { *(.stabstr) } - .stab.excl 0 : { *(.stab.excl) } - .stab.exclstr 0 : { *(.stab.exclstr) } - .stab.index 0 : { *(.stab.index) } - .stab.indexstr 0 : { *(.stab.indexstr) } - .comment 0 : { *(.comment) } - .debug 0 : { *(.debug) } - .line 0 : { *(.line) } - - .debug_srcinfo 0 : { *(.debug_srcinfo) } - .debug_sfnames 0 : { *(.debug_sfnames) } - - .debug_aranges 0 : { *(.debug_aranges) } - .debug_pubnames 0 : { *(.debug_pubnames) } - - .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } - .debug_abbrev 0 : { *(.debug_abbrev) } - .debug_line 0 : { *(.debug_line) } - .debug_frame 0 : { *(.debug_frame) } - .debug_str 0 : { *(.debug_str) } - .debug_loc 0 : { *(.debug_loc) } - .debug_macinfo 0 : { *(.debug_macinfo) } - - .debug_weaknames 0 : { *(.debug_weaknames) } - .debug_funcnames 0 : { *(.debug_funcnames) } - .debug_typenames 0 : { *(.debug_typenames) } - .debug_varnames 0 : { *(.debug_varnames) } - - /DISCARD/ : { *(.note.GNU-stack) } - /DISCARD/ : { *(.data .data.* .gnu.linkonce.d.* .sdata*) } - /DISCARD/ : { *(.bss .sbss .dynbss .dynsbss) } + /* +* Other stuff is appended to the text segment: +*/ + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1: { *(.rodata1) } + + .eh_frame_hdr : { *(.eh_frame_hdr) } :text :eh_frame_hdr + .eh_frame : { KEEP (*(.eh_frame)) } :text + .gcc_except_table : { 
*(.gcc_except_table) } + .fixup : { *(.fixup) } + + .dynamic: { *(.dynamic) } :text :dynamic + .got: { *(.got) } + .plt: { *(.plt) } + + _end = .; + __end = .; + PROVIDE(end = .); + + /* +* Stabs debugging sections are here too. +*/ +
[PATCH] SH vDSO: linker script indentation
This cleans up the formatting in the vDSO linker script, mostly just the use of whitespace. It's intended to approximate the kernel standard conventions for indenting C, treating elements of the linker script about like initialized variable definitions. Signed-off-by: Roland McGrath <[EMAIL PROTECTED]> CC: Sam Ravnborg <[EMAIL PROTECTED]> --- arch/sh/kernel/vsyscall/vsyscall.lds.S | 77 +-- 1 files changed, 42 insertions(+), 35 deletions(-) diff --git a/arch/sh/kernel/vsyscall/vsyscall.lds.S b/arch/sh/kernel/vsyscall/vsyscall.lds.S index b13c3d4..c9bf2af 100644 --- a/arch/sh/kernel/vsyscall/vsyscall.lds.S +++ b/arch/sh/kernel/vsyscall/vsyscall.lds.S @@ -17,45 +17,52 @@ ENTRY(__kernel_vsyscall); SECTIONS { - . = SIZEOF_HEADERS; + . = SIZEOF_HEADERS; - .hash : { *(.hash) } :text - .gnu.hash : { *(.gnu.hash) } - .dynsym : { *(.dynsym) } - .dynstr : { *(.dynstr) } - .gnu.version: { *(.gnu.version) } - .gnu.version_d : { *(.gnu.version_d) } - .gnu.version_r : { *(.gnu.version_r) } + .hash : { *(.hash) } :text + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version: { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } - /* This linker script is used both with -r and with -shared. - For the layouts to match, we need to skip more than enough - space for the dynamic symbol table et al. If this amount - is insufficient, ld -shared will barf. Just increase it here. */ - . = 0x400; + /* +* This linker script is used both with -r and with -shared. +* For the layouts to match, we need to skip more than enough +* space for the dynamic symbol table et al. If this amount +* is insufficient, ld -shared will barf. Just increase it here. +*/ + . 
= 0x400; - .text : { *(.text) } :text =0x90909090 - .note : { *(.note.*) } :text :note - .eh_frame_hdr : { *(.eh_frame_hdr) } :text :eh_frame_hdr - .eh_frame : { KEEP (*(.eh_frame)) }:text - .dynamic: { *(.dynamic) }:text :dynamic - .useless: { - *(.got.plt) *(.got) - *(.data .data.* .gnu.linkonce.d.*) - *(.dynbss) - *(.bss .bss.* .gnu.linkonce.b.*) - }:text + .text : { *(.text) } :text =0x90909090 + .note : { *(.note.*) }:text :note + .eh_frame_hdr : { *(.eh_frame_hdr ) } :text :eh_frame_hdr + .eh_frame : { KEEP (*(.eh_frame)) } :text + .dynamic: { *(.dynamic) } :text :dynamic + .useless: { + *(.got.plt) *(.got) + *(.data .data.* .gnu.linkonce.d.*) + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + } :text } /* + * Very old versions of ld do not recognize this name token; use the constant. + */ +#define PT_GNU_EH_FRAME0x6474e550 + +/* * We must supply the ELF program headers explicitly to get just one * PT_LOAD segment, and set the flags explicitly to make segments read-only. */ PHDRS { - text PT_LOAD FILEHDR PHDRS FLAGS(5); /* PF_R|PF_X */ - dynamic PT_DYNAMIC FLAGS(4); /* PF_R */ - note PT_NOTE FLAGS(4); /* PF_R */ - eh_frame_hdr 0x6474e550; /* PT_GNU_EH_FRAME, but ld doesn't match the name */ + textPT_LOAD FILEHDR PHDRS FLAGS(5); /* PF_R|PF_X */ + dynamic PT_DYNAMIC FLAGS(4);/* PF_R */ + notePT_NOTE FLAGS(4); /* PF_R */ + eh_frame_hdrPT_GNU_EH_FRAME; } /* @@ -63,12 +70,12 @@ PHDRS */ VERSION { - LINUX_2.6 { -global: - __kernel_vsyscall; - __kernel_sigreturn; - __kernel_rt_sigreturn; + LINUX_2.6 { + global: + __kernel_vsyscall; + __kernel_sigreturn; + __kernel_rt_sigreturn; -local: *; - }; + local: *; + }; } - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
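The numeric arguments in the PHDRS block above are ordinary ELF constants, and a short sketch can decode them. The values below are the standard ELF definitions (PF_X = 1, PF_W = 2, PF_R = 4, PT_LOOS = 0x60000000); `describe_flags` is a hypothetical helper added only for illustration.

```python
# Standard ELF program-header permission bits.
PF_X = 0x1  # executable
PF_W = 0x2  # writable
PF_R = 0x4  # readable

# GNU extension segment type, defined relative to PT_LOOS.
PT_LOOS = 0x60000000
PT_GNU_EH_FRAME = PT_LOOS + 0x474E550


def describe_flags(flags):
    """Render a FLAGS(n) value from a linker script as PF_* names."""
    names = [(PF_R, "PF_R"), (PF_W, "PF_W"), (PF_X, "PF_X")]
    return "|".join(name for bit, name in names if flags & bit) or "0"


print(describe_flags(5))      # the text segment: FLAGS(5)
print(describe_flags(4))      # dynamic and note: FLAGS(4)
print(hex(PT_GNU_EH_FRAME))   # the value behind the #define in the patch
```

This matches the comments in the script: FLAGS(5) is PF_R|PF_X, FLAGS(4) is PF_R, and 0x6474e550 is the number the patch names PT_GNU_EH_FRAME for old versions of ld that do not know the token.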
Re: Killing a network connection
> There is a /proc/sys/net/ipv4/ip_dynaddr sysctl in 2.6.21.

Actually, it does look promising, thanks.

Stefan
Re: Killing a network connection
>> The main use for me is to deal with dangling connections due to taking
>> network interfaces up with different IP addresses (typically the wlan0
>> interface where the IP is different because I've moved from one AP to
>> another). Of course, maybe there's another way to solve this particular
>> problem, in which case I'd like to hear about it as well.

> Long ago I did a 2.4 patch that solved exactly this problem. It introduced
> a new ifconfig flag "dynamic" and when a dynamic address went down
> all TCP connections originating from it were killed. It's still available
> in older SUSE releases. I might post a forward port later.

Actually, I'm pretty happy sometimes with the current behavior: if the
interface goes down and back up with the same AP within a short enough
time, it typically gets the same IP and the router's NAT table still has
the TCP connection live and things "just work".
So I'd want to kill the connections not when the interface goes down,
but when it comes back up with a different IP.

Stefan
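The policy Stefan asks for (do nothing when the interface goes down, and only treat connections as dead if it comes back with a different address) can be sketched as a small state tracker. This is a userspace illustration of the decision logic only: `AddressTracker`, `stale_connections`, and the tuple layout for a connection are hypothetical, and nothing here actually tears down a socket.

```python
class AddressTracker:
    """Remember the last address seen on each interface."""

    def __init__(self):
        self._last = {}

    def interface_up(self, ifname, new_addr):
        """Record an 'up' event.

        Returns the previous address if it differs from new_addr
        (connections bound to it are now stale), otherwise None.
        Going down is deliberately not tracked, so a quick down/up
        with the same address leaves connections alone.
        """
        old = self._last.get(ifname)
        self._last[ifname] = new_addr
        if old is not None and old != new_addr:
            return old
        return None


def stale_connections(connections, stale_addr):
    """Select (local_addr, local_port, remote_addr, remote_port) tuples
    whose local address is the one that disappeared."""
    return [conn for conn in connections if conn[0] == stale_addr]


tracker = AddressTracker()
tracker.interface_up("wlan0", "192.168.1.10")          # first association
print(tracker.interface_up("wlan0", "192.168.1.10"))   # same AP: nothing stale
print(tracker.interface_up("wlan0", "10.0.0.5"))       # new AP: old address is stale
```

A real implementation would need a way to learn about address changes and to reset the affected sockets; both are outside this sketch.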
[PATCH] ia64 vDSO: linker script indentation
This cleans up the formatting in the vDSO linker script, mostly just the use of whitespace. It's intended to approximate the kernel standard conventions for indenting C, treating elements of the linker script about like initialized variable definitions. Signed-off-by: Roland McGrath <[EMAIL PROTECTED]> CC: Sam Ravnborg <[EMAIL PROTECTED]> --- arch/ia64/kernel/gate.lds.S | 135 +++ 1 files changed, 72 insertions(+), 63 deletions(-) diff --git a/arch/ia64/kernel/gate.lds.S b/arch/ia64/kernel/gate.lds.S index 6d19833..44817d9 100644 --- a/arch/ia64/kernel/gate.lds.S +++ b/arch/ia64/kernel/gate.lds.S @@ -1,7 +1,8 @@ /* - * Linker script for gate DSO. The gate pages are an ELF shared object prelinked to its - * virtual address, with only one read-only segment and one execute-only segment (both fit - * in one page). This script controls its layout. + * Linker script for gate DSO. The gate pages are an ELF shared object + * prelinked to its virtual address, with only one read-only segment and + * one execute-only segment (both fit in one page). This script controls + * its layout. */ @@ -9,72 +10,80 @@ SECTIONS { - . = GATE_ADDR + SIZEOF_HEADERS; - - .hash: { *(.hash) } :readable - .gnu.hash: { *(.gnu.hash) } - .dynsym : { *(.dynsym) } - .dynstr : { *(.dynstr) } - .gnu.version : { *(.gnu.version) } - .gnu.version_d : { *(.gnu.version_d) } - .gnu.version_r : { *(.gnu.version_r) } - .dynamic : { *(.dynamic) } :readable :dynamic - - /* - * This linker script is used both with -r and with -shared. For the layouts to match, - * we need to skip more than enough space for the dynamic symbol table et al. If this - * amount is insufficient, ld -shared will barf. Just increase it here. - */ - . 
= GATE_ADDR + 0x500; - - .data.patch : { - __start_gate_mckinley_e9_patchlist = .; - *(.data.patch.mckinley_e9) - __end_gate_mckinley_e9_patchlist = .; - - __start_gate_vtop_patchlist = .; - *(.data.patch.vtop) - __end_gate_vtop_patchlist = .; - - __start_gate_fsyscall_patchlist = .; - *(.data.patch.fsyscall_table) - __end_gate_fsyscall_patchlist = .; - - __start_gate_brl_fsys_bubble_down_patchlist = .; - *(.data.patch.brl_fsys_bubble_down) - __end_gate_brl_fsys_bubble_down_patchlist = .; - } :readable - .IA_64.unwind_info : { *(.IA_64.unwind_info*) } - .IA_64.unwind: { *(.IA_64.unwind*) } :readable :unwind + . = GATE_ADDR + SIZEOF_HEADERS; + + .hash : { *(.hash) } :readable + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version: { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + + .dynamic: { *(.dynamic) } :readable :dynamic + + /* +* This linker script is used both with -r and with -shared. For +* the layouts to match, we need to skip more than enough space for +* the dynamic symbol table et al. If this amount is insufficient, +* ld -shared will barf. Just increase it here. +*/ + . = GATE_ADDR + 0x500; + + .data.patch : { + __start_gate_mckinley_e9_patchlist = .; + *(.data.patch.mckinley_e9) + __end_gate_mckinley_e9_patchlist = .; + + __start_gate_vtop_patchlist = .; + *(.data.patch.vtop) + __end_gate_vtop_patchlist = .; + + __start_gate_fsyscall_patchlist = .; + *(.data.patch.fsyscall_table) + __end_gate_fsyscall_patchlist = .; + + __start_gate_brl_fsys_bubble_down_patchlist = .; + *(.data.patch.brl_fsys_bubble_down) + __end_gate_brl_fsys_bubble_down_patchlist = .; + } :readable + + .IA_64.unwind_info : { *(.IA_64.unwind_info*) } + .IA_64.unwind : { *(.IA_64.unwind*) } :readable :unwind #ifdef HAVE_BUGGY_SEGREL - .text (GATE_ADDR + PAGE_SIZE): { *(.text) *(.text.*) } :readable +
Re: [PATCH] Map volume and brightness events on thinkpads
On Monday, October 15, 2007 2:07 pm Henrique de Moraes Holschuh wrote:
> We should fix the backlight class to be more useful and support
> poll() or somesuch, for userspace to track the backlight level in a
> resource-friendly way for OSD (the only sane thing to do on an IBM
> thinkpad with such events). And an ALSA mixer to provide a proper
> path to the thinkpad-acpi volume functionality is also in my schedule
> for 2.6.25.
>
> As for Lenovo thinkpads, brightness control is to be processed by the
> ACPI video module, so brightness hot keys are not to be reported by
> default there either. I am not so sure about the volume keys, but
> your patch touches the IBM keymap *and* you provide no testing
> information for the various Lenovo models, so I have to NAK it as
> well until more information is available.

No, on Lenovo (and in general actually) the firmware should *not* touch the backlight. Otherwise if another driver touches it the driver and firmware will be out of sync, causing unexpected and undesirable behavior.

We intend to fix this for the Intel driver at least (requiring both ACPI video driver and gfx driver updates), others will probably follow eventually.

Jesse
Re: [PATCH resend] ramdisk: fix zeroed ramdisk pages on memory pressure
Nick Piggin <[EMAIL PROTECTED]> writes:

> On Monday 15 October 2007 19:16, Andrew Morton wrote:
>> On Tue, 16 Oct 2007 00:06:19 +1000 Nick Piggin <[EMAIL PROTECTED]> wrote:
>> > On Monday 15 October 2007 18:28, Christian Borntraeger wrote:
>> > > Andrew, this is a resend of a bugfix patch. Ramdisk seems a bit
>> > > unmaintained, so decided to send the patch to you :-).
>> > > I have CCed Ted, who did work on the code in the 90s. I found no
>> > > current email address of Chad Page.
>> >
>> > This really needs to be fixed...
>>
>> rd.c is fairly mind-boggling vfs abuse.
>
> Why do you say that? I guess it is _different_, by necessity(?)
> Is there anything that is really bad?

make_page_uptodate() is the most hideous part I have run into. It has to know details about other layers to know what not to stomp. I think my incorrect simplification of this is what messed things up last round.

> I guess it's not nice
> for operating on the pagecache from its request_fn, but the
> alternative is to duplicate pages for backing store and buffer
> cache (actually that might not be a bad alternative really).

Cool. Triple buffering :) Although I guess that would only apply to metadata these days. Having a separate store would solve some of the problems, and probably remove the need for carefully specifying the ramdisk block size. We would still need the magic restrictions on page allocations though, and we would use them more often, as the initial write to the ramdisk would not populate the pages we need.

A very ugly bit seems to be the fact that we assume we can dereference bh->b_data without any special magic, which means the ramdisk must live in low memory on 32bit machines.

Eric
Re: What still uses the block layer?
On Mon, 15 Oct 2007, Stefan Richter wrote: Subject: Re: What still uses the block layer? Matthew Wilcox wrote: On Mon, Oct 15, 2007 at 04:26:04AM -0500, Rob Landley wrote: Combining USB and IDE into the same /dev/sd? namespace makes enumerating the IDE devices much harder than in the traditional "/dev/hdb doesn't move without a screwdriver" model. The merger creates a new problem for IDE, one which didn't exist before: the addition or removal of other unrelated types of devices may change this device's location next boot. It may be possible to add additional complication to the system to compensate, but what was the advantage of merging the namespaces in the first place? It's not something anyone particularly set out to do, it's just how it worked out. It was justified by saying "ok, this goes from a 99% solution to a 96% solution, but there's 100% solution called uuids". I don't particularly agree with this line of argumentation, but it did hold sway. Low-level networking drivers suggest a default interface name (per interface or as a template like eth%d into which the networking core inserts a lowest spare number). Userspace can rename interfaces, but nevertheless it's nice to have different default kernel names for ethernet, wlan etc.. Could low-level SCSI drivers provide similar name templates which give a hint on the transport involved? It's a bit more difficult as with networking interfaces though because - SCSI devices can have sd, sr, st, osst, ch, sg interfaces, - SCSI device files share a namespace with all other device files. E.g. /dev/sd-ide-b - second IDE HDD, /dev/sd-iscsi-e - fifth iSCSI direct access device, /dev/sr-sata-0 - first SATA CD-ROM, /dev/sr-usb-0 - a USB CD-ROM, /dev/st-fw-0- a FireWire tape drive, /dev/sda- a device whose transport driver didn't propose a name Of course the really interesting names will still be provided by udev-generated symlinks. 
this is a nice option, and since most of the existing userspace code is looking for /dev/sd*, /dev/sr*, etc. this should be able to work for new installs with no userspace changes. Since it would break existing installs it would need to be optional.

one other option that could be considered (and I do realize I'm bringing up flame-bait here) is that drivers that have fixed addresses could offer up a device name that includes that address. i.e. depending on the config option a device could show up as either sda, sd-scsi-a, sd-scsi-0:0:0:0, or even sd-scsi-

if the driver or bus doesn't have a real numbering, it wouldn't invent a fake one (which is a big problem with most of the prior suggestions that have tried to offer a numbering option), it would just offer the most specific information it has.

David Lang
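The "template plus lowest spare number" idea discussed in this thread can be sketched in a few lines of C. This is only an illustration under stated assumptions: `struct name_table`, `name_in_use()` and `alloc_name()` are hypothetical names invented for the sketch, not the networking core's or the SCSI layer's actual code, and the template strings such as `sr-usb-%d` are taken from the examples in the thread rather than from any driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_UNITS 32
#define NAME_LEN  32

/* Hypothetical registry of device names that are already taken. */
struct name_table {
	char names[MAX_UNITS][NAME_LEN];
	int count;
};

static bool name_in_use(const struct name_table *t, const char *name)
{
	for (int i = 0; i < t->count; i++)
		if (strcmp(t->names[i], name) == 0)
			return true;
	return false;
}

/*
 * Expand a driver-suggested template such as "eth%d" with the lowest
 * number that is not yet in use, record the result, and return that
 * number (or -1 if the table is full).
 */
static int alloc_name(struct name_table *t, const char *tmpl,
		      char *out, size_t outlen)
{
	if (t->count >= MAX_UNITS)
		return -1;

	for (int n = 0; n < MAX_UNITS; n++) {
		snprintf(out, outlen, tmpl, n);
		if (!name_in_use(t, out)) {
			snprintf(t->names[t->count++], NAME_LEN, "%s", out);
			return n;
		}
	}
	return -1;
}
```

Because each template gets its own sequence, a second Ethernet device becomes `eth1` while the first USB CD-ROM still gets `sr-usb-0`; adding a device of one type does not shift the numbering of another type.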
Re: What still uses the block layer?
On Mon, 15 Oct 2007, Greg KH wrote: On Mon, Oct 15, 2007 at 05:08:36AM -0500, Rob Landley wrote: On Monday 15 October 2007 4:06:20 am Julian Calaby wrote: On 10/15/07, Rob Landley <[EMAIL PROTECTED]> wrote:

I note that the eth0 and eth1 names are dynamically assigned on a first come first serve basis (like scsi). This never causes me a problem because the driver loading order is constant, and once you figure out that eth0 is gigabit and eth1 is the 80211g it _stays_ that way across reboots, reliably. Yeah, it's a heuristic. Hands up everybody relying on such a heuristic in the real world.

Umm, not quite, from my experiences with pre-production wireless drivers, (another story, another time) fancy stuff is being done in udev to make sure that your gigabit card is always assigned to eth0.

I remember building a 2.4 kernel, statically linking in all the drivers, and getting the ethernet devices showing up in a reliable order for years. Where does the need for fancy stuff come in?

Because PCI devices reorder their bus numbers all the time. And we have ethernet devices hanging off of USB connections now (yes, even built-in to the machine), and we have network connections on other hot-pluggable busses (remember, PCI is hot pluggable.)

do PCI devices reorder their bus numbers spontaneously, or only if you change the hardware?

So, the distros need to name network devices in a persistent way, that is why the distros now do this. If you don't like the distro doing it, complain to them, it's not a kernel issue :)

I have, at least the response was to tell me how to kill this 'feature' even if they won't change it.

David Lang
Re: What still uses the block layer?
On Mon, 15 Oct 2007, Theodore Tso wrote: On Mon, Oct 15, 2007 at 03:04:00AM -0500, Rob Landley wrote:

just as Ethernet and PPP interfaces really are fundamentally the same thing.

They're the same thing? Do you mean that on a system with both, going:

    ifconfig eth1 66.92.53.140
    ifconfig ppp 192.168.0.42

Would be functionally equivalent to:

    ifconfig eth1 192.168.0.42
    ifconfig ppp 66.92.53.140

No, of course not. But we don't have separate IP stacks for ethernet and ppp devices. And how we connect to a host via ssh makes no difference whether we accessed it via Ethernet or PPP. And I would argue that how we address a filesystem should also make no difference depending on the path to hard drive.

I think a close analogy would be that after a partition is mounted you don't need to know the path to the hard drive, and that is already true today. when you mount a drive (or assign an IP address to a network interface) the path to the device not only matters, it's critical.

By the way, ethernet cards contain a unique MAC address. Hard drives do not seem to, or if they do it's not being consistently exposed in a way I can find. You can pull a Model and Serial number via hdparm -i, but it's not as easy to manipulate as a fixed-length MAC address.

That's why people tend to use filesystem UUID's.

More to the point, with SATA, hot plugging has been designed in, so probing order is not going to be well defined,

The spec may define the capability to hotplug, but your average laptop does not offer the capability to hotplug anything into its SATA controllers. The hard drive is screwed in (due to the portability part of laptopness), all the controllers wired onto the motherboard are accounted for, none are exposed externally. What _is_ exposed externally is USB, and if you want to add an extra hard drive you can buy a cheap USB one at Fry's.

That may be true for laptops today, but Linux doesn't run just on servers. You can easily get home servers with hot-swap SATA bays.
My home fileserver, which is a white box I purchased on my own nickel, NOT IBM big iron, has 3TB of raw storage for less than $10,000 a year ago. Today, that amount of home storage with hot-swap SATA drives and a battery-backed hardware RAID controller could probably be purchased for about half that price.

I also have a 3TB raid I built at home, it uses 3ware cards and a dozen 300G IDE drives. since the 3ware driver is classified as SCSI, if a drive fails all the other drives get renumbered on the next boot and it's painful to figure out which drive has a problem. I have to reboot and go into the 3ware BIOS to figure out which drive isn't reporting.

This system also has an adaptec raid card in it and an adaptec regular SCSI card. The fact that these three cards take different drivers, and so the order of detection changes the drive numbering, is a real pain when I'm installing a new distro onto it. once I get it installed I compile my own monolithic kernel and this problem stops because the kernel linking order determines the detection order.

this replaced a 1.2TB raid that I just about filled up, and then started having drive failures on due to age. It used 8 160G IDE drives, and when I had problems with a drive it was easy to see that /dev/hdk was missing from the set, and I was still able to have a removable drive bay for /dev/hdc that I could hook my tivo drive into (on a reboot for safety) and not have things go haywire if I left the bay empty (or switched off) when I booted.

this may not be hundreds of drives, but it should be enough to show that I have experienced the pain that some people claim is the reason all of this must be dynamic with a userspace helper to sort it all out. My take is that adding the userspace helper and not enumerating things that are easy to enumerate is making things worse, not better.

And even for laptops, if you need the performance, you can get Cardbus cards that will allow you to connect eSATA drives to your laptop at Fry's.
So even if you ignore "big data center" interconnects like FC, this problem exists even for commodity grade SATA devices.

but these are separate SATA buses. while you could run into ordering issues if you hook multiple devices to one bus, you should be able to have no ordering issues if you don't have more than one device of a type on any one bus (you could have a SATA hard drive on the internal PCI controller, and another one on the Cardbus controller, but if you always order directly connected devices before cardbus connected devices they will always show up in the same order)

It's necessary for IBM big iron to do this. It's generally not necessary for laptops or embedded systems to do this if they distinguish between _types_ of devices, which is something they until recently did for the types of devices I was interested in, and something they _stopped_ doing when everything got merged into the scsi layer, and I
Re: More Large blocksize benchmarks
On Mon, Oct 15, 2007 at 08:22:31PM -0400, Chris Mason wrote:
> Hello everyone,
>
> I'm stealing the cc list and reviving an old thread because I've
> finally got some numbers to go along with the Btrfs variable blocksize
> feature. The basic idea is to create a read/write interface to
> map a range of bytes on the address space, and use it in Btrfs for all
> metadata operations (file operations have always been extent based).
>
> So, instead of casting buffer_head->b_data to some structure, I read and
> write at offsets in a struct extent_buffer. The extent buffer is very
> small and backed by an address space, and I get large block sizes the
> same way file_write gets to write to 16k at a time, by finding the
> appropriate page in the address space. This is an over simplification
> since I try to cache these mapping decisions to avoid using too much
> CPU, but hopefully you get the idea.
>
> The advantage to this approach is the changes are all inside Btrfs. No
> extra kernel patches were required.
>
> Dave reported that XFS saw much higher write throughput with large
> blocksizes, but so far I'm seeing the most benefits during reads.

Apples to oranges, Chris ;)

btrfs linearises writes due to its COW behaviour, and this trades off read speed, i.e. we take more seeks to read data so we can keep the write speed high. By using large blocks, you're reducing the number of seeks needed to find anything, and hence the read speed will increase. Write speed will be pretty much unchanged because btrfs does linear writes no matter the block size.

XFS doesn't linearise writes and optimises its layout for a large number of disks and a low number of seeks on reads - the opposite of btrfs. Hence large block sizes reduce the number of writes XFS needs to write a given set of data+metadata, and hence write speed increases much more than the read speed (until you get to large tree traversals).

The basic conclusion is that different filesystems will benefit in different ways with large block sizes.

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
Re: [RFC] cpuset update_cgroup_cpus_allowed
> currently against an older kernel

ah .. which older kernel?

I tried it against the broken out 2.6.23-rc8-mm2 patch set, inserting it before the task-containersv11-* patches, but that blew up on me - three rejected hunks.

Any chance of getting this against a current cgroup (aka container) kernel?

Could you use the diff --show-c-function option when composing patches - they're easier to read that way - thanks.

+	if (!retval) {
+		cpus_allowed = cpuset_cpus_allowed(p);
+		if (!cpus_subset(new_mask, cpus_allowed)) {
+			/*
+			 * We must have raced with a concurrent cpuset
+			 * update. Just reset the cpus_allowed to the
+			 * cpuset's cpus_allowed
+			 */
+			new_mask = cpus_allowed;

This narrows the race, perhaps sufficiently, but I don't see that it guarantees closure. Memory accesses to two different locations are not guaranteed to be ordered across nodes, as best I recall. The second line above, that rereads the cpuset cpus_allowed, could get an old value, in essence.

    cpuset update task          sched_setaffinity task
    ------------------          ----------------------
    A. write cpuset [Q]         V. read cpuset [Q]
    B. read task [P]            W. check ok
    C. write task [P]           X. write task [P]
                                Y. reread cpuset [Q]
                                Z. check ok again

Two memory locations:

    [P] the cpus_allowed mask in the task_struct of the task doing the sched_setaffinity call.
    [Q] the cpus_allowed mask in the cpuset of the cpuset to which the sched_setaffinity task is attached.

Even though, from the perspective of location [P], both B. and C. happened before X., still from the perspective of location [Q] the rereading in Y. could return the value the cpuset cpus_allowed had before the write in A. This could result in a task running with a cpus_allowed that was totally outside its cpuset's cpus_allowed.

I will grant that this is a narrow window. I won't lose much sleep over it.

> - uses a priority heap to pick the processes to act on, based on start time

This adds a fair bit of code and complexity, relative to my patch. This I do lose more sleep over. There has to be a compelling reason for doing this.

The point that David raises, regarding the interaction of this with hotplug, seems to be a compelling reason for doing -something- different than my patch proposal. I don't know yet if it compels us to this much code, however.

Any chance you could provide a patch that works against cgroups?

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401
Re: [RFC] cpuset update_cgroup_cpus_allowed
> Yet by not doing any locking here to prevent a cpu from being
> hot-unplugged, you can race and allow the hot-unplug event to happen
> before calling set_cpus_allowed(). That makes this entire function a
> no-op with set_cpus_allowed() returning -EINVAL for every call, which
> isn't caught, and no error is reported to userspace.

Good point ... hmmm ...

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401
[RFC] [PATCH 2/2] capabilities: implement 64-bit capabilities
>From 7dd503c612afcb86b3165602ab264e2e9493b4bf Mon Sep 17 00:00:00 2001 From: Serge E. Hallyn <[EMAIL PROTECTED]> Date: Mon, 15 Oct 2007 20:57:52 -0400 Subject: [RFC] [PATCH 2/2] capabilities: implement 64-bit capabilities We are out of capabilities in the 32-bit capability fields, and several users could make use of additional capabilities. Convert the capabilities to 64-bits, change the capability version number accordingly, and convert the file capability code to handle both 32-bit and 64-bit file capability xattrs. It might seem desirable to also implement back-compatibility to read 32-bit caps from userspace, but that becomes problematic with capget, as capget could return valid info for processes not using high bits, but would have to return -EINVAL for those which were. So with this patch, libcap would need to be updated to make use of capset/capget. Signed-off-by: Serge E. Hallyn <[EMAIL PROTECTED]> --- fs/proc/array.c|6 +++--- include/linux/capability.h | 29 - security/commoncap.c | 37 + 3 files changed, 52 insertions(+), 20 deletions(-) diff --git a/fs/proc/array.c b/fs/proc/array.c index 3f4d824..c8ea46d 100644 --- a/fs/proc/array.c +++ b/fs/proc/array.c @@ -288,9 +288,9 @@ static inline char *task_sig(struct task_struct *p, char *buffer) static inline char *task_cap(struct task_struct *p, char *buffer) { -return buffer + sprintf(buffer, "CapInh:\t%016x\n" - "CapPrm:\t%016x\n" - "CapEff:\t%016x\n", +return buffer + sprintf(buffer, "CapInh:\t%016lx\n" + "CapPrm:\t%016lx\n" + "CapEff:\t%016lx\n", cap_t(p->cap_inheritable), cap_t(p->cap_permitted), cap_t(p->cap_effective)); diff --git a/include/linux/capability.h b/include/linux/capability.h index bb017ed..a3da4b9 100644 --- a/include/linux/capability.h +++ b/include/linux/capability.h @@ -29,7 +29,7 @@ struct task_struct; library since the draft standard requires the use of malloc/free etc.. 
*/ -#define _LINUX_CAPABILITY_VERSION 0x19980330 +#define _LINUX_CAPABILITY_VERSION 0x20071015 typedef struct __user_cap_header_struct { __u32 version; @@ -37,29 +37,40 @@ typedef struct __user_cap_header_struct { } __user *cap_user_header_t; typedef struct __user_cap_data_struct { -__u32 effective; -__u32 permitted; -__u32 inheritable; +__u64 effective; +__u64 permitted; +__u64 inheritable; } __user *cap_user_data_t; #define XATTR_CAPS_SUFFIX "capability" #define XATTR_NAME_CAPS XATTR_SECURITY_PREFIX XATTR_CAPS_SUFFIX -#define XATTR_CAPS_SZ (3*sizeof(__le32)) +#define XATTR_CAPS_SZ_1 (3*sizeof(__le32)) +#define XATTR_CAPS_SZ_2 (2*sizeof(__le64) + sizeof(__le32)) #define VFS_CAP_REVISION_MASK 0xFF00 #define VFS_CAP_REVISION_1 0x0100 +#define VFS_CAP_REVISION_2 0x0200 -#define VFS_CAP_REVISION VFS_CAP_REVISION_1 +#define VFS_CAP_REVISION VFS_CAP_REVISION_2 +#define XATTR_CAPS_SZ XATTR_CAPS_SZ_2 #define VFS_CAP_FLAGS_MASK ~VFS_CAP_REVISION_MASK #define VFS_CAP_FLAGS_EFFECTIVE0x01 -struct vfs_cap_data { +struct vfs_cap_data_v1 { __u32 magic_etc; /* Little endian */ __u32 permitted;/* Little endian */ __u32 inheritable; /* Little endian */ }; +struct vfs_cap_data_v2 { + __u32 magic_etc; /* Little endian */ + __u64 permitted;/* Little endian */ + __u64 inheritable; /* Little endian */ +}; + +typedef struct vfs_cap_data_v2 vfs_cap_data; + #ifdef __KERNEL__ /* #define STRICT_CAP_T_TYPECHECKS */ @@ -67,12 +78,12 @@ struct vfs_cap_data { #ifdef STRICT_CAP_T_TYPECHECKS typedef struct kernel_cap_struct { - __u32 cap; + __u64 cap; } kernel_cap_t; #else -typedef __u32 kernel_cap_t; +typedef __u64 kernel_cap_t; #endif diff --git a/security/commoncap.c b/security/commoncap.c index 542bbe9..2cca843 100644 --- a/security/commoncap.c +++ b/security/commoncap.c @@ -190,25 +190,46 @@ int cap_inode_killpriv(struct dentry *dentry) return inode->i_op->removexattr(dentry, XATTR_NAME_CAPS); } -static inline int cap_from_disk(struct vfs_cap_data *caps, +union vfs_cap_union { + struct 
vfs_cap_data_v1 v1; + struct vfs_cap_data_v2 v2; +}; + +static inline int cap_from_disk(union vfs_cap_union *caps, struct linux_binprm *bprm, int size) { __u32 magic_etc; - if (size != XATTR_CAPS_SZ) + if (size != XATTR_CAPS_SZ_1 && size != XATTR_CAPS_SZ_2) return -EINVAL; - magic_etc = le32_to_cpu(caps->magic_etc); + magic_etc = le32_to_cpu(caps->v1.magic_etc); switch ((magic_etc & VFS_CAP_REVISION_MASK)) { - case
[PATCH 1/2 -mm] capabilities: clean up file capability reading
This patch is a simple cleanup which should probably be applied to -mm (assuming I haven't messed it up). The next patch is an experimental patch which will require userspace support and is just RFC at this point. >From 9fc0782de6e1287aaeebe8ad653b008f09b22c11 Mon Sep 17 00:00:00 2001 From: Serge E. Hallyn <[EMAIL PROTECTED]> Date: Mon, 15 Oct 2007 17:33:24 -0400 Subject: [PATCH 1/2] capabilities: clean up file capability reading Simplify the vfs_cap_data structure. Also fix get_file_caps which was declaring __le32 v1caps[XATTR_CAPS_SZ] on the stack, but XATTR_CAPS_SZ is already * sizeof(__le32). Signed-off-by: Serge E. Hallyn <[EMAIL PROTECTED]> --- include/linux/capability.h |6 ++ security/commoncap.c | 23 +++ 2 files changed, 17 insertions(+), 12 deletions(-) diff --git a/include/linux/capability.h b/include/linux/capability.h index 7a8d7ad..bb017ed 100644 --- a/include/linux/capability.h +++ b/include/linux/capability.h @@ -56,10 +56,8 @@ typedef struct __user_cap_data_struct { struct vfs_cap_data { __u32 magic_etc; /* Little endian */ - struct { - __u32 permitted;/* Little endian */ - __u32 inheritable; /* Little endian */ - } data[1]; + __u32 permitted;/* Little endian */ + __u32 inheritable; /* Little endian */ }; #ifdef __KERNEL__ diff --git a/security/commoncap.c b/security/commoncap.c index 43f9027..542bbe9 100644 --- a/security/commoncap.c +++ b/security/commoncap.c @@ -190,7 +190,8 @@ int cap_inode_killpriv(struct dentry *dentry) return inode->i_op->removexattr(dentry, XATTR_NAME_CAPS); } -static inline int cap_from_disk(__le32 *caps, struct linux_binprm *bprm, +static inline int cap_from_disk(struct vfs_cap_data *caps, + struct linux_binprm *bprm, int size) { __u32 magic_etc; @@ -198,7 +199,7 @@ static inline int cap_from_disk(__le32 *caps, struct linux_binprm *bprm, if (size != XATTR_CAPS_SZ) return -EINVAL; - magic_etc = le32_to_cpu(caps[0]); + magic_etc = le32_to_cpu(caps->magic_etc); switch ((magic_etc & VFS_CAP_REVISION_MASK)) { case 
VFS_CAP_REVISION: @@ -206,8 +207,8 @@ static inline int cap_from_disk(__le32 *caps, struct linux_binprm *bprm, bprm->cap_effective = true; else bprm->cap_effective = false; - bprm->cap_permitted = to_cap_t( le32_to_cpu(caps[1]) ); - bprm->cap_inheritable = to_cap_t( le32_to_cpu(caps[2]) ); + bprm->cap_permitted = to_cap_t( le32_to_cpu(caps->permitted) ); + bprm->cap_inheritable = to_cap_t( le32_to_cpu(caps->inheritable) ); return 0; default: return -EINVAL; @@ -219,7 +220,7 @@ static int get_file_caps(struct linux_binprm *bprm) { struct dentry *dentry; int rc = 0; - __le32 v1caps[XATTR_CAPS_SZ]; + struct vfs_cap_data incaps; struct inode *inode; if (bprm->file->f_vfsmnt->mnt_flags & MNT_NOSUID) { @@ -232,8 +233,14 @@ static int get_file_caps(struct linux_binprm *bprm) if (!inode->i_op || !inode->i_op->getxattr) goto out; - rc = inode->i_op->getxattr(dentry, XATTR_NAME_CAPS, , - XATTR_CAPS_SZ); + rc = inode->i_op->getxattr(dentry, XATTR_NAME_CAPS, NULL, 0); + if (rc > 0) { + if (rc == XATTR_CAPS_SZ) + rc = inode->i_op->getxattr(dentry, XATTR_NAME_CAPS, + , XATTR_CAPS_SZ); + else + rc = -EINVAL; + } if (rc == -ENODATA || rc == -EOPNOTSUPP) { /* no data, that's ok */ rc = 0; @@ -242,7 +249,7 @@ static int get_file_caps(struct linux_binprm *bprm) if (rc < 0) goto out; - rc = cap_from_disk(v1caps, bprm, rc); + rc = cap_from_disk(, bprm, rc); if (rc) printk(KERN_NOTICE "%s: cap_from_disk returned %d for %s\n", __FUNCTION__, rc, bprm->filename); -- 1.5.1.1.GIT - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
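The sizing bug mentioned in the changelog of patch 1/2 is easy to see in a small userspace sketch. The types below are stand-ins written for this illustration (`le32_t` replaces the kernel's `__le32`, and the two helper functions are hypothetical); only the `XATTR_CAPS_SZ` definition and the simplified `struct vfs_cap_data` follow the patch text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t le32_t;	/* stand-in for the kernel's __le32 */

/* A byte count, as defined in include/linux/capability.h. */
#define XATTR_CAPS_SZ (3 * sizeof(le32_t))

/* The simplified layout introduced by patch 1/2. */
struct vfs_cap_data {
	uint32_t magic_etc;	/* Little endian */
	uint32_t permitted;	/* Little endian */
	uint32_t inheritable;	/* Little endian */
};

/*
 * Old declaration: XATTR_CAPS_SZ is used as an element count, so the
 * array holds 12 elements of 4 bytes each.
 */
static inline size_t old_buffer_bytes(void)
{
	le32_t v1caps[XATTR_CAPS_SZ];

	return sizeof(v1caps);
}

/* New declaration: one structure that is exactly as large as the xattr. */
static inline size_t new_buffer_bytes(void)
{
	struct vfs_cap_data incaps;

	return sizeof(incaps);
}
```

With 4-byte elements the old array occupies 48 bytes of stack for a 12-byte attribute, while the structure matches `XATTR_CAPS_SZ` exactly.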
Re: [PATCH 2/11] maps3: introduce task_size_of for all arches
On Mon, 15 Oct 2007, Dave Hansen wrote:

> diff -puN include/asm-mips/processor.h~PATCH_2_11_maps3-_introduce_task_size_of_for_all_arches include/asm-mips/processor.h
> --- lxc/include/asm-mips/processor.h~PATCH_2_11_maps3-_introduce_task_size_of_for_all_arches 2007-10-15 17:29:22.0 -0700
> +++ lxc-dave/include/asm-mips/processor.h 2007-10-15 17:34:12.0 -0700
> @@ -45,6 +45,8 @@ extern unsigned int vced_count, vcei_cou
>   * space during mmap's.
>   */
>  #define TASK_UNMAPPED_BASE (PAGE_ALIGN(TASK_SIZE / 3))
> +#define TASK_SIZE_OF(tsk)\
> + (test_tsk_thread_flag(tak, TIF_32BIT_ADDR) ? TASK_SIZE32 : TASK_SIZE)
>  #endif
>
>  #ifdef CONFIG_64BIT

tak needs to be tsk.

> @@ -65,6 +67,8 @@ extern unsigned int vced_count, vcei_cou
>  #define TASK_UNMAPPED_BASE \
>  (test_thread_flag(TIF_32BIT_ADDR) ? \
>  PAGE_ALIGN(TASK_SIZE32 / 3) : PAGE_ALIGN(TASK_SIZE / 3))
> +#define TASK_SIZE_OF(tsk)\
> + (test_tsk_thread_flag(TIF_32BIT_ADDR) ? TASK_SIZE32 : TASK_SIZE)
>  #endif
>
>  #define NUM_FPU_REGS 32

test_tsk_thread_flag() takes two arguments.
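Both review comments lead to the same form of the macro. The sketch below shows that form in a userspace harness; everything except the `TASK_SIZE_OF` definition is an assumption made so the example compiles on its own: `struct task_struct`, `test_tsk_thread_flag()`, the flag number and the two size constants are simplified stand-ins, not the MIPS kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder values, invented for this sketch. */
#define TIF_32BIT_ADDR	0
#define TASK_SIZE32	0x7fff8000ULL
#define TASK_SIZE	0x10000000000ULL

/* Minimal stand-in: the real task_struct keeps flags in thread_info. */
struct task_struct {
	unsigned long flags;
};

/* Stand-in with the same two-argument shape as the kernel helper. */
static inline bool test_tsk_thread_flag(const struct task_struct *tsk, int flag)
{
	assert(tsk);
	return (tsk->flags >> flag) & 1UL;
}

/* Corrected macro: uses the tsk parameter and passes both arguments. */
#define TASK_SIZE_OF(tsk) \
	(test_tsk_thread_flag((tsk), TIF_32BIT_ADDR) ? TASK_SIZE32 : TASK_SIZE)
```

The same definition would presumably serve in both hunks, since each only needs to test the flag on the task that was passed in rather than on the current thread.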
Re: [PATCH 1/11] maps3: add proportional set size accounting in smaps
On Mon, 15 Oct 2007, Matt Mackall wrote:

> > The pss is going to need accessor functions, preferably inlined, and the
> > comment adjusted stating that all accesses should be through those
> > functions and not directly to the mem_size_stats struct.
> >
> > static inline u64 pss_up(unsigned long pss)
> > {
> > 	return pss << PSS_DIV_BITS;
> > }
> >
> > static inline unsigned long pss_down(u64 pss)
> > {
> > 	return pss >> PSS_DIV_BITS;
> > }
>
> I think that's overkill for something that has exactly one use of each.
>

There's no overkill at all, the current uses are already accessed with these bitshifts so there's no overhead when using an inlined function instead. To correctly access the pss, these bitshifts are required because the decision was made to use the lower PSS_DIV_BITS for rounding. Thus, you need to include accessor functions so that they are always accessed correctly now and in the future.

David
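To make the rounding argument concrete, here is a minimal userspace sketch of the two accessors quoted above. Two things are assumptions: `PSS_DIV_BITS` is set to 12 purely for illustration, and `pss_add_share()` is a hypothetical helper showing how a caller might accumulate a page's proportional share; neither is taken from the patch. The cast in `pss_up()` is added so the shift is done in 64 bits even where `unsigned long` is 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed number of fractional bits; the patch defines the real value. */
#define PSS_DIV_BITS 12

/* Scale a byte count up into the fixed-point form stored in the stats. */
static inline uint64_t pss_up(unsigned long pss)
{
	return (uint64_t)pss << PSS_DIV_BITS;
}

/* Scale the stored fixed-point value back down to whole bytes. */
static inline unsigned long pss_down(uint64_t pss)
{
	return (unsigned long)(pss >> PSS_DIV_BITS);
}

/*
 * Hypothetical helper: add one page's share to the running total when
 * the page is mapped by `mapcount` processes.
 */
static inline uint64_t pss_add_share(uint64_t pss, unsigned long page_size,
				     unsigned int mapcount)
{
	assert(mapcount > 0);
	return pss + pss_up(page_size) / mapcount;
}
```

Summing 1000 pages of 4096 bytes, each shared three ways, gives 1365333 bytes through the accessors, against 1365000 if each share were truncated to whole bytes before summing. That difference is what the low `PSS_DIV_BITS` bits buy, and it is lost if any caller forgets one of the shifts.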
nmi_watchdog on x86_64
I just found that on the ck804- and mcp55-based AMD servers I have on hand, nmi_watchdog=1 doesn't work but nmi_watchdog=2 does.

With =1, it says: IOAPIC 8259A virtual wire mode...

Did nmi_watchdog=1 work on any other amd64 platform?

YH
Re: What still uses the block layer?
On Tue, 16 Oct 2007, Neil Brown wrote: On Monday October 15, [EMAIL PROTECTED] wrote: Therefore it is best to not have stable single-number naming schemes for any devices on any machines. Why? Because it ensure there will not be any second class citizens. This is where we disagree. The existence of devices you cannot stably enumerate does not eliminate the existence of devices you trivially can. No, but it dramatically reduces that value of being able to enumerate those devices. this is the point of disagreement. the devices you can trivially enumerate can be handled easily and trivially, the ones that you can't may require more complex things to handle them, but that depends on the situation. If you only have one USB drive on a system you don't need to worry about what order USB hotplug events come in if you can just say 'the first USB drive'. mixing the different types of devices into one namespace complicates things in a couple of ways. 1. devices that used to have stable names no longer have stable names without extra effort. 2. having multiple seperate unstable namespaces with one name in each of them looks to the user like a stable namespace, since the instability never comes into play. combineing these into a single namespace looses this stability Pulling out the "IBM numa cluster with multiple SAS enclosures _and_ firewire" infrastructure to find the root partition on my hard drive may be good for the IBM numa clusters, but only at the expense of complicating this part of my laptop's infrastructure by an order of magnitude, and making embedded systems nearly impossible to put together. If "one size fits all" were true, my cell phone would be running Red Hat Enterprise. If some devices that are even reasonably common (e.g. IDE drives) are stable, then some application developers or system integrators will work under the assumption of stability and whatever they build will break when you try it on different hardware. 
So you break the IDE drives to get laptop users to debug the Niagra set? The Breaking old behaviour is always bad... My computers with IDE interfaces still see stable "/dev/hda" devices. Are you saying the devices that used to be "hda" are now "sdb" ?? Maybe there is a .config option... yes, this changed. If you run your IDE drives with the PATA drivers of libata they show up as sdX, and are subject to the same detection order issues as any other sd device. solution is to make the easy cases hard? Is it really that hard? Note that stable names a still a very real option. udev provides several. /dev/disk-by-path/XXX will be stable for lots of "screwed in" devices. /dev/disk-by-id will be stable for devices the report a unique id. etc. Here it's ls /dev/disk/by-path/ pci-:00:1f.2-scsi-0:0:0:0pci-:00:1f.2-scsi-0:0:0:0-part4 pci-:00:1f.2-scsi-0:0:0:0-part1 pci-:00:1f.2-scsi-0:0:0:0-part5 pci-:00:1f.2-scsi-0:0:0:0-part2 pci-:00:1f.2-scsi-0:0:0:0-part6 pci-:00:1f.2-scsi-0:0:0:0-part3 pci-:00:1f.2-scsi-1:0:0:0 And this is an improvement? Depends on your metric. "Easy to type" - I guess /dev/hda1 wins hands down. "Can be used in a script or config file and is guaranteed always to work until a screwdriver is used to change that device or it's controller" I think /dev/disk/by-path/pci-:00:1f.2-scsi-0:0:0:0-part1 is quite acceptable. What is your metric? does it have to be one or the other? /dev/hda1 suceeded on both metrics. The different between IDE, SATA, SCSI and even USB is peripheral for the large majority of uses, and I think maintaining the distinction in the major/minor number or in the "primary" /dev name is - for the above reasons - more of a cost that a value. 
Is your definition of "the large majority of uses" where ncr Voyager, the Amiga, and current macintosh laptops are all one use each, or is your definition of "the large majority of uses" the one where each "use" is an installation, of which there are millions of PCs (and even more ARM cell phones), and something like three instances of Voyager? My definition of "the large majority or uses" is "mkfs, fsck, mount, fdisk, system-install-process". Different people differentiate devices in different ways. A system integrator might know about the hardware path. An end user might know about drive brands or sizes. A casual user might just think "internal or external". The kernel cannot support all these different approaches to naming. It really is best if it uses arbitrary names, and provides access to descriptions that the user can choose between. udev facilitates this with links in /dev/disk/. A system install can facilitate this even more by reporting size/manufacturer information etc. but is the possibility of wanting different options really sufficiant reason to eliminate every stable option? right now the /dev names are essentially random without external help. why couldn't they be stable
Re: [PATCH RFC 2/2] paravirt: clean up lazy mode handling
On 10/12/07, Jeremy Fitzhardinge <[EMAIL PROTECTED]> wrote: > [ Changes since last post: fixed up lguest ] > > Currently, the set_lazy_mode pv_op is overloaded with 5 functions: > 1. enter lazy cpu mode > 2. leave lazy cpu mode > 3. enter lazy mmu mode > 4. leave lazy mmu mode > 5. flush pending batched operations > > This complicates each paravirt backend, since it needs to deal with > all the possible state transitions, handling flushing, etc. In > particular, flushing is quite distinct from the other 4 functions, and > seems to just cause complication. > > This patch removes the set_lazy_mode operation, and adds "enter" and > "leave" lazy mode operations on mmu_ops and cpu_ops. All the logic > associated with enter and leaving lazy states is now in common code > (basically BUG_ONs to make sure that no mode is current when entering > a lazy mode, and make sure that the mode is current when leaving). > Also, flush is handled in a common way, by simply leaving and > re-entering the lazy mode. > > The result is that the Xen, lguest and VMI lazy mode implementations > are much simpler. 
> > Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]> > Cc: Andi Kleen <[EMAIL PROTECTED]> > Cc: Zach Amsden <[EMAIL PROTECTED]> > Cc: Rusty Russell <[EMAIL PROTECTED]> > Cc: Avi Kivity <[EMAIL PROTECTED]> > Cc: Anthony Liguory <[EMAIL PROTECTED]> > Cc: "Glauber de Oliveira Costa" <[EMAIL PROTECTED]> > Cc: "Nakajima, Jun" <[EMAIL PROTECTED]> > > --- > arch/i386/kernel/paravirt.c | 58 > +++ > arch/i386/kernel/vmi.c | 45 +++-- > arch/i386/xen/enlighten.c | 44 ++-- > arch/i386/xen/mmu.c |2 - > arch/i386/xen/multicalls.h |2 - > arch/i386/xen/xen-ops.h |7 - > drivers/lguest/lguest.c | 34 - > include/asm-i386/paravirt.h | 52 -- > 8 files changed, 140 insertions(+), 104 deletions(-) > > === > --- a/arch/i386/kernel/paravirt.c > +++ b/arch/i386/kernel/paravirt.c > @@ -266,6 +266,49 @@ int paravirt_disable_iospace(void) > } > > return ret; > +} > + > +static DEFINE_PER_CPU(enum paravirt_lazy_mode, paravirt_lazy_mode) = > PARAVIRT_LAZY_NONE; > + > +static inline void enter_lazy(enum paravirt_lazy_mode mode) > +{ > + BUG_ON(x86_read_percpu(paravirt_lazy_mode) != PARAVIRT_LAZY_NONE); > + BUG_ON(preemptible()); Wouldn't it be better to WARN_ON, and simply not entering lazy mode? It does not sound like a fatal condition. > +void paravirt_leave_lazy(enum paravirt_lazy_mode mode) > +{ > + BUG_ON(x86_read_percpu(paravirt_lazy_mode) != mode); > + BUG_ON(preemptible()); Although this one seems like a fatal condition ;-) > +void paravirt_enter_lazy_mmu(void) > +{ > + enter_lazy(PARAVIRT_LAZY_MMU); > +} > + > +void paravirt_leave_lazy_mmu(void) > +{ > + paravirt_leave_lazy(PARAVIRT_LAZY_MMU); > +} > + > +void paravirt_enter_lazy_cpu(void) > +{ > + enter_lazy(PARAVIRT_LAZY_CPU); > +} > + > +void paravirt_leave_lazy_cpu(void) > +{ > + paravirt_leave_lazy(PARAVIRT_LAZY_CPU); > +} > + > +enum paravirt_lazy_mode paravirt_get_lazy_mode(void) > +{ > + return x86_read_percpu(paravirt_lazy_mode); > } I am concerned that this is 32-bit specific. 
But hey: We could wrap it here, but the best solution may be just to define this macro for 64-bit, so that everyone benefits. So yeah, this is a concern here, but I don't think anything should be changed in this patch to address it so... so... ok ;-)

-- 
Glauber de Oliveira Costa.
"Free as in Freedom"
http://glommer.net

"The less confident you are, the more serious you have to act."
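The state handling described in the patch (enter, leave, and flush as leave plus re-enter) can be sketched in userspace roughly as follows. This is not the patch itself: a single global stands in for the per-cpu variable, assert() stands in for BUG_ON(), the preemptible() check is omitted, and backend_flushes is a hypothetical counter standing in for whatever a backend does when batched operations are pushed out.

```c
#include <assert.h>

enum paravirt_lazy_mode {
	PARAVIRT_LAZY_NONE,
	PARAVIRT_LAZY_MMU,
	PARAVIRT_LAZY_CPU,
};

/* A single global stands in for the per-cpu variable used in the patch. */
static enum paravirt_lazy_mode lazy_mode = PARAVIRT_LAZY_NONE;

/* Hypothetical: counts how often batched work was pushed to the backend. */
static int backend_flushes;

static void enter_lazy(enum paravirt_lazy_mode mode)
{
	/* No lazy mode may be active when entering one. */
	assert(lazy_mode == PARAVIRT_LAZY_NONE);
	lazy_mode = mode;
}

static void leave_lazy(enum paravirt_lazy_mode mode)
{
	/* Only the currently active mode may be left. */
	assert(lazy_mode == mode);
	backend_flushes++;	/* assumed: leaving flushes the batch */
	lazy_mode = PARAVIRT_LAZY_NONE;
}

/* Flush needs no dedicated op: leave the current mode and re-enter it. */
static void flush_lazy(void)
{
	enum paravirt_lazy_mode mode = lazy_mode;

	if (mode != PARAVIRT_LAZY_NONE) {
		leave_lazy(mode);
		enter_lazy(mode);
	}
}
```

Replacing the assert() in enter_lazy() with a warning and an early return would correspond to the WARN_ON suggestion made in the review.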
Re: [PATCH] git scsi misc include fix
James wrote:
> In that case, the correct fix
> is actually to move the scatterlist include from scsi_error.c (where the
> scatterlist was originally used locally) into scsi_eh.h, like this.

I suspect you're correct, yes.

-- 
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401
Re: [git pull] lockdep for v2.6.24
On Mon, 15 Oct 2007, Peter Zijlstra wrote:
>
> please pull the lockdep tree from:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-lockdep.git v2.6.24-lockdep

Hmm. I'm now getting

WARNING: at kernel/lockdep.c:700 look_up_lock_class()

Call Trace:
 [] __lock_acquire+0x15f/0xc92
 [] do_lookup+0x83/0x1b0
 [] lock_acquire+0x5a/0x73
 [] do_lookup+0x83/0x1b0
 [] debug_mutex_lock_common+0x16/0x23
 [] mutex_lock_nested+0x10c/0x2b0
 [] do_lookup+0x83/0x1b0
 [] __link_path_walk+0x924/0xde9
 [] link_path_walk+0x58/0xe0
 [] _spin_unlock+0x17/0x20
 [] get_unused_fd_flags+0x115/0x126
 [] do_path_lookup+0x1ae/0x229
 [] __path_lookup_intent_open+0x56/0x96
 [] open_namei+0x7d/0x66c
 [] do_filp_open+0x1c/0x38
 [] _spin_unlock+0x17/0x20
 [] get_unused_fd_flags+0x115/0x126
 [] do_sys_open+0x46/0xc3
 [] system_call+0x7e/0x83

which seems to be new..

Linus
Re: [PATCH 10/11] maps3: add /proc/kpagecount and /proc/kpageflags interfaces
On Mon, 2007-10-15 at 19:58 -0500, Matt Mackall wrote:
>
> > For the bits that we want to export, we could also add the unoptimized
> > access functions for any that don't already have them:
> >
> > #define __ClearPageReserved(page) __clear_bit(PG_reserved, &(page)->flags)
>
> Confused. Why are we interested in clear?

We're not. I just grabbed a random line to show the non-atomic accessors. Any actual one we'd need to add would be:

#define __PageBuddy(page) __test_bit(PG_buddy, &(page)->flags)

It looks like we don't have any of these non-atomic ones for plain __PageFoo(). So, we'd have to add them for each one that we wanted. Still not much work, and still satisfies the "grep test". :)

-- Dave
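A userspace sketch of what such a plain test accessor could look like follows. The helper name __test_bit is taken from the mail above; the sketch supplies its own definition rather than assuming a kernel helper of that name exists. struct page and the bit numbers are stand-ins chosen for illustration.

```c
#include <limits.h>

#define BITS_PER_LONG ((int)(sizeof(unsigned long) * CHAR_BIT))

/* Bit numbers are illustrative, not the kernel's real values. */
enum pageflags { PG_reserved = 10, PG_buddy = 19 };

/* Stand-in for the kernel's struct page: only the flags word matters here. */
struct page {
	unsigned long flags;
};

/* Plain (non-atomic) bit test, defined locally for this sketch. */
static inline int __test_bit(int nr, const unsigned long *addr)
{
	return (addr[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL;
}

/* One greppable accessor per exported flag. */
#define __PageBuddy(page)	__test_bit(PG_buddy, &(page)->flags)
#define __PageReserved(page)	__test_bit(PG_reserved, &(page)->flags)
```

Each exported bit then has exactly one named accessor, so anyone changing the meaning of PG_buddy finds the export path by grepping for PageBuddy.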
Re: [PATCH] git scsi misc include fix
On Mon, 2007-10-15 at 17:08 -0700, Paul Jackson wrote:
> James wrote:
> > The requirement for struct scatterlist is the same
> > before and after the git scsi-misc patch.
>
> Not so. The git-scsi-misc.patch in 2.6.23-mm1 clearly adds the line:
>
>     struct scatterlist sense_sgl;
>
> as part of the added struct scsi_eh_save in scsi/scsi_eh.h.
>
> This bit me while I was doing a bisection on 2.6.23-mm1, for another
> problem, in git-sched, which is discussed in the lkml thread:
>
>     git-sched patch won't boot on SN arch, 2.6.23-mm1
>
> This is using sn2_defconfig. The full 2.6.23-mm1 patch set builds ok,
> because another patch, git-block.patch as I recall, includes
> scatterlist.h some other way, but for the following range of patches in
> 2.6.23-mm1, on the configuration sn2_defconfig, the build is broken,
> due to 'struct scatterlist' being an incomplete type:
>
>     git-scsi-misc.patch
>     git-scsi-misc-include-fix.patch
>     git-scsi-misc-fixup.patch
>     qla2xxx-printk-fixes.patch
>     pci-error-recovery-symbios-scsi-base-support.patch
>     pci-error-recovery-symbios-scsi-first-failure.patch
>     nsp32_restart_autoscsi-remove-error-check.patch
>     scsi-send-media-state-change-modification-events.patch
>     scsi-early-detection-of-medium-not-present-updated.patch
>     mptbase-reset-ioc-initiator-during-pci-resume.patch
>     scsi-use-notifier-chain-for-asynchronous-event.patch
>     initio-fix-conflict-when-loading-driver.patch
>     git-block.patch
>
> > it should also fail with vanilla 2.6.23
>
> I don't know about the vanilla 2.6.23 case.

Ah, right, sorry ... on the ball now. I thought you were saying that the scsi_error.c compilation was failing.

In that case, the correct fix is actually to move the scatterlist include from scsi_error.c (where the scatterlist was originally used locally) into scsi_eh.h, like this.

James

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index d29f846..ebaca4c 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -24,7 +24,6 @@
 #include
 #include
 #include
-#include
 #include
 #include

diff --git a/include/scsi/scsi_eh.h b/include/scsi/scsi_eh.h
index 44224ba..d21b891 100644
--- a/include/scsi/scsi_eh.h
+++ b/include/scsi/scsi_eh.h
@@ -1,6 +1,8 @@
 #ifndef _SCSI_SCSI_EH_H
 #define _SCSI_SCSI_EH_H

+#include
+
 #include
 struct scsi_device;
 struct Scsi_Host;
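The underlying C rule is that a struct embedded by value needs its complete type at that point, while a pointer member only needs a forward declaration. A minimal sketch, with made-up stand-in definitions (the scatterlist fields and the sdev member are hypothetical, only sense_sgl comes from the thread):

```c
/*
 * Stand-in for what the scatterlist header provides: the complete type.
 * The fields are illustrative only.
 */
struct scatterlist {
	unsigned long page_link;
	unsigned int offset;
	unsigned int length;
};

/* A forward declaration is enough for members that are only pointers. */
struct scsi_device;

struct scsi_eh_save {
	struct scsi_device *sdev;	/* hypothetical member: pointer, incomplete type is fine */
	struct scatterlist sense_sgl;	/* by value: the compiler must know its size */
};
```

Removing the struct scatterlist definition above makes the sense_sgl line fail to compile with an incomplete-type error, which is the breakage Paul hit; having the header that declares the embedding struct pull in the definition itself avoids depending on include order elsewhere.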
Re: [PATCH 10/11] maps3: add /proc/kpagecount and /proc/kpageflags interfaces
On Mon, Oct 15, 2007 at 05:49:10PM -0700, Dave Hansen wrote:
> On Mon, 2007-10-15 at 19:35 -0500, Matt Mackall wrote:
> > Perhaps we need something like:
> >
> > flags = page->flags;
> > userflags =
> >     FLAG_BIT(USER_REFERENCED, flags & PG_referenced) |
> >     ...
> >
> > etc. for the flags we want to export. This will let us change to
> >
> >     FLAG_BIT(USER_SLAB, PageSlab(page)) |
> >
> > if we make a virtual slab bit.
>
> Yeah, that looks like a pretty sane scheme. Do we want to be any more
> abstract about it? Perhaps instead of USER_SLAB, it should be
> USER_KERNEL_INTERNAL, or USER_KERNEL_USE. The slab itself is going away
> as we speak. :)

Perhaps. SLUB is still "a slab-based allocator". SLOB isn't, but I intend to start making it use PG_slab shortly anyway.

> > And it shows up in grep.
> >
> > Unfortunately, i386 test_bit is an asm inline and not a macro so we
> > can't hope for the compiler to fold up a bunch of identity bit
> > mappings for us.
>
> For the bits that we want to export, we could also add the unoptimized
> access functions for any that don't already have them:
>
> #define __ClearPageReserved(page) __clear_bit(PG_reserved, &(page)->flags)

Confused. Why are we interested in clear?

-- 
Mathematics is the supreme nostalgia of our time.
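The FLAG_BIT() scheme under discussion could be sketched as below. Everything here is an assumption made for illustration: the macro body, the USER_* numbering, the internal bit numbers and the export_page_flags() name do not come from the thread. Since the kernel's PG_* constants are bit numbers rather than masks, the sketch goes through test helpers instead of the "flags & PG_referenced" shorthand in the quoted pseudocode.

```c
#include <stdint.h>

/* Internal bit numbers: illustrative, and free to change between releases. */
enum { PG_referenced = 2, PG_slab = 7 };

/* Exported bit numbers: hypothetical stable numbering seen by userspace. */
enum { USER_REFERENCED = 0, USER_SLAB = 1 };

/* Stand-in for the kernel's struct page. */
struct page {
	unsigned long flags;
};

#define PageReferenced(page)	(((page)->flags >> PG_referenced) & 1UL)
#define PageSlab(page)		(((page)->flags >> PG_slab) & 1UL)

/* Place the truth value of cond at the exported bit position. */
#define FLAG_BIT(user_bit, cond)	((uint64_t)!!(cond) << (user_bit))

/* Translate internal page flags into the exported layout. */
static uint64_t export_page_flags(const struct page *page)
{
	return FLAG_BIT(USER_REFERENCED, PageReferenced(page)) |
	       FLAG_BIT(USER_SLAB, PageSlab(page));
}
```

Because each exported bit is computed from a predicate, a flag that stops being a real bit in page->flags (the "virtual slab bit" case) only needs its predicate changed, and the userspace layout stays put.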
[PATCH] sc1200 pci cleanup, resume improvement, bug fix
This patch accomplishes the following goals:

* kill the 'pci_enable_device ret val not checked' warning

* eliminate the incorrect mucking with pci_dev::current_state

via the following changes:

* [minor bug fix] eliminate pci_set_power_state() call in resume,
  pci_enable_device() does so for us.

* [bug fix] do not touch dev->current_state, pci_set_power_state()
  and other PCI layer functions manage this for us.

* [minor bug fix, warning fix] check pci_enable_device() ret val in
  resume, and do not bring up interfaces if it fails (which it might)

* eliminate lookup_pci_dev(), a needless loop over a global list,
  by storing our associated hwif in a struct allocated at probe time.

* introduce __ide_setup_pci_device() to facilitate making PCI drivers
  aware of the hwifs created during IDE generic probe.

Signed-off-by: Jeff Garzik <[EMAIL PROTECTED]>
---

WARNING WARNING WARNING

This is a drop-n-run patch, created ultimately because I was annoyed at the [quite valid] pci_enable_device() build warning. If someone likes this, please "take ownership" of the patch.
WARNING WARNING WARNING drivers/ide/pci/sc1200.c | 72 ++- drivers/ide/setup-pci.c | 12 +++ include/linux/ide.h |2 + 3 files changed, 59 insertions(+), 27 deletions(-) diff --git a/drivers/ide/pci/sc1200.c b/drivers/ide/pci/sc1200.c index ee0e3f5..fa61550 100644 --- a/drivers/ide/pci/sc1200.c +++ b/drivers/ide/pci/sc1200.c @@ -41,6 +41,12 @@ #define PCI_CLK_66 0x02 #define PCI_CLK_33A0x03 +#define SC1200_IFS 2 + +struct sc1200_ifs { + ide_hwif_t *iface[SC1200_IFS]; +}; + static unsigned short sc1200_get_pci_clock (void) { unsigned char chip_id, silicon_revision; @@ -274,22 +280,6 @@ static void sc1200_set_pio_mode(ide_drive_t *drive, const u8 pio) } #ifdef CONFIG_PM -static ide_hwif_t *lookup_pci_dev (ide_hwif_t *prev, struct pci_dev *dev) -{ - int h; - - for (h = 0; h < MAX_HWIFS; h++) { - ide_hwif_t *hwif = _hwifs[h]; - if (prev) { - if (hwif == prev) - prev = NULL;// found previous, now look for next match - } else { - if (hwif && hwif->pci_dev == dev) - return hwif;// found next match - } - } - return NULL;// not found -} typedef struct sc1200_saved_state_s { __u32 regs[4]; @@ -298,7 +288,9 @@ typedef struct sc1200_saved_state_s { static int sc1200_suspend (struct pci_dev *dev, pm_message_t state) { - ide_hwif_t *hwif = NULL; + struct sc1200_ifs *ifs = pci_get_drvdata(dev); + ide_hwif_t *hwif; + int i; printk("SC1200: suspend(%u)\n", state.event); @@ -308,9 +300,14 @@ static int sc1200_suspend (struct pci_dev *dev, pm_message_t state) // // Loop over all interfaces that are part of this PCI device: // - while ((hwif = lookup_pci_dev(hwif, dev)) != NULL) { + for (i = 0; i < SC1200_IFS; i++) { sc1200_saved_state_t*ss; unsigned intbasereg, r; + + hwif = ifs->iface[i]; + if (!hwif) + continue; + // // allocate a permanent save area, if not already allocated // @@ -337,23 +334,31 @@ static int sc1200_suspend (struct pci_dev *dev, pm_message_t state) pci_disable_device(dev); pci_set_power_state(dev, pci_choose_state(dev, state)); - dev->current_state = state.event; 
return 0; } static int sc1200_resume (struct pci_dev *dev) { - ide_hwif_t *hwif = NULL; + struct sc1200_ifs *ifs = pci_get_drvdata(dev); + ide_hwif_t *hwif; + int rc, i; + + rc = pci_enable_device(dev); + if (rc) + return rc; - pci_set_power_state(dev, PCI_D0); // bring chip back from sleep state - dev->current_state = PM_EVENT_ON; - pci_enable_device(dev); // // loop over all interfaces that are part of this pci device: // - while ((hwif = lookup_pci_dev(hwif, dev)) != NULL) { + for (i = 0; i < SC1200_IFS; i++) { unsigned intbasereg, r; - sc1200_saved_state_t*ss = (sc1200_saved_state_t *)hwif->config_data; + sc1200_saved_state_t*ss; + + hwif = ifs->iface[i]; + if (!hwif) + continue; + + ss = (sc1200_saved_state_t *)hwif->config_data; // // Restore timing registers: this may be unnecessary if BIOS also does it @@ -411,7 +416,22
Re: [rfc][patch 3/3] x86: optimise barriers
On Mon, Oct 15, 2007 at 11:10:00AM +0200, Jarek Poplawski wrote:
> On Mon, Oct 15, 2007 at 10:09:24AM +0200, Nick Piggin wrote:
> ...
> > Has performance really been much problem for you? (even before the
> > lfence instruction, when you theoretically had to use a locked op)?
> > I mean, I'd struggle to find a place in the Linux kernel where there
> > is actually a measurable difference anywhere... and we're pretty
> > performance critical and I think we have a reasonable amount of lockless
> > code (I guess we may not have a lot of tight computational loops, though).
> > I'd be interested to know what, if any, application had found these
> > barriers to be problematic...
>
> I'm not performance-words at all, so I can't help you, sorry. But, I
> understand people who care about this, and think there is a popular
> conviction barriers and locked instructions are costly, so I'm
> surprised there is any "problem" now with finding these gains...

It's more expensive than nothing, sure. However in real code, algorithmic complexity, cache misses and cacheline bouncing tend to be much bigger issues.

I can't think of a place in the kernel where smp_rmb matters _that_ much. seqlocks maybe (timers, dcache lookup), vmscan...

Obviously removing the lfence is not going to hurt. Maybe we even gain 0.01% performance in someone's workload.

Also, remember: if loads are already in-order, then lfence is a noop, right? (in practice it seems to have to do a _little_ bit of work, but it's like a dozen cycles).

> > The thing is that those documents are not defining what a particular
> > implementation does, but how the architecture is defined (ie. what
> > must some arbitrary software/hardware provide and what may it expect).
>
> I'm not sure this is the right way to tell it. If there is no
> distinction between what is and what could be, how can I believe in
> similar Alpha or Itanium stuff? IMHO, these manuals sometimes look
> like they describe some real hardware mechanisms, and sometimes they
> mention about possible changes and reserved features too. So, when
> they don't mention you could think it's a present behavior.

No. Why are you reading that much into it? I know for a fact that some non-x86 architectures actual implementations have stronger ordering than their ISA allows. It's nothing to do with you "believing" how the hardware works. That's not what the document is describing (directly).

> > It's pretty natural that Intel started out with a weaker guarantee
> > than their CPUs of the time actually supported, and tightened it up
> > after (presumably) deciding not to implement such relaxed semantics
> > for the foreseeable future.
>
> As a matter of fact it's not natural for me at all. I expected the
> other direction, and I still doubt programmers' intentions could be
> "automatically" predicted good enough, so IMHO, it's not for long.

Really? Consider the consequences if, instead of releasing this latest document tightening consistency, Intel found that out of order loads were worth 5% more performance and implemented them in their next chip. The chip could be completely backwards compatible, but all your old code would break, because it was broken to begin with (because it was outside the spec).

IMO Intel did exactly the right thing from an engineering perspective, and so did Linux to always follow the spec.
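Since seqlocks are named above as one place where the read barrier sits on a hot path, here is a rough userspace analogue of a sequence-counter reader and writer using C11 atomics. It is a sketch under stated assumptions, not the kernel's seqlock: the struct and function names are invented, and the acquire fence in the reader is only meant to play the role that smp_rmb() plays in the kernel code being discussed.

```c
#include <stdatomic.h>

/* Hypothetical two-word record protected by a sequence counter. */
struct seq_pair {
	atomic_uint seq;	/* odd while a write is in progress */
	atomic_ulong a;
	atomic_ulong b;
};

/* Writer side; assumes a single writer (or external writer locking). */
static void seq_pair_write(struct seq_pair *s, unsigned long a, unsigned long b)
{
	unsigned int seq = atomic_load_explicit(&s->seq, memory_order_relaxed);

	atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);	/* order seq bump before data */
	atomic_store_explicit(&s->a, a, memory_order_relaxed);
	atomic_store_explicit(&s->b, b, memory_order_relaxed);
	atomic_store_explicit(&s->seq, seq + 2, memory_order_release);
}

/* Reader side: retry until a consistent, even sequence number is seen. */
static void seq_pair_read(struct seq_pair *s, unsigned long *a, unsigned long *b)
{
	unsigned int start;

	do {
		start = atomic_load_explicit(&s->seq, memory_order_acquire);
		*a = atomic_load_explicit(&s->a, memory_order_relaxed);
		*b = atomic_load_explicit(&s->b, memory_order_relaxed);
		/* Read barrier: data loads must complete before the re-check. */
		atomic_thread_fence(memory_order_acquire);
	} while ((start & 1) ||
		 start != atomic_load_explicit(&s->seq, memory_order_relaxed));
}
```

The reader executes its barrier on every pass through the loop, which is why the cost of the read barrier shows up in read-mostly paths like this one more than anywhere else.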
Re: [PATCH 10/11] maps3: add /proc/kpagecount and /proc/kpageflags interfaces
On Mon, 2007-10-15 at 19:35 -0500, Matt Mackall wrote:
> Perhaps we need something like:
>
> flags = page->flags;
> userflags =
>     FLAG_BIT(USER_REFERENCED, flags & PG_referenced) |
>     ...
>
> etc. for the flags we want to export. This will let us change to
>
>     FLAG_BIT(USER_SLAB, PageSlab(page)) |
>
> if we make a virtual slab bit.

Yeah, that looks like a pretty sane scheme. Do we want to be any more abstract about it? Perhaps instead of USER_SLAB, it should be USER_KERNEL_INTERNAL, or USER_KERNEL_USE. The slab itself is going away as we speak. :)

> And it shows up in grep.
>
> Unfortunately, i386 test_bit is an asm inline and not a macro so we
> can't hope for the compiler to fold up a bunch of identity bit
> mappings for us.

For the bits that we want to export, we could also add the unoptimized access functions for any that don't already have them:

#define __ClearPageReserved(page) __clear_bit(PG_reserved, &(page)->flags)

Anybody changing bit behavior will certainly go check all of the callers, such as ClearPageReserved() *and* __ClearPageReserved().

-- Dave
Re: WANTED: kernel projects for CS students
On Mon, 15 Oct 2007, Zan Lynx wrote: On Sun, 2007-10-14 at 19:01 -0400, Rik van Riel wrote: The kernel newbies community often gets inquiries from CS students who need a project for their studies and would like to do something with the Linux kernel, but would also like their code to be useful to the community afterwards. In order to make it easier for them, I am trying to put together a page with projects that: - Are self contained enough that the students can implement the project by themselves, since that is often a university requirement. - Are self contained enough that Linux could merge the code (maybe with additional changes) after the student has been working on it for a few months. - Are large enough to qualify as a student project, luckily there is flexibility here since we get inquiries for anything from 6 week projects to 6 month projects. If you have ideas on what projects would be useful, please add them to this page (or email me): http://kernelnewbies.org/KernelProjects How about this in the Device Mapper raid-1/mirror code? /* FIXME: add read balancing */ That comment has been in there for many releases. I've wanted read balancing for several servers and had all sorts of ideas about it, like adding functions to the underlying device queues to return a "queuing cost" to determine which is the best queue to add the read request. I think that could work better for queues like CFQ than the MD closest-head. An implementation would also need to be benchmarked against the MD raid-1. Along with the time to submit it to LKML, get it reviewed and polish it up, it might make a good student project. another couple of raid enhancements would be: 1. 
teach the system that a raid456 stripe is handled most efficiently if treated as a single block of data.

By this I mean that if you read one block from the stripe the system reads the entire stripe, so it should take this into account when doing read-ahead and not always throw away most of the data it read because it's outside the current readahead window (if nothing else, look at putting it on the tail of the LRU list instead of just forgetting it).

If you write one block of the stripe the system must read the stripe, then update two blocks of the stripe (the data block and the parity block), but if you are going to write the entire stripe out you can ignore whatever's there and just calculate the parity block from the data you are writing. This should make writing to a raid456 stripe as fast as writing to a raid0 stripe (well, almost, you have one more block to write).

2. not directly a kernel project, create userspace tools that make managing raid and partitioning on linux as easy as the zfs tools

3. there is currently the ability to grow a raid56 array by adding a disk, but there is not the ability to take a raid5 array, add a disk and make the result a raid6 array.

David Lang
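The difference between the two write paths in item 1 can be shown with plain XOR parity. This is a sketch for the single-parity (raid5-style) case only, with function names invented here; it is not how the md driver is structured, and raid6's second syndrome is not covered.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Full-stripe write: parity is the XOR of all new data blocks.
 * Nothing has to be read from disk first.
 */
static void parity_full_stripe(uint8_t *parity, const uint8_t *const *data,
			       size_t ndata, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		uint8_t p = 0;

		for (size_t d = 0; d < ndata; d++)
			p ^= data[d][i];
		parity[i] = p;
	}
}

/*
 * Single-block update: new parity = old parity ^ old data ^ new data.
 * Both the old data block and the old parity block must be read first.
 */
static void parity_update(uint8_t *parity, const uint8_t *old_data,
			  const uint8_t *new_data, size_t len)
{
	for (size_t i = 0; i < len; i++)
		parity[i] ^= old_data[i] ^ new_data[i];
}
```

Both paths arrive at the same parity, but the full-stripe path needs no reads before writing, which is the saving described above.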
Re: More Large blocksize benchmarks
On Mon, 15 Oct 2007, Chris Mason wrote:

> Dave reported that XFS saw much higher write throughput with large
> blocksizes, but so far I'm seeing the most benefits during reads.

Dave's tests were done with an early large blocksize patchset that had issues with readahead. More recent versions have the fixes by Fengguang that address the issue.
Re: [PATCH 2/11] maps3: introduce task_size_of for all arches
David, All of your comments looked pretty valid to me. I've refreshed that patch. I haven't even compile-tested this so there may be some fat fingering somewhere. I'll run compile tests on it now. -- Dave For the /proc//pagemap code[1], we need to able to query how much virtual address space a particular task has. The trick is that we do it through /proc and can't use TASK_SIZE since it references "current" on some arches. The process opening the /proc file might be a 32-bit process opening a 64-bit process's pagemap file. x86_64 already has a TASK_SIZE_OF() macro: #define TASK_SIZE_OF(child) ((test_tsk_thread_flag(child, TIF_IA32)) ? IA32_PAGE_OFFSET : TASK_SIZE64) I'd like to have that for other architectures. So, add it for all the architectures that actually use "current" in their TASK_SIZE. For the others, just add a quick #define in sched.h to use plain old TASK_SIZE. 1. http://www.linuxworld.com/news/2007/042407-kernel.html - MIPS portion from Ralf Baechle <[EMAIL PROTECTED]> Signed-off-by: Dave Hansen <[EMAIL PROTECTED]> Signed-off-by: Ralf Baechle <[EMAIL PROTECTED]> Signed-off-by: Matt Mackall <[EMAIL PROTECTED]> --- lxc-dave/include/asm-ia64/processor.h|3 ++- lxc-dave/include/asm-mips/processor.h|4 lxc-dave/include/asm-parisc/processor.h |3 ++- lxc-dave/include/asm-powerpc/processor.h |3 ++- lxc-dave/include/asm-s390/processor.h|3 ++- lxc-dave/include/linux/sched.h |4 6 files changed, 16 insertions(+), 4 deletions(-) diff -puN include/asm-ia64/processor.h~PATCH_2_11_maps3-_introduce_task_size_of_for_all_arches include/asm-ia64/processor.h --- lxc/include/asm-ia64/processor.h~PATCH_2_11_maps3-_introduce_task_size_of_for_all_arches 2007-10-15 17:29:22.0 -0700 +++ lxc-dave/include/asm-ia64/processor.h 2007-10-15 17:29:22.0 -0700 @@ -31,7 +31,8 @@ * each (assuming 8KB page size), for a total of 8TB of user virtual * address space. 
*/ -#define TASK_SIZE (current->thread.task_size) +#define TASK_SIZE_OF(tsk) ((tsk)->thread.task_size) +#define TASK_SIZE TASK_SIZE_OF(current) /* * This decides where the kernel will search for a free chunk of vm diff -puN include/asm-mips/processor.h~PATCH_2_11_maps3-_introduce_task_size_of_for_all_arches include/asm-mips/processor.h --- lxc/include/asm-mips/processor.h~PATCH_2_11_maps3-_introduce_task_size_of_for_all_arches 2007-10-15 17:29:22.0 -0700 +++ lxc-dave/include/asm-mips/processor.h 2007-10-15 17:34:12.0 -0700 @@ -45,6 +45,8 @@ extern unsigned int vced_count, vcei_cou * space during mmap's. */ #define TASK_UNMAPPED_BASE (PAGE_ALIGN(TASK_SIZE / 3)) +#define TASK_SIZE_OF(tsk) \ + (test_tsk_thread_flag(tak, TIF_32BIT_ADDR) ? TASK_SIZE32 : TASK_SIZE) #endif #ifdef CONFIG_64BIT @@ -65,6 +67,8 @@ extern unsigned int vced_count, vcei_cou #define TASK_UNMAPPED_BASE \ (test_thread_flag(TIF_32BIT_ADDR) ? \ PAGE_ALIGN(TASK_SIZE32 / 3) : PAGE_ALIGN(TASK_SIZE / 3)) +#define TASK_SIZE_OF(tsk) \ + (test_tsk_thread_flag(TIF_32BIT_ADDR) ? 
TASK_SIZE32 : TASK_SIZE) #endif #define NUM_FPU_REGS 32 diff -puN include/asm-parisc/processor.h~PATCH_2_11_maps3-_introduce_task_size_of_for_all_arches include/asm-parisc/processor.h --- lxc/include/asm-parisc/processor.h~PATCH_2_11_maps3-_introduce_task_size_of_for_all_arches 2007-10-15 17:29:22.0 -0700 +++ lxc-dave/include/asm-parisc/processor.h 2007-10-15 17:31:39.0 -0700 @@ -32,7 +32,8 @@ #endif #define current_text_addr() ({ void *pc; current_ia(pc); pc; }) -#define TASK_SIZE (current->thread.task_size) +#define TASK_SIZE_OF(tsk) ((tsk)->thread.task_size) +#define TASK_SIZE TASK_SIZE_OF(current) #define TASK_UNMAPPED_BASE (current->thread.map_base) #define DEFAULT_TASK_SIZE32(0xFFF0UL) diff -puN include/asm-powerpc/processor.h~PATCH_2_11_maps3-_introduce_task_size_of_for_all_arches include/asm-powerpc/processor.h --- lxc/include/asm-powerpc/processor.h~PATCH_2_11_maps3-_introduce_task_size_of_for_all_arches 2007-10-15 17:29:22.0 -0700 +++ lxc-dave/include/asm-powerpc/processor.h2007-10-15 17:32:00.0 -0700 @@ -99,8 +99,9 @@ extern struct task_struct *last_task_use */ #define TASK_SIZE_USER32 (0x0001UL - (1*PAGE_SIZE)) -#define TASK_SIZE (test_thread_flag(TIF_32BIT) ? \ +#define TASK_SIZE_OF(tsk) (test_tsk_thread_flag(tsk, TIF_32BIT) ? \ TASK_SIZE_USER32 : TASK_SIZE_USER64) +#define TASK_SIZETASK_SIZE_OF(current) /* This decides where the kernel will search for a free chunk of vm * space during mmap's. diff -puN
Re: WANTED: kernel projects for CS students
On Mon, 15 Oct 2007, Mark Gross wrote: On Sun, Oct 14, 2007 at 07:01:28PM -0400, Rik van Riel wrote: The kernel newbies community often gets inquiries from CS students who need a project for their studies and would like to do something with the Linux kernel, but would also like their code to be useful to the community afterwards. In order to make it easier for them, I am trying to put together a page with projects that: - Are self contained enough that the students can implement the project by themselves, since that is often a university requirement. - Are self contained enough that Linux could merge the code (maybe with additional changes) after the student has been working on it for a few months. - Are large enough to qualify as a student project; luckily there is flexibility here since we get inquiries for anything from 6 week projects to 6 month projects. If you have ideas on what projects would be useful, please add them to this page (or email me): http://kernelnewbies.org/KernelProjects Is there already a make config option that will do a good job at setting a default .config file based on what is already running on a system? I get tired of trimming down my .config for my laptop build so it takes less than 30min to build a kernel. Bonus credit to additional "expert" options (like those powertop puts out) for target uses: laptop, HPC, home file share, embedded targets. Oh, and let's make the expert configs easily extensible. Another config thing that would be nice would be to take something like Rob Landley's miniconfig tool and make it work well enough to be integrated (it creates a version of .config that only contains the things that need to be set, not everything that's at a default that doesn't make any difference) David Lang
Re: [PATCH 10/11] maps3: add /proc/kpagecount and /proc/kpageflags interfaces
On Mon, Oct 15, 2007 at 04:34:57PM -0700, Dave Hansen wrote: > On Mon, 2007-10-15 at 18:11 -0500, Matt Mackall wrote: > > > Could we just have /proc/kpagereferenced? Is there a legitimate need > > > for other flags to be visible? > > > > Referenced, dirty, uptodate, lru, active, slab, writeback, reclaim, > > and buddy all look like they might be interesting to me from the point > > of view of watching what's happening in the VM graphically in > > real-time. > > This is true, but it forces a lot of logic from the kernel to be run in > userspace to figure out what is going on. Looking at mainline today: > > #define PG_reclaim 17 /* To be reclaimed asap */ > ... > #define PG_readaheadPG_reclaim /* Reminder to do async read-ahead > */ > > All of a sudden, to figure out which flag it actually is, we need to > have all of the logic that the kernel does. > > Does this establish a fixed user<->kernel ABI that will keep us from > doing this in the future: > > -#define PG_slab 7 /* slab debug (Suparna wants this) */ > +#define PG_slab 14 /* slab debug (Suparna wants this) > */ > > Or, even something like this: > > -#define PageSlab(page) test_bit(PG_slab, &(page)->flags) > +#define PageSlab(page) (!PageLRU(page) && !PageHighmem(page)) Yeah, there are a bunch of flags that aren't mutually exclusive and we could probably recover a few. > If we actually had several (or even still one file) that exposed this > state, independent of the actual content of page->flags, I think we'd be > better off. I think that's the difference between a fun, super-useful > debugging feature and one that can stay in mainline and have > applications stay using it (without breaking) for a long time. > > The flags you listed are things that I would imagine will always exist, > logically. But, we might not always have a specific page flag for pages > under writeback or in the buddy list for that matter. PG_buddy isn't > that old. 
Perhaps that would be better abstracted to something like > page_in_main_allocator(). Perhaps we need something like: flags = page->flags; userflags = FLAG_BIT(USER_REFERENCED, flags & PG_referenced) | ... etc. for the flags we want to export. This will let us change to FLAG_BIT(USER_SLAB, PageSlab(page)) | if we make a virtual slab bit. And it shows up in grep. Unfortunately, i386 test_bit is an asm inline and not a macro so we can't hope for the compiler to fold up a bunch of identity bit mappings for us. -- Mathematics is the supreme nostalgia of our time.
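The FLAG_BIT() idea in the reply above can be sketched in plain userspace C. This is only an illustration under stated assumptions: the bit numbers, the USER_* names other than USER_REFERENCED and USER_SLAB, the mock struct page, and the helpers test_page_bit(), page_is_slab() and page_user_flags() are all hypothetical and are not taken from the patch series.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical internal bit numbers; these may change between kernel versions. */
enum { PG_referenced = 2, PG_dirty = 4, PG_lru = 5, PG_slab = 7 };

/* Hypothetical exported bit numbers; these are the part that would stay fixed. */
enum { USER_REFERENCED = 0, USER_DIRTY = 1, USER_LRU = 2, USER_SLAB = 3 };

/* Put a boolean condition at a fixed, user-visible bit position. */
#define FLAG_BIT(user_bit, cond) ((uint64_t)((cond) ? 1 : 0) << (user_bit))

/* Mock page descriptor, standing in for the kernel's struct page. */
struct page {
	unsigned long flags;
};

static int test_page_bit(const struct page *page, int nr)
{
	return (int)((page->flags >> nr) & 1UL);
}

/*
 * Today this reads a real bit. It could later become a derived predicate
 * (a "virtual" slab bit) without changing what userspace sees.
 */
static int page_is_slab(const struct page *page)
{
	return test_page_bit(page, PG_slab);
}

/* Translate internal page state into the exported flag word. */
static uint64_t page_user_flags(const struct page *page)
{
	assert(page);
	return FLAG_BIT(USER_REFERENCED, test_page_bit(page, PG_referenced)) |
	       FLAG_BIT(USER_DIRTY, test_page_bit(page, PG_dirty)) |
	       FLAG_BIT(USER_LRU, test_page_bit(page, PG_lru)) |
	       FLAG_BIT(USER_SLAB, page_is_slab(page));
}
```

The point of the indirection is that renumbering PG_slab, or replacing it with a computed predicate, would only touch page_is_slab() and the internal enum; the exported word keeps its layout.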
Re: [PATCH] Make m68k cross compile like every other architecture.
On Monday 15 October 2007 3:25:35 pm Geert Uytterhoeven wrote: > 64-bit parisc tests if /usr/bin/hppa64-linux-gnu- exists. > If yes, it sets CROSS_COMPILE to hppa64-linux-gnu-. > If no, it sets CROSS_COMPILE to hppa64-linux- > > 32-bit parisc unconditionally sets CROSS_COMPILE to hppa-linux-. > > This still breaks Rob's setup if his compiler is called differently. Another thing to take into account is that kconfig was recently changed to save ARCH and CROSS_COMPILE in the .config file: http://lwn.net/Articles/253889/ Presumably that means you'd only have to specify your arch and cross compiler during config, and then if you re-used that config it would re-use those settings. But the existing makefile discards anything that isn't explicitly overridden on the make command line at each stage of the build. It seems to me any fix should only reset CROSS_COMPILE if there isn't already a value for it. (Otherwise there's a potentially subtle bug where a year from now you might have "m68k-linux-gnu-gcc" and "m68k-linux-gnu-pcc" and want to compare the results of building with the different compilers.) I still lean towards considering any attempt to cross compile without setting CROSS_COMPILE an error, and not guessing at what the user meant. But perhaps that's just personal preference... Rob -- "One of my most productive days was throwing away 1000 lines of code." - Ken Thompson.
Re: [PATCH 11/11] maps3: make page monitoring /proc file optional
On Tue, Oct 16, 2007 at 10:03:39AM +1000, Rusty Russell wrote: > On Tuesday 16 October 2007 08:51:17 Jeremy Fitzhardinge wrote: > > Dave Hansen wrote: > > > On Mon, 2007-10-15 at 17:26 -0500, Matt Mackall wrote: > > >> +config PROC_PAGE_MONITOR > > >> + default y > > >> + bool "Enable /proc page monitoring" if EMBEDDED && PROC_FS && > > >> MMU + help > > >> + Various /proc files exist to monitor process memory > > >> utilization: + /proc/pid/smaps, /proc/pid/clear_refs, > > >> /proc/pid/pagemap, + /proc/kpagecount, and /proc/kpageflags. > > >> Disabling these + interfaces will reduce the size of the kernel > > >> by approximately 4kb. > > > > > > How about pulling the EMBEDDED off there? I certainly want it for > > > non-embedded reasons. ;) > > > > That means it will only bother asking you if you've set EMBEDDED; > > otherwise it's always on. > > But it's at the least confusing. Surely this option should depend on MMU and > PROC_FS, and the prompt depend on EMBEDDED? > > That might be implied by the Kconfig layout, but AFAICT this patch removed > the > explicit MMU dependency. > > Rusty. Wasn't this your patch? You're right, it ought to say "depends on PROC_FS && MMU". Will fix. -- Mathematics is the supreme nostalgia of our time.
[PATCH 3/4] docbook: fix usb content
From: Randy Dunlap <[EMAIL PROTECTED]> Fix USB docbook warnings. Warning(linux-2.6.23-git8//include/linux/usb/gadget.h:487): No description found for parameter 'g' Warning(linux-2.6.23-git8//include/linux/usb/gadget.h:506): No description found for parameter 'g' Warning(linux-2.6.23-git8//drivers/usb/core/hub.c:1416): No description found for parameter 'usb_dev' Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]> --- drivers/usb/core/hub.c |6 +- include/linux/usb/gadget.h |4 ++-- 2 files changed, 7 insertions(+), 3 deletions(-) --- linux-2.6.23-git8.orig/include/linux/usb/gadget.h +++ linux-2.6.23-git8/include/linux/usb/gadget.h @@ -481,7 +481,7 @@ static inline void *get_gadget_data (str /** * gadget_is_dualspeed - return true iff the hardware handles high speed - * @gadget: controller that might support both high and full speeds + * @g: controller that might support both high and full speeds */ static inline int gadget_is_dualspeed(struct usb_gadget *g) { @@ -497,7 +497,7 @@ static inline int gadget_is_dualspeed(st /** * gadget_is_otg - return true iff the hardware is OTG-ready - * @gadget: controller that might have a Mini-AB connector + * @g: controller that might have a Mini-AB connector * * This is a runtime test, since kernels with a USB-OTG stack sometimes * run on boards which only have a Mini-B (or Mini-A) connector. --- linux-2.6.23-git8.orig/drivers/usb/core/hub.c +++ linux-2.6.23-git8/drivers/usb/core/hub.c @@ -1407,7 +1407,11 @@ fail: /** - * Similar to usb_disconnect() + * usb_deauthorize_device - deauthorize a device (usbcore-internal) + * @usb_dev: USB device + * + * Move the USB device to a very basic state where interfaces are disabled + * and the device is in fact unconfigured and unusable. * * We share a lock (that we have) with device_del(), so we need to * defer its call. 
[PATCH 4/4] docbook: fix filesystems content
From: Randy Dunlap <[EMAIL PROTECTED]> Fix filesystems docbook warnings. Warning(linux-2.6.23-git8//fs/debugfs/file.c:241): No description found for parameter 'name' Warning(linux-2.6.23-git8//fs/debugfs/file.c:241): No description found for parameter 'mode' Warning(linux-2.6.23-git8//fs/debugfs/file.c:241): No description found for parameter 'parent' Warning(linux-2.6.23-git8//fs/debugfs/file.c:241): No description found for parameter 'value' Warning(linux-2.6.23-git8//include/linux/jbd.h:404): No description found for parameter 'h_lockdep_map' Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]> --- fs/debugfs/file.c | 41 +++-- include/linux/jbd.h |1 + 2 files changed, 36 insertions(+), 6 deletions(-) --- linux-2.6.23-git8.orig/fs/debugfs/file.c +++ linux-2.6.23-git8/fs/debugfs/file.c @@ -227,15 +227,24 @@ DEFINE_SIMPLE_ATTRIBUTE(fops_x16, debugf DEFINE_SIMPLE_ATTRIBUTE(fops_x32, debugfs_u32_get, debugfs_u32_set, "0x%08llx\n"); -/** - * debugfs_create_x8 - create a debugfs file that is used to read and write an unsigned 8-bit value - * debugfs_create_x16 - create a debugfs file that is used to read and write an unsigned 16-bit value - * debugfs_create_x32 - create a debugfs file that is used to read and write an unsigned 32-bit value +/* + * debugfs_create_x{8,16,32} - create a debugfs file that is used to read and write an unsigned {8,16,32}-bit value * - * These functions are exactly the same as the above functions, (but use a hex - * output for the decimal challenged) for details look at the above unsigned + * These functions are exactly the same as the above functions (but use a hex + * output for the decimal challenged). For details look at the above unsigned * decimal functions. */ + +/** + * debugfs_create_x8 - create a debugfs file that is used to read and write an unsigned 8-bit value + * @name: a pointer to a string containing the name of the file to create. 
+ * @mode: the permission that the file should have + * @parent: a pointer to the parent dentry for this file. This should be a + * directory dentry if set. If this parameter is %NULL, then the + * file will be created in the root of the debugfs filesystem. + * @value: a pointer to the variable that the file should read to and write + * from. + */ struct dentry *debugfs_create_x8(const char *name, mode_t mode, struct dentry *parent, u8 *value) { @@ -243,6 +252,16 @@ struct dentry *debugfs_create_x8(const c } EXPORT_SYMBOL_GPL(debugfs_create_x8); +/** + * debugfs_create_x16 - create a debugfs file that is used to read and write an unsigned 16-bit value + * @name: a pointer to a string containing the name of the file to create. + * @mode: the permission that the file should have + * @parent: a pointer to the parent dentry for this file. This should be a + * directory dentry if set. If this parameter is %NULL, then the + * file will be created in the root of the debugfs filesystem. + * @value: a pointer to the variable that the file should read to and write + * from. + */ struct dentry *debugfs_create_x16(const char *name, mode_t mode, struct dentry *parent, u16 *value) { @@ -250,6 +269,16 @@ struct dentry *debugfs_create_x16(const } EXPORT_SYMBOL_GPL(debugfs_create_x16); +/** + * debugfs_create_x32 - create a debugfs file that is used to read and write an unsigned 32-bit value + * @name: a pointer to a string containing the name of the file to create. + * @mode: the permission that the file should have + * @parent: a pointer to the parent dentry for this file. This should be a + * directory dentry if set. If this parameter is %NULL, then the + * file will be created in the root of the debugfs filesystem. + * @value: a pointer to the variable that the file should read to and write + * from. 
+ */ struct dentry *debugfs_create_x32(const char *name, mode_t mode, struct dentry *parent, u32 *value) { --- linux-2.6.23-git8.orig/include/linux/jbd.h +++ linux-2.6.23-git8/include/linux/jbd.h @@ -372,6 +372,7 @@ struct jbd_revoke_table_s; * @h_sync: flag for sync-on-close * @h_jdata: flag to force data journaling * @h_aborted: flag indicating fatal error on handle + * @h_lockdep_map: lockdep info for debugging lock problems **/ /* Docbook can't yet cope with the bit fields, but will leave the documentation
[PATCH 2/4] docbook: fix libata content
From: Randy Dunlap <[EMAIL PROTECTED]> Fix libata docbook warnings. Warning(linux-2.6.23-git8//drivers/ata/libata-scsi.c:3251): No description found for parameter 'dev' Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]> --- drivers/ata/libata-scsi.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- linux-2.6.23-git8.orig/drivers/ata/libata-scsi.c +++ linux-2.6.23-git8/drivers/ata/libata-scsi.c @@ -3239,7 +3239,7 @@ static void ata_scsi_handle_link_detach( /** * ata_scsi_media_change_notify - send media change event - * @atadev: Pointer to the disk device with media change event + * @dev: Pointer to the disk device with media change event * * Tell the block layer to send a media change notification * event.
[PATCH 1/4] docbook: fix kernel-api content
From: Randy Dunlap <[EMAIL PROTECTED]> Fix kernel-api docbook warnings. Warning(linux-2.6.23-git8//drivers/message/fusion/mptscsih.c:2618): No description found for parameter 'sc' Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]> --- drivers/message/fusion/mptscsih.c | 10 +++--- 1 file changed, 3 insertions(+), 7 deletions(-) --- linux-2.6.23-git8.orig/drivers/message/fusion/mptscsih.c +++ linux-2.6.23-git8/drivers/message/fusion/mptscsih.c @@ -2605,14 +2605,10 @@ mptscsih_set_scsi_lookup(MPT_ADAPTER *io } /** - * SCPNT_TO_LOOKUP_IDX - * - * search's for a given scmd in the ScsiLookup[] array list - * + * SCPNT_TO_LOOKUP_IDX - searches for a given scmd in the ScsiLookup[] array list * @ioc: Pointer to MPT_ADAPTER structure - * @scmd: scsi_cmnd pointer - * - **/ + * @sc: scsi_cmnd pointer + */ static int SCPNT_TO_LOOKUP_IDX(MPT_ADAPTER *ioc, struct scsi_cmnd *sc) {
More Large blocksize benchmarks
Hello everyone, I'm stealing the cc list and reviving an old thread because I've finally got some numbers to go along with the Btrfs variable blocksize feature. The basic idea is to create a read/write interface to map a range of bytes on the address space, and use it in Btrfs for all metadata operations (file operations have always been extent based). So, instead of casting buffer_head->b_data to some structure, I read and write at offsets in a struct extent_buffer. The extent buffer is very small and backed by an address space, and I get large block sizes the same way file_write gets to write to 16k at a time, by finding the appropriate page in the address space. This is an oversimplification since I try to cache these mapping decisions to avoid using too much CPU, but hopefully you get the idea. The advantage to this approach is the changes are all inside Btrfs. No extra kernel patches were required. Dave reported that XFS saw much higher write throughput with large blocksizes, but so far I'm seeing the most benefits during reads. The next step is a bunch more benchmarks. I've done the first round and posted it here: http://oss.oracle.com/~mason/blocksizes/ The Btrfs code makes it relatively easy to experiment, and so this may be a good step toward figuring out if some automagic solution is worth it in general. I can even use different sizes for nodes and leaves, although I haven't done much testing at all there yet. -chris
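The "read and write at offsets" interface described above can be sketched as follows. This is a userspace illustration only, not the Btrfs code: the names sketch_extent_buffer, sketch_read() and sketch_write(), the 4096-byte page size and the 16-page limit are all assumptions made for the example, and the mapping cache mentioned in the message is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size */
#define SKETCH_MAX_PAGES 16u	/* enough for a 64k block in this sketch */

/*
 * Hypothetical stand-in for an extent buffer: one logical block backed by
 * separately allocated pages that need not be contiguous in memory.
 */
struct sketch_extent_buffer {
	size_t len;				/* block size in bytes */
	unsigned char *pages[SKETCH_MAX_PAGES];	/* one pointer per backing page */
};

/* Copy len bytes out of the buffer, starting at byte offset start. */
static void sketch_read(const struct sketch_extent_buffer *eb, void *dst,
			size_t start, size_t len)
{
	unsigned char *out = dst;

	assert(start <= eb->len && len <= eb->len - start);
	while (len > 0) {
		size_t idx = start / SKETCH_PAGE_SIZE;	/* which page */
		size_t off = start % SKETCH_PAGE_SIZE;	/* offset inside it */
		size_t chunk = SKETCH_PAGE_SIZE - off;	/* bytes left in page */

		if (chunk > len)
			chunk = len;
		memcpy(out, eb->pages[idx] + off, chunk);
		out += chunk;
		start += chunk;
		len -= chunk;
	}
}

/* Copy len bytes into the buffer, starting at byte offset start. */
static void sketch_write(struct sketch_extent_buffer *eb, const void *src,
			 size_t start, size_t len)
{
	const unsigned char *in = src;

	assert(start <= eb->len && len <= eb->len - start);
	while (len > 0) {
		size_t idx = start / SKETCH_PAGE_SIZE;
		size_t off = start % SKETCH_PAGE_SIZE;
		size_t chunk = SKETCH_PAGE_SIZE - off;

		if (chunk > len)
			chunk = len;
		memcpy(eb->pages[idx] + off, in, chunk);
		in += chunk;
		start += chunk;
		len -= chunk;
	}
}
```

Because callers only ever name a byte offset and a length, a structure that straddles a page boundary is handled by the copy loop rather than by casting a pointer into a single contiguous buffer.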
Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)
On Tue, Oct 16, 2007 at 12:08:01AM +0200, Mikulas Patocka wrote: > > On Mon, 15 Oct 2007 22:47:42 +0200 (CEST) > > Mikulas Patocka <[EMAIL PROTECTED]> wrote: > > > > > > According to latest memory ordering specification documents from > > > > Intel and AMD, both manufacturers are committed to in-order loads > > > > from cacheable memory for the x86 architecture. Hence, smp_rmb() > > > > may be a simple barrier. > > > > > > > > http://developer.intel.com/products/processor/manuals/318147.pdf > > > > http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24593.pdf > > > > > > Hi > > > > > > I'm just wondering about one thing --- what is LFENCE instruction > > > good for? > > > > > > SFENCE is for enforcing ordering in write-combining buffers (it > > > doesn't make sense in write-back cache mode). > > > MFENCE is for preventing stores from being moved past loads. > > > > > > But what is LFENCE for? I read the above documents and they already > > > say that CPUs have ordered loads. > > > > > > > The cpus also have an explicit set of instructions that deliberately do > > unordered stores/loads, and s/lfence etc are mostly designed for those. > > I know about unordered stores (movnti & similar) --- they basically use > write-combining method on memory that is normally write-back --- and they > need sfence. But which instruction does an unordered load and needs > lfence? Also, for non-wb memory. I don't think the Intel document referenced says anything about this, but the AMD document says that loads can pass loads (page 8, rule b). This is why our rmb() is still an lfence.
Re: [PATCH 1/11] maps3: add proportional set size accounting in smaps
On Mon, Oct 15, 2007 at 04:36:38PM -0700, David Rientjes wrote: > On Mon, 15 Oct 2007, Matt Mackall wrote: > > > Index: l/fs/proc/task_mmu.c > > === > > --- l.orig/fs/proc/task_mmu.c 2007-10-14 13:35:31.0 -0500 > > +++ l/fs/proc/task_mmu.c2007-10-14 13:36:56.0 -0500 > > @@ -122,6 +122,27 @@ struct mem_size_stats > > unsigned long private_clean; > > unsigned long private_dirty; > > unsigned long referenced; > > + > > + /* > > +* Proportional Set Size(PSS): my share of RSS. > > +* > > +* PSS of a process is the count of pages it has in memory, where each > > +* page is divided by the number of processes sharing it. So if a > > +* process has 1000 pages all to itself, and 1000 shared with one other > > +* process, its PSS will be 1500. - Matt Mackall, lwn.net > > +*/ > > + u64 pss; > > + /* > > +* To keep (accumulated) division errors low, we adopt 64bit pss and > > +* use some low bits for division errors. So (pss >> PSS_DIV_BITS) > > +* would be the real byte count. > > +* > > +* A shift of 12 before division means(assuming 4K page size): > > +* - 1M 3-user-pages add up to 8KB errors; > > +* - supports mapcount up to 2^24, or 16M; > > +* - supports PSS up to 2^52 bytes, or 4PB. > > +*/ > > +#define PSS_DIV_BITS 12 > > }; > > > > I know this gets moved again in the eighth patch of the series, but the > #define still has no place inside the struct definition. Agreed. > The pss is going to need accessor functions, preferably inlined, and the > comment adjusted stating that all accesses should be through those > functions and not directly to the mem_size_stats struct. > > static inline u64 pss_up(unsigned long pss) > { > return pss << PSS_DIV_BITS; > } > > static inline unsigned long pss_down(u64 pss) > { > return pss >> PSS_DIV_BITS; > } I think that's overkill for something that has exactly one use of each. -- Mathematics is the supreme nostalgia of our time. 
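The fixed-point PSS accumulation discussed in this thread can be tried out in userspace. The sketch below is an assumption-laden illustration rather than the patch itself: the struct name mem_size_stats_sketch, the helpers account_page() and pss_bytes(), and the 4K page size are made up for the example; only PSS_DIV_BITS and the idea of shifting before dividing come from the quoted comment.

```c
#include <assert.h>
#include <stdint.h>

#define PSS_DIV_BITS 12			/* low bits kept for division remainders */
#define SKETCH_PAGE_SIZE 4096ULL	/* assumed 4K pages */

/* Hypothetical cut-down version of the stats structure. */
struct mem_size_stats_sketch {
	uint64_t pss;	/* bytes, scaled up by 2^PSS_DIV_BITS */
};

/*
 * Account one resident page that is mapped by mapcount processes.
 * Shifting before the division keeps 12 fractional bits per page, so
 * rounding errors accumulate far more slowly than with whole bytes.
 */
static void account_page(struct mem_size_stats_sketch *mss, unsigned int mapcount)
{
	assert(mapcount >= 1);
	mss->pss += (SKETCH_PAGE_SIZE << PSS_DIV_BITS) / mapcount;
}

/* Convert the scaled accumulator back to a plain byte count. */
static uint64_t pss_bytes(const struct mem_size_stats_sketch *mss)
{
	return mss->pss >> PSS_DIV_BITS;
}
```

Feeding this the example from the quoted comment, 1000 private pages plus 1000 pages shared with one other process, yields the byte equivalent of 1500 pages.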
Re: [RFC] cpuset update_cgroup_cpus_allowed
Paul Jackson wrote: Paul M wrote: Here's an alternative for consideration, below. I don't see the alternative -- I just see my patch, with the added blurbage: #12 - /usr/local/google/home/menage/kernel9/linux/kernel/cpuset.c # action=edit type=text Should I be increasing my caffeine intake? Bah. Trying again: Here's an alternative for consideration, below. The main differences are: - currently against an older kernel with pre-cgroup cpusets, so it uses tasklist_lock and do_each_thread(); a cgroup version would use cgroup iterators as yours does - solves the race between sched_setaffinity() and update_cpumask() by having sched_setaffinity() check for changes to cpuset_cpus_allowed() after doing set_cpus_allowed() - guarantees to only act on each process once (so guarantees forward progress in the absence of fork bombs, and could be adapted to handle fork bombs too) - uses a priority heap to pick the processes to act on, based on start time - uses lock_cpu_hotplug() to avoid races with CPU hotplug; sadly I think this is gone in more recent kernels, so some other synchronization would be needed Cause writes to cpuset "cpus" file to update cpus_allowed for member tasks: - collect batches of tasks under tasklist_lock and then call set_cpus_allowed() on them outside the lock (since this can sleep). - add a simple generic priority heap type to allow efficient collection of batches of tasks to be processed without duplicating or missing any tasks in subsequent batches. - avoid races with hotplug events via lock_cpu_hotplug() - make "cpus" file update a no-op if the mask hasn't changed - fix race between update_cpumask() and sched_setaffinity() by making sched_setaffinity() post-check that it's not running on any cpus outside cpuset_cpus_allowed(). 
include/linux/prio_heap.h | 56 + kernel/cpuset.c | 103 -- kernel/sched.c| 13 + lib/Makefile |2 lib/prio_heap.c | 68 ++ 5 files changed, 238 insertions(+), 4 deletions(-) --- /dev/null 1969-12-31 16:00:00.0 -0800 +++ linux/include/linux/prio_heap.h 2007-10-12 16:43:27.0 -0700 @@ -0,0 +1,56 @@ +#ifndef _LINUX_PRIO_HEAP_H +#define _LINUX_PRIO_HEAP_H + +/* + * Simple insertion-only static-sized priority heap containing + * pointers, based on CLR, chapter 7 + */ + +#include + +/** + * struct ptr_heap - simple static-sized priority heap + * @ptrs - pointer to data area + * @max - max number of elements that can be stored in @ptrs + * @size - current number of valid elements in @ptrs (in the range [EMAIL PROTECTED] + */ +struct ptr_heap { + void **ptrs; + int max; + int size; +}; + +/** + * heap_init - initialize an empty heap with a given memory size + * @heap: the heap structure to be initialized + * @size: amount of memory to use in bytes + * @gfp_mask: mask to pass to kmalloc() + */ +extern int heap_init(struct ptr_heap *heap, size_t size, gfp_t gfp_mask); + +/** + * heap_free - release a heap's storage + * @heap: the heap structure whose data should be released + */ +void heap_free(struct ptr_heap *heap); + +/** + * heap_insert - insert a value into the heap and return any overflowed value + * @heap: the heap to be operated on + * @p: the pointer to be inserted + * @gt: comparison operator, which should implement "greater than" + * + * Attempts to insert the given value into the priority heap. If the + * heap is full prior to the insertion, then the resulting heap will + * consist of the smallest @max elements of the original heap and the + * new element; the greatest element will be removed from the heap and + * returned. Note that the returned element will be the new element + * (i.e. no change to the heap) if the new element is greater than all + * elements currently in the heap. 
+ */ +extern void *heap_insert(struct ptr_heap *heap, void *p, +int (*gt)(void *, void *)); + + + +#endif /* _LINUX_PRIO_HEAP_H */ linux/kernel/cpuset.c --- linux/kernel/cpuset.c 2007-10-05 17:46:09.0 -0700 +++ linux/kernel/cpuset.c 2007-10-12 16:24:49.0 -0700 @@ -37,6 +37,7 @@ #include #include #include +#include #include #include #include @@ -839,6 +840,36 @@ unlock_cpu_hotplug(); } +static int inline started_after_time(struct task_struct *t1, +struct timespec *time, +struct task_struct *t2) +{ + int start_diff = timespec_compare(>start_time, time); + if (start_diff > 0) { + return 1; + } else if (start_diff < 0) { + return 0; + } else { + /* +* Arbitrarily, if two processes started at the same +* time, we'll
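The behaviour documented for heap_insert() above (keep the smallest @max elements, hand back whatever overflows) can be sketched in userspace C. This is not the lib/prio_heap.c from the patch, which is not shown in this message: malloc() stands in for kmalloc(), the gfp_mask argument is dropped, and int_gt() is a hypothetical comparator added only so the sketch can be exercised.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct ptr_heap {
	void **ptrs;	/* data area */
	int max;	/* capacity in elements */
	int size;	/* current number of valid elements */
};

/* size is in bytes, as in the header comment; returns 0 on success. */
static int heap_init(struct ptr_heap *heap, size_t size)
{
	heap->ptrs = malloc(size);
	if (!heap->ptrs)
		return -1;
	heap->size = 0;
	heap->max = (int)(size / sizeof(void *));
	return 0;
}

static void heap_free(struct ptr_heap *heap)
{
	free(heap->ptrs);
}

/* Max-heap on gt: ptrs[0] is always the greatest element held. */
static void *heap_insert(struct ptr_heap *heap, void *p,
			 int (*gt)(void *, void *))
{
	void **ptrs = heap->ptrs;
	void *res;
	int pos;

	assert(ptrs != NULL);
	if (heap->size < heap->max) {
		/* Room left: append at the end, then sift up. */
		pos = heap->size++;
		while (pos > 0) {
			int parent = (pos - 1) / 2;

			if (!gt(p, ptrs[parent]))
				break;
			ptrs[pos] = ptrs[parent];
			pos = parent;
		}
		ptrs[pos] = p;
		return NULL;
	}

	/* Full: a new element greater than everything held is rejected. */
	if (heap->max == 0 || gt(p, ptrs[0]))
		return p;

	/* Otherwise evict the greatest element and sift p down from the root. */
	res = ptrs[0];
	pos = 0;
	for (;;) {
		int child = 2 * pos + 1;

		if (child >= heap->size)
			break;
		if (child + 1 < heap->size && gt(ptrs[child + 1], ptrs[child]))
			child++;
		if (!gt(ptrs[child], p))
			break;
		ptrs[pos] = ptrs[child];
		pos = child;
	}
	ptrs[pos] = p;
	return res;
}

/* Hypothetical comparator for the sketch: order pointers to int by value. */
static int int_gt(void *a, void *b)
{
	return *(int *)a > *(int *)b;
}
```

Used the way the cover text describes, each pass over the task list would insert every candidate and keep only the batch with the earliest start times; anything returned by heap_insert() is simply left for a later batch.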
Re: [RFC] cpuset update_cgroup_cpus_allowed
Paul M wrote: > Here's an alternative for consideration, below. I don't see the alternative -- I just see my patch, with the added blurbage: #12 - /usr/local/google/home/menage/kernel9/linux/kernel/cpuset.c # action=edit type=text Should I be increasing my caffeine intake? -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401
Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)
Mikulas Patocka wrote: I know about unordered stores (movnti & similar) --- they basically use write-combining method on memory that is normally write-back --- and they need sfence. But which instruction does an unordered load and needs lfence? PREFETCHNTA. -hpa
[PATCH] ecryptfs: clean up attribute mess
It isn't that hard to add simple kset attributes, so don't go through
all the gyrations of creating your own object type and show and store
functions.  Just use the functions that are already present.  This makes
things much simpler.

Note, the version_str string violates the "one value per file" rule for
sysfs.  I suggest changing this now (individual files per type supported
is one suggested way.)

Cc: Michael A. Halcrow <[EMAIL PROTECTED]>
Cc: Michael C. Thompson <[EMAIL PROTECTED]>
Cc: Tyler Hicks <[EMAIL PROTECTED]>
Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>

---
 fs/ecryptfs/main.c |   88 -
 1 file changed, 20 insertions(+), 68 deletions(-)

--- a/fs/ecryptfs/main.c
+++ b/fs/ecryptfs/main.c
@@ -689,58 +689,14 @@ static int ecryptfs_init_kmem_caches(voi
 	return 0;
 }
 
-struct ecryptfs_obj {
-	char *name;
-	struct list_head slot_list;
-	struct kobject kobj;
-};
-
-struct ecryptfs_attribute {
-	struct attribute attr;
-	ssize_t(*show) (struct ecryptfs_obj *, char *);
-	ssize_t(*store) (struct ecryptfs_obj *, const char *, size_t);
-};
-
-static ssize_t
-ecryptfs_attr_store(struct kobject *kobj,
-		    struct attribute *attr, const char *buf, size_t len)
-{
-	struct ecryptfs_obj *obj = container_of(kobj, struct ecryptfs_obj,
-						kobj);
-	struct ecryptfs_attribute *attribute =
-		container_of(attr, struct ecryptfs_attribute, attr);
-
-	return (attribute->store ? attribute->store(obj, buf, len) : 0);
-}
+static decl_subsys(ecryptfs, NULL, NULL);
 
-static ssize_t
-ecryptfs_attr_show(struct kobject *kobj, struct attribute *attr, char *buf)
-{
-	struct ecryptfs_obj *obj = container_of(kobj, struct ecryptfs_obj,
-						kobj);
-	struct ecryptfs_attribute *attribute =
-		container_of(attr, struct ecryptfs_attribute, attr);
-
-	return (attribute->show ? attribute->show(obj, buf) : 0);
-}
-
-static struct sysfs_ops ecryptfs_sysfs_ops = {
-	.show = ecryptfs_attr_show,
-	.store = ecryptfs_attr_store
-};
-
-static struct kobj_type ecryptfs_ktype = {
-	.sysfs_ops = &ecryptfs_sysfs_ops
-};
-
-static decl_subsys(ecryptfs, &ecryptfs_ktype, NULL);
-
-static ssize_t version_show(struct ecryptfs_obj *obj, char *buff)
+static ssize_t version_show(struct kset *kset, char *buff)
 {
 	return snprintf(buff, PAGE_SIZE, "%d\n", ECRYPTFS_VERSIONING_MASK);
 }
 
-static struct ecryptfs_attribute sysfs_attr_version = __ATTR_RO(version);
+static struct subsys_attribute version_attr = __ATTR_RO(version);
 
 static struct ecryptfs_version_str_map_elem {
 	u32 flag;
@@ -753,7 +709,7 @@ static struct ecryptfs_version_str_map_e
 	{ECRYPTFS_VERSIONING_XATTR, "metadata in extended attribute"}
 };
 
-static ssize_t version_str_show(struct ecryptfs_obj *obj, char *buff)
+static ssize_t version_str_show(struct kset *kset, char *buff)
 {
 	int i;
 	int remaining = PAGE_SIZE;
@@ -780,34 +736,33 @@ out:
 	return total_written;
 }
 
-static struct ecryptfs_attribute sysfs_attr_version_str = __ATTR_RO(version_str);
+static struct subsys_attribute version_attr_str = __ATTR_RO(version_str);
+
+static struct attribute *attributes[] = {
+	&version_attr.attr,
+	&version_attr_str.attr,
+	NULL,
+};
+
+static struct attribute_group attr_group = {
+	.attrs = attributes,
+};
 
 static int do_sysfs_registration(void)
 {
 	int rc;
 
-	if ((rc = subsystem_register(&ecryptfs_subsys))) {
-		printk(KERN_ERR
-		       "Unable to register ecryptfs sysfs subsystem\n");
-		goto out;
-	}
-	rc = sysfs_create_file(&ecryptfs_subsys.kobj,
-			       &sysfs_attr_version.attr);
+	rc = subsystem_register(&ecryptfs_subsys);
 	if (rc) {
 		printk(KERN_ERR
-		       "Unable to create ecryptfs version attribute\n");
-		subsystem_unregister(&ecryptfs_subsys);
+		       "Unable to register ecryptfs sysfs subsystem\n");
 		goto out;
 	}
-	rc = sysfs_create_file(&ecryptfs_subsys.kobj,
-			       &sysfs_attr_version_str.attr);
+	rc = sysfs_create_group(&ecryptfs_subsys.kobj, &attr_group);
 	if (rc) {
 		printk(KERN_ERR
-		       "Unable to create ecryptfs version_str attribute\n");
-		sysfs_remove_file(&ecryptfs_subsys.kobj,
-				  &sysfs_attr_version.attr);
+		       "Unable to create ecryptfs version attributes\n");
 		subsystem_unregister(&ecryptfs_subsys);
-		goto out;
 	}
 out:
 	return rc;
@@ -815,10 +770,7 @@ out:
 
 static void do_sysfs_unregistration(void)
 {
-	sysfs_remove_file(&ecryptfs_subsys.kobj,
-			  &sysfs_attr_version.attr);
-	sysfs_remove_file(&ecryptfs_subsys.kobj,
-			  &sysfs_attr_version_str.attr);
+
Re: [git pull] scheduler updates for v2.6.24
On Tuesday 16 October 2007 00:17, Ingo Molnar wrote:
> Linus, please pull the latest scheduler git tree from:
>
>    git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched.git
>
> It contains lots of scheduler updates from lots of people - hopefully
> the last big one for quite some time. Most of the focus was on
> performance (both micro-performance and scalability/balancing), but
> there's the fair-scheduling feature now Kconfig selectable too. Find the
> shortlog below.

Nice work... However it's a pity all the balancing stuff got wildly
changed in 2.6.23 and then somewhat changed back again now. Despite
appearances, a lot of those things weren't actually *completely*
arbitrary values. I fear that it will make finding performance
regressions harder than it should have... Anyway.
Re: [PATCH] git scsi misc include fix
James wrote:
> The requirement for struct scatterlist is the same
> before and after the git scsi-misc patch.

Not so.

The git-scsi-misc.patch in 2.6.23-mm1 clearly adds the line:

    struct scatterlist sense_sgl;

as part of the added struct scsi_eh_save in scsi/scsi_eh.h.

This bit me while I was doing a bisection on 2.6.23-mm1, for another
problem, in git-sched, which is discussed in the lkml thread:

    git-sched patch won't boot on SN arch, 2.6.23-mm1

This is using sn2_defconfig.

The full 2.6.23-mm1 patch set builds ok, because another patch,
git-block.patch as I recall, includes scatterlist.h some other way,
but for the following range of patches in 2.6.23-mm1, on the
configuration sn2_defconfig, the build is broken, due to
'struct scatterlist' being an incomplete type:

    git-scsi-misc.patch
    git-scsi-misc-include-fix.patch
    git-scsi-misc-fixup.patch
    qla2xxx-printk-fixes.patch
    pci-error-recovery-symbios-scsi-base-support.patch
    pci-error-recovery-symbios-scsi-first-failure.patch
    nsp32_restart_autoscsi-remove-error-check.patch
    scsi-send-media-state-change-modification-events.patch
    scsi-early-detection-of-medium-not-present-updated.patch
    mptbase-reset-ioc-initiator-during-pci-resume.patch
    scsi-use-notifier-chain-for-asynchronous-event.patch
    initio-fix-conflict-when-loading-driver.patch
    git-block.patch

> it should also fail with vanilla 2.6.23

I don't know about the vanilla 2.6.23 case.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401
Re: [PATCH RFC 2/2] paravirt: clean up lazy mode handling
On Saturday 13 October 2007 06:40:36 Jeremy Fitzhardinge wrote:
> [ Changes since last post: fixed up lguest ]

This is really nice. Thanks Jeremy! This will conflict a little with my
own churn (file movement), but no great drama if it goes in soon.

Acked-by: Rusty Russell <[EMAIL PROTECTED]>
Re: [PATCH 11/11] maps3: make page monitoring /proc file optional
On Tuesday 16 October 2007 08:51:17 Jeremy Fitzhardinge wrote:
> Dave Hansen wrote:
> > On Mon, 2007-10-15 at 17:26 -0500, Matt Mackall wrote:
> >> +config PROC_PAGE_MONITOR
> >> +	default y
> >> +	bool "Enable /proc page monitoring" if EMBEDDED && PROC_FS && MMU
> >> +	help
> >> +	  Various /proc files exist to monitor process memory utilization:
> >> +	  /proc/pid/smaps, /proc/pid/clear_refs, /proc/pid/pagemap,
> >> +	  /proc/kpagecount, and /proc/kpageflags. Disabling these
> >> +	  interfaces will reduce the size of the kernel by approximately 4kb.
> >
> > How about pulling the EMBEDDED off there? I certainly want it for
> > non-embedded reasons. ;)
>
> That means it will only bother asking you if you've set EMBEDDED;
> otherwise it's always on.

But it's at the least confusing. Surely this option should depend on MMU
and PROC_FS, and the prompt depend on EMBEDDED? That might be implied by
the Kconfig layout, but AFAICT this patch removed the explicit MMU
dependency.

Rusty.
Re: [Linux-usb-users] OHCI root_port_reset() deadly loop...
From: David Brownell <[EMAIL PROTECTED]>
Date: Mon, 15 Oct 2007 16:39:10 -0700

> > Bad news, even with the rwsem after a lot more testing I can still
> > trigger the hang in ohci_hub_control() :-(
> >
> > I think we need to go back to considering the total serialization
> > approach to this problem.
>
> We shouldn't need that. What happens if you add an msleep(5)
> before ehci-hcd::ehci_run() drops ehci_cf_port_reset_rwsem?

What happens is the heisenbug will go away for another week.

> The theory there being that the switch triggered by setting CF
> doesn't take effect instantaneously, contrary to the effective
> assumption of that code. A delay of 5 msec seems like it should
> be more than enough, but that's kind of a guess ... it's good to
> keep that low, since unfortunately that's in the critical path
> for OLPC "resume from idle".

I want to help with this, but if I even breathe on the kernel the bug
goes away.

The race just gets harder to trigger, and if we just keep adding things
it'll make the problem go away but for the absolutely wrong reasons.

The only way we will provably fix this is to make sure EHCI initializes
fully, first, regardless of kernel config or what userland does.

Also, David, you haven't done anything with the feedback I gave to the
most recent revision of the OHCI hub reset anti-wedge patch. You removed
the debug logging when the outer-loop timeout expires, and I asked that
you put that back so that if it happens there is some chance to know
that this is what happened.

If it's not supposed to happen, there is no harm in putting the
debugging log message there so that if the impossible does happen we
find out about it.

I really don't think it's appropriate for that bug fix to sit yet
another week.
Re: [PATCH] git scsi misc include fix
On Mon, 15 Oct 2007 19:35:30 -0400 James Bottomley <[EMAIL PROTECTED]> wrote:

> On Sat, 2007-10-13 at 22:35 -0700, Paul Jackson wrote:
> > From: Paul Jackson <[EMAIL PROTECTED]>
> >
> > The added line in scsi_eh.h:
> >     struct scatterlist sense_sgl;
> > fails to compile, with the error:
> >     field 'sense_sgl' has incomplete type
> > unless scatterlist.h happens to be included
> > somehow already ... which it isn't always.
> >
> > So include scatterlist.h in scsi_eh.h directly.
> >
> > Signed-off-by: Paul Jackson <[EMAIL PROTECTED]>
> >
> > ---
> >
> > This patch goes after the patch 'git-scsi-misc.patch'
> >
> >  include/scsi/scsi_eh.h |    1 +
> >  1 file changed, 1 insertion(+)
> >
> > --- 2.6.23-mm1.orig/include/scsi/scsi_eh.h	2007-10-13 01:13:26.568876534 -0700
> > +++ 2.6.23-mm1/include/scsi/scsi_eh.h	2007-10-13 01:31:32.911855338 -0700
> > @@ -2,6 +2,7 @@
> >  #define _SCSI_SCSI_EH_H
> >
> >  #include
> > +#include
> >  struct scsi_device;
> >  struct Scsi_Host;
> >
>
> I've added linux-scsi which should be cc'd on all SCSI issues.
>
> I don't quite believe this, though. The requirement for struct
> scatterlist is the same before and after the git scsi-misc patch. If
> the compile fails with git-scsi-misc because of a missing scatterlist
> include, it should also fail with vanilla 2.6.23 without the git
> patch ... could you see if you can find out why it doesn't?
>

git-scsi-misc adds this:

struct scsi_eh_save {
	int result;
	enum dma_data_direction data_direction;
	unsigned char cmd_len;
	unsigned char cmnd[MAX_COMMAND_SIZE];

	void *buffer;
	unsigned bufflen;
	unsigned short use_sg;
	int resid;

	struct scatterlist sense_sgl;
};

which will not compile unless the includer has earlier included
scatterlist.h.
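For anyone following along, the compile failure comes down to the C rule
that a struct member held by value needs a complete type, while a
pointer member does not. Below is a minimal userspace sketch of that
rule. The struct names and the scatterlist layout are mock-ups invented
for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Forward declaration only: the type is incomplete at this point. */
struct scatterlist;

/* Fine even with an incomplete type: only a pointer is stored. */
struct eh_save_by_pointer {
	struct scatterlist *sense_sgl;
};

/*
 * Mock definition (illustrative layout, NOT the kernel's). In the kernel
 * this is what including scatterlist.h provides. Moving this definition
 * below struct eh_save_by_value would give the "field 'sense_sgl' has
 * incomplete type" error quoted above.
 */
struct scatterlist {
	unsigned long page_link;
	unsigned int offset;
	unsigned int length;
};

/* Embedding by value requires the complete definition to be visible. */
struct eh_save_by_value {
	int result;
	struct scatterlist sense_sgl;
};
```

This is also why a header can appear to work for a long time: as long as
every includer happens to pull in the full definition first, the missing
include goes unnoticed.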
Re: Killing a network connection
Andi Kleen <[EMAIL PROTECTED]> wrote:
> Stefan Monnier <[EMAIL PROTECTED]> writes:

>> The main use for me is to deal with dangling connections due to taking
>> network interfaces up with different IP addresses (typically the wlan0
>> interface where the IP is different because I've moved from an AP to
>> another). Of course, maybe there's another way to solve this particular
>> problem, in which case I'd like to hear about it as well.
>
> Long ago I did a 2.4 patch that solved exactly this problem. It introduced
> a new ifconfig flag "dynamic" and when a dynamic address went down
> all TCP connections originating from it were killed. It's still available
> in older SUSE releases. I might post a forward port later.

There is a /proc/sys/net/ipv4/ip_dynaddr sysctl in 2.6.21.

--
If at first you don't succeed, call it version 1.0

Eat this, spammer: [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED]
Re: What still uses the block layer?
[adding back CCs which were dropped because I'm stupid - sorry!]

On 10/16/07, Rob Landley <[EMAIL PROTECTED]> wrote:
> On Monday 15 October 2007 5:27:55 am Julian Calaby wrote:
> > On 10/15/07, Rob Landley <[EMAIL PROTECTED]> wrote:
> > > On Monday 15 October 2007 4:06:20 am Julian Calaby wrote:
> > > > On 10/15/07, Rob Landley <[EMAIL PROTECTED]> wrote:
> > > > > I note that the eth0 and eth1 names are dynamically assigned on a
> > > > > first come first serve basis (like scsi). This never causes me a
> > > > > problem because the driver loading order is constant, and once you
> > > > > figure out that eth0 is gigabit and eth1 is the 80211g it _stays_
> > > > > that way across reboots, reliably. Yeah, it's a heuristic. Hands up
> > > > > everybody relying on such a heuristic in the real world.
> > > >
> > > > Umm, not quite, from my experiences with pre-production wireless
> > > > drivers, (another story, another time) fancy stuff is being done in
> > > > udev to make sure that your gigabit card is always assigned to eth0.
> > >
> > > I remember building a 2.4 kernel, statically linking in all the drivers,
> > > and getting the ethernet devices showing up in a reliable order for
> > > years. Where does the need for fancy stuff come in?
> >
> > I remember that too. In fact, I have had no issues with network card
> > enumeration order, outside my own inexperience and stupidity.
> >
> > However, this sort of thing is needed now because of the various types
> > of hotpluggable networking devices, e.g. USB 802.11 cards, USB
> > ethernet cards, PCMCIA, etc.
>
> I thought the strategy was just to scan the hotpluggable busses after the
> non-hotpluggable busses.

My (practical) experience is that I couldn't guarantee which card was
which. (I remember once where it changed over a kernel re-compile) So my
solution, before Debian's persistent naming scheme appeared, was to
check it after every new kernel and make sure my config matched up with
the names of the physical interfaces.

> > And yes, PCMCIA worked fine for ages, but
> > usually you'd never have more than one PCMCIA network card.
>
> Still don't, but presumably the slots are scanned in a reliable order so if
> the cards are always present on bootup in the same slots, they'd stay in that
> order.

Well, yes and no. My gut feeling is that it's probed like PCI cards are.
They're initialised when the drivers are loaded, and not before, as
such, there are no guarantees which card will be initialised first. -
and anyway, what happens if you plug them in in a different order?

> > Personally, I use 2 different usb network cards, and I'm quite
> > comforted to know that the 802.11a one is always wlan0, and the
> > 802.11b/g one is always wlan1.
>
> So if I have a USB 100baseT adapter, and I boot with it plugged in, it'll
> potentially come before my built-in wireless card due to ordering based on
> device type?

Ok, firstly the 100baseT adapter will be named something like ethX, the
wireless card will most likely be named something like wlanX.

Now let's say your laptop has a built in ethernet card. So, we'll assume
a modular kernel, with the module "usbnet" for the usb card and "e100"
for the onboard card:

If the "usbnet" module is loaded first, then initially, according to the
kernel, the usb card will be eth0 and the built in one eth1.

Now let's assume that, on the PCI bus, the USB controller is in a lower
slot number than the network card. (highly likely, given that the
network card is most likely external to the chipset of the laptop) It's
pretty likely that the USB controller will have its module loaded first,
before the built in network card. At this point, it'll send out hotplug
events for all its children (root hubs, etc.) and eventually an event
will be sent out for the usb network card. Now, at this point, it's
impossible to say which one will claim eth0 first.

Now, in my case, with my two wireless cards, what happens if I plug the
802.11b/g one in first? If this fancy renaming didn't happen, it'd end
up with the name wlan0 and, hence, try to connect to the network which
the 802.11a one is supposed to connect to. This is not a good thing.

I also have to make the point that this has been happening all over the
kernel, well before I started using it. Video4Linux and DVB devices can
be USB, and the order the /dev/videoX nodes appear in is determined by
the plugging order. IRDA cards, sound cards, usb devices, framebuffers,
mice, keyboards, loopback devices, etc. all have the same "issue". (and
annoyingly, they all have different ways of getting around it, or not)

And to make one final point, getting right back to the initial parts of
the discussion, at the end of the day, your SATA disk, IDE disk, USB
disk and the CF card in your camera are all mass storage devices - they
all work in a fairly similar way. You want to mount filesystems from all
of them, and when you run low level tools, like parted or whatever, you
want them all to behave in
Re: [patch 1/4] Linux Kernel Markers - Architecture Independent Code
> I think the main issue with the solution you propose is that it doesn't
> deal with markers in modules, am I right ?

My suggestion applies as well to modules as anything else. What "like
Module.symvers" means is something like:

    name1	vmlinux	%s
    name2	fs/nfs/nfs	%d

All the modules built by the same kernel build go into this one file.
Modules packaged separately for the same kernel could provide additional
files of the same kind.

> I will soon come with a marker iterator and a module that provides a
> userspace -and in kernel- interface to enable/disable markers. Actually,
> I already have the code ready in my LTTng snapshots. I can provide a
> link if you want to have a look.

That's clearly straightforward to do given the basic markers data
structures. It does not address the need for an offline list of markers
available in a particular kernel build or set of modules that you are
not running right now. The approach now available for that is grovelling
through the markers data structures extracted from vmlinux and .ko ELF
files offline. That is more work than one should have to do, and has
lots of problems with coping with different packaging details, etc.

Thanks,
Roland
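To make the offline-list idea concrete, here is a rough sketch of how a
userspace tool might read one line of such a file. The exact layout
(marker name, then module path, then format string, separated by
whitespace) is an assumption based on the two example lines above, and
`struct marker_entry` and `parse_marker_line()` are hypothetical names,
not part of any existing tool.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one line of a Module.symvers-like marker list. */
struct marker_entry {
	char name[64];     /* marker name, e.g. "name1" */
	char module[128];  /* "vmlinux" or a module path such as "fs/nfs/nfs" */
	char format[128];  /* printf-style format string, may contain spaces */
};

/*
 * Parse "name <ws> module <ws> format" from one line.
 * Returns 0 on success, -1 if the line does not have all three fields.
 */
static int parse_marker_line(const char *line, struct marker_entry *out)
{
	int consumed = 0;
	size_t len;

	/* %n records where the format string starts; it is not counted. */
	if (sscanf(line, "%63s %127s %n", out->name, out->module, &consumed) != 2)
		return -1;
	if (consumed == 0 || line[consumed] == '\0')
		return -1;

	/* The format is the rest of the line, minus any trailing newline. */
	len = strcspn(line + consumed, "\n");
	if (len == 0 || len >= sizeof(out->format))
		return -1;
	memcpy(out->format, line + consumed, len);
	out->format[len] = '\0';
	return 0;
}
```

Taking the format as "rest of line" rather than a third whitespace
delimited token is a deliberate choice in this sketch, since marker
format strings can contain spaces.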
Re: [stable] [PATCH 00/12] xen/paravirt_ops patches for 2.6.24
On Tue, 2007-10-16 at 00:03 +0200, Andi Kleen wrote:
> > Subject: [PATCH 12/12] xfs: eagerly remove vmap mappings to avoid
> > upsetting Xen
>
> This should be probably done unconditionally because it's an undefined
> dangerous condition everywhere.

Should be done unconditionally. One could remap the underlying physical
space to include an MMIO region, and speculative reads from the
cacheable virtual mapping of that region could move the robot arm,
destroying the world.

Zach
Re: [PATCH 2/11] maps3: introduce task_size_of for all arches
On Mon, 15 Oct 2007, Matt Mackall wrote:

> Index: l/include/asm-mips/processor.h
> ===
> --- l.orig/include/asm-mips/processor.h	2007-10-09 17:37:58.0 -0500
> +++ l/include/asm-mips/processor.h	2007-10-10 11:46:30.0 -0500
> @@ -45,6 +45,8 @@ extern unsigned int vced_count, vcei_cou
>   * space during mmap's.
>   */
>  #define TASK_UNMAPPED_BASE	(PAGE_ALIGN(TASK_SIZE / 3))
> +#define TASK_SIZE_OF(tsk)						\
> +	(test_thread_flag(TIF_32BIT_ADDR) ? TASK_SIZE32 : TASK_SIZE)
>  #endif
>
>  #ifdef CONFIG_64BIT
> @@ -65,6 +67,8 @@ extern unsigned int vced_count, vcei_cou
>  #define TASK_UNMAPPED_BASE						\
>  	(test_thread_flag(TIF_32BIT_ADDR) ?				\
>  		PAGE_ALIGN(TASK_SIZE32 / 3) : PAGE_ALIGN(TASK_SIZE / 3))
> +#define TASK_SIZE_OF(tsk)						\
> +	(test_thread_flag(TIF_32BIT_ADDR) ? TASK_SIZE32 : TASK_SIZE)
>  #endif
>
>  #define NUM_FPU_REGS	32

These need to use test_tsk_thread_flag(tsk, TIF_32BIT_ADDR).

> Index: l/include/asm-parisc/processor.h
> ===
> --- l.orig/include/asm-parisc/processor.h	2007-10-09 17:36:49.0 -0500
> +++ l/include/asm-parisc/processor.h	2007-10-10 11:46:30.0 -0500
> @@ -32,7 +32,8 @@
>  #endif
>  #define current_text_addr() ({ void *pc; current_ia(pc); pc; })
>
> -#define TASK_SIZE	(current->thread.task_size)
> +#define TASK_SIZE_OF(tsk)	((tsk)->thread.task_size)
> +#define TASK_SIZE	(current->thread.task_size)
>  #define TASK_UNMAPPED_BASE	(current->thread.map_base)
>
>  #define DEFAULT_TASK_SIZE32	(0xFFF0UL)

TASK_SIZE_OF() should be defined in terms of TASK_SIZE, just like it is
for ia64.

> Index: l/include/asm-powerpc/processor.h
> ===
> --- l.orig/include/asm-powerpc/processor.h	2007-10-09 17:37:58.0 -0500
> +++ l/include/asm-powerpc/processor.h	2007-10-10 11:46:30.0 -0500
> @@ -99,7 +99,9 @@ extern struct task_struct *last_task_use
>   */
>  #define TASK_SIZE_USER32 (0x0001UL - (1*PAGE_SIZE))
>
> -#define TASK_SIZE (test_thread_flag(TIF_32BIT) ? \
> +#define TASK_SIZE	  (test_thread_flag(TIF_32BIT) ? \
> +		TASK_SIZE_USER32 : TASK_SIZE_USER64)
> +#define TASK_SIZE_OF(tsk) (test_tsk_thread_flag(tsk, TIF_32BIT) ? \
>  		TASK_SIZE_USER32 : TASK_SIZE_USER64)
>

Same.

>  /* This decides where the kernel will search for a free chunk of vm
> Index: l/include/asm-s390/processor.h
> ===
> --- l.orig/include/asm-s390/processor.h	2007-10-09 17:37:58.0 -0500
> +++ l/include/asm-s390/processor.h	2007-10-10 11:46:30.0 -0500
> @@ -75,6 +75,8 @@ extern struct task_struct *last_task_use
>
>  # define TASK_SIZE		(test_thread_flag(TIF_31BIT) ? \
>  					(0x8000UL) : (0x400UL))
> +# define TASK_SIZE_OF(tsk)	(test_tsk_thread_flag(tsk, TIF_31BIT) ? \
> +					(0x8000UL) : (0x400UL))
>  # define TASK_UNMAPPED_BASE	(TASK_SIZE / 2)
>  # define DEFAULT_TASK_SIZE	(0x400UL)
>

Same.
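To spell out the objection to the mips hunks: test_thread_flag() looks
at the flags of the calling task, so a TASK_SIZE_OF(tsk) built on it
silently ignores its argument. Here is a userspace sketch of the
difference, assuming a 64-bit task inspecting a 32-bit one. Every
`mock_`/`MOCK_` name and both size constants are invented for
illustration and are not any architecture's real values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mock stand-ins; the values are illustrative, not real arch limits. */
#define MOCK_TIF_32BIT   0
#define MOCK_TASK_SIZE32 UINT64_C(0x80000000)
#define MOCK_TASK_SIZE64 UINT64_C(0x10000000000)

struct mock_task {
	unsigned long flags;
};

/* Plays the role of the kernel's "current" task pointer. */
static struct mock_task *mock_current;

/* Analogue of test_tsk_thread_flag(): checks the task passed in. */
static bool mock_test_tsk_thread_flag(const struct mock_task *tsk, int flag)
{
	return (tsk->flags >> flag) & 1UL;
}

/* Analogue of test_thread_flag(): always checks the calling task. */
static bool mock_test_thread_flag(int flag)
{
	return mock_test_tsk_thread_flag(mock_current, flag);
}

/* Shape of the mips hunks as posted: the tsk argument is never used. */
static uint64_t task_size_of_current_only(const struct mock_task *tsk)
{
	(void)tsk;
	return mock_test_thread_flag(MOCK_TIF_32BIT) ?
		MOCK_TASK_SIZE32 : MOCK_TASK_SIZE64;
}

/* Shape being asked for: consult the flags of the task being queried. */
static uint64_t task_size_of(const struct mock_task *tsk)
{
	return mock_test_tsk_thread_flag(tsk, MOCK_TIF_32BIT) ?
		MOCK_TASK_SIZE32 : MOCK_TASK_SIZE64;
}
```

The two only disagree when the caller and the target have different
address-space widths, which is exactly the case a /proc reader looking
at another process can hit.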
[PATCH] sched domain sysctl: free kstrdup allocations
The procnames for the cpu and domain were allocated via kstrdup and so
should also be freed.  The names for the files are static, but we can
differentiate them by the presence of the proc_handler.  If a kstrdup
(of < 32 characters) fails the sysctl code will not see the procname or
remaining table entries, but any child tables and names will be
reclaimed upon free.

Signed-off-by: Milton Miller <[EMAIL PROTECTED]>
---

Hi Ingo.  It occurred to me this morning that the procname field was
dynamically allocated and needed to be freed.  I started to put in break
statements when allocation failed but it was approaching 50% error
handling code.

I came up with this alternative of looping while entry->mode is set and
checking proc_handler instead of ->table.  Alternatively, the string
version of the domain name and cpu number could be stored in the
structs.

I verified by compiling CONFIG_DEBUG_SLAB and checking the allocation
counts after taking a cpuset exclusive and back.

milton

Index: kernel/kernel/sched.c
===
--- kernel.orig/kernel/sched.c	2007-10-15 12:21:38.0 -0500
+++ kernel/kernel/sched.c	2007-10-15 12:22:12.0 -0500
@@ -5290,11 +5290,20 @@ static struct ctl_table *sd_alloc_ctl_en
 
 static void sd_free_ctl_entry(struct ctl_table **tablep)
 {
-	struct ctl_table *entry = *tablep;
+	struct ctl_table *entry;
 
-	for (entry = *tablep; entry->procname; entry++)
+	/*
+	 * In the intermediate directories, both the child directory and
+	 * procname are dynamically allocated and could fail but the mode
+	 * will always be set.  In the lowest directory the names are
+	 * static strings and all have proc handlers.
+	 */
+	for (entry = *tablep; entry->mode; entry++) {
 		if (entry->child)
 			sd_free_ctl_entry(&entry->child);
+		if (entry->proc_handler == NULL)
+			kfree(entry->procname);
+	}
 
 	kfree(*tablep);
 	*tablep = NULL;
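The freeing rule in the patch can be tried outside the kernel. The
following is a userspace sketch only: `struct ctl_entry`, `dup_name()`,
`dummy_handler()` and `free_ctl_entries()` are hypothetical stand-ins
for struct ctl_table, kstrdup(), a real proc handler and
sd_free_ctl_entry(), and the return value exists purely so the result
can be checked.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Userspace mock of the few struct ctl_table fields the patch touches. */
struct ctl_entry {
	const char *procname;       /* heap copy in directories, literal in leaves */
	unsigned int mode;          /* nonzero for every real entry; 0 terminates */
	struct ctl_entry *child;    /* sub-table for directory entries */
	int (*proc_handler)(void);  /* set only on leaf (file) entries */
};

/* Stand-in for kstrdup(): returns NULL on allocation failure. */
static char *dup_name(const char *s)
{
	char *copy = malloc(strlen(s) + 1);

	if (copy)
		strcpy(copy, s);
	return copy;
}

/* Placeholder handler so leaf entries can be told apart from directories. */
static int dummy_handler(void)
{
	return 0;
}

/*
 * Free a table and everything below it. Returns how many duplicated
 * names were released, purely so the behaviour can be checked.
 */
static int free_ctl_entries(struct ctl_entry **tablep)
{
	struct ctl_entry *entry;
	int freed = 0;

	/* Walk on mode, not procname: a failed dup_name() leaves procname NULL. */
	for (entry = *tablep; entry->mode; entry++) {
		if (entry->child)
			freed += free_ctl_entries(&entry->child);
		/* No handler means a directory, whose name was duplicated. */
		if (entry->proc_handler == NULL && entry->procname) {
			free((void *)entry->procname);
			freed++;
		}
	}
	free(*tablep);
	*tablep = NULL;
	return freed;
}
```

Looping on mode is what lets the walk continue past a directory entry
whose name allocation failed, so later entries and their children are
still reclaimed.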
Re: What still uses the block layer?
On Monday October 15, [EMAIL PROTECTED] wrote:
> > Therefore it is best to not have stable single-number naming schemes
> > for any devices on any machines. Why? Because it ensures there will
> > not be any second class citizens.
>
> This is where we disagree. The existence of devices you cannot stably
> enumerate does not eliminate the existence of devices you trivially can.

No, but it dramatically reduces the value of being able to enumerate
those devices.

>
> Pulling out the "IBM numa cluster with multiple SAS enclosures _and_
> firewire" infrastructure to find the root partition on my hard drive
> may be good for the IBM numa clusters, but only at the expense of
> complicating this part of my laptop's infrastructure by an order of
> magnitude, and making embedded systems nearly impossible to put
> together. If "one size fits all" were true, my cell phone would be
> running Red Hat Enterprise.
>
> > If some devices that are even reasonably common (e.g. IDE drives) are
> > stable, then some application developers or system integrators will
> > work under the assumption of stability and whatever they build will
> > break when you try it on different hardware.
>
> So you break the IDE drives to get laptop users to debug the Niagra set? The

Breaking old behaviour is always bad... My computers with IDE interfaces
still see stable "/dev/hda" devices. Are you saying the devices that
used to be "hda" are now "sdb" ?? Maybe there is a .config option...

> solution is to make the easy cases hard?

Is it really that hard?

> > Note that stable names are still a very real option. udev provides
> > several. /dev/disk-by-path/XXX will be stable for lots of "screwed
> > in" devices. /dev/disk-by-id will be stable for devices that report a
> > unique id. etc.
>
> Here it's
>
> ls /dev/disk/by-path/
> pci-:00:1f.2-scsi-0:0:0:0        pci-:00:1f.2-scsi-0:0:0:0-part4
> pci-:00:1f.2-scsi-0:0:0:0-part1  pci-:00:1f.2-scsi-0:0:0:0-part5
> pci-:00:1f.2-scsi-0:0:0:0-part2  pci-:00:1f.2-scsi-0:0:0:0-part6
> pci-:00:1f.2-scsi-0:0:0:0-part3  pci-:00:1f.2-scsi-1:0:0:0
>
> And this is an improvement?

Depends on your metric.

"Easy to type" - I guess /dev/hda1 wins hands down.

"Can be used in a script or config file and is guaranteed always to work
until a screwdriver is used to change that device or its controller" I
think /dev/disk/by-path/pci-:00:1f.2-scsi-0:0:0:0-part1 is quite
acceptable.

What is your metric?

>
> > The difference between IDE, SATA, SCSI and even USB is peripheral for
> > the large majority of uses, and I think maintaining the distinction in
> > the major/minor number or in the "primary" /dev name is - for the
> > above reasons - more of a cost than a value.
>
> Is your definition of "the large majority of uses" where ncr Voyager, the
> Amiga, and current macintosh laptops are all one use each, or is your
> definition of "the large majority of uses" the one where each "use" is an
> installation, of which there are millions of PCs (and even more ARM cell
> phones), and something like three instances of Voyager?

My definition of "the large majority of uses" is "mkfs, fsck, mount,
fdisk, system-install-process".

Different people differentiate devices in different ways. A system
integrator might know about the hardware path. An end user might know
about drive brands or sizes. A casual user might just think "internal or
external". The kernel cannot support all these different approaches to
naming. It really is best if it uses arbitrary names, and provides
access to descriptions that the user can choose between. udev
facilitates this with links in /dev/disk/. A system install can
facilitate this even more by reporting size/manufacturer information
etc.

>
> I realize that both views are valid. This is why the US has a house and a
> senate, and filters things through both views. My gripe is that forcing my
> laptop to look at my USB devices to find my SATA hard drive is aligned with
> only one of those viewpoints, and completely opposed to the other.

I'm guessing you are talking about mount-by-uuid? This effectively has
to look at the filesystem of all devices to discover which one has the
correct UUID, though it can cache the information for efficiency.

Maybe it is just an implementation issue. Suppose that every time a
device were discovered, it were examined to see what was stored on it,
and this information was stored in a cache. Then to find a particular
filesystem to mount, you just look in the cache and if the info isn't
there yet, just wait or fail as appropriate. Then we don't "look at my
USB devices to find my SATA hard drive" but rather "look at each device
as it is attached to find out what is in it", which seems like a
sensible thing to do...

>
> An approach that makes things much easier on laptops is seen to hurt big
> iron, not because it the approach itself has a direct negative impact on big
Lots of disk activity on resume from s2ram
Hi.

I've noticed that with recent kernels (starting somewhere between 2.6.20
and 2.6.22) I sometimes get *lots* of disk activity on resume from
suspend to ram. About 2/3 of the time, the system resumes normally but
in the remaining 1/3 of the time, the hard drive light stays on almost
solid and the machine is very, very slow to respond. The only way I've
found to reliably recover from this is if I can get to a command prompt
fast enough and do "shutdown -r now". There's not much cpu activity at
all, it just seems to be disk io that's killing interactivity.

This also happens if I resume just from an empty gnome desktop with no
applications running, so I don't think it's due to swapping.

Some kernels affected: 2.6.22, 2.6.23+hrt patches, Ubuntu Gutsy kernel
(2.6.22-14). These are all 64-bit kernels running on an HP nx6125 laptop
- single-core Turion 64 processor, 1 gig of ram. To the best of my
recollection, this problem did *not* appear with 2.6.20.

I put some dmesg output from the vm block dumping and some vmstat output
at http://nullinfinity.org/tmp/s2ram/ . The dmesg logs are from the same
resume, just a little while apart. The vmstat is from a different
resume.

Any workarounds, patches, tips for further debugging etc would be
appreciated.

- Johan
Re: [Linux-usb-users] OHCI root_port_reset() deadly loop...
> Bad news, even with the rwsem after a lot more testing I can still
> trigger the hang in ohci_hub_control() :-(
>
> I think we need to go back to considering the total serialization
> approach to this problem.

We shouldn't need that. What happens if you add an msleep(5) before
ehci-hcd::ehci_run() drops ehci_cf_port_reset_rwsem?

The theory there being that the switch triggered by setting CF doesn't
take effect instantaneously, contrary to the effective assumption of
that code. A delay of 5 msec seems like it should be more than enough,
but that's kind of a guess ... it's good to keep that low, since
unfortunately that's in the critical path for OLPC "resume from idle".

- Dave
Re: [PATCH 1/11] maps3: add proportional set size accounting in smaps
On Mon, 15 Oct 2007, Matt Mackall wrote:

> Index: l/fs/proc/task_mmu.c
> ===
> --- l.orig/fs/proc/task_mmu.c	2007-10-14 13:35:31.0 -0500
> +++ l/fs/proc/task_mmu.c	2007-10-14 13:36:56.0 -0500
> @@ -122,6 +122,27 @@ struct mem_size_stats
>  	unsigned long private_clean;
>  	unsigned long private_dirty;
>  	unsigned long referenced;
> +
> +	/*
> +	 * Proportional Set Size(PSS): my share of RSS.
> +	 *
> +	 * PSS of a process is the count of pages it has in memory, where each
> +	 * page is divided by the number of processes sharing it.  So if a
> +	 * process has 1000 pages all to itself, and 1000 shared with one other
> +	 * process, its PSS will be 1500.   - Matt Mackall, lwn.net
> +	 */
> +	u64 pss;
> +	/*
> +	 * To keep (accumulated) division errors low, we adopt 64bit pss and
> +	 * use some low bits for division errors. So (pss >> PSS_DIV_BITS)
> +	 * would be the real byte count.
> +	 *
> +	 * A shift of 12 before division means(assuming 4K page size):
> +	 * - 1M 3-user-pages add up to 8KB errors;
> +	 * - supports mapcount up to 2^24, or 16M;
> +	 * - supports PSS up to 2^52 bytes, or 4PB.
> +	 */
> +#define PSS_DIV_BITS	12
>  };
>

I know this gets moved again in the eighth patch of the series, but the
#define still has no place inside the struct definition.

The pss is going to need accessor functions, preferably inlined, and the
comment adjusted stating that all accesses should be through those
functions and not directly to the mem_size_stats struct.

static inline u64 pss_up(unsigned long pss)
{
	return pss << PSS_DIV_BITS;
}

static inline unsigned long pss_down(u64 pss)
{
	return pss >> PSS_DIV_BITS;
}
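As a sanity check on the fixed-point scheme, here is a userspace sketch
of the accessors together with the "1000 private pages plus 1000 pages
shared with one other process" example from the patch comment.
`pss_add_page()` is a hypothetical helper invented for this sketch, and
the sketch widens to 64 bits before shifting, which is an assumption
about what a 32-bit build would need rather than something taken from
the patch.

```c
#include <assert.h>
#include <stdint.h>

#define PSS_DIV_BITS 12	/* low bits hold the fractional part */

/* Scale a byte count up into the fixed-point pss representation. */
static inline uint64_t pss_up(unsigned long bytes)
{
	/* Widen first so the shift cannot overflow a 32-bit unsigned long. */
	return (uint64_t)bytes << PSS_DIV_BITS;
}

/* Scale a fixed-point pss value back down to whole bytes. */
static inline unsigned long pss_down(uint64_t pss)
{
	return (unsigned long)(pss >> PSS_DIV_BITS);
}

/* Hypothetical helper: account one page mapped by mapcount processes. */
static inline void pss_add_page(uint64_t *pss, unsigned long page_size,
				unsigned int mapcount)
{
	assert(mapcount > 0);
	*pss += pss_up(page_size) / mapcount;
}
```

With 4K pages, accumulating 1000 pages at mapcount 1 and 1000 pages at
mapcount 2 and then calling pss_down() gives the byte equivalent of 1500
pages, matching the figure in the comment.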