RE: Cosmetic JFFS patch.
On Thu, 28 Jun 2001, Laramie Leavitt wrote:
> > dmesg buffer space is rather limited and IMHO there isn't space to
> > waste on credit-giving in boot logs.
>
> Here here. You don't see annoying log-eating copyright messages
> printed out in the Windows boot. Just imagine:

There's a difference: someone paid for that Windows code, and you paid
to get Windows and don't care about who did what.

But when someone puts a lot of work into contributing something for free
which others find useful and actually use, don't you think it might be
prudent to let them at least write who contributed it, if a line is going
to be printed anyway to say that this or that device has been registered?

I know it sounds a bit like an "advertisement space", but it's always been
so; people have been releasing code for free since no one knows how long,
and often one major factor has been that their peers will go "wow, did you
do that?". Otherwise, why would anyone ever write their name in an About
box when they release a freeware program? And dmesg is the Linux kernel's
About box (someone might argue that the code is the About box, but
unfortunately most people don't read the headers in every .c file they
use).

See the old BSD license - distribution-wise it's more free than the GPL,
but you still had to give credit where credit is due when getting a free
lunch from someone else's work (I think this requirement was dropped in
the current BSD license).

The risk is that some people might take it quite personally to get their
names removed and might not be as interested in seeing their code in the
kernel in the future. Of course, as long as it's GPL nothing would stop it
anyway, but I still think it's a good idea to give credit for others'
hard work.

/Bjorn

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Via-rhine in 2.4.5 still requires cold-boot
Just for the record, the via-rhine.c in 2.4.5 still does not work if you
soft-boot the computer (at least on one machine here): the MAC address
shows up as 00:00:00:00:00:00 and it fails. A cold boot (power cable off,
no standby power) makes it work.

I read somewhere that we'd need to reload the EEPROM on the boards or
something if a cold boot solves the problem. Well, it does. :)

/BW
meaning of vmalloc shortcut comment in fault.c
Can someone elaborate on why it's bad to refer to tsk directly below (this
is a 2.4.5 change in x86), and why it's needed on x86 and not on other
archs? What should I do for an arch that does not have a "cr3" machine
register to check with?

/BW

vmalloc_fault:
	{
		/*
		 * Synchronize this task's top level page-table
		 * with the 'reference' page table.
		 *
		 * Do _not_ use "tsk" here. We might be inside
		 * an interrupt in the middle of a task switch..
		 */
		int offset = __pgd_offset(address);
		pgd_t *pgd, *pgd_k;
		pmd_t *pmd, *pmd_k;
		pte_t *pte_k;

		asm("movl %%cr3,%0":"=r" (pgd));
		pgd = offset + (pgd_t *)__va(pgd);
Re: USB requiring PCI
On Mon, 4 Jun 2001 [EMAIL PROTECTED] wrote:
> I don't know the details of the implementation, but the CRIS port
> (ETRAX 100LX) has support for USB but no PCI.

A builtin non-PCI USB host controller, that is. And the driver is in the
kernel, so we do support it as well :)

/BW

> > > AC> o Make USB require PCI (me)
> > > Huh?!
> > > How about people from StrongArm sa11x0 port, who have USB host
> > > controller (in sa companion chip) but do not have PCI?
> >
> > The strongarm doesnt have a USB master but a slave.
> >
> > > Probably there are more such embedded architectures with USB
> > > controllers, but not PCI bus.
> >
> > Currently we don't support any of them.
> >
> > > How about ISA USB host controllers?
> >
> > They do not exist.
Re: Missing cache flush.
On Tue, 5 Jun 2001, David Woodhouse wrote:
> The flash mapping driver arch/cris/drivers/axisflashmap.c uses a cached
> mapping of the flash chips for bulk reads, but obviously an uncached mapping
> for sending commands and reading status when we're actually writing to or
> erasing parts of the chip.
>
> However, it fails to flush the dcache for the range used when the flash is
> accessed through the uncached mapping. So after an erase or write, we may
> read old data from the cache for the changed area.

I'll start by saying that axisflashmap.c was not meant to be used by any
other archs; that's why it's in arch/cris. But if anyone finds it useful,
that's great. Just be aware that it's not _designed_ for general use, and
a problem like this may be exactly what that means.

CRIS is cache coherent just like the x86 cache and does not need any
explicit cache flushes for the write case. Even when doing cache-bypass
writes, if a cacheline already exists for the referenced memory, the
cacheline is updated.

In the erase case though, yes, there should be a flush. However, during
the 1-2 seconds it takes to erase a sector, you can with very high
certainty guarantee that the direct-mapped unified 8 kB cache on the CRIS
is flushed of any flash references at all.. I mean, it's one-way
associative, and during 1-2 seconds it executes potentially 200 million
instructions. So we haven't really bothered to think about the problem..

For other CPUs it might be more dangerous, although I don't hold my
breath.. 1-2 seconds is a long time when talking about L1 caches.

> However, I can't see a cache operation which performs this function.
> flush_dcache_page() is defined as a NOP on CRIS as, it seems, it is on most
> architectures. On other architectures, there's dma_cache_wback_inv(), but
> that also seems to be a NOP on i386, to pick a random example.

I'd agree that to be really certain, a "flush_dcache()" function should be
implemented and used when an erase finishes.

Like David Miller wrote somewhere in the thread, one way is to use your
knowledge of the arch's cache and do suitable dummy accesses to flush it,
if there is no explicit command to do it. But that's just up to the arch
coders..

-bw
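The "dummy accesses" idea can be sketched in a few lines of C. This is only
an illustration under stated assumptions, not code from axisflashmap.c or
from any kernel tree: the 8 kB direct-mapped size comes from the message
above, the 32-byte line size is a guess made for the example, and every
name starting with sketch_ or SKETCH_ is hypothetical. It also assumes the
scratch buffer is cacheable, covers every cache index, and does not overlap
the flash window.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed geometry, for illustration only: the 8 kB direct-mapped cache
 * mentioned in the message, with a hypothetical 32-byte line size. */
#define SKETCH_CACHE_SIZE 8192u
#define SKETCH_LINE_SIZE  32u

/*
 * Read one byte from every line of a cache-sized scratch buffer.
 * In a direct-mapped cache each read replaces whatever previously
 * occupied that line, so stale flash data is pushed out.
 * Returns the number of lines touched.
 */
size_t sketch_evict_by_dummy_reads(const volatile unsigned char *scratch,
                                   size_t cache_size, size_t line_size)
{
    size_t touched = 0;

    assert(scratch != NULL);
    assert(line_size > 0 && cache_size % line_size == 0);

    for (size_t off = 0; off < cache_size; off += line_size) {
        (void)scratch[off];   /* volatile read: the access is not optimized away */
        touched++;
    }
    return touched;
}
```

A caller would run this once after an erase completes, before reading the
erased sector back through the cached mapping. With a set-associative cache
the same trick needs a buffer several times the cache size, which is one
reason an explicit flush operation is the more robust choice where the
hardware offers one.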
Re: make menuconfig - cosmetic question
While we're on cosmetics... how about imprisonment for the person who
chose yellow on light grey for the first letters in each option...

/Bjorn

On Thu, 17 May 2001, Martin.Knoblauch wrote:
> this is most likely just a small issue. If I knew where to look, I
> would try to fix it and submit a patch :-)
>
> When I diff config files processed by "make [old]config" and "make
> menuconfig", it seems that menuconfig is not writing out some of the
> "comments" that the other versions do write. This is of course nothing
> serious, but it ticks me off. Any idea where to look for this glitch?
Re: [Question] Explanation of zero-copy networking
On Mon, 7 May 2001, Richard B. Johnson wrote:
> Basically, "no copy" is an academic exercise. It makes the first
> packet get sent more quickly, after which everything slows to
> the natural bandwidth of the system.
>
> If you used a server for multicast-only. In other words, you
> just spewed out unidirectional data, you still slow to the rate
> at which the media can take the data. And CPUs can obtain or
> generate these data a lot faster than 100-base can sink them.

This is an awfully PC-centric way of putting things. You assume that the
only ones who use Linux are those with a 1 GHz CPU and those 66 MHz PCI
boards and whatever. You simply cannot make that assumption anymore; the
diversity of Linux HW these days is so broad that the sweet spot between
CPU cycles, memory bandwidth etc., which controls the code optimization,
fluctuates wildly.

A simple kernel profile of one of our embedded Linux systems, for example,
shows csum_partial_copy limiting the performance. Now for us zero-copy
cannot be implemented anyway because we don't have a checksumming ethernet
controller, but if we had one, we could perhaps enhance performance by 50%
by skipping the copy. And there definitely are no 1 GHz embedded CPUs in
the same price range to choose instead, or Rambus memories etc.. raw power
simply is not an option sometimes.

It's still true of course that it's not obvious that the cycles spent on
copying can be used for anything better in all cases.

However, the beauty of open source is that there is no need to debate
whether something should be done or not. If someone feels the need, it
will be coded, and if it's good people will use it. In this case, if
anyone gets a 200% boost in performance, they probably won't listen to the
argument that "it's academic" afterwards :) And some others might go
twiddle their hardware and skip the zero-copy mechanism altogether.

-BW
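To make the cost being discussed concrete, here is a rough userspace
sketch of what a combined checksum-and-copy routine does: it touches every
byte once, copying it and folding it into the 16-bit one's-complement sum
used by IP, TCP and UDP. It is a portable illustration in the spirit of
RFC 1071, not the kernel's csum_partial_copy (which is hand-optimized per
architecture), and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Copy len bytes from src to dst while accumulating the 16-bit
 * one's-complement sum of the data (big-endian word order).
 * Returns the folded, non-inverted sum.
 * Sketch limitation: len must stay below 128 KiB so that the
 * 32-bit accumulator cannot overflow.
 */
uint16_t sketch_csum_and_copy(const uint8_t *src, uint8_t *dst, size_t len)
{
    uint32_t sum = 0;
    size_t i = 0;

    assert(src != NULL && dst != NULL);
    assert(len < 131072u);

    for (; i + 1 < len; i += 2) {
        dst[i] = src[i];
        dst[i + 1] = src[i + 1];
        sum += ((uint32_t)src[i] << 8) | src[i + 1];
    }
    if (i < len) {                      /* odd trailing byte, zero-padded */
        dst[i] = src[i];
        sum += (uint32_t)src[i] << 8;
    }
    while (sum >> 16)                   /* fold carries back into 16 bits */
        sum = (sum & 0xffffu) + (sum >> 16);

    return (uint16_t)sum;
}
```

On a CPU where memory bandwidth is the bottleneck, this loop is the whole
cost of sending a buffer. A controller that computes the checksum itself
lets the stack skip both the summing and the copy, which is where the
saving estimated in the message would come from.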
Re: Problem with map_user_kiobuf() not mapping to physical memory
On Wed, 2 May 2001, Terry Barnaby wrote:
> However, I note that if the user just mallocs memory and does not access
> it (No physical memory pages created) and then passes this virtual address
> space to the driver which performs a map_user_kiobuf() on it, the resulting
> kiobuf structure has all of the pagelist[] physical address entries set to
> the same value and the maplist[] entries set to 0. The devices access to
> this memory now causes system problems.
> Is map_user_kiobuf() working correctly ?
> Should I call some function to map the virtual address space into
> physical memory or at least pages before I call map_user_kiobuf() ?

No.. but you might just have done something wrong. See the example in

arch/cris/drivers/examples/kiobuftest.c

(that example does not deallocate the vectors properly IIRC, but the
actual kiobuf mapping sequence should work)

/BW
Re: ramdisk/tmpfs/ramfs/memfs ?
On Fri, 27 Apr 2001, Padraig Brady wrote:
> for a partition. If I understand correctly ramfs just points
> to the file data which are pages in the cache marked not to be

It does not even do that - as of 2.4, the VFS in the kernel also knows how
to cache the file structure itself. It's in the dentry cache.

So ramfs just provides the thin mapping between VFS operations and the VFS
caches (dentries, inodes, pages) like any other 2.4 filesystem - with the
difference that ramfs does not need to know anything about actually
transferring the cache entries to a backing store (a physical filesystem).

Take a look at fs/ramfs/inode.c; it's just a few hundred lines of code and
worth reading to find out more about how 2.4's VFS works.

> uncached. Doh! is ramfs supported in 2.2?

Don't think so, for the above reason.

-BW
Re: ramdisk/tmpfs/ramfs/memfs ?
On Thu, 26 Apr 2001, Padraig Brady wrote:
> I'm working on an embedded system here which has no harddisk.
> So, I can't swap to disk and need to have /var & /tmp in RAM.
> I'm confused between the various options for in RAM file-
> systems. At the moment I've created a ramdisk and made an
> ext2 partition in it (which is compressed as I applied the
> e2compr patch), which is working fine. Anyway questions:

Ouch.. yes, you had to do stuff like that in the old days, but it's very
cumbersome and inefficient compared to ramfs for what you're trying to do.

> 1. I presume the kernel is clever enough to not cache any
>    files from these filesystems? Would it ever need to?

You always need to "cache" pages read. A page is the smallest possible
granularity for the MMU, and a block-based filesystem does not need to be
page-aligned, so it's impossible to do it otherwise in a general way.

> 3. If I've no backing store (harddisk?) is there any advantage
>    of using tmpfs instead of ramfs? Also does tmpfs need a
>    backing store?

I don't know what tmpfs actually does, but if it is like you suggest (a
ramfs that can be swapped out?) then you obviously don't need it (since
you don't have any swap).

ramfs simply inserts any files written into the kernel's cache and tells
it not to forget them. It can't get much simpler than that.

> 5. Can you set size limits on ramfs/tmpfs/memfs?

I don't think you can set a limit in the current ramfs implementation, but
I think it would not be particularly difficult to make it work.

> 6. Is a ramdisk resizable like the others. If so, do you have
>    to delete/recreate or umount/resize a fs (e.g. ext2) every
>    time it's resized? Do ramfs/tmpfs/memfs do this transparently?
>    Are ramdisks resizable in kernel 2.2?

ramfs does not need any "resizing" because there is no filesystem behind
it. There is only the actual file data and metadata in the cache itself.
If you delete a file, it disappears; if you create a new one, new pages
are brought in.

> 7. What's memfs?
> 8. Is there a way I can get transparent compression like I now
>    have using a ramdisk+ext2+e2compr with ramfs et al?

You could try using jffs2 on a RAM-simulated MTD partition. I think that
would work, but I have not tried it..

> 9. Apart from this transparent compression, is there any other
>    functionality ext2 would have over ramfs for e.g, for /tmp
>    & /var? Also would ramfs have less/more speed over ext2?

ramfs has all the bells and whistles you need except size limiting, and
obviously it's faster than simulating a harddisk in RAM and using ext2 on
it..

-bw
[PATCH] drivers/ide/ide.c to work with more IDE controllers
Hi!

Problem description:

* drivers/ide/ide.c assumes the IDE controller is mapped in such a way
  that it can access it by "hardcoded" I/O commands (IN_BYTE/OUT_BYTE)

* drivers/ide/ide.c assumes that polled ide/atapi transfers should be
  done the way a PC would

* drivers/ide/Makefile assumes that all IDE DMA controllers are PCI

This makes it impossible to use, for example, the IDE driver for the Etrax
controller (arch/cris), which is not memory-mapped and is not PCI-based.

The following trivial patches (against 2.4.4-pre6, but probably applicable
to any 2.4.3-ac as well) fix the problem:

* In include/linux/ide.h, do #ifdef HAVE_ARCH_IN_BYTE etc. around the
  definitions of IN_BYTE and OUT_BYTE (allowing include/asm/ide.h to
  bypass the standard definition - see asm-cris/ide.h for an example)

* Add the "ideproc" entry in the HW driver structure, and let
  ide_input_bytes and friends in ide.c test that first. If it exists, it
  is used; otherwise the normal PC transfer is done

* In the Makefile, let ide-dma.c (which is really PCI DMA only) be
  included by CONFIG_BLK_DEV_IDEDMA_PCI instead of just
  CONFIG_BLK_DEV_IDEDMA

* (Un)related addition: add ide_etrax100 as a chipset enum and an init
  call to the etrax IDE driver under #ifdef CONFIG_ETRAX_IDE

Please comment. It should all be trivial, but there is one thing I'm
unsure about: whether it's guaranteed that the HWIF structures are nulled
upon creation (or maybe, whether the primordial HWIF is nulled when copies
are made). Obviously the above patch depends on every HWIF having NULL as
'ideproc' if it does not need any alternative function there.

Regards,
Bjorn

--- /home/bjornw/tmp/linux/drivers/ide/ide.c	Tue Apr 24 13:30:46 2001
+++ linux/drivers/ide/ide.c	Wed Apr  4 13:20:53 2001
@@ -374,7 +374,19 @@
  */
 void ide_input_data (ide_drive_t *drive, void *buffer, unsigned int wcount)
 {
-	byte io_32bit = drive->io_32bit;
+	byte io_32bit;
+
+	/* first check if this controller has defined a special function
+	 * for handling polled ide transfers
+	 */
+
+	if(HWIF(drive)->ideproc) {
+		HWIF(drive)->ideproc(ideproc_ide_input_data,
+				     drive, buffer, wcount);
+		return;
+	}
+
+	io_32bit = drive->io_32bit;
 
 	if (io_32bit) {
 #if SUPPORT_VLB_SYNC
@@ -407,7 +419,15 @@
  */
 void ide_output_data (ide_drive_t *drive, void *buffer, unsigned int wcount)
 {
-	byte io_32bit = drive->io_32bit;
+	byte io_32bit;
+
+	if(HWIF(drive)->ideproc) {
+		HWIF(drive)->ideproc(ideproc_ide_output_data,
+				     drive, buffer, wcount);
+		return;
+	}
+
+	io_32bit = drive->io_32bit;
 
 	if (io_32bit) {
 #if SUPPORT_VLB_SYNC
@@ -444,6 +464,12 @@
  */
 void atapi_input_bytes (ide_drive_t *drive, void *buffer, unsigned int bytecount)
 {
+	if(HWIF(drive)->ideproc) {
+		HWIF(drive)->ideproc(ideproc_atapi_input_bytes,
+				     drive, buffer, bytecount);
+		return;
+	}
+
 	++bytecount;
 #if defined(CONFIG_ATARI) || defined(CONFIG_Q40)
 	if (MACH_IS_ATARI || MACH_IS_Q40) {
@@ -459,6 +485,12 @@
 
 void atapi_output_bytes (ide_drive_t *drive, void *buffer, unsigned int bytecount)
 {
+	if(HWIF(drive)->ideproc) {
+		HWIF(drive)->ideproc(ideproc_atapi_output_bytes,
+				     drive, buffer, bytecount);
+		return;
+	}
+
 	++bytecount;
 #if defined(CONFIG_ATARI) || defined(CONFIG_Q40)
 	if (MACH_IS_ATARI || MACH_IS_Q40) {
@@ -2092,6 +2123,7 @@
 	hwif->maskproc = old_hwif.maskproc;
 	hwif->quirkproc = old_hwif.quirkproc;
 	hwif->rwproc = old_hwif.rwproc;
+	hwif->ideproc = old_hwif.ideproc;
 	hwif->dmaproc = old_hwif.dmaproc;
 	hwif->dma_base = old_hwif.dma_base;
 	hwif->dma_extra = old_hwif.dma_extra;
@@ -3193,6 +3225,12 @@
 	}
 #endif /* CONFIG_PCI */
 
+#ifdef CONFIG_ETRAX_IDE
+	{
+		extern void init_e100_ide(void);
+		init_e100_ide();
+	}
+#endif /* CONFIG_ETRAX_IDE */
 #ifdef CONFIG_BLK_DEV_CMD640
 	{
 		extern void ide_probe_for_cmd640x(void);
--- /home/bjornw/tmp/linux/include/linux/ide.h	Thu Jan  4 23:51:21 2001
+++ linux/include/linux/ide.h	Wed Apr 18 13:49:54 2001
@@ -133,14 +133,6 @@
 #define IDE_BCOUNTL_REG	IDE_LCYL_REG
 #define IDE_BCOUNTH_REG	IDE_HCYL_REG
-#ifdef REALLY_FAST_IO
-#define OUT_BYTE(b,p) outb((b),(p))
-#define IN_BYTE(p) (byte)inb(p)
-#else
-#define OUT_BYTE(b,p) outb_p((b),(p))
-#define IN_BYTE(p) (byte)inb_p(p)
-#endif
 /*
Re: Is there a way to turn file caching off ?
A similar phenomenon happens when you simply copy a file: file A is read into the cache and file B is written to the cache until memory runs out. Then both start to flush at the same time, creating a horrible performance hit (especially if A and B are on the same disk :)

I don't know a way to fix this except having the kernel correctly identify the access pattern and optimize for it (i.e. if it recognizes that cache pages are being flushed in order to make room for more pages from the same inode, then it's probably a suboptimal caching pattern, and it should instead increase the readahead and flush bigger chunks of pages at a time).

I don't think anything can be done to the write queue (except maybe making the kernel understand that seek time is more expensive than transfer time, so it does not schedule reads and writes for every other page..)

I'm still using 2.4.0 though, so maybe this behaviour has been fixed for the better in later kernels..

As a sidenote, try the same thing on a WinNT box and watch it die :) Like unpacking a 1 GB file on a machine with 128 MB RAM.. after it has unpacked the first 100 MB or so, performance drops to 1% or something..

-BW

On Tue, 17 Apr 2001, Laurent Chavet wrote:
> First cache grows to the size of RAM (2GB) with transfer rate
> slowing down as the cache grows.
> Then the transfer rates drops a lot (2 to 3 time slower than the
> drive capacity) and there is a very high CPU usage of system time (more
> than a CPU) used by bdflush and kswapd (and some others like kupdated).
parport initialisation
Hi,

regarding drivers/parport/*, is there any particular reason why the different parport drivers aren't initialized using module_init()? Like weird init order dependencies and stuff.

Looking at parport_init itself (which has hardcoded init calls to the different drivers right now), it does not look like it does anything particularly special except some proc filesystem registering.

Is it just because nobody has gotten around to "fixing" it, or is there a deeper reason?

Regards
Bjorn
Re: ERESTARTSYS question.
On Thu, 5 Apr 2001, Jani Monoses wrote:
> although the comments in errno.h say that ERESTARTSYS should not be seen
> by userland, many drivers seem to return it from their
> file_operations. Should glibc convert this errno so that the user program
> sees something meaningful? Because it does not. Is EINTR not a better errno
> to return from the drivers?

ERESTARTSYS is a part of the API between the driver and the signal-handling code in the kernel. It does not reach user-space (provided of course that it's used appropriately in the drivers :)

When a driver needs to wait, and gets woken by a signal (as opposed to what it's really waiting for), the driver should in most cases abort the system call so the signal handler can be run (like, you push ctrl-c while running something that's stuck in a wait for an interrupt). The kernel uses ERESTARTSYS as a "magic" value saying it's ok to restart the system call automagically after the signal handling is done. The actual return code is switched to EINTR if the system call could not be restarted.

-Bjorn
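The hand-off described in the reply above can be modelled in a few lines of user-space C. This is only a sketch of the decision, not kernel code: `after_signal`, `struct result` and the `handler_allows_restart` flag are names made up for the illustration, and the value 512 for ERESTARTSYS is assumed to match include/linux/errno.h.

```c
#include <errno.h>

/* Kernel-internal value; assumed here to match include/linux/errno.h. */
#define MODEL_ERESTARTSYS 512

enum outcome { RETURN_TO_USER, RESTART_SYSCALL };

struct result {
    enum outcome what;
    int user_errno;   /* errno user space would see, 0 if none */
};

/*
 * Hypothetical model of what happens once the signal handler has run.
 * driver_ret is the (negative) value the driver returned;
 * handler_allows_restart says whether the call may be restarted.
 */
struct result after_signal(int driver_ret, int handler_allows_restart)
{
    struct result r = { RETURN_TO_USER, 0 };

    if (driver_ret == -MODEL_ERESTARTSYS) {
        if (handler_allows_restart)
            r.what = RESTART_SYSCALL;   /* user space never sees an error */
        else
            r.user_errno = EINTR;       /* magic value is switched to EINTR */
    } else if (driver_ret < 0) {
        r.user_errno = -driver_ret;     /* ordinary errors pass through */
    }
    return r;
}
```

In this model the magic value never shows up in `user_errno`: the call is either restarted or reported as EINTR. From the user-space side, the restart flag roughly corresponds to installing the handler with SA_RESTART.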
Re: kernel/sched.c questions
On 4 Apr 2001, Andi Kleen wrote:
> > >> Hello, I would like to know why you put this two functions:
> > >> void scheduling_functions_start_here(void) { }
> > >> ...
> > >> void scheduling_functions_end_here(void) { }
> This is needed for a very bad hack to get the EIP information in ps -lax:
> most programs would be shown as hanging in schedule(), which would not be
> very useful to show the user. To avoid this sched.c is always compiled with
> frame pointers and if the EIP is inside these two functions the proc code
> goes back one level in the stack frame.

That sure is a very bad hack :) (For the original poster: search for get_wchan in the various ports)

There is no comment anywhere near it that says what it is MEANT to do. You can guess from the code and the usage that it has to do with stack-frames and special-casing the scheduler functions..

Thanks for the clarification.. now I can go and fix it in arch/cris :) (I had never seen the WCHAN field in ps before actually)

Just as a reference (everyone should get their daily dose of headache) here is the i386 version:

unsigned long get_wchan(struct task_struct *p)
{
	unsigned long ebp, esp, eip;
	unsigned long stack_page;
	int count = 0;
	if (!p || p == current || p->state == TASK_RUNNING)
		return 0;
	stack_page = (unsigned long)p;
	esp = p->thread.esp;
	if (!stack_page || esp < stack_page || esp > 8188+stack_page)
		return 0;
	/* include/asm-i386/system.h:switch_to() pushes ebp last. */
	ebp = *(unsigned long *) esp;
	do {
		if (ebp < stack_page || ebp > 8184+stack_page)
			return 0;
		eip = *(unsigned long *) (ebp+4);
		if (eip < first_sched || eip >= last_sched)
			return eip;
		ebp = *(unsigned long *) ebp;
	} while (count++ < 16);
	return 0;
}

-BW
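To make the frame walk in get_wchan above easier to follow, here is a user-space model of the same idea. It is a sketch under stated assumptions: the stack is replaced by an array of `struct frame` entries linked by index, and `wchan_model` and its parameters are hypothetical names, not anything from the kernel. The loop skips every return address that lies inside the scheduler range and reports the first one outside it.

```c
#include <stddef.h>

/* One saved frame on the simulated stack (hypothetical layout). */
struct frame {
    int prev;            /* index of the caller's frame, -1 if none */
    unsigned long ret;   /* return address stored in this frame */
};

/*
 * Walk the frame chain starting at `start` and return the first return
 * address outside [first_sched, last_sched). Returns 0 if the chain
 * leaves the array or no such address is found within 17 frames,
 * mirroring the bounded do/while loop in the i386 code.
 */
unsigned long wchan_model(const struct frame *stack, size_t nframes,
                          int start,
                          unsigned long first_sched,
                          unsigned long last_sched)
{
    int fp = start;
    int count = 0;

    do {
        unsigned long ret;

        if (fp < 0 || (size_t)fp >= nframes)
            return 0;                     /* frame pointer off the stack */
        ret = stack[fp].ret;
        if (ret < first_sched || ret >= last_sched)
            return ret;                   /* first caller outside sched.c */
        fp = stack[fp].prev;              /* go back one level */
    } while (count++ < 16);
    return 0;
}
```

With two frames whose return addresses fall inside the scheduler range followed by one outside it, the function returns the third address, which is what ps would then display as WCHAN.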
Re: CML1 cleanup patch, take 2
On Mon, 26 Mar 2001, Eric S. Raymond wrote:
> (2) Fix up 20 cris-architecture configuration symbols lacking a CONFIG_
> prefix, so they obey CML1/CML2 conventions and can be detected by
> `make dep', also static-analysis tools and consistency checkers.
> This is a BUG FIX in CML1.

No need for you to fret on this; it's partly fixed in the version in Alan's tree and the rest will be cleaned up in our next update.

-Bjorn
Re: CRAMFS
On Fri, 23 Mar 2001, David Woodhouse wrote:
> > 1. RAMFS is just more stable in terms of less complexity, less bugs reported
> > over the time, etc.
> > 2. RAMFS is a fairly robust filesystem and all features required as far as I can
> > tell.

Ok, ramfs is really simple, but heck, cramfs is not much more complex :) It's as simple a flash-filesystem as you can get.

I don't know why the comparison is made though, they are used for two completely different things... ramfs is for temporary file storage, cramfs is for immutable files stored on flash. Each by itself is quite optimal for what it's designed for, isn't it?

> I'm not aware of any bugs being found in cramfs recently - unless you
> wanted to use it on Alpha (or anything else where PAGE_SIZE != the
> hard-coded 4096 in mkcramfs.c).

I committed a patch that disappeared that added the choice of page size (trivial yes :), we have PAGE_SIZE == 8192 on our systems. Works fine.

> I wouldn't avoid it for those reasons - although if you're _really_ short
> of flash space, the same argument applies as for JFFS2 - a single
> compression stream (tar.gz) will be smaller than compressing individual
> pages like JFFS2 and cramfs do.

Here are some results from a quite mixed filesystem:

[bjornw@godzilla linux]$ ls -l cram*
-rw-r--r-- 1 bjornw users 1179648 Mar 23 22:38 cram32768
-rw-r--r-- 1 bjornw users 1282048 Mar 23 22:38 cram4096
-rw-r--r-- 1 bjornw users 1220608 Mar 23 22:38 cram8192

(the numbers correspond to blocksize)

There's not any big difference here. With bigger files though, the difference gets larger. YMMV.

Notice that you can change cramfs so it uses a blocksize that is bigger than PAGE_SIZE, of course, if it really is necessary. You'll get worse performance at run-time though, since you need to cache the page and hope for read-ahead or similar (you can stuff the pages in the page-cache even if they are not requested, for example).

-BW
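To put a number on "not any big difference" for the three image sizes listed above, the relative savings can be worked out with a one-line helper. This is plain arithmetic on the sizes from the ls output; `percent_smaller` is a name chosen for this sketch.

```c
/* Percentage by which `size` is smaller than `baseline`. */
double percent_smaller(unsigned long baseline, unsigned long size)
{
    return 100.0 * ((double)baseline - (double)size) / (double)baseline;
}
```

Taking the 4096-byte image (1282048 bytes) as the baseline, the 8192-byte blocksize saves a little under 5% and the 32768-byte blocksize a little under 8% on this particular filesystem.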
Re: PROBLEM: select() on TCP socket sleeps for 1 tick even if data available
On Sat, 20 Jan 2001, Martin MaD Douda wrote:
> On Fri, 19 Jan 2001, Michael Lindner wrote:
> > data is generated as a result of data received via a select(),
> > the next delivery occurs a clock tick later, with the machine
> > mostly idle.
>
> The machine is in fact not idle - there is a task running - idle task.
> Could the problem be that scheduler does not preempt this task to run
> something more useful?

Normally, the "idle task" (task[0]) does this pseudo-code:

while(1) {
	if(need_resched)
		schedule();
}

to minimize latency out of idle, so if that actually is running it should not be a problem (unless need_resched is not set by the wakeup calls).

Perhaps the kapm-idled kernel thread is killing your latency; you could try disabling APM, and APM-making-idle-calls especially. Also check ps aux and see if anything else is taking your idle CPU %.

-BW
Re: setfsuid on ext2 weirdness (2.4)
On Mon, 8 Jan 2001, Linus Torvalds wrote:
> Please show them, anyway. What does "ls -ld / /etc /etc/passwd" say?

Heh... /etc and /etc/passwd were all right... but / was fscked (or not, maybe :)

drwx------ 500 0

so it was both locked from other users and had 500 as owner..

> 99% says that one of the three will be wrong (probably "/", because you
> probably checked the others already and overlooked root), and you'll
> feel really silly.

Dunno how that ever happened (unpacking a bad tar-ball maybe) but it's fixed now and Linux 2.4.0 is completely without blame! :) I'm stupendously silly but that's just normal. Also, it's another warm unix experience to cherish..

Thanks for the hint!

> And hey, if you think the above is confusing, try making your /dev/null
> a regular (writable) file by mistake. Now THAT will be confusing as

Been there, got the t-shirt :)

/BW
setfsuid on ext2 weirdness (2.4)
Ok.. I'm going bananas. It could be a 4am braindeath or a rh7.0 bungholio but this is annoying:

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/fsuid.h>

int main(int argc, char **argv)
{
	int fd;
	setfsuid(atoi(argv[1]));
	fd = open("/etc/passwd", O_RDONLY);
	printf("got fd %d\n", fd);
	return 0;
}

[root@wizball /root]# ./setfstest 0
got fd 3
[root@wizball /root]# ./setfstest 500
got fd 3
[root@wizball /root]# ./setfstest 501
got fd -1

0 is obviously my root user and 500 is the standard user I log in with. 501 exists (not that that has anything to do with this).

In fact, 0 and 500 are the ONLY ones that let a filesystem op through after the setfsuid call. All others cause an EACCES error on the open (or any other fs op). And yes, the actual file permissions on /etc and /etc/passwd are correct.

The consequence is that I can't log in as any other user (or ftp, or anything that needs to change the uids) :(

So... the quick question is... is there anything in EXT2 or VFS that can cause a quite normal ext2 filesystem on a 2.4.0 kernel to behave remotely like this?

strace shows the setfsuid call succeeds and nothing funny happens.

[root@wizball /root]# strace ./setfstest 501
execve("./setfstest", ["./setfstest", "501"], [/* 38 vars */]) = 0
uname({sys="Linux", node="wizball.xxx.yyy.zzz", ...}) = 0
brk(0) = 0x80496c8
open("/etc/ld.so.preload", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=32172, ...}) = 0
old_mmap(NULL, 32172, PROT_READ, MAP_PRIVATE, 3, 0) = 0x40018000
close(3) = 0
open("/lib/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\0\301\1"..., 1024) = 1024
fstat64(3, {st_mode=S_IFREG|0755, st_size=4851725, ...}) = 0
old_mmap(NULL, 1217864, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x4002
mprotect(0x4014, 38216, PROT_NONE) = 0
old_mmap(0x4014, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x11f000) = 0x4014
old_mmap(0x40146000, 13640, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x40146000
close(3) = 0
munmap(0x40018000, 32172) = 0
getpid() = 1739
setfsuid32(0x1f5) = 0
open("/etc/passwd", O_RDONLY) = -1 EACCES (Permission denied)
Re: IDE-driver not generalized enough ?
On Mon, 27 Nov 2000, Andre Hedrick wrote:
> Yes, I have been working on that for some time.
> This requires that the macros be exported the arch-xxx/ide.h
> Additionally it takes more work to modify the request_io and release_io,
> but it is all doable.

Right on! Do you think it would be too big a performance hit if OUT_BYTE actually was an hwif function call instead of a macro? OUT_BYTE has more to do with the specific hw interface than the system architecture, really.

Actually the entire hwif_unregister function should be handled by the hwif itself I guess (haven't noticed that yet since I never unregister my drivers :)

My "hack" right now involves putting "magic" values in the io_ports array so that OUT_BYTE separates them correctly (my controller has ONE address where a 32-bit write does the commands, with a bitfield controlling the IDE bus address instead of splitting into 7 + 1 separate addresses).

BTW can ide_register_hw be called from the automatic "module_init" chains during bootup, or is that too early or too late? It would be nice if that was the case, because otherwise we need to add to the long list in probe_for_hwifs with initialization calls.

-BW

> On Tue, 28 Nov 2000, Bjorn Wesen wrote:
> > Hi! Quick question: is it possible to write an IDE driver for a controller
> > that is not mappable using outp and those memory-mapped thingys ?
> >
> > I see all the nice overrideables in struct hwif_s but the main code still
> > uses OUT_BYTE which is hardcoded to an outb_p.. non-overrideable. Same
> > thing with ide_input/output_bytes, they do direct in/out accesses also
> > without consulting any hwif specific routine.
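The question above, whether OUT_BYTE could be a per-hwif function call instead of a macro, comes down to a function-pointer hook with a default fallback. The sketch below models that in user-space C. Everything in it is hypothetical: `struct hwif_model`, `out_byte_fn`, and the bit layout used by `packed_out_byte` are invented for the illustration and are not taken from the IDE driver or from any real controller; hardware writes are replaced by recording the last port and value.

```c
#include <stddef.h>

struct hwif_model;

/* Signature of a per-interface register write hook (hypothetical). */
typedef void (*out_byte_fn)(struct hwif_model *hwif,
                            unsigned char value, unsigned long port);

struct hwif_model {
    out_byte_fn out_byte;        /* NULL: fall back to the default write */
    unsigned long last_port;     /* recorded instead of touching hardware */
    unsigned long last_word;
};

/* Stand-in for the generic port-write path: one byte to one port. */
void default_out_byte(struct hwif_model *hwif,
                      unsigned char value, unsigned long port)
{
    hwif->last_port = port;
    hwif->last_word = value;
}

/*
 * Stand-in for a controller with a single command register: the IDE
 * register number is packed into a bitfield of one 32-bit word.
 * The layout (register in bits 8..10) is an assumption for this sketch.
 */
void packed_out_byte(struct hwif_model *hwif,
                     unsigned char value, unsigned long port)
{
    hwif->last_port = 0;                              /* the one real address */
    hwif->last_word = ((port & 0x7UL) << 8) | value;
}

/* What an overridable OUT_BYTE could look like: hook first, default otherwise. */
void hwif_out_byte(struct hwif_model *hwif,
                   unsigned char value, unsigned long port)
{
    if (hwif->out_byte)
        hwif->out_byte(hwif, value, port);
    else
        default_out_byte(hwif, value, port);
}
```

The cost relative to the macro is one pointer test plus, when a hook is installed, one indirect call per register access; interfaces that leave the hook NULL keep the default behaviour.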
Re: Address translation
On Thu, 23 Nov 2000, Andreas Bombe wrote: > > I may be wrong on this, but I thought that copy_{to,from}_user are > > only necessary if the address range you are accessing might cause a > > fault which Linux cannot handle (ie. one which would cause the > > application to segfault if it accessed that memory). If it is only a > > It is wrong. copy_*_user handle the page faults, whether they are good > faults (swapped out, copy on write) or bad faults (illegal access). > Without these macros you get the "unable to handle kernel page fault" > oops message if a fault occurs. Yes but only if it's a real fault, not if the address range actually is a valid VMA which needs paging, COW'ing or related OS ops. copy_*_user does not do the access in any different way than a "manual" access or memcpy does, it just adds a .fixup section that tells the do_page_fault handler that it should not segfault the kernel itself if the copy takes a big fault at any point, instead it should jump to the fixup which makes the copy routine return an error message. However, the fixup stuff is not in-line with the copy code so there should be absolutely no penalty using copy_*_user instead of a memcpy (provided the copy_*_user is as optimized as the memcpy code), and it's dangerous to assume anything about pages visible in user-space, they might be unmapped by another thread while you're doing that memcpy etc. > > (1) In a "top half" thread, can I now access this memory without the > > access macros (since I know the address range is valid)? > > The address is valid, the pages probably aren't. In fact, extending the > address space only creates read-only mappings to the global zeroed page > if I remember right. But it does not matter that the pages aren't there physically, any kind of access (including an access from kernel-mode) will bring about the same COW/change-on-write mechanism as copy_to_user or a user-mode access would. 
The problem is rather that between your do_brk and when you access the
pages, a thread in the process might do an unmap or brk to remove the
mapping, and then you crash the kernel.

> > (2) Can I also access this memory from an interrupt/exception
> > context, or must I lock it? (ie. can faults be handled from such
> > a context)
>
> You can't even use copy_*_user in this context (since the current user
> space might be any process, even kernel threads that have no user space
> at all).
>
> For access to user memory from interrupt context at all and to access
> user memory without the uaccess macros, you have to lock them down in
> memory, with map_user_kiobuf(). This is only recommended if you want
> hardware to DMA to/from buffers provided by user space.

Yup, if you are in the wrong context or in an interrupt context you'll die
horribly if you try to access user-pages that aren't there:

    if (in_interrupt() || !mm)
        goto no_context;

So you need to

1) make sure the pages are in physical memory,
2) make sure the pages won't get removed from under your feet at any time,
   and
3) access them using their physical address.

> > (3) Is the above code sensible at all, or barking? It took me a while
> > to figure that the above would work, and I think/hope it is the
> > most elegant way to share memory between kernel and a process.
>
> It will fail quickly, probably taking the kernel down with it.
>
> The most elegant way to share memory between user and kernel is to
> allocate the memory in the kernel and map it to user space (by
> implementing mmap on the kernel side for the file used for
> communication).

Agreed, but that does not cut it for some applications. For example, let's
say you want to grab 16 MB of video frames without copying them from that
mmap area to your malloc'ed 16 MB (let's say your CPU takes a pretty big hit
doing that extra memcpy) and you'd like to DMA directly into the user-pages.
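For reference, the user-space half of the mmap approach quoted above might look like the sketch below. It is an illustration under assumptions, not driver code: a temporary file stands in for the driver's device node (a real driver would supply its own mmap file operation backed by kernel-allocated pages), and map_shared() and demo_shared_mapping() are hypothetical helper names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map len bytes of fd shared and writable; returns NULL on failure. */
static void *map_shared(int fd, size_t len)
{
    void *p;

    assert(len > 0);
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    return p == MAP_FAILED ? NULL : p;
}

/*
 * Writes through the mapping, then reads the same bytes back through the
 * file descriptor to show that both views share the underlying pages.
 * Returns 0 on success, -1 on any failure.
 */
static int demo_shared_mapping(void)
{
    const size_t len = 4096;
    char check[6] = { 0 };
    ssize_t got;
    char *buf;
    int fd;
    FILE *f = tmpfile(); /* stand-in for opening the driver's device node */

    if (f == NULL)
        return -1;
    fd = fileno(f);
    if (ftruncate(fd, (off_t)len) != 0) {
        fclose(f);
        return -1;
    }
    buf = map_shared(fd, len);
    if (buf == NULL) {
        fclose(f);
        return -1;
    }
    memcpy(buf, "frame", 6);   /* producer side: fill the shared buffer */
    msync(buf, len, MS_SYNC);  /* flush the mapping before reading via the fd */
    got = pread(fd, check, sizeof check, 0);
    munmap(buf, len);
    fclose(f);
    return (got == 6 && memcmp(check, "frame", 6) == 0) ? 0 : -1;
}
```

With a real driver the process would read frames straight out of buf, but it would still have to memcpy them into its own cache, which is exactly the extra copy the example above is trying to avoid.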
You can of course make the kernel grab 16 MB worth of pages for you and then
mmap them into the process, but the kernel driver would be pretty hooked to
that demanding user process then..

Actually I'm trying to figure out the best way to do a similar thing for
some hardware we have - I have incoming DMA data containing JPG grabs, and I
want to cache images in a user-mode daemon, which will send pictures from
the cache out on TCP. The images might be generated with many different JPG
settings, so they need the right tags in the cache etc.

Before, when we ran on a chip without an MMU, this was easy - that user-mode
buffer was a contiguous physical area which I could DMA directly into. But
now that we're going to a CPU with an MMU, it gets more complicated of
course :)

I have figured the options are

1) let the kernel driver have a buffer big enough for a single grab, mmap
   this into user-space and do the memcpy into the cache (might be fast
   enough, but our chip isn't super on memcpy's..)
2) let the kernel driver lock down the user-pages in an access and DMA directly
Re: Q: network drivers interface changes
On Wed, 27 Sep 2000, Hen, Shmulik wrote:

> Is there a good source of information that describes the changes in network
> driver interface between 2.2.x and 2.4.x kernels ?

Try a diff -u of skeleton.c in both kernels. If the skeleton driver is
correct, that is :) It didn't look very complicated from 2.0 -> 2.4 at
least, so 2.2 -> 2.4 should not be difficult at all.

-Bjorn