Re: [PATCH 1/2] msi: Invert the sense of the MSI enables.
On May 24, 2007, at 10:51 PM, Andi Kleen wrote:

>> Do we have a feel for how much performance we're losing on those
>> systems which _could_ do MSI, but which will end up defaulting
>> to not using it?
>
> At least on 10GB ethernet it is a significant difference; you usually
> cannot go anywhere near line speed without MSI.
>
> I suspect it is visible on high performance / multiple GB NICs too.

Why would that be? As the packet rate goes up and NAPI polling kicks in, wouldn't MSI make less and less difference?

I like the fact that MSI gives us finer control over CPU affinity than many INTx implementations, but that's a different issue.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Apr 15, 2007, at 10:59 AM, Linus Torvalds wrote:

> It's a really good thing, and it means that if somebody shows that
> your code is flawed in some way (by, for example, making a patch that
> people claim gets better behaviour or numbers), any *good* programmer
> that actually cares about his code will obviously suddenly be very
> motivated to out-do the out-doer!

"No one who cannot rejoice in the discovery of his own mistakes deserves to be called a scholar."
--Don Foster, "literary sleuth", on retracting his attribution of "A Funerall Elegye" to Shakespeare (it's more likely John Ford's work).
RE: x86 TSC time warp puzzle
At 3:13 AM -0500 4/2/05, Lee Revell wrote:
>On Fri, 2005-04-01 at 23:05 -0800, Pallipadi, Venkatesh wrote:
>> It can be SMI happening in the platform. Typically BIOS uses some SMI
>> polling to handle some devices during early boot. Though 500 microseconds
>> sounds a bit too high.
>
>Nope, that sounds just about right. Buggy BIOSes that implement ACPI
>via SMM (or so I have been told) can stall the machine for over a
>millisecond, this is why some laptops lose timer ticks at HZ=1000. The
>issue is well known by Linux audio users, as it causes big problems for
>people who buy laptops for live audio use.

This is a desktop board, and this is well after boot (hours). Also, ACPI is disabled in the BIOS. I suppose I can try to disable SMI via the APIC?

--
/Jonathan Lundell.
x86 TSC time warp puzzle
Well, not actually a time warp, though it feels like one.

I'm doing some real-time bit-twiddling in a driver, using the TSC to measure out delays on the order of hundreds of nanoseconds. Because I want an upper limit on the delay, I disable interrupts around it. The logic is something like:

    local_irq_save
    out(set a bit)
    t0 = TSC
    wait while (t = (TSC - t0)) < delay_time
    out(clear the bit)
    local_irq_restore

From time to time, when I exit the delay, t is *much* bigger than delay_time. If delay_time is, say, 300ns, t is usually no more than 325ns. But every so often, t can be 2000, or 1, or even much higher.

The value of t seems to depend on the CPU involved. The worst case is with an Intel 915GV chipset, where t approaches 500 microseconds (!).

This is with ACPI and HT disabled, to avoid confounding interactions. I suspected NMI, of course, but I monitored the nmi counter, and mostly saw nothing (from time to time a random hit, but mostly not).

The longer delay is real. I can see the bit being set/cleared in the pseudocode above on a scope, and when the long delay happens, the bit is set for a correspondingly long time.

BTW, the symptom is independent of my IO. I wrote a test case that does nothing but read the TSC, and I get the same result.

Finally, on some CPUs, at least, the extra delay appears to be periodic. The 500us delay happens about every second. On a different machine (chipset) it happens at about 5 Hz. And the characteristic delay on each type of machine seems consistent.

Any ideas of where to look? Other lists to inquire on?

Thanks.

--
/Jonathan Lundell.
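For anyone who wants to reproduce the symptom without the driver, here is a minimal userland sketch of the "nothing but reading the counter" test case. It is not the original test code. It assumes POSIX clock_gettime(CLOCK_MONOTONIC) as a portable stand-in for reading the TSC, it cannot disable interrupts, and the names now_ns, busy_wait_ns and count_overshoots are made up for illustration.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Read a monotonic clock in nanoseconds (assumed stand-in for the TSC). */
static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Spin until at least delay_ns has elapsed; return the time actually spent. */
static uint64_t busy_wait_ns(uint64_t delay_ns)
{
    uint64_t t0 = now_ns();
    uint64_t t;

    while ((t = now_ns() - t0) < delay_ns)
        ;   /* busy-wait, mirroring the pseudocode loop */
    return t;
}

/*
 * Repeat the delay and count the runs that overshoot delay_ns by more
 * than slack_ns. The longest observed run is stored in *worst_ns.
 */
static unsigned count_overshoots(uint64_t delay_ns, uint64_t slack_ns,
                                 unsigned iterations, uint64_t *worst_ns)
{
    unsigned hits = 0;

    assert(worst_ns != NULL);
    *worst_ns = 0;
    for (unsigned i = 0; i < iterations; i++) {
        uint64_t t = busy_wait_ns(delay_ns);

        if (t > *worst_ns)
            *worst_ns = t;
        if (t > delay_ns + slack_ns)
            hits++;
    }
    return hits;
}
```

Calling count_overshoots(300, 25, 1000000, &worst) and printing the hit count and worst case over a few seconds is one way to see whether the outliers are periodic. In userland, ordinary preemption will also show up as outliers, so this only narrows things down.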
Re: [PATCH] Minor cleanup and export three functions
At 3:03 AM +0100 2001-07-20, Anton Altaparmakov wrote:
>I do apologize. I didn't realize pine would do this. In pine I can just
>read the attachment as text and in Eudora it just appears as inlined
>text without any indication of it being a separate attachment, so I just
>assumed that it was sent clear text. Obviously not. Stupid mailers. Grr.

Eudora does leave you one little clue:

At 2:19 AM +0100 2001-07-20, Anton Altaparmakov wrote:
>MIME-Version: 1.0
>Content-Type: MULTIPART/MIXED;
>BOUNDARY="-559023410-1804928587-995591940=:20239"

--
/Jonathan Lundell.
Re: [Acpi] Re: ACPI fundamental locking problems
At 3:26 AM -0400 2001-07-08, Alexander Viro wrote:
>On Sat, 7 Jul 2001, Jamie Lokier wrote:
>
>> Daniel Phillips wrote:
>> > > Reading a tarball is the distillation of what you describe into
>> > > efficient form :)
>> >
>> > /me downloads tar file definition
>> >
>> > Um, gnu tar or posix tar? or some new, improved tar?
>>
>> I suggest cpio, which is more compact and in some ways more standard.
>> (tar has a silly pad-to-multiple-of-512-byte per file rule, which is
>> inappropriate for this). GNU cpio creates cpio format just fine.
>
>GNU cpio is a race-ridden unmaintained pile of junk. Look at the size
>of, say it, Debian patch to upstream source. Then try to read the
>patched code. Quite a few of us simply don't have that FPOS on their
>boxen.
>
>Using cpio archive layout is OK, but _please_, don't make it dependent
>on GNU cpio.

If size is an issue (and of course it is), presumably the archive would be compressed. As long as tar can be convinced to pad with (say) nulls, the padding shouldn't have that much of an impact on archive size.

--
/Jonathan Lundell.
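To put rough numbers on the padding rule mentioned above, here is a small sketch of the per-member size arithmetic. It assumes the usual tar layout of one 512-byte header block per member followed by the data rounded up to a whole block; the function names are hypothetical, and it says nothing about how well the padding compresses.

```c
#include <assert.h>
#include <stdint.h>

#define TAR_BLOCK 512u  /* block size from the pad-to-512 rule quoted above */

/* Bytes one member occupies: header block plus data rounded up to a block. */
static uint64_t tar_member_size(uint64_t data_len)
{
    uint64_t data_blocks = (data_len + TAR_BLOCK - 1) / TAR_BLOCK;

    return TAR_BLOCK + data_blocks * TAR_BLOCK;
}

/* Padding bytes appended after the data to reach the next block boundary. */
static uint64_t tar_padding(uint64_t data_len)
{
    return (TAR_BLOCK - data_len % TAR_BLOCK) % TAR_BLOCK;
}
```

For a 1-byte file this gives 1024 bytes on disk, 511 of them padding, which is the uncompressed worst case the cpio suggestion is aimed at. If the padding is all nulls, a compressor should reduce most of it to very little.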
Re: [PATCH] proc_file_read() (Was: Re: proc_file_read() question)
At 10:07 AM +0200 2001-06-27, Martin Wilck wrote:
>On Tue, 26 Jun 2001, Jonathan Lundell wrote:
>
>> I use the hack myself, to implement a record-oriented file where the
>> file position is a record number. I could probably live with
>> PAGE_SIZE, but the current hack works fine with start bigger than
>> that, and it's possible that someone counts on it.
>
>Ok, let's use PAGE_OFFSET instead of PAGE_SIZE, then (see new patch
>below).
>Unless I'm misled, legitimate values of "start" as a pointer are always
>larger than that, and I can hardly imagine a case where the "unsigned
>int" value of start must be greater than PAGE_OFFSET.

PAGE_OFFSET definitely works for me, but a quick scan of the headers suggests that non-sun3 m68k builds define PAGE_OFFSET as 0, as does s390. Maybe you want max(PAGE_SIZE, PAGE_OFFSET).

>I insist that relying on the comparison of two pointers is the wrong
>thing. If (as you suggest) the major use of "start" has migrated from the
>original intention to that of the "hack", this should be reflected
>in the interface by making the "start" parameter to read_proc ()
>an unsigned long. Everything else is misleading and error-prone.
>For now, "start" is a char* and should be treated as such.

That's the hack, though. Rusty should chime in, but the implicit restriction on start in the original hack (by the time we get to the test we're talking about) is that it's either a pointer of the form page+offset, where offset < PAGE_SIZE, or it's a (relatively) small file offset. That's a reasonable assumption given that the procedure is dynamically allocating page. After all, why would you allocate the buffer and then not use it?

Sure, the overloading is self-admittedly hacky, but (again I assume) the motivation was to avoid breaking the clients, many of which are not in the kernel.org tree.

Your proposed change overloads a third interpretation on start, namely an arbitrary pointer, outside the page allocation.

>> But if you're allocating your own buffer, you'd probably be better
>> off writing your own file ops, and not using the default
>> proc_file_read() at all. At the very least you'd save a redundant
>> __get_free_page/free_page pair.
>
>That's right, but nevertheless (repeat) comparing "start" and "page" is
>wrong.

Not given the implied restriction that, if start is a pointer at all, it's a pointer within page's allocation. And after all, PAGE_OFFSET is effectively a pointer.

--
/Jonathan Lundell.
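As a footnote to the max(PAGE_SIZE, PAGE_OFFSET) suggestion and the "pointer within page's allocation" restriction, here is a userland sketch of the two checks. It is not kernel code: addresses are modelled as uintptr_t values, the helper names are hypothetical, and the example values (a 4096-byte page, 0xC0000000 as a typical i386 PAGE_OFFSET, 0 for the m68k/s390 case mentioned above) are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Threshold below which start is read as a file offset, not a pointer. */
static uintptr_t offset_limit(uintptr_t page_size, uintptr_t page_offset)
{
    return page_offset > page_size ? page_offset : page_size;
}

/* True when start should be interpreted as an offset under that limit. */
static bool start_is_offset(uintptr_t start, uintptr_t limit)
{
    return start < limit;
}

/* The implied restriction: a pointer-valued start lies inside page. */
static bool start_is_within_page(uintptr_t start, uintptr_t page,
                                 uintptr_t page_size)
{
    return start >= page && start - page < page_size;
}
```

With PAGE_OFFSET at 0 the limit falls back to PAGE_SIZE, so a record number of 5000 would no longer be treated as an offset on those builds, which is the case the max() is meant to soften rather than fully solve.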
Re: [comphist] Re: Microsoft and Xenix.
At 10:44 AM -0400 2001-06-26, Rob Landley wrote:
>"A quarter century of unix" mentions RK05 cartridges several times, but never
>says much ABOUT them.
>
>Okay, so they're 2.4 megabyte removable cartridges? How big? Are they tapes
>or disk packs? (I.E. can you run off of them or are they just storage?) I
>know lots of early copies of unix were sent out from Bell Labs on RK05
>cartridges signed "love, ken"...

http://www.pdp8.net/rk05/rk05.shtml

>What was that big reel to reel tape they always show in movies, anyway?

The big-refrigerator-sized guys were generally attached to mainframes, IBM or otherwise. Here's a little info:
http://www.digital-interact.co.uk/site/html/reference/media_9trk.html
(but take it with a grain of salt; IBM surely didn't go to nine tracks because of ASCII!).

--
/Jonathan Lundell.
Re: [PATCH] proc_file_read() (Was: Re: proc_file_read() question)
At 7:14 PM +0200 2001-06-26, Martin Wilck wrote:
>Hi,
>
>> Shhh ;-) Last time that hack was mentioned, someone wanted to _remove_
>> it. It's a very nice little hack to have around, and IKD uses it.
>
>I am not saying it should be removed. But IMO it is a legitimate (if
>not the originally intended) use of "start" to serve as a pointer to
>a memory area allocated in the proc_read () function. This use is broken
>with this hack in its current form, because reading from such a file
>will fail depending on the (random) order of the page and start pointers.
>
>If I understand the "hack" right, legitimate offsets generated for it
>are always between 0 and PAGE_SIZE. Therefore the patch below would
>not break it, while overcoming the abovementioned problem, because
>legitimate page pointers will never be < PAGE_SIZE.
>
>Please correct me if I'm wrong.

I use the hack myself, to implement a record-oriented file where the file position is a record number. I could probably live with PAGE_SIZE, but the current hack works fine with start bigger than that, and it's possible that someone counts on it.

But if you're allocating your own buffer, you'd probably be better off writing your own file ops, and not using the default proc_file_read() at all. At the very least you'd save a redundant __get_free_page/free_page pair.

>Cheers,
>Martin
>
>--
>Martin Wilck <[EMAIL PROTECTED]>
>FSC EP PS DS1, Paderborn Tel. +49 5251 8 15113
>
>
>--- linux-2.4.5/fs/proc/generic.c Mon Jun 25 13:46:26 2001
>+++ 2.4.5mw/fs/proc/generic.c Tue Jun 26 20:42:22 2001
>@@ -104,14 +104,14 @@
>    * return the bytes, and set `start' to the desired offset
>    * as an unsigned int. - [EMAIL PROTECTED]
>    */
>-   n -= copy_to_user(buf, start < page ? page : start, n);
>+   n -= copy_to_user(buf, (unsigned long) start < PAGE_SIZE ? page : start, n);
>    if (n == 0) {
>        if (retval == 0)
>            retval = -EFAULT;
>        break;
>    }
>
>-   *ppos += start < page ? (long)start : n; /* Move down the file */
>+   *ppos += (unsigned long) start < PAGE_SIZE ? (unsigned long) start : n; /* Move down the file */
>    nbytes -= n;
>    buf += n;
>    retval += n;

--
/Jonathan Lundell.
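To make the difference between the two tests in the patch concrete, here is a userland sketch, not kernel code. Addresses are modelled as uintptr_t so the comparison is well defined, the 4096-byte page size is an assumption, and the helper names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size, illustration only */

/* Unpatched test: start is taken as an offset whenever it is below page. */
static bool is_offset_original(uintptr_t start, uintptr_t page)
{
    return start < page;
}

/* Patched test: start is taken as an offset only when below PAGE_SIZE. */
static bool is_offset_patched(uintptr_t start)
{
    return start < SKETCH_PAGE_SIZE;
}

/* File position advance: by start when it is an offset, otherwise by n. */
static uint64_t ppos_advance(bool start_is_offset, uintptr_t start, uint64_t n)
{
    return start_is_offset ? (uint64_t)start : n;
}
```

Two cases separate the tests. A private buffer that happens to sit below page is misread as an offset by the original test, which is Martin's complaint. A record number above PAGE_SIZE stops being an offset under the patched test, which is the use I'd rather not break.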
Re: [OT] Re: When the FUD is all around (sniff).
At 4:02 PM +0100 2001-06-26, Alan Cox wrote:
>> > There is a saying in the UK 'You can fool all of the people some of the
>> > time, you can fool some of the people all the time, but you cannot fool all
>> > of the people all of the time'.
>>
>> Didn't Abraham Lincoln say that? :)
>
>[Digs]
>Indeed in 1864.

Perhaps, perhaps not.

http://www.usnews.com/usnews/issue/970217/17linc.htm

>What Zall did with the plethora of Lincoln anecdotes--include and
>evaluate the apparently authentic, delete the seemingly
>apocryphal--other historians are doing with collections of his
>words. Their task is daunting: No American is more quoted--or
>misquoted--than Lincoln. Their work also is important: The image of
>Lincoln, the historical as well as the mythical, has been shaped to
>an uncommon degree by statements that other people put in his mouth,
>often to suit their own purposes.
>
>Stanford's Don Fehrenbacher and his wife, Virginia, spent 12 years
>compiling the Recollected Words of Abraham Lincoln (Stanford
>University Press, 1996, $60), a collection of 1,900 quotations
>attributed to Lincoln by more than 500 of his contemporaries. The
>scholars rated the authenticity of quotations with letter grades: A
>for a direct quote the listener wrote down soon after hearing it; B
>for a quickly recorded indirect quote; C for quotes reported weeks,
>months, or years later; D for one "about whose authenticity there is
>more than average doubt"; E for those "probably not authentic."
>
>No fooling. One now familiar line the Fehrenbachers examined was far
>from familiar to 19th-century America: "You can fool all the people
>some of the time and some of the people all of the time, but you
>can't fool all the people all of the time." The saying apparently
>first emerged in print in 1901 in Lincoln's Yarns and Stories; the
>book identified the person who allegedly heard Lincoln as "a caller
>at the White House." Years later, two old-timers claimed they had
>heard Lincoln say it in an 1856 address in Illinois, but a news
>account of the speech didn't mention it. The Fehrenbachers give the
>old-timers' recollections a D. The evidence, the scholars say,
>"suggests that this is a case of reminiscence echoing folklore or
>fiction."

--
/Jonathan Lundell.
[OT] Re: When the FUD is all around (sniff).
At 8:59 AM -0600 2001-06-26, Jordan Crouse wrote:
>> There is a saying in the UK 'You can fool all of the people some of the
>> time, you can fool some of the people all the time, but you cannot fool all
>> of the people all of the time'.
>
>Didn't Abraham Lincoln say that? :)

That's the common, but doubtful, attribution.

--
/Jonathan Lundell.
Re: Q serial.c
At 9:51 AM -0400 2001-06-22, Stuart MacDonald wrote:
>From: "kees" <[EMAIL PROTECTED]>
>> What may happen on a SMP machine if a serial port has been closed and the
>> closing stage is at shutdown() in serial.c in the call to free_IRQ and
>> BEFORE the IRQ is really shutdown, a new character arrives which causes an
>> IRQ? Is it possible that the OTHER cpu takes this interrupt and causes a
>> crash?
>
>I'm looking at serial-5.05/serial.c. You'll notice at the
>beginning of shutdown the saveflags(); cli(); calls.
>This disables interrupts. The uart will not be able to
>generate IRQs even if new characters arrive.

The other CPU servicing the interrupt, was the question. cli() doesn't affect that. This could presumably happen if shutdown() gets run on a non-interrupt-servicing CPU, or if interrupts are dynamically routed (eg round-robin).

Where can I find the 5.05 driver?

--
/Jonathan Lundell.
Re: mktime in include/linux
At 1:43 PM +0200 2001-06-22, Erik Mouw wrote:
>On Thu, Jun 21, 2001 at 10:30:40PM -0400, Rick Hohensee wrote:
>> Why does Linux have a mktime routine fully coded in linux/time.h that
>> conflicts directly with the ANSI C standard library routine of the same
>> name? It breaks a couple things against libc5, including gcc 3.0. OK, you
>> don't care about libc5. It's still pretty weird. Wierd? Weird.
>
>This has been brought up many times on this list: you are not supposed
>to include kernel headers in userland.

That's not the problem, I think. Most of time.h, including the definition of mktime, is #ifdef __KERNEL__, so it shouldn't be breaking anything in userland even if you do include it. And you might, in order to obtain the interface definition of struct timespec.

What's weird is: why is __KERNEL__ getting #defined in Rick's userland?

There can't, of course, be any blanket prohibition against using kernel headers in userland. Think about ioctl.h, for example.

--
/Jonathan Lundell.
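A small sketch of why the #ifdef __KERNEL__ guard should keep the two mktime definitions apart. This is not the contents of linux/time.h: the guarded declaration below is an illustrative stand-in with an invented signature, and the helper names are hypothetical.

```c
#include <assert.h>
#include <time.h>   /* declares the ANSI C mktime(struct tm *) */

#ifdef __KERNEL__
/*
 * Kernel-only section. A declaration reusing the name mktime with a
 * different signature would clash with <time.h> if this were compiled.
 * The signature here is illustrative, not copied from the kernel header.
 */
unsigned long mktime(unsigned int year, unsigned int mon, unsigned int day);
#endif

/* Report whether the kernel-only section above was compiled in. */
static int kernel_section_visible(void)
{
#ifdef __KERNEL__
    return 1;
#else
    return 0;
#endif
}

/* Call the libc mktime; returns 1 when it yields a valid time_t. */
static int libc_mktime_works(void)
{
    struct tm tm = {0};

    tm.tm_year = 2001 - 1900;   /* years count from 1900 */
    tm.tm_mon = 5;              /* June: months count from 0 */
    tm.tm_mday = 22;
    tm.tm_hour = 12;
    tm.tm_isdst = -1;           /* let the library decide about DST */
    return mktime(&tm) != (time_t)-1;
}
```

Built normally, the guarded declaration is skipped and the libc mktime is the only one in scope. Built with -D__KERNEL__, the conflicting declaration becomes visible and the compile fails, which would match the breakage Rick describes if something in his build is defining __KERNEL__.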
Re: Controversy over dynamic linking -- how to end the panic
At 8:06 PM +0100 2001-06-21, Alan Cox wrote:
>>> the stdio.h, I'd tell him to go screw himself.
>> What is the difference between including kernel header file and
>> including GPLed header file?
>
>There are real differences between programs and interface definitions. At this
>point you get into law and the like and its probably best you read up on it
>from a reputable source not l/k

Though header files don't fall clearly on the interface-definition side of the line. ctype.h, for example, in userland, or any other header with #defined or inline code.
--
/Jonathan Lundell.
Re: Alan Cox quote? (was: Re: accounting for threads)
At 9:09 AM -0700 2001-06-19, Larry McVoy wrote:
>Don't you think it is funny that Sun doesn't publish numbers comparing
>their thread performance to process performance? Sure, you can find
>context switch benchmarks where they have user level switching going on
>but those are a red herring. The real numbers you want are the kernel
>level context switches and those are just as expensive as the process
>context switch numbers.

Sun (or at least SPARC) is a bit of a special case, though. SPARC's register-window architecture makes thread-switching (not to mention recursion) significantly more expensive than on most other architectures.
--
/Jonathan Lundell.
Re: any good diff merging utility?
At 2:34 AM +0200 2001-06-18, Ivan Vadovic wrote:
>Very often the case is that they indeed can be merged automagically.
>For example two patches inserting few lines right after the #include
>lines.
>
>patch1:
>@@ 10,1 10,2 @@
> #include <foo.h>
>+#include <1.h>
>
>patch2:
>@@ 10,1 10,2 @@
> #include <foo.h>
>+#include <2.h>
>
>The patch will fail to patch :-). But there is no real conflict between
>the patches.

Problem is, you can't tell automatically. Even if the diffs don't conflict physically, it's entirely possible that they conflict logically.
--
/Jonathan Lundell.
Re: Going beyond 256 PCI buses
At 10:14 AM -0400 2001-06-14, Jeff Garzik wrote:
>According to the PCI spec it is -impossible- to have more than 256 buses
>on a single "hose", so you simply have to implement multiple hoses, just
>like Alpha (and Sparc64?) already do. That's how the hardware is forced
>to implement it...

That's right, of course. A small problem is that dev->slot_name becomes ambiguous, since it doesn't have any hose identification. Nor does it have any room for the hose id; it's fixed at 8 chars, and fully used (bb:dd.f\0).
--
/Jonathan Lundell.
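The space problem can be sketched in a few lines of C. The 8-byte size and the bb:dd.f layout come from the message above; the function names and the hose-qualified "hh:bb:dd.f" layout are hypothetical, made up only to show that nothing longer fits.

```c
#include <stdio.h>

#define SLOT_NAME_LEN 8   /* "bb:dd.f" (7 chars) plus the terminating NUL */

/* Formats bus:device.function into an 8-byte buffer.
 * Returns the number of characters the full name needs, excluding the NUL. */
int format_slot_name(char buf[SLOT_NAME_LEN],
                     unsigned int bus, unsigned int dev, unsigned int fn)
{
    return snprintf(buf, SLOT_NAME_LEN, "%02x:%02x.%x",
                    bus & 0xffu, dev & 0x1fu, fn & 0x7u);
}

/* Hypothetical hose-qualified name "hh:bb:dd.f" (not an existing kernel
 * format). snprintf truncates if len is too small, and still returns the
 * length that would have been needed. */
int format_hose_slot_name(char *buf, size_t len, unsigned int hose,
                          unsigned int bus, unsigned int dev, unsigned int fn)
{
    return snprintf(buf, len, "%02x:%02x:%02x.%x",
                    hose & 0xffu, bus & 0xffu, dev & 0x1fu, fn & 0x7u);
}
```

With an 8-byte buffer, `format_slot_name` uses every byte, while `format_hose_slot_name` reports that it needs 10 characters and leaves a truncated string behind, so two devices on different hoses could no longer be told apart by name alone.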
Re: Configure.help is complete
At 2:59 PM +0200 2001-06-01, David Weinehall wrote:
>> Not to open a what may be can of worms but ...
>>
>> What's wrong with procfs?
>
>Imho, a procfs should be for process-information, nothing else.
>The procfs in its current form, while useful, is something horrible
>that should be taken out on the backyard and shot using slugs.
>
>Ehrmmm. No, but seriously, the non-process stuff should be separate
>from the procfs. Maybe call it kernfs or whatever.
>
>> It allows a general interface to the kernel that does not require new
>> syscalls/ioctls and can be accessed from user space without specifically
>> compiled programs. You can use shell scripts, java, command line etc.
>
>Yes, and it's also totally non standardised.

It clearly fills a need, though, and has the distinct side benefit of cutting down on the proliferation of ioctls. Sure, it's non-standard and a mess. But it's semi-documented, easy to use, and v. general.

What's the preferred alternative, to state the first question another way? For any single small project/driver, creating a new fs simply isn't going to happen.
--
/Jonathan Lundell.
Re: How to know HZ from userspace?
At 1:38 AM +0100 2001-05-31, Joel Becker wrote:
>On Wed, May 30, 2001 at 05:24:37PM -0700, Jonathan Lundell wrote:
>> FWIW (perhaps not much in this context), the POSIX way is
>> sysconf(_SC_CLK_TCK)
>>
>> POSIX sysconf is pretty useful for this kind of thing (not just HZ, either).
>
>	Well, how many hundred things on Linux are available from /proc
>but not from sysconf or the like? :-)
>
>Joel

Lots. Maybe we oughta have /proc/sysconf/... (there's no reason sysconf() can't be a library reading /proc).
--
/Jonathan Lundell.
Re: How to know HZ from userspace?
At 5:07 PM -0700 2001-05-30, H. Peter Anvin wrote:
>> If you now want to set those values from a userspace program / script in
>> a portable manner, you need to be able to find out of HZ of the currently
>> running kernel.
>
>Yes, but that's because the interfaces are broken. The decision has
>been that these values should be exported using the default HZ for the
>architecture, and that it is the kernel's responsibility to scale them
>when HZ != USER_HZ. I don't know if any work has been done in this
>area.

FWIW (perhaps not much in this context), the POSIX way is
sysconf(_SC_CLK_TCK)

POSIX sysconf is pretty useful for this kind of thing (not just HZ, either).
--
/Jonathan Lundell.
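For reference, a minimal sketch of the sysconf() approach in C. `sysconf` and `_SC_CLK_TCK` are standard POSIX; the two wrapper function names are hypothetical. Note the assumption that matters in this thread: the value returned is the tick rate the system exports to user space, which need not be the kernel's internal HZ.

```c
#include <unistd.h>

/* Clock ticks per second as reported to user space.
 * sysconf() returns -1 if the value cannot be determined. */
long clock_ticks_per_second(void)
{
    return sysconf(_SC_CLK_TCK);
}

/* Converts a tick count (e.g. a field read from a tick-based interface)
 * to seconds, given the tick rate obtained above. */
double ticks_to_seconds(long ticks, long hz)
{
    return (double)ticks / (double)hz;
}
```

A caller would fetch the rate once, check that it is positive, and then scale any tick-based value with `ticks_to_seconds(value, rate)` instead of hard-coding 100.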
Re: [CHECKER] large stack variables (>=1K) in 2.4.4 and 2.4.4-ac8
At 8:45 AM -0700 2001-05-25, dean gaudet wrote:
>i think it really depends on how you use current -- here's an alternative
>usage which can fold the extra addition into the structure offset
>calculations, and moves the task struct to the top of the stack.
>
>not that this really solves anything, 'cause a stack underflow will just
>trash something else rather than the task struct :)

It would open the door for putting a guard page (which only occupies virtual space, after all) below the stack. I have no idea whether that's practical, given other constraints, but it's a potential benefit of having the stack at the bottom rather than the top of a page.
--
/Jonathan Lundell.
Re: Dying disk and filesystem choice.
At 5:56 PM +0200 2001-05-24, Andi Kleen wrote:
>On Thu, May 24, 2001 at 08:50:04AM -0700, Jonathan Lundell wrote:
>> At 10:31 AM +0200 2001-05-24, Andi Kleen wrote:
>> >reiserfs doesn't, but the HD usually has transparently in its firmware.
>> >So it hits a bad block; you see an IO error and the next time you hit
>> >the block the firmware has mapped in a fresh one from its internal
>> >reserves.
>>
>> Drives have remapping capability, but it's the first I've heard of HD
>> firmware doing it automatically. I'd be very interested in reading
>> the relevant documentation, if you could provide a pointer. Seems to
>> me if a drive *could* do this, you'd certainly want to turn it
>> (automatic remapping) off. There's way too much chance that a system
>> will read the remapped sector and assume that it contains the
>> original data. That would be hopelessly corrupting.
>
>There are two scenarios: read and write. For write doing remapping transparent
>is all fine, as the data is destroyed anyways.
>For read it returns an IO error once and the next time you read from that
>block it contains fresh (or partly recovered) data.

What HDs are we talking about, specifically?

WRT writes, how does the drive detect the error?

WRT reads, there are too many filesystems that would accept the second (no-IO-error) read as being the original good data.

IBM's UltraStar drives have an option (a bit in a vendor-unique mode page) that enables automatic reassignment, but it's done safely. If an unrecoverable read error is reported, the block is entered in a list of reassignment candidates. If that block is subsequently written, it's written back to the original location, and then verified. If the verify fails, the block is reassigned and rewritten; if it succeeds, it's left in the original location, and the block is removed from the reassignment candidate list.

Notice that invalid data is never returned without an error indication. That's critical.
--
/Jonathan Lundell.
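The reassignment policy described above can be modelled as a small state machine. This is only a toy simulation in C under stated assumptions: the struct, field and function names are hypothetical, a "read fault" stands for any unrecoverable read error, and "media bad" stands for an original location that fails verification. It is not drive firmware and not any vendor's interface.

```c
#include <stdbool.h>

enum { DEMO_NBLOCKS = 8 };

struct demo_drive {
    bool read_fault[DEMO_NBLOCKS];  /* stored data is unreadable right now      */
    bool media_bad[DEMO_NBLOCKS];   /* original location cannot hold data       */
    bool candidate[DEMO_NBLOCKS];   /* on the reassignment-candidate list       */
    bool reassigned[DEMO_NBLOCKS];  /* block has been remapped to a spare       */
};

/* Returns 0 on success, -1 for an unrecoverable read error.
 * A failing read only records the block as a candidate; it never remaps. */
int demo_read(struct demo_drive *d, int blk)
{
    bool location_bad = d->media_bad[blk] && !d->reassigned[blk];
    if (d->read_fault[blk] || location_bad) {
        d->candidate[blk] = true;
        return -1;
    }
    return 0;
}

/* A write supplies fresh data. Candidate blocks are written to the original
 * location and verified; only a failed verify triggers reassignment. */
void demo_write(struct demo_drive *d, int blk)
{
    d->read_fault[blk] = false;             /* old unreadable data is replaced */
    if (d->candidate[blk]) {
        bool verify_failed = d->media_bad[blk] && !d->reassigned[blk];
        if (verify_failed)
            d->reassigned[blk] = true;      /* remap, then rewrite to the spare */
        d->candidate[blk] = false;          /* either way, leave the list      */
    }
}
```

In this model a block that has failed a read keeps returning an error until the host writes it again, which is the property the message calls critical: the remap happens only at a point where the drive holds known-good data.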
Re: Dying disk and filesystem choice.
At 12:19 PM +0200 2001-05-24, Jens Axboe wrote:
>In fact you will typically only see an I/O error if the drive _can't_
>remap the sector anymore, because it has run out. No point in reporting
>a condition that was recovered.
>
>I'd still say, that if you get bad block errors reported from your disk
>it's long overdue for replacement.

This can't be right. It implies that the drive is returning bogus data with no error indication. Remapping a bad sector is not the same as recovering it.
--
/Jonathan Lundell.
Re: Dying disk and filesystem choice.
At 10:31 AM +0200 2001-05-24, Andi Kleen wrote:
>reiserfs doesn't, but the HD usually has transparently in its firmware.
>So it hits a bad block; you see an IO error and the next time you hit
>the block the firmware has mapped in a fresh one from its internal
>reserves.

Drives have remapping capability, but it's the first I've heard of HD firmware doing it automatically. I'd be very interested in reading the relevant documentation, if you could provide a pointer. Seems to me if a drive *could* do this, you'd certainly want to turn it (automatic remapping) off. There's way too much chance that a system will read the remapped sector and assume that it contains the original data. That would be hopelessly corrupting.
--
/Jonathan Lundell.
Re: alpha iommu fixes
At 10:24 PM +0100 2001-05-22, Alan Cox wrote:
>> On the main board, and not just the old ones. These days it's
>> typically in the chipset's south bridge. "Third-party DMA" is
>> sometimes called "fly-by DMA". The ISA card is a slave, as is memory,
>> and the DMA chip reads from one and writes to the other.
>
>There is also another mode which will give the Alpha kittens I suspect. A
>few PCI cards do SB emulation by snooping the PCI bus. So the kernel writes
>to the ISA DMA controller which does a pointless ISA transfer and the PCI
>card sniffs the DMA controller setup (as it goes to pci, then when nobody
>claims it on to the isa bridge) then does bus mastering DMA of its own to fake
>the ISA dma

That's sick.
--
/Jonathan Lundell.
Re: alpha iommu fixes
At 2:02 PM -0700 2001-05-22, Richard Henderson wrote:
>On Tue, May 22, 2001 at 01:48:23PM -0700, Jonathan Lundell wrote:
>> 64KB for 8-bit DMA; 128KB for 16-bit DMA. [...] This doesn't
>> apply to bus-master DMA, just the legacy (8237) stuff.
>
>Would this 8237 be something on the ISA card, or something on
>the old pc mainboards? I'm wondering if we can safely ignore
>this issue altogether here...

On the main board, and not just the old ones. These days it's typically in the chipset's south bridge. "Third-party DMA" is sometimes called "fly-by DMA". The ISA card is a slave, as is memory, and the DMA chip reads from one and writes to the other.

IDE didn't originally use DMA at all (but floppies did), just programmed IO. These days, PC chipsets mostly have some form of extended higher-performance DMA facilities for stuff like IDE, but I'm not really familiar with the details.

<aside>
I do wish Linux didn't have so much PC legacy sh^Htuff embedded into the i386 architecture.
</aside>

>> There was also a 24-bit address limitation.
>
>Yes, that's in the number of address lines going to the isa card.
>We work around that one by having an iommu arena from 8M to 16M
>and forcing all ISA traffic to go through there.
--
/Jonathan Lundell.
Re: alpha iommu fixes
At 1:28 PM -0700 2001-05-22, Richard Henderson wrote:
>On Tue, May 22, 2001 at 05:00:16PM +0200, Andrea Arcangeli wrote:
>> I'm also wondering if ISA needs the sg to start on a 64k boundary,
>
>Traditionally, ISA could not do DMA across a 64k boundary.
>
>The only ISA card I have (a soundblaster compatible) appears
>to work without caring for this, but I suppose we should pay
>lip service to pedantics.

64KB for 8-bit DMA; 128KB for 16-bit DMA. It's a limitation of the legacy third-party-DMA controllers, which had only 16-bit address registers (the high part of the address lives in a non-counting register). This doesn't apply to bus-master DMA, just the legacy (8237) stuff.

There was also a 24-bit address limitation.
--
/Jonathan Lundell.
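The two constraints can be written down as a small check in C. The window sizes and the 24-bit limit are taken from the message above; the macro and function names are hypothetical, and this sketch is not the kernel's own DMA code.

```c
#include <stdint.h>

#define ISA_DMA_WINDOW_8BIT   0x10000u    /* 64KB: 16-bit counter, byte units  */
#define ISA_DMA_WINDOW_16BIT  0x20000u    /* 128KB: 16-bit counter, word units */
#define ISA_DMA_ADDR_LIMIT    0x1000000u  /* 24 address bits = 16MB            */

/* Returns 1 if [addr, addr + len) is usable for a legacy (8237-style)
 * transfer: it must stay below 16MB and must not cross a window boundary,
 * because the high address bits sit in a register that does not count. */
int isa_dma_ok(uint32_t addr, uint32_t len, uint32_t window)
{
    if (len == 0 || len > window)
        return 0;

    uint32_t last = addr + len - 1;
    if (last < addr)                      /* 32-bit wraparound */
        return 0;
    if (last >= ISA_DMA_ADDR_LIMIT)
        return 0;

    /* First and last byte must fall in the same window. */
    return (addr / window) == (last / window);
}
```

A buffer ending exactly at 0xFFFF passes for an 8-bit channel, while one that is a single byte longer fails, since the counting part of the address would wrap to zero instead of carrying into the page register.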
Re: alpha iommu fixes
At 11:12 PM +1200 2001-05-22, Chris Wedgwood wrote:
>On Mon, May 21, 2001 at 03:19:54AM -0700, David S. Miller wrote:
>
>    Electrically (someone correct me, I'm probably wrong) PCI is
>    limited to 6 physical plug-in slots I believe, let's say it's 8
>    to choose an arbitrary larger number to be safe.
>
>Minor nit... it can in fact be higher than this, but typically it is
>not. CompactPCI implementations may go higher (different electrical
>characteristics allow for this).

Compact PCI specifies a max of 8 slots (one of which is typically the system board). Regular PCI doesn't have a hard and fast slot limit (except for the logical limit of 32 devices per bus); the limits are driven by electrical loading concerns. As I recall, a bus of typical length can accommodate 10 "loads", where a load is either a device pin or a slot connector (that is, an expansion card counts as two loads, one for the device and one for the connector). (I take this to be a rule of thumb, not a hard spec, based on the detailed electrical requirements in the PCI spec.)

Still, the presence of bridges opens up the number of devices on a root PCI bus to a very high number, logically. Certainly having three or four quad Ethernet cards, so 12 or 16 devices, is a plausible configuration.

As for bandwidth, a 64x66 PCI bus has a nominal burst bandwidth of 533 MB/second, which would be saturated by 20 full duplex 100baseT ports that were themselves saturated in both directions (all ignoring overhead). Full saturation is not reasonable for either PCI or Ethernet; I'm just looking at order-of-magnitude numbers here.

The bottom line is: don't make any hard and fast assumption about the number of devices connected to a root PCI bus.
--
/Jonathan Lundell.
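The bandwidth figures above can be reproduced with a few lines of integer arithmetic in C. The function names are hypothetical. The sketch assumes that "66 MHz" means the nominal 66.67 MHz PCI clock, that one full bus width moves per clock, and that MB means 10^6 bytes; all overhead is ignored, as in the message.

```c
/* Nominal burst bandwidth of a parallel PCI bus in MB/s (10^6 bytes/s):
 * one transfer of width_bits per clock, ignoring all protocol overhead. */
unsigned long pci_burst_mb_per_s(unsigned int width_bits, unsigned long clock_hz)
{
    return (width_bits / 8UL) * clock_hz / 1000000UL;
}

/* Aggregate traffic in MB/s from a number of Ethernet ports that are
 * saturated in both directions (full duplex doubles the line rate). */
unsigned long fdx_ethernet_mb_per_s(unsigned int ports, unsigned int mbit_per_s)
{
    return (unsigned long)ports * 2UL * mbit_per_s / 8UL;
}
```

`pci_burst_mb_per_s(64, 66666666UL)` gives 533, and `fdx_ethernet_mb_per_s(20, 100)` gives 500, so roughly 20 saturated full-duplex 100baseT ports are in the same range as the bus's nominal burst rate.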
Re: alpha iommu fixes
At 11:12 PM +1200 2001-05-22, Chris Wedgwood wrote: On Mon, May 21, 2001 at 03:19:54AM -0700, David S. Miller wrote: Electrically (someone correct me, I'm probably wrong) PCI is limited to 6 physical plug-in slots I believe, let's say it's 8 to choose an arbitrary larger number to be safe. Minor nit... it can in fact be higher than this, but typically it is not. CompactPCI implementations may go higher (different electrical characteristics allow for this). Compact PCI specifies a max of 8 slots (one of which is typically the system board). Regular PCI doesn't have a hard and fast slot limit (except for the logical limit of 32 devices per bus); the limits are driven by electrical loading concerns. As I recall, a bus of typical length can accommodate 10 loads, where a load is either a device pin or a slot connector (that is, an expansion card counts as two loads, one for the device and one for the connector). (I take this to be a rule of thumb, not a hard spec, based on the detailed electrical requirements in the PCI spec.) Still, the presence of bridges opens up the number of devices on a root PCI bus to a very high number, logically. Certainly having three or four quad Ethernet cards, so 12 or 16 devices, is a plausible configuration. As for bandwidth, a 64x66 PCI bus has a nominal burst bandwidth of 533 MB/second, which would be saturated by 20 full duplex 100baseT ports that were themselves saturated in both directions (all ignoring overhead). Full saturation is not reasonable for either PCI or Ethernet; I'm just looking at order-of-magnitude numbers here. The bottom line is: don't make any hard and fast assumption about the number of devices connected to a root PCI bus. -- /Jonathan Lundell. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: alpha iommu fixes
At 10:24 PM +0100 2001-05-22, Alan Cox wrote:
> > On the main board, and not just the old ones. These days it's typically in the chipset's south bridge. Third-party DMA is sometimes called fly-by DMA. The ISA card is a slave, as is memory, and the DMA chip reads from one and writes to the other.
>
>There is also another mode which will give the Alpha kittens I suspect. A few PCI cards do SB emulation by snooping the PCI bus. So the kernel writes to the ISA DMA controller which does a pointless ISA transfer and the PCI card sniffs the DMA controller setup (as it goes to pci, then when nobody claims it on to the isa bridge) then does bus mastering DMA of its own to fake the ISA dma

That's sick.

--
/Jonathan Lundell.
Re: alpha iommu fixes
At 2:02 PM -0700 2001-05-22, Richard Henderson wrote:
>On Tue, May 22, 2001 at 01:48:23PM -0700, Jonathan Lundell wrote:
> > 64KB for 8-bit DMA; 128KB for 16-bit DMA. [...] This doesn't apply to bus-master DMA, just the legacy (8237) stuff.
>
>Would this 8237 be something on the ISA card, or something on the old pc mainboards? I'm wondering if we can safely ignore this issue altogether here...

On the main board, and not just the old ones. These days it's typically in the chipset's south bridge. Third-party DMA is sometimes called fly-by DMA. The ISA card is a slave, as is memory, and the DMA chip reads from one and writes to the other.

IDE didn't originally use DMA at all (but floppies did), just programmed IO. These days, PC chipsets mostly have some form of extended higher-performance DMA facilities for stuff like IDE, but I'm not really familiar with the details.

<aside>I do wish Linux didn't have so much PC legacy sh^Htuff embedded into the i386 architecture.</aside>

> > There was also a 24-bit address limitation.
>
>Yes, that's in the number of address lines going to the isa card. We work around that one by having an iommu arena from 8M to 16M and forcing all ISA traffic to go through there.

--
/Jonathan Lundell.
Re: alpha iommu fixes
At 1:28 PM -0700 2001-05-22, Richard Henderson wrote:
>On Tue, May 22, 2001 at 05:00:16PM +0200, Andrea Arcangeli wrote:
> > I'm also wondering if ISA needs the sg to start on a 64k boundary,
>
>Traditionally, ISA could not do DMA across a 64k boundary.
>
>The only ISA card I have (a soundblaster compatible) appears to work without caring for this, but I suppose we should pay lip service to pedantics.

64KB for 8-bit DMA; 128KB for 16-bit DMA. It's a limitation of the legacy third-party-DMA controllers, which had only 16-bit address registers (the high part of the address lives in a non-counting register). This doesn't apply to bus-master DMA, just the legacy (8237) stuff.

There was also a 24-bit address limitation.

--
/Jonathan Lundell.
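The legacy DMA constraints described above (no crossing of a 64KB boundary for 8-bit channels or a 128KB boundary for 16-bit channels, plus the 24-bit address limit) can be expressed as a simple check. The following Python sketch is illustrative only: the function name and parameters are hypothetical, and it does not model word alignment for 16-bit channels or anything about bus-master DMA.

```python
ISA_ADDRESS_LIMIT = 1 << 24   # 24 address lines: buffer must sit below 16 MB


def isa_dma_ok(addr, length, sixteen_bit=False):
    """Return True if [addr, addr + length) suits a legacy 8237-style channel.

    The counting part of the address register is only 16 bits wide, so a
    transfer must stay inside one 64 KB window (8-bit channels) or one
    128 KB window (16-bit channels, which count words rather than bytes).
    """
    if length <= 0:
        return False
    window = 128 * 1024 if sixteen_bit else 64 * 1024
    last = addr + length - 1
    if last >= ISA_ADDRESS_LIMIT:
        return False
    # Same window index for first and last byte means no boundary is crossed.
    return addr // window == last // window


print(isa_dma_ok(0xFFF0, 0x20))                    # straddles 64 KB
print(isa_dma_ok(0xFFF0, 0x20, sixteen_bit=True))  # inside one 128 KB window
print(isa_dma_ok(0xFFFFF0, 0x20))                  # runs past 16 MB
```

A scatter-gather entry that fails this kind of check would have to be split or bounced before being handed to the legacy controller.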
Re: alpha iommu fixes
At 3:19 AM -0700 2001-05-21, David S. Miller wrote:
>This is totally wrong in two ways.
>
>Let me fix this, the IOMMU on these machines is per PCI bus, so this figure should be drastically lower.
>
>Electrically (someone correct me, I'm probably wrong) PCI is limited to 6 physical plug-in slots I believe, let's say it's 8 to choose an arbitrary larger number to be safe.
>
>Then we have:
>
>max bytes per bttv: max_gbuffers * max_gbufsize
>                    64 * 0x208000 == 133.12MB
>
>133.12MB * 8 PCI slots == ~1.06 GB
>
>Which is still only half of the total IOMMU space available per controller.

8 slots (and you're right, 6 is a practical upper limit, fewer for 66 MHz) *per bus*. Buses can proliferate like crazy, so the slot limit becomes largely irrelevant. A typical quad Ethernet card, for example (and this is true for many/most multiple-device cards), has a bridge, its own internal PCI bus, and four "slots" ("devices" in PCI terminology).

--
/Jonathan Lundell.
Re: LANANA: To Pending Device Number Registrants
At 2:16 AM +1200 2001-05-21, Chris Wedgwood wrote:
>On Sat, May 19, 2001 at 10:36:14AM -0700, Jonathan Lundell wrote:
>
>    I know from system documentation, or can figure out once and for all by experimentation, the correspondence between PCI bus/dev/fcn and physical locations. Jeff's extension gives me the mapping between eth# and PCI bus/dev/fcn, which is not otherwise available (outside the kernel).
>
>Won't work with hotplug PCI (consider plugging in something with a bridge).

It's true that hotplug devices make it more complicated, but I think the result can be achieved by describing the correspondence topologically rather than as a simple b/d/f-to-location table.

--
/Jonathan Lundell.
Re: LANANA: To Pending Device Number Registrants
At 3:37 AM -0600 2001-05-20, Eric W. Biederman wrote:
>Jonathan Lundell <[EMAIL PROTECTED]> writes:
>
>> At 10:42 AM +0200 2001-05-19, Kai Henningsen wrote:
>> >> Jeff Garzik's ethtool extension at least tells me the PCI bus/dev/fcn, though, and from that I can write a userland mapping function to the physical location.
>> >
>> >I don't see how PCI bus/dev/fcn lets you do that.
>>
>> I know from system documentation, or can figure out once and for all by experimentation, the correspondence between PCI bus/dev/fcn and physical locations. Jeff's extension gives me the mapping between eth# and PCI bus/dev/fcn, which is not otherwise available (outside the kernel).
>
>Just a second let me reenumerate your pci busses, and change all of the bus numbers. Not that this is a bad thought. It is just you need to know the tree of PCI busses/bridges up to the root on the machine in question.

Yes, you do. And it's true that renumbering is problematical; I hadn't thought of all the implications. Say, you have a system with hot-plug slots on two buses, and someone hot-plugs a card with a bridge (fairly common; most dual/quad Ethernet boards have a bridge). If the buses were numbered densely to begin with, they're going to have to be renumbered above the point that the new bridge was added. Phooey.

Well, it can still be done, but it's a bit more complicated than the bus/dev/fcn-to-location map I was imagining. You'd have to describe the topology of the built-in buses, and dynamically make the correspondences. As you say, "know the tree", by topology, not bus numbers.

--
/Jonathan Lundell.
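To make the "topology, not bus numbers" idea concrete, here is a small Python model. It is a sketch under stated assumptions, not how any kernel does it: the `Bus` class, the depth-first numbering, and the device/function values are all made up for illustration. A device is identified by the chain of (dev, fn) hops from the root bus, and that chain survives a renumbering that changes its bus/dev/fcn.

```python
class Bus:
    """A PCI bus; children maps (dev, fn) to a Bus (behind a bridge) or a label."""

    def __init__(self, children=None):
        self.children = dict(children or {})


def enumerate_endpoints(root):
    """Number buses depth-first; return {topological_path: (bus, dev, fn)}."""
    found = {}
    next_bus = [0]  # root bus is 0; each bridge encountered takes the next number

    def walk(bus, bus_no, path):
        for slot in sorted(bus.children):
            child = bus.children[slot]
            here = path + (slot,)
            if isinstance(child, Bus):
                next_bus[0] += 1
                walk(child, next_bus[0], here)
            else:
                found[here] = (bus_no, slot[0], slot[1])

    walk(root, 0, ())
    return found


# Built-in bridge at device 3 with two Ethernet devices behind it.
quad = Bus({(4, 0): "eth-a", (5, 0): "eth-b"})
root = Bus({(3, 0): quad})
path = ((3, 0), (4, 0))          # topological name of "eth-a"
before = enumerate_endpoints(root)[path]

# Hot-plug a bridged card into device 2 on the root bus, then re-enumerate.
root.children[(2, 0)] = Bus({(0, 0): "new-nic"})
after = enumerate_endpoints(root)[path]

print(before, after)
```

In this model the same path resolves to bus 1 before the hot-plug and bus 2 afterwards, so a location table keyed on the path keeps working where one keyed on bus/dev/fcn would not.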
Re: LANANA: To Pending Device Number Registrants
At 10:42 AM +0200 2001-05-19, Kai Henningsen wrote:
>> >Make your config script look at the hardware MAC addresses. Those don't change.
>>
>> They're not necessarily unique, though.
>
>So if you plug both into the same network segment, that segment is broken? That looks like very stupid design to me.
>
>It's not as if getting enough unique MAC addresses was particularly expensive. These days, even el-cheapo PC network cards get that right. (And have for quite a number of years.)

Many do, some don't. Moreover, the MAC address is volatile in that it can be changed at will (via, eg, ifconfig).

I assume that the reason that Sun (for example) defaults to all MAC addresses on a system being the same is that it doesn't make sense, ordinarily, to plug two Ethernet interfaces into the same network segment. If, for some reason, you really want to do that, there's ifconfig ready to reassign the MAC address.

If I plug both into the same network segment by accident (because I can't tell which is which, say), then my configuration is nearly as broken with different MAC addresses as with identical ones; the fix is to replug correctly, not to change MAC addresses.

--
/Jonathan Lundell.
Re: LANANA: To Pending Device Number Registrants
At 10:42 AM +0200 2001-05-19, Kai Henningsen wrote:
>> Jeff Garzik's ethtool extension at least tells me the PCI bus/dev/fcn, though, and from that I can write a userland mapping function to the physical location.
>
>I don't see how PCI bus/dev/fcn lets you do that.

I know from system documentation, or can figure out once and for all by experimentation, the correspondence between PCI bus/dev/fcn and physical locations. Jeff's extension gives me the mapping between eth# and PCI bus/dev/fcn, which is not otherwise available (outside the kernel).

--
/Jonathan Lundell.
Re: Storage - redundant path failover / failback - quo vadis linux?
At 9:03 AM +0200 2001-05-18, [EMAIL PROTECTED] wrote:
> >My question is which way is the more probable solution for future linux kernels?
> >The low-level-approach of the "T3"-patch requires changes to the scsi-drivers and the hardware-drivers but provides optimal communication between the driver and the hardware
>
>Thinking about it: if there would be some sort of 'available' flag in the gendisk structure, that would be updated by the low-level drivers. This could be used by a high-level design to use or skip a failed device/path... In the S/390 (or zSeries) environment the device drivers are even able to detect a failing connection even if there is no data going to a device. That way the device would be disabled even _before_ anybody tries to write...
>
> >The high-level-approach of the "multipath"-personality is hardware-independent but works very slowly. On the other hand I see no clear way how to check for availability of the (previously failed) primary channel to automate a fail-back.
>
>Well, slower, but I think there will be many that take that performance loss already by using lvm or md (for the benefit of flexible/large filesystems) this approach would add failover while being IMHO only a little less performant.

The flag idea, or some equivalent way for the low-level driver to communicate to the multi-pathing level, seems exactly right. I'm guessing that provision needs to be made for some external-device-dependent means of signalling both failure and recovery. There are potentially side-channel/out-of-band means to communicate this kind of status from specific devices.

--
/Jonathan Lundell.
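A toy model of the 'available' flag idea, in Python. Everything here is hypothetical (the `Path` and `MultipathDevice` names, the priority field, and the way the flag is set); it only sketches how a high-level layer could skip a failed path and fall back automatically once a low-level driver, or some out-of-band mechanism, reports recovery.

```python
class Path:
    """One route to a device, as seen by the multipath layer."""

    def __init__(self, name, priority):
        self.name = name
        self.priority = priority   # lower value = preferred
        self.available = True      # hypothetical flag, maintained by the low-level driver


class MultipathDevice:
    def __init__(self, paths):
        self.paths = sorted(paths, key=lambda p: p.priority)

    def select(self):
        """Pick the most preferred path currently flagged as available."""
        for p in self.paths:
            if p.available:
                return p
        raise OSError("no available path")


primary = Path("primary", 0)
secondary = Path("secondary", 1)
dev = MultipathDevice([secondary, primary])

print(dev.select().name)      # primary
primary.available = False     # driver signals failure, possibly before any I/O
print(dev.select().name)      # secondary (failover)
primary.available = True      # driver or side channel signals recovery
print(dev.select().name)      # primary (failback)
```

Because selection consults the flag on every request, failback needs no probing by the high-level layer; the cost is that something below it must be trusted to clear and set the flag correctly.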
Re: LANANA: To Pending Device Number Registrants
At 11:23 PM +0200 2001-05-17, Kai Henningsen wrote:
>[EMAIL PROTECTED] (Jonathan Lundell) wrote on 15.05.01 in <p05100316b7272cdfd50c@[207.213.214.37]>:
>
>> What about:
>>
>> 1 (network domain). I have two network interfaces that I connect to two different network segments, eth0 & eth1; they're ifconfig'd to the appropriate IP and MAC addresses. I really do need to know physically which (physical) hole to plug my eth0 cable into.
>
>Sorry, the software doesn't know that. Never has, for that matter.

Well, no, it doesn't. That's a problem. Jeff Garzik's ethtool extension at least tells me the PCI bus/dev/fcn, though, and from that I can write a userland mapping function to the physical location.

My point, though, is that finding the socket is a real-life problem on systems with multiple interfaces. I don't expect the kernel to know the physical locations, but the user has to be able to get from kernel/ifconfig names (eth#) to sockets, one way or another. Support for a uniform means of doing the mapping, even if it needs userland help, would be good.

>> (Extension: same situation, but it's a firewall and I've got 12 ports to connect.) (Extension #2: if I add a NIC to the system and reboot, I'd really prefer that the NICs already in use didn't get renumbered.)
>
>Make your config script look at the hardware MAC addresses. Those don't change.

They're not necessarily unique, though.

>> 2 (disk domain). I have multiple spindles on multiple SCSI adapters. I want to allocate them to more than one RAID0/1/5 set, with the usual considerations of putting mirrors on different adapters, spreading my RAID5 drives optimally, ditto stripes. I need (eg) SCSI paths to config all this, and I further need real physical locations to identify failed drives that need to be hot-replaced. The mirror members will move around as drives are replaced and hot spares come into play.
>
>Use partition UUIDs, or SCSI serial numbers, or whatever. This works today.

This pushes the problem back in time: I need to write the UUID, for example, at some point. And, with hot-swappable drives, I'm still interested in the physical location. I really don't know that there's a good answer to this problem, especially with FC, but I need to tell an operator, "replace this particular physical drive". It doesn't do any good to tell the operator the UUID.

>> Seems like more than merely informational.
>
>The *location*? Nope. Some unique id for the device, if available at all: sure.

What good does it do to tell an operator to connect a cable to a MAC address? Or to remove a drive having a particular UUID? If it's "mere information", it's *necessary* mere information.

>> (A side observation: PCI or SCSI bus/device/lun/etc paths are not physical locations; you also need external hardware-specific knowledge to be able to talk about real physical locations in a way that does the system operator any good.)
>
>And those you typically do not have.

But (ideally) should.

--
/Jonathan Lundell.
Re: ((struct pci_dev*)dev)->resource[...].start
At 5:37 PM -0400 2001-05-16, Jeff Garzik wrote:
>This is not a safe assumption, because the OS may reprogram the PCI BARs at certain times. The rule is: ALWAYS read from dev->resource[] unless you are a bus driver (PCI bridges, for example, need to assign resources).

Would you please elaborate? If I understand what you're saying, you can't rely on the "pointer" returned by ioremap() because the OS might reprogram the relevant BAR out from under you.

So one would need to know: when does a driver have to re-ioremap() due to the BAR having been (potentially) changed? I'd expect the answer to be: for all practical purposes never.

--
/Jonathan Lundell.
Re: LANANA: To Pending Device Number Registrants
At 4:57 PM +0200 2001-05-16, Vojtech Pavlik wrote:
>On Wed, May 16, 2001 at 07:37:45AM -0700, Jonathan Lundell wrote:
>> At 10:02 AM +0200 2001-05-16, Vojtech Pavlik wrote:
>> >> It's also true that some buses simply don't yield up physical locations (ISA springs to mind,
>> >
>> >ISA is quite fine, you can use the i/o space as physical locations.
>>
>> I meant physical not as in physical-vs-virtual addresses (all ISA addresses, memory or IO, are physical in this sense, by the time they get to the bus). Rather, I meant that you can't determine which slot a given device is plugged into. If you have two NICs in two ISA slots, there's no way to distinguish between the slots. In practice, you'd have to experiment or remove a card and check the jumpering or some such.
>
>Yes. But I meant that while this indeed is not possible, still the i/o port address can be used instead of the slot number, because it at least is physically jumpered and must be unique.

Yes, I agree. And it's stable (whereas "physical" PCI addresses are not). Best we've got for ISA (though it's true for ISA memory addresses as well).

--
/Jonathan Lundell.
RE: LANANA: To Pending Device Number Registrants
At 11:56 AM +0200 2001-05-16, Chemolli Francesco (USI) wrote:
>We could do something like baptizing disks.. Fix some location (i.e. the absolutely last sector of the disk or the partition table or whatever) and store there some 32-bit ID (could be a random number, a progressive number, whatever).

Most of these solutions (and RAID IDs and UUIDs) don't completely solve the problem; they just push it to a different time: how do you talk about a new disk, or a new RAID array, or a moved disk? And what about removable media (not neglecting the possibility of multiple drives)? Removable media from another OS? Shared drives?

Not that this kind of "firm" ID might not be an improvement, or at least a good sanity check.

[Side question, not original with me: why isn't all this a 2.5 discussion?]

--
/Jonathan Lundell.
Re: LANANA: To Pending Device Number Registrants
At 10:02 AM +0200 2001-05-16, Vojtech Pavlik wrote:
>> It's also true that some buses simply don't yield up physical locations (ISA springs to mind,
>
>ISA is quite fine, you can use the i/o space as physical locations.

I meant physical not as in physical-vs-virtual addresses (all ISA addresses, memory or IO, are physical in this sense, by the time they get to the bus). Rather, I meant that you can't determine which slot a given device is plugged into. If you have two NICs in two ISA slots, there's no way to distinguish between the slots. In practice, you'd have to experiment or remove a card and check the jumpering or some such.

--
/Jonathan Lundell.
Re: LANANA: To Pending Device Number Registrants
At 12:31 PM +1000 2001-05-16, Andrew Morton wrote:
>> When I ifconfig one of a collection of interfaces, I'm very much talking about the specific physical interface connected via a specific physical cable to a specific physical switch port.
>
>Yes, it can be a security trap as well - physically move a card and your firewall rules end up being applied to the wrong connection.
>
>The 2.4 kernel allows you to rename an interface. So you can build a little database of (MAC address/name) pairs. Apply this after booting and before bringing up the interfaces and everything has the name you wanted, based on MAC address.
>
>Andi Kleen has an app which does this:
>
>    ftp://ftp.firstfloor.org/pub/ak/smallsrc/nameif.c
>
>but apparently some additional kernel work is needed to make this work 100% correctly. I do not know what the specific problem is.

There's a bit of a catch 22, though, if you don't have unique MAC addresses in the system (across multiple interfaces). It's common practice in the SPARC world (Solaris, anyway) for all the interfaces to default to a single system-wide MAC address. The fact that MAC addresses are at least semi-volatile is also bothersome.

It's also true that some buses simply don't yield up physical locations (ISA springs to mind, and I gather that FC is squishy that way), but it's desirable to be able to make the connection all ways (eth# <-> bus location <-> physical location <-> MAC address) in a uniform manner. (Where MAC address might be something else in a non-Ethernet domain.)

--
/Jonathan Lundell.
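The (MAC address/name) database approach, and the catch with non-unique MAC addresses, can be sketched in Python as follows. This is not nameif.c and makes no claim about how that tool behaves; `plan_renames`, the table contents, and the example addresses are all hypothetical. The sketch only plans renames and reports which interfaces cannot be told apart by MAC address.

```python
from collections import defaultdict


def plan_renames(name_by_mac, current):
    """Plan interface renames from a MAC -> desired-name table.

    current maps kernel names (eth0, ...) to MAC addresses.
    Returns (renames, ambiguous): renames maps kernel name -> desired name;
    ambiguous maps a MAC shared by several interfaces to their kernel names.
    """
    wanted = {mac.lower(): name for mac, name in name_by_mac.items()}
    by_mac = defaultdict(list)
    for ifname, mac in current.items():
        by_mac[mac.lower()].append(ifname)

    renames, ambiguous = {}, {}
    for mac, ifnames in by_mac.items():
        if len(ifnames) > 1:
            # Same MAC on several interfaces: the table cannot pick one.
            ambiguous[mac] = sorted(ifnames)
        elif mac in wanted:
            renames[ifnames[0]] = wanted[mac]
    return renames, ambiguous


table = {"00:A0:C9:11:22:33": "dmz"}
current = {
    "eth0": "00:a0:c9:11:22:33",
    "eth1": "08:00:20:aa:bb:cc",   # two interfaces sharing one system-wide MAC
    "eth2": "08:00:20:aa:bb:cc",
}
renames, ambiguous = plan_renames(table, current)
print(renames)
print(ambiguous)
```

For the ambiguous group, some other key (bus location, for instance) is needed, which is the point about wanting the mapping in all directions.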
RE: LANANA: To Pending Device Number Registrants
At 11:56 AM +0200 2001-05-16, Chemolli Francesco (USI) wrote:
>We could do something like baptizing disks.. Fix some location
>(i.e. the absolutely last sector of the disk or the partition table
>or whatever) and store there some 32-bit ID (could be a random number,
>a progressive number, whatever).

Most of these solutions (and RAID IDs and UUIDs) don't completely solve the problem; they just push it to a different time: how do you talk about a new disk, or a new RAID array, or a moved disk? And what about removable media (not neglecting the possibility of multiple drives)? Removable media from another OS? Shared drives?

Not that this kind of firm ID might not be an improvement, or at least a good sanity check.

[Side question, not original with me: why isn't all this a 2.5 discussion?]

--
/Jonathan Lundell.
Re: LANANA: To Pending Device Number Registrants
At 4:57 PM +0200 2001-05-16, Vojtech Pavlik wrote:
>On Wed, May 16, 2001 at 07:37:45AM -0700, Jonathan Lundell wrote:
>> At 10:02 AM +0200 2001-05-16, Vojtech Pavlik wrote:
>> >> It's also true that some buses simply don't yield up physical
>> >> locations (ISA springs to mind,
>> >
>> >ISA is quite fine, you can use the i/o space as physical locations.
>>
>> I meant physical not as in physical-vs-virtual addresses (all ISA
>> addresses, memory or IO, are physical in this sense, by the time they
>> get to the bus). Rather, I meant that you can't determine which slot
>> a given device is plugged into. If you have two NICs in two ISA
>> slots, there's no way to distinguish between the slots. In practice,
>> you'd have to experiment or remove a card and check the jumpering or
>> some such.
>
>Yes. But I meant that while this indeed is not possible, still the i/o
>port address can be used instead of the slot number, because it at least
>is physically jumpered and must be unique.

Yes, I agree. And it's stable (whereas physical PCI addresses are not). Best we've got for ISA (though it's true for ISA memory addresses as well).

--
/Jonathan Lundell.
Re: ((struct pci_dev*)dev)->resource[...].start
At 5:37 PM -0400 2001-05-16, Jeff Garzik wrote:
>This is not a safe assumption, because the OS may reprogram the PCI BARs
>at certain times. The rule is: ALWAYS read from dev->resource[] unless
>you are a bus driver (PCI bridges, for example, need to assign
>resources).

Would you please elaborate? If I understand what you're saying, you can't rely on the pointer returned by ioremap() because the OS might reprogram the relevant BAR out from under you. So one would need to know: when does a driver have to re-ioremap() due to the BAR having been (potentially) changed?

I'd expect the answer to be: for all practical purposes never.

--
/Jonathan Lundell.
Re: LANANA: To Pending Device Number Registrants
At 9:34 PM -0400 2001-05-15, Nicolas Pitre wrote:
>On Wed, 16 May 2001, Daniel Phillips wrote:
>
>> On Tuesday 15 May 2001 23:20, Nicolas Pitre wrote:
>> > Personally, I'd really like to see /dev/ttyS0 be the first detected
>> > serial port on a system, /dev/ttyS1 the second, etc.
>>
>> There are well-defined rules for the first four on PC's. The ttySx
>> better match the labels the OEM put on the box.
>
>Then just make them be detected first.

Well, they traditionally start with 1, not 0, too. Or have cute little icons and no text. Or aren't labelled at all.

I'm using one fairly well-known dual-port PCI serial board that silently interchanged the two ports on a rev change, with no labelling change at all ('cause there was no label!). Make your ttySx match *that*!

--
/Jonathan Lundell.
Re: LANANA: To Pending Device Number Registrants
At 1:18 PM -0700 2001-05-15, Linus Torvalds wrote:
>> 1 (network domain). I have two network interfaces that I connect to
>> two different network segments, eth0 & eth1;
>
>So?
>
>Informational. You can always ask what "eth0" and "eth1" are.
>
>There's another side to this: repeatability. A setup should be
>_repeatable_.
>
>This is what we have now. Network devices are called "eth0..N", and nobody
>is complaining about the fact that the numbering is basically random. It
>is _repeatable_ as long as you don't change your hardware setup, and the
>numbering has effectively _nothing_ to do with "location".
>
>You don't say "oh, I have my network card in PCI bus #2, slot #3,
>subfunction #1, so I should do 'ifconfig netp2s3f1'". Right?
>
>The location of the device is _meaningless_.

I *like* eth0..n (I'd like net0..n better). And I *can't* ask what eth0 and eth1 are, by the way, but I should be able to (Jeff Garzik has proposed an extension to ethtool to help out this lack, but it's not in Linux today, and needs concrete implementation anyway).

But that's not my point. I'm *not* proposing that we exchange eth0 for geographic names. I'm suggesting, though, that the location of the device is *not* meaningless, because it's the physically-located RJ45 socket (or whatever) that I have to connect a particular cable to. Sure, no big deal for systems with a single connection, but it becomes a real pain when you've got a dozen, which is a reasonable number for some network-infrastructure functions (eg firewalls).

When I ifconfig one of a collection of interfaces, I'm very much talking about the specific physical interface connected via a specific physical cable to a specific physical switch port.

Bob Glamm is on the right track with

At 5:35 PM -0500 2001-05-15, Bob Glamm wrote:
> # start up networking
> for i in eth0 eth1 eth2; do
>   identify device $i
>   get configuration/config procedure for device $i identity
>   configure $i
> done

...it's just that right now the connection between eth* and its physical identity isn't made.

--
/Jonathan Lundell.
Re: LANANA: To Pending Device Number Registrants
At 4:35 PM -0700 2001-05-15, David Brownell wrote:
>[ Re why "physical" device IDs _should_ have a critical role in sysadmin ]
>
>> I would have to agree that "stable" is critical to not driving people
>> crazy. In the case of AIX, once a device is enumerated, it will retain
>> the same name across reboots. Enough information is kept about each
>> device to determine if it has already been enumerated (i.e. same I/O
>> port address for serial devices, MAC address for ethernet cards, etc),
>> or if it is a new device and should get a new name.
>
>I caught those refs to how AIX does this ... sounds worth learning from.
>Does it handle USB "port addresses" (which bus and hub)?

Solaris has a scheme that addresses the issue as well. Device nodes live in /devices (/dev has soft links into /devices) and have system-global-geographic names. In Solaris talk, the 0-1-2 of eth0-1-2 is an instance. There's a file /etc/path_to_inst that records the connection of a device instance to its /devices geographical name.

It does keep naming stable, but can be a PITA at times when you're reconfiguring a system and *want* to renumber things. (There are magic ways to do it, though).

That's all Solaris 2.6; not sure about 2.8.

--
/Jonathan Lundell.
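[Editorial sketch] The stable-naming scheme described above (a recorded binding from a device's geographic path to its instance number) might be modelled roughly as follows. Everything here is an assumption made for illustration: `struct inst_table` and `instance_for_path()` are hypothetical names, the table lives in memory rather than in a file, and this is not how Solaris or AIX actually implement it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_BINDINGS 16
#define MAX_PATH_LEN 64

/* Hypothetical in-memory stand-in for a persistent path-to-instance file. */
struct inst_table {
    char path[MAX_BINDINGS][MAX_PATH_LEN];
    size_t count;
};

/*
 * Return the instance number bound to `path`, binding the next free number
 * if the path has not been seen before. Returns -1 if the table is full or
 * the path is too long to record.
 */
static int instance_for_path(struct inst_table *t, const char *path)
{
    size_t i;

    assert(t != NULL && path != NULL);
    for (i = 0; i < t->count; i++) {
        if (strcmp(t->path[i], path) == 0)
            return (int)i;      /* already enumerated: keep its number */
    }
    if (t->count == MAX_BINDINGS || strlen(path) >= MAX_PATH_LEN)
        return -1;
    strcpy(t->path[t->count], path);
    return (int)t->count++;
}
```

Because a binding is never dropped, adding a device leaves existing numbers alone, which is the stability property. It is also the source of the renumbering pain: getting a different number means editing the recorded table.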
Re: LANANA: To Pending Device Number Registrants
At 11:15 AM -0700 2001-05-15, Linus Torvalds wrote:
>The part I absolutely detest is when the information becomes more than
>just "information", and is used to enforce a world-view. Anybody who uses
>physical location for naming devices (ie you have to know where the hell
>the thing is in order to look it up), is so far out to lunch that it's not
>even funny. And the sad fact is that this is pretty much how ALL unixes
>have historically done things ("Oh, you want to see the disk? Sure. It's
>on scsi bus 1, channel 2, ID 3, lun 0, so you just open /dev/s1c3l0 and
>you're done! Easy as pie!").
>
>Keep it informational. And NEVER EVER make it part of the design.

What about:

1 (network domain). I have two network interfaces that I connect to two different network segments, eth0 & eth1; they're ifconfig'd to the appropriate IP and MAC addresses. I really do need to know physically which (physical) hole to plug my eth0 cable into. (Extension: same situation, but it's a firewall and I've got 12 ports to connect.) (Extension #2: if I add a NIC to the system and reboot, I'd really prefer that the NICs already in use didn't get renumbered.)

2 (disk domain). I have multiple spindles on multiple SCSI adapters. I want to allocate them to more than one RAID0/1/5 set, with the usual considerations of putting mirrors on different adapters, spreading my RAID5 drives optimally, ditto stripes. I need (eg) SCSI paths to config all this, and I further need real physical locations to identify failed drives that need to be hot-replaced. The mirror members will move around as drives are replaced and hot spares come into play.

Seems like more than merely informational.

(A side observation: PCI or SCSI bus/device/lun/etc paths are not physical locations; you also need external hardware-specific knowledge to be able to talk about real physical locations in a way that does the system operator any good.)

--
/Jonathan Lundell.
Re: [PATCH] Remove silly beep macro from pgtable.h
At 7:36 PM +0200 2001-05-15, Mike Galbraith wrote:
>On Tue, 15 May 2001, Jeff Golds wrote:
>
>> Hi folks,
>>
>> Found this bit of unused code in the i386 and sh architectures.
>> As it's not being used, let's get rid of it. Also, pgtable.h seems
>> to be an odd place for this.
>
>I'd leave it.. folks with early boot troubles might find it useful.
>
> -Mike

Consider small rant about literal IO references to magic locations hereby ranted. Especially in header files completely unrelated to the IO function in question.

-#define __beep() asm("movb $0x3,%al; outb %al,$0x61")

Let's please not assume that every i386 implementation has a full set of legacy PC IO hardware.

--
/Jonathan Lundell.
Re: Not a typewriter
>why creat doesn't end in an "e;" and so forth. I tell the

Some time back, Ken Thompson was asked, if he had it to do over again, what changes he would make to Unix. The only thing he could think of: spell it "create()".

--
/Jonathan Lundell.
Re: ENOIOCTLCMD?
At 5:45 PM +0100 2001-05-13, Alan Cox wrote:
>> What I was arguing (conceptually) is that something like
>> #define ENOIOCTLCMD ENOTTY
>> or preferably but more invasively s/ENOIOCTLCMD/ENOTTY/ (mutatis mutandis)
>>
>> would result in no loss of function. I assert that ENOIOCTLCMD is
>> redundant, pending a specific counterexample.
>
>On the contrary. I can now no longer force an unsupported response when there
>is a generic routine I dont wish to use

That makes sense. Thanks.

--
/Jonathan Lundell.
Re: ENOIOCTLCMD?
At 5:43 PM +0100 2001-05-12, Alan Cox wrote:
>> That's what's confusing me: why the distinction? It's true that the
>> current scheme allows the dev->ioctlfunc() call below to force ENOTTY
>> to be returned, bypassing the switch, but presumably that's not what
>> one wants.
>
>It allows driver specific code to override generic code, including
>by reporting that a given feature is not available/appropriate.
>
>Alan

What I was arguing (conceptually) is that something like

#define ENOIOCTLCMD ENOTTY

or preferably but more invasively s/ENOIOCTLCMD/ENOTTY/ (mutatis mutandis) would result in no loss of function. I assert that ENOIOCTLCMD is redundant, pending a specific counterexample.

--
/Jonathan Lundell.
Re: ENOIOCTLCMD?
At 3:27 PM -0700 2001-05-12, Shane Wegner wrote:
>> int err = dev->ioctlfunc(dev, op, arg);
>> if( err != -ENOIOCTLCMD)
>>         return err;
>>
>> /* Driver specific code does not support this ioctl */
>
>I noticed this return coming out of the watchdog driver a
>while ago when I was playing with it. I have taken a quick
>look and it seems a few drivers do return this directly to
>userspace. I'm not sure if this is complete but ...

Can't this be handled in sys_ioctl()? At the very end, replace

out:
        return error;

with

out:
        return (error == -ENOIOCTLCMD) ? -ENOTTY : error;

>diff -ur linux-2.4.4-ac8/drivers/block/swim3.c linux/drivers/block/swim3.c
>--- linux-2.4.4-ac8/drivers/block/swim3.c Sat May 12 14:59:44 2001
>+++ linux/drivers/block/swim3.c Sat May 12 15:22:30 2001
>@@ -848,7 +848,7 @@
>                                   sizeof(struct floppy_struct));
>               return err;
>       }
>-      return -ENOIOCTLCMD;
>+      return -ENOTTY;
> }

--
/Jonathan Lundell.
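[Editorial sketch] The one-line change suggested above for the end of sys_ioctl() can be tried in isolation as a plain function. This is a user-space illustration, not kernel code: `ioctl_exit_code()` is a hypothetical name, and ENOIOCTLCMD is defined locally with the value 515 from the errno.h excerpt quoted in this thread, since user-space <errno.h> is not expected to provide it.

```c
#include <assert.h>
#include <errno.h>

/* Kernel-internal code; defined here only so the sketch compiles in
 * user space. The value comes from the errno.h excerpt in this thread. */
#ifndef ENOIOCTLCMD
#define ENOIOCTLCMD 515
#endif

/* Map the internal "no such ioctl" code to ENOTTY on the way out,
 * leaving every other result untouched. */
static int ioctl_exit_code(int error)
{
    return (error == -ENOIOCTLCMD) ? -ENOTTY : error;
}
```

Doing the translation once at the system-call boundary would catch every driver that leaks the internal code, instead of patching each driver's return statement individually as the quoted diff does.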
Re: ENOIOCTLCMD?
At 12:16 PM +0100 2001-05-12, Alan Cox wrote:
>> Can somebody explain the use of ENOIOCTLCMD? There are order of 170
>> uses in the kernel, but I don't see any guidelines for that use (nor
>> what prevents it from being seen by user programs).
>
>It should never be seen by apps. If it can be then it is wrong code.
>Basically you use it in things like

I was surmising something like that, but in that case aren't ENOIOCTLCMD and ENOTTY redundant? That is, could not every occurrence of ENOIOCTLCMD be replaced by ENOTTY with no change in function?

That's what's confusing me: why the distinction? It's true that the current scheme allows the dev->ioctlfunc() call below to force ENOTTY to be returned, bypassing the switch, but presumably that's not what one wants.

> int err = dev->ioctlfunc(dev, op, arg);
> if( err != -ENOIOCTLCMD)
>         return err;
>
> /* Driver specific code does not support this ioctl */
>
> switch(op)
> {
>
>         ...
>         default:
>                 return -ENOTTY;
> }
>
>Its a way of passing back 'you handle it'

--
/Jonathan Lundell.
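[Editorial sketch] The quoted fragment can be fleshed out into a small user-space model that shows why the two codes are not interchangeable. The assumptions are loud ones: `struct device`, `generic_ioctl()`, `demo_ioctl()` and the three command numbers are all hypothetical, and ENOIOCTLCMD is defined locally with the value from the errno.h excerpt in this thread. Only the shape of the dispatch follows the quoted code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#ifndef ENOIOCTLCMD
#define ENOIOCTLCMD 515   /* kernel-internal value, per the thread's errno.h */
#endif

/* Hypothetical command numbers for the sketch. */
enum { CMD_GENERIC = 1, CMD_PRIVATE = 2, CMD_REFUSED = 3 };

struct device {
    int (*ioctlfunc)(struct device *dev, int op, long arg);
};

/* Generic layer: try the driver first, fall back only on -ENOIOCTLCMD. */
static int generic_ioctl(struct device *dev, int op, long arg)
{
    int err;

    assert(dev != NULL && dev->ioctlfunc != NULL);
    err = dev->ioctlfunc(dev, op, arg);
    if (err != -ENOIOCTLCMD)
        return err;

    /* Driver specific code does not support this ioctl */
    switch (op) {
    case CMD_GENERIC:
    case CMD_REFUSED:       /* generic code would accept both of these */
        return 0;
    default:
        return -ENOTTY;
    }
}

/* Example driver: handles one command, vetoes another, defers the rest. */
static int demo_ioctl(struct device *dev, int op, long arg)
{
    (void)dev;
    (void)arg;
    switch (op) {
    case CMD_PRIVATE:
        return 0;
    case CMD_REFUSED:
        return -ENOTTY;         /* final answer: generic code is not tried */
    default:
        return -ENOIOCTLCMD;    /* "you handle it" */
    }
}
```

With `demo_ioctl` installed, CMD_REFUSED yields -ENOTTY even though the generic switch would have accepted it. If the driver's -ENOIOCTLCMD were rewritten to -ENOTTY, the generic layer could no longer tell "unsupported, full stop" from "not mine, try the generic handler", which is the counterexample raised elsewhere in the thread.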
ENOIOCTLCMD?
Can somebody explain the use of ENOIOCTLCMD? There are order of 170 uses in the kernel, but I don't see any guidelines for that use (nor what prevents it from being seen by user programs). Thanks.

errno.h:
>/* Should never be seen by user programs */
>#define ERESTARTSYS     512
>#define ERESTARTNOINTR  513
>#define ERESTARTNOHAND  514     /* restart if no handler.. */
>#define ENOIOCTLCMD     515     /* No ioctl command */

--
/Jonathan Lundell.