Re: [RFC] killing the NR_IRQS arrays.
> > Similarly, in a pci device, one could imagine that the
> > struct pci_driver contains an irq_handler_t member that
> > is registered from the pci_device_probe() function
> > if present.
>
> Yes. There is some potential there. Although we would have to go
> through an extra hoop to make it a pci specific handler type.

Beware of that approach though. If you are on a shared IRQ line, when
do you start getting called when an IRQ happens (possibly for the "other"
device)? As soon as you are bound to the device? But that means
potentially before the driver's internal data structures are fully
initialized...

I like the driver being in control of the "hooking" of the irq handler,
thus deciding when it starts being capable of handling interrupts (even
if they aren't initiated by that driver's device).

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
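Ben's ordering concern can be made concrete with a small userspace model of a shared line. This is not kernel code: `struct shared_line`, `line_request()`, `line_fire()` and the `RET_NONE`/`RET_HANDLED` values (stand-ins for the kernel's `IRQ_NONE`/`IRQ_HANDLED`) are hypothetical. The sketch shows that every handler on a shared line runs for every interrupt, so a driver should hook its handler only after its own state is set up, and the handler must decline interrupts its device did not raise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for IRQ_NONE / IRQ_HANDLED. */
enum irq_ret { RET_NONE, RET_HANDLED };

typedef enum irq_ret (*handler_fn)(void *dev_id);

#define MAX_SHARERS 4

/* One physical line with several devices' handlers hooked onto it. */
struct shared_line {
	handler_fn handler[MAX_SHARERS];
	void *dev_id[MAX_SHARERS];
	size_t count;
};

struct my_dev {
	bool pending;      /* models the device's "I raised an interrupt" status bit */
	int *rx_counter;   /* driver state; NULL until the driver has set it up */
};

/* Hook a handler: from now on it runs for every interrupt on the line. */
static void line_request(struct shared_line *l, handler_fn fn, void *dev_id)
{
	assert(l->count < MAX_SHARERS);
	l->handler[l->count] = fn;
	l->dev_id[l->count] = dev_id;
	l->count++;
}

/* Deliver one interrupt: every sharer is called; returns how many claimed it. */
static int line_fire(struct shared_line *l)
{
	int claimed = 0;

	for (size_t i = 0; i < l->count; i++)
		if (l->handler[i](l->dev_id[i]) == RET_HANDLED)
			claimed++;
	return claimed;
}

static enum irq_ret my_handler(void *dev_id)
{
	struct my_dev *d = dev_id;

	if (!d->pending)
		return RET_NONE;             /* raised by another device on the line */
	assert(d->rx_counter != NULL);   /* fires if hooked before init finished */
	(*d->rx_counter)++;
	d->pending = false;
	return RET_HANDLED;
}
```

Reversing the two setup steps (hooking first, initializing `rx_counter` later) opens exactly the window Ben describes: an interrupt arriving in between would reach `my_handler()` before the driver state is valid.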
Re: [RFC] killing the NR_IRQS arrays.
Arnd Bergmann <[EMAIL PROTECTED]> writes:

>
> I have to admit I still don't really understand how this works
> at all. Can a driver that uses msi-x have different handlers
> for each of those interrupts registered simultaneously?

Yes, and the irqs can be routed to different cpus independently.
However, not all hardware supports all 4K irqs. 4K is the
implementation-defined maximum. Infiniband HCAs are rumored to support
a lot of irqs, though, and it isn't uncommon for simpler nics to
support 4 or so.

Conceptually, think of it as having an irq controller embedded in your
pci device. The MSI messages are writes to special addresses that are
then converted into CPU interrupts.

> I would expect that instead there should be only one 'struct irq'
> for the device, with the handler getting a 12 bit number argument.

No. That would be unnecessary coalescing. Even if that was what the
hardware layer gave us (and it doesn't), the generic layers should
demux these things as much as is reasonable.

>> > s390: got rid of irq numbers already
>>
>> Yes. I should really look at that more and see if I could bring
>> s390 into the generic irq code with my planned changes.
>
> I don't think there is much point in changing the s390 code, but
> the way it is solved there may be interesting for other buses
> as well. The interrupt handler there is not being registered
> explicitly, but is part of the driver (in case of subchannel)
> or of the device (in case of ccw_device) data structure.
>
> Similarly, in a pci device, one could imagine that the
> struct pci_driver contains an irq_handler_t member that
> is registered from the pci_device_probe() function
> if present.

Yes. There is some potential there. Although we would have to go
through an extra hoop to make it a pci specific handler type.

>> > Note that we can even start converting device drivers first, before
>> > moving away from irq numbers. A typical PCI driver should get
>> > somewhat simpler by the conversion, and when they are all converted,
>> > we can replace pci_dev->irq with a struct irq* under the covers.
>>
>> Reasonable if it is easy and straightforward.
>> Something like pci_request_irq(dev,) and the helper looks at
>> dev->irq under the covers and calls request_irq or whatever makes
>> sense. Is this what you are thinking? Examples would help me here.
>
> Ok, I had an example in one of my previous posts, but based on the
> discussion since then, it has become significantly simpler, basically
> reducing the work to
>
> /* returns 0 or a negative errno, like pci_irq_free() */
> int pci_irq_request(struct pci_dev *dev, irq_handler_t handler)
> {
>         if (!dev->irq)
>                 return -ENODEV;
>
>         return irq_request(dev->irq, handler, IRQF_SHARED,
>                            dev->driver->name, dev);
> }
>
> int pci_irq_free(struct pci_dev *dev)
> {
>         return irq_free(dev->irq, dev);
> }
>
> The most significant change of this to the current code
> would be that we can pass arguments down to irq_request
> automatically, e.g. the irq handler can always get the
> pci_device as its dev_id.

Yes. Mostly. Since dev_id is what is passed back to the irq handler,
it makes sense to pass the device when the irq is registered. Passing
the driver a pointer to the driver specific structure (not the pci
device) makes a lot more sense from an efficiency standpoint.

Now it may make sense to remove the irq parameter from irq_handler_t,
and require drivers to look at their dev_id to see which irq they
really are processing.

There is a danger here of making things so generic that you don't have
good performance, or that the code becomes unnecessarily complex.

>> For talking to user space I expect we will have numbers for a long time
>> to come yet.
>
> I was wondering about that. Do you only mean /proc/interrupts or
> are there other user interfaces we need to worry about?

Yes. There are other interfaces like /proc/irq/XXX/smp_affinity, for
irq migration. There are also device specific ioctls. There is lspci.
I don't know what all else, and given the current state of the kernel
it is hard to grep for.

> For /proc/interrupts, what could break if we have interrupt numbers
> only local to each controller and potentially duplicate numbers
> in the list? It's good to be paranoid about changes to proc files,
> but I can definitely see value in having meaningful interrupt
> numbers in there instead of making up a more or less random mapping
> to a flat number space.

Well, I can have meaningful flat numbers, and on i386 and x86_64,
except for msi, I have that. The problem is that for the numbers to
have meaning I get a very sparse usage of the numbers, because very
frequently the hardware interrupt controllers have pins that are not
connected up, so I have about an order of magnitude more numbers than
I have actual irqs in use. That is before I start reserving irq numbers
for MSI.

For MSI (since they cannot be shared) it would actually make a lot of
sense to make the numbers domain,bus,device,func,(Nth device irq) but
I can't because
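A sketch of the arithmetic behind a domain,bus,device,func,(Nth device irq) numbering, under assumed field widths: 16-bit domain, 8-bit bus, 5-bit device and 3-bit function follow the usual PCI addressing, and the 12-bit per-device index simply matches the 4K figure used in this thread. `msi_pack()` and `msi_unpack()` are hypothetical helpers. The sketch only shows that such a number needs 44 bits, more than a 32-bit `unsigned int` and far beyond anything an array indexed by irq number could cover; it does not claim to be Eric's reason.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field widths (assumptions, see text). */
#define VEC_BITS  12
#define FUNC_BITS 3
#define DEV_BITS  5
#define BUS_BITS  8
#define DOM_BITS  16
#define TOTAL_BITS (VEC_BITS + FUNC_BITS + DEV_BITS + BUS_BITS + DOM_BITS)

struct msi_id {
	uint16_t domain;
	uint8_t bus, device, function;
	uint16_t vector;   /* Nth interrupt of the device */
};

/* Pack the fields, most significant first: domain|bus|dev|func|vector. */
static uint64_t msi_pack(struct msi_id id)
{
	assert(id.device < (1u << DEV_BITS) && id.function < (1u << FUNC_BITS));
	assert(id.vector < (1u << VEC_BITS));
	uint64_t n = id.domain;
	n = (n << BUS_BITS) | id.bus;
	n = (n << DEV_BITS) | id.device;
	n = (n << FUNC_BITS) | id.function;
	n = (n << VEC_BITS) | id.vector;
	return n;
}

/* Inverse of msi_pack(): peel fields off from the least significant end. */
static struct msi_id msi_unpack(uint64_t n)
{
	struct msi_id id;

	id.vector = n & ((1u << VEC_BITS) - 1);    n >>= VEC_BITS;
	id.function = n & ((1u << FUNC_BITS) - 1); n >>= FUNC_BITS;
	id.device = n & ((1u << DEV_BITS) - 1);    n >>= DEV_BITS;
	id.bus = n & ((1u << BUS_BITS) - 1);       n >>= BUS_BITS;
	id.domain = (uint16_t)n;
	return id;
}
```

Even with domain 0 the remaining fields span 28 bits, i.e. a potential table of 2^28 entries, almost all of them unused.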
Re: [RFC] killing the NR_IRQS arrays.
Benjamin Herrenschmidt <[EMAIL PROTECTED]> writes:

>> What I really object to is not the irq numbers, as an arbitrary number
>> does not impose limits. What I object to is drivers that can't handle the
>> full range of numbers, and the limits imposed upon those numbers when
>> you require them to be indexes into an array.
>>
>> For talking to user space I expect we will have numbers for a long time
>> to come yet.
>
> I wouldn't bother too much about going into bus specific bits like
> irq_request(dev, ...). Well, actually, I _do_ think it's a good thing to
> pass the struct device to irq_request but that's a different issue
> completely.

Well, irq_request is probably a bit late for associating an irq with a
struct device. And I don't see how to get the device name from that, but
making that association wouldn't be a bad thing.

> I think bus types should provide bus specific helpers to obtain the
> struct irq *'s for a given device on that bus, but the API for
> requesting/freeing them shall remain generic.

Yes. But if you can do a good job of wrapping them so a driver wouldn't
need to care at all, that could help simplify drivers and remove one
more tidbit of complexity from them.

However, for a widespread change, the less you have to think about it,
the more quickly you can get it completed. So I would rather do several
widespread changes in succession that I don't have to think much about
(i.e. each change mostly meets the "obviously correct" requirement) than
do one single change that I have to think harder about to accomplish.

The more thinking comes into the picture, the more you open yourself up
to breaking something by accident because it wasn't clear how the code
should be changed, and the more it slows down the conversion.

Conversions where we have had to think about things are notoriously
slow to complete, even if they are a good thing to do. The examples I
can think of are the kthread API and the DMA API. Last I checked there
were still a few drivers in the kernel using virt_to_bus...

I don't really get the benefits I'm after unless the conversion can
actually be completed for everything. So the more I have to think about
any one piece, the less I intend to do it.

Eric
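One way to keep each conversion step mechanical and "obviously correct" is to keep the number-based entry point as a thin wrapper over the descriptor-based one, so drivers can move over one at a time. A userspace sketch under that assumption: `struct irq`, `irq_request()`, `irq_from_number()`, `request_irq_compat()` and `demo_handler()` are hypothetical names, not the kernel's API, and the number lookup is a plain list rather than an array sized by a maximum.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct irq;
typedef int (*irq_handler_fn)(struct irq *irq, void *dev_id);

/* Hypothetical descriptor: the object converted drivers would hold. */
struct irq {
	unsigned int number;     /* legacy number, kept for old callers and userspace */
	irq_handler_fn handler;
	void *dev_id;
	struct irq *next;        /* registry is a list, not an NR_IRQS-sized array */
};

static struct irq *irq_registry;

static void irq_register_desc(struct irq *irq)
{
	irq->next = irq_registry;
	irq_registry = irq;
}

/* New-style entry point: takes the descriptor, no number involved. */
static int irq_request(struct irq *irq, irq_handler_fn handler, void *dev_id)
{
	assert(irq != NULL && handler != NULL);
	if (irq->handler)
		return -EBUSY;
	irq->handler = handler;
	irq->dev_id = dev_id;
	return 0;
}

/* Number-to-descriptor lookup, needed only by the compatibility path. */
static struct irq *irq_from_number(unsigned int number)
{
	for (struct irq *p = irq_registry; p; p = p->next)
		if (p->number == number)
			return p;
	return NULL;
}

/* Old-style entry point, reduced to a mechanical wrapper. */
static int request_irq_compat(unsigned int number, irq_handler_fn handler,
			      void *dev_id)
{
	struct irq *irq = irq_from_number(number);

	if (!irq)
		return -EINVAL;
	return irq_request(irq, handler, dev_id);
}

/* Example handler for trying the sketch out. */
static int demo_handler(struct irq *irq, void *dev_id)
{
	(void)irq;
	(void)dev_id;
	return 1;
}
```

Unconverted callers keep using numbers (even large ones, since nothing is indexed by them), while converted callers pass the descriptor directly; removing the wrapper once nothing calls it is again a mechanical step.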
Re: [RFC] killing the NR_IRQS arrays.
>>> pci: each device/function has a unique irq, drivers need not know
>>> about it afaics.
>> Then there is msi and with msi-x you can have up to 4K irqs.
>
> I have to admit I still don't really understand how this works
> at all. Can a driver that uses msi-x have different handlers
> for each of those interrupts registered simultaneously?

Yes. It doesn't have to, though.

> I would expect that instead there should be only one 'struct irq'
> for the device, with the handler getting a 12 bit number argument.

Why? The device really generates many different interrupts; why hide
this fact?

>> For talking to user space I expect we will have numbers for a long time
>> to come yet.
>
> I was wondering about that. Do you only mean /proc/interrupts or
> are there other user interfaces we need to worry about?

There's the IRQ affinity stuff too.

> For /proc/interrupts, what could break if we have interrupt numbers
> only local to each controller and potentially duplicate numbers
> in the list? It's good to be paranoid about changes to proc files,
> but I can definitely see value in having meaningful interrupt
> numbers in there instead of making up a more or less random mapping
> to a flat number space.

Duplicate all this stuff into /sys in a sane format (*) and wait until
userland catches up, then throw away the /proc interfaces. It'll take
a while, and until that day you will have to keep *some* interrupt
number <-> interrupt bijection.

Userland tools that think they know what interrupt number should be
what are dead already.

(*) i.e., exposing the interrupt tree as a tree, cascaded controllers
and all.

Segher
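Segher's "the device really generates many different interrupts" and Eric's picture of an interrupt controller embedded in the device can be modelled as a per-vector table: each vector carries its own handler and its own target CPU, and delivery dispatches straight to the entry for the decoded vector instead of funnelling everything through one handler with a number argument. Everything here (`struct msix_vector`, `msix_deliver()`, the `cpu` field, the vector count) is a hypothetical userspace model, not the kernel's MSI code.

```c
#include <assert.h>
#include <stddef.h>

#define NVEC 4   /* e.g. a simple NIC with a handful of vectors */

struct msix_vector;
typedef void (*vec_handler_fn)(struct msix_vector *v);

/* One entry per vector of the (modelled) device-embedded controller. */
struct msix_vector {
	vec_handler_fn handler;  /* may differ for every vector */
	int cpu;                 /* target CPU, chosen independently per vector */
	unsigned long count;
};

struct msix_dev {
	struct msix_vector vec[NVEC];
};

/* Platform side: a message has been decoded to "vector n"; dispatch it. */
static int msix_deliver(struct msix_dev *d, unsigned int n, int *ran_on_cpu)
{
	assert(ran_on_cpu != NULL);
	if (n >= NVEC || !d->vec[n].handler)
		return -1;               /* unconfigured or out-of-range vector */
	*ran_on_cpu = d->vec[n].cpu;
	d->vec[n].handler(&d->vec[n]);
	return 0;
}

static void rx_handler(struct msix_vector *v) { v->count += 1; }
static void tx_handler(struct msix_vector *v) { v->count += 100; }
```

With one entry per vector, changing the affinity of the receive vector does not touch the transmit vector, which is what "routed to different cpus independently" amounts to in this model.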
Re: [RFC] killing the NR_IRQS arrays.
On Wednesday 28 February 2007, Eric W. Biederman wrote:
> Arnd Bergmann <[EMAIL PROTECTED]> writes:
>
> >
> > Introducing the irq_request() etc. functions that take a struct irq*
> > instead of an int sounds good, but I'd hope we can avoid using those
> > in device drivers and do a separate abstraction for each bus_type
> > that deals with interrupts. I'm not sure if that's possible for
> > each bus_type, but the ones I have worked with in the past should
> > allow that:
> >
> > pci: each device/function has a unique irq, drivers need not know
> > about it afaics.
> Then there is msi and with msi-x you can have up to 4K irqs.

I have to admit I still don't really understand how this works
at all. Can a driver that uses msi-x have different handlers
for each of those interrupts registered simultaneously?

I would expect that instead there should be only one 'struct irq'
for the device, with the handler getting a 12 bit number argument.

> > s390: got rid of irq numbers already
>
> Yes. I should really look at that more and see if I could bring
> s390 into the generic irq code with my planned changes.

I don't think there is much point in changing the s390 code, but
the way it is solved there may be interesting for other buses
as well. The interrupt handler there is not being registered
explicitly, but is part of the driver (in case of subchannel)
or of the device (in case of ccw_device) data structure.

Similarly, in a pci device, one could imagine that the
struct pci_driver contains an irq_handler_t member that
is registered from the pci_device_probe() function
if present.

> > Note that we can even start converting device drivers first, before
> > moving away from irq numbers. A typical PCI driver should get
> > somewhat simpler by the conversion, and when they are all converted,
> > we can replace pci_dev->irq with a struct irq* under the covers.
>
> Reasonable if it is easy and straightforward.
> Something like pci_request_irq(dev,) and the helper looks at
> dev->irq under the covers and calls request_irq or whatever makes
> sense. Is this what you are thinking? Examples would help me here.

Ok, I had an example in one of my previous posts, but based on the
discussion since then, it has become significantly simpler, basically
reducing the work to

/* returns 0 or a negative errno, like pci_irq_free() */
int pci_irq_request(struct pci_dev *dev, irq_handler_t handler)
{
        if (!dev->irq)
                return -ENODEV;

        return irq_request(dev->irq, handler, IRQF_SHARED,
                           dev->driver->name, dev);
}

int pci_irq_free(struct pci_dev *dev)
{
        return irq_free(dev->irq, dev);
}

The most significant change of this to the current code
would be that we can pass arguments down to irq_request
automatically, e.g. the irq handler can always get the
pci_device as its dev_id.

> For talking to user space I expect we will have numbers for a long time
> to come yet.

I was wondering about that. Do you only mean /proc/interrupts or
are there other user interfaces we need to worry about?

For /proc/interrupts, what could break if we have interrupt numbers
only local to each controller and potentially duplicate numbers
in the list? It's good to be paranoid about changes to proc files,
but I can definitely see value in having meaningful interrupt
numbers in there instead of making up a more or less random mapping
to a flat number space.

	Arnd <><
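Arnd's question about controller-local numbers, and Segher's suggestion of exposing the interrupt tree with cascaded controllers, both come down to naming an interrupt by its controller plus a local number. A hypothetical userspace sketch: `struct ctrl`, `struct irq_ref`, `irq_name()` and the `"pic0/gpio1:7"` format are invented for illustration and are not an existing /proc or sysfs format. Two interrupts with the same local number 7 on different controllers still get distinct names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of cascaded interrupt controllers. */
struct ctrl {
	const char *name;
	const struct ctrl *parent;   /* NULL for the root controller */
};

struct irq_ref {
	const struct ctrl *ctrl;
	unsigned int hwirq;          /* local to ctrl; may repeat on other controllers */
};

/* Write "root/.../leaf" into buf; returns length, or -1 if it does not fit. */
static int ctrl_path(const struct ctrl *c, char *buf, size_t len)
{
	int used = 0;

	if (c->parent) {
		used = ctrl_path(c->parent, buf, len);
		if (used < 0)
			return -1;
	}
	int m = snprintf(buf + used, len - (size_t)used, "%s%s",
			 c->parent ? "/" : "", c->name);
	if (m < 0 || (size_t)(used + m) >= len)
		return -1;               /* truncated */
	return used + m;
}

/* Full name: controller path, then ":" and the controller-local number. */
static int irq_name(struct irq_ref r, char *buf, size_t len)
{
	int used = ctrl_path(r.ctrl, buf, len);

	if (used < 0)
		return -1;
	int m = snprintf(buf + used, len - (size_t)used, ":%u", r.hwirq);
	if (m < 0 || (size_t)(used + m) >= len)
		return -1;
	return used + m;
}
```

For a root controller `pic0` with a cascaded `gpio1` below it, local interrupt 7 on each yields `pic0:7` and `pic0/gpio1:7` respectively, so duplicate local numbers stop being ambiguous.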
Re: [RFC] killing the NR_IRQS arrays.
> What I really object to is not the irq numbers, as an arbitrary number
> does not impose limits. What I object to is drivers that can't handle the
> full range of numbers, and the limits imposed upon those numbers when
> you require them to be indexes into an array.
>
> For talking to user space I expect we will have numbers for a long time
> to come yet.

I wouldn't bother too much about going into bus specific bits like
irq_request(dev, ...). Well, actually, I _do_ think it's a good thing to
pass the struct device to irq_request, but that's a different issue
completely.

I think bus types should provide bus specific helpers to obtain the
struct irq *'s for a given device on that bus, but the API for
requesting/freeing them shall remain generic.

Ben.
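Ben's split (bus-specific helpers to look up the `struct irq *`, a generic request/free API), combined with Eric's remark that `dev_id` is most useful when it points at the driver's own structure, might look like the following. A hypothetical userspace sketch: `struct fake_pci_dev`, `pci_dev_irq()`, `irq_request()`, `irq_free()` and `struct mydrv_priv` are invented names; only the shape of the calls is the point.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef int (*handler_fn)(void *dev_id);

/* Generic descriptor (hypothetical). */
struct irq {
	handler_fn handler;
	void *dev_id;
	const char *name;
};

/* Bus side: the device owns its descriptors; a helper hands out the Nth one. */
struct fake_pci_dev {
	struct irq *irqs;
	unsigned int nr_irqs;
};

static struct irq *pci_dev_irq(struct fake_pci_dev *pdev, unsigned int index)
{
	return index < pdev->nr_irqs ? &pdev->irqs[index] : NULL;
}

/* Generic side: identical request/free for every bus type. */
static int irq_request(struct irq *irq, handler_fn handler, const char *name,
		       void *dev_id)
{
	if (!irq)
		return -ENODEV;
	if (irq->handler)
		return -EBUSY;
	irq->handler = handler;
	irq->name = name;
	irq->dev_id = dev_id;
	return 0;
}

static void irq_free(struct irq *irq, void *dev_id)
{
	assert(irq->dev_id == dev_id);   /* only the requester may free it */
	irq->handler = NULL;
	irq->dev_id = NULL;
	irq->name = NULL;
}

/* Driver side: dev_id is the driver's own state, so the handler needs no lookup. */
struct mydrv_priv {
	struct fake_pci_dev *pdev;
	unsigned int events;
};

static int mydrv_handler(void *dev_id)
{
	struct mydrv_priv *priv = dev_id;

	priv->events++;
	return 1;
}
```

Because `pci_dev_irq()` returns NULL for a missing index and `irq_request()` turns NULL into `-ENODEV`, a driver can chain the two calls without a separate check; whether a real API should behave that way is a design choice, not something the thread settles.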
Re: [RFC] killing the NR_IRQS arrays.
Arnd Bergmann <[EMAIL PROTECTED]> writes:

>
> Introducing the irq_request() etc. functions that take a struct irq*
> instead of an int sounds good, but I'd hope we can avoid using those
> in device drivers and do a separate abstraction for each bus_type
> that deals with interrupts. I'm not sure if that's possible for
> each bus_type, but the ones I have worked with in the past should
> allow that:
>
> pci: each device/function has a unique irq, drivers need not know
> about it afaics.

Then there is msi, and with msi-x you can have up to 4K irqs.

> isa/pnp: numbers from 1 to 15 are the right abstraction here, that's
> how isa has worked for ages.

There is some truth there, yes. But for ISA the numbers are really 0 to 15.

> s390: got rid of irq numbers already

Yes. I should really look at that more and see if I could bring
s390 into the generic irq code with my planned changes.

> ofw: an open firmware device can have a number of interrupts, but
> like PCI, the driver only needs to know things like 'first
> irq of this device', not how it's connected

Yes.

> ps3: irqs are requested from the firmware for each device, this
> can happen under the covers.
> mmc, usb, phy, ieee1394: these already have a high-level abstraction
> for interrupt events
> platform: dunno, probably these really should use the struct irq
> directly
> eisa, mca, pcmcia, zorro, ...: no idea, but possibly similar to PCI.

Largely, for pci we already have dev->irq and the device just calls
request_irq to get things going. The challenge is that the token in
dev->irq gets looked at.

> Note that we can even start converting device drivers first, before
> moving away from irq numbers. A typical PCI driver should get
> somewhat simpler by the conversion, and when they are all converted,
> we can replace pci_dev->irq with a struct irq* under the covers.

Reasonable if it is easy and straightforward.
Something like pci_request_irq(dev,) and the helper looks at
dev->irq under the covers and calls request_irq or whatever makes
sense. Is this what you are thinking? Examples would help me here.

What I really object to is not the irq numbers, as an arbitrary number
does not impose limits. What I object to is drivers that can't handle the
full range of numbers, and the limits imposed upon those numbers when
you require them to be indexes into an array.

For talking to user space I expect we will have numbers for a long time
to come yet.

Eric
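Eric's objection is to the array, not to the numbers. A userspace sketch of the difference: `NR_IRQS_MODEL`, `fixed_lookup()`, `sparse_get()` and `sparse_count()` are hypothetical, and a linked list is used only for brevity (a real implementation would want a faster lookup). The fixed table cannot represent number 200 at all, while the sparse scheme allocates a descriptor only for numbers actually in use, whatever their value.

```c
#include <assert.h>
#include <stdlib.h>

#define NR_IRQS_MODEL 16   /* hypothetical compile-time limit */

struct desc {
	unsigned int number;
	unsigned long count;
	struct desc *next;
};

/* Array-indexed scheme: every usable number must be < the array size. */
static struct desc fixed_table[NR_IRQS_MODEL];

static struct desc *fixed_lookup(unsigned int number)
{
	return number < NR_IRQS_MODEL ? &fixed_table[number] : NULL;
}

/* Sparse scheme: descriptors exist only for numbers actually in use. */
static struct desc *sparse_head;

static struct desc *sparse_get(unsigned int number)
{
	for (struct desc *d = sparse_head; d; d = d->next)
		if (d->number == number)
			return d;            /* already allocated */

	struct desc *nd = calloc(1, sizeof(*nd));
	if (!nd)
		return NULL;
	nd->number = number;
	nd->next = sparse_head;
	sparse_head = nd;
	return nd;
}

static unsigned int sparse_count(void)
{
	unsigned int n = 0;

	for (struct desc *d = sparse_head; d; d = d->next)
		n++;
	return n;
}
```

With sparse, "meaningful" numbers of the kind Eric describes (many unconnected pins, reserved MSI ranges) cost nothing for the gaps: only the two numbers requested here ever get a descriptor.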
Re: [RFC] killing the NR_IRQS arrays.
On Tuesday 27 February 2007, Eric W. Biederman wrote:
> * Add a variation of the API in interrupt.h that uses
>   "struct irq *irq" instead of "unsigned int irq"
>
>   Probably replacing request_irq with irq_request or something
>   trivial like that.
>
>   This will need to touch all of the different irq implementation back
>   ends, but only very lightly.
>
> * Convert the generic irq code to use struct irq * everywhere it
>   currently uses "unsigned int irq".
>
> * Start on the conversions of drivers and subsystems picking on
>   the easy ones first :)

Introducing the irq_request() etc. functions that take a struct irq*
instead of an int sounds good, but I'd hope we can avoid using those
in device drivers and do a separate abstraction for each bus_type
that deals with interrupts. I'm not sure if that's possible for
each bus_type, but the ones I have worked with in the past should
allow that:

pci: each device/function has a unique irq, drivers need not know
about it afaics.

isa/pnp: numbers from 1 to 15 are the right abstraction here, that's
how isa has worked for ages.

s390: got rid of irq numbers already

ofw: an open firmware device can have a number of interrupts, but
like PCI, the driver only needs to know things like 'first
irq of this device', not how it's connected

ps3: irqs are requested from the firmware for each device, this
can happen under the covers.

mmc, usb, phy, ieee1394: these already have a high-level abstraction
for interrupt events

platform: dunno, probably these really should use the struct irq
directly

eisa, mca, pcmcia, zorro, ...: no idea, but possibly similar to PCI.

Note that we can even start converting device drivers first, before
moving away from irq numbers. A typical PCI driver should get
somewhat simpler by the conversion, and when they are all converted,
we can replace pci_dev->irq with a struct irq* under the covers.

Arnd <><
Re: [RFC] killing the NR_IRQS arrays.
A quick update. I did some work on this and have some observations.

- Every back end irq implementation seems to have a different name for
  the structure that describes irqs. So picking struct irq, which is
  different from everything, seems to make sense. At the very least an
  empty struct irq can be embedded in the architecture specific irq
  description structure and we can use container_of to get back to it.

- The name struct irq conflicts with a spurious definition in
  include/pcmcia/cs.h, but otherwise is fine. So it probably makes
  sense to use.

- Updating all of the drivers is going to be a pain for precisely the
  reason we need to update the drivers: irq numbers are not handled at
  all cleanly. In attempting just a few conversions I have seen irq
  numbers stashed in everything from a u8 to a u64. I have seen all
  values <= 0 thrown out as invalid. So we have a decade or more of
  accumulated inconsistencies and it is getting painful to work with.
  Changing the type to something that won't support all of the old
  hacks should help clean up the code.

- Because drivers are not consistent in their handling of irq numbers,
  whatever path I take to a conversion each patch needs to be thought
  about instead of just performed. So I'm going to have to double the
  API so a gradual conversion is possible.

- Converting the genirq code to use struct irq_desc pointers throughout
  (instead of unsigned int irq) is straightforward and mindless. Though
  it is tedious. Like I expected the drivers to be :(

  So it looks like all I need to do to convert the genirq backend is to
  break the work up into small enough patches that each patch is
  obviously correct. Then compiling on the different architectures can
  just serve as a spot check, not as an absolutely required step during
  the code conversion.

- The converted genirq code was shorter and easier to follow, mostly
  because I got to kill all of the if (irq >= NR_IRQS) tests...
  Null pointer dereferences are your friends!

- There are only 3 or 4 arrays of size NR_IRQS in non-architecture code
  and I have patches in my queue to kill them, so that isn't too bad.

- All of the drivers that handle irqs need to be touched because one of
  the parameters to the irq handler is the interrupt number. So that
  needs to be converted.

So I think the path should be:

* Kill the arrays of size NR_IRQS in non-arch code.

* Add a variation of the API in interrupt.h that uses
  "struct irq *irq" instead of "unsigned int irq"

  Probably replacing request_irq with irq_request or something
  trivial like that.

  This will need to touch all of the different irq implementation back
  ends, but only very lightly.

* Convert the generic irq code to use struct irq * everywhere it
  currently uses "unsigned int irq".

* Start on the conversions of drivers and subsystems picking on
  the easy ones first :)

* Add for_each_irq_desc() and similar helpers to the generic irq code.

* Add support in the generic irq code for architectures that don't
  have a giant array of irqs.

* Convert x86_64 and i386 to dynamically allocate their irqs.

  Routines using the old interfaces will no longer be O(1), more likely
  O(N). So they will be slow, but request_irq and free_irq are nowhere
  near the fast path so it doesn't matter. enable_irq and disable_irq
  are the only cases that might matter, and they occur rarely enough
  that fixing the drivers that matter should not be a problem.

* Ultimately finish converting all of the drivers and remove the
  compatibility cruft.

I will look at getting things started and some patches into -mm
sometime next month.

Eric
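
Several of Eric's points (embedding struct irq, killing the NR_IRQS arrays, O(N) compatibility lookups) can be illustrated together. This is a user-space sketch, not genirq code: struct arch_irq_desc, arch_alloc_irq(), arch_hw_pin(), irq_from_nr() and the list-based for_each_irq_desc() are hypothetical. It shows a generic struct irq embedded in an arch-specific descriptor and recovered with container_of, descriptors allocated on demand instead of living in an NR_IRQS-sized array, and the old number-based lookup becoming an O(N) walk that returns NULL instead of needing an (irq >= NR_IRQS) range check. Eric suggests an empty struct irq; ISO C does not allow empty structs, so a placeholder member is used here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* User-space version of the kernel's container_of idea (no type checking). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Generic handle drivers would see (placeholder member: see lead-in). */
struct irq {
	int unused;
};

/* Hypothetical arch-specific descriptor embedding the generic handle. */
struct arch_irq_desc {
	unsigned int nr;             /* legacy number, kept for user space */
	unsigned int hw_pin;         /* arch-private detail */
	struct irq irq;              /* generic handle handed to drivers */
	struct arch_irq_desc *next;  /* on-demand list instead of NR_IRQS array */
};

static struct arch_irq_desc *irq_list;

#define for_each_irq_desc(d) for ((d) = irq_list; (d) != NULL; (d) = (d)->next)

static struct irq *arch_alloc_irq(unsigned int nr, unsigned int hw_pin)
{
	struct arch_irq_desc *d = calloc(1, sizeof(*d));

	if (!d)
		return NULL;
	d->nr = nr;
	d->hw_pin = hw_pin;
	d->next = irq_list;
	irq_list = d;
	return &d->irq;
}

/* Arch code gets back to its own descriptor from the generic handle. */
static unsigned int arch_hw_pin(struct irq *irq)
{
	return container_of(irq, struct arch_irq_desc, irq)->hw_pin;
}

/* Compatibility path for number-based callers: an O(N) walk, acceptable
 * for request_irq/free_irq, which are not on the fast path. */
static struct irq *irq_from_nr(unsigned int nr)
{
	struct arch_irq_desc *d;

	for_each_irq_desc(d)
		if (d->nr == nr)
			return &d->irq;
	return NULL;  /* unknown number: no range check against NR_IRQS */
}
```

Number 1000 works here without any NR_IRQS limit; the cost is the linear search, which only the compatibility path pays.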
Re: [RFC] killing the NR_IRQS arrays.
On Sun, 2007-02-18 at 22:24 +0100, Arjan van de Ven wrote:
> On Fri, 2007-02-16 at 13:41 +0100, Ingo Molnar wrote:
> > * Eric W. Biederman <[EMAIL PROTECTED]> wrote:
> >
> > > So I propose we remove all assumptions from the code that we actually
> > > have an array of irqs. That will allow for irq_desc to be dynamically
> > > allocated instead of statically allocated saving memory and reducing
> > > kernel complexity.
> >
> > hm. I'd suggest to do this without changing request_irq() - and then we
> > could avoid the 'massive, every driver affected' change, right?
>
> if request_irq() changes we might as well make a variant that takes a
> PCI device struct rather than a number, for the 99% of PCI drivers that
> use that.. (and then msi and other stuff becomes simpler :)

As a matter of fact, if IRQs have to be handled properly as resources of
their respective devices, I think the request_irq replacement should take
a struct device...

In fact, having IRQs hanging off their respective devices would give a
proper way to access them via sysfs and implement the affinity etc...
thus providing a long term replacement for the current number based APIs.

In addition, to facilitate the job of things like IRQ balancing daemons,
a /sys/irqs/ could be created containing symlinks to all irqs in the
system.

Ben.
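
A minimal sketch of Ben's suggestion that the request_irq replacement take a struct device, with interrupts held as resources of the device. Everything here is an assumption for illustration: the cut-down struct device and struct irq, the device_request_irq()/device_free_irq() names, and the exclusive-use rule. Index 0 plays the role of Arnd's "first irq of this device".

```c
#include <assert.h>
#include <stddef.h>

/* Simplified IRQ resource; field names are assumptions for this sketch. */
struct irq {
	unsigned int nr;    /* user-visible number, kept only for display */
	int requested;
};

/* Simplified device: interrupts are resources of the device, like MMIO. */
struct device {
	const char *name;
	struct irq *irqs;
	size_t nr_irqs;
};

/* Hypothetical request_irq() replacement keyed on (device, index). */
static struct irq *device_request_irq(struct device *dev, size_t index)
{
	struct irq *irq;

	if (index >= dev->nr_irqs)
		return NULL;        /* the device has no such interrupt */
	irq = &dev->irqs[index];
	if (irq->requested)
		return NULL;        /* exclusive use in this toy model */
	irq->requested = 1;
	return irq;
}

static void device_free_irq(struct irq *irq)
{
	irq->requested = 0;
}
```

Because the lookup starts from the device, the same per-device list is also a natural place to hang sysfs attributes such as affinity, which is the long-term direction Ben describes.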
Re: [RFC] killing the NR_IRQS arrays.
On Fri, 2007-02-16 at 13:41 +0100, Ingo Molnar wrote:
> * Eric W. Biederman <[EMAIL PROTECTED]> wrote:
>
> > So I propose we remove all assumptions from the code that we actually
> > have an array of irqs. That will allow for irq_desc to be dynamically
> > allocated instead of statically allocated saving memory and reducing
> > kernel complexity.
>
> hm. I'd suggest to do this without changing request_irq() - and then we
> could avoid the 'massive, every driver affected' change, right?

if request_irq() changes we might as well make a variant that takes a
PCI device struct rather than a number, for the 99% of PCI drivers that
use that.. (and then msi and other stuff becomes simpler :)
Re: [RFC] killing the NR_IRQS arrays.
> Except for what appears to be instability of the irq numbers on
> simpler configurations I don't have a problem with it.

I agree that's a bit annoying and I believe it can be fixed. In addition,
I'd like to look into exposing the domain/HW number -> virq mapping
somewhere in sysfs maybe.

> Until we find a solution for the user space side of things we seem to
> need the unsigned int irq number for user space. Now I don't want
> people mapping back and forth which is why I don't intend to provide a
> reverse function.

Ok.

> But of course there will be a for_each_irq in the genirq layer so if
> people really want to they will be able to go from the linux irq to
> an irq_desc. But we don't have to export that generically (except
> possibly something for the isa irqs).

Ok.

Ben.
Re: [RFC] killing the NR_IRQS arrays.
> Because I don't have something better to replace them with.
>
> We need names for irqs, currently the kernel/user space interface
> is an unsigned number.

.../...

Ok, as long as it's strictly a userspace issue, I understand.

> The model can be made to work if you force it but it isn't really
> a good fit.
>
> I can't really use the (cpu#, vector#) tuple as hw number as it
> varies at runtime, and a single interrupt can send different (cpu#,
> vector#) tuples from one interrupt message to the next without being
> reprogrammed. At least I don't have the impression that you support
> multiple hardware numbers going to the same linux irq. But this
> really is the layer where I need the reverse mapping. However I can
> optimize the reverse mapping by taking advantage of the per cpu
> nature.
>
> Currently the hardware number that I use is the pin number on
> the ioapic. And to form the linux irq I just add the number
> of pins of all previous ioapics together and then add my pin number.
> Fairly simple.

Ok.

> Doing the above gives me stable names that are the same from one boot
> to the next if someone doesn't change how the hardware is put
> together. It looks to me that if I adapt the ppc scheme my irq
> numbers will change from one boot to the next, one kernel to the next,
> almost at random.

Not necessarily. The current code would, though in practice it doesn't
change much, but as I said, it would be trivial to change it so that a
domain using a linear reverse map is fully "allocated" when initialized.

Stable numbers are somewhat useful for users, thus one of the things the
ppc code tries to do in order to get "mostly" stable numbers/names as
well is to try to use the HW number as a "starting" value when searching
for a linux irq number to assign. Only if it's not possible (the HW
number is in the reserved ISA range, too big, or the matching linux
number is already used) will it allocate a new one.
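
The allocation policy Ben describes (try virq == HW number first, and fall back to a fresh number when the HW number is in the reserved ISA range, too big, or already taken) can be sketched as follows. This is a simplified model, not the powerpc code: the 16-entry ISA range, the 256-entry virq space and the function name are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_ISA_INTERRUPTS 16   /* reserved legacy range (assumed value) */
#define MAX_VIRQ 256            /* size of the toy virq space (assumption) */

static bool virq_used[MAX_VIRQ];

/* Returns the chosen linux irq (virq), or -1 if the space is full. */
static int virq_alloc_with_hint(unsigned long hwirq)
{
	unsigned int v;

	/* Prefer virq == hwirq so numbers stay stable from boot to boot. */
	if (hwirq >= NUM_ISA_INTERRUPTS && hwirq < MAX_VIRQ && !virq_used[hwirq]) {
		virq_used[hwirq] = true;
		return (int)hwirq;
	}
	/* Otherwise take the first free number above the ISA range. */
	for (v = NUM_ISA_INTERRUPTS; v < MAX_VIRQ; v++) {
		if (!virq_used[v]) {
			virq_used[v] = true;
			return (int)v;
		}
	}
	return -1;
}
```

When two controllers use overlapping HW numbers, the second one falls back to whatever is free, so its numbers depend on probe order; that is the "mostly" in "mostly stable".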

I still think I need to add the domain and HW number to /proc/interrupts
though (or to some other sysfs file) in order to provide the mapping to
virqs to userland.

Also, if the existing linear and radix tree reverse maps don't fit your
needs, you can create a new one or request no reverse map from the core
at all and do everything in your arch code.

Basically, what I'm saying is that I'd like the concept of domain &
generic remappers to become generic. sparc64 and ppc use it and I'm
convinced it would be useful to others, especially archs with lots of
cascaded PICs like ARM. And by doing so, we can also make it easier to
expose the HW -> virq mapping to userland in a common place with a
consistent format since it would be done by generic code.

Cheers,
Ben.
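
A sketch of a per-controller numbering domain with a linear reverse map (the simplest of the reverse-map flavours Ben mentions), plus one line of a hypothetical userland-visible mapping. The struct and function names, the "virq domain hwirq" line format and the use of 0 as "no mapping" are assumptions; a radix-tree map would replace the array for controllers with very large HW numbers.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NO_IRQ 0  /* "no mapping" marker; virq 0 is treated as invalid here */

/* A numbering domain local to one interrupt controller (simplified). */
struct irq_domain_sketch {
	const char *name;
	unsigned int size;        /* number of hw interrupt lines on this PIC */
	unsigned int *revmap;     /* linear reverse map: hwirq -> virq */
};

static int domain_init(struct irq_domain_sketch *d, const char *name, unsigned int size)
{
	d->name = name;
	d->size = size;
	d->revmap = calloc(size, sizeof(*d->revmap));  /* all entries NO_IRQ */
	return d->revmap ? 0 : -1;
}

static int domain_map(struct irq_domain_sketch *d, unsigned int hwirq, unsigned int virq)
{
	if (hwirq >= d->size || virq == NO_IRQ)
		return -1;
	d->revmap[hwirq] = virq;
	return 0;
}

/* Interrupt entry path: hw number reported by the PIC -> virq in O(1). */
static unsigned int domain_find(const struct irq_domain_sketch *d, unsigned int hwirq)
{
	return hwirq < d->size ? d->revmap[hwirq] : NO_IRQ;
}

/* One line of a hypothetical mapping file produced by generic code. */
static int domain_format(char *buf, size_t len, const struct irq_domain_sketch *d,
			 unsigned int hwirq)
{
	return snprintf(buf, len, "%u %s %u", domain_find(d, hwirq), d->name, hwirq);
}
```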
Re: [RFC] killing the NR_IRQS arrays.
Benjamin Herrenschmidt <[EMAIL PROTECTED]> writes:

> On Sat, 2007-02-17 at 02:06 -0700, Eric W. Biederman wrote:
> However, PowerPC is a good example because it has such a diversity of
> very different hardware setups to deal with, ranging from the multiple
> layers of cascading controllers all over the place, to interrupts
> packets encoding vector/target etc... a bit like x86 on cell, to
> hypervisors providing a single giant number space etc etc etc...
>
> Thus, it is extremely likely that something that works well for PowerPC
> (or for ARM for that matter as it's probably as a "colorful" environment
> as PowerPC is) will end up being useful for others.

Sure, I agree. Part of what I'm trying to say is that it appears that
basic interrupt handling assumptions seem to be inherent to the
architectures. And as much as it surprises me, because of basic
assumptions I don't think there is any architecture with every flavor
of color.

>> I have a version of the x86 code with a partial conversion done and
>> I didn't need a reverse mapping. What you call the hardware interrupt
>> number never happens to be interesting to me after the system is setup.
>
> Because you have the ability to tell your PIC to give you your "linux"
> interrupt number when actually sending the interrupt to the processor ?
> You need a way to get to the irq_desc * when getting an IRQ, either you
> have a way to map HW numbers back to irq_desc * in software, or your HW
> allows you to do it.

I don't think it is totally foreign, but in essence I have two kinds of
hardware number. An (apic, pin) pair that I need when talking to the
hardware itself and a (cpu, vector) pair that I use when handling an
interrupt. The vector number has never been the linux irq number but
at times it has only needed a simple offset adjustment. Now that we
are having to handle bigger cases only the (apic, pin) pairs that are
actually used get a (cpumask_t, vector) assigned to them.
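
Eric's two kinds of hardware number can be sketched with a per-cpu table indexed by the vector the CPU received. In this user-space model (CPU and vector counts, names, and a plain bitmask standing in for cpumask_t are all assumptions, not the x86 code), the (apic, pin) pair lives in the descriptor for programming the hardware, and the reverse map on interrupt entry is a per-cpu lookup, so no global irq number is needed and the same vector can carry a different interrupt on another CPU.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NR_CPUS 4
#define SKETCH_NR_VECTORS 256

struct irq_desc_sketch {
	unsigned int apic, pin;   /* what we need when programming the hardware */
};

/* Per-cpu reverse map used on interrupt entry: (cpu, vector) -> descriptor. */
static struct irq_desc_sketch *vector_desc[SKETCH_NR_CPUS][SKETCH_NR_VECTORS];

/* Route a descriptor to one vector on every cpu in the mask; the interrupt
 * may arrive on any of them, so each of those cpus' tables gets an entry. */
static int assign_vector(struct irq_desc_sketch *desc, unsigned int cpumask,
			 unsigned int vector)
{
	unsigned int cpu;

	if (vector >= SKETCH_NR_VECTORS)
		return -1;
	for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		if ((cpumask & (1u << cpu)) && vector_desc[cpu][vector] != NULL)
			return -1;    /* vector already busy on that cpu */
	for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		if (cpumask & (1u << cpu))
			vector_desc[cpu][vector] = desc;
	return 0;
}

/* Interrupt entry: the cpu knows its own id and the vector it received. */
static struct irq_desc_sketch *vector_to_desc(unsigned int cpu, unsigned int vector)
{
	if (cpu >= SKETCH_NR_CPUS || vector >= SKETCH_NR_VECTORS)
		return NULL;
	return vector_desc[cpu][vector];
}
```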

It may be that the only difference from the cell is that I have a very
small vector number I have to cope with instead of being able to tell
the irq controller to give me something immediately useful.

> I'm saying that if we're going to change the IRQ stuff that deeply, it
> would be nice if we looked into some of that stuff I've done that I
> believe would be of use for other archs.

Reasonable. For the first pass when I do the genirq conversion passing
struct irq_desc *irq instead of unsigned int irq, I should be able to do
something stupid and correct on all of the architectures. When we start
taking advantage of the new freedom, though, generic helpers can be good.

> I found it overall very useful to have a generic remapping core and have
> cascaded PIC setups have a numbering domain local to a given PIC (pretty
> much, a domain != an irq_chip) and I'm convinced it would make life
> easier for archs with similar setups. The remapping core also shows its
> usefulness on archs with very big interrupt numbers, like sparc or
> pSeries ppc, and possibly others.

Except for what appears to be instability of the irq numbers on
simpler configurations I don't have a problem with it.

> Now, I -do- have a problem with one aspect of your proposed design which
> is to keep the "linux" interrupt number in the generic irq_desc, which I
> think defeats most of the purpose of moving away from those linux irq
> numbers. If you do so, then I'll have to keep a separate remapping layer
> and keep a mechanism for virtualizing linux numbers.

Until we find a solution for the user space side of things we seem to
need the unsigned int irq number for user space. Now I don't want
people mapping back and forth which is why I don't intend to provide a
reverse function.

But of course there will be a for_each_irq in the genirq layer so if
people really want to they will be able to go from the linux irq to
an irq_desc.
But we don't have to export that generically (except possibly
something for the isa irqs).

Eric
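
Keeping the number purely as a user-visible name could look like this: the descriptor stores it, and everything that prints it goes through one accessor. irq_nr() and irq_name() are the names Eric uses elsewhere in this thread; their bodies, the field name and format_probe_msg() are assumptions for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Generic descriptor, cut down to the field relevant here. */
struct irq_desc {
	unsigned int irq;   /* linux irq number, kept only as a user-visible name */
};

/* The one sanctioned way to read the number, so a grep for irq_nr
 * finds every place it still reaches user space. */
static unsigned int irq_nr(const struct irq_desc *desc)
{
	return desc->irq;
}

/* String form that could later replace irq_nr(); for now it just prints
 * the number, so existing user-visible names do not change. */
static const char *irq_name(const struct irq_desc *desc, char *buf, size_t len)
{
	snprintf(buf, len, "%u", irq_nr(desc));
	return buf;
}

/* A driver's probe message prints the name, never a raw pointer. */
static int format_probe_msg(char *buf, size_t len, const char *drv,
			    const struct irq_desc *desc)
{
	char name[16];

	return snprintf(buf, len, "%s: using irq %s", drv,
			irq_name(desc, name, sizeof(name)));
}
```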
Re: [RFC] killing the NR_IRQS arrays.
Benjamin Herrenschmidt <[EMAIL PROTECTED]> writes:

>> The only time it really makes sense to me to let the irq number vary
>> arbitrarily are when things are truly dynamic, like with MSI, a
>> hypervisor, or hot plug interrupt controllers.
>
> I don't understand why you would go to all that length to replace irq
> numbers with irq_desc * and ... keep the numbers :-)

Because I don't have something better to replace them with.

We need names for irqs, currently the kernel/user space interface
is an unsigned number.

Printing out a pointer where we currently have an integer in:
/proc/interrupts
/proc/irq/N/...
/sys/devices/pci:00/:00:0e.0/irq
is a bad practice, and if I don't retain the number that is my only choice.

A similar problem exists in all of the initialization messages from
device drivers that display their irq number. Plus I think there are
also a few ioctls that return the linux irq number.

Now it may make sense to replace my irq_nr() with irq_name(), and
return a string that can be used instead, but fixing the kernel/user
space interface is a third step that is a lot more delicate and will
require more thinking. So I would prefer to put that off until all of
the internal users are using a pointer. Then we can grep for irq_nr
and see how many places we actually export the irq number to user
space. The fact that user space has been put in charge of when to
migrate an irq from one cpu to another makes this doubly delicate.

>> Sure, and I have the same issue with a big "DESIGNED FOR ppc" in the middle,
>> or "DESIGNED FOR arch/x". However the unfortunate truth is that the x86
>> has enough volume that frequently other architectures use some x86
>> hardware and thus get some of x86's warts. So anything that doesn't
>> cope with the x86's warts is frequently doomed to failure.
>
> I fail to see how what I described would not apply nicely to x86 ..

The model can be made to work if you force it but it isn't really a
good fit.
I can't really use the (cpu#, vector#) tuple as the hw number, as it varies at runtime, and a single interrupt can send different (cpu#, vector#) tuples from one interrupt message to the next without being reprogrammed. At least I don't have the impression that you support multiple hardware numbers going to the same linux irq. But this really is the layer where I need the reverse mapping. However, I can optimize the reverse mapping by taking advantage of its per-cpu nature.

Currently the hardware number that I use is the pin number on the ioapic. To form the linux irq I just add together the number of pins of all previous ioapics and then add my pin number. Fairly simple.

Doing the above gives me stable names that are the same from one boot to the next as long as nobody changes how the hardware is put together. It looks to me that if I adopt the ppc scheme my irq numbers will change from one boot to the next, and from one kernel to the next, almost at random, depending on driver initialization order and similar things. Having names that change all of the time is confusing and not very useful. The fact that in the process of making my names stable they happen to reflect part of the irq hardware topology is incidental.

Giving up stable names is not something I want to do.

Eric
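A minimal C sketch of the stable numbering scheme described above, assuming a hypothetical `struct ioapic_info` that records only each IO-APIC's pin count: the linux irq for a pin is the total pin count of all earlier IO-APICs plus the pin number. Both helper functions are illustrative, not the actual x86 code; the point is only that the result depends on the hardware layout and not on driver probe order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-IO-APIC description; only the pin count matters here. */
struct ioapic_info {
	unsigned int nr_pins;		/* "usually 24 or so" */
};

/*
 * Stable linux irq for (IO-APIC index, pin): sum the pin counts of all
 * earlier IO-APICs, then add the pin number. Depends only on how the
 * hardware is put together, not on driver initialization order.
 */
static unsigned int ioapic_pin_to_irq(const struct ioapic_info *apics,
				      size_t nr_apics, size_t apic,
				      unsigned int pin)
{
	unsigned int base = 0;
	size_t i;

	assert(apic < nr_apics);
	assert(pin < apics[apic].nr_pins);

	for (i = 0; i < apic; i++)
		base += apics[i].nr_pins;
	return base + pin;
}

/* Inverse walk: recover (IO-APIC index, pin) from an irq; -1 if out of range. */
static int irq_to_ioapic_pin(const struct ioapic_info *apics, size_t nr_apics,
			     unsigned int irq, size_t *apic, unsigned int *pin)
{
	size_t i;

	for (i = 0; i < nr_apics; i++) {
		if (irq < apics[i].nr_pins) {
			*apic = i;
			*pin = irq;
			return 0;
		}
		irq -= apics[i].nr_pins;
	}
	return -1;
}
```

With IO-APICs of 24, 24 and 32 pins, pin 7 on the third one becomes irq 55 on every boot, and the inverse walk recovers (2, 7) from 55.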
Re: [RFC] killing the NR_IRQS arrays.
Benjamin Herrenschmidt <[EMAIL PROTECTED]> writes:

>> We might need this. But I don't think we need reference counting in
>> the traditional sense. For all practical purposes we already have
>> dynamic irq allocation and it hasn't proven necessary. I would
>> prefer to go to lengths to avoid having to expose that kind of
>> an issue to driver code.
>
> I think we do need proper refcounting, but I also think that most
> drivers will not need to see it.
>
> For example, a PCI driver will most probably just do something along the
> lines of the existing request_irq(pdev->irq); the lifetime of pdev->irq
> is managed by the PCI core.
>
> Same goes with MSIs imho, the MSI core can manage the lifetime
> transparently.

Yes. I'm optimistic that we won't find a case where refcounting will be needed.

Eric
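A userspace sketch of the lifetime model under discussion, assuming the irq_get()/irq_put() semantics from Russell King's 2003 proposal quoted elsewhere in this thread: the bus core takes the reference when it binds a descriptor to a device, so a driver that only does something like request_irq(pdev->irq) never touches the count. The struct layout, `irq_alloc()` and the `bus_device` helpers are hypothetical. Unlike the proposal's `void irq_put()`, this `irq_put()` returns whether the descriptor was freed so the behavior can be checked, and nothing here is atomic or locked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for a dynamically allocated IRQ descriptor. */
struct irq {
	unsigned int refcount;
	unsigned int nr;		/* number shown to users, e.g. in /proc/interrupts */
};

/* Hypothetical allocator: the caller owns the initial reference. */
static struct irq *irq_alloc(unsigned int nr)
{
	struct irq *irq = malloc(sizeof(*irq));

	if (irq) {
		irq->refcount = 1;
		irq->nr = nr;
	}
	return irq;
}

/* Take an extra reference. Not atomic: a kernel version would need that. */
static struct irq *irq_get(struct irq *irq)
{
	assert(irq->refcount > 0);
	irq->refcount++;
	return irq;
}

/* Drop a reference; frees the descriptor and returns 1 when the count hits zero. */
static int irq_put(struct irq *irq)
{
	assert(irq->refcount > 0);
	if (--irq->refcount == 0) {
		free(irq);
		return 1;
	}
	return 0;
}

/*
 * Hypothetical bus device. The bus core holds the reference for as long
 * as the device exists, so a driver can use dev->irq between probe and
 * remove without ever seeing the count.
 */
struct bus_device {
	struct irq *irq;
};

static void bus_add_device(struct bus_device *dev, struct irq *irq)
{
	dev->irq = irq_get(irq);
}

static int bus_remove_device(struct bus_device *dev)
{
	int freed = irq_put(dev->irq);

	dev->irq = NULL;
	return freed;
}
```

The creator can drop its own reference right after binding; the descriptor then lives exactly as long as the device.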
Re: [RFC] killing the NR_IRQS arrays.
> > #define NO_IRQ
>
> When did you need a magic constant NO_IRQ in generic code?
> One of the reasons I want to convert the drivers is so we can
> kill the NO_IRQ nonsense.
>
> As for struct irq instead of struct irq_desc, I really don't
> care, although the C++ camp has not yet weighed in and mentioned
> how that creates a namespace conflict for them.

Yeah, NO_IRQ would be NULL here...

What I do in the powerpc code is this: since IRQ HW numbers are defined locally to a domain/PIC, when creating a new domain the PIC code passes a value to use as an "illegal" value in that domain. It's not exposed outside of the core though; it's really only used to initialize the remapping table with something before any interrupt on that PIC has been mapped.

> We might need this. But I don't think we need reference counting in
> the traditional sense. For all practical purposes we already have
> dynamic irq allocation and it hasn't proven necessary. I would
> prefer to go to lengths to avoid having to expose that kind of
> an issue to driver code.

I think we do need proper refcounting, but I also think that most drivers will not need to see it.

For example, a PCI driver will most probably just do something along the lines of the existing request_irq(pdev->irq); the lifetime of pdev->irq is managed by the PCI core.

Same goes with MSIs imho, the MSI core can manage the lifetime transparently.

Ben.
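A sketch of the per-domain "illegal" HW number described above, assuming a hypothetical virq -> (host, hwirq) remapping table: each PIC picks its own invalid value when its domain is created, and the table slots reserved for that domain are filled with it before anything is mapped. The type name follows the thread's `irq_hwnumber_t` (an unsigned long); the struct and function names are illustrative, not the powerpc API. A per-domain value matters because a number that is invalid on one PIC (0, say) may be a real source on another.

```c
#include <assert.h>
#include <stddef.h>

/* The thread's name for a PIC-local HW number; an unsigned long, as described. */
typedef unsigned long irq_hwnumber_t;

/* A numbering domain ("host"): a HW number only means something within one. */
struct irq_host {
	const char *name;
	irq_hwnumber_t inval_irq;	/* picked by the PIC driver; never a real source */
};

/* One slot of a hypothetical virq -> (host, hwirq) remapping table. */
struct irq_map_entry {
	struct irq_host *host;
	irq_hwnumber_t hwirq;
};

/* Reserve n slots for a host and mark them unmapped with that host's illegal value. */
static void irq_map_init(struct irq_map_entry *map, size_t n, struct irq_host *host)
{
	size_t i;

	for (i = 0; i < n; i++) {
		map[i].host = host;
		map[i].hwirq = host->inval_irq;
	}
}

/* A slot is mapped once it holds anything other than its host's illegal value. */
static int irq_map_is_mapped(const struct irq_map_entry *e)
{
	return e->hwirq != e->host->inval_irq;
}

static void irq_map_set(struct irq_map_entry *e, irq_hwnumber_t hw)
{
	assert(hw != e->host->inval_irq);	/* the illegal value can never be mapped */
	e->hwirq = hw;
}
```

Two hosts sharing one table can use different sentinels, so HW number 0 can be unmapped in one domain and a real source in the other.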
Re: [RFC] killing the NR_IRQS arrays.
On Sat, 2007-02-17 at 02:06 -0700, Eric W. Biederman wrote:
> Benjamin Herrenschmidt <[EMAIL PROTECTED]> writes:
>
> > In addition, if we remove the numbers, archs will need basically the
> > exact same services provided by the powerpc irq core for reverse mapping
> > (going from a HW irq number on a given PIC back to an irq_desc *).
>
> Ben you seem to be under misapprehension that except for the case of
> ISA (0-16) the linux IRQ number is a hardware number. It is an arbitrary
> software enumeration, and I think it has been that way a very long time.

Did you actually mean "is not a hardware number"? If not, then I don't understand your sentence...

> I can only tell you that my impression of this last is that all the
> world's not a PPC.

Yeah, and my grandmother is not the pope, thank you. However, PowerPC is a good example because it has such a diversity of very different hardware setups to deal with, ranging from multiple layers of cascading controllers all over the place, to interrupt packets encoding vector/target etc. on Cell (a bit like x86), to hypervisors providing a single giant number space, etc. etc.

Thus, it is extremely likely that something that works well for PowerPC (or for ARM for that matter, as it's probably as "colorful" an environment as PowerPC is) will end up being useful for others.

> I have a version of the x86 code with a partial conversion done and
> I didn't need a reverse mapping. What you call the hardware interrupt
> number never happens to be interesting to me after the system is setup.

Because you have the ability to tell your PIC to give you your "linux" interrupt number when actually sending the interrupt to the processor? You need a way to get to the irq_desc * when getting an IRQ: either you have a way to map HW numbers back to irq_desc * in software, or your HW allows you to do it.

> I do suspect there may be an interesting chunk of your ppc work that
> probably makes sense as a library so other arches could use it.
Guess what, one of the options of my code is to not instantiate a remapper... for archs where it's not necessary. (We have that case, for example, on iSeries, whose hypervisor can return us the number we want for an arbitrary interrupt.)

Now, I'm not saying we should take the PowerPC code and say "hey, here's the new generic code". I'm saying that if we're going to change the IRQ stuff that deeply, it would be nice if we looked into some of the stuff I've done that I believe would be of use for other archs (though you seem to imply that it would be of no use on x86; fine, still...). I found it overall very useful to have a generic remapping core and to have cascaded PIC setups use a numbering domain local to a given PIC (pretty much; a domain != an irq_chip), and I'm convinced it would make life easier for archs with similar setups. The remapping core also shows its usefulness on archs with very big interrupt numbers, like sparc or pSeries ppc, and possibly others.

Now, I -do- have a problem with one aspect of your proposed design, which is to keep the "linux" interrupt number in the generic irq_desc; I think that defeats most of the purpose of moving away from those linux irq numbers. If you do so, then I'll have to keep a separate remapping layer and keep a mechanism for virtualizing linux numbers.

Ben.
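A minimal sketch of a numbering domain ("host") with a linear reverse map, along the lines described above: HW numbers are local to a host, the descriptor for a (host, hw) pair is created lazily on first use, and the interrupt-time lookup goes from the PIC's local number back to the descriptor. All names and the fixed descriptor pool are hypothetical simplifications, not the powerpc API; the other variants mentioned in the thread (a radix tree for very large numbers, or no remapper at all) are not shown.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long irq_hwnumber_t;	/* PIC-local HW number */

struct irq_host;

/* Minimal descriptor that remembers the domain and local number it maps. */
struct irq_desc {
	struct irq_host *host;		/* NULL while the slot is unused */
	irq_hwnumber_t hwirq;
};

/* A numbering domain (one per PIC here) with a linear reverse map. */
struct irq_host {
	size_t nr_hw;			/* number of entries in revmap */
	struct irq_desc **revmap;	/* revmap[hw] -> descriptor, or NULL */
};

#define NR_DESC 64			/* fixed pool, only to keep the sketch small */
static struct irq_desc desc_pool[NR_DESC];

static struct irq_desc *desc_alloc(void)
{
	size_t i;

	for (i = 0; i < NR_DESC; i++)
		if (desc_pool[i].host == NULL)
			return &desc_pool[i];
	return NULL;
}

/* Create the descriptor for (host, hw) on first use; later calls return the same one. */
static struct irq_desc *irq_create_mapping(struct irq_host *host, irq_hwnumber_t hw)
{
	struct irq_desc *desc;

	assert(host != NULL && host->revmap != NULL);
	if (hw >= host->nr_hw)
		return NULL;
	if (host->revmap[hw] != NULL)
		return host->revmap[hw];

	desc = desc_alloc();
	if (desc == NULL)
		return NULL;
	desc->host = host;
	desc->hwirq = hw;
	host->revmap[hw] = desc;
	return desc;
}

/* Interrupt-time lookup: local HW number reported by the PIC -> descriptor. */
static struct irq_desc *irq_find_mapping(const struct irq_host *host, irq_hwnumber_t hw)
{
	return hw < host->nr_hw ? host->revmap[hw] : NULL;
}
```

Two cascaded PICs can both report HW number 3 and still resolve to distinct descriptors, because each host has its own reverse map.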
Re: [RFC] killing the NR_IRQS arrays.
> No. I don't think we should make your irq_hwnumber_t thingy general
> because it is not general. I don't understand why you need it to be
> an unsigned long, that still puzzles me. But for the rest it actually
> appears that ppc has a simpler model to deal with.

I think you might have misunderstood, because I do believe it's actually very general :-) Let me explain below.

> I don't think I actually can describe x86 hardware in your hwnumber_t
> world. Although I can approximate.

And I think it fits well...

> In non-legacy mode at the top of the tree I have a network of cooperating
> irq controllers. Next to each cpu there is an lapic that catches
> interrupt packets, and below that I have interrupt controllers
> that throw interrupt packets. In the network of cooperating interrupt
> controllers an interrupt packet has a destination address that looks
> like (cpu#, vector#) where cpu# is currently at 8 bits and slowly
> growing and the vector# is a fixed 8 bits.
>
> The interrupt controllers that throw those packets have a fixed
> number of irq slots, usually 24 or so. Each slot (referred to in the
> code as a pin) can be programmed with which (cpu#, vector#) packet it
> throws when an interrupt occurs, including an option to vary the cpu#
> between a set of cpus.
>
> So to be frank, to handle this model properly I need:
> #define NR_IRQS (NR_CPUS*256)
>
> There is enough flexibility in this model that hardware vendors
> have not found a need to cascade interrupt controllers.

This is roughly similar to the Cell "toplevel" model, where interrupt messages encode the source unit/node, target and class. The chip has an interrupt "controller" (receiver of those messages) for each thread. In the kernel, I use a "flat" model, that is, I create one host for all of them and my hardware numbers are made of a similar bit encoding of that "routing" info.
That is, with a remapping model like mine, the x86 non-legacy situation could easily be expressed by having one domain (I call them hosts in the code) covering the whole fabric, with the hw number being your (CPU << 16) | vector thing. In addition (though you don't need that on x86), Cell has an external controller cascaded on one of those interrupts; I use a separate domain for it.

The reason my hwnumber thingy is a generic type is that I provide generic functions to create a linux interrupt for a domain/number pair and a generic mechanism to do the reverse mapping. That's where I think my code might be of some use, as with the "numbers" going away, pretty much everybody will need a way to reverse map from HW numbers back to irq_desc *.

I use an unsigned long because I needed to choose a type that would fit the biggest number potentially used by an interrupt controller, and that can be really big with some hypervisors, for which those are "tokens" that are potentially 64 bits.

> Ben I have no problem with a number that is specific to an irq
> controller for dealing with the internal irq controller
> implementations, heck I think everyone has that to some degree
>
> The linux irq number will remain an arbitrary software number for
> use by the linux system for talking about the source of the
> interrupt.

So you do intend to keep the linux number, which is what I call the "virtual interrupt" number on powerpc... I wouldn't have thought that to be necessary except as a special case of an array of 16 entries for ISA interrupts...

> Why in a sparse address space you would find it hard to allocate a
> range of numbers to an irq controller that only has a fixed number of
> irqs it can deal with is something I don't understand and I think
> it does a disservice to your users. But that is all it is,
> a quality of implementation issue. ia64 does the same foolish
> thing.
It would be fairly easy to change my powerpc code to pre-allocate a full range for a given domain/PIC when initializing it, instead of doing "lazy" scattered allocation like I do, though I don't think it would bring much. It's not possible for all PICs though; for example, pSeries needs to use the radix tree reverse mapper because of how large its HW interrupt numbers can be. I chose not to do it. In the long run, the only remotely meaningful way to expose interrupts to users would be to -add- columns to /proc/interrupts that provide the "host" and the HW number on that host, though I'm not sure that wouldn't break some userland tools.

> The only time it really makes sense to me to let the irq number vary
> arbitrarily is when things are truly dynamic, like with MSI, a
> hypervisor, or hot plug interrupt controllers.

I don't understand why you would go to all that length to replace irq numbers with irq_desc * and ... keep the numbers :-)

But again, as I said, this is in no way a fundamental limitation of the powerpc code. It could be modified easily to allocate the whole range of a given PIC that uses the "linear" remapping. It makes no sense for PICs that use the
Re: [RFC] killing the NR_IRQS arrays.
Russell King <[EMAIL PROTECTED]> writes:

> On Fri, Feb 16, 2007 at 08:45:58PM +0100, Arnd Bergmann wrote:
>> On Friday 16 February 2007 13:10, Eric W. Biederman wrote:
>> > To do this I believe will require a s/unsigned int irq/struct irq_desc *irq/
>> > throughout the entire kernel. Getting the arch specific code and the
>> > generic kernel infrastructure fixed and ready for that change looks
>> > like a pain but pretty doable.
>>
>> We did something like this a few years back on the s390 architecture, which
>> happens to be lucky enough not to share any interrupt based drivers with
>> any of the other architectures.
>
> What you're proposing is looking similar to a proposal I put forward some
> 4 years ago, but was rejected. Maybe times have changed and there's a
> need for it now.
>
> Message attached.
>
> --
> Russell King
>  Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
>  maintainer of:
>
> From: Russell King <[EMAIL PROTECTED]>
> Subject: [RFC] IRQ API
> To: linux-arch@vger.kernel.org
> Cc: Alan Cox <[EMAIL PROTECTED]>
> Date: Sat, 07 Jun 2003 17:05:19 -0700
>
> Hi,
>
> I've recently received an updated development system from ARM Ltd,
> which has caused me to become concerned about whether the existing
> IRQ infrastructure for the ARM architecture is really up to the job
> of handling the developments which will occur over the 2.6 lifetime.
> Essentially we're going to be seeing the emergence of vectored
> interrupt controllers, and knowing hardware designers, they'll
> continue the practice of chaining interrupt controllers (which is
> pretty common on ARM today.) I have some hardware here today which
> has a vectored interrupt controller chained after two non-vectored
> controllers. This vectored interrupt controller is on an add-on
> card, and so has no fixed address space and no fixed IRQ numbering.
>
> Rather than having the job of rewriting this code during 2.6, I'd much
> prefer to get something sorted, even if it is ARM only, before 2.6.
>
> I believe that there are some common problems with the existing API
> which have been hinted at over the last few days, such as large
> NR_IRQS. As such, I think it would be a good idea to try to thrash
> this issue out and get something which everyone is happy with.
>
> Additionally, I've added Alan's "reserve then hook" idea to the API;
> I seem to remember there is a case in IDE which needs something like
> this.
>
> Please note that what I am proposing is not to strip out the existing
> API between now and 2.7; what I am proposing is a structure for 2.7
> which can optionally be implemented by architectures and used in
> architecture specific drivers now if they feel they would benefit
> from it.
>
> Comments? (other than "wtf are you thinking about this so close to 2.6,
> are you mad" 8))
>
>
> Linux Interrupt API
> ===
>
> Russell King <[EMAIL PROTECTED]>
>
> The Linux Interrupt API provides a flexible mechanism to handle and
> control interrupts within the kernel. The design requirements for
> this API are:
>
> - must have as little overhead as possible for commodity hardware
> - must be easy and obvious to use
> - must allow complex multi-level interrupt implementations to exist
>   transparently to device drivers
> - must be compatible with the existing API
>
> Essentially, this means that implementation of the existing API must
> be simple.
>
> --
>
> The API.
>
>
> struct irq {
> 	/* architecture defined information */
> 	/* must not be dereferenced by drivers */
> 	/* eg, x86's irq_desc_t or sparc64's struct ino_bucket */
> };
>
> #define NO_IRQ	architecture-defined-int-constant

When did you need a magic constant NO_IRQ in generic code? One of the reasons I want to convert the drivers is so we can kill the NO_IRQ nonsense.

As for struct irq instead of struct irq_desc, I really don't care, although the C++ camp has not yet weighed in and mentioned how that creates a namespace conflict for them.

> /**
>  *	irq_get - increment reference count on the IRQ descriptor
>  *	@irq: interrupt descriptor
>  *
>  *	IRQ descriptor reference counting is mandatory for
>  *	implementations which provide dynamically allocated IRQ
>  *	descriptors. Statically allocated IRQ descriptor
>  *	implementations may define these to be no-ops.
>  */
> struct irq *irq_get(struct irq *irq);
>
> /**
>  *	irq_put - decrement reference count on IRQ descriptor
>  *	@irq: interrupt descriptor
>  *
>  *	Decrement the reference counter in an IRQ descriptor.
>  *	If the reference counter drops to zero, the IRQ descriptor
>  *	will be freed.
>  *
>  *	IRQ descriptor reference counting is mandatory for
>  *	implementations which provide dynamically allocated IRQ
>  *	descriptors. Statically allocated IRQ descriptor
>  *	implementations may define these to be no-ops.
>  */
> void irq_put(struct irq *irq);

We might need this. But I don't think we need reference counting in the traditional sense. For all practical purposes we already have dynamic irq allocation and it hasn't proven necessary. I would prefer to go to lengths to avoid having to expose that kind of an issue to driver code.
Re: [RFC] killing the NR_IRQS arrays.
Benjamin Herrenschmidt <[EMAIL PROTECTED]> writes:

> In addition, if we remove the numbers, archs will need basically the
> exact same services provided by the powerpc irq core for reverse mapping
> (going from a HW irq number on a given PIC back to an irq_desc *).

Ben, you seem to be under the misapprehension that except for the case of ISA (0-16) the linux IRQ number is a hardware number. It is an arbitrary software enumeration, and I think it has been that way a very long time.

> Either using a linear array for simple PICs or a radix tree for
> platforms with very big interrupt numbers (BTW. I think we have lockless
> radix trees nowadays, I can remove the spinlocks to protect it in the
> powerpc remapper).

I can only tell you that my impression of this last is that all the world's not a PPC.

I have a version of the x86 code with a partial conversion done and I didn't need a reverse mapping. What you call the hardware interrupt number never happens to be interesting to me after the system is set up.

I do suspect there may be an interesting chunk of your ppc work that probably makes sense as a library so other arches could use it.

Eric
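To illustrate why a general reverse map may be unnecessary on x86, here is a sketch assuming a hypothetical per-cpu table indexed by vector, filled in when an IO-APIC pin is programmed: the receiving cpu already knows its own id and the vector, so the lookup is a direct index, and one descriptor can sit in several (cpu, vector) slots, which a single per-descriptor HW number does not capture. For comparison, `encode_hw()`/`decode_hw()` pack the same tuple the way Ben suggests elsewhere in the thread for a flat x86 domain, (CPU << 16) | vector. `SKETCH_NR_CPUS`, the table and all helpers are illustrative, not the actual x86 code.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NR_CPUS	4
#define NR_VECTORS	256		/* the vector# is a fixed 8 bits */

/* Simplified descriptor; irq is the stable number from the IO-APIC pin layout. */
struct irq_desc {
	unsigned int irq;
};

/* Hypothetical per-cpu table, filled in when an IO-APIC pin is programmed. */
static struct irq_desc *vector_irq[SKETCH_NR_CPUS][NR_VECTORS];

/* Point the same vector on every cpu in cpu_mask at one descriptor. */
static void route_irq(struct irq_desc *desc, unsigned int vector, unsigned int cpu_mask)
{
	unsigned int cpu;

	assert(vector < NR_VECTORS);
	for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		if (cpu_mask & (1u << cpu))
			vector_irq[cpu][vector] = desc;
}

/* Interrupt entry: the receiving cpu knows its own id and the vector it got. */
static struct irq_desc *vector_to_desc(unsigned int cpu, unsigned int vector)
{
	assert(cpu < SKETCH_NR_CPUS && vector < NR_VECTORS);
	return vector_irq[cpu][vector];
}

/* Flat-domain alternative: pack (cpu, vector) into one HW number. */
static unsigned long encode_hw(unsigned int cpu, unsigned int vector)
{
	return ((unsigned long)cpu << 16) | vector;
}

static void decode_hw(unsigned long hw, unsigned int *cpu, unsigned int *vector)
{
	*cpu = (unsigned int)(hw >> 16);
	*vector = (unsigned int)(hw & 0xffff);
}
```

Routing one descriptor to cpus 0 and 2 on vector 0x31 makes it reachable from two (cpu, vector) pairs, while the packed form yields a different HW number for each pair.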
Re: [RFC] killing the NR_IRQS arrays.
Benjamin Herrenschmidt <[EMAIL PROTECTED]> writes:

> On Fri, 2007-02-16 at 05:10 -0700, Eric W. Biederman wrote:
>
>> Getting the drivers changed actually looks to be pretty straightforward;
>> it will just be a very large mechanical change. We change the
>> type of variables where appropriate and every once in a while
>> introduce an irq_nr(irq) to get the actual irq number for the places
>> that care (ISA or print statements).
>
> Dunno about that irq_nr thingy. If we go that way, I'd be tempted to
> remove the number completely from the "public" side of irq_desc... or
> not.

When dealing with users and userspace for /proc/interrupts, /proc/irq and the like, we need a way to talk about irqs. Currently we use the interrupt number for that, and we are likely to break the user/kernel interface if we don't preserve it.

Debugging would tend to suck if we couldn't print out the irq number of the irq a driver has been assigned and trace it through various data structures.

For hardware that is not hotplug or auto-discoverable, I think we will need the irq number to talk about the ISA number as well.

So I don't see a way that we can get rid of a number completely, but it should be of much less significance.

> On powerpc, we have this remapped thingy because we completely separate
> the linux "virtual" interrupt domain from the physical numbering domains
> of each PIC. Your change would turn the linux virtual domain into
> pointers, removing the need for an array and associated limitations,
> which is nice.
>
> So to a given irq_desc / irq "virtual" number today, I match a pair: HW
> number (which is a special typedef which is currently defined as an
> unsigned long) and a pointer to the irq "host" (which is the entity that
> defines a HW number domain).
>
> That means that you can have multiple hosts and a given HW number can
> exist multiple times, once per host.
>
> Do you think the irq_hwnumber_t thingy I have should then be generalized
> and put into the irq_desc? I would need an additional void * pointer to
> the irq host as well (it's not a 1:1 relationship to an irq chip and
> needs to be accessed by generic code).

Having taken a little bit of time to digest roughly what the concept is, I think I can finally answer this one.

No. I don't think we should make your irq_hwnumber_t thingy general because it is not general. I don't understand why you need it to be an unsigned long; that still puzzles me. But for the rest it actually appears that ppc has a simpler model to deal with.

I don't think I actually can describe x86 hardware in your hwnumber_t world. Although I can approximate.

In non-legacy mode at the top of the tree I have a network of cooperating irq controllers. Next to each cpu there is an lapic that catches interrupt packets, and below that I have interrupt controllers that throw interrupt packets. In the network of cooperating interrupt controllers an interrupt packet has a destination address that looks like (cpu#, vector#) where cpu# is currently at 8 bits and slowly growing and the vector# is a fixed 8 bits.

The interrupt controllers that throw those packets have a fixed number of irq slots, usually 24 or so. Each slot (referred to in the code as a pin) can be programmed with which (cpu#, vector#) packet it throws when an interrupt occurs, including an option to vary the cpu# between a set of cpus.

So to be frank, to handle this model properly I need:
#define NR_IRQS (NR_CPUS*256)

There is enough flexibility in this model that hardware vendors have not found a need to cascade interrupt controllers.

> Having the HW number be clearly specific to a "domain controller" makes
> also a lot of sense in the embedded field with lots of cascaded
> interrupt controllers. It avoids having to play all sorts of tricks to
> assign ranges of numbers to various controllers in the system. Only the
> local number on a given controller matters, the rest is dynamically
> assigned.
Ben, I have no problem with a number that is specific to an irq controller for dealing with the internal irq controller implementations; heck, I think everyone has that to some degree.

The linux irq number will remain an arbitrary software number for use by the linux system for talking about the source of the interrupt.

Why, in a sparse address space, you would find it hard to allocate a range of numbers to an irq controller that only has a fixed number of irqs it can deal with is something I don't understand, and I think it does a disservice to your users. But that is all it is, a quality of implementation issue. ia64 does the same foolish thing.

The only time it really makes sense to me to let the irq number vary arbitrarily is when things are truly dynamic, like with MSI, a hypervisor, or hot plug interrupt controllers.

> Another option would be to have the irq_desc be created by the arch and
> "embedded" in a larger data structure, in which case the HW number would
> be part of the private part of that data structure. Though I
Re: [RFC] killing the NR_IRQS arrays.
Benjamin Herrenschmidt [EMAIL PROTECTED] writes: On Fri, 2007-02-16 at 05:10 -0700, Eric W. Biederman wrote: Getting the drivers changed actually looks to be pretty straight forward it will just be a very large mechanical change. We change the type where of variables where appropriate and every once in a while introduce an irq_nr(irq) to get the actual irq number for the places that care (ISA or print statements). Dunno about that irq_nr thingy. If we go that way, I'd be tempted to remove the number completely from the public side of irq_desc... or not. When dealing with users and userspace for /proc/interrupts /proc/irq and the like we need a way to talk about irqs. Currently we use the interrupt number for that and we are likely to break the user/kernel interface if we don't preserve that. Debugging would tend to suck if we couldn't print out the irq number of the irq a driver has been assigned and trace it through various data structures. For hardware that is not hotplug or auto discoverable I think we will need the irq number to talk about the ISA number as well. So I don't see a way that we can get rid of a number completely but it should be of much less significance. On powerpc, we have this remapped thingy because we completely separate the linux virtual interrupt domain from the physical numbering domains of each PIC. Your change would turn the linux virtual domain into pointers, removing the need for an array and associated limitations, which is nice. So to a given irq_desc / irq virtual number today, I match a pair HW number (which is a special typedef which is currently defined as an unsigned long) and a pointer to the irq host (which is the entity that define a HW number domain). That means that you can have multiple hosts and a given HW number can exist multiple times, once per host. Do you think the irq_hwnumber_t thingy I have should then be generalized and put into the irq_desc ? 
I would need an additional void * pointer to the irq host as well (it's not a 1:1 relationship to an irq chip and need to be accessed by generic code). Having taken a little bit of time to digest roughly what the concept is I think I can finally answer this one. No. I don't think we should make your irq_hwnumber_t thingy general because it is not general. I don't understand why you need it to be an unsigned long, that still puzzles me. But for the rest it actually appears that ppc has a simpler model to deal with. I don't think I actually can describe x86 hardware in you hwnumber_t world. Although I can approximate. In non-legacy mode at the top of the tree I have a network cooperating irq controllers. For each cpu there is an lapic next to each cpu that catches interrupt packets and below that I have interrupt controllers that throw interrupt packets. In the network of cooperating interrupt controllers a interrupt packet has a destination address that looks like (cpu#, vector#) where cpu# is currently at 8 bits and slowly growing and the vector# is a fixed 8 bits. The interrupt controllers that throw those packets have a fixed number of irq slots usually 24 or so. Each slot (referred to in the code as a pin) can be programmed which (cpu#, vector#) packet it throws when an interrupt occurs. Including an option to vary the cpu# between a set of cpus. So to be frank to handle this model properly I need to deal with this properly I need. #define NR_IRQS (NR_CPUS*256) There is enough flexibility in this model that hardware vectors have not found a need to cascade interrupt controllers. Having the HW number be clearly specific to a domain controller makes also a lot of sense in the embedded field with lots of cascaded interrupt controllers. It avoids having to play all sorts of tricks to assign ranges of numbers to various controllers in the system. Only the local number on a given controller matters, the rest is dynamically assigned. 
Ben I have no problem with a number that is specific to an irq controller for dealing with the internal irq controller implementations, heck I think everyone has that to some degree The linux irq number will remain an arbitrary software number for use by the linux system for talking about the source of the interrupt. Why in a sparse address space you would find it hard to allocate a range of numbers to an irq controller that only has a fixed number of irqs it can deal with is something I don't understand and I think it is does a disservice to your users. But that is all it is a quality of implementation issue. ia64 does the same foolish thing. The only time it really makes sense to me to let the irq number vary arbitrary are when things are truly dynamic, like with MSI, a hypervisor, or hot plug interrupt controllers. Another option would be to have the irq_desc be created by the arch and embedded in a larger data structure, in which case the HW number would be part of the private part of that data structure. Though I
Re: [RFC] killing the NR_IRQS arrays.
Benjamin Herrenschmidt [EMAIL PROTECTED] writes: In addition, if we remove the numbers, archs will need basically the exact same services provided by the powerpc irq core for reverse mapping (going from a HW irq number on a given PIC back to an irq_desc *). Ben you seem to be under misapprehension that except for the case of ISA (0-16) the linux IRQ number is a hardware number. It is an arbitrary software enumeration, and I think it has been that way a very long time. Either using a linear array for simple PICs or a radix tree for platforms with very big interrupt numbers (BTW. I think we have lockless radix trees nowadays, I can remove the spinlocks to protect it in the powerpc remapper). I can only tell you that my impression of this last is that all the world's not a PPC. I have a version of the x86 code with a partial conversion done and I didn't need a reverse mapping. What you call the hardware interrupt number never happens to be interesting to me after the system is setup. I do suspect there may be an interesting chunk of your ppc work that probably makes sense as a library so other arches could use it. Eric - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] killing the NR_IRQS arrays.
Russell King [EMAIL PROTECTED] writes:

> On Fri, Feb 16, 2007 at 08:45:58PM +0100, Arnd Bergmann wrote:
> > On Friday 16 February 2007 13:10, Eric W. Biederman wrote:
> > > To do this I believe will require a s/unsigned int irq/struct irq_desc *irq/ throughout the entire kernel. Getting the arch specific code and the generic kernel infrastructure fixed and ready for that change looks like a pain but pretty doable.
> >
> > We did something like this a few years back on the s390 architecture, which happens to be lucky enough not to share any interrupt based drivers with any of the other architectures.
>
> What you're proposing is looking similar to a proposal I put forward some 4 years ago, but was rejected. Maybe times have changed and there's a need for it now.
>
> Message attached.
>
> --
> Russell King
> Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
> maintainer of:
>
> From: Russell King [EMAIL PROTECTED]
> Subject: [RFC] IRQ API
> To: linux-arch@vger.kernel.org
> Cc: Alan Cox [EMAIL PROTECTED]
> Date: Sat, 07 Jun 2003 17:05:19 -0700
>
> Hi,
>
> I've recently received an updated development system from ARM Ltd, which has caused me to become concerned about whether the existing IRQ infrastructure for the ARM architecture is really up to the job of handling the developments which will occur over the 2.6 lifetime.
>
> Essentially we're going to be seeing the emergence of vectored interrupt controllers, and knowing hardware designers, they'll continue the practice of chaining interrupt controllers (which is pretty common on ARM today.) I have some hardware here today which has a vectored interrupt controller chained after two non-vectored controllers. This vectored interrupt controller is on an add-on card, and so has no fixed address space and no fixed IRQ numbering.
>
> Rather than having the job of rewriting this code during 2.6, I'd much prefer to get something sorted, even if it is ARM only before 2.6.
>
> I believe that there are some common problems with the existing API which have been hinted at over the last few days, such as large NR_IRQS. As such, I think it would be a good idea to try to thrash this issue out and get something which everyone is happy with.
>
> Additionally, I've added Alan's "reserve then hook" idea to the API; I seem to remember there is a case in IDE which needs something like this.
>
> Please note that what I am proposing is not to strip out the existing API between now and 2.7; what I am proposing is a structure for 2.7 which can optionally be implemented by architectures and used in architecture specific drivers now if they feel they would benefit from it.
>
> Comments? (other than wtf are you thinking about this so close to 2.6, are you mad 8))
>
> Linux Interrupt API
> ===================
> Russell King [EMAIL PROTECTED]
>
> The Linux Interrupt API provides a flexible mechanism to handle and control interrupts within the kernel.
>
> The design requirements for this API are:
>
> - must have as little overhead as possible for commodity hardware
> - must be easy and obvious to use
> - must allow complex multi-level interrupt implementations to exist transparently to device drivers
> - must be compatible with the existing API
>
> Essentially, this means that implementation of the existing API must be simple.
>
> The API.
>
> struct irq {
>     /* architecture defined information */
>     /* must not be dereferenced by drivers */
>     /* eg, x86's irq_desc_t or sparc64's struct ino_bucket */
> };
>
> #define NO_IRQ architecture-defined-int-constant

When did you need a magic constant NO_IRQ in generic code? One of the reasons I want to convert the drivers is so we can kill the NO_IRQ nonsense.

As for struct irq instead of struct irq_desc, I really don't care, although the C++ camp hasn't yet weighed in and mentioned how that creates a namespace conflict for them.

> /**
>  *  irq_get - increment reference count on the IRQ descriptor
>  *  @irq: interrupt descriptor
>  *
>  *  IRQ descriptor reference counting is mandatory for
>  *  implementations which provide dynamically allocated IRQ
>  *  descriptors.  Statically allocated IRQ descriptor
>  *  implementations may define these to be no-ops.
>  */
> struct irq *irq_get(struct irq *irq);
>
> /**
>  *  irq_put - decrement reference count on IRQ descriptor
>  *  @irq: interrupt descriptor
>  *
>  *  Decrement the reference counter in an IRQ descriptor.
>  *  If the reference counter drops to zero, the IRQ descriptor
>  *  will be freed.
>  *
>  *  IRQ descriptor reference counting is mandatory for
>  *  implementations which provide dynamically allocated IRQ
>  *  descriptors.  Statically allocated IRQ descriptor
>  *  implementations may define these to be no-ops.
>  */
> void irq_put(struct irq *irq);

We might need this. But I don't think we
Re: [RFC] killing the NR_IRQS arrays.
> No. I don't think we should make your irq_hwnumber_t thingy general because it is not general. I don't understand why you need it to be an unsigned long, that still puzzles me. But for the rest it actually appears that ppc has a simpler model to deal with.

I think you might have misunderstood because I do believe it's actually very general :-) Let me explain below.

> I don't think I actually can describe x86 hardware in your hwnumber_t world. Although I can approximate.

And I think it fits well...

> In non-legacy mode at the top of the tree I have a network of cooperating irq controllers. There is an lapic next to each cpu that catches interrupt packets, and below that I have interrupt controllers that throw interrupt packets.
>
> In the network of cooperating interrupt controllers an interrupt packet has a destination address that looks like (cpu#, vector#) where cpu# is currently at 8 bits and slowly growing and the vector# is a fixed 8 bits.
>
> The interrupt controllers that throw those packets have a fixed number of irq slots, usually 24 or so. Each slot (referred to in the code as a pin) can be programmed with the (cpu#, vector#) packet it throws when an interrupt occurs, including an option to vary the cpu# between a set of cpus.
>
> So, to be frank, to handle this model properly I need:
>
> #define NR_IRQS (NR_CPUS*256)
>
> There is enough flexibility in this model that hardware vectors have not found a need to cascade interrupt controllers.

This is roughly similar to the cell toplevel model where interrupt messages encode the source unit/node, target and class. The chip has an interrupt controller (receiver of those messages) for each thread. In the kernel, I use a flat model, that is I create one host for all of them and my hardware numbers are made of a similar bit encoding of those routing infos.
That is, with a remapping model like mine, the x86 non-legacy situation could be easily expressed by having one domain (I call them hosts in the code) covering the whole fabric and the hw number be your (CPU << 16) | vector thing. In addition, though you don't need that on x86, cell has an external controller cascaded on one of those interrupts; I use a separate domain for it.

The reason my hwnumber thingy is a generic type is that I provide generic functions to create a linux interrupt for a domain/number pair and a generic mechanism to do the reverse mapping. That's where I think my code might be of some use, as with the numbers going away, pretty much everybody will need a way to reverse map from HW numbers back to irq_desc *.

I use an unsigned long because I needed to choose a type that would fit the biggest number potentially used by an interrupt controller, and that can be real big with some hypervisors for which those are tokens which are potentially 64 bits.

> Ben, I have no problem with a number that is specific to an irq controller for dealing with the internal irq controller implementations, heck I think everyone has that to some degree.
>
> The linux irq number will remain an arbitrary software number for use by the linux system for talking about the source of the interrupt.

So you do intend to keep the linux number, which is what I call the virtual interrupt number on powerpc... I wouldn't have thought that to be necessary except as a special case of an array of 16 entries for ISA interrupts...

> Why in a sparse address space you would find it hard to allocate a range of numbers to an irq controller that only has a fixed number of irqs it can deal with is something I don't understand, and I think it does a disservice to your users. But that is all it is, a quality of implementation issue. ia64 does the same foolish thing.
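Reading the flat-domain suggestion as hw = (CPU << 16) | vector, packing Eric's (cpu#, vector#) destination into a single hardware number could look like the sketch below. `hwnum_t` and the helper names are hypothetical.

```c
#include <assert.h>

/* Hypothetical hardware-number type; an unsigned long, as in Ben's code. */
typedef unsigned long hwnum_t;

#define HW_VECTOR_SHIFT 16u                          /* (CPU << 16) | vector */
#define HW_VECTOR_MASK  ((1ul << HW_VECTOR_SHIFT) - 1)

/* One domain covering the whole fabric: each (cpu#, vector#)
 * destination becomes a single hardware number in that domain. */
static hwnum_t hw_encode(unsigned int cpu, unsigned int vector)
{
    assert(vector < 256);       /* x86 vectors are a fixed 8 bits */
    return ((hwnum_t)cpu << HW_VECTOR_SHIFT) | vector;
}

static unsigned int hw_cpu(hwnum_t hw)
{
    return (unsigned int)(hw >> HW_VECTOR_SHIFT);
}

static unsigned int hw_vector(hwnum_t hw)
{
    return (unsigned int)(hw & HW_VECTOR_MASK);
}
```

Eric objects later in the thread that a single x86 interrupt can be delivered with different (cpu#, vector#) pairs without being reprogrammed. Under this encoding, a hardware number therefore names a delivery destination, not an interrupt source.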
It would be fairly easy to change my powerpc code to pre-allocate a full range for a given domain/PIC when initializing it instead of doing lazy scattered allocation like I do, though it won't bring much I think. It's not possible for all PICs though; for example, the pSeries needs to use the radix tree reverse mapper because of how large HW interrupt numbers can be. I chose not to do it.

In the long run, the only remotely meaningful way to expose interrupts to users would be to -add- columns to /proc/interrupts that provide the host and the HW number on that host, though I'm not sure that wouldn't break some userland tools.

> The only time it really makes sense to me to let the irq number vary arbitrarily is when things are truly dynamic, like with MSI, a hypervisor, or hot plug interrupt controllers.

I don't understand why you would go to all that length to replace irq numbers with irq_desc * and ... keep the numbers :-)

But again, as I said, this is in no way a fundamental limitation of the powerpc code. It could be modified easily to allocate the whole range of a given PIC that uses the linear remapping. It makes no sense for PICs that use the radix tree remapping though.

> Sure, and I have the same issue with
Re: [RFC] killing the NR_IRQS arrays.
On Sat, 2007-02-17 at 02:06 -0700, Eric W. Biederman wrote:
> Benjamin Herrenschmidt [EMAIL PROTECTED] writes:
>
> > In addition, if we remove the numbers, archs will need basically the exact same services provided by the powerpc irq core for reverse mapping (going from a HW irq number on a given PIC back to an irq_desc *).
>
> Ben, you seem to be under the misapprehension that, except for the case of ISA (0-16), the linux IRQ number is a hardware number. It is an arbitrary software enumeration, and I think it has been that way for a very long time.

Did you actually mean "is not a hardware number"? If not, then I don't understand your sentence...

> I can only tell you that my impression of this last is that all the world's not a PPC.

Yeah, and my grandmother is not the pope, thank you. However, PowerPC is a good example because it has such a diversity of very different hardware setups to deal with, ranging from the multiple layers of cascading controllers all over the place, to interrupt packets encoding vector/target etc... a bit like x86 on cell, to hypervisors providing a single giant number space etc etc etc... Thus, it is extremely likely that something that works well for PowerPC (or for ARM for that matter, as it's probably as colorful an environment as PowerPC is) will end up being useful for others.

> I have a version of the x86 code with a partial conversion done and I didn't need a reverse mapping. What you call the hardware interrupt number never happens to be interesting to me after the system is set up.

Because you have the ability to tell your PIC to give you your linux interrupt number when actually sending the interrupt to the processor? You need a way to get to the irq_desc * when getting an IRQ: either you have a way to map HW numbers back to irq_desc * in software, or your HW allows you to do it.

> I do suspect there may be an interesting chunk of your ppc work that probably makes sense as a library so other arches could use it.

Guess what, one of the options of my code is to not instantiate a remapper... for archs where it's not necessary. (We have the case, for example, of iSeries, whose hypervisor can return us the number we want for an arbitrary interrupt.)

Now, I'm not saying we should take the PowerPC code and say "hey, here's the new generic code". I'm saying that if we're going to change the IRQ stuff that deeply, it would be nice if we looked into some of that stuff I've done that I believe would be of use for other archs (though you seem to imply that it would be of no use on x86, good, still...).

I found it overall very useful to have a generic remapping core and have cascaded PIC setups have a numbering domain local to a given PIC (pretty much, a domain != an irq_chip), and I'm convinced it would make life easier for archs with similar setups. The remapping core also shows its usefulness on archs with very big interrupt numbers, like sparc or pSeries ppc, and possibly others.

Now, I -do- have a problem with one aspect of your proposed design, which is to keep the linux interrupt number in the generic irq_desc, which I think defeats most of the purpose of moving away from those linux irq numbers. If you do so, then I'll have to keep a separate remapping layer and keep a mechanism for virtualizing linux numbers.

Ben.
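One way to give each PIC its own numbering domain while keeping the hardware number out of the generic descriptor is the "embedded" layout Ben also mentions in the thread. The PIC code wraps the generic descriptor in its own structure and recovers that structure with a container_of-style macro. A minimal sketch with hypothetical names (`struct irq_host`, `struct pic_irq`, `to_pic_irq`). `container_of` is redefined locally in a simplified form so the example is self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified form of the kernel's container_of(). */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* What generic code and drivers see: no hardware number at all. */
struct irq_desc {
    unsigned int status;
};

/* Hypothetical "host": one per numbering domain (usually one per PIC). */
struct irq_host {
    const char *name;
};

/* PIC-private wrapper: the hardware number lives here and is only
 * meaningful together with the host it belongs to. */
struct pic_irq {
    struct irq_host *host;
    unsigned long hwirq;
    struct irq_desc desc;       /* embedded generic descriptor */
};

static void pic_irq_init(struct pic_irq *p, struct irq_host *host,
                         unsigned long hwirq)
{
    p->host = host;
    p->hwirq = hwirq;
    p->desc.status = 0;
}

/* PIC code recovers its private data from the generic pointer. */
static struct pic_irq *to_pic_irq(struct irq_desc *desc)
{
    assert(desc != NULL);
    return container_of(desc, struct pic_irq, desc);
}
```

A hardware number is only meaningful together with its host. So number 5 on a top-level controller and number 5 on a cascaded controller yield two distinct descriptors, and generic code never sees either number.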
Re: [RFC] killing the NR_IRQS arrays.
> > #define NO_IRQ architecture-defined-int-constant
>
> When did you need a magic constant NO_IRQ in generic code? One of the reasons I want to convert the drivers is so we can kill the NO_IRQ nonsense.
>
> As for struct irq instead of struct irq_desc, I really don't care, although the C++ camp hasn't yet weighed in and mentioned how that creates a namespace conflict for them.

Yeah, NO_IRQ would be NULL here... What I do in the powerpc code is this: since IRQ HW numbers are defined locally to a domain/PIC, when creating a new domain, the PIC code passes a value to use as an illegal value in that domain. It's not exposed outside of the core though; it's really only used to initialize the remapping table with something before any interrupt on that PIC has been mapped.

> We might need this. But I don't think we need reference counting in the traditional sense. For all practical purposes we already have dynamic irq allocation and it hasn't proven necessary. I would prefer to go to lengths to avoid having to expose that kind of an issue to driver code.

I think we do need proper refcounting, but I also think that most drivers will not need to see it. For example, a PCI driver will most probably just do something along the lines of the existing request_irq(pdev->irq); the lifetime of pdev->irq is managed by the PCI core. Same goes with MSIs imho, the MSI core can manage the lifetime transparently.

Ben.
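A per-domain "illegal value" matters because 0 can be a perfectly valid hardware number on some PICs, so no single global constant works for every controller. A sketch, assuming a small table per domain that records which hardware number each virtual irq maps to. All names and sizes are hypothetical, and the layout is simplified to one table per domain.

```c
#include <assert.h>

#define MAP_SIZE 16   /* hypothetical number of virtual irqs tracked */

struct irq_domain_sketch {
    unsigned long inval;              /* never a valid hw number on this PIC */
    unsigned long hwirq[MAP_SIZE];    /* virq -> hw number, or inval */
};

/* Before any interrupt on this PIC is mapped, every entry holds the
 * PIC's own invalid value. */
static void domain_init(struct irq_domain_sketch *d, unsigned long inval)
{
    d->inval = inval;
    for (unsigned int i = 0; i < MAP_SIZE; i++)
        d->hwirq[i] = inval;
}

static int domain_map(struct irq_domain_sketch *d, unsigned int virq,
                      unsigned long hw)
{
    if (virq >= MAP_SIZE || hw == d->inval || d->hwirq[virq] != d->inval)
        return -1;                    /* out of range, illegal, or in use */
    d->hwirq[virq] = hw;
    return 0;
}

static int domain_is_mapped(const struct irq_domain_sketch *d,
                            unsigned int virq)
{
    assert(virq < MAP_SIZE);
    return d->hwirq[virq] != d->inval;
}
```

Drivers never see the sentinel. On their side, "no interrupt" would simply be a NULL descriptor pointer.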
Re: [RFC] killing the NR_IRQS arrays.
Benjamin Herrenschmidt [EMAIL PROTECTED] writes:

> > We might need this. But I don't think we need reference counting in the traditional sense. For all practical purposes we already have dynamic irq allocation and it hasn't proven necessary. I would prefer to go to lengths to avoid having to expose that kind of an issue to driver code.
>
> I think we do need proper refcounting, but I also think that most drivers will not need to see it. For example, a PCI driver will most probably just do something along the lines of the existing request_irq(pdev->irq); the lifetime of pdev->irq is managed by the PCI core. Same goes with MSIs imho, the MSI core can manage the lifetime transparently.

Yes. I'm optimistic that we won't find a case where refcounting will be needed.

Eric
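The irq_get()/irq_put() semantics from Russell's proposal can be modelled as below, with the bus core holding the reference so that drivers never touch the count. This is a single-threaded sketch. `irq_alloc()` and the `irqs_freed` counter are hypothetical, and kernel code would use an atomic counter (for example a kref) rather than a plain integer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical dynamically allocated descriptor with a reference count. */
struct irq {
    unsigned int refcount;
    /* ... architecture-defined fields ... */
};

static int irqs_freed;   /* demonstration only: counts frees */

static struct irq *irq_alloc(void)
{
    struct irq *irq = calloc(1, sizeof(*irq));
    if (irq)
        irq->refcount = 1;          /* reference held by the creator */
    return irq;
}

static struct irq *irq_get(struct irq *irq)
{
    assert(irq->refcount > 0);
    irq->refcount++;
    return irq;
}

/* Drops one reference; the last one frees the descriptor. */
static void irq_put(struct irq *irq)
{
    assert(irq->refcount > 0);
    if (--irq->refcount == 0) {
        free(irq);
        irqs_freed++;
    }
}
```

In the model described above, the PCI or MSI core would own these calls. A driver would only pass the pointer it was given to request_irq().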
Re: [RFC] killing the NR_IRQS arrays.
Benjamin Herrenschmidt [EMAIL PROTECTED] writes:

> > The only time it really makes sense to me to let the irq number vary arbitrarily is when things are truly dynamic, like with MSI, a hypervisor, or hot plug interrupt controllers.
>
> I don't understand why you would go to all that length to replace irq numbers with irq_desc * and ... keep the numbers :-)

Because I don't have something better to replace them with. We need names for irqs, and currently the kernel/user space interface is an unsigned number. Printing out a pointer where we currently have an integer in:

  /proc/interrupts
  /proc/irq/N/...
  /sys/devices/pci:00/:00:0e.0/irq

is bad practice, and if I don't retain the number that is my only choice. A similar problem exists in all of the initialization messages from device drivers that display their irq number. Plus I think there are also a few ioctls that return the linux irq number.

Now it may make sense to replace my irq_nr() with irq_name(), and return a string that can be used instead, but fixing the kernel/user space interface is a third step that is a lot more delicate and will require more thinking. So I would prefer to put that off until all of the internal users are using a pointer. Then we can grep for irq_nr and see how many places we actually export the irq number to user space.

The fact that user space has been put in charge of when to migrate an irq from one cpu to another makes this doubly delicate.

> > Sure, and I have the same issue with a big DESIGNED FOR ppc in the middle, or DESIGNED FOR arch/x. However the unfortunate truth is that the x86 has enough volume that frequently other architectures use some x86 hardware and thus get some of x86's warts. So anything that doesn't cope with the x86's warts is frequently doomed to failure.
>
> I fail to see how what I described would not apply nicely to x86...

The model can be made to work if you force it, but it isn't really a good fit.

I can't really use the (cpu#, vector#) tuple as the hw number, as it varies at runtime, and a single interrupt can send different (cpu#, vector#) tuples from one interrupt message to the next without being reprogrammed. At least I don't have the impression that you support multiple hardware numbers going to the same linux irq. But this really is the layer where I need the reverse mapping. However, I can optimize the reverse mapping by taking advantage of the per cpu nature.

Currently the hardware number that I use is the pin number on the ioapic. And to form the linux irq I just add the number of pins of all previous ioapics together and then add my pin number. Fairly simple.

Doing the above gives me stable names that are the same from one boot to the next, as long as nobody changes how the hardware is put together. It looks to me that if I adapt the ppc scheme my irq numbers will change from one boot to the next, one kernel to the next, almost at random, depending on driver initialization order and similar things. Having names that change all of the time is confusing and not very useful.

The fact that in the process of making my names stable it actually happens to reflect part of the irq hardware topology is incidental. Giving up stable names is not something I want to do.

Eric
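The numbering rule above (add the pin counts of all earlier IOAPICs, then the pin number) amounts to pre-allocating one contiguous range per controller. A minimal sketch, using a hypothetical `struct ioapic` that records only the pin count:

```c
#include <assert.h>

/* Hypothetical per-IOAPIC data: only the pin count matters here. */
struct ioapic {
    unsigned int nr_pins;
};

/* Linux irq number for (apic, pin): the pins of all earlier IOAPICs
 * come first, so the result depends only on the hardware layout and
 * not on driver initialization order. */
static unsigned int ioapic_pin_to_irq(const struct ioapic *apics,
                                      unsigned int apic, unsigned int pin)
{
    unsigned int base = 0;

    assert(pin < apics[apic].nr_pins);
    for (unsigned int i = 0; i < apic; i++)
        base += apics[i].nr_pins;
    return base + pin;
}
```

The base depends only on the order and size of the IOAPICs, so the result is the same on every boot no matter which driver asks first. A lazy allocator that hands out the next free number on each request would not have that property.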
Re: [RFC] killing the NR_IRQS arrays.
Benjamin Herrenschmidt [EMAIL PROTECTED] writes:

> On Sat, 2007-02-17 at 02:06 -0700, Eric W. Biederman wrote:
>
> However, PowerPC is a good example because it has such a diversity of very different hardware setups to deal with, ranging from the multiple layers of cascading controllers all over the place, to interrupt packets encoding vector/target etc... a bit like x86 on cell, to hypervisors providing a single giant number space etc etc etc... Thus, it is extremely likely that something that works well for PowerPC (or for ARM for that matter, as it's probably as colorful an environment as PowerPC is) will end up being useful for others.

Sure, I agree. Part of what I'm trying to say is that basic interrupt handling assumptions appear to be inherent to the architectures. And as much as it surprises me, because of those basic assumptions I don't think there is any architecture with every flavor of color.

> > I have a version of the x86 code with a partial conversion done and I didn't need a reverse mapping. What you call the hardware interrupt number never happens to be interesting to me after the system is set up.
>
> Because you have the ability to tell your PIC to give you your linux interrupt number when actually sending the interrupt to the processor? You need a way to get to the irq_desc * when getting an IRQ: either you have a way to map HW numbers back to irq_desc * in software, or your HW allows you to do it.

I don't think it is totally foreign, but in essence I have two kinds of hardware number: an (apic, pin) pair that I need when talking to the hardware itself, and a (cpu, vector) pair that I use when handling an interrupt.

The vector number has never been the linux irq number, but at times it has only needed a simple offset adjustment. Now that we are having to handle bigger cases, only the (apic, pin) pairs that are actually used get a (cpumask_t, vector) assigned to them.

It may be that the only difference from the cell is that I have a very small vector number I have to cope with, instead of being able to tell the irq controller to give me something immediately useful.

> I'm saying that if we're going to change the IRQ stuff that deeply, it would be nice if we looked into some of that stuff I've done that I believe would be of use for other archs.

Reasonable. For the first pass, when I do the genirq conversion passing struct irq_desc *irq instead of unsigned int irq, I should be able to do something stupid and correct on all of the architectures. When we start taking advantage of the new freedom, though, generic helpers can be good.

> I found it overall very useful to have a generic remapping core and have cascaded PIC setups have a numbering domain local to a given PIC (pretty much, a domain != an irq_chip), and I'm convinced it would make life easier for archs with similar setups. The remapping core also shows its usefulness on archs with very big interrupt numbers, like sparc or pSeries ppc, and possibly others.

Except for what appears to be the instability of the irq numbers on simpler configurations, I don't have a problem with it.

> Now, I -do- have a problem with one aspect of your proposed design, which is to keep the linux interrupt number in the generic irq_desc, which I think defeats most of the purpose of moving away from those linux irq numbers. If you do so, then I'll have to keep a separate remapping layer and keep a mechanism for virtualizing linux numbers.

Until we find a solution for the user space side of things, we seem to need the unsigned int irq number for user space. Now I don't want people mapping back and forth, which is why I don't intend to provide a reverse function. But of course there will be a for_each_irq in the genirq layer, so if people really want to they will be able to go from the linux irq to an irq_desc. But we don't have to export that generically (except possibly something for the isa irqs).

Eric
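The second kind of hardware number described above, the (cpu, vector) pair seen while handling an interrupt, suggests a per-cpu table from vector to descriptor. A sketch under that assumption. The table sizes and names are hypothetical, and this is not the actual arch/x86 code.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS_SKETCH 4      /* hypothetical */
#define NR_VECTORS     256    /* x86 vectors are 8 bits */

struct irq_desc { unsigned int apic, pin; };

/* Per-cpu table, filled in when an (apic, pin) is assigned a
 * (cpu, vector); the interrupt entry path only sees the vector. */
static struct irq_desc *vector_desc[NR_CPUS_SKETCH][NR_VECTORS];

static void assign_vector(struct irq_desc *desc, unsigned int cpu,
                          unsigned int vector)
{
    assert(cpu < NR_CPUS_SKETCH && vector < NR_VECTORS);
    vector_desc[cpu][vector] = desc;
}

/* Called with the vector the local APIC delivered on this cpu. */
static struct irq_desc *vector_to_desc(unsigned int cpu, unsigned int vector)
{
    assert(cpu < NR_CPUS_SKETCH && vector < NR_VECTORS);
    return vector_desc[cpu][vector];
}
```

Installing the same descriptor in several cpus' tables is how a single (apic, pin) can be reached through a (cpumask_t, vector) assignment. This is also why the reverse map can be kept per cpu.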
Re: [RFC] killing the NR_IRQS arrays.
On Sat, 2007-02-17 at 02:37 +0100, Arnd Bergmann wrote:
> On Friday 16 February 2007 23:37, Benjamin Herrenschmidt wrote:
> > You might want to have a look at the powerpc API with its remapping
> > capabilities. It's very nice for handling multiple domain spaces. It
> > might be of some use for you.
>
> I don't consider the powerpc virtual IRQs a solution for the problem.
> While I believe you did the right thing for powerpc with generalizing
> this over all its platforms, it really isn't more than a workaround
> for the problem that we can't deal well with the static irq_desc
> array.

It's not a solution per se, though it contains elements of a solution like the reverse mapping, which I use to map HW numbers to virtual irqs but which can trivially be adapted to map HW numbers to irq_desc pointers.

Among other things, I want to make sure that we don't end up with just putting an irq number in a field of the irq_desc and have half of the drivers peek at it and assume we can convert between irq_desc* and number in arbitrary ways.

The HW irq number should be as opaque as possible to the world outside of the PIC code and/or arch code that assigns them. That's an area where the powerpc and/or sparc code might be of use.

> When that problem is now getting worse on other architectures, we
> should try to get it right on all of them, rather than spreading
> the workaround further.

Yes, but I'd like aspects of my remapping work to be included in whatever we come up with, which is to have the new irq_desc either hide the underlying HW number, or at least make it very clear that it's an opaque token and not guaranteed to be unique across multiple PICs in the system.

In addition, if we remove the numbers, archs will need basically the exact same services provided by the powerpc irq core for reverse mapping (going from a HW irq number on a given PIC back to an irq_desc *). Either using a linear array for simple PICs or a radix tree for platforms with very big interrupt numbers (BTW. I think we have lockless radix trees nowadays, I can remove the spinlocks to protect it in the powerpc remapper).

Ben.
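For the simple-PIC case, the reverse map can be a flat array indexed by the PIC's own number. A minimal userspace sketch follows. The structure and function names are hypothetical and are not the powerpc API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the generic per-interrupt descriptor. */
struct irq_desc {
    unsigned long hwirq;   /* number local to the owning PIC */
};

/* Linear reverse map for a PIC with a small, fixed number of inputs:
 * index = hardware number, value = descriptor (NULL if unmapped). */
struct linear_revmap {
    unsigned long size;
    struct irq_desc **map;
};

static int revmap_init(struct linear_revmap *rm, unsigned long size)
{
    rm->size = size;
    rm->map = calloc(size, sizeof(*rm->map));
    return rm->map ? 0 : -1;
}

static void revmap_free(struct linear_revmap *rm)
{
    free(rm->map);
    rm->map = NULL;
    rm->size = 0;
}

/* Done once, when the interrupt is set up. */
static int revmap_set(struct linear_revmap *rm, unsigned long hwirq,
                      struct irq_desc *desc)
{
    assert(desc != NULL);
    if (hwirq >= rm->size)
        return -1;                 /* not a valid input on this PIC */
    rm->map[hwirq] = desc;
    desc->hwirq = hwirq;
    return 0;
}

/* Done on every interrupt: O(1) lookup from the number the PIC
 * reports to the descriptor. */
static struct irq_desc *revmap_find(const struct linear_revmap *rm,
                                    unsigned long hwirq)
{
    return hwirq < rm->size ? rm->map[hwirq] : NULL;
}
```

Memory here is proportional to the number of PIC inputs. That is fine for a 24-pin controller but not for 64-bit hypervisor tokens, which is where the radix-tree variant comes in.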
Re: [RFC] killing the NR_IRQS arrays.
On Friday 16 February 2007 23:37, Benjamin Herrenschmidt wrote:
> You might want to have a look at the powerpc API with its remapping
> capabilities. It's very nice for handling multiple domain spaces. It
> might be of some use for you.

I don't consider the powerpc virtual IRQs a solution for the problem. While I believe you did the right thing for powerpc with generalizing this over all its platforms, it really isn't more than a workaround for the problem that we can't deal well with the static irq_desc array.

When that problem is now getting worse on other architectures, we should try to get it right on all of them, rather than spreading the workaround further.

	Arnd <><
Re: [RFC] killing the NR_IRQS arrays.
> > Rather than having the job of rewriting this code during 2.6, I'd much
> > prefer to get something sorted, even if it is ARM only before 2.6.
> >
> > I believe that there are some common problems with the existing API
> > which have been hinted at over the last few days, such as large
> > NR_IRQS. As such, I think it would be a good idea to try to thrash
> > this issue out and get something which everyone is happy with.
> >
> > Additionally, I've added Alan's "reserve then hook" idea to the API;
> > I seem to remember there is a case in IDE which needs something like
> > this.

You might want to have a look at the powerpc API with its remapping capabilities. It's very nice for handling multiple domain spaces. It might be of some use for you.

I like your proposed API; I think that's where we want to go in the long run.

Ben.
Re: [RFC] killing the NR_IRQS arrays.
On Fri, 2007-02-16 at 13:41 +0100, Ingo Molnar wrote:
> * Eric W. Biederman <[EMAIL PROTECTED]> wrote:
>
> > So I propose we remove all assumptions from the code that we actually
> > have an array of irqs. That will allow for irq_desc to be dynamically
> > allocated instead of statically allocated saving memory and reducing
> > kernel complexity.
>
> hm. I'd suggest to do this without changing request_irq() - and then we
> could avoid the 'massive, every driver affected' change, right?
>
> i.e. because we'll (have to) have an nr_to_desc() and desc_to_nr()
> mapping facility anyway, lets just not change the driver APIs massively.
> There dont seem to be that many drivers that assume that irq_desc[] is
> an array - are there?
>
> otherwise, in terms of the irqchips infrastructure and the API between
> genirq and the irqchip arch-level drivers, this change makes quite a bit
> of sense i think.
>
> or am i missing something fundamental?

Well, I don't want to see anything like desc_to_nr / nr_to_desc unless the number in question is a virtual number. That is, there is no way we should go that way and keep passing a HW number through request_irq. That would just be a total nightmare for powerpc and sparc at least.

What we can do is generalize the powerpc virtual irq scheme though. You can see the implementation in arch/powerpc/kernel/irq.c starting from the definition of irq_alloc_host(), though for some stupid reason I've put all the documentation in include/asm-powerpc/irq.h, so you might want to start there.

Once the IRQ numbers are virtualized, it becomes easier to slowly migrate things to use irq_desc_t * while still having a virtual number available. Once everything has been migrated, we can then get rid of the virtual numbers completely, except maybe for an optional 16-entry array for legacy cruft.

Ben.
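The migration path described above is to virtualize the numbers first and then move callers to descriptor pointers. During that transition, old callers could keep working through a thin wrapper like the one below. Both `request_irq_nr()` and `irq_desc_request()` are hypothetical, and their signatures are cut down to the bare minimum. The real request_irq() also takes a handler, flags, a name and a device cookie.

```c
#include <assert.h>
#include <stddef.h>

#define NR_VIRQS 32   /* hypothetical size of the virtual number space */

struct irq_desc {
    unsigned int virq;          /* virtual number kept during migration */
    int requested;
};

static struct irq_desc *virq_to_desc[NR_VIRQS];

/* Hypothetical new pointer-based entry point for converted code. */
static int irq_desc_request(struct irq_desc *desc)
{
    if (!desc || desc->requested)
        return -1;
    desc->requested = 1;
    return 0;
}

/* Old number-based entry point, kept as a thin wrapper so unconverted
 * drivers keep working while others move to irq_desc pointers. */
static int request_irq_nr(unsigned int virq)
{
    if (virq >= NR_VIRQS)
        return -1;
    return irq_desc_request(virq_to_desc[virq]);
}
```

Once no caller uses the number-based wrapper any more, the `virq_to_desc` table can shrink to the legacy entries only.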
Re: [RFC] killing the NR_IRQS arrays.
On Fri, 2007-02-16 at 05:10 -0700, Eric W. Biederman wrote:
> Getting the drivers changed actually looks to be pretty straight
> forward it will just be a very large mechanical change. We change the
> type of variables where appropriate and every once in a while
> introduce an irq_nr(irq) to get the actual irq number for the places
> that care (ISA or print statements).

Dunno about that irq_nr thingy. If we go that way, I'd be tempted to remove the number completely from the "public" side of irq_desc... or not.

On powerpc, we have this remapped thingy because we completely separate the linux "virtual" interrupt domain from the physical numbering domains of each PIC. Your change would turn the linux virtual domain into pointers, removing the need for an array and associated limitations, which is nice.

So to a given irq_desc / irq "virtual" number today, I match a pair: a HW number (which is a special typedef currently defined as an unsigned long) and a pointer to the irq "host" (which is the entity that defines a HW number domain). That means that you can have multiple hosts, and a given HW number can exist multiple times, once per host.

Do you think the irq_hwnumber_t thingy I have should then be generalized and put into the irq_desc? I would need an additional void * pointer to the irq host as well (it's not a 1:1 relationship to an irq chip and needs to be accessed by generic code).

Having the HW number be clearly specific to a "domain controller" also makes a lot of sense in the embedded field with lots of cascaded interrupt controllers. It avoids having to play all sorts of tricks to assign ranges of numbers to various controllers in the system. Only the local number on a given controller matters; the rest is dynamically assigned.

Another option would be to have the irq_desc be created by the arch and "embedded" in a larger data structure, in which case the HW number would be part of the private part of that data structure. Though I suppose that could be a problem with ISA...

I suspect that for backward compatibility, we will need to keep something (optionally maybe via CONFIG_*) for ISA/legacy interrupts. That is a 16-entry irq_desc* array, so we can go from a legacy IRQ number to an irq_desc on platforms that have legacy/ISA crap floating around. On powerpc, what I do is always reserve entries 0...15 of my remapping array in such a way that linux virtual irq 0 is always reserved, and 1...15 are only ever assigned to legacy interrupts if they exist in the system, or left unassigned if they don't.

> I think we can make this change fairly smoothly if before the code is
> merged into Linus's tree we have a patchset prepared with all of the
> core infrastructure changes and a best effort at all of the driver
> changes. Then early some merge window we merge the patchset, and
> fixup the drivers that were missed.

As long as we do things properly and not with a big "DESIGNED FOR x86" hack in the middle that makes it hard for everybody else, I agree.

Cheers,
Ben.
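The powerpc legacy reservation described above can be sketched as a simple allocator. Virtual irq 0 is never handed out, and 1...15 go only to legacy interrupts that actually exist. Sizes and names are hypothetical.

```c
#include <assert.h>

#define NR_VIRQS           64   /* hypothetical */
#define NUM_ISA_INTERRUPTS 16   /* legacy numbers 0..15 */

static unsigned char virq_used[NR_VIRQS];

/* legacy_irq >= 0: claim that exact legacy number (1..15 only).
 * legacy_irq <  0: allocate the next free non-legacy number (16 up).
 * Returns the virtual irq, or -1 on failure. */
static int virq_alloc(int legacy_irq)
{
    if (legacy_irq >= 0) {
        if (legacy_irq == 0 || legacy_irq >= NUM_ISA_INTERRUPTS ||
            virq_used[legacy_irq])
            return -1;
        virq_used[legacy_irq] = 1;
        return legacy_irq;
    }
    for (int v = NUM_ISA_INTERRUPTS; v < NR_VIRQS; v++) {
        if (!virq_used[v]) {
            virq_used[v] = 1;
            return v;
        }
    }
    return -1;
}
```

On a machine without ISA, entries 1..15 simply stay unused. This keeps legacy numbers meaning the same thing wherever they do exist.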
Re: [RFC] killing the NR_IRQS arrays.
On Fri, Feb 16, 2007 at 09:43:24PM +0100, Arnd Bergmann wrote:
> On Friday 16 February 2007 20:52, Russell King wrote:
> > On Fri, Feb 16, 2007 at 08:45:58PM +0100, Arnd Bergmann wrote:
> > > We did something like this a few years back on the s390 architecture,
> > > which happens to be lucky enough not to share any interrupt based
> > > drivers with any of the other architectures.
> >
> > What you're proposing is looking similar to a proposal I put forward some
> > 4 years ago, but was rejected. Maybe times have changed and there's a
> > need for it now.
>
> Yes, I think times have changed, with the increased popularity of MSI
> and paravirtualized devices. A few points on your old proposal though:
>
> - Doing it per architecture no longer sounds feasible. I think it would
>   need to be done per subsystem so that the drivers can be adapted to
>   a new interface, and most drivers are used across multiple architectures.
> - struct irq sounds much more fitting than struct irq_desc
> - creating new irq_foo() functions to replace foo_irq() also sounds right.
> - doing subsystem specific abstractions ideally allows the drivers to
>   not even need to worry about the irq pointer, significantly simplifying
>   the interface for register/unregister.

I agree with your points above, except for:

> - I don't see the point in splitting request_irq into irq_request and
>   irq_register.

This was to work around those scenarios where you want to mark an IRQ
resource as being in use prior to actually using it, in much the same way
as is done with I/O ports.

I've come across hardware where you need to claim the interrupt with the
controller masked, configure the device generating the interrupt
appropriately, and only then unmask it. Otherwise you end up spinning.

To work around that, we've had to introduce additional flags into the
genirq subsystem (IRQF_NOAUTOEN), whereas separating the "obtain" step from
the "start using" step of request_irq would have made this unnecessary.

Another example where this (was|still is) used is the IDE code, but that's
probably been cleaned up in some way by now.

There's nothing wrong with keeping a combined "request_irq" for the common
case, though.

--
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:
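The ordering Russell describes can be modelled in a few lines of userspace C. This is a sketch under stated assumptions: `struct irq_line`, `irq_claim()`, `irq_start()` and `line_fire()` are hypothetical names, not the genirq API, and the toy device simply cannot acknowledge its interrupt until it has been configured. Claiming with the line masked means an early assertion stays pending instead of being delivered to a handler that cannot clear it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical handler type: returns 1 if the interrupt was serviced. */
typedef int (*line_handler_t)(void *dev_id);

/* Hypothetical model of one interrupt line; not a kernel structure. */
struct irq_line {
	bool claimed;            /* reserved, like an I/O port region */
	bool unmasked;           /* controller will deliver the interrupt */
	line_handler_t handler;
	void *dev_id;
	unsigned int unhandled;  /* deliveries the handler could not service */
};

/* Toy device: it can only acknowledge its interrupt once configured. */
struct toy_dev {
	bool configured;
	unsigned int acked;
};

static int toy_handler(void *dev_id)
{
	struct toy_dev *d = dev_id;

	if (!d->configured)
		return 0;        /* cannot clear the interrupt source yet */
	d->acked++;
	return 1;
}

/* Phase 1: reserve the line and install the handler, but keep it masked. */
static int irq_claim(struct irq_line *l, line_handler_t handler, void *dev_id)
{
	if (l->claimed)
		return -1;       /* already owned; sharing is not modelled */
	l->claimed = true;
	l->unmasked = false;
	l->handler = handler;
	l->dev_id = dev_id;
	return 0;
}

/* Phase 2: the device is set up, so let interrupts through. */
static void irq_start(struct irq_line *l)
{
	assert(l->claimed);
	l->unmasked = true;
}

/* Simulate the device asserting the line; returns 1 if it was serviced. */
static int line_fire(struct irq_line *l)
{
	if (!l->unmasked)
		return 0;        /* masked: stays pending, no handler runs */
	if (l->handler(l->dev_id))
		return 1;
	l->unhandled++;          /* on a level-triggered line this repeats: spinning */
	return 0;
}
```

Calling `irq_claim()` immediately followed by `irq_start()` behaves like today's combined request_irq(): if the device fires before it is configured, the delivery lands in `unhandled`, which is the spinning case on a level-triggered line.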
Re: [RFC] killing the NR_IRQS arrays.
On Friday 16 February 2007 20:52, Russell King wrote:
> On Fri, Feb 16, 2007 at 08:45:58PM +0100, Arnd Bergmann wrote:
> > We did something like this a few years back on the s390 architecture, which
> > happens to be lucky enough not to share any interrupt based drivers with
> > any of the other architectures.
>
> What you're proposing is looking similar to a proposal I put forward some
> 4 years ago, but was rejected. Maybe times have changed and there's a
> need for it now.

Yes, I think times have changed, with the increased popularity of MSI
and paravirtualized devices. A few points on your old proposal though:

- Doing it per architecture no longer sounds feasible. I think it would
  need to be done per subsystem so that the drivers can be adapted to
  a new interface, and most drivers are used across multiple architectures.
- struct irq sounds much more fitting than struct irq_desc
- creating new irq_foo() functions to replace foo_irq() also sounds right.
- I don't see the point in splitting request_irq into irq_request and
  irq_register.
- doing subsystem specific abstractions ideally allows the drivers to
  not even need to worry about the irq pointer, significantly simplifying
  the interface for register/unregister.

	Arnd <><
Re: [RFC] killing the NR_IRQS arrays.
On Fri, Feb 16, 2007 at 08:45:58PM +0100, Arnd Bergmann wrote:
> On Friday 16 February 2007 13:10, Eric W. Biederman wrote:
> > To do this I believe will require a s/unsigned int irq/struct irq_desc *irq/
> > throughout the entire kernel. Getting the arch specific code and the
> > generic kernel infrastructure fixed and ready for that change looks
> > like a pain but pretty doable.
>
> We did something like this a few years back on the s390 architecture, which
> happens to be lucky enough not to share any interrupt based drivers with
> any of the other architectures.

What you're proposing is looking similar to a proposal I put forward some
4 years ago, but was rejected. Maybe times have changed and there's a
need for it now.

Message attached.

--
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

--- Begin Message ---
Hi,

I've recently received an updated development system from ARM Ltd, which
has caused me to become concerned about whether the existing IRQ
infrastructure for the ARM architecture is really up to the job of handling
the developments which will occur over the 2.6 lifetime.

Essentially we're going to be seeing the emergence of vectored interrupt
controllers, and knowing hardware designers, they'll continue the practice
of chaining interrupt controllers (which is pretty common on ARM today).

I have some hardware here today which has a vectored interrupt controller
chained after two non-vectored controllers. This vectored interrupt
controller is on an add-on card, and so has no fixed address space and no
fixed IRQ numbering.

Rather than having the job of rewriting this code during 2.6, I'd much
prefer to get something sorted before 2.6, even if it is ARM only.

I believe that there are some common problems with the existing API which
have been hinted at over the last few days, such as large NR_IRQS. As such,
I think it would be a good idea to try to thrash this issue out and get
something which everyone is happy with.
Additionally, I've added Alan's "reserve then hook" idea to the API; I seem
to remember there is a case in IDE which needs something like this.

Please note that what I am proposing is not to strip out the existing API
between now and 2.7; what I am proposing is a structure for 2.7 which can
optionally be implemented by architectures and used in architecture specific
drivers now if they feel they would benefit from it.

Comments? (other than "wtf are you thinking about this so close to 2.6,
are you mad" 8))

Linux Interrupt API
===================
Russell King <[EMAIL PROTECTED]>

The Linux Interrupt API provides a flexible mechanism to handle and control
interrupts within the kernel. The design requirements for this API are:

 - must have as little overhead as possible for commodity hardware
 - must be easy and obvious to use
 - must allow complex multi-level interrupt implementations to exist
   transparently to device drivers
 - must be compatible with the existing API

Essentially, this means that implementation of the existing API must be
simple.

--
The API.

struct irq {
	/* architecture defined information */
	/* must not be dereferenced by drivers */
	/* eg, x86's irq_desc_t or sparc64's struct ino_bucket */
};

#define NO_IRQ

/**
 * irq_get - increment reference count on the IRQ descriptor
 * @irq: interrupt descriptor
 *
 * IRQ descriptor reference counting is mandatory for
 * implementations which provide dynamically allocated IRQ
 * descriptors. Statically allocated IRQ descriptor
 * implementations may define these to be no-ops.
 */
struct irq *irq_get(struct irq *irq);

/**
 * irq_put - decrement reference count on IRQ descriptor
 * @irq: interrupt descriptor
 *
 * Decrement the reference counter in an IRQ descriptor.
 * If the reference counter drops to zero, the IRQ descriptor
 * will be freed.
 *
 * IRQ descriptor reference counting is mandatory for
 * implementations which provide dynamically allocated IRQ
 * descriptors. Statically allocated IRQ descriptor
 * implementations may define these to be no-ops.
 */
void irq_put(struct irq *irq);

/**
 * irq_disable_nosync - disable an irq without waiting
 * @irq: Interrupt descriptor to disable
 *
 * Disable the selected interrupt line. Disables and Enables are
 * nested.
 * Unlike irq_disable(), this function does not ensure existing
 * instances of the IRQ handler have completed before returning.
 *
 * This function may be called from IRQ context.
 */
void irq_disable_nosync(struct irq *irq);

/**
 * irq_disable - disable an irq and wait for completion
 * @irq: Interrupt descriptor to disable
 *
 * Disable the selected interrupt line. Enables and Disables are
 *
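To make the "Disables and Enables are nested" rule concrete, here is a small userspace model of a per-descriptor disable depth. The names (`struct model_irq`, `model_disable_nosync()`, `model_enable()`) are hypothetical; the enable counterpart is assumed, since the attachment above is cut off before it, and the wait for running handlers that irq_disable() adds is only noted in a comment. Only the first disable masks the line, and only the matching last enable unmasks it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of one descriptor's disable state. */
struct model_irq {
	unsigned int depth;        /* outstanding disables */
	bool masked;               /* state of the (imaginary) controller */
	unsigned int mask_writes;  /* times the controller was reprogrammed */
};

/*
 * Models irq_disable_nosync(): only the first disable masks the line.
 * The proposed irq_disable() would additionally wait for handler
 * instances that are already running; that is not modelled here.
 */
static void model_disable_nosync(struct model_irq *irq)
{
	if (irq->depth++ == 0) {
		irq->masked = true;
		irq->mask_writes++;
	}
}

/* Assumed enable counterpart: only the matching last enable unmasks. */
static void model_enable(struct model_irq *irq)
{
	assert(irq->depth > 0);    /* an unbalanced enable is a caller bug */
	if (--irq->depth == 0) {
		irq->masked = false;
		irq->mask_writes++;
	}
}
```

Nesting lets independent code paths disable the same line without having to know about each other; the controller is touched only on the 0 to 1 and 1 to 0 transitions.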
Re: [RFC] killing the NR_IRQS arrays.
On Friday 16 February 2007 13:10, Eric W. Biederman wrote:
> To do this I believe will require a s/unsigned int irq/struct irq_desc *irq/
> throughout the entire kernel. Getting the arch specific code and the
> generic kernel infrastructure fixed and ready for that change looks
> like a pain but pretty doable.

We did something like this a few years back on the s390 architecture, which
happens to be lucky enough not to share any interrupt based drivers with
any of the other architectures. It helped a lot on s390, and I think the
change will be beneficial on others as well. For example, powerpc already
uses 'virtual' interrupt numbers to collapse the large (sparse) range of
interrupt numbers into 512 unique numbers. This could easily be avoided if
there was simply an array of irq_desc structures per interrupt controller.

However, I also think we should maintain the old interface, and introduce
a new one to deal only with those cases that benefit from it (MSI, Xen,
powerpc VIO, ...). This means one subsystem can be converted at a time.

I don't think there is a point converting the legacy ISA interrupts to a
different interface, as the concept of IRQ numbers is part of the subsystem
itself (if you want to call ISA a subsystem...).

For PCI, it makes a lot more sense to use something else, considering that
PCI interrupts are defined as 'pins' instead of 'lines': an interrupt pin
is defined per slot while the line is per bus, and in a system with
multiple PCI buses the line is still not necessarily unique.
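Arnd's point about per-controller arrays can be sketched in userspace C. Everything here is hypothetical (`struct toy_controller`, `toy_lookup()` and friends) and is not powerpc's actual virtual-IRQ code: each controller allocates descriptors only for its own inputs, so a cascaded controller with a large or sparse numbering does not force a global remap into a fixed range.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_controller;

/* Hypothetical descriptor; not genirq's irq_desc or powerpc's virq code. */
struct toy_desc {
	unsigned int hwirq;            /* number as the controller sees it */
	struct toy_controller *chip;   /* owning controller */
};

/* Each controller owns descriptors for its own inputs only. */
struct toy_controller {
	const char *name;
	unsigned int nr_irqs;
	struct toy_desc *descs;
};

static struct toy_controller *toy_controller_new(const char *name, unsigned int nr)
{
	struct toy_controller *c;

	assert(nr > 0);
	c = malloc(sizeof(*c));
	if (!c)
		return NULL;
	c->descs = calloc(nr, sizeof(*c->descs));
	if (!c->descs) {
		free(c);
		return NULL;
	}
	c->name = name;
	c->nr_irqs = nr;
	for (unsigned int i = 0; i < nr; i++) {
		c->descs[i].hwirq = i;
		c->descs[i].chip = c;
	}
	return c;
}

/* O(1) lookup by (controller, hwirq): no global renumbering is needed. */
static struct toy_desc *toy_lookup(struct toy_controller *c, unsigned int hwirq)
{
	return hwirq < c->nr_irqs ? &c->descs[hwirq] : NULL;
}

static void toy_controller_free(struct toy_controller *c)
{
	if (!c)
		return;
	free(c->descs);
	free(c);
}
```

The descriptor carries a pointer back to its controller, so code holding a descriptor never needs to consult a global number-to-descriptor table.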
One interface I could imagine for PCI devices would be

/* generic functions */
int request_irq_desc(struct irq_desc *desc, irq_handler_t handler,
		     unsigned long irqflags, const char *devname,
		     void *dev_id);
void free_irq_desc(struct irq_desc *desc, void *dev_id);

/* legacy functions */
int request_irq(unsigned int irq, irq_handler_t handler,
		unsigned long irqflags, const char *devname, void *dev_id)
{
	return request_irq_desc(lookup_irq_desc(irq), handler, irqflags,
				devname, dev_id);
}

void free_irq(unsigned int irq, void *dev_id)
{
	free_irq_desc(lookup_irq_desc(irq), dev_id);
}

/* pci specific */
struct irq_desc *pci_request_irq(struct pci_dev *dev, int pin,
				 irq_handler_t handler)
{
	struct irq_desc *desc = pci_lookup_irq(dev, pin);
	int ret;

	if (!desc)
		return NULL;

	ret = request_irq_desc(desc, handler, IRQF_SHARED,
			       dev->dev.bus_id, dev);
	if (ret < 0)
		return NULL;

	return desc;
}

void pci_free_irq(struct pci_dev *dev, int pin)
{
	free_irq_desc(pci_lookup_irq(dev, pin), dev);
}

Now I don't know enough about MSI yet, but I could imagine that something
along these lines would work as well, and we could simply require all drivers
that want to support MSI to use the new interfaces.

	Arnd <><
Re: [RFC] killing the NR_IRQS arrays.
Eric W. Biederman wrote:
> Well you shouldn't need to wait, just run with a kernel with NR_IRQS >= 1024.
> 1024 is a stretch but it isn't too bad. There are already x86 boxes that have
> more pins on their ioapics than that. So x86_64, and with this latest
> round of patches from Len Brown and me, i386 should be able to support that.
>

Early Xen patches did just that, but there was general criticism about the
memory use. And in the paravirt_ops world, a large compile-time static
allocation is not really acceptable if it's only needed by Xen.

But, hey, if you're OK with it I'll submit the patch ;)

> On the other side, 1024 looks extremely limiting for exposing pci devices.
> If someone gets serious about pushing what is legal with MSI-X you may be
> in trouble, as a single device is allowed to have 4096 interrupts. Not
> that I can think of a user for so many but...
>

No, I think we'll burn that bridge when we come to it.

    J
Re: [RFC] killing the NR_IRQS arrays.
Jeremy Fitzhardinge <[EMAIL PROTECTED]> writes:

> Eric W. Biederman wrote:
>> So I propose we remove all assumptions from the code that we actually
>> have an array of irqs. That will allow for irq_desc to be dynamically
>> allocated instead of statically allocated saving memory and reducing
>> kernel complexity.
>>
>
> Sounds good to me. In Xen we have 1024 event channels which we need to
> map down into a smaller irq space. Aside from the complexity of maintaining
> a mapping table, that's not a huge issue for now, but when we start
> exposing pci devices to guests it all becomes more complex. The ideal
> for us is to simply use event channel == irq, which this would allow.

Well, you shouldn't need to wait, just run with a kernel with NR_IRQS >= 1024.
1024 is a stretch but it isn't too bad. There are already x86 boxes that have
more pins on their ioapics than that. So x86_64, and with this latest round
of patches from Len Brown and me, i386 should be able to support that.

On the other side, 1024 looks extremely limiting for exposing pci devices.
If someone gets serious about pushing what is legal with MSI-X you may be
in trouble, as a single device is allowed to have 4096 interrupts. Not
that I can think of a user for so many but...

Eric
Re: [RFC] killing the NR_IRQS arrays.
Eric W. Biederman wrote:
> So I propose we remove all assumptions from the code that we actually
> have an array of irqs. That will allow for irq_desc to be dynamically
> allocated instead of statically allocated saving memory and reducing
> kernel complexity.
>

Sounds good to me. In Xen we have 1024 event channels which we need to
map down into a smaller irq space. Aside from the complexity of maintaining
a mapping table, that's not a huge issue for now, but when we start exposing
pci devices to guests it all becomes more complex. The ideal for us is to
simply use event channel == irq, which this would allow.

    J
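The mapping table Jeremy mentions might look roughly like the following. This is a hypothetical sketch, not Xen's actual event-channel code; the 1024 figure comes from the message above, while the 256-entry irq space and all function names are assumptions chosen for illustration. With event channel == irq, both tables and the first-free search would disappear.

```c
#include <assert.h>

#define NR_EVENT_CHANNELS 1024   /* figure quoted in the thread */
#define NR_SMALL_IRQS     256    /* hypothetical, smaller irq space */

static int evtchn_to_irq[NR_EVENT_CHANNELS];
static int irq_to_evtchn[NR_SMALL_IRQS];

/* Mark every channel and every irq as unbound (-1). */
static void map_init(void)
{
	for (int i = 0; i < NR_EVENT_CHANNELS; i++)
		evtchn_to_irq[i] = -1;
	for (int i = 0; i < NR_SMALL_IRQS; i++)
		irq_to_evtchn[i] = -1;
}

/* Bind a channel to the first free irq; -1 once the irq space is full. */
static int bind_evtchn(int evtchn)
{
	assert(evtchn >= 0 && evtchn < NR_EVENT_CHANNELS);
	if (evtchn_to_irq[evtchn] >= 0)
		return evtchn_to_irq[evtchn];      /* already bound */
	for (int irq = 0; irq < NR_SMALL_IRQS; irq++) {
		if (irq_to_evtchn[irq] < 0) {
			irq_to_evtchn[irq] = evtchn;
			evtchn_to_irq[evtchn] = irq;
			return irq;
		}
	}
	return -1;
}

/* Release both directions of the mapping so the irq can be reused. */
static void unbind_evtchn(int evtchn)
{
	int irq;

	assert(evtchn >= 0 && evtchn < NR_EVENT_CHANNELS);
	irq = evtchn_to_irq[evtchn];
	if (irq < 0)
		return;
	irq_to_evtchn[irq] = -1;
	evtchn_to_irq[evtchn] = -1;
}
```

The two arrays must be kept consistent on every bind and unbind, and the smaller irq space can run out even though plenty of event channels remain, which is the complexity a direct channel == irq scheme avoids.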
Re: [RFC] killing the NR_IRQS arrays.
Ingo Molnar <[EMAIL PROTECTED]> writes:

> or am i missing something fundamental?

One piece. At the driver level this is not a big scary change. It is just
a change with widespread effect.

It should be no worse than enabling a very revealing new compiler warning.
Every fix should be purely mechanical. There should be no need at all to
think hard to get it right (unless things are broken today and we just
don't see it).

Yes, typos and the like will happen. There will be issues. But 99% of them
will be code that doesn't compile, for an obvious reason.

Eric
Re: [RFC] killing the NR_IRQS arrays.
Andi Kleen <[EMAIL PROTECTED]> writes:

>> I expect the most it makes sense to aim for 2.6.22 are the genirq
>> changes so the internal arch code is passing struct irq_desc
>> everywhere internally.
>
> Are there any lifetime issues with passing pointers around?
> e.g. what happens on APIC hotunplug etc.? We don't necessarily
> support that yet, but for a big interface change it should
> probably be kept in mind first.

Ouch. Let's consider the case of pci device (using MSIs) hot unplug. That
case we theoretically support today, but I'm not certain we account for it.

The only real issue (I can imagine) would come from something that is not
part of the device driver using the irq, as the device and everything
associated with it should have the same lifetime rules. (You can't unplug
an ioapic without unplugging the device it is connected to.)

So the things to consider would be things like /proc/interrupts and
/proc/irq. I think we already have some kind of revoke in place when the
irq goes away, so it probably makes sense just to make that revoke solid
and immediate.

So I can't imagine any real lifetime issues that would cause us problems
with a pointer.

Eric
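The "solid and immediate revoke" Eric suggests can be illustrated with a reference-counted descriptor plus a revoked flag. This is a hypothetical single-threaded sketch (`struct dyn_desc`, `desc_get()`, `desc_put()`, `desc_unplug()` and `desc_show()` are assumed names, not kernel code); a real implementation would need atomic reference counts and locking around the revoke. The point it shows is that a /proc-style reader holding a reference keeps the memory alive, yet sees the descriptor as gone the moment the device is unplugged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical dynamically allocated descriptor; not the kernel's irq_desc. */
struct dyn_desc {
	unsigned int refcount;   /* owning device plus any readers */
	bool revoked;            /* set as soon as the device is unplugged */
	unsigned int nr;         /* number shown to userspace */
};

static int live_descs;           /* allocations not yet freed */

static struct dyn_desc *desc_alloc(unsigned int nr)
{
	struct dyn_desc *d = calloc(1, sizeof(*d));

	if (!d)
		return NULL;
	d->refcount = 1;         /* reference held by the owning device */
	d->nr = nr;
	live_descs++;
	return d;
}

static struct dyn_desc *desc_get(struct dyn_desc *d)
{
	d->refcount++;
	return d;
}

/* Drop a reference; the last one frees the descriptor. No locking here. */
static void desc_put(struct dyn_desc *d)
{
	assert(d->refcount > 0);
	if (--d->refcount == 0) {
		free(d);
		live_descs--;
	}
}

/* Hot unplug: revoke immediately, then drop the device's reference. */
static void desc_unplug(struct dyn_desc *d)
{
	d->revoked = true;
	desc_put(d);
}

/* A /proc/interrupts-style reader: reports nothing for a revoked descriptor. */
static int desc_show(const struct dyn_desc *d)
{
	return d->revoked ? -1 : (int)d->nr;
}
```

Separating "revoked" from "freed" is what makes the pointer safe to hold: readers never dereference freed memory, and they never report an interrupt whose device has already left.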
Re: [RFC] killing the NR_IRQS arrays.
Ingo Molnar <[EMAIL PROTECTED]> writes:

> * Eric W. Biederman <[EMAIL PROTECTED]> wrote:
>
>> So I propose we remove all assumptions from the code that we actually
>> have an array of irqs. That will allow for irq_desc to be dynamically
>> allocated instead of statically allocated saving memory and reducing
>> kernel complexity.
>
> hm. I'd suggest to do this without changing request_irq() - and then we
> could avoid the 'massive, every driver affected' change, right?

It is a different aspect of the problem. But we have significant,
problematic inconsistencies in what drivers are doing. I know of at least
one driver that put an irq into an unsigned char and passed it to user
space that way.

So I think the driver change is very much worth doing, because a pointer is
a token that is much harder to abuse than an unsigned int, where you think
you know how it works and so can take some liberties.

> i.e. because we'll (have to) have an nr_to_desc() and desc_to_nr()
> mapping facility anyway, let's just not change the driver APIs massively.
> There don't seem to be that many drivers that assume that irq_desc[] is
> an array - are there?

We will have to have desc_to_nr(). I don't know about nr_to_desc(). Even
if we do, nr_to_desc() will probably just be a linked list walk.

There are a lot of drivers and other pieces of the kernel that don't
believe an irq is an unsigned int, and just using an unsigned int makes
killing the array an expensive operation, because operations go from O(1)
to O(N). Now that isn't something anyone on a small machine is likely to
care about (N < 32).

I have no problem staggering the change. But I see a lot of benefit in
going the whole way.

> otherwise, in terms of the irqchips infrastructure and the API between
> genirq and the irqchip arch-level drivers, this change makes quite a bit
> of sense i think.

Sounds good, and that is certainly the level to start.

Eric
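Ingo's `nr_to_desc()`/`desc_to_nr()` pair and Eric's remark that `nr_to_desc()` would probably become a linked list walk can be illustrated with a hypothetical userspace sketch (`struct list_desc`, `desc_register()` and the global list are assumptions, not kernel code). Going from descriptor to number stays O(1); going from number to descriptor costs O(N) once the array is gone, which is why Eric prefers that callers hold the pointer itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor kept on a list rather than in irq_desc[NR_IRQS]. */
struct list_desc {
	unsigned int nr;           /* legacy number, stored for desc_to_nr() */
	struct list_desc *next;
};

static struct list_desc *all_descs;   /* head of the global list */

/* O(1): the number is just a field of the descriptor. */
static unsigned int desc_to_nr(const struct list_desc *d)
{
	return d->nr;
}

/* O(N): with no array to index, lookup by number walks the list. */
static struct list_desc *nr_to_desc(unsigned int nr)
{
	for (struct list_desc *d = all_descs; d != NULL; d = d->next)
		if (d->nr == nr)
			return d;
	return NULL;
}

/* Add a descriptor under a number that must not already be in use. */
static void desc_register(struct list_desc *d, unsigned int nr)
{
	assert(nr_to_desc(nr) == NULL);
	d->nr = nr;
	d->next = all_descs;
	all_descs = d;
}
```

On a machine with a handful of interrupts the walk is negligible, matching Eric's N < 32 remark; it only matters where many MSI or virtual interrupts are registered.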
Re: [RFC] killing the NR_IRQS arrays.
* Eric W. Biederman <[EMAIL PROTECTED]> wrote:

> So I propose we remove all assumptions from the code that we actually
> have an array of irqs. That will allow for irq_desc to be dynamically
> allocated instead of statically allocated saving memory and reducing
> kernel complexity.

hm. I'd suggest to do this without changing request_irq() - and then we
could avoid the 'massive, every driver affected' change, right?

i.e. because we'll (have to) have an nr_to_desc() and desc_to_nr()
mapping facility anyway, let's just not change the driver APIs massively.
There don't seem to be that many drivers that assume that irq_desc[] is
an array - are there?

otherwise, in terms of the irqchips infrastructure and the API between
genirq and the irqchip arch-level drivers, this change makes quite a bit
of sense i think.

or am i missing something fundamental?

	Ingo
Re: [RFC] killing the NR_IRQS arrays.
> I expect the most it makes sense to aim for 2.6.22 are the genirq
> changes so the internal arch code is passing struct irq_desc
> everywhere internally.

Are there any lifetime issues with passing pointers around?
e.g. what happens on APIC hotunplug etc.? We don't necessarily
support that yet, but for a big interface change it should
probably be kept in mind first.

-Andi
Re: [RFC] killing the NR_IRQS arrays.
I expect the most it makes sense to aim for 2.6.22 are the genirq changes so the internal arch code is passing struct irq_desc everywhere internally. Are there any livetime issues with passing pointers around? e.g. what happens on APIC hotunplug etc.? We don't necessarily support that yet, but for a big interface change it should be probably kept in mind first. -Andi - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] killing the NR_IRQS arrays.
* Eric W. Biederman [EMAIL PROTECTED] wrote: So I propose we remove all assumptions from the code that we actually have an array of irqs. That will allow for irq_desc to be dynamically allocated instead of statically allocated saving memory and reducing kernel complexity. hm. I'd suggest to do this without changing request_irq() - and then we could avoid the 'massive, every driver affected' change, right? i.e. because we'll (have to) have an nr_to_desc() and desc_to_nr() mapping facility anyway, lets just not change the driver APIs massively. There dont seem to be that many drivers that assume that irq_desc[] is an array - are there? otherwise, in terms of the irqchips infrastructure and the API between genirq and the irqchip arch-level drivers, this change makes quite a bit of sense i think. or am i missing something fundamental? Ingo - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] killing the NR_IRQS arrays.
Ingo Molnar [EMAIL PROTECTED] writes: * Eric W. Biederman [EMAIL PROTECTED] wrote: So I propose we remove all assumptions from the code that we actually have an array of irqs. That will allow for irq_desc to be dynamically allocated instead of statically allocated saving memory and reducing kernel complexity. hm. I'd suggest to do this without changing request_irq() - and then we could avoid the 'massive, every driver affected' change, right? It is a different aspect of the problem. But we have significant problematic inconsistencies in what drivers are doing. I know at least one driver put an irq into an unsigned char, and passed it to user space that way. So I think the driver change is very much worth doing because a pointer is a token that is much harder to abuse, than an unsigned int where you think you know how it works and so can take some liberties. i.e. because we'll (have to) have an nr_to_desc() and desc_to_nr() mapping facility anyway, lets just not change the driver APIs massively. There dont seem to be that many drivers that assume that irq_desc[] is an array - are there? We will have to have desc_to_nr(). I don't know about nr_to_desc(). Even if we do nr_to_desc() probably will just be a linked list walk. There are a lot of drivers and other pieces of the kernel that don't believe an irq is an unsigned int, and just using an unsigned int makes killing the array an expensive operation because operations go from O(1) to O(N). Now that isn't something anyone on a small machine is likely to care about (N 32). I have no problem staggering the change. But I see a lot of benefit in going the whole way. otherwise, in terms of the irqchips infrastructure and the API between genirq and the irqchip arch-level drivers, this change makes quite a bit of sense i think. Sounds good, and that is certainly the level to start. 
Eric - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] killing the NR_IRQS arrays.
Andi Kleen [EMAIL PROTECTED] writes: I expect the most it makes sense to aim for 2.6.22 are the genirq changes so the internal arch code is passing struct irq_desc everywhere internally. Are there any livetime issues with passing pointers around? e.g. what happens on APIC hotunplug etc.? We don't necessarily support that yet, but for a big interface change it should be probably kept in mind first. Ouch. Let's consider the case of pci device (using msi's) hot unplug. That case we theoretically support today but I'm not certain we account for it. The only real issue (I can imagine) would come from something that is not part of the device driver using the irq, as the device and everything associated with it should have the same lifetime rules. (You can't unplug an ioapic without unplugging the device it is connected to). So the things to consider would be things like /proc/interrupts and /proc/irq. I think we already have some kind of revoke in place when the irq goes away so it probably makes sense just to make that revoke solid and immediate. So I can't imagine any real lifetime issues that would cause us problems with a pointer. Eric - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] killing the NR_IRQS arrays.
Ingo Molnar [EMAIL PROTECTED] writes: or am i missing something fundamental? One piece. At the driver level this not a big scary change. This is just a change with widespread effect. It should be no worse than enabling a very revealing new compiler warning. Every fix should be purely mechanical. There should be no need at all to think to get it right (unless things are broken today and we just don't see it.). Yes typo's and the like will happen. There will be issues. But 99% of them will be the code doesn't compile, for an obvious reason. Eric - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] killing the NR_IRQS arrays.
Eric W. Biederman wrote: So I propose we remove all assumptions from the code that we actually have an array of irqs. That will allow for irq_desc to be dynamically allocated instead of statically allocated saving memory and reducing kernel complexity. Sounds good to me. In Xen we have 1024 event channels which we need to map down into a smaller irq. Aside from the complexity of maintaining a mapping table, that's not a huge issue for now, but when we start exposing pci devices to guests it all becomes more complex. The ideal for us is to simply use event channel == irq, which this would allow. J - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] killing the NR_IRQS arrays.
Jeremy Fitzhardinge [EMAIL PROTECTED] writes: Eric W. Biederman wrote: So I propose we remove all assumptions from the code that we actually have an array of irqs. That will allow for irq_desc to be dynamically allocated instead of statically allocated saving memory and reducing kernel complexity. Sounds good to me. In Xen we have 1024 event channels which we need to map down into a smaller irq. Aside from the complexity of maintaining a mapping table, that's not a huge issue for now, but when we start exposing pci devices to guests it all becomes more complex. The ideal for us is to simply use event channel == irq, which this would allow. Well you shouldn't need to wait just run with a kernel with NR_IRQS = 1024. 1024 is stretch but it isn't to bad. There are already x86 boxes that have more pins on their ioapics then that. So x86_64 and with this latest round of patches from Len Brown and I i386 should be able to support that. On the other side 1024 looks extremely limiting for exposing pci devices. If someone gets serious about pushing what is legal with MSI-X you may be in trouble. As a single device is allowed to have 4096 interrupts. Not that I can think of a user for so many but... Eric - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] killing the NR_IRQS arrays.
Eric W. Biederman wrote: Well you shouldn't need to wait just run with a kernel with NR_IRQS = 1024. 1024 is stretch but it isn't to bad. There are already x86 boxes that have more pins on their ioapics then that. So x86_64 and with this latest round of patches from Len Brown and I i386 should be able to support that. Early Xen patches did just that, but there was general criticism about the memory use. And in the paravirt_ops world, a large compile-time static allocation is not really acceptable if its only needed by Xen. But, hey, if you're OK with it I'll submit the patch ;) On the other side 1024 looks extremely limiting for exposing pci devices. If someone gets serious about pushing what is legal with MSI-X you may be in trouble. As a single device is allowed to have 4096 interrupts. Not that I can think of a user for so many but... No, I think we'll burn that bridge when we come to it. J - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] killing the NR_IRQS arrays.
On Friday 16 February 2007 13:10, Eric W. Biederman wrote: To do this I believe will require a s/unsigned int irq/struct irq_desc *irq/ throughout the entire kernel. Getting the arch specific code and the generic kernel infrastructure fixed and ready for that change looks like a pain but pretty doable. We did something like this a few years back on the s390 architecture, which happens to be lucky enough not to share any interrupt based drivers with any of the other architectures. It helped a lot on s390, and I think the change will be beneficial on others as well, e.g. powerpc already uses 'virtual' interrupt numbers to collapse the large (sparse) range of interrupt numbers into 512 unique numbers. This could easily be avoided if there was simply an array of irq_desc structures per interrupt controller. However, I also think we should maintain the old interface, and introduce a new one to deal only with those cases that benefit from it (MSI, Xen, powerpc VIO, ...). This means one subsystem can be converted at a time. I don't think there is a point converting the legacy ISA interrupts to a different interface, as the concept of IRQ numbers is part of the subsystem itself (if you want to call ISA a subsystem...). For PCI, it makes a lot more sense to use something else, considering that PCI interrupts are defined as 'pins' instead of 'lines', and while an interrupt pin is defined per slot, while the line is per bus, in a system with multiple PCI buses, the line is still not necessarily unique. 
One interface I could imagine for PCI devices would be /* generic functions */ int request_irq_desc(struct irq_desc *desc, irq_handler_t handler, unsigned long irqflags, const char *devname, void *dev_id); int free_irq_desc(struct irq_desc *desc, void *dev_id); /* legacy functions */ int request_irq(int irq, irq_handler_t handler, unsigned long irqflags, const char *devname, void *dev_id) { return request_irq_desc(lookup_irq_desc(irq), handler, irqflags, devname, dev_id); } int free_irq(int irq, void *dev_id) { return free_irq_desc(lookup_irq_desc(irq), dev_id); } /* pci specific */ struct irq_desc *pci_request_irq(struct pci_device *dev, int pin, irq_handler_t handler) { struct irq_desc *desc = pci_lookup_irq(dev, pin); int ret; if (!desc) return NULL; ret = request_irq_desc(desc, handler, IRQF_SHARED, dev-dev.bus_id, dev); if (ret 0) return NULL; return desc; } int pci_free_irq(struct pci_device *dev, int pin) { return free_irq_desc(pci_lookup_irq(dev, pin), dev); } Now I don't know enough about MSI yet, but I could imagine that something along these lines would work as well, and we could simply require all drivers that want to support MSI to use the new interfaces. Arnd - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] killing the NR_IRQS arrays.
On Fri, Feb 16, 2007 at 08:45:58PM +0100, Arnd Bergmann wrote: On Friday 16 February 2007 13:10, Eric W. Biederman wrote: To do this I believe will require a s/unsigned int irq/struct irq_desc *irq/ throughout the entire kernel. Getting the arch specific code and the generic kernel infrastructure fixed and ready for that change looks like a pain but pretty doable. We did something like this a few years back on the s390 architecture, which happens to be lucky enough not to share any interrupt based drivers with any of the other architectures.

What you're proposing is looking similar to a proposal I put forward some 4 years ago, but was rejected. Maybe times have changed and there's a need for it now. Message attached.

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of:

---BeginMessage---
Hi,

I've recently received an updated development system from ARM Ltd, which has caused me to become concerned about whether the existing IRQ infrastructure for the ARM architecture is really up to the job of handling the developments which will occur over the 2.6 lifetime.

Essentially we're going to be seeing the emergence of vectored interrupt controllers, and knowing hardware designers, they'll continue the practice of chaining interrupt controllers (which is pretty common on ARM today). I have some hardware here today which has a vectored interrupt controller chained after two non-vectored controllers. This vectored interrupt controller is on an add-on card, and so has no fixed address space and no fixed IRQ numbering.

Rather than having the job of rewriting this code during 2.6, I'd much prefer to get something sorted, even if it is ARM-only, before 2.6. I believe that there are some common problems with the existing API which have been hinted at over the last few days, such as large NR_IRQS. As such, I think it would be a good idea to try to thrash this issue out and get something which everyone is happy with.
Additionally, I've added Alan's "reserve then hook" idea to the API; I seem to remember there is a case in IDE which needs something like this.

Please note that what I am proposing is not to strip out the existing API between now and 2.7; what I am proposing is a structure for 2.7 which can optionally be implemented by architectures, and used in architecture-specific drivers now if they feel they would benefit from it.

Comments? (other than "wtf are you thinking about this so close to 2.6, are you mad" 8))

Linux Interrupt API
===================
Russell King [EMAIL PROTECTED]

The Linux Interrupt API provides a flexible mechanism to handle and control interrupts within the kernel. The design requirements for this API are:

- must have as little overhead as possible for commodity hardware
- must be easy and obvious to use
- must allow complex multi-level interrupt implementations to exist transparently to device drivers
- must be compatible with the existing API

Essentially, this means that implementation of the existing API must be simple.

The API
-------

struct irq {
	/* architecture defined information */
	/* must not be dereferenced by drivers */
	/* eg, x86's irq_desc_t or sparc64's struct ino_bucket */
};

#define NO_IRQ	architecture-defined-int-constant

/**
 * irq_get - increment reference count on the IRQ descriptor
 * @irq: interrupt descriptor
 *
 * IRQ descriptor reference counting is mandatory for
 * implementations which provide dynamically allocated IRQ
 * descriptors. Statically allocated IRQ descriptor
 * implementations may define these to be no-ops.
 */
struct irq *irq_get(struct irq *irq);

/**
 * irq_put - decrement reference count on IRQ descriptor
 * @irq: interrupt descriptor
 *
 * Decrement the reference counter in an IRQ descriptor.
 * If the reference counter drops to zero, the IRQ descriptor
 * will be freed.
 *
 * IRQ descriptor reference counting is mandatory for
 * implementations which provide dynamically allocated IRQ
 * descriptors. Statically allocated IRQ descriptor
 * implementations may define these to be no-ops.
 */
void irq_put(struct irq *irq);

/**
 * irq_disable_nosync - disable an irq without waiting
 * @irq: Interrupt descriptor to disable
 *
 * Disable the selected interrupt line. Disables and Enables are
 * nested.
 * Unlike irq_disable(), this function does not ensure existing
 * instances of the IRQ handler have completed before returning.
 *
 * This function may be called from IRQ context.
 */
void irq_disable_nosync(struct irq *irq);

/**
 * irq_disable - disable an irq and wait for completion
 * @irq: Interrupt descriptor to disable
 *
 * Disable the selected interrupt line. Enables and Disables
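The reference counting and nested-disable semantics described in the irq_get()/irq_put()/irq_disable_nosync() comments can be modelled as follows. This is a single-threaded userspace sketch, not Russell's implementation: irq_alloc(), irq_enable(), the masked flag and irq_freed_count are assumptions added for illustration (irq_enable() is only implied by "Disables and Enables are nested"), and a real version would need locking or atomics.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical dynamically allocated descriptor. */
struct irq {
	int refcount;      /* descriptor is freed when this drops to zero */
	int disable_depth; /* disables and enables nest */
	int masked;        /* stands in for the controller's mask bit */
	unsigned int hwirq;
};

static int irq_freed_count; /* for demonstration only */

/* Hypothetical allocator: the caller owns the initial reference. */
struct irq *irq_alloc(unsigned int hwirq)
{
	struct irq *irq = calloc(1, sizeof(*irq));

	if (irq) {
		irq->refcount = 1;
		irq->hwirq = hwirq;
	}
	return irq;
}

struct irq *irq_get(struct irq *irq)
{
	if (irq)
		irq->refcount++;
	return irq;
}

void irq_put(struct irq *irq)
{
	if (!irq)
		return;
	assert(irq->refcount > 0);
	if (--irq->refcount == 0) {
		free(irq);
		irq_freed_count++;
	}
}

/* Mask on the first disable only; deeper calls just count. */
void irq_disable_nosync(struct irq *irq)
{
	if (irq->disable_depth++ == 0)
		irq->masked = 1;
}

/* Unmask when the last outstanding disable is undone. */
void irq_enable(struct irq *irq)
{
	assert(irq->disable_depth > 0);
	if (--irq->disable_depth == 0)
		irq->masked = 0;
}
```

For statically allocated descriptors, irq_get() and irq_put() would collapse to no-ops as the comments allow, leaving only the nesting logic.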
Re: [RFC] killing the NR_IRQS arrays.
On Friday 16 February 2007 20:52, Russell King wrote: On Fri, Feb 16, 2007 at 08:45:58PM +0100, Arnd Bergmann wrote: We did something like this a few years back on the s390 architecture, which happens to be lucky enough not to share any interrupt based drivers with any of the other architectures. What you're proposing is looking similar to a proposal I put forward some 4 years ago, but was rejected. Maybe times have changed and there's a need for it now.

Yes, I think times have changed, with the increased popularity of MSI and paravirtualized devices.

A few points on your old proposal though:

- Doing it per architecture no longer sounds feasible. I think it would need to be done per subsystem so that the drivers can be adapted to a new interface, and most drivers are used across multiple architectures.
- struct irq sounds much more fitting than struct irq_desc.
- creating new irq_foo() functions to replace foo_irq() also sounds right.
- I don't see the point in splitting request_irq into irq_request and irq_register.
- doing subsystem specific abstractions ideally allows the drivers to not even need to worry about the irq pointer, significantly simplifying the interface for register/unregister.

Arnd
Re: [RFC] killing the NR_IRQS arrays.
On Fri, Feb 16, 2007 at 09:43:24PM +0100, Arnd Bergmann wrote: On Friday 16 February 2007 20:52, Russell King wrote: On Fri, Feb 16, 2007 at 08:45:58PM +0100, Arnd Bergmann wrote: We did something like this a few years back on the s390 architecture, which happens to be lucky enough not to share any interrupt based drivers with any of the other architectures. What you're proposing is looking similar to a proposal I put forward some 4 years ago, but was rejected. Maybe times have changed and there's a need for it now. Yes, I think times have changed, with the increased popularity of MSI and paravirtualized devices. A few points on your old proposal though:

- Doing it per architecture no longer sounds feasible. I think it would need to be done per subsystem so that the drivers can be adapted to a new interface, and most drivers are used across multiple architectures.
- struct irq sounds much more fitting than struct irq_desc.
- creating new irq_foo() functions to replace foo_irq() also sounds right.
- doing subsystem specific abstractions ideally allows the drivers to not even need to worry about the irq pointer, significantly simplifying the interface for register/unregister.

I agree with your points above, except for:

- I don't see the point in splitting request_irq into irq_request and irq_register.

This was to work around those scenarios where you want to mark an IRQ resource as being in use prior to actually using it, in much the same way as is done with IO ports. I've come across hardware where you need to claim the interrupt with the controller masked, configure the device generating the interrupt appropriately, and only then unmask it. Otherwise you end up spinning.

To work around that, we've had to introduce additional flags into the genirq subsystem (IRQF_NOAUTOEN), whereas separating the "obtain" part of request_irq from the "start using" part would've made this unnecessary.
Another example where this (was|still is) used is the IDE code, but that's probably been cleaned up in some way now. There's nothing wrong with keeping a combined request_irq for the common case though.

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of:
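The reserve-then-hook split Russell describes could look roughly like this: reserving the line leaves it masked, the driver configures its device, and only hooking the handler unmasks it, so no extra auto-enable flag is needed. irq_reserve(), irq_hook(), irq_fire(), irq_reserve_and_hook() and this struct irq are hypothetical names in a userspace model (the proposal itself calls the two halves irq_request and irq_register, whose exact semantics are not shown in this thread); shared lines and locking are ignored.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*irq_handler_t)(void *dev_id);

struct irq {
	void *owner;           /* non-NULL once the line has been reserved */
	irq_handler_t handler; /* installed by irq_hook() */
	void *dev_id;
	int masked;            /* 1 while the controller masks the line */
	int pending;           /* asserted while masked; delivered on unmask */
};

/* Claim the line for @owner but keep it masked. */
int irq_reserve(struct irq *irq, void *owner)
{
	if (irq->owner)
		return -1; /* already claimed; sharing is not modelled */
	irq->owner = owner;
	irq->masked = 1;
	return 0;
}

/* Simulate the device asserting the line. */
void irq_fire(struct irq *irq)
{
	if (irq->masked) {
		irq->pending = 1; /* held off by the controller */
		return;
	}
	if (irq->handler)
		irq->handler(irq->dev_id);
}

/* Install the handler, then unmask, delivering anything pending. */
int irq_hook(struct irq *irq, void *owner, irq_handler_t handler, void *dev_id)
{
	if (irq->owner != owner || !handler)
		return -1;
	irq->handler = handler;
	irq->dev_id = dev_id;
	irq->masked = 0;
	if (irq->pending) {
		irq->pending = 0;
		irq_fire(irq);
	}
	return 0;
}

/* Combined call for the common case. */
int irq_reserve_and_hook(struct irq *irq, void *owner, irq_handler_t handler,
			 void *dev_id)
{
	int ret = irq_reserve(irq, owner);

	if (ret)
		return ret;
	ret = irq_hook(irq, owner, handler, dev_id);
	if (ret)
		irq->owner = NULL; /* undo the reservation on failure */
	return ret;
}

/* Example handler: counts deliveries. */
int count_handler(void *dev_id)
{
	(*(int *)dev_id)++;
	return 1;
}
```

Between irq_reserve() and irq_hook() the driver is free to program its device; an interrupt raised in that window stays masked instead of spinning through a handler that is not ready.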
Re: [RFC] killing the NR_IRQS arrays.
On Fri, 2007-02-16 at 05:10 -0700, Eric W. Biederman wrote: Getting the drivers changed actually looks to be pretty straightforward; it will just be a very large mechanical change. We change the type of variables where appropriate and every once in a while introduce an irq_nr(irq) to get the actual irq number for the places that care (ISA or print statements).

Dunno about that irq_nr thingy. If we go that way, I'd be tempted to remove the number completely from the public side of irq_desc... or not.

On powerpc, we have this remapping thingy because we completely separate the linux virtual interrupt domain from the physical numbering domains of each PIC. Your change would turn the linux virtual domain into pointers, removing the need for an array and its associated limitations, which is nice.

So for a given irq_desc / irq virtual number today, I match a pair: a HW number (which is a special typedef, currently defined as an unsigned long) and a pointer to the irq host (which is the entity that defines a HW number domain). That means that you can have multiple hosts, and a given HW number can exist multiple times, once per host.

Do you think the irq_hw_number_t thingy I have should then be generalized and put into the irq_desc? I would need an additional void * pointer to the irq host as well (it's not a 1:1 relationship to an irq chip and needs to be accessed by generic code).

Having the HW number be clearly specific to a domain controller also makes a lot of sense in the embedded field with lots of cascaded interrupt controllers. It avoids having to play all sorts of tricks to assign ranges of numbers to various controllers in the system. Only the local number on a given controller matters; the rest is dynamically assigned.

Another option would be to have the irq_desc be created by the arch and embedded in a larger data structure, in which case the HW number would be part of the private part of that data structure.
Though I suppose that could be a problem with ISA... I suspect that for backward compatibility, we will need to keep something (optionally maybe via CONFIG_*) for ISA/legacy interrupts: a 16-entry irq_desc* array, so we can go from a legacy IRQ number to an irq_desc on platforms that have legacy/ISA crap floating around. On powerpc, what I do is always reserve entries 0...15 of my remapping array in such a way that linux virtual irq 0 is always reserved, and 1...15 are only ever assigned to legacy interrupts if they exist in the system, or left unassigned if they don't.

I think we can make this change fairly smoothly if, before the code is merged into Linus's tree, we have a patchset prepared with all of the core infrastructure changes and a best effort at all of the driver changes. Then early in some merge window we merge the patchset, and fix up the drivers that were missed.

As long as we do things properly and not with a big "DESIGNED FOR x86" hack in the middle that makes it hard for everybody else, I agree.

Cheers,
Ben.
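Ben's last option, an arch-created irq_desc embedded in a larger PIC-private structure, is the usual container_of() pattern. The sketch below is a userspace model with hypothetical names (struct pic_irq, pic_create_irq(), to_pic_irq(), pic_destroy_irq()) and a local container_of() macro built on offsetof; irq_hw_number_t mirrors the powerpc typedef Ben mentions. Generic code only ever sees struct irq_desc, while the (host, HW number) pair stays private, so the same HW number on two hosts is unambiguous.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef unsigned long irq_hw_number_t;

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Generic part handed to drivers and core code: carries no number. */
struct irq_desc {
	unsigned int depth;
};

/* A HW numbering domain (one per PIC, or per cascade). */
struct irq_host {
	const char *name;
};

/* Arch/PIC-private wrapper around the generic descriptor. */
struct pic_irq {
	struct irq_desc desc;  /* what the core hands out */
	struct irq_host *host; /* domain the HW number belongs to */
	irq_hw_number_t hwirq; /* only meaningful within @host */
};

struct irq_desc *pic_create_irq(struct irq_host *host, irq_hw_number_t hwirq)
{
	struct pic_irq *p;

	assert(host != NULL);
	p = calloc(1, sizeof(*p));
	if (!p)
		return NULL;
	p->host = host;
	p->hwirq = hwirq;
	return &p->desc;
}

/* Only PIC/arch code should call this; drivers treat irq_desc as opaque. */
static inline struct pic_irq *to_pic_irq(struct irq_desc *desc)
{
	return container_of(desc, struct pic_irq, desc);
}

void pic_destroy_irq(struct irq_desc *desc)
{
	free(to_pic_irq(desc));
}
```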
Re: [RFC] killing the NR_IRQS arrays.
On Fri, 2007-02-16 at 13:41 +0100, Ingo Molnar wrote: * Eric W. Biederman [EMAIL PROTECTED] wrote: So I propose we remove all assumptions from the code that we actually have an array of irqs. That will allow for irq_desc to be dynamically allocated instead of statically allocated, saving memory and reducing kernel complexity.

hm. I'd suggest to do this without changing request_irq() - and then we could avoid the 'massive, every driver affected' change, right? i.e. because we'll (have to) have an nr_to_desc() and desc_to_nr() mapping facility anyway, let's just not change the driver APIs massively. There don't seem to be that many drivers that assume that irq_desc[] is an array - are there? Otherwise, in terms of the irqchips infrastructure and the API between genirq and the irqchip arch-level drivers, this change makes quite a bit of sense i think. Or am i missing something fundamental?

Well, I don't want to see anything like desc_to_nr / nr_to_desc unless the number in question is a virtual number. That is, there is no way we should go that way and keep passing a HW number through request_irq. That would just be a total nightmare for powerpc and sparc at least.

What we can do is generalize the powerpc virtual irq scheme though. You can see the implementation in arch/powerpc/kernel/irq.c starting from the definition of irq_alloc_host(), though for some stupid reason, I've put all the documentation in include/asm-powerpc/irq.h, so you might want to start there.

Once the IRQ numbers are virtualized, it becomes easier to slowly migrate things to use irq_desc_t * while still having a virtual number available. Once everything has been migrated, we can then get rid of the virtual numbers completely, except maybe for an optional 16-entry array for legacy cruft.

Ben.
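The virtual numbering Ben describes, with virq 0 never handed out and 1...15 kept for legacy/ISA interrupts, can be sketched as a small allocator whose number-to-descriptor lookup only ever accepts virtual numbers. This is a hypothetical userspace model (virq_alloc(), virq_alloc_legacy(), virq_free(), virq_to_desc(), NR_VIRQS and NR_LEGACY_IRQS are invented for illustration), not the code in arch/powerpc/kernel/irq.c; a fixed-size table and a linear search keep it short.

```c
#include <assert.h>
#include <stddef.h>

#define NR_VIRQS 64       /* hypothetical table size */
#define NR_LEGACY_IRQS 16 /* virqs 0..15: 0 never used, 1..15 legacy only */

struct irq_desc {
	int unused; /* contents are irrelevant to the mapping */
};

/* virq -> descriptor; NULL means the slot is free. */
static struct irq_desc *virq_map[NR_VIRQS];

/* Bind a legacy ISA interrupt to its fixed virq (1..15). */
int virq_alloc_legacy(unsigned int isa_irq, struct irq_desc *desc)
{
	assert(desc != NULL); /* NULL is reserved to mean "free" */
	if (isa_irq == 0 || isa_irq >= NR_LEGACY_IRQS || virq_map[isa_irq])
		return -1;
	virq_map[isa_irq] = desc;
	return (int)isa_irq;
}

/* Hand out the lowest free virq above the legacy range. */
int virq_alloc(struct irq_desc *desc)
{
	assert(desc != NULL);
	for (int virq = NR_LEGACY_IRQS; virq < NR_VIRQS; virq++) {
		if (!virq_map[virq]) {
			virq_map[virq] = desc;
			return virq;
		}
	}
	return -1; /* table full */
}

void virq_free(int virq)
{
	if (virq > 0 && virq < NR_VIRQS)
		virq_map[virq] = NULL;
}

/* The only number-to-descriptor mapping: it takes virtual numbers. */
struct irq_desc *virq_to_desc(int virq)
{
	if (virq <= 0 || virq >= NR_VIRQS)
		return NULL;
	return virq_map[virq];
}
```

Once every user holds an irq_desc pointer, the allocated range above NR_LEGACY_IRQS could be dropped and only the legacy slots kept, as Ben suggests.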
Re: [RFC] killing the NR_IRQS arrays.
Rather than having the job of rewriting this code during 2.6, I'd much prefer to get something sorted, even if it is ARM-only, before 2.6. I believe that there are some common problems with the existing API which have been hinted at over the last few days, such as large NR_IRQS. As such, I think it would be a good idea to try to thrash this issue out and get something which everyone is happy with. Additionally, I've added Alan's "reserve then hook" idea to the API; I seem to remember there is a case in IDE which needs something like this.

You might want to have a look at the powerpc API with its remapping capabilities. It's very nice for handling multiple domain spaces. It might be of some use for you.

I like your proposed API; I think that's where we want to go in the long run.

Ben.
Re: [RFC] killing the NR_IRQS arrays.
On Friday 16 February 2007 23:37, Benjamin Herrenschmidt wrote: You might want to have a look at the powerpc API with its remapping capabilities. It's very nice for handling multiple domain spaces. It might be of some use for you.

I don't consider the powerpc virtual IRQs a solution for the problem. While I believe you did the right thing for powerpc by generalizing this over all its platforms, it really isn't more than a workaround for the problem that we can't deal well with the static irq_desc array. Now that the problem is getting worse on other architectures, we should try to get it right on all of them, rather than spreading the workaround further.

Arnd
Re: [RFC] killing the NR_IRQS arrays.
On Sat, 2007-02-17 at 02:37 +0100, Arnd Bergmann wrote: On Friday 16 February 2007 23:37, Benjamin Herrenschmidt wrote: You might want to have a look at the powerpc API with its remapping capabilities. It's very nice for handling multiple domain spaces. It might be of some use for you. I don't consider the powerpc virtual IRQs a solution for the problem. While I believe you did the right thing for powerpc by generalizing this over all its platforms, it really isn't more than a workaround for the problem that we can't deal well with the static irq_desc array.

It's not a solution per se, though it contains elements of a solution, like the reverse mapping, which I use to map HW numbers to virtual irqs but which can trivially be adapted to map HW numbers to irq_desc pointers.

Among other things, I want to make sure that we don't end up just putting an irq number in a field of the irq_desc, with half of the drivers peeking at it and assuming we can convert between irq_desc* and number in arbitrary ways. The HW irq number should be as opaque as possible to the world outside of the PIC code and/or arch code that assigns them. That's an area where the powerpc and/or sparc code might be of use.

Now that the problem is getting worse on other architectures, we should try to get it right on all of them, rather than spreading the workaround further.

Yes, but I'd like aspects of my remapping work to be included in whatever we come up with, which is to have the new irq_desc either hide the underlying HW number, or at least make it very clear that it's an opaque token and not guaranteed to be unique across multiple PICs in the system.

In addition, if we remove the numbers, archs will need basically the exact same services provided by the powerpc irq core for reverse mapping (going from a HW irq number on a given PIC back to an irq_desc *), either using a linear array for simple PICs or a radix tree for platforms with very big interrupt numbers (BTW.
I think we have lockless radix trees nowadays, so I can remove the spinlocks protecting it in the powerpc remapper).

Ben.
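The two reverse-mapping flavours Ben mentions, going from a HW number on one PIC back to an irq_desc *, might look like this. The linear map is a plain array indexed by HW number; for sparse, very large HW numbers this sketch substitutes a fixed two-level radix table (12 bits per level, so 24-bit HW numbers) for the kernel's general radix tree. All names are hypothetical, and no locking or RCU is shown.

```c
#include <assert.h>
#include <stdlib.h>

typedef unsigned long irq_hw_number_t;

struct irq_desc {
	int unused;
};

/* Linear reverse map: fine for PICs with a small, dense number space. */
struct linear_revmap {
	unsigned long size;
	struct irq_desc **map;
};

int linear_revmap_init(struct linear_revmap *r, unsigned long size)
{
	assert(size > 0);
	r->size = size;
	r->map = calloc(size, sizeof(*r->map));
	return r->map ? 0 : -1;
}

int linear_revmap_set(struct linear_revmap *r, irq_hw_number_t hw,
		      struct irq_desc *desc)
{
	if (hw >= r->size)
		return -1;
	r->map[hw] = desc;
	return 0;
}

struct irq_desc *linear_revmap_find(const struct linear_revmap *r,
				    irq_hw_number_t hw)
{
	return hw < r->size ? r->map[hw] : NULL;
}

/* Sparse reverse map: leaves are only allocated for ranges in use. */
#define LVL_BITS 12
#define LVL_SIZE (1UL << LVL_BITS)
#define LVL_MASK (LVL_SIZE - 1)

struct sparse_revmap {
	struct irq_desc **leaf[LVL_SIZE];
};

struct sparse_revmap *sparse_revmap_create(void)
{
	return calloc(1, sizeof(struct sparse_revmap));
}

int sparse_revmap_set(struct sparse_revmap *r, irq_hw_number_t hw,
		      struct irq_desc *desc)
{
	unsigned long hi = hw >> LVL_BITS, lo = hw & LVL_MASK;

	if (hi >= LVL_SIZE)
		return -1; /* beyond 24 bits in this sketch */
	if (!r->leaf[hi]) {
		r->leaf[hi] = calloc(LVL_SIZE, sizeof(*r->leaf[hi]));
		if (!r->leaf[hi])
			return -1;
	}
	r->leaf[hi][lo] = desc;
	return 0;
}

struct irq_desc *sparse_revmap_find(const struct sparse_revmap *r,
				    irq_hw_number_t hw)
{
	unsigned long hi = hw >> LVL_BITS, lo = hw & LVL_MASK;

	if (hi >= LVL_SIZE || !r->leaf[hi])
		return NULL;
	return r->leaf[hi][lo];
}

void sparse_revmap_destroy(struct sparse_revmap *r)
{
	for (unsigned long i = 0; i < LVL_SIZE; i++)
		free(r->leaf[i]);
	free(r);
}
```

The linear form costs one pointer per possible HW number; the two-level form only pays for the 4096-entry leaves that actually contain mappings, which is the property that makes a radix tree attractive for huge, sparse interrupt numbers.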