On 2014/12/13 4:35, Thomas Gleixner wrote:
> Folks,
>
> after mulling this in my head for quite some time, I'm going to
> postpone the whole thing for 3.20.
>
> That said, I need to say, that I'm really happy with the outcome of
> this massive overhaul. I really want to thank all involved people,
> especially Jiang, for their great work and help so far!!!
>
> The hierarchical irq domains really improve the code by disentangling
> the various subsystems and the arm[64] use cases just prove that it
> was the right decision.
>
> We're almost there with x86 but my gut feeling tells me that pushing
> it now is too risky. I rather prefer quiet holidays for all of us than
> the nagging fear that the post holiday inbox will be full of obscure
> bug reports and we then start a chase and bandaid race which will kill
> the well earned recreation in an instant.

Hi Thomas,
	It's safer to let it mature for another merge window in the
tip tree :)
>
> This will block other things in that area for a while, but it's the
> only sane decision at the moment, unless Linus insists on pulling the
> lot and promises to deal with the fallout. :)
>
> The reasons why I decided to do so are:
>
>   - The bugs we found in the last week. That tells me that there is
>     some more stuff lurking.
>
>   - The already existing mess in some areas which got unearthed by
>     this work in the last week. That definitely needs a thorough
>     cleanup and not some more bandaids.
>
>   - Lack of proper debugging features. Sending out per issue debug
>     patches simply does not scale.
>
>   - It's not bisectable and unfortunately there are too many fixes to
>     various places to make manual bisection feasible.
>
> For 3.20 I want to proceed in the following way:
>
>   - Apply all bug fixes to x86/apic
>
>   - Address the issues with the resource management (and elsewhere)
>     proper on top
>
>   - Add a proper debugging mechanism (the existing irqdomain debugfs
>     interface is completely useless).
>
> For the hierarchical domains we really want two things:
>
>   1) A debugfs interface which lets us introspect the hierarchy.
>
>      I was working on that before I got dragged into bug chasing and
>      merge window frenzy.
>
>      For proper introspection down to the hardware level this
>      requires either domain/irq_chip specific callbacks or some
>      unified way to track the current state. The latter is painful as
>      it requires storing information redundantly.
>
>      So having domain/chip callbacks to retrieve the state is the
>      right solution. Most chip/domain implementations cache their
>      [hardware] state already, so providing an accessor to convert
>      that into a common data format is the best way. If the callback
>      is not implemented then the information is not available or
>      maybe not relevant.
>
>      I'm not going to have a per domain/chip seqfile print function
>      as this is just a complete waste.
>      Pretty printing obscure hardware information does not help much
>      for the general user. We rather have the raw data and proper
>      post processing tools which can provide that pretty print
>      information than bloating the kernel binary with randomized and
>      possibly useless seq_print functions.
>
>      Another reason why I want just raw binary data is that I want to
>      use exactly the same mechanism for tracing. See below.
>
>      After looking at the various new domain/chip implementations it's
>      sufficient to have 16 bytes of storage space for this, but
>      that's a minor detail.
>
>      To provide a proper translation into pretty printed values we
>      can do the following:
>
>      Create a new section for storing such data and have a data
>      structure there which describes the content of the buffer. That
>      section goes into a separate file and is not linked into the
>      kernel binary. Simple enough for tools to pick up and for bug
>      reporters to use/provide. If the stupid file is not available
>      we still can recreate it from source and translate the hex
>      dump. And in most cases the pure hexdump will be sufficient
>      for the people who actually need to look at this.
>
>   2) Proper trace point support so we can actually track allocation
>      and the hardware access at the various domain levels because
>      some of these issues cannot be decoded by looking at a state
>      snapshot in debugfs. With some of them we even can't access
>      debugfs at all.
>
>      Though one issue with that is, that for the early boot process
>      there is no way to store that information as the tracer gets
>      enabled way after init_IRQ(). But there is no reason why the
>      tracer could not be enabled before that. All it needs is a
>      working memory allocator. Steven?
>
>      Now there is another class of problems which might be hard to
>      debug: when the machine just boots into a hang, we don't get
>      ftrace output from either an oops or a console.
>      It would be nice if we could have a command line option which
>      prints enabled trace points via (early_)printk. That would avoid
>      sending out ad hoc printk debug patches which will basically
>      provide the same information as the trace_points. That would be
>      useful for other hard to debug boot hangs as well. Steven?
>
> I think the above can be solved, so we need to agree on a proper
> set of tracepoints. I came up with the following list:
>
>   - trace_irqdomain_create(domain->id, domain->name, ...)
>   - trace_irqdomain_destroy(domain->id)
>
>   - trace_irqdomain_alloc(irq_data)
>
>     struct irq_data contains all relevant information for
>     assigning the tracepoint data.
>
>       __entry->virq = irq_data->virq;
>       __entry->domainid = irq_data->domain;
>       __entry->hwirq = irq_data->hwirq;
>       TP_STORE_DATA(__entry->data, irq_data);
>
>     Where TP_STORE_DATA checks for the above callback and uses it
>     if available, otherwise we just clear the data field.
>
>     So this reuses the callback which we want for debugfs
>     anyway. The print format is just hexdump. See my above
>     rationale for that.
>
>   - trace_irqdomain_free(virq, domain->id)
>
>   - trace_irqdomain_hw_access(irqdata)
>
>     Same "data" and pretty printing argument as for
>     trace_irqdomain_alloc()
>
>     The obvious place to put such a trace point is
>     e.g. irq_chip_write_msi_msg() where the callback records the
>     currently written msi msg.
>
> Once we have sorted that, I'll push x86/apic into a separate git
> repository so the history is preserved.
>
> After that I'll redo x86/apic from scratch with proper ordering and
> all fixes folded to the right places so the whole thing becomes
> bisectable.
>
> Thoughts?

This really sounds like a good idea for debugging interrupts.
So I will work on the following items for 3.20:
1) Continue to convert PCI MSI code into generic MSI code as much as
   possible.
2) Simplify interrupt remapping initialization on x86; the first
   version has been posted at: https://lkml.org/lkml/2014/12/10/20
3) Fix new bugs, if any :)

Thanks!
Gerry

>
> Thanks,
>
> Thomas
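
To make the quoted TP_STORE_DATA idea concrete, here is a minimal userspace sketch of it, not kernel code. It only models what the mail describes: an optional per chip/domain accessor that converts cached state into a fixed 16 byte raw buffer, a store helper that clears the buffer when no accessor exists, and a plain hexdump as the print format. Every name below (the sketch_* structs and functions, the chip_cache pointer, the integer domain_id) is hypothetical and made up for illustration; the little-endian packing of the MSI message is also just an assumption so the dump looks the same on every host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* "16 bytes of storage space" as suggested in the mail. */
#define SKETCH_STATE_BYTES 16

struct sketch_irq_data;

/* Hypothetical accessor: copy cached hardware state into buf (16 bytes). */
typedef void (*sketch_get_state_fn)(const struct sketch_irq_data *d,
                                    uint8_t *buf);

/* Hypothetical stand-in for struct irq_data; not the kernel's layout. */
struct sketch_irq_data {
    unsigned int virq;
    unsigned long hwirq;
    int domain_id;
    sketch_get_state_fn get_state; /* NULL when not implemented */
    const void *chip_cache;        /* chip-private cached state */
};

/* Models the tracepoint entry fields listed in the mail. */
struct sketch_trace_entry {
    unsigned int virq;
    int domainid;
    unsigned long hwirq;
    uint8_t data[SKETCH_STATE_BYTES];
};

/* Models TP_STORE_DATA: use the callback if present, else leave zeros. */
void sketch_store_data(uint8_t *data, const struct sketch_irq_data *d)
{
    memset(data, 0, SKETCH_STATE_BYTES);
    if (d->get_state)
        d->get_state(d, data);
}

/* Models the assignments of trace_irqdomain_alloc(irq_data). */
void sketch_trace_alloc(struct sketch_trace_entry *entry,
                        const struct sketch_irq_data *d)
{
    assert(entry != NULL && d != NULL);
    entry->virq = d->virq;
    entry->domainid = d->domain_id;
    entry->hwirq = d->hwirq;
    sketch_store_data(entry->data, d);
}

/* Print format is just a hexdump; returns -1 if out is too small. */
int sketch_hexdump(const uint8_t *data, char *out, size_t len)
{
    if (len < 2 * SKETCH_STATE_BYTES + 1)
        return -1;
    for (size_t i = 0; i < SKETCH_STATE_BYTES; i++)
        snprintf(out + 2 * i, 3, "%02x", (unsigned int)data[i]);
    return 0;
}

/* Example chip state: a cached MSI message (hypothetical layout). */
struct sketch_msi_msg {
    uint32_t address_lo;
    uint32_t address_hi;
    uint32_t data;
};

/* Store a 32-bit value byte by byte, least significant byte first. */
static void put_le32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* Accessor for the example chip: 12 bytes used, the rest stays zero. */
void sketch_msi_get_state(const struct sketch_irq_data *d, uint8_t *buf)
{
    const struct sketch_msi_msg *msg = d->chip_cache;

    put_le32(buf, msg->address_lo);
    put_le32(buf + 4, msg->address_hi);
    put_le32(buf + 8, msg->data);
}
```

An irq whose chip provides no accessor dumps as 32 zero digits, while one using sketch_msi_get_state() dumps the cached message bytes. A post processing tool would need a description of the buffer layout to pretty print it, which is the role of the separate section file in the proposal.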