Re: [RFC PATCH v2 0/1] of: easier debugging for node life cycle issues
On Thu, 25 Jan 2018 16:46:06 -0800 Frank Rowandwrote: > The point is that using ftrace means there are use cases for the > debug information where the information will not be available. Note, this email came out when I was traveling. I'm now looking at the code and trace events are enabled right after rcu_init() in init/main.c. The early_initcall() enables syscall events(), as they are not available earlier. -- Steve
Re: [RFC PATCH v2 0/1] of: easier debugging for node life cycle issues
On 01/25/18 15:53, Tyrel Datwyler wrote: > On 01/25/2018 01:49 PM, Frank Rowand wrote: >> Hi Wolfram, >> >> On 01/25/18 03:03, Steven Rostedt wrote: >>> On Wed, 24 Jan 2018 22:55:13 -0800 >>> Frank Rowandwrote: >>> Hi Steve, >>> Off the top of your head, can you tell me know early in the boot process a trace_event can be called and successfully provide the data to someone trying to debug early boot issues? >>> >>> The trace events are enabled by early_initcall(). >> >> < snip > >> >> This means that ftrace can not be used for the of_node_get(), >> of_node_put(), and of_node_release() debug info, because >> these functions are called before early_initcall(). Please >> use pr_debug() for these functions. > > I would argue that early boot debugging doesn't completely negate the > usefulness of this tracing infrastructure. I did not say or imply that it did. I am pointing out that this implementation does not meet the needs of other use cases. And potentially provides misleading information (or more precisely misleading lack of information) in some other use cases. > I get that no information > is available in the trace up until ftrace is setup by its > early_initcall, but I still found issues after early boot using this > patch and I would hope that it would be somewhat obvious if > references are out of whack once the ftrace data becomes available. > In the dynamic case on Power we often do reconfig well after boot on > live systems which produces a lot of reference put/gets. This patch > made it easy to identify several reference leaks and underflows in > our attach and detach logic with the added aid of being able to turn > on the stacktrace for each call in the ftrace data. Yes, you can get stacktraces relatively easily. This is the strongest argument for using ftrace. My assumption has been that the stack trace is useful for of_node_get() and of_node_put(). Is there _large_ value to the stack trace for of_reconfig_notify()? > Another thought is it would be nice if we could have the best of both > worlds such that the tracepoints were pr_debugs up until the ftrace > early_initcall. Or, I suppose we could ifdef it and make the ftrace > tracepoints a configuration option, such that if it wasn't configured > we implement the tracepoint functions as pr_debugs. This makes early > boot an option. Just spit balling ideas. An overly complex solution. This is just debug. Worst case alternative is that the patches live on, out of tree. So nope. > > -Tyrel > >> >> As far as I know, the of_reconfig_notify() could remain an >> ftrace instrumented function. But now that the only thing >> that would be ftrace instrumented is of_reconfig_notify(), >> I don't see a strong justification for changing the existing >> pr_debug() calls to an ftrace alternative. Though I suspect >> the original author of the patch still might desire to have >> the "#ifdef DEBUG" surrounding the pr_debug() calls removed >> since one of his issues was having to recompile his kernel >> to do his debugging. >> >> -Frank >> > >
Re: [RFC PATCH v2 0/1] of: easier debugging for node life cycle issues
On 01/25/18 15:14, Wolfram Sang wrote: > >> This means that ftrace can not be used for the of_node_get(), >> of_node_put(), and of_node_release() debug info, because >> these functions are called before early_initcall(). > > For the record: You can still unbind/bind devices. This is how I > debugged an issue. I wasn't implying that the data wasn't usable for any use case. The point is that using ftrace means there are use cases for the debug information where the information will not be available.
Re: [RFC PATCH v2 0/1] of: easier debugging for node life cycle issues
On 01/25/18 15:12, Wolfram Sang wrote: > Frank, > > here seems to be a misunderstanding going on. I don't want to push this > patch upstream against all odds. I merely wanted to find out what the > status of this patch is. Because one possibility was that it had just > been forgotten... > >>> So, I thought reposting would be a good way of finding out if your >>> concerns were addressed in the discussion or not. If I overlooked Marking a patch as RFC (as you mention just below here) is very different than explicitly stating something like: Frank, you had concerns with version 1, were your concerns addressed in the thread about version 1? You mention below that adding RFC to the title was providing the information that I needed. I am saying that the communication was not clear, was implied at best, and should have be more explicit. I actually didn't even notice that the patch title had changed from not an RFC, to being an RFC, so the subtle clue went right over my head. I just treated it as I would any RFC patch. >> Then you should have stated that there were concerns raised in the >> discussion and asked me if my concerns were addressed. > > From my perspective, I have done that: > > I marked the patch as RFC. I put you on the CC list. I asked about the > possibility of applying it. It was not very elaborate, but hey, this is > just a simple debugging patch :) After reading through the original patch thread, I did not think that the issues raised had been fully addressed. You read the thread and thought they had. No big deal on coming to different conclusions. I think you and I are talking past each other a little bit. My comments in the email you are responding to are because I don't think that the previous emails have been as clear as you think. I could read between the lines and see how you think that you were being clear. But from my perspective, I'm asking for more explicit statements and less implied statements. My first real response (the response that pointed out that Rob had made an observation / suggestion that was not responded to) was coming from the perspective that the issues in the first thread had not been fully addressed. As I was writing that response, I felt that I might as well do a review, even though I thought the previous thread was dangling. Which led to a lot of issues and a few more emails pointing them out. > I totally would have accepted a high level "No, that won't fly > because...". Or a high level "This and that would need a change". Or > something like that. I intentionally sent this out as RFC because I know > there is some testing missing. I wanted to know if it is worth taking > further steps with this patch. > > So, I simply wanted to know if you (still) have fundamental issues with > the patch? It would have been good if you had simply stated so in exactly those words. I did not read the original email as saying that. I read the original email as saying (my paraphrase) "please apply it". You did _not_ use the words that I paraphrased, you actually said "So what about applying it?". I misunderstood what you were trying to say. I apologize for that. > That needs to be discussed first before we go into coding > details. I think it is fine to not apply it if there are reasons. I'd > like to know them, however, for a better understanding. Too late now. :-) I've already done the reviewing and provided all of the reasons that are show stoppers for the patch and how to fix. One thing that I did not mention in this thread is that I have an aversion to using ftrace for what is purely debugging data (which this is) because there is a danger that trace points become user space ABI. That is a whole long discussion that we do not have to have because I am not subjecting this patch to that objection. > For me, this is a super-super-side project, so if it causes too much > hazzle, I just leave it here and know interested people can find it > easier now. But if it could be applied with a sane amount of effort, I > was offering that. > > Was that understandable? I think so. And in return? We can always talk more at the next conference if you want. -Frank > > Kind regards, > >Wolfram >
Re: [RFC PATCH v2 0/1] of: easier debugging for node life cycle issues
On 01/25/2018 01:49 PM, Frank Rowand wrote: > Hi Wolfram, > > On 01/25/18 03:03, Steven Rostedt wrote: >> On Wed, 24 Jan 2018 22:55:13 -0800 >> Frank Rowandwrote: >> >>> Hi Steve, >> >>> >>> Off the top of your head, can you tell me know early in the boot >>> process a trace_event can be called and successfully provide the >>> data to someone trying to debug early boot issues? >> >> The trace events are enabled by early_initcall(). > > < snip > > > This means that ftrace can not be used for the of_node_get(), > of_node_put(), and of_node_release() debug info, because > these functions are called before early_initcall(). Please > use pr_debug() for these functions. I would argue that early boot debugging doesn't completely negate the usefulness of this tracing infrastructure. I get that no information is available in the trace up until ftrace is setup by its early_initcall, but I still found issues after early boot using this patch and I would hope that it would be somewhat obvious if references are out of whack once the ftrace data becomes available. In the dynamic case on Power we often do reconfig well after boot on live systems which produces a lot of reference put/gets. This patch made it easy to identify several reference leaks and underflows in our attach and detach logic with the added aid of being able to turn on the stacktrace for each call in the ftrace data. Another thought is it would be nice if we could have the best of both worlds such that the tracepoints were pr_debugs up until the ftrace early_initcall. Or, I suppose we could ifdef it and make the ftrace tracepoints a configuration option, such that if it wasn't configured we implement the tracepoint functions as pr_debugs. This makes early boot an option. Just spit balling ideas. -Tyrel > > As far as I know, the of_reconfig_notify() could remain an > ftrace instrumented function. But now that the only thing > that would be ftrace instrumented is of_reconfig_notify(), > I don't see a strong justification for changing the existing > pr_debug() calls to an ftrace alternative. Though I suspect > the original author of the patch still might desire to have > the "#ifdef DEBUG" surrounding the pr_debug() calls removed > since one of his issues was having to recompile his kernel > to do his debugging. > > -Frank >
Re: [RFC PATCH v2 0/1] of: easier debugging for node life cycle issues
> This means that ftrace can not be used for the of_node_get(), > of_node_put(), and of_node_release() debug info, because > these functions are called before early_initcall(). For the record: You can still unbind/bind devices. This is how I debugged an issue. signature.asc Description: PGP signature
Re: [RFC PATCH v2 0/1] of: easier debugging for node life cycle issues
Frank, here seems to be a misunderstanding going on. I don't want to push this patch upstream against all odds. I merely wanted to find out what the status of this patch is. Because one possibility was that it had just been forgotten... > > So, I thought reposting would be a good way of finding out if your > > concerns were addressed in the discussion or not. If I overlooked > > Then you should have stated that there were concerns raised in the > discussion and asked me if my concerns were addressed. From my perspective, I have done that: I marked the patch as RFC. I put you on the CC list. I asked about the possibility of applying it. It was not very elaborate, but hey, this is just a simple debugging patch :) I totally would have accepted a high level "No, that won't fly because...". Or a high level "This and that would need a change". Or something like that. I intentionally sent this out as RFC because I know there is some testing missing. I wanted to know if it is worth taking further steps with this patch. So, I simply wanted to know if you (still) have fundamental issues with the patch? That needs to be discussed first before we go into coding details. I think it is fine to not apply it if there are reasons. I'd like to know them, however, for a better understanding. For me, this is a super-super-side project, so if it causes too much hazzle, I just leave it here and know interested people can find it easier now. But if it could be applied with a sane amount of effort, I was offering that. Was that understandable? Kind regards, Wolfram signature.asc Description: PGP signature
Re: [RFC PATCH v2 0/1] of: easier debugging for node life cycle issues
Hi Wolfram, On 01/25/18 03:03, Steven Rostedt wrote: > On Wed, 24 Jan 2018 22:55:13 -0800 > Frank Rowandwrote: > >> Hi Steve, > >> >> Off the top of your head, can you tell me know early in the boot >> process a trace_event can be called and successfully provide the >> data to someone trying to debug early boot issues? > > The trace events are enabled by early_initcall(). < snip > This means that ftrace can not be used for the of_node_get(), of_node_put(), and of_node_release() debug info, because these functions are called before early_initcall(). Please use pr_debug() for these functions. As far as I know, the of_reconfig_notify() could remain an ftrace instrumented function. But now that the only thing that would be ftrace instrumented is of_reconfig_notify(), I don't see a strong justification for changing the existing pr_debug() calls to an ftrace alternative. Though I suspect the original author of the patch still might desire to have the "#ifdef DEBUG" surrounding the pr_debug() calls removed since one of his issues was having to recompile his kernel to do his debugging. -Frank
Re: [RFC PATCH v2 0/1] of: easier debugging for node life cycle issues
On Wed, 24 Jan 2018 22:55:13 -0800 Frank Rowandwrote: > Hi Steve, > > Off the top of your head, can you tell me know early in the boot > process a trace_event can be called and successfully provide the > data to someone trying to debug early boot issues? The trace events are enabled by early_initcall(). > > Also, way back when version 1 of this patch was being discussed, > a question about stacktrace triggers: > > >>> # echo stacktrace > /sys/kernel/debug/tracing/trace_options > >>> # cat trace | grep -A6 "/pci@8002018" > >> > >> Just to let you know that there is now stacktrace event triggers, > where > >> you don't need to stacktrace all events, you can pick and choose. And > >> even filter the stack trace on specific fields of the event. > > > > This is great, and I did figure that out this afternoon. One thing I > was > > still trying to determine though was whether its possible to set these > > triggers at boot? As far as I could tell I'm still limited to > > "trace_options=stacktrace" as a kernel boot parameter to get the stack > > for event tracepoints. > > No not yet. But I'll add that to the todo list. > > Thanks, > > -- Steve > > Is this still on your todo list, or is it now available? Still on the todo list. As someone once said, there really is no todo list, but just work on the stuff that people are yelling the loudest about. Yell louder, and it will get done quicker ;-) -- Steve
Re: [RFC PATCH v2 0/1] of: easier debugging for node life cycle issues
Hi Steve, On 01/21/18 06:31, Wolfram Sang wrote: > I got a bug report for a DT node refcounting problem in the I2C subsystem. > This > patch was a huge help in validating the bug report and the proposed solution. > So, I thought I bring it to attention again. Thanks Tyrel, for the initial > work! > > Note that I did not test the dynamic updates, only of_node_{get|put} so far. I > read that Tyrel checked dynamic updates extensively with this patch. And since > DT overlays are also used within our Renesas dev team, this will help there, > as > well. > > Tested on a Renesas Salvator-XS board (R-Car H3). > > Changes since RFC v1: > * rebased to v4.15-rc8 > * fixed commit abbrev and one of the sysfs paths in commit desc > * removed trailing space and fixed pointer declaration in code > > I consider all the remaining checkpatch issues irrelevant for this patch. > > So what about applying it? > > Kind regards, > >Wolfram > > > Tyrel Datwyler (1): > of: introduce event tracepoints for dynamic device_node lifecyle > > drivers/of/dynamic.c | 32 ++-- > include/trace/events/of.h | 93 > +++ > 2 files changed, 105 insertions(+), 20 deletions(-) > create mode 100644 include/trace/events/of.h > Off the top of your head, can you tell me know early in the boot process a trace_event can be called and successfully provide the data to someone trying to debug early boot issues? Also, way back when version 1 of this patch was being discussed, a question about stacktrace triggers: >>> # echo stacktrace > /sys/kernel/debug/tracing/trace_options >>> # cat trace | grep -A6 "/pci@8002018" >> >> Just to let you know that there is now stacktrace event triggers, where >> you don't need to stacktrace all events, you can pick and choose. And >> even filter the stack trace on specific fields of the event. > > This is great, and I did figure that out this afternoon. One thing I was > still trying to determine though was whether its possible to set these > triggers at boot? As far as I could tell I'm still limited to > "trace_options=stacktrace" as a kernel boot parameter to get the stack > for event tracepoints. No not yet. But I'll add that to the todo list. Thanks, -- Steve Is this still on your todo list, or is it now available? Thanks, Frank
Re: [RFC PATCH v2 0/1] of: easier debugging for node life cycle issues
On 01/21/18 06:31, Wolfram Sang wrote: > I got a bug report for a DT node refcounting problem in the I2C subsystem. > This > patch was a huge help in validating the bug report and the proposed solution. > So, I thought I bring it to attention again. Thanks Tyrel, for the initial > work! > > Note that I did not test the dynamic updates, only of_node_{get|put} so far. I > read that Tyrel checked dynamic updates extensively with this patch. And since > DT overlays are also used within our Renesas dev team, this will help there, > as > well. It's been nine months since version 1. If you are going to include the dynamic updates part of the patch then please test them. > Tested on a Renesas Salvator-XS board (R-Car H3). > > Changes since RFC v1: > * rebased to v4.15-rc8 > * fixed commit abbrev and one of the sysfs paths in commit desc > * removed trailing space and fixed pointer declaration in code > > I consider all the remaining checkpatch issues irrelevant for this patch. I am OK with the line length warnings in this patch. Why can't the macro error be fixed? A file entry needs to be added to MAINTAINERS. > > So what about applying it? > > Kind regards, > >Wolfram > > > Tyrel Datwyler (1): > of: introduce event tracepoints for dynamic device_node lifecyle > > drivers/of/dynamic.c | 32 ++-- > include/trace/events/of.h | 93 > +++ > 2 files changed, 105 insertions(+), 20 deletions(-) > create mode 100644 include/trace/events/of.h >
Re: [RFC PATCH v2 0/1] of: easier debugging for node life cycle issues
On 01/22/18 03:49, Wolfram Sang wrote: > Hi Frank, > >> Please go back and read the thread for version 1. Simply resubmitting a >> forward port is ignoring that whole conversation. >> >> There is a lot of good info in that thread. I certainly learned stuff in it. > > Yes, I did that and learned stuff, too. My summary of the discussion was: > > - you mentioned some drawbacks you saw (like the mixture of trace output > and printk output)> - most of them look like addressed to me? (e.g. Steven > showed a way to redirect > printk to trace > - you posted your version (which was, however, marked as "not user friendly" > even by yourself) Not exactly a fair quoting. There were two things I said: "Here is a patch that I have used. It is not as user friendly in terms of human readable stack traces (though a very small user space program should be able to fix that)." So easy to fix using existing userspace programs to convert kernel addresses to symbols. "FIXME: Currently using pr_err() so I don't need to set loglevel on boot. So obviously not a user friendly tool!!! The process is: - apply patch - configure, build, boot kernel - analyze data - remove patch" So not friendly because it uses pr_err() instead of pr_debug(). In a reply I said if I submitted my patches I would change it to use pr_debug() instead. So not an issue. And not user friendly because it requires patching the kernel. Again a NOP if I submitted my patch, because the patch would already be in the kernel. But whatever, let's ignore that - a poor quoting is not a reason to reject this version of the patch. > - The discussion stalled over having two approaches Then you should have stated such when you resubmitted. > So, I thought reposting would be a good way of finding out if your > concerns were addressed in the discussion or not. If I overlooked Then you should have stated that there were concerns raised in the discussion and asked me if my concerns were addressed. > something, I am sorry for that. Still, my intention is to continue the > discussion, not to ignore it. Because as it stands, we don't have such a > debugging mechanism in place currently, and with people working with DT > overlays, I'd think it would be nice to have. > > Kind regards, > >Wolfram > Rob suggested: > > @@ -25,8 +28,10 @@ > */ > struct device_node *of_node_get(struct device_node *node) > { > - if (node) > + if (node) { > kobject_get(>kobj); > + trace_of_node_get(refcount_read(>kobj.kref.refcount), node->full_name); Seems like there should be a kobj wrapper to read the refcount. As far as I noticed, that was never addressed. I don't know the answer, but the question was asked. And if there is no such function, then there is at least kref_read(), which would improve the code a little bit. I'll reply to the patch 0/1 and patch 1/1 emails with review comments. -Frank
Re: [RFC PATCH v2 0/1] of: easier debugging for node life cycle issues
On 01/23/18 04:11, Michael Ellerman wrote: > Wolfram Sangwrites: > >> Hi Frank, >> >>> Please go back and read the thread for version 1. Simply resubmitting a >>> forward port is ignoring that whole conversation. >>> >>> There is a lot of good info in that thread. I certainly learned stuff in >>> it. >> >> Yes, I did that and learned stuff, too. My summary of the discussion was: >> >> - you mentioned some drawbacks you saw (like the mixture of trace output >> and printk output) >> - most of them look like addressed to me? (e.g. Steven showed a way to >> redirect >> printk to trace) >> - you posted your version (which was, however, marked as "not user friendly" >> even by yourself) >> - The discussion stalled over having two approaches >> >> So, I thought reposting would be a good way of finding out if your >> concerns were addressed in the discussion or not. If I overlooked >> something, I am sorry for that. Still, my intention is to continue the >> discussion, not to ignore it. Because as it stands, we don't have such a >> debugging mechanism in place currently, and with people working with DT >> overlays, I'd think it would be nice to have. > > Yeah I agree with all of that, I didn't think there were really any > concerns left outstanding. These trace points are very useful, I've > twice added them to a kernel to debug something, so it would be great > for them to be in mainline. > > cheers > Yes, I believe there are concerns outstanding. I'll try to read through the whole thread today to make sure I'm not missing anything. -Frank
Re: [RFC PATCH v2 0/1] of: easier debugging for node life cycle issues
Wolfram Sangwrites: > Hi Frank, > >> Please go back and read the thread for version 1. Simply resubmitting a >> forward port is ignoring that whole conversation. >> >> There is a lot of good info in that thread. I certainly learned stuff in it. > > Yes, I did that and learned stuff, too. My summary of the discussion was: > > - you mentioned some drawbacks you saw (like the mixture of trace output > and printk output) > - most of them look like addressed to me? (e.g. Steven showed a way to > redirect > printk to trace) > - you posted your version (which was, however, marked as "not user friendly" > even by yourself) > - The discussion stalled over having two approaches > > So, I thought reposting would be a good way of finding out if your > concerns were addressed in the discussion or not. If I overlooked > something, I am sorry for that. Still, my intention is to continue the > discussion, not to ignore it. Because as it stands, we don't have such a > debugging mechanism in place currently, and with people working with DT > overlays, I'd think it would be nice to have. Yeah I agree with all of that, I didn't think there were really any concerns left outstanding. These trace points are very useful, I've twice added them to a kernel to debug something, so it would be great for them to be in mainline. cheers
Re: [RFC PATCH v2 0/1] of: easier debugging for node life cycle issues
Hi Frank, > Please go back and read the thread for version 1. Simply resubmitting a > forward port is ignoring that whole conversation. > > There is a lot of good info in that thread. I certainly learned stuff in it. Yes, I did that and learned stuff, too. My summary of the discussion was: - you mentioned some drawbacks you saw (like the mixture of trace output and printk output) - most of them look like addressed to me? (e.g. Steven showed a way to redirect printk to trace) - you posted your version (which was, however, marked as "not user friendly" even by yourself) - The discussion stalled over having two approaches So, I thought reposting would be a good way of finding out if your concerns were addressed in the discussion or not. If I overlooked something, I am sorry for that. Still, my intention is to continue the discussion, not to ignore it. Because as it stands, we don't have such a debugging mechanism in place currently, and with people working with DT overlays, I'd think it would be nice to have. Kind regards, Wolfram signature.asc Description: PGP signature
Re: [RFC PATCH v2 0/1] of: easier debugging for node life cycle issues
On 01/21/18 06:31, Wolfram Sang wrote: > I got a bug report for a DT node refcounting problem in the I2C subsystem. > This > patch was a huge help in validating the bug report and the proposed solution. > So, I thought I bring it to attention again. Thanks Tyrel, for the initial > work! > > Note that I did not test the dynamic updates, only of_node_{get|put} so far. I > read that Tyrel checked dynamic updates extensively with this patch. And since > DT overlays are also used within our Renesas dev team, this will help there, > as > well. > > Tested on a Renesas Salvator-XS board (R-Car H3). > > Changes since RFC v1: > * rebased to v4.15-rc8 > * fixed commit abbrev and one of the sysfs paths in commit desc > * removed trailing space and fixed pointer declaration in code > > I consider all the remaining checkpatch issues irrelevant for this patch. > > So what about applying it? > > Kind regards, > >Wolfram > > > Tyrel Datwyler (1): > of: introduce event tracepoints for dynamic device_node lifecyle > > drivers/of/dynamic.c | 32 ++-- > include/trace/events/of.h | 93 > +++ > 2 files changed, 105 insertions(+), 20 deletions(-) > create mode 100644 include/trace/events/of.h > Please go back and read the thread for version 1. Simply resubmitting a forward port is ignoring that whole conversation. There is a lot of good info in that thread. I certainly learned stuff in it. Thanks, Frank