Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs
On Tue, Dec 20, 2011 Jason Gunthorpe wrote: > On Tue, Dec 20, 2011 Or Gerlitz wrote: >> Jason Gunthorpe wrote: >> > The netdev counters are all the same size and there is some other way >> > to discover what the size is. I'd like to see that for IB counters >> > too, but it is probably infeasible. So if we have counters that are >> > not the same size as netdev counters then some kind of size indicator >> > is required for sane userspace. >> We're now talking only about the IB extended counters, which are all 64 bits > netdev counters are 32 bit or 64 bit, depending on how the kernel was > compiled. I think indicating the size explicitly, or always being 64 > bit (and extending all the lesser counters in future) is the way to go for > IB.. Roland/Jason, any concrete preference here? I'd like to fix the patch so it can go into 3.3-rc1. * Do we want that directory to be present only when the port link type is Ethernet? (I assume the IB device will be re-created across a link type change, as the per-port HW elements need to be re-initialized.) * Decimal display, as used by all the network counters and the other IB counters? * Could you clarify "indicating the size explicitly"? * "Always being 64 bit" applies to hexadecimal display only, I guess. Or. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs
On 11/8/2011 2:54 AM, Roland Dreier wrote: Let's make sure we learn from our mistakes. Let's say we create a new "ext_counters" directory. What should the format of those files be? Should they be assumed to be 64-bit quantities? Do we want to allow some way of indicating the number of bits (i.e. 0-padded hex entries?)? Hi Roland, Jason - again, I'd like to re-submit this for kernel 3.3. If there's anything specific you think needs to change, could you please comment? Thanks, Or.
Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs
Jason Gunthorpe wrote: >> We're now talking only about the IB extended counters, which are all 64 bits > netdev counters are 32 bit or 64 bit, depending on how the kernel was > compiled. I think indicating the size explicitly, or always being 64 > bit (and extending all the lesser counters in future) is the way to go for > IB.. So for the extended counters we are dealing with now, which are all 64 bit, is there anything I should change in my patches? Or.
Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs
On Tue, Dec 20, 2011 at 09:40:09PM +0200, Or Gerlitz wrote: > Jason Gunthorpe wrote: > > The netdev counters are all the same size and there is some other way > > to discover what the size is. I'd like to see that for IB counters > > too, but it is probably infeasible. So if we have counters that are > > not the same size as netdev counters then some kind of size indicator > > is required for sane userspace. > > We're now talking only about the IB extended counters, which are all 64 bits netdev counters are 32 bit or 64 bit, depending on how the kernel was compiled. I think indicating the size explicitly, or always being 64 bit (and extending all the lesser counters in future) is the way to go for IB.. Jason
Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs
Jason Gunthorpe wrote: > The netdev counters are all the same size and there is some other way > to discover what the size is. I'd like to see that for IB counters > too, but it is probably infeasible. So if we have counters that are > not the same size as netdev counters then some kind of size indicator > is required for sane userspace. We're now talking only about the IB extended counters, which are all 64 bits. Or.
Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs
On Tue, Dec 20, 2011 at 02:03:31PM +0200, Or Gerlitz wrote: > >That is a good idea. Let's require counters_ext to be sane: > > > > 1 Hex quantity of unspecified size > > 2 Prefixed with 0x and leading zeros to fill out to size and allow > >userspace discovery of size > > 3 Size must be a multiple of 4 bits. > > 4 Counters do not saturate > > 5 Counters wrap around at all F's back to 0. > > 6 If the counter is resettable it is only via a local operation > >through netlink or sysfs or something. Not PMA reset. > > > >Certainly, aside from some minor details and different string > >formatting, the 64 bit counters Or proposes to add meet these > >requirements when the port is used in Ethernet mode. > > Jason, in an earlier post of yours you mentioned the netdev counters > as something we should be following; well, I took a look - the > counters are read-only (see /sys/class/net/$DEV/statistics/*) and they > are displayed in a decimal, non-padded manner (see > /sys/class/net/$DEV/statistics/* or /proc/net/dev or ifconfig and > friends). The netdev counters are all the same size and there is some other way to discover what the size is. I'd like to see that for IB counters too, but it is probably infeasible. So if we have counters that are not the same size as netdev counters then some kind of size indicator is required for sane userspace. Jason
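For comparison with the netdev convention Or describes, here is a minimal Python sketch that reads every file under /sys/class/net/$DEV/statistics/ as a plain, unpadded decimal integer. The function name and its `root` parameter are illustrative assumptions (the parameter only exists so the function can be pointed at a test tree). Note that nothing in the file text says how wide each counter is, which is exactly Jason's concern.

```python
import os


def read_netdev_stats(dev, root="/sys/class/net"):
    """Return {counter_name: value} for one network device.

    Each file under <root>/<dev>/statistics/ holds a single decimal,
    non-padded integer such as "12345\n". The width of the underlying
    kernel counter is not encoded anywhere in the text.
    """
    stats_dir = os.path.join(root, dev, "statistics")
    stats = {}
    for name in sorted(os.listdir(stats_dir)):
        with open(os.path.join(stats_dir, name)) as f:
            stats[name] = int(f.read().strip(), 10)
    return stats
```

On a live system, `read_netdev_stats("eth0")["rx_bytes"]` would return the current receive byte count for eth0.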
Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs
On 11/8/2011 3:09 AM, Jason Gunthorpe wrote: Roland Dreier wrote: Let's make sure we learn from our mistakes. Let's say we create a new "ext_counters" directory. What should the format of those files be? Should they be assumed to be 64-bit quantities? Do we want to allow some way of indicating the number of bits (i.e. 0-padded hex entries?)? That is a good idea. Let's require counters_ext to be sane: 1 Hex quantity of unspecified size 2 Prefixed with 0x and leading zeros to fill out to size and allow userspace discovery of size 3 Size must be a multiple of 4 bits. 4 Counters do not saturate 5 Counters wrap around at all F's back to 0. 6 If the counter is resettable it is only via a local operation through netlink or sysfs or something. Not PMA reset. Certainly, aside from some minor details and different string formatting, the 64 bit counters Or proposes to add meet these requirements when the port is used in Ethernet mode. Jason, in an earlier post of yours you mentioned the netdev counters as something we should be following; well, I took a look - the counters are read-only (see /sys/class/net/$DEV/statistics/*) and they are displayed in a decimal, non-padded manner (see /sys/class/net/$DEV/statistics/* or /proc/net/dev or ifconfig and friends). Roland, I'd like to re-format these patches for 3.3 and will be glad to hear what you would like to see here. How do you feel about having counters_ext appear in ethernet mode and disappear in IB mode? I think I'm fine with that; IB has its own MAD-based means to query these counters. Or.
Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs
Ira Weiny wrote: Jason Gunthorpe wrote: How do you feel about having counters_ext appear in ethernet mode and disappear in IB mode? That might get complicated with ports which can be either mode. Ira (reviving this thread), at the IB core level, the link layer sysfs attribute is read-only. At the mlx4 VPI support level, a port has a given link layer at a certain point in time, and further, AFAIK, this isn't something that can be changed during the life-cycle of the IB device exported by mlx4_ib, which is deleted/re-added when the port link layer changes; see commit 7ff93f8b7ecbc36e7ffc5c11a61643821c1bfee5, which states "When the type of a port is changed, all mlx4 interfaces are unregistered, and then registered again with the new port types". Still, this patch makes the decision when the IB device is added, so you have a point here w.r.t. future devices which could change their link layer at run time; I'll look into that. Or.
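One way for userspace to stay robust to Ira's concern is to probe for the directory on every sample instead of inferring its presence from the link layer. Below is a minimal Python sketch under stated assumptions. `link_layer` is the existing per-port attribute under /sys/class/infiniband/<dev>/ports/<port>/. `counters_ext` is only the name proposed in this thread and is hypothetical. The helper name is illustrative.

```python
import os


def port_counter_source(dev, port, root="/sys/class/infiniband"):
    """Return (link_layer, counters_dir) for one IB device port.

    link_layer is read from the existing per-port sysfs attribute
    (e.g. "InfiniBand" or "Ethernet"). "counters_ext" is the directory
    name proposed in this thread (hypothetical). Its presence is probed
    on every call rather than inferred from link_layer, so a port whose
    device is re-registered with a different link layer is handled.
    """
    port_dir = os.path.join(root, dev, "ports", str(port))
    with open(os.path.join(port_dir, "link_layer")) as f:
        link_layer = f.read().strip()
    ext_dir = os.path.join(port_dir, "counters_ext")
    if os.path.isdir(ext_dir):
        return link_layer, ext_dir
    # Fall back to the existing PMA-backed counters directory.
    return link_layer, os.path.join(port_dir, "counters")
```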
Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs
On Mon, 7 Nov 2011 17:09:52 -0800 Jason Gunthorpe wrote: > On Mon, Nov 07, 2011 at 04:54:42PM -0800, Roland Dreier wrote: > > On Wed, Nov 2, 2011 at 10:16 AM, Or Gerlitz wrote: > > > I suggest we go that least bad way along the lines of your comment. > > > > > > If/when at some future point something constructive can be formed from > > > Jason's observations, changes will follow, agree? > > > > Let's make sure we learn from our mistakes. Let's say we create a > > new "ext_counters" directory. What should the format of those files > > be? Should they be assumed to be 64-bit quantities? Do we want to > > allow some way of indicating the number of bits (i.e. 0-padded hex > > entries?)? > > That is a good idea. Let's require counters_ext to be sane: > > 1 Hex quantity of unspecified size > 2 Prefixed with 0x and leading zeros to fill out to size and allow >userspace discovery of size > 3 Size must be a multiple of 4 bits. > 4 Counters do not saturate > 5 Counters wrap around at all F's back to 0. > 6 If the counter is resettable it is only via a local operation >through netlink or sysfs or something. Not PMA reset. > > Certainly, aside from some minor details and different string > formatting, the 64 bit counters Or proposes to add meet these > requirements when the port is used in Ethernet mode. > > How do you feel about having counters_ext appear in ethernet mode and > disappear in IB mode? That might get complicated with ports which can be either mode. Ira > > Jason -- Ira Weiny Member of Technical Staff Lawrence Livermore National Lab 925-423-8008 wei...@llnl.gov
Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs
On Mon, Nov 07, 2011 at 04:54:42PM -0800, Roland Dreier wrote: > On Wed, Nov 2, 2011 at 10:16 AM, Or Gerlitz wrote: > > I suggest we go that least bad way along the lines of your comment. > > > > If/when at some future point something constructive can be formed from > > Jason's observations, changes will follow, agree? > > Let's make sure we learn from our mistakes. Let's say we create a > new "ext_counters" directory. What should the format of those files > be? Should they be assumed to be 64-bit quantities? Do we want to > allow some way of indicating the number of bits (i.e. 0-padded hex > entries?)? That is a good idea. Let's require counters_ext to be sane: 1 Hex quantity of unspecified size 2 Prefixed with 0x and leading zeros to fill out to size and allow userspace discovery of size 3 Size must be a multiple of 4 bits. 4 Counters do not saturate 5 Counters wrap around at all F's back to 0. 6 If the counter is resettable it is only via a local operation through netlink or sysfs or something. Not PMA reset. Certainly, aside from some minor details and different string formatting, the 64 bit counters Or proposes to add meet these requirements when the port is used in Ethernet mode. How do you feel about having counters_ext appear in ethernet mode and disappear in IB mode? Jason
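Rules 1-3 let userspace recover both the value and the counter width from the text alone. Below is a minimal Python sketch of such a reader. It assumes exactly the format listed: a lowercase "0x" prefix and hex digits zero-padded to the full counter size. The function name is illustrative and not part of any existing tool.

```python
from typing import Tuple

_HEX_DIGITS = frozenset("0123456789abcdefABCDEF")


def parse_counter_ext(text: str) -> Tuple[int, int]:
    """Parse one counters_ext value under rules 1-3.

    Returns (value, width_in_bits). The width is discovered from the
    number of hex digits after "0x": leading zeros fill the value out
    to the full counter size, and each hex digit is exactly 4 bits.
    """
    s = text.strip()
    if not s.startswith("0x"):
        raise ValueError(f"missing 0x prefix: {text!r}")
    digits = s[2:]
    # Reject empty or non-hex payloads explicitly; int() alone would
    # also accept things like "+" signs or "_" separators.
    if not digits or not set(digits) <= _HEX_DIGITS:
        raise ValueError(f"not a hex quantity: {text!r}")
    return int(digits, 16), 4 * len(digits)
```

For example, `parse_counter_ext("0x0000000000001234\n")` yields `(0x1234, 64)`, while `"0x00ff"` would be reported as a 16-bit counter holding 255.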
Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs
On Wed, Nov 2, 2011 at 10:16 AM, Or Gerlitz wrote: > I suggest we go that least bad way along the lines of your comment. > > If/when at some future point something constructive can be formed from > Jason's observations, changes will follow, agree? Let's make sure we learn from our mistakes. Let's say we create a new "ext_counters" directory. What should the format of those files be? Should they be assumed to be 64-bit quantities? Do we want to allow some way of indicating the number of bits (i.e. 0-padded hex entries?)? - R.
Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs
On 11/1/2011 11:42 PM, Roland Dreier wrote: The least bad way forward does seem like it is probably the separate new directory thing. Hi Roland, I suggest we go that least bad way along the lines of your comment. If/when at some future point something constructive can be formed from Jason's observations, changes will follow, agree? Or.
Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs
On 11/1/2011 11:58 PM, Jason Gunthorpe wrote: maybe you should patch to un-export them until things can be fixed sanely... Jason, I don't think we want to work this way: bugs are either reported, worked on, or fixed. You can't just come out of the blue and remove functionality which is buggy (BTW - buggy to your taste; I don't agree with your conclusive comments). Or.
Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs
On 11/2/2011 12:46 AM, Ira Weiny wrote: What do you mean "for both"? Ira The sysfs PMA counters are functional for both IB and IBoE; the latter uses a HW counter allocated per device/port, to which all the QPs created on that port report their rx/tx bytes/packets; see commits cfcde11c3d7ae175f49280bb6f913478c2f1bd8c and c37791349cc79d025df6e9a4f896a7b0a97cdbd3. Or.
RE: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs
> Let's not get into fairness here... I'm trying to make progress on my backlog > but there are patches that for better or worse have been around for a year > or more. Along these lines, is there any news on when patchwork might be available again? I've been trying to help review some of the backlog patches, but it's hard to do without seeing them.
Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs
On Tue, 1 Nov 2011 15:34:46 -0700 Or Gerlitz wrote: > Jason Gunthorpe wrote: > > Why have a sysfs counter at all when you can just ask the PMA and get > > exactly the > > same data? > > The HW/FW PMA agent is supported only for IB, not for IBoE; the > counters are for both What do you mean "for both"? Ira > > Or.
Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs
Jason Gunthorpe wrote: > Why have a sysfs counter at all when you can just ask the PMA and get exactly > the > same data? The HW/FW PMA agent is supported only for IB, not for IBoE; the counters are for both. Or.
Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs
On Tue, Nov 1, 2011 at 11:49 PM, Roland Dreier wrote: > There's no obligation to merge something just because you posted it before > the merge window, and in fact Linus's complaint at the kernel summit is > always that sub-maintainers don't say no enough. > > And let's be honest in this specific case: the world is not going to end > without > a few performance counters in sysfs. Agree on all. I just want to see progress here. It's okay for this discussion and the like to miss this or that merge window, as long as at some point we reach a resolution that allows fixing a patch and queuing it for the next merge window; e.g. in this case, if we miss 3.2-rc1 and you don't feel the patches are appropriate for -rc2, they can be queued for 3.3. BTW - this brings up something I wanted to raise long ago... I would be happy to see the for-next branch active at all times, i.e. in the same manner as net-next, which means that patches are reviewed/accepted constantly over time rather than only/mostly in the weeks before the merge window opens. This shouldn't increase the amortized load on you, and it would reduce the possible (maybe existing to some extent?) frustration of people who submit patches long before the next window opens and don't get much feedback... (again, not relevant to this patch and the next) Or.
Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs
On Tue, 1 Nov 2011 15:11:35 -0700 Jason Gunthorpe wrote: > On Tue, Nov 01, 2011 at 03:03:58PM -0700, Ira Weiny wrote: > > > > And again, this is a useless interface in IB. > > > > What do you mean by this? > > A counter that is randomly reset by an external IB performance manager > is not useful for collecting local statistical information. > > A counter that saturates and cannot be reset locally is not useful for > any automatic process. > > Why have a sysfs counter at all when you can just ask the PMA and get > exactly the same data? I agree, but... sys admins like sysfs files... :-( > > I can't make an SNMP MIB from this counter. I can't make graphs from > this counter. I can't do anything with it except show it to the user > and hope they understand when it was last reset, or understand what > to do when it is saturated. Ok, just making sure we are on the same page. Ira > > Jason -- Ira Weiny Math Programmer/Computer Scientist Lawrence Livermore National Lab 925-423-8008 wei...@llnl.gov
Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs
On Tue, Nov 01, 2011 at 03:03:58PM -0700, Ira Weiny wrote: > > And again, this is a useless interface in IB. > > What do you mean by this? A counter that is randomly reset by an external IB performance manager is not useful for collecting local statistical information. A counter that saturates and cannot be reset locally is not useful for any automatic process. Why have a sysfs counter at all when you can just ask the PMA and get exactly the same data? I can't make an SNMP MIB from this counter. I can't make graphs from this counter. I can't do anything with it except show it to the user and hope they understand when it was last reset, or understand what to do when it is saturated. Jason
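To make the saturation objection concrete, the following small Python simulation uses a deliberately tiny 8-bit counter (an illustrative choice) fed four bursts of 100 events. A collector that differences successive samples recovers the true per-interval count from a wrapping counter. From a saturating counter it silently under-reports, and it reports 0 once the counter sticks at its maximum. The helper names are hypothetical.

```python
def sample_counter(increments, width, saturating):
    """Simulate reading a `width`-bit counter after each burst of events."""
    max_value = (1 << width) - 1
    value = 0
    samples = []
    for inc in increments:
        if saturating:
            value = min(value + inc, max_value)  # sticks at all F's
        else:
            value = (value + inc) & max_value    # wraps back through 0
        samples.append(value)
    return samples


def deltas(samples, width):
    """Per-interval event counts, assuming at most one wrap per interval."""
    modulus = 1 << width
    return [(b - a) % modulus for a, b in zip(samples, samples[1:])]


wrapping = sample_counter([100] * 4, 8, saturating=False)   # [100, 200, 44, 144]
sticky = sample_counter([100] * 4, 8, saturating=True)      # [100, 200, 255, 255]
print(deltas(wrapping, 8))  # [100, 100, 100]  true rate recovered
print(deltas(sticky, 8))    # [100, 55, 0]     traffic silently lost
```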
Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs
On Tue, 1 Nov 2011 14:52:00 -0700 Jason Gunthorpe wrote: > On Tue, Nov 01, 2011 at 02:42:33PM -0700, Roland Dreier wrote: > [snip] > > And again, this is a useless interface in IB. What do you mean by this? Ira > IBoE is going to be > the first real, serious, long-term user; let's make it saner instead of > keeping it as is forever? > > Jason
Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs
On Tue, Nov 01, 2011 at 11:46:08PM +0200, Or Gerlitz wrote: > > In the same vein adding saturating but non-resettable PMA-esque > > counters for IBoE seems pretty hackish to me.. Though I agree it is > > not terribly relevant for 64 bit counters. > To put things in context, the IB stack PMA counters aren't resettable > through sysfs; still, under IB, the same counter set is readable > through both MADs and sysfs and resettable through MADs. Right, the sysfs interface is pretty much unusable for IB. Your work to make it go on IBoE makes something that is very nearly usable, but you can't write a tool that collects these counters from a port in IBoE mode and also expect it to work in IB mode because the semantics are different. My argument here is that the semantics we have for the IB case are not useful. Let us define sane semantics for the IBoE case and have a longer term clean up to make the IB case follow them as well. Sane semantics for a sysfs counter are: - Free-running - Non-saturating - No reset - 64 or 32 bit value, detectable by user-space No 6 bit counters. No counters that saturate. No counters that randomly reset. To this end, I think exporting 64 bit and 32 bit counts of the same value is not the way to go. > As for the saturation thing, I didn't think about that, but you're > probably right and all the IBA PMA counters are saturating, but as > your comment said, the 64 bit case is practically okay Will any counters that get exposed when IBoE is turned on not be 64 bits? There are not very many 64 bit PMA counters. If yes, maybe you should patch to un-export them until things can be fixed sanely... Jason
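Under these semantics (free-running, non-saturating, no reset, known width), a collector needs nothing more than two samples, the width, and the sampling interval to compute a rate. Wrap-around is absorbed by modular subtraction. Below is a minimal Python sketch; the helper is hypothetical and not an existing tool.

```python
def counter_rate(prev, curr, width, interval_s):
    """Events per second between two samples of a free-running counter.

    Assumes the semantics listed above: the counter never saturates and
    is never reset, it wraps modulo 2**width, and it wrapped at most
    once during the interval.
    """
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    modulus = 1 << width
    if not (0 <= prev < modulus and 0 <= curr < modulus):
        raise ValueError("sample does not fit in the stated width")
    return ((curr - prev) % modulus) / interval_s


# A 32-bit counter that wrapped between samples taken 2 s apart:
print(counter_rate(0xFFFFFFF0, 0x10, 32, 2.0))  # 16.0
```

With a 64-bit counter, the at-most-one-wrap assumption is safe for any realistic interval. With a narrow counter, it bounds how long the collector may sleep between samples.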
Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs
On Tue, Nov 01, 2011 at 02:42:33PM -0700, Roland Dreier wrote: > I agree that it definitely is more appealing, if we have a 64-bit > version of a counter, that we should just export that counter > where we used to export the 32-bit version. I think this falls under the 'undocumented, beware' API design. This interface isn't specified so exactly as to have set out how many bytes are in the files and how many bits are in the numbers. If you wrote a reader that can't handle a 20 byte integer with leading zeros then your user space isn't following the API. If your reader doesn't elegantly handle overflow to whatever type your reader picked, then you aren't following the API. There are many examples of the kernel tweaking APIs along this undocumented axis, and theoretically the free-form text nature of sysfs is supposed to save us from having to worry about exactly this sort of case. And again, this is a useless interface in IB. IBoE is going to be the first real, serious, long-term user; let's make it saner instead of keeping it as is forever? Jason
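The reader Jason has in mind treats the file as free-form text. It accepts any number of digits, tolerates leading zeros, and does not narrow the result to a fixed width. Below is a minimal Python sketch of such a reader; the function name is illustrative.

```python
import re

_DECIMAL = re.compile(r"[0-9]+")


def read_decimal_counter(text):
    """Parse a sysfs decimal counter of any length, leading zeros allowed.

    Python ints are unbounded, so a zero-padded 20-digit value (enough
    for anything up to 2**64 - 1) is returned exactly instead of being
    narrowed to 32 bits.
    """
    s = text.strip()
    if not _DECIMAL.fullmatch(s):
        raise ValueError(f"not a decimal counter: {text!r}")
    return int(s, 10)
```

For instance, `read_decimal_counter("00000000004294967296\n")` returns 4294967296, one past the 32-bit maximum.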
Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs
On Tue, Nov 1, 2011 at 11:14 AM, Or Gerlitz wrote: > Guys (Roland, Jason), I'm open to any comments, any time, for any > patch, but for a patch which was posted weeks ago it's pretty unfair > to have your comments coming only eight days after the merge window > has been opened; let's try to come quickly to a decision so I can fix > this up along those lines, Let's not get into fairness here... I'm trying to make progress on my backlog but there are patches that for better or worse have been around for a year or more. And for this merge window, I did manage to do 84 files changed, 4738 insertions(+), 984 deletions(-) and include more than 60 patches. There's no obligation to merge something just because you posted it before the merge window, and in fact Linus's complaint at the kernel summit is always that sub-maintainers don't say no enough. And let's be honest in this specific case: the world is not going to end without a few performance counters in sysfs. - R.
Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs
Jason Gunthorpe wrote: > I don't mean the 32 bit counters are useless, I mean exposing PMA > counters that saturate and can be randomly reset by external agents > through sysfs is useless. You can't make any kind of data collection > based on such a system. > Ideally the sysfs counters are all non-saturating, non-resetting > counters like everything else in the net stack. You need a different > interface to the chip firmware to implement this, can't use the > existing PMA stuff. > In the same vein adding saturating but non-resettable PMA-esque > counters for IBoE seems pretty hackish to me.. Though I agree it is > not terribly relevant for 64 bit counters. Jason, to put things in context: the IB stack PMA counters aren't resettable through sysfs; still, under IB, the same counter set is readable through both MADs and sysfs and resettable through MADs. As for the saturation thing, I didn't think about that, but you're probably right that all the IBA PMA counters are saturating; but as your comment said, the 64 bit case is practically okay. Or.
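The claim that saturation is practically irrelevant for 64-bit counters can be checked with a quick calculation. Below is a minimal Python sketch under stated assumptions. It assumes a 4x QDR link carrying about 4e9 bytes/s of payload (32 Gb/s after 8b/10b encoding). It also assumes the data counters count 4-octet words, as the IBA PortXmitData/PortRcvData definitions do, with the extended counters assumed to use the same unit.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def seconds_to_saturate(width_bits, unit_bytes, bytes_per_second):
    """Time for a data counter starting at 0 to reach its all-ones maximum.

    unit_bytes is how many bytes one count represents (4 for IBA data
    counters, which count 4-octet words).
    """
    return ((1 << width_bits) - 1) * unit_bytes / bytes_per_second


QDR_4X = 4e9  # assumed payload rate in bytes/s for a 4x QDR link
print(seconds_to_saturate(32, 4, QDR_4X))                     # ~4.3 seconds
print(seconds_to_saturate(64, 4, QDR_4X) / SECONDS_PER_YEAR)  # ~585 years
```

So under these assumptions, a 32-bit data counter can saturate within seconds at line rate, while a 64-bit one effectively never will.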
Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs
On Tue, Nov 1, 2011 at 11:42 PM, Roland Dreier wrote: > The least bad way forward does seem like it is probably > the separate new directory thing. I agree. Or.
Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs
On Tue, Nov 1, 2011 at 11:37 AM, Jason Gunthorpe wrote: > What's the problem here? If a 64 bit counter is available then export > it as 64 bit otherwise keep exporting something smaller. > > I agree zero padding non-hex numbers isn't ideal. Export as hex? I agree that it definitely is more appealing, if we have a 64-bit version of a counter, that we should just export that counter where we used to export the 32-bit version. But I'm not sure there's a feasible way to do this without breaking old userspace. For sure we can't assume userspace can cope with hex where we used to have decimal. And I don't think it's even a safe assumption that userspace can cope with a 64-bit quantity where we used to have a 32-bit quantity. It doesn't seem safe to assume that userspace that used to work with 32-bit quantities can cope with a 0-padded 20 digit value. The least bad way forward does seem like it is probably the separate new directory thing. - R.
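Roland's compatibility worry can be illustrated by emulating what an old reader might do. Below is a minimal Python sketch of a hypothetical legacy reader. It assumes C code along the lines of `uint32_t v = strtoul(buf, NULL, 10);` on a 64-bit system, where unsigned long is 64 bits. That reader parses leading decimal digits, stops at the first non-digit, and keeps the low 32 bits. A value past 2**32 silently wraps, and a hex file reads as 0.

```python
import re


def legacy_u32_reader(text):
    """Emulate an old reader written for 32-bit decimal counter files.

    Models C code such as `uint32_t v = strtoul(buf, NULL, 10);` on a
    64-bit system: parse the leading decimal digits, stop at the first
    non-digit, then keep only the low 32 bits on assignment.
    """
    digits = re.match(r"[0-9]*", text.strip()).group(0)
    value = int(digits, 10) if digits else 0
    return value & 0xFFFFFFFF


print(legacy_u32_reader("4294967297"))              # 1: wrapped past 2**32
print(legacy_u32_reader("0x0000000100000000"))      # 0: stops at the 'x'
print(legacy_u32_reader("00000000000000001234"))    # 1234: padding is harmless here
```

A reader that had used base 0 (`strtoul(buf, NULL, 0)`) would be worse still. With base 0, a leading zero selects octal, so a zero-padded decimal value would be misread.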
Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs
> > I don't see a problem with having a sysfs counter file being extended > > to return a 64 bit number.. I think that is within the purview of > > acceptable changes. Shame the counter wasn't exported as hex though - > > makes it harder to signal if it is 32 or 64 bit. > > if I understand you right, we would have traffic counters exposed > through sysfs, where a counter is either a 32 bit one zero-padded/embedded in > a 64 bit one or a true 64 bit one. A problem is that the four 32 bit traffic > counters (rx/tx data/packets) are actually part of the IB port L2 > basic counter set, which includes about ten more counters to mark > different kinds of errors, whereas the 64 bit counters are only traffic > counters, so what do you suggest for them? Use the same approach for > the error counters as well, even though IB doesn't define 64 bit > versions for them? Also, zero padding for something which isn't exported > in hex is very ugly, isn't it? What's the problem here? If a 64 bit counter is available then export it as 64 bit otherwise keep exporting something smaller. I agree zero padding non-hex numbers isn't ideal. Export as hex? Broadly, this is another problem with the sysfs interface because the width matters for any kind of serious data collection, and IBA defined interesting widths for many of the counters that were flowed right through the sysfs interface, with no means of discovery. > > Frankly, exporting these PMA counters as saturate on maximum via sysfs > > is pretty useless. Does anyone actually use them aside from a few scripts? > > under IB our monitoring code/scripts use perfquery/MADs whereas under > IBoE we use sysfs; the MAD approach allows resetting the counters, so > the 32 bit counters aren't useless; reset via sysfs isn't supported, so > the 64 bit counters are kind of a must, anyway, I don't mean the 32 bit counters are useless, I mean exposing PMA counters that saturate and can be randomly reset by external agents through sysfs is useless.
You can't make any kind of data collection based on such a system. Ideally the sysfs counters are all non-saturating, non-resetting counters like everything else in the net stack. You need a different interface to the chip firmware to implement this, can't use the existing PMA stuff. In the same vien adding saturating but non-resettable PMA-esque counters for IBoE seems pretty hackish to me.. Though I agree it is not terribly relevant for 64 bit counters. > > What would be useful is free running 64 bit sysfs counters that > > are independent and not reset by PMA activity. Like all the other > > Linux networking counters. That would be great. I hope that is > > what is done for IBoE? > > yes this is what the 64 bit counters are IBA defined 64 bit counters are not free-running, they still saturate. Does the firmware not do this in IBoE mode? > > Unifying the counters to be semantically the same on IB and IBoE seems > > like a very good idea. > yes, this is what we do here I disagree. Your IBoE counters cannot be reset, externally or otherwise - aside from the saturating this makes them almost the same as the usual Linux net counters. When the port is in IB mode the counter doesn't have those properties. That is a big semantic difference when compared to what these sysfs files show for normal IB counters. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
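To make the saturation argument concrete, here is a minimal userspace C sketch (not kernel code and not from the patch; every function name is hypothetical) contrasting a 32-bit counter that sticks at its maximum, as IBA PortCounters fields do, with a free-running 64-bit counter of the kind netdev exposes. A collector can always turn two samples of a free-running counter into a delta, but for a saturating counter that an external agent may also reset, it can only flag some of the cases where the delta is meaningless.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 32-bit PMA-style counter: it stops at its
 * maximum value instead of wrapping. */
static uint32_t pma32_add(uint32_t counter, uint32_t n)
{
    return (n > UINT32_MAX - counter) ? UINT32_MAX : counter + n;
}

/* A free-running 64-bit counter: plain unsigned arithmetic, which
 * wraps modulo 2^64. */
static uint64_t free64_add(uint64_t counter, uint64_t n)
{
    return counter + n;
}

/* Traffic between two samples of a free-running counter is the unsigned
 * difference; this stays correct even across a single wrap. */
static uint64_t free64_delta(uint64_t prev, uint64_t cur)
{
    return cur - prev;
}

/* For a saturating, externally resettable counter the same subtraction is
 * only trustworthy if the counter is below its maximum and was not reset.
 * Returns -1 when the delta is known to be unusable. */
static int64_t pma32_delta(uint32_t prev, uint32_t cur)
{
    if (cur == UINT32_MAX)
        return -1; /* possibly saturated: real traffic is unknown */
    if (cur < prev)
        return -1; /* went backwards: reset by some external agent */
    return (int64_t)(cur - prev);
}
```

Note that pma32_delta() only catches a reset if the counter has not climbed back above its previous value before the next sample; a reset followed by enough traffic looks like normal growth, which is why such counters are hard to use for data collection.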
Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs
Jason Gunthorpe wrote:

Guys (Roland, Jason), I'm open to any comments, any time, for any
patch, but for a patch which was posted weeks ago it's pretty unfair to
have your comments come only eight days after the merge window has
opened. Let's try to reach a decision quickly so I can fix this up
along those lines.

> I don't see a problem with having a sysfs counter file being extended
> to return a 64 bit number.. I think that is within the purview of
> acceptable changes. Shame the counter wasn't exported as hex though -
> makes it harder to signal if it is 32 or 64 bit.

if I understand you right, we would have traffic counters exposed
through sysfs, where a counter is either a 32 bit one zero-padded/embedded
in a 64 bit one or a true 64 bit one. A problem is that the four 32 bit
traffic counters (rx/tx data/packets) are actually part of the IB port L2
basic counter set, which includes about ten more counters to mark
different kinds of errors, whereas the 64 bit counters are only traffic
counters, so what do you suggest for them? Use the same approach for
the error counters as well even though IB doesn't define 64 bit
versions for them? Also, zero padding for something which isn't exported
in hex is very ugly, isn't it?

> Frankly, exporting these PMA counters as saturate on maximum via sysfs
> is pretty useless. Does anyone actually use them aside from a few scripts?

under IB our monitoring code/scripts use perfquery/MADs whereas under
IBoE we use sysfs. The MAD approach allows resetting the counters, so
the 32 bit counters aren't useless; resetting via sysfs isn't supported,
so the 64 bit counters are kind of a must, anyway,

> What would be useful is free running 64 bit sysfs counters that are
> independent and not reset by PMA activity. Like all the other Linux
> networking counters. That would be great. I hope that is what is done for
> IBoE?

yes this is what the 64 bit counters are

> Unifying the counters to be semantically the same on IB and IBoE seems
> like a very good idea.

yes, this is what we do here
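Or's point about the error counters can be made concrete with a small table. The sketch below lists a subset of the counters the kernel exposes under /sys/class/infiniband/<dev>/ports/<port>/counters/ together with their basic PortCounters widths in bits; only the four traffic counters have 64-bit PortCountersExtended equivalents. The struct, the has_ext flag and exported_width() are hypothetical, written only to illustrate Jason's "emit the largest version available" rule; none of it is code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor: basic PortCounters width and whether a
 * 64-bit PortCountersExtended variant exists. */
struct pma_counter_desc {
    const char *name;     /* sysfs file name */
    unsigned int width;   /* basic width in bits */
    int has_ext;          /* 1 if a 64-bit extended version exists */
};

static const struct pma_counter_desc pma_counters[] = {
    { "symbol_error",                    16, 0 },
    { "link_error_recovery",              8, 0 },
    { "link_downed",                      8, 0 },
    { "port_rcv_errors",                 16, 0 },
    { "local_link_integrity_errors",      4, 0 },
    { "excessive_buffer_overrun_errors",  4, 0 },
    { "VL15_dropped",                    16, 0 },
    { "port_xmit_data",                  32, 1 },
    { "port_rcv_data",                   32, 1 },
    { "port_xmit_packets",               32, 1 },
    { "port_rcv_packets",                32, 1 },
};

/* Width a single sysfs file would report under the "largest available"
 * rule, given whether the port's PMA supports extended counters.
 * Returns 0 for an unknown counter name. */
static unsigned int exported_width(const char *name, int port_has_ext)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof(pma_counters) / sizeof(pma_counters[0]); i++) {
        if (strcmp(pma_counters[i].name, name) == 0)
            return (port_has_ext && pma_counters[i].has_ext)
                       ? 64 : pma_counters[i].width;
    }
    return 0;
}
```

Under this rule the error counters keep their native widths, so a reader still needs some way to discover the width of each file, which is the discovery problem Jason raises.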
Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs
On Tue, Nov 01, 2011 at 07:23:52PM +0200, Or Gerlitz wrote:
> Jason Gunthorpe wrote:
> > Is there any reason to expose the 32 and 64 bit version of the same
> > counter? That seems needless. Emit the largest version available and
> > prepend 0's to fill out to the available width so that userspace can
> > know the counter size.
>
> Basically, the approach you suggest seems fine for IBoE which is
> pretty new, however,
>
> the problem is that the 32 bit counters have existed more or less since
> day one AND have the same semantics whether returned through sysfs or
> through perfquery and similar MAD-based apps.
> In other words, around IB there is some legacy which exists
> today, and I don't think it would be wise to touch that area such that
> the 32 bit counters become embedded in 64 bit numbers, thoughts?

I don't see a problem with having a sysfs counter file being extended
to return a 64 bit number.. I think that is within the purview of
acceptable changes. Shame the counter wasn't exported as hex though -
makes it harder to signal if it is 32 or 64 bit.

Frankly, exporting these PMA counters as saturate on maximum via sysfs
is pretty useless. Does anyone actually use them aside from a few scripts?

What would be useful is free running 64 bit sysfs counters that are
independent and not reset by PMA activity. Like all the other Linux
networking counters. That would be great. I hope that is what is done
for IBoE?

Unifying the counters to be semantically the same on IB and IBoE seems
like a very good idea.

Jason
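Jason's claim that widening a decimal sysfs counter file from 32 to 64 bits is an acceptable change holds for readers that already parse into a 64-bit type. A minimal sketch, assuming a userspace collector written in C; parse_counter_decimal() is a hypothetical helper. A reader that stores the value in a 32-bit type would still truncate, so this only shows that defensively written readers are unaffected.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse the decimal contents of a sysfs counter file (e.g. "1234\n")
 * into a 64-bit value. Returns 0 on success, -1 on malformed input. */
static int parse_counter_decimal(const char *buf, uint64_t *out)
{
    char *end;
    unsigned long long v;

    /* strtoull() would accept leading whitespace and a '-' sign;
     * a counter file should start with a digit. */
    if (*buf < '0' || *buf > '9')
        return -1;

    errno = 0;
    v = strtoull(buf, &end, 10);
    if (errno != 0)
        return -1; /* out of range for unsigned long long */

    /* sysfs values are newline-terminated; allow that, reject other junk */
    if (*end == '\n')
        end++;
    if (*end != '\0')
        return -1;

    *out = (uint64_t)v;
    return 0;
}
```

The same reader handles 4294967295 (the 32-bit maximum) and 4294967296 (only representable in 64 bits) without change.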
Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs
Jason Gunthorpe wrote:
> Is there any reason to expose the 32 and 64 bit version of the same
> counter? That seems needless. Emit the largest version available and
> prepend 0's to fill out to the available width so that userspace can
> know the counter size.

Basically, the approach you suggest seems fine for IBoE which is
pretty new, however,

the problem is that the 32 bit counters have existed more or less since
day one AND have the same semantics whether returned through sysfs or
through perfquery and similar MAD-based apps.
In other words, around IB there is some legacy which exists
today, and I don't think it would be wise to touch that area such that
the 32 bit counters become embedded in 64 bit numbers, thoughts?

Or.
Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs
On Tue, Nov 01, 2011 at 08:40:12AM +0200, Or Gerlitz wrote:
> Today, e.g. in some IBoE perf monitoring scripts we wrote, the
> distinction is done by: if (the ext counter directory exists) then go
> and read the counters from there, else read from the non-extended
> counters directory. With the change you propose, that if (...) would
> become a little less elegant and would check if this or that --file--
> exists (e.g. the 64 bit tx data counter) and if yes, would read the
> four 64 bit counters (rx/tx packets/data), else the four 32 bit
> counters. So from our user standpoint, different dirs seem better, but
> we can get along with the same dir with different contents depending
> on the device.

Is there any reason to expose the 32 and 64 bit version of the same
counter? That seems needless. Emit the largest version available and
prepend 0's to fill out to the available width so that userspace can
know the counter size.

Jason
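Jason's padding suggestion can be sketched as follows, assuming every counter width is a multiple of 4 bits (true for the IBA widths exposed in sysfs: 4, 8, 16 and 32 bits, plus 64 for the extended counters). In hex each digit carries exactly 4 bits, so the digit count tells userspace the width. This is userspace C; format_counter_hex() and counter_width_from_hex() are hypothetical names, and a real kernel show routine would format into its sysfs page buffer instead.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format a counter as zero-padded hex whose digit count encodes the
 * counter width: a 32-bit counter always prints 8 digits, a 64-bit one
 * 16 digits. Returns the snprintf() result. */
static int format_counter_hex(char *buf, size_t len, uint64_t value,
                              unsigned int width_bits)
{
    assert(width_bits > 0 && width_bits <= 64 && width_bits % 4 == 0);
    /* the value must fit, otherwise printf would emit extra digits */
    assert(width_bits == 64 || (value >> width_bits) == 0);
    return snprintf(buf, len, "%0*" PRIx64 "\n", (int)(width_bits / 4), value);
}

/* Userspace side: recover the width from the number of hex digits. */
static unsigned int counter_width_from_hex(const char *buf)
{
    size_t digits = strspn(buf, "0123456789abcdefABCDEF");
    return (unsigned int)(digits * 4);
}
```

This also shows why padding only really works in hex: in decimal there is no fixed number of digits per bit, which is the root of Or's "very ugly" objection.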
Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs
On Mon, Oct 31, 2011 at 9:38 PM, Roland Dreier wrote:
> Sorry for the late review here

Oh yes... BTW this is patch 4/5; I don't see patches 1, 2, 3 on your
for-next tree/branch @ kernel.org, have you accepted them?

> Sorry for the late review here, but does it seem like the best
> approach to have a separate "counters_ext" directory for
> some subset of performance counters? Instead we could
> have two attribute_groups, one the basic counters and one
> the basic and extended counters, and basically do
>
>     if (is_pma_class_cap_ext_width(device, port_num) == 0)
>             sysfs_create_group(...basic and extended counters...)
>     else
>             sysfs_create_group(...basic counters...)

Basically, I don't see a problem with having one directory along the
lines of your suggestion.

> Or is there some reason why users would want to make the
> distinction between basic and extended counters?

Today, e.g. in some IBoE perf monitoring scripts we wrote, the
distinction is done by: if (the ext counter directory exists) then go
and read the counters from there, else read from the non-extended
counters directory. With the change you propose, that if (...) would
become a little less elegant and would check if this or that --file--
exists (e.g. the 64 bit tx data counter) and if yes, would read the
four 64 bit counters (rx/tx packets/data), else the four 32 bit
counters. So from our user standpoint, different dirs seem better, but
we can get along with the same dir with different contents depending
on the device.

> (by the way, is_pma_class_cap_ext_width() seems backwards
> since it returns 0 for true. How about bool pma_has_ext_width()
> and have it return true if the extended counters ARE supported?)

Sure, I will be able to handle this little change Wed, but we should
be okay / have enough time for the current merge window, correct?

Or.
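The scripts themselves are not shown in the thread, so the following is only an assumed C rendering of the logic Or describes: prefer the counters_ext directory when it exists, otherwise fall back to counters. The port directory layout (/sys/class/infiniband/<dev>/ports/<port>) is the standard one; pick_counter_dir() and the enum are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/stat.h>

enum counter_dir {
    COUNTER_DIR_ERROR = -1,  /* output buffer too small */
    COUNTER_DIR_BASIC = 0,   /* fell back to ".../counters" */
    COUNTER_DIR_EXT   = 1,   /* found ".../counters_ext" */
};

/* port_dir is e.g. "/sys/class/infiniband/mlx4_0/ports/1". On success the
 * chosen directory path is written to out. */
static int pick_counter_dir(const char *port_dir, char *out, size_t len)
{
    struct stat st;
    int n;

    assert(port_dir != NULL && out != NULL);

    n = snprintf(out, len, "%s/counters_ext", port_dir);
    if (n < 0 || (size_t)n >= len)
        return COUNTER_DIR_ERROR;
    if (stat(out, &st) == 0 && S_ISDIR(st.st_mode))
        return COUNTER_DIR_EXT;

    n = snprintf(out, len, "%s/counters", port_dir);
    if (n < 0 || (size_t)n >= len)
        return COUNTER_DIR_ERROR;
    return COUNTER_DIR_BASIC;
}
```

With Roland's single-directory proposal the same check would instead stat() one of the 64-bit counter files, which is the "a little less elegant" variant Or mentions.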
Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs
On Mon, Oct 10, 2011 at 1:56 AM, Or Gerlitz wrote:
> +static struct attribute_group pma_ext_group = {
> +    .name = "counters_ext",
> +    .attrs = pma_attrs_ext
> +};

Sorry for the late review here, but does it seem like the best
approach to have a separate "counters_ext" directory for
some subset of performance counters? Instead we could
have two attribute_groups, one the basic counters and one
the basic and extended counters, and basically do

    if (is_pma_class_cap_ext_width(device, port_num) == 0)
            sysfs_create_group(...basic and extended counters...)
    else
            sysfs_create_group(...basic counters...)

(by the way, is_pma_class_cap_ext_width() seems backwards
since it returns 0 for true. How about bool pma_has_ext_width()
and have it return true if the extended counters ARE supported?)

Or is there some reason why users would want to make the
distinction between basic and extended counters?

 - R.
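Roland's suggestion can be modeled in userspace C roughly as below. The structs are cut-down stand-ins for the kernel's struct attribute and struct attribute_group, the two *_64 attribute names and struct port_model are hypothetical, and pma_has_ext_width() follows his proposed positive-sense form but reads a hypothetical per-port flag instead of querying the PMA (the real check would presumably look at the PMA ClassPortInfo capability mask). In the kernel, the group returned by select_pma_group() would be what gets passed to sysfs_create_group().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-ins for the kernel structures (only the fields used). */
struct attribute {
    const char *name;
};

struct attribute_group {
    const char *name;
    struct attribute **attrs;   /* NULL-terminated */
};

/* A subset of the basic counters. */
static struct attribute attr_symbol_error   = { "symbol_error" };
static struct attribute attr_port_xmit_data = { "port_xmit_data" };
static struct attribute attr_port_rcv_data  = { "port_rcv_data" };
/* Hypothetical names for two of the 64-bit extended counters. */
static struct attribute attr_port_xmit_data_64 = { "port_xmit_data_64" };
static struct attribute attr_port_rcv_data_64  = { "port_rcv_data_64" };

static struct attribute *pma_attrs[] = {
    &attr_symbol_error, &attr_port_xmit_data, &attr_port_rcv_data, NULL
};

/* Basic plus extended counters, published in the same directory. */
static struct attribute *pma_attrs_ext[] = {
    &attr_symbol_error, &attr_port_xmit_data, &attr_port_rcv_data,
    &attr_port_xmit_data_64, &attr_port_rcv_data_64, NULL
};

static const struct attribute_group pma_group     = { "counters", pma_attrs };
static const struct attribute_group pma_ext_group = { "counters", pma_attrs_ext };

/* Hypothetical per-port state standing in for the PMA capability query. */
struct port_model {
    bool pma_ext_width;
};

/* Positive-sense predicate: true when extended counters ARE supported. */
static bool pma_has_ext_width(const struct port_model *port)
{
    return port->pma_ext_width;
}

/* Pick the single group to register for this port. */
static const struct attribute_group *select_pma_group(const struct port_model *port)
{
    return pma_has_ext_width(port) ? &pma_ext_group : &pma_group;
}

/* Count the entries in a NULL-terminated attribute list. */
static size_t count_attrs(const struct attribute_group *grp)
{
    size_t n = 0;
    while (grp->attrs[n] != NULL)
        n++;
    return n;
}
```

Both groups share the directory name, so userspace sees one counters directory whose contents depend on the device, which is the trade-off Or weighs in his reply.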