Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs

2012-01-04 Thread Or Gerlitz
On Tue, Dec 20, 2011 Jason Gunthorpe  wrote:
> On Tue, Dec 20, 2011 Or Gerlitz wrote:
>> Jason Gunthorpe  wrote:
>> > The netdev counters are all the same size and there is some other way
>> > to discover what the size is. I'd like to see that for IB counters
>> > too, but it is probably infeasible. So if we have counters that are
>> > not the same size as netdev counters then some kind of size indicator
>> > is required for sane userspace.
>> We're talking now only about the IB extended counters, which are all 64 bits

> netdev counters are 32 bit or 64 bit, depending on how the kernel was
> compiled. I think indicating the size explicitly, or always being 64
> bit (and extending all the lesser counters in future) is the way to go
> for IB..

Roland/Jason,

Any concrete preference here? I'd like to fix the patch so it can go
into 3.3-rc1

* do we want that directory to be present only when the port link
type is Ethernet? (I assume the ib device will be re-created across
a link type change, as the per-port HW elements need to be
re-initialized.)

* decimal display, as all the network counters and the other IB counters use?

* could you clarify "indicating the size explicitly"?

* "always being 64 bit" applies to hexadecimal only, I guess

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs

2011-12-22 Thread Or Gerlitz

On 11/8/2011 2:54 AM, Roland Dreier wrote:
> Let's make sure we learn from our mistakes. Let's say we create a new
> "ext_counters" directory. What should the format of those files be?
> Should they be assumed to be 64-bit quantities? Do we want to allow
> some way of indicating the number of bits (ie 0-padded hex entries?)?


Hi Roland, Jason - again, I'd like to re-submit that for kernel 3.3. If
there's anything specific you think needs to change, could you please
comment? Thanks


Or.



Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs

2011-12-20 Thread Or Gerlitz
Jason Gunthorpe  wrote:
>> We're talking now only about the IB extended counters, which are all 64 bits

> netdev counters are 32 bit or 64 bit, depending on how the kernel was
> compiled. I think indicating the size explicitly, or always being 64
> bit (and extending all the lesser counters in future) is the way to go
> for IB..

So for the extended counters we are dealing with now, which are all
64 bit, is there anything I should change in my patches?

Or.


Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs

2011-12-20 Thread Jason Gunthorpe
On Tue, Dec 20, 2011 at 09:40:09PM +0200, Or Gerlitz wrote:
> Jason Gunthorpe  wrote:
> > The netdev counters are all the same size and there is some other way
> > to discover what the size is. I'd like to see that for IB counters
> > too, but it is probably infeasible. So if we have counters that are
> > not the same size as netdev counters then some kind of size indicator
> > is required for sane userspace.
> 
> We're talking now only about the IB extended counters, which are all 64 bits

netdev counters are 32 bit or 64 bit, depending on how the kernel was
compiled. I think indicating the size explicitly, or always being 64
bit (and extending all the lesser counters in future) is the way to go
for IB..

Jason


Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs

2011-12-20 Thread Or Gerlitz
Jason Gunthorpe  wrote:
> The netdev counters are all the same size and there is some other way
> to discover what the size is. I'd like to see that for IB counters
> too, but it is probably infeasible. So if we have counters that are
> not the same size as netdev counters then some kind of size indicator
> is required for sane userspace.

We're talking now only about the IB extended counters, which are all 64 bits

Or.


Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs

2011-12-20 Thread Jason Gunthorpe
On Tue, Dec 20, 2011 at 02:03:31PM +0200, Or Gerlitz wrote:

> >That is a good idea. Let's require counters_ext to be sane:
> >
> >  1 Hex quantity of unspecified size
> >  2 Prefixed with 0x and leading zeros to fill out to size and allow
> >userspace discovery of size
> >  3 Size must be a multiple of 4 bits.
> >  4 Counters do not saturate
> >  5 Counters wrap around at all F's back to 0.
> >  6 If the counter is resettable it is only via a local operation
> >through netlink or sysfs or something. Not PMA reset.
> >
> >Certainly, aside from some minor details and different string
> >formatting, the 64 bit counters Or proposes to add meet these
> >requirements when the port is used in Ethernet mode.
> 
> Jason, in an earlier post of yours you mentioned the netdev counters
> as something we should be following, well, I took a look - the
> counters are read-only (see /sys/class/net/$DEV/statistics/*) they
> are displayed in decimal/non-padded manner (see
> /sys/class/net/$DEV/statistics/* or /proc/net/dev or ifconfig and
> friends)

The netdev counters are all the same size and there is some other way
to discover what the size is. I'd like to see that for IB counters
too, but it is probably infeasible. So if we have counters that are
not the same size as netdev counters then some kind of size indicator
is required for sane userspace.

Jason


Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs

2011-12-20 Thread Or Gerlitz

On 11/8/2011 3:09 AM, Jason Gunthorpe wrote:
> Roland Dreier wrote:
> > Let's make sure we learn from our mistakes.  Let's say we create a
> > new "ext_counters" directory.  What should the format of those files
> > be?  Should they be assumed to be 64-bit quantities?  Do we want to
> > allow some way of indicating the number of bits (ie 0-padded hex
> > entries?)?
>
> That is a good idea. Let's require counters_ext to be sane:
>
>  1 Hex quantity of unspecified size
>  2 Prefixed with 0x and leading zeros to fill out to size and allow
>    userspace discovery of size
>  3 Size must be a multiple of 4 bits.
>  4 Counters do not saturate
>  5 Counters wrap around at all F's back to 0.
>  6 If the counter is resettable it is only via a local operation
>    through netlink or sysfs or something. Not PMA reset.
>
> Certainly, aside from some minor details and different string
> formatting, the 64 bit counters Or proposes to add meet these
> requirements when the port is used in Ethernet mode.


Jason, in an earlier post of yours you mentioned the netdev counters as
something we should be following. Well, I took a look: the counters are
read-only (see /sys/class/net/$DEV/statistics/*) and they are displayed
in a decimal, non-padded manner (see /sys/class/net/$DEV/statistics/*,
/proc/net/dev, or ifconfig and friends).
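
To illustrate the decimal, non-padded format mentioned above, here is a
minimal Python sketch that reads every file in a statistics directory
into a dictionary. The function name is hypothetical, and it assumes
each file holds exactly one decimal integer followed by a newline.

```python
import os


def read_decimal_counters(stat_dir):
    """Read every regular file in stat_dir as one decimal counter value."""
    counters = {}
    for name in sorted(os.listdir(stat_dir)):
        path = os.path.join(stat_dir, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories and other non-file entries
        with open(path) as f:
            # Assumption: one unpadded decimal integer per file.
            counters[name] = int(f.read().strip(), 10)
    return counters
```

It could be pointed at a path such as /sys/class/net/eth0/statistics,
where "eth0" is only an example device name.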


Roland, I'd like to re-format these patches for 3.3 and will be glad to
hear what you would like to see here.



> How do you feel about having counters_ext appear in ethernet mode and
> disappear in IB mode?


I think I'm fine with that, IB has its own MAD based means to query these 
counters.



Or.

 





Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs

2011-12-20 Thread Or Gerlitz

Ira Weiny wrote:
> Jason Gunthorpe wrote:
> > How do you feel about having counters_ext appear in ethernet mode and
> > disappear in IB mode?
>
> That might get complicated with ports which can be either mode.


Ira (reviving this thread),

At the ib core level, the link layer sysfs attribute is read-only.

At the mlx4 VPI support level, a port has a given link layer at any
given point in time. Further, AFAIK, the link layer cannot change during
the life cycle of the ib device exported by mlx4_ib, which is deleted
and re-added when the port link layer changes; see commit
7ff93f8b7ecbc36e7ffc5c11a61643821c1bfee5, which states "When the type of
a port is changed, all mlx4 interfaces are unregistered, and then
registered again with the new port types".


Still, this patch makes the decision when the ib device is added, so you
have a point here w.r.t. future devices which could change their link
layer at run time. I'll look into that.


Or.


Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs

2011-11-08 Thread Ira Weiny
On Mon, 7 Nov 2011 17:09:52 -0800
Jason Gunthorpe  wrote:

> On Mon, Nov 07, 2011 at 04:54:42PM -0800, Roland Dreier wrote:
> > On Wed, Nov 2, 2011 at 10:16 AM, Or Gerlitz  wrote:
> > > I suggest we go that least bad way along the lines of your comment.
> > >
> > > If/when on some future point something constructive can be formed from
> > > Jason's observations, changes will follow, agree?
> > 
> > Let's make sure we learn from our mistakes.  Let's say we create a
> > new "ext_counters" directory.  What should the format of those files
> > be?  Should they be assumed to be 64-bit quantities?  Do we want to
> > allow some way of indicating the number of bits (ie 0-padded hex
> > entries?)?
> 
> That is a good idea. Let's require counters_ext to be sane:
> 
>  1 Hex quantity of unspecified size
>  2 Prefixed with 0x and leading zeros to fill out to size and allow
>userspace discovery of size
>  3 Size must be a multiple of 4 bits.
>  4 Counters do not saturate
>  5 Counters wrap around at all F's back to 0.
>  6 If the counter is resettable it is only via a local operation
>through netlink or sysfs or something. Not PMA reset.
> 
> Certainly, aside from some minor details and different string
> formatting, the 64 bit counters Or proposes to add meet these
> requirements when the port is used in Ethernet mode.
> 
> How do you feel about having counters_ext appear in ethernet mode and
> disappear in IB mode?

That might get complicated with ports which can be either mode.

Ira

> 
> Jason


-- 
Ira Weiny
Member of Technical Staff
Lawrence Livermore National Lab
925-423-8008
wei...@llnl.gov


Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs

2011-11-07 Thread Jason Gunthorpe
On Mon, Nov 07, 2011 at 04:54:42PM -0800, Roland Dreier wrote:
> On Wed, Nov 2, 2011 at 10:16 AM, Or Gerlitz  wrote:
> > I suggest we go that least bad way along the lines of your comment.
> >
> > If/when on some future point something constructive can be formed from
> > Jason's observations, changes will follow, agree?
> 
> Let's make sure we learn from our mistakes.  Let's say we create a
> new "ext_counters" directory.  What should the format of those files
> be?  Should they be assumed to be 64-bit quantities?  Do we want to
> allow some way of indicating the number of bits (ie 0-padded hex
> entries?)?

That is a good idea. Let's require counters_ext to be sane:

 1 Hex quantity of unspecified size
 2 Prefixed with 0x and leading zeros to fill out to size and allow
   userspace discovery of size
 3 Size must be a multiple of 4 bits.
 4 Counters do not saturate
 5 Counters wrap around at all F's back to 0.
 6 If the counter is resettable it is only via a local operation
   through netlink or sysfs or something. Not PMA reset.

Certainly, aside from some minor details and different string
formatting, the 64 bit counters Or proposes to add meet these
requirements when the port is used in Ethernet mode.
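
As a rough illustration of requirements 1-5 above, here is a minimal
userspace sketch in Python. It assumes a counters_ext file holds a
single 0x-prefixed, zero-padded hex value; the function names are
hypothetical and not part of any patch in this thread. The width is
recovered from the number of hex digits, and the delta between two
samples is taken modulo 2^width so that a wrap from all F's back to 0
is handled.

```python
def parse_padded_hex(text):
    """Return (value, width_bits) for a string such as '0x00000000000001f4'."""
    s = text.strip()
    if not s.lower().startswith("0x"):
        raise ValueError("expected a 0x prefix: %r" % s)
    digits = s[2:]
    if not digits:
        raise ValueError("no hex digits after the 0x prefix")
    # Each hex digit carries 4 bits, so the padding reveals the counter size.
    return int(digits, 16), 4 * len(digits)


def wrapped_delta(prev, curr, width_bits):
    """Increase between two samples of a free-running counter that wraps."""
    return (curr - prev) % (1 << width_bits)
```

For example, parse_padded_hex("0x00000000000001f4") gives (500, 64),
since 16 hex digits imply a 64 bit counter.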

How do you feel about having counters_ext appear in ethernet mode and
disappear in IB mode?

Jason


Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs

2011-11-07 Thread Roland Dreier
On Wed, Nov 2, 2011 at 10:16 AM, Or Gerlitz  wrote:
> I suggest we go that least bad way along the lines of your comment.
>
> If/when on some future point something constructive can be formed from
> Jason's observations, changes will follow, agree?

Let's make sure we learn from our mistakes.  Let's say we create a new
"ext_counters" directory.  What should the format of those files be?
Should they be assumed to be 64-bit quantities?  Do we want to allow
some way of indicating the number of bits (ie 0-padded hex entries?)?

 - R.


Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs

2011-11-02 Thread Or Gerlitz

On 11/1/2011 11:42 PM, Roland Dreier wrote:
> The least bad way forward does seem like it is probably the separate
> new directory thing


Hi Roland,

I suggest we go that least bad way along the lines of your comment.

If/when at some future point something constructive can be formed from
Jason's observations, changes will follow, agree?




Or.



Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs

2011-11-02 Thread Or Gerlitz

On 11/1/2011 11:58 PM, Jason Gunthorpe wrote:
> maybe you should patch to un-export them until things can be fixed
> sanely...


Jason,

I don't think we want to work this way; bugs are either opened,
attempted to be fixed, or fixed.


You can't just come out of the blue and remove functionality which
is buggy (BTW - to your taste; I don't agree with your conclusive
comments).

Or.



Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs

2011-11-02 Thread Or Gerlitz

On 11/2/2011 12:46 AM, Ira Weiny wrote:
> What do you mean "for both"?
>
> Ira


The sysfs PMA counters are functional for both IB and IBoE. The latter
uses a HW counter allocated per device/port, to which all the QPs created
on that port report their rx/tx bytes/packets; see commits
cfcde11c3d7ae175f49280bb6f913478c2f1bd8c and
c37791349cc79d025df6e9a4f896a7b0a97cdbd3


Or.


RE: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs

2011-11-01 Thread Hefty, Sean
> Let's not get into fairness here... I'm trying to make progress on my backlog
> but there are patches that for better or worse have been around for a year
> or more.

Along these lines, is there any news on when patchwork might be available 
again?  I've been trying to help review some of the backlog patches, but it's 
hard to do without seeing them.
 


Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs

2011-11-01 Thread Ira Weiny
On Tue, 1 Nov 2011 15:34:46 -0700
Or Gerlitz  wrote:

> Jason Gunthorpe  wrote:
> > Why have a sysfs counter at all when you can just ask the PMA and get
> > exactly the same data?
> 
> The HW/FW PMA agent isn't supported for IBoE, only for IB; the
> counters are for both

What do you mean "for both"?

Ira


> 
> Or.


-- 
Ira Weiny
Member of Technical Staff
Lawrence Livermore National Lab
925-423-8008
wei...@llnl.gov


Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs

2011-11-01 Thread Or Gerlitz
Jason Gunthorpe  wrote:
> Why have a sysfs counter at all when you can just ask the PMA and get
> exactly the same data?

The HW/FW PMA agent isn't supported for IBoE, only for IB; the
counters are for both

Or.


Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs

2011-11-01 Thread Or Gerlitz
On Tue, Nov 1, 2011 at 11:49 PM, Roland Dreier  wrote:
> There's no obligation to merge something just because you posted it before
> the merge window, and in fact Linus's complaint at the kernel summit is
> always that sub-maintainers don't say no enough.
>
> And let's be honest in this specific case: the world is not going to end
> without a few performance counters in sysfs.

Agreed on all. I just want to see progress here. It's okay for this
discussion and its like to miss this or that merge window, as long as at
some point we reach a resolution which allows fixing a patch and queuing
it for the next merge window. E.g. in this case, if we miss 3.2-rc1 and
you don't feel the patches are appropriate for -rc2, they can be queued
for 3.3

BTW - this brings something I wanted to raise long ago... I would be
happy to see the for-next branch active at all times, e.g in the same
manner net-next is, which means that patches aren't reviewed/accepted
only/mostly in the weeks before the merge window opens but rather
constantly over time. This shouldn't increase the amortized load on
you, and it would reduce the possible (maybe existing to some extent?)
frustration of people who submit patches long before the next window
opens and don't get much feedback... (again, not relevant to this patch
and the next)

Or.


Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs

2011-11-01 Thread Ira Weiny
On Tue, 1 Nov 2011 15:11:35 -0700
Jason Gunthorpe  wrote:

> On Tue, Nov 01, 2011 at 03:03:58PM -0700, Ira Weiny wrote:
> 
> > > And again, this is a useless interface in IB. 
> > 
> > What do you mean by this?
> 
> A counter that is randomly reset by an external IB performance manager 
> is not useful for collecting local statistical information.
> 
> A counter that saturates and cannot be reset locally is not useful for
> any automatic process.
> 
> Why have a sysfs counter at all when you can just ask the PMA and get
> exactly the same data?

I agree, but...  sys admins like sysfs files...  :-(

> 
> I can't make a SNMP MIB from this counter. I can't make graphs from
> this counter. I can't do anything with it except show it to the user
> and hope they understand when it was last reset, or understand what
> to do when it is saturated.

Ok, just making sure we are on the same page.

Ira

> 
> Jason


-- 
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
wei...@llnl.gov


Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs

2011-11-01 Thread Jason Gunthorpe
On Tue, Nov 01, 2011 at 03:03:58PM -0700, Ira Weiny wrote:

> > And again, this is a useless interface in IB. 
> 
> What do you mean by this?

A counter that is randomly reset by an external IB performance manager 
is not useful for collecting local statistical information.

A counter that saturates and cannot be reset locally is not useful for
any automatic process.

Why have a sysfs counter at all when you can just ask the PMA and get
exactly the same data?

I can't make a SNMP MIB from this counter. I can't make graphs from
this counter. I can't do anything with it except show it to the user
and hope they understand when it was last reset, or understand what
to do when it is saturated.
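
To make the saturation point concrete, here is a small Python
simulation comparing a saturating counter with a free-running one that
wraps. The helper names are hypothetical and the 32 bit width is chosen
only for illustration. Starting 10 below the maximum and counting 100
events, the saturating counter can only account for 10 of them, while
the wrapping counter still yields the full 100 through modular
subtraction.

```python
WIDTH_BITS = 32
MAX_VALUE = (1 << WIDTH_BITS) - 1


def saturating_add(counter, n):
    """Counter that sticks at the maximum value once it is reached."""
    return min(counter + n, MAX_VALUE)


def wrapping_add(counter, n):
    """Free-running counter that wraps from all F's back to 0."""
    return (counter + n) % (1 << WIDTH_BITS)


def observed_delta(prev, curr):
    """What a collector infers between two samples, assuming wrap semantics."""
    return (curr - prev) % (1 << WIDTH_BITS)


start = MAX_VALUE - 10
events = 100

saturated_seen = observed_delta(start, saturating_add(start, events))
wrapped_seen = observed_delta(start, wrapping_add(start, events))
```

Here saturated_seen is 10 and wrapped_seen is 100, so the saturating
counter silently lost 90 events.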

Jason


Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs

2011-11-01 Thread Ira Weiny
On Tue, 1 Nov 2011 14:52:00 -0700
Jason Gunthorpe  wrote:

> On Tue, Nov 01, 2011 at 02:42:33PM -0700, Roland Dreier wrote:
> 
[snip]

> 
> And again, this is a useless interface in IB. 

What do you mean by this?

Ira

> IBoE is going to be
> first real, serious, long term user, let's make it saner instead of
> keeping it as is forever?
> 
> Jason


-- 
Ira Weiny
Member of Technical Staff
Lawrence Livermore National Lab
925-423-8008
wei...@llnl.gov


Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs

2011-11-01 Thread Jason Gunthorpe
On Tue, Nov 01, 2011 at 11:46:08PM +0200, Or Gerlitz wrote:

> > In the same vein adding saturating but non-resettable PMA-esque
> > counters for IBoE seems pretty hackish to me.. Though I agree it is
> > not terribly relevant for 64 bit counters.
 
> To put things in place, the IB stack PMA counters aren't resettable
> through sysfs, still, under IB, the same counter set is readable
> through both mads and sysfs and resettable through mads.

Right, the sysfs interface is pretty much unusable for IB. Your work
to make it go on IBoE makes something that is very nearly usable, but
you can't write a tool that collects these counters from a port in IBoE
mode and also expect it to work in IB mode because the semantics are
different.

My argument here is that the semantics we have for the IB case are not
useful. Let us define sane semantics for the IBoE case and have a
longer term clean up to make the IB case follow them as well.

Sane semantics for a sysfs counter are:
  - Free-running
  - Non-saturating
  - No reset
  - 64 or 32 bit value, detectable by user-space

No 6 bit counters. No counters that saturate. No counters that
randomly reset.

To this end, I think exporting 64 bit and 32 bit counts of the same
value is not the way to go.

> As for the saturation thing, I didn't think about that, but you're
> probably right and all the IBA PMA counters are saturating, but as
> your comment said, the 64 bit case is practically okay

Will any counters that get exposed when IBoE is turned on not be 64
bits? There are not very many 64 bit PMA counters.

If yes, maybe you should patch to un-export them until things can be
fixed sanely...

Jason


Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs

2011-11-01 Thread Jason Gunthorpe
On Tue, Nov 01, 2011 at 02:42:33PM -0700, Roland Dreier wrote:

> I agree that it definitely is more appealing, if we have a 64-bit
> version of a counter, that we should just export that counter
> where we used to export the 32-bit version.

I think this falls under the 'undocumented, beware' API design.  This
interface isn't specified so exactly as to have set out how many bytes
are in the files and how many bits are in the numbers.

If you wrote a reader that can't handle a 20 byte integer with leading
zeros then your user space isn't following the API.

If your reader doesn't elegantly handle overflow to whatever type your
reader picked, then you aren't following the API.
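
A tolerant reader along these lines could look like the following
Python sketch. The function names are hypothetical. It accepts decimal
text with or without leading zeros, parses into an arbitrary-precision
integer so nothing overflows during parsing, and lets the caller check
explicitly whether the value fits the type it intends to store it in.

```python
def read_counter_text(text):
    """Parse a decimal counter, tolerating leading zeros and whitespace."""
    value = int(text.strip(), 10)
    if value < 0:
        raise ValueError("counter values cannot be negative")
    return value


def fits_in(value, bits):
    """True if value can be stored in an unsigned integer of the given width."""
    return value < (1 << bits)
```

A 20 digit, zero-padded string such as "00000000004294967296" parses to
4294967296, which fits in 64 bits but not in 32, so a reader that picked
a 32 bit type can detect the overflow instead of silently truncating.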

There are many examples of the kernel tweaking APIs along this
undocumented axis, and theoretically the text-free-form nature of
sysfs is supposed to save us from having to worry about exactly this
sort of case.

And again, this is a useless interface in IB. IBoE is going to be the
first real, serious, long term user, let's make it saner instead of
keeping it as is forever?

Jason


Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs

2011-11-01 Thread Roland Dreier
On Tue, Nov 1, 2011 at 11:14 AM, Or Gerlitz  wrote:
> Guys (Roland, Jason), I'm open to any comments, any time, for any
> patch, but for a patch which was posted weeks ago it's pretty unfair
> to have your comments coming only eight days after the merge window
> has been opened, lets try to come quickly to decision so I can fix
> this up along those lines,

Let's not get into fairness here... I'm trying to make progress on my backlog
but there are patches that for better or worse have been around for a year
or more.  And for this merge window, I did manage to do

 84 files changed, 4738 insertions(+), 984 deletions(-)

and include more than 60 patches.

There's no obligation to merge something just because you posted it before
the merge window, and in fact Linus's complaint at the kernel summit is
always that sub-maintainers don't say no enough.

And let's be honest in this specific case: the world is not going to end without
a few performance counters in sysfs.

 - R.


Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs

2011-11-01 Thread Or Gerlitz
Jason Gunthorpe  wrote:

> I don't mean the 32 bit counters are useless, I mean exposing PMA
> counters that saturate and can be randomly reset by external agents
> through sysfs is useless. You can't make any kind of data collection
> based on such a system.

> Ideally the sysfs counters are all non-saturating, non-resetting
> counters like everything else in the net stack. You need a different
> interface to the chip firmware to implement this, can't use the
> existing PMA stuff.

> In the same vein adding saturating but non-resettable PMA-esque
> counters for IBoE seems pretty hackish to me.. Though I agree it is
> not terribly relevant for 64 bit counters.

Jason,

To put things in place, the IB stack PMA counters aren't resettable
through sysfs, still, under IB, the same counter set is readable
through both mads and sysfs and resettable through mads.

As for the saturation thing, I didn't think about that, but you're
probably right and all the IBA PMA counters are saturating, but as
your comment said, the 64 bit case is practically okay

Or.


Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs

2011-11-01 Thread Or Gerlitz
On Tue, Nov 1, 2011 at 11:42 PM, Roland Dreier  wrote:

> The least bad way forward does seem like it is probably
> the separate new directory thing.

I agree

Or.


Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs

2011-11-01 Thread Roland Dreier
On Tue, Nov 1, 2011 at 11:37 AM, Jason Gunthorpe wrote:
> What's the problem here? If a 64 bit counter is available then export
> it as 64 bit otherwise keep exporting something smaller.
>
> I agree zero padding non-hex numbers isn't ideal. Export as hex?

I agree that it definitely is more appealing, if we have a 64-bit
version of a counter, that we should just export that counter
where we used to export the 32-bit version.

But I'm not sure there's a feasible way to do this without
breaking old userspace.  For sure we can't assume userspace
can cope with hex where we used to have decimal.  And
I don't think it's even a safe assumption that userspace
can cope with a 64-bit quantity where we used to have a
32-bit quantity.  It doesn't seem safe to assume that
userspace that used to work with 32-bit quantities can
cope with a 0-padded 20 digit value.

The least bad way forward does seem like it is probably
the separate new directory thing.

 - R.


Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs

2011-11-01 Thread Jason Gunthorpe
> > I don't see a problem with having a sysfs counter file being extended
> > to return a 64 bit number.. I think that is within the purview of
> > acceptable changes. Shame the counter wasn't exported as hex though -
> > makes it harder to signal if it is 32 or 64 bit.
> 
> If I understand you right, we would have traffic counters exposed
> through sysfs, where each counter is either a 32-bit value zero-padded
> into 64 bits or a true 64-bit one. A problem is that the four 32-bit
> traffic counters (rx/tx data/packets) are part of the IB port L2 basic
> counter set, which includes about ten more counters for different kinds
> of errors, whereas the 64-bit counters are traffic counters only. So
> what do you suggest for them? Use the same approach for the error
> counters as well, even though IB doesn't define 64-bit versions of
> them? Also, zero padding for something which isn't exported in hex is
> very ugly, isn't it?

What's the problem here? If a 64 bit counter is available then export
it as 64 bit otherwise keep exporting something smaller.

I agree zero padding non-hex numbers isn't ideal. Export as hex?

Broadly, this is another problem with the sysfs interface: the width
matters for any kind of serious data collection, and IBA defined
interesting widths for many of the counters, which were passed straight
through the sysfs interface with no means of discovery.
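
As a sketch of why hex would make the width discoverable: if a counter were
exported as zero-padded hex, a reader could infer the width from the digit
count alone. The format accepted below (optional "0x" prefix, four bits per
digit) and the name hex_counter_width_bits() are assumptions for
illustration, not an existing sysfs ABI.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Returns the counter width in bits implied by a zero-padded hex string,
 * or 0 if the text is not plain hex.
 */
unsigned int hex_counter_width_bits(const char *text)
{
    size_t digits = 0;

    /* Skip an optional "0x" / "0X" prefix. */
    if (text[0] == '0' && (text[1] == 'x' || text[1] == 'X'))
        text += 2;

    while (isxdigit((unsigned char)text[digits]))
        digits++;

    /* Accept the trailing newline that sysfs files normally end with. */
    if (text[digits] != '\0' && text[digits] != '\n')
        return 0;

    return (unsigned int)(digits * 4);
}
```

A 16-digit value would be reported as 64 bits wide and an 8-digit value as
32 bits wide, which is the size indication a decimal export cannot give
without padding.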

> > Frankly, exporting these PMA counters as saturate on maximum via sysfs
> > is pretty useless. Does anyone actually use them aside from a few scripts?
> 
> Under IB our monitoring code/scripts use perfquery/mads, whereas under
> IBoE we use sysfs. The mad approach allows resetting the counters, so
> the 32-bit counters aren't useless; reset via sysfs isn't supported, so
> the 64-bit counters are kind of a must. Anyway,

I don't mean the 32 bit counters are useless, I mean exposing PMA
counters that saturate and can be randomly reset by external agents
through sysfs is useless. You can't make any kind of data collection
based on such a system.

Ideally the sysfs counters are all non-saturating, non-resetting
counters like everything else in the net stack. You need a different
interface to the chip firmware to implement this, can't use the
existing PMA stuff.
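
The difference matters as soon as a collector computes per-interval deltas.
The sketch below contrasts the two cases; both function names are
hypothetical, and the saturating variant simply assumes that a reading at
the maximum, or one below the previous sample, makes the interval unusable.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Free-running counter: unsigned subtraction is modular, so a single
 * wraparound between two samples still yields the correct delta.
 */
uint64_t delta_free_running(uint64_t prev, uint64_t cur)
{
    return cur - prev;
}

/*
 * Saturating, resettable counter: once a sample reads the maximum, events
 * past that point were not counted, and a value below the previous sample
 * means some other agent reset the counter. Returns false in both cases
 * because no trustworthy delta exists.
 */
bool delta_saturating(uint64_t prev, uint64_t cur, uint64_t max,
                      uint64_t *delta)
{
    if (cur >= max || cur < prev)
        return false;

    *delta = cur - prev;
    return true;
}
```

The first function never has to give up; the second has to discard data
whenever the counter pegs or is reset behind the collector's back.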

In the same vein adding saturating but non-resettable PMA-esque
counters for IBoE seems pretty hackish to me.. Though I agree it is
not terribly relevant for 64 bit counters.

> > What would be useful is free running 64 bit sysfs counters that
> > are independent and not reset by PMA activity. Like all the other
> > Linux networking counters. That would be great. I hope that is
> > what is done for IBoE?
> 
> yes this is what the 64 bit counters are

IBA defined 64 bit counters are not free-running, they still
saturate. Does the firmware not do this in IBoE mode?

> > Unifying the counters to be semantically the same on IB and IBoE seems
> > like a very good idea.

> yes, this is what we do here

I disagree. Your IBoE counters cannot be reset, externally or
otherwise - aside from the saturating this makes them almost the same
as the usual Linux net counters. When the port is in IB mode the
counter doesn't have those properties.

That is a big semantic difference when compared to what these sysfs
files show for normal IB counters.

Jason


Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs

2011-11-01 Thread Or Gerlitz
Jason Gunthorpe  wrote:

Guys (Roland, Jason), I'm open to any comments, any time, for any
patch, but for a patch which was posted weeks ago it's pretty unfair
to have your comments arrive only eight days after the merge window
opened. Let's try to come to a decision quickly so I can fix this up
along those lines.

> I don't see a problem with having a sysfs counter file being extended
> to return a 64 bit number.. I think that is within the purview of
> acceptable changes. Shame the counter wasn't exported as hex though -
> makes it harder to signal if it is 32 or 64 bit.

If I understand you right, we would have traffic counters exposed
through sysfs, where each counter is either a 32-bit value zero-padded
into 64 bits or a true 64-bit one. A problem is that the four 32-bit
traffic counters (rx/tx data/packets) are part of the IB port L2 basic
counter set, which includes about ten more counters for different kinds
of errors, whereas the 64-bit counters are traffic counters only. So
what do you suggest for them? Use the same approach for the error
counters as well, even though IB doesn't define 64-bit versions of
them? Also, zero padding for something which isn't exported in hex is
very ugly, isn't it?

> Frankly, exporting these PMA counters as saturate on maximum via sysfs
> is pretty useless. Does anyone actually use them aside from a few scripts?

Under IB our monitoring code/scripts use perfquery/mads, whereas under
IBoE we use sysfs. The mad approach allows resetting the counters, so
the 32-bit counters aren't useless; reset via sysfs isn't supported, so
the 64-bit counters are kind of a must. Anyway,

> What would be useful is free running 64 bit sysfs counters that are
> independent and not reset by PMA activity. Like all the other Linux
> networking counters. That would be great. I hope that is what is done
> for IBoE?

yes this is what the 64 bit counters are

> Unifying the counters to be semantically the same on IB and IBoE seems
> like a very good idea.

yes, this is what we do here


Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs

2011-11-01 Thread Jason Gunthorpe
On Tue, Nov 01, 2011 at 07:23:52PM +0200, Or Gerlitz wrote:
> Jason Gunthorpe  wrote:
> > Is there any reason to expose the 32 and 64 bit version of the same
> > counter? That seems needless. Emit the largest version available and
> > prepend 0's to fill out to the available width so that userspace can
> > know the counter size.
> 
> Basically, the approach you suggest seems fine for IBoE, which is
> pretty new. However,
> 
> the problem is that the 32-bit counters have existed more or less from
> day one AND have the same semantics whether they are returned through
> sysfs or through perfquery and similar mad-based apps.
> In other words, there is probably legacy usage around IB today, and I
> don't think it would be wise to touch that area such that the 32-bit
> counters become embedded in 64-bit numbers. Thoughts?

I don't see a problem with having a sysfs counter file being extended
to return a 64 bit number.. I think that is within the purview of
acceptable changes. Shame the counter wasn't exported as hex though -
makes it harder to signal if it is 32 or 64 bit.

Frankly, exporting these PMA counters as saturate on maximum via sysfs
is pretty useless. Does anyone actually use them aside from a few
scripts? 

What would be useful is free running 64 bit sysfs counters that are
independent and not reset by PMA activity. Like all the other Linux
networking counters. That would be great. I hope that is what is done
for IBoE?

Unifying the counters to be semantically the same on IB and IBoE seems
like a very good idea.

Jason


Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs

2011-11-01 Thread Or Gerlitz
Jason Gunthorpe  wrote:
> Is there any reason to expose the 32 and 64 bit version of the same
> counter? That seems needless. Emit the largest version available and
> prepend 0's to fill out to the available width so that userspace can
> know the counter size.

Basically, the approach you suggest seems fine for IBoE, which is
pretty new. However,

the problem is that the 32-bit counters have existed more or less from
day one AND have the same semantics whether they are returned through
sysfs or through perfquery and similar mad-based apps.
In other words, there is probably legacy usage around IB today, and I
don't think it would be wise to touch that area such that the 32-bit
counters become embedded in 64-bit numbers. Thoughts?

Or.


Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs

2011-11-01 Thread Jason Gunthorpe
On Tue, Nov 01, 2011 at 08:40:12AM +0200, Or Gerlitz wrote:

> Today, e.g. in some IBoE perf monitoring scripts we wrote, the
> distinction is made like this: if the ext counter directory exists,
> read the counters from there, else read from the non-extended counters
> directory. With the change you propose, that if (.) would become a
> little less elegant: it would check whether this or that --file--
> exists (e.g. the 64-bit tx data counter) and, if so, read the four
> 64-bit counters (rx/tx packets/data), else the four 32-bit counters.
> So from our user standpoint, different dirs seem better, but we can get
> along with the same dir with different contents depending on the
> device.

Is there any reason to expose the 32 and 64 bit version of the same
counter? That seems needless. Emit the largest version available and
prepend 0's to fill out to the available width so that userspace can
know the counter size.
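
A minimal sketch of that padding scheme, assuming decimal output: pad to the
longest decimal the width can hold, which is 10 digits for 32 bits
(4294967295) and 20 digits for 64 bits (18446744073709551615). The helper
name format_padded_counter() is hypothetical, and this is user-space C for
illustration rather than the sysfs show routine from the patch.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Formats value as zero-padded decimal so the digit count reveals the
 * counter width: 20 digits for a 64-bit counter, 10 digits otherwise.
 * Returns what snprintf() returns.
 */
int format_padded_counter(char *buf, size_t len, uint64_t value,
                          unsigned int bits)
{
    int width = (bits == 64) ? 20 : 10;

    return snprintf(buf, len, "%0*" PRIu64 "\n", width, value);
}
```

A reader could then tell the two widths apart by string length, although the
leading zeros are exactly what a base-detecting parser may mistake for octal.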

Jason


Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs

2011-10-31 Thread Or Gerlitz
On Mon, Oct 31, 2011 at 9:38 PM, Roland Dreier  wrote:

> Sorry for the late review here

Oh yes... BTW this is patch 4/5, I don't see patches 1,2,3 on your for-next
tree/branch @ kernel.org, have you accepted them?


> Sorry for the late review here, but does it seem like the best
> approach to have a separate "counters_ext" directory for
> some subset of performance counters?  Instead we could
> have two attribute_groups, one the basic counters and one
> the basic and extended counters, and basically do
>
>        if (is_pma_class_cap_ext_width(device, port_num) == 0)
>                sysfs_create_group(...basic and extended counters...)
>        else
>                sysfs_create_group(...basic counters...)

Basically, I don't see a problem with having one directory along the
lines of your suggestion.

> Or is there some reason why users would want to make the
> distinction between basic and extended counters?

Today, e.g. in some IBoE perf monitoring scripts we wrote, the
distinction is made like this: if the ext counter directory exists,
read the counters from there, else read from the non-extended counters
directory. With the change you propose, that if (.) would become a
little less elegant: it would check whether this or that --file--
exists (e.g. the 64-bit tx data counter) and, if so, read the four
64-bit counters (rx/tx packets/data), else the four 32-bit counters.
So from our user standpoint, different dirs seem better, but we can get
along with the same dir with different contents depending on the
device.

> (by the way, is_pma_class_cap_ext_width() seems backwards
> since it returns 0 for true.  How about bool pma_has_ext_width()
> and have it return true if the extended counters ARE supported?)

Sure, I will be able to handle this little change on Wednesday, but we
should still have enough time for the current merge window, correct?

Or.


Re: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs

2011-10-31 Thread Roland Dreier
On Mon, Oct 10, 2011 at 1:56 AM, Or Gerlitz  wrote:
> +static struct attribute_group pma_ext_group = {
> +       .name  = "counters_ext",
> +       .attrs  = pma_attrs_ext
> +};

Sorry for the late review here, but does it seem like the best
approach to have a separate "counters_ext" directory for
some subset of performance counters?  Instead we could
have two attribute_groups, one the basic counters and one
the basic and extended counters, and basically do

        if (is_pma_class_cap_ext_width(device, port_num) == 0)
                sysfs_create_group(...basic and extended counters...)
        else
                sysfs_create_group(...basic counters...)

(by the way, is_pma_class_cap_ext_width() seems backwards
since it returns 0 for true.  How about bool pma_has_ext_width()
and have it return true if the extended counters ARE supported?)
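
A user-space sketch of the suggested shape, with a predicate that returns
true when extended counters ARE supported and a single "counters" group
chosen from it. Everything here is a stand-in: struct counter_group mimics
the kernel's struct attribute_group, EXAMPLE_CAP_EXT_WIDTH is a placeholder
bit, passing a capability mask is an assumed signature, and the *_64
attribute names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder capability bit, chosen for illustration only. */
#define EXAMPLE_CAP_EXT_WIDTH (1u << 0)

/* Stand-in for the kernel's struct attribute_group. */
struct counter_group {
    const char *name;
    const char *const *attrs;   /* NULL-terminated list of file names */
};

static const char *const basic_attrs[] = {
    "port_xmit_data", "port_rcv_data",
    "port_xmit_packets", "port_rcv_packets", NULL
};

/* Basic counters plus hypothetical 64-bit variants. */
static const char *const basic_and_ext_attrs[] = {
    "port_xmit_data", "port_rcv_data",
    "port_xmit_packets", "port_rcv_packets",
    "port_xmit_data_64", "port_rcv_data_64",
    "port_xmit_packets_64", "port_rcv_packets_64", NULL
};

static const struct counter_group basic_group = {
    "counters", basic_attrs
};
static const struct counter_group basic_and_ext_group = {
    "counters", basic_and_ext_attrs
};

/* True when the extended-width counters are supported. */
bool pma_has_ext_width(uint16_t cap_mask)
{
    return (cap_mask & EXAMPLE_CAP_EXT_WIDTH) != 0;
}

/* Picks the one group to register for a port. */
const struct counter_group *pick_counter_group(uint16_t cap_mask)
{
    return pma_has_ext_width(cap_mask) ? &basic_and_ext_group
                                       : &basic_group;
}
```

The caller then reads naturally as "if the port has extended width, create
the larger group", without the == 0 comparison.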

Or is there some reason why users would want to make the
distinction between basic and extended counters?

 - R.