> From: Jason Gunthorpe <j...@nvidia.com>
> Sent: Wednesday, April 7, 2021 8:44 PM
> 
> On Wed, Apr 07, 2021 at 03:06:35PM +0000, Parav Pandit wrote:
> >
> >
> > > From: Jason Gunthorpe <j...@nvidia.com>
> > > Sent: Tuesday, April 6, 2021 9:17 PM
> > >
> > > On Mon, Apr 05, 2021 at 08:49:56AM +0300, Leon Romanovsky wrote:
> > > > @@ -2293,6 +2295,17 @@ static void ib_sa_event(struct
> > > > ib_event_handler
> > > *handler,
> > > >         }
> > > >  }
> > > >
> > > > > +static bool ib_sa_client_supported(struct ib_device *device)
> > > > > +{
> > > > +       unsigned int i;
> > > > +
> > > > +       rdma_for_each_port(device, i) {
> > > > +               if (rdma_cap_ib_sa(device, i))
> > > > +                       return true;
> > > > +       }
> > > > +       return false;
> > > > +}
> > >
> > > This is already done though:
> 
> > It is, but ib_sa_device() allocates ib_sa_device worth of struct for
> > each port without checking rdma_cap_ib_sa().  This results in
> > allocating 40 * 512 = 20480 bytes, rounded up to a power of 2 (32K bytes),
> > of memory for an rdma device with 512 ports.  Other modules are also
> > similarly wasting such memory.
> 
> If it returns EOPNOTSUPP then the remove is never called so if it allocated
> memory and left it allocated then it is leaking memory.
> 
I probably confused you. There is no leak today: add_one() allocates the
memory, and when the per-port SA/CM etc. cap is not present, the memory is
left unused until remove_one() frees it.
Returning -EOPNOTSUPP at the start of add_one(), before any allocation, is fine.
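
To make the ordering concrete, here is a rough user-space sketch of what I mean by checking before allocating. It is not the kernel code: struct mock_device, struct mock_sa_port and the mock_*() helpers are made-up stand-ins for ib_device, the per-port SA data and the client callbacks, and the 40-byte per-port size is only the number mentioned in this thread.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ib_device; not the kernel structure. */
struct mock_device {
	unsigned int num_ports;
	const bool *port_has_sa;	/* one capability flag per port */
	void *client_data;		/* per-client allocation, NULL if none */
};

/* Hypothetical per-port state; 40 bytes is the figure quoted in this thread. */
struct mock_sa_port {
	char pad[40];
};

/* Mirrors the idea of the proposed helper: true if any port has the cap. */
static bool mock_client_supported(const struct mock_device *dev)
{
	unsigned int i;

	for (i = 0; i < dev->num_ports; i++) {
		if (dev->port_has_sa[i])
			return true;
	}
	return false;
}

/* The supported check sits at the front, so nothing is allocated on failure. */
static int mock_add_one(struct mock_device *dev)
{
	if (!mock_client_supported(dev))
		return -EOPNOTSUPP;

	dev->client_data = calloc(dev->num_ports, sizeof(struct mock_sa_port));
	if (!dev->client_data)
		return -ENOMEM;
	return 0;
}

/* Only reached for devices where mock_add_one() returned 0. */
static void mock_remove_one(struct mock_device *dev)
{
	free(dev->client_data);
	dev->client_data = NULL;
}
```

With the check first, a device with no capable port returns -EOPNOTSUPP with client_data still NULL, so there is nothing for a remove callback to free.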

> If you are saying 32k bytes of temporary allocation matters during device
> startup then it needs benchmarks and a use case.
> 
The use case is clear and explained in the commit logs: do not allocate
memory that is never used.
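
For reference, the 32K number quoted above can be reproduced with a small stand-alone sketch. It assumes the allocator rounds the request up to the next power of two, as stated earlier in the thread; round_up_pow2() and per_device_bytes() are illustrative helper names, not kernel functions.

```c
#include <stddef.h>

/* Round n up to the next power of two; illustrative helper, assumes n > 0. */
static size_t round_up_pow2(size_t n)
{
	size_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/*
 * Bytes consumed per device if per-port state is allocated for every port
 * and the allocator rounds the request up to a power of two (assumption).
 */
static size_t per_device_bytes(size_t per_port_bytes, size_t num_ports)
{
	return round_up_pow2(per_port_bytes * num_ports);
}
```

Calling per_device_bytes(40, 512) gives 40 * 512 = 20480 bytes requested and 32768 bytes after rounding.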

> > > The add_one function should return -EOPNOTSUPP if it doesn't want to
> > > run on this device and any supported checks should just be at the
> > > front - this is how things work right now
> 
> > I am ok to fold this check at the beginning of add callback.  When
> > 512 to 1K RoCE devices are used, they do not have SA, CM, CMA etc caps
> > on and all the client needs to go through refcnt + xa + sem and unroll
> > them.  Is_supported() routine helps to cut down all of it. I didn't
> > calculate the usec saved with it.
> 
> If that is the reason then explain in the cover letter and provide benchmarks
I doubt it will be significant but I will do a benchmark.
