Re: [PATCH 3/3] IB/sa: route SA pathrecord query through netlink

2015-05-20 Thread Jason Gunthorpe
On Wed, May 20, 2015 at 04:13:59PM +, Hefty, Sean wrote:
> > The other issue is that each caller in the kernel specifies a different
> > timeout.  Defining this in 1 central place and allowing user space to
> > control
> > the policy of that timeout is much better than allowing the kernel clients
> > to
> > specify the timeout as they do now.
> 
> Everything has been randomly hard-coded.  IMO, the sa_query module
> should use its own timeout value, which it updates based on how fast
> it actually gets responses.  But that takes too much work, and no
> one is ever going to write the code to do this.

The IB spec actually says how to compute the timeout for the SA, and
if done properly the SM will configure a timeout appropriate for the
network. It looks like everything the kernel does in this area is
basically wrong..

> For the netlink specific problem, I'll propose using a different
> randomly hard-coded value as a timeout.  Then define an 'MRA' type
> of message that user space can send to the kernel in order to ask it
> to wait longer.  The 'set timeout' message could apply to a single
> request or all future requests.  If we only wanted to the 'all
> future requests' option, the data value could be written into a
> file.  In any case, this pushes the policy of the timeout value into
> the hands of the responding daemon.

A fixed will known timeout (5s?) and require that user space send a
'operation in progress' positive liveness indication seems very
reasonable.

The only purpose of the timeout is to detect a locked up daemon so IB
doesn't lock up, so a watchdog like scheme seems appropriate.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] IB/sa: route SA pathrecord query through netlink

2015-05-20 Thread ira.weiny
On Wed, May 20, 2015 at 10:13:59AM -0600, Hefty, Sean wrote:
> > The other issue is that each caller in the kernel specifies a different
> > timeout.  Defining this in 1 central place and allowing user space to
> > control
> > the policy of that timeout is much better than allowing the kernel clients
> > to
> > specify the timeout as they do now.
> 
> Everything has been randomly hard-coded.  IMO, the sa_query module should use
> its own timeout value, which it updates based on how fast it actually gets
> responses.  But that takes too much work, and no one is ever going to write
> the code to do this.

I agree.  So lets not do this again.

> 
> For the netlink specific problem, I'll propose using a different randomly
> hard-coded value as a timeout.

Isn't this what we have as the start default to the module parameter?

> Then define an 'MRA' type of message that
> user space can send to the kernel in order to ask it to wait longer.  The
> 'set timeout' message could apply to a single request or all future requests.
> If we only wanted to the 'all future requests' option, the data value could
> be written into a file.

This is effectively what we have with the module parameter.

> In any case, this pushes the policy of the timeout
> value into the hands of the responding daemon.
> 

I'm flexible on the mechanism used.  In addition to the module parameter, we
discussed sysctl as well as an additional control protocol for configuring the
communication.  We avoided these to keep the mechanism simple.  But since we
are going down the path of designing a new protocol I think it is reasonable to
have some "control" packets there.

Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 3/3] IB/sa: route SA pathrecord query through netlink

2015-05-20 Thread Hefty, Sean
> The other issue is that each caller in the kernel specifies a different
> timeout.  Defining this in 1 central place and allowing user space to
> control
> the policy of that timeout is much better than allowing the kernel clients
> to
> specify the timeout as they do now.

Everything has been randomly hard-coded.  IMO, the sa_query module should use 
its own timeout value, which it updates based on how fast it actually gets 
responses.  But that takes too much work, and no one is ever going to write the 
code to do this.

For the netlink specific problem, I'll propose using a different randomly 
hard-coded value as a timeout.  Then define an 'MRA' type of message that user 
space can send to the kernel in order to ask it to wait longer.  The 'set 
timeout' message could apply to a single request or all future requests.  If we 
only wanted to the 'all future requests' option, the data value could be 
written into a file.  In any case, this pushes the policy of the timeout value 
into the hands of the responding daemon.

- Sean
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 3/3] IB/sa: route SA pathrecord query through netlink

2015-05-19 Thread Hefty, Sean
> I'm not sure there is much use case for letting user space get
> involved in arbitary MAD traffic..

I was thinking more about easily supporting other queries - ServiceRecords, 
MCMemberRecords, etc.  The only use case I'm currently aware of is PathRecords, 
however.

> Over generalizing feels like over design and doesn't let us add
> valuable information that could help push policy decisions into user
> space, like passing the IP and TOS, QP type, etc when available, to
> userspace.

I agree.  This level of control would be better.  The current security model of 
IB seems hopelessly broken.  IMO, all path data should come from privileged 
users, without any ability of the app to modify it.

Changing the protocol shouldn't be a big deal, though if you wanted to make my 
life easy, we could just adopt the ibacm sockets protocol directly.  :)

- Sean
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 3/3] IB/sa: route SA pathrecord query through netlink

2015-05-19 Thread Hefty, Sean
> > N?r??yb?X??ǧv?^?)޺{.n?+{??ٚ?{ay?ʇڙ?,j
??f???h???z??w???
> 
> ???j:+v???w?j?m
zZ+?ݢj"??!
> 
> Does anyone else ever wonder what this garbage is on some of the
> messages from @intel folks?

I believe it's some weird formatting of:

> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

I have no idea why we get it.  I send/receive in plain text. 


Re: [PATCH 3/3] IB/sa: route SA pathrecord query through netlink

2015-05-19 Thread Or Gerlitz
On Tue, May 19, 2015 at 3:24 PM, Wan, Kaike  wrote:
>> On 5/18/2015 10:00 PM, kaike@intel.com wrote:

>> > --- a/drivers/infiniband/core/sa_query.c
>> > +++ b/drivers/infiniband/core/sa_query.c
>> > @@ -45,12 +45,22 @@

>> what's actually the role of the module param? why it belongs here? is that
>> really unavoidable to have it?

> It's nice to provide the capability for the user to adjust the netlink 
> timeout based his/her fabric setup.

NO, adding module params should be really your last resort when
everything else wouldn't work, definitely not the case here, remove
it.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html