Re: [RFC] AF_ALG AIO and IV

2018-01-16 Thread Jonathan Cameron
On Tue, 16 Jan 2018 07:28:06 +0100
Stephan Mueller  wrote:

> Am Montag, 15. Januar 2018, 15:42:58 CET schrieb Jonathan Cameron:
> 
> Hi Jonathan,
> 
> > > What about:
> > > 
> > > sendmsg(IV, data)
> > > sendmsg(data)
> > > ..
> > > AIO recvmsg with multiple IOCBs
> > > AIO recvmsg with multiple IOCBs
> > > ..
> > > sendmsg(IV, data)
> > > ..
> > > 
> > > This implies, however, that before the sendmsg with the second IV is sent,
> > > all AIO operations from the first invocation would need to be finished.  
> > Yes that works fine, but rather restricts the flow - you would end up
> > waiting until you could concatenate a bunch of data in userspace so as to
> > trade off against the slow down whenever you need to synchronize back up to
> > userspace.  
> 
> I think the solution is already present and even libkcapi's architecture is 
> set up to handle this scenario:
> 
> We have 2 types of FDs: one obtained from socket() and one from accept(). The 
> socket-FD is akin to the TFM. The accept FD is exactly what you want:
> 
> tfmfd = socket()
> setkey(tfmfd)
> opfd = accept()
> opfd2 = accept()
> sendmsg(opfd, IV, data)
> recvmsg(opfd, data)
> 
> sendmsg(opfd2, IV, data)
> recvmsg(opfd2, data)
> 
> sendmsg(opfd, data)
> ..
> 
> There can be multiple FDs from accept(), and these are the "identifiers" for 
> the cipher operation streams that belong together.
> 
> libkcapi already has the architecture for this type of work, but it is not 
> exposed in the API yet. The internal calls for sendmsg/recvmsg all take an 
> (op)FD parameter; e.g. _kcapi_common_send_meta_fd has the fdptr variable. 
> internal.h currently wraps this call in _kcapi_common_send_meta, where the 
> handle->opfd variable is used.
> 
> The reason I implemented it that way is that the caller can maintain an array 
> of opfds. If we exposed these internal calls with the FD argument, you could 
> maintain multiple opfds to implement your use case.
> 
> The only change would be to expose the internal libkcapi calls.
> 
> Ciao
> Stephan

Thanks, I'll take a look at this soonish. Having a busy week.

Jonathan
> 
> 



Re: [RFC] AF_ALG AIO and IV

2018-01-15 Thread Stephan Mueller
Am Montag, 15. Januar 2018, 15:42:58 CET schrieb Jonathan Cameron:

Hi Jonathan,

> > What about:
> > 
> > sendmsg(IV, data)
> > sendmsg(data)
> > ..
> > AIO recvmsg with multiple IOCBs
> > AIO recvmsg with multiple IOCBs
> > ..
> > sendmsg(IV, data)
> > ..
> > 
> > This implies, however, that before the sendmsg with the second IV is sent,
> > all AIO operations from the first invocation would need to be finished.
> Yes that works fine, but rather restricts the flow - you would end up
> waiting until you could concatenate a bunch of data in userspace so as to
> trade off against the slow down whenever you need to synchronize back up to
> userspace.

I think the solution is already present and even libkcapi's architecture is 
set up to handle this scenario:

We have 2 types of FDs: one obtained from socket() and one from accept(). The 
socket-FD is akin to the TFM. The accept FD is exactly what you want:

tfmfd = socket()
setkey(tfmfd)
opfd = accept()
opfd2 = accept()
sendmsg(opfd, IV, data)
recvmsg(opfd, data)

sendmsg(opfd2, IV, data)
recvmsg(opfd2, data)

sendmsg(opfd, data)
..

There can be multiple FDs from accept(), and these are the "identifiers" for 
the cipher operation streams that belong together.

libkcapi already has the architecture for this type of work, but it is not 
exposed in the API yet. The internal calls for sendmsg/recvmsg all take an 
(op)FD parameter; e.g. _kcapi_common_send_meta_fd has the fdptr variable. 
internal.h currently wraps this call in _kcapi_common_send_meta, where the 
handle->opfd variable is used.

The reason I implemented it that way is that the caller can maintain an array 
of opfds. If we exposed these internal calls with the FD argument, you could 
maintain multiple opfds to implement your use case.

The only change would be to expose the internal libkcapi calls.

Ciao
Stephan




Re: [RFC] AF_ALG AIO and IV

2018-01-15 Thread Jonathan Cameron
On Mon, 15 Jan 2018 15:31:42 +0100
Stephan Mueller  wrote:

> Am Montag, 15. Januar 2018, 15:25:38 CET schrieb Jonathan Cameron:
> 
> Hi Jonathan,
> 
> > On Mon, 15 Jan 2018 14:15:42 +0100
> > 
> > Stephan Mueller  wrote:  
> > > Am Montag, 15. Januar 2018, 13:59:27 CET schrieb Jonathan Cameron:
> > > 
> > > Hi Jonathan,
> > >   
> > > > > But there may be hardware that cannot/will not track such
> > > > > dependencies.
> > > > > Yet, it has multiple hardware queues. Such hardware can still handle
> > > > > parallel requests when they are totally independent from each other.
> > > > > For
> > > > > such a case, AF_ALG currently has no support, because it lacks the
> > > > > support for setting multiple IVs for the multiple concurrent calls.  
> > > > 
> > > > Agreed, something like your new support is needed - I just suspect we
> > > > need
> > > > a level between one socket one iv chain and every IOCB with own IV and
> > > > right now the only way to hit that balance is to have a separate socket
> > > > for each IV chain.  Not exactly efficient use of resources though it
> > > > will
> > > > work.  
> > > 
> > > How would you propose such support via AF_ALG?
> > > Wouldn't it be possible to
> > > arrange the IOVEC array in user space appropriately before calling the
> > > AF_ALG interface? In this case, I would still see that the current AF_ALG
> > > (plus the patch) would support all use cases I am aware of.  
> > 
> > I'm not sure how that would work, but maybe I'm missing something - are you
> > suggesting we could contrive the situation where the kernel side can tell
> > it is getting the same IV multiple times and hence know that it should chain
> > it?  We are talking streaming here - we don't have the data for the later
> > elements when the first ones are queued up.
> > 
> > One approach to handling token based IV - where the token refers to an IV
> > without being its value would be to add another flag similar to the one
> > you used for inline IV.  
> 
> What about:
> 
> sendmsg(IV, data)
> sendmsg(data)
> ..
> AIO recvmsg with multiple IOCBs
> AIO recvmsg with multiple IOCBs
> ..
> sendmsg(IV, data)
> ..
> 
> This implies, however, that before the sendmsg with the second IV is sent, 
> all 
> AIO operations from the first invocation would need to be finished.

Yes that works fine, but rather restricts the flow - you would end up waiting
until you could concatenate a bunch of data in userspace so as to trade
off against the slow down whenever you need to synchronize back up to userspace.

> > 
> > You would then set the IV as you have done, but also provide a magic value
> > by which to track the chain.  Later IOCBs using the same IV chain would
> > just provide the magic token.
> > 
> > You'd also need some way of retiring the IV eventually once you were done
> > with it or ultimately you would run out of resources.  
> 
> Let me think about that approach a bit.
> 
> > > 
> > > What AF_ALG should do is to enable different vendors like yourself to use
> > > the most appropriate solution. AF_ALG shall not limit users in any way.  
> > Agreed, but we also need to have some consistency for userspace to have some
> > awareness of what it should be using.  Last thing we want is lots of higher
> > level software having to have knowledge of the encryption hardware
> > underneath. Hence I think we should keep the options to the minimum
> > possible or put the burden on drivers that must play well with all options
> > (albeit not as efficiently for the ones that work badly for them).
> >   
> > > Thus, AF_ALG allows multiple sockets, if desired. It allows a stream usage
> > > with one setiv call applicable to multiple cipher operations. And with the
> > > offered patch it would allow multiple concurrent and yet independent
> > > cipher
> > > operations. Whichever use case is right for you, AF_ALG should not block
> > > you from applying it. Yet, what is good for you may not be good for
> > > others. Thus, these others may implement a different usage strategy for
> > > AF_ALG. The good thing is that this strategy is defined by user space.
> > > 
> > > In case you see a use case that is prevented by AF_ALG, it would be great
> > > to hear about it to see whether we can support it.  
> > 
> > The use case isn't blocked, but if you have hardware that is doing the IV
> > management then it is not efficiently handled.  Either
> > 1) You move the chaining up to userspace - throughput on a given chain will
> >be awful - basically all the advantages of AIO are gone - fine if you
> > know you only care about bandwidth with lots of separate IV chains.  
> 
> This does not sound like the right path.
> > 
> > 2) You open a socket per IV chain and eat resources.  
> 
> Ok, AF_ALG allows this.

That was my plan before this discussion started.  Ugly but works without
any AF_ALG changes.

We can probably play some internal games to make this not as bad as it
initially seems, but still not nice.

Jonathan
> > 
> > Jonathan

Re: [RFC] AF_ALG AIO and IV

2018-01-15 Thread Jonathan Cameron
On Mon, 15 Jan 2018 14:25:38 +
Jonathan Cameron  wrote:

> On Mon, 15 Jan 2018 14:15:42 +0100
> Stephan Mueller  wrote:
> 
> > Am Montag, 15. Januar 2018, 13:59:27 CET schrieb Jonathan Cameron:
> > 
> > Hi Jonathan,  
> > > > 
> > > > But there may be hardware that cannot/will not track such dependencies.
> > > > Yet, it has multiple hardware queues. Such hardware can still handle
> > > > parallel requests when they are totally independent from each other. For
> > > > such a case, AF_ALG currently has no support, because it lacks the
> > > > support for setting multiple IVs for the multiple concurrent calls.
> > > 
> > > Agreed, something like your new support is needed - I just suspect we need
> > > a level between one socket one iv chain and every IOCB with own IV and
> > > right now the only way to hit that balance is to have a separate socket
> > > for each IV chain.  Not exactly efficient use of resources though it will
> > > work.
> > 
> > How would you propose such support via AF_ALG? 
> > Wouldn't it be possible to 
> > arrange the IOVEC array in user space appropriately before calling the 
> > AF_ALG 
> > interface? In this case, I would still see that the current AF_ALG (plus 
> > the 
> > patch) would support all use cases I am aware of.  
> 
> I'm not sure how that would work, but maybe I'm missing something - are you
> suggesting we could contrive the situation where the kernel side can tell
> it is getting the same IV multiple times and hence know that it should chain
> it?  We are talking streaming here - we don't have the data for the
> later elements when the first ones are queued up.
> 
> One approach to handling token based IV - where the token refers to an IV 
> without
> being its value would be to add another flag similar to the one you used for
> inline IV.
> 
> You would then set the IV as you have done, but also provide a magic value by
> which to track the chain.  Later IOCBs using the same IV chain would just
> provide the magic token.
> 
> You'd also need some way of retiring the IV eventually once you were done
> with it or ultimately you would run out of resources.
> 
> >   
> > > > > So the only one left is the case 3 above where the hardware is capable
> > > > > of doing the dependency tracking.
> > > > > 
> > > > > We can support that in two ways but one is rather heavyweight in terms
> > > > > of
> > > > > resources.
> > > > > 
> > > > > 1) Whenever we want to allocate a new context we spin up a new socket
> > > > > and
> > > > > effectively associate a single IV with that (and its chained updates)
> > > > > much
> > > > > like we do in the existing interface.
> > > > 
> > > > I would not like that because it is too heavyweight. Moreover, 
> > > > considering
> > > > the kernel crypto API logic, a socket is the user space equivalent of a
> > > > TFM. I.e. for setting an IV, you do not need to re-instantiate a TFM.   
> > > >  
> > > 
> > > Agreed, though as I mention above if you have multiple processes you
> > > probably want to give them their own resources anyway (own socket and
> > > probably hardware queue if you can spare one) so as to avoid denial of
> > > service from one to another.
> > 
> > That is for sure: different processes shall never share a socket, as 
> > otherwise 
> > one could obtain data belonging to the other, which would violate process 
> > isolation.
> >   
> > > > > 2) We allow a token based tracking of IVs.  So userspace code 
> > > > > maintains
> > > > > a counter and tags every message and the initial IV setup with that
> > > > > counter.
> > > > 
> > > > I think that with the option I offer with the patch, we have an even
> > > > more lightweight approach.
> > > 
> > > Except that I think you have to go all the way back to userspace - unless 
> > > I
> > > am missing the point - you can't have multiple elements of a stream queued
> > > up. Performance will stink if you have a small number of contexts and 
> > > can't
> > > keep the processing engines busy.  At the moment option 1 here is the only
> > > way to implement this.
> > 
> > What AF_ALG should do is to enable different vendors like yourself to use 
> > the 
> > most appropriate solution. AF_ALG shall not limit users in any way.  
> 
> Agreed, but we also need to have some consistency for userspace to have some
> awareness of what it should be using.  Last thing we want is lots of higher
> level software having to have knowledge of the encryption hardware underneath.
> Hence I think we should keep the options to the minimum possible or put the
> burden on drivers that must play well with all options (albeit not as
> efficiently for the ones that work badly for them).
> 
> > 
> > Thus, AF_ALG allows multiple sockets, if desired. It allows a stream usage 
> > with one setiv call applicable to multiple cipher operations. And with the 
> > offered patch it would allow multiple concurrent and yet independent cipher 
> > operations. Whichever use case is right 

Re: [RFC] AF_ALG AIO and IV

2018-01-15 Thread Stephan Mueller
Am Montag, 15. Januar 2018, 15:25:38 CET schrieb Jonathan Cameron:

Hi Jonathan,

> On Mon, 15 Jan 2018 14:15:42 +0100
> 
> Stephan Mueller  wrote:
> > Am Montag, 15. Januar 2018, 13:59:27 CET schrieb Jonathan Cameron:
> > 
> > Hi Jonathan,
> > 
> > > > But there may be hardware that cannot/will not track such
> > > > dependencies.
> > > > Yet, it has multiple hardware queues. Such hardware can still handle
> > > > parallel requests when they are totally independent from each other.
> > > > For
> > > > such a case, AF_ALG currently has no support, because it lacks the
> > > > support for setting multiple IVs for the multiple concurrent calls.
> > > 
> > > Agreed, something like your new support is needed - I just suspect we
> > > need
> > > a level between one socket one iv chain and every IOCB with own IV and
> > > right now the only way to hit that balance is to have a separate socket
> > > for each IV chain.  Not exactly efficient use of resources though it
> > > will
> > > work.
> > 
> > How would you propose such support via AF_ALG?
> > Wouldn't it be possible to
> > arrange the IOVEC array in user space appropriately before calling the
> > AF_ALG interface? In this case, I would still see that the current AF_ALG
> > (plus the patch) would support all use cases I am aware of.
> 
> I'm not sure how that would work, but maybe I'm missing something - are you
> suggesting we could contrive the situation where the kernel side can tell
> it is getting the same IV multiple times and hence know that it should chain
> it?  We are talking streaming here - we don't have the data for the later
> elements when the first ones are queued up.
> 
> One approach to handling token based IV - where the token refers to an IV
> without being it's value would be to add another flag similar to the one
> you used for inline IV.

What about:

sendmsg(IV, data)
sendmsg(data)
..
AIO recvmsg with multiple IOCBs
AIO recvmsg with multiple IOCBs
..
sendmsg(IV, data)
..

This implies, however, that before the sendmsg with the second IV is sent, all 
AIO operations from the first invocation would need to be finished.
> 
> You would then set the IV as you have done, but also provide a magic value
> by which to track the chain.  Later IOCBs using the same IV chain would
> just provide the magic token.
> 
> You'd also need some way of retiring the IV eventually once you were done
> with it or ultimately you would run out of resources.

Let me think about that approach a bit.

> > 
> > What AF_ALG should do is to enable different vendors like yourself to use
> > the most appropriate solution. AF_ALG shall not limit users in any way.
> Agreed, but we also need to have some consistency for userspace to have some
> awareness of what it should be using.  Last thing we want is lots of higher
> level software having to have knowledge of the encryption hardware
> underneath. Hence I think we should keep the options to the minimum
> possible or put the burden on drivers that must play well with all options
> (albeit not as efficiently for the ones that work badly for them).
> 
> > Thus, AF_ALG allows multiple sockets, if desired. It allows a stream usage
> > with one setiv call applicable to multiple cipher operations. And with the
> > offered patch it would allow multiple concurrent and yet independent
> > cipher
> > operations. Whichever use case is right for you, AF_ALG should not block
> > you from applying it. Yet, what is good for you may not be good for
> > others. Thus, these others may implement a different usage strategy for
> > AF_ALG. The good thing is that this strategy is defined by user space.
> > 
> > In case you see a use case that is prevented by AF_ALG, it would be great
> > to hear about it to see whether we can support it.
> 
> The use case isn't blocked, but if you have hardware that is doing the IV
> management then it is not efficiently handled.  Either
> 1) You move the chaining up to userspace - throughput on a given chain will
>be awful - basically all the advantages of AIO are gone - fine if you
> know you only care about bandwidth with lots of separate IV chains.

This does not sound like the right path.
> 
> 2) You open a socket per IV chain and eat resources.

Ok, AF_ALG allows this.
> 
> Jonathan
> 
> > Ciao
> > Stephan



Ciao
Stephan




Re: [RFC] AF_ALG AIO and IV

2018-01-15 Thread Jonathan Cameron
On Mon, 15 Jan 2018 14:15:42 +0100
Stephan Mueller  wrote:

> Am Montag, 15. Januar 2018, 13:59:27 CET schrieb Jonathan Cameron:
> 
> Hi Jonathan,
> > > 
> > > But there may be hardware that cannot/will not track such dependencies.
> > > Yet, it has multiple hardware queues. Such hardware can still handle
> > > parallel requests when they are totally independent from each other. For
> > > such a case, AF_ALG currently has no support, because it lacks the
> > > support for setting multiple IVs for the multiple concurrent calls.  
> > 
> > Agreed, something like your new support is needed - I just suspect we need
> > a level between one socket one iv chain and every IOCB with own IV and
> > right now the only way to hit that balance is to have a separate socket
> > for each IV chain.  Not exactly efficient use of resources though it will
> > work.  
> 
> How would you propose such support via AF_ALG? 
> Wouldn't it be possible to 
> arrange the IOVEC array in user space appropriately before calling the AF_ALG 
> interface? In this case, I would still see that the current AF_ALG (plus the 
> patch) would support all use cases I am aware of.

I'm not sure how that would work, but maybe I'm missing something - are you
suggesting we could contrive the situation where the kernel side can tell
it is getting the same IV multiple times and hence know that it should chain
it?  We are talking streaming here - we don't have the data for the
later elements when the first ones are queued up.

One approach to handling token based IV - where the token refers to an IV 
without
being its value would be to add another flag similar to the one you used for
inline IV.

You would then set the IV as you have done, but also provide a magic value by
which to track the chain.  Later IOCBs using the same IV chain would just
provide the magic token.

You'd also need some way of retiring the IV eventually once you were done
with it or ultimately you would run out of resources.
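As a rough illustration of this token idea, a pure userspace-side simulation follows. None of this is existing AF_ALG API: the IvChainTable class, the token parameter, and the XOR "cipher" standing in for a real one are all hypothetical, sketched only to show the intended semantics of tagging requests with an IV-chain token:

```python
# Hypothetical sketch of token-tracked IV chains (not existing AF_ALG API).
# A toy "cipher" (XOR of the block with the current IV) stands in for AES.

class IvChainTable:
    """Kernel-side view: one mutable IV context per userspace-chosen token."""

    def __init__(self):
        self.chains = {}

    def submit(self, token, block, iv=None):
        if iv is not None:            # first request of a chain sets the IV
            self.chains[token] = bytes(iv)
        cur_iv = self.chains[token]
        ct = bytes(a ^ b for a, b in zip(block, cur_iv))  # toy cipher step
        self.chains[token] = ct       # write back the updated IV (CBC-like)
        return ct

    def retire(self, token):
        del self.chains[token]        # free per-chain resources when done

table = IvChainTable()
iv = bytes(range(16))
b1, b2 = b"A" * 16, b"B" * 16

# Later IOCBs supply only the token; the chain continues without a userspace
# round trip between the two submissions.
c1 = table.submit(token=7, block=b1, iv=iv)
c2 = table.submit(token=7, block=b2)
table.retire(7)
assert c2 == bytes(a ^ b for a, b in zip(b2, c1))  # second block chained off c1
```

The retire() step models the resource-release problem mentioned above: without it, the per-token contexts would accumulate indefinitely.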

> 
> > > > So the only one left is the case 3 above where the hardware is capable
> > > > of doing the dependency tracking.
> > > > 
> > > > We can support that in two ways but one is rather heavyweight in terms
> > > > of
> > > > resources.
> > > > 
> > > > 1) Whenever we want to allocate a new context we spin up a new socket
> > > > and
> > > > effectively associate a single IV with that (and its chained updates)
> > > > much
> > > > like we do in the existing interface.  
> > > 
> > > I would not like that because it is too heavyweight. Moreover, considering
> > > the kernel crypto API logic, a socket is the user space equivalent of a
> > > TFM. I.e. for setting an IV, you do not need to re-instantiate a TFM.  
> > 
> > Agreed, though as I mention above if you have multiple processes you
> > probably want to give them their own resources anyway (own socket and
> > probably hardware queue if you can spare one) so as to avoid denial of
> > service from one to another.  
> 
> That is for sure: different processes shall never share a socket, as otherwise 
> one could obtain data belonging to the other, which would violate process 
> isolation.
> 
> > > > 2) We allow a token based tracking of IVs.  So userspace code maintains
> > > > a counter and tags every message and the initial IV setup with that
> > > > counter.  
> > > 
> > > I think that with the option I offer with the patch, we have an even
> > > more lightweight approach.  
> > 
> > Except that I think you have to go all the way back to userspace - unless I
> > am missing the point - you can't have multiple elements of a stream queued
> > up. Performance will stink if you have a small number of contexts and can't
> > keep the processing engines busy.  At the moment option 1 here is the only
> > way to implement this.  
> 
> What AF_ALG should do is to enable different vendors like yourself to use the 
> most appropriate solution. AF_ALG shall not limit users in any way.

Agreed, but we also need to have some consistency for userspace to have some
awareness of what it should be using.  Last thing we want is lots of higher
level software having to have knowledge of the encryption hardware underneath.
Hence I think we should keep the options to the minimum possible or put the
burden on drivers that must play well with all options (albeit not as efficiently
for the ones that work badly for them).

> 
> Thus, AF_ALG allows multiple sockets, if desired. It allows a stream usage 
> with one setiv call applicable to multiple cipher operations. And with the 
> offered patch it would allow multiple concurrent and yet independent cipher 
> operations. Whichever use case is right for you, AF_ALG should not block you 
> from applying it. Yet, what is good for you may not be good for others. Thus, 
> these others may implement a different usage strategy for AF_ALG. The good 
> thing is that this strategy is defined by user space.
> 
> In case you see a use case that is prevented by AF_ALG, it would be great to 
> hear about it

Re: [RFC] AF_ALG AIO and IV

2018-01-15 Thread Stephan Mueller
Am Montag, 15. Januar 2018, 13:59:27 CET schrieb Jonathan Cameron:

Hi Jonathan,
> > 
> > But there may be hardware that cannot/will not track such dependencies.
> > Yet, it has multiple hardware queues. Such hardware can still handle
> > parallel requests when they are totally independent from each other. For
> > such a case, AF_ALG currently has no support, because it lacks the
> > support for setting multiple IVs for the multiple concurrent calls.
> 
> Agreed, something like your new support is needed - I just suspect we need
> a level between one socket one iv chain and every IOCB with own IV and
> right now the only way to hit that balance is to have a separate socket
> for each IV chain.  Not exactly efficient use of resources though it will
> work.

How would you propose such support via AF_ALG? Wouldn't it be possible to 
arrange the IOVEC array in user space appropriately before calling the AF_ALG 
interface? In this case, I would still see that the current AF_ALG (plus the 
patch) would support all use cases I am aware of.

> > > So the only one left is the case 3 above where the hardware is capable
> > > of doing the dependency tracking.
> > > 
> > > We can support that in two ways but one is rather heavyweight in terms
> > > of
> > > resources.
> > > 
> > > 1) Whenever we want to allocate a new context we spin up a new socket
> > > and
> > > effectively associate a single IV with that (and its chained updates)
> > > much
> > > like we do in the existing interface.
> > 
> > I would not like that because it is too heavyweight. Moreover, considering
> > the kernel crypto API logic, a socket is the user space equivalent of a
> > TFM. I.e. for setting an IV, you do not need to re-instantiate a TFM.
> 
> Agreed, though as I mention above if you have multiple processes you
> probably want to give them their own resources anyway (own socket and
> probably hardware queue if you can spare one) so as to avoid denial of
> service from one to another.

That is for sure: different processes shall never share a socket, as otherwise 
one could obtain data belonging to the other, which would violate process 
isolation.

> > > 2) We allow a token based tracking of IVs.  So userspace code maintains
> > > a counter and tags every message and the initial IV setup with that
> > > counter.
> > 
> > I think that with the option I offer with the patch, we have an even
> > more lightweight approach.
> 
> Except that I think you have to go all the way back to userspace - unless I
> am missing the point - you can't have multiple elements of a stream queued
> up. Performance will stink if you have a small number of contexts and can't
> keep the processing engines busy.  At the moment option 1 here is the only
> way to implement this.

What AF_ALG should do is to enable different vendors like yourself to use the 
most appropriate solution. AF_ALG shall not limit users in any way.

Thus, AF_ALG allows multiple sockets, if desired. It allows a stream usage 
with one setiv call applicable to multiple cipher operations. And with the 
offered patch it would allow multiple concurrent and yet independent cipher 
operations. Whichever use case is right for you, AF_ALG should not block you 
from applying it. Yet, what is good for you may not be good for others. Thus, 
these others may implement a different usage strategy for AF_ALG. The good 
thing is that this strategy is defined by user space.

In case you see a use case that is prevented by AF_ALG, it would be great to 
hear about it to see whether we can support it.

Ciao
Stephan




Re: [RFC] AF_ALG AIO and IV

2018-01-15 Thread Jonathan Cameron
On Mon, 15 Jan 2018 13:07:16 +0100
Stephan Mueller  wrote:

> Am Montag, 15. Januar 2018, 12:05:03 CET schrieb Jonathan Cameron:
> 
> Hi Jonathan,
> 
> > On Fri, 12 Jan 2018 14:21:15 +0100
> > 
> > Stephan Mueller  wrote:  
> > > Hi,
> > > 
> > > The kernel crypto API requires the caller to set an IV in the request data
> > > structure. That request data structure shall define one particular cipher
> > > operation. During the cipher operation, the IV is read by the cipher
> > > implementation and eventually the potentially updated IV (e.g. in case of
> > > CBC) is written back to the memory location the request data structure
> > > points to.  
> > Silly question, are we obliged to always write it back?  
> 
> Well, in general, yes. The AF_ALG interface should allow a "stream" mode of 
> operation:
> 
> socket
> accept
> setsockopt(setkey)
> sendmsg(IV, data)
> recvmsg(data)
> sendmsg(data)
> recvmsg(data)
> ..
> 
> For such synchronous operation, I guess it is clear that the IV needs to be 
> written back.
> 
> If you want to play with it, use the "stream" API of libkcapi and the 
> associated test cases.

Thanks for the pointer - will do.

> 
> > In CBC it is
> > obviously the same as the last n bytes of the encrypted message.  I guess
> > for ease of handling it makes sense to do so though.
> >   
> > > AF_ALG allows setting the IV with a sendmsg request, where the IV is
> > > stored in the AF_ALG context that is unique to one particular AF_ALG
> > > socket. Note the analogy: an AF_ALG socket is like a TFM where one
> > > recvmsg operation uses one request with the TFM from the socket.
> > > 
> > > AF_ALG these days supports AIO operations with multiple IOCBs. I.e. with
> > > one recvmsg call, multiple IOVECs can be specified. Each individual IOCB
> > > (derived from one IOVEC) implies that one request data structure is
> > > created with the data to be processed by the cipher implementation. The
> > > IV that was set with the sendmsg call is registered with the request data
> > > structure before the cipher operation.
> > > 
> > > In case of an AIO operation, the cipher operation invocation returns
> > > immediately, queuing the request to the hardware. While the AIO request is
> > > processed by the hardware, recvmsg processes the next IOVEC for which
> > > another request is created. Again, the IV buffer from the AF_ALG socket
> > > context is registered with the new request and the cipher operation is
> > > invoked.
> > > 
> > > You may now see that there is a potential race condition regarding the IV
> > > handling, because there is *no* separate IV buffer for the different

Re: [RFC] AF_ALG AIO and IV

2018-01-15 Thread Stephan Mueller
On Monday, 15 January 2018 at 12:05:03 CET, Jonathan Cameron wrote:

Hi Jonathan,

> On Fri, 12 Jan 2018 14:21:15 +0100
> 
> Stephan Mueller  wrote:
> > Hi,
> > 
> > The kernel crypto API requires the caller to set an IV in the request data
> > structure. That request data structure shall define one particular cipher
> > operation. During the cipher operation, the IV is read by the cipher
> > implementation and eventually the potentially updated IV (e.g. in case of
> > CBC) is written back to the memory location the request data structure
> > points to.
> Silly question, are we obliged to always write it back?

Well, in general, yes. The AF_ALG interface should allow a "stream" mode of 
operation:

socket
accept
setsockopt(setkey)
sendmsg(IV, data)
recvmsg(data)
sendmsg(data)
recvmsg(data)
..

For such synchronous operation, I guess it is clear that the IV needs to be 
written back.

If you want to play with it, use the "stream" API of libkcapi and the 
associated test cases.

> In CBC it is
> obviously the same as the last n bytes of the encrypted message.  I guess
> for ease of handling it makes sense to do so though.
> 
> > AF_ALG allows setting the IV with a sendmsg request, where the IV is
> > stored in the AF_ALG context that is unique to one particular AF_ALG
> > socket. Note the analogy: an AF_ALG socket is like a TFM where one
> > recvmsg operation uses one request with the TFM from the socket.
> > 
> > AF_ALG these days supports AIO operations with multiple IOCBs. I.e. with
> > one recvmsg call, multiple IOVECs can be specified. Each individual IOCB
> > (derived from one IOVEC) implies that one request data structure is
> > created with the data to be processed by the cipher implementation. The
> > IV that was set with the sendmsg call is registered with the request data
> > structure before the cipher operation.
> > 
> > In case of an AIO operation, the cipher operation invocation returns
> > immediately, queuing the request to the hardware. While the AIO request is
> > processed by the hardware, recvmsg processes the next IOVEC for which
> > another request is created. Again, the IV buffer from the AF_ALG socket
> > context is registered with the new request and the cipher operation is
> > invoked.
> > 
> > You may now see that there is a potential race condition regarding the IV
> > handling, because there is *no* separate IV buffer for the different
> > requests. This is nicely demonstrated with libkcapi using the following
> > command which creates an AIO request with two IOCBs each encrypting one
> > AES block in CBC mode:
> > 
> > kcapi  -d 2 -x 9  -e -c "cbc(aes)" -k
> > 8d7dd9b0170ce0b5f2f8e1aa768e01e91da8bfc67fd486d081b28254c99eb423 -i
> > 7fbc02ebf5b93322329df9bfccb635af -p 48981da18e4bb9ef7e2e3162d16b1910
> > 
> > When the first AIO request finishes before the 2nd AIO request is
> > processed, the returned value is:
> > 
> > 8b19050f66582cb7f7e4b6c873819b7108afa0eaa7de29bac7d903576b674c32
> > 
> > I.e. two blocks where the IV output from the first request is the IV input
> > to the 2nd block.
> > 
> > In case the first AIO request is not completed before the 2nd request
> > commences, the result is two identical AES blocks (i.e. both use the same
> > IV):
> > 
> > 8b19050f66582cb7f7e4b6c873819b718b19050f66582cb7f7e4b6c873819b71
> > 
> > This inconsistent result may even lead to the conclusion that there can be
> > a memory corruption in the IV buffer if both AIO requests write to the IV
> > buffer at the same time.
> > 
> > This needs to be solved somehow. I see the following options which I would
> > like to have vetted by the community.
> 
> Taking some 'entirely hypothetical' hardware with the following structure
> for all my responses - it's about as flexible as I think we'll see in the
> near future - though I'm sure someone has something more complex out there
> :)
> 
> N hardware queues feeding M processing engines in a scheduler driven
> fashion. Actually we might have P sets of these, but load balancing and
> tracking and transferring contexts between these is a complexity I think we
> can ignore. If you want to use more than one of these P you'll just have to
> handle it yourself in userspace.  Note messages may be shorter than IOCBs
> which raises another question I've been meaning to ask.  Are all crypto
> algorithms obliged to run unlimited length IOCBs?

There are instances where hardware may reject large data chunks. IIRC I have 
seen some limits around 32k. But in this case, the driver must chunk up the 
scatter-gather lists (SGLs) with the data and feed it to the hardware in the 
chunk size necessary.

From the kernel crypto API point of view, the driver must support unlimited 
sized IOCBs / SGLs.
> 
> If there are M messages in a particular queue and none elsewhere it is
> capable of processing them all at once (and perhaps returning out of order
> but we can fudge them back in order in the driver to avoid that additional
> complexity from an interface point of view)

Re: [RFC] AF_ALG AIO and IV

2018-01-15 Thread Jonathan Cameron
On Fri, 12 Jan 2018 14:21:15 +0100
Stephan Mueller  wrote:

> Hi,
> 
> The kernel crypto API requires the caller to set an IV in the request data 
> structure. That request data structure shall define one particular cipher 
> operation. During the cipher operation, the IV is read by the cipher 
> implementation and eventually the potentially updated IV (e.g. in case of CBC)
> is written back to the memory location the request data structure points to.

Silly question, are we obliged to always write it back? In CBC it is obviously
the same as the last n bytes of the encrypted message.  I guess for ease of
handling it makes sense to do so though.
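The writeback property is easy to see with a toy CBC over a stand-in "block cipher" (byte-wise XOR with the key; illustrative only, not secure): the updated IV a CBC implementation would write back is exactly the final ciphertext block.

```python
BLOCK = 16

def toy_block_encrypt(key, block):
    # Stand-in for AES: XOR with the key. Illustrative only, not secure.
    return bytes(a ^ b for a, b in zip(key, block))

def toy_cbc_encrypt(key, iv, data):
    """CBC mode over the toy cipher; returns (ciphertext, updated_iv)."""
    out, prev = bytearray(), iv
    for i in range(0, len(data), BLOCK):
        block = bytes(x ^ y for x, y in zip(data[i:i + BLOCK], prev))
        prev = toy_block_encrypt(key, block)
        out += prev
    return bytes(out), prev  # updated IV as the implementation writes it back

key = bytes(range(16))
ct, new_iv = toy_cbc_encrypt(key, bytes(16), bytes(32))
print(new_iv == ct[-BLOCK:])  # the written-back IV is the last ct block
```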

> 
> AF_ALG allows setting the IV with a sendmsg request, where the IV is stored in
> the AF_ALG context that is unique to one particular AF_ALG socket. Note the 
> analogy: an AF_ALG socket is like a TFM where one recvmsg operation uses one 
> request with the TFM from the socket.
> 
> AF_ALG these days supports AIO operations with multiple IOCBs. I.e. with one 
> recvmsg call, multiple IOVECs can be specified. Each individual IOCB (derived 
> from one IOVEC) implies that one request data structure is created with the 
> data to be processed by the cipher implementation. The IV that was set with 
> the sendmsg call is registered with the request data structure before the 
> cipher operation.
> 
> In case of an AIO operation, the cipher operation invocation returns 
> immediately, queuing the request to the hardware. While the AIO request is 
> processed by the hardware, recvmsg processes the next IOVEC for which another 
> request is created. Again, the IV buffer from the AF_ALG socket context is 
> registered with the new request and the cipher operation is invoked.
> 
> You may now see that there is a potential race condition regarding the IV 
> handling, because there is *no* separate IV buffer for the different requests.
> This is nicely demonstrated with libkcapi using the following command which 
> creates an AIO request with two IOCBs each encrypting one AES block in CBC 
> mode:
> 
> kcapi  -d 2 -x 9  -e -c "cbc(aes)" -k 
> 8d7dd9b0170ce0b5f2f8e1aa768e01e91da8bfc67fd486d081b28254c99eb423 -i 
> 7fbc02ebf5b93322329df9bfccb635af -p 48981da18e4bb9ef7e2e3162d16b1910
> 
> When the first AIO request finishes before the 2nd AIO request is processed, 
> the returned value is:
> 
> 8b19050f66582cb7f7e4b6c873819b7108afa0eaa7de29bac7d903576b674c32
> 
> I.e. two blocks where the IV output from the first request is the IV input to 
> the 2nd block.
> 
> In case the first AIO request is not completed before the 2nd request 
> commences, the result is two identical AES blocks (i.e. both use the same IV):
> 
> 8b19050f66582cb7f7e4b6c873819b718b19050f66582cb7f7e4b6c873819b71
> 
> This inconsistent result may even lead to the conclusion that there can be a 
> memory corruption in the IV buffer if both AIO requests write to the IV buffer
> at the same time.
> 
> This needs to be solved somehow. I see the following options which I would 
> like to have vetted by the community.
>

Taking some 'entirely hypothetical' hardware with the following structure
for all my responses - it's about as flexible as I think we'll see in the
near future - though I'm sure someone has something more complex out there :)

N hardware queues feeding M processing engines in a scheduler driven fashion.
Actually we might have P sets of these, but load balancing and tracking and
transferring contexts between these is a complexity I think we can ignore.
If you want to use more than one of these P you'll just have to handle it
yourself in userspace.  Note messages may be shorter than IOCBs which
raises another question I've been meaning to ask.  Are all crypto algorithms
obliged to run unlimited length IOCBs?

If there are M messages in a particular queue and none elsewhere it is
capable of processing them all at once (and perhaps returning out of order but
we can fudge them back in order in the driver to avoid that additional
complexity from an interface point of view).

So I'm going to look at this from the hardware point of view - you have
well addressed software management above.

Three ways context management can be handled (in CBC this is basically just
the IV).

1. Each 'work item' queued on a hardware queue has its IV embedded with the
data.  This requires external synchronization if we are chaining across
multiple 'work items' - note the hardware may have restrictions that mean
it has to split large pieces of data up to encrypt them.  Not all hardware
may support per 'work item' IVs (I haven't done a survey to find out if
everyone does...)

2. Each queue has a context assigned.  We get a new queue whenever we want
to have a different context.  Runs out eventually but our hypothetical
hardware may support a lot of queues.  Note this version could be 'faked'
by putting a cryptoengine queue on the front of the hardware queues.

3. The hardware supports IV dependency tracking in it'

Re: [RFC] AF_ALG AIO and IV

2018-01-15 Thread Stephan Mueller
On Friday, 12 January 2018 at 14:21:15 CET, Stephan Mueller wrote:

Hi,
> 
> 1. Require that the cipher implementations serialize any AIO requests that
> have dependencies. I.e. for CBC, requests need to be serialized by the
> driver. For, say, ECB or XTS no serialization is necessary.
> 
> 2. Change AF_ALG to require a per-request IV. This could be implemented by
> moving the IV submission via CMSG from sendmsg to recvmsg. I.e. the recvmsg
> code path would obtain the IV.

With the released patch, I found a third way that seems to be much less 
intrusive: adding inline IV handling where the IV is sent as part of the data 
to be processed.
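As a toy illustration of the framing (not the actual kernel patch; a 16-byte AES IV is assumed), each IOCB buffer carries its own IV in front of the payload, so there is no shared IV buffer to race on:

```python
IV_LEN = 16  # AES IV size; the framing is cipher-dependent

def frame(iv, data):
    """Userspace side: prepend the per-request IV to the payload."""
    return iv + data

def unframe(buf):
    """Kernel side: split one IOCB buffer into (iv, payload)."""
    return buf[:IV_LEN], buf[IV_LEN:]

# Two IOCBs, each with its own independent IV -- no shared state:
iocbs = [frame(bytes([i]) * IV_LEN, b"block-%d" % i) for i in (1, 2)]
for buf in iocbs:
    iv, payload = unframe(buf)
    print(iv.hex(), payload)
```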

In the libkcapi source code tree, I added a branch [1] that contains support 
for the inline IV handling. It also has received numerous tests [2] verifying 
that the inline IV kernel patch works.

[1] https://github.com/smuellerDD/libkcapi/tree/iiv
[1] https://github.com/smuellerDD/libkcapi/tree/iiv
[2] https://github.com/smuellerDD/libkcapi/commit/f56991ff2f975caf1aa6cb75f2b6fc104cc72f9c

Ciao
Stephan




[RFC] AF_ALG AIO and IV

2018-01-12 Thread Stephan Mueller
Hi,

The kernel crypto API requires the caller to set an IV in the request data 
structure. That request data structure shall define one particular cipher 
operation. During the cipher operation, the IV is read by the cipher 
implementation and eventually the potentially updated IV (e.g. in case of CBC) 
is written back to the memory location the request data structure points to.

AF_ALG allows setting the IV with a sendmsg request, where the IV is stored in 
the AF_ALG context that is unique to one particular AF_ALG socket. Note the 
analogy: an AF_ALG socket is like a TFM where one recvmsg operation uses one 
request with the TFM from the socket.

AF_ALG these days supports AIO operations with multiple IOCBs. I.e. with one 
recvmsg call, multiple IOVECs can be specified. Each individual IOCB (derived 
from one IOVEC) implies that one request data structure is created with the 
data to be processed by the cipher implementation. The IV that was set with 
the sendmsg call is registered with the request data structure before the 
cipher operation.

In case of an AIO operation, the cipher operation invocation returns 
immediately, queuing the request to the hardware. While the AIO request is 
processed by the hardware, recvmsg processes the next IOVEC for which another 
request is created. Again, the IV buffer from the AF_ALG socket context is 
registered with the new request and the cipher operation is invoked.

You may now see that there is a potential race condition regarding the IV 
handling, because there is *no* separate IV buffer for the different requests. 
This is nicely demonstrated with libkcapi using the following command which 
creates an AIO request with two IOCBs each encrypting one AES block in CBC 
mode:

kcapi  -d 2 -x 9  -e -c "cbc(aes)" -k 
8d7dd9b0170ce0b5f2f8e1aa768e01e91da8bfc67fd486d081b28254c99eb423 -i 
7fbc02ebf5b93322329df9bfccb635af -p 48981da18e4bb9ef7e2e3162d16b1910

When the first AIO request finishes before the 2nd AIO request is processed, 
the returned value is:

8b19050f66582cb7f7e4b6c873819b7108afa0eaa7de29bac7d903576b674c32

I.e. two blocks where the IV output from the first request is the IV input to 
the 2nd block.

In case the first AIO request is not completed before the 2nd request 
commences, the result is two identical AES blocks (i.e. both use the same IV):

8b19050f66582cb7f7e4b6c873819b718b19050f66582cb7f7e4b6c873819b71

This inconsistent result may even lead to the conclusion that there can be a 
memory corruption in the IV buffer if both AIO requests write to the IV buffer 
at the same time.

This needs to be solved somehow. I see the following options which I would 
like to have vetted by the community.

1. Require that the cipher implementations serialize any AIO requests that 
have dependencies. I.e. for CBC, requests need to be serialized by the driver. 
For, say, ECB or XTS no serialization is necessary.

2. Change AF_ALG to require a per-request IV. This could be implemented by 
moving the IV submission via CMSG from sendmsg to recvmsg. I.e. the recvmsg 
code path would obtain the IV.

I would tend to favor option 2 as this requires a code change at only one 
location. If option 2 is considered, I would recommend still allowing the IV to 
be set via sendmsg CMSG (to keep the interface stable). If, however, the caller 
provides an IV via recvmsg, this takes precedence.

If there are other options, please allow us to learn about them.

Ciao
Stephan