Re: What should be the algo priority

2017-04-07 Thread Hamid Nassiby
"authenc" and "hmac" are templates, not different implementations of a cipher.

Please take a look at:
https://kernel.readthedocs.io/en/sphinx-samples/crypto-API.html#terminology


On Thu, Apr 6, 2017 at 9:56 AM, Harsh Jain  wrote:
> On Tue, Apr 4, 2017 at 6:07 PM, Stephan Müller  wrote:
>> Am Dienstag, 4. April 2017, 09:53:17 CEST schrieb Harsh Jain:
>>
>> Hi Harsh,
>>
>>> Hi,
>>>
>>> Do we have any guidelines documented to decide what should be the
>>> algorithm priority. Specially for authenc implementation.Most of the
>>> drivers have fixed priority for all algos. Problem comes in when we
>>> have cbc(aes), hmac(sha1) and authenc(cbc(aes),hmac(sha1))
>>> implementation in driver. Base authenc driver gets more precedence
>>> because of higher priority(enc->base.cra_priority * 10 +
>>> auth_base->cra_priority;)
>>>
>>> What should be the priority of
>>> cbc(aes),
>>> hmac(sha1)
>>> authenc(cbc(aes),hmac(sha1))
>>
>> There is no general rule about the actual numbers. But commonly, the prios 
>> are
>> set such that the prios of C implementations < ASM implementations < hardware
>> accelerators. The idea is to give users the fastest implementation there is
>> for his particular system.
>
> It means cbc, hmac should have smaller(nearly 10 times less) priority
> than their authenc implementation otherwise request will not offload
> to driver because sw authenc priority is (aes * 10 + hmac).
>
>>
>> Ciao
>> Stephan
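
To make the arithmetic in this thread concrete, here is a minimal userspace sketch (not kernel code) of the priority formula quoted above. The function names are hypothetical and the priority values used with them are only examples; the formula itself is the one cited from the generic authenc template.

```c
/* Priority that the generic authenc template derives from its two
 * component algorithms, using the formula quoted in the thread:
 * enc->base.cra_priority * 10 + auth_base->cra_priority. */
int authenc_template_priority(int enc_prio, int auth_prio)
{
    return enc_prio * 10 + auth_prio;
}

/* Hypothetical helper: returns 1 if a driver's native authenc
 * implementation (native_prio) outranks the template instance built
 * from the same driver's cbc(aes) and hmac(sha1), 0 otherwise. */
int native_authenc_wins(int native_prio, int enc_prio, int auth_prio)
{
    return native_prio > authenc_template_priority(enc_prio, auth_prio);
}
```

With a driver that registers everything at the same priority, say 300, the template instance ends up at 3300, so the native authenc implementation is not selected unless its own priority is raised above that value.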


Re: ARM-CE aes encryption on uneven blocks

2016-10-26 Thread Hamid Nassiby
Hi,

In my (somewhat dated) experience with "struct crypto_alg" based drivers, the
data that reaches the driver has already been padded by the upper layers.
The plaintext is therefore an integral multiple of the AES block size, and
the transform can be computed from the number of blocks alone.

Regards,
Hamid
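
As a rough illustration of the point above, the following userspace sketch (not the ARM-CE code) shows how a byte length maps to a block count once the caller guarantees block alignment. The function names are hypothetical, and the round-up helper only stands in for whatever padding scheme the upper layer actually uses.

```c
#include <stddef.h>

#define AES_BLOCK_SIZE 16

/* Number of whole AES blocks in len bytes, or -1 if len is not a
 * multiple of the block size (the caller was expected to pad). */
long aes_block_count(size_t len)
{
    if (len % AES_BLOCK_SIZE != 0)
        return -1;
    return (long)(len / AES_BLOCK_SIZE);
}

/* Length after rounding up to the next block boundary, as a padding
 * upper layer might do before calling into the cipher. */
size_t aes_round_up(size_t len)
{
    return (len + AES_BLOCK_SIZE - 1) & ~(size_t)(AES_BLOCK_SIZE - 1);
}
```

Under this assumption the encryption primitive never needs the byte count: it only ever sees aes_block_count() blocks.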

On Mon, Oct 24, 2016 at 6:11 PM, Cata Vasile  wrote:
>
> Hi,
>
> I'm trying to understand the code for AES encryption from ARM-CE.
> From the aes-glue.S calls I understand that the encryption primitives receive 
> the number of blocks, but have no way of determining the number of bytes to 
> encrypt, if for example the plaintext does not have a length of a multiple of 
> AES block size.
> How does, for example, ecb_encrypt() also encrypt the last remaining bytes in 
> the plaintext if it is not a multiple of AES block size if It can never 
> deduce the full plaintext size?
>
> Catalin Vasile--
> To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: IV generation in geode-aes

2013-09-26 Thread Hamid Nassiby
Hi,

geode-aes has no IV generation mechanism of its own. The IV is delivered to
the algorithms registered by geode-aes from the upper layers. For example,
in the case of the "cbc-aes-geode" algorithm, it arrives from the cbc wrapper
("cbc.c") via walk.iv:

blkcipher_walk_init(&walk, dst, src, nbytes);
err = blkcipher_walk_virt(desc, &walk);
op->iv = walk.iv;
...

Regards.
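
To illustrate what "the IV comes from the caller" means, here is a toy userspace sketch of CBC chaining. It is not the geode-aes code: toy_encrypt_block is a hypothetical stand-in for the AES engine (a plain XOR with the key, with no security value), and all names are assumptions. The only point is that the IV is an input parameter that the cipher consumes and updates, never something it generates.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLK 16

/* Stand-in for the hardware AES engine. NOT real AES: it only XORs
 * the block with the key so that the example stays self-contained. */
static void toy_encrypt_block(const uint8_t key[BLK],
                              const uint8_t in[BLK], uint8_t out[BLK])
{
    for (size_t i = 0; i < BLK; i++)
        out[i] = in[i] ^ key[i];
}

/* CBC encryption of nblocks blocks. The IV is supplied by the caller
 * and is overwritten with the last ciphertext block, so that a later
 * call can continue the same chain. */
void toy_cbc_encrypt(const uint8_t key[BLK], uint8_t iv[BLK],
                     const uint8_t *src, uint8_t *dst, size_t nblocks)
{
    uint8_t tmp[BLK];

    for (size_t b = 0; b < nblocks; b++) {
        for (size_t i = 0; i < BLK; i++)
            tmp[i] = src[b * BLK + i] ^ iv[i];   /* chain with IV */
        toy_encrypt_block(key, tmp, dst + b * BLK);
        memcpy(iv, dst + b * BLK, BLK);          /* next block's IV */
    }
}
```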

On Thu, Sep 19, 2013 at 4:34 PM, Sohail  wrote:
> Hi all,
> I could'nt understand the mechanism of IV generation in geode-aes. Can
> someone explain it in easy to understand manner?
>
> Thanks a lot.
>


Re: Old PADATA patch vs crypto-2.6 tree

2012-03-29 Thread Hamid Nassiby
pcrypt is not used automatically. You must instantiate it explicitly, using
either the crconf tool or the tcrypt module.

On Wed, Mar 28, 2012 at 4:23 PM, Sebastien Agnolini
 wrote:
>
> Hey,
>
> How activate the IPsec parallelization ?
> I compiled the crypto-2.6 kernel with this param :
> CONFIG_CRYPTO_... = y
> CONFIG_PADATA = y
> CONFIG_SMP=y
> After installation on 2 servers (IPSEC tunnel), i don't detect the IPsec
> parallelization.
> The algorithm is loaded (present in /proc/crypto), but only one core works.
>
> So, What are the other parameters that I forgot for the compilation of the
> kernel? IRQ, IO, Scheduler parameters... Am i missing something ?
> I thought that the parallelization was automatically started. True ?
> What are the conditions to observe a parallel work ?
> A "little" documentation will be Welcome.
>
> I'd like compare the bandwidth of my test platform using the « old » PADATA
> patch.
>
> Sebastien


Re: Fwd: crypto accelerator driver problems

2011-10-15 Thread Hamid Nassiby
On 10/11/11, Steffen Klassert  wrote:

>
> I can't tell much when looking at this code snippet. One guess would be
> someone (maybe you) has set the CRYPTO_TFM_REQ_MAY_SLEEP flag, as
> blkcipher_walk_done calls crypto_yield() which in turn might call
> schedule() if this flag is set. prcypt removes this flag explicit.
>

I've not set such a flag.

>
> Basically, the bottom halves are off to keep up with the network softirqs.
> They run with much higher priority and would interrupt the parallel
> workers frequently.
>

Do you mean that with BHs enabled, we would only see some performance degradation?

Thanks for your reply.
Any other idea?


Re: Fwd: crypto accelerator driver problems

2011-10-05 Thread Hamid Nassiby
On Tue, Oct 4, 2011 at 11:27 AM, Steffen Klassert
 wrote:
>
> On Sat, Oct 01, 2011 at 12:38:19PM +0330, Hamid Nassiby wrote:
> >
> > And my_cbc_encrypt function as PSEUDO/real code (for simplicity of
> > representation) is as:
> >
> > static int
> > my_cbc_encrypt(struct blkcipher_desc *desc,
> >                 struct scatterlist *dst, struct scatterlist *src,
> >                 unsigned int nbytes)
> > {
> >               SOME__common_preparation_and_initializations;
> >
> >               spin_lock_irqsave(&myloc, myflags);
> >               send_request_to_device(&dev); /*sends request to device. After
> >                                           processing request,device writes
> >                                           result to destination*/
> >               while(!readl(complete_flag)); /*here we wait for a flag in
> >                         device register space indicating completion. */
> >               spin_unlock_irqrestore(&mylock, myflags);
> >
> >
> > }
>
> As I told you already in the private mail, it makes not too much sense
> to parallelize the crypto layer and to hold a global lock during the
> crypto operation. So if you really need this lock, you are much better
> off without a parallelization.
>
Hi Steffen,
Thanks for your reply :).

It makes sense in two cases:

1. If the time needed to transmit a request to the device is much shorter
than the time the device spends processing it, and the device has more than
one processing engine.

2. It can also be advantageous when the device has only one processing
engine and multiple blkcipher requests are pending at the device's entrance
port, because the delay between requests entering the device will be
shorter. The overall advantage is that our IPSec throughput gets closer to
the device's bulk encryption throughput. (It is interesting to note that,
with our current driver and device configuration, if I test gateway
throughput with traffic belonging to two SAs travelling through the one
link that connects the gateways, I get a rate of about 280Mbps, an 80Mbps
increase over one SA's traffic, while our device's bulk processing rate is
about 400Mbps.)

Currently we want to take advantage of the latter case and then extend it.

>
>
>
> >
> > With above code, I can successfully test IPSec gateway equipped with our
> > hardware and get a 200Mbps throughput using Iperf. Now I am facing with 
> > another
> > poblem. As I mentioned earlier, our hardware has 4 aes engines builtin. With
> > above code I only utilize one of them.
> > >From this point, we want to go a step further and utilize more than one aes
> > engines of our device. Simplest solution appears to me is to deploy
> > pcrypt/padata, made by Steffen Klassert. First instantiate in a dual
> > core gateway :
> >       modprobe tcrypt alg="pcrypt(authenc(hmac(md5),cbc(aes)))" type=3
> >  and test again. Running Iperf now gives me a very low
> > throughput about 20Mbps while dmesg shows the following:
> >
> >    BUG: workqueue leaked lock or atomic: kworker/0:1/0x0001/10
> >        last function: padata_parallel_worker+0x0/0x80
>
> This looks like the parallel worker exited in atomic context,
> but I can't tell you much more as long as you don't show us your code.

OK, I presented the code as pseudocode just to simplify it and to focus on
the relevant aspects of the problem ;) (but it is also possible that I've
focused on the wrong part :D).
This is the my_cbc_encrypt code and the functions it calls, bottom-up:

int write_request(u8 *buff, unsigned int count)
{
	u32 tlp_size = 32;
	struct my_dma_desc *desc_table = (struct my_dma_desc *)global_bar[0];

	tlp_size = (count / 128) | (tlp_size << 16);
	memcpy(g_mydev->rdmaBuf_va, buff, count);
	wmb();

	writel(cpu_to_le32(tlp_size), &desc_table->wdmaperf);
	wmb();

	/* wait for transfer completion */
	while ((readl(&desc_table->ddmacr) | 0x) != 0x0101)
		;
	return 0;
}

int my_transform(struct my_aes_op *op, int alg)
{
	int req_len, err;
	unsigned long iflagsq, tflag;
	u8 *req_buf = NULL, *res_buf = NULL;
	alg_operation operation;

	if (op->len == 0)
		return 0;
	operation = !(op->dir);

	/* add header to original request and copy it to req_buf */
	create_request(alg, op->mode, operation, 0, op->key,
		       op->iv, op->src, op->len, &req_buf, &req_len);

	spin_lock_irqsave(&glock, tflag);

   

Re: Fwd: crypto accelerator driver problems

2011-10-01 Thread Hamid Nassiby
Hi all,

Referring to my previous posts on the crypto list about our hardware AES
accelerator project, I have finally managed to deploy the device in IPSec
successfully. As I mentioned earlier, my driver registers itself in the
kernel as a blkcipher for cbc(aes) as follows:

static struct crypto_alg my_cbc_alg = {
	.cra_name = "cbc(aes)",
	.cra_driver_name = "cbc-aes-my",
	.cra_priority = 400,
	.cra_flags = CRYPTO_ALG_TYPE_BLKCIPHER |
		     CRYPTO_ALG_NEED_FALLBACK,
	.cra_init = fallback_init_blk,
	.cra_exit = fallback_exit_blk,
	.cra_blocksize = AES_MIN_BLOCK_SIZE,
	.cra_ctxsize = sizeof(struct my_aes_op),
	.cra_alignmask = 15,
	.cra_type = &crypto_blkcipher_type,
	.cra_module = THIS_MODULE,
	.cra_list = LIST_HEAD_INIT(my_cbc_alg.cra_list),
	.cra_u = {
		.blkcipher = {
			.min_keysize = AES_MIN_KEY_SIZE,
			.max_keysize = AES_MIN_KEY_SIZE,
			.setkey = my_setkey_blk,
			.encrypt = my_cbc_encrypt,
			.decrypt = my_cbc_decrypt,
			.ivsize = AES_IV_LENGTH,
		}
	}
};

And the my_cbc_encrypt function, as pseudo/real code (simplified for
presentation), is:

static int
my_cbc_encrypt(struct blkcipher_desc *desc,
	       struct scatterlist *dst, struct scatterlist *src,
	       unsigned int nbytes)
{
	SOME__common_preparation_and_initializations;

	spin_lock_irqsave(&mylock, myflags);
	/* Sends the request to the device. After processing the request,
	 * the device writes the result to the destination. */
	send_request_to_device(&dev);
	/* Wait for a flag in the device register space indicating
	 * completion. */
	while (!readl(complete_flag))
		;
	spin_unlock_irqrestore(&mylock, myflags);
}

With the above code, I can successfully test an IPSec gateway equipped with our
hardware and get 200Mbps throughput using Iperf. Now I am facing another
problem. As I mentioned earlier, our hardware has 4 AES engines built in. With
the above code I only utilize one of them.
From this point, we want to go a step further and utilize more than one of the
AES engines of our device. The simplest solution, it appears to me, is to deploy
pcrypt/padata, made by Steffen Klassert. I first instantiate it on a dual
core gateway:
modprobe tcrypt alg="pcrypt(authenc(hmac(md5),cbc(aes)))" type=3
and test again. Running Iperf now gives me a very low
throughput of about 20Mbps, while dmesg shows the following:

   BUG: workqueue leaked lock or atomic: kworker/0:1/0x0001/10
   last function: padata_parallel_worker+0x0/0x80
   Pid: 10, comm: kworker/0:1 Not tainted 2.6.37 #1
   Call Trace:
[] ? printk+0x18/0x1b
[] process_one_work+0x177/0x370
[] ? padata_parallel_worker+0x0/0x80
[] worker_thread+0x127/0x390
[] ? worker_thread+0x0/0x390
[] kthread+0x74/0x80
[] ? kthread+0x0/0x80
[] kernel_thread_helper+0x6/0x10
   BUG: scheduling while atomic: kworker/0:1/10/0x0002
   Modules linked in: pcrypt my_aes2 binfmt_misc bridge stp
bnep sco rfcomm l2cap crc16 bluetooth rfkill ppdev acpi_cpufreq mperf
cpufreq_stats cpufreq_conservative cpufreq_ondemand cpufreq_userspace
cpufreq_powersave freq_table pci_slot sbs container video output sbshc battery
iptable_filter ip_tables x_tables decnet ctr twofish_i586 twofish_generic
twofish_common camellia serpent blowfish cast5 aes_i586 aes_generic xcbc rmd160
sha512_generic sha256_generic crypto_null af_key ac lp snd_hda_codec_realtek
snd_hda_intel snd_hda_codec snd_pcm_oss evdev snd_mixer_oss snd_pcm psmouse
serio_raw snd_seq_dummy pcspkr parport_pc parport snd_seq_oss snd_seq_midi
snd_rawmidi snd_seq_midi_event option usb_wwan snd_seq usbserial snd_timer
snd_seq_device button processor iTCO_wdt iTCO_vendor_support snd intel_agp
soundcore intel_gtt snd_page_alloc agpgart shpchp pci_hotplug ext3 jbd mbcache
sr_mod cdrom sd_mod sg ata_generic pata_jmicron ata_piix pata_acpi libata floppy
r8169 mii
  scsi_mod uhci_hcd ehci_hcd usbcore thermal fan fuse
   Pid: 10, comm: kworker/0:1 Not tainted 2.6.37 #1
   Call Trace:
[] __schedule_bug+0x59/0x70
[] schedule+0x6a7/0xa70
[] ? show_trace_log_lvl+0x47/0x60
[] ? dump_stack+0x6e/0x75
[] ? process_one_work+0x1c8/0x370
[] ? padata_parallel_worker+0x0/0x

Re: Fwd: crypto accelerator driver problems

2011-07-04 Thread Hamid Nassiby
On Thu, Jan 27, 2011 at 3:03 AM, Herbert Xu  wrote:
>
> On Wed, Jan 26, 2011 at 11:20:22AM +0330, Hamid Nassiby wrote:
> >
> > Do you mean that different IP packets fit into one single Block Cipher tfm?
> > Would you please explain expansively?
>
> We allocate one tfm per SA.  So as long as ordering is guaranteed
> per SA then it's guaranteed per SA which is all that's needed.
>
> Cheers,
> --
> Email: Herbert Xu 
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Dear all,
Referring to my previous posts about a driver for a hardware AES accelerator
(to be used to accelerate IPSec block cipher operations), I would like to
ask you about a possible algorithmic problem in our solution.
As I said earlier, our driver is inspired by the geode_aes driver, so assume
that we have defined our supported algorithm as:

static struct crypto_alg shams_cbc_alg = {
	.cra_name = "cbc(aes)",
	.cra_driver_name = "cbc-aes-mine",
	.cra_priority = 400,
	.cra_flags = CRYPTO_ALG_TYPE_BLKCIPHER |
		     CRYPTO_ALG_NEED_FALLBACK,
	.cra_init = fallback_init_blk,
	.cra_exit = fallback_exit_blk,
	.cra_blocksize = AES_MIN_BLOCK_SIZE,
	.cra_ctxsize = sizeof(struct my_aes_op),
	.cra_alignmask = 0,
	.cra_type = &crypto_blkcipher_type,
	.cra_module = THIS_MODULE,
	.cra_list = LIST_HEAD_INIT(shams_cbc_alg.cra_list),
	.cra_u = {
		.blkcipher = {
			.min_keysize = AES_MIN_KEY_SIZE,
			.max_keysize = AES_MIN_KEY_SIZE,
			.setkey = my_setkey_blk,
			.encrypt = my_cbc_encrypt,
			.decrypt = my_cbc_decrypt,
			.ivsize = AES_IV_LENGTH,
		}
	}
};

And our encrypt function, my_cbc_encrypt, looks like:

static int
my_cbc_encrypt(struct blkcipher_desc *desc,
	       struct scatterlist *dst, struct scatterlist *src,
	       unsigned int nbytes)
{
	struct my_aes_op *op = crypto_blkcipher_ctx(desc->tfm);
	struct blkcipher_walk walk;
	int err, ret;
	unsigned long flag1, c2flag;
	u32 my_req_id;

	spin_lock_irqsave(&reqlock, c2flag);
	/* Our request id is sent to the device and then read back, so that
	 * we can distinguish between device responses. */
	my_req_id = (global_reqid++) % 63000;
	spin_unlock_irqrestore(&reqlock, c2flag);

	if (unlikely(op->keylen != AES_KEYSIZE_128))
		return fallback_blk_enc(desc, dst, src, nbytes);

	blkcipher_walk_init(&walk, dst, src, nbytes);
	err = blkcipher_walk_virt(desc, &walk);
	op->iv = walk.iv;

	while ((nbytes = walk.nbytes)) {
		op->src = walk.src.virt.addr;
		op->dst = walk.dst.virt.addr;
		op->mode = AES_MODE_CBC;
		op->len = nbytes /*- (nbytes % AES_MIN_BLOCK_SIZE)*/;
		op->dir = AES_DIR_ENCRYPT;

		/* Critical PSEUDO code */
		spin_lock_irqsave(&lock1, flag1);
		write_to_device(op, 0, my_req_id);
		spin_unlock_irqrestore(&lock1, flag1);

		spin_lock_irqsave(&lock1, flag1);
		ret = read_from_device(op, 0, my_req_id);
		spin_unlock_irqrestore(&lock1, flag1);
		/* End of Critical PSEUDO code */

		nbytes -= ret;
		err = blkcipher_walk_done(desc, &walk, nbytes);
	}

	return err;
}

As I mentioned earlier, we have multiple AES engines in our hardware, so to
utilize the hardware as much as possible, we would like to be able to
submit multiple requests to the device and collect each response as soon as
it becomes ready.

Now look at the section of my_cbc_encrypt marked "Critical PSEUDO code".
This section submits requests to the device and reads back responses (and is
the damn bottleneck). If we protect the write_to_device and read_from_device
calls with one lock/unlock pair, as in:

	/* Critical PSEUDO code */
	spin_lock_irqsave(&lock1, flag1);
	write_to_device(op, 0, my_req_id);
	ret = read_from_device(op, 0, my_req_id);
	spin_unlock_irqrestore(&lock1, flag1);
	/* End of Critical PSEUDO code */

then we have no problem: the system works and IPSec en/decrypts using our
hardware. But ONL

Re: Fwd: crypto accelerator driver problems

2011-01-25 Thread Hamid Nassiby
On Wed, Jan 26, 2011 at 10:39 AM, Herbert Xu
 wrote:
> On Wed, Jan 26, 2011 at 10:26:33AM +0330, Hamid Nassiby wrote:
>>
>> As you know, I posted my problem again to crypto list and no one answered.
>> Now I
>> emphasize one aspect of the problem as a concept related to IPSec protocol,
>> free
>> of my problem's nature, and I hope to get some guidelines at this time. The
>> question is as following:
>> If IPSec delivers IP packets to hardware crypto accelerator in sequential
>> manner
>> (e.g, packets in order: 1, 2, 3, ..., 36, 37, 38,...) and crypto accelerator
>> possibly returns back packets out of entering order to IPSec (e.g, packet
>> 37 is returned back before the packet 36 to IPSec, so the order of packets
>> is
>> not the same before entering crypto accelerator and after exiting it); Is it
>> possible to rise any problem here?
>
> We do not allow such reordering.  All crypto drivers must ensure
> ordering within a single tfm.  Between different tfms there is no
> ordering requirement.
>
> Cheers,
> --
> Email: Herbert Xu 
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
>


Do you mean that different IP packets share one single block cipher tfm?
Could you please explain in more detail?

Thanks a lot,
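
One way to picture the ordering requirement stated above is a small reorder ring: the device may finish requests in any order, but results are handed back strictly in submission order. The following is a minimal userspace sketch under that assumption. The struct, the function names and the ring size are all hypothetical, it is not taken from any kernel driver, and it leaves out locking entirely.

```c
#define RING 8  /* hypothetical limit on in-flight requests (power of two) */

struct order_ring {
    unsigned int head;  /* sequence number of the oldest undelivered request */
    unsigned int tail;  /* next sequence number to hand out */
    int done[RING];     /* done[seq % RING] != 0 once the device finished seq */
};

/* Reserve a slot when a request is submitted. Returns its sequence
 * number, or -1 if too many requests are already in flight. */
int ring_submit(struct order_ring *r)
{
    if (r->tail - r->head >= RING)
        return -1;
    r->done[r->tail % RING] = 0;
    return (int)r->tail++;
}

/* Record a completion, which may arrive out of order, and return how
 * many requests can now be delivered in submission order. */
unsigned int ring_complete(struct order_ring *r, unsigned int seq)
{
    unsigned int delivered = 0;

    r->done[seq % RING] = 1;
    while (r->head != r->tail && r->done[r->head % RING]) {
        r->done[r->head % RING] = 0;
        r->head++;
        delivered++;
    }
    return delivered;
}
```

If request 1 completes before request 0, nothing is delivered until request 0 finishes, at which point both are released together. Keeping one such ring per tfm would match the rule that ordering only matters within a single tfm.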


Re: Fwd: crypto accelerator driver problems

2011-01-25 Thread Hamid Nassiby
On Sat, Jan 8, 2011 at 11:09 AM, Hamid Nassiby  wrote:
>
> On Fri, Dec 31, 2010 at 12:49 AM, Herbert Xu
>  wrote:
> >
> > Hamid Nassiby  wrote:
> > > Hi,
> > >
> > > As some good news and additional information, with the following patch
> > > I no more get
> > > "UDP bad cheksum" error as I mentioned erlier with Iperf in udp mode.
> > > But some times I get the following call trace in dmesg after running
> > > Iperf in UDP mode, more than one time (and ofcourse Iperf stops
> > > transferring data while it uses 100% of CPU cycles.
> > >
> > >
> > >
> > > [  130.171909] mydriver-aes: mydriver Crypto-Engine enabled.
> > > [  134.767846] NET: Registered protocol family 15
> > > [  200.031846] iperf: page allocation failure. order:0, mode:0x20
> > > [  200.031850] Pid: 10935, comm: iperf Tainted: P            2.6.36-zen1 
> > > #1
> > > [  200.031852] Call Trace:
> > > [  200.031860]  [] ? __alloc_pages_nodemask+0x6d3/0x722
> > > [  200.031864]  [] ? virt_to_head_page+0x9/0x30
> > > [  200.031867]  [] ? alloc_pages_current+0xa5/0xce
> > > [  200.031869]  [] ? __get_free_pages+0x9/0x46
> > > [  200.031872]  [] ? need_resched+0x1a/0x23
> > > [  200.031876]  [] ? blkcipher_walk_next+0x68/0x2d9
> >
> > This means that your box has run out of memory temporarily.
> > If all errors were handled correctly it should continue at this
> > point.
> >
> > > --- mydriver1   2010-12-21 15:20:17.0 +0330
> > > +++ mydriver2   2010-12-21 15:24:18.0 +0330
> > > @@ -1,4 +1,3 @@
> > > -
> > > static int
> > > mydriver_cbc_decrypt(struct blkcipher_desc *desc,
> > >                  struct scatterlist *dst, struct scatterlist *src,
> > > @@ -14,18 +13,17 @@ mydriver_cbc_decrypt(struct blkcipher_desc
> > >        err = blkcipher_walk_virt(desc, &walk);
> > >        op->iv = walk.iv;
> > >
> > > -       while((nbytes = walk.nbytes)) {
> > > +
> >
> > However, your patch removes the error checking (and the loop
> > condition) which is why it crashes.
> >
> > Cheers,
> > --
> > Email: Herbert Xu 
> > Home Page: http://gondor.apana.org.au/~herbert/
> > PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
>
>
>
> Hi Herbert,
>
> First I should notice that by removing while loop iteration, "UDP bad 
> checksum"
> error in dmesg output is no longer seen. Diving deeper in problem, It seemed
> to me that when mydriver_transform returns 0, I must not get any more bytes
> (belonging to previous request) to process in the next iteration of while 
> loop.
> But I see that the behavior is not as it has to be (By removing while loop
> mydriver_transform gets for example one 1500 byte request, processes it and
> copies it back to destination, But in existence of while loop It gets same
> request as one 1300 byte request, processes and copies it back to destination,
> returning 0, and getting remaining 200 bytes of request in second iteration of
> while, so on the other end of tunnel I see "UDP bad checksum"). So I conclude
> that blkcipher_walk_done behaves strange, assigns incorrect value to 
> walk.nbytes
> resulting in iterating while loop one time more!
>
>
> Second note is about our accelerator's architecture and the way we should
> utilize it. Our device has several crypto engines built in. So for maximum
> utilization of device we should feed it with multiple crypto requests
> simultaneously (I intended for doing  it by using pcrypt) and here is the 
> point
> everything freezes. From other point of view, I found that if I protect 
> entering
> write_request and read_response in mydriver_transform by one lock
> (spin_unlock(x) before write_request and spin_unlock(x) after read_reasponse 
> in
> mydriver_transform as shown in following code snippet), I would be able to run
> "iperf" in tcp mode successfully. This leads me to uncertainty, because in
> such a situation, we only utilize one crypto engine of device and each request
> is followed by its response sequentially and arrangement of requests and
> responses is not interleaved. So I guess that getting multiple requests to
> device and receiving the responses not in the same arrangement they delivered 
> to
> device, might cause TCP transfer to freeze, and here my question arises: If my
> conclusion is true, SHOULD I change the driver approach to ablkcipher?
>
>
> Code snippet in the way write_request and read_response are protected by lock
> and iperf in

Re: Fwd: crypto accelerator driver problems

2011-01-07 Thread Hamid Nassiby
On Fri, Dec 31, 2010 at 12:49 AM, Herbert Xu
 wrote:
>
> Hamid Nassiby  wrote:
> > Hi,
> >
> > As some good news and additional information, with the following patch
> > I no more get
> > "UDP bad cheksum" error as I mentioned erlier with Iperf in udp mode.
> > But some times I get the following call trace in dmesg after running
> > Iperf in UDP mode, more than one time (and ofcourse Iperf stops
> > transferring data while it uses 100% of CPU cycles.
> >
> >
> >
> > [  130.171909] mydriver-aes: mydriver Crypto-Engine enabled.
> > [  134.767846] NET: Registered protocol family 15
> > [  200.031846] iperf: page allocation failure. order:0, mode:0x20
> > [  200.031850] Pid: 10935, comm: iperf Tainted: P            2.6.36-zen1 #1
> > [  200.031852] Call Trace:
> > [  200.031860]  [] ? __alloc_pages_nodemask+0x6d3/0x722
> > [  200.031864]  [] ? virt_to_head_page+0x9/0x30
> > [  200.031867]  [] ? alloc_pages_current+0xa5/0xce
> > [  200.031869]  [] ? __get_free_pages+0x9/0x46
> > [  200.031872]  [] ? need_resched+0x1a/0x23
> > [  200.031876]  [] ? blkcipher_walk_next+0x68/0x2d9
>
> This means that your box has run out of memory temporarily.
> If all errors were handled correctly it should continue at this
> point.
>
> > --- mydriver1   2010-12-21 15:20:17.0 +0330
> > +++ mydriver2   2010-12-21 15:24:18.0 +0330
> > @@ -1,4 +1,3 @@
> > -
> > static int
> > mydriver_cbc_decrypt(struct blkcipher_desc *desc,
> >                  struct scatterlist *dst, struct scatterlist *src,
> > @@ -14,18 +13,17 @@ mydriver_cbc_decrypt(struct blkcipher_desc
> >        err = blkcipher_walk_virt(desc, &walk);
> >        op->iv = walk.iv;
> >
> > -       while((nbytes = walk.nbytes)) {
> > +
>
> However, your patch removes the error checking (and the loop
> condition) which is why it crashes.
>
> Cheers,
> --
> Email: Herbert Xu 
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt



Hi Herbert,

First, I should note that after removing the while loop, the "UDP bad
checksum" error no longer appears in the dmesg output. Digging deeper into
the problem, it seemed to me that when mydriver_transform returns 0, I should
not get any more bytes (belonging to the previous request) to process in the
next iteration of the while loop. But the behaviour I see is different.
Without the while loop, mydriver_transform gets, for example, one 1500 byte
request, processes it and copies it back to the destination. With the while
loop, it gets the same request as one 1300 byte request, processes it and
copies it back to the destination, returning 0, and then gets the remaining
200 bytes of the request in the second iteration of the loop, so at the other
end of the tunnel I see "UDP bad checksum". So I conclude that
blkcipher_walk_done behaves strangely and assigns an incorrect value to
walk.nbytes, resulting in one extra iteration of the while loop!
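
For reference, here is a simplified userspace model of why a walk loop can legitimately run more than once for a single request: the walker may expose the data in several pieces, and each step should process whole blocks only. This is only a sketch with hypothetical names and a made-up max_chunk parameter, not the kernel's blkcipher_walk implementation. Note that with CBC, each later step must also continue the chain from the last ciphertext block of the previous step.

```c
#include <stddef.h>

#define BLK 16

/* Bytes the driver should process in one step: whole blocks only. */
size_t walk_step_len(size_t chunk)
{
    return chunk - (chunk % BLK);
}

/* Model of a walk over a request of total bytes that is exposed in
 * pieces of at most max_chunk bytes. Returns the number of transform
 * calls needed, or 0 if the request is empty or a piece holds less
 * than one whole block (total was not block aligned). */
unsigned int walk_request(size_t total, size_t max_chunk)
{
    unsigned int calls = 0;
    size_t left = total;

    while (left) {
        size_t chunk = left < max_chunk ? left : max_chunk;
        size_t n = walk_step_len(chunk);

        if (n == 0)
            return 0;   /* partial block: cannot make progress */
        left -= n;      /* the unprocessed tail stays for the next step */
        calls++;
    }
    return calls;
}
```

In this model a 1504 byte request that is exposed in pieces of at most 1300 bytes takes two steps (1296 bytes, then 208), while the same request exposed in one piece takes a single step.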


The second note is about our accelerator's architecture and the way we should
utilize it. Our device has several crypto engines built in, so for maximum
utilization of the device we should feed it multiple crypto requests
simultaneously (I intended to do that using pcrypt), and this is the point
where everything freezes. On the other hand, I found that if I protect
write_request and read_response in mydriver_transform with one lock
(spin_lock(x) before write_request and spin_unlock(x) after read_response,
as shown in the following code snippet), I am able to run "iperf" in TCP mode
successfully. This leaves me uncertain, because in such a situation we only
utilize one crypto engine of the device: each request is followed by its
response sequentially, and requests and responses are not interleaved. So I
guess that submitting multiple requests to the device and receiving the
responses in a different order from the one in which they were submitted
might cause the TCP transfer to freeze, and here my question arises: if my
conclusion is true, SHOULD I change the driver approach to ablkcipher?


Code snippet showing how write_request and read_response are protected by
the lock so that iperf in TCP mode makes progress:


static inline int mydriver_transform(struct mydriver_aes_op *op, int alg)
{
	.
	.
	.
	spin_lock_irqsave(&glock, tflag);
	write_request(req_buf, req_len);
	kfree(req_buf);
	req_buf = NULL;
	err = read_response(&res_buf, my_req_id);
	spin_unlock_irqrestore(&glock, tflag);
	if (err == 0) {
		kfree(res_buf);
		res_buf = NULL;
 

Fwd: crypto accelerator driver problems

2010-12-21 Thread Hamid Nassiby
_aes2 fuse
nvidia(P) r8169 iTCO_wdt iTCO_vendor_support
[  200.040542]
[  200.040544] Pid: 10935, comm: iperf Tainted: P
2.6.36-zen1 #1 EP45-UD3P/EP45-UD3P
[  200.040546] RIP: 0010:[]  []
mydriver_transform+0x1a3/0x6a8 [mydriver_aes2]
[  200.040550] RSP: 0018:880072c5b898  EFLAGS: 00010246
[  200.040551] RAX: 880055a3a030 RBX: 0680 RCX: 05f0
[  200.040553] RDX: 0680 RSI:  RDI: 880055a3a030
[  200.040555] RBP:  R08: 0680 R09: 0018
[  200.040556] R10: 7d078004 R11: 00013234 R12: 0010
[  200.040558] R13: 0004 R14: eaef R15: 05f0
[  200.040561] FS:  41767950(0063) GS:880001a0()
knlGS:
[  200.040562] CS:  0010 DS:  ES:  CR0: 8005003b
[  200.040564] CR2:  CR3: 7409d000 CR4: 000406f0
[  200.040566] DR0:  DR1:  DR2: 
[  200.040568] DR3:  DR6: 0ff0 DR7: 0400
[  200.040570] Process iperf (pid: 10935, threadinfo 880072c5a000,
task 880006d09000)
[  200.040571] Stack:
[  200.040572]  880006d09000  817854f0
0020
[  200.040574] <0>  efea72c5b9e8 88007d4a7c58
0001810afac2
[  200.040577] <0>  88004d53dc00 880072c5b901
000f
[  200.040580] Call Trace:
[  200.040585]  [] ? need_resched+0x1a/0x23
[  200.040588]  [] ? mydriver_cbc_encrypt+0x7e/0x9c
[mydriver_aes2]
[  200.040592]  [] ? async_encrypt+0x35/0x3a
[  200.040595]  [] ? eseqiv_givencrypt+0x341/0x389
[  200.040598]  [] ? __skb_to_sgvec+0x49/0x1ea
[  200.040600]  [] ? __skb_to_sgvec+0x1b2/0x1ea
[  200.040603]  [] ? crypto_authenc_givencrypt+0x60/0x7c
[  200.040607]  [] ? esp_output+0x320/0x357
[  200.040610]  [] ? xfrm_output_resume+0x38d/0x48f
[  200.040613]  [] ? nf_hook_slow+0xc8/0xd9
[  200.040616]  [] ? ip_push_pending_frames+0x2cc/0x328
[  200.040619]  [] ? udp_push_pending_frames+0x2c4/0x342
[  200.040621]  [] ? udp_sendmsg+0x508/0x600
[  200.040623]  [] ? need_resched+0x1a/0x23
[  200.040627]  [] ? sock_aio_write+0xd5/0xe9
[  200.040630]  [] ? apic_timer_interrupt+0xe/0x20
[  200.040633]  [] ? do_sync_write+0xb0/0xf2
[  200.040636]  [] ? sched_clock+0x5/0x8
[  200.040639]  [] ? security_file_permission+0x18/0x67
[  200.040641]  [] ? vfs_write+0xbc/0x101
[  200.040643]  [] ? sys_write+0x45/0x6e
[  200.040646]  [] ? system_call_fastpath+0x16/0x1b
[  200.040647] Code: 83 c0 08 80 7c 24 50 01 75 10 48 89 c7 49 63 cc
48 8b 74 24 48 f3 a4 48 89 f8 48 89 c7 48 8b 74 24 40 41 0f b7 d8 49
63 cf 89 da  a4 4c 8b 2d 62 1d 00 00 48 8b 3d 53 1d 00 00 b1 01 48
8b 74
[  200.040668] RIP  []
mydriver_transform+0x1a3/0x6a8 [mydriver_aes2]
[  200.040671]  RSP 
[  200.040672] CR2: 
[  200.040733] ---[ end trace ae2865df0a025f7d ]---
[  221.687773] SysRq : Emergency Sync



BUT Iperf in TCP mode still has its own problems (the system freezes
with no response).

Thanks in advance,
Hamid.


--- mydriver1   2010-12-21 15:20:17.0 +0330
+++ mydriver2   2010-12-21 15:24:18.0 +0330
@@ -1,4 +1,3 @@
-
 static int
 mydriver_cbc_decrypt(struct blkcipher_desc *desc,
  struct scatterlist *dst, struct scatterlist *src,
@@ -14,18 +13,17 @@ mydriver_cbc_decrypt(struct blkcipher_desc
err = blkcipher_walk_virt(desc, &walk);
op->iv = walk.iv;

-   while((nbytes = walk.nbytes)) {
+   
op->src = walk.src.virt.addr,
op->dst = walk.dst.virt.addr;
op->mode = AES_MODE_CBC;
-   op->len = nbytes - (nbytes % AES_MIN_BLOCK_SIZE);
+   op->len = nbytes;
op->dir = AES_DIR_DECRYPT;
-   
ret = mydriver_transform(op, 0);

nbytes -= ret;
err = blkcipher_walk_done(desc, &walk, nbytes);
-   }
+   

return err;
 }
@@ -45,16 +43,17 @@ mydriver_cbc_encrypt(struct blkcipher_desc
err = blkcipher_walk_virt(desc, &walk);
op->iv = walk.iv;

-   while((nbytes = walk.nbytes)) {
+   
op->src = walk.src.virt.addr,
op->dst = walk.dst.virt.addr;
op->mode = AES_MODE_CBC;
-   op->len = nbytes - (nbytes % AES_MIN_BLOCK_SIZE);
+   op->len = nbytes;
op->dir = AES_DIR_ENCRYPT;
ret = mydriver_transform(op, 0);
nbytes -= ret;
err = blkcipher_walk_done(desc, &walk, nbytes);
-   }
+   

    return err;
 }
+

-- Forwarded message --
From: Hamid Nassiby 
Date: Sun, Dec 19, 2010 at 4:28 PM
Subject: crypto accelerator driver problems
To: linux-crypto@vger.kernel.org


Hi All,

In a research project, w

crypto accelerator driver problems

2010-12-19 Thread Hamid Nassiby
Hi All,

In a research project, we've developed a crypto accelerator based on the Xilinx
Virtex5 FPGA family. It is connected to the PC through a PCI-Express slot and is
used by IPSec to offload crypto processing from the CPU. The accelerator only
provides the AES and DES3_EDE algorithms, and I am responsible for writing its
driver. I based much of the driver on geode_aes.c, which is located in the
"drivers/crypto" subdirectory of the kernel source tree. Both algorithms are
registered as blkcipher, providing the cbc wrapper "cbc(aes)" just like the one
registered in geode_aes. Now, after months of work, the accelerator is ready
(correctness of the hardware operation has been verified by direct crypto tests,
not through IPSec), and it is time for the driver to give IPSec access to the
accelerator. On the first try I could get "ping" through the IPSec tunnel. One
end of the tunnel is equipped with our accelerator, and the other end uses the
kernel's native IPSec with the built-in AES and DES3_EDE algorithms. I am now
facing 2 problems:

1. Ping stops getting replies for packet sizes greater than 1426 bytes
(ping dest_ip -s 1427). I guessed that it might be an MTU problem, but reducing
the MTU with "ifconfig eth1 mtu xxx" or
"echo 1 > /proc/sys/net/ipv4/ip_no_pmtu_disc"
does not solve the problem. Also, when I ping each tunnel end from the other
end simultaneously with "ping other_node_ip -i 0.001", the kernel hangs
completely.

2. Iperf problem. When I try to measure the throughput of the IPSec gateway
equipped with our accelerator (AES-MD5) using iperf in TCP mode, the kernel
hangs so badly that sometimes even the "Magic SysRq key" does not respond, so I
could not trace the problem at all. Using iperf in UDP mode works, but I get
"UDP bad checksum" in the 'dmesg' output at the other end of the tunnel (native
IPSec and built-in kernel algorithms).

The two gateways are connected by a crossover cable, with no router or switch
between them that could cause MTU problems. pcrypt is not used in my tests so
far, and booting the kernel with nosmp (so no fear of thread contention) does
not change the situation.

I would appreciate your help in solving these problems. Below are the parts of
the driver that differ from geode_aes.c and might give useful information. If
required, I'll post the full driver source.
-- 

static struct crypto_alg mydriver_cbc_alg = {
       .cra_name               =       "cbc(aes)",
       .cra_driver_name        =       "cbc-aes-mydriver",
       .cra_priority           =       400,
       .cra_flags              =       CRYPTO_ALG_TYPE_BLKCIPHER |
                                       CRYPTO_ALG_NEED_FALLBACK,
       .cra_init               =       fallback_init_blk,
       .cra_exit               =       fallback_exit_blk,
       .cra_blocksize          =       AES_MIN_BLOCK_SIZE,
       .cra_ctxsize            =       sizeof(struct mydriver_aes_op),
       .cra_alignmask          =       15,
       .cra_type               =       &crypto_blkcipher_type,
       .cra_module             =       THIS_MODULE,
       .cra_list               =       LIST_HEAD_INIT(mydriver_cbc_alg.cra_list),
       .cra_u                  =       {
               .blkcipher      =       {
                       .min_keysize    =       AES_MIN_KEY_SIZE,
                       .max_keysize    =       AES_MIN_KEY_SIZE,
                       .setkey         =       mydriver_setkey_blk,
                       .encrypt        =       mydriver_cbc_encrypt,
                       .decrypt        =       mydriver_cbc_decrypt,
                       .ivsize         =       AES_IV_LENGTH,
               }
       }
};
//---
static int
mydriver_cbc_encrypt(struct blkcipher_desc *desc,
                 struct scatterlist *dst, struct scatterlist *src,
                 unsigned int nbytes)
{

       struct mydriver_aes_op *op = crypto_blkcipher_ctx(desc->tfm);
       struct blkcipher_walk walk;
       int err, ret;

       if (unlikely(op->keylen != AES_KEYSIZE_128))
               return fallback_blk_enc(desc, dst, src, nbytes);

       blkcipher_walk_init(&walk, dst, src, nbytes);
       err = blkcipher_walk_virt(desc, &walk);
       op->iv = walk.iv;

       while((nbytes = walk.nbytes)) {

               op->src = walk.src.virt.addr,
               op->dst = walk.dst.virt.addr;
               op->mode = AES_MODE_CBC;
               op->len = nbytes - (nbytes % AES_MIN_BLOCK_SIZE);
               op->dir = AES_DIR_ENCRYPT;
                       //ret = mydriver_aes_crypt(op);
               ret = mydriver_transform(op, 0);
               nbytes -= ret;
               err = blkcipher_walk_done(desc, &walk, nbytes);
       }

       return err;
}
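
The byte accounting in the loop above can be modelled in user space. This is
only a rough sketch under stated assumptions, not the kernel's blkcipher walk:
walk_total(), fake_transform() and chunk_max are hypothetical names, the
"walker" simply offers at most chunk_max bytes per step, and the transform is
assumed to always report every byte it was given as processed.

```c
#define BLOCK_SIZE 16u /* AES block size in bytes */

/* Hypothetical stand-in for mydriver_transform(): reports len bytes done. */
static unsigned int fake_transform(unsigned int len)
{
    return len;
}

/*
 * Model of the while loop in mydriver_cbc_encrypt(): each step is offered
 * up to chunk_max bytes, only whole AES blocks are handed to the transform,
 * and whatever is left over stays pending for the next step.
 * Returns the total number of bytes processed; *steps counts the iterations.
 */
static unsigned int walk_total(unsigned int total, unsigned int chunk_max,
                               unsigned int *steps)
{
    unsigned int remaining = total;
    unsigned int done = 0;

    *steps = 0;
    while (remaining > 0) {
        unsigned int nbytes = remaining < chunk_max ? remaining : chunk_max;
        /* Same rounding as op->len in the driver: drop any partial block. */
        unsigned int len = nbytes - (nbytes % BLOCK_SIZE);
        unsigned int ret;

        if (len == 0)
            break; /* only a partial block is left; nothing to process */

        ret = fake_transform(len);
        remaining -= ret;
        done += ret;
        (*steps)++;
    }
    return done;
}
```

In this model, walk_total(8224, 4096, &steps) processes all 8224 bytes in 3
steps, while a single pass without the loop would cover at most the first
4096-byte chunk. Whether that matches what happens in the real driver depends
on how the scatterlist is laid out, which this sketch does not model.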
/*
 * mydriver_transform builds a buffer containing the key, IV and data, plus an
 * additional header required by our accelerator, writes the buffer to the
 * accelerator by DMA and then reads the response from the hardware.
 */

static inline int