Re: What should be the algo priority
"authenc" and "hmac" are templates, not different implementations of a cipher. Please take a look at: https://kernel.readthedocs.io/en/sphinx-samples/crypto-API.html#terminology On Thu, Apr 6, 2017 at 9:56 AM, Harsh Jain wrote: > On Tue, Apr 4, 2017 at 6:07 PM, Stephan Müller wrote: >> Am Dienstag, 4. April 2017, 09:53:17 CEST schrieb Harsh Jain: >> >> Hi Harsh, >> >>> Hi, >>> >>> Do we have any guidelines documented to decide what should be the >>> algorithm priority. Specially for authenc implementation.Most of the >>> drivers have fixed priority for all algos. Problem comes in when we >>> have cbc(aes), hmac(sha1) and authenc(cbc(aes),hmac(sha1)) >>> implementation in driver. Base authenc driver gets more precedence >>> because of higher priority(enc->base.cra_priority * 10 + >>> auth_base->cra_priority;) >>> >>> What should be the priority of >>> cbc(aes), >>> hmac(sha1) >>> authenc(cbc(aes),hmac(sha1)) >> >> There is no general rule about the actual numbers. But commonly, the prios >> are >> set such that the prios of C implementations < ASM implementations < hardware >> accelerators. The idea is to give users the fastest implementation there is >> for his particular system. > > It means cbc, hmac should have smaller(nearly 10 times less) priority > than their authenc implementation otherwise request will not offload > to driver because sw authenc priority is (aes * 10 + hmac). > >> >> Ciao >> Stephan
Re: ARM-CE aes encryption on uneven blocks
Hi,

Based on my old experience with "struct crypto_alg" based drivers, the data you receive there is padded beforehand (in the upper layers); therefore the plaintext contains an integral multiple of the AES block size, and the crypto transform can be computed from the number of blocks.

Regards,
Hamid

On Mon, Oct 24, 2016 at 6:11 PM, Cata Vasile wrote:
>
> Hi,
>
> I'm trying to understand the code for AES encryption from ARM-CE.
> From the aes-glue.S calls I understand that the encryption primitives receive the number of
> blocks, but have no way of determining the number of bytes to encrypt if, for example, the
> plaintext does not have a length that is a multiple of the AES block size.
> How does, for example, ecb_encrypt() also encrypt the last remaining bytes in the plaintext,
> if it is not a multiple of the AES block size and it can never deduce the full plaintext size?
>
> Catalin Vasile
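A minimal sketch of the arithmetic Hamid describes (the function name is illustrative, this is not the actual arm64 glue code): because the upper layers only hand down block-aligned data for ECB/CBC, the glue layer can derive a whole number of blocks from the byte count and never needs the "full plaintext size".

/* Hedged sketch, not the real aes-glue code: the walk hands the glue layer a
 * byte count that is already a multiple of AES_BLOCK_SIZE for ECB/CBC, so the
 * block count passed down to the assembly core is simply a division. */
#define AES_BLOCK_SIZE 16

static unsigned int bytes_to_blocks(unsigned int nbytes)
{
        /* nbytes % AES_BLOCK_SIZE is expected to be 0 here */
        return nbytes / AES_BLOCK_SIZE;
}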
Re: IV generation in geode-aes
Hi,

There is no IV generation mechanism in geode-aes. In fact, the IV is delivered to geode-aes's registered algorithms from the upper layers. For example, in the case of the "cbc-aes-geode" algorithm, it comes from the cbc wrapper ("cbc.c") via walk->iv:

        blkcipher_walk_init(&walk, dst, src, nbytes);
        err = blkcipher_walk_virt(desc, &walk);
        op->iv = walk.iv;
        ...

Regards.

On Thu, Sep 19, 2013 at 4:34 PM, Sohail wrote:
> Hi all,
> I couldn't understand the mechanism of IV generation in geode-aes. Can someone explain it in
> an easy to understand manner?
>
> Thanks a lot.
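A slightly fuller sketch of the pattern quoted above (modelled on the geode-aes style; example_cbc_encrypt is an illustrative name, not the verbatim driver): the driver never creates an IV, it only picks up whatever the upper layer placed in the walk.

/* Hedged sketch in the old blkcipher style: the IV arrives from the caller
 * (blkcipher_walk copies it into walk.iv) and the driver merely forwards it
 * to its hardware context. */
static int example_cbc_encrypt(struct blkcipher_desc *desc,
                               struct scatterlist *dst,
                               struct scatterlist *src,
                               unsigned int nbytes)
{
        struct geode_aes_op *op = crypto_blkcipher_ctx(desc->tfm);
        struct blkcipher_walk walk;
        int err;

        blkcipher_walk_init(&walk, dst, src, nbytes);
        err = blkcipher_walk_virt(desc, &walk);
        op->iv = walk.iv;   /* IV supplied by the upper layer, not generated here */

        /* ... per-block processing would follow, as in geode-aes ... */
        return err;
}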
Re: Old PADATA patch vs crypto-2.6 tree
You must instantiate pcrypt using the crconf tool or the tcrypt module.

On Wed, Mar 28, 2012 at 4:23 PM, Sebastien Agnolini wrote:
>
> Hey,
>
> How do I activate the IPsec parallelization?
> I compiled the crypto-2.6 kernel with these params:
> CONFIG_CRYPTO_... = y
> CONFIG_PADATA = y
> CONFIG_SMP = y
> After installation on 2 servers (IPsec tunnel), I don't detect any IPsec parallelization.
> The algorithm is loaded (present in /proc/crypto), but only one core works.
>
> So, what are the other parameters that I forgot for the compilation of the kernel? IRQ, IO,
> scheduler parameters... Am I missing something?
> I thought that the parallelization was started automatically. True?
> What are the conditions to observe parallel work?
> A "little" documentation would be welcome.
>
> I'd like to compare the bandwidth of my test platform using the « old » PADATA patch.
>
> Sebastien
Re: Fwd: crypto accelerator driver problems
On 10/11/11, Steffen Klassert wrote:
>
> I can't tell much when looking at this code snippet. One guess would be that someone (maybe
> you) has set the CRYPTO_TFM_REQ_MAY_SLEEP flag, as blkcipher_walk_done calls crypto_yield(),
> which in turn might call schedule() if this flag is set. pcrypt removes this flag explicitly.
>
I've not set such a flag.
>
> Basically, the bottom halves are off to keep up with the network softirqs. They run with much
> higher priority and would interrupt the parallel workers frequently.
>
Do you mean that with BHs on, we would only see some performance degradation?

Thanks for your reply. Any other idea?
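For reference, a minimal sketch of the behaviour Steffen describes (an assumption about the generic code path, with an illustrative function name, not a quote from the kernel source): crypto_yield() only reschedules when the request carries CRYPTO_TFM_REQ_MAY_SLEEP, which is why pcrypt strips the flag before handing requests to its parallel workers.

/* Hedged sketch of the flag check being discussed. */
static void crypto_yield_sketch(u32 flags)
{
        if (flags & CRYPTO_TFM_REQ_MAY_SLEEP)
                cond_resched();   /* may schedule(); illegal from atomic context */
}

/* A caller that must stay atomic would mask the flag out, e.g.:
 * desc.flags = req_flags & ~CRYPTO_TFM_REQ_MAY_SLEEP; */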
Re: Fwd: crypto accelerator driver problems
On Tue, Oct 4, 2011 at 11:27 AM, Steffen Klassert wrote:
>
> On Sat, Oct 01, 2011 at 12:38:19PM +0330, Hamid Nassiby wrote:
> >
> > And my_cbc_encrypt function as PSEUDO/real code (for simplicity of representation) is as:
> >
> > static int
> > my_cbc_encrypt(struct blkcipher_desc *desc,
> >                struct scatterlist *dst, struct scatterlist *src,
> >                unsigned int nbytes)
> > {
> >         SOME__common_preparation_and_initializations;
> >
> >         spin_lock_irqsave(&mylock, myflags);
> >         send_request_to_device(&dev);  /* sends request to device. After
> >                                           processing request, device writes
> >                                           result to destination */
> >         while (!readl(complete_flag)); /* here we wait for a flag in device
> >                                           register space indicating completion */
> >         spin_unlock_irqrestore(&mylock, myflags);
> > }
>
> As I told you already in the private mail, it makes not too much sense to parallelize the
> crypto layer and to hold a global lock during the crypto operation. So if you really need
> this lock, you are much better off without a parallelization.
>

Hi Steffen,

Thanks for your reply :). It makes sense in two ways:

1. If the request transmit time to the device is much shorter than the request processing time spent in the device, and the device has more than one processing engine.

2. It can also be advantageous when the device has only one processing engine and we have multiple blkcipher requests pending behind the entrance port of the device, because the delay between request entrances to the device will be shorter.

The overall advantage will be that our IPsec throughput gets nearer to our device's bulk encryption throughput. (It is interesting to note that with our current driver and device configuration, if I test gateway throughput with traffic belonging to two SAs traveling through the one link that connects them, I get a rate of about 280Mbps, an 80Mbps increase in comparison with one SA's traffic, while our device's bulk processing is about 400Mbps.) Currently we want to take advantage of the latter case and then extend it.

>
> >
> > With the above code, I can successfully test an IPsec gateway equipped with our hardware and
> > get a 200Mbps throughput using iperf. Now I am facing another problem. As I mentioned earlier,
> > our hardware has 4 AES engines built in. With the above code I only utilize one of them.
> > From this point, we want to go a step further and utilize more than one AES engine of our
> > device. The simplest solution appears to be to deploy pcrypt/padata, made by Steffen Klassert.
> > First instantiate in a dual core gateway:
> > modprobe tcrypt alg="pcrypt(authenc(hmac(md5),cbc(aes)))" type=3
> > and test again. Running iperf now gives me a very low throughput of about 20Mbps while dmesg
> > shows the following:
> >
> > BUG: workqueue leaked lock or atomic: kworker/0:1/0x0001/10
> >     last function: padata_parallel_worker+0x0/0x80
>
> This looks like the parallel worker exited in atomic context,
> but I can't tell you much more as long as you don't show us your code.
OK, I represented the code as PSEUDO code, just to simplify and concentrate the problem's aspects ;) (but it is also possible that I've concentrated it in the wrong way :D). This is the my_cbc_encrypt code and the functions it calls, bottom-up:

int write_request(u8 *buff, unsigned int count)
{
        u32 tlp_size = 32;
        struct my_dma_desc *desc_table = (struct my_dma_desc *)global_bar[0];

        tlp_size = (count / 128) | (tlp_size << 16);
        memcpy(g_mydev->rdmaBuf_va, buff, count);
        wmb();
        writel(cpu_to_le32(tlp_size), (&desc_table->wdmaperf));
        wmb();
        while ((readl(&desc_table->ddmacr) | 0x) != 0x0101); /* wait for transfer
                                                                completion */
        return 0;
}

int my_transform(struct my_aes_op *op, int alg)
{
        int req_len, err;
        unsigned long iflagsq, tflag;
        u8 *req_buf = NULL, *res_buf = NULL;
        alg_operation operation;

        if (op->len == 0)
                return 0;
        operation = !(op->dir);
        create_request(alg, op->mode, operation, 0, op->key, op->iv, op->src,
                       op->len, &req_buf, &req_len); /* add header to original
                                                        request and copy it to
                                                        req_buf */
        spin_lock_irqsave(&glock, tflag);
Re: Fwd: crypto accelerator driver problems
Hi all,

Referring to my previous posts on the crypto list related to our hardware AES accelerator project, I finally could deploy the device in IPsec successfully. As I mentioned earlier, my driver registers itself in the kernel as a blkcipher for cbc(aes) as follows:

static struct crypto_alg my_cbc_alg = {
        .cra_name               = "cbc(aes)",
        .cra_driver_name        = "cbc-aes-my",
        .cra_priority           = 400,
        .cra_flags              = CRYPTO_ALG_TYPE_BLKCIPHER |
                                  CRYPTO_ALG_NEED_FALLBACK,
        .cra_init               = fallback_init_blk,
        .cra_exit               = fallback_exit_blk,
        .cra_blocksize          = AES_MIN_BLOCK_SIZE,
        .cra_ctxsize            = sizeof(struct my_aes_op),
        .cra_alignmask          = 15,
        .cra_type               = &crypto_blkcipher_type,
        .cra_module             = THIS_MODULE,
        .cra_list               = LIST_HEAD_INIT(my_cbc_alg.cra_list),
        .cra_u                  = {
                .blkcipher      = {
                        .min_keysize    = AES_MIN_KEY_SIZE,
                        .max_keysize    = AES_MIN_KEY_SIZE,
                        .setkey         = my_setkey_blk,
                        .encrypt        = my_cbc_encrypt,
                        .decrypt        = my_cbc_decrypt,
                        .ivsize         = AES_IV_LENGTH,
                }
        }
};

And the my_cbc_encrypt function, as PSEUDO/real code (for simplicity of representation), is:

static int
my_cbc_encrypt(struct blkcipher_desc *desc,
               struct scatterlist *dst, struct scatterlist *src,
               unsigned int nbytes)
{
        SOME__common_preparation_and_initializations;

        spin_lock_irqsave(&mylock, myflags);
        send_request_to_device(&dev);   /* sends request to device. After
                                           processing request, device writes
                                           result to destination */
        while (!readl(complete_flag));  /* here we wait for a flag in device
                                           register space indicating completion */
        spin_unlock_irqrestore(&mylock, myflags);
}

With the above code, I can successfully test an IPsec gateway equipped with our hardware and get a 200Mbps throughput using iperf. Now I am facing another problem. As I mentioned earlier, our hardware has 4 AES engines built in. With the above code I only utilize one of them.
From this point, we want to go a step further and utilize more than one AES engine of our device. The simplest solution appears to be to deploy pcrypt/padata, made by Steffen Klassert. First instantiate in a dual core gateway:

modprobe tcrypt alg="pcrypt(authenc(hmac(md5),cbc(aes)))" type=3

and test again. Running iperf now gives me a very low throughput of about 20Mbps while dmesg shows the following:

BUG: workqueue leaked lock or atomic: kworker/0:1/0x0001/10
    last function: padata_parallel_worker+0x0/0x80
Pid: 10, comm: kworker/0:1 Not tainted 2.6.37 #1
Call Trace:
 [] ? printk+0x18/0x1b
 [] process_one_work+0x177/0x370
 [] ? padata_parallel_worker+0x0/0x80
 [] worker_thread+0x127/0x390
 [] ? worker_thread+0x0/0x390
 [] kthread+0x74/0x80
 [] ? kthread+0x0/0x80
 [] kernel_thread_helper+0x6/0x10
BUG: scheduling while atomic: kworker/0:1/10/0x0002
Modules linked in: pcrypt my_aes2 binfmt_misc bridge stp bnep sco rfcomm l2cap crc16 bluetooth rfkill ppdev acpi_cpufreq mperf cpufreq_stats cpufreq_conservative cpufreq_ondemand cpufreq_userspace cpufreq_powersave freq_table pci_slot sbs container video output sbshc battery iptable_filter ip_tables x_tables decnet ctr twofish_i586 twofish_generic twofish_common camellia serpent blowfish cast5 aes_i586 aes_generic xcbc rmd160 sha512_generic sha256_generic crypto_null af_key ac lp snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_pcm_oss evdev snd_mixer_oss snd_pcm psmouse serio_raw snd_seq_dummy pcspkr parport_pc parport snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event option usb_wwan snd_seq usbserial snd_timer snd_seq_device button processor iTCO_wdt iTCO_vendor_support snd intel_agp soundcore intel_gtt snd_page_alloc agpgart shpchp pci_hotplug ext3 jbd mbcache sr_mod cdrom sd_mod sg ata_generic pata_jmicron ata_piix pata_acpi libata floppy r8169 mii scsi_mod uhci_hcd ehci_hcd usbcore thermal fan fuse
Pid: 10, comm: kworker/0:1 Not tainted 2.6.37 #1
Call Trace:
 [] __schedule_bug+0x59/0x70
 [] schedule+0x6a7/0xa70
 [] ? show_trace_log_lvl+0x47/0x60
 [] ? dump_stack+0x6e/0x75
 [] ? process_one_work+0x1c8/0x370
 [] ? padata_parallel_worker+0x0/0x
Re: Fwd: crypto accelerator driver problems
On Thu, Jan 27, 2011 at 3:03 AM, Herbert Xu wrote:
>
> On Wed, Jan 26, 2011 at 11:20:22AM +0330, Hamid Nassiby wrote:
> >
> > Do you mean that different IP packets fit into one single Block Cipher tfm?
> > Would you please explain expansively?
>
> We allocate one tfm per SA. So as long as ordering is guaranteed per tfm then it's guaranteed
> per SA, which is all that's needed.
>
> Cheers,
> --
> Email: Herbert Xu
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Dears,

Referring to my previous posts related to a hardware AES accelerator driver (which is to be used to accelerate IPsec block cipher operations), I would like to ask you about a possibly algorithmic problem in our solution. As I said earlier, our driver is inspired by the geode_aes driver, so assume that we have defined our supported algorithm as:

static struct crypto_alg shams_cbc_alg = {
        .cra_name               = "cbc(aes)",
        .cra_driver_name        = "cbc-aes-mine",
        .cra_priority           = 400,
        .cra_flags              = CRYPTO_ALG_TYPE_BLKCIPHER |
                                  CRYPTO_ALG_NEED_FALLBACK,
        .cra_init               = fallback_init_blk,
        .cra_exit               = fallback_exit_blk,
        .cra_blocksize          = AES_MIN_BLOCK_SIZE,
        .cra_ctxsize            = sizeof(struct my_aes_op),
        .cra_alignmask          = 0,
        .cra_type               = &crypto_blkcipher_type,
        .cra_module             = THIS_MODULE,
        .cra_list               = LIST_HEAD_INIT(shams_cbc_alg.cra_list),
        .cra_u                  = {
                .blkcipher      = {
                        .min_keysize    = AES_MIN_KEY_SIZE,
                        .max_keysize    = AES_MIN_KEY_SIZE,
                        .setkey         = my_setkey_blk,
                        .encrypt        = my_cbc_encrypt,
                        .decrypt        = my_cbc_decrypt,
                        .ivsize         = AES_IV_LENGTH,
                }
        }
};

And our encrypt function, my_cbc_encrypt, looks like:

static int
my_cbc_encrypt(struct blkcipher_desc *desc,
               struct scatterlist *dst, struct scatterlist *src,
               unsigned int nbytes)
{
        struct my_aes_op *op = crypto_blkcipher_ctx(desc->tfm);
        struct blkcipher_walk walk;
        int err, ret;
        unsigned long flag1, c2flag;
        u32 my_req_id;

        spin_lock_irqsave(&reqlock, c2flag);
        /* Our request id, sent to the device and then retrieved, to be able to
           distinguish between device responses. */
        my_req_id = (global_reqid++) % 63000;
        spin_unlock_irqrestore(&reqlock, c2flag);

        if (unlikely(op->keylen != AES_KEYSIZE_128))
                return fallback_blk_enc(desc, dst, src, nbytes);

        blkcipher_walk_init(&walk, dst, src, nbytes);
        err = blkcipher_walk_virt(desc, &walk);
        op->iv = walk.iv;

        while ((nbytes = walk.nbytes)) {
                op->src = walk.src.virt.addr,
                op->dst = walk.dst.virt.addr;
                op->mode = AES_MODE_CBC;
                op->len = nbytes /*- (nbytes % AES_MIN_BLOCK_SIZE)*/;
                op->dir = AES_DIR_ENCRYPT;

                /* Critical PSEUDO code */
                spin_lock_irqsave(&lock1, flag1);
                write_to_device(op, 0, my_req_id);
                spin_unlock_irqrestore(&lock1, flag1);

                spin_lock_irqsave(&lock1, flag1);
                ret = read_from_device(op, 0, my_req_id);
                spin_unlock_irqrestore(&lock1, flag1);
                /* End of Critical PSEUDO code */

                nbytes -= ret;
                err = blkcipher_walk_done(desc, &walk, nbytes);
        }
        return err;
}

As I mentioned earlier, we have multiple AES engines in our hardware, so to utilize the hardware as much as possible, we would like the possibility to give multiple requests to the device and get responses from it as soon as each one becomes ready. Now look at the section of my_cbc_encrypt commented as "Critical PSEUDO code". This section gives requests to the device and reads back responses (and is the damn bottleneck).
If we protect the write_to_device and read_from_device calls with one pair of lock/unlock, as in:

        /* Critical PSEUDO code */
        spin_lock_irqsave(&lock1, flag1);
        write_to_device(op, 0, my_req_id);
        ret = read_from_device(op, 0, my_req_id);
        spin_unlock_irqrestore(&lock1, flag1);
        /* End of Critical PSEUDO code */

then we have no problem: the system works and IPsec en/decrypts via our hardware. But ONL
Re: Fwd: crypto accelerator driver problems
On Wed, Jan 26, 2011 at 10:39 AM, Herbert Xu wrote:
> On Wed, Jan 26, 2011 at 10:26:33AM +0330, Hamid Nassiby wrote:
>>
>> As you know, I posted my problem to the crypto list again and no one answered. Now I emphasize
>> one aspect of the problem as a concept related to the IPsec protocol, independent of my
>> problem's nature, and I hope to get some guidelines this time. The question is the following:
>> if IPsec delivers IP packets to a hardware crypto accelerator in sequential order (e.g.,
>> packets in order: 1, 2, 3, ..., 36, 37, 38, ...) and the crypto accelerator possibly returns
>> packets to IPsec out of the order in which they entered (e.g., packet 37 is returned before
>> packet 36, so the order of packets is not the same before entering the crypto accelerator and
>> after exiting it), is it possible for any problem to arise here?
>
> We do not allow such reordering. All crypto drivers must ensure ordering within a single tfm.
> Between different tfms there is no ordering requirement.
>
> Cheers,
> --
> Email: Herbert Xu
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
>

Do you mean that different IP packets fit into one single block cipher tfm? Would you please explain expansively?

Thanks a lot,
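A minimal sketch of the "one tfm per SA" point Herbert makes (an illustration with an assumed helper name, not the actual xfrm/ESP code): each SA allocates its own transform at setup time, so a driver only has to complete requests in submission order per tfm to preserve per-SA ordering.

/* Hedged sketch: one transform per SA, in the old blkcipher API used
 * throughout this thread. All packets of a given SA then go through this
 * single tfm, so per-tfm ordering is per-SA ordering. */
static struct crypto_blkcipher *sa_alloc_tfm(void)
{
        return crypto_alloc_blkcipher("cbc(aes)", 0, 0);  /* done once per SA */
}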
Re: Fwd: crypto accelerator driver problems
On Sat, Jan 8, 2011 at 11:09 AM, Hamid Nassiby wrote:
>
> On Fri, Dec 31, 2010 at 12:49 AM, Herbert Xu wrote:
> >
> > Hamid Nassiby wrote:
> > > Hi,
> > >
> > > As some good news and additional information, with the following patch I no longer get the
> > > "UDP bad checksum" error I mentioned earlier with iperf in UDP mode. But sometimes I get the
> > > following call trace in dmesg after running iperf in UDP mode more than once (and of course
> > > iperf stops transferring data while it uses 100% of the CPU cycles):
> > >
> > > [  130.171909] mydriver-aes: mydriver Crypto-Engine enabled.
> > > [  134.767846] NET: Registered protocol family 15
> > > [  200.031846] iperf: page allocation failure. order:0, mode:0x20
> > > [  200.031850] Pid: 10935, comm: iperf Tainted: P 2.6.36-zen1 #1
> > > [  200.031852] Call Trace:
> > > [  200.031860] [] ? __alloc_pages_nodemask+0x6d3/0x722
> > > [  200.031864] [] ? virt_to_head_page+0x9/0x30
> > > [  200.031867] [] ? alloc_pages_current+0xa5/0xce
> > > [  200.031869] [] ? __get_free_pages+0x9/0x46
> > > [  200.031872] [] ? need_resched+0x1a/0x23
> > > [  200.031876] [] ? blkcipher_walk_next+0x68/0x2d9
> >
> > This means that your box has run out of memory temporarily.
> > If all errors were handled correctly it should continue at this point.
> >
> > > --- mydriver1 2010-12-21 15:20:17.0 +0330
> > > +++ mydriver2 2010-12-21 15:24:18.0 +0330
> > > @@ -1,4 +1,3 @@
> > > -
> > >  static int
> > >  mydriver_cbc_decrypt(struct blkcipher_desc *desc,
> > >                 struct scatterlist *dst, struct scatterlist *src,
> > > @@ -14,18 +13,17 @@ mydriver_cbc_decrypt(struct blkcipher_desc
> > >         err = blkcipher_walk_virt(desc, &walk);
> > >         op->iv = walk.iv;
> > >
> > > -       while((nbytes = walk.nbytes)) {
> > > +
> >
> > However, your patch removes the error checking (and the loop condition) which is why it
> > crashes.
> >
> > Cheers,
> > --
> > Email: Herbert Xu
> > Home Page: http://gondor.apana.org.au/~herbert/
> > PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
>
> Hi Herbert,
>
> First I should note that by removing the while loop iteration, the "UDP bad checksum" error in
> the dmesg output is no longer seen. Diving deeper into the problem, it seemed to me that when
> mydriver_transform returns 0, I must not get any more bytes (belonging to the previous request)
> to process in the next iteration of the while loop. But I see that the behavior is not as it
> has to be (by removing the while loop, mydriver_transform gets, for example, one 1500 byte
> request, processes it and copies it back to the destination; but with the while loop present it
> gets the same request as one 1300 byte request, processes it and copies it back to the
> destination, returning 0, and gets the remaining 200 bytes of the request in the second
> iteration of the while loop, so on the other end of the tunnel I see "UDP bad checksum"). So I
> conclude that blkcipher_walk_done behaves strangely and assigns an incorrect value to
> walk.nbytes, resulting in the while loop iterating one extra time!
>
> My second note is about our accelerator's architecture and the way we should utilize it. Our
> device has several crypto engines built in. So for maximum utilization of the device we should
> feed it with multiple crypto requests simultaneously (I intended to do this by using pcrypt),
> and here is the point where everything freezes.
> From another point of view, I found that if I protect entering write_request and read_response
> in mydriver_transform with one lock (spin_lock(x) before write_request and spin_unlock(x) after
> read_response in mydriver_transform, as shown in the following code snippet), I am able to run
> iperf in TCP mode successfully. This leads me to uncertainty, because in such a situation we
> only utilize one crypto engine of the device, each request is followed by its response
> sequentially, and the arrangement of requests and responses is not interleaved. So I guess that
> giving multiple requests to the device and receiving the responses in an arrangement different
> from the one in which they were delivered might cause the TCP transfer to freeze, and here my
> question arises: if my conclusion is true, SHOULD I change the driver approach to ablkcipher?
>
> Code snippet in the way write_request and read_response are protected by lock and iperf in
Re: Fwd: crypto accelerator driver problems
On Fri, Dec 31, 2010 at 12:49 AM, Herbert Xu wrote:
>
> Hamid Nassiby wrote:
> > Hi,
> >
> > As some good news and additional information, with the following patch I no longer get the
> > "UDP bad checksum" error I mentioned earlier with iperf in UDP mode. But sometimes I get the
> > following call trace in dmesg after running iperf in UDP mode more than once (and of course
> > iperf stops transferring data while it uses 100% of the CPU cycles):
> >
> > [  130.171909] mydriver-aes: mydriver Crypto-Engine enabled.
> > [  134.767846] NET: Registered protocol family 15
> > [  200.031846] iperf: page allocation failure. order:0, mode:0x20
> > [  200.031850] Pid: 10935, comm: iperf Tainted: P 2.6.36-zen1 #1
> > [  200.031852] Call Trace:
> > [  200.031860] [] ? __alloc_pages_nodemask+0x6d3/0x722
> > [  200.031864] [] ? virt_to_head_page+0x9/0x30
> > [  200.031867] [] ? alloc_pages_current+0xa5/0xce
> > [  200.031869] [] ? __get_free_pages+0x9/0x46
> > [  200.031872] [] ? need_resched+0x1a/0x23
> > [  200.031876] [] ? blkcipher_walk_next+0x68/0x2d9
>
> This means that your box has run out of memory temporarily.
> If all errors were handled correctly it should continue at this point.
>
> > --- mydriver1 2010-12-21 15:20:17.0 +0330
> > +++ mydriver2 2010-12-21 15:24:18.0 +0330
> > @@ -1,4 +1,3 @@
> > -
> >  static int
> >  mydriver_cbc_decrypt(struct blkcipher_desc *desc,
> >                 struct scatterlist *dst, struct scatterlist *src,
> > @@ -14,18 +13,17 @@ mydriver_cbc_decrypt(struct blkcipher_desc
> >         err = blkcipher_walk_virt(desc, &walk);
> >         op->iv = walk.iv;
> >
> > -       while((nbytes = walk.nbytes)) {
> > +
>
> However, your patch removes the error checking (and the loop condition) which is why it
> crashes.
>
> Cheers,
> --
> Email: Herbert Xu
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Hi Herbert,

First I should note that by removing the while loop iteration, the "UDP bad checksum" error in the dmesg output is no longer seen. Diving deeper into the problem, it seemed to me that when mydriver_transform returns 0, I must not get any more bytes (belonging to the previous request) to process in the next iteration of the while loop. But I see that the behavior is not as it has to be (by removing the while loop, mydriver_transform gets, for example, one 1500 byte request, processes it and copies it back to the destination; but with the while loop present it gets the same request as one 1300 byte request, processes it and copies it back to the destination, returning 0, and gets the remaining 200 bytes of the request in the second iteration of the while loop, so on the other end of the tunnel I see "UDP bad checksum"). So I conclude that blkcipher_walk_done behaves strangely and assigns an incorrect value to walk.nbytes, resulting in the while loop iterating one extra time!

My second note is about our accelerator's architecture and the way we should utilize it. Our device has several crypto engines built in. So for maximum utilization of the device we should feed it with multiple crypto requests simultaneously (I intended to do this by using pcrypt), and here is the point where everything freezes. From another point of view, I found that if I protect entering write_request and read_response in mydriver_transform with one lock (spin_lock(x) before write_request and spin_unlock(x) after read_response in mydriver_transform, as shown in the following code snippet), I am able to run iperf in TCP mode successfully.
This leads me to uncertainty, because in such a situation we only utilize one crypto engine of the device, each request is followed by its response sequentially, and the arrangement of requests and responses is not interleaved. So I guess that giving multiple requests to the device and receiving the responses in an arrangement different from the one in which they were delivered might cause the TCP transfer to freeze, and here my question arises: if my conclusion is true, SHOULD I change the driver approach to ablkcipher?

Code snippet showing the way write_request and read_response are protected by the lock, with which iperf in TCP mode progresses:

static inline int
mydriver_transform(struct mydriver_aes_op *op, int alg)
{
        .
        .
        .
        spin_lock_irqsave(&glock, tflag);

        write_request(req_buf, req_len);
        kfree(req_buf);
        req_buf = NULL;

        err = read_response(&res_buf, my_req_id);

        spin_unlock_irqrestore(&glock, tflag2);
        if (err == 0) {
                kfree(res_buf);
                res_buf = NULL;
Fwd: crypto accelerator driver problems
_aes2 fuse nvidia(P) r8169 iTCO_wdt iTCO_vendor_support
[ 200.040542]
[ 200.040544] Pid: 10935, comm: iperf Tainted: P 2.6.36-zen1 #1 EP45-UD3P/EP45-UD3P
[ 200.040546] RIP: 0010:[] [] mydriver_transform+0x1a3/0x6a8 [mydriver_aes2]
[ 200.040550] RSP: 0018:880072c5b898 EFLAGS: 00010246
[ 200.040551] RAX: 880055a3a030 RBX: 0680 RCX: 05f0
[ 200.040553] RDX: 0680 RSI: RDI: 880055a3a030
[ 200.040555] RBP: R08: 0680 R09: 0018
[ 200.040556] R10: 7d078004 R11: 00013234 R12: 0010
[ 200.040558] R13: 0004 R14: eaef R15: 05f0
[ 200.040561] FS: 41767950(0063) GS:880001a0() knlGS:
[ 200.040562] CS: 0010 DS: ES: CR0: 8005003b
[ 200.040564] CR2: CR3: 7409d000 CR4: 000406f0
[ 200.040566] DR0: DR1: DR2:
[ 200.040568] DR3: DR6: 0ff0 DR7: 0400
[ 200.040570] Process iperf (pid: 10935, threadinfo 880072c5a000, task 880006d09000)
[ 200.040571] Stack:
[ 200.040572] 880006d09000 817854f0 0020
[ 200.040574] <0> efea72c5b9e8 88007d4a7c58 0001810afac2
[ 200.040577] <0> 88004d53dc00 880072c5b901 000f
[ 200.040580] Call Trace:
[ 200.040585] [] ? need_resched+0x1a/0x23
[ 200.040588] [] ? mydriver_cbc_encrypt+0x7e/0x9c [mydriver_aes2]
[ 200.040592] [] ? async_encrypt+0x35/0x3a
[ 200.040595] [] ? eseqiv_givencrypt+0x341/0x389
[ 200.040598] [] ? __skb_to_sgvec+0x49/0x1ea
[ 200.040600] [] ? __skb_to_sgvec+0x1b2/0x1ea
[ 200.040603] [] ? crypto_authenc_givencrypt+0x60/0x7c
[ 200.040607] [] ? esp_output+0x320/0x357
[ 200.040610] [] ? xfrm_output_resume+0x38d/0x48f
[ 200.040613] [] ? nf_hook_slow+0xc8/0xd9
[ 200.040616] [] ? ip_push_pending_frames+0x2cc/0x328
[ 200.040619] [] ? udp_push_pending_frames+0x2c4/0x342
[ 200.040621] [] ? udp_sendmsg+0x508/0x600
[ 200.040623] [] ? need_resched+0x1a/0x23
[ 200.040627] [] ? sock_aio_write+0xd5/0xe9
[ 200.040630] [] ? apic_timer_interrupt+0xe/0x20
[ 200.040633] [] ? do_sync_write+0xb0/0xf2
[ 200.040636] [] ? sched_clock+0x5/0x8
[ 200.040639] [] ? security_file_permission+0x18/0x67
[ 200.040641] [] ? vfs_write+0xbc/0x101
[ 200.040643] [] ? sys_write+0x45/0x6e
[ 200.040646] [] ? system_call_fastpath+0x16/0x1b
[ 200.040647] Code: 83 c0 08 80 7c 24 50 01 75 10 48 89 c7 49 63 cc 48 8b 74 24 48 f3 a4 48 89 f8 48 89 c7 48 8b 74 24 40 41 0f b7 d8 49 63 cf 89 da a4 4c 8b 2d 62 1d 00 00 48 8b 3d 53 1d 00 00 b1 01 48 8b 74
[ 200.040668] RIP [] mydriver_transform+0x1a3/0x6a8 [mydriver_aes2]
[ 200.040671] RSP
[ 200.040672] CR2:
[ 200.040733] ---[ end trace ae2865df0a025f7d ]---
[ 221.687773] SysRq : Emergency Sync

BUT iperf in TCP mode still has its own problems (the system freezes with no response).

Thanks in advance,
Hamid.
--- mydriver1	2010-12-21 15:20:17.0 +0330
+++ mydriver2	2010-12-21 15:24:18.0 +0330
@@ -1,4 +1,3 @@
-
 static int
 mydriver_cbc_decrypt(struct blkcipher_desc *desc,
                 struct scatterlist *dst, struct scatterlist *src,
@@ -14,18 +13,17 @@ mydriver_cbc_decrypt(struct blkcipher_desc
        err = blkcipher_walk_virt(desc, &walk);
        op->iv = walk.iv;

-       while((nbytes = walk.nbytes)) {
+
        op->src = walk.src.virt.addr,
        op->dst = walk.dst.virt.addr;
        op->mode = AES_MODE_CBC;
-       op->len = nbytes - (nbytes % AES_MIN_BLOCK_SIZE);
+       op->len = nbytes;
        op->dir = AES_DIR_DECRYPT;
-
        ret = mydriver_transform(op, 0);
        nbytes -= ret;
        err = blkcipher_walk_done(desc, &walk, nbytes);
-       }
+
        return err;
 }
@@ -45,16 +43,17 @@ mydriver_cbc_encrypt(struct blkcipher_desc
        err = blkcipher_walk_virt(desc, &walk);
        op->iv = walk.iv;

-       while((nbytes = walk.nbytes)) {
+
        op->src = walk.src.virt.addr,
        op->dst = walk.dst.virt.addr;
        op->mode = AES_MODE_CBC;
-       op->len = nbytes - (nbytes % AES_MIN_BLOCK_SIZE);
+       op->len = nbytes;
        op->dir = AES_DIR_ENCRYPT;
        ret = mydriver_transform(op, 0);
        nbytes -= ret;
        err = blkcipher_walk_done(desc, &walk, nbytes);
-       }
+
        return err;
 }
+

-- Forwarded message --
From: Hamid Nassiby
Date: Sun, Dec 19, 2010 at 4:28 PM
Subject: crypto accelerator driver problems
To: linux-crypto@vger.kernel.org

Hi All,

In a research project, w
crypto accelerator driver problems
Hi All,

In a research project, we've developed a crypto accelerator based on the Xilinx Virtex5 FPGA family, which is connected to the PC through a PCI-Express slot and is used by IPsec to offload crypto processing from the CPU. The accelerator only provides the AES and DES3_EDE algorithms, and I am responsible for providing its driver. Much of the driver work was inspired by geode_aes.c, which is located in the "drivers/crypto" subdir of the kernel source directory. Both algorithms are registered as blkciphers providing the cbc wrapper "cbc(aes)", just like the one registered in geode_aes. Now, after months of work, the accelerator is ready to work (correctness of the hardware operation is assured by direct crypto tests, not by IPsec), and it is time for the driver to provide IPsec access to the accelerator. In a first try I could get "ping" through the IPsec tunnel. One end of the IPsec tunnel is equipped with our accelerator and the other end uses the kernel's native IPsec with the built-in AES and DES3_EDE algorithms. Now I am faced with 2 problems:

1. Ping stops getting replies with packet sizes greater than 1426 bytes (ping dest_ip -s 1427). I guessed that it might be an MTU problem, but reducing the MTU with "ifconfig eth1 mtu xxx" or "echo 1 > /proc/sys/net/ipv4/ip_no_pmtu_disc" does not solve it. Also, when I ping each of the tunnel ends from the other end simultaneously with "ping other_node_ip -i 0.001", the kernel hangs completely.

2. Iperf problem. When I try to measure the throughput of the IPsec gateway equipped with our accelerator (AES-MD5) using iperf in TCP mode, the kernel hangs such that sometimes even the "Magic SysRq key" does not respond! So I could not trace the problem at all. Using iperf in UDP mode works, but I get "UDP bad checksum" in the 'dmesg' output of the other end of the tunnel (native IPsec with built-in kernel algorithms).

The two gateways are connected by a cross cable and no router/switch is located between them to cause MTU problems. In my test, pcrypt is not used for now, and booting the kernel with nosmp (so no fear of thread contention) does not change the situation. So I ask you to help me solve the problem. Below I include the parts of the driver that are changed from geode_aes.c and might give useful information. If required, I'll post the complete driver text.
--
static struct crypto_alg mydriver_cbc_alg = {
        .cra_name               = "cbc(aes)",
        .cra_driver_name        = "cbc-aes-mydriver",
        .cra_priority           = 400,
        .cra_flags              = CRYPTO_ALG_TYPE_BLKCIPHER |
                                  CRYPTO_ALG_NEED_FALLBACK,
        .cra_init               = fallback_init_blk,
        .cra_exit               = fallback_exit_blk,
        .cra_blocksize          = AES_MIN_BLOCK_SIZE,
        .cra_ctxsize            = sizeof(struct mydriver_aes_op),
        .cra_alignmask          = 15,
        .cra_type               = &crypto_blkcipher_type,
        .cra_module             = THIS_MODULE,
        .cra_list               = LIST_HEAD_INIT(mydriver_cbc_alg.cra_list),
        .cra_u                  = {
                .blkcipher      = {
                        .min_keysize    = AES_MIN_KEY_SIZE,
                        .max_keysize    = AES_MIN_KEY_SIZE,
                        .setkey         = mydriver_setkey_blk,
                        .encrypt        = mydriver_cbc_encrypt,
                        .decrypt        = mydriver_cbc_decrypt,
                        .ivsize         = AES_IV_LENGTH,
                }
        }
};
//---

static int
mydriver_cbc_encrypt(struct blkcipher_desc *desc,
                     struct scatterlist *dst, struct scatterlist *src,
                     unsigned int nbytes)
{
        struct mydriver_aes_op *op = crypto_blkcipher_ctx(desc->tfm);
        struct blkcipher_walk walk;
        int err, ret;

        if (unlikely(op->keylen != AES_KEYSIZE_128))
                return fallback_blk_enc(desc, dst, src, nbytes);

        blkcipher_walk_init(&walk, dst, src, nbytes);
        err = blkcipher_walk_virt(desc, &walk);
        op->iv = walk.iv;

        while ((nbytes = walk.nbytes)) {
                op->src = walk.src.virt.addr,
                op->dst = walk.dst.virt.addr;
                op->mode = AES_MODE_CBC;
                op->len = nbytes - (nbytes % AES_MIN_BLOCK_SIZE);
                op->dir = AES_DIR_ENCRYPT;

                //ret = mydriver_aes_crypt(op);
                ret = mydriver_transform(op, 0);
                nbytes -= ret;
                err = blkcipher_walk_done(desc, &walk, nbytes);
        }

        return err;
}

/* mydriver_transform, which makes a buffer containing key, iv, and data with
   some additional header that is required by our accelerator, writes the
   buffer to the accelerator by DMA and then reads the response from the
   hardware. */
static inline int