[PATCH v4 2/2] arm64: dts: imx8mp: add reserve-memory nodes for DSP

2023-10-13 Thread Iuliana Prodan (OSS)
From: Iuliana Prodan 

Add the reserved-memory nodes used by the DSP when the rpmsg
feature is enabled.

Signed-off-by: Iuliana Prodan 
---
 arch/arm64/boot/dts/freescale/imx8mp-evk.dts | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/arch/arm64/boot/dts/freescale/imx8mp-evk.dts b/arch/arm64/boot/dts/freescale/imx8mp-evk.dts
index fa37ce89f8d3..b677ad8ef042 100644
--- a/arch/arm64/boot/dts/freescale/imx8mp-evk.dts
+++ b/arch/arm64/boot/dts/freescale/imx8mp-evk.dts
@@ -125,6 +125,28 @@
};
 
};
+
+   reserved-memory {
+   #address-cells = <2>;
+   #size-cells = <2>;
+   ranges;
+
+   dsp_vdev0vring0: vdev0vring0@942f0000 {
+   reg = <0 0x942f0000 0 0x8000>;
+   no-map;
+   };
+
+   dsp_vdev0vring1: vdev0vring1@942f8000 {
+   reg = <0 0x942f8000 0 0x8000>;
+   no-map;
+   };
+
+   dsp_vdev0buffer: vdev0buffer@94300000 {
+   compatible = "shared-dma-pool";
+   reg = <0 0x94300000 0 0x100000>;
+   no-map;
+   };
+   };
 };
 
 &flexspi {
-- 
2.17.1



[PATCH v4 1/2] remoteproc: imx_dsp_rproc: add mandatory find_loaded_rsc_table op

2023-10-13 Thread Iuliana Prodan (OSS)
From: Iuliana Prodan 

Add the .find_loaded_rsc_table operation for the i.MX DSP.
We need it for inter-processor communication between the DSP
and the main core.

This callback is used to find the resource table (defined in
the remote processor linker script) where the addresses of the
vrings, along with the other allocated resources (carveouts,
etc.), are stored.
If this is not found, the vrings are not allocated and the
IPC between cores will not work.

Signed-off-by: Iuliana Prodan 
Reviewed-by: Daniel Baluta 
---
 drivers/remoteproc/imx_dsp_rproc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/remoteproc/imx_dsp_rproc.c b/drivers/remoteproc/imx_dsp_rproc.c
index 8fcda9b74545..a1c62d15f16c 100644
--- a/drivers/remoteproc/imx_dsp_rproc.c
+++ b/drivers/remoteproc/imx_dsp_rproc.c
@@ -940,6 +940,7 @@ static const struct rproc_ops imx_dsp_rproc_ops = {
.kick   = imx_dsp_rproc_kick,
.load   = imx_dsp_rproc_elf_load_segments,
.parse_fw   = imx_dsp_rproc_parse_fw,
+   .find_loaded_rsc_table = rproc_elf_find_loaded_rsc_table,
.sanity_check   = rproc_elf_sanity_check,
.get_boot_addr  = rproc_elf_get_boot_addr,
 };
-- 
2.17.1



[PATCH v4 0/2] Rpmsg support for i.MX DSP with resource table

2023-10-13 Thread Iuliana Prodan (OSS)
From: Iuliana Prodan 

These patches are needed in order to support rpmsg on DSP when a
resource table is available.

Changes since v3:
 - add reserve-memory nodes in imx8mp-evk.dts rather than .dtsi (patch 2/2)

Changes since v2:
 - add newline between nodes in dtsi (patch 2/2)

Changes since v1:
 - add missing bracket in dtsi (patch 2/2)

Iuliana Prodan (2):
  remoteproc: imx_dsp_rproc: add mandatory find_loaded_rsc_table op
  arm64: dts: imx8mp: add reserve-memory nodes for DSP

 arch/arm64/boot/dts/freescale/imx8mp-evk.dts | 22 ++++++++++++++++++++++
 drivers/remoteproc/imx_dsp_rproc.c   |  1 +
 2 files changed, 23 insertions(+)

-- 
2.17.1



[PATCH v3 2/2] arm64: dts: imx8mp: add reserve-memory nodes for DSP

2023-10-10 Thread Iuliana Prodan (OSS)
From: Iuliana Prodan 

Add the reserved-memory nodes used by the DSP when the rpmsg
feature is enabled.

Signed-off-by: Iuliana Prodan 
---
 arch/arm64/boot/dts/freescale/imx8mp.dtsi | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/arm64/boot/dts/freescale/imx8mp.dtsi b/arch/arm64/boot/dts/freescale/imx8mp.dtsi
index cc406bb338fe..22815b3ea890 100644
--- a/arch/arm64/boot/dts/freescale/imx8mp.dtsi
+++ b/arch/arm64/boot/dts/freescale/imx8mp.dtsi
@@ -211,6 +211,22 @@
reg = <0 0x92400000 0 0x2000000>;
no-map;
};
+
+   dsp_vdev0vring0: vdev0vring0@942f0000 {
+   reg = <0 0x942f0000 0 0x8000>;
+   no-map;
+   };
+
+   dsp_vdev0vring1: vdev0vring1@942f8000 {
+   reg = <0 0x942f8000 0 0x8000>;
+   no-map;
+   };
+
+   dsp_vdev0buffer: vdev0buffer@94300000 {
+   compatible = "shared-dma-pool";
+   reg = <0 0x94300000 0 0x100000>;
+   no-map;
+   };
};
 
pmu {
-- 
2.17.1



[PATCH v3 1/2] remoteproc: imx_dsp_rproc: add mandatory find_loaded_rsc_table op

2023-10-10 Thread Iuliana Prodan (OSS)
From: Iuliana Prodan 

Add the .find_loaded_rsc_table operation for the i.MX DSP.
We need it for inter-processor communication between the DSP
and the main core.

This callback is used to find the resource table (defined in
the remote processor linker script) where the addresses of the
vrings, along with the other allocated resources (carveouts,
etc.), are stored.
If this is not found, the vrings are not allocated and the
IPC between cores will not work.

Signed-off-by: Iuliana Prodan 
Reviewed-by: Daniel Baluta 
---
 drivers/remoteproc/imx_dsp_rproc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/remoteproc/imx_dsp_rproc.c b/drivers/remoteproc/imx_dsp_rproc.c
index 8fcda9b74545..a1c62d15f16c 100644
--- a/drivers/remoteproc/imx_dsp_rproc.c
+++ b/drivers/remoteproc/imx_dsp_rproc.c
@@ -940,6 +940,7 @@ static const struct rproc_ops imx_dsp_rproc_ops = {
.kick   = imx_dsp_rproc_kick,
.load   = imx_dsp_rproc_elf_load_segments,
.parse_fw   = imx_dsp_rproc_parse_fw,
+   .find_loaded_rsc_table = rproc_elf_find_loaded_rsc_table,
.sanity_check   = rproc_elf_sanity_check,
.get_boot_addr  = rproc_elf_get_boot_addr,
 };
-- 
2.17.1



[PATCH v3 0/2] Rpmsg support for i.MX DSP with resource table

2023-10-10 Thread Iuliana Prodan (OSS)
From: Iuliana Prodan 

These patches are needed in order to support rpmsg on DSP when a
resource table is available.

Changes since v2:
 - add newline between nodes in dtsi (patch 2/2)

Changes since v1:
 - add missing bracket in dtsi (patch 2/2)

Iuliana Prodan (2):
  remoteproc: imx_dsp_rproc: add mandatory find_loaded_rsc_table op
  arm64: dts: imx8mp: add reserve-memory nodes for DSP

 arch/arm64/boot/dts/freescale/imx8mp.dtsi | 16 ++++++++++++++++
 drivers/remoteproc/imx_dsp_rproc.c|  1 +
 2 files changed, 17 insertions(+)

-- 
2.17.1



[PATCH v2 2/2] arm64: dts: imx8mp: add reserve-memory nodes for DSP

2023-09-12 Thread Iuliana Prodan (OSS)
From: Iuliana Prodan 

Add the reserved-memory nodes used by the DSP when the rpmsg
feature is enabled.

Signed-off-by: Iuliana Prodan 
---
 arch/arm64/boot/dts/freescale/imx8mp.dtsi | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/arm64/boot/dts/freescale/imx8mp.dtsi b/arch/arm64/boot/dts/freescale/imx8mp.dtsi
index cc406bb338fe..59e672382b07 100644
--- a/arch/arm64/boot/dts/freescale/imx8mp.dtsi
+++ b/arch/arm64/boot/dts/freescale/imx8mp.dtsi
@@ -211,6 +211,19 @@
reg = <0 0x92400000 0 0x2000000>;
no-map;
};
+   dsp_vdev0vring0: vdev0vring0@942f0000 {
+   reg = <0 0x942f0000 0 0x8000>;
+   no-map;
+   };
+   dsp_vdev0vring1: vdev0vring1@942f8000 {
+   reg = <0 0x942f8000 0 0x8000>;
+   no-map;
+   };
+   dsp_vdev0buffer: vdev0buffer@94300000 {
+   compatible = "shared-dma-pool";
+   reg = <0 0x94300000 0 0x100000>;
+   no-map;
+   };
};
 
pmu {
-- 
2.17.1



[PATCH v2 1/2] remoteproc: imx_dsp_rproc: add mandatory find_loaded_rsc_table op

2023-09-12 Thread Iuliana Prodan (OSS)
From: Iuliana Prodan 

Add the .find_loaded_rsc_table operation for the i.MX DSP.
We need it for inter-processor communication between the DSP
and the main core.

This callback is used to find the resource table (defined in
the remote processor linker script) where the addresses of the
vrings, along with the other allocated resources (carveouts,
etc.), are stored.
If this is not found, the vrings are not allocated and the
IPC between cores will not work.

Signed-off-by: Iuliana Prodan 
Reviewed-by: Daniel Baluta 
---
 drivers/remoteproc/imx_dsp_rproc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/remoteproc/imx_dsp_rproc.c b/drivers/remoteproc/imx_dsp_rproc.c
index 8fcda9b74545..a1c62d15f16c 100644
--- a/drivers/remoteproc/imx_dsp_rproc.c
+++ b/drivers/remoteproc/imx_dsp_rproc.c
@@ -940,6 +940,7 @@ static const struct rproc_ops imx_dsp_rproc_ops = {
.kick   = imx_dsp_rproc_kick,
.load   = imx_dsp_rproc_elf_load_segments,
.parse_fw   = imx_dsp_rproc_parse_fw,
+   .find_loaded_rsc_table = rproc_elf_find_loaded_rsc_table,
.sanity_check   = rproc_elf_sanity_check,
.get_boot_addr  = rproc_elf_get_boot_addr,
 };
-- 
2.17.1



[PATCH v2 0/2] Rpmsg support for i.MX DSP with resource table

2023-09-12 Thread Iuliana Prodan (OSS)
From: Iuliana Prodan 

These patches are needed in order to support rpmsg on DSP when a
resource table is available.

Changes since v1:
 - add missing bracket in dtsi (patch 2/2)

Iuliana Prodan (2):
  remoteproc: imx_dsp_rproc: add mandatory find_loaded_rsc_table op
  arm64: dts: imx8mp: add reserve-memory nodes for DSP

 arch/arm64/boot/dts/freescale/imx8mp.dtsi | 13 +++++++++++++
 drivers/remoteproc/imx_dsp_rproc.c|  1 +
 2 files changed, 14 insertions(+)

-- 
2.17.1



[PATCH 1/2] remoteproc: imx_dsp_rproc: add mandatory find_loaded_rsc_table op

2023-09-11 Thread Iuliana Prodan (OSS)
From: Iuliana Prodan 

Add the .find_loaded_rsc_table operation for the i.MX DSP.
We need it for inter-processor communication between the DSP
and the main core.

This callback is used to find the resource table (defined in
the remote processor linker script) where the addresses of the
vrings, along with the other allocated resources (carveouts,
etc.), are stored.
If this is not found, the vrings are not allocated and the
IPC between cores will not work.

Signed-off-by: Iuliana Prodan 
Reviewed-by: Daniel Baluta 
---
 drivers/remoteproc/imx_dsp_rproc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/remoteproc/imx_dsp_rproc.c b/drivers/remoteproc/imx_dsp_rproc.c
index 8fcda9b74545..a1c62d15f16c 100644
--- a/drivers/remoteproc/imx_dsp_rproc.c
+++ b/drivers/remoteproc/imx_dsp_rproc.c
@@ -940,6 +940,7 @@ static const struct rproc_ops imx_dsp_rproc_ops = {
.kick   = imx_dsp_rproc_kick,
.load   = imx_dsp_rproc_elf_load_segments,
.parse_fw   = imx_dsp_rproc_parse_fw,
+   .find_loaded_rsc_table = rproc_elf_find_loaded_rsc_table,
.sanity_check   = rproc_elf_sanity_check,
.get_boot_addr  = rproc_elf_get_boot_addr,
 };
-- 
2.17.1



[PATCH 0/2] Rpmsg support for i.MX DSP with resource table

2023-09-11 Thread Iuliana Prodan (OSS)
From: Iuliana Prodan 

These patches are needed in order to support rpmsg on DSP when a
resource table is available.

Iuliana Prodan (2):
  remoteproc: imx_dsp_rproc: add mandatory find_loaded_rsc_table op
  arm64: dts: imx8mp: add reserve-memory nodes for DSP

 arch/arm64/boot/dts/freescale/imx8mp.dtsi | 12 ++++++++++++
 drivers/remoteproc/imx_dsp_rproc.c|  1 +
 2 files changed, 13 insertions(+)

-- 
2.17.1



[PATCH 2/2] arm64: dts: imx8mp: add reserve-memory nodes for DSP

2023-09-11 Thread Iuliana Prodan (OSS)
From: Iuliana Prodan 

Add the reserved-memory nodes used by the DSP when the rpmsg
feature is enabled.
These can later be referenced from a dsp node, for example:
dsp: dsp@3b6e8000 {
compatible = "fsl,imx8mp-dsp";
reg = <0x3b6e8000 0x88000>;
mbox-names = "tx0", "rx0", "rxdb0";
mboxes = <&mu2 2 0>, <&mu2 2 1>,
<&mu2 3 0>, <&mu2 3 1>;
memory-region = <&dsp_vdev0buffer>, <&dsp_vdev0vring0>,
<&dsp_vdev0vring1>, <&dsp_reserved>;
status = "okay";
};

Signed-off-by: Iuliana Prodan 
---
 arch/arm64/boot/dts/freescale/imx8mp.dtsi | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/arm64/boot/dts/freescale/imx8mp.dtsi b/arch/arm64/boot/dts/freescale/imx8mp.dtsi
index cc406bb338fe..eedc1921af62 100644
--- a/arch/arm64/boot/dts/freescale/imx8mp.dtsi
+++ b/arch/arm64/boot/dts/freescale/imx8mp.dtsi
@@ -210,6 +210,18 @@
dsp_reserved: dsp@92400000 {
reg = <0 0x92400000 0 0x2000000>;
no-map;
+   dsp_vdev0vring0: vdev0vring0@942f0000 {
+   reg = <0 0x942f0000 0 0x8000>;
+   no-map;
+   };
+   dsp_vdev0vring1: vdev0vring1@942f8000 {
+   reg = <0 0x942f8000 0 0x8000>;
+   no-map;
+   };
+   dsp_vdev0buffer: vdev0buffer@94300000 {
+   compatible = "shared-dma-pool";
+   reg = <0 0x94300000 0 0x100000>;
+   no-map;
};
};
 
-- 
2.17.1



[PATCH 5/5] crypto: caam/qi2 - avoid allocating memory at crypto request runtime

2020-12-02 Thread Iuliana Prodan (OSS)
From: Iuliana Prodan 

Remove the CRYPTO_ALG_ALLOCATES_MEMORY flag and allocate the
memory needed by the driver to fulfil a request within the
crypto request object.
The extra size needed for the base extended descriptor, hw
descriptor commands and link tables is computed at frontend
driver (caamalg_qi2) initialization and saved in the reqsize
field, which indicates how much memory may be needed per request.

The CRYPTO_ALG_ALLOCATES_MEMORY flag is relied upon only by
dm-crypt use-cases, which seem to need at most 4 entries.
Therefore, in reqsize we allocate memory for a maximum of
4 entries for src and 4 for dst, aligned.
If the driver needs more than this maximum, the memory is
dynamically allocated at runtime.

Signed-off-by: Iuliana Prodan 
---
 drivers/crypto/caam/caamalg_qi2.c | 415 --
 drivers/crypto/caam/caamalg_qi2.h |   6 +
 2 files changed, 288 insertions(+), 133 deletions(-)

diff --git a/drivers/crypto/caam/caamalg_qi2.c b/drivers/crypto/caam/caamalg_qi2.c
index a780e627838a..88bbed7dc65b 100644
--- a/drivers/crypto/caam/caamalg_qi2.c
+++ b/drivers/crypto/caam/caamalg_qi2.c
@@ -362,17 +362,10 @@ static struct aead_edesc *aead_edesc_alloc(struct 
aead_request *req,
dma_addr_t qm_sg_dma, iv_dma = 0;
int ivsize = 0;
unsigned int authsize = ctx->authsize;
-   int qm_sg_index = 0, qm_sg_nents = 0, qm_sg_bytes;
+   int qm_sg_index = 0, qm_sg_nents = 0, qm_sg_bytes, edesc_size = 0;
int in_len, out_len;
struct dpaa2_sg_entry *sg_table;
 
-   /* allocate space for base edesc, link tables and IV */
-   edesc = qi_cache_zalloc(GFP_DMA | flags);
-   if (unlikely(!edesc)) {
-   dev_err(dev, "could not allocate extended descriptor\n");
-   return ERR_PTR(-ENOMEM);
-   }
-
if (unlikely(req->dst != req->src)) {
src_len = req->assoclen + req->cryptlen;
dst_len = src_len + (encrypt ? authsize : (-authsize));
@@ -381,7 +374,6 @@ static struct aead_edesc *aead_edesc_alloc(struct 
aead_request *req,
if (unlikely(src_nents < 0)) {
dev_err(dev, "Insufficient bytes (%d) in src S/G\n",
src_len);
-   qi_cache_free(edesc);
return ERR_PTR(src_nents);
}
 
@@ -389,7 +381,6 @@ static struct aead_edesc *aead_edesc_alloc(struct 
aead_request *req,
if (unlikely(dst_nents < 0)) {
dev_err(dev, "Insufficient bytes (%d) in dst S/G\n",
dst_len);
-   qi_cache_free(edesc);
return ERR_PTR(dst_nents);
}
 
@@ -398,7 +389,6 @@ static struct aead_edesc *aead_edesc_alloc(struct 
aead_request *req,
  DMA_TO_DEVICE);
if (unlikely(!mapped_src_nents)) {
dev_err(dev, "unable to map source\n");
-   qi_cache_free(edesc);
return ERR_PTR(-ENOMEM);
}
} else {
@@ -412,7 +402,6 @@ static struct aead_edesc *aead_edesc_alloc(struct 
aead_request *req,
dev_err(dev, "unable to map destination\n");
dma_unmap_sg(dev, req->src, src_nents,
 DMA_TO_DEVICE);
-   qi_cache_free(edesc);
return ERR_PTR(-ENOMEM);
}
} else {
@@ -426,7 +415,6 @@ static struct aead_edesc *aead_edesc_alloc(struct 
aead_request *req,
if (unlikely(src_nents < 0)) {
dev_err(dev, "Insufficient bytes (%d) in src S/G\n",
src_len);
-   qi_cache_free(edesc);
return ERR_PTR(src_nents);
}
 
@@ -434,7 +422,6 @@ static struct aead_edesc *aead_edesc_alloc(struct 
aead_request *req,
  DMA_BIDIRECTIONAL);
if (unlikely(!mapped_src_nents)) {
dev_err(dev, "unable to map source\n");
-   qi_cache_free(edesc);
return ERR_PTR(-ENOMEM);
}
}
@@ -466,14 +453,30 @@ static struct aead_edesc *aead_edesc_alloc(struct 
aead_request *req,
 
sg_table = &edesc->sgt[0];
qm_sg_bytes = qm_sg_nents * sizeof(*sg_table);
-   if (unlikely(offsetof(struct aead_edesc, sgt) + qm_sg_bytes + ivsize >
-CAAM_QI_MEMCACHE_SIZE)) {
+
+   /* Check if there's enough space for edesc saved in req */
+   edesc_size = offsetof(struct aead_edesc, sgt) + qm_sg_bytes + ivsize;
+   if (unlikely(edesc_size > CAAM_QI_MEMCACHE_SIZE)) {
dev_err(dev, "No space for %d S/G entries and/or %dB IV\n",
 

[PATCH 4/5] crypto: caam/qi - avoid allocating memory at crypto request runtime

2020-12-02 Thread Iuliana Prodan (OSS)
From: Iuliana Prodan 

Remove the CRYPTO_ALG_ALLOCATES_MEMORY flag and allocate the
memory needed by the driver to fulfil a request within the
crypto request object.
The extra size needed for the base extended descriptor, hw
descriptor commands and link tables is computed at frontend
driver (caamalg_qi) initialization and saved in the reqsize
field, which indicates how much memory may be needed per request.

The CRYPTO_ALG_ALLOCATES_MEMORY flag is relied upon only by
dm-crypt use-cases, which seem to need at most 4 entries.
Therefore, in reqsize we allocate memory for a maximum of
4 entries for src and 4 for dst, aligned.
If the driver needs more than this maximum, the memory is
dynamically allocated at runtime.

Signed-off-by: Iuliana Prodan 
---
 drivers/crypto/caam/caamalg_qi.c | 134 +--
 1 file changed, 90 insertions(+), 44 deletions(-)

diff --git a/drivers/crypto/caam/caamalg_qi.c b/drivers/crypto/caam/caamalg_qi.c
index a24ae966df4a..ea49697e2579 100644
--- a/drivers/crypto/caam/caamalg_qi.c
+++ b/drivers/crypto/caam/caamalg_qi.c
@@ -788,6 +788,7 @@ static int xts_skcipher_setkey(struct crypto_skcipher 
*skcipher, const u8 *key,
  * @dst_nents: number of segments in output scatterlist
  * @iv_dma: dma address of iv for checking continuity and link table
  * @qm_sg_bytes: length of dma mapped h/w link table
+ * @free: stored to determine if aead_edesc needs to be freed
  * @qm_sg_dma: bus physical mapped address of h/w link table
  * @assoclen: associated data length, in CAAM endianness
  * @assoclen_dma: bus physical mapped address of req->assoclen
@@ -799,6 +800,7 @@ struct aead_edesc {
int dst_nents;
dma_addr_t iv_dma;
int qm_sg_bytes;
+   bool free;
dma_addr_t qm_sg_dma;
unsigned int assoclen;
dma_addr_t assoclen_dma;
@@ -812,6 +814,7 @@ struct aead_edesc {
  * @dst_nents: number of segments in output scatterlist
  * @iv_dma: dma address of iv for checking continuity and link table
  * @qm_sg_bytes: length of dma mapped h/w link table
+ * @free: stored to determine if skcipher_edesc needs to be freed
  * @qm_sg_dma: bus physical mapped address of h/w link table
  * @drv_req: driver-specific request structure
  * @sgt: the h/w link table, followed by IV
@@ -821,6 +824,7 @@ struct skcipher_edesc {
int dst_nents;
dma_addr_t iv_dma;
int qm_sg_bytes;
+   bool free;
dma_addr_t qm_sg_dma;
struct caam_drv_req drv_req;
struct qm_sg_entry sgt[];
@@ -927,7 +931,8 @@ static void aead_done(struct caam_drv_req *drv_req, u32 
status)
aead_unmap(qidev, edesc, aead_req);
 
aead_request_complete(aead_req, ecode);
-   qi_cache_free(edesc);
+   if (edesc->free)
+   qi_cache_free(edesc);
 }
 
 /*
@@ -949,7 +954,7 @@ static struct aead_edesc *aead_edesc_alloc(struct 
aead_request *req,
dma_addr_t qm_sg_dma, iv_dma = 0;
int ivsize = 0;
unsigned int authsize = ctx->authsize;
-   int qm_sg_index = 0, qm_sg_ents = 0, qm_sg_bytes;
+   int qm_sg_index = 0, qm_sg_ents = 0, qm_sg_bytes, edesc_size = 0;
int in_len, out_len;
struct qm_sg_entry *sg_table, *fd_sgt;
struct caam_drv_ctx *drv_ctx;
@@ -958,13 +963,6 @@ static struct aead_edesc *aead_edesc_alloc(struct 
aead_request *req,
if (IS_ERR_OR_NULL(drv_ctx))
return (struct aead_edesc *)drv_ctx;
 
-   /* allocate space for base edesc and hw desc commands, link tables */
-   edesc = qi_cache_alloc(GFP_DMA | flags);
-   if (unlikely(!edesc)) {
-   dev_err(qidev, "could not allocate extended descriptor\n");
-   return ERR_PTR(-ENOMEM);
-   }
-
if (likely(req->src == req->dst)) {
src_len = req->assoclen + req->cryptlen +
  (encrypt ? authsize : 0);
@@ -973,7 +971,6 @@ static struct aead_edesc *aead_edesc_alloc(struct 
aead_request *req,
if (unlikely(src_nents < 0)) {
dev_err(qidev, "Insufficient bytes (%d) in src S/G\n",
src_len);
-   qi_cache_free(edesc);
return ERR_PTR(src_nents);
}
 
@@ -981,7 +978,6 @@ static struct aead_edesc *aead_edesc_alloc(struct 
aead_request *req,
  DMA_BIDIRECTIONAL);
if (unlikely(!mapped_src_nents)) {
dev_err(qidev, "unable to map source\n");
-   qi_cache_free(edesc);
return ERR_PTR(-ENOMEM);
}
} else {
@@ -992,7 +988,6 @@ static struct aead_edesc *aead_edesc_alloc(struct 
aead_request *req,
if (unlikely(src_nents < 0)) {
dev_err(qidev, "Insufficient bytes (%d) in src S/G\n",
src_len);
-   qi_cache_free(edesc);
return ERR_PTR(src_nents);
  

[PATCH 3/5] crypto: caam/jr - avoid allocating memory at crypto request runtime for hash

2020-12-02 Thread Iuliana Prodan (OSS)
From: Iuliana Prodan 

Remove the CRYPTO_ALG_ALLOCATES_MEMORY flag and allocate the
memory needed by the driver to fulfil a request within the
crypto request object.
The extra size needed for the base extended descriptor and link
tables is computed at frontend driver (caamhash) initialization
and saved in the reqsize field, which indicates how much memory
may be needed per request.

The CRYPTO_ALG_ALLOCATES_MEMORY flag is relied upon only by
dm-crypt use-cases, which seem to need at most 4 entries.
Therefore, in reqsize we allocate memory for a maximum of
4 entries for src, aligned.
If the driver needs more than this maximum, the memory is
dynamically allocated at runtime.

Signed-off-by: Iuliana Prodan 
---
 drivers/crypto/caam/caamhash.c | 77 +-
 1 file changed, 57 insertions(+), 20 deletions(-)

diff --git a/drivers/crypto/caam/caamhash.c b/drivers/crypto/caam/caamhash.c
index e8a6d8bc43b5..4a6376691ad6 100644
--- a/drivers/crypto/caam/caamhash.c
+++ b/drivers/crypto/caam/caamhash.c
@@ -527,6 +527,7 @@ static int acmac_setkey(struct crypto_ahash *ahash, const 
u8 *key,
  * @src_nents: number of segments in input scatterlist
  * @sec4_sg_bytes: length of dma mapped sec4_sg space
  * @bklog: stored to determine if the request needs backlog
+ * @free: stored to determine if ahash_edesc needs to be freed
  * @hw_desc: the h/w job descriptor followed by any referenced link tables
  * @sec4_sg: h/w link table
  */
@@ -535,6 +536,7 @@ struct ahash_edesc {
int src_nents;
int sec4_sg_bytes;
bool bklog;
+   bool free;
u32 hw_desc[DESC_JOB_IO_LEN_MAX / sizeof(u32)] cacheline_aligned;
struct sec4_sg_entry sec4_sg[];
 };
@@ -595,7 +597,8 @@ static inline void ahash_done_cpy(struct device *jrdev, u32 
*desc, u32 err,
 
ahash_unmap_ctx(jrdev, edesc, req, digestsize, dir);
memcpy(req->result, state->caam_ctx, digestsize);
-   kfree(edesc);
+   if (edesc->free)
+   kfree(edesc);
 
print_hex_dump_debug("ctx@"__stringify(__LINE__)": ",
 DUMP_PREFIX_ADDRESS, 16, 4, state->caam_ctx,
@@ -644,7 +647,8 @@ static inline void ahash_done_switch(struct device *jrdev, 
u32 *desc, u32 err,
ecode = caam_jr_strstatus(jrdev, err);
 
ahash_unmap_ctx(jrdev, edesc, req, ctx->ctx_len, dir);
-   kfree(edesc);
+   if (edesc->free)
+   kfree(edesc);
 
scatterwalk_map_and_copy(state->buf, req->src,
 req->nbytes - state->next_buflen,
@@ -701,11 +705,25 @@ static struct ahash_edesc *ahash_edesc_alloc(struct 
ahash_request *req,
   GFP_KERNEL : GFP_ATOMIC;
struct ahash_edesc *edesc;
unsigned int sg_size = sg_num * sizeof(struct sec4_sg_entry);
-
-   edesc = kzalloc(sizeof(*edesc) + sg_size, GFP_DMA | flags);
-   if (!edesc) {
-   dev_err(ctx->jrdev, "could not allocate extended descriptor\n");
-   return NULL;
+   int edesc_size;
+
+   /* Check if there's enough space for edesc saved in req */
+   edesc_size = sizeof(*edesc) + sg_size;
+   if (edesc_size > (crypto_ahash_reqsize(ahash) -
+ sizeof(struct caam_hash_state))) {
+   /* allocate space for base edesc and link tables */
+   edesc = kzalloc(sizeof(*edesc) + sg_size, GFP_DMA | flags);
+   if (!edesc) {
dev_err(ctx->jrdev, "could not allocate extended descriptor\n");
+   return NULL;
+   }
+   edesc->free = true;
+   } else {
+   /* get address for base edesc and link tables */
+   edesc = (struct ahash_edesc *)((u8 *)state +
+sizeof(struct caam_hash_state));
+   /* clear memory */
+   memset(edesc, 0, sizeof(*edesc));
}
 
state->edesc = edesc;
@@ -767,7 +785,8 @@ static int ahash_do_one_req(struct crypto_engine *engine, 
void *areq)
 
if (ret != -EINPROGRESS) {
ahash_unmap(jrdev, state->edesc, req, 0);
-   kfree(state->edesc);
+   if (state->edesc->free)
+   kfree(state->edesc);
} else {
ret = 0;
}
@@ -802,7 +821,8 @@ static int ahash_enqueue_req(struct device *jrdev,
 
if ((ret != -EINPROGRESS) && (ret != -EBUSY)) {
ahash_unmap_ctx(jrdev, edesc, req, dst_len, dir);
-   kfree(edesc);
+   if (edesc->free)
+   kfree(edesc);
}
 
return ret;
@@ -930,7 +950,8 @@ static int ahash_update_ctx(struct ahash_request *req)
return ret;
 unmap_ctx:
ahash_unmap_ctx(jrdev, edesc, req, ctx->ctx_len, DMA_BIDIRECTIONAL);
-   kfree(edesc);
+   if (edesc->free)
+   kfree(edesc);
return ret;
 }
 
@@ -991,7 +1012,8 @@ static int ahash_final_ctx(struct ahash_request *

[PATCH 2/5] crypto: caam/jr - avoid allocating memory at crypto request runtime for aead

2020-12-02 Thread Iuliana Prodan (OSS)
From: Iuliana Prodan 

Remove the CRYPTO_ALG_ALLOCATES_MEMORY flag and allocate the
memory needed by the driver to fulfil a request within the
crypto request object.
The extra size needed for the base extended descriptor, hw
descriptor commands and link tables is computed at frontend
driver (caamalg) initialization and saved in the reqsize
field, which indicates how much memory may be needed per request.

The CRYPTO_ALG_ALLOCATES_MEMORY flag is relied upon only by
dm-crypt use-cases, which seem to need at most 4 entries.
Therefore, in reqsize we allocate memory for a maximum of
4 entries for src and 4 for dst, aligned.
If the driver needs more than this maximum, the memory is
dynamically allocated at runtime.

Signed-off-by: Iuliana Prodan 
---
 drivers/crypto/caam/caamalg.c | 64 ++-
 1 file changed, 48 insertions(+), 16 deletions(-)

diff --git a/drivers/crypto/caam/caamalg.c b/drivers/crypto/caam/caamalg.c
index ef49781a2545..058c808dbae9 100644
--- a/drivers/crypto/caam/caamalg.c
+++ b/drivers/crypto/caam/caamalg.c
@@ -880,6 +880,7 @@ static int xts_skcipher_setkey(struct crypto_skcipher 
*skcipher, const u8 *key,
  * @mapped_dst_nents: number of segments in output h/w link table
  * @sec4_sg_bytes: length of dma mapped sec4_sg space
  * @bklog: stored to determine if the request needs backlog
+ * @free: stored to determine if aead_edesc needs to be freed
  * @sec4_sg_dma: bus physical mapped address of h/w link table
  * @sec4_sg: pointer to h/w link table
  * @hw_desc: the h/w job descriptor followed by any referenced link tables
@@ -891,6 +892,7 @@ struct aead_edesc {
int mapped_dst_nents;
int sec4_sg_bytes;
bool bklog;
+   bool free;
dma_addr_t sec4_sg_dma;
struct sec4_sg_entry *sec4_sg;
u32 hw_desc[];
@@ -987,8 +989,8 @@ static void aead_crypt_done(struct device *jrdev, u32 
*desc, u32 err,
ecode = caam_jr_strstatus(jrdev, err);
 
aead_unmap(jrdev, edesc, req);
-
-   kfree(edesc);
+   if (edesc->free)
+   kfree(edesc);
 
/*
 * If no backlog flag, the completion of the request is done
@@ -1301,7 +1303,7 @@ static struct aead_edesc *aead_edesc_alloc(struct 
aead_request *req,
int src_nents, mapped_src_nents, dst_nents = 0, mapped_dst_nents = 0;
int src_len, dst_len = 0;
struct aead_edesc *edesc;
-   int sec4_sg_index, sec4_sg_len, sec4_sg_bytes;
+   int sec4_sg_index, sec4_sg_len, sec4_sg_bytes, edesc_size = 0;
unsigned int authsize = ctx->authsize;
 
if (unlikely(req->dst != req->src)) {
@@ -1381,13 +1383,30 @@ static struct aead_edesc *aead_edesc_alloc(struct 
aead_request *req,
 
sec4_sg_bytes = sec4_sg_len * sizeof(struct sec4_sg_entry);
 
-   /* allocate space for base edesc and hw desc commands, link tables */
-   edesc = kzalloc(sizeof(*edesc) + desc_bytes + sec4_sg_bytes,
-   GFP_DMA | flags);
-   if (!edesc) {
-   caam_unmap(jrdev, req->src, req->dst, src_nents, dst_nents, 0,
-  0, 0, 0);
-   return ERR_PTR(-ENOMEM);
+   /* Check if there's enough space for edesc saved in req */
+   edesc_size = sizeof(*edesc) + desc_bytes + sec4_sg_bytes;
+   if (edesc_size > (crypto_aead_reqsize(aead) -
+ sizeof(struct caam_aead_req_ctx))) {
+   /*
+* allocate space for base edesc and
+* hw desc commands, link tables
+*/
+   edesc = kzalloc(edesc_size, GFP_DMA | flags);
+   if (!edesc) {
+   caam_unmap(jrdev, req->src, req->dst, src_nents,
+  dst_nents, 0, 0, 0, 0);
+   return ERR_PTR(-ENOMEM);
+   }
+   edesc->free = true;
+   } else {
+   /*
+* get address for base edesc and
+* hw desc commands, link tables
+*/
+   edesc = (struct aead_edesc *)((u8 *)rctx +
+   sizeof(struct caam_aead_req_ctx));
+   /* clear memory */
+   memset(edesc, 0, sizeof(*edesc));
}
 
edesc->src_nents = src_nents;
@@ -1420,7 +1439,8 @@ static struct aead_edesc *aead_edesc_alloc(struct 
aead_request *req,
if (dma_mapping_error(jrdev, edesc->sec4_sg_dma)) {
dev_err(jrdev, "unable to map S/G table\n");
aead_unmap(jrdev, edesc, req);
-   kfree(edesc);
+   if (edesc->free)
+   kfree(edesc);
return ERR_PTR(-ENOMEM);
}
 
@@ -1450,7 +1470,8 @@ static int aead_enqueue_req(struct device *jrdev, struct 
aead_request *req)
 
if ((ret != -EINPROGRESS) && (ret != -EBUSY)) {
aead_unmap(jrdev, edesc, req);
-   kfree(rctx->edesc);
+   if (rctx->edesc->free)
+   kfree(

[PATCH 1/5] crypto: caam/jr - avoid allocating memory at crypto request runtime for skcipher

2020-12-02 Thread Iuliana Prodan (OSS)
From: Iuliana Prodan 

Remove the CRYPTO_ALG_ALLOCATES_MEMORY flag and allocate the
memory needed by the driver to fulfil a request within the
crypto request object.
The extra size needed for the base extended descriptor, hw
descriptor commands, link tables and IV is computed at frontend
driver (caamalg) initialization and saved in the reqsize
field, which indicates how much memory may be needed per request.

The CRYPTO_ALG_ALLOCATES_MEMORY flag is relied upon only by
dm-crypt use-cases, which seem to need at most 4 entries.
Therefore, in reqsize we allocate memory for a maximum of
4 entries for src plus 1 for the IV, and the same for dst,
both aligned.
If the driver needs more than this maximum, the memory is
dynamically allocated at runtime.

Signed-off-by: Iuliana Prodan 
---
 drivers/crypto/caam/caamalg.c | 77 +--
 1 file changed, 55 insertions(+), 22 deletions(-)

diff --git a/drivers/crypto/caam/caamalg.c b/drivers/crypto/caam/caamalg.c
index 8697ae53b063..ef49781a2545 100644
--- a/drivers/crypto/caam/caamalg.c
+++ b/drivers/crypto/caam/caamalg.c
@@ -905,6 +905,7 @@ struct aead_edesc {
  * @iv_dma: dma address of iv for checking continuity and link table
  * @sec4_sg_bytes: length of dma mapped sec4_sg space
  * @bklog: stored to determine if the request needs backlog
+ * @free: stored to determine if skcipher_edesc needs to be freed
  * @sec4_sg_dma: bus physical mapped address of h/w link table
  * @sec4_sg: pointer to h/w link table
  * @hw_desc: the h/w job descriptor followed by any referenced link tables
@@ -918,6 +919,7 @@ struct skcipher_edesc {
dma_addr_t iv_dma;
int sec4_sg_bytes;
bool bklog;
+   bool free;
dma_addr_t sec4_sg_dma;
struct sec4_sg_entry *sec4_sg;
u32 hw_desc[];
@@ -1037,7 +1039,8 @@ static void skcipher_crypt_done(struct device *jrdev, u32 
*desc, u32 err,
 DUMP_PREFIX_ADDRESS, 16, 4, req->dst,
 edesc->dst_nents > 1 ? 100 : req->cryptlen, 1);
 
-   kfree(edesc);
+   if (edesc->free)
+   kfree(edesc);
 
/*
 * If no backlog flag, the completion of the request is done
@@ -1604,7 +1607,7 @@ static struct skcipher_edesc *skcipher_edesc_alloc(struct 
skcipher_request *req,
dma_addr_t iv_dma = 0;
u8 *iv;
int ivsize = crypto_skcipher_ivsize(skcipher);
-   int dst_sg_idx, sec4_sg_ents, sec4_sg_bytes;
+   int dst_sg_idx, sec4_sg_ents, sec4_sg_bytes, edesc_size = 0;
 
src_nents = sg_nents_for_len(req->src, req->cryptlen);
if (unlikely(src_nents < 0)) {
@@ -1675,16 +1678,30 @@ static struct skcipher_edesc 
*skcipher_edesc_alloc(struct skcipher_request *req,
 
sec4_sg_bytes = sec4_sg_ents * sizeof(struct sec4_sg_entry);
 
-   /*
-* allocate space for base edesc and hw desc commands, link tables, IV
-*/
-   edesc = kzalloc(sizeof(*edesc) + desc_bytes + sec4_sg_bytes + ivsize,
-   GFP_DMA | flags);
-   if (!edesc) {
-   dev_err(jrdev, "could not allocate extended descriptor\n");
-   caam_unmap(jrdev, req->src, req->dst, src_nents, dst_nents, 0,
-  0, 0, 0);
-   return ERR_PTR(-ENOMEM);
+   /* Check if there's enough space for edesc saved in req */
+   edesc_size = sizeof(*edesc) + desc_bytes + sec4_sg_bytes + ivsize;
+   if (edesc_size > (crypto_skcipher_reqsize(skcipher) -
+ sizeof(struct caam_skcipher_req_ctx))) {
+   /*
+* allocate space for base edesc and hw desc commands,
+* link tables, IV
+*/
+   edesc = kzalloc(edesc_size, GFP_DMA | flags);
+   if (!edesc) {
+   caam_unmap(jrdev, req->src, req->dst, src_nents,
+  dst_nents, 0, 0, 0, 0);
+   return ERR_PTR(-ENOMEM);
+   }
+   edesc->free = true;
+   } else {
+   /*
+* get address for base edesc and hw desc commands,
+* link tables, IV
+*/
+   edesc = (struct skcipher_edesc *)((u8 *)rctx +
+   sizeof(struct caam_skcipher_req_ctx));
+   /* clear memory */
+   memset(edesc, 0, sizeof(*edesc));
}
 
edesc->src_nents = src_nents;
@@ -1706,7 +1723,8 @@ static struct skcipher_edesc *skcipher_edesc_alloc(struct skcipher_request *req,
dev_err(jrdev, "unable to map IV\n");
caam_unmap(jrdev, req->src, req->dst, src_nents,
   dst_nents, 0, 0, 0, 0);
-   kfree(edesc);
+   if (edesc->free)
+   kfree(edesc);
return ERR_PTR(-ENOMEM);
}
 
@@ -1736,7 +1754,8 @@ static struct skcipher_edesc *skcipher_edesc_alloc(struct 
skc

[PATCH 0/5] crypto: caam - avoid allocating memory at crypto request runtime

2020-12-02 Thread Iuliana Prodan (OSS)
From: Iuliana Prodan 

This series removes the CRYPTO_ALG_ALLOCATES_MEMORY flag and
allocates the memory needed by the driver to fulfil a
request within the crypto request object.
The extra size needed for the base extended descriptor, hw
descriptor commands and link tables is added to the reqsize
field, which indicates how much memory could be needed per request.

The CRYPTO_ALG_ALLOCATES_MEMORY flag is limited to
dm-crypt use cases, which appear to need at most 4 entries.
Therefore, in reqsize we reserve memory for a maximum of
4 entries for src and 4 for dst, aligned.
If the driver needs more than the 4 entries maximum, the memory
is dynamically allocated, at runtime.
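The allocation strategy the series describes can be modelled in user space as follows. This is a minimal sketch with made-up struct names and sizes: `slab` stands in for the space reserved up front via reqsize, and the heap fallback mirrors the driver's runtime kzalloc() path; it is not the CAAM driver's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Struct names and sizes below are illustrative stand-ins. */
struct edesc {
	bool free;              /* true => completion path must free() it */
	unsigned char desc[48]; /* stand-in for hw descriptor + link tables */
};

struct req_ctx {
	unsigned char slab[256]; /* space reserved up front via reqsize */
};

/*
 * Use the preallocated per-request space when the extended descriptor
 * fits; otherwise fall back to a runtime allocation and remember
 * (via ->free) that it must be released on completion.
 */
static struct edesc *edesc_get(struct req_ctx *rctx, size_t edesc_size)
{
	struct edesc *e;

	if (edesc_size > sizeof(rctx->slab)) {
		e = calloc(1, edesc_size);
		if (!e)
			return NULL;
		e->free = true;
	} else {
		e = (struct edesc *)rctx->slab;
		memset(e, 0, edesc_size);
		/* e->free stays false: nothing to release on completion */
	}
	return e;
}
```

The completion path then frees the descriptor only when `->free` is set, which is exactly the check the patches add around each kfree().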

Iuliana Prodan (5):
  crypto: caam/jr - avoid allocating memory at crypto request runtime
for skcipher
  crypto: caam/jr - avoid allocating memory at crypto request runtime
for aead
  crypto: caam/jr - avoid allocating memory at crypto request runtime
for hash
  crypto: caam/qi - avoid allocating memory at crypto request runtime
  crypto: caam/qi2 - avoid allocating memory at crypto request runtime

 drivers/crypto/caam/caamalg.c | 141 +++---
 drivers/crypto/caam/caamalg_qi.c  | 134 ++
 drivers/crypto/caam/caamalg_qi2.c | 415 --
 drivers/crypto/caam/caamalg_qi2.h |   6 +
 drivers/crypto/caam/caamhash.c|  77 --
 5 files changed, 538 insertions(+), 235 deletions(-)

-- 
2.17.1



[RFC PATCH 4/4] crypto: caam - avoid allocating memory at crypto request runtime for aead

2020-11-25 Thread Iuliana Prodan (OSS)
From: Iuliana Prodan 

Remove the CRYPTO_ALG_ALLOCATES_MEMORY flag and allocate the memory
needed by the driver to fulfil a request within the crypto
request object.
The extra size needed for the base extended descriptor, hw
descriptor commands and link tables is computed at frontend
driver (caamalg) initialization and saved in the reqsize field,
which indicates how much memory could be needed per request.

The CRYPTO_ALG_ALLOCATES_MEMORY flag is limited to
dm-crypt use cases, which appear to need at most 4 entries.
Therefore, in reqsize we reserve memory for a maximum of
4 entries for src and 4 for dst, aligned.
If the driver needs more than the 4 entries maximum, the memory
is dynamically allocated, at runtime.

Signed-off-by: Iuliana Prodan 
---
 drivers/crypto/caam/caamalg.c | 59 +++
 1 file changed, 46 insertions(+), 13 deletions(-)

diff --git a/drivers/crypto/caam/caamalg.c b/drivers/crypto/caam/caamalg.c
index 6ace8545faec..7038394c41c0 100644
--- a/drivers/crypto/caam/caamalg.c
+++ b/drivers/crypto/caam/caamalg.c
@@ -880,6 +880,7 @@ static int xts_skcipher_setkey(struct crypto_skcipher *skcipher, const u8 *key,
  * @mapped_dst_nents: number of segments in output h/w link table
  * @sec4_sg_bytes: length of dma mapped sec4_sg space
  * @bklog: stored to determine if the request needs backlog
+ * @free: stored to determine if aead_edesc needs to be freed
  * @sec4_sg_dma: bus physical mapped address of h/w link table
  * @sec4_sg: pointer to h/w link table
  * @hw_desc: the h/w job descriptor followed by any referenced link tables
@@ -891,6 +892,7 @@ struct aead_edesc {
int mapped_dst_nents;
int sec4_sg_bytes;
bool bklog;
+   bool free;
dma_addr_t sec4_sg_dma;
struct sec4_sg_entry *sec4_sg;
u32 hw_desc[];
@@ -987,8 +989,8 @@ static void aead_crypt_done(struct device *jrdev, u32 *desc, u32 err,
ecode = caam_jr_strstatus(jrdev, err);
 
aead_unmap(jrdev, edesc, req);
-
-   kfree(edesc);
+   if (edesc->free)
+   kfree(edesc);
 
/*
 * If no backlog flag, the completion of the request is done
@@ -1301,7 +1303,7 @@ static struct aead_edesc *aead_edesc_alloc(struct aead_request *req,
int src_nents, mapped_src_nents, dst_nents = 0, mapped_dst_nents = 0;
int src_len, dst_len = 0;
struct aead_edesc *edesc;
-   int sec4_sg_index, sec4_sg_len, sec4_sg_bytes;
+   int sec4_sg_index, sec4_sg_len, sec4_sg_bytes, edesc_size = 0;
unsigned int authsize = ctx->authsize;
 
if (unlikely(req->dst != req->src)) {
@@ -1381,13 +1383,30 @@ static struct aead_edesc *aead_edesc_alloc(struct aead_request *req,
 
sec4_sg_bytes = sec4_sg_len * sizeof(struct sec4_sg_entry);
 
-   /* allocate space for base edesc and hw desc commands, link tables */
-   edesc = kzalloc(sizeof(*edesc) + desc_bytes + sec4_sg_bytes,
-   GFP_DMA | flags);
-   if (!edesc) {
-   caam_unmap(jrdev, req->src, req->dst, src_nents, dst_nents, 0,
-  0, 0, 0);
-   return ERR_PTR(-ENOMEM);
+   /* Check if there's enough space for edesc saved in req */
+   edesc_size = sizeof(*edesc) + desc_bytes + sec4_sg_bytes;
+   if (edesc_size > (crypto_aead_reqsize(aead) -
+ sizeof(struct caam_aead_req_ctx))) {
+   /*
+* allocate space for base edesc and
+* hw desc commands, link tables
+*/
+   edesc = kzalloc(edesc_size, GFP_DMA | flags);
+   if (!edesc) {
+   caam_unmap(jrdev, req->src, req->dst, src_nents,
+  dst_nents, 0, 0, 0, 0);
+   return ERR_PTR(-ENOMEM);
+   }
+   edesc->free = true;
+   } else {
+   /*
+* get address for base edesc and
+* hw desc commands, link tables
+*/
+   edesc = (struct aead_edesc *)((u8 *)rctx +
+   sizeof(struct caam_aead_req_ctx));
+   /* clear memory */
+   memset(edesc, 0, sizeof(*edesc));
}
 
edesc->src_nents = src_nents;
@@ -1538,7 +1557,8 @@ static int aead_do_one_req(struct crypto_engine *engine, void *areq)
 
if (ret != -EINPROGRESS) {
aead_unmap(ctx->jrdev, rctx->edesc, req);
-   kfree(rctx->edesc);
+   if (rctx->edesc->free)
+   kfree(rctx->edesc);
} else {
ret = 0;
}
@@ -3463,6 +3483,20 @@ static int caam_aead_init(struct crypto_aead *tfm)
struct caam_aead_alg *caam_alg =
 container_of(alg, struct caam_aead_alg, aead);
struct caam_ctx *ctx = crypto_aead_ctx(tfm);
+   int extra_reqsize = 0;
+
+   /*
+* Compute extra space needed for base edesc and
+* hw desc commands, lin

[RFC PATCH 3/4] crypto: caam - avoid allocating memory at crypto request runtime for skcipher

2020-11-25 Thread Iuliana Prodan (OSS)
From: Iuliana Prodan 

Remove the CRYPTO_ALG_ALLOCATES_MEMORY flag and allocate the memory
needed by the driver to fulfil a request within the crypto
request object.
The extra size needed for the base extended descriptor, hw
descriptor commands, link tables and IV is computed at frontend
driver (caamalg) initialization and saved in the reqsize field,
which indicates how much memory could be needed per request.

The CRYPTO_ALG_ALLOCATES_MEMORY flag is limited to
dm-crypt use cases, which appear to need at most 4 entries.
Therefore, in reqsize we reserve memory for a maximum of
4 entries for src plus 1 for the IV, and the same for dst, both aligned.
If the driver needs more than the 4 entries maximum, the memory
is dynamically allocated, at runtime.
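The reqsize arithmetic described above can be sketched as follows. All sizes here are illustrative placeholders rather than the driver's real constants, and `extra_reqsize` is a hypothetical helper name: the point is only the shape of the computation (base edesc + worst-case descriptor + 4 src entries + 1 IV entry + the same for dst + the IV bytes, rounded up for alignment).

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder sizes, not the real CAAM values. */
#define SEC4_SG_ENTRY_SZ 16   /* assumed size of one h/w link-table entry */
#define DESC_MAX_BYTES   256  /* assumed worst-case job descriptor length */
#define EDESC_BASE_SZ    64   /* assumed sizeof(struct skcipher_edesc) */
#define ALIGN_UP(x, a)   (((x) + (a) - 1) & ~((size_t)(a) - 1))

/*
 * Extra per-request space to report via reqsize: base edesc +
 * worst-case descriptor + (4 src + 1 IV + 4 dst + 1 IV) link-table
 * entries + the IV itself, rounded up for pointer/DMA alignment.
 */
static size_t extra_reqsize(size_t ivsize)
{
	size_t sg_bytes = (4 + 1 + 4 + 1) * SEC4_SG_ENTRY_SZ;

	return ALIGN_UP(EDESC_BASE_SZ + DESC_MAX_BYTES + sg_bytes + ivsize,
			sizeof(void *));
}
```

At request time the driver then compares the actual descriptor size against this reserved amount and only falls back to kzalloc() when it does not fit.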

Signed-off-by: Iuliana Prodan 
---
 drivers/crypto/caam/caamalg.c | 71 +--
 1 file changed, 52 insertions(+), 19 deletions(-)

diff --git a/drivers/crypto/caam/caamalg.c b/drivers/crypto/caam/caamalg.c
index 8697ae53b063..6ace8545faec 100644
--- a/drivers/crypto/caam/caamalg.c
+++ b/drivers/crypto/caam/caamalg.c
@@ -905,6 +905,7 @@ struct aead_edesc {
  * @iv_dma: dma address of iv for checking continuity and link table
  * @sec4_sg_bytes: length of dma mapped sec4_sg space
  * @bklog: stored to determine if the request needs backlog
+ * @free: stored to determine if skcipher_edesc needs to be freed
  * @sec4_sg_dma: bus physical mapped address of h/w link table
  * @sec4_sg: pointer to h/w link table
  * @hw_desc: the h/w job descriptor followed by any referenced link tables
@@ -918,6 +919,7 @@ struct skcipher_edesc {
dma_addr_t iv_dma;
int sec4_sg_bytes;
bool bklog;
+   bool free;
dma_addr_t sec4_sg_dma;
struct sec4_sg_entry *sec4_sg;
u32 hw_desc[];
@@ -1037,7 +1039,8 @@ static void skcipher_crypt_done(struct device *jrdev, u32 *desc, u32 err,
 DUMP_PREFIX_ADDRESS, 16, 4, req->dst,
 edesc->dst_nents > 1 ? 100 : req->cryptlen, 1);
 
-   kfree(edesc);
+   if (edesc->free)
+   kfree(edesc);
 
/*
 * If no backlog flag, the completion of the request is done
@@ -1604,7 +1607,7 @@ static struct skcipher_edesc *skcipher_edesc_alloc(struct skcipher_request *req,
dma_addr_t iv_dma = 0;
u8 *iv;
int ivsize = crypto_skcipher_ivsize(skcipher);
-   int dst_sg_idx, sec4_sg_ents, sec4_sg_bytes;
+   int dst_sg_idx, sec4_sg_ents, sec4_sg_bytes, edesc_size = 0;
 
src_nents = sg_nents_for_len(req->src, req->cryptlen);
if (unlikely(src_nents < 0)) {
@@ -1675,16 +1678,30 @@ static struct skcipher_edesc *skcipher_edesc_alloc(struct skcipher_request *req,
 
sec4_sg_bytes = sec4_sg_ents * sizeof(struct sec4_sg_entry);
 
-   /*
-* allocate space for base edesc and hw desc commands, link tables, IV
-*/
-   edesc = kzalloc(sizeof(*edesc) + desc_bytes + sec4_sg_bytes + ivsize,
-   GFP_DMA | flags);
-   if (!edesc) {
-   dev_err(jrdev, "could not allocate extended descriptor\n");
-   caam_unmap(jrdev, req->src, req->dst, src_nents, dst_nents, 0,
-  0, 0, 0);
-   return ERR_PTR(-ENOMEM);
+   /* Check if there's enough space for edesc saved in req */
+   edesc_size = sizeof(*edesc) + desc_bytes + sec4_sg_bytes + ivsize;
+   if (edesc_size > (crypto_skcipher_reqsize(skcipher) -
+ sizeof(struct caam_skcipher_req_ctx))) {
+   /*
+* allocate space for base edesc and hw desc commands,
+* link tables, IV
+*/
+   edesc = kzalloc(edesc_size, GFP_DMA | flags);
+   if (!edesc) {
+   caam_unmap(jrdev, req->src, req->dst, src_nents,
+  dst_nents, 0, 0, 0, 0);
+   return ERR_PTR(-ENOMEM);
+   }
+   edesc->free = true;
+   } else {
+   /*
+* get address for base edesc and hw desc commands,
+* link tables, IV
+*/
+   edesc = (struct skcipher_edesc *)((u8 *)rctx +
+   sizeof(struct caam_skcipher_req_ctx));
+   /* clear memory */
+   memset(edesc, 0, sizeof(*edesc));
}
 
edesc->src_nents = src_nents;
@@ -1764,11 +1781,11 @@ static int skcipher_do_one_req(struct crypto_engine *engine, void *areq)
 
if (ret != -EINPROGRESS) {
skcipher_unmap(ctx->jrdev, rctx->edesc, req);
-   kfree(rctx->edesc);
+   if (rctx->edesc->free)
+   kfree(rctx->edesc);
} else {
ret = 0;
}
-
return ret;
 }
 
@@ -3393,10 +3410,25 @@ static int caam_cra_init(struct crypto_skcipher *tfm)
container_of(alg, typeof(*caam_alg), skcipher);
struct caam_ctx *ctx = crypto_s

[RFC PATCH 2/4] net: esp: check CRYPTO_TFM_REQ_DMA flag when allocating crypto request

2020-11-25 Thread Iuliana Prodan (OSS)
From: Iuliana Prodan 

Some crypto backends might require the requests' private contexts
to be allocated in DMA-able memory.

Signed-off-by: Horia Geanta 
---
 net/ipv4/esp4.c | 7 ++-
 net/ipv6/esp6.c | 7 ++-
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index 8b07f3a4f2db..9edfb1012c3d 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -46,6 +46,7 @@ struct esp_output_extra {
 static void *esp_alloc_tmp(struct crypto_aead *aead, int nfrags, int extralen)
 {
unsigned int len;
+   gfp_t gfp = GFP_ATOMIC;
 
len = extralen;
 
@@ -62,7 +63,11 @@ static void *esp_alloc_tmp(struct crypto_aead *aead, int nfrags, int extralen)
 
len += sizeof(struct scatterlist) * nfrags;
 
-   return kmalloc(len, GFP_ATOMIC);
+   if (crypto_aead_reqsize(aead) &&
+   (crypto_aead_get_flags(aead) & CRYPTO_TFM_REQ_DMA))
+   gfp |= GFP_DMA;
+
+   return kmalloc(len, gfp);
 }
 
 static inline void *esp_tmp_extra(void *tmp)
diff --git a/net/ipv6/esp6.c b/net/ipv6/esp6.c
index 52c2f063529f..e9125e1234b5 100644
--- a/net/ipv6/esp6.c
+++ b/net/ipv6/esp6.c
@@ -63,6 +63,7 @@ struct esp_output_extra {
 static void *esp_alloc_tmp(struct crypto_aead *aead, int nfrags, int seqihlen)
 {
unsigned int len;
+   gfp_t gfp = GFP_ATOMIC;
 
len = seqihlen;
 
@@ -79,7 +80,11 @@ static void *esp_alloc_tmp(struct crypto_aead *aead, int nfrags, int seqihlen)
 
len += sizeof(struct scatterlist) * nfrags;
 
-   return kmalloc(len, GFP_ATOMIC);
+   if (crypto_aead_reqsize(aead) &&
+   (crypto_aead_get_flags(aead) & CRYPTO_TFM_REQ_DMA))
+   gfp |= GFP_DMA;
+
+   return kmalloc(len, gfp);
 }
 
 static inline void *esp_tmp_extra(void *tmp)
-- 
2.17.1



[RFC PATCH 1/4] crypto: add CRYPTO_TFM_REQ_DMA flag

2020-11-25 Thread Iuliana Prodan (OSS)
From: Iuliana Prodan 

The CRYPTO_TFM_REQ_DMA flag can be used by backend implementations to
indicate to the crypto API the need to allocate GFP_DMA memory
for the private contexts of crypto requests.

For public-key encryption, add the functions needed to
set/get/clear flags.
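The gfp selection this flag enables in the request allocators can be modelled as below. Only the CRYPTO_TFM_REQ_DMA value mirrors the patch; GFP_ATOMIC/GFP_DMA are stand-in values and `request_alloc_gfp` is a hypothetical helper, so the decision logic can be exercised outside the kernel.

```c
#include <assert.h>

#define CRYPTO_TFM_REQ_DMA 0x00000800u /* value from the patch */
#define GFP_ATOMIC 0x1u                /* stand-in, not the kernel's */
#define GFP_DMA    0x2u                /* stand-in, not the kernel's */

/*
 * Model of the check added to the *_request_alloc() helpers: request
 * DMA-able memory only when the tfm has a private context
 * (reqsize > 0) and has advertised CRYPTO_TFM_REQ_DMA.
 */
static unsigned int request_alloc_gfp(unsigned int tfm_flags,
				      unsigned int reqsize,
				      unsigned int gfp)
{
	if (reqsize && (tfm_flags & CRYPTO_TFM_REQ_DMA))
		gfp |= GFP_DMA;
	return gfp;
}
```

Guarding on a nonzero reqsize means transforms with no private context keep their original gfp mask untouched.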

Signed-off-by: Horia Geanta 
Signed-off-by: Iuliana Prodan 
---
 include/crypto/aead.h |  4 
 include/crypto/akcipher.h | 21 +
 include/crypto/hash.h |  4 
 include/crypto/skcipher.h |  4 
 include/linux/crypto.h|  1 +
 5 files changed, 34 insertions(+)

diff --git a/include/crypto/aead.h b/include/crypto/aead.h
index fcc12c593ef8..ae2ef87cfb0d 100644
--- a/include/crypto/aead.h
+++ b/include/crypto/aead.h
@@ -416,6 +416,10 @@ static inline struct aead_request *aead_request_alloc(struct crypto_aead *tfm,
 {
struct aead_request *req;
 
+   if (crypto_aead_reqsize(tfm) &&
+   (crypto_aead_get_flags(tfm) & CRYPTO_TFM_REQ_DMA))
+   gfp |= GFP_DMA;
+
req = kmalloc(sizeof(*req) + crypto_aead_reqsize(tfm), gfp);
 
if (likely(req))
diff --git a/include/crypto/akcipher.h b/include/crypto/akcipher.h
index 1d3aa252caba..c06c140d1b7a 100644
--- a/include/crypto/akcipher.h
+++ b/include/crypto/akcipher.h
@@ -158,6 +158,23 @@ static inline unsigned int crypto_akcipher_reqsize(struct crypto_akcipher *tfm)
return crypto_akcipher_alg(tfm)->reqsize;
 }
 
+static inline u32 crypto_akcipher_get_flags(struct crypto_akcipher *tfm)
+{
+   return crypto_tfm_get_flags(crypto_akcipher_tfm(tfm));
+}
+
+static inline void crypto_akcipher_set_flags(struct crypto_akcipher *tfm,
+u32 flags)
+{
+   crypto_tfm_set_flags(crypto_akcipher_tfm(tfm), flags);
+}
+
+static inline void crypto_akcipher_clear_flags(struct crypto_akcipher *tfm,
+  u32 flags)
+{
+   crypto_tfm_clear_flags(crypto_akcipher_tfm(tfm), flags);
+}
+
 static inline void akcipher_request_set_tfm(struct akcipher_request *req,
struct crypto_akcipher *tfm)
 {
@@ -193,6 +210,10 @@ static inline struct akcipher_request *akcipher_request_alloc(
 {
struct akcipher_request *req;
 
+   if (crypto_akcipher_reqsize(tfm) &&
+   (crypto_akcipher_get_flags(tfm) & CRYPTO_TFM_REQ_DMA))
+   gfp |= GFP_DMA;
+
req = kmalloc(sizeof(*req) + crypto_akcipher_reqsize(tfm), gfp);
if (likely(req))
akcipher_request_set_tfm(req, tfm);
diff --git a/include/crypto/hash.h b/include/crypto/hash.h
index af2ff31ff619..cb28be54569a 100644
--- a/include/crypto/hash.h
+++ b/include/crypto/hash.h
@@ -599,6 +599,10 @@ static inline struct ahash_request *ahash_request_alloc(
 {
struct ahash_request *req;
 
+   if (crypto_ahash_reqsize(tfm) &&
+   (crypto_ahash_get_flags(tfm) & CRYPTO_TFM_REQ_DMA))
+   gfp |= GFP_DMA;
+
req = kmalloc(sizeof(struct ahash_request) +
  crypto_ahash_reqsize(tfm), gfp);
 
diff --git a/include/crypto/skcipher.h b/include/crypto/skcipher.h
index 6a733b171a5d..3c598b56628b 100644
--- a/include/crypto/skcipher.h
+++ b/include/crypto/skcipher.h
@@ -493,6 +493,10 @@ static inline struct skcipher_request *skcipher_request_alloc(
 {
struct skcipher_request *req;
 
+   if (crypto_skcipher_reqsize(tfm) &&
+   (crypto_skcipher_get_flags(tfm) & CRYPTO_TFM_REQ_DMA))
+   gfp |= GFP_DMA;
+
req = kmalloc(sizeof(struct skcipher_request) +
  crypto_skcipher_reqsize(tfm), gfp);
 
diff --git a/include/linux/crypto.h b/include/linux/crypto.h
index ef90e07c9635..87d7f0563c13 100644
--- a/include/linux/crypto.h
+++ b/include/linux/crypto.h
@@ -141,6 +141,7 @@
 #define CRYPTO_TFM_REQ_FORBID_WEAK_KEYS0x0100
 #define CRYPTO_TFM_REQ_MAY_SLEEP   0x0200
 #define CRYPTO_TFM_REQ_MAY_BACKLOG 0x0400
+#define CRYPTO_TFM_REQ_DMA 0x0800
 
 /*
  * Miscellaneous stuff.
-- 
2.17.1



[RFC PATCH 0/4] crypto: add CRYPTO_TFM_REQ_DMA flag

2020-11-25 Thread Iuliana Prodan (OSS)
From: Iuliana Prodan 

Add the option to allocate the crypto request object, plus any extra space
needed by the driver, in DMA-able memory.

Add the CRYPTO_TFM_REQ_DMA flag, to be used by backend implementations to
indicate to the crypto API the need to allocate GFP_DMA memory
for the private contexts of crypto requests.

For IPsec use cases, CRYPTO_TFM_REQ_DMA flag is also checked in
esp_alloc_tmp() function for IPv4 and IPv6.

This series includes an example of how a driver can use the
CRYPTO_TFM_REQ_DMA flag while setting reqsize to a larger value
to avoid allocating memory at crypto request runtime.
The extra size needed by the driver is added to the reqsize field,
which indicates how much memory could be needed per request.

Iuliana Prodan (4):
  crypto: add CRYPTO_TFM_REQ_DMA flag
  net: esp: check CRYPTO_TFM_REQ_DMA flag when allocating crypto request
  crypto: caam - avoid allocating memory at crypto request runtime for
skcipher
  crypto: caam - avoid allocating memory at crypto request runtime for
aead

 drivers/crypto/caam/caamalg.c | 130 +-
 include/crypto/aead.h |   4 ++
 include/crypto/akcipher.h |  21 ++
 include/crypto/hash.h |   4 ++
 include/crypto/skcipher.h |   4 ++
 include/linux/crypto.h|   1 +
 net/ipv4/esp4.c   |   7 +-
 net/ipv6/esp6.c   |   7 +-
 8 files changed, 144 insertions(+), 34 deletions(-)

-- 
2.17.1