Re: [PATCH v7] mm/zswap: move to use crypto_acomp API for hardware acceleration

2020-11-09 Thread Sebastian Andrzej Siewior
I've been looking at the patch and it looks like it should work. Having
numbers to back up the performance, both for the pure-software version and
with HW acceleration, would be _very_ nice to have.

On 2020-11-07 19:53:32 [+1300], Barry Song wrote:
> index fbb7829..73f04de 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -415,30 +445,54 @@ static int zswap_dstmem_dead(unsigned int cpu)
…
> + acomp_ctx->req = req;
> +
> + crypto_init_wait(&acomp_ctx->wait);
> + /*
> +  * if the backend of acomp is async zip, crypto_req_done() will wakeup
> +  * crypto_wait_req(); if the backend of acomp is scomp, the callback
> +  * won't be called, crypto_wait_req() will return without blocking.
> +  */
> + acomp_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
> +crypto_req_done, &acomp_ctx->wait);
> +
> + acomp_ctx->mutex = per_cpu(zswap_mutex, cpu);
> + acomp_ctx->dstmem = per_cpu(zswap_dstmem, cpu);

You added a comment here and there, but you never mentioned that this
single per-CPU mutex protects both the per-CPU context (of which there can
be more than one on a single CPU) and the scratch/dstmem, of which there is
one per CPU. Of course, if you read the code you can figure it out.
I still think that you should have a pool of memory and crypto contexts
which you can use instead of having them strictly per-CPU. The code is
fully preemptible and you may have multiple requests on the same CPU.
Yes, locking works, but at the same time you block processing while waiting
on a lock and the "reserved memory" on other CPUs remains unused.
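Something along these lines is what I have in mind — a rough sketch only,
untested, all names made up (the usual list/spinlock/wait headers assumed):

/* A small pool of contexts + scratch memory instead of strict per-CPU
 * ownership. ctx_get() blocks until a context is free, ctx_put() returns
 * it and wakes one waiter. A busy CPU can then use contexts which would
 * otherwise sit idle on other CPUs. */
struct zswap_comp_ctx {
	struct list_head node;
	struct crypto_acomp *acomp;
	struct acomp_req *req;
	struct crypto_wait wait;
	u8 *dstmem;
};

static LIST_HEAD(free_ctxs);
static DEFINE_SPINLOCK(free_ctxs_lock);
static DECLARE_WAIT_QUEUE_HEAD(ctx_waitq);

static struct zswap_comp_ctx *ctx_get(void)
{
	struct zswap_comp_ctx *ctx;

	spin_lock(&free_ctxs_lock);
	while (list_empty(&free_ctxs)) {
		/* drop the lock, sleep until ctx_put() refills the pool */
		spin_unlock(&free_ctxs_lock);
		wait_event(ctx_waitq, !list_empty(&free_ctxs));
		spin_lock(&free_ctxs_lock);
	}
	ctx = list_first_entry(&free_ctxs, struct zswap_comp_ctx, node);
	list_del(&ctx->node);
	spin_unlock(&free_ctxs_lock);
	return ctx;
}

static void ctx_put(struct zswap_comp_ctx *ctx)
{
	spin_lock(&free_ctxs_lock);
	list_add(&ctx->node, &free_ctxs);
	spin_unlock(&free_ctxs_lock);
	wake_up(&ctx_waitq);
}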

Sebastian


[PATCH v2 net-next 1/3] soc/fsl/qbman: Add an argument to signal if NAPI processing is required.

2020-11-01 Thread Sebastian Andrzej Siewior
dpaa_eth_napi_schedule() and caam_qi_napi_schedule() schedule NAPI if
invoked from:

 - Hard interrupt context
 - Any context which is not serving soft interrupts

Any context which is not serving soft interrupts includes hard interrupts
so the in_irq() check is redundant. caam_qi_napi_schedule() has a comment
about this:

/*
 * In case of threaded ISR, for RT kernels in_irq() does not return
 * appropriate value, so use in_serving_softirq to distinguish between
 * softirq and irq contexts.
 */
 if (in_irq() || !in_serving_softirq())

This has nothing to do with RT. Even on a non-RT kernel, force-threaded
interrupts obviously run in thread context, and therefore in_irq() returns
false when invoked from the handler.

The extension of the in_irq() check with !in_serving_softirq() was there
when the drivers were added, but in the out-of-tree FSL BSP the original
condition was in_irq(), which got extended due to failures on RT.

The usage of in_xxx() in drivers is phased out and Linus clearly requested
that code which changes behaviour depending on context should either be
separated or the context be conveyed in an argument passed by the caller,
which usually knows the context. Right he is: the above construct clearly
shows why.

The following callchains have been analyzed to end up in
dpaa_eth_napi_schedule():

qman_p_poll_dqrr()
  __poll_portal_fast()
    fq->cb.dqrr()
      dpaa_eth_napi_schedule()

portal_isr()
  __poll_portal_fast()
    fq->cb.dqrr()
      dpaa_eth_napi_schedule()

Both need to schedule NAPI.
The crypto part has another code path leading up to this:
kill_fq()
  empty_retired_fq()
    qman_p_poll_dqrr()
      __poll_portal_fast()
        fq->cb.dqrr()
          dpaa_eth_napi_schedule()

kill_fq() is called from task context and ends up scheduling NAPI, but
that's pointless and an unintended side effect of the !in_serving_softirq()
check.

The code path:
  caam_qi_poll() -> qman_p_poll_dqrr()

is invoked from NAPI and I *assume* from crypto's NAPI device and not
from qbman's NAPI device. I *guess* it is okay to skip scheduling NAPI
(because this is what happens now) but could be changed if it is wrong
due to `budget' handling.

Add an argument to __poll_portal_fast() which is true if NAPI needs to be
scheduled. This requires propagating the value to the caller including
`qman_cb_dqrr' typedef which is used by the dpaa and the crypto driver.
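
For reference, the `qman_cb_dqrr' typedef then looks roughly like this (a
sketch reconstructed from the diffstat and the hunks below, not the literal
qman.h hunk):

typedef enum qman_cb_dqrr_result (*qman_cb_dqrr)(struct qman_portal *qm,
						 struct qman_fq *fq,
						 const struct qm_dqrr_entry *dqrr,
						 bool sched_napi);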

Signed-off-by: Sebastian Andrzej Siewior 
Cc: "Horia Geantă" 
Cc: Aymen Sghaier 
Cc: Herbert Xu 
Cc: "David S. Miller" 
Cc: Madalin Bucur 
Cc: Jakub Kicinski 
Cc: Li Yang 
Cc: linux-crypto@vger.kernel.org
Cc: net...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: linux-arm-ker...@lists.infradead.org
---
 drivers/crypto/caam/qi.c   |  3 ++-
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 12 
 drivers/soc/fsl/qbman/qman.c   | 12 ++--
 drivers/soc/fsl/qbman/qman_test_api.c  |  6 --
 drivers/soc/fsl/qbman/qman_test_stash.c|  6 --
 include/soc/fsl/qman.h |  3 ++-
 6 files changed, 26 insertions(+), 16 deletions(-)

diff --git a/drivers/crypto/caam/qi.c b/drivers/crypto/caam/qi.c
index ec53528d82058..57f6ab6bfb56a 100644
--- a/drivers/crypto/caam/qi.c
+++ b/drivers/crypto/caam/qi.c
@@ -564,7 +564,8 @@ static int caam_qi_napi_schedule(struct qman_portal *p, struct caam_napi *np)
 
 static enum qman_cb_dqrr_result caam_rsp_fq_dqrr_cb(struct qman_portal *p,
 						    struct qman_fq *rsp_fq,
-						    const struct qm_dqrr_entry *dqrr)
+						    const struct qm_dqrr_entry *dqrr,
+						    bool sched_napi)
 {
struct caam_napi *caam_napi = raw_cpu_ptr(&pcpu_qipriv.caam_napi);
struct caam_drv_req *drv_req;
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index 06cc863f4dd63..98ead77c673e5 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -2316,7 +2316,8 @@ static inline int dpaa_eth_napi_schedule(struct dpaa_percpu_priv *percpu_priv,
 
 static enum qman_cb_dqrr_result rx_error_dqrr(struct qman_portal *portal,
  struct qman_fq *fq,
- const struct qm_dqrr_entry *dq)
+ const struct qm_dqrr_entry *dq,
+ bool sched_napi)
 {
struct dpaa_fq *dpaa_fq = container_of(fq, struct dpaa_fq, fq_base);
struct dpaa_percpu_priv *percpu_priv;
@@ -2343,7 +2344,8 @@ static enum qman_cb_dqrr_result rx_error_dqrr(struct qman_portal *portal,
 
 stati

[PATCH v2 net-next 3/3] crypto: caam: Replace in_irq() usage.

2020-11-01 Thread Sebastian Andrzej Siewior
The driver uses in_irq() + in_serving_softirq() magic to decide whether
NAPI needs to be scheduled or the packet can be processed directly.

The usage of in_*() in drivers is phased out and Linus clearly requested
that code which changes behaviour depending on context should either be
separated or the context be conveyed in an argument passed by the caller,
which usually knows the context.

Use the `sched_napi' argument passed to the callback. It is set to true
when called from the interrupt handler and NAPI should be scheduled.

Signed-off-by: Sebastian Andrzej Siewior 
Cc: "Horia Geantă" 
Cc: Aymen Sghaier 
Cc: Herbert Xu 
Cc: "David S. Miller" 
Cc: Madalin Bucur 
Cc: Jakub Kicinski 
Cc: Li Yang 
Cc: linux-crypto@vger.kernel.org
Cc: net...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: linux-arm-ker...@lists.infradead.org
---
 drivers/crypto/caam/qi.c | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/crypto/caam/qi.c b/drivers/crypto/caam/qi.c
index 57f6ab6bfb56a..8163f5df8ebf7 100644
--- a/drivers/crypto/caam/qi.c
+++ b/drivers/crypto/caam/qi.c
@@ -545,14 +545,10 @@ static void cgr_cb(struct qman_portal *qm, struct qman_cgr *cgr, int congested)
}
 }
 
-static int caam_qi_napi_schedule(struct qman_portal *p, struct caam_napi *np)
+static int caam_qi_napi_schedule(struct qman_portal *p, struct caam_napi *np,
+				 bool sched_napi)
 {
-   /*
-* In case of threaded ISR, for RT kernels in_irq() does not return
-* appropriate value, so use in_serving_softirq to distinguish between
-* softirq and irq contexts.
-*/
-   if (unlikely(in_irq() || !in_serving_softirq())) {
+   if (sched_napi) {
/* Disable QMan IRQ source and invoke NAPI */
qman_p_irqsource_remove(p, QM_PIRQ_DQRI);
np->p = p;
@@ -574,7 +570,7 @@ static enum qman_cb_dqrr_result caam_rsp_fq_dqrr_cb(struct qman_portal *p,
struct caam_drv_private *priv = dev_get_drvdata(qidev);
u32 status;
 
-   if (caam_qi_napi_schedule(p, caam_napi))
+   if (caam_qi_napi_schedule(p, caam_napi, sched_napi))
return qman_cb_dqrr_stop;
 
fd = &dqrr->fd;
-- 
2.29.1



[PATCH v2 net-next 2/3] net: dpaa: Replace in_irq() usage.

2020-11-01 Thread Sebastian Andrzej Siewior
The driver uses in_irq() + in_serving_softirq() magic to decide whether
NAPI needs to be scheduled or the packet can be processed directly.

The usage of in_*() in drivers is phased out and Linus clearly requested
that code which changes behaviour depending on context should either be
separated or the context be conveyed in an argument passed by the caller,
which usually knows the context.

Use the `sched_napi' argument passed to the callback. It is set to true
when called from the interrupt handler and NAPI should be scheduled.

Signed-off-by: Sebastian Andrzej Siewior 
Cc: "Horia Geantă" 
Cc: Aymen Sghaier 
Cc: Herbert Xu 
Cc: "David S. Miller" 
Cc: Madalin Bucur 
Cc: Jakub Kicinski 
Cc: Li Yang 
Cc: linux-crypto@vger.kernel.org
Cc: net...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: linux-arm-ker...@lists.infradead.org
---
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index 98ead77c673e5..39c996b64ccec 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -2300,9 +2300,9 @@ static void dpaa_tx_conf(struct net_device *net_dev,
 }
 
 static inline int dpaa_eth_napi_schedule(struct dpaa_percpu_priv *percpu_priv,
-struct qman_portal *portal)
+					 struct qman_portal *portal, bool sched_napi)
 {
-   if (unlikely(in_irq() || !in_serving_softirq())) {
+   if (sched_napi) {
/* Disable QMan IRQ and invoke NAPI */
qman_p_irqsource_remove(portal, QM_PIRQ_DQRI);
 
@@ -2333,7 +2333,7 @@ static enum qman_cb_dqrr_result rx_error_dqrr(struct qman_portal *portal,
 
percpu_priv = this_cpu_ptr(priv->percpu_priv);
 
-   if (dpaa_eth_napi_schedule(percpu_priv, portal))
+   if (dpaa_eth_napi_schedule(percpu_priv, portal, sched_napi))
return qman_cb_dqrr_stop;
 
dpaa_eth_refill_bpools(priv);
@@ -2377,7 +2377,7 @@ static enum qman_cb_dqrr_result rx_default_dqrr(struct qman_portal *portal,
percpu_priv = this_cpu_ptr(priv->percpu_priv);
percpu_stats = &percpu_priv->stats;
 
-   if (unlikely(dpaa_eth_napi_schedule(percpu_priv, portal)))
+   if (unlikely(dpaa_eth_napi_schedule(percpu_priv, portal, sched_napi)))
return qman_cb_dqrr_stop;
 
/* Make sure we didn't run out of buffers */
@@ -2474,7 +2474,7 @@ static enum qman_cb_dqrr_result conf_error_dqrr(struct qman_portal *portal,
 
percpu_priv = this_cpu_ptr(priv->percpu_priv);
 
-   if (dpaa_eth_napi_schedule(percpu_priv, portal))
+   if (dpaa_eth_napi_schedule(percpu_priv, portal, sched_napi))
return qman_cb_dqrr_stop;
 
dpaa_tx_error(net_dev, priv, percpu_priv, &dq->fd, fq->fqid);
@@ -2499,7 +2499,7 @@ static enum qman_cb_dqrr_result conf_dflt_dqrr(struct qman_portal *portal,
 
percpu_priv = this_cpu_ptr(priv->percpu_priv);
 
-   if (dpaa_eth_napi_schedule(percpu_priv, portal))
+   if (dpaa_eth_napi_schedule(percpu_priv, portal, sched_napi))
return qman_cb_dqrr_stop;
 
dpaa_tx_conf(net_dev, priv, percpu_priv, &dq->fd, fq->fqid);
-- 
2.29.1



Re: [PATCH net-next 14/15] net: dpaa: Replace in_irq() usage.

2020-11-01 Thread Sebastian Andrzej Siewior
On 2020-10-31 10:12:15 [-0700], Jakub Kicinski wrote:
> Nit: some networking drivers have a bool napi which means "are we
> running in napi context", the semantics here feel a little backwards,
> at least to me. But if I'm the only one thinking this, so be it.

I renamed it to `sched_napi'.

Sebastian


[PATCH net-next 06/15] net: airo: Invoke airo_read_wireless_stats() directly

2020-10-27 Thread Sebastian Andrzej Siewior
airo_get_wireless_stats() is the iw_handler_if::get_wireless_stats()
callback of this driver. This callback was not allowed to sleep until
commit a160ee69c6a46 ("wext: let get_wireless_stats() sleep") in v2.6.32.

airo still delegates the readout to a thread, which is no longer
necessary.

Invoke airo_read_wireless_stats() directly from the callback and remove
the now unused JOB_WSTATS handling.

Signed-off-by: Sebastian Andrzej Siewior 
Cc: Kalle Valo 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: linux-wirel...@vger.kernel.org
Cc: net...@vger.kernel.org
---
 drivers/net/wireless/cisco/airo.c | 22 +-
 1 file changed, 5 insertions(+), 17 deletions(-)

diff --git a/drivers/net/wireless/cisco/airo.c b/drivers/net/wireless/cisco/airo.c
index 87b9398b03fd4..ca423f3b6b3ea 100644
--- a/drivers/net/wireless/cisco/airo.c
+++ b/drivers/net/wireless/cisco/airo.c
@@ -1144,7 +1144,6 @@ static int airo_thread(void *data);
 static void timer_func(struct net_device *dev);
 static int airo_ioctl(struct net_device *dev, struct ifreq *rq, int cmd);
 static struct iw_statistics *airo_get_wireless_stats(struct net_device *dev);
-static void airo_read_wireless_stats(struct airo_info *local);
 #ifdef CISCO_EXT
 static int readrids(struct net_device *dev, aironet_ioctl *comp);
 static int writerids(struct net_device *dev, aironet_ioctl *comp);
@@ -1200,7 +1199,6 @@ struct airo_info {
 #define JOB_MIC5
 #define JOB_EVENT  6
 #define JOB_AUTOWEP7
-#define JOB_WSTATS 8
 #define JOB_SCAN_RESULTS  9
unsigned long jobs;
int (*bap_read)(struct airo_info*, __le16 *pu16Dst, int bytelen,
@@ -3155,8 +3153,6 @@ static int airo_thread(void *data)
airo_end_xmit11(dev);
else if (test_bit(JOB_STATS, &ai->jobs))
airo_read_stats(dev);
-   else if (test_bit(JOB_WSTATS, &ai->jobs))
-   airo_read_wireless_stats(ai);
else if (test_bit(JOB_PROMISC, &ai->jobs))
airo_set_promisc(ai);
else if (test_bit(JOB_MIC, &ai->jobs))
@@ -7732,15 +7728,12 @@ static void airo_read_wireless_stats(struct airo_info *local)
__le32 *vals = stats_rid.vals;
 
/* Get stats out of the card */
-   clear_bit(JOB_WSTATS, &local->jobs);
-   if (local->power.event) {
-   up(&local->sem);
+   if (local->power.event)
return;
-   }
+
readCapabilityRid(local, &cap_rid, 0);
readStatusRid(local, &status_rid, 0);
readStatsRid(local, &stats_rid, RID_STATS, 0);
-   up(&local->sem);
 
/* The status */
local->wstats.status = le16_to_cpu(status_rid.mode);
@@ -7783,15 +7776,10 @@ static struct iw_statistics *airo_get_wireless_stats(struct net_device *dev)
 {
struct airo_info *local =  dev->ml_priv;
 
-   if (!test_bit(JOB_WSTATS, &local->jobs)) {
-   /* Get stats out of the card if available */
-   if (down_trylock(&local->sem) != 0) {
-   set_bit(JOB_WSTATS, &local->jobs);
-   wake_up_interruptible(&local->thr_wait);
-   } else
-   airo_read_wireless_stats(local);
+   if (!down_interruptible(&local->sem)) {
+   airo_read_wireless_stats(local);
+   up(&local->sem);
}
-
return &local->wstats;
 }
 
-- 
2.28.0



[PATCH net-next 09/15] net: hostap: Remove in_atomic() check.

2020-10-27 Thread Sebastian Andrzej Siewior
hostap_get_wireless_stats() is the iw_handler_if::get_wireless_stats()
callback of this driver. This callback was not allowed to sleep until
commit a160ee69c6a46 ("wext: let get_wireless_stats() sleep") in v2.6.32.

Remove the therefore pointless in_atomic() check.

Signed-off-by: Sebastian Andrzej Siewior 
Cc: Jouni Malinen 
Cc: Kalle Valo 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: linux-wirel...@vger.kernel.org
Cc: net...@vger.kernel.org
---
 drivers/net/wireless/intersil/hostap/hostap_ioctl.c | 13 +
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/drivers/net/wireless/intersil/hostap/hostap_ioctl.c b/drivers/net/wireless/intersil/hostap/hostap_ioctl.c
index 514c7b01dbf6f..49766b285230c 100644
--- a/drivers/net/wireless/intersil/hostap/hostap_ioctl.c
+++ b/drivers/net/wireless/intersil/hostap/hostap_ioctl.c
@@ -44,19 +44,8 @@ static struct iw_statistics *hostap_get_wireless_stats(struct net_device *dev)
 
if (local->iw_mode != IW_MODE_MASTER &&
local->iw_mode != IW_MODE_REPEAT) {
-   int update = 1;
-#ifdef in_atomic
-   /* RID reading might sleep and it must not be called in
-* interrupt context or while atomic. However, this
-* function seems to be called while atomic (at least in Linux
-* 2.5.59). Update signal quality values only if in suitable
-* context. Otherwise, previous values read from tick timer
-* will be used. */
-   if (in_atomic())
-   update = 0;
-#endif /* in_atomic */
 
-   if (update && prism2_update_comms_qual(dev) == 0)
+   if (prism2_update_comms_qual(dev) == 0)
wstats->qual.updated = IW_QUAL_ALL_UPDATED |
IW_QUAL_DBM;
 
-- 
2.28.0



[PATCH net-next 12/15] net: rtlwifi: Remove in_interrupt() usage in halbtc_send_bt_mp_operation()

2020-10-27 Thread Sebastian Andrzej Siewior
halbtc_send_bt_mp_operation() uses in_interrupt() to determine if it is
safe to invoke wait_for_completion().

The usage of in_interrupt() in drivers is phased out and Linus clearly
requested that code which changes behaviour depending on context should
either be separated or the context be conveyed in an argument passed by the
caller, which usually knows the context.

Aside of that, in_interrupt() is not correct as it does not catch
preempt-disabled regions, which cannot sleep either.

halbtc_send_bt_mp_operation() is called from:

 rtl_watchdog_wq_callback()
   rtl_btc_periodical()
 halbtc_get()
   case BTC_GET_U4_BT_PATCH_VER:
 halbtc_get_bt_patch_version()

 which is preemptible context.

   rtl_c2h_content_parsing()
 btc_ops->btc_btinfo_notify()
   rtl_btc_btinfo_notify()
 exhalbtc_bt_info_notify()
   ex_btc8723b1ant_bt_info_notify()
   ex_btc8821a1ant_bt_info_notify()
   ex_btc8821a2ant_bt_info_notify()
 btcoexist->btc_set_bt_reg()
   halbtc_set_bt_reg()

   rtl_c2h_content_parsing() is in turn called from:

   rtl_c2hcmd_wq_callback()
 rtl_c2hcmd_launcher()

   which is preemptible context and from:

   _rtl_pci_rx_interrupt
 rtl_c2hcmd_enqueue()

   which is obviously not preemptible, but is limited to C2H_BT_MP commands,
   for which rtl_c2h_content_parsing() is invoked directly.

Aside of that it can be reached from:

 halbtc_get()
   case BTC_GET_U4_SUPPORTED_FEATURE:
 halbtc_get_bt_coex_supported_feature()
   case BTC_GET_U4_BT_FORBIDDEN_SLOT_VAL:
 halbtc_get_bt_forbidden_slot_val()
   case BTC_GET_U4_BT_DEVICE_INFO:
 halbtc_get_bt_device_info()
   case BTC_GET_U4_SUPPORTED_VERSION:
 halbtc_get_bt_coex_supported_version()
   case BTC_GET_U4_SUPPORTED_FEATURE:
 halbtc_get_bt_coex_supported_feature()

   btcoexist->btc_get_bt_afh_map_from_bt()
 halbtc_get_bt_afh_map_from_bt()

   btcoexist->btc_get_ble_scan_para_from_bt()
 halbtc_get_ble_scan_para_from_bt()

   btcoexist->btc_get_ble_scan_type_from_bt()
 halbtc_get_ble_scan_type_from_bt()

   btcoexist->btc_get_ant_det_val_from_bt()
 halbtc_get_ant_det_val_from_bt()

   btcoexist->btc_get_bt_coex_supported_version()
 halbtc_get_bt_coex_supported_version()

   btcoexist->btc_get_bt_coex_supported_feature()
 halbtc_get_bt_coex_supported_feature()

None of these have a caller. Welcome to the wonderful world of HALs and
onion layers.

Remove in_interrupt() check.

Signed-off-by: Sebastian Andrzej Siewior 
Cc: Ping-Ke Shih 
Cc: Kalle Valo 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: linux-wirel...@vger.kernel.org
Cc: net...@vger.kernel.org
---
 drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtcoutsrc.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtcoutsrc.c b/drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtcoutsrc.c
index 2155a6699ef8d..be4c0e60d44d1 100644
--- a/drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtcoutsrc.c
+++ b/drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtcoutsrc.c
@@ -240,9 +240,6 @@ bool halbtc_send_bt_mp_operation(struct btc_coexist *btcoexist, u8 op_code,
rtl_dbg(rtlpriv, COMP_BT_COEXIST, DBG_LOUD,
"btmpinfo wait req_num=%d wait=%ld\n", req_num, wait_ms);
 
-   if (in_interrupt())
-   return false;
-
if (wait_for_completion_timeout(&btcoexist->bt_mp_comp,
msecs_to_jiffies(wait_ms)) == 0) {
rtl_dbg(rtlpriv, COMP_BT_COEXIST, DBG_DMESG,
-- 
2.28.0



[PATCH net-next 15/15] crypto: caam: Replace in_irq() usage.

2020-10-27 Thread Sebastian Andrzej Siewior
The driver uses in_irq() + in_serving_softirq() magic to decide whether
NAPI needs to be scheduled or the packet can be processed directly.

The usage of in_*() in drivers is phased out and Linus clearly requested
that code which changes behaviour depending on context should either be
separated or the context be conveyed in an argument passed by the caller,
which usually knows the context.

Use the `napi' argument passed to the callback. It is set to true when
called from the interrupt handler and NAPI should be scheduled.

Signed-off-by: Sebastian Andrzej Siewior 
Cc: "Horia Geantă" 
Cc: Aymen Sghaier 
Cc: Herbert Xu 
Cc: "David S. Miller" 
Cc: Madalin Bucur 
Cc: Jakub Kicinski 
Cc: Li Yang 
Cc: linux-crypto@vger.kernel.org
Cc: net...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: linux-arm-ker...@lists.infradead.org
---
 drivers/crypto/caam/qi.c | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/crypto/caam/qi.c b/drivers/crypto/caam/qi.c
index 09ea398304c8b..79dbd90887f8a 100644
--- a/drivers/crypto/caam/qi.c
+++ b/drivers/crypto/caam/qi.c
@@ -545,14 +545,10 @@ static void cgr_cb(struct qman_portal *qm, struct qman_cgr *cgr, int congested)
}
 }
 
-static int caam_qi_napi_schedule(struct qman_portal *p, struct caam_napi *np)
+static int caam_qi_napi_schedule(struct qman_portal *p, struct caam_napi *np,
+				 bool napi)
 {
-   /*
-* In case of threaded ISR, for RT kernels in_irq() does not return
-* appropriate value, so use in_serving_softirq to distinguish between
-* softirq and irq contexts.
-*/
-   if (unlikely(in_irq() || !in_serving_softirq())) {
+   if (napi) {
/* Disable QMan IRQ source and invoke NAPI */
qman_p_irqsource_remove(p, QM_PIRQ_DQRI);
np->p = p;
@@ -574,7 +570,7 @@ static enum qman_cb_dqrr_result caam_rsp_fq_dqrr_cb(struct qman_portal *p,
struct caam_drv_private *priv = dev_get_drvdata(qidev);
u32 status;
 
-   if (caam_qi_napi_schedule(p, caam_napi))
+   if (caam_qi_napi_schedule(p, caam_napi, napi))
return qman_cb_dqrr_stop;
 
fd = &dqrr->fd;
-- 
2.28.0



[PATCH net-next 10/15] net: zd1211rw: Remove in_atomic() usage.

2020-10-27 Thread Sebastian Andrzej Siewior
The usage of in_atomic() in driver code is deprecated as it cannot
always detect all states in which it is not allowed to sleep.

All callers are in preemptible thread context and all functions invoke core
functions which already have checks for invalid calling contexts.

Signed-off-by: Sebastian Andrzej Siewior 
Cc: Daniel Drake 
Cc: Ulrich Kunitz 
Cc: Kalle Valo 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: linux-wirel...@vger.kernel.org
Cc: net...@vger.kernel.org
---
 drivers/net/wireless/zydas/zd1211rw/zd_usb.c | 15 ---
 1 file changed, 15 deletions(-)

diff --git a/drivers/net/wireless/zydas/zd1211rw/zd_usb.c b/drivers/net/wireless/zydas/zd1211rw/zd_usb.c
index 66367ab7e4c1e..5c4cd0e1adebb 100644
--- a/drivers/net/wireless/zydas/zd1211rw/zd_usb.c
+++ b/drivers/net/wireless/zydas/zd1211rw/zd_usb.c
@@ -1711,11 +1711,6 @@ int zd_usb_ioread16v(struct zd_usb *usb, u16 *values,
 count, USB_MAX_IOREAD16_COUNT);
return -EINVAL;
}
-   if (in_atomic()) {
-   dev_dbg_f(zd_usb_dev(usb),
-"error: io in atomic context not supported\n");
-   return -EWOULDBLOCK;
-   }
if (!usb_int_enabled(usb)) {
dev_dbg_f(zd_usb_dev(usb),
  "error: usb interrupt not enabled\n");
@@ -1882,11 +1877,6 @@ int zd_usb_iowrite16v_async(struct zd_usb *usb, const struct zd_ioreq16 *ioreqs,
count, USB_MAX_IOWRITE16_COUNT);
return -EINVAL;
}
-   if (in_atomic()) {
-   dev_dbg_f(zd_usb_dev(usb),
-   "error: io in atomic context not supported\n");
-   return -EWOULDBLOCK;
-   }
 
udev = zd_usb_to_usbdev(usb);
 
@@ -1966,11 +1956,6 @@ int zd_usb_rfwrite(struct zd_usb *usb, u32 value, u8 bits)
int i, req_len, actual_req_len;
u16 bit_value_template;
 
-   if (in_atomic()) {
-   dev_dbg_f(zd_usb_dev(usb),
-   "error: io in atomic context not supported\n");
-   return -EWOULDBLOCK;
-   }
if (bits < USB_MIN_RFWRITE_BIT_COUNT) {
dev_dbg_f(zd_usb_dev(usb),
"error: bits %d are smaller than"
-- 
2.28.0



[PATCH net-next 07/15] net: airo: Always use JOB_STATS and JOB_EVENT

2020-10-27 Thread Sebastian Andrzej Siewior
issuecommand() is using in_atomic() to decide if it is safe to invoke
schedule() while waiting for the command to be accepted.

Usage of in_atomic() for this is only half correct as it cannot detect all
conditions in which it is not allowed to schedule(). Also, Linus clearly
requested that code which changes behaviour depending on context should
either be separated or the context be conveyed in an argument passed by the
caller, which usually knows the context.

Chasing the call chains leading up to issuecommand() is straightforward,
but airo_link() and airo_get_stats() would require passing the context
through quite a large number of functions.

As this is ancient hardware, avoid the churn and enforce the invocation of
those functions through the JOB machinery.

Signed-off-by: Sebastian Andrzej Siewior 
Cc: Kalle Valo 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: linux-wirel...@vger.kernel.org
Cc: net...@vger.kernel.org
---
 drivers/net/wireless/cisco/airo.c | 16 +---
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/drivers/net/wireless/cisco/airo.c b/drivers/net/wireless/cisco/airo.c
index ca423f3b6b3ea..369a6ca44d1ff 100644
--- a/drivers/net/wireless/cisco/airo.c
+++ b/drivers/net/wireless/cisco/airo.c
@@ -2286,12 +2286,8 @@ static struct net_device_stats *airo_get_stats(struct net_device *dev)
struct airo_info *local =  dev->ml_priv;
 
if (!test_bit(JOB_STATS, &local->jobs)) {
-   /* Get stats out of the card if available */
-   if (down_trylock(&local->sem) != 0) {
-   set_bit(JOB_STATS, &local->jobs);
-   wake_up_interruptible(&local->thr_wait);
-   } else
-   airo_read_stats(dev);
+   set_bit(JOB_STATS, &local->jobs);
+   wake_up_interruptible(&local->thr_wait);
}
 
return &dev->stats;
@@ -3277,11 +3273,9 @@ static void airo_handle_link(struct airo_info *ai)
set_bit(FLAG_UPDATE_UNI, &ai->flags);
set_bit(FLAG_UPDATE_MULTI, &ai->flags);
 
-   if (down_trylock(&ai->sem) != 0) {
-   set_bit(JOB_EVENT, &ai->jobs);
-   wake_up_interruptible(&ai->thr_wait);
-   } else
-   airo_send_event(ai->dev);
+   set_bit(JOB_EVENT, &ai->jobs);
+   wake_up_interruptible(&ai->thr_wait);
+
netif_carrier_on(ai->dev);
} else if (!scan_forceloss) {
if (auto_wep && !ai->expires) {
-- 
2.28.0



[PATCH net-next 14/15] net: dpaa: Replace in_irq() usage.

2020-10-27 Thread Sebastian Andrzej Siewior
The driver uses in_irq() + in_serving_softirq() magic to decide whether
NAPI needs to be scheduled or the packet can be processed directly.

The usage of in_*() in drivers is phased out and Linus clearly requested
that code which changes behaviour depending on context should either be
separated or the context be conveyed in an argument passed by the caller,
which usually knows the context.

Use the `napi' argument passed to the callback. It is set to true when
called from the interrupt handler and NAPI should be scheduled.

Signed-off-by: Sebastian Andrzej Siewior 
Cc: "Horia Geantă" 
Cc: Aymen Sghaier 
Cc: Herbert Xu 
Cc: "David S. Miller" 
Cc: Madalin Bucur 
Cc: Jakub Kicinski 
Cc: Li Yang 
Cc: linux-crypto@vger.kernel.org
Cc: net...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: linux-arm-ker...@lists.infradead.org
---
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index 27835310b718e..2c949acd74c67 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -2300,9 +2300,9 @@ static void dpaa_tx_conf(struct net_device *net_dev,
 }
 
 static inline int dpaa_eth_napi_schedule(struct dpaa_percpu_priv *percpu_priv,
-struct qman_portal *portal)
+struct qman_portal *portal, bool napi)
 {
-   if (unlikely(in_irq() || !in_serving_softirq())) {
+   if (napi) {
/* Disable QMan IRQ and invoke NAPI */
qman_p_irqsource_remove(portal, QM_PIRQ_DQRI);
 
@@ -2333,7 +2333,7 @@ static enum qman_cb_dqrr_result rx_error_dqrr(struct qman_portal *portal,
 
percpu_priv = this_cpu_ptr(priv->percpu_priv);
 
-   if (dpaa_eth_napi_schedule(percpu_priv, portal))
+   if (dpaa_eth_napi_schedule(percpu_priv, portal, napi))
return qman_cb_dqrr_stop;
 
dpaa_eth_refill_bpools(priv);
@@ -2377,7 +2377,7 @@ static enum qman_cb_dqrr_result rx_default_dqrr(struct qman_portal *portal,
percpu_priv = this_cpu_ptr(priv->percpu_priv);
percpu_stats = &percpu_priv->stats;
 
-   if (unlikely(dpaa_eth_napi_schedule(percpu_priv, portal)))
+   if (unlikely(dpaa_eth_napi_schedule(percpu_priv, portal, napi)))
return qman_cb_dqrr_stop;
 
/* Make sure we didn't run out of buffers */
@@ -2474,7 +2474,7 @@ static enum qman_cb_dqrr_result conf_error_dqrr(struct qman_portal *portal,
 
percpu_priv = this_cpu_ptr(priv->percpu_priv);
 
-   if (dpaa_eth_napi_schedule(percpu_priv, portal))
+   if (dpaa_eth_napi_schedule(percpu_priv, portal, napi))
return qman_cb_dqrr_stop;
 
dpaa_tx_error(net_dev, priv, percpu_priv, &dq->fd, fq->fqid);
@@ -2499,7 +2499,7 @@ static enum qman_cb_dqrr_result conf_dflt_dqrr(struct qman_portal *portal,
 
percpu_priv = this_cpu_ptr(priv->percpu_priv);
 
-   if (dpaa_eth_napi_schedule(percpu_priv, portal))
+   if (dpaa_eth_napi_schedule(percpu_priv, portal, napi))
return qman_cb_dqrr_stop;
 
dpaa_tx_conf(net_dev, priv, percpu_priv, &dq->fd, fq->fqid);
-- 
2.28.0



[PATCH net-next 13/15] soc/fsl/qbman: Add an argument to signal if NAPI processing is required.

2020-10-27 Thread Sebastian Andrzej Siewior
dpaa_eth_napi_schedule() and caam_qi_napi_schedule() schedule NAPI if
invoked from:

 - Hard interrupt context
 - Any context which is not serving soft interrupts

Any context which is not serving soft interrupts includes hard interrupts
so the in_irq() check is redundant. caam_qi_napi_schedule() has a comment
about this:

/*
 * In case of threaded ISR, for RT kernels in_irq() does not return
 * appropriate value, so use in_serving_softirq to distinguish between
 * softirq and irq contexts.
 */
 if (in_irq() || !in_serving_softirq())

This has nothing to do with RT. Even on a non-RT kernel, force-threaded
interrupts obviously run in thread context, and therefore in_irq() returns
false when invoked from the handler.

The extension of the in_irq() check with !in_serving_softirq() was there
when the drivers were added, but in the out-of-tree FSL BSP the original
condition was in_irq(), which got extended due to failures on RT.

The usage of in_xxx() in drivers is phased out and Linus clearly requested
that code which changes behaviour depending on context should either be
separated or the context be conveyed in an argument passed by the caller,
which usually knows the context. Right he is: the above construct clearly
shows why.

The following callchains have been analyzed to end up in
dpaa_eth_napi_schedule():

qman_p_poll_dqrr()
  __poll_portal_fast()
    fq->cb.dqrr()
      dpaa_eth_napi_schedule()

portal_isr()
  __poll_portal_fast()
    fq->cb.dqrr()
      dpaa_eth_napi_schedule()

Both need to schedule NAPI.
The crypto part has another code path leading up to this:
kill_fq()
  empty_retired_fq()
    qman_p_poll_dqrr()
      __poll_portal_fast()
        fq->cb.dqrr()
          dpaa_eth_napi_schedule()

kill_fq() is called from task context and ends up scheduling NAPI, but
that's pointless and an unintended side effect of the !in_serving_softirq()
check.

The code path:
  caam_qi_poll() -> qman_p_poll_dqrr()

is invoked from NAPI and I *assume* from crypto's NAPI device and not
from qbman's NAPI device. I *guess* it is okay to skip scheduling NAPI
(because this is what happens now) but could be changed if it is wrong
due to `budget' handling.

Add an argument to __poll_portal_fast() which is true if NAPI needs to be
scheduled. This requires propagating the value to the caller including
`qman_cb_dqrr' typedef which is used by the dpaa and the crypto driver.

Signed-off-by: Sebastian Andrzej Siewior 
Cc: "Horia Geantă" 
Cc: Aymen Sghaier 
Cc: Herbert Xu 
Cc: "David S. Miller" 
Cc: Madalin Bucur 
Cc: Jakub Kicinski 
Cc: Li Yang 
Cc: linux-crypto@vger.kernel.org
Cc: net...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: linux-arm-ker...@lists.infradead.org
---
 drivers/crypto/caam/qi.c   |  3 ++-
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 12 
 drivers/soc/fsl/qbman/qman.c   | 12 ++--
 drivers/soc/fsl/qbman/qman_test_api.c  |  6 --
 drivers/soc/fsl/qbman/qman_test_stash.c|  6 --
 include/soc/fsl/qman.h |  3 ++-
 6 files changed, 26 insertions(+), 16 deletions(-)

diff --git a/drivers/crypto/caam/qi.c b/drivers/crypto/caam/qi.c
index ec53528d82058..09ea398304c8b 100644
--- a/drivers/crypto/caam/qi.c
+++ b/drivers/crypto/caam/qi.c
@@ -564,7 +564,8 @@ static int caam_qi_napi_schedule(struct qman_portal *p, struct caam_napi *np)
 
 static enum qman_cb_dqrr_result caam_rsp_fq_dqrr_cb(struct qman_portal *p,
 						    struct qman_fq *rsp_fq,
-						    const struct qm_dqrr_entry *dqrr)
+						    const struct qm_dqrr_entry *dqrr,
+						    bool napi)
 {
struct caam_napi *caam_napi = raw_cpu_ptr(&pcpu_qipriv.caam_napi);
struct caam_drv_req *drv_req;
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index 06cc863f4dd63..27835310b718e 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -2316,7 +2316,8 @@ static inline int dpaa_eth_napi_schedule(struct dpaa_percpu_priv *percpu_priv,
 
 static enum qman_cb_dqrr_result rx_error_dqrr(struct qman_portal *portal,
  struct qman_fq *fq,
- const struct qm_dqrr_entry *dq)
+ const struct qm_dqrr_entry *dq,
+ bool napi)
 {
struct dpaa_fq *dpaa_fq = container_of(fq, struct dpaa_fq, fq_base);
struct dpaa_percpu_priv *percpu_priv;
@@ -2343,7 +2344,8 @@ static enum qman_cb_dqrr_result rx_error_dqrr(struct qman_portal *portal,
 
 static enum q

[PATCH net-next 04/15] net: mlx5: Replace in_irq() usage.

2020-10-27 Thread Sebastian Andrzej Siewior
mlx5_eq_async_int() uses in_irq() to decide whether eq::lock needs to be
acquired and released with spin_[un]lock() or the irq saving/restoring
variants.

The usage of in_*() in drivers is phased out and Linus clearly requested
that code which changes behaviour depending on context should either be
separated or the context be conveyed in an argument passed by the caller,
which usually knows the context.

mlx5_eq_async_int() knows the context via the action argument already so
using it for the lock variant decision is a straight forward replacement
for in_irq().

Signed-off-by: Sebastian Andrzej Siewior 
Cc: Saeed Mahameed 
Cc: Leon Romanovsky 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: linux-r...@vger.kernel.org
---
 drivers/net/ethernet/mellanox/mlx5/core/eq.c | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index 8ebfe782f95e5..3800e9415158b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -189,19 +189,21 @@ u32 mlx5_eq_poll_irq_disabled(struct mlx5_eq_comp *eq)
return count_eqe;
 }
 
-static void mlx5_eq_async_int_lock(struct mlx5_eq_async *eq, unsigned long *flags)
+static void mlx5_eq_async_int_lock(struct mlx5_eq_async *eq, bool recovery,
+  unsigned long *flags)
__acquires(&eq->lock)
 {
-   if (in_irq())
+   if (!recovery)
spin_lock(&eq->lock);
else
spin_lock_irqsave(&eq->lock, *flags);
 }
 
-static void mlx5_eq_async_int_unlock(struct mlx5_eq_async *eq, unsigned long *flags)
+static void mlx5_eq_async_int_unlock(struct mlx5_eq_async *eq, bool recovery,
+unsigned long *flags)
__releases(&eq->lock)
 {
-   if (in_irq())
+   if (!recovery)
spin_unlock(&eq->lock);
else
spin_unlock_irqrestore(&eq->lock, *flags);
@@ -222,12 +224,14 @@ static int mlx5_eq_async_int(struct notifier_block *nb,
struct mlx5_core_dev *dev;
struct mlx5_eqe *eqe;
unsigned long flags;
+   bool recovery;
int num_eqes = 0;
 
dev = eq->dev;
eqt = dev->priv.eq_table;
 
-   mlx5_eq_async_int_lock(eq_async, &flags);
+   recovery = action == ASYNC_EQ_RECOVER;
+   mlx5_eq_async_int_lock(eq_async, recovery, &flags);
 
eqe = next_eqe_sw(eq);
if (!eqe)
@@ -249,9 +253,9 @@ static int mlx5_eq_async_int(struct notifier_block *nb,
 
 out:
eq_update_ci(eq, 1);
-   mlx5_eq_async_int_unlock(eq_async, &flags);
+   mlx5_eq_async_int_unlock(eq_async, recovery, &flags);
 
-   return unlikely(action == ASYNC_EQ_RECOVER) ? num_eqes : 0;
+   return unlikely(recovery) ? num_eqes : 0;
 }
 
 void mlx5_cmd_eq_recover(struct mlx5_core_dev *dev)
-- 
2.28.0



[PATCH net-next 11/15] net: rtlwifi: Remove in_interrupt() usage in is_any_client_connect_to_ap().

2020-10-27 Thread Sebastian Andrzej Siewior
is_any_client_connect_to_ap() is using in_interrupt() to determine whether
it should acquire the lock prior to accessing the list.

The usage of in_interrupt() in drivers is phased out and Linus clearly
requested that code which changes behaviour depending on context should
either be separated or the context be conveyed in an argument passed by the
caller, which usually knows the context.

The function is called from:

- halbtc_get()

- halbtc_get()
halbtc_get_wifi_link_status()

- halbtc_display_dbg_msg()
halbtc_display_wifi_status()
  halbtc_get_wifi_link_status()

All top-level callers are part of the btc_coexist callback interface and
are never invoked from a context which can already hold the lock.

The contexts which hold the lock are either protecting list add/del
operations or list walks which never call into any of the btc_coexist
interfaces.

In fact, the conditional is outright dangerous: if this function were
invoked from a BH-disabled context, the check would avoid taking the lock
while, on another CPU, the list could be manipulated under the lock.

Remove the in_interrupt() check and always acquire the lock.

To simplify the code further use list_empty() instead of walking the list
and counting the entries just to check the count for > 0 at the end.

Signed-off-by: Sebastian Andrzej Siewior 
Cc: Ping-Ke Shih 
Cc: Kalle Valo 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: linux-wirel...@vger.kernel.org
Cc: net...@vger.kernel.org
---
 .../realtek/rtlwifi/btcoexist/halbtcoutsrc.c  | 25 +--
 1 file changed, 6 insertions(+), 19 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtcoutsrc.c b/drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtcoutsrc.c
index 2c05369b79e4d..2155a6699ef8d 100644
--- a/drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtcoutsrc.c
+++ b/drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtcoutsrc.c
@@ -47,30 +47,17 @@ static bool is_any_client_connect_to_ap(struct btc_coexist *btcoexist)
 {
struct rtl_priv *rtlpriv = btcoexist->adapter;
struct rtl_mac *mac = rtl_mac(rtlpriv);
-   struct rtl_sta_info *drv_priv;
-   u8 cnt = 0;
+   bool ret = false;
 
if (mac->opmode == NL80211_IFTYPE_ADHOC ||
mac->opmode == NL80211_IFTYPE_MESH_POINT ||
mac->opmode == NL80211_IFTYPE_AP) {
-   if (in_interrupt() > 0) {
-   list_for_each_entry(drv_priv, &rtlpriv->entry_list,
-   list) {
-   cnt++;
-   }
-   } else {
-   spin_lock_bh(&rtlpriv->locks.entry_list_lock);
-   list_for_each_entry(drv_priv, &rtlpriv->entry_list,
-   list) {
-   cnt++;
-   }
-   spin_unlock_bh(&rtlpriv->locks.entry_list_lock);
-   }
+   spin_lock_bh(&rtlpriv->locks.entry_list_lock);
+   if (!list_empty(&rtlpriv->entry_list))
+   ret = true;
+   spin_unlock_bh(&rtlpriv->locks.entry_list_lock);
}
-   if (cnt > 0)
-   return true;
-   else
-   return false;
+   return ret;
 }
 
 static bool halbtc_legacy(struct rtl_priv *adapter)
-- 
2.28.0



[PATCH net-next 08/15] net: airo: Replace in_atomic() usage.

2020-10-27 Thread Sebastian Andrzej Siewior
issuecommand() is using in_atomic() to decide if it is safe to invoke
schedule() while waiting for the command to be accepted.

Usage of in_atomic() for this is only half correct as it cannot detect all
conditions in which it is not allowed to schedule(). Also, Linus clearly
requested that code which changes behaviour depending on context should
either be separated or the context be conveyed in an argument passed by the
caller, which usually knows the context.

Add a may_sleep argument to issuecommand() indicating when it is safe to
sleep, and change schedule() to cond_resched() because it is pointless to
invoke schedule() if there is no request to reschedule.

Pass the may_sleep condition through the various call chains leading to
issuecommand().
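
The resulting wait loop then has roughly this shape (illustrative only,
not the exact airo code; cmd_busy() is a hypothetical stand-in for the
real COMMAND register poll):

/* Poll until the card accepts the command; only yield when the caller
 * told us sleeping is allowed. */
static bool wait_cmd_accepted(struct airo_info *ai, bool may_sleep)
{
	int max_tries = 600000;

	while (max_tries-- && cmd_busy(ai)) {
		if (may_sleep)
			cond_resched();	/* yields only if a reschedule is due */
	}
	return max_tries >= 0;
}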

Signed-off-by: Sebastian Andrzej Siewior 
Cc: Kalle Valo 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: linux-wirel...@vger.kernel.org
Cc: net...@vger.kernel.org
---
 drivers/net/wireless/cisco/airo.c | 86 +--
 1 file changed, 47 insertions(+), 39 deletions(-)

diff --git a/drivers/net/wireless/cisco/airo.c b/drivers/net/wireless/cisco/airo.c
index 369a6ca44d1ff..74acf9af2adb1 100644
--- a/drivers/net/wireless/cisco/airo.c
+++ b/drivers/net/wireless/cisco/airo.c
@@ -1115,7 +1115,8 @@ static int enable_MAC(struct airo_info *ai, int lock);
 static void disable_MAC(struct airo_info *ai, int lock);
 static void enable_interrupts(struct airo_info*);
 static void disable_interrupts(struct airo_info*);
-static u16 issuecommand(struct airo_info*, Cmd *pCmd, Resp *pRsp);
+static u16 issuecommand(struct airo_info*, Cmd *pCmd, Resp *pRsp,
+   bool may_sleep);
 static int bap_setup(struct airo_info*, u16 rid, u16 offset, int whichbap);
 static int aux_bap_read(struct airo_info*, __le16 *pu16Dst, int bytelen,
int whichbap);
@@ -1130,8 +1131,10 @@ static int PC4500_writerid(struct airo_info*, u16 rid, const void
 static int do_writerid(struct airo_info*, u16 rid, const void *rid_data,
int len, int dummy);
 static u16 transmit_allocate(struct airo_info*, int lenPayload, int raw);
-static int transmit_802_3_packet(struct airo_info*, int len, char *pPacket);
-static int transmit_802_11_packet(struct airo_info*, int len, char *pPacket);
+static int transmit_802_3_packet(struct airo_info*, int len, char *pPacket,
+bool may_sleep);
+static int transmit_802_11_packet(struct airo_info*, int len, char *pPacket,
+ bool may_sleep);
 
 static int mpi_send_packet(struct net_device *dev);
 static void mpi_unmap_card(struct pci_dev *pci);
@@ -1753,7 +1756,7 @@ static int readBSSListRid(struct airo_info *ai, int first,
if (down_interruptible(&ai->sem))
return -ERESTARTSYS;
ai->list_bss_task = current;
-   issuecommand(ai, &cmd, &rsp);
+   issuecommand(ai, &cmd, &rsp, true);
up(&ai->sem);
/* Let the command take effect */
schedule_timeout_uninterruptible(3 * HZ);
@@ -2096,7 +2099,7 @@ static void get_tx_error(struct airo_info *ai, s32 fid)
}
 }
 
-static void airo_end_xmit(struct net_device *dev)
+static void airo_end_xmit(struct net_device *dev, bool may_sleep)
 {
u16 status;
int i;
@@ -2107,7 +2110,7 @@ static void airo_end_xmit(struct net_device *dev)
 
clear_bit(JOB_XMIT, &priv->jobs);
clear_bit(FLAG_PENDING_XMIT, &priv->flags);
-   status = transmit_802_3_packet (priv, fids[fid], skb->data);
+   status = transmit_802_3_packet(priv, fids[fid], skb->data, may_sleep);
up(&priv->sem);
 
i = 0;
@@ -2164,11 +2167,11 @@ static netdev_tx_t airo_start_xmit(struct sk_buff *skb,
set_bit(JOB_XMIT, &priv->jobs);
wake_up_interruptible(&priv->thr_wait);
} else
-   airo_end_xmit(dev);
+   airo_end_xmit(dev, false);
return NETDEV_TX_OK;
 }
 
-static void airo_end_xmit11(struct net_device *dev)
+static void airo_end_xmit11(struct net_device *dev, bool may_sleep)
 {
u16 status;
int i;
@@ -2179,7 +2182,7 @@ static void airo_end_xmit11(struct net_device *dev)
 
clear_bit(JOB_XMIT11, &priv->jobs);
clear_bit(FLAG_PENDING_XMIT11, &priv->flags);
-   status = transmit_802_11_packet (priv, fids[fid], skb->data);
+   status = transmit_802_11_packet(priv, fids[fid], skb->data, may_sleep);
up(&priv->sem);
 
i = MAX_FIDS / 2;
@@ -2243,7 +2246,7 @@ static netdev_tx_t airo_start_xmit11(struct sk_buff *skb,
set_bit(JOB_XMIT11, &priv->jobs);
wake_up_interruptible(&priv->thr_wait);
} else
-   airo_end_xmit11(dev);
+   airo_end_xmit11(dev, false);
retu

[PATCH net-next 05/15] net: tlan: Replace in_irq() usage

2020-10-27 Thread Sebastian Andrzej Siewior
The driver uses in_irq() to determine if the tlan_priv::lock has to be
acquired in tlan_mii_read_reg() and tlan_mii_write_reg().

The interrupt handler acquires the lock outside of these functions so the
in_irq() check is meant to prevent a lock recursion deadlock. But this
check is incorrect when interrupt force threading is enabled because then
the handler runs in thread context and in_irq() correctly returns false.

The usage of in_*() in drivers is phased out and Linus clearly requested
that code which changes behaviour depending on context should either be
separated or the context be conveyed in an argument passed by the caller,
which usually knows the context.

tlan_set_timer() has this conditional as well, but this function is only
invoked from task context or the timer callback itself. So it always has to
lock and the check can be removed.

tlan_mii_read_reg(), tlan_mii_write_reg() and tlan_phy_print() are invoked
from interrupt and other contexts.

Split out the actual function body into helper variants which are called
from interrupt context and make the original functions wrappers which
acquire tlan_priv::lock unconditionally.
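
The locking wrapper then has this shape (a sketch matching the new
declarations below; __tlan_mii_read_reg() carries the body of the old
function minus the conditional locking, and interrupt context calls the
__ variant directly since it already holds the lock):

static void tlan_mii_read_reg(struct net_device *dev, u16 phy, u16 reg,
			      u16 *val)
{
	struct tlan_priv *priv = netdev_priv(dev);
	unsigned long flags;

	/* Unconditionally serialize against the interrupt handler. */
	spin_lock_irqsave(&priv->lock, flags);
	__tlan_mii_read_reg(dev, phy, reg, val);
	spin_unlock_irqrestore(&priv->lock, flags);
}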

Signed-off-by: Sebastian Andrzej Siewior 
Cc: Samuel Chessman 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: net...@vger.kernel.org
---
 drivers/net/ethernet/ti/tlan.c | 98 --
 1 file changed, 57 insertions(+), 41 deletions(-)

diff --git a/drivers/net/ethernet/ti/tlan.c b/drivers/net/ethernet/ti/tlan.c
index 267c080ee084b..0b2ce4bdc2c3d 100644
--- a/drivers/net/ethernet/ti/tlan.c
+++ b/drivers/net/ethernet/ti/tlan.c
@@ -186,6 +186,7 @@ static void tlan_reset_adapter(struct net_device *);
 static void	tlan_finish_reset(struct net_device *);
 static void	tlan_set_mac(struct net_device *, int areg, char *mac);
 
+static void	__tlan_phy_print(struct net_device *);
 static void	tlan_phy_print(struct net_device *);
 static void	tlan_phy_detect(struct net_device *);
 static void	tlan_phy_power_down(struct net_device *);
@@ -201,9 +202,11 @@ static void	tlan_phy_finish_auto_neg(struct net_device *);
   static int   tlan_phy_dp83840a_check(struct net_device *);
 */
 
-static bool	tlan_mii_read_reg(struct net_device *, u16, u16, u16 *);
+static bool	__tlan_mii_read_reg(struct net_device *, u16, u16, u16 *);
+static void	tlan_mii_read_reg(struct net_device *, u16, u16, u16 *);
 static void	tlan_mii_send_data(u16, u32, unsigned);
 static void	tlan_mii_sync(u16);
+static void	__tlan_mii_write_reg(struct net_device *, u16, u16, u16);
 static void	tlan_mii_write_reg(struct net_device *, u16, u16, u16);
 
 static void	tlan_ee_send_start(u16);
@@ -242,23 +245,20 @@ static u32
tlan_handle_rx_eoc
 };
 
-static inline void
+static void
 tlan_set_timer(struct net_device *dev, u32 ticks, u32 type)
 {
struct tlan_priv *priv = netdev_priv(dev);
unsigned long flags = 0;
 
-   if (!in_irq())
-   spin_lock_irqsave(&priv->lock, flags);
+   spin_lock_irqsave(&priv->lock, flags);
if (priv->timer.function != NULL &&
priv->timer_type != TLAN_TIMER_ACTIVITY) {
-   if (!in_irq())
-   spin_unlock_irqrestore(&priv->lock, flags);
+   spin_unlock_irqrestore(&priv->lock, flags);
return;
}
priv->timer.function = tlan_timer;
-   if (!in_irq())
-   spin_unlock_irqrestore(&priv->lock, flags);
+   spin_unlock_irqrestore(&priv->lock, flags);
 
priv->timer_set_at = jiffies;
priv->timer_type = type;
@@ -1703,22 +1703,22 @@ static u32 tlan_handle_status_check(struct net_device *dev, u16 host_int)
 dev->name, (unsigned) net_sts);
}
if ((net_sts & TLAN_NET_STS_MIRQ) &&  (priv->phy_num == 0)) {
-   tlan_mii_read_reg(dev, phy, TLAN_TLPHY_STS, &tlphy_sts);
-   tlan_mii_read_reg(dev, phy, TLAN_TLPHY_CTL, &tlphy_ctl);
+		__tlan_mii_read_reg(dev, phy, TLAN_TLPHY_STS, &tlphy_sts);
+		__tlan_mii_read_reg(dev, phy, TLAN_TLPHY_CTL, &tlphy_ctl);
if (!(tlphy_sts & TLAN_TS_POLOK) &&
!(tlphy_ctl & TLAN_TC_SWAPOL)) {
tlphy_ctl |= TLAN_TC_SWAPOL;
-   tlan_mii_write_reg(dev, phy, TLAN_TLPHY_CTL,
-  tlphy_ctl);
+   __tlan_mii_write_reg(dev, phy, TLAN_TLPHY_CTL,
+tlphy_ctl);
} else if ((tlphy_sts & TLAN_TS_POLOK) &&
   (tlphy_ctl & TLAN_TC_SWAPOL)) {
tlphy_ctl &= ~TLAN_TC_

[PATCH net-next 01/15] net: orinoco: Remove BUG_ON(in_interrupt/irq())

2020-10-27 Thread Sebastian Andrzej Siewior
The usage of in_irq()/in_interrupt() in drivers is phased out and the
BUG_ON()'s based on those do not cover all contexts in which these
functions cannot be called.

Aside of that, BUG_ON() should only be used as a last resort, which is
clearly not the case here.

A broad variety of checks in the invoked functions (always enabled or debug
option dependent) cover these conditions already, so the BUG_ON()'s do not
really provide additional value.

Just remove them.

Signed-off-by: Sebastian Andrzej Siewior 
Cc: Kalle Valo 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: linux-wirel...@vger.kernel.org
Cc: net...@vger.kernel.org
---
 drivers/net/wireless/intersil/orinoco/orinoco_usb.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/drivers/net/wireless/intersil/orinoco/orinoco_usb.c b/drivers/net/wireless/intersil/orinoco/orinoco_usb.c
index b849d27bd741e..046f2453ad5d9 100644
--- a/drivers/net/wireless/intersil/orinoco/orinoco_usb.c
+++ b/drivers/net/wireless/intersil/orinoco/orinoco_usb.c
@@ -859,8 +859,6 @@ static int ezusb_access_ltv(struct ezusb_priv *upriv,
int retval = 0;
enum ezusb_state state;
 
-   BUG_ON(in_irq());
-
if (!upriv->udev) {
retval = -ENODEV;
goto exit;
@@ -1349,7 +1347,6 @@ static int ezusb_init(struct hermes *hw)
struct ezusb_priv *upriv = hw->priv;
int retval;
 
-   BUG_ON(in_interrupt());
if (!upriv)
return -EINVAL;
 
@@ -1448,7 +1445,6 @@ static inline void ezusb_delete(struct ezusb_priv *upriv)
struct list_head *tmp_item;
unsigned long flags;
 
-   BUG_ON(in_interrupt());
BUG_ON(!upriv);
 
mutex_lock(&upriv->mtx);
-- 
2.28.0



[PATCH net-next 00/15] in_interrupt() cleanup, part 2

2020-10-27 Thread Sebastian Andrzej Siewior
Folks,

in the discussion about preempt count consistency across kernel configurations:

  https://lore.kernel.org/r/20200914204209.256266...@linutronix.de/

Linus clearly requested that code in drivers and libraries which changes
behaviour based on execution context should either be split up so that
e.g. task context invocations and BH invocations have different interfaces,
or, if that's not possible, the context information has to be provided by
the caller, which knows in which context it is executing.

This includes conditional locking, allocation mode (GFP_*) decisions and
avoidance of code paths which might sleep.

In the long run, usage of 'preemptible, in_*irq etc.' should be banned from
driver code completely.
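
A typical example of the construct that has to go, and its replacement
(an illustrative snippet, not taken from a particular driver):

	/* Anti-pattern: the callee guesses the context. */
	buf = kmalloc(len, in_interrupt() ? GFP_ATOMIC : GFP_KERNEL);

	/* Preferred: the caller, which knows its context, passes it in. */
	buf = kmalloc(len, gfp);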

This is part two addressing remaining drivers except for orinoco-usb.

Sebastian


[PATCH net-next 02/15] net: neterion: s2io: Replace in_interrupt() for context detection

2020-10-27 Thread Sebastian Andrzej Siewior
wait_for_cmd_complete() uses in_interrupt() to detect whether it is safe to
sleep or not.

The usage of in_interrupt() in drivers is phased out and Linus clearly
requested that code which changes behaviour depending on context should
either be seperated or the context be conveyed in an argument passed by the
caller, which usually knows the context.

in_interrupt() also is only partially correct because it fails to choose the
correct code path when just preemption or interrupts are disabled.

Add a 'may_sleep' argument to the affected functions and adjust the callers
to pass the context information.

The following call chains which end up invoking wait_for_cmd_complete()
were analyzed to be safe to sleep:

 s2io_card_up()
   s2io_set_multicast()

 init_nic()
   init_tti()

 s2io_close()
   do_s2io_delete_unicast_mc()
 do_s2io_add_mac()

 s2io_set_mac_addr()
   do_s2io_prog_unicast()
 do_s2io_add_mac()

 s2io_reset()
   do_s2io_restore_unicast_mc()
 do_s2io_add_mc()
   do_s2io_add_mac()

 s2io_open()
   do_s2io_prog_unicast()
 do_s2io_add_mac()

The following call chains which end up invoking wait_for_cmd_complete()
were analyzed to be unsafe to sleep (atomic context):

 __dev_set_rx_mode()
s2io_set_multicast()

 s2io_txpic_intr_handle()
   s2io_link()
 init_tti()

Add a may_sleep argument to wait_for_cmd_complete(), s2io_set_multicast()
and init_tti() and hand the context information in from the call sites.

Signed-off-by: Sebastian Andrzej Siewior 
Cc: Jon Mason 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: net...@vger.kernel.org
---
 drivers/net/ethernet/neterion/s2io.c | 41 
 drivers/net/ethernet/neterion/s2io.h |  4 +--
 2 files changed, 25 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/neterion/s2io.c b/drivers/net/ethernet/neterion/s2io.c
index d13d92bf74478..8f2f091bce899 100644
--- a/drivers/net/ethernet/neterion/s2io.c
+++ b/drivers/net/ethernet/neterion/s2io.c
@@ -1106,7 +1106,7 @@ static int s2io_print_pci_mode(struct s2io_nic *nic)
  *  '-1' on failure
  */
 
-static int init_tti(struct s2io_nic *nic, int link)
+static int init_tti(struct s2io_nic *nic, int link, bool may_sleep)
 {
struct XENA_dev_config __iomem *bar0 = nic->bar0;
register u64 val64 = 0;
@@ -1166,7 +1166,7 @@ static int init_tti(struct s2io_nic *nic, int link)
 
if (wait_for_cmd_complete(&bar0->tti_command_mem,
  TTI_CMD_MEM_STROBE_NEW_CMD,
- S2IO_BIT_RESET) != SUCCESS)
+ S2IO_BIT_RESET, may_sleep) != SUCCESS)
return FAILURE;
}
 
@@ -1659,7 +1659,7 @@ static int init_nic(struct s2io_nic *nic)
 */
 
/* Initialize TTI */
-   if (SUCCESS != init_tti(nic, nic->last_link_state))
+   if (SUCCESS != init_tti(nic, nic->last_link_state, true))
return -ENODEV;
 
/* RTI Initialization */
@@ -3331,7 +3331,7 @@ static void s2io_updt_xpak_counter(struct net_device *dev)
  */
 
 static int wait_for_cmd_complete(void __iomem *addr, u64 busy_bit,
-int bit_state)
+int bit_state, bool may_sleep)
 {
int ret = FAILURE, cnt = 0, delay = 1;
u64 val64;
@@ -3353,7 +3353,7 @@ static int wait_for_cmd_complete(void __iomem *addr, u64 busy_bit,
}
}
 
-   if (in_interrupt())
+   if (!may_sleep)
mdelay(delay);
else
msleep(delay);
@@ -4877,8 +4877,7 @@ static struct net_device_stats *s2io_get_stats(struct net_device *dev)
  *  Return value:
  *  void.
  */
-
-static void s2io_set_multicast(struct net_device *dev)
+static void s2io_set_multicast(struct net_device *dev, bool may_sleep)
 {
int i, j, prev_cnt;
struct netdev_hw_addr *ha;
@@ -4903,7 +4902,7 @@ static void s2io_set_multicast(struct net_device *dev)
/* Wait till command completes */
wait_for_cmd_complete(&bar0->rmac_addr_cmd_mem,
  RMAC_ADDR_CMD_MEM_STROBE_CMD_EXECUTING,
- S2IO_BIT_RESET);
+ S2IO_BIT_RESET, may_sleep);
 
sp->m_cast_flg = 1;
sp->all_multi_pos = config->max_mc_addr - 1;
@@ -4920,7 +4919,7 @@ static void s2io_set_multicast(struct net_device *dev)
/* Wait till command completes */
wait_for_cmd_complete(&bar0->rmac_addr_cmd_mem,
  RMAC_ADDR_CMD_MEM_STROBE_CMD_EXECUTING,
- S2IO_BIT_RESET);
+ S2IO_BIT_RESET, may_sleep);
 
sp->m_cast_flg = 0;
sp->all_multi_pos = 0;

[PATCH net-next 03/15] net: forcedeth: Replace context and lock check with a lockdep_assert()

2020-10-27 Thread Sebastian Andrzej Siewior
nv_update_stats() triggers a WARN_ON() when invoked from hard interrupt
context because the locks in use are not hard interrupt safe. It also has
an assert_spin_locked() which was the lock check before the lockdep era.

Lockdep has way broader locking correctness checks and covers both issues,
so replace the warning and the lock assert with lockdep_assert_held().

Signed-off-by: Sebastian Andrzej Siewior 
Cc: Rain River 
Cc: Zhu Yanjun 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: net...@vger.kernel.org
---
 drivers/net/ethernet/nvidia/forcedeth.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/nvidia/forcedeth.c b/drivers/net/ethernet/nvidia/forcedeth.c
index 2fc10a36afa4a..7e85cf943be11 100644
--- a/drivers/net/ethernet/nvidia/forcedeth.c
+++ b/drivers/net/ethernet/nvidia/forcedeth.c
@@ -1666,11 +1666,7 @@ static void nv_update_stats(struct net_device *dev)
struct fe_priv *np = netdev_priv(dev);
u8 __iomem *base = get_hwbase(dev);
 
-   /* If it happens that this is run in top-half context, then
-* replace the spin_lock of hwstats_lock with
-* spin_lock_irqsave() in calling functions. */
-   WARN_ONCE(in_irq(), "forcedeth: estats spin_lock(_bh) from top-half");
-   assert_spin_locked(&np->hwstats_lock);
+   lockdep_assert_held(&np->hwstats_lock);
 
/* query hardware */
np->estats.tx_bytes += readl(base + NvRegTxCnt);
-- 
2.28.0



Re: [PATCH v6] mm/zswap: move to use crypto_acomp API for hardware acceleration

2020-09-29 Thread Sebastian Andrzej Siewior
On 2020-09-29 10:02:15 [+], Song Bao Hua (Barry Song) wrote:
> > My point was that there will be a warning at run-time and you don't want
> > that. There are raw_ accessors if you know what you are doing. But…
> 
> I have only seen get_cpu_ptr/var() things will disable preemption. I don't 
> think
> we will have a warning as this_cpu_ptr() won't disable preemption.

Good. Just enable CONFIG_DEBUG_PREEMPT and please tell me what happens.

> > Earlier you had compression/decompression with disabled preemption and
> 
> No. that is right now done in enabled preemption context with this patch. The 
> code before this patch
> was doing (de)compression in preemption-disabled context by using get_cpu_ptr 
> and get_cpu_var.

Exactly what I am saying. And within this get_cpu_ptr() section the
compression/decompression was sitting. So compression/decompression
happened while preemption was off.

> > strict per-CPU memory allocation. Now if you keep this per-CPU memory
> > allocation then you gain a possible bottleneck.
> > In the previous email you said that there may be a bottleneck in the
> > upper layer where you can't utilize all that memory you allocate. So you
> > may want to rethink that strategy before that rework.
> 
> we are probably not talking about same thing :-)
> I was talking about possible generic swap bottleneck. For example, LRU is 
> global,
> while swapping, multiple cores might have some locks on this LRU. for example,
> if we have 8 inactive pages to swap out, I am not sure if mm can use 8 cores
> to swap them out at the same time.

In that case you probably don't need 8* per-CPU memory for this task.

> > 
> > > 2. while allocating mutex, we can put the mutex into local memory by using
> > kmalloc_node().
> > > If we move to "struct mutex lock" directly, most CPUs in a NUMA server 
> > > will
> > have to access
> > > remote memory to read/write the mutex, therefore, this will increase the
> > latency dramatically.
> > 
> > If you need something per-CPU then DEFINE_PER_CPU() will give it to you.
> 
> Yes. It is true.
> 
> > It would be very bad for performance if these allocations were not from
> > CPU-local memory, right? So what makes you think this is worse than
> > kmalloc_node() based allocations?
> 
> Yes. If your read zswap code, it has considered NUMA very carefully by 
> allocating various
> memory locally. And in crypto framework, I also added API to allocate local 
> compression.
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7bc13b5b60e94
> this zswap patch has used the new node-aware API.
> 
> Memory access crossing NUMA node, practically crossing packages, can 
> dramatically increase,
> like double, triple or more.

So you are telling me, DEFINE_PER_CPU() does not allocate the memory for
each CPU to be local but kmalloc_node() does?

> Thanks
> Barry

Sebastian


Re: [PATCH v6] mm/zswap: move to use crypto_acomp API for hardware acceleration

2020-09-29 Thread Sebastian Andrzej Siewior
On 2020-09-29 05:14:31 [+], Song Bao Hua (Barry Song) wrote:
> After second thought and trying to make this change, I would like to change 
> my mind
> and disagree with this idea. Two reasons:
> 1. while using this_cpu_ptr() without preemption lock, people usually put all 
> things bound
> with one cpu to one structure, so that once we get the pointer of the whole 
> structure, we get
> all its parts belonging to the same cpu. If we move the dstmem and mutex out 
> of the structure
> containing them, we will have to do:
>   a. get_cpu_ptr() for the acomp_ctx   //lock preemption
>   b. this_cpu_ptr() for the dstmem and mutex
>   c. put_cpu_ptr() for the acomp_ctx  //unlock preemption
>   d. mutex_lock()
> sg_init_one()
> compress/decompress etc.
> ...
> mutex_unlock
> 
> as the get() and put() have a preemption lock/unlock, this will make certain 
> this_cpu_ptr()
> in the step "b" will return the right dstmem and mutex which belong to the 
> same cpu with
> step "a".
> 
> The steps from "a" to "c" are quite silly and confusing. I believe the 
> existing code aligns
> with the most similar code in kernel better:
>   a. this_cpu_ptr()   //get everything for one cpu
>   b. mutex_lock()
> sg_init_one()
> compress/decompress etc.
> ...
> mutex_unlock

My point was that there will be a warning at run-time and you don't want
that. There are raw_ accessors if you know what you are doing. But…

Earlier you had compression/decompression with disabled preemption and
strict per-CPU memory allocation. Now if you keep this per-CPU memory
allocation then you gain a possible bottleneck.
In the previous email you said that there may be a bottleneck in the
upper layer where you can't utilize all that memory you allocate. So you
may want to rethink that strategy before that rework.

> 2. while allocating mutex, we can put the mutex into local memory by using 
> kmalloc_node().
> If we move to "struct mutex lock" directly, most CPUs in a NUMA server will 
> have to access
> remote memory to read/write the mutex, therefore, this will increase the 
> latency dramatically.

If you need something per-CPU then DEFINE_PER_CPU() will give it to you.
It would be very bad for performance if these allocations were not from
CPU-local memory, right? So what makes you think this is worse than
kmalloc_node() based allocations?

> Thanks
> Barry

Sebastian


Re: [PATCH v6] mm/zswap: move to use crypto_acomp API for hardware acceleration

2020-09-28 Thread Sebastian Andrzej Siewior
On 2020-08-19 00:31:00 [+1200], Barry Song wrote:
> diff --git a/mm/zswap.c b/mm/zswap.c
> index fbb782924ccc..00b5f14a7332 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -127,9 +129,17 @@ module_param_named(same_filled_pages_enabled, zswap_same_filled_pages_enabled,
>  * data structures
>  **/
>  
> +struct crypto_acomp_ctx {
> + struct crypto_acomp *acomp;
> + struct acomp_req *req;
> + struct crypto_wait wait;
> + u8 *dstmem;
> + struct mutex *mutex;
> +};
> +
>  struct zswap_pool {
>   struct zpool *zpool;
> - struct crypto_comp * __percpu *tfm;
> + struct crypto_acomp_ctx __percpu *acomp_ctx;
>   struct kref kref;
>   struct list_head list;
>   struct work_struct release_work;
> @@ -388,23 +398,43 @@ static struct zswap_entry *zswap_entry_find_get(struct rb_root *root,
>  * per-cpu code
>  **/
>  static DEFINE_PER_CPU(u8 *, zswap_dstmem);
> +/*
> + * If users dynamically change the zpool type and compressor at runtime, i.e.
> + * zswap is running, zswap can have more than one zpool on one cpu, but they
> + * are sharing dstmem. So we need this mutex to be per-cpu.
> + */
> +static DEFINE_PER_CPU(struct mutex *, zswap_mutex);

There is no need to make it struct mutex *. You could make it struct
mutex. The following makes the relation here more obvious (even
without a comment):

|struct zswap_mem_pool {
|	void		*dst_mem;
|	struct mutex	lock;
|};
|
|static DEFINE_PER_CPU(struct zswap_mem_pool, zswap_mem_pool);

Please access only this; don't add a pointer in another struct to
dst_mem or the lock, which may confuse people.
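
A minimal usage sketch of that layout (untested; raw_cpu_ptr() is fine
here because the mutex, not disabled preemption, protects the data):

|	struct zswap_mem_pool *mem_pool = raw_cpu_ptr(&zswap_mem_pool);
|
|	mutex_lock(&mem_pool->lock);
|	/* (de)compress into mem_pool->dst_mem while the lock is held */
|	mutex_unlock(&mem_pool->lock);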

This resource is per-CPU.  Do you really utilize them all? On a 2, 8, 16,
64, or 128 core system? More later…

> @@ -916,14 +974,21 @@ static int zswap_writeback_entry(struct zpool *pool, unsigned long handle)
>  
>   case ZSWAP_SWAPCACHE_NEW: /* page is locked */
>   /* decompress */
> + acomp_ctx = this_cpu_ptr(entry->pool->acomp_ctx);

My feeling is that this triggers a warning with CONFIG_DEBUG_PREEMPT. I
don't see how it could be avoided. If you are not preemptible here then
you must not sleep later.

> +
>   dlen = PAGE_SIZE;
>   src = (u8 *)zhdr + sizeof(struct zswap_header);
> - dst = kmap_atomic(page);
> - tfm = *get_cpu_ptr(entry->pool->tfm);
> - ret = crypto_comp_decompress(tfm, src, entry->length,
> -  dst, &dlen);
> - put_cpu_ptr(entry->pool->tfm);
> - kunmap_atomic(dst);
> + dst = kmap(page);
> +
> + mutex_lock(acomp_ctx->mutex);
> + sg_init_one(&input, src, entry->length);
> + sg_init_one(&output, dst, dlen);
> + acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, dlen);
> + ret = crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait);
> + dlen = acomp_ctx->req->dlen;
> + mutex_unlock(acomp_ctx->mutex);

Say a request comes in on CPU1. After the mutex_lock() you get migrated to
CPU2. You do crypto_wait_req() on CPU2. In that time another request
gets in on CPU1. It blocks on the very same mutex. No data corruption, but
it could have used another buffer instead of blocking and waiting for
the previous one to finish its work.
So it might make sense to have a pool of pages which is shared globally in
the system instead of strict per-CPU allocations, so that requests remain
fully migrate-able while they are in use.

While at it: For dst you could use sg_set_page(). This would avoid the
kmap(). 
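
A rough sketch of that suggestion, reusing the names from the hunk above
(untested):

|	sg_init_table(&output, 1);
|	/* hand the page itself to the scatterlist, no kmap()/kunmap() pair */
|	sg_set_page(&output, page, PAGE_SIZE, 0);
|	acomp_request_set_params(acomp_ctx->req, &input, &output,
|				 entry->length, dlen);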

> + kunmap(page);
>   BUG_ON(ret);
>   BUG_ON(dlen != PAGE_SIZE);
>  

Sebastian


Re: [PATCH v4] mm/zswap: move to use crypto_acomp API for hardware acceleration

2020-07-09 Thread Sebastian Andrzej Siewior
On 2020-07-09 07:55:22 [+], Song Bao Hua (Barry Song) wrote:
> Hello Sebastian, thanks for your reply and careful review.
Hi,

> I don't think we can simply "forward the result to the caller and let him 
> decide".
> Would you like to present some pseudo code?

I provided just some pseudo code to illustrate how the async
interface could look (more or less). The essential part is where
you allow feeding multiple requests without blocking.
I went up the call-chain and found one potential user which seems to have
a list of pages which are processed. This looked like a nice example. I
haven't looked at the details.

I have no opinion whether or not it makes sense to switch to the async
interface in a sync way.

> Thanks
> Barry

Sebastian


Re: [PATCH v4] mm/zswap: move to use crypto_acomp API for hardware acceleration

2020-07-09 Thread Sebastian Andrzej Siewior
On 2020-07-09 01:32:38 [+], Song Bao Hua (Barry Song) wrote:
> > This looks like wrapping the same synchronous mechanism around an
> > asynchronous interface. It works as a PoC.
> > 
> > As far as I remember the crypto async interface, the incoming skbs were fed to
> > the async interface and returned to the caller so the NIC could continue
> > allocating new RX skbs and move on. Only if the queue of requests was getting
> > too long did the code start to throttle. Eventually the async crypto code
> > completed the decryption operation in a different context and fed the
> > decrypted packet(s) into the stack.
> > 
> > From a quick view, you would have to return -EINPROGRESS here and have at
> > the caller side something like that:
> > 
> > diff --git a/mm/page_io.c b/mm/page_io.c
> > index e8726f3e3820b..9d1baa46ec3ed 100644
> > --- a/mm/page_io.c
> > +++ b/mm/page_io.c
> > @@ -252,12 +252,15 @@ int swap_writepage(struct page *page, struct writeback_control *wbc)
> > unlock_page(page);
> > goto out;
> > }
> > -   if (frontswap_store(page) == 0) {
> > +   ret = frontswap_store(page);
> > +   if (ret == 0) {
> > set_page_writeback(page);
> > unlock_page(page);
> > end_page_writeback(page);
> > goto out;
> > }
> > +   if (ret == -EINPROGRESS)
> > +   goto out;
> > ret = __swap_writepage(page, wbc, end_swap_bio_write);
> >  out:
> > return ret;
> > 
> Unfortunately, this is not true and things are not that simple.
> 
> We can't simply depend on -EINPROGRESS and go out.
> We have to wait for the result of compression to decide if we should
> do __swap_writepage(). As one page might be compressed into two
> pages, in this case, we will still need to do __swap_writepage().
> As I replied in the latest email, all of the async improvement to frontswap
> needs very careful thinking and benchmark. It can only happen after
> we build the base in this patch, fixing the broken connection between
> zswap and those new zip drivers.

At the time the compression finishes you see what happened and, based on
the design, you can either complete it immediately (the 0/error case from
above) or forward the result to the caller and let him decide.

> Thanks
> Barry

Sebastian


Re: [PATCH v4] mm/zswap: move to use crypto_acomp API for hardware acceleration

2020-07-09 Thread Sebastian Andrzej Siewior
On 2020-07-08 21:45:47 [+], Song Bao Hua (Barry Song) wrote:
> > On 2020-07-08 00:52:10 [+1200], Barry Song wrote:
> > > @@ -127,9 +129,17 @@
> > > +struct crypto_acomp_ctx {
> > > + struct crypto_acomp *acomp;
> > > + struct acomp_req *req;
> > > + struct crypto_wait wait;
> > > + u8 *dstmem;
> > > + struct mutex mutex;
> > > +};
> > …
> > > @@ -1074,12 +1138,32 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset,
> > >   }
> > >
> > >   /* compress */
> > > - dst = get_cpu_var(zswap_dstmem);
> > > - tfm = *get_cpu_ptr(entry->pool->tfm);
> > > - src = kmap_atomic(page);
> > > - ret = crypto_comp_compress(tfm, src, PAGE_SIZE, dst, &dlen);
> > > - kunmap_atomic(src);
> > > - put_cpu_ptr(entry->pool->tfm);
> > > + acomp_ctx = *this_cpu_ptr(entry->pool->acomp_ctx);
> > > +
> > > + mutex_lock(&acomp_ctx->mutex);
> > > +
> > > + src = kmap(page);
> > > + dst = acomp_ctx->dstmem;
> > 
> > that mutex is per-CPU, per-context. The dstmem pointer is per-CPU. So if
> > I read this right, you can get preempted after crypto_wait_req() and
> > another context in this CPU writes its data to the same dstmem and then…
> > 
> 
> This isn't true. Another thread in this cpu will be blocked by the mutex.
> It is impossible for two threads to write the same dstmem.
> If thread1 ran on cpu1, it held cpu1's mutex; if another thread wants to run 
> on cpu1, it is blocked.
> If thread1 ran on cpu1 first, it held cpu1's mutex, then it migrated to cpu2 
> (with very rare chance)
>   a. if another thread wants to run on cpu1, it is blocked;

How is it blocked? That "struct crypto_acomp_ctx" is
"this_cpu_ptr(entry->pool->acomp_ctx)" - which is per-CPU within a pool,
and you can have multiple pools. But `dstmem' you have only once per CPU
no matter how many pools you have.
So pool1 on CPU1 uses the same `dstmem' as pool2 on CPU1. But pool1 and
pool2 on CPU1 use a different mutex for protection of this `dstmem'.

> Thanks
> Barry

Sebastian


Re: [PATCH v4] mm/zswap: move to use crypto_acomp API for hardware acceleration

2020-07-08 Thread Sebastian Andrzej Siewior
On 2020-07-08 00:52:10 [+1200], Barry Song wrote:
…
> @@ -127,9 +129,17 @@ module_param_named(same_filled_pages_enabled, zswap_same_filled_pages_enabled,
>  * data structures
>  **/
>  
> +struct crypto_acomp_ctx {
> + struct crypto_acomp *acomp;
> + struct acomp_req *req;
> + struct crypto_wait wait;
> + u8 *dstmem;
> + struct mutex mutex;
> +};
…
> @@ -561,8 +614,9 @@ static struct zswap_pool *zswap_pool_create(char *type, char *compressor)
>   pr_debug("using %s zpool\n", zpool_get_type(pool->zpool));
>  
>   strlcpy(pool->tfm_name, compressor, sizeof(pool->tfm_name));
> - pool->tfm = alloc_percpu(struct crypto_comp *);
> - if (!pool->tfm) {
> +
> + pool->acomp_ctx = alloc_percpu(struct crypto_acomp_ctx *);

Can't you allocate the whole structure instead of just a pointer to it? The
structure looks like just a bunch of pointers anyway. Less time for
pointer chasing means more time for fun.
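
Something like this (untested, the error unwinding is elided):

|	pool->acomp_ctx = alloc_percpu(struct crypto_acomp_ctx);
|	if (!pool->acomp_ctx)
|		goto error;
|	…
|	/* and one dereference less on the fast path: */
|	acomp_ctx = this_cpu_ptr(entry->pool->acomp_ctx);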

> @@ -1074,12 +1138,32 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset,
>   }
>  
>   /* compress */
> - dst = get_cpu_var(zswap_dstmem);
> - tfm = *get_cpu_ptr(entry->pool->tfm);
> - src = kmap_atomic(page);
> - ret = crypto_comp_compress(tfm, src, PAGE_SIZE, dst, &dlen);
> - kunmap_atomic(src);
> - put_cpu_ptr(entry->pool->tfm);
> + acomp_ctx = *this_cpu_ptr(entry->pool->acomp_ctx);
> +
> + mutex_lock(&acomp_ctx->mutex);
> +
> + src = kmap(page);
> + dst = acomp_ctx->dstmem;

that mutex is per-CPU, per-context. The dstmem pointer is per-CPU. So if
I read this right, you can get preempted after crypto_wait_req() and
another context in this CPU writes its data to the same dstmem and then…

> + sg_init_one(&input, src, PAGE_SIZE);
> + /* zswap_dstmem is of size (PAGE_SIZE * 2). Reflect same in sg_list */
> + sg_init_one(&output, dst, PAGE_SIZE * 2);
> + acomp_request_set_params(acomp_ctx->req, &input, &output, PAGE_SIZE, dlen);
> + /*
> +  * it maybe looks a little bit silly that we send an asynchronous request,
> +  * then wait for its completion synchronously. This makes the process look
> +  * synchronous in fact.
> +  * Theoretically, acomp supports users send multiple acomp requests in one
> +  * acomp instance, then get those requests done simultaneously. but in this
> +  * case, frontswap actually does store and load page by page, there is no
> +  * existing method to send the second page before the first page is done
> +  * in one thread doing frontswap.
> +  * but in different threads running on different cpu, we have different
> +  * acomp instance, so multiple threads can do (de)compression in parallel.
> +  */
> + ret = crypto_wait_req(crypto_acomp_compress(acomp_ctx->req), &acomp_ctx->wait);
> + dlen = acomp_ctx->req->dlen;
> + kunmap(page);
> +
>   if (ret) {
>   ret = -EINVAL;
>   goto put_dstmem;

This looks like wrapping the same synchronous mechanism around an
asynchronous interface. It works as a PoC.

As far as I remember the crypto async interface, the incoming skbs were
fed to the async interface and returned to the caller so the NIC could
continue allocating new RX skbs and move on. Only if the queue of requests
was getting too long did the code start to throttle. Eventually the async
crypto code completed the decryption operation in a different context
and fed the decrypted packet(s) into the stack.

From a quick view, you would have to return -EINPROGRESS here and have
at the caller side something like that:

diff --git a/mm/page_io.c b/mm/page_io.c
index e8726f3e3820b..9d1baa46ec3ed 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -252,12 +252,15 @@ int swap_writepage(struct page *page, struct writeback_control *wbc)
unlock_page(page);
goto out;
}
-   if (frontswap_store(page) == 0) {
+   ret = frontswap_store(page);
+   if (ret == 0) {
set_page_writeback(page);
unlock_page(page);
end_page_writeback(page);
goto out;
}
+   if (ret == -EINPROGRESS)
+   goto out;
ret = __swap_writepage(page, wbc, end_swap_bio_write);
 out:
return ret;

so that eventually callers like write_cache_pages() could feed all pages
into *writepage and then wait for that bulk to finish.

Having it this way would also reshape the memory allocation you have.
You now have a per-CPU crypto request and everything else per context. With
a 64 or 128 core system I'm not sure you will use all those resources.
With a truly async interface you would be forced to have a resource pool
or so which you would use and then only allow a certain number of
parallel requests.

Sebastian


Re: [PATCH] random: Support freezable kthreads in add_hwgenerator_randomness()

2019-09-05 Thread Sebastian Andrzej Siewior
On 2019-09-04 11:49:57 [-0700], Stephen Boyd wrote:
> Can you try this?

yes, works.

Sebastian


Re: [PATCH] random: Support freezable kthreads in add_hwgenerator_randomness()

2019-09-04 Thread Sebastian Andrzej Siewior
On 2019-08-22 15:55:19 [+1000], Herbert Xu wrote:
> Patch applied.  Thanks.
[ ff296293b3538 ("random: Support freezable kthreads in add_hwgenerator_randomness()") ]

and since kthread_freezable_should_stop() has might_sleep() in it, I get
this:

|: do not call blocking ops when !TASK_RUNNING; state=1 set at [<349d1489>] prepare_to_wait_event+0x5a/0x180
|: WARNING: CPU: 0 PID: 828 at kernel/sched/core.c:6741 __might_sleep+0x6f/0x80
|: Modules linked in:
|:
|: CPU: 0 PID: 828 Comm: hwrng Not tainted 5.3.0-rc7-next-20190903+ #46
|: RIP: 0010:__might_sleep+0x6f/0x80
…
|: Call Trace:
|:  kthread_freezable_should_stop+0x1b/0x60
|:  add_hwgenerator_randomness+0xdd/0x130
|:  hwrng_fillfn+0xbf/0x120
|:  kthread+0x10c/0x140
|:  ret_from_fork+0x27/0x50

Sebastian


[PATCH 1/7] crypto: ux500: Use spinlock_t instead of struct spinlock

2019-07-04 Thread Sebastian Andrzej Siewior
For spinlocks the type spinlock_t should be used instead of "struct
spinlock".

Use spinlock_t for the spinlock definitions.

Cc: Herbert Xu 
Cc: "David S. Miller" 
Cc: linux-crypto@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/crypto/ux500/cryp/cryp.h | 4 ++--
 drivers/crypto/ux500/hash/hash_alg.h | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/crypto/ux500/cryp/cryp.h b/drivers/crypto/ux500/cryp/cryp.h
index bd89504e81678..8da7f87b339b4 100644
--- a/drivers/crypto/ux500/cryp/cryp.h
+++ b/drivers/crypto/ux500/cryp/cryp.h
@@ -241,12 +241,12 @@ struct cryp_device_data {
struct clk *clk;
struct regulator *pwr_regulator;
int power_status;
-   struct spinlock ctx_lock;
+   spinlock_t ctx_lock;
struct cryp_ctx *current_ctx;
struct klist_node list_node;
struct cryp_dma dma;
bool power_state;
-   struct spinlock power_state_spinlock;
+   spinlock_t power_state_spinlock;
bool restore_dev_ctx;
 };
 
diff --git a/drivers/crypto/ux500/hash/hash_alg.h b/drivers/crypto/ux500/hash/hash_alg.h
index ab2bd00c1c365..7c9bcc15125ff 100644
--- a/drivers/crypto/ux500/hash/hash_alg.h
+++ b/drivers/crypto/ux500/hash/hash_alg.h
@@ -366,10 +366,10 @@ struct hash_device_data {
phys_addr_t phybase;
struct klist_node   list_node;
struct device   *dev;
-   struct spinlock ctx_lock;
+   spinlock_t  ctx_lock;
struct hash_ctx *current_ctx;
	bool			power_state;
-   struct spinlock power_state_lock;
+   spinlock_t  power_state_lock;
struct regulator*regulator;
struct clk  *clk;
	bool			restore_dev_state;
-- 
2.20.1



[PATCH] crypto: scompress - initialize per-CPU variables on each CPU

2019-04-12 Thread Sebastian Andrzej Siewior
In commit 71052dcf4be70 ("crypto: scompress - Use per-CPU struct instead
multiple variables") I accidentally initialized the memory multiple times on
a random CPU. I should have initialized the memory on every CPU like it had
been done earlier. I didn't notice this because the scheduler didn't
move the task to another CPU.
Guenter managed to do that and the code crashed as expected.

Allocate / free the per-CPU memory on each CPU.

Fixes: 71052dcf4be70 ("crypto: scompress - Use per-CPU struct instead multiple variables")
Reported-by: Guenter Roeck 
Signed-off-by: Sebastian Andrzej Siewior 
---
Guenter, this should cure that.

 crypto/scompress.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/crypto/scompress.c b/crypto/scompress.c
index da31f6fe1f833..712b4c2ea021f 100644
--- a/crypto/scompress.c
+++ b/crypto/scompress.c
@@ -76,7 +76,7 @@ static void crypto_scomp_free_scratches(void)
int i;
 
for_each_possible_cpu(i) {
-   scratch = raw_cpu_ptr(&scomp_scratch);
+   scratch = per_cpu_ptr(&scomp_scratch, i);
 
vfree(scratch->src);
vfree(scratch->dst);
@@ -93,7 +93,7 @@ static int crypto_scomp_alloc_scratches(void)
for_each_possible_cpu(i) {
void *mem;
 
-   scratch = raw_cpu_ptr(&scomp_scratch);
+   scratch = per_cpu_ptr(&scomp_scratch, i);
 
mem = vmalloc_node(SCOMP_SCRATCH_SIZE, cpu_to_node(i));
if (!mem)
-- 
2.20.1



Re: [PATCH 2/2] crypto: scompress: Use per-CPU struct instead multiple variables

2019-04-12 Thread Sebastian Andrzej Siewior
On 2019-04-12 15:50:56 [+0200], To Guenter Roeck wrote:
> So I have this, let me try to dig further.

I'm such a moron. Patch is coming soon…

> > Guenter
> 
Sebastian


Re: [PATCH 2/2] crypto: scompress: Use per-CPU struct instead multiple variables

2019-04-12 Thread Sebastian Andrzej Siewior
On 2019-04-12 06:43:31 [-0700], Guenter Roeck wrote:
> On 4/12/19 1:42 AM, Sebastian Andrzej Siewior wrote:
> > On 2019-04-10 21:07:35 [-0700], Guenter Roeck wrote:
> > > Hi Sebastian,
> > Hi Guenter,
> > 
> > > Unfortunately, this patch causes random crashes.
> > …
> > > This is seen with an arm64 image running on qemu with machine xlnx-zcu102
> > > and two CPUs, and crypto test options enabled. It happens roughly every
> > > other boot. Reverting the patch fixes the problem. Bisect log (from
> > > crypto/master) is attached.
> > 
> > What is it that you are doing? I had it tcrypt tested on x86-64 before
> > posting. Let me check it again with the next tree on qemu…
> > 
> 
> Nothing special. Booting arm64:defconfig+various debug and test options in 
> qemu.
> It does not seem to depend on specific boot options. Crash failure rate, based
> on the last three days of testing, seems to be roughly 20%. The problem is 
> only
> seen if there is more than one CPU. I have seen it with 2, 4, and 6 CPUs 
> active,
> but it seems to be most likely to happen with 2 CPUs. I don't have enough data
> yet to be sure, though.
> 
> The problem is only seen with arm64, at least in my tests.

I found your test thingy and I managed to run it here. I started 
rootfs/arm64/run-qemu-arm64.sh

a few times (only with the xlnx-zcu102 targets) and after the third
iteration it exploded once.
So I have this, let me try to dig further.

> Guenter

Sebastian


Re: [PATCH 2/2] crypto: scompress: Use per-CPU struct instead multiple variables

2019-04-12 Thread Sebastian Andrzej Siewior
On 2019-04-10 21:39:34 [-0700], Eric Biggers wrote:
> Well, from a quick read of the patch, it's probably because it uses
> raw_cpu_ptr() instead of per_cpu_ptr() when allocating/freeing the buffers, so
> they are unlikely to actually be allocated for all CPUs.

memory is allocated in crypto_scomp_alloc_scratches() for each possible
CPU (for_each_possible_cpu) before the algorithm is first used. So all
CPUs should have it.

> Also Sebastian, I don't understand the point of the spinlock.  What was wrong
> with get_cpu()?  Note that taking the spinlock disables preemption too...  
> Also
> now it's possible to spin for a long time on the spinlock, for no reason 
> really.

The long spin is only possible on a PREEMPT kernel if the task switches
the CPU between getting the pointer and acquiring the lock. Should this
really become a problem I could try to think of something.
That spin_lock gives us lock annotation and a lock context. With
preempt_disable() it is just a BKL-like thing with no context. And this
basically kills PREEMPT_RT.
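
To illustrate the difference (just a sketch):

|	/* before: anonymous exclusion via disabled preemption */
|	scratch = get_cpu_ptr(&scomp_scratch);
|	…
|	put_cpu_ptr(&scomp_scratch);
|
|	/* after: a named lock, visible to lockdep, sleeping on PREEMPT_RT */
|	scratch = raw_cpu_ptr(&scomp_scratch);
|	spin_lock(&scratch->lock);
|	…
|	spin_unlock(&scratch->lock);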

> - Eric

Sebastian


Re: [PATCH 2/2] crypto: scompress: Use per-CPU struct instead multiple variables

2019-04-12 Thread Sebastian Andrzej Siewior
On 2019-04-10 21:07:35 [-0700], Guenter Roeck wrote:
> Hi Sebastian,
Hi Guenter,

> Unfortunately, this patch causes random crashes.
…
> This is seen with an arm64 image running on qemu with machine xlnx-zcu102
> and two CPUs, and crypto test options enabled. It happens roughly every
> other boot. Reverting the patch fixes the problem. Bisect log (from
> crypto/master) is attached.

What is it that you are doing? I had it tcrypt tested on x86-64 before
posting. Let me check it again with the next tree on qemu…

> Guenter

Sebastian


[PATCH 1/2] crypto: scompress: return proper error code for allocation failure

2019-03-29 Thread Sebastian Andrzej Siewior
If scomp_acomp_comp_decomp() fails to allocate memory for the
destination then we never copy back the data we compressed.
It is probably best to return an error code instead of 0 in case of
failure.
I haven't found any user that is using acomp_request_set_params()
without the `dst' buffer so there is probably no harm.

Signed-off-by: Sebastian Andrzej Siewior 
---
 crypto/scompress.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/crypto/scompress.c b/crypto/scompress.c
index 6f8305f8c3004..aea1a8e5d1954 100644
--- a/crypto/scompress.c
+++ b/crypto/scompress.c
@@ -171,8 +171,10 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
if (!ret) {
if (!req->dst) {
req->dst = sgl_alloc(req->dlen, GFP_ATOMIC, NULL);
-   if (!req->dst)
+   if (!req->dst) {
+   ret = -ENOMEM;
goto out;
+   }
}
scatterwalk_map_and_copy(scratch_dst, req->dst, 0, req->dlen,
 1);
-- 
2.20.1



[PATCH 2/2] crypto: scompress: Use per-CPU struct instead multiple variables

2019-03-29 Thread Sebastian Andrzej Siewior
Two per-CPU variables are allocated as pointers to per-CPU memory which
are then used as scratch buffers.
We can be smarter about this and instead use a per-CPU struct which
already contains the pointers; then we only need to allocate the
scratch buffers.
Add a lock to the struct. By doing so we can avoid the get_cpu()
statement and gain lockdep coverage (if enabled) to ensure that the lock
is always acquired in the right context. On non-preemptible kernels the
lock vanishes.
It is okay to use raw_cpu_ptr() in order to get a pointer to the struct
since it is protected by the spinlock.

The diffstat of this is negative and according to size scompress.o:

   text    data     bss     dec     hex  filename
   1847     160      24    2031     7ef  dbg_before.o
   1754     232       4    1990     7c6  dbg_after.o
   1799      64      24    1887     75f  no_dbg-before.o
   1703      88       4    1795     703  no_dbg-after.o

The overall size difference is also negative. The increase in
the data section is only four bytes without lockdep.

Signed-off-by: Sebastian Andrzej Siewior 
---
 crypto/scompress.c | 137 -
 1 file changed, 60 insertions(+), 77 deletions(-)

diff --git a/crypto/scompress.c b/crypto/scompress.c
index aea1a8e5d1954..da31f6fe1f833 100644
--- a/crypto/scompress.c
+++ b/crypto/scompress.c
@@ -29,9 +29,17 @@
 #include 
 #include "internal.h"
 
+struct scomp_scratch {
+   spinlock_t  lock;
+   void*src;
+   void*dst;
+};
+
+static DEFINE_PER_CPU(struct scomp_scratch, scomp_scratch) = {
+   .lock = __SPIN_LOCK_UNLOCKED(scomp_scratch.lock),
+};
+
 static const struct crypto_type crypto_scomp_type;
-static void * __percpu *scomp_src_scratches;
-static void * __percpu *scomp_dst_scratches;
 static int scomp_scratch_users;
 static DEFINE_MUTEX(scomp_lock);
 
@@ -62,76 +70,53 @@ static void crypto_scomp_show(struct seq_file *m, struct crypto_alg *alg)
seq_puts(m, "type : scomp\n");
 }
 
-static void crypto_scomp_free_scratches(void * __percpu *scratches)
+static void crypto_scomp_free_scratches(void)
 {
+   struct scomp_scratch *scratch;
int i;
 
-   if (!scratches)
-   return;
-
-   for_each_possible_cpu(i)
-   vfree(*per_cpu_ptr(scratches, i));
-
-   free_percpu(scratches);
-}
-
-static void * __percpu *crypto_scomp_alloc_scratches(void)
-{
-   void * __percpu *scratches;
-   int i;
-
-   scratches = alloc_percpu(void *);
-   if (!scratches)
-   return NULL;
-
for_each_possible_cpu(i) {
-   void *scratch;
+   scratch = raw_cpu_ptr(&scomp_scratch);
 
-   scratch = vmalloc_node(SCOMP_SCRATCH_SIZE, cpu_to_node(i));
-   if (!scratch)
+   vfree(scratch->src);
+   vfree(scratch->dst);
+   scratch->src = NULL;
+   scratch->dst = NULL;
+   }
+}
+
+static int crypto_scomp_alloc_scratches(void)
+{
+   struct scomp_scratch *scratch;
+   int i;
+
+   for_each_possible_cpu(i) {
+   void *mem;
+
+   scratch = raw_cpu_ptr(&scomp_scratch);
+
+   mem = vmalloc_node(SCOMP_SCRATCH_SIZE, cpu_to_node(i));
+   if (!mem)
goto error;
-   *per_cpu_ptr(scratches, i) = scratch;
-   }
-
-   return scratches;
-
-error:
-   crypto_scomp_free_scratches(scratches);
-   return NULL;
-}
-
-static void crypto_scomp_free_all_scratches(void)
-{
-   if (!--scomp_scratch_users) {
-   crypto_scomp_free_scratches(scomp_src_scratches);
-   crypto_scomp_free_scratches(scomp_dst_scratches);
-   scomp_src_scratches = NULL;
-   scomp_dst_scratches = NULL;
-   }
-}
-
-static int crypto_scomp_alloc_all_scratches(void)
-{
-   if (!scomp_scratch_users++) {
-   scomp_src_scratches = crypto_scomp_alloc_scratches();
-   if (!scomp_src_scratches)
-   return -ENOMEM;
-   scomp_dst_scratches = crypto_scomp_alloc_scratches();
-   if (!scomp_dst_scratches) {
-   crypto_scomp_free_scratches(scomp_src_scratches);
-   scomp_src_scratches = NULL;
-   return -ENOMEM;
-   }
+   scratch->src = mem;
+   mem = vmalloc_node(SCOMP_SCRATCH_SIZE, cpu_to_node(i));
+   if (!mem)
+   goto error;
+   scratch->dst = mem;
}
return 0;
+error:
+   crypto_scomp_free_scratches();
+   return -ENOMEM;
 }
 
 static int crypto_scomp_init_tfm(struct crypto_tfm *tfm)
 {
-   int ret;
+   int ret = 0;
 
mutex_lock(&scomp_lock);
-   ret = crypto_scomp_alloc_all_scratches();
+   if (!scomp_scratch_users++)
+   ret = crypto_scomp_alloc_scratches();

[PATCH 1/2] crypto: chtls: remove cdev_list_lock

2019-02-11 Thread Sebastian Andrzej Siewior
Last user of cdev_list_lock was removed in commit

  6422ccc5fbefb ("crypto/chelsio/chtls: listen fails with multiadapt")

Cc: Atul Gupta 
Cc: Harsh Jain 
Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/crypto/chelsio/chtls/chtls_main.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/crypto/chelsio/chtls/chtls_main.c b/drivers/crypto/chelsio/chtls/chtls_main.c
index 563f8fe7686ad..dd2daf2a54e06 100644
--- a/drivers/crypto/chelsio/chtls/chtls_main.c
+++ b/drivers/crypto/chelsio/chtls/chtls_main.c
@@ -30,7 +30,6 @@
  */
 static LIST_HEAD(cdev_list);
 static DEFINE_MUTEX(cdev_mutex);
-static DEFINE_MUTEX(cdev_list_lock);
 
 static DEFINE_MUTEX(notify_mutex);
 static RAW_NOTIFIER_HEAD(listen_notify_list);
-- 
2.20.1



[PATCH 2/2] crypto: user: remove crypto_cfg_mutex

2019-02-11 Thread Sebastian Andrzej Siewior
crypto_cfg_mutex was never used since it got introduced in commit

  cac5818c25d04 ("crypto: user - Implement a generic crypto statistics")

Cc: Corentin Labbe 
Signed-off-by: Sebastian Andrzej Siewior 
---
 crypto/crypto_user_stat.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/crypto/crypto_user_stat.c b/crypto/crypto_user_stat.c
index 3e9a53233d804..028be6a1ea781 100644
--- a/crypto/crypto_user_stat.c
+++ b/crypto/crypto_user_stat.c
@@ -20,8 +20,6 @@
 
 #define null_terminated(x) (strnlen(x, sizeof(x)) < sizeof(x))
 
-static DEFINE_MUTEX(crypto_cfg_mutex);
-
 extern struct sock *crypto_nlsk;
 
 struct crypto_dump_info {
-- 
2.20.1



Re: [PATCH RT] padata: Make padata_do_serial() use get_cpu_light()

2019-01-14 Thread Sebastian Andrzej Siewior
On 2019-01-09 16:59:26 [+0100], Daniel Bristot de Oliveira wrote:
> diff --git a/kernel/padata.c b/kernel/padata.c
> index d568cc56405f..bfcbdeb20ba5 100644
> --- a/kernel/padata.c
> +++ b/kernel/padata.c
> @@ -295,7 +295,7 @@ static void padata_reorder_timer(struct timer_list *t)
>   unsigned int weight;
>   int target_cpu, cpu;
>  
> - cpu = get_cpu();
> + cpu = get_cpu_light();

This can become a
cpu = smp_processor_id()

>   /* We don't lock pd here to not interfere with parallel processing
>* padata_reorder() calls on other CPUs. We just need any CPU out of
> @@ -321,7 +321,7 @@ static void padata_reorder_timer(struct timer_list *t)
>   padata_reorder(pd);
>   }
>  
> - put_cpu();
> + put_cpu_light();

and this can go because this is invoked in a timer callback.

>  }
>  
>  static void padata_serial_worker(struct work_struct *serial_work)
> @@ -369,7 +369,7 @@ void padata_do_serial(struct padata_priv *padata)
>  
>   pd = padata->pd;
>  
> - cpu = get_cpu();
> + cpu = get_cpu_light();

this is tricky but I would also say that this can become
smp_processor_id() like in the upper hunk

>   /* We need to run on the same CPU padata_do_parallel(.., padata, ..)
>* was called on -- or, at least, enqueue the padata object into the
> @@ -387,7 +387,7 @@ void padata_do_serial(struct padata_priv *padata)
>   list_add_tail(&padata->list, &pqueue->reorder.list);
>   spin_unlock(&pqueue->reorder.lock);
>  
> - put_cpu();
> + put_cpu_light();

and then this can go, too. It looks like this is invoked from a worker with
BH disabled. If it is, it is all good. If not then it might be
problematic because later we have
queue_work_on(cpu, pd->pinst->wq, &pqueue->reorder_work);

and nothing guarantees that the work is carried out by the specified CPU
if CPU-hotplug is involved.

>   /* If we're running on the wrong CPU, call padata_reorder() via a
>* kernel worker.

Sebastian


Re: [PATCH] crypto: caam/qi - simplify CGR allocation, freeing

2018-10-29 Thread Sebastian Andrzej Siewior
On 2018-10-25 14:05:32 [+], Horia Geanta wrote:
> > Now, what is the problem with the CPU limitation? Is this a HW
> > limitation that you can access the registers from a certain CPU?
> > 
> Roy confirmed the CPU limitation should actually be removed, there is nothing 
> in
> HW requiring it.
> A clean-up patch will follow.

good. Thank you.

> Thanks,
> Horia

Sebastian


Re: [PATCH] crypto: caam/qi - simplify CGR allocation, freeing

2018-10-09 Thread Sebastian Andrzej Siewior
On 2018-10-08 14:09:37 [+0300], Horia Geantă wrote:
> CGRs (Congestion Groups) have to be freed by the same CPU that
> initialized them.
> This is why currently the driver takes special measures; however, using
> set_cpus_allowed_ptr() is incorrect - as reported by Sebastian.
> 
> Instead of the generic solution of replacing set_cpus_allowed_ptr() with
> work_on_cpu_safe(), we use the qman_delete_cgr_safe() QBMan API instead
> of qman_delete_cgr() - which internally takes care of proper CGR
> deletion.
> 
> Link: https://lkml.kernel.org/r/20181005125443.dfhd2asqktm22...@linutronix.de
> Reported-by: Sebastian Andrzej Siewior 
> Signed-off-by: Horia Geantă 

Oh. No more usage of set_cpus_allowed_ptr(). Wonderful. Thank you.
 Acked-by: Sebastian Andrzej Siewior 
for that.

Now that you shifted my attention to qman_delete_cgr_safe():
could you please use work_on_cpu_safe() here instead of
smp_call_function_single() with preempt_disable() around it?

Now, what is the problem with the CPU limitation? Is this a HW
limitation that you can access the registers from a certain CPU?

This still fails (silently) if the CPU is missing, right? If you can't
get around it, you could block the CPU from going offline. You could
register a HP notifier
cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, …

and the callback would return -EINVAL if this is the special CPU. The
other option would be to forbid rmmod. This *could* work but if I remember
correctly, an explicit unbind can't be stopped.
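
Something like this (untested sketch, the callback name is made up):

|static int caam_qi_cpu_down_prep(unsigned int cpu)
|{
|	/* refuse to offline the CPU which instantiated the CGRs */
|	if (cpu == mod_init_cpu)
|		return -EINVAL;
|	return 0;
|}
|
|	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "crypto/caam_qi:online",
|				NULL, caam_qi_cpu_down_prep);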

Sebastian


set_cpus_allowed_ptr() usage in FREESCALE CAAM

2018-10-05 Thread Sebastian Andrzej Siewior
Hi,

this block:
|int caam_qi_shutdown(struct device *qidev)
| {
| struct cpumask old_cpumask = current->cpus_allowed;
…
| /*
|  * QMan driver requires CGRs to be deleted from same CPU from where they
|  * were instantiated. Hence we get the module removal execute from the
|  * same CPU from where it was originally inserted.
|  */
| set_cpus_allowed_ptr(current, get_cpu_mask(mod_init_cpu));
…
|  /* Now that we're done with the CGRs, restore the cpus allowed mask */
| set_cpus_allowed_ptr(current, &old_cpumask);

in drivers/crypto/caam/qi.c needs to go. I saw it twice in the driver.
set_cpus_allowed_ptr() is not intended for this kind of thing.

What you want is to use work_on_cpu_safe() instead. It also takes a CPU
as an argument, and it ensures that the CPU does not vanish in the middle
of the work. Please check the error code of the function at both call
sites because it may fail if the CPU is not online.
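
As a sketch (untested, the helper name is made up):

|static long caam_qi_shutdown_cgrs(void *data)
|{
|	/* runs pinned to the requested CPU; delete the CGRs here */
|	return 0;
|}
|
|	ret = work_on_cpu_safe(mod_init_cpu, caam_qi_shutdown_cgrs, qidev);
|	if (ret)
|		dev_err(qidev, "CGR deletion failed, CPU %d offline?\n",
|			mod_init_cpu);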

Sebastian


Re: [PATCH 0/4] crypto/arm64: reduce impact of NEON yield checks

2018-07-24 Thread Sebastian Andrzej Siewior
On 2018-07-24 19:12:20 [+0200], Ard Biesheuvel wrote:
> Vakul reports a considerable performance hit when running the accelerated
> arm64 crypto routines with CONFIG_PREEMPT=y configured, now that thay have
> been updated to take the TIF_NEED_RESCHED flag into account.

just in time. I will try to come up with some numbers on RT with the
original patch and with that one. I have it almost working…

Sebastian


Re: Can a driver->probe be called for two devices at the same time (WAS: Re: [PATCH] crypto/ccp: don't disable interrupts while setting up debugfs)

2018-02-27 Thread Sebastian Andrzej Siewior
On 2018-02-27 19:40:34 [+0100], Greg Kroah-Hartman wrote:
> On Tue, Feb 27, 2018 at 06:33:14PM +0100, Sebastian Andrzej Siewior wrote:
> > On 2018-02-27 11:08:56 [-0600], Gary R Hook wrote:
> > > That issue remains unclear to me: Are probes of PCI devices guaranteed to 
> > > be
> > > serialized? Observations on my CCPs says that they occur in order, but I
> > > don't know for certain that serialization is guaranteed.
> > > 
> > > Is there a definitive statement on this somewhere that I just don't know
> > > about?
> 
> The bus enforces this.
> 
> > So the question if a driver can probe two devices simultaneously.
> 
> Depends on the bus type.

PCI

> thanks,
> 
> greg k-h

Sebastian


Can a driver->probe be called for two devices at the same time (WAS: Re: [PATCH] crypto/ccp: don't disable interrupts while setting up debugfs)

2018-02-27 Thread Sebastian Andrzej Siewior
On 2018-02-27 11:08:56 [-0600], Gary R Hook wrote:
> That issue remains unclear to me: Are probes of PCI devices guaranteed to be
> serialized? Observations on my CCPs says that they occur in order, but I
> don't know for certain that serialization is guaranteed.
> 
> Is there a definitive statement on this somewhere that I just don't know
> about?

So the question is whether a driver can probe two devices simultaneously. I'm
not sure. We have PROBE_PREFER_ASYNCHRONOUS which defers the probe to a
worker. However I have no idea if two of those workers can run at the
same time.

> I think a mutex would be just fine; I got this wrong, clearly. Let me work
> up a patch using a mutex.

I've sent one. Why not just ack it and be done with it?

> Gary

Sebastian


Re: [PATCH] crypto/ccp: don't disable interrupts while setting up debugfs

2018-02-26 Thread Sebastian Andrzej Siewior
On 2018-02-25 21:04:27 [-0500], Hook, Gary wrote:
> On 2/23/2018 5:33 PM, Sebastian Andrzej Siewior wrote:
> > I don't see why we need to take a single write lock and disable interrupts
> > while setting up debugfs. This is what happens when we try anyway:
> 
> There is more than one CCP on some processors. The lock is intended to
> serialize attempts to initialize the new directory, but a R/W lock isn't
> required.

And they are probed in parallel? And you need to disable interrupts while
creating the debugfs folder? A mutex isn't enough?

> My testing on  an EPYC (8 CCPs) didn't expose this problem. May I ask what
> hardware you used here?
an EPYC, too. I have no idea how many CCPs were around but lspci -k
printed that driver more than once. Try to enable lockdep and "sleeping
while atomic".
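That is:

	CONFIG_PROVE_LOCKING=y
	CONFIG_DEBUG_ATOMIC_SLEEP=y
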
But I did provide you a complete backtrace, look:

> > |ccp 0000:03:00.2: enabling device (0000 -> 0002)
> > |BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:69
> > |in_atomic(): 1, irqs_disabled(): 1, pid: 3, name: kworker/0:0
> > |irq event stamp: 17150
> > |hardirqs last  enabled at (17149): [<97a18c49>] restore_regs_and_return_to_kernel+0x0/0x23
> > |hardirqs last disabled at (17150): [<0773b3a9>] _raw_write_lock_irqsave+0x1b/0x50
> > |softirqs last  enabled at (17148): [<64d56155>] __do_softirq+0x3b8/0x4c1
> > |softirqs last disabled at (17125): [<92633c18>] irq_exit+0xb1/0xc0
> > |CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.16.0-rc2+ #30
> > |Workqueue: events work_for_cpu_fn
> > |Call Trace:
> > | dump_stack+0x7d/0xb6
> > | ___might_sleep+0x1eb/0x250
> > | down_write+0x17/0x60
> > | start_creating+0x4c/0xe0
> > | debugfs_create_dir+0x9/0x100
> > | ccp5_debugfs_setup+0x191/0x1b0
> > | ccp5_init+0x8a7/0x8c0
> > | ccp_dev_init+0xb8/0xe0
> > | sp_init+0x6c/0x90
> > | sp_pci_probe+0x26e/0x590
> > | local_pci_probe+0x3f/0x90
> > | work_for_cpu_fn+0x11/0x20
> > | process_one_work+0x1ff/0x650
> > | worker_thread+0x1d4/0x3a0
> > | kthread+0xfe/0x130
> > | ret_from_fork+0x27/0x50

Sebastian


[PATCH] crypto/ccp: don't disable interrupts while setting up debugfs

2018-02-23 Thread Sebastian Andrzej Siewior
I don't see why we need to take a single write lock and disable interrupts
while setting up debugfs. This is what happens when we try anyway:

|ccp 0000:03:00.2: enabling device (0000 -> 0002)
|BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:69
|in_atomic(): 1, irqs_disabled(): 1, pid: 3, name: kworker/0:0
|irq event stamp: 17150
|hardirqs last  enabled at (17149): [<97a18c49>] restore_regs_and_return_to_kernel+0x0/0x23
|hardirqs last disabled at (17150): [<0773b3a9>] _raw_write_lock_irqsave+0x1b/0x50
|softirqs last  enabled at (17148): [<64d56155>] __do_softirq+0x3b8/0x4c1
|softirqs last disabled at (17125): [<92633c18>] irq_exit+0xb1/0xc0
|CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.16.0-rc2+ #30
|Workqueue: events work_for_cpu_fn
|Call Trace:
| dump_stack+0x7d/0xb6
| ___might_sleep+0x1eb/0x250
| down_write+0x17/0x60
| start_creating+0x4c/0xe0
| debugfs_create_dir+0x9/0x100
| ccp5_debugfs_setup+0x191/0x1b0
| ccp5_init+0x8a7/0x8c0
| ccp_dev_init+0xb8/0xe0
| sp_init+0x6c/0x90
| sp_pci_probe+0x26e/0x590
| local_pci_probe+0x3f/0x90
| work_for_cpu_fn+0x11/0x20
| process_one_work+0x1ff/0x650
| worker_thread+0x1d4/0x3a0
| kthread+0xfe/0x130
| ret_from_fork+0x27/0x50

If any locking is required, a simple mutex will do it.

Cc: Gary R Hook 
Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/crypto/ccp/ccp-debugfs.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/crypto/ccp/ccp-debugfs.c b/drivers/crypto/ccp/ccp-debugfs.c
index 59d4ca4e72d8..1a734bd2070a 100644
--- a/drivers/crypto/ccp/ccp-debugfs.c
+++ b/drivers/crypto/ccp/ccp-debugfs.c
@@ -278,7 +278,7 @@ static const struct file_operations ccp_debugfs_stats_ops = {
 };
 
 static struct dentry *ccp_debugfs_dir;
-static DEFINE_RWLOCK(ccp_debugfs_lock);
+static DEFINE_MUTEX(ccp_debugfs_lock);
 
 #define	MAX_NAME_LEN	20
 
@@ -290,16 +290,15 @@ void ccp5_debugfs_setup(struct ccp_device *ccp)
struct dentry *debugfs_stats;
struct dentry *debugfs_q_instance;
struct dentry *debugfs_q_stats;
-   unsigned long flags;
int i;
 
if (!debugfs_initialized())
return;
 
-   write_lock_irqsave(&ccp_debugfs_lock, flags);
+   mutex_lock(&ccp_debugfs_lock);
if (!ccp_debugfs_dir)
ccp_debugfs_dir = debugfs_create_dir(KBUILD_MODNAME, NULL);
-   write_unlock_irqrestore(&ccp_debugfs_lock, flags);
+   mutex_unlock(&ccp_debugfs_lock);
if (!ccp_debugfs_dir)
return;
 
-- 
2.16.1



[PATCH] crypto: mcryptd: protect the per-CPU queue with a lock

2017-11-30 Thread Sebastian Andrzej Siewior
mcryptd_enqueue_request() grabs the per-CPU queue struct and protects
access to it with disabled preemption. Then it schedules a worker on the
same CPU. The worker in mcryptd_queue_worker() guards access to the same
per-CPU variable with disabled preemption.

If we take CPU-hotplug into account then it is possible that between
queue_work_on() and the actual invocation of the worker the CPU goes
down and the worker will be scheduled on _another_ CPU. And here the
preempt_disable() protection does not work anymore. The easiest thing is
to add a spin_lock() to guard access to the list.

Another detail: mcryptd_queue_worker() is not processing more than
MCRYPTD_BATCH invocations in a row. If there are still items left, then
it will invoke queue_work() to proceed with more later. *I* would
suggest to simply drop that check because it does not use a system
workqueue and the workqueue is already marked as "CPU_INTENSIVE". And if
preemption is required then the scheduler should do it.
However if queue_work() is used then the work item is marked as CPU
unbound. That means it will try to run on the local CPU but it may run
on another CPU as well. Especially with CONFIG_DEBUG_WQ_FORCE_RR_CPU=y.
Again, the preempt_disable() won't work here but lock which was
introduced will help.
In order to keep work-item on the local CPU (and avoid RR) I changed it
to queue_work_on().

Cc: sta...@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior 
---
 crypto/mcryptd.c | 23 ++-
 include/crypto/mcryptd.h |  1 +
 2 files changed, 11 insertions(+), 13 deletions(-)

diff --git a/crypto/mcryptd.c b/crypto/mcryptd.c
index 4e6472658852..eca04d3729b3 100644
--- a/crypto/mcryptd.c
+++ b/crypto/mcryptd.c
@@ -81,6 +81,7 @@ static int mcryptd_init_queue(struct mcryptd_queue *queue,
pr_debug("cpu_queue #%d %p\n", cpu, queue->cpu_queue);
crypto_init_queue(&cpu_queue->queue, max_cpu_qlen);
INIT_WORK(&cpu_queue->work, mcryptd_queue_worker);
+   spin_lock_init(&cpu_queue->q_lock);
}
return 0;
 }
@@ -104,15 +105,16 @@ static int mcryptd_enqueue_request(struct mcryptd_queue 
*queue,
int cpu, err;
struct mcryptd_cpu_queue *cpu_queue;
 
-   cpu = get_cpu();
-   cpu_queue = this_cpu_ptr(queue->cpu_queue);
-   rctx->tag.cpu = cpu;
+   cpu_queue = raw_cpu_ptr(queue->cpu_queue);
+   spin_lock(&cpu_queue->q_lock);
+   cpu = smp_processor_id();
+   rctx->tag.cpu = smp_processor_id();
 
err = crypto_enqueue_request(&cpu_queue->queue, request);
pr_debug("enqueue request: cpu %d cpu_queue %p request %p\n",
 cpu, cpu_queue, request);
+   spin_unlock(&cpu_queue->q_lock);
queue_work_on(cpu, kcrypto_wq, &cpu_queue->work);
-   put_cpu();
 
return err;
 }
@@ -161,16 +163,11 @@ static void mcryptd_queue_worker(struct work_struct *work)
cpu_queue = container_of(work, struct mcryptd_cpu_queue, work);
i = 0;
while (i < MCRYPTD_BATCH || single_task_running()) {
-   /*
-* preempt_disable/enable is used to prevent
-* being preempted by mcryptd_enqueue_request()
-*/
-   local_bh_disable();
-   preempt_disable();
+
+   spin_lock_bh(&cpu_queue->q_lock);
backlog = crypto_get_backlog(&cpu_queue->queue);
req = crypto_dequeue_request(&cpu_queue->queue);
-   preempt_enable();
-   local_bh_enable();
+   spin_unlock_bh(&cpu_queue->q_lock);
 
if (!req) {
mcryptd_opportunistic_flush();
@@ -185,7 +182,7 @@ static void mcryptd_queue_worker(struct work_struct *work)
++i;
}
if (cpu_queue->queue.qlen)
-   queue_work(kcrypto_wq, &cpu_queue->work);
+   queue_work_on(smp_processor_id(), kcrypto_wq, &cpu_queue->work);
 }
 
 void mcryptd_flusher(struct work_struct *__work)
diff --git a/include/crypto/mcryptd.h b/include/crypto/mcryptd.h
index cceafa01f907..b67404fc4b34 100644
--- a/include/crypto/mcryptd.h
+++ b/include/crypto/mcryptd.h
@@ -27,6 +27,7 @@ static inline struct mcryptd_ahash *__mcryptd_ahash_cast(
 
 struct mcryptd_cpu_queue {
struct crypto_queue queue;
+   spinlock_t q_lock;
struct work_struct work;
 };
 
-- 
2.15.0



Re: [PATCH] random: silence compiler warnings and fix race

2017-06-19 Thread Sebastian Andrzej Siewior
On 2017-06-19 22:55:37 [+0200], Jason A. Donenfeld wrote:
> On Mon, Jun 19, 2017 at 9:45 AM, Sebastian Andrzej Siewior
>  wrote:
> > ehm. You sure? I simply delayed the lock-dropping _after_ the state
> > variable was been modified. So it was basically what your patch did
> > except it was unlocked later…
> 
> Yes, I'm sure. You moved the call to invalidate_batched_entropy() to
> be after the assignment of crng_init. However, the call to
> invalidate_batched_entropy() must be made _before_ the assignment of
> crng_init.

so you need to find another way then. Doing the assignment after
dropping the lock opens another race.

> >> > Are you sure about that? I am not sure that the gcc will inline "crng_init"
> >> > read twice. It is not a local variable. READ_ONCE() is usually used
> >> > where gcc could cache a memory access but you do not want this. But hey!
> >> > If someone knows better I am here to learn.
> >>
> >> The whole purpose is that I _want_ it to cache the memory access so
> >> that it is _not_ inlined. So, based on your understanding, it does
> >> exactly what I intended it to do. The reason is that I'd like to avoid
> >> a lock imbalance, which could happen if the read is inlined.
> >
> > So it was good as it was which means you can drop that READ_ONCE().
> 
> Except READ_ONCE ensures that the compiler will never inline it, so it
> actually needs to stay.

I don't think the compiler is allowed to inline it the way you describe
it. This would render any assignment to a local variable useless. Also
the READ_ONCE creates worse code in this case (because the read cannot
be delayed).

Sebastian


Re: [PATCH] random: silence compiler warnings and fix race

2017-06-19 Thread Sebastian Andrzej Siewior
On 2017-06-17 02:39:40 [+0200], Jason A. Donenfeld wrote:
> On Fri, Jun 16, 2017 at 4:35 PM, Sebastian Andrzej Siewior
>  wrote:
> > I wouldn't just push the lock one up as is but move that write part to
> > crng_init to remain within the locked section. Like that:
> 
> We can't quite do that, because invalidate_batched_entropy() needs to
> be called _before_ crng_init. Otherwise a concurrent call to
> get_random_u32/u64() will have crng_init being the wrong value when
> the batched entropy is still old.

ehm. You sure? I simply delayed the lock-dropping _after_ the state
variable was been modified. So it was basically what your patch did
except it was unlocked later…

> 
> > Are you sure about that? I am not sure that the gcc will inline "crng_init"
> > read twice. It is not a local variable. READ_ONCE() is usually used
> > where gcc could cache a memory access but you do not want this. But hey!
> > If someone knows better I am here to learn.
> 
> The whole purpose is that I _want_ it to cache the memory access so
> that it is _not_ inlined. So, based on your understanding, it does
> exactly what I intended it to do. The reason is that I'd like to avoid
> a lock imbalance, which could happen if the read is inlined.

So it was good as it was which means you can drop that READ_ONCE().

> Jason

Sebastian


Re: [PATCH v5 01/13] random: invalidate batched entropy after crng init

2017-06-16 Thread Sebastian Andrzej Siewior
On 2017-06-16 14:12:42 [+0200], Jason A. Donenfeld wrote:
> I actually figured that out myself after sending the initial email, so
> then I wrote a follow-up patch which I attached to this thread. You
> should have received it. Can you take a look?

replied to the patch.

Sebastian


Re: [PATCH] random: silence compiler warnings and fix race

2017-06-16 Thread Sebastian Andrzej Siewior
On 2017-06-15 00:45:26 [+0200], Jason A. Donenfeld wrote:
> Odd versions of gcc for the sh4 architecture will actually warn about
> flags being used while uninitialized, so we set them to zero. Non crazy
> gccs will optimize that out again, so it doesn't make a difference.

that is minor

> Next, over aggressive gccs could inline the expression that defines
> use_lock, which could then introduce a race resulting in a lock
> imbalance. By using READ_ONCE, we prevent that fate. Finally, we make
> that assignment const, so that gcc can still optimize a nice amount.

Not sure about that, more below.

> Finally, we fix a potential deadlock between primary_crng.lock and
> batched_entropy_reset_lock, where they could be called in opposite
> order. Moving the call to invalidate_batched_entropy to outside the lock
> rectifies this issue.

and *that* is a separate issue which has to be pulled in for stable once it
has been addressed in Linus' tree.

> Signed-off-by: Jason A. Donenfeld 
> ---
> Ted -- the first part of this is the fixup patch we discussed earlier.
> Then I added on top a fix for a potentially related race.
> 
> I'm not totally convinced that moving this block to outside the spinlock
> is 100% okay, so please give this a close look before merging.
> 
> 
>  drivers/char/random.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index e870f329db88..01a260f67437 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -803,13 +803,13 @@ static int crng_fast_load(const char *cp, size_t len)
>   p[crng_init_cnt % CHACHA20_KEY_SIZE] ^= *cp;
>   cp++; crng_init_cnt++; len--;
>   }
> + spin_unlock_irqrestore(&primary_crng.lock, flags);
>   if (crng_init_cnt >= CRNG_INIT_CNT_THRESH) {
>   invalidate_batched_entropy();
>   crng_init = 1;
>   wake_up_interruptible(&crng_init_wait);
>   pr_notice("random: fast init done\n");
>   }
> - spin_unlock_irqrestore(&primary_crng.lock, flags);
>   return 1;

I wouldn't just push the unlock up as is but move the write to
crng_init so it remains within the locked section. Like that:

diff --git a/drivers/char/random.c b/drivers/char/random.c
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -804,12 +804,14 @@ static int crng_fast_load(const char *cp, size_t len)
cp++; crng_init_cnt++; len--;
}
if (crng_init_cnt >= CRNG_INIT_CNT_THRESH) {
-   invalidate_batched_entropy();
crng_init = 1;
+   spin_unlock_irqrestore(&primary_crng.lock, flags);
+   invalidate_batched_entropy();
wake_up_interruptible(&crng_init_wait);
pr_notice("random: fast init done\n");
+   } else {
+   spin_unlock_irqrestore(&primary_crng.lock, flags);
}
-   spin_unlock_irqrestore(&primary_crng.lock, flags);
return 1;
 }
 
@@ -842,13 +844,16 @@ static void crng_reseed(struct crng_state *crng, struct entropy_store *r)
memzero_explicit(&buf, sizeof(buf));
crng->init_time = jiffies;
if (crng == &primary_crng && crng_init < 2) {
-   invalidate_batched_entropy();
crng_init = 2;
+   spin_unlock_irqrestore(&primary_crng.lock, flags);
+
+   invalidate_batched_entropy();
process_random_ready_list();
wake_up_interruptible(&crng_init_wait);
pr_notice("random: crng init done\n");
+   } else {
+   spin_unlock_irqrestore(&primary_crng.lock, flags);
}
-   spin_unlock_irqrestore(&primary_crng.lock, flags);
 }
 
 static inline void crng_wait_ready(void)

>  }
>  
> @@ -2041,8 +2041,8 @@ static DEFINE_PER_CPU(struct batched_entropy, batched_entropy_u64);
>  u64 get_random_u64(void)
>  {
>   u64 ret;
> - bool use_lock = crng_init < 2;
> - unsigned long flags;
> + bool use_lock = READ_ONCE(crng_init) < 2;

Are you sure about that? I am not sure that gcc would inline the
"crng_init" read twice. It is not a local variable. READ_ONCE() is
usually used where gcc could cache a memory access and you do not want
that. But hey! If someone knows better, I am here to learn.
One thing that this change does for sure is that crng_init is read very
early in the function, while without READ_ONCE it is delayed until
_after_ arch_get_random_XXX(). So if arch_get_random_XXX() is around and
works, you have one read for nothing.
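For what it is worth, the pattern under discussion looks roughly like
this (a sketch of the get_random_u64() fast path, not the full
function):

	/* Sketch: if the compiler re-reads crng_init instead of caching it,
	 * the two branches below could disagree and unbalance the lock.
	 * READ_ONCE() forces a single load. */
	const bool use_lock = READ_ONCE(crng_init) < 2;
	unsigned long flags = 0;

	if (use_lock)
		read_lock_irqsave(&batched_entropy_reset_lock, flags);
	/* ... hand out bytes from the per-CPU batch ... */
	if (use_lock)
		read_unlock_irqrestore(&batched_entropy_reset_lock, flags);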

> + unsigned long flags = 0;
>   struct batched_entropy *batch;
>  
>  #if BITS_PER_LONG == 64

Sebastian


Re: [PATCH v5 01/13] random: invalidate batched entropy after crng init

2017-06-16 Thread Sebastian Andrzej Siewior
On 2017-06-15 00:33:12 [+0200], Jason A. Donenfeld wrote:
> There's a potential race that I fixed in my v5 of that patch set, but
> Ted only took v4, and for whatever reason has been too busy to submit
> the additional patch I already posted showing the diff between v4&v5.
> Hopefully he actually gets around to it and sends this for the next
> rc. Here it is:
> 
> https://patchwork.kernel.org/patch/9774563/

So you replace "crng_init < 2" with use_lock instead. That is not what I
am talking about. Again:
add_interrupt_randomness()
  -> crng_fast_load()                  spin_trylock_irqsave(&primary_crng.lock, )
     -> invalidate_batched_entropy()   write_lock_irqsave(&batched_entropy_reset_lock, );

in that order, while the code path

get_random_uXX()                       read_lock_irqsave(&batched_entropy_reset_lock, );
  -> extract_crng()
     -> _extract_crng()                spin_lock_irqsave(&crng->lock, );

acquires the same locks in the opposite order.
That means

  T1                                   T2
  crng_fast_load()                     get_random_u64()
                                         extract_crng()
              *dead lock*
  invalidate_batched_entropy()           _extract_crng()

So T1 waits for batched_entropy_reset_lock while holding
primary_crng.lock, and T2 waits for primary_crng.lock while holding
batched_entropy_reset_lock.
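In the abstract this is the classic AB-BA inversion. A self-contained
userspace illustration (pthreads standing in for the kernel locks,
purely to show the shape of the problem):

#include <pthread.h>

/* A stands in for primary_crng.lock, B for batched_entropy_reset_lock. */
static pthread_mutex_t A = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t B = PTHREAD_MUTEX_INITIALIZER;

static void *t1(void *arg)		/* crng_fast_load() path */
{
	(void)arg;
	pthread_mutex_lock(&A);
	pthread_mutex_lock(&B);		/* blocks if t2 already holds B */
	pthread_mutex_unlock(&B);
	pthread_mutex_unlock(&A);
	return NULL;
}

static void *t2(void *arg)		/* get_random_u64() path */
{
	(void)arg;
	pthread_mutex_lock(&B);
	pthread_mutex_lock(&A);		/* blocks if t1 already holds A */
	pthread_mutex_unlock(&A);
	pthread_mutex_unlock(&B);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, t1, NULL);
	pthread_create(&b, NULL, t2, NULL);
	pthread_join(a, NULL);		/* may never return */
	pthread_join(b, NULL);
	return 0;
}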

Sebastian


Re: [PATCH v5 01/13] random: invalidate batched entropy after crng init

2017-06-14 Thread Sebastian Andrzej Siewior
On 2017-06-08 01:25:55 [+0200], Jason A. Donenfeld wrote:
> It's possible that get_random_{u32,u64} is used before the crng has
> initialized, in which case, its output might not be cryptographically
> secure. For this problem, directly, this patch set is introducing the
> *_wait variety of functions, but even with that, there's a subtle issue:
> what happens to our batched entropy that was generated before
> initialization. Prior to this commit, it'd stick around, supplying bad
> numbers. After this commit, we force the entropy to be re-extracted
> after each phase of the crng has initialized.
> 
> In order to avoid a race condition with the position counter, we
> introduce a simple rwlock for this invalidation. Since it's only during
> this awkward transition period, after things are all set up, we stop
> using it, so that it doesn't have an impact on performance.
> 
> This should probably be backported to 4.11.
> 
> (Also: adding my copyright to the top. With the patch series from
> January, this patch, and then the ones that come after, I think there's
> a relevant amount of code in here to add my name to the top.)
> 
> Signed-off-by: Jason A. Donenfeld 
> Cc: Greg Kroah-Hartman 

I don't understand why lockdep notices this possible deadlock only in
RT:

| the existing dependency chain (in reverse order) is:
|
| -> #1 (primary_crng.lock){+.+...}:
|lock_acquire+0xb5/0x2b0
|rt_spin_lock+0x46/0x50
|_extract_crng+0x39/0xa0
|extract_crng+0x3a/0x40
|get_random_u64+0x17a/0x200
|cache_random_seq_create+0x51/0x100
|init_cache_random_seq+0x35/0x90
|__kmem_cache_create+0xd3/0x560
|create_boot_cache+0x8c/0xb2
|create_kmalloc_cache+0x54/0x9f
|create_kmalloc_caches+0xe3/0xfd
|kmem_cache_init+0x14f/0x1f0
|start_kernel+0x1e7/0x3b3
|x86_64_start_reservations+0x2a/0x2c
|x86_64_start_kernel+0x13d/0x14c
|verify_cpu+0x0/0xfc
|
| -> #0 (batched_entropy_reset_lock){+.+...}:
|__lock_acquire+0x11b4/0x1320
|lock_acquire+0xb5/0x2b0
|rt_write_lock+0x26/0x40
|rt_write_lock_irqsave+0x9/0x10
|invalidate_batched_entropy+0x28/0xb0
|crng_fast_load+0xb5/0xe0
|add_interrupt_randomness+0x16c/0x1a0
|irq_thread+0x15c/0x1e0
|kthread+0x112/0x150
|ret_from_fork+0x31/0x40

We have the path add_interrupt_randomness() -> crng_fast_load() which
takes
  primary_crng.lock -> batched_entropy_reset_lock
and we have get_random_uXX() -> extract_crng() which takes the locks in
the opposite order:
  batched_entropy_reset_lock -> crng->lock

However, batched_entropy_reset_lock is only taken if "crng_init < 2", so
it is possible that RT hits this constraint more reliably.

> ---
>  drivers/char/random.c | 37 +
>  1 file changed, 37 insertions(+)
> 
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index a561f0c2f428..d35da1603e12 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -1,6 +1,9 @@
>  /*
>   * random.c -- A strong random number generator
>   *
> + * Copyright (C) 2017 Jason A. Donenfeld . All
> + * Rights Reserved.
> + *
>   * Copyright Matt Mackall , 2003, 2004, 2005
>   *
>   * Copyright Theodore Ts'o, 1994, 1995, 1996, 1997, 1998, 1999.  All
> @@ -762,6 +765,8 @@ static DECLARE_WAIT_QUEUE_HEAD(crng_init_wait);
>  static struct crng_state **crng_node_pool __read_mostly;
>  #endif
>  
> +static void invalidate_batched_entropy(void);
> +
>  static void crng_initialize(struct crng_state *crng)
>  {
>   int i;
> @@ -799,6 +804,7 @@ static int crng_fast_load(const char *cp, size_t len)
>   cp++; crng_init_cnt++; len--;
>   }
>   if (crng_init_cnt >= CRNG_INIT_CNT_THRESH) {
> + invalidate_batched_entropy();
>   crng_init = 1;
>   wake_up_interruptible(&crng_init_wait);
>   pr_notice("random: fast init done\n");
> @@ -836,6 +842,7 @@ static void crng_reseed(struct crng_state *crng, struct entropy_store *r)
>   memzero_explicit(&buf, sizeof(buf));
>   crng->init_time = jiffies;
>   if (crng == &primary_crng && crng_init < 2) {
> + invalidate_batched_entropy();
>   crng_init = 2;
>   process_random_ready_list();
>   wake_up_interruptible(&crng_init_wait);
> @@ -2023,6 +2030,7 @@ struct batched_entropy {
>   };
>   unsigned int position;
>  };
> +static rwlock_t batched_entropy_reset_lock = __RW_LOCK_UNLOCKED(batched_entropy_reset_lock);
>  
>  /*
>   * Get a random word for internal kernel use only. The quality of the random
> @@ -2033,6 +2041,8 @@ static DEFINE_PER_CPU(struct batched_entropy, batched_entropy_u64);
>  u64 get_random_u64(void)
>  {
>   u64 ret;
> + const bool use_lock = READ_ONCE(crng_init) < 2;
> + unsigned long flags = 0;
>   struct batched_entropy *batch;
>  
>  #if BITS_PER_LONG == 64
> @@ -2045,11

Re: [PATCH] padata: remove recently added module usage from bool code

2016-09-27 Thread Sebastian Andrzej Siewior
On 2016-09-26 21:18:21 [-0400], Paul Gortmaker wrote:
> 
> ...and so it currently is not being built as a module by anyone.

that is correct.

Acked-by: Sebastian Andrzej Siewior 

Sebastian


[PATCH 13/21] padata: Convert to hotplug state machine

2016-09-06 Thread Sebastian Andrzej Siewior
Install the callbacks via the state machine. CPU-hotplug multi-instance
support is used with the nocalls() version. Maybe parts of padata_alloc()
could be moved into the online callback so that we could invoke the
->startup callback for each instance and drop get_online_cpus().

Cc: Steffen Klassert 
Cc: linux-crypto@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior 
---
 include/linux/padata.h |  2 +-
 kernel/padata.c| 92 --
 2 files changed, 53 insertions(+), 41 deletions(-)

diff --git a/include/linux/padata.h b/include/linux/padata.h
index 113ee626a4dc..0f9e567d5e15 100644
--- a/include/linux/padata.h
+++ b/include/linux/padata.h
@@ -151,7 +151,7 @@ struct parallel_data {
  * @flags: padata flags.
  */
 struct padata_instance {
-   struct notifier_block   cpu_notifier;
+   struct hlist_node       node;
struct workqueue_struct *wq;
struct parallel_data*pd;
struct padata_cpumask   cpumask;
diff --git a/kernel/padata.c b/kernel/padata.c
index 993278895ccc..7848f0566403 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define MAX_OBJ_NUM 1000
 
@@ -769,52 +770,43 @@ static inline int pinst_has_cpu(struct padata_instance *pinst, int cpu)
cpumask_test_cpu(cpu, pinst->cpumask.cbcpu);
 }
 
-
-static int padata_cpu_callback(struct notifier_block *nfb,
-  unsigned long action, void *hcpu)
+static int padata_cpu_online(unsigned int cpu, struct hlist_node *node)
 {
-   int err;
struct padata_instance *pinst;
-   int cpu = (unsigned long)hcpu;
+   int ret;
 
-   pinst = container_of(nfb, struct padata_instance, cpu_notifier);
+   pinst = hlist_entry_safe(node, struct padata_instance, node);
+   if (!pinst_has_cpu(pinst, cpu))
+   return 0;
 
-   switch (action) {
-   case CPU_ONLINE:
-   case CPU_ONLINE_FROZEN:
-   case CPU_DOWN_FAILED:
-   case CPU_DOWN_FAILED_FROZEN:
-   if (!pinst_has_cpu(pinst, cpu))
-   break;
-   mutex_lock(&pinst->lock);
-   err = __padata_add_cpu(pinst, cpu);
-   mutex_unlock(&pinst->lock);
-   if (err)
-   return notifier_from_errno(err);
-   break;
-
-   case CPU_DOWN_PREPARE:
-   case CPU_DOWN_PREPARE_FROZEN:
-   case CPU_UP_CANCELED:
-   case CPU_UP_CANCELED_FROZEN:
-   if (!pinst_has_cpu(pinst, cpu))
-   break;
-   mutex_lock(&pinst->lock);
-   err = __padata_remove_cpu(pinst, cpu);
-   mutex_unlock(&pinst->lock);
-   if (err)
-   return notifier_from_errno(err);
-   break;
-   }
-
-   return NOTIFY_OK;
+   mutex_lock(&pinst->lock);
+   ret = __padata_add_cpu(pinst, cpu);
+   mutex_unlock(&pinst->lock);
+   return ret;
 }
+
+static int padata_cpu_prep_down(unsigned int cpu, struct hlist_node *node)
+{
+   struct padata_instance *pinst;
+   int ret;
+
+   pinst = hlist_entry_safe(node, struct padata_instance, node);
+   if (!pinst_has_cpu(pinst, cpu))
+   return 0;
+
+   mutex_lock(&pinst->lock);
+   ret = __padata_remove_cpu(pinst, cpu);
+   mutex_unlock(&pinst->lock);
+   return ret;
+}
+
+static enum cpuhp_state hp_online;
 #endif
 
 static void __padata_free(struct padata_instance *pinst)
 {
 #ifdef CONFIG_HOTPLUG_CPU
-   unregister_hotcpu_notifier(&pinst->cpu_notifier);
+   cpuhp_state_remove_instance_nocalls(hp_online, &pinst->node);
 #endif
 
padata_stop(pinst);
@@ -1012,11 +1004,8 @@ struct padata_instance *padata_alloc(struct workqueue_struct *wq,
mutex_init(&pinst->lock);
 
 #ifdef CONFIG_HOTPLUG_CPU
-   pinst->cpu_notifier.notifier_call = padata_cpu_callback;
-   pinst->cpu_notifier.priority = 0;
-   register_hotcpu_notifier(&pinst->cpu_notifier);
+   cpuhp_state_add_instance_nocalls(hp_online, &pinst->node);
 #endif
-
return pinst;
 
 err_free_masks:
@@ -1039,3 +1028,26 @@ void padata_free(struct padata_instance *pinst)
kobject_put(&pinst->kobj);
 }
 EXPORT_SYMBOL(padata_free);
+
+#ifdef CONFIG_HOTPLUG_CPU
+
+static __init int padata_driver_init(void)
+{
+   int ret;
+
+   ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, "padata:online",
+ padata_cpu_online,
+ padata_cpu_prep_down);
+   if (ret < 0)
+   return ret;
+   hp_online = ret;
+   return 0;
+}
+module_init(padata_driver_init);
+
+static __exit void padata_driver_exit(void)
+{
+   cpuhp_remove_multi_state(hp_online);
+}
+module_exit(padata_driver_exit);

[PATCH] crypto/ccp: remove rwlocks_types.h

2016-05-11 Thread Sebastian Andrzej Siewior
Users of rwlocks should include spinlock.h instead of including this
header file. The current users of rwlocks_types.h are internal.

Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/crypto/ccp/ccp-dev.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/crypto/ccp/ccp-dev.c b/drivers/crypto/ccp/ccp-dev.c
index 4dbc18727235..2a8ad712a5f2 100644
--- a/drivers/crypto/ccp/ccp-dev.c
+++ b/drivers/crypto/ccp/ccp-dev.c
@@ -16,7 +16,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
-- 
2.8.1



Re: [ANNOUNCE] 3.18.7-rt1

2015-02-16 Thread Sebastian Andrzej Siewior
* Sebastian Andrzej Siewior | 2015-02-16 12:18:22 [+0100]:

>Known issues:
>
>  - xor / raid_pq
>I had max latency jumping up to 67563us on one CPU while the next
>lower max was 58us. I tracked it down to the module init code of
>xor and raid_pq. Both disable preemption while measuring the
>performance of the individual implementation.

The patch at the bottom gets rid of it.

How important is this preempt_disable(), and how likely is it that we
could use precomputed priority lists of functions instead of this
runtime check? XOR already prefers the AVX-based xor if available, so
the numbers/tests at runtime could be removed.
Is there a case where SSE is worse than MMX on CPU X but better on
another, and this is why we do it?

diff --git a/crypto/xor.c b/crypto/xor.c
index 35d6b3a..19e20f5 100644
--- a/crypto/xor.c
+++ b/crypto/xor.c
@@ -70,7 +70,7 @@ do_xor_speed(struct xor_block_template *tmpl, void *b1, void *b2)
tmpl->next = template_list;
template_list = tmpl;
 
-   preempt_disable();
+   preempt_disable_nort();
 
/*
 * Count the number of XORs done during a whole jiffy, and use
@@ -94,7 +94,7 @@ do_xor_speed(struct xor_block_template *tmpl, void *b1, void *b2)
max = count;
}
 
-   preempt_enable();
+   preempt_enable_nort();
 
speed = max * (HZ * BENCH_SIZE / 1024);
tmpl->speed = speed;
diff --git a/lib/raid6/algos.c b/lib/raid6/algos.c
index 7d0e5cd..e9920d4 100644
--- a/lib/raid6/algos.c
+++ b/lib/raid6/algos.c
@@ -142,7 +142,7 @@ static inline const struct raid6_calls *raid6_choose_gen(
 
perf = 0;
 
-   preempt_disable();
+   preempt_disable_nort();
j0 = jiffies;
while ((j1 = jiffies) == j0)
cpu_relax();
@@ -151,7 +151,7 @@ static inline const struct raid6_calls *raid6_choose_gen(
(*algo)->gen_syndrome(disks, PAGE_SIZE, *dptrs);
perf++;
}
-   preempt_enable();
+   preempt_enable_nort();
 
if (perf > bestperf) {
bestperf = perf;


Sebastian


[PATCH] crypto/omap aes: select BLKCIPHER2

2012-10-31 Thread Sebastian Andrzej Siewior
without it:
|drivers/built-in.o:(.data+0x14588): undefined reference to `crypto_ablkcipher_type'
|drivers/built-in.o:(.data+0x14668): undefined reference to `crypto_ablkcipher_type'

Not sure when this broke.

Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/crypto/Kconfig |1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig
index 308c7fb..ac7235a 100644
--- a/drivers/crypto/Kconfig
+++ b/drivers/crypto/Kconfig
@@ -254,6 +254,7 @@ config CRYPTO_DEV_OMAP_AES
tristate "Support for OMAP AES hw engine"
depends on ARCH_OMAP2 || ARCH_OMAP3
select CRYPTO_AES
+   select CRYPTO_BLKCIPHER2
help
  OMAP processors have AES module accelerator. Select this if you
  want to use the OMAP module for AES algorithms.
-- 
1.7.10.4



Re: [PATCH] crypto/arc4: now arc needs blockcipher support

2012-07-08 Thread Sebastian Andrzej Siewior
On Wed, Jun 27, 2012 at 02:52:47PM +0800, Herbert Xu wrote:
> > On a side note: do we pull in the blkcipher block mode for each cipher now 
> > to
> > gain some extra performance like the openssl project? I was under the
> > impression that is in general not worth it.
> 
> You mean normal block ciphers? Does it really make that much
> of a difference?

Yes. Jussi added a block mode for RC4 that is preferred over the
"automatic" one that is generated otherwise. I don't know how much
performance it brings, but I would be surprised if it is a lot on an
average CPU. With this patch in, I think it is a matter of time until we
get optimized block-mode code for AES-CBC and friends (which should be a
little faster compared to making a function call for the XOR…), you get
the idea.

Sebastian


[PATCH] crypto/arc4: now arc needs blockcipher support

2012-06-26 Thread Sebastian Andrzej Siewior
Since commit ce6dd368 ("crypto: arc4 - improve performance by adding
ecb(arc4)") we need to pull in a blkcipher.

|ERROR: "crypto_blkcipher_type" [crypto/arc4.ko] undefined!
|ERROR: "blkcipher_walk_done" [crypto/arc4.ko] undefined!
|ERROR: "blkcipher_walk_virt" [crypto/arc4.ko] undefined!

Signed-off-by: Sebastian Andrzej Siewior 
---

On a side note: do we pull in the blkcipher block mode for each cipher now to
gain some extra performance like the openssl project? I was under the
impression that it is in general not worth it.

 crypto/Kconfig |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/crypto/Kconfig b/crypto/Kconfig
index 2c1c2df..cefbe15 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -593,7 +593,7 @@ config CRYPTO_ANUBIS
 
 config CRYPTO_ARC4
tristate "ARC4 cipher algorithm"
-   select CRYPTO_ALGAPI
+   select CRYPTO_BLKCIPHER
help
  ARC4 cipher algorithm.
 
-- 
1.7.10



Re: mv_cesa hash functions

2012-02-23 Thread Sebastian Andrzej Siewior
On Wed, Feb 22, 2012 at 09:10:46PM +0100, Nikos Mavrogiannopoulos wrote:
> On 02/22/2012 02:03 PM, Frank wrote:
> 
> > Hi,
> > 
> > After doing some trials with hardware crypto offloading through usermode 
> > interfaces (af_alg and cryptodev) to Marvell CESA accelerated ciphers and 
> > hash functions with the 3.2.4 kernel's mv_cesa in Debian Wheezy on a 
> > Marvell Kirkwood system, I've noticed the following kernel output when I 
> > load the mv_cesa kernel module:
> > 
> > [490889.448060] alg: hash: Test 1 failed for mv-sha1
> > [490889.452786] 00000000: c1 94 3f 2e a2 41 ce 88 d5 47 07 43 c4 a8 17 5d
> > [490889.459368] 00000010: 77 e8 47 ca
> > [490889.464321] alg: hash: Test 1 failed for mv-hmac-sha1
> > [490889.469493] 00000000: 06 71 4d 7c cc cc b5 cf 1b d6 c7 ab d0 25 c4 21
> > [490889.476068] 00000010: 66 0b 8e 70
> > Using SHA1 in a ssl/tls handshake fails in tests with mv_cesa loaded, which 
> > might be related to this.
> 
> 
> It might be related. I noticed the same issue in userspace with
> /dev/crypto. It can be solved by adding a 50 ms delay after the
> encryption, decryption and hashing operations. (a delay less than that
> didn't work).

It might be a missing cache flush before/after the operation or before
passing the data to userland. I've seen some hash operations fail which
are not performed in one go but via the start, update, end path. However,
I had no time to get to the bottom of this.
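If it is the cache, the missing piece would be something along these
lines (purely a sketch; cesa_fetch_result() and the buffer names are
made up):

#include <linux/dma-mapping.h>
#include <linux/string.h>

static void cesa_fetch_result(struct device *dev, dma_addr_t buf_dma,
			      const void *buf, void *result, size_t len)
{
	/* Make the engine's writes visible to the CPU before the digest
	 * is read back or copied to userland. */
	dma_sync_single_for_cpu(dev, buf_dma, len, DMA_FROM_DEVICE);
	memcpy(result, buf, len);
}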

> 
> regards,
> Nikos

Sebastian


Re: RFC: kcrypto - (yet another) user space interface

2010-06-11 Thread Sebastian Andrzej Siewior
* Nikos Mavrogiannopoulos | 2010-06-11 09:47:15 [+0200]:

>Sebastian Andrzej Siewior wrote:
>> * Phil Sutter | 2010-06-10 20:22:29 [+0200]:
>
>The problem with right or wrong is that they are only known afterwards.
>For me the right way to go is _to go_. I can see discussions in this
>list, years ago, on talks about the "perfect" userspace crypto api and
>rejections of implementations because they are not perfect enough. I don't
>believe there is such a thing as a perfect crypto api. Other operating
>systems have a userspace crypto API (maybe not perfect) but linux
>hasn't. I don't think this is the way to go.

Phil asked me for my opinion and he got it. The fundamental problems
from what I've seen were with the interface:
- kernel structs which are exposed to userland and which limit the
  parameters. For instance the iv was limited to 16 bytes while we
  already have algos with a much longer iv.
- the interface was using write()/poll()/read() and get_user_pages(). I
  pointed out Herbert's opinion about this and the alternative. So this
  _was_ already discussed.

>regards,
>Nikos

Sebastian


Re: RFC: kcrypto - (yet another) user space interface

2010-06-10 Thread Sebastian Andrzej Siewior
* Phil Sutter | 2010-06-10 20:22:29 [+0200]:

>Hello everyone,
Hi Phil,

please take a look at [0] and [1]. From the README I can tell that those
two posts are different from what you have so far.
You might want to take a look at the AF_PACKET interface. It does zero
copy via a ring buffer interface of pre-mmapped user memory. So no
get_user_pages() then :)

I think that is the way to go (a minimal sketch of such a ring setup
follows below the references).

[0] http://article.gmane.org/gmane.linux.kernel.cryptoapi/2656
[1] http://article.gmane.org/gmane.linux.kernel.cryptoapi/2658
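The setup is only a few calls (error handling omitted; the sizing
values are arbitrary):

#include <sys/socket.h>
#include <sys/mman.h>
#include <linux/if_packet.h>
#include <linux/if_ether.h>
#include <arpa/inet.h>

int main(void)
{
	int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
	struct tpacket_req req = {
		.tp_block_size = 4096,
		.tp_block_nr   = 64,
		.tp_frame_size = 2048,
		.tp_frame_nr   = 128,	/* block_nr * (block_size / frame_size) */
	};

	setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &req, sizeof(req));
	/* The kernel now delivers frames straight into this mapping; the
	 * data path needs neither copies nor get_user_pages(). */
	void *ring = mmap(NULL, req.tp_block_size * req.tp_block_nr,
			  PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	(void)ring;
	return 0;
}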

>Phil

Sebastian


Re: [RFC PATCH] crypto: Alchemy AES engine driver

2010-05-07 Thread Sebastian Andrzej Siewior
* Manuel Lauss | 2010-05-06 17:50:45 [+0200]:

A brief look.
>lightly "tested" with the tcrypt module on Au1200;  I have no idea whether
>it really works correctly:
>
># modprobe alchemy-aes
>alg: skcipher: setkey failed on test 2 for ecb-aes-alchemy: flags=20
># modprobe tcrypt mode=10
>alg: skcipher: setkey failed on test 3 for cbc-aes-alchemy: flags=0
>alg: skcipher: Failed to load transform for cbc(aes): -2
>alg: skcipher: Failed to load transform for cbc(aes): -2
>tcrypt: one or more tests failed!
>FATAL: Error inserting tcrypt (/lib/modules/2.6.34-rc6-db1200-00214-g9f84af9/kernel/crypto/tcrypt.ko): Unknown symbol in module, or unknown parameter (see dmesg)

>The error in "test 3 for cbc-aes-alchemy" probably comes from the inability
>to process keys larger than 128 bits.
You have to fix this: you have to be able to handle other key sizes as
well. In case your hardware does not support them, you have to handle
this in software. Look at the geode, via or s390 drivers: all of them
have fallbacks for this. If you fail the self test, your driver will not
be used afaik.
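The usual shape of such a fallback (a sketch with made-up alchemy_*
names, modeled on what geode does):

#include <linux/crypto.h>
#include <crypto/aes.h>

static int alchemy_aes_setkey(struct crypto_tfm *tfm, const u8 *key,
			      unsigned int len)
{
	struct alchemy_aes_ctx *ctx = crypto_tfm_ctx(tfm);	/* hypothetical */

	if (len == AES_KEYSIZE_128)
		return alchemy_aes_hw_setkey(ctx, key, len);	/* hypothetical */

	/* Key sizes the engine cannot do go to the software fallback
	 * cipher allocated in cra_init. */
	crypto_cipher_clear_flags(ctx->fallback, CRYPTO_TFM_REQ_MASK);
	crypto_cipher_set_flags(ctx->fallback,
				crypto_tfm_get_flags(tfm) & CRYPTO_TFM_REQ_MASK);
	return crypto_cipher_setkey(ctx->fallback, key, len);
}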

>Please have a look.
> Thanks!
>
>diff --git a/drivers/crypto/alchemy-aes.c b/drivers/crypto/alchemy-aes.c
>new file mode 100644
>index 000..14e8ace
>--- /dev/null
>+++ b/drivers/crypto/alchemy-aes.c
>+static int __init alchemy_aes_load(void)
>+{
>+  /* FIXME: the PRId of the processor should also be tested here;
>+   * according to the datasheet the Au1210 (0x0503) and some
>+   * Au1300 have no AES engine.
>+   */
You're German, right? You should handle this in SoC code: if you figure
out that you have an engine, you add the device; if you don't, you
don't, and the probe call won't happen. The module won't be loaded by
udev either.

>+  /* need to do 8bit accesses to memory to get correct data */
>+  __alchemy_aes_memid = au1xxx_ddma_add_device(&alchemy_aes_mem_dbdev);
>+  if (!__alchemy_aes_memid)
>+  return -ENODEV;
What does it do? You don't want to add devices here. If you need
something additional do it in SoC code and pass it via platform_data.

>+
>+  return platform_driver_register(&alchemy_aes_driver);
>+}
>+

Sebastian


Re: Test tools for crypt accelerators?

2010-05-05 Thread Sebastian Andrzej Siewior
* Manuel Lauss | 2010-05-05 14:02:15 [+0200]:

>I've written a prototype driver for an AES accelerator; I'd like to test it
>now.  Are there any userspace tools available for this?
modprobe tcrypt mode=10

will test various block modes. There is no userland interface for hw
drivers atm.

>
>Thanks!
>  Manuel Lauss

Sebastian


Re: Bug#552270: Marvell CESA driver and Kirkwood

2010-04-30 Thread Sebastian Andrzej Siewior
* Uri Simchoni | 2010-04-24 21:43:35 [+0300]:

Sorry for the late reply.

>> I enabled list and sg debugging and a flood ping triggered a couple of
>> warnings. Could you please look at this?
>Sure.
It seems that everything is working now.

>> IPsec requests authenc(hmac(sha1),cbc(aes)) so right now it reqeusts two
>> cesa provided algorithms. A single ping results in around 30ms RTT.
>Since the CESA does each operation faster than sw (at least when the packet 
>size exceeds some threshold), I see no reason for it to slow the process down. 
>The slowness probably is somehow caused by the same thing that causes the 
>oops, or by debug warning prints.
Yup looks like it.

>> Disabling hmac(sha1) gives me less than 1ms.
>> Implementing authenc() for IPsec should speed things up. Right now I'm
>> stuck with hacking DMA support.
>Well, so far I wasn't able to figure out how it all fits together - sure, the 
>CESA can do AES-CBC+HMAC-SHA1 in one run, but I'm not sure it's suitable for 
>IPSec, or that the crypto infrastructure supports a HW driver for combined 
>operation. (the CESA is probably not suitable for SSL because of alignment 
>problems, IPSec is better in that respect).

It does; AEAD is just for this purpose. The FSL talitos driver does
this; not sure if it is the only one.
I'll try to hack DMA support before I focus on this.
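For context, the transform IPsec asks for can be provided as one
combined AEAD; a driver registering under the same algorithm name with a
higher priority than the generic authenc() template would win the lookup
(a sketch only, not from any driver):

#include <crypto/aead.h>
#include <linux/err.h>

static int alloc_ipsec_tfm(struct crypto_aead **out)
{
	/* This is the name IPsec requests; a driver can register its own
	 * "authenc(hmac(sha1),cbc(aes))" implementation and take over. */
	struct crypto_aead *tfm =
		crypto_alloc_aead("authenc(hmac(sha1),cbc(aes))", 0, 0);

	if (IS_ERR(tfm))
		return PTR_ERR(tfm);
	*out = tfm;
	return 0;
}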
 
>>> Thanks,
>>> Uri.

Sebastian


Re: [PATCH] net/ipsec: don't treat EINPROGRESS as a regular error

2010-04-26 Thread Sebastian Andrzej Siewior
* Herbert Xu | 2010-04-26 09:17:11 [+0800]:

>OK that was my fault.  Steffen had all the requisite EINPROGRESS
>checks in place but I told him to get rid of them.
>
>This patch should fix it.
Excellent job Herbert, it does solve the problem.

Sebastian


Re: Bug#552270: Marvell CESA driver and Kirkwood

2010-04-24 Thread Sebastian Andrzej Siewior
* Sebastian Andrzej Siewior | 2010-04-24 17:12:07 [+0200]:

>For IPSec I use this[0] shell script which sets up a connection. Good for
[0] http://breakpoint.cc/ipsec.sh

Sebastian


Re: Bug#552270: Marvell CESA driver and Kirkwood

2010-04-24 Thread Sebastian Andrzej Siewior
* Uri Simchoni | 2010-04-22 06:23:12 [+0300]:

>I have some IPSec background but am not familiar with the Linux implementation 
>(I'm using the mv_cesa for SSL acceleration through a usermode interface I'm 
>working on). Can you point me to the nearest howto? I suppose I could have a 
>look.

If it is possible, please post some patches which describe the user land
interface.

For IPSec I use this[0] shell script which sets up a connection. Good for
testing :) So you need two boxes, start the script on both machines and
the first ping that reached my orion box triggered that error. I just
sent something that looked like a fix.

I enabled list and sg debugging and a flood ping triggered a couple of
warnings. Could you please look at this?

IPsec requests authenc(hmac(sha1),cbc(aes)), so right now it requests two
cesa-provided algorithms. A single ping results in around 30ms RTT.
Disabling hmac(sha1) gives me less than 1ms.
Implementing authenc() for IPsec should speed things up. Right now I'm
stuck with hacking DMA support.

For now I think lowering the priority of hmac() should fix the problem.
A direct request for "mv-hmac-sha1" should still return the mv driver.
What do you think?
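Something like this (a sketch; the priority value is made up, the
generic implementations sit at 100):

static struct ahash_alg mv_hmac_sha1_alg = {
	.halg.base = {
		.cra_name        = "hmac(sha1)",
		.cra_driver_name = "mv-hmac-sha1",
		/* Below the generic implementation, so a plain
		 * "hmac(sha1)" lookup (and thus authenc()) picks the
		 * software version, while an explicit request for
		 * "mv-hmac-sha1" still hits the CESA engine. */
		.cra_priority    = 50,
	},
};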

Need to run now

>Thanks,
>Uri.

Sebastian


Re: Bug#552270: Marvell CESA driver and Kirkwood

2010-04-21 Thread Sebastian Andrzej Siewior
* L.C. | 2010-04-20 22:45:19 [+0200]:

>Sebastian, here is the OOPS from the latest cryptodev git tree
>(2.6.33), more clue than I thought, it looks?:

No, I don't. I'll look at it this weekend. I just need to set up IPsec
in order to reproduce this, right?

>cut
>flip> ping 10.10.10.230
>PING 10.10.10.23[51252.081262] Unable to handle kernel NULL pointer
>dereference at virtual address 0034
>0 (10.10.10.230)[51252.090530] pgd = c0004000
> 56(84) bytes of[51252.094632] [0034] *pgd= data.
>
>[51252.100326] Internal error: Oops: 17 [#1]
>[51252.104351] last sysfs file: /sys/module/ccm/initstate
>[51252.109514] Modules linked in: xfrm_user ah6 ah4 esp6 esp4
>xfrm4_mode_beet xfrm4_tunnel xfrm4_mode_tunnel xfrm4_mode_transport
>xfrm6_mode_transport xfrm6_mode_ro xfrm6_mode_beet xfrm6_mode_tunnel
>ipcomp ipcomp6 xfrm6_tunnel af_key mv_cesa authenc ctr camellia cast5
>rmd160 crypto_null ccm serpent blowfish twofish twofish_common ecb
>xcbc cbc sha256_generic sha512_generic des_generic tunnel4
>xfrm_ipcomp tunnel6 autofs4 8021q garp stp ipv6 ext2 loop hmac
>sha1_generic aes_generic ext3 jbd mbcache mmc_block ehci_hcd mvsdio
>usbcore mv643xx_eth mmc_core nls_base libphy [last unloaded: af_key]
>[51252.162037] CPU: 0Not tainted  (2.6.33 #1)
>[51252.166509] PC is at xfrm_output_resume+0x140/0x35c
>[51252.171419] LR is at authenc_geniv_ahash_done+0x4c/0x50 [authenc]
>[51252.177541] pc : []lr : []psr: 6013
>[51252.177547] sp : c086bf48  ip : 0014  fp : 0178
>[51252.189084] r10: 0170  r9 : df4a7618  r8 : 
>[51252.194334] r7 :   r6 :   r5 :   r4 : df4a7600
>[51252.200891] r3 :   r2 : df621800  r1 :   r0 : df4a7600
>[51252.207449] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM
>Segment kernel
>[51252.214791] Control: 0005397f  Table: 0097c000  DAC: 0017
>[51252.220564] Process mv_crypto (pid: 4702, stack limit = 0xc086a270)
>[51252.226860] Stack: (0xc086bf48 to 0xc086c000)
>[51252.231242] bf40:   000c c017f364 c086bf60
>df6218a8 08ac c017f548
>[51252.239465] bf60: df621800 df621840  bf2a8d90 def796c0
>df6218f8 c086a000 
>[51252.247688] bf80: def796ec bf2872e8 0001  0100
> 0078 bf2a814c
>[51252.255912] bfa0: def796c0 def796ec a013 dfeede58 c086bfd4
>bf2a7fe4 def796c0 
>[51252.264134] bfc0:    c005a848 
> c086bfd8 c086bfd8
>[51252.272356] bfe0:     
>c0026dec  
>[51252.280591] [] (xfrm_output_resume+0x140/0x35c) from
>[] (authenc_geniv_ahash_done+0x4c/0x50 [authenc])
>[51252.291709] [] (authenc_geniv_ahash_done+0x4c/0x50
>[authenc]) from [] (queue_manag+0x168/0x4f0 [mv_cesa])
>[51252.303082] [] (queue_manag+0x168/0x4f0 [mv_cesa]) from
>[] (kthread+0x78/0x80)
>[51252.312091] [] (kthread+0x78/0x80) from []
>(kernel_thread_exit+0x0/0x8)
>[51252.320483] Code: e1a06000 ea7f e5947048 e356 (e5975034)
>[51252.326631] ---[ end trace 0fd0f54982ede5bf ]---
>[51252.331284] Kernel panic - not syncing: Fatal exception in interrupt
>[51252.337690] [] (unwind_backtrace+0x0/0xd8) from
>[] (panic+0x40/0x120)
>[51252.345914] [] (panic+0x40/0x120) from []
>(die+0x2d4/0x32c)
>[51252.353277] [] (die+0x2d4/0x32c) from []
>(__do_kernel_fault+0x64/0x88)
>[51252.361603] [] (__do_kernel_fault+0x64/0x88) from
>[] (do_page_fault+0x22c/0x248)
>[51252.370792] [] (do_page_fault+0x22c/0x248) from
>[] (do_DataAbort+0x34/0x94)
>[51252.379545] [] (do_DataAbort+0x34/0x94) from
>[] (__dabt_svc+0x4c/0x60)
>[51252.387857] Exception stack(0xc086bf00 to 0xc086bf48)
>[51252.392949] bf00: df4a7600  df621800  df4a7600
>  
>[51252.401180] bf20:  df4a7618 0170 0178 0014
>c086bf48 bf2872e8 c02a207c
>[51252.409404] bf40: 6013 
>[51252.412920] [] (__dabt_svc+0x4c/0x60) from []
>(xfrm_output_resume+0x140/0x35c)
>[51252.421941] [] (xfrm_output_resume+0x140/0x35c) from
>[] (authenc_geniv_ahash_done+0x4c/0x50 [authenc])
>[51252.433064] [] (authenc_geniv_ahash_done+0x4c/0x50
>[authenc]) from [] (queue_manag+0x168/0x4f0 [mv_cesa])
>[51252.50] [] (queue_manag+0x168/0x4f0 [mv_cesa]) from
>[] (kthread+0x78/0x80)
>[51252.453468] [] (kthread+0x78/0x80) from []
>(kernel_thread_exit+0x0/0x8)
>cut

Sebastian


Re: [PATCH] Crypto: geode-aes: Fix some code style issues

2010-04-16 Thread Sebastian Andrzej Siewior
* Chihau Chau | 2010-04-15 13:17:59 [-0400]:

>From: Chihau Chau 
>
>This fixes some code style issues like:

looks good

Sebastian


Re: [PATCH 3/7] crypto/testmgr: add testing for arc4 based on ecb(arc4)

2010-04-08 Thread Sebastian Andrzej Siewior
* Herbert Xu | 2010-04-07 17:29:07 [+0800]:

>Sebastian, how about precomputing the IV and provide them directly
>as a hex array?
>
>To test arc4_setup_iv itself, you can add an alg_test_arc4 function
>(like alg_test_crc32) that tests IV generation specifically.
>
>Alternatively, just add an alg_test_arc4 that computes the IV
>before calling alg_test_skcipher.

I'll take a look at this.

>Cheers,

Sebastian


Re: [PATCH v2] crypto: add blkcipher implementation of ARC4

2010-04-07 Thread Sebastian Andrzej Siewior
* Pavel Roskin | 2010-04-07 02:19:55 [-0400]:

>On Mon, 2010-04-05 at 19:04 +0200, Sebastian Andrzej Siewior wrote:
>
>> +module_init(arc4_init);
>> +module_exit(arc4_exit);
>
>I'm feeling uneasy about using the same module init/exit function
>names in arc4blk.c and arc4cip.c.
>
>Even though it doesn't break for me on x86_64 (whether I'm compiling
>modules or a solid kernel), and even though the potential name conflict
Take a look at
- sd_mod_init
- via_init
- watchdog_init

just to name a few. There is no conflict because those functions are not
global. The only problem you have is in the backtrace since you can't
distinguish them.
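To make the point concrete (a minimal sketch, not taken from either
file): both modules can carry the same file-local names without any
linker conflict.

#include <linux/module.h>
#include <linux/init.h>

/* arc4blk.c and arc4cip.c can each define their own static arc4_init();
 * the symbols are file-local, so the only downside is that a backtrace
 * cannot tell the two apart. */
static int __init arc4_init(void)
{
	return 0;
}

static void __exit arc4_exit(void)
{
}

module_init(arc4_init);
module_exit(arc4_exit);
MODULE_LICENSE("GPL");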

Sebastian


Re: [PATCH v2] crypto: add blkcipher implementation of ARC4

2010-04-07 Thread Sebastian Andrzej Siewior
* Herbert Xu | 2010-04-07 08:31:09 [+0800]:

>On Tue, Apr 06, 2010 at 10:30:02PM +0200, Sebastian Andrzej Siewior wrote:
>>
>> Good point. All arc4 users don't care about return value of setkey so I
>> think that I just change void to int add the check for the valid key
>> length.
>
>Actually, how about getting arc4_setup_iv to do all the legwork
>and turn it into a real IV? Then we don't need any checks on the
>data path.
So arc4_setup_iv() should do what the internal arc4_ivsetup() does, and
we change void to int and check the key size in there, right? The
problem here is that we are bound to *this* implementation of the
algorithm and are not able to replace it with a different
implementation. Not that this is likely to happen for RC4, but it may be
true for other stream ciphers.
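That would look roughly like this (a sketch with the check folded in,
using the arc4_key_to_iv() helper from the patch and the 1..256 bound
mentioned earlier):

int arc4_setup_iv(struct arc4_iv *iv, const u8 *in_key, u32 key_len)
{
	if (key_len < 1 || key_len > 256)
		return -EINVAL;

	/* Expand the key into the S-box state up front so the data path
	 * never has to look at the key again. */
	arc4_key_to_iv(in_key, key_len, iv);
	iv->type = ARC4_TYPE_IV;
	return 0;
}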

>> While we are here, the .setkey() callback could be removed, couldn't it?
>> It returns 0 even it is doing nothing what looks kinda wrong. However it
>> shouldn't be called at all since min/max key is 0. Any objections on
>> that?
>
>I'm pretty sure testmgr will call setkey even for keylen == 0, no?
Prior to patch #3 it has no test case, so it should not test it at all.
Patch #3 adds a flag in order to distinguish it. You want to look at
patch #3 now :)

>
>Thanks,

Sebastian


Re: [PATCH v2] crypto: add blkcipher implementation of ARC4

2010-04-06 Thread Sebastian Andrzej Siewior
* Herbert Xu | 2010-04-06 20:44:12 [+0800]:

>On Mon, Apr 05, 2010 at 07:04:06PM +0200, Sebastian Andrzej Siewior wrote:
>>
>> +static void arc4_key_to_iv(const u8 *in_key, u32 key_len, struct arc4_iv *iv)
>> +{
>> +int i, j = 0, k = 0;
>> +
>> +iv->iv.x = 1;
>> +iv->iv.y = 0;
>> +
>> +for (i = 0; i < 256; i++)
>> +iv->iv.S[i] = i;
>> +
>> +for (i = 0; i < 256; i++)
>> +{
>> +u8 a = iv->iv.S[i];
>> +j = (j + in_key[k] + a) & 0xff;
>> +iv->iv.S[i] = iv->iv.S[j];
>> +iv->iv.S[j] = a;
>> +if (++k >= key_len)
>> +k = 0;
>> +}
>> +}
>> +
>> +static void arc4_ivsetup(struct arc4_iv *iv)
>> +{
>> +struct arc4_iv tmp_iv;
>> +
>> +if (iv->type == ARC4_TYPE_IV)
>> +return;
>> +
>> +memcpy(&tmp_iv, iv, sizeof(tmp_iv));
>> +arc4_key_to_iv(tmp_iv.key.key, tmp_iv.key.key_len, iv);
>> +iv->type = ARC4_TYPE_IV;
>> +}
>
>We need to verify that 1 <= key_len <= 256.
Good point. None of the arc4 users care about the return value of
setkey, so I think I'll just change void to int and add the check for a
valid key length.

While we are here, the .setkey() callback could be removed, couldn't it?
It returns 0 even though it is doing nothing, which looks kinda wrong.
However, it shouldn't be called at all since the min/max key is 0. Any
objections to that?

>
>Cheers,

Sebastian


[PATCH v2] crypto: add blkcipher implementation of ARC4

2010-04-05 Thread Sebastian Andrzej Siewior
This is a pure blkcipher implementation of ARC4. The internal state is
saved within an IV which is supplied by the user. The goal is that the
cipher itself no longer changes its internal state; only the iv changes
during encryption.

Signed-off-by: Sebastian Andrzej Siewior 
---
 crypto/Kconfig        |  13 ++++++++
 crypto/Makefile       |   1 +
 crypto/arc4blk.c      | 150 ++++++++++++++++++++++++++++++++++++++++++++++++
 crypto/testmgr.h      |   3 +-
 include/crypto/arc4.h |  26 ++++++++++++
 5 files changed, 192 insertions(+), 1 deletions(-)
 create mode 100644 crypto/arc4blk.c
 create mode 100644 include/crypto/arc4.h

diff --git a/crypto/Kconfig b/crypto/Kconfig
index 81c185a..be9add2 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -570,6 +570,19 @@ config CRYPTO_ARC4
  WEP, but it should not be for other purposes because of the
  weakness of the algorithm.
 
+config CRYPTO_ARC4BLK
+   tristate "ARC4 cipher algorithm (alternative implementation)"
+   select CRYPTO_BLKCIPHER
+   help
+ ARC4 cipher algorithm. This is an alternative ARC4 implementation which
+ will replace the other ARC4 implementation once all in-kernel users are
+ converted.
+
+ ARC4 is a stream cipher using keys ranging from 8 bits to 2048
+ bits in length.  This algorithm is required for driver-based
+ WEP, but it should not be for other purposes because of the
+ weakness of the algorithm.
+
 config CRYPTO_BLOWFISH
tristate "Blowfish cipher algorithm"
select CRYPTO_ALGAPI
diff --git a/crypto/Makefile b/crypto/Makefile
index 1f15112..11300e3 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -68,6 +68,7 @@ obj-$(CONFIG_CRYPTO_CAMELLIA) += camellia.o
 obj-$(CONFIG_CRYPTO_CAST5) += cast5.o
 obj-$(CONFIG_CRYPTO_CAST6) += cast6.o
 obj-$(CONFIG_CRYPTO_ARC4) += arc4cip.o
+obj-$(CONFIG_CRYPTO_ARC4BLK) += arc4blk.o
 obj-$(CONFIG_CRYPTO_TEA) += tea.o
 obj-$(CONFIG_CRYPTO_KHAZAD) += khazad.o
 obj-$(CONFIG_CRYPTO_ANUBIS) += anubis.o
diff --git a/crypto/arc4blk.c b/crypto/arc4blk.c
new file mode 100644
index 000..bdf938a
--- /dev/null
+++ b/crypto/arc4blk.c
@@ -0,0 +1,150 @@
+/*
+ * Cryptographic API
+ *
+ * ARC4 Cipher Algorithm
+ *
+ * Jon Oberheide 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ */
+#include 
+#include 
+#include 
+#include 
+
+#define ARC4_MIN_KEY_SIZE  1
+#define ARC4_MAX_KEY_SIZE  256
+#define ARC4_BLOCK_SIZE1
+
+static int arc4_set_key(struct crypto_tfm *tfm, const u8 *in_key,
+   unsigned int key_len)
+{
+   /*
+* ARC4 is special: The user should supply an IV as struct arc4_iv and
+* fill either the key or the iv.
+*/
+   return 0;
+}
+
+static void arc4_key_to_iv(const u8 *in_key, u32 key_len, struct arc4_iv *iv)
+{
+   int i, j = 0, k = 0;
+
+   iv->iv.x = 1;
+   iv->iv.y = 0;
+
+   for (i = 0; i < 256; i++)
+   iv->iv.S[i] = i;
+
+   for (i = 0; i < 256; i++)
+   {
+   u8 a = iv->iv.S[i];
+   j = (j + in_key[k] + a) & 0xff;
+   iv->iv.S[i] = iv->iv.S[j];
+   iv->iv.S[j] = a;
+   if (++k >= key_len)
+   k = 0;
+   }
+}
+
+static void arc4_ivsetup(struct arc4_iv *iv)
+{
+   struct arc4_iv tmp_iv;
+
+   if (iv->type == ARC4_TYPE_IV)
+   return;
+
+   memcpy(&tmp_iv, iv, sizeof(tmp_iv));
+   arc4_key_to_iv(tmp_iv.key.key, tmp_iv.key.key_len, iv);
+   iv->type = ARC4_TYPE_IV;
+}
+
+static int arc4_crypt(struct blkcipher_desc *desc, struct scatterlist *dst,
+   struct scatterlist *src, unsigned int nbytes)
+{
+   struct blkcipher_walk walk;
+   struct arc4_iv *aiv;
+   u8 *S;
+   u8 x;
+   u8 y;
+   u8 a, b;
+   int ret;
+
+   blkcipher_walk_init(&walk, dst, src, nbytes);
+   ret = blkcipher_walk_virt(desc, &walk);
+   if (ret)
+   return ret;
+
+   aiv = (struct arc4_iv *)walk.iv;
+   arc4_ivsetup(aiv);
+
+   S = aiv->iv.S;
+   x = aiv->iv.x;
+   y = aiv->iv.y;
+
+   while (walk.nbytes) {
+   u8 *in = walk.src.virt.addr;
+   u8 *out = walk.dst.virt.addr;
+   u32 i;
+
+   for (i = 0; i < walk.nbytes; i++) {
+   a = S[x];
+   y = (y + a) & 0xff;
+   b = S[y];
+   S[x] = b;
+   S[y] = a;
+   x = (x + 1) & 0xff;
+   *out = *in ^ S[(a + b) & 0xff];
+
+   in++;
+   out++;
+   }
+   ret = blkcipher_walk_done(desc, &walk, 0);

[PATCH 6/7] net/mac80211: convert wep from arc4 to arc4blk

2010-04-03 Thread Sebastian Andrzej Siewior
ecb(arc4) is getting replaced by arc4 which is a blkcipher by itself.

Signed-off-by: Sebastian Andrzej Siewior 
---
 net/mac80211/Kconfig |3 +--
 net/mac80211/wep.c   |   11 +++
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/net/mac80211/Kconfig b/net/mac80211/Kconfig
index a10d508..d14fe06 100644
--- a/net/mac80211/Kconfig
+++ b/net/mac80211/Kconfig
@@ -2,8 +2,7 @@ config MAC80211
tristate "Generic IEEE 802.11 Networking Stack (mac80211)"
depends on CFG80211
select CRYPTO
-   select CRYPTO_ECB
-   select CRYPTO_ARC4
+   select CRYPTO_ARC4BLK
select CRYPTO_AES
select CRC32
---help---
diff --git a/net/mac80211/wep.c b/net/mac80211/wep.c
index 247123f..4314b50 100644
--- a/net/mac80211/wep.c
+++ b/net/mac80211/wep.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -29,12 +30,12 @@ int ieee80211_wep_init(struct ieee80211_local *local)
/* start WEP IV from a random value */
get_random_bytes(&local->wep_iv, WEP_IV_LEN);
 
-   local->wep_tx_tfm = crypto_alloc_blkcipher("ecb(arc4)", 0,
+   local->wep_tx_tfm = crypto_alloc_blkcipher("arc4", 0,
CRYPTO_ALG_ASYNC);
if (IS_ERR(local->wep_tx_tfm))
return PTR_ERR(local->wep_tx_tfm);
 
-   local->wep_rx_tfm = crypto_alloc_blkcipher("ecb(arc4)", 0,
+   local->wep_rx_tfm = crypto_alloc_blkcipher("arc4", 0,
CRYPTO_ALG_ASYNC);
if (IS_ERR(local->wep_rx_tfm)) {
crypto_free_blkcipher(local->wep_tx_tfm);
@@ -125,13 +126,14 @@ void ieee80211_wep_encrypt_data(struct crypto_blkcipher *tfm, u8 *rc4key,
size_t klen, u8 *data, size_t data_len)
 {
struct blkcipher_desc desc = { .tfm = tfm };
+   struct arc4_iv *iv = crypto_blkcipher_crt(tfm)->iv;
struct scatterlist sg;
__le32 icv;
 
icv = cpu_to_le32(~crc32_le(~0, data, data_len));
put_unaligned(icv, (__le32 *)(data + data_len));
 
-   crypto_blkcipher_setkey(tfm, rc4key, klen);
+   arc4_setup_iv(iv, rc4key, klen);
sg_init_one(&sg, data, data_len + WEP_ICV_LEN);
crypto_blkcipher_encrypt(&desc, &sg, &sg, sg.length);
 }
@@ -181,10 +183,11 @@ int ieee80211_wep_decrypt_data(struct crypto_blkcipher *tfm, u8 *rc4key,
   size_t klen, u8 *data, size_t data_len)
 {
struct blkcipher_desc desc = { .tfm = tfm };
+   struct arc4_iv *iv = crypto_blkcipher_crt(tfm)->iv;
struct scatterlist sg;
__le32 crc;
 
-   crypto_blkcipher_setkey(tfm, rc4key, klen);
+   arc4_setup_iv(iv, rc4key, klen);
sg_init_one(&sg, data, data_len + WEP_ICV_LEN);
crypto_blkcipher_decrypt(&desc, &sg, &sg, sg.length);
 
-- 
1.6.6



[PATCH 2/7] crypto: add blkcipher implementation of ARC4

2010-04-03 Thread Sebastian Andrzej Siewior
This is a pure blkcipher implementation of ARC4. The internal state is
saved within an IV which is supplied by the user. The goal is that the
cipher does not change its internal state now, only the iv changes during
encryption.

Cc: 
Signed-off-by: Sebastian Andrzej Siewior 
---
 crypto/Kconfig   |  13 ++++++++
 crypto/Makefile  |   1 +
 crypto/arc4blk.c | 150 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 crypto/testmgr.h |   3 +-
 4 files changed, 166 insertions(+), 1 deletions(-)
 create mode 100644 crypto/arc4blk.c

diff --git a/crypto/Kconfig b/crypto/Kconfig
index 81c185a..be9add2 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -570,6 +570,19 @@ config CRYPTO_ARC4
  WEP, but it should not be for other purposes because of the
  weakness of the algorithm.
 
+config CRYPTO_ARC4BLK
+   tristate "ARC4 cipher algorithm (alternative implementation)"
+   select CRYPTO_BLKCIPHER
+   help
+ ARC4 cipher algorithm. This is an alternative ARC4 implementation which
+ will replace the other ARC4 implementation once all in-kernel users are
+ converted.
+
+ ARC4 is a stream cipher using keys ranging from 8 bits to 2048
+ bits in length.  This algorithm is required for driver-based
+ WEP, but it should not be for other purposes because of the
+ weakness of the algorithm.
+
 config CRYPTO_BLOWFISH
tristate "Blowfish cipher algorithm"
select CRYPTO_ALGAPI
diff --git a/crypto/Makefile b/crypto/Makefile
index 1f15112..11300e3 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -68,6 +68,7 @@ obj-$(CONFIG_CRYPTO_CAMELLIA) += camellia.o
 obj-$(CONFIG_CRYPTO_CAST5) += cast5.o
 obj-$(CONFIG_CRYPTO_CAST6) += cast6.o
 obj-$(CONFIG_CRYPTO_ARC4) += arc4cip.o
+obj-$(CONFIG_CRYPTO_ARC4BLK) += arc4blk.o
 obj-$(CONFIG_CRYPTO_TEA) += tea.o
 obj-$(CONFIG_CRYPTO_KHAZAD) += khazad.o
 obj-$(CONFIG_CRYPTO_ANUBIS) += anubis.o
diff --git a/crypto/arc4blk.c b/crypto/arc4blk.c
new file mode 100644
index 000..bdf938a
--- /dev/null
+++ b/crypto/arc4blk.c
@@ -0,0 +1,150 @@
+/*
+ * Cryptographic API
+ *
+ * ARC4 Cipher Algorithm
+ *
+ * Jon Oberheide 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ */
+#include 
+#include 
+#include 
+#include 
+
+#define ARC4_MIN_KEY_SIZE  1
+#define ARC4_MAX_KEY_SIZE  256
+#define ARC4_BLOCK_SIZE1
+
+static int arc4_set_key(struct crypto_tfm *tfm, const u8 *in_key,
+   unsigned int key_len)
+{
+   /*
+* ARC4 is special: The user should supply an IV as struct arc4_iv and
+* fill either the key or the iv.
+*/
+   return 0;
+}
+
+static void arc4_key_to_iv(const u8 *in_key, u32 key_len, struct arc4_iv *iv)
+{
+   int i, j = 0, k = 0;
+
+   iv->iv.x = 1;
+   iv->iv.y = 0;
+
+   for (i = 0; i < 256; i++)
+   iv->iv.S[i] = i;
+
+   for (i = 0; i < 256; i++)
+   {
+   u8 a = iv->iv.S[i];
+   j = (j + in_key[k] + a) & 0xff;
+   iv->iv.S[i] = iv->iv.S[j];
+   iv->iv.S[j] = a;
+   if (++k >= key_len)
+   k = 0;
+   }
+}
+
+static void arc4_ivsetup(struct arc4_iv *iv)
+{
+   struct arc4_iv tmp_iv;
+
+   if (iv->type == ARC4_TYPE_IV)
+   return;
+
+   memcpy(&tmp_iv, iv, sizeof(tmp_iv));
+   arc4_key_to_iv(tmp_iv.key.key, tmp_iv.key.key_len, iv);
+   iv->type = ARC4_TYPE_IV;
+}
+
+static int arc4_crypt(struct blkcipher_desc *desc, struct scatterlist *dst,
+   struct scatterlist *src, unsigned int nbytes)
+{
+   struct blkcipher_walk walk;
+   struct arc4_iv *aiv;
+   u8 *S;
+   u8 x;
+   u8 y;
+   u8 a, b;
+   int ret;
+
+   blkcipher_walk_init(&walk, dst, src, nbytes);
+   ret = blkcipher_walk_virt(desc, &walk);
+   if (ret)
+   return ret;
+
+   aiv = (struct arc4_iv *)walk.iv;
+   arc4_ivsetup(aiv);
+
+   S = aiv->iv.S;
+   x = aiv->iv.x;
+   y = aiv->iv.y;
+
+   while (walk.nbytes) {
+   u8 *in = walk.src.virt.addr;
+   u8 *out = walk.dst.virt.addr;
+   u32 i;
+
+   for (i = 0; i < walk.nbytes; i++) {
+   a = S[x];
+   y = (y + a) & 0xff;
+   b = S[y];
+   S[x] = b;
+   S[y] = a;
+   x = (x + 1) & 0xff;
+   *out = *in ^ S[(a + b) & 0xff];
+
+   in++;
+   out++;
+   }
+   ret = blkcipher_walk_done(desc, &walk, 0);
+ 

[PATCH 7/7] net/ppp_mppe: convert from arc4 to arc4blk

2010-04-03 Thread Sebastian Andrzej Siewior
ecb(arc4) is getting replaced by arc4, which is a blkcipher by itself.

Cc: 
Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/net/Kconfig|3 +--
 drivers/net/ppp_mppe.c |   12 ++--
 2 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index dd9a09c..4b5dd86 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -3075,8 +3075,7 @@ config PPP_MPPE
depends on PPP && EXPERIMENTAL
select CRYPTO
select CRYPTO_SHA1
-   select CRYPTO_ARC4
-   select CRYPTO_ECB
+   select CRYPTO_ARC4BLK
---help---
  Support for the MPPE Encryption protocol, as employed by the
 Microsoft Point-to-Point Tunneling Protocol.
diff --git a/drivers/net/ppp_mppe.c b/drivers/net/ppp_mppe.c
index 6d1a1b8..4deaf70 100644
--- a/drivers/net/ppp_mppe.c
+++ b/drivers/net/ppp_mppe.c
@@ -42,7 +42,6 @@
  *MOD_DEC_USAGE_COUNT/MOD_INC_USAGE_COUNT which are
  *deprecated in 2.6
  */
-
 #include 
 #include 
 #include 
@@ -55,6 +54,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ppp_mppe.h"
 
@@ -162,11 +162,11 @@ static void mppe_rekey(struct ppp_mppe_state * state, int initial_key)
 {
struct scatterlist sg_in[1], sg_out[1];
struct blkcipher_desc desc = { .tfm = state->arc4 };
+   struct arc4_iv *iv = crypto_blkcipher_crt(state->arc4)->iv;
 
get_new_key_from_sha(state);
if (!initial_key) {
-   crypto_blkcipher_setkey(state->arc4, state->sha1_digest,
-   state->keylen);
+   arc4_setup_iv(iv, state->sha1_digest, state->keylen);
sg_init_table(sg_in, 1);
sg_init_table(sg_out, 1);
setup_sg(sg_in, state->sha1_digest, state->keylen);
@@ -184,7 +184,7 @@ static void mppe_rekey(struct ppp_mppe_state * state, int initial_key)
state->session_key[1] = 0x26;
state->session_key[2] = 0x9e;
}
-   crypto_blkcipher_setkey(state->arc4, state->session_key, state->keylen);
+   arc4_setup_iv(iv, state->session_key, state->keylen);
 }
 
 /*
@@ -204,7 +204,7 @@ static void *mppe_alloc(unsigned char *options, int optlen)
goto out;
 
 
-   state->arc4 = crypto_alloc_blkcipher("ecb(arc4)", 0, CRYPTO_ALG_ASYNC);
+   state->arc4 = crypto_alloc_blkcipher("arc4", 0, CRYPTO_ALG_ASYNC);
if (IS_ERR(state->arc4)) {
state->arc4 = NULL;
goto out_free;
@@ -712,7 +712,7 @@ static struct compressor ppp_mppe = {
 static int __init ppp_mppe_init(void)
 {
int answer;
-   if (!(crypto_has_blkcipher("ecb(arc4)", 0, CRYPTO_ALG_ASYNC) &&
+   if (!(crypto_has_blkcipher("arc4", 0, CRYPTO_ALG_ASYNC) &&
  crypto_has_hash("sha1", 0, CRYPTO_ALG_ASYNC)))
return -ENODEV;
 
-- 
1.6.6



[PATCH 3/7] crypto/testmgr: add testing for arc4 based on ecb(arc4)

2010-04-03 Thread Sebastian Andrzej Siewior
Signed-off-by: Sebastian Andrzej Siewior 
---
 crypto/testmgr.c |  60 +++++++++++++++++++++++++++++++++++----------
 1 files changed, 50 insertions(+), 10 deletions(-)

diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index 7620bfc..c471e04 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "internal.h"
 #include "testmgr.h"
@@ -44,6 +45,7 @@
 /*
 * Used by test_cipher()
 */
+#define CRYPT_ARC4 2
 #define ENCRYPT 1
 #define DECRYPT 0
 
@@ -717,7 +719,7 @@ out_nobuf:
return ret;
 }
 
-static int test_skcipher(struct crypto_ablkcipher *tfm, int enc,
+static int test_skcipher(struct crypto_ablkcipher *tfm, int mode,
 struct cipher_testvec *template, unsigned int tcount)
 {
const char *algo =
@@ -736,7 +738,7 @@ static int test_skcipher(struct crypto_ablkcipher *tfm, int enc,
if (testmgr_alloc_buf(xbuf))
goto out_nobuf;
 
-   if (enc == ENCRYPT)
+   if (mode == ENCRYPT)
e = "encryption";
else
e = "decryption";
@@ -775,7 +777,11 @@ static int test_skcipher(struct crypto_ablkcipher *tfm, int enc,
crypto_ablkcipher_set_flags(
tfm, CRYPTO_TFM_REQ_WEAK_KEY);
 
-   ret = crypto_ablkcipher_setkey(tfm, template[i].key,
+   if (mode == CRYPT_ARC4)
+   arc4_setup_iv((struct arc4_iv *)iv,
+   template[i].key, template[i].klen);
+   else
+   ret = crypto_ablkcipher_setkey(tfm, template[i].key,
   template[i].klen);
if (!ret == template[i].fail) {
printk(KERN_ERR "alg: skcipher: setkey failed "
@@ -789,7 +795,7 @@ static int test_skcipher(struct crypto_ablkcipher *tfm, int enc,
 
ablkcipher_request_set_crypt(req, sg, sg,
 template[i].ilen, iv);
-   ret = enc ?
+   ret = mode ?
crypto_ablkcipher_encrypt(req) :
crypto_ablkcipher_decrypt(req);
 
@@ -839,7 +845,11 @@ static int test_skcipher(struct crypto_ablkcipher *tfm, int enc,
crypto_ablkcipher_set_flags(
tfm, CRYPTO_TFM_REQ_WEAK_KEY);
 
-   ret = crypto_ablkcipher_setkey(tfm, template[i].key,
+   if (mode == CRYPT_ARC4)
+   arc4_setup_iv((struct arc4_iv *)iv,
+   template[i].key, template[i].klen);
+   else
+   ret = crypto_ablkcipher_setkey(tfm, template[i].key,
   template[i].klen);
if (!ret == template[i].fail) {
printk(KERN_ERR "alg: skcipher: setkey failed "
@@ -876,7 +886,7 @@ static int test_skcipher(struct crypto_ablkcipher *tfm, int enc,
ablkcipher_request_set_crypt(req, sg, sg,
template[i].ilen, iv);
 
-   ret = enc ?
+   ret = mode ?
crypto_ablkcipher_encrypt(req) :
crypto_ablkcipher_decrypt(req);
 
@@ -1316,11 +1326,12 @@ out:
return err;
 }
 
-static int alg_test_skcipher(const struct alg_test_desc *desc,
-const char *driver, u32 type, u32 mask)
+static int _alg_test_skcipher(const struct alg_test_desc *desc,
+const char *driver, u32 type, u32 mask, u32 arc4)
 {
struct crypto_ablkcipher *tfm;
int err = 0;
+   int mode;
 
tfm = crypto_alloc_ablkcipher(driver, type, mask);
if (IS_ERR(tfm)) {
@@ -1329,15 +1340,17 @@ static int alg_test_skcipher(const struct alg_test_desc *desc,
return PTR_ERR(tfm);
}
 
+   mode = arc4 ? CRYPT_ARC4 : ENCRYPT;
if (desc->suite.cipher.enc.vecs) {
-   err = test_skcipher(tfm, ENCRYPT, desc->suite.cipher.enc.vecs,
+   err = test_skcipher(tfm, mode , desc->suite.cipher.enc.vecs,
desc->suite.cipher.enc.count);
if (err)
goto out;
}
 
+   mode = arc4 ? CRYPT_ARC4 : DECRYPT;
if (desc->suite.cipher.dec.vecs)
-   err = test_skcipher(tfm, DECRYPT, desc->suite.cipher.dec.vecs,
+   err = test_skcipher(tfm, mode, desc->suite.cipher.dec.vecs,
desc->suite.cipher.dec.count);

[PATCH 4/7] net/wireless: switch lib80211_crypt_tkip from arc4 to arc4blk

2010-04-03 Thread Sebastian Andrzej Siewior
ecb(arc4) is getting replaced by arc4, which is a blkcipher by itself.
The required selects are now pulled in by LIB80211_CRYPT_TKIP instead of
being selected by every driver.

Signed-off-by: Sebastian Andrzej Siewior 
---
 net/wireless/Kconfig   |2 ++
 net/wireless/lib80211_crypt_tkip.c |   11 +++
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/net/wireless/Kconfig b/net/wireless/Kconfig
index 90e93a5..119f1eb 100644
--- a/net/wireless/Kconfig
+++ b/net/wireless/Kconfig
@@ -147,6 +147,8 @@ config LIB80211_CRYPT_CCMP
tristate
 
 config LIB80211_CRYPT_TKIP
+   select CRYPTO
+   select CRYPTO_ARC4BLK
tristate
 
 config LIB80211_DEBUG
diff --git a/net/wireless/lib80211_crypt_tkip.c b/net/wireless/lib80211_crypt_tkip.c
index c362873..089f84f 100644
--- a/net/wireless/lib80211_crypt_tkip.c
+++ b/net/wireless/lib80211_crypt_tkip.c
@@ -28,6 +28,7 @@
 #include 
 
 #include 
+#include <crypto/arc4.h>
 #include 
 
 #include 
@@ -94,7 +95,7 @@ static void *lib80211_tkip_init(int key_idx)
 
priv->key_idx = key_idx;
 
-   priv->tx_tfm_arc4 = crypto_alloc_blkcipher("ecb(arc4)", 0,
+   priv->tx_tfm_arc4 = crypto_alloc_blkcipher("arc4", 0,
CRYPTO_ALG_ASYNC);
if (IS_ERR(priv->tx_tfm_arc4)) {
printk(KERN_DEBUG "lib80211_crypt_tkip: could not allocate "
@@ -112,7 +113,7 @@ static void *lib80211_tkip_init(int key_idx)
goto fail;
}
 
-   priv->rx_tfm_arc4 = crypto_alloc_blkcipher("ecb(arc4)", 0,
+   priv->rx_tfm_arc4 = crypto_alloc_blkcipher("arc4", 0,
CRYPTO_ALG_ASYNC);
if (IS_ERR(priv->rx_tfm_arc4)) {
printk(KERN_DEBUG "lib80211_crypt_tkip: could not allocate "
@@ -360,6 +361,7 @@ static int lib80211_tkip_encrypt(struct sk_buff *skb, int hdr_len, void *priv)
 {
struct lib80211_tkip_data *tkey = priv;
struct blkcipher_desc desc = { .tfm = tkey->tx_tfm_arc4 };
+   struct arc4_iv *iv = crypto_blkcipher_crt(tkey->tx_tfm_arc4)->iv;
int len;
u8 rc4key[16], *pos, *icv;
u32 crc;
@@ -392,7 +394,7 @@ static int lib80211_tkip_encrypt(struct sk_buff *skb, int hdr_len, void *priv)
icv[2] = crc >> 16;
icv[3] = crc >> 24;
 
-   crypto_blkcipher_setkey(tkey->tx_tfm_arc4, rc4key, 16);
+   arc4_setup_iv(iv, rc4key, 16);
sg_init_one(&sg, pos, len + 4);
return crypto_blkcipher_encrypt(&desc, &sg, &sg, len + 4);
 }
@@ -414,6 +416,7 @@ static int lib80211_tkip_decrypt(struct sk_buff *skb, int hdr_len, void *priv)
 {
struct lib80211_tkip_data *tkey = priv;
struct blkcipher_desc desc = { .tfm = tkey->rx_tfm_arc4 };
+   struct arc4_iv *iv = crypto_blkcipher_crt(tkey->rx_tfm_arc4)->iv;
u8 rc4key[16];
u8 keyidx, *pos;
u32 iv32;
@@ -485,7 +488,7 @@ static int lib80211_tkip_decrypt(struct sk_buff *skb, int hdr_len, void *priv)
 
plen = skb->len - hdr_len - 12;
 
-   crypto_blkcipher_setkey(tkey->rx_tfm_arc4, rc4key, 16);
+   arc4_setup_iv(iv, rc4key, 16);
sg_init_one(&sg, pos, plen + 4);
if (crypto_blkcipher_decrypt(&desc, &sg, &sg, plen + 4)) {
if (net_ratelimit()) {
-- 
1.6.6



[PATCH 5/7] net/wireless: switch lib80211_crypt_wep from arc4 to arc4blk

2010-04-03 Thread Sebastian Andrzej Siewior
ecb(arc4) is getting replaced by arc4, which is a blkcipher by itself. The
required selects are now pulled in by LIB80211_CRYPT_WEP instead of
being selected by every driver. Since there is no longer a dependency on
ecb and arc4, these selects are removed from the individual drivers.

Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/net/wireless/hostap/Kconfig  |3 ---
 drivers/net/wireless/ipw2x00/Kconfig |3 ---
 net/wireless/Kconfig |2 ++
 net/wireless/lib80211_crypt_wep.c|   11 +++
 4 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/drivers/net/wireless/hostap/Kconfig b/drivers/net/wireless/hostap/Kconfig
index 287d827..dab2c6b 100644
--- a/drivers/net/wireless/hostap/Kconfig
+++ b/drivers/net/wireless/hostap/Kconfig
@@ -4,11 +4,8 @@ config HOSTAP
select WEXT_SPY
select WEXT_PRIV
select CRYPTO
-   select CRYPTO_ARC4
-   select CRYPTO_ECB
select CRYPTO_AES
select CRYPTO_MICHAEL_MIC
-   select CRYPTO_ECB
select CRC32
select LIB80211
select LIB80211_CRYPT_WEP
diff --git a/drivers/net/wireless/ipw2x00/Kconfig b/drivers/net/wireless/ipw2x00/Kconfig
index 2715b10..6fe1995 100644
--- a/drivers/net/wireless/ipw2x00/Kconfig
+++ b/drivers/net/wireless/ipw2x00/Kconfig
@@ -158,11 +158,8 @@ config LIBIPW
select WIRELESS_EXT
select WEXT_SPY
select CRYPTO
-   select CRYPTO_ARC4
-   select CRYPTO_ECB
select CRYPTO_AES
select CRYPTO_MICHAEL_MIC
-   select CRYPTO_ECB
select CRC32
select LIB80211
select LIB80211_CRYPT_WEP
diff --git a/net/wireless/Kconfig b/net/wireless/Kconfig
index 119f1eb..267eb42 100644
--- a/net/wireless/Kconfig
+++ b/net/wireless/Kconfig
@@ -141,6 +141,8 @@ config LIB80211
  you want this built into your kernel.
 
 config LIB80211_CRYPT_WEP
+   select CRYPTO
+   select CRYPTO_ARC4BLK
tristate
 
 config LIB80211_CRYPT_CCMP
diff --git a/net/wireless/lib80211_crypt_wep.c b/net/wireless/lib80211_crypt_wep.c
index 6d41e05..3759e46 100644
--- a/net/wireless/lib80211_crypt_wep.c
+++ b/net/wireless/lib80211_crypt_wep.c
@@ -23,6 +23,7 @@
 #include 
 
 #include 
+#include <crypto/arc4.h>
 #include 
 
 MODULE_AUTHOR("Jouni Malinen");
@@ -48,7 +49,7 @@ static void *lib80211_wep_init(int keyidx)
goto fail;
priv->key_idx = keyidx;
 
-   priv->tx_tfm = crypto_alloc_blkcipher("ecb(arc4)", 0, CRYPTO_ALG_ASYNC);
+   priv->tx_tfm = crypto_alloc_blkcipher("arc4", 0, CRYPTO_ALG_ASYNC);
if (IS_ERR(priv->tx_tfm)) {
printk(KERN_DEBUG "lib80211_crypt_wep: could not allocate "
   "crypto API arc4\n");
@@ -56,7 +57,7 @@ static void *lib80211_wep_init(int keyidx)
goto fail;
}
 
-   priv->rx_tfm = crypto_alloc_blkcipher("ecb(arc4)", 0, CRYPTO_ALG_ASYNC);
+   priv->rx_tfm = crypto_alloc_blkcipher("arc4", 0, CRYPTO_ALG_ASYNC);
if (IS_ERR(priv->rx_tfm)) {
printk(KERN_DEBUG "lib80211_crypt_wep: could not allocate "
   "crypto API arc4\n");
@@ -139,6 +140,7 @@ static int lib80211_wep_encrypt(struct sk_buff *skb, int hdr_len, void *priv)
 {
struct lib80211_wep_data *wep = priv;
struct blkcipher_desc desc = { .tfm = wep->tx_tfm };
+   struct arc4_iv *iv = crypto_blkcipher_crt(wep->tx_tfm)->iv;
u32 crc, klen, len;
u8 *pos, *icv;
struct scatterlist sg;
@@ -170,7 +172,7 @@ static int lib80211_wep_encrypt(struct sk_buff *skb, int hdr_len, void *priv)
icv[2] = crc >> 16;
icv[3] = crc >> 24;
 
-   crypto_blkcipher_setkey(wep->tx_tfm, key, klen);
+   arc4_setup_iv(iv, key, klen);
sg_init_one(&sg, pos, len + 4);
return crypto_blkcipher_encrypt(&desc, &sg, &sg, len + 4);
 }
@@ -186,6 +188,7 @@ static int lib80211_wep_decrypt(struct sk_buff *skb, int hdr_len, void *priv)
 {
struct lib80211_wep_data *wep = priv;
struct blkcipher_desc desc = { .tfm = wep->rx_tfm };
+   struct arc4_iv *iv = crypto_blkcipher_crt(wep->rx_tfm)->iv;
u32 crc, klen, plen;
u8 key[WEP_KEY_LEN + 3];
u8 keyidx, *pos, icv[4];
@@ -210,7 +213,7 @@ static int lib80211_wep_decrypt(struct sk_buff *skb, int hdr_len, void *priv)
/* Apply RC4 to data and compute CRC32 over decrypted data */
plen = skb->len - hdr_len - 8;
 
-   crypto_blkcipher_setkey(wep->rx_tfm, key, klen);
+   arc4_setup_iv(iv, key, klen);
sg_init_one(&sg, pos, plen + 4);
if (crypto_blkcipher_decrypt(&desc, &sg, &sg, plen + 4))
return -7;
-- 
1.6.6



Convert arc4 from a cipher into a block cipher

2010-04-03 Thread Sebastian Andrzej Siewior
This patch series converts arc4 into a block cipher and converts all its
users (except those in staging) to use it. The first two patches ensure
that the two implementations can coexist; the following patches convert
each user so the series remains bisectable.
- lib80211_crypt_tkip was tested with ipw2200
- mac80211 was tested with zd1211rw

Sebastian



[PATCH 1/7] crypto: rename arc4

2010-04-03 Thread Sebastian Andrzej Siewior
The blk version of arc4 is coming. The rename ensures that a request
for arc4 loads both modules: this one and the new blk edition.

Cc: 
Signed-off-by: Sebastian Andrzej Siewior 
---
 crypto/Makefile  |2 +-
 crypto/{arc4.c => arc4cip.c} |1 +
 2 files changed, 2 insertions(+), 1 deletions(-)
 rename crypto/{arc4.c => arc4cip.c} (98%)

diff --git a/crypto/Makefile b/crypto/Makefile
index 9e8f619..1f15112 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -67,7 +67,7 @@ obj-$(CONFIG_CRYPTO_AES) += aes_generic.o
 obj-$(CONFIG_CRYPTO_CAMELLIA) += camellia.o
 obj-$(CONFIG_CRYPTO_CAST5) += cast5.o
 obj-$(CONFIG_CRYPTO_CAST6) += cast6.o
-obj-$(CONFIG_CRYPTO_ARC4) += arc4.o
+obj-$(CONFIG_CRYPTO_ARC4) += arc4cip.o
 obj-$(CONFIG_CRYPTO_TEA) += tea.o
 obj-$(CONFIG_CRYPTO_KHAZAD) += khazad.o
 obj-$(CONFIG_CRYPTO_ANUBIS) += anubis.o
diff --git a/crypto/arc4.c b/crypto/arc4cip.c
similarity index 98%
rename from crypto/arc4.c
rename to crypto/arc4cip.c
index 8be47e1..bf04659 100644
--- a/crypto/arc4.c
+++ b/crypto/arc4cip.c
@@ -102,3 +102,4 @@ module_exit(arc4_exit);
 MODULE_LICENSE("GPL");
 MODULE_DESCRIPTION("ARC4 Cipher Algorithm");
 MODULE_AUTHOR("Jon Oberheide ");
+MODULE_ALIAS("arc4");
-- 
1.6.6



Re: [PATCH] crypto/arc4: convert this stream cipher into a block cipher

2010-03-14 Thread Sebastian Andrzej Siewior
* Herbert Xu | 2010-02-23 08:32:39 [+0800]:

>If you can find a way that allows arc4 to be used by multiple
>threads at the same time while storing less than 258 bytes in
>each thread, please let me know :)
:)

>No, what you could do is structure the IV differently based on the
>flag:
>
>struct arc4_iv {
>   union {
>   struct key {
>   u8 key[256];
>   u16 keylen;
>   };
>   struct iv {
>   u8 S[256];
>   u8 x, y;
>   };
>   };
>   u8 type;
>};
>
>This relies on the fact that we never use more than 256 bytes in
>the key so limiting its length is OK.

Okay. So are we talking about something like the below then? This is
untested and it breaks all other users except lib80211_crypt_tkip.

The state has been moved from the ctx into the IV. That way
encrypt()/decrypt() can deliver the same result for a given IV. If the
IV is supplied as a plain key, it will be converted into a different
internal state.
The name is now arc4.

Signed-off-by: Sebastian Andrzej Siewior 
---
 crypto/Kconfig   |2 +-
 crypto/arc4.c|  131 +++---
 crypto/testmgr.h |3 +-
 drivers/net/Kconfig  |1 -
 drivers/net/wireless/hostap/Kconfig  |2 -
 drivers/net/wireless/ipw2x00/Kconfig |2 -
 include/crypto/arc4.h|   26 +++
 net/mac80211/Kconfig |1 -
 net/wireless/lib80211_crypt_tkip.c   |   10 ++-
 9 files changed, 123 insertions(+), 55 deletions(-)
 create mode 100644 include/crypto/arc4.h

diff --git a/crypto/Kconfig b/crypto/Kconfig
index 81c185a..5fab1c3 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -561,7 +561,7 @@ config CRYPTO_ANUBIS
 
 config CRYPTO_ARC4
tristate "ARC4 cipher algorithm"
-   select CRYPTO_ALGAPI
+   select CRYPTO_BLKCIPHER
help
  ARC4 cipher algorithm.
 
diff --git a/crypto/arc4.c b/crypto/arc4.c
index 8be47e1..1b20463 100644
--- a/crypto/arc4.c
+++ b/crypto/arc4.c
@@ -1,4 +1,4 @@
-/* 
+/*
  * Cryptographic API
  *
  * ARC4 Cipher Algorithm
@@ -13,76 +13,122 @@
  */
 #include 
 #include 
-#include 
+#include <crypto/algapi.h>
+#include <crypto/arc4.h>
 
 #define ARC4_MIN_KEY_SIZE  1
 #define ARC4_MAX_KEY_SIZE  256
 #define ARC4_BLOCK_SIZE1
 
-struct arc4_ctx {
-   u8 S[256];
-   u8 x, y;
-};
-
 static int arc4_set_key(struct crypto_tfm *tfm, const u8 *in_key,
unsigned int key_len)
 {
-   struct arc4_ctx *ctx = crypto_tfm_ctx(tfm);
+   /*
+    * ARC4 is special: the user should supply an IV as struct arc4_iv and
+    * fill either the key or the iv.
+    */
+   return -EOPNOTSUPP;
+}
+
+static void arc4_key_to_iv(const u8 *in_key, u32 key_len, struct arc4_iv *iv)
+{
int i, j = 0, k = 0;
 
-   ctx->x = 1;
-   ctx->y = 0;
+   iv->iv.x = 1;
+   iv->iv.y = 0;
 
-   for(i = 0; i < 256; i++)
-   ctx->S[i] = i;
+   for (i = 0; i < 256; i++)
+   iv->iv.S[i] = i;
 
-   for(i = 0; i < 256; i++)
+   for (i = 0; i < 256; i++)
{
-   u8 a = ctx->S[i];
+   u8 a = iv->iv.S[i];
j = (j + in_key[k] + a) & 0xff;
-   ctx->S[i] = ctx->S[j];
-   ctx->S[j] = a;
-   if(++k >= key_len)
+   iv->iv.S[i] = iv->iv.S[j];
+   iv->iv.S[j] = a;
+   if (++k >= key_len)
k = 0;
}
-
-   return 0;
 }
 
-static void arc4_crypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+static void arc4_ivsetup(struct arc4_iv *iv)
 {
-   struct arc4_ctx *ctx = crypto_tfm_ctx(tfm);
+   struct arc4_iv tmp_iv;
 
-   u8 *const S = ctx->S;
-   u8 x = ctx->x;
-   u8 y = ctx->y;
-   u8 a, b;
+   if (iv->type == ARC4_TYPE_IV)
+   return;
 
-   a = S[x];
-   y = (y + a) & 0xff;
-   b = S[y];
-   S[x] = b;
-   S[y] = a;
-   x = (x + 1) & 0xff;
-   *out++ = *in ^ S[(a + b) & 0xff];
+   memcpy(&tmp_iv, iv, sizeof(tmp_iv));
+   arc4_key_to_iv(tmp_iv.key.key, tmp_iv.key.key_len, iv);
+   iv->type = ARC4_TYPE_IV;
+}
+
+static int arc4_crypt(struct blkcipher_desc *desc, struct scatterlist *dst,
+   struct scatterlist *src, unsigned int nbytes)
+{
+   struct blkcipher_walk walk;
+   struct arc4_iv *aiv;
+   u8 *S;
+   u8 x;
+   u8 y;
+   u8 a, b;
+   int ret;
+
+   blkcipher_walk_init(&walk, dst, src, nbytes);
+   ret = blkcipher_walk_virt(desc, &walk);
+   if (ret)
+   return ret;
+
+   aiv = (struct arc4_iv *)walk.iv;
+   arc4_ivsetup(aiv);
+
+   S =

Re: [PATCH] crypto/arc4: convert this stream cipher into a block cipher

2010-02-22 Thread Sebastian Andrzej Siewior
* Herbert Xu | 2010-02-22 08:52:17 [+0800]:

>On Mon, Feb 22, 2010 at 08:45:47AM +0800, Herbert Xu wrote:
>> 
>> How about this? You extend the IV by one more byte, and use that
>> byte as a boolean flag to indicate whether the IV is valid.  All
So I trick the crypto API into allocating more bytes than ->ivsize says.

>> users that cannot supply their own IVs can then set the IV to zero.
which works fine with kzalloc()

>> When you see the zero flag in the IV, you reinitialise the IV per
>> the key.
Okay. When we have to re-key and the user calls setkey() without
re-allocating the cipher, I would not notice this. So I need a counter.
All this will make it work, but I still think it is fishy. Plus we
waste 258 bytes.

>In fact for arc4 we could just drop the key altogether since it
>plays no part after setting the initial state.
Since I'm not allowed to kfree() the ctx in encrypt(), are you proposing
tfm->setup_iv(iv, key)?

>> > salsa also does not stick to plan here. ctx->input[6-9] is initialized
>> > in encrypt() path. So two threads sharing a ctx are going to clobber
>> > their state.
>> 
>> Salsa should also be fixed.
I saw that coming. And I complained back then that the assembly code was
not pretty enough... and removing the assembly is probably not an option :)

>For Salsa on the other hand the key is rather useful since all
>we need is a two-byte IV that's just a sequence number.
No, it's 8 bytes. Bernstein's U8TO32_LITTLE() is actually a cpu_to_be32().
Not sure if he knows it :)
However, I'm not sure where you're going with this. Salsa is fine besides
the clobber thing, isn't it?

>Cheers,

Sebastian


Re: [PATCH] crypto/arc4: convert this stream cipher into a block cipher

2010-02-21 Thread Sebastian Andrzej Siewior
* Herbert Xu | 2010-02-16 20:51:25 [+0800]:

>On Fri, Feb 12, 2010 at 09:42:28AM +0100, Sebastian Andrzej Siewior wrote:
>>
>> -static void arc4_crypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
>> +static void arc4_ivsetup(struct arc4_ctx *ctx, u8 *iv)
>>  {
>> -struct arc4_ctx *ctx = crypto_tfm_ctx(tfm);
>> +if (unlikely(!ctx->new_key))
>> +return;
>> +memcpy(iv, &ctx->iv, sizeof(ctx->iv));
>> +ctx->new_key = 0;
>
>Sorry, but this doesn't work.
>
>A ctx is supposed to be reentrant.  That is, while one thread
>is working away with a given ctx I should be able to use that
>same ctx in a different thread without them clobbering each
>other.
I also destroy the user-supplied IV. You don't care about that? :)
So I have to know that someone called setkey() on this ctx, but I can't
leave hints.
Salsa also does not stick to the plan here: ctx->input[6-9] is initialized
in the encrypt() path, so two threads sharing a ctx are going to clobber
each other's state.

What about a new API for stream ciphers? We would merge the ctx part
and the IV into one handle, so the user would call setup_iv() instead of
setkey(). The difference would be that I can access the IV from within
setkey(). And the algorithm could fully express itself since it is no
longer trapped in the wrong body :)
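
For illustration, a minimal sketch of what such a merged handle could
look like (the names are hypothetical, nothing like this exists in the
crypto API; the key expansion is just the usual RC4 KSA):

struct stream_handle {
	unsigned char S[256];	/* running RC4 state */
	unsigned char x, y;
};

/* would replace setkey(): expands the key straight into the state */
static void stream_setup_iv(struct stream_handle *h,
			    const unsigned char *key, unsigned int klen)
{
	unsigned int i, j = 0, k = 0;

	h->x = 1;
	h->y = 0;
	for (i = 0; i < 256; i++)
		h->S[i] = (unsigned char)i;
	for (i = 0; i < 256; i++) {
		unsigned char a = h->S[i];

		j = (j + key[k] + a) & 0xff;
		h->S[i] = h->S[j];
		h->S[j] = a;
		if (++k >= klen)
			k = 0;
	}
}

An encrypt() entry point would then read and update S/x/y in the same
handle, which is what the blkcipher conversion emulates by keeping that
state in the IV.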

>So that means (in general) you must not modify the ctx in any
>function other than setkey.
That is hard because I have a new state after encryption which I am only
allowed to save in the IV. And the new state may be reset in setkey(),
where I can't touch the IV.
Salsa does not keep/update its state, so the input[6-9] problem could be
fixed. Who/where is it used anyway? I can't see any user besides the
possible ones (i.e. dm-crypt/ipsec).

>This also brings up the bigger question of how we transition to
>this new arc4.  I don't think we need to maintain exactly the
>same behaviour as the existing ecb(arc4).
>
>So what we could do is simply add a new blkcipher arc4, alongside
>the existing cipher arc4.  Then we can convert the existing users
>across, and finally remove the old arc4.
This has worked out before, let's stick to it :)

>Cheers,

Sebastian


Re: [PATCH] to fix vmac test fails on s390

2010-02-21 Thread Sebastian Andrzej Siewior
* Wang, Shane | 2010-02-21 13:32:49 [+0800]:

>--- a/crypto/vmac.c	Thu Feb 11 00:45:57 2010 -0800
>+++ b/crypto/vmac.c	Sun Feb 21 02:23:01 2010 -0800
>@@ -42,6 +42,8 @@ const u64 m63   = UINT64_C(0x7fffffffffffffff);
> const u64 m63   = UINT64_C(0x7fffffffffffffff);  /* 63-bit mask   */
> const u64 m64   = UINT64_C(0xffffffffffffffff);  /* 64-bit mask   */
> const u64 mpoly = UINT64_C(0x1fffffff1fffffff);  /* Poly key mask */
>+
>+#define pe64_to_cpup le64_to_cpup /* Prefer little endian */

Does this mean that I can switch it to be64_to_cpup?

>@@ -575,6 +572,10 @@ static int vmac_final(struct shash_desc 
>   u8 nonce[16] = {};
> 
>   mac = vmac(NULL, 0, nonce, NULL, ctx);
>+
>+  /* set output invariant considering endianness */
>+  mac = le64_to_cpup(&mac);
So this is the fix. It would look better if you included this swap in
vmac() itself. I'm not sure, but this is probably causing a pointer
dereference which could be avoided.
sparse should catch conversion bugs like this one if you were using
types like __be64.
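
As a minimal sketch of that suggestion (a toy helper, not vmac itself):
keeping such values in sparse-annotated types turns the missing
conversion into a build-time warning instead of an s390 test failure.

#include <linux/types.h>
#include <asm/byteorder.h>

/* the __le64 annotation makes sparse complain wherever the value is
 * used as a native u64 without le64_to_cpu()/le64_to_cpup()
 */
static inline u64 mac_to_cpu(const __le64 *mac)
{
	return le64_to_cpup(mac);	/* explicit conversion, sparse is happy */
}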

Sebastian


Re: [PATCH] crypto/arc4: convert this stream cipher into a block cipher

2010-02-15 Thread Sebastian Andrzej Siewior
* Herbert Xu | 2010-02-15 08:10:08 [+0800]:

>How about we just remove it? It's not on a hot path anyway.
Sure.

>I can do this when integrating the patch so you don't have to
>resend.
Okay, thanks.

>Thanks,

Sebastian


Re: [PATCH] crypto/arc4: convert this stream cipher into a block cipher

2010-02-14 Thread Sebastian Andrzej Siewior
* Sebastian Andrzej Siewior | 2010-02-12 09:42:28 [+0100]:

>+static void arc4_ivsetup(struct arc4_ctx *ctx, u8 *iv)
> {
>-  struct arc4_ctx *ctx = crypto_tfm_ctx(tfm);
>+  if (unlikely(!ctx->new_key))

That should be likely(). Do you want me to resend the whole thing? I
haven't noticed anything else :)

>+  return;
>+  memcpy(iv, &ctx->iv, sizeof(ctx->iv));
>+  ctx->new_key = 0;
>+}

Sebastian


Re: [PATCH] crypto/arc4: convert this stream cipher into a block cipher

2010-02-12 Thread Sebastian Andrzej Siewior
* Adrian-Ken Rueegsegger | 2010-02-12 10:34:27 [+0100]:

>Hi,
Hi,

>Sebastian Andrzej Siewior schrieb:
>> The name is still ecb(aes) but since this is provided by the blkcipher 
>> itself,
>Just to avoid any confusion you meant ecb(arc4) not ecb(aes) here right?

Yes, I do. Not sure how I got aes in there. I did not mix it up in the
patch :)

>
>-Adrian

Sebastian


[PATCH] crypto/arc4: convert this stream cipher into a block cipher

2010-02-12 Thread Sebastian Andrzej Siewior
The state has been moved from the ctx into the IV. That way
encrypt()/decrypt() can deliver the same result for a given IV. This
patch makes the cipher work with dm-crypt, not that this is a good
thing. However, the performance may have improved :)
The name is still ecb(aes) but since this is provided by the blkcipher
itself, I removed the select statement.

Signed-off-by: Sebastian Andrzej Siewior 
---
I ran it with wireless and dm-crypt; no problems so far. Not sure if
it makes sense to rename it to arc4 and strip the ecb prefix. It would
make it consistent with Salsa but would require another patch.

 crypto/Kconfig   |2 +-
 crypto/arc4.c|  120 --
 crypto/testmgr.h |3 +-
 drivers/net/Kconfig  |1 -
 drivers/net/wireless/hostap/Kconfig  |2 -
 drivers/net/wireless/ipw2x00/Kconfig |2 -
 net/mac80211/Kconfig |1 -
 7 files changed, 87 insertions(+), 44 deletions(-)

diff --git a/crypto/Kconfig b/crypto/Kconfig
index 81c185a..5fab1c3 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -561,7 +561,7 @@ config CRYPTO_ANUBIS
 
 config CRYPTO_ARC4
tristate "ARC4 cipher algorithm"
-   select CRYPTO_ALGAPI
+   select CRYPTO_BLKCIPHER
help
  ARC4 cipher algorithm.
 
diff --git a/crypto/arc4.c b/crypto/arc4.c
index 8be47e1..b67b656 100644
--- a/crypto/arc4.c
+++ b/crypto/arc4.c
@@ -1,4 +1,4 @@
-/* 
+/*
  * Cryptographic API
  *
  * ARC4 Cipher Algorithm
@@ -13,76 +13,124 @@
  */
 #include 
 #include 
-#include 
+#include <crypto/algapi.h>
 
 #define ARC4_MIN_KEY_SIZE  1
 #define ARC4_MAX_KEY_SIZE  256
 #define ARC4_BLOCK_SIZE1
 
-struct arc4_ctx {
+struct arc4_iv {
u8 S[256];
u8 x, y;
 };
 
+struct arc4_ctx {
+   struct arc4_iv iv;
+   u8 new_key;
+};
+
 static int arc4_set_key(struct crypto_tfm *tfm, const u8 *in_key,
unsigned int key_len)
 {
struct arc4_ctx *ctx = crypto_tfm_ctx(tfm);
int i, j = 0, k = 0;
 
-   ctx->x = 1;
-   ctx->y = 0;
+   ctx->iv.x = 1;
+   ctx->iv.y = 0;
 
-   for(i = 0; i < 256; i++)
-   ctx->S[i] = i;
+   for (i = 0; i < 256; i++)
+   ctx->iv.S[i] = i;
 
-   for(i = 0; i < 256; i++)
+   for (i = 0; i < 256; i++)
{
-   u8 a = ctx->S[i];
+   u8 a = ctx->iv.S[i];
j = (j + in_key[k] + a) & 0xff;
-   ctx->S[i] = ctx->S[j];
-   ctx->S[j] = a;
-   if(++k >= key_len)
+   ctx->iv.S[i] = ctx->iv.S[j];
+   ctx->iv.S[j] = a;
+   if (++k >= key_len)
k = 0;
}
-
+   ctx->new_key = 1;
return 0;
 }
 
-static void arc4_crypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+static void arc4_ivsetup(struct arc4_ctx *ctx, u8 *iv)
 {
-   struct arc4_ctx *ctx = crypto_tfm_ctx(tfm);
+   if (unlikely(!ctx->new_key))
+   return;
+   memcpy(iv, &ctx->iv, sizeof(ctx->iv));
+   ctx->new_key = 0;
+}
 
-   u8 *const S = ctx->S;
-   u8 x = ctx->x;
-   u8 y = ctx->y;
+static int arc4_crypt(struct blkcipher_desc *desc, struct scatterlist *dst,
+   struct scatterlist *src, unsigned int nbytes)
+{
+   struct blkcipher_walk walk;
+   struct crypto_blkcipher *tfm = desc->tfm;
+   struct arc4_ctx *ctx = crypto_blkcipher_ctx(tfm);
+   struct arc4_iv *iv;
+   u8 *S;
+   u8 x;
+   u8 y;
u8 a, b;
+   int ret;
+
+   blkcipher_walk_init(&walk, dst, src, nbytes);
+   ret = blkcipher_walk_virt(desc, &walk);
+   if (ret)
+   return ret;
+
+   arc4_ivsetup(ctx, walk.iv);
+
+   iv = (struct arc4_iv *)walk.iv;
+
+   S = iv->S;
+   x = iv->x;
+   y = iv->y;
+
+   while (walk.nbytes) {
+   u8 *in = walk.src.virt.addr;
+   u8 *out = walk.dst.virt.addr;
+   u32 i;
+
+   for (i = 0; i < walk.nbytes; i++) {
+   a = S[x];
+   y = (y + a) & 0xff;
+   b = S[y];
+   S[x] = b;
+   S[y] = a;
+   x = (x + 1) & 0xff;
+   *out = *in ^ S[(a + b) & 0xff];
+
+   in++;
+   out++;
+   }
+   ret = blkcipher_walk_done(desc, &walk, 0);
+   WARN_ON(ret < 0);
+   }
 
-   a = S[x];
-   y = (y + a) & 0xff;
-   b = S[y];
-   S[x] = b;
-   S[y] = a;
-   x = (x + 1) & 0xff;
-   *out++ = *in ^ S[(a + b) & 0xff];
-
-   ctx->x = x;
-   ctx->y = y;
+   iv->x = x;
+   iv->y = y;
+   return ret;
 }
 

Re: [PATCH 1/1] CRYPTO: Fix checkpatch errors & warnings in arc4.c

2010-02-10 Thread Sebastian Andrzej Siewior
* richih.mailingl...@gmail.com | 2010-02-10 02:17:39 [+0100]:

>From: Richard Hartmann 
>
>
>Signed-off-by: Richard Hartmann 
>---
> crypto/arc4.c |9 -
> 1 files changed, 4 insertions(+), 5 deletions(-)

I've made these whitespace fixes and a few others while rewriting it
yesterday, so I've taken care of this already.

Sebastian


Re: [PATCH] dm-crypt: disable block encryption with arc4

2010-02-09 Thread Sebastian Andrzej Siewior
* Herbert Xu | 2010-02-10 07:45:19 [+1100]:

>> Herbert, what happend to the "check for streamcipher" idea you had? Is
>> it gone? On the other hand it wouldn't be probably that bad to have a
>
>Well again whether that should be done is up to the dm-crypt
>maintainers.
Milan liked that afaik.

>> seprate interface to grab a block cipher _or_ a stream cipher. So
>
>Just because something isn't a stream cipher doesn't mean that
>it is safe for disk encryption.  People simply shouldn't be using
>random algorithms for disk encryption.
>
>> I can't imaging how you want to fix arc4 that it will work in dm-crypt.
>
>I thought I've explained this before.  Just turn it into a blkcipher
>and add IV.
I beg your pardon. I probably mixed things up.

>> The algorithm relies more or less on the fact that it envolves itself
>> during processing of data.
>
>This is no different to any stream cipher.
Sure. So we fix arc4 and don't play mother. Okay, I will look into this :)

>
>Cheers,

Sebastian


Re: [PATCH] dm-crypt: disable block encryption with arc4

2010-02-09 Thread Sebastian Andrzej Siewior
* Herbert Xu | 2010-02-09 18:37:18 [+1100]:

>Mikulas Patocka  wrote:
>> 
>> You should rather add a flag CRYPTO_ALG_CHANGES_STATE to determine that a 
>> cipher can't be used to encrypt disks.
>
>No, please see my reply in the previous thread.  What we should
>do is fix arc4.  I just haven't got around to doing it yet.
>
>As to blacklisting algorithms not suitable for disk encryption,
>that is up to the dm-crypt maintainers to decide.

Herbert, what happened to the "check for stream cipher" idea you had? Is
it gone? On the other hand, it probably wouldn't be that bad to have a
separate interface to grab a block cipher _or_ a stream cipher, so
something like this couldn't happen. This is basically the "check for
stream cipher" without encrypt = decrypt, plus a struct in the cra_u
union.
I can't imagine how you want to fix arc4 so that it will work in
dm-crypt. The algorithm relies more or less on the fact that it evolves
its state while processing data.
Salsa works with dm-crypt because the internal state is taken from the
IV and is never written back. dm-crypt always encrypts/decrypts a
512-byte block in one go. Splitting it in two and requesting two
256-byte encryptions would work with _every_ other block cipher but
would break Salsa.
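
A toy illustration of that last point (plain C, not kernel code): with a
keystream that is re-derived from the IV on every request, two 256-byte
requests do not add up to one 512-byte request.

#include <stdio.h>
#include <string.h>

/* stand-in keystream: restarts from 'seed' on every call, like a stream
 * cipher that takes its state from the IV and never writes it back
 */
static void toy_stream_crypt(unsigned char seed, unsigned char *buf,
			     unsigned int len)
{
	unsigned int i;

	for (i = 0; i < len; i++)
		buf[i] ^= (unsigned char)(seed + i);
}

int main(void)
{
	unsigned char one_go[512] = { 0 }, split[512] = { 0 };

	toy_stream_crypt(7, one_go, 512);	/* one 512-byte request */
	toy_stream_crypt(7, split, 256);	/* first half... */
	toy_stream_crypt(7, split + 256, 256);	/* ...keystream restarts */

	/* prints "differ": splitting changed the result */
	printf("%s\n", memcmp(one_go, split, 512) ? "differ" : "match");
	return 0;
}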

>
>Cheers,

Sebastian


Re: device driver for hardware encryption

2010-01-30 Thread Sebastian Andrzej Siewior
* Bai Shuwei | 2010-01-28 17:12:46 [+0800]:

>   When I add the hardware device driver for crypto, I get the below
>error information. My kernel is 2.6.26
>
>[  319.938922] Call Trace:
>[  319.938926]  [] schedule+0x95/0x635
>[  319.938934]  [] :libfpga:fpga_dma_open+0xa5/0xab
>[  319.938941]  [] :libfpga:fpga_dma_block_read+0x12b/0x167
>[  319.938945]  [] autoremove_wake_function+0x0/0x2e
>[  319.938954]  [] :dsi_aes:dsi_aes_crypt+0x1db/0x24b
>[  319.938960]  [] :cbc:crypto_cbc_encrypt+0xe6/0x138
>[  319.938964]  [] :aes_generic:aes_encrypt+0x0/0x21
>[  319.938980]  [] :crypto_blkcipher:async_encrypt+0x35/0x3a
That looks wrong from the implementation POV: if your FPGA is doing AES
in CBC mode, you shouldn't hack it into aes_generic.c or cbc.c but
implement your own driver with a higher priority.
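
A rough sketch of what that means (the ctx type and the three callbacks
are hypothetical placeholders; only the registration pattern matters):
register a separate "cbc(aes)" implementation whose cra_priority beats
the generic software combination, and the crypto API picks the hardware
driver automatically.

static struct crypto_alg dsi_cbc_aes_alg = {
	.cra_name		= "cbc(aes)",
	.cra_driver_name	= "cbc-aes-dsi-fpga",
	.cra_priority		= 300,	/* generic implementations use ~100 */
	.cra_flags		= CRYPTO_ALG_TYPE_BLKCIPHER,
	.cra_blocksize		= AES_BLOCK_SIZE,
	.cra_ctxsize		= sizeof(struct dsi_aes_ctx),
	.cra_type		= &crypto_blkcipher_type,
	.cra_module		= THIS_MODULE,
	.cra_list		= LIST_HEAD_INIT(dsi_cbc_aes_alg.cra_list),
	.cra_u			= {
		.blkcipher = {
			.min_keysize	= AES_MIN_KEY_SIZE,
			.max_keysize	= AES_MAX_KEY_SIZE,
			.ivsize		= AES_BLOCK_SIZE,
			.setkey		= dsi_aes_setkey,	/* hypothetical */
			.encrypt	= dsi_cbc_encrypt,	/* hypothetical */
			.decrypt	= dsi_cbc_decrypt,	/* hypothetical */
		},
	},
};

static int __init dsi_aes_init(void)
{
	return crypto_register_alg(&dsi_cbc_aes_alg);
}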

>I think it happens when calling the
>wait_event_interruptible(fdev->wait) routine, but I don't know how to
>fix it. Hope you can help. Thanks!
>
>The encryption calling tree and irq handler tree showed in bellow
>
>/* encryption calling tree */
>aes_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
>|-->dsi_aes_crypt((unsigned int *)in, (unsigned int *) out);
>   |-->do_crypt
>  |-->down_interruptible(&fpga->sem)
>   fpga_dma_block_write()
>   |--> fpga_dma_block_transfer(base, pcie_addr, local_addr,
>size, flag | FPGA_DMA_READ_FLAG); /*write data to hardware */
>|-->wait_event_interruptible(fdev->wait) /* waiting event*/
>  up()

This isn't helping. Please look at a few drivers which use DMA to
transfer the data and use hardware encryption. Examples are:
- drivers/crypto/talitos.c
- drivers/crypto/hifn_795x.c

>Best Regards
>
>Bai Shuwei

Sebastian


Re: [PATCH] geode: Fix cip/blk confusion

2010-01-29 Thread Sebastian Andrzej Siewior
* Roel Kluin | 2010-01-29 14:32:56 [+0100]:

>This was already discussed in December/January but I still cannot find it in
>mainline, was it lost?

Aren't patches [0] and [1] in Herbert's tree? If so, Herbert is
probably going to merge them in the next merge window because they are
not urgent enough.

[0] 
http://git.kernel.org/?p=linux/kernel/git/herbert/cryptodev-2.6.git;a=commit;h=faad98f29606d9d3c6bddae7c88693be37d2fb43
[1] 
http://git.kernel.org/?p=linux/kernel/git/herbert/cryptodev-2.6.git;a=commit;h=d7ac769068df87ca8c7f72d99cf67ead16739f18

Sebastian