Re: [PATCH v8 04/10] drivers: qcom: rpmh: add RPMH helper functions

2018-05-23 Thread Raju P L S S S N

Hi,

On 5/15/2018 11:52 PM, Doug Anderson wrote:

Hi,

On Tue, May 15, 2018 at 10:47 AM, Lina Iyer  wrote:

On Fri, May 11 2018 at 14:17 -0600, Doug Anderson wrote:


Hi,

On Wed, May 9, 2018 at 10:01 AM, Lina Iyer  wrote:


+int rpmh_write(const struct device *dev, enum rpmh_state state,
+  const struct tcs_cmd *cmd, u32 n)
+{
+   DECLARE_COMPLETION_ONSTACK(compl);
+   DEFINE_RPMH_MSG_ONSTACK(dev, state, &compl, rpm_msg);
+   int ret;
+
+   if (!cmd || !n || n > MAX_RPMH_PAYLOAD)
+   return -EINVAL;
+
+   memcpy(rpm_msg.cmd, cmd, n * sizeof(*cmd));
+   rpm_msg.msg.num_cmds = n;
+
+   ret = __rpmh_write(dev, state, &rpm_msg);
+   if (ret)
+   return ret;
+
+   ret = wait_for_completion_timeout(&compl, RPMH_TIMEOUT_MS);



IMO it's almost never a good idea to use wait_for_completion_timeout()
together with a completion that's declared on the stack.  If you
somehow insist that this is a good idea then I need to see incredibly
clear and obvious code/comments that say why it's impossible that the
process might somehow try to signal the completion _after_
RPMH_TIMEOUT_MS has expired.

Specifically if the timeout happens but the process could still signal
a completion later then they will access random data on the stack of a
function that has already returned.  This causes ridiculously
difficult-to-debug crashes.


NOTE: You've got timeout set to 10 seconds here.  Is that really even
useful?  IMO just call wait_for_completion() without a timeout.  It's
much better to have a nice clean hang than a random stack corruption.



The 10 sec timeout will guarantee that we will not get a response at all
anymore for the request. Usually requests can be considered failed if
there is no response in a few tens of microseconds. 10 sec is just an
arbitarily large number.

The reason we use timeout is that once the timeout happens, we know we
have failed, we could trigger a watchdog or crash the system. This is
very important for our productization in debugging RPMH failures. A
hang would not always trigger a watchdog and the failure would be silent
and possibly fatal but hard to debug.


If you intend the system to crash when this timeout happens then IMHO
add a BUG_ON.  Then I won't worry about something coming around later
and clobbering the stack.

-Doug



Sure. Will add BUG_ON in next patch.

Thanks,
Raju.


Re: [PATCH v8 04/10] drivers: qcom: rpmh: add RPMH helper functions

2018-05-15 Thread Doug Anderson
Hi,

On Tue, May 15, 2018 at 10:47 AM, Lina Iyer  wrote:
> On Fri, May 11 2018 at 14:17 -0600, Doug Anderson wrote:
>>
>> Hi,
>>
>> On Wed, May 9, 2018 at 10:01 AM, Lina Iyer  wrote:
>>>
>>> +int rpmh_write(const struct device *dev, enum rpmh_state state,
>>> +  const struct tcs_cmd *cmd, u32 n)
>>> +{
>>> +   DECLARE_COMPLETION_ONSTACK(compl);
>>> +   DEFINE_RPMH_MSG_ONSTACK(dev, state, &compl, rpm_msg);
>>> +   int ret;
>>> +
>>> +   if (!cmd || !n || n > MAX_RPMH_PAYLOAD)
>>> +   return -EINVAL;
>>> +
>>> +   memcpy(rpm_msg.cmd, cmd, n * sizeof(*cmd));
>>> +   rpm_msg.msg.num_cmds = n;
>>> +
>>> +   ret = __rpmh_write(dev, state, &rpm_msg);
>>> +   if (ret)
>>> +   return ret;
>>> +
>>> +   ret = wait_for_completion_timeout(&compl, RPMH_TIMEOUT_MS);
>>
>>
>> IMO it's almost never a good idea to use wait_for_completion_timeout()
>> together with a completion that's declared on the stack.  If you
>> somehow insist that this is a good idea then I need to see incredibly
>> clear and obvious code/comments that say why it's impossible that the
>> process might somehow try to signal the completion _after_
>> RPMH_TIMEOUT_MS has expired.
>>
>> Specifically if the timeout happens but the process could still signal
>> a completion later then they will access random data on the stack of a
>> function that has already returned.  This causes ridiculously
>> difficult-to-debug crashes.
>>
>>
>> NOTE: You've got timeout set to 10 seconds here.  Is that really even
>> useful?  IMO just call wait_for_completion() without a timeout.  It's
>> much better to have a nice clean hang than a random stack corruption.
>>
>>
> The 10 sec timeout will guarantee that we will not get a response at all
> anymore for the request. Usually requests can be considered failed if
> there is no response in a few tens of microseconds. 10 sec is just an
> arbitarily large number.
>
> The reason we use timeout is that once the timeout happens, we know we
> have failed, we could trigger a watchdog or crash the system. This is
> very important for our productization in debugging RPMH failures. A
> hang would not always trigger a watchdog and the failure would be silent
> and possibly fatal but hard to debug.

If you intend the system to crash when this timeout happens then IMHO
add a BUG_ON.  Then I won't worry about something coming around later
and clobbering the stack.

-Doug


Re: [PATCH v8 04/10] drivers: qcom: rpmh: add RPMH helper functions

2018-05-15 Thread Lina Iyer

On Fri, May 11 2018 at 14:17 -0600, Doug Anderson wrote:

Hi,

On Wed, May 9, 2018 at 10:01 AM, Lina Iyer  wrote:

+int rpmh_write(const struct device *dev, enum rpmh_state state,
+  const struct tcs_cmd *cmd, u32 n)
+{
+   DECLARE_COMPLETION_ONSTACK(compl);
+   DEFINE_RPMH_MSG_ONSTACK(dev, state, &compl, rpm_msg);
+   int ret;
+
+   if (!cmd || !n || n > MAX_RPMH_PAYLOAD)
+   return -EINVAL;
+
+   memcpy(rpm_msg.cmd, cmd, n * sizeof(*cmd));
+   rpm_msg.msg.num_cmds = n;
+
+   ret = __rpmh_write(dev, state, &rpm_msg);
+   if (ret)
+   return ret;
+
+   ret = wait_for_completion_timeout(&compl, RPMH_TIMEOUT_MS);


IMO it's almost never a good idea to use wait_for_completion_timeout()
together with a completion that's declared on the stack.  If you
somehow insist that this is a good idea then I need to see incredibly
clear and obvious code/comments that say why it's impossible that the
process might somehow try to signal the completion _after_
RPMH_TIMEOUT_MS has expired.

Specifically if the timeout happens but the process could still signal
a completion later then they will access random data on the stack of a
function that has already returned.  This causes ridiculously
difficult-to-debug crashes.


NOTE: You've got timeout set to 10 seconds here.  Is that really even
useful?  IMO just call wait_for_completion() without a timeout.  It's
much better to have a nice clean hang than a random stack corruption.



The 10 sec timeout will guarantee that we will not get a response at all
anymore for the request. Usually requests can be considered failed if
there is no response in a few tens of microseconds. 10 sec is just an
arbitarily large number.

The reason we use timeout is that once the timeout happens, we know we
have failed, we could trigger a watchdog or crash the system. This is
very important for our productization in debugging RPMH failures. A
hang would not always trigger a watchdog and the failure would be silent
and possibly fatal but hard to debug.

-- Lina



Re: [PATCH v8 04/10] drivers: qcom: rpmh: add RPMH helper functions

2018-05-11 Thread Doug Anderson
Hi,

On Wed, May 9, 2018 at 10:01 AM, Lina Iyer  wrote:
> +int rpmh_write(const struct device *dev, enum rpmh_state state,
> +  const struct tcs_cmd *cmd, u32 n)
> +{
> +   DECLARE_COMPLETION_ONSTACK(compl);
> +   DEFINE_RPMH_MSG_ONSTACK(dev, state, &compl, rpm_msg);
> +   int ret;
> +
> +   if (!cmd || !n || n > MAX_RPMH_PAYLOAD)
> +   return -EINVAL;
> +
> +   memcpy(rpm_msg.cmd, cmd, n * sizeof(*cmd));
> +   rpm_msg.msg.num_cmds = n;
> +
> +   ret = __rpmh_write(dev, state, &rpm_msg);
> +   if (ret)
> +   return ret;
> +
> +   ret = wait_for_completion_timeout(&compl, RPMH_TIMEOUT_MS);

IMO it's almost never a good idea to use wait_for_completion_timeout()
together with a completion that's declared on the stack.  If you
somehow insist that this is a good idea then I need to see incredibly
clear and obvious code/comments that say why it's impossible that the
process might somehow try to signal the completion _after_
RPMH_TIMEOUT_MS has expired.

Specifically if the timeout happens but the process could still signal
a completion later then they will access random data on the stack of a
function that has already returned.  This causes ridiculously
difficult-to-debug crashes.


NOTE: You've got timeout set to 10 seconds here.  Is that really even
useful?  IMO just call wait_for_completion() without a timeout.  It's
much better to have a nice clean hang than a random stack corruption.


-Doug


[PATCH v8 04/10] drivers: qcom: rpmh: add RPMH helper functions

2018-05-09 Thread Lina Iyer
Sending RPMH requests and waiting for response from the controller
through a callback is common functionality across all platform drivers.
To simplify drivers, add a library functions to create RPMH client and
send resource state requests.

rpmh_write() is a synchronous blocking call that can be used to send
active state requests.

Signed-off-by: Lina Iyer 
---

Changes in v7:
- Optimization and locking fixes

Changes in v6:
- replace rpmh_client with device
- inline wait_for_tx_done()

Changes in v4:
- use const struct tcs_cmd in API
- remove wait count from this patch
- changed -EFAULT to -EINVAL
---
 drivers/soc/qcom/Makefile|   4 +-
 drivers/soc/qcom/rpmh-internal.h |   6 ++
 drivers/soc/qcom/rpmh-rsc.c  |   8 ++
 drivers/soc/qcom/rpmh.c  | 176 +++
 include/soc/qcom/rpmh.h  |  25 +
 5 files changed, 218 insertions(+), 1 deletion(-)
 create mode 100644 drivers/soc/qcom/rpmh.c
 create mode 100644 include/soc/qcom/rpmh.h

diff --git a/drivers/soc/qcom/Makefile b/drivers/soc/qcom/Makefile
index cb6300f6a8e9..bb395c3202ca 100644
--- a/drivers/soc/qcom/Makefile
+++ b/drivers/soc/qcom/Makefile
@@ -7,7 +7,9 @@ obj-$(CONFIG_QCOM_PM)   +=  spm.o
 obj-$(CONFIG_QCOM_QMI_HELPERS) += qmi_helpers.o
 qmi_helpers-y  += qmi_encdec.o qmi_interface.o
 obj-$(CONFIG_QCOM_RMTFS_MEM)   += rmtfs_mem.o
-obj-$(CONFIG_QCOM_RPMH)+=  rpmh-rsc.o
+obj-$(CONFIG_QCOM_RPMH)+= qcom_rpmh.o
+qcom_rpmh-y+= rpmh-rsc.o
+qcom_rpmh-y+= rpmh.o
 obj-$(CONFIG_QCOM_SMD_RPM) += smd-rpm.o
 obj-$(CONFIG_QCOM_SMEM) += smem.o
 obj-$(CONFIG_QCOM_SMEM_STATE) += smem_state.o
diff --git a/drivers/soc/qcom/rpmh-internal.h b/drivers/soc/qcom/rpmh-internal.h
index cc29176f1303..d9a21726e568 100644
--- a/drivers/soc/qcom/rpmh-internal.h
+++ b/drivers/soc/qcom/rpmh-internal.h
@@ -14,6 +14,7 @@
 #define MAX_CMDS_PER_TCS   16
 #define MAX_TCS_PER_TYPE   3
 #define MAX_TCS_NR (MAX_TCS_PER_TYPE * TCS_TYPE_NR)
+#define RPMH_MAX_CTRLR 2
 
 struct rsc_drv;
 
@@ -52,6 +53,7 @@ struct tcs_group {
  * @tcs:TCS groups
  * @tcs_in_use: s/w state of the TCS
  * @lock:   synchronize state of the controller
+ * @list:   element in list of drv
  */
 struct rsc_drv {
const char *name;
@@ -61,9 +63,13 @@ struct rsc_drv {
struct tcs_group tcs[TCS_TYPE_NR];
DECLARE_BITMAP(tcs_in_use, MAX_TCS_NR);
spinlock_t lock;
+   struct list_head list;
 };
 
+extern struct list_head rsc_drv_list;
 
 int rpmh_rsc_send_data(struct rsc_drv *drv, const struct tcs_request *msg);
 
+void rpmh_tx_done(const struct tcs_request *msg, int r);
+
 #endif /* __RPM_INTERNAL_H__ */
diff --git a/drivers/soc/qcom/rpmh-rsc.c b/drivers/soc/qcom/rpmh-rsc.c
index 0a8cec9d1651..c0edf3850147 100644
--- a/drivers/soc/qcom/rpmh-rsc.c
+++ b/drivers/soc/qcom/rpmh-rsc.c
@@ -61,6 +61,8 @@
 #define CMD_STATUS_ISSUED  BIT(8)
 #define CMD_STATUS_COMPL   BIT(16)
 
+LIST_HEAD(rsc_drv_list);
+
 static u32 read_tcs_reg(struct rsc_drv *drv, int reg, int tcs_id, int cmd_id)
 {
return readl_relaxed(drv->tcs_base + reg + RSC_DRV_TCS_OFFSET * tcs_id +
@@ -176,6 +178,8 @@ static irqreturn_t tcs_tx_done(int irq, void *p)
spin_lock(&drv->lock);
clear_bit(i, drv->tcs_in_use);
spin_unlock(&drv->lock);
+   if (req)
+   rpmh_tx_done(req, err);
}
 
return IRQ_HANDLED;
@@ -469,6 +473,10 @@ static int rpmh_rsc_probe(struct platform_device *pdev)
/* Enable the active TCS to send requests immediately */
write_tcs_reg(drv, RSC_DRV_IRQ_ENABLE, 0, drv->tcs[ACTIVE_TCS].mask);
 
+   INIT_LIST_HEAD(&drv->list);
+   list_add(&drv->list, &rsc_drv_list);
+   dev_set_drvdata(&pdev->dev, drv);
+
return devm_of_platform_populate(&pdev->dev);
 }
 
diff --git a/drivers/soc/qcom/rpmh.c b/drivers/soc/qcom/rpmh.c
new file mode 100644
index ..74bb82339b01
--- /dev/null
+++ b/drivers/soc/qcom/rpmh.c
@@ -0,0 +1,176 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2016-2018, The Linux Foundation. All rights reserved.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#include "rpmh-internal.h"
+
+#define RPMH_TIMEOUT_MSmsecs_to_jiffies(1)
+
+#define DEFINE_RPMH_MSG_ONSTACK(dev, s, q, name)   \
+   struct rpmh_request name = {\
+   .msg = {\
+   .state = s, \
+   .cmds = name.cmd,   \
+   .num_cmds = 0,  \
+   .wait_for_compl = true, \
+   },