Re: [PATCH V2] usb: xhci: add support for performing fake doorbell

2016-11-20 Thread Rafał Miłecki
Hi Mathias,

On 17 October 2016 at 22:30, Rafał Miłecki  wrote:
> From: Rafał Miłecki 
>
> Broadcom's Northstar XHCI controllers seem to need a special start
> procedure to work correctly. There isn't any official documentation of
> this, the problem is that controller doesn't detect any connected
> devices with default setup. Moreover connecting USB device to controller
> that doesn't run properly can cause SoC's watchdog issues.
>
> A workaround that was successfully tested on multiple devices is to
> perform a fake doorbell. This patch adds code for doing this and enables
> it on BCM4708 family.
>
> Signed-off-by: Rafał Miłecki 
> ---
> V2: Enable quirk for brcm,bcm4708 machines instead of adding separated binding
> for it. Thanks Rob for your comment on this.

Do you think you can pick & push this one? V2 follows Rob's suggestion
and he has some DT knowledge for sure, so I guess it should be OK.


Re: [RFC][PATCH 7/7] kref: Implement using refcount_t

2016-11-20 Thread Ingo Molnar

* Boqun Feng  wrote:

> > It also fails to decrement in the underflow case (which is fine, but not
> > obvious from the comment). Same thing below.
> > 
> 
> Maybe a table in the comment like the following helps?
> 
> /*
>  * T: return true, F: return fasle
>  * W: trigger WARNING
>  * N: no effect
>  *
>  *  |   value before ops  |
>  *  |   0   |   1   | UINT_MAX - 1 | UINT_MAX |
>  * -+---+---+--+--+
>  * inc()|  W|   |  W   |  N   |
>  * inc_not_zero()   |   FN  |   T   |  WT  |WTN   |
>  * dec_and_test()   |  WFN  |   T   |   F  | FN   |
>  * dec_and_mutex_lock() |  WFN  |   T   |   F  | FN   |
>  * dec_and_spin_lock()  |  WFN  |   T   |   F  | FN   |
>  */

Yes!

nit: s/fasle/false

Also, I think we want to do a couple of other changes as well to make it more
readable: extend the columns with 'normal' values (2 and UINT_MAX-2) and order
the columns properly. I.e. something like:

/*
 * The before/after outcome of various atomic ops:
 *
 *   T: returns true
 *   F: returns false
 *   --
 *   W: op triggers kernel WARNING
 *   --
 *   0: no change to atomic var value
 *   +: atomic var value increases by 1
 *   -: atomic var value decreases by 1
 *   --
 *  -1: UINT_MAX
 *  -2: UINT_MAX-1
 *  -3: UINT_MAX-2
 *
 * -+-+-+-+-+-+-+
 * value before:|  -3 |  -2 |  -1 |   0 |   1 |   2 |
 * -+-+-+-+-+-+-+
 * value+effect after:  |
 * -+ | | | | | |
 * inc()| ..+ | W.+ | ..0 | W.+ | ..+ | ..+ |
 * inc_not_zero()   | .T+ | WT+ | WT0 | .F0 | .T+ | .T+ |
 * dec_and_test()   | .F- | .F- | .F0 | WF0 | .T- | .F- |
 * dec_and_mutex_lock() | .F- | .F- | .F0 | WF0 | .T- | .F- |
 * dec_and_spin_lock()  | .F- | .F- | .F0 | WF0 | .T- | .F- |
 * -+-+-+-+-+-+-+
 *
 * So for example: 'WT+' in the inc_not_zero() row and '-2' column
 * means that when the atomic_inc_not_zero() function is called
 * with an atomic var that has a value of UINT_MAX-1, then the
 * atomic var's value will increase to the maximum overflow value
 * of UINT_MAX and will produce a warning. The function returns
 * 'true'.
 */

I think this table makes the overflow/underflow semantics pretty clear and also 
documents the regular behavior of these atomic ops pretty intuitively.

Agreed?

Thanks,

Ingo


Re: [PATCH 2/5] drm/modes: Support modes names on the command line

2016-11-20 Thread Maxime Ripard
Hi Sean,

On Wed, Nov 16, 2016 at 12:21:42PM -0500, Sean Paul wrote:
> On Tue, Oct 18, 2016 at 4:29 AM, Maxime Ripard
>  wrote:
> > The drm subsystem also uses the video= kernel parameter, and in the
> > documentation refers to the fbdev documentation for that parameter.
> >
> > However, that documentation also says that instead of giving the mode using
> > its resolution we can also give a name. However, DRM doesn't handle that
> > case at the moment. Even though in most cases it shouldn't make any
> > difference, it might be useful for analog modes, where different standards
> > might have the same resolution, but still have a few different parameters
> > that are not encoded in the modes (NTSC vs NTSC-J vs PAL-M for example).
> >
> > Signed-off-by: Maxime Ripard 
> > ---
> >  drivers/gpu/drm/drm_connector.c |  3 +-
> >  drivers/gpu/drm/drm_fb_helper.c |  4 +++-
> >  drivers/gpu/drm/drm_modes.c | 49 +++---
> >  include/drm/drm_connector.h |  1 +-
> >  4 files changed, 41 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/drm_connector.c 
> > b/drivers/gpu/drm/drm_connector.c
> > index 2db7fb510b6c..27a8a511257c 100644
> > --- a/drivers/gpu/drm/drm_connector.c
> > +++ b/drivers/gpu/drm/drm_connector.c
> > @@ -147,8 +147,9 @@ static void drm_connector_get_cmdline_mode(struct 
> > drm_connector *connector)
> > connector->force = mode->force;
> > }
> >
> > -   DRM_DEBUG_KMS("cmdline mode for connector %s %dx%d@%dHz%s%s%s\n",
> > +   DRM_DEBUG_KMS("cmdline mode for connector %s %s %dx%d@%dHz%s%s%s\n",
> >   connector->name,
> > + mode->name ? mode->name : "",
> >   mode->xres, mode->yres,
> >   mode->refresh_specified ? mode->refresh : 60,
> >   mode->rb ? " reduced blanking" : "",
> > diff --git a/drivers/gpu/drm/drm_fb_helper.c 
> > b/drivers/gpu/drm/drm_fb_helper.c
> > index 03414bde1f15..20a68305fb45 100644
> > --- a/drivers/gpu/drm/drm_fb_helper.c
> > +++ b/drivers/gpu/drm/drm_fb_helper.c
> > @@ -1748,6 +1748,10 @@ struct drm_display_mode 
> > *drm_pick_cmdline_mode(struct drm_fb_helper_connector *f
> > prefer_non_interlace = !cmdline_mode->interlace;
> >  again:
> > list_for_each_entry(mode, &fb_helper_conn->connector->modes, head) {
> > +   /* Check (optional) mode name first */
> > +   if (!strcmp(mode->name, cmdline_mode->name))
> > +   return mode;
> > +
> > /* check width/height */
> > if (mode->hdisplay != cmdline_mode->xres ||
> > mode->vdisplay != cmdline_mode->yres)
> > diff --git a/drivers/gpu/drm/drm_modes.c b/drivers/gpu/drm/drm_modes.c
> > index 7d5bdca276f2..fdbf541a5978 100644
> > --- a/drivers/gpu/drm/drm_modes.c
> > +++ b/drivers/gpu/drm/drm_modes.c
> > @@ -1413,7 +1413,7 @@ bool drm_mode_parse_command_line_for_connector(const 
> > char *mode_option,
> >struct drm_cmdline_mode 
> > *mode)
> >  {
> > const char *name;
> > -   bool parse_extras = false;
> > +   bool named_mode = false, parse_extras = false;
> > unsigned int bpp_off = 0, refresh_off = 0;
> > unsigned int mode_end = 0;
> > char *bpp_ptr = NULL, *refresh_ptr = NULL, *extra_ptr = NULL;
> > @@ -1432,8 +1432,14 @@ bool drm_mode_parse_command_line_for_connector(const 
> > char *mode_option,
> >
> > name = mode_option;
> >
> > +   /*
> > +* If the first character is not a digit, then it means that
> > +* we have a named mode.
> > +*/
> > if (!isdigit(name[0]))
> > -   return false;
> > +   named_mode = true;
> > +   else
> > +   named_mode = false;
> 
> named_mode = isalpha(name[0]); might be more succinct (and covers
> special characters).
> 
> >
> > /* Try to locate the bpp and refresh specifiers, if any */
> > bpp_ptr = strchr(name, '-');
> > @@ -1460,12 +1466,16 @@ bool 
> > drm_mode_parse_command_line_for_connector(const char *mode_option,
> > parse_extras = true;
> > }
> >
> > -   ret = drm_mode_parse_cmdline_res_mode(name, mode_end,
> > - parse_extras,
> > - connector,
> > - mode);
> > -   if (ret)
> > -   return false;
> > +   if (named_mode) {
> > +   strncpy(mode->name, name, mode_end);
> > +   } else {
> > +   ret = drm_mode_parse_cmdline_res_mode(name, mode_end,
> > + parse_extras,
> > + connector,
> > + mode);
> > +   if (ret)
> > +   return false;
> > +

Re: [PATCH v2 2/2] usb: dwc3: core: Support the dwc3 host suspend/resume

2016-11-20 Thread kbuild test robot
Hi Baolin,

[auto build test ERROR on next-20161117]
[cannot apply to balbi-usb/next usb/usb-testing v4.9-rc6 v4.9-rc5 v4.9-rc4 
v4.9-rc6]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Baolin-Wang/usb-host-plat-Enable-xhci-plat-runtime-PM/20161121-143535
config: i386-allmodconfig (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   drivers/usb/dwc3/host.c: In function 'dwc3_host_suspend':
>> drivers/usb/dwc3/host.c:158:10: error: implicit declaration of function 
>> 'pm_children_suspended' [-Werror=implicit-function-declaration]
 while (!pm_children_suspended(xhci) && --cnt > 0)
 ^
   cc1: some warnings being treated as errors

vim +/pm_children_suspended +158 drivers/usb/dwc3/host.c

   152  int ret, cnt = DWC3_HOST_SUSPEND_COUNT;
   153  
   154  /*
   155   * We need make sure the children of the xHCI device had been 
into
   156   * suspend state, or we will suspend xHCI device failed.
   157   */
 > 158  while (!pm_children_suspended(xhci) && --cnt > 0)
   159  msleep(DWC3_HOST_SUSPEND_TIMEOUT);
   160  
   161  if (cnt <= 0) {

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation




Re: moduleparam: introduce core_param_named macro for non-modular code

2016-11-20 Thread Jessica Yu

+++ Paul Gortmaker [14/11/16 21:00 -0500]:

We have the case where module_param_named() in file "foo.c" for
parameter myparam translates that into the bootarg for the
non-modular use case as "foo.myparam=..."

The problem exists where the use case with the filename and the
dot prefix is established, but the code is then realized to be 100%
non-modular, or is converted to non-modular.  Both of the existing
macros like core_param() or setup_param() do not append such a
prefix, so a straight conversion to either will break the existing
use cases.

Similarly, trying to embed a hard coded "foo." prefix on the name
fails cpp syntax due to the special nature of "." in code.  So we add
this parallel variant for the modular --> non-modular transition to
preserve existing and documented use cases with such a prefix.


Hm, I'm not convinced we need a core_ counterpart to module_param_named
(that's nearly identical), when module_param_named already implements
all of the above. Plenty of non-modular code already use it (e.g.
workqueue, printk), and a prefix is automatically supplied (which can be
overridden) in the non-modular case. That should already meet your
requirements, no?


Cc: Jessica Yu 
Cc: Rusty Russell 
Signed-off-by: Paul Gortmaker 
---

[Marking this RFC since I don't like the fact that it still requires
non-modular code to use moduleparam.h -- one possible fix for that is
to consider moving non-modular macros to a new param.h or similar. ]

include/linux/moduleparam.h | 17 +
1 file changed, 17 insertions(+)

diff --git a/include/linux/moduleparam.h b/include/linux/moduleparam.h
index 52666d90ca94..4f2b92345eb5 100644
--- a/include/linux/moduleparam.h
+++ b/include/linux/moduleparam.h
@@ -269,6 +269,23 @@ static inline void kernel_param_unlock(struct module *mod)
__module_param_call("", name, &param_ops_##type, &var, perm, -1, 0)

/**
+ * core_param_named - define a module compat core kernel parameter.
+ * @name: the name of the cmdline and sysfs parameter (often the same as var)
+ * @var: the variable
+ * @type: the type of the parameter
+ * @perm: visibility in sysfs
+ *
+ * core_param_named is just like module_param_named(), but cannot be modular
+ * and it _does_ add a prefix (such as "printk.").  This is for compatibility
+ * with module_param_named(), and it exists to provide boot arg compatibility
+ * with code that was previously using the modular version with the prefix.
+ */
+#define core_param_named(name, var, type, perm)
\
+   param_check_##type(name, &(var));   \
+   __module_param_call(KBUILD_MODNAME ".", name, &param_ops_##type,\
+   &var, perm, -1, 0)
+
+/**
 * core_param_unsafe - same as core_param but taints kernel
 */
#define core_param_unsafe(name, var, type, perm)\
--
2.10.1



Re: [PATCH v5 5/9] IB/isert: Replace semaphore sem with completion

2016-11-20 Thread Sagi Grimberg



diff --git a/drivers/infiniband/ulp/isert/ib_isert.c 
b/drivers/infiniband/ulp/isert/ib_isert.c
index 6dd43f6..de80f56 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -619,7 +619,7 @@
mutex_unlock(&isert_np->mutex);

isert_info("np %p: Allow accept_np to continue\n", isert_np);
-   up(&isert_np->sem);
+   complete(&isert_np->comp);
 }

 static void
@@ -2311,7 +2311,7 @@ struct rdma_cm_id *
isert_err("Unable to allocate struct isert_np\n");
return -ENOMEM;
}
-   sema_init(&isert_np->sem, 0);
+   init_completion(&isert_np->comp);


This is still racy, a connect event can complete just before we
init the completion and *will* get lost...

This code started off with a waitqueue which exposes the same
problem, see:
531b7bf4bd79 Target/iser: Fix iscsit_accept_np and rdma_cm racy flow

So, still NAK from me...


Re: [PATCH 1/5] drm/modes: Rewrite the command line parser

2016-11-20 Thread Maxime Ripard
Hi Sean,

Thanks for taking the time to review this.

On Wed, Nov 16, 2016 at 12:12:53PM -0500, Sean Paul wrote:
> On Tue, Oct 18, 2016 at 4:29 AM, Maxime Ripard
>  wrote:
> > Rewrite the command line parser in order to get away from the state machine
> > parsing the video mode lines.
> >
> > Hopefully, this will allow to extend it more easily to support named modes
> > and / or properties set directly on the command line.
> >
> > Signed-off-by: Maxime Ripard 
> > ---
> >  drivers/gpu/drm/drm_modes.c | 305 +++--
> >  1 file changed, 190 insertions(+), 115 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/drm_modes.c b/drivers/gpu/drm/drm_modes.c
> > index 53f07ac7c174..7d5bdca276f2 100644
> > --- a/drivers/gpu/drm/drm_modes.c
> > +++ b/drivers/gpu/drm/drm_modes.c
> > @@ -30,6 +30,7 @@
> >   * authorization from the copyright holder(s) and author(s).
> >   */
> >
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -1261,6 +1262,131 @@ void drm_mode_connector_list_update(struct 
> > drm_connector *connector)
> >  }
> >  EXPORT_SYMBOL(drm_mode_connector_list_update);
> >
> > +static int drm_mode_parse_cmdline_bpp(const char *str, char **end_ptr,
> > + struct drm_cmdline_mode *mode)
> > +{
> > +   if (str[0] != '-')
> > +   return -EINVAL;
> > +
> > +   mode->bpp = simple_strtol(str + 1, end_ptr, 10);
> > +   mode->bpp_specified = true;
> > +
> > +   return 0;
> > +}
> > +
> > +static int drm_mode_parse_cmdline_refresh(const char *str, char **end_ptr,
> > + struct drm_cmdline_mode *mode)
> > +{
> > +   if (str[0] != '@')
> > +   return -EINVAL;
> > +
> > +   mode->refresh = simple_strtol(str + 1, end_ptr, 10);
> > +   mode->refresh_specified = true;
> > +
> > +   return 0;
> > +}
> > +
> > +static int drm_mode_parse_cmdline_extra(const char *str, int length,
> > +   struct drm_connector *connector,
> > +   struct drm_cmdline_mode *mode)
> > +{
> > +   int i;
> > +
> > +   for (i = 0; i < length; i++) {
> > +   switch (str[i]) {
> > +   case 'i':
> > +   mode->interlace = true;
> > +   break;
> > +   case 'm':
> > +   mode->margins = true;
> > +   break;
> > +   case 'D':
> > +   if (mode->force != DRM_FORCE_UNSPECIFIED)
> > +   return -EINVAL;
> > +
> > +   if ((connector->connector_type != 
> > DRM_MODE_CONNECTOR_DVII) &&
> > +   (connector->connector_type != 
> > DRM_MODE_CONNECTOR_HDMIB))
> > +   mode->force = DRM_FORCE_ON;
> > +   else
> > +   mode->force = DRM_FORCE_ON_DIGITAL;
> > +   break;
> > +   case 'd':
> > +   if (mode->force != DRM_FORCE_UNSPECIFIED)
> > +   return -EINVAL;
> > +
> > +   mode->force = DRM_FORCE_OFF;
> > +   break;
> > +   case 'e':
> > +   if (mode->force != DRM_FORCE_UNSPECIFIED)
> > +   return -EINVAL;
> > +
> > +   mode->force = DRM_FORCE_ON;
> > +   break;
> > +   default:
> > +   return -EINVAL;
> > +   }
> > +   }
> > +
> > +   return 0;
> > +}
> > +
> > +static int drm_mode_parse_cmdline_res_mode(const char *str, unsigned int 
> > length,
> > +  bool extras,
> > +  struct drm_connector *connector,
> > +  struct drm_cmdline_mode *mode)
> > +{
> > +   bool rb = false, cvt = false;
> > +   int xres = 0, yres = 0;
> > +   int remaining, i;
> > +   char *end_ptr;
> > +
> > +   xres = simple_strtol(str, &end_ptr, 10);
> > +
> 
> checkpatch is telling me to use kstrtol instead, as simple_strtol is 
> deprecated
> 
> > +   if (end_ptr[0] != 'x')
> 
> check that end_ptr != NULL? you should probably also check that xres
> isn't an error (ie: -ERANGE or -EINVAL)
> 
> > +   return -EINVAL;
> > +   end_ptr++;
> > +
> > +   yres = simple_strtol(end_ptr, &end_ptr, 10);
> 
> check end_ptr != NULL and yres sane
> 
> > +
> > +   remaining = length - (end_ptr - str);
> > +   if (remaining < 0)
> 
> right, so if end_ptr is NULL here, we'll end up with a huge positive
> value for remaining :)
> 
> > +   return -EINVAL;
> > +
> > +   for (i = 0; i < remaining; i++) {
> > +   switch (end_ptr[i]) {
> > +   case 'M':
> > +   cvt = true;
> 
> the previous code

Re: [PATCH] stm class: Add a missing call to put_device

2016-11-20 Thread Alexander Shishkin
Quentin Lambert  writes:

> Most error branches following the call to class_find_device contain
> a call to put_device. This patch add calls to put_device where
> they are missing.
>
> This issue was found with Hector.
>
> Signed-off-by: Quentin Lambert 
>
> ---
>  drivers/hwtracing/stm/core.c |4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> --- a/drivers/hwtracing/stm/core.c
> +++ b/drivers/hwtracing/stm/core.c
> @@ -368,8 +368,10 @@ static int stm_char_open(struct inode *i
>   return -ENODEV;
>  
>   stmf = kzalloc(sizeof(*stmf), GFP_KERNEL);
> - if (!stmf)
> + if (!stmf) {
> + put_device(dev);
>   return -ENOMEM;
> + }

There is a goto label at the bottom of this function which is supposed
to deal with this. See the fix that we already have [1] for this issue.

[1] 
https://git.kernel.org/cgit/linux/kernel/git/ash/stm.git/commit/?h=stm-for-greg-20161118&id=a0ebf519b8a2666438d999c62995618c710573e5

Regards,
--
alex


Re: [PATCH v16 05/15] clocksource/drivers/arm_arch_timer: fix a bug in arch_timer_register about arch_timer_uses_ppi

2016-11-20 Thread Fu Wei
Hi Mark,

On 19 November 2016 at 02:52, Mark Rutland  wrote:
> On Wed, Nov 16, 2016 at 09:48:58PM +0800, fu@linaro.org wrote:
>> From: Fu Wei 
>>
>> The patch fixes a potential bug about arch_timer_uses_ppi in
>> arch_timer_register.
>> On ARM64, we don't use ARCH_TIMER_PHYS_SECURE_PPI in Linux, so we will
>> just ignore it in init code.
>
> That's not currently the case. I assume you mean we will in later
> patches? If so, please make that clear in the commit message.
>
>> If arch_timer_uses_ppi is ARCH_TIMER_PHYS_NONSECURE_PPI, the original
>> code of arch_timer_uses_ppi may go wrong.
>
> How? What specifically happens?
>
> We don't currently assign ARCH_TIMER_PHYS_NONSECURE_PPI to
> arch_timer_uses_ppi, so I assume a later patch changes this. This change
> should be folded into said patch; it doesn't make sense in isolation.

Yes, this patch is a preparation for the next one, which may set
arch_timer_uses_ppi to ARCH_TIMER_PHYS_NONSECURE_PPI.
So you are right, I will merge this into the next patch and mention this
change in the commit message.

Great thanks

>
> Thanks,
> Mark.
>
>> Signed-off-by: Fu Wei 
>> ---
>>  drivers/clocksource/arm_arch_timer.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/clocksource/arm_arch_timer.c 
>> b/drivers/clocksource/arm_arch_timer.c
>> index dd1040d..6de164f 100644
>> --- a/drivers/clocksource/arm_arch_timer.c
>> +++ b/drivers/clocksource/arm_arch_timer.c
>> @@ -699,7 +699,7 @@ static int __init arch_timer_register(void)
>>   case ARCH_TIMER_PHYS_NONSECURE_PPI:
>>   err = request_percpu_irq(ppi, arch_timer_handler_phys,
>>"arch_timer", arch_timer_evt);
>> - if (!err && arch_timer_ppi[ARCH_TIMER_PHYS_NONSECURE_PPI]) {
>> + if (!err && arch_timer_has_nonsecure_ppi()) {
>>   ppi = arch_timer_ppi[ARCH_TIMER_PHYS_NONSECURE_PPI];
>>   err = request_percpu_irq(ppi, arch_timer_handler_phys,
>>"arch_timer", arch_timer_evt);
>> --
>> 2.7.4
>>



-- 
Best regards,

Fu Wei
Software Engineer
Red Hat


Re: [PATCH 5/5] drm/sun4i: Add support for the overscan profiles

2016-11-20 Thread Maxime Ripard
On Fri, Nov 11, 2016 at 10:17:55AM +0100, Daniel Vetter wrote:
> On Thu, Nov 10, 2016 at 03:56:30PM +0100, Maxime Ripard wrote:
> > Hi Daniel,
> > 
> > On Tue, Nov 08, 2016 at 09:59:27AM +0100, Daniel Vetter wrote:
> > > On Tue, Oct 18, 2016 at 10:29:38AM +0200, Maxime Ripard wrote:
> > > > Create overscan profiles reducing the displayed zone.
> > > > 
> > > > For each TV standard (PAL and NTSC so far), we create 4 more reduced 
> > > > modes
> > > > by steps of 5% that the user will be able to select.
> > > > 
> > > > Signed-off-by: Maxime Ripard 
> > > 
> > > tbh I think if we agree to do this (and that still seems an open question)
> > > I think there should be a generic helper to add these overscan modes with
> > > increased porches. Anything that only depends upon the sink (and
> > > overscanning is something the sink does) should imo be put into a suitable
> > > helper library for everyone to share.
> > > 
> > > Or maybe even stash it into the probe helpers and call it for all TV
> > > connectors. Definitely not a driver-private thing.
> > 
> > Last time we discussed it, my recollection was that you didn't want to
> > have generic code for it, but I'd be happy to implement it.
> > 
> > I'll come up with something like that.
> 
> Well I can flip-flop around with the nonsense I'm sometimes emitting ;-)
> Since you called me out, feel free to do whatever you want ...

I also found the generic solution to be a much better solution, so
I'll definitely implement it :)

Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com




Re: [PATCH resend] kbuild: provide include/asm/asm-prototypes.h for x86

2016-11-20 Thread Nicholas Piggin
Hi Adam,

Thanks. I'd suggest doing x86: or x86/kbuild: prefix for the patch. Also
possibly consider describing what the patch does at a higher level in your
subject line, e.g.:

  x86/kbuild: enable modversions for symbols exported from asm

Also, it wouldn't hurt to add a little changelog of your own. Describe
problem then solution, e.g.,

  Commit 4efca4ed ("kbuild: modversions for EXPORT_SYMBOL() for asm") adds
  modversion support for symbols exported from asm files. Architectures
  must include C-style declarations for those symbols in asm/asm-prototypes.h
  in order for them to be versioned.

  Add these declarations for x86, and an architecture-independent file that
  can be used for common symbols.

(if you want to use that as-is or rewrite it, no problem).

You can add Acked-by: Nicholas Piggin 

Also it's not a big deal, but if you redo the patch, you could consider
splitting it into two (first add the generic header, then the x86 header),
but both can go via the x86 tree.

Thanks,
Nick

On Mon, 21 Nov 2016 07:39:45 +0100
Adam Borowski  wrote:

> Nicholas Piggin wrote:
> > Architectures will need to have an include/asm/asm-prototypes.h that
> > defines or #include<>s C-style prototypes for exported asm functions.
> > We can do an asm-generic version for the common ones like memset so
> > there's not a lot of pointless duplication there.  
> 
> Signed-off-by: Adam Borowski 
> Tested-by: Kalle Valo 
> ---
>  arch/x86/include/asm/asm-prototypes.h | 12 
>  include/asm-generic/asm-prototypes.h  |  7 +++
>  2 files changed, 19 insertions(+)
>  create mode 100644 arch/x86/include/asm/asm-prototypes.h
>  create mode 100644 include/asm-generic/asm-prototypes.h
> 
> diff --git a/arch/x86/include/asm/asm-prototypes.h 
> b/arch/x86/include/asm/asm-prototypes.h
> new file mode 100644
> index 000..ae87224
> --- /dev/null
> +++ b/arch/x86/include/asm/asm-prototypes.h
> @@ -0,0 +1,12 @@
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +
> +#include 
> +#include 
> +#include 
> +#include 
> diff --git a/include/asm-generic/asm-prototypes.h 
> b/include/asm-generic/asm-prototypes.h
> new file mode 100644
> index 000..df13637
> --- /dev/null
> +++ b/include/asm-generic/asm-prototypes.h
> @@ -0,0 +1,7 @@
> +#include 
> +extern void *__memset(void *, int, __kernel_size_t);
> +extern void *__memcpy(void *, const void *, __kernel_size_t);
> +extern void *__memmove(void *, const void *, __kernel_size_t);
> +extern void *memset(void *, int, __kernel_size_t);
> +extern void *memcpy(void *, const void *, __kernel_size_t);
> +extern void *memmove(void *, const void *, __kernel_size_t);



Re: What exactly do 32-bit x86 exceptions push on the stack in the CS slot?

2016-11-20 Thread Ingo Molnar

* Andy Lutomirski  wrote:

> On Sat, Nov 19, 2016 at 6:11 PM, Brian Gerst  wrote:
> > On Sat, Nov 19, 2016 at 8:52 PM, Andy Lutomirski  wrote:
> >> This is a question for the old-timers here, since I can't find
> >> anything resembling an answer in the SDM.
> >>
> >> Suppose an exception happens (#UD in this case, but I assume it
> >> doesn't really matter).  We're not in long mode, and the IDT is set up
> >> to deliver to a normal 32-bit kernel code segment.  We're running in
> >> that very same code segment when the exception hits, so no CPL change
> >> occurs and the TSS doesn't particularly matter.
> >>
> >> The CPU will push EFLAGS, CS, and RIP.  Here's the question: what
> >> happens to the high word of CS on the stack?
> >>
> >> The SDM appears to say nothing at all about this.  Modern systems
> >> (e.g. my laptop running in 32-bit legacy mode under KVM) appear to
> >> zero-extend CS.  But Matthew's 486DX appears to put garbage in the
> >> high bits (or maybe just leave whatever was already on the stack in
> >> place).
> >>
> >> Do any of you happen to know what's going on and when the behavior
> >> changed?  I'd like to know just how big of a problem this is.  Because
> >> if lots of CPUs work like Matthew's, we have lots of subtle bugs on
> >> them.
> >>
> >> --Andy
> >
> > This came up a while back, and it was determined that we can't assume
> > zero-extension in 32-bit mode because older processors only do a
> > 16-bit write even on a 32-bit push.  So all segments have to be
> > treated as 16-bit values, or we have to explicitly zero-extend them.
> >
> > All 64-bit capable processors do zero-extend segments, even in 32-bit mode.
> 
> This almost makes me want to change the definition of pt_regs on
> 32-bit rather than fixing all the entry code.

So I have applied your fix that addresses the worst fallout directly:

  fc0e81b2bea0 x86/traps: Ignore high word of regs->cs in early_fixup_exception()

... but otherwise we might be better off zeroing out the high bits of segment
registers stored on the stack, in all entry code pathways - maybe using a
single function and conditional on

Re: [PATCH v3 0/2] Adjust lockdep static allocations for sparc

2016-11-20 Thread Ingo Molnar

* Peter Zijlstra  wrote:

> On Fri, Nov 18, 2016 at 02:34:07PM -0500, David Miller wrote:
> > From: Babu Moger 
> > Date: Tue, 27 Sep 2016 12:33:26 -0700
> > 
> > > These patches limit the static allocations for lockdep data structures
> > > used for debugging locking correctness. For sparc, all the kernel's code,
> > > data, and bss, must have locked translations in the TLB so that we don't
> > > get TLB misses on kernel code and data. Current sparc chips have 8 TLB
> > > entries available that may be locked down, and with a 4mb page size,
> > > this gives a maximum of 32MB. With PROVE_LOCKING we could go over this
> > > limit and cause system boot-up problems. These patches limit the static
> > > allocations so that everything fits in current required size limit.
> > > 
> > > patch 1 : Adds new config parameter CONFIG_PROVE_LOCKING_SMALL
> > > Patch 2 : Adjusts the sizes based on the new config parameter
> > > 
> > > v2-> v3:
> > >Some more comments from Sam Ravnborg and Peter Zijlstra.
> > >Defined PROVE_LOCKING_SMALL as invisible and moved the selection to
> > >arch/sparc/Kconfig. 
> > > 
> > > v1-> v2:
> > >As suggested by Peter Zijlstra, keeping the default as is.
> > >Introduced new config variable CONFIG_PROVE_LOCKING_SMALL
> > >to handle sparc specific case.
> > > 
> > > v0:
> > >Initial revision.
> > 
> > Series applied, thanks.
> 
> Heh, I was only waiting for an ACK from you, but this works too :-)

Works for me too - as usual davem is fantastic in terms of efficient patch
flow :)

Thanks,

Ingo


Re: [PATCH] drivers/usb: use READ_ONCE instead of deprecated ACCESS_ONCE

2016-11-20 Thread Greg KH
On Sun, Nov 20, 2016 at 08:09:40AM -0800, Davidlohr Bueso wrote:
> Hi Greg!
> 
> On Sun, 20 Nov 2016, Greg KH wrote:
> 
> > On Sat, Nov 19, 2016 at 11:54:25AM -0800, Davidlohr Bueso wrote:
> > > With the new standardized functions, we can replace all ACCESS_ONCE()
> > > calls across relevant drivers/usb/.
> > > 
> > > ACCESS_ONCE() does not work reliably on non-scalar types. For example
> > > gcc 4.6 and 4.7 might remove the volatile tag for such accesses during
> > > the SRA (scalar replacement of aggregates) step:
> > > 
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145
> > > 
> > > Update the new calls regardless of if it is a scalar type, this is
> > > cleaner than having three alternatives.
> > > 
> > > Signed-off-by: Davidlohr Bueso 
> > 
> > Nit, this doesn't match your From: line :(
> 
> That's on purpose, and all my patches are the same.

So they are all incorrect?  Not good, why?  You know this means I can't
take them...

> > If this is the case, why not just replacing the define for ACCESS_ONCE()
> > with READ_ONCE() and then go back and just do a search/replace for the
> > whole kernel all at once?
> 
> So that we don't have three variants; the idea is to eventually
> get rid of ACCESS_ONCE entirely.

Then just get rid of it all at once.

thanks,

greg k-h


Re: [PATCH] thermal/powerclamp: add back module device table

2016-11-20 Thread Greg Kroah-Hartman
On Mon, Nov 21, 2016 at 11:43:10AM +0800, Zhang Rui wrote:
> On Thu, 2016-11-17 at 11:42 -0800, Jacob Pan wrote:
> > On Tue, 15 Nov 2016 08:03:32 +0100
> > Greg Kroah-Hartman  wrote:
> > 
> > > 
> > > On Mon, Nov 14, 2016 at 11:08:45AM -0800, Jacob Pan wrote:
> > > > 
> > > > Commit 3105f234e0aba43e44e277c20f9b32ee8add43d4 replaced module
> > > > cpu id table with a cpu feature check, which is logically
> > > > correct.
> > > > But we need the module device table to allow module auto loading.
> > > > 
> > > > Fixes:3105f234 thermal/powerclamp: correct cpu support check
> > > > Signed-off-by: Jacob Pan 
> > > > ---
> > > >  drivers/thermal/intel_powerclamp.c | 9 -
> > > >  1 file changed, 8 insertions(+), 1 deletion(-)  
> > > 
> > > 
> > > This is not the correct way to submit patches for inclusion in the
> > > stable kernel tree.  Please read
> > > Documentation/stable_kernel_rules.txt
> > > for how to do this properly.
> > > 
> > > 
> > Good to know, thanks. Rui will take care of it this time. Per Rui
> > "I will apply patch 1 and queue up for next -rc and 4.8 stable."
> > 
> 
> Just find another problem.
> We're still missing this upstream
> commit 3105f234e0aba43e44e277c20f9b32ee8add43d4 (thermal/powerclamp:
> correct cpu support check) for 4.7 stable, and in this case, we can not
> queue this patch for both 4.7 and 4.8 stable at the moment because it
> does not apply to 4.7 stable.

I don't understand, 4.7 is end-of-life, no one cares about it anymore,
why are you worrying about that kernel version?

confused,

greg k-h


[PATCH] ARM: dts: exynos: remove the cd-gpios property for eMMC of odroid-xu3/4

2016-11-20 Thread Jaehoon Chung
Odroid-xu3/4 doesn't need to use the cd-gpios property for card detection,
because the host controller has the CDETECT register fed through the
SDx_CDN line. The host controller can tell whether a card is inserted or
not with this register.

When I checked the Odroid-xu3/4, they were using the CDETECT register
(not an external cd-gpio).

Fixes: fb1aeedb61ad ("ARM: dts: add mmc detect gpio for exynos5422-odroidxu3")
Signed-off-by: Jaehoon Chung 
---
 arch/arm/boot/dts/exynos5422-odroidxu3-common.dtsi | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/arm/boot/dts/exynos5422-odroidxu3-common.dtsi 
b/arch/arm/boot/dts/exynos5422-odroidxu3-common.dtsi
index 9e63328..05b9afdd 100644
--- a/arch/arm/boot/dts/exynos5422-odroidxu3-common.dtsi
+++ b/arch/arm/boot/dts/exynos5422-odroidxu3-common.dtsi
@@ -510,7 +510,6 @@
 &mmc_0 {
status = "okay";
mmc-pwrseq = <&emmc_pwrseq>;
-   cd-gpios = <&gpc0 2 GPIO_ACTIVE_LOW>;
card-detect-delay = <200>;
samsung,dw-mshc-ciu-div = <3>;
samsung,dw-mshc-sdr-timing = <0 4>;
-- 
2.10.1



Re: [PATCH 00/36] cputime: Convert core use of cputime_t to nsecs

2016-11-20 Thread Martin Schwidefsky
On Fri, 18 Nov 2016 15:47:02 +0100
Frederic Weisbecker  wrote:

> On Fri, Nov 18, 2016 at 01:08:46PM +0100, Martin Schwidefsky wrote:
> > On Thu, 17 Nov 2016 19:08:07 +0100
> > Frederic Weisbecker  wrote:
> > 
> > > I'm sorry for the patchbomb, especially as I usually complain about
> > > these myself but I don't see any way to split this patchset into
> > > standalone pieces, none of which would make any sense... All I can do
> > > is to isolate about 3 cleanup patches.
> > 
> > On first glance the patches look ok-ish, but I am not happy about the
> > direction this takes.
> > 
> > I can understand the wish to consolidate the common code to a single
> > format which is nano-seconds. It will have repercussions though.
> > 
> > First the obvious problem, it does not compile for s390:
> > 
> > arch/s390/kernel/vtime.c: In function 'do_account_vtime':
> > arch/s390/kernel/vtime.c:140:25: error: implicit declaration of function
> > 'cputime_to_nsecs' [-Werror=implicit-function-declaration]
> >   account_user_time(tsk, cputime_to_nsecs(user));
> >  ^~~~
> > arch/s390/kernel/idle.c: In function 'enabled_wait':
> > arch/s390/kernel/idle.c:46:20: error: implicit declaration of function
> > 'cputime_to_nsecs' [-Werror=implicit-function-declaration]
> >   account_idle_time(cputime_to_nsecs(idle_time));
> > ^~~~
> > arch/s390/kernel/idle.c: In function 'arch_cpu_idle_time':
> > arch/s390/kernel/idle.c:100:9: error: implicit declaration of function
> > 'cputime_to_nsec' [-Werror=implicit-function-declaration]
> >   return cputime_to_nsec(idle_enter ? ((idle_exit ?: now) - idle_enter) : 
> > 0);
> >  ^~~
> 
> Yes sorry I haven't yet done much build-testing. I should have written that 
> it's
> not build-tested yet. This patchset in its current state is rather an RFC.

No big deal, I got it to compile with a small change.

> > The error at idle.c:100 is a typo cputime_to_nsec vs cputime_to_nsecs.
> > The other two could probably be solved with an additional include but the
> > default cputime_to_nsecs is in include/linux/cputime.h is this:
> > 
> > #ifndef cputime_to_nsecs
> > # define cputime_to_nsecs(__ct) \
> > (cputime_to_usecs(__ct) * NSEC_PER_USEC)
> > #endif
> > 
> > which downgrades the accuracy for s390 from better than nano-seconds
> > to micro-seconds. Not good. For the s390 cputime format you would have
> > to do
> > 
> > static inline unsigned long long cputime_to_nsecs(const cputime_t cputime)
> > {
> > return ((__force unsigned long long) cputime * 1000) >> 12;
> > }
> 
> I agree, that loss of acurracy is my biggest worry. Hence the accumulation
> idea, but more about that later.

We can not allow that to happen, but the accumulation should take care of it.

> > 
> > But this *example* function has an overflow problem. 
> > 
> > > So currently, cputime_t serves the purpose, for s390 and
> > > powerpc (on CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y), to avoid converting
> > > arch clock counters to nanosecs or jiffies while accounting cputime.
> > 
> > The cputime_t has several purposes:
> > 1) Allow for different units in the calculations for virtual cpu time.
> >There are currently three models: jiffies, nano-seconds and the native
> >TOD clock format for s390 which is a bit better than nano-seconds.
> 
> Sure, I don't disagree with that, just with the way it is done (ie: stored
> and maintained in the core to this very obscure type).
> 
> > 2) Act as a marker in the common code where a virtual cpu time is used.
> >This is more important than you might think, unfortunately it is very
> >easy to confuse a wall-clock delta with cpu time.
> 
> There you lost me, I don't get which confusion you're pointing.

The confusion stems from the fact that you do *not* have a simple nano-second
value but a modal value that depends on the architecture. More below..

> > 3) Avoid expensive operations on the fast path to convert the native cpu
> >time to something else. Instead move the expensive calculation to the
> >read-out code, e.g. fs/proc.
> > 
> > You patches breaks all three of these purposes. My main gripe is with 3).
> > 
> > > But this comes at the cost of a lot of complexity and uglification
> > > in the core code to deal with such an opaque type that relies on lots of
> > > mutators and accessors in order to deal with a random granularity time
> > > unit that also involve lots of workarounds and likely some performance
> > > penalties.
> > 
> > Having an opaque type with a set of helper functions is the whole point, no?
> > And I would not call the generic implementations for jiffies or nano-seconds
> > complex, these are easy enough to understand. And what are the performance
> > penalties you are talking about?
> 
> Just because some code isn't too complex doesn't mean we really want to keep 
> it.
> I get regular questions about what unit does cputime_t map to on a given
> c

Re: [HMM v13 01/18] mm/memory/hotplug: convert device parameter bool to set of flags

2016-11-20 Thread Anshuman Khandual
On 11/21/2016 10:23 AM, Jerome Glisse wrote:
> On Mon, Nov 21, 2016 at 11:44:36AM +1100, Balbir Singh wrote:
>>
>>
>> On 19/11/16 05:18, Jérôme Glisse wrote:
>>> Only usefull for arch where we support ZONE_DEVICE and where we want to
>>> also support un-addressable device memory. We need struct page for such
>>> un-addressable memory. But we should avoid populating the kernel linear
>>> mapping for the physical address range because there is no real memory
>>> or anything behind those physical address.
>>>
>>> Hence we need more flags than just knowing if it is device memory or not.
>>>
>>
>>
>> Isn't it better to add a wrapper to arch_add/remove_memory and do those
>> checks inside and then call arch_add/remove_memory to reduce the churn.
>> If you need selectively enable MEMORY_UNADDRESSABLE that can be done with
>> _ARCH_HAS_FEATURE
> 
> The flag parameter can be use by other new features and thus i thought the
> churn was fine. But i do not mind either way, whatever people like best.

Right, once we get the device memory classification right, these flags
can be used in more places.

> 
> [...]
> 
>>> -extern int arch_add_memory(int nid, u64 start, u64 size, bool for_device);
>>> +
>>> +/*
>>> + * For device memory we want more informations than just knowing it is 
>>> device
>>   information
>>> + * memory. We want to know if we can migrate it (ie it is not storage 
>>> memory
>>> + * use by DAX). Is it addressable by the CPU ? Some device memory like GPU
>>> + * memory can not be access by CPU but we still want struct page so that we
>>  accessed
>>> + * can use it like regular memory.
>>
>> Can you please add some details on why -- migration needs them for example?
> 
> I am not sure what you mean ? DAX ie persistent memory device is intended to 
> be
> use for filesystem or persistent storage. Hence memory migration does not 
> apply
> to it (it would go against its purpose).

Why? It can still be used for compaction, HW errors etc. where we need to
move between persistent storage areas. The source and destination can both
be persistent storage memory.

> 
> So i want to extend ZONE_DEVICE to be more than just DAX/persistent memory.
> For that i need to differentiate between device memory that can be migrated
> and should be more or less treated like regular memory (with struct page).
> This is what the MEMORY_MOVABLE flag is for.

ZONE_DEVICE right now also supports struct page for addressable memory
(whether inside its own range or in system RAM); with this we are extending
it to cover un-addressable memory with struct pages. Yes, the differentiation
is required.

> 
> Finally in my case the device memory is not accessible by the CPU so i need
> yet another flag. In the end i am extending ZONE_DEVICE to be used for 3
> different types of memory.
> 
> Is this the kind of explanation you are looking for ?



[lkp] [rcu] 83ee00c6cf: WARNING:at_kernel/softirq.c:#__local_bh_enable

2016-11-20 Thread kernel test robot

FYI, we noticed the following commit:

https://github.com/0day-ci/linux 
Ding-Tianhong/rcu-fix-the-OOM-problem-of-huge-IP-abnormal-packet-traffic/20161118-204521
commit 83ee00c6cf5eaa85f74094d6800732edf7114ef9 ("rcu: fix the OOM problem of 
huge IP abnormal packet traffic")

in testcase: boot

on test machine: qemu-system-x86_64 -enable-kvm -m 320M

caused below changes:


+------------------------------------------------+------------+------------+
|                                                | 68ad1194cf | 83ee00c6cf |
+------------------------------------------------+------------+------------+
| boot_successes                                 | 6          | 0          |
| boot_failures                                  | 0          | 6          |
| WARNING:at_kernel/softirq.c:#__local_bh_enable | 0          | 6          |
| calltrace:_local_bh_enable                     | 0          | 6          |
+------------------------------------------------+------------+------------+



[0.846125] PCI: CLS 0 bytes, default 64
[0.847479] Unpacking initramfs...
[0.849690] [ cut here ]
[0.850615] WARNING: CPU: 0 PID: 9 at kernel/softirq.c:140 
__local_bh_enable+0x35/0x41
[0.852518] Modules linked in:
[0.853178] CPU: 0 PID: 9 Comm: rcuos/0 Not tainted 4.9.0-rc1-00041-g83ee00c 
#1
[0.854630] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.9.3-20161025_171302-gandalf 04/01/2014
[0.856628]  882f7c70 81267760  
81837340
[0.858185]  882f7cb0 81060e07 008c0009 
0200
[0.859742]  882f7dc8 88000f808bb0 0001 
88000f808bb0
[0.861293] Call Trace:
[0.861795]  [] dump_stack+0x61/0x7e
[0.862809]  [] __warn+0xf5/0x110
[0.863870]  [] warn_slowpath_null+0x18/0x1a
[0.865020]  [] __local_bh_enable+0x35/0x41
[0.866143]  [] _local_bh_enable+0x3d/0x3f
[0.867252]  [] rcu_nocb_kthread+0x69b/0x6f2
[0.868393]  [] ? __d_free_external+0x3f/0x3f
[0.869554]  [] ? note_gp_changes+0xcd/0xcd
[0.870679]  [] ? __schedule+0x5fc/0x73c
[0.871755]  [] ? note_gp_changes+0xcd/0xcd
[0.872980]  [] kthread+0x191/0x1a0
[0.873971]  [] ? kthread_park+0x5d/0x5d
[0.875059]  [] ? finish_task_switch+0x1e4/0x2a0
[0.876262]  [] ? kthread_park+0x5d/0x5d
[0.877331]  [] ? kthread_park+0x5d/0x5d
[0.878401]  [] ret_from_fork+0x25/0x30
[0.879484] ---[ end trace 825c5dbf85ebfadd ]---
[0.899723] workqueue: round-robin CPU selection forced, expect performance 
impact
[2.115863] Freeing initrd memory: 9088K (88001370 - 
880013fe)


To reproduce:

git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k  job-script  # job-script is attached in this email



Thanks,
Xiaolong
#
# Automatically generated file; DO NOT EDIT.
# Linux/x86_64 4.9.0-rc1 Kernel Configuration
#
CONFIG_64BIT=y
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf64-x86-64"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_MMU=y
CONFIG_ARCH_MMAP_RND_BITS_MIN=28
CONFIG_ARCH_MMAP_RND_BITS_MAX=32
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ZONE_DMA32=y
CONFIG_AUDIT_ARCH=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_KASAN_SHADOW_OFFSET=0xdc00
CONFIG_X86_64_SMP=y
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_DEBUG_RODATA=y
CONFIG_PGTABLE_LEVELS=4
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_CONSTRUCTORS=y
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
# CONFIG_KERNEL_GZIP is not set
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
CONFIG_KERNEL_LZO=y
# CONFIG_KERNEL_LZ4 is not set
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
# CONFIG_POSIX_MQUEUE is not set
CONFIG_CROSS_MEMORY_ATTACH=y
CONFIG_FHANDLE=y
CONFIG_USELIB=y
# CON

[PATCH] ACPI: small formatting fixes

2016-11-20 Thread Nick Desaulniers
A quick cleanup that passes scripts/checkpatch.pl -f .

Signed-off-by: Nick Desaulniers 
---
 arch/x86/kernel/acpi/cstate.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/acpi/cstate.c b/arch/x86/kernel/acpi/cstate.c
index af15f44..ed52aec 100644
--- a/arch/x86/kernel/acpi/cstate.c
+++ b/arch/x86/kernel/acpi/cstate.c
@@ -1,7 +1,7 @@
 /*
  * Copyright (C) 2005 Intel Corporation
- * Venkatesh Pallipadi 
- * - Added _PDC for SMP C-states on Intel CPUs
+ * Venkatesh Pallipadi 
+ * - Added _PDC for SMP C-states on Intel CPUs
  */
 
 #include 
@@ -12,7 +12,6 @@
 #include 
 
 #include 
-#include 
 #include 
 #include 
 
@@ -50,8 +49,8 @@ void acpi_processor_power_init_bm_check(struct acpi_processor_flags *flags,
 * P4, Core and beyond CPUs
 */
if (c->x86_vendor == X86_VENDOR_INTEL &&
-   (c->x86 > 0xf || (c->x86 == 6 && c->x86_model >= 0x0f)))
-   flags->bm_control = 0;
+   (c->x86 > 0xf || (c->x86 == 6 && c->x86_model >= 0x0f)))
+   flags->bm_control = 0;
 }
 EXPORT_SYMBOL(acpi_processor_power_init_bm_check);
 
@@ -89,7 +88,8 @@ static long acpi_processor_ffh_cstate_probe_cpu(void *_cx)
retval = 0;
/* If the HW does not support any sub-states in this C-state */
if (num_cstate_subtype == 0) {
-   pr_warn(FW_BUG "ACPI MWAIT C-state 0x%x not supported by HW (0x%x)\n", cx->address, edx_part);
+   pr_warn(FW_BUG "ACPI MWAIT C-state 0x%x not supported by HW (0x%x)\n",
+   cx->address, edx_part);
retval = -1;
goto out;
}
@@ -103,9 +103,8 @@ static long acpi_processor_ffh_cstate_probe_cpu(void *_cx)
 
if (!mwait_supported[cstate_type]) {
mwait_supported[cstate_type] = 1;
-   printk(KERN_DEBUG
-   "Monitor-Mwait will be used to enter C-%d "
-   "state\n", cx->type);
+   pr_debug("Monitor-Mwait will be used to enter C-%d state\n",
+   cx->type);
}
snprintf(cx->desc,
ACPI_CX_DESC_LEN, "ACPI FFH INTEL MWAIT 0x%x",
@@ -159,13 +158,14 @@ void __cpuidle acpi_processor_ffh_cstate_enter(struct acpi_processor_cx *cx)
 
percpu_entry = per_cpu_ptr(cpu_cstate_entry, cpu);
mwait_idle_with_hints(percpu_entry->states[cx->index].eax,
- percpu_entry->states[cx->index].ecx);
+   percpu_entry->states[cx->index].ecx);
 }
 EXPORT_SYMBOL_GPL(acpi_processor_ffh_cstate_enter);
 
 static int __init ffh_cstate_init(void)
 {
struct cpuinfo_x86 *c = &boot_cpu_data;
+
if (c->x86_vendor != X86_VENDOR_INTEL)
return -1;
 
-- 
2.9.3



Re: [HMM v13 01/18] mm/memory/hotplug: convert device parameter bool to set of flags

2016-11-20 Thread Anshuman Khandual
On 11/18/2016 11:48 PM, Jérôme Glisse wrote:
> Only usefull for arch where we support ZONE_DEVICE and where we want to

A small nit s/usefull/useful/

> also support un-addressable device memory. We need struct page for such
> un-addressable memory. But we should avoid populating the kernel linear
> mapping for the physical address range because there is no real memory
> or anything behind those physical address.
> 
> Hence we need more flags than just knowing if it is device memory or not.
> 
> Signed-off-by: Jérôme Glisse 
> Cc: Russell King 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Michael Ellerman 
> Cc: Martin Schwidefsky 
> Cc: Heiko Carstens 
> Cc: Yoshinori Sato 
> Cc: Rich Felker 
> Cc: Chris Metcalf 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: "H. Peter Anvin" 
> ---
>  arch/ia64/mm/init.c| 19 ---
>  arch/powerpc/mm/mem.c  | 18 +++---
>  arch/s390/mm/init.c| 10 --
>  arch/sh/mm/init.c  | 18 +++---
>  arch/tile/mm/init.c| 10 --
>  arch/x86/mm/init_32.c  | 19 ---
>  arch/x86/mm/init_64.c  | 19 ---
>  include/linux/memory_hotplug.h | 17 +++--
>  kernel/memremap.c  |  4 ++--
>  mm/memory_hotplug.c|  4 ++--
>  10 files changed, 113 insertions(+), 25 deletions(-)
> 
> diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
> index 1841ef6..95a2fa5 100644
> --- a/arch/ia64/mm/init.c
> +++ b/arch/ia64/mm/init.c
> @@ -645,7 +645,7 @@ mem_init (void)
>  }
>  
>  #ifdef CONFIG_MEMORY_HOTPLUG
> -int arch_add_memory(int nid, u64 start, u64 size, bool for_device)
> +int arch_add_memory(int nid, u64 start, u64 size, int flags)
>  {
>   pg_data_t *pgdat;
>   struct zone *zone;
> @@ -653,10 +653,17 @@ int arch_add_memory(int nid, u64 start, u64 size, bool 
> for_device)
>   unsigned long nr_pages = size >> PAGE_SHIFT;
>   int ret;
>  
> + /* Need to add support for device and unaddressable memory if needed */
> + if (flags & MEMORY_UNADDRESSABLE) {
> + BUG();
> + return -EINVAL;
> + }
> +
>   pgdat = NODE_DATA(nid);
>  
>   zone = pgdat->node_zones +
> - zone_for_memory(nid, start, size, ZONE_NORMAL, for_device);
> + zone_for_memory(nid, start, size, ZONE_NORMAL,
> + flags & MEMORY_DEVICE);
>   ret = __add_pages(nid, zone, start_pfn, nr_pages);
>  
>   if (ret)
> @@ -667,13 +674,19 @@ int arch_add_memory(int nid, u64 start, u64 size, bool 
> for_device)
>  }
>  
>  #ifdef CONFIG_MEMORY_HOTREMOVE
> -int arch_remove_memory(u64 start, u64 size)
> +int arch_remove_memory(u64 start, u64 size, int flags)
>  {
>   unsigned long start_pfn = start >> PAGE_SHIFT;
>   unsigned long nr_pages = size >> PAGE_SHIFT;
>   struct zone *zone;
>   int ret;
>  
> + /* Need to add support for device and unaddressable memory if needed */
> + if (flags & MEMORY_UNADDRESSABLE) {
> + BUG();
> + return -EINVAL;
> + }
> +
>   zone = page_zone(pfn_to_page(start_pfn));
>   ret = __remove_pages(zone, start_pfn, nr_pages);
>   if (ret)
> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> index 5f84433..e3c0532 100644
> --- a/arch/powerpc/mm/mem.c
> +++ b/arch/powerpc/mm/mem.c
> @@ -126,7 +126,7 @@ int __weak remove_section_mapping(unsigned long start, 
> unsigned long end)
>   return -ENODEV;
>  }
>  
> -int arch_add_memory(int nid, u64 start, u64 size, bool for_device)
> +int arch_add_memory(int nid, u64 start, u64 size, int flags)
>  {
>   struct pglist_data *pgdata;
>   struct zone *zone;
> @@ -134,6 +134,12 @@ int arch_add_memory(int nid, u64 start, u64 size, bool 
> for_device)
>   unsigned long nr_pages = size >> PAGE_SHIFT;
>   int rc;
>  
> + /* Need to add support for device and unaddressable memory if needed */
> + if (flags & MEMORY_UNADDRESSABLE) {
> + BUG();
> + return -EINVAL;
> + }
> +
>   pgdata = NODE_DATA(nid);
>  
>   start = (unsigned long)__va(start);
> @@ -147,18 +153,24 @@ int arch_add_memory(int nid, u64 start, u64 size, bool 
> for_device)
>  
>   /* this should work for most non-highmem platforms */
>   zone = pgdata->node_zones +
> - zone_for_memory(nid, start, size, 0, for_device);
> + zone_for_memory(nid, start, size, 0, flags & MEMORY_DEVICE);
>  
>   return __add_pages(nid, zone, start_pfn, nr_pages);
>  }
>  
>  #ifdef CONFIG_MEMORY_HOTREMOVE
> -int arch_remove_memory(u64 start, u64 size)
> +int arch_remove_memory(u64 start, u64 size, int flags)
>  {
>   unsigned long start_pfn = start >> PAGE_SHIFT;
>   unsigned long nr_pages = size >> PAGE_SHIFT;
>   struct zone *zone;
>   int ret;
> + 
> + /* Need to add support for device and unaddressable memory if needed */
> + if (flags & MEMORY

[PATCH resend] kbuild: provide include/asm/asm-prototypes.h for x86

2016-11-20 Thread Adam Borowski
Nicholas Piggin wrote:
> Architectures will need to have an include/asm/asm-prototypes.h that
> defines or #include<>s C-style prototypes for exported asm functions.
> We can do an asm-generic version for the common ones like memset so
> there's not a lot of pointless duplication there.

Signed-off-by: Adam Borowski 
Tested-by: Kalle Valo 
---
 arch/x86/include/asm/asm-prototypes.h | 12 
 include/asm-generic/asm-prototypes.h  |  7 +++
 2 files changed, 19 insertions(+)
 create mode 100644 arch/x86/include/asm/asm-prototypes.h
 create mode 100644 include/asm-generic/asm-prototypes.h

diff --git a/arch/x86/include/asm/asm-prototypes.h 
b/arch/x86/include/asm/asm-prototypes.h
new file mode 100644
index 000..ae87224
--- /dev/null
+++ b/arch/x86/include/asm/asm-prototypes.h
@@ -0,0 +1,12 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#include 
+#include 
+#include 
+#include 
diff --git a/include/asm-generic/asm-prototypes.h 
b/include/asm-generic/asm-prototypes.h
new file mode 100644
index 000..df13637
--- /dev/null
+++ b/include/asm-generic/asm-prototypes.h
@@ -0,0 +1,7 @@
+#include 
+extern void *__memset(void *, int, __kernel_size_t);
+extern void *__memcpy(void *, const void *, __kernel_size_t);
+extern void *__memmove(void *, const void *, __kernel_size_t);
+extern void *memset(void *, int, __kernel_size_t);
+extern void *memcpy(void *, const void *, __kernel_size_t);
+extern void *memmove(void *, const void *, __kernel_size_t);
-- 
2.10.2

Nicholas Piggin wrote:
> On Sun, 20 Nov 2016 19:26:23 +0100 Peter Wu  wrote:
>
>> Current git master (v4.9-rc5-364-g77079b1) with the latest kbuild fixes
>> is still failing to load modules when built with CONFIG_MODVERSIONS=y on
>> x86_64 using GCC 6.2.1.
>>
>> It can still be reproduced with make defconfig, then enabling
>> CONFIG_MODVERSIONS=y. The build output contains:
>>
>> WARNING: "memcpy" [net/netfilter/nf_nat.ko] has no CRC!
>> WARNING: "memmove" [net/netfilter/nf_nat.ko] has no CRC!
>> WARNING: "_copy_to_user" [fs/efivarfs/efivarfs.ko] has no CRC!
>> WARNING: "memcpy" [fs/efivarfs/efivarfs.ko] has no CRC!
>> WARNING: "_copy_from_user" [fs/efivarfs/efivarfs.ko] has no CRC!
>
> Sorry it's taken some time, bear with us. The arch specific patches need
> to be merged now. Adam, what is the status of your patch? Please submit
> to x86 maintainers if you haven't already.

I've re-checked against 4.9-rc6.

It'd probably fit better with kbuild parts, but it's up to you to decide;
I'm sending to x86 guys as you wish.


Meow!


[GIT PULL][SECURITY] Apparmor bugfix

2016-11-20 Thread James Morris
Please pull this fix for 4.9.

From JJ: "This is a fix for a policy replacement bug that is fairly 
serious for apache mod_apparmor users, as it results in the wrong policy 
being applied on a network facing service."


The following changes since commit 9c763584b7c8911106bb77af7e648bef09af9d80:

  Linux 4.9-rc6 (2016-11-20 13:52:19 -0800)

are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security.git 
for-linus

John Johansen (1):
  apparmor: fix change_hat not finding hat after policy replacement

 security/apparmor/domain.c |6 --
 1 files changed, 4 insertions(+), 2 deletions(-)

---

commit 3d40658c977769ce2138f286cf131537bf68bdfe
Author: John Johansen 
Date:   Wed Aug 31 21:10:06 2016 -0700

apparmor: fix change_hat not finding hat after policy replacement

After a policy replacement, the task cred may be out of date and need
to be updated. However change_hat is using the stale profiles from
the out of date cred resulting in either: a stale profile being applied
or, incorrect failure when searching for a hat profile as it has been
migrated to the new parent profile.

Fixes: 01e2b670aa898a39259bc85c78e3d74820f4d3b6 (failure to find hat)
Fixes: 898127c34ec03291c86f4ff3856d79e9e18952bc (stale policy being applied)
Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1000287
Cc: sta...@vger.kernel.org
Signed-off-by: John Johansen 
Signed-off-by: James Morris 

diff --git a/security/apparmor/domain.c b/security/apparmor/domain.c
index fc3036b..a4d90aa 100644
--- a/security/apparmor/domain.c
+++ b/security/apparmor/domain.c
@@ -621,8 +621,8 @@ int aa_change_hat(const char *hats[], int count, u64 token, 
bool permtest)
/* released below */
cred = get_current_cred();
cxt = cred_cxt(cred);
-   profile = aa_cred_profile(cred);
-   previous_profile = cxt->previous;
+   profile = aa_get_newest_profile(aa_cred_profile(cred));
+   previous_profile = aa_get_newest_profile(cxt->previous);
 
if (unconfined(profile)) {
info = "unconfined";
@@ -718,6 +718,8 @@ int aa_change_hat(const char *hats[], int count, u64 token, 
bool permtest)
 out:
aa_put_profile(hat);
kfree(name);
+   aa_put_profile(profile);
+   aa_put_profile(previous_profile);
put_cred(cred);
 
return error;


Re: perf TUI fails with "failed to process type: 64"

2016-11-20 Thread Anton Blanchard
Hi,

I forgot about the set of issues below. Michael had a suggested powerpc
fix for 3, but it would be nice to fix the perf bugs in 1 and 2.

Anton
--

> Updating to mainline as of last night, I started seeing the following
> error when running the perf report TUI:
> 
> 0x46068 [0x8]: failed to process type: 68
> 
> This event is just PERF_RECORD_FINISHED_ROUND:
> 
> 0x46068 [0x8]: event: 68
> .
> . ... raw event: size 8 bytes
> .  :  44 00 00 00 00 00 08 00
> D...
> 
> 0x46068 [0x8]: PERF_RECORD_FINISHED_ROUND
> 
> Which of course is not our error. It took me a while to find the real
> culprit:
> 
>  14c00-14c00 g exc_virt_0x4c00_system_call
> 
> A zero length symbol, which __symbol__inc_addr_samples() barfs on:
> 
> if (addr < sym->start || addr >= sym->end) {
> ...
>   return -ERANGE;
> 
> Seems like we have 3 bugs here:
> 
> 1. Output the real source of the error instead of
> PERF_RECORD_FINISHED_ROUND
> 
> 2. Don't exit the TUI if we find a sample on a zero length symbol
> 
> 3. Why do we have zero length symbols in the first place? Does the
> recent ppc64 exception clean up have something to do with it?
> 
> Anton


[PATCH v2 1/2] usb: host: plat: Enable xhci plat runtime PM

2016-11-20 Thread Baolin Wang
Enable the xhci plat runtime PM for parent device to suspend/resume xhci.

Signed-off-by: Baolin Wang 
---
Changes since v1:
 - No updates.
---
 drivers/usb/host/xhci-plat.c |   37 -
 1 file changed, 32 insertions(+), 5 deletions(-)

diff --git a/drivers/usb/host/xhci-plat.c b/drivers/usb/host/xhci-plat.c
index ed56bf9..13f86ad 100644
--- a/drivers/usb/host/xhci-plat.c
+++ b/drivers/usb/host/xhci-plat.c
@@ -246,6 +246,9 @@ static int xhci_plat_probe(struct platform_device *pdev)
if (ret)
goto dealloc_usb2_hcd;
 
+   pm_runtime_set_active(&pdev->dev);
+   pm_runtime_enable(&pdev->dev);
+
return 0;
 
 
@@ -274,6 +277,7 @@ static int xhci_plat_remove(struct platform_device *dev)
struct xhci_hcd *xhci = hcd_to_xhci(hcd);
struct clk *clk = xhci->clk;
 
+   pm_runtime_disable(&dev->dev);
usb_remove_hcd(xhci->shared_hcd);
usb_phy_shutdown(hcd->usb_phy);
 
@@ -311,14 +315,37 @@ static int xhci_plat_resume(struct device *dev)
 
return xhci_resume(xhci, 0);
 }
+#endif /* CONFIG_PM_SLEEP */
+
+#ifdef CONFIG_PM
+static int xhci_plat_runtime_suspend(struct device *dev)
+{
+   struct usb_hcd  *hcd = dev_get_drvdata(dev);
+   struct xhci_hcd *xhci = hcd_to_xhci(hcd);
+
+   return xhci_suspend(xhci, device_may_wakeup(dev));
+}
+
+static int xhci_plat_runtime_resume(struct device *dev)
+{
+   struct usb_hcd  *hcd = dev_get_drvdata(dev);
+   struct xhci_hcd *xhci = hcd_to_xhci(hcd);
+
+   return xhci_resume(xhci, 0);
+}
+
+static int xhci_plat_runtime_idle(struct device *dev)
+{
+   return 0;
+}
+#endif /* CONFIG_PM */
 
 static const struct dev_pm_ops xhci_plat_pm_ops = {
SET_SYSTEM_SLEEP_PM_OPS(xhci_plat_suspend, xhci_plat_resume)
+
+   SET_RUNTIME_PM_OPS(xhci_plat_runtime_suspend, xhci_plat_runtime_resume,
+  xhci_plat_runtime_idle)
 };
-#define DEV_PM_OPS (&xhci_plat_pm_ops)
-#else
-#define DEV_PM_OPS NULL
-#endif /* CONFIG_PM */
 
 static const struct acpi_device_id usb_xhci_acpi_match[] = {
/* XHCI-compliant USB Controller */
@@ -332,7 +359,7 @@ static int xhci_plat_resume(struct device *dev)
.remove = xhci_plat_remove,
.driver = {
.name = "xhci-hcd",
-   .pm = DEV_PM_OPS,
+   .pm = &xhci_plat_pm_ops,
.of_match_table = of_match_ptr(usb_xhci_of_match),
.acpi_match_table = ACPI_PTR(usb_xhci_acpi_match),
},
-- 
1.7.9.5



[PATCH v2 2/2] usb: dwc3: core: Support the dwc3 host suspend/resume

2016-11-20 Thread Baolin Wang
For some mobile devices with strict power management, we also want to suspend
the host when the slave is detached for power saving. Thus we add the host
suspend/resume functions to support this requirement.

Signed-off-by: Baolin Wang 
---
Changes since v1:
 - Add pm_runtime.h head file to avoid kbuild error.
---
 drivers/usb/dwc3/Kconfig |7 ++
 drivers/usb/dwc3/core.c  |   26 +-
 drivers/usb/dwc3/core.h  |   15 +
 drivers/usb/dwc3/host.c  |   54 ++
 4 files changed, 101 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/dwc3/Kconfig b/drivers/usb/dwc3/Kconfig
index a45b4f1..47bb2f3 100644
--- a/drivers/usb/dwc3/Kconfig
+++ b/drivers/usb/dwc3/Kconfig
@@ -47,6 +47,13 @@ config USB_DWC3_DUAL_ROLE
 
 endchoice
 
+config USB_DWC3_HOST_SUSPEND
+   bool "Choose if the DWC3 host (xhci) can be suspend/resume"
+   depends on USB_DWC3_HOST=y || USB_DWC3_DUAL_ROLE=y
+   help
+ We can suspend the host when the slave is detached for power saving,
+ and resume the host when one slave is attached.
+
 comment "Platform Glue Driver Support"
 
 config USB_DWC3_OMAP
diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index 9a4a5e4..7ad4bc3 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -1091,6 +1091,7 @@ static int dwc3_probe(struct platform_device *pdev)
pm_runtime_use_autosuspend(dev);
pm_runtime_set_autosuspend_delay(dev, DWC3_DEFAULT_AUTOSUSPEND_DELAY);
pm_runtime_enable(dev);
+   pm_suspend_ignore_children(dev, true);
ret = pm_runtime_get_sync(dev);
if (ret < 0)
goto err1;
@@ -1215,15 +1216,27 @@ static int dwc3_remove(struct platform_device *pdev)
 static int dwc3_suspend_common(struct dwc3 *dwc)
 {
unsigned long   flags;
+   int ret;
 
switch (dwc->dr_mode) {
case USB_DR_MODE_PERIPHERAL:
+   spin_lock_irqsave(&dwc->lock, flags);
+   dwc3_gadget_suspend(dwc);
+   spin_unlock_irqrestore(&dwc->lock, flags);
+   break;
case USB_DR_MODE_OTG:
+   ret = dwc3_host_suspend(dwc);
+   if (ret)
+   return ret;
+
spin_lock_irqsave(&dwc->lock, flags);
dwc3_gadget_suspend(dwc);
spin_unlock_irqrestore(&dwc->lock, flags);
break;
case USB_DR_MODE_HOST:
+   ret = dwc3_host_suspend(dwc);
+   if (ret)
+   return ret;
default:
/* do nothing */
break;
@@ -1245,12 +1258,23 @@ static int dwc3_resume_common(struct dwc3 *dwc)
 
switch (dwc->dr_mode) {
case USB_DR_MODE_PERIPHERAL:
+   spin_lock_irqsave(&dwc->lock, flags);
+   dwc3_gadget_resume(dwc);
+   spin_unlock_irqrestore(&dwc->lock, flags);
+   break;
case USB_DR_MODE_OTG:
+   ret = dwc3_host_resume(dwc);
+   if (ret)
+   return ret;
+
spin_lock_irqsave(&dwc->lock, flags);
dwc3_gadget_resume(dwc);
spin_unlock_irqrestore(&dwc->lock, flags);
-   /* FALLTHROUGH */
+   break;
case USB_DR_MODE_HOST:
+   ret = dwc3_host_resume(dwc);
+   if (ret)
+   return ret;
default:
/* do nothing */
break;
diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
index b585a30..db41908 100644
--- a/drivers/usb/dwc3/core.h
+++ b/drivers/usb/dwc3/core.h
@@ -1226,4 +1226,19 @@ static inline void dwc3_ulpi_exit(struct dwc3 *dwc)
 { }
 #endif
 
+#if IS_ENABLED(CONFIG_USB_DWC3_HOST_SUSPEND)
+int dwc3_host_suspend(struct dwc3 *dwc);
+int dwc3_host_resume(struct dwc3 *dwc);
+#else
+static inline int dwc3_host_suspend(struct dwc3 *dwc)
+{
+   return 0;
+}
+
+static inline int dwc3_host_resume(struct dwc3 *dwc)
+{
+   return 0;
+}
+#endif
+
 #endif /* __DRIVERS_USB_DWC3_CORE_H */
diff --git a/drivers/usb/dwc3/host.c b/drivers/usb/dwc3/host.c
index ed82464..20e84fc 100644
--- a/drivers/usb/dwc3/host.c
+++ b/drivers/usb/dwc3/host.c
@@ -16,9 +16,13 @@
  */
 
 #include 
+#include 
 
 #include "core.h"
 
+#define DWC3_HOST_SUSPEND_COUNT100
+#define DWC3_HOST_SUSPEND_TIMEOUT  100
+
 static int dwc3_host_get_irq(struct dwc3 *dwc)
 {
struct platform_device  *dwc3_pdev = to_platform_device(dwc->dev);
@@ -130,3 +134,53 @@ void dwc3_host_exit(struct dwc3 *dwc)
  dev_name(&dwc->xhci->dev));
platform_device_unregister(dwc->xhci);
 }
+
+#ifdef CONFIG_USB_DWC3_HOST_SUSPEND
+int dwc3_host_suspend(struct dwc3 *dwc)
+{
+   struct device *xhci = &dwc->xhci->dev;
+   int ret, cnt = DWC3_HOST_SUSPEND_COUNT;
+
+   /*
+* We need make sure the children of the xHCI device had been

RE: [PATCH v3 6/9] mtd: spi-nor: Support R/W for S25FS-S family flash

2016-11-20 Thread Yao Yuan
On Thu, Nov 18, 2016 at 07:00 PM +, Krzeminski, Marcin (Nokia - PL/Wroclaw) 
wrote:
> > -Original Message-
> > From: Yao Yuan [mailto:yao.y...@nxp.com]
> > Sent: Friday, November 18, 2016 5:20 AM
> > To: Krzeminski, Marcin (Nokia - PL/Wroclaw)
> > ; Han Xu 
> > Cc: David Woodhouse ; linux-
> > ker...@vger.kernel.org; linux-...@lists.infradead.org;
> > han...@freescale.com; Brian Norris ;
> > jagannadh.t...@gmail.com; linux-arm-ker...@lists.infradead.org
> > Subject: RE: [PATCH v3 6/9] mtd: spi-nor: Support R/W for S25FS-S
> > family flash
> >
> > On Thu, Nov 17, 2016 at 10:14:55AM +, Krzeminski, Marcin (Nokia -
> > PL/Wroclaw) wrote:
> > > > On Thu, Nov 17, 2016 at 06:50:55AM +, Krzeminski, Marcin
> > > > (Nokia
> > > > -
> > > > PL/Wroclaw) wrote:
> > > > > > > > On Thu, Aug 18, 2016 at 2:38 AM, Yunhui Cui
> > > > > > > > 
> > > > > > > > wrote:
> > > > > > > > > From: Yunhui Cui 
> > > > > > > > >
> > > > > > > > > With the physical sectors combination, S25FS-S family
> > > > > > > > > flash requires some special operations for read/write 
> > > > > > > > > functions.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Yunhui Cui 
> > > > > > > > > ---
> > > > > > > > >  drivers/mtd/spi-nor/spi-nor.c | 56
> > > > > > > > > +++
> > > > > > > > >  1 file changed, 56 insertions(+)
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/mtd/spi-nor/spi-nor.c
> > > > > > > > > b/drivers/mtd/spi-nor/spi-nor.c index d0fc165..495d0bb
> > > > > > > > > 100644
> > > > > > > > > --- a/drivers/mtd/spi-nor/spi-nor.c
> > > > > > > > > +++ b/drivers/mtd/spi-nor/spi-nor.c
> > > > > > > > > @@ -39,6 +39,10 @@
> > > > > > > > >
> > > > > > > > >  #define SPI_NOR_MAX_ID_LEN 6
> > > > > > > > >  #define SPI_NOR_MAX_ADDR_WIDTH 4
> > > > > > > > > +/* Added for S25FS-S family flash */
> > > > > > > > > +#define SPINOR_CONFIG_REG3_OFFSET  0x84
> > > > > > > > > +#define CR3V_4KB_ERASE_UNABLE  0x8 #define
> > > > > > > > > +SPINOR_S25FS_FAMILY_EXT_JEDEC  0x81
> > > > > > > > >
> > > > > > > > >  struct flash_info {
> > > > > > > > > char*name;
> > > > > > > > > @@ -78,6 +82,7 @@ struct flash_info {  };
> > > > > > > > >
> > > > > > > > >  #define JEDEC_MFR(info)((info)->id[0])
> > > > > > > > > +#define EXT_JEDEC(info)((info)->id[5])
> > > > > > > > >
> > > > > > > > >  static const struct flash_info *spi_nor_match_id(const
> > > > > > > > > char *name);
> > > > > > > > >
> > > > > > > > > @@ -899,6 +904,7 @@ static const struct flash_info
> > spi_nor_ids[] = {
> > > > > > > > >  */
> > > > > > > > > { "s25sl032p",  INFO(0x010215, 0x4d00,  64 *
> > > > > > > > > 1024, 64,
> > > > > > > > SPI_NOR_DUAL_READ | SPI_NOR_QUAD_READ) },
> > > > > > > > > { "s25sl064p",  INFO(0x010216, 0x4d00,  64 *
> > > > > > > > > 1024, 128, SPI_NOR_DUAL_READ | SPI_NOR_QUAD_READ) },
> > > > > > > > > +   { "s25fs256s1", INFO6(0x010219, 0x4d0181, 64 *
> > > > > > > > > + 1024, 512, 0)},
> > > > > > > > > { "s25fl256s0", INFO(0x010219, 0x4d00, 256 * 1024, 
> > > > > > > > > 128, 0) },
> > > > > > > > > { "s25fl256s1", INFO(0x010219, 0x4d01,  64 *
> > > > > > > > > 1024, 512,
> > > > > > > > SPI_NOR_DUAL_READ | SPI_NOR_QUAD_READ) },
> > > > > > > > > { "s25fl512s",  INFO(0x010220, 0x4d00, 256 *
> > > > > > > > > 1024, 256, SPI_NOR_DUAL_READ | SPI_NOR_QUAD_READ) },
> @@
> > > > > > > > > -
> > 1036,6
> > > > > > +1042,50
> > > > > > > > @@ static const struct flash_info *spi_nor_read_id(struct
> > > > > > > > spi_nor
> > > > > > > > *nor)
> > > > > > > > > return ERR_PTR(-ENODEV);  }
> > > > > > > > >
> > > > > > > > > +/*
> > > > > > > > > + * The S25FS-S family physical sectors may be
> > > > > > > > > +configured as a
> > > > > > > > > + * hybrid combination of eight 4-kB parameter sectors
> > > > > > > > > + * at the top or bottom of the address space with all
> > > > > > > > > + * but one of the remaining sectors being uniform size.
> > > > > > > > > + * The Parameter Sector Erase commands (20h or 21h)
> > > > > > > > > +must
> > > > > > > > > + * be used to erase the 4-kB parameter sectors individually.
> > > > > > > > > + * The Sector (uniform sector) Erase commands (D8h or
> > > > > > > > > +DCh)
> > > > > > > > > + * must be used to erase any of the remaining
> > > > > > > > > + * sectors, including the portion of highest or lowest
> > > > > > > > > +address
> > > > > > > > > + * sector that is not overlaid by the parameter sectors.
> > > > > > > > > + * The uniform sector erase command has no effect on
> > > > > > > > > +parameter
> > > > > > > > sectors.
> > > > > > > > > + */
> > > > > > > > > +static int spansion_s25fs_disable_4kb_erase(struct
> > > > > > > > > +spi_nor
> > *nor) {
> > > > > > > > > +   u32 cr3v_addr  = SPINOR_CONFIG_REG3_OFFSET;
> > > > > > > > > +   u8 cr3v = 0x0;
> > > > > > > > > +   int ret = 0x0;
> > > > > > > > > +
> > > > > > > > > +   

Re: [PATCH 2/2] usb: dwc3: core: Support the dwc3 host suspend/resume

2016-11-20 Thread Baolin Wang
On 18 November 2016 at 21:14, kbuild test robot  wrote:
> Hi Baolin,
>
> [auto build test ERROR on next-20161117]
> [cannot apply to balbi-usb/next usb/usb-testing v4.9-rc5 v4.9-rc4 v4.9-rc3 
> v4.9-rc5]
> [if your patch is applied to the wrong git tree, please drop us a note to 
> help improve the system]
>
> url:
> https://github.com/0day-ci/linux/commits/Baolin-Wang/usb-host-plat-Enable-xhci-plat-runtime-PM/20161118-202029
> config: i386-allmodconfig (attached as .config)
> compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
> reproduce:
> # save the attached .config to linux build tree
> make ARCH=i386
>
> All errors (new ones prefixed by >>):
>
>drivers/usb/dwc3/host.c: In function 'dwc3_host_suspend':
>>> drivers/usb/dwc3/host.c:157:10: error: implicit declaration of function 
>>> 'pm_children_suspended' [-Werror=implicit-function-declaration]
>  while (!pm_children_suspended(xhci) && --cnt > 0)
>  ^
>cc1: some warnings being treated as errors
>
> vim +/pm_children_suspended +157 drivers/usb/dwc3/host.c
>
>151  int ret, cnt = DWC3_HOST_SUSPEND_COUNT;
>152
>153  /*
>154   * We need make sure the children of the xHCI device had been 
> into
>155   * suspend state, or we will suspend xHCI device failed.
>156   */
>  > 157  while (!pm_children_suspended(xhci) && --cnt > 0)
>158  msleep(DWC3_HOST_SUSPEND_TIMEOUT);
>159
>160  if (cnt <= 0) {

I will send out a new patch to fix this build error.

-- 
Baolin.wang
Best Regards


[PATCH] vfio: fix vfio_info_cap_add/shift

2016-11-20 Thread Eric Auger
The capability header's next field is an offset relative to the start of
the info buffer. tmp->next is assigned the proper value, but the
iterations implemented in vfio_info_cap_add and vfio_info_cap_shift use
next as an offset between headers. When coping with multiple
capabilities, this leads to an Oops.

Signed-off-by: Eric Auger 
---
 drivers/vfio/vfio.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index d1d70e0..1e838d1 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1763,7 +1763,7 @@ struct vfio_info_cap_header *vfio_info_cap_add(struct 
vfio_info_cap *caps,
header->version = version;
 
/* Add to the end of the capability chain */
-   for (tmp = caps->buf; tmp->next; tmp = (void *)tmp + tmp->next)
+   for (tmp = buf; tmp->next; tmp = buf + tmp->next)
; /* nothing */
 
tmp->next = caps->size;
@@ -1776,8 +1776,9 @@ EXPORT_SYMBOL_GPL(vfio_info_cap_add);
 void vfio_info_cap_shift(struct vfio_info_cap *caps, size_t offset)
 {
struct vfio_info_cap_header *tmp;
+   void *buf = (void *)caps->buf;
 
-   for (tmp = caps->buf; tmp->next; tmp = (void *)tmp + tmp->next - offset)
+   for (tmp = buf; tmp->next; tmp = buf + tmp->next - offset)
tmp->next += offset;
 }
 EXPORT_SYMBOL_GPL(vfio_info_cap_shift);
-- 
2.5.5



[PATCH v5 2/9] IB/core: Replace semaphore sm_sem with an atomic wait

2016-11-20 Thread Binoy Jayan
The semaphore 'sm_sem' is used for exclusive ownership of the device,
so model it as an atomic bit flag with an associated wait queue.
Semaphores are going away in the future.

Signed-off-by: Binoy Jayan 
---
 drivers/infiniband/core/user_mad.c | 20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/core/user_mad.c 
b/drivers/infiniband/core/user_mad.c
index 415a318..6101c0a 100644
--- a/drivers/infiniband/core/user_mad.c
+++ b/drivers/infiniband/core/user_mad.c
@@ -67,6 +67,8 @@ enum {
IB_UMAD_MINOR_BASE = 0
 };
 
+#define UMAD_F_CLAIM   0x01
+
 /*
  * Our lifetime rules for these structs are the following:
  * device special file is opened, we take a reference on the
@@ -87,7 +89,8 @@ struct ib_umad_port {
 
struct cdev   sm_cdev;
struct device *sm_dev;
-   struct semaphore   sm_sem;
+   wait_queue_head_t wq;
+   unsigned long flags;
 
struct mutex   file_mutex;
struct list_head   file_list;
@@ -1030,12 +1033,14 @@ static int ib_umad_sm_open(struct inode *inode, struct 
file *filp)
port = container_of(inode->i_cdev, struct ib_umad_port, sm_cdev);
 
if (filp->f_flags & O_NONBLOCK) {
-   if (down_trylock(&port->sm_sem)) {
+   if (test_and_set_bit(UMAD_F_CLAIM, &port->flags)) {
ret = -EAGAIN;
goto fail;
}
} else {
-   if (down_interruptible(&port->sm_sem)) {
+   if (wait_event_interruptible(port->wq,
+!test_and_set_bit(UMAD_F_CLAIM,
+&port->flags))) {
ret = -ERESTARTSYS;
goto fail;
}
@@ -1060,7 +1065,8 @@ static int ib_umad_sm_open(struct inode *inode, struct 
file *filp)
ib_modify_port(port->ib_dev, port->port_num, 0, &props);
 
 err_up_sem:
-   up(&port->sm_sem);
+   clear_bit(UMAD_F_CLAIM, &port->flags);
+   wake_up(&port->wq);
 
 fail:
return ret;
@@ -1079,7 +1085,8 @@ static int ib_umad_sm_close(struct inode *inode, struct 
file *filp)
ret = ib_modify_port(port->ib_dev, port->port_num, 0, &props);
mutex_unlock(&port->file_mutex);
 
-   up(&port->sm_sem);
+   clear_bit(UMAD_F_CLAIM, &port->flags);
+   wake_up(&port->wq);
 
kobject_put(&port->umad_dev->kobj);
 
@@ -1177,7 +1184,8 @@ static int ib_umad_init_port(struct ib_device *device, 
int port_num,
 
port->ib_dev   = device;
port->port_num = port_num;
-   sema_init(&port->sm_sem, 1);
+   init_waitqueue_head(&port->wq);
+   __clear_bit(UMAD_F_CLAIM, &port->flags);
mutex_init(&port->file_mutex);
INIT_LIST_HEAD(&port->file_list);
 
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[PATCH v5 5/9] IB/isert: Replace semaphore sem with completion

2016-11-20 Thread Binoy Jayan
The semaphore 'sem' in isert_device is used as a completion, but in a
counting fashion: isert_connected_handler may be called multiple times,
allowing that many waiters (isert_accept_np) to continue without
blocking, each consuming one node from the list isert_np->pending in
the same order in which they were enqueued (FIFO). So, convert it to a
struct completion. Semaphores are going away in the future.

Signed-off-by: Binoy Jayan 
---
 drivers/infiniband/ulp/isert/ib_isert.c | 6 +++---
 drivers/infiniband/ulp/isert/ib_isert.h | 3 ++-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c 
b/drivers/infiniband/ulp/isert/ib_isert.c
index 6dd43f6..de80f56 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -619,7 +619,7 @@
mutex_unlock(&isert_np->mutex);
 
isert_info("np %p: Allow accept_np to continue\n", isert_np);
-   up(&isert_np->sem);
+   complete(&isert_np->comp);
 }
 
 static void
@@ -2311,7 +2311,7 @@ struct rdma_cm_id *
isert_err("Unable to allocate struct isert_np\n");
return -ENOMEM;
}
-   sema_init(&isert_np->sem, 0);
+   init_completion(&isert_np->comp);
mutex_init(&isert_np->mutex);
INIT_LIST_HEAD(&isert_np->accepted);
INIT_LIST_HEAD(&isert_np->pending);
@@ -2427,7 +2427,7 @@ struct rdma_cm_id *
int ret;
 
 accept_wait:
-   ret = down_interruptible(&isert_np->sem);
+   ret = wait_for_completion_interruptible(&isert_np->comp);
if (ret)
return -ENODEV;
 
diff --git a/drivers/infiniband/ulp/isert/ib_isert.h 
b/drivers/infiniband/ulp/isert/ib_isert.h
index c02ada5..a1277c0 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.h
+++ b/drivers/infiniband/ulp/isert/ib_isert.h
@@ -3,6 +3,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -190,7 +191,7 @@ struct isert_device {
 
 struct isert_np {
struct iscsi_np *np;
-   struct semaphoresem;
+   struct completion   comp;
struct rdma_cm_id   *cm_id;
struct mutexmutex;
struct list_headaccepted;
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[PATCH v5 4/9] IB/mthca: Replace semaphore poll_sem with mutex

2016-11-20 Thread Binoy Jayan
The semaphore 'poll_sem' is a simple mutex, so it should be written as
one. Semaphores are going away in the future, so replace it with a
mutex. Also, remove the down()/up() calls on poll_sem from
mthca_cmd_use_events and mthca_cmd_use_polling respectively.

Signed-off-by: Binoy Jayan 
---
 drivers/infiniband/hw/mthca/mthca_cmd.c | 10 +++---
 drivers/infiniband/hw/mthca/mthca_cmd.h |  1 +
 drivers/infiniband/hw/mthca/mthca_dev.h |  2 +-
 3 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/hw/mthca/mthca_cmd.c 
b/drivers/infiniband/hw/mthca/mthca_cmd.c
index c7f49bb..49c6e19 100644
--- a/drivers/infiniband/hw/mthca/mthca_cmd.c
+++ b/drivers/infiniband/hw/mthca/mthca_cmd.c
@@ -347,7 +347,7 @@ static int mthca_cmd_poll(struct mthca_dev *dev,
unsigned long end;
u8 status;
 
-   down(&dev->cmd.poll_sem);
+   mutex_lock(&dev->cmd.poll_mutex);
 
err = mthca_cmd_post(dev, in_param,
 out_param ? *out_param : 0,
@@ -382,7 +382,7 @@ static int mthca_cmd_poll(struct mthca_dev *dev,
}
 
 out:
-   up(&dev->cmd.poll_sem);
+   mutex_unlock(&dev->cmd.poll_mutex);
return err;
 }
 
@@ -520,7 +520,7 @@ static int mthca_cmd_imm(struct mthca_dev *dev,
 int mthca_cmd_init(struct mthca_dev *dev)
 {
mutex_init(&dev->cmd.hcr_mutex);
-   sema_init(&dev->cmd.poll_sem, 1);
+   mutex_init(&dev->cmd.poll_mutex);
dev->cmd.flags = 0;
 
dev->hcr = ioremap(pci_resource_start(dev->pdev, 0) + MTHCA_HCR_BASE,
@@ -582,8 +582,6 @@ int mthca_cmd_use_events(struct mthca_dev *dev)
 
dev->cmd.flags |= MTHCA_CMD_USE_EVENTS;
 
-   down(&dev->cmd.poll_sem);
-
return 0;
 }
 
@@ -600,8 +598,6 @@ void mthca_cmd_use_polling(struct mthca_dev *dev)
down(&dev->cmd.event_sem);
 
kfree(dev->cmd.context);
-
-   up(&dev->cmd.poll_sem);
 }
 
 struct mthca_mailbox *mthca_alloc_mailbox(struct mthca_dev *dev,
diff --git a/drivers/infiniband/hw/mthca/mthca_cmd.h 
b/drivers/infiniband/hw/mthca/mthca_cmd.h
index d2e5b19..a7f197e 100644
--- a/drivers/infiniband/hw/mthca/mthca_cmd.h
+++ b/drivers/infiniband/hw/mthca/mthca_cmd.h
@@ -35,6 +35,7 @@
 #ifndef MTHCA_CMD_H
 #define MTHCA_CMD_H
 
+#include 
 #include 
 
 #define MTHCA_MAILBOX_SIZE 4096
diff --git a/drivers/infiniband/hw/mthca/mthca_dev.h 
b/drivers/infiniband/hw/mthca/mthca_dev.h
index 4393a02..87ab964 100644
--- a/drivers/infiniband/hw/mthca/mthca_dev.h
+++ b/drivers/infiniband/hw/mthca/mthca_dev.h
@@ -120,7 +120,7 @@ enum {
 struct mthca_cmd {
struct pci_pool  *pool;
struct mutex  hcr_mutex;
-   struct semaphore  poll_sem;
+   struct mutex  poll_mutex;
struct semaphore  event_sem;
int   max_cmds;
spinlock_tcontext_lock;
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[PATCH v2 2/6] powerpc/powernv: Autoload IMA device driver module

2016-11-20 Thread Hemant Kumar
This patch does three things:
 - Enables "opal.c" to create a platform device for the IMA interface
   according to the appropriate compatibility string.
 - Finds the reserved-memory region details in the system device tree
   and gets the base address of the HOMER region for each chip.
 - Gets the Nest PMU counter data offsets (in the HOMER region) and
   their sizes. The offsets for the counters' data are fixed and
   won't change from chip to chip.

The device tree parsing logic is separated from the PMU creation
functions (which is done in subsequent patches). Right now, only Nest
units are taken care of.

Cc: Madhavan Srinivasan 
Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Anton Blanchard 
Cc: Sukadev Bhattiprolu 
Cc: Michael Neuling 
Cc: Stewart Smith 
Cc: Stephane Eranian 
Signed-off-by: Hemant Kumar 
---
 arch/powerpc/platforms/powernv/Makefile   |   2 +-
 arch/powerpc/platforms/powernv/opal-ima.c | 117 ++
 arch/powerpc/platforms/powernv/opal.c |  13 
 3 files changed, 131 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/platforms/powernv/opal-ima.c

diff --git a/arch/powerpc/platforms/powernv/Makefile 
b/arch/powerpc/platforms/powernv/Makefile
index b5d98cb..ee28528 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -2,7 +2,7 @@ obj-y   += setup.o opal-wrappers.o opal.o 
opal-async.o idle.o
 obj-y  += opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o
 obj-y  += rng.o opal-elog.o opal-dump.o opal-sysparam.o 
opal-sensor.o
 obj-y  += opal-msglog.o opal-hmi.o opal-power.o opal-irqchip.o
-obj-y  += opal-kmsg.o
+obj-y  += opal-kmsg.o opal-ima.o
 
 obj-$(CONFIG_SMP)  += smp.o subcore.o subcore-asm.o
 obj-$(CONFIG_PCI)  += pci.o pci-ioda.o npu-dma.o
diff --git a/arch/powerpc/platforms/powernv/opal-ima.c 
b/arch/powerpc/platforms/powernv/opal-ima.c
new file mode 100644
index 000..446e7bc
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/opal-ima.c
@@ -0,0 +1,117 @@
+/*
+ * OPAL IMA interface detection driver
+ * Supported on POWERNV platform
+ *
+ * Copyright  (C) 2016 Madhavan Srinivasan, IBM Corporation.
+ *(C) 2016 Hemant K Shaw, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct perchip_nest_info nest_perchip_info[IMA_MAX_CHIPS];
+
+static int opal_ima_counters_probe(struct platform_device *pdev)
+{
+   struct device_node *child, *ima_dev, *rm_node = NULL;
+   struct perchip_nest_info *pcni;
+   u32 reg[4], pages, nest_offset, nest_size, idx;
+   int i = 0;
+   const char *node_name;
+
+   if (!pdev || !pdev->dev.of_node)
+   return -ENODEV;
+
+   ima_dev = pdev->dev.of_node;
+
+   /*
+* nest_offset : where the nest-counters' data start.
+* size : size of the entire nest-counters region
+*/
+   if (of_property_read_u32(ima_dev, "ima-nest-offset", &nest_offset))
+   goto err;
+   if (of_property_read_u32(ima_dev, "ima-nest-size", &nest_size))
+   goto err;
+
+   /* Find the "homer region" for each chip */
+   rm_node = of_find_node_by_path("/reserved-memory");
+   if (!rm_node)
+   goto err;
+
+   for_each_child_of_node(rm_node, child) {
+   if (of_property_read_string_index(child, "name", 0,
+ &node_name))
+   continue;
+   if (strncmp("ibm,homer-image", node_name,
+   strlen("ibm,homer-image")))
+   continue;
+
+   /* Get the chip id to which the above homer region belongs to */
+   if (of_property_read_u32(child, "ibm,chip-id", &idx))
+   goto err;
+
+   /* reg property will have four u32 cells. */
+   if (of_property_read_u32_array(child, "reg", reg, 4))
+   goto err;
+
+   pcni = &nest_perchip_info[idx];
+
+   /* Fetch the homer region base address */
+   pcni->pbase = reg[0];
+   pcni->pbase = pcni->pbase << 32 | reg[1];
+   /* Add the nest IMA Base offset */
+   pcni->pbase = pcni->pbase + nest_offset;
+   /* Fetch the

Re: [PATCH 1/2] kbuild: provide include/asm/asm-prototypes.h for ARM

2016-11-20 Thread Nicholas Piggin
On Sun, 20 Nov 2016 19:12:57 +
Russell King - ARM Linux  wrote:

> On Sun, Nov 20, 2016 at 10:32:50AM -0800, Linus Torvalds wrote:
> > On Sun, Nov 20, 2016 at 5:21 AM, Russell King - ARM Linux
> >  wrote:  
> > > On Tue, Oct 25, 2016 at 07:32:00PM +1100, Nicholas Piggin wrote:  
> > >>
> > >> Michal, what's your thoughts? If you merge my patch 2/2 and skip 1/2, it
> > >> should not give any new build warnings or errors, so then arch patches 
> > >> can
> > >> go via arch trees. 1/2 could go in after everyone is up to date.  
> > >
> > > So what's the conclusion on this?  I've just had a failure due to
> > > CONFIG_TRIM_UNUSED_KSYMS reported on ARM, and it looks like (at
> > > least some of) patch 1 could resolve it.  
> > 
> > Hmm. I've got
> > 
> >   cc6acc11cad1 kbuild: be more careful about matching preprocessed asm
> > ___EXPORT_SYMBOL
> >   4efca4ed05cb kbuild: modversions for EXPORT_SYMBOL() for asm
> > 
> > in my tree. Is that sufficient, or do we still have issues?  
> 
> Hmm, those seem to have gone in during the last week, so I haven't
> tested it yet (build running, but it'll take a while).  However, I
> don't think they'll solve _this_ problem.
> 
> Some of the issue here is that we use a mixture of assembly macros
> and preprocessor for the ARM bitops - the ARM bitops are created
> with an assembly macro which contains some pre-processor expanded
> macros (eg, EXPORT_SYMBOL()).
> 
> This means that the actual symbol being exported is not known to
> the preprocessor, so doing the "__is_defined(__KSYM_##sym)" inside
> "EXPORT_SYMBOL(\name)" becomes "__is_defined(__KSYM_\name)" to the
> preprocessor.  As "__KSYM_\name" is never defined, it always comes
> out as zero, hence we always use __cond_export_sym_0, which omits
> the symbol export from the assembly macro definition:
> 
>  .macro bitop, name, instr
> .globl \name ; .align 0 ; \name:
> 
> ...
> 
> .type \name, %function; .size \name, .-\name
> 
>  .endm
> 
> In other words, using preprocessor macros inside an assembly macro
> may not work as expected, and now leads to config-specific failures.
> 

Yes, that's a limitation. cpp expansion we can handle, but not gas macros.
You will need Arnd's patches for ARM.

http://marc.info/?l=linux-kbuild&m=147732160529499&w=2

If that doesn't fix it for you, send me your .config offline and I'll set
up a cross compile to work on it.

Again, any arch always has the option of going back to doing asm exports
in the old style of putting them into a .c file, but hopefully you'll find
Arnd's reworked patches to be something you're willing to merge.

Thanks,
Nick


Re: [PATCH v16 04/15] clocksource/drivers/arm_arch_timer: rename some enums and defines, and some cleanups.

2016-11-20 Thread Fu Wei
Hi Mark,

On 19 November 2016 at 02:49, Mark Rutland  wrote:
> On Wed, Nov 16, 2016 at 09:48:57PM +0800, fu@linaro.org wrote:
>> From: Fu Wei 
>>
>> Rename some enums and defines, to unify the format of enums and defines
>> in arm_arch_timer.h, also update all the users of these enums and defines:
>> drivers/clocksource/arm_arch_timer.c
>> virt/kvm/arm/hyp/timer-sr.c
>
> I'm happy with making definitions use a consistent ARCH_TIMER_ prefix,
> given they're exposed in headers...
>
>> And do some cleanups, according to the suggestion from checkpatch.pl:
>> (1) using BIT(nr) instead of (1 << nr)
>> (2) using 'unsigned int' instead of 'unsigned'
>
> ... but these changes are pointless churn. They make the patch larger,
> hardwer to review, and more painful to merge.
>
> Please leave these as they are unless there is a functional problem. If
> there will be a functional problem unless these are changed, describe
> that in the commit message.

OK, Mark.
I will take these out of the patch, thanks :-)


>
> Thanks,
> Mark.
>
>>
>> No functional change.
>>
>> Signed-off-by: Fu Wei 
>> ---
>>  drivers/clocksource/arm_arch_timer.c | 111 
>> ++-
>>  include/clocksource/arm_arch_timer.h |  40 ++---
>>  virt/kvm/arm/hyp/timer-sr.c  |   6 +-
>>  3 files changed, 81 insertions(+), 76 deletions(-)
>>
>> diff --git a/drivers/clocksource/arm_arch_timer.c 
>> b/drivers/clocksource/arm_arch_timer.c
>> index 15341cf..dd1040d 100644
>> --- a/drivers/clocksource/arm_arch_timer.c
>> +++ b/drivers/clocksource/arm_arch_timer.c
>> @@ -66,11 +66,11 @@ struct arch_timer {
>>  #define to_arch_timer(e) container_of(e, struct arch_timer, evt)
>>
>>  static u32 arch_timer_rate;
>> -static int arch_timer_ppi[MAX_TIMER_PPI];
>> +static int arch_timer_ppi[ARCH_TIMER_MAX_TIMER_PPI];
>>
>>  static struct clock_event_device __percpu *arch_timer_evt;
>>
>> -static enum arch_timer_ppi_nr arch_timer_uses_ppi = VIRT_PPI;
>> +static enum arch_timer_ppi_nr arch_timer_uses_ppi = ARCH_TIMER_VIRT_PPI;
>>  static bool arch_timer_c3stop;
>>  static bool arch_timer_mem_use_virtual;
>>
>> @@ -340,7 +340,7 @@ static void fsl_a008585_set_sne(struct 
>> clock_event_device *clk)
>>   if (!static_branch_unlikely(&arch_timer_read_ool_enabled))
>>   return;
>>
>> - if (arch_timer_uses_ppi == VIRT_PPI)
>> + if (arch_timer_uses_ppi == ARCH_TIMER_VIRT_PPI)
>>   clk->set_next_event = fsl_a008585_set_next_event_virt;
>>   else
>>   clk->set_next_event = fsl_a008585_set_next_event_phys;
>> @@ -352,7 +352,7 @@ static void __arch_timer_setup(unsigned type,
>>  {
>>   clk->features = CLOCK_EVT_FEAT_ONESHOT;
>>
>> - if (type == ARCH_CP15_TIMER) {
>> + if (type == ARCH_TIMER_TYPE_CP15) {
>>   if (arch_timer_c3stop)
>>   clk->features |= CLOCK_EVT_FEAT_C3STOP;
>>   clk->name = "arch_sys_timer";
>> @@ -360,14 +360,14 @@ static void __arch_timer_setup(unsigned type,
>>   clk->cpumask = cpumask_of(smp_processor_id());
>>   clk->irq = arch_timer_ppi[arch_timer_uses_ppi];
>>   switch (arch_timer_uses_ppi) {
>> - case VIRT_PPI:
>> + case ARCH_TIMER_VIRT_PPI:
>>   clk->set_state_shutdown = arch_timer_shutdown_virt;
>>   clk->set_state_oneshot_stopped = 
>> arch_timer_shutdown_virt;
>>   clk->set_next_event = arch_timer_set_next_event_virt;
>>   break;
>> - case PHYS_SECURE_PPI:
>> - case PHYS_NONSECURE_PPI:
>> - case HYP_PPI:
>> + case ARCH_TIMER_PHYS_SECURE_PPI:
>> + case ARCH_TIMER_PHYS_NONSECURE_PPI:
>> + case ARCH_TIMER_HYP_PPI:
>>   clk->set_state_shutdown = arch_timer_shutdown_phys;
>>   clk->set_state_oneshot_stopped = 
>> arch_timer_shutdown_phys;
>>   clk->set_next_event = arch_timer_set_next_event_phys;
>> @@ -447,8 +447,8 @@ static void arch_counter_set_user_access(void)
>>
>>  static bool arch_timer_has_nonsecure_ppi(void)
>>  {
>> - return (arch_timer_uses_ppi == PHYS_SECURE_PPI &&
>> - arch_timer_ppi[PHYS_NONSECURE_PPI]);
>> + return (arch_timer_uses_ppi == ARCH_TIMER_PHYS_SECURE_PPI &&
>> + arch_timer_ppi[ARCH_TIMER_PHYS_NONSECURE_PPI]);
>>  }
>>
>>  static u32 check_ppi_trigger(int irq)
>> @@ -469,14 +469,15 @@ static int arch_timer_starting_cpu(unsigned int cpu)
>>   struct clock_event_device *clk = this_cpu_ptr(arch_timer_evt);
>>   u32 flags;
>>
>> - __arch_timer_setup(ARCH_CP15_TIMER, clk);
>> + __arch_timer_setup(ARCH_TIMER_TYPE_CP15, clk);
>>
>>   flags = check_ppi_trigger(arch_timer_ppi[arch_timer_uses_ppi]);
>>   enable_percpu_irq(arch_timer_ppi[arch_timer_uses_ppi], flags);
>>
>>   if (arch_timer_has_nonsecure_ppi()) {
>> - flags = check_ppi_trigger(arch_timer_

[PATCH v2 4/6] powerpc/perf: Add event attribute and group to IMA pmus

2016-11-20 Thread Hemant Kumar
The device tree IMA driver code parses the IMA units and their events.
It passes the information on to the IMA PMU code, which is placed in
powerpc/perf as "ima-pmu.c".

This patch only creates the event attributes and attribute groups for
the IMA PMUs.

Cc: Madhavan Srinivasan 
Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Anton Blanchard 
Cc: Sukadev Bhattiprolu 
Cc: Michael Neuling 
Cc: Stewart Smith 
Cc: Stephane Eranian 
Signed-off-by: Hemant Kumar 
---
Changelog:
v1 -> v2:
 - Changes to Makefile to only enable this feature for
   CONFIG_PPC_POWERNV=y

 arch/powerpc/perf/Makefile|  6 +-
 arch/powerpc/perf/ima-pmu.c   | 96 +++
 arch/powerpc/platforms/powernv/opal-ima.c | 12 +++-
 3 files changed, 111 insertions(+), 3 deletions(-)
 create mode 100644 arch/powerpc/perf/ima-pmu.c

diff --git a/arch/powerpc/perf/Makefile b/arch/powerpc/perf/Makefile
index f102d53..099c61a 100644
--- a/arch/powerpc/perf/Makefile
+++ b/arch/powerpc/perf/Makefile
@@ -2,10 +2,14 @@ subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror
 
 obj-$(CONFIG_PERF_EVENTS)  += callchain.o perf_regs.o
 
+ima-$(CONFIG_PPC_POWERNV)   += ima-pmu.o
+
 obj-$(CONFIG_PPC_PERF_CTRS)+= core-book3s.o bhrb.o
 obj64-$(CONFIG_PPC_PERF_CTRS)  += power4-pmu.o ppc970-pmu.o power5-pmu.o \
   power5+-pmu.o power6-pmu.o power7-pmu.o \
-  isa207-common.o power8-pmu.o power9-pmu.o
+  isa207-common.o power8-pmu.o power9-pmu.o \
+  $(ima-y)
+
 obj32-$(CONFIG_PPC_PERF_CTRS)  += mpc7450-pmu.o
 
 obj-$(CONFIG_FSL_EMB_PERF_EVENT) += core-fsl-emb.o
diff --git a/arch/powerpc/perf/ima-pmu.c b/arch/powerpc/perf/ima-pmu.c
new file mode 100644
index 000..50d2226
--- /dev/null
+++ b/arch/powerpc/perf/ima-pmu.c
@@ -0,0 +1,96 @@
+/*
+ * Nest Performance Monitor counter support.
+ *
+ * Copyright (C) 2016 Madhavan Srinivasan, IBM Corporation.
+ *  (C) 2016 Hemant K Shaw, IBM Corporation.
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct perchip_nest_info nest_perchip_info[IMA_MAX_CHIPS];
+struct ima_pmu *per_nest_pmu_arr[IMA_MAX_PMUS];
+
+/* dev_str_attr : Populate event "name" and string "str" in attribute */
+static struct attribute *dev_str_attr(const char *name, const char *str)
+{
+   struct perf_pmu_events_attr *attr;
+
+   attr = kzalloc(sizeof(*attr), GFP_KERNEL);
+
+   sysfs_attr_init(&attr->attr.attr);
+
+   attr->event_str = str;
+   attr->attr.attr.name = name;
+   attr->attr.attr.mode = 0444;
+   attr->attr.show = perf_event_sysfs_show;
+
+   return &attr->attr.attr;
+}
+
+/*
+ * update_events_in_group: Update the "events" information in an attr_group
+ * and assign the attr_group to the pmu "pmu".
+ */
+static int update_events_in_group(struct ima_events *events,
+ int idx, struct ima_pmu *pmu)
+{
+   struct attribute_group *attr_group;
+   struct attribute **attrs;
+   int i;
+
+   /* Allocate memory for attribute group */
+   attr_group = kzalloc(sizeof(*attr_group), GFP_KERNEL);
+   if (!attr_group)
+   return -ENOMEM;
+
+   /* Allocate memory for attributes */
+   attrs = kzalloc((sizeof(struct attribute *) * (idx + 1)), GFP_KERNEL);
+   if (!attrs) {
+   kfree(attr_group);
+   return -ENOMEM;
+   }
+
+   attr_group->name = "events";
+   attr_group->attrs = attrs;
+   for (i = 0; i < idx; i++, events++) {
+   attrs[i] = dev_str_attr((char *)events->ev_name,
+   (char *)events->ev_value);
+   }
+
+   pmu->attr_groups[0] = attr_group;
+   return 0;
+}
+
+/*
+ * init_ima_pmu : Setup the IMA pmu device in "pmu_ptr" and its events
+ *"events".
+ * Setup the cpu mask information for these pmus and setup the state machine
+ * hotplug notifiers as well.
+ */
+int init_ima_pmu(struct ima_events *events, int idx,
+struct ima_pmu *pmu_ptr)
+{
+   int ret = -ENODEV;
+
+   ret = update_events_in_group(events, idx, pmu_ptr);
+   if (ret)
+   goto err_free;
+
+   return 0;
+
+err_free:
+   /* Only free the attr_groups which are dynamically allocated  */
+   if (pmu_ptr->attr_groups[0]) {
+   kfree(pmu_ptr->attr_groups[0]->attrs);
+   kfree(pmu_ptr->attr_groups[0]);
+   }
+
+   return ret;
+}
diff --git a/arch/powerpc/platforms/powernv/opal-ima.c 
b/arch/powerpc/platforms/powernv/opal-ima.c
index e8d5771..d2e6910 100644
--- a/arch/powerpc/platforms/powernv/opal-ima.c
+++ b/arch/powerpc/platforms/powernv/opal-ima.c
@@

[PATCH v2 0/6] IMA Instrumentation Support

2016-11-20 Thread Hemant Kumar
POWER9 has In-Memory Accumulation (IMA) infrastructure, which contains
various Performance Monitoring Units (PMUs) at Nest level (these are
on-chip but off-core). These Nest PMU counters are handled by a Nest
IMA microcode. This microcode runs in the OCC (On-Chip Controller)
complex and its purpose is to program the nest counters, collect the
counter data and move the counter data to memory. 

The IMA infrastructure encapsulates nest (per-chip), core and thread
level counters. While the nest IMA PMUs are handled by the nest IMA
microcode, the core and thread level PMUs are handled by the Core-HPMC
engine. This patchset enables the nest IMA PMUs and is based on the
initial work done by Madhavan Srinivasan.
"Nest Instrumentation Support" : 
https://lists.ozlabs.org/pipermail/linuxppc-dev/2015-August/132078.html

v1 for this patchset can be found here :
https://lwn.net/Articles/705475/

Nest events:
Per-chip nest instrumentation provides various per-chip metrics
such as memory, powerbus, Xlink and Alink bandwidth.

PMU Events' Information:
OPAL obtains the Nest PMU and event information from the IMA Catalog
and passes on to the kernel via the device tree. The events' information
contains:
 - Event name
 - Event Offset
 - Event description
and, optionally:
 - Event scale
 - Event unit

Some PMUs may have a common scale and unit values for all their
supported events. For those cases, the scale and unit properties for
those events must be inherited from the PMU.

The event offset in the memory is where the counter data gets
accumulated.

The OPAL-side patches are posted upstream :
https://lists.ozlabs.org/pipermail/skiboot/2016-November/005552.html

The kernel discovers the IMA counters information in the device tree
at the "ima-counters" device node which has a compatible field
"ibm,opal-in-memory-counters".

Parsing of the Events' information:
To parse the IMA PMUs and events information, the kernel has to
discover the "ima-counters" node and walk through the pmu and event
nodes.

Here is an excerpt of the dt showing the ima-counters and mcs node:
/dts-v1/;

[...]
ima-counters {   
ima-nest-offset = <0x32>;
compatible = "ibm,opal-in-memory-counters";
ima-nest-size = <0x3>;
#address-cells = <0x1>;
#size-cells = <0x1>;
phandle = <0x1238>;
version-id = [00];

mcs0 {
compatible = "ibm,ima-counters-chip";
ranges;
#address-cells = <0x1>;
#size-cells = <0x1>;
phandle = <0x1279>;
scale = "1.2207e-4";
unit = "MiB";

event@528 {
event-name = "PM_MCS_UP_128B_DATA_XFER_MC0" ;
desc = "Total Read Bandwidth seen on both MCS of MC0";
phandle = <0x128c>;
reg = <0x118 0x8>;
};
[...]

From the device tree, the kernel parses the PMUs and their events'
information.

After parsing the nest IMA PMUs and their events, the PMUs and their
attributes are registered in the kernel.

Example Usage :
 # perf list

  [...]
  nest_mcs0/PM_MCS_DOWN_128B_DATA_XFER_MC0/   [Kernel PMU event]
  nest_mcs0/PM_MCS_DOWN_128B_DATA_XFER_MC0_LAST_SAMPLE/ [Kernel PMU event]
  [...]

 # perf stat -e "nest_mcs0/PM_MCS_DOWN_128B_DATA_XFER_MC0/" -a --per-socket

TODOs:
 - Add support for Core IMA.
 - Add support for thread IMA.

Comments/feedback/suggestions are welcome.

Changelog:
 v1 -> v2 :
 - Account for the cases where a PMU can have a common scale and unit
   values for all its supported events (Patch 3/6).
 - Fixed a Build error (for maple_defconfig) by enabling ima_pmu.o
   only for CONFIG_PPC_POWERNV=y (Patch 4/6)
 - Read from the "event-name" property instead of "name" for an event
   node (Patch 3/6).

Cc: Madhavan Srinivasan 
Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Anton Blanchard 
Cc: Sukadev Bhattiprolu 
Cc: Michael Neuling 
Cc: Stewart Smith 
Cc: Daniel Axtens 
Cc: Stephane Eranian 
Signed-off-by: Hemant Kumar 

Hemant Kumar (6):
  powerpc/powernv: Data structure and macros definitions
  powerpc/powernv: Autoload IMA device driver module
  powerpc/powernv: Detect supported IMA units and its events
  powerpc/perf: Add event attribute and group to IMA pmus
  powerpc/perf: Generic ima pmu event functions
  powerpc/perf: IMA pmu cpumask and cpu hotplug support

 arch/powerpc/include/asm/ima-pmu.h |  75 
 arch/powerpc/include/asm/opal-api.h|   3 +-
 arch/powerpc/include/asm/opal.h|   2 +
 arch/powerpc/perf/Makefile |   6 +-
 arch/powerpc/perf/ima-pmu.c 

[PATCH v5 6/9] IB/hns: Replace counting semaphore event_sem with wait_event

2016-11-20 Thread Binoy Jayan
Counting semaphores are going away in the future, so replace the semaphore
hns_roce_cmdq::event_sem with a conditional wait_event.

Signed-off-by: Binoy Jayan 
---
 drivers/infiniband/hw/hns/hns_roce_cmd.c| 46 -
 drivers/infiniband/hw/hns/hns_roce_device.h |  2 +-
 2 files changed, 33 insertions(+), 15 deletions(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_cmd.c 
b/drivers/infiniband/hw/hns/hns_roce_cmd.c
index 51a0675..12ef3d8 100644
--- a/drivers/infiniband/hw/hns/hns_roce_cmd.c
+++ b/drivers/infiniband/hw/hns/hns_roce_cmd.c
@@ -189,6 +189,34 @@ void hns_roce_cmd_event(struct hns_roce_dev *hr_dev, u16 
token, u8 status,
complete(&context->done);
 }
 
+static inline struct hns_roce_cmd_context *
+hns_roce_try_get_context(struct hns_roce_cmdq *cmd)
+{
+   struct hns_roce_cmd_context *context = NULL;
+
+   spin_lock(&cmd->context_lock);
+
+   if (cmd->free_head < 0)
+   goto out;
+
+   context = &cmd->context[cmd->free_head];
+   context->token += cmd->token_mask + 1;
+   cmd->free_head = context->next;
+out:
+   spin_unlock(&cmd->context_lock);
+   return context;
+}
+
+/* wait for and acquire a free context */
+static inline struct hns_roce_cmd_context *
+hns_roce_get_free_context(struct hns_roce_cmdq *cmd)
+{
+   struct hns_roce_cmd_context *context;
+
+   wait_event(cmd->wq, (context = hns_roce_try_get_context(cmd)));
+   return context;
+}
+
 /* this should be called with "use_events" */
 static int __hns_roce_cmd_mbox_wait(struct hns_roce_dev *hr_dev, u64 in_param,
u64 out_param, unsigned long in_modifier,
@@ -200,13 +228,7 @@ static int __hns_roce_cmd_mbox_wait(struct hns_roce_dev 
*hr_dev, u64 in_param,
struct hns_roce_cmd_context *context;
int ret = 0;
 
-   spin_lock(&cmd->context_lock);
-   WARN_ON(cmd->free_head < 0);
-   context = &cmd->context[cmd->free_head];
-   context->token += cmd->token_mask + 1;
-   cmd->free_head = context->next;
-   spin_unlock(&cmd->context_lock);
-
+   context = hns_roce_get_free_context(cmd);
init_completion(&context->done);
 
ret = hns_roce_cmd_mbox_post_hw(hr_dev, in_param, out_param,
@@ -238,6 +260,7 @@ static int __hns_roce_cmd_mbox_wait(struct hns_roce_dev 
*hr_dev, u64 in_param,
context->next = cmd->free_head;
cmd->free_head = context - cmd->context;
spin_unlock(&cmd->context_lock);
+   wake_up(&cmd->wq);
 
return ret;
 }
@@ -248,10 +271,8 @@ static int hns_roce_cmd_mbox_wait(struct hns_roce_dev 
*hr_dev, u64 in_param,
 {
int ret = 0;
 
-   down(&hr_dev->cmd.event_sem);
ret = __hns_roce_cmd_mbox_wait(hr_dev, in_param, out_param,
   in_modifier, op_modifier, op, timeout);
-   up(&hr_dev->cmd.event_sem);
 
return ret;
 }
@@ -313,7 +334,7 @@ int hns_roce_cmd_use_events(struct hns_roce_dev *hr_dev)
hr_cmd->context[hr_cmd->max_cmds - 1].next = -1;
hr_cmd->free_head = 0;
 
-   sema_init(&hr_cmd->event_sem, hr_cmd->max_cmds);
+   init_waitqueue_head(&hr_cmd->wq);
spin_lock_init(&hr_cmd->context_lock);
 
hr_cmd->token_mask = CMD_TOKEN_MASK;
@@ -325,12 +346,9 @@ int hns_roce_cmd_use_events(struct hns_roce_dev *hr_dev)
 void hns_roce_cmd_use_polling(struct hns_roce_dev *hr_dev)
 {
struct hns_roce_cmdq *hr_cmd = &hr_dev->cmd;
-   int i;
 
hr_cmd->use_events = 0;
-
-   for (i = 0; i < hr_cmd->max_cmds; ++i)
-   down(&hr_cmd->event_sem);
+   hr_cmd->free_head = -1;
 
kfree(hr_cmd->context);
 }
diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h 
b/drivers/infiniband/hw/hns/hns_roce_device.h
index 2afe075..ac95f52 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -364,7 +364,7 @@ struct hns_roce_cmdq {
* Event mode: cmd register mutex protection,
* ensure to not exceed max_cmds and user use limit region
*/
-   struct semaphoreevent_sem;
+   wait_queue_head_t   wq;
int max_cmds;
spinlock_t  context_lock;
int free_head;
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[PATCH v5 7/9] IB/mthca: Replace counting semaphore event_sem with wait_event

2016-11-20 Thread Binoy Jayan
Counting semaphores are going away in the future, so replace the semaphore
mthca_cmd::event_sem with a conditional wait_event.

Signed-off-by: Binoy Jayan 
---
 drivers/infiniband/hw/mthca/mthca_cmd.c | 47 ++---
 drivers/infiniband/hw/mthca/mthca_dev.h |  3 ++-
 2 files changed, 34 insertions(+), 16 deletions(-)

diff --git a/drivers/infiniband/hw/mthca/mthca_cmd.c 
b/drivers/infiniband/hw/mthca/mthca_cmd.c
index 49c6e19..d6a048a 100644
--- a/drivers/infiniband/hw/mthca/mthca_cmd.c
+++ b/drivers/infiniband/hw/mthca/mthca_cmd.c
@@ -405,6 +405,34 @@ void mthca_cmd_event(struct mthca_dev *dev,
complete(&context->done);
 }
 
+static inline struct mthca_cmd_context *
+mthca_try_get_context(struct mthca_cmd *cmd)
+{
+   struct mthca_cmd_context *context = NULL;
+
+   spin_lock(&cmd->context_lock);
+
+   if (cmd->free_head < 0)
+   goto out;
+
+   context = &cmd->context[cmd->free_head];
+   context->token += cmd->token_mask + 1;
+   cmd->free_head = context->next;
+out:
+   spin_unlock(&cmd->context_lock);
+   return context;
+}
+
+/* wait for and acquire a free context */
+static inline struct mthca_cmd_context *
+mthca_get_free_context(struct mthca_cmd *cmd)
+{
+   struct mthca_cmd_context *context;
+
+   wait_event(cmd->wq, (context = mthca_try_get_context(cmd)));
+   return context;
+}
+
 static int mthca_cmd_wait(struct mthca_dev *dev,
  u64 in_param,
  u64 *out_param,
@@ -417,15 +445,7 @@ static int mthca_cmd_wait(struct mthca_dev *dev,
int err = 0;
struct mthca_cmd_context *context;
 
-   down(&dev->cmd.event_sem);
-
-   spin_lock(&dev->cmd.context_lock);
-   BUG_ON(dev->cmd.free_head < 0);
-   context = &dev->cmd.context[dev->cmd.free_head];
-   context->token += dev->cmd.token_mask + 1;
-   dev->cmd.free_head = context->next;
-   spin_unlock(&dev->cmd.context_lock);
-
+   context = mthca_get_free_context(&dev->cmd);
init_completion(&context->done);
 
err = mthca_cmd_post(dev, in_param,
@@ -458,8 +478,8 @@ static int mthca_cmd_wait(struct mthca_dev *dev,
context->next = dev->cmd.free_head;
dev->cmd.free_head = context - dev->cmd.context;
spin_unlock(&dev->cmd.context_lock);
+   wake_up(&dev->cmd.wq);
 
-   up(&dev->cmd.event_sem);
return err;
 }
 
@@ -571,7 +591,7 @@ int mthca_cmd_use_events(struct mthca_dev *dev)
dev->cmd.context[dev->cmd.max_cmds - 1].next = -1;
dev->cmd.free_head = 0;
 
-   sema_init(&dev->cmd.event_sem, dev->cmd.max_cmds);
+   init_waitqueue_head(&dev->cmd.wq);
spin_lock_init(&dev->cmd.context_lock);
 
for (dev->cmd.token_mask = 1;
@@ -590,12 +610,9 @@ int mthca_cmd_use_events(struct mthca_dev *dev)
  */
 void mthca_cmd_use_polling(struct mthca_dev *dev)
 {
-   int i;
-
dev->cmd.flags &= ~MTHCA_CMD_USE_EVENTS;
 
-   for (i = 0; i < dev->cmd.max_cmds; ++i)
-   down(&dev->cmd.event_sem);
+   dev->cmd.free_head = -1;
 
kfree(dev->cmd.context);
 }
diff --git a/drivers/infiniband/hw/mthca/mthca_dev.h 
b/drivers/infiniband/hw/mthca/mthca_dev.h
index 87ab964..2fc86db 100644
--- a/drivers/infiniband/hw/mthca/mthca_dev.h
+++ b/drivers/infiniband/hw/mthca/mthca_dev.h
@@ -46,6 +46,7 @@
 #include 
 #include 
 
+#include 
 #include "mthca_provider.h"
 #include "mthca_doorbell.h"
 
@@ -121,7 +122,7 @@ struct mthca_cmd {
struct pci_pool  *pool;
struct mutex  hcr_mutex;
struct mutex  poll_mutex;
-   struct semaphore  event_sem;
+   wait_queue_head_t wq;
int   max_cmds;
spinlock_tcontext_lock;
int   free_head;
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[PATCH v5 8/9] IB/mlx5: Add helper mlx5_ib_post_send_wait

2016-11-20 Thread Binoy Jayan
Clean up the following common code (to post a list of work requests to the
send queue of the specified QP) at various places and add a helper function
'mlx5_ib_post_send_wait' to implement the same.

 - Initialize 'mlx5_ib_umr_context' on stack
 - Assign mlx5_umr_wr::wr.wr_cqe to umr_context.cqe
 - Acquire the semaphore
 - call ib_post_send with a single ib_send_wr
 - wait_for_completion()
 - Check for umr_context.status
 - Release the semaphore

As semaphores are going away in the future, moving all of these into the
shared helper leaves only a single function using the semaphore, which
can then be rewritten to use something else.

Signed-off-by: Binoy Jayan 
---
 drivers/infiniband/hw/mlx5/mr.c | 115 +++-
 1 file changed, 32 insertions(+), 83 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index d4ad672..1593856 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -856,16 +856,40 @@ static inline void mlx5_ib_init_umr_context(struct 
mlx5_ib_umr_context *context)
init_completion(&context->done);
 }
 
+static inline int mlx5_ib_post_send_wait(struct mlx5_ib_dev *dev,
+struct mlx5_umr_wr *umrwr)
+{
+   struct umr_common *umrc = &dev->umrc;
+   struct ib_send_wr *bad;
+   int err;
+   struct mlx5_ib_umr_context umr_context;
+
+   mlx5_ib_init_umr_context(&umr_context);
+   umrwr->wr.wr_cqe = &umr_context.cqe;
+
+   down(&umrc->sem);
+   err = ib_post_send(umrc->qp, &umrwr->wr, &bad);
+   if (err) {
+   mlx5_ib_warn(dev, "UMR post send failed, err %d\n", err);
+   } else {
+   wait_for_completion(&umr_context.done);
+   if (umr_context.status != IB_WC_SUCCESS) {
+   mlx5_ib_warn(dev, "reg umr failed (%u)\n",
+umr_context.status);
+   err = -EFAULT;
+   }
+   }
+   up(&umrc->sem);
+   return err;
+}
+
 static struct mlx5_ib_mr *reg_umr(struct ib_pd *pd, struct ib_umem *umem,
  u64 virt_addr, u64 len, int npages,
  int page_shift, int order, int access_flags)
 {
struct mlx5_ib_dev *dev = to_mdev(pd->device);
struct device *ddev = dev->ib_dev.dma_device;
-   struct umr_common *umrc = &dev->umrc;
-   struct mlx5_ib_umr_context umr_context;
struct mlx5_umr_wr umrwr = {};
-   struct ib_send_wr *bad;
struct mlx5_ib_mr *mr;
struct ib_sge sg;
int size;
@@ -894,24 +918,12 @@ static struct mlx5_ib_mr *reg_umr(struct ib_pd *pd, 
struct ib_umem *umem,
if (err)
goto free_mr;
 
-   mlx5_ib_init_umr_context(&umr_context);
-
-   umrwr.wr.wr_cqe = &umr_context.cqe;
prep_umr_reg_wqe(pd, &umrwr.wr, &sg, dma, npages, mr->mmkey.key,
 page_shift, virt_addr, len, access_flags);
 
-   down(&umrc->sem);
-   err = ib_post_send(umrc->qp, &umrwr.wr, &bad);
-   if (err) {
-   mlx5_ib_warn(dev, "post send failed, err %d\n", err);
+   err = mlx5_ib_post_send_wait(dev, &umrwr);
+   if (err && err != -EFAULT)
goto unmap_dma;
-   } else {
-   wait_for_completion(&umr_context.done);
-   if (umr_context.status != IB_WC_SUCCESS) {
-   mlx5_ib_warn(dev, "reg umr failed\n");
-   err = -EFAULT;
-   }
-   }
 
mr->mmkey.iova = virt_addr;
mr->mmkey.size = len;
@@ -920,7 +932,6 @@ static struct mlx5_ib_mr *reg_umr(struct ib_pd *pd, struct 
ib_umem *umem,
mr->live = 1;
 
 unmap_dma:
-   up(&umrc->sem);
dma_unmap_single(ddev, dma, size, DMA_TO_DEVICE);
 
kfree(mr_pas);
@@ -940,13 +951,10 @@ int mlx5_ib_update_mtt(struct mlx5_ib_mr *mr, u64 
start_page_index, int npages,
 {
struct mlx5_ib_dev *dev = mr->dev;
struct device *ddev = dev->ib_dev.dma_device;
-   struct umr_common *umrc = &dev->umrc;
-   struct mlx5_ib_umr_context umr_context;
struct ib_umem *umem = mr->umem;
int size;
__be64 *pas;
dma_addr_t dma;
-   struct ib_send_wr *bad;
struct mlx5_umr_wr wr;
struct ib_sge sg;
int err = 0;
@@ -1011,10 +1019,7 @@ int mlx5_ib_update_mtt(struct mlx5_ib_mr *mr, u64 
start_page_index, int npages,
 
dma_sync_single_for_device(ddev, dma, size, DMA_TO_DEVICE);
 
-   mlx5_ib_init_umr_context(&umr_context);
-
memset(&wr, 0, sizeof(wr));
-   wr.wr.wr_cqe = &umr_context.cqe;
 
sg.addr = dma;
sg.length = ALIGN(npages * sizeof(u64),
@@ -1031,19 +1036,7 @@ int mlx5_ib_update_mtt(struct mlx5_ib_mr *mr, u64 
start_page_index, int npages,
wr.mkey = mr->mmkey.key;
wr.target.offset = sta

[PATCH v2 5/6] powerpc/perf: Generic ima pmu event functions

2016-11-20 Thread Hemant Kumar
Since the IMA counters' data is periodically fed to a memory location,
the functions to read/update, start/stop, and add/del events can be
generic and shared by all IMA PMU units.

This patch adds a set of generic IMA pmu event functions to be used by
each IMA pmu unit. Add code to set up the format attribute and to
register the IMA pmus. Add an event_init function for nest IMA events.

Cc: Madhavan Srinivasan 
Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Anton Blanchard 
Cc: Sukadev Bhattiprolu 
Cc: Michael Neuling 
Cc: Stewart Smith 
Cc: Stephane Eranian 
Signed-off-by: Hemant Kumar 
---
 arch/powerpc/include/asm/ima-pmu.h|   2 +
 arch/powerpc/perf/ima-pmu.c   | 122 ++
 arch/powerpc/platforms/powernv/opal-ima.c |  37 +++--
 3 files changed, 154 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/ima-pmu.h 
b/arch/powerpc/include/asm/ima-pmu.h
index 0ed8886..f0d95bb 100644
--- a/arch/powerpc/include/asm/ima-pmu.h
+++ b/arch/powerpc/include/asm/ima-pmu.h
@@ -70,4 +70,6 @@ struct ima_pmu {
 
 #define UNKNOWN_DOMAIN -1
 
+int ima_get_domain(struct device_node *pmu_dev);
+
 #endif /* PPC_POWERNV_IMA_PMU_DEF_H */
diff --git a/arch/powerpc/perf/ima-pmu.c b/arch/powerpc/perf/ima-pmu.c
index 50d2226..9948636 100644
--- a/arch/powerpc/perf/ima-pmu.c
+++ b/arch/powerpc/perf/ima-pmu.c
@@ -17,6 +17,117 @@
 struct perchip_nest_info nest_perchip_info[IMA_MAX_CHIPS];
 struct ima_pmu *per_nest_pmu_arr[IMA_MAX_PMUS];
 
+/* Needed for sanity check */
+extern u64 nest_max_offset;
+
+PMU_FORMAT_ATTR(event, "config:0-20");
+static struct attribute *ima_format_attrs[] = {
+   &format_attr_event.attr,
+   NULL,
+};
+
+static struct attribute_group ima_format_group = {
+   .name = "format",
+   .attrs = ima_format_attrs,
+};
+
+static int nest_ima_event_init(struct perf_event *event)
+{
+   int chip_id;
+   u32 config = event->attr.config;
+   struct perchip_nest_info *pcni;
+
+   if (event->attr.type != event->pmu->type)
+   return -ENOENT;
+
+   /* Sampling not supported */
+   if (event->hw.sample_period)
+   return -EINVAL;
+
+   /* unsupported modes and filters */
+   if (event->attr.exclude_user   ||
+   event->attr.exclude_kernel ||
+   event->attr.exclude_hv ||
+   event->attr.exclude_idle   ||
+   event->attr.exclude_host   ||
+   event->attr.exclude_guest)
+   return -EINVAL;
+
+   if (event->cpu < 0)
+   return -EINVAL;
+
+   /* Sanity check for config (event offset) */
+   if (config > nest_max_offset)
+   return -EINVAL;
+
+   chip_id = topology_physical_package_id(event->cpu);
+   pcni = &nest_perchip_info[chip_id];
+   event->hw.event_base = pcni->vbase[config/PAGE_SIZE] +
+   (config & ~PAGE_MASK);
+
+   return 0;
+}
+
+static void ima_read_counter(struct perf_event *event)
+{
+   u64 *addr, data;
+
+   addr = (u64 *)event->hw.event_base;
+   data = __be64_to_cpu(*addr);
+   local64_set(&event->hw.prev_count, data);
+}
+
+static void ima_perf_event_update(struct perf_event *event)
+{
+   u64 counter_prev, counter_new, final_count, *addr;
+
+   addr = (u64 *)event->hw.event_base;
+   counter_prev = local64_read(&event->hw.prev_count);
+   counter_new = __be64_to_cpu(*addr);
+   final_count = counter_new - counter_prev;
+
+   local64_set(&event->hw.prev_count, counter_new);
+   local64_add(final_count, &event->count);
+}
+
+static void ima_event_start(struct perf_event *event, int flags)
+{
+   ima_read_counter(event);
+}
+
+static void ima_event_stop(struct perf_event *event, int flags)
+{
+   if (flags & PERF_EF_UPDATE)
+   ima_perf_event_update(event);
+}
+
+static int ima_event_add(struct perf_event *event, int flags)
+{
+   if (flags & PERF_EF_START)
+   ima_event_start(event, flags);
+
+   return 0;
+}
+
+/* update_pmu_ops : Populate the appropriate operations for "pmu" */
+static int update_pmu_ops(struct ima_pmu *pmu)
+{
+   if (!pmu)
+   return -EINVAL;
+
+   pmu->pmu.task_ctx_nr = perf_invalid_context;
+   pmu->pmu.event_init = nest_ima_event_init;
+   pmu->pmu.add = ima_event_add;
+   pmu->pmu.del = ima_event_stop;
+   pmu->pmu.start = ima_event_start;
+   pmu->pmu.stop = ima_event_stop;
+   pmu->pmu.read = ima_perf_event_update;
+   pmu->attr_groups[1] = &ima_format_group;
+   pmu->pmu.attr_groups = pmu->attr_groups;
+
+   return 0;
+}
+
 /* dev_str_attr : Populate event "name" and string "str" in attribute */
 static struct attribute *dev_str_attr(const char *name, const char *str)
 {
@@ -83,6 +194,17 @@ int init_ima_pmu(struct ima_events *events, int idx,
if (ret)
goto err_free;
 
+   ret = update_pmu_ops(pmu_

[PATCH v2 3/6] powerpc/powernv: Detect supported IMA units and its events

2016-11-20 Thread Hemant Kumar
Parse device tree to detect IMA units. Traverse through each IMA unit
node to find supported events and corresponding unit/scale files (if any).

Right now, only nest IMA units are supported.
The nest IMA unit event node from device tree will contain the offset in
the reserved memory region to get the counter data for a given
event. The offsets for the nest events are contained in the "reg"
property of the event "node".

Kernel code uses this offset as event configuration value.

Device tree parser code also looks for scale/unit property in the event
node and passes on the value as an event attr for perf interface to use
in the post processing by the perf tool. Some PMUs may have common scale
and unit properties which implies that all events supported by this PMU
inherit the scale and unit properties of the PMU itself. For those
events, we need to set the common unit and scale values.

If any unit or event fails to initialize, disable that unit and
continue setting up the rest.

Cc: Madhavan Srinivasan 
Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Anton Blanchard 
Cc: Sukadev Bhattiprolu 
Cc: Michael Neuling 
Cc: Stewart Smith 
Cc: Stephane Eranian 
Signed-off-by: Hemant Kumar 
---
Changelog :
v1 -> v2:
 - Read from the "event-name" property instead of "name" property for
   an event node.
 - Assign scale and unit values for events for a PMU which has a common
   unit and scale value.

 arch/powerpc/platforms/powernv/opal-ima.c | 332 ++
 1 file changed, 332 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/opal-ima.c 
b/arch/powerpc/platforms/powernv/opal-ima.c
index 446e7bc..e8d5771 100644
--- a/arch/powerpc/platforms/powernv/opal-ima.c
+++ b/arch/powerpc/platforms/powernv/opal-ima.c
@@ -32,6 +32,337 @@
 #include 
 
 struct perchip_nest_info nest_perchip_info[IMA_MAX_CHIPS];
+struct ima_pmu *per_nest_pmu_arr[IMA_MAX_PMUS];
+
+static int ima_event_info(char *name, struct ima_events *events)
+{
+   char *buf;
+
+   /* memory for content */
+   buf = kzalloc(IMA_MAX_PMU_NAME_LEN, GFP_KERNEL);
+   if (!buf)
+   return -ENOMEM;
+
+   events->ev_name = name;
+   events->ev_value = buf;
+   return 0;
+}
+
+static int ima_event_info_str(struct property *pp, char *name,
+  struct ima_events *events)
+{
+   int ret;
+
+   ret = ima_event_info(name, events);
+   if (ret)
+   return ret;
+
+   if (!pp->value || (strnlen(pp->value, pp->length) == pp->length) ||
+  (pp->length > IMA_MAX_PMU_NAME_LEN))
+   return -EINVAL;
+   strncpy(events->ev_value, (const char *)pp->value, pp->length);
+
+   return 0;
+}
+
+static int ima_event_info_val(char *name, u32 val,
+  struct ima_events *events)
+{
+   int ret;
+
+   ret = ima_event_info(name, events);
+   if (ret)
+   return ret;
+   sprintf(events->ev_value, "event=0x%x", val);
+
+   return 0;
+}
+
+static int set_event_property(struct property *pp, char *event_prop,
+ struct ima_events *events, char *ev_name)
+{
+   char *buf;
+   int ret;
+
+   buf = kzalloc(IMA_MAX_PMU_NAME_LEN, GFP_KERNEL);
+   if (!buf)
+   return -ENOMEM;
+
+   sprintf(buf, "%s.%s", ev_name, event_prop);
+   ret = ima_event_info_str(pp, buf, events);
+   if (ret) {
+   kfree(events->ev_name);
+   kfree(events->ev_value);
+   }
+
+   return ret;
+}
+
+/*
+ * ima_events_node_parser: Parse the event node "dev" and assign the parsed
+ * information to event "events".
+ *
+ * Parses the "reg" property of this event. "reg" gives us the event offset.
+ * Also, parse the "scale" and "unit" properties, if any.
+ */
+static int ima_events_node_parser(struct device_node *dev,
+ struct ima_events *events,
+ struct property *event_scale,
+ struct property *event_unit)
+{
+   struct property *name, *pp;
+   char *ev_name;
+   u32 val;
+   int idx = 0, ret;
+
+   if (!dev)
+   return -EINVAL;
+
+   /*
+* Loop through each property of an event node
+*/
+   name = of_find_property(dev, "event-name", NULL);
+   if (!name)
+   return -ENODEV;
+
+   if (!name->value ||
+ (strnlen(name->value, name->length) == name->length) ||
+ (name->length > IMA_MAX_PMU_NAME_LEN))
+   return -EINVAL;
+
+   ev_name = kzalloc(IMA_MAX_PMU_NAME_LEN, GFP_KERNEL);
+   if (!ev_name)
+   return -ENOMEM;
+
+   strncpy(ev_name, name->value, name->length);
+
+   /*
+* Parse each property of this event node "dev". Property "reg" has
+* the offset which is assigned to the event name. Other properties
+* like "scale" and

[PATCH v2 1/6] powerpc/powernv: Data structure and macros definitions

2016-11-20 Thread Hemant Kumar
Create a new header file "ima-pmu.h" to add the data structures
and macros needed for IMA pmu support.

Cc: Madhavan Srinivasan 
Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Anton Blanchard 
Cc: Sukadev Bhattiprolu 
Cc: Michael Neuling 
Cc: Stewart Smith 
Cc: Stephane Eranian 
Signed-off-by: Hemant Kumar 
---
 arch/powerpc/include/asm/ima-pmu.h | 73 ++
 1 file changed, 73 insertions(+)
 create mode 100644 arch/powerpc/include/asm/ima-pmu.h

diff --git a/arch/powerpc/include/asm/ima-pmu.h 
b/arch/powerpc/include/asm/ima-pmu.h
new file mode 100644
index 000..0ed8886
--- /dev/null
+++ b/arch/powerpc/include/asm/ima-pmu.h
@@ -0,0 +1,73 @@
+#ifndef PPC_POWERNV_IMA_PMU_DEF_H
+#define PPC_POWERNV_IMA_PMU_DEF_H
+
+/*
+ * Nest Performance Monitor counter support.
+ *
+ * Copyright (C) 2016 Madhavan Srinivasan, IBM Corporation.
+ *   (C) 2016 Hemant K Shaw, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define IMA_MAX_CHIPS  32
+#define IMA_MAX_PMUS   32
+#define IMA_MAX_PMU_NAME_LEN   256
+
+#define NEST_IMA_ENGINE_START  1
+#define NEST_IMA_ENGINE_STOP   0
+#define NEST_MAX_PAGES 16
+
+#define NEST_IMA_PRODUCTION_MODE   1
+
+#define IMA_DTB_COMPAT "ibm,opal-in-memory-counters"
+#define IMA_DTB_NEST_COMPAT"ibm,ima-counters-chip"
+
+/*
+ * Structure to hold per chip specific memory address
+ * information for nest pmus. Nest Counter data are exported
+ * in per-chip reserved memory region by the PORE Engine.
+ */
+struct perchip_nest_info {
+   u32 chip_id;
+   u64 pbase;
+   u64 vbase[NEST_MAX_PAGES];
+   u64 size;
+};
+
+/*
+ * Place holder for nest pmu events and values.
+ */
+struct ima_events {
+   char *ev_name;
+   char *ev_value;
+};
+
+/*
+ * Device tree parser code detects IMA pmu support and
+ * registers new IMA pmus. This structure will
+ * hold the pmu functions and attrs for each ima pmu and
+ * will be referenced at the time of pmu registration.
+ */
+struct ima_pmu {
+   struct pmu pmu;
+   int domain;
+   const struct attribute_group *attr_groups[4];
+};
+
+/*
+ * Domains for IMA PMUs
+ */
+#define IMA_DOMAIN_NEST1
+
+#define UNKNOWN_DOMAIN -1
+
+#endif /* PPC_POWERNV_IMA_PMU_DEF_H */
-- 
2.7.4



[PATCH v2 6/6] powerpc/perf: IMA pmu cpumask and cpu hotplug support

2016-11-20 Thread Hemant Kumar
Add a cpumask attribute to be used by each IMA pmu. For nest PMUs, only
one CPU (any online CPU) from each chip is designated to read counters.

On CPU hotplug, the dying CPU is checked to see whether it is one of the
designated CPUs; if it is, the next online CPU from the same chip (for
nest units) is designated as the new CPU to read counters.

Cc: Madhavan Srinivasan 
Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Anton Blanchard 
Cc: Sukadev Bhattiprolu 
Cc: Michael Neuling 
Cc: Stewart Smith 
Cc: Stephane Eranian 
Signed-off-by: Hemant Kumar 
---
 arch/powerpc/include/asm/opal-api.h|   3 +-
 arch/powerpc/include/asm/opal.h|   2 +
 arch/powerpc/perf/ima-pmu.c| 167 -
 arch/powerpc/platforms/powernv/opal-wrappers.S |   1 +
 4 files changed, 171 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/opal-api.h 
b/arch/powerpc/include/asm/opal-api.h
index 0e2e57b..116c155 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -167,7 +167,8 @@
 #define OPAL_INT_EOI   124
 #define OPAL_INT_SET_MFRR  125
 #define OPAL_PCI_TCE_KILL  126
-#define OPAL_LAST  126
+#define OPAL_NEST_IMA_COUNTERS_CONTROL  128
+#define OPAL_LAST  128
 
 /* Device tree flags */
 
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index e958b70..bc31251 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -229,6 +229,8 @@ int64_t opal_pci_tce_kill(uint64_t phb_id, uint32_t 
kill_type,
 int64_t opal_rm_pci_tce_kill(uint64_t phb_id, uint32_t kill_type,
 uint32_t pe_num, uint32_t tce_size,
 uint64_t dma_addr, uint32_t npages);
+int64_t opal_nest_ima_counters_control(uint64_t mode, uint64_t value1,
+ uint64_t value2, uint64_t value3);
 
 /* Internal functions */
 extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
diff --git a/arch/powerpc/perf/ima-pmu.c b/arch/powerpc/perf/ima-pmu.c
index 9948636..2b1bfc1 100644
--- a/arch/powerpc/perf/ima-pmu.c
+++ b/arch/powerpc/perf/ima-pmu.c
@@ -16,6 +16,7 @@
 
 struct perchip_nest_info nest_perchip_info[IMA_MAX_CHIPS];
 struct ima_pmu *per_nest_pmu_arr[IMA_MAX_PMUS];
+static cpumask_t nest_ima_cpumask;
 
 /* Needed for sanity check */
 extern u64 nest_max_offset;
@@ -31,6 +32,164 @@ static struct attribute_group ima_format_group = {
.attrs = ima_format_attrs,
 };
 
+/* Get the cpumask printed to a buffer "buf" */
+static ssize_t ima_pmu_cpumask_get_attr(struct device *dev,
+   struct device_attribute *attr, char *buf)
+{
+   cpumask_t *active_mask;
+
+   active_mask = &nest_ima_cpumask;
+   return cpumap_print_to_pagebuf(true, buf, active_mask);
+}
+
+static DEVICE_ATTR(cpumask, S_IRUGO, ima_pmu_cpumask_get_attr, NULL);
+
+static struct attribute *ima_pmu_cpumask_attrs[] = {
+   &dev_attr_cpumask.attr,
+   NULL,
+};
+
+static struct attribute_group ima_pmu_cpumask_attr_group = {
+   .attrs = ima_pmu_cpumask_attrs,
+};
+
+/*
+ * nest_init : Initializes the nest ima engine for the current chip.
+ */
+static void nest_init(int *loc)
+{
+   int rc;
+
+   rc = opal_nest_ima_counters_control(NEST_IMA_PRODUCTION_MODE,
+   NEST_IMA_ENGINE_START, 0, 0);
+   if (rc)
+   loc[smp_processor_id()] = 1;
+}
+
+static void nest_change_cpu_context(int old_cpu, int new_cpu)
+{
+   int i;
+
+   for (i = 0;
+(i < IMA_MAX_PMUS) && (per_nest_pmu_arr[i] != NULL); i++)
+   perf_pmu_migrate_context(&per_nest_pmu_arr[i]->pmu,
+   old_cpu, new_cpu);
+}
+
+static int ppc_nest_ima_cpu_online(unsigned int cpu)
+{
+   int nid, fcpu, ncpu;
+   struct cpumask *l_cpumask, tmp_mask;
+
+   /* Find the cpumask of this node */
+   nid = cpu_to_node(cpu);
+   l_cpumask = cpumask_of_node(nid);
+
+   /*
+* If any of the cpu from this node is already present in the mask,
+* just return, if not, then set this cpu in the mask.
+*/
+   if (!cpumask_and(&tmp_mask, l_cpumask, &nest_ima_cpumask)) {
+   cpumask_set_cpu(cpu, &nest_ima_cpumask);
+   return 0;
+   }
+
+   fcpu = cpumask_first(l_cpumask);
+   ncpu = cpumask_next(cpu, l_cpumask);
+   if (cpu == fcpu) {
+   if (cpumask_test_and_clear_cpu(ncpu, &nest_ima_cpumask)) {
+   cpumask_set_cpu(cpu, &nest_ima_cpumask);
+   nest_change_cpu_context(ncpu, cpu);
+   }
+   }
+
+   return 0;
+}
+
+static int ppc_nest_ima_cpu_offline(unsigned int cpu)
+{
+   int nid, target = -1;
+   struct cpumask *l_cpu

[PATCH v5 9/9] IB/mlx5: Replace semaphore umr_common:sem with wait_event

2016-11-20 Thread Binoy Jayan
Remove the semaphore umr_common:sem used to limit concurrent access to the
UMR QP and introduce an atomic counter 'users' to keep track of the same.
Use a wait_event to block when the limit is reached.

Signed-off-by: Binoy Jayan 
---
 drivers/infiniband/hw/mlx5/main.c| 6 +-
 drivers/infiniband/hw/mlx5/mlx5_ib.h | 7 ++-
 drivers/infiniband/hw/mlx5/mr.c  | 6 --
 3 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index 63036c7..9de716c 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -2437,10 +2437,6 @@ static void destroy_umrc_res(struct mlx5_ib_dev *dev)
ib_dealloc_pd(dev->umrc.pd);
 }
 
-enum {
-   MAX_UMR_WR = 128,
-};
-
 static int create_umr_res(struct mlx5_ib_dev *dev)
 {
struct ib_qp_init_attr *init_attr = NULL;
@@ -2520,7 +2516,7 @@ static int create_umr_res(struct mlx5_ib_dev *dev)
dev->umrc.cq = cq;
dev->umrc.pd = pd;
 
-   sema_init(&dev->umrc.sem, MAX_UMR_WR);
+   init_waitqueue_head(&dev->umrc.wq);
ret = mlx5_mr_cache_init(dev);
if (ret) {
mlx5_ib_warn(dev, "mr cache init failed %d\n", ret);
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h 
b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index dcdcd19..de31b5f 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -533,7 +533,12 @@ struct umr_common {
struct ib_qp*qp;
/* control access to UMR QP
 */
-   struct semaphoresem;
+   wait_queue_head_t   wq;
+   atomic_tusers;
+};
+
+enum {
+   MAX_UMR_WR = 128,
 };
 
 enum {
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 1593856..dfaf6f6 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -867,7 +867,8 @@ static inline int mlx5_ib_post_send_wait(struct mlx5_ib_dev 
*dev,
mlx5_ib_init_umr_context(&umr_context);
umrwr->wr.wr_cqe = &umr_context.cqe;
 
-   down(&umrc->sem);
+   /* limit number of concurrent ib_post_send() on qp */
+   wait_event(umrc->wq, atomic_add_unless(&umrc->users, 1, MAX_UMR_WR));
err = ib_post_send(umrc->qp, &umrwr->wr, &bad);
if (err) {
mlx5_ib_warn(dev, "UMR post send failed, err %d\n", err);
@@ -879,7 +880,8 @@ static inline int mlx5_ib_post_send_wait(struct mlx5_ib_dev 
*dev,
err = -EFAULT;
}
}
-   up(&umrc->sem);
+   atomic_dec(&umrc->users);
+   wake_up(&umrc->wq);
return err;
 }
 
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[PATCH v5 3/9] IB/hns: Replace semaphore poll_sem with mutex

2016-11-20 Thread Binoy Jayan
The semaphore 'poll_sem' is a simple mutex, so it should be written as one.
Semaphores are going away in the future, so replace it with a mutex. Also,
remove the locking of poll_sem from hns_roce_cmd_use_events and
hns_roce_cmd_use_polling respectively.

Signed-off-by: Binoy Jayan 
---
 drivers/infiniband/hw/hns/hns_roce_cmd.c| 11 ---
 drivers/infiniband/hw/hns/hns_roce_device.h |  3 ++-
 2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_cmd.c 
b/drivers/infiniband/hw/hns/hns_roce_cmd.c
index 2a0b6c0..51a0675 100644
--- a/drivers/infiniband/hw/hns/hns_roce_cmd.c
+++ b/drivers/infiniband/hw/hns/hns_roce_cmd.c
@@ -119,7 +119,7 @@ static int hns_roce_cmd_mbox_post_hw(struct hns_roce_dev 
*hr_dev, u64 in_param,
return ret;
 }
 
-/* this should be called with "poll_sem" */
+/* this should be called with "poll_mutex" */
 static int __hns_roce_cmd_mbox_poll(struct hns_roce_dev *hr_dev, u64 in_param,
u64 out_param, unsigned long in_modifier,
u8 op_modifier, u16 op,
@@ -167,10 +167,10 @@ static int hns_roce_cmd_mbox_poll(struct hns_roce_dev 
*hr_dev, u64 in_param,
 {
int ret;
 
-   down(&hr_dev->cmd.poll_sem);
+   mutex_lock(&hr_dev->cmd.poll_mutex);
ret = __hns_roce_cmd_mbox_poll(hr_dev, in_param, out_param, in_modifier,
   op_modifier, op, timeout);
-   up(&hr_dev->cmd.poll_sem);
+   mutex_unlock(&hr_dev->cmd.poll_mutex);
 
return ret;
 }
@@ -275,7 +275,7 @@ int hns_roce_cmd_init(struct hns_roce_dev *hr_dev)
struct device *dev = &hr_dev->pdev->dev;
 
mutex_init(&hr_dev->cmd.hcr_mutex);
-   sema_init(&hr_dev->cmd.poll_sem, 1);
+   mutex_init(&hr_dev->cmd.poll_mutex);
hr_dev->cmd.use_events = 0;
hr_dev->cmd.toggle = 1;
hr_dev->cmd.max_cmds = CMD_MAX_NUM;
@@ -319,8 +319,6 @@ int hns_roce_cmd_use_events(struct hns_roce_dev *hr_dev)
hr_cmd->token_mask = CMD_TOKEN_MASK;
hr_cmd->use_events = 1;
 
-   down(&hr_cmd->poll_sem);
-
return 0;
 }
 
@@ -335,7 +333,6 @@ void hns_roce_cmd_use_polling(struct hns_roce_dev *hr_dev)
down(&hr_cmd->event_sem);
 
kfree(hr_cmd->context);
-   up(&hr_cmd->poll_sem);
 }
 
 struct hns_roce_cmd_mailbox
diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h 
b/drivers/infiniband/hw/hns/hns_roce_device.h
index 3417315..2afe075 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -34,6 +34,7 @@
 #define _HNS_ROCE_DEVICE_H
 
 #include 
+#include 
 
 #define DRV_NAME "hns_roce"
 
@@ -358,7 +359,7 @@ struct hns_roce_cmdq {
struct dma_pool *pool;
u8 __iomem  *hcr;
struct mutexhcr_mutex;
-   struct semaphorepoll_sem;
+   struct mutexpoll_mutex;
/*
* Event mode: cmd register mutex protection,
* ensure to not exceed max_cmds and user use limit region
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[PATCH v5 1/9] IB/core: iwpm_nlmsg_request: Replace semaphore with completion

2016-11-20 Thread Binoy Jayan
The semaphore 'sem' in iwpm_nlmsg_request is used as a completion, so
convert it to a struct completion type. Semaphores are going away in
the future.

Signed-off-by: Binoy Jayan 
---
 drivers/infiniband/core/iwpm_msg.c  | 8 
 drivers/infiniband/core/iwpm_util.c | 7 +++
 drivers/infiniband/core/iwpm_util.h | 3 ++-
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/core/iwpm_msg.c 
b/drivers/infiniband/core/iwpm_msg.c
index 1c41b95..761358f 100644
--- a/drivers/infiniband/core/iwpm_msg.c
+++ b/drivers/infiniband/core/iwpm_msg.c
@@ -394,7 +394,7 @@ int iwpm_register_pid_cb(struct sk_buff *skb, struct 
netlink_callback *cb)
/* always for found nlmsg_request */
kref_put(&nlmsg_request->kref, iwpm_free_nlmsg_request);
barrier();
-   up(&nlmsg_request->sem);
+   complete(&nlmsg_request->comp);
return 0;
 }
 EXPORT_SYMBOL(iwpm_register_pid_cb);
@@ -463,7 +463,7 @@ int iwpm_add_mapping_cb(struct sk_buff *skb, struct 
netlink_callback *cb)
/* always for found request */
kref_put(&nlmsg_request->kref, iwpm_free_nlmsg_request);
barrier();
-   up(&nlmsg_request->sem);
+   complete(&nlmsg_request->comp);
return 0;
 }
 EXPORT_SYMBOL(iwpm_add_mapping_cb);
@@ -555,7 +555,7 @@ int iwpm_add_and_query_mapping_cb(struct sk_buff *skb,
/* always for found request */
kref_put(&nlmsg_request->kref, iwpm_free_nlmsg_request);
barrier();
-   up(&nlmsg_request->sem);
+   complete(&nlmsg_request->comp);
return 0;
 }
 EXPORT_SYMBOL(iwpm_add_and_query_mapping_cb);
@@ -749,7 +749,7 @@ int iwpm_mapping_error_cb(struct sk_buff *skb, struct 
netlink_callback *cb)
/* always for found request */
kref_put(&nlmsg_request->kref, iwpm_free_nlmsg_request);
barrier();
-   up(&nlmsg_request->sem);
+   complete(&nlmsg_request->comp);
return 0;
 }
 EXPORT_SYMBOL(iwpm_mapping_error_cb);
diff --git a/drivers/infiniband/core/iwpm_util.c 
b/drivers/infiniband/core/iwpm_util.c
index ade71e7..08ddd2e 100644
--- a/drivers/infiniband/core/iwpm_util.c
+++ b/drivers/infiniband/core/iwpm_util.c
@@ -323,8 +323,7 @@ struct iwpm_nlmsg_request *iwpm_get_nlmsg_request(__u32 
nlmsg_seq,
nlmsg_request->nl_client = nl_client;
nlmsg_request->request_done = 0;
nlmsg_request->err_code = 0;
-   sema_init(&nlmsg_request->sem, 1);
-   down(&nlmsg_request->sem);
+   init_completion(&nlmsg_request->comp);
return nlmsg_request;
 }
 
@@ -368,8 +367,8 @@ int iwpm_wait_complete_req(struct iwpm_nlmsg_request 
*nlmsg_request)
 {
int ret;
 
-   ret = down_timeout(&nlmsg_request->sem, IWPM_NL_TIMEOUT);
-   if (ret) {
+   ret = wait_for_completion_timeout(&nlmsg_request->comp, 
IWPM_NL_TIMEOUT);
+   if (!ret) {
ret = -EINVAL;
pr_info("%s: Timeout %d sec for netlink request (seq = %u)\n",
__func__, (IWPM_NL_TIMEOUT/HZ), 
nlmsg_request->nlmsg_seq);
diff --git a/drivers/infiniband/core/iwpm_util.h 
b/drivers/infiniband/core/iwpm_util.h
index af1fc14..ea6c299 100644
--- a/drivers/infiniband/core/iwpm_util.h
+++ b/drivers/infiniband/core/iwpm_util.h
@@ -43,6 +43,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -69,7 +70,7 @@ struct iwpm_nlmsg_request {
u8  nl_client;
u8  request_done;
u16 err_code;
-   struct semaphoresem;
+   struct completion   comp;
struct kref kref;
 };
 
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[PATCH v5 0/9] infiniband: Remove semaphores

2016-11-20 Thread Binoy Jayan

Hi,

These are a set of patches [v5] which remove semaphores from infiniband.
They are part of a bigger effort to eliminate all semaphores from the
Linux kernel.

v4 --> v5
-
IB/isert: Replace semaphore sem with completion
  - Modified changelog to support use of completion
IB/mlx5: Simplify completion into a wait_event
  - Avoid this patch.
As umr_context is on the stack, and we are waiting
for it to be fully done, it really should be a completion.

v3 -> v4:
-

IB/mlx5: Added patch - Replace semaphore umr_common:sem with wait_event
IB/mlx5: Fixed a bug pointed out by Leon Romanovsky

v2 -> v3:
-

IB/mlx5: Move '&umr_context' into helper fn
IB/mthca: Restructure mthca_cmd.c to manage free_head
IB/hns: Restructure hns_roce_cmd.c to manage free_head
IB/core: Convert completion to wait_event
IB/mlx5: Simplify completion into a wait_event

v1 -> v2:
-

IB/hns   : Use wait_event instead of open coding counting semaphores
IB/mthca : Use wait_event instead of open coding counting semaphores
IB/mthca : Remove mutex_[un]lock from *_cmd_use_events/*_cmd_use_polling
IB/mlx5  : Cleanup, add helper mlx5_ib_post_send_wait

v1
-
  IB/core: iwpm_nlmsg_request: Replace semaphore with completion
  IB/core: Replace semaphore sm_sem with completion
  IB/hns: Replace semaphore poll_sem with mutex
  IB/mthca: Replace semaphore poll_sem with mutex
  IB/isert: Replace semaphore sem with completion
  IB/hns: Replace counting semaphore event_sem with wait condition
  IB/mthca: Replace counting semaphore event_sem with wait condition
  IB/mlx5: Replace counting semaphore sem with wait condition

Thanks,
Binoy

Binoy Jayan (9):
  IB/core: iwpm_nlmsg_request: Replace semaphore with completion
  IB/core: Replace semaphore sm_sem with an atomic wait
  IB/hns: Replace semaphore poll_sem with mutex
  IB/mthca: Replace semaphore poll_sem with mutex
  IB/isert: Replace semaphore sem with completion
  IB/hns: Replace counting semaphore event_sem with wait_event
  IB/mthca: Replace counting semaphore event_sem with wait_event
  IB/mlx5: Add helper mlx5_ib_post_send_wait
  IB/mlx5: Replace semaphore umr_common:sem with wait_event

 drivers/infiniband/core/iwpm_msg.c  |   8 +-
 drivers/infiniband/core/iwpm_util.c |   7 +-
 drivers/infiniband/core/iwpm_util.h |   3 +-
 drivers/infiniband/core/user_mad.c  |  20 +++--
 drivers/infiniband/hw/hns/hns_roce_cmd.c|  57 +-
 drivers/infiniband/hw/hns/hns_roce_device.h |   5 +-
 drivers/infiniband/hw/mlx5/main.c   |   6 +-
 drivers/infiniband/hw/mlx5/mlx5_ib.h|   7 +-
 drivers/infiniband/hw/mlx5/mr.c | 117 
 drivers/infiniband/hw/mthca/mthca_cmd.c |  57 --
 drivers/infiniband/hw/mthca/mthca_cmd.h |   1 +
 drivers/infiniband/hw/mthca/mthca_dev.h |   5 +-
 drivers/infiniband/ulp/isert/ib_isert.c |   6 +-
 drivers/infiniband/ulp/isert/ib_isert.h |   3 +-
 14 files changed, 147 insertions(+), 155 deletions(-)

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



Re: [PATCH V2 2/2] pinctrl: tegra: Add driver to configure voltage and power of io pads

2016-11-20 Thread kbuild test robot
Hi Laxman,

[auto build test ERROR on tegra/for-next]
[also build test ERROR on v4.9-rc6 next-20161117]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Laxman-Dewangan/pinctrl-tegra-Add-support-for-IO-pad-control/20161109-215733
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux.git for-next
config: arm-allmodconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
wget 
https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
 -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=arm 

All errors (new ones prefixed by >>):

   drivers/pinctrl/tegra/pinctrl-tegra-io-pad.c: In function 
'tegra_io_pad_pinconf_get':
>> drivers/pinctrl/tegra/pinctrl-tegra-io-pad.c:113:9: error: implicit 
>> declaration of function 'tegra_io_pad_power_get_status' 
>> [-Werror=implicit-function-declaration]
  ret = tegra_io_pad_power_get_status(pad_id);
^
   cc1: some warnings being treated as errors

vim +/tegra_io_pad_power_get_status +113 
drivers/pinctrl/tegra/pinctrl-tegra-io-pad.c

   107  enum tegra_io_pad pad_id = pads_cfg->pad_id;
   108  int arg = 0;
   109  int ret;
   110  
   111  switch (param) {
   112  case PIN_CONFIG_LOW_POWER_MODE:
 > 113  ret = tegra_io_pad_power_get_status(pad_id);
   114  if (ret < 0)
   115  return ret;
   116  arg = !ret;

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


[PATCH v2 2/4] spi: spi-fsl-dspi: Fix continuous selection format

2016-11-20 Thread Sanchayan Maity
The current DMA implementation was not handling the continuous selection
format, i.e. the SPI chip select would be deasserted even between sequential
serial transfers. Use the cs_change variable and correctly set or clear
the CONT bit accordingly for the case where peripherals require the
chip select to remain asserted between sequential transfers.

Signed-off-by: Sanchayan Maity 
---
 drivers/spi/spi-fsl-dspi.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/spi/spi-fsl-dspi.c b/drivers/spi/spi-fsl-dspi.c
index b1ee1f5..41422cd 100644
--- a/drivers/spi/spi-fsl-dspi.c
+++ b/drivers/spi/spi-fsl-dspi.c
@@ -261,6 +261,8 @@ static int dspi_next_xfer_dma_submit(struct fsl_dspi *dspi)
dspi->dma->tx_dma_buf[i] = SPI_PUSHR_TXDATA(val) |
SPI_PUSHR_PCS(dspi->cs) |
SPI_PUSHR_CTAS(0);
+   if (!dspi->cs_change)
+   dspi->dma->tx_dma_buf[i] |= SPI_PUSHR_CONT;
dspi->tx += tx_word + 1;
 
dma->tx_desc = dmaengine_prep_slave_single(dma->chan_tx,
-- 
2.10.2



[PATCH v2 4/4] spi: spi-fsl-dspi: Minor code cleanup and error path fixes

2016-11-20 Thread Sanchayan Maity
Code cleanup to improve readability, plus error path fixes: free the
DMA buffers with dma_free_coherent instead of the incorrect devm_kfree.

Signed-off-by: Sanchayan Maity 
---
 drivers/spi/spi-fsl-dspi.c | 22 --
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/drivers/spi/spi-fsl-dspi.c b/drivers/spi/spi-fsl-dspi.c
index 08882f7..2987a16 100644
--- a/drivers/spi/spi-fsl-dspi.c
+++ b/drivers/spi/spi-fsl-dspi.c
@@ -226,8 +226,10 @@ static void dspi_rx_dma_callback(void *arg)
if (!(dspi->dataflags & TRAN_STATE_RX_VOID)) {
for (i = 0; i < dma->curr_xfer_len; i++) {
d = dspi->dma->rx_dma_buf[i];
-   rx_word ? (*(u16 *)dspi->rx = d) :
-   (*(u8 *)dspi->rx = d);
+   if (rx_word)
+   *(u16 *)dspi->rx = d;
+   else
+   *(u8 *)dspi->rx = d;
dspi->rx += rx_word + 1;
}
}
@@ -247,14 +249,20 @@ static int dspi_next_xfer_dma_submit(struct fsl_dspi 
*dspi)
tx_word = is_double_byte_mode(dspi);
 
for (i = 0; i < dma->curr_xfer_len - 1; i++) {
-   val = tx_word ? *(u16 *) dspi->tx : *(u8 *) dspi->tx;
+   if (tx_word)
+   val = *(u16 *) dspi->tx;
+   else
+   val = *(u8 *) dspi->tx;
dspi->dma->tx_dma_buf[i] =
SPI_PUSHR_TXDATA(val) | SPI_PUSHR_PCS(dspi->cs) |
SPI_PUSHR_CTAS(0) | SPI_PUSHR_CONT;
dspi->tx += tx_word + 1;
}
 
-   val = tx_word ? *(u16 *) dspi->tx : *(u8 *) dspi->tx;
+   if (tx_word)
+   val = *(u16 *) dspi->tx;
+   else
+   val = *(u8 *) dspi->tx;
dspi->dma->tx_dma_buf[i] = SPI_PUSHR_TXDATA(val) |
SPI_PUSHR_PCS(dspi->cs) |
SPI_PUSHR_CTAS(0);
@@ -430,9 +438,11 @@ static int dspi_request_dma(struct fsl_dspi *dspi, 
phys_addr_t phy_addr)
return 0;
 
 err_slave_config:
-   devm_kfree(dev, dma->rx_dma_buf);
+   dma_free_coherent(dev, DSPI_DMA_BUFSIZE,
+   dma->rx_dma_buf, dma->rx_dma_phys);
 err_rx_dma_buf:
-   devm_kfree(dev, dma->tx_dma_buf);
+   dma_free_coherent(dev, DSPI_DMA_BUFSIZE,
+   dma->tx_dma_buf, dma->tx_dma_phys);
 err_tx_dma_buf:
dma_release_channel(dma->chan_tx);
 err_tx_channel:
-- 
2.10.2



[PATCH v2 0/4] Fixes for Vybrid SPI DMA implementation

2016-11-20 Thread Sanchayan Maity
Hello,

The following set of patches has fixes for the Vybrid SPI DMA
implementation, along with some minor cleanups requested at the time
the v3 version of the SPI DMA support patch was accepted.

This series of patches is based on top of branch topic/fsl-dspi.
http://git.kernel.org/cgit/linux/kernel/git/broonie/spi.git/log/?h=topic/fsl-dspi

The patches have been tested on a Toradex Colibri Vybrid VF61 module
and now incorporate feedback from Stefan on version 1 of the patchset.

Changes since v1:
1. Place the continuous selection format patch second in order and remove
code duplication
2. Improve the use of curr_xfer_len and instead of converting from bytes
to DMA transfers in every use, do it at a single place. Accordingly change
it's use at other places
3. Code cleanup patch has less to clean with change above 

v1:
http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1274632.html

Thanks & Regards,
Sanchayan.

Sanchayan Maity (4):
  spi: spi-fsl-dspi: Fix SPI transfer issue when using multiple SPI_IOC_MESSAGE
  spi: spi-fsl-dspi: Fix continuous selection format
  spi: spi-fsl-dspi: Fix incorrect DMA setup
  spi: spi-fsl-dspi: Minor code cleanup and error path fixes

 drivers/spi/spi-fsl-dspi.c | 71 --
 1 file changed, 44 insertions(+), 27 deletions(-)

-- 
2.10.2



[PATCH v2 3/4] spi: spi-fsl-dspi: Fix incorrect DMA setup

2016-11-20 Thread Sanchayan Maity
Currently dmaengine_prep_slave_single was being called with length
set to the complete DMA buffer size. This resulted in unwanted bytes
being transferred to the SPI register leading to clock and MOSI lines
having unwanted data even after chip select got deasserted and the
required bytes having been transferred.

While at it, also clean up the use of curr_xfer_len, which is central
to the DMA setup, by converting from bytes to DMA transfers in a single
place instead of at every use.

Signed-off-by: Sanchayan Maity 
---
 drivers/spi/spi-fsl-dspi.c | 35 ++-
 1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/drivers/spi/spi-fsl-dspi.c b/drivers/spi/spi-fsl-dspi.c
index 41422cd..08882f7 100644
--- a/drivers/spi/spi-fsl-dspi.c
+++ b/drivers/spi/spi-fsl-dspi.c
@@ -151,6 +151,7 @@ static const struct fsl_dspi_devtype_data ls2085a_data = {
 };
 
 struct fsl_dspi_dma {
+   /* Length of transfer in words of DSPI_FIFO_SIZE */
u32 curr_xfer_len;
 
u32 *tx_dma_buf;
@@ -217,15 +218,13 @@ static void dspi_rx_dma_callback(void *arg)
struct fsl_dspi *dspi = arg;
struct fsl_dspi_dma *dma = dspi->dma;
int rx_word;
-   int i, len;
+   int i;
u16 d;
 
rx_word = is_double_byte_mode(dspi);
 
-   len = rx_word ? (dma->curr_xfer_len / 2) : dma->curr_xfer_len;
-
if (!(dspi->dataflags & TRAN_STATE_RX_VOID)) {
-   for (i = 0; i < len; i++) {
+   for (i = 0; i < dma->curr_xfer_len; i++) {
d = dspi->dma->rx_dma_buf[i];
rx_word ? (*(u16 *)dspi->rx = d) :
(*(u8 *)dspi->rx = d);
@@ -242,14 +241,12 @@ static int dspi_next_xfer_dma_submit(struct fsl_dspi 
*dspi)
struct device *dev = &dspi->pdev->dev;
int time_left;
int tx_word;
-   int i, len;
+   int i;
u16 val;
 
tx_word = is_double_byte_mode(dspi);
 
-   len = tx_word ? (dma->curr_xfer_len / 2) : dma->curr_xfer_len;
-
-   for (i = 0; i < len - 1; i++) {
+   for (i = 0; i < dma->curr_xfer_len - 1; i++) {
val = tx_word ? *(u16 *) dspi->tx : *(u8 *) dspi->tx;
dspi->dma->tx_dma_buf[i] =
SPI_PUSHR_TXDATA(val) | SPI_PUSHR_PCS(dspi->cs) |
@@ -267,7 +264,9 @@ static int dspi_next_xfer_dma_submit(struct fsl_dspi *dspi)
 
dma->tx_desc = dmaengine_prep_slave_single(dma->chan_tx,
dma->tx_dma_phys,
-   DSPI_DMA_BUFSIZE, DMA_MEM_TO_DEV,
+   dma->curr_xfer_len *
+   DMA_SLAVE_BUSWIDTH_4_BYTES,
+   DMA_MEM_TO_DEV,
DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
if (!dma->tx_desc) {
dev_err(dev, "Not able to get desc for DMA xfer\n");
@@ -283,7 +282,9 @@ static int dspi_next_xfer_dma_submit(struct fsl_dspi *dspi)
 
dma->rx_desc = dmaengine_prep_slave_single(dma->chan_rx,
dma->rx_dma_phys,
-   DSPI_DMA_BUFSIZE, DMA_DEV_TO_MEM,
+   dma->curr_xfer_len *
+   DMA_SLAVE_BUSWIDTH_4_BYTES,
+   DMA_DEV_TO_MEM,
DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
if (!dma->rx_desc) {
dev_err(dev, "Not able to get desc for DMA xfer\n");
@@ -330,17 +331,17 @@ static int dspi_dma_xfer(struct fsl_dspi *dspi)
struct device *dev = &dspi->pdev->dev;
int curr_remaining_bytes;
int bytes_per_buffer;
-   int tx_word;
+   int word = 1;
int ret = 0;
 
-   tx_word = is_double_byte_mode(dspi);
+   if (is_double_byte_mode(dspi))
+   word = 2;
curr_remaining_bytes = dspi->len;
+   bytes_per_buffer = DSPI_DMA_BUFSIZE / DSPI_FIFO_SIZE;
while (curr_remaining_bytes) {
/* Check if current transfer fits the DMA buffer */
-   dma->curr_xfer_len = curr_remaining_bytes;
-   bytes_per_buffer = DSPI_DMA_BUFSIZE /
-   (DSPI_FIFO_SIZE / (tx_word ? 2 : 1));
-   if (curr_remaining_bytes > bytes_per_buffer)
+   dma->curr_xfer_len = curr_remaining_bytes / word;
+   if (dma->curr_xfer_len > bytes_per_buffer)
dma->curr_xfer_len = bytes_per_buffer;
 
ret = dspi_next_xfer_dma_submit(dspi);
@@ -349,7 +350,7 @@ static int dspi_dma_xfer(struct fsl_dspi *dspi)
goto exit;
 
} else {
-   curr_remaining_bytes -= dma->curr_xfer_len;
+   curr_remaining_bytes -= dma->curr_xfer_len * word;
if (curr_remaining_bytes < 0)
curr_

[PATCH v2 1/4] spi: spi-fsl-dspi: Fix SPI transfer issue when using multiple SPI_IOC_MESSAGE

2016-11-20 Thread Sanchayan Maity
The current DMA implementation had a bug where the DMA transfer would
exit the loop in dspi_transfer_one_message after the completion of a
single transfer. This caused a multi-message transfer submitted with
SPI_IOC_MESSAGE to terminate incorrectly without an error.

Signed-off-by: Sanchayan Maity 
Reviewed-by: Stefan Agner 
---
 drivers/spi/spi-fsl-dspi.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/spi/spi-fsl-dspi.c b/drivers/spi/spi-fsl-dspi.c
index bc64700..b1ee1f5 100644
--- a/drivers/spi/spi-fsl-dspi.c
+++ b/drivers/spi/spi-fsl-dspi.c
@@ -714,7 +714,7 @@ static int dspi_transfer_one_message(struct spi_master 
*master,
SPI_RSER_TFFFE | SPI_RSER_TFFFD |
SPI_RSER_RFDFE | SPI_RSER_RFDFD);
status = dspi_dma_xfer(dspi);
-   goto out;
+   break;
default:
dev_err(&dspi->pdev->dev, "unsupported trans_mode %u\n",
trans_mode);
@@ -722,9 +722,13 @@ static int dspi_transfer_one_message(struct spi_master 
*master,
goto out;
}
 
-   if (wait_event_interruptible(dspi->waitq, dspi->waitflags))
-   dev_err(&dspi->pdev->dev, "wait transfer complete 
fail!\n");
-   dspi->waitflags = 0;
+   if (trans_mode != DSPI_DMA_MODE) {
+   if (wait_event_interruptible(dspi->waitq,
+   dspi->waitflags))
+   dev_err(&dspi->pdev->dev,
+   "wait transfer complete fail!\n");
+   dspi->waitflags = 0;
+   }
 
if (transfer->delay_usecs)
udelay(transfer->delay_usecs);
-- 
2.10.2



Re: [PATCH v2] ARM: Drop fixed 200 Hz timer requirement from Samsung platforms

2016-11-20 Thread Tomasz Figa
2016-11-18 17:46 GMT+09:00 Arnd Bergmann :
> Maybe add a paragraph about the specific problem:
>
> "On s3c24xx, the PWM counter is only 16 bit wide, and with the
> typical 12MHz input clock that overflows every 5.5ms. This works
> with HZ=200 or higher but not with HZ=100 which needs a 10ms
> interval between ticks. On later chips (S3C64xx, S5P and EXYNOS),
> the counter is 32 bits and does not have this problem.
> The new samsung_pwm_timer driver solves the problem by scaling
> the input clock by a factor of 50 on s3c24xx, which makes it
> less accurate but allows HZ=100 as well as CONFIG_NO_HZ with
> fewer wakeups".

One thing to correct here is that the typical clock is PCLK, which is
derived from one of the PLLs and AFAIR is between 33-66 MHz on
s3c24xx. Technically you can drive the PWM block from an external
clock (12 MHz for some board-file based boards), but for simplicity
this functionality was omitted in the new PWM timer driver used for DT
boards (which worked fine with the PWM driven by PCLK).

Also I'm wondering if the divisor we use right now for 16-bit timers
isn't too small, since it gives us a really short wraparound time,
which means getting more timer interrupts for longer intervals, kind
of defeating the benefit of tickless mode. However, AFAICT it doesn't
affect the HZ problem.

Best regards,
Tomasz


Re: [PATCH] xen-scsifront: Add a missing call to kfree

2016-11-20 Thread Juergen Gross
On 19/11/16 19:22, Quentin Lambert wrote:
> Most error branches following the call to kmalloc contain
> a call to kfree. This patch adds these calls where they are
> missing.
> 
> This issue was found with Hector.
> 
> Signed-off-by: Quentin Lambert 

Nice catch. I think this will need some more work, I'll do a
follow-on patch.

Reviewed-by: Juergen Gross 

> 
> ---
>  drivers/scsi/xen-scsifront.c |1 +
>  1 file changed, 1 insertion(+)
> 
> --- a/drivers/scsi/xen-scsifront.c
> +++ b/drivers/scsi/xen-scsifront.c
> @@ -627,6 +627,7 @@ static int scsifront_action_handler(stru
>  
>   if (scsifront_enter(info)) {
>   spin_unlock_irq(host->host_lock);
> + kfree(shadow);
>   return FAILED;
>   }
>  
> 



Re: What exactly do 32-bit x86 exceptions push on the stack in the CS slot?

2016-11-20 Thread hpa
On November 19, 2016 5:52:57 PM PST, Andy Lutomirski  wrote:
>This is a question for the old-timers here, since I can't find
>anything resembling an answer in the SDM.
>
>Suppose an exception happens (#UD in this case, but I assume it
>doesn't really matter).  We're not in long mode, and the IDT is set up
>to deliver to a normal 32-bit kernel code segment.  We're running in
>that very same code segment when the exception hits, so no CPL change
>occurs and the TSS doesn't particularly matter.
>
>The CPU will push EFLAGS, CS, and RIP.  Here's the question: what
>happens to the high word of CS on the stack?
>
>The SDM appears to say nothing at all about this.  Modern systems
>(e.g. my laptop running in 32-bit legacy mode under KVM) appear to
>zero-extend CS.  But Matthew's 486DX appears to put garbage in the
>high bits (or maybe just leave whatever was already on the stack in
>place).
>
>Do any of you happen to know what's going on and when the behavior
>changed?  I'd like to know just how big of a problem this is.  Because
>if lots of CPUs work like Matthew's, we have lots of subtle bugs on
>them.
>
>--Andy

I believe i686+ writes zero, older CPUs leave unchanged.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: Summary of LPC guest MSI discussion in Santa Fe

2016-11-20 Thread Jon Masters
On 11/07/2016 07:45 PM, Will Deacon wrote:

> I figured this was a reasonable post to piggy-back on for the LPC minutes
> relating to guest MSIs on arm64.

Thanks for this Will. I'm still digging out post-LPC and SC16, but the
summary was much appreciated, and I'm glad the conversation is helping.

>   1. The physical memory map is not standardised (Jon pointed out that
>  this is something that was realised late on)

Just to note, we discussed this one about 3-4 years ago. I recall making
a vigorous slideshow at a committee meeting in defense of having a
single memory map for ARMv8 servers and requiring everyone to follow it.
I was weak. I listened to the comments that this was "unreasonable".
Instead, I consider it was unreasonable of me to not get with the other
OS vendors and force things to be done one way. The lack of a "map at
zero" RAM location on ARMv8 has been annoying enough for 32-bit DMA only
devices on 64-bit (behind an SMMU but in passthrough mode it doesn't
help) and other issues beyond fixing the MSI doorbell regions. If I ever
have a time machine, I tried harder.

> Jon pointed out that most people are pretty conservative about hardware
> choices when migrating between them -- that is, they may only migrate
> between different revisions of the same SoC, or they know ahead of time
> all of the memory maps they want to support and this could be communicated
> by way of configuration to libvirt.

I think it's certainly reasonable to assume this in an initial
implementation and fix it later. Currently, we're very conservative
about host CPU passthrough anyway and can't migrate from one microarch
to another revision of the same microarch even. And on x86, nobody
really supports e.g. Intel to AMD and back again. I've always been of
the mind that we should ensure the architecture can handle this, but
then cautiously approach this with a default to not doing it.

> Alex asked if there was a security
> issue with DMA bypassing the SMMU, but there aren't currently any systems
> where that is known to happen. Such a system would surely not be safe for
> passthrough.

There are other potential security issues that came up but don't need to
be noted here (yet). I have wanted to clarify the SBSA for a long time
when it comes to how IOMMUs should be implemented. It's past time that
we went back and had a few conversations about that. I've poked.

> Ben mused that a way to handle conflicts dynamically might be to hotplug
> on the entire host bridge in the guest, passing firmware tables describing
> the new reserved regions as a property of the host bridge. Whilst this
> may well solve the issue, it was largely considered future work due to
> its invasive nature and dependency on firmware tables (and guest support)
> that do not currently exist.

Indeed. It's an elegant solution (thanks Ben) that I gather POWER
already does (good for them). We've obviously got a few things to clean
up after we get the basics in place. Again, I think we can consider it
reasonable that the MSI doorbell regions are predetermined on system A
well ahead of any potential migration (that may or may not then work)
for the moment. Vendors will want to loosen this later, and they can
drive the work to do that, for example by hotplugging a host bridge.

Jon.



Re: 'kbuild' merge before 4.9-rc1 breaks build and boot

2016-11-20 Thread Nicholas Piggin
On Sun, 20 Nov 2016 19:26:23 +0100
Peter Wu  wrote:

> Hi Nicholas,
> 
> Current git master (v4.9-rc5-364-g77079b1) with the latest kbuild fixes
> is still failing to load modules when built with CONFIG_MODVERSIONS=y on
> x86_64 using GCC 6.2.1.
> 
> It can still be reproduced with make defconfig, then enabling
> CONFIG_MODVERSIONS=y. The build output contains:
> 
> WARNING: "memcpy" [net/netfilter/nf_nat.ko] has no CRC!
> WARNING: "memmove" [net/netfilter/nf_nat.ko] has no CRC!
> WARNING: "_copy_to_user" [fs/efivarfs/efivarfs.ko] has no CRC!
> WARNING: "memcpy" [fs/efivarfs/efivarfs.ko] has no CRC!
> WARNING: "_copy_from_user" [fs/efivarfs/efivarfs.ko] has no CRC!

Hi Peter,

Sorry it's taken some time, bear with us. The arch specific patches need
to be merged now. Adam, what is the status of your patch? Please submit
to x86 maintainers if you haven't already.

Thanks,
Nick


Re: [PATCH 4.9.0-rc5] AR9300 calibration problems with antenna selected

2016-11-20 Thread miaoqing


I would prefer that you didn't submit this.



I recently tried to select a single antenna on AR9300 and it works for
30 seconds only. The subsequent calibration makes the RX signal level
drop from the usual -30/-40 dBm to -70/-80 dBm, and the
transmission practically stops.

With the attached patch it works, though selecting the antenna doesn't
seem to have any visible effect, at least with "iw wlanX station dump"
(perhaps it works for TX).

I'm using ad-hoc mode:

rmmod ath9k
modprobe ath9k
iw dev wlan0 set type ibss
iw phy phyX set antenna 2


2 is a bad mask. We use a bitmap; the valid masks are 1, 3 and 7.

--
Miaoqing




[PATCH] staging/lustre: Use proper number of bytes in copy_from_user

2016-11-20 Thread Oleg Drokin
From: Jian Yu 

This patch removes the usage of MAX_STRING_SIZE from
copy_from_user() and just copies enough bytes to cover
count passed in.

Signed-off-by: Jian Yu 
Reviewed-on: http://review.whamcloud.com/23462
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8774
Reviewed-by: John L. Hammond 
Signed-off-by: Oleg Drokin 
---
 drivers/staging/lustre/lustre/obdclass/lprocfs_status.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c b/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c
index 8a2f02f3..db49992 100644
--- a/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c
+++ b/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c
@@ -400,10 +400,17 @@ int lprocfs_wr_uint(struct file *file, const char __user *buffer,
char dummy[MAX_STRING_SIZE + 1], *end;
unsigned long tmp;
 
-   dummy[MAX_STRING_SIZE] = '\0';
-   if (copy_from_user(dummy, buffer, MAX_STRING_SIZE))
+   if (count >= sizeof(dummy))
+   return -EINVAL;
+
+   if (count == 0)
+   return 0;
+
+   if (copy_from_user(dummy, buffer, count))
return -EFAULT;
 
+   dummy[count] = '\0';
+
tmp = simple_strtoul(dummy, &end, 0);
if (dummy == end)
return -EINVAL;
-- 
2.7.4
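The pattern the patch adopts, bound the copy by the caller's count, reject oversized input, and NUL-terminate exactly what was copied, can be sketched in plain userspace C. Here memcpy() stands in for copy_from_user(), strtoul() for simple_strtoul(), and MAX_STRING_SIZE is shrunk for illustration; none of these names are the kernel's actual sizes or helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_STRING_SIZE 16   /* shrunk for illustration */

static int parse_uint(const char *buffer, size_t count, unsigned long *out)
{
    char dummy[MAX_STRING_SIZE + 1];
    char *end;

    if (count >= sizeof(dummy))
        return -1;              /* like -EINVAL: input would not fit */
    if (count == 0)
        return 0;               /* nothing to parse */

    memcpy(dummy, buffer, count);   /* copy only 'count' bytes */
    dummy[count] = '\0';            /* terminate the copied portion */

    *out = strtoul(dummy, &end, 0);
    if (dummy == end)
        return -1;              /* no digits consumed */
    return 0;
}
```

The key point is that the terminator is written at dummy[count], after the copy, so uninitialized tail bytes of the stack buffer are never parsed.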



Re: [HMM v13 16/18] mm/hmm/migrate: new memory migration helper for use with device memory

2016-11-20 Thread Jerome Glisse
On Mon, Nov 21, 2016 at 02:30:46PM +1100, Balbir Singh wrote:
> On 19/11/16 05:18, Jérôme Glisse wrote:

[...]

> > +
> > +
> > +#if defined(CONFIG_HMM)
> > +struct hmm_migrate {
> > +   struct vm_area_struct   *vma;
> > +   unsigned long   start;
> > +   unsigned long   end;
> > +   unsigned long   npages;
> > +   hmm_pfn_t   *pfns;
> 
> I presume the destination is pfns[] or is the source?

Both. When alloc_and_copy() is called the array is filled with source memory, but once
that callback returns it must have set the destination memory inside that
array. This is what I discussed with Aneesh in this thread.
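That two-phase contract on the pfns array can be sketched in plain userspace C. The flag values, shift, and helper names below are hypothetical stand-ins, not the kernel's hmm_pfn_t definitions; only the collect / alloc_and_copy / finalize flow mirrors the patch.

```c
#include <assert.h>
#include <stddef.h>

#define HMM_PFN_MIGRATE (1UL << 0)   /* hypothetical flag bit */
#define HMM_PFN_SHIFT   4            /* hypothetical pfn shift */

typedef unsigned long hmm_pfn_t;

/* Phase 1: the collector fills pfns[] with source pfns marked for migration. */
static void collect_sources(hmm_pfn_t *pfns, const unsigned long *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        pfns[i] = (src[i] << HMM_PFN_SHIFT) | HMM_PFN_MIGRATE;
}

/* Phase 2: the driver's alloc_and_copy() overwrites each entry with the
 * destination pfn, clearing the flag for pages it could not allocate. */
static void alloc_and_copy(hmm_pfn_t *pfns, const unsigned long *dst, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (dst[i])
            pfns[i] = (dst[i] << HMM_PFN_SHIFT) | HMM_PFN_MIGRATE;
        else
            pfns[i] &= ~HMM_PFN_MIGRATE;   /* allocation failed: skip page */
    }
}

/* Phase 3: finalize_and_map() sees exactly which pages migrated. */
static size_t count_migrated(const hmm_pfn_t *pfns, size_t n)
{
    size_t ok = 0;
    for (size_t i = 0; i < n; i++)
        if (pfns[i] & HMM_PFN_MIGRATE)
            ok++;
    return ok;
}
```

The same array thus carries source pfns on the way in and destination pfns on the way out, which is why the callback must rewrite it in place.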

> > +};
> > +
> > +static int hmm_collect_walk_pmd(pmd_t *pmdp,
> > +   unsigned long start,
> > +   unsigned long end,
> > +   struct mm_walk *walk)
> > +{
> > +   struct hmm_migrate *migrate = walk->private;
> > +   struct mm_struct *mm = walk->vma->vm_mm;
> > +   unsigned long addr = start;
> > +   spinlock_t *ptl;
> > +   hmm_pfn_t *pfns;
> > +   int pages = 0;
> > +   pte_t *ptep;
> > +
> > +again:
> > +   if (pmd_none(*pmdp))
> > +   return 0;
> > +
> > +   split_huge_pmd(walk->vma, pmdp, addr);
> > +   if (pmd_trans_unstable(pmdp))
> > +   goto again;
> > +
> 
> OK., so we always split THP before migration

Yes, because I need a special swap entry and those do not exist for pmd.

> > +   pfns = &migrate->pfns[(addr - migrate->start) >> PAGE_SHIFT];
> > +   ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
> > +   arch_enter_lazy_mmu_mode();
> > +
> > +   for (; addr < end; addr += PAGE_SIZE, pfns++, ptep++) {
> > +   unsigned long pfn;
> > +   swp_entry_t entry;
> > +   struct page *page;
> > +   hmm_pfn_t flags;
> > +   bool write;
> > +   pte_t pte;
> > +
> > +   pte = ptep_get_and_clear(mm, addr, ptep);
> > +   if (!pte_present(pte)) {
> > +   if (pte_none(pte))
> > +   continue;
> > +
> > +   entry = pte_to_swp_entry(pte);
> > +   if (!is_device_entry(entry)) {
> > +   set_pte_at(mm, addr, ptep, pte);
> 
> Why hard code this, in general the ability to migrate a VMA
> start/end range seems like a useful API.

Some memory can not be migrated: we can not migrate something that is already
being migrated, something that is a swap entry, or something that is bad memory
... I only try to migrate valid memory.

> > +   continue;
> > +   }
> > +
> > +   flags = HMM_PFN_DEVICE | HMM_PFN_UNADDRESSABLE;
> 
> Currently UNADDRESSABLE?

Yes, this is a special device swap entry and thus it is unaddressable memory.
The destination memory might also be unaddressable (migrating from one device
to another device).


> > +   page = device_entry_to_page(entry);
> > +   write = is_write_device_entry(entry);
> > +   pfn = page_to_pfn(page);
> > +
> > +   if (!(page->pgmap->flags & MEMORY_MOVABLE)) {
> > +   set_pte_at(mm, addr, ptep, pte);
> > +   continue;
> > +   }
> > +
> > +   } else {
> > +   pfn = pte_pfn(pte);
> > +   page = pfn_to_page(pfn);
> > +   write = pte_write(pte);
> > +   flags = is_zone_device_page(page) ? HMM_PFN_DEVICE : 0;
> > +   }
> > +
> > +   /* FIXME support THP see hmm_migrate_page_check() */
> > +   if (PageTransCompound(page))
> > +   continue;
> 
> Didn't we split the THP above?

We split the huge pmd, not the huge page. The intention is to support huge pages,
but I wanted to keep the patch simple, and THP needs special handling when it comes
to refcounting to check for pins (either on the huge page or on one of its tail
pages).

> 
> > +
> > +   *pfns = hmm_pfn_from_pfn(pfn) | HMM_PFN_MIGRATE | flags;
> > +   *pfns |= write ? HMM_PFN_WRITE : 0;
> > +   migrate->npages++;
> > +   get_page(page);
> > +
> > +   if (!trylock_page(page)) {
> > +   set_pte_at(mm, addr, ptep, pte);
> 
> put_page()?

No, we will try later to lock the page and thus we want to keep a ref on the
page.

> > +   } else {
> > +   pte_t swp_pte;
> > +
> > +   *pfns |= HMM_PFN_LOCKED;
> > +
> > +   entry = make_migration_entry(page, write);
> > +   swp_pte = swp_entry_to_pte(entry);
> > +   if (pte_soft_dirty(pte))
> > +   swp_pte = pte_swp_mksoft_dirty(swp_pte);
> > +   set_pte_at(mm, addr, ptep, swp_pte);
> > +
> > +   page_remove_rmap(page, false);
> > +   put_page(page);
> > +   pages++;
> > +   }
> > +   }
> > +
> > +   arch_leave_lazy_mmu_mode();
> >

Re: Synopsys Ethernet QoS Driver

2016-11-20 Thread Rayagond Kokatanur
On Sat, Nov 19, 2016 at 7:26 PM, Rabin Vincent  wrote:
> On Fri, Nov 18, 2016 at 02:20:27PM +, Joao Pinto wrote:
>> For now we are interested in improving the synopsys QoS driver under
>> /net/ethernet/synopsys. For now the driver structure consists of a single 
>> file
>> called dwc_eth_qos.c, containing synopsys ethernet qos common ops and 
>> platform
>> related stuff.
>>
>> Our strategy would be:
>>
>> a) Implement a platform glue driver (dwc_eth_qos_pltfm.c)
>> b) Implement a pci glue driver (dwc_eth_qos_pci.c)
>> c) Implement a "core driver" (dwc_eth_qos.c) that would only have Ethernet 
>> QoS
>> related stuff to be reused by the platform / pci drivers
>> d) Add a set of features to the "core driver" that we have available 
>> internally
>
> Note that there are actually two drivers in mainline for this hardware:
>
>  drivers/net/ethernet/synopsys/
>  drivers/net/ethernet/stmicro/stmmac/

Yes, the latter driver (drivers/net/ethernet/stmicro/stmmac/) supports
both 3.x and 4.x. It has glue layers for pci, platform, core etc.;
please refer to this driver before you start.

You can start adding the missing 4.x features in the stmmac driver.

>
> (See http://lists.openwall.net/netdev/2016/02/29/127)
>
> The former only supports 4.x of the hardware.
>
> The later supports 4.x and 3.x and already has a platform glue driver
> with support for several platforms, a PCI glue driver, and a core driver
> with several features not present in the former (for example: TX/RX
> interrupt coalescing, EEE, PTP).
>
> Have you evaluated both drivers?  Why have you decided to work on the
> former rather than the latter?



-- 
wwr
Rayagond


[PATCH v3] ARM: at91/dt: add dts file for sama5d36ek CMP board

2016-11-20 Thread Wenyou Yang
The sama5d36ek CMP board is a variant of the sama5d3xek board.
It is equipped with low-power DDR2 SDRAM, a PMIC ACT8865 and
some power rails. Its main purpose is to measure power
consumption.
The differences of the sama5d36ek CMP dts from the sama5d36ek dts are
listed below.
 1. The USB host nodes are removed, that is, the USB host is disabled.
 2. The gpio_keys node is added to wake up from sleep.
 3. The LCD isn't supported because the pins for the LCD conflict
with gpio_keys.
 4. The adc0 node supports the pinctrl sleep state to fix the over
consumption on VDDANA.

As said in the errata, "When the USB host ports are used in high speed
mode (EHCI), it is not possible to suspend the ports if no device is
attached on each port. This leads to increased power consumption even
if the system is in a low power mode." That is why the USB host
is disabled.

Signed-off-by: Wenyou Yang 
---

Changes in v3:
 - Use a dual license scheme for DT files.
 - Use the proper model name and the compatible string to reflect
   the nature of this new "CMP" board.
 - Change name of wakeup property to "wakeup-source".
 - Remove unnecessary comments.
 - Remove bootargs.

Changes in v2:
 - Add the pinctrl sleep state for adc0 node to fix the over
   consumption on VDDANA.
 - Improve the commit log.

 arch/arm/boot/dts/sama5d36ek_cmp.dts  |  87 ++
 arch/arm/boot/dts/sama5d3xcm_cmp.dtsi | 201 +++
 arch/arm/boot/dts/sama5d3xmb_cmp.dtsi | 301 ++
 3 files changed, 589 insertions(+)
 create mode 100644 arch/arm/boot/dts/sama5d36ek_cmp.dts
 create mode 100644 arch/arm/boot/dts/sama5d3xcm_cmp.dtsi
 create mode 100644 arch/arm/boot/dts/sama5d3xmb_cmp.dtsi

diff --git a/arch/arm/boot/dts/sama5d36ek_cmp.dts b/arch/arm/boot/dts/sama5d36ek_cmp.dts
new file mode 100644
index 000..b632143
--- /dev/null
+++ b/arch/arm/boot/dts/sama5d36ek_cmp.dts
@@ -0,0 +1,87 @@
+/*
+ * sama5d36ek_cmp.dts - Device Tree file for SAMA5D36-EK CMP board
+ *
+ *  Copyright (C) 2016 Atmel,
+ *
+ * This file is dual-licensed: you can use it either under the terms
+ * of the GPL or the X11 license, at your option. Note that this dual
+ * licensing only applies to this file, and not this project as a
+ * whole.
+ *
+ *  a) This file is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License, or (at your option) any later version.
+ *
+ * This file is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * Or, alternatively,
+ *
+ *  b) Permission is hereby granted, free of charge, to any person
+ * obtaining a copy of this software and associated documentation
+ * files (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use,
+ * copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following
+ * conditions:
+ *
+ * The above copyright notice and this permission notice shall be
+ * included in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
+ * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
+ * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
+ * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+/dts-v1/;
+#include "sama5d36.dtsi"
+#include "sama5d3xmb_cmp.dtsi"
+
+/ {
+   model = "Atmel SAMA5D36EK-CMP";
+   compatible = "atmel,sama5d36ek-cmp", "atmel,sama5d3xmb-cmp", "atmel,sama5d3xcm-cmp", "atmel,sama5d36", "atmel,sama5d3", "atmel,sama5";
+
+   ahb {
+   apb {
+   spi0: spi@f0004000 {
+   status = "okay";
+   };
+
+   ssc0: ssc@f0008000 {
+   status = "okay";
+   };
+
+   can0: can@f000c000 {
+   status = "okay";
+   };
+
+   i2c0: i2c@f0014000 {
+   status = "okay";
+   };
+
+   i2c1: i2c@f0018000 {
+   status = "okay";
+   };
+
+   macb0: ethernet@f0028000 {
+   

Re: [HMM v13 09/18] mm/hmm/mirror: mirror process address space on device with HMM helpers

2016-11-20 Thread Jerome Glisse
On Mon, Nov 21, 2016 at 01:42:43PM +1100, Balbir Singh wrote:
> On 19/11/16 05:18, Jérôme Glisse wrote:

[...]

> > +/*
> > + * hmm_mirror_register() - register a mirror against an mm
> > + *
> > + * @mirror: new mirror struct to register
> > + * @mm: mm to register against
> > + *
> > + * To start mirroring a process address space device driver must register 
> > an
> > + * HMM mirror struct.
> > + */
> > +int hmm_mirror_register(struct hmm_mirror *mirror, struct mm_struct *mm)
> > +{
> > +   /* Sanity check */
> > +   if (!mm || !mirror || !mirror->ops)
> > +   return -EINVAL;
> > +
> > +   mirror->hmm = hmm_register(mm);
> > +   if (!mirror->hmm)
> > +   return -ENOMEM;
> > +
> > +   /* Register mmu_notifier if not already, use mmap_sem for locking */
> > +   if (!mirror->hmm->mmu_notifier.ops) {
> > +   struct hmm *hmm = mirror->hmm;
> > +   down_write(&mm->mmap_sem);
> > +   if (!hmm->mmu_notifier.ops) {
> > +   hmm->mmu_notifier.ops = &hmm_mmu_notifier_ops;
> > +   if (__mmu_notifier_register(&hmm->mmu_notifier, mm)) {
> > +   hmm->mmu_notifier.ops = NULL;
> > +   up_write(&mm->mmap_sem);
> > +   return -ENOMEM;
> > +   }
> > +   }
> > +   up_write(&mm->mmap_sem);
> > +   }
> 
> Does everything get mirrored, every update to the PTE (clear dirty, clear
> accessed bit, etc) or does the driver decide?

The driver decides, but only read/write/valid matter for the device. The device
driver must report dirtiness on invalidation. Some devices do not have an access
bit and thus can't provide that information.

The idea here is really to snapshot the CPU page table and duplicate it as
a GPU page table. The only synchronization HMM provides is that each virtual
address points to the same memory: at no point in time can the same virtual
address point to different physical memory on the device and on the CPU.
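A toy model of that invariant, with names that mirror the hmm_vma_range_lock()/unlock() pattern documented in the HMM header patch; the flag-based "lock" and the tables below are stand-ins, not the kernel implementation:

```c
#include <assert.h>

#define NPAGES 4

/* Toy page tables: the device table is only ever built from a snapshot
 * of the CPU table taken under the range lock. */
static int range_locked;
static unsigned long cpu_pt[NPAGES] = {1, 2, 3, 4};
static unsigned long dev_pt[NPAGES];

static void range_lock(void)   { range_locked = 1; }
static void range_unlock(void) { range_locked = 0; }

/* CPU-side page-table updates must wait while the range is locked;
 * here we simply refuse them so the serialization is visible. */
static int cpu_update(int i, unsigned long v)
{
    if (range_locked)
        return -1;          /* would block in the real implementation */
    cpu_pt[i] = v;
    return 0;
}

static void mirror_snapshot(void)
{
    range_lock();                       /* hmm_vma_range_lock()   */
    for (int i = 0; i < NPAGES; i++)
        dev_pt[i] = cpu_pt[i];          /* device PT from snapshot */
    range_unlock();                     /* hmm_vma_range_unlock() */
}
```

Because CPU updates are excluded for the duration of the snapshot, the device table can never hold a mapping the CPU table no longer has.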

Cheers,
Jérôme


RE: [PATCH] bfa: turn bfa_mem_{kva,dma}_setup into inline functions

2016-11-20 Thread Gurumurthy, Anil
Patch looks good.
Acked-by: Anil Gurumurthy 


-Original Message-
From: Johannes Thumshirn [mailto:jthumsh...@suse.de] 
Sent: 18 November 2016 18:52
To: Arnd Bergmann 
Cc: James E.J. Bottomley ; Martin K. Petersen 
; Anil Gurumurthy ; 
Sudarsana Kalluru ; linux-s...@vger.kernel.org; 
linux-kernel@vger.kernel.org
Subject: Re: [PATCH] bfa: turn bfa_mem_{kva,dma}_setup into inline functions

On Wed, Nov 16, 2016 at 04:14:27PM +0100, Arnd Bergmann wrote:
> These two macros cause lots of warnings with gcc-7:
> 
> drivers/scsi/bfa/bfa_svc.c: In function 'bfa_fcxp_meminfo':
> drivers/scsi/bfa/bfa_svc.c:521:103: error: '*' in boolean context, 
> suggest '&&' instead [-Werror=int-in-bool-context]
> 
> Using inline functions makes them much more readable and avoids the 
> warnings.
> 
> Signed-off-by: Arnd Bergmann 
> ---

Looks good,
Reviewed-by: Johannes Thumshirn 

-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) Key 
fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


Re: [HMM v13 08/18] mm/hmm: heterogeneous memory management (HMM for short)

2016-11-20 Thread Jerome Glisse
On Mon, Nov 21, 2016 at 01:29:23PM +1100, Balbir Singh wrote:
> On 19/11/16 05:18, Jérôme Glisse wrote:
> > HMM provides 3 separate functionality :
> > - Mirroring: synchronize CPU page table and device page table
> > - Device memory: allocating struct page for device memory
> > - Migration: migrating regular memory to device memory
> > 
> > This patch introduces some common helpers and definitions to all of
> > those 3 functionality.
> > 

[...]

> > +/*
> > + * HMM provides 3 separate functionality :
> > + *   - Mirroring: synchronize CPU page table and device page table
> > + *   - Device memory: allocating struct page for device memory
> > + *   - Migration: migrating regular memory to device memory
> > + *
> > + * Each can be use independently from the others.
> > + *
> > + *
> > + * Mirroring:
> > + *
> > + * HMM provide helpers to mirror process address space on a device. For 
> > this it
> > + * provides several helpers to order device page table update in respect 
> > to CPU
> > + * page table update. Requirement is that for any given virtual address 
> > the CPU
> > + * and device page table can not point to different physical page. It uses 
> > the
> > + * mmu_notifier API and introduce virtual address range lock which block 
> > CPU
> > + * page table update for a range while the device page table is being 
> > updated.
> > + * Usage pattern is:
> > + *
> > + *  hmm_vma_range_lock(vma, start, end);
> > + *  // snap shot CPU page table
> > + *  // update device page table from snapshot
> > + *  hmm_vma_range_unlock(vma, start, end);
> > + *
> > + * Any CPU page table update that conflict with a range lock will wait 
> > until
> > + * range is unlock. This garanty proper serialization of CPU and device 
> > page
> > + * table update.
> > + *
> > + *
> > + * Device memory:
> > + *
> > + * HMM provides helpers to help leverage device memory either addressable 
> > like
> > + * regular memory by the CPU or un-addressable at all. In both case the 
> > device
> > + * memory is associated to dedicated structs page (which are allocated 
> > like for
> > + * hotplug memory). Device memory management is under the responsability 
> > of the
> > + * device driver. HMM only allocate and initialize the struct pages 
> > associated
> > + * with the device memory.
> > + *
> > + * Allocating struct page for device memory allow to use device memory 
> > allmost
> > + * like any regular memory. Unlike regular memory it can not be added to 
> > the
> > + * lru, nor can any memory allocation can use device memory directly. 
> > Device
> > + * memory will only end up to be use in a process if device driver migrate 
> > some
>  in use 
> > + * of the process memory from regular memory to device memory.
> > + *
> 
> A process can never directly allocate device memory?

Well, yes and no. If the device driver is the first to trigger a page fault on some
memory then it can decide to directly allocate device memory, but a usual CPU page
fault would not trigger allocation of device memory. A new mechanism can be added
to achieve that if it makes sense, but for my main target (x86/pcie) it does not.

> > + *
> > + * Migration:
> > + *
> > + * Existing memory migration mechanism (mm/migrate.c) does not allow to use
> > + * something else than the CPU to copy from source to destination memory. 
> > More
> > + * over existing code is not tailor to drive migration from process virtual
>   tailored
> > + * address rather than from list of pages. Finaly the migration flow does 
> > not
> Finally 
> > + * allow for graceful failure at different step of the migration process.
> > + *
> > + * HMM solves all of the above though simple API :
> > + *
> > + *  hmm_vma_migrate(vma, start, end, ops);
> > + *
> > + * With ops struct providing 2 callback alloc_and_copy() which allocated 
> > the
> > + * destination memory and initialize it using source memory. Migration can 
> > fail
> > + * after this step and thus last callback finalize_and_map() allow the 
> > device
> > + * driver to know which page were successfully migrated and which were not.
> > + *
> > + * This can easily be use outside of HMM intended use case.
> > + *
> 
> I think it is a good API to have
> 
> > + *
> > + * This header file contain all the API related to this 3 functionality and
> > + * each functions and struct are more thouroughly documented in below 
> > comments.
> > + */
> > +#ifndef LINUX_HMM_H
> > +#define LINUX_HMM_H
> > +
> > +#include 
> > +
> > +#if IS_ENABLED(CONFIG_HMM)
> > +
> > +
> > +/*
> > + * hmm_pfn_t - HMM use its own pfn type to keep several flags per page
> uses
> > + *
> > + * Flags:
> > + * HMM_PFN_VALID: pfn is valid
> > + * HMM_PFN_WRITE: CPU page table have the write permission set
>   has
> > + */
> > +typedef unsigned long hmm_pfn_t;
> > +
> > +#define HMM

Re: [HMM v13 07/18] mm/ZONE_DEVICE/x86: add support for un-addressable device memory

2016-11-20 Thread Jerome Glisse
On Mon, Nov 21, 2016 at 01:08:56PM +1100, Balbir Singh wrote:
> 
> 
> On 19/11/16 05:18, Jérôme Glisse wrote:
> > It does not need much, just skip populating kernel linear mapping
> > for range of un-addressable device memory (it is pick so that there
> > is no physical memory resource overlapping it). All the logic is in
> > share mm code.
> > 
> > Only support x86-64 as this feature doesn't make much sense with
> > constrained virtual address space of 32bits architecture.
> > 
> 
> Is there a reason this would not work on powerpc64 for example?
> Could you document the limitations -- testing/APIs/missing features?

It should be straightforward for powerpc64. I haven't done it, but I can
certainly try to get access to some powerpc64 hardware and add support for
it.

The only thing to do is to avoid creating kernel linear mapping for the
un-addressable memory (just for safety reasons we do not want any read/
write to invalid physical address).

Cheers,
Jérôme


Re: [HMM v13 06/18] mm/ZONE_DEVICE/unaddressable: add special swap for unaddressable

2016-11-20 Thread Jerome Glisse
On Mon, Nov 21, 2016 at 01:06:45PM +1100, Balbir Singh wrote:
> 
> 
> On 19/11/16 05:18, Jérôme Glisse wrote:
> > To allow use of device un-addressable memory inside a process add a
> > special swap type. Also add a new callback to handle page fault on
> > such entry.
> > 
> > Signed-off-by: Jérôme Glisse 
> > Cc: Dan Williams 
> > Cc: Ross Zwisler 
> > ---
> >  fs/proc/task_mmu.c   | 10 +++-
> >  include/linux/memremap.h |  5 
> >  include/linux/swap.h | 18 ++---
> >  include/linux/swapops.h  | 67 
> > 
> >  kernel/memremap.c| 14 ++
> >  mm/Kconfig   | 12 +
> >  mm/memory.c  | 24 +
> >  mm/mprotect.c| 12 +
> >  8 files changed, 158 insertions(+), 4 deletions(-)
> > 
> > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> > index 6909582..0726d39 100644
> > --- a/fs/proc/task_mmu.c
> > +++ b/fs/proc/task_mmu.c
> > @@ -544,8 +544,11 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr,
> > } else {
> > mss->swap_pss += (u64)PAGE_SIZE << PSS_SHIFT;
> > }
> > -   } else if (is_migration_entry(swpent))
> > +   } else if (is_migration_entry(swpent)) {
> > page = migration_entry_to_page(swpent);
> > +   } else if (is_device_entry(swpent)) {
> > +   page = device_entry_to_page(swpent);
> > +   }
> 
> 
> So the reason there is a device swap entry for a page belonging to a user 
> process is
> that it is in the middle of migration or is it always that a swap entry 
> represents
> unaddressable memory belonging to a GPU device, but its tracked in the page 
> table
> entries of the process.

For pages being migrated I use the existing special migration pte entry. This new
device special swap entry is only for unaddressable memory belonging to a device
(GPU or anything else). We need to keep track of those inside the CPU page table.
Using a new special swap entry is the easiest way with the minimum amount of
change to core mm.
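The encoding idea behind such a special swap entry can be sketched in userspace C. The bit layout, shift, and type values below are hypothetical; only the type-plus-pfn packing mirrors what swp_entry()/swp_offset() do in the kernel.

```c
#include <assert.h>
#include <stdint.h>

#define SWP_TYPE_SHIFT 58    /* hypothetical split between type and pfn bits */

typedef uint64_t swp_entry_t;

enum { SWP_DEVICE = 1, SWP_DEVICE_WRITE = 2 };   /* hypothetical type values */

/* Pack a device type and a pfn into one non-present "swap" entry. */
static swp_entry_t make_device_entry(uint64_t pfn, int write)
{
    uint64_t type = write ? SWP_DEVICE_WRITE : SWP_DEVICE;
    return (type << SWP_TYPE_SHIFT) | pfn;
}

static int is_write_device_entry(swp_entry_t e)
{
    return (e >> SWP_TYPE_SHIFT) == SWP_DEVICE_WRITE;
}

static uint64_t device_entry_to_pfn(swp_entry_t e)
{
    return e & ((UINT64_C(1) << SWP_TYPE_SHIFT) - 1);
}
```

The pte stays non-present, so any CPU access faults, yet the entry still records which device page backs the virtual address and whether it was writable.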

[...]

> > +#ifdef CONFIG_DEVICE_UNADDRESSABLE
> > +static inline swp_entry_t make_device_entry(struct page *page, bool write)
> > +{
> > +   return swp_entry(write?SWP_DEVICE_WRITE:SWP_DEVICE, page_to_pfn(page));
> 
> Code style checks

I was trying to balance against 79 columns break rule :)

[...]

> > +   } else if (is_device_entry(entry)) {
> > +   page = device_entry_to_page(entry);
> > +
> > +   get_page(page);
> > +   rss[mm_counter(page)]++;
> 
> Why does rss count go up?

I wanted the device page to be treated like any other page. There are arguments
to be made for and against doing that. Do you have a strong argument for not
doing this?

[...]

> > @@ -2536,6 +2557,9 @@ int do_swap_page(struct fault_env *fe, pte_t orig_pte)
> > if (unlikely(non_swap_entry(entry))) {
> > if (is_migration_entry(entry)) {
> > migration_entry_wait(vma->vm_mm, fe->pmd, fe->address);
> > +   } else if (is_device_entry(entry)) {
> > +   ret = device_entry_fault(vma, fe->address, entry,
> > +fe->flags, fe->pmd);
> 
> What does device_entry_fault() actually do here?

Well, it is a special fault handler: it must migrate the memory back to some
place where the CPU can access it. It only matters for unaddressable memory.

> > } else if (is_hwpoison_entry(entry)) {
> > ret = VM_FAULT_HWPOISON;
> > } else {
> > diff --git a/mm/mprotect.c b/mm/mprotect.c
> > index 1bc1eb3..70aff3a 100644
> > --- a/mm/mprotect.c
> > +++ b/mm/mprotect.c
> > @@ -139,6 +139,18 @@ static unsigned long change_pte_range(struct 
> > vm_area_struct *vma, pmd_t *pmd,
> >  
> > pages++;
> > }
> > +
> > +   if (is_write_device_entry(entry)) {
> > +   pte_t newpte;
> > +
> > +   make_device_entry_read(&entry);
> > +   newpte = swp_entry_to_pte(entry);
> > +   if (pte_swp_soft_dirty(oldpte))
> > +   newpte = pte_swp_mksoft_dirty(newpte);
> > +   set_pte_at(mm, addr, pte, newpte);
> > +
> > +   pages++;
> > +   }
> 
> Does it make sense to call mprotect() on device memory ranges?

There is nothing special about vmas that contain device memory. They can be
private anonymous, shared, file-backed ... So any existing memory syscall must
behave as expected. This is really just like any other page except that the CPU
can not access it.

Cheers,
Jérôme


[PATCH] ext4: remove unused function ext4_aligned_io()

2016-11-20 Thread Ross Zwisler
The last user of ext4_aligned_io() was the DAX path in
ext4_direct_IO_write().  This usage was removed by Jan Kara's patch
entitled "ext4: Rip out DAX handling from direct IO path".

Signed-off-by: Ross Zwisler 
---
 fs/ext4/ext4.h | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 8b76311..8a8a9b2 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -3262,13 +3262,6 @@ static inline void ext4_clear_io_unwritten_flag(ext4_io_end_t *io_end)
}
 }
 
-static inline bool ext4_aligned_io(struct inode *inode, loff_t off, loff_t len)
-{
-   int blksize = 1 << inode->i_blkbits;
-
-   return IS_ALIGNED(off, blksize) && IS_ALIGNED(len, blksize);
-}
-
 extern struct iomap_ops ext4_iomap_ops;
 
 #endif /* __KERNEL__ */
-- 
2.7.4



Re: Linux 4.9-rc6

2016-11-20 Thread Eric Dumazet
On Mon, 2016-11-21 at 01:35 +, Al Viro wrote:

> 
> Umm...  One possibility would be something like fs/namespace.c:m_start() -
> if nothing has changed since the last time, just use a cached pointer.
> That has sped the damn thing (/proc/mounts et.al.) big way, but it's
> dependent upon having an event count updated whenever we change the
> mount tree - doing the same for vma_area list might or might not be
> a good idea.  /proc/mounts and friends get ->poll() on that as well;
> that probably would _not_ be a good idea in this case.

Yes, a generation number could help in some cases.

Another potential issue with CONFIG_VMAP_STACK is that we make no
attempt to allocate 4 consecutive pages.

Even if we have plenty of memory, 4 calls to alloc_page() are likely to
give us 4 pages in completely different locations.

Here I printed the hugepage number of the 4 pages for some stacks :


0xc9001a07c000-0xc9001a081000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfcac Hfeba Hfec0 Hfc9d N0=4
0xc9001a084000-0xc9001a089000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfc79 Hfc79 Hfc79 Hfc83 N0=4
0xc9001a08c000-0xc9001a091000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfc9b Hfe91 Hfebe Hfca2 N0=4
0xc9001a094000-0xc9001a099000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfcaa Hfcaa Hfca6 Hfebc N0=4
0xc9001a09c000-0xc9001a0a1000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfe9b Hfe90 Hff09 Hfefb N0=4
0xc9001a0a4000-0xc9001a0a9000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfe94 Hfe62 Hfea0 Hfe7b N0=4
0xc9001a0ac000-0xc9001a0b1000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfe78 Hff05 Hff05 Hfc74 N0=4
0xc9001a0b4000-0xc9001a0b9000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfc9b Hfc9b Hfe83 Hf782 N0=4
0xc9001a0bc000-0xc9001a0c1000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfe78 Hfe78 Hfc7f Hfc7f N0=4
0xc9001a0c4000-0xc9001a0c9000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfebe Hfebe Hfe82 Hfe85 N0=4
0xc9001a0cc000-0xc9001a0d1000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfc6b Hfe62 Hfe62 Hfcaa N0=4
0xc9001a0d4000-0xc9001a0d9000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfebd Hfebd Hfc92 Hfc92 N0=4

Is this a generic vmalloc() issue worth fixing now?

Note this RFC might conflict with NUMA interleave policy.

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index f2481cb4e6b2..0123e97debb9 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1602,9 +1602,10 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 pgprot_t prot, int node)
 {
 struct page **pages;
-   unsigned int nr_pages, array_size, i;
+   unsigned int nr_pages, array_size, i, j;
 const gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
 const gfp_t alloc_mask = gfp_mask | __GFP_NOWARN;
+   const gfp_t multi_alloc_mask = (gfp_mask & ~__GFP_DIRECT_RECLAIM) | __GFP_NORETRY;
 
 nr_pages = get_vm_area_size(area) >> PAGE_SHIFT;
 array_size = (nr_pages * sizeof(struct page *));
@@ -1624,20 +1625,34 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 return NULL;
 }
 
-   for (i = 0; i < area->nr_pages; i++) {
-   struct page *page;
-
-   if (node == NUMA_NO_NODE)
-   page = alloc_page(alloc_mask);
-   else
-   page = alloc_pages_node(node, alloc_mask, 0);
+   for (i = 0; i < area->nr_pages;) {
+   struct page *page = NULL;
+   unsigned int chunk_order = min(ilog2(area->nr_pages - i), MAX_ORDER - 1);
+
+   while (chunk_order && !page) {
+   if (node == NUMA_NO_NODE)
+   page = alloc_pages(multi_alloc_mask, chunk_order);
+   else
+   page = alloc_pages_node(node, multi_alloc_mask, chunk_order);
+   if (page)
+   split_page(page, chunk_order);
+   else
+   chunk_order--;
+   }
+   if (!page) {
+   if (node == NUMA_NO_NODE)
+   page = alloc_pages(alloc_mask, 0);
+   else
+   page = alloc_pages_node(node, alloc_mask, 0);
+   }
 
 if (unlikely(!page)) {
 /* Successfully allocated i pages, free them in 
 __vunmap() */
 area->nr_pages = i;
 goto fail;
 }
-   area->pages[i] = page;
+   for (j = 0; j < (1 << chunk_order); j++)
+   area->pages[i++] = page++;
 if (gfpflags_allow_blocking(gfp_mask))
 cond_resched();
 }
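The chunk-order selection in the RFC above (try the largest power-of-two block first, capped at MAX_ORDER - 1, then fall back to smaller orders on failure) can be checked with a small userspace sketch. ilog2_ul() is a stand-in for the kernel's ilog2(); only the first-attempt order is modeled, not the retry loop.

```c
#include <assert.h>

#define MAX_ORDER 11   /* matches the common kernel default */

/* Userspace stand-in for the kernel's ilog2(): floor(log2(n)) for n > 0. */
static unsigned int ilog2_ul(unsigned long n)
{
    unsigned int r = 0;
    while (n >>= 1)
        r++;
    return r;
}

/* Order of the first allocation attempt for 'remaining' pages, as in the RFC:
 * min(ilog2(remaining), MAX_ORDER - 1). */
static unsigned int first_chunk_order(unsigned long remaining)
{
    unsigned int order = ilog2_ul(remaining);
    return order < MAX_ORDER - 1 ? order : MAX_ORDER - 1;
}
```

For a 4-page vmap stack this yields order 2, i.e. one physically contiguous 16KB block when the allocation succeeds, which is exactly the locality the dump above shows is currently missing.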




[PATCHv2] arm64: dts: exynos: add the mshc_2 node for supporting T-Flash

2016-11-20 Thread Jaehoon Chung
Add the mshc_2 node for supporting T-flash.

And it needs to add the "mshc*" aliases. Because dwmmc driver should be
assigned to "ctrl_id" after parsing to "mshc".
If there is no aliases for mshc, then it might be set to the wrong
capabilities.

Signed-off-by: Jaehoon Chung 
---
Changelog on V2:
- Changed from 0 to GPIO_ACTIVE_HIGH

 arch/arm64/boot/dts/exynos/exynos5433-tm2.dts | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/arch/arm64/boot/dts/exynos/exynos5433-tm2.dts 
b/arch/arm64/boot/dts/exynos/exynos5433-tm2.dts
index ce41781..88cb6c1 100644
--- a/arch/arm64/boot/dts/exynos/exynos5433-tm2.dts
+++ b/arch/arm64/boot/dts/exynos/exynos5433-tm2.dts
@@ -45,6 +45,8 @@
spi2 = &spi_2;
spi3 = &spi_3;
spi4 = &spi_4;
+   mshc0 = &mshc_0;
+   mshc2 = &mshc_2;
};
 
chosen {
@@ -715,6 +717,23 @@
assigned-clock-rates = <8>;
 };
 
+&mshc_2 {
+   status = "okay";
+   num-slots = <1>;
+   cap-sd-highspeed;
+   disable-wp;
+   cd-gpios = <&gpa2 4 GPIO_ACTIVE_HIGH>;
+   cd-inverted;
+   card-detect-delay = <200>;
+   samsung,dw-mshc-ciu-div = <3>;
+   samsung,dw-mshc-sdr-timing = <0 4>;
+   samsung,dw-mshc-ddr-timing = <0 2>;
+   fifo-depth = <0x80>;
+   pinctrl-names = "default";
+   pinctrl-0 = <&sd2_clk &sd2_cmd &sd2_bus1 &sd2_bus4>;
+   bus-width = <4>;
+};
+
 &pinctrl_alive {
pinctrl-names = "default";
pinctrl-0 = <&initial_alive>;
-- 
2.10.1



[PATCH] powerpc: cputime: fix a compile warning

2016-11-20 Thread yanjiang.jin
From: Yanjiang Jin 

This patch avoids the following warning:

kernel/sched/cpuacct.c:298:25: warning:
format '%lld' expects argument of type 'long long int',
but argument 4 has type 'long unsigned int' [-Wformat=]

Signed-off-by: Yanjiang Jin 
---
 arch/powerpc/include/asm/cputime.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/cputime.h 
b/arch/powerpc/include/asm/cputime.h
index 4f60db0..4423e97 100644
--- a/arch/powerpc/include/asm/cputime.h
+++ b/arch/powerpc/include/asm/cputime.h
@@ -228,7 +228,8 @@ static inline cputime_t clock_t_to_cputime(const unsigned 
long clk)
return (__force cputime_t) ct;
 }
 
-#define cputime64_to_clock_t(ct)   cputime_to_clock_t((cputime_t)(ct))
+#define cputime64_to_clock_t(ct)   \
+   (__force u64)(cputime_to_clock_t((cputime_t)(ct)))
 
 /*
  * PPC64 uses PACA which is task independent for storing accounting data while
-- 
1.9.1



Re: [HMM v13 04/18] mm/ZONE_DEVICE/free-page: callback when page is freed

2016-11-20 Thread Jerome Glisse
On Mon, Nov 21, 2016 at 12:49:55PM +1100, Balbir Singh wrote:
> On 19/11/16 05:18, Jérôme Glisse wrote:
> > When a ZONE_DEVICE page refcount reach 1 it means it is free and nobody
> > is holding a reference on it (only device to which the memory belong do).
> > Add a callback and call it when that happen so device driver can implement
> > their own free page management.
> > 
> 
> Could you give an example of what their own free page management might look 
> like?

That's hard to do; the free-page management is whatever the device driver
wants to do, so I don't have any example to give. Each device driver
(especially GPU ones) has its own memory management with little commonality.

So how the device driver manages that memory is really not important; at least
it is not something for which I want to impose a policy on drivers. I want to
leave each device driver to decide how to achieve that.

Cheers,
Jérôme


Re: [HMM v13 01/18] mm/memory/hotplug: convert device parameter bool to set of flags

2016-11-20 Thread Jerome Glisse
On Mon, Nov 21, 2016 at 11:44:36AM +1100, Balbir Singh wrote:
> 
> 
> On 19/11/16 05:18, Jérôme Glisse wrote:
> > Only useful for arches where we support ZONE_DEVICE and where we want to
> > also support un-addressable device memory. We need struct page for such
> > un-addressable memory. But we should avoid populating the kernel linear
> > mapping for the physical address range because there is no real memory
> > or anything behind those physical address.
> > 
> > Hence we need more flags than just knowing if it is device memory or not.
> > 
> 
> 
> Isn't it better to add a wrapper to arch_add/remove_memory and do those
> checks inside and then call arch_add/remove_memory to reduce the churn.
> If you need selectively enable MEMORY_UNADDRESSABLE that can be done with
> _ARCH_HAS_FEATURE

The flag parameter can be used by other new features, and thus I thought the
churn was fine. But I do not mind either way; whatever people like best.

[...]

> > -extern int arch_add_memory(int nid, u64 start, u64 size, bool for_device);
> > +
> > +/*
> > + * For device memory we want more informations than just knowing it is 
> > device
>information
> > + * memory. We want to know if we can migrate it (ie it is not storage 
> > memory
> > + * use by DAX). Is it addressable by the CPU ? Some device memory like GPU
> > + * memory can not be access by CPU but we still want struct page so that we
>   accessed
> > + * can use it like regular memory.
> 
> Can you please add some details on why -- migration needs them for example?

I am not sure what you mean? DAX, i.e. a persistent memory device, is intended
to be used for a filesystem or persistent storage. Hence memory migration does
not apply to it (it would go against its purpose).

So I want to extend ZONE_DEVICE to be more than just DAX/persistent memory. For
that I need to differentiate between device memory that can be migrated and
should be more or less treated like regular memory (with struct page). This is
what the MEMORY_MOVABLE flag is for.

Finally, in my case the device memory is not accessible by the CPU, so I need
yet another flag. In the end I am extending ZONE_DEVICE to be used for 3
different types of memory.

Is this the kind of explanation you are looking for?

> > + */
> > +#define MEMORY_FLAGS_NONE 0
> > +#define MEMORY_DEVICE (1 << 0)
> > +#define MEMORY_MOVABLE (1 << 1)
> > +#define MEMORY_UNADDRESSABLE (1 << 2)

Cheers,
Jérôme


[PATCH] fsldma: t4240qds: drop "SG" CAP for DMA3

2016-11-20 Thread yanjiang.jin
From: Yanjiang Jin 

The T4240QDS DMA controller uses the external DMA control signals to start or
restart a paused DMA transfer, to acknowledge a DMA transfer in progress, and
to indicate transfer completion.
"scatterlist copy" depends on these signals.

But as the "T4240 Reference Manual" states:
"The external DMA control signals are available on DMA1 and DMA2. They are
 not supported by DMA3."

So add an of_node property "fsl,external-dma-control-signals" to DMA1 and
DMA2 only; it prevents DMA3 from advertising DMA_SG operations. Otherwise we
would get the errors below when running dmatest:

modprobe dmatest run=1 iterations=42

dmatest: Started 1 threads using dma2chan0
fsl-elo-dma ffe102300.dma: chan0: Transfer Error!
fsl-elo-dma ffe102300.dma: chan0: irq: unhandled sr 0x0080
dmatest: dma2chan0-sg0: dstbuf[0x3954] not copied! Expected d8, got 2b

dmatest: dma2chan7-sg0: dstbuf[0x1c51] not copied! Expected df, got 2e
dmatest: dma2chan7-sg0: 1301 errors suppressed
dmatest: dma2chan7-sg0: result #42: 'data error' with
src_off=0xf21 dst_off=0x1c32 len=0x535 (1333)
dmatest: dma2chan7-sg0: summary 42 tests, 42 failures 2952 iops 23968 KB/s

Signed-off-by: Yanjiang Jin 
---
 arch/powerpc/boot/dts/fsl/t4240si-post.dtsi |  6 ++
 drivers/dma/fsldma.c| 11 +--
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/boot/dts/fsl/t4240si-post.dtsi 
b/arch/powerpc/boot/dts/fsl/t4240si-post.dtsi
index 68c4ead..155997d 100644
--- a/arch/powerpc/boot/dts/fsl/t4240si-post.dtsi
+++ b/arch/powerpc/boot/dts/fsl/t4240si-post.dtsi
@@ -1029,7 +1029,13 @@
};
 
 /include/ "elo3-dma-0.dtsi"
+   dma@100300 {
+   fsl,external-dma-control-signals;
+   };
 /include/ "elo3-dma-1.dtsi"
+   dma@101300 {
+   fsl,external-dma-control-signals;
+   };
 /include/ "elo3-dma-2.dtsi"
 
 /include/ "qoriq-espi-0.dtsi"
diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 51c75bf..f7054f4 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -1354,12 +1354,19 @@ static int fsldma_of_probe(struct platform_device *op)
fdev->irq = irq_of_parse_and_map(op->dev.of_node, 0);
 
dma_cap_set(DMA_MEMCPY, fdev->common.cap_mask);
-   dma_cap_set(DMA_SG, fdev->common.cap_mask);
+
dma_cap_set(DMA_SLAVE, fdev->common.cap_mask);
+
+   if (of_get_property(op->dev.of_node,
+   "fsl,external-dma-control-signals", NULL)) {
+   dma_cap_set(DMA_SG, fdev->common.cap_mask);
+   fdev->common.device_prep_dma_sg = fsl_dma_prep_sg;
+   } else
+   dma_cap_clear(DMA_SG, fdev->common.cap_mask);
+
fdev->common.device_alloc_chan_resources = fsl_dma_alloc_chan_resources;
fdev->common.device_free_chan_resources = fsl_dma_free_chan_resources;
fdev->common.device_prep_dma_memcpy = fsl_dma_prep_memcpy;
-   fdev->common.device_prep_dma_sg = fsl_dma_prep_sg;
fdev->common.device_tx_status = fsl_tx_status;
fdev->common.device_issue_pending = fsl_dma_memcpy_issue_pending;
fdev->common.device_config = fsl_dma_device_config;
-- 
1.9.1



Re: [PATCH] mm: don't cap request size based on read-ahead setting

2016-11-20 Thread Hillf Danton
On Saturday, November 19, 2016 3:41 AM Jens Axboe wrote:
> We ran into a funky issue, where someone doing 256K buffered reads saw
> 128K requests at the device level. Turns out it is read-ahead capping
> the request size, since we use 128K as the default setting. This doesn't
> make a lot of sense - if someone is issuing 256K reads, they should see
> 256K reads, regardless of the read-ahead setting, if the underlying
> device can support a 256K read in a single command.
> 
Is it also making any sense to see 4M reads to meet 4M requests if 
the underlying device can support 4M IO?

thanks
Hillf

> This patch introduces a bdi hint, io_pages. This is the soft max IO size
> for the lower level, I've hooked it up to the bdev settings here.
> Read-ahead is modified to issue the maximum of the user request size,
> and the read-ahead max size, but capped to the max request size on the
> device side. The latter is done to avoid reading ahead too much, if the
> application asks for a huge read. With this patch, the kernel behaves
> like the application expects.
> 
> Signed-off-by: Jens Axboe 
> Acked-by: Johannes Weiner 
> ---
>  block/blk-settings.c |  1 +
>  block/blk-sysfs.c|  1 +
>  include/linux/backing-dev-defs.h |  1 +
>  mm/readahead.c   | 39 ---
>  4 files changed, 31 insertions(+), 11 deletions(-)
> 
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index f679ae122843..65f16cf4f850 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -249,6 +249,7 @@ void blk_queue_max_hw_sectors(struct request_queue *q, 
> unsigned int max_hw_secto
>   max_sectors = min_not_zero(max_hw_sectors, limits->max_dev_sectors);
>   max_sectors = min_t(unsigned int, max_sectors, BLK_DEF_MAX_SECTORS);
>   limits->max_sectors = max_sectors;
> + q->backing_dev_info.io_pages = max_sectors >> (PAGE_SHIFT - 9);
>  }
>  EXPORT_SYMBOL(blk_queue_max_hw_sectors);
> 
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index 9cc8d7c5439a..ea374e820775 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -212,6 +212,7 @@ queue_max_sectors_store(struct request_queue *q, const 
> char *page, size_t count)
> 
>   spin_lock_irq(q->queue_lock);
>   q->limits.max_sectors = max_sectors_kb << 1;
> + q->backing_dev_info.io_pages = max_sectors_kb >> (PAGE_SHIFT - 10);
>   spin_unlock_irq(q->queue_lock);
> 
>   return ret;
> diff --git a/include/linux/backing-dev-defs.h 
> b/include/linux/backing-dev-defs.h
> index c357f27d5483..b8144b2d59ce 100644
> --- a/include/linux/backing-dev-defs.h
> +++ b/include/linux/backing-dev-defs.h
> @@ -136,6 +136,7 @@ struct bdi_writeback {
>  struct backing_dev_info {
>   struct list_head bdi_list;
>   unsigned long ra_pages; /* max readahead in PAGE_SIZE units */
> + unsigned long io_pages; /* max allowed IO size */
>   unsigned int capabilities; /* Device capabilities */
>   congested_fn *congested_fn; /* Function pointer if device is md/dm */
>   void *congested_data;   /* Pointer to aux data for congested func */
> diff --git a/mm/readahead.c b/mm/readahead.c
> index c8a955b1297e..fb4c99f85618 100644
> --- a/mm/readahead.c
> +++ b/mm/readahead.c
> @@ -207,12 +207,21 @@ int __do_page_cache_readahead(struct address_space 
> *mapping, struct file *filp,
>   * memory at once.
>   */
>  int force_page_cache_readahead(struct address_space *mapping, struct file 
> *filp,
> - pgoff_t offset, unsigned long nr_to_read)
> +pgoff_t offset, unsigned long nr_to_read)
>  {
> + struct backing_dev_info *bdi = inode_to_bdi(mapping->host);
> + struct file_ra_state *ra = &filp->f_ra;
> + unsigned long max_pages;
> +
>   if (unlikely(!mapping->a_ops->readpage && !mapping->a_ops->readpages))
>   return -EINVAL;
> 
> - nr_to_read = min(nr_to_read, inode_to_bdi(mapping->host)->ra_pages);
> + /*
> +  * If the request exceeds the readahead window, allow the read to
> +  * be up to the optimal hardware IO size
> +  */
> + max_pages = max_t(unsigned long, bdi->io_pages, ra->ra_pages);
> + nr_to_read = min(nr_to_read, max_pages);
>   while (nr_to_read) {
>   int err;
> 
> @@ -369,10 +378,18 @@ ondemand_readahead(struct address_space *mapping,
>  bool hit_readahead_marker, pgoff_t offset,
>  unsigned long req_size)
>  {
> - unsigned long max = ra->ra_pages;
> + struct backing_dev_info *bdi = inode_to_bdi(mapping->host);
> + unsigned long max_pages = ra->ra_pages;
>   pgoff_t prev_offset;
> 
>   /*
> +  * If the request exceeds the readahead window, allow the read to
> +  * be up to the optimal hardware IO size
> +  */
> + if (req_size > max_pages && bdi->io_pages > max_pages)
> + max_pages = min(req_size, bdi->io_pages);
> +
> + /*
>* start of file
>*/
>  

[PATCH] ARM: davinci: Allocate spare interrupts

2016-11-20 Thread David Lechner
This allocates spare interrupts for mach-davinci. These extra interrupts
are needed for things like IIO triggers that define software interrupts.

Signed-off-by: David Lechner 
---
 arch/arm/mach-davinci/include/mach/irqs.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/arm/mach-davinci/include/mach/irqs.h 
b/arch/arm/mach-davinci/include/mach/irqs.h
index edb2ca6..2b56bb2 100644
--- a/arch/arm/mach-davinci/include/mach/irqs.h
+++ b/arch/arm/mach-davinci/include/mach/irqs.h
@@ -403,7 +403,9 @@
 
 /* da850 currently has the most gpio pins (144) */
 #define DAVINCI_N_GPIO 144
+/* Extra IRQs for things like IIO triggers */
+#define DAVINCI_N_SPARE_IRQ 16
 /* da850 currently has the most irqs so use DA850_N_CP_INTC_IRQ */
-#define NR_IRQS (DA850_N_CP_INTC_IRQ + DAVINCI_N_GPIO)
+#define NR_IRQS (DA850_N_CP_INTC_IRQ + DAVINCI_N_GPIO + DAVINCI_N_SPARE_IRQ)
 
 #endif /* __ASM_ARCH_IRQS_H */
-- 
2.7.4



[PATCH v2 1/2] dmaengine: dma_slave_config: add support for slave port window

2016-11-20 Thread Peter Ujfalusi
Some slave devices use an address window instead of a single register for
reading and/or writing data. With src/dst_port_window_size the address window
can be specified, and the DMAengine driver should use this information to
correctly set up the transfer to loop within the provided window.

Signed-off-by: Peter Ujfalusi 
---
 include/linux/dmaengine.h | 8 
 1 file changed, 8 insertions(+)

diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index cc535a478bae..689d44327ef3 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -336,6 +336,12 @@ enum dma_slave_buswidth {
  * may or may not be applicable on memory sources.
  * @dst_maxburst: same as src_maxburst but for destination target
  * mutatis mutandis.
+ * @src_port_window_size: The length of the register area on the device side
+ * to which the data needs to be written. It is only used for devices which
+ * use an area instead of a single register to receive the data. Typically
+ * the DMA loops over this area in order to transfer the data.
+ * @dst_port_window_size: same as src_port_window_size but for the destination
+ * port.
  * @device_fc: Flow Controller Settings. Only valid for slave channels. Fill
  * with 'true' if peripheral should be flow controller. Direction will be
  * selected at Runtime.
@@ -363,6 +369,8 @@ struct dma_slave_config {
enum dma_slave_buswidth dst_addr_width;
u32 src_maxburst;
u32 dst_maxburst;
+   u32 src_port_window_size;
+   u32 dst_port_window_size;
bool device_fc;
unsigned int slave_id;
 };
-- 
2.10.2



Re: [PATCH]: staging: Greybus: Remove unnecessary braces for single statement block

2016-11-20 Thread Viresh Kumar
On Fri, Nov 18, 2016 at 8:45 PM, Rahul Krishnan
 wrote:
> This patch fixes the following checkpatch.pl warning:
> WARNING: braces {} are not necessary for single statement blocks
>
>
> Signed-off-by: Rahul Krishnan 
> ---
>  drivers/staging/greybus/sdio.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)

Acked-by: Viresh Kumar 


Re: [RFC][PATCH 7/7] kref: Implement using refcount_t

2016-11-20 Thread Boqun Feng
On Fri, Nov 18, 2016 at 05:06:55PM +, Will Deacon wrote:
> On Fri, Nov 18, 2016 at 12:37:18PM +0100, Peter Zijlstra wrote:
> > On Fri, Nov 18, 2016 at 10:07:26AM +, Reshetova, Elena wrote:
> > > 
> > > Peter do you have the changes to the refcount_t interface compare to
> > > the version in this patch? 
> > 
> > > We are now starting working on atomic_t --> refcount_t conversions and
> > > it would save a bit of work to have latest version from you that we
> > > can be based upon. 
> > 
> > The latestest version below, mostly just comment changes since last
> > time.
> > 
> > ---
> > Subject: refcount_t: A special purpose refcount type
> > From: Peter Zijlstra 
> > Date: Mon Nov 14 18:06:19 CET 2016
> > 
> > Provide refcount_t, an atomic_t like primitive built just for
> > refcounting.
> > 
> > It provides saturation semantics such that overflow becomes impossible
> > and thereby 'spurious' use-after-free is avoided.
> > 
> > Signed-off-by: Peter Zijlstra (Intel) 
> > ---
> >  include/linux/refcount.h |  241 +++
> >  1 file changed, 241 insertions(+)
> > 
> > --- /dev/null
> > +++ b/include/linux/refcount.h
> > @@ -0,0 +1,241 @@
> > +#ifndef _LINUX_REFCOUNT_H
> > +#define _LINUX_REFCOUNT_H
> > +
> > +/*
> > + * Variant of atomic_t specialized for reference counts.
> > + *
> > + * The interface matches the atomic_t interface (to aid in porting) but only
> > + * provides the few functions one should use for reference counting.
> > + *
> > + * It differs in that the counter saturates at UINT_MAX and will not move once
> > + * there. This avoids wrapping the counter and causing 'spurious'
> > + * use-after-free issues.
> > + *
> > + * Memory ordering rules are slightly relaxed wrt regular atomic_t functions
> > + * and provide only what is strictly required for refcounts.
> > + *
> > + * The increments are fully relaxed; these will not provide ordering. The
> > + * rationale is that whatever is used to obtain the object we're increasing the
> > + * reference count on will provide the ordering. For locked data structures,
> > + * it's the lock acquire; for RCU/lockless data structures it's the dependent
> > + * load.
> > + *
> > + * Do note that inc_not_zero() provides a control dependency which will order
> > + * future stores against the inc; this ensures we'll never modify the object
> > + * if we did not in fact acquire a reference.
> > + *
> > + * The decrements will provide release order, such that all the prior loads and
> > + * stores will be issued before; it also provides a control dependency, which
> > + * will order us against the subsequent free().
> > + *
> > + * The control dependency is against the load of the cmpxchg (ll/sc) that
> > + * succeeded. This means the stores aren't fully ordered, but this is fine
> > + * because the 1->0 transition indicates no concurrency.
> > + *
> > + * Note that the allocator is responsible for ordering things between free()
> > + * and alloc().
> > + *
> > + *
> > + * Note: the implementation hard-relies on increments; additions bigger than 1
> > + *   need explicit overflow -> saturation logic.
> > + *
> > + */
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +typedef struct refcount_struct {
> > +   atomic_t refs;
> > +} refcount_t;
> > +
> > +#define REFCOUNT_INIT(n)   { .refs = ATOMIC_INIT(n), }
> > +
> > +static inline void refcount_set(refcount_t *r, int n)
> > +{
> > +   atomic_set(&r->refs, n);
> > +}
> > +
> > +static inline unsigned int refcount_read(const refcount_t *r)
> > +{
> > +   return atomic_read(&r->refs);
> > +}
> 
> Minor nit, but it might be worth being consistent in our usage of int
> (parameter to refcount_set) and unsigned int (return value of
> refcount_read).
> 
> > +
> > +/*
> > + * Similar to atomic_inc(), will saturate at UINT_MAX and WARN.
> > + *
> > + * Provides no memory ordering, it is assumed the caller already has a
> > + * reference on the object, will WARN when this is not so.
> > + */
> > +static inline void refcount_inc(refcount_t *r)
> > +{
> > +   unsigned int old, new, val = atomic_read(&r->refs);
> > +
> > +   for (;;) {
> > +   WARN(!val, "refcount_t: increment on 0; use-after-free.\n");
> > +
> > +   if (unlikely(val == UINT_MAX))
> > +   return;
> > +
> > +   new = val + 1;
> > +   old = atomic_cmpxchg_relaxed(&r->refs, val, new);
> > +   if (old == val)
> > +   break;
> > +
> > +   val = old;
> > +   }
> > +
> > +   WARN(new == UINT_MAX, "refcount_t: saturated; leaking memory.\n");
> > +}
> > +
> > +/*
> > + * Similar to atomic_inc_not_zero(), will saturate at UINT_MAX and WARN.
> > + *
> > + * Provides no memory ordering, it is assumed the caller has guaranteed the
> > + * object memory to be stable (RCU, etc.). It does provide a control dependency
> > + * and thereby orders future stores.

RE: [PATCH v11 1/6] drivers/platform/x86/p2sb: New Primary to Sideband bridge support driver for Intel SOC's

2016-11-20 Thread Tan, Jui Nee


> -Original Message-
> From: Andy Shevchenko [mailto:andriy.shevche...@linux.intel.com]
> Sent: Friday, November 18, 2016 7:22 PM
> To: Tan, Jui Nee ; mika.westerb...@linux.intel.com;
> heikki.kroge...@linux.intel.com; t...@linutronix.de; dvh...@infradead.org;
> mi...@redhat.com; h...@zytor.com; x...@kernel.org; pty...@xes-inc.com;
> lee.jo...@linaro.org; linus.wall...@linaro.org
> Cc: linux-g...@vger.kernel.org; platform-driver-...@vger.kernel.org;
> linux-kernel@vger.kernel.org; Yong, Jonathan ;
> Yu, Ong Hock ; Luck, Tony ;
> Wan Mohamad, Wan Ahmad Zainie ;
> Sun, Yunying 
> Subject: Re: [PATCH v11 1/6] drivers/platform/x86/p2sb: New Primary to
> Sideband bridge support driver for Intel SOC's
> 
> On Fri, 2016-11-18 at 13:22 +0800, Tan Jui Nee wrote:
> > From: Andy Shevchenko 
> >
> > There is already one and at least one more user coming which require
> > an access to Primary to Sideband bridge (P2SB) in order to get IO or
> > MMIO bar hidden by BIOS.
> > Create a driver to access P2SB for x86 devices.
> >
> > Signed-off-by: Yong, Jonathan 
> > Signed-off-by: Andy Shevchenko 
> > ---
> > Changes in V11:
> > - No change
> 
> Any particular reason you ignored my comments to v10 of this patch?
>
Hi Andy,
I am sorry for missing your comments; the email was filtered into another
folder and I was not aware of it. I will apply your comments in the next
patch version.
 
> --
> Andy Shevchenko 
> Intel Finland Oy


Re: [PATCH] thermal/powerclamp: add back module device table

2016-11-20 Thread Zhang Rui
On Thu, 2016-11-17 at 11:42 -0800, Jacob Pan wrote:
> On Tue, 15 Nov 2016 08:03:32 +0100
> Greg Kroah-Hartman  wrote:
> 
> > 
> > On Mon, Nov 14, 2016 at 11:08:45AM -0800, Jacob Pan wrote:
> > > 
> > > Commit 3105f234e0aba43e44e277c20f9b32ee8add43d4 replaced module
> > > cpu id table with a cpu feature check, which is logically
> > > correct.
> > > But we need the module device table to allow module auto loading.
> > > 
> > > Fixes: 3105f234 ("thermal/powerclamp: correct cpu support check")
> > > Signed-off-by: Jacob Pan 
> > > ---
> > >  drivers/thermal/intel_powerclamp.c | 9 -
> > >  1 file changed, 8 insertions(+), 1 deletion(-)  
> > 
> > 
> > This is not the correct way to submit patches for inclusion in the
> > stable kernel tree.  Please read
> > Documentation/stable_kernel_rules.txt
> > for how to do this properly.
> > 
> > 
> Good to know, thanks. Rui will take care of it this time. Per Rui
> "I will apply patch 1 and queue up for next -rc and 4.8 stable."
> 

Just found another problem.
We're still missing this upstream
commit 3105f234e0aba43e44e277c20f9b32ee8add43d4 (thermal/powerclamp:
correct cpu support check) for 4.7 stable, and in this case, we can not
queue this patch for both 4.7 and 4.8 stable at the moment because it
does not apply to 4.7 stable.

I will send this patch out asap to catch 4.9, and then send a note to
stable kernel with the following Option 2
in Documentation/stable_kernel_rules.txt after it's merged.

thanks,
rui


Re: [PATCH 0/2] ACPI / processor / cpufreq: Function return value cleanups

2016-11-20 Thread Viresh Kumar
On 18-11-16, 13:57, Rafael J. Wysocki wrote:
> Hi,
> 
> Two cleanups related to unused function return values, [1/2] in 
> processor_perflib.c
> and [2/2] in cpufreq.c.

Acked-by: Viresh Kumar 

-- 
viresh


Re: [PATCH v2] cpufreq: Avoid using inactive policies

2016-11-20 Thread Viresh Kumar
On 18-11-16, 13:40, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki 
> 
> There are two places in the cpufreq core in which low-level driver
> callbacks may be invoked for an inactive cpufreq policy, which isn't
> guaranteed to work in general.  Both are due to possible races with
> CPU offline.
> 
> First, in cpufreq_get(), the policy may become inactive after
> the check against policy->cpus in cpufreq_cpu_get() and before
> policy->rwsem is acquired, in which case using it going forward may
> not be correct.
> 
> Second, an analogous situation is possible in cpufreq_update_policy().
> 
> Avoid using inactive policies by adding policy_is_inactive() checks
> to the code in the above places.
> 
> Signed-off-by: Rafael J. Wysocki 
> ---
> 
> -> v2:
>  Initialize ret in cpufreq_update_policy() if the inactive policy check
>  doesn't pass.

Acked-by: Viresh Kumar 

-- 
viresh


[PATCH - v2] block: call trace_block_split() from bio_split()

2016-11-20 Thread NeilBrown

Somewhere around
Commit: 20d0189b1012 ("block: Introduce new bio_split()")
and
Commit: 4b1faf931650 ("block: Kill bio_pair_split()")

in 3.14 we lost the call to trace_block_split() from bio_split().

Commit: cda22646adaa ("block: add call to split trace point")

in 4.5 added it back for blk_queue_split(), but not for other users of
bio_split(), and particularly not for md.

This patch moves the trace_block_split() call from blk_queue_split()
to bio_split().
As blk_queue_split() calls bio_split() (via various helper functions)
the same events that were traced before will still be traced.

Reviewed-by: Christoph Hellwig 
Signed-off-by: NeilBrown 
---

Thanks Christoph.
This adds the wrap and the reviewed-by.

NeilBrown


 block/bio.c   | 2 ++
 block/blk-merge.c | 1 -
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/bio.c b/block/bio.c
index db85c5753a76..0aa755abd10b 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1804,6 +1804,8 @@ struct bio *bio_split(struct bio *bio, int sectors,
bio_integrity_trim(split, 0, sectors);
 
bio_advance(bio, split->bi_iter.bi_size);
+   trace_block_split(bdev_get_queue(bio->bi_bdev), split,
+ bio->bi_iter.bi_sector);
 
return split;
 }
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 2642e5fc8b69..82cdd35a9f07 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -217,7 +217,6 @@ void blk_queue_split(struct request_queue *q, struct bio 
**bio,
split->bi_opf |= REQ_NOMERGE;
 
bio_chain(split, *bio);
-   trace_block_split(q, split, (*bio)->bi_iter.bi_sector);
generic_make_request(*bio);
*bio = split;
}
-- 
2.10.2





Re: [HMM v13 16/18] mm/hmm/migrate: new memory migration helper for use with device memory

2016-11-20 Thread Balbir Singh


On 19/11/16 05:18, Jérôme Glisse wrote:
> This patch add a new memory migration helpers, which migrate memory
 adds   helper migrates
> backing a range of virtual address of a process to different memory
> (which can be allocated through special allocator). It differs from
> numa migration by working on a range of virtual address and thus by
> doing migration in chunk that can be large enough to use DMA engine
> or special copy offloading engine.
> 
> Expected users are any one with heterogeneous memory where different
> memory have different characteristics (latency, bandwidth, ...). As
> an example IBM platform with CAPI bus can make use of this feature
> to migrate between regular memory and CAPI device memory. New CPU
> architecture with a pool of high performance memory not manage as
> cache but presented as regular memory (while being faster and with
> lower latency than DDR) will also be prime user of this patch.
> 
> Migration to private device memory will be useful for devices that
> have a large pool of such memory, like GPUs; NVidia plans to use HMM for that.
> 
> Signed-off-by: Jérôme Glisse 
> Signed-off-by: Jatin Kumar 
> Signed-off-by: John Hubbard 
> Signed-off-by: Mark Hairgrove 
> Signed-off-by: Sherry Cheung 
> Signed-off-by: Subhash Gutti 
> ---
>  include/linux/hmm.h |  54 -
>  mm/migrate.c| 584 
> 
>  2 files changed, 635 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/hmm.h b/include/linux/hmm.h
> index c79abfc..9777309 100644
> --- a/include/linux/hmm.h
> +++ b/include/linux/hmm.h
> @@ -101,10 +101,13 @@ struct hmm;
>   * HMM_PFN_EMPTY: corresponding CPU page table entry is none (pte_none() true)
>   * HMM_PFN_FAULT: used by hmm_vma_fault() to signify which address needs faulting
>   * HMM_PFN_DEVICE: this is device memory (ie a ZONE_DEVICE page)
> + * HMM_PFN_LOCKED: underlying struct page is locked
>   * HMM_PFN_SPECIAL: corresponding CPU page table entry is special ie result of
>   *  vm_insert_pfn() or vm_insert_page() and thus should not be mirrored by a
>   *  device (the entry will never have HMM_PFN_VALID set and the pfn value
>   *  is undefined)
> + * HMM_PFN_MIGRATE: used by hmm_vma_migrate() to signify which address can be
> + *  migrated
>   * HMM_PFN_UNADDRESSABLE: unaddressable device memory (ZONE_DEVICE)
>   */
>  typedef unsigned long hmm_pfn_t;
> @@ -116,9 +119,11 @@ typedef unsigned long hmm_pfn_t;
>  #define HMM_PFN_EMPTY (1 << 4)
>  #define HMM_PFN_FAULT (1 << 5)
>  #define HMM_PFN_DEVICE (1 << 6)
> -#define HMM_PFN_SPECIAL (1 << 7)
> -#define HMM_PFN_UNADDRESSABLE (1 << 8)
> -#define HMM_PFN_SHIFT 9
> +#define HMM_PFN_LOCKED (1 << 7)
> +#define HMM_PFN_SPECIAL (1 << 8)
> +#define HMM_PFN_MIGRATE (1 << 9)
> +#define HMM_PFN_UNADDRESSABLE (1 << 10)
> +#define HMM_PFN_SHIFT 11
>  
>  static inline struct page *hmm_pfn_to_page(hmm_pfn_t pfn)
>  {
> @@ -323,6 +328,49 @@ bool hmm_vma_fault(struct vm_area_struct *vma,
>  hmm_pfn_t *pfns);
>  
>  
> +/*
> + * struct hmm_migrate_ops - migrate operation callback
> + *
> + * @alloc_and_copy: allocate destination memory and copy source memory to it
> + * @finalize_and_map: allow the caller to inspect successfully migrated pages
> + *
> + * The new HMM migrate helper hmm_vma_migrate() allows memory migration to use
> + * a device DMA engine to perform the copy from source to destination memory;
> + * it also allows the caller to use its own memory allocator for destination
> + * memory.
> + *
> + * Note that in alloc_and_copy the device driver can decide not to migrate some
> + * of the entries; for those it must clear the HMM_PFN_MIGRATE flag. The
> + * destination page must be locked and the corresponding hmm_pfn_t value in the
> + * array updated with the HMM_PFN_MIGRATE and HMM_PFN_LOCKED flags set (and of
> + * course be a valid entry). It is expected that the allocated page will have
> + * an elevated refcount and that a put_page() will free the page. Device
> + * drivers might want to allocate with an extra refcount if they want to
> + * control deallocation of failed migrations inside the finalize_and_map()
> + * callback.
> + *
> + * Inside finalize_and_map() the device driver must use the HMM_PFN_MIGRATE
> + * flag to determine which pages have been successfully migrated.
> + */
> +struct hmm_migrate_ops {
> + void (*alloc_and_copy)(struct vm_area_struct *vma,
> +unsigned long start,
> +unsigned long end,
> +hmm_pfn_t *pfns,
> +void *private);
> + void (*finalize_and_map)(struct vm_area_struct *vma,
> +  unsigned long start,
> +  unsigned long end,
> +  hmm_pfn_t *pfns,
> +  void *private);
> +};
> +
> +int hmm_vma_migrate(const struct hmm_migrate_ops *ops,
> + struct vm_

Re: [PATCH V8 1/3] tracing: add a possibility of exporting function trace to other places instead of ring buffer only

2016-11-20 Thread Chunyan Zhang
On 19 November 2016 at 00:45, Steven Rostedt  wrote:
> On Fri, 18 Nov 2016 16:57:53 +0200
> Alexander Shishkin  wrote:
>
>> Steven Rostedt  writes:
>>
>> > This looks good to me, although I would like this to go through my tree
>> > (to make sure it gets all my testing). I understand the next two
>> > patches depend on this, how would you want to go about that?
>> >
>> > One is that I can pull it in the next merge window, and the rest go in
>> > after that. Or I can take the other two patches with the proper acks as
>> > well.
>>
>> I just asked for the last patch to be split 4 ways, but otherwise, they
>> have my acks. If Chunyan can do that, you can take all of them into your
>> tree.
>>
>
> OK, I'll wait for the split then.

OK, I will split that and send another patch-set with Alex's acks.

Many thanks for the reviews from you two,
Chunyan

>
> Thanks,
>
> -- Steve


Re: [kbuild-all] [Patch v6.1] x86/kvm: Add AVX512_4VNNIW and AVX512_4FMAPS support

2016-11-20 Thread Ye Xiaolong
On 11/15, He Chen wrote:
>On Tue, Nov 15, 2016 at 04:24:39AM +0800, kbuild test robot wrote:
>> Hi He,
>> 
>> [auto build test ERROR on kvm/linux-next]
>> [also build test ERROR on v4.9-rc5]
>> [cannot apply to next-20161114]
>> [if your patch is applied to the wrong git tree, please drop us a note to 
>> help improve the system]
>> 
>> url:
>> https://github.com/0day-ci/linux/commits/He-Chen/x86-kvm-Add-AVX512_4VNNIW-and-AVX512_4FMAPS-support/20161114-170941
>> base:   https://git.kernel.org/pub/scm/virt/kvm/kvm.git linux-next
>> config: x86_64-kexec (attached as .config)
>> compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
>> reproduce:
>> # save the attached .config to linux build tree
>> make ARCH=x86_64 
>> 
>> All errors (new ones prefixed by >>):
>> 
>>arch/x86/kvm/cpuid.c: In function '__do_cpuid_ent':
>> >> arch/x86/kvm/cpuid.c:472:18: error: implicit declaration of function 
>> >> 'get_scattered_cpuid_leaf' [-Werror=implicit-function-declaration]
>>entry->edx &= get_scattered_cpuid_leaf(7, 0, CPUID_EDX);
>>  ^~~~
>> >> arch/x86/kvm/cpuid.c:472:49: error: 'CPUID_EDX' undeclared (first use in 
>> >> this function)
>>entry->edx &= get_scattered_cpuid_leaf(7, 0, CPUID_EDX);
>> ^
>>arch/x86/kvm/cpuid.c:472:49: note: each undeclared identifier is reported 
>> only once for each function it appears in
>>cc1: some warnings being treated as errors
>>
>I have downloaded the .config.gz attachment and used the .config in it
>to build the kernel on my local branch again, and I don't see any warning
>or error messages.
>
>I wonder whether the previous 0001 and 0002 patches have applied to run
>this test? Or is there something wrong with my compiler or patches?

Hi, He

The 0day robot hasn't applied the previous 0001 and 0002 patches in this
case because it considered this patch an individual one. Please ignore
this warning.

Btw: you could try using git (>=2.9.0) format-patch --base= (or
--base=auto for convenience) to record what (public, well-known) commit
your patch series was built on.

Thanks,
Xiaolong
>
>Thanks,
>-He
>___
>kbuild-all mailing list
>kbuild-...@lists.01.org
>https://lists.01.org/mailman/listinfo/kbuild-all


Re: [RFC] timekeeping: Use cached readouts for monotonic and raw clocks in suspend

2016-11-20 Thread joelaf

Hi Thomas,

On 11/20/2016 05:24 AM, Thomas Gleixner wrote:

On Sat, 19 Nov 2016, Joel Fernandes wrote:


I am planning to add boot clock as a trace clock that can account suspend time
during tracing, however ktime_get_with_offset throws a warning as the
clocksource is attempted to be accessed in suspend.


ktime_get_with_offset() cannot be used as trace clock at all because it can
live lock in NMI context. That's why we have ktime_get_mono_fast().


But ktime_get_mono_fast() doesn't account for suspend time; only the boot
clock (accessed with ktime_get_with_offset()) does, and while tracing it is
useful for the trace clock to account for suspended time in my use case.


Instead, would it be OK to introduce a fast boot clock derived from the fast
monotonic clock to address the NMI live-lock issues you mentioned? Below
is an untested patch just to show the idea. Let me know your suggestions.
Thanks,


Joel
--8<--
From 78c4f89e6f39cdd32e91883f2d2a80c7d97e34cf Mon Sep 17 00:00:00 2001
From: Joel Fernandes 
Date: Sun, 20 Nov 2016 18:58:28 -0800
Subject: [RFC] timekeeping: Add a fast boot clock derived from fast
 monotonic clock

Signed-off-by: Joel Fernandes 
---
 kernel/time/timekeeping.c | 36 ++--
 1 file changed, 30 insertions(+), 6 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 37dec7e..41afa1e 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -55,6 +55,12 @@ static struct timekeeper shadow_timekeeper;
  */
 struct tk_fast {
seqcount_t  seq;
+
+   /*
+* first dimension is based on lower seq bit,
+* second dimension is for offset type (real, boot, tai)
+*/
+   ktime_t offsets[2][3];
struct tk_read_base base[2];
 };

@@ -350,14 +356,20 @@ static void update_fast_timekeeper(struct tk_read_base *tkr, struct tk_fast *tkf

/* Force readers off to base[1] */
raw_write_seqcount_latch(&tkf->seq);

-   /* Update base[0] */
+   /* Update base[0] and offsets */
memcpy(base, tkr, sizeof(*base));
+   tkf->offsets[0][TK_OFFS_REAL] = tk_core.timekeeper.offs_real;
+   tkf->offsets[0][TK_OFFS_BOOT] = tk_core.timekeeper.offs_boot;
+   tkf->offsets[0][TK_OFFS_TAI] = tk_core.timekeeper.offs_tai;

/* Force readers back to base[0] */
raw_write_seqcount_latch(&tkf->seq);

-   /* Update base[1] */
+   /* Update base[1] and offsets */
memcpy(base + 1, base, sizeof(*base));
+   tkf->offsets[1][TK_OFFS_REAL] = tk_core.timekeeper.offs_real;
+   tkf->offsets[1][TK_OFFS_BOOT] = tk_core.timekeeper.offs_boot;
+   tkf->offsets[1][TK_OFFS_TAI] = tk_core.timekeeper.offs_tai;
 }

 /**
@@ -392,16 +404,23 @@ static void update_fast_timekeeper(struct tk_read_base *tkr, struct tk_fast *tkf

  * of the following timestamps. Callers need to be aware of that and
  * deal with it.
  */
-static __always_inline u64 __ktime_get_fast_ns(struct tk_fast *tkf)
+static __always_inline u64 __ktime_get_fast_ns(struct tk_fast *tkf, int offset)
 {
struct tk_read_base *tkr;
unsigned int seq;
u64 now;
+   ktime_t *off;

do {
seq = raw_read_seqcount_latch(&tkf->seq);
tkr = tkf->base + (seq & 0x01);
-   now = ktime_to_ns(tkr->base);
+
+   if (offset >= 0) {
+   off = tkf->offsets[seq & 0x01];
+   now = ktime_to_ns(ktime_add(tkr->base, off[offset]));
+   } else {
+   now = ktime_to_ns(tkr->base);
+   }

now += timekeeping_delta_to_ns(tkr,
clocksource_delta(
@@ -415,16 +434,21 @@ static __always_inline u64 __ktime_get_fast_ns(struct tk_fast *tkf)


 u64 ktime_get_mono_fast_ns(void)
 {
-   return __ktime_get_fast_ns(&tk_fast_mono);
+   return __ktime_get_fast_ns(&tk_fast_mono, -1);
 }
 EXPORT_SYMBOL_GPL(ktime_get_mono_fast_ns);

 u64 ktime_get_raw_fast_ns(void)
 {
-   return __ktime_get_fast_ns(&tk_fast_raw);
+   return __ktime_get_fast_ns(&tk_fast_raw, -1);
 }
 EXPORT_SYMBOL_GPL(ktime_get_raw_fast_ns);

+u64 ktime_get_boot_fast_ns(void)
+{
+   return __ktime_get_fast_ns(&tk_fast_mono, TK_OFFS_BOOT);
+}
+
 /* Suspend-time cycles value for halted fast timekeeper. */
 static cycle_t cycles_at_suspend;

--
2.8.0.rc3.226.g39d4020


Re: [PATCH] reset: hisilicon: add a polarity cell for reset line specifier

2016-11-20 Thread Jiancheng Xue
Hi Philipp,

On 2016/11/16 11:17, Jiancheng Xue wrote:
> Hi Philipp,
> 
> On 2016/11/15 18:43, Philipp Zabel wrote:
>> Hi Jiancheng,
>>
>> Am Dienstag, den 15.11.2016, 15:09 +0800 schrieb Jiancheng Xue:
>>> Add a polarity cell for reset line specifier. If the reset line
>>> is asserted when the register bit is 1, the polarity is
>>> normal. Otherwise, it is inverted.
>>>
>>> Signed-off-by: Jiancheng Xue 
>>> ---
> Thank you very much for replying so soon.
> 
> Please allow me to describe the reason why this patch exists first.
> All bits in the reset controller were designed to be active-high.
> But in a recent chip only one bit was implemented to be active-low :(
> 
>>>  .../devicetree/bindings/clock/hisi-crg.txt | 11 ---
>>>  arch/arm/boot/dts/hi3519.dtsi  |  2 +-
>>>  drivers/clk/hisilicon/reset.c  | 36 
>>> --
>>>  3 files changed, 33 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/Documentation/devicetree/bindings/clock/hisi-crg.txt 
>>> b/Documentation/devicetree/bindings/clock/hisi-crg.txt
>>> index e3919b6..fcbb4f3 100644
>>> --- a/Documentation/devicetree/bindings/clock/hisi-crg.txt
>>> +++ b/Documentation/devicetree/bindings/clock/hisi-crg.txt
>>> @@ -25,19 +25,20 @@ to specify the clock which they consume.
>>>  
>>>  All these identifier could be found in .
>>>  
>>> -- #reset-cells: should be 2.
>>> +- #reset-cells: should be 3.
>>>  
>>>  A reset signal can be controlled by writing a bit register in the CRG 
>>> module.
>>> -The reset specifier consists of two cells. The first cell represents the
>>> +The reset specifier consists of three cells. The first cell represents the
>>>  register offset relative to the base address. The second cell represents 
>>> the
>>> -bit index in the register.
>>> +bit index in the register. The third cell represents the polarity of the 
>>> reset
>>> +line (0 for normal, 1 for inverted).
>>
#reset-cells: Should be 2 if the compatible string is "hisilicon,hi3519-crg".
Should be 3 otherwise.
  A reset signal can be controlled by writing a bit register in the
  CRG module. The reset specifier consists of two or three cells.
  The first cell represents the register offset relative to the base
  address. The second cell represents the bit index in the register.
  The third cell represents the polarity of the reset line
  (0 for active-high, 1 for active-low).

If I change the binding like this, can it be accepted?

Regards,
Jiancheng

>> What is normal and what is inverted? Please specify which is active-high
>> and which is active-low.
>>
> OK. I'll use active-high and active-low instead.
> 
>>>  
>>>  Example: CRG nodes
>>>  CRG: clock-reset-controller@1201 {
>>> compatible = "hisilicon,hi3519-crg";
>>> reg = <0x1201 0x1>;
>>> #clock-cells = <1>;
>>> -   #reset-cells = <2>;
>>> +   #reset-cells = <3>;
>>>  };
>>>  
>>>  Example: consumer nodes
>>> @@ -45,5 +46,5 @@ i2c0: i2c@1211 {
>>> compatible = "hisilicon,hi3519-i2c";
>>> reg = <0x1211 0x1000>;
>>> clocks = <&CRG HI3519_I2C0_RST>;
>>> -   resets = <&CRG 0xe4 0>;
>>> +   resets = <&CRG 0xe4 0 0>;
>>>  };
>>> diff --git a/arch/arm/boot/dts/hi3519.dtsi b/arch/arm/boot/dts/hi3519.dtsi
>>> index 5729ecf..b7cb182 100644
>>> --- a/arch/arm/boot/dts/hi3519.dtsi
>>> +++ b/arch/arm/boot/dts/hi3519.dtsi
>>> @@ -50,7 +50,7 @@
>>> crg: clock-reset-controller@1201 {
>>> compatible = "hisilicon,hi3519-crg";
>>> #clock-cells = <1>;
>>> -   #reset-cells = <2>;
>>> +   #reset-cells = <3>;
>>
>> That is a backwards incompatible change. Which I think in this case
>> could be tolerated, because there are no users yet of the reset
>> controller. Or are there any hi3519 based device trees that use the
>> resets out in the wild? If there are, the driver must continue to
>> support old device trees with two reset-cells. Which would not be
>> trivial because currently the core checks in reset_control_get that
>> rcdev->of_n_reset_cells is equal to the #reset-cells value from DT.
> 




Re: [PATCH v2 3/9] arm64: dts: rockchip: add VOP and VOP iommu node for rk3399

2016-11-20 Thread Caesar Wang

On 15 November 2016 at 00:05, Heiko Stuebner wrote:

Am Mittwoch, 9. November 2016, 21:21:55 CET schrieb Caesar Wang:

From: Mark Yao 

Add the core display-subsystem node and the two display controllers
available on the rk3399.

Signed-off-by: Mark Yao 
Signed-off-by: Yakir Yang 
Signed-off-by: Caesar Wang 
---

Changes in v2: None

  arch/arm64/boot/dts/rockchip/rk3399.dtsi | 58
 1 file changed, 58 insertions(+)

diff --git a/arch/arm64/boot/dts/rockchip/rk3399.dtsi
b/arch/arm64/boot/dts/rockchip/rk3399.dtsi index e5b5b3d..f1d289a 100644
--- a/arch/arm64/boot/dts/rockchip/rk3399.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3399.dtsi
@@ -1290,6 +1290,64 @@
status = "disabled";
};

+   vopl: vop@ff8f {
+   compatible = "rockchip,rk3399-vop-lit";
+   reg = <0x0 0xff8f 0x0 0x3efc>;
+   interrupts = ;

we're using 4 irq elements nowadays to accommodate the PMUs for separate
clusters, see

https://git.kernel.org/cgit/linux/kernel/git/mmind/linux-rockchip.git/commit/?id=210bbd38bb88989ce19208f98e530ff0468f38bd

Same for the edp node.


Ah!  Sorry.



Also, sadly the rockchip drm seems to need some tweaks still, as I wasn't
able to get any display output yet.

To make the vop at least compile I needed to forward-port
https://github.com/mmind/linux-rockchip/commit/05ad856e54fc1aa1939ad1057897036cedc7fb0b
https://github.com/mmind/linux-rockchip/commit/0edb1f7e1ac77437a17d7966121ee6e10ab5db67

[full branch is 
https://github.com/mmind/linux-rockchip/commits/tmp/testing_20161109 ]


Please allow me to have a look at it and bring it up with ChromeOS; upstream
may be missing some patches (DRM or IOMMU or ...). I will resend the other
patches once I bring it up and get display working with upstream on
https://github.com/Caesar-github/rockchip/commits/rk3399/tmp-test


-Caesar

but I'm not sure if I did that correctly yet and am also still seeing
nothing on the display and get iommu errors when starting X11


Heiko


+   clocks = <&cru ACLK_VOP1>, <&cru DCLK_VOP1>, <&cru HCLK_VOP1>;
+   clock-names = "aclk_vop", "dclk_vop", "hclk_vop";
+   resets = <&cru SRST_A_VOP1>, <&cru SRST_H_VOP1>, <&cru 
SRST_D_VOP1>;
+   reset-names = "axi", "ahb", "dclk";
+   iommus = <&vopl_mmu>;
+   status = "disabled";
+
+   vopl_out: port {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   };
+   };
+
+   vopl_mmu: iommu@ff8f3f00 {
+   compatible = "rockchip,iommu";
+   reg = <0x0 0xff8f3f00 0x0 0x100>;
+   interrupts = ;
+   interrupt-names = "vopl_mmu";
+   #iommu-cells = <0>;
+   status = "disabled";
+   };
+
+   vopb: vop@ff90 {
+   compatible = "rockchip,rk3399-vop-big";
+   reg = <0x0 0xff90 0x0 0x3efc>;
+   interrupts = ;
+   clocks = <&cru ACLK_VOP0>, <&cru DCLK_VOP0>, <&cru HCLK_VOP0>;
+   clock-names = "aclk_vop", "dclk_vop", "hclk_vop";
+   resets = <&cru SRST_A_VOP0>, <&cru SRST_H_VOP0>, <&cru 
SRST_D_VOP0>;
+   reset-names = "axi", "ahb", "dclk";
+   iommus = <&vopb_mmu>;
+   status = "disabled";
+
+   vopb_out: port {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   };
+   };
+
+   vopb_mmu: iommu@ff903f00 {
+   compatible = "rockchip,iommu";
+   reg = <0x0 0xff903f00 0x0 0x100>;
+   interrupts = ;
+   interrupt-names = "vopb_mmu";
+   #iommu-cells = <0>;
+   status = "disabled";
+   };
+
+   display_subsystem: display-subsystem {
+   compatible = "rockchip,display-subsystem";
+   ports = <&vopl_out>, <&vopb_out>;
+   status = "disabled";
+   };
+
pinctrl: pinctrl {
compatible = "rockchip,rk3399-pinctrl";
rockchip,grf = <&grf>;



___
Linux-rockchip mailing list
linux-rockc...@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-rockchip




Re: [HMM v13 09/18] mm/hmm/mirror: mirror process address space on device with HMM helpers

2016-11-20 Thread Balbir Singh


On 19/11/16 05:18, Jérôme Glisse wrote:
> This is a heterogeneous memory management (HMM) process address space
> mirroring. In a nutshell this provide an API to mirror process address
> space on a device. This boils down to keeping CPU and device page table
> synchronize (we assume that both device and CPU are cache coherent like
> PCIe device can be).
> 
> This patch provide a simple API for device driver to achieve address
> space mirroring thus avoiding each device driver to grow its own CPU
> page table walker and its own CPU page table synchronization mechanism.
> 
> This is usefull for NVidia GPU >= Pascal, Mellanox IB >= mlx5 and more
   useful
> hardware in the future.
> 
> Signed-off-by: Jérôme Glisse 
> Signed-off-by: Jatin Kumar 
> Signed-off-by: John Hubbard 
> Signed-off-by: Mark Hairgrove 
> Signed-off-by: Sherry Cheung 
> Signed-off-by: Subhash Gutti 
> ---
>  include/linux/hmm.h |  97 +++
>  mm/hmm.c| 160 
> 
>  2 files changed, 257 insertions(+)
> 
> diff --git a/include/linux/hmm.h b/include/linux/hmm.h
> index 54dd529..f44e270 100644
> --- a/include/linux/hmm.h
> +++ b/include/linux/hmm.h
> @@ -88,6 +88,7 @@
>  
>  #if IS_ENABLED(CONFIG_HMM)
>  
> +struct hmm;
>  
>  /*
>   * hmm_pfn_t - HMM use its own pfn type to keep several flags per page
> @@ -127,6 +128,102 @@ static inline hmm_pfn_t hmm_pfn_from_pfn(unsigned long 
> pfn)
>  }
>  
>  
> +/*
> + * Mirroring: how to synchronize the device page table with the CPU page table?
> + *
> + * Device drivers must always synchronize with CPU page table updates; for
> + * this they can either use the mmu_notifier API directly or use the
> + * hmm_mirror API. A device driver can decide to register one mirror per
> + * device per process, or just one mirror per process for a group of
> + * devices. The pattern is:
> + *
> + *  int device_bind_address_space(..., struct mm_struct *mm, ...)
> + *  {
> + *  struct device_address_space *das;
> + *  int ret;
> + *  // Device driver specific initialization, and allocation of das
> + *  // which contain an hmm_mirror struct as one of its field.
> + *  ret = hmm_mirror_register(&das->mirror, mm, &device_mirror_ops);
> + *  if (ret) {
> + *  // Cleanup on error
> + *  return ret;
> + *  }
> + *  // Other device driver specific initialization
> + *  }
> + *
> + * The device driver must not free the struct containing the hmm_mirror
> + * struct before calling hmm_mirror_unregister(); the expected usage is to do
> + * that when the device driver is unbinding from an address space.
> + *
> + *  void device_unbind_address_space(struct device_address_space *das)
> + *  {
> + *  // Device driver specific cleanup
> + *  hmm_mirror_unregister(&das->mirror);
> + *  // Other device driver specific cleanup and now das can be free
> + *  }
> + *
> + * Once an hmm_mirror is registered for an address space, the device driver
> + * will get callbacks through the update() operation (see the hmm_mirror_ops
> + * struct).
> + */
> +
> +struct hmm_mirror;
> +
> +/*
> + * enum hmm_update - type of update
> + * @HMM_UPDATE_INVALIDATE: invalidate range (no indication as to why)
> + */
> +enum hmm_update {
> + HMM_UPDATE_INVALIDATE,
> +};
> +
> +/*
> + * struct hmm_mirror_ops - HMM mirror device operations callback
> + *
> + * @update: callback to update range on a device
> + */
> +struct hmm_mirror_ops {
> + /* update() - update virtual address range of memory
> +  *
> +  * @mirror: pointer to struct hmm_mirror
> +  * @update: update's type (turn read only, unmap, ...)
> +  * @start: virtual start address of the range to update
> +  * @end: virtual end address of the range to update
> +  *
> +  * This callback is called when the CPU page table is updated; the device
> +  * driver must update the device page table according to the update's action.
> +  *
> +  * The device driver callback must wait until the device has fully updated
> +  * its view of the range. Note we plan to make this asynchronous in later
> +  * patches, so that multiple devices can schedule updates to their page
> +  * tables and, once all devices have scheduled the update, we wait for
> +  * them to propagate.
> +  */
> + void (*update)(struct hmm_mirror *mirror,
> +enum hmm_update action,
> +unsigned long start,
> +unsigned long end);
> +};
> +
> +/*
> + * struct hmm_mirror - mirror struct for a device driver
> + *
> + * @hmm: pointer to struct hmm (which is unique per mm_struct)
> + * @ops: device driver callback for HMM mirror operations
> + * @list: for list of mirrors of a given mm
> + *
> + * Each address space (mm_struct) being mirrored by a device must register 
> one
> + * of hmm_mirror struct with HMM. HMM will track list of all mirror

Re: [PATCH v3 10/10] ARM: dts: da850: add usb device node

2016-11-20 Thread David Lechner

On 11/07/2016 02:39 PM, Axel Haslam wrote:

This adds the ohci device node for the da850 soc.
It also enables it for the omapl138 hawk board.

Signed-off-by: Axel Haslam 
---
 arch/arm/boot/dts/da850-lcdk.dts | 8 
 arch/arm/boot/dts/da850.dtsi | 8 
 2 files changed, 16 insertions(+)

diff --git a/arch/arm/boot/dts/da850-lcdk.dts b/arch/arm/boot/dts/da850-lcdk.dts
index 7b8ab21..aaf533e 100644
--- a/arch/arm/boot/dts/da850-lcdk.dts
+++ b/arch/arm/boot/dts/da850-lcdk.dts
@@ -86,6 +86,14 @@
};
 };

+&usb_phy {
+   status = "okay";
+};
+
+&ohci {
+   status = "okay";
+};
+
 &serial2 {
pinctrl-names = "default";
pinctrl-0 = <&serial2_rxtx_pins>;
diff --git a/arch/arm/boot/dts/da850.dtsi b/arch/arm/boot/dts/da850.dtsi
index 2534aab..50e86da 100644
--- a/arch/arm/boot/dts/da850.dtsi
+++ b/arch/arm/boot/dts/da850.dtsi
@@ -405,6 +405,14 @@
>;
status = "disabled";
};
+   ohci: usb@0225000 {


In commit 2957e36e76c836b167e5e0c1edb578d8a9bd7af6 in the linux-davinci 
tree, the alias for the musb device is usb0. So, I think we should use 
usb1 here instead of ohci - or change the usb0 alias to musb.


https://git.kernel.org/cgit/linux/kernel/git/nsekhar/linux-davinci.git/commit/?h=v4.10/dt&id=2957e36e76c836b167e5e0c1edb578d8a9bd7af6


+   compatible = "ti,da830-ohci";
+   reg = <0x225000 0x1000>;
+   interrupts = <59>;
+   phys = <&usb_phy 1>;
+   phy-names = "usb-phy";
+   status = "disabled";
+   };
gpio: gpio@226000 {
compatible = "ti,dm6441-gpio";
gpio-controller;





Re: [PATCHv0 1/1] fbdev: add Intel FPGA FRAME BUFFER driver

2016-11-20 Thread Ong, Hean Loong
On Fri, 2016-11-18 at 12:56 -0600, Rob Herring wrote:
> On Fri, Nov 18, 2016 at 8:15 AM, One Thousand Gnomes
>  wrote:
> > 
> > > 
> > > AIUI, we're not taking new FB drivers. This should be a DRM
> > > driver
> > > instead.
> > Yes - clone one of the dumb DRM drivers, or if you've got any
> > little bits
> > of acceleration (even rolling the display) then it's possibly worth
> > accelerating for text mode.
> > 
> > > 
> > > > 
> > > > +- max-width: The width of the framebuffer in pixels.
> > > > +- max-height: The height of the framebuffer in pixels.
> > > > +- bits-per-color: only "8" is currently supported
> > > These are not h/w properties.
> > How are the max ones not hardware properties ?
> Because the way they are used is setting the mode, not some check of
> the max when the mode is set. If this is synthesized for only one
> size, then that would be different, but we have bindings for modes.
> 
> Rob

Currently the idea is to synthesize the display for just 1920 x 1080.
Therefore we came to the conclusion that it should be part of the HW
properties.

HeanLoong


[PATCH v2.1 7/9] arm64: dts: rockchip: add pd_edp node for rk3399

2016-11-20 Thread Caesar Wang
From: zhangqing 

This patch adds the below pd_edp information for rk3399.
1. add pd_edp node for RK3399 SoC
2. add the pd support for edp

Signed-off-by: Elaine Zhang 
Signed-off-by: Caesar Wang 
Reviewed-by: Doug Anderson 
---

Changes in v2.1: (Hope the v3 will fix the display stuff with upstream)
- change the commit message as Doug comments on
  https://patchwork.kernel.org/patch/9419241

Changes in v2: None

 arch/arm64/boot/dts/rockchip/rk3399.dtsi | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/arm64/boot/dts/rockchip/rk3399.dtsi 
b/arch/arm64/boot/dts/rockchip/rk3399.dtsi
index db72033..7354c63 100644
--- a/arch/arm64/boot/dts/rockchip/rk3399.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3399.dtsi
@@ -838,6 +838,10 @@
};
 
/* These power domains are grouped by VD_LOGIC */
+   pd_edp@RK3399_PD_EDP {
+   reg = ;
+   clocks = <&cru PCLK_EDP_CTRL>;
+   };
pd_emmc@RK3399_PD_EMMC {
reg = ;
clocks = <&cru ACLK_EMMC>;
@@ -1388,6 +1392,7 @@
status = "disabled";
pinctrl-names = "default";
pinctrl-0 = <&edp_hpd>;
+   power-domains = <&power RK3399_PD_EDP>;
 
ports {
#address-cells = <1>;
-- 
2.7.4



Re: [HMM v13 08/18] mm/hmm: heterogeneous memory management (HMM for short)

2016-11-20 Thread Balbir Singh


On 19/11/16 05:18, Jérôme Glisse wrote:
> HMM provides 3 separate functionality :
> - Mirroring: synchronize CPU page table and device page table
> - Device memory: allocating struct page for device memory
> - Migration: migrating regular memory to device memory
> 
> This patch introduces some common helpers and definitions to all of
> those 3 functionality.
> 
> Signed-off-by: Jérôme Glisse 
> Signed-off-by: Jatin Kumar 
> Signed-off-by: John Hubbard 
> Signed-off-by: Mark Hairgrove 
> Signed-off-by: Sherry Cheung 
> Signed-off-by: Subhash Gutti 
> ---
>  MAINTAINERS  |   7 +++
>  include/linux/hmm.h  | 139 
> +++
>  include/linux/mm_types.h |   5 ++
>  kernel/fork.c|   2 +
>  mm/Kconfig   |  11 
>  mm/Makefile  |   1 +
>  mm/hmm.c |  86 +
>  7 files changed, 251 insertions(+)
>  create mode 100644 include/linux/hmm.h
>  create mode 100644 mm/hmm.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index f593300..41cd63d 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -5582,6 +5582,13 @@ S: Supported
>  F:   drivers/scsi/hisi_sas/
>  F:   Documentation/devicetree/bindings/scsi/hisilicon-sas.txt
>  
> +HMM - Heterogeneous Memory Management
> +M:   Jérôme Glisse 
> +L:   linux...@kvack.org
> +S:   Maintained
> +F:   mm/hmm*
> +F:   include/linux/hmm*
> +
>  HOST AP DRIVER
>  M:   Jouni Malinen 
>  L:   hos...@shmoo.com (subscribers-only)
> diff --git a/include/linux/hmm.h b/include/linux/hmm.h
> new file mode 100644
> index 000..54dd529
> --- /dev/null
> +++ b/include/linux/hmm.h
> @@ -0,0 +1,139 @@
> +/*
> + * Copyright 2013 Red Hat Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * Authors: Jérôme Glisse 
> + */
> +/*
> + * HMM provides 3 separate functionality :
> + *   - Mirroring: synchronize CPU page table and device page table
> + *   - Device memory: allocating struct page for device memory
> + *   - Migration: migrating regular memory to device memory
> + *
> + * Each can be used independently of the others.
> + *
> + *
> + * Mirroring:
> + *
> + * HMM provides helpers to mirror a process address space on a device. For
> + * this it provides several helpers to order device page table updates with
> + * respect to CPU page table updates. The requirement is that for any given
> + * virtual address the CPU and device page tables cannot point to different
> + * physical pages. It uses the mmu_notifier API and introduces a virtual
> + * address range lock which blocks CPU page table updates for a range while
> + * the device page table is being updated.
> + * Usage pattern is:
> + *
> + *  hmm_vma_range_lock(vma, start, end);
> + *  // snap shot CPU page table
> + *  // update device page table from snapshot
> + *  hmm_vma_range_unlock(vma, start, end);
> + *
> + * Any CPU page table update that conflicts with a range lock will wait until
> + * the range is unlocked. This guarantees proper serialization of CPU and
> + * device page table updates.
> + *
> + *
> + * Device memory:
> + *
> + * HMM provides helpers to leverage device memory, either addressable like
> + * regular memory by the CPU or not addressable at all. In both cases the
> + * device memory is associated with dedicated struct pages (which are
> + * allocated like for hotplug memory). Device memory management is under the
> + * responsibility of the device driver. HMM only allocates and initializes
> + * the struct pages associated with the device memory.
> + *
> + * Allocating struct pages for device memory allows using device memory
> + * almost like any regular memory. Unlike regular memory it cannot be added
> + * to the lru, nor can any memory allocation use device memory directly. Device
> + * memory will only end up to be use in a process if device driver migrate 
> some
   in use 
> + * of the process memory from regular memory to device memory.
> + *

A process can never directly allocate device memory?

> + *
> + * Migration:
> + *
> + * The existing memory migration mechanism (mm/migrate.c) does not allow
> + * using anything other than the CPU to copy from source to destination
> + * memory. More
> + * over existing code is not tailor to drive migration from process virtual
tailored
> + * address rather than from list of pages. Finaly the migration flow does not
 

Re: [PATCH V8 3/3] stm: Mark the functions of writing buffer with notrace

2016-11-20 Thread Chunyan Zhang
On 18 November 2016 at 22:45, Alexander Shishkin
 wrote:
> Chunyan Zhang  writes:
>
>> If CONFIG_STM_SOURCE_FTRACE is selected, function trace data can be written
>> to a sink via STM. All functions related to writing data packets to
>> STM should be marked 'notrace' to avoid being traced by ftrace; otherwise
>> the program would stall in an endless loop.
>>
>> Signed-off-by: Chunyan Zhang 
>> Acked-by: Steven Rostedt 
>> ---
>>  drivers/hwtracing/coresight/coresight-stm.c |  2 +-
>>  drivers/hwtracing/intel_th/sth.c| 11 +++
>>  drivers/hwtracing/stm/core.c|  7 ---
>>  drivers/hwtracing/stm/dummy_stm.c   |  2 +-
>>  include/linux/stm.h |  4 ++--
>>  5 files changed, 15 insertions(+), 11 deletions(-)
>
> Quick nit: can you please split this one in 4: one for Coresight, one
> for Intel TH, one for stm/dummy and one for stm/core?

Sure, will do.

>
> I'd like to keep the bisectability. Otherwise, this is fine by me:
>
> Acked-by: Alexander Shishkin 

Thanks,
Chunyan

>
> Regards,
> --
> Alex


Re: vmalloced stacks and scatterwalk_map_and_copy()

2016-11-20 Thread Andy Lutomirski
[Adding Thorsten to help keep this from getting lost]

On Thu, Nov 3, 2016 at 1:30 PM, Andy Lutomirski  wrote:
> On Thu, Nov 3, 2016 at 11:16 AM, Eric Biggers  wrote:
>> Hello,
>>
>> I hit the BUG_ON() in arch/x86/mm/physaddr.c:26 while testing some crypto 
>> code
>> in an x86_64 kernel with CONFIG_DEBUG_VIRTUAL=y and CONFIG_VMAP_STACK=y:
>>
>> /* carry flag will be set if starting x was >= PAGE_OFFSET */
>> VIRTUAL_BUG_ON((x > y) || !phys_addr_valid(x));
>>
>> The problem is the following code in scatterwalk_map_and_copy() in
>> crypto/scatterwalk.c, which tries to determine if the buffer passed in 
>> aliases
>> the physical memory of the first segment of the scatterlist:
>>
>> if (sg_page(sg) == virt_to_page(buf) &&
>> sg->offset == offset_in_page(buf))
>> return;
>
> ...
>
>>
>> Currently I think the best solution would be to require that callers to
>> scatterwalk_map_and_copy() do not alias their source and destination.  Then 
>> the
>> alias check could be removed.  This check has only been there since v4.2 
>> (commit
>> 74412fd5d71b6), so I'd hope not many callers rely on the behavior.  I'm not 
>> sure
>> exactly which ones do, though.
>>
>> Thoughts on this?
>
> The relevant commit is:
>
> commit 74412fd5d71b6eda0beb302aa467da000f0d530c
> Author: Herbert Xu 
> Date:   Thu May 21 15:11:12 2015 +0800
>
> crypto: scatterwalk - Check for same address in map_and_copy
>
> This patch adds a check for in scatterwalk_map_and_copy to avoid
> copying from the same address to the same address.  This is going
> to be used for IV copying in AEAD IV generators.
>
> There is no provision for partial overlaps.
>
> This patch also uses the new scatterwalk_ffwd instead of doing
> it by hand in scatterwalk_map_and_copy.
>
> Signed-off-by: Herbert Xu 
>
> Herbert, can you clarify this?  The check seems rather bizarre --
> you're doing an incomplete check for aliasing and skipping the whole
> copy if the beginning aliases.  In any event the stack *can't*
> reasonably alias the scatterlist because a scatterlist can't safely
> point to the stack.  Is there any code that actually relies on the
> aliasing-detecting behavior?
>
> Also, Herbert, it seems like the considerable majority of the crypto
> code is acting on kernel virtual memory addresses and does software
> processing.  Would it perhaps make sense to add a kvec-based or
> iov_iter-based interface to the crypto code?  I bet it would be quite
> a bit faster and it would make crypto on stack buffers work directly.


Ping, everyone!

It's getting quite close to 4.9 release time.  Is there an actual bug
here?  Because, if so, we need to fix it.  My preference is to just
delete the weird aliasing check, but it would be really nice to know
if that check is needed for some reason.

--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC


Re: [HMM v13 07/18] mm/ZONE_DEVICE/x86: add support for un-addressable device memory

2016-11-20 Thread Balbir Singh


On 19/11/16 05:18, Jérôme Glisse wrote:
> It does not need much: just skip populating the kernel linear mapping
> for the range of un-addressable device memory (it is picked so that there
> is no physical memory resource overlapping it). All the logic is in
> shared mm code.
> 
> Only x86-64 is supported, as this feature doesn't make much sense with the
> constrained virtual address space of 32-bit architectures.
> 

Is there a reason this would not work on powerpc64 for example?
Could you document the limitations -- testing/APIs/missing features?

Balbir Singh.

