Re: [tip:x86/debug] printk: Make the printk*once() variants return a value

2016-07-09 Thread Borislav Petkov
On Sat, Jul 09, 2016 at 10:56:55AM -0700, Joe Perches wrote:
> defconfigs both with and without CONFIG_PRINTK build
> properly with the proposed change to this specific patch.

Did you try latest tip/master?

> Borislav, your delightful personality always impresses.
> Never change.

What goes around comes around.

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--


[char-misc 4.7] mei: me: disable driver on SPT SPS firmware

2016-07-09 Thread Tomas Winkler
Sunrise Point PCH with SPS Firmware doesn't expose working
MEI interface, we need to quirk it out.

Cc:  #4.4+
Signed-off-by: Tomas Winkler 
---
 drivers/misc/mei/pci-me.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/misc/mei/pci-me.c b/drivers/misc/mei/pci-me.c
index 64e64da6da44..e64464c5c160 100644
--- a/drivers/misc/mei/pci-me.c
+++ b/drivers/misc/mei/pci-me.c
@@ -85,7 +85,7 @@ static const struct pci_device_id mei_me_pci_tbl[] = {
 
{MEI_PCI_DEVICE(MEI_DEV_ID_SPT, mei_me_pch8_cfg)},
{MEI_PCI_DEVICE(MEI_DEV_ID_SPT_2, mei_me_pch8_cfg)},
-   {MEI_PCI_DEVICE(MEI_DEV_ID_SPT_H, mei_me_pch8_cfg)},
+   {MEI_PCI_DEVICE(MEI_DEV_ID_SPT_H, mei_me_pch8_sps_cfg)},
{MEI_PCI_DEVICE(MEI_DEV_ID_SPT_H_2, mei_me_pch8_cfg)},
 
{MEI_PCI_DEVICE(MEI_DEV_ID_BXT_M, mei_me_pch8_cfg)},
-- 
2.5.5



Re: Missing include file in include/uapi/linux/errqueue.h?

2016-07-09 Thread Brooks Moses
On Sat, Jul 9, 2016 at 10:36 AM, Brooks Moses  wrote:
> I've been attempting to qualify the Linux 4.5.2 user-space headers for
> a toolchain release, and ran into what looks like a missing include
> file in include/uapi/linux/errqueue.h.  In particular,
> https://github.com/torvalds/linux/commit/f24b9be5957b38bb420b838115040dc2031b7d0c
> adds the following to this file:
>
> +struct scm_timestamping {
> + struct timespec ts[3];
> +};
>
> However, struct timespec is defined in time.h, which isn't included
> either in 4.5.2 or in current head.  Is this simply a missing #include
> line, or am I misunderstanding something?

As a followup: Unfortunately the obvious fix -- adding "#include
<linux/time.h>" -- causes other problems, since linux/time.h is
incompatible with the glibc time.h such that including both of them
into the same compilation unit causes errors about redefined types.
And we, at least, have some programs that want to include
linux/errqueue.h and (glibc's) time.h.  The fix of adding "#include
<time.h>" to linux/errqueue.h seems to work for us, but I'm not sure
that won't cause problems in the other direction for other people.
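
A small user-space check of the include-order dependency might look like
the following.  This is only a sketch, assuming installed kernel UAPI
headers in which struct scm_timestamping is declared in terms of
struct timespec; the helper name scm_timestamp_slots() is made up for
illustration and is not part of any header.

```c
/* glibc's <time.h> is included first so that struct timespec is a
 * complete type by the time <linux/errqueue.h> declares
 * struct scm_timestamping. */
#include <time.h>
#include <stddef.h>
#include <linux/errqueue.h>

/* Hypothetical helper: number of timestamps carried in one
 * struct scm_timestamping. */
size_t scm_timestamp_slots(void)
{
	struct scm_timestamping tss;

	return sizeof(tss.ts) / sizeof(tss.ts[0]);
}
```

On headers without a time include in linux/errqueue.h, swapping the
first and third #include lines should be enough to reproduce the
incomplete-type error.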

- Brooks


Re: [CRIU] Introspecting userns relationships to other namespaces?

2016-07-09 Thread Andrew Vagin
On Fri, Jul 08, 2016 at 10:13:08PM -0500, Eric W. Biederman wrote:
> "W. Trevor King"  writes:
> 
> > On Thu, Jul 07, 2016 at 08:01:52AM -0700, James Bottomley wrote:
> >> In theory, we could get nsfs to show this information as an option
> >> (just add a show_options entry to the superblock ops), but the
> >> problem is that although each namespace has a parent user_ns,
> >> there's no way to get it without digging in the namespace specific
> >> structure.  Probably we should restructure to move it into
> >> ns_common, then we could display it (and enforce all namespaces
> >> having owning user_ns) but it would be a reasonably large (but
> >> mechanical) change.
> >
> > It sounds like everyone is either positive or neutral on this
> > groundwork, even if we haven't decided if/how to expose the
> > information to userspace.  I'm happy to work up a patch while the rest
> > of the discussion continues.  I'm also happy to let someone else work
> > up the patch, if anyone else is chomping at the bit ;).
> 
> I am dubious on moving all of the user namespace members into ns_common.
> 
> I would be happy to be proved wrong but I suspect in the cases where we
> actually use that user namespace the code will become uglier.  Making
> the ordinary uses uglier to make a rare corner case nicer is the wrong
> trade off.
> 
> But feel free to try; it is certainly worth doing if it doesn't make the
> code that uses the user namespaces uglier.

If it's interesting for someone, I have this patch in my tree
https://github.com/avagin/linux-task-diag/commit/63b32df68ae8d3a3842bae42bbcae3468db76d85

I can't say that it makes something uglier.

> 
> Eric
> 
> ___
> CRIU mailing list
> c...@openvz.org
> https://lists.openvz.org/mailman/listinfo/criu


Re: [PATCH v2 03/17] libnvdimm: introduce devm_nvdimm_memremap(), convert nfit_spa_map() users

2016-07-09 Thread kbuild test robot
Hi,

[auto build test ERROR on linux-nvdimm/libnvdimm-for-next]
[also build test ERROR on v4.7-rc6 next-20160708]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Dan-Williams/replace-pcommit-with-ADR-or-directed-flushing/20160710-113558
base:   https://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git 
libnvdimm-for-next
config: um-allmodconfig (attached as .config)
compiler: gcc-6 (Debian 6.1.1-1) 6.1.1 20160430
reproduce:
# save the attached .config to linux build tree
make ARCH=um 

All error/warnings (new ones prefixed by >>):

   drivers/nvdimm/core.c: In function 'alloc_nvdimm_map':
>> drivers/nvdimm/core.c:108:23: error: implicit declaration of function 'ioremap' [-Werror=implicit-function-declaration]
  nvdimm_map->iomem = ioremap(offset, size);
  ^~~
>> drivers/nvdimm/core.c:108:21: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
  nvdimm_map->iomem = ioremap(offset, size);
^
   drivers/nvdimm/core.c: In function 'nvdimm_map_release':
>> drivers/nvdimm/core.c:139:3: error: implicit declaration of function 'iounmap' [-Werror=implicit-function-declaration]
  iounmap(nvdimm_map->iomem);
  ^~~
   cc1: some warnings being treated as errors

vim +/ioremap +108 drivers/nvdimm/core.c

   102  if (!request_mem_region(offset, size, dev_name(&nvdimm_bus->dev)))
   103  goto err_request_region;
   104  
   105  if (flags)
   106  nvdimm_map->mem = memremap(offset, size, flags);
   107  else
 > 108  nvdimm_map->iomem = ioremap(offset, size);
   109  
   110  if (!nvdimm_map->mem)
   111  goto err_map;
   112  
   113  dev_WARN_ONCE(dev, !is_nvdimm_bus_locked(dev), "%s: bus unlocked!",
   114  __func__);
   115  list_add(&nvdimm_map->list, &nvdimm_bus->mapping_list);
   116  
   117  return nvdimm_map;
   118  
   119   err_map:
   120  release_mem_region(offset, size);
   121   err_request_region:
   122  kfree(nvdimm_map);
   123  return NULL;
   124  }
   125  
   126  static void nvdimm_map_release(struct kref *kref)
   127  {
   128  struct nvdimm_bus *nvdimm_bus;
   129  struct nvdimm_map *nvdimm_map;
   130  
   131  nvdimm_map = container_of(kref, struct nvdimm_map, kref);
   132  nvdimm_bus = nvdimm_map->nvdimm_bus;
   133  
   134  dev_dbg(&nvdimm_bus->dev, "%s: %pa\n", __func__, &nvdimm_map->offset);
   135  list_del(&nvdimm_map->list);
   136  if (nvdimm_map->flags)
   137  memunmap(nvdimm_map->mem);
   138  else
 > 139  iounmap(nvdimm_map->iomem);
   140  release_mem_region(nvdimm_map->offset, nvdimm_map->size);
   141  kfree(nvdimm_map);
   142  }

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation




Re: [RFC PATCH 0/3] doc-rst: customize HTML (RTD) theme

2016-07-09 Thread Jonathan Corbet
On Tue, 5 Jul 2016 14:55:09 -0300
Mauro Carvalho Chehab  wrote:

> I hope you don't mind. I'm merging those three patches on my tree
> (for now, they're on an experimental tree that I can easily rebase, if
> needed). If OK for you, my plan is to merge it on a separate branch,
> together with the other patches for Documentation/linux_tv.

[Slowly trying to catch back up with the real world; service will
continue to be intermittent for a bit yet.]

So as far as I can tell, I never got part 1/3, not sure what happened
there.

In general, my only concern is that we haven't really begun the process
of debating the proper bikeshed^Wtheme for the kernel docs.  Which is
just fine.  At some point, we may want to think about it a bit more, but,
for now, there is certainly no harm in making what we have work better.
Please feel free to include these with your stuff with my acked-by.

Thanks,

jon


Re: Odd performance results

2016-07-09 Thread Peter Zijlstra


On 10 July 2016 06:26:39 CEST, "Paul E. McKenney"  
wrote:
>Hello!
>
>So I ran a quick benchmark which showed stair-step results.  I
>immediately
>thought "Ah, this is due to CPU 0 and 1, 2 and 3, 4 and 5, and 6 and 7
>being threads in a core."  Then I thought "Wait, this is an x86!"
>Then I dumped out cpu*/topology/thread_siblings_list, getting the
>following:
>
>   cpu0/topology/thread_siblings_list: 0-1
>   cpu1/topology/thread_siblings_list: 0-1
>   cpu2/topology/thread_siblings_list: 2-3
>   cpu3/topology/thread_siblings_list: 2-3
>   cpu4/topology/thread_siblings_list: 4-5
>   cpu5/topology/thread_siblings_list: 4-5
>   cpu6/topology/thread_siblings_list: 6-7
>   cpu7/topology/thread_siblings_list: 6-7


I'm guessing this is an AMD bulldozer like machine?
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH v2 08/17] libnvdimm: introduce nvdimm_flush() and nvdimm_has_flush()

2016-07-09 Thread Dan Williams
On Sat, Jul 9, 2016 at 9:47 PM, kbuild test robot  wrote:
> Hi,
>
> [auto build test ERROR on linux-nvdimm/libnvdimm-for-next]
> [also build test ERROR on next-20160708]
> [cannot apply to v4.7-rc6]
> [if your patch is applied to the wrong git tree, please drop us a note to 
> help improve the system]
>
> url:
> https://github.com/0day-ci/linux/commits/Dan-Williams/replace-pcommit-with-ADR-or-directed-flushing/20160710-113558
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git 
> libnvdimm-for-next
> config: i386-randconfig-r0-201628 (attached as .config)
> compiler: gcc-6 (Debian 6.1.1-1) 6.1.1 20160430
> reproduce:
> # save the attached .config to linux build tree
> make ARCH=i386

Hi kbuild team,

Can we add an "i386 allmodconfig" build to the standard "BUILD
SUCCESS" notification runs?  I had two positive build results on a
private branch prior to posting this series, but the i386 runs did not
build the nvdimm sub-system.

In any event this report is valid, so thank you for that!


>
> All errors (new ones prefixed by >>):
>
>drivers/nvdimm/region_devs.c: In function 'nvdimm_flush':
>>> drivers/nvdimm/region_devs.c:887:4: error: implicit declaration of function 'writeq' [-Werror=implicit-function-declaration]
>writeq(1, ndrd->flush_wpq[i][0]);
>^~
>cc1: some warnings being treated as errors
>
> vim +/writeq +887 drivers/nvdimm/region_devs.c
>
>881   * writes to avoid the cache via arch_memcpy_to_pmem().  The
>882   * final wmb() ensures ordering for the NVDIMM flush write.
>883   */
>884  wmb();
>885  for (i = 0; i < nd_region->ndr_mappings; i++)
>886  if (ndrd->flush_wpq[i][0])
>  > 887  writeq(1, ndrd->flush_wpq[i][0]);
>888  wmb();
>889  }
>890  EXPORT_SYMBOL_GPL(nvdimm_flush);
>
> ---
> 0-DAY kernel test infrastructureOpen Source Technology Center
> https://lists.01.org/pipermail/kbuild-all   Intel Corporation


Re: [PATCH v3 0/7] lib: string: add functions to case-convert strings

2016-07-09 Thread Chris Metcalf

On 7/8/2016 6:43 PM, Markus Mayer wrote:

This series introduces a family of generic string case conversion
functions. This kind of functionality is needed in several places in
the kernel. Right now, everybody seems to be implementing their own
copy of this functionality.

Based on the discussion of the previous version of this series[1] and
the use cases found in the kernel, it does look like having several
flavours of case conversion functions is beneficial. The use cases fall
into three categories:
 - copying a string and converting the case while specifying a
   maximum length to mimic strlcpy()
 - copying a string and converting the case without specifying a
   length to mimic strcpy()
 - converting the case of a string in-place (i.e. modifying the
   string that was passed in)

Consequently, I am proposing these new functions:
 void strlcpytoupper(char *dst, const char *src, size_t len);
 void strlcpytolower(char *dst, const char *src, size_t len);
 void strcpytoupper(char *dst, const char *src);
 void strcpytolower(char *dst, const char *src);
 void strtoupper(char *s);
 void strtolower(char *s);


You may want to read the article here:

https://lwn.net/Articles/659214/

and follow up some of the discussion threads on LKML about the best
semantics to advertise for the strlcpy/strscpy variants.  It might be
helpful to return some kind of overflow/truncation error from your
copy functions so people can error-check the result.
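
As a rough illustration of what that could look like, here is a
user-space sketch of a strscpy()-style lower-casing copy.  The name
strscpy_tolower() and the choice of -E2BIG as the truncation result
are assumptions for the example, not something the series proposes.

```c
#include <ctype.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>	/* ssize_t */

/*
 * Hypothetical strscpy()-style variant of the proposed strlcpytolower():
 * copies at most len - 1 characters, always NUL-terminates when len > 0,
 * and returns the number of characters copied, or -E2BIG if the source
 * did not fit.
 */
ssize_t strscpy_tolower(char *dst, const char *src, size_t len)
{
	size_t i;

	if (len == 0)
		return -E2BIG;

	for (i = 0; i < len - 1 && src[i] != '\0'; i++)
		dst[i] = (char)tolower((unsigned char)src[i]);
	dst[i] = '\0';

	/* If the source is not exhausted, the copy was truncated. */
	return src[i] == '\0' ? (ssize_t)i : -E2BIG;
}
```

A caller can then write "if (strscpy_tolower(buf, name, sizeof(buf)) < 0)"
and handle truncation explicitly, which a void return type cannot
express.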

--
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com



Re: [PATCH v2 08/17] libnvdimm: introduce nvdimm_flush() and nvdimm_has_flush()

2016-07-09 Thread kbuild test robot
Hi,

[auto build test ERROR on linux-nvdimm/libnvdimm-for-next]
[also build test ERROR on next-20160708]
[cannot apply to v4.7-rc6]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Dan-Williams/replace-pcommit-with-ADR-or-directed-flushing/20160710-113558
base:   https://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git 
libnvdimm-for-next
config: i386-randconfig-r0-201628 (attached as .config)
compiler: gcc-6 (Debian 6.1.1-1) 6.1.1 20160430
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   drivers/nvdimm/region_devs.c: In function 'nvdimm_flush':
>> drivers/nvdimm/region_devs.c:887:4: error: implicit declaration of function 'writeq' [-Werror=implicit-function-declaration]
   writeq(1, ndrd->flush_wpq[i][0]);
   ^~
   cc1: some warnings being treated as errors

vim +/writeq +887 drivers/nvdimm/region_devs.c

   881   * writes to avoid the cache via arch_memcpy_to_pmem().  The
   882   * final wmb() ensures ordering for the NVDIMM flush write.
   883   */
   884  wmb();
   885  for (i = 0; i < nd_region->ndr_mappings; i++)
   886  if (ndrd->flush_wpq[i][0])
 > 887  writeq(1, ndrd->flush_wpq[i][0]);
   888  wmb();
   889  }
   890  EXPORT_SYMBOL_GPL(nvdimm_flush);

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation




Odd performance results

2016-07-09 Thread Paul E. McKenney
Hello!

So I ran a quick benchmark which showed stair-step results.  I immediately
thought "Ah, this is due to CPU 0 and 1, 2 and 3, 4 and 5, and 6 and 7
being threads in a core."  Then I thought "Wait, this is an x86!"
Then I dumped out cpu*/topology/thread_siblings_list, getting the following:

cpu0/topology/thread_siblings_list: 0-1
cpu1/topology/thread_siblings_list: 0-1
cpu2/topology/thread_siblings_list: 2-3
cpu3/topology/thread_siblings_list: 2-3
cpu4/topology/thread_siblings_list: 4-5
cpu5/topology/thread_siblings_list: 4-5
cpu6/topology/thread_siblings_list: 6-7
cpu7/topology/thread_siblings_list: 6-7

Is this now expected behavior or a fluke of my particular laptop?  Here is
hoping for expected behavior, as it makes NUMA locality the default for
a great many workloads.
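
For anyone who wants to check the same thing programmatically, the
list format shown above ("0-1", and comma-separated forms such as
"0,4" on other machines) can be parsed as in the sketch below.  The
function names are made up for illustration, and only CPUs 0-63 are
handled.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a sysfs CPU list such as "0-1" or "0,4" (the format used by
 * thread_siblings_list) into a bitmask.  Only CPUs 0..63 are handled;
 * returns 0 on malformed input.
 */
uint64_t parse_cpu_list(const char *s)
{
	uint64_t mask = 0;

	while (*s != '\0' && *s != '\n') {
		char *end;
		unsigned long lo = strtoul(s, &end, 10);
		unsigned long hi = lo;

		if (end == s)
			return 0;
		if (*end == '-') {
			s = end + 1;
			hi = strtoul(s, &end, 10);
			if (end == s)
				return 0;
		}
		if (hi < lo || hi > 63)
			return 0;
		for (unsigned long cpu = lo; cpu <= hi; cpu++)
			mask |= UINT64_C(1) << cpu;

		/* Step over a separating comma; anything else must end the list. */
		s = (*end == ',') ? end + 1 : end;
		if (s == end && *s != '\0' && *s != '\n')
			return 0;
	}
	return mask;
}

/* Two CPUs are hardware-thread siblings if both appear in the same list. */
int are_siblings(const char *list, unsigned int a, unsigned int b)
{
	uint64_t mask = parse_cpu_list(list);

	return a < 64 && b < 64 && ((mask >> a) & 1) && ((mask >> b) & 1);
}
```

Feeding it the contents of each cpuN/topology/thread_siblings_list
file gives one mask per core, so adjacent CPU numbers that share a
mask are threads of the same core.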

Enlightenment?

Thanx, Paul



Re: [PATCH v3 0/7] lib: string: add functions to case-convert strings

2016-07-09 Thread Markus Mayer
On 9 July 2016 at 20:13, Chris Metcalf  wrote:
> On 7/8/2016 6:43 PM, Markus Mayer wrote:
>>
>> This series introduces a family of generic string case conversion
>> functions. This kind of functionality is needed in several places in
>> the kernel. Right now, everybody seems to be implementing their own
>> copy of this functionality.
>>
>> Based on the discussion of the previous version of this series[1] and
>> the use cases found in the kernel, it does look like having several
>> flavours of case conversion functions is beneficial. The use cases fall
>> into three categories:
>>  - copying a string and converting the case while specifying a
>>maximum length to mimic strlcpy()
>>  - copying a string and converting the case without specifying a
>>length to mimic strcpy()
>>  - converting the case of a string in-place (i.e. modifying the
>>string that was passed in)
>>
>> Consequently, I am proposing these new functions:
>>  void strlcpytoupper(char *dst, const char *src, size_t len);
>>  void strlcpytolower(char *dst, const char *src, size_t len);
>>  void strcpytoupper(char *dst, const char *src);
>>  void strcpytolower(char *dst, const char *src);
>>  void strtoupper(char *s);
>>  void strtolower(char *s);
>
>
> You may want to read the article here:
>
> https://lwn.net/Articles/659214/

I'll read that. Thanks.

> and follow up some of the discussion threads on LKML about the best
> semantics to advertise for the strlcpy/strscpy variants.  It might be
> helpful to return some kind of overflow/truncation error from your
> copy functions so people can error-check the result.

I am inclined to agree. However, everybody has been telling me that
these functions should be void. Originally they weren't.

Regards,
-Markus


Re: [PATCH 12/13] nvme: switch to use pci_alloc_irq_vectors

2016-07-09 Thread Christoph Hellwig
On Thu, Jul 07, 2016 at 09:30:19PM +0200, Alexander Gordeev wrote:
> On Mon, Jul 04, 2016 at 05:39:33PM +0900, Christoph Hellwig wrote:
> > @@ -1575,6 +1546,7 @@ static int nvme_dev_add(struct nvme_dev *dev)
> > dev->tagset.cmd_size = nvme_cmd_size(dev);
> > dev->tagset.flags = BLK_MQ_F_SHOULD_MERGE;
> > dev->tagset.driver_data = dev;
> > +   dev->tagset.affinity_mask = to_pci_dev(dev->dev)->irq_affinity;
> >  
> > if (blk_mq_alloc_tag_set(&dev->tagset))
> > return 0;
> 
> Are there any post-init uses of blk_mq_tag_set::affinity_mask other than
> calling to blk_mq_alloc_tag_set()? If no, blk_mq_tag_set::affinity_mask
> is redundant, since the mask could be passed as a parameter.

We'll have to look at it in the block code when reinitializing or rebuilding
the queue topology.  This isn't currently done, but we'll need it rather
soon.


Re: [PATCH 08/13] pci: spread interrupt vectors in pci_alloc_irq_vectors

2016-07-09 Thread Christoph Hellwig
On Thu, Jul 07, 2016 at 01:05:01PM +0200, Alexander Gordeev wrote:
> irq_create_affinity_mask() bails out with no affinity in case of single
> vector, but alloc_descs() (see below (*)) assigns the whole affinity
> mask. It should be consistent instead.

I don't understand the comment.  If we only have one vector (of any
kind) there is no need to create an affinity mask, we'll leave the
interrupt to the existing irq balancing code.

> Actually, I just realized pci_alloc_irq_vectors() should probably call
> irq_create_affinity_mask() and handle it in a consistent way for all four
> cases: MSI-X, multi-MSI, MSI and legacy.

That's what the earlier versions did, but you correctly pointed out
that we should call irq_create_affinity_mask only after we have reduced
the number of vectors to the number that the bridges can route, i.e.
that we have to move it into the pci_enable_msi(x)_range main loop.

> Optionally, the three latter could be dropped for now so you could proceed
> with NVMe.

NVMe cares for all these cases at least in theory.

> (*) In the future IRQ vs CPU mapping 1:N is possible/desirable so I suppose
> this piece of code is worth a comment or better - a separate function. In fact,
> this algorithm already exists in alloc_descs(), which makes even more sense
> to factor it out:
> 
>   for (i = 0; i < cnt; i++) {
>   if (affinity) {
>   cpu = cpumask_next(cpu, affinity);
>   if (cpu >= nr_cpu_ids)
>   cpu = cpumask_first(affinity);
>   node = cpu_to_node(cpu);
> 
>   /*
>* For single allocations we use the caller provided
>* mask otherwise we use the mask of the target cpu
>*/
>   mask = cnt == 1 ? affinity : cpumask_of(cpu);
>   }
> 
>   [...]

While these two pieces of code look very similar there is an important
difference in why and how the mask is calculated.  In alloc_descs(),
cnt == 1 is the MSI-X case, where the passed-in affinity is that of
the MSI-X descriptor, which covers a single vector.  In the MSI case,
where we have multiple vectors per descriptor, a different affinity is
assigned to each vector based on a single passed-in mask.
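
To make the two cases concrete, here is a user-space model of the
loop quoted above.  It is only a sketch: a plain 64-bit integer stands
in for struct cpumask, and mask_next() and spread_masks() are
illustrative names, not kernel functions.

```c
#include <stdint.h>

#define NR_CPUS_MODEL 64

/*
 * Model of cpumask_next(): next set bit strictly after cpu, or
 * NR_CPUS_MODEL if there is none.  cpu == -1 starts the search at bit 0.
 */
int mask_next(int cpu, uint64_t mask)
{
	for (cpu++; cpu < NR_CPUS_MODEL; cpu++)
		if ((mask >> cpu) & 1)
			return cpu;
	return NR_CPUS_MODEL;
}

/*
 * Model of the quoted alloc_descs() loop: out[i] receives the mask
 * chosen for descriptor i.  An empty affinity yields empty masks.
 */
void spread_masks(uint64_t affinity, int cnt, uint64_t *out)
{
	int cpu = -1;

	for (int i = 0; i < cnt; i++) {
		if (affinity == 0) {
			out[i] = 0;
			continue;
		}
		cpu = mask_next(cpu, affinity);
		if (cpu >= NR_CPUS_MODEL)
			cpu = mask_next(-1, affinity);	/* wrap, like cpumask_first() */

		/* A single allocation keeps the caller's mask, otherwise one CPU each. */
		out[i] = (cnt == 1) ? affinity : UINT64_C(1) << cpu;
	}
}
```

With an affinity covering CPUs 0 and 2, cnt = 1 hands back the whole
mask, while cnt = 3 walks CPU 0, CPU 2 and then wraps to CPU 0 again.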


Re: [PATCH 07/13] pci: Provide sensible irq vector alloc/free routines

2016-07-09 Thread Christoph Hellwig
On Wed, Jul 06, 2016 at 10:05:45AM +0200, Alexander Gordeev wrote:
> > + pci_enable_msi, pci_enable_msi_range, pci_enable_msi_exact, 
> > pci_disable_msi,
> > + pci_msi_vec_count, pci_enable_msix_range, pci_enable_msix_exact,
> > + pci_disable_msix, pci_msix_vec_count
> 
> Description of these functions can be removed when all drivers migrated
> to the new API. Also implementation descriptions + examples would still
> be needed AFAICT.

I disagree - if we deprecate functions the only thing that should
be mentioned is a "don't use these". 

> This function's code almost matches the existing pci_enable_msix_range()
> so pci_enable_msix_range() should be reworked instead IMHO.

That's what earlier versions of the code did.  However due to the
fact that we want to avoid over-allocating the msix_vectors array
(minor) and get the vectors count of the affinity mask right (major,
as pointed out by you last time) I had to move the allocations inside
the helpers that loop around the actual enablement.  I didn't want
to change the function to a different version of the algorithm just
before removing them relatively soon.  But given that strong preference
for changing these simple functions instead of duplicating them I've
changed that patch to do that now.

> We do not need to keep msix_entry array, since it only needed for
> pci_irq_vector() function. But the same info could be retrieved from
> msi_desc::irq.

Indeed.  Avoiding this allocation makes these interfaces quite a bit
simpler.  It requires a few prep patches, but I think it's definitely
worth it, so the next version will avoid the need for the msix_entry array.

> > +   /* use legacy irq if allowed */
> > +   if (min_vecs == 1)
> > +   return 1;
> > +   return -ENOSPC;
> 
> The original error code (in vecs) would be overridden with -ENOSPC here.

Ok, fixed.

> > +   WARN_ON_ONCE(!dev->msi_enabled && nr > 0);
> > +   return dev->irq + nr;
> 
> I think this function should check irq number existence and return the
> vector number or -EINVAL;

Ok, fixed.

> > +   unsigned int flags)
> > +{
> > +   if (min_vecs > 1)
> > +   return -ENOSPC;
> 
> In case CONFIG_PCI_MSI is unset min_vecs > 1 is -EINVAL;

Ok, fixed.


Re: [PATCH] intel_pstate: Fix MSR_CONFIG_TDP_x addressing in core_get_max_pstate

2016-07-09 Thread Srinivas Pandruvada
On Sat, 2016-07-09 at 02:45 +0200, Rafael J. Wysocki wrote:
> On Friday, July 08, 2016 12:39:07 PM Srinivas Pandruvada wrote:
> > On Fri, 2016-07-08 at 20:42 +0200, Jan Kiszka wrote:
> > > If MSR_CONFIG_TDP_CONTROL is locked, we currently try to address
> > > some
> > > MSR 0x8648 or so. Mask out the relevant level bits 0 and 1.
> > > 
> > > Found while running over the Jailhouse hypervisor which became
> > > upset
> > > about this strange MSR index.
> > > 
> > > Signed-off-by: Jan Kiszka 
> > Acked-by: Srinivas Pandruvada 
> 
> OK
> 
> Should this go into stable?
Better to mark for stable tree 4.4+
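
For reference, the addressing problem described in the patch can be
modelled in user space as below.  This is a sketch of the arithmetic
only: the MSR numbers are the ones from the kernel's msr-index.h, but
the function names and the sample control value (lock flag assumed to
be bit 31, level 1 selected) are assumptions made for the example.

```c
#include <stdint.h>

/* MSR indices as defined in the kernel's msr-index.h. */
#define MSR_CONFIG_TDP_NOMINAL	0x648
#define MSR_CONFIG_TDP_CONTROL	0x64B

/* Unmasked: any high bit set in the control value leaks into the index. */
uint32_t tdp_msr_unmasked(uint64_t tdp_ctrl)
{
	return (uint32_t)(MSR_CONFIG_TDP_NOMINAL + tdp_ctrl);
}

/* Masked as the patch describes: only level bits 0 and 1 select the MSR. */
uint32_t tdp_msr_masked(uint64_t tdp_ctrl)
{
	return (uint32_t)(MSR_CONFIG_TDP_NOMINAL + (tdp_ctrl & 0x3));
}
```

With a control value of 0x80000001 the unmasked form computes
0x80000649, which is not a TDP MSR at all, while the masked form
yields 0x649.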

Thanks,
Srinivas



Re: [PATCH 11/13] blk-mq: allow the driver to pass in an affinity mask

2016-07-09 Thread Christoph Hellwig
On Mon, Jul 04, 2016 at 11:35:28AM +0200, Alexander Gordeev wrote:
> > mq_map is initialized to zero already, so we don't really need the
> > assignment for queue 0.  The reason why this check exists is because
> > we start with queue = -1 and we never want to assign -1 to mq_map.
> 
> Would this read better then?
> 
>   int queue = 0;
> 
>   ...
> 
>   /* If cpus are offline, map them to first hctx */
>   for_each_online_cpu(cpu) {
>   set->mq_map[cpu] = queue;
>   if (cpumask_test_cpu(cpu, affinity_mask))
>   queue++;

It would read better, but I don't think it's actually correct.
We'd still assign the 'old' queue to the cpu that is set in the affinity
mask.
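
The difference is easiest to see side by side in a user-space model.
This is only a sketch: the first function is a reconstruction of the
loop from its description in this thread (start at -1, skip the store
while the queue is still -1), not a quote of the patch, and a 64-bit
integer stands in for the cpumask with every CPU below nr_cpus
treated as online.

```c
#include <stdint.h>

/*
 * Loop as described in the thread: bump the queue when a CPU is in the
 * affinity mask, then store, skipping the store while queue is still -1.
 * map[] is assumed to be zero-initialised by the caller.
 */
void map_queues_checked(uint64_t affinity_mask, int nr_cpus, int *map)
{
	int queue = -1;

	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		if ((affinity_mask >> cpu) & 1)
			queue++;
		if (queue >= 0)
			map[cpu] = queue;
	}
}

/* Suggested alternative: store first, then advance the queue. */
void map_queues_post_increment(uint64_t affinity_mask, int nr_cpus, int *map)
{
	int queue = 0;

	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		map[cpu] = queue;
		if ((affinity_mask >> cpu) & 1)
			queue++;
	}
}
```

With four CPUs and CPUs 0 and 2 in the mask, the first loop produces
0, 0, 1, 1, while the second produces 0, 1, 1, 2: the CPU that is set
in the mask keeps the old queue, and the mapping ends up referring to
a third queue although only two CPUs are in the mask.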


[PATCH v2 02/17] nfit: don't override return value of nfit_mem_init

2016-07-09 Thread Dan Williams
We were needlessly converting nfit_mem_init() errors to -ENOMEM.

Signed-off-by: Dan Williams 
---
 drivers/acpi/nfit.c |5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c
index d79837b9d07e..f8c1a850effc 100644
--- a/drivers/acpi/nfit.c
+++ b/drivers/acpi/nfit.c
@@ -2422,10 +2422,9 @@ int acpi_nfit_init(struct acpi_nfit_desc *acpi_desc, 
acpi_size sz)
if (rc)
goto out_unlock;
 
-   if (nfit_mem_init(acpi_desc) != 0) {
-   rc = -ENOMEM;
+   rc = nfit_mem_init(acpi_desc);
+   if (rc)
goto out_unlock;
-   }
 
acpi_nfit_init_dsms(acpi_desc);
 



[PATCH v2 01/17] nfit: always associate flush hints

2016-07-09 Thread Dan Williams
Before enabling use of flush hints for pmem regions, we need to make
sure they are always associated.  Move the initialization of nfit_flush
out of the block-window specific init path to the general init path.

Cc: Ross Zwisler 
Signed-off-by: Dan Williams 
---
 drivers/acpi/nfit.c |   17 -
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c
index 3e54157f02cc..d79837b9d07e 100644
--- a/drivers/acpi/nfit.c
+++ b/drivers/acpi/nfit.c
@@ -614,7 +614,6 @@ static void nfit_mem_init_bdw(struct acpi_nfit_desc 
*acpi_desc,
 {
u16 dcr = __to_nfit_memdev(nfit_mem)->region_index;
struct nfit_memdev *nfit_memdev;
-   struct nfit_flush *nfit_flush;
struct nfit_bdw *nfit_bdw;
struct nfit_idt *nfit_idt;
u16 idt_idx, range_index;
@@ -647,14 +646,6 @@ static void nfit_mem_init_bdw(struct acpi_nfit_desc 
*acpi_desc,
nfit_mem->idt_bdw = nfit_idt->idt;
break;
}
-
-   list_for_each_entry(nfit_flush, &acpi_desc->flushes, list) {
-   if (nfit_flush->flush->device_handle !=
-   nfit_memdev->memdev->device_handle)
-   continue;
-   nfit_mem->nfit_flush = nfit_flush;
-   break;
-   }
break;
}
 }
@@ -675,6 +666,7 @@ static int nfit_mem_dcr_init(struct acpi_nfit_desc 
*acpi_desc,
}
 
list_for_each_entry(nfit_memdev, &acpi_desc->memdevs, list) {
+   struct nfit_flush *nfit_flush;
struct nfit_dcr *nfit_dcr;
u32 device_handle;
u16 dcr;
@@ -721,6 +713,13 @@ static int nfit_mem_dcr_init(struct acpi_nfit_desc 
*acpi_desc,
break;
}
 
+   list_for_each_entry(nfit_flush, &acpi_desc->flushes, list) {
+   if (nfit_flush->flush->device_handle != device_handle)
+   continue;
+   nfit_mem->nfit_flush = nfit_flush;
+   break;
+   }
+
if (dcr && !nfit_mem->dcr) {
dev_err(acpi_desc->dev, "SPA %d missing DCR %d\n",
spa->range_index, dcr);



[PATCH v2 03/17] libnvdimm: introduce devm_nvdimm_memremap(), convert nfit_spa_map() users

2016-07-09 Thread Dan Williams
In preparation for generically mapping flush hint addresses for both the
BLK and PMEM use case, provide a generic / reference counted mapping
api.  Given the fact that a dimm may belong to multiple regions (PMEM
and BLK), the flush hint addresses need to be held valid as long as any
region associated with the dimm is active.  This is similar to the
existing BLK-region case where multiple BLK-regions may share an
aperture mapping.  Up-level this shared / reference-counted mapping
capability from the nfit driver to a core nvdimm capability.

This eliminates the need for the nd_blk_region.disable() callback.  Note
that the removal of nfit_spa_map() and related infrastructure is
deferred to a later patch.

Signed-off-by: Dan Williams 
---
 drivers/acpi/nfit.c   |   14 +++--
 drivers/nvdimm/core.c |  122 +
 drivers/nvdimm/nd-core.h  |1 
 include/linux/libnvdimm.h |9 +++
 4 files changed, 139 insertions(+), 7 deletions(-)

diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c
index f8c1a850effc..b047dbe13bed 100644
--- a/drivers/acpi/nfit.c
+++ b/drivers/acpi/nfit.c
@@ -1616,7 +1616,8 @@ static void __iomem *__nfit_spa_map(struct acpi_nfit_desc 
*acpi_desc,
  * when all region devices referencing the same mapping are disabled /
  * unbound.
  */
-static void __iomem *nfit_spa_map(struct acpi_nfit_desc *acpi_desc,
+static __maybe_unused void __iomem *nfit_spa_map(
+   struct acpi_nfit_desc *acpi_desc,
struct acpi_nfit_system_address *spa, enum spa_map_type type)
 {
void __iomem *iomem;
@@ -1669,7 +1670,6 @@ static int acpi_nfit_blk_region_enable(struct nvdimm_bus 
*nvdimm_bus,
struct device *dev)
 {
struct nvdimm_bus_descriptor *nd_desc = to_nd_desc(nvdimm_bus);
-   struct acpi_nfit_desc *acpi_desc = to_acpi_desc(nd_desc);
struct nd_blk_region *ndbr = to_nd_blk_region(dev);
struct nfit_flush *nfit_flush;
struct nfit_blk_mmio *mmio;
@@ -1697,8 +1697,8 @@ static int acpi_nfit_blk_region_enable(struct nvdimm_bus 
*nvdimm_bus,
/* map block aperture memory */
nfit_blk->bdw_offset = nfit_mem->bdw->offset;
mmio = &nfit_blk->mmio[BDW];
-   mmio->addr.base = nfit_spa_map(acpi_desc, nfit_mem->spa_bdw,
-   SPA_MAP_APERTURE);
+   mmio->addr.base = devm_nvdimm_memremap(dev, nfit_mem->spa_bdw->address,
+nfit_mem->spa_bdw->length, ARCH_MEMREMAP_PMEM);
if (!mmio->addr.base) {
dev_dbg(dev, "%s: %s failed to map bdw\n", __func__,
nvdimm_name(nvdimm));
@@ -1720,8 +1720,8 @@ static int acpi_nfit_blk_region_enable(struct nvdimm_bus 
*nvdimm_bus,
nfit_blk->cmd_offset = nfit_mem->dcr->command_offset;
nfit_blk->stat_offset = nfit_mem->dcr->status_offset;
mmio = &nfit_blk->mmio[DCR];
-   mmio->addr.base = nfit_spa_map(acpi_desc, nfit_mem->spa_dcr,
-   SPA_MAP_CONTROL);
+   mmio->addr.base = devm_nvdimm_ioremap(dev, nfit_mem->spa_dcr->address,
+   nfit_mem->spa_dcr->length);
if (!mmio->addr.base) {
dev_dbg(dev, "%s: %s failed to map dcr\n", __func__,
nvdimm_name(nvdimm));
@@ -1748,7 +1748,7 @@ static int acpi_nfit_blk_region_enable(struct nvdimm_bus 
*nvdimm_bus,
 
nfit_flush = nfit_mem->nfit_flush;
if (nfit_flush && nfit_flush->flush->hint_count != 0) {
-   nfit_blk->nvdimm_flush = devm_ioremap_nocache(dev,
+   nfit_blk->nvdimm_flush = devm_nvdimm_ioremap(dev,
nfit_flush->flush->hint_address[0], 8);
if (!nfit_blk->nvdimm_flush)
return -ENOMEM;
diff --git a/drivers/nvdimm/core.c b/drivers/nvdimm/core.c
index 32e4fe2f6274..f9686297ff79 100644
--- a/drivers/nvdimm/core.c
+++ b/drivers/nvdimm/core.c
@@ -57,6 +57,127 @@ bool is_nvdimm_bus_locked(struct device *dev)
 }
 EXPORT_SYMBOL(is_nvdimm_bus_locked);
 
+struct nvdimm_map {
+   struct nvdimm_bus *nvdimm_bus;
+   struct list_head list;
+   resource_size_t offset;
+   unsigned long flags;
+   size_t size;
+   union {
+   void *mem;
+   void __iomem *iomem;
+   };
+   struct kref kref;
+};
+
+static struct nvdimm_map *find_nvdimm_map(struct device *dev,
+   resource_size_t offset)
+{
+   struct nvdimm_bus *nvdimm_bus = walk_to_nvdimm_bus(dev);
+   struct nvdimm_map *nvdimm_map;
+
+   list_for_each_entry(nvdimm_map, &nvdimm_bus->mapping_list, list)
+   if (nvdimm_map->offset == offset)
+   return nvdimm_map;
+   return NULL;
+}
+
+static struct nvdimm_map *alloc_nvdimm_map(struct device *dev,
+   resource_size_t offset, size_t size, unsigned long flags)
+{
+   struct nvdimm_bus *nvdimm_bus = walk_to_nvdimm_bus(dev);
+   struct nvdimm_map *nvdimm_ma

[PATCH v2 06/17] tools/testing/nvdimm: simulate multiple flush hints per-dimm

2016-07-09 Thread Dan Williams
Sample nfit data to test the kernel's handling of the multiple
flush-hint case.

Signed-off-by: Dan Williams 
---
 tools/testing/nvdimm/test/nfit.c |   55 +++---
 1 file changed, 33 insertions(+), 22 deletions(-)

diff --git a/tools/testing/nvdimm/test/nfit.c b/tools/testing/nvdimm/test/nfit.c
index 4fdd139f6e6c..ff09a28890ed 100644
--- a/tools/testing/nvdimm/test/nfit.c
+++ b/tools/testing/nvdimm/test/nfit.c
@@ -98,6 +98,7 @@
 enum {
NUM_PM  = 3,
NUM_DCR = 5,
+   NUM_HINTS = 8,
NUM_BDW = NUM_DCR,
NUM_SPA = NUM_PM + NUM_DCR + NUM_BDW,
NUM_MEM = NUM_DCR + NUM_BDW + 2 /* spa0 iset */ + 4 /* spa1 iset */,
@@ -569,7 +570,8 @@ static int nfit_test0_alloc(struct nfit_test *t)
+ offsetof(struct acpi_nfit_control_region,
window_size) * NUM_DCR
+ sizeof(struct acpi_nfit_data_region) * NUM_BDW
-   + sizeof(struct acpi_nfit_flush_address) * NUM_DCR;
+   + (sizeof(struct acpi_nfit_flush_address)
+   + sizeof(u64) * NUM_HINTS) * NUM_DCR;
int i;
 
t->nfit_buf = test_alloc(t, nfit_size, &t->nfit_dma);
@@ -599,7 +601,8 @@ static int nfit_test0_alloc(struct nfit_test *t)
return -ENOMEM;
sprintf(t->label[i], "label%d", i);
 
-   t->flush[i] = test_alloc(t, 8, &t->flush_dma[i]);
+   t->flush[i] = test_alloc(t, sizeof(u64) * NUM_HINTS,
+   &t->flush_dma[i]);
if (!t->flush[i])
return -ENOMEM;
}
@@ -633,6 +636,8 @@ static int nfit_test1_alloc(struct nfit_test *t)
 
 static void nfit_test0_setup(struct nfit_test *t)
 {
+   const int flush_hint_size = sizeof(struct acpi_nfit_flush_address)
+   + (sizeof(u64) * NUM_HINTS);
struct acpi_nfit_desc *acpi_desc;
struct acpi_nfit_memory_map *memdev;
void *nfit_buf = t->nfit_buf;
@@ -640,7 +645,7 @@ static void nfit_test0_setup(struct nfit_test *t)
struct acpi_nfit_control_region *dcr;
struct acpi_nfit_data_region *bdw;
struct acpi_nfit_flush_address *flush;
-   unsigned int offset;
+   unsigned int offset, i;
 
/*
 * spa0 (interleave first half of dimm0 and dimm1, note storage
@@ -1126,37 +1131,41 @@ static void nfit_test0_setup(struct nfit_test *t)
/* flush0 (dimm0) */
flush = nfit_buf + offset;
flush->header.type = ACPI_NFIT_TYPE_FLUSH_ADDRESS;
-   flush->header.length = sizeof(struct acpi_nfit_flush_address);
+   flush->header.length = flush_hint_size;
flush->device_handle = handle[0];
-   flush->hint_count = 1;
-   flush->hint_address[0] = t->flush_dma[0];
+   flush->hint_count = NUM_HINTS;
+   for (i = 0; i < NUM_HINTS; i++)
+   flush->hint_address[i] = t->flush_dma[0] + i * sizeof(u64);
 
/* flush1 (dimm1) */
-   flush = nfit_buf + offset + sizeof(struct acpi_nfit_flush_address) * 1;
+   flush = nfit_buf + offset + flush_hint_size * 1;
flush->header.type = ACPI_NFIT_TYPE_FLUSH_ADDRESS;
-   flush->header.length = sizeof(struct acpi_nfit_flush_address);
+   flush->header.length = flush_hint_size;
flush->device_handle = handle[1];
-   flush->hint_count = 1;
-   flush->hint_address[0] = t->flush_dma[1];
+   flush->hint_count = NUM_HINTS;
+   for (i = 0; i < NUM_HINTS; i++)
+   flush->hint_address[i] = t->flush_dma[1] + i * sizeof(u64);
 
/* flush2 (dimm2) */
-   flush = nfit_buf + offset + sizeof(struct acpi_nfit_flush_address) * 2;
+   flush = nfit_buf + offset + flush_hint_size  * 2;
flush->header.type = ACPI_NFIT_TYPE_FLUSH_ADDRESS;
-   flush->header.length = sizeof(struct acpi_nfit_flush_address);
+   flush->header.length = flush_hint_size;
flush->device_handle = handle[2];
-   flush->hint_count = 1;
-   flush->hint_address[0] = t->flush_dma[2];
+   flush->hint_count = NUM_HINTS;
+   for (i = 0; i < NUM_HINTS; i++)
+   flush->hint_address[i] = t->flush_dma[2] + i * sizeof(u64);
 
/* flush3 (dimm3) */
-   flush = nfit_buf + offset + sizeof(struct acpi_nfit_flush_address) * 3;
+   flush = nfit_buf + offset + flush_hint_size * 3;
flush->header.type = ACPI_NFIT_TYPE_FLUSH_ADDRESS;
-   flush->header.length = sizeof(struct acpi_nfit_flush_address);
+   flush->header.length = flush_hint_size;
flush->device_handle = handle[3];
-   flush->hint_count = 1;
-   flush->hint_address[0] = t->flush_dma[3];
+   flush->hint_count = NUM_HINTS;
+   for (i = 0; i < NUM_HINTS; i++)
+   flush->hint_address[i] = t->flush_dma[3] + i * sizeof(u64);
 
if (t->setup_hotplug) {
-   offset = offset + sizeof(struct acpi_nfit_flush_address) * 4;
+  

[PATCH v2 04/17] libnvdimm, nfit: remove nfit_spa_map() infrastructure

2016-07-09 Thread Dan Williams
Now that all shared mappings are handled by devm_nvdimm_memremap() we no
longer need nfit_spa_map() nor do we need to trigger a callback to the
bus provider at region disable time.

Signed-off-by: Dan Williams 
---
 drivers/acpi/nfit.c  |  146 --
 drivers/acpi/nfit.h  |   21 --
 drivers/nvdimm/nd.h  |1 
 drivers/nvdimm/region_devs.c |3 -
 include/linux/libnvdimm.h|1 
 5 files changed, 172 deletions(-)

diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c
index b047dbe13bed..b76c95981547 100644
--- a/drivers/acpi/nfit.c
+++ b/drivers/acpi/nfit.c
@@ -1509,126 +1509,6 @@ static int acpi_nfit_blk_region_do_io(struct nd_blk_region *ndbr,
return rc;
 }
 
-static void nfit_spa_mapping_release(struct kref *kref)
-{
-   struct nfit_spa_mapping *spa_map = to_spa_map(kref);
-   struct acpi_nfit_system_address *spa = spa_map->spa;
-   struct acpi_nfit_desc *acpi_desc = spa_map->acpi_desc;
-
-   WARN_ON(!mutex_is_locked(&acpi_desc->spa_map_mutex));
-   dev_dbg(acpi_desc->dev, "%s: SPA%d\n", __func__, spa->range_index);
-   if (spa_map->type == SPA_MAP_APERTURE)
-   memunmap((void __force *)spa_map->addr.aperture);
-   else
-   iounmap(spa_map->addr.base);
-   release_mem_region(spa->address, spa->length);
-   list_del(&spa_map->list);
-   kfree(spa_map);
-}
-
-static struct nfit_spa_mapping *find_spa_mapping(
-   struct acpi_nfit_desc *acpi_desc,
-   struct acpi_nfit_system_address *spa)
-{
-   struct nfit_spa_mapping *spa_map;
-
-   WARN_ON(!mutex_is_locked(&acpi_desc->spa_map_mutex));
-   list_for_each_entry(spa_map, &acpi_desc->spa_maps, list)
-   if (spa_map->spa == spa)
-   return spa_map;
-
-   return NULL;
-}
-
-static void nfit_spa_unmap(struct acpi_nfit_desc *acpi_desc,
-   struct acpi_nfit_system_address *spa)
-{
-   struct nfit_spa_mapping *spa_map;
-
-   mutex_lock(&acpi_desc->spa_map_mutex);
-   spa_map = find_spa_mapping(acpi_desc, spa);
-
-   if (spa_map)
-   kref_put(&spa_map->kref, nfit_spa_mapping_release);
-   mutex_unlock(&acpi_desc->spa_map_mutex);
-}
-
-static void __iomem *__nfit_spa_map(struct acpi_nfit_desc *acpi_desc,
-   struct acpi_nfit_system_address *spa, enum spa_map_type type)
-{
-   resource_size_t start = spa->address;
-   resource_size_t n = spa->length;
-   struct nfit_spa_mapping *spa_map;
-   struct resource *res;
-
-   WARN_ON(!mutex_is_locked(&acpi_desc->spa_map_mutex));
-
-   spa_map = find_spa_mapping(acpi_desc, spa);
-   if (spa_map) {
-   kref_get(&spa_map->kref);
-   return spa_map->addr.base;
-   }
-
-   spa_map = kzalloc(sizeof(*spa_map), GFP_KERNEL);
-   if (!spa_map)
-   return NULL;
-
-   INIT_LIST_HEAD(&spa_map->list);
-   spa_map->spa = spa;
-   kref_init(&spa_map->kref);
-   spa_map->acpi_desc = acpi_desc;
-
-   res = request_mem_region(start, n, dev_name(acpi_desc->dev));
-   if (!res)
-   goto err_mem;
-
-   spa_map->type = type;
-   if (type == SPA_MAP_APERTURE)
-   spa_map->addr.aperture = (void __pmem *)memremap(start, n,
-   ARCH_MEMREMAP_PMEM);
-   else
-   spa_map->addr.base = ioremap_nocache(start, n);
-
-
-   if (!spa_map->addr.base)
-   goto err_map;
-
-   list_add_tail(&spa_map->list, &acpi_desc->spa_maps);
-   return spa_map->addr.base;
-
- err_map:
-   release_mem_region(start, n);
- err_mem:
-   kfree(spa_map);
-   return NULL;
-}
-
-/**
- * nfit_spa_map - interleave-aware managed-mappings of acpi_nfit_system_address ranges
- * @nvdimm_bus: NFIT-bus that provided the spa table entry
- * @nfit_spa: spa table to map
- * @type: aperture or control region
- *
- * In the case where block-data-window apertures and
- * dimm-control-regions are interleaved they will end up sharing a
- * single request_mem_region() + ioremap() for the address range.  In
- * the style of devm nfit_spa_map() mappings are automatically dropped
- * when all region devices referencing the same mapping are disabled /
- * unbound.
- */
-static __maybe_unused void __iomem *nfit_spa_map(
-   struct acpi_nfit_desc *acpi_desc,
-   struct acpi_nfit_system_address *spa, enum spa_map_type type)
-{
-   void __iomem *iomem;
-
-   mutex_lock(&acpi_desc->spa_map_mutex);
-   iomem = __nfit_spa_map(acpi_desc, spa, type);
-   mutex_unlock(&acpi_desc->spa_map_mutex);
-
-   return iomem;
-}
-
 static int nfit_blk_init_interleave(struct nfit_blk_mmio *mmio,
struct acpi_nfit_interleave *idt, u16 interleave_ways)
 {
@@ -1773,29 +1653,6 @@ static int acpi_nfit_blk_region_enable(struct nvdimm_bus *nvdimm_bus,
return 0;
 }
 

[PATCH v2 09/17] libnvdimm: cycle flush hints

2016-07-09 Thread Dan Williams
When the NFIT provides multiple flush hint addresses per-dimm it is
expressing that the platform is capable of processing multiple flush
requests in parallel.  Each flush request carries some fixed cost, so
let that cost be shared in parallel across multiple cpus.

Since there may not be enough flush hint addresses for each cpu to have
one, keep a per-cpu index of the last used hint, hash it with current
pid, and assume that access pattern and scheduler randomness will keep
the flush-hint usage somewhat staggered across cpus.
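
A userspace sketch of the selection scheme described above may help: a running index (per-cpu in the kernel, a plain variable here) is advanced by a hash of pid plus the previous index, then masked down to a power-of-two number of hints. The helper names and the hash constant are assumptions made for illustration; they are not claimed to match the kernel's ilog2() or hash_32() implementations.

```c
#include <stdint.h>

/* floor(log2(n)) for n > 0; a stand-in for the kernel's ilog2(). */
static unsigned int ilog2_u32(uint32_t n)
{
	unsigned int log = 0;

	while (n >>= 1)
		log++;
	return log;
}

/* Mask keeping an index below the largest power of two <= num_flush. */
static unsigned int flush_mask(uint32_t num_flush)
{
	return (1u << ilog2_u32(num_flush)) - 1;
}

/* Multiplicative hash; the constant is assumed, not taken from the kernel. */
static uint32_t hash32(uint32_t val, unsigned int bits)
{
	return (val * 0x9E3779B1u) >> (32 - bits);
}

/* Advance the running index and return the hint slot to write. */
static unsigned int next_hint(uint32_t *cpu_idx, uint32_t pid,
		unsigned int mask)
{
	*cpu_idx += hash32(pid + *cpu_idx, 8);
	return *cpu_idx & mask;
}
```

Note that with a non-power-of-two hint count (say 6) the mask rounds down, so only the first 4 hints would be used in this model.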

Cc: Ross Zwisler 
Signed-off-by: Dan Williams 
---
 drivers/nvdimm/nd.h  |1 +
 drivers/nvdimm/region_devs.c |   17 ++---
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index 5912bd6b4234..40476399d227 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -52,6 +52,7 @@ struct nvdimm_drvdata {
 struct nd_region_data {
int ns_count;
int ns_active;
+   unsigned int flush_mask;
void __iomem *flush_wpq[0][0];
 };
 
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index 46b6e2f7d5f0..4bcb3b6744aa 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -22,6 +23,7 @@
 #include "nd.h"
 
 static DEFINE_IDA(region_ida);
+static DEFINE_PER_CPU(int, flush_idx);
 
 static int nvdimm_map_flush(struct device *dev, struct nvdimm *nvdimm, int dimm,
struct nd_region_data *ndrd)
@@ -61,7 +63,7 @@ static int nvdimm_map_flush(struct device *dev, struct nvdimm *nvdimm, int dimm,
 
 int nd_region_activate(struct nd_region *nd_region)
 {
-   int i;
+   int i, num_flush = 0;
struct nd_region_data *ndrd;
struct device *dev = &nd_region->dev;
size_t flush_data_size = sizeof(void *);
@@ -73,6 +75,7 @@ int nd_region_activate(struct nd_region *nd_region)
 
/* at least one null hint slot per-dimm for the "no-hint" case */
flush_data_size += sizeof(void *);
+   num_flush = min_not_zero(num_flush, nvdimm->num_flush);
if (!nvdimm->num_flush)
continue;
flush_data_size += nvdimm->num_flush * sizeof(void *);
@@ -84,6 +87,7 @@ int nd_region_activate(struct nd_region *nd_region)
return -ENOMEM;
dev_set_drvdata(dev, ndrd);
 
+   ndrd->flush_mask = (1 << ilog2(num_flush)) - 1;
for (i = 0; i < nd_region->ndr_mappings; i++) {
struct nd_mapping *nd_mapping = &nd_region->mapping[i];
struct nvdimm *nvdimm = nd_mapping->nvdimm;
@@ -872,7 +876,14 @@ EXPORT_SYMBOL_GPL(nvdimm_volatile_region_create);
 void nvdimm_flush(struct nd_region *nd_region)
 {
struct nd_region_data *ndrd = dev_get_drvdata(&nd_region->dev);
-   int i;
+   int i, idx;
+
+   /*
+* Try to encourage some diversity in flush hint addresses
+* across cpus assuming a limited number of flush hints.
+*/
+   idx = this_cpu_read(flush_idx);
+   idx = this_cpu_add_return(flush_idx, hash_32(current->pid + idx, 8));
 
/*
 * The first wmb() is needed to 'sfence' all previous writes
@@ -884,7 +895,7 @@ void nvdimm_flush(struct nd_region *nd_region)
wmb();
for (i = 0; i < nd_region->ndr_mappings; i++)
if (ndrd->flush_wpq[i][0])
-   writeq(1, ndrd->flush_wpq[i][0]);
+   writeq(1, ndrd->flush_wpq[i][idx & ndrd->flush_mask]);
wmb();
 }
 EXPORT_SYMBOL_GPL(nvdimm_flush);



[PATCH v2 07/17] libnvdimm: keep region data alive over namespace removal

2016-07-09 Thread Dan Williams
nd_region device driver data will be used in the namespace i/o path.
Re-order nd_region_remove() to ensure this data stays live across
namespace device removal.

Signed-off-by: Dan Williams 
---
 drivers/nvdimm/region.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/nvdimm/region.c b/drivers/nvdimm/region.c
index 333175dac8d5..8f241772ec0b 100644
--- a/drivers/nvdimm/region.c
+++ b/drivers/nvdimm/region.c
@@ -82,6 +82,8 @@ static int nd_region_remove(struct device *dev)
 {
struct nd_region *nd_region = to_nd_region(dev);
 
+   device_for_each_child(dev, NULL, child_unregister);
+
/* flush attribute readers and disable */
nvdimm_bus_lock(dev);
nd_region->ns_seed = NULL;
@@ -91,7 +93,6 @@ static int nd_region_remove(struct device *dev)
dev_set_drvdata(dev, NULL);
nvdimm_bus_unlock(dev);
 
-   device_for_each_child(dev, NULL, child_unregister);
return 0;
 }
 



[PATCH v2 11/17] libnvdimm, pmem: flush posted-write queues on shutdown

2016-07-09 Thread Dan Williams
Commit writes to media on system shutdown or pmem driver unload.

Signed-off-by: Dan Williams 
---
 drivers/nvdimm/bus.c  |   16 
 drivers/nvdimm/pmem.c |8 
 include/linux/nd.h|1 +
 3 files changed, 25 insertions(+)

diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index e4882e63bece..1cc7880320fe 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -136,6 +136,21 @@ static int nvdimm_bus_remove(struct device *dev)
return rc;
 }
 
+static void nvdimm_bus_shutdown(struct device *dev)
+{
+   struct nvdimm_bus *nvdimm_bus = walk_to_nvdimm_bus(dev);
+   struct nd_device_driver *nd_drv = NULL;
+
+   if (dev->driver)
+   nd_drv = to_nd_device_driver(dev->driver);
+
+   if (nd_drv && nd_drv->shutdown) {
+   nd_drv->shutdown(dev);
+   dev_dbg(&nvdimm_bus->dev, "%s.shutdown(%s)\n",
+   dev->driver->name, dev_name(dev));
+   }
+}
+
 void nd_device_notify(struct device *dev, enum nvdimm_event event)
 {
device_lock(dev);
@@ -214,6 +229,7 @@ static struct bus_type nvdimm_bus_type = {
.match = nvdimm_bus_match,
.probe = nvdimm_bus_probe,
.remove = nvdimm_bus_remove,
+   .shutdown = nvdimm_bus_shutdown,
 };
 
 static ASYNC_DOMAIN_EXCLUSIVE(nd_async_domain);
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 18cd95719da0..3f3fdb9586b9 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -351,9 +351,16 @@ static int nd_pmem_remove(struct device *dev)
 {
if (is_nd_btt(dev))
nvdimm_namespace_detach_btt(to_nd_btt(dev));
+   nvdimm_flush(to_nd_region(dev->parent));
+
return 0;
 }
 
+static void nd_pmem_shutdown(struct device *dev)
+{
+   nvdimm_flush(to_nd_region(dev->parent));
+}
+
 static void nd_pmem_notify(struct device *dev, enum nvdimm_event event)
 {
struct pmem_device *pmem = dev_get_drvdata(dev);
@@ -393,6 +400,7 @@ static struct nd_device_driver nd_pmem_driver = {
.probe = nd_pmem_probe,
.remove = nd_pmem_remove,
.notify = nd_pmem_notify,
+   .shutdown = nd_pmem_shutdown,
.drv = {
.name = "nd_pmem",
},
diff --git a/include/linux/nd.h b/include/linux/nd.h
index aee2761d294c..1ecd64643512 100644
--- a/include/linux/nd.h
+++ b/include/linux/nd.h
@@ -26,6 +26,7 @@ struct nd_device_driver {
unsigned long type;
int (*probe)(struct device *dev);
int (*remove)(struct device *dev);
+   void (*shutdown)(struct device *dev);
void (*notify)(struct device *dev, enum nvdimm_event event);
 };
 



[PATCH v2 10/17] libnvdimm, pmem: use REQ_FUA, REQ_FLUSH for nvdimm_flush()

2016-07-09 Thread Dan Williams
Given that nvdimm_flush() has higher overhead than wmb_pmem() (pointer
chasing through nd_region), and that we otherwise assume a platform has
ADR capability when flush hints are not present, move nvdimm_flush() to
REQ_FLUSH context.
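
The ordering this change implies can be modelled in a few lines: a flush precedes the data copy when the request asks for a cache flush, and follows it when the request asks for forced unit access. The SIM_REQ_* flag values and the plan_request() helper are purely illustrative and do not correspond to the kernel's REQ_* bit definitions.

```c
#include <stddef.h>

/* Illustrative flag values; the kernel's REQ_* bits differ. */
enum { SIM_REQ_FLUSH = 1 << 0, SIM_REQ_FUA = 1 << 1 };

enum step { STEP_FLUSH, STEP_DATA };

/* Record the order of operations for one request; returns the step count. */
static size_t plan_request(unsigned int flags, enum step out[3])
{
	size_t n = 0;

	if (flags & SIM_REQ_FLUSH)
		out[n++] = STEP_FLUSH;	/* drain earlier writes first */
	out[n++] = STEP_DATA;		/* copy the request's segments */
	if (flags & SIM_REQ_FUA)
		out[n++] = STEP_FLUSH;	/* make this write durable */
	return n;
}
```

A plain write with neither flag set performs no flush at all in this model, which is where the overhead saving comes from.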

Cc: Ross Zwisler 
Signed-off-by: Dan Williams 
---
 drivers/nvdimm/pmem.c |   24 +---
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index e303655f243e..18cd95719da0 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -113,6 +113,11 @@ static int pmem_do_bvec(struct pmem_device *pmem, struct page *page,
return rc;
 }
 
+/* account for REQ_FLUSH rename, replace with REQ_PREFLUSH after v4.8-rc1 */
+#ifndef REQ_FLUSH
+#define REQ_FLUSH REQ_PREFLUSH
+#endif
+
 static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
 {
int rc = 0;
@@ -121,6 +126,10 @@ static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
struct bio_vec bvec;
struct bvec_iter iter;
struct pmem_device *pmem = q->queuedata;
+   struct nd_region *nd_region = to_region(pmem);
+
+   if (bio->bi_rw & REQ_FLUSH)
+   nvdimm_flush(nd_region);
 
do_acct = nd_iostat_start(bio, &start);
bio_for_each_segment(bvec, bio, iter) {
@@ -135,8 +144,8 @@ static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
if (do_acct)
nd_iostat_end(bio, start);
 
-   if (bio_data_dir(bio))
-   nvdimm_flush(to_region(pmem));
+   if (bio->bi_rw & REQ_FUA)
+   nvdimm_flush(nd_region);
 
bio_endio(bio);
return BLK_QC_T_NONE;
@@ -149,8 +158,6 @@ static int pmem_rw_page(struct block_device *bdev, sector_t sector,
int rc;
 
rc = pmem_do_bvec(pmem, page, PAGE_SIZE, 0, rw, sector);
-   if (rw & WRITE)
-   nvdimm_flush(to_region(pmem));
 
/*
 * The ->rw_page interface is subtle and tricky.  The core
@@ -209,9 +216,9 @@ static int pmem_attach_disk(struct device *dev,
struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev);
struct nd_region *nd_region = to_nd_region(dev->parent);
struct vmem_altmap __altmap, *altmap = NULL;
+   int nid = dev_to_node(dev), has_flush;
struct resource *res = &nsio->res;
struct nd_pfn *nd_pfn = NULL;
-   int nid = dev_to_node(dev);
struct nd_pfn_sb *pfn_sb;
struct pmem_device *pmem;
struct resource pfn_res;
@@ -237,8 +244,6 @@ static int pmem_attach_disk(struct device *dev,
dev_set_drvdata(dev, pmem);
pmem->phys_addr = res->start;
pmem->size = resource_size(res);
-   if (nvdimm_has_flush(nd_region) < 0)
-   dev_warn(dev, "unable to guarantee persistence of writes\n");
 
if (!devm_request_mem_region(dev, res->start, resource_size(res),
dev_name(dev))) {
@@ -279,6 +284,11 @@ static int pmem_attach_disk(struct device *dev,
return PTR_ERR(addr);
pmem->virt_addr = (void __pmem *) addr;
 
+   has_flush = nvdimm_has_flush(nd_region);
+   if (has_flush < 0)
+   dev_warn(dev, "unable to guarantee persistence of writes\n");
+   else if (has_flush > 0)
+   blk_queue_write_cache(q, true, true);
blk_queue_make_request(q, pmem_make_request);
blk_queue_physical_block_size(q, PAGE_SIZE);
blk_queue_max_hw_sectors(q, UINT_MAX);



[PATCH v2 12/17] fs/dax: remove wmb_pmem()

2016-07-09 Thread Dan Williams
Flushing posted-write queues is now deferred to REQ_FLUSH context, or
otherwise handled by an ADR event at the platform level.

Cc: Ross Zwisler 
Signed-off-by: Dan Williams 
---
 fs/dax.c |7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 761495bf5eb9..434f421da660 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -147,7 +147,7 @@ static ssize_t dax_io(struct inode *inode, struct iov_iter *iter,
  struct buffer_head *bh)
 {
loff_t pos = start, max = start, bh_max = start;
-   bool hole = false, need_wmb = false;
+   bool hole = false;
struct block_device *bdev = NULL;
int rw = iov_iter_rw(iter), rc;
long map_len = 0;
@@ -213,7 +213,6 @@ static ssize_t dax_io(struct inode *inode, struct iov_iter *iter,
 
if (iov_iter_rw(iter) == WRITE) {
len = copy_from_iter_pmem(dax.addr, max - pos, iter);
-   need_wmb = true;
} else if (!hole)
len = copy_to_iter((void __force *) dax.addr, max - pos,
iter);
@@ -230,8 +229,6 @@ static ssize_t dax_io(struct inode *inode, struct iov_iter *iter,
dax.addr += len;
}
 
-   if (need_wmb)
-   wmb_pmem();
dax_unmap_atomic(bdev, &dax);
 
return (pos == start) ? rc : pos - start;
@@ -783,7 +780,6 @@ int dax_writeback_mapping_range(struct address_space *mapping,
return ret;
}
}
-   wmb_pmem();
return 0;
 }
 EXPORT_SYMBOL_GPL(dax_writeback_mapping_range);
@@ -1227,7 +1223,6 @@ int __dax_zero_page_range(struct block_device *bdev, sector_t sector,
if (dax_map_atomic(bdev, &dax) < 0)
return PTR_ERR(dax.addr);
clear_pmem(dax.addr + offset, length);
-   wmb_pmem();
dax_unmap_atomic(bdev, &dax);
}
return 0;



[PATCH v2 08/17] libnvdimm: introduce nvdimm_flush() and nvdimm_has_flush()

2016-07-09 Thread Dan Williams
nvdimm_flush() is a replacement for the x86 'pcommit' instruction.  It is
an optional write flushing mechanism that an nvdimm bus can provide for
the pmem driver to consume.  In the case of the NFIT nvdimm-bus-provider
nvdimm_flush() is implemented as a series of flush-hint-address [1]
writes to each dimm in the interleave set (region) that backs the
namespace.

The nvdimm_has_flush() routine relies on platform firmware to describe
the flushing capabilities of a platform.  It uses the heuristic of
whether an nvdimm bus provider provides flush address data to return a
ternary result:

  1: flush addresses defined
  0: dimm topology described without flush addresses (assume ADR)
 -errno: no topology information, unable to determine flush mechanism

The pmem driver is expected to take the following actions on this ternary
result:

  1: nvdimm_flush() in response to REQ_FUA / REQ_FLUSH and shutdown
  0: do not set WC or FUA on the queue, take no further action
 -errno: warn and then operate as if nvdimm_has_flush() returned '0'
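
The two tables above can be condensed into a small decision function. This is only a sketch of the policy as described in this message: struct flush_policy and policy_for() are hypothetical names invented for the example and do not exist in the driver.

```c
#include <stdbool.h>

/* Hypothetical summary of the driver's reaction to each ternary result. */
struct flush_policy {
	bool warn;		/* "unable to guarantee persistence" */
	bool advertise_cache;	/* set write-cache / FUA on the queue */
	bool flush_on_request;	/* flush for REQ_FLUSH / REQ_FUA, shutdown */
};

/* Map the value returned by a has-flush query to driver behavior. */
static struct flush_policy policy_for(int has_flush)
{
	struct flush_policy p = { false, false, false };

	if (has_flush < 0) {
		/* -errno: warn, then behave exactly like the '0' case */
		p.warn = true;
	} else if (has_flush > 0) {
		/* 1: flush addresses defined */
		p.advertise_cache = true;
		p.flush_on_request = true;
	}
	/* 0: assume ADR, nothing further to do */
	return p;
}
```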

The caveat of this heuristic is that it cannot distinguish the "dimm
does not have flush address" case from the "platform firmware is broken
and failed to describe a flush address" case.  Given we are already
explicitly trusting the NFIT there's not much more we can do beyond
blacklisting broken firmwares if they are ever encountered.

Cc: Ross Zwisler 
Signed-off-by: Dan Williams 
---
 drivers/acpi/nfit.c  |   33 ++---
 drivers/acpi/nfit.h  |1 -
 drivers/nvdimm/pmem.c|   27 -
 drivers/nvdimm/region_devs.c |   55 ++
 include/linux/libnvdimm.h|2 ++
 5 files changed, 81 insertions(+), 37 deletions(-)

diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c
index 6796f780870a..0497175ee6cb 100644
--- a/drivers/acpi/nfit.c
+++ b/drivers/acpi/nfit.c
@@ -1393,24 +1393,6 @@ static u64 to_interleave_offset(u64 offset, struct nfit_blk_mmio *mmio)
return mmio->base_offset + line_offset + table_offset + sub_line_offset;
 }
 
-static void wmb_blk(struct nfit_blk *nfit_blk)
-{
-
-   if (nfit_blk->nvdimm_flush) {
-   /*
-* The first wmb() is needed to 'sfence' all previous writes
-* such that they are architecturally visible for the platform
-* buffer flush.  Note that we've already arranged for pmem
-* writes to avoid the cache via arch_memcpy_to_pmem().  The
-* final wmb() ensures ordering for the NVDIMM flush write.
-*/
-   wmb();
-   writeq(1, nfit_blk->nvdimm_flush);
-   wmb();
-   } else
-   wmb_pmem();
-}
-
 static u32 read_blk_stat(struct nfit_blk *nfit_blk, unsigned int bw)
 {
struct nfit_blk_mmio *mmio = &nfit_blk->mmio[DCR];
@@ -1445,7 +1427,7 @@ static void write_blk_ctl(struct nfit_blk *nfit_blk, unsigned int bw,
offset = to_interleave_offset(offset, mmio);
 
writeq(cmd, mmio->addr.base + offset);
-   wmb_blk(nfit_blk);
+   nvdimm_flush(nfit_blk->nd_region);
 
if (nfit_blk->dimm_flags & NFIT_BLK_DCR_LATCH)
readq(mmio->addr.base + offset);
@@ -1496,7 +1478,7 @@ static int acpi_nfit_blk_single_io(struct nfit_blk *nfit_blk,
}
 
if (rw)
-   wmb_blk(nfit_blk);
+   nvdimm_flush(nfit_blk->nd_region);
 
rc = read_blk_stat(nfit_blk, lane) ? -EIO : 0;
return rc;
@@ -1570,7 +1552,6 @@ static int acpi_nfit_blk_region_enable(struct nvdimm_bus *nvdimm_bus,
 {
struct nvdimm_bus_descriptor *nd_desc = to_nd_desc(nvdimm_bus);
struct nd_blk_region *ndbr = to_nd_blk_region(dev);
-   struct nfit_flush *nfit_flush;
struct nfit_blk_mmio *mmio;
struct nfit_blk *nfit_blk;
struct nfit_mem *nfit_mem;
@@ -1645,15 +1626,7 @@ static int acpi_nfit_blk_region_enable(struct nvdimm_bus *nvdimm_bus,
return rc;
}
 
-   nfit_flush = nfit_mem->nfit_flush;
-   if (nfit_flush && nfit_flush->flush->hint_count != 0) {
-   nfit_blk->nvdimm_flush = devm_nvdimm_ioremap(dev,
-   nfit_flush->flush->hint_address[0], 8);
-   if (!nfit_blk->nvdimm_flush)
-   return -ENOMEM;
-   }
-
-   if (!arch_has_wmb_pmem() && !nfit_blk->nvdimm_flush)
+   if (nvdimm_has_flush(nfit_blk->nd_region) < 0)
dev_warn(dev, "unable to guarantee persistence of writes\n");
 
if (mmio->line_size == 0)
diff --git a/drivers/acpi/nfit.h b/drivers/acpi/nfit.h
index 9282eb324dcc..9fda77cf81da 100644
--- a/drivers/acpi/nfit.h
+++ b/drivers/acpi/nfit.h
@@ -183,7 +183,6 @@ struct nfit_blk {
u64 bdw_offset; /* post interleave offset */
u64 stat_offset;
u64 cmd_offset;
-   void __iomem *nvdimm_flush;
u32 dimm_flags;
 };
 
diff --git a/dri

[PATCH v2 15/17] Revert "KVM: x86: add pcommit support"

2016-07-09 Thread Dan Williams
This reverts commit 8b3e34e46aca9b6d349b331cd9cf71ccbdc91b2e.

Given the deprecation of the pcommit instruction, revert its usage as a
vm exit source in kvm.

Cc: Xiao Guangrong 
Cc: Paolo Bonzini 
Cc: Ross Zwisler 
Signed-off-by: Dan Williams 
---
 arch/x86/include/asm/vmx.h  |1 -
 arch/x86/include/uapi/asm/vmx.h |4 +---
 arch/x86/kvm/cpuid.c|2 +-
 arch/x86/kvm/cpuid.h|8 
 arch/x86/kvm/vmx.c  |   32 
 5 files changed, 6 insertions(+), 41 deletions(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 14c63c7e8337..a002b07a7099 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -72,7 +72,6 @@
 #define SECONDARY_EXEC_SHADOW_VMCS  0x4000
 #define SECONDARY_EXEC_ENABLE_PML   0x0002
 #define SECONDARY_EXEC_XSAVES  0x0010
-#define SECONDARY_EXEC_PCOMMIT 0x0020
 #define SECONDARY_EXEC_TSC_SCALING  0x0200
 
 #define PIN_BASED_EXT_INTR_MASK 0x0001
diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vmx.h
index 5b15d94a33f8..37fee272618f 100644
--- a/arch/x86/include/uapi/asm/vmx.h
+++ b/arch/x86/include/uapi/asm/vmx.h
@@ -78,7 +78,6 @@
 #define EXIT_REASON_PML_FULL62
 #define EXIT_REASON_XSAVES  63
 #define EXIT_REASON_XRSTORS 64
-#define EXIT_REASON_PCOMMIT 65
 
 #define VMX_EXIT_REASONS \
{ EXIT_REASON_EXCEPTION_NMI, "EXCEPTION_NMI" }, \
@@ -127,8 +126,7 @@
{ EXIT_REASON_INVVPID,   "INVVPID" }, \
{ EXIT_REASON_INVPCID,   "INVPCID" }, \
{ EXIT_REASON_XSAVES,"XSAVES" }, \
-   { EXIT_REASON_XRSTORS,   "XRSTORS" }, \
-   { EXIT_REASON_PCOMMIT,   "PCOMMIT" }
+   { EXIT_REASON_XRSTORS,   "XRSTORS" }
 
 #define VMX_ABORT_SAVE_GUEST_MSR_FAIL1
 #define VMX_ABORT_LOAD_HOST_MSR_FAIL 4
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 7597b42a8a88..643565364497 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -366,7 +366,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
F(FSGSBASE) | F(BMI1) | F(HLE) | F(AVX2) | F(SMEP) |
F(BMI2) | F(ERMS) | f_invpcid | F(RTM) | f_mpx | F(RDSEED) |
F(ADX) | F(SMAP) | F(AVX512F) | F(AVX512PF) | F(AVX512ER) |
-   F(AVX512CD) | F(CLFLUSHOPT) | F(CLWB) | F(PCOMMIT);
+   F(AVX512CD) | F(CLFLUSHOPT) | F(CLWB);
 
/* cpuid 0xD.1.eax */
const u32 kvm_cpuid_D_1_eax_x86_features =
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index e17a74b1d852..35058c2c0eea 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -144,14 +144,6 @@ static inline bool guest_cpuid_has_rtm(struct kvm_vcpu *vcpu)
return best && (best->ebx & bit(X86_FEATURE_RTM));
 }
 
-static inline bool guest_cpuid_has_pcommit(struct kvm_vcpu *vcpu)
-{
-   struct kvm_cpuid_entry2 *best;
-
-   best = kvm_find_cpuid_entry(vcpu, 7, 0);
-   return best && (best->ebx & bit(X86_FEATURE_PCOMMIT));
-}
-
 static inline bool guest_cpuid_has_rdtscp(struct kvm_vcpu *vcpu)
 {
struct kvm_cpuid_entry2 *best;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index fb93010beaa4..2e2685424fdc 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2705,8 +2705,7 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx *vmx)
SECONDARY_EXEC_APIC_REGISTER_VIRT |
SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
SECONDARY_EXEC_WBINVD_EXITING |
-   SECONDARY_EXEC_XSAVES |
-   SECONDARY_EXEC_PCOMMIT;
+   SECONDARY_EXEC_XSAVES;
 
if (enable_ept) {
/* nested EPT: emulate EPT also to L1 */
@@ -3268,7 +3267,6 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
SECONDARY_EXEC_SHADOW_VMCS |
SECONDARY_EXEC_XSAVES |
SECONDARY_EXEC_ENABLE_PML |
-   SECONDARY_EXEC_PCOMMIT |
SECONDARY_EXEC_TSC_SCALING;
if (adjust_vmx_controls(min2, opt2,
MSR_IA32_VMX_PROCBASED_CTLS2,
@@ -4856,9 +4854,6 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx)
if (!enable_pml)
exec_control &= ~SECONDARY_EXEC_ENABLE_PML;
 
-   /* Currently, we allow L1 guest to directly run pcommit instruction. */
-   exec_control &= ~SECONDARY_EXEC_PCOMMIT;
-
return exec_control;
 }
 
@@ -4902,9 +4897,10 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
 
vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, vmx_exec_control(vmx));
 
-   if (cpu_has_secondary_exec_ctrls())
+   if (cpu_has_secondary_exec_ctrls()) {

[PATCH v2 13/17] libnvdimm, pmem: use nvdimm_flush() for namespace I/O writes

2016-07-09 Thread Dan Williams
nsio_rw_bytes() is used to write info block metadata to the namespace,
so it should trigger a flush after every write.  Replace wmb_pmem() with
nvdimm_flush() in this path.

Cc: Ross Zwisler 
Signed-off-by: Dan Williams 
---
 drivers/nvdimm/claim.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvdimm/claim.c b/drivers/nvdimm/claim.c
index 9997ff94a132..d5dc80c48b4c 100644
--- a/drivers/nvdimm/claim.c
+++ b/drivers/nvdimm/claim.c
@@ -240,7 +240,7 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
return memcpy_from_pmem(buf, nsio->addr + offset, size);
} else {
memcpy_to_pmem(nsio->addr + offset, buf, size);
-   wmb_pmem();
+   nvdimm_flush(to_nd_region(ndns->dev.parent));
}
 
return 0;



[PATCH v2 16/17] x86/insn: remove pcommit

2016-07-09 Thread Dan Williams
The pcommit instruction is being deprecated in favor of either ADR
(asynchronous DRAM refresh: flush-on-power-fail) at the platform level, or
posted-write-queue flush addresses as defined by the ACPI 6.x NFIT (NVDIMM
Firmware Interface Table).

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: x...@kernel.org
Cc: Josh Poimboeuf 
Cc: Peter Zijlstra 
Cc: Arnaldo Carvalho de Melo 
Cc: Alexander Shishkin 
Cc: Borislav Petkov 
Cc: Andy Lutomirski 
Cc: Xiao Guangrong 
Cc: Adrian Hunter 
Cc: Ross Zwisler 
Signed-off-by: Dan Williams 
---
 arch/x86/include/asm/cpufeatures.h |1 
 arch/x86/include/asm/special_insns.h   |   46 
 arch/x86/lib/x86-opcode-map.txt|2 -
 tools/objtool/arch/x86/insn/x86-opcode-map.txt |2 -
 tools/perf/arch/x86/tests/insn-x86-dat-32.c|2 -
 tools/perf/arch/x86/tests/insn-x86-dat-64.c|2 -
 tools/perf/arch/x86/tests/insn-x86-dat-src.c   |4 --
 .../perf/util/intel-pt-decoder/x86-opcode-map.txt  |2 -
 8 files changed, 3 insertions(+), 58 deletions(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 4a413485f9eb..700d97df7d28 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -225,7 +225,6 @@
 #define X86_FEATURE_RDSEED ( 9*32+18) /* The RDSEED instruction */
 #define X86_FEATURE_ADX( 9*32+19) /* The ADCX and ADOX instructions */
 #define X86_FEATURE_SMAP   ( 9*32+20) /* Supervisor Mode Access Prevention */
-#define X86_FEATURE_PCOMMIT( 9*32+22) /* PCOMMIT instruction */
 #define X86_FEATURE_CLFLUSHOPT ( 9*32+23) /* CLFLUSHOPT instruction */
 #define X86_FEATURE_CLWB   ( 9*32+24) /* CLWB instruction */
 #define X86_FEATURE_AVX512PF   ( 9*32+26) /* AVX-512 Prefetch */
diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index d96d04377765..587d7914ea4b 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -253,52 +253,6 @@ static inline void clwb(volatile void *__p)
: [pax] "a" (p));
 }
 
-/**
- * pcommit_sfence() - persistent commit and fence
- *
- * The PCOMMIT instruction ensures that data that has been flushed from the
- * processor's cache hierarchy with CLWB, CLFLUSHOPT or CLFLUSH is accepted to
- * memory and is durable on the DIMM.  The primary use case for this is
- * persistent memory.
- *
- * This function shows how to properly use CLWB/CLFLUSHOPT/CLFLUSH and PCOMMIT
- * with appropriate fencing.
- *
- * Example:
- * void flush_and_commit_buffer(void *vaddr, unsigned int size)
- * {
- * unsigned long clflush_mask = boot_cpu_data.x86_clflush_size - 1;
- * void *vend = vaddr + size;
- * void *p;
- *
- * for (p = (void *)((unsigned long)vaddr & ~clflush_mask);
- *  p < vend; p += boot_cpu_data.x86_clflush_size)
- * clwb(p);
- *
- * // SFENCE to order CLWB/CLFLUSHOPT/CLFLUSH cache flushes
- * // MFENCE via mb() also works
- * wmb();
- *
- * // PCOMMIT and the required SFENCE for ordering
- * pcommit_sfence();
- * }
- *
- * After this function completes the data pointed to by 'vaddr' has been
- * accepted to memory and will be durable if the 'vaddr' points to persistent
- * memory.
- *
- * PCOMMIT must always be ordered by an MFENCE or SFENCE, so to help simplify
- * things we include both the PCOMMIT and the required SFENCE in the
- * alternatives generated by pcommit_sfence().
- */
-static inline void pcommit_sfence(void)
-{
-   alternative(ASM_NOP7,
-   ".byte 0x66, 0x0f, 0xae, 0xf8\n\t" /* pcommit */
-   "sfence",
-   X86_FEATURE_PCOMMIT);
-}
-
 #define nop() asm volatile ("nop")
 
 
diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index d388de72eaca..28632ee68377 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -947,7 +947,7 @@ GrpTable: Grp15
 4: XSAVE
 5: XRSTOR | lfence (11B)
 6: XSAVEOPT | clwb (66) | mfence (11B)
-7: clflush | clflushopt (66) | sfence (11B) | pcommit (66),(11B)
+7: clflush | clflushopt (66) | sfence (11B)
 EndTable
 
 GrpTable: Grp16
diff --git a/tools/objtool/arch/x86/insn/x86-opcode-map.txt b/tools/objtool/arch/x86/insn/x86-opcode-map.txt
index d388de72eaca..28632ee68377 100644
--- a/tools/objtool/arch/x86/insn/x86-opcode-map.txt
+++ b/tools/objtool/arch/x86/insn/x86-opcode-map.txt
@@ -947,7 +947,7 @@ GrpTable: Grp15
 4: XSAVE
 5: XRSTOR | lfence (11B)
 6: XSAVEOPT | clwb (66) | mfence (11B)
-7: clflush | clflushopt (66) | sfence (11B) | pcommit (66),(11B)
+7: clflush | clflushopt (66) | sfence (11B)
 EndTable
 
 GrpTable: Grp16
diff --git a/tools/perf/arch/x86/tests/insn-x86-dat-32.c b/tools/perf/arch/x86/tests/insn-x86-dat-32.c
index 3b491cfe204e..38a48daed154 100644
--- a/tools/perf/arch/x86/tests/insn-

[PATCH v2 17/17] pmem: kill __pmem address space

2016-07-09 Thread Dan Williams
The __pmem address space was meant to annotate codepaths that touch
persistent memory and need to coordinate a call to wmb_pmem().  Now that
wmb_pmem() is gone, there is little need to keep this annotation.

Cc: Christoph Hellwig 
Cc: Ross Zwisler 
Signed-off-by: Dan Williams 
---
 Documentation/filesystems/Locking |2 +
 arch/powerpc/sysdev/axonram.c |4 +-
 arch/x86/include/asm/pmem.h   |   41 +-
 drivers/acpi/nfit.h   |2 +
 drivers/block/brd.c   |4 +-
 drivers/nvdimm/pmem.c |6 ++-
 drivers/nvdimm/pmem.h |4 +-
 drivers/s390/block/dcssblk.c  |6 ++-
 fs/dax.c  |6 ++-
 include/linux/blkdev.h|6 ++-
 include/linux/compiler.h  |2 -
 include/linux/nd.h|2 +
 include/linux/pmem.h  |   70 +
 scripts/checkpatch.pl |1 -
 tools/testing/nvdimm/pmem-dax.c   |2 +
 15 files changed, 56 insertions(+), 102 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 75eea7ce3d7c..d9c37ec4c760 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -395,7 +395,7 @@ prototypes:
int (*release) (struct gendisk *, fmode_t);
int (*ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
int (*compat_ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
-   int (*direct_access) (struct block_device *, sector_t, void __pmem **,
+   int (*direct_access) (struct block_device *, sector_t, void **,
unsigned long *);
int (*media_changed) (struct gendisk *);
void (*unlock_native_capacity) (struct gendisk *);
diff --git a/arch/powerpc/sysdev/axonram.c b/arch/powerpc/sysdev/axonram.c
index ff75d70f7285..154cd9110c08 100644
--- a/arch/powerpc/sysdev/axonram.c
+++ b/arch/powerpc/sysdev/axonram.c
@@ -143,12 +143,12 @@ axon_ram_make_request(struct request_queue *queue, struct bio *bio)
  */
 static long
 axon_ram_direct_access(struct block_device *device, sector_t sector,
-  void __pmem **kaddr, pfn_t *pfn, long size)
+  void **kaddr, pfn_t *pfn, long size)
 {
struct axon_ram_bank *bank = device->bd_disk->private_data;
loff_t offset = (loff_t)sector << AXON_RAM_SECTOR_SHIFT;
 
-   *kaddr = (void __pmem __force *) bank->io_addr + offset;
+   *kaddr = bank->io_addr + offset;
*pfn = phys_to_pfn_t(bank->ph_addr + offset, PFN_DEV);
return bank->size - offset;
 }
diff --git a/arch/x86/include/asm/pmem.h b/arch/x86/include/asm/pmem.h
index a8cf2a6b14d9..643eba42d620 100644
--- a/arch/x86/include/asm/pmem.h
+++ b/arch/x86/include/asm/pmem.h
@@ -28,10 +28,9 @@
  * Copy data to persistent memory media via non-temporal stores so that
  * a subsequent pmem driver flush operation will drain posted write queues.
  */
-static inline void arch_memcpy_to_pmem(void __pmem *dst, const void *src,
-   size_t n)
+static inline void arch_memcpy_to_pmem(void *dst, const void *src, size_t n)
 {
-   int unwritten;
+   int rem;
 
/*
 * We are copying between two kernel buffers, if
@@ -39,19 +38,17 @@ static inline void arch_memcpy_to_pmem(void __pmem *dst, const void *src,
 * fault) we would have already reported a general protection fault
 * before the WARN+BUG.
 */
-   unwritten = __copy_from_user_inatomic_nocache((void __force *) dst,
-   (void __user *) src, n);
-   if (WARN(unwritten, "%s: fault copying %p <- %p unwritten: %d\n",
-   __func__, dst, src, unwritten))
+   rem = __copy_from_user_inatomic_nocache(dst, (void __user *) src, n);
+   if (WARN(rem, "%s: fault copying %p <- %p unwritten: %d\n",
+   __func__, dst, src, rem))
BUG();
 }
 
-static inline int arch_memcpy_from_pmem(void *dst, const void __pmem *src,
-   size_t n)
+static inline int arch_memcpy_from_pmem(void *dst, const void *src, size_t n)
 {
if (static_cpu_has(X86_FEATURE_MCE_RECOVERY))
-   return memcpy_mcsafe(dst, (void __force *) src, n);
-   memcpy(dst, (void __force *) src, n);
+   return memcpy_mcsafe(dst, src, n);
+   memcpy(dst, src, n);
return 0;
 }
 
@@ -63,15 +60,14 @@ static inline int arch_memcpy_from_pmem(void *dst, const void __pmem *src,
  * Write back a cache range using the CLWB (cache line write back)
  * instruction.
  */
-static inline void arch_wb_cache_pmem(void __pmem *addr, size_t size)
+static inline void arch_wb_cache_pmem(void *addr, size_t size)
 {
u16 x86_clflush_size = boot_cpu_data.x86_clflush_size;
unsigned long clflush_mask = x86_clflush_size - 1;
-   void *vaddr = (void __force *)addr;
-   void *vend = vaddr + s

[PATCH v2 14/17] pmem: kill wmb_pmem()

2016-07-09 Thread Dan Williams
All users have been replaced with flushing in the pmem driver.

Cc: Ross Zwisler 
Signed-off-by: Dan Williams 
---
 arch/x86/include/asm/pmem.h |   36 ++---
 include/linux/pmem.h|   47 ---
 2 files changed, 6 insertions(+), 77 deletions(-)

diff --git a/arch/x86/include/asm/pmem.h b/arch/x86/include/asm/pmem.h
index fbc5e92e1ecc..a8cf2a6b14d9 100644
--- a/arch/x86/include/asm/pmem.h
+++ b/arch/x86/include/asm/pmem.h
@@ -26,8 +26,7 @@
  * @n: length of the copy in bytes
  *
  * Copy data to persistent memory media via non-temporal stores so that
- * a subsequent arch_wmb_pmem() can flush cpu and memory controller
- * write buffers to guarantee durability.
+ * a subsequent pmem driver flush operation will drain posted write queues.
  */
 static inline void arch_memcpy_to_pmem(void __pmem *dst, const void *src,
size_t n)
@@ -57,32 +56,12 @@ static inline int arch_memcpy_from_pmem(void *dst, const void __pmem *src,
 }
 
 /**
- * arch_wmb_pmem - synchronize writes to persistent memory
- *
- * After a series of arch_memcpy_to_pmem() operations this drains data
- * from cpu write buffers and any platform (memory controller) buffers
- * to ensure that written data is durable on persistent memory media.
- */
-static inline void arch_wmb_pmem(void)
-{
-   /*
-* wmb() to 'sfence' all previous writes such that they are
-* architecturally visible to 'pcommit'.  Note, that we've
-* already arranged for pmem writes to avoid the cache via
-* arch_memcpy_to_pmem().
-*/
-   wmb();
-   pcommit_sfence();
-}
-
-/**
  * arch_wb_cache_pmem - write back a cache range with CLWB
  * @vaddr: virtual start address
  * @size:  number of bytes to write back
  *
  * Write back a cache range using the CLWB (cache line write back)
- * instruction.  This function requires explicit ordering with an
- * arch_wmb_pmem() call.
+ * instruction.
  */
 static inline void arch_wb_cache_pmem(void __pmem *addr, size_t size)
 {
@@ -113,7 +92,6 @@ static inline bool __iter_needs_pmem_wb(struct iov_iter *i)
  * @i: iterator with source data
  *
  * Copy data from the iterator 'i' to the PMEM buffer starting at 'addr'.
- * This function requires explicit ordering with an arch_wmb_pmem() call.
  */
 static inline size_t arch_copy_from_iter_pmem(void __pmem *addr, size_t bytes,
struct iov_iter *i)
@@ -136,7 +114,6 @@ static inline size_t arch_copy_from_iter_pmem(void __pmem *addr, size_t bytes,
  * @size:  number of bytes to zero
  *
  * Write zeros into the memory range starting at 'addr' for 'size' bytes.
- * This function requires explicit ordering with an arch_wmb_pmem() call.
  */
 static inline void arch_clear_pmem(void __pmem *addr, size_t size)
 {
@@ -150,14 +127,5 @@ static inline void arch_invalidate_pmem(void __pmem *addr, size_t size)
 {
clflush_cache_range((void __force *) addr, size);
 }
-
-static inline bool __arch_has_wmb_pmem(void)
-{
-   /*
-* We require that wmb() be an 'sfence', that is only guaranteed on
-* 64-bit builds
-*/
-   return static_cpu_has(X86_FEATURE_PCOMMIT);
-}
 #endif /* CONFIG_ARCH_HAS_PMEM_API */
 #endif /* __ASM_X86_PMEM_H__ */
diff --git a/include/linux/pmem.h b/include/linux/pmem.h
index 57d146fe44dd..9e3ea94b8157 100644
--- a/include/linux/pmem.h
+++ b/include/linux/pmem.h
@@ -26,16 +26,6 @@
  * calling these symbols with arch_has_pmem_api() and redirect to the
  * implementation in asm/pmem.h.
  */
-static inline bool __arch_has_wmb_pmem(void)
-{
-   return false;
-}
-
-static inline void arch_wmb_pmem(void)
-{
-   BUG();
-}
-
 static inline void arch_memcpy_to_pmem(void __pmem *dst, const void *src,
size_t n)
 {
@@ -101,20 +91,6 @@ static inline int memcpy_from_pmem(void *dst, void __pmem const *src,
return default_memcpy_from_pmem(dst, src, size);
 }
 
-/**
- * arch_has_wmb_pmem - true if wmb_pmem() ensures durability
- *
- * For a given cpu implementation within an architecture it is possible
- * that wmb_pmem() resolves to a nop.  In the case this returns
- * false, pmem api users are unable to ensure durability and may want to
- * fall back to a different data consistency model, or otherwise notify
- * the user.
- */
-static inline bool arch_has_wmb_pmem(void)
-{
-   return arch_has_pmem_api() && __arch_has_wmb_pmem();
-}
-
 /*
  * These defaults seek to offer decent performance and minimize the
  * window between i/o completion and writes being durable on media.
@@ -152,7 +128,7 @@ static inline void default_clear_pmem(void __pmem *addr, size_t size)
  * being effectively evicted from, or never written to, the processor
  * cache hierarchy after the copy completes.  After memcpy_to_pmem()
  * data may still reside in cpu or platform buffers, so this operation
- * must be followed by a wmb_pmem().
+ * must be followed by a blkdev_issue_flush() 

[PATCH v2 00/17] replace pcommit with ADR or directed flushing

2016-07-09 Thread Dan Williams
Changes since v1 [1]:

1/ Move flush address data from nvdimm_drvdata to nd_region_data (Greg,
   Toshi)

2/ Add more detail to cover letter and patch descriptions (Linda, Jeff)

3/ Account for s/REQ_FLUSH/REQ_PREFLUSH/ rename pending in -next.

4/ Add a directed flush at pmem ->remove() and ->shutdown() time.

[1]: https://lists.01.org/pipermail/linux-nvdimm/2016-June/005897.html

---

The pcommit instruction, which has not shipped on any product, is
deprecated. Instead, the expectation is that platforms implement either
ADR, or provide one or more flush addresses per nvdimm. ADR
(Asynchronous DRAM Refresh) flushes data in posted write buffers to the
memory controller on a power-fail event. Flush addresses are defined in
ACPI 6.x as an NVDIMM Firmware Interface Table (NFIT) sub-structure:
"Flush Hint Address Structure". A flush hint is an mmio address that
when written and fenced assures that all previous posted writes
targeting a given dimm have been flushed to media.
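
As a rough illustration of the flush-hint mechanism described above, the
userspace sketch below models a flush hint as an ordinary 64-bit location
that is written and then fenced. Everything in it is hypothetical: struct
flush_hint and flush_hint_trigger() are made-up names, a C11 fence stands in
for the kernel's wmb(), and the value 1 is arbitrary because the cover letter
does not say what value should be written. A real hint is an mmio address
taken from the NFIT and would be accessed with the kernel's own helpers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for one mapped flush hint address. */
struct flush_hint {
	volatile uint64_t *addr;	/* in the kernel: a mapped mmio address */
};

/*
 * Write to the hint and fence, i.e. the "written and fenced" sequence
 * from the cover letter.  The fences here are C11 fences, used only as
 * a userspace approximation of wmb().
 */
static void flush_hint_trigger(struct flush_hint *hint)
{
	assert(hint && hint->addr);

	/* Order earlier stores before the hint write. */
	atomic_thread_fence(memory_order_seq_cst);
	*hint->addr = 1;		/* arbitrary value, see lead-in */
	/* Fence the hint write itself. */
	atomic_thread_fence(memory_order_seq_cst);
}
```

In this model a caller would issue its data stores first and then call
flush_hint_trigger() once per dimm it needs flushed.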

Code paths that previously called wmb_pmem() instead must arrange for a
flush request to be sent to the pmem driver. Towards this end, the pmem
driver is converted to advertise itself as having a write cache to
indicate to a filesystem that a flush request must occur before writes
are guaranteed to be on media.  See "[PATCH v2 08/17] libnvdimm:
introduce nvdimm_flush() and nvdimm_has_flush()" for details.

---

Dan Williams (17):
  nfit: always associate flush hints
  nfit: don't override return value of nfit_mem_init
  libnvdimm: introduce devm_nvdimm_memremap(), convert nfit_spa_map() users
  libnvdimm, nfit: remove nfit_spa_map() infrastructure
  libnvdimm, nfit: move flush hint mapping to region-device driver-data
  tools/testing/nvdimm: simulate multiple flush hints per-dimm
  libnvdimm: keep region data alive over namespace removal
  libnvdimm: introduce nvdimm_flush() and nvdimm_has_flush()
  libnvdimm: cycle flush hints
  libnvdimm, pmem: use REQ_FUA, REQ_FLUSH for nvdimm_flush()
  libnvdimm, pmem: flush posted-write queues on shutdown
  fs/dax: remove wmb_pmem()
  libnvdimm, pmem: use nvdimm_flush() for namespace I/O writes
  pmem: kill wmb_pmem()
  Revert "KVM: x86: add pcommit support"
  x86/insn: remove pcommit
  pmem: kill __pmem address space


 Documentation/filesystems/Locking  |2 
 arch/powerpc/sysdev/axonram.c  |4 
 arch/x86/include/asm/cpufeatures.h |1 
 arch/x86/include/asm/pmem.h|   77 ++-
 arch/x86/include/asm/special_insns.h   |   46 
 arch/x86/include/asm/vmx.h |1 
 arch/x86/include/uapi/asm/vmx.h|4 
 arch/x86/kvm/cpuid.c   |2 
 arch/x86/kvm/cpuid.h   |8 -
 arch/x86/kvm/vmx.c |   32 ---
 arch/x86/lib/x86-opcode-map.txt|2 
 drivers/acpi/nfit.c|  230 +++-
 drivers/acpi/nfit.h|   25 --
 drivers/block/brd.c|4 
 drivers/nvdimm/bus.c   |   16 +
 drivers/nvdimm/claim.c |2 
 drivers/nvdimm/core.c  |  122 +++
 drivers/nvdimm/dimm_devs.c |5 
 drivers/nvdimm/nd-core.h   |4 
 drivers/nvdimm/nd.h|   10 +
 drivers/nvdimm/pmem.c  |   59 -
 drivers/nvdimm/pmem.h  |4 
 drivers/nvdimm/region.c|   19 +-
 drivers/nvdimm/region_devs.c   |  148 -
 drivers/s390/block/dcssblk.c   |6 -
 fs/dax.c   |   13 -
 include/linux/blkdev.h |6 -
 include/linux/compiler.h   |2 
 include/linux/libnvdimm.h  |   16 +
 include/linux/nd.h |3 
 include/linux/pmem.h   |  117 ++
 scripts/checkpatch.pl  |1 
 tools/objtool/arch/x86/insn/x86-opcode-map.txt |2 
 tools/perf/arch/x86/tests/insn-x86-dat-32.c|2 
 tools/perf/arch/x86/tests/insn-x86-dat-64.c|2 
 tools/perf/arch/x86/tests/insn-x86-dat-src.c   |4 
 .../perf/util/intel-pt-decoder/x86-opcode-map.txt  |2 
 tools/testing/nvdimm/pmem-dax.c|2 
 tools/testing/nvdimm/test/nfit.c   |   55 +++--
 39 files changed, 505 insertions(+), 555 deletions(-)


[PATCH v2 05/17] libnvdimm, nfit: move flush hint mapping to region-device driver-data

2016-07-09 Thread Dan Williams
In preparation for triggering flushes of a DIMM's writes-posted-queue
(WPQ) via the pmem driver, move the mapping of flush hint addresses to the
region driver.  Since this uses devm_nvdimm_memremap(), the flush
addresses will remain mapped while any region to which the dimm belongs
is active.

We need to communicate more information to the nvdimm core to facilitate
this mapping; namely, each dimm object now carries an array of flush hint
address resources.
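
The per-dimm array of flush hint resources can be pictured with the
following userspace sketch. It mirrors the loop this patch adds to
nfit_mem_dcr_init() (one 8-byte range per hint address), but struct
res_range and build_flush_ranges() are hypothetical stand-ins for the
kernel's struct resource and devm_kzalloc()-based allocation, not the
actual implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the kernel's struct resource. */
struct res_range {
	uint64_t start;
	uint64_t end;		/* inclusive, as in struct resource */
};

/*
 * Build one 8-byte range per flush hint address.  Returns NULL when
 * there are no hints or the allocation fails; the caller frees the
 * returned array with free().
 */
static struct res_range *build_flush_ranges(const uint64_t *hint_address,
					     uint16_t hint_count)
{
	struct res_range *ranges;
	uint16_t i;

	if (hint_count == 0)
		return NULL;
	assert(hint_address != NULL);

	ranges = calloc(hint_count, sizeof(*ranges));
	if (!ranges)
		return NULL;

	for (i = 0; i < hint_count; i++) {
		ranges[i].start = hint_address[i];
		ranges[i].end = ranges[i].start + 8 - 1;
	}
	return ranges;
}
```

The "+ 8 - 1" reflects that the end of a range is inclusive, so each hint
covers exactly eight bytes.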

Signed-off-by: Dan Williams 
---
 drivers/acpi/nfit.c  |   21 +++
 drivers/acpi/nfit.h  |1 +
 drivers/nvdimm/dimm_devs.c   |5 ++-
 drivers/nvdimm/nd-core.h |3 +-
 drivers/nvdimm/nd.h  |8 +++-
 drivers/nvdimm/region.c  |   16 -
 drivers/nvdimm/region_devs.c |   79 --
 include/linux/libnvdimm.h|4 ++
 8 files changed, 119 insertions(+), 18 deletions(-)

diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c
index b76c95981547..6796f780870a 100644
--- a/drivers/acpi/nfit.c
+++ b/drivers/acpi/nfit.c
@@ -714,9 +714,24 @@ static int nfit_mem_dcr_init(struct acpi_nfit_desc *acpi_desc,
}
 
list_for_each_entry(nfit_flush, &acpi_desc->flushes, list) {
+   struct acpi_nfit_flush_address *flush;
+   u16 i;
+
if (nfit_flush->flush->device_handle != device_handle)
continue;
nfit_mem->nfit_flush = nfit_flush;
+   flush = nfit_flush->flush;
+   nfit_mem->flush_wpq = devm_kzalloc(acpi_desc->dev,
+   flush->hint_count
+   * sizeof(struct resource), GFP_KERNEL);
+   if (!nfit_mem->flush_wpq)
+   return -ENOMEM;
+   for (i = 0; i < flush->hint_count; i++) {
+   struct resource *res = &nfit_mem->flush_wpq[i];
+
+   res->start = flush->hint_address[i];
+   res->end = res->start + 8 - 1;
+   }
break;
}
 
@@ -1171,6 +1186,7 @@ static int acpi_nfit_register_dimms(struct acpi_nfit_desc *acpi_desc)
int dimm_count = 0;
 
list_for_each_entry(nfit_mem, &acpi_desc->dimms, list) {
+   struct acpi_nfit_flush_address *flush;
unsigned long flags = 0, cmd_mask;
struct nvdimm *nvdimm;
u32 device_handle;
@@ -1204,9 +1220,12 @@ static int acpi_nfit_register_dimms(struct acpi_nfit_desc *acpi_desc)
if (nfit_mem->family == NVDIMM_FAMILY_INTEL)
cmd_mask |= nfit_mem->dsm_mask;
 
+   flush = nfit_mem->nfit_flush ? nfit_mem->nfit_flush->flush
+   : NULL;
nvdimm = nvdimm_create(acpi_desc->nvdimm_bus, nfit_mem,
acpi_nfit_dimm_attribute_groups,
-   flags, cmd_mask);
+   flags, cmd_mask, flush ? flush->hint_count : 0,
+   nfit_mem->flush_wpq);
if (!nvdimm)
return -ENOMEM;
 
diff --git a/drivers/acpi/nfit.h b/drivers/acpi/nfit.h
index 52078475d969..9282eb324dcc 100644
--- a/drivers/acpi/nfit.h
+++ b/drivers/acpi/nfit.h
@@ -127,6 +127,7 @@ struct nfit_mem {
struct list_head list;
struct acpi_device *adev;
struct acpi_nfit_desc *acpi_desc;
+   struct resource *flush_wpq;
unsigned long dsm_mask;
int family;
 };
diff --git a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c
index bbde28d3dec5..d9bba5edd8dc 100644
--- a/drivers/nvdimm/dimm_devs.c
+++ b/drivers/nvdimm/dimm_devs.c
@@ -346,7 +346,8 @@ EXPORT_SYMBOL_GPL(nvdimm_attribute_group);
 
 struct nvdimm *nvdimm_create(struct nvdimm_bus *nvdimm_bus, void *provider_data,
const struct attribute_group **groups, unsigned long flags,
-   unsigned long cmd_mask)
+   unsigned long cmd_mask, int num_flush,
+   struct resource *flush_wpq)
 {
struct nvdimm *nvdimm = kzalloc(sizeof(*nvdimm), GFP_KERNEL);
struct device *dev;
@@ -362,6 +363,8 @@ struct nvdimm *nvdimm_create(struct nvdimm_bus *nvdimm_bus, void *provider_data,
nvdimm->provider_data = provider_data;
nvdimm->flags = flags;
nvdimm->cmd_mask = cmd_mask;
+   nvdimm->num_flush = num_flush;
+   nvdimm->flush_wpq = flush_wpq;
atomic_set(&nvdimm->busy, 0);
dev = &nvdimm->dev;
dev_set_name(dev, "nmem%d", nvdimm->id);
diff --git a/drivers/nvdimm/nd-core.h b/drivers/nvdimm/nd-core.h
index 790b62cc81ed..6e961f7f43e7 100644
--- a/drivers/nvdimm/nd-core.h
+++ b/drivers/nvdimm/nd-core.h
@@ -41,7 +41,8 @@ struct nvdimm {
unsigned long cmd_mask;
s

Re: [PATCH] drivers/mtd/chips/cfi_cmdset_0020.c: Deinline do_write_buffer, save 5316 bytes

2016-07-09 Thread Brian Norris
On Fri, Apr 08, 2016 at 08:35:43PM +0200, Denys Vlasenko wrote:
> This function compiles to 2554 bytes of machine code.
> In C, the function is almost 200 lines long.
> 
> It has only one callsite, but forced inlining that much code
> makes gcc generate significantly worse code. Let gcc itself decide
> what to do.
> 
> Signed-off-by: Denys Vlasenko 
> CC: David Woodhouse 
> CC: Brian Norris 
> CC: Dan Carpenter 
> CC: Artem Bityutskiy 
> CC: linux-...@lists.infradead.org
> CC: linux-kernel@vger.kernel.org

Applied to l2-mtd.git


Re: [PATCH] mtd: Replace if and BUG with BUG_ON

2016-07-09 Thread Brian Norris
Hi,

On Tue, May 31, 2016 at 07:41:23AM +0200, Julia Lawall wrote:
> On Mon, 30 May 2016, Ezequiel Garcia wrote:
> > On 28 May 2016 at 13:41, Amitoj Kaur Chawla  wrote:
> > > Replace if condition and BUG() with a BUG_ON having the conditional
> > > expression of the if statement as argument.
[...]

> > > diff --git a/drivers/mtd/ssfdc.c b/drivers/mtd/ssfdc.c
> > > index daf82ba..41b13d1 100644
> > > --- a/drivers/mtd/ssfdc.c
> > > +++ b/drivers/mtd/ssfdc.c
> > > @@ -380,8 +380,7 @@ static int ssfdcr_readsect(struct mtd_blktrans_dev *dev,
> > > " block_addr=%d\n", logic_sect_no, sectors_per_block, offset,
> > > block_address);
> > >
> > > -   if (block_address >= ssfdc->map_len)
> > > -   BUG();
> > > +   BUG_ON(block_address >= ssfdc->map_len);
> > >
> > 
> > I don't want to be rude, but I wonder if there's any value at all in
> > such a patch. It barely improves readability, it barely reduces the
> > LoC, yet it consumes developer time, maintainer time, and changes git
> > per-line authorship (used in git blame).
> 
> Actually, I think that this particular patch does improve readability a 
> bit.  Scanning straight down the code is easier than looking under an if.
> Also, git blame now has a way to go back in history (although I don't 
> remember what it is), so the argument that cleaning up the code makes it 
> very difficult to find why the nontrivial part of the code is as it is 
> doesn't completely hold any more.

I agree it's a small improvement. Not sure I'd worry too much about
git-blame. Applied to l2-mtd.git.

Brian


[PATCH] x86 / hibernate: Use hlt_play_dead() when resuming from hibernation

2016-07-09 Thread Rafael J. Wysocki
From: Rafael J. Wysocki 

On Intel hardware, native_play_dead() uses mwait_play_dead() by
default and only falls back to the other methods if that fails.
That also happens during resume from hibernation, when the restore
(boot) kernel runs disable_nonboot_cpus() to take all of the CPUs
except for the boot one offline.

However, that is problematic, because the address passed to
__monitor() in mwait_play_dead() is likely to be written to in the
last phase of hibernate image restoration and that causes the "dead"
CPU to start executing instructions again.  Unfortunately, the page
containing the address in that CPU's instruction pointer may not be
valid any more at that point.

First, that page may have been overwritten with image kernel memory
contents already, so the instructions the CPU attempts to execute may
simply be invalid.  Second, the page tables previously used by that
CPU may have been overwritten by image kernel memory contents, so the
address in its instruction pointer is impossible to resolve then.

A report from Varun Koyyalagunta and investigation carried out by
Chen Yu show that the latter sometimes happens in practice.

To prevent it from happening, modify native_play_dead() to make
it use hlt_play_dead() instead of mwait_play_dead() during resume
from hibernation, which avoids the inadvertent "revivals" of "dead"
CPUs.
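
The intended control flow can be modelled in plain C as below. This is only
a sketch under assumptions: the smpboot.c hunk is cut off in this archive
copy, so how native_play_dead() actually consults the flag is not shown
here, and pick_play_dead(), fake_disable_nonboot_cpus() and last_method are
hypothetical names invented for the illustration. Only force_hlt_play_dead
and the set/call/clear sequence come from the patch itself.

```c
#include <assert.h>
#include <stdbool.h>

enum play_dead_method { PLAY_DEAD_MWAIT, PLAY_DEAD_HLT };

/* Same name as the flag added by the patch. */
static bool force_hlt_play_dead;

/* Hypothetical: the method an offlined CPU would end up using. */
static enum play_dead_method pick_play_dead(bool mwait_usable)
{
	if (force_hlt_play_dead || !mwait_usable)
		return PLAY_DEAD_HLT;
	return PLAY_DEAD_MWAIT;
}

/* Hypothetical stand-in for disable_nonboot_cpus(); records the choice. */
static enum play_dead_method last_method;

static int fake_disable_nonboot_cpus(void)
{
	last_method = pick_play_dead(true);
	return 0;
}

/* Mirrors the shape of hibernate_resume_nonboot_cpu_disable(). */
static int resume_nonboot_cpu_disable(void)
{
	int ret;

	assert(!force_hlt_play_dead);	/* the sketch is not re-entrant */
	force_hlt_play_dead = true;
	ret = fake_disable_nonboot_cpus();
	force_hlt_play_dead = false;
	return ret;
}
```

The flag is only set for the duration of the offlining call, so ordinary CPU
hotplug outside of hibernation resume keeps preferring the mwait path in
this model.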

A slightly unpleasant consequence of this change is that if the
system is hibernated with one or more CPUs offline, it will generally
draw more power after resume than it did before hibernation, because
the physical state entered by CPUs via hlt_play_dead() is higher-power
than the mwait_play_dead() one in the majority of cases.  It is
possible to work around this, but it is unclear how much of a problem
that's going to be in practice, so the workaround will be implemented
later if it turns out to be necessary.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=106371
Reported-by: Varun Koyyalagunta 
Original-by: Chen Yu 
Signed-off-by: Rafael J. Wysocki 
---

This is a slightly rearranged new version of

https://patchwork.kernel.org/patch/9217459/

---
 arch/x86/include/asm/cpu.h |6 ++
 arch/x86/kernel/smpboot.c  |3 +++
 arch/x86/power/cpu.c   |   21 +
 kernel/power/hibernate.c   |7 ++-
 kernel/power/power.h   |2 ++
 5 files changed, 38 insertions(+), 1 deletion(-)

Index: linux-pm/kernel/power/hibernate.c
===
--- linux-pm.orig/kernel/power/hibernate.c
+++ linux-pm/kernel/power/hibernate.c
@@ -409,6 +409,11 @@ int hibernation_snapshot(int platform_mo
goto Close;
 }
 
+int __weak hibernate_resume_nonboot_cpu_disable(void)
+{
+   return disable_nonboot_cpus();
+}
+
 /**
  * resume_target_kernel - Restore system state from a hibernation image.
  * @platform_mode: Whether or not to use the platform driver.
@@ -433,7 +438,7 @@ static int resume_target_kernel(bool pla
if (error)
goto Cleanup;
 
-   error = disable_nonboot_cpus();
+   error = hibernate_resume_nonboot_cpu_disable();
if (error)
goto Enable_cpus;
 
Index: linux-pm/kernel/power/power.h
===
--- linux-pm.orig/kernel/power/power.h
+++ linux-pm/kernel/power/power.h
@@ -38,6 +38,8 @@ static inline char *check_image_kernel(s
 }
 #endif /* CONFIG_ARCH_HIBERNATION_HEADER */
 
+extern int hibernate_resume_nonboot_cpu_disable(void);
+
 /*
  * Keep some memory free so that I/O operations can succeed without paging
  * [Might this be more than 4 MB?]
Index: linux-pm/arch/x86/power/cpu.c
===
--- linux-pm.orig/arch/x86/power/cpu.c
+++ linux-pm/arch/x86/power/cpu.c
@@ -266,6 +266,27 @@ void notrace restore_processor_state(voi
 EXPORT_SYMBOL(restore_processor_state);
 #endif
 
+#if defined(CONFIG_HIBERNATION) && defined(CONFIG_HOTPLUG_CPU)
+bool force_hlt_play_dead __read_mostly;
+
+int hibernate_resume_nonboot_cpu_disable(void)
+{
+   int ret;
+
+   /*
+* Ensure that MONITOR/MWAIT will not be used in the "play dead" loop
+* during hibernate image restoration, because it is likely that the
+* monitored address will be actually written to at that time and then
+* the "dead" CPU may start executing instructions from an image
+* kernel's page (and that may not be the "play dead" loop any more).
+*/
+   force_hlt_play_dead = true;
+   ret = disable_nonboot_cpus();
+   force_hlt_play_dead = false;
+   return ret;
+}
+#endif
+
 /*
  * When bsp_check() is called in hibernate and suspend, cpu hotplug
  * is disabled already. So it's unnessary to handle race condition between
Index: linux-pm/arch/x86/kernel/smpboot.c
===
--- linux-pm.orig/arch/x86/kernel/smpboot.c
+++ linux-pm/arch/x86/kernel/smpboot.c

Re: [PATCH] mtd: nand: brcmnand: Change BUG_ON in brcmnand_send_cmd

2016-07-09 Thread Brian Norris
On Fri, Jul 08, 2016 at 10:36:39AM -0700, Florian Fainelli wrote:
> Change the BUG_ON() condition in brcmnand_send_cmd() which checks for
> the interrupt status "controller ready" bit to a WARN_ON.
> 
> There is no good reason to kill the system when this condition occurs
> because we could have systems which listed the NAND controller as
> available (e.g: from Device Tree), but the NAND chip could be
> malfunctioning and not responding.
> 
> Signed-off-by: Florian Fainelli 

Acked-by: Brian Norris 

> ---
> Note that I even hesitated to remove that completely, but there is
> some value in knowing about this condition since it helps figuring
> out what could be wrong.
> 
>  drivers/mtd/nand/brcmnand/brcmnand.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/mtd/nand/brcmnand/brcmnand.c b/drivers/mtd/nand/brcmnand/brcmnand.c
> index b6062a2f3dfd..72bdc283778d 100644
> --- a/drivers/mtd/nand/brcmnand/brcmnand.c
> +++ b/drivers/mtd/nand/brcmnand/brcmnand.c
> @@ -1165,7 +1165,7 @@ static void brcmnand_send_cmd(struct brcmnand_host *host, int cmd)
>   ctrl->cmd_pending = cmd;
>  
>   intfc = brcmnand_read_reg(ctrl, BRCMNAND_INTFC_STATUS);
> - BUG_ON(!(intfc & INTFC_CTLR_READY));
> + WARN_ON(!(intfc & INTFC_CTLR_READY));
>  
>   mb(); /* flush previous writes */
>   brcmnand_write_reg(ctrl, BRCMNAND_CMD_START,
> -- 
> 2.7.4
> 


Re: [PATCH v6 00/10] acpi, clocksource: add GTDT driver and GTDT support in arm_arch_timer

2016-07-09 Thread Rafael J. Wysocki
On Saturday, July 09, 2016 11:44:47 AM Hanjun Guo wrote:
> On 2016/7/8 21:22, Lorenzo Pieralisi wrote:
> > On Thu, Jul 07, 2016 at 03:58:04PM +0200, Rafael J. Wysocki wrote:
> >
> > [...]
> >
> >>> Anyway let's avoid these petty arguments, I agree there must be some
> >>> sort of ARM64 ACPI maintainership for the reasons you mentioned above.
> >>
> >> To avoid confusion on who's going to push stuff to Linus, I can do
> >> that, but it must be clear whose ACKs are needed for that to happen.
> >> That may be one person or all of you, whatever you decide.
> >
> > I think the reasoning is the same: to avoid confusion and avoid stepping
> > on each other's toes it is best to have a single gatekeeper (still
> > multiple maintainer entries to keep patches reviewed correctly), if no
> > one complains I will do that and a) provide ACKs (I will definitely
> > require and request Hanjun and Sudeep ones too appropriately on a per
> > patch basis) and b) send you pull requests.
> 
> Fine to me.
> 
> >
> > Having a maintainer per file would be farcical, I really do not
> 
> Agree, but having three of us in maintainer entries in MAINTAINERS
> file will help the patches be reviewed correctly with more eyes.
> 
> > expect that amount of traffic for drivers/acpi/arm64 therefore I
> > really doubt there is any risk of me slowing things down.
> >
> > Does this sound reasonable ? Comments/complaints welcome, please
> > manifest yourselves.
> 
> Fair enough. What concerns me most is landing ACPI on ARM64 soundly,
> let's do that :)
> 
> OK, let's get back to this patch set. Fuwei has already prepared a new
> version of the patches [1] (moving acpi_gtdt.c to drivers/acpi/arm64/ and
> adding a maintainer entries patch). Shall we review and comment on this
> patch set for now, or just let Fuwei send out the new version?

Frankly, I don't see a point in discussing the old version if a new
one is already available.  Post it, please.

Thanks,
Rafael



[PATCH] media: solo6x10: increase FRAME_BUF_SIZE

2016-07-09 Thread Andrey Utkin
In practice, devices sometimes return frames larger than the current buffer
size, leading to failure in solo_send_desc().
It is not clear what minimal increase in buffer size would be enough,
so this patch roughly doubles it (from 196 KiB to 400 KiB), which should
safely be sufficient.
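
For a sense of the sizes involved, here is a small standalone sketch of the
arithmetic. The two constants come from the diff below; frame_fits() is a
hypothetical helper written for this illustration and is not part of the
driver, and the 250 KiB frame mentioned afterwards is a made-up example
size, not a measured one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define OLD_FRAME_BUF_SIZE (196 * 1024)	/* 200704 bytes */
#define NEW_FRAME_BUF_SIZE (400 * 1024)	/* 409600 bytes */

/* Hypothetical helper: would a frame of this size fit in the buffer? */
static bool frame_fits(size_t frame_size, size_t buf_size)
{
	assert(buf_size > 0);
	return frame_size <= buf_size;
}
```

With these numbers, a hypothetical 250 KiB frame does not fit in the old
buffer but fits in the new one, and the new size is slightly more than
twice the old one.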

Signed-off-by: Andrey Utkin 
---
 drivers/media/pci/solo6x10/solo6x10-v4l2-enc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/media/pci/solo6x10/solo6x10-v4l2-enc.c b/drivers/media/pci/solo6x10/solo6x10-v4l2-enc.c
index 8b1cde5..3991643 100644
--- a/drivers/media/pci/solo6x10/solo6x10-v4l2-enc.c
+++ b/drivers/media/pci/solo6x10/solo6x10-v4l2-enc.c
@@ -33,7 +33,7 @@
 #include "solo6x10-jpeg.h"
 
 #define MIN_VID_BUFFERS2
-#define FRAME_BUF_SIZE (196 * 1024)
+#define FRAME_BUF_SIZE (400 * 1024)
 #define MP4_QS 16
 #define DMA_ALIGN  4096
 
-- 
2.8.4



[RFC PATCH v3 2/2] mm, thp: convert from optimistic swapin collapsing to conservative

2016-07-09 Thread Ebru Akagunduz
To decide whether a khugepaged swapin is worthwhile, this patch checks
the number of young pages. At least half of HPAGE_PMD_NR pages should be
young for the swapin to go ahead.
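
The threshold can be written down as a one-line predicate. The sketch below
is a userspace illustration only: swapin_worthwhile() is a hypothetical
name, and HPAGE_PMD_NR is hard-coded to 512 on the assumption of 2 MiB huge
pages built from 4 KiB base pages; in the kernel it is derived from the
architecture's page-table geometry.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 2 MiB huge page / 4 KiB base page = 512 entries. */
#define HPAGE_PMD_NR 512

/*
 * Hypothetical predicate: swap in only if at least half of the pages
 * in the candidate range were found referenced (young).
 */
static bool swapin_worthwhile(int referenced)
{
	assert(referenced >= 0 && referenced <= HPAGE_PMD_NR);
	return referenced >= HPAGE_PMD_NR / 2;
}
```

Under that assumption the cut-off is 256 referenced pages: 255 is rejected
and 256 is accepted.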

Signed-off-by: Ebru Akagunduz 
Suggested-by: Minchan Kim 
---
Changes in v2:
 - Don't change thp design, only notice amount of young
   pages, if khugepaged needs to swapin (Minchan Kim).
 - Print out count of referenced pages in
   __collapse_huge_page_swapin() (Ebru Akagunduz)

Changes in v3:
 - After khugepaged extracted from huge_memory.c,
   changes moved to khugepaged.c

 include/trace/events/huge_memory.h | 19 +++
 mm/khugepaged.c| 38 +++---
 2 files changed, 34 insertions(+), 23 deletions(-)

diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h
index 830d47d..04f58ac 100644
--- a/include/trace/events/huge_memory.h
+++ b/include/trace/events/huge_memory.h
@@ -13,7 +13,7 @@
EM( SCAN_EXCEED_NONE_PTE,   "exceed_none_pte")  \
EM( SCAN_PTE_NON_PRESENT,   "pte_non_present")  \
EM( SCAN_PAGE_RO,   "no_writable_page") \
-   EM( SCAN_NO_REFERENCED_PAGE,"no_referenced_page")   \
+   EM( SCAN_LACK_REFERENCED_PAGE,  "lack_referenced_page") \
EM( SCAN_PAGE_NULL, "page_null")\
EM( SCAN_SCAN_ABORT,"scan_aborted") \
EM( SCAN_PAGE_COUNT,"not_suitable_page_count")  \
@@ -47,7 +47,7 @@ SCAN_STATUS
 TRACE_EVENT(mm_khugepaged_scan_pmd,
 
TP_PROTO(struct mm_struct *mm, struct page *page, bool writable,
-bool referenced, int none_or_zero, int status, int unmapped),
+int referenced, int none_or_zero, int status, int unmapped),
 
TP_ARGS(mm, page, writable, referenced, none_or_zero, status, unmapped),
 
@@ -55,7 +55,7 @@ TRACE_EVENT(mm_khugepaged_scan_pmd,
__field(struct mm_struct *, mm)
__field(unsigned long, pfn)
__field(bool, writable)
-   __field(bool, referenced)
+   __field(int, referenced)
__field(int, none_or_zero)
__field(int, status)
__field(int, unmapped)
@@ -108,14 +108,14 @@ TRACE_EVENT(mm_collapse_huge_page,
 TRACE_EVENT(mm_collapse_huge_page_isolate,
 
TP_PROTO(struct page *page, int none_or_zero,
-bool referenced, bool  writable, int status),
+int referenced, bool  writable, int status),
 
TP_ARGS(page, none_or_zero, referenced, writable, status),
 
TP_STRUCT__entry(
__field(unsigned long, pfn)
__field(int, none_or_zero)
-   __field(bool, referenced)
+   __field(int, referenced)
__field(bool, writable)
__field(int, status)
),
@@ -138,25 +138,28 @@ TRACE_EVENT(mm_collapse_huge_page_isolate,
 
 TRACE_EVENT(mm_collapse_huge_page_swapin,
 
-   TP_PROTO(struct mm_struct *mm, int swapped_in, int ret),
+   TP_PROTO(struct mm_struct *mm, int swapped_in, int referenced, int ret),
 
-   TP_ARGS(mm, swapped_in, ret),
+   TP_ARGS(mm, swapped_in, referenced, ret),
 
TP_STRUCT__entry(
__field(struct mm_struct *, mm)
__field(int, swapped_in)
+   __field(int, referenced)
__field(int, ret)
),
 
TP_fast_assign(
__entry->mm = mm;
__entry->swapped_in = swapped_in;
+   __entry->referenced = referenced;
__entry->ret = ret;
),
 
-   TP_printk("mm=%p, swapped_in=%d, ret=%d",
+   TP_printk("mm=%p, swapped_in=%d, referenced=%d, ret=%d",
__entry->mm,
__entry->swapped_in,
+   __entry->referenced,
__entry->ret)
 );
 
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 5661484..7dbee69 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -27,7 +27,7 @@ enum scan_result {
SCAN_EXCEED_NONE_PTE,
SCAN_PTE_NON_PRESENT,
SCAN_PAGE_RO,
-   SCAN_NO_REFERENCED_PAGE,
+   SCAN_LACK_REFERENCED_PAGE,
SCAN_PAGE_NULL,
SCAN_SCAN_ABORT,
SCAN_PAGE_COUNT,
@@ -500,8 +500,8 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 {
struct page *page = NULL;
pte_t *_pte;
-   int none_or_zero = 0, result = 0;
-   bool referenced = false, writable = false;
+   int none_or_zero = 0, result = 0, referenced = 0;
+   bool writable = false;
 
for (_pte = pte; _pte < pte+HPAGE_PMD_NR;
 _pte++, address += PAGE_SIZE) {
@@ -580,11 +580,11 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
VM_BUG_ON_PAGE(!PageLocked(page), page);
VM_BUG_ON_PAGE(PageLRU(page), page);
 
-   /* If there is n

[PATCH v3 1/2] mm, thp: fix comment inconsistency for swapin readahead functions

2016-07-09 Thread Ebru Akagunduz
After fixing swapin issues, comment lines stayed as in old version.
This patch updates the comments.

Signed-off-by: Ebru Akagunduz 
Cc: Hillf Danton 
---
Changes in v2:
 - Newly created in this version.

Changes in v3:
 - Replace Reported-by with Cc (Hillf Danton)
 - Remove RFC tag (Hillf Danton)
 - After khugepaged extracted from huge_memory.c,
   changes moved to khugepaged.c

 mm/khugepaged.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 93d5f87..5661484 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -891,9 +891,10 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm,
/* do_swap_page returns VM_FAULT_RETRY with released mmap_sem */
if (ret & VM_FAULT_RETRY) {
down_read(&mm->mmap_sem);
-   /* vma is no longer available, don't continue to swapin */
-   if (hugepage_vma_revalidate(mm, address))
+   if (hugepage_vma_revalidate(mm, address)) {
+   /* vma is no longer available, don't continue to swapin */
return false;
+   }
/* check if the pmd is still valid */
if (mm_find_pmd(mm, address) != pmd)
return false;
@@ -969,7 +970,7 @@ static void collapse_huge_page(struct mm_struct *mm,
 
/*
 * __collapse_huge_page_swapin always returns with mmap_sem locked.
-* If it fails, release mmap_sem and jump directly out.
+* If it fails, we release mmap_sem and jump out_nolock.
 * Continuing to collapse causes inconsistency.
 */
if (!__collapse_huge_page_swapin(mm, vma, address, pmd)) {
-- 
1.9.1



[RFC PATCH v3 0/2] mm, thp: convert from optimistic swapin collapsing to conservative

2016-07-09 Thread Ebru Akagunduz
This patch series fixes a comment inconsistency and makes khugepaged
decide whether to swap in based on the number of young pages.

Changes in v2:
 - Don't change thp design, notice young pages
   if needs to swapin
 - Add comment line fixing patch

Changes in v3:
 - Remove revert patch (allocstall), the patch automatically
   dropped
 - Set comment line fixing patch as first part of the series
 - Move changes from huge_memory.c to khugepaged.c

Ebru Akagunduz (2):
  mm, thp: fix comment inconsistency for swapin readahead functions
  mm, thp: convert from optimistic swapin collapsing to conservative

 include/trace/events/huge_memory.h | 19 +---
 mm/khugepaged.c| 45 +++---
 2 files changed, 38 insertions(+), 26 deletions(-)

-- 
1.9.1



Re: [PATCH v7 3/4] perf: xgene: Add APM X-Gene SoC Performance Monitoring Unit driver

2016-07-09 Thread Paul Gortmaker
On Wed, Jul 6, 2016 at 8:07 PM, Tai Nguyen  wrote:
> Signed-off-by: Tai Nguyen 
> ---
>  Documentation/perf/xgene-pmu.txt |   48 ++
>  drivers/perf/Kconfig |7 +
>  drivers/perf/Makefile|1 +
>  drivers/perf/xgene_pmu.c | 1398 
> ++
>  4 files changed, 1454 insertions(+)
>  create mode 100644 Documentation/perf/xgene-pmu.txt
>  create mode 100644 drivers/perf/xgene_pmu.c
>

[...]

> diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
> index 04e2653..4d5c5f9 100644
> --- a/drivers/perf/Kconfig
> +++ b/drivers/perf/Kconfig
> @@ -12,4 +12,11 @@ config ARM_PMU
>   Say y if you want to use CPU performance monitors on ARM-based
>   systems.
>
> +config XGENE_PMU
> +depends on PERF_EVENTS && ARCH_XGENE
> +bool "APM X-Gene SoC PMU"

If the driver is bool, then please avoid using module.h and anything from
within it.  They are either no-ops when built in, or there are non-modular
equivalents available, so it is entirely avoidable, and makes for smaller
and better code.

> +default n
> +help
> +  Say y if you want to use APM X-Gene SoC performance monitors.
> +
>  endmenu
> diff --git a/drivers/perf/Makefile b/drivers/perf/Makefile
> index acd2397..b116e98 100644
> --- a/drivers/perf/Makefile
> +++ b/drivers/perf/Makefile
> @@ -1 +1,2 @@
>  obj-$(CONFIG_ARM_PMU) += arm_pmu.o
> +obj-$(CONFIG_XGENE_PMU) += xgene_pmu.o

[...]
ver = {
> +   .name   = "xgene-pmu",
> +   .of_match_table = xgene_pmu_of_match,
> +   .acpi_match_table = ACPI_PTR(xgene_pmu_acpi_match),
> +   },
> +};
> +
> +module_platform_driver(xgene_pmu_driver);

builtin_platform_driver

> +
> +MODULE_DESCRIPTION("APM X-Gene SoC PMU driver");
> +MODULE_AUTHOR("Hoan Tran ");
> +MODULE_AUTHOR("Tai Nguyen ");
> +MODULE_LICENSE("GPL");

As long as this information is at the top of the file, then these can
go away too -- just like MODULE_DEVICE_TABLE they are no-op.

Paul.


bug in memcg oom-killer results in a hung syscall in another process in the same cgroup

2016-07-09 Thread Shayan Pooya
I came across the following issue in kernel 3.16 (Ubuntu 14.04), which
was then reproduced in kernel 4.4 LTS:
After a couple of memcg oom-kills in a cgroup, a syscall in
*another* process in the same cgroup hangs indefinitely.

Reproducing:

# mkdir -p strace_run
#  mkdir /sys/fs/cgroup/memory/1
# echo 1073741824 > /sys/fs/cgroup/memory/1/memory.limit_in_bytes
# echo 0 > /sys/fs/cgroup/memory/1/memory.swappiness
# for i in $(seq 1000); do ./call-mem-hog /sys/fs/cgroup/memory/1/cgroup.procs & done

Where call-mem-hog is:
#!/bin/sh
set -ex
echo $$ > $1
echo "Adding $$ to $1"
strace -ff -tt ./mem-hog 2> strace_run/$$


Initially I thought it was a userspace bug in dash as it only happened
with /bin/sh (which points to dash) and not with bash. I see the
following hanging processes:

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root     20999  0.0  0.0   4508   100 pts/6    S    16:28   0:00 /bin/sh ./call-mem-hog /sys/fs/cgroup/memory/1/cgroup.procs

However, when using strace, I noticed that sometimes there is actually
a mem-hog process hanging in the sbrk syscall (memory.oom_control is 0,
so this is not expected).
Sending an ABRT signal to the waiting strace process then resulted in
the mem-hog process getting oom-killed by the kernel.


Re: [PATCH v6 3/5] usb: dwc3: add phyif_utmi_quirk

2016-07-09 Thread Heiko Stuebner
Am Samstag, 9. Juli 2016, 11:38:00 schrieb William.wu:
> Dear Heiko & Balbi,
> 
> On 2016/7/8 21:29, Felipe Balbi wrote:
> > Hi,
> > 
> > Heiko Stuebner  writes:
> >> Am Donnerstag, 7. Juli 2016, 10:54:24 schrieb William Wu:
> >>> Add a quirk to configure the core to support the
> >>> UTMI+ PHY with an 8- or 16-bit interface. UTMI+ PHY
> >>> interface is hardware property, and it's platform
> >>> dependent. Normall, the PHYIf can be configured
> >>> during coreconsultant. But for some specific usb
> >>> cores(e.g. rk3399 soc dwc3), the default PHYIf
> >>> configuration value is fault, so we need to
> >>> reconfigure it by software.
> >>> 
> >>> And refer to the dwc3 databook, the GUSB2PHYCFG.USBTRDTIM
> >>> must be set to the corresponding value according to
> >>> the UTMI+ PHY interface.
> >>> 
> >>> Signed-off-by: William Wu 
> >>> ---
> >> 
> >> [...]
> >> 
> >>> diff --git a/Documentation/devicetree/bindings/usb/dwc3.txt
> >>> b/Documentation/devicetree/bindings/usb/dwc3.txt index
> >>> 020b0e9..8d7317d
> >>> 100644
> >>> --- a/Documentation/devicetree/bindings/usb/dwc3.txt
> >>> +++ b/Documentation/devicetree/bindings/usb/dwc3.txt
> >>> 
> >>> @@ -42,6 +42,10 @@ Optional properties:
> >>>- snps,dis-u2-freeclk-exists-quirk: when set, clear the
> >>> 
> >>> u2_freeclk_exists in GUSB2PHYCFG, specify that USB2 PHY doesn't
> >>> provide
> >>> 
> >>>   a free-running PHY clock.
> >>> 
> >>> + - snps,phyif-utmi-quirk: when set core will set phyif UTMI+ interface.
> >>> + - snps,phyif-utmi: the value to configure the core to support a UTMI+ PHY
> >>> +   with an 8- or 16-bit interface. Value 0 select 8-bit
> >>> +   interface, value 1 select 16-bit interface.
> >> 
> >> maybe
> >> 
> >>snps,phyif-utmi-width = <8> or <16>;
> >> 
> >> devicetree is about describing the hardware, not the things that get
> >> written to registers :-) . The conversion from the described width to
> >> the register value can easily be done in the driver.
> 
> Thanks for your suggestion:-)
> Yes, “snps,phyif-utmi-width = <8> or <16>” is much clearer and easier to
> understand.
> And I have considered the same dts property for phyif-utmi, but I have
> no good idea about
> the conversion from described width to the registers value for the time
> being.
> 
> About phyif utmi width configuration, we need to set two places in
> GUSB2PHYCFG register,
> according to DWC3 USB3.0 controller databook version3.00a,6.3.46
> GUSB2PHYCFG
> 
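> ----------------------------------------------------------------------
>  Bits   | Name      | Description
> ----------------------------------------------------------------------
>  13:10  | USBTRDTIM | Sets the turnaround time in PHY clocks.
>         |           | 4'h5: When the MAC interface is 16-bit UTMI+
>         |           | 4'h9: When the MAC interface is 8-bit UTMI+/ULPI.
> ----------------------------------------------------------------------
>  3      | PHYIF     | If UTMI+ is selected, the application uses this
>         |           | bit to configure core to support a UTMI+ PHY
>         |           | with an 8- or 16-bit interface.
>         |           | 1'b0: 8 bits
>         |           | 1'b1: 16 bits
> ----------------------------------------------------------------------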
> 
> 
> And I think maybe I can try to do this:
> change it in dts:
>  snps,phyif-utmi-width = <8> or <16>;
> 
> Then convert to register value like this:
> device_property_read_u8(dev, "snps,phyif-utmi-width",
>   &phyif_utmi_width);
> 
> dwc->phyif_utmi = phyif_utmi_width >> 4;
> 
> After the conversion, dwc->phyif_utmi value 0 means 8 bits and value 1
> means 16 bits, which makes it easier for us to configure GUSB2PHYCFG.
> 
> Is it OK?

or you could just store the actual width value read from the dts and make 
the core handle accordingly, making everything a bit more explicit.

I guess personally I'd do something like:

make dwc->phyif_utmi a regular unsigned int

in probe:
ret = device_property_read_u8(dev, "snps,phyif-utmi-width",
  &dwc->phyif_utmi);
if (ret < 0) {
dwc->phyif_utmi = 0;
} else if (dwc->phyif_utmi != 16 && dwc->phyif_utmi != 8) {
dev_err(dev, "unsupported utmi interface width %d\n",
dwc->phyif_utmi);
return -EINVAL;
}


when setting your GUSB2PHYCFG register:

   if (dwc->phyif_utmi > 0) {
   reg &= ~(DWC3_GUSB2PHYCFG_PHYIF_MASK |
  DWC3_GUSB2PHYCFG_USBTRDTIM_MASK);
   usbtrdtim = (dwc->phyif_utmi == 1
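
The width-to-register mapping discussed in this thread can be sketched
as standalone C. The bit positions (PHYIF at bit 3, USBTRDTIM at bits
13:10) and the values (4'h9 for 8-bit, 4'h5 for 16-bit) come from the
databook excerpt quoted above; the macro names and the helper
apply_utmi_width() are hypothetical and are not the dwc3 driver's own
identifiers:

```c
#include <stdint.h>

/* Hypothetical names; positions follow the quoted GUSB2PHYCFG table. */
#define PHYIF_BIT        (1u << 3)
#define USBTRDTIM_SHIFT  10
#define USBTRDTIM_MASK   (0xfu << USBTRDTIM_SHIFT)

/*
 * Update a GUSB2PHYCFG-style register value for a UTMI+ width of 8 or
 * 16 bits.  Returns 0 on success, or -1 (leaving *reg untouched) for
 * any other width.
 */
static int apply_utmi_width(uint32_t *reg, unsigned int width)
{
	uint32_t val = *reg & ~(PHYIF_BIT | USBTRDTIM_MASK);

	switch (width) {
	case 8:
		/* PHYIF = 0, USBTRDTIM = 4'h9 */
		val |= 9u << USBTRDTIM_SHIFT;
		break;
	case 16:
		/* PHYIF = 1, USBTRDTIM = 4'h5 */
		val |= PHYIF_BIT | (5u << USBTRDTIM_SHIFT);
		break;
	default:
		return -1;
	}

	*reg = val;
	return 0;
}
```

Keeping the described width (8 or 16) as the input means the device
tree stays a description of the hardware, and the translation to
register bits lives in one place in the driver.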

Re: [PATCH V2 04/10] firmware: tegra: add IVC library

2016-07-09 Thread Paul Gortmaker
On Tue, Jul 5, 2016 at 5:04 AM, Joseph Lo  wrote:
> The Inter-VM communication (IVC) is a communication protocol, which is
> designed for interprocessor communication (IPC) or the communication
> between the hypervisor and the virtual machine with a guest OS on it. So
> it can be translated as inter-virtual memory or inter-virtual machine
> communication. The message channels are maintained on the DRAM or SRAM
> and the data coherency should be considered. Or the data could be
> corrupted or out of date when the remote client checking it.
>
> Inside the IVC, it maintains memory-based descriptors for the TX/RX
> channels and the coherency issue of the counter and payloads. So the
> clients can use it to send/receive messages to/from remote ones.
>
> We introduce it as a library for the firmware drivers, which can use it
> for IPC.
>
> Based-on-the-work-by:
> Peter Newman 
>
> Signed-off-by: Joseph Lo 
> ---
> Changes in V2:
> - None
> ---
>  drivers/firmware/Kconfig|   1 +
>  drivers/firmware/Makefile   |   1 +
>  drivers/firmware/tegra/Kconfig  |  13 +
>  drivers/firmware/tegra/Makefile |   1 +
>  drivers/firmware/tegra/ivc.c| 659 
> 
>  include/soc/tegra/ivc.h | 102 +++
>  6 files changed, 777 insertions(+)
>  create mode 100644 drivers/firmware/tegra/Kconfig
>  create mode 100644 drivers/firmware/tegra/Makefile
>  create mode 100644 drivers/firmware/tegra/ivc.c
>  create mode 100644 include/soc/tegra/ivc.h
>
> diff --git a/drivers/firmware/Kconfig b/drivers/firmware/Kconfig
> index 5e618058defe..bbd64ae8c4c6 100644
> --- a/drivers/firmware/Kconfig
> +++ b/drivers/firmware/Kconfig
> @@ -200,5 +200,6 @@ config HAVE_ARM_SMCCC
>  source "drivers/firmware/broadcom/Kconfig"
>  source "drivers/firmware/google/Kconfig"
>  source "drivers/firmware/efi/Kconfig"
> +source "drivers/firmware/tegra/Kconfig"
>
>  endmenu
> diff --git a/drivers/firmware/Makefile b/drivers/firmware/Makefile
> index 474bada56fcd..9a4df8171cc4 100644
> --- a/drivers/firmware/Makefile
> +++ b/drivers/firmware/Makefile
> @@ -24,3 +24,4 @@ obj-y += broadcom/
>  obj-$(CONFIG_GOOGLE_FIRMWARE)  += google/
>  obj-$(CONFIG_EFI)  += efi/
>  obj-$(CONFIG_UEFI_CPER)+= efi/
> +obj-y  += tegra/
> diff --git a/drivers/firmware/tegra/Kconfig b/drivers/firmware/tegra/Kconfig
> new file mode 100644
> index ..1fa3e4e136a5
> --- /dev/null
> +++ b/drivers/firmware/tegra/Kconfig
> @@ -0,0 +1,13 @@
> +menu "Tegra firmware driver"
> +
> +config TEGRA_IVC
> +   bool "Tegra IVC protocol"

If this driver is not tristate, then why does the driver include the
module.h header below?

> +   depends on ARCH_TEGRA
> +   help
> + IVC (Inter-VM Communication) protocol is part of the IPC
> + (Inter Processor Communication) framework on Tegra. It maintains the
> + data and the different commuication channels in SysRAM or RAM and
> + keeps the content is synchronization between host CPU and remote
> + processors.
> +
> +endmenu
> diff --git a/drivers/firmware/tegra/Makefile b/drivers/firmware/tegra/Makefile
> new file mode 100644
> index ..92e2153e8173
> --- /dev/null
> +++ b/drivers/firmware/tegra/Makefile
> @@ -0,0 +1 @@
> +obj-$(CONFIG_TEGRA_IVC)+= ivc.o
> diff --git a/drivers/firmware/tegra/ivc.c b/drivers/firmware/tegra/ivc.c
> new file mode 100644
> index ..3e736bb9915a
> --- /dev/null
> +++ b/drivers/firmware/tegra/ivc.c
> @@ -0,0 +1,659 @@
> +/*
> + * Copyright (c) 2014-2016, NVIDIA CORPORATION.  All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + */
> +
> +#include 
 ^

I'm sure it "works" since module.h includes nearly everything else,
but that is less than ideal for exactly the same reason.

Thanks,
Paul.
--

> +
> +#include 
> +
> +#define IVC_ALIGN 64
> +


Re: [PATCH v2 2/6] clk: mvebu: Add the xtal clock for Armada 3700 SoC

2016-07-09 Thread Paul Gortmaker
On Thu, Jul 7, 2016 at 6:37 PM, Gregory CLEMENT
 wrote:
> This clock is the parent of all the Armada 3700 clocks. It is a fixed
> rate clock which depends on the gpio configuration read when resetting
> the SoC.
>
> Signed-off-by: Gregory CLEMENT 
> ---
>  drivers/clk/mvebu/Kconfig|  3 ++
>  drivers/clk/mvebu/Makefile   |  1 +
>  drivers/clk/mvebu/armada-37xx-xtal.c | 98 
> 
>  3 files changed, 102 insertions(+)
>  create mode 100644 drivers/clk/mvebu/armada-37xx-xtal.c
>
> diff --git a/drivers/clk/mvebu/Kconfig b/drivers/clk/mvebu/Kconfig
> index 3165da77d525..fddc8ac5faff 100644
> --- a/drivers/clk/mvebu/Kconfig
> +++ b/drivers/clk/mvebu/Kconfig
> @@ -24,6 +24,9 @@ config ARMADA_39X_CLK
> bool
> select MVEBU_CLK_COMMON
>
> +config ARMADA_37XX_CLK
> +   bool
> +

Since the driver is not tristate, can you please remove all modular
references from it?   With the author and license etc. at the top you
can just delete the last three lines, the DEVICE_TABLE and register
with builtin_platform_driver, and then no need for module.h either.

Either that, or change it to a tristate, if that use case makes sense.

Thanks,
Paul.
--


>  config ARMADA_XP_CLK
> bool
> select MVEBU_CLK_COMMON
> diff --git a/drivers/clk/mvebu/Makefile b/drivers/clk/mvebu/Makefile
> index 7172ef65693d..4257a36d0219 100644
> --- a/drivers/clk/mvebu/Makefile
> +++ b/drivers/clk/mvebu/Makefile
> @@ -6,6 +6,7 @@ obj-$(CONFIG_ARMADA_370_CLK)+= armada-370.o
>  obj-$(CONFIG_ARMADA_375_CLK)   += armada-375.o
>  obj-$(CONFIG_ARMADA_38X_CLK)   += armada-38x.o
>  obj-$(CONFIG_ARMADA_39X_CLK)   += armada-39x.o
> +obj-$(CONFIG_ARMADA_37XX_CLK)  += armada-37xx-xtal.o
>  obj-$(CONFIG_ARMADA_XP_

[...]


Re: [PATCH 0/9] mm: Hardened usercopy

2016-07-09 Thread PaX Team
On 9 Jul 2016 at 14:27, Andy Lutomirski wrote:

> On Jul 6, 2016 6:25 PM, "Kees Cook"  wrote:
> >
> > Hi,
> >
> > This is a start of the mainline port of PAX_USERCOPY[1]. After I started
> > writing tests (now in lkdtm in -next) for Casey's earlier port[2], I
> > kept tweaking things further and further until I ended up with a whole
> > new patch series. To that end, I took Rik's feedback and made a number
> > of other changes and clean-ups as well.
> >
> 
> I like the series, but I have one minor nit to pick.  The effect of
> this series is to harden usercopy, but most of the code is really
> about infrastructure to validate that a pointed-to object is valid.

actually USERCOPY has never been about validating pointers. its sole purpose
is to validate the *size* argument of copy*user calls, a very specific form
of runtime bounds checking. it's only really relevant for slab objects and the
pointer checks (that one might mistake for being a part of the defense mechanism)
are only there to determine whether the kernel pointer refers to a slab object
or not (the stack part is a small bonus and was never the main goal either).
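
The size check described here can be illustrated with a small
standalone C sketch. This is only a conceptual model under stated
assumptions: the function name copy_size_ok() and the idea of passing
the object's base and size explicitly are hypothetical, whereas the
real code has to look up the containing slab object from the pointer:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: given a known object [obj, obj + obj_size),
 * report whether copying n bytes starting at ptr stays inside it.
 * The pointer comparison only locates ptr within the object; the
 * actual defense is the bound on n.
 */
static bool copy_size_ok(const void *obj, size_t obj_size,
			 const void *ptr, size_t n)
{
	uintptr_t base = (uintptr_t)obj;
	uintptr_t p = (uintptr_t)ptr;
	size_t offset;

	if (p < base)
		return false;

	offset = (size_t)(p - base);
	if (offset > obj_size)
		return false;

	/* Written this way to avoid overflow in offset + n. */
	return n <= obj_size - offset;
}
```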

> Might it make sense to call the infrastructure part something else?

yes, more bikeshedding will surely help, like the renaming of .data..read_only
to .data..ro_after_init which also had nothing to do with init but everything
to do with objects being conceptually read-only...

> After all, this could be extended in the future for memcpy or even for
> some GCC plugin to check pointers passed to ordinary (non-allocator)
> functions.

what kind of checks are you thinking of here? and more fundamentally, against
what kind of threats? as for memcpy, it's the standard mandated memory copying
function, what security related properties can it check on its pointer arguments?



Re: [PATCH 14/14] PCI: xgene: make it explicitly non-modular

2016-07-09 Thread Paul Gortmaker
[Re: [PATCH 14/14] PCI: xgene: make it explicitly non-modular] On 07/07/2016 (Thu 15:42) Duc Dang wrote:

> On Thu, Jul 7, 2016 at 3:35 PM, Tanmay Inamdar  wrote:
> >
> >
> > On Sat, Jul 2, 2016 at 4:13 PM, Paul Gortmaker
> >  wrote:
> >>
> >> The Kconfig currently controlling compilation of this code is:
> >>
> >> drivers/pci/host/Kconfig:config PCI_XGENE
> >> drivers/pci/host/Kconfig:   bool "X-Gene PCIe controller"
> >>
> >> ...meaning that it currently is not being built as a module by anyone.
> >>
> >> Lets remove the few trace uses of modular code and macros, so that
> >> when reading the driver there is no doubt it is builtin-only.
> >>
> >> Since module_platform_driver() uses the same init level priority as
> >> builtin_platform_driver() the init ordering remains unchanged with
> >> this commit.
> >>
> >> We also delete the MODULE_LICENSE tag etc. since all that information
> >> is already contained at the top of the file in the comments.
> >>
> >> Cc: Tanmay Inamdar 
> >> Cc: Bjorn Helgaas 
> >> Cc: linux-...@vger.kernel.org
> >> Signed-off-by: Paul Gortmaker 
> 
> Thanks for taking care of this, Paul.
> 
> I tested your patch and it worked fine on my X-Gene Mustang board.
> 
> One minor comment below.
> 
> >> ---
> >>  drivers/pci/host/pci-xgene.c | 8 ++--
> >>  1 file changed, 2 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/drivers/pci/host/pci-xgene.c b/drivers/pci/host/pci-xgene.c
> >> index 7eb20cc76dd3..a81273c23341 100644
> >> --- a/drivers/pci/host/pci-xgene.c
> >> +++ b/drivers/pci/host/pci-xgene.c
> >> @@ -21,7 +21,7 @@
> >>  #include 
> >>  #include 
> >>  #include 
> >> -#include 
> >> +#include 
> 
> The platform_device.h already has builtin_platform_driver macro
> defined. So this init.h is not need?

If you look, you will find that platform_device.h does not include the
init.h even though it references __init; it can do this w/o error since
all the references themselves are in a macro.  However once code wants
to be a consumer of those macros, they will need init.h present.  Often
you can overlook directly calling it out for inclusion since it gets
sourced by another header, but it is best policy to list what gets used.
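
The point about macros can be shown with a small userspace analogy. It
is not kernel code, and every name in it is hypothetical: a macro body
may mention a name that is not declared where the macro is defined,
and the missing declaration only matters once someone expands it.

```c
/*
 * "Header" part: the macro body mentions answer_helper(), but nothing
 * here declares it.  That is harmless as long as the macro is never
 * expanded.
 */
#define CALL_ANSWER_HELPER() answer_helper()

/*
 * "Consumer" part: before expanding the macro, the consumer has to make
 * the referenced name visible itself (the analogue of including init.h).
 */
static int answer_helper(void)
{
	return 42;
}

static int consumer(void)
{
	return CALL_ANSWER_HELPER();
}
```

If the definition of answer_helper() were removed, the translation
unit would still preprocess cleanly, and the error would appear only
at the point where consumer() expands the macro.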

Thanks for testing!

Paul.
--

> 
> >>  #include 
> >>  #include 
> >>  #include 
> >> @@ -579,8 +579,4 @@ static struct platform_driver xgene_pcie_driver = {
> >> },
> >> .probe = xgene_pcie_probe_bridge,
> >>  };
> >> -module_platform_driver(xgene_pcie_driver);
> >> -
> >> -MODULE_AUTHOR("Tanmay Inamdar ");
> >> -MODULE_DESCRIPTION("APM X-Gene PCIe driver");
> >> -MODULE_LICENSE("GPL v2");
> >> +builtin_platform_driver(xgene_pcie_driver);
> >
> >
> > Copying Duc.
> >>
> >> --
> >> 2.8.4
> >>
> >
> Regards,
> Duc Dang.


Re: [PATCH 0/3] ARM: dts: the dts support for rk3288 firefly reload

2016-07-09 Thread Heiko Stuebner
Hi Randy,

Am Samstag, 9. Juli 2016, 23:42:28 schrieb ayaka:
> On 07/08/2016 05:35 AM, Heiko Stuebner wrote:
> > Am Donnerstag, 7. Juli 2016, 02:22:57 schrieb Randy Li:
> >> The rk3288 firefly reload  is a Rockchip RK3288 based board be found by
> >> core board and main board. The regulators are connected in a different
> >> way to the previous version of firefly boards, it is necessary to
> >> move some common code to uncommon place.
> >> 
> >> I only tested the ethernet and confirmed that works.
> >> The usb in this board won't caused by the bugs in the driver.
> >> 
> >> This version follow the suggests from Heiko Stuebner,
> >> except the duplicated supply name problem, I don't think
> >> it could be fixed in that way.
> > 
> > I've now had a chance to look at that reload board on the firefly site.
> > Firefly also is the company name, so a board named that way is not
> > necessarily a "variant" :-) .
> > 
> > And looking at the "reload" board this definitly seems to be a very
> > different product with it being a system-on-module+baseboard design with
> > additional peripherals like that sata bridge, camera interfaces and
> > probably
> sata bridge is just a SATA to usb bridge and the "reload" bring back the
> DVP camera interface and
> a HDMI rx chip connected to the other MIPI camera interface.

there are always more things to control (reset pins, regulators) and the usb 
subsystem is currently in the process of getting support for such "embedded" 
uses.


> > more.
> > 
> > As you might've seen, most Rockchip boards are based on some reference-
> > design, so are similar in a big part of their core layout.
> 
> Yes, from the evb. But the even the main board of evb in rockchip
> company have at lease 3 versions
> as I known.
> Also the evb is found by power board, main board and core board.
> 
> > So, looking at the vastly different product the reload is, I'd really
> > like to have a separate dts for the reload, to not run into more
> > confusing differences later on.
> 
> The main problem is that power connections are different. That is why I
> decide to make a
> separate dts. If the kernel introduce the override dts, I could have a
> better way to implement
> it.

Just to make sure we're not talking about different things. This was meant 
to illustrate that even though core layouts often look similar we should not 
try to connect different product board files unnecessarily, as the small 
differences will make everything more complicated.

The "reload" definitely is a completely different product that only shares
the manufacturer (firefly) and the soc (rk3288) with the other product and,
as I wrote, should get its own independent dts file.


If anything you could do a split into a reload-core dtsi for the system-on-
module part and a baseboard dts that includes that (something like what is 
done for rk3288-rock2).



> > Also, when adding a new board, please also add an entry to
> > Documentation/devicetree/bindings/arm/rockchip.txt
> 
> I would send a patch set in a few days.
> 
> > Thanks
> > Heiko
> 
> Thank you for you review and you patient again

no problem, always nice to have more people play with Rockchip stuff on a 
mainline kernel :-)


Heiko


[PATCH] drm/vc4: remove redundant ret status check

2016-07-09 Thread Colin King
From: Colin Ian King 

At the current point where ret is being checked for non-zero it has
not changed since it was initialized to zero, hence the check and the
label unref are redundant and can be removed.

Signed-off-by: Colin Ian King 
---
 drivers/gpu/drm/vc4/vc4_drv.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/vc4/vc4_drv.c b/drivers/gpu/drm/vc4/vc4_drv.c
index 54d0471..0e4cf27 100644
--- a/drivers/gpu/drm/vc4/vc4_drv.c
+++ b/drivers/gpu/drm/vc4/vc4_drv.c
@@ -195,8 +195,6 @@ static int vc4_drm_bind(struct device *dev)
vc4_bo_cache_init(drm);
 
drm_mode_config_init(drm);
-   if (ret)
-   goto unref;
 
vc4_gem_init(drm);
 
@@ -218,7 +216,6 @@ unbind_all:
component_unbind_all(dev, drm);
 gem_destroy:
vc4_gem_destroy(drm);
-unref:
drm_dev_unref(drm);
vc4_bo_cache_destroy(drm);
return ret;
-- 
2.8.1



Re: [PATCH v2 0/6] net: ethernet: bgmac: Add platform device support

2016-07-09 Thread David Miller
From: Jon Mason 
Date: Thu,  7 Jul 2016 19:08:52 -0400

> David Miller, Please consider including patches 1-5 in net-next

Done.


Re: [PATCH] Need proper type casting before assignment, Remove compilation Warning.

2016-07-09 Thread David Miller
From: Arvind Yadav 
Date: Fri,  8 Jul 2016 00:07:54 +0530

> -Return type of 'qe_muram_alloc' is 'unsigned long', which was being
> assigned to ucc_fast_tx_virtual_fifo_base_offset and
> ucc_fast_rx_virtual_fifo_base_offset. These variables are 'unsigned int',
> so a proper type cast is needed before the assignment.
> 
> -Passing these values to IS_ERR_VALUE() is wrong, as that passes an 'int'
> into a function that takes an 'unsigned long' argument. This happens
> to work because the type is sign-extended on 64-bit architectures
> before it gets converted into an unsigned type.
> 
> -Passing an 'unsigned short' or 'unsigned int' argument into
> IS_ERR_VALUE() is guaranteed to be broken, as are 8-bit integers
> and types that are wider than 'unsigned long'.
> 
> -Any user will get a compilation warning for calls that do not pass an
> 'unsigned long' argument.
> 
> Signed-off-by: Arvind Yadav 

Your subject line is improperly formed.

It must have the subsystem or driver name, followed by a colon ":"
and a space.  Such as:

[PATCH] ucc_geth: Need proper type ...
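
The sign-extension behaviour described in the quoted patch description
can be reproduced in userspace. The macro below is a simplified
stand-in written for this illustration, not the kernel's actual
definition, and the helper names are hypothetical:

```c
#include <errno.h>
#include <stdbool.h>

#define MAX_ERRNO 4095

/* Simplified stand-in: an "error value" is one of the top 4095 values. */
#define IS_ERR_VALUE(x) ((unsigned long)(x) >= (unsigned long)-MAX_ERRNO)

/* A negative int is sign-extended before the comparison, so it is seen. */
static bool err_seen_as_int(void)
{
	int v = -ENOMEM;

	return IS_ERR_VALUE(v);
}

/*
 * An unsigned int is zero-extended instead; where unsigned long is
 * wider than unsigned int, the error code is no longer recognised.
 */
static bool err_seen_as_uint(void)
{
	unsigned int v = (unsigned int)-ENOMEM;

	return IS_ERR_VALUE(v);
}

/* An unsigned long keeps the full-width value, so it is seen. */
static bool err_seen_as_ulong(void)
{
	unsigned long v = (unsigned long)-ENOMEM;

	return IS_ERR_VALUE(v);
}
```

On a platform where unsigned long is 64 bits and unsigned int is 32
bits, only the unsigned int variant fails to detect the error, which
matches the breakage the description warns about.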



Re: [PATCH net-next 0/3] r8152: remove the redundant code

2016-07-09 Thread David Miller
From: Hayes Wang 
Date: Thu, 7 Jul 2016 15:09:17 +0800

> Remove the unnecessary code.

Series applied.


Minor PKRU bug?

2016-07-09 Thread Andy Lutomirski
is_prefetch in arch/x86/mm/fault.c can be called on a user address
that's not readable due to PKRU.  This could break it.  You might need
to add a get_user_exec or similar.

--Andy


Re: [PATCH 0/9] mm: Hardened usercopy

2016-07-09 Thread Andy Lutomirski
On Jul 6, 2016 6:25 PM, "Kees Cook"  wrote:
>
> Hi,
>
> This is a start of the mainline port of PAX_USERCOPY[1]. After I started
> writing tests (now in lkdtm in -next) for Casey's earlier port[2], I
> kept tweaking things further and further until I ended up with a whole
> new patch series. To that end, I took Rik's feedback and made a number
> of other changes and clean-ups as well.
>

I like the series, but I have one minor nit to pick.  The effect of
this series is to harden usercopy, but most of the code is really
about infrastructure to validate that a pointed-to object is valid.
Might it make sense to call the infrastructure part something else?
After all, this could be extended in the future for memcpy or even for
some GCC plugin to check pointers passed to ordinary (non-allocator)
functions.


Re: linux-next: Tree for Jul 8

2016-07-09 Thread Guenter Roeck
On Fri, Jul 08, 2016 at 06:03:38PM +1000, Stephen Rothwell wrote:
> Hi all,
> 
> Changes since 20160707:
> 
> New trees: netfilter and netfilter-next
> 
> The drm-msm tree gained a conflict against the arm tree.
> 
> The block tree gained conflicts against Linus' and the btrfs-kdave trees.
> 
> The userns tree gained a conflict against Linus' tree.
> 
> Non-merge commits (relative to Linus' tree): 7460
>  6931 files changed, 350754 insertions(+), 147233 deletions(-)
> 

Build results:
total: 148 pass: 136 fail: 12
Failed builds:
arc:defconfig
arc:allnoconfig
arc:tb10x_defconfig
arc:axs103_defconfig
arc:nsim_hs_smp_defconfig
arc:vdk_hs38_smp_defconfig
arm:allmodconfig
arm64:allmodconfig
hexagon:defconfig
hexagon:allnoconfig
mips:ath79_defconfig
mips:malta_defconfig

Qemu test results:
total: 107 pass: 95 fail: 12
Failed tests:
arm64:smp:defconfig
arm64:nosmp:defconfig
mips:malta_defconfig:nosmp
mips:malta_defconfig:smp
mips64:malta_defconfig:nosmp
mips64:malta_defconfig:smp
mipsel:malta_defconfig:nosmp
mipsel:malta_defconfig:smp
mipsel64:malta_defconfig:nosmp
mipsel64:malta_defconfig:smp
xtensa:dc233c:ml605:generic_kc705_defconfig
xtensa:dc233c:kc705:generic_kc705_defconfig

Details are available at http://kerneltests.org/builders.

Thanks,
Guenter



[PATCH v1] module: Fully remove the kernel_module_from_file hook

2016-07-09 Thread Mickaël Salaün
Fixes: a1db74209483 ("module: replace copy_module_from_fd with kernel version")

Signed-off-by: Mickaël Salaün 
Cc: Mimi Zohar 
Cc: Kees Cook 
Cc: Luis R. Rodriguez 
Cc: Rusty Russell 
Cc: Linus Torvalds 
Cc: Greg Kroah-Hartman 
---
 include/linux/lsm_hooks.h | 1 -
 include/linux/security.h  | 1 -
 2 files changed, 2 deletions(-)

diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index 7ae397669d8b..58c777ec8bcf 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -1455,7 +1455,6 @@ union security_list_options {
int (*kernel_act_as)(struct cred *new, u32 secid);
int (*kernel_create_files_as)(struct cred *new, struct inode *inode);
int (*kernel_module_request)(char *kmod_name);
-   int (*kernel_module_from_file)(struct file *file);
int (*kernel_read_file)(struct file *file, enum kernel_read_file_id id);
int (*kernel_post_read_file)(struct file *file, char *buf, loff_t size,
 enum kernel_read_file_id id);
diff --git a/include/linux/security.h b/include/linux/security.h
index 14df373ff2ca..2b8c7d2a3fd8 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -307,7 +307,6 @@ void security_transfer_creds(struct cred *new, const struct 
cred *old);
 int security_kernel_act_as(struct cred *new, u32 secid);
 int security_kernel_create_files_as(struct cred *new, struct inode *inode);
 int security_kernel_module_request(char *kmod_name);
-int security_kernel_module_from_file(struct file *file);
 int security_kernel_read_file(struct file *file, enum kernel_read_file_id id);
 int security_kernel_post_read_file(struct file *file, char *buf, loff_t size,
   enum kernel_read_file_id id);
-- 
2.8.1



Re: [Ksummit-discuss] 2016 Kernel Summit Planning Kickoff

2016-07-09 Thread Theodore Ts'o
On Fri, Jul 08, 2016 at 03:06:07PM -0700, Dmitry Torokhov wrote:
> > Last year we had the invite-only session on the 3rd day and, from what I
> > heard from some people, that was considered better. People had a chance to
> > already solve several things upfront and the invite-only day had fewer
> > issues to discuss. Not sure if this can be changed and if the majority
> > of people agree with that conclusion.
> 
> I think that only worked because Korea Linux Forum preceded KS so we
> had shared talks first. We'd have to swap KS and Plumbers and I'd
> guess it is too late now.

The Korea Linux Forum is a much shorter conference, so holding it
afterwards probably worked better.  With conferences that are longer
and/or more intense, we've gotten complaints from Kernel Summit
attendees that holding the invite-only day at the tail-end of the
week meant that people were pretty brain-fried by then.  This would
have been especially true with the Linux Plumbers
Conference, which throws a very nice party at the very end of the
conference --- with an open bar, no less.  (What this might mean if we
tried to hold the Kernel Summit invite-only day afterwards is left to
the imagination of the gentle reader.  :-)

People will be very much encouraged to stay for all of the Plumbers
Conference, and not just because of the party at the end of the week.
There's no rule that says we have to make all of our decisions on the
invite-only day.  In fact, it may be good for decisions to be
discussed with the wider LPC community before we make a final
decision.  That's why we'll have spare slots in reserve for people to
schedule topic-specific discussions on Wednesday and Thursday.

Cheers,

- Ted


[PATCH] spi: spi-ti-qspi: clear wlen field while setting word length.

2016-07-09 Thread Prahlad V
When a word length of 1 byte is selected and the data to write is longer
than QSPI_WLEN_MAX_BYTES, the first QSPI_WLEN_MAX_BYTES will be transferred
and the remainder will be transferred byte by byte. In that case the wlen
field should be cleared before it is set.

Signed-off-by: Prahlad V 
---
 drivers/spi/spi-ti-qspi.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/spi/spi-ti-qspi.c b/drivers/spi/spi-ti-qspi.c
index 29ea8d2..6c61f54 100644
--- a/drivers/spi/spi-ti-qspi.c
+++ b/drivers/spi/spi-ti-qspi.c
@@ -276,9 +276,9 @@ static int qspi_write_msg(struct ti_qspi *qspi, struct 
spi_transfer *t,
cmd |= QSPI_WLEN(QSPI_WLEN_MAX_BITS);
} else {
writeb(*txbuf, qspi->base + QSPI_SPI_DATA_REG);
-   cmd = qspi->cmd | QSPI_WR_SNGL;
xfer_len = wlen;
-   cmd |= QSPI_WLEN(wlen);
+   cmd = ((qspi->cmd & ~QSPI_WLEN_MASK) |
+QSPI_WLEN(wlen));
}
break;
case 2:
-- 
2.5.5
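
The clear-before-set pattern described in the commit message above can be shown in isolation. This is a minimal sketch, not the driver's code: DEMO_WLEN_MASK, DEMO_WLEN() and both helper functions are hypothetical names, and the 7-bit field layout is an assumption made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical layout for illustration: a 7-bit word-length field in
 * bits 0..6 that stores (length in bits - 1).
 */
#define DEMO_WLEN_MASK	0x7fu
#define DEMO_WLEN(bits)	(((uint32_t)(bits) - 1u) & DEMO_WLEN_MASK)

/* OR-only update: stale bits from a previous, longer word length survive. */
static uint32_t demo_set_wlen_or_only(uint32_t cmd, unsigned int bits)
{
	assert(bits >= 1 && bits <= 128);
	return cmd | DEMO_WLEN(bits);
}

/* Clear the field first, then set it. */
static uint32_t demo_set_wlen(uint32_t cmd, unsigned int bits)
{
	assert(bits >= 1 && bits <= 128);
	return (cmd & ~DEMO_WLEN_MASK) | DEMO_WLEN(bits);
}
```

With a command word that still carries a 128-bit length, the OR-only helper leaves the field at 128 bits when asked for 8, while the clear-first helper yields the requested 8 bits and keeps the unrelated upper bits intact.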



Re: [CRIU] Introspecting userns relationships to other namespaces?

2016-07-09 Thread Eric W. Biederman
ebied...@xmission.com (Eric W. Biederman) writes:

> Andrew Vagin  writes:
>
>> All these thoughts about security make me think that kcmp is what we
>> should use here. It's maybe something like this:
>>
>> kcmp(pid1, pid2, KCMP_NS_USERNS, fd1, fd2)
>>
>> - to check if userns of the fd1 namespace is equal to the fd2 userns
>>
>> kcmp(pid1, pid2, KCMP_NS_PARENT, fd1, fd2)
>>
>> - to check if a parent namespace of the fd1 pidns is equal to fd2 pidns.
>>
>> fd1 and fd2 are file descriptors to namespace files.
>>
>> So if we want to build a hierarchy, we need to collect all namespaces
>> and then enumerate them to check dependencies with help of kcmp.
>
> That is certainly one way to go.
>
> There is a funny case where we would want to compare a user namespace
> file descriptor to a parent user namespace file descriptor.
>
>
> Grumble, Grumble.  I think this may actually be a case for creating ioctls
> for these two cases.  Now that random nsfs file descriptors are bind
> mountable the original reason for using proc files is not as pressing.
>
> One ioctl for the user namespace that owns a file descriptor.
> One ioctl for the parent namespace of a namespace file descriptor.
>
> We also need some way to get a command file descriptor for a file system
> super block.  Al Viro has a pet project for cleaning up the mount API
> and this might be the ideal excuse to start looking at that.
>
> (In principle we might be able to run commands through the namespace
>  file descriptor and using an ioctl feels dirty.  But an ioctl that
>  only uses the fd and request argument does not suffer from the same
>  problems that ioctls that have to pass additional arguments suffer
>  from.)

Of course it should be an error, perhaps -EINVAL, to get a user
namespace owner or parent namespace that is outside of a process's
current user namespace or pid namespace.  That way things stay bounded
within the current namespaces the process is in.  Which prevents any
leak possibilities, and keeps CRIU working.

Eric


Re: [CRIU] Introspecting userns relationships to other namespaces?

2016-07-09 Thread Eric W. Biederman
Andrew Vagin  writes:

> All these thoughts about security make me think that kcmp is what we
> should use here. It's maybe something like this:
>
> kcmp(pid1, pid2, KCMP_NS_USERNS, fd1, fd2)
>
> - to check if userns of the fd1 namespace is equal to the fd2 userns
>
> kcmp(pid1, pid2, KCMP_NS_PARENT, fd1, fd2)
>
> - to check if a parent namespace of the fd1 pidns is equal to fd2 pidns.
>
> fd1 and fd2 are file descriptors to namespace files.
>
> So if we want to build a hierarchy, we need to collect all namespaces
> and then enumerate them to check dependencies with help of kcmp.

That is certainly one way to go.

There is a funny case where we would want to compare a user namespace
file descriptor to a parent user namespace file descriptor.


Grumble, Grumble.  I think this may actually be a case for creating ioctls
for these two cases.  Now that random nsfs file descriptors are bind
mountable the original reason for using proc files is not as pressing.

One ioctl for the user namespace that owns a file descriptor.
One ioctl for the parent namespace of a namespace file descriptor.

We also need some way to get a command file descriptor for a file system
super block.  Al Viro has a pet project for cleaning up the mount API
and this might be the ideal excuse to start looking at that.

(In principle we might be able to run commands through the namespace
 file descriptor and using an ioctl feels dirty.  But an ioctl that
 only uses the fd and request argument does not suffer from the same
 problems that ioctls that have to pass additional arguments suffer
 from.)

Eric


Re: [v1] ErrHandling:Make IS_ERR_VALUE_U32 as generic API to avoid IS_ERR_VALUE abuses.

2016-07-09 Thread kbuild test robot
Hi,

[auto build test WARNING on v4.7-rc6]
[also build test WARNING on next-20160708]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Arvind-Yadav/ErrHandling-Make-IS_ERR_VALUE_U32-as-generic-API-to-avoid-IS_ERR_VALUE-abuses/20160709-235356
config: x86_64-randconfig-x007-201628 (attached as .config)
compiler: gcc-6 (Debian 6.1.1-1) 6.1.1 20160430
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   In file included from include/uapi/linux/stddef.h:1:0,
from include/linux/stddef.h:4,
from include/uapi/linux/posix_types.h:4,
from include/uapi/linux/types.h:13,
from include/linux/types.h:5,
from include/linux/mod_devicetable.h:11,
from include/linux/pci.h:20,
from include/linux/bcma/bcma.h:4,
from drivers/bcma/bcma_private.h:8,
from drivers/bcma/scan.c:9:
   drivers/bcma/scan.c: In function 'bcma_get_next_core':
   include/linux/err.h:23:52: warning: cast to pointer from integer of 
different size [-Wint-to-pointer-cast]
#define IS_ERR_VALUE_U32(x) unlikely((unsigned int)(void *)(x) >= (unsigned 
int)-MAX_ERRNO)
   ^
   include/linux/compiler.h:151:30: note: in definition of macro '__trace_if'
 if (__builtin_constant_p(!!(cond)) ? !!(cond) :   \
 ^~~~
>> drivers/bcma/scan.c:361:2: note: in expansion of macro 'if'
 if (tmp == 0 || IS_ERR_VALUE_U32(tmp)) {
 ^~
   include/linux/err.h:23:29: note: in expansion of macro 'unlikely'
#define IS_ERR_VALUE_U32(x) unlikely((unsigned int)(void *)(x) >= (unsigned 
int)-MAX_ERRNO)
^~~~
   drivers/bcma/scan.c:361:18: note: in expansion of macro 'IS_ERR_VALUE_U32'
 if (tmp == 0 || IS_ERR_VALUE_U32(tmp)) {
 ^~~~
   include/linux/err.h:23:38: warning: cast from pointer to integer of 
different size [-Wpointer-to-int-cast]
#define IS_ERR_VALUE_U32(x) unlikely((unsigned int)(void *)(x) >= (unsigned 
int)-MAX_ERRNO)
 ^
   include/linux/compiler.h:151:30: note: in definition of macro '__trace_if'
 if (__builtin_constant_p(!!(cond)) ? !!(cond) :   \
 ^~~~
>> drivers/bcma/scan.c:361:2: note: in expansion of macro 'if'
 if (tmp == 0 || IS_ERR_VALUE_U32(tmp)) {
 ^~
   include/linux/err.h:23:29: note: in expansion of macro 'unlikely'
#define IS_ERR_VALUE_U32(x) unlikely((unsigned int)(void *)(x) >= (unsigned 
int)-MAX_ERRNO)
^~~~
   drivers/bcma/scan.c:361:18: note: in expansion of macro 'IS_ERR_VALUE_U32'
 if (tmp == 0 || IS_ERR_VALUE_U32(tmp)) {
 ^~~~
   include/linux/err.h:23:52: warning: cast to pointer from integer of 
different size [-Wint-to-pointer-cast]
#define IS_ERR_VALUE_U32(x) unlikely((unsigned int)(void *)(x) >= (unsigned 
int)-MAX_ERRNO)
   ^
   include/linux/compiler.h:151:30: note: in definition of macro '__trace_if'
 if (__builtin_constant_p(!!(cond)) ? !!(cond) :   \
 ^~~~
>> drivers/bcma/scan.c:361:2: note: in expansion of macro 'if'
 if (tmp == 0 || IS_ERR_VALUE_U32(tmp)) {
 ^~
   include/linux/err.h:23:29: note: in expansion of macro 'unlikely'
#define IS_ERR_VALUE_U32(x) unlikely((unsigned int)(void *)(x) >= (unsigned 
int)-MAX_ERRNO)
^~~~
   drivers/bcma/scan.c:361:18: note: in expansion of macro 'IS_ERR_VALUE_U32'
 if (tmp == 0 || IS_ERR_VALUE_U32(tmp)) {
 ^~~~
   include/linux/err.h:23:38: warning: cast from pointer to integer of 
different size [-Wpointer-to-int-cast]
#define IS_ERR_VALUE_U32(x) unlikely((unsigned int)(void *)(x) >= (unsigned 
int)-MAX_ERRNO)
 ^
   include/linux/compiler.h:151:30: note: in definition of macro '__trace_if'
 if (__builtin_constant_p(!!(cond)) ? !!(cond) :   \
 ^~~~
>> drivers/bcma/scan.c:361:2: note: in expansion of macro 'if'
 if (tmp == 0 || IS_ERR_VALUE_U32(tmp)) {
 ^~
   include/linux/err.h:23:29: note: in expansion of macro 'unlikely'
#define IS_ERR_VALUE_U32(x) unlikely((unsigned int)(void *)(x) >= (unsigned 
int)-MAX_ERRNO)
^~~~
   drivers/bcma/scan.c:361:18: note: in expansion of macro 'IS_ERR_VALUE_U32'
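
Both warnings above come from routing a 32-bit value through `void *` on a 64-bit build. A minimal sketch of a check that stays in 32-bit unsigned arithmetic and needs no pointer cast follows; `demo_is_err_value_u32()` and `DEMO_MAX_ERRNO` are hypothetical names, and this is not the macro from the patch under test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same value as the kernel's MAX_ERRNO; renamed to mark it as a stand-in. */
#define DEMO_MAX_ERRNO	4095u

/*
 * A u32 holds an errno-style error when it falls in the top
 * DEMO_MAX_ERRNO values of the 32-bit range. The comparison is done
 * entirely in uint32_t, so no pointer/integer size mismatch can occur.
 */
static bool demo_is_err_value_u32(uint32_t x)
{
	assert(DEMO_MAX_ERRNO < UINT32_MAX);
	return x >= (uint32_t)-DEMO_MAX_ERRNO;
}
```

Here `(uint32_t)-22` is reported as an error value, while 0 and 0xfffff000 are not.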
 

Re: [tip:x86/debug] printk: Make the printk*once() variants return a value

2016-07-09 Thread Joe Perches
On Sat, 2016-07-09 at 09:50 +0200, Borislav Petkov wrote:
> On Fri, Jul 08, 2016 at 07:40:48PM -0700, Joe Perches wrote:
> > This change isn't described in the commit message and there
> > doesn't seem to be a need to change this.
> How do *you* know? Did *you* actually sit down and build a kernel with
> your proposed change before sending a reply?
> I'm pretty sure you didn't.

defconfigs both with and without CONFIG_PRINTK build
properly with the proposed change to this specific patch.

> Well, there is a very good reason why I made that change but I'm not
> going to tell you.

Borislav, your delightful personality always impresses.
Never change.

If there is a specific reason you know why this 0; value
must be added to a do {} while (0) to statement expression
macro conversion, it'd be good to write that in the
commit message.  It'd also be good to remove the useless
"do {} while (0);" surrounding a single statement.

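For readers following the macro discussion: a `do {} while (0)` block is a statement and yields no value, whereas a GNU C statement expression evaluates to its last expression. The sketch below shows the second form for a print-once helper. It assumes a compiler with the GNU extension (gcc or clang); `demo_print_once()` and `demo_count_prints()` are hypothetical names, not the kernel's macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for a print-once macro. The ({ ... }) form is a
 * GNU C statement expression: its value is that of the last expression,
 * here "true" only on the call that actually printed.
 */
#define demo_print_once(msg)				\
({							\
	static bool demo_done;				\
	bool demo_first = !demo_done;			\
							\
	if (demo_first) {				\
		demo_done = true;			\
		puts(msg);				\
	}						\
	demo_first;					\
})

/* Calls the macro 'calls' times from one call site and counts the prints. */
static int demo_count_prints(int calls)
{
	int printed = 0;
	int i;

	for (i = 0; i < calls; i++)
		printed += demo_print_once("printed once");
	return printed;
}
```

Because the `static` flag belongs to the single expansion inside `demo_count_prints()`, the first batch of calls prints exactly once and any later batch prints nothing.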

[tip:x86/urgent] x86/cpu: Fix duplicated X86_BUG(9) macro

2016-07-09 Thread tip-bot for Dave Hansen
Commit-ID:  8709ed4d4b0eab04561c1ec9e6ea50fd1e3897ff
Gitweb: http://git.kernel.org/tip/8709ed4d4b0eab04561c1ec9e6ea50fd1e3897ff
Author: Dave Hansen 
AuthorDate: Fri, 17 Jun 2016 17:15:03 -0700
Committer:  Ingo Molnar 
CommitDate: Sat, 9 Jul 2016 14:06:06 +0200

x86/cpu: Fix duplicated X86_BUG(9) macro

cpufeatures.h currently defines X86_BUG(9) twice on 32-bit:

#define X86_BUG_NULL_SEG        X86_BUG(9) /* Nulling a selector preserves the base */
...
#ifdef CONFIG_X86_32
#define X86_BUG_ESPFIX          X86_BUG(9) /* "" IRET to 16-bit SS corrupts ESP/RSP high bits */
#endif

I think what happened was that this added the X86_BUG_ESPFIX, but
in an #ifdef below most of the bugs:

58a5aac53313 x86/entry/32: Introduce and use X86_BUG_ESPFIX instead of paravirt_enabled

Then this came along and added X86_BUG_NULL_SEG, but collided
with the earlier one that did the bug below the main block
defining all the X86_BUG()s.

7a5d67048745 x86/cpu: Probe the behavior of nulling out a segment at boot time

Signed-off-by: Dave Hansen 
Acked-by: Andy Lutomirski 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Dave Hansen 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Josh Poimboeuf 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: sta...@vger.kernel.org
Link: http://lkml.kernel.org/r/20160618001503.cee1b...@viggo.jf.intel.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/include/asm/cpufeatures.h | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/cpufeatures.h 
b/arch/x86/include/asm/cpufeatures.h
index 4a41348..c64b1e9 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -301,10 +301,6 @@
 #define X86_BUG_FXSAVE_LEAKX86_BUG(6) /* FXSAVE leaks FOP/FIP/FOP */
 #define X86_BUG_CLFLUSH_MONITORX86_BUG(7) /* AAI65, CLFLUSH required 
before MONITOR */
 #define X86_BUG_SYSRET_SS_ATTRSX86_BUG(8) /* SYSRET doesn't fix up SS 
attrs */
-#define X86_BUG_NULL_SEG   X86_BUG(9) /* Nulling a selector preserves the 
base */
-#define X86_BUG_SWAPGS_FENCE   X86_BUG(10) /* SWAPGS without input dep on GS */
-
-
 #ifdef CONFIG_X86_32
 /*
  * 64-bit kernels don't use X86_BUG_ESPFIX.  Make the define conditional
@@ -312,5 +308,7 @@
  */
 #define X86_BUG_ESPFIX X86_BUG(9) /* "" IRET to 16-bit SS corrupts 
ESP/RSP high bits */
 #endif
+#define X86_BUG_NULL_SEG   X86_BUG(10) /* Nulling a selector preserves the 
base */
+#define X86_BUG_SWAPGS_FENCE   X86_BUG(11) /* SWAPGS without input dep on GS */
 
 #endif /* _ASM_X86_CPUFEATURES_H */
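
Why a duplicated index matters can be seen in a small sketch: two names that expand to the same bit become indistinguishable once either is set. The DEMO_* macros and `demo_aliases()` are hypothetical, and the single 32-bit mask is a simplification, not the kernel's capability-word layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bug indices; OLD reproduces the collision, NEW the fix. */
#define DEMO_BUG(x)		(x)
#define DEMO_BUG_ESPFIX		DEMO_BUG(9)
#define DEMO_BUG_NULL_SEG_OLD	DEMO_BUG(9)
#define DEMO_BUG_NULL_SEG_NEW	DEMO_BUG(10)

/* Returns true if setting 'set_bit' also makes 'test_bit' read as set. */
static bool demo_aliases(unsigned int set_bit, unsigned int test_bit)
{
	uint32_t mask;

	assert(set_bit < 32 && test_bit < 32);
	mask = UINT32_C(1) << set_bit;
	return (mask >> test_bit) & 1u;
}
```

With the old numbering, flagging the null-segment bug also reports ESPFIX; with the new numbering the two stay independent.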


[tip:x86/platform] x86/platform/intel-mid: Rename mrfl.c to mrfld.c

2016-07-09 Thread tip-bot for Andy Shevchenko
Commit-ID:  62d855d3e725f4e4b0d2786f7cad3f0660a03a59
Gitweb: http://git.kernel.org/tip/62d855d3e725f4e4b0d2786f7cad3f0660a03a59
Author: Andy Shevchenko 
AuthorDate: Sat, 18 Jun 2016 18:51:34 +0300
Committer:  Ingo Molnar 
CommitDate: Sat, 9 Jul 2016 14:02:09 +0200

x86/platform/intel-mid: Rename mrfl.c to mrfld.c

Use mrfld as an abbreviation of Merrifield to be consistent with the rest of
the code.

In the future we are going to add more files here prefixed with 'mrfld'.

Signed-off-by: Andy Shevchenko 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Link: 
http://lkml.kernel.org/r/1466265094-146113-1-git-send-email-andriy.shevche...@linux.intel.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/platform/intel-mid/Makefile| 2 +-
 arch/x86/platform/intel-mid/{mrfl.c => mrfld.c} | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/platform/intel-mid/Makefile 
b/arch/x86/platform/intel-mid/Makefile
index aebb5b9..fa021df 100644
--- a/arch/x86/platform/intel-mid/Makefile
+++ b/arch/x86/platform/intel-mid/Makefile
@@ -1,4 +1,4 @@
-obj-$(CONFIG_X86_INTEL_MID) += intel-mid.o intel_mid_vrtc.o mfld.o mrfl.o pwr.o
+obj-$(CONFIG_X86_INTEL_MID) += intel-mid.o intel_mid_vrtc.o mfld.o mrfld.o 
pwr.o
 
 # SFI specific code
 ifdef CONFIG_X86_INTEL_MID
diff --git a/arch/x86/platform/intel-mid/mrfl.c 
b/arch/x86/platform/intel-mid/mrfld.c
similarity index 97%
rename from arch/x86/platform/intel-mid/mrfl.c
rename to arch/x86/platform/intel-mid/mrfld.c
index bd1adc6..59253db 100644
--- a/arch/x86/platform/intel-mid/mrfl.c
+++ b/arch/x86/platform/intel-mid/mrfld.c
@@ -1,5 +1,5 @@
 /*
- * mrfl.c: Intel Merrifield platform specific setup code
+ * Intel Merrifield platform specific setup code
  *
  * (C) Copyright 2013 Intel Corporation
  *


[tip:sched/core] sched/cpuacct: Merge cpuacct_usage_index and cpuacct_stat_index enums

2016-07-09 Thread tip-bot for Zhao Lei
Commit-ID:  9acacc2ac525ef1397af63b15cef7bb77a823c06
Gitweb: http://git.kernel.org/tip/9acacc2ac525ef1397af63b15cef7bb77a823c06
Author: Zhao Lei 
AuthorDate: Mon, 20 Jun 2016 17:37:18 +0800
Committer:  Ingo Molnar 
CommitDate: Sat, 9 Jul 2016 13:56:15 +0200

sched/cpuacct: Merge cpuacct_usage_index and cpuacct_stat_index enums

These two types have similar function, no need to separate them.

Signed-off-by: Zhao Lei 
Cc: KOSAKI Motohiro 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Link: 
http://lkml.kernel.org/r/436748885270d64363c7dc67167507d486c2057a.1466415271.git.zhao...@cn.fujitsu.com
Signed-off-by: Ingo Molnar 
---
 kernel/sched/cpuacct.c | 47 ---
 1 file changed, 20 insertions(+), 27 deletions(-)

diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
index 41f85c4..74241eb 100644
--- a/kernel/sched/cpuacct.c
+++ b/kernel/sched/cpuacct.c
@@ -25,15 +25,13 @@ enum cpuacct_stat_index {
CPUACCT_STAT_NSTATS,
 };
 
-enum cpuacct_usage_index {
-   CPUACCT_USAGE_USER, /* ... user mode */
-   CPUACCT_USAGE_SYSTEM,   /* ... kernel mode */
-
-   CPUACCT_USAGE_NRUSAGE,
+static const char * const cpuacct_stat_desc[] = {
+   [CPUACCT_STAT_USER] = "user",
+   [CPUACCT_STAT_SYSTEM] = "system",
 };
 
 struct cpuacct_usage {
-   u64 usages[CPUACCT_USAGE_NRUSAGE];
+   u64 usages[CPUACCT_STAT_NSTATS];
 };
 
 /* track cpu usage of a group of tasks and its child groups */
@@ -108,16 +106,16 @@ static void cpuacct_css_free(struct cgroup_subsys_state 
*css)
 }
 
 static u64 cpuacct_cpuusage_read(struct cpuacct *ca, int cpu,
-enum cpuacct_usage_index index)
+enum cpuacct_stat_index index)
 {
struct cpuacct_usage *cpuusage = per_cpu_ptr(ca->cpuusage, cpu);
u64 data;
 
/*
-* We allow index == CPUACCT_USAGE_NRUSAGE here to read
+* We allow index == CPUACCT_STAT_NSTATS here to read
 * the sum of suages.
 */
-   BUG_ON(index > CPUACCT_USAGE_NRUSAGE);
+   BUG_ON(index > CPUACCT_STAT_NSTATS);
 
 #ifndef CONFIG_64BIT
/*
@@ -126,11 +124,11 @@ static u64 cpuacct_cpuusage_read(struct cpuacct *ca, int 
cpu,
raw_spin_lock_irq(&cpu_rq(cpu)->lock);
 #endif
 
-   if (index == CPUACCT_USAGE_NRUSAGE) {
+   if (index == CPUACCT_STAT_NSTATS) {
int i = 0;
 
data = 0;
-   for (i = 0; i < CPUACCT_USAGE_NRUSAGE; i++)
+   for (i = 0; i < CPUACCT_STAT_NSTATS; i++)
data += cpuusage->usages[i];
} else {
data = cpuusage->usages[index];
@@ -155,7 +153,7 @@ static void cpuacct_cpuusage_write(struct cpuacct *ca, int 
cpu, u64 val)
raw_spin_lock_irq(&cpu_rq(cpu)->lock);
 #endif
 
-   for (i = 0; i < CPUACCT_USAGE_NRUSAGE; i++)
+   for (i = 0; i < CPUACCT_STAT_NSTATS; i++)
cpuusage->usages[i] = val;
 
 #ifndef CONFIG_64BIT
@@ -165,7 +163,7 @@ static void cpuacct_cpuusage_write(struct cpuacct *ca, int 
cpu, u64 val)
 
 /* return total cpu usage (in nanoseconds) of a group */
 static u64 __cpuusage_read(struct cgroup_subsys_state *css,
-  enum cpuacct_usage_index index)
+  enum cpuacct_stat_index index)
 {
struct cpuacct *ca = css_ca(css);
u64 totalcpuusage = 0;
@@ -180,18 +178,18 @@ static u64 __cpuusage_read(struct cgroup_subsys_state 
*css,
 static u64 cpuusage_user_read(struct cgroup_subsys_state *css,
  struct cftype *cft)
 {
-   return __cpuusage_read(css, CPUACCT_USAGE_USER);
+   return __cpuusage_read(css, CPUACCT_STAT_USER);
 }
 
 static u64 cpuusage_sys_read(struct cgroup_subsys_state *css,
 struct cftype *cft)
 {
-   return __cpuusage_read(css, CPUACCT_USAGE_SYSTEM);
+   return __cpuusage_read(css, CPUACCT_STAT_SYSTEM);
 }
 
 static u64 cpuusage_read(struct cgroup_subsys_state *css, struct cftype *cft)
 {
-   return __cpuusage_read(css, CPUACCT_USAGE_NRUSAGE);
+   return __cpuusage_read(css, CPUACCT_STAT_NSTATS);
 }
 
 static int cpuusage_write(struct cgroup_subsys_state *css, struct cftype *cft,
@@ -213,7 +211,7 @@ static int cpuusage_write(struct cgroup_subsys_state *css, 
struct cftype *cft,
 }
 
 static int __cpuacct_percpu_seq_show(struct seq_file *m,
-enum cpuacct_usage_index index)
+enum cpuacct_stat_index index)
 {
struct cpuacct *ca = css_ca(seq_css(m));
u64 percpu;
@@ -229,24 +227,19 @@ static int __cpuacct_percpu_seq_show(struct seq_file *m,
 
 static int cpuacct_percpu_user_seq_show(struct seq_file *m, void *V)
 {
-   return __cpuacct_percpu_seq_show(m, CPUACCT_USAGE_USER);
+   return __cpuacct_percpu_seq_show(m, CPUACCT_STAT_USER);
 }
 
 static int cpuacct_percpu_sys_seq_show(struct seq_file *m, v

[tip:sched/core] sched/cpuacct: Use loop to consolidate code in cpuacct_stats_show()

2016-07-09 Thread tip-bot for Zhao Lei
Commit-ID:  8e546bfafb3121ed25c73a0c02311ec58459344a
Gitweb: http://git.kernel.org/tip/8e546bfafb3121ed25c73a0c02311ec58459344a
Author: Zhao Lei 
AuthorDate: Mon, 20 Jun 2016 17:37:19 +0800
Committer:  Ingo Molnar 
CommitDate: Sat, 9 Jul 2016 13:56:15 +0200

sched/cpuacct: Use loop to consolidate code in cpuacct_stats_show()

In cpuacct_stats_show() we currently have copies of similar code
for each cpustat (system/user) variant.

Use a loop instead to consolidate the code. This will also work better
if we extend the CPUACCT_STAT_NSTATS type.

Signed-off-by: Zhao Lei 
Cc: KOSAKI Motohiro 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Link: 
http://lkml.kernel.org/r/b0597d4224655e9f333f1a6224ed9654c7d7d36a.1466415271.git.zhao...@cn.fujitsu.com
Signed-off-by: Ingo Molnar 
---
 kernel/sched/cpuacct.c | 29 ++---
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
index 74241eb..677cd1a 100644
--- a/kernel/sched/cpuacct.c
+++ b/kernel/sched/cpuacct.c
@@ -243,27 +243,26 @@ static int cpuacct_percpu_seq_show(struct seq_file *m, 
void *V)
 static int cpuacct_stats_show(struct seq_file *sf, void *v)
 {
struct cpuacct *ca = css_ca(seq_css(sf));
+   s64 val[CPUACCT_STAT_NSTATS];
int cpu;
-   s64 val = 0;
+   int stat;
 
+   memset(val, 0, sizeof(val));
for_each_possible_cpu(cpu) {
-   struct kernel_cpustat *kcpustat = per_cpu_ptr(ca->cpustat, cpu);
-   val += kcpustat->cpustat[CPUTIME_USER];
-   val += kcpustat->cpustat[CPUTIME_NICE];
-   }
-   val = cputime64_to_clock_t(val);
-   seq_printf(sf, "%s %lld\n", cpuacct_stat_desc[CPUACCT_STAT_USER], val);
+   u64 *cpustat = per_cpu_ptr(ca->cpustat, cpu)->cpustat;
 
-   val = 0;
-   for_each_possible_cpu(cpu) {
-   struct kernel_cpustat *kcpustat = per_cpu_ptr(ca->cpustat, cpu);
-   val += kcpustat->cpustat[CPUTIME_SYSTEM];
-   val += kcpustat->cpustat[CPUTIME_IRQ];
-   val += kcpustat->cpustat[CPUTIME_SOFTIRQ];
+   val[CPUACCT_STAT_USER]   += cpustat[CPUTIME_USER];
+   val[CPUACCT_STAT_USER]   += cpustat[CPUTIME_NICE];
+   val[CPUACCT_STAT_SYSTEM] += cpustat[CPUTIME_SYSTEM];
+   val[CPUACCT_STAT_SYSTEM] += cpustat[CPUTIME_IRQ];
+   val[CPUACCT_STAT_SYSTEM] += cpustat[CPUTIME_SOFTIRQ];
}
 
-   val = cputime64_to_clock_t(val);
-   seq_printf(sf, "%s %lld\n", cpuacct_stat_desc[CPUACCT_STAT_SYSTEM], 
val);
+   for (stat = 0; stat < CPUACCT_STAT_NSTATS; stat++) {
+   seq_printf(sf, "%s %lld\n",
+  cpuacct_stat_desc[stat],
+  cputime64_to_clock_t(val[stat]));
+   }
 
return 0;
 }
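
The consolidation above (accumulate every category in one pass over the CPUs, then print from a name table in a loop) can be sketched in user space as follows. All DEMO_/demo_ names and the sample numbers are made up for illustration; the clock-tick conversion done by the real code is left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum demo_stat { DEMO_STAT_USER, DEMO_STAT_SYSTEM, DEMO_STAT_NSTATS };

static const char * const demo_stat_desc[] = {
	[DEMO_STAT_USER] = "user",
	[DEMO_STAT_SYSTEM] = "system",
};

enum demo_cputime { DEMO_USER, DEMO_NICE, DEMO_SYSTEM, DEMO_IRQ,
		    DEMO_SOFTIRQ, DEMO_NR };

/* Sum per-CPU counters into per-stat totals, then format them in a loop. */
static int demo_stats_show(char *buf, size_t len,
			   const unsigned long long cpustat[][DEMO_NR],
			   int ncpus)
{
	unsigned long long val[DEMO_STAT_NSTATS];
	size_t off = 0;
	int cpu, stat;

	memset(val, 0, sizeof(val));
	for (cpu = 0; cpu < ncpus; cpu++) {
		val[DEMO_STAT_USER]   += cpustat[cpu][DEMO_USER];
		val[DEMO_STAT_USER]   += cpustat[cpu][DEMO_NICE];
		val[DEMO_STAT_SYSTEM] += cpustat[cpu][DEMO_SYSTEM];
		val[DEMO_STAT_SYSTEM] += cpustat[cpu][DEMO_IRQ];
		val[DEMO_STAT_SYSTEM] += cpustat[cpu][DEMO_SOFTIRQ];
	}

	for (stat = 0; stat < DEMO_STAT_NSTATS; stat++) {
		int n = snprintf(buf + off, len - off, "%s %llu\n",
				 demo_stat_desc[stat], val[stat]);

		if (n < 0 || (size_t)n >= len - off)
			return -1;	/* output did not fit */
		off += (size_t)n;
	}
	return (int)off;
}

/* Runs the formatter on two made-up CPUs and returns the text. */
static const char *demo_stats_example(void)
{
	static const unsigned long long sample[2][DEMO_NR] = {
		{ 10, 5, 100, 20, 30 },
		{ 12, 3, 200, 50, 200 },
	};
	static char buf[64];

	if (demo_stats_show(buf, sizeof(buf), sample, 2) < 0)
		return "";
	return buf;
}
```

Adding a further category then only needs a new enum value, a new table entry and its accumulation lines; the printing loop stays unchanged.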


[tip:sched/core] sched/cpuacct: Introduce cpuacct.usage_all to show all CPU stats together

2016-07-09 Thread tip-bot for Zhao Lei
Commit-ID:  277a13e4f0d661678a7084bf97ed96a99c7dac21
Gitweb: http://git.kernel.org/tip/277a13e4f0d661678a7084bf97ed96a99c7dac21
Author: Zhao Lei 
AuthorDate: Mon, 20 Jun 2016 17:37:20 +0800
Committer:  Ingo Molnar 
CommitDate: Sat, 9 Jul 2016 13:56:15 +0200

sched/cpuacct: Introduce cpuacct.usage_all to show all CPU stats together

In current code, we can get cpuacct data from several files,
but each file has various limitations.

For example:

 - We can get CPU usage in user and kernel mode via cpuacct.stat,
   but we can't get detailed data about each CPU.

 - We can get each CPU's kernel mode usage in cpuacct.usage_percpu_sys,
   but we can't get user mode usage data at the same time.

This patch introduces cpuacct.usage_all, to show all detailed CPU
accounting data together:

 # cat cpuacct.usage_all
 cpu user system
 0 3809760299 5807968992
 1 3250329855 454612211
 ..

Signed-off-by: Zhao Lei 
Cc: KOSAKI Motohiro 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Link: 
http://lkml.kernel.org/r/7744460969edd7caaf0e903592ee52353ed9bdd6.1466415271.git.zhao...@cn.fujitsu.com
Signed-off-by: Ingo Molnar 
---
 kernel/sched/cpuacct.c | 40 
 1 file changed, 40 insertions(+)

diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
index 677cd1a..bc0b309c 100644
--- a/kernel/sched/cpuacct.c
+++ b/kernel/sched/cpuacct.c
@@ -240,6 +240,42 @@ static int cpuacct_percpu_seq_show(struct seq_file *m, 
void *V)
return __cpuacct_percpu_seq_show(m, CPUACCT_STAT_NSTATS);
 }
 
+static int cpuacct_all_seq_show(struct seq_file *m, void *V)
+{
+   struct cpuacct *ca = css_ca(seq_css(m));
+   int index;
+   int cpu;
+
+   seq_puts(m, "cpu");
+   for (index = 0; index < CPUACCT_STAT_NSTATS; index++)
+   seq_printf(m, " %s", cpuacct_stat_desc[index]);
+   seq_puts(m, "\n");
+
+   for_each_possible_cpu(cpu) {
+   struct cpuacct_usage *cpuusage = per_cpu_ptr(ca->cpuusage, cpu);
+
+   seq_printf(m, "%d", cpu);
+
+   for (index = 0; index < CPUACCT_STAT_NSTATS; index++) {
+#ifndef CONFIG_64BIT
+   /*
+* Take rq->lock to make 64-bit read safe on 32-bit
+* platforms.
+*/
+   raw_spin_lock_irq(&cpu_rq(cpu)->lock);
+#endif
+
+   seq_printf(m, " %llu", cpuusage->usages[index]);
+
+#ifndef CONFIG_64BIT
+   raw_spin_unlock_irq(&cpu_rq(cpu)->lock);
+#endif
+   }
+   seq_puts(m, "\n");
+   }
+   return 0;
+}
+
 static int cpuacct_stats_show(struct seq_file *sf, void *v)
 {
struct cpuacct *ca = css_ca(seq_css(sf));
@@ -294,6 +330,10 @@ static struct cftype files[] = {
.seq_show = cpuacct_percpu_sys_seq_show,
},
{
+   .name = "usage_all",
+   .seq_show = cpuacct_all_seq_show,
+   },
+   {
.name = "stat",
.seq_show = cpuacct_stats_show,
},
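
A consumer of the new file could read it roughly as sketched below. This is a minimal user-space parser, assuming exactly the three-column layout shown in the commit message ("cpu user system" header, then one row per CPU); `demo_parse_usage_all()`, `demo_sum()` and `struct demo_usage` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct demo_usage {
	unsigned long long user;	/* sum of the "user" column */
	unsigned long long system;	/* sum of the "system" column */
	int ncpus;			/* number of rows parsed */
};

/* Parses usage_all-style text; returns the row count or -1 on a bad header. */
static int demo_parse_usage_all(const char *text, struct demo_usage *out)
{
	const char *p;
	unsigned long long user, sys;
	int cpu, consumed;

	assert(text && out);
	memset(out, 0, sizeof(*out));

	if (strncmp(text, "cpu user system", 15) != 0)
		return -1;
	p = strchr(text, '\n');
	if (!p)
		return -1;
	p++;

	/* Each row is "<cpu> <user> <system>"; %n reports how far we got. */
	while (sscanf(p, "%d %llu %llu%n", &cpu, &user, &sys, &consumed) == 3) {
		out->user += user;
		out->system += sys;
		out->ncpus++;
		p += consumed;
	}
	return out->ncpus;
}

/* Convenience wrapper: total of one column, or 0 if parsing fails. */
static unsigned long long demo_sum(const char *text, int want_system)
{
	struct demo_usage u;

	if (demo_parse_usage_all(text, &u) < 0)
		return 0;
	return want_system ? u.system : u.user;
}
```

Fed the two sample rows from the commit message, the user column sums to 7060090154 and the system column to 6262581203.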


Missing include file in include/uapi/linux/errqueue.h?

2016-07-09 Thread Brooks Moses
Hello!

I've been attempting to qualify the Linux 4.5.2 user-space headers for
a toolchain release, and ran into what looks like a missing include
file in include/uapi/linux/errqueue.h.  In particular,
https://github.com/torvalds/linux/commit/f24b9be5957b38bb420b838115040dc2031b7d0c
adds the following to this file:

+struct scm_timestamping {
+ struct timespec ts[3];
+};

However, struct timespec is defined in time.h, which isn't included
either in 4.5.2 or in current head.  Is this simply a missing #include
line, or am I misunderstanding something?

I also note that this is the second user-space header in the Linux
4.5.2 release we've run into that simply fails to compile when
included by itself.  Is there not a test target that tests for this?
Would it be welcome if I were to work on adding one?

Thanks,
- Brooks
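
The self-containment problem described above can be illustrated with a reduced sketch: a header-style fragment that embeds `struct timespec` only compiles on its own if it pulls in the definition itself. The example assumes the standard C `<time.h>` as the provider; `struct demo_scm_timestamping` and `demo_ts_count()` are hypothetical stand-ins, not the uapi header.

```c
#include <assert.h>
#include <time.h>	/* provides struct timespec; without it the struct below is incomplete */

/* Stand-in for a header-defined struct that embeds struct timespec. */
struct demo_scm_timestamping {
	struct timespec ts[3];
};

/* Number of timestamps carried by the struct. */
static size_t demo_ts_count(void)
{
	struct demo_scm_timestamping s;
	size_t n = sizeof(s.ts) / sizeof(s.ts[0]);

	assert(n > 0);
	return n;
}
```

A per-header test of the kind asked about would amount to compiling one such translation unit per header, with that header as the only include.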


[PATCH] mm: gup: Re-define follow_page_mask output parameter page_mask usage

2016-07-09 Thread chengang
From: Chen Gang 

For a pure output parameter:

 - When callee fails, the caller should not assume the output parameter
   is still valid.

 - And callee should not assume the pure output parameter must be
   provided by caller -- caller has right to pass NULL when caller does
   not care about it.

Signed-off-by: Chen Gang 
---
 include/linux/mm.h | 5 ++---
 mm/gup.c   | 6 +++---
 mm/mlock.c | 2 +-
 mm/nommu.c | 1 -
 4 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index b21e5f3..5c560fd 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2205,10 +2205,9 @@ struct page *follow_page_mask(struct vm_area_struct *vma,
  unsigned int *page_mask);
 
 static inline struct page *follow_page(struct vm_area_struct *vma,
-   unsigned long address, unsigned int foll_flags)
+   unsigned long address, unsigned int foll_flags)
 {
-   unsigned int unused_page_mask;
-   return follow_page_mask(vma, address, foll_flags, &unused_page_mask);
+   return follow_page_mask(vma, address, foll_flags, NULL);
 }
 
 #define FOLL_WRITE 0x01/* check pte is writable */
diff --git a/mm/gup.c b/mm/gup.c
index 96b2b2f..9684b06 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -222,8 +222,6 @@ struct page *follow_page_mask(struct vm_area_struct *vma,
struct page *page;
struct mm_struct *mm = vma->vm_mm;
 
-   *page_mask = 0;
-
page = follow_huge_addr(mm, address, flags & FOLL_WRITE);
if (!IS_ERR(page)) {
BUG_ON(flags & FOLL_GET);
@@ -298,7 +296,8 @@ struct page *follow_page_mask(struct vm_area_struct *vma,
 
page = follow_trans_huge_pmd(vma, address, pmd, flags);
spin_unlock(ptl);
-   *page_mask = HPAGE_PMD_NR - 1;
+   if (page_mask)
+   *page_mask = HPAGE_PMD_NR - 1;
return page;
 }
 
@@ -574,6 +573,7 @@ retry:
if (unlikely(fatal_signal_pending(current)))
return i ? i : -ERESTARTSYS;
cond_resched();
+   page_mask = 0;
page = follow_page_mask(vma, start, foll_flags, &page_mask);
if (!page) {
int ret;
diff --git a/mm/mlock.c b/mm/mlock.c
index ef8dc9f..626eb58 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -438,7 +438,7 @@ void munlock_vma_pages_range(struct vm_area_struct *vma,
 
while (start < end) {
struct page *page;
-   unsigned int page_mask;
+   unsigned int page_mask = 0;
unsigned long page_increm;
struct pagevec pvec;
struct zone *zone;
diff --git a/mm/nommu.c b/mm/nommu.c
index 95daf81..c1a0a89 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -1749,7 +1749,6 @@ struct page *follow_page_mask(struct vm_area_struct *vma,
  unsigned long address, unsigned int flags,
  unsigned int *page_mask)
 {
-   *page_mask = 0;
return NULL;
 }
 
-- 
1.9.3
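
The two rules in the commit message above (the callee tolerates a NULL output pointer, and the caller does not rely on the output after a failure) can be sketched outside the kernel like this. `demo_find()`, `demo_index_of()` and `demo_values` are hypothetical and only illustrate the convention.

```c
#include <assert.h>
#include <stddef.h>

static const int demo_values[] = { 4, 8, 15 };

/*
 * Returns a pointer to the first element equal to 'key', or NULL.
 * 'index_out' is a pure output parameter: it may be NULL, and it is
 * written only on success.
 */
static const int *demo_find(const int *arr, size_t n, int key,
			    size_t *index_out)
{
	size_t i;

	assert(arr || n == 0);
	for (i = 0; i < n; i++) {
		if (arr[i] == key) {
			if (index_out)
				*index_out = i;
			return &arr[i];
		}
	}
	return NULL;
}

/* Caller side: initialize the output first, and check the result before use. */
static long demo_index_of(int key)
{
	size_t idx = 0;

	if (!demo_find(demo_values, 3, key, &idx))
		return -1;
	return (long)idx;
}
```

A caller that does not care about the index simply passes NULL, which matches what the patch does in follow_page().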



Re: [PATCH 2/2] drm/vc4: Squash commit for Mario's precise vblank timestamping.

2016-07-09 Thread Mario Kleiner

Hi Eric,

thanks for all the infos and help! Both your patches look good and i 
have successfully tested them on top of my vblank timestamping patch.


So for both:

Reviewed-and-tested-by: Mario Kleiner 

Will you squash 2/2 into my patch or should i resend my patch with yours 
squashed in?


thanks,
-mario

On 07/08/2016 08:44 PM, Eric Anholt wrote:

Read out the DISPBASE registers to decide on the FIFO size.

Signed-off-by: Eric Anholt 
---

Mario: How about this for a squash into your commit?  Here are the
values I dumped for cob_size:

[2.148314] [drm] Scaler 0 size 5232
[2.162239] [drm] Scaler 2 size 2048
[2.172957] [drm] Scaler 1 size 13456

  drivers/gpu/drm/vc4/vc4_crtc.c | 23 +--
  drivers/gpu/drm/vc4/vc4_regs.h | 18 +-
  2 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/vc4/vc4_crtc.c b/drivers/gpu/drm/vc4/vc4_crtc.c
index baf962bce063..3b7db17c356d 100644
--- a/drivers/gpu/drm/vc4/vc4_crtc.c
+++ b/drivers/gpu/drm/vc4/vc4_crtc.c
@@ -55,6 +55,8 @@ struct vc4_crtc {
u8 lut_r[256];
u8 lut_g[256];
u8 lut_b[256];
+   /* Size in pixels of the COB memory allocated to this CRTC. */
+   u32 cob_size;

struct drm_pending_vblank_event *event;
  };
@@ -195,8 +197,7 @@ int vc4_crtc_get_scanoutpos(struct drm_device *dev, 
unsigned int crtc_id,
*hpos = 0;

/* This is the offset we need for translating hvs -> pv scanout pos. */
-   /* XXX Find proper formula from hw docs instead of guesstimating? */
-   fifo_lines = 2048 * 7 / mode->crtc_hdisplay;
+   fifo_lines = vc4_crtc->cob_size / mode->crtc_hdisplay;

if (fifo_lines > 0)
ret |= DRM_SCANOUTPOS_VALID;
@@ -873,6 +874,22 @@ static void vc4_set_crtc_possible_masks(struct drm_device 
*drm,
}
  }

+static void
+vc4_crtc_get_cob_allocation(struct vc4_crtc *vc4_crtc)
+{
+   struct drm_device *drm = vc4_crtc->base.dev;
+   struct vc4_dev *vc4 = to_vc4_dev(drm);
+   u32 dispbase = HVS_READ(SCALER_DISPBASEX(vc4_crtc->channel));
+   /* Top/base are supposed to be 4-pixel aligned, but the
+* Raspberry Pi firmware fills the low bits (which are
+* presumably ignored).
+*/
+   u32 top = VC4_GET_FIELD(dispbase, SCALER_DISPBASEX_TOP) & ~3;
+   u32 base = VC4_GET_FIELD(dispbase, SCALER_DISPBASEX_BASE) & ~3;
+
+   vc4_crtc->cob_size = top - base + 4;
+}
+
  static int vc4_crtc_bind(struct device *dev, struct device *master, void 
*data)
  {
struct platform_device *pdev = to_platform_device(dev);
@@ -949,6 +966,8 @@ static int vc4_crtc_bind(struct device *dev, struct device 
*master, void *data)
crtc->cursor = cursor_plane;
}

+   vc4_crtc_get_cob_allocation(vc4_crtc);
+
CRTC_WRITE(PV_INTEN, 0);
CRTC_WRITE(PV_INTSTAT, PV_INT_VFP_START);
ret = devm_request_irq(dev, platform_get_irq(pdev, 0),
diff --git a/drivers/gpu/drm/vc4/vc4_regs.h b/drivers/gpu/drm/vc4/vc4_regs.h
index 63cdc28ff7bb..160942a9180e 100644
--- a/drivers/gpu/drm/vc4/vc4_regs.h
+++ b/drivers/gpu/drm/vc4/vc4_regs.h
@@ -366,7 +366,6 @@
  # define SCALER_DISPBKGND_FILL            BIT(24)

  #define SCALER_DISPSTAT0                0x0048
-#define SCALER_DISPBASE0                0x004c
  # define SCALER_DISPSTATX_MODE_MASK   VC4_MASK(31, 30)
  # define SCALER_DISPSTATX_MODE_SHIFT  30
  # define SCALER_DISPSTATX_MODE_DISABLED   0
@@ -379,6 +378,20 @@
  # define SCALER_DISPSTATX_FRAME_COUNT_SHIFT   12
  # define SCALER_DISPSTATX_LINE_MASK   VC4_MASK(11, 0)
  # define SCALER_DISPSTATX_LINE_SHIFT  0
+
+#define SCALER_DISPBASE0                0x004c
+/* Last pixel in the COB (display FIFO memory) allocated to this HVS
+ * channel.  Must be 4-pixel aligned (and thus 4 pixels less than the
+ * next COB base).
+ */
+# define SCALER_DISPBASEX_TOP_MASK VC4_MASK(31, 16)
+# define SCALER_DISPBASEX_TOP_SHIFT    16
+/* First pixel in the COB (display FIFO memory) allocated to this HVS
+ * channel.  Must be 4-pixel aligned.
+ */
+# define SCALER_DISPBASEX_BASE_MASK    VC4_MASK(15, 0)
+# define SCALER_DISPBASEX_BASE_SHIFT   0
+
  #define SCALER_DISPCTRL1                0x0050
  #define SCALER_DISPBKGND1   0x0054
  #define SCALER_DISPBKGNDX(x)  (SCALER_DISPBKGND0 +\
@@ -389,6 +402,9 @@
 (x) * (SCALER_DISPSTAT1 - \
SCALER_DISPSTAT0))
  #define SCALER_DISPBASE1                0x005c
+#define SCALER_DISPBASEX(x)  (SCALER_DISPBASE0 +\
+(x) * (SCALER_DISPBASE1 - \
+   SCALER_DISPBASE0))
  #define SCALER_DISPCTRL2
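
As a rough illustration of the COB size computation in the patch above,
here is a user-space sketch. The helper names cob_size_from_dispbase() and
fifo_lines(), the local macro names, and the register value in the note
below are assumptions made up for this example. Only the field layout (TOP
in bits 31:16, BASE in bits 15:0), the masking of the low two bits, and the
"top - base + 4" formula come from the patch itself.

```c
#include <assert.h>
#include <stdint.h>

/* Field layout taken from the patch: TOP is bits 31:16, BASE is bits 15:0. */
#define DISPBASEX_TOP_SHIFT  16
#define DISPBASEX_TOP_MASK   0xffff0000u
#define DISPBASEX_BASE_SHIFT 0
#define DISPBASEX_BASE_MASK  0x0000ffffu

/* Hypothetical helper: COB size in pixels for one HVS channel. */
static uint32_t cob_size_from_dispbase(uint32_t dispbase)
{
	/* Drop the low two bits, which the firmware may leave set. */
	uint32_t top = ((dispbase & DISPBASEX_TOP_MASK) >> DISPBASEX_TOP_SHIFT) & ~3u;
	uint32_t base = ((dispbase & DISPBASEX_BASE_MASK) >> DISPBASEX_BASE_SHIFT) & ~3u;

	/* TOP names the last 4-pixel group, so the size includes it. */
	return top - base + 4;
}

/*
 * Hypothetical helper: number of scanlines that fit in the FIFO.
 * The zero check is an addition for this sketch, not part of the patch.
 */
static uint32_t fifo_lines(uint32_t cob_size, uint32_t hdisplay)
{
	return hdisplay ? cob_size / hdisplay : 0;
}
```

With a made-up register value of 0x146f0003, cob_size_from_dispbase()
gives 5232 pixels, the same size as the "Scaler 0" line in the dump above,
and fifo_lines(5232, 1920) gives 2.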

Re: [kernel-hardening] Re: [PATCH 9/9] mm: SLUB hardened usercopy support

2016-07-09 Thread Kees Cook
On Fri, Jul 8, 2016 at 11:17 PM,   wrote:
> Yeah, 'ping' dies with a similar traceback going to rawv6_setsockopt(),
> and 'trinity' dies a horrid death during initialization because it creates
> some sctp sockets to fool around with.  The problem in all these cases is that
> setsockopt uses copy_from_user() to pull in the option value, and the 
> allocation
> isn't tagged with USERCOPY to whitelist it.

Just a note to clear up confusion: this series doesn't include the
whitelist protection, so this appears to be either bugs in the slub
checker or bugs in the code using the cfq_io_cq cache. I suspect the
former. :)

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security


Re: [v1] ErrHandling:Make IS_ERR_VALUE_U32 as generic API to avoid IS_ERR_VALUE abuses.

2016-07-09 Thread arvind Yadav

Hi,
I have submitted a version 2 of the patch. Please test that one and 
share your results with us.


Thanks,
Arvind yadav

On Saturday 09 July 2016 10:08 PM, kbuild test robot wrote:

Hi,

[auto build test WARNING on v4.7-rc6]
[also build test WARNING on next-20160708]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Arvind-Yadav/ErrHandling-Make-IS_ERR_VALUE_U32-as-generic-API-to-avoid-IS_ERR_VALUE-abuses/20160709-235356
config: x86_64-rhel (attached as .config)
compiler: gcc-4.9 (Debian 4.9.3-14) 4.9.3
reproduce:
 # save the attached .config to linux build tree
 make ARCH=x86_64

All warnings (new ones prefixed by >>):

In file included from include/uapi/linux/stddef.h:1:0,
 from include/linux/stddef.h:4,
 from include/uapi/linux/posix_types.h:4,
 from include/uapi/linux/types.h:13,
 from include/linux/types.h:5,
 from include/linux/mod_devicetable.h:11,
 from include/linux/pci.h:20,
 from include/linux/bcma/bcma.h:4,
 from drivers/bcma/bcma_private.h:8,
 from drivers/bcma/scan.c:9:
drivers/bcma/scan.c: In function 'bcma_get_next_core':
include/linux/err.h:23:52: warning: cast to pointer from integer of 
different size [-Wint-to-pointer-cast]
 #define IS_ERR_VALUE_U32(x) unlikely((unsigned int)(void *)(x) >= 
(unsigned int)-MAX_ERRNO)
^
include/linux/compiler.h:170:42: note: in definition of macro 'unlikely'
 # define unlikely(x) __builtin_expect(!!(x), 0)
  ^

drivers/bcma/scan.c:361:18: note: in expansion of macro 'IS_ERR_VALUE_U32'

  if (tmp == 0 || IS_ERR_VALUE_U32(tmp)) {
  ^

include/linux/err.h:23:38: warning: cast from pointer to integer of different 
size [-Wpointer-to-int-cast]

 #define IS_ERR_VALUE_U32(x) unlikely((unsigned int)(void *)(x) >= 
(unsigned int)-MAX_ERRNO)
  ^
include/linux/compiler.h:170:42: note: in definition of macro 'unlikely'
 # define unlikely(x) __builtin_expect(!!(x), 0)
  ^

drivers/bcma/scan.c:361:18: note: in expansion of macro 'IS_ERR_VALUE_U32'

  if (tmp == 0 || IS_ERR_VALUE_U32(tmp)) {
  ^
include/linux/err.h:23:52: warning: cast to pointer from integer of 
different size [-Wint-to-pointer-cast]
 #define IS_ERR_VALUE_U32(x) unlikely((unsigned int)(void *)(x) >= 
(unsigned int)-MAX_ERRNO)
^
include/linux/compiler.h:170:42: note: in definition of macro 'unlikely'
 # define unlikely(x) __builtin_expect(!!(x), 0)
  ^
drivers/bcma/scan.c:365:19: note: in expansion of macro 'IS_ERR_VALUE_U32'
   if (tmp == 0 || IS_ERR_VALUE_U32(tmp)) {
   ^

include/linux/err.h:23:38: warning: cast from pointer to integer of different 
size [-Wpointer-to-int-cast]

 #define IS_ERR_VALUE_U32(x) unlikely((unsigned int)(void *)(x) >= 
(unsigned int)-MAX_ERRNO)
  ^
include/linux/compiler.h:170:42: note: in definition of macro 'unlikely'
 # define unlikely(x) __builtin_expect(!!(x), 0)
  ^
drivers/bcma/scan.c:365:19: note: in expansion of macro 'IS_ERR_VALUE_U32'
   if (tmp == 0 || IS_ERR_VALUE_U32(tmp)) {
   ^
include/linux/err.h:23:52: warning: cast to pointer from integer of 
different size [-Wint-to-pointer-cast]
 #define IS_ERR_VALUE_U32(x) unlikely((unsigned int)(void *)(x) >= 
(unsigned int)-MAX_ERRNO)
^
include/linux/compiler.h:170:42: note: in definition of macro 'unlikely'
 # define unlikely(x) __builtin_expect(!!(x), 0)
  ^
drivers/bcma/scan.c:380:8: note: in expansion of macro 'IS_ERR_VALUE_U32'
if (IS_ERR_VALUE_U32(tmp)) {
^

include/linux/err.h:23:38: warning: cast from pointer to integer of different 
size [-Wpointer-to-int-cast]

 #define IS_ERR_VALUE_U32(x) unlikely((unsigned int)(void *)(x) >= 
(unsigned int)-MAX_ERRNO)
  ^
include/linux/compiler.h:170:42: note: in definition of macro 'unlikely'
 # define unlikely(x) __builtin_expect(!!(x), 0)
  ^
drivers/bcma/scan.c:380:8: note: in expansion of macro 'IS_ERR_VALUE_U32'
if (IS_ERR_VALUE_U32(tmp)) {
   

Re: [PATCH 0/9] mm: Hardened usercopy

2016-07-09 Thread Kees Cook
On Sat, Jul 9, 2016 at 1:25 AM, Ard Biesheuvel
 wrote:
> On 9 July 2016 at 04:22, Laura Abbott  wrote:
>> On 07/06/2016 03:25 PM, Kees Cook wrote:
>>>
>>> Hi,
>>>
>>> This is a start of the mainline port of PAX_USERCOPY[1]. After I started
>>> writing tests (now in lkdtm in -next) for Casey's earlier port[2], I
>>> kept tweaking things further and further until I ended up with a whole
>>> new patch series. To that end, I took Rik's feedback and made a number
>>> of other changes and clean-ups as well.
>>>
>>> Based on my understanding, PAX_USERCOPY was designed to catch a few
>>> classes of flaws around the use of copy_to_user()/copy_from_user(). These
>>> changes don't touch get_user() and put_user(), since these operate on
>>> constant sized lengths, and tend to be much less vulnerable. There
>>> are effectively three distinct protections in the whole series,
>>> each of which I've given a separate CONFIG, though this patch set is
>>> only the first of the three intended protections. (Generally speaking,
>>> PAX_USERCOPY covers what I'm calling CONFIG_HARDENED_USERCOPY (this) and
>>> CONFIG_HARDENED_USERCOPY_WHITELIST (future), and PAX_USERCOPY_SLABS covers
>>> CONFIG_HARDENED_USERCOPY_SPLIT_KMALLOC (future).)
>>>
>>> This series, which adds CONFIG_HARDENED_USERCOPY, checks that objects
>>> being copied to/from userspace meet certain criteria:
>>> - if address is a heap object, the size must not exceed the object's
>>>   allocated size. (This will catch all kinds of heap overflow flaws.)
>>> - if address range is in the current process stack, it must be within the
>>>   current stack frame (if such checking is possible) or at least entirely
>>>   within the current process's stack. (This could catch large lengths that
>>>   would have extended beyond the current process stack, or overflows if
>>>   their length extends back into the original stack.)
>>> - if the address range is part of kernel data, rodata, or bss, allow it.
>>> - if address range is page-allocated, that it doesn't span multiple
>>>   allocations.
>>> - if address is within the kernel text, reject it.
>>> - everything else is accepted
>>>
>>> The patches in the series are:
>>> - The core copy_to/from_user() checks, without the slab object checks:
>>> 1- mm: Hardened usercopy
>>> - Per-arch enablement of the protection:
>>> 2- x86/uaccess: Enable hardened usercopy
>>> 3- ARM: uaccess: Enable hardened usercopy
>>> 4- arm64/uaccess: Enable hardened usercopy
>>> 5- ia64/uaccess: Enable hardened usercopy
>>> 6- powerpc/uaccess: Enable hardened usercopy
>>> 7- sparc/uaccess: Enable hardened usercopy
>>> - The heap allocator implementation of object size checking:
>>> 8- mm: SLAB hardened usercopy support
>>> 9- mm: SLUB hardened usercopy support
>>>
>>> Some notes:
>>>
>>> - This is expected to apply on top of -next which contains fixes for the
>>>   position of _etext on both arm and arm64.
>>>
>>> - I couldn't detect a measurable performance change with these features
>>>   enabled. Kernel build times were unchanged, hackbench was unchanged,
>>>   etc. I think we could flip this to "on by default" at some point.
>>>
>>> - The SLOB support extracted from grsecurity seems entirely broken. I
>>>   have no idea what's going on there, I spent my time testing SLAB and
>>>   SLUB. Having someone else look at SLOB would be nice, but this series
>>>   doesn't depend on it.
>>>
>>> Additional features that would be nice, but aren't blocking this series:
>>>
>>> - Needs more architecture support for stack frame checking (only x86 now).
>>>
>>>
>>
>> Even with the SLUB fixup I'm still seeing this blow up on my arm64 system.
>> This is a
>> Fedora rawhide kernel + the patches
>>
>> [ 0.666700] usercopy: kernel memory exposure attempt detected from
>> fc0008b4dd58 () (8 bytes)
>> [ 0.666720] CPU: 2 PID: 79 Comm: modprobe Tainted: GW
>> 4.7.0-0.rc6.git1.1.hardenedusercopy.fc25.aarch64 #1
>> [ 0.666733] Hardware name: AppliedMicro Mustang/Mustang, BIOS 1.1.0 Nov 24
>> 2015
>> [ 0.666744] Call trace:
>> [ 0.666756] [] dump_backtrace+0x0/0x1e8
>> [ 0.666765] [] show_stack+0x24/0x30
>> [ 0.666775] [] dump_stack+0xa4/0xe0
>> [ 0.666785] [] __check_object_size+0x6c/0x230
>> [ 0.666795] [] create_elf_tables+0x74/0x420
>> [ 0.666805] [] load_elf_binary+0x828/0xb70
>> [ 0.666814] [] search_binary_handler+0xb4/0x240
>> [ 0.666823] [] do_execveat_common+0x63c/0x950
>> [ 0.666832] [] do_execve+0x3c/0x50
>> [ 0.666841] [] call_usermodehelper_exec_async+0xe8/0x148
>> [ 0.666850] [] ret_from_fork+0x10/0x50
>>
>> This happens on every call to execve. This seems to be the first
>> copy_to_user in
>> create_elf_tables. I didn't get a chance to debug and I'm going out of town
>> all of next week so all I have is the report unfortunately. config attached.
>>
>
> This is a known issue, and a fix is already queued for v4.8 in the arm64 tree:
>
> 9fdc14c55c arm64: mm: fix location of _etext [0]
>

[v2] ErrHandling:Make IS_ERR_VALUE_U32 as generic API to avoid IS_ERR_VALUE abuses.

2016-07-09 Thread Arvind Yadav
IS_ERR_VALUE() assumes that its parameter is an unsigned long, so it
cannot be used to check whether an 'unsigned int' reflects an error.
Passing a signed 'int' into it only happens to work because the value
is sign-extended on 64-bit architectures before it gets converted
into an unsigned type.

However, anything that passes an 'unsigned short' or 'unsigned int'
argument into IS_ERR_VALUE() is guaranteed to be broken, as are
8-bit integers and types that are wider than 'unsigned long'.

Move IS_ERR_VALUE_U32() from drivers/bcma/scan.c into
include/linux/err.h so that any user passing an 'unsigned int'
argument has a generic check available.

Signed-off-by: Arvind Yadav 
---
 drivers/bcma/scan.c | 1 -
 include/linux/err.h | 2 ++
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/bcma/scan.c b/drivers/bcma/scan.c
index 4a2d1b2..3bc77eb 100644
--- a/drivers/bcma/scan.c
+++ b/drivers/bcma/scan.c
@@ -272,7 +272,6 @@ static struct bcma_device *bcma_find_core_reverse(struct 
bcma_bus *bus, u16 core
return NULL;
 }
 
-#define IS_ERR_VALUE_U32(x) ((x) >= (u32)-MAX_ERRNO)
 
 static int bcma_get_next_core(struct bcma_bus *bus, u32 __iomem **eromptr,
  struct bcma_device_id *match, int core_num,
diff --git a/include/linux/err.h b/include/linux/err.h
index 1e35588..e05a63d 100644
--- a/include/linux/err.h
+++ b/include/linux/err.h
@@ -20,6 +20,8 @@
 
 #define IS_ERR_VALUE(x) unlikely((unsigned long)(void *)(x) >= (unsigned 
long)-MAX_ERRNO)
 
+#define IS_ERR_VALUE_U32(x) unlikely((unsigned int)(x) >= (unsigned 
int)-MAX_ERRNO)
+
 static inline void * __must_check ERR_PTR(long error)
 {
return (void *) error;
-- 
1.9.1
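
The width problem described in the commit message can be reproduced in
user space. The sketch below is only an illustration: the function names
and failed_read_u32() are hypothetical, the kernel's unlikely() hint is
left out, and the last observation in the note assumes a platform where
'unsigned long' is 64 bits wide.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Mirrors the unsigned long comparison done by IS_ERR_VALUE(). */
static int is_err_value_ulong(unsigned long x)
{
	return x >= (unsigned long)-MAX_ERRNO;
}

/* Mirrors the 32-bit comparison done by IS_ERR_VALUE_U32(). */
static int is_err_value_u32(uint32_t x)
{
	return x >= (uint32_t)-MAX_ERRNO;
}

/* Hypothetical helper: a 32-bit result carrying the error code -22. */
static uint32_t failed_read_u32(void)
{
	return (uint32_t)-22;
}
```

Here is_err_value_u32(failed_read_u32()) is true, and a signed -22 widened
to 'unsigned long' is also recognised by is_err_value_ulong(). The same
32-bit value passed to is_err_value_ulong() is zero-extended rather than
sign-extended, so with a 64-bit 'unsigned long' the error goes undetected.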



Re: [PATCH 0/9] mm: Hardened usercopy

2016-07-09 Thread Kees Cook
On Fri, Jul 8, 2016 at 7:22 PM, Laura Abbott  wrote:
> On 07/06/2016 03:25 PM, Kees Cook wrote:
>>
>> Hi,
>>
>> This is a start of the mainline port of PAX_USERCOPY[1]. After I started
>> writing tests (now in lkdtm in -next) for Casey's earlier port[2], I
>> kept tweaking things further and further until I ended up with a whole
>> new patch series. To that end, I took Rik's feedback and made a number
>> of other changes and clean-ups as well.
>>
>> Based on my understanding, PAX_USERCOPY was designed to catch a few
>> classes of flaws around the use of copy_to_user()/copy_from_user(). These
>> changes don't touch get_user() and put_user(), since these operate on
>> constant sized lengths, and tend to be much less vulnerable. There
>> are effectively three distinct protections in the whole series,
>> each of which I've given a separate CONFIG, though this patch set is
>> only the first of the three intended protections. (Generally speaking,
>> PAX_USERCOPY covers what I'm calling CONFIG_HARDENED_USERCOPY (this) and
>> CONFIG_HARDENED_USERCOPY_WHITELIST (future), and PAX_USERCOPY_SLABS covers
>> CONFIG_HARDENED_USERCOPY_SPLIT_KMALLOC (future).)
>>
>> This series, which adds CONFIG_HARDENED_USERCOPY, checks that objects
>> being copied to/from userspace meet certain criteria:
>> - if address is a heap object, the size must not exceed the object's
>>   allocated size. (This will catch all kinds of heap overflow flaws.)
>> - if address range is in the current process stack, it must be within the
>>   current stack frame (if such checking is possible) or at least entirely
>>   within the current process's stack. (This could catch large lengths that
>>   would have extended beyond the current process stack, or overflows if
>>   their length extends back into the original stack.)
>> - if the address range is part of kernel data, rodata, or bss, allow it.
>> - if address range is page-allocated, that it doesn't span multiple
>>   allocations.
>> - if address is within the kernel text, reject it.
>> - everything else is accepted
>>
>> The patches in the series are:
>> - The core copy_to/from_user() checks, without the slab object checks:
>> 1- mm: Hardened usercopy
>> - Per-arch enablement of the protection:
>> 2- x86/uaccess: Enable hardened usercopy
>> 3- ARM: uaccess: Enable hardened usercopy
>> 4- arm64/uaccess: Enable hardened usercopy
>> 5- ia64/uaccess: Enable hardened usercopy
>> 6- powerpc/uaccess: Enable hardened usercopy
>> 7- sparc/uaccess: Enable hardened usercopy
>> - The heap allocator implementation of object size checking:
>> 8- mm: SLAB hardened usercopy support
>> 9- mm: SLUB hardened usercopy support
>>
>> Some notes:
>>
>> - This is expected to apply on top of -next which contains fixes for the
>>   position of _etext on both arm and arm64.
>>
>> - I couldn't detect a measurable performance change with these features
>>   enabled. Kernel build times were unchanged, hackbench was unchanged,
>>   etc. I think we could flip this to "on by default" at some point.
>>
>> - The SLOB support extracted from grsecurity seems entirely broken. I
>>   have no idea what's going on there, I spent my time testing SLAB and
>>   SLUB. Having someone else look at SLOB would be nice, but this series
>>   doesn't depend on it.
>>
>> Additional features that would be nice, but aren't blocking this series:
>>
>> - Needs more architecture support for stack frame checking (only x86 now).
>>
>>
>
> Even with the SLUB fixup I'm still seeing this blow up on my arm64 system.
> This is a
> Fedora rawhide kernel + the patches

Is this on top of -next? The recent _etext change ("arm64: mm: fix
location of _etext") is needed to fix the kernel text test for arm64.

-Kees

>
> [0.666700] usercopy: kernel memory exposure attempt detected from
> fc0008b4dd58 () (8 bytes)
> [0.666720] CPU: 2 PID: 79 Comm: modprobe Tainted: GW
> 4.7.0-0.rc6.git1.1.hardenedusercopy.fc25.aarch64 #1
> [0.666733] Hardware name: AppliedMicro Mustang/Mustang, BIOS 1.1.0 Nov
> 24 2015
> [0.666744] Call trace:
> [0.666756] [] dump_backtrace+0x0/0x1e8
> [0.666765] [] show_stack+0x24/0x30
> [0.666775] [] dump_stack+0xa4/0xe0
> [0.666785] [] __check_object_size+0x6c/0x230
> [0.666795] [] create_elf_tables+0x74/0x420
> [0.666805] [] load_elf_binary+0x828/0xb70
> [0.666814] [] search_binary_handler+0xb4/0x240
> [0.666823] [] do_execveat_common+0x63c/0x950
> [0.666832] [] do_execve+0x3c/0x50
> [0.666841] []
> call_usermodehelper_exec_async+0xe8/0x148
> [0.666850] [] ret_from_fork+0x10/0x50
>
> This happens on every call to execve. This seems to be the first
> copy_to_user in
> create_elf_tables. I didn't get a chance to debug and I'm going out of town
> all of next week so all I have is the report unfortunately. config attached.
>
> Thanks,
> Laura



-- 
Kees Cook
Chrome OS & Brillo Security


Re: [PATCH] kbuild: Abort build on bad stack protector flag

2016-07-09 Thread Kees Cook
On Sat, Jul 9, 2016 at 5:03 AM, Ingo Molnar  wrote:
>
> * Kees Cook  wrote:
>
>> Before, the stack protector flag was sanity checked before .config had
>> been reprocessed. This meant the build couldn't be aborted early, and
>> only a warning could be emitted followed later by the compiler blowing
>> up with an unknown flag. This has caused a lot of confusion over time,
>> so this splits the flag selection from sanity checking and performs the
>> sanity checking after the make has been restarted from a reprocessed
>> .config, so builds can be aborted as early as possible now.
>>
>> Additionally moves the x86-specific sanity check to the same location,
>> since it suffered from the same warn-then-wait-for-compiler-failure
>> problem.
>>
>> Signed-off-by: Kees Cook 
>> ---
>>  Makefile  | 69 
>> +--
>>  arch/x86/Makefile |  8 ---
>>  2 files changed, 42 insertions(+), 35 deletions(-)
>
> What's the status of this patch? I can merge it if Michal acks the main 
> Makefile
> bits.

There's been no feedback yet, but I'd really like to see it landed: it
removes a lot of ambiguity for this option (and creates a place for
future similar options).

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security


Re: [v1] ErrHandling:Make IS_ERR_VALUE_U32 as generic API to avoid IS_ERR_VALUE abuses.

2016-07-09 Thread kbuild test robot
Hi,

[auto build test WARNING on v4.7-rc6]
[also build test WARNING on next-20160708]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Arvind-Yadav/ErrHandling-Make-IS_ERR_VALUE_U32-as-generic-API-to-avoid-IS_ERR_VALUE-abuses/20160709-235356
config: x86_64-rhel (attached as .config)
compiler: gcc-4.9 (Debian 4.9.3-14) 4.9.3
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   In file included from include/uapi/linux/stddef.h:1:0,
from include/linux/stddef.h:4,
from include/uapi/linux/posix_types.h:4,
from include/uapi/linux/types.h:13,
from include/linux/types.h:5,
from include/linux/mod_devicetable.h:11,
from include/linux/pci.h:20,
from include/linux/bcma/bcma.h:4,
from drivers/bcma/bcma_private.h:8,
from drivers/bcma/scan.c:9:
   drivers/bcma/scan.c: In function 'bcma_get_next_core':
   include/linux/err.h:23:52: warning: cast to pointer from integer of 
different size [-Wint-to-pointer-cast]
#define IS_ERR_VALUE_U32(x) unlikely((unsigned int)(void *)(x) >= (unsigned 
int)-MAX_ERRNO)
   ^
   include/linux/compiler.h:170:42: note: in definition of macro 'unlikely'
# define unlikely(x) __builtin_expect(!!(x), 0)
 ^
>> drivers/bcma/scan.c:361:18: note: in expansion of macro 'IS_ERR_VALUE_U32'
 if (tmp == 0 || IS_ERR_VALUE_U32(tmp)) {
 ^
>> include/linux/err.h:23:38: warning: cast from pointer to integer of 
>> different size [-Wpointer-to-int-cast]
#define IS_ERR_VALUE_U32(x) unlikely((unsigned int)(void *)(x) >= (unsigned 
int)-MAX_ERRNO)
 ^
   include/linux/compiler.h:170:42: note: in definition of macro 'unlikely'
# define unlikely(x) __builtin_expect(!!(x), 0)
 ^
>> drivers/bcma/scan.c:361:18: note: in expansion of macro 'IS_ERR_VALUE_U32'
 if (tmp == 0 || IS_ERR_VALUE_U32(tmp)) {
 ^
   include/linux/err.h:23:52: warning: cast to pointer from integer of 
different size [-Wint-to-pointer-cast]
#define IS_ERR_VALUE_U32(x) unlikely((unsigned int)(void *)(x) >= (unsigned 
int)-MAX_ERRNO)
   ^
   include/linux/compiler.h:170:42: note: in definition of macro 'unlikely'
# define unlikely(x) __builtin_expect(!!(x), 0)
 ^
   drivers/bcma/scan.c:365:19: note: in expansion of macro 'IS_ERR_VALUE_U32'
  if (tmp == 0 || IS_ERR_VALUE_U32(tmp)) {
  ^
>> include/linux/err.h:23:38: warning: cast from pointer to integer of 
>> different size [-Wpointer-to-int-cast]
#define IS_ERR_VALUE_U32(x) unlikely((unsigned int)(void *)(x) >= (unsigned 
int)-MAX_ERRNO)
 ^
   include/linux/compiler.h:170:42: note: in definition of macro 'unlikely'
# define unlikely(x) __builtin_expect(!!(x), 0)
 ^
   drivers/bcma/scan.c:365:19: note: in expansion of macro 'IS_ERR_VALUE_U32'
  if (tmp == 0 || IS_ERR_VALUE_U32(tmp)) {
  ^
   include/linux/err.h:23:52: warning: cast to pointer from integer of 
different size [-Wint-to-pointer-cast]
#define IS_ERR_VALUE_U32(x) unlikely((unsigned int)(void *)(x) >= (unsigned 
int)-MAX_ERRNO)
   ^
   include/linux/compiler.h:170:42: note: in definition of macro 'unlikely'
# define unlikely(x) __builtin_expect(!!(x), 0)
 ^
   drivers/bcma/scan.c:380:8: note: in expansion of macro 'IS_ERR_VALUE_U32'
   if (IS_ERR_VALUE_U32(tmp)) {
   ^
>> include/linux/err.h:23:38: warning: cast from pointer to integer of 
>> different size [-Wpointer-to-int-cast]
#define IS_ERR_VALUE_U32(x) unlikely((unsigned int)(void *)(x) >= (unsigned 
int)-MAX_ERRNO)
 ^
   include/linux/compiler.h:170:42: note: in definition of macro 'unlikely'
# define unlikely(x) __builtin_expect(!!(x), 0)
 ^
   drivers/bcma/scan.c:380:8: note: in expansion of macro 'IS_ERR_VALUE_U32'
   if (IS_ERR_VALUE_U32(tmp)) {
   ^
   include/linux/err.h:23:52: warning: cast to pointer from integer of 
different size [-Wint-to-pointer-cast]
#define IS_ERR_VALUE_U32(x) unlikely((unsigned int)(void *)(x) >= (unsigned 
int)-MAX_ERRNO)

Re: [PATCH] capabilities: add capability cgroup controller

2016-07-09 Thread Topi Miettinen
On 07/08/16 09:13, Petr Mladek wrote:
> On Thu 2016-07-07 20:27:13, Topi Miettinen wrote:
>> On 07/07/16 09:16, Petr Mladek wrote:
>>> On Sun 2016-07-03 15:08:07, Topi Miettinen wrote:
>>>> The attached patch would make any uses of capabilities generate audit
>>>> messages. It works for simple tests as you can see from the commit
>>>> message, but unfortunately the call to audit_cgroup_list() deadlocks the
>>>> system when booting a full blown OS. There's no deadlock when the call
>>>> is removed.
>>>>
>>>> I guess that in some cases, cgroup_mutex and/or css_set_lock could be
>>>> already held earlier before entering audit_cgroup_list(). Holding the
>>>> locks is however required by task_cgroup_from_root(). Is there any way
>>>> to avoid this? For example, only print some kind of cgroup ID numbers
>>>> (are there unique and stable IDs, available without locks?) for those
>>>> cgroups where the task is registered in the audit message?
>>>
>>> I am not sure if anyone knows what really happens here. I suggest
>>> enabling lockdep. It might detect a possible deadlock even before it
>>> really happens, see Documentation/locking/lockdep-design.txt
>>>
>>> It can be enabled by
>>>
>>>CONFIG_PROVE_LOCKING=y
>>>
>>> It depends on
>>>
>>> CONFIG_DEBUG_KERNEL=y
>>>
>>> and maybe some more options, see lib/Kconfig.debug
>>
>> Thanks a lot! I caught this stack dump:
>>
>> starting version 230
>> [3.416647] [ cut here ]
>> [3.417310] WARNING: CPU: 0 PID: 95 at
>> /home/topi/d/linux.git/kernel/locking/lockdep.c:2871
>> lockdep_trace_alloc+0xb4/0xc0
>> [3.417605] DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags))
>> [3.417923] Modules linked in:
>> [3.418288] CPU: 0 PID: 95 Comm: systemd-udevd Not tainted 4.7.0-rc5+ #97
>> [3.418444] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
>> BIOS Debian-1.8.2-1 04/01/2014
>> [3.418726]  0086 7970f3b0 8816fb00
>> 813c9c45
>> [3.418993]  8816fb50  8816fb40
>> 81091e9b
>> [3.419176]  0b3705e2c798 0046 0410
>> 
>> [3.419374] Call Trace:
>> [3.419511]  [] dump_stack+0x67/0x92
>> [3.419644]  [] __warn+0xcb/0xf0
>> [3.419745]  [] warn_slowpath_fmt+0x5f/0x80
>> [3.419868]  [] lockdep_trace_alloc+0xb4/0xc0
>> [3.419988]  [] kmem_cache_alloc_node+0x42/0x600
>> [3.420156]  [] ? debug_lockdep_rcu_enabled+0x1d/0x20
>> [3.420170]  [] __alloc_skb+0x5b/0x1d0
>> [3.420170]  [] audit_log_start+0x29b/0x480
>> [3.420170]  [] ? __lock_task_sighand+0x95/0x270
>> [3.420170]  [] audit_log_cap_use+0x39/0xf0
>> [3.420170]  [] ns_capable+0x45/0x70
>> [3.420170]  [] capable+0x17/0x20
>> [3.420170]  [] oom_score_adj_write+0x150/0x2f0
>> [3.420170]  [] __vfs_write+0x37/0x160
>> [3.420170]  [] ? update_fast_ctr+0x17/0x30
>> [3.420170]  [] ? percpu_down_read+0x49/0x90
>> [3.420170]  [] ? __sb_start_write+0xb7/0xf0
>> [3.420170]  [] ? __sb_start_write+0xb7/0xf0
>> [3.420170]  [] vfs_write+0xb8/0x1b0
>> [3.420170]  [] ? __fget_light+0x66/0x90
>> [3.420170]  [] SyS_write+0x58/0xc0
>> [3.420170]  [] do_syscall_64+0x5c/0x300
>> [3.420170]  [] entry_SYSCALL64_slow_path+0x25/0x25
>> [3.420170] ---[ end trace fb586899fb556a5e ]---
>> [3.447922] random: systemd-udevd urandom read with 3 bits of entropy
>> available
>> [4.014078] clocksource: Switched to clocksource tsc
>> Begin: Loading essential drivers ... done.
>>
>> This is with qemu and the boot continues normally. With real computer,
>> there's no such output and system just seems to freeze.
>>
>> Could it be possible that the deadlock happens because there's some IO
>> towards /sys/fs/cgroup, which causes a capability check and that in turn
>> causes locking problems when we try to print cgroup list?
> 
> The above warning is printed by the code from
> kernel/locking/lockdep.c:2871
> 
> static void __lockdep_trace_alloc(gfp_t gfp_mask, unsigned long flags)
> {
> [...]
>   /* We're only interested __GFP_FS allocations for now */
>   if (!(gfp_mask & __GFP_FS))
>   return;
> 
>   /*
>* Oi! Can't be having __GFP_FS allocations with IRQs disabled.
>*/
>   if (DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags)))
>   return;
> 
> 
> The backtrace shows that your new audit_log_cap_use() is called
> from vfs_write(). You might try to use audit_log_start() with
> GFP_NOFS instead of GFP_KERNEL.
> 
> Note that this is rather intuitive advice. I still need to learn a lot
> about memory management and kernel in general to be more sure about
> a correct solution.

Here's what I got now:

[   18.043181]
[   18.044123] ==
[   18.044123] [ INFO: possible circular locking dependency detected ]
[   18.044123] 4.7.0-rc5+ #99 Not tainted
[   18.044123] 

[PATCH] include: mman: Use bool instead of int for the return value of arch_validate_prot

2016-07-09 Thread chengang
From: Chen Gang 

For a function whose return value is purely boolean, bool is a slightly
better return type than int.

Also return the boolean expression directly, since the 'if' statement only
tests a boolean condition and then returns a boolean result anyway.

Signed-off-by: Chen Gang 
---
 arch/powerpc/include/asm/mman.h | 8 +++-
 include/linux/mman.h| 2 +-
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/mman.h b/arch/powerpc/include/asm/mman.h
index 2563c43..62e1f47 100644
--- a/arch/powerpc/include/asm/mman.h
+++ b/arch/powerpc/include/asm/mman.h
@@ -31,13 +31,11 @@ static inline pgprot_t arch_vm_get_page_prot(unsigned long 
vm_flags)
 }
 #define arch_vm_get_page_prot(vm_flags) arch_vm_get_page_prot(vm_flags)
 
-static inline int arch_validate_prot(unsigned long prot)
+static inline bool arch_validate_prot(unsigned long prot)
 {
if (prot & ~(PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM | PROT_SAO))
-   return 0;
-   if ((prot & PROT_SAO) && !cpu_has_feature(CPU_FTR_SAO))
-   return 0;
-   return 1;
+   return false;
+   return (prot & PROT_SAO) == 0 || cpu_has_feature(CPU_FTR_SAO);
 }
 #define arch_validate_prot(prot) arch_validate_prot(prot)
 
diff --git a/include/linux/mman.h b/include/linux/mman.h
index 33e17f6..634c4c5 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -49,7 +49,7 @@ static inline void vm_unacct_memory(long pages)
  *
  * Returns true if the prot flags are valid
  */
-static inline int arch_validate_prot(unsigned long prot)
+static inline bool arch_validate_prot(unsigned long prot)
 {
return (prot & ~(PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM)) == 0;
 }
-- 
1.9.3
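
To check that folding the second 'if' into a single return expression keeps
the behaviour of the powerpc helper, the two forms can be compared in user
space. This is a sketch under stated assumptions: the X_PROT_* values are
stand-ins chosen for this example rather than values taken from kernel
headers, and the has_sao parameter replaces cpu_has_feature(CPU_FTR_SAO).

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in flag values for this sketch only. */
#define X_PROT_READ  0x01ul
#define X_PROT_WRITE 0x02ul
#define X_PROT_EXEC  0x04ul
#define X_PROT_SEM   0x08ul
#define X_PROT_SAO   0x10ul
#define X_PROT_ALL   (X_PROT_READ | X_PROT_WRITE | X_PROT_EXEC | \
		      X_PROT_SEM | X_PROT_SAO)

/* Old form: int return value with two early exits. */
static int validate_prot_old(unsigned long prot, bool has_sao)
{
	if (prot & ~X_PROT_ALL)
		return 0;
	if ((prot & X_PROT_SAO) && !has_sao)
		return 0;
	return 1;
}

/* New form: bool return value, second test returned directly. */
static bool validate_prot_new(unsigned long prot, bool has_sao)
{
	if (prot & ~X_PROT_ALL)
		return false;
	return (prot & X_PROT_SAO) == 0 || has_sao;
}

/* Returns true if both forms agree for every prot value below limit. */
static bool forms_agree(unsigned long limit)
{
	for (unsigned long prot = 0; prot < limit; prot++) {
		for (int sao = 0; sao <= 1; sao++) {
			bool old_ok = validate_prot_old(prot, sao) != 0;

			if (old_ok != validate_prot_new(prot, sao))
				return false;
		}
	}
	return true;
}
```

Calling forms_agree(0x40) walks every combination of the five stand-in
flags plus one unknown bit, with and without the SAO feature.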



Re: [f2fs-dev] [PATCH 3/7] f2fs: drop any block plugging

2016-07-09 Thread Jaegeuk Kim
On Sat, Jul 09, 2016 at 10:28:49AM +0800, Chao Yu wrote:
> Hi Jaegeuk,
> 
> On 2016/6/9 1:24, Jaegeuk Kim wrote:
> > In f2fs, we don't need to keep block plugging for NODE and DATA writes,
> > since we already merged bios as much as possible.
> 
> IMO, we cannot remove the block plug, because there are still many
> conditions which stop us from merging r/w IOs into one bio as we expect.
> Theoretically, a block plug can hold as many bios as possible and then submit
> them to the queue in a batch, which reduces racing on queue->lock during
> bio submission. If we drop the plugs, we will suffer more lock racing when
> syncing nodes or flushing data.
> 
> Or is there something I am missing? Do you see any performance issue with
> block plugging?

In the latest patch, I've forcefully turned off plugging only if the underlying
device is an SMR drive.
I still removed the other block plugging, though, since I couldn't see any
performance regression. In some workloads, I could even see some inverted IOs
due to a race condition between plugged and unplugged IOs.

Thanks,

> 
> Thanks,
> 
> > 
> > Signed-off-by: Jaegeuk Kim 
> > ---
> >  fs/f2fs/checkpoint.c |  4 
> >  fs/f2fs/data.c   | 17 ++---
> >  fs/f2fs/gc.c |  5 -
> >  fs/f2fs/segment.c|  7 +--
> >  4 files changed, 11 insertions(+), 22 deletions(-)
> > 
> > diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> > index 5ddd15c..4179c7b 100644
> > --- a/fs/f2fs/checkpoint.c
> > +++ b/fs/f2fs/checkpoint.c
> > @@ -897,11 +897,8 @@ static int block_operations(struct f2fs_sb_info *sbi)
> > .nr_to_write = LONG_MAX,
> > .for_reclaim = 0,
> > };
> > -   struct blk_plug plug;
> > int err = 0;
> >  
> > -   blk_start_plug(&plug);
> > -
> >  retry_flush_dents:
> > f2fs_lock_all(sbi);
> > /* write all the dirty dentry pages */
> > @@ -938,7 +935,6 @@ retry_flush_nodes:
> > goto retry_flush_nodes;
> > }
> >  out:
> > -   blk_finish_plug(&plug);
> > return err;
> >  }
> >  
> > diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> > index 30dc448..5f655d0 100644
> > --- a/fs/f2fs/data.c
> > +++ b/fs/f2fs/data.c
> > @@ -98,10 +98,13 @@ static struct bio *__bio_alloc(struct f2fs_sb_info *sbi, block_t blk_addr,
> >  }
> >  
> >  static inline void __submit_bio(struct f2fs_sb_info *sbi, int rw,
> > -   struct bio *bio)
> > +   struct bio *bio, enum page_type type)
> >  {
> > -   if (!is_read_io(rw))
> > +   if (!is_read_io(rw)) {
> > atomic_inc(&sbi->nr_wb_bios);
> > +   if (current->plug && (type == DATA || type == NODE))
> > +   blk_finish_plug(current->plug);
> > +   }
> > submit_bio(rw, bio);
> >  }
> >  
> > @@ -117,7 +120,7 @@ static void __submit_merged_bio(struct f2fs_bio_info *io)
> > else
> > trace_f2fs_submit_write_bio(io->sbi->sb, fio, io->bio);
> >  
> > -   __submit_bio(io->sbi, fio->rw, io->bio);
> > +   __submit_bio(io->sbi, fio->rw, io->bio, fio->type);
> > io->bio = NULL;
> >  }
> >  
> > @@ -235,7 +238,7 @@ int f2fs_submit_page_bio(struct f2fs_io_info *fio)
> > return -EFAULT;
> > }
> >  
> > -   __submit_bio(fio->sbi, fio->rw, bio);
> > +   __submit_bio(fio->sbi, fio->rw, bio, fio->type);
> > return 0;
> >  }
> >  
> > @@ -1040,7 +1043,7 @@ got_it:
> >  */
> > if (bio && (last_block_in_bio != block_nr - 1)) {
> >  submit_and_realloc:
> > -   __submit_bio(F2FS_I_SB(inode), READ, bio);
> > +   __submit_bio(F2FS_I_SB(inode), READ, bio, DATA);
> > bio = NULL;
> > }
> > if (bio == NULL) {
> > @@ -1083,7 +1086,7 @@ set_error_page:
> > goto next_page;
> >  confused:
> > if (bio) {
> > -   __submit_bio(F2FS_I_SB(inode), READ, bio);
> > +   __submit_bio(F2FS_I_SB(inode), READ, bio, DATA);
> > bio = NULL;
> > }
> > unlock_page(page);
> > @@ -1093,7 +1096,7 @@ next_page:
> > }
> > BUG_ON(pages && !list_empty(pages));
> > if (bio)
> > -   __submit_bio(F2FS_I_SB(inode), READ, bio);
> > +   __submit_bio(F2FS_I_SB(inode), READ, bio, DATA);
> > return 0;
> >  }
> >  
> > diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> > index 4a03076..67fd285 100644
> > --- a/fs/f2fs/gc.c
> > +++ b/fs/f2fs/gc.c
> > @@ -777,7 +777,6 @@ static int do_garbage_collect(struct f2fs_sb_info *sbi,
> >  {
> > struct page *sum_page;
> > struct f2fs_summary_block *sum;
> > -   struct blk_plug plug;
> > unsigned int segno = start_segno;
> > unsigned int end_segno = start_segno + sbi->segs_per_sec;
> > int seg_freed = 0;
> > @@ -795,8 +794,6 @@ static int do_garbage_collect(struct f2fs_sb_info *sbi,
> > unlock_page(sum_page);
> > }
> >  
> > -   blk_start_plug(&plug);
> > -
> > for (segno = start_seg

Re: [PATCH v3] f2fs: fix to avoid data update racing between GC and DIO

2016-07-09 Thread Jaegeuk Kim
On Fri, Jul 08, 2016 at 11:50:02PM +0800, Chao Yu wrote:
> Hi Jaegeuk,
> 
> On 2016/7/8 11:19, Jaegeuk Kim wrote:
> > Hi Chao,
> > 
> > Could you take a look at this in xfstests/generic/013?
> > 
> > [  502.480850] ==
> > [  502.480864] [ INFO: possible circular locking dependency detected ]
> > [  502.480877] 4.7.0-rc1+ #124 Tainted: G   OE  
> > [  502.480886] ---
> > [  502.480897] fsstress/10729 is trying to acquire lock:
> > [  502.480906]  (&sb->s_type->i_mutex_key#18){+.+.+.}, at: [] do_blockdev_direct_IO+0x1db/0x2310
> > [  502.480948] 
> > [  502.480948] but task is already holding lock:
> > [  502.480959]  (&fi->dio_rwsem){.+.+.+}, at: [] f2fs_direct_IO+0xd1/0x3d0 [f2fs]
> > [  502.481003] 
> > [  502.481003] which lock already depends on the new lock.
> > [  502.481003] 
> > [  502.481018] 
> > [  502.481018] the existing dependency chain (in reverse order) is:
> > [  502.481030] 
> > [  502.481030] -> #1 (&fi->dio_rwsem){.+.+.+}:
> > [  502.481054][] lock_acquire+0xd3/0x220
> > [  502.481071][] down_read+0x51/0xa0
> > [  502.481089][] f2fs_direct_IO+0xd1/0x3d0 [f2fs]
> > [  502.481114][] generic_file_direct_write+0xa7/0x160
> > [  502.481133][] __generic_file_write_iter+0xbd/0x1e0
> > [  502.481149][] f2fs_file_write_iter+0xdb/0x100 [f2fs]
> > [  502.481173][] __vfs_write+0xc8/0x140
> > [  502.481190][] vfs_write+0xb5/0x1b0
> > [  502.481205][] SyS_write+0x49/0xa0
> > [  502.481220][] entry_SYSCALL_64_fastpath+0x23/0xc1
> > [  502.481236] 
> > [  502.481236] -> #0 (&sb->s_type->i_mutex_key#18){+.+.+.}:
> > [  502.481264][] __lock_acquire+0x161c/0x1940
> > [  502.481280][] lock_acquire+0xd3/0x220
> > [  502.481296][] down_write+0x5a/0xc0
> > [  502.481312][] do_blockdev_direct_IO+0x1db/0x2310
> > [  502.481328][] __blockdev_direct_IO+0x3a/0x40
> > [  502.481344][] f2fs_direct_IO+0x104/0x3d0 [f2fs]
> > [  502.481368][] generic_file_read_iter+0x689/0x7e0
> > [  502.481384][] __vfs_read+0xc1/0x130
> > [  502.481399][] vfs_read+0x91/0x140
> > [  502.481414][] SyS_read+0x49/0xa0
> > [  502.481429][] entry_SYSCALL_64_fastpath+0x23/0xc1
> > [  502.481445] 
> > [  502.481445] other info that might help us debug this:
> > [  502.481445] 
> > [  502.481459]  Possible unsafe locking scenario:
> > [  502.481459] 
> > [  502.481726]        CPU0                    CPU1
> > [  502.481987]        ----                    ----
> > [  502.482242]   lock(&fi->dio_rwsem);
> > [  502.482501]                                lock(&sb->s_type->i_mutex_key#18);
> > [  502.482765]                                lock(&fi->dio_rwsem);
> > [  502.483025]   lock(&sb->s_type->i_mutex_key#18);
> 
> Seems we will suffer ABBA deadlock:
> 
> writer                                reader
> - f2fs_file_write_iter
>  - down_write(&inode->i_rwsem)
>  - __generic_file_write_iter
>   - generic_file_direct_write
>    - f2fs_direct_IO
>                                       - generic_file_read_iter
>                                        - f2fs_direct_IO
>                                         - down_read(&fi->dio_rwsem)
>                                         - __blockdev_direct_IO
>                                          - do_blockdev_direct_IO
>                                           - down_write(&inode->i_rwsem)
>
>     - down_read(&fi->dio_rwsem)
> 
> What about splitting dio_rwsem to rdio_rwsem/wdio_rwsem for reader/writer to
> avoid deadlock?

Hmm, how about inode_trylock in GC?
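
A minimal user-space sketch of the trylock idea, under stated assumptions: GC never sleeps on the inode lock, it only tries to take it and skips the inode when the lock is contended, so it cannot become one side of the lock cycle. All x_* names are hypothetical, a C11 atomic_flag stands in for the inode lock, and nothing here is the actual f2fs change.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an inode; the flag plays the role of the inode lock. */
struct x_inode {
	atomic_flag lock;
	int blocks_moved;
};

static void x_inode_init(struct x_inode *inode)
{
	atomic_flag_clear(&inode->lock);
	inode->blocks_moved = 0;
}

/* Non-blocking acquire: returns true only if the lock was free and is now ours. */
static bool x_inode_trylock(struct x_inode *inode)
{
	/* test_and_set returns the previous value: false means we took the lock */
	return !atomic_flag_test_and_set(&inode->lock);
}

static void x_inode_unlock(struct x_inode *inode)
{
	atomic_flag_clear(&inode->lock);
}

/*
 * GC-side data move. It never waits for the inode lock: if somebody else
 * (for example a direct-IO path) holds it, the inode is skipped and can be
 * retried in a later GC pass. Returns true if the move happened.
 */
static bool x_gc_move_data(struct x_inode *inode)
{
	if (!x_inode_trylock(inode))
		return false;

	inode->blocks_moved++;	/* placeholder for the actual block migration */
	x_inode_unlock(inode);
	return true;
}
```

The trade-off in this sketch is that GC may make no progress on a busy inode, in exchange for never blocking behind a lock holder that might in turn be waiting on GC.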

> 
> Thanks,
> 
> > [  502.483285] 
> > [  502.483285]  *** DEADLOCK ***
> > [  502.483285] 
> > [  502.484018] 1 lock held by fsstress/10729:
> > [  502.484262]  #0:  (&fi->dio_rwsem){.+.+.+}, at: [] f2fs_direct_IO+0xd1/0x3d0 [f2fs]
> > 
> > Thanks,
> > 
> > On Thu, Jul 07, 2016 at 12:49:12PM +0800, Chao Yu wrote:
> >> From: Chao Yu 
> >>
> >> Data in a file can be operated on by GC and DIO simultaneously, so we will
> >> face the race cases below:
> >>
> >> For write case:
> >> Thread A                             Thread B
> >> - generic_file_direct_write
> >>  - invalidate_inode_pages2_range
> >>  - f2fs_direct_IO
> >>   - do_blockdev_direct_IO
> >>    - do_direct_IO
> >>     - get_more_blocks
> >>                                      - f2fs_gc
> >>                                       - do_garbage_collect
> >>                                        - gc_data_segment
> >>                                         - move_data_page
> >>                                          - do_write_data_page
> >>                                          migrate data block to new block address
> >>    - dio_bio_submit
> >>    update user data to old block address
> >>
> >> For read case:
> >> Thread 

[PATCH] mm: migrate: Use bool instead of int for the return value of PageMovable

2016-07-09 Thread chengang
From: Chen Gang 

For a function whose return value is purely boolean, bool is a slightly
better return type than int.

Also return the boolean result directly, since the 'if' statement only
checks a boolean condition and then returns a boolean result anyway.

Signed-off-by: Chen Gang 
---
 include/linux/migrate.h | 4 ++--
 mm/compaction.c | 9 +++--
 2 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index ae8d475..0e366f8 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -72,11 +72,11 @@ static inline int migrate_huge_page_move_mapping(struct address_space *mapping,
 #endif /* CONFIG_MIGRATION */
 
 #ifdef CONFIG_COMPACTION
-extern int PageMovable(struct page *page);
+extern bool PageMovable(struct page *page);
 extern void __SetPageMovable(struct page *page, struct address_space *mapping);
 extern void __ClearPageMovable(struct page *page);
 #else
-static inline int PageMovable(struct page *page) { return 0; };
+static inline bool PageMovable(struct page *page) { return false; };
 static inline void __SetPageMovable(struct page *page,
struct address_space *mapping)
 {
diff --git a/mm/compaction.c b/mm/compaction.c
index 0bd53fb..cfcfe88 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -95,19 +95,16 @@ static inline bool migrate_async_suitable(int migratetype)
 
 #ifdef CONFIG_COMPACTION
 
-int PageMovable(struct page *page)
+bool PageMovable(struct page *page)
 {
struct address_space *mapping;
 
VM_BUG_ON_PAGE(!PageLocked(page), page);
if (!__PageMovable(page))
-   return 0;
+   return false;
 
mapping = page_mapping(page);
-   if (mapping && mapping->a_ops && mapping->a_ops->isolate_page)
-   return 1;
-
-   return 0;
+   return mapping && mapping->a_ops && mapping->a_ops->isolate_page;
 }
 EXPORT_SYMBOL(PageMovable);
 
-- 
1.9.3
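
A small user-space sketch, not part of the patch, of the final return statement in PageMovable(): with a bool return type, a chain of pointer tests joined by && can be returned directly, and a non-NULL function pointer at the end of the chain yields true. The x_* types are hypothetical stand-ins that keep only the fields the check touches.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures. */
struct x_page;

struct x_aops {
	bool (*isolate_page)(struct x_page *page);
};

struct x_mapping {
	const struct x_aops *a_ops;
};

/* Dummy callback so that one x_aops instance has a non-NULL isolate_page. */
static bool x_dummy_isolate(struct x_page *page)
{
	(void)page;
	return true;
}

/*
 * Same shape as the new return statement in the patch: the && chain
 * short-circuits on the first NULL pointer, so no NULL is dereferenced.
 */
static bool x_mapping_is_movable(const struct x_mapping *mapping)
{
	return mapping && mapping->a_ops && mapping->a_ops->isolate_page;
}
```

In this sketch x_mapping_is_movable() returns false for a NULL mapping, a NULL a_ops, or a NULL isolate_page, and true only when all three are set, which is the same outcome as the removed if/return 1/return 0 sequence.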



[v1] ErrHandling:Make IS_ERR_VALUE_U32 as generic API to avoid IS_ERR_VALUE abuses.

2016-07-09 Thread Arvind Yadav
IS_ERR_VALUE() assumes that its parameter is an unsigned long.
It cannot be used to check whether an 'unsigned int' reflects an error,
because such callers pass an 'unsigned int' into a check that takes an
'unsigned long' argument. This happens to work because the type
is sign-extended on 64-bit architectures before it gets converted
into an unsigned type.

However, anything that passes an 'unsigned short' or 'unsigned int'
argument into IS_ERR_VALUE() is guaranteed to be broken, as are
8-bit integers and types that are wider than 'unsigned long'.

It would be nice to any users that are not passing 'unsigned int'
arguments.

Signed-off-by: Arvind Yadav 
---
 drivers/bcma/scan.c | 1 -
 include/linux/err.h | 2 ++
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/bcma/scan.c b/drivers/bcma/scan.c
index 4a2d1b2..3bc77eb 100644
--- a/drivers/bcma/scan.c
+++ b/drivers/bcma/scan.c
@@ -272,7 +272,6 @@ static struct bcma_device *bcma_find_core_reverse(struct bcma_bus *bus, u16 core
return NULL;
 }
 
-#define IS_ERR_VALUE_U32(x) ((x) >= (u32)-MAX_ERRNO)
 
 static int bcma_get_next_core(struct bcma_bus *bus, u32 __iomem **eromptr,
  struct bcma_device_id *match, int core_num,
diff --git a/include/linux/err.h b/include/linux/err.h
index 1e35588..1940af7 100644
--- a/include/linux/err.h
+++ b/include/linux/err.h
@@ -20,6 +20,8 @@
 
 #define IS_ERR_VALUE(x) unlikely((unsigned long)(void *)(x) >= (unsigned long)-MAX_ERRNO)
 
+#define IS_ERR_VALUE_U32(x) unlikely((unsigned int)(void *)(x) >= (unsigned int)-MAX_ERRNO)
+
 static inline void * __must_check ERR_PTR(long error)
 {
return (void *) error;
-- 
1.9.1
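
A user-space sketch, under stated assumptions, of why an 'unsigned int' error code slips past an 'unsigned long' range check on a 64-bit system: the 32-bit value is zero-extended, so it lands far below the top 4095 values of the 64-bit range. uint64_t models a 64-bit 'unsigned long' so the sketch behaves the same on any host; the x_* names are hypothetical, and the helpers are plain functions rather than the macros from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define X_MAX_ERRNO 4095	/* same value as the kernel's MAX_ERRNO */

/* Range check as seen by a 64-bit 'unsigned long' (modelled as uint64_t). */
static bool x_is_err_value_ul(uint64_t x)
{
	return x >= (uint64_t)-X_MAX_ERRNO;
}

/* Range check done at 32-bit width, the idea behind IS_ERR_VALUE_U32. */
static bool x_is_err_value_u32(uint32_t x)
{
	return x >= (uint32_t)-X_MAX_ERRNO;
}

/* Store a negative errno in a u32, as a register-style return value might. */
static uint32_t x_u32_from_errno(int err)
{
	return (uint32_t)-err;
}
```

For example, x_u32_from_errno(22) gives 0xFFFFFFEA. Widened to 64 bits that is 0x00000000FFFFFFEA, which the 64-bit check rejects, while the 32-bit check accepts it as an error value.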



Re: [PATCH 0/3] ARM: dts: the dts support for rk3288 firefly reload

2016-07-09 Thread ayaka

Thank you for your review, Heiko.


On 07/08/2016 05:35 AM, Heiko Stuebner wrote:
> Hi Randy,
>
> Am Donnerstag, 7. Juli 2016, 02:22:57 schrieb Randy Li:
>> The rk3288 firefly reload is a Rockchip RK3288 based board made up of a
>> core board and a main board. The regulators are connected in a different
>> way to the previous version of firefly boards, it is necessary to
>> move some common code to uncommon place.
>>
>> I only tested the ethernet and confirmed that works.
>> The usb in this board won't caused by the bugs in the driver.
>>
>> This version follows the suggestions from Heiko Stuebner,
>> except the duplicated supply name problem, I don't think
>> it could be fixed in that way.
>
> I've now had a chance to look at that reload board on the firefly site.
> Firefly also is the company name, so a board named that way is not
> necessarily a "variant" :-) .
>
> And looking at the "reload" board this definitely seems to be a very
> different product with it being a system-on-module+baseboard design with
> additional peripherals like that sata bridge, camera interfaces and probably
> more.

The sata bridge is just a SATA to USB bridge, and the "reload" brings back the
DVP camera interface and an HDMI rx chip connected to the other MIPI camera
interface.

> As you might've seen, most Rockchip boards are based on some
> reference-design, so are similar in a big part of their core layout.

Yes, from the EVB. But even the main board of the EVB at Rockchip has at
least 3 versions, as far as I know.
Also, the EVB is made up of a power board, a main board and a core board.

> So, looking at the vastly different product the reload is, I'd really like
> to have a separate dts for the reload, to not run into more confusing
> differences later on.

The main problem is that the power connections are different. That is why I
decided to make a separate dts. If the kernel introduces an override dts, I
could have a better way to implement it.

> Also, when adding a new board, please also add an entry to
> Documentation/devicetree/bindings/arm/rockchip.txt

I will send a patch set in a few days.

> Thanks
> Heiko

Thank you again for your review and your patience.
Randy



Re: [PATCH v3 1/7] lib: string: add functions to case-convert strings

2016-07-09 Thread Markus Mayer
On 9 July 2016 at 05:04, Luis de Bethencourt  wrote:
> On 08/07/16 23:43, Markus Mayer wrote:
>> Add a collection of generic functions to convert strings to lowercase
>> or uppercase.
>>
>> Changing the case of a string (with or without copying it first) seems
>> to be a recurring requirement in the kernel that is currently being
>> solved by several duplicated implementations doing the same thing. This
>> change aims at reducing this code duplication.
>>
>> The new functions are
>> void strlcpytoupper(char *dst, const char *src, size_t len);
>> void strlcpytolower(char *dst, const char *src, size_t len);
>> void strcpytoupper(char *dst, const char *src);
>> void strcpytolower(char *dst, const char *src);
>> void strtoupper(char *s);
>> void strtolower(char *s);
>>
>> The "str[l]cpyto*" versions of the function take a destination string
>> and a source string as arguments. The "strlcpyto*" versions additionally
>> take a length argument like strlcpy() itself. Lastly, the strto*
>> functions take a single string argument and modify the passed-in string.
>>
>> Like strlcpy(), and unlike strncpy(), the functions guarantee NUL
>> termination of the destination string.
>>
>> Signed-off-by: Markus Mayer 
>> ---
>>  include/linux/string.h | 40 
>>  lib/string.c   | 38 ++
>>  2 files changed, 78 insertions(+)
>>
>> diff --git a/include/linux/string.h b/include/linux/string.h
>> index 26b6f6a..36c9d14 100644
>> --- a/include/linux/string.h
>> +++ b/include/linux/string.h
>> @@ -116,6 +116,8 @@ extern void * memchr(const void *,int,__kernel_size_t);
>>  #endif
>>  void *memchr_inv(const void *s, int c, size_t n);
>>  char *strreplace(char *s, char old, char new);
>> +extern void strlcpytoupper(char *dst, const char *src, size_t len);
>> +extern void strlcpytolower(char *dst, const char *src, size_t len);
>>
>>  extern void kfree_const(const void *x);
>>
>> @@ -169,4 +171,42 @@ static inline const char *kbasename(const char *path)
>>   return tail ? tail + 1 : path;
>>  }
>>
>> +/**
>> + * strcpytoupper - Copy string and convert to uppercase.
>> + * @dst: The buffer to store the result.
>> + * @src: The string to convert to uppercase.
>> + */
>> +static inline void strcpytoupper(char *dst, const char *src)
>> +{
>> + strlcpytoupper(dst, src, -1);
>> +}
>> +
>
> Why not use SIZE_MAX instead of -1?

Sure. I'll change all four of them. Thanks.

>> +/**
>> + * strcpytolower - Copy string and convert to lowercase.
>> + * @dst: The buffer to store the result.
>> + * @src: The string to convert to lowercase.
>> + */
>> +static inline void strcpytolower(char *dst, const char *src)
>> +{
>> + strlcpytolower(dst, src, -1);
>> +}
>> +
>
> Same here, and the 2 below :)
>
> Thanks Markus,
> Luis
>
>> +/**
>> + * strtoupper - Convert string to uppercase.
>> + * @s: The string to operate on.
>> + */
>> +static inline void strtoupper(char *s)
>> +{
>> + strlcpytoupper(s, s, -1);
>> +}
>> +
>> +/**
>> + * strtolower - Convert string to lowercase.
>> + * @s: The string to operate on.
>> + */
>> +static inline void strtolower(char *s)
>> +{
>> + strlcpytolower(s, s, -1);
>> +}
>> +
>>  #endif /* _LINUX_STRING_H_ */
>> diff --git a/lib/string.c b/lib/string.c
>> index ed83562..fd8c427 100644
>> --- a/lib/string.c
>> +++ b/lib/string.c
>> @@ -952,3 +952,41 @@ char *strreplace(char *s, char old, char new)
>>   return s;
>>  }
>>  EXPORT_SYMBOL(strreplace);
>> +
>> +/**
>> + * strlcpytoupper - Copy a length-limited string and convert to uppercase.
>> + * @dst: The buffer to store the result.
>> + * @src: The string to convert to uppercase.
>> + * @len: Maximum string length. May be SIZE_MAX (-1) to set no limit.
>> + */
>> +void strlcpytoupper(char *dst, const char *src, size_t len)
>> +{
>> + size_t i;
>> +
>> + if (!len)
>> + return;
>> +
>> + for (i = 0; i < len && src[i]; ++i)
>> + dst[i] = toupper(src[i]);
>> + dst[i < len ? i : i - 1] = '\0';
>> +}
>> +EXPORT_SYMBOL(strlcpytoupper);
>> +
>> +/**
>> + * strlcpytolower - Copy a length-limited string and convert to lowercase.
>> + * @dst: The buffer to store the result.
>> + * @src: The string to convert to lowercase.
>> + * @len: Maximum string length. May be SIZE_MAX (-1) to set no limit.
>> + */
>> +void strlcpytolower(char *dst, const char *src, size_t len)
>> +{
>> + size_t i;
>> +
>> + if (!len)
>> + return;
>> +
>> + for (i = 0; i < len && src[i]; ++i)
>> + dst[i] = tolower(src[i]);
>> + dst[i < len ? i : i - 1] = '\0';
>> +}
>> +EXPORT_SYMBOL(strlcpytolower);
>>
>
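
A user-space sketch of the helpers discussed above, assuming the loop from the quoted patch and the SIZE_MAX change agreed in the reply. The x_ prefix marks these as illustrative copies rather than the kernel functions, and the cast to unsigned char before toupper() is an addition needed by the user-space <ctype.h> contract, not something taken from the patch.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Copy at most len bytes (including the terminating NUL) from src to dst,
 * converting to uppercase. Mirrors the loop in the quoted lib/string.c hunk.
 */
static void x_strlcpytoupper(char *dst, const char *src, size_t len)
{
	size_t i;

	if (!len)
		return;

	for (i = 0; i < len && src[i]; ++i)
		dst[i] = (char)toupper((unsigned char)src[i]);
	/* If the buffer filled up, overwrite the last byte with the NUL. */
	dst[i < len ? i : i - 1] = '\0';
}

/* In-place variant: no length limit, expressed as SIZE_MAX instead of -1. */
static void x_strtoupper(char *s)
{
	x_strlcpytoupper(s, s, SIZE_MAX);
}
```

Both spellings pass the same value, since -1 converted to size_t is SIZE_MAX; the named constant only makes the "no limit" intent visible at the call site. With a 4-byte destination, copying "hello" leaves "HEL" plus the terminator.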


Re: Re: cgroup: Fix split bio been throttled more than once

2016-07-09 Thread Tejun Heo
Hello, Ming.

On Fri, Jul 08, 2016 at 06:35:06PM +0800, Ming Lei wrote:
> I am wondering why REQ_THROTTLED is cleared for the original bio
> even it has been charged and will be issued to driver, and is it allowed
> to throttle and charge the same bio for many times?

So, IIUC, the flag is just to prevent the bio from recursing while
being issued from blk-throtl after queued there for throttling.  We
can probably extend the flag.  I'm not sure how it'd interact with
stacked drivers tho.  It'd definitely need to be cleared before
traveling down to a lower level device.

Thanks.

-- 
tejun


Re: [PATCH v2] Add tw5864 driver

2016-07-09 Thread Andrey Utkin
Hi Hans,

Thanks for the great help.
I believe the issues you highlighted have been rectified by now.

One chunk of your proposed changes seems to be wrong.

Also, I have one non-technical change I want to introduce to this driver; see it
at the bottom of this letter ("Also, I decided to document known video quality
issues in a printed warning...").

On Fri, Jul 01, 2016 at 03:35:40PM +0200, Hans Verkuil wrote:
> On 06/10/2016 12:11 AM, Andrey Utkin wrote:
> > +   cap->capabilities = cap->device_caps | V4L2_CAP_DEVICE_CAPS;
> 
> This line can be dropped: the v4l2 core will do this automatically.

This does not seem to be the case: dropping it resulted in new compliance failures:

Required ioctls:
fail: v4l2-compliance.cpp(550): dcaps & ~caps
test VIDIOC_QUERYCAP: FAIL

Allow for multiple opens:
test second video open: OK
fail: v4l2-compliance.cpp(550): dcaps & ~caps
test VIDIOC_QUERYCAP: FAIL


I am running the latest v4l-utils from git.
This particular failure happens on kernels built from next-20160707 and
next-20160609.
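
A minimal sketch of the condition behind the "dcaps & ~caps" message in the log above, assuming it means that every bit set in device_caps must also be set in capabilities. The helper name is hypothetical; the constant values are the ones from the V4L2 UAPI header, renamed with an X_ prefix to keep the sketch self-contained. That capabilities stays zero when the driver does not set it is an assumption about the tested kernels, not something the log states.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same values as the corresponding V4L2_CAP_* flags in videodev2.h. */
#define X_V4L2_CAP_VIDEO_CAPTURE 0x00000001u
#define X_V4L2_CAP_READWRITE     0x01000000u
#define X_V4L2_CAP_STREAMING     0x04000000u
#define X_V4L2_CAP_DEVICE_CAPS   0x80000000u

/*
 * The test fails when (dcaps & ~caps) is non-zero, i.e. when device_caps
 * advertises something that capabilities does not.
 */
static bool x_querycap_caps_consistent(uint32_t caps, uint32_t dcaps)
{
	return (dcaps & ~caps) == 0;
}
```

With device_caps set to capture, read/write and streaming, a zero capabilities field is inconsistent, while device_caps | V4L2_CAP_DEVICE_CAPS (what the disputed driver line assigns) passes.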

BTW next-20160707 makes my dev machine hang after a few minutes of uptime,
regardless of whether my module is loaded, so for now I am testing the driver on
next-20160609.
Running this older linux-next causes a new failure with the latest v4l-utils:

fail: v4l2-test-buffers.cpp(293): g_flags() & V4L2_BUF_FLAG_DONE

which is understandable because of a recent commit to v4l-utils that flips the
expected behaviour in this regard:

commit 7d784c6894b10cdf5ec025c2cd7c320320f5f658
Author: Hans Verkuil 
Date:   Fri Jul 8 23:10:34 2016 +0200

v4l2-compliance: fix a check for the DONE flag

This was always set by vb2 drivers due to a bug. It is now cleared
again after that bug was fixed, but the test should now be inverted.

Signed-off-by: Hans Verkuil 

diff --git a/utils/v4l2-compliance/v4l2-test-buffers.cpp b/utils/v4l2-compliance/v4l2-test-buffers.cpp
index fb14170..dc82918 100644
--- a/utils/v4l2-compliance/v4l2-test-buffers.cpp
+++ b/utils/v4l2-compliance/v4l2-test-buffers.cpp
@@ -290,7 +290,7 @@ int buffer::check(unsigned type, unsigned memory, unsigned index,
fail_on_test(g_bytesused(p) > g_length(p));
}
fail_on_test(!g_timestamp().tv_sec && !g_timestamp().tv_usec);
-   fail_on_test(!(g_flags() & V4L2_BUF_FLAG_DONE));
+   fail_on_test(g_flags() & V4L2_BUF_FLAG_DONE);
fail_on_test((int)g_sequence() < seq.last_seq + 1);
if (v4l_type_is_video(g_type())) {
fail_on_test(g_field() == V4L2_FIELD_ALTERNATE);

So please expect this failure in the v4l2-compliance logs of my new submission.



Also, I decided to document known video quality issues in a printed warning; I
like how it looks now both in code and in dmesg, but checkpatch.pl doesn't like
it. See commit at
https://github.com/bluecherrydvr/linux/commit/83395b6c5e1e5ceb642c9a04a28db5fc22566c87

The message is split in pieces because otherwise it gets truncated.

I'd like some approval or suggestion for rework on this.

It looks like this in dmesg:

[ 5101.182151] tw5864 :06:07.0: BEWARE OF KNOWN ISSUES WITH VIDEO QUALITY

   This driver was developed by Bluecherry LLC by deducing 
behaviour of original manufacturer's driver, from both source code and 
execution traces.
   It is known that there are some artifacts on output video with 
this driver:
- on all known hardware samples: random pixels of wrong color 
(mostly white, red or blue) appearing and disappearing on sequences of P-frames;
- on some hardware samples (known with H.264 core version 
e006:2800): total madness on P-frames: blocks of wrong luminance; blocks of 
wrong colors "creeping" across the picture.
   There is a workaround for both issues: avoid P-frames by setting 
GOP size to 1. To do that, run such command on device files created by this 
driver:

   for dev in /dev/video*; do v4l2-ctl --device $dev 
--set-ctrl=video_gop_size=1; done

[ 5101.357312] systemd-journald[219]: Compressed data object 850 -> 636 using XZ
[ 5101.471071] tw5864 :06:07.0: These issues are not decoding errors; all 
produced H.264 streams are decoded properly. Streams without P-frames don't 
have these artifacts so it's not analog-to-digital conversion issues nor 
internal memory errors; we conclude it's internal H.264 encoder issues.
   We cannot even check the original driver's behaviour because it 
has never worked properly at all in our development environment. So these 
issues may be actually related to firmware or hardware. However it may be that 
there's just some more register settings missing in the driver which would 
please the hardware.
   Manufacturer didn't help much on our inquiries, but feel free to 
disturb again the support of Intersil (owner of former Techwell).


And checkpatch says this:

 $ ./../../../../scripts/checkpatch.pl -f tw5864-core

Re:[v1.1,1/3] driver: input :touchscreen : add Raydium crc touch function

2016-07-09 Thread jeffrey.lin
Hi dmitry:
> >>input_mt_report_slot_state(ts->input, MT_TOOL_FINGER, state);
> >>  
> >> -  if (!state)
> >> -  continue;
> >> -
> >> -  input_report_abs(ts->input, ABS_MT_POSITION_X,
> >> +  if (state == 0x01) {
> 
> >Why we need this change? How is it related to CRC? Do you intent to
> >report contact as active but not emit any position data of state is
> >neither 0 nor 1?
> This has no relationship with the CRC; I just want to make sure points are
> reported only when state equals 1.

>If active contact only reported when state is 0x01 you need to update
>the statements above like this:
>
>   input_mt_report_slot_state(ts->input, MT_TOOL_FINGER,
>  state == 0x01);
>
>   if (state != 0x01)
>   continue;
>
>but I am surprised that your firmware would report anything but 0 for
>inactive contact.
>
>Could you document all possible state values?

Actually, our firmware can only report touch points with state 1; there is
nothing to do in the other cases. Can I merge the part you suggested into the
CRC version patch?

Thanks.

Jeffrey


Re:[v1.1,3/3] modify raydium firmware update rule

2016-07-09 Thread jeffrey.lin
Hi dmitry:

>> >> modify raydium touch firmware update rule.
>> 
>> >Why? You need to explain why you are proposing a change (but as I
>> >mentioned I see no reason for using custom file names for firmware. Have
>> >userspace adjust name as needed by the driver.
>> 
>> >Thanks.
>> 
>> I just want to make firmware update version control easy in the factory.
>> With this, the factory cannot easily flash the wrong version.
>
>Just have your factory image rename firmware to canonical name before
>initiating update. There is no need to encumber kernel code with this.
Okay
Thanks.

Jeffrey.


[PATCH v1 1/1] x86/platform/intel-mid: Mark regulators explicitly defined

2016-07-09 Thread Andy Shevchenko
Intel MID platforms are using explicitly defined regulators.

Let regulator core know that we do not have any additional
regulators left. This lets it substitute unprovided regulators with
dummy ones.

Without this change when CONFIG_REGULATOR=y the USB driver fails on getting
"vbus" regulator and SDHCI can't get "vmmc" and "vqmmc" regulators either.

Signed-off-by: Andy Shevchenko 
---
 arch/x86/platform/intel-mid/intel-mid.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/platform/intel-mid/intel-mid.c b/arch/x86/platform/intel-mid/intel-mid.c
index 90bb997..ad10fce 100644
--- a/arch/x86/platform/intel-mid/intel-mid.c
+++ b/arch/x86/platform/intel-mid/intel-mid.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -144,6 +145,15 @@ static void intel_mid_arch_setup(void)
 out:
if (intel_mid_ops->arch_setup)
intel_mid_ops->arch_setup();
+
+   /*
+* Intel MID platforms are using explicitly defined regulators.
+*
+* Let regulator core know that we do not have any additional
+* regulators left. This lets it substitute unprovided regulators with
+* dummy ones.
+*/
+   regulator_has_full_constraints();
 }
 
 /* MID systems don't have i8042 controller */
-- 
2.8.1



Re: [PATCH] qxl: correctly handling failed allocation

2016-07-09 Thread kbuild test robot
Hi,

[auto build test ERROR on drm/drm-next]
[also build test ERROR on v4.7-rc6 next-20160708]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Insu-Yun/qxl-correctly-handling-failed-allocation/20151230-031647
base:   git://people.freedesktop.org/~airlied/linux.git drm-next
config: x86_64-allmodconfig (attached as .config)
compiler: gcc-6 (Debian 6.1.1-1) 6.1.1 20160430
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

   drivers/gpu/drm/qxl/qxl_kms.c: In function 'qxl_device_init':
>> drivers/gpu/drm/qxl/qxl_kms.c:224:11: error: 'struct qxl_device' has no member named 'memslots'; did you mean 'mem_slots'?
 if (!qdev->memslots)
  ^~

vim +224 drivers/gpu/drm/qxl/qxl_kms.c

   218  (~(uint64_t)0) >> (qdev->slot_id_bits + 
qdev->slot_gen_bits);
   219  
   220  qdev->mem_slots =
   221  kmalloc(qdev->n_mem_slots * sizeof(struct qxl_memslot),
   222  GFP_KERNEL);
   223  
 > 224  if (!qdev->memslots)
   225  return -ENOMEM;
   226  
   227  idr_init(&qdev->release_idr);

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation




Re: [tip:x86/asm] x86/entry: Inline enter_from_user_mode()

2016-07-09 Thread Borislav Petkov
tip-bot for Paolo Bonzini  wrote:

>Commit-ID:  eec4b1227db153ca16f8f5f285d01fefdce05438
>Gitweb:     http://git.kernel.org/tip/eec4b1227db153ca16f8f5f285d01fefdce05438
>Author: Paolo Bonzini 
>AuthorDate: Mon, 20 Jun 2016 16:58:30 +0200
>Committer:  Ingo Molnar 
>CommitDate: Sat, 9 Jul 2016 10:44:02 +0200
>
>x86/entry: Inline enter_from_user_mode()
>
>This matches what is already done for prepare_exit_to_usermode(),
>and saves about 60 clock cycles (4% speedup) with the benchmark
>in the previous commit message.
>
>Signed-off-by: Paolo Bonzini 
>Reviewed-by: Rik van Riel 
>Reviewed-by: Andy Lutomirski 
>Reviewed-by: Rik van Riel 
>Reviewed-by: Andy Lutomirski 
>Reviewed-by: Rik van Riel 
>Reviewed-by: Andy Lutomirski 
>Reviewed-by: Rik van Riel 
>Reviewed-by: Andy Lutomirski 
>Acked-by: Paolo Bonzini 

Woohaa, if that amount of review doesn't get this patch upstream I don't know 
what will ;-)

-- 
Sent from a small device: formatting sucks and brevity is inevitable. 


Re: [PATCH net] udp: prevent bugcheck if filter truncates packet too much

2016-07-09 Thread Willem de Bruijn
On Sat, Jul 9, 2016 at 6:43 AM, Michal Kubecek  wrote:
> On Sat, Jul 09, 2016 at 11:48:49AM +0200, Daniel Borkmann wrote:
>> On 07/09/2016 02:20 AM, Alexei Starovoitov wrote:
>> >On Sat, Jul 09, 2016 at 01:31:40AM +0200, Eric Dumazet wrote:
>> >>On Fri, 2016-07-08 at 17:52 +0200, Michal Kubecek wrote:
>> >>>If socket filter truncates an udp packet below the length of UDP header
>> >>>in udpv6_queue_rcv_skb() or udp_queue_rcv_skb(), it will trigger a
>> >>>BUG_ON in skb_pull_rcsum(). This BUG_ON (and therefore a system crash if
>> >>>kernel is configured that way) can be easily enforced by an unprivileged
>> >>>user which was reported as CVE-2016-6162. For a reproducer, see
>> >>>http://seclists.org/oss-sec/2016/q3/8
>> >>>
>> >>>Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing")
>> >>>Reported-by: Marco Grassi 
>> >>>Signed-off-by: Michal Kubecek 
>> >>>---
>
>> >>Acked-by: Eric Dumazet 
>> >
>> >this is incomplete fix. Please do not apply. See discussion at security@kernel
>>
>> Ohh well, didn't see it earlier before starting the discussion at security@...
>>
>> I'm okay if we take this for now as a quick band aid and find a better
>> way how to deal with the underlying issue long-term so that it's
>> /guaranteed/ that it doesn't bite us any further in such fragile ways.
>
> Agreed. As rc7 is due in a day or two, rushing a complex and intrusive
> solution in might be too risky.

Acked-by: Willem de Bruijn 

Thanks, Michal.


Re: [PATCH v2] input: tablet: pegasus_notetaker: USB PM fixes

2016-07-09 Thread Martin Kepplinger
Am 2016-07-08 um 23:08 schrieb Dmitry Torokhov:
> On Tue, Jun 28, 2016 at 06:17:13PM +0200, Martin Kepplinger wrote:
>> Am 2016-06-23 um 19:18 schrieb Dmitry Torokhov:
>>> Hi Martin,
>>>
>>> On Tue, Jun 14, 2016 at 01:20:15PM +0200, Martin Kepplinger wrote:
>>>>  static int pegasus_reset_resume(struct usb_interface *intf)
>>>>  {
>>>> +	struct pegasus *pegasus = usb_get_intfdata(intf);
>>>> +
>>>> +	if (pegasus->dev->users)
>>>> +		pegasus_set_mode(pegasus, PEN_MODE_XY, NOTETAKER_LED_MOUSE);
>>>> +
>>>> 	return pegasus_resume(intf);
>>>
>>> Hmm, we need to take input mutex when using pegasus->dev->users, how
>>> about the version below instead?
>>>
>>> Thanks.
>>>
>>
>> Sorry for the delay, give me a few more days to test and confirm this or
>> come up with a final patch.
> 
> Martin, did you have time to try out this version of the patch?
> 
> Thanks!
> 

This patch doesn't seem to work as is. Holidays get in the way, but you
can expect a working patch(set) next week.

   martin



Re: [PATCH v3 1/7] lib: string: add functions to case-convert strings

2016-07-09 Thread Luis de Bethencourt
On 08/07/16 23:43, Markus Mayer wrote:
> Add a collection of generic functions to convert strings to lowercase
> or uppercase.
> 
> Changing the case of a string (with or without copying it first) seems
> to be a recurring requirement in the kernel that is currently being
> solved by several duplicated implementations doing the same thing. This
> change aims at reducing this code duplication.
> 
> The new functions are
> void strlcpytoupper(char *dst, const char *src, size_t len);
> void strlcpytolower(char *dst, const char *src, size_t len);
> void strcpytoupper(char *dst, const char *src);
> void strcpytolower(char *dst, const char *src);
> void strtoupper(char *s);
> void strtolower(char *s);
> 
> The "str[l]cpyto*" versions of the function take a destination string
> and a source string as arguments. The "strlcpyto*" versions additionally
> take a length argument like strlcpy() itself. Lastly, the strto*
> functions take a single string argument and modify the passed-in string.
> 
> Like strlcpy(), and unlike strncpy(), the functions guarantee NUL
> termination of the destination string.
> 
> Signed-off-by: Markus Mayer 
> ---
>  include/linux/string.h | 40 ++++++++++++++++++++++++++++++++++++++++
>  lib/string.c           | 38 ++++++++++++++++++++++++++++++++++++++
>  2 files changed, 78 insertions(+)
> 
> diff --git a/include/linux/string.h b/include/linux/string.h
> index 26b6f6a..36c9d14 100644
> --- a/include/linux/string.h
> +++ b/include/linux/string.h
> @@ -116,6 +116,8 @@ extern void * memchr(const void *,int,__kernel_size_t);
>  #endif
>  void *memchr_inv(const void *s, int c, size_t n);
>  char *strreplace(char *s, char old, char new);
> +extern void strlcpytoupper(char *dst, const char *src, size_t len);
> +extern void strlcpytolower(char *dst, const char *src, size_t len);
>  
>  extern void kfree_const(const void *x);
>  
> @@ -169,4 +171,42 @@ static inline const char *kbasename(const char *path)
>  	return tail ? tail + 1 : path;
>  }
>  
> +/**
> + * strcpytoupper - Copy string and convert to uppercase.
> + * @dst: The buffer to store the result.
> + * @src: The string to convert to uppercase.
> + */
> +static inline void strcpytoupper(char *dst, const char *src)
> +{
> +	strlcpytoupper(dst, src, -1);
> +}
> +

Why not use SIZE_MAX instead of -1?
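
For context on the question above: in C, passing -1 where a size_t is expected converts it to the largest size_t value, so both spellings give the same length and the difference is only in how clearly the intent reads. A tiny userspace check, where len_from_minus_one() is a hypothetical helper that is not part of the patch:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns what a size_t parameter receives when
 * the caller passes the literal -1, as the inline wrappers do.
 */
static size_t len_from_minus_one(void)
{
	size_t len = -1;	/* converted modulo 2^N, i.e. all bits set */

	return len;
}
```

The returned value compares equal to SIZE_MAX from <stdint.h> in userspace.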

> +/**
> + * strcpytolower - Copy string and convert to lowercase.
> + * @dst: The buffer to store the result.
> + * @src: The string to convert to lowercase.
> + */
> +static inline void strcpytolower(char *dst, const char *src)
> +{
> +	strlcpytolower(dst, src, -1);
> +}
> +

Same here, and the 2 below :)

Thanks Markus,
Luis

> +/**
> + * strtoupper - Convert string to uppercase.
> + * @s: The string to operate on.
> + */
> +static inline void strtoupper(char *s)
> +{
> +	strlcpytoupper(s, s, -1);
> +}
> +
> +/**
> + * strtolower - Convert string to lowercase.
> + * @s: The string to operate on.
> + */
> +static inline void strtolower(char *s)
> +{
> +	strlcpytolower(s, s, -1);
> +}
> +
>  #endif /* _LINUX_STRING_H_ */
> diff --git a/lib/string.c b/lib/string.c
> index ed83562..fd8c427 100644
> --- a/lib/string.c
> +++ b/lib/string.c
> @@ -952,3 +952,41 @@ char *strreplace(char *s, char old, char new)
>  	return s;
>  }
>  EXPORT_SYMBOL(strreplace);
> +
> +/**
> + * strlcpytoupper - Copy a length-limited string and convert to uppercase.
> + * @dst: The buffer to store the result.
> + * @src: The string to convert to uppercase.
> + * @len: Maximum string length. May be SIZE_MAX (-1) to set no limit.
> + */
> +void strlcpytoupper(char *dst, const char *src, size_t len)
> +{
> +	size_t i;
> +
> +	if (!len)
> +		return;
> +
> +	for (i = 0; i < len && src[i]; ++i)
> +		dst[i] = toupper(src[i]);
> +	dst[i < len ? i : i - 1] = '\0';
> +}
> +EXPORT_SYMBOL(strlcpytoupper);
> +
> +/**
> + * strlcpytolower - Copy a length-limited string and convert to lowercase.
> + * @dst: The buffer to store the result.
> + * @src: The string to convert to lowercase.
> + * @len: Maximum string length. May be SIZE_MAX (-1) to set no limit.
> + */
> +void strlcpytolower(char *dst, const char *src, size_t len)
> +{
> +	size_t i;
> +
> +	if (!len)
> +		return;
> +
> +	for (i = 0; i < len && src[i]; ++i)
> +		dst[i] = tolower(src[i]);
> +	dst[i < len ? i : i - 1] = '\0';
> +}
> +EXPORT_SYMBOL(strlcpytolower);
> 
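
To see how the quoted loop behaves at the edges, here is a userspace copy of it for experimentation only. The name strlcpytoupper_demo() is made up to keep it apart from the proposed kernel function, and the unsigned char cast is an addition needed for the libc toupper(); everything else follows the patch text above.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace copy of the loop from the patch. len is the size of dst,
 * including room for the terminating NUL, as with strlcpy().
 */
static void strlcpytoupper_demo(char *dst, const char *src, size_t len)
{
	size_t i;

	if (!len)
		return;		/* no room at all: leave dst untouched */

	for (i = 0; i < len && src[i]; ++i)
		dst[i] = (char)toupper((unsigned char)src[i]);

	/* If the loop filled the buffer, the last byte becomes the NUL. */
	dst[i < len ? i : i - 1] = '\0';
}
```

With a 4-byte buffer, "hello" is stored as "HEL". Calling it with dst == src and SIZE_MAX as the length converts a string in place, which is what the strtoupper() wrapper in the patch relies on.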



Re: [LEDE-DEV] DHCP via bridge in case of IPv4

2016-07-09 Thread Alexey Brodkin
Hi Aaron,

On Sat, 2016-07-09 at 07:47 -0400, Aaron Z wrote:
> On Sat, Jul 9, 2016 at 4:37 AM, Alexey Brodkin wrote:
> > 
> > Hello,
> > 
> > I was playing with quite simple bridged setup on different boards with
> > very recent kernels (4.6.3 as of this writing) and found one interesting
> > behavior that I cannot yet understand, and googling didn't help here either.
> > 
> > My setup is pretty simple:
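> > -------------   -------------------   -------------------------
> > | HOST      |   | "Dumb AP"       |   | Wireless client       |
> > | with DHCP |<->(eth0)      (wlan0)<->| attempting to         |
> > | server    |   |     \ br0 /     |   | get settings via DHCP |
> > -------------   -------------------   -------------------------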
> > 
> > * HOST is my laptop with a DHCP server that works for sure.
> > * "Dumb AP" is a separate board (I tried ARM-based Wandboard and ARC-based
> >   AXS10x boards but results are exactly the same) with wired (eth0) and
> >   wireless (wlan0) network controllers bridged together (br0). That "br0"
> >   bridge flawlessly gets its settings from the DHCP server on the host.
> > * Wireless client could be either a smartphone or another laptop etc., but
> >   what's important is that it should be configured to get network settings
> >   by DHCP as well.
> > 
> > So what happens "br0" always gets network settings from DHCP server on HOST.
> > That's fine. But wireless client only reliably gets settings from DHCP
> > server if IPv6 is enabled on "Dumb AP" board. If IPv6 is disabled I may see
> > that wireless client sends "DHCP Discover" then server replies with
> > "DHCP Offer" but that offer never reaches wireless client.
> 
>
> Do you have WDS enabled? If not, DHCP has issues in that scenario:
> https://wiki.openwrt.org/doc/howto/clientmode

I don't have WDS enabled. I tried to have as simple setup as possible.
Still from what I see in the Wiki article above problem happens when
there're 4 devices in the chain, right? Because as it says:
>8
The 802.11 standard only uses three MAC addresses for frames transmitted between
the Access Point and the Station. Frames transmitted from the Station to the AP
don't include the ethernet source MAC of the requesting host and response frames
are missing the destination ethernet MAC to address the target host behind the
client bridge.
>8

But in my case I only have 3 devices in the chain, so I would think it's
something other than the issue described in the article.

Anyways thanks for the hint.

-Alexey
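
As background for the wiki excerpt quoted above: whether an 802.11 data frame carries three or four MAC addresses depends on the To DS and From DS bits in its Frame Control field. The sketch below assumes the 16-bit field has already been decoded from its little-endian wire form; dot11_data_address_count() is a hypothetical helper name and the rule applies to data frames only.

```c
#include <stdbool.h>
#include <stdint.h>

#define FC_TO_DS	0x0100	/* frame is going towards the AP / DS */
#define FC_FROM_DS	0x0200	/* frame is coming from the AP / DS */

/*
 * Number of address fields in an 802.11 data frame header.
 * Only frames with both To DS and From DS set (the WDS / 4-address
 * case) carry the fourth address.
 */
static unsigned int dot11_data_address_count(uint16_t frame_control)
{
	bool to_ds = (frame_control & FC_TO_DS) != 0;
	bool from_ds = (frame_control & FC_FROM_DS) != 0;

	return (to_ds && from_ds) ? 4 : 3;
}
```

A normal station-to-AP frame (only To DS set) and an AP-to-station frame (only From DS set) both have three addresses, which is the limitation the excerpt describes for hosts sitting behind a client bridge.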

